PII - Personally Identifiable Information


Personally Identifiable Information (PII) is a general term for information that can identify a specific individual, either on its own or in combination with other information. Its scope varies with jurisdiction and context. It ranges from items that directly identify a person, such as a name or address, to items that can identify a person only when combined with other data, such as an IP address or a cookie ID. PII is one of the most fundamental categories in data classification and the starting point for many security and privacy measures.

Direct Identifiers and Quasi-Identifiers

Direct identifiers

Can identify a person on their own

Full name
My Number (Japan) / Social Security Number (US)
Passport number
Email address
Facial photo

Quasi-identifiers

Can identify a person in combination

Date of birth
Postal code
Gender
Occupation
IP address
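To make the distinction concrete, the sketch below tags the fields of a record as direct or quasi-identifiers and drops the direct ones. The field names, the `PiiClass` enum, and `strip_direct_identifiers` are hypothetical. Treating unknown fields as direct identifiers (failing closed) is a design assumption. The quasi-identifiers survive, so removing direct identifiers alone does not make a record anonymous.

```python
from __future__ import annotations

from enum import Enum


class PiiClass(Enum):
    DIRECT = "direct"  # identifies a person on its own
    QUASI = "quasi"    # identifies a person only in combination
    NONE = "none"      # not treated as PII in this sketch


# Hypothetical schema; a real one needs its own review.
FIELD_CLASSES = {
    "full_name": PiiClass.DIRECT,
    "national_id": PiiClass.DIRECT,  # e.g. My Number / SSN
    "passport_number": PiiClass.DIRECT,
    "email": PiiClass.DIRECT,
    "face_photo": PiiClass.DIRECT,
    "date_of_birth": PiiClass.QUASI,
    "postal_code": PiiClass.QUASI,
    "gender": PiiClass.QUASI,
    "occupation": PiiClass.QUASI,
    "ip_address": PiiClass.QUASI,
    "purchase_total": PiiClass.NONE,
}


def strip_direct_identifiers(record: dict) -> tuple[dict, list[str]]:
    """Drop direct identifiers; return (kept_fields, quasi_identifiers_still_present)."""
    # Unknown fields default to DIRECT, so they are dropped (fail closed).
    kept = {
        key: value
        for key, value in record.items()
        if FIELD_CLASSES.get(key, PiiClass.DIRECT) is not PiiClass.DIRECT
    }
    quasi = [key for key in kept if FIELD_CLASSES[key] is PiiClass.QUASI]
    return kept, quasi


record = {
    "full_name": "Alice Example",
    "email": "alice@example.com",
    "postal_code": "02138",
    "date_of_birth": "1990-01-01",
    "gender": "F",
    "purchase_total": 42.0,
}
kept, quasi = strip_direct_identifiers(record)
print(quasi)  # ['postal_code', 'date_of_birth', 'gender']
```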

A study by Professor Latanya Sweeney of Carnegie Mellon University (2000) used U.S. census data to show that just three items (5-digit ZIP code, date of birth, and gender) were likely to uniquely identify about 87% of the U.S. population. The study demonstrated how vulnerable supposedly anonymized data can be, and it was a landmark result that drew wide attention to the risks of quasi-identifiers.
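A rough way to gauge the same risk in a dataset you hold is to count how many records have a quasi-identifier combination that appears only once. This is a minimal sketch, not Sweeney's methodology (her estimate was derived from census population statistics). The function name `unique_fraction` and the sample records are made up for illustration.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


# Made-up records for illustration only.
people = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"zip": "02139", "dob": "1960-01-02", "sex": "M"},
    {"zip": "02140", "dob": "1972-03-15", "sex": "F"},
]
print(unique_fraction(people, ["zip", "dob", "sex"]))  # 0.5
print(unique_fraction(people, ["sex"]))                # 0.25
```

Adding an attribute to the combination can only split groups further, so the unique share never decreases as more quasi-identifiers are combined.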

Differences in Definition by Jurisdiction

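Aspect	GDPR (EU)	CCPA (California, USA)	Act on the Protection of Personal Information (Japan)
Term	Personal Data	Personal Information	Personal Information
IP address	Counts as personal data	Listed as an identifier; counts when it can reasonably be linked to a consumer or household	Generally not on its own (counts if it can easily be collated with other information that identifies a person)
Cookie ID	Counts as personal data	Listed as an identifier; counts when it can reasonably be linked to a consumer or household	Personal-related information (category introduced by the amendments in force since April 2022)
Penalties for violations	Up to €20 million or 4% of worldwide annual turnover, whichever is higher	Up to $2,500 per violation, or $7,500 per intentional violation	For violating an order of the Personal Information Protection Commission: up to 1 year of imprisonment or a fine of up to 1 million yen (corporations: up to 100 million yen)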

Of the three, the GDPR adopts the broadest definition. It treats "any information relating to an identified or identifiable natural person" as personal data; Article 4(1) explicitly names online identifiers, and Recital 30 cites IP addresses and cookie identifiers as examples. When operating a service globally, the safe approach in practice is to define PII by the standard of the strictest jurisdiction you operate in.

Anonymization Techniques - k-Anonymity and Beyond

Anonymization is the technique of processing a dataset containing PII so that individuals cannot be identified when the data is used for analysis or sharing. There are three representative methods.

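k-Anonymity
Every combination of quasi-identifier values is shared by at least k records

l-Diversity
Every such group contains at least l distinct sensitive values

t-Closeness
In every group, the distribution of sensitive values is within distance t of their distribution in the whole table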

k-Anonymity guarantees that "at least k records share the same combination of quasi-identifiers," but if every record in a group has the same value for a sensitive attribute (such as a disease name), that value can still be inferred for anyone known to be in the group. l-Diversity addresses this weakness, and t-Closeness goes further by limiting how far the distribution of sensitive values within each group can diverge from their distribution in the whole dataset. In practice, these models are commonly combined with data masking and tokenization for layered protection.
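The three models can be checked against a table whose quasi-identifiers have already been generalized (for example, ZIP codes truncated to "021**" and ages grouped into bands). This is a minimal sketch with hypothetical function names and toy data. It uses the simplest "distinct" variant of l-diversity. For t-closeness it measures total variation distance, which equals the Earth Mover's Distance used in the t-closeness definition when all categorical values are equally far apart.

```python
from collections import Counter, defaultdict

# All three measures assume a non-empty table.


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[q] for q in quasi_identifiers)].append(r)
    return list(groups.values())


def k_of(records, quasi_identifiers):
    """Largest k for which the table is k-anonymous (size of the smallest group)."""
    return min(len(g) for g in equivalence_classes(records, quasi_identifiers))


def l_of(records, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values found in any group."""
    return min(
        len({r[sensitive] for r in g})
        for g in equivalence_classes(records, quasi_identifiers)
    )


def t_of(records, quasi_identifiers, sensitive):
    """Largest distance between any group's sensitive-value distribution and the table's.

    Uses total variation distance, which equals Earth Mover's Distance when every
    pair of distinct categorical values is at distance 1.
    """
    n = len(records)
    overall = Counter(r[sensitive] for r in records)
    worst = 0.0
    for g in equivalence_classes(records, quasi_identifiers):
        local = Counter(r[sensitive] for r in g)
        distance = 0.5 * sum(
            abs(local[v] / len(g) - overall[v] / n) for v in overall
        )
        worst = max(worst, distance)
    return worst


# Toy table whose quasi-identifiers are already generalized.
patients = [
    {"zip": "021**", "age": "30-39", "disease": "flu"},
    {"zip": "021**", "age": "30-39", "disease": "flu"},
    {"zip": "021**", "age": "30-39", "disease": "flu"},
    {"zip": "022**", "age": "40-49", "disease": "flu"},
    {"zip": "022**", "age": "40-49", "disease": "cancer"},
    {"zip": "022**", "age": "40-49", "disease": "asthma"},
]
qi = ["zip", "age"]
print(k_of(patients, qi))                       # 3
print(l_of(patients, qi, "disease"))            # 1 (first group is all "flu")
print(round(t_of(patients, qi, "disease"), 3))  # 0.333
```

The toy table is 3-anonymous, yet its l is 1 because everyone in the first group has the flu, which is exactly the weakness of k-anonymity described above.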

The Principle of Data Minimization

The most effective strategy for protecting PII is not to collect or retain unnecessary PII in the first place. The "data minimization" principle in Article 5(1)(c) of the GDPR requires that personal data be limited to what is necessary for the purposes for which it is processed. In practice, it is important to ask "is this field really necessary?" at the design stage. For example, if the goal is age verification, an "18 or over" flag is sufficient; the full date of birth is not needed. The less PII you retain, the less there is to expose if a breach occurs.
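The age-verification example can be sketched as computing the flag once at intake and storing only the result. The names `verify_age`, `age_on`, and `AgeCheckResult` are hypothetical. The sketch treats a 29 February birthday as reached on 1 March in non-leap years, which may not match every jurisdiction's legal rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AgeCheckResult:
    # The only value retained; the birth date itself is never stored.
    is_18_or_over: bool


def age_on(birth_date: date, today: date) -> int:
    """Age in completed years on `today`."""
    # A 29 February birthday counts as reached on 1 March in non-leap years.
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def verify_age(birth_date: date, today: Optional[date] = None) -> AgeCheckResult:
    """Derive the flag at intake; callers persist the result, not birth_date."""
    today = today or date.today()
    return AgeCheckResult(is_18_or_over=age_on(birth_date, today) >= 18)


result = verify_age(date(2000, 6, 15), today=date(2018, 6, 15))
print(result)  # AgeCheckResult(is_18_or_over=True)
```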

Impact and Response When PII Is Leaked

When PII is leaked, an organization may be legally required to report the breach to the supervisory authority and to notify the individuals concerned. The GDPR requires notifying the supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of the breach (unless it is unlikely to result in a risk to individuals), and notifying the affected individuals when the risk to them is high. Under Japan's Act on the Protection of Personal Information, amendments in force since April 2022 made reporting to the Personal Information Protection Commission and notifying individuals mandatory for certain breaches, such as those involving sensitive personal information or more than 1,000 people. The impact of a leak is not limited to financial damage: leaked data can be abused for credential stuffing or spear phishing. See also the data breach response guide and the privacy settings guide.
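For incident tracking, the GDPR's outer limit for notifying the supervisory authority can be computed from the moment the organization became aware of the breach. This is a minimal sketch with hypothetical names. The 72 hours is an upper bound rather than a target, since the regulation asks for notification without undue delay. Requiring a timezone-aware timestamp is a design choice to avoid ambiguity about when the clock started.

```python
from datetime import datetime, timedelta, timezone

# GDPR Article 33(1): "where feasible, not later than 72 hours" after awareness.
GDPR_AUTHORITY_WINDOW = timedelta(hours=72)


def authority_deadline(became_aware_at: datetime) -> datetime:
    """Outer limit for notifying the supervisory authority."""
    if became_aware_at.tzinfo is None:
        raise ValueError("became_aware_at must be timezone-aware")
    return became_aware_at + GDPR_AUTHORITY_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the 72-hour limit (negative once it has passed)."""
    return (authority_deadline(became_aware_at) - now).total_seconds() / 3600


aware = datetime(2024, 5, 10, 9, 0, tzinfo=timezone.utc)
print(authority_deadline(aware).isoformat())                # 2024-05-13T09:00:00+00:00
print(hours_remaining(aware, aware + timedelta(hours=10)))  # 62.0
```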
