PII - Personally Identifiable Information
Personally Identifiable Information (PII) is a general term for information that can identify a specific individual, either on its own or in combination with other information. Its scope varies greatly depending on the jurisdiction and context, ranging from items that directly identify a person, such as a name or address, to items that can identify a person only when combined with other data, such as an IP address or a cookie ID. Identifying PII is the most basic step in data classification and the starting point for the security measures built on top of it.
Direct Identifiers and Quasi-Identifiers
Direct identifiers: can identify a person on their own
- Full name
- My Number / SSN
- Passport number
- Email address
- Facial photo
Quasi-identifiers: can identify a person only in combination
- Date of birth
- Postal code
- Gender
- Occupation
- IP address
A study by Professor Latanya Sweeney of Carnegie Mellon University (2000) showed that, based on 1990 U.S. census data, just three items (5-digit ZIP code, date of birth, and gender) were enough to uniquely identify about 87% of the U.S. population. This study demonstrated how vulnerable "supposedly anonymized data" can be, and was a groundbreaking achievement that alerted the world to the dangers of quasi-identifiers.
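The same kind of check can be run on any dataset before it is shared. The following is a minimal sketch, assuming a hypothetical list of `(postal_code, date_of_birth, gender)` tuples; the records and the function name `unique_fraction` are invented for illustration and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy records: (postal_code, date_of_birth, gender).
records = [
    ("15213", "1985-03-14", "F"),
    ("15213", "1985-03-14", "F"),
    ("15213", "1990-07-02", "M"),
    ("15217", "1985-03-14", "F"),
    ("15217", "1972-11-30", "M"),
]


def unique_fraction(rows):
    """Share of rows whose quasi-identifier combination appears exactly once."""
    if not rows:
        return 0.0
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


print(unique_fraction(records))  # prints 0.6
```

Here three of the five records are unique on the three quasi-identifiers, so removing names alone would not protect them.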
Differences in Definition by Jurisdiction
| Aspect | GDPR (EU) | CCPA (California, USA) | Act on the Protection of Personal Information (Japan) |
|---|---|---|---|
| Term | Personal Data | Personal Information | Personal information |
| IP address | Counts as personal data | May count | Does not count on its own (counts when combined with communication logs) |
| Cookie ID | Counts as personal data | May count | Personal-related information (regulation tightened in the 2022 amendment) |
| Penalties for violations | Up to €20 million or 4% of global annual turnover, whichever is higher | Up to $7,500 per intentional violation ($2,500 otherwise) | Up to 1 year of imprisonment or a fine of up to 1 million yen for violating an order |
Notably, the GDPR adopts the broadest definition. Under the GDPR, "any information relating to an identified or identifiable natural person" is treated as personal data, and online identifiers (cookies, advertising IDs) are clearly included. When operating a service globally, defining PII according to the standard of the strictest jurisdiction is the safe approach in practice.
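The "strictest jurisdiction" rule can be written down as a small lookup. The sketch below is illustrative only: the status labels, the `STRICTNESS` ordering, and the function name `strictest_status` are assumptions made for this example, and the entries simply restate the comparison table above rather than legal advice.

```python
# Assumed ordering of status labels, from least to most strict (hypothetical).
STRICTNESS = {"no": 0, "conditional": 1, "maybe": 2, "yes": 3}

# Per-jurisdiction treatment, restating the comparison table above.
# "conditional" stands for "counts only in combination or as a separate regulated category".
TREATMENT = {
    "ip_address": {"GDPR": "yes", "CCPA": "maybe", "APPI": "conditional"},
    "cookie_id": {"GDPR": "yes", "CCPA": "maybe", "APPI": "conditional"},
}


def strictest_status(field, jurisdictions):
    """Return the strictest status for a field across the jurisdictions in scope."""
    statuses = TREATMENT.get(field)
    if statuses is None:
        # Unknown fields are treated as PII by default (a conservative assumption).
        return "yes"
    return max((statuses[j] for j in jurisdictions), key=STRICTNESS.__getitem__)


print(strictest_status("ip_address", ["CCPA", "APPI"]))
print(strictest_status("ip_address", ["GDPR", "CCPA", "APPI"]))
```

A service that adds EU users moves the IP address from "maybe" to "yes", which is why classifying by the strictest jurisdiction from the start avoids rework.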
Anonymization Techniques - k-Anonymity and Beyond
Anonymization is the technique of processing a dataset containing PII so that individuals cannot be identified when the data is used for analysis or sharing. There are three representative methods.
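- k-Anonymity: at least k records share the same quasi-identifier values
- l-Diversity: each such group contains at least l distinct sensitive values
- t-Closeness: the skew of each group's sensitive-value distribution is kept at or below t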
k-Anonymity guarantees that "at least k records share the same combination of quasi-identifiers," but if a sensitive attribute (such as a disease name) has the same value for every record in such a group, that value can still be inferred for each individual in it. l-Diversity addresses this weakness by requiring several distinct sensitive values per group, and t-Closeness goes further by limiting how far the distribution of sensitive values within a group may deviate from the distribution in the whole dataset. In practice, it is common to provide layered protection by combining these with data masking and tokenization.
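All three properties can be measured on a dataset that has already been generalized. The following is a rough sketch under stated assumptions: the rows are hypothetical, the last field of each row is taken to be the sensitive attribute, and the t-closeness distance is computed as total variation distance, which is a simplification suitable for categorical values rather than a general implementation.

```python
from collections import Counter, defaultdict

# Hypothetical generalized records: (postal_prefix, age_band, diagnosis).
# The first two fields are quasi-identifiers; diagnosis is the sensitive attribute.
rows = [
    ("152**", "30-39", "flu"),
    ("152**", "30-39", "flu"),
    ("152**", "30-39", "asthma"),
    ("152**", "40-49", "flu"),
    ("152**", "40-49", "diabetes"),
    ("152**", "40-49", "asthma"),
]


def group_sensitive_values(records):
    """Map each quasi-identifier combination to its list of sensitive values."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[:-1])].append(record[-1])
    return groups


def k_anonymity(records):
    """Size of the smallest group sharing the same quasi-identifiers."""
    return min(len(v) for v in group_sensitive_values(records).values())


def l_diversity(records):
    """Smallest number of distinct sensitive values found in any group."""
    return min(len(set(v)) for v in group_sensitive_values(records).values())


def t_closeness(records):
    """Largest total variation distance between a group and the whole dataset."""
    overall = Counter(record[-1] for record in records)
    total = len(records)
    worst = 0.0
    for values in group_sensitive_values(records).values():
        local = Counter(values)
        distance = 0.5 * sum(
            abs(local[v] / len(values) - overall[v] / total) for v in overall
        )
        worst = max(worst, distance)
    return worst


print(k_anonymity(rows), l_diversity(rows), round(t_closeness(rows), 3))
```

For these toy rows the dataset is 3-anonymous but only 2-diverse, because the first group contains just two distinct diagnoses.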
The Principle of Data Minimization
The most effective strategy for protecting PII is to avoid collecting and retaining unnecessary PII in the first place. The "principle of data minimization" set out in Article 5(1)(c) of the GDPR requires collecting and processing only the minimum data necessary for the purpose. In practice, it is important to ask "is this field really necessary?" at the design stage. For example, if the goal is age verification, an "18 or over" flag is sufficient rather than the full date of birth. The less PII you retain, the more limited the damage will be in the event of a data breach.
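The age-verification example might look like the sketch below. The function names `is_adult` and `minimized_profile` and the record layout are hypothetical; the point is only that the date of birth is used once to derive the flag and is never stored.

```python
from datetime import date


def is_adult(date_of_birth: date, today: date, threshold: int = 18) -> bool:
    """Return True if the person has reached the threshold age by `today`."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years >= threshold


def minimized_profile(user_id: str, date_of_birth: date, today: date) -> dict:
    """Build the record to store: only the derived flag, not the date of birth."""
    return {"user_id": user_id, "is_adult": is_adult(date_of_birth, today)}


profile = minimized_profile("u-001", date(2006, 5, 20), date(2024, 5, 20))
print(profile)
```

If this record leaks, the attacker learns that the user is an adult but gains no date of birth to use as a quasi-identifier.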
Impact and Response When PII Is Leaked
When PII is leaked, many jurisdictions legally require the organization to report to the supervisory authority and notify the individuals concerned. The GDPR requires notifying the supervisory authority within 72 hours of becoming aware of the breach, and Japan's amended Act on the Protection of Personal Information, in force since April 2022, makes reporting to the Personal Information Protection Commission and notifying the individuals concerned mandatory. The impact of a leak is not limited to financial damage: the leaked information can also be abused as target data for credential stuffing or spear phishing. Please also review the data breach response guide and the privacy settings guide.
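Because the 72-hour window is short, incident tooling often computes the deadline as soon as a breach is confirmed. This is a minimal sketch, assuming the clock starts when the organization becomes aware of the breach and that timestamps are timezone-aware; the names `notification_deadline` and `hours_remaining` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# GDPR window for notifying the supervisory authority.
GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority."""
    if became_aware_at.tzinfo is None:
        raise ValueError("use a timezone-aware timestamp")
    return became_aware_at + GDPR_NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left until the deadline; negative once it has passed."""
    delta = notification_deadline(became_aware_at) - now
    return delta.total_seconds() / 3600


aware = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())
print(hours_remaining(aware, aware + timedelta(hours=24)))
```

Rejecting naive timestamps is a deliberate choice here, since an incident team spread across time zones could otherwise miscount the window by several hours.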