In March, the Federal Trade Commission started a new technology blog and Twitter account for FTC Chief Technologist Ed Felten. Felten has since written two posts on anonymity and privacy. In the first, he explains why "hashing" is a poor technique for "anonymizing" data. (We’ve discussed problems with anonymization and de-anonymization before.) Felten writes:
What is hashing anyway? What we’re talking about is technically called a “cryptographic hash function” (or, to super hardcore theory nerds, a randomly chosen member of a pseudorandom function family–but I digress). I’ll just call it a “hash” for short. A hash is a mathematical function: you give it an input value and the function thinks for a while and then emits an output value; and the same input always yields the same output. What makes a hash special is that it is as unpredictable as a mathematical function can be–it is designed so that there is no rhyme or reason to its behavior, except for the iron rule that the same input always yields the same output.
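To make Felten's description concrete, here is a minimal sketch using SHA-256 from Python's standard `hashlib` module. SHA-256 is our choice of example hash, not one Felten names, and the email addresses are hypothetical. The sketch shows the two properties he describes: the same input always gives the same output, and a one-character change in the input gives an output with no visible relation to the first.

```python
import hashlib

def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# The "iron rule": hashing the same input twice yields the same output.
first = sha256_hex("alice@example.com")
second = sha256_hex("alice@example.com")

# A hypothetical one-character change produces an unrelated-looking digest.
neighbor = sha256_hex("alicf@example.com")

print(first == second)    # True
print(first == neighbor)  # False
print(first[:16], neighbor[:16])
```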
He then gives an example of how hashing can fail to anonymize data, but he also notes: “Does this mean that hashing always fails, and is never a good way to scrub data? Almost, but not quite. There are more advanced uses of hashing that can offer some protection in some settings. But the casual assumption that hashing is sufficient to anonymize data is risky at best, and usually wrong.”
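One common reason plain hashing fails is that the "same input, same output" rule works for an attacker too: if the set of possible identifiers is small, anyone can hash every candidate and look up the results. The sketch below illustrates this with a hypothetical release in which 5-digit account numbers were replaced by their SHA-256 hashes. The account numbers and the 5-digit format are our own illustrative assumptions, not Felten's example; real identifiers such as phone numbers have larger but still enumerable spaces.

```python
import hashlib

def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Hypothetical "anonymized" release: each account number is replaced by its hash.
original_ids = ["04217", "55310", "99999"]
released = [sha256_hex(i) for i in original_ids]

def build_reverse_table(digits: int) -> dict[str, str]:
    """Hash every possible identifier of the given width; map digest -> identifier."""
    table = {}
    for n in range(10 ** digits):
        candidate = str(n).zfill(digits)  # e.g. 4217 -> "04217"
        table[sha256_hex(candidate)] = candidate
    return table

# Enumerating all 100,000 candidates takes well under a second on typical hardware.
reverse = build_reverse_table(5)
recovered = [reverse[h] for h in released]
print(recovered)  # ['04217', '55310', '99999']
```

Nothing about the hash was "broken" here; the attack only relies on the hash being deterministic and the input space being small.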
In the second post, Felten explains how pseudonyms relate to anonymity:
A pseudonym is any kind of identifier, other than a name, that is associated with a person or (what often amounts to the same thing) a device. Pseudonyms are very common. Examples include the random ID value in a tracking cookie; a device ID such as a WiFi MAC address or a phone’s UDID; a synthetic identifier such as an “OpenUDID”; a mobile phone number; or a Twitter handle. […]
You might think that a randomly chosen pure pseudonym conveys no information about anybody, but that is not right. As soon as you associate the pseudonym with somebody, the pseudonym gives you the ability to record information about that person, or associate information with them. For example, if you can observe the browsing habits of a person who has a known pseudonym, then you can build up a record of where that individual browsed–which conveys information about them.
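The linking effect Felten describes can be sketched in a few lines. In this hypothetical example, a tracker logs page visits under a random cookie ID, and a single later event (say, a login) associates that ID with a name. The cookie ID, site names (using the reserved `.example` domain), and the person's name are all made up for illustration. Once the association exists, the entire history recorded under the pseudonym becomes a profile of that person.

```python
import secrets
from collections import defaultdict

# A "pure" pseudonym: a random cookie ID that carries no information by itself.
cookie_id = secrets.token_hex(16)

# A hypothetical tracker records each page visit under the pseudonym it observes.
browsing_log: dict[str, list[str]] = defaultdict(list)
for page in ["news.example/politics", "clinic.example/appointments", "shop.example/cart"]:
    browsing_log[cookie_id].append(page)

# Later, one event (e.g. a login form) ties the pseudonym to a person.
identity_of = {cookie_id: "Jane Doe"}

# Every page recorded under the pseudonym is now attributed to that person.
profile = {identity_of[pid]: pages for pid, pages in browsing_log.items() if pid in identity_of}
print(profile)
# {'Jane Doe': ['news.example/politics', 'clinic.example/appointments', 'shop.example/cart']}
```

Note that the pseudonym itself never had to be reversed or decoded; it only had to stay the same across visits.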