Washington Technology has a report on the Department of Health and Human Services and “anonymized data.” I’ve written about this issue before. Often, data that is believed to have been made anonymous can easily be “de-anonymized,” and the sensitive data can then be linked back to the affected individual.
Carnegie Mellon professor Latanya Sweeney has been researching the issue of de-anonymization or re-identification of data for years. In 1998, she explained how a former governor of Massachusetts had his full medical record re-identified by cross-referencing voter registration records with de-identified health data. Sweeney also found that, with birth date alone, 12 percent of a population of voters can be re-identified. With birth date and gender, that number increases to 29 percent, and with birth date and zip code it increases to 69 percent.
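The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration, not Sweeney's actual method or data: the function name `link_records`, the field names, and every record below are made up. It joins a "de-identified" data set to a public list that carries names, and keeps only the matches that are unambiguous on both sides.

```python
from collections import defaultdict

def link_records(deidentified, public_records, keys):
    """Return (name, record) pairs whose key combination is unique in both data sets."""
    def index(rows):
        # Group rows by the values of the shared fields.
        buckets = defaultdict(list)
        for row in rows:
            buckets[tuple(row[k] for k in keys)].append(row)
        return buckets

    private_index = index(deidentified)
    public_index = index(public_records)
    matches = []
    for key, private_rows in private_index.items():
        public_rows = public_index.get(key, [])
        # Only a one-to-one match is an unambiguous re-identification.
        if len(private_rows) == 1 and len(public_rows) == 1:
            matches.append((public_rows[0]["name"], private_rows[0]))
    return matches

# Hypothetical records, invented for illustration only.
health = [
    {"birth_date": "1950-01-02", "gender": "M", "zip": "02138", "diagnosis": "A"},
    {"birth_date": "1961-03-04", "gender": "F", "zip": "02139", "diagnosis": "B"},
    {"birth_date": "1961-03-04", "gender": "F", "zip": "02139", "diagnosis": "C"},
]
public_list = [
    {"name": "Person One", "birth_date": "1950-01-02", "gender": "M", "zip": "02138"},
    {"name": "Person Two", "birth_date": "1961-03-04", "gender": "F", "zip": "02139"},
    {"name": "Person Three", "birth_date": "1970-05-06", "gender": "F", "zip": "02140"},
]

matches = link_records(health, public_list, ("birth_date", "gender", "zip"))
for name, record in matches:
    print(name, "->", record["diagnosis"])
```

In this toy data only "Person One" is linked to a diagnosis. "Person Two" shares a birth date, gender and zip code with two health records, so that match is ambiguous and is dropped.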
In 2000, Sweeney found that 87 percent of the U.S. population could be identified with birth date, gender and zip code. She used 1990 Census data. In 2006, Philippe Golle at the Palo Alto Research Center revisited her research, using 2000 Census data, and found that “disclosing one’s gender, ZIP code and full date of birth allows for unique identification” of 63 percent of the U.S. population. (Note that the U.S. population in 1990 was 248.7 million and the 2000 population was 281.4 million.)
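Percentages like these come from counting how many people are the only ones with a given combination of attributes. A minimal sketch of that count follows, assuming a list of records with hypothetical field names. The helper `unique_fraction` and the four sample records are invented for illustration and are not taken from either study.

```python
from collections import Counter

def unique_fraction(records, keys):
    """Fraction of records whose combination of `keys` appears exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical population of four people, invented for illustration only.
people = [
    {"birth_date": "1950-01-02", "gender": "M", "zip": "02138"},
    {"birth_date": "1950-01-02", "gender": "F", "zip": "02138"},
    {"birth_date": "1961-03-04", "gender": "F", "zip": "02139"},
    {"birth_date": "1961-03-04", "gender": "F", "zip": "02140"},
]

print(unique_fraction(people, ("birth_date",)))                  # 0.0
print(unique_fraction(people, ("birth_date", "gender")))         # 0.5
print(unique_fraction(people, ("birth_date", "gender", "zip")))  # 1.0
```

Each added attribute splits the population into smaller groups, so the share of people who stand alone in their group can only stay the same or grow.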
In August, University of Colorado law professor Paul Ohm discussed “the surprising failure of anonymization,” and said, “Data can either be useful or perfectly anonymous but never both.” He said anonymization’s failure “should trigger a sea change in the law, because nearly every information privacy law or regulation grants a get-out-of-jail-free card to those who anonymize their data.”
Now, Washington Technology reports:
HHS intends to hire a contractor to demonstrate either the “ability or inability” to re-identify data from a data set that has been de-identified under the Health [Insurance] Portability and Accountability Act (HIPAA) Privacy Rule, according to a Jan. 4 notice on the Federal Business Opportunities Web site. […]
Under HIPAA, hospitals and other health care providers de-identify personal medical data by removing the 18 identifiers in the data. The hospital or other entity does not have actual knowledge that the data could be used alone or in combinations to identify the individual.
Under this new contract, HHS will research re-identifying the data and matching it to a specific individual.
“The contractor shall take one or more HIPAA Privacy Rule de-identified data sets and, using methods and technologies that exclude ‘brute force’ matching, demonstrate the ability or inability to re-identify the data,” the notice states.
The re-identification must be an accurate and unambiguous match to an individual.
To protect the privacy of the personal medical data to be used in the project, the data will be prohibited from being shared in either its de-identified form or any other forms created in the project, the notice adds.