Latanya Sweeney, visiting professor of computer science at Harvard's School of Engineering and Applied Sciences, has been researching the de-anonymization, or re-identification, of data for years. In 1998, she explained how a former governor of Massachusetts had his full medical record re-identified by cross-referencing Census information with de-identified health data. Sweeney also found that, with birth date alone, 12 percent of a population of voters can be re-identified. With birth date and gender, that number increases to 29 percent, and with birth date and zip code it increases to 69 percent. In 2000, using 1990 Census data, Sweeney found that 87 percent of the U.S. population could be identified by birth date, gender, and zip code.
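A quick way to see why so few attributes suffice is to measure how many records are unique on a given combination of quasi-identifiers. The sketch below does this on a tiny invented dataset; the records and the `unique_fraction` helper are illustrative, not taken from Sweeney's studies, which used population-scale Census and voter data.

```python
from collections import Counter

# Toy "de-identified" records: (birth_date, gender, zip_code).
# All values are invented for illustration.
records = [
    ("1962-07-31", "M", "02138"),
    ("1962-07-31", "F", "02138"),
    ("1985-01-15", "F", "02139"),
    ("1985-01-15", "F", "02139"),
    ("1990-11-02", "M", "02140"),
]

def unique_fraction(records, fields):
    """Fraction of records whose quasi-identifier combination
    appears exactly once, i.e. is uniquely re-identifiable."""
    keys = [tuple(r[i] for i in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

# Birth date alone vs. birth date + gender + ZIP code:
print(unique_fraction(records, [0]))        # 0.2
print(unique_fraction(records, [0, 1, 2]))  # 0.6
```

Adding attributes can only shrink the groups that share a key, which is why each extra quasi-identifier drives the unique fraction upward, exactly the pattern in Sweeney's 12/29/69 percent figures.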
Now, the Harvard Gazette reports on research by Sweeney concerning the dangers that can arise from the re-identification of “anonymized” medical data, and her advocacy of a “privacy-preserving marketplace” for data:
When you visit a pharmacy to pick up antidepressants, cholesterol medication, or birth control pills, you might expect a certain measure of privacy. In reality, prescription information is routinely sold to analytics companies for use in research and pharmaceutical marketing.
That information might include your doctor's name and address, your diagnosis, the name and dose of your prescription, the time and place where you picked it up, your age and gender, and an encoded version of your name.
Under federal privacy law, this data sharing is perfectly legal. As a safeguard, part of the Health Insurance Portability and Accountability Act (HIPAA) requires that a person “with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods” must certify that there is a “very small” risk of re-identification by an “anticipated recipient” of the data.
But Latanya Sweeney, A.L.B. ’95, a visiting professor of computer science at Harvard’s School of Engineering and Applied Sciences (SEAS), warns that loopholes abound. Even without the patients’ names, she says, it may be quite easy to re-identify the subjects. […]
With SEAS faculty members David Parkes, the Gordon McKay Professor of Computer Science, and Stephen Chong, assistant professor of computer science, as well as Alex Pentland at Massachusetts Institute of Technology, Sweeney advocates a “privacy-preserving marketplace” in which society can reap the benefits of shared data, especially in the scientific and medical arenas, while also protecting individuals from economic harm when the data is shared beyond its original intended use.
“We don’t want data to be locked away and never used, because we could be doing so much more if people were able to share data in a way that’s trustworthy and aligned with the intentions of all the participants,” said Parkes.
Medical, genetic, financial, and location data, along with purchasing histories, are all extremely valuable pieces of information for social science research, epidemiology, strategic marketing, and other behind-the-scenes industries. But if one database can be matched up with another (and, as Sweeney has demonstrated, it often can), then an interested party can easily generate a detailed picture of a specific individual’s life.
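The database matching described above is commonly called a linkage attack: a table stripped of names is joined to a public record, such as a voter roll, on the quasi-identifiers both tables share. A minimal sketch of the idea follows; every record, name, and field here is invented for illustration, not drawn from any real dataset.

```python
# Sketch of a linkage attack: join a "de-identified" medical table
# with a public voter roll on shared quasi-identifiers.
# All records and names are invented for illustration.

medical = [  # no names, but quasi-identifiers remain
    {"birth_date": "1945-07-31", "gender": "M", "zip": "02138",
     "diagnosis": "hypertension"},
    {"birth_date": "1988-03-12", "gender": "F", "zip": "02139",
     "diagnosis": "asthma"},
]

voters = [  # public record: names attached to the same attributes
    {"name": "J. Doe", "birth_date": "1945-07-31", "gender": "M",
     "zip": "02138"},
    {"name": "A. Smith", "birth_date": "1988-03-12", "gender": "F",
     "zip": "02139"},
]

QUASI_IDS = ("birth_date", "gender", "zip")

def link(medical, voters):
    """Re-identify medical records by matching quasi-identifiers
    against the voter roll; returns (name, diagnosis) pairs."""
    index = {tuple(v[q] for q in QUASI_IDS): v["name"] for v in voters}
    return [(index[key], m["diagnosis"])
            for m in medical
            if (key := tuple(m[q] for q in QUASI_IDS)) in index]

print(link(medical, voters))
# [('J. Doe', 'hypertension'), ('A. Smith', 'asthma')]
```

No names ever appear in the medical table; the re-identification comes entirely from the overlap in attributes, which is why removing names alone is a weak anonymization step.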
This can be both useful and damaging, as when participants in a genomic study help advance science but then find themselves unable to obtain life insurance. […]
After-the-fact protections against some of these types of discrimination do exist, but mechanisms to compensate fairly for these harms, or to prevent them entirely, are weak.