Latanya Sweeney, director of the Data Privacy Lab at Harvard, has been researching the de-anonymization, or re-identification, of data for years. In 1998, she explained how a former governor of Massachusetts had his full medical record re-identified by cross-referencing Census information with de-identified health data. Sweeney also found that, with birth date alone, 12 percent of a population of voters can be re-identified. With birth date and gender, that number increases to 29 percent, and with birth date and zip code it increases to 69 percent. In 2000, using 1990 Census data, Sweeney found that 87 percent of the U.S. population could be identified with birth date, gender and zip code. In 2011, her research reported on the dangers that can arise from the re-identification of “anonymized” medical data and advocated a “privacy-preserving marketplace” for data.
Now, Forbes reports, a group led by Sweeney has been able to re-identify “more than 40% of a sample of anonymous participants in a high-profile DNA study.”
From the onset, the Personal Genome Project, set up by Harvard Medical School Professor of Genetics George Church, has warned participants of the risk that someone someday could identify them, meaning anyone could look up the intimate medical histories that many have posted along with their genome data. That day arrived on Thursday.
Professor Latanya Sweeney, director of the Data Privacy Lab at Harvard, along with her research assistant and two students scraped data on 1,130 people of the now more than 2,500 who have shared their DNA data for the Personal Genome Project. Church’s project posts information about the volunteers on the Internet to help researchers gain new insights about human health and disease. Their names do not appear, but the profiles list medical conditions including abortions, illegal drug use, alcoholism, depression, sexually transmitted diseases, medications and their DNA sequence.
Of the 1,130 volunteers Sweeney and her team reviewed, about 579 provided zip code, date of birth and gender, the three key pieces of information she needs to identify anonymous people combined with information from voter rolls or other public records. Of these, Sweeney succeeded in naming 241, or 42% of the total. The Personal Genome Project confirmed that 97% of the names matched those in its database if nicknames and first name variations were included. She describes her findings here.
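The linkage step described above can be illustrated with a minimal Python sketch. It is not the team's actual method or code: the records, field names, and the `link` helper are all hypothetical, and it assumes a profile counts as re-identified only when exactly one public record shares its zip code, date of birth and gender.

```python
from collections import defaultdict

# Hypothetical toy data; none of these records come from the study.
profiles = [
    {"profile_id": "P1", "zip": "02138", "dob": "1960-01-15", "sex": "F"},
    {"profile_id": "P2", "zip": "02139", "dob": "1975-07-04", "sex": "M"},
    {"profile_id": "P3", "zip": "02140", "dob": "1982-11-30", "sex": "F"},
]
voter_roll = [
    {"name": "Alice Example", "zip": "02138", "dob": "1960-01-15", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "dob": "1975-07-04", "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "dob": "1975-07-04", "sex": "M"},
    {"name": "Dana Example", "zip": "02141", "dob": "1990-03-22", "sex": "F"},
]


def quasi_id(record):
    """Return the (zip, date of birth, gender) triple used for matching."""
    return (record["zip"], record["dob"], record["sex"])


def link(anonymous_profiles, public_records):
    """Map each profile to a name when exactly one public record matches."""
    index = defaultdict(list)
    for rec in public_records:
        index[quasi_id(rec)].append(rec["name"])

    matches = {}
    for profile in anonymous_profiles:
        candidates = index.get(quasi_id(profile), [])
        if len(candidates) == 1:  # ambiguous or missing matches are skipped
            matches[profile["profile_id"]] = candidates[0]
    return matches


matches = link(profiles, voter_roll)
rate = len(matches) / len(profiles)
print(matches, f"{rate:.0%}")
```

In this toy data only P1 is named: P2 shares its triple with two voters and P3 has no match. By the same arithmetic, 241 of 579 is about 41.6%, which rounds to the 42% quoted above.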
Sweeney has also set up a web page for anyone to test how unique their birthdate, gender and zip are in combination. When I tried it, I was the only match in my zip code, suggesting that I, like so many others, would be easy to re-identify. “This allows us to show the vulnerabilities and to show that they can be identified by name,” she said. “Vulnerabilities exist but there are solutions too.”
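A rough sketch of the kind of uniqueness check described above might look like the following. It is not the code behind Sweeney's web page: the `unique_fraction` helper and the five-person population are hypothetical, and the numbers it produces describe only this toy data, not any real population.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` appears only once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical population, invented for illustration only.
population = [
    {"dob": "1960-01-15", "sex": "F", "zip": "02138"},
    {"dob": "1960-01-15", "sex": "M", "zip": "02138"},
    {"dob": "1960-01-15", "sex": "M", "zip": "02139"},
    {"dob": "1975-07-04", "sex": "M", "zip": "02139"},
    {"dob": "1975-07-04", "sex": "M", "zip": "02139"},
]

for fields in (["dob"], ["dob", "sex"], ["dob", "sex", "zip"]):
    print(fields, unique_fraction(population, fields))
```

Each added field splits the population into smaller groups, so the share of people who are alone in their group can only stay the same or grow. That is the pattern behind the rising percentages reported for birth date, gender and zip code.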
Read the full article to learn more.