Latanya Sweeney, director of the Data Privacy Lab at Harvard, has been researching the de-anonymization, or re-identification, of data for years. In 1998, she explained how a former governor of Massachusetts's full medical record was re-identified by cross-referencing Census information with de-identified health data. Sweeney also found that, with birth date alone, 12 percent of a population of voters can be re-identified. With birth date and gender, that number increases to 29 percent, and with birth date and zip code it increases to 69 percent. In 2000, using 1990 Census data, Sweeney found that 87 percent of the U.S. population could be identified with birth date, gender and zip code. In 2011, her research reported on the dangers that can arise from the re-identification of “anonymized” medical data and advocated a “privacy-preserving marketplace” for data.
Now, Forbes reports, a group led by Sweeney has been able to re-identify “more than 40% of a sample of anonymous participants in a high-profile DNA study.”
From the outset, the Personal Genome Project, set up by Harvard Medical School Professor of Genetics George Church, has warned participants of the risk that someone someday could identify them, meaning anyone could look up the intimate medical histories that many have posted along with their genome data. That day arrived on Thursday.
Professor Latanya Sweeney, director of the Data Privacy Lab at Harvard, along with her research assistant and two students, scraped data on 1,130 of the now more than 2,500 people who have shared their DNA data for the Personal Genome Project. Church’s project posts information about the volunteers on the Internet to help researchers gain new insights about human health and disease. Their names do not appear, but the profiles list medical conditions including abortions, illegal drug use, alcoholism, depression, sexually transmitted diseases, medications and their DNA sequence.
Of the 1,130 volunteers Sweeney and her team reviewed, 579 provided zip code, date of birth and gender, the three key pieces of information she needs to identify anonymous people when combined with information from voter rolls or other public records. Of these 579, Sweeney succeeded in naming 241, or 42%. The Personal Genome Project confirmed that 97% of the names matched those in its database if nicknames and first name variations were included. She describes her findings here.
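To make the linkage step concrete, here is a minimal sketch of how profiles without names might be joined to a public list on zip code, date of birth and gender. It is not Sweeney's actual code or data: the records, field names and the `link` helper are all hypothetical, and the rule of naming a profile only when exactly one public record shares its combination is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only.
profiles = [
    {"profile_id": "P1", "zip": "02138", "dob": "1960-01-15", "gender": "F"},
    {"profile_id": "P2", "zip": "02139", "dob": "1975-07-04", "gender": "M"},
    {"profile_id": "P3", "zip": "02140", "dob": "1982-11-30", "gender": "F"},
]
voter_roll = [
    {"name": "Alice Example", "zip": "02138", "dob": "1960-01-15", "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "dob": "1975-07-04", "gender": "M"},
    {"name": "Bill Example", "zip": "02139", "dob": "1975-07-04", "gender": "M"},
    {"name": "Carol Example", "zip": "02141", "dob": "1990-03-22", "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "gender")


def key(record):
    """Build the (zip, dob, gender) tuple used to join the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(profiles, public_records):
    """Return {profile_id: name} for profiles with exactly one public match."""
    names_by_key = defaultdict(list)
    for record in public_records:
        names_by_key[key(record)].append(record["name"])

    matches = {}
    for profile in profiles:
        candidates = names_by_key.get(key(profile), [])
        # Assumption: only an unambiguous, single candidate counts as a name.
        if len(candidates) == 1:
            matches[profile["profile_id"]] = candidates[0]
    return matches


matches = link(profiles, voter_roll)
print(matches)
print(f"{len(matches) / len(profiles):.0%} of profiles named")
```

In this toy data only P1 is named: P2 shares its combination with two voters, and P3 has no counterpart in the public list.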
Sweeney has also set up a web page for anyone to test how unique their birthdate, gender and zip are in combination. When I tried it, I was the only match in my zip code, suggesting that I, like so many others, would be easy to re-identify. “This allows us to show the vulnerabilities and to show that they can be identified by name,” she said. “Vulnerabilities exist but there are solutions too.”
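A uniqueness check of this kind can be sketched as a simple count of how many people in a population share a given combination of fields. The population, field names and helper functions below are hypothetical and say nothing about how Sweeney's web page is implemented; the sketch only shows why adding fields tends to shrink the group a person can hide in.

```python
from collections import Counter

# Hypothetical population, made up for illustration only.
population = [
    {"zip": "02138", "dob": "1960-01-15", "gender": "F"},
    {"zip": "02138", "dob": "1960-01-15", "gender": "M"},
    {"zip": "02139", "dob": "1960-01-15", "gender": "F"},
    {"zip": "02139", "dob": "1975-07-04", "gender": "M"},
]


def count_sharing(population, fields, person):
    """Count how many people share `person`'s values for the given fields."""
    counts = Counter(tuple(p[f] for f in fields) for p in population)
    return counts[tuple(person[f] for f in fields)]


def fraction_unique(population, fields):
    """Fraction of the population that is the only match for its own values."""
    counts = Counter(tuple(p[f] for f in fields) for p in population)
    unique = sum(1 for p in population if counts[tuple(p[f] for f in fields)] == 1)
    return unique / len(population)


me = population[0]
for fields in (("dob",), ("dob", "gender"), ("dob", "gender", "zip")):
    print(fields, count_sharing(population, fields, me), fraction_unique(population, fields))
```

A count of 1 for all three fields corresponds to being "the only match" in the sense described above.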
Read the full article to learn more.