MIT Technology Review reports on a new paper discussed at a Symposium on the Dynamics of the Internet and Society that discusses why companies and academic researchers should be careful about gathering and using consumer data:
The reams of data that many modern businesses collect—dubbed “big data”—can provide powerful insights. […] A new paper presented at a recent Symposium on the Dynamics of the Internet and Society spells out the reasons that businesses and academics should proceed with caution. While privacy invasions—both deliberate and accidental—are obvious issues, the paper also warns that data can easily be incomplete and distorted.
“With big data comes big responsibilities,” says Kate Crawford, an associate professor at the University of New South Wales, who was involved with the work. […]
Crawford’s paper, written with Microsoft senior researcher Danah Boyd, illustrates the ways that big data sets can fall down, particularly when used to make claims about people’s behavior. “Big data sets are never complete,” Crawford says. For example, researchers often study Facebook to analyze people’s social relationships, using connections made through the social network as a stand-in for real-world ties. But it’s common for Facebook to show a distorted picture of people’s closest social relationships, such as with parents, live-in romantic partners, or friends seen daily. […]
Google is a poster child for the power of data. The company has transformed a massive amount of information, gathered through its search engine, into a commanding ad network and powerful role as the gatekeeper of much of the world’s information. […]
The researchers add that big data can also raise serious ethical concerns. Many times, Crawford notes, combining data from different sources can lead to unexpected results for the people involved. For example, other researchers have previously shown that they can identify individuals by using social media data in combination with supposedly anonymized behavioral data provided by companies.
Jennifer Chayes, managing director of Microsoft Research New England, says her lab has had firsthand experience with such problems. The lab wanted to run a contest for researchers to analyze a set of search data, she says, and was going over the data carefully to avoid the sorts of deanonymizing scandals that have occurred from search data releases in the past. They discovered that people often entered search terms that were personally identifying and embarrassing—such as, “Is my wife Jane Doe cheating on me?” The lab nixed the contest. Chayes says, “We began to realize how much we didn’t understand about human behavior around search engines.”