In an opinion column at MIT Technology Review, Jeffrey F. Rayport of MarketspaceNext discusses the issue of data collection and ethics:
In this era of Big Data, there is little that cannot be tracked in our online lives—or even in our offline lives. […] companies are steadily gaining new ways to capture information about us. They now have the technology to make sense of massive amounts of unstructured data, using natural language processing, machine learning, and software architectures such as Hadoop, which handles high volumes of simultaneous search queries. Messy data of this kind, long relegated to data warehouses, is now the target of data mining. So is the information generated by social networks—user profiles and posts. Its quantity is staggering: a recent report from the market intelligence firm IDC estimates that in 2009 stored information totaled 0.8 zettabytes, the equivalent of 800 billion gigabytes. IDC predicts that by 2020, 35 zettabytes of information will be stored globally. Much of that will be customer information. As the store of data grows, the analytics available to draw inferences from it will only become more sophisticated.
It’s no wonder that there are calls for corporations to create positions such as chief privacy officer, chief safety officer, and chief data officer, or that American and European legislators have been considering several kinds of privacy measures. […]
In the private sector, the Digital Advertising Alliance has sought to get ahead of such rule-making by introducing its own privacy framework to assure the security and safety of customer information. […]
The opportunity for profit helps explain the rise of dozens of data exchanges, data marts, predictive analytic engines, and other intermediaries. It’s also why players such as Google, Facebook, and Zynga, among many others, are finding ways to aggregate ever more information about users. […]
The potential dark side of Big Data suggests the need for a code of ethical principles. Here are some proposals for how to structure them.
Clarity on Practices: When data is being collected, let users know about it—in real time. Such disclosure would address the issue of hidden files and unauthorized tracking. Giving users access to what a company knows about them could go a long way toward building trust. Google has done this already. If you want to know what Google knows about you, go to www.google.com/ads/preferences, and you can see both the data it has collected and the inferences it, and third parties, have drawn from what you’ve done.
Read the full article for more suggestions for an ethical code on data collection.