The Wall Street Journal reports that social-networking site Facebook is testing new software for gathering more data on its users:
Facebook Inc. is testing technology that would greatly expand the scope of data that it collects about its users, the head of the company’s analytics group said Tuesday.
The social network may start collecting data on minute user interactions with its content, such as how long a user’s cursor hovers over a certain part of its website, or whether a user’s newsfeed is visible at a given moment on the screen of his or her mobile phone, Facebook analytics chief Ken Rudin said Tuesday during an interview. […]
Facebook collects two kinds of data, demographic and behavioral. The demographic data—such as where a user lives or went to school—documents a user’s life beyond the network. The behavioral data—such as one’s circle of Facebook friends, or “likes”—is captured in real time on the network itself. The ongoing tests would greatly expand the behavioral data that is collected, according to Mr. Rudin. The tests are ongoing and part of a broader technology testing program, but Facebook should know within months whether it makes sense to incorporate the new data collection into the business, he said.
New types of data Facebook may collect include “did your cursor hover over that ad … and was the newsfeed in a viewable area,” Mr. Rudin said. “It is a never-ending phase. I can’t promise that it will roll out. We probably will know in a couple of months,” said Mr. Rudin, a Silicon Valley veteran who arrived at Facebook in April 2012 from Zynga Inc., where he was vice president of analytics and platform technologies. […]
Facebook also is a major user of Hadoop, an open-source framework that is used to store large amounts of data on clusters of inexpensive machines. Facebook designs its own hardware to store its massive data analytics warehouse, which has grown 4,000 times during the last four years to a current level of 300 petabytes. The company uses a modified version of Hadoop to manage its data, according to Mr. Rudin. There are additional software layers on top of Hadoop, which rank the value of data and make sure it is accessible.
The data in the analytics warehouse—which is separate from the company’s user data, the volume of which has not been disclosed—is used in the targeting of advertising. As the company captures more data, it can help marketers target their advertising more effectively—assuming, of course, that the data is accessible.