    National Research Council Report Finds Data-Mining Programs Don’t Really Work

    “Automated identification of terrorists through data mining (or any other known methodology) is neither feasible as an objective nor desirable as a goal of technology development efforts,” says a report on data mining from the National Research Council. “[E]ven in well-managed programs such tools are likely to return significant rates of false positives, especially if the tools are highly automated.” 

    The remarks are part of a new report, “Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment,” which was sponsored by the Department of Homeland Security and the National Science Foundation. The authors held a panel discussion on the report’s findings. The speakers were committee co-chairman Charles Vest, president of the National Academy of Engineering; co-chairman and former defense secretary William Perry; and Indiana University professor Fred Cate.

    The report sets out a framework for evaluating data-mining programs, which the committee hopes will be used for both classified and unclassified programs domestically and globally. (Read about a notorious data-mining program: Total Information Awareness.) In short, evaluators must determine whether the program works, how invasive it is, and whether the benefits are worth the costs. The authors believe a new framework is necessary because current laws are not adequate for protecting privacy in the digital age.

    This is the essence of the information age — it provides us with convenience, choice, efficiency, knowledge, and entertainment; it supports education, health care, safety, and scientific discovery. Everyone leaves personal digital tracks in these systems whenever he or she makes a purchase, takes a trip, uses a bank account, makes a phone call, walks past a security camera, obtains a prescription, sends or receives a package, files income tax forms, applies for a loan, e‑mails a friend, sends a fax, rents a video, or engages in just about any other activity. The proliferation of security cameras and means of tagging and tracking people and objects increases the scope and nature of available data. Law-abiding citizens leave extensive digital tracks, and so do criminals and terrorists.

    The panelists were careful to note that they did not evaluate specific programs and they did not review classified programs. They said they wanted to ensure that the entire report could remain unclassified. 

    Some important points from the panel discussion:

    • More data does not mean better results; the panelists emphasized the quality of the data over its quantity.
    • Any program that attempts to assess an individual’s state of mind should be considered suspect. Such behavioral detection programs should be a tool for further investigation, but not determinative of intent, because there is a high probability of false positives.
    • To protect privacy and prevent mission creep, there should be external oversight of programs no matter how much internal oversight there is.

    The committee’s findings are nothing new to civil liberties activists and experts who have highlighted problems with data-mining programs searching for terrorists. 

    What I told the Washington Times a couple of years ago:

    Melissa Ngo, director of the identification and surveillance project for the Electronic Privacy Information Center, says the technology overwhelms law-enforcement officials with too much information that cannot be verified.

    “This is not a thorough investigative technique, but searching for a needle in a haystack when you don’t even know what the needle looks like. We will always support thorough investigations based on real evidence, not rumors, guesses or fishing expeditions,” she said.

    Security guru Bruce Schneier has written a couple of good essays on the subject. His argument:

    Data mining works best when you’re searching for a well-defined profile, a reasonable number of attacks per year and a low cost of false alarms. Credit-card fraud is one of data mining’s success stories: all credit-card companies mine their transaction databases for data for spending patterns that indicate a stolen card. […]

    Terrorist plots are different. There is no well-defined profile and attacks are very rare. Taken together, these facts mean that data-mining systems won’t uncover any terrorist plots until they are very accurate, and that even very accurate systems will be so flooded with false alarms that they will be useless.

    All data-mining systems fail in two different ways: false positives and false negatives. A false positive is when the system identifies a terrorist plot that really isn’t one. A false negative is when the system misses an actual terrorist plot. Depending on how you “tune” your detection algorithms, you can err on one side or the other: you can increase the number of false positives to ensure you are less likely to miss an actual terrorist plot, or you can reduce the number of false positives at the expense of missing terrorist plots.

    To reduce both those numbers, you need a well-defined profile. And that’s a problem when it comes to terrorism. In hindsight, it was really easy to connect the 9/11 dots and point to the warning signs, but it’s much harder before the fact. Certainly, many terrorist plots share common warning signs, but each is unique, as well. The better you can define what you’re looking for, the better your results will be. Data mining for terrorist plots will be sloppy, and it’ll be hard to find anything useful.
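
    Schneier's point about rare attacks and false alarms can be made concrete with a little arithmetic. Below is a minimal sketch in Python. The population size, the number of real plotters, and the accuracy figures are made-up assumptions for illustration only; they do not come from the report or from Schneier's essays, and the function name is hypothetical.

```python
def expected_alerts(population, true_cases, sensitivity, false_positive_rate):
    """Return (true_alerts, false_alerts, precision) as expected values.

    sensitivity: fraction of real cases the system flags.
    false_positive_rate: fraction of innocent people the system flags.
    precision: share of all alerts that point at a real case.
    """
    innocents = population - true_cases
    true_alerts = true_cases * sensitivity
    false_alerts = innocents * false_positive_rate
    total_alerts = true_alerts + false_alerts
    precision = true_alerts / total_alerts if total_alerts else 0.0
    return true_alerts, false_alerts, precision


# Hypothetical inputs: 300 million people, 1,000 real plotters,
# and a system that is "99 percent accurate" in both directions.
true_alerts, false_alerts, precision = expected_alerts(
    population=300_000_000,
    true_cases=1_000,
    sensitivity=0.99,
    false_positive_rate=0.01,
)

print(f"Real plotters flagged:   {true_alerts:,.0f}")
print(f"Innocent people flagged: {false_alerts:,.0f}")
print(f"Share of alerts that are real: {precision:.4%}")
```

    Under these assumed inputs, about 3 million innocent people are flagged against fewer than 1,000 real hits, so only a tiny fraction of alerts point at an actual plotter. Lowering the false positive rate shrinks the pile of false alarms, but as Schneier notes, tuning a system that way usually means missing more real plots.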

    The National Research Council report also discussed the problems that can come from false positives:

    Because the data being analyzed are primarily about ordinary, law-abiding citizens and businesses, false positives can result in invasion of their privacy. Such intrusions raise valid concerns about the misuse and abuse of data, about the accuracy of data and the manner in which the data are aggregated, and about the possibility that the government could, through its collection and analysis of data, inappropriately influence individuals’ conduct.

    More coverage of the report is available at the New York Times, Wired, and CNet News.
