Guardian UK: NHS data provided to researchers in an ‘anonymous’ form is often easy to link to the patients concerned
We’ve discussed before the problems that can arise from so-called “anonymized data.” Data believed to have been made anonymous can often be easily “de-anonymized,” allowing sensitive data to be linked back to the affected individual.
One of the most-publicized examples of reidentification of anonymized data occurred in 2006, when AOL published the search records of 658,000 Americans. The release demonstrated that storing a number instead of a name or address does not necessarily prevent search data from being linked back to an individual. Though the search logs released by AOL had been anonymized, identifying each user by only a number, New York Times reporters were quickly able to match some user numbers with the correct individuals. User No. 4417749 “conducted hundreds of searches over a three-month period on topics ranging from ‘numb fingers’ to ‘60 single men’ to ‘dog that urinates on everything.’” A short investigation led Times reporters to “Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga.,” who has three dogs.
Now, the Guardian discusses privacy concerns about National Health Service data in the UK being provided to researchers:
Under the current legal system medical records can only be accessed with either (a) the explicit consent of the patient; (b) by special permission from the National Information Governance Board (NIGB, a body established to authorise access of identifiable data without consent); or, crucially, (c) if the information has been pseudonymised (or key coded) – more detail in a letter written to the British Medical Journal by myself and colleagues. This makes it important to clarify what is meant by pseudonymisation as other safeguards are not triggered.
Pseudonymisation is achieved by removing identifiers such as a person’s name and first line of their address, and replacing them with a unique identifying number. Whilst this method might prevent immediate identification it does not make re-identification impossible or even difficult. Furthermore, for some research, identifiers may be desirable to facilitate accurate linkage between data systems.
Inadequate measures to anonymise data means that, in data protection law, the data remain identifiable, and thus as ‘personal data’ are subject to UK and European data protection rules which emphasise the need for individual consent. […]
The steps currently taken by the NHS to anonymise patient data are inadequate and do not move the data out the scope of data protection laws.
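To see why replacing a name with a number leaves data identifiable, here is a minimal sketch of the pseudonymisation step the Guardian piece describes, followed by a simple linkage attack. Everything in it is hypothetical: the field names (`name`, `address_line1`, `postcode`, `dob`, `sex`, `diagnosis`), the fictitious people, and the "public register" that lists names alongside postcode, date of birth and sex. It illustrates the general technique, not the NHS's actual procedure.

```python
import itertools

# Fictitious patient records (hypothetical field names, invented people).
patients = [
    {"name": "Alice Smith", "address_line1": "1 High Street", "postcode": "AB1 2CD",
     "dob": "1950-03-14", "sex": "F", "diagnosis": "diabetes"},
    {"name": "Brian Jones", "address_line1": "22 Mill Lane", "postcode": "AB1 2CD",
     "dob": "1962-07-01", "sex": "M", "diagnosis": "depression"},
    {"name": "Carol White", "address_line1": "5 Church Road", "postcode": "EF3 4GH",
     "dob": "1975-11-30", "sex": "F", "diagnosis": "asthma"},
]

# Hypothetical public dataset an attacker could obtain (no medical data).
register = [
    {"name": "Alice Smith", "postcode": "AB1 2CD", "dob": "1950-03-14", "sex": "F"},
    {"name": "Brian Jones", "postcode": "AB1 2CD", "dob": "1962-07-01", "sex": "M"},
    {"name": "Carol White", "postcode": "EF3 4GH", "dob": "1975-11-30", "sex": "F"},
    {"name": "Diane Black", "postcode": "EF3 4GH", "dob": "1975-11-30", "sex": "F"},
]

DIRECT_IDENTIFIERS = ("name", "address_line1")
QUASI_IDENTIFIERS = ("postcode", "dob", "sex")


def pseudonymise(records):
    """Drop direct identifiers and substitute a sequential ID.

    Returns the released records and the key mapping each ID back to a name.
    """
    counter = itertools.count(1)
    released, key = [], {}
    for rec in records:
        pid = next(counter)
        key[pid] = rec["name"]
        cleaned = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        cleaned["pid"] = pid
        released.append(cleaned)
    return released, key


def reidentify(released, public_register):
    """Join released records to a register on the quasi-identifiers.

    Returns {pid: name} for every record matching exactly one register entry.
    """
    index = {}
    for person in public_register:
        qi = tuple(person[k] for k in QUASI_IDENTIFIERS)
        index.setdefault(qi, []).append(person["name"])
    matches = {}
    for rec in released:
        names = index.get(tuple(rec[k] for k in QUASI_IDENTIFIERS), [])
        if len(names) == 1:
            matches[rec["pid"]] = names[0]
    return matches


released, _key = pseudonymise(patients)
for pid, name in reidentify(released, register).items():
    diagnosis = next(r["diagnosis"] for r in released if r["pid"] == pid)
    print(f"pid {pid} -> {name} ({diagnosis})")
```

The attacker never sees the key, yet two of the three "pseudonymised" records are tied back to a name and a diagnosis. The one record whose postcode, date of birth and sex match two register entries stays unresolved, which shows that it is the uniqueness of the remaining fields, not the presence of a name, that determines whether a record can be re-identified.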
For more on reidentification of “de-identified” data, see privacy consultant Bob Gellman’s recent article in the Fordham Intellectual Property Media and Entertainment Law Journal, “The Deidentification Dilemma: A Legislative and Contractual Proposal” (Fordham pdf; archive pdf).