Sharing data between multiple institutions offers better regulation and public protection
The recent loss of sensitive data on 25 million people by the government of the United Kingdom is just the latest in a series of events covering a wide variety of institutions in different countries.1 2 Media coverage has tended to focus on the obvious aspects of the unintended release of personal data, such as the disks that were lost, and on omissions in procedure, such as the failure to encrypt.3 The most important question in all of these scandals, however, is how a single failure or lapse in procedure could result in a catastrophic disclosure.
The consequences of these security failures are exacerbated by our increasing tendency to centralise large and detailed data from multiple sources, and the existence of policies and legislation that enable the sharing of data between organisations. In health care, the collection and storage of sensitive personal data is essential for delivering a high quality clinical service and for research.4 Indeed, the future function of the National Health Service (NHS) depends on it.5
The fundamental problem is the simplistic approach often taken to define and regulate the access of users to data. Privilege management and access control refer to the policies and systems in place to specify what users are allowed to do, including what modifications, exports, or onward communications may be performed.6 Once the system has given a user access to certain data, the only protection against misuse is usually a set of documented procedures that specify how users should behave in given circumstances. The same rules apply whether the data come from a single system, a data warehouse, or a group of systems logically connected to appear as one (database federation). These written documents, often called standard operating procedures (SOPs), can prevent inadvertent disclosure of data only if staff are trained to use them consistently; if users do not have malicious intent, are competent, and don’t make mistakes; and if the author of the SOP has planned for all scenarios relating to data access and sharing. Unfortunately, evidence shows that it is difficult for all of these conditions to be fulfilled.7
Access control is one of a range of security measures; other examples are encryption and authentication. Authentication confirms the identity of the user as opposed to determining what he or she is permitted to do, whereas encryption prevents eavesdropping. Neither improves the quality of access control.
A common feature of the methods used to allow the legitimate release of data is that usually only one authorised user extracts and communicates the requested information, without supervision or cross checking. This is why SOPs are important but also insufficient, because they form a closed opaque system. Many features of an SOP can be computerised, and we urgently need better technical measures to enforce and verify procedures that represent good practice. When properly implemented, this can provide transparency, counter conflicts of interest, and enforce agreed procedures.
The dispersal of identifiable data between institutions has advantages—ranging from improved security to local control—but it does require a multi-institutional policy and a mechanism for the construction of combined datasets.8 General practitioners concerned about the NHS Care Record Service might be more comfortable with the concept of local record systems, which need additional authorisation before aggregation at a national level.9
Providing such additional authorisation has logistic and workload implications, but it is possible. Transparency can be provided by multiple intermediaries who are independent from those who hold and receive the data.10 Under such a framework, a single failure or lapse in procedure cannot result in release of data.
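As a rough illustration of how such a framework might be enforced technically, the Python sketch below refuses to release a dataset until a minimum number of intermediaries, none of whom is the data holder or the recipient, have approved the request. The names ReleaseRequest and ReleaseGate, the threshold of two, and the example parties are all hypothetical, chosen for illustration; this is a minimal sketch under those assumptions, not a description of any deployed system.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseRequest:
    """A request to move a dataset from a holder to a recipient (hypothetical)."""
    dataset: str
    holder: str
    recipient: str
    approvals: set = field(default_factory=set)


class ReleaseGate:
    """Releases data only after several independent intermediaries approve."""

    def __init__(self, intermediaries, required=2):
        self.intermediaries = set(intermediaries)
        if required < 2:
            # A single approver would reintroduce the single point of failure.
            raise ValueError("at least two independent approvals are needed")
        if required > len(self.intermediaries):
            raise ValueError("not enough intermediaries to meet the threshold")
        self.required = required

    def approve(self, request, intermediary):
        if intermediary not in self.intermediaries:
            raise PermissionError(f"{intermediary} is not a recognised intermediary")
        # Independence: those who hold or receive the data cannot approve.
        if intermediary in (request.holder, request.recipient):
            raise PermissionError(f"{intermediary} is a party to this request")
        request.approvals.add(intermediary)

    def release(self, request, data):
        if len(request.approvals) < self.required:
            raise PermissionError(
                f"{len(request.approvals)} of {self.required} approvals recorded"
            )
        return data
```

With two approvals required, one approver acting alone, whether through error or intent, is not enough to move the data; recording who approved each request would also give the audit trail that transparency depends on.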
The use of unique individual identifiers is essential when sharing bulk data. Different institutions often use different identifiers and this contributes to the problem of sharing data. To share person specific data—for example, in a post-genomic research project that links drug prescribing to genetic data—we must either share unique identifiers or understand the relation between the different sets of identifiers used.11 This process is known as linkage, and the purpose of the new NHS number and the proposed national ID card is to make this accurate and efficient. The best way of achieving linkage is for an independent intermediary to anonymise the data and provide a new common identifier that links the different records for the same person, but which is not the same as any “real” identifier used in the NHS. This process, known as pseudonymisation, permits authorised users to combine the anonymised data on individuals that they have been given. It does not permit linkage to any other (identifiable) clinical data that they might gain access to.
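The pseudonymisation step described above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the intermediary already holds a linkage table relating each institution's local identifier to the same person, and that it derives the common identifier with a keyed hash (HMAC-SHA256) whose key never leaves the intermediary. The class name, field names, and identifiers are hypothetical, and a keyed hash is one possible choice of technique rather than the method any NHS system is known to use.

```python
import hashlib
import hmac


class Intermediary:
    """Independent party that replaces real identifiers with pseudonyms."""

    def __init__(self, secret_key, linkage):
        self._key = secret_key    # known only to the intermediary
        self._linkage = linkage   # (institution, local_id) -> internal person key

    def _pseudonym(self, institution, local_id):
        person = self._linkage[(institution, local_id)]
        digest = hmac.new(self._key, person.encode(), hashlib.sha256)
        return digest.hexdigest()[:16]

    def pseudonymise(self, institution, records, id_field):
        """Return copies of the records with the real identifier removed."""
        out = []
        for rec in records:
            clean = {k: v for k, v in rec.items() if k != id_field}
            clean["pseudonym"] = self._pseudonym(institution, rec[id_field])
            out.append(clean)
        return out


def combine(*datasets):
    """Merge pseudonymised records that share the same pseudonym."""
    merged = {}
    for dataset in datasets:
        for rec in dataset:
            merged.setdefault(rec["pseudonym"], {}).update(rec)
    return merged


# Hypothetical example: a GP system and a genetics laboratory use
# different identifiers for the same person.
linkage = {
    ("gp", "G-17"): "person-1",
    ("lab", "L-904"): "person-1",
    ("gp", "G-22"): "person-2",
}
intermediary = Intermediary(b"key-held-only-by-the-intermediary", linkage)

gp_data = intermediary.pseudonymise(
    "gp",
    [{"gp_id": "G-17", "drug": "warfarin"}, {"gp_id": "G-22", "drug": "statin"}],
    id_field="gp_id",
)
lab_data = intermediary.pseudonymise(
    "lab", [{"lab_id": "L-904", "genotype": "CYP2C9*3"}], id_field="lab_id"
)
linked = combine(gp_data, lab_data)
```

The researcher receives only gp_data and lab_data, and can join them on the pseudonym. Without the intermediary's key and linkage table, the pseudonym cannot be matched to an identifier in any other clinical record.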
The ID card is the subject of much debate, and whether it protects or harms depends on the environment into which it is introduced.12 In a world of many institutions sharing data behind closed doors it could be harmful. But in a world in which data are shared in an open, transparent, and well regulated environment it is an essential way to preserve privacy. In a multi-institutional mechanism, the dispersal of data provides safeguards for both privacy and security, while combining these data offers independent regulation and technical safeguards to control who receives such data and to limit the ability to identify such data.
Competing interests: None declared.
Provenance and peer review: Commissioned based on an idea from the author; not externally peer reviewed.