We explore relationships between health information seeking activities and engagement with healthcare professionals via a privacy-sensitive analysis of geo-tagged data from mobile devices.
Materials and methods
We analyze logs of mobile interaction data stripped of individually identifiable information and location data. The data analyzed consist of time-stamped search queries and distances to medical care centers. We examine search activity that precedes the observation of salient evidence of healthcare utilization (EHU) (ie, data suggesting that the searcher is using healthcare resources), in our case taken as queries occurring at or near medical facilities.
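The EHU definition above can be sketched as follows. The sketch assumes each logged query carries a timestamp, a precomputed distance to the nearest medical facility, and a flag marking it as a symptom search. The `Query` fields, the 0.1 km radius, and the symptom flag are hypothetical illustrations; the study's actual thresholds and symptom classifier are not given here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

EHU_RADIUS_KM = 0.1  # hypothetical "at or near a facility" threshold


@dataclass
class Query:
    time: datetime
    text: str
    km_to_facility: float  # distance to the nearest medical care center
    is_symptom: bool       # hypothetical output of a symptom-query classifier


def first_ehu(queries: List[Query]) -> Optional[Query]:
    """Earliest query issued at or near a medical facility (salient EHU)."""
    for q in sorted(queries, key=lambda q: q.time):
        if q.km_to_facility <= EHU_RADIUS_KM:
            return q
    return None


def symptom_to_ehu_lag(queries: List[Query]) -> Optional[timedelta]:
    """Time from the first symptom search to the first later EHU, if any."""
    ordered = sorted(queries, key=lambda q: q.time)
    first_symptom = next((q for q in ordered if q.is_symptom), None)
    if first_symptom is None:
        return None
    ehu = first_ehu([q for q in ordered if q.time > first_symptom.time])
    return ehu.time - first_symptom.time if ehu is not None else None


log = [
    Query(datetime(2013, 5, 1, 9, 0), "chest pain left arm", 12.0, True),
    Query(datetime(2013, 5, 1, 9, 30), "weather today", 8.0, False),
    Query(datetime(2013, 5, 1, 11, 0), "hospital parking", 0.05, False),
]
print(symptom_to_ehu_lag(log))  # 2:00:00
```

Grouping such lags by the acuity of the symptom searched would give the kind of comparison reported below.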
We show that the time between symptom searches and the observation of salient EHU depends on the acuity of symptoms. We construct statistical models that predict forthcoming EHU based on observations about the current search session, prior medical search activities, and prior EHU. The predictive accuracy of the models varies (65%–90%) depending on the features used and the timeframe of the analysis, which we explore via a sensitivity analysis.
We provide a privacy-sensitive analysis that can be used to generate insights about the pursuit of health information and healthcare. The findings demonstrate how large-scale studies of mobile devices can provide insights on how concerns about symptomatology lead to the pursuit of professional care.
We present new methods for the analysis of mobile logs and describe a study that provides evidence about how people transition from mobile searches on symptoms and diseases to the pursuit of healthcare in the world.
Healthcare utilization; privacy-sensitive analytics; search log analysis; mobile health; predictive modeling; information retrieval; human computer interaction; web search; user studies
To identify key principles for establishing a national clinical decision support (CDS) knowledge sharing framework.
Materials and methods
As part of an initiative by the US Office of the National Coordinator for Health IT (ONC) to establish a framework for national CDS knowledge sharing, key stakeholders were identified. Stakeholders' viewpoints were obtained through surveys and in-depth interviews, and findings and relevant insights were summarized. Based on these insights, key principles were formulated for establishing a national CDS knowledge sharing framework.
Nineteen key stakeholders were recruited, including six executives from electronic health record system vendors, seven executives from knowledge content producers, three executives from healthcare provider organizations, and three additional experts in clinical informatics. Based on these stakeholders' insights, five key principles were identified for effectively sharing CDS knowledge nationally. These principles are (1) prioritize and support the creation and maintenance of a national CDS knowledge sharing framework; (2) facilitate the development of high-value content and tooling, preferably in an open-source manner; (3) accelerate the development or licensing of required, pragmatic standards; (4) acknowledge and address medicolegal liability concerns; and (5) establish a self-sustaining business model.
Based on the principles identified, a roadmap for national CDS knowledge sharing was developed through the ONC's Advancing CDS initiative.
The study findings may serve as a useful guide for ongoing activities by the ONC and others to establish a national framework for sharing CDS knowledge and improving clinical care.
Knowledge management; clinical decision support systems; office of the national coordinator for health information technology; meaningful use; standards
Securing protected health information is a critical responsibility of every healthcare organization. We explore information security practices and identify practice patterns that are associated with improved regulatory compliance.
We applied Ward's minimum-variance hierarchical cluster analysis to the organizations' adoption of security practices. Differences between organizations were measured using dichotomous data indicating the presence or absence of each security practice. We then used t tests to examine the relationships between the resulting clusters of security practices and regulatory compliance.
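A minimal sketch of Ward's minimum-variance clustering on 0/1 practice-adoption vectors is shown below. At each step it merges the pair of clusters whose union causes the smallest increase in within-cluster sum of squares. This is a naive illustration, not the authors' software. The toy data are hypothetical, and a statistics package would normally be used for the 250 surveyed organizations.

```python
from itertools import combinations


def centroid(rows):
    """Column means of a list of equal-length 0/1 vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq


def ward_cluster(data, k):
    """Agglomerate rows of `data` into k clusters; returns lists of row indices."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost([data[r] for r in clusters[p[0]]],
                                    [data[r] for r in clusters[p[1]]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical organizations x practices (1 = practice adopted)
practices = [
    [1, 1, 1, 1], [1, 1, 1, 1],  # broad adopters
    [1, 1, 0, 0], [1, 1, 0, 0],  # partial adopters
    [0, 0, 0, 0], [0, 0, 0, 0],  # minimal adopters
]
print(ward_cluster(practices, 3))
```

Comparing compliance scores across the returned groups (for example with t tests) would be a separate step.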
We utilized the results from the Kroll/Healthcare Information and Management Systems Society telephone-based survey of 250 US healthcare organizations including adoption status of security practices, breach incidents, and perceived compliance levels on Health Information Technology for Economic and Clinical Health, Health Insurance Portability and Accountability Act, Red Flags rules, Centers for Medicare and Medicaid Services, and state laws governing patient information security.
Our analysis identified three clusters (which we call leaders, followers, and laggers) based on the variance of security practice patterns. The clusters differ significantly in non-technical rather than technical practices, and the highest level of compliance was associated with hospitals that employed a balanced approach between technical and non-technical practices (or between one-off and cultural practices).
Hospitals with the highest level of compliance placed significantly greater emphasis on managing third parties' breaches and on training. Audit practices were important to those who scored in the middle of the pack on compliance. Our results provide security practice benchmarks for healthcare administrators and can help policy makers develop strategic and practical guidelines for practice adoption.
Data from electronic healthcare records (EHR) can be used to monitor drug safety, but to compare and pool data from different EHR databases, the extraction of potential adverse events must be harmonized. In this paper, we describe the procedure used to harmonize the extraction, from eight European EHR databases, of five events of interest deemed important in pharmacovigilance: acute myocardial infarction (AMI); acute renal failure (ARF); anaphylactic shock (AS); bullous eruption (BE); and rhabdomyolysis (RHABD).
The participating databases comprise general practitioners’ medical records and claims for hospitalization and other healthcare services. Clinical information is collected using four different disease terminologies and free text in two different languages. The Unified Medical Language System was used to identify concepts and corresponding codes in each terminology. A common database model was used to share and pool data and verify the semantic basis of the event extraction queries. Feedback from the database holders was obtained at various stages to refine the extraction queries.
Standardized and age-specific incidence rates (IRs) were calculated to facilitate benchmarking and harmonization of event data extraction across the databases. This was an iterative process.
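Direct age standardization, which is one common way to compute such benchmarks, can be sketched as follows. Age-specific rates per 100 000 person-years are weighted by the age structure of a standard population. The age bands, counts, and standard-population weights below are hypothetical, and the standard population actually used in the study is not stated here.

```python
def age_specific_rates(events, person_years, per=100_000):
    """Rate per `per` person-years in each age band."""
    return {band: per * events[band] / person_years[band] for band in events}


def directly_standardized_rate(events, person_years, std_pop, per=100_000):
    """Weight age-specific rates by a standard population's age structure."""
    rates = age_specific_rates(events, person_years, per)
    total = sum(std_pop.values())
    return sum(rates[band] * std_pop[band] / total for band in std_pop)


# Hypothetical single-database counts for one event of interest
events = {"0-39": 10, "40-64": 60, "65+": 120}
person_years = {"0-39": 1_000_000, "40-64": 500_000, "65+": 200_000}
std_pop = {"0-39": 50, "40-64": 35, "65+": 15}  # hypothetical weights

print(directly_standardized_rate(events, person_years, std_pop))  # about 13.7
```

Because every database is weighted to the same standard population, differences that remain after standardization point to coding or extraction differences rather than to age structure.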
The study population comprised 19 647 445 individuals overall, with a follow-up of 59 929 690 person-years (PYs). Age-adjusted IRs for the five events of interest across the databases were as follows: (1) AMI: 60–148/100 000 PYs; (2) ARF: 3–49/100 000 PYs; (3) AS: 2–12/100 000 PYs; (4) BE: 2–17/100 000 PYs; and (5) RHABD: 0.1–8/100 000 PYs.
The iterative harmonization process enabled a more homogeneous identification of events across differently structured databases using different coding based algorithms. This workflow can facilitate transparent and reproducible event extractions and understanding of differences between databases.
To determine what, if any, opportunity exists in using administrative medical claims data for supplemental reporting to the state infectious disease registry system.
Materials and methods
Cases of five tick-borne diseases (Lyme disease (LD), babesiosis, ehrlichiosis, Rocky Mountain spotted fever (RMSF), tularemia) and two mosquito-borne diseases (West Nile virus, La Crosse viral encephalitis) reported to the Tennessee Department of Health during 2000–2009 were selected for study. Medically diagnosed cases for the same time period were also extracted from the claims data warehouse of a Tennessee-based managed care organization (MCO). MCO and Tennessee Department of Health incidence rates were compared using a randomized complete block design within a general linear mixed model to measure potential supplemental reporting opportunity.
MCO LD incidence was 7.7 times higher (p<0.001) than that reported to the state, possibly indicating significant under-reporting (∼196 unreported cases per year). MCO data also suggest about 33 cases of RMSF go unreported each year in Tennessee (p<0.001). Three cases of babesiosis were discovered using claims data, a significant finding as this disease was only recently confirmed in Tennessee.
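The arithmetic behind a comparison such as "N times higher" and "about X unreported cases per year" can be sketched as a crude incidence rate ratio. It also gives the excess cases implied if the claims-based rate held for the whole population. All counts below are hypothetical. This crude calculation is not the study's general linear mixed model, which also accounts for blocking.

```python
def incidence_rate(cases, person_years, per=100_000):
    """Cases per `per` person-years."""
    return per * cases / person_years


def rate_ratio(cases_a, py_a, cases_b, py_b):
    """Crude incidence rate ratio of source a relative to source b."""
    return incidence_rate(cases_a, py_a) / incidence_rate(cases_b, py_b)


def expected_unreported(rate_claims, rate_registry, population, per=100_000):
    """Excess annual cases implied if the claims rate applied population-wide."""
    return (rate_claims - rate_registry) * population / per


# Hypothetical annual figures
claims_rate = incidence_rate(30, 1_000_000)     # 3.0 per 100 000
registry_rate = incidence_rate(50, 5_000_000)   # 1.0 per 100 000
print(rate_ratio(30, 1_000_000, 50, 5_000_000))               # 3.0
print(expected_unreported(claims_rate, registry_rate, 6_000_000))  # 120.0
```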
Data sharing between MCOs and health departments for vaccine information already exists (eg, the Vaccine Safety Datalink Rapid Cycle Analysis project). There may be a significant opportunity in Tennessee to supplement the current passive infectious disease reporting system with administrative claims data, particularly for LD and RMSF.
There are limitations with administrative claims data, but health plans may help bridge data gaps and support the federal administration's vision of combining public and private data into one source.
Administrative medical claims data; zoonotic diseases; tick-borne; mosquito-borne; notifiable diseases; GIS; spatial epidemiology; wetlands; ecology
It has been claimed that most research findings are false, and it is known that large-scale studies involving omics data are especially prone to errors in design, execution, and analysis. The situation is alarming because taxpayer dollars fund a substantial amount of biomedical research, and because the publication of a research article that is later determined to be flawed can erode the credibility of an entire field, resulting in a severe and negative impact for years to come. Here, we urge the development of an online, open-access, postpublication, peer review system that will increase the accountability of scientists for the quality of their research and the ability of readers to distinguish good from sloppy science.
peer review; omics; high-dimensional; transparency; reproducible research
Population Data BC (PopData) is an innovative leader in facilitating access to linked data for population health research. Researchers from academic institutions across Canada work with PopData to submit data access requests for projects involving linked administrative data, with or without their own researcher-collected data. PopData and its predecessor—the British Columbia Linked Health Database—have facilitated over 350 research projects analyzing a broad spectrum of population health issues. PopData embeds privacy in every aspect of its operations. This case study focuses on how implementing the Privacy by Design model protects privacy while supporting access to individual-level data for research in the public interest. It explores challenges presented by legislation, stewardship, and public perception and demonstrates how PopData achieves both operational efficiencies and due diligence.
Ensuring the security and appropriate use of patient health information contained within electronic medical records systems is challenging. To address these difficulties, we present an extension to the explanation-based auditing system (EBAS) that attempts to determine the clinical or operational reason for each access to a medical record, based on patient diagnosis information. Accesses that can be explained with a reason are filtered out so that the compliance officer has fewer suspicious accesses to review manually.
Our hypothesis is that specific hospital employees are responsible for treating a given diagnosis. For example, Dr Carl accessed Alice's medical record because Hem/Onc employees are responsible for chemotherapy patients. We present metrics to determine which employees are responsible for a diagnosis and quantify their confidence. The auditing system attempts to use this responsibility information to determine the reason why an access occurred. We evaluate the auditing system's classification quality using data from the University of Michigan Health System.
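The idea of a responsibility score can be illustrated with a deliberately simple, hypothetical metric. It is the fraction of patients with diagnosis d whose records a given department accessed. An access is then treated as explained when the accessing department's score for one of the patient's diagnoses meets a threshold. This is not the EBAS metric itself, and the 0.5 threshold is an assumption. In practice the scores would be estimated from a historical access log rather than from the accesses being audited.

```python
from collections import defaultdict


def responsibility(accesses, diagnoses):
    """Hypothetical metric: share of patients with diagnosis d that dept accessed.

    accesses: list of (department, patient) pairs
    diagnoses: dict patient -> list of diagnoses
    """
    patients_with = defaultdict(set)  # diagnosis -> patients
    for patient, dxs in diagnoses.items():
        for d in dxs:
            patients_with[d].add(patient)
    touched = defaultdict(set)  # (dept, diagnosis) -> patients accessed
    for dept, patient in accesses:
        for d in diagnoses.get(patient, []):
            touched[(dept, d)].add(patient)
    return {key: len(pts) / len(patients_with[key[1]]) for key, pts in touched.items()}


def explain(access, diagnoses, scores, threshold=0.5):
    """Diagnosis that explains an access, or None if it remains suspicious."""
    dept, patient = access
    best = max(diagnoses.get(patient, []),
               key=lambda d: scores.get((dept, d), 0.0), default=None)
    if best is not None and scores.get((dept, best), 0.0) >= threshold:
        return best
    return None


diagnoses = {"alice": ["chemotherapy"], "bob": ["chemotherapy"],
             "carol": ["fracture"], "dave": ["fracture"], "erin": ["fracture"]}
accesses = [("Hem/Onc", "alice"), ("Hem/Onc", "bob"), ("Ortho", "carol"),
            ("Ortho", "dave"), ("Ortho", "erin"), ("Hem/Onc", "carol")]
scores = responsibility(accesses, diagnoses)
print(explain(("Hem/Onc", "alice"), diagnoses, scores))  # chemotherapy
print(explain(("Hem/Onc", "carol"), diagnoses, scores))  # None
```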
The EBAS correctly determines which departments are responsible for a given diagnosis. Adding this responsibility information to the EBAS increases the number of first accesses explained by a factor of two over previous work and explains over 94% of all accesses with high precision.
The EBAS serves as a complementary security tool for personal health information. It filters a majority of accesses such that it is more feasible for a compliance officer to review the remaining suspicious accesses manually.
Electronic Medical Records; Security
De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents.
Materials and methods
We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible.
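The two competing goals can be illustrated with a toy two-step pipeline. The first step flags every span that might be PHI, favoring sensitivity. The second step releases spans that merely look like PHI but carry clinical meaning, such as eponyms. The regular expressions and the eponym list are hypothetical, and this sketch does not reproduce BoB's actual components. It also assumes candidate spans do not overlap.

```python
import re

# Step 1 patterns (hypothetical); NAME is deliberately over-inclusive.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("NAME", re.compile(r"\b[A-Z][a-z]+\b")),
]
# Step 2 (hypothetical): eponyms that look like names but are clinical terms.
CLINICAL_TERMS = {"Parkinson", "Hodgkin", "Foley"}


def find_candidates(text):
    """Step 1: flag every span that might be PHI (maximize redaction)."""
    return [(m.start(), m.end(), label)
            for label, pattern in PATTERNS for m in pattern.finditer(text)]


def filter_candidates(text, spans):
    """Step 2: keep clinically meaningful terms that merely resemble PHI."""
    return [(s, e, lab) for s, e, lab in spans if text[s:e] not in CLINICAL_TERMS]


def redact(text):
    spans = filter_candidates(text, find_candidates(text))
    # Replace right to left so earlier offsets stay valid.
    for s, e, label in sorted(spans, reverse=True):
        text = text[:s] + f"[{label}]" + text[e:]
    return text


note = "follow-up with Smith on 3/14/2012 for Parkinson disease, call 555-123-4567."
print(redact(note))
# follow-up with [NAME] on [DATE] for Parkinson disease, call [PHONE].
```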
We evaluated BoB on a manually annotated corpus of varied VHA clinical notes and on the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance and token levels, with detailed results for BoB's main components. An existing text de-identification system was also included in our evaluation for comparison.
BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives.
Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
To assess patients’ desire for granular level privacy control over which personal health information should be shared, with whom, and for what purpose; and whether these preferences vary based on sensitivity of health information.
Materials and methods
A card task for matching health information with providers, questionnaire, and interview with 30 patients whose health information is stored in an electronic medical record system. Most patients’ records contained sensitive health information.
No patients reported that they would prefer to share all information stored in an electronic medical record (EMR) with all potential recipients. Sharing preferences varied by type of information (EMR data element) and recipient (eg, primary care provider), and overall sharing preferences varied by participant. Patients with and without sensitive records preferred less sharing of sensitive versus less-sensitive information.
Patients expressed sharing preferences consistent with a desire for granular privacy control over which health information should be shared with whom, and their preferences differed for sensitive versus less-sensitive EMR data. Designers may use these results to build privacy-preserving EMR systems, including interfaces through which patients can express privacy and sharing preferences.
To maintain the level of privacy afforded by medical records and to achieve alignment with patients’ preferences, patients should have granular privacy control over information contained in their EMR.
Privacy of Patient Data; Electronic Medical Record; User Computer Interface
To lower patient re-identification risks in biomedical research databases containing laboratory test results while minimizing changes in clinical data interpretation.
Materials and methods
In our threat model, an attacker obtains 5–7 laboratory results from one patient and uses them as a search key to discover the corresponding record in a de-identified biomedical research database. To test our models, the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients was used. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack, and the re-identification risk and the retention of clinical meaning were assessed for each model's perturbed laboratory results.
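The simple-random-offset arm of this threat model can be simulated roughly as follows. Every released value is perturbed with uniform noise. The attacker ranks released records by L1 distance to the k results they know, and the attack succeeds when the true record ranks first. The sketch assumes every record lists the same tests in the same order, which is a strong simplification. The reference range used to flag changes in clinical meaning is hypothetical, and the expert-derived model is not reproduced.

```python
import random


def perturb(db, max_offset, rng):
    """Simple random offsets: add uniform noise to every laboratory value."""
    return {pid: [v + rng.uniform(-max_offset, max_offset) for v in vals]
            for pid, vals in db.items()}


def rank_attack(released, key):
    """Rank released records by L1 distance to the attacker's known results."""
    def dist(vals):
        return sum(abs(a - b) for a, b in zip(vals, key))  # zip stops at len(key)
    return sorted(released, key=lambda pid: dist(released[pid]))


def reidentification_rate(db, released, k):
    """Fraction of patients whose true record ranks first given k known results."""
    hits = sum(rank_attack(released, vals[:k])[0] == pid for pid, vals in db.items())
    return hits / len(db)


def meaning_changed(orig, pert, low, high):
    """Fraction of values whose in-range/out-of-range status flips."""
    flips = sum((low <= a <= high) != (low <= b <= high) for a, b in zip(orig, pert))
    return flips / len(orig)


rng = random.Random(0)
db = {f"p{i}": [rng.gauss(100, 15) for _ in range(7)] for i in range(200)}
released = perturb(db, max_offset=20.0, rng=random.Random(1))
print(reidentification_rate(db, db, k=5))        # 1.0 with no perturbation
print(reidentification_rate(db, released, k=5))  # lower after perturbation
```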
Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm maintained the clinical meaning of laboratory results better (affecting up to 4% of test results) than simple perturbation (affecting up to 26%).
Discussion and conclusion
With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.
Privacy; information dissemination; laboratory marker; electronic health records; confidentiality/legislation and jurisprudence; biomedical informatics; nlp; data-mining; clinical decision support; error management and prevention; evaluation; monitoring and surveillance; ADEs; ethical study methods; Statistical analysis of large datasets; methods for integration of information from disparate sources; distributed systems; assuring information system security and personal privacy; CDSS; internal medicine
We present SHARE, a new system for statistical health information release with differential privacy, along with two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework to biomedical data.
Materials and Methods
SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE.
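As a point of reference, the textbook Laplace mechanism for a one-dimensional histogram is sketched below. It is far simpler than the multidimensional histogram and longitudinal methods SHARE includes, and it is not SHARE's code. The sketch assumes neighboring datasets differ by adding or removing one record, so each count has L1 sensitivity 1, and it assumes the bins are fixed in advance rather than derived from the data. Clamping negative counts to zero is post-processing and does not weaken the guarantee.

```python
import random
from collections import Counter


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_histogram(records, bins, epsilon, rng=None):
    """Release one noisy count per public bin with epsilon-differential privacy.

    Adding or removing one record changes one count by 1 (L1 sensitivity 1),
    so Laplace noise with scale 1/epsilon is sufficient.
    """
    rng = rng or random.Random()
    counts = Counter(records)
    return {b: max(0.0, counts[b] + laplace(1 / epsilon, rng)) for b in bins}


# Hypothetical age-at-diagnosis bins
bins = ["<50", "50-69", "70+"]
records = ["<50"] * 40 + ["50-69"] * 85 + ["70+"] * 30
release = dp_histogram(records, bins, epsilon=1.0, rng=random.Random(7))
print(release)
```

Smaller values of epsilon give stronger privacy and noisier counts.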
Experimental results indicate that SHARE can handle the heterogeneous data found in medical records and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 for seven-dimensional and below 0.01 for three-dimensional data cubes generated from the SEER dataset. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying differentially private statistical data release to higher-dimensional data.
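The two utility measures can be computed roughly as follows. The KL divergence is computed from the original distribution P to the released distribution Q, using natural logarithms and a small floor for empty released bins. The relative error has a floor to avoid dividing by very small true counts. The direction, log base, and smoothing are assumptions, because the exact definitions used in the evaluation are not stated here.

```python
import math


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) between two histograms over the same bins (natural log)."""
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    kl = 0.0
    for b, pc in p_counts.items():
        p = pc / p_total
        if p > 0:
            q = max(q_counts.get(b, 0.0) / q_total, eps)  # floor empty released bins
            kl += p * math.log(p / q)
    return kl


def relative_error(true_count, noisy_count, floor=1.0):
    """Relative error of a released count, with a floor on the denominator."""
    return abs(noisy_count - true_count) / max(true_count, floor)


print(kl_divergence({"a": 1, "b": 1}, {"a": 3, "b": 1}))  # 0.5 * ln(4/3)
print(relative_error(10, 12))  # 0.2
```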
SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
statistical data release; differential privacy; de-identification; biomedical data privacy
Ethics; web 2.0; qualitative methods; consumer health; patient education; e-health; system implementation and management issues; improving the education and skills training of health professionals; developing/using clinical decision support (other than diagnostic) and guideline systems; systems to support and improve diagnostic accuracy; measuring/improving patient safety and reducing medical errors; qualitative/ethnographic field study; enhancing the conduct of biological/clinical research and trials; classical experimental and quasi-experimental study methods (lab and field); consumer health informatics; ethical study methods; clinical research informatics; nutrition; clinical natural language processing; developing/using computerized provider order entry; other specific ehr applications (results review); medication administration; ethics committees; code of ethics; bioethics
Private data analysis—the useful analysis of confidential data—requires a rigorous and practicable definition of privacy. Differential privacy, an emerging standard, is the subject of intensive investigation in several diverse research communities. We review the definition, explain its motivation, and discuss some of the challenges to bringing this concept to practice.
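For reference, the standard formal statement of (pure) ε-differential privacy is given below; the notation is ours. A randomized mechanism M satisfies ε-differential privacy if, for every pair of datasets D and D' that differ in one individual's record and every set S of possible outputs, the first inequality holds. The second line is the textbook Laplace mechanism, which achieves this guarantee for a numeric query f by adding noise calibrated to f's L1 sensitivity.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]

\mathcal{M}(D) = f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right),
\qquad \Delta f = \max_{D,\,D'} \lVert f(D) - f(D') \rVert_1
```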
differential privacy; private data analysis; privacy; confidentiality
Ascertainment of potential subjects has been a longstanding problem in clinical research. Various methods have been proposed, including using data in electronic health records. However, these methods typically suffer from scaling effects—some methods work well for large cohorts; others work for small cohorts only.
We propose a method that provides a simple identification of pre-research cohorts and relies on data available in most states in the USA: merged public health data sources.
Materials and methods
The Utah Population Database Limited query tool allows users to build complex queries that may span several types of health records, such as cancer registries, inpatient hospital discharges, and death certificates; in addition, these can be combined with family history information. The architectural approach incorporates several coding systems for medical information. It provides a front-end graphical user interface and enables researchers to build and run queries and view aggregate results. Multiple strategies have been incorporated to maintain confidentiality.
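One common confidentiality strategy for tools that return only aggregate results is small-cell suppression. The sketch below runs a query and then masks any count below a disclosure threshold before display. The record fields are hypothetical, and the threshold of 11 is used purely for illustration. The strategies and thresholds actually used by the tool are not specified here.

```python
from collections import Counter


def cohort_counts(records, predicate, group_by):
    """Count records matching a query, grouped by one attribute."""
    return Counter(r[group_by] for r in records if predicate(r))


def suppress_small_cells(counts, threshold=11):
    """Mask aggregate counts below a disclosure threshold (hypothetical rule)."""
    return {k: (v if v >= threshold else f"<{threshold}") for k, v in counts.items()}


# Hypothetical linked records
records = ([{"dx": "spondyloarthritis", "sex": "F"}] * 12
           + [{"dx": "spondyloarthritis", "sex": "M"}] * 3
           + [{"dx": "breast cancer", "sex": "F"}] * 5)
counts = cohort_counts(records, lambda r: r["dx"] == "spondyloarthritis", "sex")
print(suppress_small_cells(counts))  # {'F': 12, 'M': '<11'}
```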
This tool was rapidly adopted; since its release, 241 users representing a wide range of disciplines from 17 institutions have signed the user agreement and used the query tool. Three examples are discussed: pregnancy complications co-occurring with cardiovascular disease; spondyloarthritis; and breast cancer.
Discussion and conclusions
This query tool was designed to provide results as pre-research so that institutional review board approval would not be required. This architecture uses well-described technologies that should be within the reach of most institutions.
medical informatics applications; clinical trials; patient selection; public health informatics; clinical research informatics
Electronic health records (EHR) are becoming more common because of the federal EHR incentive program, which also promotes electronic health information exchange (HIE). To determine whether consumers' attitudes toward EHR and HIE are associated with experience with doctors using EHR, a nationwide random-digit-dial survey was conducted in December 2011. Of 1603 eligible people contacted, 1000 (63%) participated. Most believed EHR and HIE would improve healthcare quality (66% and 79%, respectively). Respondents whose doctor had an EHR were more likely to believe that these technologies would improve quality (for EHR, OR 2.3; for HIE, OR 1.7). However, experience with physicians using EHR was not associated with privacy concerns. In short, consumers whose physicians use EHR were more likely than others to believe that EHR and HIE will improve healthcare, but this experience had no relationship with privacy concerns.
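For readers less familiar with odds ratios, the sketch below computes a crude odds ratio and Woolf 95% confidence interval from a 2×2 table. The counts are hypothetical. The survey's reported ORs may come from adjusted models, so this illustrates only the quantity, not the survey's analysis.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    2x2 table:            believes EHR improves quality   does not
        doctor has EHR              a                         b
        doctor has no EHR           c                         d
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


print(odds_ratio(300, 100, 200, 150))  # OR = 2.25 with its 95% CI
```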
Electronic health records; health information exchange; health information technology; privacy and security; public attitudes and perceptions
To provide a set of high-quality time-series physiologic and event data from anesthetic cases formatted in an easy-to-use structure.
Materials and methods
With ethics committee approval, data from surgical operations under general anesthesia were collected, including physiologic data, drug administrations, events, and clinicians' comments. These data were de-identified, formatted in a combined CSV/XML structure and made publicly available.
Two separate datasets were collected containing physiologic time-series data and time-stamped events for 34 patients. For 20 patients, the data included 400 physiologic signals collected over 20 h, 274 events, and 597 drug administrations. For 14 patients, the data included 23 physiologic signals collected over 69 h, with 286 time stamped comments.
Data reuse potentially saves significant time and financial costs. However, there are few high-quality repositories of accessible physiologic data and clinical interventions from surgical cases. De-identifying records helps overcome privacy problems, and storing the data in a format that is easily manipulated with computing resources facilitates access by the wider research community. It is hoped that additional high-quality data will be added. Future work includes developing tools to explore and visualize the data more efficiently and establishing quality control measures.
An approach to collecting and storing high-quality datasets from surgical operations under anesthesia such that they can be easily accessed by others for use in research has been demonstrated.
Clinical data; anesthesia; database; medical records; physiology; patient safety; drugs; modeling; pain; cardiac; informatics
Much of what is currently documented in the electronic health record is in response to increasingly complex and prescriptive medicolegal, reimbursement, and regulatory requirements. These requirements often result in redundant data capture and cumbersome documentation processes. AMIA's 2011 Health Policy Meeting examined key issues in this arena and envisioned changes to help move toward an ideal future state of clinical data capture and documentation. The consensus of the meeting was that, in the move to a technology-enabled healthcare environment, the main purpose of documentation should be to support patient care and improved outcomes for individuals and populations, and that documentation for other purposes should be generated as a byproduct of care delivery. This paper summarizes meeting deliberations and highlights policy recommendations and research priorities. The authors recommend development of a national strategy to review and amend public policies to better support technology-enabled data capture and documentation practices.
In 2011, the US Supreme Court decided Sorrell v. IMS Health, Inc., a case that addressed the mining of large aggregated databases and the sale of prescriber data for marketing prescription drugs. The court struck down a Vermont law that required data mining companies to obtain permission from individual providers before selling prescription records that included identifiable physician prescription information to pharmaceutical companies for drug marketing. The decision was based on constitutional free speech protections rather than data sharing considerations. Sorrell illustrates challenges at the intersection of biomedical informatics, public health, constitutional liberties, and ethics. As states, courts, regulatory agencies, and federal bodies respond to Sorrell, informaticians’ expertise can contribute to more informed, ethical, and appropriate policies.
Confidentiality legislation & jurisprudence; Data mining legislation & jurisprudence; Privacy legislation & jurisprudence; Ethics; Health records
Current image sharing is carried out by manual transportation of CDs by patients or organization-coordinated sharing networks. The former places a significant burden on patients and providers. The latter faces challenges to patient privacy.
To allow healthcare providers efficient access to medical imaging data acquired at other unaffiliated healthcare facilities while ensuring strong protection of patient privacy and minimizing burden on patients, providers, and the information technology infrastructure.
An image sharing framework is described that involves patients as an integral part of, and with full control of, the image sharing process. Central to this framework is the Patient Controlled Access-key REgistry (PCARE) which manages the access keys issued by image source facilities. When digitally signed by patients, the access keys are used by any requesting facility to retrieve the associated imaging data from the source facility. A centralized patient portal, called a PCARE patient control portal, allows patients to manage all the access keys in PCARE.
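The registry's role can be sketched as a toy data structure. Source facilities issue access keys that are tied to a patient. The patient approves a key, the registry checks that approval before resolving the key to a facility and study, and the patient can list or revoke keys. Every class and method name here is hypothetical. An HMAC with a patient-held secret is used only as a standard-library stand-in for the digital signatures the framework describes. Unlike a real signature scheme, this stand-in requires the registry to hold the patient's secret.

```python
import hashlib
import hmac
import secrets


class AccessKeyRegistry:
    """Toy registry of patient-approved image access keys (hypothetical API)."""

    def __init__(self):
        self._keys = {}            # access key -> (patient, source facility, study)
        self._patient_secret = {}  # stand-in for patients' signing keys

    def enroll(self, patient_id):
        self._patient_secret[patient_id] = secrets.token_bytes(32)
        return self._patient_secret[patient_id]

    def issue(self, patient_id, facility, study_id):
        """Called by an image source facility after acquiring a study."""
        key = secrets.token_hex(16)
        self._keys[key] = (patient_id, facility, study_id)
        return key

    def keys_for(self, patient_id):
        """What a patient control portal would list for this patient."""
        return [k for k, (p, _, _) in self._keys.items() if p == patient_id]

    def revoke(self, key):
        self._keys.pop(key, None)

    def resolve(self, key, approval):
        """Return (facility, study) for a requesting facility if approval is valid."""
        entry = self._keys.get(key)
        if entry is None:
            return None
        patient_id, facility, study_id = entry
        expected = approve(self._patient_secret[patient_id], key)
        return (facility, study_id) if hmac.compare_digest(expected, approval) else None


def approve(secret, key):
    """Patient side: 'sign' an access key (HMAC stand-in for a digital signature)."""
    return hmac.new(secret, key.encode(), hashlib.sha256).digest()


reg = AccessKeyRegistry()
secret = reg.enroll("alice")
key = reg.issue("alice", "Hospital A", "CT-001")
print(reg.resolve(key, approve(secret, key)))  # ('Hospital A', 'CT-001')
```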
A prototype of the PCARE framework has been developed by extending open-source technology. The results for feasibility, performance, and user assessments are encouraging and demonstrate the benefits of patient-controlled image sharing.
The PCARE framework is effective in many important clinical cases of image sharing and can be used to integrate organization-coordinated sharing networks. The same framework can also be used to realize a longitudinal virtual electronic health record.
The PCARE framework allows prior imaging data to be shared among unaffiliated healthcare facilities while protecting patient privacy with minimal burden on patients, providers, and infrastructure. A prototype has been implemented to demonstrate the feasibility and benefits of this approach.
Medical image; health information exchange; image sharing; diagnostic imaging; electronic health records; pcareportal
DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords.
Materials and methods
This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual’s genetic sequence. This scheme requires access to a subset of an individual’s genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch.
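The key-derivation idea can be illustrated as follows. The genotypes at a mutually agreed set of positions are combined with a shared seed through a standard key-derivation function (PBKDF2-HMAC-SHA256 from Python's standard library) to produce a 256-bit key. The position identifiers, the encoding, and the iteration count are hypothetical. Unlike the scheme described here, this sketch is not error-tolerant: a single genotyping error at a chosen position yields a different key. The key-space estimate is an upper bound that assumes three equally likely genotypes per biallelic site. Real allele frequencies and linkage lower the effective entropy.

```python
import hashlib
import math


def genotype_key(genotypes, positions, seed, iterations=100_000):
    """Derive a 256-bit key from genotypes at agreed positions plus a shared seed.

    genotypes: dict position id -> genotype string, e.g. {"rs1": "AG"}
    """
    # Sort alleles so "AG" and "GA" encode identically (unphased genotypes).
    material = "|".join(f"{pos}:{''.join(sorted(genotypes[pos]))}" for pos in positions)
    return hashlib.pbkdf2_hmac("sha256", material.encode(), seed, iterations)


def key_space_bits(n_sites, genotypes_per_site=3):
    """Upper bound on key entropy for n biallelic sites with uniform genotypes."""
    return n_sites * math.log2(genotypes_per_site)


key = genotype_key({"rs1": "AG", "rs2": "CC"}, ["rs1", "rs2"], seed=b"agreed-seed")
print(len(key))             # 32 bytes
print(key_space_bits(100))  # about 158.5 bits
```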
We validate that the proposed encryption scheme is robust to sequencing errors, ensures population uniqueness, and supports sibling disambiguation, and that it provides sufficient cryptographic key space.
Access to a set of an individual’s genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. The fixed and marginal costs of implementing this transmission architecture are modest.
It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual’s genetic sequence.
genomic privacy; genomic encryption; genomic sequencing; transmission of genomic data; cryptography; biorepository research
Privacy; Biomedical Data; Review