The PaTH (University of Pittsburgh/UPMC, Penn State College of Medicine, Temple University Hospital, and Johns Hopkins University) clinical data research network initiative is a collaborative effort among four academic health centers in the Mid-Atlantic region. PaTH will provide robust infrastructure to conduct research, explore clinical outcomes, link with biospecimens, and improve methods for sharing and analyzing data across our diverse populations. Our disease foci are idiopathic pulmonary fibrosis, atrial fibrillation, and obesity. The four network sites have extensive experience in using data from electronic health records and have devised robust methods for patient outreach and recruitment. The network will adopt best practices by using the open-source data-sharing tool, Informatics for Integrating Biology and the Bedside (i2b2), at each site to enhance data sharing using centrally defined common data elements, and will use the Shared Health Research Information Network (SHRINE) for distributed queries across the network.
clinical data research network (CDRN); i2b2; distributed cohort query; patient-centered outcomes research (PCORI); patient reported outcomes (PROs); electronic health records (EHRs)
This editorial provides insights into how informatics can attract highly trained students by involving them in science, technology, engineering, and math (STEM) training at the high school level and continuing to provide mentorship and research opportunities through the formative years of their education. Our central premise is that the trajectory necessary to become expert in the emergent fields in front of them requires acceleration at an early time point. Pathology informatics and biomedical informatics are both new disciplines that would benefit from involvement by students at an early stage of their education. In 2009, Michael T Lotze MD, Kirsten Livesey (then a medical student, now a medical resident at University of Pittsburgh Medical Center (UPMC)), Richard Hersheberger, PhD (currently Dean at Roswell Park), and Megan Seippel, MS (the administrator) launched the University of Pittsburgh Cancer Institute (UPCI) Summer Academy, an 8-week summer program for high school students focused on cancer biology. Initially, pathology and biomedical informatics were involved only in the classroom component of the UPCI Summer Academy. In 2011, due to popular interest, an informatics track called Computer Science, Biology and Biomedical Informatics (CoSBBI) was launched. CoSBBI currently acts as a feeder program for the undergraduate degree program in bioinformatics at the University of Pittsburgh, which is a joint degree offered by the Departments of Biology and Computer Science. We believe training in bioinformatics is the best foundation for students interested in future careers in pathology informatics or biomedical informatics. We describe our approach to the recruitment, training and research mentoring of high school students to create a pipeline of exceptionally well-trained applicants for the disciplines of both pathology informatics and biomedical informatics.
We emphasize here how mentoring of high school students in pathology informatics and biomedical informatics will be critical to assuring their success as leaders in the era of big data and personalized medicine.
Bioinformatics; education; medical informatics; science, technology, engineering, and math education
Pathology informatics has evolved to varying levels around the world. The history of pathology informatics in different countries is a tale with many dimensions. At first glance, it is the familiar story of individuals solving problems that arise in their clinical practice to enhance efficiency, better manage (e.g., digitize) laboratory information, as well as exploit emerging information technologies. Under the surface, however, lie powerful resource, regulatory, and societal forces that helped shape our discipline into what it is today. In this monograph, for the first time in the history of our discipline, we collectively perform a global review of the field of pathology informatics. In doing so, we illustrate how general far-reaching trends such as the advent of computers, the Internet and digital imaging have affected pathology informatics in the world at large. Major drivers in the field included the need for pathologists to comply with national standards for health information technology and telepathology applications to meet the scarcity of pathology services and trained people in certain countries. Following trials by a multitude of investigators, not all of them successful, it is apparent that innovation alone did not assure the success of many informatics tools and solutions. Common, ongoing barriers to the widespread adoption of informatics devices include poor information technology infrastructure in undeveloped areas, the cost of technology, and regulatory issues. This review offers a deeper understanding of how pathology informatics historically developed and provides insights into what the promising future might hold.
History; pathology informatics; clinical informatics; electronic medical record; laboratory information systems; pathology education
The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing (1) a statewide data model for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium’s intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers.
The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission.
The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions’ IRB/HIPAA standards.
Currently, this “virtual biorepository” has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is done either manually over the Internet or in semi-automated batch mode by mapping local data elements to PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (associated with 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium’s web site.
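The batch-mode mapping of local data elements to common data elements described above can be sketched roughly as follows. This is a minimal illustration only: the field names, the mapping table, and the helper functions are hypothetical and are not taken from the PCABC common data element specification.

```python
# Hypothetical mapping from one site's local field names to common data
# elements. None of these names come from the PCABC specification.
LOCAL_TO_CDE = {
    "pt_age": "age_at_diagnosis",
    "gleason": "gleason_score",
    "spec_type": "specimen_type",
}


def map_record(local_record, mapping=LOCAL_TO_CDE):
    """Translate one local record into common-data-element keys.

    Returns the mapped record plus the list of local fields that had no
    mapping, so that a curator can review them by hand.
    """
    mapped, unmapped = {}, []
    for field, value in local_record.items():
        if field in mapping:
            mapped[mapping[field]] = value
        else:
            unmapped.append(field)
    return mapped, unmapped


def map_batch(records, mapping=LOCAL_TO_CDE):
    """Apply map_record to every record in a batch."""
    return [map_record(record, mapping) for record in records]


# Illustrative records only; these are not real patient data.
batch = [
    {"pt_age": 63, "gleason": 7, "spec_type": "block"},
    {"pt_age": 58, "spec_type": "blood", "local_note": "recut"},
]
results = map_batch(batch)
```

Returning the unmapped fields alongside each mapped record is one way to keep the process semi-automated: the mapping runs in bulk, while anything the table does not cover is left for manual review.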
The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy.
The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health.
Ontology; Biositemaps; Resources; Biomedical research; Resource annotation; Resource discovery; Search; Semantic web; Web 2.0; Clinical and Translational Science Awards
Last year, our pathology informatics fellowship added informatics-based interactive case studies to its existing educational platform of operational and research rotations, clinical conferences, a common core curriculum with an accompanying didactic course, and national meetings.
The structure of the informatics case studies was based on the traditional business school case study format. Three different formats were used, varying in length from short, 15-minute scenarios to more formal multiple hour-long case studies. Case studies were presented over the course of three retreats (Fall 2011, Winter 2012, and Spring 2012) and involved both local and visiting faculty and fellows.
Both faculty and fellows found the case studies and the retreats educational, valuable, and enjoyable. From this positive feedback, we plan to incorporate the retreats in future academic years as an educational component of our fellowship program.
Interactive case studies appear to be valuable in teaching several aspects of pathology informatics that are difficult to teach in more traditional venues (rotations and didactic class sessions). Case studies have become an important component of our fellowship's educational platform.
Case study method; clinical informatics training; clinical informatics; informatics fellowship training; informatics teaching; pathology informatics fellowship; pathology informatics training; pathology informatics; retreats
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The HGP was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized. We also review the issue of physician education, which is also an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Bioinformatics; clinical medicine; next generation sequencing; pathology
BACKGROUND: Conflicting roles for Slit2, a protein involved in mediating the processes of cell migration and chemotactic response, have been previously described in prostate cancer. Here we use immunohistochemistry to evaluate the expression of Slit2 in normal donor prostate (NDP), benign prostatic hyperplasia (BPH), high-grade prostatic intraepithelial neoplasia (HGPIN), normal tissue adjacent to prostatic adenocarcinoma (NAC), primary prostatic adenocarcinoma (PCa), and metastatic prostatic adenocarcinoma (Mets). METHODS: Tissue microarrays were immunostained for Slit2. The staining intensities were quantified using automated image analysis software. The data were statistically analyzed using one-way analysis of variance with subsequent Tukey tests for multiple comparisons or a nonparametric equivalent. Eleven cases of NDP, 35 cases of NAC, 15 cases of BPH, 35 cases of HGPIN, 106 cases of PCa, and 37 cases of Mets were analyzed. RESULTS: Specimens of PCa and HGPIN had the highest absolute staining for Slit2. Significant differences were seen between PCa and NDP (P < .05), PCa and NAC (P < .05), HGPIN and NDP (P < .05), and HGPIN and NAC (P < .05). Whereas the average Mets staining was not significantly different from NDP or NAC, several individual Mets cases featured intense staining. CONCLUSIONS: To our knowledge, this represents the first study comparing the immunohistochemical profiles of Slit2 in PCa and Mets to specimens of HGPIN, BPH, NDP, and NAC. These findings suggest that Slit2 expression can be increased in HGPIN, PCa, and Mets, making it a potentially important biomarker for prostate cancer.
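As a rough illustration of the one-way analysis of variance named in the methods above, the sketch below computes the F statistic for several groups of staining scores using only the Python standard library. The numbers are toy values chosen for easy arithmetic, not data from the study, and the function name is an assumption. The sketch stops at the F statistic: the p-value and the subsequent Tukey comparisons would normally come from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy staining scores for three hypothetical tissue groups.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
```

A large F indicates that the spread between group means is large relative to the spread within groups, which is what then motivates pairwise follow-up tests.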
A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need.
We introduce the Bayesian network posterior probability (BNPP) method, which addresses these difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can be used not only to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than does a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are reported.
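The step of turning the likelihoods of all competing hypotheses into a posterior probability can be sketched with Bayes' rule. The code below is a generic normalization, not the BNPP implementation: the hypothesis names, the log-likelihood values, and the default uniform prior are all assumptions made for illustration, and the Bayesian network scoring criterion that would supply the likelihoods is not shown.

```python
import math


def posterior_probabilities(log_likelihoods, priors=None):
    """Posterior probability of each competing hypothesis via Bayes' rule.

    log_likelihoods: dict mapping hypothesis name -> log P(data | hypothesis)
    priors: dict mapping hypothesis name -> prior probability
            (a uniform prior is assumed when omitted)
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}

    log_joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}

    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_joint.values())
    total = sum(math.exp(v - top) for v in log_joint.values())
    return {n: math.exp(log_joint[n] - top) / total for n in names}


# Hypothetical log-likelihoods for three competing models of one disease.
scores = {"no_association": -105.0, "snp_a": -100.0, "snp_a_and_snp_b": -101.0}
posteriors = posterior_probabilities(scores)
best = max(posteriors, key=posteriors.get)
```

Because every competing hypothesis appears in the denominator, the result is a probability for each hypothesis rather than a test against a single null.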
We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations.
Ezrin-radixin-moesin-binding phosphoprotein 50 (EBP50) is an adapter protein which has been shown to play an active role in a wide variety of cellular processes, including interactions with proteins related to both tumor suppression and oncogenesis. Here we use immunohistochemistry to evaluate EBP50's expression in normal donor prostate (NDP), benign prostatic hyperplasia (BPH), high grade prostatic intraepithelial neoplasia (HGPIN), normal tissue adjacent to prostatic adenocarcinoma (NAC), primary prostatic adenocarcinoma (PCa), and metastatic prostatic adenocarcinoma (Mets).
Tissue microarrays were immunohistochemically stained for EBP50, with the staining intensities quantified using automated image analysis software. The data were statistically analyzed using one-way ANOVA with subsequent Tukey tests for multiple comparisons. Eleven cases of NDP, 37 cases of NAC, 15 cases of BPH, 35 cases of HGPIN, 103 cases of PCa, and 36 cases of Mets were analyzed in the microarrays.
Specimens of PCa and Mets had the lowest absolute staining for EBP50. Mets staining was significantly lower than NDP (p = 0.027), BPH (p = 0.012), NAC (p < 0.001), HGPIN (p < 0.001), and PCa (p = 0.006). Additionally, HGPIN staining was significantly higher than NAC (p < 0.009) and PCa (p < 0.001).
To our knowledge, this represents the first study comparing the immunohistochemical profiles of EBP50 in PCa and Mets to specimens of HGPIN, BPH, NDP, and NAC and suggests that EBP50 expression is decreased in Mets. Given that PCa also had significantly higher expression than Mets, future studies are warranted to assess EBP50's potential as a prognostic biomarker for prostate cancer.
Claudins are integral membrane proteins that are involved in forming cellular tight junctions. One member of the claudin family, claudin-3, has been shown to be overexpressed in breast, ovarian, and pancreatic cancer. Here we use immunohistochemistry to evaluate its expression in benign prostatic hyperplasia (BPH), prostatic intraepithelial neoplasia (PIN), normal tissue adjacent to prostatic adenocarcinoma (NAC), primary prostatic adenocarcinoma (PCa), and metastatic prostatic adenocarcinoma (Mets).
Tissue microarrays were immunohistochemically stained for claudin-3, with the staining intensities subsequently quantified and statistically analyzed using a one-way ANOVA with subsequent Tukey tests for multiple comparisons or a nonparametric equivalent. Fifty-three cases of NAC, 17 cases of BPH, 35 cases of PIN, 107 cases of PCa, and 55 cases of Mets were analyzed in the microarrays.
PCa and Mets had the highest absolute staining for claudin-3. Both had significantly higher staining than BPH (p < 0.05 in both cases) and NAC (p < 0.05 in both cases). PIN had a lower staining score than PCa and Mets, although the difference was not significant, and a significantly higher score than both BPH and NAC (p < 0.05 for both cases). No significant differences were observed between PCa, Mets, and PIN.
To our knowledge, this represents one of the first studies comparing the immunohistochemical profiles of claudin-3 in PCa and NAC to specimens of PIN, BPH, and Mets. These findings provide further evidence that claudin-3 may serve as an important biomarker for prostate cancer, both primary and metastatic, but do not provide evidence that claudin-3 can be used to predict risk of metastasis.
Some members of the Protein 4.1 superfamily are believed to be involved in cell proliferation and growth, or in the regulation of these processes. While the expression levels of two members of this family, radixin and moesin, have been studied in many tumor types, to our knowledge they have not been investigated in prostate cancer.
Tissue microarrays were immunohistochemically stained for either radixin or moesin, with the staining intensities subsequently quantified and statistically analyzed using One-Way ANOVA or nonparametric equivalent with subsequent Student-Newman-Keuls tests for multiple comparisons. There were 11 cases of normal donor prostates (NDP), 14 cases of benign prostatic hyperplasia (BPH), 23 cases of high-grade prostatic intraepithelial neoplasia (HGPIN), 88 cases of prostatic adenocarcinoma (PCa), and 25 cases of normal tissue adjacent to adenocarcinoma (NAC) analyzed in the microarrays.
NDP, BPH, and HGPIN had higher absolute staining scores for radixin than PCa and NAC, but with a significant difference observed between only HGPIN and PCa (p < 0.001) and HGPIN and NAC (p = 0.001). In the moesin-stained specimens, PCa, NAC, HGPIN, and BPH all received higher absolute staining scores than NDP, but the differences were not significant. Stage 4 moesin-stained PCa had a significantly reduced staining intensity compared to Stage 2 (p = 0.003).
To our knowledge, these studies represent the first reports on the expression profiles of radixin and moesin in prostatic adenocarcinoma. The current study has shown that there were statistically significant differences observed between HGPIN and PCa and HGPIN and NAC in terms of radixin expression. The differences in the moesin profiles by tissue type were not statistically significant. Additional larger studies with these markers may further elucidate their potential roles in prostatic neoplasia progression.
A genome-wide association study (GWAS) involves examining representative SNPs obtained using high throughput technologies. A GWAS data set can entail a million SNPs and may soon entail many millions. In a GWAS, researchers often investigate the correlation of each SNP with a disease. With so many hypotheses, it is not straightforward how to interpret the results. Strategies include using the Bonferroni correction to determine the significance of a model and Bayesian methods. However, when we are discovering new locus-disease associations, i.e., so-called de novo discoveries, we should not just endeavor to determine the significance of particular models, but also concern ourselves with determining whether it is likely that we have any true discoveries, and if so how many of the highest ranking models we should investigate further. We develop a method based on a signal-to-noise ratio that targets this issue. We apply the method to a GWAS Alzheimer’s data set.
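For reference, the Bonferroni correction mentioned above divides the overall significance level by the number of hypotheses tested. The sketch below shows that arithmetic for a million SNPs; it does not implement the signal-to-noise method of the paper, and the function names and p-values are assumptions for illustration.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that survives correction for len(p_values) tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# With one million SNPs and alpha = 0.05, each test must reach about 5e-8.
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)

# Four hypothetical p-values; the corrected threshold is 0.05 / 4 = 0.0125.
flags = bonferroni_significant([1e-9, 0.01, 0.04, 0.2])
```

The threshold says whether a particular SNP is significant, but it does not say how many of the top-ranked models are likely to be true discoveries, which is the gap the abstract describes.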
Clinical and translational research increasingly requires computation. Projects may involve multiple computationally-oriented groups including information technology (IT) professionals, computer scientists and biomedical informaticians. However, many biomedical researchers are not aware of the distinctions among these complementary groups, leading to confusion, delays and sub-optimal results. Although written from the perspective of clinical and translational science award (CTSA) programs within academic medical centers, the paper addresses issues that extend beyond clinical and translational research. The authors describe the complementary but distinct roles of operational IT, research IT, computer science and biomedical informatics using a clinical data warehouse as a running example. In general, IT professionals focus on technology. The authors distinguish between two types of IT groups within academic medical centers: central or administrative IT (supporting the administrative computing needs of large organizations) and research IT (supporting the computing needs of researchers). Computer scientists focus on general issues of computation such as designing faster computers or more efficient algorithms, rather than specific applications. In contrast, informaticians are concerned with data, information and knowledge. Biomedical informaticians draw on a variety of tools, including but not limited to computers, to solve information problems in health care and biomedicine. The paper concludes with recommendations regarding administrative structures that can help to maximize the benefit of computation to biomedical research within academic health centers.
To describe the management of and satisfaction with laboratory testing, and desirability of laboratory health information technology in the nursing home setting.
Cross-sectional study using an Internet-based survey.
Participants and Setting
National sample of 426 nurse practitioners and 308 physicians who practice in the nursing home setting.
Systems and processes available for ordering and reviewing laboratory tests, laboratory test result management satisfaction, self-reported delays in laboratory test result review, and desirability of computerized laboratory test result management features in the nursing home setting.
A total of 96 participants (48 physicians and 48 nurse practitioners) completed the survey, for an overall response rate of 13.1% (96/734). Of the survey participants, 77.1% had worked in the nursing home setting for more than 5 years. Over half of clinicians (52.1%) reported three or more recent delays in receiving laboratory test results. Only 43.8% were satisfied with their laboratory test results management. Satisfaction was associated with keeping a list of laboratory orders and availability of computerized laboratory test order entry. In the nursing home, 35.4% of participants reported the ability to electronically review laboratory test results, and 12.5% and 10.4%, respectively, had computerized ordering of chemistry/hematology and microbiology/pathology tests. The following three features were rated most desirable in a computerized laboratory test result management system: showing abnormal results first, warning if a test result was missed, and allowing electronic acknowledgment of test results.
Delays in receiving laboratory test results and dissatisfaction with the management of laboratory test result information are commonly reported among physicians and nurse practitioners working in nursing homes. Test result management satisfaction was associated with computerized order entry and keeping track of ordered lab tests, suggesting that implementation of certain health information technology could potentially improve quality of care.
Laboratory techniques and procedures; laboratories; nursing homes; medication monitoring
Tissue microarrays (TMAs) are enormously useful tools for translational research, but incompatibilities in database systems between various researchers and institutions prevent the efficient sharing of data that could help realize their full potential. Resource Description Framework (RDF) provides a flexible method to represent knowledge in triples, which take the form Subject-Predicate-Object. All data resources are described using Uniform Resource Identifiers (URIs), which are global in scope. We present an OWL (Web Ontology Language) schema that expands upon the TMA data exchange specification to address this issue and assist in data sharing and integration.
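The Subject-Predicate-Object form described above can be illustrated with plain Python tuples. This is a minimal sketch: the `http://example.org/tma/` namespace, the predicate names, and the `objects_of` helper are hypothetical and are not part of the OWL schema or of the TMA data exchange specification.

```python
from collections import namedtuple

# One RDF statement: subject and predicate are URIs; the object is a URI
# or a literal value.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

BASE = "http://example.org/tma/"  # hypothetical namespace for illustration

triples = [
    Triple(BASE + "block/1", BASE + "schema#hasCore", BASE + "core/1-A1"),
    Triple(BASE + "core/1-A1", BASE + "schema#hasDiagnosis",
           BASE + "term/adenocarcinoma"),
    Triple(BASE + "core/1-A1", BASE + "schema#row", "1"),
]


def objects_of(statements, subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [
        t.obj
        for t in statements
        if t.subject == subject and t.predicate == predicate
    ]


cores = objects_of(triples, BASE + "block/1", BASE + "schema#hasCore")
```

Because every subject and predicate is a globally scoped URI, statements produced at different institutions can be merged into one list and queried together without renaming fields.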
A minimal OWL schema was designed containing only concepts specific to TMA experiments. More general data elements were incorporated from predefined ontologies such as the NCI thesaurus. URIs were assigned using the Linked Data format.
We present examples of files utilizing the schema and conversion of XML data (similar to the TMA DES) to OWL.
By utilizing predefined ontologies and global unique identifiers, this OWL schema provides a solution to the limitations of XML, which represents concepts defined in a localized setting. This will help increase the utilization of tissue resources, facilitating collaborative translational research efforts.
Ontology; OWL; tissue microarray
Tissue banking informatics deals with standardized annotation, collection and storage of biospecimens that can further be shared by researchers. Over the last decade, the Department of Biomedical Informatics (DBMI) at the University of Pittsburgh has developed various tissue banking informatics tools to expedite translational medicine research. In this review, we describe the technical approach and capabilities of these models.
Clinical annotation of biospecimens requires data retrieval from various clinical information systems and the de-identification of the data by an honest broker. Based upon these requirements, DBMI, with its collaborators, has developed both Oracle-based organ-specific data marts and a more generic, model-driven architecture for biorepositories. The organ-specific models are developed utilizing Oracle 126.96.36.199 server tools and software applications and the model-driven architecture is implemented in a J2EE framework.
The organ-specific biorepositories implemented by DBMI include the Cooperative Prostate Cancer Tissue Resource (http://www.cpctr.info/), Pennsylvania Cancer Alliance Bioinformatics Consortium (http://pcabc.upmc.edu/main.cfm), EDRN Colorectal and Pancreatic Neoplasm Database (http://edrn.nci.nih.gov/) and Specialized Programs of Research Excellence (SPORE) Head and Neck Neoplasm Database (http://spores.nci.nih.gov/current/hn/index.htm). The model-based architecture is represented by the National Mesothelioma Virtual Bank (http://mesotissue.org/). These biorepositories provide thousands of well annotated biospecimens for the researchers that are searchable through query interfaces available via the Internet.
These systems, developed and supported by our institute, serve to form a common platform for cancer research to accelerate progress in clinical and translational research. In addition, they provide a tangible infrastructure and resource for exposing research resources and biospecimen services in collaboration with the clinical anatomic pathology laboratory information system (APLIS) and the cancer registry information systems.
Tissue banking informatics; information models for translational research
Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents.
Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed and each report was annotated by experts with regard to the patient’s personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods, Dynamic-Window and ConText, for extracting information were evaluated. Hx’s classification responses using each of the two methods were measured against the reference standard. The average Cohen’s weighted kappa served as the human benchmark in evaluating the system.
Hx had a high overall accuracy, scoring 96.2% with each method. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, respectively, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2%, and for the family history classification, ConText scored highest with 97.6%; both results were comparable to the human benchmarks of 88.3% and 97.2%, respectively.
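The accuracy and F-measure figures reported here can be related to raw classification counts. The following is a minimal sketch, assuming the balanced F-measure (F1, the harmonic mean of precision and recall) and binary labels; the function names and the counts in the usage note are hypothetical and are not the study's data.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from true-positive, false-positive and false-negative counts.

    With beta=1.0 this is the harmonic mean of precision and recall.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def accuracy(tp, tn, fp, fn):
    """Fraction of all reports that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)
```

For example, with hypothetical counts of 8 true positives, 2 false positives, 2 false negatives and 88 true negatives, `f_measure(8, 2, 2)` and `accuracy(8, 88, 2, 2)` give an F-measure of 0.80 and an accuracy of 0.96, which illustrates how accuracy can exceed the F-measure when negatives dominate.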
We evaluated an automated application’s performance in classifying a mesothelioma patient’s personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text, clinical records for a variety of research or healthcare delivery organizations.
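The general idea of lexicon-driven negation and experiencer detection can be illustrated with a short sketch. This is not the Hx application, nor an implementation of Dynamic-Window or ConText; it is a simplified illustration that assumes tiny hypothetical lexicons and a fixed window of preceding tokens.

```python
import re

# Hypothetical, deliberately tiny lexicons for illustration only.
CANCER_TERMS = {"cancer", "carcinoma", "melanoma", "lymphoma"}
NEGATION_TRIGGERS = {"no", "denies", "without"}
FAMILY_TRIGGERS = {"mother", "father", "sister", "brother", "family"}


def classify_mentions(sentence, window=5):
    """Return (term, negated, experiencer) for each cancer term in a sentence.

    A mention is treated as negated, or attributed to a family member,
    if a trigger word occurs within `window` tokens before it.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    results = []
    for i, tok in enumerate(tokens):
        if tok not in CANCER_TERMS:
            continue
        context = tokens[max(0, i - window):i]
        negated = any(t in NEGATION_TRIGGERS for t in context)
        is_family = any(t in FAMILY_TRIGGERS for t in context)
        results.append((tok, negated, "family" if is_family else "patient"))
    return results
```

Under these assumptions, `classify_mentions("Mother had breast cancer.")` attributes the mention to a family member, while `classify_mentions("Patient denies any history of melanoma.")` marks the mention as negated. The sketch also makes the dependence on the lexicon visible: a term or trigger missing from the sets above is silently ignored.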
Information extraction; natural language processing; cancer history classification
We report on the development of an instrument to measure clinicians’ perceptions of their personal power in the workplace in relation to resistance to computerized physician order entry (CPOE). The instrument is based on French and Raven’s six bases of social power and uses a semantic differential methodology. A measurement study was conducted to determine the reliability and validity of the survey. The survey was administered online and distributed via a URL by email to 19 physicians, nurses, and health unit coordinators from a university hospital. Acceptable reliability was achieved by removing or moving some semantic differential word pairs used to represent the six power bases (alpha range from 0.76–0.89). The Semantic Differential Power Perception (SDPP) survey validity was tested against an already validated instrument and found to be acceptable (correlation range from 0.51–0.81). The SDPP survey instrument was determined to be both reliable and valid.
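The reliability coefficient reported above can be illustrated with a short calculation. The sketch below assumes that "alpha" refers to Cronbach's alpha computed over the word-pair items of one power base; the function name and the input layout are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    `items` is a list of per-item score lists, each holding one score
    per respondent (all lists must have the same length).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))
```

Removing a word pair that correlates poorly with the rest of its scale lowers the summed item variance relative to the total variance, which is one way the alpha of a scale can be raised.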
power; resistance; measurement; questionnaire; electronic health records; clinical informatics; socio-technical; human factors; hospital information systems; computerized provider order entry
Honest broker services are essential for tissue- and data-based research. The honest broker provides a firewall between clinical and research activities. Clinical information is stripped of Health Insurance Portability and Accountability Act-denoted personal health identifiers. Research material may have linkage codes, precluding the identification of patients to researchers. The honest broker provides data derived from clinical and research sources. These data are for research use only, and there are rules in place that prohibit reidentification. Very rarely, the institutional review board (IRB) may allow recontact and develop a recontact plan with the honest broker. Certain databases are structured to serve a clinical and research function and incorporate ‘real-time’ updating of information. This complex process needs resolution of a variety of issues regarding the precise role of honest brokers and their interaction with data. There is also an obvious need for software solutions to make the task of deidentification easier.
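The separation of identifiers from research data by means of a linkage code can be sketched as follows. This is an illustration only, not the software used at any institution; the field names are hypothetical and cover only a small subset of the identifiers that would need to be removed in practice.

```python
import secrets

# Hypothetical subset of identifying fields; a real list would be far longer.
IDENTIFIER_FIELDS = {"name", "mrn", "date_of_birth", "address"}


def deidentify(record, linkage_table):
    """Return a copy of `record` without identifier fields.

    The removed identifiers are stored in `linkage_table` under a random
    linkage code, and that table stays with the honest broker.
    """
    code = secrets.token_hex(8)
    linkage_table[code] = {k: record[k] for k in IDENTIFIER_FIELDS if k in record}
    research_record = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    research_record["linkage_code"] = code
    return research_record
```

In this sketch the researcher receives only the returned record, while the linkage table remains on the clinical side of the firewall, so recontact would be possible only through the honest broker.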
The University of Pittsburgh has implemented a novel, IRB-approved mechanism to address honest broker functions to meet the specimen and data needs of researchers. The Tissue Bank stores biologic specimens. The Cancer Registry culls data and annotating information as part of state- and federal-mandated functions and collects data on the clinical progression, treatment, and outcomes of cancer patients. The Cancer Registry also has additional IRB approval to collect data elements only for research purposes. The Clinical Outcomes Group is involved in patient safety and health services research. Radiation Oncology and Medical Oncology provide critical treatment related information. Pathology and Oncology Informatics have designed software tools for querying availability of specimens, extracting data, and deidentifying specimens and annotating data for clinical and translational research. These entities partnered and submitted a joint IRB proposal to create an institutional honest broker facility. The employees of this conglomerate have honest broker agreements with the University of Pittsburgh and the Medical Center. This provides a large group of honest brokers, ensuring availability for projects without any conflict of interest.
The honest broker system has been an IRB-approved institutional entity at the University of Pittsburgh since 2003. The honest broker system currently includes 33 certified honest brokers encompassing the multiple partners of this system. The honest broker system has handled >1600 requests over the past 4 years with a 25% increase in volume each year.
The current results indicate that the collaborative honest broker model described herein is robust and provides a highly functional solution to the specimen and data needs for critical clinical and translational research activities.
honest broker; biologic specimens; data annotation; Institutional Review Board; tissue bank; translational research; Health Insurance Portability and Accountability Act of 1996
To determine the minimal frequency of laboratory monitoring of 30 types of chronic medications or classes that are administered to nursing facility residents and are either listed under pharmacy services tag F329 (the tag for unnecessary medications), or have a narrow therapeutic index.
Design and Setting
Cross-sectional, Internet-based survey.
National sample of 500 pharmacists, 500 nurse practitioners, and 327 physicians.
Main Outcome Measure
Minimal frequency of monitoring, recorded as an interval of 1, 3, 6, 9, or 12 months, for each of 35 laboratory parameters (e.g., serum drug level, complete blood count, liver function tests) for the 30 types of chronic medications or classes. Agreement was defined as having two or more of the three professional groups select the same minimal monitoring interval.
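The agreement rule can be expressed as a short check. The sketch below assumes that each professional group has already been reduced to a single selected interval for a given parameter (how that selection is derived from individual responses is not specified here); the function name and group labels are hypothetical.

```python
from collections import Counter


def agreed_interval(group_choices):
    """Return the interval (in months) chosen by two or more groups, else None.

    `group_choices` maps each professional group to the minimal
    monitoring interval that group selected for one laboratory parameter.
    """
    counts = Counter(group_choices.values())
    interval, n_groups = counts.most_common(1)[0]
    return interval if n_groups >= 2 else None
```

For example, if pharmacists and nurse practitioners select 3 months and physicians select 6 months, the agreed interval is 3 months; if all three groups select different intervals, no agreement is recorded for that parameter.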
Overall, 116 professionals (20 pharmacists, 48 physicians, and 48 nurse practitioners) completed the survey. Most respondents were women (58.6% [68/116]), and most had worked in nursing facilities for > 5 years (66.4% [77/116]). Regarding minimal laboratory monitoring intervals, respondents reached agreement concerning 33 of 35 parameters. They selected three or six months as the minimum interval for 30 of 35 parameters (85.7%), one month as the minimum for two parameters, and 12 months as the minimum for one parameter.
The multidisciplinary panel agreed that most medications that were listed under the F329 tag or have a narrow therapeutic index should have laboratory monitoring every three or six months. The results can be used by nursing facility professionals to establish minimal laboratory monitoring parameters for chronic medications, which may potentially reduce the occurrence of adverse drug reactions.
Adverse drug reactions; Drug monitoring; Nursing facility
Rebecca Crowley and colleagues propose that academic health centers can and should lead the transition towards a culture of biomedical data sharing.