Last year, our pathology informatics fellowship added informatics-based interactive case studies to its existing educational platform of operational and research rotations, clinical conferences, a common core curriculum with an accompanying didactic course, and national meetings.
The structure of the informatics case studies was based on the traditional business school case study format. Three different formats were used, varying in length from short, 15-minute scenarios to more formal multiple hour-long case studies. Case studies were presented over the course of three retreats (Fall 2011, Winter 2012, and Spring 2012) and involved both local and visiting faculty and fellows.
Both faculty and fellows found the case studies and the retreats educational, valuable, and enjoyable. Based on this positive feedback, we plan to incorporate the retreats in future academic years as an educational component of our fellowship program.
Interactive case studies appear to be valuable in teaching several aspects of pathology informatics that are difficult to teach in more traditional venues (rotations and didactic class sessions). Case studies have become an important component of our fellowship's educational platform.
Case study method; clinical informatics training; clinical informatics; informatics fellowship training; informatics teaching; pathology informatics fellowship; pathology informatics training; pathology informatics; retreats
The Human Genome Project (HGP) provided the initial draft of mankind's DNA sequence in 2001. The draft was produced by 23 collaborating laboratories using Sanger sequencing of mapped regions as well as shotgun sequencing techniques in a process that occupied 13 years at a cost of ~$3 billion. Today, Next Generation Sequencing (NGS) techniques represent the next phase in the evolution of DNA sequencing technology at dramatically reduced cost compared to traditional Sanger sequencing. A single laboratory today can sequence the entire human genome in a few days for a few thousand dollars in reagents and staff time. Routine whole exome or even whole genome sequencing of clinical patients is well within the realm of affordability for many academic institutions across the country. This paper reviews current sequencing technology methods and upcoming advancements in sequencing technology as well as challenges associated with data generation, data manipulation and data storage. Implementation of routine NGS data in cancer genomics is discussed along with potential pitfalls in the interpretation of the NGS data. The overarching importance of bioinformatics in the clinical implementation of NGS is emphasized. We also review the issue of physician education, which is an important consideration for the successful implementation of NGS in the clinical workplace. NGS technologies represent a golden opportunity for the next generation of pathologists to be at the leading edge of the personalized medicine approaches coming our way. Often under-emphasized issues of data access and control as well as potential ethical implications of whole genome NGS are also discussed. Despite some challenges, it's hard not to be optimistic about the future of personalized genome sequencing and its potential impact on patient care and the advancement of knowledge of human biology and disease in the near future.
Bioinformatics; clinical medicine; next generation sequencing; pathology
BACKGROUND: Conflicting roles for Slit2, a protein involved in mediating the processes of cell migration and chemotactic response, have been previously described in prostate cancer. Here we use immunohistochemistry to evaluate the expression of Slit2 in normal donor prostate (NDP), benign prostatic hyperplasia (BPH), high-grade prostatic intraepithelial neoplasia (HGPIN), normal tissue adjacent to prostatic adenocarcinoma (NAC), primary prostatic adenocarcinoma (PCa), and metastatic prostatic adenocarcinoma (Mets). METHODS: Tissue microarrays were immunostained for Slit2. The staining intensities were quantified using automated image analysis software. The data were statistically analyzed using one-way analysis of variance with subsequent Tukey tests for multiple comparisons or a nonparametric equivalent. Eleven cases of NDP, 35 cases of NAC, 15 cases of BPH, 35 cases of HGPIN, 106 cases of PCa, and 37 cases of Mets were analyzed. RESULTS: Specimens of PCa and HGPIN had the highest absolute staining for Slit2. Significant differences were seen between PCa and NDP (P < .05), PCa and NAC (P < .05), HGPIN and NDP (P < .05), and HGPIN and NAC (P < .05). Whereas the average Mets staining was not significantly different from NDP or NAC, several individual Mets cases featured intense staining. CONCLUSIONS: To our knowledge, this represents the first study comparing the immunohistochemical profiles of Slit2 in PCa and Mets to specimens of HGPIN, BPH, NDP, and NAC. These findings suggest that Slit2 expression can be increased in HGPIN, PCa, and Mets, making it a potentially important biomarker for prostate cancer.
Ezrin-radixin-moesin-binding phosphoprotein 50 (EBP50) is an adapter protein which has been shown to play an active role in a wide variety of cellular processes, including interactions with proteins related to both tumor suppression and oncogenesis. Here we use immunohistochemistry to evaluate EBP50's expression in normal donor prostate (NDP), benign prostatic hyperplasia (BPH), high grade prostatic intraepithelial neoplasia (HGPIN), normal tissue adjacent to prostatic adenocarcinoma (NAC), primary prostatic adenocarcinoma (PCa), and metastatic prostatic adenocarcinoma (Mets).
Tissue microarrays were immunohistochemically stained for EBP50, with the staining intensities quantified using automated image analysis software. The data were statistically analyzed using one-way ANOVA with subsequent Tukey tests for multiple comparisons. Eleven cases of NDP, 37 cases of NAC, 15 cases of BPH, 35 cases of HGPIN, 103 cases of PCa, and 36 cases of Mets were analyzed in the microarrays.
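As a rough illustration of the one-way ANOVA step described above, the following sketch computes the F statistic for staining intensities grouped by tissue type. The intensity values and the function name `one_way_anova_f` are hypothetical and are not taken from the study; in practice the p-value and the Tukey post hoc comparisons would come from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per group."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)          # number of tissue types
    n = len(all_values)      # total number of cases

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual cases around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical staining intensities (arbitrary units), not study data.
staining = {
    "NDP": [1.0, 2.0, 3.0],
    "PCa": [2.0, 3.0, 4.0],
    "Mets": [6.0, 7.0, 8.0],
}
f_stat, df_b, df_w = one_way_anova_f(list(staining.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that the between-group variation is large relative to the within-group variation; pairwise tests such as Tukey's are then needed to say which tissue types differ.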
Specimens of PCa and Mets had the lowest absolute staining for EBP50. Mets staining was significantly lower than NDP (p = 0.027), BPH (p = 0.012), NAC (p < 0.001), HGPIN (p < 0.001), and PCa (p = 0.006). Additionally, HGPIN staining was significantly higher than NAC (p < 0.009) and PCa (p < 0.001).
To our knowledge, this represents the first study comparing the immunohistochemical profiles of EBP50 in PCa and Mets to specimens of HGPIN, BPH, NDP, and NAC and suggests that EBP50 expression is decreased in Mets. Given that PCa also had significantly higher expression than Mets, future studies are warranted to assess EBP50's potential as a prognostic biomarker for prostate cancer.
Claudins are integral membrane proteins that are involved in forming cellular tight junctions. One member of the claudin family, claudin-3, has been shown to be overexpressed in breast, ovarian, and pancreatic cancer. Here we use immunohistochemistry to evaluate its expression in benign prostatic hyperplasia (BPH), prostatic intraepithelial neoplasia (PIN), normal tissue adjacent to prostatic adenocarcinoma (NAC), primary prostatic adenocarcinoma (PCa), and metastatic prostatic adenocarcinoma (Mets).
Tissue microarrays were immunohistochemically stained for claudin-3, with the staining intensities subsequently quantified and statistically analyzed using a one-way ANOVA with subsequent Tukey tests for multiple comparisons or a nonparametric equivalent. Fifty-three cases of NAC, 17 cases of BPH, 35 cases of PIN, 107 cases of PCa, and 55 cases of Mets were analyzed in the microarrays.
PCa and Mets had the highest absolute staining for claudin-3. Both had significantly higher staining than BPH (p < 0.05 in both cases) and NAC (p < 0.05 in both cases). PIN had a lower, but non-significant, staining score than PCa and Mets, but a statistically higher score than both BPH and NAC (p < 0.05 for both cases). No significant differences were observed between PCa, Mets, and PIN.
To our knowledge, this represents one of the first studies comparing the immunohistochemical profiles of claudin-3 in PCa and NAC to specimens of PIN, BPH, and Mets. These findings provide further evidence that claudin-3 may serve as an important biomarker for prostate cancer, both primary and metastatic, but do not provide evidence that claudin-3 can be used to predict risk of metastasis.
Some members of the Protein 4.1 superfamily are believed to be involved in cell proliferation and growth, or in the regulation of these processes. While the expression levels of two members of this family, radixin and moesin, have been studied in many tumor types, to our knowledge they have not been investigated in prostate cancer.
Tissue microarrays were immunohistochemically stained for either radixin or moesin, with the staining intensities subsequently quantified and statistically analyzed using One-Way ANOVA or nonparametric equivalent with subsequent Student-Newman-Keuls tests for multiple comparisons. There were 11 cases of normal donor prostates (NDP), 14 cases of benign prostatic hyperplasia (BPH), 23 cases of high-grade prostatic intraepithelial neoplasia (HGPIN), 88 cases of prostatic adenocarcinoma (PCa), and 25 cases of normal tissue adjacent to adenocarcinoma (NAC) analyzed in the microarrays.
NDP, BPH, and HGPIN had higher absolute staining scores for radixin than PCa and NAC, but with a significant difference observed between only HGPIN and PCa (p < 0.001) and HGPIN and NAC (p = 0.001). In the moesin-stained specimens, PCa, NAC, HGPIN, and BPH all received higher absolute staining scores than NDP, but the differences were not significant. Stage 4 moesin-stained PCa had a significantly reduced staining intensity compared to Stage 2 (p = 0.003).
To our knowledge, these studies represent the first reports on the expression profiles of radixin and moesin in prostatic adenocarcinoma. The current study has shown that there were statistically significant differences observed between HGPIN and PCa and HGPIN and NAC in terms of radixin expression. The differences in the moesin profiles by tissue type were not statistically significant. Additional larger studies with these markers may further elucidate their potential roles in prostatic neoplasia progression.
A genome-wide association study (GWAS) involves examining representative SNPs obtained using high throughput technologies. A GWAS data set can entail a million SNPs and may soon entail many millions. In a GWAS researchers often investigate the correlation of each SNP with a disease. With so many hypotheses, it is not straightforward how to interpret the results. Strategies include using the Bonferroni correction to determine the significance of a model and Bayesian methods. However, when we are discovering new locus-disease associations, i.e., so called de novo discoveries, we should not just endeavor to determine the significance of particular models, but also concern ourselves with determining whether it is likely that we have any true discoveries, and if so how many of the highest ranking models we should investigate further. We develop a method based on a signal-to-noise ratio that targets this issue. We apply the method to a GWAS Alzheimer’s data set.
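To make the multiple-hypothesis problem concrete, here is a minimal sketch of the Bonferroni correction mentioned above. It is not the signal-to-noise method the abstract proposes. The SNP identifiers and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_snps(p_values, alpha=0.05):
    """Return the SNPs whose p-value passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [snp for snp, p in p_values.items() if p < threshold]


# With one million SNPs, each test must reach p < 5e-8 at alpha = 0.05.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical p-values for three SNPs (threshold is 0.05 / 3).
example = {"snp_a": 1e-9, "snp_b": 0.01, "snp_c": 0.2}
print(significant_snps(example))
```

The correction controls false positives for individual models, but it does not by itself say whether any true discoveries are likely or how many top-ranked models merit follow-up, which is the gap the abstract targets.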
To describe the management of and satisfaction with laboratory testing, and desirability of laboratory health information technology in the nursing home setting.
Cross-sectional study using an Internet-based survey.
Participants and Setting
National sample of 426 nurse practitioners and 308 physicians who practice in the nursing home setting.
Systems and processes available for ordering and reviewing laboratory tests, laboratory test result management satisfaction, self-reported delays in laboratory test result review, and desirability of computerized laboratory test result management features in the nursing home setting.
A total of 96 participants (48 physicians and 48 nurse practitioners) completed the survey, for an overall response rate of 13.1% (96/734). Of the survey participants, 77.1% had worked in the nursing home setting for more than 5 years. Over half of clinicians (52.1%) reported three or more recent delays in receiving laboratory test results. Only 43.8% were satisfied with their laboratory test results management. Satisfaction was associated with keeping a list of laboratory orders and availability of computerized laboratory test order entry. In the nursing home, 35.4% of participants reported the ability to electronically review laboratory test results; 12.5% and 10.4%, respectively, had computerized ordering of chemistry/hematology and microbiology/pathology tests. The following three features were rated most desirable in a computerized laboratory test result management system: showing abnormal results first, warning if a test result was missed, and allowing electronic acknowledgment of test results.
Delays in receiving laboratory test results and dissatisfaction with the management of laboratory test result information are commonly reported among physicians and nurse practitioners working in nursing homes. Test result management satisfaction was associated with computerized order entry and keeping track of ordered lab tests, suggesting that implementation of certain health information technology could potentially improve quality of care.
Laboratory techniques and procedures; laboratories; nursing homes; medication monitoring
Tissue microarrays (TMAs) are enormously useful tools for translational research, but incompatibilities in database systems between various researchers and institutions prevent the efficient sharing of data that could help realize their full potential. Resource Description Framework (RDF) provides a flexible method to represent knowledge in triples, which take the form Subject-Predicate-Object. All data resources are described using Uniform Resource Identifiers (URIs), which are global in scope. We present an OWL (Web Ontology Language) schema that expands upon the TMA data exchange specification to address this issue and assist in data sharing and integration.
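The Subject-Predicate-Object form described above can be sketched as follows. Only the `rdf:type` URI is a real, standard identifier; every `http://example.org/...` URI, the property names, and the helper `to_ntriples` are hypothetical and are not part of the schema presented in the paper.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object); all example.org URIs are made up.
triples = [
    ("http://example.org/tma/core/42", RDF_TYPE, "http://example.org/tma/schema#Core"),
    ("http://example.org/tma/core/42", "http://example.org/tma/schema#partOfBlock",
     "http://example.org/tma/block/7"),
    ("http://example.org/tma/core/42", "http://example.org/tma/schema#histology",
     "prostatic adenocarcinoma"),
]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines; objects that are not URIs become string literals."""
    lines = []
    for subject, predicate, obj in triples:
        if obj.startswith(("http://", "https://")):
            rendered = f"<{obj}>"
        else:
            escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Because every resource is named by a globally scoped URI, triples produced at different institutions can be merged without first reconciling local database keys.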
A minimal OWL schema was designed containing only concepts specific to TMA experiments. More general data elements were incorporated from predefined ontologies such as the NCI thesaurus. URIs were assigned using the Linked Data format.
We present examples of files utilizing the schema and conversion of XML data (similar to the TMA DES) to OWL.
By utilizing predefined ontologies and global unique identifiers, this OWL schema provides a solution to the limitations of XML, which represents concepts defined in a localized setting. This will help increase the utilization of tissue resources, facilitating collaborative translational research efforts.
Ontology; OWL; tissue microarray
Tissue banking informatics deals with standardized annotation, collection and storage of biospecimens that can further be shared by researchers. Over the last decade, the Department of Biomedical Informatics (DBMI) at the University of Pittsburgh has developed various tissue banking informatics tools to expedite translational medicine research. In this review, we describe the technical approach and capabilities of these models.
Clinical annotation of biospecimens requires data retrieval from various clinical information systems and the de-identification of the data by an honest broker. Based upon these requirements, DBMI, with its collaborators, has developed both Oracle-based organ-specific data marts and a more generic, model-driven architecture for biorepositories. The organ-specific models are developed utilizing Oracle 126.96.36.199 server tools and software applications and the model-driven architecture is implemented in a J2EE framework.
The organ-specific biorepositories implemented by DBMI include the Cooperative Prostate Cancer Tissue Resource (http://www.cpctr.info/), Pennsylvania Cancer Alliance Bioinformatics Consortium (http://pcabc.upmc.edu/main.cfm), EDRN Colorectal and Pancreatic Neoplasm Database (http://edrn.nci.nih.gov/) and Specialized Programs of Research Excellence (SPORE) Head and Neck Neoplasm Database (http://spores.nci.nih.gov/current/hn/index.htm). The model-based architecture is represented by the National Mesothelioma Virtual Bank (http://mesotissue.org/). These biorepositories provide thousands of well annotated biospecimens for the researchers that are searchable through query interfaces available via the Internet.
These systems, developed and supported by our institute, serve to form a common platform for cancer research to accelerate progress in clinical and translational research. In addition, they provide a tangible infrastructure and resource for exposing research resources and biospecimen services in collaboration with the clinical anatomic pathology laboratory information system (APLIS) and the cancer registry information systems.
Tissue banking informatics; information models for translational research
Clinical records are often unstructured, free-text documents that create information extraction challenges and costs. Healthcare delivery and research organizations, such as the National Mesothelioma Virtual Bank, require the aggregation of both structured and unstructured data types. Natural language processing offers techniques for automatically extracting information from unstructured, free-text documents.
Five hundred and eight history and physical reports from mesothelioma patients were split into development (208) and test sets (300). A reference standard was developed and each report was annotated by experts with regard to the patient’s personal history of ancillary cancer and family history of any cancer. The Hx application was developed to process reports, extract relevant features, perform reference resolution and classify them with regard to cancer history. Two methods, Dynamic-Window and ConText, for extracting information were evaluated. Hx’s classification responses using each of the two methods were measured against the reference standard. The average Cohen’s weighted kappa served as the human benchmark in evaluating the system.
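As a sketch of the agreement statistic used as the human benchmark, the code below computes Cohen's weighted kappa with linear disagreement weights for two annotators. The category labels and the choice of linear weights are assumptions made for illustration; the abstract does not state which weighting scheme or label set was used.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with linear weights over ordered categories.

    Undefined (division by zero) if both raters use a single category throughout.
    """
    k = len(categories)
    index = {category: i for i, category in enumerate(categories)}
    n = len(rater_a)

    # Observed proportion of items in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 for maximal disagreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j]
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical annotations of four reports by two experts.
labels = ["negative", "positive"]
expert_1 = ["negative", "negative", "positive", "positive"]
expert_2 = ["negative", "positive", "positive", "positive"]
print(weighted_kappa(expert_1, expert_2, labels))
```

With only two categories the weighted and unweighted statistics coincide; the weights matter when there are three or more ordered labels.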
Hx had a high overall accuracy, scoring 96.2% with each method. F-measures using the Dynamic-Window and ConText methods were 91.8% and 91.6%, which were comparable to the human benchmark of 92.8%. For the personal history classification, Dynamic-Window scored highest with 89.2%, and for the family history classification, ConText scored highest with 97.6%; both were comparable to the human benchmarks of 88.3% and 97.2%, respectively.
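For readers unfamiliar with the F-measure reported here, the sketch below shows how it is derived from true positives, false positives, and false negatives. The counts are hypothetical and are not the study's confusion matrix.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-measure: weighted harmonic mean of precision and recall (beta=1 weights them equally)."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    beta_sq = beta ** 2
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts: 90 correct positive classifications, 10 spurious, 10 missed.
print(f"{f_measure(90, 10, 10):.3f}")
```

Unlike overall accuracy, the F-measure ignores true negatives, so it is less inflated when most reports contain no cancer history at all.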
We evaluated an automated application’s performance in classifying a mesothelioma patient’s personal and family history of cancer from clinical reports. To do so, the Hx application must process reports, identify cancer concepts, distinguish the known mesothelioma from ancillary cancers, recognize negation, perform reference resolution and determine the experiencer. Results indicated that both information extraction methods tested were dependent on the domain-specific lexicon and negation extraction. We showed that the more general method, ConText, performed as well as our task-specific method. Although Dynamic-Window could be modified to retrieve other concepts, ConText is more robust and performs better on inconclusive concepts. Hx could greatly improve and expedite the process of extracting data from free-text, clinical records for a variety of research or healthcare delivery organizations.
Information extraction; natural language processing; cancer history classification
We report on the development of an instrument to measure clinicians’ perceptions of their personal power in the workplace in relation to resistance to computerized physician order entry (CPOE). The instrument is based on French and Raven’s six bases of social power and uses a semantic differential methodology. A measurement study was conducted to determine the reliability and validity of the survey. The survey was administered online and distributed via a URL by email to 19 physicians, nurses, and health unit coordinators from a university hospital. Acceptable reliability was achieved by removing or moving some semantic differential word pairs used to represent the six power bases (alpha range from 0.76–0.89). The Semantic Differential Power Perception (SDPP) survey validity was tested against an already validated instrument and found to be acceptable (correlation range from 0.51–0.81). The SDPP survey instrument was determined to be both reliable and valid.
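Assuming the reported alpha values are Cronbach's alpha (the abstract does not say so explicitly), the reliability of one power-base scale could be computed roughly as follows. The ratings are hypothetical, and `cronbach_alpha` is an illustrative helper, not part of the SDPP instrument.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a scale.

    item_scores: one list per item (e.g., word pair), each holding the
    ratings given by the same respondents in the same order.
    """
    k = len(item_scores)
    # Total scale score for each respondent.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    sum_item_variances = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical 7-point semantic differential ratings: 3 word pairs, 4 respondents.
ratings = [
    [2, 4, 5, 7],
    [3, 4, 6, 6],
    [2, 5, 5, 7],
]
print(round(cronbach_alpha(ratings), 3))
```

Removing a word pair that correlates poorly with the others lowers the summed item variance relative to the total variance, which is one way alpha can rise after items are dropped or reassigned.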
power; resistance; measurement; questionnaire; electronic health records; clinical informatics; socio-technical; human factors; hospital information systems; computerized provider order entry
Honest broker services are essential for tissue- and data-based research. The honest broker provides a firewall between clinical and research activities. Clinical information is stripped of Health Insurance Portability and Accountability Act-denoted personal health identifiers. Research material may have linkage codes, precluding the identification of patients to researchers. The honest broker provides data derived from clinical and research sources. These data are for research use only, and there are rules in place that prohibit reidentification. Very rarely, the institutional review board (IRB) may allow recontact and develop a recontact plan with the honest broker. Certain databases are structured to serve a clinical and research function and incorporate ‘real-time’ updating of information. This complex process needs resolution of a variety of issues regarding the precise role of honest brokers and their interaction with data. There also is an obvious need for software solutions to make the task of deidentification easier.
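The linkage-code idea can be sketched as below. This is a toy model under stated assumptions, not the software described in the paper: the class `HonestBroker`, the field names, and the identifier list are all hypothetical, and the identifier list is far shorter than the set of identifiers a real deidentification process must remove.

```python
import secrets


class HonestBroker:
    """Holds the only mapping between patient identifiers and linkage codes."""

    def __init__(self, identifier_fields):
        self._identifier_fields = set(identifier_fields)
        self._code_by_patient = {}  # stays on the broker's side of the firewall

    def deidentify(self, record, patient_key="mrn"):
        """Return a copy of the record with identifiers replaced by a linkage code."""
        patient_id = record[patient_key]
        if patient_id not in self._code_by_patient:
            # Random code: carries no information about the patient.
            self._code_by_patient[patient_id] = secrets.token_hex(8)
        research_record = {
            field: value
            for field, value in record.items()
            if field not in self._identifier_fields
        }
        research_record["linkage_code"] = self._code_by_patient[patient_id]
        return research_record


broker = HonestBroker(identifier_fields=["mrn", "name", "date_of_birth"])
first = broker.deidentify(
    {"mrn": "12345", "name": "Jane Doe", "date_of_birth": "1950-01-01", "diagnosis": "colon adenocarcinoma"}
)
later = broker.deidentify(
    {"mrn": "12345", "name": "Jane Doe", "date_of_birth": "1950-01-01", "diagnosis": "recurrence"}
)
print(first)
print(first["linkage_code"] == later["linkage_code"])
```

Because the same patient always receives the same code, researchers can link specimens and follow-up data over time, while only the broker could ever reverse the mapping under an IRB-approved recontact plan.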
The University of Pittsburgh has implemented a novel, IRB-approved mechanism to address honest broker functions to meet the specimen and data needs of researchers. The Tissue Bank stores biologic specimens. The Cancer Registry culls data and annotating information as part of state- and federal-mandated functions and collects data on the clinical progression, treatment, and outcomes of cancer patients. The Cancer Registry also has additional IRB approval to collect data elements only for research purposes. The Clinical Outcomes Group is involved in patient safety and health services research. Radiation Oncology and Medical Oncology provide critical treatment related information. Pathology and Oncology Informatics have designed software tools for querying availability of specimens, extracting data, and deidentifying specimens and annotating data for clinical and translational research. These entities partnered and submitted a joint IRB proposal to create an institutional honest broker facility. The employees of this conglomerate have honest broker agreements with the University of Pittsburgh and the Medical Center. This provides a large group of honest brokers, ensuring availability for projects without any conflict of interest.
The honest broker system has been an IRB-approved institutional entity at the University of Pittsburgh since 2003. The honest broker system currently includes 33 certified honest brokers encompassing the multiple partners of this system. The honest broker system has handled >1600 requests over the past 4 years with a 25% increase in volume each year.
The current results indicate that the collaborative honest broker model described herein is robust and provides a highly functional solution to the specimen and data needs for critical clinical and translational research activities.
honest broker; biologic specimens; data annotation; Institutional Review Board; tissue bank; translational research; Health Insurance Portability and Accountability Act of 1996
Rebecca Crowley and colleagues propose that academic health centers can and should lead the transition towards a culture of biomedical data sharing.
The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC, http://www.pcabc.upmc.edu) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance that was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing (1) a statewide data model for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium’s intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers.
The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission.
The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions’ IRB/HIPAA standards.
Currently, this “virtual biorepository” has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is done either manually over the Internet or in semi-automated batch mode through mapping of local data elements to PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has also developed a central web-based tool that allows investigators to share their translational (genomics/proteomics) experiment data on research evaluating potential biomarkers via a central location on the Consortium’s web site.
The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy.
Synoptic reporting, either as part of the pathology report or replacing some free text component, incorporates standardized data elements in the form of checklists for pathology reporting. This ensures that pathologists make note of these findings in their reports, thereby improving the quality and uniformity of information in the pathology reports.
The purpose of this project is to develop the entire set of elements in the synoptic templates or "worksheets" for hematologic and lymphoid neoplasms using the World Health Organization (WHO) Classification and the College of American Pathologists (CAP) Cancer Checklists. The CAP checklists' content was supplemented with the most updated classification scheme (WHO classification), specimen details, staging as well as information on various ancillary techniques such as cytochemical studies, immunophenotyping, cytogenetics including Fluorescent In-situ Hybridization (FISH) studies and genotyping. We have used a digital synoptic reporting system as part of an existing laboratory information system (LIS), CoPathPlus, from Cerner DHT, Inc. The synoptic elements are presented as discrete data points, so that a data element such as tumor type is assigned from the synoptic value dictionary under the value of tumor type, allowing the user to search for just those cases that have that value point populated.
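The benefit of storing synoptic elements as discrete data points can be illustrated with a small sketch. The element names, values, and accession numbers below are hypothetical and do not reflect the CoPathPlus data dictionary.

```python
# Each case stores synoptic elements as discrete key/value pairs rather than free text.
cases = [
    {"accession": "S-0001", "tumor_type": "follicular lymphoma", "fish_performed": True},
    {"accession": "S-0002", "tumor_type": "mantle cell lymphoma", "fish_performed": True},
    {"accession": "S-0003", "tumor_type": "follicular lymphoma", "fish_performed": False},
    {"accession": "S-0004"},  # synoptic worksheet not filled in for this case
]


def find_cases(cases, element, value):
    """Return accession numbers of cases where the given synoptic element has the given value."""
    return [case["accession"] for case in cases if case.get(element) == value]


print(find_cases(cases, "tumor_type", "follicular lymphoma"))
```

The same query against free-text reports would require text search and would miss spelling variants, which is the practical argument for discrete value dictionaries.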
These synoptic worksheets are implemented for use in our LIS. The data are stored as discrete data elements and appear as an accession summary within the final pathology report. In addition, the synoptic data can be exported to research databases for linking pathological details on banked tissues.
Synoptic reporting provides a structured method for entering the diagnostic as well as prognostic information for a particular pathology specimen or sample, thereby reducing transcription services and reducing specimen turnaround time. Furthermore, it provides accurate and consistent diagnostic information dictated by pathologists as a basis for appropriate therapeutic modalities. Using synoptic reports, consistent data elements with minimized typographical and transcription errors can be generated and placed in the LIS relational database, enabling quicker access to desired information and improved communication for appropriate cancer management. The templates will also eventually serve as a conduit for capturing and storing data in the virtual biorepository for translational research. Such uniformity of data lends itself to subsequent ease of data viewing and extraction, as demonstrated by rapid production of standardized, high-quality data from the hemopoietic and lymphoid neoplasm specimens.
Microarray studies in cancer compare expression levels between two or more sample groups on thousands of genes. Data analysis follows a population-level approach (e.g., comparison of sample means) to identify differentially expressed genes. This leads to the discovery of 'population-level' markers, i.e., genes with the expression patterns A > B and B > A. We introduce the PPST test that identifies genes where a significantly large subset of cases exhibit expression values beyond upper and lower thresholds observed in the control samples.
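The idea of looking for subsets of cases beyond control-derived thresholds, rather than comparing means, can be sketched as below. This is a simplified illustration only: it uses the control minimum and maximum as thresholds and omits any significance assessment, so it should not be read as the PPST test itself. The expression values are hypothetical.

```python
def count_beyond_control_range(case_values, control_values):
    """Count cases above the highest and below the lowest control expression value."""
    upper = max(control_values)
    lower = min(control_values)
    over = sum(1 for value in case_values if value > upper)
    under = sum(1 for value in case_values if value < lower)
    return over, under


# Hypothetical expression values for one gene.
controls = [4.0, 5.0, 6.0]
tumors = [1.0, 5.0, 9.0, 10.0, 2.0]

over, under = count_beyond_control_range(tumors, controls)
print(f"overexpressed in {over} cases, underexpressed in {under} cases")
```

In this toy example the tumor mean (5.4) is close to the control mean (5.0), so a comparison of means would see little difference even though four of five tumors fall outside the control range, two in each direction.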
Interestingly, the test identifies A > B and B < A pattern genes that are missed by population-level approaches, such as the t-test, and many genes that exhibit both significant overexpression and significant underexpression in statistically significantly large subsets of cancer patients (ABA pattern genes). These patterns tend to show distributions that are unique to individual genes, and are aptly visualized in a 'gene expression pattern grid'. The low degree of among-gene correlations in these genes suggests unique underlying genomic pathologies and high degree of unique tumor-specific differential expression. We compare the PPST and the ABA test to the parametric and non-parametric t-test by analyzing two independently published data sets from studies of progression in astrocytoma.
The PPST test yielded findings similar to the nonparametric t-test with higher self-consistency. These tests and the gene expression pattern grid may be useful for the identification of therapeutic targets and diagnostic or prognostic markers that are present only in subsets of cancer patients, and provide a more complete portrait of differential expression in cancer.
The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health.
Ontology; Biositemaps; Resources; Biomedical research; Resource annotation; Resource discovery; Search; Semantic web; Web 2.0; Clinical and Translational Science Awards
A genome-wide association study (GWAS) typically involves examining representative SNPs in individuals from some population. A GWAS data set can concern a million SNPs and may soon concern billions. Researchers investigate the association of each SNP individually with a disease, and it is becoming increasingly commonplace to also analyze multi-SNP associations. Techniques for handling so many hypotheses include the Bonferroni correction and recently developed Bayesian methods. These methods can encounter problems. Most importantly, they are not applicable to a complex multi-locus hypothesis which has several competing hypotheses rather than only a null hypothesis. A method that computes the posterior probability of complex hypotheses is a pressing need.
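To make the scale of the multiple-hypothesis problem concrete, the following sketch computes Bonferroni-corrected significance thresholds for single-SNP and two-SNP tests. The family-wise error rate of 0.05 and the function name are assumptions chosen for illustration; only the figure of one million SNPs comes from the text.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

n_snps = 1_000_000                     # SNP count mentioned in the text
single_snp_threshold = bonferroni_threshold(0.05, n_snps)

n_pairs = math.comb(n_snps, 2)         # number of unordered two-SNP combinations
pair_threshold = bonferroni_threshold(0.05, n_pairs)

print(f"single-SNP threshold: {single_snp_threshold:.1e}")
print(f"two-SNP hypotheses:   {n_pairs:,}")
print(f"two-SNP threshold:    {pair_threshold:.1e}")
```

The per-test threshold shrinks in proportion to the number of hypotheses, which is one reason exhaustive multi-SNP analysis with a simple correction becomes impractical.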
We introduce the Bayesian network posterior probability (BNPP) method, which addresses these difficulties. The method represents the relationship between a disease and SNPs using a directed acyclic graph (DAG) model, and computes the likelihood of such models using a Bayesian network scoring criterion. The posterior probability of a hypothesis is computed based on the likelihoods of all competing hypotheses. The BNPP can be used not only to evaluate a hypothesis that has previously been discovered or suspected, but also to discover new disease loci associations. The results of experiments using simulated and real data sets are presented. Our results concerning simulated data sets indicate that the BNPP exhibits both better evaluation and discovery performance than does a p-value based method. For the real data sets, previous findings in the literature are confirmed and additional findings are reported.
We conclude that the BNPP resolves a pressing problem by providing a way to compute the posterior probability of complex multi-locus hypotheses. A researcher can use the BNPP to determine the expected utility of investigating a hypothesis further. Furthermore, we conclude that the BNPP is a promising method for discovering disease loci associations.
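The core computation described above, scoring competing DAG models and normalizing their likelihoods into posterior probabilities, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the use of the K2 scoring criterion, the uniform prior over hypotheses, the function names and the toy data are all assumptions, since the abstract does not specify them.

```python
from collections import Counter
from math import exp, lgamma, log

def log_k2_score(disease, parents, n_states=2):
    """Log K2 score of the disease node given a list of parent SNP columns.

    disease: list of disease states; parents: list of genotype columns.
    """
    rows = list(zip(*parents)) if parents else [()] * len(disease)
    joint = Counter(zip(rows, disease))   # counts per (parent config, disease state)
    per_config = Counter(rows)            # counts per parent config
    score = 0.0
    for config, n_j in per_config.items():
        # log[(r - 1)! / (N_j + r - 1)!]
        score += lgamma(n_states) - lgamma(n_j + n_states)
        for state in range(n_states):
            score += lgamma(joint[(config, state)] + 1)   # log(N_jk!)
    return score

def posterior_probabilities(log_likelihoods, priors=None):
    """Normalize log-likelihoods of competing hypotheses into posteriors."""
    names = list(log_likelihoods)
    if priors is None:                    # assumed: uniform prior over hypotheses
        priors = {name: 1.0 / len(names) for name in names}
    log_joint = {n: log_likelihoods[n] + log(priors[n]) for n in names}
    m = max(log_joint.values())           # subtract the max for numerical stability
    total = sum(exp(v - m) for v in log_joint.values())
    return {n: exp(v - m) / total for n, v in log_joint.items()}

# Toy data (hypothetical): SNP1 tracks disease status, SNP2 does not.
snp1 = [0] * 20 + [1] * 20
snp2 = [0, 1] * 20
disease = [0] * 18 + [1] * 2 + [1] * 18 + [0] * 2

hypotheses = {
    "no association": [],
    "SNP1": [snp1],
    "SNP2": [snp2],
    "SNP1+SNP2": [snp1, snp2],
}
log_liks = {name: log_k2_score(disease, cols) for name, cols in hypotheses.items()}
posteriors = posterior_probabilities(log_liks)
for name, prob in posteriors.items():
    print(f"{name:15s} {prob:.3f}")
```

Because the posterior of each hypothesis is normalized against all competitors, a two-locus model is only favored when the extra locus improves the score enough to offset the additional parameters, which a single null-versus-alternative p-value does not express.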
Clinical and translational research increasingly requires computation. Projects may involve multiple computationally-oriented groups including information technology (IT) professionals, computer scientists and biomedical informaticians. However, many biomedical researchers are not aware of the distinctions among these complementary groups, leading to confusion, delays and sub-optimal results. Although written from the perspective of clinical and translational science award (CTSA) programs within academic medical centers, the paper addresses issues that extend beyond clinical and translational research. The authors describe the complementary but distinct roles of operational IT, research IT, computer science and biomedical informatics using a clinical data warehouse as a running example. In general, IT professionals focus on technology. The authors distinguish between two types of IT groups within academic medical centers: central or administrative IT (supporting the administrative computing needs of large organizations) and research IT (supporting the computing needs of researchers). Computer scientists focus on general issues of computation such as designing faster computers or more efficient algorithms, rather than specific applications. In contrast, informaticians are concerned with data, information and knowledge. Biomedical informaticians draw on a variety of tools, including but not limited to computers, to solve information problems in health care and biomedicine. The paper concludes with recommendations regarding administrative structures that can help to maximize the benefit of computation to biomedical research within academic health centers.
To determine the minimal frequency of laboratory monitoring of 30 types of chronic medications or classes that are administered to nursing facility residents and are either listed under pharmacy services tag F329 (the tag for unnecessary medications), or have a narrow therapeutic index.
Design and Setting
Cross-sectional, Internet-based survey.
National sample of 500 pharmacists, 500 nurse practitioners, and 327 physicians.
Main Outcome Measure
Minimal frequency of monitoring, recorded as an interval of 1, 3, 6, 9, or 12 months, for each of 35 laboratory parameters (e.g., serum drug level, complete blood count, liver function tests) for the 30 types of chronic medications or classes. Agreement was defined as having two or more of the three professional groups select the same minimal monitoring interval.
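The agreement rule stated above can be expressed as a short function. This sketch assumes that each professional group has already been reduced to a single selected interval; how individual responses within a group were aggregated is not described here, and the function name and example values are hypothetical.

```python
from collections import Counter

ALLOWED_INTERVALS = {1, 3, 6, 9, 12}   # months, as listed in the outcome measure

def agreed_interval(choices):
    """Return the interval chosen by two or more groups, or None if there is none.

    choices: mapping from professional group name to its selected interval (months).
    """
    for group, interval in choices.items():
        if interval not in ALLOWED_INTERVALS:
            raise ValueError(f"{group}: {interval} is not an allowed interval")
    interval, n_groups = Counter(choices.values()).most_common(1)[0]
    return interval if n_groups >= 2 else None

# Hypothetical selections for one laboratory parameter.
example = {"pharmacists": 3, "physicians": 3, "nurse practitioners": 6}
print(agreed_interval(example))
```

With three groups, at most one interval can be selected by two or more of them, so the rule never yields conflicting answers.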
Overall, 116 professionals (20 pharmacists, 48 physicians, and 48 nurse practitioners) completed the survey. Most respondents were women (58.6% [68/116]), and most had worked in nursing facilities for > 5 years (66.4% [77/116]). Regarding minimal laboratory monitoring intervals, respondents reached agreement concerning 33 of 35 parameters. They selected three or six months as the minimum interval for 30 of 35 parameters (85.7%), one month as the minimum for two parameters, and 12 months as the minimum for one parameter.
The multidisciplinary panel agreed that most medications that were listed under the F329 tag or have a narrow therapeutic index should have laboratory monitoring every three or six months. The results can be used by nursing facility professionals to establish minimal laboratory monitoring parameters for chronic medications, which may potentially reduce the occurrence of adverse drug reactions.
Adverse drug reactions; Drug monitoring; Nursing facility
Advances in translational research have led to the need for well-characterized biospecimens for research. The National Mesothelioma Virtual Bank is an initiative that collects annotated datasets relevant to human mesothelioma in order to develop an enterprising biospecimen resource that fulfills researchers' needs.
The National Mesothelioma Virtual Bank architecture is based on three major components: (a) common data elements (based on College of American Pathologists protocol and North American Association of Central Cancer Registries standards), (b) clinical and epidemiologic data annotation, and (c) data query tools. These tools work interoperably to standardize the entire process of annotation. The National Mesothelioma Virtual Bank tool is based upon the caTISSUE Clinical Annotation Engine, developed by the University of Pittsburgh in cooperation with the Cancer Biomedical Informatics Grid™ (caBIG™). This application provides a web-based system for annotating, importing and searching mesothelioma cases. The underlying information model is constructed utilizing Unified Modeling Language class diagrams, hierarchical relationships and Enterprise Architect software.
The database provides researchers real-time access to richly annotated specimens and integral information related to mesothelioma. Data disclosure is tightly regulated according to each user's authorization and the participating institution, and is subject to review by the local Institutional Review Board and regulatory committees.
The National Mesothelioma Virtual Bank currently has over 600 annotated cases available for researchers that include paraffin embedded tissues, tissue microarrays, serum and genomic DNA. The National Mesothelioma Virtual Bank is a virtual biospecimen registry with robust translational biomedical informatics support to facilitate basic science, clinical, and translational research. Furthermore, it protects patient privacy by disclosing only de-identified datasets to assure that biospecimens can be made accessible to researchers.