Genome-wide association studies (GWAS) have successfully identified numerous genetic loci that are associated with phenotypic traits and diseases. GWAS Integrator is a bioinformatics tool that integrates information on these associations from the National Human Genome Research Institute (NHGRI) Catalog, SNAP (SNP Annotation and Proxy Search), and the Human Genome Epidemiology (HuGE) Navigator literature database. This tool includes robust search and data mining functionalities that can be used to quickly identify relevant associations from GWAS, as well as proxy single-nucleotide polymorphisms (SNPs) and potential candidate genes. Query-based University of California Santa Cruz (UCSC) Genome Browser custom tracks are generated dynamically on the basis of users' selected GWAS hits or candidate genes from the HuGE Navigator literature database (http://www.hugenavigator.net/HuGENavigator/gWAHitStartPage.do). The GWAS Integrator may help enhance inference on potential genetic associations identified from GWAS.
genome-wide association studies; database; bioinformatics
Pathogen genetics is already a mainstay of public health investigation and control efforts; now advances in technology make it possible to investigate the role of human genetic variation in the epidemiology of infectious diseases. To describe trends in this field, we analyzed articles that were published from 2001 through 2010 and indexed by the HuGE Navigator, a curated online database of PubMed abstracts in human genome epidemiology. We extracted the principal findings from all meta-analyses and genome-wide association studies (GWAS) with an infectious disease-related outcome. Finally, we compared the representation of diseases in HuGE Navigator with their contributions to morbidity worldwide. We identified 3,730 articles on infectious diseases, including 27 meta-analyses and 23 GWAS. The number published each year increased from 148 in 2001 to 543 in 2010 but remained a small fraction (about 7%) of all studies in human genome epidemiology. Most articles were by authors from developed countries, but the percentage by authors from resource-limited countries increased from 9% to 25% during the period studied. The most commonly studied diseases were HIV/AIDS, tuberculosis, hepatitis B infection, hepatitis C infection, sepsis, and malaria. As genomic research methods become more affordable and accessible, population-based research on infectious diseases will be able to examine the role of variation in human as well as pathogen genomes. This approach offers new opportunities for understanding infectious disease susceptibility, severity, treatment, control, and prevention.
Persons with low-producing MBL2 genotypes may be at increased risk for MRSA co-infection.
We compared the prevalence of 8 polymorphisms in the tumor necrosis factor and mannose-binding lectin genes among 105 children and young adults with fatal influenza with US population estimates and determined in subanalyses whether these polymorphisms were associated with sudden death and bacterial co-infection among persons with fatal influenza. No differences were observed in genotype prevalence or minor allele frequencies between persons with fatal influenza and the reference sample. Fatal cases with low-producing MBL2 genotypes had a 7-fold increased risk for invasive methicillin-resistant Staphylococcus aureus (MRSA) co-infection compared with fatal cases with high- and intermediate-producing MBL2 genotypes (odds ratio 7.1, 95% confidence interval 1.6–32.1). Limited analysis of 2 genes important to the innate immune response found no association between genetic variants and fatal influenza infection. Among children and young adults who died of influenza, those with low-producing MBL2 genotypes may have been at increased risk for MRSA co-infection.
viruses; influenza; children; genetics; polymorphism; death; Staphylococcus aureus
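The odds ratio and confidence interval reported above are the kind of estimate obtained from a 2x2 table of genotype by co-infection status. The abstract does not give the cell counts, so the counts in this sketch are hypothetical and chosen only for illustration. The Wald (log-scale) interval is used here as an assumption and may not be the method the authors applied. The last line checks one property that can be verified from the reported figures alone: on the log scale, a Wald interval is centered on the point estimate.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# HYPOTHETICAL counts (not from the study): low-producing genotype vs. others,
# with and without MRSA co-infection.
or_est, ci_low, ci_high = odds_ratio_wald(4, 6, 8, 84)

# Consistency check using only the reported interval (1.6 to 32.1):
# the geometric mean of the bounds should sit near the reported estimate of 7.1.
reported_center = math.sqrt(1.6 * 32.1)
```

With small cell counts, as in this hypothetical table, the interval is wide, which is consistent with the width of the interval reported in the abstract.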
The rapid and continuing progress in gene discovery for complex diseases is fuelling interest in the potential application of genetic risk models for clinical and public health practice. The number of studies assessing the predictive ability is steadily increasing, but they vary widely in completeness of reporting and apparent quality. Transparent reporting of the strengths and weaknesses of these studies is important to facilitate the accumulation of evidence on genetic risk prediction. A multidisciplinary workshop sponsored by the Human Genome Epidemiology Network developed a checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS), building on the principles established by prior reporting guidelines. These recommendations aim to enhance the transparency, quality and completeness of study reporting, and thereby to improve the synthesis and application of information from multiple studies that might differ in design, conduct or analysis.
Genetic; Risk prediction; Methodology; Guidelines; Reporting
Public health preparedness requires effective surveillance of and rapid response to infectious disease outbreaks. Inclusion of research activities within the outbreak setting provides important opportunities to maximize limited resources, to enhance gains in scientific knowledge, and ultimately to increase levels of preparedness. With rapid advances in laboratory technologies, banking and analysis of human genomic specimens can be conducted as part of public health investigations, enabling valuable research well into the future.
Evidence on Genomic Tests is an open access publication option for communicating high-quality, scientific information that is needed to evaluate health applications of genomic research. By using Google’s knol platform, we aim to reduce conventional barriers to sharing, updating, and accessing the results of knowledge synthesis and to increase the benefits to authors and users alike.
The increasing availability of personal genomic tests has led to discussions about the validity and utility of such tests and the balance of benefits and harms. A multidisciplinary workshop was convened by the National Institutes of Health and the Centers for Disease Control and Prevention to review the scientific foundation for using personal genomics in risk assessment and disease prevention and to develop recommendations for targeted research. The clinical validity and utility of personal genomics is a moving target with rapidly developing discoveries but little translation research to close the gap between discoveries and health impact. Workshop participants made recommendations in five domains: (1) developing and applying scientific standards for assessing personal genomic tests; (2) developing and applying a multidisciplinary research agenda, including observational studies and clinical trials to fill knowledge gaps in clinical validity and utility; (3) enhancing credible knowledge synthesis and information dissemination to clinicians and consumers; (4) linking scientific findings to evidence-based recommendations for use of personal genomics; and (5) assessing how the concept of personal utility can affect health benefits, costs, and risks by developing appropriate metrics for evaluation. To fulfill the promise of personal genomics, a rigorous multidisciplinary research agenda is needed.
behavioral sciences; epidemiologic methods; evidence-based medicine; genetics; genetic testing; genomics; medicine; public health
Recent emphasis on translational research (TR) is highlighting the role of epidemiology in translating scientific discoveries into population health impact. The authors present applications of epidemiology in TR through 4 phases designated T1–T4, illustrated by examples from human genomics. In T1, epidemiology explores the role of a basic scientific discovery (e.g., a disease risk factor or biomarker) in developing a “candidate application” for use in practice (e.g., a test used to guide interventions). In T2, epidemiology can help to evaluate the efficacy of a candidate application by using observational studies and randomized controlled trials. In T3, epidemiology can help to assess facilitators and barriers for uptake and implementation of candidate applications in practice. In T4, epidemiology can help to assess the impact of using candidate applications on population health outcomes. Epidemiology also has a leading role in knowledge synthesis, especially using quantitative methods (e.g., meta-analysis). To explore the emergence of TR in epidemiology, the authors compared articles published in selected issues of the Journal in 1999 and 2009. The proportion of articles identified as translational doubled from 16% (11/69) in 1999 to 33% (22/66) in 2009 (P = 0.02). Epidemiology is increasingly recognized as an important component of TR. By quantifying and integrating knowledge across disciplines, epidemiology provides crucial methods and tools for TR.
epidemiology; genomics; medicine; public health; translational research
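The comparison of translational articles in 1999 (11/69) and 2009 (22/66) can be checked with a simple test of two proportions. The abstract does not say which test produced P = 0.02, so the pooled two-proportion z-test below is an assumption; it is a minimal sketch that uses only the counts given in the text.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test for a difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts taken from the abstract: 11 of 69 articles in 1999, 22 of 66 in 2009.
z_stat, p_value = two_proportion_z_test(11, 69, 22, 66)
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.3f}")
```

Under this assumption the p-value rounds to the reported 0.02; a continuity-corrected or exact test would give a somewhat different figure.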
We present a potentially useful alternative approach based on support vector machine (SVM) techniques to classify persons with and without common diseases. We illustrate the method by using it to detect persons with diabetes and pre-diabetes in a cross-sectional representative sample of the U.S. population.
We used data from the 1999-2004 National Health and Nutrition Examination Survey (NHANES) to develop and validate SVM models for two classification schemes: Classification Scheme I (diagnosed or undiagnosed diabetes vs. pre-diabetes or no diabetes) and Classification Scheme II (undiagnosed diabetes or pre-diabetes vs. no diabetes). The SVM models were used to select sets of variables that would yield the best classification of individuals into these diabetes categories.
For Classification Scheme I, the set of diabetes-related variables with the best classification performance included family history, age, race and ethnicity, weight, height, waist circumference, body mass index (BMI), and hypertension. For Classification Scheme II, two additional variables, sex and physical activity, were included. The discriminative abilities of the SVM models for Classification Schemes I and II, according to the area under the receiver operating characteristic (ROC) curve, were 83.5% and 73.2%, respectively. A web-based tool, Diabetes Classifier, was developed to demonstrate a user-friendly application that allows for individual or group assessment with a configurable, user-defined threshold.
Support vector machine modeling is a promising classification approach for detecting persons with common diseases such as diabetes and pre-diabetes in the population. This approach should be further explored in other complex diseases using common variables.
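The two ideas used to summarize the classifiers above, the area under the ROC curve and a user-defined decision threshold, can be sketched without any modeling library. The labels and scores below are made up for illustration and have no connection to the NHANES data; the functions are not the authors' implementation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def classify(scores, threshold):
    """Turn continuous scores into 0/1 calls using a configurable threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# MADE-UP example data: 1 = condition present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.7]

auc = roc_auc(labels, scores)
calls = classify(scores, threshold=0.5)
```

The AUC does not depend on the threshold; changing the threshold only trades sensitivity against specificity along the same curve.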
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modeling haplotype variation, Hardy–Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis.
Gene–disease associations; Genetics; Gene–environment interaction; Systematic review; Meta analysis; Reporting recommendations; Epidemiology; Genome-wide association
Genome-wide association studies (GWAS) have led to a rapid increase in available data on common genetic variants and phenotypes and numerous discoveries of new loci associated with susceptibility to common complex diseases. Integrating the evidence from GWAS and candidate gene studies depends on concerted efforts in data production, online publication, database development, and continuously updated data synthesis. Here the authors summarize current experience and challenges on these fronts, which were discussed at a 2008 multidisciplinary workshop sponsored by the Human Genome Epidemiology Network. Comprehensive field synopses that integrate many reported gene-disease associations have been systematically developed for several fields, including Alzheimer's disease, schizophrenia, bladder cancer, coronary heart disease, preterm birth, and DNA repair genes in various cancers. The authors summarize insights from these field synopses and discuss remaining unresolved issues, especially in the light of evidence from GWAS, for which they summarize empirical P-value and effect-size data on 223 discovered associations for binary outcomes (142 with P < 10^-7). They also present a vision of collaboration that builds reliable cumulative evidence for genetic associations with common complex diseases and a transparent, distributed, authoritative knowledge base on genetic variation and human health. As a next step in the evolution of Human Genome Epidemiology reviews, the authors invite investigators to submit field synopses for possible publication in the American Journal of Epidemiology.
association; database; encyclopedias; epidemiologic methods; genome, human; genome-wide association study; genomics; meta-analysis
Despite large public investments in genome-wide association studies of common human diseases, so far, few gene discoveries have led to applications for clinical medicine or public health. Genome-wide association studies in the context of clinical trials of drug safety and efficacy may be quicker to yield clinical applications. Certain methodological concerns, such as selection bias and confounding, may be mitigated when genome-wide association studies are conducted within clinical trials, in which randomization of exposure, prospective evaluation of outcome and careful definition of phenotype are incorporated by design.
We analyzed the use of RefSNP (rs) numbers to identify genetic variants in abstracts of human genetic association studies published from 2001 through 2007. The proportion of abstracts reporting rs numbers increased rapidly but was still only 15% in 2007. We developed a web-based tool called Variant Name Mapper to assist in mapping historical genetic variant names to rs numbers. The consistent use of rs numbers in abstracts that report genetic associations would enhance knowledge synthesis and translation in this field.
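Counting abstracts that report rs numbers comes down to recognizing the identifier pattern in free text. The sketch below is one simple way to do that with a regular expression; it is not the Variant Name Mapper itself, and the example sentence and identifiers are made up.

```python
import re

# An rs number is the letters "rs" followed by digits, e.g. rs12345.
RS_PATTERN = re.compile(r"\brs\d+\b", re.IGNORECASE)


def find_rs_numbers(text):
    """Return the distinct rs numbers in text, lowercased, in order of first use."""
    found = []
    for match in RS_PATTERN.finditer(text):
        rs_id = match.group(0).lower()
        if rs_id not in found:
            found.append(rs_id)
    return found


# MADE-UP abstract text with placeholder identifiers.
example_abstract = (
    "Variants rs12345 and RS67890 were genotyped; "
    "rs12345 was associated with the outcome."
)
rs_ids = find_rs_numbers(example_abstract)
```

Mapping historical names (for example, amino acid substitutions) to rs numbers needs a curated lookup table and cannot be done by pattern matching alone.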
Julian Little and colleagues present the STREGA recommendations, which are aimed at improving the reporting of genetic association studies.
gene-disease associations; genetics; gene-environment interaction; systematic review; meta analysis; reporting recommendations; epidemiology; genome-wide association
Millions of single nucleotide polymorphisms have been identified as a result of the human genome project and the rapid advance of high throughput genotyping technology. Genetic association studies, such as recent genome-wide association studies (GWAS), have provided a springboard for exploring the contribution of inherited genetic variation and gene/environment interactions in relation to disease. Given the capacity of such studies to produce a plethora of information that may then be described in a number of publications, selecting possible disease susceptibility genes and identifying related modifiable risk factors is a major challenge. A Web-based application for finding evidence of such relationships is key to the development of follow-up studies and evidence for translational research.
We developed a Web-based application that selects and prioritizes potential disease-related genes by using a highly curated and updated literature database of genetic association studies. The application, called Gene Prospector, also provides a comprehensive set of links to additional data sources.
We compared Gene Prospector results for the query "Parkinson" with a list of 13 leading candidate genes (Top Results) from a curated, specialty database for genetic associations with Parkinson disease (PDGene). Nine of the 13 leading candidate genes from PDGene were in the top 10th percentile of the ranked list from Gene Prospector. In fact, Gene Prospector included more published genetic association studies for the 13 leading candidate genes than PDGene did.
Gene Prospector provides an online gateway for searching for evidence about human genes in relation to diseases, other phenotypes, and risk factors, and provides links to published literature and other online data sources. Gene Prospector can be accessed via .
Several thousand human genome epidemiology association studies are published every year investigating the relationship between common genetic variants and diverse phenotypes. Transparent reporting of study methods and results allows readers to better assess the validity of study findings. Here, we document reporting practices of human genome epidemiology studies.
Articles were randomly selected from a continuously updated database of human genome epidemiology association studies to be representative of genetic epidemiology literature. The main analysis evaluated 315 articles published in 2001–2003. For a comparative update, we evaluated 28 more recent articles published in 2006, focusing on issues that were poorly reported in 2001–2003.
During both time periods, most studies comprised relatively small study populations and examined one or more genetic variants within a single gene. Articles were inconsistent in reporting the data needed to assess selection bias and the methods used to minimize misclassification (of the genotype, outcome, and environmental exposure) or to identify population stratification. Statistical power, the use of unrelated study participants, and the use of replicate samples were reported more often in articles published during 2006 when compared with the earlier sample.
We conclude that many items needed to assess error and bias in human genome epidemiology association studies are not consistently reported. Although some improvements were seen over time, reporting guidelines and online supplemental material may help enhance the transparency of this literature.
Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies.
The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy.
GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
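The combination of very high recall and specificity with modest precision reported for the SVM screen is what one expects when relevant abstracts are rare among those screened. The sketch below shows the standard definitions and the algebra linking them. The implied prevalence it computes is not stated in the abstract; it is only what the three reported figures would jointly imply if all were measured on the same test set.

```python
def screening_metrics(tp, fp, tn, fn):
    """Recall, specificity, and precision from confusion-matrix counts."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return recall, specificity, precision


def precision_from_rates(recall, specificity, prevalence):
    """Precision expected at a given prevalence of relevant abstracts."""
    true_pos = recall * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_prevalence(recall, specificity, precision):
    """Solve the precision formula above for prevalence."""
    fpr = 1 - specificity
    return precision * fpr / (recall * (1 - precision) + precision * fpr)


# Figures reported in the abstract: 97.5% recall, 98.3% specificity, 31.9% precision.
prevalence = implied_prevalence(0.975, 0.983, 0.319)
print(f"implied prevalence of relevant abstracts: {prevalence:.2%}")
```

Under these assumptions, fewer than 1 in 100 screened abstracts would be relevant, so even a 1.7% false-positive rate produces more false positives than true positives.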
Identifying relevant research in an ever-growing body of published literature is becoming increasingly difficult. Establishing domain-specific knowledge bases may be a more effective and efficient way to manage and query information within specific biomedical fields. Adopting controlled vocabulary is a critical step toward data integration and interoperability in any information system. We present an open source infrastructure that provides a powerful capacity for managing and mining data within a domain-specific knowledge base. As a practical demonstration of our infrastructure, we present two applications – Literature Finder and Investigator Browser – as well as a tool set for automating the data curation process for the human genome published literature database. The design of this infrastructure makes the system potentially extensible to other data sources.
Information retrieval and usability tests demonstrated that the system had high rates of recall and precision, 90% and 93% respectively. The system was easy to learn, easy to use, reasonably speedy and effective.
The open source system infrastructure presented in this paper provides a novel approach to managing and querying information and knowledge from domain-specific PubMed data. Use of the Unified Medical Language System (UMLS) controlled vocabulary enhanced data integration, interoperability, and the extensibility of the system. In addition, by using MVC-based design and Java as a platform-independent programming language, this system provides a potential infrastructure for any domain-specific knowledge base in the biomedical field.
Collaboration among investigators has become critical to scientific research. This includes ad hoc collaboration established through personal contacts as well as formal consortia established by funding agencies. Continued growth in online resources for scientific research and communication has promoted the development of highly networked research communities. Extending these networks globally requires identifying additional investigators in a given domain, profiling their research interests, and collecting current contact information. We present a novel strategy for building investigator networks dynamically and producing detailed investigator profiles using data available in PubMed abstracts.
We developed a novel strategy to obtain detailed investigator information by automatically parsing the affiliation string in PubMed records. We illustrated the results by using a published literature database in human genome epidemiology (HuGE Pub Lit) as a test case. Our parsing strategy extracted country information from 92.1% of the affiliation strings in a random sample of PubMed records and in 97.0% of HuGE records, with accuracies of 94.0% and 91.0%, respectively. Institution information was parsed from 91.3% of the general PubMed records (accuracy 86.8%) and from 94.2% of HuGE PubMed records (accuracy 87.0%). We demonstrated the application of our approach to dynamic creation of investigator networks by creating a prototype information system containing a large database of PubMed abstracts relevant to human genome epidemiology (HuGE Pub Lit), indexed using PubMed medical subject headings converted to Unified Medical Language System concepts. Our method was able to identify 70–90% of the investigators/collaborators in three different human genetics fields; it also successfully identified 9 of 10 genetics investigators within the PREBIC network, an existing preterm birth research network.
We successfully created a web-based prototype capable of creating domain-specific investigator networks based on an application that accurately generates detailed investigator profiles from PubMed abstracts combined with robust standard vocabularies. This approach could be used for other biomedical fields to efficiently establish domain-specific investigator networks.
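The abstract does not describe the parsing rules, so the following is only a minimal sketch of how country and institution might be pulled from a comma-separated affiliation string. The country table, the keyword list, the function name, and the example affiliation are all hypothetical, and a production parser would need far larger dictionaries and handling for the many formats found in PubMed records.

```python
import re

# HYPOTHETICAL, deliberately tiny lookup tables for illustration only.
KNOWN_COUNTRIES = {
    "usa": "United States",
    "united states": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "canada": "Canada",
}
INSTITUTION_KEYWORDS = ("university", "institute", "hospital", "college", "center", "centre")


def parse_affiliation(affiliation):
    """Guess institution and country from one affiliation string."""
    # Drop any e-mail address, then the trailing period.
    text = re.sub(r"\S+@\S+", "", affiliation).strip().rstrip(".").strip()
    parts = [part.strip() for part in text.split(",") if part.strip()]

    # Country: scan from the end, where it usually appears.
    country = None
    for part in reversed(parts):
        key = part.lower().rstrip(".")
        if key in KNOWN_COUNTRIES:
            country = KNOWN_COUNTRIES[key]
            break

    # Institution: first segment containing an institution-like keyword.
    institution = None
    for part in parts:
        if any(keyword in part.lower() for keyword in INSTITUTION_KEYWORDS):
            institution = part
            break

    return {"institution": institution, "country": country}


# MADE-UP affiliation string.
profile = parse_affiliation(
    "Department of Epidemiology, Example University, Springfield, USA. someone@example.org"
)
```

Strings that match nothing in the tables come back with `None` in both fields, which corresponds to the records the abstract reports as not parsed.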
Growing evidence suggests that the Arg16Arg genotype of the beta-2 adrenergic receptor gene may be associated with adverse effects of beta-agonist therapy. We sought to examine the association of beta-agonist use and the Arg16Gly polymorphism with lung function and mortality among participants in the Atherosclerosis Risk in Communities study.
We genotyped study participants and analyzed the association of the Arg16Gly polymorphism and beta-agonist use with lung function at baseline and clinical examination three years later and with all-cause mortality during 10 years of follow-up. Lung function was characterized by percent-predicted forced expiratory volume in 1 second. Associations were examined separately for blacks and whites. Black beta-agonist users with the Arg/Arg genotype had better lung function at baseline and at the second clinical visit than those with Arg/Gly and Gly/Gly genotypes. Adjusted mean percent-predicted FEV1 was 21% higher in Arg/Arg subjects compared to Gly/Gly at baseline (p = 0.01) and 20% higher than Gly/Gly at visit 2 (p = 0.01). Arg/Gly subjects had adjusted percent-predicted FEV1 17% lower than Arg/Arg at baseline but were similar to Arg/Arg subjects at visit 2. Although black beta-agonist users with the Arg/Arg genotype appeared to have better crude survival rates, the association between genotype and all-cause mortality was inconclusive. We found no difference in lung function or mortality by genotype among blacks who did not use beta-agonists or among whites, regardless of beta-agonist use.
Black beta-agonist users with the ADRB2 Arg16Arg genotype had better lung function, and, possibly, better overall survival compared to black beta-agonist users with the Gly16Gly genotype. Our findings highlight the need for additional studies of sufficient size and statistical power to allow examination of outcomes among beta-agonist users of different races and genotypes.