Results 1-25 (36)

1.  Horizon Scanning for Translational Genomic Research Beyond Bench to Bedside 
The dizzying pace of genomic discoveries is leading to an increasing number of clinical applications. However, very little translational research is ongoing beyond Bench to Bedside to assess validity, utility, implementation and outcomes of such applications. Here we report cross sectional results of ongoing horizon scanning of translational genomic research conducted between May 16, 2012 and May 15, 2013. Based on a weekly, systematic query of PubMed, we created a curated set of 505 beyond bench-to-bedside research publications, including 312 original research articles, 123 systematic and other reviews, 38 clinical guidelines, policies and recommendations, and 32 papers describing tools, decision support and educational materials. Most papers (62%) addressed a specific genomic test or other health application; almost half of these (n=180) were related to cancer. We estimate that these publications account for 0.5% of reported human genomics and genetics research during the same time. These data provide baseline information to track the evolving knowledge base and gaps in genomic medicine. Continuous horizon scanning is crucial for an evidence-based translation of genomic discoveries into improved health care and disease prevention.
PMCID: PMC4079725  PMID: 24406461
genomic medicine; public health; surveillance; translational research
2.  A systematic review of cancer GWAS and candidate gene meta-analyses reveals limited overlap but similar effect sizes 
Candidate gene and genome-wide association studies (GWAS) represent two complementary approaches to uncovering genetic contributions to common diseases. We systematically reviewed the contributions of these approaches to our knowledge of genetic associations with cancer risk by analyzing the data in the Cancer Genome-wide Association and Meta Analyses database (Cancer GAMAdb). The database catalogs studies published since January 1, 2000, by study and cancer type. In all, we found that meta-analyses and pooled analyses of candidate genes reported 349 statistically significant associations and GWAS reported 269, for a total of 577 unique associations. Only 41 (7.1%) associations were reported in both candidate gene meta-analyses and GWAS, usually with similar effect sizes. When considering only noteworthy associations (defined as those with false-positive report probabilities ≤0.2) and accounting for indirect overlap, we found 202 associations, with 27 of those appearing in both meta-analyses and GWAS. Our findings suggest that meta-analyses of well-conducted candidate gene studies may continue to add to our understanding of the genetic associations in the post-GWAS era.
PMCID: PMC3925284  PMID: 23881057
GWAS; candidate gene studies; meta-analysis; cancer
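As a quick check on the counts quoted in the abstract above, the arithmetic can be sketched as follows. This is only an illustration using the figures stated in the abstract; the variable names are hypothetical, and the percentage for noteworthy associations is computed here rather than quoted from the paper.

```python
# Counts taken from the abstract above; variable names are illustrative only.
candidate_gene_assoc = 349   # significant associations from candidate gene meta/pooled analyses
gwas_assoc = 269             # significant associations from GWAS
reported_in_both = 41        # associations found by both approaches

# Associations found by both approaches are counted once in the union.
unique_assoc = candidate_gene_assoc + gwas_assoc - reported_in_both
overlap_share = reported_in_both / unique_assoc

# Noteworthy associations (FPRP <= 0.2), accounting for indirect overlap.
noteworthy_total = 202
noteworthy_in_both = 27
noteworthy_share = noteworthy_in_both / noteworthy_total

print(unique_assoc)                        # 577
print(round(100 * overlap_share, 1))       # 7.1
print(round(100 * noteworthy_share, 1))    # 13.4 (computed here, not quoted in the abstract)
```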
3.  Future Health Applications of Genomics 
Despite the quickening momentum of genomic discovery, the communication, behavioral, and social sciences research needed for translating this discovery into public health applications has lagged behind. The National Human Genome Research Institute held a 2-day workshop in October 2008 convening an interdisciplinary group of scientists to recommend forward-looking priorities for translational research. This research agenda would be designed to redress the top three risk factors (tobacco use, poor diet, and physical inactivity) that contribute to the four major chronic diseases (heart disease, type 2 diabetes, lung disease, and many cancers) and account for half of all deaths worldwide. Three priority research areas were identified: (1) improving the public’s genetic literacy in order to enhance consumer skills; (2) gauging whether genomic information improves risk communication and adoption of healthier behaviors more than current approaches; and (3) exploring whether genomic discovery in concert with emerging technologies can elucidate new behavioral intervention targets. Important crosscutting themes also were identified, including the need to: (1) anticipate directions of genomic discovery; (2) take an agnostic scientific perspective in framing research questions asking whether genomic discovery adds value to other health promotion efforts; and (3) consider multiple levels of influence and systems that contribute to important public health problems. The priorities and themes offer a framework for a variety of stakeholders, including those who develop priorities for research funding, interdisciplinary teams engaged in genomics research, and policymakers grappling with how to use the products born of genomics research to address public health challenges.
PMCID: PMC4188632  PMID: 20409503
4.  Strategies, Actions, and Outcomes of Pilot State Programs in Public Health Genomics, 2003–2008 
State health departments in Michigan, Minnesota, Oregon, and Utah explored the use of genomic information, including family health history, in chronic disease prevention programs. To support these explorations, the Office of Public Health Genomics at the Centers for Disease Control and Prevention provided cooperative agreement funds from 2003 through 2008. The 4 states’ chronic disease programs identified advocates, formed partnerships, and assessed public data; they integrated genomics into existing state plans for genetics and chronic disease prevention; they developed projects focused on prevention of asthma, cancer, cardiovascular disease, diabetes, and other chronic conditions; and they created educational curricula and materials for health workers, policymakers, and the public. Each state’s program was different because of the need to adapt to existing culture, infrastructure, and resources, yet all were able to enhance their chronic disease prevention programs with the use of family health history, a low-tech “genomic tool.” Additional states are drawing on the experience of these 4 states to develop their own approaches.
PMCID: PMC4060875  PMID: 24921900
5.  A Population Perspective on How Personalized Medicine Can Improve Health 
The term P4 medicine is used to denote an evolving field of medicine that uses systems biology approaches and information technologies to enhance wellness rather than just treat disease. Its four components include predictive, preventive, personalized, and participatory medicine. In the current paper, it is argued that in order to fulfill the promise of P4 medicine, a “fifth P” must be integrated--the population perspective--into each of the other four components. A population perspective integrates predictive medicine into the ecologic model of health; applies principles of population screening to preventive medicine; uses evidence-based practice to personalize medicine; and grounds participatory medicine on the three core functions of public health: assessment, policy development, and assurance. Population sciences--including epidemiology; behavioral, social, and communication sciences; and health economics, implementation science, and outcomes research--are needed to show the value of P4 medicine. Balanced strategies that implement both population- and individual-level interventions can best maximize health benefits, minimize harms, and avoid unnecessary healthcare costs.
PMCID: PMC3629731  PMID: 22608383
6.  GWAS Integrator: a bioinformatics tool to explore human genetic associations reported in published genome-wide association studies 
European Journal of Human Genetics  2011;19(10):1095-1099.
Genome-wide association studies (GWAS) have successfully identified numerous genetic loci that are associated with phenotypic traits and diseases. GWAS Integrator is a bioinformatics tool that integrates information on these associations from the National Human Genome Research Institute (NHGRI) Catalog, SNAP (SNP Annotation and Proxy Search), and the Human Genome Epidemiology (HuGE) Navigator literature database. This tool includes robust search and data mining functionalities that can be used to quickly identify relevant associations from GWAS, as well as proxy single-nucleotide polymorphisms (SNPs) and potential candidate genes. Query-based University of California Santa Cruz (UCSC) Genome Browser custom tracks are generated dynamically on the basis of users' selected GWAS hits or candidate genes from the HuGE Navigator literature database. GWAS Integrator may help enhance inference on potential genetic associations identified from GWAS.
PMCID: PMC3190251  PMID: 21610748
genome-wide association studies; database; bioinformatics
7.  Strengthening the Reporting of Genetic Risk Prediction Studies (GRIPS): Explanation and Elaboration 
European journal of epidemiology  2011;26(4):313-337.
The rapid and continuing progress in gene discovery for complex diseases is fuelling interest in the potential application of genetic risk models for clinical and public health practice. The number of studies assessing the predictive ability is steadily increasing, but they vary widely in completeness of reporting and apparent quality. Transparent reporting of the strengths and weaknesses of these studies is important to facilitate the accumulation of evidence on genetic risk prediction. A multidisciplinary workshop sponsored by the Human Genome Epidemiology Network developed a checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS), building on the principles established by prior reporting guidelines. These recommendations aim to enhance the transparency, quality and completeness of study reporting, and thereby to improve the synthesis and application of information from multiple studies that might differ in design, conduct or analysis.
PMCID: PMC3088812  PMID: 21424820
8.  Trends in Population-Based Studies of Human Genetics in Infectious Diseases 
PLoS ONE  2012;7(2):e25431.
Pathogen genetics is already a mainstay of public health investigation and control efforts; now advances in technology make it possible to investigate the role of human genetic variation in the epidemiology of infectious diseases. To describe trends in this field, we analyzed articles that were published from 2001 through 2010 and indexed by the HuGE Navigator, a curated online database of PubMed abstracts in human genome epidemiology. We extracted the principal findings from all meta-analyses and genome-wide association studies (GWAS) with an infectious disease-related outcome. Finally, we compared the representation of diseases in HuGE Navigator with their contributions to morbidity worldwide. We identified 3,730 articles on infectious diseases, including 27 meta-analyses and 23 GWAS. The number published each year increased from 148 in 2001 to 543 in 2010 but remained a small fraction (about 7%) of all studies in human genome epidemiology. Most articles were by authors from developed countries, but the percentage by authors from resource-limited countries increased from 9% to 25% during the period studied. The most commonly studied diseases were HIV/AIDS, tuberculosis, hepatitis B infection, hepatitis C infection, sepsis, and malaria. As genomic research methods become more affordable and accessible, population-based research on infectious diseases will be able to examine the role of variation in human as well as pathogen genomes. This approach offers new opportunities for understanding infectious disease susceptibility, severity, treatment, control, and prevention.
PMCID: PMC3274513  PMID: 22347358
9.  A Pilot Study of Host Genetic Variants Associated with Influenza-associated Deaths among Children and Young Adults 
Emerging Infectious Diseases  2011;17(12):2294-2302.
Low-producing MBL2 genotypes may have increased risk for MRSA co-infection.
We compared the prevalence of 8 polymorphisms in the tumor necrosis factor and mannose-binding lectin genes among 105 children and young adults with fatal influenza with US population estimates and determined in subanalyses whether these polymorphisms were associated with sudden death and bacterial co-infection among persons with fatal influenza. No differences were observed in genotype prevalence or minor allele frequencies between persons with fatal influenza and the reference sample. Fatal cases with low-producing MBL2 genotypes had a 7-fold increased risk for invasive methicillin-resistant Staphylococcus aureus (MRSA) co-infection compared with fatal cases with high- and intermediate-producing MBL2 genotypes (odds ratio 7.1, 95% confidence interval 1.6–32.1). Limited analysis of 2 genes important to the innate immune response found no association between genetic variants and fatal influenza infection. Among children and young adults who died of influenza, low-producing MBL2 genotypes may have increased risk for MRSA co-infection.
PMCID: PMC3311214  PMID: 22172537
viruses; influenza; children; genetics; polymorphism; death; Staphylococcus aureus
10.  Strengthening the reporting of genetic risk prediction studies (GRIPS): explanation and elaboration 
The rapid and continuing progress in gene discovery for complex diseases is fueling interest in the potential application of genetic risk models for clinical and public health practice. The number of studies assessing the predictive ability is steadily increasing, but they vary widely in completeness of reporting and apparent quality. Transparent reporting of the strengths and weaknesses of these studies is important to facilitate the accumulation of evidence on genetic risk prediction. A multidisciplinary workshop sponsored by the Human Genome Epidemiology Network developed a checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS), building on the principles established by previous reporting guidelines. These recommendations aim to enhance the transparency, quality and completeness of study reporting, and thereby to improve the synthesis and application of information from multiple studies that might differ in design, conduct or analysis.
PMCID: PMC3083630  PMID: 21407270
11.  Strengthening the reporting of genetic risk prediction studies (GRIPS): explanation and elaboration 
European Journal of Epidemiology  2011;26(4):313-337.
The rapid and continuing progress in gene discovery for complex diseases is fuelling interest in the potential application of genetic risk models for clinical and public health practice. The number of studies assessing the predictive ability is steadily increasing, but they vary widely in completeness of reporting and apparent quality. Transparent reporting of the strengths and weaknesses of these studies is important to facilitate the accumulation of evidence on genetic risk prediction. A multidisciplinary workshop sponsored by the Human Genome Epidemiology Network developed a checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS), building on the principles established by prior reporting guidelines. These recommendations aim to enhance the transparency, quality and completeness of study reporting, and thereby to improve the synthesis and application of information from multiple studies that might differ in design, conduct or analysis.
PMCID: PMC3088812  PMID: 21424820
Genetic; Risk prediction; Methodology; Guidelines; Reporting
12.  Human genomics and preparedness for infectious threats 
Genome Medicine  2009;1(12):119.
Public health preparedness requires effective surveillance of and rapid response to infectious disease outbreaks. Inclusion of research activities within the outbreak setting provides important opportunities to maximize limited resources, to enhance gains in scientific knowledge, and ultimately to increase levels of preparedness. With rapid advances in laboratory technologies, banking and analysis of human genomic specimens can be conducted as part of public health investigations, enabling valuable research well into the future.
PMCID: PMC2808735  PMID: 20090897
13.  PLoS Currents: Evidence on Genomic Tests – At the Crossroads of Translation 
PLoS Currents  2010;2:RRN1179.
Evidence on Genomic Tests is an open access publication option for communicating high-quality, scientific information that is needed to evaluate health applications of genomic research. By using Google’s knol platform, we aim to reduce conventional barriers to sharing, updating, and accessing the results of knowledge synthesis and to increase the benefits to authors and users alike.
PMCID: PMC2940140  PMID: 20877450
14.  The Scientific Foundation for Personal Genomics: Recommendations from a National Institutes of Health–Centers for Disease Control and Prevention Multidisciplinary Workshop 
The increasing availability of personal genomic tests has led to discussions about the validity and utility of such tests and the balance of benefits and harms. A multidisciplinary workshop was convened by the National Institutes of Health and the Centers for Disease Control and Prevention to review the scientific foundation for using personal genomics in risk assessment and disease prevention and to develop recommendations for targeted research. The clinical validity and utility of personal genomics is a moving target with rapidly developing discoveries but little translation research to close the gap between discoveries and health impact. Workshop participants made recommendations in five domains: (1) developing and applying scientific standards for assessing personal genomic tests; (2) developing and applying a multidisciplinary research agenda, including observational studies and clinical trials to fill knowledge gaps in clinical validity and utility; (3) enhancing credible knowledge synthesis and information dissemination to clinicians and consumers; (4) linking scientific findings to evidence-based recommendations for use of personal genomics; and (5) assessing how the concept of personal utility can affect health benefits, costs, and risks by developing appropriate metrics for evaluation. To fulfill the promise of personal genomics, a rigorous multidisciplinary research agenda is needed.
PMCID: PMC2936269  PMID: 19617843
behavioral sciences; epidemiologic methods; evidence-based medicine; genetics; genetic testing; genomics; medicine; public health
15.  The Emergence of Translational Epidemiology: From Scientific Discovery to Population Health Impact 
American Journal of Epidemiology  2010;172(5):517-524.
Recent emphasis on translational research (TR) is highlighting the role of epidemiology in translating scientific discoveries into population health impact. The authors present applications of epidemiology in TR through 4 phases designated T1–T4, illustrated by examples from human genomics. In T1, epidemiology explores the role of a basic scientific discovery (e.g., a disease risk factor or biomarker) in developing a “candidate application” for use in practice (e.g., a test used to guide interventions). In T2, epidemiology can help to evaluate the efficacy of a candidate application by using observational studies and randomized controlled trials. In T3, epidemiology can help to assess facilitators and barriers for uptake and implementation of candidate applications in practice. In T4, epidemiology can help to assess the impact of using candidate applications on population health outcomes. Epidemiology also has a leading role in knowledge synthesis, especially using quantitative methods (e.g., meta-analysis). To explore the emergence of TR in epidemiology, the authors compared articles published in selected issues of the Journal in 1999 and 2009. The proportion of articles identified as translational doubled from 16% (11/69) in 1999 to 33% (22/66) in 2009 (P = 0.02). Epidemiology is increasingly recognized as an important component of TR. By quantifying and integrating knowledge across disciplines, epidemiology provides crucial methods and tools for TR.
PMCID: PMC2927741  PMID: 20688899
epidemiology; genomics; medicine; public health; translational research
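The comparison of proportions in the abstract above (11/69 in 1999 versus 22/66 in 2009, P = 0.02) can be reproduced approximately with a Pearson chi-square test on the 2x2 table. The abstract does not say which test the authors used, so the choice of test here is an assumption, and the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Counts from the abstract: translational vs. other articles in each year.
translational_1999, total_1999 = 11, 69
translational_2009, total_2009 = 22, 66

stat, p_value = chi_square_2x2(
    translational_1999, total_1999 - translational_1999,
    translational_2009, total_2009 - translational_2009,
)
print(round(translational_1999 / total_1999, 2))  # 0.16
print(round(translational_2009 / total_2009, 2))  # 0.33
print(round(stat, 2), round(p_value, 2))
```

Under this assumed test, the p-value rounds to the reported 0.02; a different test (for example, one with a continuity correction) would give a somewhat different value.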
17.  Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes 
We present a potentially useful alternative approach based on support vector machine (SVM) techniques to classify persons with and without common diseases. We illustrate the method to detect persons with diabetes and pre-diabetes in a cross-sectional representative sample of the U.S. population.
We used data from the 1999-2004 National Health and Nutrition Examination Survey (NHANES) to develop and validate SVM models for two classification schemes: Classification Scheme I (diagnosed or undiagnosed diabetes vs. pre-diabetes or no diabetes) and Classification Scheme II (undiagnosed diabetes or pre-diabetes vs. no diabetes). The SVM models were used to select sets of variables that would yield the best classification of individuals into these diabetes categories.
For Classification Scheme I, the set of diabetes-related variables with the best classification performance included family history, age, race and ethnicity, weight, height, waist circumference, body mass index (BMI), and hypertension. For Classification Scheme II, two additional variables--sex and physical activity--were included. The discriminative abilities of the SVM models for Classification Schemes I and II, according to the area under the receiver operating characteristic (ROC) curve, were 83.5% and 73.2%, respectively. A web-based tool, Diabetes Classifier, was developed to demonstrate a user-friendly application that allows for individual or group assessment with a configurable, user-defined threshold.
Support vector machine modeling is a promising classification approach for detecting persons with common diseases such as diabetes and pre-diabetes in the population. This approach should be further explored in other complex diseases using common variables.
PMCID: PMC2850872  PMID: 20307319
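The abstract above reports discriminative ability as the area under the ROC curve. As a minimal sketch of what that number measures, the following computes AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The scores and labels are made-up toy values, not NHANES data, and `roc_auc` is a hypothetical helper, not the authors' code.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy classifier outputs (hypothetical): higher score = more likely diabetic.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(scores, labels)
print(round(auc, 3))  # 0.889 (8 of 9 positive/negative pairs ranked correctly)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 83.5% and 73.2% should be read.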
18.  Strengthening the reporting of genetic association studies (STREGA): an extension of the STROBE statement 
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modeling haplotype variation, Hardy–Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data, and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct, or analysis.
PMCID: PMC2764094  PMID: 19189221
Gene–disease associations; Genetics; Gene–environment interaction; Systematic review; Meta analysis; Reporting recommendations; Epidemiology; Genome-wide association
19.  Genome-Wide Association Studies, Field Synopses, and the Development of the Knowledge Base on Genetic Variation and Human Diseases 
American Journal of Epidemiology  2009;170(3):269-279.
Genome-wide association studies (GWAS) have led to a rapid increase in available data on common genetic variants and phenotypes and numerous discoveries of new loci associated with susceptibility to common complex diseases. Integrating the evidence from GWAS and candidate gene studies depends on concerted efforts in data production, online publication, database development, and continuously updated data synthesis. Here the authors summarize current experience and challenges on these fronts, which were discussed at a 2008 multidisciplinary workshop sponsored by the Human Genome Epidemiology Network. Comprehensive field synopses that integrate many reported gene-disease associations have been systematically developed for several fields, including Alzheimer's disease, schizophrenia, bladder cancer, coronary heart disease, preterm birth, and DNA repair genes in various cancers. The authors summarize insights from these field synopses and discuss remaining unresolved issues—especially in the light of evidence from GWAS, for which they summarize empirical P-value and effect-size data on 223 discovered associations for binary outcomes (142 with P < 10^−7). They also present a vision of collaboration that builds reliable cumulative evidence for genetic associations with common complex diseases and a transparent, distributed, authoritative knowledge base on genetic variation and human health. As a next step in the evolution of Human Genome Epidemiology reviews, the authors invite investigators to submit field synopses for possible publication in the American Journal of Epidemiology.
PMCID: PMC2714948  PMID: 19498075
association; database; encyclopedias; epidemiologic methods; genome, human; genome-wide association study; genomics; meta-analysis
20.  Genome-wide association studies in pharmacogenomics: untapped potential for translation 
Genome Medicine  2009;1(4):46.
Despite large public investments in genome-wide association studies of common human diseases, so far, few gene discoveries have led to applications for clinical medicine or public health. Genome-wide association studies in the context of clinical trials of drug safety and efficacy may be quicker to yield clinical applications. Certain methodological concerns, such as selection bias and confounding, may be mitigated when genome-wide association studies are conducted within clinical trials, in which randomization of exposure, prospective evaluation of outcome and careful definition of phenotype are incorporated by design.
PMCID: PMC2684667  PMID: 19439031
21.  The need for genetic variant naming standards in published abstracts of human genetic association studies 
BMC Research Notes  2009;2:56.
We analyzed the use of RefSNP (rs) numbers to identify genetic variants in abstracts of human genetic association studies published from 2001 through 2007. The proportion of abstracts reporting rs numbers increased rapidly but was still only 15% in 2007. We developed a web-based tool called Variant Name Mapper to assist in mapping historical genetic variant names to rs numbers. The consistent use of rs numbers in abstracts that report genetic associations would enhance knowledge synthesis and translation in this field.
PMCID: PMC2672936  PMID: 19366450
22.  STrengthening the REporting of Genetic Association Studies (STREGA)— An Extension of the STROBE Statement 
PLoS Medicine  2009;6(2):e1000022.
Julian Little and colleagues present the STREGA recommendations, which are aimed at improving the reporting of genetic association studies.
PMCID: PMC2634792  PMID: 19192942
gene-disease associations; genetics; gene-environment interaction; systematic review; meta analysis; reporting recommendations; epidemiology; genome-wide association
23.  Gene Prospector: An evidence gateway for evaluating potential susceptibility genes and interacting risk factors for human diseases 
BMC Bioinformatics  2008;9:528.
Millions of single nucleotide polymorphisms have been identified as a result of the human genome project and the rapid advance of high throughput genotyping technology. Genetic association studies, such as recent genome-wide association studies (GWAS), have provided a springboard for exploring the contribution of inherited genetic variation and gene/environment interactions in relation to disease. Given the capacity of such studies to produce a plethora of information that may then be described in a number of publications, selecting possible disease susceptibility genes and identifying related modifiable risk factors is a major challenge. A Web-based application for finding evidence of such relationships is key to the development of follow-up studies and evidence for translational research.
We developed a Web-based application that selects and prioritizes potential disease-related genes by using a highly curated and updated literature database of genetic association studies. The application, called Gene Prospector, also provides a comprehensive set of links to additional data sources.
We compared Gene Prospector results for the query "Parkinson" with a list of 13 leading candidate genes (Top Results) from a curated, specialty database for genetic associations with Parkinson disease (PDGene). Nine of the thirteen leading candidate genes from PDGene were in the top 10th percentile of the ranked list from Gene Prospector. In fact, Gene Prospector included more published genetic association studies for the 13 leading candidate genes than PDGene did.
Gene Prospector provides an online gateway for searching for evidence about human genes in relation to diseases, other phenotypes, and risk factors, and provides links to published literature and other online data sources. Gene Prospector is available online.
PMCID: PMC2613935  PMID: 19063745
24.  Reporting of Human Genome Epidemiology (HuGE) association studies: An empirical assessment 
Several thousand human genome epidemiology association studies are published every year investigating the relationship between common genetic variants and diverse phenotypes. Transparent reporting of study methods and results allows readers to better assess the validity of study findings. Here, we document reporting practices of human genome epidemiology studies.
Articles were randomly selected from a continuously updated database of human genome epidemiology association studies to be representative of genetic epidemiology literature. The main analysis evaluated 315 articles published in 2001–2003. For a comparative update, we evaluated 28 more recent articles published in 2006, focusing on issues that were poorly reported in 2001–2003.
During both time periods, most studies comprised relatively small study populations and examined one or more genetic variants within a single gene. Articles were inconsistent in reporting the data needed to assess selection bias and the methods used to minimize misclassification (of the genotype, outcome, and environmental exposure) or to identify population stratification. Statistical power, the use of unrelated study participants, and the use of replicate samples were reported more often in articles published during 2006 when compared with the earlier sample.
We conclude that many items needed to assess error and bias in human genome epidemiology association studies are not consistently reported. Although some improvements were seen over time, reporting guidelines and online supplemental material may help enhance the transparency of this literature.
PMCID: PMC2413261  PMID: 18492284
25.  GAPscreener: An automatic tool for screening human genetic association literature in PubMed using the support vector machine technique 
BMC Bioinformatics  2008;9:205.
Synthesis of data from published human genetic association studies is a critical step in the translation of human genome discoveries into health applications. Although genetic association studies account for a substantial proportion of the abstracts in PubMed, identifying them with standard queries is not always accurate or efficient. Further automating the literature-screening process can reduce the burden of a labor-intensive and time-consuming traditional literature search. The Support Vector Machine (SVM), a well-established machine learning technique, has been successful in classifying text, including biomedical literature. The GAPscreener, a free SVM-based software tool, can be used to assist in screening PubMed abstracts for human genetic association studies.
The data source for this research was the HuGE Navigator, formerly known as the HuGE Pub Lit database. Weighted SVM feature selection based on a keyword list obtained by the two-way z score method demonstrated the best screening performance, achieving 97.5% recall, 98.3% specificity and 31.9% precision in performance testing. Compared with the traditional screening process based on a complex PubMed query, the SVM tool reduced by about 90% the number of abstracts requiring individual review by the database curator. The tool also ascertained 47 articles that were missed by the traditional literature screening process during the 4-week test period. We examined the literature on genetic associations with preterm birth as an example. Compared with the traditional, manual process, the GAPscreener both reduced effort and improved accuracy.
GAPscreener is the first free SVM-based application available for screening the human genetic association literature in PubMed with high recall and specificity. The user-friendly graphical user interface makes this a practical, stand-alone application. The software can be downloaded at no charge.
PMCID: PMC2387176  PMID: 18430222
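The performance figures in the abstract above (97.5% recall, 98.3% specificity, 31.9% precision) can look inconsistent at first glance. The sketch below shows how precision depends on how rare the target articles are, and back-calculates the prevalence implied by the three reported figures. This is an illustrative derivation, not a number reported by the authors, and the function names are hypothetical.

```python
def precision_from(recall, specificity, prevalence):
    """Precision (positive predictive value) from recall, specificity and prevalence."""
    true_pos = recall * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def implied_prevalence(recall, specificity, precision):
    """Solve the precision formula above for prevalence."""
    false_pos_rate = 1 - specificity
    return (precision * false_pos_rate) / (
        recall * (1 - precision) + precision * false_pos_rate
    )

# Figures reported in the abstract.
recall, specificity, precision = 0.975, 0.983, 0.319

prevalence = implied_prevalence(recall, specificity, precision)
print(round(100 * prevalence, 2))  # roughly 0.81 (percent of screened abstracts)

# Round trip: plugging the implied prevalence back in recovers the precision.
print(round(precision_from(recall, specificity, prevalence), 3))  # 0.319
```

Under these assumptions, fewer than 1 in 100 screened abstracts would be a genetic association study, so even a 1.7% false-positive rate produces more false positives than true positives. That is consistent with a tool that still leaves a curator to review the flagged subset.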
