Rapid advances in "omic" technologies and basic research have led to the discovery of genetic variants, genetic associations, and biomarkers. These advances show promise for translation into applications for clinical practice and health care [5]. Conducting systematic reviews and meta-analyses of population-based genetic association data is an essential approach to synthesizing knowledge for translation. Some recent publications [20, 21] have demonstrated the value of this approach; however, this work is usually painstaking and slow. Even now, systematic reviews are lacking for many associations [22]. To facilitate such efforts, Gene Prospector has been developed as an evidence gateway to key information sources, selecting genes studied for association with human traits and diseases.
Many gene-centered databases have been developed to gather information related to specific genes. For example, the NCBI Entrez Gene [15] and GeneCards [23] databases attempt to capture all relevant information, including gene-disease associations. However, because their query functionality was designed from a gene-centered perspective, it is not easy to retrieve information related to specific diseases or risk factors. Several different approaches to candidate gene selection have been proposed and implemented. For example, G2D [24] is a bioinformatics tool for predicting genes associated with disease based on multiple information sources, including gene functions in sequence, literature reports, and genetic associations with similar phenotypes. The latter are taken from a pre-computed list of monogenic diseases derived from Online Mendelian Inheritance in Man (OMIM) [25], which limits the value of this tool for studies of complex diseases.
SNPs3D is another online database that performs candidate gene selection. SNPs3D applies a heuristic ranking formula to PubMed records downloaded from the GeneRIF (Gene Reference Into Function) section of the NCBI Gene database. In contrast to SNPs3D, Gene Prospector uses a continuously updated and curated data source that is specific to human genetic association studies and classified by publication type, so that more important publications receive greater weight in the scoring formula. Using the PDGene database for comparison, we demonstrated that Gene Prospector performed better than SNPs3D.
We based our heuristic scoring formula on the total number of publications in the database for a particular gene-disease combination, with additional weight given to four different types of publications: genetic association studies, genome-wide association studies, meta-analyses/pooled analyses, and articles about genetic testing. The added weights reflect the relative importance of such articles in evaluating the evidence for genetic association.
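The scoring idea described above can be illustrated with a minimal sketch. It assumes that each publication contributes one point to its gene and that each of the four publication types adds a fixed bonus. The weight values, the type labels, the `Publication` class, and the `score_genes` function are all hypothetical and chosen only for illustration; they are not the formula or weights used by Gene Prospector.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical extra weights for the four publication types named in the text.
# The actual values used by Gene Prospector are not given here.
EXTRA_WEIGHT = {
    "genetic_association": 1.0,
    "gwas": 3.0,
    "meta_analysis": 3.0,
    "genetic_testing": 1.0,
}


@dataclass(frozen=True)
class Publication:
    """One publication linked to a gene for the disease being queried."""
    gene: str
    pub_types: frozenset  # publication-type labels; may be empty


def score_genes(publications):
    """Return (gene, score) pairs sorted by descending score, then gene name."""
    scores = Counter()
    for pub in publications:
        # Every publication counts once toward the gene's total.
        scores[pub.gene] += 1.0
        # Publications of a weighted type add their (assumed) bonus.
        for pub_type in pub.pub_types:
            scores[pub.gene] += EXTRA_WEIGHT.get(pub_type, 0.0)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Placeholder gene symbols, not real data.
pubs = [
    Publication("GENE_A", frozenset({"gwas"})),
    Publication("GENE_A", frozenset()),
    Publication("GENE_B", frozenset({"genetic_association"})),
    Publication("GENE_C", frozenset()),
]
ranked = score_genes(pubs)
for gene, score in ranked:
    print(gene, score)
```

In this toy input, the gene with a GWAS publication ranks first even though it has only two publications, which is the intended effect of weighting by publication type.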
A list of genes ranked by score allows users to see quickly which associations have been studied most often and most systematically. Thus, the main focus of Gene Prospector is not to predict genetic associations with diseases or outcomes but to provide an efficient resource for users seeking to evaluate genetic associations. Gene Prospector's prioritized gene list for Parkinson disease overlapped substantially with the Top Results gene list from PDGene, a curated database of genetic association studies of Parkinson disease. Clearly, such a list is no substitute for priorities based on a specialized database curated by a domain expert. However, few such databases currently exist outside formal research consortia, and even fewer are freely accessible online. A prioritized list produced by our scoring strategy may therefore be useful as a starting point for evaluating genetic associations in fields in which specialized resources are not available. As an evidence gateway, Gene Prospector provides a set of links for each candidate gene to curated subsets of published studies (e.g., GWAS); thus, it provides researchers with an information center for quickly and systematically retrieving the evidence needed to evaluate candidate genes for relationships with diseases or risk factors.
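One simple way to quantify how much a prioritized list overlaps with a curated reference list is to count how many of the top-ranked genes also appear in the reference. The sketch below assumes this top-k measure; the function name `top_k_overlap` and the placeholder gene symbols are hypothetical, and this is not necessarily the comparison procedure used for PDGene.

```python
def top_k_overlap(ranked_genes, reference_genes, k):
    """Return the top-k ranked genes found in the reference list,
    and the fraction of the top k that they represent."""
    top = ranked_genes[:k]
    reference = set(reference_genes)
    shared = [gene for gene in top if gene in reference]
    fraction = len(shared) / len(top) if top else 0.0
    return shared, fraction


# Placeholder gene symbols, not real data.
shared, fraction = top_k_overlap(
    ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    ["GENE_B", "GENE_D", "GENE_E"],
    k=3,
)
print(shared, fraction)
```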
The HuGE Navigator database is one of the most frequently updated and highly curated literature repositories in the field of genetic association studies. Recently, publications based on GWAS have become a leading source of replicated genetic associations [26]. In collaboration with the Catalog of Published Genome-Wide Association Studies [27], we aim to maintain the most complete and up-to-date collection of GWAS publications. The heuristic scoring function in Gene Prospector gives greater weight to GWAS publications because their abstracts typically feature genes with statistically significant associations. Genes included in meta-analyses also receive extra weight because these labor-intensive analyses tend to be conducted exclusively for associations with the greatest amount of evidence [21].
Gene Prospector takes advantage of features of the other applications in HuGE Navigator to make information more accessible and easier to navigate; for example, the link to Genopedia provides summaries and quick data links related to the gene. The link to HuGE Literature Finder allows users to continue navigating the information contained in the PubMed abstract of each article. The current version of Gene Prospector provides information mostly at the gene level, with links to generic information on SNPs. To enhance and enrich the evidence that Gene Prospector can offer, we are in the process of extracting quantitative genetic association data from published meta-analyses, such as numbers of cases and controls, effect sizes, and measures of heterogeneity. The integration of variant-level information into the evidence and scoring system would make Gene Prospector even more useful.