The Cancer GAMAdb catalogs GWAS, meta-analyses, and pooled analyses published since 1 January 2000 that have evaluated the association between genetic polymorphisms and cancer risk. The methodology used to create the database is summarized in the workflow figure. Genetic association articles are retrieved from PubMed by a computerized text-mining search algorithm with high sensitivity (97.5%) and specificity (98.3%),5 followed by manual curation, as part of the published-literature screening process of the Human Genome Epidemiology (HuGE) Navigator.6
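For readers less familiar with the two screening metrics quoted above, the following minimal Python sketch shows how sensitivity and specificity are computed from a confusion matrix. The counts are hypothetical, chosen only so that the ratios match the reported percentages; they are not the validation data behind the published algorithm.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly relevant articles that the search retrieves."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of irrelevant articles that the search correctly rejects."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (not from the source).
tp, fn = 195, 5    # relevant articles: retrieved / missed
tn, fp = 590, 10   # irrelevant articles: rejected / wrongly retrieved

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

High sensitivity means few relevant articles are missed, and the follow-up manual curation then removes the remaining false positives.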
Articles in the HuGE literature repository are eligible for inclusion if they (1) evaluate cancer risk as the outcome, (2) report a GWAS, meta-analysis, or pooled analysis with aggregated estimates of effect, and (3) are published in English. The curator flags PubMed abstracts as ‘meta-analysis’, ‘pooled analysis’, or ‘genome-wide association’ if the articles meet the inclusion criteria. As a starting point, we used a previously published dataset by Dong et al, which included meta-analyses and pooled analyses found in PubMed that evaluated the relationship between genetic polymorphisms and cancer risk through 15 March 2008. We also review relevant articles in the online NIH GWAS Catalog (http://www.genome.gov/26525384) as a quality check in case any GWAS articles have been overlooked.

Data elements extracted from each full-text article include cancer site, gene and variant names, risk phenotype or allele, risk estimates (odds ratios or relative risks), 95% confidence intervals, ethnicity or gender (when applicable), minor allele frequency (when applicable), number of studies, number of cases and controls, P-values, tests for heterogeneity, tests of publication bias, type of platform used (if GWAS), gene–environment interactions (if applicable), study replication (if GWAS), copy number variation (if applicable), study type (candidate, GWAS, or clinical trial), and analysis type (meta, pooled, or consortia). Random-effect estimates from meta-analyses were used unless the paper reported only fixed-effect estimates. Significant associations from GWAS are recorded based on the NIH GWAS Catalog criteria (http://www.genome.gov/27529028).

To standardize cancer phenotypes, gene names, and variant names, we manually code phenotypes with a Unified Medical Language System (UMLS) unique identifier, gene names with the Human Genome Organisation gene symbol and the National Center for Biotechnology Information (NCBI) Entrez Gene GeneID, and variant names with the RefSNP accession ID (rs number) where available. With the UMLS Metathesaurus (http://www.nlm.nih.gov/research/umls/), NCBI Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gene), and Variant Name Mapper (http://www.hugenavigator.net/HuGENavigator/startPageMapper.do) as reference sources, the database offers a robust free-text search capacity through a user-friendly web interface (see the screenshot of the search for ‘bladder cancer’).
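The inclusion criteria and the preference for random-effect estimates described above can be expressed as two small rules. The Python sketch below is illustrative only: the `Article` and `Estimate` classes, their field names, and both functions are hypothetical and do not reflect the actual Cancer GAMAdb schema or curation software, in which these decisions are made by a human curator.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three curator flags named in the text.
ELIGIBLE_FLAGS = {"meta-analysis", "pooled analysis", "genome-wide association"}


@dataclass
class Article:
    """Hypothetical stand-in for a curated PubMed abstract."""
    outcome_is_cancer_risk: bool
    flag: Optional[str]  # curator flag, or None if the abstract is unflagged
    language: str


@dataclass
class Estimate:
    """Hypothetical summary estimate reported by a meta-analysis."""
    model: str  # "random" or "fixed"
    odds_ratio: float
    ci_low: float
    ci_high: float


def is_eligible(article: Article) -> bool:
    """Apply the three inclusion criteria: outcome, study type, language."""
    return (
        article.outcome_is_cancer_risk
        and article.flag in ELIGIBLE_FLAGS
        and article.language == "English"
    )


def choose_estimate(estimates: List[Estimate]) -> Optional[Estimate]:
    """Prefer a random-effect estimate; fall back to a fixed-effect one."""
    for wanted in ("random", "fixed"):
        for est in estimates:
            if est.model == wanted:
                return est
    return None


# Illustrative values only (not taken from any published analysis).
reported = [
    Estimate("fixed", 1.20, 1.05, 1.37),
    Estimate("random", 1.25, 1.02, 1.53),
]
article = Article(outcome_is_cancer_risk=True, flag="meta-analysis", language="English")
if is_eligible(article):
    print(choose_estimate(reported))
```

Here the random-effect estimate is selected because both models are reported; a paper reporting only a fixed-effect estimate would contribute that estimate instead.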
Workflow of the methodology used to create the Cancer GAMAdb.
Screenshot of the search for ‘bladder cancer’.