The mission of the Pharmacogenomics Knowledge Base (PharmGKB; www.pharmgkb.org) is to collect, encode and disseminate knowledge about the impact of human genetic variations on drug responses. It is an important worldwide resource of clinical pharmacogenomic biomarkers available to all. The PharmGKB website has evolved to highlight our knowledge curation and aggregation over our previous emphasis on collecting primary data. This review summarizes the methods we use to drive this expanded scope of ‘Knowledge Acquisition to Clinical Applications’, the new features available on our website and our future goals.
The Pharmacogenomics Knowledge Base (PharmGKB) is a publicly available, worldwide online resource. The main objective of the PharmGKB is to aggregate and disseminate information and knowledge regarding pharmacogenomics through an online database website (alongside downloadable content), contributing to the drive towards personalized medicine for better therapeutics. The PharmGKB also plays an active role as an independent broker for international research consortia focused on pharmacogenomics. The Clinical Pharmacogenetics Implementation Consortium (CPIC), led by the PharmGKB and the NIH Pharmacogenomics Research Network (PGRN), was established to develop genetics-based dosing guidelines for specific drugs. The PharmGKB website has recently been redesigned to integrate these clinical dosing guidelines and Clinical Annotations with gene, drug and disease information in a user-friendly and accessible manner.
From the www.pharmgkb.org homepage, several search boxes can be utilized to find the gene, genetic variant or drug of interest (Figure 1). On each drug or gene page under the ‘Pharmacogenomics (PGx) Research’ tab, a table of variants associated with the drug or gene is presented (Figure 2). The first column of the table contains icons representing what kind of information is available for each variant (Table 1). This information can be found by clicking on the icon, variant, gene or drug links in these tables.
One of the many tasks of the PharmGKB curators is to review the past and current literature and add any relevant pharmacogenetic or genomic articles to the PharmGKB database. Manual coverage of the huge volume of pharmacogenetic literature is not feasible given the size of our curation staff, and we expect ongoing efforts in natural language processing (NLP) research will aid this in the future (see ‘Future perspective’ section). The curator team routinely curates literature from several major pharmacogenetic journals and from publications by the PGRN. Additional pharmacogenomic literature is annotated in the course of creating pathways (see ‘Pathway summaries’ section), Very Important Pharmacogene (VIP) summaries (see ‘VIP summaries’ section) and further curation tasks.
A Variant Annotation is a summary created for a particular genetic variant associated with a drug described in a single publication. Essential information regarding the pharmacogenetic association is provided, such as the alleles or genotypes described, the phenotypic effect or effect on drug metabolism, study characteristics of the population, and statistics. Variant Annotations summarize the findings of a variety of study types, including clinical trials, clinical case studies, genome-wide association studies, and functional in vivo and in vitro studies. Curators also endeavor to cover pharmacogenetic studies mentioned within a review, introduction or discussion. Variant Annotations are represented by the ‘VA’ symbol throughout the PharmGKB website (Table 1 & Figure 2). Links to PubMed® IDs (PMIDs) within pathway descriptions and VIP summaries also link to curated Variant Annotations.
An example of a Variant Annotation in our new standardized format is shown in Figure 3, for PMID 21383771. Below the article title and abstract, the gene, drug and diseases mentioned in the article are displayed. These link directly to the individual gene, drug and disease pages. A symbol is provided to tag whether a pharmacokinetic (PK) or pharmacodynamic (PD) relationship is discussed (Table 1). In this example, the article discusses the genes CYP2C9, CYP4F2 and VKORC1 and response to the drug warfarin. Although the CYP2C9 and CYP4F2 genes are involved in the metabolism of warfarin, the annotated article studies responses to warfarin (PD) and not the PK of the drug, and therefore, it has been labeled with a PD tag by the curator.
Individual Variant Annotations for any genetic variants investigated for a drug interaction are provided in the table (Figure 3). Variant Annotations include those for single nucleotide polymorphisms (SNPs), insertions or deletions (indels), copy number variants and haplotypes. Full haplotype spreadsheets can be downloaded from the ‘Haplotype’ tab of a gene page, if available. For PMID 21383771, three different Variant Annotations with a PD relationship with warfarin are given (Figure 3). The National Center for Biotechnology Information (NCBI) SNP database (dbSNP) reference SNP identification (rsID) is used as a reference for each variant. A standardized sentence is provided, describing the association of the allele or genotype with a drug response. For the CYP2C9 variant rs1057910, the sentence reads: “Genotype AA is associated with increased dose of warfarin in people with a stable therapeutic international normalized ratio between two and three as compared to genotypes CC + AC”. This individual association is labeled with a PD relationship, as response to warfarin is discussed in the article. Study parameter details relevant to the significance of the association are collected, including study size, race, allele frequencies and the p-value of the association. For example, the variant associations described in PMID 21383771 were found in a study of 248 Asian individuals, and the association between rs1057910 genotype AA and increased dose of warfarin was statistically significant at p = 0.000161. This information is then stored in our database, and is valuable both to users and to curators when compiling Clinical Annotations (see ‘Clinical Annotations’ section below).
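To make the structure of such a record concrete, the fields described above could be represented roughly as follows. This is only an illustrative sketch: the class name, field names and validation rules are assumptions made for this example and do not reflect the actual PharmGKB database schema. The values are those quoted above for PMID 21383771.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariantAnnotation:
    """Hypothetical record for one variant-drug association from one paper."""
    pmid: int                 # PubMed ID of the annotated publication
    gene: str                 # gene symbol
    rsid: str                 # dbSNP reference SNP identifier
    drug: str                 # drug name
    relationship: str         # "PK" or "PD" tag chosen by the curator
    sentence: str             # standardized association sentence
    study_size: int           # number of individuals in the study
    population: str           # population descriptor as reported
    p_value: Optional[float] = None

    def __post_init__(self):
        # Assumed sanity checks; the real curation rules are not described here.
        if self.relationship not in ("PK", "PD"):
            raise ValueError("relationship must be 'PK' or 'PD'")
        if not (self.rsid.startswith("rs") and self.rsid[2:].isdigit()):
            raise ValueError("rsid must look like 'rs' followed by digits")


# Values taken from the warfarin example in the text.
example = VariantAnnotation(
    pmid=21383771,
    gene="CYP2C9",
    rsid="rs1057910",
    drug="warfarin",
    relationship="PD",
    sentence=(
        "Genotype AA is associated with increased dose of warfarin in people "
        "with a stable therapeutic international normalized ratio between two "
        "and three as compared to genotypes CC + AC"
    ),
    study_size=248,
    population="Asian",
    p_value=0.000161,
)
```

Keeping every field in a fixed, typed slot, rather than in free text, is what allows annotations of this kind to be searched and downloaded consistently.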
Our previous Variant Annotations allowed free text and were subject to the curator’s own approach. The aim of this new feature is to standardize the annotations, provide the most essential information relevant to the pharmacogenetic association, and allow all the information to enter our database so it can easily be searched or downloaded by users. We have integrated internal microdictionaries from which standardized terms can be selected. Disease and phenotype ontologies are sourced from the National Library of Medicine’s Medical Subject Headings (MeSH browser), and drug, compound and substance ontologies from DrugBank [2–4,105], the WHO’s Anatomical Therapeutic Chemical (ATC) classification system or PubChem® [5,107]. Unfortunately, the rsID is often not used in the paper. We urge authors to use this form of identification in their future published works, to standardize genetic variant nomenclature throughout the scientific community and facilitate the creation of useful annotations by our curators. If the rsID is not provided in the article, identifying it for a gene variant can entail several resources, including PharmGKB Variant Annotations, literature cross-referencing, HapMap, Online Mendelian Inheritance in Man® (OMIM®) or mapping to the human reference genome sequence, to ensure that the correct position and, thus, the correct variant identification is obtained. If a gene or locus is associated with a drug, but no specific variant is mentioned in the article, the publication is curated by adding the relationships into our database as a Literature Annotation, represented by the ‘LA’ symbol (Table 1). These can be accessed in several ways: by clicking on the LA icon, by searching directly for a publication using the PMID, or by searching for a drug, gene or disease, where they are found under the ‘Is Related To’ tab (Figure 2).
The PharmGKB Clinical Interpretations are the latest addition to the PharmGKB knowledge repertoire, a product of the direction that PharmGKB is taking towards its expanded mission “From knowledge acquisition to clinical applications”. The homepage of the PharmGKB website now features a new search box entitled ‘Clinical Interpretations’ (Figure 1). Within this box are four useful links to clinically related pharmacogenetic information on the PharmGKB website. The ‘Clinical Variant Annotations’ link takes users to a list of our curated Clinical Annotations for genetic variants with the highest level of evidence (for further information see ‘Clinical Annotations’ section below). The link to ‘Genotype-based dosing guidelines’ leads the user to a list of published genetically influenced dosing guidelines, including those by CPIC and The Royal Dutch Association for the Advancement of Pharmacy Pharmacogenetics Working Group. This list is updated as genotype-based dosing guidelines are released, and further information can be obtained by clicking on each gene–drug pair individually (Figure 4). A list of drug labels that have pharmacogenetic information highlighted by the US FDA can be found using the ‘Drug labels’ link. We anticipate expanding the ‘Drug labels’ link to include international drug labels, such as pharmacogenetic information highlighted by the European Medicines Agency (EMA) and the Pharmaceuticals and Medical Devices Agency (PMDA), Japan. The ‘Genetic tests for PGx’ link takes the user to a noncomprehensive list of example pharmacogenetic diagnostic tests. Within the Clinical Interpretations box users can search for a gene, variant rsID, drug or disease in order to retrieve related clinical information. Clinically related information can also be found under the ‘Clinical PGx’ tab found on every drug and gene page on the PharmGKB website.
Under this tab are four subtabs containing the relevant information for that drug or gene (Figure 4): ‘Dosing Guidelines’; ‘Drug Labels’; ‘Clinical Annotations’; and ‘Genetic Tests’.
The PharmGKB Clinical Annotations are created by the PharmGKB curators and aim to combine accumulated Variant Annotations to provide an evidence-rated genotype profile for a particular pharmacogenetic variant. The objective is to present a succinct clinical interpretation that is a summary of the literature evidence for an association between a genetic variant and a drug, but is also useable for clinicians, researchers and the general public. A growing number of people have purchased their genotype profile from a private company and are interested in learning more about their personalized pharmacogenetics. We hope to provide a source of information for anyone interested in pharmacogenetics. The PharmGKB curators use specific criteria to assess the collective Variant Annotations and determine the level of evidence (Table 2). Clinical Annotations are reviewed routinely, and therefore can move up or down the scale as further evidence is published, or as contradictory findings are released.
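The first step in combining accumulated Variant Annotations, collecting all the evidence for one variant–drug pair, can be sketched as below. This is a hypothetical illustration only: the function name and dictionary keys are assumptions, the second PMID and the second rsID are placeholders rather than real identifiers, and no level of evidence is assigned because the curators' criteria (Table 2) are applied manually and are not reproduced here.

```python
from collections import defaultdict


def group_by_variant_drug(annotations):
    """Collect the PMIDs supporting each (rsID, drug) pair.

    `annotations` is an iterable of dicts with the assumed keys
    'rsid', 'drug' and 'pmid'.
    """
    groups = defaultdict(list)
    for ann in annotations:
        groups[(ann["rsid"], ann["drug"])].append(ann["pmid"])
    return dict(groups)


# Only the first row uses identifiers from the text; the rest are placeholders.
sample = [
    {"rsid": "rs1057910", "drug": "warfarin", "pmid": 21383771},
    {"rsid": "rs1057910", "drug": "warfarin", "pmid": 1},   # placeholder PMID
    {"rsid": "rs0000001", "drug": "warfarin", "pmid": 2},   # placeholder rsID and PMID
]

evidence = group_by_variant_drug(sample)
```

Each resulting group is the body of evidence that a curator would then weigh when writing and rating a single Clinical Annotation.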
Clinical Annotations are found under the ‘Clinical PGx’ tab (Figure 5). A table of Level 1 Clinical Annotations for warfarin is shown, with the option to show the full list. A downloadable file with a summary list of all the PharmGKB Clinical Annotations is also available on this page. The table provides the variant rsID and gene, the ‘relevance’ of the association between warfarin and each genotype of the variant, and the level of evidence for this association. An example of a Level 1 Clinical Annotation for warfarin and the variant rs1057910 in the CYP2C9 gene is shown in Figure 5. A summary of information for each genotype is provided:
Each genetic variant has a page on the PharmGKB website. By clicking on the variant link from the Clinical Annotation page, more information regarding the rs1057910 variant can be seen (Figure 6). This includes curated Clinical and Variant Annotations involving the variant, information from the VIP summary (if available) and information about haplotypes containing this variant (if available).
The objective of a VIP summary is to provide a succinct pharmacogenetic-based overview of the literature of an important gene involved in drug response. These cover background information including gene structure, physiological role of the encoded protein and disease associations, as well as providing a collective view on PK and PD relationships with variants in the gene. The VIP summary also includes more in-depth information on particularly important variants and haplotypes involved in drug response. They provide an excellent starting point for those new to the field wanting to learn more. VIPs are chosen through a variety of means, including genes referred to within FDA drug guidelines or dosing guidelines, important pharmacogenetic developments in recent literature, historically known pharmacogenes and those that have a large list of Variant Annotations on the PharmGKB website. The curator begins with a thorough search of the literature and combines Variant and Literature Annotations already available on the PharmGKB website. Each pharmacogene is assigned to a curator, and the summaries are reviewed by at least one other curator before posting on the PharmGKB website. Many of the VIP summaries are also reviewed or contributed to by experts in the field for their input and approval. We currently have 44 VIP summaries available online, many of which have been published in the Pharmacogenetics and Genomics journal. See “PharmGKB summary: very important pharmacogene information for PTGS2” for a recent example. A direct link to a full list of our VIP annotations can be found on the homepage in the ‘Genes’ search box named ‘Important PGx genes’ (Figure 1). The orange ‘VIP’ symbol used throughout the PharmGKB website indicates a link to a gene or variant that has a VIP summary (Table 1).
The PharmGKB curated pathways aim to demonstrate the physiological interaction between genes and drugs in particular cells, and the relationships between different molecules with regard to their role in drug response. The pathways are documented through an extensive literature review, to provide an evidence-based summary of the PK and PD of a drug. They are drawn using the PathVisio® software, and are available in BioPAX format [10,111] so that this information can be more easily computationally searched, downloaded, shared across different databases and interchanged between different data platforms. A more illustrative representation of the pathway is also created. A pathway summary is included to explain the pathway and how the components interact, with links to Literature Annotations as evidence. A table of the individual components of the pathway is also provided, linking to individual gene and drug information for a more in-depth view. Many of the PharmGKB pathway summaries have also been published; see “PharmGKB summary: methotrexate pathway” for a recent example. A full list of our pathways can be accessed via the homepage from the ‘Pathways’ search box, by selecting ‘All pathways’, or by narrowing down the search query to PD or PK pathways (Figure 1). We currently have 79 curated pathways available online, many of which have been published in the Pharmacogenetics and Genomics journal. Through our feedback correspondence we know that the pathway diagrams are often used by researchers, students and pharmacologists for presentations and reports, citing Klein et al., and provide an important resource of pharmacogenetic information.
A major challenge for the PharmGKB is to present the breadth and depth of material associated with human genetic variations on drug response. Our new interface is an attempt to integrate the Clinical Interpretations with our gene, drug and disease information in a straightforward layout for any user regardless of background. We encourage feedback from users of PharmGKB, and provide a ‘Feedback’ button at the top right-hand corner of each PharmGKB webpage.
The current challenges for curators are the exponentially growing volume of pharmacogenetic literature, keeping track of new drug labeling information and genetic test guidelines, updating previous VIP annotations, and creating new Variant and Clinical Annotations. A goal of the PharmGKB is to integrate NLP technology in an automated pipeline to identify relevant scientific publications and extract key pharmacogenomic information [14,15]. The information would be prioritized and displayed to the curators for their approval prior to entry into the knowledge base, thereby maintaining our high level of curated information while significantly increasing the speed and throughput of our current curation pipeline. This would enable broader coverage of the literature rather than the limited set of specialized journals covered today. As NLP summarization techniques improve, we expect to create automated Literature Annotations or pathway diagrams with supporting evidence automatically generated per pathway edge. NLP techniques will also be used to analyze our coverage of the pharmacogenomic space and to flag important variants and genes involved in drug response. The current role of NLP in the curation pipeline is minimal, but development is actively underway to enable more sophisticated methods of automatic knowledge aggregation and extraction. PharmGKB is currently creating an annotated corpus in which text providing the supporting evidence for Variant Annotations is highlighted in the full text articles in a semantically meaningful format. This annotated corpus will become a training set enabling collaborating text mining research groups to develop methods to automatically extract pharmacogenomic information from the text.
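The idea of prioritizing publications for curator review can be illustrated with a deliberately simple stand-in. The sketch below uses plain keyword matching rather than any real NLP method, and it is not the PharmGKB pipeline: the term lists, function names and scoring rule are all assumptions chosen for illustration, seeded only with genes and drugs named in this review.

```python
import re

# Hypothetical vocabularies; a real system would use curated dictionaries.
GENE_TERMS = {"cyp2c9", "cyp4f2", "vkorc1"}
DRUG_TERMS = {"warfarin", "methotrexate"}
PGX_TERMS = {"genotype", "allele", "polymorphism", "variant"}


def score_abstract(text):
    """Count how many distinct vocabulary terms appear in the abstract."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return (len(tokens & GENE_TERMS)
            + len(tokens & DRUG_TERMS)
            + len(tokens & PGX_TERMS))


def prioritize(abstracts):
    """Return abstract IDs ordered by descending score, dropping zero scores.

    `abstracts` maps an identifier to abstract text. Ties are broken by
    identifier so that the ordering is deterministic.
    """
    scored = [(score_abstract(text), key) for key, text in abstracts.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [key for score, key in ranked if score > 0]


queue = prioritize({
    "a": "CYP2C9 genotype and warfarin dose",
    "b": "A study of soil bacteria",
    "c": "Warfarin use in the elderly",
})
```

In this toy example the curator would see abstract "a" first and "c" second, while "b" is filtered out. The ranked list is only a suggestion: as described above, every item would still require curator approval before entering the knowledge base.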
The heterogeneous and often confusing nomenclature for single variants, alleles and haplotypes for many gene families (e.g., CYP450 and UGT) makes curation harder but at the same time more valuable. The in-depth VIP and Clinical Annotations, which gather heterogeneous information and create a consensus, are valuable tools for researchers, students and clinicians alike, and we will continue with these endeavors. We also aim to further distribute PharmGKB-generated tools and information, and to aid in the education of pharmacists, pharmacology and medical students, molecular scientists, drug regulators and bioinformaticians, as well as in schools. These efforts should help broaden our user base, contribute to scientific communication with the public and help drive personalized medicine forward.
The authors would like to thank the PharmGKB team.
For reprint orders, please contact: reprints@futuremedicine.com
Financial & competing interests disclosure
This work is supported by the NIH National Institute of General Medical Sciences (R24 GM61374) and the National Library of Medicine (contract HHSN-276201000025C). RB Altman serves as a founder and consultant for Personalis and a consultant to 23andMe. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.