The NIH initiated the PharmGKB in April 2000. The primary mission was to create a repository of primary data, to provide tools to track associations between genes and drugs, and to catalog the location and frequency of genetic variations known to impact drug response. Over the past 10 years, new technologies have shifted research from candidate gene pharmacogenetics to phenotype-based pharmacogenomics with a consequent explosion of data. PharmGKB has refocused on curating knowledge rather than housing primary genotype and phenotype data, and now captures more complex relationships between genes, variants, drugs, diseases and pathways. Going forward, the challenges are to provide the tools and knowledge to plan and interpret genome-wide pharmacogenomics studies, predict gene–drug relationships based on shared mechanisms and support data-sharing consortia investigating clinical applications of pharmacogenomics.
The Pharmacogenomics Knowledge Base (PharmGKB) began in 2000 as one of the first ‘post-genomic’ databases. At that time, there was no standard format for the description and storage of genotype and phenotype data from pharmacogenetic studies. An important challenge was to maintain the quality of data without compromising the privacy of subjects. As data collection methods advanced and pharmacogenomics outpaced pharmacogenetics, the PharmGKB adapted to increasing volumes of data and new ways to present them. Relationships were built with other resources, such as the University of California Santa Cruz (CA, USA) Genome Browser, DrugBank and BioPAX, to enhance the knowledge by selecting, aggregating and annotating data relevant to pharmacogenomics (Table 1).
Initially, the main aim was to gather highly detailed primary data from the community at large and, specifically, from the Pharmacogenetics Research Network: more than ten groups across the USA spanning a variety of gene, drug and disease interests, from asthma to thiopurine S-methyltransferase. The PharmGKB team worked with these groups to define in detail the gene variant data and how they were obtained, which formed the PharmGKB XML schema. This schema allowed many useful comparisons, such as comparing sequencing data from one researcher with RFLP data from another group working on the same gene, and viewing the frequencies of variants within and across sample sets.
Defining phenotypes in a computationally robust manner was another challenge. Several vocabularies and ontologies were tested for describing clinical entities for the electronic medical record, but these lacked the kind of molecular detail needed for many of the Pharmacogenetics Research Network studies. The idea was to be able to integrate data across several studies, such as combining irinotecan area under the curve data from patients in a study at the University of Chicago (IL, USA) with those from Washington University (MO, USA). The computational challenges in combining such data are substantial; data may be provided in different units and collected under different conditions. The challenge of standardization requires trained curators who understand the relevant phenotypes. Data standardization can be different for every dataset, and is expensive and not possible for all phenotype data. Therefore, the PharmGKB adopted a two-level procedure. Curators capture and tag metadata for all submitted studies without a default effort in standardization. If there is a dataset of particular importance and impact, it is curated to enable comparisons across studies, for example, international normalized ratios and genotypes across the datasets from the International Warfarin Pharmacogenetics Consortium.
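The unit-harmonization step involved in combining area under the curve data from two studies can be sketched as follows. This is a minimal illustration and not PharmGKB's curation code: the function names, the unit labels and the choice of ng*h/mL as the common unit are all assumptions made for the example.

```python
# Hypothetical sketch: harmonize area under the curve (AUC) values that
# different studies report in different units so they can be pooled.
# Unit labels and the target unit (ng*h/mL) are assumptions for illustration.

# Multiplicative factors that convert each supported unit to ng*h/mL.
_TO_NG_H_PER_ML = {
    "ng*h/mL": 1.0,
    "ug*h/mL": 1000.0,  # 1 microgram = 1000 nanograms
    "mg*h/L": 1000.0,   # 1 mg/L is the same concentration as 1 ug/mL
}


def standardize_auc(value: float, unit: str) -> float:
    """Return the AUC expressed in ng*h/mL; raise if the unit is unknown."""
    try:
        factor = _TO_NG_H_PER_ML[unit]
    except KeyError:
        raise ValueError(f"unsupported AUC unit: {unit!r}") from None
    return value * factor


def pool_studies(*studies):
    """Merge (subject_id, value, unit) records from several studies
    into a single list of (subject_id, AUC in ng*h/mL) pairs."""
    return [
        (subject_id, standardize_auc(value, unit))
        for study in studies
        for subject_id, value, unit in study
    ]
```

In this sketch an unrecognized unit raises an error rather than passing through silently, so a record that cannot be standardized is flagged for a curator instead of being pooled with incomparable values. Differences in collection conditions are not addressed here and would still need curator judgment.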
As a central knowledge-sharing site, PharmGKB noticed an opportunity to facilitate data-sharing consortia, in which investigators with complementary data create a collaboration based on a common scientific interest and the ability to combine datasets. PharmGKB then uses its curatorial staff to integrate, aggregate and annotate the contributing datasets. We have facilitated the formation of several consortia (for the pharmacogenomics of warfarin, tamoxifen and irinotecan), bringing together groups to create diverse sample sets that provide greater statistical power to detect complex associations. The success of these consortia relies on a trustworthy framework for collaboration, as participants are often scientific competitors in other venues. PharmGKB involvement ensures high-quality curation, including developing a standard template to capture the data, integrating and recoding the data, formatting them to allow comparison across the many groups and annotating them with metadata to allow for computational searching. Most importantly, PharmGKB acts as an independent party and has developed a reputation as a dependable and scientifically neutral collaborator.
Over the last decade, PharmGKB has collected and annotated pharmacogenomic data from a variety of sources. The published literature is a major source of knowledge, but the volume of papers is so vast that finding the information is cumbersome. We have developed structures to tag and describe relationships in the literature such that they can be found by search mechanisms but also still understood by readers. Gene, drug, disease and variant relationships have been identified and labeled with categories of interest (clinical outcomes; pharmacodynamics [PD]; pharmacokinetics [PK]; cellular and molecular functional assays and genotype data). The data are accessed from the related gene, drug and disease tabs on individual gene, drug and disease pages. Top gene pages include CYP2D6, ABCB1 and CYP2C9; the top drug pages visited are warfarin, amiodarone and clopidogrel; and top diseases include Torsades de pointes, breast neoplasms and epilepsy (Table 2). We have over 4000 literature annotations (as of 17th November 2009) that link gene, drug and disease relationships. Natural language processing is used to streamline the identification of articles of interest to annotate and we are developing tools to speed up the annotation process.
From knowledge of the literature, PharmGKB scientists develop and maintain drug pathways with production-quality graphics and supporting scientific evidence. The PharmGKB currently has 60 curated pathways (as of 17th November 2009) illustrating PD and/or PK aspects for over 180 drugs. The top pathways viewed include platelet aggregation PD pathway, codeine and morphine PK pathway and nicotine PK pathway (Table 2). The repository of relationships built from the literature annotations now allows us to generate automated networks that can be used to start new PharmGKB pathways or downloaded for users to explore with their own methods.
PharmGKB curators have not only annotated gene–drug relationships, but have also annotated specific human variations of importance to pharmacogenomics in the ‘variant annotation project’. Curators summarize the findings of pharmacogenomic relevance regarding a genomic variant and associate these with the appropriate genes, drugs and diseases. Mapping the genomic variants is not a trivial task; many papers either do not include dbSNP identifiers or have them hidden within the methods sections, or, when they are used, the authors often neglect to specify which base is associated with the phenotype. We have built a dictionary to attempt to cross-reference the various names for variants used in the literature and databases (3000 variant annotations as of 17th November 2009). We are participating in efforts being made by the biocuration community to require inclusion of standard identifiers for variants, such as dbSNP rs number, in publications. We also write detailed online summaries of very important pharmacogenes (41 to date) and their variants, many of which have also been published [14–23].
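The idea of a dictionary that cross-references the different names used for one variant can be sketched as follows. This is a minimal illustration, not the PharmGKB dictionary itself: the function names and normalization rules are assumptions, and the handful of CYP2C9 entries are included only as examples that should be checked against dbSNP before any real use.

```python
import re

# Illustrative synonym table: (gene symbol, normalized alias) -> dbSNP rs number.
# Entries are examples only and should be verified against dbSNP.
_SYNONYMS = {
    ("CYP2C9", "*2"): "rs1799853",
    ("CYP2C9", "430c>t"): "rs1799853",
    ("CYP2C9", "arg144cys"): "rs1799853",
    ("CYP2C9", "r144c"): "rs1799853",
    ("CYP2C9", "*3"): "rs1057910",
    ("CYP2C9", "1075a>c"): "rs1057910",
    ("CYP2C9", "ile359leu"): "rs1057910",
    ("CYP2C9", "i359l"): "rs1057910",
}


def normalize_alias(alias: str) -> str:
    """Lowercase the alias, drop whitespace and strip a leading 'c.' or 'p.'."""
    compact = re.sub(r"\s+", "", alias).lower()
    return re.sub(r"^[cp]\.", "", compact)


def lookup_rsid(gene: str, alias: str):
    """Return the rs number for a variant name, or None if it is not known."""
    return _SYNONYMS.get((gene.strip().upper(), normalize_alias(alias)))
```

For example, `lookup_rsid("CYP2C9", "c.430C>T")` and `lookup_rsid("cyp2c9", "R144C")` resolve to the same identifier in this sketch, whereas a name missing from the table returns `None` and would need manual curation. The sketch does not address the separate problem of papers failing to state which base is associated with the phenotype.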
Our user interface (Figure 1) now makes it easier for people to get directly to genes, variants, drugs and pathways of interest. For example, users can easily find genes related to their drug of interest by entering the drug name in the gene search box, or view annotations and frequency data for variants related to their drug of interest by searching with the variant box.
PharmGKB has received overwhelmingly positive feedback from users regarding its usefulness in research, as well as in educational programs and presentations. PharmGKB is used to introduce the concept of pharmacogenomics to students in medicine, pharmacy, genetics, toxicology and public health, as well as for the continuing education of medical professionals, including physicians, pharmacists and nurses. We have also used expertise from PharmGKB to pilot a pharmacogenomics project for high school students and teachers. DNATwist is an interactive website that introduces basic concepts of pharmacogenomics and is also being adapted for use at the Tech Museum of Innovation in California (CA, USA).
The field of pharmacogenomics is at a critical juncture. Some have criticized the slow pace at which pharmacogenomic interventions are entering routine clinical practice. These critics fail to appreciate the important advances made in understanding the genetic basis of drug response, an important prerequisite for using genetics to intervene. For example, genome-wide association studies focused on drug response are just now emerging, with only a handful published to date. The criticisms associated with genome-wide association studies for complex disease may be less relevant for many drug responses, where common variants may have more explanatory power. At the same time, our ability to sequence entire genes, exomes and even human genomes is giving us unprecedented access to rare variations, whose interpretation will be critical. The key underlying need is to move from the observation of an association to an understanding of the mechanism. Only with a mechanistic understanding can we find the causative common variants in genome-wide association studies, and only with mechanistic models can we determine which rare variants (and in which combinations) explain drug-response phenotypes. PharmGKB will continue to provide the platform to examine the relationships between variants and drug response, adding new tools as new data are gathered and disseminating them to researchers and educators.
The authors would like to thank PharmGKB team members past and present without whom none of this would be possible: Dorit Berlin, John Conroy, Katrina Easton, Ray Fergerson, Li Gong, Mei Gong, Winston Gor, Joan Hebert, Tina Hernandez-Boussard, Micheal Hewett, Amy Hodge, Laura Hodges, Daniel Holbert, Mark Kiuchi, Steve Lin, Feng Liu, Xing Jian Lou, Charity Lu, Andrew MacBride, Diane Oliver, Connie Oshiro, Ryan Owen, Daniel Rubin, Katrin Sangkuhl, Farhad Shafa, Ravi Shankar, Rebecca Tang, TC Truong, Ryan Whaley, Michelle Whirl Carrillo, Mark Woon and Tina Zhou.
For reprint orders, please contact: reprints@futuremedicine.com
Financial & competing interests disclosure
This work is supported by the NIH/NIGMS (U01GM61374). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.
Caroline F Thorn, Department of Genetics, Stanford University Medical Center, Stanford, CA, USA.
Teri E Klein, Department of Genetics, Stanford University Medical Center, Stanford, CA, USA.
Russ B Altman, Department of Genetics, Stanford University Medical Center, 300 Pasteur Drive, Lane L301, Mail Code 5120, Stanford, CA 94305-5120, USA, Tel.: +1 650 725 0659, Fax: +1 650 725 3863 and Department of Bioengineering, Stanford University Medical Center, Stanford, CA, USA.