While pharmacogenetics - the correlation of genotype and response to medicines - currently has a small but measurable impact on the prescribing practice of clinicians, the advent of the `personal genome' is likely to change this significantly. Advances in high-throughput technologies aimed at characterizing human genetic variation, including chip-based genotyping and next-generation sequencing, are poised to provide a flood of information that will affect both pharmacogenetic discovery and pharmacogenetic application in clinical practice. In order for this flood of information to not overwhelm both researchers and clinicians alike, a variety of new and expanded information management tools will be needed, including electronic medical records, bioinformatic algorithms for analyzing sequence data, information management systems for storing, retrieving and interpreting whole-genome sequence data, and pharmacogenetic decision tools for prescribers.
The practice of modern medicine employs a broad array of drugs to cure acute conditions, such as infections, and to manage chronic conditions, such as diabetes and heart disease. To be approved for use for a given indication, a drug must be demonstrated to be both safe and effective. However, this determination is based on aggregate measures, and individual response to drugs can vary substantially. Some patients may show no response, some may have the desired therapeutic response, while others may have adverse responses ranging from the merely annoying to life threatening. This variation in response may be explained in part by diagnostic uncertainty, environmental factors such as diet and interacting drugs, and by clinical factors such as age and comorbid conditions. However, genetics has also been demonstrated to play a major role in patient–patient variation in drug response. Individual genetic variation in enzymes of drug metabolism and transport may lead to inadequate therapeutic responses owing to the inability to absorb a drug, the inability to activate a prodrug, or excessive metabolism and/or excretion of an active drug. Alternatively, the form of the drug target encoded in an individual's genome may be insensitive to the drug . Similarly, an individual may experience adverse reactions to a drug if their genome encodes defective enzymes of drug metabolism, resulting in an abnormally high exposure to the drug despite normal dosing. Pharmacogenetics is the study of these effects of the `personal genome' on drug response.
The field of pharmacogenetics had its beginnings in the 1950s, with discoveries demonstrating that individual variation in the levels of activity of enzymes involved in the metabolism of a number of drugs correlated with adverse reactions to those drugs . The advent of molecular cloning and sequencing technologies enabled the identification of sequence variants in the genes for these enzymes that are responsible for the variation in activity, and catalogs detailing large numbers of these variants are now available online . Chip-based genotyping technologies have now made practical the testing of panels of genetic variants in large numbers of individuals, effectively making available a limited but focused view of the `personal genomes' of those individuals. Next-generation sequencing technologies are now coming into widespread use and offer the prospect of complete, genome-wide information on an individual basis, but raise new issues relating to the interpretation of that information in relation to pharmacogenetics. This rapidly changing landscape has implications for both the discovery of genetic variants with pharmacogenetic effects and the application into clinical practice.
Characterizing the extent of genetic variation in populations of interest and understanding its distribution across the genome is a prerequisite for any genetic study, and this is certainly true for pharmacogenetic studies. Discovery of most types of genetic variation depends on DNA sequencing. Discovery can be focused on a single gene or a limited number of genes by using molecular cloning or PCR to isolate those genes individually from the DNA of multiple subjects. Comparison of the DNA sequences derived from different subjects allows for the identification of sequence variants, which are positions in the genome where the DNA sequence differs between individuals. Alternatively, an automated pipeline for random, `shotgun' sequencing can identify variants distributed across the entire genome, as was done for the HapMap project, which identified and validated more than 3.1 million SNPs [3,4]. The dbSNP database  currently lists more than 6.5 million validated DNA sequence variants in humans. The large majority of these are common variants, defined as having minor allele frequencies (MAF) greater than 5%, but this is still only a fraction of the 15–20 million sequence variants estimated to occur in human populations, approximately half of which are expected to be uncommon variants (MAF < 5%) . To identify 99% of both common and uncommon variants (MAF > 0.1%) in the Caucasian population would require the sequencing of approximately 1000 subjects , and a publicly funded project to obtain complete genome sequences for this many subjects is now underway .
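As a minimal illustration of the common/uncommon distinction used above, the following Python sketch computes a minor allele frequency (MAF) from genotype counts at a biallelic variant and applies the 5% cutoff. The function names and the example counts are hypothetical, chosen only for illustration, and are not taken from HapMap or dbSNP.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Return the MAF of a biallelic variant from its three genotype counts.

    n_aa, n_bb: numbers of subjects homozygous for allele A or allele B
    n_ab:       number of heterozygous subjects
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each AA subject carries two copies of A, each AB subject carries one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq_a, 1.0 - freq_a)


def classify_variant(maf, common_cutoff=0.05):
    """Label a variant 'common' (MAF > cutoff) or 'uncommon' otherwise."""
    return "common" if maf > common_cutoff else "uncommon"


if __name__ == "__main__":
    # Hypothetical genotype counts in 1000 subjects.
    maf = minor_allele_frequency(n_aa=10, n_ab=180, n_bb=810)
    print(maf, classify_variant(maf))
```

A variant seen only once among 1000 subjects would have a MAF of 0.0005 under this calculation, which shows why detecting rarer variants requires sequencing larger numbers of subjects.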
For the last 30 years, the sequencing methodology of choice has been the Sanger method . Constant improvements in sequencing chemistry, gel-based sequencing equipment, automation and software have resulted in the dramatic increases in sequencing throughput, culminating in the determination of two versions of a consensus sequence for the human genome [7,8]. These sequences now serve as the reference human genomes against which other sequences can be compared for the identification of variants. However, the use of this technology base for the determination of the complete sequences of multiple, individual human genomes remains impractical owing to issues of cost and capacity. Novel technologies for sequencing using highly multiplexed arrays of single DNA molecules or single molecule-derived clusters of DNA (so-called next-generation sequencing) have emerged in the last few years and represent a transformational advance for DNA sequencing, making the prospect of individual, `personal genome sequences' conceivable . Several complete human genome sequences have been published using these technologies, and have identified a treasure trove of novel genetic variation. For example, complete genome sequences of a Caucasian, three Asians and an African individual yielded approximately 600,000, 420,000–590,000, and 1,000,000 novel variants, respectively [10–14]. The 1000 Genomes Project , currently underway, is expected to catalog the vast majority of the remaining common genetic variation in the human genome. Given the means to comprehensively catalog human genetic variation, the challenge for the field of pharmacogenetics is now twofold: first, to discover which of the known or novel variants in the human genome are responsible for differences in individual responses to drugs, and second, to apply these discoveries to clinical practice to allow for the prescription of the safest and most effective drug for the individual patient.
Demonstrating the pharmacogenetic effects of DNA variants requires the collection of a patient cohort with documented medication and outcomes data as well as available DNA samples for testing. Clinical trials and observational studies of patients in specialty clinics often provide the best opportunity to collect the requisite data and samples. However, the number of subjects with the phenotype of interest that can be collected in this way is often small, severely limiting the power to detect pharmacogenetic effects. This is particularly true when the phenotype is uncommon, as is often the case for adverse events. Postmarketing surveillance would seem to be a logical source of patients experiencing adverse events. However, the current system in the USA, the US FDA MedWatch system, with its reliance on voluntary submissions of adverse event reports, is inadequate to provide the high-quality patient data required in sufficient numbers. Large epidemiological cohorts, such as the Framingham study in the USA or birth cohorts collected in a number of European countries, may provide a useful number of subjects, particularly when these studies are linked to electronic medical and prescribing records. Health maintenance organizations (HMOs) and pharmacy benefits managers (PBMs) have large patient databases with electronic records, and a number of projects exploiting these resources are already underway. For example, the Marshfield Clinic's personalized medicine research project combines a biobank of DNA, plasma and serum samples from 20,000 consented individuals with access to the Clinic's medical records. A number of pharmacogenetic studies based on this resource have been completed or are underway, including studies of statins, metformin, warfarin and tamoxifen. Many of these studies were undertaken in collaboration with academic researchers, demonstrating that academic/HMO partnerships are feasible and can be very productive.
Interest in pharmacogenetics by PBMs is exemplified by Medco's (NJ, USA) project on the pharmacogenetic dosing of warfarin, where patients initiating warfarin therapy are recruited in real time from Medco's covered population. The National Human Genome Research Institute (MD, USA) has organized the eMERGE Network, a consortium of academic and private health-care systems, designed to `develop, disseminate and apply approaches to research that combine DNA biorepositories with electronic medical record (EMR) systems for large-scale, high-throughput genetic research'. The success of these efforts demonstrates that the recruitment of subjects based on electronic medical and prescribing records represents an important source of patient cohorts for pharmacogenetic discovery in the future.
Once a cohort of patients has been identified, a choice must be made as to which variants to genotype, how many variants to genotype and the appropriate genotyping technology platform to be used. The traditional approach of testing variants one at a time with individual genotyping assays may severely limit the number of variants that can be typed due to considerations of cost, time and the amount of DNA available. It is then crucial to choose variants with high a priori probability of involvement in the phenotype to be tested. For this reason, pharmacogenetic discovery has, to date, largely relied on the so-called `candidate gene' approach. Candidate genes for a particular drug are selected for evaluation based on pre-existing knowledge relating to the drug target or to pathways of drug metabolism. A typical candidate gene study usually involves one to a few genes. Variants in the candidate genes may be selected from databases or from published reports, based on their presumed or established functional effects, or simply be distributed across the genes at some prespecified density. Typically, one to a few variants are tested per gene. If sufficient variants are not available, the candidate genes may be resequenced in a limited number of subjects (either randomly selected controls or patients with the drug-response phenotype of interest) to identify genetic variants, which may subsequently be genotyped in larger collections of patients with measured drug-response phenotypes. A significant association of a drug-response phenotype with specific alleles or genotypes for a given genetic variant is taken as evidence of an effect of the gene containing that variant on drug response.
The candidate gene approach can be very successful when the correct candidates are chosen and genetic variation within those candidates has been adequately surveyed. However, the candidate gene approach will fail when the state of knowledge about metabolic pathways or drug targets or target pathways for the drug under study is inadequate to allow the correct choice of candidates for testing. For example, selective serotonin reuptake inhibitors are widely used in the treatment of depression, and show widely varying efficacy and side-effect profiles in individual patients. They have also been the subject of many pharmacogenetic studies focusing on the drug target (serotonin reuptake transporter), downstream target pathways (serotonin receptors) and genes involved in drug metabolism (CYP2D6 and CYP2C19). However, results of most of these studies have been either negative or inconsistent , suggesting that genes with significant pharmacogenetic effects must be sought outside of the primary pharmacokinetic and pharmacodynamic pathways. The candidate gene approach may also fail when the state of knowledge concerning genetic variation in the candidate genes is inadequate. Variant discovery may have been carried out in an inadequate number of subjects to successfully identify all relevant variants, particularly those occurring at a low frequency in the discovery population, or may not have covered critical regions of the gene, such as regulatory elements in introns or promoter regions. Patterns of genetic variation differ between racial groups [19,20], and variants may have been described in subjects from a different racial group than those being tested for pharmacogenetic effects.
Recent advancements in our understanding of the patterns of genetic variation across the human genome, provided in large part by the HapMap project, as well as advances in highly multiplexed genotyping technologies, have made possible an alternative to the candidate gene approach: the genome-wide association study (GWAS). No knowledge of candidate genes is required; the GWAS approach only assumes that genetic polymorphisms affecting the trait of interest exist somewhere in the genome, and conducts an unbiased search across the entire genome to find them. This is done by utilizing chip-based technologies to simultaneously (and cost-effectively) genotype 500,000–1 million SNPs (the most common type of genetic variant in the human genome) approximately evenly distributed across the genome. Even this enormous number of variants is only a small fraction of the total number of variants known or predicted to exist in the human population. However, genotypes for common SNPs in close proximity to each other in the genome often tend to be highly correlated, due to the coinheritance through human evolutionary history of the ancestral genotype combinations present on small chromosomal segments, a phenomenon known as linkage disequilibrium. Linkage disequilibrium between the SNPs tested on the chips and nearby, untested SNPs allows a GWAS to capture the majority of genetic information at a much larger number of SNPs, with the tested SNPs serving as proxies, or `tags', for the information contained in the untested SNPs. Patterns of linkage disequilibrium differ between racial groups, and estimates of the proportion of common variants for which the current generation of GWAS chips will yield more than 80% of the total available information range from nearly 90% in Caucasians to less than 70% in individuals of African descent.
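One widely used measure of linkage disequilibrium between two biallelic sites is the squared correlation coefficient r² between their alleles. The sketch below computes r² from the four haplotype counts at a pair of SNPs. The function name and example counts are hypothetical, and reading the `80% of the total available information' figure above as an r² threshold of 0.8 is an assumption made here for illustration, not something stated in the text.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Return r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB: haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    n_Ab, n_aB, n_ab: the other three allele combinations
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n      # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n      # frequency of allele B at SNP 2
    # D measures the departure of the haplotype frequency from independence.
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def is_tagged(r2, threshold=0.8):
    """Assumed convention: an untested SNP counts as tagged if r^2 >= threshold."""
    return r2 >= threshold


if __name__ == "__main__":
    # Hypothetical counts: the two SNPs are strongly but not perfectly correlated.
    r2 = ld_r_squared(n_AB=48, n_Ab=2, n_aB=2, n_ab=48)
    print(r2, is_tagged(r2))
```

Under this convention, an r² of 1 means the tested SNP is a perfect proxy for the untested one, while an r² near 0 means genotyping the tested SNP conveys essentially nothing about its neighbor.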
A GWAS will largely test for only the effects of common polymorphisms (MAF > 0.05). Because the power to detect association decreases as the MAF decreases, the vast majority of SNPs included on the commercially available genotyping chips are common SNPs. Furthermore, of all the untested variants in the genome, only common polymorphisms will show substantial linkage disequilibrium with the common SNPs tested on the chip. The large number of association tests performed in a GWAS requires the use of stringent levels of statistical significance (p-values of 10⁻⁷ or better) to avoid false-positive associations, and consequently the power to detect associations at these genome-wide levels of significance for variants with modest effects on phenotype may be quite low unless large numbers of subjects are studied. Regions of segmental duplication in the human genome are typically under-represented on GWAS genotyping chips, and many genes of potential pharmacogenetic interest, such as cytochrome P450 gene families, tend to fall in these regions and may therefore be inadequately covered in a GWAS. Despite these caveats, hundreds of GWAS have been conducted in the last few years and have successfully identified scores of genes contributing to the risk of many common human diseases and conditions, such as diabetes, heart disease, asthma and obesity. Many of the genes identified would not have been considered candidates for the trait in question, demonstrating the value of the unbiased, genome-wide approach for making novel discoveries.
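The stringent threshold quoted above is roughly what a simple Bonferroni correction yields for chips of this size. The sketch below divides a nominal significance level by the number of tests; the nominal level of 0.05 and the function name are assumptions introduced for illustration, since the text does not state how the threshold was derived.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Chip sizes in the range described for current GWAS platforms.
    for n_snps in (500_000, 1_000_000):
        print(n_snps, bonferroni_threshold(n_snps))
```

Bonferroni correction treats all tests as independent, so it is conservative when neighboring SNPs are correlated through linkage disequilibrium; it is used here only because it is the simplest way to see where a threshold of this magnitude comes from.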
The GWAS era in pharmacogenetics is just beginning, with only a small number of published studies to date. Investigations of variation in therapeutic response to drugs include studies of iloperidone treatment of schizophrenia, thiazide diuretic treatment of hypertension, anti-TNF treatment in rheumatoid arthritis and warfarin dosing [24,25]. Investigations of adverse events include studies of the elevation of serum alanine aminotransferase by ximelagatran, QT interval prolongation by iloperidone, bisphosphonate-related osteonecrosis of the jaw, statin-induced myopathy and drug-induced liver injury due to flucloxacillin. As in the common disease studies, many of the pharmacogenetic associations detected are in genes that would not necessarily have been considered candidates for the effect studied, providing novel insights into biology. The odds ratios for the variants detected in pharmacogenetic GWAS have mostly been relatively high (>2), in contrast to the variants found in common disease GWAS, where the odds ratios are usually quite low (<1.5). However, this is not surprising in that most of the cohorts used in the pharmacogenetic studies (several hundred to a thousand subjects) are quite small by GWAS standards, and therefore were not sufficiently powered to detect variants with low odds ratios. It is to be expected that expanding the size of pharmacogenetic cohorts, perhaps by exploiting large-scale epidemiological cohorts or HMO or pharmacy databases, will allow the detection of additional variants contributing to drug response phenotypes at lower odds ratios.
An intermediate between the traditional candidate gene approach and GWAS is the use of chip-based technology to genotype panels of variants from large numbers of potential candidate genes. For example, the commercially available DMET™ Plus chip (Affymetrix, CA, USA) assays nearly 2000 variants in 225 genes thought to play a prominent role in absorption, distribution, metabolism and excretion (ADME) for a wide range of drugs. By assaying a number of variants several orders of magnitude smaller than a GWAS chip, the statistical penalty for multiple testing is reduced, increasing the power to detect pharmacogenetic effects. More alleles of each gene are assayed, over a broader range of MAF, providing a more comprehensive view of the effects of these genes. DMET chips have been used in the discovery of the pharmacogenetic effect of polymorphisms in the CYP4F2 gene on warfarin dose, and of polymorphisms in the CYP2C19 gene on clopidogrel efficacy. It is likely that the use of this and other custom chip designs will have a prominent place in pharmacogenetic studies in the future.
Given that both the candidate gene approach and GWAS typically focus primarily on common variants, it is fair to ask if rare variants should also be investigated for pharmacogenetic effects. In studies of susceptibility to common, complex diseases, the effect sizes of common polymorphisms are often small and account for a small percentage of total variation in the phenotype [31,35]. Rare variants of high penetrance are also likely to contribute to phenotypes of interest, and multiple rare variants in the many genes involved in pathways affecting a given phenotype may, in aggregate, account for a larger fraction of the total variation in the phenotype than common polymorphisms. For example, Fearnhead and colleagues identified 13 potentially deleterious, rare variants in five candidate genes involved in Wnt signaling and mismatch repair from a series of patients with multiple colorectal adenomas and a series of random controls. A variant was present in 24.9% of patients compared with only 12% of controls. The difference was highly significant. Similarly, Sandilands and colleagues showed that multiple null alleles of the filaggrin gene, including uncommon polymorphisms (MAF ~ 1%) and rare variants, were present in 47% of a series of Irish patients with atopic dermatitis, compared with approximately 8% of controls. In a series of papers, Cohen and colleagues identified rare, nonsynonymous variants in six candidate genes related to cholesterol metabolism by comprehensive sequencing of the exons of these genes in a series of patient samples, and showed higher frequencies of these variants in individuals at the extremes of the trait distributions [39–42], suggesting a causal role. Ji and colleagues screened 3000 patients from the Framingham Heart Study offspring cohort for rare variants in three genes known to be involved in recessive diseases exhibiting extremely low blood pressure.
They found 30 rare variants of known or inferred functional effect in the heterozygous state in 49 subjects. These subjects had an average systolic blood pressure of 6.3 mmHg lower than that of the entire cohort, and a reduced prevalence of hypertension .
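To relate aggregate carrier frequencies of this kind to the odds ratios reported in association studies, the following sketch converts a pair of carrier proportions into an odds ratio, using the colorectal adenoma figures quoted above (24.9% of patients versus 12% of controls). The function name is hypothetical, and the calculation is a back-of-the-envelope illustration based only on the two percentages, not a reanalysis of the published data.

```python
def odds_ratio_from_proportions(p_cases, p_controls):
    """Odds ratio for carrying a variant, given carrier proportions in each group."""
    for p in (p_cases, p_controls):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must lie strictly between 0 and 1")
    odds_cases = p_cases / (1.0 - p_cases)          # carriers : non-carriers in cases
    odds_controls = p_controls / (1.0 - p_controls)  # same ratio in controls
    return odds_cases / odds_controls


if __name__ == "__main__":
    # Carrier proportions quoted in the text for patients and controls.
    print(round(odds_ratio_from_proportions(0.249, 0.12), 2))
```

The result is a little above 2, which suggests that rare variants, when considered collectively, can show effect sizes larger than those typical of individual common polymorphisms.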
In pharmacogenetics research, in contrast to disease susceptibility research, there are many examples of common variants with high penetrance that account for a substantial portion of the variation in response to certain drugs in human populations. However, there are also many examples of uncommon to rare variants that have similar values of penetrance. For example, the CYP2D6 gene, responsible for the metabolism of 25% of commonly used drugs, has been extensively studied over the years, and over 75 variants have been described . Most are low frequency variants in any given population (though frequencies vary geographically), and novel variants continue to be discovered as larger and more diverse groups of subjects are examined. Not all of these CYP2D6 variants have been characterized functionally, but a substantial fraction have been demonstrated to have little or no enzyme activity and would be expected to have an impact on phenotypes related to CYP2D6 activity. Despite this evidence of allelic diversity, most candidate gene studies will choose to genotype only the few variants known to be common in the population being studied for reasons of cost and convenience, and GWAS will miss the effects of all of the rare variants due to the fact that they will not be adequately `tagged' by any of the common SNPs on the genotyping chip. Thus, in a discovery setting, part of the effect of CYP2D6 on a pharmacogenetic phenotype in a population will be missed, and in a clinical setting, individual patients carrying the rare, but untyped, variants will be given incorrect information about their CYP2D6 status and their consequent risk for drug reactions. The situation could be improved by using a multiplexed genotyping assay to allow cost-effective coverage of a larger range of genetic variation. For example, the DMET Plus chip assays 31 variants in the CYP2D6 gene, a substantial improvement over the handful of common variants typically tested by individual assays. 
However, novel variants will still be missed, and for some applications, multiplexed assays with the required content will not be available.
The best way to ensure the comprehensive identification and genotyping of all variants of a gene, both common and rare, is to resequence that particular gene in each subject. For example, complete sequencing of the BRCA1 and BRCA2 genes is routinely employed in a diagnostic setting to identify variants conferring a risk for breast cancer in subjects with a family history of the disease. Traditional Sanger sequencing is adequate for resequencing of one or two genes, but if large panels of genes must be resequenced for pharmacogenetic discovery or pharmacogenetic risk prediction, this approach becomes impractical.
A variety of next-generation sequencing technologies are now commercially available . All rely on sequential fluorescent imaging of base incorporations in large scale arrays of single DNA molecules or single molecule-derived clusters of DNA, though the enzymology and chemistry of cluster generation and base incorporation differ between technologies. All these technologies currently produce shorter individual sequences with higher per base error rates than traditional Sanger sequencing, but rely on the enormous number of individual sequences generated by the massively parallel nature of the approach to produce highly-redundant coverage of each base, resulting in an accurate consensus sequence and accurate calling of sequence variants. Next-generation sequencing technologies have recently been used to determine the essentially complete genome sequences of five individuals [10–14], and both tumor and constitutional genome sequences of a sixth individual . However, despite the enormous increases in throughput and reductions in cost per consensus base pair of these approaches, applying whole-genome sequencing to the large number of individuals needed to identify rare variants and demonstrate associations of those variants with phenotypes of interest is still a large project, and is still prohibitively expensive. Inexpensive, complete, `personal genomes' may be on the horizon, but that horizon is still a few years away.
An alternative to whole-genome sequencing is to apply next-generation technology to targeted resequencing of specific genes. Two main approaches have been employed for targeting specific regions of the genome for sequencing: hybridization-based sequence capture [46–49] and multiplexed PCR amplification [50–53]. Each of these approaches has advantages and disadvantages. Both are still in the early stages of development, and it is likely that improvements in one or more of these methods in the near future will allow for the targeting of larger regions with improved specificity and uniformity of coverage, and at reasonable cost. Targeted resequencing using next-generation technology can easily produce megabases of sequence covering extensive lists of candidate genes in a cost-effective and timely manner, providing a comprehensive but focused view of the portion of an individual genome where variants believed to have high a priori probabilities of affecting a phenotype of interest may be found. Alternatively, a gene-targeted but genome-wide approach can be taken by targeting the entire exome (the protein coding sequences for all genes, comprising approximately 1–2% of the genome).
In a recent paper, the exome was extracted from the complete genome sequence of Craig Venter (one of a handful of complete individual genome sequences currently in the public domain) and analyzed for potentially functional variants . Venter's exome contained approximately 10,400 nonsynonymous single nucleotide variants ([SNVs] i.e., single nucleotide changes that alter the amino acid encoded at a particular position in a protein), and about 15–20% of these SNVs are rare in the human population. Bioinformatic analysis of the amino acid changes produced by these variants indicated that about 1500 are likely to be functional. Only seven nonsynonymous SNPs had annotations in the literature linking them to disease states, and all of these had low odds ratios. No obvious correlations of any of these SNPs with the phenotype of Dr Venter could be inferred. This study points out some of the difficulties of interpretation that will be encountered in analyzing large-scale individual sequence data. In any whole-genome sequence, whole-exome sequence, or even moderately large set of candidate gene sequences, a large number of variants will be found. Most of the common variants and essentially all of the rare variants will have no annotation in the literature linking them with disease states or response to medicines. Functional consequences can be assumed for SNVs that result in protein truncation, or indels (insertion/deletion polymorphisms) that result in frameshifts, or deletions that remove large portions of a gene. Bioinformatic analysis can infer function for some of the nonsynonymous SNVs [56,57] and an even smaller portion of regulatory variants. However, many variants will remain uncharacterized as to function, unless high-throughput systems for functional screening can be devised. Even if a variant can be demonstrated to have an effect on protein function, it remains to be shown whether this results in an effect on phenotype. 
Moderately common to common variants discovered through sequencing can be tested for association with a phenotype in a suitable patient cohort, as has traditionally been done in both candidate gene studies and GWAS. For rare variants, this approach will have insufficient power even in large cohorts, and new approaches must be employed that will aggregate multiple rare variants of similar functional consequences for analysis. For example, Cohen et al.  showed that multiple nonsynonymous variants in candidate genes for high-density lipoprotein (HDL) cholesterol levels, taken together, were enriched in patients at one extreme of the population distribution of HDL levels, suggesting that variants that affect the function of these genes also affect HDL levels. Focusing resequencing efforts on subjects collected from the extremes of the population distribution of pharmacogenetic traits may prove to be a powerful approach to pharmacogenetic discovery.
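The idea of aggregating rare variants can be sketched as a simple collapsing test: each subject is scored as a carrier if they hold at least one rare allele in the gene set, and carrier counts in the two extremes of the trait distribution are compared with a one-sided Fisher's exact test. This is one generic way to implement such an analysis and is not claimed to be the procedure used by Cohen et al.; the function names and genotype data below are hypothetical.

```python
from math import comb


def count_carriers(genotypes_by_subject):
    """Count subjects carrying at least one rare allele.

    genotypes_by_subject: one list per subject, giving the number of rare
    alleles (0, 1 or 2) at each variant in the gene set.
    """
    return sum(1 for variants in genotypes_by_subject if any(g > 0 for g in variants))


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: carriers and non-carriers in the first group
    c, d: carriers and non-carriers in the second group
    Returns the probability of seeing at least `a` carriers in the first
    group if carrier status were unrelated to group membership.
    """
    group1 = a + b
    carriers = a + c
    n = a + b + c + d
    total = comb(n, carriers)
    upper = min(group1, carriers)
    tail = sum(comb(group1, k) * comb(n - group1, carriers - k)
               for k in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical rare-allele counts at three variants per subject.
    low_extreme = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
    high_extreme = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0]]
    a = count_carriers(low_extreme)
    c = count_carriers(high_extreme)
    p = fisher_one_sided(a, len(low_extreme) - a, c, len(high_extreme) - c)
    print(a, c, p)
```

Collapsing in this way trades resolution for power: no single rare variant needs to reach significance, but the test assumes that the aggregated variants affect the phenotype in the same direction.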
Testing for individual genetic variants with known effects on drug response is already available to clinicians and is being used to guide therapy to a limited extent. For example, testing for common variants of CYP2C9 and VKORC1, which affect warfarin dose requirements, is available from many clinical laboratories. Algorithms for calculating a starting dose of warfarin based on genotype plus clinical factors have been developed and are available online, facilitating the interpretation and use of pharmacogenetic information by clinicians. Testing for HLA-B*5701 in patients initiating abacavir therapy has become widespread, and essentially eliminates cases of abacavir hypersensitivity. Cetuximab, a monoclonal antibody directed against the EGF receptor, is ineffective in treating colorectal cancers containing somatic mutations in the KRAS oncogene, and KRAS genotyping is becoming common in selecting an appropriate chemotherapy regimen for colorectal cancer patients. Many other single gene tests are available from both clinical laboratories and direct-to-consumer genetics companies, with varying degrees of proof regarding clinical utility and cost-effectiveness, and clinicians are generally left to their own devices in interpreting the genetic results and incorporating them into their prescribing practices. Improvements in the evidence base and development of widely accepted, detailed pharmacogenetic-based prescribing guidelines for clinicians are needed to further the goal of personalizing prescribing practices.
The field of pharmacogenetics is evolving rapidly. While traditional candidate gene studies are still being pursued, GWAS are becoming much more common, and deep resequencing is seen as the next logical step. In the near term, as larger numbers of clinically validated pharmacogenetic effects are identified, multiplexed, chip-based assays for panels of variants are likely to become the preferred method for delivering personalized prescribing based on individual genetic variation to large numbers of patients. The AmpliChip® (Roche, Basel, Switzerland) was the first such device to gain US FDA approval for testing for polymorphisms in the CYP2D6 and CYP2C19 genes. However, the AmpliChip has not come into widespread use owing to its relatively high cost and limited coverage of drug metabolism genes. The DMET Plus chip, though not an FDA-approved device, is a promising development in this area, covering over 200 genes involved in drug metabolism and transport. Additional custom chips or alternative multiplexed genotyping technologies will undoubtedly be developed for specific diagnostic needs and updated as novel pharmacogenetic variants are discovered. These types of highly multiplexed assays for individual variants provide a view into the `pharmacogenome' of an individual, and thus represent a focused version of the `personal genome'.
The holy grail of the genome sequencing technology world is the US$1000 genome. As the cost of genome sequencing declines towards this level over the next few years, clinicians face the prospect of patients arriving in their offices with copies of their personal genome in hand. What will be required to translate this overwhelming amount of information into clinical decision making that takes into account the patient's inherited susceptibility to disease and their likely reactions to the drugs that may be used to treat those diseases? First, an enormous infrastructure will have to be developed to support the sequencing itself, the storage of the resulting sequences, and access to those sequences. A single run of a next-generation sequencing instrument produces approximately one terabyte of raw data, which must be transferred from the sequencer to temporary storage space over high-speed networks, analyzed on dedicated servers to extract sequence files, and assembled into completed genome or target-gene sequences using special-purpose software. This infrastructure exists today only within a few genome centers, but will have to be replicated and extended enormously to allow worldwide dissemination of the technology. Once the finished sequences themselves can be generated, EMR systems adapted to handling sequence data will have to be widely adopted across the healthcare system. Second, improvements in bioinformatics tools will be needed to assess the functional significance of rare variants identified in genes of potential pharmacogenetic impact in individual patients' genome sequences. Third, major advances will be needed in our understanding of how genetic variation affects responses to commonly used drugs and other drugs with potentially severe side effects, and in the clinical utility of using genetic information to guide prescribing of these drugs.
Assuming that suitable protections are put into place to protect patient privacy, the mining of large databases of individual genome sequences coupled to drug response data in EMRs will lead to novel pharmacogenetic discoveries and provide the evidence base for the clinical utility of pharmacogenetic markers. Fourth, as the pharmacogenetics knowledge base expands, application of sequence data to clinical decision making will follow. Informatics-based decision tools will be needed to help guide clinicians in interpretation of genome sequences and personalization of prescribing practices based on those interpretations. Finally, outreach and education programs will be needed to facilitate clinician understanding of the new tools available to them to bring the benefits of personalized, genome-based medicine to their patients.
The author would like to acknowledge Howard McLeod, Sarah McWhinney Glass and Kevin Long for their helpful comments.
Financial & competing interests disclosure
The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.