Next‐generation sequencing (NGS) approaches for measuring RNA and DNA benefit from greatly increased sensitivity, dynamic range and detection of novel transcripts. These technologies are rapidly becoming the standard for molecular assays and represent huge potential value to the practice of oncology. However, many challenges exist in the transition of these technologies from research application to clinical practice. This review discusses the value of NGS in detecting mutations, copy number changes and RNA quantification and their applications in oncology, the challenges for adoption and the relevant steps that are needed for translating this potential to routine practice.
The last decade of research has consolidated our understanding of cancer as a genetic disease caused by genomic disruptions ranging from single point mutations to deletions or amplifications of chromosomal segments and structural rearrangements that give rise to chimeric genes. These aberrations at the genomic level drive changes in gene expression, activate or silence genes and thereby perturb gene networks and pathways.
There now exists an extensive literature cataloging genomic disruptions in cancer and their effect on the biological functions of cancer cells. Several of these disruptions are important biomarkers and impact treatment options. Estrogen receptor (ER) testing has been routinely performed on breast carcinoma samples since the 1980s to determine if hormonal therapy is indicated. Similarly, EGFR mutation status has been used to determine which lung cancer patients will benefit from agents targeting EGFR. The FDA lists more than 100 indications where pharmacogenomic testing is indicated, including 38 in oncology (http://www.fda.gov/drugs/scienceresearch/researchareas/Pharmacogenetics/ucm083378.htm). While each of these measures is of value, individually they represent a single data point in the complex environment of cancer.
The advent of multigene analysis tools, i.e. gene arrays, CGH and next generation sequencing, has advanced the field by measuring the complexity of cancer in a more comprehensive fashion. Several multigene assays have been introduced into the clinic to predict patient outcome. Oncotype DX™ (Genomic Health, Redwood, CA) quantifies the expression of 21 genes by RT‐PCR and uses an algorithm to combine the expression values into a “recurrence score” to predict chemotherapy benefit for a subset of breast cancer patients. Many additional tests and markers have been reported to be of use in the management of cancer – CancerTYPE ID (bioTheranostics, Inc, San Diego, CA) to aid in the classification of the tissue of origin and tumor subtype for patients diagnosed with malignant disease, OncoType Dx Colon (Genomic Health) for assessment of risk of recurrence following surgery in stage II colon cancer patients and Mammaprint™ (Agendia, Irvine, CA) for identifying risk of distant recurrence following surgery. All these tests measure genes and their expression levels through capillary sequencing, microarrays, or PCR and are standardized for measuring a small subset of the tumor genome.
As the utility of these tools increases, the number of different tests and diagnostic providers makes it increasingly cumbersome for pathologists and oncologists to obtain enough sample material for analysis. Next Generation Sequencing (NGS) technologies offer the potential to measure and quantify all these markers at once and provide a more complete view of the tumor's molecular state. In addition, NGS has allowed the analysis of complete human genomes at a reasonable cost – the cost for sequencing one human genome has come down from $100 million in 2001 to just under $3000 in 2012 (www.genome.gov/sequencingcosts).
NGS can now provide the following depth and breadth of genomic information in a single test: (Gargis et al., 2012) (i) Whole genome information at single nucleotide level with a complete catalog of mutations, (Ellis and Perou, 2013); (ii) A profile of the copy number states of individual genes and many chromosomal aberrations (Ellis and Perou, 2013); (iii) Whole transcriptome landscape including mRNA levels of protein coding genes, non‐coding RNA, expression of repeat rich regions, and aberrant fusion genes. We can now foresee a scenario in which a tumor sample, once obtained through biopsy or surgery, is used to extract DNA and RNA, sequenced and assembled to provide the full patient genome and an RNASeq profile of his/her transcriptome (Figure 1). This data is then mined to catalog all mutations and copy number aberrations and quantify expression of genes. Once the mutational, copy number and transcriptomic profile is generated, clinical decision support algorithms are used to extract and present useful and clinically actionable information from the results of the analyses. This has already been shown in pilot studies (Gargis et al., 2012) and is a major step forward in realizing the potential of personalized medicine.
A model for enabling sequencing based personalized oncology. A CLIA certified laboratory generates NGS data (Layer 1) that is transferred to a High Performance Computing environment where the requisite quality control and analysis of the data is performed ...
The comprehensive nature of NGS also has the potential to replace a multitude of single gene tests that are currently performed on multiple discrete specimens with a single test on one specimen. This would lead to improved standardization of tests for specific genetic abnormalities, more in depth information for the clinicians and more cost effective molecular diagnostic testing. Additionally, once sequenced, this genome information is “digitized and may be immortalized” which enables the sequenced sample to be “frozen” in‐silico and accessible for further querying as the treatment progresses or whenever new clinically relevant aberrations are identified or reported. This is more advantageous than testing an archived tumor sample for prospective or retrospective analysis. This makes a patient genome a source of data for comprehensive genome forward and backward approaches i.e. mutations that are identified would drive the selection of therapies based on retrospective data on prior patients with similar genomic profiles whose outcome is known (genome backward medicine). For patients who have failed conventional therapies or for whom there are no clearly delineated guidelines on therapy choice, new therapeutic strategies could be attempted (as part of clearly defined clinical trials) based on the comprehensive analysis of the patient tumor's genetic makeup (genome forward medicine). Recently, Ellis and Perou offered a prospective view of how genomic profiling could help in the treatment of breast cancer (Ellis and Perou, 2013). They catalogued specific examples of mutations (PIK3CA, BRCA1, BRCA2, GATA3, MLL gene family, rare Receptor Tyrosine Kinases) and genomic abnormalities (amplifications/gain of function mutations in Her2, FGFR1, FGF3, Cyclin D1/CDK4/CDK6, MDM2, deletions/loss of function mutations in PTEN, PIK3R1) that could be used to target therapies in breast cancers.
However, the optimal use of these novel molecular assays will be a challenge to the practicing oncologist. Many challenges remain in moving this vision from a few luminary sites and hospitals into routine medical practice. Many of the opportunities and challenges in applying next‐generation sequencing for clinical applications have been reviewed elsewhere (Biesecker et al., 2012; Biesecker et al., 2009; Nekrutenko and Taylor, 2012; Treangen and Salzberg, 2011; Berg et al., 2011; Maher, 2011; McDermott et al., 2011; Ormond et al., 2010). In this review, we focus on the role of next generation sequencing technologies for cancer patients and the challenges we face in using this technology for clinical applications, and we provide a framework to help oncologists weigh its promise and pitfalls in routine clinical practice. We consider three major groups of challenges for the introduction of a new technology such as sequencing into clinical practice. We outline these major steps in Figure 2.
Major challenges for introducing sequencing based oncology into routine practice.
We will now review in detail the status of NGS technology in the context of the broad outlines described above.
Next Generation Sequencing has rapidly replaced other high‐throughput technologies such as microarrays as the platform of choice for many genomic applications. The base‐call quality of Illumina NGS machines, when integrated across all reads of a given base, exceeds that of Sanger based capillary sequencers. Ninety percent of the bases called by an Illumina NGS sequencer have Phred quality scores of Q30, compared to around Q20 for Sanger based sequencers. A Phred score of Q30 corresponds to a base‐call error probability of 1 in 1000, or 99.9% accuracy. Throughput of the sequencers has also increased dramatically, so that it is now possible to generate enough sequence data to assemble a full human genome in 1 day. New sequencers such as Ion Torrent (Life Technologies, Carlsbad, CA) and Oxford Nanopore Technologies (Oxford, UK) promise even higher throughput at lower costs. The accuracy, speed and cost of assembling a human genome have met the threshold for enabling clinical use. Additionally, these instruments are undergoing regulatory approval under Clinical Laboratory Improvement Amendments (CLIA) and FDA (510(k)). Several commercial providers have begun to offer CLIA certified lab developed tests (LDT) that use NGS technologies. Foundation Medicine (Cambridge, MA) offers a CLIA certified test that scans for somatic alterations in 236 relevant cancer‐related genes. Ambry Genetics (Aliso Viejo, CA) offers CLIA certified Exome Sequencing for undiagnosed genetic diseases. Both Life Technologies (AmpliSeq™ Comprehensive Cancer Panel) and Illumina (TruSeq Amplicon – Cancer Panel) offer target selection assays for resequencing cancer genes. These methods allow for fast and efficient resequencing of key genes in formalin fixed paraffin embedded (FFPE) samples.
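The relationship between a Phred score and the base‐call error probability quoted above follows the standard definition Q = −10·log10(P). The short sketch below illustrates the conversion in both directions; the function names are our own and are not taken from any sequencing vendor's software.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong (P = 10^(-Q/10))."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for a base-call error probability p, with 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Compare the two quality levels discussed in the text.
    for q in (20, 30):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:.4f}, accuracy {100 * (1 - p):.1f}%")
```

Under this definition, Q20 corresponds to one error in 100 base calls and Q30 to one error in 1000, so each 10‐point increase in Phred score reflects a tenfold drop in the expected error rate.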
For NGS to become part of routine clinical practice, the medical benefits must be clearly demonstrated. In the area of oncology, there are a number of clinical needs that can be met by sequencing technologies.
Genetic abnormalities detectable by sequencing can be classified into three major groups – single nucleotide changes or point mutations, copy number changes (amplification, rearrangement or deletion of sections of chromosomes) and changes in expression levels of genes. In the following paragraphs, we discuss biomarkers with known clinical utility, measured by individual tests, which could be converted to a whole genome sequencing approach.
Point mutations that lead to constitutive activation of oncogenes or inactivation of tumor suppressor (TS) genes have been used to guide development of novel targeted therapies – the best described is the cKIT mutation in gastrointestinal stromal tumors that can be targeted by imatinib or nilotinib. Recently, mutations in EGFR have been reported to be predictive of response to EGFR inhibitors (Lynch et al., 2004; Paez et al., 2004), and mutations in the BRAF oncogene at codon 600 have been reported to have clinical utility in melanoma, colorectal, lung and thyroid cancers (De Roock et al., 2011; Melck et al., 2010; Pao and Girard, 2011). Patients with BRAF V600E mutation‐positive, inoperable or metastatic melanoma are eligible for treatment with vemurafenib. The cobas 4800 BRAF V600 Mutation Test (Roche) is used to identify patients eligible for treatment. An exemplary list of point mutations for which commercial testing is available, and their utility in cancer treatment, is provided in Table 1.
Clinically relevant tests for mutations in cancer and their utility.
For single gene tests to be used, the mutations in the gene have to meet a threshold of prevalence to warrant testing and have very high clinical value. Many of the genes listed in Table 1 are frequently not tested in the clinic because they fail to meet this threshold or the resultant clinical action of the tests is not clear. Recent reports suggest that each tumor has a distinct number of mutations driving it – however, each of these mutations is present only in a small percentage of tumors. Mardis et al. (2009) identified 750 point mutations in AML of which 64 were in coding/regulatory regions. Only 4 out of these 64 point mutations could be detected in more than one sample, suggesting that individual mutations are not recurrent. Other studies have reported that while a specific mutation might not be found recurrently, common pathways can be identified that may drive pathogenesis – Stransky et al. (2011) found that more than 30% of head and neck cancer cases harbored mutations in genes that regulate squamous differentiation (e.g., NOTCH1, IRF6, and TP63). These studies suggest that methods that assess the entire genome would offer a comprehensive approach in defining perturbed pathways as opposed to using single gene expression levels or mutation of particular codons as surrogates of perturbed pathways.
It is also increasingly recognized that cancer mutations are not limited by tissue type, although the prevalence of these mutations might be so. For example, BRAF mutations are common in melanoma (>50%), but have also been detected in lower frequencies in other cancers (Brose et al., 2002; Davies et al., 2002). Similarly, ERBB2 amplifications were originally described in breast cancers and form the basis of treatment with trastuzumab. Recently, it has been reported that 10% of gastric cancers also harbor the ERBB2 amplicon and these patients might benefit from Herceptin therapy (Barros‐Silva et al., 2009). Currently, although cumulatively these low prevalence mutations may make up a large portion of cancer drivers they are not routinely tested in the clinical setting. In this context, it might be more important to screen for all mutations in the patient by sequencing in an unbiased manner, instead of applying single gene tests.
Roychowdhury et al. (2011) recently reported an interesting pilot study demonstrating the value of comprehensive sequencing in a patient with metastatic colorectal cancer. A mutation in the NRAS gene and an amplification of the CDK8 locus were identified, both of which could be used to enroll the patient in a clinical trial targeting these aberrations (Roychowdhury et al., 2011). Interestingly, this patient underwent testing for KRAS and was deemed to be wild type, which is a basis for prescribing anti‐EGFR (Biesecker et al., 2012) therapy. Although this patient was not prescribed anti‐EGFR therapy, he would have been eligible. The authors noted that the NRAS mutation is functionally equivalent to a KRAS mutation and, if known, should preclude this patient from anti‐EGFR therapy. This case study also reinforces the difficulty in prescribing single gene mutation testing – while NRAS mutations are present in 18% of cutaneous melanomas (Lee et al., 2011), they are much rarer in colorectal cancers (2%) (Irahara et al., 2010). Only unbiased sequencing based mutation testing would have uncovered these important driver mutations.
Several platforms and testing services are now available for identification of multiple mutations in cancer‐related genes in a single assay. MacConaill et al. (2009) reported a mutation profiling platform (‘‘OncoMap’’) to interrogate 400 mutations in 33 known oncogenes and tumor suppressors. Such assays, while an improvement over single gene assays, still offer a limited fraction of mutational information relevant to cancer. The advent of whole genome sequencing technologies offers very competitive price points and the ability to profile tumors for multiple mutations with a single test. Commercial providers such as Foundation Medicine (Cambridge, MA) already use deep sequencing on selected sets of cancer genes from DNA extracted from routine pathology specimens to provide actionable information to treatment providers.
Regions of the genome are commonly amplified or deleted in cancer and these regions contain genes that drive cancer progression – the best example being the 17q12 amplicon that harbors the HER2 oncogene. This amplicon leads to a more aggressive type of tumor, which is now the target of a highly successful antibody therapy, trastuzumab (Herceptin®). Other amplicons in 11q13/14, 8q24, and 20q13.2 have been found in cancers that seem to drive the cancer phenotype and have prognostic significance. These regions contain gene sets, which are important in DNA metabolism and maintenance of chromosomal integrity, suggesting that response to DNA damaging agents used as anticancer therapy might be modulated by the presence of particular amplicons. Recently authors of a large breast cancer study proposed ER positive breast cancers harboring amplifications in 11q13/14 as a separate subgroup with worse outcomes (Curtis et al., 2012).
There exist several published methods for inferring gene copy number using next generation sequencing. The sensitivity and specificity of these methods exceed that of microarray based techniques and require very small amounts of tumor DNA as starting material.
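As a rough illustration of how copy number can be inferred from sequencing data, the sketch below computes a median‐centered log2 ratio of tumor to matched normal read depth per genomic bin. It is a deliberately simplified depth‐ratio approach, not any of the published methods referred to above; the bin depths, pseudocount and calling thresholds are illustrative assumptions, and real tools also correct for GC content, mappability, tumor purity and ploidy.

```python
import math
from statistics import median


def log2_copy_ratios(tumor_depth, normal_depth, pseudocount=0.5):
    """Median-centered log2(tumor/normal) read-depth ratio for each genomic bin."""
    if len(tumor_depth) != len(normal_depth):
        raise ValueError("tumor and normal must cover the same bins")
    raw = [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumor_depth, normal_depth)
    ]
    # Centering on the median assumes that most bins are copy-number neutral.
    center = median(raw)
    return [r - center for r in raw]


def call_state(log2_ratio, gain=0.58, loss=-1.0):
    """Label a bin using illustrative thresholds (assumed, not validated).

    log2(3/2) is about 0.58 (one extra copy) and log2(1/2) is -1 (one lost
    copy) for a pure tumor sample in a diploid region.
    """
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    tumor = [100, 100, 400, 100]   # hypothetical per-bin read depths
    normal = [100, 100, 100, 100]
    for ratio in log2_copy_ratios(tumor, normal):
        print(f"{ratio:+.2f} {call_state(ratio)}")
```

Centering on the median rather than on total read count avoids a large amplicon pulling the apparent ratio of all other bins below zero.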
Translocations and their corresponding gene fusion products have been known to play an important role in the onset and development of several cancers (Mitelman et al., 2007). The classic example of a translocation resulting in the creation of a fusion transcript is the reciprocal translocation t(9;22)(q34;q11) (McDermott et al., 2011; Lee et al., 2011) causing the BCR‐ABL1 fusion transcript. BCR‐ABL1 fusion occurs in most patients with chronic myelogenous leukemia (CML) and a third of patients with acute lymphoblastic leukemia. While the clinical impact of gene fusions has been most prevalent in hematological malignancies, there is growing evidence that they could have prognostic and predictive utility in common solid tumors (Table 2). The clinical utility of interrogating solid tumors for gene fusions can be seen from the recent approval of crizotinib for the treatment of NSCLC that harbor rearrangements in ALK (Kwak et al., 2010). Similar therapeutic implications have also been recently reported for MAST kinase rearrangements in breast cancer (Robinson et al., 2011) and RET rearrangements in lung adenocarcinoma (Lipson et al., 2012), thus highlighting the clinical relevance of genomic translocations and their fusion transcripts. This provides an additional clinical indication for sequencing technologies that can detect such variations in addition to mutations, amplifications and deletions.
Clinically relevant chromosomal abnormalities in cancer and their utility.
Several studies have documented the use of the expression of one or more genes in determining cancer prognosis or treatment. An early series of publications specifically described molecular signatures in breast cancer, primarily focused on associations between particular sets of genes with altered expression and survival (Sorlie et al., 2001; van 't Veer et al., 2002; West et al., 2001; Perou et al., 2000). These studies led to the development of clinically used tests such as Oncotype DX™, Mammaprint™, Breast Cancer Index™ (BCI), PAM50 and others. Multi‐gene tests are also useful in other tumors, for example the ColoPrint™ assay for recurrence risk in Stage II and III colon cancer patients.
The ability of RNA‐seq to more completely characterize the entire transcriptome in comparison to any existing single technology such as RT‐PCR or microarray promises significant clinical utility (Wang et al., 2009; Ryu et al., 2011). A big advantage of RNA sequencing would be the opportunity to apply multiple individual or multi‐gene tests to the same sample in a single RNA‐seq run, at a fraction of the cost of running each test separately. For example, in breast cancer a single RNA‐seq run could provide input to calculate cancer subtype using the PAM50 gene set (Parker et al., 2009), obtain ER/PR and Her2 RNA abundance values (Kamalakaran et al., 2011), calculate the Genomic Grade Index (Filho et al., 2011; Liedtke et al., 2009) based on the 97 genes and identify expression of any other amplified or deleted genes such as EGFR which are known to impact therapy response.
Recent improvements in sample processing protocols have also allowed for robust extraction of DNA/RNA from FFPE samples and use of very small amounts of DNA for complete sequencing (Bonin et al., 2010; Fairley et al., 2012; van Eijk et al., 2012). Sample quality is of primary importance in generating sequence based diagnostic tests. Quality control steps such as the RNA integrity number (Schroeder et al., 2006), as used in standard molecular tests, are an absolute requirement. In addition, steps are needed to evaluate and quantify the histology of the specimen – especially to establish the percentage of invasive tumor in the specimen and the heterogeneity based on staining. Following the sequencing run, general quality checkpoints are also essential to ensure data integrity and quality. The quality of the reads produced by the sequencer needs to be evaluated before subsequent mapping and feature extraction are performed. While most sequencers generate a quality control (QC) report as part of their analysis pipeline, tools such as FastQC are available and should be incorporated routinely in any clinical analysis. A QC report that identifies problems as originating either in the sequencer or in the starting library material would be more useful in the clinical context. The interpretation of the reads produced by the sequencer involves data analytic steps that first require alignment of the reads to a reference genome or transcriptome and subsequent interpretation of the resulting alignments to detect mutations and copy number abnormalities and to estimate transcript abundance. Quality checkpoints for sequencing based measurements are detailed in Table 3.
Quality control checkpoints for sequencing based measurements.
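One of the simplest read‐level checkpoints is the fraction of bases in a read that reach Q30. The sketch below decodes a FASTQ quality string and applies such a filter. It assumes the Phred+33 ASCII encoding, and the 90% pass threshold is an illustrative choice rather than a validated clinical cutoff; it is not a substitute for a full QC tool such as FastQC.

```python
def fraction_at_or_above(quality_line: str, threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases in one read whose Phred score is >= threshold.

    quality_line is the fourth line of a FASTQ record; offset=33 assumes
    Phred+33 encoding.
    """
    if not quality_line:
        raise ValueError("empty quality string")
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(score >= threshold for score in scores) / len(scores)


def read_passes_qc(quality_line: str, min_fraction: float = 0.9) -> bool:
    """True if at least min_fraction of the bases reach Q30 (illustrative cutoff)."""
    return fraction_at_or_above(quality_line) >= min_fraction


if __name__ == "__main__":
    # In Phred+33, 'I' encodes Q40, '?' encodes Q30 and '5' encodes Q20.
    for qual in ("IIIIIIIII5", "IIII?5", "5555IIII"):
        print(qual, round(fraction_at_or_above(qual), 2), read_passes_qc(qual))
```

In practice such a per‐read summary would be aggregated across a run and reported alongside the other checkpoints, so that a failing sample can be traced to either the instrument or the library.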
There has been an enormous body of computational biology research on the identification of mutations, copy number changes and quantification of gene expression from tumor samples. However, to be truly useful in the clinic, the methods must be used in conjunction with strict controls on data quality, coverage and an understanding of the assumptions under which the algorithms were developed.
Phred quality scores (Ewing and Green, 1998; Ewing et al., 1998) were originally introduced by Phil Green to assess the probability of accurately calling a base in capillary sequencers. Li et al. (2008a) introduced the concept of using these quality scores in a consensus nucleotide calling algorithm to call mutations and single nucleotide polymorphisms. These Phred scores (Ewing and Green, 1998; Ewing et al., 1998) must be higher than 30 to ensure correct base calls by the sequencer. Additionally, each called base must be covered by enough aligned reads for accurate genotyping. In order to achieve an accuracy of 1 genotyping error in 1 million calls, Ajay et al. (2011) required at least 50X coverage of a clinical blood sample. Other reports set a higher threshold of 100X for calling genotypes (Carter et al., 2012). Kohlmann et al. (2011) recently reported on a study across 10 different laboratories for targeted mutation screening using sequencing for clinical applications. They reported high concordance, including robust detection of novel variants that were undetected by standard Sanger sequencing. Additionally, they demonstrated the sensitivity to detect low‐level variants present at 1–2% frequency. In comparison, the detection threshold for traditional Sanger‐based sequencing is 20%, which demonstrates the power of next generation sequencing technologies.
For normal diploid genomes, Ajay et al. (2011) reported that the conventional read depth of ~30X coverage has produced high quality genotypes but stringent filters for data quality allowed them to accurately genotype only 30% of the genome at 30X mean coverage of the whole genome. They demonstrated an average coverage depth of 50X was necessary to accurately genotype 95% of the genome. However, challenges exist in calling mutations from cancer genomes as consensus calling algorithms used in this field typically assume that the genome is diploid which is not true for large segments of cancer genomes. Detection of somatic alterations in tumor biopsy samples is complicated by both the presence of normal cells in the biopsied tissue (purity) as well as the presence of multiple clonal subpopulations within tumor cells. These factors affect the required depth of sequencing to call clonal mutations at sufficient power (>0.8) in each sample, with greater than 100‐fold coverage required to detect mutations that may be present in around 20% of tumor cells (Carter et al., 2012).
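The dependence of detection power on depth, purity and clonal fraction can be illustrated with a simple binomial model. The sketch below is not the method of Carter et al. (2012): it assumes a heterozygous mutation in a diploid region, ignores sequencing error and copy number changes, and uses a hypothetical rule that a variant is called when at least a fixed number of reads support it. The purity, clonal fraction and read threshold in the example are illustrative values, so its output should not be expected to reproduce the published coverage figures.

```python
from math import comb


def expected_vaf(purity: float, clonal_fraction: float) -> float:
    """Expected variant allele fraction for a heterozygous mutation.

    Assumes two copies of the locus in both tumor and normal cells.
    """
    return purity * clonal_fraction / 2


def detection_power(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) when reads ~ Binomial(depth, vaf)."""
    if depth < 0 or not 0 <= vaf <= 1:
        raise ValueError("depth must be >= 0 and vaf must be in [0, 1]")
    p_miss = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    # Hypothetical sample: 60% tumor cells, mutation present in 20% of them.
    vaf = expected_vaf(purity=0.6, clonal_fraction=0.2)
    for depth in (30, 50, 100, 200):
        power = detection_power(depth, vaf, min_alt_reads=4)
        print(f"{depth}X: power {power:.3f}")
```

Even in this idealized model, lowering the purity or the clonal fraction reduces the expected allele fraction proportionally, which is why subclonal mutations in impure biopsies demand much deeper sequencing than germline genotyping.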
Finally, to differentiate tumor specific mutations from the 3–4 million naturally occurring variations/single nucleotide polymorphisms, a normal sample (blood or saliva) from the same patient would need to be sequenced. The presence of copy number aberrations in tumor samples adds an additional layer of complexity in determining allele states.
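The tumor versus matched normal comparison can be sketched as a filtering step. In the minimal example below, a tumor variant is kept as a somatic candidate only if it is absent from the normal sample and the normal sample was sequenced deeply enough at that position to have revealed a germline variant. The variant tuples, coordinates and the minimum depth of 20 are hypothetical, and production somatic callers model the two samples jointly rather than subtracting call sets.

```python
def somatic_candidates(tumor_variants, normal_variants, normal_depth, min_normal_depth=20):
    """Split tumor variants into somatic candidates and uncertain calls.

    Variants are (chrom, pos, ref, alt) tuples; normal_depth maps
    (chrom, pos) to read depth in the matched normal sample.
    """
    normal_set = set(normal_variants)
    somatic, uncertain = [], []
    for variant in tumor_variants:
        if variant in normal_set:
            continue  # also present in the normal sample: treated as germline
        site = (variant[0], variant[1])
        if normal_depth.get(site, 0) >= min_normal_depth:
            somatic.append(variant)
        else:
            # Too few normal reads to rule out a germline variant.
            uncertain.append(variant)
    return somatic, uncertain


if __name__ == "__main__":
    tumor = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")]
    normal = [("chr1", 100, "A", "T")]
    depth = {("chr2", 200): 35, ("chr3", 300): 5}
    somatic, uncertain = somatic_candidates(tumor, normal, depth)
    print("somatic:", somatic)
    print("uncertain:", uncertain)
```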
Several methods have been developed for abundance estimation of genes (Baba et al., 2006; Karst et al., 2011; Orth et al., 2010) and isoforms as well as for the detection of fusion‐transcripts (Voutilainen et al., 2006; Latoszek‐Berendsen et al., 2010). The methods differ in terms of the units of expression of transcript abundance but also show differing sensitivity to sequencing parameters such as read‐length, read‐quality, and insert sizes for paired‐end reads. Standardization of transcript abundance units in addition to evaluation of sensitivity and robustness in both technical and biological replicates are needed before clinical adoption. Additional quality control steps are necessary for optimal sample processing to be assured before proceeding with sequencing. It may be that sequencing technologies could provide a new ‘gold standard’ if they provide more accurate prediction of clinical outcome.
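To illustrate why the choice of abundance unit matters, the sketch below computes two commonly used units, RPKM (reads per kilobase per million mapped reads) and TPM (transcripts per million), from the same read counts. These two units are our choice for illustration and are not necessarily those used by the methods cited above; the counts and gene lengths are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped reads")
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("no mapped reads")
    return [r * 1e6 / total_rate for r in rates]


if __name__ == "__main__":
    counts = [1000, 1000, 500]      # hypothetical reads mapped to three genes
    lengths = [1000, 2000, 500]     # hypothetical transcript lengths in bp
    for name, values in (("RPKM", rpkm(counts, lengths)), ("TPM", tpm(counts, lengths))):
        print(name, [round(v, 1) for v in values])
```

TPM values always sum to one million within a sample, whereas RPKM totals vary from sample to sample, so a fixed classifier threshold defined in one unit cannot simply be reused in the other.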
In order to translate an existing multigene signature (e.g. Mammaprint, OncotypeDx, ColoPrint, Genomic Grade Index) whose technical validation and medical utility has already been established, it will be critical to show that the gene expression read‐out using RNA‐seq is technically equivalent to the original microarray or RT‐PCR based assay. In comparing number of sequence reads mapped to each gene and the corresponding absolute intensities from array (normalized) it was found that the correlation is greater for genes that are mapped to by large numbers of sequence reads (Wang et al., 2009; Marioni et al., 2008a). Marioni et al. found 81% overlap when the differentially‐expressed genes from the two technologies were compared (Marioni et al., 2008b). This suggests that the expression values from RNA‐seq will need some mathematical transformations for certain genes to fit the established classifier. We recently reported the comparison of RNA‐seq with standardized clinical measures of ER, PR and HER2 by immunohistochemistry or fluorescent in‐situ hybridization (Kamalakaran et al., 2011). We found that RNA‐seq measurements were 100% concordant with IHC for ER while PR and HER2 were 80–90% concordant and sensitive to sample quality (heterogeneity and percentage invasive component).
In addition to establishing technical equivalence, steps have to be taken toward analytical validation. To establish robust performance of RNA‐seq‐based analysis, the processing pipeline has to be run several times on the same samples and shown to have low variability. Also, the sequencing based assay and the existing assay (e.g. PCR, RT‐PCR, microarray) should be evaluated on the same set of samples with the intended focused objective to avoid confounding issues. Standardization of RNA‐seq protocols and such evaluations of reproducibility will be key in transitioning to this new technology (Simon, 2005) and will provide a path to market via CLIA certification and/or FDA pre‐market reviews.
As described above, most existing software tools for next generation sequencing have been developed primarily in the research context. Many of these algorithms will need to be validated and quality tested for robustness before routine clinical use. Clinical decision support software, which translates gigabytes of sequencing data into clinically actionable decisions, is still in an embryonic stage. The bioinformatic analysis and interpretation requires both standardized genomic “content” and a host of interpretation and knowledge discovery tools. A major challenge will be in presenting the large volume of data that would be available from whole genome sequencing. Databases such as MutaDATABASE (Bale et al., 2011), a standardized and centralized warehouse to hold disease associated variants, would be useful in prioritizing the mutations that are identified. A similar standardized database for annotating copy number variations does not exist today. For oncology experts, decision support is usually performed by a team of bioinformaticians, statisticians and genetic counselors. The importance and relevance of the mutations must be ascertained and presented to the oncologist in an intuitive manner. Berg et al. (2011) proposed a three tiered system that classifies each mutation/variant as clinically useful, clinically valid or of unknown clinical implication. This system would allow prioritization of the data into useful and clinically actionable pieces of information that can then be used in disease management.
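A tiered scheme of this kind maps naturally onto a small data structure. The sketch below orders variants by tier so that the most actionable findings are presented first. The tier names follow the three categories described above, but the class design and the tier assigned to each example variant are our own illustration; they are not taken from Berg et al. (2011), and the GENE_A and GENE_B entries are placeholders.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Three tiers; a lower value means a higher reporting priority."""
    CLINICALLY_USEFUL = 1
    CLINICALLY_VALID = 2
    UNKNOWN_IMPLICATION = 3


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str
    tier: Tier


def prioritize(variants):
    """Sort variants by tier, then alphabetically by gene for a stable report."""
    return sorted(variants, key=lambda v: (v.tier, v.gene))


if __name__ == "__main__":
    findings = [
        Variant("GENE_B", "variant_2", Tier.UNKNOWN_IMPLICATION),  # placeholder
        Variant("BRAF", "V600E", Tier.CLINICALLY_USEFUL),          # illustrative tier
        Variant("GENE_A", "variant_1", Tier.CLINICALLY_VALID),     # placeholder
    ]
    for v in prioritize(findings):
        print(f"Tier {int(v.tier)}: {v.gene} {v.change}")
```

In a real decision support system the tier would be looked up from a curated variant database rather than assigned by hand, and the assignment would need to be revisited as new evidence is reported.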
Visualizing the results of sequencing analysis is currently done on an ad hoc basis using a collection of different software tools. OncoPrints from cBio is a tool for visualizing genomic alterations, including somatic mutations, copy number alterations, and mRNA expression changes across a set of patients. Tools such as the Integrative Genomics Viewer from the Broad Institute or the cBio cancer genomics portal from MSKCC are meant for exploratory use in clinical discovery studies. There is a need for an integrated software solution that would analyze and store sequence data and present useful genomic features for clinical action.
In addition to introducing a new data type and a new modality in the clinic, sequencing will also pose several new challenges for implementation, adoption and use in the clinical environment. The Next‐generation Sequencing: Standardization of Clinical Testing (Nex‐StoCT) workgroup, in conjunction with the US Centers for Disease Control and Prevention (CDC), has taken steps to define technical process elements to assure the analytical validity and compliance of NGS tests with existing regulatory and professional quality standards (Gargis et al., 2012). These guidelines were drafted to ensure reliable next‐generation sequencing (NGS) based testing and its application for clinically useful decision making. These guidelines “address four topics that are components of quality management in a clinical environment: (i) test validation, (ii) quality control (QC) procedures to assure and maintain accurate test results, (iii) the independent assessment of test performance through proficiency testing (PT) or alternative approaches and (iv) reference materials (RMs)”. These recommendations are a good framework to build upon for ensuring clinically meaningful use of NGS technologies.
The next generation technologies will require not only hardware and software resources but also human expertise. Even with bioinformatics resources, the interpretation of test results will require a sophisticated set of experts with knowledge in molecular biology, biostatistics, bioethics and medicine. The new era of the ‘molecular oncologist’ will soon be upon us where students receive training not only in medical disciplines and basic biology, but also in the interpretation of high dimensional data.
Providers of clinical sequencing services (Ambry Genetics, Aliso Viejo, CA, and Foundation Medicine, Cambridge, MA) have operated as Lab Developed Tests (LDT) under Clinical Laboratory Improvement Amendments (CLIA) certification. However, there are no clear guidelines of operation for next generation sequencing based tests. Recently, the College of American Pathologists (CAP) updated the Molecular Pathology checklists of its CAP Accreditation Program to include a section on next generation sequencing based assays. The CAP also updated its master activity menu to include specific Test/Activity codes for the use of next generation sequencing. The checklist provides a framework for documenting processes and ensuring quality for both the analytical wet bench process of sample preparation and sequence generation and the bioinformatics process/pipeline of sequence alignment, annotation and variant calling. We believe these guidelines and processes would make sequencing based tests more reliable and allow for wide deployment into clinical practice. However, the guidelines would need to be updated to cover not just mutation/variant calling based NGS tests, but also RNASeq and DNA copy number based tests.
Raw sequencing runs generate hundreds of gigabytes of data from a single measurement, and thus will surpass existing clinical data management infrastructure by one or more orders of magnitude. If follow‐up screening measurements or serial measurements of disease status will be performed, sequencing data will easily be the most data rich modality used clinically in the future. With large quantities of data comes the need for computational resources. Currently, analysis of sequencing data can take days on large computer clusters and is typically offloaded to a computational cloud whose capacity is unlikely to be met by resources currently in place in the clinic (Stein, 2010). It is expected that technological advances in cloud computing and cloud storage will become available to meet the needs for the clinical setting, hiding the details from the end user via a service that provides management and access to this data. However, this will initially augment concerns over data safety and security, liability, and compliance with regulatory requirements, such as how data is stored, which parties own and have access, and details of data deletion, archiving, and retrieval.
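The scale argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the cohort size, the number of serial measurements, and the 200 GB per-run figure are assumptions chosen to match the "hundreds of gigabytes" order of magnitude, not values reported in this review.

```python
# Back-of-the-envelope storage estimate for clinical sequencing.
# All input numbers are illustrative assumptions, not measurements.

GB_PER_TB = 1000  # decimal units


def raw_storage_tb(patients: int, runs_per_patient: int, gb_per_run: float) -> float:
    """Total raw data volume in terabytes for a patient cohort."""
    return patients * runs_per_patient * gb_per_run / GB_PER_TB


if __name__ == "__main__":
    # Assumed: 1,000 patients per year and 200 GB of raw data per run.
    single = raw_storage_tb(patients=1000, runs_per_patient=1, gb_per_run=200)
    # Assumed: four serial measurements per patient for disease monitoring.
    serial = raw_storage_tb(patients=1000, runs_per_patient=4, gb_per_run=200)
    print(f"one run per patient:   {single:.0f} TB/year")
    print(f"four runs per patient: {serial:.0f} TB/year")
```

Under these assumptions, serial measurement multiplies the raw storage requirement linearly, which is why follow-up sequencing makes the data management problem considerably harder than a single diagnostic run.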
Long‐term storage and clinical utility of sequencing data in diagnosis, therapy planning and therapy monitoring will require standards that ensure interoperability with electronic healthcare records. The HL7 Clinical Genomics Working Group (HL7CG) creates and promotes standards for communicating clinical and personalized genomic data between interested parties. The goal is to capture the personalization of genomic data (the differences in an individual's genome) and to link it to relevant clinical information. The HL7CG will develop and document scenarios and use cases in clinical genomics to determine what data need to be exchanged. It also reviews existing genomics standard formats such as BSML (Bioinformatics Sequence Markup Language), MAGE‐ML (Microarray and Gene Expression Markup Language), LSID (Life Science Identifier) and others.
Publicly funded efforts such as iRODS and Galaxy are currently helping the community cope with the large-scale sharing of data and procedural knowledge about downstream sequencing analysis. With pipelines for RNA‐Seq, ChIP‐Seq, exome and whole genome sequence analysis, the operational question is how to annotate terabytes, petabytes and exabytes of data and how to organize the data for downstream analysis. iRODS (Integrated Rule Oriented Data System) is a technical solution for metadata‐driven management of files that come from different domains, reside on different devices, and are under the control of different groups. It manages descriptive metadata about each item, supports duplicate detection, archiving, data migration, access controls, authorization, and integrity checks, and enforces management policies for any desired property (e.g. enforcement of rules regarding privacy for different consent types). Galaxy (http://galaxyproject.org) is an open, web‐based platform for data intensive biomedical research. It gives experimentalists simple interfaces to powerful tools while automatically managing the computational details. Galaxy is distributed both as a publicly available Web service, which provides tools for the analysis of genomic, comparative genomic, and functional genomic data, and as a downloadable package that can be deployed in individual laboratories. One of the early projects to fully deploy large scale medical sequencing as a productive and critical component of genomic medicine is the ClinSeq project. The ClinSeq project attempts to address issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and informatics challenges in archiving, analyzing, and displaying sequence data (Biesecker et al., 2009).
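The idea of metadata‐driven policy enforcement described for iRODS can be illustrated with a minimal sketch. This is not the iRODS rule language or API: the consent categories, the `DataObject` class, and the `is_access_allowed` function are hypothetical names introduced only to show how a rule keyed on a consent‐type attribute might gate access to a stored file.

```python
from dataclasses import dataclass, field

# Hypothetical consent categories and the uses each one permits.
CONSENT_POLICY = {
    "clinical_only": {"clinical_care"},
    "research": {"clinical_care", "research"},
    "broad_sharing": {"clinical_care", "research", "external_sharing"},
}


@dataclass
class DataObject:
    """A stored file plus the descriptive metadata that rules act on."""
    path: str
    metadata: dict = field(default_factory=dict)


def is_access_allowed(obj: DataObject, purpose: str) -> bool:
    """Allow access only if the object's consent type permits the purpose.

    Objects with a missing or unknown consent type are denied by default.
    """
    consent = obj.metadata.get("consent_type")
    return purpose in CONSENT_POLICY.get(consent, set())


if __name__ == "__main__":
    # Hypothetical file path and metadata values, for illustration only.
    bam = DataObject(
        "/archive/sample_001.bam",
        {"consent_type": "research", "assay": "RNA-Seq"},
    )
    print(is_access_allowed(bam, "research"))
    print(is_access_allowed(bam, "external_sharing"))
```

Keeping the policy in a table separate from the data objects means that a change in consent rules requires updating only the table, not every stored file; denying access when the consent attribute is absent is a conservative default assumed here.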
The implementation of whole genome analysis presents several challenges for privacy and security. First, the amount of data being generated demands computing resources that are not available in most institutions. There are ongoing efforts to ensure the security of personal medical information (PMI) in computing ‘clouds’.
In addition to the security of PMI, the implications of sequencing whole genomes for patients and their families cannot be overlooked. In addition to tumor sequence, it is clear that host sequence will also be produced by next generation sequencing technologies. The issues that have become well studied in the field of human genetics are amplified in this context, where all known heritable mutations and polymorphisms in a particular patient's genome will become available by sequencing their tumor sample. For a number of reasons, including increased risk of bias, discrimination, and stigma, genetic privacy and confidentiality are sometimes thought to be more important than privacy and confidentiality in other kinds of research (Esposito and Goodman, 2009). Clearly, safeguards must be put in place before such sensitive information is generated, and patients must be counseled and consented appropriately for the known and theoretical risks this knowledge carries. For example, potential participants must be informed about which entities and persons will have access to the data. This might include investigators at other institutions, corporate sponsors, a government, employers, etc. If information obtained during research will be placed in a patient's medical record, this too must be disclosed. Subjects should also be told of the risks of others having access to their genetic information.
The growth of bioinformatics or computational genomics makes it clear that, in the near future, the concern will not be so much with stored biological samples as with digitized samples, that is, electronic data that can be stored, transmitted, and analyzed with new ease and power. It is important for institutions to consider policies surrounding the use of genetic information (Massoudi et al., 2011). These policies should address data collection and management, encryption, destruction of specimens and/or genetic information, and loss of data. Researchers and research ethics reviewers should address the issue of clinically suspicious or significant incidental findings, and whether and how they will be communicated to subjects. Incidental findings can be of great interest to subjects, and a comprehensive consent process should make clear whether such findings will be disclosed. In the United States, the passage of the Genetic Information Nondiscrimination Act (GINA) in 2008 (http://www.genome.gov/24519851) has provided, at least in principle, sweeping protections for patients and subjects. GINA prohibits discrimination in health insurance and employment based on genetic information. However, the extent to which GINA changes or reduces the risks of participation in genetic or genomic research should be included in the consent process.
CPT coding needs to evolve to encompass the existing single gene tests, the addition of panels of genes and multiplexed tests, and translated molecular signatures, in a manner that also makes economic sense. CPT codes have been developed in the last two decades in two major directions: the first in microbiology testing and the second for inherited diseases and cancer. To date, 466 descriptors for molecular pathology services have been drafted by the Molecular Pathology Working Group (MPWG) of the AMA. Most are currently accepted for the CPT code set. Availability of the technology does not immediately translate into wide clinical use, since healthcare providers and payors need to have well established policies about the medical necessity of complex sequencing tests. The MPWG has organized two groups of CPT codes: tier 1, which provides codes for specific procedures performed in high volume (e.g., KRAS mutations), and tier 2, which provides codes for a large number of less commonly performed tests requiring more technical and professional resources (similar to the six CPT levels used for Surgical Pathology services). For example, level 2 includes trinucleotide repeat disorders; level 4, sequencing of a single target/exon; level 9 covers full sequencing of genes with more than 50 exons. It is envisioned that next generation sequencing may fit within the existing tiers as medical necessity is established. Multianalyte assays with algorithmic procedures are also likely to be further specified in the context of next generation sequencing tests.
The practice of oncology is being transformed by the vast amount of knowledge gained from high throughput molecular profiling technologies. The big challenge before us is to discriminate between information in these data that is potentially actionable and information that provides insights into tumor biology. The former is immediately actionable and would make a difference for the individual patient at hand, while the latter would help devise therapeutic strategies for other patients. It is imperative that these cases are distinguished and that proper ethical guidelines are set up to enable oncologists to provide better care for their patients. Collecting the complete molecular profile of the tumor sample would give oncologists an avenue to query the patient profile repeatedly and update it as and when new relevant information becomes available. For example, a report on a new clinical trial targeting a mutation/aberration present in the patient would allow the oncologist to switch or update the treatment protocol and improve outcomes for the patient (Figure 3).
Figure 3. The current status and future vision for oncology: replacement of a multitude of single gene/panel tests by one comprehensive molecular profile.
W.R.M. has participated in Illumina sponsored meetings over the past four years and received travel reimbursement and an honorarium for presenting at these events. Illumina had no role in decisions relating to the study/work to be published, data collection and analysis of data and the decision to publish.
W.R.M. has participated in Pacific Biosciences sponsored meetings over the past three years and received travel reimbursement for presenting at these events.
W.R.M. is a founder and shareholder of Orion Genomics, which focuses on plant genomics and cancer genetics.
W.R.M. received support from the Cancer Center Support Grant (CA045508) from the NCI.
Kamalakaran Sitharthan, Varadan Vinay, Janevski Angel, Banerjee Nilanjana, Tuck David, McCombie W. Richard, Dimitrova Nevenka, Harris Lyndsay N., (2013), Translating next generation sequencing to practice: Opportunities and necessary steps, Molecular Oncology, 7, doi: 10.1016/j.molonc.2013.04.008.