Here the use of NGS technology to accurately identify mutations in positive control CDG patients is described. Sequence enrichment by RainDance and Fluidigm technology amplified most of the targeted coding sequences with high sensitivity, specificity and accuracy. Samples that failed to amplify were PCR amplified and sequenced separately (). The fact that all of the disease causing mutations were detected by NGS establishes that this method is reliable for mutation detection in the clinical laboratory. These results demonstrate the robustness of this technology for sequencing many genes and its potential to provide a rapid and accurate molecular diagnosis in CDG patients who currently lack genetic characterization. However, following up NGS results with biochemical analysis is necessary, as it is likely that missense changes will be detected with clinical testing and their effect on enzyme function will need to be evaluated.
Benefits of a clinical CDG NGS panel
A clinical CDG NGS panel was recently launched in the molecular genetics diagnostic clinic at EGL. For clinical testing, RainDance was chosen as the sequence enrichment method because this technology allows one sample to be processed at a time, which is very important when clinical testing is performed for rare disorders and the sample volume will remain small. RainDance is also ideal for resequencing large numbers of exons, which is beneficial if many genes are implicated in a genetic disorder. Individual gene Sanger sequencing is also available in the laboratory for use when a single gene is biochemically indicated or for familial mutation analysis. The clinical presentation and severity of CDG symptoms vary from patient to patient and between subtypes, making it difficult to predict which gene could be defective in a given patient. With no direct candidate gene for Sanger sequencing, these patients end up being diagnosed with an unknown type of CDG (CDG-Ix or CDG-IIx). Therefore, molecular testing using a panel of known CDG genes will expedite the process of identifying which gene is defective in patients. Implementation of the CDG NGS panel in the clinical laboratory will reduce the number of patients without genetic characterization, shorten a patient’s time to diagnosis, facilitate genetic counseling and could improve patient management by providing insight into possible future complications that are associated with defects in each gene and by helping to determine which patients could benefit from current therapies. Molecular diagnosis of additional patients with CDG will provide an estimate of the prevalence of each subtype and enable the study of genotype/phenotype correlations. As more patients receive a molecular diagnosis, a comprehensive database can be developed that will encompass information for all of the known subtypes of CDG and will be an invaluable resource to clinicians and researchers involved with this disorder.
If there is a clinical suspicion of CDG it is more cost effective to look for mutations in the 24 CDG associated genes as opposed to a gene-by-gene approach. The average gene contains 10 exons and it costs approximately $1000 for a molecular diagnostic laboratory to PCR, sequence, and clinically interpret and report the results through a genetic counselor. Labor and laboratory overheads are also included in this estimated cost. Alternatively, to screen for mutations in all 24 CDG associated genes via NGS the cost is $5000 and includes all of the services listed above. Therefore, NGS is a viable alternative compared to the gene by gene approach, which was the only method available before the advent of this innovative technology. NGS technology will also drastically reduce costs in the clinical laboratory when other gene panels become available for more genetic disorders.
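The cost argument above can be checked with a short calculation. The sketch below is illustrative only: it assumes the approximate $1000 per-gene figure applies uniformly to every gene, which the text gives only as an average, and the function names are hypothetical.

```python
# Figures quoted in the text (approximate, in US dollars).
PER_GENE_SANGER_COST = 1000   # average 10-exon gene, gene-by-gene approach
PANEL_COST = 5000             # all 24 CDG associated genes via NGS
PANEL_GENES = 24


def sanger_cost(n_genes, per_gene=PER_GENE_SANGER_COST):
    """Cost of sequencing n_genes one at a time (assumes a uniform per-gene cost)."""
    return n_genes * per_gene


def break_even_genes(panel_cost=PANEL_COST, per_gene=PER_GENE_SANGER_COST):
    """Smallest number of individually sequenced genes whose cost exceeds the panel."""
    return panel_cost // per_gene + 1


if __name__ == "__main__":
    print("Gene-by-gene, all 24 genes:", sanger_cost(PANEL_GENES))
    print("NGS panel:", PANEL_COST)
    print("Panel is cheaper once this many genes are needed:", break_even_genes())
```

Under these assumptions, the panel becomes the cheaper option as soon as more than five genes would otherwise have to be sequenced individually.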
Targeted CDG panel versus whole exome or whole genome sequencing in a clinical setting
Several recent publications have used whole exome sequencing to molecularly diagnose patients.34,50–52 This approach is essentially a gene discovery tool. Whole exome sequencing will also be used for new gene discovery for CDG. However, if new genes are identified for CDG, further studies will be needed to determine whether defects in these genes impair glycosylation and whether these genes belong to a glycosylation biosynthesis pathway or to a pathway that influences glycosylation, which is beyond the scope of the clinical laboratory. Until these studies are performed it is impossible to interpret whether the variants identified by whole exome sequencing are disease causing and result in CDG. This is especially true for identified missense variants. These findings cannot be reported in the clinical setting until there is enough evidence that defects in these novel genes are associated with CDG. Furthermore, the coverage necessary for accurate variant calling with whole exome or whole genome sequencing is not well established, and data analysis would take approximately six months to one year. Adequate bioinformatics support would also be required for all of the data generated from these approaches. The true false positive and false negative rates of whole exome or whole genome sequencing are unknown, and the costs associated with these approaches are currently not feasible for adoption in the clinical setting. Therefore, it remains to be seen how these approaches will be adopted in the clinical laboratory. These current limitations highlight why a targeted panel approach is beneficial for rapid patient diagnosis and for reporting results within a reasonable turnaround time.
If CDG is suspected in a patient based on biochemical analysis, a targeted CDG NGS panel makes sense and this targeted approach offers adequate sensitivity and specificity. Furthermore, mutations in genes on this panel can be interpreted and the results can be reported since loss of function mutations in these genes certainly cause CDG and the location of these genes within the glycosylation pathway or their involvement in glycosylation is known. This targeted panel also has implications for prenatal testing. If there is a family history of CDG and the NGS panel identifies the disease causing mutations the carrier status can be determined for future pregnancies. It is important to note that at this time CLIA and CAP have no guidelines for validation and use of next generation sequencing in a clinical laboratory. Nevertheless, this validation demonstrates NGS technology can be adopted in the clinical setting to improve patient diagnosis.
As 1% of the human genome encodes proteins directly involved in glycan assembly it is likely that additional genes implicated in CDG will be found.53 These genes will eventually be added to a new version of the CDG NGS panel after thorough review. The targeted CDG NGS panel did not include the gene ALG1 because highly multiplex PCR lacks the specificity to differentiate between active genes and pseudogenes. Therefore, genes that have associated pseudogenes will need to be analyzed separately via Sanger sequencing. This is important to keep in mind as NGS panels are created for other disorders. As more subtypes of CDG are identified, the nomenclature for CDG will most likely change. Currently, different subtypes of CDG are named alphabetically based on the order the new subtypes are discovered,12 although a new nomenclature system has been adopted that uses the gene name followed by the suffix “–CDG” and is already being used in the literature.54
Coverage of all target regions for the CDG NGS panel
Coverage can vary due to library preparation and the choice of target enrichment method. Nine low coverage exons (coverage less than 10×) were present in these 24 genes, requiring Sanger sequencing of these exons to analyze whether mutations are present in these regions (). Whole exons with low coverage could be due to high GC content and sequence complexity. Additionally, there were 19 exons with no coverage (). For RainDance enrichment all exons from GNE failed to amplify due to poor library synthesis and had to be Sanger sequenced. There was also panel-wide difficulty in amplifying exon 1, mainly due to GC content. A high level of multiplexing and special PCR conditions for amplifying GC-rich exons are needed for amplification. It is estimated with RainDance Technology that there can be up to a 10% library failure rate. A similar failure rate was experienced with Agilent SureSelect™ (data not shown). The number of exons that failed to amplify varied from sample to sample. In this study, Fluidigm generally had greater coverage than RainDance because Fluidigm uses singleplex PCR, which results in a greater number of copies of each amplicon versus the multiplex PCR performed by RainDance. Sanger sequencing is required for confirmation of NGS results because variants with low coverage may be false positives. This applies to all variants with less than 15× coverage identified in or close to the coding region. Therefore, caution must be exercised when analyzing NGS data. Hence, NGS panels will need to be complemented with Sanger sequencing for some exons to achieve adequate sequencing of whole genes and analysis of mutations. It will be important to analyze the enrichment data for coverage for each exon of each gene in a panel independently, and any exon below 15× coverage should be Sanger sequenced to avoid the possibility of a false negative result, making Sanger sequencing a necessary complement to NGS.
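The per-exon coverage check recommended above can be sketched as a simple lookup. This is a minimal illustration, not the laboratory's pipeline: the input format (a dictionary keyed by gene and exon number), the function name, and the coverage values are all hypothetical. Only the 15× cut-off comes from the text.

```python
COVERAGE_CUTOFF = 15  # exons below this coverage go to Sanger sequencing (from the text)


def exons_needing_sanger(exon_coverage, cutoff=COVERAGE_CUTOFF):
    """Return the (gene, exon) keys whose coverage is below the cut-off.

    exon_coverage maps (gene, exon_number) -> coverage for one sample.
    Exons with no coverage at all (0) are included in the result.
    """
    return sorted(key for key, cov in exon_coverage.items() if cov < cutoff)


if __name__ == "__main__":
    # Hypothetical coverage values for a single sample.
    sample = {
        ("GENE_A", 1): 0,     # failed to amplify
        ("GENE_A", 2): 120,
        ("GENE_B", 3): 9,     # low coverage
        ("GENE_B", 4): 15,    # at the cut-off, so not flagged
    }
    for gene, exon in exons_needing_sanger(sample):
        print(f"Sanger sequence {gene} exon {exon}")
```

Because the number of failed exons varied from sample to sample, a check of this kind would need to be run per sample rather than once per panel design.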
Recommendations for data analysis for clinical next generation sequencing
As demonstrated by the clinical validation for CDG, it is important to confirm the variants identified from NGS by Sanger sequencing before reporting results in order to rule out the possibility of a false positive result. Although NGS accurately identified the disease causing mutations in all 12 of these positive control patients, there were a total of 550 variants in these patients. More variants were detected using Fluidigm for enrichment compared to RainDance. This is most likely due to different library designs and differences in the analysis algorithm. Further analysis of the data and elimination of variants that are likely to be false positives can drastically reduce the number of variants that need to be confirmed by Sanger sequencing. Variants that are silent changes, reported SNPs, or not likely to impair gene function are not a priority for Sanger sequencing. The data were filtered by taking into account low coverage (<15×), low quality score (<10 on a scale of 0–100), and the percentage representation of the mutant allele (>80% mutant allele for a homozygous or hemizygous variant; an approximately 50:50 ratio of wild type to mutant allele for a heterozygous variant). This significantly reduced the number of variant calls that were believed to be real, eliminating the majority of variants that would need to be confirmed by Sanger sequencing (). However, based on the coverage or the mutant allele percentage, some of the disease causing mutations would have been overlooked for Sanger confirmation. For example, patient 0012 has the deletion c.897_899delAAT with a coverage greater than 30× for both RainDance and Fluidigm, but the deletion allele percentages were below 24%. Based on the filtering parameters this mutation would have been eliminated from Sanger confirmation due to the low allele percentage.
Therefore, a cut-off of 15× coverage along with a thorough assessment of allele representation and the potential of the variant to be deleterious is necessary to select variants for Sanger confirmation and can help eliminate false positives. Direct assessment of each variant for coverage, quality score, mutant allele percentage, and whether and how often the variant was detected previously can also help guide the selection of variants for Sanger confirmation. For example, novel silent variants that are not documented in the dbSNP database but have been detected in multiple samples across next generation sequencing runs, fall within the defined selection parameters, and have been confirmed at least once probably need not be selected again for Sanger confirmation.
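The filtering parameters described in this section can be written down as a small rule set. The sketch below is an illustration under stated assumptions rather than the analysis software used in the study. The coverage, quality, and homozygous thresholds come from the text, but the 40–60% window used to represent an "approximately 50:50" heterozygous ratio is an assumption. The record type and its field names are hypothetical, and the quality score in the example is invented.

```python
from dataclasses import dataclass

MIN_COVERAGE = 15          # from the text: coverage below 15x is "low"
MIN_QUALITY = 10           # from the text: score below 10 on a 0-100 scale is "low"
HOM_MIN_PCT = 80.0         # from the text: >80% mutant allele (homozygous/hemizygous)
HET_WINDOW = (40.0, 60.0)  # ASSUMPTION: tolerance around the ~50:50 heterozygous ratio


@dataclass
class Variant:
    name: str
    coverage: int
    quality: float
    mutant_pct: float


def filter_reasons(v):
    """Return the reasons a variant call would be filtered out (empty list = passes)."""
    reasons = []
    if v.coverage < MIN_COVERAGE:
        reasons.append("low coverage")
    if v.quality < MIN_QUALITY:
        reasons.append("low quality")
    looks_het = HET_WINDOW[0] <= v.mutant_pct <= HET_WINDOW[1]
    looks_hom = v.mutant_pct > HOM_MIN_PCT
    if not (looks_het or looks_hom):
        reasons.append("atypical allele percentage")
    return reasons


if __name__ == "__main__":
    # Coverage and allele percentage chosen to match the description of the
    # patient 0012 deletion (>30x, <24%); the quality score is hypothetical.
    deletion = Variant("c.897_899delAAT", coverage=31, quality=50, mutant_pct=23.0)
    print(deletion.name, filter_reasons(deletion))
```

Returning the reasons rather than a bare pass or fail makes it easier to route a call that fails only on allele percentage to manual review, which is the situation the c.897_899delAAT example illustrates.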
When both enrichment methods were used, over 73% of the variant calls were believed to be real based on the filtering parameters. However, running a patient sample two times is not cost effective in the clinical setting and would require a large amount of DNA for NGS and Sanger confirmation of detected variants. It is more reasonable to use at least two programs for data analysis, as this will further improve the accuracy of the data set, avoid false negatives and reduce false positives, thereby reducing the cost of the overall test. Examples of programs are NextGENe (SoftGenetics LLC, State College, PA), Bioscope (Life Technologies, Carlsbad, CA) and Corona Lite (Life Technologies, Carlsbad, CA) software. Other analysis software, including the Integrative Genomics Viewer (Broad Institute) and software from Genologics and CLC bio, is available commercially. Biochemical data previously obtained from analysis of transferrin, serum- or cell-derived glycans, metabolic labeling or cell-associated markers can also help to focus on or eliminate selected candidate genes and variants. A well-coordinated combination of biochemical and genetic information can reduce the extent of confirmation by Sanger sequencing.
Limitations of mutation detection using NGS for clinical testing in CDG patients
As clinical testing continues, it is possible that NGS may identify only one mutation in a patient, suggesting that a large deletion undetectable by NGS may be present. Therefore, another approach is necessary to detect these types of mutations. EGL is the first laboratory to develop a molecular array to detect large duplications and deletions and currently offers this service for more than 200 disease-associated genes.55 The frequency of large deletions and duplications in CDG patients is currently unknown, as most mutations identified to date are point mutations, splice site mutations and small insertions or deletions. Further testing for intragenic duplications or deletions in these 24 genes will be conducted using targeted array CGH. Genomic loci from all 24 genes will be investigated on a single 60K format array CGH from Oxford Gene Technologies. Alternatively, it is possible that the second mutation could be a non-coding change deep within an intronic region. The only caveat to detecting these changes is that interpretation would be difficult unless functional studies were performed to prove that these changes affect splicing, and such studies are typically not performed in a clinical laboratory. Use of both of these technologies in the clinical laboratory will allow for a thorough evaluation of whether mutations are present in the known genes associated with a disorder.
A new disease paradigm may become common with the use of NGS technology
Although NGS successfully identified the disease causing mutations in all 12 positive controls, it also uncovered additional variants in different genes for patients CDG-0012, CDG-0216 and CDG-0270. These additional variants were previously reported disease causing mutations or novel changes predicted to be deleterious.8 It is currently unknown whether these additional changes contribute to the phenotype in these patients. These additional findings were confirmed by Sanger sequencing and reported to Hudson Freeze. This brings up the possibility of new disease paradigms as a result of using NGS panels for a set of genes known to be associated with a particular disorder. As CDG is an autosomal recessive disorder the detection of additional variants suggests the possibility of synergistic heterozygosity. A previous study pointed to this disease paradigm when several patients were found to have significant reductions in energy metabolism due to partial defects in one or more metabolic pathways.56 It is possible that partial defects in more than one gene within or associated with the N-glycosylation biosynthesis pathway could result in CDG. Although this has yet to be demonstrated, it is an intriguing possibility especially when only one mutation is identified in a single gene and array CGH does not detect the second mutation or a single mutation is identified in two different genes within this pathway. In patients for whom this is the case, it will be important to perform biochemical analysis to determine whether the variants reduce enzyme function. Synergistic heterozygosity also has implications in the diagnosis of CDG where mutations may not be identified in a single causative gene or set of genes that were chosen as the likely candidates based on biochemical testing. Again, this is a situation in which the CDG NGS clinical panel will be beneficial because it will test 24 genes currently associated with this disorder resulting in a better chance of identifying the gene defect in these patients compared to testing one or several genes individually. As NGS technology gains ground in the clinical setting and more evidence emerges for multiple partial defects in different genes causing a clinical phenotype it is possible that synergistic heterozygosity may become accepted as a common disease mechanism.
Algorithm for molecular diagnosis of CDG
A combination of biochemical and molecular approaches is used to diagnose a patient with a specific subtype of CDG (). The first step in determining which CDG subtype a patient has is biochemical testing. In some cases clinical data and biochemical testing can provide insight into the gene defect, and Sanger sequencing of the suspected gene identifies two mutations, leading to a molecular diagnosis of the CDG subtype; the case is then reported. This is common for patients with Type I defects. However, biochemical testing cannot always reveal the gene defect, especially in patients with combined Type I and Type II defects or Type II defects. If biochemical testing is inconclusive, the CDG NGS clinical panel is used. If two mutations are identified in one of the 24 genes on the panel or in the ALG1 gene, and they are previously reported mutations or likely to impair protein function, the case is reported and the patient is given a diagnosis of a specific subtype of CDG. However, if NGS or direct sequencing of candidate genes reveals only one mutation, further investigation is needed. In these cases, array CGH will be performed to determine whether the second mutation is a large deletion. If the second mutation is identified using this approach, the case is reported and the patient is given a diagnosis of a specific subtype of CDG. If the second mutation is not identified by array CGH, consent will be sought for the patient sample to be analyzed by whole genome sequencing in a research setting.
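The decision flow in the preceding paragraph can be summarized as a small sketch. It is a simplified reading of the text, not a clinical tool: the function names and returned labels are hypothetical, and the mutation count is assumed to include only mutations in the same gene that are previously reported or likely to impair protein function. The text does not say what happens when no mutation is found, so that case is reported as unspecified.

```python
def choose_sequencing(biochem_indicates_gene: bool) -> str:
    """Pick the sequencing approach once biochemical testing is complete."""
    if biochem_indicates_gene:
        return "Sanger sequence the suspected gene"
    return "run the CDG NGS panel"


def next_step(mutations_found: int,
              array_cgh_done: bool = False,
              second_found_by_array: bool = False) -> str:
    """Next action after Sanger or NGS panel sequencing has been run.

    mutations_found counts qualifying mutations in a single gene (assumption).
    """
    if mutations_found >= 2:
        return "report diagnosis"
    if mutations_found == 1:
        if not array_cgh_done:
            return "run array CGH"
        if second_found_by_array:
            return "report diagnosis"
        return "seek consent for research whole genome sequencing"
    return "not specified in the described algorithm"


if __name__ == "__main__":
    print(choose_sequencing(biochem_indicates_gene=False))
    print(next_step(1))
    print(next_step(1, array_cgh_done=True, second_found_by_array=False))
```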
It is important to keep in mind that NGS may detect novel deleterious variants in these genes. However, these findings should be complemented with biochemical testing if possible. If detected potential deleterious variants have not been reported before enzyme activity will need to be assessed using established assays. A reduction in enzyme activity would be evidence that the variant impairs gene function. Unfortunately, convenient, clinic-friendly biochemical assays are not available for the great majority of CDG-related genes. Therefore, NGS alone will probably not be enough. Further “genetics” approaches that are used in the laboratory for interpreting potential deleterious variants include confirming whether the mutations were inherited from the parents or confirming concordance with affected family members. Diagnosing a patient using both biochemical and molecular approaches will increase the power of diagnostic testing for this group of disorders.