The Affymetrix MitoChip v2.0 resequencing array provides a relatively fast, high-throughput, cost-effective, and sensitive method for detecting both homoplasmic and heteroplasmic mutations across the entire human mitochondrial DNA genome. The cost of each MitoChip v2.0 array and its reagents (currently estimated at less than $200/sample) is lower than that of any other commercially available mitochondrial DNA sequencing methodology. Furthermore, sequencing and analyzing MitoChip v2.0 data with the MFP pipeline takes under a week. By contrast, sequencing the whole mitochondrial genome by Sanger sequencing or on the more recently available NGS platforms currently costs more than $2,000 per sample at a typical clinical diagnostic laboratory, and analyzed results can take up to two months to obtain. However, although GSEQ 4.1 is a standard software package designed to analyze MitoChip v2.0 data, its call rates fall short of those needed for clinical diagnostic applications in suspected mitochondrial disease.
In this study, we systematically analyzed the MitoChip v2.0 data from 24 carefully selected samples with a number of statistical methods. The outcome was the development of a custom pipeline, MFP, that significantly improved the call rate and accuracy relative to GSEQ 4.1. With an average call rate of 99.75% across the entire genome and an estimated accuracy of 99.98%, MitoChip v2.0 analysis with the MFP bioinformatics pipeline can now be viewed as a viable and highly attractive alternative to Sanger sequencing.
A distinct advantage of Sanger sequencing over MitoChip v2.0 has been its ability to detect indels of any size and to determine their breakpoints precisely. In this study, we showed that, while indels of a few base pairs in size are still difficult to detect with MitoChip v2.0, deletions larger than 10 bp should be fairly straightforward to detect and precisely define. It is not clear whether the oligos on the MitoChip v2.0 can be further modified to detect smaller indels in mtDNA samples.
Since the MitoChip v2.0 platform was designed based on the revised Cambridge reference sequence, which belongs to mitochondrial haplogroup H [14, 15], it has been suggested that MitoChip analysis would not reliably detect sequence variants from more divergent haplogroups, such as L0 [16]. As shown in Table , the 24 study samples analyzed here originate from a number of diverse haplogroups. Provided that a data set passed the quality control (QC) parameters, a desirable call rate could be readily achieved in all cases. These data satisfactorily address concerns that the probes might fail to identify sequence from divergent or rare mitochondrial DNA lineages.
While haplogroup origins could be precisely determined based on our manual curation of the MFP-determined mtDNA genome sequence, we note that the automated algorithm implemented in MFP, which relies solely upon a panel of 22 SNPs to assign haplogroup [17], failed to accurately assign the B (sample #3) or D5 (samples #19, #20, #24) haplogroups. The haplogroup for sample #14, which had a 5.8 kb deletion, was also understandably misassigned, as several of the haplogroup-defining variants for this sample (which belongs to haplogroup K) fell within the deleted region. Thus, haplogroup assignment can be readily made with MFP but must be viewed with caution if it is based on the panel of 22 common SNPs used here or if the sample harbors a large deletion. However, the use of an expanded set of SNPs representing a wide range of phylogenetically important markers from a global set of haplogroups will likely rectify these kinds of misassignments.
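The effect of a large deletion on panel-based assignment can be illustrated with a minimal Python sketch. This is not the algorithm implemented in MFP or described in [17]; the marker panel, the positions and bases, and the function name `assign_haplogroup` are hypothetical placeholders chosen only to show why markers falling inside a deletion should cause a haplogroup to be flagged as not evaluable rather than silently scored.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical marker panel: haplogroup -> {position: expected base}.
# Positions and bases are placeholders, NOT real haplogroup-defining variants.
PANEL: Dict[str, Dict[int, str]] = {
    "HG_A": {100: "T", 9000: "G", 12000: "C"},
    "HG_B": {100: "T", 200: "A", 15000: "T"},
}


def assign_haplogroup(
    calls: Dict[int, str],
    deleted: Optional[Tuple[int, int]] = None,
    min_callable: float = 1.0,
) -> Tuple[Optional[str], Set[str]]:
    """Return (assigned haplogroup or None, haplogroups that could not be evaluated).

    calls        : position -> called base ("N" or a missing key means no call)
    deleted      : inclusive (start, end) of a known deletion, if any
    min_callable : fraction of a haplogroup's markers that must be usable
    """
    not_evaluable: Set[str] = set()
    best: Optional[str] = None
    best_matches = 0
    for hg, markers in PANEL.items():
        # Keep only markers that have a base call and lie outside the deletion.
        usable = {
            pos: base
            for pos, base in markers.items()
            if calls.get(pos, "N") != "N"
            and not (deleted and deleted[0] <= pos <= deleted[1])
        }
        if len(usable) < min_callable * len(markers):
            # Too few informative markers: flag instead of guessing.
            not_evaluable.add(hg)
            continue
        matches = sum(calls[pos] == base for pos, base in usable.items())
        # Require every usable marker to match; prefer the larger marker set.
        if matches == len(usable) and matches > best_matches:
            best, best_matches = hg, matches
    return best, not_evaluable
```

With this sketch, a sample whose calls match every `HG_A` marker is assigned to `HG_A`, whereas the same sample with a deletion spanning two of those markers returns no assignment and reports `HG_A` as not evaluable, which is the cautious behavior argued for above.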
Another potential advantage of MitoChip v2.0 analysis over Sanger sequencing is its potential to detect heteroplasmy more sensitively. However, the failure to exploit the data captured on MitoChip v2.0 for heteroplasmy detection and quantitation appears to be attributable to a limitation of the current GSEQ 4.1 and other software. This same problem has been noted in previous studies employing the MitoChip v2.0 [4]. While we were able to use MFP to detect 6 confirmed heteroplasmic bases (4 consistent with Sanger sequencing and 2 consistent with Illumina GAII), more extensive validation is needed to determine the conditions that must be met to achieve those calls consistently across the mtDNA genome. MFP analysis did detect two potentially low-level heteroplasmic mutations in sample #15 at np 15940 and 15944, which were supported by deep sequencing data (Additional File 6). This result suggests that MFP can potentially make robust and accurate heteroplasmy calls.
Any array-based sequencing technology has inherent limitations that cannot be fully addressed by statistical or informatic means. Aside from the difficulty of detecting small indels, as mentioned above, another important issue is the reliable and consistent detection of very low levels of heteroplasmy. As newer technologies emerge, superior heteroplasmy detection will be achieved. In this regard, next-generation sequencing technologies offer enormous depth of coverage for the mitochondrial genome such that point mutations, small indels and low levels of heteroplasmy (at least 5%-10%) can be reliably and quantitatively detected [18-20].
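To make the notion of quantitative heteroplasmy detection from deep sequencing concrete, the following Python sketch computes the variant fraction at a single position from read counts. It is an illustration only, not a method from this study or from [18-20]: the function names, the 5% reporting threshold (taken from the lower bound quoted above), and the minimum depth of 100 reads are assumptions.

```python
def heteroplasmy_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at one position that carry the variant allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / total


def is_reportable(
    ref_reads: int,
    alt_reads: int,
    min_fraction: float = 0.05,  # assumed threshold, from the 5%-10% range above
    min_depth: int = 100,        # assumed minimum coverage, purely illustrative
) -> bool:
    """Report a heteroplasmic call only with adequate depth and variant fraction."""
    if ref_reads + alt_reads < min_depth:
        return False
    return heteroplasmy_fraction(ref_reads, alt_reads) >= min_fraction
```

For example, 100 variant reads out of 1,000 give a fraction of 0.10 and would be reported under these assumed thresholds, whereas 10 variant reads out of 1,000 (0.01) would not.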
Yet, current next-generation sequencing technologies are not without their own limitations. For example, such technologies are still relatively expensive in terms of equipment, reagents, and labor costs. Furthermore, they are high-throughput technologies only in terms of the amount of sequence data that they generate per run, but not in terms of the number of samples that can be individually processed at the same time. In contrast to NGS methods, MitoChip v2.0 analysis can be run on individual samples without having to accumulate a sufficient number of samples to batch analyze them.