The current study characterizes a cohort of limb-girdle muscular dystrophy (LGMD) in the United States using whole exome sequencing. Fifty-five families affected by LGMD were recruited using an institutionally-approved protocol. Exome sequencing was performed on probands and selected parental samples. Pathogenic mutations and co-segregation patterns were confirmed by Sanger sequencing. Twenty-two families (40%) had novel and previously reported pathogenic mutations, primarily in LGMD genes, but also in genes for Duchenne muscular dystrophy, facioscapulohumeral muscular dystrophy, congenital myopathy, myofibrillar myopathy, inclusion body myopathy, and Pompe disease. One family was diagnosed via clinical testing. Dominant mutations were identified in COL6A1, COL6A3, FLNC, LMNA, RYR1, SMCHD1, and VCP, recessive mutations in ANO5, CAPN3, GAA, LAMA2, SGCA, and SGCG, and X-linked mutations in DMD. A previously reported variant in DMD was confirmed to be benign. Exome sequencing is a powerful diagnostic tool for LGMD. Despite careful phenotypic screening, pathogenic mutations were found in other muscle disease genes, largely accounting for the increased sensitivity of exome sequencing. Our experience suggests that broad sequencing panels are useful for these analyses due to the phenotypic overlap of many neuromuscular conditions. The confirmation of a benign DMD variant illustrates the potential of exome sequencing to help determine pathogenicity.
Limb girdle muscular dystrophy (LGMD) is a broad and increasingly heterogeneous category of inherited muscle diseases1. LGMD typically causes progressive proximal muscle weakness and has been associated with classic histological abnormalities on muscle biopsy. As genetic discoveries in LGMD proliferate, it has become clear that the clinical and histological presentations, as well as outcomes, may vary widely between subtypes and among different affected individuals. However, these variations are not consistent enough to enable clinicians to identify subtypes based on phenotype alone. Two major subcategories are recognized based on inheritance patterns: LGMD type 1 (LGMD1) is dominantly inherited and LGMD type 2 (LGMD2) is recessively inherited. To date, 8 dominant forms (LGMD1A-H) and 23 recessive forms (LGMD2A-W) have been described, each corresponding to a different causative gene2. Onset of symptoms may occur at almost any age, with the exception of infancy, which would indicate the presence of a congenital muscular dystrophy. Traditional approaches to identifying pathogenic mutations by immunohistochemistry, western blotting and Sanger sequencing of selected genes can yield genetic diagnoses in 35% of families3. Clinical exome sequencing in general has been reported to have a diagnostic rate of 25%4, whereas recent studies of exome sequencing for neuromuscular disease show a 46% diagnostic rate in the United States5 and 73% in a highly consanguineous population from Iran6. Diagnostic rates in LGMD have recently been reported to be 45% in Australia using exome sequencing7 and 33% in Germany using targeted sequence capture8. The results of exome sequencing in LGMD for a large cohort from the United States have not previously been published.
We analyzed 55 families from the United States, each of which has one or more individuals with the clinical diagnosis of LGMD. Pathogenic mutations were identified in 22 of 55 families using exome sequence analysis in concert with clinical findings and Sanger sequence confirmation. Our results correlate with the results of studies performed in other countries, and yield interesting observations about approaches to genetic diagnosis in muscular dystrophy.
Patients with the clinical diagnosis of LGMD who did not have a genetic diagnosis after clinical evaluation (including some clinical genetic testing), as well as their available informative family members, were recruited for this study. Onset of symptoms for all probands was after 1 year of age. A total of 55 families were enrolled via an institutionally approved research protocol at Boston Children’s Hospital. One of the authors (EE), a certified genetic counselor, personally enrolled all of the subjects and reviewed risks and benefits in detail during the consent process. Clinical data collected included medical and family histories, physical examinations, laboratory results, clinical genetic test results, and clinical muscle biopsy data, which were stored in a secure Filemaker Pro v.10 database (see Supplementary Figure 1 for sample form). Peripheral blood or saliva samples were collected from probands and informative relatives for DNA extraction. Any clinical information that indicated specific gene candidates, such as deficiencies of protein expression on immunohistochemistry, was taken into account when analyzing the exome sequencing data.
The Genomics Platform at the Broad Institute was used to perform whole exome sequencing of DNA samples representing selected subjects from 45 of the 55 families; the full sequencing protocol has been published for LGMD cohorts from other countries7, 9. The Agilent Sure-Select Human All Exon v2.0, 44Mb baited target and the Broad in-solution hybrid selection process were used to target exons in genomic DNA. At least 250 ng of DNA with concentrations of at least 2 ng/μl were submitted for each sample. The hybrid selection libraries cover >80% of targets at 20x or more, with a mean target coverage of >80x. Exome sequencing data were processed through a pipeline based on Picard (https://github.com/broadinstitute/picard), using base quality score recalibration and local realignment at known insertions and deletions. The BWA aligner (https://github.com/lh3/bwa) mapped reads to the human genome build 37 (hg19) reference sequence. The variant call set was uploaded to xBrowse (https://atgu.mgh.harvard.edu/xbrowse/) and an analysis limited to the candidate gene list was performed using the various inheritance patterns. The main report contains variants restricted to nonsense, frameshift, essential splice site and missense variants and filtered on variant site and genotype quality.
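The reporting filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the xBrowse implementation: the `Variant` record, the consequence labels, the abbreviated gene list, and the quality cutoffs `MIN_SITE_QUAL` and `MIN_GENOTYPE_QUAL` are all hypothetical, because the text does not state the thresholds or the full candidate gene list that were used.

```python
from dataclasses import dataclass

# Consequence classes named in the text for the main report.
REPORTED_CLASSES = {"nonsense", "frameshift", "essential_splice_site", "missense"}

# Hypothetical, abbreviated candidate gene list (genes named in this paper);
# the actual list used for the analysis is not given in the text.
CANDIDATE_GENES = {
    "ANO5", "CAPN3", "COL6A1", "COL6A3", "DMD", "FLNC", "GAA",
    "LAMA2", "LMNA", "RYR1", "SGCA", "SGCG", "SMCHD1", "VCP",
}

# Hypothetical quality cutoffs; the source does not report the values used.
MIN_SITE_QUAL = 30.0
MIN_GENOTYPE_QUAL = 20.0


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    site_qual: float
    genotype_qual: float


def in_main_report(v: Variant) -> bool:
    """Keep a call only if it is in a candidate gene, has a reported
    consequence class, and clears both (assumed) quality cutoffs."""
    return (
        v.gene in CANDIDATE_GENES
        and v.consequence in REPORTED_CLASSES
        and v.site_qual >= MIN_SITE_QUAL
        and v.genotype_qual >= MIN_GENOTYPE_QUAL
    )


# Made-up calls that each exercise one branch of the filter.
calls = [
    Variant("CAPN3", "missense", 250.0, 99.0),    # kept
    Variant("CAPN3", "synonymous", 250.0, 99.0),  # wrong consequence class
    Variant("BRCA1", "missense", 250.0, 99.0),    # gene outside the list
    Variant("ANO5", "frameshift", 12.0, 99.0),    # fails site quality
]
report = [v for v in calls if in_main_report(v)]
```

Restricting the report to a gene list and a few consequence classes keeps the review tractable, at the cost of missing variant types outside those classes.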
DNA samples from the remaining 10 of the 55 families underwent whole exome sequencing at the Genomic Diagnostic Laboratory and were analyzed by the Interpretive Genomic Services team at Boston Children’s Hospital as previously described10. Briefly, exome capture was performed using the Agilent V4 Human Exome Kit. Library sequencing was performed on an Illumina HiSeq, generating 31 million paired end reads (100bp x 2) and a mean target coverage of 27x, with 81% of the target covered by ≥ 10 reads. Alignment, variant calling, and annotation were performed with a custom informatics pipeline employing Burrows-Wheeler Aligner (BWA), Picard (http://picard.sourceforge.net), Genome Analysis Toolkit (GATK), and ANNOVAR. The human genome reference used for these studies was hg19/GRCh37. Single nucleotide changes, microdeletions, and microinsertions were reported and annotated using the NCBI and UCSC reference sequences and online genome databases (NHLBI Exome Sequencing Project with ~5400 exomes, 1000 Genomes Project, dbSNP135, Complete Genomics 52).
A total of 30 exomes were sequenced from the 22 diagnosed families. Seventeen families had only proband samples available for sequencing. Trios (proband & parents) underwent exome sequencing in three families, while the proband and an additional informative family member were sequenced in each of the remaining 2 families. As the exome sequencing was performed on a research basis, incidental findings of pathogenic mutations for unrelated diseases were not systematically sought, identified or reported.
The candidate variants were identified by xBrowse and other software. The 1000 Genomes Project (http://www.1000genomes.org) and The Exome Aggregation Consortium (ExAC) databases (http://exac.broadinstitute.org) were used to determine if the candidate variants were known single nucleotide polymorphisms (SNPs). Candidate variants that were known SNPs were required to have a minor allele frequency (MAF) < 0.0001 to be considered for further analysis. SNPs with a MAF > 0.0001 were determined to be non-pathogenic. The UCSC browser (https://genome.ucsc.edu/) was used to determine candidate variant amino acid conservation among species through evolution from lamprey to humans. Species conservation was determined using the likelihood ratio test of significantly conserved amino acid positions (LRT) and PhyloP (http://ccg.vital-it.ch/mga/hg19/phylop/phylop.html). Pathogenicity of these variants was predicted by using SIFT (http://sift.jcvi.org), PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2), Mutation Taster (http://mutationtaster.org), and FATHMM (http://fathmm.biocomputer.org.uk). Variants affecting conserved amino acids that were reported to be pathogenic by at least 2 of the 4 prediction programs were selected for further analysis. In light of the limitations on the accuracy of these programs11, outputs from these prediction algorithms were used only for screening purposes with a deliberately liberal threshold, and were not used to make final determinations of pathogenicity.
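The screening rule in this paragraph (MAF below 0.0001, a conserved residue, and a pathogenic call from at least 2 of the 4 prediction programs) can be written as a short Python sketch. The `Candidate` record and its field names are hypothetical, and the per-program calls in the example are invented for illustration; only the thresholds and the two example allele frequencies come from the text. Because the text does not say how a MAF of exactly 0.0001 was handled, the sketch assumes such variants are excluded.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MAF_CUTOFF = 0.0001        # threshold stated in the text
MIN_PATHOGENIC_CALLS = 2   # "at least 2 of the 4 prediction programs"
PREDICTORS = ("SIFT", "PolyPhen-2", "MutationTaster", "FATHMM")


@dataclass
class Candidate:
    name: str
    maf: Optional[float]   # None means absent from the population databases
    conserved: bool        # residue conserved across species
    # predictor name -> True if that program called the variant pathogenic
    predictions: Dict[str, bool] = field(default_factory=dict)


def passes_screen(v: Candidate) -> bool:
    """Liberal screening filter; not a final determination of pathogenicity."""
    # Assumption: a MAF exactly equal to the cutoff is treated as too common.
    if v.maf is not None and v.maf >= MAF_CUTOFF:
        return False
    if not v.conserved:
        return False
    calls = sum(1 for p in PREDICTORS if v.predictions.get(p, False))
    return calls >= MIN_PATHOGENIC_CALLS


# COL6A1 p.Val156Leu (family 1366): ExAC MAF 0.0000085, conserved, 3 of 4
# programs pathogenic. Which program dissented is not stated, so the
# assignment below is hypothetical.
col6a1 = Candidate(
    "COL6A1 p.Val156Leu", 0.0000085, True,
    {"SIFT": True, "PolyPhen-2": True, "MutationTaster": True, "FATHMM": False},
)

# DMD p.His2921Arg: ExAC MAF 0.02629, so it fails on frequency alone.
# The conservation and predictor values here are placeholders.
dmd = Candidate("DMD p.His2921Arg", 0.02629, True, {p: True for p in PREDICTORS})

screen_results = {v.name: passes_screen(v) for v in (col6a1, dmd)}
```

Checking frequency first means a common variant is rejected no matter what the prediction programs report.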
Selected candidate variants from exome sequence analysis were amplified using standard PCR primers. Amplicons were assessed via agarose gel electrophoresis, then purified by treating 5 μl of PCR product with 2 μl of Exonuclease and Shrimp Alkaline Phosphatase (Exo-SAP-IT; Affymetrix) and submitted to the Molecular Genetics Core Facility at Boston Children’s Hospital or the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida for sequencing using the ABI Prism BigDye Terminator cycle sequencing protocols (Applied Biosystems, Perkin-Elmer Corp., Foster City, CA). Sequence data were generated in an ABI Prism® 3130 or 3730 Genetic Analyzer (Applied Biosystems, Foster City, CA), formatted by ABI Sequencing Analysis software v.5.2 and KB Basecaller, and analyzed using Sequencher v.5.2.3 or earlier versions (GeneCodes Corporation, Ann Arbor, MI). Sanger sequencing was performed in affected family members and other informative family members to confirm pathogenic mutations and track co-segregation patterns. The only widespread screening performed via Sanger sequencing was for FKRP in 18 families who had exome sequencing on an older platform that did not have good coverage of that gene7.
Clinical features and details of clinical diagnostic testing are summarized in Table 1. Most of the probands had clinical muscle biopsies, and none of the muscle biopsies led to a genetic diagnosis prior to enrollment. Analysis of exome sequencing data yielded the identification of pathogenic mutations in 21 of the 55 families, with one additional family among the 55 receiving a clinical genetic diagnosis during the course of the study (Figure 1 and Table 2). The 22 families with diagnoses included 11 with dominant mutations, 10 with recessive mutations, and 1 with an X-linked DMD mutation. Novel pathogenic mutations were identified in 8 families; 4 of these novel mutations were heterozygous mutations. Two other families have pathogenic mutations reported in public databases, including LOVD (http://www.lovd.nl), Emory Genetics Laboratory (http://geneticslab.emory.edu/emvclass/emvclass.php), and GeneDx but not published; one pathogenic mutation was in both categories. Sanger sequencing confirmed these mutations in all probands and also confirmed expected co-segregation patterns for available family members. Co-segregation was confirmed in 13 of the 22 families, while the remaining 9 had only proband DNA samples available. The families with only proband samples available included 7 with previously reported pathogenic mutations and 2 with novel pathogenic mutations (one family had compound heterozygous pathogenic mutations that included a previously reported nonsense mutation and a novel essential splice site mutation). No FKRP mutations were found on Sanger sequencing.
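The counts in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only the figures stated above; the variable names and the `percent` helper are illustrative.

```python
TOTAL_FAMILIES = 55

diagnosed_by_exome = 21     # pathogenic mutations found on exome analysis
diagnosed_clinically = 1    # diagnosed via clinical genetic testing
diagnosed = diagnosed_by_exome + diagnosed_clinically

# Breakdowns of the diagnosed families as stated in the text.
by_inheritance = {"dominant": 11, "recessive": 10, "X-linked": 1}
by_segregation = {"co-segregation confirmed": 13, "proband only": 9}
proband_only = {"previously reported": 7, "novel": 2}

# Each breakdown should sum to the total it partitions.
assert sum(by_inheritance.values()) == diagnosed
assert sum(by_segregation.values()) == diagnosed
assert sum(proband_only.values()) == by_segregation["proband only"]


def percent(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


overall_yield = percent(diagnosed, TOTAL_FAMILIES)            # all diagnoses
exome_only_yield = percent(diagnosed_by_exome, TOTAL_FAMILIES)  # exome alone
```

The overall figure of 22 of 55 corresponds to the 40% reported for the cohort; counting exome-based diagnoses alone gives a slightly lower yield of about 38%.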
Two unrelated individuals representing families 930 and 1125 were found to have LGMD1B with pathogenic mutations in LMNA. Both affected individuals had onset in the toddler years, elevated serum creatine kinase levels, and dystrophic muscle biopsies.
Family 1092 was found to have novel dominant missense pathogenic mutations in COL6A1. This gene is classically associated with Bethlem myopathy and Ullrich congenital muscular dystrophy, but recent reports also link it with LGMD12, 13. The COL6A1 NM_001848.2 c.868G>A, NP_001839.2 p.Gly290Arg (rs121912939) pathogenic mutation in family 1092 has been reported by GeneDx (http://www.genedx.com/test-catalog/disorders/limb-girdle-muscular-dystrophy-lgmd/, with NCBI submission accession number: SCV000196773.1) and Emory Genetics Laboratory (http://geneticslab.emory.edu/index.html, with NCBI submission accession numbers: SCV000224895.1, SCV000224896.1 and SCV000111716.3) as being pathogenic. A dominant missense pathogenic mutation c.868G>C that causes the identical p.Gly290Arg amino acid substitution has been reported in Ullrich congenital muscular dystrophy14.
Similarly, pathogenic mutations in COL6A3 are known to cause Ullrich congenital muscular dystrophy and Bethlem myopathy15, but the association with LGMD has only been reported recently7. We identified pathogenic mutations in COL6A3 in three families. A dominant mutation in COL6A3 (NP_004360.2, p.Glu1386Lys) identified in family 965 was previously reported as being pathogenic15, and the affected amino acid residue is highly conserved across species. Proband 965-1 had neither distal laxity nor a tendency towards keloid formation, and a thigh MRI did not show findings specific for Ullrich congenital muscular dystrophy or Bethlem myopathy16. The pathogenic mutations of COL6A3 identified in two other families (1093 and 1115) are de novo essential splice site mutations. The pathogenic COL6A3 mutation (NM_004369.3, c.6283-1C>T) in family 1093 is novel, whereas the NM_004369.3, c.6156+1G>A de novo pathogenic mutation observed in family 1115 was previously reported15. The proband in family 1093 showed a mixed phenotype of LGMD and congenital muscular dystrophy.
Ryanodine receptor 1 (RYR1) mutations are known to cause a congenital myopathy, central core disease. A de novo dominant missense pathogenic mutation (NM_001042723.1, c.14567G>A, NP_001036188.1 p.Arg4856His) in RYR1 was found in the proband of family 596. This mutation has been reported to cause a congenital neuromuscular disease with uniform type 1 fibers and an association with central core disease17, 18.
Pathogenic VCP mutations are known to cause amyotrophic lateral sclerosis (ALS) and inclusion body myopathy. The mutation identified in family 1250 (VCP NM_007126.3, c.572G>A, NP_009057.1, p.Arg191Gln (rs121909334)) was previously reported in familial amyotrophic lateral sclerosis and in patients with an unusual syndrome of inclusion body myopathy, Paget disease of bone, and frontotemporal dementia19. The inclusion body myopathy may present with manifestations similar to LGMD19.
Pathogenic mutations in gamma filamin (FLNC) usually cause myofibrillar myopathy with distal weakness, but a recent report showed that they may cause an LGMD phenotype7. The dominant missense pathogenic mutation FLNC NM_001458.4, c.7409C>A, NP_001449.3, p.Pro2470His identified in 1399 is novel, has not been reported in any population database, and was predicted to be pathogenic by SIFT, PolyPhen, MutTaster and FATHMM. The proband of family 1399 showed an LGMD phenotype with cardiomyopathy, accompanied by features of myofibrillar myopathy, similar to other individuals reported to have pathogenic FLNC mutations.
The dominant pathogenic mutation in SMCHD1, identified in family 1090, causes an in-frame deletion of amino acid lysine at position 275 and has been previously reported10. While sequence data were being analyzed, the proband from 1258 informed the research team that he had been diagnosed with FSHD1 based on clinical genetic testing of the D4Z4 region on chromosome 4q35. He had asymmetric weakness in the right chest and arm, but no facial weakness.
Compound heterozygous pathogenic mutations in Calpain 3 (CAPN3) were identified in families 1197 and 1365. The missense mutations found in family 1197 were previously reported as homozygous mutations in different families20, 21. Both heterozygous pathogenic mutations of CAPN3 found in family 1365 affect splicing, and a Western blot of protein extracted from muscle biopsy tissue showed reduced Calpain 3 expression. The CAPN3 NM_000070.2, c.1746-20C>G (rs201892814) pathogenic mutation was reported previously by the Emory Genetics Laboratory (http://www.ncbi.nlm.nih.gov/clinvar/variation/92408/, with NCBI submission accession number: SCV000109927.4), and c.945+5G>A is a novel pathogenic mutation that shifts a splice site downstream, extending the exon. The latter was found to have a minor allele frequency of 0.0000082 (i.e., singleton) in the ExAC database.
A consanguineous family, 1299, had a pathogenic homozygous recessive missense mutation (NM_000023.2, c.109G>T, NP_000014.1, p.Val37Leu) in SGCA; this mutation has not been previously reported. Two pathogenic mutations in SGCG were found in family 1049: a heterozygous deletion of 4 nucleotides (AGTA) at NM_000231.2, c.195+4_195+7 that was previously reported22, and a novel heterozygous substitution, c.195+1G>C (rs200502077). The latter is an essential splice site mutation. A muscle biopsy was performed on the proband, but tissue from this biopsy was not available for the current study.
Pathogenic mutations in ANO5, which cause LGMD2L, were found in three families (1102, 1105 & 1395). The homozygous recessive mutation found in family 1102, ANO5 NM_213599.2, c.191dupA, NP_998764.1, p.Asn64Lys fs Ter15 (rs137854521), is a known pathogenic mutation23–25 that generates a stop codon 15 amino acid residues downstream of the mutation. The two other families (1105 & 1395) also have this mutation but in a heterozygous state; the other allele has novel mutations: a nonsense mutation c.835C>T, p.Arg279Ter in family 1395 and a splicing mutation c.2235+5 G>A in family 1105. The pathogenic ANO5 mutations were confirmed for co-segregation in their respective families.
Pathogenic mutations in LAMA2 have been identified as the cause of merosin-deficient congenital muscular dystrophy. Several studies have reported that partial merosin deficiency caused by LAMA2 mutations, as well as certain specific LAMA2 mutations, can manifest as LGMD phenotypes26–30, and it has been suggested31 that LAMA2 should be included among the causative genes for LGMD2. Compound heterozygous pathogenic mutations in LAMA2, a previously reported28 nonsense mutation NM_000426.3, c.5116C>T, NP_000417.2, p.Arg1706Ter and a novel splice site mutation c.8703+1G>A r.spl, were identified in family 1409. The phenotype of the proband, 1409-1, was reviewed again and was confirmed to meet criteria for LGMD. The proband had some contractures and onset was in early childhood but was not early enough to be classified as congenital muscular dystrophy. Mutations in LAMA2 have recently been associated with Emery-Dreifuss muscular dystrophy32 and this diagnosis has also been a consideration for the proband. However, the subject was a young adult at the most recent evaluation and ongoing cardiac monitoring has revealed little to no evidence for overt cardiac complications to date.
Compound heterozygous pathogenic mutations in GAA, known to cause Pompe disease, were found in family 1117. These were a missense mutation NM_000152.3, c.1841C>A, NP_000143.2, p.Thr614Lys (rs369531647)33 and a substitution c.-32-13T>G r.spl (rs386834236) that affects splicing34. Both mutations were previously reported.
One family was found to have an X-linked pathogenic mutation in the dystrophin gene (DMD). The pathogenic nonsense mutation DMD NM_004006.2, c.9G>A, NP_003997.1, p.Trp3Ter, found in family 1107, was previously reported35–37.
Suspected but unconfirmed mutations are listed in Table 3. Exome sequencing analysis showed that family 1027 has a heterozygous dominant variant in MYOT (NM_006790.2 c.1345delC, NP_006781.1 p.Pro449Gln fs Ter16 (rs780331457)). Mutations in MYOT are known to cause LGMD1A, but DNA is only available for the proband in this family, hence it is difficult to confirm this variant as a pathogenic mutation. It is a novel variant that is not found in the 1000 Genomes database and has a MAF of 0.00004942 in the ExAC database. The amino acid residue is also very well conserved. We found compound heterozygous variants of POMGNT2 (GTDC2) in family 1255. A rare missense variant (NM_032806.5 c.190G>A, NP_116195.2 p.Gly64Ser (rs548769646)) is found in the proband as well as both parents, whereas a 2 base pair deletion (c.740_741delAA, p.Phe247CysfsTer16) is present in the proband and absent in both parents; the latter appears more likely to be pathogenic. The missense variant of COL6A1 found in family 1366 is novel (NM_001848.2 c.466G>T, NP_001839.2 p.Val156Leu), and the affected amino acid residue is conserved from lamprey through human. The variant in family 1366 is not found in the 1000 Genomes database and has a very low minor allele frequency of 0.0000085 in the ExAC database (http://exac.broadinstitute.org/). It is predicted to be pathogenic by 3 of the 4 prediction programs used. The phenotype of the proband in family 1366 showed some overlap with congenital muscular dystrophy. DNA was only available for the proband in this family, thus analysis of co-segregation patterns was not possible.
A DMD NM_004006.2 c.8762A>G, NP_003997.1 p.His2921Arg (rs1800279) variant suspected of being benign37–40 was identified in the probands of four families (1258, 1309, 1365, 1398). In each family, the variant was confirmed to be benign due to causative mutations found in other genes (Table 4). Two of the families, 1309 and 1398, were from Saudi Arabia and were not included in the 55 families for the main analysis noted above, but are mentioned here as further evidence of the benign nature of this variant. The proband of family 1258 is male, but his muscle biopsy showed normal dystrophin staining and he was diagnosed with FSHD1, as noted above. Both the male proband and an unaffected brother in family 1309 had the hemizygous DMD variant in question. The female proband of family 1365 had confirmed CAPN3 compound heterozygous mutations as well as the heterozygous DMD variant. Family 1398 was found to have a known homozygous SGCG NM_000231.2, c.212T>C, XP_005266562.1, p.Leu71Ser mutation that co-segregates with phenotype in this family. The proband of this family is female and was found to have the heterozygous DMD variant, while the unaffected father was found to have the hemizygous DMD variant. The minor allele frequency for this variant (rs1800279) in the ExAC database is 0.02629, which is not compatible with a pathogenic mutation.
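The frequency argument in the last sentence can be made concrete with a back-of-the-envelope calculation. In the Python sketch below, the allele frequency is the ExAC value quoted above; the comparison figure of roughly 1 in 3,500 male births for Duchenne muscular dystrophy is an outside assumption (a commonly cited approximate incidence, not a number from this study), and the calculation also assumes that males carry a single X chromosome so that the hemizygous carrier frequency equals the allele frequency.

```python
# ExAC minor allele frequency for rs1800279, as quoted in the text.
variant_maf = 0.02629

# For an X-linked variant, the expected fraction of males carrying it in the
# hemizygous state is approximately the allele frequency itself.
hemizygous_male_fraction = variant_maf
males_per_carrier = 1 / hemizygous_male_fraction   # about 1 male in 38

# ASSUMPTION (not from this study): approximate incidence of Duchenne
# muscular dystrophy, roughly 1 in 3,500 male births.
assumed_dmd_incidence = 1 / 3500

# If the variant were a fully penetrant pathogenic DMD allele, it alone would
# imply a disease frequency in males this many times the assumed incidence.
fold_excess = hemizygous_male_fraction / assumed_dmd_incidence
```

Under these assumptions the variant would be carried by about 1 in 38 males, roughly 90 times more often than the assumed disease incidence, which is the sense in which the observed frequency is incompatible with pathogenicity.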
Among the 55 families studied, exome sequencing analysis identified pathogenic mutations in 21, while clinical genetic testing revealed the diagnosis for an additional family. The overall success rate of 40% is comparable to recent previous reports of exome sequencing analysis for LGMD and neuromuscular diseases in non-consanguineous populations5, 7, 8. Traditional genetic, biochemical and histopathological examinations yield diagnoses in 30 – 40% of LGMD cases3, 41, and targeted sequence capture has similar yields8. Exome sequencing has improved the diagnostic yield to the 40 – 45% range, both in our cohort and in the literature5, 7, 8, likely due in part to the use of trios and family studies. As the subjects had varying degrees of clinical evaluation prior to enrollment, including clinical genetic testing, a similar approach would be expected to have an even higher yield in the clinical setting for patients who had not had prior genetic testing or were screened appropriately for pathogenic mutations not amenable to sequencing technologies. Several families had pathogenic mutations in CAPN3, sarcoglycans, and ANO5, common LGMD genes for which clinical genetic testing is readily available. The absence of any subjects with pathogenic DYSF mutations is notable, as well as the under-representation of common genes aside from ANO5. The depth of clinical evaluations varied among these families. Many patients with pathogenic mutations in common LGMD genes were likely diagnosed on clinical genetic testing and this cohort does not represent those individuals. Most of the subjects who had extensive LGMD genetic testing prior to enrollment underwent those evaluations prior to the association of ANO5 with LGMD that was first described in 2010.
Among the pathogenic mutations identified in our cohort, six were found in loci not traditionally classified as being associated with LGMD (DMD, GAA, SMCHD1, VCP, FLNC, and the D4Z4 region of 4q35), suggesting that these genes could account for at least some of the increased diagnostic yield, as recently noted7. These findings, along with the decreasing use of muscle biopsy in clinical settings, indicate that diagnostic genetic testing panels based on targeted sequence capture for LGMD should include a broad array of muscle disease genes, not only ones that meet the strict definition of LGMD. The diversity of causative genes also illustrates the importance of accurate clinical phenotyping for both clinical and research purposes. There is significant phenotypic overlap between LGMD and diseases that are not traditionally considered to be LGMD, such as Pompe disease, and though the subjects in our cohort with non-LGMD causative genes could not be distinguished from the others based on clinical presentation, there may be other cases where this is possible. Of note, given the availability of a treatment for Pompe disease, the individual with the GAA mutations had clinical confirmation in compliance with our IRB protocol so that treatment options could be offered.
This study confirmed that DMD NM_004006.2, c.8762A>G, NP_003997.1, p.His2921Arg is a non-pathogenic benign variant, as it was found in multiple unaffected individuals in the hemizygous state, and affected individuals were also found to have confirmed pathogenic mutations in other genes. The variant has been increasingly suspected of being benign37–40. The additional findings in our study illustrate one of the benefits of accumulating databases of exome sequences. Though the amount of data is significantly larger, requiring sophisticated computational approaches to analyze completely, the array of identified variants for each individual tested is more complete, which over time will permit more definitive assignments of pathogenicity, fewer “variants of unknown significance”, and correction of reported mutations that may not truly be pathogenic42.
These diagnostic outcomes have been consistent across multiple exome sequencing studies performed on disease categories that are genetically heterogeneous, as LGMD is. This suggests that the previous estimate that 85% of pathogenic mutations are found in coding regions43 may be too high. However, the subjects selected for the current study and similar studies were ones who had previously had clinical evaluations, including genetic testing, suggesting that the yield would be higher had the cohorts not been pre-screened. In addition, certain types of pathogenic mutations affecting coding regions are not easily detected with current exome sequencing technologies. For example, single and multiple exon deletions and duplications comprise the majority of pathogenic mutations in Duchenne and Becker muscular dystrophy, trinucleotide repeat expansions cause the most common form of myotonic dystrophy, and the D4Z4 macrosatellite deletion on 4q35 that is associated with facioscapulohumeral muscular dystrophy type 1 is also not easily detected on exome sequencing. A number of our subjects who had phenotypes suggestive of these specific types of muscle disease had appropriate clinical genetic testing, but a patient with an atypical presentation of facioscapulohumeral muscular dystrophy type 1 was enrolled in our research and received a clinical genetic diagnosis of LGMD due to his phenotype. Careful phenotyping of individuals and family members proves to be very important to help keep the investigator on the proper course to ultimately lead to a molecular diagnosis.
Ethical issues persist in the collection of exome and genome-wide sequencing data with respect to the potential for the identification of incidental pathogenic mutations. These mutations are often hidden in the mountains of data generated, as research laboratories and clinical laboratories typically extract only those variants that lie in a specific, limited set of genes of interest. Incidental variants would only be found if they were actively sought during variant analysis. Another problem is timing: because some pathogenic mutations may not lead to symptomatic disease for decades, it is unclear when such mutations should optimally be discovered and reported. Various national and international organizations are actively discussing this issue. One solution is to provide patients and research subjects access to their electronic sequencing data, so that they may, if they choose, seek additional analysis by other facilities and investigators without having to have the sequencing repeated.
Further analysis continues on the families in whom pathogenic mutations were not identified in the current study. Some of the probands had clinical muscle biopsies performed, and when available, biopsy reports and slides were reviewed to confirm the absence of pathogenic findings. The possibility of digenic compound heterozygous mutations will be considered, as has been described for specific diseases44, including muscular dystrophy45, 46. To extend the current study, we plan to perform whole genome sequencing and other genetic analyses on selected families in an attempt to detect larger pathogenic mutations such as copy number variants, inversions, and large-scale deletions such as the D4Z4 macrosatellite contraction. The rare pathogenic mutation in a non-coding region will be difficult to identify and confirm, even with the assistance of whole genome sequencing, given the collective size of the intronic regions and the number of variants that will be identified for each affected individual. Exceptions may be found in regions with known functions such as miRNA binding sites, where pathogenic mutations have been confirmed in a handful of cases. There is also, of course, the promise that novel disease genes remain to be identified. We are currently examining candidate mutations in several potential novel genes that have been identified on the exome sequencing analysis. Though such genes are becoming more difficult to discover and confirm, it is unlikely that we have identified all the genes associated with LGMD, and the number of cases that remain without a genetic diagnosis provides a tantalizing clue that more such genes are out there.
The current analysis of whole exome data from a sizeable cohort of families affected by LGMD in the United States has yielded similar overall findings to those reported in other countries. Most of the pathogenic mutations identified were in known LGMD genes, but a few were in muscle disease genes that are not strictly considered to be LGMD, indicating that clinical genetic testing panels should include a broad array of genes to maximize the yield. A previously reported pathogenic mutation in DMD was found to be a benign variant in multiple families, providing an example of how candidate mutations in both known and novel disease genes should be scrutinized carefully. The number of cases without a genetic diagnosis remains stubbornly high, even after exome sequencing, suggesting that there are unusual pathogenic mutations in known genes and all manner of pathogenic mutations in novel disease genes that have yet to be identified.
Blank copy of database form from Filemaker Pro v.10 used in the study.
The authors thank all the study participants for their valuable contributions to this study. The work was supported by NIH R01 NS080929 (HMR, KAC, EE, and PBK), NIH R01 GM104371 (DGM), Department of Pediatrics at the University of Florida College of Medicine (KAC and PBK), Muscular Dystrophy Association Research Grant 186796 (PBK), and the Bernard F. and Alva B. Gimbel Foundation (LMK). Exome sequencing was supported by Medical Sequencing Program grant U54HG003067 from the National Human Genome Research Institute. M.A.S. was supported by the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia via research group project number RGP-VPP-301.
Conflict of Interest Disclosure
TWY and LMK have received personal compensation from Claritas Genomics.