Facioscapulohumeral dystrophy (FSHD) is characterized by chromatin relaxation of the D4Z4 macrosatellite array on chromosome 4 and expression of the D4Z4-encoded DUX4 gene in skeletal muscle. The more common form, autosomal dominant FSHD1, is caused by a contraction of the D4Z4 array, whereas the genetic determinants and inheritance of D4Z4 array contraction-independent FSHD2 are unclear. Here we show that mutations in SMCHD1 (structural maintenance of chromosomes flexible hinge domain containing 1) on chromosome 18 reduce SMCHD1 protein levels and segregate with genome-wide D4Z4 CpG hypomethylation in human kindreds. FSHD2 occurs in individuals who inherited both the SMCHD1 mutation and a normal-sized D4Z4 array on a chromosome 4 haplotype permissive for DUX4 expression. Reducing SMCHD1 levels in skeletal muscle results in contraction-independent DUX4 expression. Our study identifies SMCHD1 as an epigenetic modifier of the D4Z4 metastable epiallele and as a causal genetic determinant of FSHD2 and possibly other human diseases subject to epigenetic regulation.
FSHD [MIM158900] is clinically characterized by the initial onset of facial and upper-extremity muscle weakness that is often asymmetric and progresses to involve both upper and lower extremities1. FSHD1 and FSHD2 are phenotypically indistinguishable and both are associated with DNA hypomethylation and decreased repressive heterochromatin of the D4Z4 array, which we will collectively refer to as chromatin relaxation2–8 (Supplementary Fig. 1). Each D4Z4 unit contains a copy of the DUX4 (double homeobox 4) retrogene9–13, a transcription factor expressed in the germline and epigenetically repressed in somatic tissues. The D4Z4 chromatin relaxation in FSHD results in inefficient epigenetic repression of DUX4 and a variegated pattern of DUX4 protein expression in a subset of skeletal muscle nuclei14 (Supplementary Fig. 1). Ectopic expression of DUX4 in skeletal muscle activates the expression of stem cell and germline genes15 and when over-expressed in somatic cells DUX4 can ultimately lead to cell death12,16–20. Chromatin relaxation in FSHD1 is associated with a contraction of the array to 1–10 D4Z4 repeat units and therefore has a dominant inheritance pattern linked to the contracted array. In FSHD2, chromatin relaxation is independent of the size of the D4Z4 array and occurs on both chromosome-4 D4Z4 arrays and also on the highly homologous arrays on chromosome 102,7,8,21,22 (Supplementary Fig. 1).
D4Z4 chromatin relaxation must occur on a specific chromosome-4 haplotype in order to cause FSHD1 and FSHD2. This haplotype contains a polyadenylation (pA) signal to stabilize DUX4 mRNA in skeletal muscle13,23–27. Chromosomes 4 and 10 that lack this pA signal fail to produce DUX4 protein; consequently, D4Z4 chromatin relaxation and transcriptional derepression on these non-permissive haplotypes do not lead to disease. Because chromatin relaxation occurs at both chromosome 4 and chromosome 10 D4Z4 repeats in FSHD2, we sought to determine whether an inherited defect in a modifier of D4Z4 repeat-mediated epigenetic repression might cause FSHD2 when combined with an FSHD-permissive DUX4 allele.
To measure D4Z4 chromatin relaxation, we quantified the percentage of CpG methylation based on cleavage by the methylation sensitive FseI endonuclease, an assay that averages the percentage D4Z4 methylation on both alleles of chromosomes 4 and 10 in a cohort of 72 controls, 93 FSHD1 patients and 53 FSHD2 patients. In FSHD2 affected individuals D4Z4 methylation was at least 2SD below the average levels in the general population (44+/−10% for the general population and 11+/−5% for FSHD2, see Fig. 1a, Supplementary Note and Supplementary Fig. 2). Using a stringent methylation threshold of <25%, we discovered that in some kindreds identified by an FSHD2 proband D4Z4 hypomethylation segregated in a pattern consistent with autosomal dominant inheritance that was not linked to the chromosome-4 or -10 D4Z4 array haplotypes (Fig. 1b). In these kindreds, FSHD2 individuals inherited both the hypomethylation trait and the FSHD-permissive chromosome-4 haplotype with the DUX4 pA signal, suggesting that two independently segregating loci cause and determine the penetrance of FSHD2.
To identify the locus controlling the D4Z4 hypomethylation trait, we performed whole exome sequencing28 of twelve individuals in seven unrelated FSHD2 families: five with dominant segregation of the hypomethylation trait and two with sporadic hypomethylation and FSHD2. Detailed genetic analysis of the repeat lengths and haplotypes did not reveal evidence for non-paternity in these families (Fig. 1b). Families were stratified according to the criteria listed in Supplementary Table 1 and described in Supplementary Information. We identified rare and potentially pathogenic mutations in the SMCHD1 (structural maintenance of chromosomes flexible hinge domain containing 1) gene in all individuals with D4Z4 hypomethylation, with the exception of members of one family (Rf854: Table 1). These mutations were not present in public (dbSNP132 and the 1000 Genomes Project) or internal databases or in family members with normal D4Z4 methylation levels.
We confirmed the presence of these mutations by Sanger sequencing and included 12 additional unrelated FSHD2 families for which DNA or RNA was available. We identified heterozygous out-of-frame deletions, splice-site mutations, and heterozygous missense mutations in SMCHD1 in 15/19 (79%) families (Table 1 and Fig. 1b). We also confirmed that the splice-site mutations altered the normal SMCHD1 mRNA by exclusion of exons and cryptic splice site usage (Supplementary Fig. 4a,b).
Because heterozygous SMCHD1 mutations co-segregated with D4Z4 hypomethylation in FSHD2 families or occurred de novo in sporadic hypomethylation/FSHD2 individuals (Fig. 1b), we considered SMCHD1 haploinsufficiency as a candidate disease mechanism, particularly since many of the mutations were predicted to affect the production of the full protein. Indeed, fibroblasts from FSHD2 patients with non-synonymous or splice-site mutations in SMCHD1 had substantially reduced SMCHD1 protein levels (Fig. 2a). We found normal levels of SMCHD1 protein in the hypomethylated FSHD2 individual in family Rf854 that did not have an SMCHD1 mutation (Fig. 2a), suggesting that FSHD2 in this family has a genetic cause other than SMCHD1 haploinsufficiency. Finally, chromatin immunoprecipitation (ChIP) demonstrated the presence of SMCHD1 on the D4Z4 array and reduced levels of this association in FSHD2 individuals with SMCHD1 mutations (Fig. 2b). Together, these results support haploinsufficiency of SMCHD1 as a cause of D4Z4 hypomethylation in unrelated FSHD2 kindreds.
FSHD is characterized by low-level variegated expression of DUX4 in skeletal muscle. Therefore, we assessed DUX4 expression in skeletal muscle cells from control individuals after decreasing SMCHD1 by RNA interference (Figs. 3a,b). We detected no DUX4 mRNA in primary myotubes from an unaffected individual with a normal-sized and methylated D4Z4 array on the FSHD-permissive DUX4 pA haplotype. In contrast, DUX4 was transcriptionally activated in these myotubes (Fig. 3c) when SMCHD1 transcripts and protein were suppressed to <50% of normal levels. We observed a variegated pattern of DUX4 protein in myotubes in all samples with adequate SMCHD1 knockdown (Fig. 3d); this pattern is similar to that seen in myotubes from FSHD2 patients. Cells expressing a scrambled or ineffective shRNA did not express DUX4 (Fig. 3, cont. and 4059).
In order to demonstrate that the SMCHD1 splice mutations identified in FSHD2 patients result in DUX4 expression, we manipulated SMCHD1 pre-mRNA splicing in skeletal muscle cells using antisense oligonucleotides (AONs) directed to exon 29 or 36. These AONs induced skipping of SMCHD1 exon 29 or 36 at rates comparable to those detected in some FSHD2 patients and resulted in transcription of DUX4 (Fig. 3e,f). Thus, SMCHD1 activity is necessary for the somatic repression of DUX4, and reduction of this activity produces D4Z4 arrays that express DUX4 when an FSHD-permissive DUX4 haplotype is present, with a pattern of variegated expression similar to that observed in FSHD1 and FSHD2 myotube cultures.
SMCHD1 belongs to the SMC gene superfamily that regulates chromatin repression of loci in many different organisms, including silencing mating loci in yeast29, dosage compensation in C. elegans30,31, position-effect variegation in D. melanogaster32, and RNA-directed DNA methylation in Arabidopsis33. SMCHD1 was first identified in a mouse mutagenesis screen for modifiers of the variegated expression of a multi-copy transgene34. Gene targeting confirmed that Smchd1 was necessary for hypermethylation of (a subset of) CpG islands associated with X-inactivation, and continued association of the Smchd1 protein with the inactive X suggested its continuous requirement in maintaining X inactivation35,36. Our observations paint a strikingly similar picture for SMCHD1 and the D4Z4 arrays: SMCHD1 is necessary for D4Z4 hypermethylation, SMCHD1 remains associated with the D4Z4 array in skeletal muscle cells, and its continuous expression is required to maintain array silencing. It will be interesting to examine individuals with SMCHD1 mutations for subclinical abnormalities of X-inactivation.
The Smchd1 mutation was originally called the Momme D1 (Modifiers of Murine Metastable Epialleles D1) locus34. The term metastable epiallele has been applied to genes that show variable expression because of probabilistic determinants of epigenetic repression37. An example of a metastable epiallele in mice is the agouti viable yellow (Avy) locus; coat colors of isogenic mice can vary based on the epigenetic state of a retrotransposon integrated near the agouti promoter38. Smchd1 is a modifier of metastable epialleles because Smchd1 haploinsufficiency increased the penetrance of agouti expression34. In the case of FSHD, decreased levels of SMCHD1 resulted in decreased D4Z4 CpG methylation and variegated expression of DUX4 in myonuclei. In both FSHD1 and FSHD2, the penetrance is incomplete, and the presentation is often asymmetric. Of the 26 hypomethylated individuals with an SMCHD1 mutation and a permissive D4Z4 haplotype, five (19%) are asymptomatic (Supplementary Table 2). This proportion of clinically unaffected carriers is remarkably similar to that reported for FSHD139, although a recent publication corroborates an earlier observation that non-penetrance may be much more frequent40,41. Thus, both features are consistent with FSHD as a metastable epiallele disease. Our demonstration that independently variable modifiers of D4Z4 chromatin relaxation (repeat size for FSHD1 and SMCHD1 activity for FSHD2) modulate the variegated expression of DUX4 suggests that DUX4 should be regarded as a metastable epiallele causing phenotypic variation in humans.
The disease mechanisms of FSHD1 and FSHD2 converge at the level of D4Z4 chromatin relaxation and the variegated expression of DUX414,15. Both FSHD1 and FSHD2 require inheritance of two independent genetic variations: a version of the DUX4 gene with a polyadenylation signal and a second genetic variant that results in D4Z4 chromatin relaxation. For FSHD1 the genetic variant associated with chromatin relaxation is contraction of the D4Z4 array and is therefore transmitted as a dominant trait. For FSHD2, mutations in SMCHD1, which is on chromosome 18, segregate independently of the FSHD-permissive DUX4 allele on chromosome 4 and result in a digenic inheritance pattern in affected kindreds. Considering the variable clinical severity and asymmetric disease presentation, as well as the FSHD2 families without SMCHD1 mutations, it is likely that other modifier loci will be identified that affect the chromatin structure of D4Z4. SMCHD1 mutations could also modify the penetrance of FSHD1. Finally, many other human diseases show variable penetrance that might be related to epigenetic control. Our findings establish the possibility that SMCHD1 mutations modify the epigenetic repression of other genomic regions and the penetrance of other human diseases.
SeattleSeq Annotation: http://snp.gs.washington.edu/SeattleSeqAnnotation131/;
1000 Genomes: http://www.1000genomes.org/;
Mutalyzer 2.0.beta-21: https://mutalyzer.nl/;
FSHD genotyping and methylation analysis protocols: http://www.urmc.rochester.edu/fields-center/
NCBI NM_015295.2 (mRNA SMCHD1) and NP_056110.2 (protein SMCHD1)
Forty-one FSHD2 patients were selected based on published clinical and molecular criteria5,7,43,44 and D4Z4 methylation levels <25% as described in the previous section (Supplementary Table 1). The FSHD2 phenotype was assessed by experienced neurologists (RT, BGME, GWP, SS, CD, MV). Initial testing was performed using pulsed-field gel electrophoresis and hybridization of Southern blots with P13E-11, “A” and “B” probes, and SSLP length was determined using an ABI Prism 3100 Genetic Analyzer41,45,46 according to protocols at the Fields Center for FSHD Research website. Forty of them had D4Z4 array sizes >10 units on both chromosomes 4, ruling out FSHD1. One patient had 2 contracted alleles on chromosome 10, possibly explaining the low D4Z4 methylation, and was therefore excluded from further studies. Of the 39 remaining families, 13 had sufficient family information to suggest dominant inheritance of the D4Z4 hypomethylation, and in 7 the hypomethylation appeared to have occurred de novo (Fig. 1b). For exome sequencing, we selected 5 families with a dominant inheritance pattern and 2 with de novo hypomethylation in the patient. In total, 14 individuals from these families were analyzed by exome sequencing. All participants provided written consent, and the Institutional Review Boards of participating institutes approved all studies.
Genomic DNA was double digested with EcoRI and BglII overnight at 37°C and cleaved DNA was purified using PCR extraction columns (supplementary note). Purified EcoRI/BglII digested DNA was digested with FseI for 4 hours, separated by size on 0.8% agarose gels, transferred to a nylon membrane (Hybond XL, Amersham) by Southern blotting and probed using the p13E-11 radiolabeled probe22. Probe signals were quantified using a phosphorimager and Image Quant software. The signal from the 4061 bp fragment (methylated fraction) was divided by the total signal from the hybridizing 4061 bp (methylated) and 3387 bp (unmethylated) fragments to give the percentage of methylated FseI sites within the most proximal D4Z4 unit (see supplementary note).
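The calculation can be illustrated with a minimal sketch. It assumes the two band intensities are available as plain numbers and takes the methylated fraction as the 4061 bp signal relative to the summed 4061 bp and 3387 bp signals. The function and parameter names are hypothetical and are not taken from the published protocol.

```python
def percent_methylation(signal_4061: float, signal_3387: float) -> float:
    """Percentage of methylated FseI sites from two band intensities.

    signal_4061: intensity of the 4061 bp band (FseI-resistant, methylated)
    signal_3387: intensity of the 3387 bp band (FseI-cleaved, unmethylated)
    """
    total = signal_4061 + signal_3387
    if total <= 0:
        raise ValueError("total hybridization signal must be positive")
    return 100.0 * signal_4061 / total


def is_hypomethylated(percent: float, threshold: float = 25.0) -> bool:
    """Apply the <25% methylation threshold used to select FSHD2 patients."""
    return percent < threshold
```

With the population values reported in the text (44 ± 10%), two standard deviations below the mean is 24%, which lies just under the 25% cut-off applied here.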
We targeted all protein-coding regions as defined by RefSeq 36.3. Entries were filtered for the following: (i) CDS as the feature type, (ii) transcript name starting with “NM_” or “-”, (iii) reference as the group_label, (iv) not being on an unplaced contig (for example, 17|NT_113931.1). Overlapping coordinates were collapsed for a total of 31,922,798 bases over 186,040 discontiguous regions. A single custom array (Agilent, 1M features, aCGH format) was designed to have probes over these coordinates as previously described, except here, the maximum melting temperature (Tm) was raised to 73 °C. The mappable exome was also determined as previously described using this RefSeq exome definition instead. After masking for ‘unmappable’ regions, 30,923,460 bases were left as the mappable target.
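Collapsing overlapping coordinates into discontiguous regions can be sketched as follows. This is an illustration only, not the pipeline used in the study: it assumes half-open `(chrom, start, end)` intervals and also merges intervals that merely abut, and all names are hypothetical.

```python
def collapse_intervals(intervals):
    """Merge overlapping or abutting (chrom, start, end) half-open intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region instead of starting a new one.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bases(intervals):
    """Sum the lengths of non-overlapping half-open intervals."""
    return sum(end - start for _, start, end in intervals)
```

Applied to the filtered RefSeq CDS entries, the length of the returned list would correspond to the number of discontiguous regions and `total_bases` to the size of the target.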
Genomic DNA was extracted from peripheral blood lymphocytes using standard protocols. Five micrograms of DNA from each of the eight individuals was used for construction of a shotgun sequencing library as described previously using paired-end adaptors for sequencing on an Illumina Genome Analyzer II (GAII). Each shotgun library was hybridized to an array for target enrichment; this was then followed by washing, elution and additional amplification. Enriched libraries were then sequenced on a GAII to get either single-end or paired-end reads.
Reads were mapped and processed largely as previously described. In brief, reads were quality recalibrated using Eland and then aligned to the reference human genome (hg19) using Maq. When reads with the same start site and orientation were filtered, paired-end reads were treated like separate single-end reads; this method is overly conservative and hence the actual coverage of the exomes is higher than reported here. Sequence calls were performed using Maq and these calls were filtered to coordinates with ≥8× coverage and consensus quality ≥20.
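The two filters described above (removing reads that share a start site and orientation, and keeping only calls with ≥8× coverage and consensus quality ≥20) can be sketched as below. The record types and function names are hypothetical and do not reflect the actual Maq output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class SiteCall:
    chrom: str
    pos: int
    coverage: int
    consensus_quality: int


def drop_duplicate_reads(reads):
    """Keep the first read seen for each (chrom, start, strand) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def filter_calls(calls, min_coverage=8, min_quality=20):
    """Keep calls meeting both the coverage and consensus-quality thresholds."""
    return [
        c for c in calls
        if c.coverage >= min_coverage and c.consensus_quality >= min_quality
    ]
```

Treating each mate of a pair as a separate single-end read, as the text notes, means two distinct fragments that share one mate position would be counted once, which is why the reported coverage is a conservative estimate.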
Indels affecting coding sequences were identified as previously described, but we used phaster instead of cross_match and Maq. Specifically, unmapped reads from Maq were aligned to the reference sequence using phaster (version 1.100122a) with the parameters -max_ins:21 -max_del:21 -gapextend_ins:-1 -gapextend_del:-1 -match_report_type:1. Reads were then filtered for those with at most two substitutions and one indel. Reads that mapped to the negative strand were reverse complemented and, together with the other filtered reads, were remapped using the same parameters to reduce ambiguity in the called indel positions. These reads were then filtered for (i) having a single indel more than 3 bp from the ends and (ii) having no other substitutions in the read. Putative indels were then called per individual if they were supported by at least two filtered reads that started from different positions. An ‘indel reference’ was generated as previously described, and all the reads from each individual were mapped back to this reference using phaster with default settings and -match_report_type:1. Indel genotypes were called as previously described.
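The calling criterion for putative indels (support from at least two filtered reads that start at different positions) can be illustrated with a short sketch. The input representation, a list of `(indel_key, read_start)` pairs, is an assumption made for illustration and is not the phaster output format.

```python
from collections import defaultdict


def call_putative_indels(read_hits, min_distinct_starts=2):
    """Return indels supported by reads from enough distinct start positions.

    read_hits: iterable of (indel_key, read_start) pairs, one per filtered read.
    """
    starts_by_indel = defaultdict(set)
    for indel_key, read_start in read_hits:
        starts_by_indel[indel_key].add(read_start)
    return sorted(
        key for key, starts in starts_by_indel.items()
        if len(starts) >= min_distinct_starts
    )
```

Counting distinct start positions rather than reads means that two reads with identical start sites, which may be PCR duplicates, do not by themselves support a call.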
To determine the novelty of the variants, sequence calls were compared against 1200 individuals for whom we had previously reported exome data, and against the 1000 Genomes and dbSNP databases. Annotations of variants were based on NCBI and UCSC databases using an in-house server (SeattleSeqAnnotation). Loss-of-function variants were defined as nonsense mutations (premature stop) or frame-shifting indels. For each variant, we also generated constraint scores as implemented in GERP.
Candidate genes were ranked by summation of variant scores calculated by counting the total number of nonsense and nonsynonymous variants across the five FSHD2 exomes.
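A minimal sketch of this ranking, assuming each variant is represented as a `(sample_id, gene, variant_class)` tuple and that every nonsense or nonsynonymous variant contributes a score of one, is shown below. The tuple layout, class labels and tie-breaking by gene name are illustrative assumptions.

```python
from collections import Counter

SCORED_CLASSES = {"nonsense", "nonsynonymous"}


def rank_candidate_genes(variants):
    """Rank genes by the number of nonsense and nonsynonymous variants.

    variants: iterable of (sample_id, gene, variant_class) tuples pooled
    across all exomes. Ties are broken alphabetically by gene name.
    """
    counts = Counter(
        gene for _, gene, variant_class in variants
        if variant_class in SCORED_CLASSES
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```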
Sanger sequencing of PCR amplicons (LGTC, Leiden, Netherlands) from genomic DNA was used to confirm the presence and identity of the SMCHD1 mutations identified by exome sequencing and to screen for the mutation in affected and unaffected members of FSHD2 families.
Primary human myoblasts were obtained through the Fields Center at the University of Rochester (http://www.urmc.rochester.edu/fields-center/protocols/myoblast-cell-cultures.cfm). Biopsies were obtained after full consent with an IRB-approved protocol. Consents included the possibility of exome sequencing and sharing of samples with other investigators. Normal human myoblasts were grown on dishes coated with 0.01% calf skin collagen (Sigma Aldrich, St. Louis, MO) in F10 medium (Invitrogen) supplemented with 20% FBS, 100 U/ml penicillin and 100 μg/ml streptomycin, 4 μg/ml bFGF (Invitrogen), and 1 μM dexamethasone (Sigma Aldrich), in a humidified atmosphere containing 5% CO2 at 37°C13. For transduction with retroviral vectors, human myoblasts were seeded at a density of 5 × 10⁴ cells/cm² on day −1. On day 0 the medium was changed and cells were incubated with vector preparations and polybrene (4 μg/ml, Sigma Aldrich). After 2–4 hours the medium was replaced with fresh medium, and cells were cultured and split at ~75% confluence to prevent differentiation. Human myoblasts transduced with pGIPZ shRNA expression vectors were selected with puromycin (0.5 μg/ml). Differentiation was induced using F10 medium supplemented with 1% horse serum and ITS supplement (0.1% insulin, 0.000067% sodium selenite, 0.055% transferrin; Invitrogen).
Fibroblasts obtained from FSHD2 patients and family members were cultured in DMEM/F-12 medium supplemented with 20% heat-inactivated fetal bovine serum, 1% penicillin/streptomycin, 10 mM HEPES, and 1 mM sodium pyruvate (all Invitrogen).
Total RNA was extracted using the Qiagen miRNeasy mini isolation kit with DNase I treatment. The RNA concentration was determined on an ND-1000 spectrophotometer (Thermo Scientific, Wilmington, USA) and the quality was analyzed with an RNA 6000 Nanochip Labchip on an Agilent 2100 BioAnalyzer (Agilent Technologies Netherlands BV, Amstelveen, The Netherlands). cDNA was synthesized from 2 μg of total RNA using random hexamer primers (Fermentas, St Leon-Rot, Germany) and the RevertAid H Minus M-MuLV First Strand Kit (Fermentas Life Sciences, Burlington, ON, Canada) according to the manufacturer’s instructions. After the cDNA reaction, 30 μL of water was added to give a final volume of 50 μL.
Splicing alterations were analyzed by RT-PCR using different primer sets covering the exons surrounding the possible splice-site mutation. Subsequently, PCR fragments obtained from SMCHD1 heterozygotes and control samples were analyzed on 1.5–2% agarose gels. Fragments were isolated from the gel and analyzed by Sanger sequencing (LGTC).
Allelic expression analysis of missense mutations (wild type versus mutant allele) was done by Sanger sequencing (LGTC) by comparison of nucleotide peak heights of wild type and mutant alleles.
DUX4 mRNA levels were analyzed in duplicate by real-time RT-PCR using the SYBR Green QPCR master mix kit (Stratagene) on a MyiQ (Biorad Laboratories, Veenendaal, The Netherlands), running an initial denaturation step at 95°C for 6 min, followed by 40 cycles of 10 s at 95°C and 30 s at 60°C (35 cycles for DUX4 RT-PCR samples shown in Fig. 3e,f). All PCR products were analyzed on a 2% agarose gel. Expression levels were corrected for GAPDH and GUS as constitutively expressed standards for cDNA input, and the relative steady-state RNA levels of the genes of interest were calculated by the method of Pfaffl47. All primers were designed using Primer 3 software and sequences are shown in Supplementary Table 3.
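For orientation, the efficiency-corrected relative quantification of Pfaffl can be sketched as follows. The text does not state how the two reference genes were combined; the geometric mean used here is an assumption, as are the function and parameter names.

```python
import math


def pfaffl_ratio(e_target, ct_target_control, ct_target_sample, references):
    """Efficiency-corrected expression ratio of a sample relative to a control.

    e_target: amplification efficiency of the target (2.0 = perfect doubling)
    references: list of (efficiency, ct_control, ct_sample), one per
        reference gene; combined by geometric mean (an assumption).
    """
    if not references:
        raise ValueError("at least one reference gene is required")
    target_factor = e_target ** (ct_target_control - ct_target_sample)
    ref_factors = [e ** (ct_c - ct_s) for e, ct_c, ct_s in references]
    ref_factor = math.prod(ref_factors) ** (1.0 / len(ref_factors))
    return target_factor / ref_factor
```

With an efficiency of 2.0 for every primer pair, this reduces to the familiar 2^(−ΔΔCt) calculation.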
Chromatin was prepared from myoblast cell lines fixed with 1% formaldehyde according to a published protocol48. Control and FSHD2 myoblasts carried a comparable total number of D4Z4 repeat units on permissive and nonpermissive chromosomes. For each antibody, 60 μg of chromatin was used. Every sample was independently studied twice. Antibodies against SMCHD1 (ab31865) and H3 (ab1791) were purchased from Abcam (Cambridge, MA, USA). Normal rabbit serum was used to measure nonspecific binding of proteins to beads. Immunopurified DNA was quantified with the D4Z4 Q-PCR primer pair8, and quantitative PCR measurements were done with the CFX96™ real-time system using iQ™ SYBR® Green Supermix. Relative enrichment values were calculated by subtracting the IgG ChIP values, representing background, from the ChIP values obtained with the SMCHD1 and H3 antibodies; SMCHD1 values were then divided by H3 enrichment values to correct for D4Z4 copy number.
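The background subtraction and H3 normalization can be written out as a small sketch. It assumes all three ChIP values are expressed on the same scale (for example, as a fraction of input); the function name and the error handling are illustrative choices rather than part of the published protocol.

```python
def relative_enrichment(smchd1_chip: float, h3_chip: float, igg_chip: float) -> float:
    """SMCHD1 enrichment at D4Z4, background-subtracted and normalized to H3.

    The IgG (normal rabbit serum) value is subtracted from both the SMCHD1
    and H3 values; dividing by H3 corrects for D4Z4 copy number.
    """
    smchd1_specific = smchd1_chip - igg_chip
    h3_specific = h3_chip - igg_chip
    if h3_specific <= 0:
        raise ValueError("H3 signal must exceed the IgG background")
    return smchd1_specific / h3_specific
```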
Antisense oligonucleotides (AONs) for SMCHD1 exons 29 (29AON5) and 36 (36AON1) were designed based on the guidelines for Duchenne Muscular Dystrophy (DMD) exons (Supplementary Table 3)49. All AONs target exon-internal sequences, consist of 2′-O-methyl RNA with a full-length phosphorothioate backbone, and were manufactured by Eurogentec (Seraing, Belgium). Human control myoblasts were seeded in 6-well plates or 6 cm dishes at a cell density of approximately 1 × 10⁴ cells per cm² and cultured for 2 days. Myotubes were obtained by growing 70% confluent myoblasts for 4 days on differentiation medium (DMEM (+glucose, +L-glutamine, +pyruvate), 2% horse serum). Four hours after the differentiation medium was added, AONs were transfected at a concentration of 250 nM, using 2.5 μl polyethyleneimine (MBI-Fermentas, Leon-Rot, Germany) per μg AON according to the manufacturer’s instructions. A FAM-labeled AON targeting exon 50 of the DMD gene was used to confirm the efficiency of transfection and exon skipping. Primers flanking the targeted exons were used to study splicing of the SMCHD1 or DMD gene.
SMCHD1 transcripts were targeted for degradation using lentiviral vectors expressing short hairpin RNAs from a CMV promoter and linked to a puromycin selection cassette by an internal ribosome entry site (IRES). Five different pGIPZ (Open Biosystems, Huntsville, AL) vectors were purchased and each was tested in normal human myoblasts for its effect on SMCHD1 transcripts by quantitative PCR, immunofluorescence signal intensity, and western blot.
Immunofluorescence for human DUX4 was performed using a rabbit monoclonal C-terminal specific antibody (Epitomics E5-5) as previously described15. Immunoreactivity was detected with a mouse anti-rabbit secondary antibody conjugated to Alexa Fluor 594 (Molecular Probes, 1:1000 dilution).
For western blotting, fibroblast or myoblast lysates were run on a 7.5% SDS-PAGE and transferred to PVDF membrane. SMCHD1 protein was detected using a commercially available rabbit polyclonal antibody (Sigma, HPA039441 (1:250 dilution)), and as reference protein Tubulin was detected with a commercially available mouse monoclonal antibody (Sigma, T6199 (1:2000)). Bound antibodies were detected with an HRP-conjugated donkey anti-rabbit (Pierce, 31458 (1:5000)) and an IRDye 800CW-conjugated goat anti-mouse antibody (Westburg, 926-32210 (1:5000)), respectively.
The authors thank all patients and family members for their participation. We thank Dr. Debbie Nickerson and Dr. Jay Shendure for excellent assistance, and Dr. Barbara Trask for helpful discussions and critical reading. This work was supported by grants from the NIH (NINDS P01NS069539; CTSA UL1RR024160; NIAMS R01AR045203; NHGRI HG005608 and HG006493), NGI Horizon Valorization Project Grant (Nr 93515504), The University of Washington Center for Mendelian Genomics, the MDA (217596), the Fields Center for FSHD Research, the Geraldi Norton and Eklund family foundation, the FSH Society, The Friends of FSH Research, EU FP7 framework program agreements 223026 (NMD-chip) and 223143 (TechGene), and the Stichting FSHD. Yu Sun is supported by China Scholarship Council.
Author contributions: R.J.L.F.L., R.T., M.J.B., S.J.T., D.G.M., R.R.F., B.B., A.A.R. and S.M. conceived of and designed the study. D.G.M. and S.M. directed the study. G.W.E.S., Y.S., Q.H. and D.G.M. performed the bioinformatics data analysis. R.J.L.F.L., D.G.M., L.M.P., J.B., G.J.B., A.M.A., P.J.V., R.A., K.R.S., Y.D.K., R.K. and J.C.G. performed experiments. R.T., J.T.D., C.M.D.S., G.W.P., B.G.M.E., G.N.F., M.V., C.D., and S.S. contributed samples, reagents, data and comments on the manuscript. R.J.L.F.L., S.J.T., D.G.M. and S.M. analyzed and interpreted data and wrote the manuscript with the assistance and final approval from all authors.