The methyl-binding protein gene, MECP2, is a candidate for involvement in autism through its implication as a major causative factor in Rett syndrome, which has similarities to autism. Rare mutations in MECP2 have also been identified in autistic individuals. We have examined the possible broader involvement of MECP2 as a predisposing factor in the disorder. Analysis of polymorphic markers spanning the gene and comprising both microsatellites and single nucleotide polymorphisms (SNPs) by the transmission disequilibrium test in two collections of families (219 in total), one in the USA and one in the UK, has provided evidence for significant association (P = 0.009) for a three-marker SNP haplotype of MECP2 with autism/autism spectrum disorders. This association is supported by association of both simple sequence repeat (SSR) and SNP single markers located at the 3′ end of the MECP2 locus and flanking sequence, the most significant being that of an indel marker located in intron 2 (P = 0.001 – Bonferroni corrected P = 0.006). This suggests that one or more functional variants of MECP2 existing at significant frequencies in the population may confer increased risk of autism/autism spectrum disorders and warrants further investigation in additional independent samples.
Autism is a neurodevelopmental disorder characterized by social and communication impairments, coupled with stereotyped and repetitive behaviours. It is far more common in males (4–10:1 male:female ratio; Folstein & Rosen-Sheidley 2001). Linkage studies have provided only weak evidence for the involvement of X-linked loci but have limited power to detect those of small effect size (Freitag 2007). Gauthier et al. (2006) have provided some evidence for the involvement of a locus at Xq27-q28 in the disorder employing both linkage and association, and rare mutations in X-linked genes such as neuroligin 3 (NLGN3) and neuroligin 4 (NLGN4) have also been reported to be associated with autism (Jamain et al. 2003).
Rett syndrome (RTT) is an X-linked neurodevelopmental disorder, with a prevalence of around 1 per 10 000–15 000 (Hagberg 1985; Hagberg & Hagberg 1997; Leonard et al. 1997). Many symptoms, including impaired language, stereotypic behaviours, high frequency of seizures and sleep abnormalities, as well as the developmental timing, are common to both RTT and autism. Indeed, misdiagnosis of RTT individuals as autistic can occur (Abdul-Rahman & Hudgins 2006; Moretti et al. 2005). Both disorders are grouped under the heading of pervasive developmental disorders (PDD) in the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV).
The MECP2 gene (Xq28) encodes the methyl-CpG-binding protein 2 (MECP2), and mutations in the gene are reported to be responsible for around 75% of cases of classical RTT. Methyl-CpG-binding protein 2 protein binds to methylated CpGs and is believed to repress the transcription of downstream genes, e.g. Bdnf in mice (Chen et al. 2003; Lewis et al. 1992; Meehan et al. 1992; Moretti & Zoghbi 2006); however, there is evidence that the situation is more complex with opposite directions of regulation existing in different tissues (LaSalle 2007). MECP2 comprises four exons, with the coding sequence shared among exons 2, 3 and 4 and a highly conserved 3′ untranslated region (UTR) of 8.5 kb (Coy et al. 1999). Both increases and decreases in MECP2 expression have been implicated in a range of PDDs including autism suggesting that a common pathway may be involved in these disorders (Samaco et al. 2004, 2005; Van Esch et al. 2005).
Although they are rare, mutations in the coding region of MECP2 have been observed in autistic individuals (Beyer et al. 2002; Carney et al. 2003; Lam et al. 2000; Lobo-Menendez et al. 2003; Vourc’h et al. 2001). Other studies by Shibayama et al. (2004) detected one missense and two 3′ UTR variants in 24 autism patients vs. only one missense mutation in 144 ethnically matched individuals without autism. Recently, Liu and Francke (2006) showed that certain sequence motifs, distributed over a distance of 130 kb in and around the MECP2 gene, make up a ‘functional expression module’ containing enhancers and silencers. These interact with the MECP2 promoter and affect the tissue-specific, developmental stage-specific or splice-variant-specific control of MECP2 protein expression.
Thus, variations throughout the coding and non-coding regions of the MECP2 gene, as well as flanking regions, could be important factors contributing to the complex disorder of autism. This study was designed to investigate a series of polymorphic variants in the MECP2 gene, including flanking and intronic regions, as potential markers for the disorder by transmission disequilibrium tests (TDTs) in two series of autistic families.
The DNA was available from 219 families with an affected autism spectrum disorder proband, some of their siblings (n = 81) and one or both the proband’s parents (219 mothers and 196 fathers). The sample collections taken from two main sources are as follows.
The first sample, from the Molecular Genetics of Autism Study (MGAS), comprised 121 families, mainly those in which the proband was diagnosed by the National Specialist service multidisciplinary team at the Michael Rutter Centre, South London and Maudsley (SLAM) NHS Trust, London, UK. Some additional probands were included in this sample from three other sources: Dr Anne O’Hare at Edinburgh University; Dr A. J. Sharma at the Mary Sheridan Child Development Centre, Camberwell; and The Behaviour Genetics Clinic, SLAM NHS Trust (young people with autism spectrum disorders). Cases with an autism spectrum disorder (which was defined as a case with autistic disorder, atypical autism, Asperger’s syndrome, pervasive developmental disorder and others) were included. Diagnoses were made by experienced clinicians following multidisciplinary assessments according to ICD-10 criteria and wherever possible with the assistance of structured parent interviews and semi-standardized observational assessments [the Autism Diagnostic Interview (ADI-R; Lord et al. 1994) or an updated version of the Development and Wellbeing Assessment (Goodman et al. 2000) and/or the Autism Diagnostic Observational Schedule (ADOS); n = 70]. Cases were excluded if they had another known, possible autism-causing medical condition (e.g. fragile X syndrome or tuberous sclerosis) or if their IQ was below 35.
The second sample (n = 98 families), referred to here as the AGRE sample, is a subset of the sample collected for the Autism Genetic Resource Exchange (AGRE) (Geschwind et al. 2001) with the support of Cure Autism Now (http://www.cureautismnow.org: now merged with Autism Speaks – http://www.autism-speaks.org) under the direction of a scientific steering committee (see websites). The DNA used in this study is from trios with an affected proband and was collected for the initial purposes of screening. All probands meet the criteria for autism spectrum disorder in one of three categories: (1) autism diagnosed using the ADI-R; (2) Not Quite Autism (NQA), where individuals are not more than one point away from meeting autism criteria on any or all of the three ‘content’ domains (i.e. social, communication and non-social behaviour) and meet criteria for the ‘age of onset’ domain, or meet criteria on all three ‘content’ domains but not on the ‘age of onset’ domain; and (3) broad spectrum, for individuals who showed patterns of impairment along the spectrum of PDD. This is a broad diagnostic category and encompasses PDDs such as Asperger’s syndrome. It is operationally defined according to domain scores on the ADI algorithm and includes cases who show (1) severe deficit in at least one domain, (2) more moderate deficits in at least two domains and (3) only minimal deficits, but in all three domains.
Diagnoses were confirmed using the ADI-R (Lord et al. 1994) and/or the ADOS in 140 cases. In total, there were 170 cases with a diagnosis of autism and 1 case with NQA, and 38 with a broad-spectrum diagnosis [13 from the AGRE sample and 25 from the Molecular Genetics of Autism Study (MGAS) sample]. Genotypic data were available on 715 individuals. This included 219 autism spectrum disorder probands, 81 siblings and 415 parents.
Microsatellite genotyping and analysis were set up and performed in-house. Using the University of California Santa Cruz (UCSC) Genome Browser, four microsatellite repeats were identified in the MECP2 gene whose properties suggested that they were likely candidates to be variable in the population. Primers were designed using Primo Multiplex to flank all four of these loci, and initial polymerase chain reactions (PCRs) were carried out using standard conditions to investigate heterozygosity in around 20 samples of female control DNA. Two of the four markers were extensively polymorphic and therefore informative. These markers were PCR amplified in separate 25 μl reactions using different template DNA concentrations, annealing temperatures and magnesium chloride concentrations (40 ng DNA, 3.5 mM MgCl2 and 59.3°C for marker 1 and 20 ng DNA, 2.5 mM MgCl2 and 56°C for marker 2). The PCR products from the single reactions were multiplexed for loading onto an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). Microsatellite marker 1 was an AT dinucleotide repeat located at chrX:152823442. Marker 2 was a GT dinucleotide repeat located at chrX:152832198. These are both loci within the large intron 2 in the MECP2 gene (Fig. 1).
Six single nucleotide polymorphisms (SNPs) were selected initially, using Haploview version 3.2, on the basis of their having minor allele frequencies predicted to be high enough to be informative (>0.05) and locations that were spread across the gene and capable of capturing variation in the single block of linkage disequilibrium (LD) across the whole gene. These SNPs (one assay of which failed) were genotyped at K Biosciences (Hoddesdon, UK; http://www.kbioscience.co.uk). The five SNPs for which data were obtained are, from 3′ to 5′: (1) rs5945173 (A/G), chrX:152771019; (2) rs11465839 (C/T), chrX:152801114; (3) rs2734647 (C/T), chrX:152813027; (4) rs5945175 (C/T), chrX:152879604 and (5) rs5945397 (A/G), chrX:152894373 (Fig. 1). Genotyping success rates achieved by the company varied between 93% and 97% for female samples and between 95% and 97% for male samples, with the exception of rs5945173, where the success rate was 90%. This SNP was of low information content in transmission analysis, and although it was included in the preliminary analysis, its contribution to the final haplotype analysis was minimal. Allele frequencies were calculated separately for males and females and showed no significant differences (P > 0.6). All SNPs examined were in Hardy–Weinberg equilibrium with P > 0.05 after correction for multiple testing, apart from rs2734647 (P = 0.011). The five SNPs cover the region defined as the ‘MECP2 functional expression module’ (Liu & Francke 2006). SNPs rs2734647, rs5945175 and rs5945397 and the simple sequence repeat (SSR) markers are within a block of high LD covering the MECP2 locus, and SNPs rs5945173 and rs11465839 are located on either side of the 3′ enhancer region F17.
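The Hardy–Weinberg check mentioned above can be illustrated with a minimal sketch. It assumes a standard one-degree-of-freedom chi-square goodness-of-fit test on genotype counts from females only (males are hemizygous for X-linked markers); the text does not state which test was actually used, and the function name and the counts in the usage note are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic marker.

    Arguments are observed genotype counts (hypothetical inputs); for an
    X-linked marker these would be counts in females only.
    Returns (chi-square statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic marker).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, hypothetical counts of 30, 40 and 30 give a statistic of 4.0, whereas counts of 25, 50 and 25 match expectation exactly and give a statistic of 0.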
Based on the May 2004 assembly release, the co-ordinates of the MECP2 region (which includes the IRAK1 locus, which abuts the 3′ region of MECP2 and overlaps the enhancers examined) were chrX:152,789,145-152,902,943. The tag SNPs selected capture 94% of the known variation at r² = 0.8 within this region (http://www.broad.mit.edu/mpg/tagger/) based on HAPMAP Build 35 release 21a/phase II Jan 07 on NCBI B35 assembly dbSNP 125. The Centre d’Etude du Polymorphisme Humain Collection-European (CEPH CEU) population was examined by pairwise tagging.
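As an illustration of the r² threshold used in pairwise tagging, the following minimal sketch computes r² between two biallelic markers from phased haplotype counts. It is not the Tagger implementation; the function name and the counts used in the usage note are hypothetical, and the haplotype phase is assumed to be known (as it is, for instance, in hemizygous males for X-linked markers).

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD measure r^2 from counts of the four two-marker haplotypes.

    A/a are the alleles at the first marker and B/b at the second
    (hypothetical labels).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B  # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one marker is monomorphic")
    return d * d / denom


def is_tagged(r2, threshold=0.8):
    """A variant counts as captured by a tag SNP if r^2 reaches the threshold."""
    return r2 >= threshold
```

With hypothetical counts of 28, 2, 2 and 68, r² is about 0.82, so the second marker would count as captured at the 0.8 threshold.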
Distortion in the transmission of microsatellite and SNP alleles was tested by the TDT as implemented in the program tdtphase, part of the unphased suite of programs, version 2.404 (Dudbridge 2003). Analysis was stratified by sample collection, sex and phenotypic subgroups. For microsatellite 1, alleles were clumped using the program nocom version 20 (April 2007) (Ott 1979) to estimate the best fitting normal distributions for observed allele counts.
Data were analysed using the TDT for deviations from expected random inheritance of alleles by offspring, which may indicate their tracking of predisposing or protective variants of the gene. Once data with any Mendelian errors or missing information had been removed, 219 affected families were available for analysis with both markers employing tdtphase.
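To make the logic of the test concrete, here is a minimal sketch of the classic McNemar-form TDT statistic, which compares the number of times heterozygous parents transmit an allele with the number of times they do not. This is a simplification and not the likelihood-based procedure implemented in tdtphase, which also handles X-linked data and missing parents; the function name is hypothetical.

```python
import math


def tdt_mcnemar(transmitted, not_transmitted):
    """Classic TDT: chi-square (1 df) on transmissions from heterozygous parents.

    Under the null hypothesis an allele is transmitted half of the time,
    so the statistic is (b - c)^2 / (b + c).
    Returns (chi-square statistic, P-value).
    """
    n = transmitted + not_transmitted
    if n == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - not_transmitted) ** 2 / n
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Applied to the 35 transmissions vs. 15 non-transmissions reported later for the five-marker haplotype, this simplified form gives a chi-square of 8.0. The P-values in the tables come from tdtphase and need not match this approximation exactly.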
For SSR marker 1, an inspection of the pattern of allele frequencies indicated that allele sizes approximated to a trimodal distribution, suggesting that the three groups (numbered 1–3) contained alleles that may be closely related in evolutionary terms by replication slippage. Examining the fit of the allele size and frequency data to one or more normal distributions employing the program nocom (Ott 1979) indicated that two distributions fit significantly better than one and three distributions significantly better than two (P < 0.0001), centring on mean allele sizes of 212, 228 and 240 bp with frequencies of 0.76, 0.18 and 0.06 (Fig. 2).
As an expedient to reduce the requirement for conservative multiple testing, we therefore analysed the frequencies of the three size groups by TDT. This indicated that allele group 1 is over-transmitted to the probands in the combined sample (P = 0.021), while allele group 2 shows a trend to be under-transmitted to the probands (P = 0.073) (Table 1).
Analyses for marker 1 were also carried out on each sample (AGRE and MGAS) separately to see if one cohort was driving the pattern of results (Table 1). As shown, the strongest significance was found in the AGRE sample for the under-transmission of allele group 2 (P = 0.010), followed by over-transmission of allele group 1 (P = 0.018), with the MGAS samples following the same pattern of over-transmission for allele group 1, although the figures are not significant.
The distribution of the allele sizes was also examined for marker 2; however, there was no obvious division into allele size groupings that would have enabled simplification of the analysis. Nevertheless, it appeared that although the original sequence analysis of marker 2 indicated it to be a dinucleotide repeat (with consecutively sized products that are presumptively 2 bp different from one another), there were a substantial number (n = 101) of alleles representing nine different sizes separated by 1 bp from the dinucleotide peaks (data not shown). As the amplification method had been designed to eliminate problems with the sporadic addition of an extra nucleotide (plus-A) (Brownstein et al. 1996), the possibility that these additional alleles represented independent alleles was therefore investigated. It was subsequently found from the published sequence that the amplified region between the primers for marker 2 also contains a single base pair indel polymorphism G/− (rs11348580) in a string of G residues located 3 bp 3′ to the dinucleotide repeat, which created two allelic series separated by 1 bp. Unfortunately, the sequence surrounding the indel precluded a simple PCR-based genotyping assay. We decided, therefore, to analyse the transmission and non-transmission of alleles with and without a G deletion indirectly by comparing products with an odd number vs. products with an even number of base pairs (Table 2). The alleles of the indel employing this indirect assay were in Hardy–Weinberg equilibrium (P > 0.05 after correction for multiple testing). As indicated, even-sized alleles (no G deletion) were significantly over-transmitted, while odd-sized alleles (G deletion) were significantly under-transmitted (P = 0.001). A TDT was also carried out for this single base pair deletion allele in the MGAS and AGRE samples separately.
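The indirect indel assay described above amounts to classifying each marker 2 product by the parity of its size. A minimal sketch follows; the function names, the allele labels and the fragment sizes in the usage note are hypothetical, and only the parity rule (even = no G deletion, odd = G deletion) comes from the text.

```python
from collections import Counter


def indel_allele_from_size(fragment_size_bp):
    """Infer the rs11348580 allele from the parity of the marker 2 product size.

    Even-sized products are taken to carry no G deletion and odd-sized
    products to carry the deletion, as described in the text.
    """
    return "no_deletion" if fragment_size_bp % 2 == 0 else "G_deletion"


def count_indel_alleles(fragment_sizes_bp):
    """Tally inferred indel alleles over a list of called fragment sizes."""
    return Counter(indel_allele_from_size(size) for size in fragment_sizes_bp)
```

For instance, hypothetical called sizes of 180, 182, 183, 184 and 185 bp would be tallied as three no-deletion alleles and two deletion alleles.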
As shown in Table 2, the MGAS samples showed a significant over-transmission of the even-sized (no deletion) allele (P = 0.039), and although this over-transmission was not significant for the AGRE sample (P = 0.103), the bias was again for the even-sized allele to be transmitted more often (75 transmitted vs. 65 not transmitted). Because the over-transmission of the even-sized allele was at its most significant when the MGAS and AGRE samples were combined (P = 0.001), it appears that the finding is consistent in both cohorts.
The distribution of the five successfully genotyped SNPs across the MECP2 locus is shown in Fig. 1. As was the case for microsatellites, TDT analysis of the SNP data was carried out using tdtphase (Table 3). The data for the two samples combined indicated that SNPs rs5945173 and rs2734647 were significantly biased in transmission (P = 0.006 and P = 0.016, respectively). Evidence for the remaining SNPs was compromised by the low frequencies of the rarer alleles: in spite of our preliminary selection for minor allele frequencies to be >0.05, the frequencies that we observed for SNPs rs11465839, rs5945175 and rs5945397 were only between 0.01 and 0.04. Consequently, few informative transmissions were available for analysis, and these SNPs were not considered from the viewpoint of multiple testing. When the AGRE and MGAS cohorts were analysed separately, the same markers were significant in the AGRE cohort (rs5945173, P = 0.003 and rs2734647, P = 0.008). Analysis of MGAS data alone did not show significance for the two SNPs that were significant in the AGRE sample, although the direction of the biased transmission in the AGRE sample was weakly echoed.
Application of a Bonferroni correction for multiple testing of the indel and the two usefully informative SNPs together with the microsatellite SSR marker 1 reduces the probability for association in the combined samples as follows: indel rs11348580 (0.004), rs5945173 (0.024) and rs2734647 (0.064) leaving the indel and rs5945173 significant and rs2734647 showing a trend to significance.
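The arithmetic of this correction can be sketched as follows, assuming four tests (the indel, the two informative SNPs and SSR marker 1), as the sentence above implies. The function name is hypothetical, and the uncorrected P-values are those quoted earlier for the combined sample.

```python
def bonferroni(p_values, n_tests):
    """Bonferroni correction: multiply each P-value by the number of tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# Uncorrected combined-sample P-values quoted in the text.
uncorrected = {
    "rs11348580 (indel)": 0.001,
    "rs5945173": 0.006,
    "rs2734647": 0.016,
}

# Four tests assumed: indel, two SNPs and SSR marker 1.
corrected = dict(zip(uncorrected, bonferroni(list(uncorrected.values()), n_tests=4)))

for marker, p in corrected.items():
    print(f"{marker}: corrected P = {p:.3f}")
```

Multiplying by four yields the corrected values given above (0.004, 0.024 and 0.064).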
To investigate the effect of using narrower phenotypic criteria, TDT analysis was repeated on the combined sample after removing all families in which the proband was allotted a broad-spectrum diagnosis. This left only families where the proband had a pure ‘autism’ or an ‘Asperger’s syndrome’ diagnosis. Asperger’s individuals were still included in the narrow phenotype sample both to maintain a relatively large sample size and because their social interactions are as severely impaired as they are for autistic individuals. We have previously indicated the importance of X-linked loci to the social substrate of the autistic phenotype (Loat et al. 2004) and, therefore, all probands for whom the diagnosis implied a level of deficit in social functioning equivalent to autism were analysed. Analysis of the data after the removal of the less severe phenotypic group of probands (which included 39 individuals, most of whom had a diagnosis of PDD not otherwise specified, PDD-NOS) marginally increased the significance of rs5945173 (P = 0.002) and rs2734647 (P = 0.006).
Haplotype analysis for rs5945173 and rs2734647 (i.e. those with appreciable heterozygosity, c. 30%) indicated that the G-C haplotype was significantly over-transmitted (P = 0.011) and the A-T haplotype was significantly under-transmitted (P = 0.021). When the rare SNPs were also included, using the whole panel of five SNPs (numbered in order from 3′ to 5′ – Fig. 1) to determine whether their LD patterns could subdivide the common G-C haplotype, the G-T-C-T-G haplotype was significantly over-transmitted (P = 0.004), while the A-T-T-T-G haplotype was under-transmitted (P = 0.032) from parents to autistic offspring (Table 4). If the usefully informative SNP markers (rs5945173 and rs2734647) are combined with the indel within SSR marker 2, the haplotype significance for over-transmission for G-C odd is 0.009. Given the patterns of significance for the SSR marker 1, the indel and SNPs, it appears that any functional variant associated with autism and tracked by these may be located towards the 3′ end of the gene.
The possibility that MECP2 variants may be involved in autism has been suggested by previous research identifying a clear association between mutations in MECP2 and RTT and by the acknowledgement of an overlap in behavioural phenotypes and gene expression profiles in brain tissue from RTT and autistic individuals. Furthermore, there is some evidence suggesting that mutations in the coding region and 3′ UTR of MECP2 may be associated with the disorder (Carney et al. 2003; Lam et al. 2000; Shibayama et al. 2004). Recent evidence additionally suggests that regions of the genome within and near to the MECP2 locus harbour regulatory elements affecting its transcription (Liu & Francke 2006). The study reported here aimed to investigate whether there is evidence for a bias in the transmission, from parents to autistic offspring in two different sample collections, of particular alleles at several polymorphic sites in and around MECP2. Data from two microsatellites located in intron 2, and five SNPs within, upstream and downstream of the gene were analysed. Our results indicate that one or more relatively common variants in MECP2 may play some part in altering the risk for autism. Of the two highly polymorphic microsatellite markers we investigated, one gave a significant bias in transmission to probands when alleles were grouped based on clusters of sizes closely approximating to three normal distributions, the groups themselves being distinct from each other. The other microsatellite, while not significant itself, was located in close proximity to a single base pair insertion/deletion (rs11348580), which could be detected using the same microsatellite assay, and this indel showed a significant association with autism in the combined sample.
Our SNP analyses also generated interesting results. While the most significant findings arose when one of the samples (AGRE) was considered in isolation, the two apparently associated SNPs (rs5945173 and rs2734647) continued to be significantly associated in the combined sample. While the data for the other SNPs examined did not suggest significant association, the low frequencies of the minor alleles resulted in relatively few informative transmissions. The reason for the difference between the strength of findings in the AGRE and MGAS samples is not clear but may be influenced by the fact that they were collected in different countries and diagnosed against slightly different criteria. Both samples were primarily made up from cases meeting clinical/ADI-R criteria for autism (79% MGAS and 86% AGRE). The slightly higher proportion of cases meeting the criteria for autism in the AGRE sample may be relevant, but the numbers in the samples were too small to permit meaningful subgroup analyses.
The most significant of the SNP haplotypes made up of varying numbers of SNPs was the five-marker G-T-C-T-G haplotype, which was transmitted to autistic probands on 35 occasions and not transmitted on only 15 for the combined sample. This may reflect the fact that there is strong LD in the MECP2 gene and that any true association of a functional variant with autism may be detected at some distance from the real locus of interest. The results of SNP analysis based on rs5945173 and rs2734647, which had relatively frequent minor alleles, indicated that they were significantly associated with autism both individually and as a haplotype. Furthermore, the combined haplotype for rs5945173 and rs2734647 and the indel within SSR marker 2 was also significantly associated (P = 0.009). This might imply that the 3′ end of the gene is more likely to track the truly functional variant. However, given the low number of informative transmissions for the other SNPs, it is conceivable that this concentration of significant findings at the 3′ end simply results from greater power to detect significance at these more informative loci.
The data from the MECP2 microsatellites and five SNPs were not incorporated into the same haplotype analyses because the nature of LD between these disparate types of markers is unknown and so the validity of such an approach to identify which specific region of a gene is of most interest is questionable. Nevertheless, it is reassuring that significant associations were observed with both SNPs and microsatellite/indel, and associations observed in one sample group were generally supported by trends in the same direction observed for the other group. It is reasonable to assume that true associations will be detected by single marker analysis, and this would suggest that the 3′ end of the MECP2 gene might be playing a role in vulnerability to autism. Given the concentration of Liu and Francke’s regulatory regions in this area, it is entirely possible that one or more functional polymorphisms in and around this end of the gene might be of importance. Furthermore, the observation of the functional importance of microRNA recognition elements located in the 3′ UTR of the long transcript of the gene, which is the predominant form in brain, adds support to this interpretation (Klein et al. 2007).
Further studies using other large autistic samples with clear inclusion criteria would help to clarify these findings. Overall, our results suggest that MECP2 may well be a predisposing factor for autism. Certainly the gene has an impressive list of credentials for involvement in behavioural disorders. An intriguing recent report suggests, for example, that MECP2 interacts with ATRX (X linked) and that point mutations within the methylated DNA-binding domain of MECP2 can disrupt this interaction in vitro and in vivo (Nan et al. 2007). ATRX has been implicated in a range of X-linked mental retardation syndromes (OMIM 300032). Furthermore, the involvement of the key neuronal growth factor gene, Bdnf, as a target (Chen et al. 2003) further points to MECP2’s potential involvement in important aspects of brain development. Most recently, the introduction of a conditional MECP2 allele with only 50% activity into mice (MECP2Flox/y) was observed to result in a spectrum of abnormalities including learning and motor deficiencies and altered social behaviour, indicating that more subtle changes in MECP2 activity may have a significant behavioural impact (Samaco et al. 2008).
Clearly, autism is a disorder with complex genetic aetiology, and MECP2 is a gene with wide-ranging interactions and mechanisms of effect. It is vital, therefore, that we incorporate sufficient complexity into our models for understanding in detail how MECP2 and its gene partners might be involved in vulnerability to autism.
C.S.L. was funded by a Medical Research Council (MRC) PhD studentship. Part of this work was funded by the National Institute for Health Research (NIHR) through the Biomedical Research Centre at the South London and Maudsley NHS Trust and Institute of Psychiatry. The authors also thank Ben Williams for his assistance with bioinformatics.