Sequence variants in families with XLMR
Genomic DNA from a male proband, or in five instances a female obligate carrier, from 208 families with multiple individuals with mental retardation and a pattern of transmission compatible with X linkage was sequenced through the coding exons of 718 X-chromosome genes (Supplementary Table 1). This set was composed of 699 out of 829 genes from the Vega database and 19 X-chromosome genes not included in Vega but present in Ensembl/NCBI (Supplementary Table 1). The average coverage of the 718 genes screened was 75%; therefore, the coverage of the full protein-coding sequences of the Vega X chromosome was 65%. Sixteen of the genes screened are in the pseudoautosomal regions common to the X and Y chromosomes and 702 are in the X-specific part. The screened DNA corresponds to ~1 Mb of coding sequence per sample and >200 Mb in total. The 208 families were prescreened and found negative for cytogenetic abnormalities at 500G banding resolution, for expansion of the FMR1 trinucleotide repeat and for unambiguous disease-causing sequence variants in the XLMR-causing genes published when the study was initiated (Supplementary Table 1).
Summary of the clinical features of the mental retardation probands studied
We detected 1,858 different coding sequence variants, 1,769 from the X-specific and 89 from the pseudoautosomal X-chromosome regions. We found that 1,814 were single-nucleotide changes: of these, 980 caused missense amino-acid substitutions, 22 caused nonsense (termination) codons, 13 were abnormalities at highly conserved bases at splice acceptor and donor sites and 799 were synonymous (silent) changes. Three variants were missense double-nucleotide substitutions, and 41 variants were small insertions and deletions, of which 26 were in-frame and 15 caused translational frameshifts.
The dataset allows direct characterization of the pattern of haplotypic coding sequence variation of individual X chromosomes. Although ascertained from individuals with XLMR, only a small fraction of the observed variants is likely to cause mental retardation and, therefore, the set predominantly represents background population variation. Of the 1,769 coding sequence variants from the X-specific part of the X chromosome, 914 were nonrecurrent (that is, observed in only one XLMR-affected family) and 855 were recurrent (observed in multiple XLMR-affected families). We identified 63% of the recurrent and 16% of the nonrecurrent variants in the dbSNP database. The sequences of any two individuals differed on average by 109 variants. Of these, six were nonrecurrent, including four missense and two synonymous variants, and 103 were recurrent, including 40 missense, 60 synonymous and two in-frame insertions/deletions. The results illustrate that most coding sequence differences between individuals are recurrent (‘common’) variants despite the existence of a larger number of different nonrecurrent (‘rare’) variants.
Sequence variants that truncate proteins
A subset of the sequence variants is predicted to introduce a premature termination codon and hence truncate the wild-type protein sequence. Truncating variants are usually highly deleterious to protein function: they constitute a substantial proportion of monogenic (mendelian) disease-causing mutations but a relatively small proportion of polymorphisms. Therefore, as the first analytic step to identify new genes involved in mental retardation, we considered the set of truncating variants detected in the screen.
We observed 42 different truncating variants in 30 genes; 40 were in 28 genes from the X-specific region of the X chromosome and 2 were in 2 genes from the pseudoautosomal region. In addition, we found four ‘read-through’ variants that cause a translational frameshift close to the wild-type termination codon and extend the open reading frame into previously untranslated 3′ DNA (described further in Supplementary Note online).
Truncating and read-through variants identified in the screen
Three truncating variants were recurrent (UBE2NL: 266T>G, L89*; MAGEE2: 358G>T, E120*; and GTPBP6: 118C>T, Q40*). These were found in controls at a similar prevalence to XLMR-affected families and so are unlikely to be responsible for mental retardation in the families in which they were identified. Each, however, is predicted to cause substantial truncation of the encoded protein. Therefore, loss of some, or all, functions of UBE2NL, MAGEE2 and GTPBP6 seems compatible with normal development and intellectual function.
Thirty-eight truncating variants observed in the 702 genes from the X-specific part of the X chromosome were each found in only a single XLMR-affected family (that is, they were nonrecurrent variants). One gene (CUL4B) had five different nonrecurrent truncating variants, two genes (AP1S2 and UPF3B) had three each, four genes (BRWD3, ZDHHC9, ITIH5L and SLC9A6) had two each, and 19 genes had a single nonrecurrent truncating variant (Supplementary Fig. 1 online). Simulating a random distribution of truncating variants through the 702 genes and comparing it to the observed distribution provided strong evidence for clustering of these nonrecurrent truncating variants (P < 0.001) in a subset of genes. The clustering is consistent with this subset of genes being involved in XLMR, but other explanations cannot be excluded at this stage of analysis.
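The simulation described above can be illustrated with a minimal Monte Carlo sketch. It assumes that each of the 38 variants lands in any of the 702 genes with equal probability, and it uses the number of variants falling in genes hit more than once as the test statistic. Both choices are assumptions made for illustration: the text does not specify the statistic used, and a fuller simulation would probably weight genes by coding length. The function name and its parameters are hypothetical.

```python
import random
from collections import Counter

def clustering_p_value(observed_counts, n_genes=702, n_sims=20000, seed=0):
    """Estimate P(statistic under random placement >= observed statistic).

    Assumption: every gene is an equally likely target (no length weighting).
    Statistic: number of variants located in genes that carry more than one.
    """
    rng = random.Random(seed)
    n_variants = sum(observed_counts)
    observed_stat = sum(c for c in observed_counts if c > 1)
    hits = 0
    for _ in range(n_sims):
        # Drop each variant into a gene chosen uniformly at random.
        counts = Counter(rng.randrange(n_genes) for _ in range(n_variants))
        stat = sum(c for c in counts.values() if c > 1)
        if stat >= observed_stat:
            hits += 1
    # Add-one correction so the estimate is never reported as exactly zero.
    return (hits + 1) / (n_sims + 1)

# Per-gene counts of nonrecurrent truncating variants reported in the text:
# one gene with five, two with three, four with two, and 19 with one.
observed = [5, 3, 3, 2, 2, 2, 2] + [1] * 19
p = clustering_p_value(observed)
print(f"variants: {sum(observed)}, estimated P = {p:.5f}")
```

Under these assumptions very few simulated variants share a gene, so the estimated P value is bounded mainly by the number of simulations.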
To evaluate further the genes with multiple truncating variants, we examined segregation of the variants in the families in which they were observed. Some of these results have been previously published16-21. In brief, truncating variants in AP1S2, CUL4B, BRWD3, UPF3B, ZDHHC9 and SLC9A6 segregated completely with mental retardation in the families in which they were identified; that is, each truncating variant was present in all genotyped subjects with mental retardation and absent in unaffected males (Supplementary Fig. 1). We sequenced all the coding exons of these six genes in control X chromosomes and did not find the truncating variants detected in XLMR-affected subjects or any other truncating variants. The clustering of multiple different truncating variants in these genes, the evidence for segregation with mental retardation and the absence of truncating variants in controls indicate strongly that AP1S2, CUL4B, BRWD3, UPF3B, ZDHHC9 and SLC9A6
are XLMR genes. Five missense or in-frame variants in CUL4B, ZDHHC9
also showed evidence of involvement in mental retardation17,19,21
. Mental retardation–causing variants in AP1S2
together account for the disease in 22 (10.6%) families out of the 208 screened.
By contrast, neither of the two truncating variants in ITIH5L segregated completely with mental retardation (Supplementary Fig. 1). We analyzed the complete coding sequence of ITIH5L in controls and found one of the truncating variants previously observed in a subject with mental retardation. The lack of segregation with mental retardation, the presence of a truncating variant in normal controls and the recent finding of a likely mental retardation–causing IL1RAPL1 deletion in one family with an ITIH5L truncating variant (unpublished data) suggest that truncating variants in ITIH5L are not the cause of mental retardation in the families in which they were identified. Nevertheless, the strong evidence overall for the role in mental retardation of genes with more than one nonrecurrent truncating variant is reflected in the heterogeneity lod score of 18.3, with the disease in an estimated 92% of families in this subset attributable to the truncating variant.
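A heterogeneity lod score can be sketched with the standard admixture formulation, in which a proportion alpha of families is assumed to be explained by the variant and the summed log-likelihood is maximized over alpha. The exact procedure used in the study is described in its Methods and may differ; the per-family lod scores below are invented for illustration and are not values from the study.

```python
import math

def heterogeneity_lod(family_lods, grid=1000):
    """Maximize sum_i log10(alpha * 10**Z_i + 1 - alpha) over alpha in [0, 1].

    family_lods: per-family lod scores Z_i (hypothetical inputs here).
    Returns (hlod, alpha_hat), where alpha_hat is the estimated proportion
    of families in which the disease is attributable to the variant.
    """
    best_hlod, best_alpha = 0.0, 0.0  # alpha = 0 always gives a score of 0
    for k in range(1, grid + 1):
        alpha = k / grid
        total = sum(math.log10(alpha * 10 ** z + (1 - alpha)) for z in family_lods)
        if total > best_hlod:
            best_hlod, best_alpha = total, alpha
    return best_hlod, best_alpha

# Hypothetical example: two families support segregation, one argues against it.
hlod, alpha_hat = heterogeneity_lod([2.0, 1.5, -3.0])
print(f"HLOD = {hlod:.2f} at alpha = {alpha_hat:.2f}")
```

When every family has a positive lod score the maximum falls at alpha = 1 and the heterogeneity lod equals the simple sum; a family with a strongly negative score pulls alpha below 1 without cancelling the evidence from the others.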
A single nonrecurrent truncating variant was found in 19 genes from the X-specific part of the X chromosome. Analysis of segregation in each family revealed that the truncating variant in nine genes (ATXN3L, DRP2, MAP3K15, MAP7D3, RPL9P7, SATL1, SSX6, SYTL5 and ZCCHC13) did not segregate with mental retardation (Supplementary Fig. 1). In nine of the remaining ten genes there was full segregation with the disease and in one, VSIG4, additional DNA samples were unavailable for testing. A heterogeneity lod score of 2.4 was obtained for the truncating variants in these 19 families, with mental retardation in 43% attributable to the truncating variant. Sequencing of the complete coding sequences of the 19 genes in male controls revealed one or more truncating variants in ATXN3L, BEX4, MAP3K15 and P2RY4. Furthermore, likely mental retardation–causing abnormalities in MECP2, SLC9A6 and IL1RAPL1 have recently been found in affected individuals with single nonrecurrent truncating variants in FAM47B, SATL1 and SAGE1, respectively. Taken together, the results from 6 of the 19 genes with a single nonrecurrent truncating variant remain compatible with involvement in the causation of mental retardation (SYP, ZNF711, ARSF, ZNF183, VSIG4 and USP9X), whereas the 13 others have one or more inconsistencies. To evaluate these six genes further, we sequenced their complete coding sequences in a further 914 male index subjects from XLMR-affected families and 1,129 male controls (Supplementary Table 2 online).
In SYP (also known as synaptophysin or p38) an additional nonsense variant was found in one of the 914 additional XLMR-affected subjects and an additional 4-bp deletion was identified in a second XLMR-affected subject. The nonsense mutation showed evidence of segregation with mental retardation (lod score 1.2). Samples were not available for evaluation of the 4-bp deletion. No SYP truncating variants were found in the additional 1,129 controls. Together with the data from the primary screen, three SYP truncating variants were found in 1,122 XLMR-affected subjects examined, two of which have been examined and segregate with the disease (combined lod score 1.7), and there were no truncating variants in 1,401 controls. A missense variant found in a single subject with mental retardation at an amino acid residue that is highly conserved and which segregated with mental retardation (lod score 1.8) is also likely to be implicated in disease causation. These data implicate SYP in XLMR. In the three families with truncating variants, mental retardation was mild to moderate and there were no consistent additional features, although epilepsy was noted in some individuals. SYP encodes an integral membrane protein of small synaptic vesicles.
Figure 1 Pedigrees of families with likely deleterious variants in the SYP, ZNF711 and CASK genes. Shaded symbols indicate individuals with mental retardation and open symbols indicate individuals who are unaffected. Symbols containing a red square indicate individuals …
Likely deleterious variants in SYP and ZNF711
In ZNF711, an additional truncating variant was found in one subject and showed strong evidence of segregation with mental retardation (lod score 2.1). No ZNF711 variants were found in controls. Together with the results from the primary screen, two truncating variants were found in 1,122 XLMR-affected individuals, both of which segregate with the disease (combined lod score 3.4), and no truncating variants were found in 41,200 controls. These results indicate that ZNF711 is also an XLMR-associated gene. The two families with truncating ZNF711 mutations had moderate mental retardation without consistent additional distinctive features. ZNF711 encodes a zinc-finger protein of unknown function.
In ARSF three additional truncating variants were found among the 914 subjects with XLMR; only one could be evaluated and did not segregate with mental retardation. At least one additional truncating variant was found in each of ARSF, VSIG4 and ZNF183 in controls. In total, we found four truncating variants in ARSF in 1,122 XLMR-affected subjects and five in 1,346 controls. In both VSIG4 and ZNF183 one truncating variant was found in 1,122 cases and one in 1,653 and 1,391 controls, respectively. The results therefore suggest that none of these three genes is likely to be involved in mental retardation. No further USP9X truncating variants were found in cases or controls; consequently, the role of USP9X in XLMR remains uncertain.
Nonsynonymous and synonymous variants
We identified 983 different missense variants (980 single-nucleotide and 3 double-nucleotide substitutions; Supplementary Table 3 online). The 26 in-frame deletions/insertions found are listed in Supplementary Table 4 online and described further in the Supplementary Note. As appears to be the case for truncating variants, missense variants may include a subset that causes mental retardation, with the remainder representing background population variation. However, the prevalence of missense variants in normal individuals is much higher than that of truncating variants, and the disruption of protein function they entail is generally more modest. Therefore, disease-causing missense variants are likely to represent a relatively small fraction of the total and distinguishing them from rare polymorphisms is problematic. We applied two analytic approaches to identify potential mental retardation–causing missense variants.
Disease-causing missense variants generally alter amino-acid residues that are more highly conserved during evolution than those altered by polymorphisms. Thus, we ranked the 983 missense variants according to a score that reflects the conservation of each amino acid (see Methods). Scrutiny of the top ranking variants from this analysis highlighted CASK. Only two missense variants in CASK were identified in the primary sequencing screen, and these are positioned second and third in the ranking (Supplementary Table 5 online). In silico and RT-PCR analyses indicate that one of these, 2129A>G (D710G), introduces a splice site that removes 27 bp of the coding sequence and thus nine amino acids of the CASK protein. Two further missense variants in CASK were found in a screen of 150 additional families with XLMR, both of which are at highly conserved amino acids and would score second and sixth in the ranking of missense variants. We did not find any missense variants in the complete coding region of CASK in 390 control X chromosomes. Mental retardation was mild to moderate in the four families with missense variants. In two, it was accompanied by nystagmus, a highly unusual accessory feature of XLMR, in multiple affected individuals. Three of the four variants segregate completely with mental retardation. The fourth variant, in family 74, is present in the three individuals with both mental retardation and nystagmus, but is absent from an individual with mental retardation without nystagmus (III-4), who may be a phenocopy. While this manuscript was under review, heterozygous inactivating mutations of CASK were reported to cause severe cerebral malformation in females22 and, in a male, a hemizygous truncation caused early neonatal lethality. These results are consistent with the discovery here of multiple different missense variants, which are likely to be less deleterious than truncating variants, in viable males with XLMR. CASK encodes a calcium/calmodulin-dependent serine protein kinase that is a member of the membrane-associated guanylate kinase (MAGUK) family and is located at the postsynaptic membrane of central nervous synapses23.
As a further strategy to identify missense variants causing mental retardation, we investigated the number of variants in each gene. Genes that show more amino acid variation in a human population than expected from their rate of evolution are identifiable by the McDonald-Kreitman test24. Application of this test to a random population sample identifies positively selected genes. Application to X-chromosome genes in a sample ascertained for XLMR, however, would be expected to identify both positively selected genes and those with excess missense variants that cause mental retardation. We restricted application of the McDonald-Kreitman test to nonrecurrent variants, as recurrent variants are less likely to be implicated in mental retardation. The results highlighted ZFX (P = 0.0014) and G6PD (P = 0.008), both of which have previously been identified as strong candidates for recent positive selection25,26. They also highlighted, however, four known genes involved in XLMR at similar levels of significance: HUWE1 (P = 0.007), OPHN1 (P = 0.001), MED12 (P = 0.004) and PGK1 (P = 0.002). As these genes do not show evidence of recent positive selection25, their excess variation may be due to mental retardation–causing missense variants. By contrast, no genes known to cause diseases other than mental retardation (excluding G6PD) were highlighted. Of genes not yet implicated in a monogenic disease, only one was highlighted: SMARCA1 (P = 0.009). These results suggest that missense variants in known and possibly additional XLMR-associated genes account for the disease in a further subset of families.
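For readers unfamiliar with the McDonald-Kreitman test, the sketch below shows its usual form: a 2×2 table of nonsynonymous and synonymous counts for within-species polymorphisms versus between-species fixed differences, assessed here with a two-sided Fisher exact test. The counts are invented for illustration, and the choice of Fisher's exact test (rather than, say, a G-test) is an assumption; the study's own procedure is given in its Methods.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def mcdonald_kreitman(pn, ps, dn, ds):
    """Return (neutrality index, P value).

    pn, ps: nonsynonymous and synonymous polymorphisms within the sample.
    dn, ds: nonsynonymous and synonymous fixed differences to an outgroup.
    A neutrality index above 1 indicates excess amino-acid polymorphism.
    """
    p_value = fisher_exact_two_sided(pn, ps, dn, ds)
    neutrality_index = (pn / ps) / (dn / ds)  # requires ps, dn and ds > 0
    return neutrality_index, p_value

# Hypothetical gene: 8 missense and 2 synonymous nonrecurrent variants,
# against 4 nonsynonymous and 16 synonymous fixed differences.
ni, p_mk = mcdonald_kreitman(8, 2, 4, 16)
print(f"neutrality index = {ni:.1f}, P = {p_mk:.4f}")
```

In a sample ascertained for disease, an excess of missense polymorphism of this kind does not by itself distinguish positive selection from an enrichment of disease-causing variants, which is why the text compares the highlighted genes against independent evidence of recent selection.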
We observed 428 recurrent and 328 nonrecurrent synonymous variants in the X-specific part of the X chromosome. Although most synonymous variants are biologically silent, a small subset may exert cryptic biological effects through alterations in transcript processing or splicing. To search for additional cryptic splice variants, we applied the program NNsplice27 to all synonymous and missense base substitutions. Three synonymous and seven missense variants were predicted with a high score (>0.9) to introduce a new splice site (Supplementary Table 6 online). One of these is the missense variant in CASK (described above) that causes an abnormality of splicing. Of the remainder, five were recurrent and four were nonrecurrent variants. As all the likely mental retardation–causing variants thus far discovered in this study are nonrecurrent, these results suggest that many of the variants predicted to alter splicing are not implicated in mental retardation.
Characteristics of XLMR genes identified in this screen
Detailed clinical descriptions of the families with mental retardation–causing variants in six of the nine genes implicated in XLMR identified in this screen have been published (AP1S2
and therefore their features are only reviewed briefly here. Mental retardation ranged from mild to severe and most families were previously classified as having nonsyndromic mental retardation. Following the identification of the genes involved in mental retardation, however, phenotypic characteristics common to some affected males were identified: for example, epilepsy and ataxia (SLC9A6), macrocephaly (BRWD3), relative macrocephaly, hypogonadism, central obesity and tremor (CUL4B), Marfanoid habitus (ZDHHC9), elements of the FG and Lujan-Fryns syndromes (UPF3B), epilepsy (SYP) and nystagmus (CASK). The encoded proteins have roles in vesicle trafficking (AP1S2), chromatin structure (BRWD3), nonsense-mediated RNA decay (UPF3B), ubiquitination (CUL4B), post-translational modification by palmitoylation (ZDHHC9), synaptic function (SYP) and synaptic signal transduction (CASK).
Structural and evolutionary characteristics of XLMR genes
The known biological functions of proteins encoded by genes involved in XLMR are diverse. To explore further the attributes of these genes and their encoded proteins, we compared features of currently identified genes associated with XLMR (80 genes) to X-chromosome genes associated with disease phenotypes that do not include cognitive impairment (61 genes) and X-chromosome genes that have not yet been associated with a mendelian disease (608 genes).
Genes involved in XLMR are more constrained in their evolution between human and macaque, as measured by the dN/dS ratio (nonsynonymous changes per nonsynonymous site/synonymous changes per synonymous site) of 0.17, than genes associated with a disease other than mental retardation (0.31, P = 0.009) or genes not associated with a genetic disease (0.32, P = 0.003). Similarly, their amino acid compositions are more highly conserved between human and mouse compared to the other two classes (P < 0.01 for both comparisons). Previous studies have reported that genes implicated in nervous system diseases, genes associated with neurological functions and genes expressed in the brain undergo more purifying selection than genes in other functional classes28-30. The source of this evolutionary constraint is unclear.
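The dN/dS ratio defined above can be made concrete with a short sketch. The counts are invented so that the ratio comes out near the value reported for XLMR genes, and the optional Jukes-Cantor correction for multiple substitutions at a site is a common refinement that is assumed here rather than taken from the study; function and parameter names are hypothetical.

```python
import math

def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple hits (p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites, correct=False):
    """dN/dS: nonsynonymous changes per nonsynonymous site divided by
    synonymous changes per synonymous site."""
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    if correct:
        dn, ds = jukes_cantor(dn), jukes_cantor(ds)
    return dn / ds

# Hypothetical human-macaque comparison for one gene:
# 17 nonsynonymous changes over 1,000 sites, 30 synonymous changes over 300.
raw = dn_ds(17, 1000, 30, 300)
corrected = dn_ds(17, 1000, 30, 300, correct=True)
print(f"dN/dS = {raw:.2f} (uncorrected), {corrected:.2f} (Jukes-Cantor)")
```

A ratio well below 1 indicates that amino-acid changes have been removed by purifying selection more often than silent changes, which is the sense in which XLMR genes are described as more constrained.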
Both genes involved in XLMR and those associated with other diseases are characterized by longer protein-coding sequences (P < 0.01), larger genomic footprints (P < 0.02) and greater numbers of exons (P < 0.01) than X-chromosome genes not associated with a disease. In part, this may be attributable to ascertainment bias. Larger genes are likely to have a higher mutation rate, because they constitute a bigger target for mutational processes, and therefore may have a greater prevalence of disease-causing mutations in the population. The higher the prevalence of disease-causing mutations, the more likely a gene is to have been identified as a disease gene. The comparatively large size of disease genes generally has previously been reported31. Significant differences between the three groups of genes were not found in levels of brain expression, tolerance of common sequence variation or the presence of paralogs in the human genome.