Our study is the largest chromosome-wide association study of the X chromosome in ASD. Combining three independent GWAS data sets allowed us to evaluate the X chromosome thoroughly, using meta- and joint analyses of the pooled sample to improve our power to detect true positives. The replication and candidate gene analyses were then used to filter out false positives. We found one intronic SNP, rs17321050, in TBL1X that showed chromosome-wide evidence of association in both the meta- and joint analyses, strongly supporting TBL1X as a risk factor for ASD. This finding was further supported by the replication analysis. Our estimates of the odds ratios for rs17321050 in TBL1X for males, with the major allele as the reference, are 0.85 (95% confidence interval (95% CI) = 0.74 to 0.99) in the discovery data set and 0.74 (95% CI = 0.63 to 0.86) in the validation data set. Although modest, these effects were consistent across samples and similar to the effect sizes reported for significant regions in other recent GWASs of complex diseases [9,20]. One explanation for the modest odds ratios is that the significant SNPs are in LD with rare variants, such as structural variants or copy number variants (CNVs), that have very strong effects on ASD; follow-up sequencing may be warranted to identify such variants. Alternatively, the odds ratios may be modest simply because the autism loci in LD with these SNPs have only modest effects on the disorder. Moreover, the major allele of rs17321050, which is the positively associated allele, may be in LD with the risk alleles at the autism loci. It is also possible that the SNP has an unknown regulatory effect, since it is intronic and has no predicted function. We performed a power analysis based on our sample sizes; the power curves are shown in Additional file 7. Given our sample size, the test has more than 70% power to detect a true signal when the relative risk exceeds 1.3 and the frequency of the ASD-associated allele exceeds 0.3. The study design therefore has adequate power to detect markers with modest effects on ASD.
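Power of this kind depends jointly on allele frequency, relative risk, sample size and significance level. As an illustration only, the sketch below approximates a power curve for an allele-based case-control comparison, assuming a multiplicative risk model, Hardy-Weinberg equilibrium and a normal approximation to the two-proportion z-test. It is not the procedure behind Additional file 7 (the study analyzes X-linked markers in males and includes family-based samples), and the function names, allele counts and significance level are hypothetical placeholders.

```python
from statistics import NormalDist


def case_allele_freq(p_control: float, rr: float) -> float:
    """Risk-allele frequency among cases under a multiplicative model.

    Assumes Hardy-Weinberg equilibrium and that each copy of the risk allele
    multiplies disease risk by `rr`; then p_case = p*rr / (p*rr + 1 - p).
    """
    return p_control * rr / (p_control * rr + 1.0 - p_control)


def allelic_test_power(p_control: float, rr: float, n_case_alleles: int,
                       n_control_alleles: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    p1 = case_allele_freq(p_control, rr)
    p2 = p_control
    n1, n2 = n_case_alleles, n_control_alleles
    # Pooled allele frequency gives the standard error under H0.
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5
    # Standard error under the alternative hypothesis.
    se_alt = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability of rejecting in the direction of the true effect; rejection
    # in the opposite direction is negligible and ignored here.
    return std_normal.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical allele counts, chosen only for illustration.
    N_CASE_ALLELES, N_CONTROL_ALLELES = 3000, 3000
    for freq in (0.1, 0.2, 0.3, 0.4):
        row = [allelic_test_power(freq, rr, N_CASE_ALLELES, N_CONTROL_ALLELES)
               for rr in (1.1, 1.2, 1.3)]
        print(freq, " ".join(f"{power:.2f}" for power in row))
```

A stricter, chromosome-wide significance threshold than the placeholder `alpha` lowers every value in such a grid, which is why power curves should always be read together with the threshold they assume.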
The peak result in TBL1X is particularly interesting because TBL1X is part of the Wnt signaling pathway [KEGG pathway ID:hsa04310]. Wnt proteins, encoded by a highly conserved gene family, are key mediators of cell-cell signaling during embryogenesis and play an essential role in normal embryonic development. Many Wnt receptors are expressed during development and in the adult central nervous system. Expression analyses in mouse brain have also suggested that the Wnt signaling pathway may function differently across brain regions [21]. TBL1X and its family member transducin β-like 1 X-linked receptor 1 (TBL1XR1) [OMIM:608628] have been shown to interact with β-catenin and to bind to the promoters of Wnt target genes in response to Wnt signaling [22].
Engrailed 2 (EN2) [OMIM:131310], a Wnt target gene [23], has been associated with autism in several studies [24-27]. The WNT2 gene (Wingless-type mouse mammary tumor virus integration site family, member 2) [OMIM:147870] is also a candidate gene for autism [28]. TBL1X therefore participates in a pathway that may be critical to the etiology of autism, which makes it a strong candidate gene.
The region containing TBL1X harbors CNVs that have been associated with a diverse range of neurodevelopmental phenotypes. Multiple studies have shown that deletions in the distal Xp22.2 to Xp22.3 region encompassing NLGN4 and TBL1X are associated with autism. One study found deletions encompassing the Xp22.3 region in three autistic females [29]. Another identified a 5.5-Mb deletion in the Xp22.2 to Xp22.3 region in a female with autism, moderate mental retardation and some dysmorphic features [30]. A familial deletion in the Xp22.2 to Xp22.3 region has been associated with a variable phenotype, including autism, in female carriers [31].
TBL1X lies about 5 Mb from the NLGN4 gene [OMIM:300427], and several studies have suggested that deletions or point mutations in NLGN4 are associated with autism [32,33]. Furthermore, TBL1X is partially or completely deleted in patients with ocular albinism with late-onset sensorineural deafness (OASD) [OMIM:300650] who carry Xp22.3 terminal deletions. Given these findings, we used PennCNV [34] to search for CNVs in the Xp22.2 to Xp22.3 region in our data set. Only one affected female sibling carried a CNV in the TBL1X gene region: a single-copy duplication of 10.26 kb. However, this paucity of detected CNVs in the TBL1X region may reflect the limited sensitivity of CNV-calling algorithms applied to GWAS genotyping data [35].
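Restricting CNV calls to a candidate region, as done here for Xp22.2 to Xp22.3, amounts to an interval-overlap filter. The sketch below assumes the calls have already been parsed into simple records; the `CNVCall` type and helper names are hypothetical and do not reflect PennCNV's actual output format, and the coordinates in the usage example are placeholders rather than the real position of TBL1X. Because TBL1X lies outside the pseudoautosomal region, the baseline copy number depends on sex (two in females, one in males), so the same copy-number value can mean different events in the two sexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    """Hypothetical, already-parsed CNV call (1-based, inclusive coordinates)."""
    sample_id: str
    sex: str  # "M" or "F"
    chrom: str
    start: int
    end: int
    copy_number: int


def expected_copies(sex: str) -> int:
    """Baseline copy number at non-pseudoautosomal X-chromosome loci."""
    if sex not in ("M", "F"):
        raise ValueError(f"unknown sex: {sex}")
    return 1 if sex == "M" else 2


def overlaps(call: CNVCall, chrom: str, start: int, end: int) -> bool:
    """True if the call shares at least one base with [start, end]."""
    return call.chrom == chrom and call.start <= end and start <= call.end


def classify(call: CNVCall) -> str:
    baseline = expected_copies(call.sex)
    if call.copy_number > baseline:
        return "duplication"
    if call.copy_number < baseline:
        return "deletion"
    return "no change"


def calls_in_region(calls, chrom, start, end):
    """Return (sample_id, event type, size in kb) for calls overlapping the region."""
    return [
        (c.sample_id, classify(c), (c.end - c.start + 1) / 1000)
        for c in calls
        if overlaps(c, chrom, start, end)
    ]


if __name__ == "__main__":
    # Placeholder region and calls, for illustration only.
    region = ("X", 1_000_000, 1_200_000)
    calls = [
        CNVCall("s1", "F", "X", 1_050_001, 1_060_260, 3),  # single-copy gain in a female
        CNVCall("s2", "M", "X", 5_000_000, 5_100_000, 0),  # outside the region
    ]
    print(calls_in_region(calls, *region))
```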
The association at TBL1X is even more intriguing given that it is observed only in the male-only analysis. The P values of the female-specific tests for the three significant markers listed in Table were all greater than 0.1 in the individual, joint and meta-analyses. This suggests the possibility of a recessive effect at this locus; however, analyzing affected females under a recessive model did not improve the statistical significance. Alternatively, the lack of significance in females could be explained by skewed, allele-specific X-chromosome inactivation, which has been suggested previously in autism [36], or simply by low power in the female subset, whose sample size is relatively small compared with that of males, as shown in Additional file 4.
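Testing a recessive effect, as described above for affected females, changes only how genotypes are coded before the association test. The sketch below shows one common way to code X-chromosome genotypes under additive, dominant and recessive models. Whether a hemizygous male carrier is coded 2 (treated like a homozygous female) or 1 in the additive model is a convention that varies between methods, so it is exposed as a parameter; the function and parameter names are hypothetical and are not taken from the study's software.

```python
def x_genotype_code(risk_copies: int, sex: str, model: str = "additive",
                    male_as_homozygous: bool = True) -> int:
    """Code an X-chromosome genotype as a numeric predictor.

    risk_copies: copies of the tested allele (0-2 for females, 0-1 for
    hemizygous males outside the pseudoautosomal regions).
    """
    if model not in ("additive", "dominant", "recessive"):
        raise ValueError(f"unknown model: {model}")
    if sex == "F":
        if not 0 <= risk_copies <= 2:
            raise ValueError("females carry 0, 1 or 2 copies")
        if model == "additive":
            return risk_copies
        if model == "dominant":
            return int(risk_copies >= 1)
        return int(risk_copies == 2)  # recessive: only homozygotes are coded 1
    if sex == "M":
        if not 0 <= risk_copies <= 1:
            raise ValueError("hemizygous males carry 0 or 1 copy")
        if model == "additive":
            return 2 * risk_copies if male_as_homozygous else risk_copies
        # A hemizygous carrier expresses the allele under both the dominant
        # and the recessive model, so he is coded 1 in either case.
        return risk_copies
    raise ValueError(f"unknown sex: {sex}")


if __name__ == "__main__":
    for copies in (0, 1, 2):
        codes = [x_genotype_code(copies, "F", m) for m in ("additive", "dominant", "recessive")]
        print("female", copies, codes)
```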
The TBL1X gene has not been reported in other association studies of ASD. This may be due to the limited sample sizes of those studies, given the modest odds ratios we observed for these SNPs. Wang et al. [9] also analyzed X-chromosome markers for ASD, using the AGRE and ACC data sets for replication analysis and meta-analysis, but the markers in TBL1X did not pass their meta-analysis threshold of 1 × 10^-4. By including the HIHG/CHGR data set alongside the AGRE and ACC data sets in our meta- and joint analyses, we increased the sample size and hence the power of our association tests.
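The text does not say how the per-data-set results were combined in the meta-analysis of Wang et al., only that it combined individual P values. Fisher's method is one common way to do this and is sketched below purely as an illustration. Under the null hypothesis, X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom, so no statistics library is needed. The function name and the P values in the usage example are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of
    freedom under H0. For even degrees of freedom the survival function is
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # this is x / 2
    term, series = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (x/2)**i / i!, built incrementally
        series += term
    return math.exp(-half_x) * series


if __name__ == "__main__":
    THRESHOLD = 1e-4  # meta-analysis threshold quoted in the text
    p_agre, p_acc = 0.003, 0.02  # hypothetical per-data-set P values
    combined = fisher_combined_p([p_agre, p_acc])
    print(f"combined P = {combined:.2e}, passes threshold: {combined < THRESHOLD}")
```

Fisher's method ignores the direction of each effect and weights all data sets equally; a weighted Z-score (Stouffer) combination is often preferred when effect direction and sample size should be taken into account.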
We also found that two candidate genes, DMD and IL1RAPL2, were significantly associated with ASD in both the discovery and validation data sets in analyses based on a priori hypotheses about previously identified candidate genes. The significant SNP rs721699 in DMD warrants mention in light of recent findings in which exon duplications in DMD were found to give rise to an autism phenotype [37]. The results reported by Pagnamenta et al. [38] support previous work by Hendriksen and Vles [39] showing an increased rate, relative to the general population, of autism features in individuals with Duchenne muscular dystrophy (DMD). Although our association signal most likely reflects LD with functional variants in DMD, it adds to the growing body of work suggesting that autism may co-occur with other neurological conditions. Moreover, a recent study found a hemizygous deletion in DMD in a male who had been diagnosed with ASD and later with muscle weakness [40]. Other studies have also suggested that IL1RAPL1, which is closely related to IL1RAPL2, the gene identified in this study, is associated with autism [41,42].
Wang et al. [9] found that three markers on the X chromosome, rs11798405, rs5972577 and rs6646569, showed statistically significant association with ASD, with P values less than 10^-5 in a meta-analysis of the individual P values for AGRE and ACC. However, we observed significantly higher missing genotype rates in males than in females for SNPs rs5972577 and rs6646569, consistently across the HIHG/CHGR, AGRE and ACC data sets. As discussed in Additional file 8, such elevated missing genotype rates can cause spurious association results. Sex-matched case-control designs eliminate the problem of differential missingness between the sexes. However, in family-based studies, or in case-control studies not matched on sex, the patterns of missing data in males and females should be examined carefully, especially for the sex chromosomes. In Additional file 9, we show that the missing genotype rates for the markers in Table are very low, which suggests that the statistical significance of these markers is not a result of informative missingness. The proportion of X-chromosome markers showing differential missingness (for example, 4.3% of markers in the HIHG data set had P values less than 0.05 in tests of missingness between males and females) is consistent with what is expected by chance alone (about 5% at this threshold), which suggests that there is no systematic defect in the genotype-calling algorithm for the X chromosome. Differential missingness between the sexes may be due to sequence homology or CNVs, which can introduce outliers into the intensity plots [43]. To minimize the effects of this problem, our QC procedure removed any SNP whose missing genotype rate exceeded 2.5% in either males or females. This resulted in the removal of SNPs rs5972577 and rs6646569 from our analyses. We also found that reclustering the samples without reference samples before genotype calling reduced the effect of nonrandom, allele-specific missingness on the association tests. For example, rs11798405 was not significant (P > 0.05) in our reclustered AGRE samples, although it had P = 0.0067 in the study by Wang et al. [9]. In another ASD study [44], none of the X-chromosome markers were reported to be significant. This may be due to the smaller sample size of that study (2,394 probands) compared with ours (3,503 affected individuals).
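The QC rule described above (removing any SNP whose missing genotype rate exceeds 2.5% in either sex) and a per-SNP test of differential missingness between males and females can be sketched as follows. The sketch assumes per-SNP counts of missing and called genotypes by sex and applies a Pearson chi-square test to the 2 × 2 table of sex by missingness, computing the 1-degree-of-freedom tail probability as erfc(sqrt(χ²/2)). The text does not specify which test the study used, and when missing counts are small an exact test is more appropriate than this approximation; all names and counts here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MissingCounts:
    """Hypothetical per-SNP genotype call counts, split by sex."""
    male_missing: int
    male_called: int
    female_missing: int
    female_called: int


def missing_rates(c: MissingCounts) -> tuple[float, float]:
    """Return (male, female) missing genotype rates; assumes both sexes are present."""
    male = c.male_missing / (c.male_missing + c.male_called)
    female = c.female_missing / (c.female_missing + c.female_called)
    return male, female


def passes_sex_missingness_qc(c: MissingCounts, threshold: float = 0.025) -> bool:
    """Keep a SNP only if its missing rate is at most `threshold` in both sexes."""
    male, female = missing_rates(c)
    return male <= threshold and female <= threshold


def differential_missingness_p(c: MissingCounts) -> float:
    """Pearson chi-square test (1 df, no continuity correction) of sex x missingness."""
    a, b = c.male_missing, c.male_called
    d, e = c.female_missing, c.female_called
    n = a + b + d + e
    margins = (a + b, d + e, a + d, b + e)
    if 0 in margins:
        return 1.0  # an empty row or column leaves nothing to test
    chi2 = n * (a * e - b * d) ** 2 / math.prod(margins)
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    snps = {
        "snp_a": MissingCounts(60, 940, 6, 994),  # much more missing in males
        "snp_b": MissingCounts(8, 992, 9, 991),   # low, similar missingness
    }
    for name, counts in snps.items():
        print(name, passes_sex_missingness_qc(counts),
              f"{differential_missingness_p(counts):.2g}")
```

Applied across all X-chromosome SNPs, the fraction of tests with P < 0.05 can then be compared with the roughly 5% expected under the null hypothesis, which is the comparison behind the 4.3% figure quoted for the HIHG data set.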