For the past 15 years, genomic studies of complex diseases have relied on a model in which common genetic variation contributes significantly to common diseases [82-84]. Based on this model, the systematic genotyping of common variants was perceived as the best way to begin characterizing the allelic architecture of complex human traits [85]. Making such experiments possible required the development of highly accurate, low-cost, high-throughput genotyping platforms and a catalog of common human genetic variation, such as that produced by the HapMap project [86, 87]. Furthermore, because direct sequencing was not a viable strategy, assessing the role of common variation was the only feasible genome-wide experiment. Thus, until recently, the contribution of rare coding and noncoding variation to complex disorders like autism has gone largely unexplored.
While most quantitative traits, including human diseases, show substantial heritability in most populations, their allelic architecture remains poorly understood [88, 89]. Haldane, in the 1920s, was the first to recognize that deleterious alleles of large effect will be maintained only at very low frequencies in the general population [90]. Copy number variation studies of ASD have identified variants with large effect sizes, often with odds ratios (ORs) greater than 5.0 [23-34]. Much as Haldane would have predicted, these variants are quite rare, often occurring at frequencies well below one in a thousand in the general population, a frequency generally consistent with a large-effect locus at mutation-selection balance.
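As a rough illustration of the mutation-selection balance argument above, the sketch below evaluates the standard single-locus approximations from population genetics: an equilibrium frequency of about mu / (h * s) for a deleterious allele with some effect in heterozygotes, and about sqrt(mu / s) for a fully recessive allele. The function name and the values chosen for the mutation rate `mu`, the selection coefficient `s`, and the dominance coefficient `h` are illustrative assumptions, not estimates from this study, and the autosomal approximation is used for simplicity even though the genes examined here are X-linked.

```python
import math


def equilibrium_frequency(mu: float, s: float, h: float) -> float:
    """Approximate equilibrium frequency of a deleterious allele.

    mu: per-generation mutation rate toward the deleterious allele (assumed).
    s:  selection coefficient against homozygotes, 0 < s <= 1 (assumed).
    h:  dominance coefficient, 0 <= h <= 1 (assumed).

    Uses the textbook approximations q ~ mu / (h * s) when h > 0 and
    q ~ sqrt(mu / s) when h == 0. The first approximation is only
    reasonable when h * s is much larger than mu.
    """
    if mu <= 0 or not (0 < s <= 1) or not (0 <= h <= 1):
        raise ValueError("expected mu > 0, 0 < s <= 1, 0 <= h <= 1")
    if h == 0:
        return math.sqrt(mu / s)
    return mu / (h * s)


# Hypothetical large-effect allele: strong selection, partial dominance.
q_large = equilibrium_frequency(mu=1e-5, s=0.5, h=0.2)
# Hypothetical weak-effect allele: much weaker selection in heterozygotes.
q_small = equilibrium_frequency(mu=1e-5, s=0.01, h=0.2)

print(f"large effect: q = {q_large:.2e}")  # roughly 1e-4, about 1 in 10,000
print(f"small effect: q = {q_small:.2e}")  # considerably more common
```

Under these assumed parameters the strongly selected allele settles well below one in a thousand, while the weakly selected allele can persist at a much higher frequency, which is the qualitative pattern described in the text.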
At the same time, genome-wide association studies have shown that common variants with large effects are unlikely to exist in the human population for many disorders, although a large number of loci carrying alleles with much smaller ORs (< 1.2) remains plausible (see review in [91]). This is borne out in ASD: genome-wide association studies have identified just a few loci of interest, and these have largely failed to replicate between studies [7, 20, 21], whereas a meta-analysis suggests it is extremely unlikely that any common variant influences autism susceptibility with an OR greater than 1.5 [15, 22].
Here we used targeted, massively parallel sequencing of two X-linked genes, previously shown to harbor very rare point mutations causing ASD, to explore whether they might also have rare noncoding variants at evolutionarily conserved sites that act as ASD susceptibility alleles. Using this approach, we found a set of seven candidate variants, including three located in the 3’UTR, in the two genes examined among the 144 individuals sequenced (Table ). As a comparison, a search for similar variants at highly conserved sites among 1,094 individuals sequenced and deposited into the 1,000 Genomes database identified a total of 49 3’UTR variants (38 SNVs, 11 indels) in the NLGN3 and NLGN4X genes (Additional file 4). None of the indels were found in highly conserved regions. A total of seven SNVs were found at highly conserved sites (PhastCons > 0.7), and two of these variants had an estimated minor allele frequency of 0.001. The remaining five variants did not have an estimated minor allele frequency. In considering this comparison, it is important to note that because we sequenced our samples to a far greater depth than the 1,000 Genomes samples, our study had a greater probability of detecting rare variation.
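To make the point about sequencing depth concrete, the following minimal sketch computes the chance of seeing a variant allele at a single heterozygous site as a function of read depth. It assumes a deliberately simplified model that is not taken from this study: each read samples the variant allele with probability 0.5, reads are independent, sequencing error is ignored, and a variant is called only when at least `min_alt_reads` reads support it. The function name, the threshold, and the example depths are hypothetical and do not represent the actual coverage of either data set.

```python
from math import comb


def prob_detect_het(depth: int, min_alt_reads: int = 2) -> float:
    """Probability of observing at least `min_alt_reads` variant reads.

    Assumes a heterozygous site where the number of variant-supporting
    reads follows Binomial(depth, 0.5). Sequencing error, mapping bias,
    and hemizygosity of X-linked sites in males are ignored.
    """
    if depth < 0 or min_alt_reads < 0:
        raise ValueError("depth and min_alt_reads must be non-negative")
    # Number of equally likely read configurations with too few variant
    # reads; comb(n, k) is 0 when k > n.
    missed = sum(comb(depth, i) for i in range(min_alt_reads))
    return 1.0 - missed / 2 ** depth


# Hypothetical depths chosen only to contrast shallow and deep coverage.
for depth in (4, 10, 30, 100):
    print(depth, round(prob_detect_het(depth), 6))
```

Even under this idealized model, detection is far from guaranteed at shallow depth and approaches certainty at high depth, which is why a direct comparison of variant counts between deeply and shallowly sequenced cohorts should be read with caution.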
Functional analysis of the 3’UTR variants in a luciferase assay did not show a statistically significant difference in expression (Additional file 8). The most likely interpretation is that these variants do not influence the risk of autism in these probands. However, two points are worth noting. First, under a quantitative genetic model of autism, we would not expect to find noncoding variants with large effects (that is, monogenic causes of autism), and instead might expect to find many alleles at many different loci, each with modest effects [15, 92]. Second, our functional assays may be imperfect or insufficiently sensitive to reveal how these variants might act on their respective genes. Collectively, these results illustrate the challenges of functionally validating alleles with modest effect sizes, even though the great heterogeneity of autism implies that such alleles should exist.
Our most promising intronic variant (chrX:70291656) is located at a highly conserved site in a TFBS that has been associated with neuronal dysfunction (Figure B). The Bach1 transcription factor protects cells from damage by activating HO-1. Bach1 dysregulation has been associated with Down syndrome (DS): Bach1 is significantly overexpressed in the fetal cortex of DS fetuses when compared to controls [93], whereas in another study, expression was significantly reduced in the frontal cortex of DS patients. In Bach1 knockout mice, expression of Bach1 mRNA was significantly higher in the olfactory bulb, but lower in the cortex versus wild-type mice, providing another link to olfaction [94]. It is possible that the variant we found within the conserved TFBS influences olfactory neuron development and expression, which could contribute to the sensory dysregulation phenotype of ASD. Interestingly, the affected individual harboring this variant is intellectually disabled in addition to being autistic. He has been diagnosed with sensory abnormalities, including increased sensitivity to acoustic stimuli and decreased sensitivity to tactile stimuli. Still, our data do not directly demonstrate that this variant is functional, but they do predict that effects ought to be observed in a direct experiment (for example, ChIP-seq).
Compared to children without neurodevelopmental disorders, children with ASD demonstrate olfactory and taste dysfunction [95, 96]. Notably, in mice the NLGN3 gene is expressed in all neurons of the olfactory bulb [40]. We also identified an intronic variant (chrX:70284973) that falls within a highly conserved TFBS related to olfactory neuron development (Figure C). This variant is predicted to increase binding efficiency at this TFBS. The Roaz transcription factor regulates both the temporal and spatial pattern of olfactory neuronal gene expression by binding to a consensus recognition sequence and modulating transcriptional activity [81, 97]. Over 90% of children with ASD report sensory abnormalities, among them visual, auditory, tactile, and olfactory dysregulation (reviewed in [98]).
Our results highlight the importance of targeted sequencing of both coding and noncoding regions of candidate genes for complex, polygenic traits. Genetic studies of the X chromosome have suggested that both rare and common X-linked variation may contribute to ASD [16, 17, 31, 99-101], but much remains to be discovered. Although exome sequencing studies are now identifying point mutations, small indels, and de novo variants that contribute to ASD [35-37], these studies are limited by the regions included in their exome capture chips, as well as by biases in the capture efficiency of paralogous genes. Because of these constraints, such studies would have completely missed the noncoding variants we identified here. A study such as ours is also an important follow-up to exome studies, allowing the complete spectrum of genetic variation to be assessed in genes known to harbor ASD-contributing mutations. These genes are often in candidate pathways related to neuronal development and function, and identifying mutations in noncoding and regulatory regions will likely shed more light on the etiology of ASD. As ASD is a polygenic trait, noncoding mutations probably play a role in the genetic contribution to ASD, in combination with other forms of genetic variation, including CNVs, coding mutations, and gene-disruptive indels that affect pathways related to brain development [102]. Still, our study shows that functional testing of rare variants remains challenging and is not yet sufficiently high-throughput to be performed on a genome-wide scale, especially when the effect sizes are modest. Finally, as whole-genome sequencing becomes increasingly cost-effective and a more feasible experimental paradigm, detailed analyses of both coding and noncoding variation, as we have carried out here, can be expected to uncover ever more genetic variants that contribute to complex disorders like autism. These studies, however, will face significant challenges in direct functional testing of large numbers of these rare variants at evolutionarily highly conserved sites.