Several linkage studies have suggested that chromosomes 2q and 7q may harbor one or more genes contributing to the risk of developing an ASD. Here, we have presented a comprehensive high-density SNP genotyping, association and CNV study covering the 2q23.3–q32.3 and 7q21.3–q33 chromosome regions. We tested more than 3000 SNPs in each region, covering all known genes as well as highly conserved non-genic sequences.
The complementary case–control and family-based approach taken in our study allowed us to extract the maximum information from our sample, taking into consideration the advantages and disadvantages of the two approaches. Case–control studies are more powerful than family-based approaches but are sensitive to population stratification. Structure analysis using 50 genome-wide SNPs did not reveal strong population stratification, although we cannot exclude the presence of undetected low-level stratification. Family-based approaches are more robust to confounding by population stratification and, in addition, enable testing for parent-of-origin effects.
Although the strongest signals identified by the two approaches did not coincide, comparison of the results led us to pinpoint the most interesting loci supported by both methods, albeit with different strength. In addition, consistency of the results obtained by frequentist and Bayesian approaches suggested that our strongest signals are independent of the analysis method.
Primary association analysis of the chromosome 2 region identified the most interesting results in NOSTRIN and ZNF533. NOSTRIN encodes the nitric oxide synthase trafficker. Interestingly, the nitric oxide signaling pathway has recently been shown to be overrepresented in genes disrupted by CNVs in schizophrenia.47 However, the NOSTRIN association was stronger in the case–control analysis, with only minor support from the TDT, and it was not confirmed in the replication sample or in the combined meta-analysis, suggesting that it might represent a false-positive result.
Similarly, the ZNF533 association was not replicated; however, rs7590028 remained one of the strongest signals in the case–control combined analysis of IMGSAC samples. ZNF533 encodes a protein containing four matrin-type zinc fingers and is highly conserved in evolution. Given its putative nuclear location, it is thought to act as a repressor of transcription, although no specific targets are currently known. ZNF533 is widely expressed in adult tissues, including brain. Expression of all isoforms in fetal brain was confirmed by reverse transcriptase–PCR (data not shown). Deletions including ZNF533 have been described in several patients with a neurological phenotype including mental retardation,48, 49 and other zinc-finger genes have also been implicated in mental retardation cases.50, 51, 52 The zinc-finger gene ZNF804A was recently identified as the strongest result in a genome-wide association study of schizophrenia and bipolar disorder,53 suggesting that zinc-finger genes may act as transcription regulators in a wide range of human cognitive processes.
On chromosome 7, the most significant association result from the primary cohort was in the IMMP2L gene. Although SNPs in this gene failed to replicate in independent samples, the IMMP2L intronic SNP rs12537269 achieved the strongest result in the case–control meta-analysis of the IMGSAC sample (P = 7.3 × 10−5). This gene encodes an inner mitochondrial membrane protease-like protein and is a plausible candidate for autism, because it was previously reported to be disrupted in an individual with Tourette syndrome, a complex neuropsychiatric disorder showing phenotypic overlap with ASDs.54 IMMP2L contains a neuronal leucine-rich repeat gene (LRRN3) nested within its large third intron. The expression profile of LRRN3 also makes it an interesting candidate gene for autism, as it is most highly expressed in fetal brain. Studies in Drosophila demonstrate that many members of the LRR family play an essential role in target recognition, axonal pathfinding and cell differentiation during neural development,55 and murine studies suggest that these LRR proteins could have similar functions in mammalian neural development.56
The only SNP that achieved significant replication after Bonferroni correction for multiple testing was rs2217262 in the neighboring gene DOCK4, also a good autism candidate. This gene encodes a protein that activates Rac GTPase and is often deleted during tumor progression.57 A recent study in rats indicates that DOCK4 is predominantly expressed in the hippocampus as well as in the lung.58 This study further demonstrated that in cultured hippocampal neurons, DOCK4 is upregulated at the same time as dendrites start growing, and that knockdown of this gene by RNA interference results in impaired dendritic morphogenesis.
The association result for rs2217262 indicates that the common allele in the population is associated with increased risk for autism or, equivalently, that the minor allele is a ‘protective’ variant. It has been shown that, in the presence of missing data, SNPs with a low MAF may show a bias in the TDT, resulting in artificial overtransmission of the common allele.59 This problem is unlikely to apply to rs2217262, as the association was also supported by case–control analysis.
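To make the notion of allele overtransmission concrete, the following is a minimal sketch of the standard McNemar-type TDT statistic, computed from the number of times heterozygous parents transmit each allele to affected offspring. The function name `tdt_statistic` and the transmission counts are hypothetical illustrations; they are not data or software from this study.

```python
import math


def tdt_statistic(transmitted_common, transmitted_minor):
    """Return (chi-square, p-value) for a biallelic TDT.

    Counts are transmissions from heterozygous parents to affected
    offspring; homozygous parents are uninformative and excluded.
    """
    b, c = transmitted_common, transmitted_minor
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: common allele transmitted 60 times, minor allele 40 times
chi2, p = tdt_statistic(60, 40)
print(f"TDT chi2 = {chi2:.2f}, p = {p:.4f}")
```

Because the statistic depends only on the imbalance between the two counts, any systematic loss of minor-allele transmissions (for example through missing parental genotypes) would inflate it, which is the bias discussed above.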
Although only the rs2217262 association was confirmed by replication analysis, suggesting that the other results may represent false positives, this polymorphism (with a MAF of only about 5%) would not alone account for the linkage signal seen at AUTS1 in the IMGSAC sample. It is thus possible that multiple loci contribute to the overall linkage seen for this region, and that the other significant SNPs from the primary analysis are true signals with lower ORs, which our replication study was underpowered to detect. We recognize that several limitations may have affected our replication sample. The primary sample was composed of trios selected from multiplex families based on IBD sharing and was therefore more likely to be enriched for susceptibility alleles. By contrast, the replication population was a more heterogeneous sample, not preselected on linkage, and was mostly composed of singleton families. Power calculations suggested that our replication sample (IMGSAC-R and ND) should give us sufficient power to replicate the most significant primary results. However, the well-known ‘winner's curse’ also suggests that the effect sizes from the initial study may have been overestimated, so that a much larger sample would be required for replication. We did not detect the presence of structure in the combined IMGSAC primary and IMGSAC-R samples, but it is possible that heterogeneity is present among the different samples used in this study (ND, Mount Sinai and University of Washington). This could also have contributed to the lack of replication, as could gene–environment interactions if environmental exposures differ between population samples.
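The winner's curse argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of the present data: the true effect, standard error, significance threshold and the function name `winners_curse_demo` are all hypothetical. Many discovery studies of the same modest effect are simulated, and only those passing a stringent threshold are kept.

```python
import random
import statistics


def winners_curse_demo(true_effect=0.2, se=0.1, z_threshold=3.0,
                       n_studies=20000, seed=0):
    """Compare mean effect estimates before and after significance selection.

    Each simulated study reports an estimate drawn from a normal
    distribution centered on the true effect (e.g. a log odds ratio).
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Keep only studies whose z-score exceeds the discovery threshold
    selected = [e for e in estimates if e / se > z_threshold]
    return statistics.mean(estimates), statistics.mean(selected), len(selected)


all_mean, selected_mean, n_selected = winners_curse_demo()
print(f"mean estimate, all studies:      {all_mean:.3f}")
print(f"mean estimate, significant only: {selected_mean:.3f} (n = {n_selected})")
```

Under these assumptions, every selected estimate necessarily exceeds `z_threshold * se`, so the average among significant studies lies above the true effect; a replication sample sized from such an inflated estimate would have less power than planned.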
De novo and/or inherited CNVs are emerging as important causes of ASDs and other complex disorders.8, 11, 12, 13 Hence we exploited our dense SNP genotyping data to mine for structural variants. The most interesting discovery is the occurrence of deletions and duplications in four independent families in the IMMP2L/DOCK4 locus, given the coincident SNP association also seen for these genes. A maternal deletion was transmitted to both affected sons and the unaffected daughter in family 15-0084. In all other instances (two deletions and one duplication) the second affected sib did not inherit the CNV. Interestingly, the maternally segregating deletion extends to the 3′ end of the DOCK4 gene, whereas the non-segregating deletions and those identified in controls and in the DGV were limited to IMMP2L. Taken together, these data suggest that a copy number loss of DOCK4 may influence susceptibility to ASDs, whereas duplications may not be damaging. The effect of DOCK4 deletions might be less penetrant in females, as the mother and the unaffected daughter also carried the deletion. Larger studies will be needed to confirm this hypothesis.
The predominantly gene-based nature of our study represents a possible limitation, as we may have missed susceptibility alleles in intergenic regions. Recent findings from the ENCODE Consortium emphasize the importance of looking at noncoding sequence, as several functional elements in the genome seem to lie in these regions.60 We attempted to minimize this limitation by including several SNPs in non-genic evolutionarily conserved elements.
Our study also suggests that no common variants of large effect size are present within genic regions at AUTS1 and highlights the importance of very large sample sizes for identification of robust associations and rare CNVs with sufficient power for statistical significance. Evidence from recent genome-wide association studies for various disorders clearly shows that effect sizes for loci contributing to complex traits are generally lower than those predicted a few years ago.61 Several whole-genome association and CNV studies for autism are currently in progress by large consortia, and it will be interesting to see if any of the genes highlighted by this study are also identified by these extensive studies.
It is possible that rare variants, both point mutations and CNVs, may account for a larger fraction of the overall genetic risk in complex psychiatric disorders than previously assumed. The present study was not designed to assess the contribution of rare sequence variants and our results do not preclude that the chromosome 2q and 7q linkage regions may harbor rare variation showing allelic heterogeneity across families, which may require resequencing to uncover.
The inconclusive findings of this study reflect the status of the field of autism genetics and suggest that classical approaches such as linkage and association analyses alone may not be sufficient to deal with the genetic and phenotypic heterogeneity seen in autism. One recent study of note used homozygosity mapping to uncover a number of large homozygous deletions in consanguineous pedigrees, highlighting the utility of this approach for heterogeneous disorders like autism.10 Another successful study found linkage to 15q13.3–q14 in a subset of families with IQ 70, suggesting that the use of informative subphenotypes to define homogeneous sets of ASD families could be very important in detecting susceptibility loci involved in autism.62 Finally, another report indicated that the level of somatic CNVs between MZ twins may be higher than expected.63 If confirmed, this finding could provide a powerful tool for identification of autism susceptibility loci in MZ twins with a discordant phenotype. We believe a combination of these (and other) novel approaches, together with traditional methods, will be required to uncover all the genes and biological pathways leading to autism.
In summary, the present high-density SNP association and CNV screen has provided evidence that variants in the IMMP2L/DOCK4 locus on chromosome 7 and in ZNF533 on chromosome 2 may increase susceptibility to ASDs. Association of the common allele of SNP rs2217262 in DOCK4 was supported by an independent replication, whereas the associations in IMMP2L and ZNF533 are not sufficiently significant in the context of multiple testing and warrant further study.