In the present report, we focused on the DLX bigene clusters given their importance in forebrain development [18, 25] and in potential neurophysiological processes underlying autism [
10]. We sequenced the coding regions and flanking non-coding regions of DLX1, DLX2, DLX5, and DLX6 in 161 autistic probands and 58 non-autistic siblings. We also sequenced four intergenic enhancers (two within each cluster [28, 30]) and an enhancer sequence ~13 kb upstream of DLX1 (Ghanem and Ekker, unpublished observations). In the gene regions, we identified 28 variants, four of which were previously deposited in public SNP databases. We found five variants in DLX2 and DLX5 that are predicted to change or insert an amino acid in the protein. Three synonymous SNPs were also found in these genes. Interestingly, no coding sequence variants, synonymous or non-synonymous, were seen in DLX1 or DLX6. The low frequency of the non-synonymous variants precludes meaningful assessment of correlation between variant and disease in the families analyzed. For example, SNP-2 in DLX2 (Glu→Lys), SNP-6 in DLX2 (Ala→Thr), and SNP-6 in DLX5 (Ser→Pro) were each seen in single pedigrees (
Additional File 1). In all three cases, one affected offspring in the pedigree has the variant, while either a second affected member does not, or a non-affected sibling also has the variant. The insertion of a Serine residue between Serine 36 and Leucine 47 in DLX2 (InDel-1) occurred in three families, but was as frequent in autistic individuals as in non-autistic siblings, and was seen in 12 of 376 chromosomes in the polymorphism discovery sample. Interestingly, mouse and rat show tri-peptide insertions (Asn-Ser-Ser and Asn-Ser-Asn, respectively) at this site when compared to the human and chimp sequences. Finally, a variant predicting a serine to arginine change in DLX5 (SNP-7) was seen in three pedigrees. In two pedigrees, each containing three offspring, two affected individuals were heterozygous for the variant, while a non-autistic sibling was homozygous for the wild type allele. In the third pedigree, containing an affected sibling pair, one individual was heterozygous and the other was homozygous for the wild type allele. This variant was seen in one of 374 chromosomes in the polymorphism discovery sample, as well as in one of 380 chromosomes from the human variation panel samples. The non-synonymous changes from SNPs-2 and 6 in DLX2 and SNP-6 in DLX5 were not seen in 376 polymorphism discovery chromosomes or in the human variation panels. Our study design, involving resequencing of DLX family genes in autistic probands, has identified potentially interesting variants within these genes, but cannot provide statistically meaningful inferences about their effect in populations, given our ascertainment scheme using non-independent probands from multiplex families and a rather small number of non-affected siblings.
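As a rough illustration of how weakly a zero count in the comparison chromosomes constrains allele frequency, the following sketch computes sample allele frequencies from the counts reported above, together with the exact one-sided 95% upper bound on the frequency of a variant that was not observed in 376 chromosomes. It assumes that chromosomes behave as independent binomial draws, which is a simplification and not part of the analysis reported here; the function names are illustrative only.

```python
def allele_frequency(observed: int, chromosomes: int) -> float:
    """Sample allele frequency: variant chromosomes / chromosomes assayed."""
    return observed / chromosomes


def upper_bound_when_unseen(chromosomes: int, confidence: float = 0.95) -> float:
    """Largest allele frequency still compatible with zero observations.

    Assumption (not from the report): chromosomes are independent draws,
    so P(variant unseen) = (1 - p) ** n. Setting this equal to
    1 - confidence and solving for p gives the one-sided upper bound.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / chromosomes)


# Counts taken from the text above.
indel1_freq = allele_frequency(12, 376)      # DLX2 InDel-1, discovery sample
snp7_freq = allele_frequency(1, 374)         # DLX5 SNP-7, discovery sample
unseen_bound = upper_bound_when_unseen(376)  # variants absent from 376 chromosomes

print(f"InDel-1 frequency:        {indel1_freq:.4f}")
print(f"SNP-7 frequency:          {snp7_freq:.4f}")
print(f"Upper bound when unseen:  {unseen_bound:.4f}")
```

Under this simplification, a variant absent from 376 chromosomes could still have a population frequency of almost 0.8%, which is higher than the frequency at which SNP-7 was observed.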
Given the location of DLX1/2 and DLX5/6 in relation to autism linkage intervals, other groups have examined whether DNA variants in these genes are significantly associated with autism. The IMGSAC consortium conducted a sequence survey of DLX1 and DLX2 in 48 autistic probands [
32], which identified three variants in DLX1 that were not detected in our much larger sample and three of the DLX2 variants seen in our sample. One explanation for the non-overlap in findings is the low allele frequency of most variants found in both studies. For example, our SNP-1 in DLX1 was in a region also sequenced by the IMGSAC group, but our allele frequency was 0.2%, making it highly unlikely that it would be detected by assaying 96 chromosomes [
32]. The high frequency SNPs-1 and 4 occurred in both samples. Another reason may be the origin of samples. The AGRE samples are predominantly of U.S. origin [
41], while the IMGSAC samples are from a variety of geographically diverse countries [
39], raising the possibility of population-specific variants. This is particularly the case with rare variants, which are less likely to be shared across populations. Finally, differences in coverage of the gene (coding and non-coding regions), depth of coverage (i.e., sample size), or variant detection technology (e.g., direct sequencing in this report, versus variant detection with denaturing high performance liquid chromatography followed by direct sequencing by Bacchelli et al.) may explain the discrepancy in variants described. For example, our DLX1 SNPs-2 and 3 lie outside of the region assayed by IMGSAC.
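The sample-size argument above can be made concrete with a short calculation. The sketch below estimates the chance that a variant with an allele frequency of 0.2% appears at least once among 96 chromosomes (48 probands). It assumes independent sampling of chromosomes from a single population, which neither study strictly satisfies, and the function name is hypothetical.

```python
def detection_probability(allele_freq: float, chromosomes: int) -> float:
    """Probability of seeing a variant at least once in a sample.

    Assumption (not from either study): chromosomes are independent
    draws, so P(at least one copy) = 1 - (1 - allele_freq) ** chromosomes.
    """
    return 1.0 - (1.0 - allele_freq) ** chromosomes


# DLX1 SNP-1: allele frequency 0.2% in our sample; 96 chromosomes assayed by IMGSAC.
p_detect = detection_probability(0.002, 96)
print(f"Chance of detection in 96 chromosomes: {p_detect:.3f}")
```

Under these assumptions the chance of detection is roughly 17%, so the absence of this variant from the smaller survey is expected more often than not.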
In another recent study in 99 AGRE pedigrees and 308 other pedigrees, two SNPs in DLX2 were investigated [
44]. One of these, rs2228184, corresponding to our DLX2 SNP-4, a synonymous coding sequence variant, showed marginal association to autism. This common variant was equally common in autistic probands and their unaffected siblings in our study (
Additional File 2). The second SNP in the study by Rabionet et al., which was not associated with autism, occurs outside of the region we sequenced. These published data and our own do not provide support for the possibility that common variation in the DLX loci is associated with autism.
Another study focusing on DLX6 reported the existence of a CAG repeat in exon 1 of DLX6 after assaying 90 Caucasian samples [
45]. Although we sequenced the same region, we did not detect this variation. In any case, the DLX variants reported here are uncommon and, even in aggregate, are unlikely to provide the basis for any linkage signal in the DLX gene clusters on chromosomes 2 and 7.
A striking observation in our data was the lack of sequence diversity in the five non-coding regions we investigated. In the 4,000 bp encompassing the four intergenic enhancers and the DLX1/2 upstream regulatory element, we found only seven variants. However, given the sequence conservation of these functional elements [
28], this result is not surprising. Indeed, three of the four intergenic elements are included in 481 genomic segments greater than 200 bp with 100% conservation of human sequence with mouse and rat [
46]. In other words, 0.6% of known ultraconserved sequences can be found in the 0.001% of the genome representing the DLX1/2 and DLX5/6 clusters. These deeply conserved sequences that were not exonic were significantly enriched near genes involved with transcriptional regulation, and in particular, those with Homeobox domains (p < 10⁻¹⁴) [46].
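The percentages in this paragraph follow from simple proportions, sketched below using only the figures quoted above (three of 481 ultraconserved segments, and 0.001% of the genome). The fold-enrichment value is a derived illustration rather than a reported statistic, and the variable names are illustrative.

```python
# Figures quoted in the text above.
ultraconserved_total = 481         # segments > 200 bp, 100% human/mouse/rat identity
ultraconserved_in_dlx = 3          # intergenic DLX elements among them
genome_fraction_dlx = 0.001 / 100  # 0.001% of the genome, as a fraction

# Share of all ultraconserved segments that fall in the two DLX clusters.
share_in_dlx = ultraconserved_in_dlx / ultraconserved_total

# Derived illustration: enrichment relative to the genomic space occupied.
fold_enrichment = share_in_dlx / genome_fraction_dlx

print(f"Share of ultraconserved segments: {share_in_dlx:.2%}")
print(f"Approximate fold enrichment:      {fold_enrichment:.0f}")
```

Taken at face value, these figures imply that ultraconserved segments are several hundred times denser in the DLX clusters than a uniform distribution across the genome would predict.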
Of the variants identified, eight occurred in the coding regions of DLX2 and DLX5. The four variants that change the identity of an amino acid are non-conservative modifications: DLX2 SNP-2 (Glutamic acid to Lysine), DLX2 SNP-6 (Alanine to Threonine), DLX5 SNP-6 (Serine to Proline), and DLX5 SNP-7 (Serine to Arginine). In both DLX2 and DLX5, the amino acid substitutions lie in conserved regions of the proteins. DLX2 SNP-2 is just N-terminal of the homeodomain in a region conserved among the human DLX2,3,5 subgroup. This DLX2 residue is conserved among human, chimp, dog, rat, mouse, and fugu, while chicken, African clawed frog, and zebrafish contain the conservative Aspartic acid at the same position. The amino acids changed by DLX2 SNP-6, DLX5 SNP-6 and DLX5 SNP-7 are adjacent to a Proline-rich domain C-terminal to the homeodomain. The amino acids substituted by DLX5 SNP-6 and SNP-7 are invariant in 5 mammal species, chicken (except for the Serine changed to Arginine by SNP-7), and frog. DLX2 InDel-1 leads to the insertion of a seventh Serine residue into a six-residue polyserine tract within the conserved DLX2,3,5 DllA domain [
21,
47]. The functional significance of such a change is unknown.
The functional significance of the three synonymous SNPs (DLX2 SNP-4, DLX5 SNP-1 and DLX5 SNP-2) is uncertain, but cannot be summarily dismissed. For example, such "silent" variants can alter binding sites (exonic splicing enhancers, ESEs) for proteins involved in RNA splicing [
48]. ESEfinder, a web-based application designed to analyze exonic sequences to identify potential ESEs responsive to the human SR proteins [
49], predicts that each of these three variants alters the strength or presence of recognition sites for one or more of several highly conserved and structurally related splicing factors termed Serine/Arginine-rich (SR) proteins (data not shown). For instance, the C to T substitution of the synonymous DLX5 SNP-2 abolishes predicted binding sites for two of three SR proteins located in the region surrounding the SNP. The functional significance of this
in silico observation is unknown, but it highlights the potential importance of DNA variation that does not necessarily alter the primary structure of proteins. As is always the case with the analysis of rare variants, until the functionality of these variants is demonstrated, either through statistical differences in allele frequencies at the population level or through direct functional studies, these variants should not be considered disease mutations.
While the identification of variants that generate non-conserved amino acid changes in DLX2 and DLX5 in autistic people suggests that the DLX genes could contribute to autism susceptibility, there are limitations to our study. First, the DLX variants identified here are rare variants that could be expected to naturally occur across the human population. While this is a possibility given the low likelihood that any random gene is an autism susceptibility locus [
50], we believe that biological and genetic linkage data elevate the
a priori probability that the DLX genes analyzed here may be autism genes. Furthermore, the nature of the amino acid changes suggests that they could alter the function of the DLX proteins, although direct demonstration of this is currently lacking. A second limitation is that the rarity of the DLX variants precludes the possibility that they account for a significant portion of the genetic susceptibility to autism. A further weakness is that our study did not use a large-scale, population-based case-control design, which would allow a better estimation of the probability that these variants contribute to causing autism. A third limitation involves the use of unaffected siblings in our mutation screen. These siblings were rigorously phenotyped, but not having clinical autism does not rule out the possibility that they have milder traits representing aspects of the autism phenotype. Thus, it is possible that variants shared by affected and "unaffected" siblings may be functionally significant. In terms of co-occurring medical conditions, we found no evidence for such in five families segregating non-synonymous amino acid variants. A fourth limitation is that the population sample we used to compare the allele frequencies of the autism variants with a "normal" population was not optimal. We used the DNA Polymorphism Discovery Resource (PDR) sample, which is designed to mirror the sequence diversity of the human population. While this sample allowed us to examine a cross section of global genetic diversity, its use introduces two major limitations for the interpretation of our data. One is that the lack of knowledge of phenotypes in the PDR and human variation samples raises the possibility that we may falsely conclude that a variant seen both in autistic probands and the Coriell samples is not involved with autism when in fact, unbeknownst to us, the Coriell samples with the variant may come from individuals with autism or a related phenotype.
The second involves the relative enrichment of the PDR sample for non-Caucasian samples when compared to our autism sample, which is predominantly Caucasian. This under-sampling of Caucasians (and consequent lack of power to detect rare variants) in the PDR sample may cause us to falsely classify a rare variant as autism-related, when it may in reality be merely Caucasian-specific. This may indeed be the case for the two DLX5 non-synonymous variants, which were not seen in the PDR sample, but were seen in the Caucasian panel. Thus, by including the Caucasian and African-American human variation panels, we have observed that the variants may be Caucasian-specific, albeit at low allele frequencies.
Despite the caveats described above, we suggest that the non-synonymous DLX SNPs could contribute to autism susceptibility for several reasons. Mice lacking DLX1 have epilepsy [
26], a common feature in autistic patients. Heterozygosity of transcription factor mutations is well-known to cause human disease [
51]. In mice the dosage of the DLX genes is known to be important in controlling the differentiation of forebrain GABAergic neurons and morphogenesis of craniofacial structures, including the middle and inner ear. Indeed, heterozygosity of DLX2 alters morphogenesis of the skull (Depew and Rubenstein, unpublished), although it is uncertain whether heterozygosity of a DLX gene alters brain function. Given recent evidence that the DLX5 locus is partially imprinted in humans [
52], heterozygosity for DLX5 alleles could have profound ramifications. In our own pedigrees, we note that two of three pedigrees segregating the DLX5 non-synonymous SNP-7 show maternal transmission, while the single DLX5 SNP-6 pedigree showed paternal transmission (
Additional File 3). Increases in Dlx5 expression have been found in mice lacking MECP2 (the Rett Syndrome gene), which are associated with alterations in long-range chromatin organization [
53]. Therefore, several recent findings are increasing the likelihood that changes in DLX function/expression are involved in neuropsychiatric disorders.
The fact that 4.4% of autistic probands had non-synonymous DLX2 and DLX5 variants (5% when including the DLX5/6 intergenic enhancer variant) could reflect the multifactorial etiology of autism. Finally, perhaps the variants in DLX2, DLX5 and ARX [
16], all of which alter the development of forebrain GABAergic neurons [
18-25], are providing a clue that an increase in the ratio of excitation/inhibition underlies some forms of autism [
10]. Furthermore, it suggests that one should study other genes within genetic pathways that control the ratio of excitation/inhibition in neural circuits that regulate cognition, memory and emotion [
10].