Autism is an early-onset neurodevelopmental disorder belonging to a group of conditions known as autism spectrum disorders (ASDs), which includes classical autism, pervasive developmental disorder-not otherwise specified (PDD-NOS), and Asperger syndrome [1]. ASDs are genetically and phenotypically heterogeneous, with variable severity and symptomatology. The prevalence of autism spectrum disorders has risen in recent decades to 6.7 per 1000 children in the United States [2]. A diagnosis of autism requires significant impairments in three developmental domains: reciprocal social behavior, communication, and repetitive stereotypic behaviors or restricted interests [1].
Several candidate genes have been linked to this highly heritable disorder, but the etiology of most cases remains unknown. Linkage analyses for autism susceptibility loci have suggested the involvement of multiple genes on different chromosomes. Although several genome-wide linkage studies for autism have been completed, most of the loci identified have not been replicated. Furthermore, associations with several candidate genes have been reported and examined in subjects with autism, mostly without conclusive evidence. As a result, a number of nucleotide changes have been reported as autism susceptibility variants without subsequent replication. These inconsistent results could partly reflect the clinical heterogeneity and varying degrees of severity in ASD.
For example, in 2003 the first evidence of mutations in the coding sequences of two X-linked neuroligin genes, NLGN3 and NLGN4, was reported in individuals with autism spectrum disorders [3]. Neuroligins are cell adhesion proteins involved in the formation of neural synapses [4]. Electrophysiological studies of mutant neuroligins carrying deletions in either the cytoplasmic tail or the esterase-homology domain showed the critical role of the neuroligin genes in maintaining a functional balance between excitatory and inhibitory synapses in hippocampal neurons [5]. This finding led to the conclusion that neuroligin defects cause a selective loss of inhibitory function and an abnormal excitatory/inhibitory balance in neurons. Such a defect is believed to play a role in autism [5].
Despite strong supportive evidence for the role of these neuroligin genes in synaptic function, only a few causal mutations in the NLGN3 and NLGN4 genes have been identified in subjects with autism, suggesting that these mutations are not common and occur at a low frequency in the autistic population (less than 1%) [7]. Therefore, at the population level, the actual proportion of autism cases attributable to known genetic variants remains to be determined, since most identified genetic causes may each account for only a small effect. This is expected given the clinical heterogeneity and varying degrees of severity in this complex disorder, which demands the evaluation of multiple factors using integrated approaches.
Furthermore, genomic DNA copy number variations (CNVs), including small chromosomal deletions and duplications that may affect gene function, have recently been reported in association with complex disorders such as autism [16]. In a recent review, Cook and Scherer discussed the association of CNVs with neuropsychiatric conditions including ASD [20]. One conclusion of this review was that, while a de novo CNV is more likely than an inherited CNV to be pathogenic, the final causal effect of a CNV might be influenced by other cis- or trans-acting factors in a particular genomic environment, resulting in incomplete penetrance or variable expressivity for a given CNV. This suggests that, because of the complexity of neuropsychiatric disorders, the biological relevance of CNVs should be evaluated in an integrated context [20]. For a recent review of advances in autism genetics, see Abrahams and Geschwind [21].
Recent developments in molecular genetic technologies and knowledge have opened new avenues to explore, in particular for complex disorders. A good example is gene regulatory factors such as noncoding RNAs (ncRNAs), which are highly expressed in the nervous system [22]. An estimated 98% of the transcriptional output in humans and other mammals consists of ncRNAs that do not code for protein but have other functions in cells [23]. Four main groups of ncRNAs are microRNAs, snoRNAs, piRNAs, and siRNAs. A brief description of each type and its relevance to human disease is provided here.
MicroRNAs are small RNA molecules of approximately 22 nt that regulate gene expression by binding to the 3'-untranslated regions (3'UTRs) of target mRNAs, directing translational repression or transcript degradation [24]. It is estimated that up to 30% of human genes may be microRNA targets [25]. Small nucleolar RNAs (snoRNAs) direct the site-specific modification of nucleotides in target ribosomal RNAs (rRNAs) [26]. However, some snoRNAs, known as orphan snoRNAs, have no known rRNA targets. Two classes of snoRNA can be distinguished based on their conserved sequence motifs: H/ACA box snoRNAs and C/D box snoRNAs. The C/D box snoRNAs contain four conserved motifs called boxes C, C', D, and D', with a 10-21 nucleotide antisense element located upstream of the D and/or D' boxes.
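The layout described above, an antisense element lying directly upstream of a D or D' box, can be illustrated with a minimal Python sketch. It assumes the commonly cited D box consensus CUGA and treats D and D' identically. The function name, the default length limits, and the toy sequence in the usage note are hypothetical and chosen only for illustration; a real annotation tool would also check for the C box and other structural features.

```python
import re

# Assumed consensus for the D (and D') box, written in the RNA alphabet.
D_BOX = "CUGA"

def candidate_antisense_elements(sno_seq, min_len=10, max_len=21):
    """Return (box_start, upstream_segment) for every D box motif found.

    upstream_segment is the stretch of up to max_len nucleotides lying
    immediately 5' of the motif; segments shorter than min_len are skipped.
    """
    # Accept DNA or RNA input and normalise to upper-case RNA.
    seq = sno_seq.upper().replace("T", "U")
    hits = []
    # The lookahead reports every occurrence, including overlapping ones.
    for match in re.finditer("(?=%s)" % D_BOX, seq):
        start = match.start()
        segment = seq[max(0, start - max_len):start]
        if len(segment) >= min_len:
            hits.append((start, segment))
    return hits
```

For a toy sequence such as "GGAUGAUGA" followed by "ACGUACGUACGUACGUACGUA" and "CUGACC", the function reports a single D box at position 30 with the 21 nucleotides upstream of it as the candidate antisense element.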
One of the most studied snoRNAs in humans is HBII-52, located at chromosome 15q11 [27]. In addition to HBII-52, this chromosomal region contains several other paternally expressed (imprinted), brain-specific orphan snoRNAs [27]. However, complementarity to a given mRNA sequence has been reported only for HBII-52. The antisense element of HBII-52 exhibits an 18-nt complementarity to a region of the 5-HT2C mRNA that is subject to posttranscriptional RNA editing and to alternative splicing of exon Vb [27]. Subjects with Prader-Willi syndrome, a neurodevelopmental disorder involving a chromosome 15q11 abnormality, process 5-HT2C mRNA differently from healthy individuals, which may contribute to their clinical symptoms [28]. In an attempt to identify targets for other orphan snoRNAs, we have recently developed a computer program, snoTARGET [29]. According to our initial analysis using snoTARGET, there are potential target mRNAs for other orphan snoRNAs, which need to be verified using molecular and functional assays. This finding further underscores the importance of exploring the role of snoRNAs in human diseases.
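The idea of searching mRNAs for regions complementary to a snoRNA antisense element can be illustrated with a short Python sketch. This is not the snoTARGET algorithm, whose details are given in [29]. It is a simplified illustration that assumes strict Watson-Crick pairing (no G-U wobble pairs), sequences restricted to the letters A, C, G, and U (or T), and an optional mismatch allowance. All function and parameter names are hypothetical.

```python
# Watson-Crick partners only; G-U wobble pairing is deliberately ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_complementary_sites(antisense, mrna, max_mismatches=0):
    """Return (position, mismatches) for each mRNA window that can pair
    with the antisense element, allowing at most max_mismatches."""
    antisense = antisense.upper().replace("T", "U")
    mrna = mrna.upper().replace("T", "U")
    # A perfectly paired site reads as the reverse complement of the element.
    target = reverse_complement(antisense)
    width = len(target)
    sites = []
    for i in range(len(mrna) - width + 1):
        window = mrna[i:i + width]
        mismatches = sum(1 for a, b in zip(window, target) if a != b)
        if mismatches <= max_mismatches:
            sites.append((i, mismatches))
    return sites
```

Calling find_complementary_sites with a 12-nt element and a short toy mRNA returns the zero-based start of every window that pairs with the element. Candidate sites found in this way would still require the molecular and functional verification noted above.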
Piwi-interacting RNAs (piRNAs) are a newly discovered class of small RNAs, 26-31 nucleotides in length, that are expressed abundantly in spermatogenic cells [30]. The majority of piRNAs occur in clusters on one or both strands, designated monodirectional or bidirectional clusters, respectively. The biological function of piRNAs is not fully known, but their expression pattern indicates that they play roles in spermatogenesis and germline development [30].
Small interfering RNAs (siRNAs) are about 21 nucleotides in length and derive from double-stranded RNA (dsRNA), typically of transgenic, viral, or other exogenous origin [31]. In addition to exogenous siRNAs, endogenous siRNAs have been reported in plants, flies, and mammals [31]; however, endogenous siRNAs in humans remain to be discovered. An siRNA consists of a guide strand and a passenger strand. The guide strand binds to mRNA molecules, resulting in a knockdown in the levels of mRNA, protein, or both [31]. A brief analysis of the data available from the MIT siRNA database, which contains experimentally validated siRNAs [32], showed that several autism candidate genes are targets of exogenous siRNAs.
Multiple classes of ncRNAs are highly represented in the nervous system, which makes it likely that nervous system development and function depend heavily on RNA regulatory networks and that alterations of these networks may result in many neurological diseases. It is thought that ncRNAs may provide the key to a better understanding of the etiology of human diseases, particularly neurological diseases [33]. For example, dysregulation of microRNAs has been reported in association with Alzheimer's disease [34], Parkinson's disease [37], and Tourette's syndrome [38]. More recently, a study conducted by our group [39] and a report by Abu-Elneel et al. [40] suggested that microRNAs should be evaluated in the etiology of autism. The functional features and biological significance of ncRNAs therefore suggest that this class of gene regulatory factors should be considered in relation to complex disorders.
Fragile sites are another important genomic factor in human genetics. Fragile chromosome sites are nonrandom gaps or breaks of variable size that can appear spontaneously or after cells are exposed to chemical agents [41]. Based on their frequency in the general population, fragile sites are classified as common or rare [42]. One rare fragile site (FRAXE) is associated with a form of mental retardation and has also been reported as the most common cause of autism [43]. Analysis of the global distribution of fragile sites and microRNAs in relation to genomic regions involved in cancers indicated that microRNAs are frequently located at fragile sites and cancer-associated genomic regions [45]. These lines of evidence warrant further analysis of fragile sites in autism, using an integrated approach, to gain more insight into the possible role of this cytogenetic marker in relation to other contributing genetic factors.
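An integrated analysis of this kind reduces, at its simplest, to asking which genomic features fall within which regions. The following Python sketch shows one way to test for such overlaps. It assumes closed intervals on a shared coordinate system. The Feature record, the function names, and every name and coordinate in the usage note are hypothetical placeholders rather than real annotations.

```python
from collections import namedtuple

# A genomic feature with closed interval coordinates (start <= end).
Feature = namedtuple("Feature", "name chrom start end")

def overlaps(a, b):
    """True when two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def features_in_regions(features, regions):
    """Map each region name to the names of the features overlapping it."""
    return {
        region.name: [f.name for f in features if overlaps(f, region)]
        for region in regions
    }
```

For example, given a placeholder region Feature("site_A", "chrX", 1000, 5000) and placeholder features Feature("mir_1", "chrX", 4500, 4580) and Feature("mir_2", "chr2", 4500, 4580), features_in_regions reports only mir_1 under site_A, because mir_2 lies on a different chromosome.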
The growing list of autism susceptibility genetic factors and the need to explore the role of gene regulatory elements (e.g., ncRNAs) warrant the implementation of bioinformatics tools to facilitate a more comprehensive approach to evaluating this complex neurodevelopmental disorder. To make all reported genomic features associated with ASD (i.e., susceptibility genes and CNVs), and their potential relationships with other genomic features that affect human disease (e.g., ncRNAs [23] and fragile sites [46]), accessible to the scientific community, our research group designed the Autism Genetic Database (AGD), a freely available database.