|Home | About | Journals | Submit | Contact Us | Français|
Autism spectrum disorders (ASD) are neurodevelopmental disorders characterized by delayed/abnormal language development, deficits in social interaction, repetitive behaviors and restricted interests. The heterogeneity in clinical presentation of ASD, likely due to different etiologies, complicates genetic/biological analyses of these disorders. DNA microarray analyses were conducted on 116 lymphoblastoid cell lines (LCL) from individuals with idiopathic autism who are divided into three phenotypic subgroups according to severity scores from the commonly used Autism Diagnostic Interview-Revised questionnaire and age-matched, nonautistic controls. Statistical analyses of gene expression data from control LCL against that of LCL from ASD probands identify genes for which expression levels are either quantitatively or qualitatively associated with phenotypic severity. Comparison of the significant differentially expressed genes from each subgroup relative to the control group reveals differentially expressed genes unique to each subgroup as well as genes in common across subgroups. Among the findings unique to the most severely affected ASD group are 15 genes that regulate circadian rhythm, which has been shown to have multiple effects on neurological as well as metabolic functions commonly dysregulated in autism. Among the genes common to all three subgroups of ASD are 20 novel genes mostly in putative noncoding regions, which appear to associate with androgen sensitivity and which may underlie the strong 4:1 bias toward affected males.
Autism spectrum disorders (ASD) collectively refer to a group of related neurodevelopmental disorders characterized by delayed or abnormal language development, deficits in social interaction, and restricted, repetitive behaviors and interests. The heterogeneity in clinical presentation of ASD, thought to be due to different etiologies, complicates genetic and molecular analyses of these disorders. Recently, several types of cluster analyses of data from ADI, ADI-R, and ADOS test instruments have revealed distinct, albeit somewhat different, components within the ASD spectrum [Lord, Leventhal, & Cook, 2001; Silverman et al., 2001; Stevens et al., 2000; Tadevosyan-Leyfer et al., 2003; Tanguay, Robertson, & Derrick, 1998]. One study, which analyzed 98 items on the ADI/ADI-R from 292 individuals, identified six clusters of variables that include spoken language, social intent, compulsions, developmental milestones, savant skills, and sensory aversions [Tadevosyan-Leyfer et al., 2003]. By selecting for the “savant skills” phenotype, Nurmi et al. demonstrated greatly increased linkage to chromosome 15q11–q13 (HLOD~2.6) relative to that obtained with an unsegregated ASD population with an HLOD of ~0.8 [Nurmi et al., 2003]. Similarly, incorporating language phenotypes into linkage analyses identified markers on chromosome regions 7q and 13q [Bradford et al., 2001]. In a similar vein, stratifying the ASD population according to the severity of a specific core symptom (e.g., repetitive behavior) allowed quantitative correlations with altered neurotransmitter levels (5-HT and oxytocin) [Hollander et al., 2000]. Interestingly, deficient oxytocin levels were also linked to deficits in social recognition (a common symptom in ASD) in a study focused on the molecular basis of “social memory” in an animal model [Winslow & Insel, 2002]. In addition, quantitative trait loci (QTL) can be clearly associated with severely language-impaired and nonverbal communication-impaired endophenotypes of ASD [Alarcon, Yonan, Gilliam, Cantor, & Geschwind, 2005; Chen, Kono, Geschwind, & Cantor, 2006]. Most recently, several QTL were identified based upon severity scores on the Social Responsiveness Scale [Duvall et al., 2007], which was designed to quantitate deficits in social functioning typical of ASD, but which are present within the broader population as a social endophenotype [Constantino et al., 2003]. Thus, a strong argument has been made for subtyping or stratifying ASD subjects according to phenotype or severity within a specific domain as well as for including related individuals expressing the “broad autism phenotype” in genetic and molecular analyses. By reducing the variability in this heterogeneous population, a “phenotypic approach” is expected to facilitate the identification of genes and pathways contributing to, or associated with, a specific behavioral/symptomatic phenotype.
As described in the accompanying article [Hu & Steinberg, 2009], we have applied several clustering algorithms to scored data from the Autism Diagnostic Interview-Revised (ADIR) questionnaires from nearly 2000 individuals with idiopathic autism in an attempt to divide the autistic probands into phenotypic subgroups based upon severity across 123 ADIR items. Our approach differs significantly from that employed by other investigators in that our subgroups are defined by multiple items within different behavioral or functional categories, including spoken language, nonverbal communication, social skills, play skills, physical attributes and sensitivities, aggression, and savant skills, while many other studies utilize at most several item scores within a single category to define subgroups of individuals. Another aspect of our approach that differs from previous analyses is that we employ multiple clustering algorithms to the data, which results in a clearer and more intuitive phenotypic description of the subgroups. Using these combined methods to identify both severe and mild subgroups of ASD individuals as well as those with notable savant skills, we now demonstrate discrimination of autistic from nonautistic individuals based upon gene expression profiles. In addition, both qualitative and quantitative differences in gene expression are observed between the subgroups. Furthermore, we show that the several phenotypes of idiopathic autism can also be distinguished by pathway analyses, corroborating the distinct biological phenotypes of ASD.
Lymphoblastoid cell lines (LCL) for gene expression analyses were selected based upon phenotype analyses and exclusionary criteria described in detail in the accompanying manuscript [Hu & Steinberg, 2009]. In brief, autistic probands were clustered into four groups based upon 123 scores from the ADIR questionnaire. Representative samples from three of the groups were selected after excluding all females, individuals with cognitive impairment (Raven's scores <70), those with known genetic or chromosomal abnormalities (e.g., Fragile X, Retts, tuberous sclerosis, chromosome 15q11–q13 duplication), those born prematurely (<35 weeks gestation), and those with diagnosed comorbid psychiatric disorders (e.g., bipolar disorder, obsessive compulsive disorder, severe anxiety). Impairment in spoken language was additionally confirmed based upon low standard scores (<80) on the Peabody Picture Vocabulary Test.
The LCL were cultured as previously described [Hu, Frank, Heine, Lee, & Quackenbush, 2006] according to the protocol specified by the Rutgers University Cell and DNA Repository, which maintains the Autism Genetic Research Exchange (AGRE) collection of biological materials from autistic individuals and relatives. Briefly, cells are cultured in RPMI 1640 supplemented with 15% fetal bovine serum, and 1% penicillin/streptomycin. Cultures are split 1:2 every 3−4 days and cells are typically harvested for RNA isolation 3 days after a split while the cultures are in logarithmic growth phase.
Gene expression profiling is accomplished using TIGR 40K human arrays as previously described [Hu et al., 2006]. The annotation file for this microarray platform is GPL3427 in the Gene Expression Omnibus (GEO) archive. Total RNA was isolated from LCL using the TRIzol (Invitrogen, USA) isolation method according to the manufacturer's protocols, and cDNA was synthesized, labeled, and hybridized to the microarrays as described in our earlier study, with the exception that cDNA from each sample was labeled with Cy-3 dye and hybridized against Cy-5 labeled reference cDNA prepared from Universal human RNA (Stratagene, USA). This “reference” design allows the flexibility to perform different comparisons among the samples since all expression values are against a common reference. After hybridization, washing of the arrays, and laser scanning to elicit dye intensities for each element on the array, the raw intensity data was normalized and filtered using the Lowess normalization and standard deviation (variance) regularization programs [Quackenbush, 2002] contained within MIDAS [Saeed et al., 2003]. Prior to and following normalization and filtering of microarray data, LOWESS plots were visually inspected for skewness and artifacts. Additional microarray quality control assessment included principal components analysis (PCA) of the entire microarray data set to identify potential outliers. As no outlier experiments were detected, the entire data set was included for statistical analysis. The normalized intensities were then uploaded into MEV [Saeed et al., 2003], and a 70% data filter was applied, which removes genes for which log2 intensity values are missing in >30% of the samples (>35 out of 116 samples). This filtering step, when applied to data from 116 samples, reduces the number of transcripts from 41,472 to 25,221, which are then used for the statistical analyses in MEV. Significant differentially expressed genes were identified using the Significance Analysis of Microarrays (SAM) [Chu, Weir, & Wolfinger, 2002; Tusher, Tibshirani, & Chu, 2001] module within MEV for both 2-class and 4-class analyses, with a maximum false discovery rate (FDR) of 5%. Among our control subjects, there were 6 who are siblings of individuals in the savant group and 1 who is a sibling of an individual in the mild group. Because genotype has been shown to influence gene expression profiles [Cheung & Spielman, 2002; Cheung et al., 2003; Morley et al., 2004], we removed all controls who were siblings of the respective autistic probands prior to each SAM analyses.
Select genes were confirmed by real-time RT-PCR on an ABI Prism 7300 Sequence Detection System using Invitrogen's Platinum SYBR Green qPCR SuperMix-UDG with ROX. Total RNA (same preparations used in microarray analyses) was reverse transcribed into cDNA using the iScript cDNA Synthesis Kit (Bio-Rad, Hercules, CA). Briefly, 1 μg of total RNA was added to a 20 μl reaction mix containing reaction buffer, magnesium chloride, dNTPs, an optimized blend of random primers and oligo (dT), an RNase inhibitor and a MMLV RNase H+reverse transcriptase. The reaction was incubated at 25°C for 5 min followed by 42°C for 30 min and ending with 85°C for 5 min. The cDNA reactions were then diluted to a volume of 50 μl with water and used as a template for quantitative PCR.
Quantitative RT-PCR (qRT-PCR) primers for genes identified by microarray analysis as differentially expressed were selected for specificity by the National Center for Biotechnology Information Basic Local Alignment Search Tool of the human genome, and amplicon specificity was verified by first-derivative melting curve analysis with the use of software provided by PerkinElmer (Emeryville, CA) and Applied Biosystems (Foster City, CA). Sequences of primers used for the real-time RT-PCR are given in Supplemental Table I.
qRT-PCR analyses were performed on all samples, with quantification and normalization of relative gene expression using the comparative threshold cycle (Delta-delta-Ct) method [Livak & Schmittgen, 2001] as described previously [Hu et al., 2006]. The expression of the “housekeeping” genes MDH1, ARF1, and ACSL5 were used for normalization as these genes did not exhibit differential expression in our microarray assays. The qRT-PCR reactions were done in triplicate.
Lymphoblastoid cells from a nonautistic individual (HI0813) were cultured in RPMI media with 15% FBS and 1% antibiotics at 37°C and 5% CO2. Prior to treatment, the cells were transferred to phenol red free RPMI media with 15% charcoal dextran-treated serum and 1% antibiotics for 24 hr to deplete the medium of endogenous hormones [Arnold, Le, McFann, & Blackman, 2005]. Dihydrotestosterone (DHT) was dissolved in ethanol (final ethanol concentration not exceeding 0.001%) and added at a final concentration of 10 nM. The cells were harvested 2 days after treatment with DHT and RNA was isolated using TRIzol reagent (Invitrogen). Quantitative PCR was performed as described above using primers for seven novel transcripts (Supplemental Table I). The qRT-PCR reaction for each gene was done in triplicate and average expression values were calculated relative to the vehicle-treated samples. In this study, ARF1, ACSL5, and MDH1 were used for normalization of the queried genes. All log2(fold change) values for the seven genes are statistically significant with P-value <0.05.
The data sets of differentially expressed genes between autistic probands and unrelated controls were analyzed using Ingenuity Pathway Analysis (IPA) and Pathway Studio 5 to identify relational gene networks, high-level functions, and small molecules associated with the gene regulatory networks. For pathway analyses using IPA, the Fisher Exact Test was used to identify significant pathways and functions. To determine P-values, the number of genes with a particular function within the experimental data set is compared to the number of genes with that function in Ingenuity's Knowledge Base, which is a comprehensive database containing manually curated and published information about the functions and interactions of millions of genes and gene products. Unless otherwise noted, we applied an expression cutoff of log2 (ratio)≥±0.3 to the data before pathway analyses as we are generally able to confirm by qRT-PCR analyses average expression values of this magnitude.
The P-values for the enrichment of differentially expressed genes in QTL were calculated based on the hypergeometric test [Johnson, Kotz, & Kemp, 1992], which was chosen because of its computational efficiency to calculate the statistical significance of an observed number of overlapped genes given the number of selected genes and the reference pattern. This test has been widely used in large-scale biomedical studies to evaluate whether an observation is significantly enriched when compared to a reference pattern [Chen et al., 2007; Falcon & Gentleman, 2007]. Only genes with complete chromosome start/end information were included in this analysis. An overlap between a gene and a QTL was determined based on the following criteria: (1) they are on the same chromosome; (2) the gene's start position is between the QTL's start and end positions, or the QTL's start position is between the gene's start and end positions. Of the 40,032 transcripts on the microarray, 29,723 have complete chromosome start/end information and 9,734 of these genes overlap with the QTLs according to our stated criteria.
All microarray data is reported according to the MIAME standards and has been submitted to the GEO repository for public access. The GEO accession number is GSE15402.
In the accompanying manuscript [Hu & Steinberg, 2009], we report on a novel clustering method for stratifying autistic individuals according to phenotypes, which encompass 123 scores on 63 distinct items on the ADIR questionnaire, most of which are represented by two separate scores related to “current” (existing) or “ever” (previously exhibited) behaviors. In this manuscript, the gene expression profiles of LCL of individuals from three of the four phenotypic subgroups that resulted from the cluster analyses of ADIR scores were analyzed and were shown to demonstrate different functions overrepresented within the different subgroups that are suggestive of distinct “biological phenotypes.” To test the concept that the phenotypic subgroups can be differentiated from each other by gene expression profiles, we analyzed the subgroup with severe language impairment and high severity scores across most of the ADIR items used for clustering (except for savant skills), the mild subgroup comprised of individuals many of whom were clinically diagnosed with PDD-NOS or Asperger's Syndrome who exhibited distinctly lower severity ADIR item scores, and the subgroup of individuals with noticeably high scores in the savant skills categories to identify genes that may be associated with this unusual and interesting trait. Included within the savant group were some individuals who, despite exhibiting pronounced savant skills, were also language impaired. The demographic profile of the individuals whose LCL were included in this study is provided in Supplemental Table II, with additional information (e.g., scores on the Peabody Picture Vocabulary Test and Raven's Test) provided in the accompanying manuscript. The intermediate subgroup was not included in this study because we wanted to be able to first demonstrate differences between phenotypic groups at the extreme ends of the spectrum.
Gene expression profiles of LCL from each of the autistic individuals studied and age-matched controls were obtained by cDNA microarray analyses. A 2-class analysis of the data reveals a set of significant differentially expressed genes (FDR≤5%) that distinguish controls from all autistic samples (Fig. 1A and Supplemental Table III). Interestingly, when the samples from the autistic individuals are grouped according to phenotype, the gene expression matrix from this analysis reveals a gradient in differential gene expression for some genes in which the level of gene expression reflects the overall severity of the ASD phenotype relative to controls (Fig. 1B). Separation of the three ASD phenotypes from each other as well as from controls was also revealed by a 4-class SAM analysis of the microarray data (FDR≤0.1%) from all individuals. An expression matrix of the top 123 most significant genes from this 4-class analysis illustrates quantitative as well as qualitative differences in gene expression profiles that relate to “phenotype” (Fig. 2A, Supplemental Table IV). PCA was applied to this set of significant genes to reduce the dimensionality of the microarray data with respect to individual samples (Fig. 2B), and confirms that phenotypic subgroups can be differentiated from each other as well as from nonautistic controls, although there is still some mixing of the phenotypes, particularly between the “savant” group (yellow) and controls (turquoise), which are most similar in terms of gene expression profiles (Fig. 2A). The differences in gene expression profiles among several ASD phenotypes thus emphasize the need to identify and utilize more homogeneous samples for biological analyses.
Toward this goal, we treated each ASD subgroup as a separate class and performed 2-class statistical analyses on the gene expression data obtained from each of the groups in comparison to nonautistic controls to identify the differentially expressed genes that were specific to each subgroup. Figure 3A–C show the PCA plots demonstrating separation of individuals from each of the ASD subgroups and controls on the basis of gene expression profile. Lists of the genes included in the respective PCA analyses are provided in Supplemental Tables V–VII.
The Venn diagram in Figure 4 displays the overlap in differentially expressed genes among the three ASD subgroups. Pathway analysis of the overlapping genes between the L and M subgroups with Pathway Studio 5 software reveals a network of genes that affect common functional targets, such as synaptic transmission, neurogenesis, neurulation, long-term potentiation (learning), protein ubiquitination, and brain function that have been identified as dysfunctional in ASD (Supplemental Fig. 2). Of additional interest are the disorders associated with this set of genes, including inflammation, epilepsy, muscle function (hypotonia), disorders of intestinal absorption, and diabetes mellitus, which have been reported in subsets of individuals with ASD [Herbert et al., 2006; Hollander & Nowinski, 2003; Levy et al., 2007; Palmen, van Engeland, Hof, & Schmitz, 2004; Pardo, Vargas, & Zimmerman, 2005; Polleux & Lauder, 2004; Schultz et al., 2008]. Ingenuity Pathways Analysis software was used to identify the top multigene network (Fig. 5), with the most significant functions, pathways, and associated disorders listed in Table I. Interestingly, many of the genes in the network shown in Figure 5A are involved in neurological functions and disorders (Table II), with the predominant functions being neurite outgrowth, axon pathfinding, neurogenesis, and brain morphology. Thus, qRT-PCR was used to confirm several of the genes in this network (Fig. 5B). Table III lists the overlapping significantly differentially expressed transcripts across all three ASD subgroups (see Fig. 4). What is intriguing about this set of genes is that all 20 transcripts are novel and uncharacterized transcribed sequences, 19 of which map to intronic or intergenic regions. The majority of these transcripts also appear to be associated with cellular response to androgens as revealed by gene expression studies on Androgen Insensitivity Syndrome and androgen-sensitive and -insensitive prostate tumors, which are reported in GEO [Holterhus, Hiort, Demeter, Brown, & Brooks, 2003; Zhao et al., 2005]. To examine the possibility that some of these genes are androgen responsive, LCL were treated with dihydroxytestosterone, a potent metabolite of testosterone, and the levels of these transcripts were quantitated relative to vehicle-treated cells. As shown in Figure 6, all seven of the tested transcripts exhibited a response to DHT treatment. Further studies are needed to identify and fully characterize the nature of these possibly noncoding transcripts.
To understand the differences in the pathways and functions that are affected in each of the phenotypic subgroups, separate pathway and functional analyses were conducted on the gene data sets derived from comparison of the respective phenotypes vs. controls. Table IV summarizes the results obtained from IPA for each group in terms of categories of molecular and cellular functions, canonical pathways, and toxicity that are significantly enriched within the differentially expressed gene data set. It is clear from this summary that biological functions and pathways are most altered in the severely language-impaired subgroup and least altered in the “savant” subgroup. Among the genes relating to molecular and cellular functions, cell death genes are overwhelmingly overrepresented in the subgroup with severe language deficits, while genes involved in small molecule biochemistry, free radical scavenging, and cellular function and maintenance are overrepresented in the differentially expressed gene data set of the mild phenotype. It is interesting that RNA posttranscriptional modification is a highly significant molecular function enriched in the savant phenotype, as variation in RNA splicing has been noted to be associated with autism [Talebizadeh et al., 2006]. Among the genes involved in specific canonical pathways are those related to liver toxicity (hepatic fibrosis and stellate cell activation), which are enriched in both the severely language-impaired and the mild subgroups, but with more genes and hepatic cholestasis also affected in the severe subgroup. It is proposed that the dysregulation of at least some of these genes may be responsible for gastrointestinal disorders that are often associated with autism. Further comparison of the severe and mild subgroups on the basis of genes that are significantly enriched for neurological functions and disorders revealed not only differences in the number of genes associated with cell death in the severely language-impaired subgroup, but also a greater number of differentially expressed genes involved with various neurological disorders commonly associated with autism, such as allodynia, catalepsy, hypernocieption, and epilepsy (Table V). Particularly noteworthy are the 15 genes that are involved in the regulation of circadian rhythm, which also affect many of the neurological functions and disorders commonly associated with ASD, such as synaptic plasticity, learning, memory, inflammation, cytokine production, and digestion (Fig. 7) [Herbert et al., 2006; Palmen et al., 2004; Pardo et al., 2005; Persico & Bourgeron, 2006; Polleux & Lauder, 2004]. Differential expression of these genes was observed only in the samples from severely language-impaired individuals (L subgroup), with each individual showing altered expression of multiple (but not all 15) genes. We have confirmed differential expression of 6 genes (out of 6 tested) by qRT-PCR (Table VI).
Gene expression analyses indicate that there are hundreds to thousands of genes that are differentially expressed between LCL of nonautistic individuals and those of each of the three ASD subgroups studied. To investigate whether these genes bear any relationship to genetically identified autism susceptibility loci, we mapped the differentially expressed genes to QTL reported by seven laboratories [Alarcon et al., 2005; Bailey et al., 1998; Chen et al., 2006; Duvall et al., 2007; Philippe et al., 1999; Szatmari, Paterson, Zwaigenbaum, Roberts, Brian, Liu et al., 2007b; Weiss et al., 2008]. On average, about 27−33% of the differentially expressed genes are associated with autism QTL across all subgroups and the autistic samples combined (Fig. 8). The hypergeometric distribution test was applied to determine whether there is statistically significant enrichment of differentially expressed genes in QTLs within the combined autistic sample (A) as well as within the individual ASD subgroups. The analyses showed that there is indeed enrichment of differentially expressed genes in the selected QTLs for all groups, except for the language (L) subgroup, where the P-values were 0.001, 0.266, 0.001, and 0.003, for the A, L, M, and S gene data sets reported in Supplemental Tables III–VI, respectively. With respect to distribution of differentially expressed genes in QTL on specific chromosomes, there is significant enrichment (P<0.025) of differentially expressed genes in QTLs on chromosomes 7, 10, 16, 17, and 22 for the combined autistic group (530 genes at <5% FDR), as shown in Figure 8. In addition, there is enrichment in QTLs on chromosomes 2, 4, 7, 8, 10, 16, 17, 19, and 22 for the L subgroup (P<0.05), chromosomes 7, 16, and 17 for the M group (P = 0.0001), and on chromosomes 7 and 16 for the S subgroup (P<0.0015) according to Chi square tests. It is notable that all of these chromosomes have undergone intensive genetic analyses as “hot spots” with respect to autism [Auranen et al., 2000; Balciuniene et al., 2007; Buxbaum et al., 2001; Kumar et al., 2008; McCauley et al., 2005; Santangelo et al., 2000; Schellenberg et al., 2006; Stone, Merriman, Cantor, Geschwind, & Nelson, 2007; Wassink et al., 2008]. Thus, the layering of gene expression data onto genetic data may be a useful means of prioritizing candidate genes within the candidate susceptibility loci for further functional and genetic analyses.
Genetic and other biological analyses of idiopathic autism, which makes up at least 70−80% of ASD cases have been hampered by the inherent heterogeneity of presentation of ASD in different individuals which, in turn, increases the noise in the experimental data. We therefore sought to reduce the phenotypic heterogeneity of clinical samples obtained through the AGRE/NIMH tissue repository by subgrouping/stratifying individuals based upon cluster analyses of 123 of their respective ADIR scores on 63 items, some of which are queried twice, with respect to current behaviors and previously exhibited behaviors. While other studies have utilized several ADIR item scores within a specific domain (e.g., spoken language, nonverbal communication, social skills, or repetitive behaviors) to stratify ASD individuals for genetic analyses [Alarcon et al., 2005; Chen et al., 2006; Duvall et al., 2007; Hollander et al., 2000], this is the first study to subgroup individuals on the basis of ADIR item scores that reflect the full range of deficits commonly associated with ASD. We demonstrate that the gene expression profiles associated with each of the 3 ASD phenotypes that were selected for DNA microarray analyses show both qualitative and quantitative differences, which are dependent on ASD phenotype. We also show overlap of some of the differentially expressed genes among subgroups, which indicates common underlying biological deficits in ASD as well as differences that suggest dysregulation of specific pathways or functions in a particular subgroup of ASD.
We demonstrate here that the gene expression profiles associated with each of the three ASD phenotypes that were selected for DNA microarray analyses show both quantitative and qualitative differences, which are dependent on ASD phenotype. The quantitative differences that were revealed in a 2-class analysis of the gene expression profiles of all autistic probands vs. controls were particularly surprising and likely identify genes that influence the severity of ASD. Such genes would thus serve as good candidates for expression QTL (eQTL) analyses, which, in turn, will help to prioritize genes for in-depth genetic association and linkage studies. It should be pointed out that the gradient of gene expression in Figure 1B is only apparent when the samples are clustered according to ASD subtype, thus validating the value of our clustering methods, which were applied to 123 selected ADIR item scores. Genes whose expression levels are qualitatively, but not quantitatively, dependent on subgroups (as represented by subgroup-specific genes in the Venn diagram of Fig. 4) also present a strong case for subtyping ASD individuals according to our methods since averaging gene expression values across all samples would dampen the overall expression differences from controls and obscure the biological differences between the subgroups. It is therefore suggested that such clustering of individuals to reduce the phenotypic heterogeneity of the study groups will also be of value to genetic and other biological analyses of ASD.
The Venn diagram in Figure 4 summarizes the number of overlapping differentially expressed genes among the three ASD subgroups, with the largest overlap occurring between the severe (L) and mild (M) subgroups. Among the major functions associated with this set of overlapping genes are apoptosis and inflammation, as well as many neurological and metabolic processes commonly associated with ASD, such as central nervous system development, synaptic transmission, brain function, neuronal death, protein ubiquitination, RNA splicing, and oxidative stress (Supplemental Fig. 2). Genes, which were confirmed by qRT-PCR analyses (ITGAM, NFKB1, RHOA, SLIT2, and MBD2) are all strong candidates for further evaluation of their role in ASD (Fig. 5). ITGAM is involved in synapse formation and neuron toxicity, and is associated with chronic neural inflammation and microglial activation, which has been indicated in previous reports [Pardo et al., 2005; Vargas, Nascimbene, Krishnan, Zimmerman, & Pardo, 2005]. Similarly, the transcription factor NFKB1 is also a key regulator of inflammatory responses, which have been associated with ASD [DeFelice et al., 2003; Jyonouchi, Sun, & Le, 2001; Zimmerman et al., 2005]. RHOA and SLIT2 are components of the synpatogenesis/axon guidance pathway, which is strongly implicated in ASD [Jamain et al., 2003; Matzke et al., 2007; Persico & Bourgeron, 2006; Szatmari, Paterson, Zwaigenbaum, Roberts, Brian, Liu et al., 2007a]. These biological processes (inflammation, axon guidance) as well as others shown in Supplemental Figure 2 (e.g., apoptosis, neurogenesis, cell survival, and sex determination) replicate those identified in our previous gene expression studies of monozygotic twins discordant in diagnosis or severity of autism [Hu et al., 2006] and autistic–nonautistic sib pairs [Hu, Nguyen, Kim, Steinberg, Sarachana et al., 2009]. Altered expression of MBD2, a methyl-CpG binding protein, suggests the role of epigenetic factors in ASD. Indeed, several mutations have been identified in this family of transcriptional regulator proteins in autistic patients [Li, Yamagata, Mori, Yasuhara, & Momoi, 2005] and MeCP2, in particular, is responsible for Rett's Syndrome, a genetically defined ASD [Amir et al., 1999; Van Den Veyver & Zoghbi, 2001]. Our previous observation of gene expression differences between monozygotic twins discordant in diagnosis or severity of autism further supports the role of epigenetic regulation in ASD [Hu et al., 2006].
The most intriguing of the overlapping genes are the 20 novel genes that are shared by all three ASD groups because of their potential importance to core symptoms or pathology of ASD (Table III). As mentioned earlier, most if not all of these highly significant differentially expressed genes have been observed to be differentially regulated within the context of androgen insensitivity [Holterhus et al., 2003; Zhao et al., 2005]. This, in itself, is very interesting because of the hypothesis that higher levels of fetal testosterone may be a risk factor for ASD [Baron-Cohen, Knickmeyer, & Belmonte, 2005; Knickmeyer, Baron-Cohen, Raggatt, & Taylor, 2005; Knickmeyer & Baron-Cohen, 2006]. In fact, there is experimental support for this hypothesis, both from analysis of serum levels of androgens in individuals with ASD [Geier & Geier, 2006; Ingudomnukul, Baron-Cohen, Wheelwright, & Knickmeyer, 2007], as well as from our own studies [Hu et al., 2009], which show dysregulation of genes within the steroid hormone biosynthetic pathway in LCL from ASD probands as well as higher testosterone levels in their LCL extracts relative to their respective nearly age-matched siblings. From the location of these genes in the intronic regions of known genes and intergenic regions of the DNA, it is highly possible that at least some of these transcripts are noncoding RNA, some of which are known to play critical roles in neurodevelopment and function [Kapsimali et al., 2007; Kosik, 2006; Rogaev, 2005]. Although differentially expressed microRNA has been observed in both brain [Abu-Elneel et al., 2008] and LCL [Talebizadeh, Butler, & Theodoro, 2008; Sarachana et al., unpublished data] from autistic probands vs. nonautistic individuals, the novel transcripts do not include these microRNA and have not been found in several microRNA databases. The nature of the genes with which these transcripts are associated may reveal some of the functions impacted by autism, if the intronic transcripts are regulators of the respective genes. One example of a gene containing such a transcript is the Jumonji-domain-containing histone demethylase JMJD2C (GASC1) [Cloos et al., 2006; Pollard et al., 2008], which has been identified as an androgen receptor coregulator whose transcript is modestly affected by DHT [Urbannuci, Waltering, Suikki, Helenius, & Visakorpi, 2008]. Interestingly, deletions of chromosome 9p24 in the region of this gene (as well as DMRT1 and DMRT3) have been associated with autism or autistic-like behaviors [Vinci et al., 2007]. Because all of the novel transcripts are decreased in LCL from autistic individuals relative to controls, another possible explanation to consider is defective splicing [Talebizadeh et al., 2006] or excision of introns, which may be related to an overall decrease in expression of multiple splicing factors observed in our studies (Supplemental Tables VI–VIII). Clearly, more research is needed to identify and characterize these novel transcripts as well as to demonstrate their function within the context of ASD.
Subtyping of ASD individuals prior to gene expression analyses also revealed differentially expressed genes that were unique to each subgroup. Fifteen circadian rhythm regulatory or responsive genes were among the genes identified as differentially expressed in the most severe (L) subgroup but not in the mild or savant subgroups, suggesting a connection between dysregulation of circadian rhythm and the severity of this phenotype. In 2002, Wimpory et al. proposed a relationship between social timing, “clock” (circadian rhythm) genes, and autism [Wimpory, Nicholas, & Nash, 2002], and more recently demonstrated genetic association of PER1 and NPAS2, but not other circadian rhythm genes, with autistic disorder [Nicholas et al., 2007]. This very interesting hypothesis is based in part upon the prevalence of sleep disorders in ASD, which suggest deficits in the regulation of circadian rhythm [Johnson & Malow, 2008; Malow, 2004]. A recent report that Fragile X-related proteins regulate transcriptional activity of the clock genes provides additional experimental support for the involvement of circadian rhythm in ASD [Zhang et al., 2008]. Bourgeron has further proposed a connection between clock and synaptic genes (NLGN3, NLGN4, NRXN1, and SHANK3) in ASD [Bourgeron, 2007]. He also pointed out the importance of gene dosage in the balance of excitatory and inhibitory signaling at the synapse and suggested the possible importance of the circadian rhythm in controlling such signaling and hence the severity of ASD. The significance of gene dosage effects (which can be manifested by altered gene expression) as contributors to ASD are emphasized by recent studies, which show that copy number variants can be associated with both familial and spontaneous forms of ASD [Sebat et al., 2007; Szatmari, Paterson, Zwaigenbaum, Roberts, Brian, Liu et al., 2007b]. Our network analysis of the 15 circadian rhythm genes that are differentially expressed only in the severe ASD group shows the relationships between these genes and many neurological functions as well as disorders typically observed in ASD (Fig. 7). It should be mentioned that multiple genes (though not all 15) are differentially expressed in each individual, suggesting a multi-hit mechanism of dysregulation of the circadian rhythm in the most severe phenotype of ASD.
Among the genes confirmed by qRT-PCR are AANAT, BHLBH2 (DEC1), CRY1, NPAS2, PER3, and DPYD. A significant decrease is observed for AANAT, an enzyme which catalyzes the rate-limiting first step of the biochemical conversion of serotonin to melatonin, a key regulator hormone of the circadian cycle. A reduction in this enzyme would be consistent with the abnormally low levels of melatonin, which have been reported in a number of studies of autistic patients [Nir et al., 1995], which may be associated with the high incidence of sleep disorders reported in ASD [Johnson & Malow, 2008]. Related to the decrease in melatonin in ASD, Melke et al. have recently reported two polymorphisms in the promoter of the gene for acetylserotonin methyltransferase (ASMT), the last enzyme in the synthesis of melatonin, which are associated with decrease in ASMT transcripts and reduced ASMT activity [Melke et al., 2008]. Overexpression of BHLHB2/DEC1, which regulates the expression of the master circadian regulator genes CLOCK and BMAL1, has also been shown to delay the phase of several clock genes (e.g., DEC1, DEC2, and PER1), which contain E boxes in their regulatory regions [Nakashima et al., 2008; Sato et al., 2004]. CRY1 and PER3 are also transcriptional modulators of CLOCK/BMAL1, while NPAS2 is a CLOCK analog expressed primarily in brain tissues [Bertolucci et al., 2008; Van Der Schalie, Conte, Marz, & Green, 2007]. While not directly involved in the control of circadian rhythm, DPYD is a major target of the clock genes and a particularly important gene with respect to neurological functions. In fact, DPYD deficiency leads most frequently to epilepsy, mental and motor retardation (all symptoms frequently associated with autism), and other developmental disorders, with 18% of DPYD-deficient individuals receiving a diagnosis of autism [Van Kuilenburg et al., 1999]. Metabolically, DPYD catalyzes the breakdown of uracil to β-alanine, which activates both GABAA and glycine receptors with the same efficacy as their respective natural ligands [Horikoshi, Asanuma, Yanagisawa, Anzai, & Goto, 1988; Wu, Gibbs, & Farb, 1993]. Thus, a deficiency in DPYD or the resultant subnormal levels of β-alanine can be predicted to lead to decreased inhibitory signaling activity at the synapse. Interestingly, anti-convulsant medications, which are often prescribed as a therapeutic regimen for epilepsy associated with DPYD deficiency are also efficacious in improving behaviors in a subgroup of ASD individuals, even without apparent seizures. It is therefore suggested that evaluation of DPYD status, β-alanine levels, or circadian rhythm function in ASD individuals might be helpful in identifying those patients that would most benefit from this type of medication. Overall, the net effect of the observed changes in gene expression is the dysregulation of circadian rhythm in this most severely affected subgroup of ASD individuals. Since the circadian rhythm affects not only neurological but also endocrine, gastrointestinal, and cardiovascular functions, dysregulation of these genes can also have a systemic impact on affected individuals, causing many of the symptoms that are often associated with ASD. Thus, it may be proposed that interventions aimed at normalizing the circadian “clock” may ameliorate some of the symptoms associated with ASD for this subgroup.
While the first large-scale gene expression analysis of autism was performed on postmortem brain tissues from autistic and nonautistic individuals [Purcell, Jeon, Zimmerman, Blue, & Pevsner, 2001], several subsequent gene expression studies, including our earlier study on monozygotic twins discordant for autism, have utilized LCL or primary blood lymphocytes from autistic probands and controls with the goal of identifying expressed biomarkers for diagnostic purposes [Baron, Liu, Hicks, & Gregg, 2006; Gregg et al., 2008; Hu et al., 2006; Nishimura et al, 2007a]. It is further postulated that these surrogate experimental models of autism may reveal clues to the pathobiology of this neurobiological disorder [Walker, Segal, & Aschner, 2006]. Indeed, gene expression profiles of different brain regions have been shown to exhibit the highest similarity to whole blood [Sullivan, Fan, & Perou, 2006], while a meta-analysis of studies performed in blood and postmortem brain demonstrated convergent gene expression changes [Middleton et al., 2005]. Although there is great variability in the specific differentially expressed genes identified by the different studies on peripheral cells from autistic individuals vs. controls, in part due to different microarray platforms and experimental design, common biological themes (including inflammation, immune system dysregulation, neurite outgrowth, and disruption of axon guidance) emerge from these studies and reflect neuropathological findings [Amaral, Schumann, & Nordahl, 2008; Palmen et al., 2004; Pardo et al., 2005; Zimmerman et al., 2005]. In addition to providing further support for the involvement of these processes in the pathophysiology of ASD, this study also identifies circadian rhythm dysregulation as a prominent aberration in the subtype of ASD characterized by severe language impairment, which is consistent with recent studies implicating the involvement of “clock” genes and circadian metabolism in autism [Bourgeron, 2007; Melke et al., 2008; Nicholas et al., 2007] and related disorders, such as Fragile X [Zhang et al., 2008]. Of particular interest for future studies are the novel genes of unknown function, the majority of which are in noncoding regions, that are significantly differentially expressed across all three subgroups of ASD identified here. Because of their apparent sensitivity to androgens based upon our pilot studies on DHT-treated LCL (Fig. 6) as well as gene expression data deposited into the GEO repository for data from large-scale microarray analyses [Holterhus et al., 2003; Zhao et al., 2005], these genes may underlie the prominent 4:1 male-to-female sex bias in susceptibility to ASD.
This study demonstrates the value of subdividing individuals with ASD on the basis of cluster analyses of ADIR scores that incorporate all three core domains of ASD as described in the accompanying manuscript. Stratifying the sample by cluster analyses revealed quantitative differences in gene expression that appear to correlate with severity of ASD phenotype as well as gene expression profiles for each subtype that associate a “biological phenotype” (i.e., gene expression profile) to the respective functional/behavioral phenotype. The biological phenotypes reveal differences in some of the biological functions affecting individuals with ASD, such as circadian rhythm dysregulation in the severe (L) phenotype, suggesting possible therapeutic interventions specific to this subgroup. On the other hand, overlapping genes among the phenotypes indicate dysregulation of genes controlling both neurological and metabolic functions that may lie at the core of ASD.
In summary, our studies show that:
Finally, our results suggest that some of the neurological manifestations of ASD are at least in part the result of dysregulated signaling and metabolic pathways that are reflective of a systemic disorder which, once identified, may be treatable. The implications of these findings as well as those of others who have identified gene signatures of psychiatric disorders in lymphoblasts [Baron et al., 2006; Gregg et al., 2008; Iwamoto, Kakiuchi, Bundo, Ikeda, & Kato, 2004; Kakiuchi et al., 2003, 2004, 2008; Kato et al., 2005; Maes et al., 2007; Nishimura et al., 2007b; Tang, Lu, Aronow, Wagner, & Sharp, 2002; Tang, Nee, Lu, Ran, & Sharp, 2003] support the use of non-neuronal tissues, including patient-derived LCL and primary peripheral cells, to investigate the pathobiology of ASD.
We are grateful for the contributions of laboratory interns, Mandana Manzari for identifying genes associated with reported QTLs, Xinyi Zhou and Jonathan Chang for literature searches and preliminary qRT-PCR analyses. This work was supported by NIMH Grant # R21 MH073393 (V. W. H) and Autism Speaks, Grant #2381 (V. W. H). We also gratefully acknowledge the resources provided by the Autism Genetic Resource Exchange (AGRE) Consortium* and the participating AGRE families. The Autism Genetic Resource Exchange is a program of Autism Speaks and is supported, in part, by grant 1U24MH081810 from the National Institute of Mental Health to Clara M. Lajonchere (PI). We especially thank Dr.Vlad Kustanovich of AGRE for providing us with additional information about the samples, which were not easily retrievable in the database. *The AGRE Consortium: Dan Geschwind, M.D., Ph.D., UCLA, Los Angeles, CA; Maja Bucan, Ph.D., University of Pennsylvania, Philadelphia, PA; W.Ted Brown, M.D., Ph.D., F.A.C.M.G., N.Y.S. Institute for Basic Research in Developmental Disabilities, Long Island, NY; Rita M. Cantor, Ph.D., UCLA School of Medicine, Los Angeles, CA; John N. Constantino, M.D., Washington University School of Medicine, St. Louis, MO; T.Conrad Gilliam, Ph.D., University of Chicago, Chicago, IL; Martha Herbert, M.D., Ph.D., Harvard Medical School, Boston, MA; Clara Lajonchere, Ph.D, Cure Autism Now, Los Angeles, CA; David H. Ledbetter, Ph.D., Emory University, Atlanta, GA; Christa Lese-Martin, Ph.D., Emory University, Atlanta, GA; Janet Miller, J.D., Ph.D., Cure Autism Now, Los Angeles, CA; Stanley F. Nelson, M.D., UCLA School of Medicine, Los Angeles, CA; Gerard D. Schellenberg, Ph.D., University of Washington, Seattle, WA; Carol A. Samango-Sprouse, Ed.D., George Washington University, Washington, D.C.; Sarah Spence, M.D., Ph.D., UCLA, Los Angeles, CA; Matthew State, M.D., Ph.D., Yale University , New Haven, CT. Rudolph E. Tanzi, Ph.D., Massachusetts General Hospital, Boston, MA. Contributions to this manuscript: VWH conceived and designed the experiments, performed data analyses, and wrote the manuscript. TS and KSK performed the microarray experiments and TS also assisted in the microarray data analysis. AN and MES performed qRT-PCR analyses, SK conducted the analyses of novel transcripts in DHT-treated cells, TL was responsible for printing and quality control of the DNA microarray slides, and YL performed the hypergeometric test on the genes in QTL, and NHL performed the χ2 analyses to determine the significance of gene distribution in QTLs on specific chromosomes as well as participated in the discussion and editing of the manuscript.
Additional Supporting Information may be found in the online version of this article.
Grant sponsor: National Institute of Mental Health, NIH; Grant number: R21 MH073393; Grant sponsor: Autism Speaks; Grant number: 2381.
Present address of Kyung Soon Kim is Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN 55905
Present address of Mara E. Steinberg is Department of Hearing and Speech Sciences, University of Maryland; College Park, MD 20742