Two separate genome-wide association studies were conducted to identify single nucleotide polymorphisms (SNPs) associated with social and nonsocial autistic-like traits. We predicted that we would find SNPs associated with social and nonsocial autistic-like traits and that different SNPs would be associated with social and nonsocial traits. In Stage 1, each study screened for allele frequency differences in ~430,000 autosomal SNPs using pooled DNA on microarrays in high-scoring versus low-scoring boys from a general population sample (N = ~400/group). In Stage 2, 22 and 20 SNPs in the social and nonsocial studies, respectively, were tested for QTL association by individually genotyping an independent community sample of 1,400 boys. One SNP (rs11894053) was nominally associated (P < .05, uncorrected for multiple testing) with social autistic-like traits. When the sample was increased by adding females, 2 additional SNPs were nominally significant (P < .05). These 3 SNPs, however, showed no significant association in transmission disequilibrium analyses of diagnosed ASD families.
Social interaction problems and ‘nonsocial’ behaviors, such as restricted repetitive behaviors, are two core symptoms that define autistic spectrum disorders (ASD). Recent population studies show that autistic-like traits vary dimensionally in the general population (Baron-Cohen et al. 2001; Constantino et al. 2003; Posserud et al. 2006; Ronald et al. 2005; Skuse et al. 2005).
Twin studies have reported that autistic-like traits measured dimensionally in the general population are highly heritable (Constantino and Todd 2000, 2003; Hoekstra et al. 2007; Ronald et al. 2006a, 2005; Scourfield et al. 1999; Skuse et al. 2005). Furthermore, liability threshold models and extreme group twin regression analyses using DeFries–Fulker analysis have demonstrated that ‘extreme’ autistic traits (i.e., the most severely affected 15, 10, 5 or 2% of the population) are also highly heritable and show a similar heritability to autism (Ronald et al. 2006a). This suggests that dimensional measures of autistic-like traits might be genetically related to autistic behaviors at the high (impaired) extreme.
Some of these recent twin analyses of autistic-like traits have also explored the nature of the relationship between the different autistic-like traits that together form the diagnostic criteria. In data collected from parents and teachers on over 3,000 7-year-old pairs in a community twin sample, social and nonsocial autistic-like traits were both found to be highly heritable, but showed only modest genetic overlap (Ronald et al. 2005). The genetic correlation was estimated at 0.2, which suggests that only a small proportion of the genes influencing variation in social and nonsocial traits in the general population were overlapping, with the majority of genetic influences acting specifically on each trait. This finding has since been replicated using a different measure (Ronald et al. 2006a). Modest genetic overlap between social and nonsocial autistic-like traits has also been reported for autistic-like traits at the impaired extreme (Ronald et al. 2006b).
Family studies have demonstrated that undiagnosed relatives of individuals with autism show sub-threshold traits characteristic of autism (the ‘broader autism phenotype’), suggesting that these behaviors are familial and supporting the notion that these behaviors lie on a continuum of impairment (Bailey et al. 1998). Furthermore, it has been noted that there is some segregation of the phenotype among relatives, that is, often relatives show some but not all autistic-like traits, for example, social difficulties without any nonsocial behaviors or communication problems (Bailey et al. 1998; Bishop et al. 2004; Pickles et al. 2000; Szatmari et al. 2000). These findings from family and twin studies suggest that different causal influences might affect quantitative variation in social and nonsocial autistic-like traits (Happé et al. 2006).
There is also some indirect evidence from linkage studies using diagnosed autism samples that different genetic influences may play a role in different autistic behaviors. For example, in several studies, linkage signals have been shown to increase when families were selected based on particular nonsocial features such as having high scores on insistence on sameness (Shao et al. 2003), obsessive compulsive behaviors (Buxbaum et al. 2004), savant skills (Nurmi et al. 2003), high scores on the restricted repetitive behaviors and interests (RRBI) domain (Sutcliffe et al. 2005), severe compulsive behaviors and rigidity (McCauley et al. 2004) and repetitive behaviors (Alarcon et al. 2002).
Many linkage studies have been carried out for diagnosed autism and nearly every chromosome has been implicated (Abrahams and Geschwind 2008; Sykes and Lamb 2007; Yang and Gill 2007). A previous linkage study that used a quantitative assessment of autistic traits with the Social Reciprocity Scale in 100 multiplex ASD families found linkage signals on chromosomes 11 and 17 (Duvall et al. 2007). The first study to directly test for different linkage regions for social and non-social autistic behaviors has recently been carried out (Liu et al. 2009). In a sample of 2,025 individuals with an ASD, the ADI-R social interaction and the non-social behavior domains correlated .28. Genome-wide linkage analyses were performed separately for these two domains—reciprocal social interaction and restricted repetitive and stereotyped behaviors—but no genome-wide significant linkage signals were found. For complex traits though, linkage is limited to detecting large effects that may reflect a summary of effects over vast genetic distances. For this reason, allelic association, which is more powerful than linkage for detecting quantitative trait loci (QTLs) of small effect size (Risch 2000; Sham et al. 2000), has become the latest hope to unearth causal variants underlying complex traits and disease.
There have been many candidate genes proposed for autism (Abrahams and Geschwind 2008), and, like the linkage studies mentioned above, candidate gene studies have begun to explore the possibility of symptom-specific genetic associations in autism. A good example is the set of studies on the serotonin transporter gene (SLC6A4). A recent study reported that subjects with the short version of the serotonin transporter gene promoter polymorphism (5-HTTLPR) (S/L or S/S genotypes) were rated as more severe on a social subdomain “failure to use nonverbal communication to regulate social interaction,” whereas subjects with the long version (L/L genotype) were more severe on a nonsocial subdomain “stereotyped and repetitive motor mannerisms” and on an aggression measure (Brune et al. 2006). Increased severity on social/communication domains in individuals with the short version was also found in an earlier study (Tordjman et al. 2001), and other variants within this gene have also been found to be specifically associated with increased severity on nonsocial domains (Mulder et al. 2005; Sutcliffe et al. 2005).
However, a problem with candidate gene studies is their unsystematic nature. Genome-wide association studies (GWAS) provide a solution to this problem: they are highly systematic and are now possible using SNP microarrays (Hirschhorn and Daly 2005). One economical strategy for screening large samples is to pool DNA for groups such as low and high groups on a quantitative trait, which averages allele frequencies biologically for the comparison groups rather than obtaining individual genotypes and averaging them statistically (Darvasi and Soller 1994; Knight and Sham 2006; Norton et al. 2004; Sham et al. 2002).
In the present study we have combined the strengths of microarrays and pooled DNA in a method we call SNP Microarrays and Pooling (SNP-MaP). Pooled DNA can be genotyped reliably on microarrays (Butcher et al. 2004; Docherty et al. 2007; Kirov et al. 2006; Meaburn et al. 2005, 2006; Pearson et al. 2007). To our knowledge four SNP GWAS for autism have so far been published, three with positive findings (SNPs identified on chromosome 5p; Ma et al. 2009; Wang et al. 2009; Weiss et al. 2009) and one which identified no genome-wide significant SNPs (Arking et al. 2008).
The purpose of the present study was to undertake the first GWAS of autistic-like traits in the general population, using a dimensional measure of autistic-like behaviors. Our first hypothesis, based on the high heritability of both social and nonsocial autistic-like traits, was that SNP associations would be found for both social and nonsocial autistic-like traits. A quantitative trait model has several advantages beyond the practical advantages of using community rather than clinical samples. First, quantitative information about the degree of autistic-like traits may be more informative than categorical information about presence or absence of a disorder (Abrahams and Geschwind 2008; Duvall et al. 2007). Second, and most pertinent to the current study, a trait approach allows the relationship between the different symptoms within a disorder to be studied independently, for example, the social and nonsocial behaviors that are both key features in the autism diagnosis. Importantly, we conducted two separate GWAS, one for social and one for nonsocial autistic-like traits. The Affymetrix GeneChip Human Mapping 500 K Array Set was employed, and the findings were followed up in a second stage using an independent community sample. Our second hypothesis, again based on the findings from twin studies, was that most markers associated with social autistic-like traits would be different from those associated with nonsocial autistic-like traits. Finally, we took advantage of the availability of the Autism Genetic Resource Exchange (AGRE) dataset, and tested, in a third replication stage, whether SNPs found to be associated with autistic-like traits in the general population were also associated with diagnosed ASD. Our third hypothesis was that SNPs associated with variation in autistic-like traits would also be associated with diagnosed ASD.
The general population sample came from the Twins Early Development Study (TEDS), a UK-based sample born in England and Wales in 1994–1996 (Oliver and Plomin 2007). Children who did not have ethnicity information or DNA available were excluded. Other exclusion criteria were extreme medical conditions (other than ASDs), severe perinatal difficulties, or non-Caucasian ethnicity. Only males were selected in order to avoid sex differences in the high and low groups in Stage 1. The TEDS sample is reasonably representative of the UK population (Oliver and Plomin 2007). Comparing the TEDS sample that provided data when the twins were age 7 to the General Household Survey (Office for National Statistics 2002), 94 versus 93% were white, 48 versus 50% were male, and 37 versus 32% of mothers had one or more A-level (UK advanced educational qualification). Four percent of children in the TEDS 7-year sample had a statement of special educational needs versus 3% of children in England (Department for Education and Skills 2002).
For Stage 1 of the two separate studies of social and nonsocial autistic-like traits, boys were selected at the low and high extremes of the quantitative trait distribution on a measure of social and nonsocial autistic-like traits (Ronald et al. 2005)—see “Measures” section. Figure 1 shows the distributions of the social and nonsocial autistic-like trait measures and the high and low group cutoff criteria according to raw scores. For the social study, boys were selected if they scored in the most severe 10.5% of the sample for the high-scoring group, and in the lowest (least impaired) 29.9% of the sample for the low scoring group. For the nonsocial study, the equivalent cutoffs for the high and low groups were 13.7 and 25.4%. The choice of cut-off was guided by quantitative genetic research in TEDS showing that heritability of autistic-like traits is high regardless of cutoff (Ronald et al. 2006a) and by statistical genetic simulations that show that such cutoffs balance the power obtained in DNA pooling studies from using extreme cutoffs and from using large samples (Sham et al. 2002). The cutoffs are less extreme for the low groups because of the lack of variation at the low end of the distribution (see Fig. 1).
If both twins fell in the extreme group, the more extreme-scoring child was selected to be included in the high group. In opposite-sex pairs, only the male twin was included. Similarly, in the low group the lowest-scoring twin was selected. Thus children in the high group were unrelated to each other, and the same was true for the low group; moreover, children in the high and low groups were unrelated to each other. The final N in the high and low social groups were 373 and 372, respectively. The final N in the high and low nonsocial groups were 434 and 436, respectively. Thirty percent of the children in the high social group were also in the high nonsocial group, and 11% of the children in the low social group were also in the low nonsocial group. If we had excluded the children who appeared in both high groups, our two GWAS would have been studies of social-but-not-nonsocial autistic-like traits and nonsocial-but-not-social rather than separate GWAS of social and nonsocial autistic-like traits, which in fact correlate modestly in the population (rph = .15) (Ronald et al. 2006a; Ronald et al. 2005). Moreover, this slight overlap of subjects in the two studies is conservative in the sense that it works against our hypothesis that we will find different SNPs associated with social and nonsocial autistic-like traits.
An unselected sample was constructed from TEDS children not included in Stage 1, therefore including individuals who were independent of the sample used in Stage 1. Because the sample was unselected, it could be used for both the social and nonsocial studies. The same exclusion criteria were employed as for the initial sample. Initially a male-only sample was used; 1,411 had social autistic-like trait scores and 1,379 nonsocial autistic-like trait scores. This sample was subsequently extended to provide more power in Stage 2 by also genotyping females. The sample N with males and females with the social and nonsocial scores and DNA was 3,341 and 3,308, respectively. The overlap in individuals between the two studies was 3,155 indicating that a total of 3,494 individuals were genotyped. The community sample replication provides a test of the QTL hypothesis that those SNPs exhibiting allele frequency differences between the low and high parts of the distribution will be associated with variation across the entire quantitative trait distribution.
Teachers rated the twins’ autistic-like traits on DSM-IV-based social and non-social scales (Ronald et al. 2005) when the twins were age 7. The questionnaire was designed to include items that were relevant for assessing the types of social and nonsocial behaviors notable in autism but that also would be seen in the general population. The majority of items were derived from the DSM IV autism criteria (see Ronald et al. 2005). Each item was rated as Not true (0), Somewhat true (1) or Certainly true (2). The social scale had 10 items and therefore a range of 0–20; the non-social scale had 6 items and therefore a range of 0–12. Figure 1 shows the distributions of the social and nonsocial autistic-like trait scales in the male-only unrelated TEDS sample used for selecting high and low groups in Stage 1.
The DSM-IV-based social and non-social scales showed moderate internal consistency with Cronbach’s alphas of .72 and .51 for the social and nonsocial scales, respectively. Another index of reliability is the MZ twin correlation because reliability creates an upper limit for MZ twin correlations. MZ twin correlations for both the social and nonsocial scales were moderate to high for teacher ratings (.60–.77).
A total of 20 biologically independent DNA pools were constructed to represent high and low scorers for both social and nonsocial autistic-like traits (five pools per group, with approximately 74 subjects in each social pool and about 86 subjects in each nonsocial pool). Genomic DNA for each individual was extracted from buccal swabs (Freeman et al. 2003), suspended in EDTA TE buffer (0.01 M Tris–HCl, 0.001 M EDTA, pH 8.0) and quantified in triplicate using PicoGreen® dsDNA quantification reagent (Cambridge Bioscience, UK). Upon obtaining reliable triplicate readings (±0.5 ng/μl), equimolar quantities of DNA for each individual were added to their pool. Differing DNA concentrations for the individuals were compensated by adding different volumes for the individuals; the minimum volume added was 1 μl to avoid compromise due to pipetting errors. Therefore, the amount of DNA an individual contributed to each group of pools was established as the amount contained within 1 μl of the most concentrated individual from that group.
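The volume rule described above can be sketched in a few lines. This is only an illustration, not the laboratory protocol itself: the function name, the example concentrations, and the treatment of equal DNA mass as a stand-in for equimolar quantities are all assumptions made here.

```python
def pooling_volumes(concentrations_ng_per_ul, min_volume_ul=1.0):
    """Return the volume (in µl) of each sample to add to a pool.

    Every individual contributes the DNA mass contained in `min_volume_ul`
    of the most concentrated sample, so the most concentrated sample gets
    the minimum volume and more dilute samples get proportionally more.
    """
    if not concentrations_ng_per_ul or min(concentrations_ng_per_ul) <= 0:
        raise ValueError("concentrations must be a non-empty list of positive values")
    target_mass_ng = max(concentrations_ng_per_ul) * min_volume_ul
    return [target_mass_ng / c for c in concentrations_ng_per_ul]


# Hypothetical PicoGreen readings (ng/µl) for three individuals in one group.
concentrations = [50.0, 25.0, 10.0]
volumes = pooling_volumes(concentrations)
masses = [c * v for c, v in zip(concentrations, volumes)]
print(volumes)  # volume per individual, never below 1 µl
print(masses)   # identical DNA mass contributed by each individual
```

Fixing the smallest volume at 1 μl, rather than fixing a target mass, is what keeps every pipetting step at or above the volume the text identifies as safe.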
Each of the 20 DNA pools was allelotyped using the GeneChip® Mapping 500 K Array set in accordance with the standard protocol for individual DNA samples (see the GeneChip® Mapping 500 K Assay Manual for full protocol). Each microarray was scanned using the GeneChip® Scanner 3000 with High-Resolution Scanning Upgrade, which was controlled using GeneChip® Operating software (GCOS) v1.4. Cell intensity (.cel) files were analyzed using GTYPE. Each of the 20 DNA pools was assayed on a separate microarray set; for quality control checks, a reference DNA individual provided by the manufacturer (sample number 100103) was also assayed on a separate microarray set.
Relative allele signal (RAS) scores, calculated using the 10 K MPAM Mapping algorithm, have been shown to be reliable and valid indices of allele frequency in pooled DNA (Brohede et al. 2005; Butcher et al. 2004; Craig et al. 2005; Kirov et al. 2006; Liu et al. 2005; Meaburn et al. 2005, 2006; Simpson et al. 2005). Details of how probesets on Affymetrix Mapping GeneChip® microarrays are used to calculate allele frequency estimates are described in Appendix 1. Allele frequency estimates for the 500 K microarray set were calculated manually from the raw probe intensity data exported as a .txt file.
To screen SNPs in Stage 1 SNP-MaP analysis, we derived a rank-based composite score using five equally weighted criteria. The rationale and derivation of this composite score is presented in Appendix 1 (see also Butcher et al. 2008). Briefly, the five criteria were: (1) greater average allele frequency difference between low and high autistic-like trait groups, (2) smaller average variance of the low and high groups (i.e., variance across the 5 DNA pooled allele frequency estimates for each group), (3) smaller average variance within each microarray i.e., variance across the multiple probe sets that form the microarray’s allele frequency estimate (to account for probe-specific errors), (4) greater number of successful replicate pools, and (5) greater minor allele frequency, as indexed by the average of the low and high autistic-like trait groups. The fourth criterion was included because the more data we had from the replicate pools, the more accurate the allele frequency estimates were likely to be. The fifth criterion was included because we had more power to detect common alleles. We used this composite to choose the top SNPs with the highest composite scores in each of the two studies.
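As a rough illustration of how five equally weighted, rank-based criteria might be combined, the sketch below ranks each SNP on every criterion and sums the ranks. The exact derivation is given in Appendix 1 and is not reproduced here, so the summation of ranks, the tie handling, the field names, and the example values are all assumptions.

```python
def average_ranks(values, higher_is_better=True):
    """Rank values from 1 (worst) to n (best); tied values share their mean rank."""
    keyed = [v if higher_is_better else -v for v in values]
    order = sorted(range(len(keyed)), key=lambda i: keyed[i])
    ranks = [0.0] * len(keyed)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and keyed[order[end + 1]] == keyed[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # 1-based mean rank of the tied block
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# (hypothetical field name, whether a larger value is better)
CRITERIA = [
    ("freq_diff", True),          # (1) low vs. high allele frequency difference
    ("between_pool_var", False),  # (2) variance across replicate pools
    ("within_array_var", False),  # (3) variance across probe sets on an array
    ("n_pools", True),            # (4) number of successful replicate pools
    ("maf", True),                # (5) minor allele frequency
]


def composite_scores(snps):
    """Sum the per-criterion ranks with equal weight; higher totals are better."""
    totals = [0.0] * len(snps)
    for field, higher_is_better in CRITERIA:
        ranks = average_ranks([s[field] for s in snps], higher_is_better)
        totals = [t + r for t, r in zip(totals, ranks)]
    return totals


# Made-up values for three SNPs, for illustration only.
snps = [
    {"id": "snp_a", "freq_diff": 0.20, "between_pool_var": 0.001,
     "within_array_var": 0.002, "n_pools": 10, "maf": 0.40},
    {"id": "snp_b", "freq_diff": 0.05, "between_pool_var": 0.004,
     "within_array_var": 0.006, "n_pools": 8, "maf": 0.10},
    {"id": "snp_c", "freq_diff": 0.12, "between_pool_var": 0.002,
     "within_array_var": 0.004, "n_pools": 10, "maf": 0.25},
]
scores = composite_scores(snps)
best = snps[max(range(len(snps)), key=lambda i: scores[i])]["id"]
print(scores, best)
```

Working with ranks rather than raw values puts the five criteria on a common scale, so no single criterion dominates merely because of its units.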
The Stage 2 sample of 3,494 individuals from TEDS was genotyped using Applied Biosystems’ SNPlex™ genotyping system and analyzed using GeneMapper v4.0 software (Applied Biosystems). SNPlex is a capillary electrophoresis-based multiplex genotyping system capable of genotyping up to 48 SNPs per sample per well (Tobler et al. 2005). A SNPlex multiplex was successfully designed for the top 47 SNPs as indexed by the composite score described above: 23 SNPs for social traits; 24 for nonsocial traits. In addition to the TEDS individuals, DNA from 88 Centre d’Etude du Polymorphisme Humain (CEPH) individuals who have been genotyped as part of the HapMap Project (The International HapMap Consortium 2003, 2005) was obtained from the Coriell Institute to assess genotyping quality and error rate. Reference genotypes of CEPH individuals for the selected SNPs were downloaded from HapMart, the data mining tool for downloading HapMap data (http://hapmart.hapmap.org/BioMart/martview).
SNPs passing quality control (see below) were tested for additive genetic effects (coding genotypes as 0, 1 or 2) using a Pearson correlation (r) between additive genotypic values and quantitative trait scores.
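A minimal sketch of the additive test, assuming genotypes are coded as the count (0, 1 or 2) of one allele and trait scores are numeric. The function name and the toy data are illustrative and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data: additive genotype codes and quantitative trait scores.
genotypes = [0, 0, 1, 1, 2, 2]
trait_scores = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0]
r = pearson_r(genotypes, trait_scores)
print(round(r, 3), round(r ** 2, 3))  # correlation and variance explained
```

Because the genotype is coded 0, 1, 2, the correlation captures only a linear (additive) trend across the three genotype groups; dominance effects would need a separate term.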
The following sequential criteria were applied for the genotyping quality control: SNPs were omitted from analysis if either poor genotype clusters prevented GeneMapper software from making calls or a SNP showed more than one genotype mismatch between CEPH genotypes deposited in HapMap and those derived using in-house genotyping methods. Individuals were omitted if their SNP call rate was <65% (1 SD below the average). Finally, for each SNP, individual genotypes were omitted if their peak heights were <25% of the average peak height for that genotypic group as measured across the entire sample; we apply this procedure because poor quality samples often exhibit high background noise that SNPlex can mistake as heterozygotes. This leads to an apparent excess of heterozygotes that inflates the number of false positives in Hardy–Weinberg equilibrium tests.
SNP-MaP allele frequencies for the 20 DNA pools were calculated and analyzed separately for the social and nonsocial studies. In order to increase the reliability of SNP-MaP allele frequency estimates, we required allele frequency estimates from a minimum of 3 (out of 5) pools for both high and low groups. We also excluded SNPs with minor allele frequencies lower than .05 (according to CEPH allele frequencies from HapMap) as power to detect association in this range is greatly reduced. After these exclusion criteria, the autosomal genome-wide screen consisted of 433,813 SNPs for the social study and 435,457 SNPs for the nonsocial study.
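The two exclusion rules above amount to a simple filter. In the sketch below the function and parameter names are hypothetical; the thresholds (at least 3 of 5 pools in each group, minor allele frequency of at least .05) are those stated in the text.

```python
def passes_stage1_filter(n_high_pools, n_low_pools, maf, min_pools=3, min_maf=0.05):
    """Return True if a SNP has enough pool estimates in both groups and is common enough."""
    enough_pools = n_high_pools >= min_pools and n_low_pools >= min_pools
    return enough_pools and maf >= min_maf


# Hypothetical SNP records: (id, pools with estimates in high group, in low group, CEPH MAF)
candidates = [
    ("snp_1", 5, 5, 0.30),  # kept
    ("snp_2", 2, 5, 0.30),  # dropped: too few high-group pools
    ("snp_3", 4, 3, 0.02),  # dropped: rare allele
]
kept = [snp_id for snp_id, hi, lo, maf in candidates if passes_stage1_filter(hi, lo, maf)]
print(kept)
```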
The average allele frequency for the low and high groups was calculated for each SNP. The correlation between allele frequency estimates for the low and high groups was .986 and .988 for the social and nonsocial SNP-MaP studies respectively, indicating that the rank order of allele frequencies was highly reliable overall—a test analogous to genome control. Accordingly, allele frequency differences between the low and high groups were small: For social autistic-like traits, 76% of SNPs exhibited between-group differences smaller than .05, with a mean between-group absolute difference of .035 (range: .00–.40); for nonsocial autistic-like traits, 78% of SNPs exhibited between-group differences smaller than .05, with a mean between-group absolute difference of .030 (range: .00–.30).
As explained in “Design and procedures”, Stage 1 was used to screen SNPs on the basis of a ranked composite score which took into account the between-group allele frequency difference, variance between- and within- biological replicate microarrays, number of successfully assayed arrays, and minor allele frequency. Due to financial restrictions, we were limited to genotyping individuals in Stage 2 using a single SNPlex multiplex probe set of 47 SNPs: 23 SNPs in the social study and 24 SNPs in the nonsocial study. These SNPs represent the highest composite scores in Stage 1 for each of the two studies. None of the SNPs selected from the social SNP-MaP study overlapped with those selected from the nonsocial SNP-MaP study. The mean absolute difference between low and high SNP-MaP allele frequency estimates for the 23 social SNPs was .17 (ranging from .08 to .29); for the 24 nonsocial SNPs the mean absolute difference was .16 (ranging from .08 to .26). Figures 2 and 3 place the selected SNPs in the context of the full dataset for the social and nonsocial studies, respectively, by plotting the average allele frequency of the low scoring groups against that of the high (impaired) scoring groups.
The 47 SNPs nominated in Stage 1 in the two studies were individually genotyped in the unselected sample in order to test the QTL hypothesis directly by assessing the degree to which the SNPs nominated in Stage 1 are associated with quantitative autistic-like traits throughout the distribution. With 23 tests for the social study and 24 tests for the nonsocial study and an alpha of 0.05, one significant result would be expected in each study on the basis of chance alone.
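The chance expectation quoted above follows from multiplying the number of tests by alpha. The sketch below makes the arithmetic explicit and also gives the probability of at least one nominally significant result, under the assumptions (made here for illustration) that every null hypothesis is true and that the tests are independent.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of nominally significant results if every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=0.05):
    """Probability of one or more false positives across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for label, n_tests in (("social", 23), ("nonsocial", 24)):
    print(label,
          round(expected_false_positives(n_tests), 2),
          round(prob_at_least_one(n_tests), 2))
```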
In our SNPlex analysis, three out of 47 SNPs (rs6701037 [selected for social], and rs1546377 and rs7894025 [selected for nonsocial]) exhibited poor call rates across plates due to poor genotype clustering and were omitted from further analyses. We omitted two other SNPs (rs6903663 and rs9654873 [both selected for nonsocial]) from analyses because of inaccurate calls, that is, they showed poor concordance between in-house derived genotyping of 88 CEPH individuals and genotypes deposited in HapMap (Frazer et al. 2007). Using this comparison between in-house genotyping and genotypes deposited in HapMap, the remaining 42 SNPs (22 SNPs for social, 20 SNPs for nonsocial) displayed genotyping error rates of <1%. These errors were caused by homozygotes being erroneously called as heterozygotes.
55 (3.9%) individuals and 50 (3.6%) individuals showing low call rates (<65%) across SNPs were omitted from the social and nonsocial analyses respectively. We also excluded an additional 4.8 and 2.9% of genotypes from the social and non-social studies respectively, whose peak heights were <25% of the average peak height for that SNP across the study. After excluding the 5 aforementioned SNPs, samples with poor call rates and genotypes with low peak heights, we observed 27,162 (87.5% completeness) and 24,684 (89.5% completeness) genotypes to perform association analysis for the social and nonsocial autistic-like traits, respectively, in the male-only sample. Table 1 outlines the number of SNPs and individuals remaining after SNP, sample and genotype quality control procedures.
Our conservative criteria improved observed genotypic distributions under Hardy–Weinberg equilibrium, tightened genotype clusters in SNPlex, and left the distributions of social and nonsocial autistic-like trait scores unchanged.
One SNP (rs11894053) in the social study correlated .06 (P = .02) with social autistic-like trait scores in the male-only sample, and no SNPs in the nonsocial study showed nominally significant correlations (P < .05) in the male-only sample. Figure 4 plots the results for rs11894053 in terms of standardised mean quantitative trait social score (age- and sex-regressed) for the three SNP genotypes in the male-only sample. The SNP shows an additive pattern. The homozygotes differ by .23 SD. Stage 1 and 2 results for all 42 SNPs are shown in Table 2.
Squaring the correlation of r = 0.06 to estimate effect size indicates that this association accounted for only 0.36% of the variance in teacher-rated social autistic-like traits.
Next, we explored which of the SNPs nominated in Stage 1 replicated in the whole sample including females as well as males. Although this meant mixing males and females together, the advantage of using the whole sample for the Stage 2 replication was that it provided greater power to detect effects. Power to detect a QTL explaining 0.5% of the variance with the male-only sample was 62–72% (N = 1031-1278), whereas power to detect a QTL explaining 0.5% of the variance with the male and female sample was 94–97% (N = 2519-3047). In the social study, the SNP (rs11894053) that correlated in the male-only sample also correlated .04 (P = .02) in the whole sample, and another SNP (rs17622673) correlated .03 (P = .05) in the whole sample. For the nonsocial study, one SNP (rs12578517) correlated .03 (P = .03) in the whole sample.
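The text does not say how these power figures were calculated. One standard approximation, sketched below, converts the proportion of variance explained to a correlation and applies the Fisher z transformation with a two-sided alpha of .05; with the sample sizes quoted above it gives values close to those reported. The function name and the choice of approximation are assumptions.

```python
import math
from statistics import NormalDist


def power_for_variance_explained(variance_explained, n, alpha=0.05):
    """Approximate two-sided power to detect a correlation of sqrt(variance_explained).

    Uses the Fisher z transformation: atanh(r) is approximately normal
    with standard error 1 / sqrt(n - 3).
    """
    std_normal = NormalDist()
    r = math.sqrt(variance_explained)
    z_effect = math.atanh(r) * math.sqrt(n - 3)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Smallest and largest per-SNP sample sizes quoted in the text.
for n in (1031, 1278, 2519, 3047):
    print(n, round(power_for_variance_explained(0.005, n), 2))
```

The range of power within each sample reflects the range of per-SNP sample sizes, which differ because genotype completeness varied across SNPs.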
To test our hypothesis that SNPs associated with autistic-like traits in the general population will also be associated with diagnosed ASD, each of the 3 SNPs that were nominally significant in Stage 2 (rs11894053, rs17622673, rs12578517) was tested individually in a transmission disequilibrium test (TDT) analysis using the AGRE ASD database (see Appendix 2). The TDT analysis tested for over-transmission of the risk allele from heterozygous parents to affected offspring. However, all 3 chi-square tests were nonsignificant (P = .51–.91).
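For readers unfamiliar with the TDT, the basic statistic is a McNemar-type chi-square with one degree of freedom computed from the transmissions of heterozygous parents. The sketch below shows that textbook form only; the analysis actually run on the AGRE data is described in Appendix 2, and the counts used here are invented for illustration.

```python
import math


def tdt_chi_square(transmitted, untransmitted):
    """TDT statistic: (b - c)^2 / (b + c).

    b = times heterozygous parents transmitted the candidate allele to an
    affected child, c = times they transmitted the other allele.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    return (transmitted - untransmitted) ** 2 / total


def chi_square_1df_p_value(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# Invented counts: 210 transmissions versus 200 non-transmissions.
chi_sq = tdt_chi_square(210, 200)
p_value = chi_square_1df_p_value(chi_sq)
print(round(chi_sq, 3), round(p_value, 2))
```

Under the null hypothesis each heterozygous parent transmits either allele with probability one half, which is why only heterozygous parents are informative and why the test is robust to population stratification.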
In this first genome-wide association study of autistic-like traits in the general population, we found one SNP associated with social autistic-like traits in the male-only Stage 2 sample that was nominally significant (P < 0.05). When using the whole sample of males and females, the same SNP and another one were both nominally associated with social autistic-like traits, and one SNP was nominally associated (P < .05) with nonsocial autistic-like traits. With 42 SNPs nominated in the SNP-MaP stage using pooled DNA for low versus high social and nonsocial autistic-like trait groups, we would expect two SNPs to reach nominal significance by chance; therefore we found only one more nominally significant SNP (P < .05) than expected by chance alone. Importantly, no SNP associations emerged in Stage 2 that accounted for more than 0.4% of the variance in either study. In sum, despite studying two highly heritable traits and employing a three-stage design, we did not find any associations of the effect size we had power to detect.
The nominally significant SNP (rs11894053) in the male-only sample is in an intergenic region at 2p21 that maps to a hypothetical protein BC007901. According to a recent review (Abrahams and Geschwind 2008), one gene on 2p, NRXN1 (neurexin 1), has been implicated in clinical ASD but is located at 2p16, which is not in linkage disequilibrium with rs11894053. As described above, when using the whole sample, two additional SNPs became nominally significant. rs17622673 was associated with social autistic-like traits in the male and female sample; it is located in an intergenic region on 6q16.3 downstream from GRIK2 (glutamate receptor, ionotropic, kainate 2). rs12578517 was associated with nonsocial autistic-like traits in the whole sample and is located on 12p12.3, upstream from PTPRO (protein tyrosine phosphatase, receptor type, O). However, none of these 3 SNPs were associated with diagnosed ASD in the AGRE sample.
We had hypothesised that SNPs selected from the pooling stage (which compared high versus low scoring groups) would also show a significant association across the dimension of autistic-like traits in the individual differences replication stage. In an additional analysis, we hypothesised that SNPs associated with autistic-like traits in Stages 1 and 2 would also be associated with diagnosed ASD. The small number of significant SNPs (at P < 0.05) found in Stage 2, as well as the lack of significant association of these SNPs in the AGRE ASD sample, deserves discussion. Although the selection criteria for the SNPs for Stage 2 were carefully explored and have been applied successfully to identify SNPs in previous GWAS SNP-MaP studies (e.g., Butcher et al. 2008), it is possible that some aspect of the selection criteria meant that the optimal SNPs for Stage 2 were not identified here. For example, we purposely biased our selection criteria for Stage 2 towards common alleles because we had greater power to detect them, which meant discriminating against less common alleles. DNA pooling has considerable advantages and has been shown to work effectively with the Affymetrix 500 K array (e.g., Docherty et al. 2007). Results might have been improved if more arrays per pool had been used. Ultimately, individual genotyping would have provided greater information in Stage 1. Financial considerations meant that the number of SNPs that could be followed up in the individual genotyping stage was constrained to approximately 20 SNPs per study. Therefore, financial constraints that limited the sensitivity and breadth of the genome-wide scan, as well as the sample size, were possible reasons for the lack of significant findings in Stages 1 and 2.
Indeed, power relies on the design employed and the sample sizes. Because Stage 1 was a screening stage that employed a rank-based composite to select SNPs, the power available in this stage is not explicit. However, it is thought that pooling retains 60–70% of the power of individual genotyping (Barratt et al. 2002). Power in Stage 2 of the design with both males and females included, as described in the “Results” section, was >90%. In Stage 3, with 777 families and with a genotypic relative risk of 1.5, we could expect to have 77% power to detect an additive SNP with a MAF of 0.05, using a nominal significance level (p < .05).
Regarding the additional TDT analyses of the AGRE database, heterogeneity within ASD samples (for example, due to different types of diagnoses and comorbid features) is often cited as a problem in ASD samples and it is possible that this could explain the null findings in the AGRE sample. It is noteworthy that using three different types of analysis on the SNPs at each stage of the study (allelic low–high group association, a QTL additive model, TDT analysis) made the test of significance at 3 consecutive stages extremely stringent. Finally, which SNPs were tested in the AGRE sample depended on the results of Stages 1 and 2. Further research is needed, but the present data did not support the hypothesis that the same genes influence autistic-like traits assessed dimensionally in the general population and diagnosed ASD in clinical samples.
At the phenotypic level, reliable and valid measurement of autistic-like traits is vital. Quantitative measures of autistic-like traits are still in development—there is no gold standard as there is with ASD diagnoses. The measures employed in this study were relatively short questionnaires and were only moderately reliable (Ronald et al. 2005). An additional issue to consider is the choice of rater when assessing autistic-like traits as it has been shown that parent, teacher and self reports show only modest agreement and may in part pick up on somewhat different genetic influences when assessing autistic-like traits in children (Ronald et al. 2008). Indeed, in a previous study, different linkage signals were found for parent and teacher ratings of a quantitative assessment of autistic behaviors in ASD families (Duvall et al. 2007).
A male-only sample was initially selected for Stages 1 and 2 in order to avoid conflating the genders in the analysis, and because autistic-like traits and ASD are both more common in males and there is some evidence for sex-specific genetic effects (e.g., Stone et al. 2004). Therefore Stage 1 will have missed SNPs associated specifically with autistic-like traits in females. Females were added to the Stage 2 sample in order to increase power in the analysis. These data do not distinguish between the possibilities that some SNPs showed a sex-specific effect or that they were easier to identify simply because the sample of males and females offered greater power than the male-only sample.
A more general consideration concerning the design of the study is that relevant SNPs not captured by the Affymetrix 500 K array, other polymorphisms (e.g., copy number variation, microsatellites), as well as rare alleles, as mentioned above, may have passed through our screen unnoticed. This limitation is likely to be avoided in future studies because of the advent of newer microarrays offering more comprehensive coverage of both SNPs and copy number variations and the availability of larger samples. Both social and nonsocial teacher-rated autistic traits show high heritability (63–74%) (Ronald et al. 2005) which led us to expect to be able to find SNPs associated with these traits. Three recent GWAS studies have identified common SNP variants associated with autism (Ma et al. 2009; Wang et al. 2009; Weiss et al. 2009). In positional candidate gene studies, several common variants have been found to be associated with autism, for example, a common allele in the promoter region of the MET gene (Campbell et al. 2006) and a common polymorphism in the contactin-associated protein-like 2 gene (CNTNAP2; Alarcon et al. 2008). These previous findings support the hypothesis that common variants play a role in the etiology of autism. In conclusion, this first GWAS study of social and non-social autistic-like traits in the general population joins one of the other published GWAS studies of autism to date, a family-based study, in reporting largely negative findings (Arking et al. 2008).
TEDS is funded by MRC grant G0500079. AR was funded by Autism Speaks during this project. We gratefully acknowledge the resources provided by the Autism Genetic Resource Exchange (AGRE) Consortium and the participating AGRE families. The Autism Genetic Resource Exchange is a program of Autism Speaks and is supported, in part, by grant 1U24MH081810 from the National Institute of Mental Health to Clara M. Lajonchere (PI). AGRE Affymetrix 5.0 data was generated at the Broad Institute and provided to AGRE by Dr. Mark Daly and the Autism Consortium. Maja Bucan, Ph.D., AGRE Scientific Steering Committee Chair, Department of Psychiatry, University of Pennsylvania, PA; Daniel H. Geschwind, M.D., Ph.D., AGRE Chief Scientific Advisor, Department of Neurology, UCLA School of Medicine, CA; W. Ted Brown, M.D., Ph.D., NYS Institute for Basic Research, NY; Rita M. Cantor, Ph.D., UCLA School of Medicine, CA; John N. Constantino, M.D., Washington University School of Medicine; T. Conrad Gilliam, Ph.D., University of Chicago, IL; Martha Herbert, M.D., Ph.D., Harvard University, MA; David Ledbetter, Ph.D., Emory University, GA; Stanley F. Nelson, M.D., UCLA School of Medicine, CA; Carole Samango-Sprouse, Ed.D., George Washington University, DC; Gerard Schellenberg, Ph.D., University of Washington, WA; Jonathan Shestack, Autism Speaks, Los Angeles, CA; Matthew State, M.D., Ph.D., Yale University School of Medicine, CT; Rudolph Tanzi, Ph.D., Harvard Medical School, MA.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
The following sections give an overview of how we calculated allele frequency estimates using the measurement structure of Affymetrix DNA genotyping microarrays with pooled DNA of groups selected for low and high social (or nonsocial) autistic-like traits. We also detail how these measurements (and their derivatives) were implemented to form a five-criterion rank-based composite score for each SNP used to select SNPs from Stage 1 (pooled DNA using Affymetrix 500 K microarrays) for individual genotyping on an independent and representative sample of social (or nonsocial) autistic-like traits. The same approach is described in Butcher et al. (2008).
As with other applications of pooled DNA, allele frequency estimates (p) are calculated as the ratio of the fluorescent intensity corresponding to allele A to the sum of the fluorescent intensities corresponding to alleles A and B:

$$p = \frac{A}{A + B} \qquad (1)$$
Affymetrix microarrays, however, measure alleles A and B numerous times using multiple unique probes (oligos) scattered across the microarray’s surface. The fluorescent intensity values for these probes are contained in cell files (.cel) that are produced separately for each microarray. The lowest level at which an allele frequency estimate can be measured using Eq. 1 is at the “quartet” level. A quartet contains four 25 bp probes with variations at a consistent location within each probe. The four variations are either (1) a perfect match to allele A of the SNP (PMA), (2) a perfect match to allele B (PMB), (3) a mismatch to allele A (MMA) or (4) a mismatch to allele B (MMB). Numerous quartets represent each SNP on the microarray with variation achieved through “off-sets” and/or by designing quartets on sense or anti-sense strands. Off-sets refer to the shifting of the SNP interrogation site to a different position within the quartets’ probes. Sequence design occurs either exclusively on the sense or anti-sense strand, or on both strands depending on the SNP. Therefore, microarray measurement is defined as taking place at the ith quartet of the jth replicate in group k, for SNP s.
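We denote the four probes within a quartet as:

$$\mathrm{PMA}_{ijks}, \quad \mathrm{PMB}_{ijks}, \quad \mathrm{MMA}_{ijks}, \quad \mathrm{MMB}_{ijks} \qquad (2)$$

We then transform the $\mathrm{PM}_{ijks}$ probe intensities by subtracting an estimate of non-specific hybridization (the average intensity of the two mismatch probes) to derive best estimates for allele A ($A_{ijks}$) and allele B ($B_{ijks}$):

$$A_{ijks} = \mathrm{PMA}_{ijks} - \frac{\mathrm{MMA}_{ijks} + \mathrm{MMB}_{ijks}}{2}, \qquad B_{ijks} = \mathrm{PMB}_{ijks} - \frac{\mathrm{MMA}_{ijks} + \mathrm{MMB}_{ijks}}{2} \qquad (3)$$

Transformed values are then substituted into Eq. 1 to provide an allele frequency estimate for the ith quartet:

$$p_{ijks} = \frac{A_{ijks}}{A_{ijks} + B_{ijks}} \qquad (4)$$

The allele frequency estimate for the jth replicate is the simple arithmetic mean of I quartets with ‘interpretable data’, i.e., the denominator of Eq. 4 ≠ 0:

$$p_{jks} = \frac{1}{I} \sum_{i=1}^{I} p_{ijks} \qquad (5)$$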
where I is the number of quartets with interpretable data and is dependent on SNP, s. In applying Eq. 5, we only accepted allele frequency estimates from replicates with either 5/6 or 7/10 quartets with interpretable data.
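As an illustration only, the quartet-to-replicate steps described above (background subtraction, the quartet-level ratio, and the mean over interpretable quartets) might be sketched in Python as follows. The function name, its arguments, and the `min_quartets` threshold parameter are hypothetical and are not taken from the original analysis pipeline. The sketch also assumes that the acceptance rule means "at least this many interpretable quartets".

```python
from statistics import mean


def replicate_allele_frequency(pm_a, pm_b, mm_a, mm_b, min_quartets):
    """Estimate the allele A frequency for one replicate at one SNP.

    Each of pm_a, pm_b, mm_a, mm_b holds one probe intensity per quartet.
    min_quartets is a hypothetical acceptance threshold (for example 5 for
    a SNP with 6 quartets, or 7 for a SNP with 10 quartets).
    Returns (replicate_estimate_or_None, list_of_quartet_estimates).
    """
    quartet_estimates = []
    for pma, pmb, mma, mmb in zip(pm_a, pm_b, mm_a, mm_b):
        # Non-specific hybridization: average of the two mismatch probes.
        background = (mma + mmb) / 2.0
        a = pma - background
        b = pmb - background
        # A quartet is interpretable only if the denominator is non-zero.
        if a + b != 0:
            quartet_estimates.append(a / (a + b))
    if len(quartet_estimates) < min_quartets:
        return None, quartet_estimates
    return mean(quartet_estimates), quartet_estimates
```

The sketch does not handle negative background-corrected intensities, which can push a quartet estimate outside the interval from 0 to 1. How such values were treated in the original analysis is not stated here.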
Variance across quartets is thus:
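Assuming the usual unbiased sample variance, this can be sketched as below. The $I - 1$ denominator and the symbol $\sigma^{2}_{jks}$ are assumptions. The text fixes only that the variance is taken across the I interpretable quartet estimates $p_{ijks}$ around the replicate mean $p_{jks}$ of Eq. 5.

```latex
\sigma^{2}_{jks} = \frac{1}{I - 1} \sum_{i=1}^{I} \left( p_{ijks} - p_{jks} \right)^{2} \qquad (6)
```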
The allele frequency estimate for the kth group is simply the arithmetic mean of J replicates, thus:

$$p_{ks} = \frac{1}{J} \sum_{j=1}^{J} p_{jks} \qquad (7)$$
with variance of allele frequency estimates across replicates:
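Under the same assumption of an unbiased sample variance, this can be sketched as below. The $J - 1$ denominator and the symbol $\sigma^{2}_{ks}$ are assumptions, $p_{jks}$ is the replicate-level estimate, and $p_{ks}$ is the group mean of Eq. 7.

```latex
\sigma^{2}_{ks} = \frac{1}{J - 1} \sum_{j=1}^{J} \left( p_{jks} - p_{ks} \right)^{2} \qquad (8)
```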
where J is the number of replicates within the kth group.
The following sections detail how the data acquired from the above equations is used to create a rank-based composite score (based on five criteria) for each SNP used to select SNPs from Stage 1 (pooled DNA using Affymetrix 500 K microarrays) for individual genotyping on an independent and representative sample assessed on social and nonsocial autistic-like traits.
Using Eq. 7 we calculated allele frequency estimates separately for the two groups, low social autistic-like traits ($k_{\mathrm{low}}$) and high social autistic-like traits ($k_{\mathrm{high}}$). The same was done for the high and low nonsocial groups. The allele frequency difference between groups at SNP s is the absolute difference between group estimates, thus:

$$\left| p_{k_{\mathrm{high}} s} - p_{k_{\mathrm{low}} s} \right| \qquad (9)$$
For the composite, allele frequency differences were standardized separately by array type (NspI or StyI) and weighted positively (to prioritize larger allele frequency differences) which we denote:
Equation 8 was applied to each group, averaged, standardized separately by array type, summed then weighted negatively (to prioritize low variance scores), which we denote:
Equation 6 was applied to each replicate, averaged across all replicates, standardized separately by array type, summed then weighted negatively (to prioritize low variance scores), which we denote:
We took the arithmetic mean of the number of replicates for the low and high groups at SNP s, standardized the result separately by array type, summed then weighted it positively (to prioritize SNPs with more replicates) to give:
Minor allele frequency was calculated as:
For the composite, minor allele frequencies were standardized separately by array type and positively weighted (to prioritize common allele frequencies) which we denote:
The composite measure (C) was the simple summation of the standardized information from criteria 1–5:
In future experiments, different composite scores may be created by assigning weights to the different criteria, or by adding or removing criteria. At the time of writing, the criteria used were believed to be the most informative for detecting QTLs of small effect size.
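The five-criterion composite described above can be illustrated with a minimal Python sketch. It is not the original implementation. The dictionary keys, the function names, and the choice of the population standard deviation for standardization are all assumptions made for the example. Each criterion is assumed to have been computed beforehand for every SNP.

```python
from statistics import mean, pstdev

# Sign of each criterion in the composite (key names are hypothetical):
# +1 prioritizes large values, -1 prioritizes small values.
WEIGHTS = {
    "freq_diff": +1,       # criterion 1: allele frequency difference
    "var_replicates": -1,  # criterion 2: variance across replicates
    "var_quartets": -1,    # criterion 3: variance across quartets
    "n_replicates": +1,    # criterion 4: number of replicates
    "maf": +1,             # criterion 5: minor allele frequency
}


def zscores(values):
    """Standardize values; a constant criterion contributes zero."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD
    return [0.0 if sd == 0 else (v - m) / sd for v in values]


def composite_scores(snps):
    """snps: list of dicts with 'id', 'array' and the five criteria.

    Each criterion is standardized separately within array type
    (e.g. 'NspI' or 'StyI'), signed, and summed into one score per SNP.
    """
    scores = {snp["id"]: 0.0 for snp in snps}
    for array_type in {snp["array"] for snp in snps}:
        subset = [snp for snp in snps if snp["array"] == array_type]
        for criterion, sign in WEIGHTS.items():
            z = zscores([snp[criterion] for snp in subset])
            for snp, value in zip(subset, z):
                scores[snp["id"]] += sign * value
    return scores
```

SNPs would then be ranked by score, highest first. Unequal weighting of criteria, as mentioned for future experiments, would amount to replacing the values of +1 and -1 in `WEIGHTS` with other magnitudes.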
High-density single nucleotide polymorphism data from 500 K Affymetrix arrays on 777 families were analysed. These were contributed by the Autism Consortium to the Autism Genetic Resource Exchange (AGRE). From the sample of 2,883 individuals, 4 had missing phenotypic information and 5 were removed for a low genotyping rate (maximum SNP missingness rate >0.1). The final sample was 63% male and contained 2,874 individuals.
Distortion in the transmission of SNP alleles was tested using the transmission disequilibrium test (TDT; Spielman et al. 1993) implemented in PLINK (Purcell et al. 2007). Thresholds were set to exclude SNPs with minor allele frequencies <.01 (with a maximum minor allele frequency of 1.0), SNPs with a missingness rate above 0.1 and individuals with a missingness rate above 0.1. We tested for over-transmission of the risk allele from heterozygous parents to affected offspring. However, all 3 chi-square tests were nonsignificant (P = .51–.91).
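For readers unfamiliar with the TDT, its basic statistic can be sketched in a few lines of Python. This is a minimal illustration of the standard McNemar-type form of the test, not a reproduction of the PLINK implementation or of the AGRE analysis. The function name is hypothetical and the counts in the usage note are invented.

```python
import math


def tdt_chi_square(transmitted, untransmitted):
    """Basic TDT statistic for one SNP.

    transmitted:   number of times heterozygous parents passed the
                   allele of interest to an affected offspring
    untransmitted: number of times they passed the other allele
    Returns (chi_square, p_value) for a 1 degree-of-freedom test.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With invented counts of 60 transmissions and 40 non-transmissions, `tdt_chi_square(60, 40)` gives a chi-square of 4.0. Equal counts give a chi-square of 0, which is the pattern expected when there is no transmission distortion.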