|Home | About | Journals | Submit | Contact Us | Français|
Autism spectrum disorder (ASD) is a highly heritable disorder of complex and heterogeneous aetiology. It is primarily characterized by altered cognitive ability including impaired language and communication skills and fundamental deficits in social reciprocity. Despite some notable successes in neuropsychiatric genetics, overall, the high heritability of ASD (~90%) remains poorly explained by common genetic risk variants. However, recent studies suggest that rare genomic variation, in particular copy number variation, may account for a significant proportion of the genetic basis of ASD. We present a large scale analysis to identify candidate genes which may contain low-frequency recessive variation contributing to ASD while taking into account the potential contribution of population differences to the genetic heterogeneity of ASD. Our strategy, homozygous haplotype (HH) mapping, aims to detect homozygous segments of identical haplotype structure that are shared at a higher frequency amongst ASD patients compared to parental controls. The analysis was performed on 1,402 Autism Genome Project trios genotyped for 1 million single nucleotide polymorphisms (SNPs). We identified 25 known and 1,218 novel ASD candidate genes in the discovery analysis including CADM2, ABHD14A, CHRFAM7A, GRIK2, GRM3, EPHA3, FGF10, KCND2, PDZK1, IMMP2L and FOXP2. Furthermore, 10 of the previously reported ASD genes and 300 of the novel candidates identified in the discovery analysis were replicated in an independent sample of 1,182 trios. Our results demonstrate that regions of HH are significantly enriched for previously reported ASD candidate genes and the observed association is independent of gene size (odds ratio 2.10). Our findings highlight the applicability of HH mapping in complex disorders such as ASD and offer an alternative approach to the analysis of genome-wide association data.
The online version of this article (doi:10.1007/s00439-011-1094-6) contains supplementary material, which is available to authorized users.
Extended runs of homozygosity (ROH) have recently been highlighted as a genomic feature that may be useful to map recessive disease genes in outbred populations (Hildebrandt et al. 2009; Lencz et al. 2007; Yang et al. 2010). Furthermore, even in complex disorders, we expect to find an unusually high number of affected individuals to have the same haplotype in the region surrounding a disease mutation (Durand et al. 2007; Lesch et al. 2008; Wong et al. 2002). Therefore, a rare pathogenic variant and surrounding haplotype is often enriched in frequency in a group of affected individuals compared with the haplotype frequency in a cohort of unaffected controls (The International HapMap Consortium 2003). We propose that homozygous haplotypes (HH) that are shared by multiple affected individuals may be important for the discovery of recessive disease genes in complex disorders. We have extended the traditional homozygosity mapping method by analysing the haplotype within shared ROH regions to identify homozygous segments of identical haplotype that are present uniquely or at a higher frequency in ASD probands compared to parental controls (Fig. 1). Such regions are termed risk homozygous haplotypes (rHH). We postulate that rHH may contain low-frequency recessive variants that contribute to ASD risk in a subset of ASD patients.
Allelic and locus heterogeneity are major challenges in the identification of ASD risk loci (Lamb et al. 2000). In cases where distinct populations share the same risk allele, differences in allele frequency and LD structure between populations may result in the risk allele segregating on different haplotype backgrounds. Correction for population substructure in large scale studies minimises the rate of false positives and dilution of a population-specific signal. Furthermore, Nothnagel et al. (2010) recently reported that the distribution of SNP-defined ROHs is highly structured across European populations and highlighted the importance of accounting for ancestry when undertaking ROH-based analyses. Our analysis accounts for population effects by separating the samples into groups of common ancestry and applying the HH mapping to each population group independently. It also involves the use of parental controls to address variation in low-frequency alleles across populations (Cardon and Palmer 2003). The genetic ancestry of the sample set was examined by principal component analysis (PCA), Hopach clustering (van der Laan and Pollard 2002) and genetic distance Fst calculations (Supplementary Material 1, Supplementary Fig. 1 and 2, Supplementary Tables 2 and 3). Ten distinct population clusters were identified ranging in size from 27 to 289 probands (Fig. 2). Population clusters with a minimum of 50 probands were selected for the discovery analysis (n = 5). Each cluster was analysed independently by HH mapping and the genes identified in each population cluster were then compared. In this manner (taking a gene-centric approach) it is possible to identify genes that may confer risk across different populations regardless of whether the causal haplotype is population-specific or not.
The samples used in the HH analysis were collected as part of an international consortium, the Autism Genome Project (AGP). Informed consent was obtained from all participants. The AGP sample set is a trio based collection, comprising an affected proband and two parents, grouped into the three distinct diagnostic classes of autism; strict, broad and spectrum. Affected individuals were diagnosed using the Autism Diagnostic Interview-Revised (ADI-R) and/or the Autism Diagnostic Observation Schedule (ADOS). A detailed description of the AGP sample set is provided in Supplementary Material 1 and by Pinto et al. (2010). Raw genotype data for the ASD trios is deposited at NCBI dbGAP (accession phs000267.v1.p1). The HH analysis was performed on trios in the autism spectrum diagnostic category (n = 2,584 trios). The ASD spectrum trios were further subdivided into stage 1 and stage 2 collections. In the current study, the 1,402 stage 1 trios were used for the initial discovery analysis and the 1,182 stage 2 trios were used for the independent replication study.
Stage 1 samples were genotyped using the Illumina 1M-single array while the stage 2 samples were genotyped on a combination of 1M and 1M-duo chips. The 1M SNP array contains 1,072,820 SNP markers at an average inter-marker distance of 2.7 kb while the 1M-duo chip contains 1,199,187 SNPs with a mean marker spacing of 1.5 kb. The 1,003,736 SNPs that were genotyped on both the 1M and 1M-duo platforms were considered for the analysis. Possible gender miscalls were assessed through analysis of chromosome X genotypes in PLINK (version 1.04) (http://pngu.mgh.harvard.edu/~purcell/plink/) (Purcell et al. 2007). Duplicates were identified by calculating identity by state (IBS) values using PLINK and one sample from each duplicate pair was removed. After quality control and filtering for autosomal markers, 887,716 SNPs and 7,719 individuals were retained for analysis (Supplementary Material 1).
The population structure of the sample set was assessed through principal component analysis (PCA), Hopach hierarchical clustering and Fst calculations. PCA was performed with EIGENSTRAT from EIGENSOFT (version 3.0) (Price et al. 2006) using 70,175 independent autosomal SNPs with a minor allele frequency >5% and a call rate of 100%. Tracy–Widom statistics indicated that the first eight principal components (PCs) were significantly contributing to the population structure of the sample set. Individuals were assigned to clusters based on similarity in eigenvalues for the first eight PCs using the Hopach hybrid hierarchical clustering algorithm available in the R package (van der Laan and Pollard 2002). Hierfstat was used to calculate pair-wise Fst metrics for the clusters using 5,000 SNPs randomly chosen from the panel of 70,175 SNPs used for PCA (Goudet 2005). Clusters displaying high similarity (Fst value < 1 × 10−4) were merged, resulting in 10 population clusters for the HH analysis (Supplementary Material 1 and Supplementary Table 3).
Long series of consecutive homozygous SNPS, referred to as runs of homozygosity (ROHs), were identified in each sample using the ‘homozyg’ function in PLINK. A threshold of 100 consecutive homozygous SNPs spanning at least 1 Mb at a minimum density of 1 SNP/50 kb was implemented. Using the homozyg-group function in PLINK, samples (min 3) with overlapping ROHs were pooled and subdivided into haplotype groups (min 95% allelic identity). Each haplotype within the overlapping ROH region is referred to as a homozygous haplotype (HH). For each overlapping ROH, the frequency of each HH within cases and controls was evaluated using the Fisher’s exact test (R script). HH that were significantly more common in ASD probands compared to parental controls (p value < 0.05) were considered rHH (Supplementary Tables 1a–m). All subsequent analysis was limited to these regions. The rHH regions were annotated using the Illumina 1M annotation file (Human Genome build 36.1 RefSeq). In cases where rHH were located in intergenic regions, the nearest centromeric and telomeric gene was noted.
Patterns of linkage disequilibrium (LD) within the rHH were visualised and analysed in Haploview using HapMap CEU as a reference (Barrett et al. 2005). LD was measured as r2 values and calculated between each pair of SNPs. The Tagger algorithm (Haploview) was used to determine the number of tagging SNPs within each rHH at an r2 threshold of 0.8. Genes located in rHH comprising <10 tagging SNPs were noted. For each stage 1 population group, the samples contributing to significant rHH were inspected for CNV content as detected by Pinto et al. (2010). If present, rHH identified as CNVs were removed prior to application of the Fisher’s exact test.
A replication study was undertaken using the AGP follow-up stage 2 data set (freezes 5–8) which comprises 1,182 ASD trios genotyped on a combination of the Illumina 1M and 1M-duo platforms. The stage 2 data was cleaned with the stage 1 samples to ensure that the same markers would be used in both analyses. The stage 1 and stage 2 probands were clustered to identify ancestry-matched replication groups. The four population clusters (C3–C6) with ≥50 probands in both stage 1 and 2 analyses were considered for the replication study. The same HH mapping method was applied to the four stage 2 population clusters. The rHH genes identified in the discovery (stage 1) and replication (stage 2) analyses were then compared to identify overlap.
For each population group, the number and length of autosomal ROH in ASD probands and parental controls was compared using a paired t test. ASD probands were compared to their mothers and fathers separately.
A Fisher’s exact test was used to identify HH that were more prevalent in ASD probands compared to parental controls at a 5% Fisher’s significance level. As we are assuming a recessive model a one-sided test was used.
Each of the five population clusters were analysed independently and the rHH genes subsequently compared to identify overlaps. A gene was considered a candidate in multiple population clusters regardless of whether the position of the rHH in each population differed (to allow for population-specific effects). In this manner, it is possible to identify genes that may confer susceptibility to ASD across multiple populations even though the underlying risk allele may be population-specific. Genes located in/near rHH over-represented in ASD probands in at least two population groups were noted.
A χ2 test with Yates’ correction factor was used to determine if the rHH genes displayed enrichment for previously reported ASD candidate genes (n = 202). To account for a potential bias towards large genes, a logistic multiple regression was performed in STATA including both ASD gene status and gene size (as defined in UCSC hg18) as covariates. Thus, if rHH genes were primarily associated with larger genes, and this was merely a confounding factor arising simply because of a preponderance of larger genes in the ASD list of genes (mean length of 185 kb compared to 57 kb for other genes), then the multiple regression model would identify gene size as the significant predictor, and fail to identify ASD genes as being significantly over-represented amongst the rHH genes.
Runs of homozygosity (ROH) of at least 1 Mb and 100 SNPs were identified in samples from each of the five population clusters using PLINK (“Subjects and methods” and Supplementary Material 1). A paired t test demonstrated that ASD probands did not have a higher genome-wide burden of ROH compared to parental controls (Supplementary Fig. 3). The findings from the current ASD study reflect those of a related neuropsychiatric condition, bipolar disorder, where it was recently reported that individuals with bipolar disorder do not have an excess of runs of homozygosity (Vine et al. 2009). A one-sided Fisher’s exact test was applied to identify HH that were more prevalent in ASD probands compared to parental controls at a 5% significance threshold. Such HH are considered regions of interest and are referred to as rHH. Since low-frequency variants will rarely be present at a high enough frequency to survive multiple testing, they will not be detected with the correction methods currently available. Therefore unless otherwise stated, p values have not been corrected for multiple testing. Genes located within or overlapping the rHH were noted. In cases where the rHH was intergenic, the neighbouring centromeric and telomeric genes were used. We found that, on average, 76% of the rHH were genic (Supplementary Fig. 4).
To determine if excess HH sharing is also likely to occur in a non-disease cohort we compared the number of HHs that are more common in ASD probands compared to parental controls and the number of HHs that are more common in parental controls compared to ASD cases. We observed that ASD probands shared a significantly higher number of homozygous segments of identical haplotype compared to the parental control group (paired t test p value = 0.008) (Fig. 3). This finding suggests that, although ASD probands may not have a higher burden of homozygous segments compared to parental controls, the probands display a much higher degree of haplotype sharing within overlapping homozygous regions. Excess haplotype sharing often indicates the presence of a disease locus (Bahlo et al. 2006), an observation that forms the basis of the current study. Two distinct types of rHH were identified in the affected cohort; (1) homozygous segments with a haplotype that is present in ASD probands but absent in parental controls and (2) homozygous segments with a haplotype that is present at a higher frequency in ASD cases compared to parental controls. A summary of the rHH results is shown in Table 1. The average rHH size across the five population clusters was 541.27 kb, 24-fold larger than the average haplotype block (Gabriel et al. 2002). Visual inspection of LD structure showed that the rHH extended across multiple LD blocks and 90% of rHH contained at least 10 tagging SNPs, indicating that it is unlikely that the observed HH sharing is a consequence of long-range LD.
We identified 192 genes (1 gene in 4 populations, 15 in 3 populations and 176 in 2 populations) that are in or near rHH regions significantly more prevalent among the ASD probands in two or more population clusters (Supplementary Table 4). In four of the five population clusters, an rHH was found in an evolutionary-conserved intergenic region on 3p12.1 in the vicinity of CADM2, a novel ASD candidate gene (Fig. 4). The rHH adjacent to CADM2 were identified in 23/1,019 ASD cases and 11/2,031 parental controls [Yates’ corrected χ2p = 1.9 × 10−5, OR = 4.26 (2.1, 8.6)]. Recent studies have highlighted the presence of long-range regulatory mutations in several diseases and suggest that cis-regulatory domains of developmental genes may extend over megabases of DNA flanking its coding sequences (Benko et al. 2009; Kleinjan and van Heyningen 2005). CADM2 is a member of the synaptic cell adhesion molecule (SynCAM) immunoglobulin superfamily which has potential roles in early postnatal development of the central nervous system (specifically synapse formation) and is predominantly expressed in neurons of the developing and adult brain (Thomas et al. 2008). The encoded protein localises to both excitatory and inhibitory neurons which is intriguing given that there is compelling evidence for dysfunction in neuronal communication and excitatory/inhibitory imbalance in autism (Persico and Bourgeron 2006; Sudhof 2008). Of interest, CADM2 contains a neurexin domain. Rare variation contributing to autism has previously been identified in neurexin and neurexin-binding genes (Arking et al. 2008; Betancur et al. 2009; Kim et al. 2008). In our study, the location of the rHH in relation to CADM2 differed between each of the four population clusters. This may indicate population-specific risk loci arising from different founder events involving the gene or a control element proximal to the gene. Given the recent implication of synaptic genes in ASD susceptibility (Durand et al. 2008; Zoghbi 2003) and the strong functional candidacy of CADM2, its involvement in ASD warrants further investigation.
Other novel ASD candidate genes located in rHH across multiple population clusters include GRIK2; an ionotropic glutamate receptor associated with autosomal recessive mental retardation and autism (Jamain et al. 2002; Motazacker et al. 2007), GRM3; a metabotropic glutamate receptor that impacts on aspects of cognition dependent on hippocampal and prefrontal cortical function (Egan et al. 2004), EPHA3; an ephrin receptor involved in axon guidance and synaptic plasticity, FGF10; a fibroblast growth factor involved in regulating the onset of neurogenesis and cortical brain size (Sahara and O’Leary 2009), ABHD14A; a cerebellar abhydrolase protein with a role in granule neuron development (Hoshino et al. 2003), NTS; a brain and gastrointestinal peptide involved in passive avoidance behaviour and the modulation of dopaminergic neurotransmission and serotonin levels (Shugalev et al. 2008), SCAMP5; a brain-enriched membrane trafficking protein located 10 kb centromeric to the breakpoint of a 15q de novo balanced translocation in a patient with autism (Castermans et al. 2010), CHRFAM7A; a hybrid gene involving a partial duplication of the alpha7 nicotinic acetylcholine receptor in the postsynaptic membrane and that has previously been associated with schizophrenia, bipolar disorder, dementia, Alzheimer’s disease and attention-deficit hyperactivity disorder and KCND2; a brain-specific voltage-gated ion channel component that regulates neurotransmitter release and neuronal excitability at the glutamatergic synapse where SHANK3 and NLGN products are formed.
Of particular interest in this study are the rHH that are ASD-specific; shared by multiple ASD patients but not present in any of the parental controls. We identified 8 ASD-specific rHH, implicating 12 genes (POLR3C, CD160, ZNF364, PDZK1, NUDT17, GBE1, HTR1E, C10orf95, CUECDC2, CHORDC1, MGAT4C and C12orf50) and a cluster of defensins (8p23.1), which are significant in at least two population clusters. The ASD-specific rHH at 1q21.1 is shared by three population clusters and the maximal overlapping region contains a single gene; PDZK1. PDZK1 encodes a scaffold protein that connects plasma membrane proteins and regulatory components. The PDZK1 protein is part of a complex that includes SYNGAP1, KLHL17 and NMDA receptors and this complex is important for maintaining the integrity of actin cytoskeleton structures in neurons (Chen and Li 2005). In addition, the ASD-specific rHH at 1q21.1 overlaps with a rare deletion associated with schizophrenia in two independent studies (Tam et al. 2009). Furthermore, we observed that 8 of 996 ASD cases from the AGP sample set analysed for rare copy number variants in the study by Pinto and colleagues have a rare duplication (n = 6; 145–1,470 kb) or deletion (n = 2; 189 and 528 kb) involving the PDZK1 gene.
The genes located in rHH regions were compared to autosomal genes that have previously been implicated in ASD in association studies, expression analyses and chromosomal anomaly studies, as reviewed by Yang and Gill (2007). We extended the literature search to 2009 using the same search criteria provided by Yang and Gill (Supplementary Table 5). Of the 1,243 genes identified in the study, 25 are published ASD candidate genes (Table 2). However, considerable caution is required in interpreting such findings, since both rHH genes and previously reported ASD candidate genes tend to be larger (p < 10−3). Accordingly, we performed a logistic multiple regression including both ASD gene status and gene size as covariates. While the odds ratio decreased slightly, it remained highly significant, indicating that there is an association between rHH status and ASD status that is independent of gene size [size adjusted p = 0.001, OR = 2.10, (1.36, 3.25)]. In addition, 9 ASD candidates (BTN2A1, FOXP2, GBE1, GRIK2, IMMP2L, LRFN5, LRRN3, ROBO1 and SLC4A10) are amongst the 192 genes located in rHH in two or more population clusters [size adjusted p = 6.9 × 10−5, OR = 4.76 (2.34, 9.64)].
To investigate our findings further, we performed a replication study on an additional 1,182 independent AGP trios. The replication cohort (stage 2) was clustered with the discovery sample set (stage 1) to identify ancestry-matched replication groups (Supplementary Material 1). The four population clusters (C3–C6) with at least 50 probands in both the discovery and replication sample sets were considered for the replication study (Supplementary Table 6). The same analytical strategy was applied to the replication sample set and identified 1,190 genes in rHH regions. The rHH genes identified in the discovery (number of genes = 1,086) and replication (number of genes = 1,190) analyses of population clusters C3–C6 were subsequently compared. We found that 28.5% (310/1086 genes) of the rHH genes identified in the discovery analysis occurred in a rHH in at least one of the replication population clusters (Supplementary Table 7). In particular, the replication study provided further evidence for the possible involvement of the novel candidate genes ABHD14A, CADM2, EPHA3, FGF10, GRIK2, GRM3, and KCND2. SCAMP5 and CHRFAM7A (found in rHH in multiple population clusters in the discovery analysis) did not replicate in their corresponding replication clusters while NTS was significant in one of two ancestry-matched replication groups. Furthermore, 10 previously identified ASD candidate genes (ALAS1, CSMD3, FOXP2, GABRG1, GBE1, GRM8, FBXO33, IMMP2L, SLC4A10 and ACO2) were located in rHH regions in both the discovery and replication HH mapping analyses [size adjusted p = 0.001, OR = 2.99 (1.50, 5.90)]. The ASD-specific rHH at 1p21.1 was also significant in two of four population clusters in the replication study providing further evidence of an ASD risk gene at this locus.
The rHH regions identified in the discovery (stage 1) and replication (stage 2) analyses were compared to the genomic regions that have previously displayed significant linkage (LOD score >3.3) with ASD. We found that 31 rHH identified in the discovery and replication analyses are located under significant ASD linkage peaks (Supplementary Table 8). Of interest, three of the ASD linkage peaks harbour rHH from multiple population groups.
Firstly, three population groups in the replication analysis (stage 2: C4, C5, C6) have an overlapping rHH within the 7q22.1 ASD linkage peak (International Molecular Genetic Study of Autism Consortium (IMGSAC) 2001). The rHH at 7q22.1 range in size from 545 to 808 kb. The maximal shared region is 545.3 kb and includes 19 genes. Three of the genes at this locus (CYP3A4, CYP3A5 and CYP3A7) are members of the cytochrome P450 superfamily that plays important roles in hormone synthesis and breakdown, cholesterol synthesis, vitamin D metabolism and the metabolism of drugs and toxic compounds. Data from the KEGG databases suggests that members of the cytochrome P450 family such as CYP3A4 and CYP3A5 are involved in 5-HT (serotonin) catabolism (Jia et al. 2004). Moreover, in the brain, mRNA expression appears to be specific to neurons in the cerebral cortex and basal ganglia, regions of the brain that are consistently implicated in ASD (Dutheil et al. 2008). In addition, genetic polymorphisms of cytochrome P450 enzymes have been linked to ASD and schizophrenia (Currenti 2010).
Secondly, a rHH shared by two populations in the discovery analysis is located within the significant ASD linkage peak at 15q13.1–q14 (Lauritsen et al. 2006; Liu et al. 2008; Philippe et al. 1999). This particular locus has been implicated in several neuropsychiatric disorders. The overlapping rHH region identified in the current study is 579 kb and contains five genes, only two of which are coding (CHRFAM7A and ARHGAP11B). Of most relevance to ASD is CHRFAM7A which represents a fusion product involving a partial duplication of the alpha7 nicotinic acetylcholine receptor in the postsynaptic membrane and a gene from the family with sequence similarity 7. CHRFAM7A has previously been associated with schizophrenia, bipolar disorder, dementia, Alzheimer’s and attention-deficit hyperactivity disorder (Feher et al. 2009; Flomen et al. 2006; Manchia et al. 2010; Martin et al. 2007; Sinkus et al. 2009). This is the first implication of CHRFAM7A in ASD.
Thirdly, we noted a rHH at 20q11.21–q13.12 that was significantly more common in ASD patients compared to parental controls in two population groups in the replication study. The shared rHH region includes 11 genes. The strongest functional candidate at this locus is MYH7B which encodes a myosin heavy chain protein that maintains excitatory synaptic function. Reduction of MyH7B in rats causes a profound alteration in dendritic spine structure and excitatory synaptic strength (Rubio et al. 2011). The resulting abnormal spine phenotype was strikingly similar to the morphological changes induced by over-expression of SynGAP1, a gene linked to mental retardation and ASD (Hamdan et al. 2009; Pinto et al. 2010). The advantage of the rHH analysis is that the ASD-specific rHH regions are much smaller (~0.53 Mb) than ASD linkage regions which range from 2 to 242 Mb and may facilitate identification of the causative gene within these loci.
Despite some notable successes in neuropsychiatric genetics, overall the high heritability of ASD (~90%) remains poorly explained by common genetic risk variants. Instead, early studies of rare variation, in particular copy number variation, have suggested that rare variants may account for a significant proportion of the genetic basis of ASD. The study of excess haplotype sharing within homozygous regions offers a complimentary approach to the analysis of GWAS data in the study of complex disease and particularly focuses on the contribution of low-frequency variation to disease risk. We report the first genome-wide rHH mapping study to identify novel recessive candidate genes and loci involved in ASD susceptibility. The strategy was applied to one of the largest ASD trio collections, subdivided into 10 distinct population clusters. We identified 307 rHH regions containing 1,243 genes for further investigation. In cases where ancestry-matched replication groups were available, almost one-third of the discovery phase rHH genes (310/1,086) were located in homozygous haplotypes that were significantly more common in ASD patients compared to parental controls in the replication analysis. Importantly, we identified novel ASD genes that merit further study including ABHD14A, CADM2, CHRFAM7A, EPHA3, FGF10, GRIK2, GRM3, KCND2 and PDZK1. The current study also provides further support for 25 previously reported ASD genes. Interestingly 192 of the rHH genes were significant in multiple population clusters. The variation in haplotype of the rHH across the different populations may suggest the existence of common risk genes but population-specific risk alleles. The findings of the current study serve as a starting point to screen for causal variants and elucidate the underlying pathogenesis of ASD. In particular, sequence analysis will be more economically feasible since the search for causal variants is limited to 1,243 genes and specifically identifies the patients carrying the rHH of interest, allowing a more targeted follow-up approach.
Sample size is an important feature of the HH mapping approach and is a limiting factor in the current study. A number of the population clusters in the ASD study are of modest sample size (n < 100) and five population clusters could not be included because of insufficient sample numbers. Larger collections of genetically homogenous samples would increase the power to detect low-frequency events. Similarly, due to the small sample sizes, we have not corrected for multiple testing. The HH sharing strategy aims to identify genes that may contain low-frequency disease variants. Given the small sample sizes, such events are highly unlikely to survive correction for multiple testing. Although this may be considered a weakness it is important to note that correcting for multiple testing will not change the relative order of the rHH results (Zaykin and Zhivotovsky 2005). The distributions of ranks will remain the same with HHs that show the greatest difference in frequency between ASD cases and parental controls remaining as the most important findings. We acknowledge that by not correcting for multiple testing, there is an increased risk of false positive findings. To address this we have focused on the genes that occur in rHH in multiple population clusters and replicate in the stage 2 sample set, as they are less likely to be false positives.
Analysis of Log R ratios and B allele frequencies confirmed that the rHH regions are not attributed to copy number deletions. To ascertain whether some unknown genomic architecture contributed to the observed rHH findings and also to reduce the risk of false positives, we undertook rHH mapping in two additional disease data sets from the Wellcome-Trust Case–Control Consortium; bipolar disorder (BD) (cases = 1,875 and controls = 2,954) and coronary artery disease (CAD) (cases = 1,963 and controls = 2,978). We hypothesised that genome architecture could possibly be contributing to the results of the ASD rHH analysis if (1) the rHH in the BD and CAD studies showed a significant overlap with the rHH identified in the current ASD study and (2) the BD and CAD rHH genes showed a significant enrichment for previously identified ASD genes. There was a 2 and 2.6% overlap between BD-ASD and CAD-ASD rHH regions, respectively, neither of which are greater than expected by chance (J.P.C., unpublished data). Furthermore the BD and CAD rHH genes did not display an enrichment for known ASD candidate genes (BD Yates’ corrected χ2p = 0.815, CAD Yates’ corrected χ2p = 1.000) suggesting that the rHH identified in the current study are more likely to be related to the ASD phenotype than to the genomic architecture in the rHH regions.
There are very few analytical strategies designed to identify rare recessive disease genes for complex traits. The homozygous haplotype sharing strategy addresses this issue and proposes a novel concept for searching for genes that may contain low-frequency disease variants. The most important feature of the HH mapping approach presented in this study is the concept; analysis of the haplotype within shared homozygous segments provides an additional level of information that has been overlooked in ROH-based analyses and excess sharing of a homozygous haplotype amongst patients may support the presence of a rare recessive disease mutation in the region. In the future, the strategy itself will undoubtedly benefit from further modifications and improvements, particularly in the areas of modelling, simulation and statistics for rare genetic events.
We applied the HH mapping approach to one of the largest international ASD cohorts (4,206 samples) and identified novel and known ASD candidate genes which were replicated in an independent sample set. We also found that homozygous haplotypes over-represented in ASD patients were significantly enriched for previously identified ASD candidate genes, further validating our approach. Although HH mapping does not identify causative alleles, the regions reported in the current study provide narrow genomic intervals containing highly plausible candidate genes for further investigation. The findings reported in this study suggest that the analysis of homozygous haplotype sharing may be an important tool in uncovering some of the missing heritability in a variety of complex disorders.
Ethical standards The experiments comply with the current laws of the country in which they were performed.
The authors acknowledge the families participating in the study and the main funders of the Autism Genome Project Consortium (AGP): Autism Speaks (USA), the Health Research Board (HRB; Ireland), The Medical Research Council (MRC; UK), Genome Canada/Ontario Genomics Institute, and the Hilibrand Foundation (USA). Additional support for individual groups was provided by the US National Institutes of Health (NIH grants HD055751, HD055782, HD055784, HD35465, MH52708, MH55284, MH57881, MH061009, MH06359, MH066673, MH080647, MH081754, MH66766, NS026630, NS042165, NS049261), the Canadian Institute for Advanced Research (CIFAR), the Canadian Institutes for Health Research (CIHR), Assistance Publique–Hôpitaux de Paris (France), Autistica, Canada Foundation for Innovation/Ontario Innovation Trust, Deutsche Forschungsgemeinschaft (grant Po 255/17-4) (Germany), EC Sixth FP AUTISM MOLGEN, Fundação Calouste Gulbenkian (Portugal), Fondation de France, Fondation FondaMental (France), Fondation Orange (France), Fondation pour la Recherche Médicale (France), Fundação para a Ciência e Tecnologia (Portugal), the Hospital for Sick Children Foundation and University of Toronto (Canada), INSERM (France), Institut Pasteur (France), the Italian Ministry of Health (convention 181 of 19.10.2001), the John P. Hussman Foundation (USA), McLaughlin Centre (Canada), Ontario Ministry of Research and Innovation (Canada), the Seaver Foundation (USA), the Swedish Science Council, The Centre for Applied Genomics (Canada), the Utah Autism Foundation (USA) and the Wellcome Trust core award 075491/Z/04 (UK). We acknowledge support from the Autism Genetic Resource Exchange (AGRE) and Autism Speaks. We gratefully acknowledge the resources provided by the AGRE consortium and the participating AGRE families. AGRE is a program of Autism Speaks and is supported, in part, by grant 1U24MH081810 from the National Institute of Mental Health to Clara M. Lajonchere (PI). We wish to acknowledge the National Children’s Research Centre Our Lady’s Children’s Hospital Crumlin Ireland for providing additional support and the Wellcome Trust Case–Control Consortium for providing data sets that were used as part of this study. J.P.C is supported by an EMBARK postgraduate award from the Irish Research Council for Science, Engineering and Technology (IRCSET).
The authors declare that they have no conflict of interest.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Dan Geschwind, M.D., Ph.D., UCLA, Los Angeles, CA; Maja Bucan, Ph.D., University of Pennsylvania, Philadelphia, PA; W.Ted Brown, M.D., Ph.D., F.A.C.M.G., N.Y.S. Institute for Basic Research in Developmental Disabilities, Staten Island, NY; Rita M. Cantor, Ph.D., UCLA School of Medicine, Los Angeles, CA; John N. Constantino, M.D., Washington University School of Medicine, St. Louis, MO; T.Conrad Gilliam, Ph.D., University of Chicago, Chicago, IL; Joachim Hallmayer, Ph.D., Stanford University, Stanford, CA; Martha Herbert, M.D., Ph.D., Harvard Medical School, Boston, MA Clara Lajonchere, Ph.D, Autism Speaks, Los Angeles, CA; David H. Ledbetter, Ph.D., Emory University, Atlanta, GA; Christa Lese-Martin, Ph.D., Emory University, Atlanta, GA; Janet Miller, J.D., Ph.D., Autism Speaks, Los Angeles, CA; Stanley F. Nelson, M.D., UCLA School of Medicine, Los Angeles, CA; Gerard D. Schellenberg, Ph.D., University of Washington, Seattle, WA; Carol A. Samango-Sprouse, Ed.D., George Washington University, Washington, D.C.; Jonathan Shestack, Autism Speaks, NY, NY; Sarah Spence, M.D., Ph.D., UCLA, Los Angeles, CA; Matthew State, M.D., Ph.D., Yale University, New Haven, CT.Rudolph E. Tanzi, Ph.D., Massachusetts General Hospital, Boston, MA.
Annemarie Poustka: deceased.