To understand the complexities of genomic architecture of ASDs, a highly heterogeneous group of disorders, and to assess the distribution and specificity of common and rare variants in different populations, we initiated genomic analysis of ASD and control subjects from Croatia. This cohort, consisting of 103 subjects (children and adults) with ASD and 203 adult control subjects, with male-to-female ratios of 3.7:1 (in ASD) and 2.6:1 (in controls), was originally recruited to study the role of hyperserotonemia in autism [26
]. Patients with autism were evaluated by a child psychiatrist and diagnoses were based on DSM-IV-TR [29
] criteria. Severity of behavioral symptoms was measured using the Childhood Autism Rating Scale (CARS) [30
]. The degree of mental retardation (MR) was assessed according to the standardized intelligence or developmental tests, corresponding to the apparent developmental level of each individual. ASD in selected families occurs either in the absence of positive family history (sporadic) or in families with a high prevalence of other clinical psychiatric disorders distinct from ASD (such as anxiety, depression and phobias). During the clinical assessment and interviews, psychiatrists and geneticists did not notice any recognizable dysmorphic features in selected study subjects. DNA samples isolated from whole blood were genotyped with the Illumina HumanHap550 SNP array and 306 samples gave high quality data that were subjected to population structure and CNV analysis. See Table for a summary of the genome-wide datasets used in this study.
Summary of datasets with whole-genome genotyping
To understand the genetic homogeneity of the Croatian cohort and its position in the worldwide human population, we used phylogenetic tree analysis, principal component analysis (PCA) (not shown), and multi-dimensional scaling (MDS), three tools commonly used in the study of human population genetic diversity [23
]. We used data from the Human Genome Diversity Project (HGDP), which consists of genome-wide SNP data of 1007 individuals from 51 populations, divided into 7 geographic regions. Of these individuals, 153 were from 8 populations from the European region. We plotted the results of MDS and tree analyses using control subjects from Croatia and the HGDP samples (Figure ). All three methods show the same overall pattern emerging from previous HGDP studies: the relative positions of the major regions largely correlate with their relative geographical locations, and reflect the human migration history well. Humans migrated from Africa to the Middle East (Figure ); the first branching took place, and one group migrated to Europe, while the other migrated to South Asia. A second major branching then occurred, with one migration to America (through Alaska), the other countinuing to East Asia and eventually Oceania. As expected, the Croatia control samples are entirely contained within the European samples. A focused analysis of Europeans in the HGDP panel (Figure ) with the Croatia control samples situated in the middle of phylogeny or MDS plots shows the relationship of this subpopulation to other European populations. The Sardinians and the Basques seem to be the farthest, with Italian samples placed in between them and the Croatian controls; the other subpopulation that is separate from the Croatia controls is the Adygei. The Croatians are most similar to the Russians and the Orcadians. Both the PCA (not shown) and MDS plots show the relative diversity of the Croatian controls (indicated by the area spanned by the samples on the MDS and PCA plots) to be similar to the other populations. The boxplot of pairwise Identity by State (IBS) distances for European populations (Figure ) showed the Croatian control cohort has divergence similar to other European subpopulations (except Basques and Sardinians who are genetically more homogeneous), and smaller than the European population as a whole.
Figure 1 Population structure of Croatia Cohort. (A) FastME tree using IBS distance and (B) IBS distance multidimensional scaling (MDS) for Croatia control + HGDP, colored by HGDP region IDs. (C) FastME tree using IBS distance and (D) IBS distance multidimensional (more ...)
Figure 2 Pairwise IBS analysis. (A) Box-and-whisker plot of pairwise IBS distances for Croatia Control, subpopulations of HGDP European Cohort, and the whole HGDP European Cohort. (B) Scatter plot of the first two coordinates of IBS distance multidimensional scaling (more ...)
We performed MDS on the Croatian controls and cases, as well as NINDS (National Institute of Neurological Disorders and Stroke) control cohorts to check the relationships between the three cohorts (Figure ), and found that the Croatian control and cases are well mixed but separate from the NINDS controls. Randomization tests by shuffling between Croatian control and NINDS controls showed the median pairwise IBS (identity-by-state) distance across the two subsets (0.28) is significantly higher than expected by chance (p = 0.00273, see Figure ); this agrees with the MDS visualization that the two cohorts are genetically separate. Analysis of fixation index (FST) led to the same conclusion (see Table for top regions with large FST). In summary, the Croatian and the NINDS cohorts have significant genetic difference, and care must be taken to correct for population stratification in combining the two datasets for association studies.
Figure 3 Analysis of median and mean cross-population pairwise IBS distances between Croatia Control and NINDS. (A) Observed median IBS distance and distribution of median IBS distances by random shuffling of population memberships; (B) observed mean IBS distance (more ...)
Top windows for sliding window FST analysis
Mapping of large regions of homozygosity in pedigrees with shared ancestry has been employed in the discovery of autosomal recessive diseases associated with neurodevelopmental anomalies [31
], and more recently, in the identification of novel genes associated with ASD, schizophrenia and Alzheimer's Disorder [17
]. Although we recruited unrelated ASD cases and healthy controls, we analyzed genome-wide rates of homozygosity to estimate consanguinity and compared these rates to a heterogeneous control sample of Europeans from North America. It has been suggested that mapping of extended tracks of homozygosity may facilitate identification of low-frequency variants associated with complex diseases [36
]. We detected regions ("runs") of homozygosity (ROH) using the PLINK v1.05 software [38
] and compared the extent of homozygosity in Croatia ASD cases, control subjects and NINDS controls. We computed the number of ROH segments, total ROH length, and average ROH length for the three subsets (Table ). We detected subtle differences between the three groups for the median number of ROHs (Kruskal-Wallis test; p = 0.06), as well as average ROH length between Croatia controls and NINDS controls (Wilcoxon test; p = 0.053), with NINDS cohort showing shorter total ROH length than Croatia controls (Kruskal-Wallis test; p = 0.0056, higher due to the very large number of NINDS samples). However, the most significant pairwise difference is the total ROH length between Croatian cases Croatian controls (Wilcoxon test; p = 0.0065).
Summary of the ROH comparison across Croatia Control, Croatia Case, and NINDS sample groups.
Motivated by the study demonstrating the positive correlation between runs of homozygosity and age in a North American cohort of European descent due to urbanization [35
], we examined age correlation with ROH frequency and size in the 103 ASD subjects (Figure and Table ). We found only borderline significance between age and average ROH size, though the lack of significance may be due to the fact that the Croatia ASD cohort is 40 years younger (mean age is 21.5 y as opposed to 61.7 y in Nalls et al. [35
Scatter plots for ROH number and length versus Age of Subject. (A) Total ROH length (in Kb) versus Age; (B) Number of ROHs versus Age; (C) Average ROH size (in Kb) versus Age.
Correlation between ROH and age
ROH regions were common or specific for a demographic group (Croatia vs. North America) or a disease status (ASD vs. controls). Although the longest ROH regions do not need to necessarily coincide with the disease susceptibility, we cannot exclude a possibility that recessive variants, such as single nucleotide or copy number variants within these regions contribute to disease susceptibility. To identify which regions contribute to the significant difference between Croatia case and control groups, we sorted the 7,740 endpoints of all autosomal ROHs across Croatia case and control groups, computed the significance of association between ROH status and case/control status by Fisher's test for each pair of adjacent ROH endpoints, and examined the significance of each such ROH interval where at least 5% (15) individuals have the ROH (Table ). The most significant ROHs are in two chromosomal regions: chr2:82,040,863-82,677,336 (2p12) and chr6:29,481,192-29,719,410 (6p21.1) (Figure and Figure ). It is striking these two homozygosity regions are near several genes previously associated with neurological and psychiatric disorders. The chromosome 2 interval (636 Kb) is located downstream of CTNNA2
(alpha catenin) and LRRTM1
(leucine-rich repeat transmembrane neuronal protein 1), and contains a segment (between SNPs rs2685729 and rs2862972) significantly enriched in ASD subjects (top p = 2.5 × 10-3
). The LRRTM1
gene is associated with left-handedness (patients with autism and many other psychiatric disorders have a higher tendency to be left-handed [39
]) and may increase risk of schizophrenia [40
]. The chromosome 6 interval (238 Kb) is a gene-rich region located distal to the major histocompatibility complex. This ROH is shared by 17 ASD subjects (16.5%) and 19 controls (9.6%) spans 8 genes, including a cluster of olfactory receptors (OR11A1
), the MOG
locus, and the GABBR1
locus. The MOG
gene (myelin oligodendrocyte glycoprotein) is a transmembrane protein expressed on the surface of myelin sheath, and may be involved in multiple sclerosis (OMIM#126200) [42
]; recent studies also found that MOG
is differentially expressed in prefrontal cortex and temporal lobes in schizophrenic patients [43
]. The most significant interval (between SNPs rs1233399 and rs29225; top p = 3.4 × 10-3
) spans the 3' end of the GABBR1
(Gamma-aminobutyric acid (GABA) B receptor, 1) and the overlapping UBD
(Ubiquitin D) loci. GABBR1
is highly expressed in nervous systems and has been suggested as a candidate region for neuropsychiatric disorders [45
]. A recent study showed that the expression levels of GABBR1 are significantly lower in several brain regions in patients with autism compared with healthy controls [47
]. Finally, we examined the median IBS distances between the subjects with regions of homozygosity on chromosomes 2 (13 Croatian cases and 10 Croatian controls) and chromosome 6 (17 Croatian cases and 19 Croatian controls) with permutation tests, but found that these subjects were not more closely related than the population average. Thus, the increased homozygosity in Croatia ASD cases cannot be explained by inbreeding among close relatives in a small number of selected families. However, the observed homozygosity may be the result of shared ancestral alleles from generations back in time.
Top ROH regions in ROH association analysis
UCSC Genome Browser genomic annotation around the region chr2:82,040,863-82,677,336. Genomic annotations of a region where ROH status is associated with case/control status in the Croatia cohort.
UCSC Genome Browser genomic annotation around the region chr6:29,481,192-29,719,410. Genomic annotations of a region where ROH status is associated with case/control status in the Croatia cohort.
Several studies have shown a role for copy number variation in predisposing individuals to complex diseases [48
], including ASD [15
]. To address this, we next analyzed CNV calls from the high-density SNP genotyping data in Croatia cases and control subjects. We used PennCNV [53
], a high-resolution CNV detection method, to call CNVs from the signal intensity data. We identified 2,303 CNVs (using a 10-SNP threshold) in 304 subjects passing the quality control criteria (Table ). The median size of CNVs is 71.4 Kb, while the mean size of CNVs is 131.4 Kb. Due to the relatively small sample size, we do not expect that we could detect association of specific CNVs with disease status; instead, we focused on large and rare CNVs observed in cases but not controls, since they are more likely to be disease-related. For example, a recent study identified large (>500 Kb) CNVs to be especially pathogenic [54
]; in addition, a CNV study on schizophrenia demonstrates that CNVs >500 Kb tend to be more prevalent in cases versus controls [55
], and that CNVs >2 Mb are observed exclusively in cases [15
Summary of the CNV analysis.
Examining 299 unrelated individuals passing QC, we identified a total of 32 CNVs >500 Kb, including 20 duplications and 12 deletions (Table and Table ). These large chromosomal rearrangements, detected in DNA samples isolated from blood (rather then cell lines), were confirmed through signal intensity data for the entire chromosome in the Illumina BeadStudio software (data not shown). Among them, 5/103 cases (4.85%) and 1/197 controls subject (0.49%) have CNVs greater than 2 Mb. An exceptionally large CNV was detected in an ASD female that harbored an 18 Mb duplication on chromosome 7 (7q21.11-21.3), which includes several known neurodevelopmental genes (PCLO
). A duplication of this chromosomal region has been associated with intellectual disability and multiple congenital anomalies [58
]. Another female has a 10 Mb deletion on chromosome 11 (11q24.2-q25). The Jacobsen syndrome (MIM147791) is a contiguous gene syndrome caused by terminal deletions of the long arm of chromosome 11 and typically associated with growth and psychomotor retardation and characteristic facial dimorphism [59
Large CNVs in the Croatia dataset: CNVs longer than 500 Kb in Croatia Cases.
Large CNVs in the Croatia dataset: CNVs longer than 500 Kb in Croatia Controls.
Three of the ASD cases have multiple large rearrangements; one adult male harbors two chromosomal rearrangements (separated by 117 Mb) on chromosome 1, a 335 Kb deletion on 1p35.3 and a 882 Kb duplication on 1q21.1 (Figure ). In a second individual (adult female) we identified two distinct large CNVs: a 838 Kb deletion in an intergenic region between GRM8
on 7q31.33 (chr7:124,752,465-125,591,197) and a 691 Kb deletion at 22q13.33 (chr22:48,833,840-49,524,956). This deletion encompasses the SHANK3
gene and cytogenetic rearrangements affecting this gene have been previously associated with ASD [60
]. It has been previously reported that the 22q13 deletion syndrome, also known as Phelan-McDermid syndrome (MIM606232), is characterized by severe neonatal hypotonia and global developmental delay, normal to accelerated growth, absent to severely delayed speech, and minor dysmorphic features [62
]. The third individual (male child) with multiple large rearrangements harbors a 534 Kb intergenic deletion at 2q21.1 (chr2: 129,716,352-130,250,751) and a 400 Kb duplication at 3q29 (chr3:195,987,870-196,387,903). These large structural variants, especially those found on multiple chromosomes illustrate the need to combine high-density genotype analysis with karyotype analysis to test for the presence of balanced translocations and inversions.
Figure 7 CNV status of AGRE, NINDS control, and Croatia ASD/control cohorts. (A) CNVs around four candidate genes (ACP6, CHD1L, FMO5, and PRKAB2) in 882 kb duplication on 1q21.1 (chr1:144,943,150-145,824,905). (B) Duplication of size 979 kb around the neuronal (more ...)
In contrast to large rearrangements encompassing >10 Mbs that were observed in ASD cases, the largest rearrangement observed in control subjects is a 2.5 Mb duplication at 15q24.1-q24.2 (chr15:70,783,089-73,316,235), a known microdeletion/duplication region [64
]. Also, we found one control subject to have two duplications within the DiGeorge (MIM188400) and Velocardiofacial (MIM192430) syndrome region on chromosome 22 (22q11.21), a known pathogenic rearrangement hotspot where large (>1.5 Mb) deletions are pathogenic [65
]. These duplications (922 Kb in chr22:17,690,812-18,613,429 and 563 Kb in chr22:19,102,598-19,665,559) are separated by 489 Kb. The observation that we see more large CNVs associated with ASD subjects (>2 Mb: 5 cases, 1 control; Fisher's test; p = 0.019) suggests these large rearrangements may be rare ASD susceptible variants; follow-up investigations on larger cohorts will be necessary.