Genetic Analysis Workshop 19 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence and gene expression data from a pedigree-based sample, as well as whole-exome sequence data from a large cohort of unrelated individuals. In this article we present an overview of the data sets, the GAW experience, and summaries of the contributions arranged into nine methodological themes.
Identifying variants that regulate gene expression and delineating their genetic architecture is a critical next step in our endeavors to better understand the genetic etiology of complex diseases. The appropriate genomic tools are in place, and preliminary analytic strategies have been developed.
Here we used Genetic Analysis Workshop (GAW) 19 data to investigate the genetic complexity of expression quantitative trait loci (eQTL), chromosomal regions likely to harbor regulatory elements responsible for gene expression. For this investigation, we analyzed the lymphocyte expression profiles of 653 individuals in 20 pedigrees who were also genotyped by single nucleotide polymorphism (SNP) arrays, followed by sequencing and imputation. We used these data to examine the degree of allelic heterogeneity, a contributor to genetic complexity at eQTL, by sequentially conditioning on the most significantly associated SNPs.
SOLAR-MGA (Sequential Oligogenic Linkage Analysis Routines, measured genotype approach) and FaST-LMM (Factored Spectrally Transformed Linear Mixed Model) both allowed us to analyze the pedigree data. Power and type 1 error rates for single-SNP association testing and for multiple-SNP sequential association testing were consistent between the two programs. Sequential conditioning of the real expression data revealed substantial allelic heterogeneity at the 2 eQTL examined, illustrating this feature of genetic complexity.
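Sequential conditioning can be sketched as a greedy forward scan. At each step, the SNP with the largest conditional association statistic is selected and added to the covariates, and the remaining SNPs are rescanned. This is a minimal sketch that assumes unrelated individuals and ordinary least squares. It does not model the kinship covariance that SOLAR-MGA and FaST-LMM account for. The helper names and the |t| stopping threshold are hypothetical.

```python
import math

def _center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _residualize(v, basis):
    """Project out each orthonormal basis vector (one Gram-Schmidt pass)."""
    r = list(v)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def sequential_conditioning(y, snps, t_threshold=5.0):
    """Greedy forward conditional association scan (hypothetical helper).

    y: expression values; snps: dict name -> dosage list (0/1/2).
    Treats individuals as unrelated OLS observations; no kinship term.
    Returns the names of the SNPs selected, in selection order.
    """
    n = len(y)
    basis = []            # orthonormal basis spanning the selected SNPs
    y_res = _center(y)    # centering absorbs the intercept
    selected = []
    while True:
        df = n - len(selected) - 2   # intercept + selected SNPs + candidate
        yy = _dot(y_res, y_res)
        if df < 1 or yy < 1e-12:
            break
        best = None                  # (t, name, residualized dosages)
        for name, g in snps.items():
            if name in selected:
                continue
            x = _residualize(_center(g), basis)
            xx = _dot(x, x)
            if xx < 1e-12:
                continue             # collinear with SNPs already selected
            r = _dot(x, y_res) / math.sqrt(xx * yy)  # partial correlation
            r2 = min(r * r, 1 - 1e-12)
            t = abs(r) * math.sqrt(df / (1 - r2))    # conditional |t|
            if best is None or t > best[0]:
                best = (t, name, x)
        if best is None or best[0] < t_threshold:
            break
        _, name, x = best
        norm = math.sqrt(_dot(x, x))
        q = [xi / norm for xi in x]
        basis.append(q)
        y_res = _residualize(y_res, [q])   # condition y on the new SNP
        selected.append(name)
    return selected
```

Each selected SNP is orthogonalized against those already chosen. The statistic at each step is therefore the partial correlation conditional on all previous selections, not on the last one only.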
eQTL exhibit substantial genetic complexity among and within pedigrees.
Background: A long-standing epidemiological puzzle is the reduced rate of rheumatoid arthritis (RA) in those with schizophrenia (SZ) and vice versa. Traditional epidemiological approaches to determine if this negative association is underpinned by genetic factors would test for reduced rates of one disorder in relatives of the other, but sufficiently powered data sets are difficult to achieve. The genomics era presents an alternative paradigm for investigating the genetic relationship between two uncommon disorders.
Methods: We use genome-wide common single nucleotide polymorphism (SNP) data from independently collected SZ and RA case-control cohorts to estimate the SNP-based genetic correlation between the disorders. We also test a genotype × environment (G×E) hypothesis for SZ, with environment defined as winter- vs summer-born.
Results: We estimate a small but significant negative SNP-genetic correlation between SZ and RA (−0.046, s.e. 0.026, P = 0.036). The negative correlation was stronger for the SNP set attributed to coding or regulatory regions (−0.174, s.e. 0.071, P = 0.0075). Our analyses led us to hypothesize a gene-environment interaction for SZ in the form of immune challenge. We used month of birth as a proxy for environmental immune challenge and estimated the genetic correlation between winter-born and non-winter-born SZ to be significantly less than 1 for coding/regulatory region SNPs (0.56, s.e. 0.14, P = 0.00090).
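A normal (Wald) approximation, z = (estimate − null)/s.e., relates the reported estimates and standard errors to their P values. The sketch uses one-sided tests against 0 for the disorder correlations and against 1 for the winter/non-winter comparison. It gives P ≈ 0.038, 0.0071 and 0.00084. These are close to, but not identical with, the reported 0.036, 0.0075 and 0.00090, so the published values were presumably obtained with a test the abstract does not specify. This is an arithmetic illustration only, and the one-sided framing is an assumption.

```python
import math

def one_sided_wald_p(estimate, se, null=0.0):
    """Lower-tail P for H1: true value < null, with z = (estimate - null) / se."""
    z = (estimate - null) / se
    # Standard normal CDF expressed through the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))

p_all = one_sided_wald_p(-0.046, 0.026)             # SZ vs RA, all SNPs
p_coding = one_sided_wald_p(-0.174, 0.071)          # coding/regulatory SNPs
p_season = one_sided_wald_p(0.56, 0.14, null=1.0)   # winter vs non-winter SZ
```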
Conclusions: Our results are consistent with epidemiological observations of a negative relationship between SZ and RA reflecting, at least in part, genetic factors. Results of the month of birth analysis are consistent with pleiotropic effects of genetic variants dependent on environmental context.
Schizophrenia; rheumatoid arthritis; genetic relationship; pleiotropy
Analysis of de novo CNVs (dnCNVs) from the full Simons Simplex Collection (SSC) (N = 2,591 families) replicates prior findings of strong association with autism spectrum disorders (ASDs) and confirms six risk loci (1q21.1, 3q29, 7q11.23, 16p11.2, 15q11.2-13, and 22q11.2). Adding published CNV data from the Autism Genome Project (AGP) and exome sequencing data from the SSC and the Autism Sequencing Consortium (ASC) shows that genes within small de novo deletions, but not within large dnCNVs, significantly overlap the high-effect risk genes identified by sequencing. In contrast, large dnCNVs appear likely to contain multiple modest-effect risk genes. Overall, we find strong evidence that de novo mutations are associated with ASD independently of the risk for intellectual disability. Extending the transmission and de novo association test (TADA) to include small de novo deletions reveals 71 ASD risk loci, including 6 CNV regions (noted above) and 65 risk genes (FDR ≤ 0.1).
Fears et al. investigate brain-behaviour associations in families genetically enriched for bipolar disorder. Increased ventrolateral prefrontal thickness is associated with better memory in affected individuals but not unaffected family members. Effects of ageing on cognition do not differ between the diagnostic groups, with greater global brain volume associated with cognitive resilience in both.
Recent theories regarding the pathophysiology of bipolar disorder suggest contributions of both neurodevelopmental and neurodegenerative processes. While structural neuroimaging studies indicate disease-associated neuroanatomical alterations, the behavioural correlates of these alterations have not been well characterized. Here, we investigated multi-generational families genetically enriched for bipolar disorder to: (i) characterize neurobehavioural correlates of neuroanatomical measures implicated in the pathophysiology of bipolar disorder; (ii) identify brain–behaviour associations that differ between diagnostic groups; (iii) identify neurocognitive traits that show evidence of accelerated ageing specifically in subjects with bipolar disorder; and (iv) identify brain–behaviour correlations that differ across the age span. Structural neuroimages and multi-dimensional assessments of temperament and neurocognition were acquired from 527 (153 bipolar disorder and 374 non-bipolar disorder) adults aged 18–87 years in 26 families with heavy genetic loading for bipolar disorder. We used linear regression models to identify significant brain–behaviour associations and test whether brain–behaviour relationships differed: (i) between diagnostic groups; and (ii) as a function of age. We found that total cortical and ventricular volume had the greatest number of significant behavioural associations, and included correlations with measures from multiple cognitive domains, particularly declarative and working memory and executive function. Cortical thickness measures, in contrast, showed more specific associations with declarative memory, letter fluency and processing speed tasks. 
While the majority of brain–behaviour relationships were similar across diagnostic groups, increased cortical thickness in ventrolateral prefrontal and parietal cortical regions was associated with better declarative memory only in bipolar disorder subjects, and not in non-bipolar disorder family members. Additionally, while age had a relatively strong impact on all neurocognitive traits, the effects of age on cognition did not differ between diagnostic groups. Most brain–behaviour associations were also similar across the age range. The exceptions were cortical and ventricular volume and lingual gyrus thickness, which showed weak correlations with verbal fluency and inhibitory control at younger ages that increased in magnitude in older subjects, regardless of diagnosis. Findings indicate that neuroanatomical traits potentially impacted by bipolar disorder are significantly associated with multiple neurobehavioural domains. Structure–function relationships are generally preserved across diagnostic groups, with the notable exception of ventrolateral prefrontal and parietal association cortex, in which volumetric increases may be associated with cognitive resilience specifically in individuals with bipolar disorder. Although age impacted all neurobehavioural traits, we did not find any evidence of accelerated cognitive decline specific to bipolar disorder subjects. Regardless of diagnosis, greater global brain volume may represent a protective factor against the effects of ageing on executive functioning.
bipolar disorder; structural MRI; neurocognition; temperament; pedigrees; component phenotype
Variants that regulate gene expression (expression quantitative trait loci, eQTL) are enriched among SNPs associated with complex traits. This observation has made genome-wide characterization of gene expression an important tool in genetic mapping studies of such traits. As part of a study to identify genetic loci contributing to bipolar disorder and other quantitative traits in members of 26 pedigrees from Costa Rica and Colombia, we measured gene expression in lymphoblastoid cell lines derived from 786 pedigree members. The study design enabled us to:

- comprehensively reconstruct the genetic regulatory network in these families;
- provide estimates of heritability;
- identify eQTL;
- evaluate missing heritability for the eQTL;
- quantify the number of different alleles contributing to any given locus.

In the eQTL analysis, we use a recently proposed hierarchical multiple testing strategy that controls error rates for the discovery of functional variants. Our results elucidate the heritability and regulation of gene expression in this unique Latin American study population and identify a set of regulatory SNPs that may be relevant in future investigations of complex disease in this population. Because our subjects belong to extended families, we are able to compare traditional kinship-based heritability estimates with those from more recent methods that depend only on genotype information.
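The abstract does not spell out the hierarchical procedure. One common two-level scheme in this spirit follows Benjamini and Bogomolov's selective testing of families. First, each gene's family of SNP P values is combined with a Simes test, and Benjamini-Hochberg (BH) is applied across genes. Then SNPs are tested only within the selected genes, at the reduced level q·R/G, where R of G genes were selected. The sketch below assumes that scheme. It is not necessarily the authors' exact procedure, and the function names are hypothetical.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank                     # largest rank passing its threshold
    return set(order[:k])

def simes(pvals):
    """Simes combination P value for the global null of one gene's family."""
    m = len(pvals)
    return min(m * p / rank for rank, p in enumerate(sorted(pvals), start=1))

def hierarchical_eqtl(gene_to_snp_p, q=0.05):
    """Level 1: BH across genes on Simes P values.
    Level 2: BH within each selected gene at level q * R / G."""
    genes = list(gene_to_snp_p)
    gene_p = [simes(list(gene_to_snp_p[g].values())) for g in genes]
    selected = bh_reject(gene_p, q)
    R, G = len(selected), len(genes)
    discoveries = {}
    for idx in sorted(selected):
        g = genes[idx]
        snps = list(gene_to_snp_p[g])
        pv = [gene_to_snp_p[g][s] for s in snps]
        hits = bh_reject(pv, q * R / G)
        discoveries[g] = sorted(snps[i] for i in hits)
    return discoveries
```

Shrinking the within-gene level by R/G is what keeps the error rate controlled on average over the selected genes, rather than only within each gene separately.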
We assess the heritability and genetic regulation of gene expression in a population of 786 individuals from Costa Rica and Colombia. The subjects, originally recruited in a study of bipolar disorder, are related within 26 extended families. This design allows us to compare estimates of the heritability of gene expression obtained using both traditional and genotype-based methods. We address questions regarding the architecture of genetic regulation including the extent to which gene expression is influenced by variants located nearby vs. far away on the genome and how many variants affect the expression of a given gene. In addition, we identify genetic variants which regulate gene expression; these serve as candidates for future studies to establish the genetic basis of complex traits, including those related to bipolar disorder, and also provide insight into the architecture of genetic regulation in this unique Latin American study population.
Genetic Analysis Workshop 18 provided a platform for developing and evaluating statistical methods to analyze whole-genome sequence data from a pedigree-based sample. In this article we present an overview of the data sets and the contributions that analyzed these data. The family data, donated by the Type 2 Diabetes Genetic Exploration by Next-Generation Sequencing in Ethnic Samples Consortium, included sequence-level genotypes based on sequencing and imputation, genome-wide association genotypes from prior genotyping arrays, and phenotypes from longitudinal assessments. The contributions from individual research groups were extensively discussed before, during, and after the workshop in theme-based discussion groups before being submitted for publication.
Autism spectrum disorder (ASD) is characterized by deficits in social function and the presence of repetitive and restrictive behaviors. Following a previous proof-of-principle study, we adopted a quantitative approach to discovering genes contributing to the broader autism phenotype by using social responsiveness as an endophenotype for ASD.
Linkage analyses using scores from the Social Responsiveness Scale (SRS) were performed in 590 families from the Autism Genetic Resource Exchange (AGRE), a largely multiplex ASD cohort. Regional and genome-wide association analyses were performed to search for common variants contributing to social responsiveness.
SRS is unimodally distributed in male offspring from multiplex autism families, in contrast with a bimodal distribution observed in females. In correlated analyses differing by SRS respondent, genome-wide significant linkage for social responsiveness was identified at chr8p21.3 (multi-point LOD=4.11; teacher/parent scores) and chr8q24.22 (multi-point LOD=4.54; parent-only scores), respectively. Genome-wide or linkage-directed association analyses did not detect common variants contributing to social responsiveness.
The sex-differential distributions of SRS in multiplex autism families likely reflect mechanisms contributing to the sex ratio for autism observed in the general population and form a quantitative signature of reduced penetrance of inherited liability to ASD among females. The identification of two strong loci for social responsiveness validates the endophenotype approach for the identification of genetic variants contributing to complex traits such as ASD. While causal mutations have yet to be identified, these findings are consistent with segregation of rare genetic variants influencing social responsiveness and underscore the increasingly recognized role of rare inherited variants in the genetic architecture of ASD.
linkage; social responsiveness; autism; endophenotype; association
Microarray technologies make it possible to quantify messenger RNA (mRNA) transcript abundance genome-wide. Analyzing genotype, phenotype, and expression data from 20 pedigrees, the members of our Genetic Analysis Workshop (GAW) 19 gene expression group published 9 papers addressing timely and important questions. To study the complexity and interrelationships of genetics and gene expression, we used established statistical tools, developed newer ones, and developed and applied extensions to these tools.
To study gene expression correlations in the pedigree members (without incorporating genotype or trait data into the analysis), 2 papers used principal components analysis, weighted gene coexpression network analysis, meta-analyses, gene enrichment analyses, and linear mixed models. To explore the relationship between genetics and gene expression, 2 papers studied expression quantitative trait locus allelic heterogeneity through conditional association analyses, and epistasis through interaction analyses. A third paper assessed the feasibility of applying allele-specific binding to filter potential regulatory single-nucleotide polymorphisms (SNPs). Analytic approaches included linear mixed models based on measured genotypes in pedigrees, permutation tests, and covariance kernels. To incorporate both genotype and phenotype data with gene expression, 4 groups employed linear mixed models, nonparametric weighted U statistics, structural equation modeling, Bayesian unified frameworks, and multiple regression.
Results and discussion
Regarding the analysis of pedigree data, we found that gene expression is familial. Analyses should therefore include at least 1 factor for pedigree membership or multiple factors for the degree of relationship. We also developed a method to adjust for familiality before conducting weighted gene co-expression network analysis. For SNP association and conditional analyses, we found that FaST-LMM (Factored Spectrally Transformed Linear Mixed Model) and SOLAR-MGA (Sequential Oligogenic Linkage Analysis Routines-Major Gene Analysis) have similar type 1 and type 2 error rates and can be used almost interchangeably. To improve the power and precision of association tests, prior knowledge of DNase-I hypersensitivity sites or other relevant biological annotations can be incorporated into the analyses. On a biological level, eQTL (expression quantitative trait loci) are genetically complex, exhibiting both allelic heterogeneity and epistasis. Analyzing genotype and phenotype data together with gene expression measurements was generally advantageous: it produced stronger levels of significance and more interpretable biological models.
Pedigrees can be used to conduct analyses of and enhance gene expression studies.
Nonhuman primates (NHP) provide crucial biomedical model systems intermediate between rodents and humans. The vervet monkey (also called the African green monkey) is a widely used NHP model that has unique value for genetic and genomic investigations of traits relevant to human diseases. This article describes the phylogeny and population history of the vervet monkey and summarizes the use of both captive and wild vervet monkeys in biomedical research. It also discusses the effort of an international collaboration to develop the vervet monkey as the most comprehensively phenotypically and genomically characterized NHP, a process that will enable the scientific community to employ this model for systems biology investigations.
African green monkey; genetics; genomics; phenomics; simian immunodeficiency virus [SIV]; systems biology; transcriptomics; vervet
Cerebral dominance of language function and hand preference have been suggested to be heritable traits with a possible shared genetic background. However, joint genetic studies of these traits have never been conducted. We performed a genetic linkage study in 37 multigenerational human pedigrees (355 subjects of both sexes) enriched for left-handedness, in which we also measured language lateralization. Hand preference was measured with the Edinburgh Handedness Inventory, and language lateralization was measured with functional transcranial Doppler during language production. The estimated heritability of left-handedness and language lateralization in these pedigrees is 0.24 and 0.31, respectively. A parametric major gene model was tested for left-handedness. Nonparametric analyses were performed for left-handedness, atypical lateralization, and degree of language lateralization. We did not observe genome-wide evidence for linkage in the parametric or nonparametric analyses for any of the phenotypes tested. However, multiple regions showed suggestive evidence of linkage. The parametric model showed suggestive linkage for left-handedness in the 22q13 region [heterogeneity logarithm of odds (HLOD) = 2.18]. Nonparametric multipoint analysis of left-handedness showed suggestive linkage in the same region [logarithm of odds (LOD) = 2.80]. Atypical language lateralization showed suggestive linkage in the 7q34 region (LODmax = 2.35). For strength of language lateralization, we observed suggestive linkage in the 6p22 (LODmax = 2.54), 7q32 (LODmax = 1.93), and 9q33 (LODmax = 2.10) regions. We did not observe any overlap of suggestive genetic signal between handedness and the extent of language lateralization. The absence of significant linkage argues against a single major gene underlying both traits. Instead, our results suggest that they are two independent polygenic complex traits.
asymmetry; genetics; hand-preference; language lateralization; left-handedness; linkage analysis
The identification of autism susceptibility genes has been hampered by phenotypic heterogeneity of autism, among other factors. However, the use of endophenotypes has shown preliminary success in reducing heterogeneity and identifying potential autism-related susceptibility regions. To further explore the utility of language-related endophenotypes, we performed linkage analysis on multiplex autism families stratified according to delayed expressive speech. We also assessed the extent to which parental phenotype information would aid in identifying regions of linkage. A whole-genome scan using a multipoint nonparametric linkage approach was performed in 133 families, stratifying the sample by phrase speech delay and word delay. None of the regions reached suggestive genome-wide or replication significance thresholds. However, several loci on chromosomes 1, 2, 4, 6, 7, 8, 9, 10, 12, 15, and 19 yielded nominally higher linkage signals in the delayed groups. The results did not support previously reported linkage findings for loci on chromosomes 7 or 13 that arose from stratification on the language delay endophenotype. In addition, inclusion of information on parental history of language delay did not appreciably affect the linkage results. The nominal increase in NPL scores across several regions using language delay endophenotypes for stratification suggests that this strategy may be useful in attenuating heterogeneity. However, the inconsistencies in regions identified across studies highlight the importance of increasing sample sizes to provide adequate power to test replications in independent samples.
Autism; linkage; endophenotypes; language; AGRE
We summarize the work done by the contributors to Group 13 at Genetic Analysis Workshop 17 (GAW17) and provide a synthesis of their data analyses. The Group 13 contributors used a variety of approaches to test associations of both rare variants and common single-nucleotide polymorphisms (SNPs) with the GAW17 simulated traits, implementing analytic methods that incorporate multiallelic genotypes and haplotypes. In addition to using a wide variety of statistical methods and approaches, the contributors exhibited a remarkable amount of flexibility and creativity in coding the variants and their genes and in evaluating their proposed approaches and methods. We describe and contrast their methods along three dimensions: (1) selection and coding of genetic entities for analysis, (2) method of analysis, and (3) evaluation of the results. The contributors consistently presented a strong rationale for using multiallelic analytic approaches. They indicated that power was likely to be increased by capturing the signals of multiple markers within genetic entities defined by sliding windows, haplotypes, genes, functional pathways, and the entire set of SNPs and rare variants taken in aggregate. Despite this variability, the methods were fairly consistent in their ability to identify two associated genes for each simulated trait. The first gene was selected for the largest number of causal alleles and the second for a high-frequency causal SNP. The presumed model of inheritance and choice of genetic entities are likely to have a strong effect on the outcomes of the analyses.
rare variants; sequence data; multiallelic data; Bayesian regression; penalized regression; tree-based clustering; pathway analysis; haplotypes
Schizophrenia is a common complex disorder with polygenic inheritance. Here we show that by using an approach that compares the individual loads of rare variants in 1,042 schizophrenia cases and 961 controls, schizophrenia cases carry an increased burden of deleterious mutations. At a genome-wide level, our results implicate non-synonymous, splice site as well as stop-altering single-nucleotide variations occurring at minor allele frequency of ≥ 0.01% in the population. In an independent replication sample of 5,585 schizophrenia cases and 8,103 controls of European ancestry we confirm an enrichment in cases of the alleles identified in our study. In addition, the genes implicated by the increased burden of rare coding variants highlight the involvement of neurodevelopment in the etiology of schizophrenia.
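The burden comparison can be sketched in two steps. First, count each person's qualifying rare alleles; here these are non-synonymous, splice-site, and stop-altering variants. Second, compare mean burden between cases and controls by permutation. This is a minimal illustration under stated assumptions. The consequence labels, the frequency ceiling (`max_maf`, not the study's threshold), and the permutation test are hypothetical choices, not the authors' pipeline, which the abstract does not describe in detail.

```python
import random

# Hypothetical consequence labels standing in for the variant classes above.
QUALIFYING = {"missense", "splice_site", "stop_gained", "stop_lost"}

def individual_burden(variants, max_maf=0.01):
    """Count qualifying rare alleles carried by one person.

    variants: list of (consequence, maf, allele_count) tuples.
    max_maf is an illustrative frequency ceiling, not the study's filter.
    """
    return sum(ac for cons, maf, ac in variants
               if cons in QUALIFYING and maf < max_maf)

def burden_permutation_test(case_burdens, control_burdens, n_perm=2000, seed=1):
    """One-sided permutation P for a higher mean burden in cases than controls."""
    rng = random.Random(seed)
    n_case = len(case_burdens)
    observed = (sum(case_burdens) / n_case
                - sum(control_burdens) / len(control_burdens))
    pooled = list(case_burdens) + list(control_burdens)
    n_ctrl = len(pooled) - n_case
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)          # random relabelling of cases/controls
        diff = sum(pooled[:n_case]) / n_case - sum(pooled[n_case:]) / n_ctrl
        if diff >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Permutation keeps the test valid without assuming a distribution for per-person burden counts, which are typically small, skewed integers.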
Schizophrenia is a complex disorder with high heritability but poorly understood genetics. Here Olde Loohuis et al. compare schizophrenia patients to unaffected individuals and identify an increased individual burden of rare deleterious mutations in patients.
We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available.
We identified genome-wide single nucleotide polymorphisms (SNPs) by WGS of 721 members of an extended pedigree from the VRC. From high-depth WGS data we identified more than 4 million polymorphic unequivocal segregating sites. We then pruned these SNPs based on heterozygosity, quality control filters, and the degree of linkage disequilibrium (LD) between SNPs. This yielded genome-wide panels suitable for genetic association (about 500,000 SNPs) and linkage analysis (about 150,000 SNPs). To further enhance the utility of these resources for linkage analysis, we used a more stringently pruned subset of the linkage panel to generate multipoint identity-by-descent matrices.
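A greedy sliding-window filter can illustrate the LD-based pruning step. It walks along position-sorted SNPs and keeps a SNP only if its pairwise r² with every already-kept SNP within the window is below a cut-off. This is a simplified sketch in the spirit of tools such as PLINK's `--indep-pairwise`, not a reimplementation of them. The window size and r² cut-off are hypothetical defaults, not the values used to build the vervet panels.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0   # monomorphic site; expected to be removed earlier by QC
    return cov * cov / (va * vb)

def ld_prune(snps, window=50, r2_max=0.5):
    """Greedy pruning of position-sorted SNPs.

    snps: list of (name, dosages) sorted by position. A SNP is kept only if
    its r^2 with every kept SNP among the previous `window` SNPs is < r2_max.
    """
    kept = []  # indices into snps
    for i, (name, g) in enumerate(snps):
        if all(r_squared(g, snps[j][1]) < r2_max
               for j in kept if i - j <= window):
            kept.append(i)
    return [snps[i][0] for i in kept]
```

Comparing only against SNPs that were already kept makes the result order-dependent. Tools differ in how they choose which member of a correlated pair to drop.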
The genetic and phenotypic resources now available for the VRC and other Caribbean-origin vervets enable their use for genetic investigation of traits relevant to human diseases.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-015-0152-2) contains supplementary material, which is available to authorized users.
Vervet; Non-human primate; Whole genome sequencing; SNP; Linkage; Association
Genetic factors contribute to risk for bipolar disorder (BP), yet its pathogenesis remains poorly understood. Measuring multi-system quantitative traits that may be components of BP psychopathology may enable genetic dissection of this complex disorder. Investigation of extended pedigrees from genetically isolated populations may also facilitate the detection of specific genetic variants that affect BP as well as its component phenotypes.

To identify quantitative neurocognitive, temperament-related, and neuroanatomic phenotypes that appear heritable and associated with severe bipolar disorder (BP-I), and are therefore suitable for genetic linkage and association studies aimed at identifying variants contributing to BP-I risk.

Multi-generational pedigree study in two closely related, genetically isolated populations: the Central Valley of Costa Rica (CVCR) and Antioquia, Colombia (ANT).

738 individuals, all from CVCR and ANT pedigrees, of whom 181 are affected with BP-I.

MAIN OUTCOME MEASURE
Familial aggregation (heritability) and association with BP-I of 169 quantitative neurocognitive, temperament, magnetic resonance imaging (MRI), and diffusion tensor imaging (DTI) phenotypes.

Seventy-five percent (126) of the phenotypes investigated were significantly heritable, and 31% (53) were associated with BP-I. About one-quarter of the phenotypes, including measures from each phenotype domain, were both heritable and associated with BP-I. Neuroimaging phenotypes were the most promising candidate traits for genetic mapping related to BP, based on strong heritability and association with disease. These included cortical thickness in prefrontal and temporal regions and the volume and microstructural integrity of the corpus callosum. Analyses of phenotypic and genetic covariation identified substantial correlations among the traits, at least some of which share a common underlying genetic architecture.

CONCLUSIONS AND RELEVANCE
This is the most extensive investigation of BP-relevant component phenotypes to date. Our results identify brain and behavioral quantitative traits that appear to be genetically influenced. Their pattern of BP-I association within families is consistent with expectations from case-control studies. Together these phenotypes provide a basis for identifying loci contributing to BP-I risk and for genetic dissection of the
Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here, we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain, providing biological plausibility for the findings. Many findings have the potential to provide entirely novel insights into aetiology. Associations at DRD2 and at multiple genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that play important roles in immunity, supporting the hypothesized link between the immune system and schizophrenia.
To assess the utility of online patient self-reported outcomes in a rare disease, we attempted to observe the effects of corticosteroids in delaying age at full-time wheelchair use in Duchenne muscular dystrophy (DMD). We used data from 1,057 males in DuchenneConnect, an online registry. The data were compared with prior natural history data on age at diagnosis, mutation spectrum, and age at loss of ambulation. Registrants reported differences in steroid and other medication usage, as well as in age and ambulation status, so we could explore these data for correlations with age at loss of ambulation. In multivariate analysis, current steroid usage was the most significant and largest independent predictor of improved wheelchair-free survival. These online self-reported data were therefore sufficient to observe retrospectively that current steroid use by patients with DMD is associated with a delay in loss of ambulation. Among commonly used steroid drugs, deflazacort was associated with longer ambulation than prednisone (median 14 years and 13 years, respectively). Use of vitamin D and coenzyme Q10, insurance status, and age at diagnosis after 4 years were also significant, but smaller, independent predictors of longer wheelchair-free survival. Nine other common supplements were also tested individually, but with lower study power. This study demonstrates the utility of DuchenneConnect data for observing therapeutic differences. It also highlights the need to improve the quality and quantity of patient-reported data, which may allow exploration of drug and therapeutic-practice combinations that are impractical to study in clinical trials. Given the low barrier to participation, we anticipate substantial growth in the dataset in the coming years.
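Wheelchair-free survival like this is commonly estimated with the Kaplan-Meier product-limit method. Registrants who are still ambulatory are treated as censored at their last reported age. Medians like those quoted for deflazacort and prednisone would typically be read from such curves. The sketch assumes that setup, but the abstract does not state which estimator produced the medians. The covariate-adjusted multivariate modeling is not shown.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times: age at full-time wheelchair use or at last report (years).
    events: 1 if wheelchair use was reported, 0 if still ambulatory (censored).
    Returns (time, survival) steps at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0                    # events and censorings tied at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / at_risk  # censored at t still count as at risk
            curve.append((t, surv))
        at_risk -= d + c
    return curve

def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached
```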
As cost-effective high-throughput sequencing technology becomes more widely available, genetic research is beginning to focus on identifying the contributions of rare variants (RVs) to complex traits. Using RVs to detect associated genes requires statistical approaches that mitigate the lack of power when single RVs are analyzed. Here we report the development and application of an approach that aggregates and evaluates the transmissions of RVs in parent-child trios. For each variant, an initial score is calculated that captures the distortion in transmission of the observed RVs from the parents to their offspring. The scores are analyzed with a support vector machine that contrasts trios with affected and unaffected children. It maps the transmission distortion of the multiple RVs into a one-dimensional score in a nonlinear fashion. We refer to this approach as Trio-SVM. A total of 275 trios were available in the Genetic Analysis Workshop 18 data for analysis. Because of their nonindependence and the extended linkage disequilibrium (LD) within pedigrees, Trio-SVM was vulnerable to type I errors in detecting association. Using the GAW18 data with simulated trait values, Trio-SVM had an appropriate type I error rate but lacked power with a sample of 267 trios. Larger samples of 500 to 1000 trios, derived from combining the simulated data, provided sufficient power. Two chromosome 3 candidate genes were tested in the real GAW18 data with Trio-SVM, and they showed marginal associations with hypertension.
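The abstract does not give the exact form of the per-variant score fed to the SVM. A natural candidate, assumed here, is the classic transmission/disequilibrium test (TDT) applied to each rare variant. It counts transmissions (b) and non-transmissions (c) of the rare allele from heterozygous parents. Distortion is then summarized as the signed statistic (b − c)/√(b + c), whose square is the 1-df TDT χ². The function names are hypothetical, and the SVM step itself is not shown.

```python
import math

def tdt_counts(trios):
    """Transmission counts from heterozygous parents for one biallelic variant.

    trios: list of (father, mother, child) genotypes coded as 0/1/2 copies of
    the rare allele. Returns (b, c): rare allele transmitted / not transmitted.
    """
    b = c = 0
    for f, m, k in trios:
        hets = [p for p in (f, m) if p == 1]
        if not hets:
            continue                   # no heterozygous parent: uninformative
        if len(hets) == 2:
            b += k                     # child's rare copies came from the hets
            c += 2 - k
        else:
            other = m if f == 1 else f
            t = k - other // 2         # homozygous parent passes 0 or 1 copy
            if t in (0, 1):
                b += t
                c += 1 - t
            # any other value is a Mendelian inconsistency and is skipped
    return b, c

def tdt_score(trios):
    """Signed TDT statistic z = (b - c) / sqrt(b + c) and its two-sided P."""
    b, c = tdt_counts(trios)
    if b + c == 0:
        return 0.0, 1.0
    z = (b - c) / math.sqrt(b + c)
    p = math.erfc(abs(z) / math.sqrt(2))   # equals the chi-square (1 df) P
    return z, p
```

Keeping the sign of z preserves the direction of distortion, which an aggregating step across many rare variants would likely need.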
The Mexican population and others with Amerindian heritage exhibit a substantial predisposition to dyslipidemias and coronary heart disease. Yet, these populations remain underinvestigated by genomic studies, and to date, no genome-wide association (GWA) studies have been reported for lipids in these rapidly expanding populations.
Methods and Findings
We performed a two-stage GWA study for hypertriglyceridemia and low high-density lipoprotein cholesterol (HDL-C) in Mexicans (n = 4,361) and identified a novel Mexican-specific genome-wide significant locus for serum triglycerides (TGs) near the Niemann-Pick type C1 protein (NPC1) gene (P = 2.43 × 10⁻⁸). Furthermore, three European loci for TGs (APOA5, GCKR, and LPL) and four loci for HDL-C (ABCA1, CETP, LIPC, and LOC55908) reached genome-wide significance in Mexicans. We used cross-ethnic mapping to narrow three European TG GWA loci (APOA5, MLXIPL, and CILP2) that were wide and contained multiple candidate variants in the European scan. At the APOA5 locus, this narrowed the most likely susceptibility variants to a single variant, rs964184. Importantly, our functional analysis demonstrated a direct link between rs964184 and postprandial serum apoAV protein levels, supporting rs964184 as the causative variant underlying the European and Mexican GWA signal. Overall, 52 of the 100 associations reported in the European lipid GWA meta-analysis generalized to Mexicans. However, at 82 of the 100 European GWA loci, a variant other than the European lead/best-proxy variant showed the strongest regional evidence of association in Mexicans.
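The abstract does not state the criteria used to decide that a European association "generalized" to Mexicans. One plausible rule, assumed here purely for illustration, is that the Mexican effect must point in the same direction as the European effect and pass a Bonferroni-corrected threshold across all loci tested. The `LocusResult` type, the `generalized` function, and any SNP labels used with them are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocusResult:
    snp: str         # hypothetical label for the European lead SNP
    beta_eur: float  # effect estimate in the European meta-analysis
    beta_mex: float  # effect estimate for the same SNP in the Mexican sample
    p_mex: float     # P value for that SNP in the Mexican sample


def generalized(loci: List[LocusResult], alpha: float = 0.05) -> List[str]:
    """Return SNPs whose Mexican association agrees in direction with the
    European effect and passes a Bonferroni threshold of alpha / (loci tested)."""
    if not loci:
        return []
    threshold = alpha / len(loci)
    return [
        locus.snp
        for locus in loci
        if locus.beta_eur * locus.beta_mex > 0 and locus.p_mex < threshold
    ]
```

With 100 European loci, this assumed rule would require P < 5 × 10⁻⁴ in Mexicans; a different rule (for example, a one-sided test or no multiple-testing correction) would change the count.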
This first Mexican GWA study of lipids identified a novel GWA locus for high TG levels, used inter-population heterogeneity to substantially narrow three previously known European GWA signals, and surveyed whether the European lipid GWA SNPs extend to the Mexican population.
Given prior evidence for the contribution of rare copy number variations (CNVs) to autism spectrum disorders (ASD), we studied these events in 4,457 individuals from 1,174 simplex families, each composed of parents, a proband, and, in most kindreds, an unaffected sibling. We find significant association of ASD with de novo duplications of 7q11.23, where the reciprocal deletion causes Williams-Beuren syndrome, which features a highly social personality. We identify rare recurrent de novo CNVs at five additional regions, including two novel ASD loci, 16p13.2 (including the genes USP7 and C16orf72) and Cadherin 13 (CDH13), and implement a rigorous new approach to evaluating the statistical significance of these observations. Overall, we find that large de novo CNVs carry substantial risk (OR = 3.55; CI = 2.16-7.46; p = 6.9 × 10⁻⁶); estimate that there are 130-234 distinct ASD-related CNV intervals across the genome; and, based on data from multiple studies, present compelling evidence for the association of rare de novo events at 7q11.23, 15q11.2-13.1, 16p11.2, and Neurexin 1 (NRXN1).
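An odds ratio such as the one quoted for large de novo CNVs compares the odds of carrying an event in one group (for example, probands) with the odds in a comparison group. The sketch below computes an odds ratio and a Wald confidence interval on the log scale from a hypothetical 2×2 table. The published interval is not symmetric around log(3.55), so the authors apparently used a different interval method; this is a generic illustration, not a reproduction of their analysis, and `odds_ratio_wald` is an illustrative name.

```python
import math
from typing import Tuple


def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald interval computed on the log-odds scale.

    a: cases carrying the event      b: cases without the event
    c: controls carrying the event   d: controls without the event
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

Because the Wald interval is symmetric on the log scale by construction, an asymmetric published interval is a quick hint that an exact, permutation-based, or otherwise different method was used.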
Autism spectrum disorders (ASDs) are male-biased and genetically heterogeneous. While sequencing of sporadic cases has identified de novo risk variants, the heritable genetic contribution and mechanisms driving the male bias are less understood. Here, we aimed to identify familial and sex-differential risk loci in the largest available, uniformly ascertained, densely genotyped sample of multiplex ASD families from the Autism Genetics Resource Exchange (AGRE), and to compare results with earlier findings from AGRE.
From a total sample of 1,008 multiplex families, we performed genome-wide, non-parametric linkage analysis in a discovery sample of 847 families and, separately, in subsets of families with only male affected children (male-only, MO) or with at least one affected female child (female-containing, FC). Loci showing evidence for suggestive linkage (logarithm of odds (LOD) ≥ 2.2) in this discovery sample, or in previous AGRE samples, were re-evaluated in an extension study using all 1,008 available families. For regions with a genome-wide significant linkage signal in the discovery stage, the families not included in the corresponding discovery sample were then evaluated for independent replication of linkage. Association testing of common single nucleotide polymorphisms (SNPs) was also performed within suggestive linkage regions.
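The suggestive threshold of LOD ≥ 2.2 can be translated into a pointwise P value. Under the usual asymptotic approximation for a one-sided linkage test, 2·ln(10)·LOD follows a 50:50 mixture of a point mass at 0 and a χ² distribution with 1 degree of freedom, so the P value is half the χ²₁ upper tail. This sketch assumes that approximation holds for the non-parametric LOD scores used here; the function names are illustrative.

```python
import math


def lod_to_chisq(lod: float) -> float:
    """LOD is log10 of a likelihood ratio; the LR statistic is 2*ln(LR) = 2*ln(10)*LOD."""
    return 2 * math.log(10) * lod


def lod_to_pointwise_p(lod: float) -> float:
    """One-sided pointwise P value: half the upper tail of a 1-df chi-square.

    For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    if lod <= 0:
        return 0.5  # assumed convention: non-positive LOD carries no evidence for linkage
    x = lod_to_chisq(lod)
    return 0.5 * math.erfc(math.sqrt(x / 2))
```

For LOD = 2.2 this gives a pointwise P value of roughly 7 × 10⁻⁴; genome-wide significance thresholds are stricter because many positions are scanned.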
We observed independent replication of previously reported linkage at chromosome 20p13 (P < 0.01), while loci at 6q27 and 8q13.2 showed suggestive linkage in our extended sample. Suggestive sex-differential linkage was observed at 1p31.3 (MO), 8p21.2 (FC), and 8p12 (FC) in our discovery sample, and the MO signal at 1p31.3 was supported in our extended sample. No sex-differential signals met replication criteria, and no common SNPs were significantly associated with ASD within any identified linkage region.
With few exceptions, analyses of subsets of families from the AGRE cohort identify different risk loci, consistent with extreme locus heterogeneity in ASD. Large samples appear to yield more consistent results, and sex-stratified analyses facilitate the identification of sex-differential risk loci, suggesting that linkage analyses in large cohorts are useful for identifying heritable risk loci. Additional work, such as targeted re-sequencing, is needed to identify the specific variants within these loci that are responsible for increasing ASD risk.
Male brain; Sex differences; Intermediate phenotype; Linkage analysis; Association; AGRE
The Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) Consortium is a collaborative network of researchers working together on a range of large-scale studies that integrate data from 70 institutions worldwide. Organized into Working Groups that tackle questions in neuroscience, genetics, and medicine, ENIGMA studies have analyzed neuroimaging data from 12,826 subjects. In addition, the CHARGE consortium provided data from 12,171 individuals for replication of findings, for a total of 24,997 subjects. By meta-analyzing results from many sites, ENIGMA has detected factors affecting the brain that no individual site could detect on its own and that require more subjects than any individual neuroimaging study has yet collected. ENIGMA's first project was a genome-wide association study that identified common variants associated with hippocampal volume or intracranial volume. Ongoing work is exploring genetic associations with subcortical volumes (ENIGMA2) and white matter microstructure (ENIGMA-DTI). Working Groups also focus on understanding how schizophrenia, bipolar illness, major depression, and attention-deficit/hyperactivity disorder (ADHD) affect the brain. We review the current progress of the ENIGMA Consortium, along with challenges and unexpected discoveries made along the way.
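Combining per-site results is commonly done with a fixed-effect, inverse-variance-weighted meta-analysis: each site's effect estimate is weighted by 1/SE², so more precise sites count more, and the pooled standard error shrinks as sites are added. The abstract does not specify ENIGMA's exact meta-analysis model, so this is a minimal sketch of the general technique with hypothetical inputs; `fixed_effect_meta` is an illustrative name.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance-weighted fixed-effect meta-analysis.

    betas: per-site effect estimates (e.g., SNP effect on a regional brain volume)
    ses:   per-site standard errors of those estimates
    Returns (pooled effect, pooled standard error, two-sided P value).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("need one standard error per effect estimate")
    weights = [1 / se ** 2 for se in ses]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / w_sum
    pooled_se = math.sqrt(1 / w_sum)
    z = pooled / pooled_se
    # Two-sided normal P value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, p
```

Two sites that each report the same effect with SE 0.05 yield a pooled SE of about 0.035, which illustrates why pooling many sites can detect effects that no single site can.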
Genetics; MRI; GWAS; Consortium; Meta-analysis; Multi-site