Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. The recent application of GWAS to clinic-based cohorts has also yielded genetic predictors of clinical outcomes. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. With each new dataset, new realities are discovered about GWAS data and best practices continue to be developed. The Genomics Workgroup of the National Human Genome Research Institute (NHGRI) funded electronic Medical Records and Genomics (eMERGE) network has invested considerable effort in developing strategies for QC of these data. The lessons learned by this group will be valuable for other investigators dealing with large scale genomic datasets. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the eMERGE network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. In this protocol we discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.
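The marker-quality step mentioned above can be illustrated with a minimal sketch. It assumes genotypes are coded as 0/1/2 counts of one allele, with `None` for a missing call. The function names and the thresholds (98% call rate, 1% minor allele frequency) are illustrative assumptions, not values prescribed by the eMERGE network.

```python
from typing import Optional, Sequence


def call_rate(genotypes: Sequence[Optional[int]]) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Minor allele frequency computed from non-missing 0/1/2 allele counts."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    # Report the frequency of whichever allele is rarer.
    return min(freq, 1.0 - freq)


def passes_marker_qc(
    genotypes: Sequence[Optional[int]],
    min_call_rate: float = 0.98,  # assumed threshold, for illustration only
    min_maf: float = 0.01,        # assumed threshold, for illustration only
) -> bool:
    """True when the marker meets both the call-rate and the MAF threshold."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )
```

In practice such filters are usually applied with dedicated software; the sketch only shows the arithmetic behind two common per-marker checks.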
Age-related macular degeneration is the leading cause of blindness among the adult population in the developed world. To further the understanding of this disease, we have studied the genetically isolated Amish population of Ohio and Indiana.
Cumulative genetic risk scores were calculated using the 19 known allelic associations. Exome sequencing was performed in three members of a small Amish family with AMD who lacked the common risk alleles in complement factor H (CFH) and ARMS2/HTRA1. Follow-up genotyping and association analysis were performed in a cohort of 973 Amish individuals, including 95 with self-reported AMD.
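The abstract does not state how the cumulative score was computed. One common construction, shown here purely as an assumption, is a sum of risk-allele counts weighted by the log odds ratio of each variant. The variant names and odds ratios below are hypothetical placeholders, not the 19 reported associations.

```python
import math
from typing import Dict


def genetic_risk_score(
    allele_counts: Dict[str, int],
    odds_ratios: Dict[str, float],
) -> float:
    """Sum of risk-allele counts (0, 1, or 2), each weighted by log(OR)."""
    score = 0.0
    for variant, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: allele count must be 0, 1, or 2")
        score += count * math.log(odds_ratios[variant])
    return score


# Hypothetical variants and effect sizes, for illustration only.
example_odds_ratios = {"variant_a": 2.0, "variant_b": 1.5}
example_person = {"variant_a": 1, "variant_b": 2}
example_score = genetic_risk_score(example_person, example_odds_ratios)
```

Comparing the mean of such scores between cases and controls is the kind of analysis the results paragraph reports.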
The cumulative genetic risk score analysis generated a mean genetic risk score of 1.12 (95% confidence interval [CI]: 1.10, 1.13) in the Amish controls and 1.18 (95% CI: 1.13, 1.22) in the Amish cases. This mean difference in genetic risk scores is statistically significant (P = 0.0042). Exome sequencing identified a rare variant (P503A) in CFH. Association analysis in the remainder of the Amish sample revealed that the P503A variant is significantly associated with AMD (P = 9.27 × 10⁻¹³). Variant P503A was absent when evaluated in a cohort of 791 elderly non-Amish controls and 1456 non-Amish cases.
Data from the cumulative genetic risk score analysis suggest that the variants reported by the AMDGene consortium account for a smaller genetic burden of disease in the Amish compared with the non-Amish Caucasian population. Using exome sequencing data, we identified a novel missense mutation that is shared among a densely affected nuclear Amish family and located in a gene that has been previously implicated in AMD risk.
In this study, we describe the analysis of the genetically isolated Amish population of Ohio and Indiana for AMD.
age-related macular degeneration; linkage analysis; rare variant; exome sequencing; risk score analysis
To identify genetic associations between specific risk genes and bilateral advanced age-related macular degeneration (AMD) in a retrospective, observational case series of 1,003 patients: 173 patients with geographic atrophy in at least 1 eye and 830 patients with choroidal neovascularization in at least 1 eye.
Patients underwent clinical examination and fundus photography. The images were subsequently graded using a modified grading system adapted from the Age-Related Eye Disease Study. Genetic analysis was performed to identify genotypes at 4 AMD-associated variants (ARMS2 A69S, CFH Y402H, C3 R102G, and CFB R32Q) in these patients.
There were no statistically significant relationships between clinical findings and genotypes at CFH, C3, and CFB. The genotype at ARMS2 correlated with bilateral advanced AMD using a variety of comparisons: unilateral geographic atrophy versus bilateral geographic atrophy (P = 0.08), unilateral choroidal neovascularization versus bilateral choroidal neovascularization (P = 9.0 × 10⁻⁸), and unilateral late AMD versus bilateral late AMD (P = 5.9 × 10⁻⁸).
In this series, in patients with geographic atrophy or choroidal neovascularization in at least 1 eye, the ARMS2 A69S substitution strongly associated with geographic atrophy or choroidal neovascularization in the fellow eye. The ARMS2 A69S substitution may serve as a marker for bilateral advanced AMD.
age-related macular degeneration; ARMS2; choroidal neovascularization; genotypes; geographic atrophy
Glaucoma is characterized by irreversible optic nerve degeneration and is the most frequent cause of irreversible blindness worldwide. Here, the International Glaucoma Genetics Consortium conducts a meta-analysis of genome-wide association studies of vertical cup-disc ratio (VCDR), an important disease-related optic nerve parameter. In 21,094 individuals of European ancestry and 6,784 individuals of Asian ancestry, we identify 10 new loci associated with variation in VCDR. In a separate risk-score analysis of five case-control studies, Caucasians in the highest quintile have a 2.5-fold increased risk of primary open-angle glaucoma as compared with those in the lowest quintile. This study has more than doubled the known loci associated with optic disc cupping and will allow greater understanding of mechanisms involved in this common blinding condition.
Glaucoma is the most common cause of irreversible blindness worldwide. Here, the authors carry out a large meta-analysis of genetic data from individuals of European and Asian ancestry and identify 10 new loci associated with vertical cup-disc ratio, a key factor in the clinical assessment of patients with glaucoma.
The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course.
Materials and methods
We used four algorithms based on ICD-9 codes, text keywords, and medications to identify individuals with MS from a de-identified, research version of the EMR at Vanderbilt University. Using a training dataset of the records of 899 individuals, algorithms were constructed to identify and extract detailed information regarding the clinical course of MS from the text of the medical records, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale (EDSS) scores, timed 25-foot walk scores, and MS medications. Algorithms were evaluated on a test set validated by two independent reviewers.
We identified 5789 individuals with MS. For all clinical traits extracted, precision was at least 87% and specificity was greater than 80%. Recall values for clinical subtype, EDSS scores, and timed 25-foot walk scores were greater than 80%.
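For reference, the three metrics reported above can be computed from reviewer-validated labels as follows. This is a minimal sketch assuming binary predictions per record; the function name and input layout are illustrative, not taken from the study.

```python
from typing import Dict, Sequence


def classification_metrics(
    predicted: Sequence[bool],
    actual: Sequence[bool],
) -> Dict[str, float]:
    """Precision, recall, and specificity from paired binary labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif not p and not a:
            tn += 1
        else:
            fn += 1
    nan = float("nan")
    return {
        # Of the records the algorithm flagged, how many were correct.
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        # Of the truly positive records, how many the algorithm found.
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        # Of the truly negative records, how many were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }
```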
Discussion and conclusion
This collection of clinical data represents one of the largest databases of detailed, clinical traits available for research on MS. This work demonstrates that detailed clinical information is recorded in the EMR and can be extracted for research purposes with high reliability.
Multiple sclerosis; electronic health records
To identify novel late-onset Alzheimer disease (LOAD) risk genes, we have analyzed Amish populations of Ohio and Indiana. We performed genome-wide SNP linkage and association studies on 798 individuals (109 with LOAD). We tested association using the Modified Quasi-Likelihood Score (MQLS) test and also performed two-point and multipoint linkage analyses. We found that LOAD was significantly associated with APOE (P = 9.0 × 10⁻⁶) in all our ascertainment regions except for the Adams County, Indiana, community (P = 0.55). Genome-wide, the most strongly associated SNP was rs12361953 (P = 7.92 × 10⁻⁷). A very strong, genome-wide significant multipoint peak (recessive HLOD = 6.14, dominant HLOD = 6.05) was detected on 2p12. Three additional loci with multipoint HLOD scores >3 were detected on 3q26, 9q31, and 18p11. Combining the linkage and association results, the most significantly associated SNP under the 2p12 peak was rs2974151 (P = 1.29 × 10⁻⁴). This SNP is located in CTNNA2, which encodes catenin alpha 2, a neuronal-specific catenin known to function in the developing brain. These results identify CTNNA2 as a novel candidate LOAD gene, and implicate three other regions of the genome as novel LOAD loci. These results underscore the utility of using family-based linkage and association analysis in isolated populations to identify novel loci for traits with complex genetic architecture.
GWAS; Linkage; founder population; Amish; Alzheimer
Knowledge of the relationship between depressive symptoms and cognition in older adults has primarily come from studies of clinically depressed, functionally impaired or cognitively impaired individuals, and in predominately White samples. Limited minority representation in depression research exposes the need to examine these associations in more ethnically/racially diverse populations. We sought to examine the relationship between depressive symptoms and cognition in a sample of non-demented older African Americans recruited from surrounding U.S. cities of New York, Greensboro, Miami, and Nashville (N = 944). Depressive symptoms were evaluated with the Geriatric Depression Scale (GDS). Cognition was evaluated with a comprehensive neuropsychological battery. Test scores were summarized into attention, executive function, memory, language, and processing speed composites. Controlling for age, education, reading level, and sex, African American older adults who endorsed more symptoms obtained significantly lower scores on measures of memory, language, processing speed, and executive functioning. Further investigation of the causal pathway underlying this association, as well as potential mediators of the relationship between depressive symptoms and cognitive test performance among older African Americans, such as cardiovascular and cerebrovascular disease, may offer potential avenues for intervention.
Aging; Ethnic groups; Depression; Executive function; Memory; Language
Autism spectrum disorder (ASD) is highly heritable, yet genome-wide association studies (GWAS), copy number variation screens, and candidate gene association studies have found no single factor accounting for a large percentage of genetic risk. ASD trio exome sequencing studies have revealed genes with recurrent de novo loss-of-function variants as strong risk factors, but there are relatively few recurrently affected genes while as many as 1000 genes are predicted to play a role. As such, it is critical to identify the remaining rare and low-frequency variants contributing to ASD.
We have utilized an approach of prioritization of genes by GWAS and follow-up with massively parallel sequencing in a case-control cohort. Using a previously reported ASD noise reduction GWAS analysis, we prioritized 837 RefSeq genes for custom targeting and sequencing. We sequenced the coding regions of those genes in 2071 ASD cases and 904 controls of European white ancestry. We applied comprehensive annotation to identify single variants which could confer ASD risk and also gene-based association analysis to identify sets of rare variants associated with ASD.
We identified a significant over-representation of rare loss-of-function variants in genes previously associated with ASD, including a de novo premature stop variant in the well-established ASD candidate gene RBFOX1. Furthermore, ASD cases were more likely to have two damaging missense variants in candidate genes than controls. Finally, gene-based rare variant association implicates genes functioning in excitatory neurotransmission and neurite outgrowth and guidance pathways including CACNAD2, KCNH7, and NRXN1.
We find suggestive evidence that rare variants in synaptic genes are associated with ASD and that loss-of-function mutations in ASD candidate genes are a major risk factor, and we implicate damaging mutations in glutamate signaling receptors and neuronal adhesion and guidance molecules. Furthermore, the role of de novo mutations in ASD remains to be fully investigated as we identified the first reported protein-truncating variant in RBFOX1 in ASD. Overall, this work, combined with others in the field, suggests a convergence of genes and molecular pathways underlying ASD etiology.
Electronic supplementary material
The online version of this article (doi:10.1186/s13229-015-0034-z) contains supplementary material, which is available to authorized users.
Electronic medical records (EMRs) are being widely implemented for use in genetic and genomic studies. As a phenotypically rich resource, EMRs provide researchers with the opportunity to identify disease cohorts and perform genotype-phenotype association studies. The Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study, has genotyped more than 15,000 individuals of diverse genetic ancestry in BioVU, the Vanderbilt University Medical Center’s biorepository linked to a de-identified version of the EMR (EAGLE BioVU). Here we develop and deploy an algorithm utilizing data mining techniques to identify primary open-angle glaucoma (POAG) in African Americans from EAGLE BioVU for genetic association studies. The algorithm described here was designed using a combination of diagnostic codes, current procedural terminology billing codes, and free text searches to identify POAG status in situations where gold-standard digital photography cannot be accessed. The case algorithm identified 267 potential POAG subjects but underperformed after manual review with a positive predictive value of 51.6% and an accuracy of 76.3%. The control algorithm identified controls with a negative predictive value of 98.3%. Although the case algorithm requires more downstream manual review for use in large-scale studies, it provides a basis by which to extract a specific clinical subtype of glaucoma from EMRs in the absence of digital photographs.
TREM and TREM-like receptors are a structurally similar protein family encoded by genes clustered on chromosome 6p21.11. Recent studies have identified a rare coding variant (p.R47H) in TREM2 that confers a high risk for Alzheimer’s disease (AD). In addition, common SNPs in this genomic region are associated with cerebrospinal fluid (CSF) biomarkers for AD and a common intergenic variant found near the TREML2 gene has been identified to be protective for AD. However, little is known about the functional variant underlying the latter association or its relationship with the p.R47H. Here, we report comprehensive analyses using whole-exome sequencing data, CSF biomarker analyses, meta-analyses (16,254 cases and 20,052 controls) and cell-based functional studies to support the role of the TREML2 coding missense variant p.S144G (rs3747742) as a potential driver of the meta-analysis AD-associated GWAS signal. Additionally, we demonstrate that the protective role of TREML2 in AD is independent of the role of TREM2 gene as a risk factor for AD.
Hippocampal sclerosis of aging (HS-Aging) is a high-morbidity brain disease in the elderly but risk factors are largely unknown. We report the first genome-wide association study (GWAS) with HS-Aging pathology as an endophenotype. In collaboration with the Alzheimer’s Disease Genetics Consortium, data were analyzed from large autopsy cohorts: (#1) National Alzheimer’s Coordinating Center (NACC); (#2) Rush University Religious Orders Study and Memory and Aging Project; (#3) Group Health Research Institute Adult Changes in Thought study; (#4) University of California at Irvine 90+ Study; and (#5) University of Kentucky Alzheimer’s Disease Center. Altogether, 363 HS-Aging cases and 2,303 controls, all pathologically confirmed, provided statistical power to test for risk alleles with large effect size. A two-tier study design included GWAS from cohorts #1–3 (Stage I) to identify promising SNP candidates, followed by focused evaluation of particular SNPs in cohorts #4–5 (Stage II). Polymorphism in the ATP-binding cassette, sub-family C member 9 (ABCC9) gene, also known as sulfonylurea receptor 2, was associated with HS-Aging pathology. In the meta-analyzed Stage I GWAS, ABCC9 polymorphisms yielded the lowest p values, and factoring in the Stage II results, the meta-analyzed risk SNP (rs704178:G) attained genome-wide statistical significance (p = 1.4 × 10⁻⁹), with odds ratio (OR) of 2.13 (recessive mode of inheritance). For SNPs previously linked to hippocampal sclerosis, meta-analyses of Stage I results show OR = 1.16 for rs5848 (GRN) and OR = 1.22 for rs1990622 (TMEM106B), with the risk alleles as previously described. Sulfonylureas, a widely prescribed drug class used to treat diabetes, also modify human ABCC9 protein function. A subsample of patients from the NACC database (n = 624) was identified who were older than age 85 at death with known drug history. Controlling for important confounders such as diabetes itself, exposure to a sulfonylurea drug was associated with risk for HS-Aging pathology (p = 0.03). Thus, we describe a novel and targetable dementia risk factor.
Oldest old; Neuropathology; KATP; CTAGE5; ADGC; Potassium channel
We examined the role of DNA copy number variants (CNVs) of known glaucoma genes in relation to primary open angle glaucoma (POAG).
Our study included DNA samples from two studies (NEIGHBOR and GLAUGEN). All the samples were genotyped with the Illumina Human660W_Quad_v1 BeadChip. After removing non–blood-derived and amplified DNA samples, we applied quality control steps based on the mean Log R Ratio and the mean B allele frequency. Subsequently, data from 3057 DNA samples (1599 cases and 1458 controls) were analyzed with PennCNV software. We defined CNVs as those ≥5 kilobases (kb) in size and interrogated by ≥5 consecutive probes. We further limited our investigation to CNVs in known POAG-related genes, including CDKN2B-AS1, TMCO1, SIX1/SIX6, CAV1/CAV2, the LRP12-ZFPM2 region, GAS7, ATOH7, FNDC3B, CYP1B1, MYOC, OPTN, WDR36, SRBD1, TBK1, and GALC.
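The size and probe-count criteria stated above amount to a simple filter over CNV calls. The sketch below assumes a hypothetical `CnvCall` record; it is not the PennCNV output format, and 1 kb is taken as 1000 base pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CnvCall:
    """Hypothetical CNV record; field names are illustrative."""
    chrom: str
    start: int        # first base of the call, inclusive
    end: int          # last base of the call, inclusive
    n_probes: int     # consecutive probes supporting the call
    copy_number: int  # e.g. 1 for a deletion, 3 for a duplication

    @property
    def length_bp(self) -> int:
        return self.end - self.start + 1


def filter_cnvs(
    calls: Iterable[CnvCall],
    min_length_bp: int = 5000,  # >= 5 kb, as stated in the text
    min_probes: int = 5,        # >= 5 consecutive probes, as stated in the text
) -> List[CnvCall]:
    """Keep only calls meeting both the size and the probe-count criterion."""
    return [
        c for c in calls
        if c.length_bp >= min_length_bp and c.n_probes >= min_probes
    ]
```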
Genomic duplications of CDKN2B-AS1 and TMCO1 were each found in a single case. Two cases carried duplications in the GAS7 region. Genomic deletions of SIX6 and ATOH7 were each identified in one case. One case carried a TBK1 deletion and another case carried a TBK1 duplication. No controls had duplications or deletions in these six genes. A single control had a duplication in the MYOC region. Deletions of GALC were observed in five cases and two controls.
The CNV analysis of a large set of cases and controls revealed the presence of rare CNVs in known POAG susceptibility genes. Our data suggest that these rare CNVs may contribute to POAG pathogenesis and merit functional evaluation.
Our study examined DNA copy number variants of known glaucoma genes in two large cohorts, NEIGHBOR and GLAUGEN. We identified rare deletions and/or duplications in a few glaucoma genes only in POAG cases. Our data suggest a potential role of these rare CNVs in POAG pathogenesis.
DNA copy number variants; POAG; genetics; SIX6; GAS7
Genome Wide Association Studies (GWAS) are a standard approach for large-scale common variation characterization and for identification of single loci predisposing to disease. However, due to issues of moderate sample sizes and particularly multiple testing correction, many variants of smaller effect size are not detected within a single allele analysis framework. Thus, small main effects and potential epistatic effects are not consistently observed in GWAS using standard analytical approaches that consider only single SNP alleles. Here we propose a unique methodology that aggregates variants of interest (for example, genes in a biological pathway) using GWAS results. Multiple testing and type I error concerns are minimized using empirical genomic randomization to estimate significance. Randomization corrects for common pathway-based analysis biases such as SNP coverage and density, linkage disequilibrium, gene size and pathway size. PARIS (Pathway Analysis by Randomization Incorporating Structure) applies this randomization and in doing so directly accounts for linkage disequilibrium effects. PARIS is independent of association analysis method and is thus applicable to GWAS datasets of all study designs. Using the KEGG database as an example, we apply PARIS to the publicly available Autism Genetic Resource Exchange (AGRE) GWA dataset, revealing pathways with a significant enrichment of positive association results.
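The idea of estimating significance by genomic randomization can be sketched in a much simplified form. The code below is not the PARIS implementation: it draws random feature sets of the same size as the pathway from the whole genome and ignores the matching on linkage disequilibrium structure and feature size that PARIS performs. All names are hypothetical.

```python
import random
from typing import Hashable, Sequence, Set


def empirical_enrichment_p(
    pathway_features: Sequence[Hashable],
    all_features: Sequence[Hashable],
    significant: Set[Hashable],
    n_random: int = 1000,
    seed: int = 0,
) -> float:
    """Empirical p-value for the number of significant features in a pathway.

    Compares the observed count against counts in random feature sets of the
    same size drawn without replacement from all_features.
    """
    rng = random.Random(seed)
    observed = sum(1 for f in pathway_features if f in significant)
    pool = list(all_features)
    size = len(pathway_features)
    at_least_as_extreme = 0
    for _ in range(n_random):
        draw = rng.sample(pool, size)
        if sum(1 for f in draw if f in significant) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_random + 1)
```

Because the null distribution is built from the data themselves, no separate correction for pathway size is needed in this simplified setting; a faithful version would also match each random feature to the LD and size characteristics of the feature it replaces.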
pathway analysis; genomic randomization; gene set; enrichment
Identification of mutations at familial loci for amyotrophic lateral sclerosis (ALS) has provided novel insights into the aetiology of this rapidly progressing fatal neurodegenerative disease. However, genome-wide association studies (GWAS) of the more common (∼90%) sporadic form have been less successful with the exception of the replicated locus at 9p21.2. To identify new loci associated with disease susceptibility, we have established the largest association study in ALS to date and undertaken a GWAS meta-analytical study combining 3959 newly genotyped Italian individuals (1982 cases and 1977 controls) collected by SLAGEN (Italian Consortium for the Genetics of ALS) together with samples from the Netherlands, USA, UK, Sweden, Belgium, France, Ireland and Italy collected by ALSGEN (the International Consortium on Amyotrophic Lateral Sclerosis Genetics). We analysed a total of 13 225 individuals, 6100 cases and 7125 controls for almost 7 million single-nucleotide polymorphisms (SNPs). We identified a novel locus with genome-wide significance at 17q11.2 (rs34517613 with P = 1.11 × 10⁻⁸; OR 0.82) that was validated when combined with genotype data from a replication cohort (P = 8.62 × 10⁻⁹; OR 0.833) of 4656 individuals. Furthermore, we confirmed the previously reported association at 9p21.2 (rs3849943 with P = 7.69 × 10⁻⁹; OR 1.16). Finally, we estimated the contribution of common variation to heritability of sporadic ALS as ∼12% using a linear mixed model accounting for all SNPs. Our results provide an insight into the genetic structure of sporadic ALS, confirming that common variation contributes to risk and that sufficiently powered studies can identify novel susceptibility loci.
Primary open angle glaucoma (POAG), a major cause of blindness worldwide, is a complex disease with a significant genetic contribution. We performed Exome Array (Illumina) analysis on 3504 POAG cases and 9746 controls with replication of the most significant findings in 9173 POAG cases and 26 780 controls across 18 collections of Asian, African and European descent. Apart from confirming strong evidence of association at CDKN2B-AS1 (rs2157719 [G], odds ratio [OR] = 0.71, P = 2.81 × 10⁻³³), we observed one SNP showing significant association to POAG (CDC7–TGFBR3 rs1192415, OR for the G allele = 1.13, P for meta-analysis = 1.60 × 10⁻⁸). This particular SNP has previously been shown to be strongly associated with optic disc area and vertical cup-to-disc ratio, which are regarded as glaucoma-related quantitative traits. Our study now extends this by directly implicating it in POAG disease pathogenesis.
Elevated intraocular pressure (IOP) is an important risk factor in developing glaucoma and IOP variability may herald glaucomatous development or progression. We report the results of a genome-wide association study meta-analysis of 18 population cohorts from the International Glaucoma Genetics Consortium (IGGC), comprising 35,296 multiethnic participants for IOP. We confirm genetic association of known loci for IOP and primary open angle glaucoma (POAG) and identify four new IOP loci: one on chromosome 3q25.31 within the FNDC3B gene (p = 4.19 × 10⁻⁸ for rs6445055), two on chromosome 9 (p = 2.80 × 10⁻¹¹ for rs2472493 near ABCA1 and p = 6.39 × 10⁻¹¹ for rs8176693 within ABO), and one on chromosome 11p11.2 (best p = 1.04 × 10⁻¹¹ for rs747782). Separate meta-analyses of four independent POAG cohorts, totaling 4,284 cases and 95,560 controls, show that three of these IOP loci are also associated with POAG.
Primary open-angle glaucoma (POAG) is a major cause of irreversible blindness worldwide. We performed a genome-wide association study in an Australian discovery cohort comprising 1,155 advanced POAG cases and 1,992 controls. Association of the top SNPs from the discovery stage was investigated in two Australian replication cohorts (total 932 cases, 6,862 controls) and two US replication cohorts (total 2,616 cases, 2,634 controls). Meta-analysis of all cohorts revealed three novel loci associated with development of POAG. These loci are located upstream of ABCA1 (rs2472493 [G] OR = 1.31, P = 2.1 × 10⁻¹⁹), within AFAP1 (rs4619890 [G] OR = 1.20, P = 7.0 × 10⁻¹⁰) and within GMDS (rs11969985 [G] OR = 1.31, P = 7.7 × 10⁻¹⁰). Using RT-PCR and immunolabelling, we also showed that these genes are expressed within human retina, optic nerve and trabecular meshwork and that ABCA1 and AFAP1 are also expressed in retinal ganglion cells.
Substantial progress has been made in identifying susceptibility variants for AMD in European populations; however, few studies have been conducted to understand the role these variants play in AMD risk in diverse populations. The present study aims to examine AMD risk across diverse populations in known and suspected AMD complement factor and lipid-related loci.
Targeted genotyping was performed across study sites for AMD and lipid trait-associated single nucleotide polymorphisms (SNPs). Genetic association tests were performed at individual sites and then meta-analyzed using logistic regression assuming an additive genetic model stratified by self-described race/ethnicity. Participants included cases with early or late AMD and controls with no signs of AMD as determined by fundus photography. Populations included in this study were European Americans, African Americans, Mexican Americans, and Singaporeans from the Population Architecture using Genomics and Epidemiology (PAGE) study.
Index variants of AMD, rs1061170 (CFH) and rs10490924 (ARMS2), were associated with AMD at P = 3.05 × 10⁻⁸ and P = 6.36 × 10⁻⁶, respectively, in European Americans. In general, none of the major AMD index variants generalized to our non-European populations with the exception of rs10490924 in Mexican Americans at an uncorrected P value < 0.05. Four lipid-associated SNPs (LPL rs328, TRIB1 rs6987702, CETP rs1800775, and KCTD10/MVK rs2338104) were associated with AMD in African Americans and Mexican Americans (P < 0.05), but these associations did not survive strict corrections for multiple testing.
While most associations did not generalize in the non-European populations, variants within lipid-related genes were found to be associated with AMD. This study highlights the need for larger well-powered studies in non-European populations.
The Population Architecture using Genomics and Epidemiology (PAGE) I Study characterized known age-related macular degeneration risk variants previously identified in European populations in ethnically/racially diverse populations. Major AMD variants did not generalize in diverse populations.
age-related macular degeneration; CFH Y402H; ARMS2 A69S; PAGE Study; genetic epidemiology
Alzheimer disease (AD) is the most common cause of dementia. As with many complex diseases, the identified variants do not explain the total expected genetic risk that is based on heritability estimates for AD. Isolated founder populations, such as the Amish, are advantageous for genetic studies as they overcome heterogeneity limitations associated with complex population studies. We determined that Amish AD cases harbored a significantly higher burden of the known risk alleles compared to Amish cognitively normal controls, but a significantly lower burden when compared to cases from a dataset of unrelated individuals. Whole-exome sequencing of a selected subset of the overall study population was used as a screening tool to identify variants located in the regions of the genome that are most likely to contribute risk. By then genotyping the top candidate variants from the known AD genes and from linkage regions implicated in previous studies in the full dataset, new associations could be confirmed. The most significant result (p = 0.0012) was for rs73938538, a synonymous variant in LAMA1 within the previously identified linkage peak on chromosome 18. However, this association is specific to the Amish and did not generalize when tested in a dataset of unrelated individuals. These results suggest that additional risk variation in the Amish remains to be identified and likely resides outside of the classical protein coding gene regions.
As APOE locus variants contribute to both risk of late-onset Alzheimer disease and differences in age-at-onset, it is important to know if other established late-onset Alzheimer disease risk loci also affect age-at-onset in cases.
To investigate the effects of known Alzheimer disease risk loci in modifying age-at-onset, and to estimate their cumulative effect on age-at-onset variation, using data from genome-wide association studies in the Alzheimer’s Disease Genetics Consortium (ADGC).
Design, Setting and Participants
The ADGC comprises 14 case-control, prospective, and family-based datasets with data on 9,162 Caucasian participants with Alzheimer’s occurring after age 60 who also had complete age-at-onset information, gathered between 1989 and 2011 at multiple sites by participating studies. Data on genotyped or imputed single nucleotide polymorphisms (SNPs) most significantly associated with risk at ten confirmed LOAD loci were examined in linear modeling of AAO, and individual dataset results were combined using a random effects, inverse variance-weighted meta-analysis approach to determine if they contribute to variation in age-at-onset. Aggregate effects of all risk loci on AAO were examined in a burden analysis using genotype scores weighted by risk effect sizes.
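The pooling step described above can be sketched as follows. The abstract specifies a random-effects, inverse variance-weighted meta-analysis but does not name the between-study variance estimator; the DerSimonian-Laird estimator used here is an assumption, and the function name is illustrative.

```python
import math
from typing import Sequence, Tuple


def random_effects_meta(
    betas: Sequence[float],
    ses: Sequence[float],
) -> Tuple[float, float, float]:
    """Pool per-dataset effect estimates with inverse-variance weights.

    Returns (pooled_beta, pooled_se, tau2), where tau2 is the
    DerSimonian-Laird estimate of between-study variance (an assumed choice).
    """
    k = len(betas)
    if k == 0 or k != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Fixed-effect weights and estimate, needed for Cochran's Q.
    w = [1.0 / se ** 2 for se in ses]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_w_star = sum(w_star)
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum_w_star
    return pooled, math.sqrt(1.0 / sum_w_star), tau2
```

When the datasets agree closely, `tau2` is zero and the result reduces to the fixed-effect inverse-variance estimate.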
Main Outcomes and Measures
Age at disease onset abstracted from medical records among participants with late-onset Alzheimer disease diagnosed per standard criteria.
Analysis confirmed association of APOE with age-at-onset (rs6857, P = 3.30 × 10⁻⁹⁶), with associations in CR1 (rs6701713, P = 7.17 × 10⁻⁴), BIN1 (rs7561528, P = 4.78 × 10⁻⁴), and PICALM (rs561655, P = 2.23 × 10⁻³) reaching statistical significance (P < 0.005). Risk alleles individually reduced age-at-onset by 3-6 months. Burden analyses demonstrated that APOE contributes to 3.9% of variation in age-at-onset (R² = 0.220) over baseline (R² = 0.189) whereas the other nine loci together contribute to 1.1% of variation (R² = 0.198).
Conclusions and Relevance
We confirmed association of APOE variants with age-at-onset among late-onset Alzheimer disease cases and observed novel associations with age-at-onset in CR1, BIN1, and PICALM. In contrast to earlier hypothetical modeling, we show that the combined effects of Alzheimer disease risk variants on age-at-onset are on the scale of, but do not exceed, the APOE effect. While the aggregate effects of risk loci on age-at-onset may be significant, additional genetic contributions to age-at-onset are individually likely to be small.
Alzheimer Disease; Alzheimer Disease Genetics; Alzheimer’s Disease - Pathophysiology; Genetics of Alzheimer Disease; Aging
Age-related macular degeneration (AMD) is the leading cause of irreversible visual loss in developed countries. Its etiology includes genetic and environmental factors. Although VEGFA variants are associated with AMD, the joint action of variants within the VEGF pathway and their interaction with nongenetic factors have not been investigated.
Affymetrix 6.0 chipsets were used to genotype 668,238 single nucleotide polymorphisms (SNPs) in 1207 AMD cases and 686 controls. Environmental exposures were collected by questionnaire. A set-based test was conducted using the χ² statistic at each SNP derived from Kraft's two-degree-of-freedom (2df) joint test. Pathway- and gene-based test statistics were calculated as the mean of all independent SNP statistics. Phenotype labels were permuted 10,000 times to generate an empirical P value.
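The set-based permutation procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the per-SNP statistic here is a simple placeholder (a squared standardized difference in mean allele count between cases and controls) standing in for the χ² statistic from Kraft's 2df joint test, which would require fitting a model with a gene-environment interaction term. All function names are hypothetical, and the add-one correction in the empirical P value is a common convention that the abstract does not confirm.

```python
import random
from statistics import mean


def snp_statistic(genotypes, labels):
    """Placeholder per-SNP statistic (NOT Kraft's 2df joint test).

    Squared standardized difference in mean allele count between
    cases (label 1) and controls (label 0).
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    n1, n0 = len(cases), len(controls)
    if n1 == 0 or n0 == 0:
        return 0.0
    m = mean(genotypes)
    var = sum((g - m) ** 2 for g in genotypes) / (len(genotypes) - 1)
    if var == 0:
        return 0.0
    diff = mean(cases) - mean(controls)
    return diff ** 2 / (var * (1.0 / n1 + 1.0 / n0))


def set_statistic(snp_matrix, labels):
    """Gene- or pathway-level statistic: mean of the per-SNP statistics."""
    return mean(snp_statistic(snp, labels) for snp in snp_matrix)


def empirical_p(snp_matrix, labels, n_perm=10000, seed=0):
    """Permute phenotype labels to obtain an empirical P value.

    Assumption: the add-one correction (hits + 1) / (n_perm + 1) is used
    so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = set_statistic(snp_matrix, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_statistic(snp_matrix, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `snp_matrix` is a list of SNPs, each a list of allele counts (0, 1, or 2) ordered by individual, and `labels` holds the matching case-control status. Permuting the labels while leaving the genotypes intact preserves the correlation among SNPs in the set, which is why the null distribution is generated this way rather than from a theoretical reference distribution.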
While a main effect of the VEGF pathway was not identified, the pathway was associated with neovascular AMD in women when accounting for birth control pill (BCP) use (P = 0.017). Analysis of VEGF's subpathways showed that SNPs in the proliferation subpathway were associated with neovascular AMD (P = 0.029) when accounting for BCP use. Nominally significant genes within this subpathway were also observed. Stratification by BCP use revealed novel significant genetic effects in women who had taken BCPs.
These results illustrate that some AMD genetic risk factors may be revealed only when complex relationships among risk factors are considered. This shows the utility of exploring pathways of previously associated genes to find novel effects. It also demonstrates the importance of incorporating environmental exposures in tests of genetic association at the SNP, gene, or pathway level.
Analysis using a set-based joint test of genetic main effects and environmental interaction found that SNPs in VEGF's proliferation subpathway were associated with neovascular AMD when exogenous estrogen use in women was accounted for.
age-related macular degeneration; case-control study; epidemiology; statistics; candidate genes
Although autism is one of the most heritable neuropsychiatric disorders, its underlying genetic architecture has largely eluded description. To comprehensively examine the hypothesis that common variation is important in autism, we performed a genome-wide association study (GWAS) using a discovery dataset of 438 autistic Caucasian families and the Illumina Human 1M beadchip. Ninety-six single nucleotide polymorphisms (SNPs) demonstrated strong association with autism risk (p-value < 0.0001). The validation of the top 96 SNPs was performed using an independent dataset of 487 Caucasian autism families genotyped on the 550K Illumina BeadChip. A novel region on chromosome 5p14.1 showed significance in both the discovery and validation datasets. Joint analysis of all SNPs in this region identified 8 SNPs with lower p-values (3.24E-04 to 3.40E-06) than in either dataset alone. Our findings demonstrate that in addition to multiple rare variations, part of the complex genetic architecture of autism involves common variation.
Multiple sclerosis (MS) is a debilitating neuroimmunological and neurodegenerative disease affecting more than 400,000 individuals in the United States. Population and family-based studies have suggested that there is a strong genetic component. Numerous genomic linkage screens have identified regions of interest for MS loci. Our own second-generation genome-wide linkage study identified a handful of non-MHC regions with suggestive linkage. Several of these regions were further examined using single-nucleotide polymorphisms (SNPs) with average spacing between SNPs of approximately 1.0 Mb in a dataset of 173 multiplex families. The results of that study provided further evidence for the involvement of the chromosome 1q43 region. This region is of particular interest given linkage evidence in studies of other autoimmune and inflammatory diseases including rheumatoid arthritis and systemic lupus erythematosus. In this follow-up study, we saturated the region with ~700 SNPs (average spacing of 10 kb per SNP) in search of disease-associated variation. We found preliminary evidence to suggest that common variation within the RGS7 locus may be involved in disease susceptibility.
multiple sclerosis; linkage; association; 1q43; RGS7
A broad region of chromosome 10 (chr10) has engendered continued interest in the etiology of late-onset Alzheimer disease (LOAD) from both linkage and candidate gene studies. However, there is extensive heterogeneity on chr10. We integrated linkage analysis and gene expression data using the concept of genomic convergence, which suggests that genes showing positive results across multiple different data types are more likely to be involved in AD. We identified and examined 28 genes on chr10 for association with AD in a Caucasian case-control dataset of 506 cases and 558 controls with substantial clinical information. The cases were all LOAD (minimum age at onset ≥ 60 years). Both single marker and haplotypic associations were tested in the overall dataset and 8 subsets defined by age, gender, ApoE and clinical status. PTPLA showed allelic, genotypic and haplotypic association in the overall dataset. SORCS1 was significant in the overall dataset (p=0.0025) and most significant in the female subset (allelic association p=0.00002; a 3-locus haplotype had p=0.0005). The odds ratio for SORCS1 in the female subset was 1.7 (p<0.0001). SORCS1 is an interesting candidate gene involved in the Aβ pathway. Therefore, genetic variation in PTPLA and SORCS1 may be associated with, and have a modest effect on, the risk of AD by affecting the Aβ pathway. Replication of the effect of these genes in different study populations, a search for susceptibility variants, and functional studies of these genes are necessary to better understand their roles in Alzheimer disease.
Alzheimer disease; late-onset Alzheimer disease; LOAD; genomic convergence; association; candidate genes; PTPLA; SORCS1