Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. The recent application of GWAS to clinic-based cohorts has also yielded genetic predictors of clinical outcomes. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. With each new dataset, new realities are discovered about GWAS data and best practices continue to be developed. The Genomics Workgroup of the National Human Genome Research Institute (NHGRI)-funded electronic Medical Records and Genomics (eMERGE) network has invested considerable effort in developing strategies for QC of these data. The lessons learned by this group will be valuable for other investigators dealing with large-scale genomic datasets. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the eMERGE network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. In this protocol we discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.
Age-related macular degeneration is the leading cause of blindness among the adult population in the developed world. To further the understanding of this disease, we have studied the genetically isolated Amish population of Ohio and Indiana.
Cumulative genetic risk scores were calculated using the 19 known allelic associations. Exome sequencing was performed in three members of a small Amish family with AMD who lacked the common risk alleles in complement factor H (CFH) and ARMS2/HTRA1. Follow-up genotyping and association analysis were performed in a cohort of 973 Amish individuals, including 95 with self-reported AMD.
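As a rough illustration of a cumulative genetic risk score, the sketch below sums risk-allele counts weighted by the natural log of each variant's odds ratio. The weighting scheme, variant names, and odds ratios are hypothetical placeholders; the abstract does not state how the score was actually constructed.

```python
import math

def genetic_risk_score(allele_counts, odds_ratios):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Assumption: each variant is weighted by the natural log of its
    reported odds ratio; the study's actual weighting is not stated.
    """
    score = 0.0
    for variant, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {variant}: {count}")
        score += count * math.log(odds_ratios[variant])
    return score

# Hypothetical odds ratios and genotypes, for illustration only.
example_odds_ratios = {"variant_a": 2.0, "variant_b": 1.5, "variant_c": 1.2}
example_genotype = {"variant_a": 1, "variant_b": 0, "variant_c": 2}
example_score = genetic_risk_score(example_genotype, example_odds_ratios)
```

Mean scores in cases and controls could then be compared with a standard two-sample test, which is one plausible reading of the comparison reported in the results.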
The cumulative genetic risk score analysis generated a mean genetic risk score of 1.12 (95% confidence interval [CI]: 1.10, 1.13) in the Amish controls and 1.18 (95% CI: 1.13, 1.22) in the Amish cases. This mean difference in genetic risk scores is statistically significant (P = 0.0042). Exome sequencing identified a rare variant (P503A) in CFH. Association analysis in the remainder of the Amish sample revealed that the P503A variant is significantly associated with AMD (P = 9.27 × 10⁻¹³). Variant P503A was absent when evaluated in a cohort of 791 elderly non-Amish controls and 1456 non-Amish cases.
Data from the cumulative genetic risk score analysis suggest that the variants reported by the AMDGene consortium account for a smaller genetic burden of disease in the Amish compared with the non-Amish Caucasian population. Using exome sequencing data, we identified a novel missense mutation that is shared among a densely affected nuclear Amish family and located in a gene that has been previously implicated in AMD risk.
In this study, we describe the analysis of the genetically isolated Amish population of Ohio and Indiana for AMD.
age-related macular degeneration; linkage analysis; rare variant; exome sequencing; risk score analysis
To identify genetic associations between specific risk genes and bilateral advanced age-related macular degeneration (AMD) in a retrospective, observational case series of 1,003 patients: 173 patients with geographic atrophy in at least 1 eye and 830 patients with choroidal neovascularization in at least 1 eye.
Patients underwent clinical examination and fundus photography. The images were subsequently graded using a modified grading system adapted from the Age-Related Eye Disease Study. Genetic analysis was performed to identify genotypes at 4 AMD-associated variants (ARMS2 A69S, CFH Y402H, C3 R102G, and CFB R32Q) in these patients.
There were no statistically significant relationships between clinical findings and genotypes at CFH, C3, and CFB. The genotype at ARMS2 correlated with bilateral advanced AMD using a variety of comparisons: unilateral geographic atrophy versus bilateral geographic atrophy (P = 0.08), unilateral choroidal neovascularization versus bilateral choroidal neovascularization (P = 9.0 × 10⁻⁸), and unilateral late AMD versus bilateral late AMD (P = 5.9 × 10⁻⁸).
In this series, in patients with geographic atrophy or choroidal neovascularization in at least 1 eye, the ARMS2 A69S substitution strongly associated with geographic atrophy or choroidal neovascularization in the fellow eye. The ARMS2 A69S substitution may serve as a marker for bilateral advanced AMD.
age-related macular degeneration; ARMS2; choroidal neovascularization; genotypes; geographic atrophy
Alzheimer disease (AD) is the most common cause of dementia. As with many complex diseases, the identified variants do not explain the total expected genetic risk that is based on heritability estimates for AD. Isolated founder populations, such as the Amish, are advantageous for genetic studies as they overcome heterogeneity limitations associated with complex population studies. We determined that Amish AD cases harbored a significantly higher burden of the known risk alleles compared to Amish cognitively normal controls, but a significantly lower burden when compared to cases from a dataset of unrelated individuals. Whole-exome sequencing of a selected subset of the overall study population was used as a screening tool to identify variants located in the regions of the genome that are most likely to contribute risk. By then genotyping the top candidate variants from the known AD genes and from linkage regions implicated in previous studies in the full dataset, new associations could be confirmed. The most significant result (p = 0.0012) was for rs73938538, a synonymous variant in LAMA1 within the previously identified linkage peak on chromosome 18. However, this association is specific to the Amish and did not generalize when tested in a dataset of unrelated individuals. These results suggest that additional risk variation in the Amish remains to be identified and likely resides outside of the classical protein coding gene regions.
As APOE locus variants contribute to both risk of late-onset Alzheimer disease and differences in age-at-onset, it is important to know if other established late-onset Alzheimer disease risk loci also affect age-at-onset in cases.
To investigate the effects of known Alzheimer disease risk loci in modifying age-at-onset, and to estimate their cumulative effect on age-at-onset variation, using data from genome-wide association studies in the Alzheimer’s Disease Genetics Consortium (ADGC).
Design, Setting and Participants
The ADGC comprises 14 case-control, prospective, and family-based datasets with data on 9,162 Caucasian participants with Alzheimer’s occurring after age 60 who also had complete age-at-onset information, gathered between 1989 and 2011 at multiple sites by participating studies. Data on genotyped or imputed single nucleotide polymorphisms (SNPs) most significantly associated with risk at ten confirmed LOAD loci were examined in linear modeling of AAO, and individual dataset results were combined using a random effects, inverse variance-weighted meta-analysis approach to determine if they contribute to variation in age-at-onset. Aggregate effects of all risk loci on AAO were examined in a burden analysis using genotype scores weighted by risk effect sizes.
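The pooling step described above can be sketched as a random-effects, inverse variance-weighted meta-analysis. The version below uses the DerSimonian-Laird estimator of between-study variance, which is an assumption: the abstract does not name the estimator used. The function name and inputs (per-dataset effect estimates and standard errors from the linear models of age-at-onset) are illustrative.

```python
import math

def random_effects_meta(betas, standard_errors):
    """Pool per-study effects with DerSimonian-Laird random-effects weights.

    Returns (pooled effect, pooled standard error, tau^2).
    """
    if len(betas) != len(standard_errors) or len(betas) < 2:
        raise ValueError("need matching lists with at least two studies")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in standard_errors]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, betas)) / sum_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, betas))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(betas) - 1)) / c)
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

# Hypothetical per-dataset effects on age-at-onset (years per risk allele).
pooled, pooled_se, tau2 = random_effects_meta([-0.40, -0.25, -0.55],
                                              [0.10, 0.12, 0.15])
```

When the studies agree closely, the estimated between-study variance is zero and the result reduces to the fixed-effect inverse-variance estimate.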
Main Outcomes and Measures
Age at disease onset abstracted from medical records among participants with late-onset Alzheimer disease diagnosed per standard criteria.
Analysis confirmed association of APOE with age-at-onset (rs6857, P=3.30×10⁻⁹⁶), with associations in CR1 (rs6701713, P=7.17×10⁻⁴), BIN1 (rs7561528, P=4.78×10⁻⁴), and PICALM (rs561655, P=2.23×10⁻³) reaching statistical significance (P<0.005). Risk alleles individually reduced age-at-onset by 3-6 months. Burden analyses demonstrated that APOE contributes to 3.9% of variation in age-at-onset (R²=0.220) over baseline (R²=0.189) whereas the other nine loci together contribute to 1.1% of variation (R²=0.198).
Conclusions and Relevance
We confirmed association of APOE variants with age-at-onset among late-onset Alzheimer disease cases and observed novel associations with age-at-onset in CR1, BIN1, and PICALM. In contrast to earlier hypothetical modeling, we show that the combined effects of Alzheimer disease risk variants on age-at-onset are on the scale of, but do not exceed, the APOE effect. While the aggregate effects of risk loci on age-at-onset may be significant, additional genetic contributions to age-at-onset are individually likely to be small.
Alzheimer Disease; Alzheimer Disease Genetics; Alzheimer’s Disease - Pathophysiology; Genetics of Alzheimer Disease; Aging
Age-related macular degeneration (AMD) is the leading cause of irreversible visual loss in developed countries. Its etiology includes genetic and environmental factors. Although VEGFA variants are associated with AMD, the joint action of variants within the VEGF pathway and their interaction with nongenetic factors have not been investigated.
Affymetrix 6.0 chipsets were used to genotype 668,238 single nucleotide polymorphisms (SNPs) in 1207 AMD cases and 686 controls. Environmental exposures were collected by questionnaire. A set-based test was conducted using the χ² statistic at each SNP derived from Kraft's two degree of freedom (2df) joint test. Pathway- and gene-based test statistics were calculated as the mean of all independent SNP statistics. Phenotype labels were permuted 10,000 times to generate an empirical P value.
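The permutation scheme described above can be sketched as follows. This is a simplified illustration, not the study's procedure: it substitutes a 1-df trend statistic (N·r² between genotype and case status) for Kraft's 2df joint test, omits the environmental interaction term, and assumes the SNPs passed in are already independent. All function names are hypothetical.

```python
import random

def trend_statistic(genotypes, phenotypes):
    """N * r^2 between genotype (0/1/2) and phenotype (0/1): a 1-df trend chi-square."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or single-class phenotype
    return n * sxy ** 2 / (sxx * syy)

def set_statistic(snp_matrix, phenotypes):
    """Mean of per-SNP statistics; snp_matrix is a list of per-SNP genotype lists."""
    return sum(trend_statistic(snp, phenotypes) for snp in snp_matrix) / len(snp_matrix)

def permutation_p_value(snp_matrix, phenotypes, n_permutations=10_000, seed=0):
    """Empirical P value from permuting phenotype labels across individuals."""
    rng = random.Random(seed)
    observed = set_statistic(snp_matrix, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if set_statistic(snp_matrix, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the empirical P value strictly above zero.
    return (exceed + 1) / (n_permutations + 1)
```

Permuting phenotype labels rather than genotypes preserves the correlation structure among SNPs in the set, which is why the set statistic needs no analytic null distribution.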
While a main effect of the VEGF pathway was not identified, the pathway was associated with neovascular AMD in women when accounting for birth control pill (BCP) use (P = 0.017). Analysis of VEGF's subpathways showed that SNPs in the proliferation subpathway were associated with neovascular AMD (P = 0.029) when accounting for BCP use. Nominally significant genes within this subpathway were also observed. Stratification by BCP use revealed novel significant genetic effects in women who had taken BCPs.
These results illustrate that some AMD genetic risk factors may be revealed only when complex relationships among risk factors are considered. This shows the utility of exploring pathways of previously associated genes to find novel effects. It also demonstrates the importance of incorporating environmental exposures in tests of genetic association at the SNP, gene, or pathway level.
Analysis using a set-based joint test of genetic main effects and environmental interaction found that SNPs in VEGF's proliferation subpathway were associated with neovascular AMD when exogenous estrogen use in women was accounted for.
age-related macular degeneration; case-control study; epidemiology; statistics; candidate genes
The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R² (estimated correlation between the imputed and true genotypes), and the relationship between allelic R² and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
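To make the allelic R² metric concrete, the sketch below computes the squared Pearson correlation between imputed allele dosages and known genotypes at one marker. It assumes true genotypes are available (for example, from masked genotyped markers); imputation software estimates this quantity without them, so this is an illustration of the target quantity rather than of any package's internal estimator.

```python
def allelic_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages (0.0-2.0)
    and true genotypes (0, 1, or 2) at a single marker."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least two samples")
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return sxy ** 2 / (sxx * syy)

# Hypothetical dosages and genotypes for four samples.
example_r2 = allelic_r2([0.1, 0.9, 1.8, 1.2], [0, 1, 2, 1])
```

Because the variance of the true genotypes shrinks as minor allele frequency falls, small dosage errors at rare markers depress this value more sharply, which is one reason to examine it against minor allele frequency.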
imputation; genome-wide association; eMERGE; electronic health records
Genome-wide association studies (GWAS) and other genomic technologies have accelerated the discovery of genes and genomic regions contributing to common human ocular disorders with complex inheritance. Age-related macular degeneration (AMD), diabetic retinopathy (DR), glaucoma and myopia account for the majority of visual impairment worldwide. Over 19 genes and/or genomic regions have been associated with AMD. Current investigations are assessing the clinical utility of risk score panels and therapies targeting disease-specific pathways. DR is the leading cause of blindness in the United States and globally is a major cause of vision loss. Genomic investigations have identified molecular pathways associated with DR in animal models which could suggest novel therapeutic targets. Three types of glaucoma, primary-open-angle glaucoma (POAG), angle-closure glaucoma and exfoliation syndrome (XFS) glaucoma, are common age-related conditions. Five genomic regions have been associated with POAG, three with angle-closure glaucoma and one with XFS. Myopia causes substantial ocular morbidity throughout the world. Recent large GWAS have identified >20 associated loci for this condition. In this report, we present a comprehensive overview of the genes and genomic regions contributing to disease susceptibility for these common blinding ocular disorders and discuss the next steps toward translation to effective gene-based screening tests and novel therapies targeting the molecular events contributing to disease.
The clinical course of multiple sclerosis (MS) is highly variable, and research data collection is costly and time consuming. We evaluated natural language processing techniques applied to electronic medical records (EMR) to identify MS patients and the key clinical traits of their disease course.
Materials and methods
We used four algorithms based on ICD-9 codes, text keywords, and medications to identify individuals with MS from a de-identified, research version of the EMR at Vanderbilt University. Using a training dataset of the records of 899 individuals, algorithms were constructed to identify and extract detailed information regarding the clinical course of MS from the text of the medical records, including clinical subtype, presence of oligoclonal bands, year of diagnosis, year and origin of first symptom, Expanded Disability Status Scale (EDSS) scores, timed 25-foot walk scores, and MS medications. Algorithms were evaluated on a test set validated by two independent reviewers.
We identified 5789 individuals with MS. For all clinical traits extracted, precision was at least 87% and specificity was greater than 80%. Recall values for clinical subtype, EDSS scores, and timed 25-foot walk scores were greater than 80%.
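For reference, the three evaluation metrics reported above follow directly from confusion-matrix counts. The sketch below uses hypothetical counts; the study's actual counts are not given in the abstract.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, and specificity from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives,
    false negatives. A metric with a zero denominator is returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),    # extracted values that are correct
        "recall": ratio(tp, tp + fn),       # true values that were extracted
        "specificity": ratio(tn, tn + fp),  # true absences correctly left blank
    }

# Hypothetical counts for one extracted trait.
example_metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```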
Discussion and conclusion
This collection of clinical data represents one of the largest databases of detailed, clinical traits available for research on MS. This work demonstrates that detailed clinical information is recorded in the EMR and can be extracted for research purposes with high reliability.
Multiple sclerosis; electronic health records
To identify novel late-onset Alzheimer disease (LOAD) risk genes, we have analyzed Amish populations of Ohio and Indiana. We performed genome-wide SNP linkage and association studies on 798 individuals (109 with LOAD). We tested association using the Modified Quasi-Likelihood Score (MQLS) test and also performed two-point and multipoint linkage analyses. We found that LOAD was significantly associated with APOE (P=9.0×10⁻⁶) in all our ascertainment regions except for the Adams County, Indiana, community (P=0.55). Genome-wide, the most strongly associated SNP was rs12361953 (P=7.92×10⁻⁷). A very strong, genome-wide significant multipoint peak (recessive HLOD=6.14, dominant HLOD=6.05) was detected on 2p12. Three additional loci with multipoint HLOD scores >3 were detected on 3q26, 9q31, and 18p11. Converging linkage and association results, the most significantly associated SNP under the 2p12 peak was at rs2974151 (P=1.29×10⁻⁴). This SNP is located in CTNNA2, which encodes catenin alpha 2, a neuronal-specific catenin known to have function in the developing brain. These results identify CTNNA2 as a novel candidate LOAD gene, and implicate three other regions of the genome as novel LOAD loci. These results underscore the utility of using family-based linkage and association analysis in isolated populations to identify novel loci for traits with complex genetic architecture.
GWAS; Linkage; founder population; Amish; Alzheimer
Alzheimer's disease (AD) and related dementias are a major public health challenge and present a therapeutic imperative for which we need additional insight into molecular pathogenesis. We performed a genome-wide association study and analysis of known genetic risk loci for AD dementia using neuropathologic data from 4,914 brain autopsies. Neuropathologic data were used to define clinico-pathologic AD dementia or controls, assess core neuropathologic features of AD (neuritic plaques, NPs; neurofibrillary tangles, NFTs), and evaluate commonly co-morbid neuropathologic changes: cerebral amyloid angiopathy (CAA), Lewy body disease (LBD), hippocampal sclerosis of the elderly (HS), and vascular brain injury (VBI). Genome-wide significance was observed for clinico-pathologic AD dementia, NPs, NFTs, CAA, and LBD with a number of variants in and around the apolipoprotein E gene (APOE). GalNAc transferase 7 (GALNT7), ATP-Binding Cassette, Sub-Family G (WHITE), Member 1 (ABCG1), and an intergenic region on chromosome 9 were associated with NP score; and Potassium Large Conductance Calcium-Activated Channel, Subfamily M, Beta Member 2 (KCNMB2) was strongly associated with HS. Twelve of the 21 non-APOE genetic risk loci for clinically-defined AD dementia were confirmed in our clinico-pathologic sample: CR1, BIN1, CLU, MS4A6A, PICALM, ABCA7, CD33, PTK2B, SORL1, MEF2C, ZCWPW1, and CASS4 with 9 of these 12 loci showing larger odds ratio in the clinico-pathologic sample. Correlation of effect sizes for risk of AD dementia with effect size for NFTs or NPs showed positive correlation, while those for risk of VBI showed a moderate negative correlation. The other co-morbid neuropathologic features showed only nominal association with the known AD loci. Our results discovered new genetic associations with specific neuropathologic features and aligned known genetic risk for AD dementia with specific neuropathologic changes in the largest brain autopsy study of AD and related dementias.
Alzheimer's disease (AD) and related dementias are a major public health challenge and present a therapeutic imperative for which we need additional insight into molecular pathogenesis. We performed a genome-wide association study (GWAS), as well as an analysis of known genetic risk loci for AD dementia, using data from 4,914 brain autopsies. Genome-wide significance was observed for 7 genes and pathologic features of AD and related diseases. Twelve of the 22 genetic risk loci for clinically-defined AD dementia were confirmed in our pathologic sample. Correlation of effect sizes for risk of AD dementia with effect size for hallmark pathologic features of AD were strongly positive and linear. Our study discovered new genetic associations with specific pathologic features and aligned known genetic risk for AD dementia with specific pathologic changes in a large brain autopsy study of AD and related dementias.
Primary open-angle glaucoma (POAG) is a common disease with complex inheritance. The identification of genes predisposing to POAG is an important step toward the development of novel gene-based methods of diagnosis and treatment. Genome-wide association studies (GWAS) have successfully identified genes contributing to complex traits such as POAG; however, such studies frequently require very large sample sizes, and thus, collaborations and consortia have been of critical importance for the GWAS approach. In this report we describe the formation of the NEIGHBOR consortium, the harmonized case control definitions used for a POAG GWAS, the clinical features of the cases and controls and the rationale for the GWAS study design.
In the decade that has passed since the initial release of the Human Genome, numerous advancements in science and technology within and beyond genetics and genomics have been encouraged and enhanced by the availability of this vast and remarkable data resource. Progress in understanding three common, complex diseases, age-related macular degeneration (AMD), Alzheimer’s disease (AD), and multiple sclerosis (MS), exemplifies its impact on the elucidation of the genetic architecture of disease. The approaches used in these diseases have been successfully applied to numerous other complex diseases. For example, the heritability of AMD was confirmed upon the release of the first genome-wide association study (GWAS) along with confirmatory reports that supported the findings of that state-of-the-art method, thus setting the foundation for future GWAS in other heritable diseases. Following this seminal discovery and applying it to other diseases including AD and MS, the genetic knowledge of AD expanded far beyond the well-known APOE locus and now includes more than 20 loci. MS genetics saw a similar increase beyond the HLA loci and now has more than 100 known risk loci. Ongoing and future efforts will seek to define the remaining heritability of these diseases; the next decade could very well hold the key to attaining this goal.
human genome project; age-related macular degeneration; Alzheimer’s disease; multiple sclerosis; genetics; genomics; genome-wide association study
We set out to determine whether expansions in the C9ORF72 repeat found in Amyotrophic Lateral Sclerosis (ALS) and Frontotemporal Dementia (FTD) families are associated with Parkinson Disease (PD). We determined the repeat size in a total of 889 clinically ascertained patients (including PD and Essential Tremor plus Parkinsonism (ETP)) and 1144 controls using a repeat-primed PCR assay. We found that large C9ORF72 repeat expansions (>30 repeats) were not contributing to PD risk. However, PD and ETP cases had a significant increase in intermediate (>20 to 30+) repeat copies compared to controls. Overall, 14 cases (13 PD, 1 ETP) and 3 controls had >20 repeat copies (Fisher’s exact test p=0.002). Further, seven cases and no controls had >23 repeat copies (p=0.003). Our results suggest that intermediate copy numbers of the C9ORF72 repeat contribute to risk for PD and ETP. This also suggests that PD, ALS and FTD share some pathophysiologic mechanisms of disease. Further studies are needed to elucidate the contribution of the C9ORF72 repeat in the overall PD population and if other common genetic risk factors exist between these neurodegenerative disorders.
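The case-control comparison of intermediate repeat carriers can be checked with a small two-sided Fisher's exact test. The sketch below builds the 2×2 table from the counts stated above (14 of 889 cases and 3 of 1144 controls with >20 repeat copies); the function itself is an illustrative implementation, and the summing rule (all tables no more probable than the observed one) is one common two-sided convention that may differ from the software the authors used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))

# Rows: cases, controls. Columns: >20 repeat copies, <=20 repeat copies.
p_intermediate = fisher_exact_two_sided(14, 889 - 14, 3, 1144 - 3)
```

The resulting value can be compared against the reported p=0.002; the same function applied to the table `(7, 882, 0, 1144)` corresponds to the >23 repeat comparison.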
Parkinson Disease; C9ORF72 repeat; association; risk factor
Successful aging (SA) is a multidimensional phenotype involving living to older age with high physical function, preserved cognition, and continued social engagement. Several domains underlying SA are heritable, and identifying health-promoting polymorphisms and their interactions with the environment could provide important information regarding the health of older adults. In the present study, we examined 263 cognitively intact Amish individuals age 80 and older (74 SA and 189 “normally aged”) all of whom are part of a single 13-generation pedigree. A genome-wide association study of 630,309 autosomal single nucleotide polymorphisms (SNPs) was performed and analyzed for linkage using multipoint analyses and for association using the modified quasi-likelihood score test. There was evidence for linkage on 6q25-27 near the fragile site FRA6E region with a dominant model maximum multipoint heterogeneity LOD score = 3.2. The 1-LOD-down support interval for this linkage contained one SNP for which there was regionally significant evidence of association (rs205990, p = 2.36 × 10⁻⁵). This marker survived interval-wide Bonferroni correction for multiple testing and was located between the genes QKI and PDE10A. Other areas of chromosome 6q25-q27 (including the FRA6E region) contained several SNPs associated with SA (minimum p = 2.89 × 10⁻⁶). These findings suggest potentially novel genes in the 6q25-q27 region linked and associated with SA in the Amish; however, these findings should be verified in an independent replication cohort.
Electronic supplementary material
The online version of this article (doi:10.1007/s11357-012-9447-1) contains supplementary material, which is available to authorized users.
Genome-wide association; Longevity; Genetic epidemiology; Family-based study
Primary open angle glaucoma (POAG) is a genetically and phenotypically complex disease that is a leading cause of blindness worldwide. Previously we completed a genome-wide scan for early-onset POAG that identified a locus on 9q22 (GLC1J). To identify potential causative variants underlying GLC1J, we used targeted DNA capture followed by high throughput sequencing of individuals from four GLC1J pedigrees, followed by Sanger sequencing to screen candidate variants in additional pedigrees. A mutation likely to cause early-onset glaucoma was not identified; however, COL15A1 variants were found in the youngest affected members of 7 of 15 pedigrees with variable disease onset. In addition, the most common COL15A1 variant, R163H, influenced the age of onset in adult POAG cases. RNA in situ hybridization of mouse eyes shows that Col15a1 is expressed in multiple ocular structures including ciliary body, astrocytes of the optic nerve and cells in the ganglion cell layer. Sanger sequencing of COL18A1, a related multiplexin collagen, identified a rare variant, A1381T, in members of three additional pedigrees with early-onset disease. These results suggest genetic variation in COL15A1 and COL18A1 can modify the age of onset of both early and late onset POAG.
The Alzheimer amyloid protein precursor (APP) is subject to proteolysis by ADAM10 and ADAM17, precluding the formation of Aβ. Recently, coding variations in ADAM10 resulting in altered function have been reported in familial Alzheimer disease (AD). We carried out a large-scale (n=576: Controls, 271; AD, 305) resequencing study of ADAM10 in sporadic AD. Our results do not support a significant role for ADAM10 mutations in AD. Our results also make it clear that the careful examination of ancestry required in any case-control comparison is especially true with rare variations, where even a very small number of variations might form the basis of scientific conclusions.
Mutation; rare variation; genetics; association
Variations in a locus at chromosome 10q26 are strongly associated with the risk of age-related macular degeneration (AMD). The most significantly associated haplotype includes a nonsynonymous SNP rs10490924 in exon 1 of ARMS2 and rs11200638 in the promoter region of HTRA1. It is under debate which gene(s), ARMS2, HTRA1 or some other genes, are functionally responsible for the genetic association. To verify whether the associated variants correlate with a higher HTRA1 expression level as previously reported, HTRA1 mRNA and protein were measured in a larger set of human retina-RPE-choroid samples (n = 82). Results show there is no significant change of HTRA1 mRNA level among genotypes at rs11200638, rs10490924 or an indel variant of ARMS2. Furthermore, two AMD-associated synonymous SNPs rs1049331 and rs2293870 in HTRA1 exon 1 do not change its protein level either. These results suggest that the AMD-associated variants in the chromosome 10q26 locus do not significantly affect the expression of HTRA1.
MAPT encodes tau, the predominant component of neurofibrillary tangles that are neuropathological hallmarks of Alzheimer’s disease (AD). Genetic association of MAPT variants with late-onset AD (LOAD) risk has been inconsistent, although insufficient power and incomplete assessment of MAPT haplotypes may account for this.
We examined the association of MAPT haplotypes with LOAD risk in more than 20,000 subjects (n-cases = 9,814, n-controls = 11,550) from Mayo Clinic (n-cases = 2,052, n-controls = 3,406) and the Alzheimer’s Disease Genetics Consortium (ADGC, n-cases = 7,762, n-controls = 8,144). We also assessed associations with brain MAPT gene expression levels measured in the cerebellum (n = 197) and temporal cortex (n = 202) of LOAD subjects. Six single nucleotide polymorphisms (SNPs) which tag MAPT haplotypes with frequencies greater than 1% were evaluated.
The H2 haplotype-tagging rs8070723-G allele was associated with reduced risk of LOAD (odds ratio, OR = 0.90, 95% confidence interval, CI = 0.85-0.95, p = 5.2E-05) with consistent results in the Mayo (OR = 0.81, p = 7.0E-04) and ADGC (OR = 0.89, p = 1.26E-04) cohorts. The rs3785883-A allele was also nominally significantly associated with LOAD risk (OR = 1.06, 95% CI = 1.01-1.13, p = 0.034). Haplotype analysis revealed significant global association with LOAD risk in the combined cohort (p = 0.033), with significant association of the H2 haplotype with reduced risk of LOAD as expected (p = 1.53E-04) and suggestive association with additional haplotypes. MAPT SNPs and haplotypes were also associated with brain MAPT levels in the cerebellum and temporal cortex of AD subjects, with the strongest associations observed for the H2 haplotype and reduced brain MAPT levels (β = -0.16 to -0.20, p = 1.0E-03 to 3.0E-03).
These results confirm the previously reported MAPT H2 associations with LOAD risk in two large series, show that this haplotype has the strongest effect on brain MAPT expression among those tested, and identify additional haplotypes with suggestive associations, which require replication in independent series. These biologically congruent results provide compelling evidence to screen the MAPT region for regulatory variants which confer LOAD risk by influencing its brain gene expression.
The electronic MEdical Records & GEnomics (eMERGE) network was established in 2007 by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) in part to explore the utility of electronic medical records (EMRs) in genome science. The initial focus was on discovery primarily using the genome-wide association paradigm, but more recently, the network has begun evaluating mechanisms to implement new genomic information coupled to clinical decision support into EMRs. Herein, we describe this evolution including the development of the individual and merged eMERGE genomic datasets, the contribution the network has made toward genomic discovery and human health, and the steps taken toward the next generation genotype-phenotype association studies and clinical implementation.
biobanks; genome-wide association studies; pharmacogenomics; electronic medical records
Alzheimer's disease is a common debilitating dementia with known heritability, for which 20 late onset susceptibility loci have been identified, but more remain to be discovered. This study sought to identify new susceptibility genes, using an alternative gene-wide analytical approach which tests for patterns of association within genes, in the powerful genome-wide association dataset of the International Genomics of Alzheimer's Project Consortium, comprising over 7 m genotypes from 25,580 Alzheimer's cases and 48,466 controls.
In addition to earlier reported genes, we detected genome-wide significant loci on chromosomes 8 (TP53INP1, p = 1.4 × 10^-6) and 14 (IGHV1-67, p = 7.9 × 10^-8) which indexed novel susceptibility loci.
The additional genes identified in this study have an array of functions previously implicated in Alzheimer's disease, including aspects of energy metabolism, protein degradation and the immune system, and add further weight to these pathways as potential therapeutic targets in Alzheimer's disease.
Translation of results from genetic findings to inform medical practice is a highly anticipated goal of human genetics. The aim of this paper is to review and discuss the role of genetics in medically-relevant prediction. Germline genetics presages disease onset and therefore can contribute prognostic signals that augment laboratory tests and clinical features. As such, the impact of genetic-based predictive models on clinical decisions and therapy choice could be profound. However, given that (i) medical traits result from a complex interplay between genetic and environmental factors, (ii) the underlying genetic architectures for susceptibility to common diseases are not well-understood, and (iii) replicable susceptibility alleles, in combination, account for only a moderate amount of disease heritability, there are substantial challenges to constructing and implementing genetic risk prediction models with high utility. In spite of these challenges, concerted progress has continued in this area with an ongoing accumulation of studies that identify disease predisposing genotypes. Several statistical approaches with the aim of predicting disease have been published. Here we summarize the current state of disease susceptibility mapping and pharmacogenetics efforts for risk prediction, describe methods used to construct and evaluate genetic-based predictive models, and discuss applications.
predictive model; genetic risk; human genetics; prognosis; clinical utility
Natural selection shapes many human genes, including some related to complex diseases. Understanding how selection affects genes, especially pleiotropic ones, may be important in evaluating disease associations and the role played by environmental variation. This may be of particular interest for genes with antagonistic roles that cause divergent patterns of selection. The lectin-like low-density lipoprotein 1 receptor (LOX-1), encoded by OLR1, is exemplary. It has antagonistic functions in the cardiovascular and immune systems, as the same protein domain binds oxidized LDL and bacterial cell wall proteins: the former contributes to atherosclerosis, while the latter presumably protects from infection. We studied patterns of selection in this gene, in humans and non-human primates, to determine whether variable selection can lead to conflicting results in CVD association studies.
Methods and Results
We analyzed sequences from 11 non-human primate species as well as SNP and sequence data from multiple human populations. Results indicate that the derived allele is favored across primate lineages (probably due to recent positive selection). However, both the derived and ancestral alleles were maintained in human populations, especially European ones (possibly due to balancing selection derived from LOX-1's dual roles). Balancing selection likely reflects response to diverse environmental pressures among humans.
These data indicate that differential selection patterns, within and between species, in OLR1 render association studies difficult to replicate even if the gene is etiologically connected to CVD. Selection analyses can identify genes exhibiting gene-environment interactions critical for unraveling disease association.
lipoproteins; immune system; genetics; LOX-1 receptor; evolution
Eleven susceptibility loci for late-onset Alzheimer’s disease (LOAD) were identified by previous studies; however, a large portion of the genetic risk for this disease remains unexplained. We conducted a large, two-stage meta-analysis of genome-wide association studies (GWAS) in individuals of European ancestry. In stage 1, we used genotyped and imputed data (7,055,881 SNPs) to perform meta-analysis on 4 previously published GWAS data sets consisting of 17,008 Alzheimer’s disease cases and 37,154 controls. In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer’s disease cases and 11,312 controls. In addition to the APOE locus (encoding apolipoprotein E), 19 loci reached genome-wide significance (P < 5 × 10^-8) in the combined stage 1 and stage 2 analysis, of which 11 are newly associated with Alzheimer’s disease.
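The way per-stage results for one SNP can be pooled and compared against the genome-wide threshold can be sketched as follows. This is a minimal illustration that assumes a fixed-effect, inverse-variance-weighted combination of log odds ratios; the abstract does not say which meta-analysis model was used. The function name `inverse_variance_meta` and the effect sizes in the usage lines are hypothetical and are not taken from the study.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # significance threshold quoted in the text


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study estimates.

    betas: per-study log odds ratios for one SNP
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal length")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled_beta, pooled_se, z, p


# Hypothetical stage 1 and stage 2 estimates for a single SNP.
beta, se, z, p = inverse_variance_meta([0.10, 0.12], [0.02, 0.03])
print(f"pooled OR = {math.exp(beta):.3f}, z = {z:.2f}, "
      f"genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

Weighting by inverse variance gives the larger, more precise stage more influence on the pooled estimate, and the pooled standard error is always smaller than that of any single stage.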