The advent of high-content genomic mapping technologies has provided numerous clues about the genetic architecture of complex disease and the tools with which to understand the biological framework resulting from this architecture. We believe that understanding and mapping epigenetic marks, in particular DNA methylation, which is suited to such assays, offers a timely opportunity in this context. Here, we make an argument for this work, describing this opportunity, the likely path ahead, and the problems and pitfalls associated with such work.
Methylation; Genetic variability; Quantitative trait loci; Genome-wide association
Genetic heterogeneity is common in many neurologic disorders. This is particularly true for the hereditary ataxias where at least 36 disease genes or loci have been described for spinocerebellar ataxia and over 100 genes for neurologic disorders that present primarily with ataxia. Traditional genetic testing of a large number of candidate genes delays diagnosis and is expensive. In contrast, recently developed genomic techniques, such as exome sequencing that targets only the coding portion of the genome, offer an alternative strategy to rapidly sequence all genes in a comprehensive manner. Here we describe the use of exome sequencing to investigate a large, 5-generational British kindred with an autosomal dominant, progressive cerebellar ataxia in which conventional genetic testing had not revealed a causal etiology.
Twenty family members were seen and examined; 2 affected individuals were clinically investigated in detail without a genetic or acquired cause being identified. Exome sequencing was performed in one patient where coverage was comprehensive across the known ataxia genes, excluding the known repeat loci which should be examined using conventional analysis.
A novel p.Arg26Gly change in the PRKCG gene, mutated in SCA14, was identified. This variant was confirmed using Sanger sequencing and showed segregation with disease in the entire family.
This work demonstrates the utility of exome sequencing to rapidly screen heterogeneous genetic disorders such as the ataxias. Exome sequencing is more comprehensive, faster, and significantly cheaper than conventional Sanger sequencing, and thus represents a superior diagnostic screening tool in clinical practice.
Mutations in the valosin-containing protein gene (VCP) have been identified in neurological disorders (IBMPFD and ALS) and are thought to play a role in the clearance of abnormally folded proteins. Parkinsonism has been noted in kindreds with VCP mutations. Based on this, we hypothesized that mutations in VCP may also contribute to idiopathic PD. We screened the coding region of the VCP gene in a large cohort of 768 late onset PD cases (average age at onset = 70 years), both sporadic and with positive family history. We identified a number of rare single nucleotide changes, including a variant previously described to be pathogenic, but no clear disease-causing variants. We conclude that mutations in VCP are not a common cause for idiopathic PD.
To test whether the synucleinopathies Parkinson’s disease and multiple system atrophy (MSA) share a common genetic etiology, we performed a candidate single nucleotide polymorphism (SNP) association study of the 384 most associated SNPs in a genome-wide association study of Parkinson’s disease in 413 MSA cases and 3,974 control subjects. The 10 most significant SNPs were then replicated in an additional 108 MSA cases and 537 controls. SNPs at the SNCA locus were significantly associated with increased risk for the development of MSA (combined p = 5.5 × 10−12; odds ratio 6.2).
Cortical and cerebrovascular amyloid-beta (A-beta) deposition is a hallmark of Alzheimer’s disease (AD), but also occurs in elderly people not affected by dementia. The apolipoprotein E (APOE) epsilon4 allele is a major genetic modulator of A-beta deposition and AD risk. Variants of the amyloid-beta protein precursor (A-betaPP) gene have been reported to contribute to AD and cerebral amyloid angiopathy (CAA). We analyzed the role of APOE and A-betaPP variants in cortical and cerebrovascular A-beta deposition, and neuropathologically verified AD (based on modified NIA-RI criteria) in a population-based autopsy sample of Finns aged ≥85 years (Vantaa 85+ Study; n = 282). Our updated analysis of APOE showed strong associations of the epsilon4 allele with cortical (p = 4.91×10−17) and cerebrovascular (p = 9.87×10−11) A-beta deposition as well as with NIA-RI AD (p = 1.62×10−8). We also analyzed 60 single nucleotide polymorphisms (SNPs) at the A-betaPP locus. In single SNP or haplotype analyses there were no statistically significant A-betaPP locus associations with cortical or cerebrovascular A-beta deposition or with NIA-RI AD. We sequenced the promoter of the A-betaPP gene in 40 subjects with very high A-beta deposition, but none of these subjects had any of the previously reported or novel AD-associated mutations. These results suggest that cortical and cerebrovascular A-beta depositions are useful quantitative traits for genetic studies, as highlighted by the strong associations with the APOE epsilon4 variant. Promoter mutations or common allelic variation in the A-betaPP gene do not make a major contribution to cortical or cerebrovascular A-beta deposition, or to very late-onset AD, in this Finnish population-based study.
Interpreting gene expression profiles obtained from heterogeneous samples can be difficult because bulk gene expression measures are not resolved to individual cell populations. We have recently devised Population-Specific Expression Analysis (PSEA), a statistical method that identifies individual cell types expressing genes of interest and achieves quantitative estimates of cell type-specific expression levels. This procedure makes use of marker gene expression and circumvents the need for additional experimental information like tissue composition.
To systematically assess the performance of statistical deconvolution, we applied PSEA to gene expression profiles from cerebellum tissue samples and compared with parallel, experimental separation methods. Owing to the particular histological organization of the cerebellum, we could obtain cellular expression data from in situ hybridization and laser-capture microdissection experiments and successfully validated computational predictions made with PSEA. Upon statistical deconvolution of whole tissue samples, we identified a set of transcripts showing age-related expression changes in the astrocyte population.
PSEA can predict cell type-specific expression levels from tissue homogenates on a genome-wide scale. It thus represents a computational alternative to experimental separation methods and allowed us to identify age-related expression changes in the astrocytes of the cerebellum. These molecular changes might underlie important physiological modifications previously observed in the aging brain.
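The regression idea behind marker-based deconvolution can be sketched as follows. This is a minimal illustration and not the published PSEA implementation: it assumes each cell population has a per-sample reference signal (for example, averaged marker gene expression) and fits a gene's bulk expression as a linear combination of those signals by ordinary least squares. The function names and the toy data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_population_expression(bulk, reference_signals):
    """Least-squares fit of bulk[s] ~ intercept + sum_k coef[k] * reference_signals[k][s].

    Returns [intercept, coef_1, ..., coef_K]; each coefficient is read as the
    expression attributable to that cell population (an illustrative assumption).
    """
    n_samples = len(bulk)
    columns = [[1.0] * n_samples] + [list(sig) for sig in reference_signals]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(ci[s] * cj[s] for s in range(n_samples)) for cj in columns]
           for ci in columns]
    xty = [sum(ci[s] * bulk[s] for s in range(n_samples)) for ci in columns]
    return solve_linear(xtx, xty)


# Hypothetical toy data: a transcript expressed only in astrocytes.
neuron_ref = [1.0, 0.8, 0.6, 0.9, 0.5, 0.7]
astro_ref = [0.2, 0.5, 0.7, 0.3, 0.9, 0.4]
bulk = [0.1 + 3.0 * a for a in astro_ref]

intercept, neuron_coef, astro_coef = fit_population_expression(
    bulk, [neuron_ref, astro_ref]
)
```

In this toy case the fit assigns the signal to the astrocyte term and leaves the neuronal coefficient near zero, which is the kind of attribution the abstract describes. Real data would also need noise handling and model selection, which are omitted here.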
Genomics; Computational biology; Cerebellum; Gene expression; Aging; Astrocyte
Declining muscle strength is a core feature of aging. Several mechanisms have been postulated, including CCAAT/enhancer-binding protein-beta (C/EBP-β) triggered macrophage-mediated muscle fibre regeneration after micro-injury, evidenced in a mouse model. We aimed to identify in-vivo circulating leukocyte gene expression changes associated with muscle strength in the human adult population.
We undertook a genome wide expression microarray screen, using peripheral blood RNA samples from InCHIANTI study participants (ages 30–104 yrs). Logged expression intensities were regressed with muscle strength using models adjusted for multiple confounders. Key results were validated by real-time PCR. The Short Physical Performance Battery score (SPPB) tested walk speed, chair stand and balance.
CEBPB expression levels were associated with muscle strength (beta coefficient = 0.20560, p = 1.03 × 10−6, false discovery rate q = 0.014). The estimated handgrip strength in 70 year old men in the lowest CEBPB expression tertile was 35.2 kg compared to 41.2 kg in the top tertile. CEBPB expression was also associated with hip, knee, ankle and shoulder strength and the SPPB performance score (p = 0.018). Near study-wide associations were also noted for TGFB3 (p = 3.4 × 10−5, q = 0.12) and CEBPD expression (p = 9.67 × 10−5, q = 0.18) but not for CEBPA expression.
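The false discovery rate q-values reported above adjust each probe's p-value for the number of probes tested. The abstract does not say which procedure was used; the sketch below assumes the common Benjamini-Hochberg step-up adjustment, and the function name and toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values for five probes (not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_q = benjamini_hochberg(toy_p)
```

A probe is then called significant when its q-value falls below the chosen threshold, for example q < 0.05.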
We report here a novel finding that raised CEBPB expression in circulating leukocyte derived RNA samples in-vivo is associated with greater muscle strength and better physical performance in humans. This association may be consistent with mouse model evidence of CEBPB triggered muscle repair: if this mechanism is confirmed it may provide a target for intervention to protect and enhance aging muscle.
macrophage; inflammation; transcription; regeneration; population; mechanism
The dominant and sometimes competing theories for the aetiology of complex human disease have been the common disease, common variant (CDCV) hypothesis, and the multiple rare variant (MRV) hypothesis. With the advent of genome wide association studies and of second-generation sequencing, we are fortunate in being able to test these ideas. The results to date suggest that these hypotheses are not mutually exclusive. Further, initial evidence suggests that both MRV and CDCV can be true at the same loci, and that other disease-related genetic mechanisms also exist at some of these loci. We propose calling these pleomorphic risk loci, and discuss here how such loci not only offer understanding of the genetic basis of disease, but also provide mechanistic biological insight into disease processes.
Aging is a major risk factor for chronic disease in the human population, but there is little human data on gene expression alterations that accompany the process. We examined human peripheral blood leucocyte in-vivo RNA in a large-scale transcriptomic microarray study (subjects aged 30 to 104 years). We tested associations between probe expression intensity and advancing age (adjusting for confounding factors), initially in a discovery set (n = 458), following-up findings in a replication set (n=240). We confirmed expression of key results by real-time PCR. Of 16,571 expressed probes, only 295 (2%) were robustly associated with age. Just six probes were required for a highly efficient model for distinguishing between young and old (Area Under the Curve in replication set; 95%). The focussed nature of age-related gene expression may therefore provide potential biomarkers of aging. Similarly, only 7 of 1065 biological or metabolic pathways were age-associated, in Gene Set Enrichment Analysis (GSEA), notably including the processing of messenger RNAs (mRNAs); (p<0.002, FDR q<0.05). This is supported by our observation of age-associated disruption to the balance of alternatively-expressed isoforms for selected genes, suggesting that modification of mRNA processing may be a feature of human aging.
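The area under the curve quoted above can be read as the probability that a randomly chosen older subject receives a higher model score than a randomly chosen younger subject. A minimal sketch of that calculation follows; the function name and the scores are hypothetical and do not reproduce the six-probe model.

```python
def auc(scores_old, scores_young):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (old, young) pairs in which the old subject has
    the higher score; ties contribute one half.
    """
    wins = 0.0
    for old_score in scores_old:
        for young_score in scores_young:
            if old_score > young_score:
                wins += 1.0
            elif old_score == young_score:
                wins += 0.5
    return wins / (len(scores_old) * len(scores_young))


# Hypothetical classifier scores (higher means "predicted old").
old = [0.9, 0.8, 0.7, 0.4]
young = [0.5, 0.3, 0.2, 0.1]
toy_auc = auc(old, young)  # 15 of 16 pairs are ordered correctly
```

The pairwise form is quadratic in sample size, which is fine for an illustration; rank-based formulas give the same value more efficiently.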
Aging; Gene expression; mRNA processing; Cell senescence; predictive model
Background and Purpose
Ischemic stroke has a strong familial component to risk. The Siblings with Ischemic Stroke Study (SWISS) is a genome-wide family-based analysis that included use of imputed genotypes. SWISS was conducted to examine associations between SNPs and risk of stroke and stroke subtypes within pairs.
SWISS enrolled 312 probands with ischemic stroke across 70 US and Canadian centers. Affected siblings were ascertained by centers and confirmed by central record review; unaffected siblings were ascertained by telephone contact. Ischemic stroke was subtyped using TOAST criteria. Genotyping was performed using an Illumina 610 quad array (probands) and an Illumina linkage V array (affected siblings). SNPs were imputed using 1000 Genomes Project data and MACH software. Family-based association analyses were conducted using the sibling-transmission disequilibrium test.
For all pairs, the correlation of age at stroke within pairs of affected siblings was r = 0.83 (95%CI, 0.78 to 0.86; P < 2.2×10−16). The correlation did not differ substantially by subtype. The concordance of stroke subtypes among affected pairs was 33.8% (kappa = 0.13; P = 5.06×10−4) and did not differ by age at stroke in the proband. Although no SNP achieved genome-wide significance for risk of ischemic stroke, there was clustering of the most associated SNPs on chromosomes 3p (NOS1) and 6p.
Stroke subtype and age at stroke in affected sibling pairs exhibit significant clustering. No individual SNP reached genome-wide significance. However, two promising candidate loci were identified, including one that contains NOS1, though these risk loci warrant further examination in larger sample collections.
A large proportion of basic research into disease mechanisms has leveraged genetic findings to model and understand etiology. There has been broad success in finding disease-linked mutations using traditional positional cloning approaches; however, because of the requirements of the method, these successes have been limited by the availability of large, well-characterized families. Because of these and other limitations, the genetic basis of many diseases, and of disease in many families, remains unknown.
Exome sequencing uses DNA enrichment methods and massively parallel nucleotide sequencing to comprehensively discover and type protein-coding variants throughout the genome. Coupled with growing databases that contain known variants, exome sequencing affords the ability to find genetic mutations and risk factors in families and samples that were deemed insufficiently informative for previous genetic studies. Not only does this method afford discovery in families that linkage and positional cloning methods were unable to use, but it is also much quicker and cheaper than those methods. Exome sequencing has had initial success in many rare diseases.
Exome sequencing is being adopted widely and we can expect a landslide of mutation discovery, similar to the deluge of genome wide association findings reported over the past 5 years. It is to be expected that exome sequencing will enable not only the discovery of rare causal variants, but also protein coding risk variants. This method will have application in both the research and clinical arena and sets the scene for the use of whole genome sequencing.
Exome sequencing is rapidly becoming a fundamental tool for genetics and functional genomics laboratories. This methodology has enabled the discovery of novel pathogenic mutations causing mendelian diseases that had, until now, remained elusive. In this review we discuss not only how we envisage exome sequencing being applied to a complex disease, such as Parkinson’s disease, but also the known caveats of this approach.
Exome sequencing; Genetics; Parkinson’s disease; Genomics
In the current study we undertook a series of experiments to test the hypothesis that a monogenic cause of disease may be detectable within a cohort of Finnish young onset Parkinson’s disease patients. In the first instance we performed standard genome wide association analyses, and subsequent risk profile analysis. In addition we performed a series of analyses that involved testing measures of global relatedness within the cases compared to controls, searching for excess homozygosity in the cases, and examining the cases for signs of excess local genomic relatedness using a sliding window approach. This work suggested that the previously identified common, low risk alleles, and the risk models associated with these alleles, were generalizable to the Finnish Parkinson’s disease population. However, we found no evidence that would suggest a single common high penetrance mutation exists in this cohort of young onset patients.
The MAPT (microtubule-associated protein tau) locus is one of the most remarkable in neurogenetics due not only to its involvement in multiple neurodegenerative disorders, including progressive supranuclear palsy, corticobasal degeneration, Parkinson's disease and possibly Alzheimer's disease, but also due to its genetic evolution and complex alternative splicing features which are, to some extent, linked and so all the more intriguing. Therefore, obtaining robust information regarding the expression, splicing and genetic regulation of this gene within the human brain is of immense importance. In this study, we used 2011 brain samples originating from 439 individuals to provide the most reliable and coherent information on the regional expression, splicing and regulation of MAPT available to date. We found significant regional variation in mRNA expression and splicing of MAPT within the human brain. Furthermore, at the gene level, the regional distribution of mRNA expression and total tau protein expression levels were largely in agreement, appearing to be highly correlated. Finally and most importantly, we show that while the reported H1/H2 association with gene level expression is likely to be due to a technical artefact, this polymorphism is associated with the expression of exon 3-containing isoforms in human brain. These findings would suggest that contrary to the prevailing view, genetic risk factors for neurodegenerative diseases at the MAPT locus are likely to operate by changing mRNA splicing in different brain regions, as opposed to the overall expression of the MAPT gene.
In view of the population-specific heterogeneity in reported genetic risk factors for Parkinson's disease (PD), we conducted a genome-wide association study (GWAS) in a large sample of PD cases and controls from the Netherlands. After quality control (QC), a total of 514 799 SNPs genotyped in 772 PD cases and 2024 controls were included in our analyses. Direct replication of SNPs within SNCA and BST1 confirmed these two genes to be associated with PD in the Netherlands (SNCA, rs2736990: P=1.63 × 10−5, OR=1.325 and BST1, rs12502586: P=1.63 × 10−3, OR=1.337). Within SNCA, two independent signals in two different linkage disequilibrium (LD) blocks in the 3′ and 5′ ends of the gene were detected. Besides, post-hoc analysis confirmed GAK/DGKQ, HLA and MAPT as PD risk loci among the Dutch (GAK/DGKQ, rs2242235: P=1.22 × 10−4, OR=1.51; HLA, rs4248166: P=4.39 × 10−5, OR=1.36; and MAPT, rs3785880: P=1.9 × 10−3, OR=1.19).
SNCA; BST1; GAK/DGKQ; HLA; MAPT; PD
Survival bias is the phenomenon by which individuals are excluded from analysis of a trait because of mortality related to the expression of that trait. In genetic association studies, variants increasing risk for disease onset as well as risk of disease-related mortality (lethality) could be difficult to detect in genetic association case-control designs, possibly leading to underestimation of a variant's effect on disease risk.
Methods and Results
We modeled cohorts for three diseases of high lethality (intracerebral hemorrhage, ischemic stroke, and myocardial infarction) using existing longitudinal data. Based on these models, we simulated case-control genetic association studies for genetic risk factors of varying effect sizes, lethality, and minor allele frequencies (MAF). For each disease, erosion of detected effect size was larger for case-control studies of individuals of advanced age (age > 75 years) and/or variants with very high event-associated lethality (Genotype Relative Risk for event-related death > 2.0). We found that survival bias results in no more than 20% effect size erosion for cohorts with mean age < 75 years, even for variants that double lethality risk. Furthermore, we found that increasing effect size erosion was accompanied by depletion of MAF in the case population, yielding a “signature” of the presence of survival bias.
Our simulation provides formulas to allow estimation of effect size erosion given a variant's odds-ratio (OR) of disease, OR of lethality, and MAF. These formulas will add precision to power calculation and replication efforts for case-control genetic studies. Our approach requires validation using prospective data.
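The mechanism of effect size erosion can be illustrated with a small deterministic calculation. This is a toy sketch, not the authors' simulation or their formulas: it assumes Hardy-Weinberg genotype frequencies, a multiplicative per-allele relative risk for disease onset and for event-related death, and controls whose allele frequency equals the population MAF. All names and parameter values are hypothetical.

```python
def allelic_or_in_survivors(maf, grr_disease, base_lethality, grr_death):
    """Allelic odds ratio observed when only surviving cases are sampled.

    maf            -- population minor allele frequency
    grr_disease    -- per-allele relative risk of disease onset
    base_lethality -- probability of event-related death for non-carriers
    grr_death      -- per-allele relative risk of event-related death
    Returns (observed_odds_ratio, maf_among_surviving_cases).
    """
    population = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]
    weights = []
    for copies, freq in enumerate(population):
        lethality = min(1.0, base_lethality * grr_death ** copies)
        # Relative share of surviving cases carrying this genotype.
        weights.append(freq * grr_disease ** copies * (1.0 - lethality))
    case_maf = (weights[1] + 2 * weights[2]) / (2 * sum(weights))
    odds_ratio = (case_maf / (1 - case_maf)) / (maf / (1 - maf))
    return odds_ratio, case_maf


# Same disease effect, with and without genotype-dependent lethality.
unbiased_or, unbiased_maf = allelic_or_in_survivors(0.2, 1.5, 0.2, 1.0)
biased_or, biased_maf = allelic_or_in_survivors(0.2, 1.5, 0.2, 2.0)
```

When lethality does not depend on genotype, the observed odds ratio equals the disease relative risk. When the variant also raises lethality, both the observed odds ratio and the MAF among surviving cases fall, which mirrors the "signature" described in the results. The size of the drop in this toy depends entirely on the assumed parameters and should not be compared with the reported 20% figure.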
Stroke; hemorrhage; myocardial infarction; genetics; epidemiology
Methylation at CpG sites is a critical epigenetic modification in mammals. Altered DNA methylation has been suggested to be a central mechanism in development, some disease processes and cellular senescence. Quantifying the extent and identity of epigenetic changes in the aging process is therefore potentially important for understanding longevity and age-related diseases. In the current study, we have examined DNA methylation at >27 000 CpG sites throughout the human genome, in frontal cortex, temporal cortex, pons and cerebellum from 387 human donors between the ages of 1 and 102 years. We identify CpG loci that show a highly significant, consistent correlation between DNA methylation and chronological age. The majority of these loci are within CpG islands and there is a positive correlation between age and DNA methylation level. Lastly, we show that the CpG sites where the DNA methylation level is significantly associated with age are physically close to genes involved in DNA binding and regulation of transcription. This suggests that specific age-related DNA methylation changes may have quite a broad impact on gene expression in the human brain.
Parkinson's disease (PD) occurs in both familial and sporadic forms, and both monogenic and complex genetic factors have been identified. Early onset PD (EOPD) is particularly associated with autosomal recessive (AR) mutations, and three genes, PARK2, PARK7 and PINK1, have been found to carry mutations leading to AR disease. Since mutations in these genes account for less than 10% of EOPD patients, we hypothesized that further recessive genetic factors are involved in this disorder, which may appear in extended runs of homozygosity.
We carried out genome wide SNP genotyping to look for extended runs of homozygosity (ROHs) in 1,445 EOPD cases and 6,987 controls. Logistic regression analyses showed an increased level of genomic homozygosity in EOPD cases compared to controls. These differences are larger for ROH of 9 Mb and above, where there is a more than three-fold increase in the proportion of cases carrying a ROH. These differences are not explained by occult recessive mutations at existing loci. Controlling for genome wide homozygosity in logistic regression analyses increased the differences between cases and controls, indicating that in EOPD cases ROHs do not simply relate to genome wide measures of inbreeding. Homozygosity at a locus on chromosome 19p13.3 was identified as being more common in EOPD cases as compared to controls. Sequencing analysis of genes and predicted transcripts within this locus failed to identify a novel mutation causing EOPD in our cohort.
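A run of homozygosity can be found by scanning ordered SNP genotypes on one chromosome and keeping stretches of consecutive homozygous calls that span a minimum physical length. The sketch below is a simplified illustration under that definition. The abstract does not describe the calling software or its settings, and dedicated tools typically add tolerances for occasional heterozygous or missing calls and for SNP density, which are omitted here. The function name and toy data are hypothetical.

```python
def find_roh(positions, genotypes, min_length_bp):
    """Return (start_bp, end_bp) for each run of consecutive homozygous SNPs
    whose span is at least min_length_bp.

    positions -- SNP base-pair positions on one chromosome, in ascending order
    genotypes -- minor allele counts per SNP: 0 or 2 homozygous, 1 heterozygous
    """
    runs = []
    run_start = None
    run_end = None
    for pos, genotype in zip(positions, genotypes):
        if genotype != 1:
            if run_start is None:
                run_start = pos
            run_end = pos
        else:
            # A heterozygous call closes the current run, if any.
            if run_start is not None and run_end - run_start >= min_length_bp:
                runs.append((run_start, run_end))
            run_start = None
    if run_start is not None and run_end - run_start >= min_length_bp:
        runs.append((run_start, run_end))
    return runs


# Hypothetical genotypes for one individual; 9 Mb threshold as in the text.
positions = [1_000_000, 4_000_000, 8_000_000, 11_000_000,
             12_000_000, 13_000_000, 14_000_000]
genotypes = [0, 2, 2, 0, 1, 2, 0]
toy_runs = find_roh(positions, genotypes, min_length_bp=9_000_000)
```

Here the first four SNPs form a 10 Mb homozygous stretch that passes the threshold, while the 1 Mb stretch after the heterozygous call does not.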
There is an increased rate of genome wide homozygosity in EOPD, as measured by an increase in ROHs. These ROHs are a signature of inbreeding and do not necessarily harbour disease-causing genetic variants. Although there might be other regions of interest apart from chromosome 19p13.3, we lack the power to detect them with this analysis.
Several genetic variants associated with platelet count and mean platelet volume (MPV) were recently reported in people of European ancestry. In this meta-analysis of 7 genome-wide association studies (GWAS) enrolling African Americans, our aim was to identify novel genetic variants associated with platelet count and MPV. For all cohorts, GWAS analysis was performed using additive models after adjusting for age, sex, and population stratification. For both platelet phenotypes, meta-analyses were conducted using inverse-variance weighted fixed-effect models. Platelet aggregation assays in whole blood were performed in the participants of the GeneSTAR cohort. Genetic variants in ten independent regions were associated with platelet count (N = 16,388) with p<5×10−8 of which 5 have not been associated with platelet count in previous GWAS. The novel genetic variants associated with platelet count were in the following regions (the most significant SNP, closest gene, and p-value): 6p22 (rs12526480, LRRC16A, p = 9.1×10−9), 7q11 (rs13236689, CD36, p = 2.8×10−9), 10q21 (rs7896518, JMJD1C, p = 2.3×10−12), 11q13 (rs477895, BAD, p = 4.9×10−8), and 20q13 (rs151361, SLMO2, p = 9.4×10−9). Three of these loci (10q21, 11q13, and 20q13) were replicated in European Americans (N = 14,909) and one (11q13) in Hispanic Americans (N = 3,462). For MPV (N = 4,531), genetic variants in 3 regions were significant at p<5×10−8, two of which were also associated with platelet count. Previously reported regions that were also significant in this study were 6p21, 6q23, 7q22, 12q24, and 19p13 for platelet count and 7q22, 17q11, and 19p13 for MPV. The most significant SNP in 1 region was also associated with ADP-induced maximal platelet aggregation in whole blood (12q24). Thus through a meta-analysis of GWAS enrolling African Americans, we have identified 5 novel regions associated with platelet count of which 3 were replicated in other ethnic groups. In addition, we also found one region associated with platelet aggregation that may play a potential role in
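The inverse-variance weighted fixed-effect pooling named in the methods above combines per-cohort effect estimates by weighting each with the reciprocal of its squared standard error. A minimal sketch follows; the function name and the per-cohort numbers are hypothetical, and the two-sided p-value assumes the pooled estimate is approximately normally distributed.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    two_sided_p = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, two_sided_p


# Hypothetical per-cohort estimates for one SNP (not taken from the study).
pooled_beta, pooled_se, z_score, p_value = fixed_effect_meta(
    betas=[0.10, 0.20], standard_errors=[0.1, 0.1]
)
```

With equal standard errors the pooled effect is the simple mean of the cohort effects, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.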
The majority of the variation in platelet count and mean platelet volume between individuals is heritable. We performed genome-wide association studies in more than 16,000 African American participants from seven population-based cohorts to identify genetic variants that correlate with variation in platelet count and mean platelet volume. We observed statistically significant evidence (p-value<5×10−8) that 10 genomic regions were associated with platelet count and 3 were associated with mean platelet volume. Of the regions that were significantly associated, we found 5 novel regions that were not reported previously in other populations. Three of these 5 regions were also associated with platelet count in European Americans and Hispanic Americans. All these regions contain genes that are either known to have or potentially may have a role in determining platelet count and/or mean platelet volume. We further found that one of these regions was also associated with agonist-induced platelet aggregation. Further studies will determine the exact role played by these genomic regions in platelet biology. The knowledge generated by this and other studies will not only help us better understand platelet biology but can also lead us to the discovery of new anti-platelet
Progressive supranuclear palsy (PSP) is a movement disorder with prominent tau neuropathology. Brain diseases with abnormal tau deposits are called tauopathies, the most common being Alzheimer’s disease. Environmental causes of tauopathies include repetitive head trauma associated with some sports. To identify common genetic variation contributing to risk for tauopathies, we carried out a genome-wide association study of 1,114 PSP cases and 3,247 controls (Stage 1) followed up by a second stage where 1,051 cases and 3,560 controls were genotyped for Stage 1 SNPs that yielded P ≤ 10−3. We found significant novel signals (P < 5 × 10−8) associated with PSP risk at STX6, EIF2AK3, and MOBP. We confirmed two independent variants in MAPT affecting risk for PSP, one of which influences MAPT brain expression. The genes implicated encode proteins for vesicle-membrane fusion at the Golgi-endosomal interface, for the endoplasmic reticulum unfolded protein response, and for a myelin structural component.
Parkinson's disease (PD) is a complex neurodegenerative disease which is clinically heterogeneous and pathologically consists of loss of dopaminergic neurons in the substantia nigra and intracytoplasmic neuronal inclusions containing alpha-synuclein aggregations known as Lewy bodies. Although the majority of PD is idiopathic, pathogenic mutations in several mendelian genes have been successfully identified through linkage analyses. To identify susceptibility loci for idiopathic PD, several genome-wide association studies (GWAS) within different populations have recently been conducted in both idiopathic and familial forms of PD. These analyses have confirmed SNCA and MAPT as loci harboring PD susceptibility. In addition, the GWAS identified several other genetic loci suggestively associated with the risk of PD; among these, only one was replicated by two different studies of European and Asian ancestries. Hence, we investigated this novel locus known as PARK16 for coding mutations in a large series of idiopathic pathologically proven PD cases, and also conducted an association study in a case–control cohort from the United Kingdom. An association between a novel RAB7L1 mutation, c.379-12insT, and disease (P-value=0.0325) was identified. Two novel coding variants present only in the PD cohort were also identified within the RAB7L1 (p.K157R) and SLC41A1 (p.A350V) genes. No copy number variation analyses have yet been performed within this recently identified locus. We concluded that, although both coding variants and risk alleles within the PARK16 locus seem to be rare, further molecular analyses within the PARK16 locus and within different populations are required in order to examine its biochemical role in the disease process.
PARK16 locus; genetics; association studies