1.  Systematic identification of genetic influences on methylation across the human life course 
Genome Biology  2016;17:61.
The influence of genetic variation on complex diseases is potentially mediated through a range of highly dynamic epigenetic processes exhibiting temporal variation during development and later life. Here we present a catalogue of the genetic influences on DNA methylation (methylation quantitative trait loci (mQTL)) at five different life stages in human blood: in children at birth, in childhood and in adolescence, and in their mothers during pregnancy and at middle age.
We show that genetic effects on methylation are highly stable across the life course and that developmental change in the genetic contribution to variation in methylation occurs primarily through increases in environmental or stochastic effects. Though we map a large proportion of the cis-acting genetic variation, a much larger component of the genetic effects influencing methylation acts in trans. However, only 7 % of discovered mQTL are trans-effects, suggesting that the trans component is highly polygenic. Finally, we estimate the contribution of mQTL to variation in complex traits and infer that methylation may have a causal role consistent with an infinitesimal model in which many methylation sites each have a small influence, amounting to a large overall contribution.
DNA methylation contains a significant heritable component that remains consistent across the lifespan. Our results suggest that the genetic component of methylation may have a causal role in complex traits. The database of mQTL presented here provides a rich resource for those interested in investigating the role of methylation in disease.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-0926-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4818469  PMID: 27036880
Methylation quantitative trait loci; mQTL; Cohort; Genetic association; DNA methylation
2.  Plasma urate concentration and risk of coronary heart disease: a Mendelian randomisation analysis 
Increased circulating plasma urate concentration is associated with an increased risk of coronary heart disease, but the extent of any causative effect of urate on risk of coronary heart disease is still unclear. In this study, we aimed to clarify any causal role of urate on coronary heart disease risk using Mendelian randomisation analysis.
We first did a fixed-effects meta-analysis of the observational association of plasma urate and risk of coronary heart disease. We then used a conventional Mendelian randomisation approach to investigate the causal relevance using a genetic instrument based on 31 urate-associated single nucleotide polymorphisms (SNPs). To account for potential pleiotropic associations of certain SNPs with risk factors other than urate, we additionally did both a multivariable Mendelian randomisation analysis, in which the genetic associations of SNPs with systolic and diastolic blood pressure, HDL cholesterol, and triglycerides were included as covariates, and an Egger Mendelian randomisation (MR-Egger) analysis to estimate a causal effect accounting for unmeasured pleiotropy.
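The two Mendelian randomisation estimators named above can be illustrated with a short sketch. Assuming per-SNP summary statistics are available (effect on urate, beta_x; effect on coronary heart disease, beta_y; and its standard error, se_y), the conventional inverse-variance weighted (IVW) estimate is a weighted regression of beta_y on beta_x through the origin, whereas MR-Egger adds an intercept that absorbs average directional pleiotropy. The function names and toy data are hypothetical and are not the study's code; the multivariable analysis is not shown.

```python
from typing import List, Tuple


def ivw_estimate(beta_x: List[float], beta_y: List[float], se_y: List[float]) -> float:
    """Inverse-variance weighted (IVW) causal estimate.

    Equivalent to a weighted regression of beta_y on beta_x through the
    origin, with weights 1 / se_y**2.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den


def mr_egger(beta_x: List[float], beta_y: List[float],
             se_y: List[float]) -> Tuple[float, float]:
    """MR-Egger: weighted regression of beta_y on beta_x with an intercept.

    Returns (slope, intercept): the slope is the causal estimate allowing for
    directional pleiotropy; the intercept estimates the average pleiotropic effect.
    """
    # Orient each SNP so that its effect on the exposure is positive.
    oriented = [(abs(bx), by if bx >= 0 else -by) for bx, by in zip(beta_x, beta_y)]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    mean_x = sum(wi * x for wi, (x, _) in zip(w, oriented)) / sw
    mean_y = sum(wi * y for wi, (_, y) in zip(w, oriented)) / sw
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, (x, y) in zip(w, oriented))
    sxx = sum(wi * (x - mean_x) ** 2 for wi, (x, _) in zip(w, oriented))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: beta_y = 0.1 + 0.5 * beta_x for every SNP.
bx = [0.1, 0.2, 0.3, 0.4]
by = [0.15, 0.20, 0.25, 0.30]
se = [0.05] * 4
print(round(ivw_estimate(bx, by, se), 3))           # 0.833
print([round(v, 3) for v in mr_egger(bx, by, se)])  # [0.5, 0.1]
```

In the toy data every SNP carries the same pleiotropic offset of 0.1, so the IVW estimate is pulled upwards while the Egger slope recovers 0.5; the price of the extra intercept is lower statistical power.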
In the meta-analysis of 17 prospective observational studies (166 486 individuals; 9784 coronary heart disease events) a 1 SD higher urate concentration was associated with an odds ratio (OR) for coronary heart disease of 1·07 (95% CI 1·04–1·10). The corresponding OR estimates from the conventional, multivariable adjusted, and Egger Mendelian randomisation analysis (58 studies; 198 598 individuals; 65 877 events) were 1·18 (95% CI 1·08–1·29), 1·10 (1·00–1·22), and 1·05 (0·92–1·20), respectively, per 1 SD increment in plasma urate.
Conventional and multivariable Mendelian randomisation analyses implicate a causal role for urate in the development of coronary heart disease, but these estimates might be inflated by hidden pleiotropy. Egger Mendelian randomisation analysis, which accounts for pleiotropy but has less statistical power, suggests there might be no causal effect. These results might help investigators to determine the priority of trials of urate lowering for the prevention of coronary heart disease compared with other potential interventions.
UK National Institute for Health Research, British Heart Foundation, and UK Medical Research Council.
PMCID: PMC4805857  PMID: 26781229
3.  Metabolite Profiling and Cardiovascular Event Risk: A Prospective Study of Three Population-Based Cohorts 
Circulation  2015;131(9):774-785.
High-throughput profiling of circulating metabolites may improve cardiovascular risk prediction over established risk factors.
Methods and Results
We applied quantitative NMR metabolomics to identify biomarkers for incident cardiovascular disease during long-term follow-up. Biomarker discovery was conducted in the FINRISK study (n=7256; 800 events). Replication and incremental risk prediction were assessed in the SABRE study (n=2622; 573 events) and the British Women’s Health and Heart Study (n=3563; 368 events). In targeted analyses of 68 lipids and metabolites, 33 measures were associated with incident cardiovascular events at P<0.0007 after adjusting for age, sex, blood pressure, smoking, diabetes and medication. When further adjusting for routine lipids, four metabolites were associated with future cardiovascular events in meta-analyses: higher serum phenylalanine (hazard ratio per standard deviation: 1.18 [95%CI 1.12–1.24]; P=4×10−10) and monounsaturated fatty acid levels (1.17 [1.11–1.24]; P=1×10−8) were associated with increased cardiovascular risk, while higher omega-6 fatty acids (0.89 [0.84–0.94]; P=6×10−5) and docosahexaenoic acid levels (0.90 [0.86–0.95]; P=5×10−5) were associated with lower risk. A risk score incorporating these four biomarkers was derived in FINRISK. Risk prediction estimates were more accurate in the two validation cohorts (relative integrated discrimination improvement 8.8% and 4.3%), although discrimination was not enhanced. Risk classification was particularly improved for persons in the 5–10% risk range (net reclassification 27.1% and 15.5%). Biomarker associations were further corroborated with mass spectrometry in FINRISK (n=671) and the Framingham Offspring Study (n=2289).
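Net reclassification can be sketched in a few lines. The version below is the generic categorical net reclassification improvement (NRI): events that move to a higher risk category and non-events that move to a lower one count in favour of the new model. The category indices are assumed inputs, and the sketch does not reproduce the study's specific restriction to the 5–10% risk range.

```python
from typing import Sequence


def categorical_nri(old_cat: Sequence[int], new_cat: Sequence[int],
                    event: Sequence[bool]) -> float:
    """Categorical net reclassification improvement (NRI).

    old_cat / new_cat: risk-category index per person under the old and new
    model (a higher index means a higher predicted risk).
    event: True if the person had the outcome during follow-up.
    """
    up_e = down_e = up_ne = down_ne = 0
    for old, new, had_event in zip(old_cat, new_cat, event):
        if new > old:
            if had_event:
                up_e += 1
            else:
                up_ne += 1
        elif new < old:
            if had_event:
                down_e += 1
            else:
                down_ne += 1
    n_events = sum(1 for e in event if e)
    n_nonevents = len(event) - n_events
    # Events should move up and non-events should move down.
    return (up_e - down_e) / n_events + (down_ne - up_ne) / n_nonevents


# Hypothetical example: one event moves up, one non-event moves down.
print(categorical_nri([1, 1, 1, 1], [2, 1, 0, 1], [True, True, False, False]))  # 1.0
```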
Metabolite profiling in large prospective cohorts identified phenylalanine, monounsaturated and polyunsaturated fatty acids as biomarkers for cardiovascular risk. This study substantiates the value of high-throughput metabolomics for biomarker discovery and improved risk assessment.
PMCID: PMC4351161  PMID: 25573147
biomarkers; metabolomics; risk prediction; amino acids; fatty acids
4.  Diagnosis of Coronary Heart Diseases Using Gene Expression Profiling; Stable Coronary Artery Disease, Cardiac Ischemia with and without Myocardial Necrosis 
PLoS ONE  2016;11(3):e0149475.
Cardiovascular disease (including coronary artery disease and myocardial infarction) is one of the leading causes of death in Europe, and is influenced by both environmental and genetic factors. With recent advances in genomic tools and technologies, there is potential to predict and diagnose heart disease using molecular data from the analysis of blood cells. We analyzed gene expression data from blood samples taken from normal individuals (n = 21) and from patients with non-significant coronary artery disease (n = 93), unstable angina (n = 16), stable coronary artery disease (n = 14) and myocardial infarction (MI; n = 207). We used a feature selection approach to identify sets of gene expression variables that successfully discriminate between these cardiovascular diseases. The initial features were discovered by fitting a linear model for each probe set across all arrays of normal individuals and patients with myocardial infarction. Three different feature optimisation algorithms were devised, which identified two discriminating gene sets: one using MI patients and normal controls (total genes = 6) and another using MI and unstable angina patients (total genes = 7). In all our classification approaches we used a non-parametric k-nearest neighbour (KNN) classification method (k = 3). The results demonstrated the diagnostic robustness of the final feature sets in discriminating patients with myocardial infarction from healthy controls. Interestingly, the feature sets also discriminated myocardial infarction patients from patients with clinical symptoms of cardiac ischemia but no myocardial necrosis, or with stable coronary artery disease, despite the influence of batch effects and of different microarray gene chips and platforms.
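The k = 3 nearest-neighbour rule can be sketched as follows. This is a minimal illustration assuming Euclidean distance on already-selected expression features; the feature values and labels are hypothetical, and the study's preprocessing (normalisation, batch correction) and distance metric are not specified here.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List[str],
                x: Sequence[float], k: int = 3) -> str:
    """Predict the label of x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample, paired with its label.
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles (arbitrary units).
train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["control", "control", "control", "MI", "MI", "MI"]
print(knn_predict(train_X, train_y, [5.5, 5.5]))  # MI
```

An odd k such as 3 avoids tied votes when only two classes are compared.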
PMCID: PMC4773227  PMID: 26930047
5.  Identifying Highly Penetrant Disease Causal Mutations Using Next Generation Sequencing: Guide to Whole Process 
BioMed Research International  2015;2015:923491.
Recent technological advances have created challenges for geneticists, who must adapt to a wide range of new bioinformatics tools and an expanding wealth of publicly available data (e.g., mutation databases and software). The wide range of methods and the diversity of file formats used in sequence analysis are a significant issue, with a considerable amount of time spent before anyone can even attempt to analyse the genetic basis of human disorders. Another point to consider is that, although many possess “just enough” knowledge to analyse their data, they do not make full use of the available tools and databases and do not fully understand how their data were created. The primary aim of this review is to document some of the key approaches and provide an analysis schema to make the analysis process more efficient and reliable in the context of discovering highly penetrant causal mutations/genes. This review will also compare the methods used to identify highly penetrant variants when data are obtained from consanguineous as opposed to nonconsanguineous individuals, and when Mendelian disorders are analysed as opposed to common complex disorders.
PMCID: PMC4461748  PMID: 26106619
6.  Proxy Molecular Diagnosis from Whole-Exome Sequencing Reveals Papillon-Lefevre Syndrome Caused by a Missense Mutation in CTSC 
PLoS ONE  2015;10(3):e0121351.
Papillon-Lefevre syndrome (PLS) is an autosomal recessive disorder characterised by severe early-onset periodontitis and palmoplantar hyperkeratosis. A previously reported missense mutation in the CTSC gene (NM_001814.4:c.899G>A:p.(G300D)) was identified in a homozygous state in two siblings diagnosed with PLS in a consanguineous family of Arabic ancestry. The variant was initially identified in a heterozygous state in a sibling unaffected by PLS whose whole exome had been sequenced as part of a previous primary ciliary dyskinesia study. Using this information, a proxy molecular diagnosis was made for the PLS-affected siblings after consent was given to study this second disorder segregating within the family. The prevalence of the mutation was then assayed in the local population using a representative sample of 256 unrelated individuals. The variant was absent in all subjects, indicating that it is rare in Saudi Arabia. This family study illustrates how whole-exome sequencing can generate findings and inferences beyond its primary goal.
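Absence in a sample only bounds how common a variant can be. As a rough sketch, assuming the 256 unrelated individuals contribute 512 independently sampled chromosomes, the one-sided 95% upper bound on the allele frequency when zero copies are observed solves (1 - p)^n = 0.05, which is close to the "rule of three" value 3/n. This calculation is an illustration, not one reported in the study.

```python
def upper_bound_zero_count(n_chromosomes: int, alpha: float = 0.05) -> float:
    """One-sided upper confidence bound on allele frequency when 0 copies are seen.

    Solves (1 - p) ** n = alpha for p: the largest frequency at which observing
    no copies in n sampled chromosomes still has probability >= alpha.
    """
    return 1.0 - alpha ** (1.0 / n_chromosomes)


n = 2 * 256  # 256 unrelated individuals, two chromosomes each
print(round(upper_bound_zero_count(n), 4))  # 0.0058
print(round(3 / n, 4))                      # 0.0059 ("rule of three" approximation)
```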
PMCID: PMC4370501  PMID: 25799584
7.  Epigenome-wide association of DNA methylation markers in peripheral blood from Indian Asians and Europeans with incident type 2 diabetes: a nested case-control study 
Indian Asians, who make up a quarter of the world’s population, are at high risk of developing type 2 diabetes. We investigated whether DNA methylation is associated with future type 2 diabetes incidence in Indian Asians and whether differences in methylation patterns between Indian Asians and Europeans are associated with, and could be used to predict, differences in the magnitude of risk of developing type 2 diabetes.
We did a nested case-control study of DNA methylation in Indian Asians and Europeans with incident type 2 diabetes who were identified from the 8-year follow-up of 25 372 participants in the London Life Sciences Prospective Population (LOLIPOP) study. Patients were recruited between May 1, 2002, and Sept 12, 2008. We did epigenome-wide association analysis using samples from Indian Asians with incident type 2 diabetes and age-matched and sex-matched Indian Asian controls, followed by replication testing of top-ranking signals in Europeans. For both discovery and replication, DNA methylation was measured in the baseline blood sample, which was collected before the onset of type 2 diabetes. Epigenome-wide significance was set at p<1 × 10−7. We compared methylation levels between Indian Asian and European controls without type 2 diabetes at baseline to estimate the potential contribution of DNA methylation to increased risk of future type 2 diabetes incidence among Indian Asians.
1608 (11·9%) of 13 535 Indian Asians and 306 (4·3%) of 7066 Europeans developed type 2 diabetes over a mean of 8·5 years (SD 1·8) of follow-up. The age-adjusted and sex-adjusted incidence of type 2 diabetes was 3·1 times (95% CI 2·8–3·6; p<0·0001) higher among Indian Asians than among Europeans, and remained 2·5 times (2·1–2·9; p<0·0001) higher after adjustment for adiposity, physical activity, family history of type 2 diabetes, and baseline glycaemic measures. The mean absolute difference in methylation level between type 2 diabetes cases and controls ranged from 0·5% (SD 0·1) to 1·1% (0·2). Methylation markers at five loci were associated with future type 2 diabetes incidence; the relative risk per 1% increase in methylation was 1·09 (95% CI 1·07–1·11; p=1·3 × 10−17) for ABCG1, 0·94 (0·92–0·95; p=4·2 × 10−11) for PHOSPHO1, 0·94 (0·92–0·96; p=1·4 × 10−9) for SOCS3, 1·07 (1·04–1·09; p=2·1 × 10−10) for SREBF1, and 0·92 (0·90–0·94; p=1·2 × 10−17) for TXNIP. A methylation score combining results for the five loci was associated with future type 2 diabetes incidence (relative risk for quartile 4 vs quartile 1: 3·51, 95% CI 2·79–4·42; p=1·3 × 10−26) and was independent of established risk factors. The methylation score was higher among Indian Asians than among Europeans (p=1 × 10−34).
DNA methylation might provide new insights into the pathways underlying type 2 diabetes and offer new opportunities for risk stratification and prevention of type 2 diabetes among Indian Asians.
The European Union, the UK National Institute for Health Research, the Wellcome Trust, the UK Medical Research Council, Action on Hearing Loss, the UK Biotechnology and Biological Sciences Research Council, the Oak Foundation, the Economic and Social Research Council, Helmholtz Zentrum Munchen, the German Research Center for Environmental Health, the German Federal Ministry of Education and Research, the German Center for Diabetes Research, the Munich Center for Health Sciences, the Ministry of Science and Research of the State of North Rhine-Westphalia, and the German Federal Ministry of Health.
PMCID: PMC4724884  PMID: 26095709
8.  Texture analysis in gel electrophoresis images using an integrative kernel-based approach 
Scientific Reports  2016;6:19256.
Texture information could be used in proteomics to improve the quality of the image analysis of proteins separated on a gel. In order to evaluate the best technique to identify relevant textures, we use several different kernel-based machine learning techniques to classify proteins in 2-DE images into spot and noise. We evaluate the classification accuracy of each of these techniques with proteins extracted from ten 2-DE images of different types of tissues and different experimental conditions. We found that the best classification model was FSMKL, a data integration method using multiple kernel learning, which achieved AUROC values above 95% while using a reduced number of features. This technique increases the interpretability of the complex combinations of textures and weights the importance of each particular feature in the final model. In particular, the Inverse Difference Moment exhibited the highest discriminating power. Because this feature describes homogeneity, a higher value can be associated with a homogeneous structure; the larger the value, the more symmetric. The final model combines different groups of textural features. Here we demonstrate the feasibility of combining different groups of textures in 2-DE image analysis for spot detection.
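The Inverse Difference Moment is computed from a grey-level co-occurrence matrix (GLCM) as the sum of p(i, j) / (1 + (i - j)^2). The sketch below uses a single horizontal offset and a non-symmetric GLCM on a small integer-valued image; these choices, and the function names, are assumptions for illustration rather than the study's feature-extraction settings.

```python
from typing import List


def glcm(image: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalised grey-level co-occurrence matrix for a single pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def inverse_difference_moment(p: List[List[float]]) -> float:
    """IDM (local homogeneity): sum of p[i][j] / (1 + (i - j)^2)."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


uniform = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
print(inverse_difference_moment(glcm(uniform, levels=2)))  # 1.0
print(inverse_difference_moment(glcm(checker, levels=2)))  # 0.5
```

The uniform patch reaches the maximum value of 1, while the alternating pattern scores lower, matching the reading of a high IDM as a homogeneous structure.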
PMCID: PMC4713050  PMID: 26758643
9.  Lipids, obesity and gallbladder disease in women: insights from genetic studies using the cardiovascular gene-centric 50K SNP array 
Gallbladder disease (GBD) has an overall prevalence of 10-40% depending on factors such as age, gender, population, obesity and diabetes, and represents a major economic burden. While gallstones are composed of cholesterol by-products and are associated with obesity, presumed causal pathways remain unproven, although BMI reduction is typically recommended. We performed genetic studies to discover candidate genes and define pathways involved in GBD. We genotyped 15,241 women of European ancestry from three cohorts, including 3,216 with GBD, using the Human cardiovascular disease (HumanCVD) BeadChip (Illumina, San Diego, CA) containing up to ~53,000 SNPs. Effect sizes with p values for development of GBD were generated. We identify two new loci associated with GBD, GCKR rs1260326:T>C (P=5.88×10−7, ß=−0.146) and TTC39B rs686030:C>A (P=6.95×10−7, ß=0.271) and detect four independent SNP effects in ABCG8 rs4953023:G>A (P=7.41×10−47, ß=0.734), ABCG8 rs4299376:G>T (P=2.40×10−18, ß=0.278), ABCG5 rs6544718:T>C (P=2.08×10−14, ß=0.044) and ABCG5 rs6720173:G>C (P=3.81×10−12, ß=0.262) in conditional analyses taking genotypes of rs4953023:G>A as a covariate. We also delineate the risk effects among many genotypes known to influence lipids. These data, from the largest GBD genetic study to date, show that specific, mainly hepatocyte-centred, components of lipid metabolism are important to GBD risk in women. We discuss the potential pharmaceutical implications of our findings.
PMCID: PMC4681116  PMID: 25920552
gallbladder disease; genetics; lipids; women; cardiovascular gene-centric 50K SNP array
11.  Copy number variations and cognitive phenotypes in unselected populations 
JAMA  2015;313(20):2044-2054.
The association of rare copy number variants (CNVs) with complex disorders is almost exclusively evaluated using clinically ascertained cohorts. As a result, the contribution of these genetic variants to cognitive phenotypes in the general population remains unclear.
- To investigate the clinical features of genomic disorders in adult carriers without clinical pre-selection.
- To assess the genome-wide burden of rare CNVs on carriers’ educational attainment and intellectual disability prevalence in the general population.
Design, Setting, and Participants
The population biobank of Estonia (EGCUT) contains 52,000 participants, or 5% of Estonian adults, enrolled in 2002-2010. General practitioners examined participants, completed a questionnaire covering health- and lifestyle-related questions, and reported diagnoses. As EGCUT is representative of the country's population, we investigated a random sample of 7877 individuals for CNV analysis and genotype-phenotype associations with education and disease traits.
Main Outcomes and Measures
Phenotypes of genomic disorders in the general population, prevalence of autosomal CNVs, and association of the latter variants with decreased educational attainment and increased prevalence of intellectual disability.
We identified 56 carriers of genomic disorders. Their phenotypes are reminiscent of those described for carriers of identical rearrangements ascertained in clinical cohorts. We also generated a genome-wide map of rare (frequency ≤0.05%) autosomal CNVs and identified 10.5% of the screened general population (n=831) as carriers of CNVs ≥250kb. Carriers of deletions ≥250kb or duplications ≥1Mb show, compared to the Estonian population, a greater prevalence of intellectual disability (P=0.0015, OR=3.16, (95%CI: 1.51-5.98); P=0.0083, OR=3.67, (95%CI: 1.29-8.54), respectively), reduced mean educational attainment (a proxy for intelligence; P=1.06e-04; P=5.024e-05, respectively) and an increased fraction of individuals not graduating from secondary school (P=0.005, OR=1.48 (95%CI: 1.12-1.95); P=0.0016, OR=1.89 (95%CI: 1.27-2.8), respectively). The deletions show evidence of enrichment for genes with a role in neurogenesis, cognition, learning, memory and behavior. Evidence for an association between rare CNVs and decreased educational attainment was confirmed by analyses in adult cohorts of Italian (HYPERGENES) and European American (Minnesota Center for Twin and Family Research) individuals, as well as in the Avon Longitudinal Study of Parents and Children (ALSPAC) birth cohort.
Conclusions and Relevance
Our results challenge the assumption that carriers of known syndromic CNVs identified in population cohorts are asymptomatic. They also indicate that individually rare but collectively common intermediate-size CNVs contribute to the variance in educational attainment. Refinements of these findings in additional population groups are warranted given the potential implications of this observation for genomics research, clinical care, and public health.
PMCID: PMC4684269  PMID: 26010633
genomic disorders; CNV; 16p11.2; population biobanks; education; intelligence; EGCUT; ALSPAC
12.  Influence of Adiposity-Related Genetic Markers in a Population of Saudi Arabians Where Other Variables Influencing Obesity May Be Reduced 
Disease Markers  2014;2014:758232.
Large-scale studies in Europeans have clearly identified common polymorphisms affecting BMI and obesity. We undertook a genotype study to examine the impact of variants known to influence obesity in a sample from the Saudi Arabian population, notable for its combination of low mean physical activity indices and high energy intake. Anthropometry measures and genotypes were obtained for 367 Saudis, taken from King Saud University and Biomarker Screening Project in Riyadh (Riyadh Cohort). We observed large effect sizes with obesity for rs10767664 (BDNF) (OR = 1.923, P = 0.00072) and rs3751812 (FTO) (OR = 1.523, P = 0.016) in our sample and, using weighted genetic risk scores, we found strong evidence of a cumulative effect using 11 SNPs taken predominantly from loci principally affecting appetite (OR = 2.57, P = 0.00092). We used conditional analyses to discern which of our three highly correlated FTO SNPs was responsible for the observed signal, although we were unable to determine with confidence which best marked the causal site. Our analysis indicates that markers located in loci known to influence fat mass through increased appetite affect obesity in Saudi Arabians to an extent possibly greater than in Europeans. Larger-scale studies will be necessary to obtain a precise comparison.
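A weighted genetic risk score is a per-person sum of risk-allele counts multiplied by per-SNP weights. In the sketch below the weights are assumed to be published effect sizes (for example log odds ratios); the study's 11 SNPs and their weights are not reproduced, and setting every weight to 1 gives the unweighted allele count.

```python
from typing import Sequence


def weighted_grs(dosages: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted genetic risk score for one individual.

    dosages: risk-allele count (0, 1 or 2) at each SNP.
    weights: per-SNP effect size, e.g. a published log odds ratio (assumption).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical three-SNP example.
print(weighted_grs([0, 1, 2], [1, 1, 1]))  # 3 (unweighted allele count)
score = weighted_grs([0, 1, 2], [0.1, 0.2, 0.3])
print(round(score, 2))  # 0.8
```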
PMCID: PMC4251424  PMID: 25484485
13.  Prenatal and early life influences on epigenetic age in children: a study of mother–offspring pairs from two cohort studies 
Human Molecular Genetics  2015;25(1):191-201.
DNA methylation-based biomarkers of aging are highly correlated with actual age. Departures of methylation-estimated age from actual age can be used to define epigenetic measures of child development or age acceleration (AA) in adults. Very little is known about genetic or environmental determinants of these epigenetic measures of aging. We obtained DNA methylation profiles using Infinium HumanMethylation450 BeadChips across five time-points in 1018 mother–child pairs from the Avon Longitudinal Study of Parents and Children. Using the Horvath age estimation method, we calculated epigenetic age for these samples. AA was defined as the residuals from regressing epigenetic age on actual age. AA was tested for associations with cross-sectional clinical variables in children. We identified associations between AA and sex, birth weight, birth by caesarean section and several maternal characteristics in pregnancy, namely smoking, weight, BMI, selenium and cholesterol level. Offspring of non-drinkers had higher AA on average but this difference appeared to resolve during childhood. The associations between sex, birth weight and AA found in ARIES were replicated in an independent cohort (GOYA). In children, epigenetic AA measures are associated with several clinically relevant variables, and early life exposures appear to be associated with changes in AA during adolescence. Further research into epigenetic aging, including the use of causal inference methods, is required to better our understanding of aging.
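Age acceleration as defined here is the residual from an ordinary least-squares regression of epigenetic age on actual age. The sketch assumes epigenetic ages have already been estimated (it does not implement the Horvath clock); positive residuals indicate samples that appear epigenetically older than expected for their age. The input values are hypothetical.

```python
from typing import List


def age_acceleration(epigenetic_age: List[float], actual_age: List[float]) -> List[float]:
    """Residuals from an OLS regression of epigenetic age on actual age."""
    n = len(actual_age)
    mean_x = sum(actual_age) / n
    mean_y = sum(epigenetic_age) / n
    sxx = sum((x - mean_x) ** 2 for x in actual_age)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual_age, epigenetic_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Positive residual: epigenetically "older" than expected for actual age.
    return [y - (intercept + slope * x) for x, y in zip(actual_age, epigenetic_age)]


# Hypothetical ages in years.
aa = age_acceleration([1.5, 2.0, 3.5, 4.0], [1.0, 2.0, 3.0, 4.0])
print([round(v, 2) for v in aa])  # [0.1, -0.3, 0.3, -0.1]
```

Because residuals from a least-squares fit with an intercept sum to zero, AA defined this way is centred on zero within the sample it was fitted in.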
PMCID: PMC4690495  PMID: 26546615
14.  Gene-centric association signals for haemostasis and thrombosis traits identified with the HumanCVD BeadChip 
Thrombosis and haemostasis  2013;110(5):995-1003.
Coagulation phenotypes show strong intercorrelations, affect cardiovascular disease risk and are influenced by genetic variants. The objective of this study was to search for novel genetic variants influencing the following coagulation phenotypes: factor VII levels, fibrinogen levels, plasma viscosity and platelet count.
Methods and Results
We genotyped the British Women’s Heart and Health Study (n=3445) and the Whitehall II study (n=5059) using the Illumina HumanCVD BeadArray to investigate genetic associations and pleiotropy. In addition to previously reported associations (SH2B3, F7/F10, PROCR, GCKR, FGA/FGB/FGG, IL5), we identified novel associations at GRK5 (rs10128498, p=1.30×10−6), GCKR (rs1260326, p=1.63×10−6) and ZNF259-APOA5 (rs651821, p=7.17×10−6) with plasma viscosity, and at CSF1 (rs333948, p=8.88×10−6) with platelet count. A pleiotropic effect was identified in GCKR, which associated with factor VII (p=2.16×10−7) and plasma viscosity (p=1.63×10−6), and, to a lesser extent, in ZNF259-APOA5, which associated with factor VII and fibrinogen (p<1.00×10−2) and additionally with plasma viscosity (p<1.00×10−5). Triglyceride-associated variants were overrepresented among the factor VII and plasma viscosity associations. Adjusting for triglyceride levels resulted in attenuation of associations at the GCKR and ZNF259-APOA5 loci.
In addition to confirming previously reported associations, we identified four SNPs associated with plasma viscosity and platelet count and found evidence of pleiotropic effects with SNPs in GCKR and ZNF259-APOA5. These triglyceride-associated, pleiotropic SNPs suggest a possible causal role for triglycerides in coagulation.
PMCID: PMC4067543  PMID: 24178511
Haemostasis; Thrombosis; HumanCVD; Clotting Factors; Genetic Association
15.  The effects of height and BMI on prostate cancer incidence and mortality: a Mendelian randomization study in 20,848 cases and 20,214 controls from the PRACTICAL consortium 
Cancer Causes & Control  2015;26(11):1603-1616.
Epidemiological studies suggest a potential role for obesity and determinants of adult stature in prostate cancer risk and mortality, but the relationships described in the literature are complex. To address uncertainty over the causal nature of previous observational findings, we investigated associations of height- and adiposity-related genetic variants with prostate cancer risk and mortality.
We conducted a case–control study based on 20,848 prostate cancers and 20,214 controls of European ancestry from 22 studies in the PRACTICAL consortium. We constructed genetic risk scores that summed each man’s number of height and BMI increasing alleles across multiple single nucleotide polymorphisms robustly associated with each phenotype from published genome-wide association studies.
The genetic risk scores explained 6.31 and 1.46 % of the variability in height and BMI, respectively. There was only weak evidence that genetic variants previously associated with increased BMI were associated with a lower prostate cancer risk (odds ratio per standard deviation increase in BMI genetic score 0.98; 95 % CI 0.96, 1.00; p = 0.07). Genetic variants associated with increased height were not associated with prostate cancer incidence (OR 0.99; 95 % CI 0.97, 1.01; p = 0.23), but were associated with an increase (OR 1.13; 95 % CI 1.08, 1.20) in prostate cancer mortality among men with low-grade disease (p heterogeneity, low vs. high grade <0.001). Genetic variants associated with increased BMI were associated with an increase (OR 1.08; 95 % CI 1.03, 1.14) in all-cause mortality among men with low-grade disease (p heterogeneity = 0.03).
We found little evidence of a substantial effect of genetically elevated height or BMI on prostate cancer risk, suggesting that previously reported observational associations may reflect common environmental determinants of height or BMI and prostate cancer risk. Genetically elevated height and BMI were associated with increased mortality (prostate cancer-specific and all-cause, respectively) in men with low-grade disease, a potentially informative but novel finding that requires replication.
Electronic supplementary material
The online version of this article (doi:10.1007/s10552-015-0654-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4596899  PMID: 26387087
Height; Body mass index; Prostate cancer; Mendelian randomization; Single nucleotide polymorphisms; Instrumental variables analysis
16.  Nonsense Mutation in Coiled-Coil Domain Containing 151 Gene (CCDC151) Causes Primary Ciliary Dyskinesia 
Human Mutation  2014;35(12):1446-1448.
Primary ciliary dyskinesia (PCD) is an autosomal-recessive disorder characterized by impaired ciliary function that leads to clinical phenotypes such as chronic sinopulmonary disease. PCD is also genetically heterogeneous, with mutations in many single genes leading to similar clinical phenotypes. Here, we present a novel PCD causal gene, coiled-coil domain containing 151 (CCDC151), which has been shown to be essential in the motile cilia of many animals, including vertebrates, but whose effects in humans had not been observed until now. We observed a novel nonsense mutation in a homozygous state in the CCDC151 gene (NM_145045.4:c.925G>T:p.[E309*]) in a clinically diagnosed PCD patient from a consanguineous family of Arabic ancestry. The variant was absent in 238 randomly selected individuals, indicating that the variant is rare and likely not a founder mutation. Our finding also shows that, given prior knowledge from model organisms, even a single whole-exome sequence can be sufficient to discover a novel causal gene.
PMCID: PMC4489323  PMID: 25224326
primary ciliary dyskinesia; CCDC151; respiratory cilia; ciliopathy
17.  Gene-centric meta-analyses for central adiposity traits in up to 57 412 individuals of European descent confirm known loci and reveal several novel associations 
Human Molecular Genetics  2013;23(9):2498-2510.
Waist circumference (WC) and waist-to-hip ratio (WHR) are surrogate measures of central adiposity that are associated with adverse cardiovascular events, type 2 diabetes and cancer independent of body mass index (BMI). WC and WHR are highly heritable with multiple susceptibility loci identified to date. We assessed the association between SNPs and BMI-adjusted WC and WHR and unadjusted WC in up to 57 412 individuals of European descent from 22 cohorts collaborating with the NHLBI's Candidate Gene Association Resource (CARe) project. The study population consisted of women and men aged 20–80 years. Study participants were genotyped using the ITMAT/Broad/CARE array, which includes ∼50 000 cosmopolitan tagged SNPs across ∼2100 cardiovascular-related genes. Each trait was modeled as a function of age, study site and principal components to control for population stratification, and we conducted a fixed-effects meta-analysis. No new loci for WC were observed. For WHR analyses, three novel loci were significantly associated (P < 2.4 × 10−6). Previously unreported rs2811337-G near TMCC1 was associated with increased WHR (β ± SE, 0.048 ± 0.008, P = 7.7 × 10−9) as was rs7302703-G in HOXC10 (β = 0.044 ± 0.008, P = 2.9 × 10−7) and rs936108-C in PEMT (β = 0.035 ± 0.007, P = 1.9 × 10−6). Sex-stratified analyses revealed two additional novel signals among females only, rs12076073-A in SHC1 (β = 0.10 ± 0.02, P = 1.9 × 10−6) and rs1037575-A in ATBDB4 (β = 0.046 ± 0.01, P = 2.2 × 10−6), supporting an already established sexual dimorphism of central adiposity-related genetic variants. Functional analysis using ENCODE and eQTL databases revealed that several of these loci are in regulatory regions or regions with differential expression in adipose tissue.
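The fixed-effects meta-analysis step can be sketched with inverse-variance weighting, where each cohort's estimate is weighted by 1 / SE^2. This is the standard fixed-effect formula under the assumption that all cohorts estimate the same true effect; the per-cohort values below are hypothetical and the consortium's actual software is not implied.

```python
import math
from typing import List, Tuple


def fixed_effects_meta(betas: List[float], ses: List[float]) -> Tuple[float, float]:
    """Inverse-variance weighted fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    return pooled, math.sqrt(1.0 / total_w)


# Hypothetical per-cohort effect estimates (e.g. change in WHR per allele) and SEs.
beta, se = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
print(round(beta, 3), round(se, 4))  # 0.2 0.0707
```

Pooling two equally precise cohorts halves the variance, so the pooled standard error is the single-cohort value divided by the square root of 2.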
PMCID: PMC3988452  PMID: 24345515
18.  Longitudinal analysis of DNA methylation associated with birth weight and gestational age 
Human Molecular Genetics  2015;24(13):3752-3763.
Gestational age (GA) and birth weight have been implicated in the determination of long-term health. It has been hypothesized that changes in DNA methylation may mediate these long-term effects. We obtained DNA methylation profiles from cord blood and peripheral blood at ages 7 and 17 in the same children from the Avon Longitudinal Study of Parents and Children. Repeated-measures data were used to investigate changes in birth-related methylation during childhood and adolescence. Ten developmental phenotypes (e.g. height) were analysed to identify possible mediation of health effects by DNA methylation. In cord blood, methylation at 224 CpG sites was found to be associated with GA and 23 CpG sites with birth weight. Methylation changed in the majority of these sites over time, but neither birth characteristic was strongly associated with methylation at age 7 or 17 (using a conservative correction for multiple testing of P < 1.03 × 10−7), suggesting resolution of differential methylation by early childhood. Associations were observed between birth weight-associated CpG sites and phenotypic characteristics in childhood. One strong association involved birth weight, methylation of a CpG site proximal to the NFIX locus and bone mineral density at age 17. Analysis of serial methylation from birth to adolescence provided evidence for a lack of persistence of methylation differences beyond early childhood. Sites associated with birth weight were linked to developmental genes and have methylation levels which are associated with developmental phenotypes. Replication and interrogation of causal relationships are needed to substantiate whether methylation differences at birth influence the association between birth weight and development.
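The abstract calls P < 1.03 × 10−7 a conservative multiple-testing correction, but it does not say how many tests were run. One way to read that threshold is as a Bonferroni correction, α / m. The sketch below assumes α = 0.05 and m ≈ 485 577 tests, the approximate probe count of the Illumina 450K array. Both values are assumptions made for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that controls the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Assumed number of tests: approximate 450K probe count (not stated in the abstract)
threshold = bonferroni_threshold(0.05, 485_577)
print(f"{threshold:.3g}")
```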
PMCID: PMC4459393  PMID: 25869828
19.  Maternal pre-pregnancy BMI and gestational weight gain, offspring DNA methylation and later offspring adiposity: findings from the Avon Longitudinal Study of Parents and Children 
Background: Evidence suggests that in utero exposure to undernutrition and overnutrition might affect adiposity in later life. Epigenetic modification is suggested as a plausible mediating mechanism.
Methods: We used multivariable linear regression and a negative control design to examine offspring epigenome-wide DNA methylation in relation to maternal and offspring adiposity in 1018 participants.
Results: Compared with neonatal offspring of normal weight mothers, 28 and 1621 CpG sites were differentially methylated in offspring of obese and underweight mothers, respectively [false discovery rate (FDR)-corrected P-value < 0.05], with no overlap between the sites associated with maternal obesity and those associated with maternal underweight. A positive association, where higher methylation is associated with a body mass index (BMI) outside the normal range, was seen at 78.6% of the sites associated with obesity and 87.9% of the sites associated with underweight. Associations of maternal obesity with offspring methylation were stronger than associations of paternal obesity, supporting an intrauterine mechanism. There were no consistent associations of gestational weight gain with offspring DNA methylation. In general, sites that were hypermethylated in association with maternal obesity or hypomethylated in association with maternal underweight tended to be positively associated with offspring adiposity, and sites hypomethylated in association with maternal obesity or hypermethylated in association with maternal underweight tended to be inversely associated with offspring adiposity.
Conclusions: Our data suggest that both maternal obesity and, to a larger degree, underweight affect the neonatal epigenome via an intrauterine mechanism, but weight gain during pregnancy has little effect. We found some evidence that associations of maternal underweight with lower offspring adiposity and maternal obesity with greater offspring adiposity may be mediated via increased DNA methylation.
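The results above are reported at an FDR-corrected P < 0.05. The abstract does not name the FDR procedure used. The Benjamini-Hochberg step-up method is the most common choice, and a minimal sketch of it is shown here. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return sorted indices of tests rejected at FDR level q (BH step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank          # largest rank whose P meets its bound
    return sorted(order[:k_max])  # reject every test up to that rank

print(benjamini_hochberg([0.02, 0.03, 0.5]))
```

Because BH is a step-up procedure, a test whose P-value misses its own bound (0.02 > 0.05/3 here) is still rejected if a larger rank passes.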
PMCID: PMC4588865  PMID: 25855720
Epigenetic; ALSPAC; ARIES; causality; epigenome-wide association study; longitudinal; overweight; overnutrition; undernutrition
20.  Mosaic structural variation in children with developmental disorders 
Human Molecular Genetics  2015;24(10):2733-2745.
Delineating the genetic causes of developmental disorders is an area of active investigation. Mosaic structural abnormalities, defined as copy number or loss of heterozygosity events that are large and present in only a subset of cells, have been detected in 0.2–1.0% of children ascertained for clinical genetic testing. However, the frequency among healthy children in the community is not well characterized, which, if known, could inform better interpretation of the pathogenic burden of this mutational category in children with developmental disorders. In a case–control analysis, we compared the rate of large-scale mosaicism between 1303 children with developmental disorders and 5094 children lacking developmental disorders, using an analytical pipeline we developed, and identified a substantial enrichment in cases (odds ratio = 39.4, P = 1.073 × 10−6). A meta-analysis that included frequency estimates among an additional 7000 children with congenital diseases yielded an even stronger statistical enrichment (P = 1.784 × 10−11). In addition, to maximize the detection of low-clonality events in probands, we applied a trio-based mosaic detection algorithm, which detected two additional events in probands, including an individual with genome-wide suspected chimerism. In total, we detected 12 structural mosaic abnormalities among 1303 children (0.9%). Given the burden of mosaicism detected in cases, we suspected that many of the events detected in probands were pathogenic. Scrutiny of the genotype–phenotype relationship of each detected variant indicated that the majority of events are very likely pathogenic. This work quantifies the burden of structural mosaicism as a cause of developmental disorders.
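The case–control enrichment is summarised as an odds ratio from a 2 × 2 table of carriers and non-carriers. The sketch below computes that odds ratio with a Wald 95% interval. It adds the Haldane-Anscombe correction (0.5 added to every cell) when any cell is zero, a situation that is common with rare events. The counts in the example are hypothetical. The abstract does not say which test produced its P-values, and an exact test such as Fisher's may be more appropriate for events this rare.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: cases with the event, b: cases without,
    c: controls with the event, d: controls without.
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell if any is zero
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

# Hypothetical counts, for illustration only
print(odds_ratio(10, 90, 5, 95))
```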
PMCID: PMC4406290  PMID: 25634561
21.  HMG-coenzyme A reductase inhibition, type 2 diabetes, and bodyweight: evidence from genetic analysis and randomised trials 
Lancet  2015;385(9965):351-361.
Statins increase the risk of new-onset type 2 diabetes mellitus. We aimed to assess whether this increase in risk is a consequence of inhibition of 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR), the intended drug target.
We used single nucleotide polymorphisms in the HMGCR gene, rs17238484 (for the main analysis) and rs12916 (for a subsidiary analysis) as proxies for HMGCR inhibition by statins. We examined associations of these variants with plasma lipid, glucose, and insulin concentrations; bodyweight; waist circumference; and prevalent and incident type 2 diabetes. Study-specific effect estimates per copy of each LDL-lowering allele were pooled by meta-analysis. These findings were compared with a meta-analysis of new-onset type 2 diabetes and bodyweight change data from randomised trials of statin drugs. The effects of statins in each randomised trial were assessed using meta-analysis.
Data were available for up to 223 463 individuals from 43 genetic studies. Each additional rs17238484-G allele was associated with a mean 0·06 mmol/L (95% CI 0·05–0·07) lower LDL cholesterol and higher bodyweight (0·30 kg, 0·18–0·43), waist circumference (0·32 cm, 0·16–0·47), plasma insulin concentration (1·62%, 0·53–2·72), and plasma glucose concentration (0·23%, 0·02–0·44). The rs12916 SNP had similar effects on LDL cholesterol, bodyweight, and waist circumference. The rs17238484-G allele seemed to be associated with higher risk of type 2 diabetes (odds ratio [OR] per allele 1·02, 95% CI 1·00–1·05); the rs12916-T allele association was consistent (1·06, 1·03–1·09). In 129 170 individuals in randomised trials, statins lowered LDL cholesterol by 0·92 mmol/L (95% CI 0·18–1·67) at 1 year of follow-up, increased bodyweight by 0·24 kg (95% CI 0·10–0·38 in all trials; 0·33 kg, 95% CI 0·24–0·42 in placebo or standard care controlled trials and −0·15 kg, 95% CI −0·39 to 0·08 in intensive-dose vs moderate-dose trials) at a mean of 4·2 years (range 1·9–6·7) of follow-up, and increased the odds of new-onset type 2 diabetes (OR 1·12, 95% CI 1·06–1·18 in all trials; 1·11, 95% CI 1·03–1·20 in placebo or standard care controlled trials and 1·12, 95% CI 1·04–1·22 in intensive-dose vs moderate dose trials).
The increased risk of type 2 diabetes noted with statins is at least partially explained by HMGCR inhibition.
The funding sources are cited at the end of the paper.
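Effect estimates were pooled by meta-analysis. Inverse-variance pooling works on the log-odds scale, so a reported OR and its 95% CI first have to be turned into a log-OR and a standard error. The sketch below assumes the interval was computed symmetrically on the log scale. It uses the all-trials OR for new-onset type 2 diabetes from the abstract as input. The function name is hypothetical.

```python
import math

def log_or_and_se(odds_ratio, ci_low, ci_high, z=1.959964):
    """Return log(OR) and its standard error from an OR and its 95% CI.

    Assumes the CI was computed as exp(log(OR) +/- z * SE),
    so SE = (log(upper) - log(lower)) / (2 * z).
    """
    if not 0 < ci_low <= odds_ratio <= ci_high:
        raise ValueError("expected 0 < lower <= OR <= upper")
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se

# All-trials OR for new-onset type 2 diabetes, as reported above
log_or, se = log_or_and_se(1.12, 1.06, 1.18)
print(round(log_or, 4), round(se, 4))
```

Published CIs are rounded, so the recovered SE is only approximate.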
PMCID: PMC4322187  PMID: 25262344
22.  An integrative approach to predicting the functional effects of non-coding and coding sequence variation 
Bioinformatics  2015;31(10):1536-1543.
Motivation: Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source.
Results: We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure to rank order predictions.
Availability and implementation: The FATHMM-MKL webserver is available at:
Supplementary information: Supplementary data are available at Bioinformatics online.
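FATHMM-MKL is described as learning how much weight to give each annotation source, which is the idea behind multiple kernel learning (MKL). The sketch below shows only the combination step: a convex, weighted sum of one kernel per annotation source. Several details are assumptions: the Gaussian (RBF) kernel, the fixed weights, and the random feature matrices. In real MKL the weights are learned jointly with the classifier, and that learning step is not shown here.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combined_kernel(feature_groups, weights, gamma=1.0):
    """Convex combination of one RBF kernel per annotation source."""
    w = np.asarray(weights, dtype=float)
    if len(w) != len(feature_groups):
        raise ValueError("need one weight per feature group")
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(wi * rbf_kernel(X, gamma) for wi, X in zip(w, feature_groups))

# Hypothetical: 4 variants described by two annotation sources
rng = np.random.default_rng(0)
groups = [rng.normal(size=(4, 3)), rng.normal(size=(4, 2))]
print(combined_kernel(groups, [0.6, 0.4]).round(3))
```

A convex combination of positive semi-definite kernels is itself positive semi-definite, so the combined matrix is still a valid kernel for an SVM.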
PMCID: PMC4426838  PMID: 25583119
23.  Prenatal exposure to maternal smoking and offspring DNA methylation across the lifecourse: findings from the Avon Longitudinal Study of Parents and Children (ALSPAC) 
Human Molecular Genetics  2014;24(8):2201-2217.
Maternal smoking during pregnancy has been found to influence newborn DNA methylation in genes involved in fundamental developmental processes. It is pertinent to understand the degree to which the offspring methylome is sensitive to the intensity and duration of prenatal smoking. An investigation of the persistence of offspring methylation associated with maternal smoking and the relative roles of the intrauterine and postnatal environment is also warranted. In the Avon Longitudinal Study of Parents and Children, we investigated associations between prenatal exposure to maternal smoking and offspring DNA methylation at multiple time points in approximately 800 mother–offspring pairs. In cord blood, methylation at 15 CpG sites in seven gene regions (AHRR, MYO1G, GFI1, CYP1A1, CNTNAP2, KLF13 and ATP9A) was associated with maternal smoking, and a dose-dependent response was observed in relation to smoking duration and intensity. Longitudinal analysis of blood DNA methylation in serial samples at birth, age 7 and 17 years demonstrated that some CpG sites showed reversibility of methylation (GFI1, KLF13 and ATP9A), whereas others showed persistently perturbed patterns (AHRR, MYO1G, CYP1A1 and CNTNAP2). Of those showing persistence, we explored the effect of postnatal smoke exposure and found that the major contribution to altered methylation was attributed to a critical window of in utero exposure. A comparison of paternal and maternal smoking and offspring methylation showed consistently stronger maternal associations, providing further evidence for causal intrauterine mechanisms. These findings emphasize the sensitivity of the methylome to maternal smoking during early development and the long-term impact of such exposure.
PMCID: PMC4380069  PMID: 25552657
24.  Canonical Correlation Analysis for Gene-Based Pleiotropy Discovery 
PLoS Computational Biology  2014;10(10):e1003876.
Genome-wide association studies have identified a wealth of genetic variants involved in complex traits and multifactorial diseases. There is now considerable interest in testing variants for association with multiple phenotypes (pleiotropy) and in testing multiple variants for association with a single phenotype (gene-based association tests). Such approaches can increase statistical power by combining evidence for association over multiple phenotypes or genetic variants, respectively. Canonical Correlation Analysis (CCA) measures the correlation between two sets of multidimensional variables, and thus offers the potential to combine these two approaches. To apply CCA, we must restrict the number of attributes relative to the number of samples. Hence we consider modules of genetic variation that can comprise a gene, a pathway or another biologically relevant grouping, and/or a set of phenotypes. To select the attributes within these modules, we use an attribute selection strategy based on a binary genetic algorithm. Applied to a UK-based prospective cohort study of 4286 women (the British Women's Heart and Health Study), we find improved statistical power in the detection of previously reported genetic associations, and identify a number of novel pleiotropic associations between genetic variants and phenotypes. New discoveries include gene-based association of NSF with triglyceride levels and several genes (ACSM3, ERI2, IL18RAP, IL23RAP and NRG1) with left ventricular hypertrophy phenotypes. In multiple-phenotype analyses we find association of NRG1 with left ventricular hypertrophy phenotypes, fibrinogen and urea and pleiotropic relationships of F7 and F10 with Factor VII, Factor IX and cholesterol levels.
Author Summary
Pleiotropy occurs when variation in one gene affects several unrelated phenotypes. Studying this phenomenon can help in discovering gene function and in understanding the evolution of a gene. In this paper, we present a methodology based on Canonical Correlation Analysis that studies gene-centered multiple association between the variation of SNPs in one gene or a set of genes and one phenotype or a set of phenotypes. The resulting methodology can be applied in gene-centered association analysis, multiple association analysis or pleiotropic pattern discovery. We apply this methodology to a genotype dataset and a set of cardiovascular-related phenotypes. We discover a new association between the gene NRG1 and phenotypes related to left ventricular hypertrophy, pleiotropic effects of this gene on other phenotypes such as coagulation factors and urea, and pleiotropic effects of the coagulation-related genes F7 and F10 on coagulation factors and cholesterol levels. This methodology could also be used to find multiple associations in other omics datasets.
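CCA finds the linear combinations of two variable sets (for example, SNPs in a gene and a set of phenotypes) that are maximally correlated. Below is a minimal sketch that computes the canonical correlations numerically. It centres each block, orthonormalises it with a thin QR decomposition, and takes the singular values of Qxᵀ Qy. This is a standard numerical route to CCA, not necessarily the authors' implementation, and it omits their genetic-algorithm attribute selection. The data are simulated.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between column sets X (n x p) and Y (n x q).

    Centres both blocks, orthonormalises each with a thin QR decomposition,
    and returns the singular values of Qx^T Qy in decreasing order.
    Assumes n > p + q and full column rank in both blocks.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against tiny floating-point overshoot

# Simulated example: one Y column is an exact copy of an X column
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0], rng.normal(size=200)])
print(canonical_correlations(X, Y).round(3))
```

This also shows why the number of attributes must be limited relative to the number of samples. Once p + q reaches the number of centred samples, the two column spaces must intersect, and the first canonical correlation becomes 1 regardless of any real association.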
PMCID: PMC4199483  PMID: 25329069
25.  Loci influencing blood pressure identified using a cardiovascular gene-centric array 
Human Molecular Genetics  2013;22(16):3394-3395.
PMCID: PMC3888295

