Several studies have examined the fine-scale structure of human genetic variation in Europe. However, the European samples analyzed have mainly represented northern, western, central, and southern Europe. Here, we report an analysis of approximately 166,000 single nucleotide polymorphisms in populations from eastern (northeastern) Europe: four Russian populations from European Russia and three populations from the northernmost Finno-Ugric ethnicities (Veps and two contrasting groups of Komi people). These were compared with several reference European samples, including Finns, Estonians, Latvians, Poles, Czechs, Germans, and Italians. The results demonstrated genetic heterogeneity among the populations living in the region studied. Russians from the central part of European Russia (Tver, Murom, and Kursk) were similar to populations from central–eastern Europe and were distant from the Russian sample from northern Russia (Mezen district, Archangelsk region). The Komi samples, especially the Izhemski Komi, differed significantly from all other populations studied. The Komi can be considered a second pole of genetic diversity in northern Europe (in addition to the pole occupied by Finns), as they had a distinct ancestry component. Russians from Mezen and the Finnic-speaking Veps were positioned between the two poles but differed from each other in the proportions of Komi and Finnic ancestry. In general, our data provide a more complete genetic map of Europe that accounts for the diversity of its easternmost (northeastern) populations.
The clinical relevance of a genetic predisposition to elevated blood pressure was quantified during the transition from childhood to adulthood in a population-based Finnish cohort (N=2,357). Blood pressure was measured at baseline in 1980 (age 3–18 years) and at follow-ups in 1983, 1986, 2001, and 2007. Thirteen single nucleotide polymorphisms associated with blood pressure were genotyped, and three genetic risk scores (for systolic blood pressure, diastolic blood pressure, and their combination) were derived for all participants. The effects of the genetic risk scores were 0.47 mmHg for systolic and 0.53 mmHg for diastolic blood pressure (both p<0.01). The combination genetic risk score was associated with diastolic blood pressure from age 9 onwards (β=0.68 mmHg, p=0.015). Replication in 1,194 participants of the Bogalusa Heart Study yielded essentially similar results. Participants in the highest quintile of the combination genetic risk score had a 1.82-fold higher risk of hypertension in adulthood (p<0.0001) than those in the lowest quintile, independent of a family history of premature hypertension. These findings show that genetic variants are associated with preclinical blood pressure traits in childhood, that individuals with several susceptibility alleles have on average a 0.5 mmHg higher blood pressure, and that this trajectory continues from childhood to adulthood.
Epidemiological study; genetic risk score; blood pressure; cardiovascular disease
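The top-versus-bottom quintile comparison in the blood pressure abstract can be illustrated with a small sketch. It assumes each participant already has a numeric genetic risk score and a 0/1 hypertension outcome; the function names are hypothetical, and the crude risk ratio returned here is not the adjusted estimate reported in the study.

```python
def quintiles(scores):
    """Assign each score to a quintile by rank (1 = lowest, 5 = highest).

    Ties are broken by input order, so equal scores at a cut point can land
    in adjacent quintiles.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = rank * 5 // n + 1
    return groups


def crude_risk_ratio_top_vs_bottom(scores, outcomes):
    """Crude outcome risk (0/1 outcomes) in the top quintile divided by that in the bottom one.

    Raises ZeroDivisionError if no bottom-quintile participant has the outcome.
    """
    groups = quintiles(scores)

    def risk(q):
        in_q = [o for g, o in zip(groups, outcomes) if g == q]
        return sum(in_q) / len(in_q)

    return risk(5) / risk(1)
```

A regression model with covariates, as used in such studies, would replace the crude ratio; the quintile assignment step stays the same.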
Genome-wide association (GWA) studies have identified several susceptibility loci for metabolic syndrome (MetS) component traits but have had variable success in identifying susceptibility loci for the syndrome as an entity. We conducted a GWA study on MetS and its component traits in four Finnish cohorts comprising 2637 MetS cases and 7927 controls, all free of diabetes, and followed up the top loci in an independent sample with transcriptome and NMR-based metabonomics data. Furthermore, we tested for loci associated with multiple MetS component traits using factor analysis and built a genetic risk score for MetS.
Methods and Results
A previously known lipid locus, the APOA1/C3/A4/A5 gene cluster region (SNP rs964184), was associated with MetS in all four study samples (P=7.23×10⁻⁹ in meta-analysis). The association was further supported by serum metabolite analysis, in which rs964184 was associated with various VLDL, TG, and HDL metabolites (P=0.024–1.88×10⁻⁵). Twenty-two previously identified susceptibility loci for individual MetS component traits were replicated in our GWA and factor analyses. Most of these were associated with lipid phenotypes, and none with two or more uncorrelated MetS components. A genetic risk score, calculated as the number of risk alleles at loci associated with individual MetS traits, was strongly associated with MetS status.
Our findings suggest that genes in lipid metabolism pathways play a key role in the genetic background of MetS. We found little evidence of pleiotropy linking dyslipidemia and obesity to the other MetS component traits, such as hypertension and glucose intolerance.
metabolic syndrome; risk factors; genome-wide association study; meta-analysis; lipids
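A genetic risk score defined as a count of risk alleles, as in the MetS abstract, can be sketched as follows. The genotype encoding (two-letter strings), the SNP identifiers, and the choice to skip missing genotypes are illustrative assumptions, not details from the study.

```python
def count_risk_alleles(genotypes, risk_alleles):
    """Unweighted genetic risk score: the total number of risk alleles carried.

    genotypes:    dict mapping SNP id -> two-letter genotype, e.g. "AG"
    risk_alleles: dict mapping SNP id -> risk allele, e.g. "G"
    SNPs without a genotype are skipped here; a real analysis needs an explicit
    policy for missing data (for example, excluding or imputing them).
    """
    return sum(
        genotypes[snp].count(allele)
        for snp, allele in risk_alleles.items()
        if snp in genotypes
    )


# Hypothetical SNP ids and genotypes, for illustration only
person = {"snp1": "CG", "snp2": "TT"}
risk = {"snp1": "G", "snp2": "T", "snp3": "A"}
score = count_risk_alleles(person, risk)  # 1 + 2 = 3; snp3 has no genotype and is skipped
```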
Association testing of multiple correlated phenotypes can offer greater power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. Compared with univariate tests, the multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis.
In this study, we aim to identify novel genetic variants for metabolism, characterize their effects on nearby genes, and show that these nearby genes are associated with metabolism and atherosclerosis. To discover new genetic variants, we use an alternative to the traditional genome-wide association approach: we leverage the information in phenotype covariance to increase statistical power. We identify variants at seven novel loci and then show that our top signals drive expression of the nearby genes AQP9 and SERPINA1 in multiple tissues. We demonstrate that AQP9 and SERPINA1 expression, in turn, is associated with metabolite levels. Finally, we show that the genes are associated with atherosclerosis using mouse atherosclerotic lesion size (AQP9) as well as tissue from healthy human arteries and atherosclerotic plaques (AQP9 and SERPINA1). This study illustrates that multivariate analysis of correlated metabolites can substantially boost power for gene discovery. Further functional work is needed to elucidate the biological roles of SERPINA1 and AQP9 in atherosclerosis.
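The metabolic-network abstracts do not name the multivariate test. One common choice for testing a single SNP against a set of k correlated traits is canonical correlation analysis (CCA): with one SNP, the squared canonical correlation ρ² equals the R² of regressing genotype on all k traits, giving an exact F test, F = (ρ²/k) / ((1 − ρ²)/(n − k − 1)) with (k, n − k − 1) degrees of freedom. The pure-Python sketch below assumes that approach; `cca_single_snp` is a hypothetical name, and a real analysis would add covariates and use an optimized library.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cca_single_snp(genotypes, phenotypes):
    """Multivariate test of one SNP against k correlated traits.

    genotypes:  n allele dosages (0, 1, 2)
    phenotypes: n rows, each holding k trait values
    Returns (rho_squared, F, df1, df2).
    """
    n, k = len(genotypes), len(phenotypes[0])
    g_bar = sum(genotypes) / n
    y_bar = [sum(row[j] for row in phenotypes) / n for j in range(k)]
    # Trait covariance matrix and trait-genotype covariances
    s_yy = [[sum((row[i] - y_bar[i]) * (row[j] - y_bar[j]) for row in phenotypes) / (n - 1)
             for j in range(k)] for i in range(k)]
    s_yg = [sum((row[i] - y_bar[i]) * (g - g_bar) for row, g in zip(phenotypes, genotypes)) / (n - 1)
            for i in range(k)]
    s_gg = sum((g - g_bar) ** 2 for g in genotypes) / (n - 1)
    # rho^2 = s_yg' S_yy^-1 s_yg / s_gg, the R^2 of regressing genotype on the traits
    beta = _solve(s_yy, s_yg)
    rho2 = sum(b * c for b, c in zip(beta, s_yg)) / s_gg
    df1, df2 = k, n - k - 1
    f_stat = (rho2 / df1) / ((1 - rho2) / df2)
    return rho2, f_stat, df1, df2
```

With k = 1 this reduces to the squared Pearson correlation between genotype and trait, which is a convenient sanity check.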
High plasma HDL cholesterol is associated with reduced risk of myocardial infarction, but whether this association is causal is unclear. Because genotypes are randomly assigned at meiosis, are independent of non-genetic confounding, and are unmodified by disease processes, mendelian randomisation can be used to test the hypothesis that the association of a plasma biomarker with disease is causal.
We performed two mendelian randomisation analyses. First, we used as an instrument a single nucleotide polymorphism (SNP) in the endothelial lipase gene (LIPG Asn396Ser) and tested this SNP in 20 studies (20 913 myocardial infarction cases, 95 407 controls). Second, we used as an instrument a genetic score consisting of 14 common SNPs that exclusively associate with HDL cholesterol and tested this score in up to 12 482 cases of myocardial infarction and 41 331 controls. As a positive control, we also tested a genetic score of 13 common SNPs exclusively associated with LDL cholesterol.
Carriers of the LIPG 396Ser allele (2·6% frequency) had higher HDL cholesterol (0·14 mmol/L higher, p=8×10⁻¹³) but similar levels of other lipid and non-lipid risk factors for myocardial infarction compared with non-carriers. This difference in HDL cholesterol is expected to decrease risk of myocardial infarction by 13% (odds ratio [OR] 0·87, 95% CI 0·84–0·91). However, the 396Ser allele was not associated with risk of myocardial infarction (OR 0·99, 95% CI 0·88–1·11, p=0·85). In observational epidemiology, an increase of 1 SD in HDL cholesterol was associated with reduced risk of myocardial infarction (OR 0·62, 95% CI 0·58–0·66). However, a 1 SD increase in HDL cholesterol due to the genetic score was not associated with risk of myocardial infarction (OR 0·93, 95% CI 0·68–1·26, p=0·63). For LDL cholesterol, the estimate from observational epidemiology (a 1 SD increase in LDL cholesterol associated with OR 1·54, 95% CI 1·45–1·63) was concordant with that from the genetic score (OR 2·13, 95% CI 1·69–2·69, p=2×10⁻¹⁰).
Some genetic mechanisms that raise plasma HDL cholesterol do not seem to lower risk of myocardial infarction. These data challenge the concept that raising of plasma HDL cholesterol will uniformly translate into reductions in risk of myocardial infarction.
US National Institutes of Health, The Wellcome Trust, European Union, British Heart Foundation, and the German Federal Ministry of Education and Research.
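A standard estimator for a single-instrument mendelian randomisation analysis is the Wald ratio: the instrument's effect on the outcome divided by its effect on the exposure. The sketch below assumes that estimator and its first-order standard error; it is not necessarily the exact method used in the HDL study. `expected_or` shows how an observational odds ratio per unit of exposure translates into the odds ratio expected for a given genetic difference in exposure, which is the logic behind the "expected" OR of 0·87. Function names are hypothetical.

```python
import math


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Wald ratio estimate of the causal effect of an exposure X on an outcome Y.

    beta_gx: per-allele effect of the instrument on the exposure (e.g. mmol/L HDL cholesterol)
    beta_gy: per-allele effect of the instrument on the outcome (e.g. log odds of disease)
    se_gy:   standard error of beta_gy
    The first-order standard error ignores uncertainty in beta_gx.
    """
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se


def expected_or(log_or_per_unit, exposure_difference):
    """Odds ratio expected if an exposure difference had its observational effect.

    Both arguments must use the same exposure units (e.g. SD of HDL cholesterol).
    """
    return math.exp(log_or_per_unit * exposure_difference)


# Observational OR of 0.62 per 1 SD of HDL cholesterol, applied to a 1 SD difference
print(round(expected_or(math.log(0.62), 1.0), 2))  # 0.62
```

More generally, with the observational OR of 0·62 per SD, a genetic difference of d SD in HDL cholesterol corresponds to an expected OR of 0·62 raised to the power d; a Wald estimate on the log-odds scale can be converted to an OR with `math.exp`.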
Genetic effects contribute to individual differences in smoking behavior. Persistent smoking despite known harmful health effects is mostly driven by nicotine addiction. As the physiological effects of nicotine are mediated by nicotinic acetylcholine receptors (nAChRs), we aimed to examine whether single nucleotide polymorphisms (SNPs) in nAChR subunit (CHRN) genes, other than the CHRNA3/CHRNA5/CHRNB4 gene cluster previously shown to be associated in our sample, are associated with smoking quantity or serum cotinine levels.
The study sample consisted of 485 Finnish adult daily smokers (age 30–75 years, 59% men) assessed for the number of cigarettes smoked per day (CPD) and serum cotinine level. We first tested SNPs in selected nAChR subunit genes (CHRNA2, CHRNA4, CHRNA6/CHRNB3, CHRNA7, CHRNA9, CHRNA10, CHRNB2, CHRNG/CHRND), genotyped as part of a genome-wide association study, for single-SNP and multiple-SNP associations using ordinal regression. Next, we explored individual haplotype associations using a sliding-window technique.
At one of the eight loci studied, CHRNG/CHRND (chr2), the single-SNP (rs1190452), multiple-SNP, and 2-SNP haplotype (rs4973539–rs1190452) analyses all showed statistically significant associations with cotinine level. Median cotinine levels across the 2-SNP haplotypes ranged from 220 ng/ml (AA haplotype) to 249 ng/ml (AG haplotype). We observed no significant associations with CPD.
These results provide further evidence that the γ−δ nAChR subunit gene region is associated with cotinine levels but not with the number of CPD, illustrating the usefulness of biomarkers in genetic analyses.
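A sliding-window haplotype analysis steps a fixed-width window along adjacent SNPs and tests the haplotypes within each window. The sketch below shows only the windowing and haplotype tallying; it assumes phased haplotypes are already available (in practice phase is usually estimated statistically), and the marker "snp1" and the example haplotypes are hypothetical.

```python
from collections import Counter


def sliding_windows(snps, width=2, step=1):
    """Consecutive windows of `width` adjacent SNPs, advancing by `step` SNPs."""
    if width < 1 or step < 1:
        raise ValueError("width and step must be positive")
    return [tuple(snps[i:i + width]) for i in range(0, len(snps) - width + 1, step)]


def window_haplotype_counts(phased, positions):
    """Count haplotypes over the SNP positions of one window.

    phased:    list of phased haplotypes, each a sequence of alleles in SNP order
    positions: indices of the SNPs in the window
    """
    return Counter("".join(h[i] for i in positions) for h in phased)


# "snp1" is a hypothetical neighbouring marker; the other two ids come from the text.
snps = ["snp1", "rs4973539", "rs1190452"]
haplotypes = ["GAA", "GAG", "CAA"]  # hypothetical phased haplotypes
for window in sliding_windows(list(range(len(snps)))):
    print([snps[i] for i in window], dict(window_haplotype_counts(haplotypes, window)))
```

Each window's haplotype counts would then feed an association test against the outcome (here, cotinine level).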
Psychological stress has been suggested to accelerate the rate of biological aging. We investigated whether work-related exhaustion, an indicator of prolonged work stress, is associated with accelerated biological aging, as indicated by shorter leukocyte telomeres, the DNA-protein complexes that cap the ends of chromosomes.
We used data from a representative sample of the Finnish working-age population, the Health 2000 Study. Our sample consisted of 2911 men and women aged 30–64. Work-related exhaustion was assessed using the Maslach Burnout Inventory–General Survey. We determined relative leukocyte telomere length using a quantitative real-time polymerase chain reaction (PCR)-based method.
After adjustment for age and sex, individuals with severe exhaustion had leukocyte telomeres on average 0.043 relative units shorter (standard error 0.016) than those with no exhaustion (p = 0.009). The association between exhaustion and relative telomere length remained significant after additional adjustment for marital and socioeconomic status, smoking, body mass index, and morbidities (adjusted difference 0.044 relative units, standard error 0.017, p = 0.008).
These data suggest that work-related exhaustion is associated with an accelerated rate of biological aging. This hypothesis awaits confirmation in a prospective study measuring changes in relative telomere length over time.
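The telomere abstract does not detail the qPCR calculation. A widely used approach compares the cycle threshold (Ct) of a telomere assay (T) with that of a single-copy gene assay (S) and expresses the T/S ratio relative to a reference DNA sample. The sketch assumes this 2^(−ΔΔCt) formulation with equal, near-100% amplification efficiencies; the function name and arguments are hypothetical.

```python
def relative_ts_ratio(ct_telomere, ct_single_copy, ref_ct_telomere, ref_ct_single_copy):
    """Relative telomere length as a T/S ratio, normalised to a reference sample.

    Computes 2 ** -(dCt_sample - dCt_reference), where dCt = Ct(telomere) - Ct(single-copy gene).
    Assumes both assays amplify with ~100% efficiency (doubling per cycle).
    """
    delta_sample = ct_telomere - ct_single_copy
    delta_reference = ref_ct_telomere - ref_ct_single_copy
    return 2 ** -(delta_sample - delta_reference)


# A sample whose telomere signal crosses threshold one cycle earlier than the reference
# (with the same single-copy Ct) has about twice the relative telomere content.
print(relative_ts_ratio(14.0, 22.0, 15.0, 22.0))  # 2.0
```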
Phenotype mining is a novel approach for elucidating the genetic basis of complex phenotypic variation. It involves searching rich phenotype databases for measures correlated with genetic variation identified in genome-wide genotyping or sequencing studies. An initial implementation of phenotype mining in a prospective, unselected population cohort, the Northern Finland 1966 Birth Cohort (NFBC1966), identified neurodevelopment-related traits (intellectual deficits, poor school performance, and hearing abnormalities) that are more frequent among individuals with large (>500 kb) deletions than among other cohort members. The observation of extensive shared single nucleotide polymorphism haplotypes around the deletions suggests an opportunity to extend phenotype mining from cohort samples to the populations from which they derive.
To gain insight into the molecular mechanisms underlying insulin resistance, we compared the acute in vivo effects of insulin on adipose tissue transcriptional profiles in obese insulin-resistant and lean insulin-sensitive women.
Subcutaneous adipose tissue biopsies were obtained from 9 insulin-resistant and 11 insulin-sensitive women before and after 3 and 6 hours of intravenously maintained euglycemic hyperinsulinemia. Gene expression was measured using Affymetrix HG U133 Plus 2 microarrays and qRT-PCR. Microarray data analysis and pathway analyses were performed with Chipster v1.4.2 and in-house nonparametric pathway analysis software.
The most prominent difference in gene expression in the insulin-resistant group during hyperinsulinemia was reduced transcription of nuclear genes involved in mitochondrial respiration (mitochondrial respiratory chain, GO:0001934). Inflammatory pathways involving complement components (inflammatory response, GO:0006954) and cytokines (chemotaxis, GO:0042330) were strongly up-regulated in insulin-resistant compared with insulin-sensitive subjects, both before and during hyperinsulinemia. Furthermore, the insulin-resistant and insulin-sensitive subjects differed, especially during hyperinsulinemia, in genes contributing to fatty acid, cholesterol, and triglyceride metabolism (FATP2, ELOVL6, PNPLA3, SREBF1) and in genes involved in regulating lipolysis (ANGPTL4).
The major findings of this study were lower expression of the mitochondrial respiratory pathway and defective induction of lipid metabolism pathways by insulin in insulin-resistant subjects. Moreover, the study revealed several novel genes whose aberrant regulation is associated with the obese insulin-resistant phenotype.
Rationale: Genomic loci are associated with FEV1 or the ratio of FEV1 to FVC in population samples, but their association with chronic obstructive pulmonary disease (COPD) has not yet been proven, nor have their combined effects on lung function and COPD been studied.
Objectives: To test variants at five loci (TNS1, GSTCD, HTR4, AGER, and THSD4) for association with COPD, and to evaluate the joint effects of these single-nucleotide polymorphisms (SNPs) and of variants at the previously reported locus near HHIP on lung function and COPD.
Methods: By sampling from 12 population-based studies (n = 31,422), we obtained genotype data on 3,284 COPD case subjects and 17,538 control subjects for sentinel SNPs in TNS1, GSTCD, HTR4, AGER, and THSD4. In 24,648 individuals (including 2,890 COPD case subjects and 13,862 control subjects), we additionally obtained genotypes for rs12504628 near HHIP. Each allele associated with lung function decline at these six SNPs contributed to a risk score. We studied the association of the risk score with lung function and COPD.
Measurements and Main Results: Association with COPD was significant for three loci (TNS1, GSTCD, and HTR4) and the previously reported HHIP locus, and suggestive and directionally consistent for AGER and THSD4. Compared with the baseline group (7 risk alleles), carrying 10–12 risk alleles was associated with a reduction in FEV1 (β = –72.21 ml, P = 3.90 × 10⁻⁴) and FEV1/FVC (β = –1.53%, P = 6.35 × 10⁻⁶), and with COPD (odds ratio = 1.63, P = 1.46 × 10⁻⁵).
Conclusions: Variants in TNS1, GSTCD, and HTR4 are associated with COPD. Our highest risk score category was associated with a 1.6-fold higher COPD risk than the population average score.
FEV1; FVC; genome-wide association study; modeling risk
Attrition in longitudinal studies can lead to biased results. This study was motivated by the unexpected observation that alcohol consumption decreased despite increased availability, which may be due to attrition of heavy drinkers from the sample. Several imputation methods have been proposed but have rarely been compared in longitudinal studies of alcohol consumption. Imputing consumption levels is computationally challenging because alcohol consumption is a semi-continuous variable (a dichotomous drinking status and a continuous volume among drinkers) and because the continuous part is not normally distributed. Data come from a longitudinal study in Denmark with four waves (2003–2006) and 1771 individuals at baseline. Five techniques for handling missing data are compared: last value carried forward (LVCF) as a single imputation method, and Hotdeck, Heckman modelling, multivariate imputation by chained equations (MICE), and a Bayesian approach as multiple imputation methods. Predictive mean matching was used to account for non-normality: instead of imputing regression estimates, it imputes “real” observed values from similar cases. The methods were also compared on a simulated dataset. The simulation showed that the Bayesian approach yielded the least biased estimates. The finding of no increase in consumption levels despite higher availability remained unaltered.
panel surveys; missing data; multiple imputation; Bayesian models; alcohol consumption
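Two of the missing-data techniques compared in the alcohol study can be sketched briefly: last value carried forward (LVCF) and predictive mean matching (PMM). The PMM sketch assumes a single fully observed predictor, uses point estimates of the regression (full multiple imputation would also draw the regression parameters and repeat the process several times), and picks one donor at random among the k nearest predicted means; all names are hypothetical. Because the imputed value is always an observed value, skewed distributions are preserved.

```python
import random


def lvcf(waves):
    """Last value carried forward: fill None with the most recent observed value.

    Leading missing values stay None because nothing precedes them.
    """
    filled, last = [], None
    for value in waves:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def pmm_impute(x, y, k=3, seed=0):
    """Single-predictor predictive mean matching.

    x: fully observed predictor values; y: outcome values with None where missing.
    Each missing y receives the observed y of a donor drawn at random from the k
    observed cases whose predicted means are closest to the missing case's.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mean_x = sum(xi for xi, _ in observed) / len(observed)
    mean_y = sum(yi for _, yi in observed) / len(observed)
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed) / sxx
    intercept = mean_y - slope * mean_x

    def predict(value):
        return intercept + slope * value

    imputed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            imputed.append(yi)
            continue
        target = predict(xi)
        donors = sorted(observed, key=lambda obs: abs(predict(obs[0]) - target))[:k]
        imputed.append(rng.choice(donors)[1])
    return imputed
```

For a semi-continuous measure such as alcohol volume, a two-part version would first impute drinking status and then apply PMM only to volumes among (imputed) drinkers.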
Genome-wide association studies (GWASs) have identified a large number of variants (SNPs) associated with an increased risk of coronary artery disease (CAD). Recently, the CARDIoGRAM consortium published a GWAS based on the largest study population so far. They replicated twelve previously known associations and discovered thirteen new SNPs associated with CAD. We examined whether genetic profiling of these variants improves prediction of subclinical atherosclerosis, i.e., carotid intima-media thickness (CIMT) and carotid artery elasticity (CAE), beyond classical risk factors.
Subjects and Methods
We genotyped 24 variants identified in a population of European ancestry and measured CIMT and CAE in 2001 and 2007 in 2,081 and 2,015 subjects (aged 30–45 years in 2007), respectively, participating in the Cardiovascular Risk in Young Finns Study (YFS). The Bogalusa Heart Study (BHS; n = 1179; mean age 37.5 years) was used as a replication cohort. For additional replication, a subset of 5 SNPs was genotyped in 1,291 individuals aged 46–76 years participating in the Health 2000 population survey. We tested the impact of a genetic risk score (GRS24SNP/CAD), calculated as the sum of CAD risk alleles at the 24 studied variants weighted by their allelic odds ratios for CAD, on CIMT, CAE, the incidence of carotid atherosclerosis, and the progression of CIMT and CAE during a 6-year follow-up.
Neither CIMT nor CAE was significantly associated with GRS24SNP/CAD before or after adjustment for classical CAD risk factors (p>0.05 for all) in YFS or BHS. In YFS, CIMT and CAE were each associated with only one SNP. These findings were not replicated in the replication cohorts. In the meta-analysis, neither CIMT nor CAE was associated with any of the SNPs.
Genetic profiling using known CAD risk variants is unlikely to improve risk stratification for subclinical atherosclerosis beyond conventional risk factors among healthy young adults.
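The weighted score in the CIMT study sums risk alleles weighted by allelic odds ratios. A common convention, assumed here rather than stated in the abstract, is to weight each allele by the natural logarithm of its odds ratio so that the score is additive on the log-odds scale. The function name is hypothetical.

```python
import math


def weighted_grs(risk_allele_counts, odds_ratios):
    """Weighted genetic risk score: sum over variants of (risk-allele count) x ln(OR).

    risk_allele_counts: 0, 1, or 2 risk alleles per variant
    odds_ratios:        per-allele odds ratios for CAD, in the same variant order
    """
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("one odds ratio per variant is required")
    return sum(n * math.log(or_value) for n, or_value in zip(risk_allele_counts, odds_ratios))
```

Under this convention, and assuming multiplicative per-allele effects, exp(score) is the odds ratio implied for a genotype profile relative to carrying no risk alleles.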
A cost-efficient way to increase power in a genetic association study is to pool controls from different sources, so that genotyping effort can be directed to large case series. The Nordic Control database, NordicDB, has been set up as a unique resource in the Nordic area, and the data are available to authorized users through the web portal (http://www.nordicdb.org). The current version of NordicDB pools high-density genome-wide SNP information from ∼5000 controls originating from Finnish, Swedish, and Danish studies and shows country-specific allele frequencies for SNP markers. The genetic homogeneity of the samples was investigated using multidimensional scaling (MDS) analysis and pairwise allele frequency differences between the studies. The plot of the first two MDS components closely resembled the geographical placement of the samples, with a clear NW–SE gradient. We advise researchers to assess the impact of population structure when incorporating NordicDB controls in association studies. This harmonized Nordic database presents a unique genome-wide resource for future genetic association studies in the Nordic countries.
common controls; genome-wide data; Nordic Control Database; population stratification
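The pairwise allele frequency comparison used to check NordicDB homogeneity can be sketched as below. The abstract does not give the exact metric; this assumes the mean absolute allele-frequency difference across SNPs, with genotype counts in (AA, AB, BB) order. Function names and study labels are hypothetical.

```python
from itertools import combinations


def allele_frequency(genotype_counts):
    """Frequency of allele A from counts of (AA, AB, BB) genotypes."""
    aa, ab, bb = genotype_counts
    return (2 * aa + ab) / (2 * (aa + ab + bb))


def mean_abs_freq_diff(freqs_by_study):
    """Mean absolute allele-frequency difference across SNPs for every pair of studies.

    freqs_by_study: dict mapping study label -> list of allele frequencies (same SNP order)
    """
    result = {}
    for s1, s2 in combinations(sorted(freqs_by_study), 2):
        f1, f2 = freqs_by_study[s1], freqs_by_study[s2]
        result[(s1, s2)] = sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)
    return result
```

Pairs with unusually large differences would flag studies whose controls may need extra care before pooling.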
Upstream stimulatory factor 1 (USF1) is a ubiquitous transcription factor governing the expression of numerous genes of lipid and glucose metabolism. APOA5 is a well-established candidate gene regulating triglyceride (TG) levels and has been identified as a downstream target of USF1. No detailed studies of the effect of APOA5 on atherosclerotic lesion formation have been conducted, nor has its potential interaction with USF1 been examined.
Methods and Results
We analyzed allelic variants of USF1 and APOA5 in families (n=516) ascertained for atherogenic dyslipidemia and in an autopsy series of middle-aged men (n=300) with precise quantitative measurements of atherosclerotic lesions. The effect of previously associated APOA5 variants on TGs was observed in the dyslipidemic families, and variant rs3135506 was associated with the size of fibrotic aortic lesions in the autopsy series. The USF1 variant rs2516839, previously associated with atherosclerotic lesions, showed an effect on TGs in members of the dyslipidemic families with documented coronary artery disease. We provide preliminary evidence of gene-gene interaction between these variants for fibrotic lesion area in the abdominal aorta in the autopsy series (P=0.0028), for TGs in dyslipidemic coronary artery disease subjects (P=0.03), and for high-density lipoprotein cholesterol (P=0.008) in a large population cohort of coronary artery disease patients (n=1065), in which the interaction for TGs was not replicated.
Our findings in these unique samples reinforce the roles of APOA5 and USF1 variants in cardiovascular phenotypes and suggest that both genes contribute to lipid levels and aortic atherosclerosis, individually and possibly through epistatic effects.
genes; USF1; APOA5; lipids; atherosclerosis; epistasis
Patterns of genetic diversity have previously been shown to mirror geography on a global scale, within continents, and within individual countries. Using genome-wide SNP data on 5174 Swedes with extensive geographical coverage, we analyzed the genetic structure of the Swedish population. We observed strong differences between the far northern counties and the remaining counties. The population of Dalarna county, in north middle Sweden, which borders southern Norway, also appears to differ markedly from those of other counties, possibly because this county has more individuals with remote Finnish or Norwegian ancestry than other counties. An analysis of genetic differentiation (based on pairwise Fst) indicated that the populations of Sweden's southernmost counties are genetically closer to the HapMap CEU samples of Northern European ancestry than to the populations of Sweden's northernmost counties. In a comparison of extended homozygous segments, we detected a clear divide between southern and northern Sweden, with small differences among the southern counties and considerably more segments in northern Sweden. Both the increased degree of homozygosity in the north and the large genetic differences between south and north may have arisen from a small population in the north and the vast geographical distances between towns and villages there, in contrast to the more densely settled southern parts of Sweden. Our findings have implications for future genome-wide association studies (GWAS) with respect to the matching of cases and controls and the need for within-county matching. We have shown that genetic differences within a single country may be substantial, even when viewed on a European scale. Thus, population stratification needs to be accounted for even within a country like Sweden, which is often perceived as relatively homogeneous and a favourable resource for genetic mapping; otherwise, inferences based on genetic data may lead to false conclusions.
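Pairwise Fst between counties can be estimated from allele frequencies. The Swedish abstract does not name the estimator; the sketch below assumes Hudson's estimator, combined across SNPs as a ratio of averages, with sample sizes given as numbers of sampled chromosomes. The function name is hypothetical. Because of the sample-size correction, the estimate can be slightly negative when two populations are nearly identical.

```python
def hudson_fst(p1, p2, n1, n2):
    """Hudson's Fst between two populations, combined across SNPs as a ratio of averages.

    p1, p2: allele frequencies per SNP in populations 1 and 2
    n1, n2: numbers of sampled chromosomes in each population
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for finite-sample noise
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Probability that two alleles drawn from different populations differ
        den += a * (1 - b) + b * (1 - a)
    return num / den
```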
C-reactive protein (CRP) is a heritable marker of chronic inflammation that is strongly associated with cardiovascular disease. We aimed to identify genetic variants that are associated with CRP levels.
Methods and Results
We performed a genome-wide association (GWA) analysis of CRP in 66,185 participants from 15 population-based studies. We sought replication of the genome-wide significant and suggestive loci in a replication panel comprising 16,540 individuals from ten independent studies. We found 18 genome-wide significant loci and provided evidence of replication for eight of them. Our results confirm seven previously known loci and introduce 11 novel loci that are implicated in pathways related to the metabolic syndrome (APOC1, HNF1A, LEPR, GCKR, HNF4A, and PTPN2) or the immune system (CRP, IL6R, NLRP3, IL1F10, and IRF1), or that reside in regions not previously known to play a role in chronic inflammation (PPP1R3B, SALL1, PABPC4, ASCL1, RORA, and BCL7B). We found a significant interaction of body mass index (BMI) with LEPR (p<2.9×10⁻⁶). A weighted genetic risk score developed to summarize the effect of risk alleles was strongly associated with CRP levels and explained approximately 5% of the trait variance; however, there was no evidence that these genetic variants explain the association of CRP with coronary heart disease.
We identified 18 loci that were associated with CRP levels. Our study highlights immune response and metabolic regulatory pathways involved in the regulation of chronic inflammation.
genome-wide association; C-reactive protein; inflammation; epidemiology; coronary heart disease
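The "approximately 5% of the trait variance" explained by the CRP risk score is the R² from regressing the trait on the score. A minimal sketch, assuming a simple linear regression without covariates (CRP is typically log-transformed first, which is also an assumption here); the function name is hypothetical.

```python
def variance_explained(score, trait):
    """R^2 of a simple linear regression of the trait on the risk score (squared Pearson r)."""
    n = len(score)
    mean_s, mean_t = sum(score) / n, sum(trait) / n
    s_st = sum((s - mean_s) * (t - mean_t) for s, t in zip(score, trait))
    s_ss = sum((s - mean_s) ** 2 for s in score)
    s_tt = sum((t - mean_t) ** 2 for t in trait)
    return s_st ** 2 / (s_ss * s_tt)
```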
Serum concentrations of total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) are among the most important risk factors for coronary artery disease (CAD) and are targets for therapeutic intervention. We screened the genome for common variants associated with serum lipids in >100,000 individuals of European ancestry. Here we report 95 significantly associated loci (P < 5 × 10⁻⁸), with 59 showing genome-wide significant association with lipid traits for the first time. The newly reported associations include single nucleotide polymorphisms (SNPs) near known lipid regulators (e.g., CYP7A1, NPC1L1, and SCARB1) as well as in scores of loci not previously implicated in lipoprotein metabolism. The 95 loci contribute not only to normal variation in lipid traits but also to extreme lipid phenotypes and impact lipid traits in three non-European populations (East Asians, South Asians, and African Americans). Our results identify several novel loci associated with serum lipids that are also associated with CAD. Finally, we validated three of the novel genes—GALNT2, PPP1R3B, and TTC39B—with experiments in mouse models. Taken together, our findings provide the foundation to develop a broader biological understanding of lipoprotein metabolism and to identify new therapeutic opportunities for the prevention of CAD.
The lipid–leukocyte (LL) module is associated with, and reactive to, a wide variety of serum metabolites. The LL module appears to be a link between metabolism, adiposity, and inflammation. Serum metabolite concentrations themselves determine the connectedness of the LL module.
Comprehensive characterization of human tissues promises novel insights into the biological architecture of human diseases and traits. We assessed metabonomic, transcriptomic, and genomic variation in a large population-based cohort from the capital region of Finland. Network analyses identified a set of highly correlated genes, the lipid–leukocyte (LL) module, as having a prominent role in over 80 serum metabolites (of 134 measures quantified), including lipoprotein subclasses, lipids, and amino acids. Concurrent association with immune response markers suggested the LL module as a possible link between inflammation, metabolism, and adiposity. Further, genomic variation was used to generate a directed network and to infer that the LL module is largely reactive to metabolites. Finally, gene co-expression in circulating leukocytes was shown to depend on serum metabolite concentrations, supporting the hypothesis that the coherence of molecular networks is itself conditional on environmental factors. These findings show the importance and opportunity of systematic molecular investigation of human population samples. To facilitate and encourage such investigation, the metabonomic, transcriptomic, and genomic data used in this study have been made available as a resource for the research community.
bioinformatics; biological networks; integrative genomics; metabonomics; transcriptomics
Summary: The Sample avAILability system (SAIL) is a web-based application for searching, browsing, and annotating biological sample collections or biobank entries. By providing individual-level information on the availability of specific data types (phenotypes, genetic or genomic data) and samples within a collection, rather than the actual measurement data, SAIL can facilitate resource integration. A flexible data structure enables collection owners to describe their samples using existing or custom vocabularies. Users can query for available samples by various parameters, combining them with logical expressions. The system can be scaled to hold data from millions of samples with thousands of variables.
Availability: SAIL is available under the Affero GPL open-source license: https://github.com/sail.
Contact: email@example.com, firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online and from http://www.simbioms.org.
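SAIL's internal data model and query syntax are not described in the abstract. As a purely hypothetical illustration of the idea (availability flags rather than measurements, owner-defined annotations, and queries combined with logical AND, OR, and NOT), the sketch below composes simple predicates. None of these names come from SAIL itself.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One sample entry: what is available, not the measurements themselves."""
    sample_id: str
    collection: str
    available: set = field(default_factory=set)     # e.g. {"genotypes", "serum"}
    attributes: dict = field(default_factory=dict)  # owner-defined vocabulary


def has(data_type):
    return lambda s: data_type in s.available


def attr(name, value):
    return lambda s: s.attributes.get(name) == value


def all_of(*predicates):  # logical AND
    return lambda s: all(p(s) for p in predicates)


def any_of(*predicates):  # logical OR
    return lambda s: any(p(s) for p in predicates)


def not_(predicate):  # logical NOT
    return lambda s: not predicate(s)


def query(samples, predicate):
    """Return the ids of samples that satisfy a composed predicate."""
    return [s.sample_id for s in samples if predicate(s)]


samples = [
    Sample("s1", "cohortA", {"genotypes", "serum"}, {"sex": "F"}),
    Sample("s2", "cohortA", {"genotypes"}, {"sex": "M"}),
    Sample("s3", "cohortB", {"serum"}, {"sex": "F"}),
]
print(query(samples, all_of(has("genotypes"), attr("sex", "F"))))  # ['s1']
```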
Comparison of patients with coronary heart disease and controls in genome-wide association studies has revealed several single nucleotide polymorphisms (SNPs) associated with coronary heart disease. We aimed to establish the external validity of these findings and to obtain more precise risk estimates using a prospective cohort design.
We tested 13 recently discovered SNPs for association with coronary heart disease in a case-control design including participants different from those in the discovery samples (3829 participants with prevalent coronary heart disease and 48 897 controls free of the disease) and in a prospective cohort design including 30 725 participants free of cardiovascular disease from Finland and Sweden. We modelled the 13 SNPs as a multilocus genetic risk score and used Cox proportional hazards models to estimate the association of the genetic risk score with incident coronary heart disease. For the case-control analyses, we used logistic regression to analyse associations of individual SNPs and of genetic risk score quintiles with coronary heart disease.
In prospective cohort analyses, 1264 participants had a first coronary heart disease event during a median 10·7 years' follow-up (IQR 6·7–13·6). Genetic risk score was associated with a first coronary heart disease event. When compared with the bottom quintile of genetic risk score, participants in the top quintile were at 1·66-times increased risk of coronary heart disease in a model adjusting for traditional risk factors (95% CI 1·35–2·04, p value for linear trend=7·3×10⁻¹⁰). Adjustment for family history did not change these estimates. Genetic risk score did not improve C index over traditional risk factors and family history (p=0·19), nor did it have a significant effect on net reclassification improvement (2·2%, p=0·18); however, it did have a small effect on integrated discrimination index (0·004, p=0·0006). Results of the case-control analyses were similar to those of the prospective cohort analyses.
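The C index reported above measures how often a model ranks individuals correctly by risk; comparing models with and without the genetic risk score means computing it for both sets of predictions. The sketch below is a simplified Harrell's C for right-censored follow-up. It assumes pairs with tied follow-up times can simply be skipped, which the full estimator handles more carefully, and it does not cover net reclassification improvement or the integrated discrimination index.

```python
# Simplified Harrell's C index for right-censored follow-up data.
# Simplification (an assumption): pairs with tied follow-up times are skipped.
from itertools import combinations
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool],
              risk: Sequence[float]) -> float:
    """Fraction of usable pairs in which the individual who had the event
    earlier also had the higher predicted risk (risk ties count as 0.5)."""
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a = individual with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored before b's time: ordering of events unknown
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.1]))  # 1.0
```

A C index of 0.5 corresponds to random ranking, so a score that "did not improve C index" leaves this value essentially unchanged when it is added to the predictions.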
Using a genetic risk score based on 13 SNPs associated with coronary heart disease, we can identify the 20% of individuals of European ancestry who are at roughly 70% increased risk of a first coronary heart disease event. The potential clinical use of this panel of SNPs remains to be defined.
The Wellcome Trust; Academy of Finland Center of Excellence for Complex Disease Genetics; US National Institutes of Health; the Donovan Family Foundation.
A cluster of three nicotinic acetylcholine receptor genes on chromosome 15 (CHRNA5/CHRNA3/CHRNB4) has been shown to be associated with nicotine dependence and smoking quantity. The aim of this study was to clarify whether variation at this locus regulates nicotine intake among smokers, using the level of cotinine, a nicotine metabolite, as the outcome. The number of cigarettes smoked per day (CPD) and immunoreactive serum cotinine levels were determined in 516 daily smokers (age 30–75 years, 303 males) from the population-based Health 2000 study. The association of 21 SNPs from a 100 kb region of chromosome 15 with cotinine and CPD was examined. SNP rs1051730 showed the strongest association with both measures. However, this SNP accounted for a nearly five-fold larger proportion of the variance in cotinine levels than in CPD (R² 4.3% versus 0.9%). The effect size of the SNP was 0.30 for cotinine level, whereas it was 0.13 for CPD. Variation at the CHRNA5/CHRNA3/CHRNB4 cluster influences nicotine level, measured as cotinine, more strongly than smoking quantity, measured as CPD, and thus appears to be involved in regulating nicotine levels among smokers.
While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci that are associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. To address this, this study generated genome-wide profiles of both genetic and transcriptional variation from the whole blood of over 500 randomly selected, unrelated individuals. Using measurements of blood lipids, key players in the progression of atherosclerosis, three levels of biological information are integrated to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pairwise correlations between gene expression and lipid concentrations indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, the genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels, a key antibody in allergy. Structural Equation Modeling (SEM) indicated that the LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types and offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and the potential control of atherogenesis.
Circulating lipid concentrations are important predictors of coronary artery disease. The main pathology of coronary artery disease is atherosclerosis, a cycle of lipid adherence to arterial walls and an inflammatory response that leads to further adhesion. To investigate the link between lipids and immune cells in circulation, we generated both genomic and whole blood gene expression profiles for a population-based collection of individuals from the capital region of Finland. The expression of key mediators of inflammation and allergy was correlated with lipid levels. Further, the expression of these genes was so highly coordinated that they appeared to function as part of a single pathway, which was itself both highly correlated with and reactive to lipid levels. Our findings offer insight into how lipids activate circulating immune cells, potentially contributing to the pathogenesis of coronary artery disease.
To get beyond the “low-hanging fruits” so far identified by genome-wide association (GWA) studies, new methods must be developed to discover the numerous remaining genes that, according to heritability estimates, should contribute to complex human phenotypes such as obesity. Here we describe a novel integrative method for complex disease gene identification that combines genome-wide transcript profiling of adipose tissue samples with subsequent analysis of genome-wide association data generated in large SNP scans. We infer causal roles of genes in obesity by using a unique set of monozygotic twin pairs discordant for BMI (n = 13 pairs, age 24–28 years, 15.4 kg mean weight difference) and contrasting their transcript profiles with those from a larger sample of unrelated adults (N = 77). Using this approach, we identified 27 genes with possibly causal roles in determining the degree of human adiposity. Testing SNP variants in these 27 genes for association in the population samples of the large ENGAGE consortium (N = 21,000) revealed a significant deviation of the P-values from the expected distribution (P = 4×10⁻⁴). A total of 13 genes contained SNPs nominally associated with BMI. The top finding was the blood coagulation factor gene F13A1, identified as a novel obesity gene and replicated in a second GWA set of ∼2,000 individuals. This study presents a new approach to using gene expression studies to inform the choice of candidate genes for complex human phenotypes such as obesity.
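Because monozygotic co-twins share their genotype, each discordant pair contributes one within-pair expression difference (heavier minus leaner co-twin) in which genetic background cancels out. The abstract does not name the statistical test, so the sketch below assumes a paired t statistic as an illustrative stand-in; a p-value could then be obtained from the t distribution, for example with `scipy.stats.ttest_rel`.

```python
# Within-pair comparison for MZ twins discordant for BMI: each pair contributes
# one difference (heavier minus leaner co-twin), so shared genotype cancels out.
# A paired t statistic is an assumed stand-in; the abstract does not name the test.
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(heavier: Sequence[float], leaner: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for within-pair differences.
    Raises ZeroDivisionError if all differences are identical."""
    if len(heavier) != len(leaner) or len(heavier) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [h - l for h, l in zip(heavier, leaner)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Toy expression values for one gene in three twin pairs.
t, df = paired_t([5.0, 6.0, 7.0], [4.0, 4.0, 4.0])
print(round(t, 3), df)  # 3.464 2
```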
Obesity has a strong genetic component: an estimated 45%–85% of the variation in adult relative weight is genetically determined. Many genes have recently been identified in genome-wide association studies. The individual effects of these genes, however, have been very modest, and identifying them required very large sample sizes. New approaches are therefore needed to uncover further genetic variants that contribute to the development of obesity and related conditions. Much can be learned from studying gene expression in the adipose tissue of obese and non-obese subjects, but it is very difficult to distinguish expression differences that are reactions to obesity from those that reflect causal processes. We studied monozygotic twin pairs discordant for obesity and contrasted the gene expression profiles of obese and lean co-twins (which controls for genetic variation) with those from unrelated individuals, to try to discern the cause-and-effect relationships behind the observed changes in gene expression in fat. Testing the identified genes in 21,000 individuals revealed numerous new genes with possible roles in the development of obesity. Among the top findings was a gene involved in blood coagulation (Factor XIIIA1), possibly linking obesity with known complications including deep vein thrombosis, heart attack, and stroke.
Accelerated leukocyte telomere shortening has previously been associated with self-perceived stress and psychiatric disorders, including schizophrenia and mood disorders. We set out to investigate whether telomere length is affected in patients with anxiety disorders, for which stress is a known risk factor. We also studied the effects of childhood and recent psychological distress on telomere length. We used samples from the nationally representative, population-based Health 2000 Survey, carried out in Finland in 2000–2001 to assess major public health problems and their determinants. We measured the relative telomere length of peripheral blood cells by quantitative real-time PCR in 321 individuals with a DSM-IV anxiety disorder or subthreshold diagnosis and 653 matched controls aged 30–87 years, all of whom had undergone the Composite International Diagnostic Interview. While telomere length did not differ significantly between cases and controls in the entire cohort, the older half of the anxiety disorder patients (48–87 years) had significantly shorter telomeres than healthy controls of the same age (P = 0.013). Interestingly, shorter telomere length was also associated with a greater number of reported childhood adverse life events, among both anxiety disorder cases and controls (P = 0.005). Childhood chronic or serious illness was the single event most significantly associated with telomere length in adulthood (P = 0.004). Self-reported current psychological distress did not affect telomere length. Our results suggest that childhood stress might lead to the accelerated telomere shortening seen in adulthood. This finding has potentially important implications, supporting the view that childhood adversities might have a considerable impact on well-being later in life.
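The abstract reports relative telomere length from quantitative real-time PCR without giving the formula. A common approach, assumed here rather than taken from the study, is a telomere-to-single-copy-gene (T/S) ratio computed with the comparative Ct (delta-delta Ct) method, which also assumes equal amplification efficiency for both reactions.

```python
# Relative telomere length as a T/S ratio via the comparative Ct (delta-delta Ct)
# method. Assumptions: this specific formula, and equal amplification efficiency
# for the telomere (T) and single-copy gene (S) reactions; the abstract only
# says relative telomere length was measured by quantitative real-time PCR.


def relative_telomere_length(ct_telomere: float, ct_single_copy: float,
                             ref_ct_telomere: float, ref_ct_single_copy: float,
                             efficiency: float = 2.0) -> float:
    """T/S ratio of a sample relative to a reference DNA sample.

    efficiency = 2.0 means the product doubles every cycle.
    """
    delta_sample = ct_telomere - ct_single_copy
    delta_reference = ref_ct_telomere - ref_ct_single_copy
    return efficiency ** -(delta_sample - delta_reference)


# Telomere signal appears one cycle earlier (relative to the single-copy gene)
# than in the reference, i.e. about twice as much telomeric DNA.
print(relative_telomere_length(15.0, 20.0, 16.0, 20.0))  # 2.0
```

A lower Ct means the signal crosses threshold sooner, so a sample whose telomere Ct falls relative to its single-copy Ct gets a larger T/S ratio, that is, longer telomeres.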
The global prevalence of obesity has increased significantly in recent decades, mainly due to excess calorie intake and increasingly sedentary lifestyles. Here, we test the association between obesity, measured by body mass index (BMI), and one of the best-known genetic variants showing evidence of strong selective pressure: the functional variant in the cis-regulatory element of the lactase gene. We chose this variant because it is presumed to provide a nutritional advantage in specific physical and cultural environments. We genetically defined lactase persistence (LP) in 31 720 individuals from eight European population-based studies and one family study by genotyping or imputing the European LP variant (rs4988235). We performed a meta-analysis by pooling the β-coefficient estimates of the relationship between rs4988235 and BMI from the nine studies and found that carriers of the allele responsible for LP among Europeans had higher BMI (P = 7.9 × 10⁻⁵). Because this locus has been shown to be prone to population stratification, we paid special attention to detecting any population substructure that might be responsible for the association signal. The strongest evidence against stratification came from the Dutch family sample, whose design is robust to stratification. In this study, we highlight issues in model selection in genome-wide association studies and problems in imputing these special genomic regions.
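Pooling per-study β-coefficients is typically done by inverse-variance weighting. The abstract does not state whether a fixed-effect or random-effects model was used, so the sketch below assumes the fixed-effect version: each study is weighted by 1/SE², and a two-sided p-value is taken from the normal distribution. The input numbers in the example are hypothetical, not the study's estimates.

```python
# Inverse-variance-weighted fixed-effect meta-analysis of per-study betas.
# Assumption: the abstract says beta estimates were pooled but does not state
# fixed vs random effects; this sketch shows the fixed-effect version.
from math import erfc, sqrt
from typing import Sequence, Tuple


def fixed_effect_meta(betas: Sequence[float],
                      ses: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (pooled beta, pooled SE, z, two-sided p)."""
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one SE per beta and at least one study")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = sqrt(1.0 / total)
    z = pooled / pooled_se
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, z, p


# Two hypothetical studies with equal precision: the pooled beta is their mean.
print(fixed_effect_meta([0.1, 0.3], [0.1, 0.1]))
```

With equal standard errors the weights are equal and the pooled estimate is the simple mean; more precise studies otherwise dominate the pooled β.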