|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: TZ KJL JL NH TM SB FC. Performed the experiments: TZ PSW CRS RBS DM WR CP CP SB. Analyzed the data: SS MR AS RC SM MG ME EL VN DT AZ LT FC. Contributed reagents/materials/analysis tools: TZ PSW KJL HR ME CRS RBS EL DM WR AZ. Wrote the paper: TZ LT SB FC.
Variability of gene expression in human may link gene sequence variability and phenotypes; however, non-genetic variations, alone or in combination with genetics, may also influence expression traits and have a critical role in physiological and disease processes.
To get better insight into the overall variability of gene expression, we assessed the transcriptome of circulating monocytes, a key cell involved in immunity-related diseases and atherosclerosis, in 1,490 unrelated individuals and investigated its association with >675,000 SNPs and 10 common cardiovascular risk factors. Out of 12,808 expressed genes, 2,745 expression quantitative trait loci were detected (P<5.78×10−12), most of them (90%) being cis-modulated. Extensive analyses showed that associations identified by genome-wide association studies of lipids, body mass index or blood pressure were rarely compatible with a mediation by monocyte expression level at the locus. At a study-wide level (P<3.9×10−7), 1,662 expression traits (13.0%) were significantly associated with at least one risk factor. Genome-wide interaction analyses suggested that genetic variability and risk factors mostly acted additively on gene expression. Because of the structure of correlation among expression traits, the variability of risk factors could be characterized by a limited set of independent gene expressions which may have biological and clinical relevance. For example expression traits associated with cigarette smoking were more strongly associated with carotid atherosclerosis than smoking itself.
This study demonstrates that the monocyte transcriptome is a potent integrator of genetic and non-genetic influences of relevance for disease pathophysiology and risk assessment.
The transcriptome, i.e. the whole set of RNA transcripts in a cell, is generally conceived as a system whose major function is to pass information encoded in the genome sequence to the realm of phenotypes that underlie physiological and pathological traits. This messenger paradigm justifies the current interest for the genetics of gene expression – which has been further enhanced by the numerous associations between genetic markers and diseases reported in recent genome-wide association studies (GWAS) and the expected relevance of genome wide expression (GWE) to characterize the biological basis of these associations –. However the variability of gene expression not only reflects genetic variation but depends on other factors as well such as environmental exposures , , metabolic conditions , ageing ,  or gender –.
Based on these premises we reasoned that if the state of the transcriptome and its changes are important determinants of cell functions, differences in transcript abundance whatever their origin, genetic or non genetic, may contribute to disease pathogenesis. Moreover, the transcriptome might integrate information from numerous sources and inform on the current pathophysiological state of the organism. To assess these possibilities, a global characterization of the variability of the transcriptome, integrating genetic and non genetic influences was undertaken. The study focused on peripheral blood monocytes because these cells may be isolated from an easily accessible tissue and play a key role in the pathogenesis of immune disorders and atherosclerosis-related diseases . In addition, working on a single cell type reduces the complexity of transcriptome data and may avoid possible biases resulting from the heterogeneous cell-types distribution in different samples as it is the case when using whole blood or leucocytes RNAs.
To reduce potential artefacts, fresh samples were collected and processed in a short period of time according to a very strict protocol. Monocytes were obtained from 1,490 unrelated individuals, 730 women and 760 men, aged 35 to 74 years, recruited in the Gutenberg Heart Study (GHS), a community-based project conducted in a single centre in the region of Mainz (Germany) (Table 1). GWE profiles were generated using Illumina Human HT-12 expression BeadChips, and after normalization and filtering out genes undetected in monocytes or non-well characterized (see Materials and Methods), 12,808 expressions traits (averaged over probes) remained for analysis.
Genotyping was performed using Affymetrix 6.0 arrays. After filtering out SNPs poorly performing or having a minor allele frequency <0.01, 675,350 SNPs were kept for further analyses. All associations between SNPs and expression traits with a P-value <10−5 (n>225,000) were stored in the “GHS_Express” database (“GHS_Express” is available online, see Methods S1). At a study-wise threshold of significance correcting for the number of SNPs and expressions (P<5.78×10−12), 37,403 associations, involving 29,912 SNPs and 2,745 expression traits (referred to as eSNPs and eQTLs, respectively), were identified (Table 2). The median number of eSNPs by eQTL was 11 with an interquartile range of 4 to 26. Owing to its large sample size, the study had an 80% power to detect a SNP effect accounting for 4% or more of the variability (R2) of any expression trait. Among the 2,745 significant eQTLs, the R2 observed for the best eSNP varied from 3.1% to >80% with a median of 7.7%. For 290 eQTLs, the R2 was greater than 25%.
Associations involving SNPs located within 1 Mb of either the 5′ or 3′ end of the associated gene were considered cis (File S1) and other associations were considered trans. In accordance with previous results –, most of the genetic variability affecting the transcriptome was of cis origin. At study-wise significance, the number of cis and trans eQTLs were 2,477 and 349, respectively, yielding a cis/trans ratio of 7.1 (81 eQTLs were both cis- and trans-modulated). At less stringent levels of significance the number of trans associations considerably increased, as expected by chance, whereas the number of cis eQTLs only modestly increased, indicating that the high stringency used for cis eQTLs identification did not result in an important under-estimation of the true number of cis eQTLs (Figure 1).
We examined the overlap between the cis eQTLs identified in the present study and those found in three previous association studies in which gene expression was explored in LCLs ,  and hepatic cells . For this comparison, a significance threshold of 3.9×10−6 (Bonferroni-corrected for 12,808 genes) was used for the analysis of eQTLs in GHS data, corresponding to a single hypothesis tested per gene. Among the cis eQTLs considered significant in each of the studies and involving expression traits detected in GHS, 66.7%, 56.5% and 54.1%, respectively, were significant in our data (Table 3, Files S2–S4). The proportion of cis eQTLs that replicated in GHS increased with the increasing level of significance reported in each study, consistent with the fact that stronger associations are more robust and more likely to be shared by different types of cells. These comparisons revealed a relatively high rate of replication of the previous findings in GHS. However, as a consequence of its greater power, P-values observed in GHS were considerably lower than those previously reported (Figure 2).
We also examined the overlap between cis eQTLs in GHS and cis-heritable eQTLs found by expression profiling of lymphocyte RNA in the San Antonio Family Heart Study (SAFHS) 4. Among the eQTLs with a cis heritability ≥0.1 in SAFHS, 62% were significantly cis modulated in GHS, and this proportion reached 89% for heritabilities ≥0.6 (Figure 3, File S5).
Trans associations showed much weaker consistency across studies. Among the 50 eQTLs having a trans lod score >4.0 in SAFHS  with corresponding expression detected in GHS, only one, MAPK8IP1, was replicated in GHS (P<10−300). Replication of the trans associations in studies of similar power as the present one would be of interest.
For all probes present on the Illumina HT12 array, a systematic search for sequence polymorphisms was undertaken, using the HapMap database as reference (Release 27; Phase II+III, Feb09, on NCBI B36 assembly and dbSNP b126). Among the 2,477 genes whose expression was associated with cis eSNPs, 173 (7%) were probed by one or several polymorphic sequences (180 probes) (Table S1). For 32 of these probes, the HapMap SNP was present on the Affymetrix array used in this study and for 41 other probes, the HapMap SNP had one or several perfect proxies on the array. For those eQTLs, we cannot exclude the possibility of an artefactual association due to a differential binding of the probe to its target sequence.
A link between genetic variability and clinical phenotypes is supported in human studies by several observations relating variants in regulatory gene regions to protein phenotypes or diseases , . However, this message-passing paradigm has never been evaluated on a genome-wide basis. We therefore tested whether monocytes gene expression might mediate the effects of loci recently identified by GWAS of cardiovascular risk factors. For each locus identified in GWAS of lipid variables , blood pressure (BP)  and body mass index (BMI) , we selected the lead SNP or a tag SNP having an r2≥0.8 with the lead SNP in GHS data. Associations between lead/tag SNPs and corresponding risk factors were checked in all GHS subjects for whom genome-wide data was available (n=3,175). Most previous GWAS loci for circulating lipids were replicated in our data (Table 4) but only few of the findings of GWAS of BMI and BP were replicated (Table S2). This low replication is probably due to a lack of power, as the maximum R2 observed in the GWAS of BP  was 0.09% and the power of GHS to replicate such an association was only 38%.
For each GWAS locus, we examined whether the lead/tag SNP correlated with any expression trait in GHS data and when a significant association was found, we checked whether the expression trait was significantly associated with the risk factor under consideration. This analysis revealed that very few GWAS results were compatible with an effect mediated by gene expression at the locus (Table 4 and Table S2). There were, however, two exceptions: the first one concerned the LPL locus, where the minor allele of rs17489282 was associated with higher HDL-cholesterol (P=5.91×10−5) and LPL expression (P=2.18×10−6), while HDL-cholesterol and LPL expression were positively correlated (P=6×10−4), consistent with an effect mediated by LPL; the second one concerned the association between the 1p13.3 locus and LDL-cholesterol. This locus encompasses three potential candidate genes, CELSR2, PSRC1 and SORT1, and it has been suggested that CELSR2 or SORT1 could be responsible for the reported associations of this locus with LDL , , . In our data, the minor allele of rs629301 (a perfect tag of the lead SNP identified by GWAS), was associated with lower LDL-cholesterol (P=2.6×10−4) and higher PRSC1 expression (P=2.3×10−56) while PRSC1 expression and LDL-cholesterol were negatively correlated (P=0.019). Results for CELSR2 were much less consistent and SORT1, the third gene at the locus, was not cis-modulated in monocytes.
Several loci associated with coronary artery disease (CAD) have been identified by GWAS –. The strongest association involves SNPs in the 9p21 region. Recently it was reported that deletion in mice of the region orthologous to the 9p21 CAD interval in human affects the expression of the nearby cdkn2a and cdkn2b genes as well as the properties of proliferation of vascular cells . The Cyclin-dependent kinase inhibitor coding genes, CDNK2A and CDNK2B, are also located close to the CAD locus in humans. CDKN2A expression in monocytes was not detected in our study, we therefore focused our analysis on CDKN2B. All SNPs available in GHS in the region encompassing the CAD locus were tested for association with the expression of CDKN2B. Figure 4 shows that CDKN2B expression was strongly associated with several SNPs located in a region upstream of the gene sequence (P<10−60). However, these SNPs were not associated with CAD (this result was obtained in a yet unpublished GWAS comparing GHS individuals to a cohort of CAD patients), whereas proxies of the CAD-associated SNPs were unrelated with CDKN2B expression (see legend of Figure 4 for more details). The SNPs associated with CDKN2B expression are located within the sequence of the non-coding alternatively spliced gene ANRIL (also named CDKN2BAS) whose implication in the association with CAD has been hypothesized . Although our results are limited by the fact that neither CDKN2A nor ANRIL expressions could be evaluated, they reveal that in humans, SNPs that affect CDKN2B expression are different from those that are known to affect CAD risk (Figure 4).
To investigate gene expression in relation to risk factors (age, gender, BMI, HDL and LDL cholesterol, triglycerides, Systolic and Diastolic Blood pressure, smoking and plasma CRP), the study-wise significance threshold was set at 3.9×10−7 to correct for the number of risk factors (n=10) and expressions (n=12,808) tested. Overall, 1,662 expression traits (13.0%) were associated with at least one risk factor (Table 5 and File S6). Gender and age were the two major factors influencing expression levels (807 gene expressions were affected by gender and 396 by age). BMI, smoking and C-reactive protein (CRP) levels were also correlated with numerous expression traits (230, 294 and 328, respectively). Conversely, few associations with BP and lipids were observed (Table 5).
Cis eQTLs were over-represented among expression traits that were also affected by gender, age, BMI, smoking and CRP, with odds ratios as high as 3.24 for cigarette smoking (Table 5). This suggests that some genes are more responsive than others to the influence of multiple factors. For expression traits that were simultaneously associated with cis eSNPs and risk factors (n=465), we determined the joint effects of the two sources of variability on expression level. For this purpose, each eQTL was modelled as a function of the best cis-acting eSNP, the associated risk factor and the interaction between the two. When several risk factors were associated with the same eQTL, each of them was tested separately for interaction with the corresponding SNP. The best FDR-corrected P-values for interaction (File S7) were 0.014 (ISCU expression, gender and rs4830487) and the second one was 0.042 (HIST1H2AE expression, smoking and rs16891378). This first genome-wide exploration of interaction between cis eSNPs and risk factors on gene expression therefore suggests that the two sources of variability mostly act additively on expression. This is illustrated for eQTLs affected by smoking in Figure 5. It must be noted however that despite the large size of this study, its power may nevertheless be insufficient to assess weak interaction.
An ontology analysis using the Panther system demonstrated that, by reference to the list of 12,808 genes expressed in monocytes, the set of 465 expression traits affected by multiple sources of variability was enriched in “Immunity and defense” genes (69 observed/35.8 expected, P=4.5×10−6), especially in the sub-categories of “Macrophage-mediated immunity” (18/3.6, P=7.5×10−6).
The preceding analyses revealed that each risk factor was associated with a large number of expression traits, thus emphasizing the multiple inter-relations existing between the transcriptomic and risk factor profiles of an individual. While it is important from a biological and mechanistic perspective to characterize at best all the genes that are influenced by a given condition, from a clinical perspective, it might be more relevant to identify a limited set of gene expressions that could efficiently discriminate individuals with different risk profiles. Indeed, because of the tight co-regulation of genes within biological systems, numerous gene expressions are inter-correlated and consequently, their association with risk factors are not independent. To account for this inter-dependency, we conducted a multivariate analysis to identify expression traits that were independently associated with each risk factor.
To obtain reliable results, we randomly divided the study population into two sub-samples of equal size which were used for screening and validation purposes respectively (see Materials and Methods). In these analyses, each risk factor was considered separately whereas all expression traits were envisaged jointly for their potential association with the risk factor, considered here as the dependent variable. The screening/validation procedure was repeated 250 times and for each risk factor, we report expression traits associated (P<0.01) with the risk factor in more than 25% of the replicates. This stringent approach led to the identification of 106 independent expression correlates for the ten risk factors (Table 6), a much reduced number compared to the 1,662 expression traits previously identified by the one-to-one association analysis presented in Table 5.
Even after exclusion of sex-linked genes, gender was independently associated with the largest number of expression traits (n=31) which, considered all together, contributed to a highly significant discrimination between males and females (P<10−100). By contrast the number of expression traits independently associated with age was more limited (n=12).
Both factors were independently associated with 12 and 14 expression traits respectively. Inspection of the genes listed in Table 6 shows that several of them, including CX3CR1, CD209, CLEC10A, FCER1A, FCGBP, C1RL, C1QB, CD36, ADM and VSIG4, encode proteins involved in the differentiation or maturation of immunity-related cells and in host defence –. We may speculate that the variability of expression of these genes is the consequence of an already present heterogeneity of monocytes ,  or reflects a particular transcription pattern that prefigures future functional changes. The example of CX3CR1 which was positively associated with both BMI and CRP is particularly interesting as this gene encodes the fractalkine receptor whose role is essential in the migration of monocytes to sites of inflammation and injury, especially in atherosclerotic lesions .
ABCA1 and ABCG1 gene expressions were both associated with circulating lipids. The proteins encoded by these genes are key players in reverse cholesterol transport and the regulation of lipid-trafficking mechanisms in macrophages respectively , . MYLIP (Idol) is a ligase involved in the ubiquitination and degradation of LDL receptors ,  and SCD is a stearoyl-CoA desaturase involved in the conversion of saturated into monounsaturated fatty acids that regulates lipid metabolism and may be modulated by dietary intake .
One of the main independent correlates of SBP was ARID5B (MRF2), whose relevance in the physiology of BP regulation is supported by its role as a regulator of smooth muscle differentiation and proliferation . GFOD1, another expression correlate of SBP and DBP is a gene of unknown function which has been associated with attention deficit hyperactivity disorder .
Smoking was independently associated with 18 expression traits (Table 6) which, considered all together, contributed to a highly significant discrimination between smokers and non smokers (P<10−107, R2>50%). Nine of these genes were modulated by cis eSNPs (Table 7) and as already mentioned above, the genetic and smoking effects on these gene expressions were additive (Figure 5).
Among the 18 expression traits associated with smoking, SASH1, P2RY6 and PTGDS were systematically retrieved in all screening/validation replicates (Table S3). SASH1 is a tumor suppressor gene , P2RY6 encodes a G-protein-coupled receptor involved in the proinflammatory response to UDP in monocytes  and PTGDS encodes a prostaglandin D synthase involved in smooth muscle contraction/relaxation and inhibition of platelet aggregation, two functions known to be modified by tobacco consumption. Recently, in a GWE study of leukocytes RNA, PTGDS and SASH1 expressions were found associated with cotinine, a metabolite of nicotine used as a marker of tobacco exposure .
Cigarette smoking is a major risk factor for atherosclerosis . In GHS participants, the prevalence of atherosclerotic plaques in the right and left carotid arteries, assessed by echography, was strongly increased in smokers (P=9.1×10−7 after adjustment for age and gender). Among the 18 gene expressions independently associated with smoking, four were individually correlated with the number of carotid plaques, PTGDS (P=1.8×10−7) negatively and MMP25 (P=3.5×10−4), SASH1 (P=1.4×10−5) and WWC3 (P=6.3×10−4) positively (Table 7). In a multivariate model including the four expression traits and smoking, as well as age and gender, PTGDS (P=5.7×10−4) and SASH1 (P=0.012) remained significantly associated with the number of plaques whereas MMP25 (P=0.09), WWC3 (P=0.10) and smoking (P=0.5) were no longer significant, suggesting that the association between smoking and atherosclerosis was mostly reflected (or mediated) by its effect on the expression of these four genes. The fact that PTGDS and SASH1 expression remained associated with carotid plaques after adjustment on smoking status may indicate a broader implication of these genes in atherosclerosis than the sole effect induced by smoking. However it is also possible that the expression of these two genes more faithfully reflects tobacco consumption than the dichotomous variable used to define smoking. This illustrates the dual aspect of the transcriptome which may be viewed either as an element in a causal chain or as reflecting ongoing processes with no implied causation. Because genetics may help to dissect causal pathways, we examined whether the best cis eSNPs associated with expression of the smoking-related genes were also associated with carotid atherosclerosis, but no such association was detected (Table 7).
This large-scale investigation of the transcriptome of monocytes in healthy individuals provides new biological insights into the mechanisms by which gene expression might contribute to disease pathogenesis. In the line of previous studies –, we could build a detailed map of cis-regulated eQTLs in monocytes. Even if cell-specific eQTLs exist , a large fraction of them are likely to be common to other cell types, and the eQTL map provided here constitutes the most extensive one so far.
Despite the large number of eQTLs identified, the transcriptome of circulating monocytes, contrary to initial expectations –, appeared of modest help to dissect the relationship between genome variability and complex human traits such as cardiovascular risk factors. One explanation for this finding might be that monocytes are not the most relevant cells for unravelling links between genome variation and the risk factors investigated. With regard to circulating lipids for example, only 28 of the 45 genes located in regions harbouring SNPs associated with circulating lipids in GWAS  were expressed in monocytes. The difficulty to corroborate the messenger paradigm in human clinical studies may also relate to the fact that the linear model of causality generally assumed to reflect the relation between genome variability, expression and phenotype may be too simplistic to account for a much more complex biological reality. The effects of genetic variants may be too weak to allow detection even in a study of this size. It is also important to keep in mind that most reported eSNPs are acting in cis, whereas trans eSNPs may actually be those that mainly drive the changes in gene expression that affect disease risk.
Most importantly, the present study highlighted for the first time the strong link existing between the transcriptome of an individual and his (her) clinical and epidemiological profile. The fact that the transcriptome tightly mirrors the variability of risk factors at a cellular level may have profound implications from a biological and clinical perspective. Until now, the traditional way of viewing the role of genes in the susceptibility to human diseases was through the effect of their variability of sequence. The present findings suggest that another important, if not greater, impact of genes on human phenotypes relates to the variability of their expression, whatever the origin of this variability. The global association observed between most cardiovascular risk factors and the transcriptome and the fact that each risk factor could be characterized by a limited and specific set of independent gene expressions further suggests that this relationship might be clinically relevant. This was particularly well illustrated by the response of the transcriptome to cigarette smoking. We showed that less than 20 genes among the 12,000 expressed in monocytes could highly discriminate smokers and non-smokers, and among them, four genes were sufficient to account for the strong association existing between smoking and atherosclerosis. Whether these genes are causally involved in the mechanisms linking smoking to the development of atherosclerotic plaques or whether they are only markers of ongoing pathological processes remains to be elucidated.
In conclusion, the variability of the transcriptome of monocytes can be viewed from two perspectives. On one hand it reflects the accumulation of effects originating from the genome and the environment and may inform on a number of ongoing processes relevant to disease. On the other hand, it may reflect or anticipate differences in monocytes biology that could have pathophysiological implications. This dual perspective suggests that a better understanding of the sources of variability of the transcriptome of monocytes and other easily accessible cells, will contribute in an important way to our understanding of complex diseases.
The study protocol and drawing of the blood sample have been approved by the local ethics committee and by the local and federal data safety commissioners (Ethik-Kommission der Landesärztekammer Rheinland-Pfalz). All subjects included signed an informed consent.
The Gutenberg Heart Study (GHS) is designed as a community-based, prospective, observational single-center cohort study in the Rhein-Main region in western mid-Germany. The primary aim of the study is to improve the individual cardiovascular risk prediction by identifying genetic and non genetic risk factors contributing to cardiovascular diseases, with a strong emphasis on atherosclerosis.
A sample of eligible participants was randomly drawn from the registers of the local registry offices in the city of Mainz and the district of Mainz-Bingen. This sample was stratified in a ratio of 11 for gender and residence, and in equal numbers for decades of age. Inclusion criteria were an age between 35 and 74 years and a written consent; exclusion criteria were insufficient knowledge of the German language to understand explanations and instructions, and physical or psychic inability to participate in the examinations in the study center. Individuals were invited for a 5-hour baseline-examination to the study center where clinical examinations and collection of blood samples were performed. The present analysis was based on an initial sample of 3,336 subjects successively enrolled into the GHS from April 2007 to April 2008. Genomic DNA was isolated from all participants. Monocyte RNA was isolated from half of the participants recruited each day to ensure rapid sample processing and isolation of total RNA. For approximately 1,500 study participants, both DNA and RNA were available.
Blood pressure measurements were performed by an automated sphygmomanometer blood pressure meter (Omron 705CP-II, OMRON Medizintechnik Handesgesellschaft GmbH, Germany) after 5, 8 and 11 minutes of rest. The mean from the 2nd and 3rd standardized measurement was calculated for the systolic and diastolic blood pressure. For the anthropometric measurements, calibrated, digital scales (Seca 862, Seca Germany), a measuring stick (Seca 220, Seca, Germany) and a waist measuring tape were used. The blood sampling was carried out under fasting conditions. HDL-cholesterol, LDL-cholesterol, triglycerides and C-reactive protein (CRP) measurements were performed on an Architect c8000 by commercially available tests (CRP, Ultra HDL, Direct LDL and Triglycerides) from Abbott (www.abbottdiagnostics.de). All tests were measured under standardized conditions in an accredited laboratory of the institute of clinical chemistry and laboratory medicine at the University of Mainz. Smoking was defined by dichotomizing the population into non-smokers (never smokers and former smokers) and smokers (occasional smoker, i.e. <1 cigarette/day, and smoker, i.e. >1 cigarette/day).
IMT was assessed with an ie33 ultrasound system (Philips, NL) using an 11 to 3 MHz linear array transducer. Experienced technologists blinded to participants' clinical data made all ultrasound measurements. The IMT was visualized bilaterally at the far wall of the CCA. In brief, a cursor representing the region of interest (10 mm) was positioned 1 cm in front of the beginning before the carotid bulb. Evaluation was performed using an automatic computerized system (Philips, NL Qlab software) and triggering was performed according to the Q wave of the ECG to enable measurement in complete relaxation of the ventricle. IMT was recorded 1 cm before the carotid bulb in a part without plaque on the left and right side. As mean IMT, the CCA was reported with the sum of IMT of the left and right side and afterwards divided by two. Plaques were defined as thickening of the IMT of at least 1.5 mm and presence was checked in all measured arteries. The number of plaques from both sides was recorded and subjects being classified as plaque positive when at least one plaque was measured on either side or plaque negative, when no plaque was recorded.
For each participant genomic DNA was extracted from buffy-coats prepared from EDTA blood samples (9 mL) using the method of Miller . Genotyping was performed using the Affymetrix Genome-Wide Human SNP Array 6.0 (http://www.affymetrix.com), as described by the Affymetrix user manual. Genotypes were called using the Affymetrix Birdseed-V2 calling algorithm and quality control was performed using GenABEL (http://mga.bionet.nsc.ru/nlru/GenABEL/). Individuals with a call rate below 97% or a too high autosomal heterozygosity (False Discovery Rate <1%) were excluded. After applying standard quality criteria (minor allele frequency >1%, genotype call rate >98% and P-value of deviation from Hardy-Weinberg equilibrium >10−4), 675,350 out of 900,392 SNPs remained for analysis.
Separation of monocytes was conducted within 60 min after blood collection and RNA was extracted the same day. Total RNA was isolated from monocytes using Trizol extraction and purification by silica-based columns. To separate monocytes, 8 mL blood was collected using the Vacutainer CPT Cell Preparation Tube System (BD, Heidelberg, Germany) and 400 µL Rosette Sep Monocyte Enrichment Cocktail (StemCell Technologies, Vancouver, Canada) was added immediately after blood collection. Monocytes, not labeled by antibodies, are collected as a highly enriched fraction at the interface between plasma and the density medium in the tube. After separation, cells were washed twice in ice cold PBS buffer containing 2 mM EDTA. Success of monocyte separation was controlled using an ADVIA 2120 Analyser (Siemens Healthcare Diagnostics, Eschborn, Germany) for part of the samples.
After separation, cells were resuspended in 1.5 mL Trizol Reagent (Invitrogen, Karlsruhe, Germany) immediately and frozen at −20°C until isolation of RNA at the same day (maximal storage time 5 h). After thawing, samples were transferred into Phase Lock Gel Tubes (Eppendorf, Hamburg, Germany), 200 mL chloroform was added and phases were separated by centrifugation at 4600 rpm for 15 min. Purification of total monocytes RNA was performed using the RNeasy Mini kit (Qiagen, Hilden, Germany) according to the manufactures' Animal Cell Spin and RNA Cleanup protocols including an additional DNase digestion step. Total RNA was eluted in 20 µL RNase-free water. Yield of RNA was checked spectrophotometrically by NanoDrop N-1000 measuring the OD260 as well as the ratio OD260 and OD280. The integrity of the total RNA was assessed through analysis on an Agilent Bioanalyzer 2100 (Agilent Technologies, Boeblingen, Germany).
GWE analysis was performed on monocytes RNA samples using the Illumina HT-12 v3 BeadChip (http://www.Illumina.com). RNA samples were processed in batches of 96 samples. Two hundred ng of total RNA was reverse transcribed, amplified and biotinylated using the Illumina TotalPrep-96 RNA Amplification Kit (Ambion/Applied Biosystems, Darmstadt, Germany). 700 ng of each biotinylated cRNA was hybridized to a single BeadChip at 58°C for 16–18 hours. BeadChips were scanned using the Illumina Bead Array Reader.
The summary probe-level data delivered by the Illumina scanner (mean and SD computed over all beads for a particular probe) was loaded in Beadstudio. The pre-processing done by the Illumina software, at the level of the scanner and by Beadstudio included: correction for local background effects, removal of outlier beads, computation of average bead signal and SD for each probe and gene, calculation of detection P-values using negative controls present on the array, quantile normalization across arrays, check of outlier samples using a clustering algorithm, check of positive controls. Analyses were carried out on the mean level for all probes in each gene. To stabilise variance across expression levels, we applied an arcsinh transformation to the expression data . Compared to a log transformation, this transformation has the advantage not to discard negative expression values which can occur in Illumina data.
The Illumina HT-12 BeadChip included 37,804 genes (some probes being not assigned to RefSeq genes). A gene was declared significantly expressed in the dataset, i.e. expressed above background (as measured by the negative controls present on each array), when the detection P-value calculated by Beadstudio was <0.05 in more that 5% of the samples. This resulted in 22,305 genes considered as being significantly expressed in our dataset. After removing 8,058 putative and/or non well characterized genes, i.e. gene names starting by KIAA (n=165), FLJ (n=214), HS (n=4,262), Cxorf (n=842), MGC (n=72), LOC (n=2,503), 12,808 well characterized detected genes remained for analysis.
To test all associations between SNPs and expressions in a reasonable amount of time, a C script calling the GNU Scientific Library (GSL) “TAMU ANOVA” (www.stat.tamu.edu/~aredd/tamuanova/) was written. For all significant associations, results were checked against the R-lm library . When the numbers of homozygotes for the minor allele of a SNP was lower than 30, they were grouped with heterozygotes. We used a family-wise error rate of 0.05 corrected for the number of tested SNP x expression associations, which corresponds to declare significant any association with a P-value <5.78×10−12. To increase robustness, associations significant by ANOVA were further checked by a Kruskall-Wallis (KW) test and only associations with a P-value <10−10 by the KW test were retained. P-values given in the results and in the GHS-Express database are those obtained by ANOVA. For SNPs on chromosome X, associations with gene expression were assessed separately in women and men and the P-values were combined using the Fisher method .
The relationship between gene expression and each CHD risk factor was tested by a linear regression model using R-lm, with gene expression as the dependent variable. Association with age was adjusted for gender while association with other risk factors were adjusted for gender, age and, if specified, BMI. A square root transformation was applied to CRP levels to remove positive skewness. A study-wise statistical significance threshold of 3.9×10−7 was used to correct for the number of tests (10 risk factors ×12,808 gene expressions). For each expression trait that was associated with a risk factor and also affected by cis eSNPs, we tested the interaction between the risk factor and the best cis eSNP on expression in a regression model.
The goal of this analysis was to identify subsets of expression traits independently associated with each risk factor. To increase the robustness of the analysis the population was randomly divided into 2 sub-samples of equal size which were used for screening and validation purpose respectively. The screening step was focused on the subsets of expression traits that were associated with each covariate-adjusted risk factor in univariate analysis at P<3.9×10−6 (Bonferroni correction for 12,808 expressions). Each risk factor and corresponding subset of expression traits were included as dependent and predictor variables respectively in a forward stepwise regression model to identify expression traits that were independently associated with the risk factor (P<0.01). Gene expressions selected at the screening step were then jointly tested in the validation sample for association with the risk factor by multiple regression analysis. This screening/validation procedure was repeated 250 times and for each risk factor, expression traits associated (P<0.01) with the risk factor in more than 25% of the replicates are reported.
Power was calculated using the program Quanto (http://hydra.usc.edu/GxE/). Assuming a quantitative expression trait with mean 0 and SD 1, a sample size of 1,490 subjects, a type I error of 5.78×10−12 and an additive allele effect, the study had a 82% power to detect the effect of a SNP explaining 4% of gene expression.
An ontology analysis was performed using the Panther database (http://www.pantherdb.org/). Lists or sublists of genes involved in associations with eSNPs or risk factors were compared to the background list of the 12,808 genes. The P-value calculated by the binomial statistic and Bonferroni-corrected was used.
Population stratification and quality of genotypes and expression data were tested extensively and outliers were excluded on the basis of multidimensional scaling analysis (see Methods S2)
Characterization of polymorphic probes in eQTLs.
(0.33 MB DOC)
Loci identified in GWAS of BMI and BP - associations of lead/tag SNPs with phenotypes and expressions, and of expressions with phenotype in GHS.
(0.07 MB DOC)
Sets of gene expressions robustly and independently associated with each risk factor in the validation samples.
(0.21 MB DOC)
(0.18 MB DOC)
Data quality checking.
(0.21 MB DOC)
GHS - Characteristics of cis eQTLs.
(0.40 MB XLS)
Comparison cis eQTL in GHS and the study of Stranger et al.
(0.10 MB XLS)
Comparison cis eQTL in GHS and the study of Dixon et al.
(0.14 MB XLS)
Comparison cis eQTL in GHS and the study of Schadt et al.
(0.28 MB XLS)
Comparison cis eQTL in GHS and the study of Goring et al.
(0.83 MB XLS)
GHS - Expression traits associated with risk factors.
(0.29 MB XLS)
GHS - Interaction cis eSNPs with risk factors.
(0.12 MB XLS)
We thank John Todd and Chris Wallace for very helpful comments on our manuscript.
We acknowledge Carolin Neukirch, Fatma Karaman, Jutta Bähr and Stefanie Müller for technical assistance, Andreas Weith for help during technical performance of GWV and GWE experiments and Alexandru Munteanu for assistance in informatics.
Competing Interests: In this study, Boehringer Ingelheim provided payment for two employees for expression microarray analyses and array purchase and PHILIPS Medical Systems provided instruments for ultrasound studies. However, this does not alter the authors' adherence to all the PLoS ONE policies on sharing data and materials, as detailed online in the guide for authors of the Journal.
Funding: The Gutenberg Heart Study is funded through the government of Rheinland-Pfalz (Stiftung Rheinland Pfalz für Innovation, contract number AZ 961-386261/733), the research programs Wissen schafft Zukunft and Schwerpunkt Vaskuläre Prävention of the Johannes Gutenberg-University of Mainz and its contract with Boehringer Ingelheim and PHILIPS Medical Systems including an unrestricted grant for the Gutenberg Heart Study. Specifically, the research reported in this article was supported by the National Genome Network NGFNplus (contract number project A3 01GS0833) by the Federal Ministry of Education and Research, Germany. For this particular research paper, Boehringer Ingelheim provided payment of two employees for expression microarray analyses and array purchase. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.