AGES: Study design and phenotype collection, TBH,VG,LJL; Genotyping,; data analysis, AVS, MAN,TA; manuscript preparation, MAN, NS; manuscript revisions, TBH,VG,LJL, AVS, ABS, DGH, MAN
ARIC: Study design and phenotype collection, EB, JC, SF, AC; Genotyping, EB, AC; data analysis, SKG, AK, AC, AG, GBE; manuscript preparation, SKG, SF; manuscript revisions, SKG, AC, EB, JC, AK, SF.
CHS: Study design and phenotype collection, BMP, JIR, MC, NAZ, NLG; Genotyping, JIR; data analysis, NLG, TL; manuscript preparation, NAZ; manuscript revisions BMP, JCB, KR, JLR, MC, NAZ, NLG, TL,
FHS: Study design and phenotype collection, MHC, QY, CSF, DL, LAC, CJO, JPL; Genotyping,; data analysis, MHC, QY, JPL; manuscript preparation, MHC, CJO, JPL; manuscript revisions, MHC, QY, CSF, DL, LAC, CJO, JPL.
Rotterdam: Study design and phenotype collection, FJAR, AH, AGU, BAO, CMD, JCMW; Genotyping, AGU; data analysis, FJAR, JFF, AD, GCV; manuscript preparation, FJAR, JFF, CMD; manuscript revisions FJAR, JFF, AD, GCV, AH, AGU, BAO, CMD, JCMW
InCHIANTI: Study design and phenotype collection, LF,JMG,SB; Genotyping, ; data analysis, MAN,TT; manuscript preparation, MAN; Manuscript revisions, LF, JMG, SB, KVP, ABS, DGH, MAN, TT
Genotyping, AR, JJZ, JMvG
TwinsUK: Study design and phenotype collection, TDS, SLT, PD; data analysis, NS; manuscript preparation NS, manuscript revisions, NS
UKBS1: Study design and phenotype collection, WHO, JGS; data analysis, NS; manuscript preparation NS, AR, manuscript revisions, NS, AR
KORA: Study design and phenotype collection, CM, CG; data analysis, CG, BK; manuscript preparation CG, manuscript revisions, CG
SHIP: Study design and phenotype collection, MN, AG; data analysis, AT
Gene expression: AR, JJZ, JMvG
Erythrocyte measures are heritable and have important health implications, yet their genetic determinants are largely unknown. We performed genome-wide association analyses in 24,167 European-ancestry individuals for six erythrocyte traits and identified associations at 23 loci (P values 5×10-8 to 1×10-57). Replication testing in an independent set of 9,456 European-ancestry individuals showed strong evidence of association in all 23 loci in meta-analysis of the discovery and replication cohorts. Our findings include previously identified loci (HBS1L/MYB, HFE, TMPRSS6, TFR2, SPTA1) and novel associations (EPO, TFRC, SH2B3, and 15 other loci). This study has identified novel determinants of erythrocyte traits, offering insight into common variants underlying variation in erythrocyte measures.
Red blood cell disorders such as anemia and erythrocytosis are broadly associated with multiple comorbid conditions including hypertension and other cardiovascular diseases, yet the genetic determinants of erythrocyte traits in the general population are poorly defined. Erythrocytes, which comprise approximately 40% - 50% of blood volume, are a key component for the transport of oxygen and carbon dioxide for cellular respiration. In clinical practice, measures of erythrocyte quantity, size and composition are routinely tested to diagnose and monitor hematologic diseases as well as the overall health of patients. Variation in erythrocyte measures, even within normal ranges, is related to other non-hematologic diseases and mortality1-3.
Erythrocyte production and quality are under various environmental and genetic influences. While environmental exposures, dietary intake of vitamins and iron, and the anemia of chronic disease contribute substantially to abnormalities of erythrocyte measures, the heritability of erythrocyte traits ranges from 40% - 90%4-6. Disorders of hemoglobin production and hemoglobinopathies are some of the most common genetic diseases in the world, owing to natural selection. Some known low-frequency Mendelian variants also lead to inter-individual variability in erythrocyte traits in the general population7, 8. Candidate gene studies have identified a few non-hemoglobin loci, including EPOR and HBS1L, related to variation in erythrocyte traits8-10. Early genome-wide association and linkage studies of erythrocyte measures, which have identified a few associations, such as at chromosome 6q23, lacked statistical power for association and genetic resolution for testing competing hypotheses6, 11-13.
To investigate genetic determinants of erythrocyte traits in the general population, we carried out genome-wide association studies and meta-analysis within multiple community-based cohorts comprising the CHARGE consortium14, followed by replication in independent samples. We identified 23 genetic loci associated with these erythrocyte traits. We further extend these findings to investigate possible links between these traits and vascular diseases, reporting associations of a few of the 23 loci identified with blood pressure and hypertension.
The total sample size for the individual cohort genome-wide association analysis and the CHARGE meta-analysis was 24,167. This total comprises five CHARGE cohorts (the Age, Gene/Environment Susceptibility Reykjavik Study (AGES), N=3,205; the Atherosclerosis Risk in Communities Study (ARIC), N=7,803; the Cardiovascular Health Study (CHS), N=3,256; the Framingham Heart Study (FHS), N=3,359; and the Rotterdam Study (RS), N=5,523) together with the Invecchiare in Chianti Study (InCHIANTI, N=1,021), an Italian cohort study. Characteristics of the study participants, including age, sex and trait summaries, are presented in Table 1.
We studied six erythrocyte traits: hemoglobin concentration (Hgb); hematocrit (Hct); mean corpuscular volume (MCV); mean corpuscular hemoglobin (MCH); mean corpuscular hemoglobin concentration (MCHC); and red blood cell count (RBC), as defined in Supplementary Table 1. When cohort results were combined, 831 SNP associations at 23 independent loci (r2 < 0.3 between loci) across the six traits reached the genome-wide (GW) significance threshold of P < 5×10-8. The -log10(P value) genome-wide association plots for the meta-analysis of each of the 6 traits are shown in Figure 1. Corresponding QQ-plots are shown in Supplementary Figure 1a and the genomic control lambda (λGC) values in Supplementary Table 2. The genomic control inflation factor post-meta-analysis, which was not corrected at the meta-analysis level, showed no systematic inflation (Hgb λGC = 1.066; Hct λGC = 1.045; MCH λGC = 1.014; MCHC λGC = 0.995; MCV λGC = 1.029; and RBC λGC = 1.029; Supplementary Table 2). The meta-analysis results for all traits are summarized in Table 2, which is organized by the 23 independent loci and includes gene annotation information for each locus. The table also lists for each trait the number of SNPs exceeding the GW significance level. Altogether, there were 45 trait-locus combinations with at least one GW significant SNP. The complete set of SNP associations identified by the CHARGE meta-analysis is provided in Supplementary Table 3. Replication and further analysis focused on the 45 SNPs that gave the smallest P values for each of the 45 trait-locus findings in CHARGE.
Replication of the 45 SNPs was conducted using a meta-analysis of association data in 9,456 independent European-ancestry individuals from five population-based cohorts in the HaemGen Consortium (Supplementary Note). A joint analysis of the HaemGen and CHARGE data showed a decrease in P values for all but two SNPs selected for replication. For one of the two SNPs (rs1800562) that did not show an improvement in P value when associated with Hct, the association to the Hgb trait was significant after Bonferroni correction, and for the second SNP (rs4466998), the association to MCV in the joint analysis of CHARGE and HaemGen data remained genome wide significant (P = 4.91 × 10-8). Significant independent replication for at least one trait was observed at 13 of 23 loci, using a Bonferroni-corrected significance threshold of P < 0.0011, or 0.05/45. Taking the joint meta-analysis results in sum, these data provide supportive evidence that the 23 loci from the discovery meta-analysis are true positives. Table 3 provides the full replication results, including beta coefficients, standard errors, and P values for the primary CHARGE findings, the HaemGen replication, and a combined meta-analysis of the two consortia for the 45 CHARGE trait-locus SNPs.
For each lead SNP at the 23 independent loci, the percent variance explained in the corresponding trait, averaged across the CHARGE cohorts, is provided in Table 3. Taken in combination, the lead SNPs for each trait explained on average, beyond the variance explained by age and gender, 1.14% of Hgb variation (rs10495928, rs1800562, rs10224002, rs16926246, rs11065987, rs6013509, rs855791); 1.16% of Hct variation (rs10168349, rs1800562, rs7385804, rs10224002, rs16926246, rs11065987, rs9483788, rs2413450); 4.53% of MCH variation (rs11915082, rs1408272, rs9349205, rs7776054, rs628751, rs10758658, rs1122794, rs11085824, rs2413450); 0.63% of MCHC variation (rs857721, rs9373124); 5.98% of MCV variation (rs2540917, rs9859260, rs172629, rs1800562, rs9349205, rs9374080, rs4895441, rs643381, rs12718597, rs7786877, rs10758658, rs11239550, rs4466998, rs7189020, rs7255045, rs2413450, rs131794); and 0.85% of RBC variation (rs9483788, rs2075671).
For 20 of the identified loci, top associated SNPs were identified within a +/- 60 Kb window of a RefSeq gene (Table 2). For three loci, chromosomes 4q12, 6q24.1, 20q13.2, no genes were identified within this window, with the nearest genes approximately 116 Kb, 89 Mb, and 50 Mb away, respectively. Of the 23 loci, previously reported mutations or genetic associations for erythrocyte traits, markers of iron status or fetal hemoglobin levels, have been noted at six loci containing the genes HFE, TFR2, TMPRSS6, SPTA1, HBS1L-MYB, and BCL11A. Most of the remaining loci have not previously been reported to be associated with erythrocyte traits, though several genes are known to have important roles in erythrocyte biology or erythropoiesis. Genes identified near the associated loci, and their associated erythrocyte traits, are presented in Table 2 and Figure 2. Gene annotations, including gene information, known genetic mutations causing hematologic and non-hematologic diseases, and previously defined roles in hematologic and cardiovascular systems are listed in Supplementary Table 4. We confirmed the association of the known C282Y (rs1800562) and H63D (rs1799945) mutations in the HFE gene, mutations that are already known to underlie hereditary hemochromatosis, with Hgb, Hct, MCH and MCV.
RNA expression levels for genes within a 1 Mb interval of each of the 23 loci we identified are presented in Supplementary Figure 2, for erythroid bodies (EBs), human umbilical vein endothelial cells (HUVECs), and seven other blood cell lines. For the top associated locus, chromosome 6q23.3, four genes were identified (ALDH8A1, HBS1L, MYB, AHI1) within the 1 Mb interval. A heatmap of expression levels for these four genes is shown in Figure 3, demonstrating approximately 2-fold expression of MYB in EBs compared to other cell lines. Gene expression was detected for genes in multiple loci (Supplementary Figure 2). Since the identification of gene expression in biologically relevant tissues provides a rationale for prioritization of candidate genes for further genetic or functional investigations, we noted broad categories of expression patterns in EBs and HUVECs. The most highly expressed genes in EBs were in chromosomes 3q29 (TFRC), 6p22.2 (HIST1H4C, which is near HFE), 6p21.1 (CCND3), 10q21.3 (HK1), 12q24.12 (RPL6P27), 16p13.3 (HBZ, HBA1), and 22q12.3 (TST, RAC2). The most highly expressed genes in HUVECs were in chromosomes 6p22.2 (HIST1H4C), 7q22.1 (SERPINE1), and 22q13.3 (MFNG).
The results of association testing for the 45 lead SNPs from the CHARGE analysis of erythrocyte traits within the 23 loci are summarized in Supplementary Table 5a. We identified associations at a Bonferroni-corrected significance threshold P < 0.00135 (0.05/37, since 37 of the 45 SNPs are unique) in chromosomes 12q24.1 (SH2B3) and 7q36.1 (PRKAG2). The previously reported association of the chromosome 12q24.1 (SH2B3) locus with systolic blood pressure (SBP) and diastolic blood pressure (DBP) was the most significant association (rs1106598, SBP P = 1.2×10-6, HTN P = 0.0035; rs1763023, DBP P = 4.2×10-8). In the reported BP and HTN analysis, signals at this locus spanned 700kb from rs3184504 to rs1106618815, and association signals in Hgb and Hct spanned 987kb, from rs3184504 in SH2B3 to rs11066301 in PTPN11 and contained multiple genes15. Inspection of RNA expression data (Supplementary Figure 2) showed that in this region, SH2B3 and ATXN2 show high levels of gene expression in erythroid bodies and endothelial cells. Nominal associations (P < 0.05) were identified in the chromosomes 6p22.2 (HFE), 6q24.1, 7q22.1 (TFR2), and 20q12.3 (Supplementary Table 5a), and results of the evaluation of SNPs associated with BP and hypertension are presented in Supplementary Table 5b.
In this meta-analysis of genome-wide association data from 24,167 European-ancestry individuals from six cohort studies in the CHARGE Consortium, we identified 23 loci associated with at least one of the six erythrocyte traits Hgb, Hct, MCH, MCHC, MCV, and RBC. We sought evidence for replication in an independent analysis of data from 9,456 European-ancestry individuals in the HaemGen Consortium. In the joint meta-analysis, merging CHARGE and HaemGen data, all 23 loci had P values below 5×10-8, implying strong associations that merit further study. Among the 23 loci, six were previously known QTLs, and 17 are novel loci, some of which contain genes known to be involved with iron homeostasis, erythropoiesis, globin synthesis and erythrocyte membrane function. Finally, an investigation of possible links between these erythrocyte traits and blood pressure and hypertension confirmed overlap at the previously known SH2B3 locus, and identified additional suggestive associations, none of which met a genome-wide significance threshold.
The six erythrocyte traits studied included some that are highly correlated, and as expected, we observed a high degree of concordance in the results across the six traits. Among the many genome-wide significant associations identified, the patterns of association are likely to reflect the correlations among these related traits. Interestingly, MCHC, a ratio of Hgb and Hct, two directly measured traits, is uniquely associated with chromosome 1q23.1 (SPTA1), a gene with several rare mutations known to cause deformation of erythrocytes16, 17. Reviewing the results in total, we observed there were generally three patterns of significant associations among the six traits (Figure 2). Results were generally similar for: (1) Hgb and Hct, which are mainly quantitative measures of hemoglobin in the blood; (2) MCH and MCV, representing erythrocyte size and quantity of hemoglobin per erythrocyte; and (3) MCHC, the ratio of Hgb to Hct, which appears somewhat distinct from the other traits. Across the six traits studied, the strongest signal was found in the HBS1L/MYB locus on chromosome 6q23, which was observed for five of the six individual traits (Hct, MCH, MCHC, MCV, RBC) at the genome-wide significance level. This locus also provided a modest but non-significant result for Hgb (rs4895441, P = 4.8×10-4). In addition, the Hgb/Hct and MCH/MCV patterns overlapped for associations in the chromosome 6p22.2 (HFE), 22q12.3 (TMPRSS6) and 7q22.1 (TFR2/EPO) loci. The RBC results represent a subset of the overlap between the Hgb/Hct and MCH/MCV patterns, with associations observed in the 7q22.1 (TFR2/EPO) and 6q23 (HBS1L/MYB) loci. Across the erythrocyte traits, overlap occurs where known patterns of traits characterize various clinically observed hematologic diseases, providing a possible context in which to interpret the overlap of associations.
We annotated and categorized the findings of our analyses by association with known genetic disorders, biologic function, or altered function of the hematopoietic system, to assist with interpretation of the findings (Supplementary Table 4). We here consider these multiple findings in light of their potential role in several processes critical to erythrocyte biology, including iron homeostasis, erythrocyte membrane function, erythropoiesis and globin synthesis.
We identified genome-wide significant association of SNPs within the HFE gene with Hgb, Hct, MCH, and MCV. C282Y mutation in HFE is the principal cause of hereditary hemochromatosis, a common autosomal recessive iron overload disease in individuals of northern European descent18. This mutation was associated with increased MCV and Hgb concentrations in a study of individuals drawn from a hemochromatosis and iron overload screening study7, and this variant was the lead association result for both Hgb and Hct in our study. Heterozygotes for either allele do not manifest clinical iron overload but may display an increased iron uptake and “resistance” to anemia, and the C282Y mutation may increase risk of coronary heart disease by increasing iron stores and lipid oxidation19, 20. The HFE gene induces expression of the iron regulatory hormone hepcidin. Hepcidin has recently emerged as the likely link between the inflammatory response and the handling of iron for erythropoiesis by both downregulating the absorption of iron in the intestine and by inhibiting the release of iron from macrophages21-24. SNPs within the TMPRSS6 gene were associated with Hgb, Hct, MCH and MCV. TMPRSS6 was identified by linkage and association studies in five families and two sporadic cases with iron-refractory iron deficiency anemia, a rare Mendelian disease25. TMPRSS6 encodes a type II transmembrane serine protease produced by the liver that regulates the expression of hepcidin25. The transferrin receptor (encoded by TFRC(TFR1)) and transferrin receptor 2 (TFR2) are highly homologous type II trans-membrane proteins in the transferrin protein family. SNPs within TFRC were associated with MCH and MCV, and SNPs within TFR2 were associated with Hct and MCV. Reduced TFRC expression is associated with anemia26. Existing evidence indicates that TFR2 is also a modulator of hepcidin expression, and mutations in TFR2 cause hemochromatosis type 327.
Two loci, chromosome 2p16.1 (BCL11A) associated with MCV, and chromosome 6q23.3 (HBS1L-MYB) associated with all traits except for Hgb, are related to variation in fetal hemoglobin and hemoglobin beta levels28, 29. BCL11A is an oncogene related to B-cell malignancies30 and regulates fetal hemoglobin expression31. BCL11A is expressed in erythroid precursors, and we observed BCL11A expression in EBs (Supplementary Figure 2), making it a biologically plausible candidate gene for erythrocyte trait variation32. A healthy population study showed that polymorphisms in HBS1L and MYB influence erythrocyte, platelet, and monocyte counts10. Although the role of HBS1L is unknown, MYB has been associated with proliferation, survival, and differentiation of hematopoietic progenitor cells33, 34. MYB is also associated with eosinophil counts in blood and atopic asthma35. There are strong associations between SNPs within this locus and multiple erythrocyte traits (lead SNP rs4895441, Hgb P = 4.8×10-4; Hct P = 9.7×10-10; MCH P = 7.8×10-32; MCHC P = 4.5×10-9; MCV P = 1.0×10-57; and RBC P = 2.2×10-15). These strong genetic effects may explain why several prior linkage analyses of erythrocyte traits have identified this chromosomal region6, 11, 13. SNPs within SH2B3 are associated with Hct and Hgb. Interestingly, the SNPs within this gene are associated with blood pressure, myocardial infarction, type 1 diabetes, and celiac disease15, 35-39. SH2B3 is expressed in hematopoietic precursor cells and increases hematopoietic progenitors of erythroid, megakaryocytic, and myeloid lineages35, 40. SH2B3 is also expressed in vascular endothelium, where it promotes inflammation and may thereby contribute to vascular disease. Expression in different cell lineages and tissues may underlie the diverse pleiotropic consequences of SH2B3 on hematopoietic traits, autoimmune diseases, and vascular diseases.
In the same locus, the PTPN11 gene product interacts with the transcription factor SHP2, which has an essential role in blood development that has been demonstrated in a murine Shp2-/- model41, and PTPN11 mutations cause Noonan’s and LEOPARD syndromes and juvenile myelomonocytic leukemia42-44. Lastly, we identified associations for Hct, MCV and RBC near the EPO gene. Erythropoietin, a glycoprotein hormone that controls erythropoiesis, is the first human recombinant hematopoietic protein approved for human use and is now used widely for the treatment of anemia. EPO variants have previously been described in association with diabetic retinal and renal vascular complications45 but not with erythrocyte traits.
SNPs within the SPTA1 gene were associated with MCHC. SPTA1 encodes erythroid spectrin, a protein in the erythrocyte membrane, and is essential in determining the shape and deformability of erythrocytes. Spectrin mutations have been previously associated with hemolytic anemia, elliptocytosis, spherocytosis, and propoikilocytosis, but not with variations in MCHC outside of disease states16, 17.
The gene expression data may be viewed as additional annotation of the 23 loci we identified, confirming which genes are expressed in cell types of interest. These data may be used to generate further specific hypotheses that can then be tested at a functional and molecular level.
Multiple lines of evidence formed the basis for our rationale to study the relationship of the SNPs identified through the study of erythrocyte traits to blood pressure (BP) and hypertension. Prior studies have shown that Hgb and Hct levels are associated with increased risk for hypertension and a variety of other vascular diseases and mortality1-3, 46-49. From a rheologic perspective, blood viscosity depends largely on Hgb or Hct levels and is a determinant of blood pressure50-53. There is an inverse relationship between viscosity and vascular blood flow54, and elevated Hct thereby hampers organ perfusion. Given these findings, we are intrigued by the overlap between the association results for Hgb and Hct from our study and the recently reported associations for BP and hypertension15, 36. We observed overlap of associations in the chromosome 12q24.12 region across a 987 kb linkage disequilibrium block, containing SH2B3, ATXN2, BRAP, C12orf03, TRAFD1, ACAD10, TMEM116 and PTPN11. We also identified associations within the 7q36.1 region containing PRKAG2, which does not have specifically known hematologic or vascular roles, but mutations in this gene cause cardiomyopathy and cardiac conduction system disorders55, 56. Neither causality nor independence of these associations is necessarily supported by these findings. However, these associations suggest that common genetic bases may underlie some of the correlation seen between erythrocyte, BP and hypertension traits. Further confirmation in large independent cohorts may provide stronger evidence for the strength and consistency of the associations with hypertension.
Limitations of this study include restriction of the discovery and replication analyses to individuals of European ancestry. Several spectrin and globin mutations have been identified in African American kindreds and the prevalence of hemoglobinopathies of various types is generally higher among individuals of African ancestry, highlighting the need for further investigation of these findings in individuals of non-European ancestry57. As with any meta-analysis of genome-wide association results across different cohorts, population structure and other sources of heterogeneity may have caused false positives or false negatives. To assess population structure, we examined the per-cohort λGC, demonstrating that these values were consistently below 1.08, and we applied genomic control to the cohort-level test statistics. The final meta-analyses also showed no systematic inflation of the distribution of the final association statistics. In the replication analysis, for those loci that did not meet a conservative replication test, power may have been limited, and many are likely to improve with additional study. Finally, the interpretation of multiple analyses of correlated traits requires caution, particularly in attributing causality or independence of effects. We take the findings from our analyses of the six erythrocyte traits to indicate a set of loci that are of interest with regard to erythrocyte production, homeostasis and function. Specific differences in association patterns may highlight different pathways, and further studies are needed to understand these differences more deeply.
In summary, we have identified and validated common variants at several known and novel loci that influence the levels of six clinically relevant red blood cell measures in population-based cohorts. These QTLs have implications for understanding a variety of hematologic diseases as well as correlates of erythrocyte traits, such as BP and hypertension. Further studies are warranted to define these variants in the extremes of the distributions of these traits and ethnically diverse populations and to understand the functional impact of variants at the implicated candidate genes.
Complete study acknowledgments are listed in the Supplementary Note. The authors thank the studies’ participants, staff and the funding agencies for their support.
We performed a cross-sectional analysis of genotype and phenotypic data on erythrocyte traits in the CHARGE consortium14, which includes five cohort studies that have genotyped high density SNP markers and have phenotypic data on erythrocyte traits (AGES N=3,205; ARIC N=7,803; CHS N=3,256; FHS N=3,359; RS N=5,523) and InCHIANTI (N=1,021). Each participating study was approved by its corresponding Institutional Review Board, and all study subjects provided informed consent for participation in the study and genetic research. Participants were excluded if they were of non-European ancestry, as determined by self-report, and also by principal component analysis in ARIC and RS. Detailed methods for each of the participating cohorts are provided in the Supplementary Note.
Briefly, each study genotyped samples using high-density SNP marker platforms (Affymetrix SNP6.0 - ARIC; Affymetrix 500K - FHS; Illumina 370K - AGES, CHS; Illumina 550K - InCHIANTI, RS). Genotypes were then imputed to a set of approximately 2.5M HapMap SNPs, with Phase II CEU HapMap individuals as the reference, using either MACH (http://www.sph.umich.edu/csg/abecasis/MACH) (ARIC, AGES, FHS, InCHIANTI, RS) or BimBam58 (CHS) software.
Erythrocyte parameters studied were 1) hemoglobin concentration (Hgb), the concentration of hemoglobin within whole blood; 2) hematocrit (Hct), the percent of whole blood composed of cellular erythrocyte elements; 3) red blood cell count (RBC), the number of red blood cells per volume of blood; 4) mean corpuscular volume (MCV), the average erythrocyte volume; 5) mean corpuscular hemoglobin (MCH), the average quantity of Hgb per erythrocyte; and 6) mean corpuscular hemoglobin concentration (MCHC), the ratio of Hgb to Hct. The definition and units of each trait are provided in Supplementary Table 1. Blood was drawn from each participant using standard phlebotomy methods, and erythrocyte measures were obtained using standard clinical assays in certified laboratories.
All traits studied were continuous. Based on prior convention and visual inspection of the data, MCH, MCHC and MCV were natural log transformed, RBC was square root transformed, and Hgb and Hct were not transformed prior to analyses. In order to focus on determinants of variation of these traits in the general population, rather than on specific hematologic diseases which are over-represented at the tails of the distribution for each of the traits, we restricted analysis to those individuals within three standard deviations of the sample mean within each cohort. For each SNP meeting QC criteria, linear regression was used to assess association with trait, separately for all six traits. An additive genetic model was used throughout. These regressions were adjusted for age, gender, and site in the multi-center cohorts. In FHS, linear mixed effects models were used to account for relatedness, and these models included adjustment for principal components, computed using Eigenstrat 2.059. The genome-wide level of significance threshold was set at P < 5×10-8.
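The trait preparation described above can be sketched as follows. This is a minimal illustration, not the cohorts' actual pipeline: the function name, the dictionary keys, and the choice to apply the three-standard-deviation filter on the raw scale before transformation are assumptions, since the text does not state the order of these two steps.

```python
import math
import statistics

# Transformations as stated in the text: natural log for MCH, MCHC and MCV,
# square root for RBC, and no transformation for Hgb and Hct.
# The dictionary keys are illustrative labels (an assumption).
TRANSFORMS = {
    "Hgb": lambda x: x,
    "Hct": lambda x: x,
    "MCH": math.log,
    "MCHC": math.log,
    "MCV": math.log,
    "RBC": math.sqrt,
}


def preprocess_trait(values, trait, n_sd=3.0):
    """Drop values more than n_sd sample SDs from the cohort mean,
    then apply the trait-specific transformation.

    Assumption: filtering is done on the untransformed scale.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    kept = [v for v in values if abs(v - mean) <= n_sd * sd]
    return [TRANSFORMS[trait](v) for v in kept]


# Hypothetical MCV values (fL) for one cohort, with one extreme value.
example_mcv = [90.0] * 10 + [92.0] * 10 + [300.0]
prepared = preprocess_trait(example_mcv, "MCV")
```

In this toy cohort the value of 300 lies more than three standard deviations from the mean and is excluded, so 20 log-transformed values remain for the regression step.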
Each cohort’s results were forwarded to a central repository, including regression coefficients, standard errors, sample size, imputation quality, minor allele designation and minor allele frequency. Prior to meta-analysis, we performed genomic control on each cohort-specific distribution of the association test statistics for each trait60. We also filtered out SNPs with allelic frequency less than 1% or poor imputation quality (the ratio of observed variance of imputed allele counts to the expected variance of imputed allele counts > 1.1 from the imputation software output). Separately for each SNP and trait, within-cohort association results were combined in an inverse variance weighted meta-analysis, as implemented in METAL (http://www.sph.umich.edu/csg/abecasis/Metal/index.html). After meta-analysis, the genomic control inflation factor (λGC) was again calculated to assess stratification between the cohorts and resulting inflation of the test statistics. Genomic control was not applied to the final meta-analysis results. The SNAP program (http://www.broadinstitute.org/mpg/snap) was used to estimate linkage disequilibrium between the associated loci. Percent variance explained within each cohort was calculated from the r2 estimate derived from a linear regression model for individual lead SNPs at each trait locus and for the combination of all lead SNPs per trait, accounting for age and gender as well.
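The two statistical steps named above, genomic control and inverse variance weighted meta-analysis, can be illustrated with a short sketch. It uses the standard fixed-effects formulas and is not METAL's code; the function names are hypothetical, and the choice to inflate standard errors only when λGC exceeds 1 is an assumption rather than something stated in the text.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(z_scores):
    """Estimate lambda_GC as median(z^2) / median of chi-square(1 df)."""
    return statistics.median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def apply_genomic_control(se, lam):
    """Inflate a standard error by sqrt(lambda).

    Assumption: no correction is applied when lambda <= 1.
    """
    return se * math.sqrt(max(lam, 1.0))


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse variance weighted meta-analysis for one SNP.

    Returns the pooled beta, its standard error, the z statistic and the
    two-sided P value under a normal approximation.
    """
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical per-cohort estimates for a single SNP and trait.
pooled_beta, pooled_se, pooled_z, pooled_p = inverse_variance_meta(
    [0.1, 0.3], [0.1, 0.1]
)
```

With equal standard errors the pooled estimate is the simple average of the cohort betas, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.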
The replication set included samples from the five population-based cohorts of individuals of European ancestry that comprise the HaemGen Consortium (Study of Health in Pomerania (SHIP) N=3,200; UK Blood Services Common Control (UKBS1-CC1) N=1,290; Twins UK adult twin registry N=1,510; KORA-F3 500K study population N=1,643; and KORA-F4 N=1,814). Further information on these cohorts is provided in the Supplementary Note. Trait definitions were identical to those in our initial study. Analysis for each of the six traits undergoing replication was implemented in the same way as for the CHARGE analyses. For replication, we used a Bonferroni correction for the number of SNPs tested by the HaemGen Consortium. Given the smaller sample size available for the HaemGen replication analysis relative to the CHARGE discovery analysis, we performed a combined meta-analysis on the top trait-locus SNPs identified by the CHARGE meta-analysis, to assess impact on the association signals.
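As a worked illustration of the Bonferroni correction, the thresholds quoted in the Results (0.05/45 for the 45 replication SNPs and 0.05/37 for the 37 unique SNPs in the blood pressure look-up) can be reproduced as below. The function names are hypothetical and the sketch only restates the arithmetic.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def passes_bonferroni(p_value, n_tests, alpha=0.05):
    """True if a P value is below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_threshold(n_tests, alpha)


replication_threshold = bonferroni_threshold(45)     # about 0.0011
blood_pressure_threshold = bonferroni_threshold(37)  # about 0.00135
```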
To assist with prioritization of the candidate genes identified in this study, we evaluated RNA expression in eight blood cell lines (stem-cell derived erythroblasts (EB) and megakaryocytes (MK), CD14+ monocytes, CD56+ natural killer (NK) cells, CD4+ T helper (Th) lymphocytes, CD8+ cytotoxic T (Tc) lymphocytes, and CD66b+ granulocytes), using an established catalogue of gene expression in these lineages61; we also examined gene expression in cultured human umbilical vein endothelial cells (HUVECs)62. For each of the 23 identified loci, we examined the transcript levels in the nine cell types of all genes within a +/-500kb window centered on the lead association SNP.
Briefly, EB and MK samples were obtained by culturing CD34-positive haematopoietic stem cells, purified from cord blood, with EPO-IL3-SCF for 10 days and with TPO, respectively61. Cultured cells were sorted by fluorescence-activated cell sorting using either a monoclonal antibody against CD235a (glycophorin A) or against CD41 (αIIb integrin). The other six blood cell types were purified from the peripheral blood of seven healthy subjects using the corresponding CD markers. Blood cell types were hybridized onto Illumina V2 Ref6 gene expression beadarrays, and the detailed methods for isolation of other hematologic cell types, RNA extraction and microarray analysis are described elsewhere61. Human umbilical vein endothelial cells (HUVECs) were cultured and their expression profiles ascertained as described elsewhere62, using the same microarray platform61.
Raw data was VST transformed using the R/Bioconductor package “lumi,”63 followed by quantile normalization. The mode of the transformed signal intensity is 7.94 and can be taken as the background intensity. The maximum probe intensity was 15.85 corresponding to a signal intensity of 59,000 in a linear scale. For each cell type and probe, a Grubbs test for outlier identification was used and samples with P values less than 0.01 were removed (implemented in the R package “outliers”). This was not performed for the MK, EB or HUVEC cell types due to their smaller sample size (4, 4, and 3 respectively). Probe mappings were obtained from re-annotation efforts available at http://www.compbio.group.cam.ac.uk/Resources/Annotation/.
To determine genes within or neighboring each locus, we examined RefSeq gene annotations, build 36. To annotate the loci and genes, we reviewed the literature, OMIM64 and the Genetic Association Database65 (Supplementary Table 4).
Following the observation that epidemiologic studies have identified a strong yet unexplained link between Hgb and Hct and blood pressure, we examined the associations of these SNPs in the previously reported meta-analyses within the CHARGE consortium for systolic blood pressure (SBP), diastolic blood pressure (DBP) and hypertension1, 2, 15. The CHARGE cohorts that contributed samples to the BP and hypertension analyses were AGES, ARIC, CHS, FHS, and RS, with a total sample size of N=29,136. SBP and DBP measured at the first visit attended were continuous traits, and hypertension was analyzed as a dichotomous trait15. We tested the lead SNP per locus for each trait (45 SNPs) for association with SBP, DBP and hypertension15, using a Bonferroni-corrected threshold for the number of SNPs tested. We additionally reversed the analysis, examining the test statistics within each of the six erythrocyte traits for those SNPs reported to be associated with SBP, DBP or hypertension15.
Competing Interest Statement: Aravinda Chakravarti is a paid consultant with Affymetrix in accordance with the policies of Johns Hopkins.
Cardiovascular Health Study, http://www.chs-nhlbi.org/;
Framingham Heart Study, http://www.framinghamheartstudy.org/about/index.html;
Rotterdam Study, http://www.epib.nl/ergo.htm;
GenABEL and ProbABEL, http://mga.bionet.nsc.ru/~yurii/ABEL/;
MACH v1.0.15/16 (http://www.sph.umich.edu/csg/abecasis/MaCH/index.html);
Illumina BeadChip Probe Annotation http://www.compbio.group.cam.ac.uk/Resources/Annotation/.