Mayo Clin Proc. 2012 May; 87(5): 461–474.
PMCID: PMC3538470

Genetic Loci Implicated in Erythroid Differentiation and Cell Cycle Regulation Are Associated With Red Blood Cell Traits



Objective

To identify common genetic variants influencing red blood cell (RBC) traits.

Patients and Methods

From June 2008 through July 2011, we performed a genomewide association study of hemoglobin, hematocrit, RBC count, mean corpuscular volume, mean corpuscular hemoglobin, and mean corpuscular hemoglobin concentration in 12,486 patients of European ancestry from the electronic MEdical Records and GEnomics (eMERGE) network. We developed an electronic medical record–based algorithm that included individuals who had RBC measurements obtained for clinical care and excluded values measured in the setting of hematopoietic disorders, comorbid conditions, medications known to affect RBC production, or a recent history of blood loss.


Results

We identified 4 new genetic loci and replicated 11 loci previously reported to be associated with one or more RBC traits in individuals of European ancestry. Notably, genes present in 3 of the 4 newly identified loci (THRB, PTPLAD1, CDT1) and in 6 of the 11 replicated loci (KLF1, ALDH8A1, CCND3, SPTA1, FBXO7, TFR2/EPO) are implicated in erythroid differentiation and regulation of cell cycle in hematopoietic stem cells.


Conclusion

Genes in the erythroid differentiation and cell cycle regulation pathways influence interindividual variation in RBC indices. Our results provide insights into the molecular basis underlying variation in RBC traits.

Abbreviations and Acronyms: eMERGE, electronic MEdical Records and GEnomics; EMMAX, efficient mixed-model association-expedited; EMR, electronic medical record; eQTL, expression quantitative trait locus; GHC, Group Health Cooperative–University of Washington; GWAS, genomewide association study; HCT, hematocrit; HGB, hemoglobin; IBS, identity-by-state; LD, linkage disequilibrium; MC, Marshfield Clinic; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MCV, mean corpuscular volume; MIM, Mendelian Inheritance in Man; NU, Northwestern University; RBC, red blood cell; SNP, single-nucleotide polymorphism; VUMC, Vanderbilt University Medical Center

The red blood cell (RBC) acts as a container for hemoglobin (HGB) and thereby assumes the critical function of oxygen transport and delivery from the lungs to tissues throughout the body to support aerobic metabolism.1 The RBC membrane and metabolic machinery facilitate oxygen transport and delivery function of HGB. Disorders involving RBCs, such as anemia, are common and associated with adverse health outcomes, including cardiovascular events.2 The RBC traits have a substantial genetic component, with heritabilities of 0.56, 0.52, and 0.52 reported for RBC count, mean corpuscular volume (MCV), and mean corpuscular hemoglobin (MCH), respectively.3 Recently, genomewide association studies (GWAS) in cohorts of European ancestry,4-7 as well as in a Japanese cohort,8 have reported multiple quantitative trait loci associated with one or more RBC traits.

The electronic MEdical Records and GEnomics (eMERGE) network was established to develop and implement approaches for leveraging biorepositories with electronic medical record (EMR) systems for large-scale genomic research, including but not limited to GWAS, sequencing, and structural variation.9 The 5 participating sites include Group Health Cooperative–University of Washington (GHC), Marshfield Clinic (MC), Mayo Clinic, Northwestern University (NU), and Vanderbilt University Medical Center (VUMC). Each site is conducting GWAS of multiple phenotypes, with Mayo Clinic investigating genetic loci associated with peripheral arterial disease and RBC traits including HGB, hematocrit (HCT), RBC count, MCV, MCH, and mean corpuscular hemoglobin concentration (MCHC).

Using the Mayo eMERGE cohort (n=3336), we demonstrated feasibility of EMR-based genetic studies to replicate genetic loci previously associated with RBC traits.10 In this article, we report results of an analysis of 12,486 patients in the eMERGE network, performed to identify additional genetic loci influencing interindividual variation in RBC traits. For functional annotation of significant loci, we used a database of uniquely mapped single-nucleotide polymorphisms (SNPs), as well as gene expression and epigenetic data.


Patients and Methods

Study Design of GWAS for RBC Traits in the eMERGE Consortium

We performed a GWAS for 6 RBC traits in 12,486 patients of European ancestry identified from 5 sites in the eMERGE network (Table 1). Four of these traits (HGB, HCT, RBC count, and MCV) are measured directly by standard methods established in clinical laboratories, whereas MCH and MCHC are derived (ie, MCH = HGB × 10/RBC count and MCHC = HGB/HCT). We used an algorithm to extract RBC traits from the EMR, excluding RBC values affected by comorbid conditions, medications, and blood loss.10,11 The algorithm was exported to the remaining eMERGE sites to extract RBC trait values from the EMR. We randomly split the 5 cohorts into a discovery cohort (Mayo + GHC + VUMC, n=7873) and a replication cohort (MC + NU, n=4613). We performed tests of association within each of the 5 sites and within the discovery and replication cohorts. Finally, we performed a joint analysis of the combined cohort that included all 5 sites.
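The two derived indices can be illustrated with a short calculation. This is a minimal sketch, not the study's extraction algorithm: it assumes conventional clinical units that the text does not state (HGB in g/dL, HCT in percent, RBC count in 10^12 cells/L), so the HGB/HCT ratio is scaled by 100 to give MCHC in g/dL. The function names are hypothetical.

```python
def mean_corpuscular_hemoglobin(hgb_g_dl: float, rbc_count_e12_per_l: float) -> float:
    """MCH in picograms per cell: HGB (g/dL) x 10 / RBC count (10^12 cells/L)."""
    if rbc_count_e12_per_l <= 0:
        raise ValueError("RBC count must be positive")
    return hgb_g_dl * 10.0 / rbc_count_e12_per_l


def mean_corpuscular_hemoglobin_concentration(hgb_g_dl: float, hct_percent: float) -> float:
    """MCHC in g/dL: HGB (g/dL) / HCT, with HCT given in percent (assumed unit)."""
    if hct_percent <= 0:
        raise ValueError("Hematocrit must be positive")
    return hgb_g_dl * 100.0 / hct_percent


# Illustrative values only (not data from the study).
mch = mean_corpuscular_hemoglobin(15.0, 5.0)
mchc = mean_corpuscular_hemoglobin_concentration(15.0, 45.0)
```

Because MCH and MCHC are computed from the measured traits, an error in HGB, HCT, or RBC count carries through to the derived values.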

Table 1. Sample Characteristics

Genotyping and Quality Control

Genotyping was performed at the Center for Genotyping and Analysis at the Broad Institute (Mayo, VUMC, and NU) and the Center for Inherited Disease Research at the Johns Hopkins University (MC and GHC) using the Illumina Human660W-Quadv1_A genotyping platform (Illumina, San Diego, CA), consisting of 561,490 SNPs and 95,876 intensity-only probes. Data were cleaned using the quality control pipeline developed by the eMERGE Genomics Working Group.12 The process includes evaluation of sample and marker call rate, sex mismatch and anomalies, duplicate and HapMap concordance, batch effects, Hardy-Weinberg equilibrium, sample relatedness, and population stratification. A total of 476,395 SNPs were available for analysis after applying the following quality control criteria: SNP call rate greater than 98%, sample call rate greater than 98%, minor allele frequency greater than 0.05, Hardy-Weinberg equilibrium P>.001, and 99.99% concordance rate in duplicates. The first 2 dimensions were derived from multidimensional decomposition analysis (using the cmdscale function in R) of the 1-IBS (identity-by-state) matrix based on all eMERGE samples (n=17,358). Samples more than 6 standard errors from the mean of self-reported white ethnicity on dimensions 1 and 2 were excluded. After the quality control steps, 12,486 patients (ie, patients with genetically defined European ancestry) with phenotype and genotype data were available for association analyses.
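The marker-level thresholds listed above can be written as a simple per-SNP filter. The sketch below is illustrative only and is not the eMERGE quality control pipeline. It assumes genotype counts are available for each SNP, and it uses a 1-degree-of-freedom chi-square goodness-of-fit test for Hardy-Weinberg equilibrium; that choice of test is an assumption, because the text does not say which test was used. All function and parameter names are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value from a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_marker_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                     min_call_rate: float = 0.98,
                     min_maf: float = 0.05,
                     min_hwe_p: float = 0.001) -> bool:
    """Apply the call rate, minor allele frequency, and HWE thresholds to one SNP."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate > min_call_rate
            and maf > min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > min_hwe_p)
```

The sample-level criteria (sample call rate, duplicate concordance, relatedness, and ancestry outliers) would need separate checks that are not shown here.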

Statistical Analyses

When multiple measurements of RBC traits were available for an individual patient, we chose the median value and the corresponding age for the genetic analyses. We used an efficient mixed-model association-expedited (EMMAX) algorithm13 to correct for sample relatedness and cryptic population substructure. The IBS matrix was calculated for each pair of individuals using the genomewide genotype data. The generalized least-squares F test was used to estimate the regression coefficient (β) and perform association analyses, which were implemented in the EMMAX package, with adjustment for age, sex, and site. The variance components for each RBC trait, as well as the pseudoheritability13 (ie, the fraction of phenotypic variance explained by the empirically estimated relatedness matrix), were estimated from the kinship matrix and residual errors (from the IBS matrix) (Supplemental Table 1, available online). The statistical power of our study was 80% to detect a quantitative trait locus that explained 0.32% of the variance in an RBC trait, given a sample size of 12,486 and a significance level of 5×10−8. Regional plots of genomic loci associated with RBC traits were plotted using LocusZoom.14
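The stated power can be checked approximately with a standard calculation. The sketch below is not the authors' method, which is not described. It assumes the association test behaves as a 1-degree-of-freedom chi-square test with noncentrality parameter n × h²/(1 − h²), where h² is the fraction of trait variance explained by the locus, and it ignores relatedness and covariates. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def gwas_power(n: int, variance_explained: float, alpha: float = 5e-8) -> float:
    """Approximate power of a 1-df association test for a quantitative trait locus."""
    if not 0.0 < variance_explained < 1.0:
        raise ValueError("variance_explained must be between 0 and 1")
    std_normal = NormalDist()
    # Assumed noncentrality parameter for a locus explaining a fraction h2 of variance.
    ncp = n * variance_explained / (1.0 - variance_explained)
    delta = math.sqrt(ncp)
    # Two-sided critical value on the z scale (equivalent to the chi-square threshold).
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    # P((Z + delta)^2 > z_crit^2) for a standard normal Z.
    return std_normal.cdf(delta - z_crit) + std_normal.cdf(-delta - z_crit)


# Values taken from the text: n = 12,486, 0.32% of variance, alpha = 5e-8.
power = gwas_power(12486, 0.0032)
```

Under these assumptions the result is close to the 80% reported in the text.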

We imputed all ungenotyped SNPs in 9 chromosomes harboring loci associated with RBC traits, based on HapMap II CEU database (release 21). Imputation-based association for ungenotyped SNPs was performed using the same IBS matrix for the genotyped SNPs by EMMAX. We tested the association (r2>0.3) between the imputed SNPs (P<5×10−8) and the most significant (or nonsynonymous) SNP in each locus.

Patterns of Linkage Disequilibrium in the Candidate Regions and Selection of Candidate Genes

The patterns of linkage disequilibrium (LD) in the chromosome regions were illustrated using LocusZoom.14 The SNPs near the most significant or the nonsynonymous SNP (ie, the reference SNP) were color coded to reflect their LD with this SNP. Pairwise r2 values were taken from the HapMap CEU database. We assumed that 2 genes are independently affecting the trait if r2<0.3 between the reference SNP and a SNP in the other gene, recognizing that there is no standard threshold of r2 to declare a good correlation. We also used the conditional haplotype test implemented in PLINK15 to test whether SNPs have independent effects. In a region of LD with multiple genes, we selected the most likely candidate genes based on their function (ie, whether the genes are involved in erythrocyte differentiation, regulation of cell cycle, or iron metabolism).
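The r2<0.3 criterion can be illustrated with a small calculation. In the study, pairwise r2 values were taken from the HapMap CEU database. The sketch below instead assumes unphased genotypes coded as 0, 1, or 2 copies of an allele and uses the squared Pearson correlation of those codes, a common approximation to haplotype-based r2. The function names and the example data are hypothetical.

```python
from typing import Optional, Sequence


def genotype_r2(g1: Sequence[Optional[int]], g2: Sequence[Optional[int]]) -> float:
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    Individuals with a missing genotype (None) at either SNP are skipped.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have the same length")
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 individuals with both genotypes")
    mean1 = sum(a for a, _ in pairs) / n
    mean2 = sum(b for _, b in pairs) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in pairs)
    var1 = sum((a - mean1) ** 2 for a, _ in pairs)
    var2 = sum((b - mean2) ** 2 for _, b in pairs)
    if var1 == 0 or var2 == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var1 * var2)


def treated_as_independent(r2: float, threshold: float = 0.3) -> bool:
    """Apply the working rule from the text: r2 below the threshold."""
    return r2 < threshold


# Toy genotypes, not study data.
reference_snp = [0, 0, 2, 2]
other_snp = [0, 2, 0, 2]
r2 = genotype_r2(reference_snp, other_snp)
```

As the text notes, the 0.3 cutoff is a working convention rather than a standard, which is why the conditional haplotype test was used as a second check.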

Functional Annotation of Significant Loci

We created a pipeline to perform functional annotation of SNPs associated with the RBC traits (eFigure 1).16 First, we performed association analyses of all variants within the region of LD based on imputation of ungenotyped SNPs from the HapMap II CEU database.17 The imputation was implemented using MACH1.18 Functional characterization of significant SNPs was performed using the uniquely mapped SNP database. We then explored associations between statistically significant SNPs (genotyped or imputed) and the expression of genes (ie, expression quantitative trait locus [eQTL] analysis) in the HapMap CEU samples by using the SCAN database.19 Finally, we performed an epigenetic analysis of human embryonic (methylation analysis) and hematopoietic stem cells and erythroid progenitors (histone signatures analysis) in the regions of the significant SNPs (eAppendix 1).


Results

The characteristics of 12,486 patients of European ancestry in the eMERGE cohort are listed in Table 1. We identified 15 chromosomal regions associated with at least one RBC trait in the combined cohort (P<5×10−8) (Figure 1 and Table 2). A list of 142 genotyped SNPs (72 unique) associated with RBC traits, including P values within each individual site and within the discovery, replication, and combined cohorts, is provided in Supplemental Table 2 and Supplemental Table 3 (available online), respectively. We present our GWAS results by chromosome regions because a given chromosome region may contain multiple independent loci (defined based on the pattern of LD in the HapMap CEU population, r2<0.3) that contribute to interindividual variation in RBC traits.

Figure 1. Manhattan plots for genomewide association analysis of red blood cell (RBC) traits. The vertical axis indicates (–log10 transformed) observed P values, and the horizontal line indicates the genomewide significance level of P<5E-08. HCT ...

Table 2. Gene Loci Associated With RBC Traits in the Present and Previous GWAS

New Loci Associated With RBC Traits in Individuals of European Ancestry

In the discovery cohort, 32 new loci not previously reported in individuals of European ancestry were associated with at least 1 RBC trait at a significance level of P≤1×10−5: 4 each with HGB and MCHC, 12 each with MCV and MCH, and 3 each with HCT and RBC count. Of these, 3 loci were replicated in the replication cohort after correction for multiple testing, and 4 loci achieved P<5×10−8 in the combined cohort analyses (Table 3). All the lead SNPs at these loci had similar effect sizes and directions within each eMERGE study site. The 4 new loci on chromosomes 3p24.2, 15q22.3, 16q24.2, and 16q24.3 associated with RBC traits are summarized in Table 3, and the regional plots of these loci are shown in Figure 2.

Figure 2. Regional plots of 4 novel loci associated with red blood cell (RBC) traits on chromosomes 3p24.2 (mean corpuscular volume [MCV]), 15q22.3 (mean corpuscular hemoglobin [MCH]), 16q24.2 (mean corpuscular hemoglobin concentration [MCHC]), and 16q24.3 (MCHC). ...

Table 3. Newly Discovered Loci (for Individuals of European Ancestry) Associated With at Least One of the RBC Traits

Chromosome 3p24.2

An intronic SNP (rs9310736) in the thyroid hormone receptor β gene (THRB) (Mendelian Inheritance in Man [MIM] 190160) was associated with MCV (β=0.35, P=6×10−9). The SNP was associated with MCV and MCH in a Japanese cohort.8 THRB is a nuclear hormone receptor for triiodothyronine (T3) and mediates biological activities of the thyroid hormone.20

Chromosome 15q22.3

At this locus, 4 SNPs in an LD block of 328 kb were associated with MCH. The region contains several candidate genes: dipeptidyl-peptidase 8 (DPP8), protein tyrosine phosphatase-like A domain containing 1 (PTPLAD1), solute carrier family 24 (sodium/potassium/calcium exchanger), member 1 (SLC24A1), chromosome 15 open reading frame 44 (C15orf44), and DENN/MADD domain containing 4A (DENND4A). One of the SNPs (rs352476, β=0.13, P=9×10−9) is located in the 5′ regulatory region of PTPLAD1, a component of the Rac1-signalling pathway necessary for erythropoiesis in the bone marrow.21 Another candidate gene, C15orf44, is differentially expressed in hematopoietic stem cells and erythrocyte progenitor cells.22 An intronic SNP (rs6494537) in DENND4A was associated with MCH in the Japanese cohort.8

Chromosome 16q24.2

An intergenic SNP (rs9937239) in this region was associated with MCHC (β=0.06, P=2×10−8). No known gene is present within ±60 kb of this SNP, although transcripts have been detected by RNA sequencing. Two major CCCTC-binding factor binding sites flank rs9937239 and could act in trans to confer enhancer-blocking insulator activity.23

Chromosome 16q24.3

A SNP (rs837763) in the 5′ flanking region of the chromatin licensing and DNA replication factor 1 gene (CDT1) was associated with MCHC (β=−0.06, P=2×10−8), and the same SNP was associated with MCHC in a Japanese cohort.8 CDT1 cooperates with CDC6 (cell division cycle 6 homolog) to promote loading of the minichromosome maintenance complex onto chromatin to form the pre-replication complex needed to initiate DNA replication.24

Replicated Loci

Eleven loci previously associated with RBC traits in individuals of European ancestry were replicated in the joint analysis (Table 4). Regional plots for these loci are shown in eFigure 2. We provide a detailed description of GWAS results and LD analyses at these loci. On the basis of LD analysis and conditional haplotype test, we identified additional significant signals, independent of previously reported genes, at 3 chromosome loci: chr6p22.2 (1 gene [HIST1H2AC] independent of HFE) (eFigure 2c), chr6q23.3 (1 gene [ALDH8A1] independent of HBS1L/MYB) (eFigure 2e), and chr22q12.3 (1 gene [MPST] independent of TMPRSS6) (eFigure 2j).

Table 4. Replication of Loci Previously Reported to Be Associated With RBC Traits

Chromosome 1q23.1

A nonsynonymous SNP (rs857725, Lys1693Gln) and a SNP in the 3′ flanking region of the spectrin gene (SPTA1) were associated with MCHC (rs857725, β=0.07, P=2×10−8). An intronic SNP (rs857721) in SPTA1 was associated with MCHC in a prior GWAS.4 Spectrin is an actin cross-linking and molecular scaffold protein that links the plasma membrane to the actin cytoskeleton26 and determines cell shape, arrangement of transmembrane proteins, and organization of organelles. Mutations in SPTA1 result in a variety of hereditary RBC disorders, including hemolytic anemia, elliptocytosis, spherocytosis, and pyropoikilocytosis.27 Exon 1 and intron 1 of SPTA1 have been proposed as candidate regions for mutations in patients with spectrin-linked hemolytic anemia.28

Chromosome 4q12

An intergenic SNP (rs218237) in this region was associated with RBC count (β=0.04, P=1×10−8), MCV (β=−0.58, P=3×10−12), and MCH (β=−0.22, P=3×10−11). The SNP was associated with RBC count and MCH in a Japanese cohort.8 Another SNP (rs172629, 13.5 kb downstream of rs218237) at this locus was associated with MCV in a prior GWAS.4 The rs218237 is 230 kb downstream of the platelet-derived growth factor receptor α polypeptide gene (PDGFRA) and 130 kb upstream of the human homolog of the proto-oncogene c-kit gene (KIT). The former is an attractive regional candidate gene for hematologic traits.

Chromosome 6p22.2-6p22.1

We found multiple SNPs within this locus to be associated with several RBC traits. This genomic region spans approximately 2.4 Mb and contains 66 reference genes (eFigure 2c). The nonsynonymous SNP rs1800562 in HFE is known to be associated with HGB,4,5 HCT,4 and MCV.4,5 In the present study, the SNP was associated with HGB (β=−0.19, P=2×10−9), MCV (β=−1.17, P=2×10−21), MCH (β=−0.52, P=2×10−27), and MCHC (β=−0.14, P=3×10−10).

In addition to the hemochromatosis gene (HFE), there are 7 butyrophilin family genes, 4 solute carrier family genes (eg, SLC17A1), 40 histone family genes (eg, HIST1H2BJ), and 14 other genes (eg, LRRC16A, SCGN, and TRIM38). These genes are in different LD blocks and may therefore independently contribute to variation in RBC traits. For example, there was low LD between SNP rs1800562 in HFE (reference SNP) and rs169219 near the tripartite motif containing 38 gene (TRIM38) (r2=0.04) and rs129128 in the HIST1H2AC gene (r2=0.01).

The SNP (rs169219) in the 5′ flanking region of TRIM38 was associated with MCV (β=0.38, P=1×10−9) and MCH (β=0.17, P=1×10−11). Three SNPs (rs12216125, rs9379818, and rs9295684) in the 3′ flanking region of TRIM38 were also associated with MCH, and rs12216125 was additionally associated with MCHC. Linkage disequilibrium analysis revealed an LD block spanning approximately 66 kb (26,083-26,150 kb) in this region that contains 6 histone family genes. The TRIM motif in TRIM38 includes 3 zinc-binding domains (ie, a RING, a B-box type 1 and a B-box type 2, and a coiled-coil region) and is involved in signal transduction of I-κB kinase/NF-κB cascade. However, rs169219 was not independently associated with MCV and MCH by conditional haplotype test (P=0.23 and P=0.21, respectively).29

Several SNPs (rs707896, rs129128, and rs2856646) within the locus 26,203 to 26,243 kb were associated with MCV (rs129128, β=−0.65, P=2×10−14), MCH (rs129128, β=−0.30, P=1×10−19), and MCHC (rs129128, β=−0.09, P=4×10−10). Although this region is adjacent to HFE (rs1800562), LD analysis suggests that these SNPs are located in an LD block (approximately 40 kb) separate from HFE. This region contains 4 histone family genes: HIST1H4C, HIST1H1T, HIST1H2BC, and HIST1H2AC. The SNP rs129128 was independently associated with MCV (F=43.1; P=5×10−11) and MCH (F=58.6; P=2×10−14).

Chromosome 6p21.1

This region with a strong level of LD contains 4 candidate genes: ubiquitin specific peptidase 49 (USP49), mediator complex subunit 20 (MED20), bystin-like (BYSL), and cyclin D3 (CCND3). Multiple intronic SNPs in this region were associated with MCV and MCH. The lead SNP (rs3218097) is in the intron of CCND3 (β=−0.58, P=1×10−18 for MCV and β=−0.18, P=6×10−12 for MCH). The same SNP was associated with RBC count in a Japanese cohort,8 and 2 other SNPs (rs9349205 and rs11970772) in an intron of CCND3 were associated with MCV4,5 and MCH.4 CCND3 is involved in hematopoietic stem cell expansion.30 It forms a complex with and functions as a regulatory subunit of cyclin-dependent kinase 4 (CDK4) or cyclin-dependent kinase 6 (CDK6), whose activity is required for cell cycle G1/S transition.

Chromosome 6q23.3

Multiple SNPs in the 5′ flanking region of HBS1L/MYB were associated with HCT, RBC, MCV, MCH, and MCHC. This locus has been identified in prior GWAS for RBC traits.4,5 These SNPs were located in a region (approximately 120 kb) between HBS1L and MYB (top SNP rs7775698). We also noted several SNPs in the 3′ flanking region of the aldehyde dehydrogenase 8 family, member A1 gene (ALDH8A1) to be associated with MCV (rs9483769, β=0.59, P=5×10−9) and MCH (rs9483769, β=0.23, P=4×10−9) (eFigure 2e). We noted low LD (r2=0.02) between rs7775698 and rs9483769, and the latter was independently associated with MCV (F=14.4; P=1×10−4) and MCH (F=14.1; P=2×10−4). ALDH8A1 plays a role in the 9-cis-retinoic acid biosynthesis pathway by converting 9-cis-retinaldehyde into the retinoid X receptor ligand 9-cis-retinoic acid. All-trans retinoic acid and its isomer 9-cis retinoic acid may play a key role in hematopoiesis that occurs in the fetal liver.31

Chromosome 6q24.1

Two intergenic SNPs were associated with MCV (lead SNP rs668459, β=−0.33, P=1×10−8). A SNP (rs643381) in this locus was associated with MCV and another SNP (rs628751) with MCH.4 There is only one noncoding gene (LOC645434) within ±60 kb of this locus.

Chromosome 7q22.1

An intronic SNP (rs4434553) in the transferrin receptor 2 gene (TFR2) was associated with MCV (β=−0.34, P=2×10−9) and MCH (β=−0.12, P=2×10−8). Another intronic SNP (rs7385804) in TFR2 was associated with HCT and RBC in prior GWAS.4,5 Pichler et al32 identified a common variant (rs7385804) in TFR2 to be associated with serum iron levels. TFR2 is a member of the transferrin receptor–like family that mediates cellular uptake of transferrin-bound iron, and mutations in this gene have been associated with hereditary hemochromatosis type III.33 TFR2, a homolog of TFR1 with expression restricted to hepatocytes and erythroid cells, binds transferrin at lower affinity than TFR1 and is involved in iron homeostasis.

In addition, imputation-based association analyses identified a SNP (rs551238, r2=0.31 with rs4434553 in TFR2) in the 3′ flanking region of the erythropoietin gene (EPO) to be associated with MCV (P=4.6×10−8) and MCH (P=9.7×10−9). EPO is a hematopoietic growth factor that regulates RBC mass and stimulates erythropoiesis by binding with the erythropoietin receptor.

Chromosome 19p13.3

Several SNPs at this locus, which spans approximately 45 kb with a high level of LD, were associated with MCH (rs2293683, β=−0.14, P=2×10−9). Ganesh et al4 reported that a SNP (rs11085824) at this locus was associated with MCH (P=1×10−11), which is in LD with rs2293683 (r2=0.93). Candidate genes in this region include Kruppel-like factor 1 (erythroid) (KLF1), glutaryl-CoA dehydrogenase (GCDH), synaptonemal complex central element protein 2 (SYCE2), and calreticulin (CALR). Of note, KLF1 (MIM 600599) is a zinc finger transcription factor expressed in erythroid cells and their precursors.34

Chromosome 22q12.3

Multiple SNPs in the F-box protein 7 gene (FBXO7) (MIM 605648) were associated with MCV, including a nonsynonymous SNP (rs11107, β=0.35, P=3×10−9). A SNP (rs5994574) in the 5′ flanking region of FBXO7 was also associated with MCH. A SNP (rs9609565) in the 5′ flanking region of FBXO7 was previously reported to be associated with MCV.5 The F-box proteins constitute one of the 4 subunits of the ubiquitin protein ligase complex called SCFs (SKP1-cullin-F-box), which functions in phosphorylation-dependent ubiquitination.36

Chromosome 22q12.3

The transmembrane protease, serine 6 gene (TMPRSS6), has been associated with HGB, MCV, MCH, and MCHC in prior GWAS.4,5 We found that multiple SNPs within TMPRSS6 were associated with HGB, MCV, MCH, and MCHC. We also noted that several SNPs (lead SNP rs130624, 23 kb upstream of TMPRSS6) in the potassium channel tetramerisation domain containing 17 gene (KCTD17) were associated with HGB and MCV. However, these SNPs were in LD with the top SNP rs855791 in TMPRSS6.

A SNP (rs8141597) in the mercaptopyruvate sulfurtransferase gene (MPST) was associated with MCV (β=−0.39, P=1×10−10), and MCH (β=−0.15, P=8×10−11) (eFigure 2j). MPST catalyzes the transfer of a sulfur ion from 3-mercaptopyruvate to cyanide or other thiol compounds and may be involved in cysteine degradation and cyanide detoxification. Mutations in MPST result in a rare heritable disorder (mercaptolactate-cysteine disulfiduria) characterized by elevated levels of the mixed disulfide of 3-mercaptolactate and cysteine in the urine,35 RBCs devoid of MPST activity, and anemia. We noted low LD (r2=0.13) between rs855791 and rs8141597, and the latter was independently associated with MCV (F=5.96; P=3×10−3) and MCH (F=4.24; P=.01).

Chromosome 22q13.3

A SNP (rs140522) in the 5′ flanking region of the thymidine phosphorylase gene (TYMP) was associated with MCV (β=0.34, P=2×10−8). Ganesh et al4 also found SNP rs131794 in the 5′ flanking region of TYMP to be associated with MCV. An additional intronic SNP (rs470119) was associated with MCH in a Japanese cohort.8 TYMP encodes an angiogenic factor that promotes angiogenesis in vivo and stimulates the growth of endothelial cells in vitro.37

Functional Annotation of SNPs Associated With RBC Traits

We created a pipeline to perform functional annotation of genotyped or imputed SNPs associated with the RBC traits (eFigure 1)16 and identified 32 SNPs associated with RBC traits as potentially functional (Supplemental Table 4, available online). Seven of the variants were nonsynonymous. The eQTL (Supplemental Table 5 and Supplemental Table 6, available online), methylation (Supplemental Table 7 and Supplemental Table 8, available online), and histone modification (Supplemental Table 9 and Supplemental Table 10, available online) analyses suggested that the remaining SNPs may exert their effect by regulating gene expression. A detailed description of the results of functional characterization of SNPs associated with RBC traits is presented in eAppendix 2.


Discussion

In the present study, we report results of a GWAS for 6 RBC traits (HGB, HCT, RBC count, MCV, MCH, and MCHC) in 12,486 patients of European ancestry from 5 sites that comprise the eMERGE network.9,38 We identified 15 chromosomal loci associated with at least one RBC trait, including 11 loci associated with RBC traits in prior cohort studies.4,5,7 Notably, genes present in 3 of the 4 loci newly identified in individuals of European ancestry (THRB, PTPLAD1, CDT1) and in 6 of the 11 replicated loci (KLF1, ALDH8A1, CCND3, SPTA1, FBXO7, TFR2/EPO) are implicated in erythroid differentiation and regulation of cell cycle in hematopoietic stem cells. Our report also highlights the potential use of the EMR in conducting GWAS of quantitative traits of medical importance.

The sample size in this study was similar to that of previous GWAS of RBC traits in individuals of European ancestry,4,5 yet we were able to identify 4 new loci and replicate 11 loci that had been identified in at least 2 of the previous reports. Of the 4 new loci that were novel for individuals of European ancestry, 3 have been reported to be associated with RBC traits in Japanese individuals.8 An advantage of EMR-based GWAS of medically relevant quantitative traits is that the traits are measured by well-validated and established methods. In addition, multiple measures over time are often available, and these may permit study of the genetic basis of temporal trends in the traits. On the other hand, trait values may be affected by acute illness or chronic comorbid conditions present at the time of measurement, and appropriate EMR-based phenotyping algorithms for excluding such trait values need to be developed (as was done for this study) before genetic analyses.

In the present study, genes (PTPLAD1, THRB, and CDT1) residing in 3 newly identified loci associated with RBC traits are implicated in erythroid differentiation and cell cycle regulation (Figure 3). Coordinated regulation of cell cycle progression and differentiation is critical for normal hematopoiesis. Cells committed to differentiation undergo a programmed loss of proliferative capacity, restricted to only a few divisions because of their irreversible growth arrest in G1 phase, and terminal erythroid differentiation is accompanied by arrest of the cell cycle in the G1/G0 phase.41 We briefly describe the role of these genes in cell proliferation and differentiation below.

Figure 3. Genes implicated in erythroid differentiation are associated with red blood cell (RBC) traits. Three novel genes (THRB, PTPLAD1, CDT1) shaded in blue and 6 replicated loci (KLF1, ALDH8A1, SPTA1, FBXO7, CCND3, TFR2/EPO) shaded in brown are shown. A, Different ...

PTPLAD1 potentiates Rac1-induced NF-κB and c-Jun N-terminal kinase activation and also forms complexes with constitutively activated Rac1.42 Rac1 (and Rac2) GTPases are necessary for early erythropoietic expansion in the bone marrow, and erythropoiesis in Rac1−/− and Rac2−/− mice is characterized by abnormal burst-forming units–erythroid colony morphologic findings and decreased numbers of megakaryocyte-erythrocyte progenitors, colony-forming units–erythroid, and erythroblasts in the bone marrow.21

THRB is also known as erythroblastic leukemia viral (v-erbA) oncogene homolog 2, avian. The v-erbA oncogene of avian erythroblastosis virus encodes an aberrant version of a gene for a thyroid hormone receptor (c-erbA) and promotes neoplasia by blocking erythroid differentiation. Ligand-activated c-erbA/TR accelerates erythroid differentiation, whereas unliganded c-erbA/TR effectively blocks erythroid differentiation.43 Expression of THRB may vary according to cell cycle, thereby mediating hormone sensitivity and contributing to cell cycle progression during normal development.44

Cdk6 has been implicated in the terminal hematopoietic differentiation processes in mice, and loss of Cdk6 affects the production of terminally differentiated myeloid and erythroid cells.45 CDT1, which is required for the initiation of DNA replication, together with origin recognition complex and CDC6, constitutes the machinery that loads the minichromosome maintenance complex, a candidate replicative helicase, onto chromatin during the G1 phase.46 In mice, CDT1 is phosphorylated by CDKs during the cell cycle, which induces the association of CDT1 with ubiquitination complex SCF-SKP2 and targets CDT1 for degradation.

Of the genes residing in the replicated loci, 6 were also involved in erythrocyte differentiation and regulation of cell cycle. Erythropoiesis is largely mediated by a relatively small number of lineage-restricted transcription factors, including KLF1.47 KLF1 controls the development and differentiation of erythroid lineage by mediating the switch from expression of fetal γ-globin to adult β-globin and regulates transcription of genes that encode cytoskeletal proteins, heme synthesis enzymes, and blood group antigens.48,49 KLF1 also regulates the lineage progression of megakaryocyte-erythroid progenitor cells. Failure of terminal erythroid differentiation in KLF1-deficient mice is associated with cell cycle perturbation and reduced expression of E2F2.49 In humans, haploinsufficiency for KLF1 causes hereditary persistence of fetal hemoglobin.50 In erythrocyte progenitor cells, KLF1 was upregulated 28-fold when compared with hematopoietic stem cells.22

ALDH8A1 converts 9-cis-retinaldehyde into 9-cis-retinoic acid, which influences hematopoiesis in the fetal liver.31 CCND3 interacts with CDK4 and CDK6 to play a lineage-independent role in hematopoiesis;51 SPTA1 and TFR2 are proteins related to erythrocyte structure and function; EPO is a hematopoietic growth factor; and EPOR is the required receptor. In the mouse, FBXO7 associates specifically with CDK6 to promote CDK6–cyclin D complex formation and cellular transformation.52

The effect of trait-associated alleles may be exerted by multiple mechanisms influencing gene expression.16 Of the SNPs associated with RBC traits in the present study, 15 SNPs in 7 loci were associated with 27 distant eQTLs (Supplemental Table 5, available online). The associated SNPs may also influence epigenetic regulation, including methylation and histone modification.

Three of the 4 newly discovered loci associated with RBC traits in our analyses were previously reported in a Japanese cohort.8 Nonetheless, these loci are novel for individuals of European ancestry. As expected, the associations of these loci with RBC traits are weaker than those of the replicated loci. Our annotation of likely candidate genes and their functions is based on LD patterns, the published literature, and bioinformatics analyses. Additional work is needed to confirm the role of specific genes at each locus in RBC biology.


We identified 4 new genetic loci associated with RBC traits at P<5×10⁻⁸ in individuals of European ancestry and replicated 11 previously reported loci. Genes present within 3 of the 4 newly identified loci (THRB, PTPLAD1, and CDT1) and in 6 of the 11 replicated loci (KLF1, ALDH8A1, CCND3, SPTA1, FBXO7, and TFR2/EPO) are implicated in erythroid differentiation and cell cycle regulation in hematopoietic stem cells. The results provide insights into common genetic variants influencing RBC traits and advance our understanding of the mechanisms regulating erythroid differentiation. These findings have implications for understanding the origin of RBC disorders, including anemia and erythroid dysplasia/neoplasia, and for developing therapies for such disorders.
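The significance threshold of 5×10⁻⁸ quoted above is the conventional genomewide cutoff. One common way to motivate it, offered here as an illustrative sketch and not as the authors' stated derivation, is a Bonferroni correction of a 0.05 family-wise error rate over an assumed 1 million independent common-variant tests. The function names and the example P values below are hypothetical.

```python
import math

# Assumptions for illustration only: a 0.05 family-wise error rate and
# roughly one million independent common-variant tests.
FAMILYWISE_ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000

# Conventional genomewide significance cutoff quoted in the text.
GENOMEWIDE_ALPHA = 5e-8


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at alpha."""
    return alpha / n_tests


def is_genomewide_significant(p_value: float,
                              threshold: float = GENOMEWIDE_ALPHA) -> bool:
    """True when a P value is strictly below the genomewide cutoff."""
    return p_value < threshold


if __name__ == "__main__":
    derived = bonferroni_threshold(FAMILYWISE_ALPHA, ASSUMED_INDEPENDENT_TESTS)
    print("Derived threshold matches 5e-8:",
          math.isclose(derived, GENOMEWIDE_ALPHA))

    # Hypothetical P values, not results from the study.
    for p in (3e-9, 5e-8, 2e-6):
        print(p, is_genomewide_significant(p))
```

Because the comparison is strict, a P value exactly equal to 5×10⁻⁸ is not counted as significant in this sketch, which matches the "P<5×10⁻⁸" wording.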


Grant Support: The eMERGE network was initiated and funded by the National Human Genome Research Institute, with additional funding from the National Institute of General Medical Sciences, through the following grants: U01-HG-04599 (Mayo Clinic), U01-HG-004610 (GHC), U01-HG-004608 (MC), U01-HG-004609 (NU), and U01-HG-04603 (VUMC, also serving as the administrative coordinating center).

Supplementary material

Online eAppendix, eTables 1 to 10, and eFigures 1 and 2


1. Ascenzi P., Bellelli A., Coletta M. Multiple strategies for O2 transport: from simplicity to complexity. IUBMB Life. 2007;59(8-9):600–616.
2. Sarnak M.J., Tighiouart H., Manjunath G. Anemia as a risk factor for cardiovascular disease in The Atherosclerosis Risk in Communities (ARIC) study. J Am Coll Cardiol. 2002;40(1):27–33.
3. Lin J.P., O'Donnell C.J., Jin L., Fox C., Yang Q., Cupples L.A. Evidence for linkage of red blood cell size and count: genome-wide scans in the Framingham Heart Study. Am J Hematol. 2007;82(7):605–610.
4. Ganesh S.K., Zakai N.A., van Rooij F.J. Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat Genet. 2009;41(11):1191–1198.
5. Soranzo N., Spector T.D., Mangino M. A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. Nat Genet. 2009;41(11):1182–1190.
6. Chambers J.C., Zhang W., Li Y. Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Nat Genet. 2009;41(11):1170–1172.
7. Benyamin B., Ferreira M.A., Willemsen G. Common variants in TMPRSS6 are associated with iron status and erythrocyte volume. Nat Genet. 2009;41(11):1173–1175.
8. Kamatani Y., Matsuda K., Okada Y. Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat Genet. 2010;42(3):210–215.
9. McCarty C.A., Chisholm R.L., Chute C.G. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies. BMC Med Genomics. 2011;4(1):13.
10. Kullo I.J., Ding K., Jouni H., Smith C.Y., Chute C.G. A genome-wide association study of red blood cell traits using the electronic medical record. PLoS One. 2010;5(9):e13011.
11. Kullo I.J., Ding K., Shameer K. Complement receptor 1 gene variants are associated with erythrocyte sedimentation rate. Am J Hum Genet. 2011:89181–89189.
12. Turner S., Armstrong L.L., Bradford Y. Quality control procedures for genome-wide association studies. Curr Protoc Hum Genet. 2011;Chapter 1:Unit 1.19.
13. Kang H.M., Sul J.H., Service S.K. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42(4):348–354.
14. Pruim R.J., Welch R.P., Sanna S. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–2337.
15. Purcell S., Neale B., Todd-Brown K. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575.
16. Freedman M.L., Monteiro A.N., Gayther S.A. Principles for the post-GWAS functional characterization of cancer risk loci. Nat Genet. 2011;43(6):513–518.
17. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437(7063):1299–1320.
18. Li Y., Willer C., Sanna S., Abecasis G. Genotype imputation. Annu Rev Genomics Hum Genet. 2009;10:387–406.
19. Gamazon E.R., Zhang W., Konkashbaev A. SCAN: SNP and copy number annotation. Bioinformatics. 2010;26(2):259–262.
20. Sakurai A., Nakai A., DeGroot L.J. Structural analysis of human thyroid hormone receptor beta gene. Mol Cell Endocrinol. 1990;71(2):83–91.
21. Kalfa T.A., Pushkaran S., Zhang X. Rac1 and Rac2 GTPases are necessary for early erythropoietic expansion in the bone marrow but not in the spleen. Haematologica. 2010;95(1):27–35.
22. Cui K., Zang C., Roh T.Y. Chromatin signatures in multipotent human hematopoietic stem cells indicate the fate of bivalent genes during differentiation. Cell Stem Cell. 2009;4(1):80–93.
23. Bao L., Zhou M., Cui Y. CTCFBSDB: a CTCF-binding site database for characterization of vertebrate genomic insulators. Nucleic Acids Res. 2008;36(Database issue):D83–D87.
24. Nishitani H., Lygerou Z., Nishimoto T., Nurse P. The Cdt1 protein is required to license DNA for replication in fission yeast. Nature. 2000;404(6778):625–628.
25. Ferreira M.A., Hottenga J.J., Warrington N.M. Sequence variants in three loci influence monocyte counts and erythrocyte volume. Am J Hum Genet. 2009;85(5):745–749.
26. Ziemnicka-Kotula D., Xu J., Gu H. Identification of a candidate human spectrin Src homology 3 domain-binding protein suggests a general mechanism of association of tyrosine kinases with the spectrin-based membrane skeleton. J Biol Chem. 1998;273(22):13681–13692.
27. Marchesi S.L., Letsinger J.T., Speicher D.W. Mutant forms of spectrin alpha-subunits in hereditary elliptocytosis. J Clin Invest. 1987;80(1):191–198.
28. Gallagher P.G., Nilson D.G., Steiner L.A., Maksimova Y.D., Lin J.Y., Bodine D.M. An insulator with barrier-element activity promotes alpha-spectrin gene expression in erythroid cells. Blood. 2009;113(7):1547–1554.
29. Matsuda A., Suzuki Y., Honda G. Large-scale identification and characterization of human genes that activate NF-kappaB and MAPK signaling pathways. Oncogene. 2003;22(21):3307–3318.
30. Kozar K., Ciemerych M.A., Rebel V.I. Mouse development and cell proliferation in the absence of D-cyclins. Cell. 2004;118(4):477–491.
31. Tocci A., Parolini I., Gabbianelli M. Dual action of retinoic acid on human embryonic/fetal hematopoiesis: blockade of primitive progenitor proliferation and shift from multipotent/erythroid/monocytic to granulocytic differentiation program. Blood. 1996;88(8):2878–2888.
32. Pichler I., Minelli C., Sanna S. Identification of a common variant in the TFR2 gene implicated in the physiological regulation of serum iron levels. Hum Mol Genet. 2011;20(6):1232–1240.
33. Roetto A., Totaro A., Piperno A. New mutations inactivating transferrin receptor 2 in hemochromatosis type 3. Blood. 2001;97(9):2555–2560.
34. Miller I.J., Bieker J.J. A novel, erythroid cell-specific murine transcription factor that binds to the CACCC element and is related to the Kruppel family of nuclear proteins. Mol Cell Biol. 1993;13(5):2776–2786.
35. Billaut-Laden I., Rat E., Allorge D. Evidence for a functional genetic polymorphism of the human mercaptopyruvate sulfurtransferase (MPST), a cyanide detoxification enzyme. Toxicol Lett. 2006;165(2):101–111.
36. Chang Y.F., Cheng C.M., Chang L.K., Jong Y.J., Yuo C.Y. The F-box protein Fbxo7 interacts with human inhibitor of apoptosis protein cIAP1 and promotes cIAP1 ubiquitination. Biochem Biophys Res Commun. 2006;342(4):1022–1026.
37. Leung D.W., Cachianes G., Kuang W.J., Goeddel D.V., Ferrara N. Vascular endothelial growth factor is a secreted angiogenic mitogen. Science. 1989;246(4935):1306–1309.
38. Kho A.N., Pacheco J.A., Peissig P.L. Electronic medical records for genetic research: results of the eMERGE consortium. Sci Transl Med. 2011;3(79):79re71.
39. Koury M.J., Sawyer S.T., Brandt S.J. New insights into erythropoiesis. Curr Opin Hematol. 2002;9(2):93–100.
40. Truong L.N., Wu X. Prevention of DNA re-replication in eukaryotic cells. J Mol Cell Biol. 2011;3(1):13–22.
41. Tsiftsoglou A.S., Vizirianakis I.S., Strouboulis J. Erythropoiesis: model systems, molecular regulators, and developmental programs. IUBMB Life. 2009;61(8):800–830.
42. Courilleau D., Chastre E., Sabbah M., Redeuilh G., Atfi A., Mester J. B-ind1, a novel mediator of Rac1 signaling cloned from sodium butyrate-treated fibroblasts. J Biol Chem. 2000;275(23):17344–17348.
43. Bartunek P., Zenke M. Retinoid X receptor and c-erbA/thyroid hormone receptor regulate erythroid cell growth and differentiation. Mol Endocrinol. 1998;12(9):1269–1279.
44. Maruvada P., Dmitrieva N.I., East-Palmer J., Yen P.M. Cell cycle-dependent expression of thyroid hormone receptor-beta is a mechanism for variable hormone sensitivity. Mol Biol Cell. 2004;15(4):1895–1903.
45. Malumbres M., Sotillo R., Santamaria D. Mammalian cells cycle without the D-type cyclin-dependent kinases Cdk4 and Cdk6. Cell. 2004;118(4):493–504.
46. Sugimoto N., Tatsumi Y., Tsurumi T. Cdt1 phosphorylation by cyclin A-dependent kinases negatively regulates its function without affecting geminin binding. J Biol Chem. 2004;279(19):19691–19697.
47. Cantor A.B., Orkin S.H. Transcriptional regulation of erythropoiesis: an affair involving multiple partners. Oncogene. 2002;21(21):3368–3376.
48. Hodge D., Coghill E., Keys J. A global role for EKLF in definitive and primitive erythropoiesis. Blood. 2006;107(8):3359–3370.
49. Pilon A.M., Arcasoy M.O., Dressman H.K. Failure of terminal erythroid differentiation in EKLF-deficient mice is associated with cell cycle perturbation and reduced expression of E2F2. Mol Cell Biol. 2008;28(24):7394–7401.
50. Borg J., Papadopoulos P., Georgitsi M. Haploinsufficiency for the erythroid transcription factor KLF1 causes hereditary persistence of fetal hemoglobin. Nat Genet. 2010;42(9):801–805.
51. Uchimaru K., Taniguchi T., Yoshikawa M. Detection of cyclin D1 (bcl-1, PRAD1) overexpression by a simple competitive reverse transcription-polymerase chain reaction assay in t(11;14)(q13;q32)-bearing B-cell malignancies and/or mantle cell lymphoma. Blood. 1997;89(3):965–974.
52. Laman H., Funes J.M., Ye H. Transforming activity of Fbxo7 is mediated specifically through regulation of cyclin D/cdk6. EMBO J. 2005;24(17):3104–3116.

Articles from Mayo Clinic Proceedings are provided here courtesy of The Mayo Foundation for Medical Education and Research