Red blood cell, white blood cell, and platelet measures, including their count, sub-type and volume, are important diagnostic and prognostic clinical parameters for several human diseases. To identify novel loci associated with hematological traits, and compare the architecture of these phenotypes between ethnic groups, the CARe Project genotyped 49,094 single nucleotide polymorphisms (SNPs) that capture variation in ~2,100 candidate genes in DNA of 23,439 Caucasians and 7,112 African Americans from five population-based cohorts. We found strong novel associations between erythrocyte phenotypes and the glucose-6-phosphate dehydrogenase (G6PD) A-allele in African Americans (rs1050828, P < 2.0 × 10−13, T-allele associated with lower red blood cell count, hemoglobin, and hematocrit, and higher mean corpuscular volume), and between platelet count and a SNP at the tropomyosin-4 (TPM4) locus (rs8109288, P = 3.0 × 10−7 in Caucasians; P = 3.0 × 10−7 in African Americans, T-allele associated with lower platelet count). We strongly replicated many genetic associations to blood cell phenotypes previously established in Caucasians. A common variant of the α-globin (HBA2-HBA1) locus was associated with red blood cell traits in African Americans, but not in Caucasians (rs1211375, P < 7 × 10−8, A-allele associated with lower hemoglobin, mean corpuscular hemoglobin, and mean corpuscular volume). Our results show similarities but also differences in the genetic regulation of hematological traits in European- and African-derived populations, and highlight the role of natural selection in shaping these differences.
Blood cell counts are important clinical parameters: they are altered in many human diseases (e.g., cancers, infections and inflammation), and strongly modulate severity in primary blood disorders (e.g., the hemoglobinopathies). Genome-wide association studies (GWAS) in individuals of European ancestry have identified >30 loci that carry common DNA polymorphisms associated with blood cell numbers [including red blood cells (RBC), white blood cells (WBC), WBC sub-types, and platelets (PLT)] and related phenotypes [hematocrit (Hct), hemoglobin (Hb), mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and mean platelet volume (MPV)] (see Supplementary Table 1 for a definition of these different hematological traits and how they are derived) (Ganesh et al. 2009; Gudbjartsson et al. 2009; Meisinger et al. 2009; Soranzo et al. 2009a, b; Uda et al. 2008). Recent GWAS have also reported genetic associations between common SNPs and blood cell parameters in Japanese populations (Kamatani et al. 2010; Okada et al. 2010). Finally, genetic variation at the Duffy antigen receptor for chemokines (DARC) locus explains reduced total WBC and neutrophil levels in African Americans, compared to European Americans (Nalls et al. 2008; Reich et al. 2009).
To identify novel loci associated with blood cell counts and related indices, and to compare the genetic architecture of these quantitative traits in two different ethnic groups, we analyzed genetic association between each of 12 hematologic traits and ~175,000 SNPs that were genotyped or imputed in 23,439 Caucasians and 7,112 African Americans. These SNPs had been selected to capture variation across multiple ethnic groups (including African-derived populations) in ~2,100 candidate genes for heart, lung, and blood disorders (Keating et al. 2008). Here, we present evidence that the canonical glucose-6-phosphate dehydrogenase (G6PD) A-allele (rs1050828) implicated in G6PD deficiency (MIM #305900) and malaria resistance (Guindo et al. 2007; Ruwende et al. 1995; Tishkoff et al. 2001) also associates with RBC, Hct, Hb, and MCV variation (P < 2 × 10−13) in African Americans. We also identified a novel association between rs8109288 in the TPM4 gene and PLT count in Caucasians (P = 3.0 × 10−7) and African Americans (P = 3.0 × 10−7). In Caucasians, we replicated the association between 13 distinct loci and hematological traits (Ganesh et al. 2009; Soranzo et al. 2009b). In African Americans, we found strong evidence of association between the DARC locus and WBC count (Nalls et al. 2008; Reich et al. 2009), and between the α-globin (HBA2-HBA1) locus and RBC count.
All participants gave informed written consent. The CARe project is approved by the ethics committees of the participating studies and of the Massachusetts Institute of Technology. This project was also reviewed and approved by the Montreal Heart Institute’s ethics committee.
Phenotypes from 23,439 Caucasians and 7,112 African Americans from the Atherosclerosis Risk in Communities (ARIC) study, the Coronary Artery Risk Development in young Adults (CARDIA) study, the Cardiovascular Health Study (CHS), the Framingham Heart Study (FHS), and the Jackson Heart Study (JHS) were analyzed in this study. A detailed description of each cohort can be found in the Supplementary Information.
All samples were genotyped by the Genetic Analysis Platform at the Broad Institute using the ITMAT-Broad-CARe (IBC) Illumina iSELECT array according to the manufacturer’s recommendations (Keating et al. 2008). Genotypes were called using Beadstudio (Illumina) and the calling cluster Id CVDSNP55v1_A.EGT. Quality control filters applied are summarized in Supplementary Tables 2–3.
Imputation was performed using MACH 1.0.16 (Li et al. 2009). MACH requires phased reference haplotypes to perform imputation. For European Americans, we used the reference haplotypes from the Northern European CEU population from HapMap phase 2. For the African Americans, a combined CEU + YRI reference panel was created using HapMap phase 2 data (Kang et al. 2010). This panel includes SNPs segregating in both CEU and YRI, as well as SNPs segregating in one panel and monomorphic and nonmissing in the other. For both European Americans and African Americans, imputation was performed in two steps. In the first step, 300 randomly selected individuals were used to generate recombination and error rate estimates. In the second step, these rates were used to impute all individuals across the entire reference panel. Imputed SNPs were retained only if they had rsq_hat ≥0.6 and a minor allele frequency (MAF) ≥1%.
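The post-imputation filter described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the record layout, the field names `rsq_hat` and `maf`, and the example values are assumptions made for the example.

```python
# Illustrative sketch of the post-imputation quality filter.
# Field names and example records are hypothetical.

MIN_RSQ_HAT = 0.6   # imputation quality threshold stated in the text
MIN_MAF = 0.01      # minor allele frequency threshold (1%) stated in the text


def passes_imputation_qc(rsq_hat, maf, min_rsq=MIN_RSQ_HAT, min_maf=MIN_MAF):
    """Return True when an imputed SNP meets both thresholds."""
    return rsq_hat >= min_rsq and maf >= min_maf


def filter_imputed_snps(records):
    """Keep only the SNP records that pass the quality filter."""
    return [r for r in records if passes_imputation_qc(r["rsq_hat"], r["maf"])]


if __name__ == "__main__":
    example = [
        {"snp": "snp_a", "rsq_hat": 0.92, "maf": 0.150},  # kept
        {"snp": "snp_b", "rsq_hat": 0.41, "maf": 0.200},  # dropped: low rsq_hat
        {"snp": "snp_c", "rsq_hat": 0.80, "maf": 0.004},  # dropped: low MAF
    ]
    print([r["snp"] for r in filter_imputed_snps(example)])
```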
Methods used to measure the blood traits analyzed have been described previously for ARIC (Ganesh et al. 2009), CARDIA (Shimakawa and Bild 1993), CHS (Ganesh et al. 2009), FHS (Ganesh et al. 2009), and JHS (Reich et al. 2009). Because we use linear regression to analyze genotype–phenotype associations (see below), the distribution of the quantitative traits analyzed needs to be normal. Trait values were normalized into Z-scores using inverse normal transformation after accounting for gender, age, age-squared, and recruitment center (when available). Inverse normal transformation uses ranks to fit all phenotypic residuals into a perfectly normal distribution such that even individuals with extreme phenotype values (“outliers”) can be kept in the analysis. We excluded individuals with blood cancers or known pregnancy at the time of visit.
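As an illustration of the rank-based inverse normal transformation, the sketch below maps a list of covariate-adjusted residuals to Z-scores. It assumes that the residuals have already been computed, and it uses a rank offset of 0.5 and average ranks for ties; both are common conventions chosen here for the example, since the exact variant used in the study is not stated in the text.

```python
from statistics import NormalDist


def inverse_normal_transform(residuals, offset=0.5):
    """Map values to Z-scores through their ranks.

    Each value gets z = Phi^-1((rank - offset) / n), where Phi^-1 is the
    standard normal quantile function. Tied values share their average rank.
    The offset of 0.5 is an assumed convention, not taken from the paper.
    """
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < n and residuals[order[end + 1]] == residuals[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the group
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


if __name__ == "__main__":
    # The extreme value 250.0 is kept but pulled in to a moderate Z-score.
    print(inverse_normal_transform([3.1, -0.4, 250.0, 1.2, 0.7]))
```

Because only ranks enter the transformation, an extreme residual receives the same Z-score as any other largest value would, which is why outliers can remain in the analysis.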
For all cohorts except FHS, analysis was performed in PLINK (Purcell et al. 2007) using linear regression under an additive genetic model. For FHS, we modeled the family structure in the association tests using a linear mixed effects (LME) model implemented in R (Chen and Yang 2010). In all analyses, we tested an additive genetic model and included the first ten principal components as covariates. For imputed SNPs, dosage information (bounded between 0.0 and 2.0) was used as the predictor.
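To make the additive model concrete, the following sketch regresses a normalized trait on allele dosage for a single SNP. It is a simplified stand-in for the PLINK and LME analyses, not a reproduction of them: it omits the principal-component covariates and the family structure, and it approximates the P value with a standard normal distribution instead of a t distribution. The function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean


def additive_association(dosages, zscores):
    """Simple linear regression of a trait Z-score on allele dosage (0.0 to 2.0).

    Returns (beta, standard_error, p_value). The slope beta is the estimated
    change in trait Z-score per copy of the coded allele. The two-sided P value
    uses a normal approximation, which is an assumption suited to large samples.
    """
    n = len(dosages)
    mean_x = fmean(dosages)
    mean_y = fmean(zscores)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, zscores))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, zscores))
    standard_error = sqrt(rss / (n - 2) / sxx)
    p_value = 2 * NormalDist().cdf(-abs(beta / standard_error))
    return beta, standard_error, p_value


if __name__ == "__main__":
    dosages = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]          # hypothetical dosages
    zscores = [0.1, -0.4, -1.1, -0.1, -0.6, -0.9]      # hypothetical trait values
    print(additive_association(dosages, zscores))
```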
Association results were combined within ethnic group using the inverse variance method, as implemented in the software METAL (Willer et al. 2010). Individual study results were corrected using genomic control; meta-analytic results were also scaled using genomic control (Devlin and Roeder 1999).
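The inverse variance method weights each study's effect estimate by the reciprocal of its squared standard error. The sketch below shows the fixed-effects calculation for one SNP; it is not the METAL implementation, the function name and input values are hypothetical, and the P value relies on a normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effects inverse-variance meta-analysis for a single SNP.

    Each study has weight w_i = 1 / se_i**2. The pooled effect is
    sum(w_i * beta_i) / sum(w_i) and its standard error is sqrt(1 / sum(w_i)).
    Returns (pooled_beta, pooled_se, p_value).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    p_value = 2 * NormalDist().cdf(-abs(pooled_beta / pooled_se))
    return pooled_beta, pooled_se, p_value


if __name__ == "__main__":
    # Hypothetical per-cohort estimates for one SNP.
    print(inverse_variance_meta([0.10, 0.30, 0.18], [0.10, 0.10, 0.05]))
```

More precise studies (smaller standard errors) therefore dominate the pooled estimate, and the pooled standard error is always smaller than that of any single contributing study.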
The National Heart, Lung and Blood Institute (NHLBI)-funded Candidate gene Association Resource (CARe) Project genotyped >40,000 participants from nine population-based cohorts to identify genetic associations with cardiovascular, pulmonary, hematologic, and sleep-related traits (Musunuru et al. 2010). In the CARe dataset, blood indices were available for up to 23,439 Caucasians and 7,112 African Americans from the Atherosclerosis Risk in Communities (ARIC) study, the Coronary Artery Risk Development in young Adults (CARDIA) study, the Cardiovascular Health Study (CHS), the Framingham Heart Study (FHS), and the Jackson Heart Study (JHS) (Table 1). All CARe samples were genotyped on the ITMAT-Broad-CARe (IBC) platform, which interrogates 49,094 SNPs (including many rare non-synonymous SNPs) to capture genetic variation in ~2,100 candidate genes in multiple ethnic groups (Keating et al. 2008). Genotype data were processed using stringent quality-control filters (Supplementary Tables 2–3), and we used genotype imputation to increase coverage (“Materials and methods”). Phenotypes were analyzed as quantitative traits under an additive genetic model using a linear regression framework (Purcell et al. 2007). For each analysis, we also included as covariates the first ten principal components to correct for global admixture in African Americans and possible population stratification. Within each ethnic group, results were combined by meta-analysis (Willer et al. 2010). Individual study results, as well as meta-analytic results, were scaled using genomic control (Devlin and Roeder 1999).
Meta-analysis results for the 12 phenotypes analyzed in Caucasians and African Americans are summarized in quantile–quantile (QQ) plots (Supplementary Figs. 1–4). Except for WBC count in African Americans, the inflation factors observed are near unity (range λGC 0.948–1.065) (Supplementary Table 4), suggesting that our results are not markedly inflated by confounding factors. However, in African Americans, we observed a high inflation factor for WBC count (λGC = 1.113). Most of the inflation is due to the DARC locus on chromosome 1, which was originally identified as a bona fide WBC count locus by admixture mapping (Nalls et al. 2008). When we exclude SNPs on chromosome 1, the inflation factor for WBC count in African Americans is reduced to λGC = 1.054 (Supplementary Fig. 5).
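For readers unfamiliar with the inflation factor, the sketch below shows one standard way to compute λGC from a set of association P values: convert each P value to a 1-degree-of-freedom chi-square statistic and divide the observed median by the median expected under the null (about 0.455). The helper names are hypothetical, and the correction step, which rescales statistics only when λGC exceeds 1, is a common convention assumed here; the text does not specify how values below 1 were handled.

```python
from statistics import NormalDist, median

_STANDARD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
EXPECTED_MEDIAN_CHISQ = _STANDARD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chisq(p_value):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    return _STANDARD_NORMAL.inv_cdf(p_value / 2.0) ** 2


def genomic_inflation(p_values):
    """Return lambda_GC: observed median chi-square over the expected median."""
    return median(p_to_chisq(p) for p in p_values) / EXPECTED_MEDIAN_CHISQ


def genomic_control_chisq(p_values):
    """Return chi-square statistics divided by lambda_GC (left as-is if <= 1)."""
    lam = max(genomic_inflation(p_values), 1.0)
    return [p_to_chisq(p) / lam for p in p_values]


if __name__ == "__main__":
    # Evenly spaced P values mimic a null distribution, so lambda is close to 1.
    null_like = [(i + 0.5) / 1000 for i in range(1000)]
    print(round(genomic_inflation(null_like), 3))
```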
In this study, we consider a threshold of P ≤ 2 × 10−6 as significant after accounting for the number of independent loci tested on the IBC platform (see Supplementary Information for a discussion of this statistical threshold). Our analysis highlighted novel genetic associations to hematological traits at P ≤ 2 × 10−6, including two loci that reach the generally accepted threshold to declare genome-wide significance (P ≤ 5 × 10−8) (Fig. 1; Table 2). In African Americans, we found that the missense SNP rs1050828 (Val68Met) in the G6PD gene is associated with RBC count, Hb, Hct, and MCV (all with P values <2.0 × 10−13). This SNP corresponds to the canonical G6PD A-allele, shown to cause partial G6PD deficiency (MIM #305900) and resistance to malaria (Guindo et al. 2007; Ruwende et al. 1995; Tishkoff et al. 2001). The G6PD A-allele is associated with decreased RBC count, Hb, and Hct, and increased MCV. As expected, because G6PD is X-linked, effect sizes for rs1050828 on RBC trait variation were stronger in males.
The second novel genome-wide significant association that we identified is between an intronic SNP in TPM4 and PLT count. The A-allele at rs8109288 was associated with a decreased PLT count in both Caucasians (P = 3.0 × 10−7) and African Americans (P = 3.0 × 10−7) (Fig. 1; Table 2). The FHS does not have a standard PLT count available, but PLT count has been measured in platelet-rich plasma (PRP) as part of a study assessing in vitro platelet aggregation responses (O’Donnell et al. 2001). The A-allele at rs8109288 was associated with a decreased PLT count in PRP in the FHS (P = 0.005), providing additional, independent evidence that this sequence variant at the TPM4 locus is linked to a variant that affects PLT number. TPM4 is one of the four human tropomyosin genes, whose protein products play a role in cytoskeletal functions. An intronic SNP in TPM1 (rs11071720) was recently shown to associate with MPV in Caucasians (Soranzo et al. 2009b), and a proxy of this SNP, rs3803499 [r2 = 0.25 in HapMap Northern European (CEU) samples], was genotyped on the IBC array and is nominally associated with PLT count in the CARe cohorts (P = 0.013 in Caucasians, P = 0.007 in African Americans). In the JHS, the only CARe cohort with MPV available, the association between TPM4 rs8109288 and MPV was strong (P = 9.7 × 10−5; the A-allele is associated with increased MPV). Nineteen SNPs in the two remaining tropomyosin genes, TPM2 and TPM3, were also genotyped on the IBC array. None of these SNPs, or nearby imputed SNPs, were convincingly associated with PLT phenotypes in Caucasians or African Americans. Therefore, at least two tropomyosin genes harbor common genetic polymorphisms associated with PLT count and volume.
Our analysis strategy was validated by the replication of several associations to blood cell phenotypes reported previously. We replicated associations of 14 loci with one or more hematological traits (Table 3). Our replication is necessarily limited to the known loci covered by the IBC platform: loci that were interrogated by the IBC array and did not replicate, and loci not genotyped on this platform, are listed in Supplementary Tables 5 and 6, respectively. Furthermore, because Caucasian samples from the ARIC, CHS, and FHS cohorts overlap with samples used in the CHARGE meta-analysis (Ganesh et al. 2009), replication of the CHARGE findings in the CARe meta-analysis does not provide independent confirmatory evidence. In some cases, we identified the same SNPs as previous studies, such as the missense SNP rs1800562 in HFE associated with Hb, Hct, and MCV in Caucasians, or rs2814778 located in the 5′ untranslated region (UTR) of DARC, strongly associated with WBC and neutrophil levels in African Americans (Table 3). Of particular interest, the missense SNP rs3184504 in SH2B3 (Trp262Arg) illustrates an extreme case of pleiotropy: this SNP has now been associated with Hb and Hct (Table 3) (Ganesh et al. 2009), PLT count (Kamatani et al. 2010; Soranzo et al. 2009b), eosinophil count and myocardial infarction (Gudbjartsson et al. 2009), blood pressure and hypertension (Levy et al. 2009), celiac disease (Hunt et al. 2008), and type 1 diabetes (Todd et al. 2007) in Caucasians. SH2B3 encodes Lnk, a negative regulator of hematopoiesis, and it is possible that genetic variation in this gene affects common disease risk indirectly by modulating properties of the three major blood cell types.
We have used a genotyping array that covers common genetic variation in ~2,100 candidate genes to identify loci associated with blood cell counts and related indices in Caucasians and African Americans. SNPs on this array were selected using a “cosmopolitan” tagging approach such that common variants in these candidate genes should be covered similarly in these two ethnic groups (Keating et al. 2008). We analyzed 12 phenotypes and identified two new loci that reach genome-wide significance: G6PD rs1050828 is associated with RBC count, Hb, Hct, and MCV in African Americans, and TPM4 rs8109288 is associated with PLT count in Caucasians and African Americans (Table 2). Since clinical processes such as iron deficiency or inflammation can influence hematological traits, we sought to confirm that the associations observed at G6PD and TPM4 were independent of these conditions. When we adjusted our analyses for iron levels [using ferritin or iron levels, or total-iron binding capacity (TIBC)] or inflammation [using C-reactive protein (CRP) levels], association results remained largely unchanged (data not shown), suggesting that the associations at G6PD and TPM4 are independent of iron and inflammation status. In addition to the novel associations at G6PD and TPM4, we replicated 36 previously reported associations (Table 3).
Glucose-6-phosphate dehydrogenase protects red blood cells against oxidative damage. Inherited deficiency of glucose-6-phosphate dehydrogenase is an X-linked enzymopathy that has a higher prevalence in areas of the world where malaria is endemic. Many variants of G6PD have been described with wide ranging levels of enzyme activity and associated clinical symptoms. Rare severe mutations in G6PD have been linked to neonatal jaundice and to acute and chronic hemolytic anemia (including congenital non-spherocytic hemolytic anemia) in the presence of oxidative stress (Cappellini and Fiorelli 2008). To our knowledge, our study is the first to report associations between the mild G6PD A-variant and erythrocyte phenotypes in normal populations, although it was recently associated with Hb levels in sickle cell anemia patients (Nouraie et al. 2010).
We have conducted one of the first well-powered genetic association studies where several complex human traits are analyzed in two ethnic groups, allowing a direct comparison of the architecture of these phenotypes in individuals of European and African descent. The overlap in loci that control hematological traits in Caucasians and African Americans was small, with only SNPs at the BAK1 and TPM4 genes found consistently associated with the same trait at P ≤ 2 × 10−6 (PLT count in both cases) (Soranzo et al. 2009b). This lack of overlap might be partially explained by the difference in sample sizes: 7,112 African Americans and 23,439 Caucasians were available in our analyses. Consequently, when more African American datasets become available, more loci will likely be shown to control blood cell phenotypes in both ethnic groups. It is also possible that differences in allele frequencies affect discovery power between Caucasians and African Americans.
However, it is also apparent that differing selective pressures have shaped the genetic regulation of hematological traits in European- and African-derived populations (see Supplementary Table 7 for iHS values). The HFE rs1800562 missense SNP (C282Y) is associated with erythrocyte phenotypes (Hb, Hct, MCV) (Ganesh et al. 2009; Soranzo et al. 2009b) and iron status (Benyamin et al. 2009), and causes hereditary hemochromatosis (MIM #235200) in Caucasians (Table 3). The minor A-allele of rs1800562 is absent in populations of African ancestry but relatively frequent in Caucasians (5–10%), and is located on a long haplotype of low diversity indicative of positive selection (Ajioka et al. 1997; Thomas et al. 1998). Similarly, it was recently shown that the SH2B3 locus associated with variation in indices of all three main blood cell types (RBC, WBC, and PLT) is also under natural selection (Soranzo et al. 2009b). Genetic variation at the SH2B3 locus is also associated with activation of the innate immune system, suggesting a possible role in protection against bacterial infection (Zhernakova et al. 2010). In African Americans, the three major loci associated with WBC and RBC trait variation are DARC, HBA2-HBA1, and G6PD. These three loci are known to carry alleles that confer resistance to malaria and to be under strong positive selection in African-derived populations. The minor alleles for DARC rs2814778 and G6PD rs1050828 are common in African Americans (>10%) but extremely rare (or absent) in Caucasians. The minor A-allele for HBA2-HBA1 rs1211375 is common in both Caucasians (35%) and African Americans (27%), but only associates with erythrocyte phenotypes in African Americans. A recent survey of structural variants in the human genome has shown a copy number polymorphism (CNVR6569) at the α-globin locus in HapMap samples from Yoruba, Nigeria (YRI) that is absent in HapMap CEU individuals (Conrad et al. 2009).
Of the SNPs at the α-globin locus surveyed by the IBC array, rs1211375 is the best tag for CNVR6569 (r2 = 0.37). Thus, it is likely that rs1211375 captures variation in the number of α-globin genes in African Americans. This is consistent with the clinical epidemiology of α-thalassemia, which is known to modulate RBC phenotypes. As cross-ethnic association studies are performed for additional phenotypes, it will be interesting to compare the genetic architecture of complex human diseases and traits between ethnic groups, and to assess the role of natural selection in shaping the differences.
The authors wish to acknowledge the support of the National Heart, Lung, and Blood Institute and the contributions of the research institutions, study investigators, field staff and study participants in creating this resource for biomedical research. The grants and contracts that have supported CARe are listed at http://public.nhlbi.nih.gov/GeneticsGenomics/home/care.aspx. Additional support for this work was provided by the Fondation de l’Institut de Cardiologie de Montréal (to G.L.), and by NIH R01 HL71862-06 and ARRA N000949304 (to A.P.R.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Electronic supplementary material: The online version of this article (doi:10.1007/s00439-010-0925-1) contains supplementary material, which is available to authorized users.
Conflict of interest: We declare no competing interests.
Ken Sin Lo, Montreal Heart Institute, 5000 Bélanger Street, Montreal, QC H1T 1C8, Canada.
James G. Wilson, Department of Medicine, G.V. (Sonny) Montgomery V.A. Medical Center and the University of Mississippi Medical Center, Jackson, MS 39216, USA.
Leslie A. Lange, Department of Genetics, University of North Carolina, 5112 Genetic Medicine Building, Chapel Hill, NC 27599-7264, USA.
Aaron R. Folsom, Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN 55454, USA.
Geneviève Galarneau, Montreal Heart Institute, 5000 Bélanger Street, Montreal, QC H1T 1C8, Canada.
Santhi K. Ganesh, Division of Cardiovascular Medicine, Department of Internal Medicine, The University of Michigan, Ann Arbor, MI 48109, USA.
Struan F. A. Grant, Center for Applied Genomics, Division of Human Genetics, Children’s Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA. Department of Pediatrics, University of Pennsylvania, School of Medicine, Philadelphia, PA 19104, USA.
Brendan J. Keating, Center for Applied Genomics, Division of Human Genetics, Children’s Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA.
Steven A. McCarroll, Department of Genetics, Harvard Medical School, Boston, MA 02115, USA. Broad Institute, Seven Cambridge Center, Cambridge, MA 02142, USA.
Emile R. Mohler, III, Cardiovascular Division, Vascular Medicine Section, Department of Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA.
Christopher J. O’Donnell, Division of Intramural Research, National Heart, Lung, and Blood Institute (NHLBI), Bethesda, MD 20892, USA. NHLBI’s Framingham Heart Study, Framingham, MA 01702, USA. Cardiology Division, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.
Walter Palmas, Department of Medicine, Columbia University, 622 West 168th Street, Ph 9 East, 107, New York, NY 10032, USA.
Weihong Tang, Division of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN 55454, USA.
Russell P. Tracy, Departments of Pathology and Biochemistry, University of Vermont, 208 S. Park Drive, suite 2, Colchester, VT 05446, USA.
Alexander P. Reiner, Department of Epidemiology, University of Washington, Box 357236, Seattle, WA 98195, USA, Email: apreiner@u.washington.edu.
Guillaume Lettre, Montreal Heart Institute, 5000 Bélanger Street, Montreal, QC H1T 1C8, Canada, Email: firstname.lastname@example.org. Département de Médecine, Université de Montreal, C.P. 6128, succursale Centre-ville, Montreal, QC H3C 3J7, Canada.