Platelets are the second most abundant cell type in blood and are essential for maintaining haemostasis. Their count and volume are tightly controlled within narrow physiological ranges, but there is only limited understanding of the molecular processes controlling both traits. Here we carried out a high-powered meta-analysis of genome-wide association studies (GWAS) in up to 66,867 individuals of European ancestry, followed by extensive biological and functional assessment. We identified 68 genomic loci reliably associated with platelet count and volume mapping to established and putative novel regulators of megakaryopoiesis and platelet formation. These genes show megakaryocyte-specific gene expression patterns and extensive network connectivity. Using gene silencing in Danio rerio and Drosophila melanogaster, we identified 11 of the genes as novel regulators of blood cell formation. Taken together, our findings advance understanding of novel gene functions controlling fate-determining events during megakaryopoiesis and platelet formation, providing a new example of successful translation of GWAS to function.
To discover novel genetic determinants of megakaryopoiesis and platelet formation, we performed meta-analyses of GWAS for mean platelet volume (MPV) and platelet count (PLT). Our analyses included 18,600 (13 studies, MPV) and 48,666 (23 studies, PLT) individuals of European descent, respectively, and up to ~2.5 million genotyped or imputed single nucleotide polymorphisms (SNPs)1. Briefly, we tested within each study (Supplementary Table 1) the associations of MPV and PLT with each SNP using an additive model; we then combined these study-specific test statistics in a fixed-effects meta-analysis. To reduce the risk of spurious associations, we applied common stringent quality control filters and the genomic control method2 to the meta-analysis, which shows no evidence for residual inflation of summary statistics (Supplementary Fig. 1).
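The combination step described above can be sketched as follows. This is a minimal illustration only: it assumes inverse-variance weighting of per-study effect estimates and a genomic-control correction based on the median chi-square statistic. The function names and example numbers are hypothetical, and the exact procedure used in the study is the one given in its Supplementary Information.

```python
import math
from statistics import NormalDist, median

# Median of a 1-d.f. chi-square distribution (about 0.455), used by genomic control.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one SNP (assumed scheme)."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, z, p


def genomic_control_lambda(z_scores):
    """Inflation factor: median observed chi-square over its null expectation."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def gc_corrected_se(se, lam):
    """Inflate a standard error by sqrt(lambda); lambda below 1 is left uncorrected."""
    return se * math.sqrt(max(lam, 1.0))


# Hypothetical per-study additive-model estimates for a single SNP.
beta, se, z, p = fixed_effects_meta([0.10, 0.14, 0.08], [0.05, 0.07, 0.04])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the largest cohorts; a lambda close to 1 indicates little residual inflation of the test statistics.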
A total of 52 genomic loci reaching statistical significance at the genome-wide adjusted threshold of P ≤ 5 × 10⁻⁸ were discovered in this stage 1 analysis; 55 additional loci reached suggestive association (5 × 10⁻⁸ < P ≤ 5 × 10⁻⁶). We tested one SNP per locus in a stage 2 analysis that included in silico and de novo replication data in up to 18,838 individuals from 12 additional studies, confirming 15 additional loci (Supplementary Table 2). One further independent locus (TRIM58) associated with PLT was identified through detection of secondary association signals. Overall, 68 independent genomic regions were associated with PLT and MPV with P ≤ 5 × 10⁻⁸, of which 52 are new and 16 were described previously in Europeans3-6 (Table 1). Of the 68 loci, 43 and 25 loci were associated significantly with PLT and MPV, respectively; 16 of them reached genome-wide significance with both traits (Supplementary Fig. 2). This partial overlap reflects the negative correlation of both traits (gender-adjusted r = −0.49, Fig. 1a) that results from the tight control of platelet mass (PLT × MPV)7. The association of some loci with both PLT and MPV may reflect this negative correlation between the two traits or independent pleiotropic effects of a locus on megakaryopoiesis and platelet formation. The different statistical power at the two traits and small effect sizes at many loci reduce our power to discriminate among loci controlling MPV and PLT through analysis of platelet mass. Their testing will require the collection and analysis of PLT and MPV in large independent homogeneous cohorts. Some loci, however, have a clear-cut effect. For instance, BAK1 affects PLT specifically, compatible with its role in apoptosis and platelet lifespan.
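The two stage 1 thresholds and the locus tally quoted above can be written out explicitly. The snippet below is a small sketch; the category labels and the function name are hypothetical and serve only to make the cut-offs concrete.

```python
GENOME_WIDE = 5e-8   # genome-wide significance threshold quoted in the text
SUGGESTIVE = 5e-6    # upper bound of the suggestive band quoted in the text


def classify_stage1(p_value):
    """Assign a stage 1 P value to one of three illustrative categories."""
    if p_value <= GENOME_WIDE:
        return "genome-wide"
    if p_value <= SUGGESTIVE:
        return "suggestive"
    return "below threshold"


# Tally reported in the text: stage 1 loci, loci confirmed in stage 2,
# and one secondary signal (TRIM58).
total_loci = 52 + 15 + 1
print(classify_stage1(3e-9), classify_stage1(1e-7), total_loci)
```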
We further tested the association of the 68 loci in 7,949 (MPV) and 8,295 (PLT) samples of south Asian and 14,697 (PLT) samples of Japanese8 origin. We detected substantial overlap of association signals, with effect size and direction highly concordant with findings in Europeans (Supplementary Fig. 3 and Supplementary Table 3). In the south Asian sample, 15 of the 68 (22.1%) loci were significant after adjustment for multiple testing (P ≤ 7 × 10⁻⁴). In the Japanese sample, 13 of 55 (23.6%) PLT loci showed significance. Moreover, 73 of 84 (87%, south Asians) and 45 of 55 (82%, Japanese) SNPs showed associations with effect estimates directionally consistent with Europeans. Such concordance is highly unlikely to be due to chance (P = 2.3 × 10⁻¹² and P = 2.1 × 10⁻⁶), and provides independent validation of the locus discovery in Europeans.
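The reasoning behind the multiple-testing threshold and the directional-concordance argument can be illustrated with a short calculation. This sketch assumes a Bonferroni correction at a nominal level of 0.05 over 68 loci and a one-sided exact binomial (sign) test against a 50% chance of agreement; neither choice is stated in the text, so the values it produces need not match the reported P values.

```python
from math import comb


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def sign_test_p(n_concordant, n_total):
    """One-sided exact binomial P(X >= n_concordant) when agreement is a coin flip."""
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# Assumed inputs: alpha = 0.05 over 68 loci; concordance counts taken from the text.
print(f"threshold = {bonferroni_threshold(0.05, 68):.1e}")
print(f"south Asian concordance P = {sign_test_p(73, 84):.1e}")
print(f"Japanese concordance P = {sign_test_p(45, 55):.1e}")
```

Under these assumptions the threshold is 0.05/68, which rounds to the quoted cut-off, and both concordance counts lie far in the upper tail of the null distribution.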
The 68 loci cumulatively explain 4.8% of the phenotypic variance in PLT and 9.9% in MPV, accounting respectively for average increases of 2.57 × 10⁹ l⁻¹ PLT and 0.10 fl MPV per copy of allele. These levels of explained variance are in accordance with other GWAS of complex quantitative traits9. Our results indicate that many other common variants of similar or lower effect size, rare variants as well as structural variants may also contribute to the variation of both platelet traits. We used the method of ref. 10 to estimate the number of additional PLT- and MPV-associated loci having effect sizes comparable to those observed in our analysis. The method (with caveats discussed in the Supplementary Information) predicted that 137 and 81 such loci exist for PLT and MPV respectively, accounting for 9.7% and 18.3% of the total phenotypic variance.
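For orientation, the share of trait variance attributable to a single additive SNP is commonly approximated as 2f(1 − f)β²/Var(Y), where f is the allele frequency and β the per-allele effect. The sketch below applies that textbook formula, assuming Hardy-Weinberg proportions and independent loci; it is not necessarily the calculation used in the study, and all numbers in the example are hypothetical.

```python
def variance_explained(beta, allele_freq, trait_variance):
    """Fraction of trait variance explained by one additive SNP (assumes HWE)."""
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return genotype_variance * beta ** 2 / trait_variance


def cumulative_variance_explained(snps, trait_variance):
    """Sum over (beta, allele_freq) pairs, assuming the loci are independent."""
    return sum(variance_explained(b, f, trait_variance) for b, f in snps)


# Hypothetical per-allele effects (trait units) and allele frequencies.
example_snps = [(2.5, 0.30), (1.8, 0.45), (3.1, 0.12)]
share = cumulative_variance_explained(example_snps, trait_variance=3600.0)
print(f"{100 * share:.2f}% of variance explained")
```

Because each term scales with β², many loci of small effect contribute only a modest cumulative share, which is consistent with the expectation that further variants remain to be found.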
Evidence from recent, highly powered meta-analyses suggests that the association peaks are enriched for genes controlling key underlying biological pathways11,12. In our case, a large proportion of the association signals (46 out of 68) had the most significant SNP in stage 1 (‘sentinel SNP’) mapping to within a gene-coding region, including several key regulators of haemostasis (ITGA2B, F2R, GP1BA), megakaryopoiesis (THPO, MEF2C) and platelet lifespan (BAK1). Through an unbiased analysis of our GWAS results, we estimated that PLT-associated SNPs are significantly more likely to map to gene regions than expected by chance (P < 0.05, Supplementary Fig. 4), suggesting that we may prioritize the search of additional yet unknown genes controlling these processes in the associated regions. To define a univocal rule to study the enrichment of functional relationships in associated genes, we made the choice to focus on a set of 54 ‘core’ genes selected as either containing the sentinel SNP or mapping to within 10 kb from an intergenic sentinel SNP (Table 2). This selection strategy is designed to obtain unbiased hypotheses producing interpretable biological inference for genes near the association signals, but has reduced sensitivity for genes that map further from the sentinel SNP. For instance VWF, a key regulator of haemostasis, maps to 55 kb from the sentinel SNP (Supplementary Fig. 3 and Supplementary Table 4) and is therefore not considered as a core gene. We further note that this selection strategy does not imply knowledge of the location of causative variants, which is currently incomplete. 
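The core-gene rule described above (the gene containing the sentinel SNP, otherwise any gene within 10 kb of an intergenic sentinel SNP) can be expressed as a short selection function. This is a sketch under simplifying assumptions: all genes are taken to lie on the same chromosome as the SNP, and the gene names and coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene, same chromosome as the SNP
    end: int    # last base of the gene


def distance_to_gene(snp_pos: int, gene: Gene) -> int:
    """Distance in bases from the SNP to the gene; 0 if the SNP lies inside it."""
    if gene.start <= snp_pos <= gene.end:
        return 0
    return min(abs(snp_pos - gene.start), abs(snp_pos - gene.end))


def core_genes(snp_pos: int, genes: List[Gene], window: int = 10_000) -> List[str]:
    """Genes containing the sentinel SNP, else genes within `window` bases of it."""
    containing = [g.name for g in genes if distance_to_gene(snp_pos, g) == 0]
    if containing:
        return containing
    return [g.name for g in genes if distance_to_gene(snp_pos, g) <= window]


# Hypothetical locus: GENE_B is 5 kb from the SNP, GENE_C is 55 kb away.
locus = [Gene("GENE_A", 100_000, 150_000),
         Gene("GENE_B", 205_000, 230_000),
         Gene("GENE_C", 255_000, 300_000)]
print(core_genes(120_000, locus))  # SNP inside GENE_A
print(core_genes(200_000, locus))  # intergenic SNP
```

In the second call only the gene 5 kb away is selected; a gene 55 kb from the sentinel SNP, as in the VWF example, falls outside the window and is not treated as a core gene.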
A detailed SNP survey showed that at 15 loci the sentinel SNPs either encoded, or were in high linkage disequilibrium (LD, r² ≥ 0.8) with, a non-synonymous variant (Supplementary Table 5); another 11 either matched or were in high linkage disequilibrium with SNPs associated with expression levels of core genes (or cis-eQTLs, Supplementary Table 6), indicating that other loci may exert their effect through regulation of gene expression13. The validation of suggestive causative effects, as well as the identification of more complex interactions involving other genomic loci (trans eQTLs), will require a more comprehensive discovery in appropriately powered genomic data sets.
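The LD criterion used above relies on the squared correlation between alleles at two sites. As an illustration, the standard definition for two biallelic SNPs can be computed from the haplotype frequency and the two allele frequencies; the function name and the example frequencies below are hypothetical.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B, each strictly between 0 and 1
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Hypothetical frequencies for a sentinel SNP and a nearby coding variant.
r2 = ld_r_squared(p_ab=0.28, p_a=0.30, p_b=0.31)
print(f"r2 = {r2:.2f}, high LD: {r2 >= 0.8}")
```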
As a first effort to characterize biological connectivity among the core genes, we applied canonical pathway analyses (see http://www.ingenuity.com), detecting a highly significant over-representation of core genes in relevant biological functions such as haematological disease, cancer and cell cycle (Supplementary Table 7). Encouraged by these results, we extended this effort to construct a comprehensive network of protein-protein interactions incorporating the core genes. This effort integrated information from public databases (principally Reactome and IntAct) with careful manual revision of published evidence and high-throughput gene expression data. The resulting network, which includes 633 nodes and 827 edges, showed extensive connectivity between the proteins encoded by the core genes with an established functional role in megakaryopoiesis and platelet formation and those encoded by genes hitherto unknown to be implicated in these processes (Fig. 1b).
We next considered whether this connectivity was also reflected in the regulation of core gene transcription, and whether expression patterns were unique to megakaryocytes. Despite high levels of correlation in gene expression between different blood cell types (median = 0.8; median absolute deviation = 0.1)14, we found that core genes tend to have significantly greater expression in megakaryocytes than in the other blood cells (P = 7.5 × 10⁻⁵, Supplementary Fig. 5a). This observation is compatible with the notion that ultimate steps in blood cell lineage specification are accompanied, or driven, by the emergence of increasing numbers of lineage-specific transcripts. To explore this assumption, we used genome-wide expression arrays to determine changes in global transcript levels during in vitro differentiation of umbilical-cord blood-derived haematopoietic stem cells to precursors of blood cells. We considered five different time points and two cell types, erythroblasts (the precursors of red blood cells) and megakaryocytes. Notwithstanding high levels of correlation of gene expression between erythroblasts and megakaryocytes14, core gene transcripts showed a significant increase over time in megakaryocytes (P = 1.5 × 10⁻⁶) but not in erythroblasts (P = 0.77, Fig. 1c, d; see also Supplementary Fig. 5b). Taken together, these patterns of core gene expression are consistent with a different regulation of their transcription in megakaryocytes versus erythroblasts, and with their centrality in megakaryopoiesis and platelet formation. This hypothesis is also consistent with the observation that only 5 of the 68 sentinel SNPs exert a significant effect on erythrocyte parameters (HBS1L-MYB, RCL1, SH2B3, TRIM58 and TMCC2, Supplementary Table 8).
To assess whether core genes are indeed implicated in haematopoiesis, we interrogated the function of 15 genes using gene silencing in D. rerio and D. melanogaster, and supported empirical data with published evidence on knockout models in M. musculus (Table 2 and Supplementary Table 4). In D. rerio, we applied morpholino constructs to silence the expression of six genes (Fig. 2 and Supplementary Fig. 6) selected to have >50% homology with the human counterpart and no previous evidence of involvement in haematopoiesis. Silencing of four genes in D. rerio (arhgef3, ak3, rnf145, jmjd1c) resulted in the ablation of both primitive erythropoiesis and thrombocyte formation. Silencing of tpma, the orthologue of TPM1 that is transcribed in megakaryocytes but not in other blood cells, abolished the formation of thrombocytes but not of erythrocytes. Silencing of ehd3 did not yield a haematopoietic phenotype. We also screened D. melanogaster RNA interference (RNAi) knockdown lines for quantitative alterations in the two most prevalent classes of blood elements: plasmatocytes and crystal cells. The repertoire of blood cells in D. melanogaster, consisting of about 95% plasmatocytes and 5% crystal cells, is less varied than in vertebrates. Transcription factors and signalling pathways regulating haematopoiesis have, however, been conserved throughout evolution15, making the RNAi knockdown studies a relevant first step towards a better understanding of the putative role of these GWAS genes in haematopoiesis. Four core-gene D. melanogaster lines (shibire (DNM), ush (ZFPM2), rpn9 (PSMD13), Brf (BRF1)), as well as five others (sun (ATP5E), CG3704 (XAB1), Su(var)205 (CBX5), dve (SATB1) and RpL6 (RPL6)), displayed highly reproducible differences in the numbers of these two cell types (Table 2 and Supplementary Table 4). Despite widespread differences between mammalian and insect haematopoietic lineages16, our findings from D. melanogaster provide new and supporting examples of functional conservation in the control of blood cell formation in invertebrates and vertebrates17-19.
The data from studies in D. rerio by us and in M. musculus by others (see Supplementary Table 4) provided proof-of-concept evidence that our prioritization strategy is appropriate for selecting novel genes controlling thrombopoiesis and megakaryopoiesis, respectively. More detailed insights and additional implicated genes will be revealed through the systematic silencing of all genes in the associated regions. For instance, RNAi knockdown of dve in D. melanogaster reduces plasmatocyte numbers and increases the number of crystal cells, thus providing supporting evidence that its human homologue SATB1, a non-core gene, should be prioritized in functional studies. However, the results of the knockdown study in D. rerio do not clarify at which hierarchical positions in thrombopoiesis and erythropoiesis the genes exert their effect, requiring further assessment in conditional knockout models in M. musculus with lineage-specific regulation of gene transcription.
Nevertheless, our results have already allowed novel insights into the genetic control of these processes. Signalling cascades initiated by thrombopoietin (THPO) and its receptor cMPL via the JAK2/STAT3/5A signalling pathway are key regulatory steps initiating changes in gene expression responsible for driving forward megakaryocyte differentiation20. Our study highlights several additional signalling proteins implicating potentially important novel regulatory routes. For instance, two genes encoding guanine nucleotide exchange factors (DOCK8 and ARHGEF3) were identified. Mendelian mutations of the former are causative of the hyper-IgE syndrome, but its effect on platelets had not yet been identified. The silencing of the latter gene in D. rerio resulted in a profound haematopoietic phenotype characterized by a complete ablation of both primitive erythropoiesis and thrombocyte formation, demonstrating its novel regulatory role in myeloid differentiation. In a parallel and in-depth study we demonstrated its novel role in the regulation of iron uptake and erythroid cell maturation21. A second class of genes also known to critically control early and late events of megakaryopoiesis are transcription factors. For instance, MYB silencing by microRNA 150 determines the definitive commitment of the megakaryocyte–erythroblast precursor to the megakaryocytic lineage15. A further 10 core genes identified in this study are implicated in the regulation of transcription. Among these, we have demonstrated here that silencing of rnf145 and jmjd1c in D. rerio severely affects both lineages.
In conclusion, this highly powered study describes a catalogue of known and novel genes associated with key haematopoietic processes in humans, providing an additional example of GWAS leading to biological discoveries. We further showed that for a large proportion of these known and new genes, functional support is achieved from model organisms and by overlap with genes implicated in inherited Mendelian disorders and in human cancers because of acquired mutations. In-depth functional studies and comparative analyses will be necessary to characterize the precise mechanisms by which these new genes and variants affect haematopoiesis, megakaryopoiesis and platelet formation. Furthermore, we provide extensive new resources, most notably a freely accessible knowledge base embedded in the novel protein-protein interaction network, with information about the identified platelet genes being implicated in Mendelian disorders and results from gene-silencing studies in model organisms. We anticipate that these resources will help to advance megakaryopoiesis research, to address key questions in blood stem-cell biology and to propose new targets for the treatment of haematological disorders. Finally, MPV has been associated with the risk of myocardial infarction22,23. The contribution of the new loci to the aetiology of acute myocardial infarction events will require assessment in a prospective setting.
A summary of the methods can be found in Supplementary Information and includes detailed information on: study populations; blood biochemistry measurements; genotyping methods and quality control filters; genome-wide association and meta-analysis methods; gene prioritization strategies for functional assessment and network construction; protein-protein interaction network; in vitro differentiation of blood cells; experimental data sets and analytical methods for gene expression analysis; zebrafish morpholino knockdown generation; assessment of other model organism resources.
Reactome is supported by grants from the US National Institutes of Health (P41 HG003751), EU grant LSHG-CT-2005-518254 ‘ENFIN’, and the EBI Industry Programme. A.C. and D.L.S. are supported by the Wellcome Trust grants WT 082597/Z/07/Z and WT 077037/Z/05/Z and WT 077047/Z/05/Z, respectively. J.S.C. and K.V. are supported by the European Union NetSim training fellowship scheme (number 215820). A. Radhakrishnan, A.A., H.L.J., J.J., J.S. and W.H.O. are funded by the National Institute for Health Research, UK. Augusto Rendon is funded by programme grant RG/09/012/28096 from the British Heart Foundation. J.M.P. is supported by an Advanced ERC grant. IntAct is funded by the EC under SLING, grant agreement no. 226073 (Integrating Activity) within Research Infrastructures of the FP7, under PSIMEX, contract no. FP7-HEALTH-2007-223411. C.G. received support by the European Community’s Seventh Framework Programme (FP7/2007-2013), ENGAGE Consortium, grant agreement HEALTH-F4-2007-201413. N.S. is supported by the Wellcome Trust (Grant Number 098051). Acknowledgements by individual participating studies are provided in Supplementary Information.
Supplementary Information is linked to the online version of the paper at www.nature.com/nature.
Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of this article at www.nature.com/nature.
Author Contributions Study design group: C.G., S. Sanna, A.A.H., A. Rendon, M.A.F., W.H.O., N.S.; manuscript writing group: C.G., A. Radhakrishnan, S. Sanna, A.A.H., A. Rendon, M.A.F., W.H.O., N.S.; data preparation, meta-analysis and secondary analysis group: A. Radhakrishnan, B.K., W.T., E.P., G.P., R.M., M.A.F., C.G., N.S.; bioinformatics analyses, pathway analyses and protein-protein interaction network group: S.M., J.-W.N.A., S.J., J.K., Y.M., L.B., A. Rendon, W.H.O.; transcript profiling methods and data group: K.V., A. Rendon, L.W., A.H.G., T.-P.Y., F. Cambien, J.E., C. Hengstenberg, N.J.S., H. Schunkert, P.D., W.H.O.; M. musculus models: R.R.-S.; D. rerio knockdown models: A.C., J.S.-C., D. Stemple, W.H.O.; D. melanogaster knockdown models: U.E., J. Penninger and A.A.H. All other author contributions and roles are listed in Supplementary Information.