HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
J Cell Physiol. Author manuscript; available in PMC 2011 June 1.
Published in final edited form as:
PMCID: PMC2875260

Transcriptome study for early hematopoiesis—achievement, challenge and new opportunity

Hematopoietic stem-progenitor cells are the source of the entire hematopoietic system. Studying gene expression in these cells provides information for understanding the genetic programs that control early hematopoiesis and for identifying gene targets for intervention in hematopoietic disorders. Extensive efforts using cell biology, molecular biology and genomics approaches have generated rich knowledge of the genes and functional pathways involved in early hematopoiesis. Challenges remain, however, including the rarity of hematopoietic stem-progenitor cells, which sets a physical limitation on such studies; the difficulty of achieving comprehensive transcriptome detection with conventional genomics technologies; and the difficulty of using conventional biological methods to identify, among the large number of expressed genes, the key genes controlling stem cell self-renewal and differentiation. The newly developed single-cell transcriptome method and next-generation DNA sequencing technology provide new opportunities for transcriptome study of early hematopoiesis. A systems biology approach may provide insight into the genetic mechanisms controlling early hematopoiesis.

The early hematopoietic differentiation

The hematopoietic system is one of the best-characterized cellular differentiation systems. Hematopoietic stem cells form in the ventral mesoderm at the embryonic stage and migrate progressively to the yolk sac, the aortic region, the placenta, the fetal liver and, in adulthood, the bone marrow (Zon, 2008). During this process, some hematopoietic stem cells maintain their self-renewal capacity, whereas others gradually lose it and differentiate into lineage-defined multipotent progenitor cells, then lineage-restricted progenitors, and eventually the mature terminal cell types that perform specific physiological functions. Early hematopoiesis plays a critical role in maintaining the entire hematopoietic system, and functional aberrations in early hematopoiesis directly cause various hematopoietic disorders. Studying gene expression in early hematopoiesis is therefore critical for understanding the genetic mechanisms that control it and for identifying the genetic causes of hematopoietic disorders.

Hematopoiesis is a dynamic process, and studying it requires determining the specific differentiation stage of the cells. This is largely achieved by identifying cell surface markers that are present or absent at specific stages of hematopoiesis. A typical example is the CD34 marker. CD34 was first identified in 1984 (Civin et al., 1984). Subsequent studies determined that CD34 is present on early-stage hematopoietic cells (Tindle et al., 1985; Katz et al., 1985; Andrews et al., 1986; Watt et al., 1987) and that hematopoietic CD34+ cells are able to reconstitute the entire hematopoietic system in lethally irradiated animals (Berenson et al., 1988). Nearly 10 years later, the CD34 gene was cloned (Simmons et al., 1992), and its genomic structure was determined and the gene was located at 1q32 (Satterthwaite et al., 1992). Molecular analysis revealed that CD34 is a 40-kDa type I integral membrane protein with nine potential N-linked and numerous potential O-linked glycosylation sites in its extracellular domain. Further studies indicate that CD34 is a pan-early hematopoietic cell marker: it is present on later hematopoietic stem cells, multipotent hematopoietic progenitor cells and lineage-restricted hematopoietic progenitors. While the biological function of the CD34 molecule itself remains largely unknown (Furness and McNagny, 2006), CD34+ cells have been used widely as hematopoietic stem cells for clinical transplantation to restore the hematopoietic system. However, it has been observed that CD34−/CD38− cells can also initiate multilineage hematopoiesis (Bhatia et al., 1998). Therefore, not all hematopoietic stem cells are CD34+. To further define the early differentiation stages of hematopoiesis, new markers specific for each stage are required. Indeed, more specific surface markers have been identified for cell types at specific stages of differentiation.
For example, in combination with CD38, a glycoprotein expressed on mature immune cells, CD34+ cells can be further divided into a CD34+/CD38− subpopulation that is enriched for primitive hematopoietic stem cells and a CD34+/CD38+ subpopulation that is enriched for lineage-committed hematopoietic progenitor cells (Georgantas et al., 2004). Additional markers can define cells at still more specific stages. Such sub-classification of early hematopoietic cells greatly facilitates the study of their origin, migration pathways and cellular development using advanced cell sorting techniques, antibody staining and animal models.
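
The marker-based subdivision described above can be written as a simple gating rule. The sketch below is illustrative only: the function name, the boolean inputs and the label strings are assumptions introduced here, and real flow-cytometry gating works on continuous fluorescence intensities with experiment-specific thresholds.

```python
def classify_by_markers(cd34_positive: bool, cd38_positive: bool) -> str:
    """Assign a coarse label from CD34/CD38 status (hypothetical helper)."""
    if cd34_positive and not cd38_positive:
        # Subpopulation enriched for primitive hematopoietic stem cells.
        return "CD34+/CD38- (enriched for primitive stem cells)"
    if cd34_positive and cd38_positive:
        # Subpopulation enriched for lineage-committed progenitors.
        return "CD34+/CD38+ (enriched for lineage-committed progenitors)"
    # CD34- cells are not further subdivided in this sketch.
    return "CD34- (outside the CD34+ compartment)"


if __name__ == "__main__":
    for cd34, cd38 in [(True, False), (True, True), (False, False)]:
        print(cd34, cd38, "->", classify_by_markers(cd34, cd38))
```

The labels describe enrichment, not identity: a sorted CD34+/CD38− fraction is enriched for stem cells rather than composed purely of them, and the CD34− label does not rule out stem cell activity.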

Studying gene expression in early hematopoiesis

Using molecular biology, functional analysis and animal modeling approaches, efforts have been made to dissect the genetic programs of early hematopoiesis. Multiple genes have been identified as playing roles in controlling early hematopoiesis. These include genes for growth factors, chromatin-associated factors, homeobox proteins, transcription factors and cell cycle regulators (Zon, 2008), as exemplified by MLL (the homolog of Drosophila trithorax), multiple HOX genes, and components of the NOTCH, WNT and TGFB signaling pathways. Recent studies suggest that microRNAs may also be involved in hematopoietic regulation (Garzon and Croce, 2008). Compared with the 20,000 or so genes in the human genome, however, the number of functionally important genes identified so far is limited. It is likely that more genes involved in hematopoiesis remain to be identified.

Following the development of genome studies, genomic approaches have been applied to gene expression profiling in hematopoiesis. Microarrays have been used to study early hematopoiesis in a mouse model (Phillips et al., 2000), to study dynamic gene expression during in vitro hematopoietic differentiation (Komor et al., 2005), and to study priming events during early hematopoiesis, which identified lineage-related gene expression signatures for the lymphoid, myeloid and erythroid lineages at the HSC stage (Ng et al., 2009). Sanger sequencing-based technologies, including full-length cDNA sequencing, EST and SAGE, have also been used to study gene expression from early hematopoietic cells to mature hematopoietic cell types. EST was used to analyze gene expression in human CD34+ cells (Yang et al., 1996; Mao et al., 1998; Zhang et al., 2000), and SAGE was used to identify the genes expressed in CD34+/CD38− HSC and CD34+/CD38+ HPC (Georgantas et al., 2004), CD34+ cells (Zhou et al., 2001; Zhao et al., 2007), pre-T cells (Klein et al., 2003), pre-B cells (Müschen et al., 2002), myeloid progenitors (Lee et al., 2001), NK progenitor cells (Kang et al., 2005) and erythroid progenitors (Lee et al., 2007). Table 1 summarizes the sequence data generated from CD34+ cells by individual studies.

Table 1
mRNA sequences identified from normal human CD34+ cells

While the data from individual studies provide gene expression information for CD34+ cells, they have limitations. These include: 1) Lack of comprehensiveness. For example, only several hundred full-length sequences have been generated from CD34+ cells so far. 2) Lack of specificity. Most of the transcriptome information for CD34+ cells comes from SAGE data. SAGE provides only a short tag sequence for each detected transcript, and a SAGE tag alone is not enough to determine gene structure or to perform functional studies. 3) Lack of consistency between studies. Each data set was generated by a different laboratory at a different time, using different annotation processes and reference databases, which makes the existing data difficult to compare.

We recently performed a study aiming to provide a uniform, comprehensive transcriptome for human CD34+ hematopoietic cells (Kim et al., 2009). We collected 25,798 3′ ESTs, the largest EST collection from human CD34+ cells. Through database and literature mining, we also gathered the CD34+ cell cDNA sequences generated by previous studies, including full-length cDNA, EST and SAGE sequences. We integrated all sequence data into a uniform dataset and annotated it using the latest human genome information. Our study indicates that at least 12,759 genes are expressed in human CD34+ cells. It confirmed the genes known to be important in hematopoiesis and identified additional candidate genes. For example, we identified 14 HOX genes expressed in CD34+ cells, of which HOXA9, HOXA10 and HOXB4 are known regulators of hematopoiesis and the remaining 11 are newly detected. We observed that 56% (574) of known human transcription factor genes are expressed in CD34+ cells, of which 327 belong to the zinc finger protein zf-C2H2 family. Besides the NOTCH, WNT and TGFB pathways known to regulate hematopoietic self-renewal and differentiation, we also identified seven other signaling pathways. We detected 94 kinase genes, of which 19% are tyrosine kinase genes. We also identified 45 miRNA genes expressed in CD34+ cells. We analyzed alternative transcriptional initiation, alternative splicing and adenylation, and antisense and non-coding transcripts. By comparison with information from multiple mature hematopoietic cell types, we identified a CD34+ cell-specific gene expression signature and provided many candidate gene markers for the study of early hematopoiesis. Using the rich transcriptome and human genome information, we generated a CD34+ cell transcriptional map that reflects the transcriptional activities in the human genome at the CD34+ cell stage. The data from our study represent the latest knowledge of gene expression during early hematopoiesis.

Challenges for transcriptome study in early hematopoiesis

Although great progress has been made, our understanding of the genetic basis of early hematopoiesis remains limited. The following factors contribute to this situation:

The rarity of hematopoietic stem progenitor cells

Hematopoietic stem-progenitor cells are rare in the hematopoietic cell population. Using more specific markers better defines the differentiation stage, but it also increases the rarity of the cell type under study. For example, CD34+ cells account for 1 to 2% of normal bone marrow cells, and the frequency decreases to 1 in 1,000 cells for the undifferentiated CD34+/CD38−/Lin− hematopoietic stem cells (Georgantas et al., 2004). While the increased rarity has limited influence on studies at the cellular level, it pushes cell numbers toward the physical limit for transcriptome studies on conventional microarray and Sanger sequencing platforms.
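
To make the effect of rarity concrete, the following minimal sketch estimates how many input cells would have to be screened to recover a given number of target cells at the frequencies quoted above. The function name and the target of 10,000 cells are hypothetical, and the calculation assumes perfect recovery during sorting, which real experiments do not achieve.

```python
import math
from fractions import Fraction


def cells_to_screen(target_cells: int, frequency: Fraction) -> int:
    """Input cells needed to expect `target_cells` cells of a population
    that makes up `frequency` of the input (hypothetical helper)."""
    if not 0 < frequency <= 1:
        raise ValueError("frequency must be in (0, 1]")
    # Exact rational arithmetic avoids floating-point rounding before ceil.
    return math.ceil(target_cells / frequency)


if __name__ == "__main__":
    target = 10_000  # illustrative number of cells wanted, not from the text
    for label, freq in [
        ("CD34+ at 2%", Fraction(2, 100)),
        ("CD34+ at 1%", Fraction(1, 100)),
        ("CD34+/CD38-/Lin- at 1 in 1,000", Fraction(1, 1000)),
    ]:
        print(label, cells_to_screen(target, freq))
```

Under these assumptions, moving from the CD34+ population to the 1-in-1,000 population raises the required input by a factor of 10 to 20.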

Inability to reach comprehensive mRNA detection

It is still not known how many genes are expressed during early hematopoiesis. Microarrays can detect only known genes, and the high cost of Sanger sequencing does not allow exhaustive mRNA sequence collection. As a result, full-length cDNA sequencing detects only a limited number of mRNAs, EST provides only partial sequences for the detected mRNAs, and SAGE provides only minimal information for each detected mRNA (Wang, 2008). With microarray and Sanger sequencing systems, only mRNAs expressed at high and intermediate abundance can be detected. Most mRNAs present at lower levels, many of which could come from functionally important genes, remain to be detected.

Difficulty in determining the key genes controlling self-renewal and differentiation

Our latest study shows that over half of human genes are expressed in CD34+ cells. In contrast, efforts over the past decades have identified only a handful of genes that are important for early hematopoiesis. These genes were identified mainly through classical methods such as gene knockout. Because these methods are low-throughput, it will be difficult to use them to analyze the large number of genes expressed during early hematopoiesis. New approaches need to be developed for high-throughput functional analysis.

New opportunity for studying gene expression in early hematopoiesis

The availability of next-generation DNA sequencers

A major factor preventing comprehensive transcriptome analysis is the high sequencing cost of the Sanger sequencing platform, together with the complexity of sample preparation, library construction and single-clone isolation. Multiple next-generation DNA sequencers have been developed, including 454, Solexa, SOLiD, Polonator and Helicos, and more are under development. Their common features include massive data production, simple sample preparation, high speed and low cost. For example, the new 454 sequencer provides 100 million reads per run at up to 500 bases per read, which reaches the length of ESTs generated by Sanger sequencing; the Solexa and SOLiD sequencers generate multiple Gb per run at 35–75 bp per read. The next-generation sequencers have overcome most of the limitations of the Sanger sequencer, which implies that sequence collection is no longer a limiting factor for transcriptome study. The sequence content can readily be increased several hundred-thousand fold compared with that from the Sanger sequencing system. Next-generation sequencer-based mRNAseq methods have been developed (Marioni et al., 2008; Nagalakshmi et al., 2008; Cloonan et al., 2008; Wang et al., 2009). For example, mRNAseq was used to analyze gene expression in embryonic stem cells and embryoid bodies (Cloonan et al., 2008). Annotation of over 10 Gb of sequence identified the transcribed regions in the genomes of the embryonic stem cells and embryoid bodies, the SNPs and alternatively spliced transcripts in the expressed genes, and the key signaling pathways involved in embryonic stem cell pluripotency and differentiation.
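
The relationship between sequence yield, read length and read count is simple arithmetic, sketched below. The helper name is hypothetical, and pairing a 10 Gb yield with 35 and 75 bp reads is an assumption made only for illustration; it is not a specification of any particular instrument or study.

```python
def reads_for_yield(total_bases: int, read_length: int) -> int:
    """Number of whole reads of `read_length` bases contained in
    `total_bases` of sequence (hypothetical helper)."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return total_bases // read_length


if __name__ == "__main__":
    run_yield = 10 * 10**9  # 10 Gb, illustrative value
    for length in (35, 75):  # short-read lengths quoted in the text
        print(f"{length} bp reads: {reads_for_yield(run_yield, length):,}")
```

At these read lengths, a yield of this size corresponds to well over one hundred million reads, which is why short-read platforms can sample low-abundance transcripts that Sanger-based collections miss.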

The development of the “single cell mRNASeq” method

As indicated above, the rarity of hematopoietic stem-progenitor cells largely restricts transcriptome study of early hematopoiesis. Attempts have been made to overcome this restriction. A microarray-based single-cell system was adapted for sequence collection using the SOLiD sequencer (Kurimoto et al., 2006; Kurimoto et al., 2007; Tang et al., 2009). Taking advantage of massive sequence production, this modified method provides high sensitivity for detecting the mRNAs expressed in a single cell. Using this method, over 100 million sequence reads were collected from a single mouse blastomere. By combining single-cell sorting with the single-cell mRNAseq method, it is now possible to reach comprehensive coverage of the transcriptome for each type of hematopoietic stem-progenitor cell at the single-cell level. Such data should provide previously inaccessible information for studying the genetic basis of early hematopoiesis.

The development of systems biology concept to reveal the genetic basis of early hematopoiesis

Detailed single-gene studies can determine whether a gene is required for early hematopoiesis. From a systems point of view, however, a given phenotype is the consequence of the joint action of multiple genes and pathways. To reveal the genetic programs controlling early hematopoiesis, systems approaches will likely be required to interpret the gene expression information and to determine the genes and functional pathways that play important roles in early hematopoiesis (Foster et al., 2009). New epigenome mapping data from the NIH Roadmap project (Mendenhall & Bernstein, 2008) will further provide chromatin state information during cell programming and reprogramming. New single-cell imaging technologies and surface markers will help identify the key sources and steps in the birth of blood cells (Yoshimoto and Yoder, 2009).


We would like to acknowledge the funding from NIH HG002600 (SMW), HG001696 and ES017166 (MQZ).

Literature cited

  • Andrews RG, Singer JW, Bernstein ID. Monoclonal antibody 12–8 recognizes a 115-kd molecule present on both unipotent and multipotent hematopoietic colony-forming cells and their precursors. Blood. 1986;67:842–845. [PubMed]
  • Berenson RJ, Andrews RG, Bensinger WI, Kalamasz D, Knitter G, Buckner CD, Bernstein ID. Antigen CD34+ marrow cells engraft lethally irradiated baboons. J Clin Invest. 1988;81:951–955. [PMC free article] [PubMed]
  • Beschorner WE, Civin CI, Strauss LC. Localization of hematopoietic progenitor cells in tissue with the anti-My-10 monoclonal antibody. Am J Pathol. 1985;119:1–4. [PubMed]
  • Bhatia M, Bonnet D, Murdoch B, Gan OI, Dick JE. A newly discovered class of human hematopoietic cells with SCID-repopulating activity. Nat Med. 1998;4(9):1038–1045. [PubMed]
  • Chen J, Lee S, Zhou G, Wang SM. High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3’ complementary DNAs. Genes, Chromosomes & Cancer. 2002;33:252–261. [PubMed]
  • Civin CI, Strauss LC, Brovall C, Fackler MJ, Schwartz JF, Shaper JH. Antigenic analysis of hematopoiesis. III. A hematopoietic progenitor cell surface antigen defined by a monoclonal antibody raised against KG-1a cells. J Immunol. 1984;133:157–165. [PubMed]
  • Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS, Peckham HE, Manning JM, McKernan KJ, Grimmond SM. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008;5:613–619. [PubMed]
  • Foster SD, Oram SH, Wilson NK, Gottgens B. From genes to cells to tissues- modeling the haematopoietic system. Mol BioSyst. 2009;5:1413–1420. [PubMed]
  • Furness SG, McNagny K. Beyond mere markers: functions for CD34 family of sialomucins in hematopoiesis. Immunol Res. 2006;34:13–32. [PubMed]
  • Garzon R, Croce CM. MicroRNAs in normal and malignant hematopoiesis. Curr Opin Hematol. 2008;15:352–358. [PubMed]
  • Georgantas RW, et al. Microarray and serial analysis of gene expression analyses identify known and novel transcripts overexpressed in hematopoietic stem cells. Cancer Res. 2004;64:4434–4441. [PubMed]
  • Kang HS, Kim EM, Lee S, Yoon SR, Kawamura T, Lee YC, Kim S, Myung PK, Wang SM, Choi I. Stage-dependent gene expression profiles during natural killer cell development. Genomics. 2005;86:551–565. [PubMed]
  • Katz FE, Tindle R, Sutherland DR, Greaves MF. Identification of a membrane glycoprotein associated with haemopoietic progenitor cells. Leuk Res. 1985;9:191–198. [PubMed]
  • Kim YC, Wu Q, Chen J, Xuan Z, Jung YC, Zhang MQ, Rowley JD, Wang SM. The transcriptome of human CD34+ hematopoietic stem-progenitor cells. Proc Natl Acad Sci U S A. 2009;106(20):8278–8283. [PubMed]
  • Klein F, Feldhahn N, Lee S, Wang H, Ciuffi F, von Elstermann M, Toribio ML, Sauer H, Wartenberg M, Barath VS, Kronke M, Wernet P, Rowley JD, Muschen M. T lymphoid differentiation in human bone marrow. Proc Natl Acad Sci U S A. 2003;100:6747–6752. [PubMed]
  • Komor M, Guller S, Baldus CD, de Vos S, Hoelzer D, Ottmann OG, Hofmann WK. Transcriptional profiling of human hematopoiesis during in vitro lineage-specific differentiation. Stem Cells. 2005;23(8):1154–1169. [PubMed]
  • Kurimoto K, Yabuta Y, Ohinata Y, Ono Y, Uno KD, Yamada RG, Ueda HR, Saitou M. An improved single-cell cDNA amplification method for efficient high-density oligonucleotide microarray analysis. Nucleic Acids Res. 2006;34(5):e42. [PMC free article] [PubMed]
  • Kurimoto K, Yabuta Y, Ohinata Y, Saitou M. Global single-cell cDNA amplification to provide a template for representative high-density oligonucleotide microarray analysis. Nat Protoc. 2007;2:739–52. [PubMed]
  • Lee S, Hwang J, Ulaszek J, Kim YC, Dong H, Kim HS, Seok JW, Suh BK, Yim SJ, Johnson D, Choe NH, Chang KT, Ryoo ZY, Tseng CC, Wickrema A, Wang SM. Stable transcriptional status in the apoptotic erythroid genome. Biochem Biophys Res Commun. 2007;359:556–62. [PubMed]
  • Lee S, Zhou G, Clark T, Chen J, Rowley JD, Wang SM. The pattern of gene expression in human CD15+ myeloid progenitor cells. Proc Natl Acad Sci U S A. 2001;98(6):3340–3345. [PubMed]
  • Mao M, Fu G, Wu JS, Zhang QH, Zhou J, Kan LX, Huang QH, He KL, Gu BW, Han ZG, Shen Y, Gu J, Yu YP, Xu SH, Wang YX, Chen SJ, Chen Z. Identification of genes expressed in human CD34(+) hematopoietic stem/progenitor cells by expressed sequence tags and efficient full-length cDNA cloning. Proc Natl Acad Sci U S A. 1998;95:8175–8180. [PubMed]
  • Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18:1509–1517. [PubMed]
  • Mendenhall EM, Bernstein BE. Chromatin state maps: new technologies, new insights. Curr Opin Genet Dev. 2008;18:109–115. [PMC free article] [PubMed]
  • Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628. [PubMed]
  • Müschen M, Lee S, Zhou G, Feldhahn N, Barath VS, Chen J, Moers C, Kronke M, Rowley JD, Wang SM. Molecular portraits of B cell lineage commitment. Proc Natl Acad Sci U S A. 2002;99:10014–10019. [PubMed]
  • Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320:1344–1349. [PMC free article] [PubMed]
  • Ng SY, Yoshida T, Zhang J, Georgopoulos K. Genome-wide lineage-specific transcriptional networks underscore Ikaros-dependent lymphoid priming in hematopoietic stem cells. Immunity. 2009;30(4):493–507. [PMC free article] [PubMed]
  • Phillips RL, Ernst RE, Brunk B, Ivanova N, Mahan MA, Deanehan JK, Moore KA, Overton GC, Lemischka IR. The genetic program of hematopoietic stem cells. Science. 2000;288:1635–1640. [PubMed]
  • Satterthwaite AB, Burn TC, Le Beau MM, Tenen DG. Structure of the gene encoding CD34, a human hematopoietic stem cell antigen. Genomics. 1992;12:788–794. [PubMed]
  • Simmons DL, Satterthwaite AB, Tenen DG, Seed B. Molecular cloning of a cDNA encoding CD34, a sialomucin of human hematopoietic stem cells. J Immunol. 1992;148:267–271. [PubMed]
  • Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6:377–382. [PubMed]
  • Tindle RW, Nichols RA, Chan L, Campana D, Catovsky D, Birnie GD. A novel monoclonal antibody BI-3C5 recognizes myeloblasts and non-B non-T lymphoblasts in acute leukemias and CGL blast crises, and reacts with immature cells in normal bone marrow. Leuk Res. 1985;9:1–9. [PubMed]
  • Venezia TA, Merchant AA, Ramos CA, Whitehouse NL, Young AS, Shaw CA, Goodell MA. Molecular signatures of proliferation and quiescence in hematopoietic stem cells. PLoS Biol. 2004;2:e301. [PMC free article] [PubMed]
  • Wang SM. Long-short-long games in transcript identification: the length matters. Curr Pharm Biotechnol. 2008;9:362–367. [PubMed]
  • Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. [PMC free article] [PubMed]
  • Watt SM, Karhi K, Gatter K, Furley AJ, Katz FE, Healy LE, Altass LJ, Bradley NJ, Sutherland DR, Levinsky R, Greaves MF. Distribution and epitope analysis of the cell membrane glycoprotein (HPCA-1) associated with human hemopoietic progenitor cells. Leukemia. 1987;1:417–426. [PubMed]
  • Yang Y, Peterson KR, Stamatoyannopoulos G, Papayannopoulou T. Human CD34+ cell EST database: single-pass sequencing of 402 clones from a directional cDNA library. Exp Hematol. 1996;24:605–612. [PubMed]
  • Yoshimoto M, Yoder MC. Developmental biology: Birth of the blood cell. Nature. 2009;457:801–803. [PMC free article] [PubMed]
  • Zhang QH, et al. Cloning and functional analysis of cDNAs with open reading frames for 300 previously undefined genes expressed in CD34+ hematopoietic stem/progenitor cells. Genome Res. 2000;10:1546–1560. [PubMed]
  • Zhao Y, Raouf A, Kent D, Khattra J, Delaney A, Schnerch A, Asano J, McDonald H, Chan C, Jones S, Marra MA, Eaves CJ. A modified polymerase chain reaction-long serial analysis of gene expression protocol identifies novel transcripts in human CD34+ bone marrow cells. Stem Cells. 2007;25:1681–1689. [PubMed]
  • Zhou G, Chen J, Lee S, Clark T, Rowley JD, Wang SM. The pattern of gene expression in human CD34(+) stem/progenitor cells. Proc Natl Acad Sci U S A. 2001;98(24):13966–13971. [PubMed]
  • Zon L. Intrinsic and extrinsic control of haematopoietic stem-cell self-renewal. Nature. 2008;453:307–313. [PubMed]