Hematopoietic stem-progenitor cells are the source of the entire hematopoietic system. Studying gene expression in these cells will help to understand the genetic programs controlling early hematopoiesis and to identify gene targets for intervening in hematopoietic disorders. Extensive efforts using cell biology, molecular biology and genomics approaches have generated rich knowledge of the genes and functional pathways involved in early hematopoiesis. Challenges remain, however, including the rarity of hematopoietic stem-progenitor cells, which sets a physical limit on such studies; the difficulty of reaching comprehensive transcriptome detection with conventional genomics technologies; and the difficulty of using conventional biological methods to identify, among the large number of expressed genes, the key genes controlling stem cell self-renewal and differentiation. The newly developed single-cell transcriptome method and next-generation DNA sequencing technology provide new opportunities for transcriptome studies of early hematopoiesis. A systems biology approach may provide insight into the genetic mechanisms controlling early hematopoiesis.
The hematopoietic system is one of the best-characterized cellular differentiation systems. The hematopoietic stem cell is formed in the ventral mesoderm at the embryonic stage and migrates progressively to the yolk sac, aortic region, placenta, fetal liver, and, in adulthood, bone marrow (Zon, 2008). During this process, some hematopoietic stem cells maintain their self-renewal capacity, while others gradually lose it and differentiate into lineage-defined multipotent progenitor cells, lineage-restricted progenitors, and eventually the mature terminal cell types that perform specific physiological functions. Early hematopoiesis plays a critical role in maintaining the entire hematopoietic system, and functional aberrations in early hematopoiesis directly cause various hematopoietic disorders. Studying gene expression in early hematopoiesis is therefore critical for understanding the genetic mechanisms that control it and for identifying the genetic causes of hematopoietic disorders.
Hematopoiesis is a dynamic process. Studying it requires determining the specific differentiation stage of the cells, which is largely achieved by identifying cell surface markers that are present or absent at specific stages of hematopoiesis. A typical example is the CD34 marker. CD34 was first identified in 1984 (Civin et al., 1984). Subsequent studies determined that CD34 is present on early-stage hematopoietic cells (Tindle et al., 1985; Katz et al., 1985; Andrews et al., 1986; Watt et al., 1987), and that hematopoietic CD34+ cells are able to reconstitute the entire hematopoietic system in lethally irradiated animals (Berenson et al., 1988). After nearly 10 years, the CD34 gene was cloned (Simmons et al., 1992) and its genomic structure was determined and located at 1q32 (Satterthwaite et al., 1003). Molecular analysis revealed that CD34 is a 40-kDa type I integral membrane protein with nine potential N-linked and numerous potential O-linked glycosylation sites in its extracellular domain. Further studies indicated that CD34 is a pan-early hematopoietic cell marker: it is present on later-stage hematopoietic stem cells, multipotent hematopoietic progenitor cells and lineage-restricted hematopoietic progenitors. While the biological function of the CD34 molecule itself remains largely unknown (Furness et al., 2006), CD34+ cells have been widely used as hematopoietic stem cells in clinical transplantation to restore the hematopoietic system. However, it has been observed that CD34−/CD38− cells can also initiate multilineage hematopoiesis (Bhatia et al., 1998). Therefore, not all hematopoietic stem cells are CD34+. To further define the early differentiation stages of hematopoiesis, markers specific to each stage are required. Indeed, more specific surface markers have been identified for cell types at specific differentiation stages.
For example, in combination with CD38, a glycoprotein expressed in mature immune cells, CD34+ cells can be further divided into a CD34+/CD38− subpopulation that is enriched for primitive hematopoietic stem cells and a CD34+/CD38+ subpopulation that is enriched for lineage-committed hematopoietic progenitor cells (Georgantas et al., 2004). Additional markers can define cells at still more specific stages. Combined with advanced cell sorting techniques, antibody staining, and animal models, this finer sub-classification of early hematopoietic cells greatly facilitates studies of their origin, migration pathway and cellular development.
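The two-marker subdivision described above can be written as a small decision rule. The following is a minimal Python sketch, not a gating protocol: the function name `classify_by_markers`, the boolean inputs and the five example cells are hypothetical, and real flow cytometry gating works on continuous fluorescence intensities with experiment-specific thresholds.

```python
from collections import Counter


def classify_by_markers(cd34_positive: bool, cd38_positive: bool) -> str:
    """Assign a cell to a subpopulation from its CD34/CD38 status (illustrative only)."""
    if cd34_positive and not cd38_positive:
        # Subpopulation enriched for primitive hematopoietic stem cells.
        return "CD34+/CD38-"
    if cd34_positive and cd38_positive:
        # Subpopulation enriched for lineage-committed progenitor cells.
        return "CD34+/CD38+"
    # Everything else falls outside the CD34+ compartment.
    return "CD34-"


# Hypothetical marker calls for five cells: (CD34 status, CD38 status).
cells = [(True, False), (True, True), (True, True), (False, False), (True, False)]

# Tally how many cells fall into each subpopulation.
counts = Counter(classify_by_markers(cd34, cd38) for cd34, cd38 in cells)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Adding a further marker would add one more boolean argument and one more branch, which is also why each extra marker narrows the population that remains.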
Using molecular biological, functional and animal modeling approaches, efforts have been made to dissect the genetic programs of early hematopoiesis. Multiple genes have been identified that play roles in controlling early hematopoiesis. These include growth factors, chromatin-associated factors, homeobox genes, transcription factors, and cell cycle regulators (Zon, 2008), as exemplified by the Drosophila trithorax homolog MLL, multiple HOX genes, and the NOTCH, WNT and TGFB signaling pathways. Recent studies suggest that microRNAs may also be involved in hematopoietic regulation (Garzon et al., 2008). Compared with the 20,000 or so genes in the human genome, however, the number of functionally important genes identified so far is limited. It is likely that more genes involved in hematopoiesis remain to be identified.
Following the development of genome studies, genomic approaches have been applied to gene expression profiling in hematopoiesis. Microarrays were used to study early hematopoiesis in a mouse model (Phillips et al., 2003), to study dynamic gene expression during in vitro hematopoietic differentiation (Komor et al., 2005), and to study the priming events during early hematopoiesis, which identified lineage-related gene expression signatures for the lymphoid, myeloid and erythroid lineages at the HSC stage (Ng et al., 2009). Sanger sequencing-based technologies, including full-length cDNA, EST, and SAGE, have also been used to study gene expression from early hematopoiesis to mature hematopoietic cell types. EST was used to analyze gene expression in human CD34+ cells (Yang et al., 1006; Mao et al., 1998; Zhang et al., 2000). SAGE was used to identify the genes expressed in CD34+/CD38− HSC and CD34+/CD38+ HPC (Georgantas et al., 2004), CD34+ cells (Zhou et al., 2001; Zhao et al., 2007), pre-T cells (Klein et al., 2003), pre-B cells (Müschen et al., 2002), myeloid progenitors (Lee et al., 2001), NK progenitor cells (Kang et al., 2005) and erythroid progenitors (Lee et al., 2007). Table 1 summarizes the sequence data generated from CD34+ cells by individual studies.
While the data from individual studies provide gene expression information for CD34+ cells, they have limitations: 1) Lack of comprehensiveness. For example, only several hundred full-length sequences have been generated from CD34+ cells so far. 2) Lack of specificity. Most of the transcriptome information for CD34+ cells comes from SAGE data, and SAGE provides only a short tag sequence for each detected transcript. A SAGE tag alone is not enough to determine gene structure or to perform functional studies. 3) Lack of consistency between studies. Each data set was generated by a different lab at a different time, using different annotation processes and reference databases, so the existing data are difficult to compare.
We recently performed a study aiming to provide a uniform, comprehensive transcriptome for human CD34+ hematopoietic cells (Kim et al., 2009). We collected 25,798 3′ ESTs, the largest EST collection from human CD34+ cells. Through database and literature mining, we also gathered the CD34+ cDNA sequences generated by previous studies, including full-length cDNA, EST and SAGE. We integrated all sequence data into a uniform dataset and annotated it using the latest human genome knowledge. Our study indicates that at least 12,759 genes are expressed in human CD34+ cells. It confirmed the genes known to be important in hematopoiesis and identified more candidate genes. For example, we identified 14 HOX genes expressed in CD34+ cells, of which HOXA9, HOXA10 and HOXB4 are known regulators of hematopoiesis and the remaining 11 are newly detected. We observed that 56% (574) of known human transcription factor genes are expressed in CD34+ cells, of which 327 belong to the zinc finger protein zf-C2H2 family. Besides the NOTCH, WNT and TGFB pathways known to regulate hematopoietic self-renewal and differentiation, we also identified seven other signaling pathways. We detected 94 kinase genes, of which 19% are tyrosine kinase genes. We also identified 45 miRNA genes expressed in CD34+ cells. We analyzed alternative transcriptional initiation, alternative splicing and polyadenylation, and antisense and non-coding transcripts. By comparing with information from multiple mature hematopoietic cell types, we identified a CD34+ cell-specific gene expression signature and provided many gene marker candidates for the study of early hematopoiesis. Using the rich transcriptome and human genome information, we generated a CD34+ cell transcriptional map that reflects the transcriptional activities in the human genome at the CD34+ cell stage. The data from our study represent the latest knowledge of gene expression during early hematopoiesis.
Although great progress has been made, our understanding of the genetic basis of early hematopoiesis is limited. The following factors contribute to the situation:
Hematopoietic stem-progenitor cells are rare in the hematopoietic cell population. Using more specific markers better determines the specific differentiation stage, but it also increases the rarity of the specific cell type. For example, CD34+ cells account for 1 to 2% of normal bone marrow. The frequency decreases to 1 in 1,000 cells for the undifferentiated CD34+/CD38−/Lin− hematopoietic stem cells (Georgantas et al., 2004). While the increased rarity has limited influence on studying the cells at the cellular level, it pushes the available cell number toward the physical limit for transcriptome studies on conventional microarray and Sanger sequencing platforms.
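The frequencies quoted above translate directly into expected cell yields. The short Python sketch below does this arithmetic under two stated assumptions: the starting sample of 10 million bone marrow cells is a hypothetical figure chosen for illustration, and the 1-in-1,000 frequency is read as a fraction of all starting cells. The helper name `expected_cells` is likewise hypothetical.

```python
from fractions import Fraction


def expected_cells(total_cells: int, frequency: Fraction) -> int:
    """Expected number of target cells in a sample, rounded down to whole cells."""
    return int(total_cells * frequency)


# Hypothetical starting sample size (assumption, not from the text).
starting_cells = 10_000_000

# Frequencies taken from the text: CD34+ cells at 1 to 2%, and
# CD34+/CD38-/Lin- cells at 1 in 1,000.
cd34_low = expected_cells(starting_cells, Fraction(1, 100))
cd34_high = expected_cells(starting_cells, Fraction(2, 100))
hsc = expected_cells(starting_cells, Fraction(1, 1000))

print(f"CD34+ cells: {cd34_low} to {cd34_high}")
print(f"CD34+/CD38-/Lin- cells: {hsc}")
```

Exact fractions are used instead of floating-point percentages so that the whole-cell counts are not affected by rounding error.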
It is still not known how many genes are expressed during early hematopoiesis. Microarrays can detect only known genes. The high cost of Sanger sequencing does not allow exhaustive mRNA sequence collection. As a result, full-length cDNA sequencing detects only a limited number of mRNAs, EST provides only partial sequences for the detected mRNAs, and SAGE provides only minimal information for the detected mRNAs (Wang, 2008). With microarray and Sanger sequencing systems, only mRNAs expressed at high and intermediate abundance can be detected. Most mRNAs present at lower levels, many of which could come from functionally important genes, remain undetected.
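A simple sampling model makes the abundance limit concrete. Assuming each sequenced clone or read is an independent random draw from the mRNA pool (an idealization that ignores cloning and amplification bias), a transcript that makes up a fraction f of the pool is seen at least once among N reads with probability 1 − (1 − f)^N. The Python sketch below evaluates this expression; the abundance of 1 in 300,000 mRNA molecules and both read counts are hypothetical values chosen only for illustration, and `detection_probability` is a hypothetical helper name.

```python
import math


def detection_probability(fraction: float, n_reads: int) -> float:
    """Probability of sampling a transcript at least once in n_reads independent draws.

    Computes 1 - (1 - fraction) ** n_reads using log1p/expm1, which stays
    accurate when fraction is very small.
    """
    return -math.expm1(n_reads * math.log1p(-fraction))


# Hypothetical low-abundance transcript: 1 molecule in 300,000 (assumption).
rare_fraction = 1 / 300_000

# Hypothetical sequencing depths: a small clone-based collection versus
# a deep next-generation run (assumptions for illustration).
p_shallow = detection_probability(rare_fraction, 10_000)
p_deep = detection_probability(rare_fraction, 10_000_000)

print(f"10,000 reads:     {p_shallow:.3f}")
print(f"10,000,000 reads: {p_deep:.3f}")
```

Under these assumptions the rare transcript is usually missed at the shallow depth and almost certainly sampled at the deep one, which is the sense in which sequencing depth limits detection of low-abundance mRNAs.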
Our latest study shows that over half of human genes are expressed in CD34+ cells. In contrast, efforts over the past decades have identified only a handful of genes that are important for early hematopoiesis. These genes were mainly identified using classical methods such as gene knockout. Because these methods are low-throughput, they are difficult to apply to the large number of genes expressed during early hematopoiesis. New approaches need to be developed for high-throughput functional analysis.
A major obstacle to comprehensive transcriptome analysis is the high cost of the Sanger sequencing platform, together with the complexity of sample preparation, library construction and single-clone isolation. Multiple next-generation DNA sequencers have been developed, including 454, Solexa, SOLiD, Polonator and Helicos, and more are under development. Their common features include massive data production, simple sample preparation, high speed, and low cost. For example, the new 454 sequencer provides 100 million reads per run at up to 500 bases per read, which reaches the length of ESTs generated by Sanger sequencing; the Solexa and SOLiD sequencers generate multiple Gb per run at 35–75 bp per read. Next-generation sequencers have overcome most of the limitations of Sanger sequencers, which implies that sequence collection is no longer a limiting factor for transcriptome studies. Sequence content can easily be increased several hundred-thousand-fold compared with that obtained by Sanger sequencing. Next-generation sequencer-based mRNAseq methods have been developed (Marioni et al., 2008; Nagalakshmi et al., 2008; Cloonan et al., 2008; Wang et al., 2009). For example, mRNAseq was used to analyze gene expression in embryonic stem cells and embryoid bodies (Cloonan et al., 2008). Annotation of over 10 Gb of sequence identified the transcribed regions in the genome of the embryonic stem cells and embryoid bodies, the SNPs and alternatively spliced transcripts in the expressed genes, and the key signaling pathways involved in embryonic stem cell pluripotency and differentiation.
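To give a sense of scale for the short-read figures above, the Python sketch below converts a total sequence yield into an approximate number of reads. It uses the 10 Gb total and the 35 to 75 bp read lengths quoted in the text, and assumes for simplicity that every read has the same length; `reads_for_yield` is a hypothetical helper name, not part of any sequencing toolkit.

```python
def reads_for_yield(total_bases: int, read_length: int) -> int:
    """Number of whole reads of a fixed length that make up a given base yield."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return total_bases // read_length


# 10 Gb of sequence, as in the embryonic stem cell example in the text.
total_bases = 10_000_000_000

# Read lengths at the two ends of the quoted 35-75 bp range.
reads_at_35 = reads_for_yield(total_bases, 35)
reads_at_75 = reads_for_yield(total_bases, 75)

print(f"35 bp reads: {reads_at_35:,}")
print(f"75 bp reads: {reads_at_75:,}")
```

Either way the yield corresponds to more than a hundred million reads, several orders of magnitude beyond a clone-based collection of tens of thousands of sequences.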
As indicated above, the rarity of hematopoietic stem-progenitor cells largely restricts transcriptome studies of early hematopoiesis. Attempts have been made to overcome this restriction. A microarray-based single-cell system was adapted for sequence collection using the SOLiD sequencer (Kurimoto et al., 2006; Kurimoto et al., 2007; Tang et al., 2009). Taking advantage of massive sequence production, this modified method is sensitive enough to detect mRNAs expressed in a single cell. Using this method, over 100 million sequence reads were collected from a single mouse blastomere. By combining single-cell sorting with single-cell mRNAseq, it is now possible to reach comprehensive coverage of the transcriptome for each type of hematopoietic stem-progenitor cell at the single-cell level. Such data should provide previously unavailable information for studying the genetic basis of early hematopoiesis.
The development of the systems biology concept to reveal the genetic basis of early hematopoiesis
Detailed single-gene studies can determine whether a gene is required for early hematopoiesis. From a systems point of view, however, a given phenotype is the consequence of the joint action of multiple genes and pathways. To reveal the genetic programs controlling early hematopoiesis, systems approaches will likely be required to interpret the gene expression information and to determine the genes and functional pathways that play important roles in early hematopoiesis (Foster et al., 2009). New epigenome mapping data from the NIH Roadmap project (Mendenhall & Bernstein, 2008) will further provide chromatin state information during cell programming and reprogramming. New single-cell imaging technologies and surface markers will help to identify key sources and steps in the birth of blood cells (Yoshimoto and Yoder, 2009).
We would like to acknowledge the funding from NIH HG002600 (SMW), HG001696 and ES017166 (MQZ).