|Home | About | Journals | Submit | Contact Us | Français|
Molecular changes underlying stem cell differentiation are of fundamental interest. scRNA-seq on murine hematopoietic stem cells (HSC) and their progeny MPP1 separated the cells into 3 main clusters with distinct features: active, quiescent, and an un-characterized cluster. Induction of anemia resulted in mobilization of the quiescent to the active cluster and of the early to later stage of cell cycle, with marked increase in expression of certain transcription factors (TFs) while maintaining expression of interferon response genes. Cells with surface markers of long term HSC increased the expression of a group of TFs expressed highly in normal cycling MPP1 cells. However, at least Id1 and Hes1 were significantly activated in both HSC and MPP1 cells in anemic mice. Lineage-specific genes were differently expressed between cells, and correlated with the cell cycle stages with a specific augmentation of erythroid related genes in the G2/M phase. Most lineage specific TFs were stochastically expressed in the early precursor cells, but a few, such as Klf1, were detected only at very low levels in few precursor cells. The activation of these factors may correlate with stages of differentiation. This study reveals effects of cell cycle progression on the expression of lineage specific genes in precursor cells, and suggests that hematopoietic stress changes the balance of renewal and differentiation in these homeostatic cells.
Hematopoietic stem cells (HSCs) capable of long term marrow reconstitution have been separated from other marrow cells prospectively based on differential expression of cell surface markers or differences in the ability to extrude certain dyes. The HSCs give rise to progenitors (MPPs) that are capable of multilineage differentiation without long term marrow reconstitution, although the combinations of markers and the definition of the subsets of MPPs vary between research groups (1–3) (Supplementary Table S1). The pattern of lineage differentiation from MPP cells has been widely studied and correlated with the expression of transcriptional or post-transcriptional regulators, such as transcription factors (4–7), microRNAs (8) and epigenetic regulators (9,10). Alternative lineage roadmaps downstream of MPP cells have been proposed and investigated (11–13).
Heterogeneity of long-term repopulating stem cells is well recognized (12,14,15), including a distinction between replicating and quiescent stem cells (16–19). Changes in stem cells also occur in aging individuals (20) with an accumulation of cycling HSCs in the marrow (21,22) that is strain specific in mice (23). The accumulation of cycling HSCs refers a declining repopulating ability of stem cells from aged animals (24–27) that may be related to increased cycling of the cells (22), epigenetic changes (20,28), a shift towards oligo clonal hematopoiesis (29) and a shift from lymphoid towards myeloid bias (30,31) perhaps due to a decreased ability to lymphoid cells (32) although a recent report has reached a different conclusion about the myeloid/lymphoid ratio of production rates (33). Some HSCs can repopulate specific lineages in the marrow and give rise to long-term repopulating and transplantable progeny that often retain the same lineage bias. These include cells with a bias towards megakaryocytic repopulation (34), sometimes along with erythroid or granulocytic/monocytic lineage production (35). It has also been proposed that platelet-biased stem cells lie at the apex of the hierarchy (34) or that c-kitLo HSCs give rise to c-kitHi HSCs with increasing megakaryocytic lineage bias but decreasing self-renewal capability (36).
To get further understanding of the relation among the various types of early precursors, it is necessary to move beyond the type of limited resolution obtained with cell surface markers (37). Recent technical developments in microfluidics have made it possible to obtain whole transcriptome profile with multiplexed procedures (38), and these methods have been used to study the global transcriptome of single hematopoietic precursor cells (39–43).
In the present study, we performed single cell RNA-seq on HSC and the closely related MPP1 cells from normal mice as well as mice that underwent acute blood loss. Our analyses demonstrated well separated clusters of related cells in standard preparations of HSC and MPP1 cells, and heterogeneity in patterns of gene expression within each cluster. Stimulation of erythropoiesis by acute blood loss caused most HSC cells to shift into the cluster of cycling cells without loss of cell surface markers defining HSC. The HSC and MPP1 cells from anemic mice also increased expression of certain inhibitory transcription factors. We found correlation between the relative expressions of groups of genes specific for certain lineages with stages of the cell cycle. We confirmed stochastic activation of many lineage related transcription factors in HSC and MPP1 but found that a very few key lineage-related factors, perhaps one or two per developmental transition, are essentially silent in these cells, and we speculate that overcoming this inactivity represents a regulatory step in lineage development and maturation.
Bone Marrow cells were isolated from 6 to 8 weeks C57BL6 mice (mixed 50% males and 50% females). All mice used in this study were purchased from Charles River and housed at the Yale Animal Resources Center (YARC). Bone marrow cells were flushed from femurs by using Dulbecco's Phosphate Buffered Saline solution (DPBS, 1X, GIBCO) without Calcium and Magnesium, supplemented with 5% fetal bovine serum (GIBCO) (FACS buffer). The harvested cells were gently dispersed by passage through a 21G1 needle attached to a 5 ml syringe. Cells were spun down at 1200 rpm for 10 min, resuspended in 15 ml and filtered through a 40 μm cell strainer (BD Falcon).
After fresh bone marrow cells were isolated, the apoptotic cells were excluded using Annexin V Apoptosis Detection Kit eFluor® 450, and the Lineage and Flk2 negative cells were obtained with Lineage-biotin cocktail and Flk2-biotin conjured with PEcy7-streptavidin. Then the APC Cy7-ckit and PerCP Cy5.5-Sca-1 double positive cells were gated, and further the FITC-CD48 negative and APC-CD150 positive cells were used to sort HSC and MPP1 combined pure live cell populations for further analysis.
EML cells were obtained and cultured as described previously (44), and cultures in midlog phase were fractionated by preparative FACS using antibodies for lineage markers, CD34 and Sca-1. Human CD34 positive cells were obtained, expanded, and differentiated by the protocols previously described (45). Cells were harvested for RNA-seq at 0 time, 2 and h hours, and 2, 3, 4, 6, 8 and 12 days after initiation of erythropoietin treatment.
Cells were stained with CD62L and CD97 antibodies (Supplementary Table S11) according to the manufacturer's description. Eighteen different populations of mouse BM cells were sorted by FACS AriaII (BD Biosciences) from the Yale cell sorter Core Facility (Supplementary Figure S2). When sorting HSC, progenitors and early stage BM cells, we added Biotin Lineage depletion cocktail (anti-Mac1, anti-CD3ε, anti-B220, anti-Gr-1, anti-CD11b and anti-Ter119) and streptavidin Particles Plus-DM (BD IMag™, mouse hematopoietic progenitor (stem) cell enrichment set). Lineage positive cells were depleted by transferring cells to a magnetic stand. Enriched lineage negative cells were further stained with antibodies listed in Supplementary Table S11. We note that the gate for selecting CD48- cells may have been set slightly higher than that in the literature and in four of seven preparative FACS experiments used for single cell analysis a few outlier cells (average 2.5-3.5% of the CD150+CD48-LSK cells) were excluded. However, our gate was chosen to surround the cluster of similarly stained CD48- cells. The small differences that adding the outlier cells could have made would not have affected our conclusions.
After sorting the HSC and MPP1 population, we measured the size of the cells with a Countess Automated Cell Counter (Invitrogen). After cell counting we loaded cells at a concentration of 2.5 × 105/ml on a C1 Fluidigm 5–10 um RNA-seq chip immediately, and the number of captured cells at each capture site was recorded under a microscope. Harvested amplified cDNAs from a 5 to 10 um C1 Fluidigm chip were transferred to a 96-well PCR plate (Denville, ThermoGrid PCR Plates 96-well standard, Cat# C18096-10). The PCR plate was sealed with Microseal ‘B’ seal Seals (Bio-rad Cat# MSB1001). The concentration of amplified cDNA was measured by using the Quant-iT™ PicoGreen® dsDNA Assay Kit (Invitrogen Cat# P11496). The libraries were generated by using a Nextera XT DNA sample preparation kit (96 samples) (Illumina, Catalog# FC-131-1096) and Truseq Dual Index sequencing Primer (paired-end) kit (Illumina, Catalog# PE-121-1003), with 96 barcoded samples combined. The libraries were pooled and purified using AMPure XP beads (Beckman Coulter, Inc, item # A63880), with QC evaluated with a bioanalyzer. The combined libraries were sequenced in one lane per plate on the Illumina HiSeq2500 instrument with 101-bp paired end sequencing. On average 1.4 million reads per cell were obtained, sufficient to detect most of the genes in the library (47). A total of eight samples of normal murine precursors, five HSC and three MPP1, were analyzed as were three samples of human CD34+ cells and two samples of EML cells.
RNA-seq was performed according to Illumina protocols on the illumine 2500 instrument. For bulk RNA analysis at least ten million reads were obtained per sample. RNA-seq reads were mapped to the mouse genome (mm9 from the UCSC database) by Tophat (v1.3.3) with ‘–r 100’ option (48). Gene expression values were then calculated by Cufflinks (v1.2.1), using RefSeq genes as reference annotation with ‘-G’ option (49).
Differentially expressed genes for each subset were identified (Supplementary Table S3). Principle component analysis (PCA) was performed for genes with an average expression level more than 1 RPKM and gene expression level was transformed to ln(e) (RPKM) value after adding a pseudo count of 1. Principle components (PCs) 1–3 were used to separate cell population in each lineage (Supplementary Figure S3A). PC11 was used to detect DEGs between HSC and MPP1. PCs 8, 9 and 13 were used to identify genes, preferentially expressed in MPP2, 3 and 4, respectively (Supplementary Figure S3B). In this study, 0.4 factor loading (FL) was used as a cutoff of DEGs;for example, genes with FL in PC11 < –0.4 and > 0.4 were selected as HSC and MPP1-specific genes. Enrichment of these DEGs in cell populations sorted by (50) was evaluated by Gene Set Enrichment Analysis (GSEA) software (v2.0.14) with 1,000 permutations of the gene set, weighted enrichment statistic and signal2noise metric without collapse of the dataset (51).
We performed hierarchical clustering with the dist and hclust function in R. Network analysis of 18 cell populations were constructed from a Euclidian distance matrix D of all principle component scores. The distance was then converted into a similarity score matrix S as, . Two cell populations with S>0.85 were then connected to each other (Supplementary Figure S3H).
Single-cell RNA-seq reads were processed by Tophat and Cufflinks as described above. Single-cell data with mapping ratio >60% was retained for subsequent analyses. The data derived from wells with zero cells or more than one cell was removed (Supplementary Table S4).
Single-cell RNA-seq data was visualized by t-Distributed stochastic neighbor embedding (tSNE) using the RtSNE package (52). This program attempts to minimize the additional information necessary to reconstruct the multidimensional relationship among samples as compared to the information contained in the two dimensional map, and considers each data point as a sampling from a normally distributed variable. Log2(RPKM+1) values were used for tSNE with options ‘initial_dims = 10, perplexity = 10, theta = 0.5, check_duplicates = T’ except as otherwise noted. The tSNE output was used for hierarchical clustering using the hclust function with ‘method = ward.D2’. Smoothed curves were generated by the loess method in R, using default parameters.
DEGs between active and quiescent clusters were identified using two-fold cutoff and P < 0.05 by a one-tailed t test. Gene Ontology (GO) analysis was performed using GO stats in Bioconductor. Multiple hypothesis testing was adjusted using the Benjamini–Hochberg method.
Gene sets of transcriptional modules for hematopoiesis were downloaded from the Stem Site website at the George Daley lab (http://daleystem.hms.harvard.edu/) (53). The cell cycle stage was defined by expression pattern of 30 stage-specific genes, which were shown in (54) and also by expression of manually selected sets of genes known to function in DNA synthesis or in generation of the mitotic spindle, as described in the main text. ‘Lymphocyte’, ‘Granulocyte’, ‘Megakaryocyte’ and ‘Erythrocyte’-specific genes were selected by 4-fold higher average expression in ‘CLP and CLP_Sca-1hi’, ‘CMP, GMP-7 and GMP-11’, ‘Mature_Meg’ and ‘preCFU-E, CFU-E and ProEry’ than the other populations, respectively (Supplementary Table S5). Analysis of myeloid and erythroid gene expression was also performed with gene sets derived from our bulk sequencing of CMP, MEP and GMP as described in the main text. ‘Quiescence’ genes were obtained from (18). Cell cycle, DNA replication, DNA repair and apoptosis-related gene sets were obtained from MSigDB (software.broadinstitute.org). For identification of major changes in gene expression we generally limited the analysis to genes that were expressed, on a log scale, at an average of at least 1 per cell for a relevant subset of cells, and also showed at least two-fold changes in expression between subsets (http://www.broadinstitute.org/gsea/msigdb/index.jsp). GSEA of lineage-specific and quiescence genes was performed in each cluster with comparison to the rest of the clusters. Statistical significance of GSEA was set as FDR <0.1, which is more stringent than default setting of GSEA software. The list of gene sets used for GSEA is presented in Supplementary Table S5.
The tSNE analysis was also performed on HSC and MPP1 from anemic mice combined with those from normal mice. For the classification of HSC and MPP1 from anemic mice, we calculated Euclidean distances from cells into each subgroup (‘G2/M’, ‘G1/S’, ‘early G1’, ‘quiescent’ and ‘unidentified’). Then, we categorized HSC and MPP1 cells from bleeding mice into the subgroup showing the shortest average distance.
A total of 28 C57/Blk6 wild type mice ranged 6–8 weeks old were used, including nine males and nine females for bleedings, and five male and five female controls. All procedures were performed in compliance with relevant laws and institutional guidelines and were approved by the Yale University Institutional Animal Care and Use Committee. C57BL/6J mice were obtained from Jackson laboratory (Bar Harbor, ME, USA). Mice were kept in microisolator cages, given autoclaved food and water, and handled only with gloves in a biological safety cabinet.
To generate bled, anemic mice, the mice were injected with 2 ml of pre-warmed normal saline IP 2 days before the start of bleeding. At day 2, 500 ul of blood was collected retro-orbitally from ‘bled mice’ to induce anemia. Saline injection was performed on both bled and ‘control mice’. The bleeding and saline injections were repeated on days 5, 8 and 11. On day 12, the mice were analyzed.
Complete blood counts were performed on a Hemavet (Drew Scientific, Waterbury, CT, USA) according to manufacturer's instructions. Peripheral blood was stained with antibody against Ter-119 PE (eBioscience, San Diego, CA, USA) followed by staining with thiazole orange (Sigma-Aldrich, St. Louis, MO, USA) in PBS and assessed by flow cytometry.
We examined subpopulations (Supplementary Table S2, Supplementary Figure S1) of the hematopoietic cells from mouse bone marrow by separating the cells with preparative FACS (Supplementary Figure S2), and analyzed transcriptional profiles from FACS samples with a thousand cells of each of 18 separate cell fractions (Supplementary Table S3) (1,55). The subpopulations of cells were distinguished by PCA (Supplementary Figure S3A and B). Around 250–300 genes were identified as differentially expressed genes (DEGs) in each population (Supplementary Table S3, Supplementary Figure S3C and D). The additional cell surface markers CD62L (56) and CD97 (57) were negative in HSCs, but a part of the MPP1 (25% and 31% respectively) expressed both proteins on their cell surface (Supplementary Figure S3E and F). FACS analysis of forward scatter and Sca-1 expression also showed somewhat more heterogeneity in MPP1 than in HSC (Supplementary Figure S3G). The sub-populations of cells were clustered in two (Supplementary Figure S3H) or three (Figure (Figure1)1) dimensions based on Euclidean distance. MPP1 was located closer to the HSCs and separated from the other MPP fractions (Figure (Figure1),1), consistent with an early observation (50).
To better understand the heterogeneity of the earliest precursors we sequenced single cells of biologic replicate samples of HSC and MPP1 using the Fluidigm C1 system and Smart-seq method (Clontech). Single-cell RNA-seq libraries with >60% of the products mapping to murine cDNA were highly reproducible (Pearson correlation coefficient > 0.9) (Supplementary Figure S4A). An average of 1.4 million and 1.7 million reads per cell, corresponding respectively to 112 HSC and 109 MPP1 single cells of normal mice met our criteria and were used for subsequent analysis (Supplementary Table S4).
We used the t-Distributed Stochastic Neighbor Embedding (tSNE) program (52) for display of the results of single-cell RNA sequence data (Figure (Figure2A).2A). The tSNE method gives sensitive separation of closely related groups of objects, although there is variation from run to run in the relative position of remote groups. Unlike principal component analysis (PCA), this method gave a clear separation of pooled HSC and MPP1 single cells into three subsets. This pattern was reproduced in biologic replicates or when the parameters of tSNE were varied (perplexity from 10 to 30 and use of 7 to 15 principal components) (Supplementary Figures S5A and S6A). The majority of HSC cells (55%) and a small fraction of MPP1 cells (7%) were classified into the first cluster (blue ellipse in Figure Figure2A),2A), which we refer to as the quiescent cluster. Gene signatures significantly enriched in this cluster (Figure (Figure2B,2B, ,CC and Supplementary Table S6) included: (i) quiescent HSC genes (18) (FDR = 0.001); (ii) previously-defined transcriptional modules for definitive HSCs (FDR < 0.058) (53); and (iii) Immunity related genes, external stimuli, interferon response, and cellular defense responses genes (FDR = 0.002) (Figure (Figure2D,2D, Supplementary Figure S5B and C, Supplementary Table S5). The expression level of the immunity related genes was significantly correlated with that of quiescence-related genes (Pearson correlation = 0.873, P < 2.2e-16), and varied markedly between cells within the quiescent cluster (Figure (Figure2D2D and F, Supplementary Figures S5C and S6C), with a small fraction of the cells expressing high levels of these products (Supplementary Figure S5B). The quiescence related genes were also expressed at considerably different levels in different cells of the quiescent cluster.
Most of the MPP1 cells (74.5%) and a small fraction of HSC cells (21.5%) (Figure (Figure2A2A and B) fell into the second cluster (red ellipse in Figure Figure2A),2A), which we refer to as the active cluster. Comparison between the cells in active and quiescent clusters showed extensive up and down regulation of genes (Supplementary Table S6). The active group displayed depletion of quiescence-related genes (FDR = 0.001) and enrichment of cell cycle and DNA replication-related genes (FDR = 0.004 and 0.002, respectively), indicating active cell cycling (Figures (Figures2D,2D, ,5B,5B, and Supplementary Figure S4D and E), as well as increased expression of erythroid, megakaryocytic and granulocyte/macrophage related genes relative to the quiescent cluster (Figure (Figure2E).2E). Many of the top genes preferentially expressed in the active cluster were related to later phases of the cell cycle including spindle formation, centromere duplication, and DNA replication and repair (Figure (Figure2D,2D, Supplementary Figure S4F).
The third cluster of cells, the U-cluster (uncharacterized), in the black ellipse (Fig, (Fig,2A),2A), was composed of 32 (24%) HSC and 18 (16%) MPP1 cells. This cluster was characterized by lower RNA reads and fewer genes detected per cell (1.3 million reads and 1,045 genes on average) than the other two clusters (2.8 million reads and 3216 genes on average) (Figure (Figure2G),2G), although spike-in RNAs for the most part gave higher reads in cells from the U cluster (Supplementary Figure S7D). The pattern of forward scattering on FACS analysis showed that almost all HSC cells fell into a rather tight cluster, arguing against the possibility that the reduced RNA in the third cluster was due to the presence of a group of cells that were substantially smaller or more shrunken than the remainder. When more cells were analyzed, the U-cluster appeared more compacted (see Figure Figure5,5, Supplementary Figures S6A–D, S7F and S10).
The distribution of Sca1 mRNA and surface protein also indicated heterogeneity in the HSC and MPP1 cells. Wilson et al. (40) compared four methods of preparing HSC and looked for expression patterns common to cells obtained by each method. These authors concluded that high Sca-1 (Ly-6A/E) was an effective marker for identifying true long term repopulating cells. We confirmed that there was heterogeneity in the Sca-1 surface marker and mRNA levels in these cells, and found that most of the cells with high levels of Sca-1 mRNA are in the quiescent cluster (Supplementary Figure S10E). However in anemic mice (Supplementary Figure S3G), the levels of Sca-1 are increased even though the precursor cells from such mice have been found to be defective in long term repopulation, so that Sca-1 is a correlated but not a determining factor for long term repopulating cells. Of the 13 genes (B2m, Ndufa3, H2-T9, Ifitm2, Sepw1, Tbxas1, Rps27, Gm12657, Txnip, Ifitm1, Tmem176b, Ifitm3, Shisa5) whose expression correlated most closely with that of Sca-1, three are transmembrane proteins known to be induced by gamma interferon (Ifitm1, 2, and 3), and another of the 13 genes, Shisa5, has also been associated with apoptosis induction (58,59).
To further compare the relative expression of genes, we multiplied cDNA RPKM levels in each cell by a factor such that the total level of counts per cell mapped to known cDNAs was similar in each cluster. After this adjustment the separation of cells into three clusters was still retained and the U cluster cells were enriched for mRNA of genes identified as G0 genes in fibroblasts (60).
We performed a third batch of scRNA-seq in HSCs (replicate 3) with an RNA spike-in (Supplementary Figure S7A, D, G). We filtered out two single-cell libraries flawed by lower efficiency of PCR amplification (Supplementary Figure S7B), and 12 libraries with lower mapping rates (Supplementary Figure S7C). The tSNE analysis showed a similar distribution of three groups to that seen with other replicates of HSC scRNA-seq. The read count for mitochondrial sequences, proposed as an indicator of damaged cells (61), was generally below 12%, although the U-cluster included about 45% cells with a higher number of reads derived from the mitochondria genome (Supplementary Figure S7E). Removal of the cells with over 12% mitochondrial sequences did not alter the pattern of the tSNE display including the detection of three major clusters of cells (Supplementary Figure S7F). The ratio of reads of the RNA spikes to reads of the RNA of the expressed genes was higher in the cells with lower numbers of gene reads (Supplementary Figure S7C), indicating that the cells with low gene reads were a result of lower levels of mRNA per cell rather than a failure of the reverse transcription or downstream reactions.
We also performed FACS in which we omitted CD34 staining, and combined MPP1 and HSC cells, to allow careful exclusion of all cells with surface annexin-V staining. The scRNA-seq data from the annexin-V negative population (live cells) was analyzed by tSNE and showed a pattern resembling that from the preceding experiments except that there was reduction in the number of cells in the U-cluster from about 20–24% to about 12% (Supplementary Figure S8A–D).
We mapped the expression pattern of each cell to sets of genes differentially expressed in the G1/S, or G2/M phase of the cell cycle as well as small sets of genes known to have S and G2 phase specific expression (54,62) (Supplementary Table S5).Based on the expression of stage-specific genes, the active cluster was classified into three states: G2/M, G1/S and early G1 (Figure (Figure3A).3A). In the tSNE patterns, cells in the G2/M state lie at the tip of the branch of the graph containing active cells (Figure (Figure3A).3A). Expression of quiescence-related genes was significantly lower in the G2/M state (Figures (Figures3B3B and 4A). Because not all the genes in the lists in the literature could clearly be related to processes specific to S, G2 or M phases of the cell cycle, we refined the set of G2/M specific genes by selecting 41 genes that corresponded to centromere mitotic, tubulin functions or spindle check point genes (Supplementary Table S5 Refined G2/M). In a repeat tSNE analysis (Supplementary Figure S6A), expression of these genes was highly augmented in 10–15 cells out of the 80 cells in the active cluster (Supplementary Figure S6B). A second subset of cells displayed higher expression level of G1/S and S phase-specific genes but not G2/M and M phase genes, and was located proximal to these G2/M cells in the tSNE map. We separately analyzed the relative expression of each of a number of genes related to DNA synthesis or to the immune response, including Pcna (Supplementary Figure S6D), Ifitm (Supplementary Figure S6C), and MCM genes, subunits of DNA polymerase E, ribonucleotide reductase, and other genes signaling the onset of the S phase. The highest expressing cells for each DNA synthesis gene mapped in the region enriched for the pool of G1/S and S phase genes. However the cells with the highest levels of the individual genes were partially different from one another, and deeper sequencing might detect the sequence of events during the onset of DNA synthesis (Supplementary Figure S5A in green cluster of the red circle). M/G1-specific genes were found in two groups, one near the G1/S genes and the other close to the quiescent cells (Supplementary Figure S5A, the black cluster).We classified the second subset of cells as being in an early G1 state. Although occasionally quiescent cells expressed G1/S and S phase genes, none expressed elevated levels of G2/M genes (Figure (Figure5B).5B). In the anemic mice the HSC (CD34-) were below the S/G2/M region, while the MPP1 (CD34+) fell within that region (Figure (Figure5C).5C). This suggests that on passage through the S/G2/M phases of the cell cycle, cells acquire expression of surface CD34.
We selected signatures of groups of genes that were specifically expressed in either the erythroid, the megakaryocytic, the lymphocytic or the granulocytic lineages from our RNA-seq data (Supplementary Tables S3 and S5). For granulocyte/ monocyte and for lymphoid cells we selected genes expressed at 4-fold higher levels in the appropriate precursors than in the average of precursors for other lineages. For the erythroid and megakaryocytic clusters we selected genes that were expressed 4-fold higher in erythroid precursors than in megakaryocytes or other lineages, and vice versa. These gene sets were projected across the tSNE groups (Supplementary Figure S10). The cells expressing higher levels of lymphoid related genes were separated from those expressing other lineages, and fell into the quiescent cluster (FDR = 0.012) (Figure (Figure2E).2E). In contrast, cells expressing higher levels of erythroid, granulocytic, and megakaryocytic genes tended to fall into the active cluster (Figure (Figure2E,2E, FDR = 0.001, 0.09 and 0.054, respectively). Genes of multiple lineages were often expressed in the same cell, but the average relative levels of expression of the lineage-specific genes varied with the stage of the cell cycle (Figure (Figure3C3C).
We also plotted expression data in cells ordered according to the rank order of levels of expression of G2/M genes (Figure (Figure3D),3D), and compared the pattern with that of S phase and G2M phase genes (Figures (Figures3E3E and 5D, ,E).E). This showed that the cells with high expression of erythroid genes were in the G2/M phase. Megakaryocyte and granulocyte precursor genes showed similar levels of expression relative to erythroid genes in the early G1 state of the active cluster (FDR = 0.031 and 0.086, respectively) and lower relative expression in G1/S and G2/M state (FDR>0.198) (Figure (Figure3C)3C) although the megakaryocyte genes rose more than the granulocyte precursors genes in G2/M cells (Figure (Figure5D5D and E). The erythroid genes showed prominent up regulation in the G2/M (FDR = 0.003) and some up-regulation in G1/S state cells (FDR = 0.083) (Figure (Figure3C,3C, ,DD and E). A composite PCA analysis (Supplementary Figure S5E) showed that, relative to megakaryocyte genes, granulocyte/macrophage genes were more highly expressed in many MPP1 cells than in HSC, demonstrating further levels of heterogeneity in these precursor cells. Single cell RNA-seq of total human CD34+ cells expanded for 6 days without erythropoietin (50) also showed a striking rise of erythroid genes in cells in the S/G2/M phase (Supplementary Figure S5D), so that the elevation of erythroid mRNA in G2/M phase early precursors is not species-specific.
To confirm the results we performed a separate analysis after preparing sufficient numbers of CMP, MEP, and GMP cells to be able to doRNA-seq without amplification. From this data, we then selected genes that were more than two-fold over expressed in either MEP or GMP relative to the average level of expression for the three cell types and were also more than four-fold over expressed in MEP versus GMP or vice versa. Again the pattern for the level of expression of granulocyte/macrophage genes was different from that for the erythroid genes, with a marked increase in erythroid gene expression in cells with the highest levels of G2/M genes (Supplementary Figure S10C).
To confirm that particular clusters represent quiescent cells that can become activated, we performed single-cell RNA-seq on HSC (bHSC) and MPP1 (bMPP1) from mice whose erythropoiesis had been stimulated by bleeding (Figure (Figure4,4, Supplementary Figure S9). The tSNE analysis with 99 single cells with acceptable sequencing quality revealed that a higher fraction of HSC and MPP1 cells were in active states in acute blood loss mice than in control mice (Figure (Figure4A).4A). Approximately 80% of HSC were in either the U or quiescent clusters in normal mice, but more than 70% of the HSC fell into the early G1 and G1/S stages of active cells in the anemic mice (Figure (Figure4B,4B, Supplementary Figure S10B), i.e. only 20% of the cells were in the Uor quiescent state. In contrast, the proportion of MPP1 in the active state was not different between control and anemic mice (74.5% and 75.5%, respectively) although the MPP1 cells in the active cluster from anemic mice were shifted towards the S and G2/M enriched regions of the active cluster. This data indicates that the stress of anemia caused most of the quiescent cells to shift into the active state, and promoted progression through the cell cycle within the MPP1 population (Supplementary Figure S10A and B). There was a reduction of cells in the U cluster from 24% to 8%, P = 0.0009 (one-sided t-test), with Malat1 remaining higher per cell in the U cluster (Supplementary Figure S10D), and Sca1 was high in the active cluster HSC (Supplementary Figure S10E). HSC increased gene expression for all lineages in the anemic mice (P < 1.42e–5 by the t test) but this was due to the decreased fraction of inactive cells (Figure (Figure5C)5C) rather than increase per cell among the more active HSC. HSC showed a higher level of Sca-1 expression than in control mice (Supplementary Figure S3G) as well as expression of relatively high levels of a number of immune response genes (Supplementary Tables S7–S9). MPP1 did not show any significant difference in lymphoid genes, but significantly induced granulocytic, megakaryocytic and erythroid genes (P < 0.03 by the t-test). The effects of stage of the cell cycle on erythroid gene expression was similar in the anemic and control mice (Figure (Figure5D,5D, ,EE and Supplementary Figure S10C). Overall the effect of bleeding was depletion of cells from the U and quiescent clusters, and augmentation of the fraction of cells in the cycling cluster.
Among the most significantly induced genes in bHSC as compared to resting HSC and MPP1 (Supplementary Tables S7 and S8) were a series of transcription factors (7 out of the 10 most strongly induced genes) that were generally not specific to any one lineage. Of these all except Id1 and Hes1 were significantly increased in normal MPP1 compared with normal HSC, so that activation of the factors other than Hes1 and Id1 may reflect shifting of HSC into the active cluster. A small group of genes including Hes1, Id1 and Lst1 were more highly expressed in both MPP1 and HSC from anemic mice than in either cell type in control mice and these are briefly discussed below.
Consistent with earlier reports (63), also in the discussion in (64), single HSC or MPP1 cells expressed, in an apparently stochastic manner, a range of genes characteristic of various differentiated hematopoietic lineages. However this promiscuity was selective. We compiled lists of transcription factors characteristic of the precursors of lymphocytes, megakaryocytes, erythroid cells, or granulocytes/macrophages and added to these lists a number of transcription factors that function in specific lineages of hematopoietic cells (Supplementary Table S10). Most genes in each list were expressed in over 5% of the cells, often over 10%. However a small set of transcription factors were expressed either not at all, or only in a very few cells and at low levels (Figure (Figure6).6). Among the silent factors were several that are known to be critical and highly expressed during specific lineage development: Cebpa (Figure (Figure6B)6B) for granulcoyte/macrophage and, at higher levels, specifically granulocyte (65); Klf1 (Figure (Figure6A)6A) for erythropoiesis, at late stages of maturation of a lineage(Cebpe for granulocytes (66) (Figure (Figure6B),6B), or to specify sub-lineages from a differentiated precursor (for example, Foxp3 for Treg cells (67). Several of the 55 lymphocyte specific transcription factors were not detected (Figure (Figure6C).6C). This may reflect the multiple branches of lymphocyte differentiation. All megakaryocyte transcription factors were expressed in HSC and MPP1 (Supplementary Figure S6D), so that, in this sense, as has recently been discussed (68), megakaryocyte development is a default process.
Klf1 is a well-studied erythroid transcription factor that is highly expressed during the course of erythroid cell maturation (69) and coordinates many aspects of erythropoiesis (70,71). To confirm the particular features of expression of this gene, we also examined Klf1 expression during erythropoietic development in the murine multipotent hematopoietic cell line EML (72), and in human peripheral blood mobilized CD34+ precursor cells.
Treatment of human peripheral blood mobilized CD34+ precursor cells with erythropoietin causes erythroid differentiation and maturation (Supplementary Figures S11C and S12A, B, C). Among the transcription factors strongly expressed in maturing erythroid human CD34+ cells, KLF1 showed by far the greatest amplitude of induction (Supplementary Figure S11C). During erythroid development from human CD34+ cells, KLF1 induction preceded or paralleled the induction of other most strongly induced genes (Supplementary Figure S12A). In the first few days after stimulation of the human cells with erythropoietin, a set of megakaryocyte specific genes including Pf4, Ppbp, Itga2b, Gpiba and Vwf were activated, but these were progressively silenced when Klf1 mRNA approached its peak levels (Supplementary Figure S12B).
EML suspension cultures contain CD34+, Sca-1+, Lin− precursors, which spontaneously differentiate into intermediate erythroid precursors (44). Klf1 was essentially silent in the FACS sorted EML precursor cells, unlike most other erythroid transcription factors (Supplementary Figure S11B), but was the most strongly induced transcription factor in EML erythroid cells.
Knockout of Klf1 in mice produces early embryonic lethality due to a failure to form properly hemoglobinized red cells, and Klf1 deficient cells also fail to normally express a variety of other erythroid genes (73). Nevertheless erythroid progenitors are formed in Klf1 knockout embryos (74) so that initial steps of erythropoiesis are not exclusively dependent on Klf1.Mutations in KLF1 are associated with a range of red blood cell disorders (75). The results summarized above suggest that full Klf1 activation contributes to or controls a stage transition during erythroid differentiation. The sequences in the first 900 bp upstream of the transcription start site direct cell type and state specific Klf1 transcription (76,77). Histone 3 lysine 27 acetylation is a mark for active promoters and enhancers, and is very much reduced in the 900 bp upstream of the Klf1 gene in EML precursors compared to the spontaneously emerging erythroid cells (Supplementary Figure S12D), so that the regulation of Klf1 mRNA occurs at the transcriptional level.
The mechanisms differentially regulating Klf1 expression are of special interest. The ratio of Gata2 to Gata1 declines during erythroid differentiation (78–80), and may regulate the levels of Klf1 expression. This is consistent with the results from human CD34+ cells (Supplementary Figure S12C). However, this ratio changed only 3.4 fold during differentiation of the EML cells, based on bulk RNA-seq (in the same experiment as Supplementary Figure S11). In addition to Gata1 and Gata2, other factors including Smad and Dek oncoprotein (77) also regulate the transcription of Klf1, but the levels of DEK and SMAD proteins during erythroid development from human CD34+ cells did not increase with the levels of Klf1 (Supplementary Figure S12C).
Another factor, Bcl11a, in EML cells, unlike in mouse HSC and MPP1, was also silenced similarly to Klf1 (Supplementary Figure S11B). Bcl11a is a major player in regulating the switch from fetal to adult hemoglobin in humans (81) and its transcription is mediated in part by Klf1 (82). Therefore, Bcl11a is another example of a selectively silenced regulatory gene at a developmental transition state. A recent report confirmed that both Klf1 and Bcl11A are needed for substantial expression of fetal and adult hemoglobin in human fetal erythroblasts (71).
Overall, there are transcription factors acting at transition steps during hematopoietic development that are under more stringent or specific control in HSC/MPP1 than most lineage-specific transcription factors and are essentially silent in early precursors. This may offer an empirical test for identifying a subset of transcription factors or other genes that are uniquely required for particular developmental transitions in other systems.
Defining the differences in composition between closely related HSC and MPP1 populations that cause their differences in long term repopulation of marrow is an approach towards understanding the basis for this fundamental property. Surface CD34 in mice has been associated with lack of long term repopulating activity (32,83). In the present studies, cells that were at the S/G2/M phases of the cycle were CD34 positive. Because CD34+ cells are less efficient at long term repopulation this is consistent with the suggestion that cells that are actively replicating lack long term repopulating activity in standard assays.
Our bulk RNA-seq data are in agreement with earlier results (18,19,50,84–87), and showed that the lineage-related precursors fell into different volumes of the first three dimensions of PCA space, and that the MPP population was heterogeneous (88). The number of genes detected in a single cell by current methods may vary over as much as an order of magnitude between vigorously growing or large cells and relatively inactive small cells. Cutoff levels of 2300–4000 genes per cell have been used by others, with partly different goals. However, resting T cells show an average of <1300 genes per cell in Fluidigm C1 analyses. There could be a risk of biasing the result against some interesting cells if we had used a high cutoff for the genes identified per cell. We therefore chose to accept all capture sites that contained a single cell with over 60% mappable reads and >800 genes expressed/cell. 80% of the empty wells showed <700 genes, with an average of 180 genes per empty well. The remaining empty wells gave results resembling wells with a single cell and may have either trapped single cells that were missed by visual inspection or trapped cytoplasmic fragments of cells. We emphasize that it is unknown whether the cells in the U-cluster are a technical artifact or whether they have some biologic significance although we note that other groups have identified a biologic function for cells with low read counts in other systems (90,91).
The tSNE analysis displayed heterogeneity within HSC and MPP1. The major component of HSC is clearly separated from the MPP1 cells and displays gene signatures of quiescence and elevation of mRNAs for Sca-1 and a subset of interferon responsive genes. The main cluster of MPP1 is characterized by higher expression of cell cycle and DNA replication-related genes.
The subsets of HSC and MPP1 were defined by the destructive procedure of transcriptome analysis, limiting our ability to apply concurrent functional analyses to these cells. Further understanding of these clusters would be greatly expedited by development of a method for selectively isolating viable subsets of cells in a non-destructive protocol. Assignment of other molecular characteristics such as epigenomic changes to subgroups of HSC would require the application of demanding and low throughput techniques such as simultaneous analysis of the RNA and DNA of the same cell (89–93).
Lineage choice in precursor cells may require cells to go through a replicative cycle (94–98). Recent studies have suggested that the lineage biases in differentiation of hematopoietic (99) or embryonic stem cells (100,101) may vary with progression of the cell cycle. In particular, studies of colony formation from HSC expanded in a cytokine mixture in vitro indicated that megakaryocytes may preferentially be produced from cells early in the S phase at about the same time as proliferative granulocytes, while non-proliferative granulocytes are produced by cells in mid-S phase (102). Socolovsky et al. (94) have found that a key commitment step for maturation of erythropoiesis is linked to progression through the cell cycle. The present results show that there was an erythroid specific augmentation of gene expression that occurs in cells as they passed through the G2/M phase of the cycle, and this could explain the effect of cell cycling on differentiation. There are a number of possible mechanisms for the G2/M phase relative elevation of erythroid related transcripts in G2/M including the speculation that this may be related to the observation that Gata1 remains associated with chromatin throughout mitosis (103) and therefore may not be removed from chromatin during S phase.
This data suggests a model in which lineage choice is influenced in part by the stage of the cell cycle at which the cell receives signals for lineage activation, perhaps different from signals needed for self-renewal or survival (104). If lineage selection is based on stochastic changes in the balance of regulatory factors, then differences in the relative amounts of erythroid and myeloid factors during the cell cycle might bias stochastic activation of one or the other lineage. Coupling of lineage bias with stages of the cell cycle could play a role in maintaining the balance between productions of different types of cells. Alternatively, the high ratio of erythroid to granulocytic gene expression in the G2/M cells could reflect the relative rates of production needed for the different lineages, if differentiation occurred in the early precursors during G2/M.
Induction of anemia caused HSC to shift into the active cluster together with MPP1, and increased the fraction of MPP1 cells in cycle. However, the transcriptome of activated cells did not show any bias towards erythroid or megakaryocytic development beyond that seen in normal cells at the same stage of the cell cycle although the production of downstream progeny would be directed towards the erythroid lineage, so it appears that lineage specific amplification occurs downstream of these earliest precursor cells.
Id1 and Hes1 are two transcription factors whose mRNAs were elevated on average over 10-fold in HSC and MPP1 from anemic animals above the levels seen in either normal HSC or MPP1, although the induction is partly a consequence of more cells expressing these genes. Id1 is a transcription factor that antagonized the action of various basic HLH factors, inhibits differentiation in other systems (105), and functions to maintain stem cells (106–108). Hes1 is also a negative regulator of transcription that functions in the maintenance of cancer stem cells (109) and is an essential regulator of HSC development (110). Elevated Id1 and Hes1 in the HSC from anemic animals may play a role in retaining HSC or in promoting expansion of early hematopoietic precursors by limiting their rate of differentiation.
The earliest hematopoietic precursors show an apparently stochastic activation of transcription of a variety of genes characteristic of progeny cell lineages (63). However, a few transcription factors, perhaps as few as one per developmental transition, are silent or only very weakly expressed. Examining one of these factors, the erythroid factor Klf1, in more detail, we found that it is the transcription factor with the greatest amplitude of change in expression level in both mouse EML and human CD34+ cells modeling erythroid development. Klf1 is an example of a subset of transcription factors coupled to and regulating distinct transitions during development of various lineages. Such transitions represent discrete developmental steps subsequent to initial choice of lineage predilection. Understanding the basis for the failure to stochastically activate Klf1 in precursor cells might have more general significance for understanding the staging of differentiation events in other cell types.
In summary single cell RNA sequencing showed distinct clusters of cells in both the standard HSC and MPP1 population, including active cells and quiescent cells, and a new, undefined fraction of cells. HSC express a number of interferon response genes that may contribute to quiescence. However, when erythropoiesis was stimulated by induction of anemia, most HSC cells enter into active phases of the cell cycle without silencing of the interferon response genes, so that induction of cell cycling occurs despite continuing presence of mRNA for genes corresponding to interferon-like cell responses. More precise information would become available as it becomes feasible to directly study the protein content of the cells.
The relative levels of expression of mRNAs characteristic of early stages of differentiation of various lineages correlates with stage of the cell cycle in individual cells, possibly contributing to maintenance of balanced multipotency. Strikingly, erythroid lineage genes are relatively highly expressed in G2/M stages. Meanwhile, a small subset of lineage specific transcription factors, perhaps one factor per developmental step, such as Klf1 in erythroid lineage, are largely silent in the HSC and MPP1, and these may mediate limiting steps in the progression of differentiation along various lineages. Megakaryocyte differentiation is an exception, as the megakaryocyte related transcription factors we detected were all expressed in a number of HSC and MPP1 cells. Therefore, the oscillation or burst of gene expression in different single cells of early hematopoietic precursors, at least for some genes correlated to cell cycling, are functionally associated to lineage specific-expression. The stress of anemia causes HSC to shift to the active cluster and activates expression of several genes including Id1 and Hes1 in both HSC and MPP1, perhaps shifting the balance between precursor renewal and differentiation.
RNA-seq data sets supporting the results of this article are registered in the GEO database. The accession number is GSE68981.
We thank Dr. Nenad Sestan for support single cell RNA-seq, Candace Bichsel Tebbenkamp for assistance in single cell experiment arrangement, Lucia Ramirez, Nathaniel Watson and Nathan Hammond for sequencing, Zhao Zhao for FACS sorting. Computation time was provided by the Yale University Biomedical High Performance Computing Center.
Author contributions: S.M.W. and X.P, designed and coordinated the study. J.Y. and M.S. prepared the cells. S.H. and A.T. prepared anemia mice. J.Y. performed cell fractionation and FACS analyses, J.Y. performed bulk RNA-seq. J.Y. and Z.L. performed single cell RNA-seq. Y.T., S.M.W., and X.P analyzed RNA-seq data. S.M.W., X.P., Y.T., J.Y., L.W., I.H.P., S.H. wrote and/or edited the manuscript. G.E performed illumine high-throughput sequencing. M.P.S. supervised the sequencing. L.X.G., Y.K. and X.Z. performed additional clustering analysis.
Supplementary Data are available at NAR Online.
National Institutes of Health (NIH) [1P01GM099130-01 and R01DK100858]. Funding for open access charge: NIH [R01DK100858].
Conflict of interest statement. None declared.