|Home | About | Journals | Submit | Contact Us | Français|
Recent molecular studies have revealed that, even when derived from a seemingly homogenous population, individual cells can exhibit substantial differences in gene expression, protein levels, and phenotypic output1–5, with important functional consequences4,5. Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs1,2 or proteins5,6 simultaneously because genomic profiling methods3 could not be applied to single cells until very recently7–10. Here, we use single-cell RNA-Seq to investigate heterogeneity in the response of bone marrow derived dendritic cells (BMDCs) to lipopolysaccharide (LPS). We find extensive, and previously unobserved, bimodal variation in mRNA abundance and splicing patterns, which we validate by RNA-fluorescence in situ hybridization (RNA-FISH) for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, we identify a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, we show that variability in this module may be propagated through an interferon feedback circuit involving the transcriptional regulators Stat2 and Irf7. Our study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.
To characterize the extent of expression variability on a genomic scale and decipher its functional implications, we used single-cell RNA-Seq to profile a temporal snapshot of the BMDC response to LPS. This is an attractive model system for single-cell analyses for several reasons. First, LPS, a component of gram-negative bacteria and a ligand of Toll-like receptor 4, strongly synchronizes cellular responses and mitigates temporal phasing11. Second, LPS activation evokes a robust transcriptional program that has been extensively investigated at the population level12. Third, LPS stimulation should increase the correlation between mRNA and protein levels for induced genes, thus reducing a potentially confounding factor13. Lastly, differentiated BMDCs are post-mitotic, largely removing cell cycle-dependent transcriptional variation3.
We stimulated BMDCs with LPS and harvested single cells after four hours12 (Supplementary Information (SI)). Using SMART-Seq9, we constructed cDNA libraries from 18 single BMDCs (S1–S18), three replicate populations of 10,000 cells, and two negative controls (empty wells), and sequenced each to an average depth of 27-million read-pairs. Negative control libraries failed to align (<0.25%) to the mouse genome, and were discarded from all further analyses. Library quality metrics, such as genomic alignment rates, rRNA contamination, and 3′ or 5′ coverage bias, were similar across all libraries (Supplementary Table 1). We estimated expression levels for all UCSC-annotated genes using RSEM14 (Supplementary Table 2, SI) and discarded genes that were not appreciably expressed (transcripts per million (TPM) > 1) in at least three individual cells, retaining 6,313 genes for further analysis.
While the gene expression levels of population replicates were tightly correlated with one another (Pearson r > 0.98, log-scale, Fig. 1a), there were substantial differences in expression between individual cells (0.29 < r < 0.62, mean: 0.48, Fig. 1b, Supplementary Fig. 1). Despite this extensive cell-to-cell variation, expression levels for an “average” single cell correlated well with the population samples (0.79 < r < 0.81, Fig. 1c, Supplementary Fig. 1).
We used RNA-FISH, an amplification-free imaging technique2, to verify that heterogeneity in our single-cell expression data reflected true biological differences, rather than technical noise associated with the amplification of small amounts of cellular RNA. For 25 genes, selected to cover a wide range of expression levels, the variation in gene expression detected by RNA-FISH closely mirrored the heterogeneity observed in our sequencing data (Fig. 1d–g, Supplementary Fig. 2). For example, expression of housekeeping genes (e.g., Beta-Actin (Actb), Beta-2-microglobulin (B2m)) matched a log-normal distribution in both single-cell RNA-Seq and RNA-FISH measurements, consistent with previous studies1. In contrast, many genes involved in the LPS response, although highly expressed on average, showed significantly greater levels of heterogeneity, with expression levels deviating ~1,000 fold between individual cells in extreme cases (Fig. 1e–g).
More generally, we observed that single cell variability existed across a wide range of population expression levels (Fig. 2a). Of the 522 most highly expressed genes (single-cell average TPM > 250, Fig. 2a: unshaded region, Supplementary Table 3), 281 had low cell-to-cell variability (coefficient of variation (CV) < 0.25, SI) and were well described by log-normal distributions (RNA-Seq: Fig. 2b,c top, RNA-FISH (Actb, B2m): Supplementary Fig. 2). These 281 genes were enriched for housekeeping genes, encoding ribosomal and other structural proteins (Supplementary Table 2 & 3, Bonferroni-corrected p=1.5×10−6), consistent with previous findings in yeast15 and mammalian cells1.
Surprisingly, however, 185 of the remaining 241 (coefficient of variation (CV) > 0.25, SI) highly expressed genes had bimodal expression patterns (Fig. 2b,c bottom): mRNA levels for these genes were high in many of the cells, but were at least an order of magnitude lower (often very low or undetectable) than the single-cell average in three or more cells. We independently verified this disparity by RNA-FISH (e.g., Cxcl1, Cxcl10, Ifit1, and others: Fig. 1f,g & Supplementary Fig. 2), confirming that it was not a result of technical noise. This variable set included both antiviral and inflammatory response genes, and was highly enriched for genes whose expression was increased by at least two-fold upon LPS stimulation at the population level16 (p = 2.7×10−7; hypergeometric test; Supplementary Table 2). Still, bimodal expression was not a universal feature of immune response transcripts; some key chemokines and chemokine receptors (Ccl3, Ccl4, Ccrl2), cytokines (Cxcl2), and signaling molecules (Tank) were highly expressed in every cell (Supplementary Fig. 3), indicating that all cells were indeed activated by LPS.
This degree of variation in expression for highly expressed (on average) transcripts has not been observed in previous reports7–10. For example, examination of published single-cell RNA-Seq datasets of human embryonic stem cells9 (Fig. 2a), mouse embryonic stem cells, and terminally differentiated fibroblasts10 (Supplementary Fig. 4) revealed far less heterogeneity in expression for highly abundant (population average) genes. Similarly, studies of protein expression in mid-log yeast cells and dividing human cell lines15,17 did not find such bimodality in (on average) highly expressed genes. We thus hypothesized that widespread variability in single-cell gene expression may reflect functionally important differences in the stimulated BMDC population.
Furthermore, we found that splicing patterns also showed previously unobserved levels of heterogeneity across single cells. Specifically, for genes that have multiple splice isoforms at the population level, individual cells predominantly expressed one particular isoform. We calculated the frequency (percent spliced in, PSI) of previously annotated splicing events in each of our samples using MISO18, a Bayesian framework for calculating isoform ratios (Supplementary Table 4). Although the population-derived estimates were highly reproducible, single cells exhibited significant variability in their exon-inclusion frequencies (Fig. 3a,b).
We considered the possibility that PCR amplification (intrinsic to the library preparation process) could potentially produce an overestimation of isoform regulation variability, particularly for weakly expressed transcripts19. However, even when we limited our analysis to 89 alternatively spliced exons (0.2 < population PSI < 0.8) that were very highly expressed within a single cell (single cell TPM > 250, SI), we still observed the same variability in splicing patterns amongst individual cells, with highly skewed expression towards a single splice variant (Fig. 3b). We obtained similar results when we generated three additional single-cell cDNA libraries using a slightly modified SMART-Seq protocol (SI) in which a four nucleotide barcode was introduced onto each RNA molecule during reverse transcription19, enabling us to estimate the number of unique RNA transcripts that existed prior to PCR (Supplementary Fig. 5 & 6 and SI).
To the best of our knowledge, single-cell variation in splicing patterns has rarely been studied for individual genes, and never been analyzed on a genomic scale. One recent report20 used RNA-FISH to study variation in alternative isoforms in two genes, and observed lower levels of isoform variability across single cells (the levels of heterogeneity differed in different cell types). Another study that used fluorescent reporters to quantify single-cell exon inclusion levels for one gene discovered highly variable and bimodal splicing patterns21.
To independently verify the existence of extensive differences in isoform ratios between cells, we designed RNA-FISH probes targeting constitutive and isoform-specific exons in two genes (Irf7 and Acpp, Fig. 3c and Supplementary Fig. 7 & 8)20. We found substantial expression variability in overall Irf7 levels between individual cells (as reflected by the ‘constitutive’ probes, Fig. 3c, bottom and top panels), mirroring our single-cell sequencing results (and further explored below). Additionally, within each Irf7-expressing cell, we observed a bias towards either the inclusion or exclusion of the cassette exon (Fig. 3c, Supplementary Fig. 7, middle panel, e.g., compare ‘high’ and ‘low’ marked cells). We obtained comparable results for Acpp using two probes designed to detect mutually exclusive alternative final exons (Supplementary Fig. 8).
We next explored the sources and functional implications of expression variability. Bimodality amongst highly expressed immune response genes may reflect either the presence of distinct cellular subtypes or stochastic differences in the activation of signaling circuits11. We performed a principal components analysis (PCA, Fig. 4a) on our single-cell expression profiles, focusing on the 632 genes that were induced at least twofold in the population-wide response to LPS16 (Supplementary Table 5). We found two distinct subpopulations, clearly distinguishable by the first principal component (PC1, 15% of the total variation, Fig. 4a). One group of fifteen cells expressed a core set of antiviral and inflammatory defense cytokines (including: Tnf, Il1a, Il1b, and Cxcl10) at extremely high levels (TPM > 1,000), while the remaining three cells expressed them at far weaker levels (TPM < 50). Some cell surface proteins (Ccr7, Cd83) and chemokines (Ccl22), which are known markers of BMDC maturation, showed the opposite expression pattern (Fig. 4b, Supplementary Fig. 9).
During maturation, BMDCs switch from antigen-capturing to antigen-presenting cells that prime the adaptive immune system22. Maturation can occur either in response to pathogen-derived ligands (pathogen-dependent maturation), such as LPS, or when clusters of BMDCs are disrupted in culture22 (pathogen-independent maturation). Both processes lead to induction of maturation markers, but only pathogen-dependent maturation results in co-expression of defense cytokines.
Examining the expression of maturation markers and defense cytokines (Supplementary Fig. 9) suggested that our 18 cells represent two distinct maturity states: (1) fifteen cells that were in the early stages of pathogen-dependent maturation (Fig. 4a, ‘maturing’, triangles; grey triangles, the two cells furthest along in this process); and, (2) three cells that likely matured during the culturing process (Fig. 4a, ‘mature’, squares; pathogen-independent). We further verified the existence of these sub-populations via RNA-FISH (Supplementary Fig. 10), single-cell quantitative reverse transcription polymerase chain reaction (qRT-PCR; Supplementary Fig. 11, SI, Supplementary Table 6), and cell sorting based on surface markers identified from the RNA-Seq data (Supplementary Fig. 12, SI). These results highlight that single-cell RNA-Seq can sensitively distinguish between closely related, yet distinct, developmental states, even within the same cell type.
Since differences in cell state explain only a small portion of the observed heterogeneity, we next examined the variation that might arise from the differential activity of regulatory circuits. We reasoned that co-variation across single cells between the mRNA levels of a transcription factor and its targets would represent a potential regulatory interaction, and, furthermore, would suggest that heterogeneity in the regulator’s expression may underlie the variability of its targets. Such a correlative approach has successfully identified regulatory connections from population-level transcription profiles measured in different conditions12,23. Here, we attempted to apply it to multiple single cells in the same condition.
To this end, we calculated the correlation in expression profiles between every pair of induced genes across all single cells, and identified a cluster of 137 genes that varied in a correlated way and were strongly discriminated by the second principal component (PC2, 8% of the variation, Fig. 4a,b). The cluster’s genes included the known antiviral master regulators Irf7 and Stat2, and were highly enriched for members of the antiviral response12 (60 of 137 genes, p = 2.5×10−3, hypergeometric test, Supplementary Table 5), as well as STAT2 targets16 (73/137 genes, p = 4.5×10−5, hypergeometric test). Most (100/137) of the cluster’s genes were bimodally expressed across single cells (Fig. 2c, bottom) despite being strongly expressed at the population level (13 genes TPM > 250; 53 genes TPM > 50). We independently validated a subset of these correlations using single-cell qRT-PCR and RNA-FISH (Fig. 4c,d). Moreover, single-cell qRT-PCR analysis of additional time points demonstrated that these correlations persisted at 6h as well (Supplementary Discussion in SI, Supplementary Fig. 13).
We hypothesized that bimodal variation in the expression of the cluster’s genes may be related to differences in the levels and activities of Stat2 and Irf7. To test this hypothesis, we measured expression of a set of antiviral genes by single-cell qRT-PCR in LPS-stimulated BMDCs from Irf7 knockout (Irf7 −/−) mice (SI). As expected, this perturbation ablated expression of most of the variable antiviral transcripts in our signature, while leaving non-variable antiviral transcripts relatively unaffected (Fig. 4e). However, Stat2 expression and variability levels were unaffected by the Irf7 knockout, implying that Stat2 may act either upstream or in parallel to Irf7 during the response24 (Supplementary Fig. 14). As both Stat2 and Irf7 are targets of the interferon-signaling pathway, we stimulated and profiled BMDCs from interferon receptor knockout (Ifnr −/−) mice. In these cells, we found drastically reduced expression for both Stat2 and Irf7, as well as all other measured cluster genes (Fig. 4f).
Our analysis provides a proof-of-concept demonstrating how co-variation between transcripts across seemingly homogeneous single cells can help to identify and assemble regulatory circuits. Specifically, in our variable circuit (Supplementary Fig. 14) interferon signaling is required for induction of Stat2 and Irf7, which, in turn, act to induce our variable antiviral cluster genes. Our experiments do not definitively determine, however, which component of the circuit causes the observed heterogeneity per se. One compelling possibility is that upstream noise is propagated from the interferon-signaling pathway first to Stat2 and Irf7 and then to the target genes25,26. This hypothesis is supported by the variation we observed in STAT protein levels and nuclear localization (Supplementary Discussion in SI, Supplementary Fig. 15 & 16). However, since temporal snapshots of RNA and protein are not always directly comparable (Supplementary Discussion in SI, Supplementary Fig. 15 & 16), new strategies for tracing the spatiotemporal dynamics of both proteins and RNA in single living cells are needed to fully test this hypothesis11.
A similar approach could potentially be used to explore the consequences of bimodality in splicing. Even looking at just 18 cells, we witnessed interesting examples of bimodal splicing patterns for genes whose isoforms have distinct functional consequences. For example, the splicing regulators Srsf3 and Srsf7 are each known to contain a “poison cassette exon” that, when included, targets the RNA for degradation via nonsense-mediated decay27 (Supplementary Fig. 17). Meanwhile, splicing differences in other regulatory genes may further enhance expression diversity: for example, proteins encoded by different isoforms of Irf7 (Fig. 3c) differentially activate interferon-responsive genes in vitro24. These examples suggest that heterogeneity in splicing may represent another layer of response encoding.
In conclusion, our study reveals extensive bimodality in the transcriptional response of BMDCs to LPS, reflected in gene expression, alternative splicing, and regulatory circuit activity. While some variation in expression reflects differences in developmental state, other bimodal patterns reflect the differential activity of an antiviral regulatory circuit in this temporal snapshot. These phenomena allowed us to treat each cell as a “perturbation system” for reconstructing cell circuits28, even with relatively few cells.
Moreover, our results demonstrate how co-variation across single cells can help dissect and refine gene modules that may be indistinguishable in population-scale measurements. For instance, in a recent population-scale study16, we identified a large cluster of 808 “late-induced” LPS genes that was enriched for both maturation genes and STAT-regulated antiviral genes. These two subsets could not be separated by population-level expression profiles alone16, but our single-cell data from a single timepoint clearly distinguishes them. Similarly, the unexpected and prevalent skewing we discovered in alternative splicing between single cells revises our molecular view of this process.
Finally, although many of our analyses focused on highly expressed genes to reduce the potential influence of amplification noise, our data also revealed substantial bimodality amongst more moderately expressed transcripts, such as large non-coding RNAs (lincRNAs, Supplementary Fig. 18). This suggests that the low population-level expression of these transcripts29 may sometimes reflect high expression in a small subset of cells as opposed to uniform levels of low expression. While further technical improvements will be necessary to disentangle these two hypotheses (Supplementary Fig. 5), single-cell measurements should help facilitate the discovery and annotation of lincRNAs.
Comparing our results to other single-cell RNA-Seq data sets (e.g., Fig. 2a, Supplementary Fig. 4) indicates that the source of the analyzed tissue (in vitro vs. ex vivo), the biological condition of the individual cells (steady state vs. dynamically responding), and the cellular microenvironment all likely influence the extent of single-cell heterogeneity within a system. When applied to complex tissues – such as unsorted bone marrow, developing embryos, tumors, and other rare clinical samples – the variability seen through single-cell genomics may help determine new cell classification schemes, identify transitional states, discover previously unrecognized biological distinctions, and map markers that differentiate them. Fulfilling this potential would require novel strategies to address the high levels of noise inherent in single-cell genomics – both technical, due to minute amounts of input material, and biological, e.g., due to short bursts of RNA transcription30. Future studies that couple technological advances in experimental preparation with novel computational approaches would enable analyses, based on hundreds or thousands of single cells, to reconstruct intracellular circuits, enumerate and redefine cell states and types, and transform our understanding of cellular decision-making on a genomic scale.
BMDCs, prepared as previously described12, were stimulated with LPS for 4h and then sorted as single cells or populations (10,000 cells) directly into TCL lysis buffer (Qiagen) supplemented with 1% v/v 2-mercaptoethanol. After performing an 2.2x clean up with Agencourt RNAClean XP Beads (Beckman Coulter), whole transcriptome-amplified cDNA products were generated using the SMARTer Ultra-low RNA Kit (Clontech), and conventional Illumina libraries were made and sequenced to an average depth of 27 million read pairs (HiSeq 2000, Illumina). Expression levels and splicing ratios were quantified using RSEM14 and MISO18, respectively. Additional experiments were performed using RNA-FISH (Panomics), Immunofluorescence, FACS, and single-cell qRT-PCR (Single Cell-to-CT (Invitrogen) and BioMark (Fludigm)). Full Methods and any associated references are provided in SI.
We thank N. Chevrier, C. Villani, M. Jovanovic, M. Bray and J. Shuga for scientific discussions; N. Friedman and E. Lander for comments on the manuscript; B. Tilton, T. Rogers, and M. Tam for assistance with cell sorting; J. Bochicchio, E. Shefler, and C. Guiducci for project management; the Broad Genomics Platform for all sequencing work; K. Fitzgerald for the Irf7 −/− bone marrow; and L. Gaffney for help with artwork. Work was supported by an NIH Postdoctoral Fellowship (1F32HD075541–01, RS), an NIH grant (U54 AI057159, NH), an NIH New Innovator Award (DP2 OD002230, NH), an NIH CEGS Award (1P50HG006193–01, HP, AR and NH), NIH Pioneer Awards (5DP1OD003893–03 to HP, DP1OD003958–01 to AR), the Broad Institute (HP and AR), HHMI (AR), and the Klarman Cell Observatory at the Broad Institute (AR).
Author ContributionsAR, HP, JZL, NH, AKS, RS, AlG, & AG conceived and designed the study. AKS, XA, RSG, JTG, RR, CM, DL, JTT, DG & JG performed experiments. RS, AKS, SS, & NY performed computational analyses. RS, AKS, AlG, NH, JZL, HP, & AR wrote the manuscript, with extensive input from all authors. The authors declare no competing financial interests.