A critical question we must answer before applying iPSCs in regenerative medicine is how closely iPSCs resemble ESCs and whether any features distinguish them. Here, we reveal that iPSCs generated to date still inherently express a distinctive transcriptome compared with ESCs, and that these 2 cell types can be distinguished by several basic biological modules.
Experimental conditions such as cell culture, cell handling, and treatment conditions have been proposed as factors that contribute to stochastic variation in iPSC transcriptomes [1–2, 10]. This seemed plausible given the lab-specific iPSC transcriptome profiles reported previously [11, 12]. However, these lab-specific patterns were drawn from microarray analyses without adjusting for batch effects, which are notorious for misleading the interpretation of microarray data [15]. In addition, these patterns [11, 12] were generated by cluster analysis, which has low sensitivity for discriminating high-dimensional samples. In this study, we removed batch effects from all datasets and employed between-group analysis [32] to re-analyze the iPSC samples. Between-group analysis uses a standard ordination method, such as correspondence analysis, to calculate an ordination of sample groups rather than of individual microarray samples; it therefore has discriminating power comparable to that of an artificial neural network, with high sensitivity. Our analysis revealed that the lab-specific iPSC profiling is a consequence of batch effects in the microarray data ( right panel) and that, after batch effects are removed, iPSCs are clearly separated from ESCs ( right panel). This indicates that human iPSCs inherently express a distinctive transcriptome compared with ESCs.
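The two steps described above, batch-effect removal followed by between-group analysis, can be illustrated with a small dependency-free sketch. It is a minimal sketch under stated assumptions, not the authors' pipeline: batch effects are removed by subtracting each gene's per-batch (per-lab) mean, a deliberately simple stand-in for empirical Bayes tools such as ComBat, and the ordination step uses PCA in place of correspondence analysis. With two groups, PCA on the centered group centroids yields a single axis (the normalized difference between centroids), and every sample is projected onto it. The function names and toy values are hypothetical.

```python
import math
from collections import defaultdict


def center_by_batch(samples, batches):
    """Remove additive batch effects by subtracting each gene's within-batch mean.

    `samples` is a list of per-sample expression vectors (samples x genes) and
    `batches` gives the batch (e.g. lab) of each sample. Simple stand-in only.
    """
    n_genes = len(samples[0])
    sums = defaultdict(lambda: [0.0] * n_genes)
    counts = defaultdict(int)
    for x, batch in zip(samples, batches):
        counts[batch] += 1
        for g in range(n_genes):
            sums[batch][g] += x[g]
    return [[x[g] - sums[batch][g] / counts[batch] for g in range(n_genes)]
            for x, batch in zip(samples, batches)]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def between_group_scores(samples, labels, group_a, group_b):
    """Two-group between-group analysis using PCA as the ordination step.

    The ordination is computed on group centroids, not on individual samples.
    With two groups the centered centroids span one axis, so each sample is
    projected onto the normalized centroid difference.
    """
    mean_a = mean_vector([x for x, lab in zip(samples, labels) if lab == group_a])
    mean_b = mean_vector([x for x, lab in zip(samples, labels) if lab == group_b])
    center = [(a + b) / 2 for a, b in zip(mean_a, mean_b)]
    axis = [a - b for a, b in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(v * v for v in axis))
    axis = [v / norm for v in axis]
    return [sum((xi - ci) * ai for xi, ci, ai in zip(x, center, axis))
            for x in samples]


# Toy data: two hypothetical labs, each profiling 2 iPSC and 2 ESC samples (3 genes).
# Lab A carries a large additive offset that dwarfs the iPSC/ESC difference.
samples = [
    [12.0, 10.0, 10.0], [12.2, 10.1, 9.9], [10.0, 10.0, 10.0], [9.8, 9.9, 10.1],  # lab A
    [2.0, 0.0, 0.0], [2.1, 0.2, -0.1], [0.0, 0.0, 0.0], [-0.1, 0.1, 0.2],         # lab B
]
batches = ["A"] * 4 + ["B"] * 4
labels = ["iPSC", "iPSC", "ESC", "ESC"] * 2

corrected = center_by_batch(samples, batches)
scores = between_group_scores(corrected, labels, "iPSC", "ESC")
print([round(s, 2) for s in scores])  # positive for iPSC, negative for ESC
```

Per-batch centering is only sensible when every batch contains both cell types; if a lab had profiled only iPSCs, centering would erase the very difference being tested, which is why corrections that can protect known biological covariates are usually preferred in practice.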
Here, we employed a systems biology approach based on WGCNA to systematically investigate the system-wide biological differences between these 2 cell types, revealing conserved molecular features that distinguish them. Our analysis revealed a network containing 17 modules differentially expressed between iPSCs and ESCs (, ). These modules can be grouped by function into meta-modules, which act primarily in transcription, metabolism, development, and immune response. Strikingly, the functional modules are highly conserved across various iPSCs (). This conservation was measured by the module membership correlation based on the network eigengene scores; the eigengene is the first principal component of a module's high-dimensional expression data and thus captures the maximum variance, which may explain the natural relationships among the variables.
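The eigengene and module membership quantities mentioned above can be illustrated with a short, dependency-free sketch. It assumes a module is given as a genes × samples matrix, takes the eigengene as the first principal component of the standardized module expression (found here by power iteration), defines module membership (kME) as each gene's Pearson correlation with the eigengene, and measures conservation as the correlation of kME values between a reference and a test dataset. The functions and data are hypothetical; in R, the WGCNA package provides `moduleEigengenes` for the eigengene step. The sign of the eigengene is chosen to agree with the module's average standardized expression, a convention WGCNA also applies by default.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(row):
    """Scale one gene's profile to mean 0 and unit sample standard deviation."""
    m = sum(row) / len(row)
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / (len(row) - 1))
    return [(v - m) / sd for v in row]


def eigengene(module_expr, iters=500):
    """First principal component (per-sample scores) of a genes x samples module.

    Returned with unit Euclidean norm; the scale does not affect kME.
    """
    z = [standardize(row) for row in module_expr]
    n_samples = len(z[0])
    v = z[0][:]  # start power iteration from the first gene's standardized profile
    for _ in range(iters):
        gene_scores = [sum(zg[s] * v[s] for s in range(n_samples)) for zg in z]  # Z v
        w = [sum(gs * zg[s] for gs, zg in zip(gene_scores, z))
             for s in range(n_samples)]  # Z^T Z v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Flip the sign so the eigengene follows the module's average standardized expression.
    avg = [sum(col) / len(z) for col in zip(*z)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


def module_membership(module_expr, eg):
    """kME: correlation of each gene's expression profile with the module eigengene."""
    return [pearson(row, eg) for row in module_expr]


def kme_conservation(ref_expr, test_expr):
    """Correlate kME values of the same module genes computed in two datasets."""
    kme_ref = module_membership(ref_expr, eigengene(ref_expr))
    kme_test = module_membership(test_expr, eigengene(test_expr))
    return pearson(kme_ref, kme_test)


# Hypothetical 4-gene module measured in a reference and a test dataset (6 samples each).
ref = [
    [1, 2, 3, 4, 5, 6],
    [2.1, 3.9, 6.2, 7.8, 10.1, 12.0],
    [10, 11.1, 11.9, 13.2, 13.8, 15.1],
    [5, 3, 6, 2, 7, 4],  # weakly related gene in the reference data
]
test = [
    [1, 3, 2, 5, 4, 6],
    [2.2, 5.8, 4.1, 10.2, 7.9, 12.1],
    [6.1, 7.9, 7.2, 9.8, 9.1, 11.0],
    [4, 4, 6, 3, 5, 2],
]
kme_ref = module_membership(ref, eigengene(ref))
print([round(k, 2) for k in kme_ref])
print(round(kme_conservation(ref, test), 2))
```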
The modules identified in this study can be used as quantitative variables to classify samples and to make predictions for new samples (). Using support vector machines (SVMs), our module-based models discriminate these 2 cell types with high accuracy: ~96% for models based on either the 17 modules or the 4 meta-modules (transcription, metabolism, immune response, and development). Even with 2 meta-modules (transcription and metabolism), our model reaches 94% accuracy (). Together, the coherent co-expression, conservation, and discriminating power of these modules suggest that the functional modules identified here serve as inherently conserved features distinguishing iPSCs from ESCs. This further suggests that these 2 cell types differ in fundamental biological functions such as transcription and metabolism. Consistent with this, recent studies have observed improvements in transcription and metabolism during iPSC production by adjusting transcription factor composition and hypoxia conditions [34], adding microRNAs [35], and adding other factors such as vitamin D [36]. Furthermore, differences in metabolic enzyme activity between iPSCs and ESCs may explain recent observations that modified culture medium enhances iPSC generation [36]. Therefore, iPSCs have unique features that distinguish them from ESCs.
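A module-based classifier of the kind described above can be sketched as a linear SVM on per-sample module scores. This section does not specify the kernel, features, or validation scheme, so the sketch assumes a linear kernel, two hypothetical meta-module eigengene scores per sample, and a single train/held-out split. It minimizes the standard soft-margin objective by full-batch subgradient descent in the primal, which is a swapped-in training method; libraries such as LIBSVM instead solve the dual with SMO-type solvers. All names and numbers are illustrative.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Soft-margin linear SVM fitted by full-batch subgradient descent.

    Minimizes (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w . x + b)))
    with labels y in {+1, -1}; the bias b is not regularized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: subgradient is -y * x / n
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (iPSC) or -1 (ESC) from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Hypothetical per-sample eigengene scores for two meta-modules:
# [transcription, metabolism]; +1 = iPSC, -1 = ESC.
train_X = [[1.0, 0.9], [1.2, 1.1], [0.8, 1.3], [0.9, 0.7],
           [-1.0, -0.8], [-1.1, -1.2], [-0.7, -1.0], [-1.3, -0.9]]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]
test_X = [[1.1, 1.0], [0.7, 0.9], [-0.9, -1.1], [-1.2, -0.6]]
test_y = [1, 1, -1, -1]

w, b = train_linear_svm(train_X, train_y)
print("held-out accuracy:", accuracy(w, b, test_X, test_y))
```

Moving from 2 meta-modules to 4 meta-modules or 17 modules only lengthens each feature vector; a robust accuracy estimate would typically use cross-validation rather than a single split.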
Altered expression of functional modules may be modulated by many mechanisms, including epigenetic and genetic factors. Our data uncovered an overall inverse correlation between module expression and DNA methylation level (). We observed a similar trend even when we expanded our data set with 67 samples (unpublished data), suggesting that DNA methylation may serve as one epigenetic mechanism underlying functional module differences. Our present result on DNA methylation differences parallels recent observations that iPSCs retain DNA methylation patterns from their original somatic cells [10, 37] and that iPSCs show differential methylation at a panel of sites relative to ESCs [38–40]. Further biological experiments and bioinformatics algorithms are needed to fully understand the role of DNA methylation in regulating these modules. Recently, copy number variations have been uncovered in iPSCs compared with their parental somatic cells, suggesting that genetic changes can also take place during iPSC derivation [41, 42]. Thus, we cannot rule out that genetic changes also contribute to the functional differences between human iPSCs and ESCs.
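The inverse relationship between module expression and DNA methylation can be quantified with a rank correlation computed across samples. The section does not state which correlation measure was used, so this sketch assumes Spearman's rho (the Pearson correlation of tie-averaged ranks) applied to hypothetical per-sample summaries: a module expression score and the mean methylation beta value of the module's genes.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-sample values for one module: an expression score
# (e.g. eigengene or mean log2 expression) and the mean DNA methylation
# beta value of the module's genes.
module_expr = [2.1, 3.4, 1.2, 4.0, 2.8, 0.9]
methylation = [0.55, 0.32, 0.71, 0.20, 0.30, 0.80]

rho = spearman(module_expr, methylation)
print(round(rho, 3))  # -0.943 for these toy values: higher methylation, lower expression
```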
Our study systematically reveals inherent functional modules that are uniquely activated in iPSCs. Our findings provide an avenue for guiding further efforts to overcome the transcriptional differences between iPSCs and ESCs.