Pluripotent stem cell lines with similar phenotypes can be derived from both blastocysts (embryonic stem cells, ESC) and primordial germ cells (embryonic germ cells, EGC). Here, we present a compendium DNA microarray analysis of multiple mouse ESCs and EGCs from different genetic backgrounds (strains 129 and C57BL/6) cultured under standard conditions and in differentiation-promoting conditions by the withdrawal of Leukemia Inhibitory Factor (LIF) or treatment with retinoic acid (RA). All pluripotent cell lines showed similar gene expression patterns, which separated them clearly from other tissue stem cells with lower developmental potency. Differences between pluripotent lines derived from different sources (ESC vs. EGC) were smaller than differences between lines derived from different mouse strains (129 vs. C57BL/6). Even in the differentiation-promoting conditions, these pluripotent cells showed the same general trends of gene expression changes regardless of their origin and genetic background. These data indicate that ESCs and EGCs are indistinguishable based on global gene expression patterns alone. On the other hand, a detailed comparison between a group of ESC lines and a group of EGC lines identified 20 signature genes whose average expression levels were consistently higher in ESC lines, and 84 signature genes whose average expression levels were consistently higher in EGC lines, irrespective of mouse strains. Similar analysis identified 250 signature genes whose average expression levels were consistently higher in a group of 129 cell lines, and 337 signature genes whose average expression levels were consistently higher in a group of C57BL/6 cell lines. Although none of the genes was exclusively expressed in either ESCs versus EGCs or 129 versus C57BL/6, in combination these signature genes provide a reliable separation and identification of each cell type. 
Differentiation-promoting conditions also revealed some minor differences between the cell lines. For example, in the presence of RA, EGCs showed a lower expression of muscle- and cardiac-related genes and a higher expression of gonad-related genes than ESCs. Taken together, the results provide a rich source of information about the similarities and differences between ESCs and EGCs as well as 129 lines and C57BL/6 lines. Such information will be crucial to our understanding of pluripotent stem cells. The results also underscore the importance of studying multiple cell lines from different strains when making comparisons based on gene expression analysis.
Embryonic stem cells (ESCs) and embryonic germ cells (EGCs) are prototypical pluripotent stem cells cultured in vitro. Mouse ESCs can be derived from the inner cell mass (ICM) of blastocysts (Evans and Kaufman, 1981; Martin, 1981), whereas mouse EGCs can be derived from primordial germ cells (PGCs) (Matsui et al., 1992; Resnick et al., 1992). Similar cells have also been derived from human ICM (Thomson et al., 1998) and PGCs (Shamblott et al., 1998), respectively. Both ESCs and EGCs possess the two defining characteristics of pluripotent stem cells: self-renewal, i.e., the ability to produce daughter cells with an identical phenotype, and pluripotency, i.e., the ability to give rise to most differentiated cell types, including germ cells (Hadjantonakis and Papaioannou, 2001; Labosky et al., 1994; Loebel et al., 2003; Smith, 2001; Stewart et al., 1994). Furthermore, both ESCs and EGCs express the same pluripotency markers, such as Pou5f1 (also known as Oct3/4, Oct3, Oct4), Nanog, Zfp42 (also known as Rex1), and alkaline phosphatase (ALP). Thus, ESCs and EGCs are often considered essentially equivalent cell types. This similarity has recently led to the hypothesis that ESCs are also derived from PGC precursors that originate from the ICM (Zwaka and Thomson, 2005).
There are, however, some known differences between ESC and EGC. For example, chimeras generated using some EGCs derived from postmigratory PGC from the genital ridge had fetal overgrowth and skeletal abnormalities (McLaren and Durcova-Hills, 2001; Tada et al., 1998). It has been reported that EGCs failed to differentiate in a co-culture with lung tissue, whereas ESC successfully differentiated in these conditions (Durcova-Hills et al., 2003). In vitro, EGCs differentiated more efficiently to neuronal cells and less efficiently to cardiac and skeletal muscle cells than ESC (Rohwedel et al., 1996). As far as we know, there are no surface or other markers that can distinguish between mouse ESC and EGC lines. The only known molecular difference is the DNA methylation patterns of some imprinted genes.
Another important difference between ESCs and EGCs is related to how these pluripotent cells are derived. Pluripotent ESCs are derived from pluripotent ICM of blastocysts, whereas pluripotent EGCs are derived from unipotent (or sometimes called nullipotent (Donovan and de Miguel, 2003)) PGCs, which can differentiate only to sperm or oocytes. Therefore, ICM cells converting to ESCs need to acquire only the capacity to proliferate actively in culture for long periods without differentiation (self-renewal), but PGCs converting to EGCs need to acquire both self-renewal and pluripotency. Therefore, the changes that occur during the conversion of PGCs to EGCs are often compared to those seen after the transplantation of nuclei from differentiated cells into enucleated oocytes (“nuclear reprogramming”) (Donovan and de Miguel, 2003; Hochedlinger and Jaenisch, 2006). Here, we ask whether these seemingly different mechanisms involved in deriving ESCs and EGCs are reflected in the molecular characteristics of the cells in culture.
Although mouse ESCs have been studied in great detail by global gene expression profiling (e.g., Anisimov et al., 2002; Brambrink et al., 2006; Furusawa et al., 2006; Ivanova et al., 2002; Ramalho-Santos et al., 2002; Tanaka et al., 2002; Wei et al., 2005), comparisons between ESCs and EGCs have been limited to the analysis of EST frequency in one ESC line versus one EGC line (Sharov et al., 2003). Therefore, whether ESCs and EGCs are distinct or equivalent cell types from the viewpoint of global gene expression patterns remains an open question. As a first step to address this question, we have examined the global gene expression profiles of mouse ESCs and EGCs. To avoid detecting cell line-specific gene expression patterns, we examined multiple ESC lines and EGC lines that had been established independently. Considering that gene expression profiles might be affected by genetic background, we tested cell lines derived from different mouse strains. Finally, we examined possible differential responses of ESC and EGC lines to differentiation-promoting stimuli, namely the absence of LIF and the presence of retinoic acid (RA).
We first examined the phenotypes of six mouse ESC lines and 10 mouse EGC lines (Table 1) cultured for three days in three different conditions: with LIF (LIF+, the standard culture condition), without LIF (LIF−, the differentiation-promoting condition), and with LIF and RA (RA+, the differentiation-promoting condition) (see the Materials and Methods for details). In the standard culture conditions, all cell lines examined here showed similar morphology and growth (Fig. 1A). When these cells were cultured in the differentiation-promoting conditions, we observed three early indicators of differentiation: a decreased signal intensity of alkaline phosphatase (ALP) staining, the flattening of ESC and EGC colonies, and the emergence of distinct single cells at the edges of colonies. These early signs of differentiation were noted by day 2 with RA and by day 3 in the absence of LIF (Fig. 1A). By day 3, most cells exposed to RA were already well-differentiated, with a morphology resembling epithelial or endodermal cells. However, at day 3, the cells cultured without LIF did not show any obvious signs of differentiation into a particular cell lineage, although the colonies already showed flat morphology and low intensity or patchy ALP staining. In the presence of RA, cells with a differentiated phenotype were slightly more abundant in ESC lines than in EGC lines.
It is known that pluripotent cells have a unique cell cycle: most cells are found in the S phase and the G1 phase is short (Savatier et al., 1996). In the presence of LIF, all cell lines examined here showed a distribution of 57.7% - 66.3% cells in S phase and 13.8% - 23.4% cells in G1 phase (Fig. 1B), as has been reported for undifferentiated ESCs (Savatier et al., 1996). As expected, upon differentiation the proportion of cells in the G1 phase increased while the proportion of cells in S phase decreased. However, these changes were relatively small, probably because cells were not fully differentiated after 3 days in the RA+ and LIF− conditions. The proportion of cells in G1 phase was lower in EGCs than in ESCs (GLM, p = 0.033), especially in the RA+ condition.
Using a whole-genome NIA 44K oligo-DNA microarray, we obtained the global gene expression profiles of six mouse ESC lines and six mouse EGC lines cultured for three days in LIF+, LIF−, and RA+ conditions. To assess the expression profiles of these cells in a larger context, we first compared them with our previous microarray data obtained from trophoblast stem (TS) cells and neural stem/progenitor cells (NSC) (Aiba et al., 2006). Direct comparison was possible because both studies used the same type of microarrays with a largely overlapping set (N = 20,088) of 60-mer oligos, and both studies included the same ESC line 129.3, which was used as a common standard for data normalization. Results of Principal Component Analysis (PCA) of log-transformed gene expression values showed that the gene expression profiles of ESC and EGC lines were similar to each other, and were clearly separated as a single group from those of TS and NSC (Fig. 2).
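The kind of PCA described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the intensities are toy values chosen for illustration, the sample labels are placeholders, and the leading principal component is obtained by simple power iteration rather than by a statistics package.

```python
import math

# Toy intensities (hypothetical, NOT study data).
# Rows are cell lines, columns are four illustrative genes.
samples = ["ESC_a", "ESC_b", "EGC_a", "EGC_b", "TS", "NSC"]
intensities = [
    [1024, 980, 64, 60],
    [1100, 1010, 70, 66],
    [990, 1050, 62, 72],
    [1060, 940, 68, 58],
    [70, 64, 1000, 120],
    [60, 72, 130, 1100],
]


def log2_center(rows):
    """Log2-transform the values, then subtract each gene's mean across samples."""
    logged = [[math.log2(v) for v in row] for row in rows]
    means = [sum(col) / len(logged) for col in zip(*logged)]
    return [[v - m for v, m in zip(row, means)] for row in logged]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leading_axis(centered, iterations=200):
    """Approximate the first principal axis by power iteration on X^T X."""
    start = centered[0]  # start from the first sample's centered profile
    norm = math.sqrt(dot(start, start))
    axis = [v / norm for v in start]
    for _ in range(iterations):
        scores = [dot(row, axis) for row in centered]            # X . axis
        new_axis = [dot(scores, col) for col in zip(*centered)]  # X^T . scores
        norm = math.sqrt(dot(new_axis, new_axis))
        axis = [v / norm for v in new_axis]
    return axis


centered = log2_center(intensities)
axis = leading_axis(centered)
pc1_scores = [dot(row, axis) for row in centered]

for name, score in zip(samples, pc1_scores):
    print(f"{name}: PC1 = {score:+.2f}")
```

In this toy example the four pluripotent samples fall together on one side of PC1 and the two lineage-committed samples on the other, which is the qualitative pattern reported in Fig. 2.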
Although ESCs and EGCs were inseparable in the above analysis, we wished to identify the differences among individual cell lines. First, we applied ANOVA to the microarray data of all ESC and EGC lines in the standard LIF+ condition and found that 6998 genes differed significantly in expression among individual ESC and EGC lines (Table S1). The PCA of these genes revealed considerable variation among individual cell lines. We found that 129 cell lines and C57BL/6 cell lines were best separated along a linear combination of principal components 1 and 2 (PC1+0.69·PC2), whereas ESCs and EGCs were separated along the PC3 axis (Fig. 3). However, the line-to-line variation within the ESC and EGC groups was too large to draw a clear-cut boundary between these categories (Fig. 3). For example, two EGC lines (TGC 8.5-5 and TGC 8-8) had gene expression similar to the ESC line BL6.9, whereas the other four EGC lines had a more distinct gene expression pattern. In general, the difference in gene expression patterns between ESCs and EGCs was smaller than that between mouse strains, because the former was captured only by the third principal component (PC3), whereas the latter was captured by the first two principal components (PC1 and PC2). Therefore, we conclude that ESCs and EGCs are not distinct entities with respect to global gene expression.
We expected sex-specific differences in the global gene expression profiles of ESCs and EGCs. However, we did not find any principal component that separated female cell lines (CC9, TGC-12.8, and EG-3) from male lines. Three Y-linked genes (Eif2s3y, Uty, and Ddx3y) were more highly expressed (p<0.001) in male cell lines than in female cell lines, whereas four X-linked genes (Xlr5d, Xlr5c, and 1700013H16Rik) were more highly expressed (p<0.04, fold change >2) in female cell lines. However, these few sex-related genes did not satisfy the stringent statistical criteria based on the false discovery rate (FDR; see Materials and Methods for details). Pluripotent cell lines thus have limited sex-related differences in gene expression. We also expected stage-specific differences in the expression profiles of EGCs. However, we did not detect any specific gene expression patterns that correlated with the stage of the EGC source, e.g., pre-migratory PGC (8-8.5 dpc) or post-migratory PGC (12 dpc) stages.
The separation of individual pluripotent cell lines by the PCA reflects the differences in their global gene expression patterns among the cell lines. To investigate what genes contributed to these differences, we sought genes whose expression levels changed significantly between cell lines (FDR ≤ 0.05, fold change >1.5) and correlated (abs(r) > 0.7) to those principal components that separated cell lines based on their strain and origin (Fig. 3). We found four sets of genes (Table S2-S3): 580 genes were correlated positively to PC1+0.69·PC2, and thus were generally more highly expressed in 129 cells than in C57BL/6 cells (“129-overexpressed genes”); 658 genes were correlated negatively to PC1+0.69·PC2, and thus were generally more highly expressed in C57BL/6 cells than in 129 cells (“C57BL/6-overexpressed genes”); 207 genes were correlated positively to the PC3, and thus were generally more highly expressed in EGC lines than ESC lines (“EGC-overexpressed genes”); and 35 genes were correlated negatively with PC3, and thus were more highly expressed in ESC lines than EGC lines (“ESC-overexpressed genes”).
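The selection criteria described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes Benjamini-Hochberg adjustment for the FDR, takes the fold change as the ratio between the highest and lowest expressing lines, and uses Pearson correlation with the separating axis. The gene names, p-values, and axis scores are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def select_genes(log2_expr, p_values, axis_scores, fdr=0.05, fold=1.5, r_min=0.7):
    """Split genes passing all three thresholds by the sign of their correlation."""
    genes = list(log2_expr)
    q_values = benjamini_hochberg([p_values[g] for g in genes])
    positive, negative = [], []
    for gene, q in zip(genes, q_values):
        values = log2_expr[gene]
        fold_change = 2 ** (max(values) - min(values))  # assumed definition
        r = pearson_r(values, axis_scores)
        if q <= fdr and fold_change > fold and abs(r) > r_min:
            (positive if r > 0 else negative).append(gene)
    return positive, negative


# Hypothetical scores of six cell lines on a separating axis
# (first three lines from one strain, last three from the other).
axis_scores = [2.0, 1.5, 1.8, -1.7, -2.1, -1.5]
log2_expr = {
    "geneA": [9.0, 8.8, 9.1, 7.9, 7.7, 8.0],  # higher in the first group
    "geneB": [6.0, 6.2, 5.9, 7.5, 7.8, 7.4],  # higher in the second group
    "geneC": [8.0, 8.1, 8.0, 7.9, 7.9, 8.0],  # significant but fold change too small
    "geneD": [7.0, 8.2, 7.1, 8.0, 7.2, 8.1],  # variable but not significant
}
p_values = {"geneA": 0.001, "geneB": 0.002, "geneC": 0.004, "geneD": 0.6}

positive, negative = select_genes(log2_expr, p_values, axis_scores)
print("positively correlated:", positive)
print("negatively correlated:", negative)
```

Requiring all three criteria at once keeps genes that are both statistically robust and aligned with the axis of interest, at the cost of discarding genes with small but consistent differences.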
First, we analyzed these four sets of genes from the viewpoint of their pluripotency. Previously, we have shown that NSCs and TS cells already commit to specific cell lineages and express many genes that are characteristic of differentiated cells (Aiba et al., 2006). From the PCA (Fig. 2), we extracted 1,719 genes that were negatively associated with PC1, which represented genes expressed more highly in pluripotent ESCs and EGCs than in the lineage-committed NSCs and TS cells (Table S4). These candidate “pluripotency-related” genes included Pou5f1 (Oct3/4), Nanog, Zfp42 (Rex1), Nr0b1 (Dax1), Tcl1, Dppa3 (Stella), Dppa4, Klf2, Klf4, Jarid2, Jarid1b, Foxd3 (Anisimov et al., 2002; Boyer et al., 2005; Gordeeva et al., 2005; Hanna et al., 2002; Li et al., 2005; Mongan et al., 2006; Niakan et al., 2006; Player et al., 2006; Sharov et al., 2003; Yoshikawa et al., 2006). The pluripotency-related genes were more abundant in the 129-overexpressed gene list than the C57BL/6-overexpressed gene list (chi-square = 19.4; p < 0.001) (Fig. 3), but there was no significant difference in the proportion of pluripotency-related genes in EGC- and ESC-overexpressed gene lists (chi-square < 0.001; p = 1). These data suggest that ESC and EGC show no difference in the degree of pluripotency, whereas C57BL/6 lines have a greater tendency to lose their pluripotency than 129 cell lines, at least after three days in culture in the standard LIF+ condition. Genes that were differentially expressed between strains did not include the canonical pluripotency genes (Pou5f1, Nanog, and Sox2); however, the expression level of Zfp42, which was already high in C57BL/6 lines, was much higher in 129 cell lines (Table S2, Fig. 5B).
Second, we analyzed these four sets of genes based on gene ontology (GO) terms (Fig. 3; see Tables S5-S7 for the details). The 129-overexpressed genes were classified as being associated with cytokine production, amino acid metabolism, lipid transport, and actin cytoskeleton. Higher cytokine production may be related to the fact that 129-derived cell lines have faster proliferation rate than C57BL/6-derived cell lines in the standard LIF+ culture condition (data not shown). They also included many genes with unknown functions. The C57BL/6-overexpressed genes were classified as being associated with oxidoreductase activity, steroid metabolism, lysosome, spliceosome, protein transport, apoptosis, ubiquitination activity, and GTPase activity. Because the number of ESC-overexpressed genes was very limited (N = 35), there were no statistically significant GO categories. The EGC-overexpressed genes were associated with tRNA aminoacylation, cytokinesis, translation, and ubiquitination activity. It is worth mentioning that genes involved in methyltransferase activity (Mettl5, Smyd3, Dnmt3l) were overexpressed in the EGCs, which may indicate that the status of chromatin modifications in EGCs is different from that in ESCs.
One of our original goals was to identify marker genes that can separate ESCs and EGCs. The comparison between a single ESC line and a single EGC line revealed many such differentially expressed genes. However, subsequent analysis showed that these genes were merely cell line-specific and did not show consistent differential expression between multiple ESCs and multiple EGCs (e.g., Supplemental Fig. S1).
To find signature genes that differ most consistently between ESC and EGC lines in all culture conditions, we grouped the cells according to their origin (i.e., ESC vs. EGC) and carried out a statistical analysis (ANOVA) between these groups (see Materials and Methods for details). We found 104 signature genes (20 genes for ESC and 84 genes for EGC) whose average expression levels can distinguish each group (Tables S8-S9). Although we did not find any gene that was exclusively expressed in ESCs or EGCs, these groups were separated by the average expression levels of signature genes from each set (Fig. 4A), except for one replicate of cell line TGC 8-7. The top genes from the signature lists were chosen for qRT-PCR verification: 2 ESC-overexpressed genes (Grb10, Id2) and 3 EGC-overexpressed genes (Peg3, Spp1, Snrpn). qRT-PCR results confirmed the microarray data, with all genes showing differential expression across all 6 ESC and 10 EGC lines tested (Fig. 5A). Out of 20 ESC-signature genes, two (Id1 and Id2) are known to be directly activated by Bmp4 (Hollnagel et al., 1999), which may suggest that Bmp4 signaling is suppressed in EGC lines. Some EGC signature genes are associated with the extracellular matrix (Col16a1, Col1a1, Col8a1, Ecm1, Mfge8, Postn, Serpinf1, Serpinh1, Slit2, Spp1, Timp1), which may be related to the migratory pathway of parental PGCs (Pereda et al., 2006).
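The idea of separating groups by the average expression of a signature set, rather than by any single gene, can be illustrated with a minimal sketch. The values below are toy numbers, not study data, and are assumed to be log2 expression levels centered on each gene's mean across all lines. The cell line labels are placeholders, and only the five qRT-PCR-verified genes are used for brevity.

```python
ESC_SIGNATURE = ["Grb10", "Id2"]
EGC_SIGNATURE = ["Peg3", "Spp1", "Snrpn"]

# Toy centered log2 values (hypothetical). Note that individual genes can
# point the "wrong" way in a given line (e.g. Id2 in EGC_2, Spp1 in ESC_2).
profiles = {
    "ESC_1": {"Grb10": 0.9, "Id2": 0.6, "Peg3": -0.8, "Spp1": -0.5, "Snrpn": -0.7},
    "ESC_2": {"Grb10": 0.7, "Id2": -0.1, "Peg3": -0.4, "Spp1": 0.2, "Snrpn": -0.9},
    "EGC_1": {"Grb10": -0.8, "Id2": -0.5, "Peg3": 0.7, "Spp1": 0.6, "Snrpn": 0.8},
    "EGC_2": {"Grb10": -0.6, "Id2": 0.2, "Peg3": 0.5, "Spp1": -0.1, "Snrpn": 0.9},
}


def mean_expression(profile, genes):
    """Average centered log2 expression over a gene set."""
    return sum(profile[g] for g in genes) / len(genes)


def classify(profiles):
    """Label each line by whichever signature set has the higher average."""
    labels = {}
    for name, profile in profiles.items():
        esc_score = mean_expression(profile, ESC_SIGNATURE)
        egc_score = mean_expression(profile, EGC_SIGNATURE)
        labels[name] = "ESC-like" if esc_score > egc_score else "EGC-like"
    return labels


labels = classify(profiles)
for name, label in labels.items():
    print(name, label)
```

Averaging over a set tolerates the occasional gene that deviates in an individual line, which is why a combination of signature genes can separate groups even when no single gene is exclusive to either.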
We also found 250 signature genes for cells derived from the 129 strain and 337 signature genes for cells derived from the C57BL/6 strain in all three culture conditions (Tables S10-S11). Cell lines derived from these different strains were well separated by the average expression levels of the top 100 signature genes from each set (Fig. 4B). Differential expression of 7 signature genes (Prdx2, Bhlhb2, Zadh2, Inpp5d, Tcstv3, Spink3, and Zfp42) was confirmed using qRT-PCR (Fig. 5B). To test whether these differences were related to the degree of cell differentiation, we compared the sets of signature genes with a set of 514 genes whose expression was induced in ESCs and EGCs in the LIF− condition (Table S12). In the set of 514 genes, more C57BL/6-signature genes were represented than 129-signature genes (16.6%, N = 56 in the strain C57BL/6 versus 4.0%, N = 10 in the strain 129; chi-square = 22.9, p < 0.001, see Tables S10, S11), indicating that cell lines derived from the C57BL/6 strain had a higher tendency to differentiate than cell lines from the 129 strain. However, the majority of strain-specific signature genes were not related to cell differentiation.
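The proportion comparison above can be checked with a short worked example. This sketch assumes a Pearson chi-square test on a 2x2 table without continuity correction; with 56 of 337 C57BL/6-signature genes (the total given in the abstract) and 10 of 250 129-signature genes, it reproduces the reported statistic.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    for the table [[a, b], [c, d]], with its upper-tail p-value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: C57BL/6-signature genes, 129-signature genes.
# Columns: induced in the LIF- condition, not induced.
stat, p_value = chi_square_2x2(56, 337 - 56, 10, 250 - 10)

print(f"C57BL/6: {56 / 337:.1%}, 129: {10 / 250:.1%}")
print(f"chi-square = {stat:.1f}, p = {p_value:.2g}")
```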
We found a few interesting examples that strain-specific phenotypes of mice may be reflected in the strain-specific signature genes. First, oxidoreductase activity was the top GO category for C57BL/6-overexpressed genes (Fig. 3), 17 of which were represented in the signature genes (e.g., Prdx2, Dhrs7, Zadh2, Egln3, Mod1, Nxn, Phgdh, Gsr). This seems to be consistent with the report that C57BL/6 mice have higher resistance to oxidative stress and have a longer life span than other strains (Rebrin and Sohal, 2004; Wesselkamper et al., 2000; Zraika et al., 2006). Second, we noticed genes involved in alcohol metabolism (Pkm2, Sgpp1, Slc37a4, Soat1, Vldlr, Zadh2) in the C57BL/6 signature genes, which may be related to the well-known alcohol-preferring behavior of C57BL/6 mice (Kelai et al., 2006). Third, we found that apoptosis-related genes (Card4/Nod1 and Pmaip1) were at the top of the list in the signature genes for strain C57BL/6, followed by additional genes (Atm, Cebpb, and Dap). Indeed, the higher incidence of apoptosis in C57BL/6 strain has been reported during oocyte atresia (Canning et al., 2003) and ES cell differentiation (Ward et al., 2004).
Frequent testicular cancer is a well-known phenotype of the 129 mouse strain (Matin and Nadeau, 2005). Dnd1 has recently been identified as a gene responsible for Ter mutation, which is the most potent testicular cancer modifier gene (Youngren et al., 2005). We therefore examined whether the strain-specific signature genes were cancer-related (based on PubMed and NCI/Biomax Cancer Gene Database, http://ncicb.nci.nih.gov/NCICB/projects/cgdcp) and expressed in testis (based on EST Profile Viewer (Larsson et al., 2000) in UniGene database). We found that 6 such genes were over-expressed in the strain 129 (Anxa4, Dmd, Eno1, Gbp1, Gpr56, Tlr2); however, all of them were tumor-suppressors, and therefore, their high expression is unlikely to cause cancer. In contrast, we found that 16 such genes were under-expressed in the strain 129 (i.e., signature genes for C57BL/6), 7 of which were either tumor suppressors or negatively correlated with cancer (Atm, Dok2, Hdac1, Herpud1, Pik3cd, Prdx2, Sfrp1). A list of genes identified here may provide additional candidates for testicular cancer modifier genes in the 129 strain.
Previous studies have shown differential DNA methylation of some imprinted genes between ESCs and EGCs (Durcova-Hills et al., 2001; Labosky et al., 1994; Tada et al., 1998). Out of 60 mammalian imprinted genes reported so far (Luedi et al., 2005), 6 genes (H19, Kcnq1, Nnat, Peg3, Snrpn, and U2af1-rs1) were more highly expressed in EGCs, whereas 2 genes (Calcr, Grb10) were more highly expressed in ESCs in the standard LIF+ culture condition (Table 2). Grb10, a growth suppressor (Smith et al., 2006), was also the top ESC signature gene, and Snrpn, U2af1-rs1, and H19 were among the top EGC signature genes (Tables S8-S9). Differential expression of Grb10, Peg3, and Snrpn was tested and confirmed by qRT-PCR (Fig. 5A). Five genes out of 7 imprinted genes that were over-expressed in EGC were mapped to chromosome 7 (Table 2). Interestingly, 8 imprinted genes showed significant (p < 0.05 and >1.5 fold change) differences in expression between mouse strains in at least one of the differentiation-promoting conditions (Table 2). For example, Peg3 and Dcn were more highly expressed in the C57BL/6 strain, whereas Impact and Zfp264 were more highly expressed in the 129 strain.
Although ESCs and EGCs were not separable by global gene expression patterns in the standard culture condition, we suspected that they might show more discernible differences when induced to differentiate. To test this idea, we carried out the expression profiling of ESC lines and EGC lines cultured for three days in LIF− and RA+ conditions. The PCA of the centered gene expression profiles (see Materials and Methods for the detail) clearly showed that both ESCs and EGCs responded in a similar manner to LIF withdrawal and RA treatments (Fig. 6). The first major component PC1 corresponded to the changes associated with the RA+ condition, whereas the second major component PC2 corresponded to the changes associated with the LIF− condition (Fig. 6). The data indicate that the RA+ condition caused more dramatic change of global gene expression patterns than the LIF− condition. In fact, the RA treatment caused a qualitatively different and much stronger response in gene expression (2613 genes with >1.5-fold change) than the LIF− condition (937 genes with >1.5-fold change). This is consistent with the phenotypical differences observed between the RA+ and LIF− conditions (Fig. 1A): The RA+ condition induced more active and faster differentiation than the LIF− condition. In fact, the expressions of some pluripotency-related genes (e.g., Pou5f1, Sox2, Nanog, Aire, Otx2, Foxd3, and Tcea3) were more sharply reduced in the RA+ condition than the LIF− condition. The addition of RA to the basal conditions directed cells to a wide variety of lineages (neural, epithelial, endoderm, and mesoderm) with a bias towards neural differentiation. For example, genes induced selectively by RA included homeobox genes (e.g., Esx1, Meis1, Hoxa1, Hoxa7, Hoxb1, Hoxa5, Hoxa2, Hoxc9, Hoxa10, Hoxb7, Hoxb8), and genes involved in morphogenesis (e.g., Tbx3), actin cytoskeleton, cell adhesion, cell migration, extracellular matrix, and various types of differentiation (e.g., neural, muscular, vascular, kidney, and endodermal cells). The most highly induced genes were related to neural differentiation (Pmp22, Mrg1, and Hoxa1). Genes induced selectively by LIF withdrawal included fibroblast growth factors (Fgf5, Fgf8, and Fgf15) and transcription factors (T, Eomes, Lmyc1/Mycl1, Sp8, Irf1, Gsc, and Sp5) (Table S12), many of which are mesoderm markers (T, Eomes, Fgf8, Fgf15) (Loebel et al., 2003). Some pluripotency-related genes were suppressed preferentially in the LIF− condition (Spp1, Klf4, Esgp/EG653016, Piwil2, Kit), whereas other pluripotency-related genes decreased both in the LIF− and RA+ conditions (Nr0b1, Tcl1, Zfp459, Sall1, Gli1, Rest, Nr5a2, Lrrc2, Klf2, Klf5, Klf9).
Differentiation-promoting culture conditions generally reduced the differences in gene expression levels among pluripotent cell lines. For example, most ESC- and EGC-signature genes identified in the standard LIF+ culture condition did not show significant differences in their expression in the LIF− and RA+ conditions. Only 2 (10%) ESC-signature genes (Grb10 and Glo1) and 15 (18%) EGC-signature genes (Atf5, Ccng2, Itpr3, Mylpf, Pgm2, Pink1, Ralgds, Snf1lk, Snrpn, Snx6, Spp1, Tbc1d13, Tmem40, Trib3, U2af1-rs1) showed persistent differential expressions in all three culture conditions (tables S8-S9), indicating that these genes are the most reliable separator between ESCs and EGCs. Similarly, many C57BL/6-signature genes and 129-signature genes showed no difference in expression in the LIF− and RA+ conditions. For example, Fgf5 showed a high expression in C57BL/6 lines and a low expression in 129 lines in the standard LIF+ culture condition, but its expression became equally high in all cell lines in the LIF− condition and equally low in all cell lines in the RA+ condition. However, a higher proportion of strain-specific signature genes retained their expression differences in the differentiation-promoting conditions: 75 genes (30%) for 129-signature genes and 91 genes (27%) for C57BL/6-signature genes (Tables S10-S11).
To further investigate the origin- and strain-specific variations in responses to differentiation-promoting culture conditions, we focused on a set of genes that were differentially expressed between 129 and C57BL/6 (Fig. 7A) or between ESC and EGC (Fig. 7B), only after RA induced differentiation. Genes that were overexpressed in the 129 cell lines after RA treatment were associated with immune response (Lgals4, H2-Bl, Ly6h, Cxcl10, Ccl3, H2-Q1, H2-Q5, H2-M10.2), muscle differentiation (Myocd, Myl7), and cell cycle control (Gas1, Fos) (Fig. 7A). In contrast, genes that were overexpressed in the C57BL/6 cell lines in response to RA were associated with secretion (Saa3, Saa1), the inhibition of Wnt signaling (Nkd1), and the extracellular matrix (Col2a1, Mmp9).
Genes that were overexpressed in ESCs compared with EGCs in the RA+ condition included the following functional groups: (i) muscle development and actin cytoskeleton (Myocd, Acta1, Myl7, Cgnl1, Flnb, Myl6, Tagln2, and Cald1), (ii) regulation of transcription (Creb3l3, Foxd1, Hipk3, Rarb, Setbp1, Meis1, and Bach2), and (iii) cell junction structure (Cgnl1, Flnb, Shrm/Shroom3)(Fig. 7B). These results indicate that genes activated in ESC lines by the RA treatment are more related to muscle differentiation than those activated in EGC lines. Genes that were selectively activated in EGCs compared with ESCs in the RA+ condition are known to be expressed in ovary (Obox6, Hoxc9, Mmp9, Saa1), blood (Hoxa9), and intestine (Cdx1). Obox6 is expressed uniquely in oocytes (Rajkovic et al., 2002), and Hoxc9 and Mmp9 are expressed in granulosa cells in the ovary (Huntriss et al., 2006; Ke et al., 2004). qRT-PCR analyses confirmed the overexpression of Myocd, Myl7, and Plat in ESC and Obox6 in EGC in the RA+ condition (Fig. 5C). These data thus indicate that RA induced some gonad-related genes more highly in the EGC lines than the ESC lines.
We have compared the gene expression profiles of pluripotent stem cell lines with different genetic backgrounds and derived from cells at different stages of development: the ICM of blastocyst (ESC) and primordial germ cells (EGC). In addition to the standard LIF+ culture conditions, we tested gene expression after LIF withdrawal and RA treatment. As far as we know, this is the first large-scale gene expression study of various types of pluripotent stem cell lines. This data set (Table S13) will thus be a valuable resource to the research community.
One of the main questions we wanted to address in this work was whether ESCs and EGCs can be distinguished based on global gene expression patterns. Both ESCs and EGCs are pluripotent stem cells and can differentiate into essentially all cell types in vivo and in vitro. Therefore, it has been an important question whether the different origins of these cells indeed cause any differences in their phenotypes and gene expression regulation (Donovan and de Miguel, 2003; Durcova-Hills et al., 2003; Hochedlinger and Jaenisch, 2006; Rossant, 1993). Here, we have shown many similarities and differences between ESCs and EGCs as well as between C57BL/6 lines and 129 lines. From the perspective of global gene expression profiles, we have shown in this paper that ESCs and EGCs are indistinguishable. First, when the expression profiles of multiple ESCs and EGCs were compared with those of NSC and TS cells, all ESCs and EGCs clustered as one group. Second, even when comparing only ESCs and EGCs, variations between individual cell lines were too great to clearly separate them into two distinct groups, and the differences between ESCs and EGCs were smaller than those between cells derived from different strains. Differences in global gene expression patterns measured by the PCA between one ESC line and another ESC line were larger than those between an ESC line and an EGC line. It is important to point out that if we had compared the global expression profiles of only a few cell lines, we would have erroneously concluded that ESCs and EGCs are separate entities and identified marker genes specifically expressed in either ESCs or EGCs.
Although we expected that our study would reveal differences between ESCs and EGCs when they were induced to differentiate, we failed to find such an effect in global gene expression patterns. In both ESCs and EGCs, the RA treatment caused a much stronger response in gene expression than the LIF withdrawal. This is consistent with the phenotypical differences observed between RA+ and LIF− conditions. The addition of RA affected the expressions of genes involved in a wide variety of lineages (neural, epithelial, endoderm, and mesoderm) with a bias towards neural differentiation, whereas upon LIF withdrawal changes were biased towards mesoderm lineage.
Considering possible differences between the nuclear reprogramming process of ICM cells converting to ESCs and that of PGCs converting to EGCs, the remarkable similarity of gene expression patterns between ESCs and EGCs was unexpected. The work presented here suggests that these reprogramming events proceed to the point where the global gene expression patterns of EGCs are inseparable from those of ESCs. Alternatively, this similarity between ESCs and EGCs may simply support the hypothesis that ESCs are also derived from PGC precursors that originate from the ICM (Zwaka and Thomson, 2005).
Although large variations of gene expression profiles between cell lines within each category (e.g., ESCs) make it difficult to separate pluripotent cells into distinct groups as discussed above, it is possible to identify signature genes by first grouping the cells according to their categories and then carrying out statistical analysis to identify genes differentially expressed between groups. Using this approach, we identified 84 EGC signature genes and 20 ESC signature genes, which showed the most consistent and differential expression between EGC and ESC lines, irrespective of mouse strains. Although none of these genes was exclusively expressed in either EGCs or ESCs, in combination they provide a reasonable separation between these two cell types. These signature genes included several imprinted genes: Grb10 was overexpressed in ESC lines and Snrpn, U2af1-rs1, H19, and Nnat were overexpressed in EGC lines. In total, 13 imprinted genes were differentially expressed between ESCs and EGCs in at least one culture condition. Imprinted genes are thus one of the few features that reflect the origin of the pluripotent cell lines. However, imprinted genes constitute only a minor portion (~5%) of ESC- and EGC-signature genes and cannot account for the majority of differences between ESCs and EGCs. Although EGC signature genes identified in the standard LIF+ culture condition did not include germ line-specific genes, some ovary-related genes were upregulated in these cells in the RA+ condition. The level of induction of genes related to muscular differentiation was lower in EGCs compared with ESCs. As a result, EGCs might be a more suitable starting material for generation of germ cells in culture and would be a poor choice in muscle development experiments. The most significant differences in induction of differentiation-related genes between EGCs and ESCs were observed for muscle-related genes in the RA+ condition (Fig. 8).
It was reported previously that mouse EGC line (EG-1) undergoing differentiation in the embryoid body (EB) system has a limited ability to differentiate into muscle and cardiac lineages (Rohwedel et al., 1996). In addition, the proportion of cells in G1 phase of the cell cycle in EGC was lower than in ESC in the RA+ condition. These findings might indicate that EGCs were less differentiated in the RA+ condition than ESCs, which could be caused by either delayed response or lower sensitivity to differentiation stimuli.
Using similar approaches, we have also identified a set of genes that can distinguish C57BL/6 and 129 cells by their average expression levels. One notable difference between C57BL/6 cells and 129 cells is their relative pluripotency status at day 3 of culture in the standard LIF+ condition. Cell lines derived from the C57BL/6 strain have their gene expression already shifted towards a more differentiated state, although they still have a high expression of canonical pluripotency markers (Pou5f1, Nanog, Sox2) and maintain a normal ESC-like phenotype (Fig. 1) and a normal ESC-like cell cycle (Fig. 2). Interestingly, Zfp42, which is a well-known marker for pluripotent cells, belongs to the 129-signature genes. Although Zfp42 was expressed at a relatively high level in C57BL/6 cell lines in the standard LIF+ condition compared to tissue-specific stem cells, its expression was up to 10 times higher in 129 cell lines than in C57BL/6 cell lines. We also noticed that the C57BL/6 cells tended to show a slight reduction of ALP staining level and a slight flattening of the cell colonies at day 3 in the standard LIF+ condition, compared with 129 cells (data not shown). This earlier shift towards differentiation may reflect the cells' ability to respond to differentiating stimuli. However, this shift to differentiation is only a minor aspect of differences in gene expression between C57BL/6 and 129 strains, because the majority of strain-specific signature genes were not related to cell differentiation.
It is known that it is easier to derive ES cell lines from the 129 strain than from most other mouse strains (Ledermann, 2000). ESCs derived from C57BL/6 are also known to have restricted differentiation ability in some conditions: for example, in medium without LIF and supplemented with RA they fail to differentiate into neurons and die within 2 weeks in culture (Ward et al., 2004). Their reported inclination to apoptosis matches well with our analysis (Fig. 3). They are also less efficient in producing chimeras and germ line transmission (Ward et al., 2004), although for practical applications the higher breeding efficiency of the C57BL/6 strain seems to compensate for these drawbacks (Seong et al., 2004). We have shown in this report that pluripotency-related genes are overexpressed in 129-derived cells compared with C57BL/6-derived cells, which might correlate with observations that 129-derived ES cells produce chimeras with a high percentage of germ line transmission. We also found candidate genes that may be responsible for the difference in phenotypes between mouse strains. In particular, we found genes that are likely to be associated with high resistance to oxidative stress, alcohol-preferring behavior, and a high level of apoptosis in C57BL/6 mice, as well as genes potentially associated with the high prevalence of testicular cancer in the 129 strain.
The current study has examined multiple ESCs and EGCs, but the scope is still limited to a subset of all available ESCs and EGCs. It will thus be interesting to extend the study to other ESC and EGC lines. Moreover, it will also be interesting to examine other reported pluripotent stem cells such as multipotent adult progenitor cells (MAPCs; Jiang et al., 2002), germline stem cells (GSCs; Kanatsu-Shinohara et al., 2004), and multipotent adult germline stem cells (mpGSCs; Guan et al., 2006) on the same platform to compare with the current data sets. The ESC and EGC signature genes that we have identified in this work may have a practical use in distinguishing between ESCs and EGCs as well as between 129 cells and C57BL/6 cells. Future studies on genes differentially expressed between 129 and C57BL/6 cells may also help to improve the germline transmission efficiency of C57BL/6 ES cells in mouse gene targeting strategies.
We used 6 mouse ESC lines, including 1 line derived from strain C57BL/6 and 5 lines derived from strain 129, and 10 mouse EGC lines, including 8 TGC lines from strain C57BL/6 (established in the laboratory of B.L.M.H.) and 2 EG lines from strain 129 (established in the laboratory of C.L.S.) (Table 1). BL6.9 (aka MC2-B6) and 129.3 (aka MC1) ESC lines were purchased from The Transgenic Core Laboratory of the Johns Hopkins University School of Medicine (Baltimore, MD, USA). ES-D3-GL ESC line was purchased from the American Type Culture Collection (ATCC, Manassas, VA, USA). CC9.3.1 ESC line, which was originally generated by Dr. Allan Bradley, was a kind gift of Dr. Grant R. MacGregor (University of California, Irvine, CA, USA). EBRTcH3 and MG1.19 ESC lines were kind gifts of Dr. Hitoshi Niwa (RIKEN Center for Developmental Biology, Kobe, Japan). Three cell lines were female and the others were male; two ESC lines (EBRTcH3 and MG1.19) were genetically modified (Table 1). For experiments, both ESCs and EGCs were cultured for 2 passages on gelatin-coated plates in the presence of Leukemia Inhibitory Factor (LIF) in order to remove feeder cells. Cells were then transferred to gelatin-coated 6-well plates at a density of 1-2×10⁵ cells/well (1-2×10⁴ cells/cm²) and cultured for 3 days in 3 different conditions: (1) complete ES medium: DMEM, 15% FBS; LIF (ESGRO, Chemicon, USA) 1000 U/ml; 1 mM sodium pyruvate; 0.1 mM NEAA, 2 mM glutamate, 0.1 mM beta-mercaptoethanol, and penicillin/streptomycin (50 U/50 μg per ml) (referred to as “standard LIF+ condition”); (2) complete medium without LIF (referred to as “LIF− condition”), and (3) complete medium with 1 μM RA (referred to as “RA+ condition”). Cells were cultured at 37 °C in 5% CO2 and the culture medium was changed daily. The undifferentiated state of pluripotent cells was confirmed by ALP staining using the Stem Cell Characterization kit (Chemicon, USA).
For cell cycle analysis, cells were plated on gelatin-coated 6-well plates at a density of 1×10⁴ cells/cm² (10⁵ cells/well). Cells were harvested at day 3, washed in PBS, fixed in 70% ethanol, treated with RNase A (1 mg/ml final concentration), and stained with propidium iodide (PI, Roche, 50 μg/ml final concentration). Cell cycle analysis was performed using FACScan. Cell cycle fit software (MacCycle-Phoenix Flow Systems) was used to estimate the percentages of cells in various stages of the cell cycle.
For microarray experiments we used all 6 ESC lines and 6 EGC lines in 3 culture conditions (LIF+, LIF−, and RA+) (see Table 1 for the list of cell lines). Experiments in the standard LIF+ condition were carried out with 2 to 3 biological replications, and experiments in the LIF− and RA+ conditions were done without replications. At day 3, Trizol™ (1 ml/well; Invitrogen, USA) was added to the well and total RNAs were extracted using Phase lock gel™ columns (Eppendorf/Brinkman) according to the manufacturer's protocol. Total RNAs were precipitated with isopropanol, washed with 70% ethanol, and dissolved in DEPC-treated H2O. Total RNA samples (2.5 μg) were labeled with Cy3-CTP using a Low RNA Input Fluorescent Linear Amplification Kit (Agilent, USA). A reference target (Cy5-CTP-labeled) was prepared from the Universal Mouse Reference RNA (Stratagene, USA). Labeled targets were purified using an RNeasy Mini Kit (Qiagen, USA) according to Agilent's protocol, quantitated by a NanoDrop scanning spectrophotometer (NanoDrop Technologies, USA), and hybridized to the NIA Mouse 44K Microarray v2.2 (whole genome 60-mer oligo; manufactured by Agilent Technologies, #014117) (Carter et al., 2005) according to the Agilent protocol (G4140-90030; Agilent 60-mer oligo microarray processing protocol - SSC Wash, v1.0). All hybridizations were carried out in the two-color protocol by combining one Cy3-CTP-labeled experimental target and the Cy5-CTP-labeled reference target. Microarrays were scanned on an Agilent DNA Microarray Scanner, using standard settings, including automatic PMT adjustment.
The data discussed in this publication have been deposited in NCBI Gene Expression Omnibus (GEO; Barrett et al., 2007; http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession number GSE5914. The data and analysis software are also available at the NIA ANOVA tool website (Sharov et al., 2005b; http://lgsun.grc.nia.nih.gov/ANOVA/). Differential gene expression in various cell lines in the standard culture condition was analyzed using the NIA Array Analysis software (Sharov et al., 2005b), which implements ANOVA statistics with two additional methods to reduce the number of false positives: (1) small error variances were replaced with the average error variance estimated from 500 genes with similar signal intensity, and (2) false discovery rates (FDR ≤ 0.05) were used to select genes with differential expression, instead of p-values (Benjamini and Hochberg, 1995). Mean center global adjustment was used to remove differences between 2 batches of replications, which were done with the same set of cell lines. Experiments without replication (LIF+, LIF−, and RA+ experiments that were done simultaneously) were analyzed with a 2-factor ANOVA (factorial design) with cell line (n = 12) and treatment (n = 3) as factors. The variance associated with the interaction of factors was treated as error variance. This analysis may be biased towards increased stringency, because some interaction terms may be biologically relevant, but the stringency was not compromised by these assumptions. Error variance adjustment and FDR estimation were done as described above.
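As an illustration only, the 2-factor design without replication described above can be sketched for a single gene in Python. This is not the NIA Array Analysis implementation; the function name and the toy values are hypothetical, and the error-variance adjustment and FDR steps are omitted. The sketch shows how the line-by-treatment interaction sum of squares serves as the error term when each cell line is measured once per condition.

```python
from statistics import mean

def two_factor_anova_no_replication(table):
    """F statistics for one gene; table[i][j] is the log-expression of
    cell line i under treatment j (one observation per combination).

    Without replication the line-by-treatment interaction cannot be
    separated from noise, so its sum of squares is used as error.
    """
    n_lines, n_treat = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    line_means = [mean(row) for row in table]
    treat_means = [mean(table[i][j] for i in range(n_lines))
                   for j in range(n_treat)]

    ss_line = n_treat * sum((m - grand) ** 2 for m in line_means)
    ss_treat = n_lines * sum((m - grand) ** 2 for m in treat_means)
    # Residual after removing both main effects = interaction = error
    ss_error = sum(
        (table[i][j] - line_means[i] - treat_means[j] + grand) ** 2
        for i in range(n_lines) for j in range(n_treat)
    )

    df_line, df_treat = n_lines - 1, n_treat - 1
    df_error = df_line * df_treat
    ms_error = ss_error / df_error
    return {
        "F_line": (ss_line / df_line) / ms_error,
        "F_treatment": (ss_treat / df_treat) / ms_error,
        "df": (df_line, df_treat, df_error),
    }

# Toy example (made-up values): 3 cell lines x 3 treatments
toy = [[1.0, 1.2, 2.0],
       [1.1, 1.2, 2.2],
       [0.9, 1.3, 2.1]]
result = two_factor_anova_no_replication(toy)
```

With 12 cell lines and 3 treatments, as in this study, the same calculation gives 11, 2, and 22 degrees of freedom for cell line, treatment, and error, respectively.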
Principal component analysis (PCA) of log-transformed gene expression levels was done using the SVD method within the NIA Array Analysis software (Sharov et al., 2005b) based on averages for each cell line. Individual replications were projected on the principal components and then ANOVA was used to estimate standard errors. Genes contributing to each principal component were selected based on the following criteria: (1) significant differential expressions between cell lines (FDR ≤ 0.05), (2) high correlation (abs(r)≥0.7), and (3) fold change in the direction of the principal component (fold change >1.5). Because we wanted to separate the effects of cell lines from those of culture conditions (LIF+, LIF−, and RA+), we carried out PCA for the log-expression of genes in the standard (LIF+) condition, and for the centered log-expression obtained by subtracting cell line-specific average log-expression in all 3 culture conditions.
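The PCA procedure above can be approximated with a short NumPy sketch. It assumes a genes × cell lines matrix of log-transformed averages and centers each gene across cell lines before the SVD; the centering convention, the function names, and the random test matrix are assumptions for illustration and are not taken from the NIA Array Analysis software.

```python
import numpy as np

def pca_svd(log_expr):
    """PCA of a genes x cell-lines matrix of log-expression averages.

    Returns gene loadings U, cell-line scores (components x cell lines),
    the fraction of variance explained, and the per-gene means used
    for centering.
    """
    X = np.asarray(log_expr, dtype=float)
    gene_means = X.mean(axis=1)
    centered = X - gene_means[:, None]        # center each gene across cell lines
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = s[:, None] * Vt                  # coordinates of each cell line
    explained = s ** 2 / np.sum(s ** 2)
    return U, scores, explained, gene_means

def project(U, gene_means, replicate):
    """Project one replicate (vector of log-expression per gene)
    onto the principal components defined by the averages."""
    return U.T @ (np.asarray(replicate, dtype=float) - gene_means)

# Toy data: 50 genes x 6 cell lines (random numbers, not real measurements)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
U, scores, explained, gene_means = pca_svd(X)
replicate_coords = project(U, gene_means, X[:, 0] + rng.normal(scale=0.1, size=50))
```

Projecting individual replicates with `project` places them in the same coordinate system as the cell-line averages, which is what allows their spread to be used for estimating standard errors.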
Signature genes for EGC and ESC lines and those for 129 and C57BL/6 strains were selected based on the ANOVA of grouped replications from all cell lines in the standard LIF+ culture condition (FDR ≤ 0.05) with additional filtering based on: (1) significant differential expression between all cell lines; (2) fold change >1.5; and (3) the presence of proper gene symbols. Even if a gene fell into both EGC-ESC and 129-C57BL/6 categories, it was included in only the one category with the stronger effect. Genes that were differentially expressed between EGC and ESC (or between 129 and C57BL/6) only in the RA+ condition were selected based on the following criteria: (1) differential expression in the RA+ condition was significant (FDR ≤ 0.05, fold change >1.5), and (2) the difference in ESC-EGC (or 129-C57BL/6) log-ratios between RA+ and LIF+ conditions was >0.176 (log10 scale, corresponding to a 1.5-fold change).
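The filtering logic can be illustrated with a simplified Python sketch. The record layout (`symbol`, `fdr_origin`, `diff_origin`, `fdr_strain`, `diff_strain`) is hypothetical, the test for differential expression between all cell lines is omitted, and "stronger effect" is assumed here to mean the larger absolute log10 ratio, which the text does not specify.

```python
import math

FDR_MAX = 0.05                  # FDR cutoff stated in the text
MIN_LOG10 = math.log10(1.5)     # 1.5-fold change on the log10 scale (about 0.176)

def classify(gene):
    """Assign a gene to at most one signature list.

    `gene` is a hypothetical record with keys:
      symbol       gene symbol, or "" if none is available
      fdr_origin   FDR for the EGC-vs-ESC contrast
      diff_origin  mean log10 expression, EGC minus ESC
      fdr_strain   FDR for the 129-vs-C57BL/6 contrast
      diff_strain  mean log10 expression, 129 minus C57BL/6
    """
    if not gene["symbol"]:
        return None                       # genes without a symbol are excluded
    hits = []
    if gene["fdr_origin"] <= FDR_MAX and abs(gene["diff_origin"]) > MIN_LOG10:
        label = "EGC" if gene["diff_origin"] > 0 else "ESC"
        hits.append((abs(gene["diff_origin"]), label))
    if gene["fdr_strain"] <= FDR_MAX and abs(gene["diff_strain"]) > MIN_LOG10:
        label = "129" if gene["diff_strain"] > 0 else "C57BL/6"
        hits.append((abs(gene["diff_strain"]), label))
    # Assumption: the "stronger effect" is the larger absolute log-ratio
    return max(hits)[1] if hits else None

# Made-up record: passes both contrasts, strain effect is larger
example = {"symbol": "GeneB", "fdr_origin": 0.01, "diff_origin": -0.2,
           "fdr_strain": 0.01, "diff_strain": 0.9}
category = classify(example)
```

Assigning each gene to a single list keeps the four signature sets disjoint, so a gene affected by both origin and strain is counted only once.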
Analysis of selected genes by gene ontology (GO) terms was carried out using the hypergeometric distribution (FDR ≤ 0.1) and enrichment ratio (>1.5) as significance criteria with the NIA Mouse Gene Index (ver. mm7) software (Sharov et al., 2005a). Only non-redundant genes with gene symbols were used for analysis. Because many GO categories are redundant (i.e., contain similar lists of genes), we adjusted the calculation of FDR in the following manner. First, we identified all redundant pairs of GO terms that had a correlation of gene presence ≥0.7. Second, in the list of GO terms ordered by increasing p-values, p_i, estimated from the hypergeometric distribution, GO term i was considered redundant if it was redundant to at least one preceding term. Finally, the FDR was:
$$\mathrm{FDR}_i = \frac{p_i \, N}{n_i}$$
where N is the total number of non-redundant GO terms, and n_i is the number of non-redundant GO terms from the start of the list up to term i. This is a modification of the standard equation for FDR (Benjamini and Hochberg, 1995).
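A minimal Python sketch of this redundancy-adjusted procedure is given below. It assumes that the adjustment takes the form FDR_i = p_i·N/n_i, as suggested by the definitions of N and n_i, and that the redundant pairs (correlation of gene presence ≥0.7) have already been computed. The function names and toy inputs are hypothetical and do not reproduce the NIA Mouse Gene Index code.

```python
from math import comb

def hypergeom_tail(k, K, n, N):
    """P(X >= k): probability of seeing at least k annotated genes in a
    list of n genes drawn from N genes, K of which carry the GO term."""
    total = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x)
               for x in range(k, min(K, n) + 1)) / total

def redundancy_adjusted_fdr(pvalues, redundant_pairs):
    """pvalues: {GO term: p-value}.
    redundant_pairs: set of frozenset({term_a, term_b}) judged redundant.
    Returns {GO term: FDR}, with FDR_i = p_i * N / n_i."""
    order = sorted(pvalues, key=pvalues.get)          # increasing p-value
    # A term is redundant if it pairs with any term earlier in the list
    redundant = [
        any(frozenset((term, prev)) in redundant_pairs for prev in order[:idx])
        for idx, term in enumerate(order)
    ]
    n_total = redundant.count(False)                  # N in the text
    fdr, n_i = {}, 0
    for term, is_redundant in zip(order, redundant):
        if not is_redundant:
            n_i += 1                                  # non-redundant terms so far
        fdr[term] = pvalues[term] * n_total / n_i
    return fdr

# Toy input (made-up p-values): terms A and B share most of their genes
pvals = {"A": 0.001, "B": 0.002, "C": 0.03}
fdr = redundancy_adjusted_fdr(pvals, {frozenset(("A", "B"))})
```

Because redundant terms do not increase N or n_i, a cluster of near-identical GO categories is penalized as a single test rather than as many independent ones.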
Primer sets for quantitative reverse transcriptase PCR (qRT-PCR) were designed and tested for SYBR Green chemistry (Supplemental Table S14) using an established in-house protocol (Carter et al., 2003). Total RNA was used to prepare cDNA as described previously (Carter et al., 2003). Because the microarray targets were oligo(dT)-primed, all cDNA synthesis reactions were oligo(dT)-primed as well, and qRT-PCR primer sets were designed so that PCR-amplicons were matched to the region upstream of microarray 60-mer oligo probes when possible, or to the region within the 650 bp downstream. These steps were taken to minimize possible bias due to the different locations of amplicons and microarray oligos. Reactions were run on ABI 7900 HT Sequence Detection Systems using the default cycling program, and data were processed using SDS 2.2 software (Applied Biosystems).
Fig. S1. Examples of genes that showed cell line-specific expression but did not reflect mouse strain or cell origin. Expression levels of four genes (Col6a3, Ctss, Lyz, and Ugt1a6a) were measured by qRT-PCR. If we had studied only one of these ESC or EGC lines as representative of ESCs or EGCs, these genes would have been selected as ESC- or EGC-specifically expressed genes.
Table S1. Analysis of variance (ANOVA) of log-transformed gene expression levels in pluripotent cell lines in the standard LIF+ condition.
Table S2. A list of genes whose expressions were significantly correlated with PC1+0.69·PC2 in Fig. 3. These genes represented mostly differences between mouse strains.
Table S3. A list of genes whose expressions were significantly correlated with PC3 in Fig. 3. These genes represented mostly differences between EGC and ESC lines.
Table S4. A list of genes whose expressions were correlated negatively with PC1 in Fig. 2. These genes can be considered “pluripotency-related genes”.
Table S5. GO annotations overrepresented in genes whose expressions were positively correlated with PC1+0.69·PC2 in Fig. 3. These GO categories were associated mostly with the 129 strain.
Table S6. GO annotations overrepresented in genes whose expressions were negatively correlated with PC1+0.69·PC2 in Fig. 3. These GO categories were associated mostly with the C57BL/6 strain.
Table S7. GO annotations overrepresented in genes whose expressions were positively correlated with PC3 in Fig. 3. These GO categories were associated mostly with EGC lines.
Table S8. A list of EGC-specific signature genes.
Table S9. A list of ESC-specific signature genes.
Table S10. A list of 129-specific signature genes.
Table S11. A list of C57BL/6-specific signature genes.
Table S12. Gene expression changes in differentiation-promoting culture conditions (LIF−, RA+).
Table S13. All the data of log-transformed (log10) gene expression levels normalized by signal intensities of Universal Mouse Reference RNAs.
Table S14. Primers used for quantitative RT-PCR.
We would like to thank Drs. Grant MacGregor and Hitoshi Niwa for providing ES cell lines. We would also like to thank Ms. Christa M. Morris and Dr. Robert Wersto for performing the flow cytometry analysis, and Mr. Dawood Dudekula for help with bioinformatics analysis. The work was supported by the Intramural Research Program of the National Institute on Aging, NIH.