To better understand the causes and consequences of variation among human pluripotent cell lines, we used genomic methods to characterize a panel of 20 ES cell lines and 12 iPS cell lines. All cell lines exhibited similar DNA methylation and gene expression levels, which clearly denoted them as pluripotent and set them apart from somatic cells. Despite their global similarity, we could identify in each cell line a number of genes that deviated from the DNA methylation or gene expression levels of the other cell lines. These cell-line-specific outliers were relatively stable over time, and our dataset suggests that some may have functional consequences, for example by interfering with differentiation into certain cell types. Cell-line-specific outliers were slightly more prevalent among iPS cell lines than among ES cell lines, but we could not find any epigenetic or transcriptional deviation that was unique to and shared by all iPS cell lines. This observation was confirmed by developing bioinformatic classifiers, which could correctly identify most but not all iPS cell lines in our dataset based on their DNA methylation and gene expression profiles.
These results suggest that ES and iPS cells should not be regarded as one or two well-defined points in the cellular space but rather as two partially overlapping point clouds with inherent variability among both ES and iPS cell lines. In this model, a single iPS cell line can be indistinguishable from ES cell lines, even though there is a difference in our current dataset between the average ES cell line and the average iPS cell line (denoted by two crosses in the corresponding figure).
These observations have important practical implications. On the one hand, equivalence to ES cell lines is unlikely to be a sufficient indicator of an iPS cell line's utility for a specific application, given that cell-line-specific outliers were prevalent even among ES cell lines. On the other hand, no single cell line may be equally powerful for deriving all cell types in vitro, implying that researchers would benefit from identifying the best cell lines specifically for each application. Unfortunately, the teratoma assay (Daley et al., 2009) does not provide the level of specificity and detail that would support application-specific selection of the most suitable cell lines (cf. Boulting et al., 2011). Teratoma assays are also too time-consuming and expensive to be feasible for validating a large cohort of iPS cell lines, highlighting the demand for more informative and efficient assays that can be used to validate human pluripotent cell lines.
We sought to address the need for better validation assays by developing a genomic scorecard of pluripotent cell line quality and utility. The cell-line-specific outliers detected by DNA methylation and gene expression profiling were aggregated into a deviation scorecard (Table S5), which enables researchers to quickly identify defects at known genes that are relevant for the intended application. This gene-specific view was complemented by the lineage scorecard, which provides a systems-level assay for quantifying how well each cell line can be differentiated into the neural and hematopoietic lineages, and into the three germ layers. We tested the practical utility of this scorecard by comparing its results with independently derived motor neuron differentiation efficiencies and showed that it was highly predictive (Boulting et al., 2011).
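To illustrate the aggregation idea behind a deviation scorecard, the following minimal Python sketch flags genes at which a candidate cell line falls outside the range spanned by a set of reference cell lines, and then reports which flagged genes belong to a user-supplied list of genes relevant for the intended application. It is not the procedure used in this study: the interquartile-range corridor, the thresholds `k` and `min_delta`, the function names, the gene labels, and all numeric values are hypothetical choices made for the example.

```python
from statistics import median, quantiles


def reference_corridor(values, k=1.5):
    """Return (low, high): the reference quartiles widened by k times the IQR.

    The IQR-based corridor is an assumption made for this sketch.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def deviation_scorecard(reference, candidates, genes_of_interest,
                        k=1.5, min_delta=0.2):
    """Summarize per-cell-line outliers relative to the reference lines.

    reference:  dict gene -> list of values across reference cell lines
    candidates: dict cell line name -> dict gene -> value
    A gene counts as an outlier if it lies outside the reference corridor
    AND differs from the reference median by at least min_delta.
    """
    corridors = {g: reference_corridor(v, k) for g, v in reference.items()}
    medians = {g: median(v) for g, v in reference.items()}
    wanted = set(genes_of_interest)

    card = {}
    for line, profile in candidates.items():
        outliers = []
        for gene, value in profile.items():
            if gene not in corridors:
                continue  # no reference data for this gene
            low, high = corridors[gene]
            outside = value < low or value > high
            large = abs(value - medians[gene]) >= min_delta
            if outside and large:
                outliers.append(gene)
        card[line] = {
            "n_outliers": len(outliers),
            "flagged_genes_of_interest": sorted(wanted.intersection(outliers)),
        }
    return card


# Synthetic DNA methylation fractions; gene and cell line names are placeholders.
reference = {
    "GENE_A": [0.10, 0.12, 0.11, 0.09, 0.13],
    "GENE_B": [0.80, 0.82, 0.78, 0.81, 0.79],
    "GENE_C": [0.50, 0.45, 0.55, 0.48, 0.52],
}
candidates = {
    "line_1": {"GENE_A": 0.11, "GENE_B": 0.80, "GENE_C": 0.51},
    "line_2": {"GENE_A": 0.65, "GENE_B": 0.81, "GENE_C": 0.49},
}

scorecard = deviation_scorecard(reference, candidates,
                                genes_of_interest=["GENE_A", "GENE_C"])
for line, summary in scorecard.items():
    print(line, summary)
```

Requiring both conditions (outside the corridor and a minimum absolute difference) is one way to avoid flagging genes whose reference values are so tightly clustered that a biologically negligible shift would otherwise count as an outlier.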
Because the scorecard does not involve any labor-intensive steps, it becomes feasible to quickly screen a large number of iPS cell lines in order to find the most appropriate cell lines for an intended application. Furthermore, the scorecard provides a substantially more detailed characterization than, for example, the teratoma assay, and it therefore seems plausible that genomic scorecards could over time supersede the teratoma assay as the gold standard for validating human pluripotent cell lines. To assist researchers who want to use the scorecard on their own cell lines, we provide an extended technical note in the Extended Experimental Procedures. The scorecard can readily be adapted to other protocols for DNA methylation and gene expression profiling, and it is easy to incorporate new cell types in the prediction of the lineage scorecard. In the future, it will be necessary to validate the predictiveness for additional directed differentiation protocols, and it may occasionally be necessary to recalibrate the scorecard (e.g., for directed differentiation protocols that do not involve an EB step). The scorecard could also provide a useful readout when optimizing cell culture conditions, developing new reprogramming protocols, or continuously monitoring cell line quality in large-scale production facilities. For example, it will be interesting to measure whether the use of integration-free methods for reprogramming (Soldner et al., 2009; Warren et al., 2010) has an effect on the differentiation propensities of iPS cell lines.
In conclusion, the discovery of human pluripotent cells and the reprogramming methods to produce them from selected patient populations have revolutionized the way we think about studying and treating human disease. However, if we are to use these discoveries efficiently and effectively to improve the lives of patients, we must continue to develop tools (such as the scorecard described herein) that optimize and streamline the selection and monitoring of pluripotent cell lines and their differentiating progeny.