The study of complex disease genetics by genome-wide association studies (GWAS) has led to hundreds of genomic loci associated with disease traits in humans. However, the functional consequences of most loci are largely undefined. We discuss here the potential for human induced pluripotent stem (iPS) cells to bridge the gap between genetic variant and mechanisms of complex disease. We also highlight specific diseases and the roadblocks that must be overcome before iPS cell technology can be widely adopted for complex disease modeling.
Over the last several decades, the study of human monogenic diseases has contributed significantly to the understanding of many disease mechanisms and the development of a number of therapies. In contrast, the mechanistic link between genotype and phenotype has been more difficult to establish in the study of complex genetic diseases. In its simplest form, a genetic disease is categorized as monogenic when it exhibits a clear genotype-to-phenotype relationship from mutation(s) in a single known gene. The discovery of such disease-causing genes relies on the use of linkage analysis and positional cloning to map single mutations within the genome to human disease traits in large families. Because monogenic disorders tend to be highly penetrant and the affected individuals have relatively limited phenotypic variability, these diseases are particularly amenable to mechanistic studies using in vitro and animal disease models. As such, investigations involving transgenic mice engineered to carry human disease mutations have revolutionized our understanding of disease pathogenesis and will continue to play a key role in biological discovery. Although monogenic diseases represent only a minor fraction of all human diseases, mechanistic and therapeutic insights gleaned from the study of these diseases have helped to facilitate our understanding of complex polygenic diseases.
With the advent of high-throughput sequencing technology and the completion of the Human Genome Project, we are now poised to address diseases at the other end of the spectrum: complex polygenic diseases that arise from aberrant interplay among many genetic, epigenetic, and environmental factors (Figure 1). By nature, complex diseases are difficult to model using the conventional cellular and animal model systems that have been successful thus far in modeling monogenic disorders. Unfortunately, complex diseases include some of the most prevalent and morbid diseases afflicting humans: most forms of cancer, diabetes, and heart disease. While each perturbation may contribute only a fraction of the overall risk, en masse the combined effects are believed to lead to disease manifestations that can sometimes be heterogeneous. For example, coronary artery disease can present in a wide spectrum of disease states that range from the diffuse narrowing of all coronary arteries in an obese patient with longstanding diabetes to a single isolated major blockage of a coronary artery in a young, otherwise healthy patient. Given the inherent heterogeneity of the complex disease state, it is imperative to develop a model system that can biologically validate the role of each disease-associated locus. The ideal model system would be able to incorporate the effects of multiple genetic, epigenetic, and environmental perturbations.
Since 2005, there has been an exponential growth of GWAS linking regions of the human genome with complex human traits. Currently, there are over 1200 genome-wide associations of linked loci for over 200 complex traits with significance level of association at p < 10^−8 or better (see http://www.genome.gov/gwastudies for an updated statistic). For comparison, in the prior 15 years roughly 1200 genes were identified as causes of monogenic diseases, starting with the discovery of mutations in CFTR in cystic fibrosis in 1989. The human genome has over 10 million single nucleotide polymorphisms (SNPs) with a minor allele frequency of at least 5%, and these SNPs tend to be inherited in blocks throughout the genome called haplotypes. By studying the associations of SNPs that tag each haplotype in large patient cohorts (usually case-control studies), investigators have been able to link haplotypes with complex disease traits such as type 2 diabetes and coronary artery disease [3,4]. With the explosion in the identification of human disease-associated gene variants in the past 7 years, this era will be recognized as one of the most prolific periods of discovery in human genetics. Recent examples of novel biological insights from GWAS include the understanding of the role of immunity in macular degeneration as well as the importance of insulin production in type 2 diabetes.
GWAS-based investigation is centered on the hypothesis that common diseases are caused by common genetic variants. This hypothesis assumes that the majority of a population’s genetic risk for a common disease is accounted for by a limited number of common variants with population frequencies greater than 1–5%. The basis for such an assumption is that only variants occurring at such frequencies can reach genome-wide statistical significance with the current methodology for GWAS analysis. Examples of complex diseases with important risk alleles meeting this frequency requirement include the Factor V Leiden mutation in deep venous thrombosis [7,8] and the ApoE type IV variant in Alzheimer’s disease, each with an estimated prevalence of ~5–15% in the general population. Interestingly, in most GWAS published thus far, the trait-associated SNPs have tended to be of small effect size, with an odds ratio per copy of the risk allele no greater than 1.33. Even in sum, SNPs from all GWAS explain only a minority of the total population risk for the vast majority of complex diseases. Furthermore, since only 12% of trait-associated SNPs fall within protein-coding regions, it has been difficult to uncover the functional significance of each locus. However, a recent investigation found a causal relationship between a SNP variant from lipid GWAS and the transcriptional regulation of sortilin, raising the prospect of finding the biological meaning of other GWAS loci.
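To make the notion of a small per-allele effect size concrete, the following is a minimal sketch of how an allelic odds ratio and its approximate 95% confidence interval (Woolf's log-based method) might be computed from a case-control allele count table. The function name and the allele counts are hypothetical and chosen only for illustration; they are not taken from any study discussed here.

```python
import math

def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 allele count table.

    The interval uses Woolf's method: a normal approximation on the log odds ratio.
    """
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - 1.96 * se_log_or)
    upper = math.exp(log_or + 1.96 * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical allele counts (two alleles per person): 2000 cases, 2000 controls.
odds_ratio, ci_lower, ci_upper = allelic_odds_ratio(
    case_risk=1200, case_other=2800, control_risk=1000, control_other=3000
)
print(f"OR = {odds_ratio:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

With these illustrative counts the odds ratio is roughly 1.29, which is in the range typical of the trait-associated SNPs described above. Even with thousands of participants the confidence interval remains fairly wide, which is one reason why large cohorts are needed to detect such effects.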
Given the daunting challenge of identifying the biological significance of more than 1200 genome-wide associations to date, any progress in our ability to translate these SNPs into novel pathways would be welcome. In 2006, Takahashi and Yamanaka published the first seminal paper on somatic cell reprogramming to pluripotent stem cells by enforced expression of four key transcription factors: OCT4, SOX2, KLF4 and c-MYC. Quickly thereafter, the potential for induced pluripotent stem (iPS) cells to model complex diseases became obvious, since iPS cells are able to capture all of the variation in the human genome while delivering differentiated cell types that can be investigated.
Human iPS cells have several key features that make them amenable to in vitro disease modeling. They are accessible from adult somatic tissues, scalable given their ability to self-renew indefinitely, programmable to differentiate into a wide variety of cell types in vitro, and able to recapitulate the human genetic state inclusive of all of the variation of the human genome. The promise of patient-derived iPS cells for disease modeling increases with the growing list of publications that have reported faithful reproduction of disease phenotypes in mostly monogenic diseases. However, there remain major roadblocks to the development of iPS cell models for most complex diseases. In vitro modeling with iPS cells has a limited ability to recreate the appropriate environmental and epigenetic influences on differentiated iPS cells. For complex diseases that lack a robust in vivo phenotype, it will be difficult to validate in vitro phenotypes. Moreover, the selection of the candidate complex disease for in vitro modeling requires careful consideration of the robustness of the genetic association, the ease of phenotype assessment, the available knowledge regarding environmental factors influencing disease onset, and the time course of disease onset. A comparison of the advantages and disadvantages of in vitro ES cell and iPS cell models with gene-targeted animal models is shown in Table I. We believe that human iPS cell-based disease modeling should be pursued only after a careful comparison with the most promising animal model for the same disease, since human iPS cell generation and characterization can be challenging and laborious. Until there are more data to support the use of iPS cells to model genetic diseases in vitro, the full extent of the impact that iPS cells will have on our understanding of GWAS findings and the biological mechanisms associated with those variants remains unclear.
In the following sections, we discuss specific complex diseases and their associated GWAS variants that have been uncovered thus far. The promises and pitfalls for iPS cells to model these diseases will be highlighted and our speculation regarding the likelihood of success of iPS cell models in these diseases will be presented.
Despite the ability of iPS cells to differentiate into many human cell types in vitro, the use of iPS cells to model a particular complex disease will require at least some knowledge of the cell type that is affected by the disease gene variant. For example, in early onset coronary artery disease, GWAS have identified SNPs (e.g. LDL-R and PCSK9) that are most likely related to cholesterol metabolism in the hepatocyte. On the other hand, for SNPs found near genes with unclear relationships to coronary artery disease (e.g. 9p21 locus near CDKN2A/B), it is difficult to envision how iPS cells may help us understand the effect of these variants on disease phenotype. Nevertheless, in the last two years several investigators have found a link between the 9p21 locus and endothelial cell response to inflammation in animal models, raising the possibility that endothelial cells derived from patient-specific iPS cells may be useful for further mechanistic studies [16,17]. One may need to recapitulate the interplay between inflammatory leukocytes and endothelial cells to elicit the appropriate phenotypic effect of the 9p21 SNP variant using in vitro differentiated iPS cells. However, if the identified SNPs affect the function of many cell types, it will be unclear which and how many different cells and interactions will need to be reproduced to elicit the disease phenotype in vitro.
Assuming that the appropriate cell type has been identified, there needs to be a well-developed protocol to direct the differentiation of iPS cells to the particular cell of interest. For example, iPS cells can be directed towards several cell types either spontaneously or with the use of growth factors and/or small molecules. Directly differentiated iPS cells have also been employed in the modeling of several monogenic neurodegenerative, cardiac, and metabolic disorders [18,19,20]. The efficiency of directed differentiation is highly variable from cell line to cell line and is often lineage dependent. In some cases, the best efficiency achieved is less than 5%. This remains a key issue in iPS cell modeling of any disease. Additionally, each pluripotent stem cell line has its own intrinsic differentiation propensity, which limits the generalizability of published protocols for lineage-specific differentiation. This property of iPS cells requires the generation and screening of multiple cell lines in order to find the line that has the optimal capacity for differentiation into the cell type of interest. Furthermore, the differentiated cells that are derived from iPS cells are generally immature and at best mimic cells with embryonic or neonatal phenotypes rather than mature adult phenotypes [22,23,24]. If the complex disease manifests later in life and is dependent on having the appropriate environmental or epigenetic influences on mature adult cells, then it is unclear whether iPS cells can be employed to model such diseases at all.
Unlike in monogenic disorders, the phenotype of most complex diseases is modulated by incomplete penetrance, mild severity, and delayed onset. Any cellular assay to assess the mechanism of disease will require the highest sensitivity and specificity to uncover subtle phenotypes. This implies that a phenotypic “signal” from iPS cell-derived cells that is due to a gene variant will have to be robust and reliable enough to offset the inherently “noisy” process of somatic cell reprogramming, iPS cell differentiation and maturation. The presence of residual transgene expression from viral reprogramming factors, and epigenetic variability from both “memory” of the somatic cell origin and the reprogramming process itself, may influence the in vitro phenotype significantly. For the former, the use of non-integrating vectors (e.g. non-integrating DNA such as episomal vectors and mini-circle plasmids, modified RNA, and protein-based systems) or excisable vectors (Cre recombinase or transposases) will minimize, if not eliminate, genome modifications associated with viral-mediated reprogramming. The residual epigenetic memory of the somatic cell of origin due to incomplete reprogramming has also been shown to significantly affect the differentiation potential of iPS cell lines [25,26], and may affect the interpretation of phenotypes from diseased iPS cells. While the influence of epigenetic memory appears to diminish with further passaging of cell lines, the ongoing debate regarding whether iPS cells are truly equivalent to ES cells serves to remind us that the biology of somatic cell reprogramming is in its infancy.
In some cases the variability in background genetics, epigenetics, and viral integration can be controlled by genome editing technology. While homologous recombination in mouse ES cells has been a powerful tool for creating genetically engineered models, human pluripotent stem cells have been more difficult to manipulate at the genomic level. To overcome this, Soldner et al. used zinc-finger nucleases (ZFNs) to modify the locus for alpha-synuclein to generate matching case and control iPS cell lines (i.e. isogenic except for a single base pair change) for Parkinson’s disease [27,28]. More recently, TALE nucleases were used to introduce or correct single base pair changes with high efficiency and accuracy. By using ZFNs to either delete or introduce two common mutations in Parkinson’s disease genes in iPS cells, Soldner et al. attempted to incorporate the genomic variability of a complex disease while introducing the specific SNP variation in question, such that the effect of this gene variant could be compared in isolation.
While genome editing with ZFNs or TALE nucleases is a conceptually straightforward way to investigate the effects of single SNPs on iPS cell-based disease phenotype, the likelihood is low that introducing only one single gene variant will lead to a reproducible effect on iPS cell phenotype in vivo. Hence, additional approaches will be needed to bring greater value to iPS cell-based disease modeling. Given the current limitations in functionally identifying the causal gene within a disease-associated locus, it is possible that iPS cell-derived cells from a large population of patients (>200) may be used to generate expression quantitative trait loci (eQTL) data to correlate transcriptional changes in iPS cell-derived cells with genomic SNPs. Furthermore, to directly assess the potential role of candidate genes near a disease-associated locus, one can perform gain- and loss-of-function studies using over-expression and knock-down approaches in iPS cells to see whether there is a corresponding phenotype change. If this approach is successful, one can envision performing combinatorial studies in the same iPS cell line to assess the contribution of multiple genes or loci to the disease phenotype. In this way, the role of multiple genes in a complex disease phenotype may be deconvoluted systematically.
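The eQTL idea described above can be illustrated with a minimal sketch: for a single SNP and a single gene, expression in the iPS cell-derived cells of each donor is regressed on the number of risk alleles that donor carries (0, 1, or 2). The additive model, the function name, and the six data points below are assumptions made purely for illustration; a real analysis would involve hundreds of donors, covariates, and correction for multiple testing.

```python
def eqtl_fit(dosages, expression):
    """Ordinary least-squares fit of expression on risk-allele dosage.

    Returns (slope, intercept); the slope is the estimated change in
    expression per additional copy of the risk allele.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: risk-allele dosage per donor and normalized expression
# of a candidate gene measured in that donor's iPS cell-derived cells.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2]

slope, intercept = eqtl_fit(dosages, expression)
print(f"expression changes by {slope:.2f} per risk allele (baseline {intercept:.2f})")
```

A slope that differs reliably from zero across many donors would suggest that the SNP, or a variant in linkage with it, influences transcription of the candidate gene in that cell type.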
Since complex polygenic diseases often manifest in the context of cumulative interactions among several genetic, epigenetic, and environmental perturbations, it is important for any model of these diseases to replicate the effects of such non-cell autonomous interactions. Furthermore, it has become clear that complex disease phenotypes are often caused by abnormal interactions between tissues and organs composed of multiple cell types. Since iPS cells are typically directed to differentiate into one specific cell type, multiple different cell sources would need to be brought together to create the environment necessary for disease phenotype manifestation. An example of how a non-cell autonomous requirement can be recapitulated during iPS cell-based disease modeling is the recent demonstration that glial cells from normal but not diseased sources can support the survival of motor neurons generated from iPS cells from patients with Amyotrophic Lateral Sclerosis, a progressive monogenic neurodegenerative disease due to loss of motor neurons.
Type 1 diabetes (T1DM), characterized by the progressive autoimmune-mediated destruction of the insulin-producing beta cells of the pancreas in mostly young individuals, is another example where the interaction among multiple different cell types is essential to faithfully replicate disease phenotype. The loss of insulin production leads to chronic hyperglycemia and multi-organ dysfunction. Although T1DM is classified as an autoimmune disorder with progressive immune-mediated destruction of pancreatic beta cells, the trigger for disease onset remains unclear. It has been suggested that at least three cell types (the cytotoxic T cell that attacks the beta cell, the pancreatic beta cell itself, and the cells within the thymus that assist with T-cell “education” and maturation) are critical in the development of this disease. Several genome-wide association studies of humans with T1DM have been published thus far. These studies showed that SNPs within or near the HLA class II, insulin, CTLA4, and IL2RA loci, among others, correlate with T1DM manifestation and confirm the central importance of the T-cell, the pancreatic beta cell, and the thymic epithelial cell in the human disorder. Ongoing work to build a humanized T1DM model in vivo via chimeric mouse technology should be highly informative. The success of this strategy will depend on our ability to generate human beta cells, T-cells, and thymic epithelial cells from T1DM patient-specific iPS cells. These cells may then be transplanted individually or all together into an immunocompromised mouse (NOD/SCID) to create a partially “humanized” mouse model. The humanized mouse carrying these different cell types from iPS cells derived from patients with T1DM would enable the identification of the relationship between SNP variant and the appropriate cell type that gives rise to disease.
The wide availability of somatic cell reprogramming technology has now led to the generation of a large number of patient-specific iPS cell lines (for a review see ). In most of the reported studies, the in vitro phenotype is a single cell-based extrapolation of a known in vivo phenotype (e.g. prolonged action potential duration in vitro in cardiomyocytes generated from Long QT Syndrome iPS cells or enlarged cardiomyocytes in iPS cells derived from LEOPARD Syndrome patients who are predisposed to hypertrophic cardiomyopathy). While in many cases these in vitro phenotypes are reasonable surrogates for the in vivo disease phenotype, in some diseases, particularly psychiatric and behavioral disorders, this correlation is less clear. For example, neurons differentiated from schizophrenic patient-derived iPS cells have been reported to show abnormalities in neuronal connectivity, decreased neurite number, and altered gene expression. The investigators specifically evaluated these phenotypes because of their reported presence in a mouse model of schizophrenia and in autopsy brain specimens from patients with schizophrenia. Furthermore, the investigators showed that treatment of schizophrenia iPS cell-derived neurons with anti-psychotic medications was able to reverse the changes in neurite connectivity and aberrant gene expression. Since it remains unclear whether the changes in neurite connectivity and number are due to genetic or environmental causes, it is unclear whether the iPS cell phenotype will be reliable for investigation of gene variants. Given that GWAS for schizophrenia have been less successful in identifying disease-associated loci at a genome-wide level of statistical significance (i.e. p < 1×10^−8), despite heritability as high as 80%, the utility of schizophrenia iPS cells for in vitro validation of genetic variation appears limited.
The study of complex disease genetics by GWAS has led to significant discoveries of novel genes and pathways in some of the most prevalent and morbid diseases that affect humans. However, the exact relationships between gene variants and disease phenotypes for most loci are unknown thus far. There remains a significant unmet need to develop in vitro model systems that will allow the exploration of genomic loci both to uncover insights into disease pathogenesis and to serve as a tool for therapeutic development. With the ability to create iPS cells from complex disease patients, investigators will now be able to examine whether the “disease-in-a-dish” approach can help to validate the correlation between phenotypes and genotypes arising from the rapidly accumulating GWAS data. With the ongoing development of molecular tools that facilitate iPS cell-based disease modeling, such as integration-free reprogramming vectors, engineered nucleases that can precisely modify human DNA sequences, and well-validated differentiation protocols, the current limitations of iPS cell-based studies will hopefully be overcome in the near future. In time we anticipate that iPS cells should join classic mammalian model systems such as cell lines and genetically modified mice as central tools in biomedical discovery to uncover the vast potential of GWAS.