Down syndrome (DS) is a complex developmental disorder with diverse pathologies that affect multiple tissues and organ systems. Clear mechanistic description of how trisomy of chromosome 21 gives rise to most DS pathologies is currently lacking and is limited to a few examples of dosage-sensitive trisomic genes with large phenotypic effects. The recent advent of cellular reprogramming technology offers a promising way forward, by allowing derivation of patient-derived human cell types in vitro. We present general strategies that integrate genomics technologies and induced pluripotent stem cells to identify molecular networks driving different aspects of DS pathogenesis and describe experimental approaches to validate the causal requirement of candidate network defects for particular cellular phenotypes. This overall approach should be applicable to many poorly understood complex human genetic diseases, whose pathogenic mechanisms might involve the combined effects of many genes.
Down syndrome (DS) is a complex disease caused by trisomy of human chromosome 21 (HSA21) that occurs in about 1 in 750 live births. In most cases the extra HSA21 is the result of a maternal segregation error in meiosis I [2, 3] or, in rare cases, Robertsonian translocations (4%) or mosaicism (1%). Individuals with DS exhibit a large range of pathologies. Some are highly penetrant, such as cognitive defects, premature Alzheimer disease and aging, and dysmorphic facial features; others are highly variable, such as cardiac defects, which occur in 50% of cases. DS individuals also have an increased risk of duodenal stenosis (250 times normal risk), increased incidence of Hirschsprung disease and leukemia (30 times and 300 times normal risk, respectively), and reduced incidence of solid tumors.
Although HSA21 is considered to host approximately 350 protein-coding genes, for many years the prevailing hypothesis was that DS was the result of dosage sensitivity of only a small subset of these genes, which are clustered together in one chromosomal region denoted the Down syndrome critical region (DSCR) [6–11]. However, chromosomal engineering experiments in mice and, more recently, genotype-phenotype analysis of human individuals with partial trisomy for regions of HSA21 [13, 14] have now demonstrated that the complete DSCR is neither necessary nor sufficient for most DS phenotypes. Rather, these analyses have revealed regions linked to individual DS phenotypes that are distributed over large regions of chromosome 21, suggesting that there are potentially many causative genes in DS (summarized in Fig. 1).
The new zeitgeist proposes that DS phenotypes arise from a complex interplay of these many causative genes, which can exert pathological influence at various stages of development or maturity . Under this hypothesis, it has been suggested that the genetic mechanisms responsible for DS phenotypes can be subdivided into the following (from ): small sets of dosage-sensitive genes, such as APP, in early onset Alzheimer disease pathology [16, 17]; interacting genes that drive major effects, such as DSCR1 and DYRK1A in heart abnormalities and solid cancer risk [18, 19]; or the collective influence of many coincident small effects. Although this latter mechanism has not been formally demonstrated, small sets of causative genes responsible for most DS phenotypes have resisted identification, and it is increasingly becoming acknowledged that coincident small effects are major drivers of complex disease pathogenesis [20–23]. Moreover, this possibility is supported by the observations that most trisomic genes in DS are indeed overexpressed in multiple tissues [24–29] and that gene deregulation in DS further impacts up to one-third of disomic genes [29–31]. In addition to dosage effects, this gene deregulation seems to result from complex cis- and trans-, intra- and interchromosomal regulatory interactions, including some that may impose genome-wide effects.
As with many complex diseases, most DS phenotypes manifest with variable expressivity. This could be due in part to allelic variation in modifier genes, with only certain combinations of alleles exerting pathological effects, or chromatin modifications such as parental imprinting, which may alter the expression of particular modifier genes [15, 30]. These effects contribute to a complex interplay between genetic, developmental, and external parameters, which collectively determine phenotype expressivity (summarized in Fig. 2).
Mouse models are currently the primary tools used to study DS etiology and have contributed much of our current understanding of DS pathogenesis [32, 33]. These models are powerful because they provide researchers with access to developing tissues and are compatible with a wealth of technologies that allow the roles of particular genes in establishing disease phenotypes to be interrogated. It has become clear, however, that for complex diseases such as DS, with potentially many causative genes, generating mouse models with the exact genetic defect found in the human disorder might not be possible, resulting in an imperfect recapitulation of disease etiology and potentially confounding species-specific disease mechanisms. It is additionally concerning that for many DS pathologies (such as cognitive impairment) the affected mouse and human developmental processes differ significantly.
The most sophisticated mouse models of DS include mice that contain single-chromosome duplications for the three mouse chromosomal regions orthologous to HSA21 or contain an extra copy of the complete HSA21 (Tc1 mice). Triple segmental duplication DS mouse models display cognitive impairment, including reduced hippocampal long-term potentiation, and hydrocephalus. However, these mice lack β-amyloid plaque deposition, which is highly penetrant in DS. It is thought that, beyond mouse-human brain differences, such discrepancies could arise because mouse-human orthology of HSA21 is not complete [38, 39].
Tc1 mice somewhat circumvent the issue of incomplete mouse-human orthology for modeling DS by carrying a complete copy of HSA21. However, the utility of these mice is limited by the variable contribution of humanized ES cells to mouse chimeras and by the fact that they are sterile, limiting analysis to mice that are mosaic for the additional copy of HSA21. Nevertheless, these mice were found to recapitulate many, but still not all, of the brain-, heart-, and craniofacial development-related DS traits observed in humans. Apart from intrinsic developmental differences between mice and humans, this could be because human genes placed in a mouse cellular context are regulated and behave differently than they would in a human.
Reprogramming of somatic cells of disease-affected patients to a pluripotent state, generating disease-specific induced pluripotent stem cells (iPSCs), has recently emerged as an alternative paradigm for modeling human genetic disease in vitro [40, 41]. In combination with directed differentiation protocols, these iPSC models theoretically make it possible to derive any given disease-affected human cell-type . Fluorescence-activated cell sorting purification protocols further allow isolation of highly reproducible populations of diseased and wild-type cells , which may then be interrogated by the library of tools currently used for the manipulation of human cells in vitro. Because of this, disease-specific iPSCs not only supersede much of the utility offered by mouse disease models but also do so within a human cellular context that is able to faithfully capture complex genetic disease genotypes, making them a particularly powerful tool for investigating the mechanisms of complex genetic diseases such as DS. Disease-specific iPSCs also have the potential to increase our understanding of the genetic (recapitulated by iPSCs) versus environmental (not recapitulated) contributions to complex disease phenotypes.
An increasing array of disease-specific iPSC lines have been generated , but to date their utility in disease modeling has mainly been demonstrated for single-gene disorders [40, 45–48]. We recently demonstrated that neurally differentiated DS iPSCs are able to recapitulate several well-established DS phenotypes, including increased gliogenesis and sensitivity to oxidative stress-induced apoptosis, the latter of which could be pharmacologically prevented . Another group has also recently shown that DS iPSC-derived neurons recapitulate Alzheimer disease-associated amyloid and tau pathologies in vitro , which seems to demonstrate that iPSC disease models are also able to recapitulate degenerative phenotypes over relatively short time scales in vitro.
Schizophrenia is another complex disorder that has recently been successfully modeled using iPSCs . Schizophrenia iPSC-derived neurons were reported to display reduced neuronal connectivity (assessed by an elegant in vitro assay measuring the trans-synaptic transfer of a modified rabies virus), reduced neurite extensions, reduced PSD95 gene and protein expression, and reduced glutamate receptor expression, metrics consistent with phenotypes observed in post-mortem schizophrenia brains . Notably, some of these phenotypes in schizophrenia iPSC-derived neurons could be partially rescued with the known antipsychotic drug loxapine, suggesting that these cells will additionally have utility in drug screening . Taken together, these studies demonstrate that iPSCs are able to recapitulate both DS and other complex disease phenotypes, including those with both early and delayed disease onset, and that disease-specific iPSCs allow testing of corrective therapeutics.
It is tempting to speculate that disease-specific iPSCs will become a core component of future investigations into the mechanisms of complex human genetic disease in general. However, before too much expectation is placed on these models for disease mechanism discovery, researchers need to be aware of several potential technical confounders and other limitations. An obvious limitation is that some disease phenotypes arise from interactions between multiple cell types and involve three-dimensional or non-cell-autonomous parameters that can only be modeled in vitro (if at all) by well-designed organoid or coculture assays, which are technically challenging to engineer. Phenotypes that arise from a combination of genetic background and environmental influences (lifestyle, diet, etc.) over prolonged periods of time will also remain difficult to model in vitro. Beyond these inherent limitations, significant gaps in directed differentiation protocols currently prevent the derivation of sufficiently pure populations of cells for many lineages, and existing protocols are susceptible to potentially confounding variability in differentiation potential between human embryonic stem cell or iPSC lines (including lines from the same individual/genotype). It is also known that the process of reprogramming can generate genetic and epigenetic variation between iPSC lines [53–57] and that iPSCs seem to retain an epigenetic memory of their cell type of origin [58–61], both of which can have a confounding influence on postreprogramming developmental trajectories and other cellular processes. Another consequence of epigenetic memory is that disease-associated epigenetic aberrations from donor somatic tissues could plausibly be preserved through reprogramming, introducing artifactual epigenetic aberrations into patient-derived iPSC lines that would not normally be present in pluripotent cells.
For these reasons, when using iPSC disease models it is important to generate multiple lines from prudently selected tissues of multiple affected and nonaffected (ideally from siblings or parents) individuals and to prioritize the use of nonintegrating reprogramming technologies, which have been shown to generate less reprogramming-associated genetic and epigenetic variability . Although this is a significant technical and financial burden, it will help to reduce noise in analyses from genetic variation and experimental artifacts, both maximizing power to identify pathogenic pathways and reducing the likelihood of discovering false disease mechanisms.
A picture of many complex diseases such as DS is emerging in which pathological processes can result from perturbed behaviors of biological networks, driven by the collective effects of many risk genes acting in concert within related network modules [20–23]. This is in contrast to single-gene disorders or more simple DS genetic mechanisms, where altered function or dysregulation of small sets of genes is sufficient for phenotype manifestation, and necessitates a different approach to identifying causative complex disease mechanisms. If we are to understand the more complex genetic mechanisms of diseases such as DS, the questions become (a) how are we going to acquire a complete picture of the genetic defect caused by trisomy 21? (b) how are we going to predict the causative genetic mechanisms from this large data set? and (c) how are we going to prove that these mechanisms are indeed causative?
Technologies such as microarrays [63, 64] and, more recently, RNA sequencing [65, 66] have emerged as powerful tools allowing quantification of genome-wide gene expression. Microarray studies on peripheral blood cells or post-mortem material have confirmed that most HSA21 genes are overexpressed by 1.5-fold in DS, reflecting gene dosage [24–27, 29]. It was these studies that also surprisingly identified significant global transcriptome deregulation in DS [29, 30]. Our own transcriptome study of DS iPSCs confirmed that thousands of genes are deregulated across the genome even in pluripotent DS cells , further suggesting that this global deregulation may be involved in establishing DS developmental defects from the earliest stages of development. Global transcriptome deregulation in DS is likely to result from HSA21 transcription factors, epigenetic machinery, and noncoding RNAs acting in trans. For example, overexpression of HSA21 microRNAs (miR-155 and miR-802) leads to progressive silencing of MeCP2, CREB1, and MEF2C in DS post-mortem brain specimens . Application of microarrays and RNA sequencing technologies across periods of development in DS can also provide insight into developmental disease mechanisms, and the dynamics of transcriptome deregulation in DS. Such an approach was used to identify Notch signaling and cell migration abnormalities during DS cerebellar development and demonstrated that overexpression of trisomic genes was remarkably stable through development .
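The 1.5-fold figure follows from copy number alone: three copies of a gene in trisomic cells versus two in disomic cells. A minimal Python sketch of that arithmetic is given here, assuming expression scales linearly with copy number (a simplification). The function names and the tolerance value are illustrative assumptions and are not taken from the cited studies.

```python
import math


def expected_fold_change(copies_case: int, copies_control: int = 2) -> float:
    """Expected expression ratio if transcript output scales linearly with copy number."""
    if copies_case <= 0 or copies_control <= 0:
        raise ValueError("copy numbers must be positive")
    return copies_case / copies_control


def is_dosage_consistent(observed_ratio: float, copies_case: int = 3,
                         copies_control: int = 2, tolerance: float = 0.2) -> bool:
    """True if the observed ratio lies within `tolerance` log2 units of the
    dosage expectation. The tolerance is an arbitrary illustrative choice."""
    expected = expected_fold_change(copies_case, copies_control)
    return abs(math.log2(observed_ratio) - math.log2(expected)) <= tolerance


trisomy_ratio = expected_fold_change(3)   # 3 copies / 2 copies = 1.5
trisomy_log2 = math.log2(trisomy_ratio)   # about 0.585 on the log2 scale
```

On the log2 scale commonly used for expression data, a pure dosage effect therefore corresponds to a shift of roughly 0.585, which is small relative to the changes often sought in single-gene studies.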
Profiling genome-wide gene expression both in tissues of interest and across periods of development promises to produce a comprehensive annotation of the transcriptome deregulation responsible for DS phenotypes. However, it remains a major challenge to effectively interpret these data in a way that will elucidate the mechanisms that are causative in DS and therefore appropriate for therapeutic targeting.
It is becoming clear that many phenotypes of complex diseases such as DS arise from potentially unique gene deregulation that converges on particular pathogenic processes [20–23]. This limits the power of classic approaches for identifying disease mechanisms from transcriptome data, which might aim to identify mis-expressed gene sets common to all patients or patients with a particular phenotype. A more promising strategy is to use pathway or gene-set enrichment analysis methods (summarized in Table 1). These methods do not rely on common gene sets and are thus well suited to reveal deregulated processes for complex disease mechanism discovery [69, 70]. Such approaches have elucidated defects in dopamine receptor signaling, insulin-like growth factor 1 signaling pathways, and linked proteasomal and mitochondrial pathways from Parkinson disease patient brain gene expression data [71, 72]. Another approach is to reverse the traditional pathway-analysis workflow of mapping mis-expressed genes to annotated pathways and to instead begin by identifying pathways that best distinguish between phenotypes of interest using methods such as “attract”. “Attract” is also able to decompose the most informative expression profiles into “synexpression” patterns, allowing the identification of novel genes that had not previously been annotated to a pathway of interest but that nonetheless share the same expression profile and therefore might be under shared regulatory control or contribute to similar biological processes. This approach has been used to implicate novel cell signaling pathways in complex diseases such as Parkinson disease and schizophrenia, and it has also identified disease-associated differences in expression variance of core biological networks, defining disease-associated alterations in network constraints.
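To make the idea of pathway enrichment concrete, the simplest variant (over-representation analysis) asks whether a pathway contains more deregulated genes than expected by chance, using the upper tail of the hypergeometric distribution. The Python sketch below is illustrative only: the function name and argument names are assumptions, and it does not reproduce “attract” or any specific method from Table 1.

```python
from math import comb


def overrepresentation_p(n_background: int, n_in_pathway: int,
                         n_deregulated: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background  : genes measured in the experiment
    n_in_pathway  : measured genes annotated to the pathway
    n_deregulated : measured genes called deregulated
    n_overlap     : deregulated genes that fall in the pathway
    """
    if not 0 <= n_in_pathway <= n_background:
        raise ValueError("pathway size must lie between 0 and the background size")
    if not 0 <= n_deregulated <= n_background:
        raise ValueError("deregulated count must lie between 0 and the background size")
    largest = min(n_in_pathway, n_deregulated)
    start = max(n_overlap, 0)
    total = comb(n_background, n_deregulated)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_deregulated - k)
        for k in range(start, largest + 1)
    )
    return tail / total
```

In practice such a test is run for every annotated pathway and the resulting p-values are corrected for multiple testing. Because it depends on a threshold for calling genes deregulated, rank-based gene-set methods are generally better suited to detecting small coordinated shifts.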
As a cautionary note, however, most of these annotation-based methods are still being actively developed, and several sources of bias, including type I and type II statistical error, can confound interpretation and reduce the reproducibility of results. For example, most publicly available pathway analysis tools rely on public databases of gene annotation, which are often biased toward biological processes that have been studied in more detail than others (e.g., apoptosis). This reduces the discovery component of these analyses. Moreover, a recent study found that random gene expression signatures were commonly significantly associated with breast cancer outcome by standard gene expression signature analysis methods, implying that associations previously identified by these methods are likely spurious and do not reflect bona fide disease mechanisms. This finding emphasizes the essential requirement of assessing the reproducibility of transcriptome studies. Nonetheless, with refinement, pathway-based methods for interpretation of gene expression data are likely to prove particularly powerful in investigations into complex diseases such as DS, because they can detect instances where small changes in the expression of a group of genes are likely to exert an amplified effect via their enrichment as a group within a particular biological pathway. Such pathways would warrant functional experimentation and may constitute appropriate targets for therapeutic intervention.
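One common guard against spurious signatures is to compare a candidate gene set against random gene sets of the same size drawn from the same data. The Python sketch below illustrates this with a one-sided permutation test on mean log fold change. It is a generic illustration under stated assumptions (the function name, the choice of the mean as the set statistic, and the default of 999 random sets are all hypothetical choices), not the procedure used in the studies discussed.

```python
import random
from statistics import mean


def random_set_p_value(log_fc: dict, gene_set: list,
                       n_permutations: int = 999, seed: int = 0) -> float:
    """Empirical p-value for a gene set's mean log fold change.

    Counts how often a random gene set of equal size, sampled without
    replacement from all measured genes, has a mean at least as large as
    the observed one (one-sided test for coordinated upregulation).
    """
    members = sorted(set(gene_set) & set(log_fc))
    if not members:
        raise ValueError("gene set contains no measured genes")
    observed = mean(log_fc[g] for g in members)
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    genes = list(log_fc)
    hits = 0
    for _ in range(n_permutations):
        sampled = rng.sample(genes, len(members))
        if mean(log_fc[g] for g in sampled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

A set in which every member shifts by a small amount can score well here even when no single gene would pass a conventional fold-change cutoff. Conversely, a set that performs no better than random sets of the same size gives a p-value near 1.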
Another approach is to look for evidence of common causes for large sets of transcriptome deregulation in DS, such as cis- or trans-regulatory effects of HSA21 gene products, including transcription factors, epigenetic modifiers, and noncoding RNAs [77, 78]. This is possible through screening for binding motif enrichment within sets of deregulated genes or for overlap of transcriptome deregulation with differential methylation or chromatin modification profiles across the genome [80–83]. Such analysis will provide insight into the as yet unexplored contribution of epigenetic mechanisms to DS pathogenesis. Identification of genes that are drivers of global transcriptome deregulation in DS is useful because these are likely to be particularly appropriate therapeutic targets, as their normalization could potentially correct coordinated pathogenic processes.
For any predicted pathogenic complex disease mechanism, the next step is to experimentally demonstrate the necessity and/or sufficiency of that mechanism for specific disease features. Doing this with iPSC disease models first requires that some disease feature can be captured by an in vitro cellular phenotype. Second, the candidate causative mechanism(s) must be predicted from omics data generated from the relevant cell type and developmental stage, since a mechanism that is pathogenic in one setting may not be in another. Once these criteria are met, the reproducible nature of iPSC-derived models leaves investigators well placed to carry out functional experiments. The goal is to experimentally modulate candidate pathogenic mechanisms and examine the consequences on phenotype manifestation, to establish whether the mechanism is causative in pathogenesis. If normalizing a particular molecular aberration in diseased cells rescues the phenotype, this demonstrates the necessity of that aberration for the phenotype. Conversely, if recapitulating a molecular aberration in wild-type cells recreates a disease phenotype, this demonstrates the sufficiency of that mechanism for the phenotype. Importantly, the ability to normalize phenotypes by specific modulation of molecular aberrations is likely to constitute a starting point for therapeutic strategies.
A number of versatile technologies are available for the specific and regulated over- and underexpression of transcripts of interest. The most established of these include RNA interference-based approaches and overexpression of exogenous transgenes, both of which may be delivered in a variety of ways depending on the particular experimental requirements [84–88]. High degrees of control over the timing of gene knockdown and overexpression are achievable using inducible expression systems, allowing simple assessment of the roles of particular genes during narrow developmental windows or periods following a particular stimulus (for example, to simulate a physiological function). One limitation of these systems, however, is a lack of control over the magnitude of transcript knockdown or overexpression. This is particularly limiting when interrogating complex genetic disease mechanisms, since gene deregulation in these disorders is often expected to be subtle and to potentially involve many related gene products.
More sophisticated control of gene expression modulation is likely to be possible with emerging transcription activator-like effector (TALE)-based technologies [90, 91]. These technologies allow transcriptional activator and repressor modules to be targeted to specific loci anywhere in the genome [90, 91]. Importantly, by substituting activator or repressor modules with different activities, or altering TALE concentrations, it should be possible to achieve a high degree of control over the magnitude of change produced in gene expression. More farsighted possibilities for the subtle modulation of gene regulatory networks, such as those that use synthetic network components to modify the behavior/logic of particular gene regulatory modules [92, 93], might prove particularly useful in modeling and normalizing more elaborate pathogenic genetic mechanisms for some complex diseases. In some cases, it might also be effective to target gene products, for example by using small molecules or recombinant proteins, to directly examine the contribution of deregulation of these gene products to transcriptome deregulation and pathogenesis. By combining multiple modules/technologies, it should be possible to modulate complex gene deregulation phenotypes in a controlled manner. Once this can be achieved, we can ask whether a particular network defect is indeed necessary or sufficient for a particular phenotype.
In cases where large sets of deregulated genes appear to result from trans-regulatory effects of individual deregulated transcription factors, epigenetic modifiers, or noncoding RNAs, technologies such as chromatin immunoprecipitation sequencing [95, 96] and capture hybridization analysis of RNA targets/chromatin isolation by RNA purification [97, 98] may be useful to demonstrate binding of the trans-acting factor to deregulated target loci. Normalizing the expression of the trans-acting factor(s) (by the above approaches) could then be used to confirm the functional importance of these genes in regulating their targets and contributing to disease phenotype manifestation.
Chromosomal engineering using DS iPSCs could allow assessment of the role of trisomy of discrete chromosomal regions in pathogenesis of DS cellular phenotypes. TAL effector nucleases (TALENs) allow targeted cleavage of almost any site in the human genome [90, 91] and have been used for chromosomal engineering of iPSCs . We are currently pursuing a chromosomal engineering approach that allows deletion of large regions of HSA21, adapted from a method used in mice [12, 100]. The method introduces two cassettes bordering the region to be deleted, containing both tandemly oriented Cre recombinase loxP substrate sites for gene excision and positive and negative genetic selection markers, allowing simple selection of successfully excised regional HSA21 deletion DS iPSC clones . This will be particularly informative when rationally implemented to test genotype-phenotype maps generated from human segmental trisomy patients and corresponding clinical data . Combination of regional deletion DS iPSCs with unmodified DS iPSC lines, and whole-HSA21 deletion DS iPSC lines, would offer perfect experimental controls, with identical genetic backgrounds.
The strategy for research into complex disease presented in this review is primarily motivated by appreciation of the limitations of reductionist-type approaches that focus on individual proteins or signaling pathways in the setting of complex disease. These studies have dominated human genetic disease research for the last two decades, and their limitations are well evidenced by the lack of progress to date in understanding the mechanisms of, and designing treatments for, complex human genetic diseases in general. Many complex diseases such as DS are thought to arise in part from the collective contributions of potentially small perturbations in large numbers of causative genes. Moreover, in DS causative genes need not be limited to HSA21, because of the ostensibly large influence of trans-regulatory effects on global transcriptome deregulation. To account for this we describe strategies to capture systems-level genetic deregulation, and we describe approaches that are able to predict mechanisms that are likely to be pathogenic from this profile. Importantly, these methods are sensitive to small-magnitude changes in large sets of functionally related genes. We further describe how iPSCs can be used in conjunction with a variety of established and emerging technologies to functionally interrogate these predicted pathogenic genetic mechanisms, to formally demonstrate the necessity and sufficiency of these mechanisms for pathogenesis of particular disease phenotypes. iPSCs have enabled this paradigm of research by providing a renewable and reproducible source of disease-relevant human cells, theoretically representing any cell type or developmental stage, and the impact of iPSCs is amplified by the apparent inability of mouse models to faithfully recapitulate complex human genetic diseases.
There are a number of limitations to this strategy as it is presented that will need to be overcome to yield the true potential of iPSC-based research into complex disease. First, the strategy we have presented is not directly sensitive to alterations in the function of particular transcripts or proteins; rather, it only detects changes in their expression. Combination of DNA sequencing technologies with diseased donor cell lines could catalogue genome-wide mutations/single-nucleotide polymorphisms enriched in diseased cells, as done in genome-wide association studies, to aid identification of potential functional as well as expression-level alterations . However, although techniques able to predict the mechanistic consequences of sequence changes to protein function are showing some promise , they are prone to error and have not yet been applied to noncoding RNAs. This limitation will need to be overcome if we are to reduce the requirement for time-consuming and costly experimental validation of the mechanistic influence of large numbers of disease-associated sequence variants.
Second, many complex disease mechanisms may be rooted in protein or metabolic network functional defects [20–22]. Emerging technologies are beginning to allow systems-level profiling of these networks [104–108], including post-translational protein modifications [109, 110] and protein complex stoichiometries [111, 112], even with spatial and high-throughput temporal resolution [113, 114], and within human tissues and cell lines [115–119]. Importantly, once this is possible, an overall approach similar to that which we have presented here for gene expression should be applicable, but instead centered on proteome/metabolome features or even multiomics networks . It is also worth noting that although gene expression profiling cannot detect functional transcript changes or changes in protein/metabolic network function directly, in many cases it may be possible to infer such functional changes based on the structure of gene regulatory network alterations .
Finally, strategies to combat challenges associated with modeling complex genetic diseases will need to be implemented (as discussed above), and progress is still required in developing directed differentiation and cell sorting protocols. Nonetheless, a combination of disease-specific iPSCs with increasingly sophisticated systems-level technologies, including components allowing profiling of complex systems, interpretation of the resulting data, and the functional manipulation of relevant biological networks, promises to produce a lasting impact on the field of complex human genetic disease research and lay a foundation for the future development of therapeutic strategies.
E.J.W. and C.A.W. are partner investigators on the Australian Research Council (ARC) Special Research Initiative in Stem Cell Science, Stem Cells Australia.
J.A.B.: manuscript writing, conception and design, data analysis and interpretation, collection and/or assembly of data; E.A.M. and D.A.O.: collection and/or assembly of data, manuscript writing; C.A.W.: manuscript writing, data analysis and interpretation, collection and/or assembly of data; E.J.W.: manuscript writing, conception and design, data analysis and interpretation, collection and/or assembly of data, final approval of manuscript.
The authors indicate no potential conflicts of interest.