It will not be long before we have mutant alleles for every gene in the mouse genome. It also will not be long before we can sequence an entire genome in a few hours and for less than 1000 US$. But there is much more to come, and the speed with which these developments will take place will surprise you. Imagine following the fate of every single cell during the development of a mouse from fertilization to birth and even beyond. Imagine watching the expression of a single molecule of any protein or the total expression of all proteins in a single cell continuously over time. Imagine titrating the expression of single genes in specific cell populations at will.
High-throughput technologies have become the driving force in the analysis of biological systems. Biologists are increasingly taking advantage of automatization, miniaturization, and computerization. In this sense biology follows the development of computer and information technology: smaller size, higher speed and capacity, lower cost. However, we should remember that the age of computer and information technology was preceded by a pre-exponential phase during which important theoretical frameworks and concepts were developed: Alan Turing (1936) and John von Neumann (1945) provided the mathematical basis for an automatic computing machine and a corresponding “computer architecture.” To Claude E. Shannon (Weaver and Shannon 1949) and Norbert Wiener (1948) we owe the mathematical theory of information and cybernetics. The convergence of electronic and mechanical engineering then triggered the development and application of systems control theory, a key requirement for the modeling and simulation of the dynamics of technical systems.
A major challenge in biology is to model, simulate, and eventually predict the behavior of complex biological systems. The identification of the individual components that constitute a biological system, e.g., through genome-wide transcriptome, proteome, and metabolome analysis, will be required but will not be sufficient to achieve this goal. We will also need detailed information about the “network architecture” and the dynamics of biological systems. This is where systems biology comes into play (Kitano 2002; Kirschner 2005; Palsson 2006; Alon 2007).
Biological systems are emerging, adaptive systems, highly complex and often nonlinear. Their behavior cannot be explained solely on the basis of their individual parts. Deep insight into the network structure, function, and dynamics of biological systems can be obtained only through their systematic perturbation, followed by a detailed characterization of the molecular, cellular, and phenotypic changes that follow these perturbations. Based on the perturbation consequences observed, a model can then be established or existing models modified or further developed that grasp the important features of the underlying mechanisms (Sauro and Kholodenko 2004).
Mouse genetics has been an extremely powerful perturbation method for nearly a century. Loss-of-function and gain-of-function mouse mutants are able to reveal causal relationships between specific genes and specific phenotypes. An impressive mouse genetics toolbox is now available that allows us to perturb a wide range of biological systems. Methods such as the production of transgenic mice, gene targeting through homologous recombination in embryonic stem (ES) cells, phenotype or gene-driven mutagenesis strategies, or RNAi-based knockdown are now used on a routine basis. Sequence diversity is also a form of natural perturbation. In combination with the analysis of gene expression and phenotype analysis, a thorough comparison of the consequences of allelic variants can be very powerful. The main challenge will be the functional dissection of the combinatorial activity of small sequence changes, forming the core of “Complex Trait Analysis.”
Perturbing biological systems through genetic changes is only one way to obtain information to dissect the structure and function of genetic networks. Equally important and increasingly appreciated is the use of small molecules, which can act as agonists or antagonists of biological processes (Schreiber 2005). Whereas a while ago combinatorial chemistry was largely a domain of pharmaceutical drug development, the power of small molecules as a means to study the function of specific proteins or pathways is increasingly appreciated. There are specific strengths and weaknesses to the use of small molecules. One of the most important aspects is specificity. Very rarely does a small molecule bind to one and only one target; in many cases the precise number and nature of targets is unknown. Small molecules are more adaptable to the titration of dose-response or pulse-chase studies. Similar to searching for modifiers in a genetic screen, chemical biologists now are starting to perform combinatorial screens to unravel redundancies or pathway interactions that are not revealed by single small-molecule screens. In fact, one may be able to stay below the “toxic window” of a specific small molecule by combining two or more of them, each of which acts on different targets within the same pathway. Eventually we will see a convergence between the fields of small molecules and small animals, i.e., in the area of noninvasive imaging (Sako 2006). Molecular markers will become available that allow us to follow perturbations at the molecular and cellular level and in real time.
Analyzing the individual components of a network is not sufficient. We need to understand how these components interact with each other and which are in direct or indirect contact. We must know how the components dynamically interact and what compensatory mechanisms are triggered when components are defective or inactivated. An understanding of a system therefore requires knowledge about the system’s structure and architecture (Papin et al. 2005). Once we have sufficient information about the structure of a system, we can begin to study systems dynamics. This will then help us to understand the control measures that are responsible for the overall behavior of the system or its modules under external perturbations (Carpenter and Sabatini 2004; Barabasi and Oltvai 2004; Alon 2007). These cannot automatically be inferred from the parts list of a system. Biological information is passed through a number of highly integrated networks, including transcriptome, proteome, or metabolome networks (Khammash and El-Samad 2004; Saez-Rodriguez et al. 2004). Methods to reconstruct or analyze biological networks have become an active field of research (Oda et al. 2005; Oda and Kitano 2006). Since Leonhard Euler and Paul Erdős, the field of network analysis and graph theory has developed tremendously. Network analysis is also the basis of understanding disease pathogenesis and disease traits.
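The distinction between direct and indirect contact can be made concrete with a very small graph example. The sketch below is purely illustrative: the network and its component names (A to F) are hypothetical placeholders, not real genes or proteins, and a breadth-first search stands in for the far richer network analysis methods cited above.

```python
from collections import deque

# Hypothetical toy network: each key is a component, each value is the set of
# components it interacts with directly. The names are placeholders only.
network = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
    "F": set(),  # an isolated component with no known interactions
}

def path_lengths(graph, source):
    """Breadth-first search: shortest number of edges from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def classify_contacts(graph, source):
    """Split all other components into direct, indirect, and unreachable partners of source."""
    dist = path_lengths(graph, source)
    direct = sorted(n for n, d in dist.items() if d == 1)
    indirect = sorted(n for n, d in dist.items() if d > 1)
    unreachable = sorted(set(graph) - set(dist))
    return direct, indirect, unreachable

if __name__ == "__main__":
    direct, indirect, unreachable = classify_contacts(network, "A")
    print("direct:", direct)
    print("indirect:", indirect)
    print("unreachable:", unreachable)
```

Even in this toy case the parts list alone (six components) says nothing about who can influence whom; that information lives entirely in the edges.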
A common theme in advanced technologies and engineering is to divide systems into modules that can be treated individually or in terms of connecting different modules as part of a higher-order system. Since the discovery of the double-helix structure of DNA, a reductionist approach to analyze biological systems has proved to be extremely successful. However, we feel that we are reaching a limit as to how much we can learn about complex biological systems by looking with increasing resolution at individual components of a system. No doubt, in the end we would like to understand biological phenomena on the basis of atomic resolution. On the other hand, the rise of systems biology reflects our increased appreciation of, and desire to look at, all the scales of biology, including the molecular, cellular, organismic, and population-based levels.
Partitioning biological systems into modules helps to achieve a more integrated picture. To understand causal relations among individual parts and modules, we need information about the directionality of flow of information or material along the edges within a network (Natarajan et al. 2006). We already know that systems behave differently depending on whether we deal with one or a few molecules or millions of molecules. Stochastic and statistical approaches, e.g., Bayesian network reconstruction algorithms, need to be applied to deal with the uncertainties and probabilities of biological systems (Needham et al. 2006). The role of noise in biological systems is just being unraveled. Some of the most important contributions are currently made by physicists who are able to apply the repertoire of statistical physics to biological problems (Rao et al. 2002; Samoilov et al. 2005; Sprinzak and Elowitz 2005; Kussell et al. 2005; Alon 2007).
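Why a few molecules behave differently from millions can be illustrated with a minimal stochastic simulation. The sketch below assumes the simplest possible model, a single species produced at a constant rate and degraded in proportion to its copy number, and simulates it with the standard Gillespie algorithm. The model, the rate values, and the function name are assumptions chosen for illustration, not taken from the studies cited above.

```python
import math
import random

def simulate_birth_death(production_rate, degradation_rate, t_end, seed=0):
    """Gillespie simulation of constant production and first-order degradation.

    Returns the time-averaged copy number and its coefficient of variation (CV).
    """
    rng = random.Random(seed)
    n = round(production_rate / degradation_rate)  # start at the deterministic steady state
    t = 0.0
    weighted_sum = 0.0     # integral of n over time
    weighted_sq_sum = 0.0  # integral of n^2 over time
    while True:
        birth = production_rate
        death = degradation_rate * n
        total = birth + death
        wait = rng.expovariate(total)  # waiting time until the next reaction
        if t + wait >= t_end:
            # No further reaction before the end: account for the remaining time.
            weighted_sum += n * (t_end - t)
            weighted_sq_sum += n * n * (t_end - t)
            break
        weighted_sum += n * wait
        weighted_sq_sum += n * n * wait
        t += wait
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1
    mean = weighted_sum / t_end
    variance = weighted_sq_sum / t_end - mean * mean
    return mean, math.sqrt(max(variance, 0.0)) / mean

if __name__ == "__main__":
    for rate in (10.0, 1000.0):  # expected mean copy numbers of about 10 and 1000
        mean, cv = simulate_birth_death(rate, 1.0, t_end=100.0, seed=1)
        print(f"mean copies ~ {mean:.1f}, relative fluctuation (CV) ~ {cv:.3f}")
```

For this particular model the stationary distribution is Poisson, so the relative fluctuation scales as one over the square root of the mean copy number: a species present at about ten copies fluctuates roughly ten times more strongly, in relative terms, than one present at about a thousand.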
Given their complexity, a remarkable feature of biological systems is their robustness with respect to environmental perturbations (Kitano 2004a,b; Kitano and Oda 2006; Kurata et al. 2006). How do biological systems preserve their function despite environmental conditions that can differ by orders of magnitude, leading to tremendous fluctuations in metabolic components or ligands? We do not yet understand the underlying mechanisms that are responsible for this robustness. Genetic redundancy, i.e., the presence of multigene families that can at least partially substitute for each other, is apparently one way to increase the robustness of a system. Similarly, a redundancy of pathways could contribute to the potential of a cell to maintain the robustness of a biological system. On the other hand, there might be a price to be paid, i.e., under different environmental conditions, leading to a tradeoff of robustness versus fragility dependent on the external factors that act on the system (Kurata et al. 2006). Robustness or fragility of biological systems can be understood only if we obtain insight into the structure and the dynamics of elements responsible for feedback control, an essential element in almost all complex systems (Schmidt and Jacobsen 2004).
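How feedback control can produce robustness is easiest to see in an abstract example borrowed from control theory. The sketch below compares a component without feedback to one regulated by integral feedback, in which a controller variable accumulates the deviation from a set point. The equations, variable names, and parameter values are assumptions made for illustration; this is a linear toy model, not a description of any specific biological pathway.

```python
def simulate(input_level, feedback, t_end=60.0, dt=0.01, setpoint=1.0):
    """Integrate a toy system with the explicit Euler method and return the final output x.

    Without feedback:  dx/dt = input - x
    With feedback:     dx/dt = input - x - z,   dz/dt = x - setpoint
    where z is a controller variable that integrates the deviation from the set point.
    """
    x, z = 0.0, 0.0
    for _ in range(round(t_end / dt)):
        dx = input_level - x - (z if feedback else 0.0)
        dz = (x - setpoint) if feedback else 0.0
        x += dx * dt
        z += dz * dt
    return x

if __name__ == "__main__":
    for u in (2.0, 4.0, 8.0):  # the environmental input varies fourfold
        print(f"input {u}: no feedback -> {simulate(u, False):.3f}, "
              f"integral feedback -> {simulate(u, True):.3f}")
```

Without feedback the output simply tracks the input, whereas with integral feedback it returns to the set point regardless of the input level. The tradeoff mentioned above also shows up here: the regulated system depends entirely on the controller variable, so removing or damaging that one element abolishes the robustness.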
Robustness and fragility are also highly important for understanding disease pathogenesis or the susceptibility or resistance toward the development of diseases (Butcher et al. 2004; Kitano 2004b; Fishman and Porter 2005; Wagner 2005). What are the factors that drive a physiologic system toward its disease state? How can we interfere with an unbalanced situation through preventive or therapeutic measures and maybe push back a disease state toward a more buffered state? What are the critical components that could be selected as a drug target? We are just at the beginning of identifying specific molecular components as indicators of the state of a system and, more important, as predictors for the future development of the system, i.e., as an early marker for disease development (Lage et al. 2007). These “biomarkers” do not necessarily need to be the same as those that qualify as drug targets. One of the frustrating issues in the drug development pipeline is the lack of sufficient preclinical predictability for safety and efficacy. Although many of the animal models are able to predict side effects of drug candidates, in many cases we miss adverse reactions and identify them only at later stages of clinical development. By combining network analysis, statistics, and high-throughput genetic and genomic approaches to identify new relevant biomarkers, systems biology bears great potential to improve the predictability of our preclinical in vitro and in vivo models (Hood et al. 2004).
Maybe we have focused too much on the similarities of model organisms instead of also trying to understand the differences. Maybe we should increase our efforts in comparative systems biology. We might have to take a much closer look at the differences between mice and humans in terms of their relevance for drug development and try to understand the mechanisms of species-specific absorption, distribution, metabolism and excretion (ADME). Some of the species differences can be overcome by, for example, introducing human genes into the mouse genome or by xenografting human stem cells into mice (Shultz et al. 2007). These efforts in “humanizing mice” are still at the “trial and error” stage. We urgently need a comparative systems analysis that could guide us in selecting the most relevant genes or cell types that are the cause of differences in drug responses or disease pathogenesis and that should be prioritized in our efforts to improve the predictability of mice as a model system for human disease.
Biological systems are complex adaptive systems that emerge during the development from a fertilized egg to the development of an adult organism. During evolution changes in the environment lead to different constraints and fixation of certain degrees of freedom in genome structure and function. Components of genome networks can be added or changed only when the workability and functionality of the biological system is maintained, at least to a certain degree (Ottino 2004; Weitz et al. 2007).
Comparative systems analysis needs appropriate databases (Albeck et al. 2006; Kersey and Apweiler 2006). These are not yet sufficiently developed. The mouse comparative ontology database (http://www.informatics.jax.org/menus/homology_menu.shtml) is useful but does not provide information about the components, interactions, and dynamics of physiologic systems.
We need all the information available, i.e., a user-friendly, easily retrievable information system on the level of transcripts of a given cell, the dynamic response of mouse vs. human cells to small molecules, the levels of redundancy in the two species, species-specific genes, splicing patterns, and post-translational modification.
Networks of biological systems are so complex that they cannot be understood by intuition. Some systems properties are even counterintuitive! It is the iteration of experiment and simulation that will characterize future systems biology. We need to describe biological systems mathematically and treat them in an integrated and quantitative manner to come up with predictions about their behavior (Gershenfeld 2006; Szallasi et al. 2006). So far biologists often formulate their conceptual picture of a biological system as a flowchart-type model. Such models are more or less static and do not encompass information about the behavior of a system over time, e.g., after a specific environmental perturbation. Model building has often been done by biologists on an intuitive basis. Biologists are often not aware that there already exists a rich literature and toolbox in systems control theory (Csete and Doyle 2002; Tyson 2003; Brent 2004). We need to get used to applying systematic perturbations, observing the reaction of the system to these perturbations, developing a first-approximation model, and testing this model by further perturbation studies (Locke et al. 2005; Aldridge et al. 2006; Janes and Yaffe 2006). Biologists are fairly well trained in hypothesis testing but not in hypothesis generation. This is where systems biology has its greatest potential. Description will converge with prediction.
Systems biology often tries to apply formal mathematical descriptions based on time-series analysis of biological response. So far the sheer amount and the quality of data have constituted significant roadblocks to tackling the dynamics of biological systems. Technological advances help us to overcome these problems. A more severe problem, at least for the current generation of biologists, is the limited training in mathematics. The first two years of engineering training provide the mathematical toolbox necessary for a mathematical description of technical systems and are essential for modeling or simulating the behavior of complex systems. It will be neither possible nor useful to turn every biologist into a mathematician. However, we need to improve the dialog between biologists and mathematicians, physicists, and engineers. The basics of linear algebra, vector analysis, and graph theory have to enter the curriculum of a biologist’s training (Wingreen and Botstein 2006).
Unfortunately, formal tools for model production do not yet exist. In addition, model building is not easy and requires a very good understanding of the biological system under study. A question often raised is where to start: bottom up, top down, or a combination of both. An interesting suggestion is to start “middle out,” where the modeling begins at the level at which there are rich biological data and then reach up and down to other levels (Noble 2002). Another major difficulty is the transfer of a model from one application to another. We need to develop standardization frameworks so that even novices in computational biology or systems biology are able to build, access, and work with existing models (Wall et al. 2004).
For more than 100 years mouse genetics has relied on the analysis of single monogenic mutants. The methods to identify or produce mutants have changed considerably over the years. Soon we will have in our catalogs and freezers mouse mutants for every gene in the genome (Collins et al. 2007). Extensive collections will also be available as a result of phenotype-driven mutagenesis screens (Balling 2001). Whereas the analysis of these mutants might keep us busy for many years to come, the next frontier of mouse genetics is already on the horizon: systems genetics. We all know that the expressivity and penetrance of mouse mutant phenotypes can vary tremendously, depending on the genetic background. Modifier screens can be used to identify some of the genetic loci responsible for the strong influence of genetic background on physiologic and pathophysiologic processes. Sequencing and, as a cheaper substitute, SNP typing have provided us with a detailed picture of the genetic diversity of our main inbred mouse strains. Most of them are derived from a very limited pool of parental strains, and strong selection was applied to obtain the handsome, highly adapted common lab strains of mice that we now use in our experiments.
Recombinant inbred strains and other reference panels of inbred strains are powerful tools for performing a genome-wide dissection of complex biological traits that are the result of multiple, quantitative, and often highly interacting genes (Churchill et al. 2004; Flint et al. 2005; Zou et al. 2005; Hill et al. 2006; Peters et al. 2007). The series of BXD strains has been a paradigm for the success of analyzing complex traits. Unfortunately, the use of recombinant inbred strains does not fall under the category “quick and easy” but requires a fair amount of logistics, infrastructure, and an appreciation for the power of genetics. The major bottleneck, however, was the “power of mapping resolution” that the analysis of 30-80 recombinant inbred strains provides. The Complex Trait Consortium has tackled precisely this problem (Churchill et al. 2004). The goal is to produce approximately 1000 recombinant inbred strains (The Collaborative Cross) within the next five years and make them available as an open source to the scientific community. Importantly, the parental strains chosen include three strains that we would classify as “inbred wild mice,” i.e., PWK/PhJ, WSB/EiJ, and CAST/EiJ. The inclusion of these genetically highly diverse strains adds about 75% additional sequence diversity. The availability of this large panel of diverse and well-structured strains will allow experiments where mice with an identical genotype can be produced in large numbers and compared to an equally large number of mice with a wide range of different genetic and even environmental backgrounds. Sequencing of the parental strains and a community-based complementary and additive phenotyping will eventually produce a resource that will help us to answer questions about gene function, epistatic genetic interactions, and genome-environment interactions that we can currently only dream about.
There are other approaches, e.g., the development of consomic mouse strains, that essentially target the same questions (Peters et al. 2007). It will be important to not look at these approaches as exclusive or competitive, but as a new toolbox of quantitative trait analysis where each one has specific pros and cons. New phenotyping methods, including gene expression arrays, or phenotyping based on noninvasive imaging will have to be integrated into the described complex trait studies. Microarrays are a new microphenotyping platform that allows us to look at the expression of thousands and hundreds of thousands of different genes (eQTLs). This shift to microphenotypes requires new statistical tools because of multiple-testing issues but it also gives a much higher computational capacity than ever before. To quote Denis Noble: “Biology is set to become highly quantitative in the 21st century. It will become a computer-intensive discipline” (Noble 2002).
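The multiple-testing issue can be sketched with a small worked example. The text above does not name a particular correction method, so the sketch below uses two standard ones as stand-ins: the Bonferroni correction, which controls the chance of even a single false positive, and the Benjamini-Hochberg procedure, which controls the expected fraction of false discoveries. The p-values are invented for illustration and do not come from any real expression study.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, fdr=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * fdr,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected

if __name__ == "__main__":
    # Invented p-values, one per transcript tested against a single locus.
    p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("uncorrected (p <= 0.05):", sum(p <= 0.05 for p in p_values))
    print("Bonferroni:", sum(bonferroni(p_values)))
    print("Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))
```

With only ten tests the difference between the methods is small; with hundreds of thousands of expression traits tested against many loci, the choice of correction largely determines which associations survive.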
For many years mouse genetics has been the driving force as a hypothesis generator for functional genomics. Mouse models, e.g., transgenic mice, knockout mice, or mouse mutants identified from phenotype-driven screens, are great tools to identify candidates for human disease genes. The construction of mouse inbred strain panels derived from genetically diverse parental populations provides us with valuable model populations. At the same time, the power of human association studies has reached a point where some people even think that it heralds the end of mouse genetics. I think the opposite is true. The availability of mouse reference populations will allow us to ask questions that complement those addressed by human association studies. More importantly, we can quickly validate hypotheses derived from human population studies not only by constructing equivalent mouse populations but also by probing the function of individual genes through the analysis of gene targeting or specific point mutation alleles. The argument that we can find such mutations also in human populations does not take into account that in mice we are not only able to study the effect of genetic variation, but also to “titrate the environment” much better than will ever be possible for humans.
At this time, mouse geneticists and human geneticists have not connected well enough to exploit the power of their respective toolboxes. To quote Rob Williamson: “There is still an impedance mismatch between human association and reductionist mouse studies.” Maybe this special issue of Mammalian Genome can contribute to better cooperation between mouse and human geneticists. It will pay off for all of us.