Finding fundamental organizing principles is the current intellectual frontier of systems biology. From the level of a hydrogen atom to that of the whole cell, organisms manage massively parallel and massively interactive processes spanning several orders of magnitude in size. To manage informational complexity on this scale, it is natural to expect organizing principles that determine higher-order behavior. Currently, there are only hints of such organizing principles but no conclusive evidence. Here, we present an approach as old as Mendel that could help uncover fundamental organizing principles in biology. Our approach essentially consists of identifying constants at various levels and weaving them into a hierarchical chassis. As we identify and organize constants, from pair-wise interactions to networks, our understanding of the fundamental principles in biology will improve, leading to a theory in biology.
In scientific jargon, a law describes a true, absolute and unchanging relationship among interacting elements. Unlike in some fields, social customs and authorities do not determine the establishment of laws in science. Because laws are derived from empirical observations, they symbolize regularities endorsed by a majority opinion. People also use terms like rules and principles to describe consistent relationships expressed by mathematical equations e.g., the Heisenberg uncertainty principle and the causality principle of physics. Here we will adopt the less demanding and more useful definition of a law as ‘a frequently observed regularity that allows for a substantial improvement of our prediction ability in well-defined systems’. The distinction among the terms rule, principle, theory and hypothesis is beyond the scope of this paper.
Our knowledge of laws, theories and hypotheses can be traced to the physical sciences. While physicists have identified a number of laws related to mass, energy, momentum and so on, some of the ‘laws’ known to biologists are those of Mendelian Inheritance (Mendel 1865), metabolic scaling (Kleiber 1932) and the more recent power laws (Jeong et al. 2000). However, even these laws are not absolute; they come with exceptions. For example, non-random segregation of chromosomes (White et al. 2008) and homozygous mutants parenting a normal offspring (Lolle et al. 2005) are deviations from Mendelian Inheritance. The prevalence of such exceptions, together with the overwhelming role of boundary conditions, makes strict paradigms of scientific law too demanding for biology; one example is Popper’s falsifiability concept, which is of little or no use in biology (Stamos 2007).
It is therefore useful to think of biological regularities as broad generalizations rather than rigid relationships among interacting components. Here we would like to discuss why absolute generalizations are rare in biology, and what can be done to fill the gap.
Broadly speaking, to discover new regularities and laws we follow either the top–down or the bottom–up approach (Fig. 1). In the top–down approach, the search begins with an external observation e.g., Newton’s laws of motion. The observer intuitively imagines a set of elements, a set of interactions and a mathematically expressible form to connect the two. Components are woven into a mental map and experiments are planned to verify or nullify the model. If the experimental observations repeatedly support the model under different environmental settings, the model takes a more generalized form and may ultimately be adopted, with a broad consensus, as a law.
In the bottom–up approach, one begins by collecting data on individual elements i.e., by experimentally determining the properties of components in isolation and in association with other interacting elements. Data are collected in different environmental settings and searched for patterns. Once patterns are found, experiments are repeated to confirm the observations. Evidence of a consistent relationship among interacting components in different environmental settings provides a strong basis for representing the patterns in a logical form. This is the approach that Gregor Mendel used to deduce the Laws of Inheritance (Mendel 1865).
In both approaches, scientists contribute their own subjective judgment about what is contingent (exceptions to the rule) and what is essential (obeying the rule). The extent of exceptions and commonalities varies among different instances and clearly depends on the scale at which the observations are made.
In both top–down and bottom–up approaches, the key is to find a consistent pattern. For example, an equation that consistently explains a regularity is a strong indication of a law. The top–down approach i.e., from imagination to observation, has often been used in physics, while the bottom–up approach i.e., from observation to imagination, has been used in biology. Interestingly, we have laws for things that we cannot see e.g., light, gravity and sound, but no laws for things that we can see e.g., DNA, RNA, proteins and cells. This is because the former are based on the consistent behavior of elementary particles, whereas in the latter interactions are frequently probabilistic.
Going further, one understands that the well-known law of gravity is nothing but a name given to the striking regularity observed in the motion of bodies. However, even this regularity is obtained by a subjective choice of what is essential. Pure observation tells us that some bodies e.g., leaves on a windy day, go up and down rather than directly down towards the earth. For this reason Aristotle spoke of two kinds of bodies: light and heavy. Only in the seventeenth century did Galileo decide to treat the difference between lightness and heaviness as contingent and identify the tendency to fall down (gravity) as the key feature. Thus the concept of gravity is essentially a rationalization of the observed behavior of bodies. The material counterpart of this force in terms of particles (gravitons) remains elusive and highly uncertain. In the same way, if we clap our hands a nearby mouse will surely run away, with a degree of predictability comparable to that of falling bodies. However, if we try to explain this very repeatable pattern in terms of the mouse’s microarray profile before and after the clap, we will surely have a hard time. The key message is that a molecular-level description is sometimes inadequate to explain the higher-level behavior of organisms.
The bottom–up approach is preferred in biology because of the large variety of context-dependent data types. For this reason, a good imagination i.e., the top–down approach, cannot assure a consistent molecular-level description. Furthermore, a rule in biology often comes with exceptions. For example, in the early 1990s telomerase-dependent telomere elongation was considered a kind of rule in biology. However, the discovery of transposon-dependent telomere maintenance in Drosophila (Levis et al. 1993) demonstrated an exception to this rule. This is not to say that exceptions are strange phenomena; they simply point to undiscovered states of the system. Given the impracticality of studying all possible system states versus contexts, one should expect to see exceptions along with common trends in biology. For example, the genetic code, which has a fairly straightforward implementation, comes with codon bias (Sharp and Li 1987).
The mouse example (described earlier) indicates that the macroscopic level of observation is repeatable and reliable but unhelpful for describing the workings of a system in its entirety. In this case, as in any other complex system, the most fruitful layer of analysis is the mesoscopic level i.e., half-way between trivial determinism (the escape response) and pure stochasticity (the assumed fluctuation of protein concentrations before and after the escape). It is at the mesoscopic level that physiological and anatomical ‘links’ between microscopic and macroscopic levels are formed e.g., nervous system organization and dynamics (Laughlin et al. 2000). As a matter of fact, any law dealing with organized matter, from paramagnetic materials to organisms, resides where meaningful correlations between elementary units give rise to macroscopic regularities that are largely independent of microscopic details. This independence from the microscopic description is the basis of the observed resilience of biological systems at large.
Given the background of immense data scarcity, how could Mendel succeed in discovering the laws of inheritance when people had no clue about the underlying components and interactions? The field of biochemistry was still in its infancy and molecular biology was unheard of. There was hardly any technological aid to help Mendel ask the important fundamental questions in biology. In our opinion, the key reason for his success was his clear understanding that he needed to find ‘constants’. It is interesting that the word “constant” appeared 69 times in his paper (Mendel 1865)! Mendel chose seven pairs of contrasting characteristics and ensured (through inbreeding) that each plant consistently exhibited the same feature. Even if he had included an eighth feature or considered only six pairs of contrasting characteristics, he would still have reached the same conclusion. That is because Mendel “artificially eliminated” noise from his samples and considered only those plants exhibiting consistent patterns both in isolation i.e., monohybrid crosses, and in a group i.e., dihybrid crosses. It is important to understand that these seven pairs of contrasting features i.e., phenotype-level constants, did not change with time, fluctuating environment and so on.
Because of the quality of his data, Mendel needed only elementary mathematics i.e., addition and division, to obtain the Laws of Inheritance. In contrast, these days we are inundated with a morass of expression data, have huge computational power and apply advanced mathematical techniques, but are nowhere close to identifying the network equivalent of Mendelian laws. This is because moving from the consistent phenotype level to the dynamic molecular level exposes us to a large body of variables e.g., stochastic gene expression (Elowitz et al. 2002; Cai et al. 2006) and concentration gradients influencing cell–cell interactions (Gurdon and Bourillot 2001). The key, therefore, is to mine this vast space of variables for biological constants.
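The elementary arithmetic behind Mendelian ratios can be sketched in a few lines of Python. The sketch is only an illustration, not a reconstruction of Mendel's own calculation: the function name phenotype_counts, the allele symbols and the convention that an uppercase letter marks a dominant allele are all assumptions introduced here. It enumerates every combination of parental gametes and counts the resulting phenotypes, which requires nothing more than counting and division.

```python
from collections import Counter
from itertools import product


def phenotype_counts(parent1, parent2):
    """Count offspring phenotypes for a cross between two genotypes.

    Each genotype is a list of two-letter allele pairs, one per trait,
    e.g. ["Aa", "Bb"]. An uppercase allele is treated as dominant
    (a convention assumed for this sketch).
    """
    def gametes(genotype):
        # One allele per trait, in all combinations (independent assortment).
        return list(product(*genotype))

    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        # A trait shows the dominant form if either inherited allele is uppercase.
        phenotype = tuple(
            "dominant" if (a.isupper() or b.isupper()) else "recessive"
            for a, b in zip(g1, g2)
        )
        counts[phenotype] += 1
    return counts


# Monohybrid cross (one trait) and dihybrid cross (two traits).
mono = phenotype_counts(["Aa"], ["Aa"])
di = phenotype_counts(["Aa", "Bb"], ["Aa", "Bb"])

for phenotype, count in sorted(di.items()):
    print(phenotype, count)
```

Under these assumptions the monohybrid cross yields three dominant phenotypes for every recessive one, and the dihybrid cross yields the 9:3:3:1 split among the four phenotype combinations.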
The endpoints i.e., the top-level phenotype and the bottom-level genome sequence, may be considered the ‘boundary conditions’ of living systems. These two ends must be connected through intermediate levels to understand biology as a whole. Since the concept of a ‘constant’ is important here, it is useful to define the word. At its core, a constant is a measurement that comes out the same every time (Laughlin 2005). The seven pairs of contrasting characters in Mendel’s work are examples of constants at the phenotype level. The genome sequence may be considered another ‘constant’ level, even though breaks, transposons and error-prone repair affect the composition of the original DNA sequence. Nevertheless, DNA repair mechanisms actively repair DNA breaks, maintaining the integrity of the essential genome sequence.
Moving from the constant phenotypic level to the constant sequence level, one comes across several layers of variables e.g., cell–cell interactions, network and pathway dynamics, molecular interactions and stochastic gene expression. In this space between genome and phenotype, probabilities, fluctuating concentrations, molecular crowding, context dependencies and emergent phenomena play a significant role. For this reason, this layer is the domain of statistical laws. It is at the mesoscopic level that, in our opinion, useful principles for understanding the organization of biological systems reside. But this is not the place to go deeper (we will come back to this aspect indirectly in terms of emergent phenomena). Here we would like to concentrate on the ‘law-like’ style of reasoning. Clearly the mesoscopic scale has been mined for laws e.g., thermodynamics is the home of the most precise and reliable laws in physics, but this precision comes from averaging over huge ensembles of units, each unit being almost completely ‘unaware’ of the ensemble features. So, what about mechanistic, microscopic laws in biology?
Although the genome is an attractive ‘microscopic level’ at which to begin addressing this issue, it cannot provide all the answers. The genome sequence does not directly control downstream interactions of molecules at the pathway and network levels. Moving from the sequence level to the interaction level, it is important to find relationship constants i.e., a unique gene (or a group of unique genes) controlling a process. Another example of an interaction constant is a protein consistently interacting with another protein in several organisms under well-defined conditions. However, it is unlikely that we will ever find an absolute ‘interaction constant’ common to all organisms. A trend rather than an absolute correlation is what we should probably expect in biology. It would be useful to identify consistent interaction patterns at the RNA–DNA level, the protein–DNA level, the pathway and network level and the cell–cell interaction level, and to build a ‘constant chassis’ from the sequence level to the phenotype level (Table 1). Such a ‘chassis’ could help identify core biological processes around which variables operate. If such a chassis is built, we should expect to see connectivity constants first, followed by quantitative constants e.g., thresholds.
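The idea of an interaction constant as a trend, rather than an absolute, can be illustrated with a small Python sketch. Everything in it is hypothetical: the function name consistent_interactions, the min_fraction parameter and the toy organisms and proteins are placeholders introduced here, not data or methods from this paper. The function simply counts in how many organisms each pair-wise interaction is observed and keeps those seen in at least a chosen fraction of them.

```python
from collections import Counter


def consistent_interactions(networks, min_fraction=1.0):
    """Return interactions observed in at least `min_fraction` of organisms.

    `networks` maps an organism name to a set of protein pairs.
    min_fraction=1.0 demands an 'absolute' constant; lower values
    accept a trend.
    """
    counts = Counter()
    for edges in networks.values():
        # frozenset makes (A, B) and (B, A) the same interaction.
        counts.update({frozenset(pair) for pair in edges})
    n_organisms = len(networks)
    return {
        edge for edge, seen in counts.items()
        if seen / n_organisms >= min_fraction
    }


# Hypothetical toy data; the names are placeholders, not real proteins.
toy_networks = {
    "organism_1": {("P1", "P2"), ("P2", "P3"), ("P3", "P4")},
    "organism_2": {("P2", "P1"), ("P3", "P4"), ("P4", "P5")},
    "organism_3": {("P1", "P2"), ("P4", "P3"), ("P4", "P5")},
}

absolute = consistent_interactions(toy_networks, min_fraction=1.0)
trend = consistent_interactions(toy_networks, min_fraction=0.6)
print(len(absolute), len(trend))
```

In this toy example two interactions appear in every organism, while relaxing the threshold to 60% admits a third one that is present in two of the three organisms. Choosing the threshold is a judgment about what counts as contingent and what counts as essential.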
Fig. 2 gives only a partial list of constants; there are obviously more levels and sublevels and, in fact, several ways of representing the data. At the sequence level, sequence motifs (DNA and protein) seem to be reasonable examples of genome-level constants. At the protein structure level, highly conserved folds and binding domains (e.g., helix-turn-helix, zinc finger and leucine zipper) seem to be examples of structure-level constants. At the molecular-interaction level, conserved folds could be examples of interaction constants. At the network level, the ‘power law distribution’ (Jeong et al. 2000) and ‘the small world phenomenon of metabolic networks’ (Wagner and Fell 2001) are examples of network-level constants.
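The network-level constant mentioned above can be made concrete with a small sketch. A power-law degree distribution means that the number of nodes with k connections falls off roughly as k raised to a negative exponent, so the exponent can be read off as the slope of a straight line on a log-log plot. The Python function below does this by ordinary least squares. The name estimate_power_law_exponent and the toy degree list are hypothetical and introduced only for illustration; no real network data are used. Log-log regression is a crude estimator, and careful analyses generally prefer maximum-likelihood fitting.

```python
import math
from collections import Counter


def estimate_power_law_exponent(degrees):
    """Estimate gamma in frequency(k) ~ k**(-gamma) by log-log least squares."""
    # Count how many nodes have each (positive) degree.
    freq = Counter(d for d in degrees if d > 0)
    xs = [math.log(k) for k in freq]
    ys = [math.log(n) for n in freq.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -gamma.
    return -sxy / sxx


# Toy degree list: the frequency drops fourfold each time the degree doubles.
toy_degrees = [1] * 1600 + [2] * 400 + [4] * 100
gamma = estimate_power_law_exponent(toy_degrees)
print(round(gamma, 3))
```

With the toy degrees above, constructed so that the frequency drops fourfold each time the degree doubles, the estimated exponent is 2.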
Furthermore, it would be useful to find relationships (1) among constants at the same level, (2) among constants at different levels and (3) among constants and variables at the same and different levels, to get a hierarchical systems perspective of the organization of constants. In such a setting, relationships among elements could be described in the form of a ‘periodic’ table (Dhar 2007).
A bio-periodic table is a tabular arrangement of elementary interacting components that, when connected, lead to higher-level properties of systems. In our opinion, the mesoscopic level of the ‘protein fold’, rather than the microscopic level of the DNA sequence, represents a reasonable building block of such a periodic table. The concept of a unit in this sense is not a structurally irreducible minimum but a ‘workable unit’ that provides enough description to reliably compose circuits. Engineers likewise use higher-level abstractions and do not compose electronic circuits from a collection of atoms or subatomic particles (as elementary units). Similarly in biology, a cell can be considered a unit for a tissue-level description, and an interaction can be considered a unit for a network-level description. Folds are reasonable fundamental units of a bio-periodic table, as they show less redundancy than sequence-level data and are directly responsible for most of the interactions at the level of pathways and networks. Moving upwards from folds, a bio-periodic table can connect the fold-level description to the cell-level response through a series of hierarchical information transfers. Two key issues arise in such a description: the need to build a ‘fold interaction table’ and the need to build an ‘interaction management table’. An ‘interaction management table’ would set ‘boundary conditions’ on all possible interactions by adding regulatory loops, quantitative thresholds and contextual descriptions. It would be interesting to see how a bio-periodic table performs and evolves as data come in. Even though the term ‘table’ has been used to bring conceptual clarity from an engineering design perspective, the bio-periodic table would most likely resemble a tree.
Standards are created to establish quality norms and requirements for the community. A scientific standard is a reference measurement used for comparisons. Once tested, validated and published, standards are adopted widely. The BioBricks project (Shetty et al. 2008) is an engineering-inspired approach to creating de facto standards for building organisms. Though the approach is novel and interesting, it is unclear whether engineering-level standards will ever be possible in composing biological systems. Also, we do not know of a biobrick-based system that cannot be constructed without biobricks, nor do we know the boundary conditions beyond which adding more biobricks will result in a loss of control. In general, even though reverse engineering of organisms is a logical approach, it is too early to say whether bottom–up construction is going to be easier than top–down disassembly.
The key difference between standards and constants lies in the systems they belong to. ‘Standards’ are artificially created reference points against which other things can be evaluated. ‘Constants’ describe consistent observations derived from natural systems. In the context of biology, constructing systems bottom–up would be easier if there were standards. However, to understand naturally evolved systems as a whole, systems biology needs constants e.g., a constant interaction, a constant phenotype, a constant expression profile and so on. The question is: can standards and constants meet at some point in the future i.e., can human-created reference points turn out to be naturally occurring constants in biological systems? In our opinion, standards are reasonable constraints on the system that can help us uncover new information in a controlled environment. By creating standards in biology and applying them to in vivo construction, it is quite possible to identify naturally occurring biological constants and rules of biological composition, leading to the discovery of new regularities in biology.
Laws are formal representations of objective reality. They do not necessarily represent the total reality but symbolize a specific feature of the system. Richard Feynman preferred to view laws as rhythms or patterns in nature apparent only to the eye of the observer (Feynman 1967). While studying these patterns we sometimes tend to overlook the influence of one pattern on another. A case in point is the well-known Law of Gravitation. Newton’s Law of universal gravitation describes the attraction between bodies with mass and is widely used in planetary studies. However, this Law does not truly describe how bodies behave. For bodies with significant mass and charge, the Law of universal gravitation and Coulomb’s Law of electric charges must be combined to determine the final force. Neither of these alone describes how bodies behave in real time (Cartwright 1983). Newton’s theory does not capture the impact of the gravitational force from other heavenly bodies in determining the final force. Also, the modern thinking is that Newton’s Laws are emergent i.e., these laws symbolize a collective property exhibited by the aggregation of quantum matter into macroscopic fluids and solids (Laughlin 2005). Similarly, the well-known laws of pressure and volume break down when the number of gas molecules falls below a certain threshold. In other words, laws hold well only within a certain range, below or above which uncertainty exists. It is important to recognize that the collective coordination of entities at several levels of the organizational hierarchy is not only fundamental to our existence but also provides the right material for discovering new laws in science.
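One way to make this range of validity concrete is to look at number fluctuations. As a sketch, assume an ideal gas in which the number N of molecules found in a fixed sub-volume follows Poisson statistics; this assumption and the symbols below are introduced here purely for illustration. The relative size of the fluctuations then shrinks with the square root of the mean number of molecules:

```latex
\[
\frac{\sigma_N}{\langle N \rangle} = \frac{1}{\sqrt{\langle N \rangle}}
\]
% Example values:
%   \langle N \rangle = 10^{20}  gives a relative fluctuation of 10^{-10}
%   \langle N \rangle = 10^{2}   gives a relative fluctuation of 10^{-1}
```

For a macroscopic sample the fluctuations are negligible, and averaged quantities such as pressure obey a sharp relationship. For a hundred molecules the fluctuations reach about ten percent, and an averaged law is no longer a reliable description.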
More than 7 million protein sequences and more than 50,000 protein structures have been experimentally determined (Kelley and Scott 2008). With the emergence of the new direction of metagenomics, many more molecular components and interactions are waiting to be discovered. Therefore, it is logical to assume a fundamental organizing principle to explain how information is efficiently transferred over large bio-molecular networks. Whatever happens within the organisms might be interpreted as biology but it is important to clearly understand the distinction between chemistry and biology.
Not everything an organism is composed of belongs to the realm of biology. The construction of matter from atoms and molecules can be described with the help of physics and chemistry. The layer of atomic structure is described by physics. The layer of atomic interaction is described by chemistry. One might think of protein–protein or protein–DNA interactions in terms of laws and rules in biology. However, even these bio-molecular structures and interactions are the outcome of physical processes. The question is: “where does the real biology begin?” In our opinion, real biology occupies the space between interaction and function i.e., biology must operate at levels higher than that of atoms and molecules. In other words, real biology exists in the purpose and not just in plain physical interactions. One may consider feedback loops to be the physical equivalent of purpose. In fact, organisms may be abstracted as “similar-input versus unique-output” black boxes that vary in terms of feedback loops more than in the building blocks themselves.
In search of new laws in biology, it would be pertinent to ask ‘why does it exist?’ in addition to ‘how does it exist?’. This question is actually a subset of a broader question concerning the purpose of our existence i.e., why does life exist? Probably the ultimate answer resides somewhere at the boundary of philosophy and material science. At a physical level, laws exist because of an inherent order in the system. Science simply describes this inherent order in the form of rules, principles and laws. So, laws exist because regularities exist, and regularities exist because molecular structures fit into each other: structure-enabled interaction is the root cause of higher-level regularities. If an object does not interact, it is probably an evolutionary appendage, waiting to be recycled or to be structurally modified for a minimal interaction. Laws are human-invented formalisms created to make sense of what is naturally available and to build new designs from existing raw materials.
The discovery of laws based on well-known constants in physics (e.g., Planck’s constant, the speed of light, the laws of motion) encourages the search for similar regularities in biology. At the phenotypic level, Mendelian Laws of Inheritance provide a reasonable framework. However, at the cell–cell interaction and molecular network levels, fundamental organizing principles remain to be discovered. In biology, it is difficult to conceive of the existence of (1) components with predictable behavior, (2) non-decomposable components similar to the elements of a periodic table and (3) universal biological constants equivalent to the physical constants. In essence there is no ‘standard trajectory’ in biology; every biological decision is optimal in a given environmental context. Moreover, owing to the complex nature of biological organization, it is difficult to think of a universal law or theory in biology connecting all the levels, from atoms to ecosystems. One should look for generalizations at various levels instead. To find such generalizations it is useful to develop novel measurement technologies that capture the dynamic nature of biological systems and, more importantly, catch emergent properties arising from the group behavior of interacting components.
Once the core principles of collective organization are uncovered, the species-specific variation can be explained by considering metabolic/regulatory plug-ins into the fundamental framework. It will be similar to describing foundational rules of automobile construction and adding unique functionalities to build unique car models. Although it is unclear whether we will ever be successful in finding new laws/principles in biology, our paper presents a fresh approach to address this issue.
From qualitative data, some static constants have been identified e.g., sequence and structural motifs, power laws and so on. However, one needs to extract dynamic constants from quantitative data e.g., concentration thresholds. One must be aware that the term ‘constant’ does not denote a value that remains the same independent of the boundary conditions. Each ‘apparently local constant’ acquires a ‘non-local’ character by inheriting the structure and dynamics of the network. This closely parallels the problem of impedance in electrical engineering and has been exploited in terms of an electrical metaphor (Palumbo et al. 2005, 2007). The correlation of metabolic rate with body mass in both prokaryotes and eukaryotes (Kleiber 1932) tells us that the search for regularities in biology is a worthwhile effort. However, in future we need to address the issue at the level of molecular network complexity.
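The metabolic scaling relationship referred to above is usually written as a power law. In the sketch below, B is the metabolic rate, M the body mass and B_0 a normalization constant. The symbols and the exponent of 3/4, which is the value commonly quoted for Kleiber's relationship, are assumptions introduced here and are not stated in the text.

```latex
\[
B = B_0 \, M^{3/4}
\qquad\Longleftrightarrow\qquad
\log B = \log B_0 + \tfrac{3}{4} \log M
\]
% Worked example: a 16-fold increase in body mass M corresponds to a
% 16^{3/4} = 8-fold increase in metabolic rate B.
```

On logarithmic axes the relationship is a straight line whose slope plays the role of a quantitative constant, while B_0 absorbs the dependence on units and on the group of organisms considered.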
Finally, one may ask whether it makes sense to identify regularities from data that are often incomplete and sometimes incorrect. The question is: can we make generalizations from incompletely understood systems? There is another school of thought that says laws in biology simply do not exist. According to this belief, organisms emerge from spontaneous order. We would like to argue that spontaneous order does not point towards a randomly organized system. Spontaneous order merely indicates that components find each other and create a robust system. Unless the act of finding each other is based on well-defined rules, it is difficult to explain how a consistent phenotype can repeatedly arise from molecular interactions. In this paper we have explored the possibility of using the Mendelian approach for finding new laws in biology. There may be several approaches more efficient than the constant-based approach. Irrespective of all this, it is important to recognize that laws formalize consistent observations; they do not explain them.
This work was supported by intramural funding from RIKEN.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Pawan K. Dhar, Phone: +81-45-5039551, Fax: +81-45-5039176, Email: pj.nekir@rahdkp.
Alessandro Giuliani, Email: email@example.com.