Large-scale cancer genomics, proteomics and RNA-sequencing efforts are currently mapping in fine detail the genetic and biochemical alterations that occur in cancer. However, it is becoming clear that it is difficult to integrate and interpret these data and to translate them into treatments. This difficulty is compounded by the recognition that cancer cells evolve, and that initiation, progression and metastasis are influenced by a wide variety of factors. To help tackle this challenge, the US National Cancer Institute Physical Sciences-Oncology Centers initiative is bringing together physicists, cancer biologists, chemists, mathematicians and engineers. How are we beginning to address cancer from the perspective of the physical sciences?
There is a rich history of the physical sciences contributing to cancer research and treatment. Max Delbrück, a physicist, was one of the pioneers of molecular genetics. In collaboration with the biologist Salvador Luria, he showed that phage resistance in a population of bacteria is caused by random mutations. The equations they developed to model this process are still used to predict how cancers gradually become resistant to chemotherapy. Francis Crick and Maurice Wilkins, two physicists, the biologist James Watson and the chemist Rosalind Franklin discovered the structure of DNA, and thus laid the foundation for cancer genomics and much of contemporary biology. Indeed, concepts from other fields as wide-ranging as agriculture (the seed-and-soil hypothesis1), developmental biology (Folkman’s antiangiogenesis strategy2) and mathematics (Nowell’s clonal evolution model3 and the multistep theory of tumorigenesis4,5) are at the core of cancer biology.
Treatments have also been influenced by the physical sciences. Chemotherapy began in chemistry laboratories, in which chemists sought to develop new dye molecules. Radiation oncology, which is a cornerstone of cancer therapy, originated from basic physical chemistry research. Physics and mathematics are central to designing the accelerators that are used to generate radiation and the algorithms that are used to determine where the radiation should be delivered and how much radiation should be used. Most recently, the availability of fairly inexpensive high-throughput sequencing is making it possible to contemplate highly personalized cancer therapies, in which patients are treated with drug regimens that are specifically tailored to their disease. In addition to laying the foundation for new, personalized treatments, these large-scale sequencing efforts have also helped scientists to delineate the enormous complexity of the disease and the degree to which signalling, drug resistance and genomic alterations vary from patient to patient and even within one patient.
This new vista of cancer in all its heterogeneity and complexity suggests additional ways in which the physical sciences can assist cancer researchers and clinicians. For decades, physical scientists have been grappling with systems that are composed of many interacting parts and that exhibit considerable local variation, much like tumours in individual patients. Entire scientific fields, such as the study of superconductivity and the fractional quantum Hall effect, are devoted to understanding the unexpected things that can happen when large numbers of simple pieces interact. It is very difficult, or it may even be impossible, to predict the aggregate behaviour of these systems even if all the laws that are relevant to each constituent are known. Ultimately, physical scientists were forced to invent a completely new set of theoretical and computational tools, such as the Monte Carlo method, to explore and to simulate systems with many coupled degrees of freedom.
Cancer is perhaps such a system. It has now become clear that cancer is not a strictly deterministic disease that progresses through a simple, fixed succession of specific mutations in two or three genes. Rather, there are many molecularly distinct routes to clinically identical cancers, and the final development of malignancy is influenced by a multitude of factors, encompassing the immune system, ageing, nutrition and microenvironmental details within particular tissues. Like other emergent phenomena, cancer cannot be readily understood by merely characterizing all its components. Developing a fundamental understanding of cancer that recognizes and embraces the great heterogeneity of tumours and their emergent properties may benefit from integrated teams of physicists, cancer biologists, mathematicians and engineers.
In this Review, we provide examples from four broad areas to illustrate the idea of the physical sciences contributing to cancer biology. These four areas have well-established clinical relevance, and in each of these areas there is also evidence, owing to decades of preceding research, that physics, mathematics, chemistry and engineering are able to contribute, sometimes decisively, to breakthroughs in cancer research. These areas are cancer mechanics, cancer evolution, information coding and decoding, and transport and delivery in cancer.
Since Egyptian times6, physicians have noted that tumours are typically harder than the tissue that surrounds them. This observation gave rise to the word oncology (from the Ancient Greek ‘onkos’, which means ‘a mass’) and continues to be widely used to detect cancers. However, the connections between tissue mechanics, cancer progression and patient outcomes are only now being established7–9. Much of what we know about the role of mechanics in biological function (and dysfunction) comes from studies of organ development10 and the investigation of clinical specimens with new tools such as magnetic resonance elastography11 and highly sensitive tissue indenters12. Tensile forces within developing organs are master regulators of cell sorting and packing, thereby specifying overall tissue architecture. For example, differential cell cortex tension is a key factor in progenitor cell sorting and thus germ-layer organization13. Tensile forces arise from cell–cell and cell–matrix adhesion, surface tension, and intracellular molecular machines and cytoskeletal elements. From the interplay of cell mechanics and geometrical constraints, constructed by the gene expression of cytoskeletal elements and adhesion complexes, emerge the approximately 250 distinct cell shapes and sizes found in the human body.
Once a tissue has formed, it remains sensitive to alterations to the shape and mechanics of all its constituents. Cells change their shape when the subtle balance of forces that define their shape is modified — this is analogous to how a small stumble can immediately alter, and can quickly end, a game of tug-of-war. When changes of cell shape and mechanics spread in a tissue, as is the case in cancer, the organization and shape of the entire tissue is necessarily altered. Communications between and among cells are mediated through cell surface receptors and a network of signal transduction reactions. Mechanical forces actively alter large-scale spatial organization of signalling molecules14, providing a mechanism for physical forces to directly regulate chemical signal transduction processes. These, in turn, can activate or repress genes, modifying cell and extracellular matrix mechanics, and so on.
The physics of soap bubbles is a simple starting point for thinking about how cells may change their shape during tumour progression. Single soap bubbles are spherical, the one shape that minimizes their surface area and thus their elastic energy. For bubbles, the principle of energy minimization is equivalent to surface area minimization15. This makes it possible to precisely calculate the shapes of collections of soap bubbles. Developmental biologists have recently discovered intriguing similarities between soap bubble configurations and growing and migrating cells. For example, surface mechanics seem to mediate pattern formation in the developing Drosophila melanogaster retina16 (FIG. 1). Despite the enormous obvious differences between soap bubbles and living cells, simple calculations and simulations invoking only cell–cell adhesion, cell contractility and energy minimization reproduce the intricate six-cell ommatidium cell clusters that are found in the D. melanogaster retina17,18. The key equation used in this approach (FIG. 1a) is worthy of discussion. The first term comes straight out of a physics textbook — from elasticity theory. The second and third terms relate to the levels and effective stickiness of N-cadherins and E-cadherins. These two terms blend at least five broad areas of science: genomics, cancer biology, thermodynamics, soft condensed matter physics, and structural and membrane biology. Taken together, this then yields the characteristic six-cell clusters of the D. melanogaster retina17,18. This example illustrates the power of bringing together the physical sciences and biology. It is tempting to speculate, but by no means proved, that similar integrated approaches will help to reveal why particular changes to cell and tissue architecture are so useful for detecting, identifying and staging cancer (FIG. 1b).
Moving beyond the fairly simple D. melanogaster retina to more complicated tissues with vasculature and dozens of different cell types will require qualitatively different approaches, some of which still need to be invented. The major issues are that cell–cell and cell–matrix interfaces are neither uniform nor static, and that there might be multiscale mechanochemical feedback within tissues, potentially yielding extremely complicated dynamics. For example, unlike soap bubbles with their homogeneous interfaces, cells and tissues have complex cell–cell and cell–matrix interfaces that change with time. Each cell inside a tissue monitors its surrounding tensile forces and chemicals. Different mechanochemical inputs to the cell change the genes that it expresses, altering its cytoskeleton and changing its stiffness. Thus, the global mechanochemical state of a tissue may modulate the mechanical properties of each of its constituent cells, and these in turn can initiate mechanochemical state changes that can ripple through the tissue (FIG. 2). This is perhaps the most difficult challenge in understanding the interplay of cell mechanics, signalling, genetics and tissue function: how do cause and effect interact in a tissue, when small changes may be synergistically amplified?
One insight from the physical sciences and neurobiology is that complicated chemical, physical or biological systems are best approached by tightly integrating manipulation (where possible), measurement and simulation. Simulations allow powerful experiments to be designed, and precision manipulation and measurement allow models to be decisively tested and gradually refined. Implicit in such an approach is that experiments are carried out in a way that allows error estimates to be assigned to each measurement. For example, in particle physics, new particles are announced in terms of their sigmas: the probability of scientists being wrong. Overall, the triad of manipulation, measurement and simulation allows physical scientists and engineers to simulate the Earth’s global climate with increasing reliability, investigate the time evolution of the universe, and design and flight-test aircraft purely in silico. This approach is also showing impressive results in systems and synthetic biology19.
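As a rough illustration of what a 'sigma' means, the following sketch converts a significance expressed in standard deviations into the one-sided tail probability of a Gaussian distribution. The function name `sigma_to_p_value` is ours, and the one-sided convention is an assumption that matches common usage in particle physics. Strictly, the result is the probability that a fluctuation at least this large arises by chance when there is no real signal, which is not quite the same as the probability that a claim is wrong.

```python
import math


def sigma_to_p_value(n_sigma: float) -> float:
    """One-sided Gaussian tail probability for a result n_sigma standard
    deviations above the background expectation (illustrative helper)."""
    # erfc(x / sqrt(2)) is the two-sided tail of a standard normal; halve it.
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} sigma -> p = {sigma_to_p_value(n):.3g}")
```

A measurement reported with an error estimate can be converted in the same way, which is what makes results from different experiments directly comparable.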
Another insight that is relevant to cancer mechanics is the utility of measuring system dynamics, which stems from the fundamental link between the forces that drive a system from one state to another and the dynamics with which that change takes place. A good example is an electron moving through space. A single picture of this process, however detailed, contains little information beyond revealing the presence of the electron. By contrast, a movie of this process allows the forces that control the motion of the electron to be accurately inferred; this is how electrodynamics, the theory of charge, light, radio waves and electricity, was developed. Essentially, all known physical laws were discovered by watching how things move, whether they are planets or atoms. In a cancer mechanics context, the ideal experiment would be to watch the cell boundaries within a living tumour and its surrounding tissue move with great accuracy and over long time periods. Such a movie could then be inverted to reveal the alterations of the normal cellular ‘tug-of-war’ that take place in a tissue during carcinogenesis and subsequent metastasis. However, this inversion will require the generation of new and complex mathematical models.
A final consideration is the value of simultaneously determining correlations among many parameters. For example, perturbing a cell and then quantifying the degree to which the fluctuations in the concentration of, or the location of, two or more proteins are correlated can reveal important information about signalling network topology, feedback loops and the role of noise in gene expression19,20. The cancer research literature still features reports of single genes or proteins that are asserted to cause cancer. The flood of results from the cancer genomics and cancer systems biology efforts makes one wonder, however, about the actual utility of relating any one isolated gene or protein to the disease. A new generation of measurement technologies that are able to simultaneously measure many cellular and tissue parameters, thereby relating them and thus allowing cause and effect to be distinguished, would be of great use. Imagine, for example, being able to mechanically manipulate cells within a tissue and then following the activity of all 50-plus proteins of the RAS effector pathways within single cells and thus seeing how the network is influenced by local mechanics and how it might work around a mechanically induced or a drug-induced reduction of kinase activity.
So far, we have been emphasizing the role of mechanics in development and disease, and we have also tried to provide a glimpse of related ideas that may be useful for cancer research. This is only one small part of a much larger puzzle. An equally important way to approach cancer is by studying how it evolves, as this reveals the physiological, mechanical, genetic and biochemical forces that guide the disease and its progression.
Tumours result from evolutionary processes within tissues3,21–25. From an evolutionary standpoint, tumours can be regarded as collections of cells that accumulate genetic and epigenetic alterations, which are then subjected to the selection pressures operating on the cells that harbour them. These alterations have heritable effects on the fitness of cells and may thus lead to rapid increases or decreases of mutant clones within the tumour26,27. Beneficial alterations can generate adaptations, such as an increased growth rate, motility and ability to invade into surrounding tissue, as well as the induction of angiogenesis and evasion of the immune system. The fitness of a tumour cell thus results both from the accumulation of alterations and from the interaction with cells and other components of its microenvironment. Changes that are beneficial to the cell are normally detrimental to the organism, and thus neoplastic processes are an example of conflicting selection acting on different hierarchical levels28: evolution and natural selection at the cellular level generally lead to increased proliferation, survival and evolvability, and result in progression, invasion and resistance. Selection at the level of organisms and genes has led to the evolution of oncogenes and tumour suppressors in the genome23,29.
Viewing neoplasms as a result of evolutionary forces operating on tissues within multicellular organisms provides physicists, mathematicians and population geneticists with an opportunity to use their tools to describe the evolution and ecology of cancer cells with mathematical constructs. Such theoretical modelling, together with the principles of evolutionary biology, has been successfully used to study the mechanisms and dynamics of tumour initiation4,5,30,31 and progression32,33, as well as the response to treatment and the emergence of resistance34–36. For example, an interest in understanding and preventing the evolution of resistance against anticancer therapy has inspired the development of several mathematical approaches. Coldman and co-authors pioneered the field by introducing stochastic models of resistance to chemotherapy to guide the selection of treatment schedules35,37. The thought process introduced by these investigators was later applied to study the risk of pre-existing resistance38–41, resistance emerging during treatment39,40,42 and the optimal scheduling of treatment administration under various circumstances43–47. Related efforts have led to such seminal results as the discovery of tumour suppressor genes30,48 and the multistage theory of carcinogenesis4,49.
The recognition of cancer as a disease that is caused by the accumulation of several somatic alterations has motivated recent large-scale efforts to annotate the cancer genome and epigenome for many human cancers50–52. When combined with computational approaches that can distinguish significant, recurrent events from the background noise in high-resolution data sets, these cancer genome and epigenome surveys yield molecular portraits that are specific for each cancer type and consistent across multiple sample sets in that they uncover a subset of events in many samples of the same cancer type53,54. These emerging, large cross-sectional data sets are the basis for investigations by computational biologists, physicists, mathematicians and evolutionary biologists to address a multitude of questions about the generation and persistence of genetic and epigenetic alterations in cancer. Here, we summarize two recent advances in a mechanistic understanding of these alterations — one addressing the propensity of genetic alterations to arise at particular loci in the genomes of evolving cancer cells, and the other concerning the deduction of the temporal order in which genomic alterations arise during tumorigenesis using evolutionary mathematical approaches.
An unstable genome is a hallmark of many cancers55. The mechanisms of the generation of genomic variation, however, have not yet been entirely elucidated. It is unclear, for example, whether some mutagenic features that drive somatic alterations in cancer are encoded in the genome sequence or whether they can operate in a tissue-specific manner. Therefore, a genome-wide analysis of the properties associated with DNA breakpoints that are related to somatic alterations in cancer is of fundamental interest for many areas in biology, including cancer genomics, genome informatics and evolution.
Many exogenous and endogenous factors, as well as molecular mechanisms, can cause double-strand breaks and erroneous DNA repair, leading to genomic alterations in cancer genomes56–58. Under certain circumstances, DNA can adopt non-B conformations, which can similarly contribute to DNA damage59–61. Guanine-rich sequences ($G_{3+}N_{1\text{–}7}G_{3+}N_{1\text{–}7}G_{3+}N_{1\text{–}7}G_{3+}$, that is, four runs of at least three guanines separated by loops of one to seven nucleotides of any type) can adopt four-stranded structures called G-quadruplexes (G4s)62–64. As these sequences occur frequently in the human genome, they could potentially contribute to DNA damage in multiple areas of the genome. Indeed, G4 structures have the potential to obstruct the movement of DNA polymerase65, thereby increasing the risk of DNA breakage or of non-allelic homologous recombination. A recent genome-wide analysis of DNA breakpoints that are associated with somatic copy number alterations (SCNAs) from 2,792 cancer samples classified into 26 cancer subtypes led to the identification of SCNA hotspots66. Despite a subset of these hotspots being present in the genomes of apparently healthy individuals, this investigation uncovered that G4 structures could be causally implicated in genomic instability and the generation of DNA breakpoints in cancer. The genomic alterations that were associated with DNA breaks had a strand-specific pattern that was consistent with a causal role of G4 structures in their generation. An analysis of methylation data from several different tissue and cancer types subsequently led to the finding that abnormal hypomethylation in genomic regions that are enriched in G4 sequences is likely to be a key mutagenic factor that is associated with tumorigenesis. These findings are consistent with observations that G4 structures are implicated in germline deletion67,68 and recombination69 events.
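The G4 sequence motif quoted above can be located with a regular expression. The following is a minimal sketch, not the pipeline used in the cited analysis: the function name `find_g4_motifs` is hypothetical, the input is assumed to be a plain string of A, C, G and T, and a C-rich match is interpreted as a motif on the complementary strand.

```python
import re

# Four runs of at least three G, separated by loops of 1-7 nucleotides.
# The C-rich counterpart marks the same motif on the complementary strand.
_LOOP = "[ACGT]{1,7}"
G4_PLUS = re.compile(_LOOP.join(["G{3,}"] * 4))
G4_MINUS = re.compile(_LOOP.join(["C{3,}"] * 4))


def find_g4_motifs(sequence):
    """Return sorted (start, end, strand) tuples for putative G4 motifs.

    Coordinates are 0-based and half-open. re.finditer reports
    non-overlapping matches only, so overlapping candidates are merged
    into the first match found.
    """
    seq = sequence.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    print(find_g4_motifs("TTGGGAGGGTGGGAGGGTT"))
```

Reporting the strand alongside each hit is what would allow motif positions to be compared with strand-specific breakpoint patterns of the kind described above.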
These studies suggested a mechanistic model for the generation of tissue-specific mutational landscapes in cancer, showcasing the ability of computational approaches, together with modern cancer data sets, to provide mechanistic insights into the evolution of cancer genomes.
These emerging, cross-sectional cancer data sets have also recently been linked to a novel evolutionary approach for predicting the temporal order of somatic events that arise during tumorigenesis. Knowledge of this temporal order helps to guide the generation of the correct genomic context in animal models of human cancer, and aids in prioritizing the validation of potential drug targets, as changes that occur early in malignant transformation may result in the rewiring of the signalling circuitry or may confer a state of addiction to the new signal. A novel evolutionary approach, called retracing the evolutionary steps in cancer (RESIC)70, determines the sequence of genetic events using cross-sectional genomic data from a large number of tumours.
RESIC is based on the principles of population genetics71 (BOX 1). RESIC predicts the distribution of patients across possible mutational states that are defined by specific genotypes; this distribution is then compared with the numbers of clinical samples that contain the corresponding genotypes. This mapping is used to optimize the evolutionary parameters by minimizing the difference between the predicted and the observed frequencies in the data set. The output of RESIC is given as a percentage of the flux through the network through each particular evolutionary path, thus specifying the temporal sequence of somatic alterations in cancer samples.
Consider a population of N cells at risk of accumulating the genetic changes that lead to cancer. Cells proliferate according to a stochastic process: at each time step, a cell is chosen with a probability proportional to its fitness to produce a possibly mutated daughter cell. Subsequently, another cell is chosen at random to die, and is replaced by the newly produced cell to maintain homeostasis. A mutated cell can take over the population (that is, reach fixation) or go extinct owing to stochastic fluctuations (see the figure). If the population size is smaller than the inverse of the mutation rate, then at any time, there are at most two types of cells in the population: type i and type j. Cells of type i differ from cells of type j by only one genetic alteration. Their respective fitness values (that is, growth rates) are denoted by $r_i$ and $r_j$. The rate at which the population transitions from state i to state j is given by $m_{i,j} = N_i u_i \rho(r_i, r_j)$, where $N_i$ and $u_i$ are the population size and mutation rate in state i, and $\rho(r_i, r_j)$ is the probability that a single cell of type j reaches fixation in a population of type i cells: $\rho(r_i, r_j) = \frac{1 - 1/(r_j/r_i)}{1 - 1/(r_j/r_i)^{N_i}}$ if $r_i \neq r_j$, and $\rho(r_i, r_j) = 1/N_i$ if $r_i = r_j$. Depending on the order of appearance of alterations, the population follows different evolutionary paths towards the fully mutated state (part a).
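The fixation probability and transition rate defined in this box can be evaluated directly. The sketch below is illustrative only: the function names and the numerical values in the usage example are ours, and the direct formula is assumed to be used for moderate population sizes, because very large exponents overflow floating-point arithmetic.

```python
def fixation_probability(r_i: float, r_j: float, n: int) -> float:
    """Probability that one type-j cell takes over n cells of type i."""
    if r_i == r_j:
        return 1.0 / n  # neutral alteration: fixation by drift alone
    ratio = r_j / r_i  # relative fitness of the new type
    return (1.0 - 1.0 / ratio) / (1.0 - 1.0 / ratio ** n)


def transition_rate(n: int, u: float, r_i: float, r_j: float) -> float:
    """Rate m_ij = N_i * u_i * rho(r_i, r_j) of moving from state i to j."""
    return n * u * fixation_probability(r_i, r_j, n)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 cells, mutation rate 1e-7 per division.
    for r_j in (0.9, 1.0, 1.1):
        print(r_j, transition_rate(1000, 1e-7, 1.0, r_j))
```

Advantageous alterations (relative fitness above 1) fix with a probability close to 1 − r_i/r_j in large populations, whereas deleterious ones fix only very rarely, so the fitness values strongly shape which evolutionary path is taken.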
Cancers are considered to originate from a single population of cells per person. Using this model, we study the evolutionary dynamics of individuals accumulating the mutations leading to cancer (part b). We consider the dynamics of patients in steady state: there is a constant influx into the unmutated state, representing diagnosis of disease, and a constant outflux from the fully mutated state, accounting for diagnosis and deaths of patients or their cure. The evolutionary dynamics of a population is described by $\dot{X} = XM + F$, where the vector $X(t)$ consists of the frequencies $X_i(t)$, the matrix $M$ contains the transition rates $m_{i,j}$, and $F = (f, 0, \ldots, -f)$ represents the influx into the initial node and outflux from the fully mutated node. At steady state, the population is distributed across all possible states; this steady state distribution can be compared with the numbers of clinical samples that have the corresponding genotypes, where the total number of patients in a data set is equal to the sum of patients in all states (part b). This mapping is used to optimize a subset of parameters in the mathematical model (that is, the fitness values of cell types) by minimizing the difference between the prediction and the observed frequencies in the data set. Other parameters, such as cellular population size, mutation rate and influx rate, are estimated from experimental results and tested for robustness over several orders of magnitude. The output of RESIC is given as the percentage of flux through the network via each particular path, and can be used, together with cross-sectional cancer genome profiling studies, to identify the temporal sequence of events arising during tumorigenesis (part c).
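To make the steady-state idea concrete, the following sketch balances inflow against outflow for each state of a small mutational network and reports the flux along each edge. It is a simplified illustration under stated assumptions, not the RESIC implementation: the function name, the state labels and the rate values are hypothetical, the network is assumed to be acyclic with states supplied in an order in which every state follows its predecessors, and the fitting of fitness values to observed genotype frequencies is not shown.

```python
def steady_state(rates, order, influx):
    """Steady-state occupancies and edge fluxes for an acyclic network.

    rates[(i, j)] is the transition rate from state i to state j, order
    lists the states so that predecessors come first, and influx is the
    constant flow f into the first (unmutated) state.
    """
    occupancy, flux = {}, {}
    inflow = {state: 0.0 for state in order}
    inflow[order[0]] = influx
    for state in order:
        outgoing = {j: m for (i, j), m in rates.items() if i == state}
        total_out = sum(outgoing.values())
        if total_out == 0.0:
            continue  # fully mutated state: drained by the constant outflux
        # At steady state, inflow equals occupancy times total exit rate.
        occupancy[state] = inflow[state] / total_out
        for j, m in outgoing.items():
            flux[(state, j)] = occupancy[state] * m
            inflow[j] += flux[(state, j)]
    return occupancy, flux


if __name__ == "__main__":
    # Two alterations, A and B, that can arise in either order.
    rates = {("00", "A"): 2.0, ("00", "B"): 1.0,
             ("A", "AB"): 0.5, ("B", "AB"): 4.0}
    occ, flux = steady_state(rates, ["00", "A", "B", "AB"], influx=3.0)
    for path_end in ("A", "B"):
        share = 100.0 * flux[(path_end, "AB")] / 3.0
        print(f"path via {path_end} first: {share:.1f}% of the flux")
```

In this two-alteration example, each path's share of the flux is set by the branching ratio at the unmutated state, whereas the occupancy of each intermediate state also depends on how quickly it is left. Those occupancies are the quantities that can be compared with the genotype counts in a cross-sectional data set.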
The established sequence of genetic events arising during the multistep process of colorectal carcinogenesis72, neurofibromin (NF1)-driven primary glioblastoma73 and secondary acute myeloid leukaemia74 provided an opportunity to validate the ability of RESIC to recover the orders of events from cross-sectional data sets70. This methodology was also applied to a large, integrated genomics data set of primary glioblastoma samples53. First, areas of significant gene copy number alterations were identified using Genomic Identification of Significant Targets in Cancer (GISTIC)75, and then alterations that were significantly positively correlated with each other were selected for further analysis; correlations between alterations are a prerequisite for the methodology, as the determination of an order of oncogenic events is only meaningful for those events that co-occur sufficiently often. This approach determined that homozygous deletions of the CDKN2A locus (which encodes INK4A and ARF) frequently co-occur with epidermal growth factor receptor (EGFR) and PTEN alterations ($P < 10^{-8}$) in primary glioblastoma. When studying this mutational network, RESIC predicts that the most common early alterations are EGFR low-level amplification and CDKN2A deletion, which have a similar likelihood of occurring. Although there is no single most frequent path through the network, the frequency of paths concluding with high-level amplification of EGFR is highest; the second most frequent final event is homozygous CDKN2A deletion70. These data suggest that glial progenitor cells can tolerate full EGFR activation only after the inactivation of CDKN2A or PTEN. This result agrees with the fact that EGFR overexpression is insufficient for tumorigenesis in mouse models of glioblastoma76,77, providing support for the temporal order of events predicted by RESIC.
Evolutionary methods of analysis such as the ones presented here will provide the research community with tools for the identification of tumour-initiating events using modern cancer genome and epigenome data sets; furthermore, such frameworks of tumorigenesis will help with the generation of hypotheses that can be tested using transgenic mouse models of human cancer. Many opportunities exist to use evolutionary approaches to further our knowledge of cancer initiation, cancer progression, the response to treatment and therapeutic resistance.
Current thinking in information coding and decoding in biological systems generally implies a one-way information flow — from DNA to transcribed RNA to translation to protein. Recent studies in developmental biology and epigenetics, however, have demonstrated that this information can flow in both directions, and that this flow can be influenced by external physical forces, even if the underlying DNA sequence remains unaltered. It is becoming increasingly clear that this biological information system is not only made up of two-way communication, but that feedback loops, inter-connectivity and modulation by external environmental forces also introduce a previously unappreciated level of complexity. Moreover, studies are revealing that new kinds of biological information exist, ranging from genetic information that is encoded in the mechanical properties of DNA, to information in protein sequences that control the lifetime and post-translational processing of the protein. This complexity represents an opportunity for physicists, chemists and engineers to work together and has led to studies at the interface of physical science, genetics and oncology. These investigations are yielding rapid advances in our understanding of chromosome structure and function at multiple length scales and are shedding light on gene regulation in normal health and development, which in turn may help to explain, diagnose and treat gene misregulation in cancer. Here, we summarize two of these advances — one concerning fundamental physical chemistry in gene regulation that is related to the lowest levels of chromosome structural organization, and the other concerning a novel cancer diagnostic technique that seems to be measuring aspects of the highest levels of chromosome organization.
An especially active research area concerns the rules governing the most fundamental level of chromosome architecture78, in which short stretches of DNA (147 bp) are wrapped locally in ~1.75 turns around octameric cores of histone proteins, creating nucleosomes79. Nucleosomes are separated from each other by ~10–50 bp stretches of unwrapped linker DNA; thus, only 75–90% of eukaryotic genomic DNA is wrapped in nucleosomes. DNA that is wrapped in nucleosomes is sterically occluded from most other DNA-binding proteins, and moreover is sharply distorted away from the DNA conformations that are favoured by most other proteins80. Consequently, the placement of nucleosomes along the DNA profoundly influences essential DNA interactions, such as gene regulation, transcription, replication, recombination, chromosome breakage, retroviral and transposon integration sites and DNA repair81–92.
It is perhaps not surprising, therefore, that recent studies have shown that nucleosome positioning is tightly regulated, and that there is an additional layer of genetic information, superimposed or multiplexed directly on top of other kinds of regulatory and coding information, which functions to bias where nucleosomes can be located along the DNA. The nature of this information lies in the sequence-dependent mechanics of DNA93. Different DNA sequences differ greatly regarding the ease with which they can bend around a nucleosome94,95, conferring differences of many thousand-fold or more on the affinity of nucleosomes for one DNA sequence versus another96. The concentration of nucleosomes in the cell is kept below 100% saturation of the genomic DNA, and thus different regions of DNA compete for nucleosome occupancy. DNA sequences can dictate which DNA regions will compete well for nucleosomes and have high intrinsic nucleosome occupancy, and which will not. Like transcription factor binding sites, natural genomic nucleosome positioning sequences are not determined purely through highest possible affinity96, allowing degeneracy in the choice of DNA sequence used. Thus, the degeneracy introduced by variations in the genetic code, transcription factor binding sites and the mechanical constraints of the nucleosome DNA sequence preferences97,98 allows the preferential locations of many nucleosomes to be specified alongside DNA coding and conformation changes that constitute genetic information and conventional gene regulatory information.
Approaches that are more commonly used in the physical sciences are contributing to this work in two ways. Diverse new experimental studies99,100 and theoretical studies, ranging from atomic101 to multi-scale102 and mesoscopic97,98,103, seek to measure and explain the sequence-dependent mechanics of DNA, with the goal of being able to predict, from first principles, the influence of the genomic DNA sequence on nucleosome formation and on the stability or occupancy of other structures involving tightly bent DNA (such as numerous sharply looped gene regulatory complexes).
Additionally, physical science tools have already enabled advances in predicting important aspects of nucleosome organization in vivo, using phenomenological definitions of the nucleosome DNA sequence preferences that have been obtained in direct binding experiments104. This prediction problem is complicated by the combination of the high concentration of nucleosomes along the DNA and the physical reality that nucleosomes cannot overlap along the DNA in any one cell at any one point in time. The problem is that where any one nucleosome resides along the DNA depends not only on the DNA sequence inside that nucleosome, but also on the positions of its neighbours; where those neighbours reside in turn depends not only on the DNA sequences inside them, but also on where their own neighbours reside, and so on, out to the ends of the chromosome. If nucleosomes occurred only rarely along the DNA, then one might be able to ignore the problem, as the chances of two favoured nucleosome locations overlapping on the DNA might be very low. This is the case in typical analyses of transcription factor binding sites, for example, as the recognition sites for any given transcription factor are typically sparsely distributed along the genome. But for nucleosomes this is definitely not the case; some sort of ‘holistic’ theoretical approach is required, which solves the nucleosome distribution problem for an entire chromosome all at once.
The assembly of nucleosomes in vitro from purified components104 has identified the sequences to which nucleosomes are more likely to bind. This information specifies an effective potential for a nucleosome to start at each basepair along the DNA. But calculating where the nucleosomes will actually be is complicated by competition between nucleosomes, which occupy space and cannot overlap. If one poses the hypothesis that nucleosomes equilibrate their locations along the DNA (a conjecture that could at best be only approximately true), then this problem reduces to a famous problem in statistical mechanics, namely that of a one-dimensional solution of hard rods in an external potential, and thus can be approximately solved by Monte Carlo methods105 or exactly solved by numerical integration106, recursion107 or dynamic programming78,97 (BOX 2). The solution of these equations yields the probability of a nucleosome starting at each basepair, and the probability that each basepair is covered by any of the 147 different nucleosomes that could potentially cover it (as each nucleosome covers 147 bp). The solution bears a striking resemblance to (and shows a highly significant genome-wide correlation with) the locations of nucleosomes measured genome-wide in vivo, suggesting that the assumptions made in the theoretical analysis could be reasonable.
Following the recursive approach, one defines a potential Vn for a nucleosome located at basepair n, Vn = −kBT0 ln Pn, where kB is Boltzmann’s constant, Pn is the likelihood of a nucleosome starting at basepair n in the absence of nucleosome–nucleosome interactions, given by a Markov model used to specify the intrinsic DNA sequence preferences of the nucleosome78,104, and T0 is the reference temperature at which the Markov model is defined. Thus, the likelihoods given by the nucleosome–DNA interaction Markov model are treated as an apparent free energy landscape onto which nucleosomes will be placed at equilibrium subject to the rule that they cannot overlap in space and time.
One then defines a recursion relation (see equation 1):
where μ is the nucleosome chemical potential (related to the nucleosome concentration c by μ = μ0 + kBT ln(c), where μ0 is a constant) and the Hn are given by equation 2:
for a nucleosome occupying (excluding another nucleosome for a length of) a basepairs. The Hn capture the different ways a given site can be blocked by neighbouring nucleosomes. One initiates the recursion by forbidding a nucleosome from occupying less than length a basepairs at the right-hand end of the DNA, setting the Hn values for those a–1 basepairs to 1. One then solves equations 1 and 2 iteratively from large to small n. The probability ρn of a nucleosome starting at each basepair n is then given by equation 3:
which is solved iteratively from small to large n. Finally, the occupancy σn of a given basepair by any of the a nucleosomes that potentially occlude it is given by equation 4:
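The hard-rod calculation described above can be illustrated with a generic forward-backward dynamic programme over partition functions. The sketch below is not the specific recursion of equations 1–4; it is a standard equivalent formulation, written under the assumption that energies are supplied in units of kBT at T = T0 (so that `energies[n]` plays the role of Vn = −ln Pn). The function name, argument names and the toy energy landscape are all hypothetical.

```python
import math

def _logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = (x, y) if x > y else (y, x)
    return hi + math.log1p(math.exp(lo - hi))

def hard_rod_occupancy(energies, mu, footprint):
    """Equilibrium hard rods on a one-dimensional lattice.

    energies[n]: energy V_n (units of kBT) of a rod starting at basepair n
    mu:          chemical potential (units of kBT)
    footprint:   basepairs covered by one rod (a = 147 for a nucleosome)
    Returns (rho, sigma): start probabilities and per-basepair occupancies.
    """
    n_bp = len(energies)
    # Log statistical weight of a rod starting at n; it must fit on the DNA.
    log_w = [mu - energies[n] if n + footprint <= n_bp else -math.inf
             for n in range(n_bp)]

    # fwd[i]: log partition function of basepairs 0..i-1.
    fwd = [0.0] * (n_bp + 1)
    for i in range(1, n_bp + 1):
        fwd[i] = fwd[i - 1]                 # basepair i-1 is left empty
        if i >= footprint:                  # or a rod ends at basepair i-1
            start = i - footprint
            fwd[i] = _logaddexp(fwd[i], fwd[start] + log_w[start])

    # bwd[i]: log partition function of basepairs i..n_bp-1.
    bwd = [0.0] * (n_bp + 1)
    for i in range(n_bp - 1, -1, -1):
        bwd[i] = bwd[i + 1]                 # basepair i is left empty
        if i + footprint <= n_bp:           # or a rod starts at basepair i
            bwd[i] = _logaddexp(bwd[i], log_w[i] + bwd[i + footprint])

    log_z = fwd[n_bp]
    rho = [math.exp(fwd[n] + log_w[n] + bwd[n + footprint] - log_z)
           if n + footprint <= n_bp else 0.0
           for n in range(n_bp)]

    # sigma[n] sums rho over the start sites whose rod covers basepair n.
    sigma, running = [], 0.0
    for n in range(n_bp):
        running += rho[n]
        if n >= footprint:
            running -= rho[n - footprint]
        sigma.append(running)
    return rho, sigma

# Toy landscape (assumed values): flat energies with one favourable start site.
toy_energies = [0.0] * 600
toy_energies[200] = -5.0
toy_rho, toy_sigma = hard_rod_occupancy(toy_energies, mu=-3.0, footprint=147)
print(f"start probability at basepair 200: {toy_rho[200]:.3f}")
print(f"occupancy at basepair 250: {toy_sigma[250]:.3f}")
```

Working in log space avoids overflow on chromosome-length inputs, and the running sum keeps the occupancy calculation linear in the sequence length.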
Of course, the real problem in the cell nucleus is not simply one of nucleosome positioning; rather, there is an ongoing dynamic competition between nucleosomes and changing constellations of transcription factors and other DNA binding proteins, each seeking to bind to favoured sequences along the DNA. Transcription factors may have high sequence specificity but they are present at fairly low concentrations, whereas nucleosomes may have lower specificity but are present at high concentrations, and thus both make important contributions to the outcome. In a sense, therefore, the genome sequence is specifying how this dynamically evolving competition will play out.
Similarly, it can be assumed that the nucleosomes and changing sets of competing factors approximate an equilibrium distribution for any given window of time (that is, for any given set of competing factor concentrations), and the statistical mechanics model summarized in BOX 2 can be generalized to allow for competition not just between nucleosomes, but also between one or more different transcription factors with each other and with nucleosomes108. This problem can again be solved exactly by using dynamic programming97,109. An example of the competition of nucleosomes with themselves and with a single factor binding at two nearby sites is shown in FIG. 3. In this model, nucleosomes reconfigure in response to changing transcription factor concentrations simply because of the changing nature of the competition. The transcription factors influence the resulting distribution of nucleosome locations and occupancies, while the nucleosomes equally influence the distribution of bound transcription factors, as well as the occupancies of the transcription factors110.
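The generalization to several competing species can be sketched with the same kind of dynamic programme, letting each species have its own footprint and its own landscape of statistical weights. This is a minimal illustration under stated assumptions rather than the method of the cited work: the species names, footprints, energies and chemical potentials below are invented for the example, and all energies are in units of kBT.

```python
import math

def _logaddexp(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = (x, y) if x > y else (y, x)
    return hi + math.log1p(math.exp(lo - hi))

def competitive_occupancy(n_bp, species):
    """Per-basepair occupancy for several non-overlapping species.

    species maps a name to (footprint, log_weights), where
    log_weights[n] = mu - V[n] for a particle starting at basepair n
    (-math.inf forbids binding at that position).
    """
    fwd = [0.0] * (n_bp + 1)     # log partition function of basepairs 0..i-1
    for i in range(1, n_bp + 1):
        acc = fwd[i - 1]         # basepair i-1 is left empty
        for footprint, log_w in species.values():
            if i >= footprint:   # or a particle of this species ends at i-1
                start = i - footprint
                acc = _logaddexp(acc, fwd[start] + log_w[start])
        fwd[i] = acc

    bwd = [0.0] * (n_bp + 1)     # log partition function of basepairs i..end
    for i in range(n_bp - 1, -1, -1):
        acc = bwd[i + 1]
        for footprint, log_w in species.values():
            if i + footprint <= n_bp:
                acc = _logaddexp(acc, log_w[i] + bwd[i + footprint])
        bwd[i] = acc

    log_z = fwd[n_bp]
    occupancy = {}
    for name, (footprint, log_w) in species.items():
        rho = [math.exp(fwd[n] + log_w[n] + bwd[n + footprint] - log_z)
               if n + footprint <= n_bp else 0.0
               for n in range(n_bp)]
        sigma, running = [], 0.0
        for n in range(n_bp):    # sliding-window sum of start probabilities
            running += rho[n]
            if n >= footprint:
                running -= rho[n - footprint]
            sigma.append(running)
        occupancy[name] = sigma
    return occupancy

def nucleosome_occupancy_at_site(mu_tf):
    """Toy system (all numbers assumed): one factor site on 400 bp of DNA."""
    n_bp, site = 400, 200
    nucleosome = (147, [-1.0] * n_bp)      # flat landscape, mu - V = -1
    tf_log_w = [-math.inf] * n_bp
    tf_log_w[site] = mu_tf + 8.0           # single site with V = -8
    occ = competitive_occupancy(
        n_bp, {"nucleosome": nucleosome, "tf": (10, tf_log_w)})
    return occ["nucleosome"][site + 5], occ["tf"][site + 5]

for mu_tf in (-12.0, -8.0, -4.0):
    nuc_occ, tf_occ = nucleosome_occupancy_at_site(mu_tf)
    print(f"mu_tf={mu_tf:6.1f}: nucleosome {nuc_occ:.3f}, factor {tf_occ:.3f}")
```

In this toy system, raising the factor's chemical potential (that is, its concentration) lowers the nucleosome occupancy over the factor's site without any explicit interaction between the two species, which is the purely competitive reconfiguration described in the text.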
By complementing such an analysis of nucleosome and transcription factor binding configurations with information relating transcription factor binding and the eventual transcriptional output, it might be possible in the future to predict the transcriptional state of many genes in a cell given only the genomic DNA sequence, concentrations of key transcription factors, and the known sequence preferences of those transcription factors108. Such a predictive ability would in turn be valuable for understanding how a normal cell is transformed into a malignant one, and conversely, how a malignant cell might be transformed back into a non-malignant one. Currently, however, other outstanding issues remain, most notably, the extent to which these models will predict biological reality, as determined by experiments.
At the other, highly compacted, end of the structural hierarchy of the chromosome, little detailed structural information is currently available111, but there is great potential for important advances to be made from using approaches that are commonly used in the physical sciences. Indeed, methods from the physical sciences are already being used in cancer diagnostics.
Partial wave spectroscopy (PWS)112 takes advantage of the field effect, in which apparently normal cells, which are distant from a cancerous or potentially precancerous lesion, can develop anomalous properties that may be detectable even in the absence of knowledge regarding the exact location of the lesion itself (FIG. 3). The PWS experiment yields a signal, the magnitude of which can be a highly sensitive and specific predictor of the existence of a cancerous lesion some distance away in the body. This approach has great potential in the diagnosis of cancers, such as those of the colon, lung and pancreas, for which existing diagnostics are ineffective, unpleasant or have a substantial risk of complications. Although PWS was initially developed as a purely phenomenological indicator, its striking preliminary successes have heightened interest in better understanding the underlying physical and biological changes that it monitors. Quantitative studies on patient samples versus normal controls have shown that increased PWS disorder strength in cells that are distal to a cancerous lesion is a strong indicator of the existence of a lesion112,113. However, the PWS technology has not yet been applied to problems of distinguishing pre-malignant from malignant disease, which is an important goal of future studies.
PWS measures the refractive index, which is related to the polarization of molecules, and in turn is related to electron density. Of the main macromolecular constituents of a cell with amounts that might plausibly have significant variability on subwavelength length scales, nucleic acids and phospholipids stand out by virtue of having fairly high amounts of the electron-rich element phosphorus. Chromatin in particular is both highly phosphorus-enriched and highly heterogeneous in subcellular (and subnuclear) distribution, with notable regions of high local chromatin concentration observed cytologically as dense heterochromatic regions. Hence, one expects that dense nucleic acid-containing superstructures, such as large-scale regions of compact heterochromatin, might dominate the PWS disorder strength measurement.
Consistent with this expectation, direct tests show that both the nucleus and the cytoplasm contribute to the measured disorder strength, but that the nuclear contributions dominate114. Certainly within the nucleus, one expects chromatin structure to dominate refractive index inhomogeneities and so the PWS disorder strength115.
Therefore, for the case of PWS, physical science approaches have already made a valuable contribution to cancer diagnostics. This approach potentially provides diagnostic sensitivity and specificity in many cancer types that are comparable to or better than those that are presently available using much more invasive tests. PWS should also help in understanding the intracellular organization of chromatin, a problem that has resisted a definitive solution for many decades. Much anecdotal evidence suggests that there is a relationship between higher order chromatin compaction and transcriptional repression. Advances using PWS as a discovery tool, together with other approaches from physical sciences ranging from micromechanical studies on whole chromosomes, to novel imaging modalities such as super-resolution optical microscopy and electron microscopy using engineered nanoparticle markers, will probably shed much light on this longstanding problem in fundamental molecular biology. Furthermore, a better understanding of the chromatin organization and how it relates to the PWS disorder signal may in the future lead to further refinement of this promising diagnostic tool.
Complex systems represent major areas of study for physical scientists. Fundamental insights from areas such as thermodynamics, fluid and classical mechanics, in combination with advanced computational visualization and simulation, could potentially aid in understanding cancer. Genomic instability is a fundamental characteristic of cancer, inherently variable between patients even with nominally the same disease and during cancer progression within the same patient116. The current clinical taxonomy of cancer technically defines more than 200 types, but closer inspection reveals that cancers are like ‘malignant snowflakes’, with no two cases identical at the cellular level117–119. Heterogeneity is higher in tumour types that originate later in life and increases during tumour progression120,121. As is particularly evident in pancreatic cancer122, tens or hundreds of different diseases coexist within the same person with metastatic disease, with each metastasis having a distinct genetic profile and distinct signalling pathways, growth, interaction with the microenvironment and response to treatment.
Non-genetic and extrinsic factors greatly add to the complexity of cancer: the stochastic partitioning of proteins during cell division, which generates randomness in protein abundance123; epigenetic heterogeneity including DNA methylation, histone modifications, nucleosomal occupancy and remodelling, chromatin modifications and remodelling, non-coding RNAs124 and proteomic profiles125; microenvironmental heterogeneity between cancer types and metastases126, as well as within a single tumour; and the macroscopic heterogeneity of the patient, including age, gender, weight, immune status, lifestyle and mental health.
Faced with this overwhelming diversity, medicine has not substantially advanced in the treatment of metastatic cancer, and perhaps it never will unless the root causes of this heterogeneity are clarified. Complexity theory in mathematics and physics is defined by the quest to understand and predict the emergence of order and structure in complex and apparently chaotic systems, such as turbulent flows. A major advance in breaking the overall problem into more manageable pieces lies in the answer to the question of what defines a cancer. Six fundamental, distinct hallmarks have been proposed55. It is possible that cancer can be understood in terms of these hallmarks, which is akin to resolving a six-dimensional vector into components along six coordinate axes. However, each ‘axis’ further comprises a limited number of mutated genes that define pathological pathways127. This is accompanied by the heterogeneity of genetic mutations even in two nominally identical tumours, owing to a large number of infrequently mutated genes, which may be crucial drivers of the development and the progression of tumours128. Systems biology129 offers a promising approach to the organization of genomic data into quanta of biological order, and information on molecular circuitry connecting cellular pathways and disease states130. The quest for intelligible underlying structures in cancer to which complexity theory could be applied can be thought of as the search for ‘super-genes’ that are shared among cancer types, which form 3–5% of the mutated gene population, and which affect the key pathways in which the other mutations tend to cluster127,128. Even with this approach, the assessment of cancer complexity challenges our optimism about finding cures with the currently available molecularly targeted therapeutic approaches127,131,132.
The conventional approach to metastatic disease, which is what kills most cancer patients, involves the use of systemically administered agents, in the form of chemotherapy, radiotherapy or biomolecularly targeted therapeutics that have the capacity to reduce the selective fitness advantage of metastasizing and/or metastasized cells. Such chemical and biological agents are expected to simultaneously carry out a triad of functions: transport from the point of administration to the intended cancer target sites, preferential accumulation at these target sites and preferential cytotoxicity or cell signalling modulation. The failure to treat metastases stems from the failure to develop agents that are capable of reducing the reproductive success of metastatic cells, as well as the failure to address the three functions simultaneously, for all different presentations of the disease within a patient at any given time. One ‘blind spot’ has been the physics of mass and momentum transport. This is a starting point for a different perspective on cancer, termed Transport OncoPhysics133. This approach aims to reduce the complexity of apparently disparate biological hallmarks to the unifying notion that these hallmarks all reflect the deregulation of mass transport. In this framework, cancer is viewed as a disease of mass transport deregulation at multiple scales — bridging the molecular to the cellular, the microenvironmental, organ and organism levels (FIG. 4). An intriguing view emerges of a family of diseases characterized by pathological disruptions of mass transport, in hierarchically nested systems. For example, the defining aspect of malignancy, tissue invasion, is mass transport deregulation at the interface between the cell and the microenvironment. Metastasis defines the transition to lethality, and is a deregulation of local and distant cellular transport at the scale of the organism. 
Tumour-associated angiogenesis completely overhauls microenvironmental mass transport, and is another prime example of transport deregulation. The signalling pathologies that accompany the evasion of apoptosis, growth signal dependence and growth inhibitory messages from the immediate environment are also disruptions in molecular transport — for molecular signalling occurs by the transport of signalling molecules.
To verify or disprove the Transport OncoPhysics approach, several concurrent, novel investigational modalities and tools are required, which are based on mathematics and the physical sciences: a multiscale mathematical theory of mass and momentum transport through the body; multiscale imaging that enables the tracking of mass transport in living organisms, with integrated resolution from subcellular to full body levels; and multiscale probes, in conjunction with imaging techniques, to determine the transport properties at various levels, as functions of the characteristics of the transported object. In the Center for Transport OncoPhysics, within the US National Cancer Institute’s Physical Sciences-Oncology Centers initiative134, we have focused on all three of these enabling aspects (the multiscale modelling of cancer growth and multiparameter response to therapy135,136; multiscale imaging137; and multiscale probes138), which also provide specific vectoring across biological barriers to target lesions139, yielding novel mechanisms of transport, accumulation and the release of the therapeutic payload. These have resulted in unprecedented therapeutic results140,141 through the use of systems of nested (multistage) particles, designed with explicit consideration of the physics and mathematics governing the convection, margination, adhesion and cellular uptake of the particles142–145. Once the physical laws of transport are determined, their parameters for individual lesions can be obtained from direct observation through suitable imaging techniques. Among these parameters are the blood flow velocity, shear stress at the vascular wall, vascular permeability and the density of the specific antigens of interest that are expressed on the vascular endothelium. 
Once these are observed, the mathematical routines for the rational design of particulates can be used to yield therapeutic systems that are individualized for the specific lesion — not only in the therapeutic payload but also in the therapeutic delivery vector itself. This opens new frontiers for the idea of personalized therapy.
The physical laws and principles that define the behaviour of matter are essential for developing an understanding of the initiation and progression of cancer at all length scales. The theoretical approaches that enable the definition of behaviour within complex systems offer opportunities for new insights into long-standing problems in cancer research. For example, the metastatic process, the generation and maintenance of heterogeneity within and among tumours, the emergence of drug resistance and the ecological behaviour of cell types with differing reproductive fitness and degrees of drug sensitivity, and the delivery of therapeutics to the core of a tumour, as well as its distant metastases, will all benefit from an application of physical science approaches to oncology — from mechanics to evolution, chemistry and nanotechnology. Thus, the successful integration of approaches from mathematics, physics and engineering with cancer biology may be our best hope to understand complex systems such as cancer and to develop effective strategies for a cure.
The authors would like to acknowledge support from the US National Cancer Institute Physical Sciences-Oncology Center (PSOC) initiative to fund the Dana-Farber Cancer Institute PSOC (F.M.), Bay Area PSOC (J.L.), The Methodist Hospital Research Institute PSOC (M.F.) and the Northwestern University PSOC (J.W.). M.F.’s research for this article was furthermore supported by grants from DoD/BCRP (W81XWH-09-1-0212), as well as by the Ernest Cockrell Jr. Distinguished Endowed Chair. The authors would like to thank A. Sebeson for her invaluable help. This work is dedicated to Professor Jonathan Widom. With his passing, we have lost both a major intellectual force and a valued member of our community, as well as a trusted friend and colleague.
Competing interests statement
The authors declare no competing financial interests.