Cancer arises as the result of a natural selection process among cells of the body, favoring lineages bearing somatic mutations that confer a proliferative advantage. Of the thousands of mutations within a tumor, only a small fraction functionally drive its growth; the vast majority are mere passengers of minimal biological consequence. Yet the presence of any mutation, independent of its role in facilitating proliferation, tags a cell’s clonal descendants in a manner that allows them to be distinguished from unrelated cells. Such markers of cell lineage can be used to identify the abnormal proliferative signature of neoplastic clonal evolution, even at a stage that predates morphologically recognizable dysplasia. This article focuses on molecular techniques for assessing cellular clonality in humans, with an emphasis on how they may be used for early detection of tumorigenic processes. We discuss historical as well as contemporary approaches and consider ways in which powerful new genomic technologies might be harnessed to develop a future generation of early cancer diagnostics.
Tumorigenesis is driven by somatic evolution [1–4]. Random mutations that arise during life and confer a growth advantage upon a cell will lead to that cell’s preferential multiplication within a tissue. New variants that emerge within the expanding population fuel further waves of selection and expansion that iteratively repeat until all the phenotypes of a mature cancer have been achieved. The forces dictating this process are identical to the Darwinian principles that govern evolution among individual organisms. Many of the challenges to which a cancer cell must adapt stem from growth controls built into its own genome. In multicellular organisms, a common genome derived from the founding zygote serves as a contract among cells to restrict autonomous proliferation that would negatively impact the fitness of the organism as a whole. In a developing cancer, these genetically hardwired antineoplastic defense mechanisms are systematically mutated away until tumor cells gain the ability to proliferate indefinitely.
Mutation leading to selectable genetic diversity is at the heart of the evolutionary process. Just how this diversity arises in cancer remains uncertain, but recent cancer genome sequencing studies have indicated that tumors contain tens of thousands of mutations. Healthy tissues suppress the accumulation of genetic errors by several means: maintaining a very low per-cell-division rate of mutation, minimizing the number of cell divisions that occur in long-lived stem cells, purging mutated cells through programmed cell death and immune surveillance, and using a hierarchical structure of cell division whereby most cells become terminally differentiated into non-reproducing entities unable to propagate further errors. A long-standing controversy has existed over the question of which, if any, of these mechanisms must be disrupted to allow sufficient diversity to accumulate for a cancer to evolve [1, 8–11]. Other articles in this issue consider the evidence for and against a mutator phenotype in cancer. In this review we take a neutral position on this complex issue and instead focus on ways in which the mutations that do arise can be used to identify the signature of clonal evolution. We discuss how heritable genetic changes of many varieties can serve as markers of cell lineage, tracking the emergence of new clones as an empirical metric of dysregulated cell growth for the purpose of early cancer diagnosis.
The term “field cancerization” first appeared in a 1953 publication by Slaughter et al., in which it was observed that a field of subtly abnormal epithelium commonly surrounds oral cancers and that multiple distinct tumors often co-occur within these zones. They noted:
“this pattern of distribution is of interest because it suggests a regional carcinogenic activity of some kind, in which a preconditioned epithelium has been activated over an area in which multiple cell groups undergo a process of irreversible change towards cancer”
The authors reported that such fields routinely extended beyond the margins of the surgical resections they examined and posited that this might explain the high local recurrence rate of oral cancers following surgery. They offered no biochemical explanation for the fields, only thoughtful phenomenological observations.
In the decades since, oral cancer-associated fields have been characterized through molecular means and found to frequently possess a subset of the genetic and epigenetic abnormalities occurring in the tumor itself [13–15]. Similar mutant fields have been identified surrounding a variety of other cancer types including esophageal [16, 17], lung [18, 19], bladder, breast and colon in the setting of chronic ulcerative colitis [22–24], among others. In many cases, molecularly definable fields exist in the absence of histological changes. These findings are remarkably in line with Slaughter’s original hypothesis and the currently held evolutionary model of cancer development, whereby successive waves of mutation, selection and clonal expansion gradually accrue the genetic changes necessary for malignant transformation. By this model, a field is simply an ancestral clone possessing a partial complement of the genotypes and phenotypes of a fully formed cancer.
Given that the spread of clonal cell populations over large areas of epithelium appears to be an early part of the developmental pathway of some cancer types, many investigators have asked whether the presence of such clones can be used to predict future tumor development. The most common approach to identifying fields has been to screen for molecular changes commonly found in tumors themselves and thought to drive their growth. This has typically entailed focusing on alterations affecting known proto-oncogenes or tumor suppressors, including point mutations, chromosomal rearrangements, deletions, amplifications, loss-of-heterozygosity events and/or epigenetic modifications. Among the best studied systems have been Barrett’s Esophagus, oral leukoplakia and ulcerative colitis, all highly cancer-predisposing conditions with well-established field intermediates in which tissue is relatively accessible and routinely sampled as part of clinical care. While the cancer-predictive value of mutant fields identified by this method has been remarkably good for some diseases, for many others it has remained limited.
One possible explanation is that a cancer’s evolution is a stochastic process, and the genetic changes driving one tumor will not always be the same as those selected in others [3, 26]. This logic has been substantiated by large-scale sequencing studies of breast and colon cancer exome “landscapes”, which indicate that while a handful of genes (“mountains”) are commonly mutated across different tumors, a far greater number are mutated only infrequently (“hills”). The tactic of screening for mutations in common drivers will, by definition, be unable to detect clonal expansions driven by mutations in unpredicted genes or regulators elsewhere in the genome. Because it is beyond our current abilities to know all possible rare drivers, an alternative approach is needed to identify such clones.
As cells divide throughout life, irrespective of cancer, mutations are continually introduced into their genomes at low frequency. Most of these land outside of genes or regulatory regions and are likely to be functionally silent (neutral), but they indelibly mark each cell with a unique genetic fingerprint. The majority of such mutations will normally be undetectable by routine genotyping techniques because they are present in only one or a few cells, whose signal is obscured by the vastly larger number of non-mutant genotypes in the surrounding tissue (Fig. 1A). If, however, a cell gains the ability to clonally proliferate as a result of one or more driver mutations, its far more numerous neutral mutations will be carried along as chance passengers and amplified to detectable levels (Fig. 1B,C). Thus, the identification of any mutation in a tissue by conventional methods of aggregate DNA analysis, regardless of its functional status, is an indication that a clonal expansion has occurred. With few exceptions outside of the immune system, large clonal expansions arising in adults are abnormal and a signature of neoplasia.
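As a rough illustration of this detection argument, the Python sketch below computes the variant allele fraction (VAF) expected in bulk DNA when a given fraction of cells carries a heterozygous passenger mutation, and compares it with a detection limit. The 15% limit, the diploid default and the function names are assumptions chosen for illustration, not values from the studies discussed here.

```python
def expected_vaf(clone_fraction: float, ploidy: int = 2, mutant_copies: int = 1) -> float:
    """Expected fraction of signal carrying the mutant allele in bulk DNA."""
    if not 0.0 <= clone_fraction <= 1.0:
        raise ValueError("clone_fraction must lie between 0 and 1")
    return clone_fraction * mutant_copies / ploidy


def is_detectable(clone_fraction: float, detection_limit: float = 0.15) -> bool:
    """True if the expected VAF reaches an assumed aggregate-assay detection limit."""
    return expected_vaf(clone_fraction) >= detection_limit


# One mutant cell among an assumed one million, a small clone, and a large clone.
for fraction in (1e-6, 0.01, 0.6):
    print(fraction, expected_vaf(fraction), is_detectable(fraction))
```

Under these assumptions a private mutation in a single cell is invisible, while the same mutation carried by a clone occupying most of the sample crosses the threshold; the mutation itself is unchanged, only the size of the population carrying it.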
As a means of identifying preneoplastic clones, screening for neutral mutations has several advantages. First, the approach conceptually focuses on identifying the generic phenotype of abnormal growth without the need for a priori knowledge of the heterogeneous genotypes that may induce it. The general concept can easily be transferred between cancer types having different characteristic drivers with little or no modification. Second, of all mutations carried by an emerging clone, far more are passengers than drivers. Among the tens of thousands of mutations that have been identified in cancer genomes during the last three years, only a tiny subset is likely to be etiologically related [27, 29, 30]. Lastly, screening for mutations in strongly cancer-associated genes might conceivably even reduce the detectability of very early clones. Experimental introduction of powerful oncogenes, such as activated members of the ras pathway, induces growth arrest and senescence in otherwise untransformed human cells in vitro. This suggests that the most primitive clones in vivo may be less likely to bear such lesions, having not yet accrued the prior mutations that facilitate oncogene tolerance. The main drawback of relying on passenger mutations to identify clonal expansions is that they are not limited to a handful of defined loci but occur scattered throughout the genome. Mutagenesis, however, is not completely random; replication errors occur in certain hotspots orders of magnitude more frequently than elsewhere.
In this article we focus on techniques for identifying clones using neutral passengers and defer discussion of suspected driver-based methods to other excellent reviews [32–35]. For purposes herein, “neutral” is loosely defined as genetic or stable epigenetic changes identified by methods not requiring knowledge of positively selectable loci. It is of course impossible to know with absolute certainty that mutation of a given site will not affect cell phenotype, but the spirit of the definition is to distinguish targeting of likely passengers from that of probable drivers. For the sake of brevity we limit the scope of our discussion to methods that are theoretically generalizable across multiple cell types in the body. For example, we do not consider the elegant technique of using somatic rearrangements of B- and T-cell receptor genes as clonal markers in blood cancers, nor methods involving detection of random genomic integration sites of organ-specific viruses. We begin with methods of historical importance involving X-chromosome inactivation before proceeding to approaches utilizing different varieties of mutational hotspots.
The earliest molecular evidence suggesting the monoclonal origin of a cancer came from studies by Linder and Gartler based on heterozygous alleles of the X-linked G6PD gene. In the primitive blastocyst stage of female embryonic development, one X-chromosome per cell is randomly inactivated through hypermethylation, a process known as Lyonization. Exploiting the fact that different isoforms of the G6PD protein migrate with different mobilities during gel electrophoresis, Linder and Gartler showed that, whereas G6PD from normal myometrium in heterozygous females appeared as two distinct bands (corresponding to both allelic isoforms), protein from all 27 leiomyoma samples contained only a single band. They interpreted this observation to mean that normal uterus contains a fine mixture of cells with different X-alleles inactivated, whereas tumors must be clonally derived from a single cell bearing just one expressed allele.
Modern variations on this technique rely on directly interrogating the genetic polymorphisms that distinguish the maternal and paternal X-chromosomes. One approach uses methylation-specific restriction endonucleases or methylation-specific PCR to differentiate between active and inactive alleles of genetically variable loci. Among the most commonly interrogated sites are the CAG repeat portion of the human androgen receptor (HUMARA) and the phosphoglycerate kinase (PGK) gene. Another approach assesses which polymorphic variant of an X-linked gene is transcribed, using RT-PCR or other methods.
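For the methylation-sensitive digestion version of this assay (e.g. HUMARA), one commonly described calculation normalizes each allele's peak area after digestion by its area in an undigested control, correcting for preferential amplification of one allele, and then expresses the inactivation ratio as the share contributed by allele 1. The Python sketch below follows that outline; the 80:20 clonality cut-off and the peak areas are illustrative assumptions, and published studies vary in the thresholds they apply.

```python
def corrected_inactivation_ratio(digested: tuple[float, float],
                                 undigested: tuple[float, float]) -> float:
    """Share of the methylated (inactive-X) signal contributed by allele 1.

    After digestion with a methylation-sensitive enzyme, only alleles on the
    inactive (methylated) X remain amplifiable. Dividing each digested peak area
    by the matching undigested area corrects for unequal amplification.
    """
    d1, d2 = digested
    u1, u2 = undigested
    if u1 <= 0 or u2 <= 0:
        raise ValueError("both alleles must be present in the undigested control")
    r1, r2 = d1 / u1, d2 / u2
    if r1 + r2 == 0:
        raise ValueError("no signal survived digestion")
    return r1 / (r1 + r2)


def looks_clonal(ratio: float, threshold: float = 0.8) -> bool:
    """Flag strongly skewed inactivation; the 80:20 cut-off is an assumption."""
    return max(ratio, 1.0 - ratio) >= threshold


# Hypothetical peak areas for the two HUMARA alleles.
tumor = corrected_inactivation_ratio(digested=(950, 60), undigested=(1000, 1100))
normal = corrected_inactivation_ratio(digested=(500, 520), undigested=(1000, 1000))
print(f"tumor {tumor:.2f} clonal={looks_clonal(tumor)}")
print(f"normal {normal:.2f} clonal={looks_clonal(normal)}")
```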
As a technique for studying the basic clonal features of cancer, the X-chromosomal approach has proved useful for more than four decades. A variety of studies have demonstrated that the embryonic “patch” size defined by a particular inactivated X-chromosome is relatively large in some human tissues [43–46]. Others have used this observation to support the intriguing possibility that some tumors bearing a single active X-chromosome could be polyclonally derived from a modest number of cells within an embryonic patch [47–49]. This large patch size also means, however, that as a general method for identifying new clonal expansions as an indicator of early neoplasia, screening for homogeneous expression from a single X-allele may yield false positives in some organs if the area sampled is too small. Another significant shortcoming is the fact that the method is limited to patients who are both female and heterozygous at the particular X-linked loci being screened.
Microsatellites, also known as short tandem repeats or STRs, are tandemly repeated units of 1–6 base pairs that make up approximately 3% of the human genome. These sites are the prototypical example of mutational hotspots, owing to biochemical properties that make them difficult for replicative polymerases to traverse accurately. Their meiotic mutability is well recognized to contribute to the generational progression of triplet repeat expansion diseases including Fragile X Syndrome, Huntington’s Disease and Spinal and Bulbar Muscular Atrophy, among others. Relative instability of tract length within the germline makes them highly polymorphic in the population, and thus useful markers for linkage analysis, forensics and phylogenetic inference of human evolution and migration. Haplotype diversity at these loci additionally makes them useful for loss-of-heterozygosity (LOH) analysis in both cancers and preneoplastic fields.
Polymerase errors made when copying across microsatellites are predominantly corrected by the cell’s mismatch repair (MMR) system. Hereditary nonpolyposis colorectal cancer (HNPCC), also known as Lynch syndrome, is an inherited deficiency in MMR associated with a >80% lifetime risk of colorectal and other cancers. Somatic loss of MMR activity is also observed in 10–15% of sporadic colorectal cancers in patients without hereditary disorders of DNA repair. On a biochemical level, MMR deficiency elevates microsatellite slippage rates 100- to 1000-fold, making MMR-deficient (MMR-) tumors one of the most definitive examples of a cancer-associated mutator phenotype. The enormous number of passenger mutations arising in MMR- cancers annotates their genomes with an especially thorough record of the past. Tsao and colleagues capitalized on this phenomenon by using slipped microsatellite loci as a molecular clock to study the mitotic age of MMR- tumors. A similar concept was recently used to phylogenetically map the cell lineages of tumor metastases in an MMR-compromised mouse.
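The molecular-clock idea can be sketched with a deliberately simple model, assumed here for illustration rather than taken from the cited work: if each of n loci slips with probability mu per division and slips are never reverted or repeated, the expected fraction of slipped loci after d divisions is 1 - (1 - mu)^d, which can be inverted to estimate d. The rate and locus counts in the Python example below are hypothetical.

```python
import math


def estimate_divisions(mutated_loci: int, total_loci: int, rate_per_division: float) -> float:
    """Estimate divisions d from the fraction of loci that have slipped.

    Toy model: each locus mutates with probability mu per division and is never
    reverted, so the expected mutated fraction is 1 - (1 - mu) ** d.
    """
    if not 0 <= mutated_loci < total_loci:
        raise ValueError("need 0 <= mutated_loci < total_loci")
    if not 0.0 < rate_per_division < 1.0:
        raise ValueError("rate_per_division must lie in (0, 1)")
    if mutated_loci == 0:
        return 0.0
    fraction = mutated_loci / total_loci
    # log1p keeps precision when the fraction or the rate is small.
    return math.log1p(-fraction) / math.log1p(-rate_per_division)


# Hypothetical MMR-deficient tumor: 20 of 100 loci slipped, assumed mu = 5e-3.
print(estimate_divisions(20, 100, 5e-3))
```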
Even with intact MMR, microsatellites exhibit mitotic frameshift rates several orders of magnitude above those of non-repetitive sequences [62, 63]. Length-altering microsatellite mutations have been identified in a variety of non-MMR-deficient cancers and adjacent tissues. In Barrett’s Esophagus they have been observed in fields that temporally precede adenocarcinoma. The detection of low-frequency microsatellite slippage in cancers or preneoplastic fields has often been reported as “microsatellite instability” [66–68]. While this wording may not be precisely correct, given that the detection of a mutation is not absolute evidence that the rate of mutation is elevated, the ubiquity of slipped alleles speaks to their potential usefulness as clonal markers. A concern, however, is that many studies that have used microsatellite slippage to identify expansions have been able to detect only a fraction of the clonal entities defined by other types of mutations. This is not wholly unexpected given that mutational marking is stochastic: the probability of detecting a clonal population is a function of the number of sites screened, the per-cell-division rate of mutation at these sites and the number of divisions undergone by a cell lineage prior to the last expansion bottleneck. Improved sensitivity should thus always be attainable by assessing a larger number of marker sites, and more mutable ones.
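The dependence of sensitivity on these three factors can be written down under simplifying assumptions: if each of n independent marker sites mutates with probability mu per division along a lineage that undergoes d divisions before its last bottleneck, the lineage carries at least one marker with probability 1 - (1 - mu)^(n·d). The Python sketch below evaluates this expression; it ignores diploidy, back-mutation and assay failure, and the rate and division count are assumed values.

```python
import math


def detection_probability(n_sites: int, rate_per_division: float, divisions: int) -> float:
    """P(at least one of n_sites independent marker sites mutated along the lineage).

    Each site escapes mutation in one division with probability (1 - mu), so the
    lineage stays unmarked with probability (1 - mu) ** (n_sites * divisions).
    """
    log_unmarked = n_sites * divisions * math.log1p(-rate_per_division)
    return -math.expm1(log_unmarked)  # 1 - exp(log_unmarked), computed accurately


# Assumed mu = 1e-4 per site per division and 100 divisions before the bottleneck.
for n in (28, 300, 3300):
    print(n, round(detection_probability(n, 1e-4, 100), 3))
```

Because the exponent is the product n·d, screening ten times as many sites has the same effect in this model as a lineage having undergone ten times as many divisions.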
The rate of mitotic microsatellite slippage depends on the repeat type, length and other less predictable factors involving adjacent sequence context, transcriptional status and chromatin structure [63, 70]. Values for different loci are quite variable, ranging from less than 10⁻⁶ to nearly 10⁻⁴ in normal human cells in culture [62, 63]. Monomeric repeats of polydeoxyguanosine [poly(dG) tracts] are particularly unstable, with long tracts at the upper end of this range. Several years ago our group developed a technique for constructing fate maps of mouse development by phylogenetically analyzing the mutational relationships of hundreds of poly(dG) tracts among many individual cells [71, 72]. We recently adapted our experimental pipeline to screen for poly(dG) slippage mutations as a biomarker of preneoplastic clones in ulcerative colitis (UC). We genotyped 28 non-coding poly(dG) repeats of 12 or more residues in DNA from non-dysplastic colon biopsies of 19 individuals with UC by multiplex capillary fragment analysis. Half of these patients had cancer or advanced dysplasia in other portions of their colon and half had no histologically identifiable malignancies. Of the mutations found, 97% occurred in the cancer group. Whereas only one biopsy from one non-cancer individual bore a mutant marker, every cancer-affected patient had at least one clonal field detectable by mutations in non-dysplastic colonic mucosa as much as 80 cm away from the cancer site. Of the thousands of genotypings carried out, only about 1% were mutant relative to the germline, yet approximately two-thirds of all non-dysplastic biopsies taken from the cancer group carried a mutation in at least one of the 28 markers, indicating a clonal derivation. This study illustrates the critical importance of high-throughput screening when relying upon random passenger mutations for clone detection.
There are more than 3300 comparable poly(dG) tracts in the human genome and efforts by our group to be able to simultaneously screen the majority of these with even higher-throughput methods are ongoing.
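A minimal sketch of how per-marker calls could be rolled up into the biopsy-level summary described above, assuming fragment sizes have already been rounded to whole base pairs and that each patient's germline genotype is known from a reference tissue. The marker names, lengths and the rule that any length absent from the germline counts as a slippage mutation are hypothetical simplifications, not the calling criteria used in the study.

```python
def mutant_markers(germline: dict[str, set[int]],
                   biopsy: dict[str, set[int]]) -> list[str]:
    """Markers where the biopsy shows a fragment length absent from the germline."""
    flagged = []
    for marker in sorted(germline):
        observed = biopsy.get(marker)
        if not observed:
            continue  # failed amplification: uninformative rather than mutant
        if observed - germline[marker]:
            flagged.append(marker)
    return flagged


def fraction_clonal(germline: dict[str, set[int]],
                    biopsies: dict[str, dict[str, set[int]]]) -> float:
    """Fraction of biopsies carrying at least one mutant marker."""
    hits = sum(1 for profile in biopsies.values() if mutant_markers(germline, profile))
    return hits / len(biopsies)


# Hypothetical poly(dG) markers with germline lengths in bp.
germline = {"G1": {14, 16}, "G2": {13}, "G3": {12, 15}}
biopsies = {
    "biopsy_1": {"G1": {14, 16}, "G2": {13}, "G3": {12, 15}},
    "biopsy_2": {"G1": {14, 15}, "G2": {13}, "G3": {12, 15}},   # 16 -> 15 slippage
    "biopsy_3": {"G1": {14, 16}, "G2": {13, 14}, "G3": set()},  # 13 -> 14 on one allele
}
print(mutant_markers(germline, biopsies["biopsy_2"]))
print(fraction_clonal(germline, biopsies))
```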
A heritable genetic component of nearly all human cells that is often overlooked is mitochondrial DNA (mtDNA). The mitochondrial genome is a 16.5 kb circular loop of DNA that is replicated by organelle-specific machinery independently of the cell cycle. Each cell contains multiple mitochondria and each mitochondrion may contain up to ten genomes, bringing the number of mtDNA copies per cell into the hundreds for some tissues. As with nuclear mutations, mtDNA mutations are passed on to daughter cells during cell division and thus serve as markers of cell lineage. Yet because mitochondria appear to lack the complex DNA repair machinery of the nucleus, and mtDNA is continually exposed to reactive oxygen intermediates from the electron transport chain and is unprotected by histones, its per-base-pair mutation rate is significantly higher than that of the nuclear genome.
There have been many reports of mitochondrial mutations in cancers of all varieties and, more recently, in fields surrounding tumors. Sidransky’s group sequenced the complete mitochondrial genome from lung tumors and histologically normal respiratory epithelium in the lungs of long-time smokers. Multiple mtDNA mutations identical to those from nearby cancers were found in non-dysplastic mucosa. In another study, the same group examined tumors, negative surgical margins and peripheral lymphocytes from 50 patients with squamous cell carcinoma of the head and neck that recurred after an initial surgery. Of these patients, approximately half had at least one mtDNA mutation in the cancer relative to the control tissue, and of these, approximately half were also found in the histologically negative margins.
An elegant technique, developed over the last several years, uses spontaneously arising mutations in the mitochondrially encoded cytochrome c oxidase gene (COX1) to directly visualize cell lineage relationships in situ. Cytochrome c oxidase catalyzes the last step of the electron transport chain (complex IV), and mutations that disrupt its activity can be identified as blue (versus brown) staining cells in tissue sections using dual enzyme histochemistry (Fig. 2E). Patches of blue cells indicate clonally derived populations, and the relationship between adjacent COX1- patches can be further delineated by microdissection and DNA sequencing for the causative mutation. The group that developed the method has used it to identify the stem cell compartment at the base of colonic crypts and has shown that patches of genetically related crypts form and increase in size with age. More recently they have located the putative stem cell compartment for regenerative units in stomach, small intestine, skin, pancreas and liver, as well as characterized the clonality of cirrhotic liver nodules. Conceivably, such an approach might be used to identify early neoplastic clones in at-risk tissues by direct staining of biopsies.
While a number of features of the mitochondrial genome render it uniquely suitable for lineage mapping, several drawbacks also exist. Neither standard sequencing nor functional staining for respiratory chain defects can identify mtDNA mutations that are not present in a substantial fraction of the mitochondrial genomes within a cell. To become detectable, a mutant genome must first overtake other genomes within its organelle, and this mitochondrion must then outcompete or transform other mitochondria within the cell, rendering the mutation homoplasmic, before the marked cell clonally expands within a tissue (Fig. 2).
How homoplasmy arises remains largely unclear. Mitochondria continually fuse and divide within the cell and potentially recombine or cross-correct their DNA. Partitioning of mitochondria between dividing daughter cells may be both regulated and influenced by stochastic factors, making inheritance more complex than the binary division through which the nuclear genome segregates. The extent to which drift or selection influences the emergence of particular mitochondrial variants is also unknown. Unlike the nuclear genome, mtDNA is mostly coding, and a sizable percentage of random mutations would be expected to alter protein sequence. In cancer, some mtDNA mutations have been found more frequently than expected by chance, and others have been strongly associated with proliferative phenotypes. For example, Ishikawa and colleagues demonstrated that the presence of a single mtDNA point mutation can render cells highly metastatic. A recent study by the developers of the COX1- staining methodology suggests that, at least in colon, these mutations have a small but noticeable effect on cell proliferation and apoptosis. Such reports call into question the neutrality of some mitochondrial mutations as lineage markers. On the other hand, many hundreds of synonymous (non-protein-changing) mtDNA variants have also been reported in cancer, and mutations of all types occur heteroplasmically in different normal tissues. Modeling studies have also suggested that homoplasmy can be expected to occur by drift alone [88, 89]. Just as in the nuclear genome, both passenger and driver mtDNA mutations are associated with clonal proliferation, and teasing apart which is which remains challenging.
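The possibility that homoplasmy arises by drift alone can be illustrated with a toy neutral model. The Python sketch below assumes a fixed number of mtDNA copies per cell and, at each division, draws the daughter's copies at random with replacement from the parent's pool (Wright-Fisher sampling), ignoring selection, fusion, recombination and replication outside the cell cycle. The copy number of 20 is an assumption chosen to keep the simulation fast and is far below real cellular copy numbers.

```python
import random


def drift_to_fixation(n_copies: int, rng: random.Random, start: int = 1) -> tuple[bool, int]:
    """Follow one cell lineage until a mutant mtDNA is lost or becomes homoplasmic.

    Each division, the daughter's n_copies genomes are drawn with replacement from
    the parent's pool (neutral Wright-Fisher sampling). Returns
    (became_homoplasmic, divisions_elapsed).
    """
    mutant, divisions = start, 0
    while 0 < mutant < n_copies:
        p = mutant / n_copies
        mutant = sum(1 for _ in range(n_copies) if rng.random() < p)
        divisions += 1
    return mutant == n_copies, divisions


def fixation_summary(n_copies: int, trials: int, seed: int = 0) -> tuple[float, float]:
    """Fraction of lineages reaching homoplasmy and their mean time to get there."""
    rng = random.Random(seed)
    times = []
    for _ in range(trials):
        fixed, divisions = drift_to_fixation(n_copies, rng)
        if fixed:
            times.append(divisions)
    fraction = len(times) / trials
    mean_time = sum(times) / len(times) if times else float("nan")
    return fraction, mean_time


fraction, mean_time = fixation_summary(n_copies=20, trials=4000, seed=1)
print(f"homoplasmic in {fraction:.3f} of lineages after ~{mean_time:.0f} divisions on average")
```

In neutral models of this kind, a single new mutant copy becomes homoplasmic with probability of roughly 1/N, and the lineages that get there typically take on the order of 2N divisions, so even pure drift can produce homoplasmic cells, though only slowly.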
Regardless of what factors contribute to homoplasmy, the process appears to take considerable time. Greaves et al. showed that in normal colon, homogeneously COX1- staining crypts do not appear until after the age of 40 and that these divide to form small clusters of related crypts that increase in size with age. Presumably crypt division begins at a younger age but is not yet histochemically visible because the COX1- genotype has not had sufficient time to propagate to homoplasmy within the crypt stem cells. This is one illustration of a general consideration for all markers of cell lineage that is discussed further below: the inability to detect a clonal marker is not de facto evidence for the absence of a clone. The temporal delay of homoplasmy makes mitochondrial mutations a potentially problematic tool for identifying early neoplastic clones in the young. Emerging sequencing technologies and other techniques capable of high-resolution mutation analysis are beginning to permit detailed investigation of low-frequency, heteroplasmic mtDNA mutations, which may partially obviate this concern in the future.
Another sometimes forgotten source of molecular information that is heritably transmitted during cell division is DNA methylation. Following genome replication, DNA methyltransferases copy the methylcytosine profile of the parent strand to the newly synthesized daughter strand. While this is a relatively accurate process (approximately 1–2 mistakes introduced per 10⁵ residues copied), it remains considerably more error-prone than DNA replication itself (rates variably estimated at 10⁻⁹ to 10⁻¹¹ per base per cell division in normal tissues). De novo methylation tends to increase with age and, more specifically, with mitotic age. Thus, just as with DNA mutations, methylation error patterns serve as a record of somatic cell ancestry.
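To put the two error rates quoted above side by side, the short Python calculation below takes the ranges from the text, computes how many times more error-prone methylation copying is than DNA replication, and estimates the errors accumulated in a hypothetical region of 100 sites over 1,000 divisions. The 1.5 × 10⁻⁵ midpoint, the region size and the division count are illustrative assumptions.

```python
# Error-rate ranges quoted in the text, per site (or base) per cell division.
METHYLATION_COPY_ERROR = (1e-5, 2e-5)
DNA_REPLICATION_ERROR = (1e-11, 1e-9)


def fold_difference(fast: tuple[float, float],
                    slow: tuple[float, float]) -> tuple[float, float]:
    """Smallest and largest ratio between two (low, high) rate ranges."""
    return fast[0] / slow[1], fast[1] / slow[0]


def expected_errors(sites: int, rate: float, divisions: int) -> float:
    """Expected number of copying errors; linear in sites and divisions."""
    return sites * rate * divisions


low, high = fold_difference(METHYLATION_COPY_ERROR, DNA_REPLICATION_ERROR)
print(f"methylation copying: {low:.0e} to {high:.0e} times more error-prone")

# Hypothetical 100-site region over 1,000 divisions (1.5e-5 is an assumed midpoint;
# 1e-9 is the upper end of the DNA range).
print(expected_errors(100, 1.5e-5, 1000), expected_errors(100, 1e-9, 1000))
```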
Silencing of gene expression through hypermethylation is a common phenomenon observed in cancer. There have been a variety of reports of similar epigenetic changes in non-dysplastic tissue surrounding tumors. Shen et al. identified methylation of the MGMT gene promoter in normal-appearing tissue flanking sporadic colorectal cancers. Ushijima and colleagues have demonstrated promoter methylation of both protein-coding genes and microRNAs in non-cancerous mucosa around gastric cancers associated with Helicobacter infection. Similar epigenetic changes have been observed in tissues adjacent to cancers in liver, esophagus [98, 99], lung, breast, kidney and bladder, among others.
Interpreting the results of many of these studies in terms of clonality is complicated. While “fields” of methylation changes certainly exist around some cancers, it is difficult to know whether these necessarily represent clonal entities from which the cancer evolved. The most widely used technique (methylation-specific PCR) can detect epigenetic changes in a small percentage of cells in a population, but it offers a general assessment of methylation across a CpG island rather than the readout of specific methylated bases that would be needed for rigorous lineage assessment. Without this information, such signals might conceivably result from the aggregate changes of multiple small clones, or of individual cells, induced by a common environmental factor such as inflammation. An additional consideration is that most studies have targeted promoter regions of tumor-associated genes, which raises the concern about non-neutral lineage markers discussed previously.
A lineage mapping approach developed by Shibata’s group to circumvent some of these complications entails bisulfite conversion and direct sequencing across CpG islands in portions of the genome not transcribed in the tissue type under study. The technique retrieves the exact base-by-base pattern of spontaneously methylated CpG sites in densely clustered regions, yielding multiple data points from loci presumably under neither negative nor positive selection. So far this group has used the accumulation of epimutations as a molecular clock to study the dynamics of replicative units in the colon [90, 105], small intestine and endometrium. They have recently used the method to investigate how spatial distances within colorectal cancers compare with epigenetic lineage distances, and demonstrated that the terminal outgrowths of these tumors represent relatively uniform clonal expansions.
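The logic of reading lineage from methylation patterns can be sketched by treating each bisulfite-sequenced region as a string of CpG states and counting the sites at which two samples differ: under a molecular-clock interpretation, samples that shared a common ancestor more recently should, on average, differ at fewer sites. The Python example below is a generic distance sketch with hypothetical sample names and patterns; it is not the specific analysis used by Shibata’s group.

```python
from itertools import combinations


def pattern_distance(a: str, b: str) -> int:
    """Number of CpG sites whose methylation state differs ("1" = methylated)."""
    if len(a) != len(b):
        raise ValueError("patterns must cover the same CpG sites")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(patterns: dict[str, str]) -> dict[tuple[str, str], int]:
    """Distance for every unordered pair of samples, keyed by sorted name pairs."""
    return {(p, q): pattern_distance(patterns[p], patterns[q])
            for p, q in combinations(sorted(patterns), 2)}


def mean_pairwise_distance(patterns: dict[str, str]) -> float:
    """Average diversity among samples; larger values suggest older common ancestry."""
    distances = pairwise_distances(patterns)
    return sum(distances.values()) / len(distances)


# Hypothetical 10-CpG patterns from three samples of one tissue.
patterns = {
    "left_1": "1100100000",
    "left_2": "1100100010",
    "right_1": "0010011000",
}
print(pairwise_distances(patterns))
print(mean_pairwise_distance(patterns))
```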
The mutation-based methods described so far all rely upon targeted screening of a relatively modest number of uniquely mutable hotspots. Such loci are experimentally practical to work with using conventional technologies, but encompass only a small fraction of the total mutational lineage information encoded in the genome. Recent large-scale sequencing studies have illustrated that a diversity of tumor types carry tens of thousands of somatic alterations in non-hotspot DNA, most of which are likely to be neutral passengers [27, 29, 30]. Presumably the bulk of these arose well before the terminal cancer outgrowth and could serve as markers of early clones, but their identification would require screening hundreds of millions of base pairs. A new generation of genomic technologies is rapidly bringing such an approach into the realm of possibility for individualized diagnostics.
Early methods that may be considered “whole genome” mutation screens include measurement of ploidy imbalances by DNA content cytometry and cytogenetic assessment by traditional G-banding, fluorescent in situ hybridization (FISH) or spectral karyotyping (SKY). The coarse resolution of these approaches means that deletions, amplifications or structural rearrangements must be on the order of 10⁵–10⁹ bases in size to be detectable. Another technique, known as DNA fingerprinting, can detect significantly smaller changes by using one or more short primers to randomly amplify a fraction of the genome during low-stringency PCR. The appearance, disappearance or change in size of product fragments during electrophoresis reflects clonal alterations that may be used as markers of preneoplastic fields.
Newer technologies for identifying focal copy number changes by hybridization to solid-state probe arrays enable tens of thousands of sites across the genome to be systematically interrogated. Comparative genomic hybridization (CGH) and single nucleotide polymorphism (SNP) arrays have been used in the last several years to investigate preneoplastic populations in Barrett’s Esophagus [113, 114] and ulcerative colitis, among a variety of other cancer-predisposing diseases. Navin et al. recently used the profile of CGH-identified copy number changes to study the clonal architecture of different regions of advanced breast cancers through phylogenetic inference.
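As a generic stand-in for distance-based phylogenetic inference on copy number profiles, the Python sketch below measures the distance between regions as the summed absolute copy number difference across a shared set of genomic bins and then groups regions by average linkage (UPGMA-style merging). This is an illustrative assumption, not the method of Navin et al., and the region names and six-bin profiles are hypothetical.

```python
from itertools import combinations


def cn_distance(a: list[int], b: list[int]) -> int:
    """Sum of absolute copy-number differences over the same genomic bins."""
    if len(a) != len(b):
        raise ValueError("profiles must use the same bins")
    return sum(abs(x - y) for x, y in zip(a, b))


def average_linkage_tree(profiles: dict[str, list[int]]):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    members = {name: [name] for name in profiles}  # cluster label -> sample names
    trees = {name: name for name in profiles}      # cluster label -> nested tuple

    def mean_distance(c1: str, c2: str) -> float:
        pairs = [(x, y) for x in members[c1] for y in members[c2]]
        return sum(cn_distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(members) > 1:
        a, b = min(combinations(sorted(members), 2), key=lambda pair: mean_distance(*pair))
        label = f"({a},{b})"
        members[label] = members.pop(a) + members.pop(b)
        trees[label] = (trees.pop(a), trees.pop(b))
    (tree,) = trees.values()
    return tree


profiles = {
    "normal":   [2, 2, 2, 2, 2, 2],
    "sector_A": [2, 2, 3, 3, 1, 2],
    "sector_B": [2, 2, 3, 3, 1, 1],
    "sector_C": [2, 4, 2, 2, 2, 2],
}
print(average_linkage_tree(profiles))
# (('normal', 'sector_C'), ('sector_A', 'sector_B'))
```

Sectors A and B share three altered bins and are joined first, which is the kind of grouping a shared ancestral clone would be expected to produce.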
Direct genomic sequencing provides the most detailed means possible of identifying clonal mutant markers. In contrast to conventional capillary-based techniques, in which individual PCR products or bacterial clones must be sequenced one at a time, a powerful new class of “Next Generation” sequencing technologies allows for simultaneous genotyping of tens of billions of base pairs. The rapidly decreasing costs associated with these platforms have recently made it feasible to sequence the entire aggregate genome of a tissue sample without any regional targeting. From a clone detection perspective, this means that multiple types of mutations of all functional varieties (both likely passengers and suspected drivers) can be simultaneously assessed. While it takes only a single clonal mutation to identify an expanded population, the redundancy conferred by screening the entire genome provides a huge amount of additional lineage data with the potential to be used for subanalyses such as approximating a clone’s mitotic age or the phylogenetic relationship between different clones.
Because these novel sequencing technologies operate digitally, tallying individual sequencing reads, they have a much greater dynamic range of sensitivity than conventional techniques, making it possible to resolve populations that are subclonal relative to a collected sample. This ability means that early detection can still be accomplished in situations where an expanding clone does not remain spatially coherent, for example in myelodysplasia preceding blood cancers. Similarly, a tolerance for clone mixing should permit convenient, minimally invasive sampling of epithelial tissues, such as isolation of cells from lavage, scrapings or body fluids rather than biopsy, even though these methods disrupt cohesive growth patterns. The relatively high error rate of individual sequencing reads currently limits accurate detection of rare subclonal mutations to populations roughly two orders of magnitude smaller than a fully clonal one. A variety of improvements in chemistry, hardware and analysis continue to enhance this resolution across all current platforms [118, 119]. An even newer generation of exotic “Fourth Generation” sequencing technologies on the horizon promises ultra-long read lengths and the ability to re-sequence the same molecule repeatedly for extremely accurate detection of rare molecular populations [120–122]. The pace of innovation in this area is staggering. Perhaps the only thing that can be stated with certainty about the technologies available five years from now is that they will look nothing like those of five years ago.
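The read-error limit described above can be made concrete with a binomial model: at a given depth, how many reads must show a variant before sequencing error alone becomes an implausible explanation? This sketch assumes, pessimistically and for simplicity, that every erroneous read shows the variant in question, and it uses a hypothetical significance cutoff `alpha`; production variant callers model error per base, per position and per strand.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of exactly k successes, computed in log space."""
    return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))


def min_variant_reads(depth, error_rate, alpha=1e-6):
    """Smallest variant-read count k with P(X >= k) < alpha, X ~ Binomial(depth, error_rate).

    X stands for reads that show the variant only because of sequencing error,
    so k is the minimum evidence needed to call a real subclonal variant.
    Returns None if even depth/depth variant reads would not be significant.
    Requires 0 < error_rate < 1.
    """
    tail = 0.0          # running P(X >= k), accumulated from k = depth downwards
    threshold = None
    for k in range(depth, -1, -1):
        tail += binom_pmf(k, depth, error_rate)
        if tail >= alpha:
            break
        threshold = k
    return threshold


depth, error_rate = 1000, 0.01  # hypothetical coverage and per-read error rate
k = min_variant_reads(depth, error_rate)
print(k, k / depth)  # minimum variant reads and the matching allele fraction
```

The threshold lands well above the 10 erroneous reads expected by chance, and deeper sequencing lowers the detectable allele fraction only slowly, which is why reducing the per-read error rate matters so much. For a heterozygous mutation in a diploid genome, the corresponding clone fraction is about twice the allele fraction.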
The reasoning associated with clonality determination is complex and many potential confounders exist. In this section we highlight seven important ideas for consideration when interpreting experimental findings.
Any mutation in a cell that undergoes clonal expansion will be passed on to its progeny, irrespective of functional status (Fig. 3A,B). Given the size of the genome, more of the mutations carried by a neoplastic clone will be passengers than drivers [27, 29, 30]. Demonstrating driver status requires supporting functional studies and/or repeated observation of the mutated locus in neoplastic clones from independent individuals.
A clone may exist and not be detected if none of the genomic sites being screened carry a unique mutation permitting it to be distinguished from the germline (Fig. 3C). The more sites examined, the higher the probability of detection becomes. Restricting a marker panel to suspected driver sites precludes detection of pathological clones driven by unknown factors.
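The dependence of detection on the number of sites screened can be written as a simple probability. This sketch assumes, hypothetically, that each screened site independently has the same probability `p_marked` of having been mutated in the clone's founding lineage; real sites differ in mutability and are not strictly independent.

```python
def detection_probability(p_marked, n_sites):
    """Chance that a clone carries at least one mutation among n screened sites.

    p_marked is the (hypothetical) probability that any single screened site was
    mutated in the clone's founding lineage; sites are assumed independent.
    """
    return 1.0 - (1.0 - p_marked) ** n_sites


# Same hypothetical per-site marking probability, panels of increasing size.
for n in (10, 100, 1000, 10000):
    print(n, round(detection_probability(0.001, n), 3))
```

With these illustrative numbers the chance of detection climbs from about 1% for a 10-site panel to near certainty for 10,000 sites. A panel limited to a handful of suspected driver sites stays at the low end of this range for any clone whose growth is driven by something else.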
Screening mutational hotspots makes it practical, with conventional technologies, to achieve a reasonable probability of detecting a clone through random passenger mutations. Clones derived from a mutator lineage, or from cells residing in a highly mutagenic environment, should be more densely marked with identifiable passengers (Fig. 3D), potentially reducing the number of markers that must be interrogated to identify them. Emerging high-throughput sequencing technologies will eventually obviate the need to restrict screening to a fraction of the genome.
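The claim that denser marking reduces the number of markers needed can be quantified by inverting the same independence model: solve 1 − (1 − p)ⁿ ≥ target for the smallest integer n. The per-site marking probabilities and the 95% target below are hypothetical.

```python
import math


def markers_needed(p_marked, target=0.95):
    """Fewest independent sites giving at least `target` chance that one is marked.

    Solves 1 - (1 - p_marked)**n >= target for the smallest integer n.
    """
    if not (0 < p_marked < 1 and 0 < target < 1):
        raise ValueError("p_marked and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p_marked))


# Hypothetical per-site marking probabilities: an ordinary lineage versus a
# mutator lineage (or mutagenic environment) that marks sites 10x more often.
print(markers_needed(0.01), markers_needed(0.1))  # → 299 29
```

In this model a tenfold increase in marking density cuts the required panel roughly tenfold, which is the practical appeal of hotspots and of mutator lineages for clone detection.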
With most traditional methods of aggregate DNA analysis, a mutation can be detected only if it has undergone clonal expansion. The probability of identifying clonal mutations in an expanded population is a function of the number of sites screened, the mutability of these sites and the number of cell divisions that occurred in the lineage leading up to the final expansion. Particularly when hotspots are screened, mutations will occasionally be detectable in any clonally derived population at a statistically definable frequency. Because mutations are not routinely encountered in normal tissues, it is tempting to attribute their presence in preneoplastic populations to “genetic instability” (an elevated mutation rate). However, the phenomenon is more precisely explained by the fact that expansion does not routinely occur in most normal tissues. An elevated mutation rate by itself has no observable effect on clonal mutation frequency in the absence of expansion (Fig. 3E). To determine a mutation rate from a clonal mutation frequency, it is necessary to (A) know that the population being assessed is clonal and (B) have some metric of the number of cell divisions that occurred between the zygote and the founding of the final clonal outgrowth. To infer that a rate is elevated, it is also necessary to know the normal in vivo mutation rate, which in turn is impossible to measure by this approach, given that large clonal expansions do not routinely occur in normal adult tissue. Considering these complexities, it is not surprising that the question of whether mutation rates are elevated in cancer remains unresolved [1, 8–11].
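Why a clonal mutation count alone cannot reveal the mutation rate can be shown with a toy model. This sketch assumes, hypothetically, a constant per-site, per-division mutation probability along the lineage from the zygote to the clone's founder; once that founder expands, its mutations are present in every clone cell. The function names and all numbers are illustrative.

```python
def expected_marked_sites(rate, divisions, sites):
    """Expected screened sites mutated in a clone founder's lineage.

    rate: hypothetical per-site, per-division mutation probability.
    divisions: cell divisions from the zygote to the clone's founding cell.
    After clonal expansion these mutations are present in every clone cell,
    so this is also the expected clonal mutation count.
    """
    return sites * (1.0 - (1.0 - rate) ** divisions)


def implied_rate(marked, divisions, sites):
    """Per-site, per-division rate implied by a clonal mutation count.

    Only meaningful if the sample is known to be clonal AND `divisions` is
    known; a wrong division count skews the inferred rate by roughly the
    same factor.
    """
    return 1.0 - (1.0 - marked / sites) ** (1.0 / divisions)


# Hypothetical scenarios: a doubled mutation rate over 1,000 divisions leaves
# almost the same clonal burden as a normal rate over 2,000 divisions.
fast = expected_marked_sites(2e-6, 1000, 1_000_000)
slow = expected_marked_sites(1e-6, 2000, 1_000_000)
print(round(fast, 1), round(slow, 1))
```

Both scenarios yield roughly 1,998 marked sites, so without an independent measure of division number the two are indistinguishable, matching requirement (B) in the text.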
Clones that are small relative to the portion of tissue sampled (Fig. 3F), or that become mixed with other clones (Fig. 3G) or with adjacent normal cells (Fig. 3H) during expansion, will not be detectable by methods with a low dynamic range of sensitivity. Moderate-to-large sample sizes coupled with a low-sensitivity assay limit detection to relatively large, coherently expanded clones, which in some situations may be of particular clinical relevance. Where mixing is anticipated, the most extreme example being blood cell populations, a high-sensitivity assay and/or enrichment prior to conventional detection is needed to identify early clones.
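The effect of dilution by unrelated or normal cells can be sketched as a variant allele fraction calculation. This assumes, hypothetically, that all cells outside the clone are wild type at the marker site and that DNA contribution is proportional to cell number; the two assay detection limits in the example are illustrative, not specifications of any particular method.

```python
def expected_vaf(clone_fraction, mutant_copies=1, ploidy=2):
    """Variant allele fraction in pooled DNA when `clone_fraction` of cells
    carry the mutation on `mutant_copies` of `ploidy` alleles and all other
    cells (normal or unrelated clones) are wild type at that site."""
    return clone_fraction * mutant_copies / ploidy


def is_detectable(clone_fraction, limit_of_detection, **allele_model):
    """True if the expected allele fraction reaches the assay's detection limit."""
    return expected_vaf(clone_fraction, **allele_model) >= limit_of_detection


# Hypothetical assays: a low-sensitivity method needing 20% variant alleles
# and a high-sensitivity method needing 1%.
for fraction in (1.0, 0.3, 0.05):
    print(fraction, is_detectable(fraction, 0.20), is_detectable(fraction, 0.01))
```

Under these assumptions a heterozygous clone occupying 30% of the sample yields a 15% allele fraction: missed by the low-sensitivity assay but caught by the high-sensitivity one, which is the trade-off between sample size and assay sensitivity described above.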
As humans, we are ourselves clonal entities derived from a single fertilized egg. Every cell that arises after the first zygotic cleavage forms the root of a new subclone that is propagated through development. When progeny derived from a common root remain spatially clustered during embryogenesis, a clonal patch is formed. Patches that happen to be marked by a founder mutation (Fig. 3I) will be indistinguishable in an adult from a similarly marked clone that arose post-zygotically as the result of a neoplastic process (Fig. 3B). Because Lyonization (X-chromosome inactivation) occurs very early in development, patches defined by X-linked markers represent the maximum patch size. In some tissues, X-linked patches appear to be relatively small, whereas in others they are quite large: on the order of a standard clinical biopsy [43–46]. For non-X-linked lineage markers, however, only those embryonic patches that bear identifiable mutations have the potential to be confused with adult-derived clones. Both the abundance of mutagenic insults and the number of fallible cell divisions during very early embryogenesis are small compared with those of later development and adult life. This is particularly true of the highly proliferative epithelial tissues most susceptible to field-associated cancers. The number of mutations marking embryonic patches should thus be relatively low compared with the number marking clones that arise from adult-onset neoplastic processes. Determining the frequency and, possibly, the spectrum of mutations in a detected clone may help distinguish embryonic from post-zygotic origins, but the separation is unlikely to be absolute. In a diagnostic situation, it will always be necessary to establish the baseline mutational signature of “normal” for each type of tissue and marker being assessed.
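One way to operationalize the comparison against a tissue-specific baseline is a one-sided test of the mutation count observed in a detected clone. This sketch assumes, hypothetically, that marker mutations in embryonic patches follow a Poisson distribution whose mean has already been estimated from normal tissue of the same type with the same marker panel; the Poisson model, the function names and the 0.01 cutoff are illustrative choices, and, as the text cautions, such a test cannot make the separation absolute.

```python
import math


def poisson_upper_tail(k, expected):
    """P(X >= k) for X ~ Poisson(expected), expected > 0; pmf summed in log space."""
    if k <= 0:
        return 1.0
    below = sum(math.exp(-expected + i * math.log(expected) - math.lgamma(i + 1))
                for i in range(k))
    return max(0.0, 1.0 - below)


def exceeds_embryonic_baseline(observed, baseline_mean, alpha=0.01):
    """Flag a clone whose mutation count is improbably high for an embryonic patch.

    baseline_mean is the (hypothetical) mean number of marker mutations seen in
    clonal patches of normal tissue of the same type with the same marker panel.
    """
    return poisson_upper_tail(observed, baseline_mean) < alpha


print(exceeds_embryonic_baseline(2, 2.0), exceeds_embryonic_baseline(12, 2.0))  # → False True
```

A clone carrying 12 marker mutations against a baseline mean of 2 is very unlikely to be an embryonic patch under this model, whereas 2 mutations is entirely consistent with one; counts in between fall into the ambiguous zone the text anticipates.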
The clonal proliferation of B and T lymphocytes in response to antigen is part of a normal immune response. A few clonal processes, including skewing of X-inactivation patterns in blood and the development of patches of related crypts in the colon, appear to occur during normal aging, possibly as a result of drift coinciding with depletion of stem cell populations. Some clones that do represent early neoplastic processes may take years to accrue the other alterations necessary to progress to overt malignancy, and some may never advance in a patient’s lifetime. For example, chromosomal translocations pathognomonic for particular types of leukemia are not uncommonly found in leukocytes of healthy newborns and do not portend malignancy in the overwhelming majority of individuals who carry them. Correlating the detection of clones under different circumstances with future outcomes is essential for determining clinical relevance.
Most adult cancers appear to develop over a period of years but remain clinically unrecognizable for the majority of this time. While tumorigenesis is strongly associated with age, environmental exposures and certain predisposing conditions, many unknown factors and randomly occurring replication errors contribute to the unpredictable clinical emergence of the disease. The underlying process of clonal evolution itself leaves a molecular signature in the genome of neoplastically transforming cell populations that could potentially be identified before overt malignancy appears.
In this article we have reviewed different techniques for using neutral passenger mutations as lineage markers of preneoplastic clones and discussed important considerations for, and limitations of, their use. While the presence of certain driver mutations is of unquestionable clinical utility for partially predicting cancer risk in some predisposing conditions, passenger-based analysis has the benefit of identifying anomalous clonal proliferation in a mechanism-independent manner that remains operative in scenarios where growth results from unknown molecular drivers. Clinically, the mutational signature of evolving clones offers a means of precisely defining the margins of a disease process so that affected tissues can best be treated, resected or monitored for recurrence. With sufficient mutational information it should be possible not only to detect neoclones but also to characterize specific features such as their size, spatial arrangement [23, 127], diversity among foci [65, 116] and lineage age [60, 107, 128], all of which are important metrics of evolution and evolvability in traditional ecological populations.
Powerful new tools for genetic analysis are rapidly making their way into the mainstream and promise to significantly alter the way cancer is diagnosed and managed over the next decade. The unprecedented amount of data generated by these technologies will soon afford complete access to the vast somatic evolutionary history encoded in the genome of early neoplasia, leaving only the daunting task of deciphering it meaningfully. Early detection has historically been among the most effective means of preventing cancer deaths, and we hope that the concepts and past approaches discussed herein will help stimulate creative thinking about ways in which this wealth of information can be translated into a new generation of clinical diagnostics.
JJS was supported by NIH grant F30AG033485 and the University of Washington Medical Scientist Training Program under NIH grant T32GM007266. MSH was supported by NIH grants R01DK078340 and DP1OD003278.
Conflict of interest
The authors declare that there are no conflicts of interest.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.