Precise timing of chromosomal abnormalities provides a wonderfully detailed glimpse of the etiology of a single tumor. However, we have demonstrated that there are limitations to this technique. In particular, we have shown that for high-level amplifications, most of the possible combinations of events that produce large amounts of amplification do not retain enough information in the allele frequencies to estimate the ordering. Only regions where the amplification follows a single lineage can be timed using this model. This may give a biased impression of the etiology: single-lineage amplification may predominate in promoting certain types of abnormalities, so an analysis restricted to it may miss many other types of oncogenes.
As we note in the introduction, our focus has been the traditional one of copy-number analysis, where each region in the normal genome is analyzed separately as to its behavior in the tumor. With exome sequencing, this traditional viewpoint is still the only one available. With whole-genome sequencing, as we noted, one can analyze the relationships between the regions and order the events using the information from other regions. In this case, a single region A can share an event with another region B if the amplification brought the two into proximity to each other through an insertion. Then there are additional constraints on estimating the timing of A and B jointly, as that event must occur at the same moment for both. This implies that with whole-genome sequencing deep enough to determine these relationships reliably, a larger percentage of histories will be identifiable.
In addition, early events, which are of particular biological significance, are sensitive to estimation procedures, and large numbers of mutations are necessary to obtain stable estimates of the time of occurrence. Ordering a collection of early events is more difficult still. Even with whole-genome sequencing, some regions will not have the hundred or more mutations that our simulations show are necessary to distinguish early events, particularly in tumors with low mutation rates.
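As a rough illustration of why early events need many mutations, the following sketch simulates a copy-neutral LOH region under a deliberately simplified model. The model is an assumption made here for illustration and is not necessarily the exact formulation used in this work: mutations arise at a constant rate per copy, the event occurs at time pi0 on a scale from 0 to 1, surviving pre-event mutations have allele frequency 1, post-event mutations have allele frequency 1/2, and allele frequencies are observed without sequencing noise. Under these assumptions the fraction of mutations at allele frequency 1 is q = pi0 / (2 - pi0), so pi0 = 2q / (1 + q). The names `pi0`, `q`, and the functions below are hypothetical.

```python
import random
import statistics


def q_from_pi0(pi0):
    """Expected fraction of mutations at allele frequency 1 (assumed model)."""
    return pi0 / (2.0 - pi0)


def pi0_from_q(q):
    """Invert the assumed model: event time from the observed fraction q."""
    return 2.0 * q / (1.0 + q)


def simulate_estimates(pi0, n_mutations, n_reps, rng):
    """Simulate n_reps regions with n_mutations each; return estimates of pi0."""
    q = q_from_pi0(pi0)
    estimates = []
    for _ in range(n_reps):
        # Each mutation independently predates the event with probability q.
        k = sum(1 for _ in range(n_mutations) if rng.random() < q)
        estimates.append(pi0_from_q(k / n_mutations))
    return estimates


rng = random.Random(0)  # fixed seed so the run is reproducible
spread = {}
for n in (10, 100, 1000):
    est = simulate_estimates(pi0=0.1, n_mutations=n, n_reps=2000, rng=rng)
    spread[n] = statistics.pstdev(est)
    print(n, round(statistics.mean(est), 3), round(spread[n], 3))
```

For an early event (here pi0 = 0.1), the spread of the estimate with 10 mutations is a large multiple of the spread with 1000, which is consistent with the qualitative point above. The sketch ignores read-depth variability and normal contamination, both of which would widen the spread further.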
However, we have also shown that the choice of estimation technique can improve both the temporal estimates and their confidence intervals. We have introduced a full MLE to handle sequencing variability due to lower read coverage, as well as a Bayesian estimation technique. We have shown that the full MLE can provide improvement with read depths as large as 30×, and even up to 75× or higher if there is normal contamination or early events. The performance of the Bayesian estimates varies across the parameter space, but they can provide increased stability, particularly in their confidence intervals.
Ultimately, the ability to successfully estimate π also relies on intrinsic properties of the cancer. In the skin tumors, only half of the samples had CNLOH over the tumor suppressor gene TP53 (not all of which were examined in this work); in the other samples, both copies of TP53 were also inactivated but through multiple mutations, not a chromosomal abnormality. Other important regions may be too small in a particular sample to have sufficient mutations; the regions we ordered were large, sometimes entire chromosomal arms. Some tumors, such as the ovarian, have low mutation rates, so that even with whole-genome sequencing many regions will have too few mutations to confidently distinguish between events. While 30–60% of the abnormal regions could theoretically be timed in our sample, the percentage that had enough mutations was generally 20–30%. Therefore, timing of the chromosomal abnormalities of a single sample remains extremely fragmentary, and insight into tumor etiology will still ultimately be gained by comparing the temporal ordering of many tumors.