Biometrics. Author manuscript; available in PMC 2009 October 13.

Published in final edited form as:

Published online 2008 February 11. doi: 10.1111/j.1541-0420.2008.00988.x

PMCID: PMC2761024

NIHMSID: NIHMS130479

The publisher's final edited version of this article is available at Biometrics

See the article "Statistical Tests for Clonality" in *Biometrics* on page 522.

In a recent article Begg et al. (2007) proposed a statistical test to determine whether or not a diagnosed second primary tumor is biologically independent of the original primary tumor, by comparing patterns of allelic losses at candidate genetic loci. The proposed Concordant Mutations Test is a conditional test, an adaptation of Fisher’s Exact Test, that requires no knowledge of the marginal mutation probabilities. The test was shown to have generally good properties, but is susceptible to anti-conservative bias if there is wide variation in mutation probabilities between loci, or if the individual mutation probabilities of the parental alleles for individual patients differ substantially from each other. In this article, a likelihood ratio test is derived in an effort to address these validity issues. This test requires pre-specification of the marginal mutation probabilities at each locus, parameters for which some information will typically be available in the literature. In simulations this test is shown to be valid, but to be considerably less efficient than the Concordant Mutations Test for sample sizes (numbers of informative loci) typical of this problem. Much of the efficiency deficit can be recovered, however, by restricting the allelic imbalance parameter estimate to a pre-specified range, assuming that this parameter is in the pre-specified range.

Cancer pathologists are increasingly exploring the use of genetic fingerprinting to assist in classifying tumors. This is likely to be particularly useful in distinguishing second primary cancers from metastases, especially in clinical scenarios where this distinction is difficult when based solely on gross pathology, and where the correct diagnosis is clinically relevant. This is the case, for example, for contralateral cancers in the same organ type with the same histology, or for the occurrence of, say, a solitary lung nodule in a patient who has survived a previous head and neck primary of the same cell type (Geurts et al., 2005; Leong et al., 1998). Since tumors typically harbor many somatic mutations, the patterns of these mutations provide the evidence for distinguishing clonal tumors (i.e. metastases), characterized by common somatic mutations that occurred in the single, originating clonal cell, from independent tumors, where the patterns of mutations have no common origin. These genetic fingerprints can be determined by studying markers of somatic mutations, such as the presence of loss of heterozygosity, at candidate loci known to experience frequent allelic losses in the tumor type under evaluation. Numerous studies of this nature have been conducted in recent years in many cancer sites (see for example Ha and Califano, 2003; Hafner et al., 2002; Huang et al., 2001; Imyanitov et al., 2002).

In a recent article, our group proposed a new statistical test for this purpose (Begg et al., 2007). This is a relatively simple adaptation of Fisher’s Exact Test, in which the marginal frequencies of somatic mutations on the two tumors are fixed, and the test statistic is a simple count of the number of common concordant mutations that occur on the same parental alleles. We hereafter refer to this as the Concordant Mutations Test (CM). The reference distribution can be expressed as a simple combinatorial sum, as in Fisher’s Test. The attraction of this approach is its simplicity, allied to the fact that it is based solely on the data observed in a single patient. However, it was shown that its validity depends on two important assumptions. The first assumption is that the probability of a mutation is common across the genetic loci investigated. It was demonstrated that the range of mutation probabilities typically studied may have little impact on the properties of the CM test. The second assumption is that the mutation probability at each locus is the same for each parental allele. An imbalance in these probabilities has the effect of inducing correlation in the observed mutations between tumors even when the tumors arise independently. Although the CM test may work well when there are modest departures from these assumptions, it is of interest to explore the development of techniques that are not sensitive to these validity concerns.
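The following sketch illustrates how a reference distribution of this kind can be computed as a combinatorial sum. It is not the published formula from Begg et al. (2007), and it rests on two labeled assumptions: the number of loci showing losses in both tumors is taken to be hypergeometric given the marginal counts, and each shared loss is taken to fall on the same parental allele with probability 1/2 under independence. The function name `cm_test_pvalue` and its arguments are hypothetical.

```python
from math import comb


def cm_test_pvalue(n_loci, x, y, a_obs):
    """Upper-tail p-value P(A >= a_obs) for the count of concordant mutations.

    n_loci : number of informative loci
    x, y   : number of loci with LOH in tumor 1 and in tumor 2 (held fixed)
    a_obs  : observed number of shared losses on the same parental allele

    ASSUMPTIONS (not taken from the source text):
      * the number of loci mutated in both tumors is hypergeometric;
      * each shared loss is concordant with probability 1/2 under independence.
    """
    total = comb(n_loci, y)
    p_value = 0.0
    for e in range(max(0, x + y - n_loci), min(x, y) + 1):
        # Probability that exactly e loci show LOH in both tumors.
        p_shared = comb(x, e) * comb(n_loci - x, y - e) / total
        # Binomial(e, 1/2) probability of at least a_obs concordant losses.
        tail = sum(comb(e, a) for a in range(a_obs, e + 1)) / 2 ** e
        p_value += p_shared * tail
    return p_value


if __name__ == "__main__":
    # 10 informative loci, 5 losses in each tumor, all 5 shared and concordant.
    print(cm_test_pvalue(10, 5, 5, 5))
```

Under these assumptions, 10 loci with 5 losses per tumor, all concordant, give a p-value of 1/8064. This is because the only contributing term has all five losses shared (probability 1/252) and all five on the same allele (probability 1/32).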

The goal of this article is to explore a new test for this problem that is designed to circumvent these threats to the validity of the CM test. This new test is a likelihood ratio test that requires knowledge of the marginal mutation probabilities at each locus. Some knowledge of these probabilities will usually be available on the basis of data from the literature, and these estimates will improve as tumors are increasingly examined for patterns of allelic loss.

To the extent possible, we use the same notation as in Begg et al. (2007). The data consist of indicators of LOH, and indicators of whether common losses on the two tumors occurred on the same parental allele, at each of *J* informative candidate loci. Let the locus be denoted by the subscript *i*, and let *a _{i}* = 1 if a mutation occurs on the same allele in both tumors at the i

We assume at the outset that the individual mutation probabilities at each locus are distinct but known, and are denoted by *p _{i}*, *i* = 1, …, *J*. Generally some information regarding these mutational probabilities will be available from the literature on any locus that would be considered a candidate for studies of this nature, and the amount of such background information is likely to increase rapidly in the future from studies such as the Cancer Genome Atlas, sponsored by the National Cancer Institute (http://cancergenome.nih.gov). The tests are designed to distinguish two hypotheses: *H _{I}* and

Under the clonal hypothesis (*H _{C}* ), both of the tumors originate from a single (clonal) cell in which the pivotal somatic mutations occurred. Later, the colony of daughter cells from this clonal cell gives rise to the second tumor when one (or more) of these cells migrates to form a new (metastatic) colony. Subsequently the growth of both the original colony and the new colony may become dominated by cells that experience subsequent somatic mutations that confer a growth advantage. The occurrences of these subsequent mutations are “independent” in the sense described in the previous paragraph, i.e. if a candidate locus experiences mutations in both tumors the probability of concordance is

where 0 ≤ *c* ≤ 1 and 0.5 ≤ *π* ≤ 1. The likelihood ratio statistic is *L* = *L*(*ĉ*, *π̂*)/*L*(0, *π̂* _{0}), where (*ĉ*, *π̂*) is the MLE of the unconstrained likelihood, while *π̂* _{0} is the MLE of *π* when *c* = 0. These estimates are obtained by numerical maximization as there is no closed form solution in general. If there are no common mutations on the two tumors, i.e. if *a* and *e* − *a* are both 0, where *e* = Σ*e _{i}*, then the likelihood provides no information about

To obtain a reference distribution for the test statistic we utilize probability sampling from the estimated reference distribution under the null hypothesis. That is, for each locus *i* we generate (*a _{i,}e_{i}* −

As we subsequently show in Section 3, the preceding unconstrained LR test has sub-optimal properties in the context of our small sample setting, due to inadequate power to estimate *π* reliably. A pragmatic solution is to constrain the range of admissible estimates for *π*. Empirical results suggest that by restricting the MLE of *π* to the arbitrary range [0.5, 0.8] we can utilize the available information more efficiently, assuming that *π* truly does lie in this range. In the simulation in Section 3 we present results for this test, referred to as the LR(0.8) test.
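The effect of restricting the admissible range can be illustrated in the simplest case, the estimate of *π* under independence (*c* = 0). The sketch below is not the authors' implementation. It assumes that, under independence, a locus with losses in both tumors is concordant with probability *π*² + (1 − *π*)², so that with known marginal probabilities only the counts *a* and *e* carry information about *π*. The function name `constrained_null_pi` is hypothetical, and returning `None` when *e* = 0 is a choice made for this sketch.

```python
from math import sqrt


def constrained_null_pi(a, e, upper=0.8):
    """Estimate pi under c = 0, restricted to the range [0.5, upper].

    a : number of shared losses on the same parental allele
    e : number of loci with losses in both tumors

    ASSUMPTION (not taken from the source text): under independence a shared
    loss is concordant with probability q = pi**2 + (1 - pi)**2.
    """
    if e == 0:
        # No shared losses: the data carry no information about pi.
        return None
    q_hat = a / e  # binomial MLE of the concordance probability
    if q_hat <= 0.5:
        pi_hat = 0.5  # q cannot fall below 0.5 when 0.5 <= pi <= 1
    else:
        # Invert q = 2*pi**2 - 2*pi + 1 on the branch with pi >= 0.5.
        pi_hat = (1 + sqrt(2 * q_hat - 1)) / 2
    # q is increasing in pi on [0.5, 1] and the binomial likelihood is
    # unimodal in q, so the restricted maximizer is the clipped value.
    return min(max(pi_hat, 0.5), upper)


if __name__ == "__main__":
    print(constrained_null_pi(13, 25))  # interior estimate
    print(constrained_null_pi(10, 10))  # clipped at the upper bound
```

With all shared losses concordant, the unrestricted estimate would sit at the boundary *π* = 1; the restriction pulls it back to 0.8. The full LR(0.8) test additionally requires the joint maximization over *c*, which is done numerically and is not reproduced here.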

We examined datasets with different numbers of informative loci (10, 20, 30), with signal strengths either null (*c* = 0), moderate (*c* = 0.5) or large (*c* = 0.9), with allelic probability imbalances represented by *π* = 0.5, *π* = 0.6 and *π* = 0.7, and with “average” mutation probabilities of *p* = 0.3 and *p* = 0.5. Variation in mutation probabilities was selected based on the variance of log {*p _{i}*/(1 −

These configurations demonstrate the operating characteristics of the likelihood ratio approach under perfect circumstances, i.e. where the values of {*p _{i}* } used in this test are known without error. In practice we will only have estimates of these quantities. To estimate the degree of plausible misclassification error in practice we have made use of a literature review for a planned clonality study in melanoma. In the planning of this study we identified 20 markers at sites for which LOH is common in melanoma. The reported frequencies of LOH at these sites ranged from 10% to 56%, and the denominators of these relative frequencies ranged from 9 to 23. Based on these statistics we make the assumption that the presumed known (logit) values of {

The simulations were generated in the following way. First, we specified values of {*p _{i}*},

The size and power of the tests are displayed in Table 1. We see that the likelihood ratio (LR) test succeeds in its goal of correcting the validity problems that affect the Concordant Mutations (CM) test when the mutation probabilities vary across loci and/or the alleles at each locus possess unequal mutation probabilities. The size of the test remains at <0.05 for the constrained version of the test (LR(0.8)), and the test continues to have good validity properties even when the “known” values of {*p _{i}* } used in the construction of the test are known only with considerable error (see the column denoted “LR

We also present the corresponding results for power when the data are generated from a clonal model with *c* = 0.5. [Data are not shown for the setting of a strong clonal signal, with *c* = 0.9, as all of the tests maintain high power in this configuration.] The results in the power section of Table 1 are “calibrated” to a significance level of 0.05 to facilitate direct comparison of the tests in their abilities to distinguish *H _{I}* and

We have re-analyzed the data set from Imyanitov et al. (2002) using both tests. Each of 14 genetic loci was examined for presence of LOH. The data consist of pairs of bilateral breast cancers from 28 patients. Overall the CM test classifies 7 of the 28 patients as clonal, while the LR test with the parameter *π* restricted to the range [0.5, 0.8] classifies 9 of the patients as clonal. For 24 of the 28 patients, the results of the two tests are concordant with respect to statistical significance at the 5% level. In general, the CM test gives greater weight to the presence of the concordant mutations, while for the LR test, data patterns that are inconsistent with the pre-specified marginal probabilities seem to have a greater influence than in the CM test. Examples from 7 selected patients are displayed in Table 2. Losses are denoted as “▲”, the absence of LOH by “○”, and non-informative loci by “–”. Concordant and discordant losses at the same locus are denoted by “▲▲” and “▲”, respectively. Case # 25 highlights the fact that the LR test can find significant evidence of clonality even in the presence of a single concordant mutation.

Selected Patients with Breast Cancer [Adapted from Imyanitov et al. (2002)]

Our research does not clearly establish the preferred test. The advantages of the likelihood ratio test are that it has a valid test size, and its discriminatory power is almost as good as the Concordant Mutations Test. On the other hand, the Concordant Mutations Test does not rely on pre-specification of individual marginal mutation probabilities at each locus, and it is simple to construct and calculate.

The two tests differ in their use of the data in an important way. The Concordant Mutations test statistic is the count of the number of (potentially clonal) concordant mutations. Thus it can only lead to a result in favor of *H _{C}* if there are significantly more concordant mutations than expected under

The research was supported by the National Cancer Institute, award number CA098438.

- Begg CB, Eng KH, Hummer AJ. Statistical tests for clonality. Biometrics. 2007;63:522–530.
- Geurts TW, Nederlof PM, van den Brekel MW, van’t Veer LJ, de Jong D, Hart AA, van Zandwijk N, Klomp H, Balm AJ, van Velthuysen ML. Pulmonary squamous cell carcinoma following head and neck squamous cell carcinoma: metastasis or second primary? Clinical Cancer Research. 2005;11:6608–6614.
- Ha PK, Califano JA. The molecular biology of mucosal field cancerization of the head and neck. Critical Reviews in Oral Biology and Medicine. 2003;14:363–369.
- Hafner C, Knuechel R, Stoehr R, Hartmann A. Clonality of multifocal urothelial carcinomas: 10 years of molecular genetic studies. International Journal of Cancer. 2002;101:1–6.
- Huang J, Behrens C, Wistuba I, Gazdar AF, Jagirdar J. Molecular analysis of synchronous and metachronous tumors of the lung: impact on management and prognosis. Annals of Diagnostic Pathology. 2001;5:321–329.
- Imyanitov EN, Suspitsin EN, Grigoriev MY, Togo AV, Kuligina E, Belogubova EV, Pozharisski KM, Turkevich EA, Rodriquez C, Cornelisse CJ, Hanson KP, Theillet C. Concordance of allelic imbalance profiles in synchronous and metachronous bilateral breast carcinomas. International Journal of Cancer. 2002;100:557–564.
- Leong PP, Rezai B, Koch WM, Reed A, Eisele D, Lee DJ, Sidransky D, Jen J, Westra WH. Distinguishing second primary tumors from lung metastases in patients with head and neck squamous cell carcinoma. Journal of the National Cancer Institute. 1998;90:972–977.
