|Home | About | Journals | Submit | Contact Us | Français|
Summary: If a cancer patient develops multiple tumors, it is sometimes impossible to determine whether these tumors are independent or clonal based solely on pathological characteristics. Investigators have studied how to improve this diagnostic challenge by comparing the presence of loss of heterozygosity (LOH) at selected genetic locations of tumor samples, or by comparing genomewide copy number array profiles. We have previously developed statistical methodology to compare such genomic profiles for an evidence of clonality. We assembled the software for these tests in a new R package called ‘Clonality’. For LOH profiles, the package contains significance tests. The analysis of copy number profiles includes a likelihood ratio statistic and reference distribution, as well as an option to produce various plots that summarize the results.
Availability: Bioconductor (http://bioconductor.org/packages/release/bioc/html/Clonality.html) and http://www.mskcc.org/mskcc/html/13287.cfm.
Multiple cancerous lesions in the same patient can be either clonal or independent. Clonal tumors, for example, a primary tumor and its metastasis, originate from the same ‘clonal’ cell. A second primary tumor develops completely independently from the original primary. This distinction can have important clinical implications. For example, a lung nodule in a survivor with a previous head/neck cancer is potentially curable by surgery, if it is a primary lung cancer, but not if it is a metastasis from the original primary. In such cases, the genomic profiles of tumors can provide insight into their relationship, since clonal tumors necessarily possess at least some identical somatic mutations. Researchers have utilized two types of genetic data for evaluating clonality. The most common approach is to evaluate heterozygous markers at a set of candidate genetic loci for loss of heterozygosity (LOH). This strategy has promise as a clinical tool, because of its simplicity, and because it can be accomplished easily with minimal quantities of DNA. More recently, investigators have compared genomewide copy number arrays (CNAs) (e.g. CGH arrays) to search for identical copy number gains or losses. This is a more elaborate and expensive strategy, less likely to see clinical application in the near future, but potentially valuable for its greatly increased resolution compared with the use of candidate markers. We have previously published statistical methodology for formal statistical testing in both settings. For LOH data, we have developed a concordant mutations (CM) test (Begg et al., 2007) and a likelihood ratio (LR) test (Ostrovnaya et al., 2008). For CGH data, we proposed a LR statistic and an algorithm for calculating its reference distribution under the ‘independence’ hypothesis (Ostrovnaya et al., 2010a). We have created the R/Bioconductor (Gentleman et al., 2004; R Development Core Team, 2010) package called ‘Clonality’ that implements these methods. This enables users to summarize genetic profiles of tumors and to perform tests for clonal relatedness.
We describe the seven principal functions that are included in the package and discuss how to interpret the output. All tests are applied to pairs of tumors from the same patient. If multiple tumors per patient are provided, all pairwise comparisons are performed.
The function ‘LOHclonality()’ combines both tests we have developed for LOH data at the candidate loci (usually 10–30 markers). The main input is a matrix of LOH calls, where each marker has to be represented by one of three user-specified symbols denoting no LOH, LOH in allele 1 or LOH in allele 2. Markers that are not informative (e.g. homozygous) in a particular tumor should be given an NA instead of a call. The methodology assumes that the markers are independent (ideally from different chromosome arms) and that uninformative markers are missing at random. The user can choose to perform the CM test, the LR test or both. The test statistic for the CM test is the number of concordant losses (i.e. LOH on the same allele). This is printed in the output along with counts of markers with an LOH in one tumor only, with no LOH in both tumors and so on. The P-value is produced using a theoretical reference distribution that relies on assumptions that alleles 1 and 2 are equally likely to be lost, and that each marker has the same probability of an LOH. This test is valid when the deviations from these assumptions are not substantial (Begg et al., 2007). The LR test does not make these assumptions, but it requires knowledge of the LOH frequencies for each marker. If they are not provided, they can be estimated from the original dataset or another group of reference patients with the same disease. The reference distribution for the LR test is obtained by simulating tumors with given LOH frequencies. Unlike the CM test, concordant LOH at an infrequently occurring marker can provide stronger evidence for clonality than at a commonly occurring marker. The LR test, while free of the assumptions of the CM test, is preferred only when the true LOH frequencies in the specific patient population of interest are known or can be estimated with high confidence.
For the analysis of CNAs, the software requires the package DNAcopy (Olshen et al., 2004; Venkatraman et al., 2007). The input into the main function, ‘clonality.analysis()’, has to be a CNA object used in DNAcopy; it combines chromosome, genomic locations and each sample's log-ratios in columns. The chromosomes should be split into arms, since it increases the number of independent genomic units for the analysis. This can be done using function ‘splitChromosomes()’. The clonality analysis consists of several steps:
It is important to make sure that small gains and losses that could potentially be germline copy number variants (CNVs) are excluded from the arrays prior to clonality analysis. These appear as pronounced changes that are exactly the same in both tumors, and are thus mistaken to be strong evidence toward clonality by our algorithm. In order to exclude CNVs, the arrays should be either compared to their matching normal arrays, or to the database of genomic variants (http://projects.tcag.ca/variation/), or some other screening method can be applied (e.g. Ostrovnaya et al., 2010c).
Since only one genomic change per chromosome arm is allowed, we suggest limiting the resolution of the array to roughly a total of 5000–15 000 markers. If the array has more markers, then mutually exclusive blocks of several adjacent markers can be averaged using the function ‘ave.adj.probes()’. The number of markers averaged should be calculated based on the original resolution of the array. Averaging also helps reduce noise in the arrays, it removes waves in log-ratios that can result in false genomic changes, and it simplifies the copy number patterns. If the user is not confident that all CNVs are removed, then a lower resolution (e.g. 2000 markers) is recommended.
The help file for ‘clonality.analysis()’ contains a real data example—analysis of pairs of breast cancer lesions published by (Hwang et al., 2004) and available from the authors' web site.
Two functions, ‘chromosomePlots()’ and ‘genomewidePlots()’, provide per chromosome and genomewide plots of the data and one step segmentation for pairs of tumors. These plots show where the clonality signal is coming from; they illustrate the concordance between general mutational patterns and the match between specific gains and losses. The overlaying histograms of the LRs and the reference distribution under the hypothesis of independence are produced by ‘histogramPlot()’. The extent of overlap between the two histograms can help define the cut-offs for diagnosing patients: the upper right tail of the reference distribution might be interpreted as an equivocal area, while the LRs above or below the equivocal area receive unambiguous diagnoses.
We are grateful to Kevin Eng for programming work conducted early in the development of this project.
Funding: National Cancer Institute award (CA124504).
Conflict of Interest: none declared.