Int J Cancer. Author manuscript; available in PMC 2010 June 15.
Published in final edited form as:
PMCID: PMC2670951

Construction of oncogenetic tree models reveals multiple pathways of oral cancer progression

Abstract


Oral cancer develops and progresses by accumulation of genetic alterations. The interrelationship between these alterations and their sequence of occurrence in oral cancers is not thoroughly understood. In the present study, we applied oncogenetic tree models to comparative genomic hybridisation (CGH) data of 97 primary oral cancers to identify pathways of progression. CGH revealed the most frequent gains on chromosomes 8q (74.2%) and 9q (41.2%), and frequent losses on 3p (49.5%) and 8p (47.5%). Both mixture and distance-based tree models suggested multiple progression pathways and identified +8q as an early event. The mixture model suggested two independent pathways, namely a major pathway with −8p and a less frequent pathway with +9q. The distance-based tree identified three progression pathways, one characterized by −8p, another by −3p and the third by alterations +11q and +7p. Differences were observed in the cytogenetic pathways of node-positive and node-negative oral cancers. Node-positive cancers were characterized by more non-random aberrations (n=11) and progressed via −8p or −3p. Node-negative cancers, on the other hand, involved fewer non-random alterations (n=6) and progressed along −3p. In summary, the tree models for oral cancers provided novel information about the interactions between genetic alterations and predicted their probable order of occurrence.

Keywords: oral cancer, CGH, comparative genomic hybridisation, oncogenetic tree, progression pathways, genetic progression score

Introduction


Oral squamous cell carcinomas (OSCC), like all solid tumours, are characterized by multiple chromosomal alterations and are genetically complex1. Dependencies between the numerous genetic alterations lead to the observed karyotypic complexity, which results in the distinct biological behaviour of oral cancers2. For example, node-positive OSCC are biologically aggressive and have a poorer prognosis than node-negative OSCC3. This indicates that different genetic pathways of progression exist in oral cancers, leading to molecular subtypes with distinct clinical outcomes. Hence, it is necessary to identify the genetic alterations and the interactions between them that form multiple progression pathways. This approach may aid in better understanding the biology of oral carcinomas.

Comparative genomic hybridisation (CGH), a genome-wide profiling technique, has revealed a non-random pattern of genetic alterations in oral cancers4-9. An early study suggested that genomic alterations in OSCC may be more uniform than those of other solid tumours5, but a more recent study demonstrated that the initiation and progression of oral cancers involves divergent pathways9. Because the latter study used frequency analysis, it could not evaluate interactions between the alterations and provided limited information on genetic pathways. Thus, it is desirable to use additional statistical methods that account for interactions and can estimate genetic pathways of cancer progression from CGH data.

Cancer progression has been described by mathematical models, such as oncogenetic tree models10-12. Tree models are more flexible than the linear models of progression proposed earlier13, because trees can represent multiple pathways simultaneously. Oncogenetic tree models have been constructed for renal cell carcinomas, bladder cancers, head and neck cancers, nasopharyngeal cancers, prostate cancer, B-cell lymphomas and meningiomas14-20. For each cancer type, the tree models have identified multiple progression pathways and revealed different subtypes characterized by combinations of alterations.

To date, the pathogenetic pathways followed by oral cancer have not been thoroughly investigated. Elucidating the divergent routes in oral carcinomas could provide information about molecular subtypes, which might support treatment decisions. With this long-term goal in mind, in the current study, we constructed oncogenetic tree models based on comparative genomic hybridisation (CGH) data of 97 primary oral cancers. Both mixture models and distance-based tree models were used to analyse genetic alterations in OSCC. Distance-based tree models were also constructed separately for node-positive and node-negative OSCC.

Materials and Methods

Study Population

Tumour tissues were collected from 97 oral cancer patients who underwent surgical resection at the Tata Memorial Hospital (TMH), Mumbai. Tissue collection and the entire study protocol were approved by the Institutional Review Board at TMH. Informed consent was obtained from the patients. None of the patients had received chemotherapy or radiation therapy before surgery. After microdissection of tissues, the pathologist confirmed that each tissue sample had ≥60% tumour cell content. All the tumour samples were graded and staged according to the WHO and TNM & AJCC 2002 classification of tumours, respectively. Clinico-pathological data of the cases are summarized in Table 1. The study group consisted of 75 males and 22 females with a median age of 52 years (range, 23 to 77 years). There were more node-positive tumours (n=54) than node-negative tumours (n=43).

Table 1
Clinico-pathological characteristics of 97 oral cancer patients

CGH analysis

CGH was performed as described previously using a direct labelling method with fluorochrome-labelled dUTPs16, 21. Using the standard nick translation method, tumour DNA was labelled with Fluorescein-12-dUTP and normal DNA was labelled with Texas Red-5-dUTP (NEN, Boston, MA). Equal quantities (2 μg each) of tumour and normal DNA were mixed with 10 μg of unlabelled human Cot-1 DNA (Invitrogen, USA) and dissolved in 10 μl of hybridisation buffer. The denatured probes in the mixture were hybridized onto normal metaphase spreads (Vysis, Inc, Downers Grove, IL) at 37°C for 48 hrs. After post-hybridisation washes, the slides were counterstained with DAPI (Vectashield, Burlingame, CA) for chromosome identification before visualization with a fluorescence microscope (Zeiss Axioscope, Jena, Germany). The images were analyzed with a digital image analysis system (Metasystems, Germany). The average ratio of green to red fluorescence intensity was calculated for each chromosome. The thresholds were set at 1.25 and 0.75 to determine copy number alterations (CNAs), i.e. gains and losses, respectively. The thresholds 1.25 and 0.75 are commonly, though not universally, used in CGH. They were proposed in the original paper on CGH21 and further evaluated in early studies22, 23. The study of Jeunken et al24 showed that using thresholds of 1.2 and 0.8 did not make much difference in practice. Among the six previous studies on CGH of oral cancer that we found4-9, some used 1.25/0.75 thresholds; others used 1.2/0.8. So far as we could determine, none of these six studies analyzed their data using more than one pair of thresholds.
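The thresholding step can be illustrated with a minimal sketch. The function name, the example ratios and the treatment of values lying exactly on a threshold are assumptions made for illustration; they are not taken from the image analysis software used in the study.

```python
# Thresholds quoted in the text. Whether a ratio exactly equal to a
# threshold counts as an alteration is an assumption of this sketch.
GAIN_THRESHOLD = 1.25
LOSS_THRESHOLD = 0.75


def classify_ratio(ratio, gain=GAIN_THRESHOLD, loss=LOSS_THRESHOLD):
    """Classify an average green/red (tumour/normal) fluorescence ratio."""
    if ratio >= gain:
        return "gain"
    if ratio <= loss:
        return "loss"
    return "balanced"


# Hypothetical average ratios, for illustration only.
example_ratios = {"8q": 1.42, "3p": 0.68, "2q": 1.02, "9q": 1.22}

# Calls with the 1.25/0.75 pair used in this study.
calls_strict = {arm: classify_ratio(r) for arm, r in example_ratios.items()}

# Calls with the alternative 1.2/0.8 pair mentioned in the text.
calls_relaxed = {
    arm: classify_ratio(r, gain=1.2, loss=0.8)
    for arm, r in example_ratios.items()
}

print(calls_strict)
print(calls_relaxed)
```

In this toy input only the borderline ratio (1.22) changes its call between the two threshold pairs, which is the kind of comparison one would run to check sensitivity to the threshold choice.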

Fluorescence in situ hybridisation

Interphase FISH was performed on archival oral cancer samples with known CGH results in order to evaluate the CGH-predicted gains of 11q13 and 8q24.3. Dual-colour FISH was performed on 4 μm sections of archival OSCC samples (n = 28) and the corresponding normal oral mucosal tissues (n = 3). Centromere-specific BAC clones were labelled with Cy-3 (red) and region-specific BAC clones were labelled with FITC (green) using the standard nick translation method. All BAC clones were obtained from the BACPAC resource (Children's Hospital Oakland-BACPAC Resources, Oakland, CA). The specific BAC clones we used were: RP11-642A1 for region 8q24.3, RP11-73M19 for the chromosome 8 centromere, RP11-149G19 for the 11q13 region and RP11-135H8 for the chromosome 11 centromere.

Sections on slides were first deparaffinised in xylene and then treated using the commercially available Vysis pre-treatment kit (Vysis, Inc USA). Briefly, this involved treating the sections for 30 min in 1M sodium thiocyanate, digesting for 20 min with protease at 0.5 mg/ml in 0.01N HCl, and fixing in 10% neutral buffered formalin. For FISH experiments, labelled probes were added to a slide in a hybridisation solution containing 50% deionised formamide, 10% dextran sulphate, 2X SSC, 2 mg salmon sperm DNA, and 10 mg Cot-1 DNA. The slides and probe DNA were denatured at 75°C for 10 min and hybridised overnight in a humidified chamber at 37°C. Subsequently, the slides were subjected to post-hybridisation washes in 50% deionised formamide/2X SSC at 42°C for 10 min and three times in 2X SSC at 42°C for 5 min. Interphase nuclei were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) in Vectashield mounting medium (Vector Laboratories, Burlingame, CA, USA).

For evaluation of the experiments, hybridisation signals from 100 non-overlapping interphase cell nuclei of each tumour sample were counted using a fluorescence microscope. A copy number gain was scored if the average number of signals per nucleus was greater than or equal to three (≥3).

Construction of oncogenetic trees

Two types of oncogenetic tree models were used to describe the occurrence of genetic alterations during the progression of oral cancer. Distance-based trees were constructed using the software oncotrees10, 11, and mixture models of trees were generated with the Mtreemix software package.

Chromosomal aberrations from CGH data were used as input for the tree modelling procedures. Abnormalities on 1p, 16p, 22q and Y chromosome arms were excluded from analysis, because these chromosome regions are guanine/cytosine rich regions which are known to yield false positives. For construction of tree models, the CGH profile was recorded as presence/absence of a gain and presence/absence of a loss on a chromosome arm. We first tried using more precise single digit bands (data not shown), but this representation led to confounding of spatial relationships between bands on the same chromosome arm and the desired temporal relationships between copy number aberrations.
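The arm-level encoding described above can be sketched as follows. The input notation (a sign followed by chromosome, arm and an optional band, e.g. "+8q24.3") and all function names are hypothetical; the sketch only illustrates collapsing band-level calls to presence/absence of a gain or loss per arm and dropping the excluded arms.

```python
import re

# Arms excluded in the text: 1p, 16p, 22q and the Y chromosome (both arms).
EXCLUDED_ARMS = {"1p", "16p", "22q", "Yp", "Yq"}

# Hypothetical notation: sign, chromosome, arm, then anything (band) after it.
# Both the ASCII hyphen and the typographic minus sign are accepted.
_ABERRATION = re.compile(r"^([+\-−])(\d{1,2}|[XY])([pq])")


def arm_level_events(aberrations):
    """Collapse band-level aberrations to a set of arm-level events."""
    events = set()
    for text in aberrations:
        match = _ABERRATION.match(text.strip())
        if match is None:
            raise ValueError(f"unrecognised aberration: {text!r}")
        sign, chromosome, arm = match.groups()
        sign = "+" if sign == "+" else "-"  # normalise to ASCII
        if chromosome + arm in EXCLUDED_ARMS:
            continue
        events.add(f"{sign}{chromosome}{arm}")
    return events


def to_binary_row(events, event_order):
    """Presence/absence vector of the given events in a fixed order."""
    return [1 if event in events else 0 for event in event_order]


# Hypothetical tumour profile, for illustration only.
profile = ["+8q24.3", "+8q22", "-3p14", "+1p36", "−Yq11"]
events = arm_level_events(profile)
row = to_binary_row(events, ["+8q", "-8p", "-3p"])
print(sorted(events), row)
```

Two gains on different 8q bands collapse to a single +8q event, which is the behaviour that avoids confounding spatial relationships between bands with temporal relationships between aberrations.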

Trees were constructed from CNA events selected as non-random by the method of Brodeur26, which is implemented within oncotrees. The method of Brodeur requires a prior distribution for CNA occurrences. For this purpose, we initially assumed that the probabilities of a gain or loss on a chromosome arm are equal and proportional to the chromosome arm size; we used arm sizes from Morton27. However, it is known that gains are more common than losses in OSCC4, 5, 7-9, and in the present data set, gains were approximately twice as frequent as losses. Therefore, we also used a 2:1 skewed prior distribution in addition to the balanced prior. To reduce the number of events selected as non-random, we set the threshold at the 99th percentile of the test statistic of Brodeur et al. The events selected as non-random scored above the 99th percentile using both the balanced and the 2:1 skewed prior distributions. The selection of non-random events was redone for each tumour subset considered.
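One way to read the two prior distributions is sketched below. This is an interpretation, not the oncotrees implementation: the function name is hypothetical, and the arm sizes are placeholder numbers rather than the values of Morton.

```python
def cna_prior(arm_sizes, gain_to_loss=1.0):
    """Prior probability of each gain/loss event.

    Each arm receives a share proportional to its size; within an arm the
    share is divided between gain and loss in the ratio gain_to_loss : 1.
    """
    total_size = sum(arm_sizes.values())
    p_gain = gain_to_loss / (gain_to_loss + 1.0)
    prior = {}
    for arm, size in arm_sizes.items():
        share = size / total_size
        prior["+" + arm] = p_gain * share
        prior["-" + arm] = (1.0 - p_gain) * share
    return prior


# Placeholder arm sizes (arbitrary units), NOT the published values.
arm_sizes = {"3p": 90.0, "8p": 45.0, "8q": 100.0}

balanced = cna_prior(arm_sizes)                  # gains and losses equally likely
skewed = cna_prior(arm_sizes, gain_to_loss=2.0)  # gains twice as likely as losses

print(balanced["+8q"], balanced["-8q"])
print(skewed["+8q"], skewed["-8q"])
```

Requiring an event to exceed the 99th percentile under both priors, as described above, makes the selection robust to the choice between them.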

In the oncogenetic tree model, the root corresponds to the normal state of the cell. Vertices of the tree represent genetic alterations (events) and edges between vertices represent statistical dependencies between events. Each vertex is associated with the probability that the corresponding event will occur provided that the preceding event in the tree has already occurred. Thus, aberrations that are placed close to the root in the tree are estimated to occur early in tumour development, while those at longer distances are estimated to occur late in tumour progression. Correlated events tend to cluster in subtrees. In the tree models, the genetic events are assumed to be irreversible.
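The probability that a single oncogenetic tree assigns to an observed set of events can be sketched as follows. The tree topology and the edge probabilities below are hypothetical and chosen only for illustration; they are not the fitted values of this study.

```python
from itertools import combinations


def pattern_probability(parent, edge_prob, observed):
    """Probability of observing exactly the events in `observed`.

    parent[e]    : parent of event e in the tree ("root" for top-level events)
    edge_prob[e] : probability that e occurs given that its parent occurred
    """
    observed = set(observed)
    # An event whose parent is absent cannot occur under the tree model.
    for event in observed:
        if parent[event] != "root" and parent[event] not in observed:
            return 0.0
    probability = 1.0
    for event, event_parent in parent.items():
        parent_present = event_parent == "root" or event_parent in observed
        if not parent_present:
            continue  # the edge was never "tried", so it contributes nothing
        if event in observed:
            probability *= edge_prob[event]
        else:
            probability *= 1.0 - edge_prob[event]
    return probability


# Hypothetical tree: root -> +8q -> {-8p, +9q}, and -8p -> -3p.
parent = {"+8q": "root", "-8p": "+8q", "+9q": "+8q", "-3p": "-8p"}
edge_prob = {"+8q": 0.7, "-8p": 0.8, "+9q": 0.5, "-3p": 0.6}

p_example = pattern_probability(parent, edge_prob, {"+8q", "-8p"})
p_impossible = pattern_probability(parent, edge_prob, {"-3p"})

# Sanity check: probabilities over all subsets of events sum to one.
all_events = list(parent)
total = sum(
    pattern_probability(parent, edge_prob, set(subset))
    for size in range(len(all_events) + 1)
    for subset in combinations(all_events, size)
)
print(p_example, p_impossible, total)
```

A pattern such as {-3p} without its ancestors receives probability zero under a single tree, which is one motivation for combining trees in a mixture.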

Mixtures of oncogenetic trees

The oncogenetic trees mixture model12, 18, 25 consists of several weighted components, each of which is an oncogenetic tree as described above. Using several tree components allows for more flexible modelling of oncogenetic pathways than using a single tree. We fixed one tree component with a star topology and uniform transition probabilities. This tree models events as independent and uniform and thus serves as a null model: it captures the possibility that gains and losses occur at random with no dependencies. The fraction of samples assigned to this tree does not show apparent branching-like correlations. By contrast, the structure and parameters of the other tree components are learned from the data and they reveal pronounced dependencies among the cytogenetic alterations. The mixture model allows the addition of one or more non-star trees to model the dependencies between copy number aberrations. Using more non-star trees may fit the data better, but is susceptible to overfitting. We used a modified Bayesian Information Criterion (BIC) to trade off the increasing complexity of the model against the improved fit to the data as the number of tree components increases28. Using this model selection technique, one additional non-star tree gave the best BIC score.
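The role of the star component can be illustrated with a small sketch of a two-component mixture. All numbers (the per-event probability of the star, the component weights and the tree probability of the example pattern) are hypothetical, and the functions are not part of Mtreemix.

```python
def star_probability(n_present, n_events, p):
    """Probability of one specific pattern under a star tree in which each
    of n_events occurs independently with the same probability p."""
    return p ** n_present * (1.0 - p) ** (n_events - n_present)


def mixture_probability(component_probs, weights):
    """Weighted sum of the per-component pattern probabilities."""
    return sum(w * q for w, q in zip(weights, component_probs))


def responsibilities(component_probs, weights):
    """Posterior probability that a pattern was generated by each component."""
    total = mixture_probability(component_probs, weights)
    return [w * q / total for w, q in zip(weights, component_probs)]


# Hypothetical setting: 12 events, star probability 0.2 per event,
# weights 2/3 for the star and 1/3 for the non-star tree.
weights = [2.0 / 3.0, 1.0 / 3.0]
p_star = star_probability(n_present=2, n_events=12, p=0.2)

# A pattern that the non-star tree can generate (tree probability assumed 0.05).
fits_tree = responsibilities([p_star, 0.05], weights)

# A pattern that violates the tree order, so its tree probability is zero.
off_tree = responsibilities([p_star, 0.0], weights)

print(fits_tree, off_tree)
```

Because the star component gives every pattern a positive probability, patterns that contradict the learned tree are absorbed by it instead of making the likelihood zero.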

Distance-based trees

The distance-based trees were constructed as described in previous studies16, 29. For the distance-based trees we used the Fitch30 and Neighbor31 programs of PHYLIP32 to fit the distances to a tree. In all cases, the Neighbor program gave a better fit, in the following sense. Let I_ij be the input distance between CNAs i and j, and let T_ij be the distance implied by the output tree. We have two choices for the matrix T, one from the Fitch program and one from the Neighbor program. We define a matrix of differences D_ij = |I_ij − T_ij|. One choice of T “fits” better than another if it gives a matrix D that has smaller entries. The total size of D can be measured by standard matrix norms. When we do so for these trees, the tree distances T from Neighbor always give a better fit than those from Fitch.
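The comparison of fits can be sketched as follows. The 3 × 3 matrices are hypothetical, and the choice of the Frobenius and maximum norms is an assumption; the text only states that standard matrix norms were used.

```python
import math


def difference_matrix(input_distances, tree_distances):
    """Entry-wise absolute differences D_ij = |I_ij - T_ij|."""
    return [
        [abs(i - t) for i, t in zip(input_row, tree_row)]
        for input_row, tree_row in zip(input_distances, tree_distances)
    ]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def max_norm(matrix):
    """Largest absolute entry."""
    return max(abs(value) for row in matrix for value in row)


# Hypothetical input distances and two candidate tree-implied distances.
I = [[0.0, 2.0, 4.0], [2.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
T_a = [[0.0, 2.0, 4.5], [2.0, 0.0, 3.0], [4.5, 3.0, 0.0]]
T_b = [[0.0, 3.0, 4.0], [3.0, 0.0, 2.0], [4.0, 2.0, 0.0]]

D_a = difference_matrix(I, T_a)
D_b = difference_matrix(I, T_b)

better = "T_a" if frobenius_norm(D_a) < frobenius_norm(D_b) else "T_b"
print(frobenius_norm(D_a), frobenius_norm(D_b), better)
```

In practice the two candidate matrices T would be the tree-implied distances from the Fitch and Neighbor outputs for the same input matrix.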

To compute bootstrap confidence levels for each split in the distance-based trees we used a three-step procedure. First, 100 bootstrap samples were generated and their associated distance matrices were computed using the bootstrapping module of oncotrees. Second, we used the Neighbor program to fit each bootstrap sample to a tree. Third, we used the Consense program from PHYLIP to compute the number of times each split in the tree occurred perfectly. This method counts positively only those trees in which a split into subtrees occurred exactly as in the original distance-based tree; therefore one expects confidence levels to be fairly low when the split involves more than two or three events.
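The first and third steps of this procedure can be sketched as below; the second step (tree fitting with Neighbor) is external and is represented here only by its assumed output. Representing a split as the set of events in a subtree, and all names and example splits, are assumptions of the sketch and not the oncotrees or Consense interfaces.

```python
import random


def bootstrap_sample(tumours, rng):
    """Resample the tumours with replacement, keeping the sample size."""
    return [rng.choice(tumours) for _ in tumours]


def split_support(original_splits, bootstrap_splits):
    """Percentage of bootstrap trees containing each original split exactly.

    original_splits  : list of frozensets (events on one side of a split)
    bootstrap_splits : one set of frozensets per bootstrap tree
    """
    n_trees = len(bootstrap_splits)
    return {
        split: 100.0 * sum(split in tree for tree in bootstrap_splits) / n_trees
        for split in original_splits
    }


# Step 1: hypothetical tumours (sets of events), resampled with replacement.
tumours = [{"+8q"}, {"+8q", "-8p"}, {"+8q", "-8p", "-3p"}, {"+11q", "+7p"}]
replicate = bootstrap_sample(tumours, random.Random(0))

# Step 3: hypothetical splits of the original tree and of four bootstrap trees.
split_a = frozenset({"-8p", "-3p"})
split_b = frozenset({"+11q", "+7p"})
bootstrap_trees = [
    {split_a, split_b},
    {split_a},
    {frozenset({"+11q", "+7p", "-3p"})},  # similar to split_b but not identical
    {split_a},
]
support = split_support([split_a, split_b], bootstrap_trees)
print(support[split_a], support[split_b])
```

The third bootstrap tree shows why support drops for larger splits: a subtree that differs by a single event does not count as a match.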

Tree visualisation

Mixture models are drawn with Graphviz33, and distance-based trees are drawn with TreeView34. The relevant distance in the visualization of distance-based trees is the horizontal distance from the root node to the node representing a copy number aberration or the horizontal distance between two copy number aberrations. Vertical edges and distance are used to spread out the tree for easier visualisation.

Genetic progression scores

The genetic progression score (GPS) of an observed tumour sample is defined as the expected waiting time of the mutational pattern of the tumour in the timed oncogenetic trees mixture model18. The absolute values of this quantity are only meaningful if information on the true age of each tumour is available. However, even without scaling, the expected waiting time provides a useful measure of genetic progression that is based on the tree model. Unlike simple counting of alterations, the GPS accounts for dependencies between events. The GPS distributions of node-positive and node-negative patients were compared using the Wilcoxon rank-sum test.
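The comparison of two GPS distributions can be sketched with a rank-sum test based on the normal approximation. This sketch applies neither a tie correction to the variance nor a continuity correction, so it may differ slightly from the software actually used; the GPS values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = rank
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Rank sum W of x, its z-score and a two-sided p-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return w, z, p_two_sided


# Hypothetical GPS values, for illustration only.
gps_node_negative = [0.2, 0.4, 0.5, 0.6]
gps_node_positive = [0.7, 0.8, 0.9, 1.1]

w, z, p = rank_sum_test(gps_node_negative, gps_node_positive)
print(w, z, p)
```

A negative z-score here means that the first group (node-negative in the toy data) tends to have lower scores than the second.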

Time of occurrence

A simpler method to infer the relative early and late occurrence of the events is called time of occurrence (TO) analysis10, 35. For each event, one computes how many other events occur in the tumours that carry that event. The general concept of time of occurrence analysis is that an event A occurs before an event B if the number of events co-occurring with A is smaller than with B. To compare the number of co-occurring events for A and B, one could use as test statistic the average, the median, or the mode. Desper et al.10 recommended using the average. Höglund et al.35, who coined the term “time of occurrence analysis”, recommended using the mode. Unfortunately, in our data set, many CNAs have a mode of “9” in the distribution of the number of co-occurring aberrations, so the mode is not a very useful test statistic. Therefore, we followed the recommendation of Desper et al.10 and used the average number of co-occurring events to propose an order of events.
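The TO statistic based on the average can be sketched as follows. The tumour profiles are hypothetical and the function name is illustrative.

```python
def time_of_occurrence(tumours):
    """Average number of OTHER events in the tumours that carry each event."""
    totals = {}
    counts = {}
    for events in tumours:
        for event in events:
            totals[event] = totals.get(event, 0) + len(events) - 1
            counts[event] = counts.get(event, 0) + 1
    return {event: totals[event] / counts[event] for event in totals}


# Hypothetical tumours, each given as the set of events it carries.
tumours = [
    {"+8q"},
    {"+8q", "-8p"},
    {"+8q", "-8p", "-3p"},
]

to_scores = time_of_occurrence(tumours)

# Events with fewer co-occurring events are proposed to occur earlier.
proposed_order = sorted(to_scores, key=to_scores.get)
print(to_scores, proposed_order)
```

In this toy data set +8q is found alone in one tumour and therefore has the smallest average, so it is placed first in the proposed order.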


Results

Copy number alterations (CNAs) detected by CGH in 97 primary oral cancers

Copy number gains were more common than losses. The average number of total CNAs was 9.90 (range, 1−30). The most common gains were observed on chromosome regions 8q (74.2%), 9q (41.2%), 11q13 (41.2%), 7p (37.1%), 3q (35.1%), 20q (33%), 20p (29%), and 5p (24%). The most common losses were observed on chromosome regions 3p (49.5%), 8p (47.5%), 18q (34%), and 11q14-qter (20.6%). The bandless CGH data are available at

Confirmation of some CGH results by FISH

To confirm some of the CNAs detected by CGH, we performed FISH on 28 samples to evaluate the frequently gained regions 11q13 and 8q24.3 (Figure 1). For samples where we could obtain FISH signals, the concordance rates were good (81% for 11q13 and 88% for 8q24.3; Table 2). For 11q13, the five discrepancies comprised two cases in which the tissue was non-informative and three cases in which FISH detected a gain that was not found by CGH. For 8q24.3, the three discrepancies comprised two non-informative tissues and one case in which FISH detected a gain that was not found by CGH.

Figure 1
Interphase FISH analysis of two chromosomal gains
Table 2
Concordance of CGH results and FISH results for two regions

Construction of oncogenetic trees for oral cancer progression

The method of Brodeur et al. (see Materials and Methods) selected 12 CNAs that occurred more frequently than would be expected at random: +5p, +7p, +8q, +9q, +11q, +17p, +18p, +20p, +20q, −3p, −8p and −18q. Here, a plus (+) symbol indicates the gain of a chromosomal region and a minus (−) symbol indicates a loss. Oncogenetic tree mixture models and distance-based trees were constructed using these 12 events. Both models represent the apparent multi-step and multi-pathway process of oral carcinogenesis (Figure 2 and Figure 3).

Figure 2
Oncogenetic trees mixture model for all oral cancers
Figure 3
Distance-based tree model for all oral cancers

Oncogenetic trees mixture model

We estimated oncogenetic trees mixture models consisting of a star component and a non-star tree component in order to obtain a concise description of the genetic development of oral cancers (Figure 2). A third of the tumours can be explained by the non-star tree component. In this branching tree, the root vertex corresponds to the normal oral keratinocyte, whereas the other vertices represent the CNAs of interest. Because the event +8q is the only direct successor of the root, it is predicted to be the initial event. Once this event occurs, the occurrence of subsequent events becomes much more likely. Following +8q, the branching tree displays two independent pathways, one consisting only of event +9q, the other comprising the ten remaining events and starting with −8p followed by −3p. The latter pathway was predicted to branch further into two pathways beginning with −18q and +7p, respectively. After the initial +8q event, the large 10-event subtree is more likely to develop than the +9q pathway, the likelihood ratio being 0.79:0.51 = 1.55.

Distance-based trees

In the distance-based tree model, the time to occurrence is proportional to the root–leaf horizontal distance (Figure 3). According to this model, +8q was an early event. After the occurrence of +8q the distance-based tree classified other events into two or three clusters. One cluster is marked by −8p and −3p (comprising −8p, −18q and +18p; and −3p, +9q and +17p); it might be split into two sub-clusters of three events each, but the bootstrap confidence for the split into two sub-clusters is low (14%). The other cluster is marked by events +11q and +7p (comprising +11q, +7p, +20p, +20q and +5p). These two clusters suggest OSCC genetic subtypes.

Though similarities were observed among the tree models, there were inconsistencies in terms of whether i) −3p is part of the −8p pathway and ii) +9q depends on −3p or not.

Construction of oncogenetic trees for node-negative and node-positive oral cancers

In a univariate analysis comparing node-positive and node-negative tumours, four events showed significant association with node-positive status by a one-sided Fisher's exact test: −8p (p < 0.008), +7p (p < 0.01), −18q (p < 0.04), and +9q (p < 0.04). However, these associations did not remain significant after correcting for multiple testing using the false discovery rate (FDR) method36; only −8p (adjusted p < 0.06) and +7p (adjusted p < 0.06) approached statistical significance. This finding does not preclude the possibility that associations of two or more imbalances with nodal status might be statistically significant. We therefore focused on multivariate analyses, which could shed more light on the genomic differences between node-positive and node-negative OSCC.
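The multiple-testing correction can be sketched under two assumptions that are not stated explicitly above: that the FDR method is the Benjamini-Hochberg step-up procedure, and that 12 tests were performed (one per non-random event). The four smallest inputs are the upper bounds quoted in the text; the remaining eight values are placeholders set to 1.0.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


events = ["-8p", "+7p", "-18q", "+9q"]
reported_bounds = [0.008, 0.01, 0.04, 0.04]  # upper bounds quoted in the text
placeholders = [1.0] * 8                     # assumed, not reported

adjusted = benjamini_hochberg(reported_bounds + placeholders)
for event, value in zip(events, adjusted):
    print(event, round(value, 3))
```

Under these assumptions the adjusted values for −8p and +7p both come out at 0.06, in line with the adjusted bounds quoted above, while −18q and +9q move to 0.12.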

Separate distance-based tree models were constructed to understand the progression pathways in node-positive versus node-negative oral cancers. More events were selected as non-random in node-positive OSCC (n=11) than in node-negative OSCC (n=6) (Figure 4), consistent with the hypothesis that node-positive tumours are more advanced. For both sets of tumours, the non-random events were chosen systematically by the established method of Brodeur et al. (1982), although we raised the cutoff to the 99th percentile instead of the 95th. Being more stringent than this is not helpful, and if we are less stringent (e.g., 95th percentile), the number of events selected as non-random grows and the tree models become unwieldy.

Figure 4
Distance-based trees for node-negative (upper) and node-positive oral cancers (lower)

According to Figure 4, the node-positive cancers may be classified into two main groups: one group includes +9q, −8p, +18p and −18q, while the other group includes +7p, +11q, +20p, +20q and +5p. Events +8q and −3p are early events that do not fit clearly into either of these two groups. The splitting of the two large groups has only low bootstrap support (15%), while the separation of +8q and −3p is more pronounced (63% and 25%, respectively). The event +9q, which is suggested as an important indicator of progression by the mixture tree model on all tumours, is selected as non-random for the node-positive tumours, but not for the node-negative tumours. The numbers of tumours in each subset were considered too small for mixture tree analysis.

Further, the distance-based tree models predicted progression of node-negative OSCC by two main classes (Figure 4, bootstrap confidence 69%). One class included tumours that did not progress much beyond the aberrations on chromosome 8. The other class had two subgroups for which the confidence in the split is 64%: subgroup 1, which included tumours that progressed with −3p, and subgroup 2, which progressed with +11q. The events in these two clusters were not placed near to each other. The alterations +7p, −18q, +20p and +5p were not selected for the node-negative OSCC. Overall, the distance-based tree model indicates that this subtype was karyotypically less complex than the node-positive OSCC.

Genetic progression score

For each observed tumour, the GPS measures the level of genetic progression in the oncogenetic tree model. Node-positive cancers had progressed significantly further along the tree model than node-negative cancers (median GPS of 0.87 versus 0.59, p = 0.0016; Figure 5).

Figure 5
Genetic progression score (GPS) for the node-negative and the node-positive oral cancers

Time of occurrence

TO statistics for the non-random events are shown in Table 3. The average number of imbalances in tumours harbouring −8p was less than the average for −3p, which indicates that −8p would occur before −3p. The values for all alterations except +18p were consistent with the predictions of the tree analysis. The exception might be due to the lower frequency of +18p in our study (13.4%).

Table 3
Time of occurrence (TO)

Discussion


The non-random CGH pattern of oral cancers detected in our study was generally consistent with previous studies4-9, although some of the losses reported as common vary from study to study; this variation may be due to differences between the populations sampled37. Among the observed alterations, the gain of 8q and loss of 3p have been reported to be early events in oral carcinogenesis38, 39. Besides these alterations, the sequence of other aberrations in oral tumorigenesis is not known. Elucidating progression pathways from complex CGH data requires statistical modelling techniques such as oncogenetic trees. The current study is the first to apply tree models to CGH data obtained from a reasonably large number (n=97) of primary oral cancers. Among the previous CGH studies of oral cancer, the largest number of tumours in one study was 35 (reference 9).

The branching mixture models and distance-based tree models identified divergent pathways of progression in oral cancers. There were similarities and differences in the sequence of alterations revealed by them. The common findings led to the following inferences: i) +8q is located near the root, hence it is an early event in cellular transformation of oral cancers; ii) at least three subtrees emerge subsequent to the occurrence of +8q, which indicates that there are divergent pathways of progression in oral carcinomas; iii) alterations +5p, +17p, and +18p are late events, given their long distance from the root; iv) a close relationship exists between +7p and +11q, and between +20p and +20q, hence these alterations are present in the same subcluster. Some inconsistencies were observed, mainly in the relative ordering of the −3p and +9q alterations: whether −3p is part of the −8p pathway or not, and whether +9q is part of the −3p pathway or independent of the other events. Part of these inconsistencies may result from the fact that only about a third of the data can be mapped to the oncogenetic tree component displayed in Figure 2.

Comparing the current tree models of oral cancers with the tree analysis of head and neck cancers, we found that some of the alterations (−3p, +8q, +5p, +17p, and +18p) were selected as non-random events for both the present oral cancer data set and an earlier collection of head and neck cancers16. Also, the predicted occurrence of +8q and +18p suggested for the subset of head and neck cancers coincided with the progression pathways identified by our study. Thus, +8q appears to be an early event in the development of oral cancers. Activation of the oncogene c-MYC on chromosome region 8q24 has frequently been implicated in oral carcinogenesis, but some studies have suggested other genes on 8q (reference 38).

Following the occurrence of +8q, the losses of 8p and 3p were predicted as subsequent early events during progression of OSCC. This novel finding seems to be in contrast to previous studies which have reported that −3p is an early event39 or explicitly predicted that the loss of 3p precedes the loss of 8p in both oral premalignant and malignant lesions40. Both of these studies were based on loss of heterozygosity (LOH) analysis using specific microsatellite markers. For the present data set, the time of occurrence analysis confirmed the oncogenetic tree predictions that −8p tends to occur before −3p (Table 3). The different predictions could be due to differences between LOH and CGH or due to differences in the study populations.

Among the three subclusters identified by the distance-based tree, +7p and +11q appeared to form a distinct pathway comprising only chromosomal gains (+11q, +7p, +20p, +20q and +5p). Genes at chromosomal regions 7p12 and 11q13 are known to affect the biological characteristics of OSCC. For example, activation of the oncogene EGFR on 7p12 has been implicated in nodal metastasis of OSCC41, 42, while the oncogenes CCND1 and EMS1 on 11q13 have been associated with high proliferation, invasiveness, and poor prognosis of HNSCC43, 44. Because the gains of 7p and 11q contribute to the aggressive behaviour of OSCC, this pathway may also have prognostic importance.

It is well established that node-positive OSCC have poor prognosis as compared to node-negative OSCC. This may be due to the differences in the underlying genetic alterations between the two subtypes. The GPS analysis suggests that node-positive OSCC have progressed further along the tree and they tend to be of a later genetic progression stage. On the other hand, the distance-based trees that were constructed for each subgroup separately indicate differences in the number and type of genetic events that occur in each subtype. Hence, we have found indications for node-positive OSCC as both late stage tumours in a unified progression model and as resulting from alternative progression routes.

Higher numbers of genetic alterations and higher GPS suggest increased karyotypic complexity and genetic instability in the node-positive OSCC. Furthermore, the pathway of −8p and its subtree as well as the alterations −18q, +7p and +9q were prominently observed in the node-positive as compared to the node-negative OSCC. The loss of chromosomes 8p and 18q and gain of chromosome 7p have been reported to contribute to metastasis and poor prognosis of HNSCC42, 45, 46.

Taken together, the findings of the oncogenetic tree analyses indicate that the initial genomic event is +8q, from which at least three karyotypic pathways emerged. Thus, the events after +8q are not completely random. This implies that the process of oral carcinogenesis is not the accumulation of genetic alterations in an unordered fashion. Rather, alterations occur preferentially in certain orders that may define tumour subtypes such as the node-positive and node-negative oral cancers. The tree models have provided novel information about the sequence of alterations and the non-linear pathways they form. Future efforts should attempt to identify the genes present at the altered chromosomal regions identified by the tree analysis, which might help elucidate the pathogenesis of oral cancers.

Acknowledgements


We are grateful to the Indian Council of Medical Research (ICMR; Grant no. 5/13/2/TF/2001-NCD-III) for funding the current project. This research was supported in part by the Intramural Research Program of the National Institutes of Health, NLM. We thank the Council of Scientific and Industrial Research (CSIR) for providing a fellowship to Ms. Swapnali Pathare during her tenure as a graduate (PhD) student. We thank Dr R Mistry, Dr A K D'Cruz and Dr KA Pathak, who permitted collection of oral cancer samples from TMH; Dr AM Borges for microdissection of tumor samples; and Sadhana Kannan for her valuable suggestions during data analysis.

Supported by: Indian Council of Medical Research (ICMR) grant no. 5/13/2/TF/2001-NCD-III and in part by the Intramural Research program of the National Institutes of Health


Conflict of interest statement: The authors agree with the contents of the manuscript and they have no conflicts of interest.

A brief statement about novelty of the research work: By applying oncogenetic tree models to the CGH data of oral cancers, we identified the sequence of alterations and the divergent pathways of progression. This implies that interactions between the genetic alterations contribute to multiple progression pathways in oral cancers.


1. Baudis M. Genomic imbalances in 5918 malignant epithelial tumors: an explorative meta-analysis of chromosomal CGH data. BMC Cancer. 2007. p. 226.
2. Höglund M, Gisselsson D, Mandahl N, Johansson B, Mertens F, Mitelman F, Säll T. Multivariate analyses of genomic imbalances in solid tumors reveal distinct and converging pathways of karyotypic evolution. Genes Chromosomes Cancer. 2001. pp. 156–71.
3. Shingaki S, Takada M, Sasai K, Bibi R, Kobayashi T, Nomura T, Saito C. Impact of lymph node metastasis on the pattern of failure and survival in oral carcinomas. Am J Surg. 2003. pp. 278–84.
4. Hermsen MAJA, Joenje H, Arwert F, Braakhuis BJM, Baak JP, Westerveld A, Slater R. Assessment of chromosomal gains and losses in oral squamous cell carcinoma by comparative genomic hybridisation. Oral Oncol. 1997. pp. 414–18.
5. Wolff E, Girod S, Liehr T, Vorderwulbecke U, Ries J, Steininger H, Gebhart E. Oral squamous cell carcinomas are characterized by a rather uniform pattern of genomic imbalances detected by comparative genomic hybridisation. Oral Oncol. 1998. pp. 186–90.
6. Weber RG, Scheer M, Born IA, Joos S, Cobbers JMJL, Hofele C, Reifenberger G, Zöller JE, Lichter P. Recurrent chromosomal imbalances detected in biopsy material from oral premalignant and malignant lesions by combined tissue microdissection, universal DNA amplification, and comparative genomic hybridization. Am J Pathol. 1998;153(1):295–303.
7. Okafuji M, Ita M, Oga A, Hayatsu Y, Matsuo A, Shinzato Y, Shinozaki F, Sasaki K. The relationship of genetic aberrations detected by comparative genomic hybridization to DNA ploidy and tumor size in human oral squamous cell carcinomas. J Oral Pathol Med. 2000. pp. 226–31.
8. Lin S-C, Chen Y-J, Kao S-Y, Hsu M-T, Lin C-H, Yang S-C. Chromosomal changes in betel-associated oral squamous cell carcinomas and their relationship to clinical parameters. Oral Oncol. 2002;38(3):266–73.
9. Noutomi Y, Oga A, Uchida K, Okafuji M, Ita M, Kawauchi S, Furuya T, Ueyama Y, Sasaki K. Comparative genomic hybridization reveals genetic progression of oral squamous cell carcinoma from dysplasia via two different tumourigenic pathways. J Pathol. 2006;210(1):67–74.
10. Desper R, Jiang F, Kallioniemi O-P, Moch H, Papadimitriou CH, Schäffer AA. Inferring tree models for oncogenesis from comparative genome hybridization data. J Comput Biol. 1999. pp. 37–51.
11. Desper R, Jiang F, Kallioniemi O-P, Moch H, Papadimitriou CH, Schäffer AA. Distance-based reconstruction of tree models for oncogenesis. J Comput Biol. 2000. pp. 789–803.
12. Beerenwinkel N, Rahnenführer J, Däumer M, Hoffmann D, Kaiser R, Selbig J, Lengauer T. Learning multiple evolutionary pathways from cross-sectional data. J Comput Biol. 2005. pp. 584–98.
13. Fearon E, Vogelstein B. A genetic model of colorectal tumorigenesis. Cell. 1990;61(5):759–67.
14. Jiang F, Desper R, Papadimitriou CH, Schäffer A, Kallioniemi O-P, Richter J, Schraml P, Sauter G, Mihatsch MJ, Moch H. Construction of evolutionary tree models for renal cell carcinoma from comparative genomic hybridization data. Cancer Res. 2000. pp. 6503–9.
15. Schäffer AA, Simon R, Desper R, Richter J, Sauter G. Tree models for dependent copy number changes in bladder cancer. Int J Oncol. 2001;18(2):349–54.
16. Huang Q, Yu GP, McCormick SA, Mo J, Datta B, Mahimkar M, Lazarus P, Schäffer AA, Desper R, Schantz SP. Genetic differences detected by comparative genomic hybridization in head and neck squamous cell carcinomas from different tumor sites: construction of oncogenetic trees for tumor progression. Genes Chromosomes Cancer. 2002. pp. 224–33.
17. Huang Z, Desper R, Schäffer AA, Yin Z, Li X, Yao K. Construction of tree models for pathogenesis of nasopharyngeal carcinoma. Genes Chromosomes Cancer. 2004. pp. 307–15.
18. Rahnenführer J, Beerenwinkel N, Schulz WA, Hartmann C, von Deimling A, Wullich B, Lengauer T. Estimating cancer survival and clinical outcome based on genetic tumor progression scores. Bioinformatics. 2005. pp. 2438–46.
19. Jiang H-Y, Huang Z-X, Zhang X-F, Desper R, Zhao T. Construction and analysis of tree models for chromosomal classification of diffuse large B-cell lymphomas. World J Gastroenterol. 2007. pp. 1737–42.
20. Ketter R, Urbschat S, Henn W, Feiden W, Beerenwinkel N, Lengauer T, Steudel W-I, Zang KD, Rahnenführer J. Application of oncogenetic trees mixtures as a biostatistical model of the clonal cytogenetic evolution of meningiomas. Int J Cancer. 2007. pp. 1473–80.
21. Kallioniemi A, Kallioniemi O-P, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science. 1992. pp. 818–21.
22. du Manoir S, Schröck E, Bentz M, Speicher MR, Joos S, Ried T, Lichter P, Cremer T. Quantitative analysis of comparative genomic hybridization. Cytometry. 1995;19(1):27–41.
23. Ried T, Liyanage M, du Manoir S, Heselmeyer K, Auer G, Macville M, Schröck E. Tumor cytogenetics revisited: comparative genomic hybridization and spectral karyotyping. J Mol Med. 1997;75(11–12):801–14.
24. Jeuken JWM, Sprenger SHE, Wesseling P. Comparative genomic hybridization: practical guidelines. Diagn Mol Pathol. 2002;11(4):193–203.
25. Beerenwinkel N, Rahnenführer J, Kaiser R, Hoffmann D, Selbig J, Lengauer T. Mtreemix: a software package for learning and using mixture models of mutagenetic trees. Bioinformatics. 2005;21(9):2106–7.
26. Brodeur GM, Tsiatis AA, Williams DL, Luthardt FW, Green AA. Statistical analysis of cytogenetic abnormalities in human cancer cells. Cancer Genet Cytogenet. 1982. pp. 137–52.
27. Morton NE. Parameters of the human genome. Proc Natl Acad Sci USA. 1991;88(17):7474–6.
28. Yin J, Beerenwinkel N, Rahnenführer J, Lengauer T. Model selection for mixtures of mutagenetic trees. Stat Appl Genet Mol Biol. 2006. Article17.
29. Simon R, Papadimitriou CH, Peng A, Taetle R, Alberts DS, Trent JM, Schäffer AA. Chromosome abnormalities in ovarian adenocarcinoma III: Using breakpoint data to infer and test mathematical models for oncogenesis. Genes Chromosomes Cancer. 2000;28(2):106–20.
30. Fitch WM, Margoliash E. Construction of phylogenetic trees. Science. 1967;155(760):279–84.
31. Saitou N, Nei M. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987;4(4):406–24.
32. Felsenstein J. PHYLIP Phylogeny Inference Package (Version 3.2). Cladistics. 1989;5:164–6.
33. Gansner ER, North S. An open graph visualization system and its applications to software engineering. Softw. Pract. Exper. 1999;30(11):1203–33.
34. Page RDM. TREEVIEW: An application to display phylogenetic trees on personal computers. Comp Appl Biosci. 1996;12(4):357–8.
35. Höglund M, Gisselsson D, Säll T, Mitelman F. Coping with complexity: multivariate analysis of tumor karyotypes. Cytogenet Genome Res. 2002;135(2):103–9.
36. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Statist Soc Ser B. 1995;57:289–300.
37. Paterson IC, Eveson JW, Prime SS. Molecular changes in oral cancer may reflect aetiology and ethnic origin. Eur J Cancer B Oral Oncol. 1996;32B(3):150–3.
38. Garnis C, Coe BP, Ishkanian A, Zhang L, Rosin MP, Lam WL. Novel regions of amplification on 8q distinct from the MYC locus and frequently altered in oral dysplasia and cancer. Genes Chromosomes Cancer. 2004. pp. 93–8.
39. Roz L, Wu CL, Porter S, Scully C, Speight P, Read A, Sloan P, Thakker N. Allelic imbalance on chromosome 3p in oral dysplastic lesions: an early event in oral carcinogenesis. Cancer Res. 1996. pp. 1228–31.
40. Rosin MP, Cheng X, Poh C, Lam WL, Huang Y, Lovas J, Berean K, Epstein JB, Priddy R, Le ND, Zhang L. Use of allelic loss to predict malignant risk for low-grade oral epithelial dysplasia. Clin Cancer Res. 2000. pp. 357–62.
41. Gebhart E, Ries J, Wiltfang J, Liehr T, Efferth T. Genomic gain of the epidermal growth factor receptor harboring band 7p12 is part of a complex pattern of genomic imbalances in oral squamous cell carcinomas. Arch Med Res. 2004. pp. 385–94.
42. Kalyankrishna S, Grandis J. Epidermal growth factor receptor biology in head and neck cancer. J Clin Oncol. 2006. pp. 2666–72.
43. Michalides R, van Veelen N, Hart A, Loftus B, Wientjens E, Balm A. Overexpression of cyclin D1 correlates with recurrence in a group of forty-seven operable squamous cell carcinomas of the head and neck. Cancer Res. 1995. pp. 975–8.
44. Rodrigo JP, García LA, Ramos S, Lazo PS, Suárez C. EMS1 gene amplification correlates with poor prognosis in squamous cell carcinomas of the head and neck. Clin Cancer Res. 2000. pp. 3177–82.
45. Bockmühl U, Iswad CS, Ferrell RE, Gollin SM. Association of 8p23 deletions with poor survival in head and neck cancer. Otolaryngol–Head Neck Surgery. 2001;124(4):451–55.
46. Pearlstein RP, Benninger MS, Carey TE, Zarbo RJ, Torres FX, Rybicki BA, Dyke DL. Loss of 18q predicts poor survival of patients with squamous cell carcinoma of the head and neck. Genes Chromosomes Cancer. 1998. pp. 333–9.