PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of bioldirectBioMed CentralBiomed Central Web Sitesearchsubmit a manuscriptregisterthis articleBiology DirectJournal Front Page
 
Biol Direct. 2010; 5: 26.
Published online 2010 April 20. doi:  10.1186/1745-6150-5-26
PMCID: PMC2875223

A new stochastic and state space model of human colon cancer incorporating multiple pathways

Abstract

Background and Purpose

Studies by molecular biologists and geneticists have shown that tumors of human colon cancer are developed from colon stem cells through two mechanisms: The chromosomal instability and the micro-satellite instability. The purpose of this paper is therefore to develop a new stochastic and state space model for carcinogenesis of human colon cancer incorporating these biological mechanisms.

Results

Based on recent biological studies, in this paper we have developed a state space model for human colon cancer. In this state space model, the stochastic system is represented by a stochastic model, involving 2 different pathways-the chromosomal instability pathway and the micro-satellite instability pathway; the observation, cancer incidence data, is represented by a statistical model. Based on this model we have developed a generalized Bayesian approach to estimate the parameters through the posterior modes of the parameters via Gibbs sampling procedures. We have applied this model to fit and analyze the SEER data of human colon cancers from NCI/NIH.

Conclusions

Our results indicate that the model not only provides a logical avenue to incorporate biological information but also fits the data much better than other models including the 4-stage single pathway model. This model not only would provide more insights into human colon cancer but also would provide useful guidance for its prevention and control and for prediction of future cancer cases.

Reviewers

This article was reviewed by M.P. Little and M. Kimmel

Background

In the past 15 years, molecular biologists and geneticists have revealed the basic molecular and genetic mechanisms for human colon cancer. These mechanisms have been linked to two avenues: The chromosomal instability (CIN) involving chromosomal aberrations and loss of heterozygosity (LOH), and the micro-satellite instability (MSI) involving mis-match repair genes and the creation of mutator phenotype ([1-9]). The pathway of the CIN avenue (also referred to as LOH pathway) involves inactivation through genetic and/or epigenetic mechanisms, or loss, or mutation of the suppressor APC gene in chromosome 5q (about 85% of all human colon cancers) whereas the pathway of the MSI avenue involves mutation or epigenetic inactivation of the mis-match repair suppressor genes (about 15% of all colon cancers). This leads to multiple pathways for the generation of human colon cancer tumors with each pathway following a stochastic multi-stage model and with intermediate transformed cells subjecting to stochastic proliferation (birth) and differentiation (death). The goal of this paper is to develop a stochastic model for human colon cancer to incorporate these biological information and pathways. This paper is an extension of Tan et al. [10], Little and Wright [11] and Little et al. [12]. We note that besides the multiple pathways considered above, Little and Wright [11], Little et al. [12] and Little [13] have also included mixture type of multiple pathways; however, because the mutation rates are very small, the chance of mixture type of pathways will be extremely small in which case the Little model is equivalent to the model in Section 3.

For developing biologically supported stochastic model of carcinogenesis, in Section 2 we present the most recent cancer biology of human colon cancer. Using results from Section 2, we develop in Section 3 a stochastic model for carcinogenesis of human colon cancer involving multiple pathways. In Section 4 we derive a statistical model for cancer incidence data of human colon cancer. By combining models from Sections 3 and 4, in Section 5 we develop a state space model for human colon cancer. In Section 6, by using the state space model in Section 5, we develop a generalized Bayesian inference procedure to estimate unknown parameters and to predict state variables. To illustrate the applications of the model and methods, in Section 7 we apply the model and methods to the colon cancer incidence data from SEER. Finally in Section 8, we discuss the usefulness of the model and methods and provide some conclusions.

A Brief Summary of Colon Cancer Biology

As discussed in the introduction, genetic studies have indicated that there are two major avenues by means of which human colon cancer is derived: The Chromosomal Instability (CIN) and the Micro-Satellite Instability (MSI). The first avenue is associated with the LOH pathway involving the APC gene in chromosome 5q and the latter associated with the micro-satellite pathway involving mis-match repair genes. The most important oncogene is the β-catenin gene in chromosome 3p22.

The CIN (LOH) Pathway of Human Colon Cancer (The APC-β-catenin - Tcf - myc pathway)

The CIN pathway involves loss or inactivation of the tumor suppressor genes - the APC gene in chromosome 5q, the Smad-4 gene in chromosome 18q and the p53 gene in chromosome 17p; see Remark 1. This pathway accounts for about 85% of all colon cancers. It has been referred to as the LOH pathway because it is characterized by aneuploidy/or loss of chromosome segments (chromosomal instability); see Remark 2. This pathway has also been referred to as APC-β - catenin - Tcf - myc pathway because it involves the destruction complex GSK-3β -Axin-APC which phosphorylates the β-catenin protein leading to its degradation; when both copies of the APC gene are inactivated or mutated, the destruction complex is then inactive leading to accumulation of free β-catenin proteins in the cytoplasm which move to the nucleus to complex with Tcf/Lef transcription factor to activate and transcript oncogenes myc, cyclin D and CD44. (Free β-catenin protein in the cytoplasm also binds with E-cadherin and α-catenin to disrupt the gap junction between cells, leading to migration and metastasis of cancer tumors.)

Morphological studies have indicated that inactivation, or loss or mutation of APC creates dysplastic aberrant crypt foci (ACF) which grow into dysplastic adenomas. These adenomas grow to a maximum size of about 10 mm3; further growth and malignancy require the abrogation of differentiation, cell cycle inhibition and apoptosis which are facilitated by the inactivation, or mutation or loss of Smad-4 gene in 18q and the p53 gene in 17p. The mutation or activation of the oncogene H-ras in chromosome 11p and/or mutation and/or activation of the oncogene src in chromosome 20q would speed up these transitions by promoting the proliferation rates of the respective intermediate initiated cells [14]. This pathway is represented schematically by Figure Figure11.

Figure 1
The CIN pathway of human colon cancer. Sporatic Chromsomal Instability Pathways of human colon cancer.

The model in Figure Figure11 is a 6-stage model. However, because of the haplo-insufficiency of the Smad4 gene (see Alberici et al.[15]) and the haplo-insufficiency of the p53 gene ([16]), one may reduce this 6-stage model into a 4-stage model by combining the third stage and the fourth stage into one stage and by combining the fifth stage and the sixth stage into one stage. This may help explain why for single pathway models, the 4-stage model fits the human colon cancer better than other single pathway multi-stage models ([17]). Recent biological studies by Green and Kaplan [4] and others have also shown that the inactivation or deletion or mutation of one copy of the APC gene in chromosome 5 can cause defects in microtubule plus-end attachment during mitosis dominantly, leading to aneuploidy and chromosome instability. This would speed up the mutation or inactivation of the second copy of the APC gene and increase fitness of the APC-carrying cells in the micro-evolution process of cancer progression. This could also help explain why the APC LOH pathway is more frequent than other pathways.

Remark 1: As observed by Sparks et al. [8], instead of the APC gene, this pathway can also be initiated by mutation of the oncogene β-catenin gene; however, the proportion of human colon cancer due to mutation of β-catenin is very small (less than 1%) as compared to the APC gene, due presumably to the contribution of the APC on chromosome instability ([4]). Similarly, the destruction complex can become inactive either by the inhibition of GSK-3β through the Wnt signalling pathway (see [18]) or the inactivation or mutation of the Axin protein, leading to accumulation of the β- Catenin proteins in the cytoplasm; but the proportion of colon cancer caused by inhibition of GSK-3β is also very small as compared to the colon cancer cases caused by the CIN and the MSI pathways.

Remark 2: The APC gene in chromosome 5q acts both as a tumor suppressor gene and an oncogene in initiating and promoting colon carcinogenesis. As an oncogene, the APC gene acts dominantly in regulating microtubule plus-end attachment during mitosis ([4]). Thus, the inactivation or deletion or mutation of one copy of the APC gene in chromosome 5 can cause defects in microtubule plus-end attachment during mitosis, leading to aneuploidy and chromosome instability. This would speed up the mutation or inactivation of the second copy of the APC gene and increase fitness of the APC-carrying cells in the micro-evolution process of cancer progression. This could also help explain why the APC LOH pathway is more frequent than other pathways.

The MSI (Micro-Satellite Instability) Pathway of Human Colon Cancer

This pathway accounts for about 15% of all colon cancers and appears mostly in the right colon. It has been referred to as the MSI pathway or the mutator phenotype pathway because it is initiated by the mutations or epigenetic methylation of the mis-match repair genes (mostly hMLH1 in chromosome 3p21 and hMSH2 in chromosome 2p16) creating a mutator phenotype to significantly increase the mutation rate of many critical genes 10 to 1000 times. Normally these critical genes are TGF-β RII, Bax (The X protein of bcl-2 gene), IGF2R, or CDX-2. The mis-match repair genes are hMLH1, hMSH2, hPMS1, hPMS2, hMSH6 and hMSH3; mostly hMLH1 (50%) and hMSH2 (40%). This pathway is represented schematically by Figure Figure2.2. As in the LOH pathway, assuming haplo-insufficiency of tumor suppressor genes, one may approximate this pathway by a 5-stage model.

Figure 2
The MSI pathway of human colon cancer. Microsatellite Instability pathway of human colon cancer.

Morphologically, mutation or methylation silencing of the MMR gene hMLH1 or hMSH2 generates hyperplastic polyps which lead to the generation of serrated adenomas. These adenomas develop into carcinomas after the inactivation, or loss or mutations of the TGF-β RII gene and the Bax gene, thus abrogating differentiation and apoptosis. (Bax is an anti-apoptosis gene.) In what follows, we let N denote the normal stem cells, Ji the i-th stage cells in the MSI pathways. Then for sporadic MSI, the model is N J1 J2 J3 J4 J5 cancer tumor.

The Major Signalling Pathways for Human Colon Cancer

Recent biological studies ([18,19]) have shown that both the CIN and the MSI pathways involve the Wnt signalling pathway and the destruction complex (this complex is a downstream of the Wnt signalling pathway), the TGF-β inhibiting signalling pathway and the p53-Bax apoptosis signalling pathway, but different genes in the CIN and MSI pathways are affected in these signalling processes. In the CIN pathway, the affected gene is the APC gene in the Wnt signalling, the Smad4 in the TGF-β signalling and the p53 gene in the p53-Bax signalling; on the other hand, in the MSI pathway, the affected gene is the Axin 2 gene in the Wnt signalling, the TGF-β -Receptor II in the TGF-β signalling and the Bax gene in the p53-Bax signalling.

Because the probability of point mutation or genetic changes of genes are in general very small compared to epigenetic changes, one may speculate that colon cancer may actually be initiated by some epigenetic mechanisms ([18,20,21]). In fact, Breivik and Gaudernack [20] showed that in human colon cancer, either methylating carcinogens or hyper-methylation at CpG islands would lead to G/T mismatch which in turn leads to Mis-match Repair (MMR) gene deficiency or epigenetic silencing of the MMR genes and hence MSI (Micro-satellite Instability); alternatively, either hypo-methylation, or bulky-adduct forming (BAF) carcinogens such as alkylating agents, UV radiation and oxygen species promote chromosomal rearrangement via activation of mitotic check points (MCP), thus promoting CIN (Chromosomal Instability). A recent review by Baylin and Ohm [18] have demonstrated that epigenetic events may lead to LOH and mutations of many genes which may further underline the importance of epigenetic mechanisms in cancer initiation and progression.

Based on the above biological studies, in this paper we thus postulate that the incidence data of human colon cancer are described and generated by a multi-stage model involving 2 pathways as defined above. In this paper, because of haploid-insufficiency of the tumor suppressor genes {Smad4, p53, Axin, Bax, TGF - β - ReceptorII}, the number of stages for the CIN pathway and MSI are assumed as 4 and 5 respectively.

Methods

Stochastic Multi-Stage Model of Carcinogenesis for Human Colon Cancer Involving Multiple Pathways

From results of Section 2, it follows that the stochastic multi-stage model for human colon cancer can be represented schematically by Figure Figure33.

Figure 3
The multiple pathways of human colon cancer. All pathways involoved in sporatic human colon cancer.

In Figure Figure3,3, the model assumes that cancer tumors are generated by two pathways with pathway 1 as a k1-stage multi-stage model involving Il (l = 1, ..., k1) cells and with pathway 2 as a k2-stage multi-stage model involving Jr (r = 1, ..., k2) cells. (For human colon cancer, k1 = 4, k2 = 5.) The state variables are then An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t) = {Il(t), l = 1, ..., k1 - 1, Jr(t), r = 1, ..., k2 - 1} and T(t), where T(t) denotes the number of cancer tumors at time t and where Il(t) (Jr(t)) denote the number of the Il (Jr) initiated cells for {l = 1, ..., k1 - 1 (r = 1, ..., k2 - 1)} respectively. Notice that because cell proliferation, cell differentiation and apoptosis, mutation or genetic changes all occur during cell division and cell division cycle, and because An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t + Δt) develop from An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t) through cell divisions during (t, t + Δt], one may pratically assume that (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t), t t0) is a Markov process with continous time, where t0 represents time at birth; one the other hand, T(t + Δt) may derive from An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i2.gif (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i3.gif) cells before time t, T(t) is in general not Markov ([22,23])). If one assumes that the An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i2.gif and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i3.gif cells grow instantaneously into cancer tumors as soon as they are generated, then one may also assume the T(t) as Markov. In this case, as illustrated in Tan[24], one may use standard Markov theory to derive the probability generating function (pgf) of the probabilities of these variables and hence the probability distribution of these variables. Let ψ (xl, l = 1,..., k1 -1, yr, r = 1,..., k2-1, z; t0, t) = ψ (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i4.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, z; t0, t) denote the pgf of {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t), T(t)}. Let An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i6.gif denote the mutation rates, the birth rates and the death rates of {Il, Jr} cells as given in Table Table11 respectively.

Table 1
Transition rates and transition probabilities for human colon carcinogenesis

If T(t) is Markov, then by using the method of Kolmogorov forward equation of these variables (Tan [24]), it can readily be shown that ψ (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i4.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, z; t0, t) satisfies the following partial differential equation (pde):

equation image
(1)

where λI (t) = N(t)α0(t), λJ (t) = N(t) β0(t), An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i12.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i13.gif, and the initial condition is ψ (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i4.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, z; t0, t0) = 1 given normal individuals at risk at time t0 .

The above pde is in general very difficult to solve; further, even if the solution of this equation can be derived, the results are very difficult to apply to estimate the unknown parameters and to predict future cancer cases. Most importantly, T(t) may not be Markov so that this theory is not applicable (Fakir et al.[22,23]). In this paper, we will thus propose an alternative approach through stochastic equations. It can easily be shown through the method of pgf that if T(t) is Markov, then the stochastic equation method is equivalent to the method of Markov theory; as we shall see, however, the stochastic equation method is more powerful and does not need to assume Markov for T(t).

The Stochastic Equation for State Variables

To derive stochastic equations for the state variables, let An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i14.gif be the number of births of the Il (Jr) initiated cells during (t, t + Δt] {l = 1, ..., k1- 1 (r = 1, ..., k2 - 1)}, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i15.gif the number of deaths of the Il (Jr) initiated cells during (t, t + Δt] {l = 1, ..., k1- 1 (r = 1, ..., k2 - 1)} and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i16.gif the number of mutation (Il Il+1) (Jr Jr+1) of Il (Jr) cells during (t, t + Δt] {l = 1, ..., k1 - 1 (r = 1, ..., k2 - 1)}.

Also let An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i17.gif be the number of mutation of N I1(N J1) during (t, t + Δt]. Taking into account of all possible input and output of relevant cells, we have the following stochastic equations for the state variables:

equation image
(2)

Because the transition variables An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i20.gif are random variables, the above equations are stochastic equations. With the transition rates as given in Table Table1,1, it can readily be shown that to the order of ot), the conditional probability distributions of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i21.gif and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i22.gif given N(t) are Poisson with means λI(tt and λI(tt respectively whereas the conditional probability distributions of the numbers of births and deaths given the staging variables (i.e. the Il(t) and Jr(t)) follow multinomial distributions independently. That is,

equation image
(4)

for l = 1, 2, ..., k1 - 1,

equation image
(6)

for r = 1, ..., k2 - 1,

equation image
(7)

where λI(t) = N(t)α0(t), λJ (t) = N(t)β0(t).

Because the number of mutations of the Il cells would not affect the size of the Il population but only increase the number of Il+1 cells and because the mutation rate of Il cells is very small (10-5 ~10-8), it can readily be shown that to the order of ot), the conditional probability distribution of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i27.gif given Il(t) Il cells at time t is Poisson with mean Il(t)αl(tt independently of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i28.gif and other transition variables. That is,

equation image
(8)

independently of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i28.gif and other transition variables.

Similarly, we have that to the order of ot),

equation image
(9)

independently of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i31.gif and other transition variables.

Using the probability distributions given by equations (5)-(10) and by subtracting from the transition variables the conditional expected values respectively, we have the following stochastic differential equations for the staging state variables:

equation image
(10)

where An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i34.gif.

In the above equations, the random noises An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i35.gif are derived by subtracting the conditional expected numbers from the random transition variables respectively. Obviously, these random noises are linear combinations of Poisson and multinomial random variables. These random noises have expected value zero and are un-correlated with the state variables {Il(t), l = 1, ..., k1 - 1, Jr(t), r = 1, ..., k2 - 1}. It can also be shown that to the order of ot), these random noises are uncorrelated with one another and have variances given by:

equation image

where I0(t) = J0(t) = N(t).

The Expected Numbers

Let uI(l, t) = E [Il(t)] and uJ (r, t) = E [Jr(t)] denote the expected numbers of Il(t) and Jr(t) respectively and write uI(0, t) = uJ(0, t) = N(t). Using equations (11)-(12), we have the following differential equations for these expected numbers:

equation image
(12)

The solution of the above equations are:

equation image

If the model is time homogeneous, then λI(t) = λI, λJ(t) = λJ, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i40.gif for l = 1, ..., k1 - 1 and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i41.gif for r = 1, ..., k2 - 1. If the proliferation rates are not zero and if An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i42.gif for all l u and r v, then the above solutions reduce to:

equation image

where An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i44.gif.

The Probability Distribution of State Variables and Transition Variables

Although T(t) is not Markov, the random vector {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t), t t0} is Markov with continuous time. To derive the transition probability of this process, denote by f(x, y : N, p1, p2) the density at (x, y) of the multinomial distribution ML(N; p1, p2) with parameters (N; p1, p2) and h(x; λ) the density at x of the Poisson distribution with mean λ. Then, using the probability distributions given by equations (5)-(10), the transition probability of this Markov process is, to order of ot):

equation image

where I0(t) = J0(t) = N(t), a(lu, iu; t) = Iu(t + Δt) - Iu(t) - lu + iu, u = 1, ..., k1 - 1 and where b(mv, jv; t) = Jv(t + Δt) - Jv(t) - mv + jv, v = 1, ..., k2 - 1.

The above transition probability and hence the probability distribution of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t) is too complicated to be of much use. For implementing the Gibbs sampling procedure to estimate parameters and to predict state variables, we use data augmentation method to expand the model. Thus, we define the augmented variables An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i46.gif. (In what follows we will refer these variables as the transition variables, unless otherwise stated.)

Put An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i47.gif. Then {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i48.gif (t), t t0} is Markov with continuous time. Using the probability distributions of the transition random variables given by equations (5)-(10), the transition probability P{An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i48.gif (t + Δt)|An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i48.gif (t)} is

equation image
(14)

where

equation image
(15)

and

equation image
(16)

where An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i52.gif for l = 1, ..., k1 - 1 and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i53.gif for r = 1, ..., k2 - 1.

The probability distribution given by equation (15) will be used to derive estimates and predicted numbers of state variables. This is discussed in Section 6.

A Statistical Model and The Probability Distribution of the Number of Detectable Tumors

The data available for modeling carcinogenesis are usually cancer incidence over different time periods. For example, the SEER data of NCI/NIH for human cancers are given by {(yj, nj), j = 1, ..., n}, where yj is the observed number of cancer cases during the j-th age group and nj is the number of normal people who are at risk for cancer and from whom yj of them have developed cancer during the age group. Given in Table Table22 are the SEER data for human colon cancer adjusted for genetic cancer cases.

Table 2
Colon Cancer Data from SEER(overall population)

The Probability Distribution of the Number of Detectable Tumors for Colon Cancer

To derive the probability distribution of time to tumors, one needs the probability distribution of T(t). For deriving this probability distribution, we observe that malignant cancer tumors arise by clonal expansion from primary An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i2.gif cells and primary An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i3.gif cells, where primary An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i2.gif cells are An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i2.gif cells derived from An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i54.gif cells by mutation of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i54.gif cells and primary An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i3.gif cells are An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i3.gif cells derived from An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i55.gif cells by mutation of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i55.gif cells.

Let An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i56.gif be the probability that a primary An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i2.gif (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i3.gif) cancer cell at time s develops into a detectable cancer tumor at time t. Let Ti(t) be the number of cancer tumors derived from the i-th pathway. Then, to order of ot), the conditional probability distribution of T1(t) given {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i54.gif (s), s t} is Poisson with mean ω1(t) independently of T2(t), where

equation image

Similarly, to order of ot), the conditional probability distribution of T2(t) given {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i55.gif (s), s t} is Poisson with mean ω2(t) independently of T1(t), where

equation image

Let Qi(j) (i = 1, 2) be defined by:

equation image

where Ri(tj-1, tj) = ωi(tj-1) - ωi(tj).

Then Qi(j) is the probability that cancer tumors would develop during the j-th age group by the i-th pathway. Since cancer tumors develop if and only if at least one of the two pathways yield cancer tumors, the probability that each normal person at time t0 will develop cancer tumors during (tj-1, tj] is given by QT (j), where

equation image

For practical applications, we observe that to order of o(An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i61.gif (t)) and o(An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i62.gif (t)) respectively, the ωi(t) in Qi(j) are approximated by

equation image

Similarly, it can readily be shown that to the order of Min{o(An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i61.gif (t)), o(An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i62.gif (t)}, QT(t) ~Q1(t) + Q2(t).

To further simplify the calculation of QT (j), we observe that in studying human cancers, one time unit (i.e. Δt = 1) is usually assumed to be 3 months or 6 months or longer. In these cases, one may practically assume An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i64.gif and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i65.gif if t - s ≥ 1.

A Statistical Model for Cancer Incidence Data

Let yj be the observed number of the number of cancer cases Yj developed during (tj-1, tj] given nj people at risk for cancer, who are normal at birth (t0). We assume that each individual develops colon cancer tumor by the same mechanism independently of one another. Then for each person who is normal at birth (t0), the probability that this individual would develop colon cancer tumor during the j-th age group (tj-1, tj] is given by QT (j). It follows that the probability distribution of Yj given that nj is:

equation image
(17)

Because nj is very large and QT (j) is very small, approximately Yj is Possion with mean τj = njQT (j). Notice that to the order of Max{o(An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i61.gif (t)), o(An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i62.gif (t))}, τj (and hence the probability distribution of Yj) depends on the stochastic model of colon carcinogenesis through the expected number {E [An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i54.gif (t)], E [An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i55.gif (t)]} of {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i54.gif (t), An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i55.gif (t)} and the parameters {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i61.gif (t), An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i62.gif (t)} over the time period (tj-1, tj].

The State Space Model of Human Colon Cancer

State space model is a stochastic model which consists of two sub-models: The stochastic system model which is the stochastic model of the system and the observation model which is a statistical model based on available observed data from the system. Thus, the state space model of a system takes into account the basic mechanisms of the system and the random variation of the system through its stochastic system model and incorporates all these into the observed data from the system; furthermore, it validates and upgrades the stochastic model through its observation model and the observed data of the system. As illustrated in Tan ([25], Chapters 8-9), the state space model has many advantages over both the stochastic model and the statistical model when used alone since it combines information and advantages from both of these models.

For human colon cancer, the stochastic system model of the state space model is the stochastic model consisting of 2 pathways with each pathway following a multi-stage model as described in Section 3; the observation model of this state space model is a statistical model based on the observed number of colon cancer cases as described in Section 4.

The Stochastic System Model and the State Variables

Putting Δt = 1 for some fixed small interval, then the staging variables are X = {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t), t = t0, t0 + 1, ..., tM} and the transition variables are U = {An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i67.gif (t), t = t0, t0 + 1, ..., tM - 1}. From results in Section (3.3), the joint probability distribution of {X, U} given the parameters Θ is:

equation image
(18)

where P{An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i67.gif (t - 1)|An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t - 1)} and P{An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t)|An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i1.gif (t - 1), An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i67.gif (t - 1)} are given by equations (16) and (17) respectively and where Θ = {λI, λJ, αl(t), βr(y), An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i69.gif, dl(t)(I)(t), An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i70.gif, dr(t)(J)(t), l = 1, ..., k1 - 1, r = 1, ..., k2 - 1}.

Notice that this probability distribution is basically a product of Poisson distributions and multinomial distributions.

The Observation Model Using SEER Data

Put Y = (Yj, j = 1, ..., m) and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif = (yj, j = 1, ..., m)'. By the probability distribution given by equation (18), the conditional probability density of Y given {X, U, Θ} is approximately given by:

equation image
(19)

where h(Yj; τj) is the density at Yj of the Poisson distribution with mean τj.

Then the likelihood function of Θ given (X, U) is An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i72.gif. It follows that the deviance from this density is:

equation image
(20)

where An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i74.gif and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i75.gif is the maximum likelihood estimate of τj.

From equations (19)-(20), we have for the joint density of (X, U, Y ) given Θ:

equation image
(21)

To apply the above distribution to estimate unknown parameters and to fit real data, we also make the following assumptions: (a) From biological observations ([1-9]), one may practically assume that {αl(t) = αl, l = 0, 1, 2, 3; βr (t) = βr, r = 0, 1, 2, 3, 4, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i77.gif}. (b) Because the colon polyps are generated by proliferation of I2 cells and J3 cells and because the polyps can only grow to a maximum size of about 10 mm3, we assume that An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i78.gif and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i79.gif for some small (δi > 0, i = 1, 2). (c) Because colon cell divisions are mainly due to action of the β-catenin gene, one may also assume An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i80.gif. In this case, one has approximately An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i81.gif and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i82.gif, r = 1, 2. Under these assumptions, the unknown parameters of interest are Θ = {Θ1, Θ2}, where An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i83.gif and Θ2 = {α3, β4).

The Generalized Bayesian Method and the Gibbs Sampling Procedure

The generalized Bayesian inference is based on the posterior distribution P{Θ|X, U, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif} of Θ given {X, U, Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif}. This posterior distribution is derived by combining the prior distribution P{Θ} of Θ with the probability distribution P{X, U, Y|Θ} given by equation (20) with Y being replaced by An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif. It follows that this inference procedure would combine information from three sources: (1) Previous information and experiences about the parameters in terms of the prior distribution P{Θ}, (2) Biological information represented by the stochastic system equations of the stochastic system (P{X, U|Θ}) and (3) Information from observed data, represented by the statistical model through the conditional likelihood L(Θ|An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, X, U).

Because of additional information from the stochastic system model, this inference procedure is advantageous over the standard Bayesian procedure in that it can avoid the identifiability problems associated with standard Bayesian method. For example, we have shown that to the order of Max{o(α3(t)), o(β4(t))} the probability distribution of the Yj's depends on the stochastic model through the expected numbers of I3(t) and J4(t), which depend on the birth rates and death rates only through the difference of these rates. It follows that it is not possible to estimate the birth rates and death rates separately by the traditional Bayesian method. Most importantly, the number of parameters is very large and the number of data points is limited. Thus, without information from the stochastic system model, it is virtually impossible to estimate all unknown parameters; for more examples, see Tan ([25,26]).

The Prior Distribution of the Parameters

For the prior distributions of Θ, because biological information have suggested some lower bounds and upper bounds for the mutation rates and for the proliferation rates, we assume

equation image
(22)

where c is a positive constant if these parameters satisfy some biologically specified constraints; and equal to zero otherwise. These biological constraints are:

(i) For the mutation rates of the Ii cells in the LOH pathway, 1 <λI < 1000 (N I1), 10-6 <αi < 10-4, i = 1, 2, 3. For the proliferation rates of Iicells in the LOH pathway, γ1(t) = 0, 0 <An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i85.gif < 0.5, i = 2, 3, γ2(t) = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i86.gif, 10-4 <γ2 < 2 * 10-2, 10-5 <δ1 < 5 * 10-3, 10-2 <γ3< 0.5.

(ii) For the mutation rates in the MSI pathway, 1 <λJ< 1000 (N I1), 10-8 <β1 < 10-5, 10-6 <βj < 10-2, j = 2, 3, 4. For the proliferation rates in the MSI pathway, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i87.gif.

We will refer the above prior as a partially informative prior which may be considered as an extension of the traditional non- informative prior given in Box and Tiao [27].

The Posterior Distribution of the Parameters Given {Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, X, U}

Combining the prior distribution given in (6.1) with the density of P{X, U, Y |Θ} given in equation (20), one can readily derive the conditional posterior distribution of Θ given {X, U, Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif}. For (l = 2, 3), denote by: An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i88.gif and An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i89.gif; similarly, for r = 3, 4, we define {BrJ, DrJ, NrJ} by replacing An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i90.gif by An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i91.gif respectively. Then, we have the following results for the conditional posterior distributions:

(i) The conditional posterior distributions of Θ1(1) = {λI, λJ, αl, l = 1, 2, βr, r = 1, 2, 3} given {X, U, Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif} is:

equation image

(ii) The conditional posterior distributions of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i93.gif given {X, U, Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif} is:

equation image

(iii) The conditional posterior distribution of {α3, β4} given {X, U, Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif} is:

equation image

(vi) The conditional posterior distribution of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i96.gif given {X, U, Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif} and the conditional posterior distribution of An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i97.gif given {X, U, Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif} are represented respectively by:

equation image

The Multi-level Gibbs Sampling Procedure For Estimating Parameters

Given the above probability distributions, the multi-level Gibbs sampling procedure for deriving estimates of the unknown parameters are given by:

(a) Step 1: Generating (X, U) Given (Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, Θ) (The Data-Augmentation Step):

Given Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif and given Θ, use the stochastic equations (3)-(4) and the probability distributions given by equations (5)-(10) in Section 3 to generate a large sample of (X, U). Then, by combining this sample with P{Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif|X, U, Θ} to select (X, U) through the weighted bootstrap method due to Smith and Gelfant [28]. This selected (X, U) is then a sample from P{X, U|Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, Θ} even though the latter is unknown. (For proof, see Tan [25], Chapter 3.) Call the generated sample (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i99.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i100.gif).

(b) Step 2: Estimation of Θ = {Θ1, Θ2} Given {Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, X, U}:

Given Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif and given (X, U) = (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i99.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i100.gif) from Step 1, derive the posterior mode of the parameters by maximizing the conditional posterior distribution P{Θ|An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i99.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i100.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif}. Denote the generated mode as An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i101.gif.

(c) Step 3: Recycling Step.

With {(X, U) = (An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i99.gif, An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i100.gif), Θ = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i101.gif} given above, go back to Step (a) and continue until convergence. The convergence of the above steps can be proved using procedure given in Tan ([25], Chapter 3). At convergence, the An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i101.gif are the generated values from the posterior distribution of Θ given Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif independently of (X, U) (for proof, see Tan [25], Chapter 3). Repeat the above procedures one then generates a random sample of Θ from the posterior distribution of Θ given Y = An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif; then one uses the sample mean as the estimates of (Θ) and use the sample variances and covarainces as estimates of the variances and covariances of these estimates.

Results

Application to Fit the SEER Data

In this section, we will apply the above model to the NCI/NIH colon cancer data from the SEER project. Given in Table Table22 are the numbers of people at risk and colon cancer cases in the age groups together with the predicted cases from the model. There are 18 age groups with each group spanning over 5 years. To fit the data, we have assumed that An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i102.gif for j = 1, 2 because of the observation that uncontrolled cell division of colon stem cells is mainly initiated by the oncogene β-catenin in 3p22. Given in Table Table33 are the estimates of the mutation rates, the birth rates and the death rates of the Ii cells and Jjcells. Given in Figure Figure33 is the plot of probability density of time to tumors.

Table 3
Estimates of Parameters for Each Pathway

From these results, we have made the following observations:

(a) As shown by results in Table Table2,2, the predicted number of cancer cases are very close to the observed cases in all age groups. This indicates that the model fits the data well and that one can safely assume that the human colon cancer can be described by a model of 2 pathways. The AIC (Akaike Information Criterion) and the BIC (Bayesian Information Criterion) from the model are 55.96 and 81.30 which are smaller than the AIC of 816.0667 and the BIC value of 827.1513 from a single pathway 4-stage model respectively (Luebeck and Moolgavkar [17]). This shows that the multiple pathway model fits better than the single pathway 4-stage model as proposed by Luebeck and Moolgavkar [17].

(b) From Table Table2,2, it is observed that the largest number of cancer cases is in the age group between 70 and 75 years old. Comparing the values of Qi(j) between the CIN pathway (i = 1) and the MSI pathway (i = 2), it appears that the largest cancer cases is between the age group 65 and 70 years old for the CIN pathway and is between 85 and 90 years old for the MSI pathways. Presumably this might be due to the fact that the MSI pathway has one more stage than the CIN pathway.

(c) Reflecting the contribution of the APC gene on chromosomal instability, results in Table Table33 showed that the mutation rates of the Ircells from I1 I2 and from I2 I3 had increased about 100 times and 1000 times respectively than the mutation rate from N I1 cells. Similarly, due to the contribution to genomic instability by the mis-match repair genes, the mutation rates from J1 J2, from J2 J3 and J3 J4 had increased about 5 * 102, 0.5 * 104 and 104 times respectively than the mutation rate from N J1. Notice also from Table Table33 that the mutation rates from J1 J2 J3 J4 are about 2 to 3 times of those from I1 I2 I3. As shown in probability plots (not shown here), these increases have speeded up the time to cancer in the MSI pathway by about 5-10 years.

(d) Results in Table Table33 showed that the mutation rates from I3 I4 and from J4 J5 are of the order 10-6 which were about 102 → 103 times smaller than the mutation rates from I1 I2 I3 and from J1 J2 J3 J4. These results might be the consequence that we had ignored the stages of vascular carcinogenesis (i.e. angiogenesis and metastasis; see Hanahan and Weinberg [29] and Weinberg [30]) by merging these stages into the last stage. From Weinberg ([30], Chapters 13-14), notice that the angiogenesis and metastasis are also multi-stage processes.

(e) Results in Table Table33 showed that the proliferation rates (birth rate - death rate) of the I3 cells and the J4 cells are of order 10-2 which are much larger than the proliferation rates of the I2 cells and the J3 cells, due presumably to the effects of the silencing or inactivation of the cell cycle inhibition genes (Smad4 and TGF-β-RII) and the apoptosis inhibition genes (p53 and Bax). Notice from Table Table33 that the estimates of the proliferation rates of the I2 and I3 cells are approximately equal to those of the J3 and J4 cells respectively. These results seemed to suggest that the genomic instabilities had little effects on cell proliferations.

Conclusions and Discussion

Recent studies of cancer molecular biology have indicated very clearly that human colon cancer is developed through multiple pathways ([1-9]). This indicates that single pathway models are not realistic and hence may lead to incorrect prediction and confusing results. For developing efficient prevention and controlling procedures for human colon cancer and for prediction of future human colon cancer, in this paper we have developed a stochastic model and a state space model for carcinogenesis of human colon cancer involving multiple pathways with each pathway being a multi-stage model. Using this model, we have derived for the first time the probability distribution of the numbers of initiated cells and the probability distribution of time to cancer tumors. Such derivation by the traditional approach is extremely difficult and had not been attempted previously for multiple pathway models. Based on the state space model of colon cancer, we have developed a generalized Bayesian procedure to estimate the unknown parameters and to predict future cancer cases. This approach combines information from three sources: The stochastic system model via P{X, U|Θ}, the prior information via P{Θ} and information from data via L{Θ|An external file that holds a picture, illustration, etc.
Object name is 1745-6150-5-26-i5.gif, X, U}. Because of additional information from the stochastic system model, our procedure is advantageous over the standard Bayesian procedure and the sampling theory procedure. Notice that there are a large number of unknown parameters in the model and only a limited amount of data are available. Without this additional information, it is then not possible to estimate all unknown parameters. Notice also that through the stochastic system model, one can incorporate biological mechanism into the model. Because the number of stages and the mutation rates of intermediate cells in different pathways are different and different drugs may affect different pathways, we believe that this is important and necessary.

We have applied these models and procedure to the NCI SEER data (upto November, 2007). Our results showed that the proposed multiple pathways model fitted better than the single pathway 4-stage model as proposed by Luebeck and Moolgavkar [17]. (The respective AIC and BIC for the multiple pathways model are 55.96 and 81.30 which are ten times smaller than those of the AIC (816.0667) and BIC (827.1513) respectively of the single pathway 4-stage model.)

In this preliminary study, we have not yet compared the multiple pathways model with the single pathway model regarding prediction of future cancer cases and evaluation of treatment protocols for human colon cancer. This will be our future research, we will not go any further here.

Abbreviations

CIN: Chromsomal Instability; LOH: Loss of Heterozygocity; MSI: Micro-Satellite Instability; APC: Adenomatous Polyposis Coli; MMR: Mis-Match Repair; ML: Multinomial; Dev: Deviance; AIC: Akaike Information Criterion; BIC: Bayesian Information Criterion; SEER: Surveillance Epidemiology and End Results.

Competing interests

The results of this paper have not been published in other places.

Authors' contributions

Both authors contributed equally.

Reviewers' comments

Reviewer 1 (M.P. Little)

General comments

This is a generally well-written paper, describing a model very similar to that recently developed by Little et al. [12], generalising the model of Little and Wright ([11]). Arguably thislatest model should be referenced, with discussion of mathematical differences between the model outlined here and that one. There could also be discussion of the somewhat different conclusions reached in fits to more or less the same SEER colon cancer data.

Response: The paper by Little et al. has been added to the reference; see Little et al. [12].

Specific comments (page/line)

(1) The 6-stage model is suddenly drawn from the hat here, but is then almost immediately reduced to a 4-stage model! Why does the haplo-insufficiency of Smad4 and p53 justify combining the stages in this way? Is there evidence that Smad4 and p53 are without function at half gene dose? Also, what is the evidence for mutations being in the order given in Figure Figure11?

Response: References documenting haplo-insufficiency of p53 and smad4 have been given in the paper. The reason why we can combine the two-stages involving P53 into one-stage is based on these papers. To illustrate, let B denote the P53 gene and b the mutant of p53. Then, under haplo-insufficiency of P53, the level and effects of the P53 protein has been reduced significantly (at least 4-fold or more) so that the phenotype of genotype B/b (or B/-) is closely approximated by that of genotype b/b.

(2) How would a model incorporating epigenetic effects differ from the (DNA mutational) models already outlined? I suspect that mathematically the formalism would be exactly the same, although the implied "mutation rates" would be very much higher for epigenetic events.

Response: From modeling viewpoint, it is difficult to tell the difference between epigenetic changes and mutation except that the former is much more frequent and very often reversible. However, epigenetic changes can help the modeler to incorporate biological information into the biological process. Many biological papers (just during 2008, there are hundreds of biological papers published) support the epigenetic changes and the view that epigenetic changes are the driving force for cancer initiation, progression and metastasis, more important than gene mutations in cancer initiation and progression. Epigenetic changes include methylation (hypo- and hyper-methylation), micro-RNA (non-coding RNA), loss of imprinting, Histone acetylation, HDAC, tissue disorganization and gap junction disruption, etc.; epigenetic changes can also lead to gene mutations; for a brief review, see Tan and Hanin [31] (Chapter 3).

(3) This is slightly confusing. I assume that the two pathways referred to are CIN and MSI, but coming after discussion of epigenetic effects, perhaps this was meant as well or instead.

Response: The genetic sequence of the CIN and MSI pathways were determined by molecular biology of colon cancer and have been published in cancer journals. (There are a large number of biological papers documenting this. I have just listed a few of them in our paper. I can provide many more published papers from cancer journals if one wishes.) This sequence appears to be logical from biological mechanism. Notice that the APC-β- Catenin activates myc and cyclin D to push the cell into cell cycle; for the cell cycle to progress, the inhibition effects of p15, p16, p18, p19, p27 have to be abrogated through the inhibition of the TGF-β signaling pathway (epigenetic silencing or inactivation or mutation or deletion of smad2/smad4 (CIN pathway) or TGF-β Receptor II (MSI pathway).). When the number of cells has increased to certain level, then the apoptosis p53-Bax pathway is activated. For the progression of carcinogenesis, the p53 (CIN pathway) or the Bax gene (MSI pathway) have to be epigenetically silenced or inactivated or mutated or deleted. This is the reason why only the late stage involves silencing or inactivation of the gene p53. This is illustrated in our paper; see also Tan and Hanin [31] (Chapter 3, Chapter 11 and Chapter 12). There are no biological supports single-stage or two stage models. Hence it is not logical to accept single stage or two stage models for colon cancer for people who are born normal. Also, it is extremely difficult for me to accept that there are no proliferation for I2 cells as assumed in some of the cancer model papers simply because of the observation of polyps in colon which are derived from proliferation of second-stage cells and is the basis for colon cancer screening procedure "Colonoscopy" practiced by medical doctors.

(4) I assume that asymmetric mutations are assumed, in which a cell produces one normal daughter cell and one mutant daughter cell, as assumed by Little et al. [12] and Little and Wright [11] and many others, but this could perhaps be clarified. It might be useful to derive this Kolmogorov forward equation (2) in an Appendix. Clearly this forward equation (2) is in general intractable. I suspect that as in the papers of Little et al. [12] and Little and Wright [11] it is much easier to attempt to solve the Kolmogorov backward equations.

Response: As in Little, I assume that under genetic changes, a normal (or one Ij cell) cell will give rise to one normal cell (or one Ij cell) and one mutant cell (or one Ij+1cell)(asymmetric change or mutation). This is logical because mutation or genetic changes occur during cell division.

(5) These conclusions are somewhat at odds with those of Little et al. [12] and Little and Li [32] who fitted models to very similar SEER colon cancer data and demonstrated that there was little evidence of better fit of models that allowed for genomic instability compared with those that did not. In particular the 4-stage model of Luebeck and Moolgavkar [16] provided as good fit as models that allowed for genomic instability [32], or as here multiple types of instability (CIN and MSI) [12]. The authors may care to discuss this.

Response: On November 13, 2009, Dr. Little had sent me his comments about our paper and a copy of his 2008 paper which I was not aware of before I wrote our paper. Dr. Little claimed that their fitting of the data could not differentiate between many different models. When I examined Dr. Little's paper, I found out that our estimation approach is very different from Dr. Little's. While Dr. Little's approach used the classical sampling theory through maximum likelihood estimates; our approach is the state space modeling and the generalized Bayesian inference incorporating information from three different sources: (1) The biological information from cancer molecular biology, (2) prior information from epigenetic and genetic mechanism, and (3) information from the likelihood function of observed data involving cancer incidence. Furthermore, because cancer incidence are derived only from people who are less than 100 years old (his/her life time), we first change the SEER data from (nj, yj) to (mj, yj), where mjis the number of people who can develop colon cancer during his life time (i.e., less that 100 yrars old.). Notice that nj is of the order 107 while mj is of the order 105. This is described in detail in our paper in Tan and Hanin [31] (Chapter 11). Because of this and because of additional information from biological mechanism and prior distribution, we do not have the problem of identification of parameters which statisticians usually encounter in analyzing data without using information from biological mechanism. Thus, using classical approach, the parameters are not identifiable so that one has to make some assumptions such as that growth rates (birth rates) of different stage cancer initiated cells are equal which can hardly be realistic in carcinogenesis. (Biological studies by cancer biologists clearly demonstrated that the proliferation rates of cancer initiated cells with different genetic changes are very different in most of the cases.)

Finally, I want to emphazise that many models can fit the data but some fit better than others. Finally I like to emphasize what Dr. Van Ryzin had concluded 20 years ago that for cancer risk assessment, many models can fit the data but only models which are biologically supported can give correct results for cancer prediction and cancer risk assessment. It is important to list the predicted numbers along with the observed numbers in cancer modeling research. EPA has revised the guidelines to require that cancer risk assessment models should be biologically supported.

Reviewer 2 (M. Kimmel)

The paper by Tan and Yan, proposes a new stochastic model of colon cancer progression, involving the chromosomal instability pathway and the micro-satellite instability pathway. This model not only might provide more insights into human colon cancer but also might provide useful guidance for its prevention and control and for prediction of future cancer cases. One interesting question is as to whether Tan and Yang model might help in deciding about the value of early detection of colon cancer by screening examination. Although colonoscopy is an accepted procedure, there are questions that linger concerning the impact of early detection on mortality reduction.

Response to Reviewer 2

Thank you very much for your comments and suggestion. We are currently collecting data on screening by colonoscopy. We will apply the model to estimate cancer incidence under screening and will examine if colonoscopy will help reduce cancer incidence. We will do computer simulation to find this out. This is our next research on colon cancer.

Acknowledgements

The research of this paper is supported by an NCI/NIH grant, R15 CA113347-01 and an NSF grant, DMS-0805829. As a one-time exception to the publishing policy of Biology Direct the articles in this series are being published with two reviewers.

References

  • de la Chapelle A. Genetic predisposition to colorectal cancer. Nature Review (Cancer) 2004;4:769–780. doi: 10.1038/nrc1453. [PubMed] [Cross Ref]
  • Fodde R, Smit R, Clevers H. APC, signal transduction and genetic instability in colorectal cancer. Nature Review (Cancer) 2001;1:55–67. doi: 10.1038/35094067. [PubMed] [Cross Ref]
  • Fodde R, Kuipers J, Rosenberg C. Mutations in the APC tumor suppressor gene cause chromosomal instability. Nature Cell Biology. 2001;3:433–438. doi: 10.1038/35070129. [PubMed] [Cross Ref]
  • Green RA, Kaplan KB. Chromosomal instability in colorectal tumor cells is associated with defects in microtubule plus-end attachments caused by a dominant mutation in APC. The Journal of Cell Biology. 2003;163:949–961. doi: 10.1083/jcb.200307070. [PMC free article] [PubMed] [Cross Ref]
  • Hawkins NJ, Ward RL. Sporadic colorectal cancers with micro-satellite instability and their possible origin in hyperplastic polyps and serrated adenomas. J Natl Cancer Institute. 2001;93:1307–1313. doi: 10.1093/jnci/93.17.1307. [PubMed] [Cross Ref]
  • Hisamuddin IM, Yang VW. Genetics of colorectal cancer. Medscape General Medicine. 2004;6(3):2004. [PubMed]
  • Peltomaki P. Deficient DNA mismatch repair: A common etiologic factor for colon cancer. Human Molecular Genetics. 2001;10:735–740. doi: 10.1093/hmg/10.7.735. [PubMed] [Cross Ref]
  • Sparks AB, Morin PJ, Vogelstein B, Kinzler KW. Mutational analysis of the APC/beta-catenin/Tcf pathway in colorectal cancer. Cancer Res. 1998;58:1130–1134. [PubMed]
  • Ward R, Meagher A, Tomlinson I, O'Connor T. Microsatellite instability and the clinicopathological features of sporadic colorectal cancer. Gut. 2001;48:821–829. doi: 10.1136/gut.48.6.821. [PMC free article] [PubMed] [Cross Ref]
  • Tan WY, Zhang LJ, Chen W, Zhu JM. In: "Handbook of Cancer Models with Applications". Tan WY, Hanin L, editor. Chapter 3. World Scientific, Singapore and River Edge, NJ; 2008. A stochastic model of human colon cancer involving multiple pathways.
  • Little MP, Wright EG. A stochastic carcinogenesis model incorporating genomic instability fitted to colon cancer data. Math Biosci. 2003;183:111–134. doi: 10.1016/S0025-5564(03)00040-3. [PubMed] [Cross Ref]
  • Little MP, Vineis P, Li G. A stochastic carcinogenesis model incorporating multiple types of genomic instability fitted to colon cancer data. J Theor Biolpgy. 2008;254:229–238. doi: 10.1016/j.jtbi.2008.05.027. [PubMed] [Cross Ref]
  • Little MP. In: "Handbook of Cancer Models with Applications". Tan WY, Hanin L, editor. Chapter 5. World Scientific, Singapore and River Edge, NJ; 2008. Cancer models, ionization and genomic instability: A review.
  • Jessup JM, Gallic GG, Liu B. In: "The Molecular Basis of Human Cancer". Coleman WB, Tsongalis GJ, editor. Chapter 13. Humana Press, Totowa; 2002. The molecular biology of colorectal carcinoma: Importance of the Wg/Wnt signal transduction pathway; pp. 251–268.
  • Alberici P, Jagmohan-Changur S, De Pater E. Smad4 haplo-insufficiency in mouse models for intestinal cancer. Oncogene. 2006;25:1841–1851. doi: 10.1038/sj.onc.1209226. [PubMed] [Cross Ref]
  • Lynch CJ, Milner J. Loss of one p53v allele results in four-fold reduction in p53 mRNA and protein: A basis for p53 haplo-insufficiency. Oncogene. 2006;25:3463–3470. doi: 10.1038/sj.onc.1209387. [PubMed] [Cross Ref]
  • Luebeck EG, Moolgavkar SH. Multistage carcinogenesis and colorectal cancer incidence in SEER. Proc Natl Acad Sci USA. 2002;99:15095–15100. doi: 10.1073/pnas.222118199. [PubMed] [Cross Ref]
  • Baylin SB, Ohm JE. Epigenetic silencing in cancer-a mechanism for early oncogenic pathway addiction. Nature Rev Cancer. 2006;6:107–116. doi: 10.1038/nrc1799. [PubMed] [Cross Ref]
  • Koinuma K, Yamashita Y, Liu W, Hatanaka H, Kurashina K, Wada T. Epigenetic silencing of AXIN2 in colorectal carcinoma with microsatellite instability. Oncogene. 2006;25:139–146. [PubMed]
  • Breivik J, Gaudernack G. Genomic instability, DNA methylation, and natural selection in colorectal carcinogenesis. Semi Cancer Biol. 1999;9:245–254. doi: 10.1006/scbi.1999.0123. [PubMed] [Cross Ref]
  • Jones PA, Baylin SB. The fundamental role of epigenetic events in cancer. Nat Rev Genet. 2002;3:415–428. doi: 10.1038/nrg962. [PubMed] [Cross Ref]
  • Fakir H, Tan WY. Stochastic population dynamic effects for lung cancer progression. Radiation Research. 2009;172:383–393. doi: 10.1667/RR1621.1. [PubMed] [Cross Ref]
  • Yakovlev AY, Tsodikov AD. Stochastic Models of Tumor Latency and Their Biostatistical Applications. World Scientific, Singapore and River Edge, New Jersey; 1996.
  • Tan WY. Stochastic Models of Carcinogenesis. Marcel Dekker, New York; 1991.
  • Tan WY. Stochastic Models With Applications to Genetics, Cancers, AIDS and Other Biomedical Systems. World Scientific, Singapore and River Edge, New Jersey; 2002.
  • Tan WY. Stochastic Modeling of AIDS Epidemiology and HIV Pathogenesis. World Scientific, Singapore and River Edge, New Jersey; 2000.
  • Box GEP, Tiao GC. Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA; 1973.
  • Smith AFM, Gelfand AE. Bayesian statistics without tears: A sampling-resampling perspective. American Statistician. 1992;46:84–88. doi: 10.2307/2684170. [Cross Ref]
  • Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100:57–70. doi: 10.1016/S0092-8674(00)81683-9. [PubMed] [Cross Ref]
  • Weinberg RA. The Biology of Cancer. Garland Sciences, Taylor and Frances Group, New York; 2007.
  • Tan WY, Hanin L. Handbook of Cancer Models with Applications. Vol. 9. World Scientific Publishing Co. Pte. Ltd; 2008.
  • Little MP, Li G. Stochastic modelling of colon cancer: is there a role for genomic instability? Carcinogenesis. 2007;28:479–487. doi: 10.1093/carcin/bgl173. [PubMed] [Cross Ref]

Articles from Biology Direct are provided here courtesy of BioMed Central