We formulate a partially observable Markov decision process (POMDP) model for the prostate cancer screening problem. The true health states in our model are not directly observable, but can be probabilistically inferred from the patient’s PSA history. Our POMDP model is described as follows. One criterion in our model, from the patient’s perspective, is to maximize the expected QALYs. QALYs are estimated by decrementing a normal life year based on: (a) occurrence of biopsy, (b) treatment upon detection of cancer, and (c) long-term complications resulting from treatment. The optimal policy obtained from solving the POMDP trades off the long-term benefits of early detection of prostate cancer against the short-term negative impact of biopsy and the long-term side effects of treatment. The second criterion in our model, from the societal perspective, is to maximize the expected monetary value, using the societal willingness to pay to translate the expected QALYs into a monetary value and subtracting the costs of PSA tests, biopsies, and treatments.
In our model, patients progress through the following unobservable health states before cancer is diagnosed: no prostate cancer, organ confined prostate cancer, extracapsular prostate cancer, lymph node positive prostate cancer, metastasis, and death. The patient’s health state is inferred probabilistically based on PSA test observations. Observations are defined by a set of clinically relevant ranges: [0, 1), [1, 2.5), [2.5, 4), [4, 7), [7, 10), and [10, ∞). At each annual decision epoch there is a two-stage decision problem: the first stage is whether to PSA test or not; the second stage is whether to biopsy or not. If a patient receives a positive biopsy result, he is assumed to be treated by prostatectomy. If a patient receives a negative biopsy result, then he awaits PSA screening in the next decision epoch. The prostate cancer screening decision process is illustrated in an accompanying figure.
Following is a description of the essential elements of the POMDP model (additional details are in the appendix):
Time Horizon: PSA screening is performed at each of a set of annual decision epochs starting at age 40, $t \in \{40, 41, 42, \ldots\}$.
Decisions: $a_t \in \{B, DB, DP\}$ denotes the decision to perform a biopsy ($B$), defer biopsy and obtain a new PSA test result in epoch $t + 1$ ($DB$), or defer the biopsy decision and PSA testing in decision epoch $t + 1$ ($DP$). Combinations of these three actions over the decision horizon determine the PSA test and biopsy schedule. For instance, $a_{40} = DB$, $a_{41} = DP$, $a_{42} = DB$, and $a_{43} = B$ imply PSA testing at ages 41 and 43, followed by biopsy at age 43. Note that decisions are made sequentially and are based on the probability of prostate cancer at each decision epoch.
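As a minimal sketch of how an action sequence maps to a screening schedule, the following Python snippet decodes the example above. The function name `decode_schedule` and the dictionary representation of actions are illustrative assumptions, not part of the model specification.

```python
def decode_schedule(actions):
    """Map {age: action} to (list of PSA test ages, biopsy age or None).

    Actions follow the text: "B" = biopsy at this age, "DB" = defer biopsy
    and obtain a PSA test at age + 1, "DP" = defer biopsy and do not
    PSA test at age + 1.
    """
    psa_ages, biopsy_age = [], None
    for age in sorted(actions):
        action = actions[age]
        if action == "DB":
            psa_ages.append(age + 1)
        elif action == "B":
            biopsy_age = age
            break  # this sketch stops decoding at the first biopsy
        elif action != "DP":
            raise ValueError(f"unknown action {action!r}")
    return psa_ages, biopsy_age


# The example from the text: a40 = DB, a41 = DP, a42 = DB, a43 = B.
example = {40: "DB", 41: "DP", 42: "DB", 43: "B"}
print(decode_schedule(example))
```

The decoded schedule has PSA tests at ages 41 and 43 and a biopsy at age 43, matching the description in the text.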
States: At each decision epoch a patient is in one of several health states including no cancer (NC), organ confined (OC) cancer detected, extraprostatic (EP) cancer detected, lymph node-positive (LN) cancer detected, metastases (mets) detected, and death from prostate cancer and all other causes (D). Prior to detection, OC, EP, LN and mets cannot be distinguished from one another. We use an aggregate state C to denote prostate cancer present but not detected. Note that we assume states NC and C are not directly observable without biopsy. The underlying health states of the Markov model are illustrated in an accompanying figure.
Observations: At each decision epoch the patient is observed in one of a set of observable states including a particular PSA interval, cancer detected and treated ($T$), or death ($D$). The observable states are indexed by $\ell_t \in M = \{1, 2, 3, \ldots, m, T, D\}$.
Transition Probabilities: There are two kinds of transition probabilities in the model. $p_t(s_{t+1} \mid s_t, a_t)$ denotes the state transition probability from health state $s_t$ to $s_{t+1}$ at epoch $t$ given action $a_t$. $q_t(\ell_t \mid s_t)$ denotes the probability of observing PSA state $\ell_t \in M$ given the patient is in health state $s_t \in S$; its matrix form is also known as the information matrix.
Belief States: The belief state, $\pi_t = (\pi_t(NC), \pi_t(C), \pi_t(T), \pi_t(D))$, gives the probability that the patient is in each of the four health states at epoch $t$. Note that, for a patient without a positive biopsy result, the belief state can be represented as $\pi_t = (1 - \pi_t(C), \pi_t(C), 0, 0)$. We use $\pi_t(C)$, the probability of having prostate cancer, to denote the belief state in the remainder of the article.
Rewards: $r_t(s_t, a_t)$ is the reward of living for a year given that the patient is in health state $s_t$ and decision $a_t$ is taken. The expected reward of living for a year is the average over possible health states: $r_t(\pi_t, a_t) = \sum_{s_t \in S} r_t(s_t, a_t)\,\pi_t(s_t)$. The reward defines the decision maker’s perspective. From the patient’s perspective it is measured in QALYs. From the societal perspective it is measured (in dollars) as the difference between (a) the product of QALYs and a willingness-to-pay factor and (b) the cost of PSA tests, biopsies, and treatment.
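The expected reward and the two perspectives can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names, the belief values, the willingness-to-pay figure, and the cost are hypothetical; only the post-treatment QALY weight (1 − 0.145) and the absence of a decrement in state C come from the base case described later in the text.

```python
def expected_reward(belief, reward_by_state):
    """r_t(pi_t, a_t) = sum over states s of r_t(s, a_t) * pi_t(s)."""
    if abs(sum(belief.values()) - 1.0) > 1e-9:
        raise ValueError("belief probabilities must sum to 1")
    return sum(reward_by_state[s] * p for s, p in belief.items())


def societal_reward(qalys, willingness_to_pay, cost):
    """Monetary reward: QALYs valued at willingness to pay, minus costs."""
    return qalys * willingness_to_pay - cost


# Hypothetical belief for a patient without a positive biopsy result.
belief = {"NC": 0.9, "C": 0.1, "T": 0.0, "D": 0.0}
# QALY reward per state for a year without biopsy (patient perspective).
qaly_reward = {"NC": 1.0, "C": 1.0, "T": 0.855, "D": 0.0}

qalys = expected_reward(belief, qaly_reward)
# Hypothetical willingness to pay (dollars per QALY) and annual cost.
print(qalys, societal_reward(qalys, 50_000.0, 100.0))
```

Switching perspective changes only the reward function; the belief state and the averaging step are the same in both cases.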
We assume that a patient has at most one biopsy prior to detection of prostate cancer. This is a reasonable assumption since multiple biopsies prior to detection are seldom done due to the invasive nature of the procedure and the slow progression of prostate cancer. This assumption is supported by our dataset, in which 81.4% of the patients have only one biopsy. It is also consistent with findings in (15) in the context of regular annual screening.
The detailed transition probabilities, the reward function, and the optimality equations for the POMDP are provided in the appendix. The optimal screening policy is obtained by solving the optimality equations to determine the PSA screening and biopsy decisions that maximize the expected rewards over a patient’s lifetime. POMDPs are computationally challenging to solve. We use a finite fixed-grid method (16) to solve the optimality equations and obtain the optimal policy. Theoretical properties and methodological details of the solution methodology for a related model on biopsy referral can be found in (15).
The optimal policies obtained from our POMDP model differ in several respects from traditional policies, which are based on a predefined PSA screening frequency and PSA threshold for biopsy referral. First, our policy uses the probability of being in state $C$. This probability is estimated from all prior PSA observations using Bayesian updating as the patient ages (see the section on Bayesian updating in the appendix for details). Second, there is no fixed PSA test frequency; rather, PSA testing and biopsy occur according to a policy, defined by probability thresholds, that maximizes long-term expected rewards for the patient.
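To illustrate the Bayesian updating step, the following Python sketch applies Bayes' rule to the probability of state C after each PSA observation. It is a simplification under stated assumptions: the observation probabilities `Q_NC` and `Q_C` are hypothetical placeholders rather than the estimated information matrix, and the between-epoch state transition (cancer incidence and death) that the full model applies is omitted.

```python
def update_cancer_belief(prior_c, q_nc, q_c, psa_interval):
    """Posterior probability of state C after one PSA observation.

    prior_c      -- probability of C before the observation
    q_nc, q_c    -- P(PSA interval | NC) and P(PSA interval | C), per interval
    psa_interval -- index of the observed PSA interval
    """
    weight_c = q_c[psa_interval] * prior_c
    weight_nc = q_nc[psa_interval] * (1.0 - prior_c)
    return weight_c / (weight_c + weight_nc)


# Hypothetical observation probabilities over the six PSA intervals
# [0,1), [1,2.5), [2.5,4), [4,7), [7,10), [10,inf); not the paper's estimates.
Q_NC = [0.45, 0.30, 0.12, 0.08, 0.03, 0.02]
Q_C = [0.10, 0.20, 0.20, 0.25, 0.13, 0.12]

belief = 0.05  # hypothetical initial probability of C
for observed in [1, 2, 3]:  # a PSA history moving into higher intervals
    belief = update_cancer_belief(belief, Q_NC, Q_C, observed)
print(round(belief, 4))
```

A threshold policy of the kind described above would then compare the updated probability with age-specific thresholds to choose among the three actions.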
Data description
We obtained the results of all PSA tests done in Olmsted County, Minnesota from 1983 to 2005. A total of 11,872 men underwent PSA testing during this timeframe, with a total of 50,589 PSA test results. The medical records linkage system of the Rochester Epidemiology Project (17) was then used to identify all patients who underwent a prostate biopsy or who had a pathologic diagnosis of prostate cancer during this same period of time. All health care providers in Olmsted County participate in the records linkage system, and more than 95% of Olmsted County residents receive their medical care in Olmsted County, implying that missed prostate biopsies and prostate cancer diagnoses are unlikely. We merged the PSA data with the clinical data to obtain a comprehensive longitudinal dataset of PSA screening occurring in a fixed geographic population of men not subject to major referral biases. Since we focus on the screening policy for early detection, we do not consider PSA records after cancer treatment. We use this dataset to estimate prostate cancer probabilities conditional on PSA level for a general population. Details of clinical and demographic characteristics of the screened cohort can be found in Table 1.
Table 1. Clinical and demographic characteristics of the screened cohort from Olmsted County, Minnesota.
Base case parameter estimation
Following is a summary of the model parameter estimates. First, we use our dataset to estimate the probability of observing different PSA values conditional on the patients’ health states. This is represented by the information matrix defined earlier in the Methods section. (Note that Thompson et al. (18) estimated the prostate cancer probability given PSA test results, which is related to, but not the same as, what we need.) Since it is possible that some men with prostate cancer were never subjected to biopsy and therefore never diagnosed, the information matrix, $Q_t(\ell_t \mid s_t)$, is subject to bias. We used the methods proposed by Begg and Greenes (19) to correct for this bias. We use biopsy as the confirmative test; thus, we assume that patients who have positive biopsies are true cancer patients and those who have negative biopsies are true no-cancer patients. We first separate the patients into different groups according to their PSA values ([0, 1), [1, 2.5), [2.5, 4), [4, 7), [7, 10) and ≥ 10) and ages ([40, 50), [50, 60), [60, 70), [70, 80) and ≥ 80). Within each group, we assume patients without a confirmative test (biopsy) have the same probability of prostate cancer as patients who have had a confirmative test. The probability of having prostate cancer based on patients with confirmative tests is used to infer the cancer state of patients without confirmative tests.
In the resulting information matrix, the rows of $Q_t(\ell_t \mid s_t)$ correspond to states $NC$, $C$, $T$, and $D$, respectively; the columns correspond to PSA intervals [0, 1), [1, 2.5), [2.5, 4), [4, 7), [7, 10), [10, ∞), $T$, and $D$, respectively.
$Q_t(\ell_t \mid s_t)$ is fixed for all ages in this empirical study, since our numerical experiments showed that differences in $Q_t(\ell_t \mid s_t)$ with respect to age do not significantly influence the optimal policy.
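The verification-bias correction described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it stratifies by PSA interval only (the text also stratifies by age), the function name and all counts are hypothetical, and every group is assumed to contain at least one biopsied patient.

```python
def corrected_information_rows(n_total, n_biopsied, n_positive):
    """Return (P(interval | NC), P(interval | C)) corrected for verification bias.

    Within each PSA interval, the cancer probability observed among biopsied
    patients is applied to all patients in that interval, biopsied or not.
    """
    p_cancer = [pos / bio for pos, bio in zip(n_positive, n_biopsied)]
    cancer = [n * p for n, p in zip(n_total, p_cancer)]
    no_cancer = [n * (1.0 - p) for n, p in zip(n_total, p_cancer)]
    q_c = [x / sum(cancer) for x in cancer]
    q_nc = [x / sum(no_cancer) for x in no_cancer]
    return q_nc, q_c


# Hypothetical counts per PSA interval [0,1), [1,2.5), ..., [10,inf).
N_TOTAL = [4000, 3500, 1500, 900, 300, 200]
N_BIOPSIED = [40, 70, 150, 360, 180, 150]
N_POSITIVE = [4, 10, 30, 108, 72, 90]

q_nc, q_c = corrected_information_rows(N_TOTAL, N_BIOPSIED, N_POSITIVE)
print([round(x, 3) for x in q_nc])
print([round(x, 3) for x in q_c])
```

Each returned list sums to one and plays the role of the PSA-interval portion of the NC or C row of the information matrix.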
The prostate cancer incidence rate, $w_t$ (shown in Table 2), is calculated from the prevalence of prostate cancer at autopsy from the autopsy review study (20), assuming the annual prostate cancer incidence rate is fixed for each ten-year age interval.
Table 2. The age-specific values of the prostate cancer incidence rate, $w_t$.
In the absence of screening, we assume that prostate cancer is not detected unless metastasis is discovered, in which case radical prostatectomy is no longer a good treatment option. Therefore, we assume patients in state $C$ cannot transition to state $T$ in the absence of screening. Patients in state $C$ have a higher death rate from prostate cancer. The prostate cancer death rate in the absence of screening, $e_t$, is approximated using the death rate of prostate cancer patients under conservative management (21).
We assume patients detected with prostate cancer are treated by prostatectomy. This is one of the most common forms of treatment at present (6, 22). Patients who enter the treatment state, $T$, receive expected rewards based on an aggregation of the four prostate cancer stages. In order to estimate $b_t$, the annual prostate cancer death rate excluding death from other causes for patients in state $T$, we estimated survival rates for the four prostate cancer stages using the Mayo Clinic Radical Prostatectomy Registry (MCRPR) and the Surveillance, Epidemiology and End Results (SEER) data (23) (see the accompanying table for details of data sources and values). In estimating the MCRPR transition rates after radical prostatectomy, all patients having undergone radical prostatectomy at the Mayo Clinic in the PSA era (1990–2005) were included. The patient population of 13,313 men was stratified into three TNM 2002-based stage groupings (Stage II = organ confined, Stage III = extra-prostatic, Stage IV = lymph node positive). The probability of a radical prostatectomy patient being in one of the three stage groups was calculated from the pathologic evaluation of the prostatectomy specimens obtained from each patient. The cumulative incidence (probability) of prostate cancer metastases and prostate cancer-specific mortality was calculated for each stage group using the competing risks estimator, with non-prostate cancer death representing the competing risk.
We use the four cancer stages to calculate the total annual prostate cancer death rate for state $T$. The calculation is based on the mean death rate across the four prostate cancer stages at the time of detection. We estimated the annual death rates of organ confined, extraprostatic, and lymph node positive patients from the tabulated 5, 10, 15 and 20 year death rates. (Letting $s_t$ denote the $t$-year death rate, we solve the four equations $(1 - s_1)^t = 1 - s_t$ for $t = 5, 10, 15, 20$ for organ confined, extraprostatic, and lymph node positive patients.) We estimated the annual death rate for metastases from the 5 year death rates for patients aged < 65 and ≥ 65 from the SEER data. The aggregated death rate of $T$ was then computed as the weighted average of the tabulated death rates of the four cancer stages, where the weights correspond to the tabulated probabilities of being in the different cancer stages upon detection. Based on our estimates, the annual prostate cancer death rate in state $T$ is $b_t = 0.00672$ for $t < 65$ and $b_t = 0.00923$ for $t \geq 65$ (note that the SEER data differentiates between patients younger and older than 65 years).
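The conversion from a $t$-year death rate to an annual rate, and the subsequent weighting across stages, can be sketched in Python. All numbers below are hypothetical and do not reproduce the registry or SEER values. The text does not state how the four per-horizon equations are reconciled into one annual rate, so the simple mean used here is an assumption made only for illustration.

```python
def annual_rate(cumulative_rate, years):
    """Solve (1 - s1)**years = 1 - s_t for the constant annual rate s1."""
    return 1.0 - (1.0 - cumulative_rate) ** (1.0 / years)


def stage_annual_rate(cumulative_by_horizon):
    """Average the annual rates implied by each horizon (an assumption)."""
    rates = [annual_rate(s, t) for t, s in cumulative_by_horizon.items()]
    return sum(rates) / len(rates)


def aggregate_rate(stage_rates, stage_probs):
    """Weighted average of stage-specific annual death rates."""
    if abs(sum(stage_probs) - 1.0) > 1e-9:
        raise ValueError("stage probabilities must sum to 1")
    return sum(r * p for r, p in zip(stage_rates, stage_probs))


# Hypothetical t-year death rates {years: cumulative rate} for one stage.
organ_confined = {5: 0.005, 10: 0.012, 15: 0.020, 20: 0.030}
oc_rate = stage_annual_rate(organ_confined)

# Hypothetical annual rates for the other stages and detection probabilities.
stage_rates = [oc_rate, 0.004, 0.015, 0.10]
stage_probs = [0.70, 0.20, 0.08, 0.02]
print(aggregate_rate(stage_rates, stage_probs))
```

In the paper this aggregation is carried out separately for patients younger and older than 65, because the metastasis death rate from SEER differs between the two age groups.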
In our base case, we use a decrement of $\mu = 0.05$ in the year of biopsy to estimate the disutility of biopsy. To our knowledge, no estimates of the utility decrement exist yet for prostate biopsy; however, this estimate is consistent with similar choices of parameters in a recent breast cancer study (24) and a bladder cancer biopsy study (25). In our base case we assume that the utility decrement in years after treatment via prostatectomy is 0.145, which is the midpoint of two extremes reported in Bremner et al. (26): (a) the most severe (metastases), 0.24, and (b) minor symptoms (mild sexual dysfunction), 0.05. We assume that a patient in state $C$, who has not been detected with prostate cancer, has no reduction in quality of life.
The mortality rate from other causes, $d_t$, is age specific and based on the general mortality rate from Heron (27) minus the prostate cancer mortality rate from the National Cancer Institute (23). Note that the National Cancer Institute (23) reports a single prostate cancer mortality rate for ages greater than 95, and Heron (27) reports a single all-cause mortality rate for ages greater than 95. Therefore, we assume that $d_t$ is fixed after the age of 95. All other parameters and their sources are provided in Table 3.
Table 3. Parameters, their sources, and specific values used in our base case analysis.
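The other-cause mortality rate $d_t$ described above is a simple difference of two age-specific rates, held constant at the oldest ages. A minimal Python sketch follows; the function name and the rates are hypothetical, and using the `cap_age` entry to stand in for the open-ended oldest age group is a simplification of how the source tables are organized.

```python
def other_cause_mortality(age, all_cause, prostate, cap_age=95):
    """d_t = all-cause mortality minus prostate cancer mortality.

    Ages above cap_age reuse the cap_age entry, so d_t is fixed beyond it.
    """
    key = min(age, cap_age)
    return all_cause[key] - prostate[key]


# Hypothetical annual mortality rates by age (not the published values).
ALL_CAUSE = {93: 0.20, 94: 0.22, 95: 0.25}
PROSTATE = {93: 0.010, 94: 0.012, 95: 0.015}

for age in (93, 95, 100):
    print(age, round(other_cause_mortality(age, ALL_CAUSE, PROSTATE), 3))
```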