
Results 1-25 (39)

1.  Surrogate Endpoint Analysis: An Exercise in Extrapolation 
Surrogate endpoints offer the hope of smaller or shorter cancer trials. It is, however, important to realize they come at the cost of an unverifiable extrapolation that could lead to misleading conclusions. With cancer prevention, the focus is on hypothesis testing in small surrogate endpoint trials before deciding whether to proceed to a large prevention trial. However, it is not generally appreciated that a small surrogate endpoint trial is highly sensitive to a deviation from the key Prentice criterion needed for the hypothesis-testing extrapolation. With cancer treatment, the focus is on estimation using historical trials with both surrogate and true endpoints to predict treatment effect based on the surrogate endpoint in a new trial. Successively leaving out one historical trial and computing the predicted treatment effect in the left-out trial yields a standard error multiplier that summarizes the increased uncertainty in estimation extrapolation. If this increased uncertainty is acceptable, three additional extrapolation issues (biological mechanism, treatment following observation of the surrogate endpoint, and side effects following observation of the surrogate endpoint) need to be considered. In summary, when using surrogate endpoint analyses, an appreciation of the problems of extrapolation is crucial.
PMCID: PMC3611854  PMID: 23264679
2.  The risky reliance on small surrogate endpoint studies when planning a large prevention trial 
The definitive evaluation of treatment to prevent a chronic disease with low incidence in middle age, such as cancer or cardiovascular disease, requires a trial with a large sample size of perhaps 20,000 or more. To help decide whether to implement a large true endpoint trial, investigators first typically estimate the effect of treatment on a surrogate endpoint in a trial with a greatly reduced sample size of perhaps 200 subjects. If investigators reject the null hypothesis of no treatment effect in the surrogate endpoint trial, they implicitly assume they would likely correctly reject the null hypothesis of no treatment effect for the true endpoint. Surrogate endpoint trials are generally designed with adequate power to detect an effect of treatment on the surrogate endpoint. However, we show that a small surrogate endpoint trial is more likely than a large surrogate endpoint trial to give a misleading conclusion about the beneficial effect of treatment on the true endpoint, which can lead to a faulty (and costly) decision about implementing a large true endpoint prevention trial. If a small surrogate endpoint trial rejects the null hypothesis of no treatment effect, an intermediate-sized surrogate endpoint trial could be a useful next step in the decision-making process for launching a large true endpoint prevention trial.
PMCID: PMC3616635  PMID: 23565041
Cancer prevention; Cardiovascular disease; Prentice criterion; Principal stratification; Sample size calculation; Surrogate endpoint
3.  Causal inference, probability theory, and graphical insights 
Statistics in medicine  2013;32(25):4319-4330.
Causal inference from observational studies is a fundamental topic in biostatistics. The causal graph literature typically views probability theory as insufficient to express causal concepts in observational studies. In contrast, the view here is that probability theory is a desirable and sufficient basis for many topics in causal inference for the following two reasons. First, probability theory is generally more flexible than causal graphs: besides explaining such causal graph topics as M-bias (adjusting for a collider) and bias amplification and attenuation (when adjusting for an instrumental variable), probability theory is also the foundation of the paired availability design for historical controls, which does not fit into a causal graph framework. Second, probability theory is the basis for insightful graphical displays including the BK-Plot for understanding Simpson’s paradox with a binary confounder, the BK2-plot for understanding bias amplification and attenuation in the presence of an unobserved binary confounder, and the PAD-Plot for understanding the principal stratification component of the paired availability design.
PMCID: PMC4072761  PMID: 23661231
BK-Plot; causal graph; confounder; instrumental variable; observational study; Simpson’s paradox
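The Simpson's paradox setting named in the abstract above can be illustrated with plain probability arithmetic. The following is a minimal sketch, not the BK-Plot itself: the counts are illustrative and not taken from the paper, and the names `counts`, `stratum_rate`, and `marginal_rate` are hypothetical. It shows a treatment that looks better within each level of a binary confounder yet worse when the levels are pooled.

```python
# Illustrative counts only (not from the paper): (successes, total)
# for each treatment within each level of a binary confounder z.
counts = {
    "A": {"z0": (81, 87), "z1": (192, 263)},
    "B": {"z0": (234, 270), "z1": (55, 80)},
}


def stratum_rate(treatment, stratum):
    """P(success | treatment, confounder level)."""
    successes, total = counts[treatment][stratum]
    return successes / total


def marginal_rate(treatment):
    """P(success | treatment), pooling over the confounder."""
    successes = sum(s for s, _ in counts[treatment].values())
    total = sum(n for _, n in counts[treatment].values())
    return successes / total


# Compare the two treatments within each stratum, then pooled.
for z in ("z0", "z1"):
    print(z, round(stratum_rate("A", z), 3), round(stratum_rate("B", z), 3))
print("pooled", round(marginal_rate("A"), 3), round(marginal_rate("B"), 3))
```

The reversal arises because the pooled rate is a weighted average of the stratum rates, and the weights (the distribution of the confounder) differ between the two treatments.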
4.  Early reporting for cancer screening trials 
Journal of medical screening  2008;15(3):122-129.
Many cancer screening trials involve a screening programme of one or more screenings with follow-up after the last screening. Usually a maximum follow-up time is selected in advance. However, during the follow-up period there is an opportunity to report the results of the trial sooner than planned. Early reporting of results from a randomized screening trial is important because obtaining a valid result sooner translates into health benefits reaching the general population sooner. The health benefits are reduction in cancer deaths if screening is found to be beneficial and more screening is recommended, or avoidance of unnecessary biopsies, work-ups and morbidity if screening is not found to be beneficial and the rate of screening drops.
Our proposed method for deciding if results from a cancer screening trial should be reported earlier in the follow-up period is based on considerations involving postscreening noise. Postscreening noise (sometimes called dilution) refers to cancer deaths in the follow-up period that could not have been prevented by screening: (1) cancer deaths in the screened group that occurred after the last screening in subjects whose cancers were not detected during the screening program and (2) cancer deaths in the control group that occurred after the time of the last screening and whose cancers would not have been detected during the screening programme had they been randomized to screening (the number of which is unobserved). Because postscreening noise increases with follow-up after the last screening, we propose early reporting at the time during the follow-up period when postscreening noise first starts to overwhelm the estimated effect of screening as measured by a z-statistic. This leads to a confidence interval, adjusted for postscreening noise, that would not change substantially with additional follow-up. Details of the early reporting rule were refined by simulation, which also accounts for multiple looks.
For the re-analysis of the Health Insurance Plan trial for breast cancer screening and the Mayo Lung Project for lung cancer screening, estimates and confidence intervals for the effect of screening on cancer mortality were similar on early reporting and later.
The proposed early reporting rule for a cancer screening trial with post-screening follow-up is a promising method for making results from the trial available sooner, which translates into health benefits (reduction in cancer deaths or avoidance of unnecessary morbidity) reaching the population sooner.
PMCID: PMC2586667  PMID: 18927094
5.  Gene Signatures Revisited 
PMCID: PMC3283539  PMID: 22262869
6.  Evaluation of markers and risk prediction models: Overview of relationships between NRI and decision-analytic measures 
For the evaluation and comparison of markers and risk prediction models, various novel measures have recently been introduced as alternatives to the commonly used difference in the area under the ROC curve (ΔAUC). The Net Reclassification Improvement (NRI) is increasingly popular to compare predictions with one or more risk thresholds, but decision-analytic approaches have also been proposed.
We aimed to identify the mathematical relationships between novel performance measures for the situation that a single risk threshold T is used to classify patients as having the outcome or not.
We considered the NRI and three utility-based measures that take misclassification costs into account: difference in Net Benefit (ΔNB), difference in Relative Utility (ΔRU), and weighted NRI (wNRI). We illustrate the behavior of these measures in 1938 women suspected of having ovarian cancer (prevalence 28%).
The three utility-based measures appear to be transformations of each other, and hence always lead to consistent conclusions. On the other hand, conclusions may differ when using the standard NRI, depending on the adopted risk threshold T, prevalence P and the obtained differences in sensitivity and specificity of the two models that are compared. In the case study, adding the CA-125 tumor marker to a baseline set of covariates yielded a negative NRI yet a positive value for the utility-based measures.
The decision-analytic measures are each appropriate to indicate the clinical usefulness of an added marker or compare prediction models, since these measures each reflect misclassification costs. This is of practical importance as these measures may thus adjust conclusions based on purely statistical measures. A range of risk thresholds should be considered in applying these measures.
PMCID: PMC4066820  PMID: 23313931
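To make the contrast in the abstract above concrete, here is a small sketch of how the NRI and ΔNB can disagree at a single risk threshold T. It assumes the usual net benefit formulation (true positives minus false positives weighted by the odds T/(1 − T)) and the single-threshold NRI as the sum of the changes in sensitivity and specificity. The prevalence of 28% comes from the abstract; the threshold and the sensitivity and specificity values are hypothetical, as are the function names.

```python
def net_benefit(sens, spec, prevalence, threshold):
    """Net benefit per person at risk threshold T: true positives minus
    false positives weighted by the odds T / (1 - T)."""
    odds = threshold / (1.0 - threshold)
    return prevalence * sens - (1.0 - prevalence) * (1.0 - spec) * odds


def nri_single_threshold(sens_old, spec_old, sens_new, spec_new):
    """With one threshold, the NRI reduces to the change in sensitivity
    plus the change in specificity."""
    return (sens_new - sens_old) + (spec_new - spec_old)


# Hypothetical operating points for a baseline model and for a model
# with an added marker; only the prevalence comes from the abstract.
prevalence, threshold = 0.28, 0.10
old = {"sens": 0.85, "spec": 0.60}
new = {"sens": 0.90, "spec": 0.52}

nri = nri_single_threshold(old["sens"], old["spec"], new["sens"], new["spec"])
delta_nb = (net_benefit(new["sens"], new["spec"], prevalence, threshold)
            - net_benefit(old["sens"], old["spec"], prevalence, threshold))
print(f"NRI = {nri:+.3f}, delta NB = {delta_nb:+.4f}")
```

With these hypothetical inputs the NRI is negative while ΔNB is positive: at a low threshold a false positive is weighted lightly relative to a true positive, so the gain in sensitivity outweighs the larger loss in specificity.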
7.  Systems Biology and Cancer: Promises and Perils 
Systems biology uses systems of mathematical rules and formulas to study complex biological phenomena. In cancer research there are three distinct threads in systems biology research: modeling biology or biophysics with the goal of establishing plausibility or obtaining insights, modeling based on statistics, bioinformatics, and reverse engineering with the goal of better characterizing the system, and modeling with the goal of clinical predictions. Using illustrative examples we discuss these threads in the context of cancer research.
PMCID: PMC3156977  PMID: 21419159
bioinformatics; microarrays; reverse engineering; receiver operating characteristic curves
8.  Designing a Randomized Clinical Trial to Evaluate Personalized Medicine: A New Approach Based on Risk Prediction 
We define personalized medicine as the administration of treatment only to persons thought most likely to benefit, typically those at high risk for mortality or another detrimental outcome. To evaluate personalized medicine, we propose a new design for a randomized trial that makes efficient use of high-throughput data (such as gene expression microarrays) and clinical data (such as tumor stage) collected at baseline from all participants. Under this design for a randomized trial involving experimental and control arms with a survival outcome, investigators first estimate the risk of mortality in the control arm based on the high-throughput and clinical data. Then investigators use data from both randomization arms to estimate the effect of treatment both among all participants and among participants in the highest prespecified category of risk. This design requires only an 18.1% increase in sample size compared with a standard randomized trial. A trial based on this design that has 90% power to detect a realistic increase in survival from 70% to 80% among all participants would also have 90% power to detect an increase in survival from 50% to 73% in the highest quintile of risk.
PMCID: PMC2994862  PMID: 21044964
10.  Putting Risk Prediction in Perspective: Relative Utility Curves 
Risk prediction models based on medical history or results of tests are increasingly common in the cancer literature. An important use of these models is to make treatment decisions on the basis of estimated risk. The relative utility curve is a simple method for evaluating risk prediction in a medical decision-making framework. Relative utility curves have three attractive features for the evaluation of risk prediction models. First, they put risk prediction into perspective because relative utility is the fraction of the expected utility of perfect prediction obtained by the risk prediction model at the optimal cut point. Second, they do not require precise specification of harms and benefits because relative utility is plotted against a summary measure of harms and benefits (ie, the risk threshold). Third, they are easy to compute from standard tables of data found in many articles on risk prediction. An important use of relative utility curves is to evaluate the addition of a risk factor to the risk prediction model. To illustrate an application of relative utility curves, an analysis was performed on previously published data involving the addition of breast density to a risk prediction model for invasive breast cancer.
PMCID: PMC2778669  PMID: 19843888
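The description of relative utility in the abstract above (the fraction of the expected utility of perfect prediction achieved by the model) can be sketched in net benefit terms. This is a minimal sketch under stated assumptions rather than the paper's own computation: it takes the sensitivity and specificity at the cut point used for risk threshold T as given, and it measures the gain over the better of the two default strategies (treat all or treat none). The function name and the operating points are hypothetical.

```python
def relative_utility(sens, spec, prevalence, threshold):
    """Gain in net benefit over the best default strategy, as a fraction
    of the gain that perfect prediction would achieve."""
    odds = threshold / (1.0 - threshold)
    nb_model = prevalence * sens - (1.0 - prevalence) * (1.0 - spec) * odds
    nb_treat_all = prevalence - (1.0 - prevalence) * odds
    nb_default = max(0.0, nb_treat_all)  # treat none has net benefit 0
    nb_perfect = prevalence              # sens = spec = 1
    return (nb_model - nb_default) / (nb_perfect - nb_default)


# Hypothetical operating points: risk threshold -> (sensitivity, specificity).
operating_points = {0.05: (0.95, 0.30), 0.10: (0.85, 0.55), 0.20: (0.60, 0.80)}
prevalence = 0.10  # hypothetical

for t, (sens, spec) in operating_points.items():
    print(t, round(relative_utility(sens, spec, prevalence, t), 3))
```

Plotting these values against the risk threshold gives one relative utility curve; repeating the calculation for a model with an added risk factor gives a second curve for comparison.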
11.  Improving the Biomarker Pipeline to Develop and Evaluate Cancer Screening Tests 
The biomarker pipeline to develop and evaluate cancer screening tests has three stages: identification of promising biomarkers for the early detection of cancer, initial evaluation of biomarkers for cancer screening, and definitive evaluation of biomarkers for cancer screening. Statistical and biological issues to improve this pipeline are discussed. Although various recommendations, such as identifying cases based on clinical symptoms, keeping biomarker tests simple, and adjusting for postscreening noise, have been made previously, they are not widely known. New recommendations include more frequent specimen collection to help identify promising biomarkers and the use of the paired availability design with interval cases (symptomatic cancers detected in the interval after screening) for initial evaluation of biomarkers for cancer screening.
PMCID: PMC2728744  PMID: 19574417
12.  Using relative utility curves to evaluate risk prediction 
Because many medical decisions are based on risk prediction models constructed from medical history and results of tests, the evaluation of these prediction models is important. This paper makes five contributions to this evaluation: (1) the relative utility curve, which gauges the potential for better prediction in terms of utilities, without the need for a reference level for one utility, while providing a sensitivity analysis for misspecification of utilities, (2) the relevant region, which is the set of values of prediction performance consistent with the recommended treatment status in the absence of prediction, (3) the test threshold, which is the minimum number of tests that would be traded for a true positive in order for the expected utility to be non-negative, (4) the evaluation of two-stage predictions that reduce test costs, and (5) connections among various measures of prediction performance. An application involving the risk of cardiovascular disease is discussed.
PMCID: PMC2804257  PMID: 20069131
decision analysis; decision curve; receiver operating characteristic curve; utility
13.  Predictive Accuracy of the Liverpool Lung Project Risk Model for Stratifying Patients for Computed Tomography Screening for Lung Cancer 
Annals of internal medicine  2012;157(4):242-250.
External validation of existing lung cancer risk prediction models is limited. Using such models in clinical practice to guide the referral of patients for computed tomography (CT) screening for lung cancer depends on external validation and evidence of predicted clinical benefit.
To evaluate the discrimination of the Liverpool Lung Project (LLP) risk model and demonstrate its predicted benefit for stratifying patients for CT screening by using data from 3 independent studies from Europe and North America.
Case–control and prospective cohort study.
Europe and North America.
Participants in the European Early Lung Cancer (EUELC) and Harvard case–control studies and the LLP population-based prospective cohort (LLPC) study.
5-year absolute risks for lung cancer predicted by the LLP model.
The LLP risk model had good discrimination in both the Harvard (area under the receiver-operating characteristic curve [AUC], 0.76 [95% CI, 0.75 to 0.78]) and the LLPC (AUC, 0.82 [CI, 0.80 to 0.85]) studies and modest discrimination in the EUELC (AUC, 0.67 [CI, 0.64 to 0.69]) study. The decision utility analysis, which incorporates the harms and benefit of using a risk model to make clinical decisions, indicates that the LLP risk model performed better than smoking duration or family history alone in stratifying high-risk patients for lung cancer CT screening.
The model cannot assess whether including other risk factors, such as lung function or genetic markers, would improve accuracy. Lack of information on asbestos exposure in the LLPC limited the ability to validate the complete LLP risk model.
Validation of the LLP risk model in 3 independent external data sets demonstrated good discrimination and evidence of predicted benefits for stratifying patients for lung cancer CT screening. Further studies are needed to prospectively evaluate model performance and evaluate the optimal population risk thresholds for initiating lung cancer screening.
Primary Funding Source
Roy Castle Lung Cancer Foundation.
PMCID: PMC3723683  PMID: 22910935
14.  Development Tracks for Cancer Prevention Markers 
Disease markers  2004;20(2):97-102.
We provide a general framework for describing various roles for biomarkers in cancer prevention research (early detection, surrogate endpoint, and cohort identification for primary prevention) and the phases in their evaluation.
PMCID: PMC3839325  PMID: 15322317
15.  Predicting treatment effect from surrogate endpoints and historical trials: an extrapolation involving probabilities of a binary outcome or survival to a specific time 
Biometrics  2011;68(1):248-257.
Using multiple historical trials with surrogate and true endpoints, we consider various models to predict the effect of treatment on a true endpoint in a target trial in which only a surrogate endpoint is observed. This predicted result is computed using (1) a prediction model (mixture, linear, or principal stratification) estimated from historical trials and the surrogate endpoint of the target trial and (2) a random extrapolation error estimated from successively leaving out each trial among the historical trials. The method applies to either binary outcomes or survival to a particular time that is computed from censored survival data. We compute a 95% confidence interval for the predicted result and validate its coverage using simulation. To summarize the additional uncertainty from using a predicted instead of true result for the estimated treatment effect, we compute its multiplier of standard error. Software is available for download.
PMCID: PMC3218246  PMID: 21838732
Randomized trials; Reproducibility; Principal stratification
16.  Clarifying the Role of Principal Stratification in the Paired Availability Design 
The paired availability design for historical controls postulated four classes corresponding to the treatment (old or new) a participant would receive if arrival occurred during either of two time periods associated with different availabilities of treatment. These classes were later extended to other settings and called principal strata. Judea Pearl asks if principal stratification is a goal or a tool and lists four interpretations of principal stratification. In the case of the paired availability design, principal stratification is a tool that falls squarely into Pearl's interpretation of principal stratification as “an approximation to research questions concerning population averages.” We describe the paired availability design and the important role played by principal stratification in estimating the effect of receipt of treatment in a population using data on changes in availability of treatment. We discuss the assumptions and their plausibility. We also introduce the extrapolated estimate to make the generalizability assumption more plausible. By showing why the assumptions are plausible we show why the paired availability design, which includes principal stratification as a key component, is useful for estimating the effect of receipt of treatment in a population. Thus, for our application, we answer Pearl's challenge to clearly demonstrate the value of principal stratification.
PMCID: PMC3114955  PMID: 21686085
principal stratification; causal inference; paired availability design
17.  Estimation and Inference for the Causal Effect of Receiving Treatment on a Multinomial Outcome: An Alternative Approach 
Biometrics  2011;67(1):319-323.
Recently Cheng (Biometrics, 2009) proposed a model for the causal effect of receiving treatment when there is all-or-none compliance in one randomization group, with maximum likelihood estimation based on convex programming. We discuss an alternative approach that involves a model for all-or-none compliance in two randomization groups and estimation via a perfect fit or an EM algorithm for count data. We believe this approach is easier to implement, which would facilitate the reproduction of calculations.
PMCID: PMC3030650  PMID: 20560933
All-or-none compliance; Causal effect; Multinomial outcomes; Noncompliance; Perfect fit; Principal stratification; Randomized trials
19.  Transparency and reproducibility in data analysis: the Prostate Cancer Prevention Trial 
Biostatistics (Oxford, England)  2010;11(3):413-418.
With the analysis of complex, messy data sets, the statistics community has recently focused attention on “reproducible research,” namely research that can be readily replicated by others. One standard that has been proposed is the availability of data sets and computer code. However, in some situations, raw data cannot be disseminated for reasons of confidentiality or because the data are so messy as to make dissemination impractical. For one such situation, we propose 2 steps for reproducible research: (i) presentation of a table of data and (ii) presentation of a formula to estimate key quantities from the table of data. We illustrate this strategy in the analysis of data from the Prostate Cancer Prevention Trial, which investigated the effect of the drug finasteride versus placebo on the period prevalence of prostate cancer. With such an important result at stake, a transparent analysis was important.
PMCID: PMC2883301  PMID: 20173101
Categorical data; Maximum likelihood; Missing data; Multinomial–Poisson transformation; Propensity-to-be-missing score; Randomized trials
20.  Simple and flexible classification of gene expression microarrays via Swirls and Ripples 
BMC Bioinformatics  2010;11:452.
A simple classification rule with few genes and parameters is desirable when applying a classification rule to new data. One popular simple classification rule, diagonal discriminant analysis, yields linear or curved classification boundaries, called Ripples, that are optimal when gene expression levels are normally distributed with the appropriate variance, but may yield poor classification in other situations.
A simple modification of diagonal discriminant analysis yields smooth, highly nonlinear classification boundaries, called Swirls, that sometimes outperform Ripples. In particular, if the data are normally distributed with different variances in each class, Swirls substantially outperform Ripples when using a pooled variance to reduce the number of parameters. The proposed classification rule for two classes selects either Swirls or Ripples after parsimoniously selecting the number of genes and distance measures. Applications to five cancer microarray data sets identified predictive genes related to the tissue organization theory of carcinogenesis.
The parsimonious selection of classifiers coupled with the selection of either Swirls or Ripples provides a good basis for formulating a simple, yet flexible, classification rule. Open source software is available for download.
PMCID: PMC2949887  PMID: 20825641
21.  Using microarrays to study the microenvironment in tumor biology: The crucial role of statistics 
Seminars in cancer biology  2008;18(5):305-310.
Microarrays represent a potentially powerful tool for better understanding the role of the microenvironment on tumor biology. To make the best use of microarray data and avoid incorrect or unsubstantiated conclusions, care must be taken in the statistical analysis. To illustrate the statistical issues involved we discuss three microarray studies related to the microenvironment and tumor biology involving: (i) prostatic stroma cells in cancer and non-cancer tissues; (ii) breast stroma and epithelial cells in breast cancer patients and non-cancer patients; and (iii) serum associated with wound response and stroma in cancer patients. Using these examples we critically discuss three types of analyses: differential gene expression, cluster analysis, and class prediction. We also discuss design issues.
PMCID: PMC2584335  PMID: 18455427
Bonferroni; Class prediction; Cluster analysis; Differential expression; False discovery rate; Sample size
22.  Plausibility of stromal initiation of epithelial cancers without a mutation in the epithelium: a computer simulation of morphostats 
BMC Cancer  2009;9:89.
There is experimental evidence from animal models favoring the notion that the disruption of interactions between stroma and epithelium plays an important role in the initiation of carcinogenesis. These disrupted interactions are hypothesized to be mediated by molecules, termed morphostats, which diffuse through the tissue to determine cell phenotype and maintain tissue architecture.
We developed a computer simulation based on simple properties of cell renewal and morphostats.
Under the computer simulation, the disruption of the morphostat gradient in the stroma generated epithelial precursors of cancer without any mutation in the epithelium.
The model is consistent with the possibility that the accumulation of genetic and epigenetic changes found in tumors could arise after the formation of a founder population of aberrant cells, defined as cells that are created by low or insufficient morphostat levels and that no longer respond to morphostat concentrations. Because the model is biologically plausible, we hope that these results will stimulate further experiments.
PMCID: PMC2663766  PMID: 19309499
23.  Paradoxes in carcinogenesis: New opportunities for research directions 
BMC Cancer  2007;7:151.
The prevailing paradigm in cancer research is the somatic mutation theory that posits that cancer begins with a single mutation in a somatic cell followed by successive mutations. Much cancer research involves refining the somatic mutation theory with an ever increasing catalog of genetic changes. The problem is that such research may miss paradoxical aspects of carcinogenesis for which there is no likely explanation under the somatic mutation theory. These paradoxical aspects offer opportunities for new research directions that should not be ignored.
Various paradoxes related to the somatic mutation theory of carcinogenesis are discussed: (1) the presence of large numbers of spatially distinct precancerous lesions at the onset of promotion, (2) the large number of genetic instabilities found in hyperplastic polyps not considered cancer, (3) spontaneous regression, (4) higher incidence of cancer in patients with xeroderma pigmentosum but not in patients with other comparable defects in DNA repair, (5) lower incidence of many cancers except leukemia and testicular cancer in patients with Down's syndrome, (6) cancer developing after normal tissue is transplanted to other parts of the body or next to stroma previously exposed to carcinogens, (7) the lack of tumors when epithelial cells exposed to a carcinogen were transplanted next to normal stroma, and (8) the development of cancers when Millipore filters of various pore sizes were inserted under the skin of rats, but only if the holes were sufficiently small. For the last paradox, a microarray experiment is proposed to try to better understand the phenomenon.
The famous physicist Niels Bohr said "How wonderful that we have met with a paradox. Now we have some hope of making progress." The same viewpoint should apply to cancer research. It is easy to ignore this piece of wisdom about the means to advance knowledge, but we do so at our peril.
PMCID: PMC1993836  PMID: 17683619
24.  Selecting patients for randomized trials: a systematic approach based on risk group 
Trials  2006;7:30.
A key aspect of randomized trial design is the choice of risk group. Some trials include patients from the entire at-risk population, others accrue only patients deemed to be at increased risk. We present a simple statistical approach for choosing between these approaches. The method is easily adapted to determine which of several competing definitions of high risk is optimal.
We treat eligibility criteria for a trial, such as a smoking history, as a prediction rule associated with a certain sensitivity (the number of patients who have the event and who are classified as high risk divided by the total number of patients who have an event) and specificity (the number of patients who do not have an event and who do not meet criteria for high risk divided by the total number of patients who do not have an event). We then derive simple formulae to determine the proportion of patients receiving intervention, and the proportion who experience an event, where either all patients or only those at high risk are treated. We assume that the relative risk associated with intervention is the same over all choices of risk group. The proportions of events and interventions are combined using a net benefit approach, and net benefit is compared between strategies.
We applied our method to the design of a trial of adjuvant therapy after prostatectomy. We were able to demonstrate that treating a high-risk group was superior to treating all patients, choose the optimal definition of high risk, and test the robustness of our results by sensitivity analysis. Our results had a ready clinical interpretation that could immediately aid trial design.
The choice of risk group in randomized trials is usually based on rather informal methods. Our simple method demonstrates that this decision can be informed by simple statistical analyses.
PMCID: PMC1609186  PMID: 17022818
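The quantities described in the abstract above follow from sensitivity, specificity, the untreated event rate, and a constant relative risk. The sketch below is one way to write them down. The net benefit weighting (events prevented minus a fixed harm per treated patient, `harm_per_treatment`) is an assumed form for illustration, not necessarily the paper's exact formula, and all input values and names are hypothetical.

```python
def strategy_summary(event_rate, sens, spec, relative_risk):
    """Return (proportion treated, proportion with an event) per strategy,
    assuming the same relative risk in every risk group."""
    treated_high = event_rate * sens + (1 - event_rate) * (1 - spec)
    # Events among treated high-risk patients are reduced by the relative
    # risk; events among untreated (low-risk) patients are not.
    events_high = event_rate * (sens * relative_risk + (1 - sens))
    return {
        "treat_none": (0.0, event_rate),
        "treat_all": (1.0, event_rate * relative_risk),
        "treat_high_risk": (treated_high, events_high),
    }


def net_benefit(treated, events, event_rate, harm_per_treatment):
    """Events prevented relative to treating no one, minus an assumed
    harm (in event-equivalents) for each patient treated."""
    return (event_rate - events) - harm_per_treatment * treated


# Hypothetical inputs for illustration.
event_rate, sens, spec, relative_risk, harm = 0.20, 0.80, 0.70, 0.50, 0.10
summary = strategy_summary(event_rate, sens, spec, relative_risk)
benefits = {name: net_benefit(treated, events, event_rate, harm)
            for name, (treated, events) in summary.items()}
for name, value in benefits.items():
    print(name, round(value, 3))
```

Competing definitions of high risk can be compared by calling `strategy_summary` once for each candidate pair of sensitivity and specificity, and a sensitivity analysis amounts to varying `harm_per_treatment` and `relative_risk`.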
25.  Identifying genes that contribute most to good classification in microarrays 
BMC Bioinformatics  2006;7:407.
The goal of most microarray studies is either the identification of genes that are most differentially expressed or the creation of a good classification rule. The disadvantage of the former is that it ignores the importance of gene interactions; the disadvantage of the latter is that it often does not provide a sufficient focus for further investigation because many genes may be included by chance. Our strategy is to search for classification rules that perform well with few genes and, if they are found, identify genes that occur relatively frequently under multiple random validation (random splits into training and test samples).
We analyzed data from four published studies related to cancer. For classification we used a filter with a nearest centroid rule that is easy to implement and has been previously shown to perform well. To comprehensively measure classification performance we used receiver operating characteristic curves. In the three data sets with good classification performance, the classification rules for 5 genes were only slightly worse than for 20 or 50 genes and somewhat better than for 1 gene. In two of these data sets, one or two genes had relatively high frequencies not noticeable with rules involving 20 or 50 genes: desmin for classifying colon cancer versus normal tissue; and zyxin and secretory granule proteoglycan genes for classifying two types of leukemia.
Using multiple random validation, investigators should look for classification rules that perform well with few genes and select, for further study, genes with relatively high frequencies of occurrence in these classification rules.
PMCID: PMC1574352  PMID: 16959042
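The multiple random validation strategy in the abstract above can be sketched as a loop over random training and test splits that records how often each gene is selected. This is a simplified illustration, not the authors' implementation: the filter (largest absolute difference in class means) is an assumed choice, accuracy stands in for the receiver operating characteristic curves used in the paper, the data are synthetic, and all function names are hypothetical.

```python
import random
from collections import Counter


def centroid(rows, genes):
    """Mean expression of the given genes over a list of samples."""
    return [sum(r[g] for r in rows) / len(rows) for g in genes]


def top_genes(class0, class1, k):
    """Filter step: keep the k genes with the largest absolute
    difference in class means (an assumed filter)."""
    all_genes = range(len(class0[0]))
    m0, m1 = centroid(class0, all_genes), centroid(class1, all_genes)
    return sorted(all_genes, key=lambda g: abs(m0[g] - m1[g]), reverse=True)[:k]


def nearest_centroid_label(x, genes, c0, c1):
    """Assign the class whose centroid is closer in squared distance."""
    d0 = sum((x[g] - c) ** 2 for g, c in zip(genes, c0))
    d1 = sum((x[g] - c) ** 2 for g, c in zip(genes, c1))
    return 0 if d0 <= d1 else 1


def multiple_random_validation(samples, labels, k, n_splits, rng):
    """Return gene selection counts and mean test accuracy over splits."""
    gene_counts, accuracies = Counter(), []
    indices = list(range(len(samples)))
    for _ in range(n_splits):
        rng.shuffle(indices)
        half = len(indices) // 2
        train, test = indices[:half], indices[half:]
        class0 = [samples[i] for i in train if labels[i] == 0]
        class1 = [samples[i] for i in train if labels[i] == 1]
        if not class0 or not class1:
            continue  # skip a degenerate split with an empty class
        genes = top_genes(class0, class1, k)
        gene_counts.update(genes)
        c0, c1 = centroid(class0, genes), centroid(class1, genes)
        correct = sum(nearest_centroid_label(samples[i], genes, c0, c1) == labels[i]
                      for i in test)
        accuracies.append(correct / len(test))
    return gene_counts, sum(accuracies) / len(accuracies)


# Synthetic data (hypothetical): 40 samples, 30 genes, only gene 0 informative.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
samples = [[rng.gauss(5.0 * y if g == 0 else 0.0, 1.0) for g in range(30)]
           for y in labels]
gene_counts, mean_accuracy = multiple_random_validation(
    samples, labels, k=5, n_splits=50, rng=rng)
print(gene_counts.most_common(3), round(mean_accuracy, 3))
```

Genes that recur across most splits, as gene 0 should here, are the candidates the abstract suggests selecting for further study; genes that appear only occasionally are more likely to have been included by chance.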
