All projects with serum-based measurements use a single shared nested case-control sample of participants to evaluate associations of risk factors with the risk of prostate cancer, so that each subject provides a more complete biomarker and genetic profile for evaluating the joint effects of these putative factors. In addition, several projects are using prostate tissue from the biopsy cores of both cases and controls, and prostatectomy tissue from cases. These studies use a variety of designs and laboratory methods to address questions about somatic mutation, inflammation, focal atrophy, proliferation and growth factor signaling. Below we highlight features of the case-control study design, including the definition of cases and controls.
Selection of nested case-control design
Data from the PCPT, a randomized clinical trial, can be thought of as coming from a prospective cohort, with data collected on exposures before diagnosis of cancer being used to predict prostate cancer risk. However, even though this is a cohort study, a time-to-event analysis is inappropriate. Because the PCPT had interim events, and the detection of those interim events was biased between the arms due to the enhanced sensitivity of PSA, DRE and needle biopsy in the finasteride group, the only method to determine whether a participant was free of prostate cancer was an end-of-study biopsy that assessed the endpoint in an unbiased fashion.
As a result of this detection method, our endpoints are not time-to-event; rather, they are simply presence or absence of prostate cancer by the end of seven years of follow-up, as it was unknown when many of the cancers would have been detected without the end-of-study biopsy. We therefore define the outcome in the cohort study as the 7-year period prevalence of prostate cancer found either at a for-cause procedure or an end-of-study biopsy. Because the tissue, serum, and genetic assays are expensive, the prospective nested case-control study design was selected in order to efficiently examine a sample of men without prostate cancer.
Definition of prostate cancer cases
The primary case definition for this program is biopsy-proven presence of prostate cancer. From an initial pool of 2,401 potential cases, cases were excluded from this project because the prostate cancer was detected after the trial was unblinded (n=173), because it was detected outside the established timeframe of the seven-year end-of-study biopsy (n=91), or because an adequate baseline serum sample was not available (n=328). This resulted in a total of 1,809 cases of prostate cancer with available baseline serum included in this project. Approximately 40% of our cases are interim cases and 60% are cases detected at the end-of-study biopsy. All analyses examining prostate cancer cases are stratified by Gleason score, with high grade cancer defined as Gleason score 7–10 (n=498; 218 in placebo arm, 280 in finasteride arm) and low grade cancer defined as Gleason score 2–6 (n=1233; 782 in placebo arm, 451 in finasteride arm). Although the subsequent analyses described above (4–7) have tempered concerns regarding the excess of high grade cancer diagnosed in men receiving finasteride, all projects continue to examine high grade cancers given the clinical relevance of high grade disease and the urgent need to develop markers of aggressive disease that go beyond Gleason score. In addition, given the relatively small numbers of Gleason score 8–10 cancers, it was not possible to draw definitive conclusions regarding the potential of finasteride to affect this important subgroup.
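The case counts above can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the text; the variable names are illustrative and are not taken from the study's own data files.

```python
# Counts as reported in the text; names are hypothetical labels.
initial_pool = 2401
exclusions = {
    "detected_after_unblinding": 173,
    "outside_end_of_study_window": 91,
    "no_adequate_baseline_serum": 328,
}

# Cases remaining after all three exclusions.
cases = initial_pool - sum(exclusions.values())

# Gleason strata as (placebo, finasteride) counts.
gleason = {"high_7_10": (218, 280), "low_2_6": (782, 451)}
graded = {name: sum(arms) for name, arms in gleason.items()}

# Cases that fall in neither reported Gleason stratum.
not_in_either_stratum = cases - sum(graded.values())

print(cases, graded, not_in_either_stratum)
```

The exclusions reproduce the 1,809 cases and the arm-specific counts sum to the reported stratum totals. The two strata together account for 1,731 cases; the text does not say how the remaining 78 are classified.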
Two endpoints often used in epidemiologic studies of prostate cancer, stage and prostate cancer-specific death, cannot be examined in the proposed project research because nearly all of the men diagnosed with prostate cancer in the PCPT had early stage disease (i.e., 98% were T1 or T2). This is most likely due to the requirement for serum PSA levels of ≤ 3 ng/mL and normal DRE at randomization, and to annual PSA and DRE testing.
Definition of controls
Controls were randomly selected in a 1:1 ratio from men who completed the end-of-study prostate biopsy and had no evidence of prostate cancer. Controls include all eligible non-white men (primarily African-Americans and Hispanics) to better support exploratory analyses in these subgroups. The remaining controls were frequency matched to cases on the distributions of age at randomization (in 5-year intervals), first-degree family history of prostate cancer, and intervention arm; age and family history are established risk factors for prostate cancer. Because non-white men were oversampled and controls were matched on the factors listed above, the estimated race-specific odds ratios for prostate cancer are not interpretable.
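The control selection described above can be sketched in code. This is a minimal illustration, not the study's actual sampling program: the record fields (`age`, `family_history`, `arm`, `race`) are hypothetical, and the rule used to fill each stratum (top up to the case count after counting the oversampled non-white men) is an assumption about how the two steps combine.

```python
import random
from collections import Counter

def age_band(age, width=5):
    """Lower bound of the 5-year interval containing age (e.g. 57 -> 55)."""
    return (age // width) * width

def stratum(person):
    """Matching stratum: age band, family history, intervention arm."""
    return (age_band(person["age"]), person["family_history"], person["arm"])

def select_controls(cases, eligible, seed=0):
    """Take all eligible non-white men, then frequency match the rest."""
    rng = random.Random(seed)
    # Step 1: include every eligible non-white man.
    selected = [p for p in eligible if p["race"] != "white"]
    # Step 2: per-stratum shortfall relative to the case distribution
    # (assumed rule; counts may go negative and are then ignored).
    needed = Counter(stratum(c) for c in cases)
    needed.subtract(Counter(stratum(p) for p in selected))
    pool = {}
    for p in eligible:
        if p["race"] == "white":
            pool.setdefault(stratum(p), []).append(p)
    for key, n in needed.items():
        if n > 0:
            candidates = pool.get(key, [])
            # Random draw without replacement within the stratum.
            selected.extend(rng.sample(candidates, min(n, len(candidates))))
    return selected
```

Matching on strata rather than on individuals keeps the control distribution of age, family history and arm close to that of the cases without requiring a specific partner for each case.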
Because it is impossible to sample the entire prostate using needle biopsies, some men with prostate cancer may be misclassified as controls. However, the controls in this program will yield a much lower rate of misclassification of prostate cancer disease status than is typically seen in epidemiologic studies using cumulative incidence sampling of controls. This reduced misclassification is especially important when the expected associations for some of the research questions being addressed are in the small to moderate range (e.g., SNPs and prostate cancer).
The baseline demographics and potential prostate cancer risk factors (diabetes, body mass index (BMI), physical activity, and smoking status) for the cases and controls, as well as for the men without cancer in the PCPT who were eligible to be controls but were not chosen, are presented in Table 2. Also presented are the characteristics of the men in the PCPT who had an endpoint evaluated (i.e., either had an interim prostate cancer or had an end-of-study biopsy within the required timeframe) and those who were not evaluated. As shown, the men who were evaluated for an endpoint were younger, more likely to have had a family history of prostate cancer and tended to be Caucasian. Redman et al. reported that, after accounting for the nonrandom missing biopsy results, the true rate of cancer would have been slightly lower in both intervention arms, but the relative risk was minimally changed (4).
| Table 2. The baseline demographics and potential prostate cancer risk factors |
Because the study population for this project is a frequency-matched case-control sample with specific oversampling of non-white men and men with a positive family history, men in the project are older and have a higher baseline PSA than those not in the project. Additionally, there are more biopsies on the placebo arm than on the finasteride arm.
Overall analytical approach
The approach for the analysis of the primary aims for each of the projects is to apply the same methods across all projects when possible. This includes using the same set of cases and controls and definitions of exposure variables and endpoints. The preliminary analyses produce standard tables and refinements are made to address issues specific to individual projects and/or analyses. For the serum analyses, this has been relatively straightforward to implement once agreement among the investigators was achieved. Because the tissue analyses are done in differing subsets, the analyses are more project-specific.
Each biological analyte is assessed separately in each of the intervention (placebo and finasteride) groups. If there is no evidence of an interaction between the exposure analyte and intervention, the two arms are pooled. This increases the sample size and greatly increases the power to identify associations.
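The decision to pool can be illustrated with a simple two-table version of the interaction test. The sketch below is an assumption-laden simplification: it treats the exposure as binary, tests homogeneity of the arm-specific odds ratios with a Wald test on the difference in log odds ratios, and pools with inverse-variance (Woolf) weights. The projects' actual models may differ, for example by fitting a logistic regression with an exposure-by-arm interaction term.

```python
import math
from statistics import NormalDist

def log_or_and_var(a, b, c, d):
    """Log odds ratio and its variance for one 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

def interaction_p_value(placebo, finasteride):
    """Two-sided Wald p-value for a difference in log ORs between arms."""
    l1, v1 = log_or_and_var(*placebo)
    l2, v2 = log_or_and_var(*finasteride)
    z = (l1 - l2) / math.sqrt(v1 + v2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def pooled_or(placebo, finasteride):
    """Inverse-variance weighted (Woolf) pooled odds ratio across arms."""
    l1, v1 = log_or_and_var(*placebo)
    l2, v2 = log_or_and_var(*finasteride)
    w1, w2 = 1 / v1, 1 / v2
    return math.exp((w1 * l1 + w2 * l2) / (w1 + w2))
```

A typical use would compute `interaction_p_value` first and report `pooled_or` only when there is no evidence that the association differs by arm; otherwise the arm-specific estimates are reported separately.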
In the design phase, each investigator consulted with the statisticians to determine the achievable power for each of the primary and cross-project aims given a sample size of 1,809 cases and 1,809 controls and their pre-specified, clinically meaningful differences. Power was calculated for the main effects, intervention interactions and effect modification by genetic variation. Tables 3–5 provide some examples of the minimally detectable odds ratios with 80% power and a 2-sided alpha of 0.05.
| Table 3. Minimally detectable odds ratio |
| Table 5. Minimum detectable odds ratios of prostate cancer and high grade disease |
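To show how a minimally detectable odds ratio of this kind can be derived, the sketch below uses a standard normal approximation for comparing exposure prevalence between cases and controls. It is not the calculation used for the tables: the binary exposure, the control prevalence `p0`, and the function names are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_for_or(p0, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a 2-sided test comparing exposure prevalence.

    p0 is the (assumed) exposure prevalence among controls.
    """
    # Exposure prevalence among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)

def minimum_detectable_or(p0, n_cases=1809, n_controls=1809,
                          target=0.80, alpha=0.05):
    """Smallest OR above 1 reaching the target power, found by bisection."""
    lo, hi = 1.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if power_for_or(p0, mid, n_cases, n_controls, alpha) < target:
            lo = mid
        else:
            hi = mid
    return hi

if __name__ == "__main__":
    for prevalence in (0.05, 0.10, 0.25, 0.50):
        print(prevalence, round(minimum_detectable_or(prevalence), 3))
```

Under this approximation the detectable odds ratio grows as the exposure becomes rarer, which is the reason power considerations also drive the choice of variant frequencies in the genetic aims.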
Most cross-project interactive aims take the analytic form of tests of interactions: between genes, between genes and metabolic and behavioral measures, and between metabolic and behavioral measures. These cross-project collaborations are feasible because of both the large number of cases and the coordination of study designs (including sample selection) across projects. There is also reasonable power for the tests of the cross-project aims. In general, if odds ratios comparing high to low quartiles of an exposure differ by a ratio of about 2 between two groups (e.g., by polymorphism), there is 80% power to detect such a difference.
Another focus of our analyses is to maximize the ability to address questions related to race/ethnicity and prostate cancer risk. African-American (AA) men have an elevated risk of prostate cancer, and there is some evidence that this may be linked to genetic characteristics (e.g., short AR CAG repeat length, CYP3A4 polymorphisms) and higher serum androgen levels (19). Extensive efforts were made to recruit minorities into the PCPT; however, only 8% of participants were non-white and only 4% were AA. We have oversampled controls to include all non-white men, and race/ethnicity is used as a stratification variable in exploratory analyses. Although limited in power to address hypotheses stratified by race/ethnicity, this program includes the largest AA cohort (216 controls and 90 cases) that we are aware of in comprehensive molecular epidemiologic studies of prostate cancer.
For the genetic aims, the program investigators are primarily using a candidate gene and SNP approach. Several criteria were used to select the specific genes and polymorphisms to be assayed in the different pathways of interest. Data on functional significance, or data demonstrating a significant role in prostate or any other cancer, were the primary criteria used for selection. Other criteria included SNPs that result in amino acid changes in the protein (i.e., nonsynonymous SNPs) and changes in promoter regions and splice sites. In general, variants with a minor allele frequency (MAF) of 5% or higher were selected in order to have good statistical power. The major exception is Project 1, where a number of very low frequency variants in the SRD5A2 gene will be studied because of their known functional significance and relevance to the finasteride intervention. Additional genotyping will be done on tagging SNPs for key genes. Each project prioritized which tagging SNPs would be genotyped based on its knowledge of the gene's variants and the number that would be needed for coverage.
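The MAF criterion mentioned above can be made concrete with a short sketch. The genotype counts and SNP labels below are invented for illustration and do not come from the study; only the 5% threshold is taken from the text.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """MAF from genotype counts at a biallelic SNP."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    freq_alt = (2 * n_hom_alt + n_het) / total_alleles
    # The minor allele is whichever of the two alleles is less common.
    return min(freq_alt, 1 - freq_alt)

def passes_maf_filter(counts, threshold=0.05):
    """True when the SNP meets the MAF threshold (5% in the text)."""
    return minor_allele_frequency(*counts) >= threshold

# Hypothetical genotype counts: (homozygous ref, heterozygous, homozygous alt).
candidate_snps = {
    "snp_common": (810, 180, 10),
    "snp_rare": (980, 20, 0),
}
selected = [name for name, counts in candidate_snps.items()
            if passes_maf_filter(counts)]
```

A filter of this kind would be bypassed for variants chosen on functional grounds, as with the low frequency SRD5A2 variants described above.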
With extensive serum measurements and genotyping from several different projects, the program provides the opportunity to study the complexity of potential joint effects within and between projects. A number of methods will be explored, including logistic regression adjusting for “upstream” effects and potential interactions, and Classification and Regression Trees (CART) or Logic Regression to identify prognostic groups based on combinations of SNPs or other biomarkers. Bayesian model averaging, which draws inferences about particular effects while taking into account the uncertainty about the correct model form, will also be performed. Other planned analytic methods are a pharmacokinetic approach, which attempts to use physiologically-based representations of the biochemical pathways to statistically model the dependencies between factors, and pathway analysis, which provides estimates of hypothesized causal connections between variables. All of these methodologies rely on interdisciplinary collaboration. These models are useful not only to identify the complex joint effects on prostate cancer risk but also to identify groups of men who are either more or less likely to benefit from the chemopreventive effects of finasteride.
At the time of the development of this project, it was clear that the statistical analysis for genetic polymorphisms was a rapidly evolving field. Over the course of the project, improved and less expensive methods have become available, which have permitted SNP analyses to be conducted at lower cost while providing flexibility to select new candidate SNPs and haplotype tagging SNPs.