Tissue Microarrays (TMAs) measure tumor-specific protein expression via high-density immunohistochemical staining assays. They provide a proteomic platform for validating cancer biomarkers emerging from large-scale DNA microarray studies. Repeated observations within each tumor result in substantial biological and experimental variability. This variability is usually ignored when associating the TMA expression data with patient survival outcome. It generates biased estimates of hazard ratio in proportional hazards models. We propose a Latent Expression Index (LEI) as a surrogate protein expression estimate in a two-stage analysis. Several estimators of LEI are compared: an Empirical Bayes (EB), a Full Bayes (FB), and a Varying Replicate Number (VRN) estimator. In addition, we jointly model survival and TMA expression data via a shared random effects model. Bayesian estimation is carried out using a Markov Chain Monte Carlo (MCMC) method. Simulation studies were conducted to compare the two-stage methods and the joint analysis in estimating the Cox regression coefficient. We show the two-stage methods reduce bias relative to the naive approach, but still lead to under-estimated hazard ratios. The joint model consistently outperforms the two-stage methods in terms of both bias and coverage property in various simulation scenarios. In case studies using prostate cancer TMA data sets, the two stage methods yields a good approximation in one data set while an insufficient one in the other. A general advice is to use the joint model inference whenever results differ between the two-stage methods and the joint analysis.

Keywords: Biomarker, Empirical Bayes, Joint modeling, Mixed effects, Tissue microarray, Varying