presents estimates of the odds ratios and corresponding 95% confidence and credible intervals for the first stage covariates for the pseudo-likelihood method and the Bayesian method, respectively (columns 1 and 2). It also includes results obtained from the pseudo-likelihood method under the assumption that the random effects β are exchangeable (corresponding to no second stage covariates), and the maximum likelihood estimates obtained from an ordinary logistic regression (first stage model with fixed effects only). Estimates for the confounder variables included in the first stage model are very similar for all of these methods, and are not sensitive to different prespecified values of τ2 for the Semi-Bayes pseudo-likelihood approach (data not shown), suggesting that with sufficient sample size, the methods accurately estimate these parameters.
Odds Ratios and corresponding 95% confidence intervals and credible intervals (Bayesian) for the first stage covariates under Gibbs sampling, pseudo-likelihood, and maximum likelihood methods.
In subsequent tables we present analyses of the individual variants. We include in these tables all variants observed on at least three patients and a representative selection of the remaining variants which are each observed only once or twice in the dataset. In and we present the results of four analyses based on the pseudo-likelihood method using different combinations of the higher level covariates. Models 1 through 4 all include an intercept and, respectively, no higher level covariates, functional status alone, functional status plus history of familial aggregation (GenoMEL), and finally both of these in addition to the Polyphen score and the indicator for missing Polyphen scores. The coefficients of the higher level covariates are in . All three higher level covariates appear to contain important information for predicting the risks of the individual variants, though their individual rate ratios are attenuated when they are included together in the model, due to modest collinearity between these factors.
Odds Ratios and corresponding 95% confidence intervals for the higher level covariates under pseudo-likelihood methods
Odds Ratios and corresponding 95% confidence intervals for selected variants using maximum likelihood and pseudo-likelihood methods under different second stage model scenarios.
In we see that the relative risks of the individual variants seem to be well differentiated by the second stage covariates and by the empirical case-control relative frequencies. The results for the 6 most frequent variants at the top seem to show estimators that are influenced by the observed case-control (MPM/SPM) ratios of counts. That is, high odds ratios (both statistically significant) are estimated for the two variants that have a strong preponderance of MPM cases, namely −34 G>T, despite the fact that this variant is non-functional, and c.301 G>T, although this variant is both functional and has been observed in melanoma families also. Null or negative associations are estimated for the other 4 of these relatively frequent variants (despite the fact that c.8_9 is a “functional” variant). The remaining variants with 3 or fewer occurrences seem to be strongly influenced by the hierarchical covariates. For example, c.159 G>C has a baseline relative risk estimate of 2.2, but this rises to 2.7 on the basis of its classification as functional (Model 2), to 4.5 on the basis of its additional classification as a “familial” variant, and to 6.5 on the basis of its high Polyphen score. Conversely, c.170 C>T has a relatively high relative risk of 4.7 based on its classification as functional, but absence of evidence of prior family clustering and a low Polyphen score reduces this estimate to 1.6. Confidence intervals are generally wide for these rarer mutations. However, it is clearly possible to achieve individual statistical significance, an example being c.334 C>G where the designation of functionality, the fact that this variant has been observed in melanoma families, and the fact that both carriers are cases is sufficient, after drawing strength from the aggregating powers of the model, to conclude with statistical significance that this variant confers a high risk.
It is of interest to compare the results of ordinary logistic regression (MLE) with those from Model 1, which is in essence a random effects model with no higher level covariates. It is noticeable that the more extreme odds ratio estimates from the logistic regression are attenuated substantially in the random effects setting (), notably those at −34 G>T, c.301 G>T and c.123 G>A, a familiar shrinkage phenomenon of empirical Bayes techniques.
contrasts the results from the Bayesian (Gibbs sampling) and pseudo-likelihood approaches. The results from the two methods are generally similar, though the Bayesian approach seems to lead to more extreme estimates in several cases, and generally wider intervals (note that the Bayesian method produces a larger estimate for τ than the pseudo-likelihood method: 1.9 versus 1.2). One limitation of running the Gibbs sampler in this scenario is that due to the sparseness of the data the procedure is computationally intensive. For the GEM data, on an Intel Pentium 4 CPU 3.06 GHz workstation it takes 4 minutes in WinBUGS (1.35 minutes in OpenBUGS on Linux) per each additional iteration, thus needing almost 67 (23) hours per one thousand iterations. To ensure convergence of the Markov chain, one would typically allow it to run for a large number of iterations, a task that may be infeasible depending on the data set. In contrast, the pseudo-likelihood method implemented by the GLIMMIX macro converges and produces results in a few seconds.
Table 5 Odds Ratios and corresponding 95% confidence intervals and credible intervals (Bayesian) for selected variants and for second stage covariates under Gibbs sampling, pseudo-likelihood methods, and diagnostic tests for the Gibbs sampler (p=passed, f=failed). (more ...)
To assess convergence we used the graphical tools provided by the WinBUGS menu (trace, history, and quantile plots), but also some more formal convergence diagnostics such as the Gelman-Rubin convergence statistic [32
] and the convergence tests proposed by Heidelberger and Welch [33
]. Based on three chains, the scale reduction factors of Gelman and Rubin [32
] and Brooks and Gelman [34
] were all close to 1 suggesting satisfactory convergence. After 8500 burn-in iterations followed by 10000 iterations, all the parameters in the model passed the stationarity test of Heidelberger and Welch [33
]. However not all of these passed the half-width test, where the accuracy of the test was ε
= 0.1 (which is the CODA default). These results can be seen in for our selected set of the investigated variants.
Recognizing that hierarchical analyses can be sensitive to the choice of prior [35
], we conducted a small sensitivity analysis in which we varied the scale and shape parameters of the Inverse-Gamma prior for τ
from 0.01 to 0.0001. A Uniform(0, 100) distribution was also assumed for comparison. The choice of the scale and shape parameters of the Inverse-Gamma distribution did not change the results substantially though the relative risk estimates of the variants were slightly more variable in the case of the Inverse-Gamma(0.0001, 0.0001). The choice of a uniform prior led to somewhat more extreme estimates for the relative risks of some of the variants and generally wider credible intervals. The odds ratios and credible intervals for the first and second stage covariates were very similar regardless of the prior distribution for τ
Finally, we examined the Semi-Bayes approach by comparing the odds ratio estimates of the variants for varying pre-specified values of τ. The results show that the estimates diverge monotonically, suggesting that pre-specification of τ is not a good strategy in the absence of strong prior information about τ as it can lead to radically different estimates. As an example, c.301 G>T has an odds ratio of 3.5 for τ = 0.1, 4.6 for τ = 0.5, 6.0 for τ = 0.8, 7.0 for τ = 1, 7.3 for τ = 1.2, and 10.0 for τ = 1.9. Other variants exhibited similar trends.