


Int J Biostat. 2009 January 1; 5(1): 18.

Published online 2009 May 29. doi: 10.2202/1557-4679.1129

PMCID: PMC2827886

Copyright © 2009 The Berkeley Electronic Press. All rights reserved

In the simultaneous estimation of a large number of related quantities, multilevel models provide a formal mechanism for efficiently making use of the ensemble of information for deriving individual estimates. In this article we investigate the ability of the likelihood to identify the relationship between signal and noise in multilevel linear mixed models. Specifically, we consider the ability of the likelihood to diagnose conjugacy or independence between the signals and noises. Our work was motivated by the analysis of data from high-throughput experiments in genomics. The proposed model leads to a more flexible family. However, we further demonstrate that adequately capitalizing on the benefits of a well fitting fully-specified likelihood in terms of gene ranking is difficult.

Multilevel models (Carlin and Louis, 2000) are widely used for the simultaneous estimation of an ensemble of related quantities. In recent years, they have been applied successfully in situations where the number of quantities to be estimated simultaneously is very high. For example, estimands could be associated with each of the zip codes in a nation, or each of the genes in an organism. In these applications, it can be useful to rely on parametric models that allow for fast computation. From this standpoint a very convenient default approach has been to choose a so-called conjugate multilevel model (see Ando and Kaufman, 1965; Lindley and Smith, 1972). However, this approach can be too restrictive, as it imposes specific constraints on the relationship between sources of variation at different levels of the model. Moreover, it is typically applied without validating these assumptions. In this paper we explore the ability of the likelihood to detect specific departures from conjugacy. Furthermore, we propose a general class of multilevel models that highlights a particularly important continuous scale of departures from conjugacy. We investigate the potential robustness offered by estimating the conjugacy relationship as a component of model building.

Our development is motivated by the analysis of data from gene expression microarrays, and is illustrated in that context, although the techniques can be applied much more broadly. These experiments study the expression levels of thousands of genes across experimental conditions. Because the number of replicates is typically small, the borrowing of strength afforded by multilevel models can be critical, and such models are widely applied (see Cui et al., 2005; Baldi and Long, 2001; Newton et al., 2001; Lönnstedt and Speed, 2002; Ibrahim et al., 2002; Parmigiani et al., 2002; Su et al., 2007). While commonly used, the assumption of conjugacy is often violated by the data, as shown in Liu et al. (2004). On the other hand, the large number of genes potentially gives one the opportunity to check this assumption and reliably fit more complex models.

To provide an idea of the nature of the generalization investigated in this article, consider experiments that compare two groups across several units of interest, such as genes. Following Liu et al. (2004), we focus on three parameters of interest. The first is the mean difference in the response between the two groups for each unit, referred to as the unit-specific signal; the second is the overall expression for a gene across the groups, called the unit-specific abundance; and the third is the standard deviation of the expression measure for a unit, called the unit-specific noise. Throughout we assume the noise is common across the two groups being compared.

Generally, conjugacy refers to the property that both the prior and the posterior distribution of these parameters, conditional on hyperparameters, belong to the same family, in this case a normal-gamma. In this context, to achieve conjugacy one needs to assume a) independence of the signal to noise ratio and the noise, and b) independence of the abundance to noise ratio and the noise. Liu et al. (2004) consider the impact of relaxing these assumptions in favor of independence between the signals and the noises and between the abundances and the noises. In this manuscript, we present a class of distributions that formally estimates the relationship between the signal and noise. In addition, our formulation allows for separate variances on the distribution of signals and abundances and the possibility of a positive correlation between the signals and abundances. All analyses and simulations were performed in the statistical language R (Ihaka and Gentleman, 1996). Software for fitting can be downloaded from http://www.biostat.jhsph.edu/~bcaffo/downloads.html

The paper is laid out as follows. In Section 2 we present the new model while Section 3 gives an easily implemented expectation maximization (EM) fitting algorithm. In Section 4 we evaluate the performance of the power conjugate model in a simulation study. In Section 5 we analyze data from three lung cancer studies using the model. Finally, Section 6 presents a summary discussion.

We develop the model in general though focus our examples and discussion on two group comparisons. Let *y _{g}* be a vector of length *J* of responses for unit *g*, and assume

$${y}_{g}|{\beta}_{g},{\lambda}_{g}\hspace{0.17em}\sim \hspace{0.17em}\text{N}(X{\beta}_{g},\hspace{0.17em}\hspace{0.17em}{\lambda}_{g}I)$$

(1)

where *β _{g}* is a (*p ×* 1) vector of unit-specific regression coefficients, *X* is the design matrix, and *λ _{g}* is the unit-specific error variance.

While such a model is generally appropriate for one or a few units, applications such as gene expression arrays require hundreds or thousands of models such as (1). In this setting gene-specific maximum likelihood is wastefully ignorant of the aggregate information contained across units. Therefore, we specify a hierarchical model by assuming

$${\beta}_{g}|{\lambda}_{g},\hspace{0.17em}\delta ,\hspace{0.17em}\mu ,\hspace{0.17em}F\hspace{0.17em}\sim \hspace{0.17em}\text{N}(\mu ,\hspace{0.17em}\hspace{0.17em}F{\lambda}_{g}^{\delta})\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\text{and}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}{\lambda}_{g}|\nu ,\hspace{0.17em}\hspace{0.17em}\tau \hspace{0.17em}\sim \hspace{0.17em}\text{IG}(\nu ,\hspace{0.17em}\hspace{0.17em}\tau ),$$

(2)

where IG is shorthand for the inverted gamma density,

$$\frac{{\tau}^{\nu}}{\Gamma (\nu )}{\lambda}_{g}^{-\nu -1}\text{exp}(-\tau /{\lambda}_{g}).$$

Here *μ* is the (*p ×* 1) inter-unit mean of the *β _{g}* and *F* is a (*p × p*) covariance matrix; the exponent *δ* governs the dependence of the *β _{g}* on the noise *λ _{g}*.
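To make the two-stage structure concrete, models (1) and (2) can be simulated directly. The sketch below uses the two-group design of Section 2 and arbitrary illustrative parameter values (not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative settings: G units, two-group design with J/2 replicates per group
G, J = 1000, 8
X = np.column_stack([np.ones(J), np.repeat([0.5, -0.5], J // 2)])  # as in Section 2
p = X.shape[1]

mu = np.array([0.0, 0.0])           # inter-unit mean of the beta_g
F = np.array([[1.0, 0.3],           # inter-unit covariance [[F_0, F_01], [F_01, F_1]]
              [0.3, 0.5]])
nu, tau = 3.0, 2.0                  # inverted-gamma hyperparameters
delta = 1.0                         # conjugacy parameter: 1 = conjugate, 0 = independence

# lambda_g ~ IG(nu, tau): draw Gamma(shape=nu, rate=tau) and invert
lam = 1.0 / rng.gamma(shape=nu, scale=1.0 / tau, size=G)

# beta_g | lambda_g ~ N(mu, F * lambda_g**delta)
L = np.linalg.cholesky(F)
beta = mu + (lam[:, None] ** (delta / 2)) * (rng.standard_normal((G, p)) @ L.T)

# y_g | beta_g, lambda_g ~ N(X beta_g, lambda_g I)
y = beta @ X.T + np.sqrt(lam)[:, None] * rng.standard_normal((G, J))
print(y.shape)  # (1000, 8)
```

Setting `delta = 0` recovers the independence model, in which the `beta` draws no longer depend on `lam`.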

In this manuscript, we explore the ability of the marginal likelihood (obtained by integrating over *β _{g}* and *λ _{g}*) to estimate *δ*.

To illustrate this parameter, consider Figure 1, which displays an example of the conditional density (in gray scale) of *β _{g}* given *λ _{g}* for varying values of *δ*.

Illustration of the impact of *δ*. Plotted is the conditional distribution of a scalar random-effect slope parameter, *β*, with mean value 0, conditional on varying values of *λ*. Gray scales represent the height of the density.

The rate of increase of the conditional variability of *β _{g}* for given noise gets larger with *δ*.

Recall that the conditional variation in the *β _{g}* increases with *λ _{g}* whenever *δ* > 0; larger values of *δ* therefore produce heavier-tailed marginal distributions for the *β _{g}*, as illustrated in Figure 2.

Illustration of the impact of *δ*. Plots of the marginal density of *β* for differing assumptions regarding *δ.*

In this article we focus on two group comparisons. For example,

$$X=\left(\begin{array}{cc}1& .5\\ \vdots & \vdots \\ 1& .5\\ 1& -.5\\ \vdots & \vdots \\ 1& -.5\end{array}\right)\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}{\beta}_{g}=\left(\begin{array}{l}{\beta}_{0g}\\ {\beta}_{1g}\end{array}\right)\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\mu =\left(\begin{array}{l}{\mu}_{0}\\ {\mu}_{1}\end{array}\right)\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}F=\left(\begin{array}{cc}{F}_{0}& {F}_{01}\\ {F}_{01}& {F}_{1}\end{array}\right).$$

Here we refer to *β _{0g}* as the gene-specific abundance and *β _{1g}* as the gene-specific signal.

A closely related modeling approach is presented in Liu et al. (2004). In particular, their model specified that the conditional variance of *β _{g}* given *λ _{g}* is

$$\left(\begin{array}{cc}{F}_{0}{\lambda}_{g}^{{\delta}_{0}}& 0\\ 0& {F}_{1}{\lambda}_{g}^{{\delta}_{1}}\end{array}\right).$$

(3)

They subsequently characterized the model with (*δ*_{0}, *δ*_{1}) = (0, 0) as complete independence, and the model with (*δ*_{0}, *δ*_{1}) = (1, 1) as complete conjugacy. These are special cases of model (2) with *δ* = 0 and 1 respectively and *F*_{01} = 0. However, they also considered the cases (*δ*_{0}, *δ*_{1}) = (0, 1) and (*δ*_{0}, *δ*_{1}) = (1, 0), which they referred to as independence-conjugate and conjugate-independence models respectively. Thus, their model explicitly allows for separate conjugacy relationships for the abundances and the signals.

Our approach differs markedly from that study by: (*i*) allowing the correlation, *F*_{01}, to be non-zero; (*ii*) focusing on maximum likelihood estimation and fully specified models rather than real-time moment-based statistics; and (*iii*) assuming that *δ _{0}* = *δ _{1}* = *δ*, while allowing this common *δ* to be estimated on a continuous scale rather than fixed at 0 or 1.

We discuss computational fitting with regard to model (2). Maximum likelihood estimates of *F*, *μ*, *ν* and *τ* are obtained by maximizing the marginal likelihood

$$\int f({y}_{g}|{\beta}_{g},\hspace{0.17em}\hspace{0.17em}{\lambda}_{g})\hspace{0.17em}f({\beta}_{g}|{\lambda}_{g};\delta ,\hspace{0.17em}\hspace{0.17em}\mu ,\hspace{0.17em}\hspace{0.17em}F)\hspace{0.17em}f({\lambda}_{g};\hspace{0.17em}\nu ,\hspace{0.17em}\hspace{0.17em}\tau )d{\beta}_{g}d{\lambda}_{g}$$

(4)

where *f* denotes a generic density. We label these maxima $\widehat{F}$, $\widehat{\mu}$, $\widehat{\nu}$ and $\widehat{\tau}$. We suggest calculating these maxima using the EM algorithm. It will be seen that this choice results in considerable simplification in calculations. In practice, we have seen that the convergence of the algorithm is quite acceptable, usually requiring only a few minutes. While not quite real-time in the sense of Liu et al. (2004), we find that the algorithm is generally stable and requires virtually no user input.

When *δ* = 1, $\widehat{F}$, $\widehat{\mu}$, $\widehat{\nu}$ and $\widehat{\tau}$ are easily obtained using EM, since all of the relevant integrals are tractable. For arbitrary *δ*, the relevant integrals for applying EM or marginal maximization are not tractable and hence we apply numerical integration techniques based on Gaussian quadrature. Below, we outline the fitting. One important note is that it is much easier to calculate a profile likelihood for *δ* rather than incorporating maximization with respect to *δ* into the M step of the EM algorithm. In general, rather than maximizing the profile likelihood numerically, we suggest using a grid search. This is largely because we have found that a fine grid for *δ* is not necessary.
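The profile grid search can be sketched by brute force: integrate *β _{g}* out analytically (its conditional distribution is normal) and *λ _{g}* out numerically, then evaluate the marginal log likelihood on a grid of *δ* values. The following is a minimal illustration, not the paper's implementation: it uses a scalar intercept-only version of the model, a small simulated data set, and arbitrary fixed values for the remaining hyperparameters rather than EM updates.

```python
import numpy as np
from scipy import stats, integrate

rng = np.random.default_rng(1)

def marginal_loglik(y, delta, mu, F, nu, tau):
    """Sum over units of log f(y_g), integrating lambda_g out by quadrature.
    Intercept-only sketch: y_g | lambda ~ N(mu 1, F lambda**delta 11' + lambda I)."""
    G, J = y.shape
    ones = np.ones((J, J))
    total = 0.0
    for g in range(G):
        def integrand(lam):
            cov = F * lam**delta * ones + lam * np.eye(J)
            return (stats.multivariate_normal.pdf(y[g], mean=np.full(J, mu), cov=cov)
                    * stats.invgamma.pdf(lam, a=nu, scale=tau))
        val, _ = integrate.quad(integrand, 0.0, np.inf, limit=100)
        total += np.log(val)
    return total

# Simulate a small data set under delta = 1 (illustrative values)
G, J, mu, F, nu, tau = 20, 4, 0.0, 1.0, 4.0, 3.0
lam = 1.0 / rng.gamma(nu, 1.0 / tau, size=G)
beta = mu + np.sqrt(F * lam) * rng.standard_normal(G)
y = beta[:, None] + np.sqrt(lam)[:, None] * rng.standard_normal((G, J))

# Coarse profile over delta with the remaining parameters held fixed
grid = [0.0, 0.5, 1.0, 1.5]
profile = [marginal_loglik(y, d, mu, F, nu, tau) for d in grid]
best = grid[int(np.argmax(profile))]
```

In practice the remaining hyperparameters would be re-maximized (via EM) at each grid value of *δ*, which is what makes the coarse grid attractive.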

Before we discuss the details of the fitting algorithm, we briefly discuss EM. A more complete introduction can be found in Lange (1999). To put this problem into the framework of EM, we consider the *y _{g}* as the “observed data”, the (*β _{g}*, *λ _{g}*) as the “missing data”, and the collection (*y _{g}*, *β _{g}*, *λ _{g}*) as the “complete data”.

The (*t* + 1)st parameter estimates are obtained by separately maximizing the expected complete data log-likelihood: equation (7) in the Appendix with respect to *μ* and *F*, and equation (8) with respect to *ν* and *τ*. Benefits of applying EM are that (7) and (8) can be maximized independently and that (7) has the closed form solutions

$${\mu}^{(t+1)}=\sum _{g=1}^{G}{E}_{t}^{*}[{\beta}_{g}]/G$$

and

$${F}^{(t+1)}=\sum _{g=1}^{G}{E}_{t}^{*}[({\beta}_{g}-{\mu}^{(t+1)})({\beta}_{g}-{\mu}^{(t+1)})\text{'}{\lambda}_{g}^{-\delta}]$$

provided *F* is unstructured. The notation,
${E}_{t}^{*}$, is defined in Appendix A. Closed form solutions for *μ* and *F* for fixed (*δ*_{0}, *δ*_{1}) also exist if model (3) is employed.
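In code, the two closed-form updates amount to simple averages of E-step output. The sketch below assumes the per-unit posterior expectations have already been computed and are passed in as arrays (the interface and names are ours, for illustration); the *F* update uses the standard expansion of the quadratic form so that only unconditional-style moments are needed.

```python
import numpy as np

def m_step_mu_F(E_b, E_bw, E_bbw, E_w):
    """Closed-form M-step for mu and an unstructured F (a sketch of the
    maximizers of (7)). Inputs are per-unit posterior expectations:
      E_b   : (G, p)    E*[beta_g]
      E_bw  : (G, p)    E*[beta_g * lambda_g**-delta]
      E_bbw : (G, p, p) E*[beta_g beta_g' * lambda_g**-delta]
      E_w   : (G,)      E*[lambda_g**-delta]
    """
    mu_new = E_b.mean(axis=0)
    # Expand E*[(b - mu)(b - mu)' w] =
    #   E*[b b' w] - mu E*[b w]' - E*[b w] mu' + mu mu' E*[w]
    m_bw = E_bw.mean(axis=0)
    F_new = (E_bbw.mean(axis=0)
             - np.outer(mu_new, m_bw) - np.outer(m_bw, mu_new)
             + np.outer(mu_new, mu_new) * E_w.mean())
    return mu_new, F_new
```

As a sanity check, with *δ* = 0 (all weights equal to 1) and degenerate expectations these updates reduce to the sample mean and the biased sample covariance of the *β _{g}*.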

Maximizing (8) with respect to *ν* and *τ* is equivalent to maximizing a likelihood of independent, identically distributed gamma variates, for which we employ an algorithm from Johnson et al. (1995). The previous parameter estimates, *μ*^{(t)} and *F*^{(t)}, enter this maximization only through the conditional expectations ${E}_{t}^{*}$.
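The maximization of (8) in fact reduces to a one-dimensional root-finding problem: for fixed *ν* the maximizer is *τ* = *ν*/S₁, where S₁ = Σ_g E*[λ_g⁻¹]/G, and substituting back leaves the monotone profile equation log *ν* − ψ(*ν*) = S₀ + log S₁ with S₀ = Σ_g E*[log λ_g]/G. The helper below is our sketch of this alternative to the Johnson et al. (1995) algorithm; in the check, exact simulated λ_g stand in for the conditional expectations purely to verify that known hyperparameters are recovered.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

def fit_inverted_gamma(S0, S1):
    """Maximize (8): nu*log(tau) - lgamma(nu) - nu*S0 - tau*S1,
    where S0 = avg E*[log lambda_g] and S1 = avg E*[1/lambda_g].
    For fixed nu, tau = nu / S1; profiling leaves
        log(nu) - digamma(nu) = S0 + log(S1),
    whose left side decreases monotonically from +inf to 0, so the root is
    unique whenever S0 + log(S1) > 0 (guaranteed by Jensen's inequality
    unless the lambda_g are constant)."""
    c = S0 + np.log(S1)
    nu = brentq(lambda v: np.log(v) - digamma(v) - c, 1e-8, 1e8)
    return nu, nu / S1

# Sanity check: recover known hyperparameters from a large IG(nu, tau) sample
rng = np.random.default_rng(0)
nu0, tau0 = 3.0, 2.0
lam = 1.0 / rng.gamma(nu0, 1.0 / tau0, size=200_000)
nu_hat, tau_hat = fit_inverted_gamma(np.mean(np.log(lam)), np.mean(1.0 / lam))
```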

It is possible to obtain REML-type estimates of *ν* and *τ* to avoid using this iterative algorithm within the M step. Notice that *z _{g}* = *H y _{g}* is normally distributed given *λ _{g}* with mean 0 and variance proportional to *λ _{g}*, where *H* is a suitable residual-forming matrix.

The following facts are useful for application of the algorithm:

- Since the distribution of *β _{g}*|*λ _{g}*, *y _{g}* is normal, each intractable integral can be reduced (by iterating expectations) to a univariate integral involving only *λ _{g}*.
- Only one pass through the gene index *g* is required per EM iteration. In particular, only the five quantities ${\sum}_{g=1}^{G}{E}_{t}^{*}[\text{log}{\lambda}_{g}]/G$, ${\sum}_{g=1}^{G}{E}_{t}^{*}[{\lambda}_{g}^{-1}]/G$, ${\sum}_{g=1}^{G}{E}_{t}^{*}[{\beta}_{g}]/G$, ${\sum}_{g=1}^{G}{E}_{t}^{*}[{\beta}_{g}{\lambda}_{g}^{\delta}]/G$ and ${\sum}_{g=1}^{G}{E}_{t}^{*}[{\beta}_{g}{\beta}_{g}^{\text{'}}{\lambda}_{g}^{\delta}]/G$ need to be calculated and stored, where ${E}_{t}^{*}$ is defined in the Appendix.
- Without accurate starting values, cycling through the entire gene index is unnecessary early on in the algorithm. Instead one can cycle through a random subset of genes, increasing the size of the subset as the algorithm progresses (Caffo et al., 2002).

Remaining is a discussion of how to approximate the intractable expectations. Because the distribution of *β _{g}*|*λ _{g}*, *y _{g}* is tractable, the E-step can be reduced to finding a method for approximating integrals with respect to the distribution of *λ _{g}* given *y _{g}*, for which we use Gaussian quadrature.
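A convenient way to validate any such quadrature scheme is a scalar conjugate case (*δ* = 1, one observation per unit), where the posterior of *λ _{g}* is inverted gamma in closed form and posterior expectations are known exactly. A sketch with made-up values:

```python
import numpy as np
from scipy import stats, integrate

# Illustrative scalar conjugate case (delta = 1): y | lambda ~ N(0, (1 + F) lambda)
# with lambda ~ IG(nu, tau), so lambda | y ~ IG(nu + 1/2, tau + y^2 / (2 (1 + F))).
F, nu, tau, y = 0.8, 3.0, 2.0, 1.3

def post_expect(g):
    """E[g(lambda) | y] by numerical integration of the unnormalized posterior."""
    def unnorm(lam):
        return (stats.norm.pdf(y, loc=0.0, scale=np.sqrt((1 + F) * lam))
                * stats.invgamma.pdf(lam, a=nu, scale=tau))
    num, _ = integrate.quad(lambda l: g(l) * unnorm(l), 0.0, np.inf)
    den, _ = integrate.quad(unnorm, 0.0, np.inf)
    return num / den

# Closed-form posterior expectation for comparison
a_post = nu + 0.5
scale_post = tau + y**2 / (2 * (1 + F))
exact = a_post / scale_post          # E[1/lambda | y] for IG(a, s) is a / s
approx = post_expect(lambda l: 1.0 / l)
```

For general *δ* no closed form exists, but the same integral can be approximated with a fixed Gaussian quadrature rule; checking the rule against this conjugate case is a useful diagnostic.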

In this section we evaluate model performance when estimating *δ* via an extensive simulation study. We focus on three aspects of modeling: *i*) estimation of *δ*; *ii*) the importance of modeling the covariance between *β _{0g}* and *β _{1g}*; and *iii*) the potential robustness offered by estimating *δ*. Genes are ranked using the posterior expected signal-to-noise ratio

$$E[{\beta}_{1g}/{\lambda}_{g}^{1/2}\hspace{0.17em}|\hspace{0.17em}{y}_{g},\hspace{0.17em}\hspace{0.17em}\widehat{\mu},\hspace{0.17em}\hspace{0.17em}\widehat{F},\hspace{0.17em}\hspace{0.17em}\widehat{\nu},\hspace{0.17em}\hspace{0.17em}\widehat{\tau}],$$

(5)

where the estimated parameters are plugged in after taking the expectation. In our simulation study we compare rankings based on this statistic assuming conjugacy (*δ* = 1) and independence (*δ* = 0) to those with *δ* estimated.

We assumed *G* = 1,000 or 10,000 genes with 4 or 32 replicates per group, so that *J* = 8 or 64. We considered a variety of parameter settings and simulated 100 complete data sets per setting. The following transformed parameters are most useful for summarizing the results: (*i*) the overall ratio of the standard deviations of the abundances to the signals, $\phi =\sqrt{{F}_{0}/{F}_{1}}$; (*ii*) the correlation between the signals and abundances, $\rho ={F}_{01}/{F}_{0}^{1/2}{F}_{1}^{1/2}$; (*iii*) the coefficient of variation, *CV*, of the inverted gamma distribution; and (*iv*) the mean, *M*, of the inverted gamma distribution on the *λ _{g}*. Finally, we also considered four values of *δ*. The settings were:

$$\phi \in \{.5,\hspace{0.17em}\hspace{0.17em}1,\hspace{0.17em}\hspace{0.17em}2\}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\rho \in \{0,\hspace{0.17em}\hspace{0.17em}.5\}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}CV\in \{.5,\hspace{0.17em}\hspace{0.17em}1,\hspace{0.17em}\hspace{0.17em}2\}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}M\in \{.5,\hspace{0.17em}\hspace{0.17em}1,\hspace{0.17em}\hspace{0.17em}1.5\}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\delta \in \{0,\hspace{0.17em}\hspace{0.17em}.5,\hspace{0.17em}\hspace{0.17em}1,\hspace{0.17em}\hspace{0.17em}1.5\}.$$
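These transformed settings map back to model parameters in closed form: for the inverted gamma, *M* = *τ*/(*ν* − 1) and *CV*² = 1/(*ν* − 2), so *ν* = 2 + 1/*CV*² and *τ* = *M*(*ν* − 1). A sketch of the mapping (fixing *F*₁ = 1 is our normalization for illustration, not something stated in the text):

```python
import numpy as np

def settings_to_params(phi, rho, CV, M, F1=1.0):
    """Map the simulation summaries back to model parameters (our derivation;
    fixing F1 = 1 is an assumed normalization).

    For IG(nu, tau): mean = tau / (nu - 1) and CV^2 = 1 / (nu - 2), hence
    nu = 2 + 1/CV^2 and tau = M * (nu - 1)."""
    F0 = phi**2 * F1                  # phi = sqrt(F0 / F1)
    F01 = rho * np.sqrt(F0 * F1)      # rho = F01 / sqrt(F0 F1)
    nu = 2.0 + 1.0 / CV**2
    tau = M * (nu - 1.0)
    return np.array([[F0, F01], [F01, F1]]), nu, tau

F, nu, tau = settings_to_params(phi=2.0, rho=0.5, CV=0.5, M=1.5)
# check that the inverted gamma moments round-trip
mean_ig = tau / (nu - 1)              # = 1.5
cv_ig = np.sqrt(1.0 / (nu - 2))       # = 0.5
```

Note that *CV* is only finite for *ν* > 2, which all of the settings above respect.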

The simulations were conducted using the R open source programming language (Ihaka and Gentleman, 1996). Because of the extensiveness of the simulations, we highlight the most salient results that were consistent across simulation settings.

To obtain maximum likelihood estimates for *δ*, we performed both coarse, *δ* ∈ {0, 1}, and fine, *δ* ∈ {0, .1, .2, ..., 1.5}, grid searches for each simulation setting. Table 1 shows the modal estimates of *δ* from the coarse grid search across the 100 simulations by the true value of *δ* for *G* = 1,000 and *G* = 10,000 and *J* = 8 and 64; results are collapsed over the remaining simulation parameters and combined in a single contingency table. This table highlights a consistent finding: low estimated values of *δ* tend to be accurate. When considering only conjugacy or independence, an estimated *δ* of 0 strongly suggests independence, while an estimated *δ* of 1 apparently offers less insight into the true conjugacy relationship across the simulation parameters considered. However, this conclusion is the result of the fitted model tending to prefer the conjugacy model over independence for the parameter settings chosen. Such a result may be due to the fact that the parameter settings are not entirely exchangeable; for example, fixing the remaining parameters and switching from independence to conjugacy adds marginal variability in the signals and abundances.

Simulation results for likelihood-based estimation of *δ*. For each parameter setting, the modal estimate of *δ* (either 0 or 1) across simulations was obtained. Shown above are the counts of each mode for various values of *J*, *G* and true values of *δ*.

The pattern was similar for the finer grid search for *δ.* Figure 3 shows the average estimated value of *δ* by the true value across the remaining parameters when *J* = 8 and *G* = 1,000. Again this figure suggests greater accuracy for smaller estimated values of *δ* and, in general, the preference of the model for choosing larger values of *δ.*

We compared Model (2), which estimates the correlation between signals and abundances, *F*_{01}, with the model where *F*_{01} = 0. When the correlation is assumed to be zero, we considered both estimating *δ* and fixing it at either complete conjugacy or complete independence. Unlike (3), however, we do not consider instances where the conjugacy relationship differs for the signals and abundances. We compare areas under the receiver operating characteristic curve (AUC) for detecting the largest 1%, 5% and 10% of signal-to-noise ratios across the two groups, using the median AUC across the 100 simulations for each of the parameter settings.
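The AUC for this kind of detection problem can be computed directly from ranks via the Mann–Whitney identity; a minimal self-contained sketch (our helper, not the paper's simulation code):

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    AUC = P(score of a random positive > score of a random negative).
    Ties are ignored in this sketch."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    order = scores.argsort()
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)   # 1-based ranks
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Here `scores` would be the ranking statistic (5) for each gene and `labels` would flag the genes whose true signal-to-noise ratios fall in the top 1%, 5% or 10%.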

Incorrectly assuming that the correlation between the signals and abundances is zero can have a very negative impact on results, especially for very small sample sizes. For example, consider comparing the model that performs a coarse grid search for *δ* and estimates the correlation between signals and abundances to the model that assumes the correlation is zero. When the number of observations and genes was smaller (*J* = 8 and *G* = 1,000) there were numerous instances where fixing the covariance to be zero (*F*_{01} = 0) resulted in very poor operating characteristics. Figure 4 displays the ratio of the median AUCs for the best performing model presuming a correlation versus the best performing model presuming independence. Points to the right of the figure have a higher assumed covariance between the signals and abundances. The figure shows that the penalty for incorrectly allowing a correlation is generally less than that for incorrectly assuming independence. The figure omits some 30% of the simulation settings under independence, where the models assuming independence had very poor performance, sometimes with AUCs less than 0.50; we suspect these extremely poor operating characteristics may be due to numerical instabilities, though we have no direct evidence that this is the case.

Ratios of median areas (taken across simulation iterations) under the receiver operating characteristic curve, comparing the best performing model allowing for a non-zero signal/abundance correlation to the best fitting model assuming independence.

There were instances where the best model allowing for possible correlation outperformed the assumption of independence, even when the actual correlation used for simulation was zero. We believe this is a result of the additional joint variability in the signals and abundances implied by the conjugacy relationship. However, the general pattern suggests that one is safer allowing for the possibility of a correlation rather than assuming independence, even if some higher order aspects of the model are incorrectly specified. Hence these results mirror conventional wisdom regarding random effect slopes and intercepts.

In this section we consider the potential robustness that estimating *δ* offers to misspecification. To focus the discussion, consider only the coarse search where *J* = 8 and *G* = 1,000; the remaining cases were similar, though more favorable toward estimating *δ* as the number of genes and samples increased and less favorable with the finer grid search.

In terms of the median AUC for the coarse search, the model with the estimated *δ* outperformed one of either the conjugate or independence model 23% of the time. Overwhelmingly, the instances where estimating the conjugacy relationship performed well in terms of AUC were those where *F*_{1} > *F*_{0} or *ρ* = .5. Moreover, the instances where estimating *δ* showed an improvement were those where the true model was conjugate and the model selected conjugacy. The improvements tended to be modest, with an average 1% improvement over the worst model in the cases where estimating *δ* represented the best model.

In contrast, the model assuming *δ* = 0 was the best in 50% of the simulation settings. The conjugate model, *δ* = 1, was the best in 50%, though it tied with the model with an estimated *δ* in 23% of those cases. Surprisingly, model performance was largely unaffected by the true value of *δ*. For example, *δ* = 1 in only 50% of the simulation settings where assuming conjugacy resulted in the best model. This suggests a complex interplay between the conjugacy relationship, the other model parameters, the statistic used for ranking, and ranking performance.

Because of concerns over the role that the fitting algorithm played in the simulation conclusions, we conducted a smaller scale simulation study using Gibbs sampling with diffuse priors for fitting. Again the program was coded in the R statistical programming language. Since the results from this simulation confirmed the maximum likelihood results, they are not presented.

The results of the simulation study suggest that we cannot give an unqualified recommendation for using the likelihood to estimate the conjugacy relationship. The higher order components of the model defining the conjugacy relationship are only weakly identified, even in genomic settings. The simulation results were clearer on the benefit and robustness of modeling a non-zero correlation between signals and abundances, even if one specifies an incorrect conjugacy model.

With regard to the coarse grid search for *δ*, i.e., likelihood-based model selection between the conjugate and independence models, the simulation results suggest that an estimated independence model appears more trustworthy than an estimated conjugate model. We believe this is in part due to the added marginal variability imposed by the conjugate model. That is, the simulation settings for the conjugate model had greater variability in the signals and abundances than those of the independence model. As a consequence, it appears that confirming conjugacy via the likelihood requires a larger sample. With regard to finer estimation of *δ*, because of the difficulty in estimation, a very fine grid or a numerical maximization algorithm does not seem warranted. Again, the results of the finer grid search suggested that lower values appear to be more trustworthy than larger ones.

Further complicating the discussion, there was no clear winner between the strategies “always choose conjugacy” and “always choose independence” for ranking signal-to-noise ratios. The strategy of estimating *δ* is not preferable to the strategy of “always choosing conjugacy.” However, such a statement only considers ranking and selection of the topmost signal-to-noise ratios. It appears that likelihood-based estimation of the *δ* parameter is most accurate and offers no penalty in performance when there is a large degree of correlation between the signals and abundances and when there is potential imbalance in the variance of the signals and abundances. We believe that this largely depends on the assumption of the same conjugacy relationship across the signals and abundances. For example, having low variation in the abundances allowed for a better estimate of *δ*, while having high variation in the signals increased the impact that the model assumptions had on the shrinkage of the test statistics.

The data for this example were collected from three separate studies from researchers at Harvard University (using Affymetrix Hu95a), University of Michigan (using Affymetrix HG6800) and Stanford University (using cDNA) (Bhattacharjee et al., 2001; Beer et al., 2002; Garber et al., 2001). We demonstrate that the Stanford data set has properties that would make estimating the conjugacy relationship worthwhile. Here we focus on measures of expression taken for 307 genes using cDNA microarrays. These 307 genes were selected as the most variable subset of a larger group of several thousand in an analysis that combined the three studies (Parmigiani et al., 2004) to compare survival rates over gene expression profiles. In the Stanford data set, there were 68 samples with 31 cases and 37 controls. The outcome is the log-relative expression from the two-channel array.

Figure 6 in Appendix B shows boxplots of the separately estimated signals, minus their overall means, divided by the noises to the power *δ*, for *δ* = 0, 1 and *δ* estimated using a fine grid search. Figure 7 in Appendix B replicates this plot for the separately fit abundances. Here “separately estimated” refers to taking the simple gene-specific overall means, differences in means and pooled variances. None of the models presented in this paper appear to hold for the Harvard or Michigan studies, because of a marked increase of the noises with the abundances. For the Stanford study, the assumptions of model (2) appear to hold, with a similar dependence structure between the signals and the noises and between the abundances and the noises. It is difficult, however, to determine a reasonable value for *δ* from the plot alone. Figure 5 shows the profile likelihood for *δ* (with reference lines drawn at 1/32 and 1/8; see Royall, 1997) for the Stanford data set. The model fits suggest a value between conjugacy and independence.

Boxplots of $({\widehat{\beta}}_{0g}-{\widehat{\mu}}_{0})/{\widehat{\lambda}}_{g}^{\delta /2}$ by deciles of ${\widehat{\lambda}}_{g}^{1/2}$, with ${\widehat{\beta}}_{0g}$ and ${\widehat{\lambda}}_{g}$ estimated by the gene-specific sample means and variances, for three values of *δ* for the Harvard, Michigan and Stanford studies.

Interestingly, there is significant correlation between the signals and abundances in the Stanford data set. That is, ${F}_{01}/{F}_{0}^{1/2}{F}_{1}^{1/2}$ was estimated to be –.64, which is corroborated by the estimated correlation of the separately fit signals and abundances of –.61. Also, the estimates of the between-gene abundance and signal variances (*F*_{0} and *F*_{1}, respectively) are quite different, with *F*_{0} being 56% larger than *F*_{1}.

In this article we investigated the ability of the likelihood to discern specific departures from conjugacy in multilevel models. We gave a real-data example illustrating this estimation that has some of the hallmarks of settings where the approach may work well. The proposed class of models estimates the relationship between the variation of group-specific means and the noise with which they are measured, while traditional approaches impose restrictions on this relationship. Therefore, this class allows for a wider variety of data patterns than the most common multilevel models while still retaining a completely specified likelihood. We presented an efficient EM algorithm for fitting data using this family of models.

We explored the potential of using this class of multilevel models for determining differential expression in two group comparisons of gene expression arrays. We emphasized two important modeling considerations: diagnosing the correct form of the conjugacy relationship between gene specific mean parameters and the variance parameter, and addressing correlation between signals and abundances. A potentially more robust approach for future research would employ non-parametric methods to fit more flexible models in these settings (see Newton et al., 2004, for example).

Our simulation study suggests tempered enthusiasm for likelihood-based estimation of conjugacy relationships for the purposes of gene ranking by signal-to-noise ratios. Identifying higher-order components of the model is difficult, even with the ensemble of information contained in thousands of genes. We found that estimates closer to independence tend to be more reliable than those suggesting conjugacy. We found that when there is a high correlation between signals and abundances, along with a large discrepancy in their variances, estimating the conjugacy relationship resulted in good performance. However, in these settings, the model tended to select conjugacy. Hence, there is evidence to suggest that the likelihood can select between conjugate and independence models and potentially reap the benefits of a better fitting model. However, the simulation results suggest that adequately capitalizing on a better fitting model is a difficult task. In addition, the simulation study suggests a complicated relationship between the degree of conjugacy and the remaining parameters, the statistic used, the ranking objective and performance.

One must consider the following caveats when interpreting the simulation evidence. First, though extensive and requiring several months of computer time, the simulation settings studied represent only a small subset of possible real-world settings. Secondly, our focus on identifying reliably measured genes via signal-to-noise statistics covers only a small subset of possible methods for ranking genes. For example, one could similarly rank genes via posterior medians, Bayes factors, posterior tail probabilities, best linear predictors of the signal, the joint posterior distribution of the ranks and so on. In fact, the plethora of ways in which a fully specified model can be used may be considered both a strength and a weakness of such an approach.

Our investigations showed that other signal-to-noise statistics, such as the moderated t statistic of Smyth (2004), performed well across simulations. Their performance notwithstanding, it is worth considering the strengths of a completely specified model-based approach over robust signal-to-noise statistics, which include a more complete description and summary of the data, the ability to rank genes by other properties and a mechanism for extending results to a population. In addition, there may be benefits of a complete Bayesian solution that are overlooked in the current study, including the diversity of statistics available from the Bayesian machinery. Moreover, Bayesian methods for model averaging and model search might improve over strict likelihood-based methods for selecting conjugacy relationships.

The use of multilevel models to rank gene expression differences is a much more subtle process than using moment-based signal-to-noise statistics. Of particular importance is the fact that the large number of genes under investigation can magnify the negative effects of unmet model assumptions. Therefore, more flexible models need to be considered for statistics based on multilevel models to be on par with robust statistics. However, our investigation also illustrates that, while addressing higher-order features in the data is necessary, constructing robust models for the estimation typically forces a compromise among better fit, added complexity and additional parameters, and ranking performance.

Let *ν*^{(t)}, *τ*^{(t)}, *μ*^{(t)}, and *F*^{(t)} be the current parameter estimates (for fixed *δ*). The expected complete data log-likelihood is

$$\sum _{g=1}^{G}E\left[\text{log}\left\{f({y}_{g}|{\beta}_{g},{\lambda}_{g})\,f({\beta}_{g}|{\lambda}_{g};\delta ,\mu ,F)\,f({\lambda}_{g};\nu ,\tau )\right\}\,\Big|\,{y}_{g},{\nu}^{(t)},{\tau}^{(t)},{\mu}^{(t)},{F}^{(t)}\right].$$

(6)

To simplify notation, we denote expectations with respect to this distribution as ${E}_{t}^{*}$.

The contribution of the terms involving *μ* and *F* to the expected complete data log-likelihood is

$$-\frac{1}{2}\text{log}|F|-\sum _{g=1}^{G}{E}_{t}^{*}\left[({\beta}_{g}-\mu )'{F}^{-1}({\beta}_{g}-\mu ){\lambda}_{g}^{-\delta}\right]/2G.$$

(7)
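Setting the gradients of (7) with respect to *μ* and *F*^{−1} to zero gives closed-form updates. A minimal numerical sketch follows; it is our own illustration, and the argument names for the per-gene posterior expectations are hypothetical:

```python
import numpy as np

def m_step_mu_F(E_w, E_wb, E_wbb):
    """Closed-form M-step for mu and F implied by (7), given per-gene
    posterior expectations (argument names are our own):
      E_w[g]   = E*[lambda_g^{-delta}]                    shape (G,)
      E_wb[g]  = E*[beta_g * lambda_g^{-delta}]           shape (G, p)
      E_wbb[g] = E*[beta_g beta_g' * lambda_g^{-delta}]   shape (G, p, p)"""
    G = len(E_w)
    t = E_wb.sum(axis=0)
    mu = t / E_w.sum()                     # weighted mean of the signals
    # sum_g E*[(beta_g - mu)(beta_g - mu)' lambda_g^{-delta}], expanded in
    # terms of the stored expectations, then averaged over genes
    F = (E_wbb.sum(axis=0) - np.outer(mu, t) - np.outer(t, mu)
         + np.outer(mu, mu) * E_w.sum()) / G
    return mu, F
```

When *δ* = 0 and the posteriors are point masses, these updates reduce to the ordinary sample mean and maximum likelihood covariance, which provides a quick sanity check.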

The contribution of the terms involving *ν* and *τ* to the expected complete data log-likelihood is

$$\nu \,\text{log}\,\tau -\text{log}\,\Gamma (\nu )-\nu \sum _{g=1}^{G}{E}_{t}^{*}[\text{log}\,{\lambda}_{g}]/G-\tau \sum _{g=1}^{G}{E}_{t}^{*}[{\lambda}_{g}^{-1}]/G.$$

(8)
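The M-step for (*ν*, *τ*) maximizes (8) given the two averaged expectations. A sketch under our own notation (the function and variable names are hypothetical): the *τ*-derivative gives *τ* = *ν*/B in closed form, and profiling *τ* out leaves a one-dimensional root-finding problem for *ν*.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def m_step_nu_tau(A, B):
    """M-step for (nu, tau) from (8), where
      A = sum_g E*[log lambda_g] / G   and   B = sum_g E*[lambda_g^{-1}] / G.
    Setting the tau-derivative of (8) to zero gives tau = nu / B; profiling
    tau out, nu solves log(nu) - digamma(nu) = A + log(B).  The left side is
    strictly decreasing from infinity to zero, so the root is unique."""
    c = A + np.log(B)
    nu = brentq(lambda v: np.log(v) - digamma(v) - c, 1e-8, 1e8)
    return nu, nu / B
```

Plugging in the exact inverse-gamma moments recovers the generating parameters, which is a useful check on the derivation.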

Let *θ*_{g} = 1/*λ*_{g}. Then

$$\begin{array}{rl} & f({\theta}_{g}|{y}_{g};F,\mu ,\tau ,\nu ;\delta )\\ \propto & {\theta}_{g}^{J/2+p\delta /2+\nu -1}\,|{X}^{t}X{\theta}_{g}+{F}^{-1}{\theta}_{g}^{\delta}{|}^{-1/2}\\ \times & \text{exp}\{-{y}_{g}^{t}{y}_{g}{\theta}_{g}/2-{\theta}_{g}\tau +{y}_{g}^{t}X({X}^{t}X{\theta}_{g}+{F}^{-1}{\theta}_{g}^{\delta}{)}^{-1}{X}^{t}{y}_{g}{\theta}_{g}^{2}/2\}\\ = & |{X}^{t}X{\theta}_{g}+{F}^{-1}{\theta}_{g}^{\delta}{|}^{-1/2}\,\text{exp}\{{y}_{g}^{t}X({X}^{t}X{\theta}_{g}+{F}^{-1}{\theta}_{g}^{\delta}{)}^{-1}{X}^{t}{y}_{g}{\theta}_{g}^{2}/2\}\,K({\theta}_{g})\end{array}$$

(9)

where *K* is a gamma kernel.

In Gauss-Laguerre quadrature, integrals are approximated by a weighted sum of function evaluations,

$${\int}_{0}^{\infty}h({\theta}_{g})dK({\theta}_{g})\hspace{0.17em}\approx \hspace{0.17em}\sum _{i=1}^{n}h({n}_{i}){w}_{i},$$

where the nodes, *n*_{i}, are the zeros of the Laguerre polynomial of degree *n*, and the *w*_{i} are the associated weights.
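A minimal sketch of this quadrature step, using SciPy's generalized Gauss-Laguerre nodes; the function name and the Gamma parameterization below are our own, with the gamma kernel standing in for the paper's *K*(*θ*_{g}):

```python
import numpy as np
from scipy.special import roots_genlaguerre, gamma as gamma_fn

def gamma_expectation(h, a, b, n=20):
    """Approximate the integral of h against a Gamma(shape a, rate b)
    kernel by generalized Gauss-Laguerre quadrature.  The nodes x_i are
    the zeros of the generalized Laguerre polynomial of degree n; the
    change of variables theta = x / b absorbs the rate into the nodes."""
    x, w = roots_genlaguerre(n, a - 1.0)   # weights for x^{a-1} e^{-x} dx
    return np.sum(w * h(x / b)) / gamma_fn(a)
```

Because the rule is exact for polynomials of degree up to 2*n* − 1 against the weight function, moments of the gamma kernel are recovered exactly up to rounding.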

^{*}This work was supported by NIH grants K25EB003491 and 5T32HL007024 and NSF grant DMS034211.

- Ando A, Kaufman GM. Bayesian analysis of the independent multinormal process – neither mean nor precision known. Journal of the American Statistical Association. 1965;60:347–358. doi: 10.2307/2283159. [Cross Ref]
- Baldi P, Long AD. A Bayesian framework for the analysis of microarray expression data: Regularized t–test and statistical inferences of gene changes. Bioinformatics. 2001;17(6):509–519. doi: 10.1093/bioinformatics/17.6.509. [PubMed] [Cross Ref]
- Beer D, Kardia S, Huang C, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nature Medicine. 2002;8:816–824. [PubMed]
- Bhattacharjee A, Richards W, Staunton J, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct subclasses. The Proceedings of the National Academy of Sciences of the United States of America. 2001;98:13790–13795. doi: 10.1073/pnas.191502998. [PubMed] [Cross Ref]
- Caffo B, Jank W, Jones G. Ascent-based Monte Carlo EM. Technical report, Johns Hopkins University Department of Biostatistics; 2002.
- Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall; Boca Raton, FL: 2000.
- Cui X, Hwang J, Qiu J, Blades N, Churchill G. Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics. 2005;6(1):59–75. doi: 10.1093/biostatistics/kxh018. [PubMed] [Cross Ref]
- Garber M, Troyanskay O, Schluens K, Petersen S, Pacyna-Gengelbach ZT, van de Rijn M, Rosen G, Perou C, Whyte R, Altman R, Brown P, Botstein D, Petersen I. Diversity of gene expression in adenocarcinoma of the lung. The Proceedings of the National Academy of Sciences of the United States of America. 2001;98(24):13784–13789. doi: 10.1073/pnas.241500798. [PubMed] [Cross Ref]
- Ibrahim JG, Chen MH, Gray RJ. Bayesian models for gene expression with DNA microarray data. Journal of the American Statistical Association. 2002;97:88–99. doi: 10.1198/016214502753479257. [Cross Ref]
- Ihaka R, Gentleman R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996;5(3):299–314. doi: 10.2307/1390807. [Cross Ref]
- Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions Volume 2. second edition Wiley; New York: 1995.
- Lange K. Numerical Analysis for Statisticians. Springer-Verlag; 1999.
- Lindley DV, Smith AFM. Bayes estimates for the linear model (with discussion) Journal of the Royal Statistical Society, Series B. 1972;34:1–41.
- Liu D, Parmigiani G, Caffo B. Screening for differentially expressed genes: Are multilevel models helpful? Technical report, Johns Hopkins University; 2004.
- Lönnstedt I, Speed T. Replicated microarray data. Statistica Sinica. 2002;12(1):31–46.
- Newton M, Noueiry A, Sarkar D, Ahlquist P. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics. 2004;5(2):155–176. doi: 10.1093/biostatistics/5.2.155. [PubMed] [Cross Ref]
- Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW. On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. Journal of Computational Biology. 2001;8:37–52. doi: 10.1089/106652701300099074. [PubMed] [Cross Ref]
- Parmigiani G, Garrett ES, Anbazhagan R, Gabrielson E. A statistical framework for expression-based molecular classification in cancer. Journal of the Royal Statistical Society, Series B, Methodological. 2002;64:717–736. doi: 10.1111/1467-9868.00358. [Cross Ref]
- Parmigiani G, Garrett-Mayer E, Ramaswamy A, Gabrielson E. A cross-study comparison of gene expression studies for the molecular classification of lung cancer. Clinical cancer research. 2004;10:2922–2927. doi: 10.1158/1078-0432.CCR-03-0490. [PubMed] [Cross Ref]
- Press W, Teukolsky S, Vetterling W, Flanner B. Numerical Recipes in C: The Art of Scientific Computing. second edition Cambridge University Press; Cambridge: 1992.
- Royall R. Statistical Evidence: A Likelihood Paradigm. Chapman and Hall; 1997.
- Scharpf RB, Tjelmeland H, Parmigiani G, Nobel A. A Bayesian model for cross-study differential gene expression. Journal of the American Statistical Association. 2008; to appear. [PMC free article] [PubMed]
- Smyth G. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology. 2004;3(1):3. doi: 10.2202/1544-6115.1027. [PubMed] [Cross Ref]
- Su S, Caffo B, Garrett-Mayer E, Bassett S. Modified test statistics by inter-voxel variance shrinkage with an application to fMRI. Johns Hopkins University, Dept. of Biostatistics Working Papers; 2007. page 138. [PMC free article] [PubMed]

