Home | About | Journals | Submit | Contact Us | Français |

**|**Int J Biostat**|**PMC3202941

Formats

Article sections

Authors

Related links

Int J Biostat. 2011 January 1; 7(1): Article 37.

Published online 2011 September 27. doi: 10.2202/1557-4679.1353

PMCID: PMC3202941

Melissa Eliot, University of Massachusetts Amherst;

Copyright © 2011 The Berkeley Electronic Press. All rights reserved

Technological advances facilitating the acquisition of large arrays of biomarker data have led to new opportunities to understand and characterize disease progression over time. This creates an analytical challenge, however, due to the large numbers of potentially informative markers, the high degrees of correlation among them, and the time-dependent trajectories of association. We propose a mixed ridge estimator, which integrates ridge regression into the mixed effects modeling framework in order to account for both the correlation induced by repeatedly measuring an outcome on each individual over time, as well as the potentially high degree of correlation among possible predictor variables. An expectation-maximization algorithm is described to account for unknown variance and covariance parameters. Model performance is demonstrated through a simulation study and an application of the mixed ridge approach to data arising from a study of cardiometabolic biomarker responses to evoked inflammation induced by experimental low-dose endotoxemia.

Ridge regression (RR), also known as Tikhonov regularization (Tikhonov, 1943), is a well-described penalized regression approach to handling multiple co-linear predictor variables that involves adding a regularization term to the least squares equation in order to derive parameter estimates in the context of an ill-conditioned or singular design matrix (Hoerl and Kennard, 1970a,b). The resulting shrinkage estimates, while biased, offer improved prediction accuracy, that is, reduced prediction variance, and thus may be preferable in settings with a large number of highly correlated independent variables, in which unique least squares solutions are not tenable. Recent applications of RR to data arising from genetic association studies include, for example, Zhang and Horvath (2006) and Malo, Libiger, and Schork (2008).

In this manuscript we extend RR to the correlated response setting in which a single outcome of interest is measured repeatedly over time at potentially unevenly spaced time intervals. The data motivating our research arise from the Genetics of Evoked-Response to Niacin and Endotoxemia (the GENE) study and are described in more detail in Section 3.2 below. In the case of uncorrelated or minimally-correlated predictor variables, mixed effects models with person-specific random intercept and slopes terms can be applied to account for the within person correlation in making inferences about association (Fitzmaurice, Laird, and Ware, 2004). Here we integrate RR and the mixed effects modeling framework in order to account both for the correlation induced by repeatedly measuring the outcome on each individual over time, as well the potential high degree of correlation among potential predictor variables.

We begin in Section 2.1 by providing a brief background on RR and mixed effects modeling while defining the notation used throughout this manuscript. We then introduce a new estimator, which we term the mixed ridge (MR) estimator, for longitudinal data, as well as an associated testing framework in Section 2.2. Results of a simulation study to evaluate performance and an application to data arising from the GENE study are then provided in Sections 3.1 and 3.2, respectively. Finally, a discussion of our findings and alternative approaches is provided in Section 4.

We begin with the simple linear regression model given by:

$$\mathbf{Y}=\mathbf{X}\beta +$$

(1)

where **Y** = (**y _{1}**,

Ridge regression, designed specifically to handle correlated predictors, involves introducing a shrinkage penalty *λ* to the least squares equation, and subsequently solving for the value of such that:

$${\widehat{\beta}}_{R}=\text{arg}\mathrm{{\text{min}}_{\beta}\left\{{(\mathbf{Y}-\mathbf{X}\beta )}^{T}(\mathbf{Y}-\mathbf{X}\beta )+{\lambda}^{2}{\beta}^{T}\beta \right\}}$$

(2)

It has been shown that the solution to Equation 2 is given by * _{R}* = (

$$\widehat{\lambda}=\underset{\left\{{n}^{-1}{(\mathbf{Y}-\widehat{Y})}^{T}(\mathbf{Y}-\widehat{Y})/{(1-tr(\mathbf{S})/n)}^{2}\right\}}{\text{arg}\mathrm{\text{min}}\lambda}$$

(3)

where **S** = **X**(**X**^{T}**X** + *λ***I**)^{−1}**X*** ^{T}* and

$$\widehat{\lambda}\approx \underset{\left\{{n}^{-1}{(\mathbf{Y}-\widehat{Y})}^{T}(\mathbf{Y}-\widehat{Y})+2{n}^{-2}tr(\mathbf{S}){(\mathbf{Y}-\widehat{Y})}^{T}(\mathbf{Y}-\widehat{Y})\right\}}{\text{arg}\mathrm{\text{min}}\lambda}$$

(4)

Now consider the setting in which multiple measurements are observed for each individual over time. The mixed effects model for this setting is given by:

$${\mathbf{y}}_{i}={\mathbf{X}}_{i}\beta +{\mathbf{z}}_{i}^{T}{b}_{i}+{i}_{}$$

(5)

where *i* = 1, …, *n* represents individual, **y*** _{i}* = (

Notably, the log-likelihood function of **Y** based on this model is given by (Fitzmaurice et al., 2004):

$$l(\mathbf{Y})=-\frac{N}{2}\mathit{log}(2\pi )-\frac{1}{2}\sum _{i=1}^{n}\mathit{log}({V}_{i})-\frac{1}{2}{(Y-\mathbf{X}\beta )}^{T}{\mathbf{V}}^{-1}(Y-\mathbf{X}\beta )$$

(6)

where **V** =*Var*(**Y**) = **Z***D***Z*** ^{T}* +

$$\widehat{\beta}=\underset{\{{(\mathbf{Y}-\mathbf{X}\beta )}^{T}{\mathbf{V}}^{-1}(\mathbf{Y}-\mathbf{X}\beta )\}}{\text{arg}\mathrm{\text{min}}\beta}$$

(7)

It is straightforward to show that the resultant estimate of *β* is given by:

$$\widehat{\beta}={({\mathbf{X}}^{T}{V}^{-1}\mathbf{X})}^{-1}{\mathbf{X}}^{T}{V}^{-1}\mathbf{Y}$$

(8)

Further, the best unbiased predictor (BLUP) of the random effects is given by = *D***Z**^{T}*V*^{−1}(**Y** – **X**), *var*() = (**X**^{T}*V*^{−1}**X**)^{−1} and *var*() = *D***Z*** ^{T}* [

In this section we introduce a penalized regression approach to estimation for the mixed model given in Equation 5. To begin, we assume the variance parameters *θ* = (*σ*,*D*) are known and add a penalization term to Equation 7, similar to the ridge regression setting, which yields:

$${\widehat{\beta}}_{MR}=\underset{\{{(\mathbf{Y}-\mathbf{X}\beta )}^{T}{V}^{-1}(\mathbf{Y}-\mathbf{X}\beta )+\lambda {\beta}^{T}\beta \}}{\text{arg}\mathrm{\text{min}}\beta}$$

(9)

Differentiating the objective function in Equation 9, setting the resultant equal to 0, and solving for , we have:

$${\widehat{\beta}}_{MR}={({\mathbf{X}}^{T}{V}^{-1}\mathbf{X}+\lambda \mathbf{I})}^{-1}{\mathbf{X}}^{T}{V}^{-1}\mathbf{Y}$$

(10)

A simple proof of this result is provided in Appendix A.1. Additionally, it can be shown that *Var*(* _{MR}*) = (

$$\begin{array}{lll}\widehat{b}\hfill & =\hfill & E[b|\mathbf{Y}]={D\mathbf{Z}}^{T}{V}^{-1}(\mathbf{Y}-\mathbf{X}{\widehat{\beta}}_{MR})\hfill \\ \hfill & =\hfill & {D\mathbf{Z}}^{T}{V}^{-1}(\mathbf{I}-\mathbf{X}\left[{({X}^{\prime}{V}^{-1}\mathbf{X}+\lambda \mathbf{I})}^{-1}{\mathbf{X}}^{T}{V}^{-1}\right])\mathbf{Y}\hfill \end{array}$$

(11)

Estimation of *λ* is achieved using the approach described in Section 2.1.1, where now we have **S** = **X**(**X**^{T}V^{−1}**X** + *λ***I**)^{−1}**X**^{T}V^{−1} + **Z***D***Z*** ^{T}* (

More generally, consider the setting in which the variance parameters *θ* are unknown. We propose an extension of the expectation-maximization (EM) algorithm described by Laird and Ware (1982), that includes an additional step for estimation of the ridge component. This approach is summarized by the following step-by-step procedure:

- (
**E-step**) Initialize^{(t)}=*θ*_{0}and^{(t)}=*λ*_{0}. Solve for ${\widehat{\beta}}_{\text{MR}}^{\left(t\right)}$ and the sufficient statistics ${\widehat{t}}_{1}^{\left(t\right)}$ and ${\widehat{t}}_{2}^{\left(t\right)}$ given by:$${\widehat{t}}_{1}^{\left(t\right)}=E(\sum _{i=1}^{n}{iT}_{{i}_{|}}^{})$$(12)$${\widehat{t}}_{2}^{\left(t\right)}=E(\sum _{i=1}^{n}{b}_{i}\mathrm{{b}_{i}^{T}|{\mathbf{Y}}_{i},{\widehat{\beta}}_{\text{MR}}^{\left(t\right)},{\widehat{\theta}}^{\left(t\right)}})$$(13) - (
**M-Step**) Solve for^{(t+1)}where ${\widehat{\sigma}}^{2(t+1)}={\widehat{t}}_{1}^{\left(t\right)}/N$, ${\widehat{D}}^{(t+1)}={\widehat{t}}_{2}^{\left(t\right)}/n$, $N={\sum}_{i=1}^{n}{n}_{i}$ and*n*is the number of individuals in our sample, and let^{(t+1)}=**Z**^{(t+1)}**Z**+^{T}^{2}^{(t+1)}**I** - Update
^{(t+1)}and let$${\widehat{\beta}}_{\text{MR}}^{(t+1)}={\left({\mathbf{X}}^{T}{\widehat{V}}^{-1(t+1)}\mathbf{X}+{\widehat{\lambda}}^{(t+1)}\mathbf{I}\right)}^{-1}{\mathbf{X}}^{T}{\widehat{V}}^{-1(t+1)}\mathbf{Y}$$(14) - Repeat Steps (
*a*) – (*c*) a large number of times and until a convergence criterion is met.

In order to determine the significance of each of the predictor variables, we begin by calculating Wald test statistics. These are denoted *Z _{k}* for each

A simple simulation study is conducted to characterize the relative performances of mixed ridge regression and the usual mixed effects modeling approach in the context of multiple, correlated predictors. For simplicity of presentation, the simulation study assumes repeatedly measured outcomes, while the predictor variables are measured at a single, baseline time point. We further assume *n _{i}* = 4 measurements for each subject

Simulation results are presented in Table 1 and Figure 1. As expected, MR estimates tend to exhibit slightly more bias (|Estimate — *β*|) than mixed estimates; however, the MR estimates have smaller variances, resulting ultimately in higher power of the MR model. Power for detecting association when *β* = 0.20, *σ* = 1 and *D* = 0.6, is illustrated in Figure 1. Here we see that power begins to diverge between the two models at approximately ρ = 0.8. This difference is more dramatic at extreme levels of correlation (ρ = 0.99), where power for the mixed model when the true value of *β* is 0.20 is 0.160 compared to a power of 0.610 for the MR model; when true value of *β* is 0.60 the power for the mixed model is 0.51 compared with 0.90 for the MR model. The average type 1 error rates across all assumed values of ρ and both *β*_{1} and *β*_{2} are 0.050 and 0.054 for the mixed and MR models, respectively.

The Genetics of Evoked-Responses to Niacin and Endotoxemia (the GENE) study is an ongoing trial designed to characterize the effects of genetic factors on the response to niacin therapy and endotoxin. Healthy volunteers (*n* = 189) were administered low-dose endotoxin (LPS) infusion, which produces a mild-inflammatory response over a 6–8 hour period. Vital signs, including systolic blood pressure, as well as apolipoprotein A1 (Apo-A1), apolipoprotein B (Apo-B), total cholesterol (Chol), high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, phospholipids (Phos), triglycerides (Trig) and tumor necrosis factor-alpha (TNF) were measured repeatedly over a 24-hour period.

Systolic blood pressure at 0, 6, 12, and 24 hours after the evoked inflammation is modeled using all other variables listed above as potential predictor variables. Pearson correlations between all pairs of potential predictor variables range in absolute value from 0.01 to 0.95, as described in Table 2. The high degree of correlation indicates that MR regression is appropriate for this setting. As the change in systolic blood pressure over time is expected to be piecewise-linear, a linear spline (Fitzmaurice et al., 2004) model is applied with change points at times 6 and 12, denoted Time_{1} and Time_{2} respectively. Random within-subject effects for intercept and slope are also included in the model. Each subject is measured between 2 and 4 times.

The coefficient estimates, standard deviations and corresponding observed test statistics (corresponding to 2-sided tests that the coefficient is equal to 0) for the MR and mixed modeling approaches are given in Table 3. In both modeling frameworks, the overall effect of time on systolic blood pressure is significantly different than 0. After adjusting for multiple testing, the MR approach additionally identifies Apo-B as significantly associated with systolic blood pressure over time, while the usual mixed modeling approach is unable to detect this association. This finding is consistent with known associations between Apo-B and blood pressure [see, for example, Genest and Cohn (1995)]. Apo-B can be suppressed by inflammation in humans while blood pressure tends to rise with inflammation stress, supporting the observed inverse correlation. This is likely an indirect association with inflammation driving both changes in Apo-B and blood pressure.

In this manuscript, we describe an extension and application of ridge regression for longitudinally measured biomarkers. Alternative penalized regression approaches include the least absolute shrinkage and selection operator (LASSO) and the elastic net (Tibshirani, 1996, Zou and Hastie, 2005). LASSO extends ridge regression by introducing a mixture distribution on the coefficient effects with a point mass at zero, while the elastic net is a further generalization that incorporates two shrinkage parameters. Further extensions of the approach described herein is feasible in these alternative settings; however, estimation is less straightforward given the additional parameter constraints that would need to be applied.

We have proposed a new estimator to use when data sets may have correlated variables and within-subject effects. The mixed-effects ridge estimator combines the usefulness of hierarchical linear models and ridge regression. To account for unobserved variance parameters, we have proposed an EM algorithm. As expected, the mixed-ridge estimation results in coefficients with smaller variances than the mixed model. Furthermore, as correlation among predictors increases, the power for mixed modeling decreases more rapidly than the power for our MR model. In summary, the mixed ridge model appears to perform better than the usual mixed model in terms of power for detecting association in the context of highly correlated predictors.

Differentiating the objective function of Equation 9 and setting this equal to 0, we have:

$$\begin{array}{lll}0\hfill & =\hfill & \frac{d}{d\beta}[{(\mathbf{Y}-\mathbf{X}\beta )}^{T}{V}^{-1}(\mathbf{Y}-\mathbf{X}\beta )+\lambda {\beta}^{T}\beta ]\hfill \\ \hfill & =\hfill & \frac{d}{d\beta}[{\mathbf{Y}}^{T}{V}^{-1}\mathbf{Y}-{\mathbf{Y}}^{T}{V}^{-1}\mathbf{X}\beta -{(\mathbf{X}\beta )}^{T}{V}^{-1}\mathbf{Y}+{(\mathbf{X}\beta )}^{T}{V}^{-1}(\mathbf{X}\beta )+\lambda {\beta}^{T}\beta ]\hfill \\ \hfill & =\hfill & \frac{d}{d\beta}[{\mathbf{Y}}^{T}{V}^{-1}\mathbf{Y}-2{\mathbf{Y}}^{T}{V}^{-1}\mathbf{X}\beta +{(\mathbf{X}\beta )}^{T}{V}^{-1}(\mathbf{X}\beta )+\lambda {\beta}^{T}\beta ]\hfill \\ \hfill & =\hfill & -2{\mathbf{Y}}^{T}{V}^{-1}\mathbf{X}+2{\beta}^{T}{\mathbf{X}}^{T}{V}^{-1}\mathbf{X}+2\lambda {\beta}^{T}\hfill \end{array}$$

or equivalently, *β ^{T}* (

Melissa Eliot, University of Massachusetts Amherst.

Jane Ferguson, University of Pennsylvania.

Muredach P Reilly, University of Pennsylvania.

Andrea S Foulkes, University of Massachusetts Amherst.

- Christensen R. Plane Answers to Complex Questions. Springer; 2002.
- Craven P, Wahba G. “Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation,” Numerische Mathematik. 1979;31:377–403. doi: 10.1007/BF01404567. [Cross Ref]
- Fitzmaurice G, Laird N, Ware J. Applied Longitudinal Analysis. Wiley Series in Probability and Statistics; 2004.
- Foulkes A. Applied Statistical Genetics with R: for population-based association studies. Springer; NY: 2009. [Cross Ref]
- Genest J, Cohn J. “Clustering of cardiovascular risk factors: targeting high-risk individuals,” American Journal of Cardiology. 1995;76:8A–20A. doi: 10.1016/S0002-9149(05)80010-4. [PubMed] [Cross Ref]
- Gentle J, Hardle W, Mori Y. Handbook of Computational Statistics. Springer-Verlag; Berlin Heidelberg: 2004.
- Hoerl A, Kennard R. “Ridge regression: Applications to nonorthogonal problems,” Technometrics. 1970a;12:69–82. doi: 10.2307/1267352. [Cross Ref]
- Hoerl A, Kennard R. “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics. 1970b;12:55–67. doi: 10.2307/1267351. [Cross Ref]
- Laird N, Ware J. “Random-effects models for longitudinal data,” Biometrics. 1982;38:963–974. doi: 10.2307/2529876. [PubMed] [Cross Ref]
- Louis TA. “Finding the observed information matrix when using the EM algorithm,” Journal of the Royal Statistical Society, Series B: Methodological. 1982;44:226–233.
- Malo N, Libiger O, Schork N. “Accommodating linkage disequilibrium in genetic-association analyses via ridge regression,” The American Journal of Human Genetics. 2008;82:375–85. doi: 10.1016/j.ajhg.2007.10.012. [PubMed] [Cross Ref]
- Tibshirani “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society Series B (Methodological) 1996;58:267–88.
- Tikhonov A. “On the stability of inverse problems,” Proceedings of the USSR Academy of Sciences. 1943;39:267–88.
- Westfall P, Young S. Resampling-based multiple testing. Wiley-Interscience; 1993.
- Zhang B, Horvath S. “Ridge regression based hybrid genetic algorithms for multi-locus quantitative trait mapping,” Bioinformatics Research and Applications. 2006;1:261–72. doi: 10.1504/IJBRA.2006.007905. [PubMed] [Cross Ref]
- Zou H, Hastie T. “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society, Series B: Statistical Methodology. 2005;67:301–320. doi: 10.1111/j.1467-9868.2005.00503.x. [Cross Ref]

Articles from The International Journal of Biostatistics are provided here courtesy of **Berkeley Electronic Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |