Search tips
Search criteria 


Logo of ijbiostatThe International Journal of BiostatisticsThe International Journal of BiostatisticsSubmit to The International Journal of BiostatisticsSubscribe
Int J Biostat. 2011 January 1; 7(1): 37.
Published online 2011 September 27. doi:  10.2202/1557-4679.1353
PMCID: PMC3202941

Ridge Regression for Longitudinal Biomarker Data


Technological advances facilitating the acquisition of large arrays of biomarker data have led to new opportunities to understand and characterize disease progression over time. This creates an analytical challenge, however, due to the large numbers of potentially informative markers, the high degrees of correlation among them, and the time-dependent trajectories of association. We propose a mixed ridge estimator, which integrates ridge regression into the mixed effects modeling framework in order to account for both the correlation induced by repeatedly measuring an outcome on each individual over time, as well as the potentially high degree of correlation among possible predictor variables. An expectation-maximization algorithm is described to account for unknown variance and covariance parameters. Model performance is demonstrated through a simulation study and an application of the mixed ridge approach to data arising from a study of cardiometabolic biomarker responses to evoked inflammation induced by experimental low-dose endotoxemia.

Keywords: biomarkers, cardiovascular disease (CVD), mixed effects, repeated measures, ridge regression

1. Introduction

Ridge regression (RR), also known as Tikhonov regularization (Tikhonov, 1943), is a well-described penalized regression approach to handling multiple co-linear predictor variables that involves adding a regularization term to the least squares equation in order to derive parameter estimates in the context of an ill-conditioned or singular design matrix (Hoerl and Kennard, 1970a,b). The resulting shrinkage estimates, while biased, offer improved prediction accuracy, that is, reduced prediction variance, and thus may be preferable in settings with a large number of highly correlated independent variables, in which unique least squares solutions are not tenable. Recent applications of RR to data arising from genetic association studies include, for example, Zhang and Horvath (2006) and Malo, Libiger, and Schork (2008).

In this manuscript we extend RR to the correlated response setting in which a single outcome of interest is measured repeatedly over time at potentially unevenly spaced time intervals. The data motivating our research arise from the Genetics of Evoked-Response to Niacin and Endotoxemia (the GENE) study and are described in more detail in Section 3.2 below. In the case of uncorrelated or minimally-correlated predictor variables, mixed effects models with person-specific random intercept and slopes terms can be applied to account for the within person correlation in making inferences about association (Fitzmaurice, Laird, and Ware, 2004). Here we integrate RR and the mixed effects modeling framework in order to account both for the correlation induced by repeatedly measuring the outcome on each individual over time, as well the potential high degree of correlation among potential predictor variables.

We begin in Section 2.1 by providing a brief background on RR and mixed effects modeling while defining the notation used throughout this manuscript. We then introduce a new estimator, which we term the mixed ridge (MR) estimator, for longitudinal data, as well as an associated testing framework in Section 2.2. Results of a simulation study to evaluate performance and an application to data arising from the GENE study are then provided in Sections 3.1 and 3.2, respectively. Finally, a discussion of our findings and alternative approaches is provided in Section 4.

2. Methods

2.1. Background and Notation

2.1.1. Ridge regression

We begin with the simple linear regression model given by:


where Y = (y1, y2, , yn) is an n × 1 vector of responses, X is an n × p design matrix comprised of p columns representing each of the potential predictor variables, n is the number of individuals in our sample and epsilon ~ MVN (0, σ2I) is an n × 1 vector of independent errors. It is straightforward to show that the least squares and maximum likelihood solutions to this equation are given by [beta] = (XTX)−1XTY and Var([beta]) = (XTX)−1σ2 (Christensen, 2002). Notably, in the case that the columns of X are highly correlated, XTX will be singular and we replace (XTX)−1 with (XTX) where ‘−’ denotes generalized inverse, and a unique solution to Equation 1 does not exist. Further, in the case of high correlation where XTX is still invertible, the resulting coefficient estimates will have largely inflated variances, which in turn, results in low predictive precision.

Ridge regression, designed specifically to handle correlated predictors, involves introducing a shrinkage penalty λ to the least squares equation, and subsequently solving for the value of [beta] such that:


It has been shown that the solution to Equation 2 is given by [beta]R = (XTX + λI)−1XTY and we have Var([beta]R) = [(XTX + λI)−1XT] [(XTX + λI)−1XT]T (Hoerl and Kennard, 1970a,b). Further, dividing [beta] by root n times the square root of its variance has a Student’s t distribution with effective degrees of freedom given by EDF = tr ([XTX + λI]−1XTX) (Malo et al., 2008). Applying the generalized cross validation (GCV) method of Craven and Wahba (1979), we estimate λ of Equation 2 by:


where S = X(XTX + λI)−1XT and Ŷ = X[beta]R. The optimal value of [lambda with circumflex] is found through an iterative procedure that involves recursively updating Ŷ and S based on the current value of [lambda with circumflex]. Notably, as described in Gentle, Hardle, and Mori (2004) for tr(S)/n small, [lambda with circumflex] of Equation 3 is approximated by:


2.1.2. Linear mixed effects model

Now consider the setting in which multiple measurements are observed for each individual over time. The mixed effects model for this setting is given by:


where i = 1, …, n represents individual, yi = (yi1, yi2, …, yini)T is a vector of ni observations for individual i, and Xi is the corresponding ni × p design matrix of fixed effect covariates. We further assume bi ~ MVN(0, D) are person-specific random effects, zi is the corresponding random effects design matrix, and epsiloni ~ MVN(0, σ2Ini×ni) are independent random errors. Finally, we let Y, X and Z be appropriately defined matrices representing the concatenation of the corresponding variables over all individuals i.

Notably, the log-likelihood function of Y based on this model is given by (Fitzmaurice et al., 2004):


where V =Var(Y) = ZDZT + σ2I and Vi is component corresponding to individual i. Maximizing this function with respect to the fixed effects parameter vector, β, in the non-penalized setting is equivalent to minimizing the least squares objective function:


It is straightforward to show that the resultant estimate of β is given by:


Further, the best unbiased predictor (BLUP) of the random effects is given by b = DZTV−1(YX[beta]), var([beta]) = (XTV−1X)−1 and var(b) = DZT [V−1V−1X(XTV−1X)−1XTV−1]ZD (Laird and Ware, 1982).

2.2. Mixed Ridge (MR) Regression

In this section we introduce a penalized regression approach to estimation for the mixed model given in Equation 5. To begin, we assume the variance parameters θ = (σ,D) are known and add a penalization term to Equation 7, similar to the ridge regression setting, which yields:


Differentiating the objective function in Equation 9, setting the resultant equal to 0, and solving for [beta], we have:


A simple proof of this result is provided in Appendix A.1. Additionally, it can be shown that Var([beta]MR) = (XTV−1X + λI)−1XTV−1X (XTV−1X + λI)−1 and the empirical Bayes estimate of b is given by:


Estimation of λ is achieved using the approach described in Section 2.1.1, where now we have S = X(XTV−1X + λI)−1XTV−1 + ZDZT (IX[(XTV−1X + λI)−1 XTV−1]).

More generally, consider the setting in which the variance parameters θ are unknown. We propose an extension of the expectation-maximization (EM) algorithm described by Laird and Ware (1982), that includes an additional step for estimation of the ridge component. This approach is summarized by the following step-by-step procedure:

  1. (E-step) Initialize [theta w/ hat](t) = θ0 and [lambda with circumflex](t) = λ0. Solve for β^MR(t) and the sufficient statistics t^1(t) and t^2(t) given by:
  2. (M-Step) Solve for [theta w/ hat](t+1) where σ^2(t+1)=t^1(t)/N, D^(t+1)=t^2(t)/n, N=i=1nni and n is the number of individuals in our sample, and let V(t+1) = ZD(t+1)ZT + [sigma with hat]2(t+1)I
  3. Update [lambda with circumflex](t+1) and let
  4. Repeat Steps (a) – (c) a large number of times and until a convergence criterion is met.

In order to determine the significance of each of the predictor variables, we begin by calculating Wald test statistics. These are denoted Zk for each k =, 1,…, p, and given by the corresponding value of [beta]MR divided by the square root of its variance. Since an EM algorithm is employed to estimate β, we use Louis’ formula (Louis, 1982) to determine the corresponding variance. Finally, Westfall and Young’s free step-down resampling approach (Westfall and Young, 1993, Foulkes, 2009), also known as the maxT procedure, is applied to adjust for multiple testing.

3. Examples

3.1. Simulation Study

A simple simulation study is conducted to characterize the relative performances of mixed ridge regression and the usual mixed effects modeling approach in the context of multiple, correlated predictors. For simplicity of presentation, the simulation study assumes repeatedly measured outcomes, while the predictor variables are measured at a single, baseline time point. We further assume ni = 4 measurements for each subject i and generate data according to the model of Equation 5 where β = (0,0,0.2,0.4,0.6), bi ~ N(0, 0.6), and epsiloni jk ~ N(0, 1). Each predictor variable is assumed to arise from a normal distribution with mean equal to 5 and variance equal to 1. The correlation between predictor variables, given by ρ in Table 1, is assumed to take on values between 0 and 0.99. Starting values for the variance components are derived based on fitting a mixed model with no ridge component. In total, M = 500 simulations are conducted for each condition based on sample sizes of n = 500 individuals.

Table 1:
Results of simulation study

Simulation results are presented in Table 1 and Figure 1. As expected, MR estimates tend to exhibit slightly more bias (|Estimate — β|) than mixed estimates; however, the MR estimates have smaller variances, resulting ultimately in higher power of the MR model. Power for detecting association when β = 0.20, σ = 1 and D = 0.6, is illustrated in Figure 1. Here we see that power begins to diverge between the two models at approximately ρ = 0.8. This difference is more dramatic at extreme levels of correlation (ρ = 0.99), where power for the mixed model when the true value of β is 0.20 is 0.160 compared to a power of 0.610 for the MR model; when true value of β is 0.60 the power for the mixed model is 0.51 compared with 0.90 for the MR model. The average type 1 error rates across all assumed values of ρ and both β1 and β2 are 0.050 and 0.054 for the mixed and MR models, respectively.

Figure 1:
Power for detecting associaton

3.2. Data Example: The GENE study

The Genetics of Evoked-Responses to Niacin and Endotoxemia (the GENE) study is an ongoing trial designed to characterize the effects of genetic factors on the response to niacin therapy and endotoxin. Healthy volunteers (n = 189) were administered low-dose endotoxin (LPS) infusion, which produces a mild-inflammatory response over a 6–8 hour period. Vital signs, including systolic blood pressure, as well as apolipoprotein A1 (Apo-A1), apolipoprotein B (Apo-B), total cholesterol (Chol), high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, phospholipids (Phos), triglycerides (Trig) and tumor necrosis factor-alpha (TNF) were measured repeatedly over a 24-hour period.

Systolic blood pressure at 0, 6, 12, and 24 hours after the evoked inflammation is modeled using all other variables listed above as potential predictor variables. Pearson correlations between all pairs of potential predictor variables range in absolute value from 0.01 to 0.95, as described in Table 2. The high degree of correlation indicates that MR regression is appropriate for this setting. As the change in systolic blood pressure over time is expected to be piecewise-linear, a linear spline (Fitzmaurice et al., 2004) model is applied with change points at times 6 and 12, denoted Time1 and Time2 respectively. Random within-subject effects for intercept and slope are also included in the model. Each subject is measured between 2 and 4 times.

Table 2:
Observed correlations among potential predictor variables

The coefficient estimates, standard deviations and corresponding observed test statistics (corresponding to 2-sided tests that the coefficient is equal to 0) for the MR and mixed modeling approaches are given in Table 3. In both modeling frameworks, the overall effect of time on systolic blood pressure is significantly different than 0. After adjusting for multiple testing, the MR approach additionally identifies Apo-B as significantly associated with systolic blood pressure over time, while the usual mixed modeling approach is unable to detect this association. This finding is consistent with known associations between Apo-B and blood pressure [see, for example, Genest and Cohn (1995)]. Apo-B can be suppressed by inflammation in humans while blood pressure tends to rise with inflammation stress, supporting the observed inverse correlation. This is likely an indirect association with inflammation driving both changes in Apo-B and blood pressure.

Table 3:
Results from the GENE study comparing MR to mixed model

4. Discussion

In this manuscript, we describe an extension and application of ridge regression for longitudinally measured biomarkers. Alternative penalized regression approaches include the least absolute shrinkage and selection operator (LASSO) and the elastic net (Tibshirani, 1996, Zou and Hastie, 2005). LASSO extends ridge regression by introducing a mixture distribution on the coefficient effects with a point mass at zero, while the elastic net is a further generalization that incorporates two shrinkage parameters. Further extensions of the approach described herein is feasible in these alternative settings; however, estimation is less straightforward given the additional parameter constraints that would need to be applied.

We have proposed a new estimator to use when data sets may have correlated variables and within-subject effects. The mixed-effects ridge estimator combines the usefulness of hierarchical linear models and ridge regression. To account for unobserved variance parameters, we have proposed an EM algorithm. As expected, the mixed-ridge estimation results in coefficients with smaller variances than the mixed model. Furthermore, as correlation among predictors increases, the power for mixed modeling decreases more rapidly than the power for our MR model. In summary, the mixed ridge model appears to perform better than the usual mixed model in terms of power for detecting association in the context of highly correlated predictors.

Appendix (A.1): 

Differentiating the objective function of Equation 9 and setting this equal to 0, we have:


or equivalently, βT (XTV−1X + λIp×p) = YTV−1 X. Solving for β results in Equation 10.

Contributor Information

Melissa Eliot, University of Massachusetts Amherst.

Jane Ferguson, University of Pennsylvania.

Muredach P Reilly, University of Pennsylvania.

Andrea S Foulkes, University of Massachusetts Amherst.


  • Christensen R. Plane Answers to Complex Questions. Springer; 2002.
  • Craven P, Wahba G. “Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation,” Numerische Mathematik. 1979;31:377–403. doi: 10.1007/BF01404567. [Cross Ref]
  • Fitzmaurice G, Laird N, Ware J. Applied Longitudinal Analysis. Wiley Series in Probability and Statistics; 2004.
  • Foulkes A. Applied Statistical Genetics with R: for population-based association studies. Springer; NY: 2009. [Cross Ref]
  • Genest J, Cohn J. “Clustering of cardiovascular risk factors: targeting high-risk individuals,” American Journal of Cardiology. 1995;76:8A–20A. doi: 10.1016/S0002-9149(05)80010-4. [PubMed] [Cross Ref]
  • Gentle J, Hardle W, Mori Y. Handbook of Computational Statistics. Springer-Verlag; Berlin Heidelberg: 2004.
  • Hoerl A, Kennard R. “Ridge regression: Applications to nonorthogonal problems,” Technometrics. 1970a;12:69–82. doi: 10.2307/1267352. [Cross Ref]
  • Hoerl A, Kennard R. “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics. 1970b;12:55–67. doi: 10.2307/1267351. [Cross Ref]
  • Laird N, Ware J. “Random-effects models for longitudinal data,” Biometrics. 1982;38:963–974. doi: 10.2307/2529876. [PubMed] [Cross Ref]
  • Louis TA. “Finding the observed information matrix when using the EM algorithm,” Journal of the Royal Statistical Society, Series B: Methodological. 1982;44:226–233.
  • Malo N, Libiger O, Schork N. “Accommodating linkage disequilibrium in genetic-association analyses via ridge regression,” The American Journal of Human Genetics. 2008;82:375–85. doi: 10.1016/j.ajhg.2007.10.012. [PubMed] [Cross Ref]
  • Tibshirani “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society Series B (Methodological) 1996;58:267–88.
  • Tikhonov A. “On the stability of inverse problems,” Proceedings of the USSR Academy of Sciences. 1943;39:267–88.
  • Westfall P, Young S. Resampling-based multiple testing. Wiley-Interscience; 1993.
  • Zhang B, Horvath S. “Ridge regression based hybrid genetic algorithms for multi-locus quantitative trait mapping,” Bioinformatics Research and Applications. 2006;1:261–72. doi: 10.1504/IJBRA.2006.007905. [PubMed] [Cross Ref]
  • Zou H, Hastie T. “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society, Series B: Statistical Methodology. 2005;67:301–320. doi: 10.1111/j.1467-9868.2005.00503.x. [Cross Ref]

Articles from The International Journal of Biostatistics are provided here courtesy of Berkeley Electronic Press