It is a common practice to analyze complex longitudinal data using semiparametric nonlinear mixed-effects (SNLME) models with a normal distribution. Normality assumption of model errors may unrealistically obscure important features of subject variations. To partially explain between- and within-subject variations, covariates are usually introduced in such models, but some covariates may often be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. Inferential procedures can be complicated dramatically when data with skewness, missing values, and measurement error are observed. In the literature, there has been considerable interest in accommodating either skewness, incompleteness or covariate measurement error in such models, but there has been relatively little study concerning all three features simultaneously. In this article, our objective is to address the simultaneous impact of skewness, missingness, and covariate measurement error by jointly modeling the response and covariate processes based on a flexible Bayesian SNLME model. The method is illustrated using a real AIDS data set to compare potential models with various scenarios and different distribution specifications.
Bayesian analysis; Covariate measurement errors; Longitudinal data; Missing data; Random-effects models; Skew distributions
Longitudinal data arise frequently in medical studies and it is a common practice to analyze such complex data with nonlinear mixed-effects (NLME) models which enable us to account for between-subject and within-subject variations. To partially explain the variations, covariates are usually introduced to these models. Some covariates, however, may be often measured with substantial errors. It is often the case that model random error is assumed to be distributed normally, but the normality assumption may not always give robust and reliable results, particularly if the data exhibit skewness. Although there has been considerable interest in accommodating either skewness or covariate measurement error in the literature, there is relatively little work that considers both features simultaneously. In this article, our objectives are to address simultaneous impact of skewness and covariate measurement error by jointly modeling the response and covariate processes under a general framework of Bayesian semiparametric nonlinear mixed-effects models. The method is illustrated in an AIDS data example to compare potential models which have different distributional specifications. The findings from this study suggest that the models with a skew-normal distribution may provide more reasonable results if the data exhibit skewness and/or have measurement errors in covariates.
Bayesian approach; Covariate measurement errors; HIV/AIDS; Joint models; Longitudinal data; Semiparametric nonlinear mixed-effects models; Skew-normal distribution
Ordinary least squares linear regression (OLSLR) analyses are inappropriate for performing trend analysis on repeatedly measured longitudinal data. This study examines multilevel linear mixed-effects (LME) and nonlinear mixed-effects (NLME) methods to model longitudinally collected perimetry data and determines whether NLME methods provide significant improvements over LME methods and OLSLR.
Models of LME and NLME (exponential, whereby the rate of change in sensitivity worsens over time) were examined with two levels of nesting (subject and eye within subject) to predict the mean deviation. Models were compared using analysis of variance or Akaike's information criterion and Bayesian information criterion, as appropriate.
Nonlinear (exponential) models provided significantly better fits than linear models (P < 0.0001). Nonlinear fits markedly improved the validity of the model, as evidenced by the lack of significant autocorrelation, residuals that are closer to being normally distributed, and improved homogeneity. From the fitted exponential model, the rate of glaucomatous progression for an average subject of age 70 years was −0.07 decibels (dB) per year. Ten years later, the same eye would be deteriorating at −0.12 dB/y.
Multilevel mixed-effects models provide better fits to the test data than OLSLR by accounting for group effects and/or within-group correlation. However, the fitted LME model poorly tracks visual field (VF) change over time. An exponential model provides a significant improvement over linear models and more accurately tracks VF change over time in this cohort.
OLS methods are inappropriate for performing trend analyses on repeatedly measured longitudinal data. Instead, a nonlinear (exponential) model provides a significant improvement over linear models and more accurately tracks visual field change over time in this cohort.
glaucoma; mean deviation; linear mixed effect; nonlinear mixed effect; autocorrelation
Often in biomedical studies, the routine use of linear mixed-effects models (based on Gaussian assumptions) can be questionable when the longitudinal responses are skewed in nature. Skew-normal/elliptical models are widely used in those situations. Often, those skewed responses might also be subjected to some upper and lower quantification limits (viz. longitudinal viral load measures in HIV studies), beyond which they are not measurable. In this paper, we develop a Bayesian analysis of censored linear mixed models replacing the Gaussian assumptions with skew-normal/independent (SNI) distributions. The SNI is an attractive class of asymmetric heavy-tailed distributions that includes the skew-normal, the skew-t, skew-slash and the skew-contaminated normal distributions as special cases. The proposed model provides flexibility in capturing the effects of skewness and heavy tail for responses which are either left- or right-censored. For our analysis, we adopt a Bayesian framework and develop a MCMC algorithm to carry out the posterior analyses. The marginal likelihood is tractable, and utilized to compute not only some Bayesian model selection measures but also case-deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated with a simulation study as well as a HIV case study involving analysis of longitudinal viral loads.
Bayesian inference; Detection limit; HIV viral load; Linear mixed models; Skew-normal/independent distribution
Bivariate clustered (correlated) data often encountered in epidemiological and clinical research are routinely analyzed under a linear mixed model framework with underlying normality assumptions of the random effects and within-subject errors. However, such normality assumptions might be questionable if the data-set particularly exhibit skewness and heavy tails. Using a Bayesian paradigm, we use the skew-normal/independent (SNI) distribution as a tool for modeling clustered data with bivariate non-normal responses in a linear mixed model framework. The SNI distribution is an attractive class of asymmetric thick-tailed parametric structure which includes the skew-normal distribution as a special case. We assume that the random effects follows multivariate skew-normal/independent distributions and the random errors follow symmetric normal/independent distributions which provides substantial robustness over the symmetric normal process in a linear mixed model framework. Specific distributions obtained as special cases, viz. the skew-t, the skew-slash and the skew-contaminated normal distributions are compared, along with the default skew-normal density. The methodology is illustrated through an application to a real data which records the periodontal health status of an interesting population using periodontal pocket depth (PPD) and clinical attachment level (CAL).
Bayesian; linear mixed model; MCMC; normal/independent distributions; skewness
Linear mixed effects (LME) models are useful for longitudinal data/repeated measurements. We propose a new class of covariate-adjusted LME models for longitudinal data that nonparametrically adjusts for a normalizing covariate. The proposed approach involves fitting a parametric LME model to the data after adjusting for the nonparametric effects of a baseline confounding covariate. In particular, the effect of the observable covariate on the response and predictors of the LME model is modeled nonparametrically via smooth unknown functions. In addition to covariate-adjusted estimation of fixed/population parameters and random effects, an estimation procedure for the variance components is also developed. Numerical properties of the proposed estimators are investigated with simulation studies. The consistency and convergence rates of the proposed estimators are also established. An application to a longitudinal data set on calcium absorption, accounting for baseline distortion from body mass index, illustrates the proposed methodology.
Binning; Covariance structure; Covariate-adjusted regression (CAR); Longitudinal data; Mixed model; Multiplicative effect; Varying coefficient models
We consider a semiparametric regression model that relates a normal outcome to covariates and a genetic pathway, where the covariate effects are modeled parametrically and the pathway effect of multiple gene expressions is modeled parametrically or nonparametrically using least-squares kernel machines (LSKMs). This unified framework allows a flexible function for the joint effect of multiple genes within a pathway by specifying a kernel function and allows for the possibility that each gene expression effect might be nonlinear and the genes within the same pathway are likely to interact with each other in a complicated way. This semiparametric model also makes it possible to test for the overall genetic pathway effect. We show that the LSKM semiparametric regression can be formulated using a linear mixed model. Estimation and inference hence can proceed within the linear mixed model framework using standard mixed model software. Both the regression coefficients of the covariate effects and the LSKM estimator of the genetic pathway effect can be obtained using the best linear unbiased predictor in the corresponding linear mixed model formulation. The smoothing parameter and the kernel parameter can be estimated as variance components using restricted maximum likelihood. A score test is developed to test for the genetic pathway effect. Model/variable selection within the LSKM framework is discussed. The methods are illustrated using a prostate cancer data set and evaluated using simulations.
BLUPs; Kernel function; Model/variable selection; Nonparametric regression; Penalized likelihood; REML; Score test; Smoothing parameter; Support vector machines
Linear mixed effects (LME) models are increasingly used for analyses of biological and biomedical data. When the multivariate normal assumption is not adequate for an LME model, then a robust estimation approach is preferable to the maximum likelihood one. M-estimators were considered before for robust estimation of the LME models, and recently a constrained S-estimator was proposed. This S-estimator can not be applied directly to LME models with correlated error terms and vector random effects with correlated dimensions. Therefore, a modification is proposed, which extends application of the constrained S-estimator to the LME models for multivariate responses with correlated dimensions and to longitudinal data. Also a new computational algorithm is developed for computing constrained S-estimators. Performance of the S-estimators based on the original Tukey’s biweight and translated biweight is evaluated in a small simulation study with repeated multivariate responses with correlated dimensions. Proposed methodology is applied to jointly analyze repeated measures on three cholesterol components, HDL, LDL, and triglycerides.
Multivariate linear mixed effects models; robust estimation; CTBS estimator for LME model; M-estimator
We propose a semiparametric Bayesian method for handling measurement error in nutritional epidemiological data. Our goal is to estimate nonparametrically the form of association between a disease and exposure variable while the true values of the exposure are never observed. Motivated by nutritional epidemiological data we consider the setting where a surrogate covariate is recorded in the primary data, and a calibration data set contains information on the surrogate variable and repeated measurements of an unbiased instrumental variable of the true exposure. We develop a flexible Bayesian method where not only is the relationship between the disease and exposure variable treated semiparametrically, but also the relationship between the surrogate and the true exposure is modeled semiparametrically. The two nonparametric functions are modeled simultaneously via B-splines. In addition, we model the distribution of the exposure variable as a Dirichlet process mixture of normal distributions, thus making its modeling essentially nonparametric and placing this work into the context of functional measurement error modeling. We apply our method to the NIH-AARP Diet and Health Study and examine its performance in a simulation study.
B-splines; Dirichlet process prior; Gibbs sampling; Measurement error; Metropolis-Hastings algorithm; Partly linear model
In this article we consider a semiparametric generalized mixed-effects model, and propose combining local linear regression, and penalized quasilikelihood and local quasilikelihood techniques to estimate both population and individual parameters and nonparametric curves. The proposed estimators take into account the local correlation structure of the longitudinal data. We establish normality for the estimators of the parameter and asymptotic expansion for the estimators of the nonparametric part. For practical implementation, we propose an appropriate algorithm. We also consider the measurement error problem in covariates in our model, and suggest a strategy for adjusting the effects of measurement errors. We apply the proposed models and methods to study the relation between virologic and immunologic responses in AIDS clinical trials, in which virologic response is classified into binary variables. A dataset from an AIDS clinical study is analyzed.
AIDS clinical trial; generalized linear mixed-effects models; linear mixed-effects model; local linear; local quasilikelihood; longitudinal data; measurement error; penalized quasilikelihood
Regression calibration has been described as a means of correcting effects of measurement error for normally distributed dietary variables. When foods are the items of interest, true distributions of intake are often positively skewed, may contain many zeroes, and are usually not described by well-known statistical distributions. The authors considered the validity of regression calibration assumptions where data are non-Gaussian. Such data (including many zeroes) were simulated, and use of the regression calibration algorithm was evaluated. An example used data from Adventist Health Study 2 (2002–2008). In this special situation, a linear calibration model does (as usual) at least approximately correct the parameter that captures the exposure-disease association in the “disease” model. Poor fit in the calibration model does not produce biased calibrated estimates when the “disease” model is linear, and it produces little bias in a nonlinear “disease” model if the model is approximately linear. Poor fit will adversely affect statistical power, but more complex linear calibration models can help here. The authors conclude that non-Gaussian data with many zeroes do not invalidate regression calibration. Irrespective of fit, linear regression calibration in this situation at least approximately corrects bias. More complex linear calibration equations that improve fit may increase power over that of uncalibrated regressions.
bias (epidemiology); foods; measurement error; power; regression calibration
This article proposes a joint model for longitudinal measurements and competing risks survival data. The model consists of a linear mixed effects sub-model with t-distributed measurement errors for the longitudinal outcome, a proportional cause-specific hazards frailty sub-model for the survival outcome, and a regression sub-model for the variance-covariance matrix of the multivariate latent random effects based on a modified Cholesky decomposition. A Bayesian MCMC procedure is developed for parameter estimation and inference. Our method is insensitive to outlying longitudinal measurements in the presence of non-ignorable missing data due to dropout. Moreover, by modeling the variance-covariance matrix of the latent random effects, our model provides a useful framework for handling high-dimensional heterogeneous random effects and testing the homogeneous random effects assumption which is otherwise untestable in commonly used joint models. Finally, our model enables analysis of a survival outcome with intermittently measured time-dependent covariates and possibly correlated competing risks and dependent censoring, as well as joint analysis of the longitudinal and survival outcomes. Illustrations are given using a real data set from a lung study and simulation.
Joint model; Competing risks; Bayesian analysis; Cholesky decomposition; Mixed effects model; MCMC; Modeling random effects covariance matrix; Outlier
We extend the standard multivariate mixed model by incorporating a smooth time effect and relaxing distributional assumptions. We propose a semiparametric Bayesian approach to multivariate longitudinal data using a mixture of Polya trees prior distribution. Usually, the distribution of random effects in a longitudinal data model is assumed to be Gaussian. However, the normality assumption may be suspect, particularly if the estimated longitudinal trajectory parameters exhibit multimodality and skewness. In this paper we propose a mixture of Polya trees prior density to address the limitations of the parametric random effects distribution. We illustrate the methodology by analyzing data from a recent HIV-AIDS study.
Conditional predictive ordinate; Longitudinal data; Mixture of Polya trees; Penalized spline
The relationship between a primary endpoint and features of longitudinal profiles of a continuous response is often of interest, and a relevant framework is that of a generalized linear model with covariates that are subject-specific random effects in a linear mixed model for the longitudinal measurements. Naive implementation by imputing subject-specific effects from individual regression fits yields biased inference, and several methods for reducing this bias have been proposed. These require a parametric (normality) assumption on the random effects, which may be unrealistic. Adapting a strategy of Stefanski and Carroll (1987 Biometrika 74:703–716), we propose estimators for the generalized linear model parameters that require no assumptions on the random effects and yield consistent inference regardless of the true distribution. The methods are illustrated via simulation and by application to a study of bone mineral density in women transitioning to menopause.
Conditional score; Longitudinal data; Measurement error; Mixed effects model; Regression calibration; Semiparametric
In nonlinear mixed-effects models, estimation methods based on a linearization of the likelihood are widely used although they have several methodological drawbacks. Kuhn and Lavielle (2005) developed an estimation method which combines the SAEM (Stochastic Approximation EM) algorithm, with a MCMC (Markov Chain Monte Carlo) procedure for maximum likelihood estimation in nonlinear mixed-effects models without linearization. This method is implemented in the Matlab software MONOLIX which is available at http://software.monolix.org/. In this paper we apply MONOLIX to the analysis of the pharmacokinetics of saquinavir, a protease inhibitor, from concentrations measured after single dose administration in 100 HIV patients, some with advance disease. We also illustrate how to use MONOLIX to build the covariate model using the Bayesian Information Criterion. Saquinavir oral clearance (CL/F) was estimated to be 1.26 L/h and to increase with body mass index, the inter-patient variability for CL/F being 120%. Several methodological developments are ongoing to extend SAEM which is a very promising estimation method for population pharmacockinetic/pharmacodynamic analyses.
Administration, Oral; Algorithms; Bayes Theorem; HIV Infections; blood; drug therapy; metabolism; virology; HIV Protease Inhibitors; administration & dosage; blood; pharmacokinetics; HIV-1; Humans; Likelihood Functions; Markov Chains; Models, Biological; Monte Carlo Method; Nonlinear Dynamics; Population Surveillance; Prospective Studies; Reproducibility of Results; Saquinavir; administration & dosage; blood; pharmacokinetics; Severity of Illness Index; Software; Stochastic Processes
We consider functional measurement error models, i.e. models where covariates are measured with error and yet no distributional assumptions are made about the mismeasured variable. We propose and study a score-type local test and an orthogonal series-based, omnibus goodness-of-fit test in this context, where no likelihood function is available or calculated—i.e. all the tests are proposed in the semiparametric model framework. We demonstrate that our tests have optimality properties and computational advantages that are similar to those of the classical score tests in the parametric model framework. The test procedures are applicable to several semiparametric extensions of measurement error models, including when the measurement error distribution is estimated non-parametrically as well as for generalized partially linear models. The performance of the local score-type and omnibus goodness-of-fit tests is demonstrated through simulation studies and analysis of a nutrition data set.
Efficient estimation; Efficient testing; Errors in variables; Goodness-of-fit tests; Local alternatives; Measurement error; Score testing; Semiparametric models
This paper considers the problem of estimation in a general semiparametric regression model when error-prone covariates are modeled parametrically while covariates measured without error are modeled nonparametrically. To account for the effects of measurement error, we apply a correction to a criterion function. The specific form of the correction proposed allows Monte Carlo simulations in problems for which the direct calculation of a corrected criterion is difficult. Therefore, in contrast to methods that require solving integral equations of possibly multiple dimensions, as in the case of multiple error-prone covariates, we propose methodology which offers a simple implementation. The resulting methods are functional, they make no assumptions about the distribution of the mismeasured covariates. We utilize profile kernel and backfitting estimation methods and derive the asymptotic distribution of the resulting estimators. Through numerical studies we demonstrate the applicability of proposed methods to Poisson, logistic and multivariate Gaussian partially linear models. We show that the performance of our methods is similar to a computationally demanding alternative. Finally, we demonstrate the practical value of our methods when applied to Nevada Test Site (NTS) Thyroid Disease Study data.
Generalized estimating equations; generalized linear mixed models; kernel method; measurement error; Monte Carlo Corrected Score; semiparametric regression
Malaria is a major public health problem in Malawi, however, quantifying its burden in a population is a challenge. Routine hospital data provide a proxy for measuring the incidence of severe malaria and for crudely estimating morbidity rates. Using such data, this paper proposes a method to describe trends, patterns and factors associated with in-hospital mortality attributed to the disease.
We develop semiparametric regression models which allow joint analysis of nonlinear effects of calendar time and continuous covariates, spatially structured variation, unstructured heterogeneity, and other fixed covariates. Modelling and inference use the fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulation techniques. The methodology is applied to analyse data arising from paediatric wards in Zomba district, Malawi, between 2002 and 2003.
Results and Conclusion
We observe that the risk of dying in hospital is lower in the dry season, and for children who travel a distance of less than 5 kms to the hospital, but increases for those who are referred to the hospital. The results also indicate significant differences in both structured and unstructured spatial effects, and the health facility effects reveal considerable differences by type of facility or practice. More importantly, our approach shows non-linearities in the effect of metrical covariates on the probability of dying in hospital. The study emphasizes that the methodological framework used provides a useful tool for analysing the data at hand and of similar structure.
Joint models are frequently used in survival analysis to assess the relationship between time-to-event data and time-dependent covariates, which are measured longitudinally but often with errors. Routinely, a linear mixed-effects model is used to describe the longitudinal data process, while the survival times are assumed to follow the proportional hazards model. However, in some practical situations, individual covariate profiles may contain changepoints. In this article, we assume a two-phase polynomial random effects with subject-specific changepoint model for the longitudinal data process and the proportional hazards model for the survival times. Our main interest is in the estimation of the parameter in the hazards model. We incorporate a smooth transition function into the changepoint model for the longitudinal data and develop the corrected score and conditional score estimators, which do not require any assumption regarding the underlying distribution of the random effects or that of the changepoints. The estimators are shown to be asymptotically equivalent and their finite-sample performance is examined via simulations. The methods are applied to AIDS clinical trial data.
Changepoint; Conditional score; Corrected score; Measurement error; Random effects; Proportional hazards
Many methods, including parametric, nonparametric, and Bayesian methods, have been used for detecting differentially expressed genes based on the assumption that biological systems are linear, which ignores the nonlinear characteristics of most biological systems. More importantly, those methods do not simultaneously consider means, variances, and high moments, resulting in relatively high false positive rate. To overcome the limitations, the SWang test is proposed to determine differentially expressed genes according to the equality of distributions between case and control. Our method not only latently incorporates functional relationships among genes to consider nonlinear biological system but also considers the mean, variance, skewness, and kurtosis of expression profiles simultaneously. To illustrate biological significance of high moments, we construct a nonlinear gene interaction model, demonstrating that skewness and kurtosis could contain useful information of function association among genes in microarrays. Simulations and real microarray results show that false positive rate of SWang is lower than currently popular methods (T-test, F-test, SAM, and Fold-change) with much higher statistical power. Additionally, SWang can uniquely detect significant genes in real microarray data with imperceptible differential expression but higher variety in kurtosis and skewness. Those identified genes were confirmed with previous published literature or RT-PCR experiments performed in our lab.
Spatial-temporal data requires flexible regression models which can model the dependence of responses on space- and time-dependent covariates. In this paper, we describe a semiparametric space-time model from a Bayesian perspective. Nonlinear time dependence of covariates and the interactions among the covariates are constructed by local linear and piecewise linear models, allowing for more flexible orientation and position of the covariate plane by using time-varying basis functions. Space-varying covariate linkage coefficients are also incorporated to allow for the variation of space structures across the geographical location. The formulation accommodates uncertainty in the number and locations of the piecewise basis functions to characterize the global effects, spatially structured and unstructured random effects in relation to covariates. The proposed approach relies on variable selection-type mixture priors for uncertainty in the number and locations of basis functions and in the space-varying linkage coefficients. A simulation example is presented to evaluate the performance of the proposed approach with the competing models. A real data example is used for illustration.
Bayesian regression; latent structure model; piecewise linear splines; space-time models; variable selection
Understanding human sexual behaviors is essential for the effective prevention of sexually transmitted infections. Analysis of longitudinally measured sexual behavioral data, however, is often complicated by zero-inflation of event counts, nonlinear time trend, time-varying covariates, and informative dropouts. Ignoring these complicating factors could undermine the validity of the study findings. In this paper, we put forth a unified joint modeling structure that accommodates these features of the data. Specifically, we propose a pair of simultaneous models for the zero-inflated event counts: Each of these models contains an auto-regressive structure for the accommodation of the effect of recent event history, and a nonparametric component for the modeling of nonlinear time effect. Informative dropout and time varying covariates are modeled explicitly in the process. Model fitting and parameter estimation are carried out in a Bayesian paradigm by the use of a Markov Chain Monte Carlo (MCMC) method. Analytical results showed that adolescent sexual behaviors tended to evolve nonlinearly over time and they were strongly influenced by the day-to-day variations in mood and sexual interests. These findings suggest that adolescent sex is to a large extent driven by intrinsic factors rather than being compelled by circumstances, thus highlighting the need of education on self protective measures against infection risks.
Joint modeling; Markov Chain Monte Carlo; Mood; Sexually transmitted infections; Zero-inflated Poisson
In the analysis of cluster data the regression coefficients are frequently assumed to be the same across all clusters. This hampers the ability to study the varying impacts of factors on each cluster. In this paper, a semiparametric model is introduced to account for varying impacts of factors over clusters by using cluster-level covariates. It achieves the parsimony of parametrization and allows the explorations of nonlinear interactions. The random effect in the semiparametric model accounts also for within cluster correlation. Local linear based estimation procedure is proposed for estimating functional coefficients, residual variance, and within cluster correlation matrix. The asymptotic properties of the proposed estimators are established and the method for constructing simultaneous confidence bands are proposed and studied. In addition, relevant hypothesis testing problems are addressed. Simulation studies are carried out to demonstrate the methodological power of the proposed methods in the finite sample. The proposed model and methods are used to analyse the second birth interval in Bangladesh, leading to some interesting findings.
Varying-coefficient models; local linear modelling; cluster level variable; cluster effect
In many environmental epidemiology studies, the locations and/or times of exposure measurements and health assessments do not match. In such settings, health effects analyses often use the predictions from an exposure model as a covariate in a regression model. Such exposure predictions contain some measurement error as the predicted values do not equal the true exposures. We provide a framework for spatial measurement error modeling, showing that smoothing induces a Berkson-type measurement error with nondiagonal error structure. From this viewpoint, we review the existing approaches to estimation in a linear regression health model, including direct use of the spatial predictions and exposure simulation, and explore some modified approaches, including Bayesian models and out-of-sample regression calibration, motivated by measurement error principles. We then extend this work to the generalized linear model framework for health outcomes. Based on analytical considerations and simulation results, we compare the performance of all these approaches under several spatial models for exposure. Our comparisons underscore several important points. First, exposure simulation can perform very poorly under certain realistic scenarios. Second, the relative performance of the different methods depends on the nature of the underlying exposure surface. Third, traditional measurement error concepts can help to explain the relative practical performance of the different methods. We apply the methods to data on the association between levels of particulate matter and birth weight in the greater Boston area.
Air pollution; Measurement error; Predictions; Spatial misalignment
We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: the roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on linear coefficients to achieve model sparsity. Compared to existing estimation equation based approaches, our procedure provides valid inference for data with missing at random, and will be more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimation for estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to the existing ones. We then apply the new method to a real data set from a lactation study.
Correlated data; Gaussian stochastic process; Linear mixed models; Smoothly clipped absolute deviation; Smoothing splines