PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
Stat Med. Author manuscript; available in PMC 2017 April 30.
Published in final edited form as:
Stat Med. 2016 April 30; 35(9): 1488–1501.
Published online 2015 December 2. doi:  10.1002/sim.6810
PMCID: PMC4821700
NIHMSID: NIHMS736503

Deletion Diagnostics for the Generalised Linear Mixed Model with independent random effects

Abstract

The Generalised Linear Mixed Model (GLMM) is widely used for modelling environmental data. However, such data are prone to influential observations which can distort the estimated exposure-response curve particularly in regions of high exposure. Deletion diagnostics for iterative estimation schemes commonly derive the deleted estimates based on a single iteration of the full system holding certain pivotal quantities such as the information matrix to be constant. In this paper, we present an approximate formula for the deleted estimates and Cook’s distance for the GLMM which does not assume that the estimates of variance parameters are unaffected by deletion. The procedure allows the user to calculate standardised DFBETAs for mean as well as variance parameters. In certain cases, such as when using the GLMM as a device for smoothing, such residuals for the variance parameters are interesting in their own right. In general, the procedure leads to deleted estimates of mean parameters which are corrected for the effect of deletion on variance components as estimation of the two sets of parameters is interdependent. The probabilistic behaviour of these residuals is investigated and a simulation based procedure suggested for their standardisation. The method is used to identify influential individuals in an occupational cohort exposed to silica. The results show that failure to conduct post model fitting diagnostics for variance components can lead to erroneous conclusions about the fitted curve and unstable confidence intervals.

Keywords: Cook’s distance, deletion diagnostics, DFBETAs, exposure-response, Generalised Linear Mixed Models

1. Introduction

1.1. Influential observations in environmental epidemiology

The motivation for this paper arises out of the need to simultaneously address the twin issues of modeling nonlinear exposure-response relationships while discounting the effect of influential observations for a cohort study of lung cancer and silica exposure [1]. Heterogeneity in response commonly leads to unusal shapes of the dose response in survival analysis [2]. Concurrently, smoothing methods have become also common for analyzing such data. Smoothing is particularly appropriate for survival analysis of large occupationally exposed cohorts because the exposure-response relationship is often attenuated with a rising relative risk that flattens or declines at the highest exposures [3]. The impact of observations with the highest values of exposure on the shape of the dose response curve is of great interest because a plateau or decline in risk at high levels may appear to undercut evidence for a causal relationship. Adhoc procedures such as omitting the highest percentiles of the exposure are common but as our application section shows, influential values need not be restricted to the highest exposures. Residual analysis methods are sometimes used but are often inconclusive and not sufficiently well developed for many common models for environmental data. In a previous paper Ganguli et al. [4], the authors have shown that application of different adhoc diagnostic procedures can lead to conflicting conclusions about the functional form of the log hazard ratio for silica exposure.

In a previous reanalysis of the motivating data [5], penalized splines were used in a Cox model framework to estimate the exposure-response relationship. As shown in Figure 1, the selected curve rose to a maximum hazard at 10 mg/m3-years, and then after a long plateau, gradually declined. Informal regression diagnostics revealed that the subject with the highest exposure at the end of follow-up (62 mg/m3-years) appeared in 72 of the risk sets defined for the 77 lung cancer cases. The removal of the subject truncated the upper half of the exposure range beyond 40 mg/m3-years, and with it, removed most of the decline in the fitted curve. The post model fitting deletion methods developed in this paper are applied to the same occupational cohort study of silica [1] that provided the basis for the earlier analysis.

Figure 1
Plot of the estimated log hazard ratio from the motivating silica dataset, including fits from the full dataset and after exclusion of the highest exposures.

1.2. Literature review

Post model fitting deletion diagnostics [6] using criterion-based methods such as Cook’s distance [7] offer a relatively objective means to identify influential data points. An alternative class of methods for identification of influential points focuses on measurement of local influence [813]. As is evident from the name, these methods provide an estimate of local influence in response to small perturbations of the model. They require the user to specify a priori null and alternative hypotheses about the nature of influence. By contrast, Cook’s distance is better suited for exploratory analysis of influence and detection of gross violations.

Much of the literature on criterion-based methods has focused on the classical linear model, an attractive feature of which is that well-known matrix identities can be used to derive explicit expressions for the “deleted” estimates. Outside the realm of the linear model, Fung et al. [14] derives Cook’s distance for a class of semiparametric models while Barlow [15] derives the distance for the proportional hazards model. Preisser and Qaqish [16] considers deletion diagnostics for Generalised Estimating Equations. A particularly simple and elegant framework for deletion diagnostics for linear mixed models is provided by Haslett and Hayes [17], and Haslett [18] where the authors show that the Best Linear Unbiased Predictor (BLUP) after deletion of an observation is equivalent to the BLUP from the complete subset with the deleted observation replaced by its predictor based on the remaining data points. However, each of these make the assumption that deletion of an individual does not affect the estimate of the variance components. While Haslett and Dillane [19] derives separate deletion diagnostics for variance parameters, the authors recognise that estimation of mean and variance parameters are interlinked and that one should really derive deleted estimates of regression coefficients which have been corrected for any changes to estimated variances as a result of deletion. Welsch [20] argues that the estimate of a regression parameter is only meaningful in the context of a standard error estimate which in turn requires deletion diagnostics for variance parameters. This is particularly true when variance components are used as a device for smoothing [21]. Therneau and Grambsch [22] points out that failure to re-estimate the variance can lead to underestimation for larger influence points. Sen Roy and Guria [23] shows that a proper combination of the full set and deleted set estimators leads to a smaller estimate of the standard error and hence to more efficient estimation. Ganguli et al. [4] reviews the application of existing diagnostic methods for Cox models and finds them inadequate when splines are included as regressors.

Our objective in this paper is to develop an approximation for Cook’s distance for a particular class of GLMMs considered by Breslow and Clayton [24] with independent random effects. While our primary focus is on using GLMMs as a device for penalised smoothing, the GLMM with independent random effects has a vast scope including random intercept modeling, correlated errors, clustering, spatial modeling, censored data modelling and combinations of the above. This has been discussed by Ruppert et al. [21]. The proposed method enables the user to study the impact of deletion on point estimates of the fixed and random coefficients as well as on corresponding confidence/prediction intervals. In general, influence detection when the underlying relationship is non-linear is a tricky problem and some of the pertinent issues have been discussed by Manchester and Blanchard [25]. However, the GLMM provides a particularly convenient framework for modeling a non-linear dose response relationship and the interested reader is referred to [21] and [26].

2. Methodology

2.1. Model

Our underlying model is a special case of the GLMM of Breslow and Clayton [24] which assumes independence of random effects. For each of n individuals, we observe an outcome yi together with p covariates xi associated with fixed effects. The random effects are components of the r–dimensional vector, b partitioned as b=(b1T,,bkT)T, with bj being of dimension rj, j = 1, …, k and r=j=1krj. The corresponding coefficients are denoted by zi=(zi1T,,zikT)T, where zij = (zij1, …, zijrj)T. Conditioning on the r–dimensional vector of random effects, we have,

E(yib)=μiandVar(yib)=ϕaiv(μi),i=1,,n.
(1)

The ai ’s are known scalars and v(·) is a known function. The unknown parameter ϕ accounts for overdispersion. Here, μi is related to a linear predictor ηi via the relationship,

g(μi)=ηi=xiTα+ziTb

for some known function g(·). We assume that b ~ Nr(0, D) where D = ((σj2Irj))j=1,…,k. Explicit expressions for parameter estimates are not available and α, b and D are estimated from an iterative system of equations. Restricted Maximum Likelihood (REML) estimating equations [27] are used for the variance parameters. Writing y = (y1, …, yn)T, X = (x1, …, xn)T and Z = (z1, …, zn)T, the equations are,

[XTWXXTWZZTWXZTWZ+D-1][αb]=[XTWYZTWY],
(2)

and,

YTQZjZjTQY=trace(QZjZjT).
(3)

Here, Y = η + (yμ) * g′(μ), and assuming a canonical link

W=diag((ϕaig(μi)))V=W-1+ZDZT=W-1+j=1kσj2ZjZjTandQ=V-1-V-1X(XTV-1X)-1XTV-1,

with * representing component-wise multiplication and Z partitioned as [Z1, …, Zk] corresponding to the k sets of independent random effects.

Our application is based on the approximation of a Cox model with nonlinear log hazard ratios by a GLMM as illustrated by Therneau and Grambsch [22], and Ganguli and Wand [28]. The use of GLMMs as a device for scatterplot smoothing and penalised regression is well known and has been elaborated upon by Ruppert et al. [21], Speed [29], Wahba [30], Babette and John [31]. The essence of the idea is to use a flexible set of basis functions corresponding to a fixed number of knots as explanatory variables which can model nonlinearity and the distributional assumption on the coefficients of these basis functions serves as a device for smoothing. Best Linear Unbiased Prediction can be used to derive the fitted model [32]. The principle of best prediction can be extended in several ways, for example as a device for penalised log likelihood based estimation for generalised responses belonging to the one parameter exponential family [33] and as a device for kriging in case of geospatial data [34]. Extensive discussion on technical issues such as knot selection, alternative choices of basis functions, integral evaluation etc. has been provided by Ruppert et al. [21]. Therneau and Grambsch [22] further extends this to modelling censored data with nonlinear log hazard ratios by using the counting process representation of Cox models. By introducing an approximation to the baseline hazard function, the authors are able to reduce the model to an approximate Poisson GLMM.

2.2. Deletion estimates

A first set of deleted estimates can be obtained by application of the results of Haslett and Hayes, and Hayes and Haslett [17, 35] viz.:

α^(i,0)=α^n-Bieib^(i,0)=b^n-τ^ZTQiei.

where,

τ^=diag(τ^1,,τ^k)=diag(σ^12,,σ^k2)B=(XTV-1X)-1XTV-1=[B1,,Bn]Q=[Q1,,Qn]=((qij))i,j=1,,nei=qii-1QiTY,

and [alpha]n and bn represent estimates from the full dataset. Similarly, estimates of the variance parameters after deletion can be approximated using the results of Haslett and Dillane [19], viz.

Tτ^(i,0)=s(i),
(4)

where,

T=((traceZlZlTQZjZjTQ))l,js(i)=YPiQZZTQPiY+qii-1((traceQZjZjTQEi)).

Here, Pi = Inqii−1EiQ and Ei is the ith unit vector of order n.

Unfortunately, the results of Haslett and Hayes [17], and Hayes and Haslett [35] cannot be applied to subsequent iterations to arrive at the deleted estimates as the ith iteration equation post deletion is not directly related to the ith iteration equation of the full model. Instead, we consider the systems (2) and (3) at convergence and denote by [alpha]n, bn, and Dn the estimators computed from the full system based on n observations. At the next iteration of the deleted system, the quantities Y, μ, η, V, W and Q are updated using the first step deleted estimates [alpha](i,0), b(i,0) and [tau](i,0) to get the next set of iterative equations,

[X(i)TW(i,0)X(i)X(i)TW(i,0)Z(i)Z(i)TW(i,0)X(i)Z(i)TW(i,0)Z(i)+D(i,0)-1][αb]=[X(i)TW(i,0)Y(i)Z(i)TW(i,0)Y(i)]
(5)

where X(i) and Z(i) are the design and variance matrices constructed after deletion of the ith individual and D(i,0) and W(i,0) are variance matrices constructed based on [alpha](i,0), b(i,0) and [tau](i,0). As the one-step deleted estimates, [alpha](i,0), b(i,0) and [tau](i,0) usually constitute small corrections and are often under estimates [22], we make the assumption that the solution to the full system corresponding to (5) may be approximated by [alpha](i,0), b(i,0) and [tau](i,0). Then by a reapplication of the results of Haslett and Hayes [17], and Hayes and Haslett [35], improved deleted estimates are given by

α^(i)=α^n-Biei-Bieib^(i)=b^n-τ^ZTQiei-τ^ZTQiei.

where the superscript * denotes that these quantities were calculated based on [alpha](i,0), b(i,0) and [tau](i,0).

Using a Taylor series expansion, we can linearise [alpha](i) as

α^(i)=α^n-Biei-Bieiα^n-Biei-{Biei+(δeiBiδαTδeiBiδbTδeiBiδτT)(α^(i,0)-α^nb^(i,0)-b^nτ^(i,0)-τ^n)}

Hence

α^(i)-α^n=-2eiBi-(δeiBiδαTδeiBiδbTδeiBiδτT)(α^(i,0)-α^nb^(i,0)-b^nτ^(i,0)-τ^n)
(6)

=fα(α^n,b^n,τ^n,ei),
(7)

where, as shown in Appendix A1,

δeiBiδαT=-eiBδVδαT(QiIp)+Biqii-1QiTδVδαT{(eiQi-QY)Ip}δeiBiδbT=-eiBδVδbT(QiIq)+Biqii-1QiTδVδbT{(eiQi-QY)Iq}δeiBiδτT=-eiBδVδτT(QiIk)+Biqii-1QiTδVδτT{(eiQi-QY)Ik}

and fα(·) is defined in (6) and (7). Note that this enables us to approximate [alpha](i) in terms of quantities obtained from the full model fit. Similarly, we can get expressions for the DFBETAs,

b^(i)-b^n=fb(α^n,b^n,τ^n,ei)
(8)

τ^(i)-τ^n=fτ(α^n,b^n,τ^n,ei)
(9)

as well as similar expressions for DFFIT residuals in terms of the output of the full model fit.

2.3. Standardisation and critical values

The next step is to derive a suitable standardisation factor for the differences in (6) and (8). The usual approach in the literature is to standardise these by a positive definite matrix which can be chosen to reflect specific concerns [36]. A standard procedure following Cook [7] is to use the variances of the corresponding parameters from the full model for standardisation and the percentage points of the F distribution as a critical value. In this case, the corresponding ‘p value’ gives the level of the smallest confidence ellipsoid based on the full data that contains the estimate.

However, there has been some discussion in the literature regarding the choice of the norming matrix to be used for standardisation. Beckman and Cook [6] argues that it should not depend on i whereas the norm suggested by Belsley et al.[37] does so. Dempster and Gasko [36] provides an interesting geometric perspective on various alternative standardisation criteria and their connection with residual analyses.

Our approach is based on the result,

fα(α^n,b^n,τ^n,ei)-fα(α^n,b^n,τ^n,qii-12Z)P0fb(α^n,b^n,τ^n,ei)-fb(α^n,b^n,τ^n,qii-12Z)P0fτ(α^n,b^n,τ^n,ei)-fτ(α^n,b^n,τ^n,qii-12Z)P0
(10)

where Z is a standard normal deviate. This result has been proved in Appendix (A2). The implication of this is that we can generate K standard normal deviates {Zk, k = 1, …, K} and estimate the covariance matrix of say, fα([alpha]n, bn, [tau]n, i) by the empirical covariance of fα(α^n,b^n,τ^n,qii-12Zk), k = 1, …, K which can then be used to standardise the residuals fα([alpha]n, bn, [tau]n, i). Similar approaches can be used for the random effects b and the variance components τ as well as for the DFFITs [eta w/ hat]i[eta w/ hat]i(i). Simulation based critical values can also be derived for each of the quantities based on the results in (10).

3. Application

3.1. Application to the silica exposure dataset

In this section, we apply our methods to the motivating dataset. The connection between Cox models and Poisson regression is well established [38, 39]. Here, we use the equivalence discussed by Therneau and Grambsch [22], and Ganguli and Wand [28] who approximate a Cox proportional hazards model with a non linear log hazard ratio by a Poisson mixed effects model. We fit a model of the form,

λ(tx)=λ0(t)exp{β0HISP+f(x(t))},
(11)

where t is the age at which a subject dies from lung cancer, HISP is an indicator for whether or not a subject is Hispanic and x(t) is cumulative silica exposure at time t. The function f(·) is unknown but assumed to be smooth. Using I as the indicator function and δ as the censoring indicator (1, if uncensored and 0, if censored) and writing N(t) = I(Tt, δ = 1), it follows from Ganguli and Wand [28] that (11) can be approximated as,

E(N(t)b)=exp{β0HISP+β1x(t)+k=1Kbkx(t)-κk3}Λ^,
(12)

where the κk’s are a set of K quantiles of x and the components of b = (b1, …, bK) ~ i.i.d. N(0, σb2) and [Lambda with circumflex] is the estimator of the cumulative hazard function of the standard Cox model with log linear hazard. Following Ganguli and Wand [28], and Therneau and Grambsch[22], this can be approximated by a GLMM with a log-linear link function,

log{E(N(t)b)}=log(Λ^)+β0HISP+β1x(t)+k=1Kbkx(t)-κk3,

where [Lambda with circumflex], the estimated cumulative hazard from a standard Cox model assuming a linear log hazard function for cumulative silica hazard, serves as an approximate offset.

Figure 2 shows the DFFITs in panel (a) and DFBETAs for the variance component σb2 in panel (b). Panels (c) and (d) show the estimates of the log hazard function after excluding subjects identified as influential in panel (a) and after excluding subjects identified as influential based on both panels (a) and (b) respectively. Subjects were considered to be influential if their standardised DFBETA or DFFIT values exceeded the simulation based benchmark described in section 2.3. We can make the following observations:

Figure 2
Influence diagnostics for the silica data. The horizontal axis represents cumulative silica exposure (mg/m3yr). Panel (a) shows DFFITs and panel (b) shows DFBETAs for the variance component. Panels (c) and (d) give the estimated log hazard ...
  • The log hazard continues to dip beyond an exposure of 50 mg/m3-year in the first case but approaches a plateau in the second.
  • The two figures (Figure 2 (c)–(d)) show a remarkable contrast. In the former case (Figure 2-(c)), deletion does not affect the shape of the dose response curve significantly although it does attenuate the response - by about 50% at 30 mg/m3-years. However, in the latter case (Figure 2-(d)), deletion appears to have a more dramatic effect and leads to a different shape of the dose response curve. The results suggest that the curve based on the full dataset might be picking up spurious structure at the upper range of exposures.
  • The 95% pointwise confidence intervals are indicated using bold lines for the full dataset and dotted lines corresponding to post-deletion. The results make a strong case for conducting post model fitting deletion diagnostics for variance components as the confidence intervals obtained by dropping observations with influential DFFIT values only is very wide. Note that individuals who are influential for variance component estimation are still included for estimation in panel (c) and it appears that the inclusion of these observations leads to an unstable estimate of the variance components and hence to an unreliable confidence interval. By contrast, the confidence intervals obtained in panel (d) are tighter.

Our learning from this application is firstly that influential observations need not occur at the highest extremes of exposure. Second, we can distinguish between observations which are influential for the shape of the estimated dose response curve and those which impact its value but not its shape. The individual with the highest exposure is influential for the estimated shape of the log hazard ratio. The optimum degrees of freedom estimate decreases from approximately three to two as a result of dropping the individual with the highest exposure. Finally, we note that failure to conduct post model fitting diagnostics for variance components can lead to erroneous and unstable confidence interval estimation.

3.2. Performance on simulated datasets

In this section we demonstrate the performance of the method on three different applications all of which may be fitted using GLMMs.

3.2.1. Clustered data

We simulate clustered count data using the model:

yij~Poisson(μij);i=1,2,k;j=1,2,,ni,

with k being the number of clusters. All observations from the ith cluster share a common random intercept, bi specified by log(μij) = bi + βxij, where xij is a subject specific covariate and β, the corresponding log rate ratio. For the purpose of simulation, we set β = 0.5, k = 9 and ni = 100. The bi, i = 1, 2, …, 9, are generated from a N(0, 0.5) distribution and xij’s are obtained from a sequence of (k × ni) = 900 values generated from a U[0, 1] distribution. We introduce an influential subject (here cluster) by replacing all observations in cluster 3 by counts corresponding to μij = 20.

Figure 3 shows the corresponding standardized residuals which clearly identify observations belonging to cluster 3 as influential in terms of high residual values. It is noteworthy that inclusion of the outlying cluster forces the regression line to have a negative slope whereas the true slope coefficient is positive. Note that the estimates of the variance component are 3.544 and 0.301 corresponding to inclusion and exclusion of the outlying cluster respectively.

Figure 3
Plot of standardized DFBETAs of fixed effects for clustered Possion counts with all observation in cluster 3 as influential.

3.2.2. Binary generalized additive model

We generate data from a binary model with a nonlinear dose response which is modelled using a Generalized Additive Model (GAM) specified by:

y~Binary(p)wherelogit(p)=β0+f(x).
(13)

We set β0 = 0.5 and f(x) = 1.5x − 2.5x2. We first generate the xi, i = 1, 2 …, n = 1500 from the Uniform[0, 1] distribution and then generate the binary yi, i = 1, 2, …, n = 1500 using (13). Thereafter, we introduce influential values by sorting the x values and replacing all the 0 responses between observations 1480–1495 by 1’s. Following Ruppert et al. [21], the model in (13) can be represented as a GLMM as follows,

logit(pb)=β0+k=1Kbkx-κk3.
(14)

Here, the κk’s are a set of K quantiles of x and the components of b = (b1, …, bK) ~ i.i.d. N(0, σb2). This model can be fitted using the glmmPQL routine in R. Note that K is set at 15 following the guidelines provided by Ruppert et al. [21]. A plot of the estimated log-odds ratio along with point-wise 95% confidence invervals (Figure 4) suggests spurious structure of the esimated function when computed from the full dataset whereas the functional form based on the deleted estimates indicates a smoother fit. Note that the REML based estimates of the degrees of freedom based on all and non-influential observations only were 4.5 and 2.8 respectively.

Figure 4
True (continuous line), estimated (dotted lines) and 95% point-wise confidence interval (thick dashed line) of log odds f(x) for a simulated binary GAM. (a) is obtained by including all observations and (b) by excluding observations flagged as influential. ...

3.2.3. Geostatistical data

We consider here a simulated example from the following geostatistical model:

yi~Poisson(μ),wherelog(μ)=s(x),
(15)

with s(·) being a bivariate function of x = (x1, x2)T, which we set as

s(x1,x2)=0.62πσ1σ2×e-(x1-0.2)2σ12-(x2-0.3)2σ22+0.42πσ1σ2×e-(x1-0.7)2σ12-(x2-0.8)2σ22,

where σ1 = 0.3 and σ2 = 0.4. Five hundred sets of x were independently generated from the Uniform [0, 1] distribution and based on these, y’s were generated using (15). We replaced the 500th observation with a relatively high value to induce an influential observation. Following Kammann and Wand [34], a low rank model formulation is given by,

logE(yib)=β0+β1x1+β2x2+k=1Kbkxi-κk3.
(16)

Here, the κk’s are a set of K knots chosen by applying a space-filling algorithm to the xi’s and the components of b = (b1, …, bK) ~ i.i.d. N(0, σb2).

Results are plotted in Figure 5. The estimated d.f. with and without influential observations were 23.3 and 21.75 respectively. The influential point was correctly identified and the estimated functional form after dropping the influential point was similar to the true functional form.

Figure 5
Contour plots showing a comparison of the true and estimated functional forms with and without inclusion of the influential point. DFBETAs for the variance component are overlaid on the estimated surface and the size of the solid circle corresponds to ...

Discussion

In this paper, we have developed Cook’s distance for fixed and random effects parameters as well as for variance component estimates for the Generalised Linear Mixed Model with independent random effects. Given the wide applicability of the model, these diagnostics have far reaching scope. The simulations and application presented make a strong case for conducting post model fitting deletion diagnostics. First, influential observations need not occur at the highest extremes of exposure. Second, we can distinguish between influential observations which only modestly influence the values of the log hazard over local regions, and those which influence the shape of the exposure-response curve. Third, observations influential for the variance component estimate occurred at the highest levels of exposure and the deletion of these outliers had more influence on the shape of the curve. In the silica application, when the highest exposed subject was removed, the optimum degrees of freedom dropped from 3 to 2. Using the full dataset leads to overfitting in regions of high cumulative exposure. This is common for exposure-response modelling and possible reasons include the saturation of biological processes, misclassification of exposure at high levels and the healthy worker effect [3, 5, 40]. For the silica dataset, the main problem is the sparsity of data and few cases in the regions of high cumulative exposure.

Empirical observations from the authors suggest that the performance of the method is driven by a combination of the degree of nonlinearity in the exposure-response relationship and the local density of data points. The method performs well in presence of moderate nonlinearity even in regions of sparse data but will tend to spuriously identify influential observations in regions of high nonlinearity. We have also informally compared the performance of the method on simulated datasets with Poisson and binary outcomes and observed that for the same degrees of nonlinearity and data density, influential values are identified more accurately for Poisson outcomes. Further extensions of the method should accomodate deletion of a subset of observations. Note that in our application to the silica data which has time-varying exposures, we have considered detection of influential exposures and considered an individual as influential if at least one of his observations have been classified as influential. We have assumed throughout the paper that the random effects are independent. While such an assumption is typically made for cases such as our motivating example, where the variance components model serves as a device for smoothing, non-spherical variance structures are commonly used when random effects are used to model subject specific effects or overdispersion. As additional covariance parameters are also estimated by Restricted Maximum Likelihood (REML) and the results of Haslett and Hayes, and Hayes and Haslett[17, 35] continue to be valid, our method can, in theory, be extended to accomodate non-spherical variance structures although at a cost of added computational complexity. Additional research which could be spurred by this paper include the development of models which include explanations for the occurence of influential observations such as population heterogenity but it is clear that such research requires considerable substantive knowledge.

Acknowledgments

Contract/grant sponsor: R01 CA081345, National Institute of Health.

This research was supported by grant R01 CA081345 from the National Institute of Health. The authors would also like to thank Prof. Checkoway for permission to use the dataset on silica exposure.

Appendix

A1: Derivation of the expression for the derivatives in (6)

We have,

δeiBiδαT=δBiδαTeiIp+BiIpδeiδαT

To calculate the derivatives, first note that writing C=(0p×pXTXV),

B=(Ip0p×n)(0p×pXTXV)-1(0p×nIn)=(Ip0p×n)C-1(0p×nIn)

so that

δBδαT={(Ip0p×n)Ip}δC-1δαT{(0n×pIn)TIp}=-{(Ip0p×n)Ip}C-1δCδαTC-1{(0n×pIn)TIp}

Noting that,

δCδαT=(00T0δVδαT)

and that

δVδαT=[ϕa1g(μ1)g(μ1)x1T0T0T0T0Tϕang(μn)g(μn)xnT],

we get after some simplification,

δBiδαT=-BδVδαT(QiIp).

Similarly,

δQiδαT=-QδVδαT(QiIp)andδqiiδαT=-QiTδVδαT(QiIp).

Also, using the fact that

δeiδαT=qii-1QiTδVδαT{(eiQi-QY)Ip}

we get the expression for δeiBiδαT. Expressions for δeiBiδbT and δeiBiδτT can be similarly found.

A2: To prove (10)

We make the following three assumptions:

  1. n-1[XTWXXTWZZTWXZWZ+D-1], which is positive definite.
  2. W and ZDZT have finite elements with W > 0 and/or the elements of Z, D finite.
  3. ai=ϕg(μi)g(μi)c, i = 1, …, n for some constant c.

Note that fα(α^n,b^n,τ^n,ei)-fα(α^n,b^n,τ^n,qii-12Z) can be expressed as

-2(ei-qii-12Z)Bi+(A1-C1A2-C2A3-C3)(B1B2B3)+(C1C2C3)(B1-D1B2-D2B3-D3)
(17)

Consider the first term in (17). By the normal theory model applied to the constructed variable Y,

eiasyN(1QiTXα,qii-2QiTVQi)=N(0,qii-1)sothatei-qii-12ZP0.

Also,

Bi=1n(1nXTV-1X)-11nXTViei0asn.

Here Vi is the ith column of V−1. Now,

A1-C1=-(ei-qii-12Z)BδVδαT(QIp)+Biqii-1QiTδVδαT{(ei-qii-12Z)QiIp}=-(ei-qii-12Z)BAQX+Biqii-1QiT(ei-qii-12Z)AQX-Biqii-1QiAqii(ei-qii-12Z)EiX(ei-qii-12Z){-(XTV-1Xn)-1(XTV-1Xn)+(XTV-1Xb-1)(1nXTVi1X)-(XTV-1Xn-1)(1nXTVi1X)}=o(1)O(1)=o(1)

and B1=(1nXTV-1X)-11(n)XTVi1ne1=O(1)O(1)o(1) assumptions 1–3, so that

B1(A1-C1)=o(1).

Similarly, B2(A2C2) = o(1). Note that

A3-C3=(ei-qii-12Z)BZD(ZTQiIk)+Biqii-1QiT(ei-qii-12Z)(ZTQiIk)-Biqii-1QiTqii(ei-qii-12Z)(ZTIk)=o(1)O(1)O(1)+O(1)o(1)-O(1)o(1)=o(1)

and that B3=((trace1n2(ZlTQZj)1n2(ZjQZl)))-1=O(1)o(1)=o(1), so that B3(A3C3) = o(1). Similarly,

C1=(1nXTV-1X)-11nXTVi(ei-qii-12Z),and(B1-D1)=(1nqii-12Z)(1nXTV-1X)-1(1nXTV-1AQiTX)+(1nXTV-1X)-1{1nXTViQiT(qii-12Z/n)AQiTX}-(1nXTV-1X)-1{1nXTViqii-1QiTAqii(qii-12Z(n))EiX}sothatC1(B1-D1)=[O(1)]O(1)O(1)o(1)=o(1)

Similarly, we can show that C2(B2D2) = o(1).

Finally, note that,

C3n=-(1nqii-12Z)(1nXTV-1X)-1(1nXTV-1Z)D(ZTQiTIk)+(1nXTV-1X)-11nXViqii-1QiT(1nqii-12Z)(ZTQiIk)-(1nXTV-1X)-11nXTViQiT(1nqii-12Z)(ZTIk)

so that C3(B3-D3)=O(n)O(n-1)=o(n-12)=o(1). This completes the proof. In case, V is replaced by a consistent estimator V, we require an additional set of conditions:

1nXT(V^-1-V-1)XP01nXT(V^-1-V-1)ZP01nXT(V^-1-V-1)eiP01nZT(V^-1-V-1)eiP0

In case the application is to clustered data, we would require the above set of conditions to hold for each cluster.

References

1. Checkoway H, Heyer NJ, Seixas NS, Welp EA, Demers PA, Hughes JM, Weill H. Dose-response association of silica with nonmalignant respiratory disease and lung cancer mortality in the diatomaceous earth industry. Americal Journal of Epidemiology. 1997;145:680–688. [PubMed]
2. Hernan MA. The hazards of hazard ratios. Epidemiology. 2010;21:13–15. [PMC free article] [PubMed]
3. Stayner L, Steenland K, Dosemeci M, Hertz-Picciotto I. Attenuation of exposure-response curves in occupation cohort studies at high exposure levels. Scandinavian Journal of Work, Enironment & Health. 2003;29:317–324. [PubMed]
4. Ganguli B, Naskar M, Malloy EJ, Eisen EA. Determination of the functional form of the relationship of covariates to the log hazard ratio in a Cox model. Journal of Applied Statistics. 2015 doi: 10.1080/02664763.2014.995607. http://dx.doi.org/10.1080/02664763.2014.995607. [Cross Ref]
5. Eisen EA, Agalliu I, Thurston SW, Coul BA, Checkoway H. Smoothing in occupational cohort studies: An illustration based on penalized splines. Occupational Environmental Medicine. 2004;61:854–860. [PMC free article] [PubMed]
6. Beckman RJ, Cook RD. Outlier............s. Technometrics. 1983;25:119–149.
7. Cook RD. Influential observations in linear regression. Journal of the American Statistical Association. 1979;74:169–174.
8. Pregibon D. Logistic regression diagnostics. The Annals of Statistics. 1981;9:705–724.
9. Ouwens MJ, Tan F, Berger M. Local influence to detect influential data structures for generalized linear mixed models. Biometrics. 2001;57:1166–1172. [PubMed]
10. Cain KC, Lange NT. Approximate case influence for the proportional hazards regression model with censored data. Biometrics. 1984;40:493–499. [PubMed]
11. Lesaffre E, Verbeke G. Local influence in linear mixed models. Biometrics. 1998;54:570–582. [PubMed]
12. Zhu H, Lee S. Local influence for generalized linear mixed models. The Canadian Journal of Statistics. 2003;31:293–309.
13. Fung W, Gu H, Xiang L, Yau KKW. Assessing local influence in principal component analylsis with application to haematology study data. Statistics in Medicine. 2007;26:2730–2744. [PubMed]
14. Fung WK, Zhu HY, Wei BC, He X. Influence diagnostics and outlier tests for semiparametric mixed models. Journal of the Royal Statistical Society, Series B. 2002;64:565–579.
15. Barlow WE. Global measures of local influence for proportional hazards regression models. Biometrics. 1997;53:1157–162. [PubMed]
16. Preisser JS, Qaqish BF. Deletion diagnostics for generalized estimating equations. Biometrika. 1996;83:551–562.
17. Haslett J, Hayes K. Residuals for the linear model with general covariance structure. Journal of the Royal Statistics Society, Series B. 1998;60:201–215.
18. Haslett J. A simple derivation of deletion diagnostic results for the general linear model with correlated errors. Journal of the Royal Statistical Society, Series B. 1999;61:603–609.
19. Haslett J, Dillane D. Application of delete = replace to deletion diagnostics for variance component estimation in the linear mixed model. Journal of the Royal Statistical Society, Series B. 2004;66:131–143.
20. Welsch RE. Influential observations, high leverage points and outliers in linear regression: Comment. Statistical Science. 1986;1:403–405.
21. Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; New York: 2003.
22. Therneau TM, Grambsch PM. Modeling Survival Data; Extending the Cox Model. Springer-Verlag; New York: 2000.
23. Sen Roy S, Guria S. Estimation of regression parameters in the presence of outliers in the response. Statistics. 2009;43:531–539.
24. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
25. Manchester L, Blanchard W. When is a curve an outlier? An account of a tricky problem. The Canadian Journal of Statistics. 1996;24:455–466.
26. Grambsch PM, Terneau TM, Fleming TR. Diagnostics plot to reveal functional form for covariates in multiplicative intensity models. Biometrics. 1995;51:1469–1482. [PubMed]
27. Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrics. 1971;58:545–554.
28. Ganguli B, Wand MP. Additive models for geo-referenced failure time data. Statistics in Medicine. 2006;25:2469–2482. [PubMed]
29. Speed T. [That BLUP is a good thing: The estimation of random effects]: Comment. Statistical Science. 1991;6:42–44. doi: 10.1214/ss/1177011930. http://projecteuclid.org/euclid.ss/1177011930. [Cross Ref]
30. Wahba G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society Series B. 1978;40:364–372.
31. Babette AB, John AR. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association. 1998;93:961–976.
32. Robinson GK. That blup is a good thing: The estimation of random effects. Statistical Science. 1991;6:15–32.
33. Green PJ. Penalized likelihood for general semi-parametric regression models. International Statistical Review. 1987;55:245–259.
34. Kammann E, Wand M. Geoadditive models. Applied Statistics. 2003;52:1–18.
35. Hayes K, Haslett J. Simplifying general least squares. The American Statistician. 1999;53:376–381.
36. Dempster A, Gasko-Green M. New tools for residual analysis. The Annals of Statistics. 1981;9:945–959.
37. Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons; New York: 1980.
38. Whitehead J. Fitting Cox’s regression model to survival data sing GLIM. Journal of the Royal Statistical Society, Series C. 1980;29:268–275.
39. Ma R, Krewski D, Burnett RT. Random effects Cox models: A Poisson modeling approach. Biometrika. 2003;90:157–169.
40. Steeland K, Ryan S, Mitch K, Jenifer J, Henry DK. Risk estimation with epidemiologic data when response attenuates at high exposure levels. Environmental Health perspective. 2011;11:854–860. [PMC free article] [PubMed]