
Stat Med. Author manuscript; available in PMC 2017 April 30.


PMCID: PMC4821700

NIHMSID: NIHMS736503


The Generalised Linear Mixed Model (GLMM) is widely used for modelling environmental data. However, such data are prone to influential observations, which can distort the estimated exposure-response curve, particularly in regions of high exposure. Deletion diagnostics for iterative estimation schemes commonly derive the deleted estimates from a single iteration of the full system, holding certain pivotal quantities, such as the information matrix, constant. In this paper, we present an approximate formula for the deleted estimates and Cook’s distance for the GLMM which does not assume that the estimates of the variance parameters are unaffected by deletion. The procedure allows the user to calculate standardised DFBETAs for mean as well as variance parameters. In certain cases, such as when the GLMM is used as a device for smoothing, such residuals for the variance parameters are interesting in their own right. In general, the procedure leads to deleted estimates of the mean parameters which are corrected for the effect of deletion on the variance components, since estimation of the two sets of parameters is interdependent. The probabilistic behaviour of these residuals is investigated, and a simulation-based procedure is suggested for their standardisation. The method is used to identify influential individuals in an occupational cohort exposed to silica. The results show that failure to conduct post model fitting diagnostics for variance components can lead to erroneous conclusions about the fitted curve and unstable confidence intervals.

The motivation for this paper arises out of the need to simultaneously address the twin issues of modeling nonlinear exposure-response relationships and discounting the effect of influential observations in a cohort study of lung cancer and silica exposure [1]. Heterogeneity in response commonly leads to unusual shapes of the dose response in survival analysis [2]. Concurrently, smoothing methods have also become common for analyzing such data. Smoothing is particularly appropriate for survival analysis of large occupationally exposed cohorts because the exposure-response relationship is often attenuated, with a rising relative risk that flattens or declines at the highest exposures [3]. The impact of observations with the highest values of exposure on the shape of the dose response curve is of great interest because a plateau or decline in risk at high levels may appear to undercut evidence for a causal relationship. Ad hoc procedures, such as omitting the highest percentiles of exposure, are common, but as our application section shows, influential values need not be restricted to the highest exposures. Residual analysis methods are sometimes used but are often inconclusive and not sufficiently well developed for many common models for environmental data. In a previous paper, Ganguli *et al.* [4] showed that the application of different ad hoc diagnostic procedures can lead to conflicting conclusions about the functional form of the log hazard ratio for silica exposure.

In a previous reanalysis of the motivating data [5], penalized splines were used in a Cox model framework to estimate the exposure-response relationship. As shown in Figure 1, the selected curve rose to a maximum hazard at 10 mg/m^{3}-years, and then after a long plateau, gradually declined. Informal regression diagnostics revealed that the subject with the highest exposure at the end of follow-up (62 mg/m^{3}-years) appeared in 72 of the risk sets defined for the 77 lung cancer cases. The removal of the subject truncated the upper half of the exposure range beyond 40 mg/m^{3}-years, and with it, removed most of the decline in the fitted curve. The post model fitting deletion methods developed in this paper are applied to the same occupational cohort study of silica [1] that provided the basis for the earlier analysis.

Post model fitting deletion diagnostics [6] using criterion-based methods such as Cook’s distance [7] offer a relatively objective means to identify influential data points. An alternative class of methods for identification of influential points focuses on measurement of local influence [8–13]. As is evident from the name, these methods provide an estimate of local influence in response to small perturbations of the model. They require the user to specify a priori null and alternative hypotheses about the nature of influence. By contrast, Cook’s distance is better suited for exploratory analysis of influence and detection of gross violations.

Much of the literature on criterion-based methods has focused on the classical linear model, an attractive feature of which is that well-known matrix identities can be used to derive explicit expressions for the “deleted” estimates. Outside the realm of the linear model, Fung *et al.* [14] derive Cook’s distance for a class of semiparametric models, while Barlow [15] derives the distance for the proportional hazards model. Preisser and Qaqish [16] consider deletion diagnostics for Generalised Estimating Equations. A particularly simple and elegant framework for deletion diagnostics for *linear* mixed models is provided by Haslett and Hayes [17], and Haslett [18], who show that the Best Linear Unbiased Predictor (BLUP) after deletion of an observation is equivalent to the BLUP from the complete data with the deleted observation replaced by its predictor based on the remaining data points. However, each of these approaches assumes that deletion of an individual does not affect the estimates of the variance components. While Haslett and Dillane [19] derive separate deletion diagnostics for variance parameters, they recognise that estimation of the mean and variance parameters is interlinked and that one should really derive deleted estimates of regression coefficients which have been corrected for any changes to the estimated variances resulting from deletion. Welsch [20] argues that the estimate of a regression parameter is only meaningful in the context of a standard error estimate, which in turn requires deletion diagnostics for variance parameters. This is particularly true when variance components are used as a device for smoothing [21]. Therneau and Grambsch [22] point out that failure to re-estimate the variance can lead to underestimation of influence for points with larger influence. Sen Roy and Guria [23] show that a proper combination of the full-data and deleted-data estimators leads to a smaller estimated standard error and hence to more efficient estimation. Ganguli *et al.* [4] review the application of existing diagnostic methods for Cox models and find them inadequate when splines are included as regressors.

Our objective in this paper is to develop an approximation for Cook’s distance for a particular class of GLMMs considered by Breslow and Clayton [24] with independent random effects. While our primary focus is on using GLMMs as a device for penalised smoothing, the GLMM with independent random effects has a vast scope including random intercept modeling, correlated errors, clustering, spatial modeling, censored data modelling and combinations of the above. This has been discussed by Ruppert *et al.* [21]. The proposed method enables the user to study the impact of deletion on point estimates of the fixed and random coefficients as well as on corresponding confidence/prediction intervals. In general, influence detection when the underlying relationship is non-linear is a tricky problem and some of the pertinent issues have been discussed by Manchester and Blanchard [25]. However, the GLMM provides a particularly convenient framework for modeling a non-linear dose response relationship and the interested reader is referred to [21] and [26].

Our underlying model is a special case of the GLMM of Breslow and Clayton [24] which assumes independence of random effects. For each of *n* individuals, we observe an outcome *y _{i}* together with

$$\mathrm{E}({y}_{i}\mid \mathbf{b})={\mu}_{i}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{and}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{Var}({y}_{i}\mid \mathbf{b})=\varphi {a}_{i}v({\mu}_{i}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}i=1,\dots ,n.$$

(1)

The *a _{i}* ’s are known scalars and

$$g({\mu}_{i})={\eta}_{i}={{\mathbf{x}}_{i}}^{T}\mathit{\alpha}+{{\mathbf{z}}_{i}}^{T}\mathbf{b}$$

for some known function *g*(·). We assume that **b** ~ *N _{r}*(**0**, **D**), where **D** is block diagonal with *j*th block ${\sigma}_{j}^{2}\mathbf{I}$ corresponding to the *j*th variance component. At each iteration of the fitting algorithm, the estimates of the mean parameters solve the mixed model equations

$$\left[\begin{array}{cc}{\mathbf{X}}^{T}\mathbf{WX}& {\mathbf{X}}^{T}\mathbf{WZ}\\ {\mathbf{Z}}^{T}\mathbf{WX}& {\mathbf{Z}}^{T}\mathbf{WZ}+{\mathbf{D}}^{-1}\end{array}\right]\phantom{\rule{0.38889em}{0ex}}\left[\begin{array}{c}\mathit{\alpha}\\ \mathbf{b}\end{array}\right]=\left[\begin{array}{c}{\mathbf{X}}^{T}\mathbf{WY}\\ {\mathbf{Z}}^{T}\mathbf{WY}\end{array}\right],$$

(2)

and the estimates of the variance components satisfy, for each *j* = 1, …, *k*,

$${\mathbf{Y}}^{T}{\mathbf{QZ}}_{j}{{\mathbf{Z}}_{j}}^{T}\mathbf{QY}=\text{trace}\phantom{\rule{0.16667em}{0ex}}({\mathbf{QZ}}_{j}{{\mathbf{Z}}_{j}}^{T}).$$

(3)

Here, **Y** = **η** + (**y** − **μ**) ∗ *g*′(**μ**) is the working response vector, and

$$\begin{array}{lll}\mathbf{W}\hfill & =\hfill & \text{diag}((\varphi {a}_{i}{g}^{\prime}({\mu}_{i})))\hfill \\ \mathbf{V}\hfill & =\hfill & {\mathbf{W}}^{-1}+{\mathbf{ZDZ}}^{T}={\mathbf{W}}^{-1}+{\sum}_{j=1}^{k}{{\sigma}_{j}}^{2}{\mathbf{Z}}_{j}{{\mathbf{Z}}_{j}}^{T}\hfill \\ \text{and}\hfill & \mathbf{Q}\hfill & ={\mathbf{V}}^{-1}-{\mathbf{V}}^{-1}\mathbf{X}{({\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}{\mathbf{X}}^{T}{\mathbf{V}}^{-1},\hfill \end{array}$$

with ∗ representing component-wise multiplication and **Z** partitioned as [**Z**_{1}, …, **Z**_{k}] corresponding to the *k* variance components ${\sigma}_{1}^{2},\dots ,{\sigma}_{k}^{2}$.
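To make the mixed model equations (2) concrete, the sketch below assembles and solves the block system for (**α**, **b**) with **W**, **D** and the working vector **Y** treated as given at the current iteration. This is our own NumPy illustration (the function name is ours), not the authors' software:

```python
import numpy as np

def solve_mixed_model_equations(X, Z, W, D, Y):
    """Solve the block system (2) for the mean parameters (alpha, b).

    X, Z : fixed- and random-effects design matrices
    W    : (diagonal) weight matrix for the working response
    D    : covariance matrix of the random effects
    Y    : working response vector
    """
    p, q = X.shape[1], Z.shape[1]
    # Assemble the coefficient matrix of (2): the random-effects block
    # is penalised by D^{-1}, which is what induces shrinkage of b.
    C = np.block([
        [X.T @ W @ X, X.T @ W @ Z],
        [Z.T @ W @ X, Z.T @ W @ Z + np.linalg.inv(D)],
    ])
    rhs = np.concatenate([X.T @ W @ Y, Z.T @ W @ Y])
    sol = np.linalg.solve(C, rhs)
    return sol[:p], sol[p:]
```

In practice this solve is repeated, updating **W** and **Y** from the current fitted values, until convergence.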

Our application is based on the approximation of a Cox model with nonlinear log hazard ratios by a GLMM as illustrated by Therneau and Grambsch [22], and Ganguli and Wand [28]. The use of GLMMs as a device for scatterplot smoothing and penalised regression is well known and has been elaborated upon by Ruppert *et al.* [21], Speed [29], Wahba [30], Babette and John [31]. The essence of the idea is to use a flexible set of basis functions, corresponding to a fixed number of knots, as explanatory variables which can model nonlinearity; the distributional assumption on the coefficients of these basis functions serves as a device for smoothing. Best Linear Unbiased Prediction can be used to derive the fitted model [32]. The principle of best prediction can be extended in several ways, for example as a device for penalised log likelihood based estimation for generalised responses belonging to the one parameter exponential family [33] and as a device for kriging in case of geospatial data [34]. Extensive discussion of technical issues such as knot selection, alternative choices of basis functions, integral evaluation etc. has been provided by Ruppert *et al.* [21]. Therneau and Grambsch [22] further extend this to modelling censored data with nonlinear log hazard ratios by using the counting process representation of Cox models. By introducing an approximation to the baseline hazard function, the authors are able to reduce the model to an approximate Poisson GLMM.
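The basis construction described above can be sketched directly. The helper below is our own illustration: the radial cubic basis |x − κ_k|³ follows the text, while the choice of an intercept-plus-linear fixed part is an assumption consistent with model (12):

```python
import numpy as np

def spline_design(x, knots):
    """Design matrices for penalised smoothing via a GLMM.

    Fixed part:  intercept and linear term (columns of X).
    Random part: radial cubic basis |x - kappa_k|^3, one column
                 of Z per knot; shrinking their coefficients
                 toward zero is what produces the smoothing.
    """
    x = np.asarray(x, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    Z = np.abs(x[:, None] - np.asarray(knots, dtype=float)[None, :]) ** 3
    return X, Z
```

The returned **X** and **Z** can be passed straight into the mixed model equations (2), with **D** = σ_b²**I** controlling the amount of smoothing.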

A first set of deleted estimates can be obtained by application of the results of Haslett and Hayes, and Hayes and Haslett [17, 35] viz.:

$$\begin{array}{lll}{\widehat{\mathit{\alpha}}}^{(i,0)}\hfill & =\hfill & {\widehat{\mathit{\alpha}}}_{n}-{\mathbf{B}}_{i}{\stackrel{\sim}{e}}_{i}\hfill \\ {\widehat{\mathbf{b}}}^{(i,0)}\hfill & =\hfill & {\widehat{\mathbf{b}}}_{n}-\widehat{\mathit{\tau}}{\mathbf{Z}}^{T}{\mathbf{Q}}_{i}{\stackrel{\sim}{e}}_{i}.\hfill \end{array}$$

where,

$$\begin{array}{lllll}\widehat{\mathit{\tau}}\hfill & =\hfill & \text{diag}({\widehat{\tau}}_{1},\dots ,{\widehat{\tau}}_{k})\hfill & =\hfill & \text{diag}({\widehat{\sigma}}_{1}^{2},\dots ,{\widehat{\sigma}}_{k}^{2})\hfill \\ \mathbf{B}\hfill & =\hfill & {({\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\hfill & =\hfill & [{\mathbf{B}}_{1},\dots ,{\mathbf{B}}_{n}]\hfill \\ \mathbf{Q}\hfill & =\hfill & [{\mathbf{Q}}_{1},\dots ,{\mathbf{Q}}_{n}]\hfill & =\hfill & {(({q}_{ij}))}_{i,j=1,\dots ,n}\hfill \\ {\stackrel{\sim}{e}}_{i}\hfill & =\hfill & {{q}_{ii}}^{-1}{{\mathbf{Q}}_{i}}^{T}\mathbf{Y},\hfill & \phantom{\rule{0.16667em}{0ex}}\hfill & \phantom{\rule{0.16667em}{0ex}}\hfill \end{array}$$

and ${\widehat{\mathit{\alpha}}}_{n}$ and ${\widehat{\mathbf{b}}}_{n}$ denote the estimates based on the full set of *n* observations. Similarly, a first set of deleted estimates of the variance parameters solves

$$\mathbf{T}{\widehat{\mathit{\tau}}}^{(i,0)}={\stackrel{\sim}{\mathbf{s}}}^{(i)},$$

(4)

where,

$$\begin{array}{lll}\mathbf{T}\hfill & =\hfill & {((\text{trace}{\mathbf{Z}}_{l}{\mathbf{Z}}_{l}^{T}{\mathbf{QZ}}_{j}{\mathbf{Z}}_{j}^{T}\mathbf{Q}))}_{l,j}\hfill \\ {\stackrel{\sim}{\mathbf{s}}}^{(i)}\hfill & =\hfill & {\mathbf{YP}}_{i}{\mathbf{QZZ}}^{T}{\mathbf{QP}}_{i}\mathbf{Y}+{{q}_{ii}}^{-1}((\text{trace}{\mathbf{QZ}}_{j}{\mathbf{Z}}_{j}^{T}\mathbf{Q}{\mathbf{E}}_{i})).\hfill \end{array}$$

Here, **P** _{i}* is the matrix which replaces the *i*th element of **Y** by its prediction based on the remaining observations, so that **P** _{i}***Y** is the completed response vector in the sense of Haslett and Hayes [17], and **E** _{i}* is a correction matrix depending only on the full-data fit.
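The first-step formulas can be made concrete as follows. The sketch below (our own NumPy illustration, not the authors' code) computes **B**, **Q**, the deleted residual ẽ_i and the first-step deleted estimate of the mean parameters with the variance parameters held fixed; for fixed **V**, the Haslett–Hayes identity makes this exactly the estimate obtained by refitting without observation *i*:

```python
import numpy as np

def first_step_deleted_alpha(X, V, Y, i):
    """First-step deleted estimate alpha^(i,0), variance parameters fixed.

    Implements alpha^(i,0) = alpha_n - B_i * e_tilde_i, with
    B = (X' V^-1 X)^-1 X' V^-1,
    Q = V^-1 - V^-1 X (X' V^-1 X)^-1 X' V^-1,
    e_tilde_i = Q_i' Y / q_ii.
    """
    Vinv = np.linalg.inv(V)
    XtVX_inv = np.linalg.inv(X.T @ Vinv @ X)
    B = XtVX_inv @ X.T @ Vinv                      # columns are B_1, ..., B_n
    Q = Vinv - Vinv @ X @ XtVX_inv @ X.T @ Vinv    # symmetric, so Q[i] = Q_i'
    alpha_full = B @ Y
    e_tilde = (Q[i] @ Y) / Q[i, i]                 # marginal deleted residual
    return alpha_full - B[:, i] * e_tilde, e_tilde
```

With **V** = **I** this reduces to the classical OLS deletion formula α̂ − (**X**ᵀ**X**)⁻¹**x**_i e_i/(1 − h_ii).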

Unfortunately, the results of Haslett and Hayes [17], and Hayes and Haslett [35] cannot be applied to subsequent iterations to arrive at the deleted estimates, as the iteration equation post deletion is not directly related to its full-data counterpart. Instead, the iteration following deletion of the *i ^{th}* individual solves

$$\left[\begin{array}{cc}{\mathbf{X}}^{{(i)}^{T}}{\mathbf{W}}^{(i,0)}{\mathbf{X}}^{(i)}& {\mathbf{X}}^{{(i)}^{T}}{\mathbf{W}}^{(i,0)}{\mathbf{Z}}^{(i)}\\ {\mathbf{Z}}^{{(i)}^{T}}{\mathbf{W}}^{(i,0)}{\mathbf{X}}^{(i)}& {\mathbf{Z}}^{{(i)}^{T}}{\mathbf{W}}^{(i,0)}{\mathbf{Z}}^{(i)}+{\mathbf{D}}^{{(i,0)}^{-1}}\end{array}\right]\phantom{\rule{0.38889em}{0ex}}\left[\begin{array}{c}\mathit{\alpha}\\ \mathbf{b}\end{array}\right]=\left[\begin{array}{c}{\mathbf{X}}^{{(i)}^{T}}{\mathbf{W}}^{(i,0)}{\mathbf{Y}}^{(i)}\\ {\mathbf{Z}}^{{(i)}^{T}}{\mathbf{W}}^{(i,0)}{\mathbf{Y}}^{(i)}\end{array}\right]$$

(5)

where **X**^{(}^{i}^{)} and **Z**^{(}^{i}^{)} are the fixed- and random-effects design matrices constructed after deletion of the *i ^{th}* individual, and **W**^{(}^{i}^{,0)}, **D**^{(}^{i}^{,0)} and **Y**^{(}^{i}^{)} are evaluated at the first-step deleted estimates. Solving (5) yields

$$\begin{array}{lll}{\widehat{\mathit{\alpha}}}^{(i)}\hfill & =\hfill & {\widehat{\mathit{\alpha}}}_{n}-{\mathbf{B}}_{i}{\stackrel{\sim}{e}}_{i}-{\mathbf{B}}_{i}\ast {{\stackrel{\sim}{e}}_{i}}^{\ast}\hfill \\ {\widehat{\mathbf{b}}}^{(i)}\hfill & =\hfill & {\widehat{\mathbf{b}}}_{n}-\widehat{\mathit{\tau}}{\mathbf{Z}}^{T}{\mathbf{Q}}_{i}{\stackrel{\sim}{e}}_{i}-{\widehat{\mathit{\tau}}}^{\ast}{\mathbf{Z}}^{T}{{\mathbf{Q}}_{i}}^{\ast}{{\stackrel{\sim}{e}}_{i}}^{\ast}.\hfill \end{array}$$

where the superscript ∗ denotes that these quantities were calculated based on ${\widehat{\mathit{\alpha}}}^{(i,0)}$, ${\widehat{\mathbf{b}}}^{(i,0)}$ and ${\widehat{\mathit{\tau}}}^{(i,0)}$.

Using a Taylor series expansion, we can linearise ${\widehat{\mathit{\alpha}}}^{(i)}$ as

$$\begin{array}{l}{\widehat{\mathit{\alpha}}}^{(i)}={\widehat{\mathit{\alpha}}}_{n}-{\mathbf{B}}_{i}{\stackrel{\sim}{e}}_{i}-{{\mathbf{B}}_{i}}^{\ast}{{\stackrel{\sim}{e}}_{i}}^{\ast}\\ \simeq {\widehat{\mathit{\alpha}}}_{n}-{\mathbf{B}}_{i}{\stackrel{\sim}{e}}_{i}-\left\{{\mathbf{B}}_{i}{\stackrel{\sim}{e}}_{i}+({\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\alpha}}^{T}}}{\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathbf{b}}^{T}}}{\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\tau}}^{T}}})\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{l}{\widehat{\mathit{\alpha}}}^{(i,0)}-{\widehat{\mathit{\alpha}}}_{n}\hfill \\ {\widehat{\mathbf{b}}}^{(i,0)}-{\widehat{\mathbf{b}}}_{n}\hfill \\ {\widehat{\mathit{\tau}}}^{(i,0)}-{\widehat{\mathit{\tau}}}_{n}\hfill \end{array}\right)\right\}\end{array}$$

Hence

$${\widehat{\mathit{\alpha}}}^{(i)}-{\widehat{\mathit{\alpha}}}_{n}=-2{\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}-\left(\begin{array}{lll}{\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\alpha}}^{T}}}\hfill & {\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathbf{b}}^{T}}}\hfill & {\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\tau}}^{T}}}\hfill \end{array}\right)\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{l}{\widehat{\mathit{\alpha}}}^{(i,0)}-{\widehat{\mathit{\alpha}}}_{n}\hfill \\ {\widehat{\mathbf{b}}}^{(i,0)}-{\widehat{\mathbf{b}}}_{n}\hfill \\ {\widehat{\mathit{\tau}}}^{(i,0)}-{\widehat{\mathit{\tau}}}_{n}\hfill \end{array}\right)$$

(6)

$$={f}_{\mathit{\alpha}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{\stackrel{\sim}{e}}_{i}),$$

(7)

where, as shown in Appendix A1,

$$\begin{array}{l}\frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\alpha}}^{T}}=-{\stackrel{\sim}{e}}_{i}\mathbf{B}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}({\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{p})+{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{\mathbf{Q}}_{i}^{T}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}\{({\stackrel{\sim}{e}}_{i}{\mathbf{Q}}_{i}-\mathbf{QY})\otimes {\mathbf{I}}_{p}\}\\ \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathbf{b}}^{T}}=-{\stackrel{\sim}{e}}_{i}\mathbf{B}\frac{\delta \mathbf{V}}{\delta {\mathbf{b}}^{T}}({\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{q})+{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{\mathbf{Q}}_{i}^{T}\frac{\delta \mathbf{V}}{\delta {\mathbf{b}}^{T}}\{({\stackrel{\sim}{e}}_{i}{\mathbf{Q}}_{i}-\mathbf{QY})\otimes {\mathbf{I}}_{q}\}\\ \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\tau}}^{T}}=-{\stackrel{\sim}{e}}_{i}\mathbf{B}\frac{\delta \mathbf{V}}{\delta {\mathit{\tau}}^{T}}({\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{k})+{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{\mathbf{Q}}_{i}^{T}\frac{\delta \mathbf{V}}{\delta {\mathit{\tau}}^{T}}\{({\stackrel{\sim}{e}}_{i}{\mathbf{Q}}_{i}-\mathbf{QY})\otimes {\mathbf{I}}_{k}\}\end{array}$$

and *f***_{α}**(·) is defined by (6) and (7). Note that this enables us to approximate ${\widehat{\mathit{\alpha}}}^{(i)}-{\widehat{\mathit{\alpha}}}_{n}$ without refitting the model. Analogous calculations yield

$${\widehat{\mathbf{b}}}^{(i)}-{\widehat{\mathbf{b}}}_{n}={f}_{\mathbf{b}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{\stackrel{\sim}{e}}_{i})$$

(8)

$${\widehat{\mathit{\tau}}}^{(i)}-{\widehat{\mathit{\tau}}}_{n}={f}_{\mathit{\tau}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{\stackrel{\sim}{e}}_{i})$$

(9)

as well as similar expressions for DFFIT residuals in terms of the output of the full model fit.

The next step is to derive a suitable standardisation factor for the differences in (6) and (8). The usual approach in the literature is to standardise these by a positive definite matrix which can be chosen to reflect specific concerns [36]. A standard procedure following Cook [7] is to use the variances of the corresponding parameters from the full model for standardisation and the percentage points of the **F** distribution as a critical value. In this case, the corresponding ‘*p value*’ gives the level of the smallest confidence ellipsoid based on the full data that contains the estimate.

However, there has been some discussion in the literature regarding the choice of the norming matrix to be used for standardisation. Beckman and Cook [6] argue that it should not depend on *i*, whereas the norm suggested by Belsley *et al.* [37] does. Dempster and Gasko [36] provide an interesting geometric perspective on various alternative standardisation criteria and their connection with residual analyses.

Our approach is based on the result,

$$\begin{array}{lll}{f}_{\mathit{\alpha}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{\stackrel{\sim}{e}}_{i})-{f}_{\mathit{\alpha}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z)\hfill & \stackrel{P}{\to}\hfill & 0\hfill \\ {f}_{\mathbf{b}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{\stackrel{\sim}{e}}_{i})-{f}_{\mathbf{b}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z)\hfill & \stackrel{P}{\to}\hfill & 0\hfill \\ {f}_{\mathit{\tau}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{\stackrel{\sim}{e}}_{i})-{f}_{\mathit{\tau}}({\widehat{\mathit{\alpha}}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z)\hfill & \stackrel{P}{\to}\hfill & 0\hfill \end{array}$$

(10)

where *Z* is a standard normal deviate. This result is proved in Appendix (A2). The implication is that we can generate *K* standard normal deviates {*Z _{k}*, *k* = 1, …, *K*}, evaluate the functions *f***_{α}**, *f***_{b}** and *f***_{τ}** at ${q}_{ii}^{-1/2}{Z}_{k}$, and use the resulting empirical distributions to standardise the observed differences.
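A minimal sketch of this simulation-based standardisation (our own scalar illustration; the function name is ours, and *f* stands for any one of the maps above with the full-data estimates held fixed):

```python
import numpy as np

def standardise_dfbeta(f, q_ii, observed_diff, K=1000, seed=0):
    """Empirical standardisation of a deletion residual.

    f             : callable mapping a residual value to the approximate
                    change in a (scalar) parameter estimate
    q_ii          : the i-th diagonal element of Q
    observed_diff : the observed change for observation i
    Returns the two-sided empirical tail probability of the
    observed change under the reference distribution f(q_ii^{-1/2} Z).
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(K)
    sims = np.array([f(q_ii ** -0.5 * z) for z in Z])
    return np.mean(np.abs(sims) >= abs(observed_diff))
```

Small tail probabilities then flag observations whose deletion moves the estimate further than sampling variability alone would suggest.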

In this section, we apply our methods to the motivating dataset. The connection between Cox models and Poisson regression is well established [38, 39]. Here, we use the equivalence discussed by Therneau and Grambsch [22], and Ganguli and Wand [28], who approximate a Cox proportional hazards model with a nonlinear log hazard ratio by a Poisson mixed effects model. We fit a model of the form,

$$\lambda (t\mid x)={\lambda}_{0}(t)exp\{{\beta}_{0}\text{HISP}+f(x(t))\},$$

(11)

where *t* is the age at which a subject dies from lung cancer, HISP is an indicator for whether or not a subject is Hispanic and *x*(*t*) is cumulative silica exposure at time *t*. The function *f*(·) is unknown but assumed to be smooth. Using *I* as the indicator function and *δ* as the censoring indicator (1, if uncensored and 0, if censored) and writing *N*(*t*) = *I*(*T* ≤ *t*, *δ* = 1), it follows from Ganguli and Wand [28] that (11) can be approximated as,

$$\mathrm{E}(N(t)\mid \mathbf{b})=exp\{{\beta}_{0}\text{HISP}+{\beta}_{1}x(t)+\sum _{k=1}^{K}{b}_{k}{\mid x(t)-{\kappa}_{k}\mid}^{3}\}\widehat{\mathrm{\Lambda}},$$

(12)

where the *κ _{k}*’s are a set of *K* fixed knots spanning the range of cumulative exposure. Taking logarithms of (12),

$$log\{\mathrm{E}(N(t)\mid \mathbf{b})\}=log(\widehat{\mathrm{\Lambda}})+{\beta}_{0}\text{HISP}+{\beta}_{1}x(t)+\sum _{k=1}^{K}{b}_{k}{\mid x(t)-{\kappa}_{k}\mid}^{3},$$

where $\widehat{\mathrm{\Lambda}}$, the estimated cumulative hazard from a standard Cox model assuming a linear log hazard function for cumulative silica exposure, serves as an approximate offset.
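The role of the offset can be illustrated with a bare-bones Poisson fit. The IRLS routine below is a generic sketch of our own (not the authors' software) showing how log(Λ̂) enters the linear predictor with its coefficient fixed at one:

```python
import numpy as np

def poisson_irls(X, y, offset, n_iter=25):
    """Poisson regression with a log link and a fixed offset, via
    iteratively reweighted least squares (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = offset + X @ beta          # offset enters with coefficient 1
        mu = np.exp(eta)
        # working response and weights for the log link
        z = (eta - offset) + (y - mu) / mu
        W = mu
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta
```

In the application, the columns of `X` would be HISP, *x*(*t*) and the radial cubic basis terms, with the basis coefficients additionally penalised as in (2).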

Figure 2 shows the DFFITs in panel (a) and the DFBETAs for the variance component ${\sigma}_{b}^{2}$ in panel (b).

Influence diagnostics for the silica data. The horizontal axis represents cumulative silica exposure (*mg/m*^{3} – *yr*). Panel (a) shows DFFITs and panel (b) shows DFBETAs for the variance component. Panels (c) and (d) give the estimated log hazard.

- The log hazard continues to dip beyond an exposure of 50 mg/*m*^{3}-years in the first case but approaches a plateau in the second.
- The two panels (Figure 2 (c)–(d)) show a remarkable contrast. In the former (Figure 2-(c)), deletion does not significantly affect the shape of the dose response curve, although it does attenuate the response by about 50% at 30 mg/m^{3}-years. In the latter (Figure 2-(d)), deletion has a more dramatic effect and leads to a different shape of the dose response curve. The results suggest that the curve based on the full dataset might be picking up spurious structure at the upper range of exposures.
- The 95% pointwise confidence intervals are indicated using bold lines for the full dataset and dotted lines post-deletion. The results make a strong case for conducting post model fitting deletion diagnostics for variance components, as the confidence intervals obtained by dropping only observations with influential DFFIT values are very wide. Individuals who are influential for variance component estimation are still included in the estimation for panel (c), and their inclusion appears to lead to an unstable estimate of the variance components and hence to an unreliable confidence interval. By contrast, the confidence intervals obtained in panel (d) are tighter.

The first lesson of this application is that influential observations need not occur at the highest extremes of exposure. Second, we can distinguish between observations which are influential for the shape of the estimated dose response curve and those which impact its value but not its shape. The individual with the highest exposure is influential for the estimated shape of the log hazard ratio; the optimum degrees of freedom estimate decreases from approximately three to two as a result of dropping this individual. Finally, we note that failure to conduct post model fitting diagnostics for variance components can lead to erroneous and unstable confidence interval estimation.

In this section we demonstrate the performance of the method on three different applications all of which may be fitted using GLMMs.

We simulate clustered count data using the model:

$${y}_{ij}\sim \text{Poisson}({\mu}_{ij});\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}i=1,2,\dots ,k;\phantom{\rule{0.38889em}{0ex}}j=1,2,\dots ,{n}_{i},$$

with *k* being the number of clusters. All observations from the *i*th cluster share a common random intercept *b _{i}*, assumed to be normally distributed with mean zero; one cluster is constructed to be outlying.

Figure 3 shows the corresponding standardized residuals which clearly identify observations belonging to cluster 3 as influential in terms of high residual values. It is noteworthy that inclusion of the outlying cluster forces the regression line to have a negative slope whereas the true slope coefficient is positive. Note that the estimates of the variance component are 3.544 and 0.301 corresponding to inclusion and exclusion of the outlying cluster respectively.
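The data-generating scheme for this example can be sketched as follows. The parameter values, the identity of the outlying cluster and the size of the intercept shift are our own illustrative choices; the paper does not list them:

```python
import numpy as np

def simulate_clustered_counts(k=10, n_i=20, beta=(0.5, 0.3), sigma_b=0.5,
                              outlier_cluster=2, shift=3.0, seed=0):
    """Poisson counts with a shared random intercept per cluster;
    one cluster's intercept is shifted to create an outlying cluster."""
    rng = np.random.default_rng(seed)
    clusters = []
    for i in range(k):
        b_i = rng.normal(0.0, sigma_b)
        if i == outlier_cluster:
            b_i += shift                      # induce the outlying cluster
        x = rng.uniform(0.0, 1.0, n_i)
        y = rng.poisson(np.exp(beta[0] + beta[1] * x + b_i))
        clusters.append({"cluster": i, "b": b_i, "x": x, "y": y})
    return clusters
```

Fitting a random-intercept Poisson GLMM with and without the shifted cluster then reproduces the qualitative behaviour described above: the variance component estimate inflates sharply when the outlying cluster is included.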

We generate data from a binary model with a nonlinear dose response which is modelled using a Generalized Additive Model (GAM) specified by:

$$y\sim \text{Binary}(p)\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{where}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{logit}(p)={\beta}_{0}+f(x).$$

(13)

We set *β*_{0} = 0.5 and *f*(*x*) = 1.5*x* − 2.5*x*^{2}. We first generate the *x _{i}*’s, and then binary responses from (13). The model is then fitted through its GLMM representation,

$$\text{logit}(p\mid \mathbf{b})={\beta}_{0}+\sum _{k=1}^{K}{b}_{k}{\mid x-{\kappa}_{k}\mid}^{3}.$$

(14)

Here, the *κ _{k}*’s are a set of *K* fixed knots.
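The generating model (13) with *f*(*x*) = 1.5*x* − 2.5*x*² can be sketched as below; the Uniform(0, 1) design for *x* is our own assumption:

```python
import numpy as np

def simulate_binary_gam(n=500, beta0=0.5, seed=0):
    """Binary responses from logit(p) = beta0 + 1.5 x - 2.5 x^2,
    as in (13), with x drawn from Uniform(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    eta = beta0 + 1.5 * x - 2.5 * x ** 2
    p = 1.0 / (1.0 + np.exp(-eta))        # inverse logit
    y = rng.binomial(1, p)
    return x, y
```

The simulated (*x*, *y*) pairs are then fitted through the GLMM representation (14), with the basis coefficients *b_k* shrunk toward zero.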

We consider here a simulated example from the following geostatistical model:

$${y}_{i}\sim \text{Poisson}({\mu}_{i}),\phantom{\rule{0.38889em}{0ex}}\text{where}\phantom{\rule{0.38889em}{0ex}}log({\mu}_{i})=s({\mathbf{x}}_{i}),$$

(15)

with *s*(·) being a bivariate function of **x** = (*x*_{1}, *x*_{2})* ^{T}*, which we set as

$$s({x}_{1},{x}_{2})=\frac{0.6}{2\pi {\sigma}_{1}{\sigma}_{2}}\times {e}^{-{\scriptstyle \frac{{({x}_{1}-0.2)}^{2}}{{{\sigma}_{1}}^{2}}}-{\scriptstyle \frac{{({x}_{2}-0.3)}^{2}}{{{\sigma}_{2}}^{2}}}}+\frac{0.4}{2\pi {\sigma}_{1}{\sigma}_{2}}\times {e}^{-{\scriptstyle \frac{{({x}_{1}-0.7)}^{2}}{{{\sigma}_{1}}^{2}}}-{\scriptstyle \frac{{({x}_{2}-0.8)}^{2}}{{{\sigma}_{2}}^{2}}}},$$

where *σ*_{1} = 0.3 and *σ*_{2} = 0.4. Five hundred sets of **x** were independently generated from the Uniform [0, 1] distribution and based on these, *y’s* were generated using (15). We replaced the 500* ^{th}* observation with a relatively high value to induce an influential observation. Following Kammann and Wand [34], a low rank model formulation is given by,

$$log\mathrm{E}({y}_{i}\mid \mathbf{b})={\beta}_{0}+{\beta}_{1}{x}_{1}+{\beta}_{2}{x}_{2}+\sum _{k=1}^{K}{b}_{k}{\mid {\mathbf{x}}_{i}-{\mathit{\kappa}}_{k}\mid}^{3}.$$

(16)

Here, the **κ**_{k}’s are a set of *K* bivariate knots.
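The simulation scheme for this example can be sketched as follows (the replacement value used to induce the influential observation is our own illustrative choice):

```python
import numpy as np

def s_surface(x1, x2, s1=0.3, s2=0.4):
    """The two-bump test surface s(x1, x2) defined above,
    with sigma_1 = 0.3 and sigma_2 = 0.4."""
    def bump(w, c1, c2):
        return (w / (2.0 * np.pi * s1 * s2)) * np.exp(
            -((x1 - c1) ** 2) / s1 ** 2 - ((x2 - c2) ** 2) / s2 ** 2)
    return bump(0.6, 0.2, 0.3) + bump(0.4, 0.7, 0.8)

def simulate_geo(n=500, seed=0, outlier_value=50):
    """Poisson responses over the unit square, with the final (500th)
    observation replaced by a high value to induce influence."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n, 2))
    y = rng.poisson(np.exp(s_surface(x[:, 0], x[:, 1])))
    y[-1] = outlier_value
    return x, y
```

The low rank model (16), with bivariate radial basis terms |**x**_i − **κ**_k|³, is then fitted to the simulated (**x**, *y*) pairs.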

Results are plotted in Figure 5. The estimated d.f. with and without influential observations were 23.3 and 21.75 respectively. The influential point was correctly identified and the estimated functional form after dropping the influential point was similar to the true functional form.

In this paper, we have developed Cook’s distance for fixed and random effects parameters as well as for variance component estimates for the Generalised Linear Mixed Model with independent random effects. Given the wide applicability of the model, these diagnostics have far-reaching scope. The simulations and application presented make a strong case for conducting post model fitting deletion diagnostics. First, influential observations need not occur at the highest extremes of exposure. Second, we can distinguish between influential observations which only modestly influence the values of the log hazard over local regions, and those which influence the shape of the exposure-response curve. Third, observations influential for the variance component estimate occurred at the highest levels of exposure, and the deletion of these outliers had more influence on the shape of the curve. In the silica application, when the highest-exposed subject was removed, the optimum degrees of freedom dropped from 3 to 2. Using the full dataset leads to overfitting in regions of high cumulative exposure. This is common for exposure-response modelling, and possible reasons include the saturation of biological processes, misclassification of exposure at high levels and the healthy worker effect [3, 5, 40]. For the silica dataset, the main problem is the sparsity of data and few cases in the regions of high cumulative exposure.

Empirical observations from the authors suggest that the performance of the method is driven by a combination of the degree of nonlinearity in the exposure-response relationship and the local density of data points. The method performs well in the presence of moderate nonlinearity, even in regions of sparse data, but will tend to spuriously identify influential observations in regions of high nonlinearity. We have also informally compared the performance of the method on simulated datasets with Poisson and binary outcomes and observed that, for the same degrees of nonlinearity and data density, influential values are identified more accurately for Poisson outcomes. Further extensions of the method should accommodate deletion of a subset of observations. Note that in our application to the silica data, which has time-varying exposures, we have considered detection of influential exposures and treated an individual as influential if at least one of their observations was classified as influential. We have assumed throughout the paper that the random effects are independent. While such an assumption is typically made for cases such as our motivating example, where the variance components model serves as a device for smoothing, non-spherical variance structures are commonly used when random effects model subject-specific effects or overdispersion. As the additional covariance parameters are also estimated by Restricted Maximum Likelihood (REML) and the results of Haslett and Hayes, and Hayes and Haslett [17, 35] continue to be valid, our method can, in theory, be extended to accommodate non-spherical variance structures, although at the cost of added computational complexity. Additional research that could be spurred by this paper includes the development of models which explain the occurrence of influential observations, for example through population heterogeneity, but it is clear that such research requires considerable substantive knowledge.


This research was supported by grant R01 CA081345 from the National Institutes of Health. The authors would also like to thank Prof. Checkoway for permission to use the dataset on silica exposure.

We have,

$$\frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\alpha}}^{T}}=\frac{\delta {\mathbf{B}}_{i}}{\delta {\mathit{\alpha}}^{T}}{\stackrel{\sim}{e}}_{i}\otimes {\mathbf{I}}_{p}+{\mathbf{B}}_{i}\otimes {\mathbf{I}}_{p}\frac{\delta {\stackrel{\sim}{e}}_{i}}{\delta {\mathit{\alpha}}^{T}}$$

To calculate the derivatives, first note that writing $\mathbf{C}=\left(\begin{array}{cc}{\mathbf{0}}_{p\times p}& {\mathbf{X}}^{T}\\ \mathbf{X}& \mathbf{V}\end{array}\right)$,

$$\mathbf{B}=\left(\begin{array}{cc}{\mathbf{I}}_{p}& {\mathbf{0}}_{p\times n}\end{array}\right)\phantom{\rule{0.16667em}{0ex}}{\left(\begin{array}{cc}{\mathbf{0}}_{p\times p}& {\mathbf{X}}^{T}\\ \mathbf{X}& \mathbf{V}\end{array}\right)}^{-1}\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{c}{\mathbf{0}}_{p\times n}\\ {\mathbf{I}}_{n}\end{array}\right)=\left(\begin{array}{cc}{\mathbf{I}}_{p}& {\mathbf{0}}_{p\times n}\end{array}\right)\phantom{\rule{0.16667em}{0ex}}{\mathbf{C}}^{-1}\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{c}{\mathbf{0}}_{p\times n}\\ {\mathbf{I}}_{n}\end{array}\right)$$

so that

$$\begin{array}{l}\frac{\delta \mathbf{B}}{\delta {\mathit{\alpha}}^{T}}=\{(\begin{array}{cc}{\mathbf{I}}_{p}& {\mathbf{0}}_{p\times n}\end{array})\otimes {\mathbf{I}}_{p}\}\frac{\delta {\mathbf{C}}^{-1}}{\delta {\mathit{\alpha}}^{T}}\{{(\begin{array}{cc}{\mathbf{0}}_{n\times p}& {\mathbf{I}}_{n}\end{array})}^{T}\otimes {\mathbf{I}}_{p}\}\\ =-\{(\begin{array}{cc}{\mathbf{I}}_{p}& {\mathbf{0}}_{p\times n}\end{array})\otimes {\mathbf{I}}_{p}\}{\mathbf{C}}^{-1}\frac{\delta \mathbf{C}}{\delta {\mathit{\alpha}}^{T}}{\mathbf{C}}^{-1}\{{(\begin{array}{cc}{\mathbf{0}}_{n\times p}& {\mathbf{I}}_{n}\end{array})}^{T}\otimes {\mathbf{I}}_{p}\}\end{array}$$
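
The step above uses the standard matrix identity $\frac{\delta {\mathbf{C}}^{-1}}{\delta {\mathit{\alpha}}^{T}}=-{\mathbf{C}}^{-1}\frac{\delta \mathbf{C}}{\delta {\mathit{\alpha}}^{T}}{\mathbf{C}}^{-1}$. As a sanity check, this identity can be verified numerically by finite differences; the following pure-Python sketch on a 2×2 matrix is our own illustration and not part of the paper:

```python
# Numerical check of d(C^{-1}) = -C^{-1} (dC) C^{-1} for a 2x2 matrix,
# using a small finite-difference perturbation. Illustrative values only.

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(m, n):
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

C = [[2.0, 1.0], [1.0, 3.0]]
eps = 1e-6
dC = [[0.0, 0.0], [0.0, eps]]          # perturb a single entry of C
C_pert = [[C[i][j] + dC[i][j] for j in range(2)] for i in range(2)]

# Finite-difference derivative of the inverse ...
numeric = [[inv2(C_pert)[i][j] - inv2(C)[i][j] for j in range(2)]
           for i in range(2)]
# ... versus the analytic form -C^{-1} dC C^{-1}.
prod = matmul2(matmul2(inv2(C), dC), inv2(C))
analytic = [[-prod[i][j] for j in range(2)] for i in range(2)]

max_err = max(abs(numeric[i][j] - analytic[i][j])
              for i in range(2) for j in range(2))
assert max_err < 1e-9  # agreement up to second-order terms in eps
```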

Noting that,

$$\frac{\delta \mathbf{C}}{\delta {\mathit{\alpha}}^{T}}=\left(\begin{array}{cc}\mathbf{0}& {\mathbf{0}}^{T}\\ \mathbf{0}& {\scriptstyle \frac{\delta \mathbf{V}}{\delta {\alpha}^{T}}}\end{array}\right)$$

and that

$$\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}=\left[\begin{array}{cccc}\varphi {a}_{1}\frac{{g}^{\prime\prime}({\mu}_{1})}{{g}^{\prime}({\mu}_{1})}{\mathbf{x}}_{1}^{T}& {\mathbf{0}}^{T}& \cdots & {\mathbf{0}}^{T}\\ \vdots & \vdots & \ddots & \vdots \\ {\mathbf{0}}^{T}& {\mathbf{0}}^{T}& \cdots & \varphi {a}_{n}\frac{{g}^{\prime\prime}({\mu}_{n})}{{g}^{\prime}({\mu}_{n})}{\mathbf{x}}_{n}^{T}\end{array}\right],$$

we get after some simplification,

$$\frac{\delta {\mathbf{B}}_{i}}{\delta {\mathit{\alpha}}^{T}}=-\mathbf{B}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}({\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{p}).$$

Similarly,

$$\begin{array}{llll}\phantom{\rule{0.16667em}{0ex}}\hfill & \frac{\delta {\mathbf{Q}}_{i}}{\delta {\mathit{\alpha}}^{T}}\hfill & =\hfill & -\mathbf{Q}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}({\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{p})\hfill \\ \text{and}\hfill & \frac{\delta {q}_{ii}}{\delta {\mathit{\alpha}}^{T}}\hfill & =\hfill & -{{\mathbf{Q}}_{i}}^{T}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}({\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{p}).\hfill \end{array}$$

Also, using the fact that

$$\frac{\delta {\stackrel{\sim}{e}}_{i}}{\delta {\mathit{\alpha}}^{T}}={{q}_{ii}}^{-1}{{\mathbf{Q}}_{i}}^{T}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}\{({\stackrel{\sim}{e}}_{i}{\mathbf{Q}}_{i}-\mathbf{QY})\otimes {\mathbf{I}}_{p}\}$$

we get the expression for ${\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\alpha}^{T}}}$. Expressions for ${\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathbf{b}}^{T}}}$ and ${\scriptstyle \frac{\delta {\stackrel{\sim}{e}}_{i}{\mathbf{B}}_{i}}{\delta {\mathit{\tau}}^{T}}}$ can be similarly found.

We make the following three assumptions:

- ${n}^{-1}\left[\begin{array}{cc}{\mathbf{X}}^{T}\mathbf{WX}& {\mathbf{X}}^{T}\mathbf{WZ}\\ {\mathbf{Z}}^{T}\mathbf{WX}& {\mathbf{Z}}^{T}\mathbf{WZ}+{\mathbf{D}}^{-1}\end{array}\right]\to \mathbf{\Sigma}$, which is positive definite.
- $\mathbf{W}$ and $\mathbf{ZDZ}^{T}$ have finite elements, with $\mathbf{W}>\mathbf{0}$ and/or the elements of $\mathbf{Z}$, $\mathbf{D}$ finite.
- ${a}_{i}=\varphi \frac{{g}^{\prime\prime}({\mu}_{i})}{{g}^{\prime}({\mu}_{i})}\le c$, $i=1,\dots,n$, for some constant $c$.

Note that ${f}_{\alpha}({\widehat{\alpha}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{\stackrel{\sim}{e}}_{i})-{f}_{\alpha}({\widehat{\alpha}}_{n},{\widehat{\mathbf{b}}}_{n},{\widehat{\mathit{\tau}}}_{n},{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z)$ can be expressed as

$$-2\phantom{\rule{0.16667em}{0ex}}\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}{\mathbf{B}}_{i}+\left(\begin{array}{ccc}{\mathbf{A}}_{1}-{\mathbf{C}}_{1}& {\mathbf{A}}_{2}-{\mathbf{C}}_{2}& {\mathbf{A}}_{3}-{\mathbf{C}}_{3}\end{array}\right)\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{c}{\mathbf{B}}_{1}\\ {\mathbf{B}}_{2}\\ {\mathbf{B}}_{3}\end{array}\right)+\left(\begin{array}{ccc}{\mathbf{C}}_{1}& {\mathbf{C}}_{2}& {\mathbf{C}}_{3}\end{array}\right)\phantom{\rule{0.16667em}{0ex}}\left(\begin{array}{c}{\mathbf{B}}_{1}-{\mathbf{D}}_{1}\\ {\mathbf{B}}_{2}-{\mathbf{D}}_{2}\\ {\mathbf{B}}_{3}-{\mathbf{D}}_{3}\end{array}\right)$$

(17)

Consider the first term in (17). By the normal theory model applied to the constructed variable **Y**,

$${\stackrel{\sim}{e}}_{i}\stackrel{\text{asy}}{\sim}\mathrm{N}\left({{q}_{ii}}^{-1}{\mathbf{Q}}_{i}^{T}\mathbf{X}\mathit{\alpha},\;{{q}_{ii}}^{-2}{\mathbf{Q}}_{i}^{T}{\mathbf{VQ}}_{i}\right)=\mathrm{N}\left(0,{{q}_{ii}}^{-1}\right)\quad\text{so that}\quad{\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-\frac{1}{2}}Z\stackrel{P}{\to}0.$$

Also,

$${\mathbf{B}}_{i}=\frac{1}{\sqrt{n}}{\left(\frac{1}{n}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X}\right)}^{-1}\frac{1}{\sqrt{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{i}{\stackrel{\sim}{e}}_{i}\to \mathbf{0}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\text{as}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}n\to \infty .$$

Here $\mathbf{V}^{i}$ denotes the $i$th column of ${\mathbf{V}}^{-1}$.

$$\begin{array}{l}{\mathbf{A}}_{1}-{\mathbf{C}}_{1}=\\ -\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-\frac{1}{2}}Z\right)\mathbf{B}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}(\mathbf{Q}\otimes {\mathbf{I}}_{p})+{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{\mathbf{Q}}_{i}^{T}\frac{\delta \mathbf{V}}{\delta {\mathit{\alpha}}^{T}}\left\{\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-\frac{1}{2}}Z\right){\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{p}\right\}\\ =-\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-\frac{1}{2}}Z\right){\mathbf{BAQ}}^{\ast}\mathbf{X}+{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{\mathbf{Q}}_{i}^{T}\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-\frac{1}{2}}Z\right){\mathbf{AQ}}^{\ast}\mathbf{X}-{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{\mathbf{Q}}_{i}\mathbf{A}{q}_{ii}\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-\frac{1}{2}}Z\right){\mathbf{E}}_{i}\mathbf{X}\\ \le \left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-\frac{1}{2}}Z\right)\left\{-{\left(\frac{{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X}}{n}\right)}^{-1}\left(\frac{{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X}}{n}\right)+{\left(\frac{{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X}}{n}\right)}^{-1}\left(\frac{1}{n}{\mathbf{X}}^{T}{\mathbf{V}}^{i}\mathbf{1}\mathbf{X}\right)-{\left(\frac{{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X}}{n}\right)}^{-1}\left(\frac{1}{n}{\mathbf{X}}^{T}{\mathbf{V}}^{i}\mathbf{1}\mathbf{X}\right)\right\}\\ =o(1)O(1)=o(1)\end{array}$$

and ${\mathbf{B}}_{1}={({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}{\scriptstyle \frac{1}{\sqrt{n}}}{\mathbf{X}}^{T}{\mathbf{V}}^{i}{\scriptstyle \frac{1}{\sqrt{n}}}{\stackrel{\sim}{e}}_{i}=O(1)O(1)o(1)$ by assumptions 1–3, so that

$${\mathbf{B}}_{1}({\mathbf{A}}_{1}-{\mathbf{C}}_{1})=o(1).$$

Similarly, **B**_{2}(**A**_{2} − **C**_{2}) = *o*(1). Note that

$$\begin{array}{l}{\mathbf{A}}_{3}-{\mathbf{C}}_{3}=\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}{\mathbf{BZD}}^{\ast}({\mathbf{Z}}^{T}{\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{k})+{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{{\mathbf{Q}}_{i}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}({\mathbf{Z}}^{T}{\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{k})-{\mathbf{B}}_{i}{{q}_{ii}}^{-1}{{\mathbf{Q}}_{i}}^{T}{q}_{ii}\phantom{\rule{0.16667em}{0ex}}\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}({\mathbf{Z}}^{T}\otimes {\mathbf{I}}_{k})\\ =o(1)O(1)O(1)+O(1)o(1)-O(1)o(1)=o(1)\end{array}$$

and that
${\mathbf{B}}_{3}={((\text{trace}{\scriptstyle \frac{1}{{n}^{2}}}\phantom{\rule{0.16667em}{0ex}}({{\mathbf{Z}}_{l}}^{T}{\mathbf{QZ}}_{j}){\scriptstyle \frac{1}{{n}^{2}}}\phantom{\rule{0.16667em}{0ex}}({\mathbf{Z}}_{j}{\mathbf{QZ}}_{l})))}^{-1}=O(1)o(1)=o(1)$, so that **B**_{3}(**A**_{3} − **C**_{3}) = *o*(1). Similarly,

$$\begin{array}{l}{\mathbf{C}}_{1}={({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}{\scriptstyle \frac{1}{\sqrt{n}}}{\mathbf{X}}^{T}{\mathbf{V}}^{i}\phantom{\rule{0.16667em}{0ex}}\left({\stackrel{\sim}{e}}_{i}-{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right),\\ \text{and}\\ ({\mathbf{B}}_{1}-{\mathbf{D}}_{1})=\left({\scriptstyle \frac{1}{\sqrt{n}}}{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}{({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}\phantom{\rule{0.16667em}{0ex}}({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{A}{{\mathbf{Q}}_{i}}^{T}\mathbf{X})\\ +{({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}\phantom{\rule{0.16667em}{0ex}}\left\{{\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{i}{{\mathbf{Q}}_{i}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z/n\right)\phantom{\rule{0.16667em}{0ex}}\mathbf{A}{{\mathbf{Q}}_{i}}^{T}\mathbf{X}\right\}\\ -{({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}\phantom{\rule{0.16667em}{0ex}}\left\{{\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{i}{{q}_{ii}}^{-1}{{\mathbf{Q}}_{i}}^{T}\mathbf{A}{q}_{ii}\phantom{\rule{0.16667em}{0ex}}\left({{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}{\scriptstyle \frac{Z}{\sqrt{(n)}}}\right)\phantom{\rule{0.16667em}{0ex}}{\mathbf{E}}_{i}\mathbf{X}\right\}\\ \text{so}\phantom{\rule{0.16667em}{0ex}}\text{that}\\ {\mathbf{C}}_{1}({\mathbf{B}}_{1}-{\mathbf{D}}_{1})=[O(1)]\phantom{\rule{0.16667em}{0ex}}O(1)O(1)o(1)=o(1)\end{array}$$

Similarly, we can show that **C**_{2}(**B**_{2} − **D**_{2}) = *o*(1).

Finally, note that,

$$\begin{array}{l}{\scriptstyle \frac{{\mathbf{C}}_{3}}{\sqrt{n}}}=-\left({\scriptstyle \frac{1}{\sqrt{n}}}{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}{({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}\phantom{\rule{0.16667em}{0ex}}({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{Z})\phantom{\rule{0.16667em}{0ex}}{\mathbf{D}}^{\ast}\phantom{\rule{0.16667em}{0ex}}({\mathbf{Z}}^{T}{{\mathbf{Q}}_{i}}^{T}\otimes {\mathbf{I}}_{k})\\ +{({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}{\scriptstyle \frac{1}{\sqrt{n}}}\mathbf{X}{\mathbf{V}}^{i}{{q}_{ii}}^{-1}{{\mathbf{Q}}_{i}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({\scriptstyle \frac{1}{\sqrt{n}}}{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}({\mathbf{Z}}^{T}{\mathbf{Q}}_{i}\otimes {\mathbf{I}}_{k})\\ -{({\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{-1}\mathbf{X})}^{-1}{\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}{\mathbf{V}}^{i}{{\mathbf{Q}}_{i}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({\scriptstyle \frac{1}{\sqrt{n}}}{{q}_{ii}}^{-{\scriptstyle \frac{1}{2}}}Z\right)\phantom{\rule{0.16667em}{0ex}}({\mathbf{Z}}^{T}\otimes {\mathbf{I}}_{k})\end{array}$$

so that
${\mathbf{C}}_{3}\,({\mathbf{B}}_{3}-{\mathbf{D}}_{3})=O(\sqrt{n})O({n}^{-1})=O({n}^{-{\scriptstyle \frac{1}{2}}})=o(1)$. This completes the proof. If $\mathbf{V}$ is replaced by a consistent estimator $\widehat{\mathbf{V}}$, we require an additional set of conditions:

$$\begin{array}{lll}{\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({\widehat{\mathbf{V}}}^{-1}-{\mathbf{V}}^{-1}\right)\phantom{\rule{0.16667em}{0ex}}\mathbf{X}\hfill & \stackrel{P}{\to}\hfill & \mathbf{0}\hfill \\ {\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({\widehat{\mathbf{V}}}^{-1}-{\mathbf{V}}^{-1}\right)\phantom{\rule{0.16667em}{0ex}}\mathbf{Z}\hfill & \stackrel{P}{\to}\hfill & \mathbf{0}\hfill \\ {\scriptstyle \frac{1}{n}}{\mathbf{X}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({\widehat{\mathbf{V}}}^{-1}-{\mathbf{V}}^{-1}\right)\phantom{\rule{0.16667em}{0ex}}{\stackrel{\sim}{e}}_{i}\hfill & \stackrel{P}{\to}\hfill & \mathbf{0}\hfill \\ {\scriptstyle \frac{1}{n}}{\mathbf{Z}}^{T}\phantom{\rule{0.16667em}{0ex}}\left({\widehat{\mathbf{V}}}^{-1}-{\mathbf{V}}^{-1}\right)\phantom{\rule{0.16667em}{0ex}}{\stackrel{\sim}{e}}_{i}\hfill & \stackrel{P}{\to}\hfill & \mathbf{0}\hfill \end{array}$$

If the application is to clustered data, we would require the above set of conditions to hold for each cluster.
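
Since the proof shows that ${\stackrel{\sim}{e}}_{i}$ behaves asymptotically like a draw from $\mathrm{N}(0,{q}_{ii}^{-1})$, cut-offs for flagging residuals can be obtained by simulating from this reference distribution. A minimal sketch of such a simulation-based standardisation follows; the $q_{ii}$ value, residuals, and seed are made up for illustration:

```python
import random

# Simulation-based cut-off for the residual \tilde{e}_i, whose approximate
# null distribution is N(0, 1/q_ii). All numbers below are illustrative.
random.seed(1)

def simulated_cutoff(q_ii, n_sim=20000, level=0.975):
    """Monte Carlo 'level' quantile of |N(0, 1/q_ii)|."""
    draws = sorted(abs(random.gauss(0.0, q_ii ** -0.5)) for _ in range(n_sim))
    return draws[int(level * n_sim)]

q = 0.8                                 # hypothetical diagonal element q_ii
cut = simulated_cutoff(q)               # close to 2.24 / sqrt(0.8)
residuals = {1: 0.4, 2: -3.1, 3: 1.0}   # hypothetical residuals \tilde{e}_i
flagged = [i for i, e in residuals.items() if abs(e) > cut]
print(flagged)  # [2]
```

In the paper's setting the same idea would be applied with the estimated $q_{ii}$ for each observation rather than a single common value.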

1. Checkoway H, Heyer NJ, Seixas NS, Welp EA, Demers PA, Hughes JM, Weill H. Dose-response association of silica with nonmalignant respiratory disease and lung cancer mortality in the diatomaceous earth industry. American Journal of Epidemiology. 1997;145:680–688. [PubMed]

2. Hernan MA. The hazards of hazard ratios. Epidemiology. 2010;21:13–15. [PMC free article] [PubMed]

3. Stayner L, Steenland K, Dosemeci M, Hertz-Picciotto I. Attenuation of exposure-response curves in occupational cohort studies at high exposure levels. Scandinavian Journal of Work, Environment & Health. 2003;29:317–324. [PubMed]

4. Ganguli B, Naskar M, Malloy EJ, Eisen EA. Determination of the functional form of the relationship of covariates to the log hazard ratio in a Cox model. Journal of Applied Statistics. 2015 doi: 10.1080/02664763.2014.995607. http://dx.doi.org/10.1080/02664763.2014.995607. [Cross Ref]

5. Eisen EA, Agalliu I, Thurston SW, Coull BA, Checkoway H. Smoothing in occupational cohort studies: An illustration based on penalized splines. Occupational Environmental Medicine. 2004;61:854–860. [PMC free article] [PubMed]

6. Beckman RJ, Cook RD. Outlier............s. Technometrics. 1983;25:119–149.

7. Cook RD. Influential observations in linear regression. Journal of the American Statistical Association. 1979;74:169–174.

8. Pregibon D. Logistic regression diagnostics. The Annals of Statistics. 1981;9:705–724.

9. Ouwens MJ, Tan F, Berger M. Local influence to detect influential data structures for generalized linear mixed models. Biometrics. 2001;57:1166–1172. [PubMed]

10. Cain KC, Lange NT. Approximate case influence for the proportional hazards regression model with censored data. Biometrics. 1984;40:493–499. [PubMed]

11. Lesaffre E, Verbeke G. Local influence in linear mixed models. Biometrics. 1998;54:570–582. [PubMed]

12. Zhu H, Lee S. Local influence for generalized linear mixed models. The Canadian Journal of Statistics. 2003;31:293–309.

13. Fung W, Gu H, Xiang L, Yau KKW. Assessing local influence in principal component analysis with application to haematology study data. Statistics in Medicine. 2007;26:2730–2744. [PubMed]

14. Fung WK, Zhu HY, Wei BC, He X. Influence diagnostics and outlier tests for semiparametric mixed models. Journal of the Royal Statistical Society, Series B. 2002;64:565–579.

15. Barlow WE. Global measures of local influence for proportional hazards regression models. Biometrics. 1997;53:1157–1162. [PubMed]

16. Preisser JS, Qaqish BF. Deletion diagnostics for generalized estimating equations. Biometrika. 1996;83:551–562.

17. Haslett J, Hayes K. Residuals for the linear model with general covariance structure. Journal of the Royal Statistical Society, Series B. 1998;60:201–215.

18. Haslett J. A simple derivation of deletion diagnostic results for the general linear model with correlated errors. Journal of the Royal Statistical Society, Series B. 1999;61:603–609.

19. Haslett J, Dillane D. Application of delete = replace to deletion diagnostics for variance component estimation in the linear mixed model. Journal of the Royal Statistical Society, Series B. 2004;66:131–143.

20. Welsch RE. Influential observations, high leverage points and outliers in linear regression: Comment. Statistical Science. 1986;1:403–405.

21. Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge University Press; New York: 2003.

22. Therneau TM, Grambsch PM. Modeling Survival Data; Extending the Cox Model. Springer-Verlag; New York: 2000.

23. Sen Roy S, Guria S. Estimation of regression parameters in the presence of outliers in the response. Statistics. 2009;43:531–539.

24. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.

25. Manchester L, Blanchard W. When is a curve an outlier? An account of a tricky problem. The Canadian Journal of Statistics. 1996;24:455–466.

26. Grambsch PM, Therneau TM, Fleming TR. Diagnostic plots to reveal functional form for covariates in multiplicative intensity models. Biometrics. 1995;51:1469–1482. [PubMed]

27. Patterson HD, Thompson R. Recovery of inter-block information when block sizes are unequal. Biometrika. 1971;58:545–554.

28. Ganguli B, Wand MP. Additive models for geo-referenced failure time data. Statistics in Medicine. 2006;25:2469–2482. [PubMed]

29. Speed T. [That BLUP is a good thing: The estimation of random effects]: Comment. Statistical Science. 1991;6:42–44. doi: 10.1214/ss/1177011930. http://projecteuclid.org/euclid.ss/1177011930. [Cross Ref]

30. Wahba G. Improper priors, spline smoothing and the problem of guarding against model errors in regression. Journal of the Royal Statistical Society Series B. 1978;40:364–372.

31. Brumback BA, Rice JA. Smoothing spline models for the analysis of nested and crossed samples of curves. Journal of the American Statistical Association. 1998;93:961–976.

32. Robinson GK. That BLUP is a good thing: The estimation of random effects. Statistical Science. 1991;6:15–32.

33. Green PJ. Penalized likelihood for general semi-parametric regression models. International Statistical Review. 1987;55:245–259.

34. Kammann E, Wand M. Geoadditive models. Applied Statistics. 2003;52:1–18.

35. Hayes K, Haslett J. Simplifying general least squares. The American Statistician. 1999;53:376–381.

36. Dempster A, Gasko-Green M. New tools for residual analysis. The Annals of Statistics. 1981;9:945–959.

37. Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons; New York: 1980.

38. Whitehead J. Fitting Cox's regression model to survival data using GLIM. Journal of the Royal Statistical Society, Series C. 1980;29:268–275.

39. Ma R, Krewski D, Burnett RT. Random effects Cox models: A Poisson modeling approach. Biometrika. 2003;90:157–169.

40. Steenland K, Seals R, Klein M, Jinot J, Kahn HD. Risk estimation with epidemiologic data when response attenuates at high exposure levels. Environmental Health Perspectives. 2011;119:831–837. [PMC free article] [PubMed]
