PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of biostsLink to Publisher's site
 
Biostatistics. 2009 April; 10(2): 205–218.
Published online 2008 September 29. doi:  10.1093/biostatistics/kxn030
PMCID: PMC2733172

Generalized linear models with unspecified reference distribution

Paul J. Rathouz*
Department of Health Studies, University of Chicago, 5841 South Maryland Avenue, MC 2007, Chicago, IL 60637, USA ; ude.ogacihcu@zuohtarp

Abstract

We propose a new class of semiparametric generalized linear models. As with existing models, these models are specified via a linear predictor and a link function for the mean of response Y as a function of predictors X. Here, however, the “baseline” distribution of Y at a given reference mean μ0 is left unspecified and is estimated from the data. The response distribution when the mean differs from μ0 is then generated via exponential tilting of the baseline distribution, yielding a response model that is a natural exponential family, with corresponding canonical link and variance functions. The resulting model has a level of flexibility similar to the popular proportional odds model. Maximum likelihood estimation is developed for response distributions with finite support, and the new model is studied and illustrated through simulations and example analyses from aging research.

Keywords: Baseline distribution, Canonical link, Density ratio model, Exponential tilting, Linear exponential family, Natural exponential family, Quasi-likelihood, Semiparametric model

1. INTRODUCTION

Over the past 3 decades, generalized linear models (McCullagh and Nelder, 1989) and quasi-likelihood (QL) (Wedderburn, 1974; McCullagh, 1983) have become the bread and butter of data analysis in biostatistics and other areas of applied statistics. The QL regression models generally yield consistent estimators for regression parameters under correct specification of the mean model, without requiring knowledge of the full distribution of the response variable (Crowder, 1986). When the variance is correctly specified, the QL estimating equation is fully efficient among those linear in the data (Crowder, 1987) and the model-based standard errors obtained by treating the QL equation as the true score equation are asymptotically correct. When the variance is misspecified, the empirical or “sandwich” variance estimator provides asymptotically correct standard errors (Huber, 1967; White, 1982). QL and the empirical variance estimator therefore combine to yield a powerful set of practical data analytic tools. QL has also led to advances in robust analysis procedures for longitudinal and clustered data (Liang and Zeger, 1986), missing covariate and response data (Robins and others, 1994, 1995) and covariates measured with error (e.g. Carroll and Stefanski, 1990).

Despite clear advantages in terms of robustness, models for the mean response which ignore the full distribution of that response have some obvious drawbacks, such as the inability to make inferences about the cumulative distribution of the response. They also create difficulties in more complex modeling situations such as those involving latent predictor variables or missing covariate data because marginal likelihoods over such latent or missing data are difficult to compute or are only approximate when a full likelihood is not available. For example, longitudinal or clustered data are often modeled via random-effects models (Laird and Ware, 1982); while some of these models can be fitted via extensions of QL (Breslow and Clayton, 1993), the solution is at best an approximate one (Lin and Breslow, 1996) and a full probability model for the data would presumably yield more precise inferences. In addition, posterior prediction of random effects is hampered in the absence of a full probability model. Similarly, biased or outcome-dependent sampling models (e.g. Scott and Wild, 1986) require the application of Bayes’ Theorem to—and hence a probability model for—the model of interest in order to generate a likelihood conditional on the observation being sampled; a similar formal problem arises in missing covariate models in which the response is modeled conditional on the covariate being observed.

For these reasons, in applied work in which one of us (P. J. Rathouz) has been involved, even when a model for the mean response seems plausible, if a full probability model is desired, we have often chosen to work instead with the proportional odds model (POM) (McCullagh, 1980) or the analogous ordinal probit model. For example, in our work on child conduct problems, the outcome variable is often a count of conduct symptoms. In some analyses (e.g. Lahey and others, 2007), we model the mean of this count using Poisson (log-linear) regression with overdispersion (estimated with QL), even though the response is not truly Poisson. However, in work wherein we wish to employ random effects, we base our analysis on the POM for ordinal data (e.g. Chronis and others, 2007) because we cannot justify the Poisson distributional model for symptom counts. Similarly, in aging research with outcomes such as the number of difficulties with activities of daily living, some (e.g. Clarke and George, 2005) have used Poisson regression, accounting for overdispersion, while others (e.g. Sulander and others, 2005) have used ordinal data approaches. The advantage of the POM is that it is a fully specified probability model and therefore easily combines with random effects or latent variables (Hedeker and Gibbons, 1994). It can be inconvenient to use in applied settings, however, because the interpretation of the regression coefficients as log cumulative odds ratios rather than mean contrasts is difficult for applied audiences to understand. Additionally, the analyst is often willing to interpret regression relationships directly in terms of the mean by, for example, plotting and interpreting empirical means of the response. A regression model parameterized in terms of the mean response is therefore a desirable alternative to ordinal data regression models.

Beginning with a model for the mean, this paper takes a first step toward addressing the limitations of QL by introducing a new class of semiparametric generalized linear models (SPGLMs) with a level of flexibility similar to the POM. Models in this class are characterized by a parametric function for the mean response, specified via a linear predictor and a link function, and a nonparametric baseline distribution for the response at a given reference mean μ0. The response distribution when the mean differs from μ0 is then generated via exponential tilting (Barndorff-Nielsen and Cox, 1989) of the baseline distribution, yielding a response model that is a natural or linear exponential family (Morris, 1982; Jørgensen, 1987).

We believe that this class of models holds promise in a variety of applications. Our hope is that it will provide a flexible alternative to QL models for the mean when a full distributional model is desired but difficult to specify and will provide a modeling framework on which to build random effects or other latent variable models, as well as new methods for missing data or biased samples. In addition, in some settings, these models will provide flexible alternatives against which to test when the analyst believes that a parametric model may in fact hold.

In Section 2, we elaborate our new model, discuss a few of its properties, and tie it to related literature. In Section 3, we restrict attention to the case where the response distribution has a finite and known support in order to develop maximum likelihood estimation. Examples of finite support outcomes from psychiatry and aging research include counts of symptoms or disability items, modeled using linear, log-linear, or logistic regression. Section 4 contains a simulation study illustrating potential benefits of our model and providing some empirical comparisons with the POM. Section 5 contains example data analyses from aging research employing our new model, and Section 6 is a discussion.

2. MODEL AND RELATED LITERATURE

2.1. A SPGLM

Let X be a p × 1 covariate vector, and let scalar response Y be randomly sampled from the conditional distribution of (Y|X), where Y has support An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx1_ht.jpg. Interest lies in inferences about the distribution of (Y|X). Following the traditional approach to regression modeling, define the conditional mean model

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx2_ht.jpg
(2.1)

where η = XTβ is the linear predictor and g(·) is a known link function. That is, g(·) is a strictly increasing function from the open interval (m, M) [subset or is implied by] An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx3_ht.jpg into An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx3_ht.jpg, where m = inf(An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg) and M = sup(An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg).

Now, suppose that the probability density of (Y|X) is given by

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx5_ht.jpg
(2.2)

where f0(·) is a baseline density on An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg,

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx6_ht.jpg

depending on f0, and θ is chosen such that An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx7_ht.jpg. For a fixed f0, (2.2) is a natural exponential family (Morris, 1982) with canonical parameter θ and cumulant generating function b(·). Coupling (2.2) with mean model (2.1), model (2.1)–(2.2) is a generalized linear model (McCullagh and Nelder, 1989) with linear predictor η, link function g(·), and error distribution (2.2).

The right-hand side of (2.2) is a density ratio model (Anderson, 1972) because, for θ1 ≠ θ2, the log ratio of the 2 corresponding densities is linear in the data, that is, log{f(y1)/f(y;θ2)} [proportional, variant]1 − θ2)y. In traditional generalized linear models, f0 is a known component of the error distribution. For example,

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx8_ht.jpg

for the binomial model and f0(y) = 1/(ey!), An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg ={0,1,…}, for Poisson data. With (2.1)–(2.2), we propose to extend the traditional framework by estimating f0 from the data as in a density ratio model while maintaining the mean model in terms of user-specified g(μ) = η. Model (2.1)–(2.2) is thus an “SPGLM” governed by finite-dimensional mean parameter β and nonparametric reference density f0.

As with traditional generalized linear models, the mean model and the error distribution are tied together via a canonical link function gc(μ;f0)[equivalent]gc(μ) = θ, ∀ μ [set membership](m,M). Writing

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx9_ht.jpg
(2.3)

we have that the solution θ to (2.3), as an implicit function of μ, is the canonical link gc(μ;f0) = b−1(μ;f0), which depends on f0. We show in Appendix A (Corollary 4b) of the supplementary material (available at Biostatistics online, http://www.biostatistics.oxfordjournals.org) that for a given f0 with bounded support An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg, gc(μ;f0) exists and is a one-to-one mapping from μ [set membership](m,M) onto (−∞, +∞).

As stated, model (2.1)–(2.2) is not completely identifiable. In Appendix B of the supplementary material, available at Biostatistics online, we show that f0 in (2.2) can be replaced with any other f0* that is a “tilted” version of f0, that is, f0*(y) = f0(y)exp(θ0*y)/An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx10_ht.jpgf0(u exp(θ0*u)du for some θ0* [set membership](−∞,+∞). When this substitution is made in (2.2), then θ = gc(μ;f0) becomes θ* =gc(μ;f0*) =θ −θ0* and f(y|X;β,f0) = f(y|X;β,f0*). In order to render (2.1)–(2.2) identifiable, it is therefore necessary to place an additional restriction on f0. We also show in Appendix B that for given f0 and μ0* [set membership](m,M), there exists a unique f0* that is (i) a tilted version of f0 and (ii) has mean μ0*. Without loss of generality, we thus impose the identifiability constraint on f0 that gc0;f0)=0 (i.e. An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx10_ht.jpgyf0(y)dy = μ0), where μ0 [set membership](m,M) is an arbitrary reference mean chosen by the user; the material in Appendix B establishes that the model is invariant to the choice of μ0. As f0 has the interpretation as the distribution of (Y|X) when g−1(X′β)= μ0, we recommend choosing μ0 to be the mean or median of Y in the data or some other centrally located value appropriate for the problem at hand. Doing so ensures that f0 has a meaningful interpretation and is not an extrapolation from the data.

If An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg is finite of cardinality K, then f0 is a {card(An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg) −2}-dimensional parameter after incorporating constraints An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx11_ht.jpgf0(y) =1 and An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx11_ht.jpgyf0(y)=μ0. By contrast, if Y has infinite support, then f0 is an infinite-dimensional parameter, similar to a baseline hazard function in the Cox proportional hazards model.

REMARK 2.1

Model (2.1)–(2.2) provides an interesting alternative to the POM. Both models have p+card(An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg)−2 parameters, including a regression model for the location with p−1 slope parameters and a nonparametric model for a baseline distribution. In addition, both models share the stochastic ordering property discussed below. The 2 models are compared via simulations and data analysis examples in Sections 4–5. The similarities between them and their equivalent level of flexibility suggest that in many cases which model the analyst chooses is more likely to be a matter of analytic goals and personal preferences than a question of one model providing a substantially better fit to the data.

2.2. Properties

We mention 2 useful properties of model (2.1)–(2.2). First, it possesses a stochastic ordering property. This property states that if for given values x1x2, μ(x1;β)<μ(x2;β), then for all y [set membership]An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg such that m < y < M,

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx12_ht.jpg

Second, both the interpretation and the estimation of regression parameter β are robust to unknown baseline distribution f0. Note that (2.1)–(2.2) is specified in terms of the mean function and f0 and that b(θ) implicitly depends on f0. So, in general, derivatives b(r)(θ) and in particular var(Y|X;β,f0)=b′′(θ) will depend on f0. However, because θ is defined as a function of μ, rather than the other way around, μ is invariant to f0. Further, as the score function for β is X([partial differential]μ/[partial differential]η)(Y −μ)/b′′(θ), it is easily seen (Hiejima, 1997) that μ (or β) and f0 are orthogonal parameters (Cox and Reid, 1987). The implication is that maximum likelihood estimators (MLEs) An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpg will very generally be consistent and asymptotically normal even if f0 is misspecified or poorly estimated or if tilting model (2.2) does not hold (Crowder, 1986), although nominal standard errors will likely be incorrect. Model (2.2) can thus be seen as a working distribution for (Y|X) around its mean; if the working model is correct, mean model inferences will be enhanced, while if it is incorrect, such results will still be valid.

2.3. Related literature

The operation of multiplying a baseline density by a positive function exp{w(θ,y)} is known as exponential tilting (Barndorff-Nielsen and Cox, 1989, Section 4.3) and has been studied extensively in the context of natural exponential families (Morris, 1982) when w(θ,y) = θh(y) for some known function h(·). Exponential tilting has been used in constructing bootstrap hypothesis tests (Davison and Hinkley, 1997), in empirical likelihood methods (Owen, 2001), and in nonparametric density estimation (Efron and Tibshirani, 1996).

A connection to other work can be made by considering models wherein the regression is specified directly in terms of canonical parameter θ, that is, θ = XTβ. The case when X is a scalar was studied by Birch (1963; see also McCullagh and Nelder, 1989, Section 5.2.3). When X is a vector of indicators for a factor, the multigroup density ratio model (Anderson, 1972; Fokianos and others, 2001; Fokianos, 2004) arises and gives rise to methods for biased or outcome-dependent samples such as case–control studies (Qin, 1998; Gilbert, 2004). Emphasis is on comparison of the groups via contrasts in the canonical parameter θ and/or estimation of the baseline distribution f0. In contrast to this work, our proposal is to model the mean μ, or some user-chosen transformation thereof, as a regression on X. We believe that this will render our model substantially more useful in practice as the interpretation of the regression parameter β is independent of estimated value of the baseline distribution f0.

Finally, Hiejima (1997) recently reexamined QL estimation using tilting and mean models similar to (2.1)–(2.2); however, that author's focus is on interpreting the QL estimator when the specified variance function does not correspond to a natural exponential family distribution.

3. MAXIMUM LIKELIHOOD ESTIMATION

We now consider estimation of model (2.1)–(2.2) for the special case of a finite known support An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg=(y1,…,yK)T of cardinality K, so that baseline density f0 is the K × 1 vector of probabilities f0 = {f0(y1),…,f0(yK)}T. Suppose for i =1,…,n that the Xis are either iid draws from the population or fixed by design. Given each Xi, let Yi be independently sampled from model (2.1)–(2.2), yielding a log-likelihood function l(β,f0)= An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx14_ht.jpg.

The p × 1 score function for β, fixing f0, is given by

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx15_ht.jpg

where An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx16_ht.jpg and θi =gci;f0). Details of computing gci;f0) are in Appendix C of the supplementary material, available at Biostatistics online. Solution of S(β;f0)=0 yields an MLE An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpg. Fisher's expected information for β is, as usual,

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx17_ht.jpg

Note additionally that, as μi does not depend on f0, we have for y [set membership] An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg, An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx18_ht.jpg, so that estimation of β is asymptotically unaffected by estimation of f0. Of course, as seen by the form of An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx19_ht.jpgββ, estimation of f0 is required in order to allow the use of An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx19_ht.jpgββ−1 to estimate var(An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpg).

Turning to baseline distribution f0, for each y[set membership] An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg, the score function for f0(y) is given by

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx20_ht.jpg

where An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx21_ht.jpg and An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx22_ht.jpg denote the partial derivatives with respect to f0(y) and θi, holding θi and f0 fixed, respectively. Calculations in Appendix D of the supplementary material, available at Biostatistics online, then yield

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx23_ht.jpg
(3.1)

Note here that Yi represents the response for the ith subject, while y represents a typical value in An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg. Importantly, the first 2 terms in (3.1), arising from An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx21_ht.jpg, form an unbiased estimating function for f0(y), as does the third term. However, the third term, which is due to the fact that θi depends on f0(y), asymptotically contains no information on f0 in the sense that its derivative with respect to any f0(y′), y′ [set membership] An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg, has expected value zero.

The K × 1 score function for f0 for a fixed β is simply

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx24_ht.jpg

Fairly straightforward calculations applied to the first 2 terms of Ty(f0;β) yield Fisher's expected information as the K × K matrix An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx19_ht.jpgf0f0 with typical element

An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx25_ht.jpg

Note importantly that An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx19_ht.jpgf0β = E{−[partial differential]T(f0;β)T/[partial differential]β}=E{−[partial differential]S(β;f0)/[partial differential]f0T}=0. So, maximum likelihood estimation of f0 is asymptotically unaffected by estimation of β.

Solving S(β;f0)=0 and T(f0;β)=0 can be done simultaneously, but it is convenient to iterate between the 2, as suggested by the orthogonality between β and f0. This is analogous to iterative estimation of regression parameters and variance–covariance parameters in fitting generalized estimating equation models (Liang and Zeger, 1986). Both equations may be solved by Fisher scoring; the algorithm for computing An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpg is well known. As for the maximum likelihood solution An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg of T(f0;β)=0, note that An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg must satisfy 3 constraints: (i) 0≤f0(y)≤1, (ii) An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx27_ht.jpgf0(yk)=1, and (iii) An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx27_ht.jpgykf0(yk) =μ0. Briefly, this is accomplished by reparameterizing the model in terms of the (K−2)×1 vector h0*, where h0*(yk)=log{f0(yk)}, k = 1,…,K−2. Details are provided in Appendix D.

4. SIMULATION STUDY

We conducted a simulation study of the performance of MLE (An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpg, An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg) in the SPGLM given by (2.1)–(2.2) using a log link function. Goals of the simulation included the following: (i) to confirm that the estimation procedure outlined in Section 3 yields consistent estimates and accurate tests and confidence intervals when the true data generation mechanism (DGM) is the SPGLM; (ii) to compare regression parameter and cumulative distribution function (cdf) estimators under a log-linear regression model with Poisson error distribution (PLLM) to those under the SPGLM when the DGM is SPGLM; and (iii) to compare the POM to the SPGLM with respect to test accuracy, power, and cdf estimation, when the DGM is either the SPGLM or the POM.

4.1. Data generation and analysis

Letting covariate vector Xi = (1,Xi1), each Xi1 was an iid draw from a standard normal distribution. Response Yi, taking values on support An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg ={0,1,2,3,4,5}, was generated conditionally on Xi according to 3 different DGMs. The first 2 are based on the SPGLM with link g(·)= log (·), while the third is based on the POM. In the first 2, f0 is either a truncated Poisson or a “0-inflated” Poisson as follows. Let p0(u) be the Poisson probability mass function with mean 1, that is, p0(u)=1/(eu!). In the first case, we set f0(y)=p0(u)/An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx28_ht.jpg. In the second, we set f0(0)=3p0(0)/{3p0(0)+An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx28_ht.jpg} and f0(y)=p0(y)/{3p0(0)+An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx28_ht.jpg}, y=1,…,5. For these 2 DGMs, we set βT =(β0, β1)=(−0.7,0.2), yielding E(Yi) ≈ 0.5 marginally over Xi.

Data for the third case were generated under the POM. Here, we set β1 =0.3. Then, in order to ensure comparability to data generated under the SPGLM, we constructed a baseline cumulative odds function starting with the same f0 as that given by the 0-inflated Poisson in the second DGM. Here, however, f0 was tilted according to (2.2) for θ=gc{exp(−0.7)}, and the resulting distribution was used for the baseline cumulative odds function. This ensured that E(Yi|Xi=0)= exp(β0) was the same in all 3 DGMs used in the simulations.

For each DGM, we generated 1000 replicates each with sample size n = 250. For each replicate, 3 models were fitted to the data: the PLLM, the SPGLM with log link, and the POM. Models were compared based on the values of log-likelihoods, means, and standard errors of An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpg, coverage probabilities for β, estimated cdf values, and power for detecting β1 ≠0 using a Wald test. Additionally, in order to evaluate Type I error rates under the 3 analysis models, we generated data under the null hypothesis that β1= 0. In all cases, Wald-based procedures were based on the estimated expected Fisher's information.

4.2. Results

Table 1 presents results for An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpg under the 2 SPGLM data–generating mechanisms. As expected, both the PLLM and the SPGLM yield approximately unbiased regression estimators, and the 2 estimators have comparable efficiency. The 95% Wald confidence interval coverage probabilities are accurate for the SPGLM, as expected, but are low for the PLLM under the 0-inflated DGM because the Poisson PLLM model does not account for the overdispersion due to the extra mass at y=0. Table 2 presents the bias and standard deviation of An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg for the SPGLM estimators under the first 2 DGMs, demonstrating the consistency of An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg. We conclude that the SPGLM MLEs are performing as expected.

Table 1.

Simulation results for β estimation under SPGLM data–generating mechanisms and PLLM and SPGLM models

Table 2.

Simulation results for f0 estimation under SPGLM data–generating mechanisms and models

Table 3 presents comparisons among the 3 models for the 3 DGMs. The first column presents Type I error for testing β1=0 with a Wald test under each model, when data are generated under the null hypothesis that βT =(−0.7,0.0). We see that both the SPGLM and the POM perform acceptably, whereas the PLLM has inflated Type I error for the 0-inflated baseline distribution. The next column presents power for detecting β1 ≠ 0 under the alternative that βT =(−0.7,0.2). Under the first 2 DGMs, the correctly specified SPGLM has higher power than the POM, with the power of the PLLM similar to that of the SPGLM under the truncated Poisson DGM. We ignore power results for the 0-inflated Poisson DGM under the PLLM because the Type I error rate is not correct. Under the third DGM, the correctly specified POM is more powerful. However, in each case, both the SPGLM and the POM are reasonably efficient. The log-likelihood values at the maximum give a similar picture: when either the SPGLM or the POM is the correct model, that model yields, on average, the highest likelihood value; however, the other model has comparable likelihood values and so would be a reasonable competitor in terms of overall fit of the model to the data. The last 2 columns in Table 3 present cdf parameter estimates given X =0. Again, each of the SPGLM and POM is best in terms of bias when it is correctly specified, and the PLLM is only adequate for the truncated Poisson distribution. However, the SPGLM appears to be more stable than the POM in terms of smaller standard errors in the tails of the distribution.

Table 3.

Simulation results for Type I error, power, and cdf estimation under 3 data-generating mechanisms and 3 models for estimation

Based on these results, we expect in applications that the SPGLM and POM would both be reasonable data analysis options in the sense of having similar log-likelihood values and in terms of Type I error, power, and estimation of the conditional (on X) cdf of Y, even when the other model is the true one. The SPGLM is perhaps more stable in estimation of the tails of the cdf, and this deserves further study.

5. APPLICATION TO AGING RESEARCH: AHEAD DATA

We illustrate the use of the SPGLM given in (2.1)–(2.2) to the analysis of data from the Study of Assets and Health Dynamics Among the Oldest Old (AHEAD) (Soldo and others, 1997). AHEAD is a national longitudinal study with an initial sample of 7444 respondents aged 70 years and older and their spouses (if married). Objectives of the study include monitoring transitions in physical, functional, and cognitive health and examining the relationship of late-life changes in physical and cognitive health to patterns of dissaving and income flows. Here, we use the baseline data collected in 1993. Variable descriptions are in Table 4; complete data sample size is n = 6441.

Table 4.

Variables from baseline wave of the AHEAD study

In a first analysis, we model numiadl as a function of age, sex, immediate word recall (iwr), and netwc using the SPGLM with the log link function, as well as a log-linear model with variance proportional to the mean (quasi-Poisson), and the POM. Estimates and fit statistics are given in Table 5. Comparing the SPGLM and quasi-Poisson model fits, regression estimates are as expected extremely close; the overdispersion parameter estimate of 1.48 confirms that the Poisson model is inadequate for the conditional distribution of (Y|X). The standard errors given by the quasi-Poisson and the SPGLM models are essentially equal, suggesting that, insofar as including an overdispersion parameter in the variance model corrects for effects of variance misspecification on β-inferences, the SPGLM has done an excellent job of modeling the baseline distribution.

Table 5.

Fitted log-linear and POMs for numiadl, AHEAD data. Log-linear models are estimated with SPGLM and quasi-Poisson regression

Turning to the SPGLM and the POM model fits, we make 2 remarks. First, based on log-likelihood values, the overall fit of the 2 models is indistinguishable. Second, while the regression parameters are not directly comparable, the corresponding Z-statistics can be compared and we see that qualitatively very similar conclusions are drawn about the effects of the predictors on numiadl. We conclude in this example that both the SPGLM and the POM are equally appropriate approaches to modeling these data, the main difference between the 2 being in the interpretation of the regression coefficients.

To further examine the ability of the SPGLM to capture the relationship of both the mean and the cdf of numiadl to iwr, we fitted a model with iwr as the sole predictor (obtaining An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx13_ht.jpgiwr =−0.30±0.013). The fitted means (log scale) are compared to the empirical means in Figure 1. Additionally, the fitted values for Pr(iadl ≥3| iwr) compare very favorably with the empirical proportions (Figure 1, right panel); similar results were obtained for Pr(iadl≥1|iwr). The plots indicate that our model is doing well at describing both the mean and the underlying distribution function as a function of iwr. This illustrates a key advantage of the SPGLM as estimation of the conditional distribution function is not available with QL.

Fig. 1.

Fitted SPGLM model for numiadl as a function of immediate word recall (iwr), AHEAD data. Left panel: Log-mean numiadl by iwr, fitted (triangles) and observed (circles with pluses indicating 95% large-sample confidence intervals) values. Right panel: Logit ...

In a second analysis, we model iwr as a function of age, sex, and netwc using the SPGLM with the logistic link function g(μ)= log{μ/(10−μ)} appropriate for response Y that is the number of successes out of 10 trials. We compare this to the standard binomial model with overdispersion (quasi-binomial). In Table 6, we see that regression estimates from the 2 model fits are very close and, as in the previous example, that the standard errors from the SPGLM and the overdispersed binomial model are also very close. Additionally, because the support An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx4_ht.jpg is the same for the binomial model and the SPGLM models, we can directly compare the log-likelihood values with a likelihood ratio test. This yields a test statistic of X2 = 520 on K−2=9 df, indicating that the SPGLM fits the data significantly better than the binomial model.

Table 6.

Logistic linear models for iwr estimated with SPGLM, binomial and quasi-binomial regression, AHEAD data

Fig. 2.

Fitted SPGLM and binomial log-logistic models for immediate word recall, AHEAD data. Left panel: Histogram of An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg for each model. Right panel: Variance functions for the SPGLM model fit (—) and the binomial model (- - -).

To gain more insight into how the SPGLM is modeling the response distribution differently than the binomial model, we plot in Figure 2 the histograms of An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg under the SPGLM and binomial models and the variance function (var(Y) as a function of E(Y)) generated by the fitted An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg. It can be seen that An external file that holds a picture, illustration, etc.
Object name is biostskxn030fx26_ht.jpg is indeed more dispersed under the SPGLM than under the binomial model. Additionally, the variance function is clearly larger and not quadratic, thereby capturing the overdispersion in the data relative to the binomial distribution.

6. DISCUSSION

We have proposed a new class of SPGLMs for the distribution of (Y|X), characterized by a parametric function for E(Y|X) and an unspecified reference distribution to be estimated with the data. The distribution of (Y|X) for a given X is generated by an exponential tilting of the reference distribution according to a canonical parameter computed from the mean. We have shown via simulations and examples that our model yields very similar inferences for the mean parameters as commonly used overdispersed generalized linear models and enjoys a comparable level of flexibility as the popular POM, while characterizing the dependence of Y on X via the mean rather than the cumulative odds.

For count data such as the iadl or iwr responses in the AHEAD data, other semiparametric approaches are certainly available. For example, Podlich and others (2004) propose a model wherein Yi arises as an extended Poisson process over a fixed time interval in which the rate parameter λi,j varies as a function of j as Yi increments from j to j+1. Covariate effects are incorporated via a multiplicative term such as exp(XiTβ), so that λi,j= exp(XiT β) λ0,j, where λ0,j is a nonparametric sequence of reference rates. Our model shares the same degree of flexibility as theirs by also marrying a parametric model for covariate effects with a nonparametric model for a reference distribution. It differs from theirs in that our regression is for the mean and not for a rate parameter and in that our model is not in general restricted to count data.

Straightforward generalizations of model (2.1)–(2.2) obtain through relaxing μ(·) to be a general nonlinear function of β. More involved generalizations include replacing the tilting function θy with θh(y) for known strictly monotone function h(·), estimating h(·;γ) as a function of some parameter γ, and/or using a more general tilting function w(θ,y;γ), perhaps estimated as a member of a parametric family (see, e.g. Fokianos, 2004). One use for such generalizations would be to diagnose whether the exponential tilting model (2.2) is reasonable in a given data set. These are areas for future development.

An important area for further development is the extension of these results to the case where Y has infinite support, either discrete or continuous. A major challenge in launching this work in the continuous case is to show that the nonparametric MLE of f0 has positive mass only on the observed support. Related earlier work on this problem, for example by Vardi (1985), does not directly apply because our tilting function depends on f0 through the parameter θ. An alternative approach is to estimate f0 via smooth density estimation. In either case, we expect inferences about β to be robust to specification and estimation of f0 due to the orthogonality of these 2 parameters.

FUNDING

National Cancer Institute (5PO1 CA98262) to P.J.R.

Supplementary Material

[Supplementary Material]

Acknowledgments

The authors thank Peter McCullagh and Ronald Thisted for stimulating discussions and useful comments on earlier drafts of this work. Conflict of Interest: None declared.

References

  • Anderson JA. Separate sample logistic discrimination. Biometrika. 1972;59:19–35.
  • Barndorff-Nielsen OE, Cox DR. Asymptotic Techniques for Use in Statistics. London: Chapman & Hall; 1989.
  • Birch MW. Maximum likelihood in three-way contingency tables. Journal of the Royal Statistical Society Series B. 1963;25:220–233.
  • Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.
  • Carroll RJ, Stefanski LA. Approximate quasi-likelihood estimation in models with surrogate predictors. Journal of the American Statistical Association. 1990;85:652–663.
  • Chronis AM, Lahey BB, Pelham WE, Williams SH, Baumann BL, Kipp H, Jones HA, Rathouz PJ. Maternal depression and early positive parenting predict future conduct problems in young children with attention-deficit/hyperactivity disorder. Developmental Psychology. 2007;43:7–82. [PubMed]
  • Clarke P, George LK. The role of the built environment in the disablement process. American Journal of Public Health. 2005;95:1933–1939. [PubMed]
  • Cox DR, Reid N. Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society Series B. 1987;49:1–18.
  • Crowder M. On consistency and inconsistency of estimating equations. Econometric Theory. 1986;2:305–330.
  • Crowder M. On linear and quadratic estimating functions. Biometrika. 1987;74:591–597.
  • Davison AC, Hinkley DV. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press; 1997.
  • Efron B, Tibshirani R. Using specially designed exponential families for density estimation. Annals of Statistics. 1996;24:2431–2461.
  • Fokianos K. Merging information for semiparametric density estimation. Journal of the Royal Statistical Society Series B. 2004;66:941–958.
  • Fokianos K, Kedem B, Qin J, Short DA. A semiparametric approach to the one-way layout. Technometrics. 2001;43:56–65.
  • Gilbert PB. Goodness-of-fit tests for semiparametric biased sampling models. Journal of Statistical Planning and Inference. 2004;118:51–81.
  • Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics. 1994;50:933–944. [PubMed]
  • Hiejima Y. Interpretation of the quasi-likelihood via the tilted exponential family. Journal of the Japan Statistical Society. 1997;2:157–164.
  • Huber PJ. The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the 5th Berkeley Symposium. 1967;1:221–233.
  • Jørgensen B. Exponential dispersion models. Journal of the Royal Statistical Society Series B. 1987;49:127–162.
  • Lahey BB, Van Hulle CA, Waldman ID, Rodgers JL, D'onofrio BM, Pedlow S, Rathouz PJ, Keenan K. Testing descriptive hypotheses regarding sex differences in the development of conduct problems and delinquency. Journal of Abnormal Child Psychology. 2007;34:737–755. [PubMed]
  • Laird NM, Ware JH. Random effects models for longitudinal data. Biometrics. 1982;38:963–974. [PubMed]
  • Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
  • Lin X, Breslow NE. Bias correction in generalized linear mixed models with multiple components of dispersion. Journal of the American Statistical Association. 1996;91:1007–1017.
  • McCullagh P. Regression models for ordinal data (with discussion) Journal of the Royal Statistical Society Series B. 1980;42:109–142.
  • McCullagh P. Quasi-likelihood functions. Annals of Statistics. 1983;11:59–67.
  • McCullagh P, Nelder JA. Generalized Linear Models. 2nd edition. London: Chapman & Hall; 1989.
  • Morris CN. Natural exponential families with quadratic variance functions. Annals of Statistics. 1982;10:65–80.
  • Owen AB. Empirical Likelihood. Boca Raton, FL: Chapman & Hall; 2001.
  • Podlich HM, Faddy MJ, Smyth GK. Semi-parametric extended Poisson process models for count data. Statistics and Computing. 2004;14:311–321.
  • Qin J. Inferences for case-control and semiparametric two-sample density ratio models. Biometrika. 1998;85:619–630.
  • Robins JM, Rotnitzky A, Zhao LP. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–866.
  • Robins JM, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995;90:106–121.
  • Scott AJ, Wild CJ. Fitting logistic models under case-control or choice-based sampling. Journal of the Royal Statistical Society Series B. 1986;48:170–182.
  • Soldo BJ, Hurd MD, Rodgers WL, Wallace RB. Asset and health dynamics of the oldest old: an overview of the AHEAD study. Journals of Gerontology, Series B, Psychological Sciences and Social Sciences. 1997;52B:1–20. [PubMed]
  • Sulander T, Martelin T, Rahkonen O, Nissinen A, Uutela A. Associations of functional ability with health-related behavior and body mass index among the elderly. Archives of Gerontology and Geriatrics. 2005;40:185–199. [PubMed]
  • Vardi Y. Empirical distributions in selection bias models. Annals of Statistics. 1985;13:178–203.
  • Wedderburn RWM. Quasi-likelihood functions, generalized linear models and the Gauss-Newton method. Biometrika. 1974;61:439–447.
  • White H. Maximum likelihood estimation of misspecified models. Econometrics. 1982;50:1–25.

Articles from Biostatistics (Oxford, England) are provided here courtesy of Oxford University Press