We analyze the CPP data to identify the relationship of the children’s IQ at 7 years of age to in utero exposure to polychlorinated biphenyls (PCBs), after adjusting for potential confounders, including the highest education level attained by the mother (EDU).
Additional covariates in the analysis are the socioeconomic status of the child’s family (SES), the gender of the child (SEX, with female=1 and male=0), and the race of the child (RACE, with black=1 and other=0). To model the nonlinear effect of education noted in the Introduction, we consider the following partial linear model,
IQ = α(EDU) + β₁PCB + β₂SES + β₃SEX + β₄RACE + ε,
where α(·) is an unspecified function to be estimated along with βᵢ, i = 1,…,4. To estimate the nonparametric function α(·), we adopted a three-degree truncated power function basis
B(EDU) = (1, EDU, EDU², EDU³, (EDU − κ₁)₊³, …, (EDU − κ₅)₊³)
with five fixed knots κ₁, …, κ₅ selected as the equally spaced sample quantiles of EDU, i.e., 2, 5, 9, 12, 15. Under these specifications, the above model can be rewritten as
IQ = β₁PCB + β₂SES + β₃SEX + β₄RACE + B(EDU)δ + ε,
where δ = (δ₀, …, δ₈)ᵀ is the parameter vector associated with the nonparametric function α(·).
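To make the basis expansion concrete, the following minimal sketch builds one row of a degree-three truncated power basis at the knots quoted above (2, 5, 9, 12, 15) and evaluates α(EDU) for a given coefficient vector. The function names `truncated_power_basis` and `alpha` are hypothetical and chosen only for illustration; the sketch covers the basis construction alone, not the estimation procedure.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.
# The knots are the EDU sample quantiles quoted in the text.
KNOTS = (2, 5, 9, 12, 15)


def truncated_power_basis(edu, knots=KNOTS, degree=3):
    """Return (1, x, ..., x^degree, (x - k1)_+^degree, ..., (x - kK)_+^degree)."""
    polynomial = [edu ** p for p in range(degree + 1)]
    # (x - k)_+ is x - k when positive and 0 otherwise.
    truncated = [max(edu - k, 0.0) ** degree for k in knots]
    return polynomial + truncated


def alpha(edu, delta, knots=KNOTS, degree=3):
    """Evaluate alpha(EDU) as the inner product of the basis row with delta."""
    row = truncated_power_basis(edu, knots, degree)
    if len(row) != len(delta):
        raise ValueError("delta must have degree + 1 + len(knots) entries")
    return sum(b * d for b, d in zip(row, delta))


if __name__ == "__main__":
    # Basis row for a mother with 13 years of education.
    print(truncated_power_basis(13))
```

With five knots and degree three the row has nine entries, matching a nine-dimensional coefficient vector δ.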
We analyzed the CPP data with the following methods, using a penalized spline for α(EDU): the proposed method (P), the modified Horvitz-Thompson weighted likelihood method (HT), the modified Breslow-Cain pseudo-likelihood method (BC), and the maximum likelihood estimator based on the SRS sample (MLE-SRS). The smoothing parameter was selected as 0.0853 by the proposed GCV method. For the proposed method, we also considered modeling α(EDU) as a linear, quadratic, or cubic function of EDU. Furthermore, we considered a restricted cubic spline (Herndon and Harrell 1990) for α(EDU) and obtained the corresponding estimate by maximizing F(·). The restricted cubic spline is constrained to be linear in its tails, which makes it slightly different from a general cubic spline, and it can be used directly to fit models without a penalty.
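The linear-tail constraint of a restricted cubic spline can be illustrated with one common parameterization of its basis. The sketch below is an assumption in two respects: the text does not state which parameterization or which knots were used for the restricted cubic spline, so the knots 2, 5, 9, 12, 15 are reused here purely for illustration, and the names `cube_plus` and `restricted_cubic_basis` are hypothetical.

```python
# Illustrative sketch only: the knots and the parameterization are assumptions.
KNOTS = (2, 5, 9, 12, 15)


def cube_plus(u):
    """Truncated cube: u^3 when u > 0, otherwise 0."""
    return u ** 3 if u > 0 else 0.0


def restricted_cubic_basis(x, knots=KNOTS):
    """Return the k - 1 non-intercept basis terms for k knots.

    The first term is x itself. Each remaining term combines three truncated
    cubes so that the cubic and quadratic parts cancel beyond the last knot,
    which leaves the function linear in both tails.
    """
    if len(knots) < 3:
        raise ValueError("a restricted cubic spline needs at least three knots")
    t_last, t_prev = knots[-1], knots[-2]
    span = t_last - t_prev
    row = [x]
    for t_j in knots[:-2]:
        term = (
            cube_plus(x - t_j)
            - cube_plus(x - t_prev) * (t_last - t_j) / span
            + cube_plus(x - t_last) * (t_prev - t_j) / span
        )
        row.append(term)
    return row


if __name__ == "__main__":
    # Beyond the last knot, second differences of every term vanish (linear tail).
    for j in range(len(KNOTS) - 1):
        f16, f17, f18 = (restricted_cubic_basis(x)[j] for x in (16, 17, 18))
        print(j, f16 - 2 * f17 + f18)
```

With five knots this basis has four non-intercept terms, compared with eight for the unrestricted truncated power basis, which is one reason it can be fitted without a penalty.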
The fitted α(EDU) from the different methods and their corresponding 95% confidence intervals are given in Figure 2. The fitted curves tell a similar story: a clear nonlinear trend is present in all of them. The most noticeable difference is the width of the confidence band, with a narrower band indicating a more efficient method. A careful inspection of the fitted α(EDU) reveals that it rises much faster after about year 12 (i.e., after high school education). This agrees with previously published results (e.g., Breslau et al., 2005; Oddy et al., 2003) that a mother’s years in college have a much greater effect on child IQ.
Figure 2. The estimated function α(·) on EDU for the CPP data. Plot a: the curve obtained by the proposed method; Plot b: the curve obtained by the HT method; Plot c: the curve obtained by the BC method; Plot d: the curve obtained by the MLE method based on the SRS
We conducted likelihood ratio tests of whether the nonlinear fit of α(EDU) from the proposed method can be represented by a simple polynomial function. The following three tests take the null form of α(EDU) to be a linear, quadratic, and cubic function, respectively.
- Test 1: H₀ : α(EDU) = δ₀ + δ₁EDU,
- Test 2: H₀ : α(EDU) = δ₀ + δ₁EDU + δ₂EDU²,
- Test 3: H₀ : α(EDU) = δ₀ + δ₁EDU + δ₂EDU² + δ₃EDU³.
The likelihood ratio tests for Tests 1–3, with a linear, quadratic, and cubic α(EDU) under the null hypothesis, respectively, each give p < 0.001. These results suggest that, although the cubic fit may be sufficiently close to the fully nonparametric fit for practical purposes, there is still statistical evidence that α(EDU) may be more complex than a cubic function.
The parameter estimates from the six methods are presented in the table of analysis results for the CPP data set, which also includes the analysis with a linear effect for EDU using the method of Zhou et al. (2002), which is based on the ODS data only. Overall, the point estimates from these methods are similar. The most obvious difference across the methods is that the standard error estimates from the proposed methods (P, cubic, and restricted cubic spline) are much smaller for the covariates. This reflects the fact that these three methods use the observed values of the covariates in the entire study cohort, whereas the others use only the sampling fraction as a weight in the inference. In addition, we computed the values of the penalized log-likelihood function (3) for these three methods: Qp = −150441.651, Qc = −150474.636, and QR = −150452.816 for the P, cubic, and restricted cubic spline methods, respectively. The P method attains the largest value, indicating that it is more suitable for this CPP data set than the other two methods. However, for practical purposes, the restricted cubic spline method is a viable alternative in this case.
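The comparison of the three penalized log-likelihood values is simple arithmetic, sketched below. The three values are those reported above; the dictionary keys and variable names are hypothetical labels, and the sketch only ranks the values and reports each method's gap to the largest one, without attaching any significance statement to the gaps.

```python
# Penalized log-likelihood values as reported in the text.
# Keys are illustrative labels for the three methods.
Q = {
    "P": -150441.651,
    "cubic": -150474.636,
    "restricted cubic spline": -150452.816,
}

# A larger (less negative) value indicates a better-fitting specification.
best = max(Q, key=Q.get)

# Gap between the best value and each method's value, rounded to 3 decimals.
gaps = {name: round(Q[best] - value, 3) for name, value in Q.items()}

for name in sorted(Q, key=Q.get, reverse=True):
    print(f"{name}: Q = {Q[name]:.3f}, gap to best = {gaps[name]:.3f}")
```

The restricted cubic spline sits much closer to the P method than the plain cubic does, which is consistent with describing it as a viable alternative.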
Analysis results for the CPP data set.