Estimation of ρ_{XY} is based on the observed data (U_{i}, X̃_{i}, Ỹ_{i}), i = 1, …, n, of size n, where X̃_{i} = X_{i}φ_{1}(U_{i}) + φ_{2}(U_{i}), Ỹ_{i} = Y_{i}ψ_{1}(U_{i}) + ψ_{2}(U_{i}) and the unobserved variables (X, Y) are defined to be the parts of X̃ and Ỹ that are independent of U. The proposed estimator of ρ_{XY} is constructed from local method of moments estimates of ρ_{XY}. These local estimates utilize the fact that, under the general adjustments (2)–(3), the correlation between X̃ and Ỹ at a fixed U is equal to the correlation ρ_{XY}. To be more precise, denote ρ_{X̃Ỹ}(u) to be the correlation between X̃ and Ỹ given U = u, defined by

ρ_{X̃Ỹ}(u) = Corr(X̃, Ỹ | U = u) = Cov(X̃, Ỹ | U = u)/{Var(X̃ | U = u) Var(Ỹ | U = u)}^{1/2}.

Note that by conditioning on U = u, it follows from the definitions of X̃ and Ỹ and the invariance of ρ_{XY} to linear transformations that

ρ_{X̃Ỹ}(u) = ρ_{XY}

if φ_{1}(u) and ψ_{1}(u) are assumed to be of the same sign. The above relationship implies that within a neighborhood of u, the correlation between the observed variables X̃ and Ỹ, denoted ρ_{X̃Ỹ}, will target ρ_{XY} of interest. The proposed estimator of ρ_{XY}, based on this relationship, is an average of localized method of moments estimates of ρ_{X̃Ỹ}(u).
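The equality of the conditional correlation and ρ_{XY} can be checked directly. The following short derivation is a sketch under the stated model, assuming in addition that φ_{1}(u) and ψ_{1}(u) are nonzero; conditional on U = u the adjustment functions are constants, and (X, Y) is independent of U.

```latex
\[
\begin{aligned}
\operatorname{Cov}(\tilde X,\tilde Y \mid U=u) &= \phi_1(u)\,\psi_1(u)\,\operatorname{Cov}(X,Y),\\
\operatorname{Var}(\tilde X \mid U=u) &= \phi_1^{2}(u)\,\operatorname{Var}(X),\qquad
\operatorname{Var}(\tilde Y \mid U=u) = \psi_1^{2}(u)\,\operatorname{Var}(Y),\\
\rho_{\tilde X\tilde Y}(u) &= \frac{\phi_1(u)\,\psi_1(u)}{|\phi_1(u)|\,|\psi_1(u)|}\,\rho_{XY}
 = \operatorname{sgn}\{\phi_1(u)\,\psi_1(u)\}\,\rho_{XY}.
\end{aligned}
\]
```

The additive terms φ_{2}(u) and ψ_{2}(u) drop out because they are constants given U = u, and the same-sign condition is what makes the sign factor equal to one.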
To obtain the targeted local estimates, we bin the observed data with respect to U. The range of U is divided into m equidistant intervals, referred to as bins and denoted by B_{1}, …, B_{m}. Let L_{j} denote the number of subjects falling into bin j, 1 ≤ j ≤ m. To track the observations that fall into a given bin, bin-specific observations are marked by a prime. For example, the data for subject k in bin j are (U′_{jk}, X̃′_{jk}, Ỹ′_{jk}) for 1 ≤ k ≤ L_{j}. We define the following local method of moments estimator of the correlation between X̃ and Ỹ within bin j,

r_{j} = Σ_{k=1}^{L_{j}} (X̃′_{jk} − X̄′_{j})(Ỹ′_{jk} − Ȳ′_{j}) / {Σ_{k=1}^{L_{j}} (X̃′_{jk} − X̄′_{j})^{2} Σ_{k=1}^{L_{j}} (Ỹ′_{jk} − Ȳ′_{j})^{2}}^{1/2},

where X̄′_{j} = L_{j}^{−1} Σ_{k=1}^{L_{j}} X̃′_{jk}, and Ȳ′_{j} = L_{j}^{−1} Σ_{k=1}^{L_{j}} Ỹ′_{jk}. Guidelines for choosing the total number of bins m will be given in the simulation studies of Section 4. Because r_{j} targets ρ_{XY} for all j = 1, …, m, a natural estimator of ρ_{XY} can be based on the average of r_{1}, …, r_{m}. Therefore, the proposed covariate adjusted correlation estimator of ρ_{XY} is

r = Σ_{j=1}^{m} (L_{j}/n) r_{j},    (4)

which is a weighted average of the bin-specific estimators. Note that the weights are proportional to the numbers of points in each bin. The covariate adjusted estimator, r, is consistent for ρ_{XY}, as given by the following result. The proof is deferred to the Appendix.
Theorem 1: Under the technical conditions given in the
Appendix,
where c_{n} = {n/log(n)}^{−1/3}.
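The binned estimator described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `covariate_adjusted_corr`, the rule of skipping bins with fewer than three points, and the distortion functions used in the simulated data are all assumptions made for the example.

```python
import numpy as np

def covariate_adjusted_corr(x_obs, y_obs, u, m):
    """Average of bin-wise correlations with weights proportional to L_j."""
    x_obs, y_obs, u = (np.asarray(a, dtype=float) for a in (x_obs, y_obs, u))
    edges = np.linspace(u.min(), u.max(), m + 1)   # m equidistant bins over the range of U
    labels = np.digitize(u, edges[1:-1])           # bin index 0..m-1 for each subject
    total, used = 0.0, 0
    for j in range(m):
        in_bin = labels == j
        L_j = int(in_bin.sum())
        if L_j < 3:                                # assumption: skip nearly empty bins
            continue
        # Pearson correlation within the bin (the 1/L_j factors cancel)
        r_j = np.corrcoef(x_obs[in_bin], y_obs[in_bin])[0, 1]
        total += L_j * r_j
        used += L_j
    return total / used                            # sum_j (L_j / n) r_j when no bin is skipped

# Simulated illustration with hypothetical distortion functions.
rng = np.random.default_rng(0)
n, rho = 5000, 0.5
x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
u = rng.uniform(1, 2, size=n)
x_obs = x * u + 3 * u**2        # phi_1(u) = u,     phi_2(u) = 3u^2  (assumed)
y_obs = y * (1 + u) - 2 * u     # psi_1(u) = 1 + u, psi_2(u) = -2u   (assumed)

r_adj = covariate_adjusted_corr(x_obs, y_obs, u, m=20)
r_naive = np.corrcoef(x_obs, y_obs)[0, 1]   # ignores U entirely
print(r_adj, r_naive)
```

In this setup the unadjusted Pearson correlation of the observed variables is pulled away from ρ_{XY} by the opposing additive effects of U, while the binned average stays close to it.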
We emphasize here that the consistency of the covariate adjusted correlation estimator, r, holds under the general additive and multiplicative adjustments (2)–(3). However, as pointed out in the “Introduction” section and proven in the Appendix, the special case of additive linear effects of U (i.e., X̃ = X + a_{1}U + a_{2} and Ỹ = Y + b_{1}U + b_{2}) can be handled with standard partial correlation analysis. The partial correlation estimate is obtained by first regressing (1) X̃ on U and (2) Ỹ on U to obtain two sets of residuals, and then taking the Pearson correlation between the two sets of residuals. In contrast to the additive linear case, the partial correlation does not target ρ_{XY} under general additive effects of U on X and Y, such as nonlinear effects. More specifically, consider X̃ = X + φ(U) and Ỹ = Y + ψ(U), where φ(·) and ψ(·) are unknown smooth functions of U that may be nonlinear. Under these general additive effects, it is also shown in the Appendix that a simple generalization of the partial correlation, called nonparametric partial correlation, targets ρ_{XY}. The only difference between partial and nonparametric partial correlation is that, for the latter, the two sets of residuals are obtained from nonparametric regressions of X̃ on U and Ỹ on U. Neither partial nor nonparametric partial correlation targets ρ_{XY} under the more general form of (2)–(3), as shown in the Appendix.
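The contrast between partial and nonparametric partial correlation under nonlinear additive effects can be illustrated with a small simulation. The sketch below is an assumption-laden example: the text does not specify a smoother, so a Gaussian-kernel Nadaraya-Watson estimate with a hand-picked bandwidth `h` stands in for the nonparametric regression, and the sinusoidal effects of U are hypothetical.

```python
import numpy as np

def partial_corr(x_obs, y_obs, u):
    """Pearson correlation of residuals from least-squares lines on U."""
    res_x = x_obs - np.polyval(np.polyfit(u, x_obs, 1), u)
    res_y = y_obs - np.polyval(np.polyfit(u, y_obs, 1), u)
    return np.corrcoef(res_x, res_y)[0, 1]

def kernel_smooth(u, v, h):
    """Nadaraya-Watson fit of v on u at the observed u values (assumed smoother)."""
    weights = np.exp(-0.5 * ((u[:, None] - u[None, :]) / h) ** 2)  # n x n Gaussian weights
    return weights @ v / weights.sum(axis=1)

def nonparametric_partial_corr(x_obs, y_obs, u, h):
    """Pearson correlation of residuals from kernel regressions on U."""
    res_x = x_obs - kernel_smooth(u, x_obs, h)
    res_y = y_obs - kernel_smooth(u, y_obs, h)
    return np.corrcoef(res_x, res_y)[0, 1]

# Simulated illustration: additive but nonlinear effects of U (hypothetical).
rng = np.random.default_rng(1)
n, rho = 2000, 0.5
x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
u = rng.uniform(0, 1, size=n)
x_obs = x + 2 * np.sin(2 * np.pi * u)
y_obs = y + 2 * np.sin(2 * np.pi * u)

pc_linear = partial_corr(x_obs, y_obs, u)
pc_np = nonparametric_partial_corr(x_obs, y_obs, u, h=0.04)
print(pc_linear, pc_np)
```

A straight line cannot absorb the sinusoidal effect, so the shared leftover signal inflates the linear partial correlation, whereas the kernel residuals remove most of it.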
We note that while r is based on an equidistant binning procedure, alternative binning approaches can be integrated into the estimation procedure proposed above. One alternative that we also explored is nearest neighbor binning. As pointed out earlier, under equidistant binning the bins B_{j}, j = 1, …, m, are fixed and of equal length, whereas the number of data points, L_{j}, falling into each bin is random. In nearest neighbor binning, the bin lengths and boundaries are random, but each bin contains the same number of observations, denoted by L. This alternative first orders the observed distortion values U_{i}, i = 1, …, n, and then forms m = n/L bins by grouping consecutive sets of L neighboring values in the ordered sample, from the first L to the last L. Once the bins are formed, the rest of the procedure is the same as for equidistant binning. We compare the performance of the two binning procedures in more detail in Section 4.4 with respect to various distributions for U.
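Nearest neighbor binning amounts to sorting U and cutting the ordered sample into consecutive groups of L. The sketch below is one possible implementation; the function names are hypothetical, and the handling of leftover points when n is not a multiple of L (they join the last bin) is an assumption, since the text takes m = n/L.

```python
import numpy as np

def nearest_neighbor_bins(u, L):
    """Bin label per observation: consecutive groups of L values of the sorted U."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    ranks = np.empty(n, dtype=int)
    ranks[np.argsort(u, kind="stable")] = np.arange(n)   # rank of each U_i in the ordered sample
    m = max(n // L, 1)
    # assumption: leftover points (n not a multiple of L) are merged into the last bin
    return np.minimum(ranks // L, m - 1)

def binned_corr(x_obs, y_obs, labels):
    """Average of within-bin Pearson correlations, weighted by bin counts."""
    x_obs = np.asarray(x_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    total = 0.0
    for j in np.unique(labels):
        in_bin = labels == j
        total += in_bin.sum() * np.corrcoef(x_obs[in_bin], y_obs[in_bin])[0, 1]
    return total / len(labels)

# Small example: seven values, bins of size two, the extra point joins the last bin.
labels_demo = nearest_neighbor_bins([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8], L=2)
print(labels_demo)
```

Because every bin holds (nearly) the same number of points, the weights in the average are (nearly) equal, in contrast to the random counts L_{j} under equidistant binning.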
Also, upon the suggestion of the editor, we explored a variation on the proposed estimator in (4), obtained by replacing the r_{j}'s in (4) with their Fisher's z transformed values (i.e., 0.5{ln(1 + r_{j}) − ln(1 − r_{j})}). A comparison of r with this variation is given in Section 4.5.
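For concreteness, the Fisher z variation can be sketched as follows, taking the bin-wise estimates and bin counts as inputs. The function name is hypothetical, and mapping the averaged z value back to the correlation scale with tanh is an assumption of this sketch; the text only states that the r_{j}'s are replaced by their transformed values.

```python
import numpy as np

def fisher_z_average(r_bins, counts):
    """Weighted average of Fisher z-transformed bin correlations.

    Returns the average on the z scale and, as an assumed final step,
    its back-transform tanh(z_bar) on the correlation scale.
    """
    r_bins = np.asarray(r_bins, dtype=float)
    counts = np.asarray(counts, dtype=float)
    z = 0.5 * (np.log1p(r_bins) - np.log1p(-r_bins))   # 0.5{ln(1 + r_j) - ln(1 - r_j)}
    z_bar = np.sum(counts * z) / counts.sum()          # weights L_j / n as in (4)
    return z_bar, np.tanh(z_bar)

z_bar, r_back = fisher_z_average([0.2, 0.8], [50, 50])
print(z_bar, r_back)
```

When the bin-wise estimates differ, the back-transformed average does not coincide with the plain weighted average of the r_{j}'s, which is why the two versions are worth comparing.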
For inference, we use the bootstrap percentile method to form confidence intervals (CIs) based on the proposed covariate adjusted estimator in the analysis of the female FMR1 premutation data. The nonparametric density estimate of the 1000 standardized bootstrap estimates of ρ_{XY} is given in (bottom panel), along with the standard normal density curve. The fitted density appears close to the standard normal density, indicating that the percentile bootstrap approximation is reasonable. The coverage of the proposed bootstrap percentile CIs is examined through simulations reported in Section 4.3.
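A bootstrap percentile interval for the binned estimator can be sketched as below. This is an illustration under assumptions, not the authors' implementation: subjects (U_{i}, X̃_{i}, Ỹ_{i}) are resampled as whole triples with replacement, bins are recomputed in each resample, and the function names, seed, and simulated distortions are hypothetical.

```python
import numpy as np

def binned_corr(x_obs, y_obs, u, m):
    """Equidistant-bin average of within-bin Pearson correlations."""
    edges = np.linspace(u.min(), u.max(), m + 1)
    labels = np.digitize(u, edges[1:-1])
    total, used = 0.0, 0
    for j in range(m):
        in_bin = labels == j
        L_j = int(in_bin.sum())
        if L_j < 3:                      # assumption: skip nearly empty bins
            continue
        total += L_j * np.corrcoef(x_obs[in_bin], y_obs[in_bin])[0, 1]
        used += L_j
    return total / used

def bootstrap_percentile_ci(x_obs, y_obs, u, m, n_boot=1000, level=0.95, seed=0):
    """Percentile CI from estimates recomputed on resampled subjects."""
    rng = np.random.default_rng(seed)
    n = len(u)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample subjects with replacement
        estimates[b] = binned_corr(x_obs[idx], y_obs[idx], u[idx], m)
    alpha = 1.0 - level
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper, estimates

# Simulated illustration with hypothetical distortion functions.
rng = np.random.default_rng(3)
n, rho = 1000, 0.5
x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T
u = rng.uniform(1, 2, size=n)
x_obs = x * u + u**2
y_obs = y * (1 + u) - 2 * u

r_hat = binned_corr(x_obs, y_obs, u, m=10)
lower, upper, boot = bootstrap_percentile_ci(x_obs, y_obs, u, m=10, n_boot=500)
print(r_hat, (lower, upper))
```

Standardizing `boot` (subtracting its mean and dividing by its standard deviation) and comparing its density with the standard normal curve gives the kind of diagnostic described in the paragraph above.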
An important practical issue with the application of the proposed estimator is the adequacy of the assumed adjustment forms (2)–(3). Although these assumed dual additive and multiplicative adjustment forms are fairly general compared to the additive linear restriction of other methods like partial correlation, it is still of interest to check the adequacy of these forms. We address this issue next by developing a bootstrap test to check this assumption.