Estimation of $\rho_{XY}$ is based on the observed data of size $n$, $\{(U_i, \tilde{X}_i, \tilde{Y}_i)\}_{i=1}^{n}$, where $\tilde{X}_i = X_i\varphi_1(U_i) + \varphi_2(U_i)$, $\tilde{Y}_i = Y_i\psi_1(U_i) + \psi_2(U_i)$, and the unobserved variables $(X, Y)$ are defined to be the parts of $\tilde{X}$ and $\tilde{Y}$ that are independent of $U$. The proposed estimator of $\rho_{XY}$ is constructed from local method of moments estimates of $\rho_{XY}$. These local estimates utilize the fact that, under the general adjustments (2)–(3), the correlation between $\tilde{X}$ and $\tilde{Y}$ at a fixed $U$ is equal to the correlation $\rho_{XY}$. To be more precise, let $\tilde{\rho}(u)$ denote the correlation between $\tilde{X}$ and $\tilde{Y}$ given $U = u$, defined by

$$\tilde{\rho}(u) = \mathrm{Corr}(\tilde{X}, \tilde{Y} \mid U = u) = \frac{\mathrm{Cov}(\tilde{X}, \tilde{Y} \mid U = u)}{\{\mathrm{Var}(\tilde{X} \mid U = u)\,\mathrm{Var}(\tilde{Y} \mid U = u)\}^{1/2}}.$$

Note that by conditioning on $U = u$, it follows from the definitions of $\tilde{X}$ and $\tilde{Y}$ and the invariance of $\rho_{XY}$ to linear transformations that

$$\tilde{\rho}(u) = \rho_{XY}$$

if $\varphi_1(u)$ and $\psi_1(u)$ are assumed to be of the same sign. The above relationship implies that within a neighborhood of $u$, the correlation between the observed variables $\tilde{X}$ and $\tilde{Y}$, denoted $\rho_{\tilde{X}\tilde{Y}}$, will target the $\rho_{XY}$ of interest. The proposed estimator of $\rho_{XY}$, based on this relationship, is an average of localized method of moments estimates of $\tilde{\rho}(u)$.
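The invariance step can be written out explicitly. The following short derivation is a sketch that uses only the adjustment forms stated above and the independence of $(X, Y)$ from $U$; the sign function $\mathrm{sgn}$ is introduced here for convenience.

```latex
\begin{align*}
\mathrm{Cov}(\tilde{X}, \tilde{Y} \mid U = u)
  &= \varphi_1(u)\,\psi_1(u)\,\mathrm{Cov}(X, Y), \\
\mathrm{Var}(\tilde{X} \mid U = u)
  &= \varphi_1^2(u)\,\mathrm{Var}(X), \qquad
\mathrm{Var}(\tilde{Y} \mid U = u)
   = \psi_1^2(u)\,\mathrm{Var}(Y), \\
\tilde{\rho}(u)
  &= \frac{\varphi_1(u)\,\psi_1(u)}{|\varphi_1(u)|\,|\psi_1(u)|}\,\rho_{XY}
   = \mathrm{sgn}\{\varphi_1(u)\,\psi_1(u)\}\,\rho_{XY}.
\end{align*}
```

The additive terms $\varphi_2(u)$ and $\psi_2(u)$ are constants once $U = u$ is fixed, so they drop out of the covariance and both variances. The sign factor equals one exactly when $\varphi_1(u)$ and $\psi_1(u)$ have the same sign, which is the condition stated in the text.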
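To obtain the targeted local estimates, we bin the observed data with respect to $U$. The range of $U$ is divided into $m$ equidistant intervals, referred to as bins and denoted by $B_1, \ldots, B_m$. Let $L_j$ denote the number of subjects falling into bin $j$, $1 \le j \le m$. To track the observations that fall into a given bin, bin-specific observations are marked by a prime. For example, the data for subject $k$ in bin $j$ are $(U'_{jk}, \tilde{X}'_{jk}, \tilde{Y}'_{jk})$ for $1 \le k \le L_j$. We define the following local method of moments estimator of the correlation between $\tilde{X}$ and $\tilde{Y}$ within bin $j$,

$$r_j = \frac{L_j^{-1}\sum_{k=1}^{L_j} \tilde{X}'_{jk}\tilde{Y}'_{jk} - \bar{\tilde{X}}'_j\,\bar{\tilde{Y}}'_j}{\left\{\left(L_j^{-1}\sum_{k=1}^{L_j} \tilde{X}'^{\,2}_{jk} - \bar{\tilde{X}}'^{\,2}_j\right)\left(L_j^{-1}\sum_{k=1}^{L_j} \tilde{Y}'^{\,2}_{jk} - \bar{\tilde{Y}}'^{\,2}_j\right)\right\}^{1/2}},$$

where $\bar{\tilde{X}}'_j = L_j^{-1}\sum_{k=1}^{L_j} \tilde{X}'_{jk}$, and $\bar{\tilde{Y}}'_j = L_j^{-1}\sum_{k=1}^{L_j} \tilde{Y}'_{jk}$. Guidelines for choosing the total number of bins $m$ will be given in the simulation studies of Section 4. Because $r_j$ targets $\rho_{XY}$ for all $j = 1, \ldots, m$, a natural estimator of $\rho_{XY}$ can be based on the average of $r_1, \ldots, r_m$. Therefore, the proposed covariate adjusted correlation estimator of $\rho_{XY}$ is

$$r = \sum_{j=1}^{m} \frac{L_j}{n}\, r_j, \tag{4}$$

which is a weighted average of the bin-specific estimators. Note that the weights are proportional to the number of points in each bin. The covariate adjusted estimator, $r$, is consistent for $\rho_{XY}$, as given by the following result. The proof is deferred to the Appendix.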
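The binning and averaging steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the `min_size` rule for dropping sparse bins, and all simulation settings (distortion functions, sample size, number of bins) are assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Method of moments (Pearson) correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def covariate_adjusted_corr(u, x_obs, y_obs, m, min_size=3):
    """Weighted average of bin-wise correlations over m equidistant bins of u."""
    lo, hi = min(u), max(u)
    width = (hi - lo) / m
    bins = [[] for _ in range(m)]
    for ui, xi, yi in zip(u, x_obs, y_obs):
        j = min(int((ui - lo) / width), m - 1)  # max(u) goes into the last bin
        bins[j].append((xi, yi))
    # Assumption: bins with fewer than min_size points are dropped and the
    # weights L_j / n are renormalised over the remaining bins.
    used = [b for b in bins if len(b) >= min_size]
    total = sum(len(b) for b in used)
    return sum(
        len(b) / total * pearson([p[0] for p in b], [p[1] for p in b])
        for b in used
    )

# Simulated example; every setting below is an illustrative assumption.
random.seed(1)
rho, n = 0.5, 4000
u, x_obs, y_obs = [], [], []
for _ in range(n):
    ui = random.uniform(1.0, 2.0)
    x = random.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    u.append(ui)
    x_obs.append(x * (ui + 1.0) + 5.0 * ui ** 2)  # X * phi1(U) + phi2(U)
    y_obs.append(y * (2.0 * ui) - 10.0 * ui)      # Y * psi1(U) + psi2(U)

r_naive = pearson(x_obs, y_obs)
r_adj = covariate_adjusted_corr(u, x_obs, y_obs, m=20)
print(f"naive: {r_naive:.3f}, covariate adjusted: {r_adj:.3f}")
```

In this simulated setting the additive distortions pull the unadjusted correlation away from the underlying value of 0.5, while the binned estimate stays close to it because each bin holds $U$ approximately fixed.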
Theorem 1: Under the technical conditions given in the Appendix, [display equation not recoverable from the source] where $c_n = \{n/\log(n)\}^{-1/3}$.
We emphasize here that the consistency of the covariate adjusted correlation estimator, $r$, holds under the general additive and multiplicative adjustments (2)–(3). However, as pointed out in the Introduction and proven in the Appendix, the special case of additive linear effects of $U$ (i.e., $\tilde{X} = X + a_1 U + a_2$ and $\tilde{Y} = Y + b_1 U + b_2$) can be handled with standard partial correlation analysis. The partial correlation estimate is obtained by first regressing (1) $\tilde{X}$ on $U$ and (2) $\tilde{Y}$ on $U$ to obtain two sets of residuals, and then computing the Pearson correlation between the two sets of residuals. In contrast to the additive linear case, the partial correlation does not target $\rho_{XY}$ under general additive effects of $U$ on $X$ and $Y$, such as nonlinear effects. More specifically, consider $\tilde{X} = X + \varphi(U)$ and $\tilde{Y} = Y + \psi(U)$, where $\varphi(\cdot)$ and $\psi(\cdot)$ are unknown smooth functions of $U$ that may be nonlinear. Under these general additive effects, it is also shown in the Appendix that a simple generalization of the partial correlation, called nonparametric partial correlation, targets $\rho_{XY}$. The only difference between partial and nonparametric partial correlation is that, for the latter, the two sets of residuals are obtained from nonparametric regressions of $\tilde{X}$ on $U$ and $\tilde{Y}$ on $U$. Neither partial nor nonparametric partial correlation targets $\rho_{XY}$ under the more general form of (2)–(3), as shown in the Appendix.
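The two residual-based estimates can be contrasted with a small sketch. The text specifies only that the residuals come from linear or nonparametric regressions on $U$; the Gaussian-kernel (Nadaraya-Watson) smoother, the bandwidth `h`, the function names, and the simulated additive effects below are all assumptions chosen for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def linear_residuals(y, u):
    """Residuals from the least-squares line of y on u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    slope = (sum((a - mu) * (b - my) for a, b in zip(u, y))
             / sum((a - mu) ** 2 for a in u))
    return [b - (my + slope * (a - mu)) for a, b in zip(u, y)]

def kernel_residuals(y, u, h):
    """Residuals from a Gaussian-kernel smooth of y on u (assumed smoother)."""
    res = []
    for u0, y0 in zip(u, y):
        w = [math.exp(-0.5 * ((ui - u0) / h) ** 2) for ui in u]
        fit = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        res.append(y0 - fit)
    return res

# Simulated additive, nonlinear effects of U (illustrative assumptions).
random.seed(2)
rho, n = 0.5, 800
u, x_obs, y_obs = [], [], []
for _ in range(n):
    ui = random.random()
    x = random.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    effect = 3.0 * math.cos(2.0 * math.pi * ui)
    u.append(ui)
    x_obs.append(x + effect)  # X + phi(U)
    y_obs.append(y + effect)  # Y + psi(U)

r_partial = pearson(linear_residuals(x_obs, u), linear_residuals(y_obs, u))
r_nonpar = pearson(kernel_residuals(x_obs, u, h=0.05),
                   kernel_residuals(y_obs, u, h=0.05))
print(f"partial: {r_partial:.3f}, nonparametric partial: {r_nonpar:.3f}")
```

Here the cosine effect is uncorrelated with a straight line in $U$, so the linear residuals retain it and the partial correlation is inflated, whereas the smoothed residuals remove most of it. The bandwidth would need to be tuned in practice.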
We note that while $r$ is based on an equidistant binning procedure, alternative binning approaches can be integrated into the estimation procedure proposed above. One alternative that we also explored is based on nearest neighbor binning. As pointed out earlier, under equidistant binning the bins $B_j$, $j = 1, \ldots, m$, are fixed and equidistant; however, the number of data points, $L_j$, falling into each bin is random. In nearest neighbor binning, the bin lengths and boundaries are random, but each bin contains the same number of observations, denoted by $L$. This alternative binning utilizes the nearest neighbor idea by first ordering the observed distortion values $U_i$, $i = 1, \ldots, n$, and then forming $m = n/L$ bins by grouping sets of $L$ nearest neighbor values among the ordered set, from the first $L$ to the last. Once the bins are formed, the rest of the procedure is the same as explained for the case of equidistant binning. We compare the performance of the two binning procedures in more detail in Section 4.4 with respect to various distributions for $U$.
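The nearest neighbor binning step can be sketched as below. This is a hedged illustration: the function names, the rule that leftover observations join the last bin when $n$ is not a multiple of $L$, and the simulated exponential distribution for $U$ are assumptions, not details taken from the text.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nearest_neighbor_bins(u, size):
    """Index sets of `size` consecutive observations in the ordering of u."""
    m = len(u) // size
    if m == 0:
        raise ValueError("bin size exceeds the number of observations")
    order = sorted(range(len(u)), key=lambda i: u[i])
    bins = [order[j * size:(j + 1) * size] for j in range(m)]
    bins[-1].extend(order[m * size:])  # assumption: leftovers join the last bin
    return bins

def nn_adjusted_corr(u, x_obs, y_obs, size):
    """Average of bin-wise correlations over nearest neighbor bins."""
    n = len(u)
    return sum(
        len(b) / n * pearson([x_obs[i] for i in b], [y_obs[i] for i in b])
        for b in nearest_neighbor_bins(u, size)
    )

# Simulated example with a skewed U (illustrative assumptions throughout).
random.seed(3)
rho, n = 0.5, 2000
u, x_obs, y_obs = [], [], []
for _ in range(n):
    ui = random.expovariate(1.0)
    x = random.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    u.append(ui)
    x_obs.append(x * (1.0 + ui) + 2.0 * ui)
    y_obs.append(y * (2.0 + ui) - 3.0 * ui)

bins = nearest_neighbor_bins(u, size=50)
r_nn = nn_adjusted_corr(u, x_obs, y_obs, size=50)
print(f"{len(bins)} bins, nearest neighbor estimate: {r_nn:.3f}")
```

With a skewed $U$, every bin still receives the same number of observations, at the cost of wide bins in the sparse tail.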
Also, upon the suggestion of the editor, we explored a variation on the proposed estimator in (4) by replacing the $r_j$ in (4) with their Fisher's z transformed values (i.e., $0.5\{\ln(1 + r_j) - \ln(1 - r_j)\}$). A comparison of $r$ with this variation is given in Section 4.5.
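A small sketch of this variation follows. The text states only that the $r_j$ in (4) are replaced by their Fisher z values; mapping the averaged z value back to the correlation scale with `tanh` is an assumption here, as are the function names and the hypothetical bin-wise values.

```python
import math

def fisher_z(r):
    """Fisher's z transform, 0.5 * {ln(1 + r) - ln(1 - r)}."""
    return 0.5 * (math.log(1.0 + r) - math.log(1.0 - r))

def z_averaged_corr(r_bins, sizes):
    """Weighted average of Fisher z values, mapped back to a correlation."""
    n = sum(sizes)
    z_bar = sum(l / n * fisher_z(r) for r, l in zip(r_bins, sizes))
    # Assumption: the averaged z value is back-transformed with tanh; the
    # source does not say how the averaged z values are reported.
    return math.tanh(z_bar)

r_bins = [0.42, 0.55, 0.61, 0.48]  # hypothetical bin-wise correlations r_j
sizes = [30, 45, 40, 35]           # hypothetical bin counts L_j
r_plain = sum(l / sum(sizes) * r for r, l in zip(r_bins, sizes))
r_z = z_averaged_corr(r_bins, sizes)
print(f"plain average: {r_plain:.4f}, z-based average: {r_z:.4f}")
```

The two averages coincide when all $r_j$ are equal and differ slightly otherwise, since the z transform stretches values near $\pm 1$.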
For inference, we use the bootstrap percentile method to form confidence intervals (CIs) based on the proposed covariate adjusted estimator in the analysis of the female FMR1 premutation data. The estimated nonparametric density of the 1000 standardized bootstrap estimates of $\rho_{XY}$ is given in the accompanying figure (bottom panel), along with the standard normal density curve. The fitted density appears close to the standard normal density, indicating that the percentile bootstrap approximation is reasonable. The coverage of the proposed bootstrap percentile CIs is examined through simulations reported in Section 4.3.
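A percentile bootstrap interval for the covariate adjusted estimator can be sketched as follows. The resampling scheme (drawing subjects, i.e. whole triples $(U_i, \tilde{X}_i, \tilde{Y}_i)$, with replacement), the order-statistic rule for the percentiles, the function names, and the simulated data are assumptions made for this example; only the use of 1000 bootstrap estimates comes from the text.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def covariate_adjusted_corr(u, x_obs, y_obs, m, min_size=3):
    """Weighted average of bin-wise correlations over m equidistant bins."""
    lo, hi = min(u), max(u)
    width = (hi - lo) / m
    bins = [[] for _ in range(m)]
    for ui, xi, yi in zip(u, x_obs, y_obs):
        bins[min(int((ui - lo) / width), m - 1)].append((xi, yi))
    used = [b for b in bins if len(b) >= min_size]  # assumed sparse-bin rule
    total = sum(len(b) for b in used)
    return sum(len(b) / total * pearson([p[0] for p in b], [p[1] for p in b])
               for b in used)

def bootstrap_percentile_ci(u, x_obs, y_obs, m, n_boot=1000, level=0.95):
    """Percentile CI from resampling subjects with replacement."""
    n = len(u)
    est = []
    for _ in range(n_boot):
        idx = [random.randrange(n) for _ in range(n)]
        est.append(covariate_adjusted_corr(
            [u[i] for i in idx], [x_obs[i] for i in idx],
            [y_obs[i] for i in idx], m))
    est.sort()
    alpha = 1.0 - level
    lower = est[math.floor(alpha / 2 * (n_boot - 1))]
    upper = est[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

# Simulated data (illustrative assumptions).
random.seed(4)
rho, n = 0.5, 500
u, x_obs, y_obs = [], [], []
for _ in range(n):
    ui = random.uniform(1.0, 2.0)
    x = random.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)
    u.append(ui)
    x_obs.append(x * (ui + 1.0) + ui ** 2)
    y_obs.append(y * (2.0 * ui) - ui)

ci_low, ci_high = bootstrap_percentile_ci(u, x_obs, y_obs, m=10)
print(f"95% percentile CI: ({ci_low:.3f}, {ci_high:.3f})")
```

The bins are recomputed inside each bootstrap replicate, so the interval reflects variability from both the binning and the within-bin correlations.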
An important practical issue with the application of the proposed estimator is the adequacy of the assumed adjustment forms (2)–(3). Although these assumed dual additive and multiplicative adjustment forms are fairly general compared to the additive linear restriction of other methods like partial correlation, it is still of interest to check the adequacy of these forms. We address this issue next by developing a bootstrap test to check this assumption.