


Article sections

- Summary
- 1. Introduction
- 2. Estimation
- 3. Assessing the Adjustment Model Assumption
- 4. Simulation Studies
- 5. Application to Female FMR1 Premutation Data
- 6. Concluding Remarks
- References



Biometrics. Author manuscript; available in PMC 2009 September 22.

Published in final edited form as: Biometrics. Published online 2009 January 23. doi: 10.1111/j.1541-0420.2008.01169.x

PMCID: PMC2748149

NIHMSID: NIHMS120050

Damla Şentürk,^{1} Danh V. Nguyen,^{2,*} Flora Tassone,^{3} Randi J. Hagerman,^{4} Raymond J. Carroll,^{5} and Paul J. Hagerman^{3}

The publisher's final edited version of this article is available at Biometrics


Motivated by molecular data on female premutation carriers of the fragile X mental retardation 1 (*FMR1*) gene, we present a new method of covariate adjusted correlation analysis to examine the association between messenger RNA (mRNA) level and the number of CGG repeats in the *FMR1* gene. In female carriers, the association between these molecular variables must be adjusted for the activation ratio (ActRatio), a measure that accounts for the protective effect of the one normal X chromosome. However, the exact effects of ActRatio on the molecular measures of interest are uncertain. To account for this uncertainty, we develop a flexible adjustment that accommodates both additive and multiplicative effects of ActRatio nonparametrically. The proposed adjusted correlation uses local conditional correlations, which are local method of moments estimators, to estimate the Pearson correlation between two variables adjusted for a third observable covariate. The local method of moments estimators are averaged to obtain the final covariate adjusted correlation estimator, which is shown to be consistent. We also develop a test to check the nonparametric joint additive and multiplicative adjustment form. Simulation studies illustrate the efficacy of the proposed method. Application to *FMR1* premutation data on 165 female carriers indicates that the association between mRNA and CGG repeat size is stronger after adjusting for ActRatio. Finally, the results provide independent support for a specific joint additive and multiplicative adjustment form for ActRatio previously proposed in the literature.

Fragile X syndrome (FXS) is the most common inherited form of X-linked intellectual disability, with cognitive and behavioral impairments associated with distinct physical features. FXS results from a hyperexpansion of a CGG trinucleotide repeat in the promoter region of the X-linked fragile X mental retardation 1 (*FMR1*) gene (Oberlé et al., 1991; Verkerk et al., 1991). When the number of CGG repeats exceeds 200 (full mutation), methylation and transcriptional silencing of the gene occur (Pieretti et al., 1991), with consequent absence or deficiency of the FMR1 protein (FMRP; Devys et al., 1993). Individuals with smaller expansions in the premutation range of 55 to 200 CGG repeats are called premutation carriers. Many premutation carriers have some physical and behavioral characteristics of FXS (Hagerman, 2002), a subgroup of older adult carriers develop fragile X-associated tremor/ataxia syndrome (FXTAS) later in life (Jacquemont et al., 2004), and about 20% of female carriers develop premature ovarian failure. For a review, see Hagerman and Hagerman (2004). However, the molecular mechanisms underlying the many clinical involvements of premutation carriers, a current area of active research, are distinct from the molecular model that characterizes FXS. More precisely, unlike full-mutation alleles, premutation alleles do not lead to transcriptional silencing of *FMR1*. Indeed, male premutation carriers have been shown to have significantly elevated levels of *FMR1* mRNA compared with normal controls (Tassone, Hagerman, Chamberlain, et al., 2000; Tassone, Hagerman, Taylor, et al., 2000; Kenneson et al., 2001), and their mRNA levels are positively correlated with the number of CGG repeats.

For female premutation carriers, the underlying association between CGG repeat size and mRNA level is more complex. Its analysis needs to take into account (or adjust for) the protective effect of the one normal X chromosome. This protective effect is quantified by the activation ratio (ActRatio), which measures the proportion of normal active X chromosomes. Although it is difficult to account precisely for the effect of ActRatio on the observed mRNA level, Tassone, Hagerman, Chamberlain, et al. (2000) proposed to examine the relationship between CGG repeat size and mRNA level, after adjusting for ActRatio, based on the following adjustment:

$$\stackrel{\sim}{X}=(1-U)X+aU, \qquad (1)$$

where *X̃* is the observed mRNA level, *X* is the unobserved (adjusted) mRNA level due to the carrier chromosome, *U* is the ActRatio, and *a* is the fixed mean level of mRNA in normal alleles. The parametric adjustment (1) decomposes the observed mRNA level into two parts, one from the normal allele and the other from the diseased allele. Although this decomposition is simple and biologically sensible, it does not account for the inherent uncertainty in the precise effect of ActRatio on mRNA expression level (Tassone, Hagerman, Chamberlain, et al., 2000). Hence, we propose a more general, fully nonparametric adjustment that, as in (1), incorporates both additive and multiplicative effects of *U*. More precisely, we consider the following adjustment, of which (1) is a special case:
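As a minimal sketch of how adjustment (1) is used in practice, the snippet below inverts it to recover *X* from the observed mRNA level. It assumes NumPy, 0 ≤ *U* < 1, and the constant *a* = 1.42 used in Section 5. The function name is hypothetical, and this is not code from the original study.

```python
import numpy as np


def invert_parametric_adjustment(x_obs, u, a=1.42):
    """Recover X from adjustment (1), X_obs = (1 - U) X + a U (hypothetical helper).

    Solving (1) for X gives X = (X_obs - a U) / (1 - U), which requires U < 1.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    u = np.asarray(u, dtype=float)
    if np.any(u >= 1.0):
        raise ValueError("adjustment (1) cannot be inverted when U >= 1")
    return (x_obs - a * u) / (1.0 - u)


# Worked example: U = 0.5, X = 3.0, a = 1.42 gives X_obs = 1.5 + 0.71 = 2.21.
x_obs_example = (1.0 - 0.5) * 3.0 + 1.42 * 0.5
x_recovered = invert_parametric_adjustment(x_obs_example, 0.5)
```

Under (1), the adjusted correlation is then the Pearson correlation between the recovered values and the observed CGG repeat sizes. For arrays of observations, this is `np.corrcoef(x_recovered, y_obs)[0, 1]`.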

$$\stackrel{\sim}{X}={\phi}_{1}(U)X+{\phi}_{2}(U), \qquad (2)$$

where *φ*_{1}(·) and *φ*_{2}(·) are unknown smooth functions of *U*. Similarly, the potential effect of *U* on the variability in CGG repeats is modeled as

$$\stackrel{\sim}{Y}={\psi}_{1}(U)Y+{\psi}_{2}(U), \qquad (3)$$

where *ψ*_{1}(·) and *ψ*_{2}(·) are also allowed to be general unknown smooth functions to accommodate uncertainties in the effects of *U*, *Ỹ* is the observed CGG repeat size, and *Y* is its *U*-adjusted form. In (2)–(3), the unobserved variables (*X*, *Y*) are defined as the parts of (*X̃*, *Ỹ*) that are independent of *U*. Our aim is to estimate the correlation between *X* and *Y*, denoted *ρ _{XY}*, adjusted for the general effects of *U*.

The adjustments (2)–(3) are flexible enough to accommodate linear effects of *U*, as in (1), or nonlinear effects. In addition, the effects of *U* may be additive, multiplicative, or a combination of both. The trivial case where *U* has no effect corresponds to *φ*_{1}(·) = *ψ*_{1}(·) = 1 and *φ*_{2}(·) = *ψ*_{2}(·) = 0. Also, because no assumptions other than smoothness are made on the unknown functions in (2)–(3), the proposed estimator of *ρ _{XY}* has an important property: like the Pearson correlation, it is invariant under linear transformations. Thus, the proposed covariate adjusted correlation is unaffected by the scale of the measurements.
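The sketch below shows why adjustment is needed. It simulates data from (2)–(3) and compares the pooled Pearson correlation of (*X̃*, *Ỹ*) with the correlation inside a narrow slice of *U*. It assumes NumPy, *U* ~ Uniform[0.2, 0.9], and the illustrative choices *φ*_{1}(*u*) = 1 + *u*^2, *φ*_{2}(*u*) = 3*u*, *ψ*_{1}(*u*) = e^{*u*}, and *ψ*_{2}(*u*) = sin(4*u*). These functions are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_model(n, rho=0.5, seed=0):
    """Draw (X_obs, Y_obs, U) from adjustments (2)-(3) with hypothetical phi/psi."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.2, 0.9, n)
    # (X, Y): standard bivariate normal with correlation rho, independent of U.
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    x_obs = (1.0 + u**2) * x + 3.0 * u      # phi1(U) X + phi2(U)
    y_obs = np.exp(u) * y + np.sin(4.0 * u)  # psi1(U) Y + psi2(U)
    return x_obs, y_obs, u


x_obs, y_obs, u = simulate_model(100_000)
pooled = np.corrcoef(x_obs, y_obs)[0, 1]       # ignores U entirely
in_slice = np.abs(u - 0.55) < 0.01             # narrow neighborhood of u = 0.55
local = np.corrcoef(x_obs[in_slice], y_obs[in_slice])[0, 1]
```

With these choices, a direct calculation puts the population value of the pooled correlation at about 0.36, well below *ρ _{XY}* = 0.5. By contrast, the within-slice value `local` stays close to *ρ _{XY}*.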

We note that the adjustments (2)–(3) are partly related to the work of Şentürk and Müller (2005b). They proposed an estimator for *ρ _{XY}* (a) under the special case when

Although the proposed adjustment formulation (2)–(3) is motivated by the problem of assessing the association between molecular measures, adjusted for ActRatio, in female premutation carrier data, it is sufficiently general for a variety of other applications. Thus, the proposed adjusted estimator and the associated theory should be of interest beyond the motivating application. For instance, examples of covariate adjustments (2)–(3) include the normalization of albumin turnover and protein catabolic rate through division by body surface area *U* (Kaysen et al., 2002). Such normalization of observed variables is common in biomedical studies and can be viewed as a special case of the adjustments (2)–(3) with *φ*_{2}(·) = *ψ*_{2}(·) = 0 and *φ*_{1}(·) = *ψ*_{1}(·) = *U*. A similar adjustment in environmental health is described in Schisterman et al. (2005), where the exposure level of polychlorinated biphenyl (PCB), a lipophilic compound, is adjusted through division by a function of serum lipid levels (*U*).

We also note that although the additive effects of a covariate can be adjusted for with standard approaches, such as partial correlation or nonparametric partial correlation, these methods cannot adjust for multiplicative (possibly nonlinear) effects. In particular, the partial correlation adjusts only for additive linear effects of a covariate. More specifically, it can be shown that the standard partial correlation between *X̃* and *Ỹ* adjusted for a covariate *U* targets *ρ _{XY}* when

The remainder of the article is organized as follows. We detail the proposed covariate adjusted estimator of *ρ _{XY}* in Section 2. The asymptotic result is also given in Section 2, where the proof is deferred to the Appendix section. In Section 3, we propose a bootstrap test to check the proposed dual additive and multiplicative adjustment structure of (2)–(3). The proposed method is further examined with simulation studies and illustrated with an application to the aforementioned data on female

Estimation of *ρ _{XY}* is based on the observed data of size

$$\stackrel{\sim}{\rho}(u)={\rho}_{XY},$$

if *φ*_{1}(*u*) and *ψ*_{1}(*u*) are assumed to have the same sign. This relationship implies that, within a neighborhood of *u*, the correlation between the observed variables *X̃* and *Ỹ*, denoted *ρ*_{X̃Ỹ}, will target
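The identity $\stackrel{\sim}{\rho}(u)={\rho}_{XY}$ follows in a few lines. The derivation below assumes, as in condition *C1* of the Appendix, that *U* is independent of (*X*, *Y*), and that *φ*_{1}(*u*) and *ψ*_{1}(*u*) are nonzero. Given *U* = *u*, the additive terms *φ*_{2}(*u*) and *ψ*_{2}(*u*) are constants, so they drop out of the conditional covariance and variances:

$$\begin{aligned}
\text{Cov}(\stackrel{\sim}{X},\stackrel{\sim}{Y}\mid U=u) &= {\phi}_{1}(u){\psi}_{1}(u)\,\text{Cov}(X,Y),\\
\text{Var}(\stackrel{\sim}{X}\mid U=u) &= {\phi}_{1}^{2}(u)\,\text{Var}(X), \qquad \text{Var}(\stackrel{\sim}{Y}\mid U=u) = {\psi}_{1}^{2}(u)\,\text{Var}(Y),\\
\stackrel{\sim}{\rho}(u) &= \frac{{\phi}_{1}(u){\psi}_{1}(u)}{|{\phi}_{1}(u)|\,|{\psi}_{1}(u)|}\,{\rho}_{XY} = \text{sign}\{{\phi}_{1}(u){\psi}_{1}(u)\}\,{\rho}_{XY}.
\end{aligned}$$

Hence $\stackrel{\sim}{\rho}(u)={\rho}_{XY}$ exactly when *φ*_{1}(*u*) and *ψ*_{1}(*u*) share the same sign.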

To obtain the targeted local estimates, we bin the observed data with respect to *U*. The range of *U* is divided into *m* equidistant intervals, referred to as bins and denoted by *B*_{1}, …, *B*_{m}. Let

$${r}_{j}=\frac{{M}_{\stackrel{\sim}{X}\stackrel{\sim}{Y},j}-{M}_{\stackrel{\sim}{X},j}{M}_{\stackrel{\sim}{Y},j}}{\sqrt{{M}_{{\stackrel{\sim}{X}}^{2},j}-{M}_{\stackrel{\sim}{X},j}^{2}}\sqrt{{M}_{{\stackrel{\sim}{Y}}^{2},j}-{M}_{\stackrel{\sim}{Y},j}^{2}}},$$

where ${M}_{\stackrel{\sim}{X}\stackrel{\sim}{Y},j}={L}_{j}^{-1}{\sum}_{k=1}^{{L}_{j}}{\stackrel{\sim}{X}}_{jk}^{\prime}{\stackrel{\sim}{Y}}_{jk}^{\prime}$, ${M}_{\stackrel{\sim}{X},j}={L}_{j}^{-1}{\sum}_{k=1}^{{L}_{j}}{\stackrel{\sim}{X}}_{jk}^{\prime}$, ${M}_{\stackrel{\sim}{Y},j}={L}_{j}^{-1}{\sum}_{k=1}^{{L}_{j}}{\stackrel{\sim}{Y}}_{jk}^{\prime}$, ${M}_{{\stackrel{\sim}{X}}^{2},j}={L}_{j}^{-1}{\sum}_{k=1}^{{L}_{j}}({\stackrel{\sim}{X}}_{jk}^{\prime})^{2}$, and ${M}_{{\stackrel{\sim}{Y}}^{2},j}={L}_{j}^{-1}{\sum}_{k=1}^{{L}_{j}}({\stackrel{\sim}{Y}}_{jk}^{\prime})^{2}$. Guidelines for choosing the total number of bins *m* are given in the simulation studies of Section 4. Because *r _{j}* targets

$$r=\sum _{j=1}^{m}\frac{{L}_{j}}{n}{r}_{j}, \qquad (4)$$

which is a weighted average of the bin-specific estimators, with weights proportional to the number of points in each bin. The covariate adjusted estimator *r* is consistent for *ρ _{XY}*, as stated in the following result, whose proof is deferred to the Appendix.
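The estimator (4) can be sketched in a few lines of Python. This is a minimal illustration that assumes NumPy and equidistant bins over the observed range of *U*. It also adds a hypothetical rule that is not from the text: bins with fewer than three points or constant values are dropped, and the weights are renormalized over the retained bins. When no bins are dropped, the weights are exactly *L*_{j}/*n* as in (4).

```python
import numpy as np


def covariate_adjusted_corr(x_obs, y_obs, u, m=20):
    """Binned covariate adjusted correlation r of (4) (illustrative sketch)."""
    x_obs, y_obs, u = (np.asarray(v, dtype=float) for v in (x_obs, y_obs, u))
    edges = np.linspace(u.min(), u.max(), m + 1)
    # Bin labels 0..m-1; the largest U falls into the last bin.
    labels = np.digitize(u, edges[1:-1])
    weighted_sum, used = 0.0, 0
    for j in range(m):
        xj, yj = x_obs[labels == j], y_obs[labels == j]
        L_j = xj.size
        if L_j < 3 or np.ptp(xj) == 0 or np.ptp(yj) == 0:
            continue  # hypothetical guard for sparse or degenerate bins
        # Local method of moments correlation r_j built from the bin moments M_{., j}.
        cov = np.mean(xj * yj) - xj.mean() * yj.mean()
        sx = np.sqrt(np.mean(xj**2) - xj.mean() ** 2)
        sy = np.sqrt(np.mean(yj**2) - yj.mean() ** 2)
        weighted_sum += L_j * cov / (sx * sy)
        used += L_j
    return weighted_sum / used
```

Because each *r*_{j} is a Pearson correlation within a bin, rescaling and shifting *X̃* or *Ỹ* by positive-scale linear maps leaves the estimate unchanged.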

Theorem 1: Under the technical conditions given in the Appendix,

$$r={\rho}_{XY}+{O}_{p}({c}_{n}),$$

*where c _{n}* = {

We emphasize here that the consistency of the covariate adjusted correlation estimator *r* holds under the general additive and multiplicative adjustments (2)–(3). However, as pointed out in the Introduction and proven in the Appendix, the special case of additive linear effects of *U* (i.e., *X̃* = *X* + *a*_{1}*U* + *a*_{2} and *Ỹ* = *Y* + *b*_{1}*U* + *b*_{2}) can be handled with standard partial correlation analysis. The partial correlation estimate is obtained by first regressing (i) *X̃* on *U* and (ii) *Ỹ* on *U* to obtain two sets of residuals, and then computing the Pearson correlation between the two sets of residuals. In contrast to the additive linear case, the partial correlation does not target *ρ _{XY}* under general additive effects of
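The two-step partial correlation described above can be sketched as follows. The sketch assumes NumPy, and the helper name is hypothetical. Least squares on an intercept and *U* gives the residuals, and their Pearson correlation is the partial correlation. The result agrees with the familiar closed form $(r_{\stackrel{\sim}{X}\stackrel{\sim}{Y}}-r_{\stackrel{\sim}{X}U}r_{\stackrel{\sim}{Y}U})/\{(1-r_{\stackrel{\sim}{X}U}^{2})(1-r_{\stackrel{\sim}{Y}U}^{2})\}^{1/2}$.

```python
import numpy as np


def partial_corr(x_obs, y_obs, u):
    """Partial correlation of X_obs and Y_obs adjusted for U via residuals."""
    x_obs, y_obs, u = (np.asarray(v, dtype=float) for v in (x_obs, y_obs, u))
    design = np.column_stack([np.ones_like(u), u])  # intercept and U
    coef_x, *_ = np.linalg.lstsq(design, x_obs, rcond=None)
    coef_y, *_ = np.linalg.lstsq(design, y_obs, rcond=None)
    resid_x = x_obs - design @ coef_x  # step (i): X_obs regressed on U
    resid_y = y_obs - design @ coef_y  # step (ii): Y_obs regressed on U
    return np.corrcoef(resid_x, resid_y)[0, 1]
```

Under the additive linear distortion above, this targets *ρ _{XY}*. Under multiplicative effects of *U*, it generally does not.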

We note that while *r* is based on an equidistant binning procedure, alternative binning approaches can be integrated into the estimation procedure proposed above. One alternative approach that we also explored is based on nearest neighbor binning. As pointed out earlier, for the equidistant binning used, *B _{j}*,
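A nearest neighbor variant keeps the number of points per bin (approximately) constant instead of the bin width. The sketch below assumes NumPy. It sorts the observations by *U* and splits them into *m* consecutive groups of nearly equal size. The grouping rule and the function name are illustrative assumptions, since the exact nearest neighbor scheme is not spelled out here.

```python
import numpy as np


def nn_binned_corr(x_obs, y_obs, u, m=20):
    """Covariate adjusted correlation with nearest neighbor (equal-count) bins."""
    x_obs, y_obs, u = (np.asarray(v, dtype=float) for v in (x_obs, y_obs, u))
    order = np.argsort(u, kind="stable")  # observation indices ordered by U
    n = u.size
    r = 0.0
    for idx in np.array_split(order, m):  # m groups whose sizes differ by at most 1
        r_j = np.corrcoef(x_obs[idx], y_obs[idx])[0, 1]
        r += (idx.size / n) * r_j  # weight L_j / n, as in (4)
    return r
```

Only the bin construction changes. Each bin still contributes a local Pearson correlation weighted by its share of the data.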

Also, upon the suggestion of the editor, we explored a variation on the proposed estimator in (4) by replacing the *r _{j}*'s in (4) with their Fisher's
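One way to realize this variation is to average the Fisher-transformed local correlations $z_j=\tanh^{-1}(r_j)$ and transform the average back with tanh. The sketch below assumes NumPy. The weights *L*_{j}/*n* of (4) are an assumption, because the exact weighting for the variant is not given in this section. Clipping *r*_{j} away from ±1 is a hypothetical safeguard.

```python
import numpy as np


def fisher_z_adjusted_corr(x_obs, y_obs, u, m=20):
    """Variant of (4): weighted average of arctanh(r_j), mapped back by tanh."""
    x_obs, y_obs, u = (np.asarray(v, dtype=float) for v in (x_obs, y_obs, u))
    edges = np.linspace(u.min(), u.max(), m + 1)
    labels = np.digitize(u, edges[1:-1])  # equidistant bins 0..m-1
    z_sum, total = 0.0, 0
    for j in range(m):
        xj, yj = x_obs[labels == j], y_obs[labels == j]
        if xj.size < 3:
            continue  # hypothetical guard: too few points for r_j
        r_j = np.corrcoef(xj, yj)[0, 1]
        r_j = np.clip(r_j, -0.999999, 0.999999)  # keep arctanh finite
        z_sum += xj.size * np.arctanh(r_j)
        total += xj.size
    return float(np.tanh(z_sum / total))
```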

For inference, we use the bootstrap percentile method to form confidence intervals (CIs) based on the proposed covariate adjusted estimator in the analysis of the female *FMR1* premutation data. The estimated nonparametric density of the 1000 standardized bootstrap estimates of *ρ _{XY}* is given in Figure 1 (bottom panel), along with the standard normal density curve. The fitted density appears close to the standard normal density, indicating that the percentile bootstrap approximation is reasonable. The coverage of the proposed bootstrap percentile CIs is examined through simulations reported in Section 4.3.
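The percentile interval can be sketched as below. The sketch assumes NumPy and nonparametric resampling of the observed triples (*X̃*, *Ỹ*, *U*) with replacement. The resampling scheme, the sparse-bin guard, and the function names are illustrative assumptions rather than the authors' code. The interval endpoints are the α/2 and 1 − α/2 quantiles of the bootstrap estimates.

```python
import numpy as np


def binned_corr(x_obs, y_obs, u, m=20):
    """Equidistant-bin estimator (4); bins too sparse for r_j are skipped."""
    edges = np.linspace(u.min(), u.max(), m + 1)
    labels = np.digitize(u, edges[1:-1])
    num, den = 0.0, 0
    for j in range(m):
        xj, yj = x_obs[labels == j], y_obs[labels == j]
        if xj.size < 3 or np.ptp(xj) == 0 or np.ptp(yj) == 0:
            continue  # hypothetical guard: resampling can create degenerate bins
        num += xj.size * np.corrcoef(xj, yj)[0, 1]
        den += xj.size
    return num / den


def percentile_bootstrap_ci(x_obs, y_obs, u, m=20, n_boot=1000, level=0.95, seed=0):
    """Point estimate and bootstrap percentile CI for the adjusted correlation."""
    x_obs, y_obs, u = (np.asarray(v, dtype=float) for v in (x_obs, y_obs, u))
    rng = np.random.default_rng(seed)
    n = u.size
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample (X_obs, Y_obs, U) triples
        boot[b] = binned_corr(x_obs[idx], y_obs[idx], u[idx], m)
    alpha = 1.0 - level
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return binned_corr(x_obs, y_obs, u, m), (lower, upper)
```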

Figure 1. (Top panel) Scatter plot of the local correlation estimates *r*_{j} versus ${\text{U}}_{j}^{M}$ for *j* = 1, …, 20 bins, with approximately eight points per bin. A local linear smooth overlays the scatterplot with an automatically selected bandwidth of *h* = 0.1. (Bottom …

An important practical issue with the application of the proposed estimator is the adequacy of the assumed adjustment forms (2)–(3). Although these assumed dual additive and multiplicative adjustment forms are fairly general compared to the additive linear restriction of other methods like partial correlation, it is still of interest to check the adequacy of these forms. We address this issue next by developing a bootstrap test to check this assumption.

The dual additive and multiplicative adjustment form of (2)–(3) implies that the local correlation *ρ̃*(*u*) = *ρ _{XY}* is free of

$${R}_{n}=\frac{1}{m}\sum _{\ell =1}^{m}\left|\widehat{\rho}\left({U}_{\ell}^{M};{h}_{T}\right)\right|,$$

where $\widehat{\rho}({U}_{\ell}^{M};{h}_{T})={\sum}_{j=1}^{m}{\text{r}}_{j}^{C}{w}_{j}({U}_{\ell}^{M},{h}_{T})$ is the linear smooth fitted to the centered scatterplot using the bandwidth *h _{T}* and weights ${w}_{j}({U}_{\ell}^{M},{h}_{T})$, evaluated at ${U}_{\ell}^{M}$.
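The statistic *R _{n}* can be sketched as follows. The sketch assumes NumPy, a Gaussian kernel for the local linear weights *w*_{j}(·, *h*), and centering of the local correlations at their mean (*r*_{j}^{C} = *r*_{j} minus the mean of the *r*_{j}). The kernel and the centering constant are assumptions, since neither is specified in this section. Under (2)–(3), the centered local correlations fluctuate around zero, so large values of *R _{n}* point to a non-constant *ρ̃*(*u*).

```python
import numpy as np


def local_linear_weights(u0, u_mid, h):
    """Weights w_j(u0, h) of a local linear smoother (Gaussian kernel assumed)."""
    d = u_mid - u0
    k = np.exp(-0.5 * (d / h) ** 2)
    s1, s2 = np.sum(k * d), np.sum(k * d**2)
    w = k * (s2 - d * s1)  # local linear equivalent-kernel form
    return w / np.sum(w)


def rn_statistic(r_bins, u_mid, h):
    """R_n: mean absolute value of the smoothed centered local correlations."""
    r_bins, u_mid = np.asarray(r_bins, dtype=float), np.asarray(u_mid, dtype=float)
    r_centered = r_bins - r_bins.mean()  # assumption: center at the mean r_j
    fitted = np.array([local_linear_weights(u0, u_mid, h) @ r_centered for u0 in u_mid])
    return float(np.mean(np.abs(fitted)))
```

Here `u_mid` plays the role of the bin points ${U}_{j}^{M}$, and `r_bins` holds the *r*_{j}.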

An automatic data-based choice of the bandwidth parameter *h _{T}* that is fast to implement and that leads to good results, adopted from Rice (1984), is

$${h}_{T}=\arg\underset{h}{\min}\{T(h)\}=\arg\underset{h}{\min}\left\{\frac{(1/m)\,\text{RSS}(h)}{1-2\,\text{tr}({\mathbf{W}}_{h})/m}\right\}, \qquad (5)$$

where **W**_{h} is an
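Criterion (5) can be sketched as a grid search over candidate bandwidths supplied by the user. The code below assumes NumPy and takes **W**_{h} to be the *m* × *m* local linear smoother matrix with a Gaussian kernel; the kernel is an assumption, since it is not stated here. It skips bandwidths for which 1 − 2tr(**W**_{h})/*m* ≤ 0, because *T*(*h*) is not meaningful there.

```python
import numpy as np


def smoother_matrix(u_mid, h):
    """m x m local linear smoother matrix W_h; row i gives the weights at u_mid[i]."""
    d = u_mid[None, :] - u_mid[:, None]  # d[i, j] = u_j - u_i
    k = np.exp(-0.5 * (d / h) ** 2)      # Gaussian kernel (assumed)
    s1 = np.sum(k * d, axis=1, keepdims=True)
    s2 = np.sum(k * d**2, axis=1, keepdims=True)
    w = k * (s2 - d * s1)
    return w / np.sum(w, axis=1, keepdims=True)


def rice_bandwidth(u_mid, r_bins, h_grid):
    """Return the h in h_grid minimizing Rice's T(h) = (RSS/m) / (1 - 2 tr(W_h)/m)."""
    u_mid, r_bins = np.asarray(u_mid, dtype=float), np.asarray(r_bins, dtype=float)
    m = u_mid.size
    best_h, best_t = None, np.inf
    for h in h_grid:
        W = smoother_matrix(u_mid, h)
        denom = 1.0 - 2.0 * np.trace(W) / m
        if denom <= 0:
            continue  # smoother too rough for T(h) to be defined
        rss = np.sum((r_bins - W @ r_bins) ** 2)
        t = (rss / m) / denom
        if t < best_t:
            best_h, best_t = h, t
    return best_h
```

A grid whose smallest value is not much below the spacing of the bin points avoids near-interpolating smoothers. For such smoothers, tr(**W**_{h}) approaches *m* and the criterion breaks down.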

We approximate the sampling distribution of *R _{n}* by the wild bootstrap, because the local estimators
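A wild bootstrap for *R _{n}* can be sketched as below. This is a minimal version under stated assumptions, and not necessarily the authors' exact scheme, which this section does not spell out:

- residuals are taken from the local linear fit to the local correlations;
- bootstrap local correlations are built under the null as the mean local correlation plus the residuals multiplied by independent Rademacher signs (±1 with probability 1/2);
- *R _{n}* is recomputed with the same bandwidth.

NumPy, the Gaussian kernel, and all names are assumptions.

```python
import numpy as np


def _smoother(u_mid, h):
    """Local linear smoother matrix with a Gaussian kernel (assumed)."""
    d = u_mid[None, :] - u_mid[:, None]
    k = np.exp(-0.5 * (d / h) ** 2)
    s1 = np.sum(k * d, axis=1, keepdims=True)
    s2 = np.sum(k * d**2, axis=1, keepdims=True)
    w = k * (s2 - d * s1)
    return w / np.sum(w, axis=1, keepdims=True)


def _rn(W, r_bins):
    """R_n: mean |smoothed centered local correlation| (centered at the mean)."""
    return float(np.mean(np.abs(W @ (r_bins - r_bins.mean()))))


def wild_bootstrap_pvalue(r_bins, u_mid, h, n_boot=1000, seed=0):
    """Wild bootstrap p-value for R_n under the constant-correlation null (sketch)."""
    r_bins, u_mid = np.asarray(r_bins, dtype=float), np.asarray(u_mid, dtype=float)
    rng = np.random.default_rng(seed)
    W = _smoother(u_mid, h)
    rn_obs = _rn(W, r_bins)
    resid = r_bins - W @ r_bins  # residuals from the local linear fit
    null_level = r_bins.mean()   # constant local correlation under H0
    rn_boot = np.empty(n_boot)
    for b in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=r_bins.size)  # Rademacher multipliers
        rn_boot[b] = _rn(W, null_level + resid * signs)
    return float(np.mean(rn_boot >= rn_obs))
```

Multiplying each residual by a random sign keeps its magnitude, which lets the local correlations have unequal variances across bins, while imposing the constant-correlation null.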

In this section, we summarize the simulation studies conducted to examine (i) the finite-sample performance of the proposed covariate adjusted correlation estimator and its relative performance in comparison to no adjustment, parametric adjustment of Tassone, Hagerman, Chamberlain, et al. (2000), partial correlation and nonparametric partial correlation, (ii) the sensitivity of the proposed estimator to the choice of the number of bins *m*, (iii) the power of the proposed bootstrap test for checking the dual additive and multiplicative adjustment forms and the coverage of the proposed bootstrap percentile confidence interval (CI), (iv) the performance of the two binning procedures (equidistant and nearest neighbor) and their robustness to the distribution of *U*, and (v) the performance of the alternative estimator proposed via Fisher's *z* transformations.

The simulation setup was designed to reflect the observed *FMR1* premutation data: the means and variation of (*X̃*, *Ỹ*, *U*) were chosen to be similar to those of ($\stackrel{\sim}{\text{mRNA}}$, $\stackrel{\sim}{\text{CGG}}$, ActRatio). Also, the correlation *ρ*_{X̃,Ỹ} = 0.32 was chosen to approximately match the observed correlation ${r}_{\stackrel{\sim}{\text{mRNA}},\stackrel{\sim}{\text{CGG}}}=0.29$. The covariate

The first distortion considered is (a) the general case of nonparametric additive and multiplicative effects under which only the proposed covariate adjusted estimator targets the underlying correlation coefficient. Table 1 reports the estimated absolute bias, variance, and MSE of the correlation estimators based on 1000 Monte Carlo data sets for each sample size. As evident from the results in Table 1, only the bias of the proposed covariate adjusted correlation decreases with increasing sample size. As expected, the biases of the other methods remain substantial across the different sample sizes, because they do not target *ρ _{XY}*.

Table 1. Estimated absolute bias, variance, and MSE of the estimators from proposed covariate adjusted correlation, no adjustment, parametric adjustment (1) of Tassone, Hagerman, Chamberlain, et al. (2000), partial correlation, and nonparametric partial correlations …

In the second distortion setup (b), parametric additive and multiplicative distortion (*Ỹ* = *Y*, *X̃* = (1 − *U*)*X* + 1.42*U*), the parametric adjustment (1) of Tassone et al. and the proposed covariate adjusted estimator are the two methods that target the underlying correlation. In the third setup (c), parametric additive distortion (*Ỹ* = *Y* + 5*U*, *X̃* = *X* + 10*U*), partial correlation, nonparametric partial correlation, and the proposed method target the correct correlation. The results for (b) and (c) are reported in Tables 2 and 3, respectively. As can be seen from Table 2, partial and nonparametric partial correlation perform equally poorly under model (b), and, as expected, their biases do not decrease with increasing sample size. For the parametric additive distortion (c), the incorrect adjustment (1) of Tassone et al. and the unadjusted correlation have biases that do not decrease with increasing sample size (Table 3). By contrast, the biases of the partial correlation, the nonparametric partial correlation, and the proposed covariate adjusted correlation decrease with increasing *n*. As expected, the simpler methods (the parametric adjustment and partial correlation) are more efficient than the proposed method under the null models (b) and (c) for the small sample size of *n* = 150. However, this difference seems to diminish quickly as the sample size increases to *n* = 300 and 600.

Table 2. Estimated absolute bias, variance, and MSE of the estimators of the correlation under the null case of parametric adjustment given in (1) from proposed covariate adjusted correlation, no adjustment, parametric adjustment (1) of Tassone, Hagerman, Chamberlain, …

Table 3. Estimated absolute bias, variance, and MSE of the estimators of the correlation under the null case of additive parametric adjustment from proposed covariate adjusted correlation, no adjustment, parametric adjustment (1) of Tassone, Hagerman, Chamberlain, …

In the simulation studies we also examined the effect of the total number of bins, *m*, on the proposed estimator. Similar to the results of Şentürk and Müller (2005b), where the corresponding estimator was also obtained through binning, the estimates were found to be robust to the choice of *m*. The correlation estimates and MSEs were very similar for *m* between 15 and 30 for *n* = 150, between 20 and 40 for *n* = 300, and between 25 and 50 for *n* = 600. For example, under simulation setup (a) of Section 4.1, for *n* = 300 with *m* ∈ {20, 30, 40}, the means of the covariate adjusted correlation estimates for *ρ _{XY}* = 0.467 were (0.449, 0.447, 0.439) and the MSEs were (0.0027, 0.0030, 0.0034), respectively. Although our experience with the proposed estimator indicates that the final estimate is fairly robust to a reasonably wide range of

Next, to examine the power of the proposed bootstrap test for checking the dual additive and multiplicative form of (2)–(3), we considered two cases of deviations (alternatives) from this assumption (null case). The simulation model (a) of Section 4.1 described above is used for the null case. In the first alternative case, *Ỹ* and *X̃* deviate from the additive and multiplicative forms (2)–(3) through:

$$\stackrel{\sim}{X}={N}_{0}-{N}_{0}{I}_{\{\theta >0\}}+cos\{X(0.4+\theta /130)(U/1.3+3.2)\}{I}_{\{\theta >0\}}$$

$$\stackrel{\sim}{Y}={M}_{0}-{M}_{0}{I}_{\{\theta >0\}}+cos\{Y(0.4+\theta /130)(U/1.3+3.2)\}{I}_{\{\theta >0\}},$$

where *N*_{0} = *φ*_{1}(*U*)*X* + *φ*_{2}(*U*), *M*_{0} = *ψ*_{1}(*U*)*Y* + *ψ*_{2}(*U*), *I*_{E} is the indicator function of the event *E*, and *θ* = 0, 1, …, 8. The functions *φ*_{1}(*U*), *φ*_{2}(*U*), *ψ*_{1}(*U*), and *ψ*_{2}(*U*) are as defined above. The null hypothesis (i.e., assumption (2)–(3) holds) corresponds to *θ* = 0, and *θ* = 1, …, 8 correspond to increasingly strong alternatives. These alternatives, as well as the null, are displayed in Figure 2 (top left plot), which shows the conditional correlation functions *ρ̃*(*u*). When the additive and multiplicative forms are satisfied (*θ* = 0), *ρ̃*(*u*) is constant (see Section 3). The second set of alternatives explored in this simulation study is shown in the top right plot of Figure 2. These conditional correlation functions correspond to the following alternative deviations:

Figure 2. (Top panel) Plots of *ρ̃*(*u*) from the two cases of alternatives to the proposed additive and multiplicative distortion form. The null hypothesis of additive and multiplicative forms corresponds to *θ* = 0, i.e., the (conditional) correlation …

$$\stackrel{\sim}{X}={N}_{0}-{N}_{0}{I}_{\{\theta >0\}}+\cos\{X(0.38+\theta /130)(2U-1.1)\}{I}_{\{\theta >0\}},$$

$$\stackrel{\sim}{Y}={M}_{0}-{M}_{0}{I}_{\{\theta >0\}}+cos\{Y(0.38+\theta /130)(2U-1.1)\}{I}_{\{\theta >0\}},$$

for *θ* = 0,…, 8. Similarly, the null hypothesis corresponds to the case of *θ* = 0.

The lower panel of Figure 2 shows the power of the bootstrap test proposed in Section 3 for checking the adequacy of the dual additive and multiplicative forms. Three sets of power curves are displayed, corresponding to sample sizes *n* = 150, 300, and 600. The two curves of the same line type are for the test at significance levels 0.05 (bottom curve) and 0.10 (top curve). Power estimates are based on 1000 Monte Carlo runs. In the first alternative deviation case, the observed type I errors at *θ* = 0 for these significance levels are (0.015, 0.025) for *n* = 150 and (0.04, 0.08) for *n* = 600. In the second alternative deviation case, the observed levels are (0.01, 0.04) for *n* = 150 and (0.04, 0.09) for *n* = 600. As expected, the levels of the bootstrap test move closer to the nominal values as the sample size increases. The power increases with increasing deviation from the null case *θ* = 0 and with increasing sample size.

We also examined the estimated coverage levels of the proposed bootstrap percentile CIs under the simulation setting (a) of Section 4.1. Briefly, 1000 data sets were simulated at two sample sizes of *n* = 165 and *n* = 300. For each data set, 1000 bootstrap samples were generated and the estimated coverage values of the CIs corresponding to levels of (0.80, 0.90, and 0.95) are (0.75, 0.88, 0.93) for *n* = 165 and (0.80, 0.89, 0.95) for *n* = 300.

We also ran a simulation study to compare the proposed equidistant binning estimator with one obtained via nearest neighbor binning, and to evaluate the performance of the two binning procedures under different *U* distributions. Equidistant binning keeps the bin width constant, whereas nearest neighbor binning keeps the number of points per bin constant (see Section 2). We compare the two binning procedures for three different distributions of *U*. In model (a) of Section 4.1, *U* was sampled from the Uniform[0.2, 0.9] distribution; we additionally consider cases where *U* is sampled from *N*(0.55, 0.04) and from *χ*^{2}(1)/6 + 0.4. The three distributions are chosen to have approximately the same first two moments. The results are summarized in Table 4. They suggest that the proposed equidistant binning is quite robust to the distribution of *U* and that nearest neighbor binning does not improve on it.
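The claim that the three distributions of *U* have approximately the same first two moments can be checked directly. Here *N*(0.55, 0.04) is read as having variance 0.04, which is an assumption about the notation. The *χ*^{2}(1) moments (mean 1, variance 2) are standard.

$$\begin{aligned}
U\sim\text{Uniform}[0.2,0.9]: &\quad E(U)=\frac{0.2+0.9}{2}=0.55, &&\text{Var}(U)=\frac{(0.9-0.2)^{2}}{12}\approx 0.041,\\
U\sim N(0.55,\,0.04): &\quad E(U)=0.55, &&\text{Var}(U)=0.04,\\
U\sim {\chi}^{2}(1)/6+0.4: &\quad E(U)=\frac{1}{6}+0.4\approx 0.567, &&\text{Var}(U)=\frac{2}{36}\approx 0.056.
\end{aligned}$$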

We implemented the variation of the proposed estimator by replacing *r _{j}* by its Fisher-

The molecular measurements $\stackrel{\sim}{\text{CGG}}$ and $\stackrel{\sim}{\text{mRNA}}$, and the activation ratio *U* ≡ ActRatio, were obtained from experiments at the University of California at Davis on 165 female premutation carriers. Our main interest here is to target *ρ _{XY}* ≡ *ρ*_{mRNA,CGG}.

Matrix plot of the observed variables $\stackrel{\sim}{\text{CGG}}$, $\stackrel{\sim}{\text{mRNA}}$, and ActRatio for *n* = 165 female premutation carriers.

Prior to estimating *ρ*_{mRNA,CGG}, we assess the adequacy of the assumed dual additive and multiplicative forms (2)–(3) for the data, as described in Section 3. The local correlation estimates from each bin are given in Figure 1 (top panel), along with a local linear smooth using the automatic bandwidth choice *h* = 0.1 determined by (5). A *p*-value of 0.56 was obtained from 1000 bootstrap replications of *R _{n}*; thus, the assumed adjustment forms (2)–(3) are not rejected. Graphically, this can also be seen in Figure 1, where the local linear smooth fitted to the scatterplot of the local correlations is approximately constant.

In our analysis, we compare the proposed covariate (ActRatio) adjusted estimate of the correlation *ρ*_{mRNA,CGG} with four alternatives: the estimate obtained without adjustment, the estimate with adjustment (1) on $\stackrel{\sim}{\text{mRNA}}$ previously proposed by Tassone, Hagerman, Chamberlain, et al. (2000), partial correlation, and nonparametric partial correlation. The estimate without adjustment is the observed Pearson correlation between $\stackrel{\sim}{\text{mRNA}}$ and $\stackrel{\sim}{\text{CGG}}$ (*X̃* and *Ỹ*). The proposed estimate is obtained using a total of 20 bins. We note that the covariate adjusted correlation estimate was quite robust to the choice of the number of bins; for example, the estimates were very similar for *m* = 17 to *m* = 25. The estimates and approximate 95% CIs for *ρ _{XY}* from these five methods are provided in Table 6. For the unadjusted Pearson correlation, the assumed adjustment (1), partial correlation, and nonparametric partial correlation, approximate CIs can be obtained using Fisher's
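For reference, the standard Fisher *z* interval for a sample correlation *r* from *n* observations is $\tanh\{\tanh^{-1}(r)\pm z_{1-\alpha/2}/\sqrt{n-3}\}$. Below is a minimal standard-library sketch; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_ci(r, n, level=0.95):
    """Approximate CI for a correlation via Fisher's z = atanh(r), SE = 1/sqrt(n - 3)."""
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2.0) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

For example, `fisher_z_ci(r, 165)` gives an approximate 95% interval for a correlation estimated from *n* = 165 carriers. The interval is symmetric on the *z* scale but not, in general, on the correlation scale.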

Table 6. Estimates and approximate 95% CIs for *ρ*_{mRNA,CGG} adjusted for ActRatio in *n* = 165 female premutation carriers. The first four estimates correspond to unadjusted Pearson correlation and parametric adjustment (1) from Tassone, Hagerman, Chamberlain, …

The correlation between the observed mRNA levels and CGG repeat size in female premutation carriers, unadjusted for the effect of ActRatio, is 0.29 (95% CI: 0.15–0.43; Table 6). Although still significant, this estimate falls substantially below the corresponding estimate for male carriers (correlation ≈ 0.57) reported in the literature. As described in the Introduction, this weaker association in female carriers is attributed partly to the protective effect of the one normal X chromosome, which is absent in male carriers. To account for ActRatio, we applied the parametric adjustment (1) of Tassone, Hagerman, Chamberlain, et al. (2000), specifically $\stackrel{\sim}{\text{mRNA}}$ = (1 − ActRatio)mRNA + *a*·ActRatio. This yields a stronger (adjusted) correlation point estimate of 0.34. We used the constant *a* = 1.42, the empirical mean mRNA level for normal/unaffected individuals from Tassone, Hagerman, Taylor, et al. (2000). The proposed covariate adjusted correlation under the general adjustment forms (2)–(3), which allows nonparametric effects of ActRatio on both observed mRNA and CGG, gives an adjusted correlation point estimate of 0.37 (95% CI: 0.25–0.51). Although the proposed method suggests a slightly higher underlying correlation, this result is quite similar to that from the parametric additive and multiplicative adjustment (1) proposed by Tassone, Hagerman, Chamberlain, et al. (2000). Hence, this application provides independent empirical support for the previously proposed parametric joint additive and multiplicative effect of ActRatio on mRNA, which was derived mainly from biological considerations. Also, the nonparametric partial correlation (0.367, 95% CI: (0.228, 0.491)) is close to the proposed adjusted correlation estimate (0.372, 95% CI: (0.252, 0.520)). We informally interpret this as indicating that the nonlinearity is due to the additive part of the distortion.

Motivated by an adjustment for ActRatio in female fragile X premutation carriers, we proposed a general dual additive and multiplicative adjustment model for the correlation between mRNA and CGG repeat expansion. A key feature of the methodology is that the uncertainty in the precise effects of ActRatio at the molecular level is modeled nonparametrically, thus accommodating linear or nonlinear effects. We proposed a simple covariate adjusted correlation estimator that is easy to compute, showed that it is consistent, and examined its numerical properties in simulation studies. The adjustment forms are fairly general and therefore automatically adapt to special cases such as linear additive or nonlinear additive effects. Nevertheless, we also developed a bootstrap test procedure to check the adequacy of the dual additive and multiplicative forms and assessed its performance. A test for detecting whether the distortion at hand reduces to the parametric or the purely additive case would also be of interest, because a simpler adjustment method could then be employed. However, this remains an open problem requiring further research.

Application of the proposed covariate adjusted correlation to *n* = 165 fragile X premutation female carriers indicates stronger association between *FMR1* mRNA level and CGG repeat expansion compared to unadjusted analysis. Our results provide new insights and additional support for a dual additive and multiplicative parametric adjustment previously proposed in the fragile X premutation literature. The proposed adjustment is also applicable to the multiplicative adjustments (normalizations) used in biomedical research, including adjustments of biomarkers of inflammation by body mass index or body surface area and individual levels of PCB exposure by individual serum lipids.

Extension of the proposed algorithm to accommodate multiple covariates poses challenges. While the adjustment for two covariates (**U** = (*U*_{1}, *U*_{2})) would be a straightforward extension of the proposed algorithm using a two-dimensional binning procedure, as the dimension of **U** increases, one would quickly run into the curse of dimensionality. Because the proposed procedure involves localizing with respect to **U**, when the dimension of **U** increases, the data needed for the localization (binning) would become highly sparse. In these cases, a dimension reduction approach, such as taking a linear combination of the components of **U** vector may be of interest.

We are grateful to the reviewer, associate editor, and the editor for many detailed suggestions that substantially improved the article. Support for this work includes National Institutes of Health grants UL1DE19583, RL1AG032119, and RL1AG032115 (VN, FT, RJH, PJH), National Institute of Child Health and Human Development grant HD036071 (RJH, DVN, FT), National Cancer Institute grant CA-57030 (RJC), and grant UL1 RR024146 from the National Center for Research Resources (NCRR).

We first state the technical conditions that will be used in the proof of consistency. They are: *C1*. The adjusting variable *U* is independent of the variables *X* and *Y*. In addition, the marginal density *f*(*U*) of *U* has compact support, i.e., *a* ≤ *U* ≤ *b* for some constants *a*, *b*, and satisfies inf* _{a≤u≤b} f*(

Considering the definition of the Nadaraya–Watson estimator (Fan and Gijbels, 1996), we note that all five terms in $r_j$ are Nadaraya–Watson estimators. For instance, consider the estimator $\widehat{N}(u)$ of $N(u)$, which satisfies

$$\sup_{a\le u\le b}\left|\widehat{N}(u)-N(u)\right| = O_p(c_n), \tag{6}$$

where $N(u) = E(\tilde{X} \mid U = u)$ and $c_n$ = {…
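For reference, a Nadaraya–Watson estimator of a conditional mean such as $N(u) = E(\tilde{X} \mid U = u)$ takes the standard form below (Fan and Gijbels, 1996), written here with a generic kernel $K$ and bandwidth $h$; this notation is ours and may differ from the article's.

$$\widehat{N}(u) = \frac{\sum_{i=1}^{n} K\{(U_i - u)/h\}\,\tilde{X}_i}{\sum_{i=1}^{n} K\{(U_i - u)/h\}}.$$

Assuming $\text{U}_j^M$ denotes the midpoint of bin $j$, the uniform kernel $K(t) = \tfrac{1}{2}\,\mathbf{1}\{|t| \le 1\}$ with $u = \text{U}_j^M$ and $h$ equal to half the bin width makes $\widehat{N}(u)$ the plain average of the $\tilde{X}_i$ whose $U_i$ fall in that bin, which is one way to read within-bin sample moments as Nadaraya–Watson estimators.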

$$\begin{aligned} r_j &= c/(s_1 s_2) + o_p(c_n) \\ &= \text{Corr}(\tilde{X}, \tilde{Y} \mid U = \text{U}_j^M) + o_p(c_n) = \rho_{XY} + o_p(c_n), \end{aligned}$$

where $c = E(\tilde{X}\tilde{Y} \mid U = \text{U}_j^M) - E(\tilde{X} \mid U = \text{U}_j^M)\,E(\tilde{Y} \mid U = \text{U}_j^M)$, $s_1 = [E(\tilde{X}^2 \mid U = \text{U}_j^M) - \{E(\tilde{X} \mid U = \text{U}_j^M)\}^2]^{1/2}$ and $s_2 = [E(\tilde{Y}^2 \mid U = \text{U}_j^M) - \{E(\tilde{Y} \mid U = \text{U}_j^M)\}^2]^{1/2}$. Since $r$ averages the $r_j$ over the bins and the bounds above hold uniformly in $u$, Theorem 1 follows.
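To make the Nadaraya–Watson reading of $r_j$ concrete, the sketch below assumes a uniform (box) kernel centred at a bin midpoint, under which each of the five conditional moment estimates is a within-bin sample mean. It combines them into $c/(s_1 s_2)$, with $s_2$ built from the moments of $\tilde{Y}$, and compares the result with the ordinary Pearson correlation inside the same bin. The helpers `box_nw` and `local_corr`, the half-width, and the simulated $\psi(U) = 1 + U$, $\phi(U) = U$ are hypothetical.

```python
import numpy as np


def box_nw(values, u, center, half_width):
    # Nadaraya–Watson estimate of E(values | U = center) with a uniform (box) kernel:
    # the sample mean of `values` over observations with |U - center| <= half_width.
    in_window = np.abs(u - center) <= half_width
    return values[in_window].mean()


def local_corr(x_obs, y_obs, u, center, half_width):
    # Five NW estimates, one for each conditional moment appearing in r_j.
    ex = box_nw(x_obs, u, center, half_width)
    ey = box_nw(y_obs, u, center, half_width)
    exy = box_nw(x_obs * y_obs, u, center, half_width)
    exx = box_nw(x_obs ** 2, u, center, half_width)
    eyy = box_nw(y_obs ** 2, u, center, half_width)
    c = exy - ex * ey
    s1 = np.sqrt(exx - ex ** 2)
    s2 = np.sqrt(eyy - ey ** 2)  # s_2 uses the moments of Y~, not X~
    return c / (s1 * s2)


rng = np.random.default_rng(1)
n = 2000
u = rng.uniform(0.0, 1.0, n)
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
x_obs = (1.0 + u) * x + u  # hypothetical psi(U) = 1 + U, phi(U) = U
y_obs = (1.0 + u) * y + u

center, half = 0.5, 0.05
r_j = local_corr(x_obs, y_obs, u, center, half)
in_bin = np.abs(u - center) <= half
r_direct = np.corrcoef(x_obs[in_bin], y_obs[in_bin])[0, 1]
print(r_j, r_direct)  # the two agree up to floating-point rounding
```

The agreement is an algebraic identity (the $1/n$ factors cancel in the ratio), so with a box kernel the moment-based $r_j$ is exactly the within-bin Pearson correlation.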

The consistency of $r$ also holds under moment conditions weaker than those in *C3*, namely with $2 < \lambda < 3$; for the order of $m$ and the convergence rates in that case, see Härdle et al. (1988).

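The partial correlation between $\tilde{Y}$ and $\tilde{X}$ adjusted for $U$ is equivalent to the correlation between the variables $e_{\tilde{Y}U}$ and $e_{\tilde{X}U}$, the residuals from the linear least squares regressions of $\tilde{Y}$ and $\tilde{X}$ on $U$.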
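A small numerical check of this equivalence, assuming ordinary least squares on an intercept and $U$: the residual-based partial correlation is compared with the classical closed form in terms of the three pairwise correlations. The helper names and the simulated data are illustrative only.

```python
import numpy as np


def residuals_on(v, u):
    # Least-squares residuals from regressing v on an intercept and u.
    design = np.column_stack([np.ones_like(u), u])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def partial_corr(x_obs, y_obs, u):
    # Correlation between the residuals e_{YU} and e_{XU}.
    return np.corrcoef(residuals_on(y_obs, u), residuals_on(x_obs, u))[0, 1]


def partial_corr_formula(x_obs, y_obs, u):
    # Classical closed form using the pairwise sample correlations.
    r = np.corrcoef(np.vstack([x_obs, y_obs, u]))
    rxy, rxu, ryu = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxu * ryu) / np.sqrt((1.0 - rxu ** 2) * (1.0 - ryu ** 2))


rng = np.random.default_rng(2)
n = 500
u = rng.uniform(0.0, 1.0, n)
x_obs = rng.normal(size=n) + 2.0 * u
y_obs = 0.5 * x_obs + rng.normal(size=n) - u
print(partial_corr(x_obs, y_obs, u), partial_corr_formula(x_obs, y_obs, u))
```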

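The nonparametric partial correlation between $\tilde{Y}$ and $\tilde{X}$ adjusted for $U$ is equivalent to $\rho_{\tilde{e}_{\tilde{Y}U}\,\tilde{e}_{\tilde{X}U}}$, where $\tilde{e}_{\tilde{Y}U}$ and $\tilde{e}_{\tilde{X}U}$ denote the residuals from the nonparametric regressions of $\tilde{Y}$ and $\tilde{X}$ on $U$.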
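The sketch below contrasts linear and nonparametric residual adjustment in a purely additive toy model, which is our assumption and not the article's setting: both observed variables share a nonlinear additive effect $3\sin(2\pi U)$. Residualizing with a Gaussian-kernel Nadaraya–Watson smoother removes that effect, while a straight-line fit leaves part of it behind. The kernel, the bandwidth of 0.02, and the function names are illustrative choices.

```python
import numpy as np


def nw_fit(v, u, bandwidth):
    # Gaussian-kernel Nadaraya–Watson estimate of E(v | U) at every observed U_i.
    z = (u[:, None] - u[None, :]) / bandwidth
    weights = np.exp(-0.5 * z ** 2)
    return weights @ v / weights.sum(axis=1)


def nonparametric_partial_corr(x_obs, y_obs, u, bandwidth):
    # Correlation between residuals from nonparametric regressions on U.
    e_y = y_obs - nw_fit(y_obs, u, bandwidth)
    e_x = x_obs - nw_fit(x_obs, u, bandwidth)
    return np.corrcoef(e_y, e_x)[0, 1]


def linear_partial_corr(x_obs, y_obs, u):
    # Same idea with least-squares residuals from a straight-line fit on U.
    design = np.column_stack([np.ones_like(u), u])

    def resid(v):
        return v - design @ np.linalg.lstsq(design, v, rcond=None)[0]

    return np.corrcoef(resid(y_obs), resid(x_obs))[0, 1]


rng = np.random.default_rng(3)
n, rho = 2000, 0.5
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T
u = rng.uniform(0.0, 1.0, n)
shared = 3.0 * np.sin(2.0 * np.pi * u)  # hypothetical nonlinear additive effect of U
x_obs, y_obs = x + shared, y + shared

r_np = nonparametric_partial_corr(x_obs, y_obs, u, bandwidth=0.02)
r_lin = linear_partial_corr(x_obs, y_obs, u)
print(f"nonparametric: {r_np:.3f}, linear: {r_lin:.3f}")
```

In this toy setting `r_np` should sit near the simulated correlation of 0.5, while `r_lin` is inflated by the part of the sine term that a straight line cannot absorb. Neither version is designed to remove a multiplicative effect of $U$.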

- Davison AC, Hinkley DV. Bootstrap Methods and Their Application. New York: Cambridge University Press; 1997.
- Devys D, Lutz Y, Rouyer N, Bellocq JP, Mandel JL. The FMR-1 protein is cytoplasmic, most abundant in neurons and appears normal in carriers of a fragile X premutation. Nature Genetics. 1993;4:335–340.
- Fan J, Gijbels I. Local Polynomial Modelling and its Applications. London: Chapman and Hall; 1996.
- Hagerman RJ. Physical and behavioral phenotype. In: Hagerman RJ, Hagerman PJ, editors. Fragile X Syndrome: Diagnosis, Treatment and Research. 3rd ed. Baltimore: Johns Hopkins University Press; 2002. pp. 3–109.
- Hagerman PJ, Hagerman RJ. The fragile-X premutation: A maturing perspective. American Journal of Human Genetics. 2004;74:805–816.
- Härdle W, Janssen P, Serfling R. Strong uniform consistency rates for estimators of conditional functionals. Annals of Statistics. 1988;16:1428–1449.
- Hart J. Nonparametric Smoothing and Lack of Fit Tests. New York: Springer-Verlag; 1997.
- Jacquemont S, Hagerman RJ, Leehey MA, Hall DA, Levine RA, Brunberg JA, Zhang L, Jardini T, Gane LW, Harris SW, Herman K, Grigsby J, Greco C, Berry-Kravis E, Tassone F, Hagerman PJ. Penetrance of the fragile X-associated tremor/ataxia syndrome (FXTAS) in a premutation carrier population: Initial results from the California-based study. Journal of the American Medical Association. 2004;291:460–469.
- Kaysen GA, Dubin JA, Müller HG, Mitch WE, Rosales LM, Levin NW, Hemo Study Group. Relationship among inflammation, nutrition and physiologic mechanisms establishing albumin levels in hemodialysis patients. Kidney International. 2002;61:2240–2249.
- Kenneson A, Zhang F, Hagedorn CH, Warren ST. Reduced FMRP and increased FMR1 transcription is proportionally associated with CGG repeat number in intermediate-length and premutation carriers. Human Molecular Genetics. 2001;10:1449–1454.
- Oberlé I, Rousseau F, Heitz D, Kretz C, Devys D, Hanauer A, Boue J, Bertheas MF, Mandel JL. Instability of a 550-base pair DNA segment and abnormal methylation in fragile X syndrome. Science. 1991;252:1097–1102.
- Pieretti M, Zhang F, Fu YH, Warren ST, Oostra BA, Caskey CT, Nelson DL. Absence of expression of the FMR-1 gene in fragile X syndrome. Cell. 1991;66:817–822.
- Rice J. Bandwidth choice for nonparametric regression. Annals of Statistics. 1984;12:1215–1230.
- Schisterman EF, Whitcomb BW, Louis GMB, Louis TA. Lipid adjustment in the analysis of environmental contaminants and human health risks. Environmental Health Perspectives. 2005;113:853–857.
- Şentürk D, Müller HG. Covariate adjusted regression. Biometrika. 2005a;92:59–74.
- Şentürk D, Müller HG. Covariate adjusted correlation analysis via varying coefficient models. Scandinavian Journal of Statistics. 2005b;32:365–383.
- Tassone F, Hagerman RJ, Chamberlain WD, Hagerman PJ. Transcription of the FMR1 gene in individuals with fragile X syndrome. American Journal of Medical Genetics. 2000;97:195–203.
- Tassone F, Hagerman RJ, Taylor AK, Gane LW, Godfrey TE, Hagerman PJ. Elevated levels of FMR1 mRNA in carrier males: A new mechanism of involvement in fragile X syndrome. American Journal of Human Genetics. 2000;66:6–15.
- Verkerk AJ, Pieretti M, Sutcliffe JS, Fu YH, Kuhl DP, Pizzuti A, Reiner O, Richards S, Victoria MF, Zhang F, Eussen BE, van Ommen GJB, Blonden LAJ, Riggins GJ, Chastain JL, Kunst CB, Galjaard H, Caskey CT, Nelson DL, Oostra BA, Warren ST. Identification of a gene (FMR-1) containing a CGG repeat co-incident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell. 1991;65:905–914.
