
Hum Hered. 2007 July; 64(4): 220–233.

Published online 2007 June 12. doi: 10.1159/000103751

PMCID: PMC2880728

*Daniel J. Schaid, PhD, Harwick 775, Section of Biostatistics, Mayo Clinic, 200 First Street, SW, Rochester, MN 55905 (USA), Tel. +1 507 284 0639, Fax +1 507 284 9542, E-Mail schaid@mayo.edu

Received 2006 December 21; Accepted 2007 March 12.

Copyright © 2007 by S. Karger AG, Basel


Genetic linkage analysis of common diseases is complicated by the heterogeneity of genetic and environmental factors that increase disease risk, and possibly interactions among them. Most linkage methods that account for covariates are restricted to sib pairs, with the exception of the conditional logistic regression model [1] implemented in LODPAL in the S.A.G.E. software [2]. Although this model can be applied to arbitrary pedigrees, at times it can be difficult to maximize the likelihood due to model constraints, and it does not account for the dependence among the different types of relative pairs in a pedigree.

To overcome these limitations, we developed a new approach based on score statistics for quasi-likelihoods, implemented as weighted least squares. Our methods can be used to test three different hypotheses: (1) a test for linkage without covariates; (2) a test for linkage with covariates; and (3) a test for effects of covariates on identity by descent sharing (i.e., heterogeneity). Furthermore, our methods are robust because they account for the dependence among different relative pairs within a pedigree. *Results and Conclusion:* Although application of our methods to a prostate cancer linkage study did not find any critical covariates in our data, the results illustrate the utility and interpretation of our methods and suggest, nonetheless, that our methods will be useful for a broad range of genetic linkage heterogeneity analyses.

Genetic linkage has been widely used to screen for susceptibility genes for complex human traits. Despite successes for many diseases that follow simple Mendelian transmission, there have been few successes of linkage analyses for common diseases. Common diseases are influenced by a spectrum of forces that include genetic, environmental, and behavioral risk factors. The definition of the phenotype, age-at-onset of disease, secular trends in diagnosis, and environmental risk factors that influence the phenotype, and possibly interact with the underlying genotype, can dramatically influence the magnitude of the observed linkage signal. For many complex diseases, the linkage signals have been weak to moderate, and some of the most promising findings have been difficult to replicate. A significant difficulty in most linkage studies of common diseases is controlling the many factors that can influence the linkage results. Accounting for these factors can be vital to the discovery and understanding of susceptibility alleles for common diseases. To achieve this goal, we develop new regression methods for relative-pair linkage analyses.

To account for pair-specific covariates in affected sib pair *(ASP)* studies, several authors have extended the popular model-free maximum lod score *(MLS)* method of Risch [3] to allow the identical by descent *(IBD)* sharing probabilities for an *ASP* to depend on pair-specific covariates in non-linear regression models [4, 5, 6, 7, 8, 9, 10]. For a comparison of several methods, see Zhang et al. [11]. It is important to recognize, however, that when the regression effect is not large, the proposed non-linear regression models differ little from the linear regression of the proportion of alleles shared *IBD*, π, on pair-specific covariates [12]. Gauderman and Siegmund showed that this type of linear regression can increase the power to detect linkage when gene-environment interaction is at least moderate, and they give further discussion of the advantages of linear regression in the presence of gene-environment interaction [13]. Peng et al. [14] recently derived score statistics to test for linkage in the presence of gene-covariate interaction, and one of their score statistics for independent *ASP*s can be viewed as a score statistic resulting from a linear regression model. Because linear regression can be easier to fit, it offers a viable alternative to the non-linear approach. One needs to be cautious, however, about the assumption of homoscedasticity. For the regression of π on pair-specific covariates, homoscedasticity means that the variance of π, conditional on the covariates, does not depend on the values of the covariates. This assumption is clearly not valid: π is a proportion, so if its mean depends on covariates, then so does its variance. To account for this, one could use weighted linear regression with weights inversely proportional to the variance.

Based on previous developments of non-linear regression models for the influence of covariates on *IBD* probabilities [see references in 11 and 12], we develop new quasi-likelihood score statistics, which can be viewed as weighted least squares. Hence, they implicitly account for heteroscedasticity. A significant advantage of our approach is that it provides a method to account for correlations among multiple relative pairs from the same pedigree. These methods are developed to test three types of hypotheses: (1) a test for linkage without covariates; (2) a test for linkage with covariates; and (3) a test for covariate effects on *IBD* sharing. Application of these methods to a prostate cancer linkage study illustrates their utility and offers guidance for their interpretation.

Our goal is to derive a flexible method to account for pair-specific covariates in linkage analyses, and to account for correlations induced by using multiple pairs of relatives from the same pedigree. Because an exact likelihood for linkage with covariates can be complex, particularly when considering the dependence among multiple pairs of relatives, we base our methods on a quasi-likelihood *(QL)* score function [15, 16], as others have done for complex pedigree analyses [17].

To develop a *QL* score function, we first consider how the expected allele sharing for a pair of relatives depends on their affection status and their covariates. First, let *s*_{r} denote the number of alleles shared *IBD* for a pair of relatives with relationship of type *r*. Because incomplete linkage information is common, we use the imputed value, ${s}_{r}=2{\stackrel{\circ}{f}}_{r,2}+{\stackrel{\circ}{f}}_{r,1}$, where ${\stackrel{\circ}{f}}_{r,2}$ and ${\stackrel{\circ}{f}}_{r,1}$ are the estimated posterior probabilities of sharing 2 and 1 alleles *IBD*, respectively, conditional on the marker data of the pedigree. Second, let *m*_{r} denote the expected value of *s*_{r}, conditional on affection status and covariates. A key aspect in the development of a *QL* score function is how *m*_{r} depends on the covariates for a pair of relatives. Although this will be fully developed in the next section, we first present the general setup for the *QL* score function. Assuming that *m*_{r} depends on a vector of regression coefficients, denoted β, with vector length *q*, the *QL* score function can be expressed as

$$U=\sum _{i=1}^{n}{D}_{i}^{T}{V}_{i}^{-1}\left({S}_{i}-{M}_{i}\right),$$

(1)

where *i* indexes a pedigree, *n* is the number of pedigrees, the vector *S*_{i} contains the *s*_{r} values for all

To consider how linkage and covariates influence the mean function *m*_{r}, note that when there is no linkage, the expected value of *s*_{r} is ${m}_{r}^{0}=2{f}_{r,2}+{f}_{r,1}$, where *f*_{r,2} and *f*_{r,1} denote the no-linkage prior probabilities of sharing 2 and 1 alleles *IBD*, respectively. The prior probabilities depend only on relationships, not the marker data. For example, for sib pairs, *f*_{r,2} = 1/4 and *f*_{r,1} = 1/2. When there is linkage without covariate effects, the expected value of *s*_{r} depends on the affection status of the members of the pair, the penetrances and allele frequencies of the underlying disease susceptibility *(DS)* locus, and the genetic distance between the marker and the *DS* locus. If there is linkage with covariate effects, the expected value of *s*_{r} will additionally depend on the covariates of the relative pairs. Hence, a key aspect in our development is how we model the influence of linkage and covariates on *m*_{r}.
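The imputed score *s*_{r} and the null mean *m*_{r}^{0} are the same linear function, 2*f*_{2} + *f*_{1}, applied to posterior or prior *IBD* probabilities respectively. A minimal sketch follows; the dictionary lists standard no-linkage values for non-inbred autosomal relationships (only the sib-pair values are stated in the text), and the posterior probabilities in the last line are made up for illustration.

```python
# Standard no-linkage (prior) IBD probabilities (f_{r,2}, f_{r,1}) for
# non-inbred autosomal relationships; only the sib values appear in the text.
PRIOR_IBD = {
    "full sibs": (0.25, 0.5),
    "parent-offspring": (0.0, 1.0),
    "half sibs": (0.0, 0.5),
    "grandparent-grandchild": (0.0, 0.5),
    "avuncular": (0.0, 0.5),
    "first cousins": (0.0, 0.25),
}


def mean_ibd(f2, f1):
    """Expected number of alleles shared IBD, 2*f2 + f1.

    With prior probabilities this gives the null mean m_r^0; with the
    estimated posterior probabilities it gives the imputed score s_r.
    """
    return 2.0 * f2 + f1


null_means = {r: mean_ibd(*f) for r, f in PRIOR_IBD.items()}
# Illustrative (made-up) posterior probabilities for one sib pair
s_imputed = mean_ibd(0.6, 0.3)
```

Under no linkage a sib pair is expected to share one allele *IBD*, a half-sib pair one half, and a first-cousin pair one quarter.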

For now we restrict our focus to only affected relative pairs. Later we expand our model to consider relative pairs with both members unaffected and relative pairs that are discordant for affection status. For affected relative pairs, we show in Appendix 1 that the influence of pair-specific covariates on the expected allele sharing can be modeled as

$${m}_{r}={m}_{r}^{0}+{c}_{r}X\beta ,$$

(2)

where matrix *X* contains the pair-specific covariates (including an intercept of 1's in the first column), and *c*_{r} is a factor that scales the covariates according to the type of relative pair. The factor *c*_{r} depends only on the prior probabilities *f*_{r,1} and *f*_{r,2}. In Appendix 1 we derive two types of scaling factors: one that assumes no dominance variation of the penetrances, and one that approximates the minimax constraint used by Goddard and Olson [1]. Expression (2) illustrates that the expected allele sharing can be approximated by a linear regression model with an offset of *m*^{o}_{r} and a new covariate that is scaled according to the type of relative pair; the scaled covariate is *X** = *c*_{r} *X*. This allows us to include different types of relative pairs in the score statistic by using a scaled covariate in place of the original covariate.
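A minimal sketch of expression (2) for a sib pair and a first-cousin pair that have the same covariate value and share the same coefficients. The scaling factors *c*_{r} derived in Appendix 1 are not reproduced in this section, so the values 1.0 and 0.3 below are hypothetical placeholders, as are the coefficients β.

```python
import numpy as np


def expected_sharing(m0, c, X, beta):
    """Expected IBD sharing m_r = m_r^0 + c_r * X beta (expression 2).

    m0:   (k,) null means for each pair
    c:    (k,) relationship-specific scaling factors c_r
    X:    (k, q) pair covariates with a leading column of 1's
    beta: (q,) regression coefficients
    Returns the expected sharing and the scaled covariates X* = c_r X.
    """
    X_star = c[:, None] * X  # scale each row by its pair's c_r
    return m0 + X_star @ beta, X_star


m0 = np.array([1.0, 0.25])     # sib pair, first-cousin pair
c = np.array([1.0, 0.3])       # hypothetical c_r values
X = np.array([[1.0, 0.5],      # intercept and one covariate
              [1.0, 0.5]])
beta = np.array([0.2, 0.1])    # hypothetical coefficients
m, X_star = expected_sharing(m0, c, X, beta)
```

With one intercept and one slope, the cousin pair's excess sharing (0.075) is smaller than the sib pair's (0.25) only because of *c*_{r}, which is the role of scaling the covariates.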

If we did not scale the covariates, we would need to fit a separate regression model for each type of relative pair, requiring too many parameters. Hence, by scaling the covariates, the regression model requires fewer terms. To understand the role of scaling, consider two types of affected relative pairs from the same pedigree: an affected sib pair and an affected cousin pair. Without covariate effects, we know that the affected sib pair is expected to have greater allele sharing than the affected cousin pair, because sibs are genetically more similar than cousins. If we assumed that the intercept were the same for both types of relative pairs, we would underestimate the true intercept for the sib pair and overestimate the true intercept for the cousin pair. By scaling the covariate according to the type of relative pair, we allow for the fact that the cousin pair is a priori expected to have less allele sharing than the sib pair, yet we require only a single intercept. When covariates are added to the model, scaling the covariates achieves the same goal as scaling the intercept: the expected allele sharing for pairs can decrease as the within-pair prior genetic similarity decreases, even if the covariate values do not change. These methods also apply to the X chromosome; note that if a pair includes at least one male, ${f}_{2}={\stackrel{\circ}{f}}_{2}=0$.

Using the linear model for expected allele sharing (expression 2), we develop *QL* score statistics for three types of hypotheses:

1. a test for linkage without covariates (i.e., only the intercept is of interest), in contrast to the null hypothesis of no linkage;
2. a test for linkage with covariates, in contrast to the null hypothesis of no linkage;
3. a test for covariate effects on *IBD* sharing, in contrast to the null hypothesis that *IBD* sharing does not depend on covariates (i.e., linkage homogeneity), while still allowing linkage under this null hypothesis.

To develop a test for linkage with or without covariates, we consider a general model that includes an intercept, β_{0}. If pair-specific covariates are included, say *p* covariates, there would be a total of *q* = *p* + 1 regression coefficients. The test for linkage, with or without covariates, tests the null hypothesis that all regression coefficients are zero; a special case is when there is only an intercept (*q* = 1). The score vector for this hypothesis can be expressed as

$$U=\sum _{i=1}^{n}{X}_{i}^{*T}{V}_{o,i}^{-1}\left({S}_{i}-{M}_{i}^{0}\right),$$

(3)

where the vector *M*^{o}_{i} contains the null expected values for the different types of affected relative pairs (*m*^{o}_{r} for relatives of type *r*), the *k*_{i} × *q* matrix *X**_{i} contains the scaled covariates, and *V*_{o, i} is the covariance matrix for the vector *S*_{i}, computed under the null hypothesis (see Appendix 1). We later discuss how to calculate *V*_{o, i}. The covariance matrix for the vector *U* is

$${V}_{U}=\sum _{i=1}^{n}{X}_{i}^{*T}{V}_{o,i}^{-1}{X}_{i}^{*}$$

(4)

and the resulting test statistic is

$${T}_{1}={U}^{T}{V}_{U}^{-1}U.$$

(5)

For a large sample, *T*_{1} has a χ^{2}_{q} distribution.
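Equations (3) through (5) accumulate per-pedigree contributions and then form a quadratic form. A sketch in NumPy, assuming each pedigree supplies its imputed scores *S*_{i}, null means *M*^{0}_{i}, scaled covariate matrix *X**_{i}, and null covariance *V*_{0,i} as arrays; the function name and the toy numbers are illustrative only.

```python
import numpy as np


def t1_statistic(S_list, M0_list, Xs_list, V0_list):
    """Score test for linkage, equations (3)-(5).

    For pedigree i: S_i and M0_i are length-k_i vectors, Xs_i is the k_i x q
    matrix of scaled covariates X*_i, and V0_i is the k_i x k_i null
    covariance of S_i (symmetric, so (V0^{-1} X*)' = X*' V0^{-1}).
    """
    q = Xs_list[0].shape[1]
    U = np.zeros(q)
    V_U = np.zeros((q, q))
    for S, M0, Xs, V0 in zip(S_list, M0_list, Xs_list, V0_list):
        Vinv_X = np.linalg.solve(V0, Xs)  # V0^{-1} X*_i
        U += Vinv_X.T @ (S - M0)          # X*_i' V0^{-1} (S_i - M0_i)
        V_U += Xs.T @ Vinv_X              # X*_i' V0^{-1} X*_i
    T1 = float(U @ np.linalg.solve(V_U, U))
    return T1, U, V_U


# Two sib-pair pedigrees, intercept only (q = 1), illustrative numbers
S_list = [np.array([1.5]), np.array([2.0])]
M0_list = [np.array([1.0]), np.array([1.0])]
Xs_list = [np.array([[1.0]]), np.array([[1.0]])]
V0_list = [np.array([[0.5]]), np.array([[0.5]])]
T1, U, V_U = t1_statistic(S_list, M0_list, Xs_list, V0_list)
```

Here *U* = 3, *V*_{U} = 4, and *T*_{1} = 2.25, to be referred to a χ^{2}_{1} distribution.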

To increase the power to detect linkage, a variety of constraints on the *IBD* sharing proportions for *ASP* linkage studies have been proposed, ranging from Holman's ‘triangle constraints’, to a model with no dominance variation [18], to a model with a minimax optimality property [19]. These types of constraints have been evaluated for *ASP* regression models with pair-specific covariates, whereby the regression parameters were constrained such that the fitted *IBD* proportions met the imposed constraints. Some constraints were found to increase power when covariates were included, particularly in the presence of gene-environment interaction [6]. For instance, if an environmental exposure changes the direction of the effect of a *DS* allele and two affected sibs have different exposures, then their *IBD* sharing should be *less* than the null, while other *ASP* s with concordant exposure should share *more* than the null. These forces, going in opposite directions from the null, lead to a large regression coefficient for the environmental covariate, giving an increase in power.

Greenwood and Bull evaluated an ‘average’ constraint, such that the sample mean of the model-fitted *IBD* sharing proportions was constrained, and a ‘simultaneous boundary’ constraint, such that all model-fitted *IBD* proportions were constrained (i.e., for all possible covariate configurations, the model-fitted *IBD* proportions met the assumed constraints) [6]. Although the simultaneous boundary constraint gave the best power in their simulations, it was the more difficult method to implement. Furthermore, it imposes stronger constraints than the average constraint, which may not be desirable when the joint effects of gene(s) and environmental exposures on the risk of disease are not known; others have warned about the potential limitations of constraints in the presence of gene-environment interaction [6, 20, 21]. For these reasons, we used a simple ‘average’ constraint in our *QL* score statistics, such that the sample average of the expected number of alleles shared *IBD*, determined by equation (2), is not less than the sample average of the null values.

To implement the ‘average constraint’, all pair-specific covariates are centered about their mean values (e.g., the *j*-th covariate for pair *i* is $\left({x}_{i,j}-{\overline{x}}_{j}\right)$), these covariates are scaled, and then a score statistic is computed that considers alternative hypotheses with the intercept constrained to be non-negative. To derive this constrained score statistic, we partition the *U* vector of equation (3) into *U*^{T} = (*U*_{0} | *U*^{T}_{1}), where *U*_{0} is the contribution from the intercept, and *U*_{1} is a vector of length *p* for the contribution of the pair-specific covariates. In an analogous fashion, we partition the variance matrix in equation (4) into the corresponding components,

$${V}_{U}=\left(\begin{array}{cc}{V}_{00}& {V}_{01}\\ {V}_{10}& {V}_{11}\end{array}\right).$$

(6)

Then, by ‘zeroing-out’ the *U*_{0} component whenever it is less than zero (i.e., average *IBD* sharing less than the null value), we compute the following constrained score statistic

$${T}_{1,constrained}=\{\begin{array}{l}{U}^{T}{V}^{-1}U\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\text{if}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}{U}_{0}>0;\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}rank=q\hfill \\ {U}^{*T}{V}^{*-1}{U}^{*}\hspace{0.17em}\hspace{0.17em}\text{if}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}{U}_{0}\le 0;\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}\hspace{0.17em}rank=q-1\hfill \end{array}.$$

(7)

Here, *rank* is the rank of *V* or *V**, and *U** is the projection of *U* onto the half-plane {*U*_{0} ≥ 0}, which is accomplished by the regression of *U*_{1} on *U*_{0}. That is, *U** = *U*_{1} – (*V*_{10} / *V*_{00}) *U*_{0}. This regression adjusts for any correlation of the *U*_{1} scores with the *U*_{0} score. The variance matrix for this adjusted score vector is *V** = *V*_{11} – (*V*_{10} *V*_{01})/*V*_{00}. For large samples, *T*_{1, constrained} has a chi-bar-squared distribution, which is a 50:50 mixture of χ^{2}_{q − 1} and χ^{2}_{q}. These results follow from case 2 of Self and Liang [22].

It is important to note that both *T*_{1} and *T*_{1, constrained} can be used to test for linkage without the use of covariates, by simply including only the intercept in the score statistics. This provides a way to test for linkage using all affected relative pairs. Without covariates, the distribution of *T*_{1, constrained} reduces to a 50:50 mixture of χ^{2}_{0} and χ^{2}_{1}, a well known result for testing a one-sided alternative hypothesis for excess *IBD* sharing.
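The sketch below computes *T*_{1, constrained} from the *U* vector and *V*_{U} matrix of equations (3) and (4), with the intercept score in the first position, and evaluates the 50:50 chi-bar-squared tail probability. The helper `chi2_sf` uses the closed-form chi-squared tail for integer degrees of freedom, treating χ^{2}_{0} as a point mass at zero; it stands in for a statistics library and is an assumption of this illustration, not part of the authors' software.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """P(chi2_df >= x) for integer df >= 0 (df = 0 is a point mass at 0)."""
    if df == 0:
        return 1.0 if x <= 0 else 0.0
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = math.exp(-x / 2)
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def t1_constrained(U, V_U):
    """Constrained linkage statistic of equation (7) and its p-value."""
    q = len(U)
    if U[0] > 0:
        T = float(U @ np.linalg.solve(V_U, U))
    else:
        # Adjust the covariate scores for the intercept score (U_0 zeroed out)
        V00, V10, V11 = V_U[0, 0], V_U[1:, 0], V_U[1:, 1:]
        U_star = U[1:] - (V10 / V00) * U[0]
        V_star = V11 - np.outer(V10, V10) / V00
        T = float(U_star @ np.linalg.solve(V_star, U_star)) if q > 1 else 0.0
    # 50:50 mixture of chi2_{q-1} and chi2_q
    p = 0.5 * chi2_sf(T, q - 1) + 0.5 * chi2_sf(T, q)
    return T, p


T, p = t1_constrained(np.array([2.0]), np.array([[4.0]]))  # intercept only
```

With only the intercept, *U*_{0} = 2 and *V*_{00} = 4 give *T* = 1 and a p-value equal to half the χ^{2}_{1} tail, the familiar one-sided result.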

To test for the effects of covariates on *IBD* sharing, we wish to allow for the presence of linkage by allowing a non-zero intercept, and test whether the remaining regression coefficients differ from zero. For now, we do not constrain the intercept. To develop this score statistic, we need a robust estimate of the intercept parameter. For a model with only the intercept, the X*_{i} matrix reduces to a single column of 1's multiplied by the appropriate *c*_{r} scaling factors for the types of relative pairs; denote this scaled vector *C*_{i}. From this, we can solve the following *QL* score function to obtain a robust estimate of β_{0},

$${U}_{{\beta}_{0}}=\sum _{i=1}^{n}{C}_{i}^{T}{V}_{i}^{-1}\left({S}_{i}-{M}_{i}^{o}-{\beta}_{0}{C}_{i}\right)\equiv 0.$$

(8)

The solution of this equation is the weighted least squares estimate

$${\stackrel{\circ}{\beta}}_{0}={\left[\sum _{i=1}^{n}{C}_{i}^{T}{V}_{i}^{-1}{C}_{i}\right]}^{-1}\left[\sum _{i=1}^{n}{C}_{i}^{T}{V}_{i}^{-1}\left({S}_{i}-{M}_{i}^{0}\right)\right].$$

(9)

Note that the variance matrix *V*_{i} should be computed with ${\stackrel{\circ}{\beta}}_{0}$, suggesting that one should iterate between estimating β_{0} and estimating *V*_{i}. However, because it is computationally challenging to estimate *V*_{i}, we compute *V*_{i} under the hypothesis of no linkage (all regression coefficients, including the intercept, are zero). Although this may result in a less efficient estimator of β_{0}, it is still unbiased. We shall derive a *QL* score statistic that is based on the computation of *V*_{i} under the hypothesis of no linkage, and we call this a ‘model-based’ score statistic. Nonetheless, to account for a false assumption of no linkage, we shall also use a robust variance matrix, to derive a ‘robust’ score statistic.
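Equation (9) is a weighted least-squares average of the excess sharing. The sketch below computes it with *V*_{i} replaced by the null covariance *V*_{0,i}, as the text does, and optionally truncates a negative estimate at zero, as is done for the constrained test. The function name and toy inputs are illustrative assumptions.

```python
import numpy as np


def beta0_wls(S_list, M0_list, C_list, V0_list, nonnegative=False):
    """Weighted least-squares intercept of equation (9).

    C_i holds the scaling factors c_r for the pairs in pedigree i;
    V0_i is their null covariance, used in place of V_i.
    """
    num = 0.0
    den = 0.0
    for S, M0, C, V0 in zip(S_list, M0_list, C_list, V0_list):
        Vinv_C = np.linalg.solve(V0, C)   # V0^{-1} C_i (V0 is symmetric)
        num += float(Vinv_C @ (S - M0))   # C_i' V0^{-1} (S_i - M_i^0)
        den += float(Vinv_C @ C)          # C_i' V0^{-1} C_i
    beta0 = num / den
    return max(beta0, 0.0) if nonnegative else beta0


# Two sib-pair pedigrees with c_r = 1 (hypothetical), illustrative scores
b0 = beta0_wls([np.array([1.5]), np.array([2.0])],
               [np.array([1.0]), np.array([1.0])],
               [np.array([1.0]), np.array([1.0])],
               [np.array([[0.5]]), np.array([[0.5]])])
```

With equal weights the estimate is simply the mean excess sharing, (0.5 + 1.0)/2 = 0.75.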

To develop the model-based score statistic, which accounts for the additional variation from estimating β_{0}, we partition the columns of the scaled *X**_{i} matrix into the intercept column, denoted *C*_{i}, and the remaining columns, denoted *X**_{i,[−1]}. Furthermore, assuming linkage, but no covariate effect, the estimated expected allele sharing is ${M}_{i}={M}_{i}^{0}+{\stackrel{\circ}{\beta}}_{0}{C}_{i}$. Then, the score vector can be expressed as

$$U=\left[\sum _{i=1}^{n}{X}_{i,\left[-1\right]}^{*T}{V}_{o,i}^{-1}\left({S}_{i}-{M}_{i}\right)\right].$$

(10)

The model-based variance matrix of *U* is

$${V}_{U,\text{model}}=\sum _{i=1}^{n}{X}_{i,\left[-1\right]}^{*T}{V}_{o,i}^{-1}{X}_{i,\left[-1\right]}^{*}-\left(\sum _{i=1}^{n}{X}_{i,\left[-1\right]}^{*T}{V}_{o,i}^{-1}{C}_{i}\right){\left(\sum _{i=1}^{n}{C}_{i}^{T}{V}_{o,i}^{-1}{C}_{i}\right)}^{-1}\left(\sum _{i=1}^{n}{C}_{i}^{T}{V}_{o,i}^{-1}{X}_{i,\left[-1\right]}^{*}\right)$$

(11)

and the resulting test statistic is

$${T}_{2,\text{model}}={U}^{T}{V}_{U,\text{model}}^{-1}U$$

To compute the robust score statistic, we use a robust variance matrix for *Var*(*U*). To achieve this, let *U*_{i} denote the contribution of the *i*-th pedigree to the *U* score vector, and partition this vector into the part that is for the intercept (*U*_{i,0}), and the part that is for the remaining covariates (*U*_{i,1}),

$$\begin{array}{l}{U}_{i,0}={C}_{i}^{T}{V}_{o,i}^{-1}\left({S}_{i}-{M}_{i}\right),\\ {U}_{i,1}={X}_{i,\left[-1\right]}^{*T}{V}_{o,i}^{-1}\left({S}_{i}-{M}_{i}\right).\end{array}$$

From these partitioned vectors, we can compute a robust variance matrix,

$${V}_{U,\text{robust}}=\left[\left(\sum _{i=1}^{n}{U}_{i,1}{U}_{i,1}^{T}\right)-\left(\sum _{i=1}^{n}{U}_{i,1}{U}_{i,0}^{T}\right){\left(\sum _{i=1}^{n}{U}_{i,0}{U}_{i,0}^{T}\right)}^{-1}\left(\sum _{i=1}^{n}{U}_{i,0}{U}_{i,1}^{T}\right)\right].$$

Using this robust variance, the corresponding test statistic is

$${T}_{2,\text{robust}}={U}^{T}{V}_{U,\text{robust}}^{-1}U$$

(13)

When the covariates do not influence the *IBD* sharing, both *T*_{2,model} and *T*_{2,robust} have an asymptotic χ^{2} distribution with *q*−1 degrees of freedom.
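A compact sketch of the model-based and robust statistics, assuming the first column of each *X**_{i} holds the scaled intercept *C*_{i} and that *V*_{0,i} is used throughout, as in the text. The model-based variance is computed as the information for *X**_{i,[−1]} minus the adjustment for estimating β_{0}, mirroring the structure of the robust matrix. The toy data are four sib pairs with made-up scores, a centered covariate, and a hypothetical *c*_{r} = 1.

```python
import numpy as np


def t2_statistics(S_list, M0_list, Xs_list, V0_list):
    """Model-based and robust score tests for covariate effects.

    Column 0 of each X*_i is the intercept column C_i; the remaining
    columns form X*_{i,[-1]}. V0_i (null covariance) is used for every
    pedigree, as in the text.
    """
    # Weighted least-squares intercept, equation (9) with V_i -> V0_i
    num = den = 0.0
    for S, M0, Xs, V0 in zip(S_list, M0_list, Xs_list, V0_list):
        Vinv_C = np.linalg.solve(V0, Xs[:, 0])
        num += float(Vinv_C @ (S - M0))
        den += float(Vinv_C @ Xs[:, 0])
    beta0 = num / den

    p = Xs_list[0].shape[1] - 1
    A11, A10, A00 = np.zeros((p, p)), np.zeros(p), 0.0
    U0_parts, U1_parts = [], []
    for S, M0, Xs, V0 in zip(S_list, M0_list, Xs_list, V0_list):
        C, X1 = Xs[:, 0], Xs[:, 1:]
        Vinv_resid = np.linalg.solve(V0, S - (M0 + beta0 * C))  # V0^{-1}(S_i - M_i)
        Vinv_C = np.linalg.solve(V0, C)
        U0_parts.append(float(C @ Vinv_resid))  # U_{i,0}
        U1_parts.append(X1.T @ Vinv_resid)      # U_{i,1}
        A11 += X1.T @ np.linalg.solve(V0, X1)
        A10 += X1.T @ Vinv_C
        A00 += float(C @ Vinv_C)
    U0 = np.array(U0_parts)  # shape (n,)
    U1 = np.array(U1_parts)  # shape (n, p)
    U = U1.sum(axis=0)       # score vector of equation (10)

    # Model-based variance: covariate information adjusted for beta_0
    V_model = A11 - np.outer(A10, A10) / A00
    # Robust variance from the per-pedigree score contributions
    B10 = U1.T @ U0
    V_robust = U1.T @ U1 - np.outer(B10, B10) / float(U0 @ U0)

    T_model = float(U @ np.linalg.solve(V_model, U))
    T_robust = float(U @ np.linalg.solve(V_robust, U))
    return T_model, T_robust, beta0


S_list = [np.array([1.5]), np.array([1.5]), np.array([0.5]), np.array([0.5])]
M0_list = [np.array([1.0])] * 4
Xs_list = [np.array([[1.0, x]]) for x in (1.0, 1.0, -1.0, -1.0)]
V0_list = [np.array([[0.5]])] * 4
T_model, T_robust, beta0 = t2_statistics(S_list, M0_list, Xs_list, V0_list)
```

For these made-up scores, β_{0} is estimated as 0, the model-based statistic is 2, and the robust statistic is 4; both would be referred to χ^{2}_{1} because there is one covariate.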

Now, when we constrain the intercept to be non-negative, we still estimate ${\stackrel{\circ}{\beta}}_{0}$ according to expression (9), but if this estimate is less than zero, we set it equal to zero, and then proceed with the above steps to compute both *T*_{2,model} and *T*_{2,robust}. For the constrained test for covariate effects, we use the unmixed χ^{2}_{q −1} distribution, since in this case the null hypothesis does not include the constrained β_{0} parameter.

It is well known that sampling affected relatives for linkage studies of complex traits is much more powerful than sampling unaffected relatives [23]. In practice, however, it is not unusual to sample unaffected relatives in order to infer missing genotypes of parents. These unaffected relatives can be used for two purposes. One is to determine if the inclusion of unaffected relatives increases the strength of evidence for or against linkage, and the other is to evaluate the assumption of Mendelian transmission of the marker alleles in the absence of linkage. When there is linkage, relative pairs with concordant affection status are expected to have *IBD* allele sharing that exceeds the null, but the magnitude of this excess sharing depends on the penetrance of the DS locus. As the penetrance decreases, the allele sharing for concordant *unaffected* relative pairs is expected to approach the null sharing more rapidly than that for *affected* relative pairs. This is because the variability of the genotypes of unaffected subjects increases as the penetrance decreases. On the other hand, the allele sharing for pairs with discordant affection status is expected to be less than the null, with a magnitude of deficient sharing that also depends on the penetrance.

With this background, we extend our expected allele sharing model (for now ignoring covariates) by including a parameter for each of the three types of relative pairs: concordantly affected *(AA)*, concordantly unaffected *(UU)*, and discordant *(AU)*. We achieve this by using indicator variables for each type of pair (e.g., *X*_{AA} = 1 if an *AA* pair, etc.):

$${m}_{r}={m}_{r}^{o}+{c}_{r}\left\{{X}_{AA}{\beta}_{AA}+{X}_{UU}{\beta}_{UU}+{X}_{AU}{\beta}_{AU}\right\}.$$

(14)

Furthermore, based on the above background, it is reasonable to impose the following constraints when testing for linkage,

$$\begin{array}{l}{\beta}_{AA}\ge {\beta}_{UU}\\ {\beta}_{UU}\ge 0\\ {\beta}_{AU}\le 0\end{array}$$

These constraints can be represented as *R*β ≤ 0, where the inequality is coordinate-wise, β′ = (β_{AA}, β_{UU}, β_{AU}), and

$$R=\left(\begin{array}{ccc}-1& 1& 0\\ 0& -1& 0\\ 0& 0& 1\end{array}\right)$$

Although more constraints might be desirable (e.g., | β_{AU} | ≤ β_{AA}), they are not possible, because *R* must be of full rank. To test the null hypothesis of no linkage (β = 0) versus the alternative hypothesis of linkage under the constraint *R*β ≤ 0, we can use the score statistic for one-sided alternatives developed by Silvapulle and Silvapulle [24]. Using the *X** matrix that results from equation (14) to compute *U* and *V*_{U} of equations (3) and (4), the constrained score statistic can be expressed as

$${T}_{3}={\stackrel{~}{\beta}}^{\prime}{V}_{\stackrel{~}{\beta}}^{-1}\stackrel{~}{\beta}-min\left\{{\left(\stackrel{~}{\beta}-b\right)}^{\prime}{V}_{\stackrel{~}{\beta}}^{-1}\left(\stackrel{~}{\beta}-b\right):Rb\le 0\right\},$$

(15)

where $\stackrel{~}{\beta}={V}_{U}^{-1}U$ and ${V}_{\stackrel{~}{\beta}}={V}_{U}^{-1}$. The trick of using $\stackrel{~}{\beta}$ in place of the score vector *U* is required because we restrict the β's (e.g., β_{AA} ≥ β_{UU}). We could not simply restrict the corresponding *U* statistics (e.g., *U*_{AA} ≥ *U*_{UU}), because the magnitudes of the score statistics depend on the number of pairs of each type, as well as the variation of the scores. Rather, by using $\stackrel{~}{\beta}$, we implicitly account for this information. One can view $\stackrel{~}{\beta}$ as a first-order approximate estimate of the unconstrained regression parameters for two reasons. First, $\stackrel{~}{\beta}$ is the first updated estimate from a Newton-Raphson procedure started at the null value β = 0. Second, under alternative hypotheses that are local to the null hypothesis, in the sense of *H*_{A} : β = *n*^{−1/2}δ, the expected value of $\stackrel{~}{\beta}$ is β. To compute expression (15), one only needs *U* and *V*_{U}, which are also used to compute the unconstrained score test *T*_{1}, and a quadratic program to compute min{…}; we used the algorithm of Wollan and Dykstra [25]. Note that if $\stackrel{~}{\beta}$ satisfies the inequality constraints, then min{…} = 0, and the first term is ${\stackrel{~}{\beta}}^{\prime}{V}_{\stackrel{~}{\beta}}^{-1}\stackrel{~}{\beta}={U}^{\prime}{V}_{U}^{-1}U$, in which case the constrained score statistic equals the unconstrained score statistic. If, however, the constraints are not satisfied, then min{…} > 0, in which case the constrained score statistic is less than the unconstrained score statistic. The potential gain in power from the constrained score statistic comes from a reduction in the degrees of freedom. The unconstrained score statistic has a χ^{2}_{3} distribution, whereas the constrained score statistic has a mixture of χ^{2} distributions, denoted ${\overline{\chi}}^{2}$ (chi-bar-squared). This mixture is

$${\overline{\chi}}^{2}=\sum _{k=0}^{3}{w}_{k}\left(R{V}_{\stackrel{~}{\beta}}{R}^{\prime}\right){\chi}_{k}^{2}$$

where the mixture proportions, *w*_{k} (·), are functions of the matrix $R{V}_{\stackrel{~}{\beta}}{R}^{\prime}$, and are given in Appendix 2.
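Expression (15) only requires *U* and *V*_{U}. The sketch below solves the inner minimization by brute-force enumeration of active constraint sets, which is feasible here because *R* is a small invertible 3 × 3 matrix; this is a substitute for, not a reproduction of, the Wollan and Dykstra algorithm the authors used. Because the mixture weights of Appendix 2 are not reproduced in this section, the p-value is instead approximated by simulating *U* ~ N(0, *V*_{U}) under the null hypothesis, which is an assumption of this illustration. The function names are hypothetical.

```python
import itertools

import numpy as np

# R b <= 0 encodes beta_AA >= beta_UU, beta_UU >= 0 and beta_AU <= 0,
# with beta' = (beta_AA, beta_UU, beta_AU).
R = np.array([[-1.0, 1.0, 0.0],
              [0.0, -1.0, 0.0],
              [0.0, 0.0, 1.0]])


def min_constrained_distance(beta_t, W, R):
    """min over {b : R b <= 0} of (beta_t - b)' W (beta_t - b).

    Substituting c = R b turns the constraint into c <= 0. For each subset
    of 'free' coordinates, the others are fixed at 0, the free ones are set
    to their unconstrained minimizer, and the best feasible candidate wins.
    """
    R_inv = np.linalg.inv(R)
    A = R_inv.T @ W @ R_inv   # quadratic term in c
    g = R_inv.T @ W @ beta_t  # linear term in c
    k = len(beta_t)
    best = np.inf
    for free in itertools.product([False, True], repeat=k):
        idx = [j for j in range(k) if free[j]]
        c = np.zeros(k)
        if idx:
            c[idx] = np.linalg.solve(A[np.ix_(idx, idx)], g[idx])
        if np.all(c <= 1e-10):  # feasible candidate
            d = beta_t - R_inv @ c
            best = min(best, float(d @ W @ d))
    return best


def t3_statistic(U, V_U, R=R):
    """Constrained score statistic of expression (15)."""
    beta_t = np.linalg.solve(V_U, U)  # beta-tilde = V_U^{-1} U
    W = V_U                           # inverse of V_beta-tilde = V_U^{-1}
    return float(beta_t @ W @ beta_t) - min_constrained_distance(beta_t, W, R)


def t3_pvalue_mc(t_obs, V_U, n_sim=2000, seed=0):
    """Monte Carlo null p-value, drawing U ~ N(0, V_U)."""
    rng = np.random.default_rng(seed)
    U_sim = rng.multivariate_normal(np.zeros(len(V_U)), V_U, size=n_sim)
    T_sim = np.array([t3_statistic(u, V_U) for u in U_sim])
    return float(np.mean(T_sim >= t_obs - 1e-12))
```

When β̃ already satisfies the constraints, for example *U* = (2, 1, −1) with *V*_{U} = *I*, the statistic equals the unconstrained value *U*′*V*_{U}^{−1}*U* = 6; when β̃ points away from the cone, for example *U* = (−1, 0, 0), it drops to 0.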

As emphasized by Blackwelder and Elston [23], a critical assumption for valid linkage analysis with affected relative pairs is Mendelian transmission of the marker alleles in the absence of linkage. One way to violate this assumption is misspecification of the marker allele frequencies when parents are missing genotypes. This can cause excessive *IBD* sharing even among discordant relative pairs. Other ways to violate the assumption of Mendelian transmission are misspecified pedigree relationships and inbreeding. Following the advice of Blackwelder and Elston to evaluate Mendelian transmission, we use our score statistics to evaluate whether unexpected allele sharing occurs in a dataset by simply comparing the constrained and unconstrained score statistics. Excess *IBD* sharing for *AA* pairs and *UU* pairs, and deficient sharing for *AU* pairs, are all consistent with linkage. In contrast, if any of these sharing patterns is in the opposite direction, then unexpected *IBD* sharing would be suspected. In this latter case, the p value for the unconstrained statistic would be smaller than that for the constrained statistic, suggesting a violation of Mendelian transmission, a misspecified model, or genotype errors.

To include covariates, it is sensible to allow the covariate effects to differ for each of the *AA*, *UU*, and *AU* pair types. This implies that for *p* covariates, a total of 3(*p* + 1) regression parameters would be required. For complex traits, it is unlikely that the magnitude of the covariate effects, relative to their standard errors, would be large enough to compensate for the large number of degrees of freedom for these types of analyses. For this reason, we do not advocate routine use of unaffected subjects to test for linkage with covariates. Nonetheless, our general approach allows such modeling when deemed appropriate.

Under the null hypothesis, the *S*_{i} scores for a pedigree can be correlated. Three illustrative examples follow. First, for a fully informative marker, it can be shown that the scores are uncorrelated for different sib pairs within the same nuclear family, even for pairs that overlap with a sib in common (although these scores are not jointly independent). Second, it can also be shown that the correlation is −1 between the scores for a grandchild paired with each of its grandparents (i.e., when the grandparents are spouses); knowing that a grandchild and one grandparent share 1 allele *IBD* implies that the same grandchild cannot share an allele *IBD* with the other grandparent. Third, positive correlations arise between the score for a pair of first cousins and the score for their parents (i.e., when their parents are full sibs).

Depending on the size of a pedigree, we use either an exact method or a simulation method to compute the covariance matrix *V*_{0, i} for the vector *S*_{i} under the null hypothesis of no linkage, assuming fully informative markers. For these methods, we need to consider the joint distribution of *s*_{i, j} and *s*_{i, k}, the estimated allele sharing for relative pair *j* and relative pair *k*, both pairs from pedigree *i*. For small pedigrees, the exact method enumerates all possible *IBD* states for all possible pairs of relatives, along with the null probabilities of the *IBD* states. The software Merlin [26] computes this information using the options --ibd --matrices. For large pedigrees that cannot be analyzed by Merlin, the covariance matrix can be approximated by simulated ‘gene dropping’. For each founder of a pedigree we assign a pair of integer codes for their two alleles. If there is no inbreeding, the allele codes are unique; for *f* founders, we assign the integers 1, 2, …, 2*f*. These alleles are randomly transmitted throughout the pedigree according to Mendelian segregation (for either an autosome or the X chromosome). Then, for a particular configuration of transmitted alleles, we evaluate the *IBD* status for all possible pairs of relatives, allowing us to create an instance of the *S*_{i} vector under the null hypothesis of no linkage. This process is repeated a large number of times, and the sample covariance matrix of these simulated *S*_{i} vectors is used to estimate *V*_{0, i}.
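The gene-dropping approximation can be sketched in a few lines of Python. The pedigree format (a list of (person, father, mother) tuples with parents listed before children and founders having both parents missing), the function names, and the restriction to autosomes without inbreeding are assumptions of this illustration. The usage example is the grandparent-grandchild case described above, for which the two scores are perfectly negatively correlated.

```python
import random

import numpy as np


def gene_drop_once(pedigree, rng):
    """One autosomal Mendelian gene drop.

    pedigree: list of (person, father, mother) tuples with parents listed
    before their children; founders have father = mother = None.
    """
    alleles = {}
    next_code = 1
    for person, father, mother in pedigree:
        if father is None and mother is None:
            # Founder: two new, unique integer allele codes
            alleles[person] = (next_code, next_code + 1)
            next_code += 2
        else:
            # Each parent transmits one of its two alleles with probability 1/2
            alleles[person] = (rng.choice(alleles[father]),
                               rng.choice(alleles[mother]))
    return alleles


def ibd_count(a, b):
    """Alleles shared IBD by two people; valid only without inbreeding."""
    return len(set(a) & set(b))


def null_ibd_covariance(pedigree, pairs, n_sim=10000, seed=1):
    """Monte Carlo mean and covariance of the null IBD scores for `pairs`."""
    rng = random.Random(seed)
    sims = np.empty((n_sim, len(pairs)))
    for t in range(n_sim):
        alleles = gene_drop_once(pedigree, rng)
        sims[t] = [ibd_count(alleles[x], alleles[y]) for x, y in pairs]
    return sims.mean(axis=0), np.atleast_2d(np.cov(sims, rowvar=False))


# Grandchild K paired with each of its grandparents G1 and G2 (spouses)
pedigree = [("G1", None, None), ("G2", None, None), ("S", None, None),
            ("P", "G1", "G2"), ("K", "P", "S")]
mean, cov = null_ibd_covariance(pedigree, [("K", "G1"), ("K", "G2")], n_sim=2000)
corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
```

The estimated correlation is −1 up to rounding, because the allele K receives from P comes from exactly one of the two grandparents. For a sib pair the same routine recovers the null mean of 1 and variance of 1/2 up to Monte Carlo error.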

The simulation results of Greenwood and Bull [6] showed that models with covariates can improve power to detect linkage in *ASP* studies, particularly when there is gene-covariate interaction and some *ASP*s are discordant for their covariates. Further work by Peng et al. [14] showed that correctly modeling gene-covariate interaction in *ASP* studies can lead to substantially increased power to detect linkage, in contrast to analyses that ignore the covariates. In contrast to our approach that considers how covariates affect the mean *IBD* sharing, Peng et al. derived score tests for linkage in *ASP* data by starting with a penetrance model for the joint effects of a genotype and a covariate on the phenotype. Despite this different starting point, one of their derived score statistics is a special case of our *T*_{1} statistic used to test linkage with covariates. Because they restricted their analyses to *ASP*s, our scaling factor, *c*_{r}, for computing scaled *X* covariates is not required, and *V*_{0, i} is constant, and so it drops out of the score equation in expression (3). The variance of their score vector, *V*_{U}, is the same as ours.

Overall, Peng et al. [14] found that if the gene-covariate interaction is negligible, including the covariates can cost power, roughly equivalent to a 25% loss in sample size, because of the additional degrees of freedom for the covariates. In contrast, there can be a substantial gain in power if the gene-covariate interaction is strong. An advantage of their approach is that it provides insight into how best to code covariates so that the tests are sensitive to gene-covariate interaction. Let *w*_{1} and *w*_{2} denote a covariate for subjects 1 and 2 of an *ASP*. Peng et al. proposed a three-dimensional statistic with a pair-specific covariate vector *X* = (1, *w*_{1} + *w*_{2}, *w*_{1} *w*_{2}), and a similar two-dimensional vector that eliminates the term *w*_{1} *w*_{2}. They found that the two-dimensional score statistic is often more powerful than the three-dimensional one, consistent with conclusions by Gauderman and Siegmund [13]. These findings suggest that simple models for covariate effects, with fewer degrees of freedom, are likely to provide the desired gain in power when the covariate effects are at least of moderate strength, and that the simple coding of a pair-specific sum will likely suffice when there is gene-covariate interaction.
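As a small illustration of the codings described by Peng et al. [14], the Python sketch below builds the pair-specific covariate vectors from the covariate values *w*_{1} and *w*_{2} of the two sibs. The function name and array layout are hypothetical; only the coding itself comes from the text.

```python
import numpy as np


def peng_pair_covariates(w1, w2, include_product=True):
    """Pair-specific covariate vectors X for affected sib pairs.

    Returns rows (1, w1 + w2, w1 * w2) when include_product is True,
    otherwise the two-dimensional coding (1, w1 + w2).
    """
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    columns = [np.ones_like(w1), w1 + w2]   # intercept and pair-specific sum
    if include_product:
        columns.append(w1 * w2)             # product term, dropped in the 2-D version
    return np.column_stack(columns)


# Usage: ages at diagnosis for two sib pairs
X3 = peng_pair_covariates([60, 55], [62, 70])
X2 = peng_pair_covariates([60, 55], [62, 70], include_product=False)
print(X3)
print(X2)
```

The two-dimensional coding spends one fewer degree of freedom than the three-dimensional one, which is the source of the power advantage described above.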

Our software package, *ibdreg*, uses either the S-PLUS or the R statistical computing environment as an interface, along with linked ANSI C code for rapid computations. The calculation of *IBD* probabilities requires an external program; we use Merlin [26] and provide Perl scripts to simplify the required steps. The software can be downloaded from our web site.

Our newly developed methods were applied to a study of linkage for familial prostate cancer conducted at the Mayo Clinic. This study included 159 pedigrees with a total of 429 *ARP*s, genotyped with both microsatellite and single nucleotide polymorphism (SNP) markers, which provided high information content despite the absence of genotypes for the pedigree founders. Details of pedigree sampling and genotyping can be found elsewhere [27, 28]. The study was approved by the Mayo Clinic Institutional Review Board.

The number of affected men per pedigree ranged from 2 to 7: 78 pedigrees (49%) had two affected men, 57 (36%) had three, 17 (11%) had four, and the remaining 7 (4%) had five to seven. The types of relationships are summarized in Table 1 according to prior probabilities of *IBD* sharing and the numbers of *ARP*s. Although there were five genetically different types of *ARP*s, the majority were full sibs, half sibs, or first cousins. Without accounting for pair-specific covariates, the most significant evidence for linkage was on chromosome 20, with a Kong and Cox [29] exponential allele-sharing LOD score of 2.4 [28]. For our new analyses with covariates, we considered three pair-specific covariates: average number of affected men in a pedigree, sum of ages at diagnosis, and sum of Gleason scores. The Gleason score indicates prostate tumor differentiation (ranging from 2 for well-differentiated to 10 for poorly differentiated cellular architecture) and is a measure of disease aggressiveness, with larger values indicating poorer prognosis.

In addition to analyses by our new score statistics, we performed regression analyses with the LODPAL software [2], which approximates the pseudo-likelihood of *IBD* sharing probabilities by a trinomial logistic regression model, as proposed by Goddard and Olson [1] (see details in Appendix A). This model imposes the minimax constraint of Whittemore and Tu [19], which is approximately halfway between a dominant and a recessive model, and requires a regression coefficient for each pair-specific covariate.

The results for simultaneously testing linkage with the covariate age at diagnosis (pair-specific sum) are presented in figure 1 for our new score statistic and for LODPAL. We present results for the approximate ‘minimax’ scaling (equation A17), which were quite close to those obtained from the ‘no dominance’ scaling (equation A10; results not shown). This figure illustrates several key points. First, the results from our score statistic and from LODPAL are generally quite close. However, when they differ, the difference can be large, as for chromosome 5, where LODPAL gives extraordinarily large LOD scores, most likely because of unstable maximization of the LODPAL pseudo-likelihood. Other examples are chromosomes 6 and 20, for which the LODPAL peak appears much too narrow for the number of meioses in these data. We interpret these differences as difficulties with maximization of the LODPAL pseudo-likelihood. Score statistics circumvent this problem because they do not require maximizing a complex function. Similar patterns of agreement and disagreement were found for the other covariates, number of affected men in a pedigree and the pair-sum of Gleason scores (results not shown).

Figure 1. Linkage results for the constrained score statistic versus LODPAL for affected relative pairs, with the pair-specific covariate sum of age at diagnosis.

Given the more striking linkage signal on chromosome 20, we focused on this chromosome to illustrate additional analyses with our score statistics. Figure 2 illustrates the three types of tests: linkage without covariates, linkage with covariates, and the covariate effect on *IBD* sharing. Although linkage with the covariate (pair-sum of age at diagnosis) slightly increased the linkage signal, the signal without the covariate was almost as large. This suggests that the covariate makes only a small contribution to the linkage signal, which is further emphasized by the broken line in figure 2 for the effect of the covariate on *IBD* sharing (which used the robust variance, not the model-based variance).

Figure 2. Linkage results for constrained score statistics with and without a covariate, and for the covariate effect on *IBD* sharing (robust variance), for chromosome 20, using only affected relative pairs. The pair-specific covariate is the sum of age at diagnosis.

The analyses presented in figures 1 and 2 were for only affected relative pairs. To examine the influence of unaffected relatives on linkage, we present in figure 3 the constrained score statistics for each of the *AA*, *AU*, and *UU* pairs, as well as the global score statistic that simultaneously considers all types of pairs, constraining each type of pair to have *IBD* sharing in the direction that favors linkage. The results in figure 3 emphasize that the unaffected subjects contribute little to linkage, and that including them diminishes the linkage signal of the global score test compared to the score test for the subset of *AA* pairs. This is not surprising for a complex trait like prostate cancer, where the unaffected men tend to be much less informative than affected men.

Figure 3. Linkage results for constrained score statistics without covariates using all relatives: separate score statistics for affected pairs (*AA*), for unaffected pairs (*UU*), and for discordant pairs (*AU*), and a global test for all types of pairs.

To evaluate whether there is evidence for unexpected *IBD* sharing, we used only affected men (we have few unaffected men) to compare the constrained versus unconstrained score statistics. The results, illustrated in figure 4, show that for most chromosomes the p values for the unconstrained score statistics were less extreme than those for the constrained statistics. Some notable exceptions were chromosomes 1, 7, and 12, each of which had regions where the p values for the unconstrained statistics were more extreme than those for the constrained statistics. For more detailed diagnostic checks, we examined the pedigree-specific *z*-scores for these regions: $U_{i}/\sqrt{V_{i}}$, where *U*_{i} and

Because common diseases are influenced by both genetic and non-genetic factors, it can be crucial to account for non-genetic covariates when evaluating the strength of evidence for genetic linkage. To achieve this, we developed quasi-likelihood score statistics to test for: (1) linkage without covariates; (2) the simultaneous effects of linkage and covariates, and (3) the effects of covariates on *IBD* sharing. Although others have developed pseudo-likelihood ratio tests for these hypotheses [1, 4, 6], our approach offers several advantages. First, score statistics are fast to compute, allowing quick evaluation of a wide variety of covariates. Second, score statistics avoid the numerical problems of likelihood ratio statistics, which require maximizing a complex function that is likely to have multiple modes and hence can give odd results, particularly with sparse data. Third, unlike some other methods, ours is not restricted to *ASP*s but includes all types of *ARP*s in a single score statistic. Fourth, we provide a mechanism to scale the covariates, which allows different levels of dominance variation. Fifth, we account for the correlation among different pairs from the same pedigree. Although our methods focus on relative pairs, the unit of analysis is the pedigree. This is achieved by creating, for each pedigree, a vector of allele sharing for all possible pairs of relatives of interest. The quasi-likelihood for this vector of pair-wise allele sharing incorporates a model for how pair-wise covariates influence the allele sharing. The quasi-likelihood also uses pedigree-specific covariance matrices for the vectors of pair-wise allele sharing values. These covariance matrices offer a new approach to account for the dependence among pairs originating from the same pedigree, in contrast to methods that treat pairs of relatives as the unit of analysis.
It has been reported that robust sandwich variance estimators in generalized estimating equations can perform poorly in small samples, with inflated Type I error rates for Wald tests and conservative Type I error rates for score tests [30]. In our context, this would occur when there are few pedigrees. Fortunately, a simple adjustment, multiplying the statistic by *J*/(*J* − 1), where *J* is the number of clusters (pedigrees), seems to provide an adequate fix [30]. Alternatively, the model-based variance, calculated under the null hypothesis of no linkage, might be more reliable for small sample sizes.
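A minimal Python sketch of an unconstrained robust score statistic with the *J*/(*J* − 1) adjustment follows. It assumes that each pedigree contributes a total score vector *U*_{i}, that the robust variance takes the common GEE form Σ_{i} *U*_{i}*U*_{i}^{T} (valid under the null, where *E*[*U*_{i}] = 0), and that the adjustment multiplies the statistic, as described above; the function name is hypothetical and the constrained (one-sided) versions are not shown.

```python
import numpy as np


def robust_score_statistic(pedigree_scores, small_sample=True):
    """Unconstrained score statistic U' V^{-1} U with an empirical variance.

    pedigree_scores: J x p array (or a length-J vector when p = 1);
    row i is the pedigree total score U_i.
    """
    U_i = np.asarray(pedigree_scores, dtype=float)
    if U_i.ndim == 1:
        U_i = U_i[:, None]
    J = U_i.shape[0]
    U = U_i.sum(axis=0)                  # total score over pedigrees
    V = U_i.T @ U_i                      # robust variance: sum_i U_i U_i'
    T = float(U @ np.linalg.solve(V, U))
    if small_sample:
        T *= J / (J - 1)                 # small-sample adjustment for J clusters
    return T


print(robust_score_statistic([1.0, -1.0, 2.0, 0.0]))         # adjusted
print(robust_score_statistic([1.0, -1.0, 2.0, 0.0], False))  # unadjusted
```

With only four pedigrees the adjustment inflates the statistic by a factor of 4/3; as the number of pedigrees grows, the factor approaches 1.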

Although likelihood ratio statistics can have greater power than score statistics when there are large departures from the null hypothesis, such departures are unlikely for common complex traits. Furthermore, because the likelihood methods implemented in LODPAL do not account for dependent *ARP*s from the same pedigree, the resulting statistics are not guaranteed to have a known distribution, even for large sample sizes. The joint dependence among *ARP*s can skew the computed allele sharing statistics. To overcome this problem, a variety of ad hoc weighting schemes have been proposed. However, weighting schemes can lead to extremely conservative linkage statistics, particularly for weighted likelihood ratio tests [31]. It might be possible to overcome this limitation by using second-order adjustments to the likelihood ratio test, based on comparisons of the observed information matrix with an empirical estimate [32]. Nonetheless, a significant advantage of our approach is that it validly uses the null covariance of the entire vector of allele sharing for all possible pairs of subjects within a pedigree.

In our implementation of score statistics, we compute both the constrained and unconstrained versions, as a way to evaluate whether the constrained version makes sense. Without covariates, the p values for the constrained statistic should be smaller than those for the unconstrained statistic. Finding the opposite should raise concerns about ‘unexpected’ *IBD* sharing, possibly due to misspecified marker allele frequencies, genotype errors, etc. In early stages of our development, when using both affected and unaffected relatives, we considered using the elements of the score vector, *U*′ = (*U*_{AA}, *U*_{UU}, *U*_{AU}), to examine unexpected allele sharing. For example, finding ${U}_{AA}/\sqrt{{V}_{AA}}\ll 0$ (where *V*_{AA} is the diagonal element of the variance matrix of *U* corresponding to *U*_{AA}) would suggest unexpected sharing. However, because *U*_{AA}, *U*_{UU}, and *U*_{AU} are correlated, this approach was not as informative as simply comparing the results from the constrained and unconstrained analyses.
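For a single score, the comparison can be made concrete. The Python sketch below assumes a scalar standardized score *z* = *U*/√*V* that is approximately standard normal under the null; the unconstrained statistic *z*² is referred to a χ²_{1} distribution, and the constrained statistic max(*z*, 0)² to the 50:50 mixture of χ²_{0} and χ²_{1}. The function name is hypothetical, and the multivariate case needs the mixture weights given at the end of the Appendix rather than this shortcut.

```python
import math


def constrained_and_unconstrained_p(z):
    """Return (constrained p, unconstrained p) for a standardized score z."""
    unconstrained = math.erfc(abs(z) / math.sqrt(2.0))  # P(chi2_1 >= z^2), two-sided
    if z > 0:
        # 0.5 * P(chi2_1 >= z^2), i.e. the one-sided normal tail P(Z >= z)
        constrained = 0.5 * math.erfc(z / math.sqrt(2.0))
    else:
        # sharing in the 'wrong' direction: the constrained statistic is 0
        constrained = 1.0
    return constrained, unconstrained


for z in (2.0, -2.0):
    print(z, constrained_and_unconstrained_p(z))
```

With *z* = 2 the constrained p value (about 0.023) is half the unconstrained one (about 0.046); with *z* = −2 the ordering reverses, which is the pattern that would flag unexpected sharing.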

When covariates are included, Greenwood and Bull [6] caution that allele sharing patterns can fall outside the triangle constraint when there is a qualitative gene-environment interaction (implying that an exposure changes the direction of effect of the *DS* allele, such as when the gene is protective in unexposed subjects, yet confers risk in exposed subjects), and when the sample contains pairs with discordant exposure [see also 20, 21]. Because of this, they suggested that the unconstrained estimates of allele sharing should be examined before using the constrained models.

Recently, Wallace et al. [33] derived a score test for the effects of covariates on *IBD* sharing in *ASP* studies. They were motivated to evaluate whether co-phenotypes, or endophenotypes, were associated with *IBD* sharing. Their score statistic is a special case of our quasi-likelihood score statistic under the assumption of no dominance variation (see Appendix A) when *ASP*s are independent of each other. They also determined genome-wide statistical significance by permuting covariate vectors among the *ASP*s, a method appropriate for independent pairs. Their method, however, does not account for the correlation of pairs from the same pedigree. In contrast, our more general approach not only allows for different types of *ARP*s and different scaling factors that allow different amounts of dominance variance, but also accounts for the correlation among multiple *ARP*s from the same pedigree.

Although we did not compute simulation p values, a significant advantage of score statistics is that they are fast to compute, making simulations feasible. When simulating, it would be important to account for the correlation of information provided by multiple *ARP*s from the same pedigree. To do this efficiently, one could use the rapid simulation strategy for score statistics proposed by Lin [34]. He showed that under the null hypothesis, and conditional on the data, simulation p values can be computed by multiplying the observed *U* score statistic vectors by standard normal random variables (which we denote *r*) and then computing the desired summary statistic. To account for correlations within pedigrees, one could first sum the score vectors within each pedigree, and then multiply each pedigree total score, *U*_{i}, by its own *r*_{i}. Under the null hypothesis, where *E*[*U*] = 0, the asymptotically equivalent statistic $\tilde{U}={\sum}_{i}{r}_{i}{U}_{i}$ has the desired expectation zero and the same variance matrix *V*, since *r*_{i} ~ *N*(0, 1) is independent of *U*_{i}.
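The pedigree-level version of Lin's strategy [34] can be sketched in Python as follows. It is a minimal illustration under stated assumptions: the summary statistic is the unconstrained quadratic form *U*^{T}*V*^{−1}*U*, with *V* taken as the empirical variance Σ_{i}*U*_{i}*U*_{i}^{T}; the pedigree total scores are supplied as rows of an array; and the function name is hypothetical. A constrained statistic could be substituted for the quadratic form.

```python
import numpy as np


def multiplier_pvalue(pedigree_scores, n_sims=10000, seed=2):
    """Monte Carlo p value from resampled scores U~ = sum_i r_i U_i, r_i ~ N(0, 1)."""
    U_i = np.asarray(pedigree_scores, dtype=float)
    if U_i.ndim == 1:
        U_i = U_i[:, None]
    V_inv = np.linalg.inv(U_i.T @ U_i)               # inverse of empirical variance
    U = U_i.sum(axis=0)
    observed = U @ V_inv @ U
    rng = np.random.default_rng(seed)
    r = rng.standard_normal((n_sims, U_i.shape[0]))  # one multiplier per pedigree
    U_tilde = r @ U_i                                 # row s: sum_i r_si U_i
    simulated = np.einsum("sp,pq,sq->s", U_tilde, V_inv, U_tilde)
    return float(np.mean(simulated >= observed))


scores = [1.0, -0.5, 2.0, 0.3, 1.2, -0.8, 0.9]
print(multiplier_pvalue(scores))
```

Because the whole pedigree total *U*_{i} is multiplied by a single *r*_{i}, the within-pedigree correlation among pairs is carried into every simulated replicate. In this one-dimensional example the simulated statistic is exactly χ²_{1} conditional on the data, so the result should be close to the χ²_{1} tail probability of the observed value (about 0.15).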

The results from our application to a linkage study of prostate cancer illustrate that score statistics tend to be more stable than methods that require maximization of a complex function. Although this particular application suggested that the covariates did not add substantial information to the linkage analyses, it illustrated how our methods can be used and interpreted. Further applications, particularly to datasets with richer covariates or co-phenotypes, will likely prove fruitful, given the results of Wallace et al. [33], who used co-phenotypes for hypertension in a sib-pair study.

We recognize that further study of the statistical properties of our methods is warranted. Important considerations include how to control the Type I error rate when multiple covariates are examined in a genome-wide analysis, the conditions under which including covariates provides a gain in power, and whether certain covariate scaling factors can provide high power over a range of genetic and environmental influences on linkage. Addressing these questions requires extensive simulations, with a broad range of both genetic and covariate influences on disease susceptibility, so we defer them to a separate study.

*ibdreg* Package: http://mayoresearch.mayo.edu/mayo/research/biostat/schaid.cfm.

This work was supported by the U.S. Public Health Service, National Institutes of Health, contract grant numbers GM67768, CA72818.

Risch showed that the penetrances of the *DS* locus depend on the sibling risk ratio λ_{s}, the probability that a sib of an affected person is also affected, divided by the population disease prevalence [18, 35]. Since we work with the imputed values ${\stackrel{\circ}{f}}_{r,2}$ and ${\stackrel{\circ}{f}}_{r,1}$, we first show that their expected values, *z*_{r,2} and *z*_{r,1}, can be expressed as functions of the sibling risk ratio λ_{s} and the null prior probabilities *f*_{r,2} and *f*_{r,1}. We then express λ_{s} as a function of pair-specific covariates and their corresponding regression coefficients, β.

To show that *z*_{r,2} and *z*_{r,1} can be expressed as functions of the sibling risk ratio λ_{s} and the null prior probabilities *f*_{r,2} and *f*_{r,1}, we use results from Risch [18],

$${z}_{r,2}={f}_{r,2}{\lambda}_{m}/{\lambda}_{r},$$

(A1)

$${z}_{r,1}={f}_{r,1}{\lambda}_{o}/{\lambda}_{r},$$

(A2)

where λ_{m}, λ_{o}, and λ_{r} are the recurrence risk ratios for a monozygotic twin (*m*), an offspring (*o*), and a general relative of type *r*, respectively. Furthermore, assuming that there is no dominance variance of the penetrances, Risch [35] showed that

$${\lambda}_{r}-1=2{\phi}_{r}{V}_{A}/{K}^{2},$$

(A3)

where φ_{r} is the kinship coefficient, *V*_{A} is the additive genetic variance, and *K* is the population disease prevalence. Note that 2φ_{r} = *f*_{r,2} + *f*_{r,1}/2 = *m*^{o}_{r}/2, the latter equality following from our notation for the null expected value of *s*_{r}. For sibs, 2φ_{r} = 1/2, so expression (A3) implies that *V*_{A}/*K*^{2} = 2(λ_{s} − 1). Substituting this into expression (A3) and using 2φ_{r} = *m*^{o}_{r}/2 allows us to express equation (A3) as

$${\lambda}_{r}-1={m}_{r}^{o}\left({\lambda}_{s}-1\right).$$

(A4)

Now, because we assumed no dominance variance, λ_{m} = 1 + 2(λ_{s} − 1) and λ_{o} = λ_{s}; using these terms with expression (A4) allows us to express equations (A1) and (A2) as

$${z}_{r,2}={f}_{r,2}\frac{\left[1+2\left({\lambda}_{s}-1\right)\right]}{\left[1+{m}_{r}^{o}\left({\lambda}_{s}-1\right)\right]}$$

(A5)

and

$${z}_{r,1}={f}_{r,1}\frac{{\lambda}_{s}}{\left[1+{m}_{r}^{o}\left({\lambda}_{s}-1\right)\right]}.$$

(A6)

We now add and subtract *f*_{r,2} to the right-hand side of equation (A5), and add and subtract *f*_{r,1} to the right-hand side of equation (A6), to obtain

$${z}_{r,2}={f}_{r,2}+{f}_{r,2}\frac{\left({\lambda}_{s}-1\right)\left(2-{m}_{r}^{o}\right)}{\left[1+\left({\lambda}_{s}-1\right){m}_{r}^{o}\right]}$$

(A7)

$${z}_{r,1}={f}_{r,1}+{f}_{r,1}\frac{\left({\lambda}_{s}-1\right)\left(1-{m}_{r}^{o}\right)}{\left[1+\left({\lambda}_{s}-1\right){m}_{r}^{o}\right]}.$$

(A8)

Since we want the expected number of alleles shared *IBD*, we use *m*_{r} = 2*z*_{r,2} + *z*_{r,1} to determine that

$${m}_{r}={m}_{r}^{o}+{c}_{r}\frac{\left({\lambda}_{s}-1\right)}{\left[1+\left({\lambda}_{s}-1\right){m}_{r}^{o}\right]}$$

(A9)

where *c*_{r} is

$${c}_{r}=2{f}_{r,2}\left(2-{m}_{r}^{o}\right)+{f}_{r,1}\left(1-{m}_{r}^{o}\right).$$

(A10)

Note that *m*_{r} depends on linkage only through the parameter λ_{s}; the other terms, *m*^{o}_{r} and *c*_{r}, are computed under the null of no linkage, and hence depend only on the relationship of the pair of relatives.
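Because (A9) is an exact rearrangement of (A5) and (A6), it can be checked numerically. The Python sketch below computes *z*_{r,1} and *z*_{r,2} directly from (A5) and (A6), forms *m*_{r} = 2*z*_{r,2} + *z*_{r,1}, and compares the result with (A9), using *c*_{r} from (A10). The null probabilities (*f*_{r,1}, *f*_{r,2}) are the standard values for each relationship; the dictionary and function names are illustrative only.

```python
# Null IBD probabilities (f_r1, f_r2) for some non-inbred relative pairs
NULL_IBD = {
    "full sibs": (0.5, 0.25),
    "half sibs": (0.5, 0.0),
    "first cousins": (0.25, 0.0),
    "double first cousins": (0.375, 0.0625),
}


def c_no_dominance(f1, f2):
    """Scaling factor c_r, equation (A10)."""
    m0 = 2 * f2 + f1
    return 2 * f2 * (2 - m0) + f1 * (1 - m0)


def m_from_a5_a6(f1, f2, lam_s):
    """Expected IBD sharing 2 z_r2 + z_r1, with the z's from (A5) and (A6)."""
    m0 = 2 * f2 + f1
    lam_r = 1 + m0 * (lam_s - 1)          # common denominator of (A5)-(A6)
    z2 = f2 * (1 + 2 * (lam_s - 1)) / lam_r
    z1 = f1 * lam_s / lam_r
    return 2 * z2 + z1


def m_from_a9(f1, f2, lam_s):
    """Expected IBD sharing from equation (A9)."""
    m0 = 2 * f2 + f1
    return m0 + c_no_dominance(f1, f2) * (lam_s - 1) / (1 + (lam_s - 1) * m0)


for name, (f1, f2) in NULL_IBD.items():
    print(name, c_no_dominance(f1, f2), m_from_a5_a6(f1, f2, 2.0), m_from_a9(f1, f2, 2.0))
```

The printed scaling factors (0.5 for full sibs, 0.25 for half sibs) show how strongly *c*_{r} varies across relationships, which is why scaled covariates are needed when different types of pairs are combined.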

Like others [4–7], we assume that λ_{s} is an exponential function of pair-specific covariates, so that λ_{s} = exp(*X*^{T}β), where *X* and β are possibly vectors for multiple covariates, and *X* includes an intercept, with corresponding coefficient β_{0}. Substituting this exponential form into equation (A9) results in

$${m}_{r}={m}_{r}^{o}+{c}_{r}\frac{\left(\exp\left\{{X}^{T}\beta \right\}-1\right)}{\left[1+\left(\exp\left\{{X}^{T}\beta \right\}-1\right){m}_{r}^{o}\right]}.$$

(A11)

To develop *QL* score statistics, we use a first-order Taylor series approximation of equation (A11). This is motivated by the facts that score statistics are based on the behavior of the score function near the null hypothesis and they are powerful for a sequence of alternative hypotheses that are contiguous to the null. Hence, using a first-order expansion around β = 0 results in

$${m}_{r}={m}_{r}^{o}+{c}_{r}{X}^{T}\beta .$$

(A12)

Besides allowing us to view the expected allele sharing as a linear regression model with an offset of *m*^{o}_{r} and a scaled covariate, $X^{*}={c}_{r}X$, this also simplifies the calculation of the *QL* score statistic in expression (1), because $\partial {m}_{r}/\partial {\beta}_{j}={c}_{r}{X}_{rj}={X}_{rj}^{*}$ (we drop the subscript for a pedigree; *r* indexes the *r*-th relative pair in a pedigree and *j* the *j*-th covariate).
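To illustrate how the linear form (A12) enters a score computation, the Python sketch below uses the generic quasi-likelihood (GEE-type) score $U=\sum_i X_i^{*T} V_{0,i}^{-1}(S_i - m_i^{o})$ and its model-based variance $\sum_i X_i^{*T} V_{0,i}^{-1} X_i^{*}$, both evaluated at β = 0. This is an assumption about the form of expression (1), which is not reproduced in this section; the dictionary keys and function name are hypothetical.

```python
import numpy as np


def ql_score(pedigrees):
    """Score U and model-based variance at beta = 0 with scaled covariates.

    Each pedigree is a dict with hypothetical keys:
      'S'  : observed allele sharing for each relative pair (length n_i)
      'm0' : null expected sharing m_r^o for each pair
      'c'  : scaling factor c_r for each pair
      'X'  : n_i x p covariates, first column the intercept
      'V0' : n_i x n_i null covariance matrix of S
    """
    U, info = 0.0, 0.0
    for ped in pedigrees:
        X_star = np.asarray(ped["c"])[:, None] * np.asarray(ped["X"])  # dm_r/dbeta = c_r X_r
        resid = np.asarray(ped["S"]) - np.asarray(ped["m0"])
        W = np.linalg.solve(ped["V0"], X_star)   # V0^{-1} X*
        U = U + W.T @ resid                      # X*' V0^{-1} (S - m0)
        info = info + X_star.T @ W               # X*' V0^{-1} X*
    return U, info


# Usage: one full-sib pair sharing 2 alleles IBD, intercept only
sib = {"S": [2.0], "m0": [1.0], "c": [0.5], "X": [[1.0]], "V0": [[0.5]]}
U, info = ql_score([sib])
T = float(U @ np.linalg.solve(info, U))
print(U, info, T)
```

Because the derivative of *m*_{r} with respect to β is simply the scaled covariate, no iteration or maximization is needed: the score and its variance are single sums over pedigrees.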

At this point, it is worthwhile to understand the impact of the assumption of no dominance variance. For *ASP*s, a multinomial logistic model has been used for the expected allele sharing proportions [4, 6], with the form

$${z}_{j}\left({X}_{i}\right)=\frac{\exp\left({X}_{i}^{T}{\beta}_{j}\right)}{1+\exp\left({X}_{i}^{T}{\beta}_{1}\right)+\exp\left({X}_{i}^{T}{\beta}_{2}\right)},\quad j=1,2.$$

(A13)

Because constraints on the *IBD* sharing proportions can increase power, Greenwood and Bull [6] used simulations to evaluate the impact of various constraints versus an unconstrained model. They found that the assumption of no dominance variance, applied to all covariate configurations, can give greater power than the unconstrained model and was a strong competitor to both the triangle constraint [36] and a minimax constraint [19]. For *ASP*s, the assumption of no dominance variance implies that *z*_{1} = 0.5, which can be confirmed from equation (A8) by noting that *m*^{o}_{r} = 1 for *ASP*s.

Because the multinomial logistic regression model in equation (A13) requires two sets of regression coefficients (β_{1} and β_{2}), and hence twice as many degrees of freedom as there are covariates, the power to detect linkage with this model can be weak. The assumption of no dominance variation reduces the number of parameters to a single set of regression coefficients. As an alternative to the assumption of no dominance variance, multiplicative models have been proposed for *ASP* data, whereby the covariate effect has twice the influence on the log relative risk for *IBD* = 2 as for *IBD* = 1 [8–10, 37]. Extending the multiplicative model to general types of relative pairs, the allele sharing probabilities can be expressed as

$${z}_{r,2}=\frac{{f}_{r,2}exp\left(2{X}^{T}\beta \right)}{{f}_{r,0}+{f}_{r,1}exp\left({X}^{T}\beta \right)+{f}_{r,2}exp\left(2{X}^{T}\beta \right)}$$

$${z}_{r,1}=\frac{{f}_{r,1}exp\left({X}^{T}\beta \right)}{{f}_{r,0}+{f}_{r,1}exp\left({X}^{T}\beta \right)+{f}_{r,2}exp\left(2{X}^{T}\beta \right)}$$

Using *m*_{r} = 2*z*_{r,2} + *z*_{r,1}, and taking a first-order approximation about β = 0, it can be shown that the resulting scaling factor, *c*_{r}, for this multiplicative model is the same as that given in expression (A10) for the no dominance model.
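The claim that the multiplicative model leads to the same scaling factor as (A10) can be checked by finite differences. The Python sketch below evaluates *m*_{r} = 2*z*_{r,2} + *z*_{r,1} under the multiplicative model, with weights exp(*t*) for *IBD* = 1 and exp(2*t*) for *IBD* = 2, where *t* = *X*^{T}β, and compares its slope at *t* = 0 with (A10). The relationship values and function names are illustrative.

```python
import math


def c_no_dominance(f1, f2):
    """Scaling factor c_r from equation (A10)."""
    m0 = 2 * f2 + f1
    return 2 * f2 * (2 - m0) + f1 * (1 - m0)


def m_multiplicative(f1, f2, t):
    """Expected IBD sharing when IBD = 1 and IBD = 2 get weights exp(t), exp(2t)."""
    f0 = 1.0 - f1 - f2
    w1, w2 = math.exp(t), math.exp(2 * t)
    denom = f0 + f1 * w1 + f2 * w2
    return (2 * f2 * w2 + f1 * w1) / denom


def slope_at_zero(m_func, f1, f2, h=1e-5):
    """Central finite-difference derivative of m_func with respect to t at t = 0."""
    return (m_func(f1, f2, h) - m_func(f1, f2, -h)) / (2 * h)


for f1, f2 in [(0.5, 0.25), (0.5, 0.0), (0.25, 0.0), (0.375, 0.0625)]:
    print(f1, f2, slope_at_zero(m_multiplicative, f1, f2), c_no_dominance(f1, f2))
```

Algebraically, both slopes reduce to 4*f*_{r,2} + *f*_{r,1} − (*m*^{o}_{r})², so the agreement is exact up to finite-difference error.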

Another model for *IBD* sharing probabilities was proposed by Goddard and Olson [1], who used the minimax constraint of Whittemore and Tu [19], which can be expressed as *z*_{1} = 0.355 + 0.58 *z*_{0}; this line passes through the null *ASP* sharing probabilities (*z*_{0}, *z*_{1}) = (0.25, 0.5). Applying this constraint to the multinomial regression for general types of affected relative pairs results in

$${z}_{r,1}=\frac{{f}_{r,1}exp\left({X}^{T}\beta \right)}{{f}_{r,0}+{f}_{r,1}exp\left({X}^{T}\beta \right)+{f}_{r,2}\left[3.634exp\left({X}^{T}\beta \right)-2.634\right]},$$

(A14)

$${z}_{r,2}=\frac{{f}_{r,2}\left[3.634exp\left({X}^{T}\beta \right)-2.634\right]}{{f}_{r,0}+{f}_{r,1}exp\left({X}^{T}\beta \right)+{f}_{r,2}\left[3.634exp\left({X}^{T}\beta \right)-2.634\right]}.$$

(A15)

Then, using these sharing proportions in *m*_{r} = 2*z*_{r,2} + *z*_{r,1}, and taking a first-order approximation about β = 0, it can be shown that

$${m}_{r}\approx {m}_{r}^{o}+{c}_{r}{X}^{T}\beta ,$$

(A16)

where the scaling factor now has the form

$${c}_{r}=7.268{f}_{r,2}-5.634{f}_{r,2}{f}_{r,1}-7.268{f}_{r,2}^{2}+{f}_{r,1}-{f}_{r,1}^{2}.$$

(A17)

Hence, by using *c*_{r} from equation (A17) to scale the covariates, we can approximate the minimax constraint used by Goddard and Olson [1] in our *QL* score statistics. Note that the scaling factor (A10), which results from assuming either no dominance variance or the multiplicative model, and the scaling factor (A17), which results from the minimax constraint, differ only for relative pairs that can share 2 alleles *IBD*, such as full sibs or double first-cousins; for other types of relative pairs, the two scaling factors are equivalent, and equal to *c*_{r} = *f*_{r,1} (1 – *f*_{r,1}). Hence, the impact of the choice of a particular scaling factor depends on the fraction of the sample that has affected relative pairs that can share 2 alleles *IBD*.
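The constants in (A14), (A15), and (A17) can be traced back to the minimax line with a short Python sketch. It takes the slope 0.58 of the line *z*_{1} = *a* + 0.58 *z*_{0}, chooses the intercept *a* so that the line passes through the *ASP* null point (*z*_{0}, *z*_{1}) = (0.25, 0.5), and assumes the form *A* exp(*X*^{T}β) − *B* for the *IBD* = 2 weight, with *B* = *A* − 1 so that the null probabilities are recovered at β = 0. Under these assumptions *A* comes out at about 3.634, and the finite-difference slope of *m*_{r} at β = 0 agrees with (A17) up to the rounding of its coefficients. The constant and function names are illustrative.

```python
import math

SLOPE = 0.58                      # slope of the minimax line z1 = a + SLOPE * z0
INTERCEPT = 0.5 - SLOPE * 0.25    # line passes through the ASP null (0.25, 0.5)
A = 2.0 / INTERCEPT - 2.0         # coefficient of exp(X'beta), about 3.634
B = A - 1.0                       # IBD = 2 weight equals 1 at beta = 0


def m_minimax(f1, f2, t):
    """Expected IBD sharing with weights exp(t) (IBD = 1) and A exp(t) - B (IBD = 2)."""
    f0 = 1.0 - f1 - f2
    w1 = math.exp(t)
    w2 = A * math.exp(t) - B
    return (2 * f2 * w2 + f1 * w1) / (f0 + f1 * w1 + f2 * w2)


def c_minimax(f1, f2):
    """Scaling factor from equation (A17)."""
    return 7.268 * f2 - 5.634 * f2 * f1 - 7.268 * f2 ** 2 + f1 - f1 ** 2


def slope_at_zero(f1, f2, h=1e-5):
    """Central finite-difference derivative of m_minimax at t = 0."""
    return (m_minimax(f1, f2, h) - m_minimax(f1, f2, -h)) / (2 * h)


print(round(A, 3))  # 3.634
for f1, f2 in [(0.5, 0.25), (0.5, 0.0), (0.375, 0.0625)]:
    print(f1, f2, slope_at_zero(f1, f2), c_minimax(f1, f2))
```

For full sibs the minimax factor (about 0.91) is nearly twice the no-dominance factor of 0.5, while for half sibs both equal 0.25, in line with the observation that the two scalings differ only for pairs that can share two alleles *IBD*.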

Mixture proportions for null distributions of test statistics with multivariate inequality constraints are given in the Appendix of Wolak [38]. For our case of dimension three, these proportions are

$${w}_{3}\left(\Sigma \right)=\frac{1}{4\pi}\left[2\pi -\arccos\left({\rho}_{12}\right)-\arccos\left({\rho}_{13}\right)-\arccos\left({\rho}_{23}\right)\right],$$

$${w}_{2}\left(\Sigma \right)=\frac{1}{4\pi}\left[3\pi -\arccos\left({\rho}_{12.3}\right)-\arccos\left({\rho}_{13.2}\right)-\arccos\left({\rho}_{23.1}\right)\right],$$

$${w}_{1}\left(\Sigma \right)=\frac{1}{2}-{w}_{3}\left(\Sigma \right),$$

$${w}_{0}\left(\Sigma \right)=\frac{1}{2}-{w}_{2}\left(\Sigma \right),$$

where ρ_{ij} is the *ij*-th element of the correlation matrix associated with the covariance matrix Σ, and ρ_{ij.k} is the partial correlation of *X*_{i} and *X*_{j} given *X*_{k}, with

$${\rho}_{ij.k}=\frac{{\rho}_{ij}-{\rho}_{ik}{\rho}_{jk}}{\sqrt{1-{\rho}_{ik}^{2}}\sqrt{1-{\rho}_{jk}^{2}}}.$$
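The mixture weights, and the resulting p value for a constrained statistic *T* whose null distribution is a mixture of χ²_{0}, …, χ²_{3}, can be computed with the short Python sketch below. It follows the formulas above literally, taking correlations and partial correlations from whatever covariance matrix Σ is supplied (which matrix is appropriate is specified in the main text, not in this section). Closed-form χ² tail probabilities for 1 to 3 degrees of freedom avoid any dependency beyond NumPy; the function names are illustrative.

```python
import math
import numpy as np


def wolak_weights_3(sigma):
    """Mixture weights (w0, w1, w2, w3) for dimension three, as given above."""
    s = np.asarray(sigma, dtype=float)
    sd = np.sqrt(np.diag(s))
    rho = s / np.outer(sd, sd)                       # correlation matrix

    def partial(i, j, k):                            # rho_{ij.k}
        return (rho[i, j] - rho[i, k] * rho[j, k]) / math.sqrt(
            (1 - rho[i, k] ** 2) * (1 - rho[j, k] ** 2))

    def acos(x):
        return math.acos(min(1.0, max(-1.0, float(x))))  # guard against rounding

    w3 = (2 * math.pi - acos(rho[0, 1]) - acos(rho[0, 2]) - acos(rho[1, 2])) / (4 * math.pi)
    w2 = (3 * math.pi - acos(partial(0, 1, 2)) - acos(partial(0, 2, 1))
          - acos(partial(1, 2, 0))) / (4 * math.pi)
    return 0.5 - w2, 0.5 - w3, w2, w3


def chi2_tail(t, df):
    """P(chi-square_df >= t) for df = 0..3, with chi-square_0 a point mass at 0."""
    t = max(float(t), 0.0)
    if df == 0:
        return 1.0 if t <= 0 else 0.0
    x = math.sqrt(t / 2.0)
    if df == 1:
        return math.erfc(x)
    if df == 2:
        return math.exp(-t / 2.0)
    return math.erfc(x) + math.sqrt(2.0 * t / math.pi) * math.exp(-t / 2.0)


def chibar_pvalue(t, sigma):
    """P(T >= t) for the mixture sum_k w_k chi-square_k."""
    weights = wolak_weights_3(sigma)
    return sum(w * chi2_tail(t, df) for df, w in enumerate(weights))


print(wolak_weights_3(np.eye(3)))    # independent components
print(chibar_pvalue(5.0, np.eye(3)))
```

For independent components the weights reduce to the binomial values 1/8, 3/8, 3/8, 1/8; correlation among the components, such as among *U*_{AA}, *U*_{UU}, and *U*_{AU}, shifts weight between the extreme and middle terms.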

1. Goddard KA, et al. Model-free linkage analysis with covariates confirms linkage of prostate cancer to chromosomes 1 and 4. Am J Hum Genet. 2001;68:1197–1206.

2. S.A.G.E. Statistical Analysis for Genetic Epidemiology. Cork, Ireland: Statistical Solutions Ltd.; 2004.

3. Risch N. Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs. Am J Hum Genet. 1990;46:242–253.

4. Olson JM. A general conditional-logistic model for affected-relative-pair linkage. Am J Hum Genet. 1999;65:1760–1769.

5. Greenwood CMT, Bull SB. Incorporation of covariates into genome scanning using sib-pair analysis in bipolar affective disorder. Genet Epidemiol. 1997;14:635–640.

6. Greenwood CMT, Bull SB. Analysis of affected sib pairs, with covariates – with and without constraints. Am J Hum Genet. 1999;64:871–885.

7. Bull S, et al. Regression models for allele sharing: Analysis of accumulating data in affected sib pair studies. Stat Med. 2002;21:431–444.

8. Rice J. Commentary: The role of meta-analysis in linkage studies of complex traits. Am J Med Genet. 1997;74:112–114.

9. Morton NE. Logarithm of odds (lods) for linkage in complex inheritance. Proc Natl Acad Sci USA. 1996;93:3471–3476.

10. Rice J, et al. Covariates in linkage analysis. Genet Epidemiol. 1999;17:691–695.

11. Zhang W, et al. A linkage tournament: Affection status, parametric analysis, multivariate traits, and enhancements to variance components and relative pairs. Ann Hum Genet. 2002;66:87–98.

12. Schaid D, et al. Regression models for linkage: Issues of traits, covariates, heterogeneity, and interaction. Hum Hered. 2003;55:86–96.

13. Gauderman W, Siegmund K. Gene-environment interaction and affected sib pair linkage analysis. Hum Hered. 2001;52:34–46.

14. Peng J, Tang H-K, Siegmund D. Genome scans with gene-covariate interaction. Genet Epidemiol. 2005 (to appear).

15. McCullagh P, Nelder JA. Generalized linear models. London: Chapman and Hall; 1983.

16. Heyde C. Quasi-likelihood and its applications: A general approach to optimal parameter estimation. New York: Springer; 1997.

17. Bourgain C, et al. Novel case-control test in a founder population identifies P-selectin as an atopy-susceptibility locus. Am J Hum Genet. 2003;73:612–626.

18. Risch N. Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am J Hum Genet. 1990;46:229–241.

19. Whittemore AS, Tu I-P. Simple, robust linkage tests for affected sib pairs. Am J Hum Genet. 1998;62:1228–1242.

20. Dizier M, et al. The triangle test statistic (TTS): A test of genetic homogeneity using departure from the triangle constraints in IBD distribution among affected sib-pairs. Ann Hum Genet. 2000;64:433–442.

21. Guo S-W. Gene-environment interactions and the affected-sib-pair designs. Hum Hered. 2000;50:271–285.

22. Self SG, Liang K-Y. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc. 1987;82:605–610.

23. Blackwelder WC, Elston RC. A comparison of sib-pair linkage tests for disease susceptibility loci. Genet Epidemiol. 1985;2:85–97.

24. Silvapulle M, Silvapulle P. A score test against one-sided alternatives. J Am Stat Assoc. 1995;90:342–349.

25. Wollan P, Dykstra R. Minimizing linear inequality constrained Mahalanobis distances. Appl Stat. 1987;36:234–240.

26. Abecasis G, et al. Merlin – rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002;30:97–101.

27. Cunningham JM, et al. Genome linkage screen for prostate cancer susceptibility loci: Results from the Mayo Clinic Familial Prostate Cancer Study. Prostate. 2003;57:335–346.

28. Schaid D, et al. Comparison of microsatellites versus single nucleotide polymorphisms by a genome linkage screen for prostate cancer susceptibility loci. Am J Hum Genet. 2004;75:948–965.

29. Kong A, Cox NJ. Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet. 1997;61:1179–1188.

30. Guo X, et al. Small-sample performance of the robust score test and its modifications in generalized estimating equations. Stat Med. 2005;24:3479–3495.

31. Greenwood C, Bull S. Down-weighting of multiple affected sib pairs leads to biased likelihood-ratio tests, under the assumption of no linkage. Am J Hum Genet. 1999;64:1248–1252.

32. Rotnitzky A, Jewell NP. Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika. 1990;77:485–497.

33. Wallace C, et al. Linkage analysis using co-phenotypes in the BRIGHT study reveals novel potential susceptibility loci for hypertension. Am J Hum Genet. 2006;79:323–331.

34. Lin DY. An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics. 2005;21:781–787.

35. Risch N. Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet. 1990;46:222–228.

36. Holmans P. Asymptotic properties of affected-sib-pair linkage analysis. Am J Hum Genet. 1993;52:362–374.

37. Holmans P. Detecting gene-gene interactions using affected sib pair analysis with covariates. Hum Hered. 2002;53:92–102.

38. Wolak F. An exact test for multiple inequality and equality constraints in the linear regression model. J Am Stat Assoc. 1987;82:782–793.

39. Anderson T. An Introduction to Multivariate Analysis. ed 2. New York: John Wiley; 1984.

Articles from Human Heredity are provided here courtesy of **Karger Publishers**
