Genet Epidemiol. Author manuscript; available in PMC 2011 December 1.

PMCID: PMC3087204

NIHMSID: NIHMS288418

The publisher's final edited version of this article is available at Genet Epidemiol

The distribution of two-point heterogeneity lod scores (HLOD) has been intensively investigated because the conventional χ^{2} approximation to the likelihood ratio test is not directly applicable. However, no study has investigated the distribution of the multipoint HLOD despite its wide application. Here we point out that, compared with the two-point HLOD, the multipoint HLOD essentially tests for homogeneity given linkage and follows a relatively simple limiting distribution, ½χ^{2}_{0} + ½χ^{2}_{1}, which can be obtained by established statistical theory. We further examine the theoretical result by simulation studies.

Locus heterogeneity represents a form of genetic architecture of complex traits where alleles at more than one locus lead to the same phenotype. It adversely affects the power of linkage analysis if the heterogeneous disease genetic background of families is not taken into account. A natural way to model such heterogeneous data is by a mixture model, as first suggested by Smith [1963]. Under the mixture model framework one can either test for homogeneity given linkage [Ott, 1983] or test for linkage allowing for heterogeneity [Hodge et al., 1983] by a likelihood ratio test. The distribution of two-point heterogeneity lod scores (HLOD) has been intensively investigated [Abreu et al., 2002; Chernoff and Lander, 1995; Chiano and Yates, 1995; Faraway, 1993; Huang and Vieland, 2001; Lemdani and Pons, 1995; Liang and Rathouz, 1999] because the conventional χ^{2} approximation to the likelihood ratio test is not directly applicable [Davies, 1977, 1987]. However, to our surprise, there was no study investigating the distribution of the multipoint HLOD despite its wide application. The multipoint HLOD is reported by popular software packages such as GENEHUNTER [Kruglyak et al., 1996] and MERLIN [Abecasis et al., 2002] without a *P*-value accompanying it. Here we want to point out that, compared with the two-point HLOD, the multipoint HLOD essentially tests for homogeneity given linkage and follows a relatively simple limiting distribution, which can be obtained by established statistical theory. We further examine the theoretical result by simulation studies.
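As a point of reference, the admixture (mixture) model is commonly written, at a fixed map position *x*, as HLOD(*x*) = max over α in [0, 1] of Σ_{i} log_{10}[α × 10^{Z_i(x)} + 1 − α], where Z_i(*x*) is the lod score of the *i*th family and α is the proportion of linked families. The following is a minimal sketch of that maximization, assuming the per-family lod scores are already available and finite. It is not the maximization procedure used by GENEHUNTER or MERLIN; the function name `hlod`, its arguments, and the simple grid search are illustrative assumptions.

```python
import math


def hlod(family_lods, grid_size=1001):
    """Grid-search the admixture lod over alpha in [0, 1].

    family_lods: finite per-family lod scores Z_i at one fixed map position
                 (hypothetical input; how they are computed is out of scope).
    Returns (hlod_value, alpha_hat).
    """
    # alpha = 0 always gives an admixture lod of exactly 0,
    # so the HLOD can never be negative.
    best_value, best_alpha = 0.0, 0.0
    for k in range(1, grid_size):
        alpha = k / (grid_size - 1)
        value = 0.0
        for z in family_lods:
            # Mixture of "linked" (likelihood ratio 10**z) and "unlinked" (ratio 1).
            value += math.log10(alpha * 10.0 ** z + (1.0 - alpha))
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha
```

Because α = 0 is always admissible, the statistic is bounded below by zero, which is why a point mass at zero appears in its null distribution. A numerical optimizer could replace the grid search; the grid is used here only to keep the sketch short.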

Denote by *M* the genotype data, by *D* the phenotype data, by α the admixture parameter, which indicates the proportion of families linked to the locus tested, and by *x* the map position of a putative disease locus. A general format of the likelihood for the *i*th family is defined as, *L _{i}*(

Below we examine the theoretical result by simulation studies under varying simulation models, analysis models, and sample sizes. We simulated nuclear families consisting of two parents and four children, and two linked markers 10 cM apart with four alleles of equal frequency at each locus. Two randomly chosen children were set to be affected and the other two unaffected. The parental phenotypes were simulated under three models. In model I, one parent was set to be affected and the other unaffected, which mimicked a dominant trait; in model II, both parents were set to be unaffected, which mimicked a recessive trait; and in model III, both parents were set to be unknown, which mimicked a trait with mode of inheritance unclear. We generated 5,000 replicate samples under each of the four sample size scenarios—100, 500, 1,000 and 5,000 families in a sample dataset. Denote the penetrance by *f _{i}*, where

The simulation results confirmed that the limiting distribution of *T*_{multipoint} (= 2 × ln 10 × multipoint HLOD) is ½χ^{2}_{0} + ½χ^{2}_{1}. In Table I we estimated some parameters of the empirical distribution under analysis models D2, D5, D8, R2, R5, and R8 with varying sample sizes in each simulation scenario. The proportion of *T*_{multipoint} equal to zero was always approximately one half, whatever the model. The mean and variance of a χ^{2}_{1} random variable are 1 and 2, respectively. We observed that the mean and variance of the non-zero *T*_{multipoint} were approximately 1 and 2, respectively, under most dominant models and under high penetrance recessive models when the sample size was large. Under each analysis model the mean and variance approached their expectations under the theoretical distribution as the sample size increased. However, both mean and variance were smaller than their expectations under low penetrance recessive models even when the sample size was 5,000. In contrast, the maximum likelihood estimate of the mean was approximately 1 when the sample size was 500 under any analysis model. Given the sample size and analysis model, the estimated mean of *T*_{multipoint} for data generated under simulation model I was closer to 1 than that for data generated under models II and III, which was most clearly illustrated when analyzing the data by low penetrance recessive models. We calculated the empirical type I error rate of the multipoint HLOD under analysis models D2, D5, D8, R2, R5, and R8 with varying sample sizes in each simulation scenario, assuming that *T*_{multipoint} follows a ½χ^{2}_{0} + ½χ^{2}_{1} distribution (Table II). The results were consistent with the empirical distribution as summarized in Table I. The dominant analysis models gave a proper type I error rate. The recessive models were conservative except for high penetrance models with large sample size. Figure 1 compares the empirical distribution of the multipoint HLOD under analysis models D5 and R5 with the theoretical distribution.
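To make the use of this limiting distribution concrete, the sketch below converts a multipoint HLOD into an asymptotic *P*-value, assuming that *T*_{multipoint} = 2 × ln 10 × HLOD follows an equal mixture of a point mass at zero and a χ^{2} distribution with one degree of freedom. The function name `hlod_p_value` is an illustrative assumption; the only library call is `math.erfc`, using the identity P(χ^{2}_{1} > t) = erfc(√(t/2)).

```python
import math


def hlod_p_value(hlod):
    """Asymptotic P-value of a multipoint HLOD under the assumed
    0.5 * chi2(0 df) + 0.5 * chi2(1 df) null distribution."""
    t = 2.0 * math.log(10.0) * hlod  # T_multipoint
    if t <= 0.0:
        # At least half of the null mass sits at T = 0, so P(T >= 0) = 1.
        return 1.0
    # P(chi2_1 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(t / 2.0))
```

Under this assumption the nominal 0.05 threshold corresponds to *T*_{multipoint} ≈ 2.71, that is, a multipoint HLOD of roughly 0.59. As the simulations indicate, the approximation may be conservative under low penetrance recessive analysis models.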

Empirical distribution of multipoint HLOD compared with the theoretical distribution under the null hypothesis of no linkage. (A) and (C) correspond to the probability density functions of HLOD analyzed under models D5 and R5, respectively; (B) and (D) **...**

The multipoint HLOD method is powerful to detect linkage even when the assumed heterogeneity model is incorrect [Greenberg and Abreu, 2001; Hodge et al., 2002]. However, the distribution of the multipoint HLOD has remained mysterious, possibly because it has been obscured by the complexity of the two-point HLOD. Similarly, the model-based multipoint lod score that is best suited in an evidential paradigm [Hodge et al., 2008] displays some asymptotic complexity and does not have a limiting distribution [Xing and Elston, 2006]. We have shown in this paper that, in contrast with the two-point HLOD and the multipoint lod score, the multipoint HLOD test statistic follows a relatively simple asymptotic distribution. That is, 2 × ln 10 × multipoint HLOD asymptotically follows a ½χ^{2}_{0} + ½χ^{2}_{1} distribution. This not only enables evaluating the significance level easily, but also facilitates further inferences such as multiple testing correction. The rate of convergence to the asymptotic distribution depends on the informativeness of both the markers and the trait. Given data, the pre-specified analysis model defines the informativeness of the trait. As is generally the case for model-based linkage statistics, the multipoint HLOD under low penetrance recessive models is conservative because the phenotype contributes little information under such trait models, which is reflected in a high proportion of zeros, a low mean and a small variance (Table I). Similarly, data simulated under model I contain more trait information than those simulated under models II and III; therefore, the multipoint HLOD under simulation models II and III is more conservative than that under simulation model I. In multipoint analysis, the marker information is relatively constant across a map; thus, the behavior of *T*_{multipoint} should also be relatively stable across the map given an analysis model.
We note that the proportion of HLODs equal to zero is always greater than, though close to, one half; thus, a nominal *P*-value is presumably conservative. In this study we did not investigate the performance of the multipoint HLOD test statistic in the extreme tails of its null distribution, which will be crucial in determining genome-wide significance. Because the efficiency of the test depends on multiple factors such as the true, yet unknown, disease model, the analysis model employed, and the sample size, when a large multipoint HLOD is observed in practice it would be more appropriate to perform Monte Carlo simulations to evaluate the significance level [Lin and Zou, 2004].
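A generic Monte Carlo *P*-value can be sketched as follows. This is a plain empirical tail estimate and not the specific procedure of Lin and Zou [2004]; it assumes that null replicates of the statistic have already been obtained, for example by re-simulating marker data under no linkage and re-analyzing each replicate, which is outside the scope of the sketch. The names `empirical_p_value`, `observed`, and `null_replicates` are illustrative.

```python
def empirical_p_value(observed, null_replicates):
    """Monte Carlo P-value: share of null replicates at least as large as
    the observed statistic, with the usual +1 correction so that the
    estimate is never exactly zero."""
    exceed = sum(1 for t in null_replicates if t >= observed)
    return (exceed + 1) / (len(null_replicates) + 1)
```

With B replicates the smallest attainable value is 1/(B + 1), so assessing genome-wide significance in the extreme tail requires a correspondingly large number of replicates.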

We thank Dr. Robert Elston for critically reading the manuscript and helpful discussions, and thank Dr. Gonçalo Abecasis for clarification and advice on the maximization procedure in MERLIN. C.X. was partially supported by a Pilot Award from UL1RR024982 from the National Center for Research Resources.

- Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002;30:97–101.
- Abreu PC, Hodge SE, Greenberg DA. Quantification of type I error probabilities for heterogeneity LOD scores. Genet Epidemiol. 2002;22:156–169.
- Brent RP. Algorithms for Minimization Without Derivatives. New York: Dover Publications; 2002.
- Chernoff H, Lander E. Asymptotic distribution of the likelihood ratio test that a mixture of two binomials is a single binomial. J Statist Plann Inference. 1995;43:19–40.
- Chiano MN, Yates JR. Linkage detection under heterogeneity and the mixture problem. Ann Hum Genet. 1995;59:83–95.
- Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1977;64:247–254.
- Davies RB. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika. 1987;74:33–43.
- Faraway JJ. Distribution of the admixture test for the detection of linkage under heterogeneity. Genet Epidemiol. 1993;10:75–83.
- Greenberg DA, Abreu PC. Determining trait locus position from multipoint analysis: accuracy and power of three different statistics. Genet Epidemiol. 2001;21:299–314.
- Hodge SE, Anderson CE, Neiswanger K, Sparkes RS, Rimoin DL. The search for heterogeneity in insulin-dependent diabetes mellitus (IDDM): linkage studies, two-locus models, and genetic heterogeneity. Am J Hum Genet. 1983;35:1139–1155.
- Hodge SE, Vieland VJ, Greenberg DA. HLODs remain powerful tools for detection of linkage in the presence of genetic heterogeneity. Am J Hum Genet. 2002;70:556–559.
- Hodge SE, Rodriguez-Murillo L, Strug LJ, Greenberg DA. Multipoint lods provide reliable linkage evidence despite unknown limiting distribution: type I error probabilities decrease with sample size for multipoint lods and mods. Genet Epidemiol. 2008;32:800–815.
- Huang J, Vieland VJ. The null distribution of the heterogeneity lod score does depend on the assumed genetic model for the trait. Hum Hered. 2001;52:217–222.
- Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES. Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996;58:1347–1363.
- Lemdani M, Pons O. Tests for genetic linkage and homogeneity. Biometrics. 1995;51:1033–1041.
- Liang KY, Rathouz PJ. Hypothesis testing under mixture models: application to genetic linkage analysis. Biometrics. 1999;55:65–74.
- Lin DY, Zou F. Assessing genomewide statistical significance in linkage studies. Genet Epidemiol. 2004;27:202–214.
- Lindsay BG. Testing for Latent Structure. Mixture Models: Theory, Geometry and Applications. Hayward: Institute of Mathematical Statistics; 1995.
- Ott J. Linkage analysis and family classification under heterogeneity. Ann Hum Genet. 1983;47:311–320.
- Ott J. Analysis of Human Genetic Linkage, 3rd edition. Baltimore, MD: The Johns Hopkins University Press; 1999.
- Self SG, Liang KY. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Statist Assoc. 1987;82:605–610.
- Smith CAB. Testing for heterogeneity of recombination fraction values in human genetics. Ann Hum Genet. 1963;27:175–182.
- Xing C, Elston RC. Distribution and magnitude of type I error of model-based multipoint lod scores: implications for multipoint mod scores. Genet Epidemiol. 2006;30:447–458.
