
BMC Genet. 2010; 11: 100.

Published online 2010 November 9. doi: 10.1186/1471-2156-11-100

PMCID: PMC2991276

Hao Mei,^{1} Wei Chen,^{2} Andrew Dellinger,^{3} Jiang He,^{1} Meng Wang,^{4} Canddy Yau,^{5} Sathanur R Srinivasan,^{2} and Gerald S Berenson^{2}

Hao Mei: ude.enalut@iemh; Wei Chen: ude.enalut@1nehcw; Andrew Dellinger: ude.ekud@regnilled.werdna; Jiang He: ude.enalut@ehj; Meng Wang: nc.ude.ujn@gnawm; Canddy Yau: ude.enalut@uayc; Sathanur R Srinivasan: ude.enalut@1vinirss; Gerald S Berenson: ude.enalut@nosnereb

Received 2009 December 7; Accepted 2010 November 9.

Copyright ©2010 Mei et al; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Quantitative traits often underlie risk for complex diseases. For example, weight and body mass index (BMI) underlie the human abdominal obesity-metabolic syndrome. Many attempts have been made to identify quantitative trait loci (QTL) over the past decade, including association studies. However, a single QTL is often capable of affecting multiple traits, a quality known as gene pleiotropy. Gene pleiotropy may therefore cause a loss of power in association studies focused only on a single trait, whether based on single or multiple markers.

We propose using principal-component-based multivariate regression (PCBMR) to test for gene pleiotropy with comprehensive evaluation. This method generates one or more independent canonical variables based on the principal components of original traits and conducts a multivariate regression to test for association with these new variables. Systematic simulation studies have shown that PCBMR has great power. PCBMR-based pleiotropic association studies of abdominal obesity-metabolic syndrome and its possible linkage to chromosomal band 3q27 identified 11 susceptibility genes with significant associations. Whereas some of these genes had been previously reported to be associated with metabolic traits, others had never been identified as metabolism-associated genes.

PCBMR is a computationally efficient and powerful test for gene pleiotropy. Application of PCBMR to abdominal obesity-metabolic syndrome indicated the existence of gene pleiotropy affecting this syndrome.

Quantitative traits often underlie increased risk for complex diseases. To understand the genetic basis of such traits, each trait is often separately tested for association with one or more markers. This approach has two disadvantages: 1) independent tests of each trait may lead to issues related to multiple testing; and 2) if a locus affects two or more traits, a single-trait study may lose the power to detect a pleiotropic effect, where a single gene influences multiple phenotypic traits.

In the past decade, simultaneous analysis of multiple traits in the context of linkage mapping of quantitative trait loci (QTL) has attracted much attention. Three approaches to simultaneous analysis have been developed and broadly applied, the first of which is generalization of maximum likelihood (ML) [1,2]. Although this method can be applied to multiple traits, a large number of correlated traits requires the simultaneous estimation of too many parameters, restraining its practical use [3]. The second approach, first proposed by Haley & Knott, is multivariate regression [4-7]. This approach is computationally faster than maximum likelihood and is available in most statistical software packages. But as with the ML method, the requirement for simultaneous estimates of a large number of parameters may limit its application. The third approach is based on transformation of original traits to a reduced number of canonical variables [3,8]. This approach is often implemented in two steps. First, principal components of original traits are identified to generate canonical variables. Next, a classical single trait method is used as the test of linkage between a candidate locus and a canonical variable. The test is then repeated for each combination of locus and variable and is corrected for multiple testing.

The resolution of QTL linkage mapping is generally low (typically ≥ 10 cM) [9]. Thus, a QTL linked to multiple traits may be a single QTL with pleiotropy or different QTLs within the mapping region that affect different traits. Association studies, in contrast, have much higher resolution, and are more feasible for identifying gene pleiotropy. Lange [10] proposed a family-based association method that constructs an overall phenotype by finding a linear combination of traits to maximize heritability. Klei [11] extended this method to population samples. Both methods use principal components, reducing multiple phenotypes to only a single trait, which can cause loss of power. In addition, maximization of heritability and association testing in the same samples may inflate type I error. To address this issue, Klei [11] proposed to split the sample into training and testing data and apply cross-validation to control error inflation, but this further increases computational complexity. In contrast to reduction of phenotypes, direct multivariate regression examines pleiotropy by simultaneous analysis of multiple phenotypes [12].

In this study, we propose to integrate two common methods that test for association by analyzing multiple traits simultaneously: principal components and multivariate regression. Although both methods are well established, their combination, principal-component-based multivariate regression (PCBMR), has not been comprehensively evaluated. We therefore evaluated the power and type I error of PCBMR using simulations that varied pleiotropic effects, linkage disequilibrium (LD), proportion of contributed correlation, and number of traits. We also used PCBMR to examine pleiotropic effects on multiple traits of human abdominal obesity-metabolic syndrome.

Human abdominal obesity-metabolic syndrome [13], a cluster of syndrome phenotypes, increases the risk of developing both diabetes mellitus [14] and cardiovascular disease [15,16]. The prevalence of metabolic syndrome varies with age and sex [17]. Kissebah [18] performed a genome-wide linkage scan with a marker density of 10 cM in 2,209 individuals from 507 Caucasian families. They found one QTL, on chromosome 3q27, that was strongly linked to six phenotypes: body mass index (BMI), waist circumference (WC), hip circumference (HC), weight, insulin, and insulin/glucose (I/G). The results indicated possible pleiotropic effects. Francke replicated this result, finding the same locus on 3q27 through a genome-wide linkage scan of 99 families of northeastern Indian origin [19]. Here, we attempted to identify markers on 3q27 that are associated with the six traits above by using PCBMR to analyze data from the Bogalusa Heart Study [20].

The correlation coefficients between traits *Y _{1 }*and

Correlation coefficients between *Y _{1 }*and

The correlation coefficients between simulated traits *Y _{1 }*and

Under this simulation strategy, the number of traits affected by the QTL ranged from 2 to 10. The correlation coefficients between any pair of simulated traits were all ≥0.97 and the expected percentage of correlation contributed by the tested QTL was 8.7%. For all numbers of traits, PCBMR generated one canonical variable for the association test. Results are presented in Table 4. Power depended on genetic model assumptions, with power decreasing in order among ADD, DOM, GEN, and REC. For different numbers of traits and different genetic model assumptions, the power of PCBMR was consistently close to that of SATN, with no significant difference detected by the binomial exact test. Power was approximately equal for different numbers of traits as well. The power of SATB decreased dramatically as the number of traits increased. Compared with SATB, PCBMR had significantly improved power, especially with larger numbers of traits.

A total of 1,196 subjects with 5,529 SNPs in the candidate region of chromosome 3 (at 182-227 cM or 173.4-198.8 Mb) made up the study population. Quality control measures included the removal of SNPs with minor allele frequencies of ≤0.01 and Hardy-Weinberg equilibrium p-values of ≤1e^{-5}, leaving 4,769 SNPs in the study. The characteristics of the study participants are summarized in Table 5 for both males and females, as follows: age (AGE, in years), weight (WEIGHT, in kg), waist circumference (WAIST, in cm), body mass index (BMI, in kg/m^{2}), hip circumference (HIP, in cm), plasma insulin level (INSULIN, in μU/mL) and plasma insulin/glucose ratio (I/G). The pairwise correlation coefficients (*r*) among adjusted traits are presented in Table 6. The correlations clustered into two groups, with the first group comprised of WEIGHT, BMI, WAIST, and HIP (*r* ≥ 0.89) and the second group comprised of INSULIN and I/G (*r* = 0.97).

The results of the PCBMR pleiotropic association studies based on the GEN model are presented in Figures 1 and 2. Markers with significant p-values (≤1e^{-5}) are summarized in Tables 7 and 8. For these markers, analyses based on recessive, dominant and additive models were conducted, and the best genetic model and its p-value were documented.

For the first trait group of WEIGHT, BMI, WAIST, and HIP, PCBMR generated a single canonical variable that explained 94.1% of the variance. With Bonferroni adjustment, PCBMR using the GEN model found four SNPs with significant pleiotropic association (p < 1e^{-5}) (Figure 1). Among these, SNP rs11721044 at 174.6 Mb and rs11926347 at 185.2 Mb were located in genes NLGN1 (OMIM 600568) and ABCC5 (OMIM 60521), respectively (Table 7).

For the second trait group of INSULIN and I/G, PCBMR also generated a single canonical variable, and this variable explained 98.6% of the variance. Using the GEN model, 34 SNPs passed the Bonferroni significance level (Figure 2), of which 17 were found within 11 genes. SNP rs11926347, in an intron of ABCC5 (OMIM 60521), and SNP rs6795506, near the 5' end of AHSG (OMIM 138680), had extremely small p-values (Table 8). Among the other nine genes, ADIPOQ (OMIM 605441, 612556) has been widely reported to be associated with obesity and diabetes [21-24]; FNDC3B (OMIM 611909) is involved in positive regulation of adipogenesis [25]; and DGKG (OMIM 601854) and AHSG (OMIM 138680) have been reported to be associated with obesity-related metabolic traits [26,27]. The remaining genes have no reported relation to obesity-related metabolic traits based on our literature review.

SNP rs11926347 in ABCC5 showed significant pleiotropic association in both groups and the p-value was extremely small in the second group of traits (-log(P) = 109.86). To validate these PCBMR results, this SNP was extracted for further study. The SNP's phenotype distribution, divided by genotype, is presented in Table 9. Its alleles are 'A' and 'G', and the frequency of the minor allele 'A' is 0.02. The Hardy-Weinberg Equilibrium (HWE) exact test [28] yielded a p-value of 0.37. Homozygotes for the minor allele ('A/A') exhibited only one extreme mean value for all six traits. Heterozygotes ('G/A') had smaller values than 'A/A' homozygotes but much larger values than homozygotes for the major allele ('G/G'). SATN analyses with adjustment for age and sex gave p-values of ≤1.15 × 10^{-5} for all traits (results not shown). With allele A as a reference, we conducted an examination of pleiotropic effects for rs11926347 based on additive, dominant, and recessive models. The corresponding -log

Most current association studies have been based on single trait-single marker or single trait-multiple marker tests. These kinds of studies lose power in identifying genes with pleiotropic effects. In some cases, genes with pleiotropy may be found by separately testing each trait. However, two major issues make this strategy not always appropriate. First, pleiotropic effects for each trait may be too weak to be identified. Second, multiple testing problems may either lower the power or inflate the type I error. It is therefore important to develop methods that can test for association by analyzing multiple traits simultaneously.

In this paper, we present the use of PCBMR as a method which detects pleiotropic effects by combining principal component methods and multivariate regression. PCBMR generates a set of independent canonical variables based on principal components. Each canonical variable is associated with multiple traits and the sum of all variables explains at least 80% of the variation. Analysis of canonical variables is simultaneously implemented by multivariate regression. The statistic of PCBMR is simply the sum of individual test statistics. PCBMR is computationally efficient and can be easily implemented by most statistical packages. This makes PCBMR fast and feasible not only for candidate-gene association studies but also for genome-wide association studies (GWAS).

Comprehensive studies of simulated data have shown that PCBMR has well-controlled type I error, about 5%, when a tested marker has no pleiotropy (simulation 1 and 3) or exhibits linkage equilibrium to the pleiotropic QTL, in the case of pleiotropic tested markers (simulation 2). The power of PCBMR depends on the extent of the pleiotropic effect and on the LD of the QTL. Larger pleiotropic effects and higher LD result in larger power (simulation 1 and 2). When the trait correlation caused by pleiotropy was not strong (simulation 1), the number of canonical variables was the same as the number of traits and the power was reasonably high, even compared with SATN. When there were strong correlations among traits (simulation 4), the reduced number of variables resulted in fewer degrees of freedom for the PCBMR test, and the power of PCBMR was as high as SATN. However, SATN always has much higher type I error than PCBMR due to multiple testing. PCBMR was robust to conflicting effects from environmental factors or other, untested QTLs (simulation 3). In all cases, multi-trait association analyses using PCBMR were much more powerful than multiple single-trait association analyses using SATB. For all tests, multiple traits simultaneously studied by PCBMR were compared with the single trait with the best power as determined by SATN and SATB. The present study showed that PCBMR is at least as powerful as SATN and more powerful than SATB under pleiotropy.

PCBMR has great extensibility. For equations (1) and (2), PCBMR can be extended to any distribution in the exponential family and the parameter *θ* can take any link function (e.g. logistic or log) that relates a mean, *Z_{i}*, to covariates [29]. The covariate can be a single variable for one marker or multiple variables for different markers. In addition, non-genetic factors with or without interaction terms can also serve as the covariate. The final statistic, approximating the χ

Comparisons of power estimates among PCBMR, SATB, and SATN in this study were based on analyses of the simulated additive model. To verify these findings, we tried studies on both simulated dominant and recessive models, and the same conclusions were obtained - that pleiotropic association studies by PCBMR are more powerful than single-trait association studies by either SATN or SATB (results not shown here). In addition, influences of model mismatch were also observed. For example, we observed that a pleiotropic study based on an additive model sacrificed its power when the true model was dominant or recessive. In addition, we observed that all studies based on the general model have acceptable power. In contrast to the additive model, which assumes linear trends of genotypic effect, and the dominant and recessive models, which assume equal effects of two genotypes for an SNP, the general model aims to separately estimate the effect of each genotype without any restriction. Therefore, PCBMR based on the general model has the advantage of testing for a pleiotropic effect when a complex trait has no obvious Mendelian inheritance.

As a real example, PCBMR was applied to test association with six traits of abdominal obesity-metabolic syndrome (weight, waist circumference, BMI, hip circumference, plasma insulin, and insulin/glucose ratio) in the Bogalusa Heart Study cohort. The traits were clustered into two groups based on two previously identified linkage peaks [18] and these two groups exhibited strong correlation. After multiple-test adjustment, PCBMR successfully identified several SNPs associated with the traits, especially in the trait group of INSULIN and I/G. Some of the genes had been well-characterized in prior studies, e.g. FNDC3B, which is involved in adipogenesis [25]. However, the functions of most of the genes were not yet explicitly clear at the time of the analysis. For example, some genes (e.g. ABCC5) are known to be related to energy metabolism, but are they truly involved in obesity-metabolic syndrome? If they are, what are their functions? The results from the use of PCBMR in this study offer guidance for future researchers in understanding genetic mechanisms and pathways in the pathogenesis of this human disease.

Although this study illustrates many advantages of PCBMR, there are also some challenges to be faced in terms of practical application. In contrast to pleiotropic linkage studies that map a QTL to a large locus [30], PCBMR-based studies can provide a higher resolution QTL position. However, the association may not justify the true pleiotropy of the identified marker or gene. For example, when PCBMR identifies a significant association by studying multiple traits, we may not observe significant association with a particular trait. This may result from either a weak pleiotropic effect or no effect at all. Such differentiation is generally difficult to achieve by statistical analysis. Further experimental studies or repeated studies with larger sample size are therefore necessary to confirm that the association is due to pleiotropy. In addition, the power of PCBMR depends on the assumptions of the genetic model, and misuse of a model will decrease power. The number of canonical variables also depends on the threshold. A value of 0.8 is used in simulation studies to explain at least 80% of the variation. Although this threshold is widely accepted for principal component analysis and has been proven to be suitable in our simulation studies, the ideal threshold may depend on practical data, with the exact value generally not known in advance. Furthermore, pleiotropic association is based on canonical variables, and to get an exact estimate of the effect on an original trait, a reverse transformation needs to be conducted.

Another challenge is to decide which traits should be studied simultaneously by PCBMR. Some strategies may help to address this challenge. Candidate traits could be those related to each other in the same pathway leading to a disease or symptom. For example, greater weight and BMI are correlated with obesity. Candidate traits could also include traits with linkage to the same region, such as two groups of traits with linkage peaks in two separate loci, as found in our studies of abdominal obesity-metabolic syndrome. Nevertheless, it is possible that two traits without much correlation may be strongly affected by a common gene. For example, in our simulation *1*, though the effect is strong at *b = 1*, the correlation coefficient (r) ranges from -0.35 to 0.37 with a mean of only about 0.10. In this case, selection of traits mainly depends on currently established knowledge.

PCA is an important tool for data mining that transforms a larger number of correlated variables into a smaller number of independent variables, *i.e.*, principal components. Factor analysis (FA), another important analytical tool, identifies common factors that capture variance-covariance of multiple variables with random error. PCA, in contrast, identifies principal components, with the restriction that random error must be zero [31]. Therefore, FA could be better suited to the analysis of observed traits with measured errors and to tests of genetic pleiotropy in some cases. The PCA-based multivariate regression proposed in this study can be easily extended to FA-based regression for testing of genetic pleiotropy in these cases. This can be implemented by replacing principal components with common factors. However, without estimation of random error, PCA is more computationally efficient for analyses involving large amounts of genetic data, and has great advantages in terms of practical application [32]. In most cases, PCA and FA procedures yield highly similar results [32]. This was also the case in the present study; we conducted an additional FA-based multivariate regression analysis of pleiotropic association with metabolic traits, and the results were the same as those obtained by PCBMR (please see additional file 1). This is consistent with previous findings that PCA and FA behave similarly in tests of genetic pleiotropy [33].

In spite of its potential challenges, PCBMR is a powerful and computationally efficient method of studying the huge amounts of genetic data generated by advanced technology, e.g., GWAS. For a large number of markers, we suggest a strategy of traditional single-trait studies on a candidate marker that PCBMR declares significant. This strategy can not only help to explain PCBMR results, but also has great advantages over traditional single-trait studies in alleviating multiple testing problems. Suppose there are *N* markers and *M* traits, and the experimental type I error is controlled at *α*. The significance level for tests of a marker in traditional single-trait studies is *α*/(*N* × *M*). This level is extremely small when both *N* and *M* are large. In contrast, for a candidate marker, the significance level for this strategy is *α*/(*N* + *M*). Generally, for most association studies and GWAS, *M* is much smaller than *N*, and the significance level will approximate *α*/*N*.
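The difference between the two per-test significance levels can be illustrated with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and the values *N* = 4,769 markers and *M* = 6 traits are borrowed from the Bogalusa analysis in this paper purely as example inputs.

```python
def alpha_single_trait(alpha, n_markers, n_traits):
    """Per-test level when every marker is tested against every trait."""
    return alpha / (n_markers * n_traits)


def alpha_candidate_strategy(alpha, n_markers, n_traits):
    """Per-test level for the suggested strategy: N multi-trait tests
    plus M single-trait follow-up tests on a candidate marker."""
    return alpha / (n_markers + n_traits)


# Example inputs (assumed for illustration only).
alpha, N, M = 0.05, 4769, 6

separate = alpha_single_trait(alpha, N, M)
candidate = alpha_candidate_strategy(alpha, N, M)
marker_only = alpha / N

print(f"alpha/(N*M) = {separate:.3e}")
print(f"alpha/(N+M) = {candidate:.3e}")
print(f"alpha/N     = {marker_only:.3e}")
```

Because *M* is small relative to *N* here, the candidate-marker level is close to *α*/*N*, while the all-pairs level is smaller by a factor of roughly *M*.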

In summary, we propose the use of PCBMR, a computationally efficient method for the testing of gene pleiotropy. Although PCBMR is a combination of two established methods, principal components and multivariate regression, we are the first to comprehensively evaluate this technique in its combined form. The simulation studies described here indicate that this method is powerful for different kinds of pleiotropy. In spite of some challenges for its use in practical studies, PCBMR can greatly increase the power of association studies under pleiotropy and can broaden understanding of a gene's functions as well as its pathway and mechanisms. PCBMR is not only a useful method for candidate-gene based studies; as the generation of high-throughput expression data becomes increasingly efficient, PCBMR can be used to study pleiotropy in analyses of massive amounts of data, such as GWAS.

Given a set of traits, PCBMR uses the method of principal component analysis (PCA) [34,35] to construct one or more independent canonical variables based on a specific threshold (*θ*). Suppose Y = (*Y _{1}, Y_{2},..., Y_{m}) *represents variables of

$${Y}^{S}={({V}^{1/2})}^{-1}(Y-\mu ),$$

where *μ *is the mean of *Y *and *V *is a diagonal matrix with diagonal items equal to the variances of the corresponding traits. For *Y ^{S}*,

PCA finds the weighting vector *δ = (δ ^{1}, ..., δ^{p})^{T }*that maximizes the variance of canonical variable

$$\mathrm{Var}(z)=\max_{\{\delta:\,\|\delta\|=1\}}\mathrm{Var}(\delta^{T}Y^{S})=\max_{\{\delta:\,\|\delta\|=1\}}\delta^{T}\rho\,\delta .$$

*δ *is proved to be an eigenvector of *ρ *[36]. If we use *z = [z _{1}, z_{2}, ..., z_{m}]^{T }*representing

Suppose *z _{1}, z_{2}, ...,z_{k }*have normal distributions with mean

$$f(z_{1},z_{2},\ldots,z_{k}\mid\theta_{1},\theta_{2},\ldots,\theta_{k};\varphi_{1},\varphi_{2},\ldots,\varphi_{k})=\prod_{i=1}^{k}\exp\left[\frac{z_{i}\theta_{i}-b(\theta_{i})}{a(\varphi_{i})}+c(z_{i},\varphi_{i})\right]$$

(1)

Where *θ _{i }= μ_{i}, ϕ_{i }= σ_{i}^{2}, a(ϕ_{i}) = ϕ_{i}, b(θ_{i}) = θ_{i}^{2}/2 *and

In multivariate regression, PCBMR takes the canonic link. The mean regression model is *μ _{i }= Xβ_{i}+Wτ_{i}*, where

$${\beta}_{1}={\beta}_{2}=...={\beta}_{k}=0$$

We define the full model as the one without restriction of *H _{0 }*and the nested model as the one with restriction of

$$L(\theta)=L(\theta_{1},\theta_{2},\ldots,\theta_{k}\mid\{z_{ij}\})=\prod_{i=1}^{k}\left\{\prod_{j=1}^{N}\exp\left[\frac{z_{ij}\theta_{i}-b(\theta_{i})}{a(\varphi_{i})}+c(z_{ij},\varphi_{i})\right]\right\}$$

(2)

The LRT statistic T is -2[logL($\tilde{\theta}$) - logL($\widehat{\theta}$)], where $\tilde{\theta}$ is the maximum likelihood estimate (MLE) of θ for the nested model and $\widehat{\theta}$ is the MLE of θ for the full model. When the mean regression model, *θ_{i} = μ_{i} = Xβ_{i}+Wτ_{i}*, is substituted into equation (2), the T statistic simplifies to:

$$T=\sum_{i=1}^{k}\left(\frac{\sum_{j=1}^{N}(z_{ij}-\tilde{\mu}_{i})^{2}-\sum_{j=1}^{N}(z_{ij}-\widehat{\mu}_{i})^{2}}{\widehat{\varphi}_{i}}\right)=\sum_{i=1}^{k}T_{i}$$

The mean estimates, $\widehat{\mu}_{i}$ and $\tilde{\mu}_{i}$, are calculated by linear regression of *z_{i}* on [X W] and W, respectively. $\sum_{j=1}^{N}(z_{ij}-\widehat{\mu}_{i})^{2}$ and $\sum_{j=1}^{N}(z_{ij}-\tilde{\mu}_{i})^{2}$ are the deviances of the full and nested models, respectively, and $\widehat{\varphi}_{i}=\widehat{\sigma}_{i}^{2}$ is the estimate of dispersion, all of which can be calculated by almost all statistical packages.
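The steps above (standardization, principal components, per-component regressions, and summation of the individual statistics) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `pcbmr_statistic` and `_rss`, the argument layout, the estimation of the dispersion from the full-model residual variance, and the returned degrees of freedom (number of components times number of marker columns, for a chi-square reference) are all assumptions made for the example.

```python
import numpy as np


def _rss(z, design):
    """Residual sum of squares from least-squares regression of z on design."""
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ coef
    return float(resid @ resid)


def pcbmr_statistic(Y, X, W, threshold=0.8):
    """Return (T, k, df) for a PCA-based multivariate regression test.

    Y: (N, m) traits; X: (N,) or (N, q) marker columns;
    W: (N, r) covariates, including an intercept column.
    """
    Y = np.asarray(Y, dtype=float)
    n, m = Y.shape
    X = np.asarray(X, dtype=float).reshape(n, -1)
    W = np.asarray(W, dtype=float).reshape(n, -1)

    # Standardize each trait: subtract the mean, divide by the standard deviation.
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

    # Correlation matrix of the traits and its eigen-decomposition.
    rho = Ys.T @ Ys / (n - 1)
    eigval, eigvec = np.linalg.eigh(rho)
    order = np.argsort(eigval)[::-1]          # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Keep the smallest number of components whose cumulative share of
    # variance reaches the threshold (0.8 in the simulations).
    explained = np.cumsum(eigval) / eigval.sum()
    k = min(int(np.searchsorted(explained, threshold)) + 1, m)
    Z = Ys @ eigvec[:, :k]                    # canonical variables

    full = np.hstack([X, W])                  # full model: marker + covariates
    T = 0.0
    for i in range(k):
        rss_full = _rss(Z[:, i], full)
        rss_nested = _rss(Z[:, i], W)         # nested model: covariates only
        phi_hat = rss_full / (n - full.shape[1])   # assumed dispersion estimate
        T += (rss_nested - rss_full) / phi_hat
    return T, k, k * X.shape[1]
```

For the general (GEN) model, `X` would hold two genotype indicator columns; for the additive, dominant, and recessive models it would hold a single coded column.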

The power of PCBMR may depend on many factors, including: 1) the extent of the QTL pleiotropic effect; 2) the extent of LD between the tested marker and the pleiotropic QTL; 3) the portion of the trait correlation contributed by the tested QTL relative to the portion contributed by other QTL and environmental factors; and 4) the number of traits in the study. For each simulation, 1,000 datasets were generated. Type I error and power were calculated as the percentage of datasets with *p*-value ≤ 0.05. Without loss of generality, in the following design, the QTL is simulated with additive effects on different traits. *Y_{1}, Y_{2}, ..., Y_{k}* are original QTL traits,

The minor allele frequency of QTL is *0.2 *(*p = 0.2*), and simple linear regression models, *Y _{1 }*=

In this situation, the QTL (*p _{1 }= 0.2*) is not known directly. Instead, a marker of minor allele frequency

Two linear regression models, *Y _{1 }*=

$$\begin{array}{l}\rho ({Y}_{1},{Y}_{2})=\frac{Cov({Y}_{1},{Y}_{2})}{\sqrt{\mathrm{var}({Y}_{1})}\sqrt{\mathrm{var}({Y}_{2})}}\\ \begin{array}{cc}\begin{array}{cc}& \end{array}& \end{array}=\frac{{b}_{1}{b}_{2}\mathrm{var}(X)+{c}_{1}{c}_{2}\mathrm{var}(Q)+{d}_{1}{d}_{2}\mathrm{var}(W)}{\sqrt{\mathrm{var}({Y}_{1})}\sqrt{\mathrm{var}({Y}_{2})}}\end{array}$$

The proportion of the correlation contributed by QTL *X*, *P _{ρ}(b)*, is

$$\begin{array}{l}{P}_{\rho}(b)=\frac{{b}_{1}{b}_{2}\mathrm{var}(X)}{{b}_{1}{b}_{2}\mathrm{var}(X)+{c}_{1}{c}_{2}\mathrm{var}(Q)+{d}_{1}{d}_{2}\mathrm{var}(W)}\\ \begin{array}{cc}& \end{array}=\frac{0.32{b}^{2}}{0.32{b}^{2}+21.12}\end{array},$$

(3)

so *P _{ρ}(b) *increases as
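Equation (3) can be evaluated directly to see how the QTL's share of the trait correlation grows with the effect size. The sketch below is an illustration only: the function name is hypothetical, it assumes equal effects *b_1 = b_2 = b* as implied by the *b²* term, and it takes the constant 21.12 from equation (3) as given. The coefficient 0.32 is consistent with var(*X*) = 2*p*(1 − *p*) for an allele frequency of *p* = 0.2 under Hardy-Weinberg equilibrium.

```python
def qtl_correlation_share(b, var_x=0.32, other=21.12):
    """Proportion of the trait correlation contributed by QTL X, equation (3).

    var_x: variance of the genotype count, 2 * p * (1 - p) with p = 0.2.
    other: combined contribution of the remaining terms, taken from equation (3).
    """
    qtl_term = var_x * b ** 2
    return qtl_term / (qtl_term + other)


# Evaluate the share over a few effect sizes (values chosen for illustration).
for b in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"b = {b:.1f}: share = {qtl_correlation_share(b):.4f}")
```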

Based on the linear regression model, *Y _{i }*=

Power and type I error were estimated for PCBMR under the four simulation conditions. For comparison, we conducted single-trait association studies using classical linear regression with (SATB) and without (SATN) Bonferroni adjustment. For single-trait association studies, only the trait with the largest power or type I error is presented in the paper. Based on different assumptions of the genetic models, there are four possible ways of processing the *X* variable for genotypes, which take values 0, 1 and 2: 1) *X* is treated as a factor with three levels for the general model (GEN) without assumption of any genetic inheritance; 2) *X* is a linear variable in the additive model (ADD); 3) *X* is 0 for genotypes 0 and 1, and is 1 for genotype 2 in the dominant model (DOM); and 4) *X* is 0 for genotype 0 and is 1 for genotypes 1 and 2 in the recessive model (REC). All four assumptions were considered separately for association tests by PCBMR and single trait regression.
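The four genotype codings can be written as a small helper. This is a sketch that follows the coding as stated in the text, reading the recessive rule as 0 for genotype 0 and 1 for genotypes 1 and 2; the function name and the choice of genotype 0 as the baseline level for the general model are assumptions. Which pattern is labeled dominant or recessive depends on which allele the genotype count refers to, which is not stated here.

```python
def code_genotype(g, model):
    """Return the regression column(s) for genotype g (0, 1, or 2)."""
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if model == "GEN":
        # Three-level factor: two indicators, genotype 0 as the baseline.
        return [int(g == 1), int(g == 2)]
    if model == "ADD":
        # Linear in the genotype value.
        return [g]
    if model == "DOM":
        # As stated in the text: 0 for genotypes 0 and 1, 1 for genotype 2.
        return [int(g == 2)]
    if model == "REC":
        # As stated in the text: 0 for genotype 0, 1 for genotypes 1 and 2.
        return [int(g >= 1)]
    raise ValueError("model must be GEN, ADD, DOM, or REC")


# Tabulate the coding for every genotype under every model.
for model in ("GEN", "ADD", "DOM", "REC"):
    print(model, [code_genotype(g, model) for g in (0, 1, 2)])
```

The general model spends two degrees of freedom per marker, whereas the other three spend one, which is one reason its power differs from theirs in the simulations.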

Without loss of generality, we created indicator variables *M _{1 }*and

We applied PCBMR to search for markers associated with multiple traits related to abdominal obesity-metabolic syndrome in the Bogalusa Heart Study, a community-based investigation of the evolution of cardiovascular disease risk beginning in childhood [20]. Based on previous studies [18], we focused our studies on six traits (body mass index (BMI), waist circumference (WAIST), hip circumference (HIP), weight (WEIGHT), insulin (INSULIN) and insulin/glucose (I/G)) and on chromosome 3 from 182-227 cM (173.4-198.8 Mb), which contains potential pleiotropic QTL [18,19]. The most recent measures were used for all subjects. SNP genotyping was performed using data from Illumina Human610 BeadChips. Only SNPs passing our quality control measures were included in the study. BMI, WAIST, HIP and WEIGHT traits have a linkage peak at 189-190 cM, and insulin and I/G at 202-203 cM [18]. Hence, associations with the multiple traits of BMI, WAIST, HIP, and WEIGHT and of INSULIN and I/G were separately studied by PCBMR. These traits may depend on sex and age. Instead of analyzing original traits directly, traits were regressed by sex and age according to the following formula: *Y _{i }= U+AGE*b_{1}+AGE^{2}*b_{2}+SEX+E_{i}*, where residuals (
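The covariate adjustment described above amounts to regressing each trait on age, age squared, and sex, and carrying the residuals forward. A minimal NumPy sketch follows; the function name is hypothetical, and coding sex as a 0/1 indicator with its own fitted coefficient is an assumption, since the formula in the text does not spell out how the SEX term enters.

```python
import numpy as np


def adjust_trait(y, age, sex):
    """Return residuals of y after regression on intercept, age, age^2, and sex."""
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)      # assumed 0/1 indicator
    design = np.column_stack([np.ones_like(age), age, age ** 2, sex])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef
```

Each of the six traits would be adjusted separately in this way before the principal components are computed.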

HM developed and implemented the method. HM and WC performed the simulations, analysis and interpretation of the data. All authors participated in planning and discussion of the study. All authors read and approved the final manuscript.

**Factor analysis-based study of pleiotropic association. **Table of significant pleiotropic association and figure of p-values of SNPs in linkage region.


This study was supported by grants 0855082E and 0555168B from the American Heart Association, AG-16592 from the National Institute on Aging and HL-38844 from the National Heart, Lung, and Blood Institute.

Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/

- Jiang C, Zeng ZB. Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995;140(3):1111–1127. [PubMed]
- Korol AB, Ronin YI, Kirzhner VM. Interval mapping of quantitative trait loci employing correlated trait complexes. Genetics. 1995;140(3):1137–1147. [PubMed]
- Mangin B, Thoquet P, Grimsley N. Pleiotropic QTL analysis. Biometrics. 1998;54:88–99. doi: 10.2307/2533998. [Cross Ref]
- Calinski T, Kaczmarek Z, Krajewski P, Frova C, Sari-Gorla M. A multivariate approach to the problem of QTL localization. Heredity. 2000;84(Pt 3):303–310. doi: 10.1046/j.1365-2540.2000.00675.x. [PubMed] [Cross Ref]
- Hackett CA, Meyer RC, Thomas WT. Multi-trait QTL mapping in barley using multivariate regression. Genet Res. 2001;77(1):95–106. doi: 10.1017/S0016672300004869. [PubMed] [Cross Ref]
- Knott SA, Haley CS. Multitrait least squares for quantitative trait loci detection. Genetics. 2000;156(2):899–911. [PubMed]
- Korol AB, Ronin YI, Nevo E, Hayes PM. Multi-interval mapping of correlated trait complexes. Heredity. 1998;80(3):273–284. doi: 10.1046/j.1365-2540.1998.00253.x. [Cross Ref]
- Weller JI, Wiggans GR, Vanraden PM, Ron M. Application of a canonical transformation to detection of quantitative trait loci with the aid of genetic markers in a multi-trait experiment. Theor Appl Genet. 1996;92:998–1002. doi: 10.1007/BF00224040. [PubMed] [Cross Ref]
- Mackay TF. The genetic architecture of quantitative traits. Annu Rev Genet. 2001;35:303–339. doi: 10.1146/annurev.genet.35.102401.090633. [PubMed] [Cross Ref]
- Lange C, van Steen K, Andrew T, Lyon H, DeMeo DL, Raby B, Murphy A, Silverman EK, MacGregor A, Weiss ST. et al. A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Stat Appl Genet Mol Biol. 2004;3 Article17. [PubMed]
- Klei L, Luca D, Devlin B, Roeder K. Pleiotropy and principal components of heritability combine to increase power for association analysis. Genet Epidemiol. 2008;32(1):9–19. doi: 10.1002/gepi.20257. [PubMed] [Cross Ref]
- Stich B, Piepho HP, Schulz B, Melchinger AE. Multi-trait association mapping in sugar beet (Beta vulgaris L.) Theor Appl Genet. 2008;117(6):947–954. doi: 10.1007/s00122-008-0834-z. [PubMed] [Cross Ref]
- Bjorntorp P. Metabolic implications of body fat distribution. Diabetes Care. 1991;14(12):1132–1143. doi: 10.2337/diacare.14.12.1132. [PubMed] [Cross Ref]
- Haffner SM, Valdez RA, Hazuda HP, Mitchell BD, Morales PA, Stern MP. Prospective analysis of the insulin-resistance syndrome (syndrome X) Diabetes. 1992;41(6):715–722. doi: 10.2337/diabetes.41.6.715. [PubMed] [Cross Ref]
- Isomaa B, Almgren P, Tuomi T, Forsen B, Lahti K, Nissen M, Taskinen MR, Groop L. Cardiovascular morbidity and mortality associated with the metabolic syndrome. Diabetes Care. 2001;24(4):683–689. doi: 10.2337/diacare.24.4.683. [PubMed] [Cross Ref]
- Srinivasan SR, Myers L, Berenson GS. Changes in metabolic syndrome variables since childhood in prehypertensive and hypertensive subjects: the Bogalusa Heart Study. Hypertension. 2006;48(1):33–39. doi: 10.1161/01.HYP.0000226410.11198.f4. [PubMed] [Cross Ref]
- Esposito K, Pontillo A, Giugliano F, Giugliano G, Marfella R, Nicoletti G, Giugliano D. Association of low interleukin-10 levels with the metabolic syndrome in obese women. J Clin Endocrinol Metab. 2003;88(3):1055–1058. doi: 10.1210/jc.2002-021437. [PubMed] [Cross Ref]
- Kissebah AH, Sonnenberg GE, Myklebust J, Goldstein M, Broman K, James RG, Marks JA, Krakower GR, Jacob HJ, Weber J. et al. Quantitative trait loci on chromosomes 3 and 17 influence phenotypes of the metabolic syndrome. Proc Natl Acad Sci USA. 2000;97(26):14478–14483. doi: 10.1073/pnas.97.26.14478. [PubMed] [Cross Ref]
- Francke S, Manraj M, Lacquemant C, Lecoeur C, Lepretre F, Passa P, Hebe A, Corset L, Yan SL, Lahmidi S. et al. A genome-wide scan for coronary heart disease suggests in Indo-Mauritians a susceptibility locus on chromosome 16p13 and replicates linkage with the metabolic syndrome on 3q27. Hum Mol Genet. 2001;10(24):2751–2765. doi: 10.1093/hmg/10.24.2751. [PubMed] [Cross Ref]
- Pickoff AS, Berenson GS, Schlant RC. Introduction to the symposium celebrating the Bogalusa Heart Study. Am J Med Sci. 1995;310(Suppl 1):S1–2. [PubMed]
- Vasseur F, Helbecque N, Dina C, Lobbens S, Delannoy V, Gaget S, Boutin P, Vaxillaire M, Lepretre F, Dupont S. et al. Single-nucleotide polymorphism haplotypes in the both proximal promoter and exon 3 of the APM1 gene modulate adipocyte-secreted adiponectin hormone levels and contribute to the genetic risk for type 2 diabetes in French Caucasians. Hum Mol Genet. 2002;11(21):2607–2614. doi: 10.1093/hmg/11.21.2607. [PubMed] [Cross Ref]
- Filippi E, Sentinelli F, Trischitta V, Romeo S, Arca M, Leonetti F, Di Mario U, Baroni MG. Association of the human adiponectin gene and insulin resistance. Eur J Hum Genet. 2004;12(3):199–205. doi: 10.1038/sj.ejhg.5201120. [PubMed] [Cross Ref]
- Menzaghi C, Ercolino T, Di Paola R, Berg AH, Warram JH, Scherer PE, Trischitta V, Doria A. A haplotype at the adiponectin locus is associated with obesity and other features of the insulin resistance syndrome. Diabetes. 2002;51(7):2306–2312. doi: 10.2337/diabetes.51.7.2306. [PubMed] [Cross Ref]
- Vimaleswaran KS, Radha V, Ramya K, Babu HN, Savitha N, Roopa V, Monalisa D, Deepa R, Ghosh S, Majumder PP. et al. A novel association of a polymorphism in the first intron of adiponectin gene with type 2 diabetes, obesity and hypoadiponectinemia in Asian Indians. Hum Genet. 2008;123(6):599–605. doi: 10.1007/s00439-008-0506-8. [PubMed] [Cross Ref]
- Tominaga K, Kondo C, Johmura Y, Nishizuka M, Imagawa M. The novel gene fad104, containing a fibronectin type III domain, has a significant role in adipogenesis. FEBS Lett. 2004;577(1-2):49–54. doi: 10.1016/j.febslet.2004.09.062. [PubMed] [Cross Ref]
- Thorleifsson G, Walters GB, Gudbjartsson DF, Steinthorsdottir V, Sulem P, Helgadottir A, Styrkarsdottir U, Gretarsdottir S, Thorlacius S, Jonsdottir I. et al. Genome-wide association yields new sequence variants at seven loci that associate with measures of obesity. Nat Genet. 2009;41(1):18–24. doi: 10.1038/ng.274. [PubMed] [Cross Ref]
- Andersen G, Burgdorf KS, Sparso T, Borch-Johnsen K, Jorgensen T, Hansen T, Pedersen O. AHSG tag single nucleotide polymorphisms associate with type 2 diabetes and dyslipidemia: studies of metabolic traits in 7,683 white Danish subjects. Diabetes. 2008;57(5):1427–1432. doi: 10.2337/db07-0558. [PubMed] [Cross Ref]
- Emigh TH. A Comparison of Tests for Hardy-Weinberg Equilibrium. Biometrics. 1980;36(4):627–642. doi: 10.2307/2556115. [PubMed] [Cross Ref]
- Faraway JJ. Extending Linear Models with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Boca Raton: Chapman & Hall/CRC; 2006.
- Gardner KM, Latta RG. Shared quantitative trait loci underlying the genetic correlation between continuous traits. Mol Ecol. 2007;16(20):4195–4209. doi: 10.1111/j.1365-294X.2007.03499.x. [PubMed] [Cross Ref]
- Larose DT. Data mining methods and models. Hoboken, New Jersey: John Wiley & Sons, Inc; 2006.
- Velicer WF, Jackson DN. Component Analysis versus Common Factor Analysis: Some Issues in Selecting an Appropriate Procedure. Multivariate Behavioral Research. 1990;25(1):28.
- Wang X, Kammerer CM, Anderson S, Lu J, Feingold E. A comparison of principal component analysis and factor analysis strategies for uncovering pleiotropic factors. Genet Epidemiol. 2009;33(4):325–331. doi: 10.1002/gepi.20384. [PMC free article] [PubMed] [Cross Ref]
- Hotelling H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology. 1933;24:417–441. doi: 10.1037/h0071325. [Cross Ref]
- Jolliffe IT. Principal Component Analysis. 2. New York: Springer; 2002.
- Härdle W, Simar L. Applied Multivariate Statistical Analysis. 2. New York: Springer; 2007.
- Mardia KV, Kent JT, Bibby JM. Multivariate Analysis. London: Academic Press; 1979.
- Weir BS. Genetic Data Analysis 2: Methods for Discrete Population Genetic Data. 2. Sinauer Associates, Sunderland, MA; 1996.
- Agresti A. Categorical Data Analysis. 2. New Jersey: John Wiley & Sons, Inc; 2002.

Articles from BMC Genetics are provided here courtesy of **BioMed Central**