Association studies are widely used to identify genetic variants underlying complex human diseases, such as osteoporosis, obesity and diabetes. Association studies can generally be divided into two classes: single locus association studies (SLAS) and multiple loci association studies (MLAS). SLAS test for association between each individual locus and the target trait. Because they are simple to implement, SLAS are popular in current association mapping of disease genes. However, SLAS have several limitations. First, the performance of SLAS depends largely on the linkage disequilibrium (LD) between the tested loci and the potential causal loci; SLAS may have low power if this LD is weak. Second, it is well known that the risks of complex human diseases are usually determined by the main and interactive effects of multiple genetic and environmental factors. Because SLAS test each locus individually, it is difficult to detect genetic interaction effects with SLAS. Third, association studies usually require a multiple testing adjustment procedure, such as Bonferroni correction or control of the false discovery rate, to keep the overall type I error rate at an appropriate level. These adjustment procedures are sometimes too strict and may miss real disease-gene associations in large-scale SLAS.
The limitations of SLAS have motivated the development of MLAS approaches. Because MLAS simultaneously consider the genetic information of multiple loci, they are expected to be more powerful than SLAS in disease gene mapping. Multilinear regression is one of the major multivariate analysis approaches and has been applied to MLAS. In multilinear regression, target trait values are modeled as a function of a vector of independent variables corresponding to the genotypes of multiple loci in a candidate genetic region. Because of the large degrees of freedom (dfs) of the resulting statistical tests, it is difficult to apply multilinear regression directly to large genetic regions. Previous studies found that multilinear regression had similar or reduced power relative to SLAS in disease gene mapping. The power gained from combining the genetic information of multiple loci may be offset by the increased dfs in multilinear regression. Additionally, the genotypes of densely spaced loci are usually correlated due to LD, which may induce collinearity among genotype vectors and decrease the power of multilinear regression for MLAS.
Several methods have been proposed to deal with the large dfs in multilinear regression. The first is tagSNPs-based multilinear regression. A set of tagSNPs that captures the majority of the genetic information in a candidate genetic region, with no or only weak collinearity among them, can be selected and included in the multilinear regression for MLAS. Although selecting tagSNPs decreases the dfs in multilinear regression, it results in the loss of genetic information and therefore decreases the power of MLAS, especially in genetic regions with weak LD. Additionally, the power of tagSNPs-based association studies is affected by the performance of the tagSNPs selection method. The second method applies dimension reduction techniques, such as principal component analysis (PCA) and Fourier transformation, to the genotype data to produce a set of orthogonal predictors that capture the majority of the genetic information in the candidate genetic region. One can then test for association between the extracted orthogonal predictors and the target trait using multilinear regression. Besides the multilinear regression-based MLAS approaches mentioned above, other MLAS approaches are also available, such as genetic similarity-based MLAS and Bayesian-based MLAS.
Recently, Taylor and Tibshirani proposed the tail strength measure (TSM) for assessing the overall significance of multiple hypothesis tests in microarray studies. Using simulated and real microarray datasets, Taylor and Tibshirani illustrated the performance of TSM and suggested that it could be used to assess overall significance in microarray and other genetic studies involving many hypothesis tests. TSM may therefore be able to evaluate the overall association strength of multiple loci in association studies. However, the performance of TSM for MLAS remains unclear.
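As a rough illustration of how a tail strength statistic summarizes a set of tests, the sketch below computes TS = (1/m) Σ_k [1 − p_(k) (m + 1)/k] from the ordered p-values p_(1) ≤ ... ≤ p_(m), which is the form of the measure as we understand it from Taylor and Tibshirani. The function name is hypothetical, and the sketch is not the TSM-based MLAS procedure evaluated in this paper.

```python
def tail_strength(p_values):
    """Tail strength of a collection of p-values.

    Under the global null hypothesis with independent tests, the k-th
    smallest of m p-values has expectation k / (m + 1), so each term
    1 - p_(k) * (m + 1) / k has mean zero. Values well above zero
    suggest that the small p-values are smaller than expected by chance.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    ordered = sorted(p_values)
    total = 0.0
    for k, p in enumerate(ordered, start=1):
        total += 1.0 - p * (m + 1) / k
    return total / m
```

For example, p-values that sit exactly at their null expectations, such as `[1/3, 2/3]` for m = 2, give a tail strength of zero, while a set of p-values that are all zero gives the maximum value of one. In an MLAS setting, the inputs would be the single-locus p-values of the loci in a candidate region.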
In this paper, we present an MLAS approach based on partial least-squares (PLS) analysis that avoids large dfs. As an extension of multiple linear regression, PLS generalizes and combines the features of PCA and multilinear regression. By maximizing the covariance between dependent and independent variables, PLS searches for components that capture the majority of the information contained in the independent variables as well as in the relations between the dependent and independent variables. In the Materials and Methods section, we first formulate our PLS-based MLAS. Using simulated data based on real data from the HapMap project, we show that PLS-based MLAS is simple to implement and generally provides improved power in disease gene mapping relative to tagSNPs-based MLAS, PCA-based MLAS and TSM-based MLAS. Finally, a real dataset is used to assess the performance of PLS-based MLAS for genome-wide MLAS.
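The covariance-maximizing idea behind PLS can be sketched for the simplest case of a single quantitative trait. For column-centered genotypes X and a centered trait y, the unit-length weight vector w that maximizes the covariance between Xw and y is proportional to Xᵀy, and the first PLS component is the score vector t = Xw. The code below is a textbook illustration under these assumptions, with hypothetical function and variable names and additive 0/1/2 genotype coding; it is not the formulation given in Materials and Methods.

```python
import math


def first_pls_component(genotypes, trait):
    """Return (weights, scores) of the first PLS component for one trait.

    genotypes: list of n rows, each a list of p numeric genotype codes.
    trait:     list of n trait values.
    """
    n = len(genotypes)
    p = len(genotypes[0])
    # Center each genotype column and the trait.
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    y_mean = sum(trait) / n
    y = [v - y_mean for v in trait]
    # The weight vector is proportional to X^T y (covariance with the trait).
    w = [sum(x[i][j] * y[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("no locus covaries with the trait")
    w = [v / norm for v in w]
    # Scores: projection of each individual's genotypes onto w.
    t = [sum(x[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t
```

The single score vector `t` can then be tested for association with the trait in place of the p original genotype columns, which is how a component-based approach keeps the dfs of the test small. Loci that covary more strongly with the trait receive larger weights, in contrast to PCA, whose components are chosen without reference to the trait.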