Multicollinearity arises naturally in genetic regression analysis with functional models: the additive and dominance components are correlated, and the problem becomes more complicated between main effects and interaction effects when two or more genes or environmental factors are involved. When multicollinearity is present, standard errors can become large, so coefficients must be very large to reach statistical significance. In the NOIA framework, we solve the collinearity problem by orthogonalizing the dominance regressor with respect to the additive regressor, so as to keep the natural meaning of the coefficient of the additive regressor, i.e. the effect of allele substitution. As a result, the NOIA statistical and functional models have identical additive regressors (i.e. the number of variant alleles) and dominance coefficients, but different additive coefficients and dominance regressors. This strategy is the same as the Gram-Schmidt process in mathematics for orthogonalizing a set of vectors. The orthogonalizing procedure assigns all the variance shared by the additive and dominance components in the functional model to the additive component in the statistical model, and thus usually increases the power to detect additive effects. Our simulations and real-data analysis confirmed this expectation: the statistical model usually showed higher power to detect main and/or interaction effects, both in linear regression for quantitative traits and in logistic regression for binary traits.
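The orthogonalization described above can be illustrated with a minimal sketch. It is not the exact NOIA parametrization: it simply takes the residual of a heterozygote indicator after projecting it on the intercept and the allele count, weighted by genotype frequencies. The allele frequency, the Hardy-Weinberg proportions, the effect sizes, and all function names are hypothetical choices made for illustration.

```python
# Sketch: one Gram-Schmidt step (no normalization) that makes a dominance
# regressor uncorrelated with the additive regressor. Genotype frequencies
# act as weights, so all quantities are exact population values.

def weighted_mean(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def weighted_cov(x, y, w):
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))

def orthogonalize(dom, add, w):
    """Residual of `dom` after projection on the intercept and `add`."""
    slope = weighted_cov(dom, add, w) / weighted_cov(add, add, w)
    m_dom, m_add = weighted_mean(dom, w), weighted_mean(add, w)
    return [d_i - m_dom - slope * (a_i - m_add) for d_i, a_i in zip(dom, add)]

# Hypothetical inputs (assumptions, not values from the study).
p = 0.3                              # variant allele frequency
q = 1.0 - p
freqs = [q * q, 2 * p * q, p * p]    # Hardy-Weinberg proportions, assumed
n_variant = [0, 1, 2]                # additive regressor: number of variant alleles
het = [0, 1, 0]                      # functional dominance regressor
a, d = 1.0, 0.5                      # functional additive and dominance effects

# Genotypic values under the functional model (reference value set to 0).
geno_value = [a * n + d * h for n, h in zip(n_variant, het)]

dom_orth = orthogonalize(het, n_variant, freqs)

# The regressors are now uncorrelated, so each coefficient is a simple ratio.
alpha = weighted_cov(geno_value, n_variant, freqs) / weighted_cov(n_variant, n_variant, freqs)
delta = weighted_cov(geno_value, dom_orth, freqs) / weighted_cov(dom_orth, dom_orth, freqs)

print("cov(additive, orthogonalized dominance):", weighted_cov(dom_orth, n_variant, freqs))
print("statistical additive coefficient:", alpha)
print("dominance coefficient:", delta)
```

In this sketch the dominance coefficient is unchanged by the orthogonalization, while the additive coefficient absorbs the part of the dominance effect that is correlated with the allele count.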
However, caution must be exercised in interpreting the results of the statistical model. Specifically, the meaning of the additive effect in the statistical model (α) differs from that in the functional model (a). The statistical effect, α, is determined not only by the true additive effect, a, but also by the dominance effect, d, and the allele frequency. Nevertheless, tests for α and for a both give information on whether a genetic factor exists for a quantitative trait or for the risk of a disease. Our results, shown in the figures, indicate that transforming the parameters of the usual functional model into those of the statistical model leads to a more powerful test for the existence of a genetic factor while allowing for a dominance effect and a GxE interaction.
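The dependence of α on a, d, and the allele frequency can be made concrete in a simplified special case. The derivation below is a sketch under assumptions that are not taken from the text: a single biallelic locus in Hardy-Weinberg equilibrium, variant allele frequency p with q = 1 - p, allele count N, heterozygote indicator H, and functional genotypic value G = R + aN + dH. Because the orthogonalized dominance regressor is uncorrelated with N, α equals the slope of the simple regression of G on N.

```latex
\begin{align}
\operatorname{Var}(N) &= 2pq, \\
\operatorname{Cov}(N, H) &= \mathrm{E}[NH] - \mathrm{E}[N]\,\mathrm{E}[H]
                          = 2pq - (2p)(2pq) = 2pq\,(q - p), \\
\alpha &= \frac{\operatorname{Cov}(G, N)}{\operatorname{Var}(N)}
        = \frac{a \cdot 2pq + d \cdot 2pq\,(q - p)}{2pq}
        = a + d\,(q - p).
\end{align}
```

Under these assumptions α coincides with a only when d = 0 or p = 0.5, and α can be nonzero even when a = 0, which is consistent with the test for α being informative about the existence of a genetic factor.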
Some important properties of the NOIA framework for linear regression of quantitative traits are not always valid for logistic regression of qualitative traits, when we generalize the statistical model to the latter case by treating the logit of the disease risk as the genotypic value and defining genetic effects on that scale. Under the alternative model, when there is an association between the genotypes or environmental factors, the estimates of the logistic regression parameters are no longer uncorrelated. Also, under the alternative model, the main effects of a full interaction model are not the same as the corresponding main effects of the reduced single-gene model or the environment-only model. Nevertheless, we still advocate applying the statistical model to case-control data, because it is more powerful in most cases.
Application of the NOIA statistical model to the ILCCO data confirmed the associations of the following loci with lung cancer through main effects: rs2736100, rs402710, rs16969968, and rs8034191. The main effects of these loci under the usual functional model were not significant (or had larger P values) when allowing for gene-smoking interaction. Furthermore, the gene-smoking interaction was more significant under the statistical model for loci rs2256543, rs16969968, and rs8034191. Specifically, the statistical model revealed that the locus rs2256543 plays a role in the development of lung cancer through interaction with smoking, but not through a main effect.
Finally, the advantage of the statistical model over the usual functional model is not limited to the study of interaction effects. We propose that even for one-locus genetic analysis, such as GWAS, one should consider applying the statistical model, since it orthogonalizes the additive and dominance effects and hence improves the power to detect genetic effects. Although the genetic effects in the statistical model are usually determined not only by the biological mechanisms but also by population properties, the genetic effects can be properly interpreted through the transformations established in the NOIA model.