Hum Hered. Author manuscript; available in PMC Aug 9, 2013. Published in final edited form as: | PMCID: PMC3534768 NIHMSID: NIHMS411034 |

Natural and Orthogonal Interaction framework for modeling gene-environment interactions with application to lung cancer

Jianzhong Ma,^{a} Feifei Xiao,^{a} Momiao Xiong,^{b} Angeline S Andrew,^{c} Hermann Brenner,^{d} Eric J. Duell,^{e} Aage Haugen,^{f} Clive Hoggart,^{g} Rayjean J. Hung,^{h} Philip Lazarus,^{i} Changlu Liu,^{a} Keitaro Matsuo,^{j} Jose Ignacio Mayordomo,^{k} Ann G. Schwartz,^{l} Andrea Staratschek-Jox,^{m} Erich Wichmann,^{n} Ping Yang,^{o} and Christopher I. Amos^{a}

^{a}Department of Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA

^{b}Human Genetics Center, University of Texas School of Public Health, Houston, Texas, USA

^{c}Department of Community and Family Medicine, Norris Cotton Cancer Center, Dartmouth Medical School, Lebanon, NH, USA

^{d}Division of Clinical Epidemiology and Aging Research, German Cancer Research Center, Im Neuenheimer Feld 581, 69120, Heidelberg, Germany

^{e}Unit of Nutrition, Environment and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology (ICO-IDIBELL), Barcelona, Spain

^{f}The National Institute of Occupational Health, P.O. Box 8149 Dep. N-0033 Oslo 1, Norway

^{g}Epidemiology Unit, London School of Hygiene and Tropical Medicine, UK

^{h}Samuel Lunenfeld Research Institute, 60 Murray St. Toronto ON M5T 3L9 Canada

^{i}Departments of Pharmacology and Public Health Sciences, Penn State College of Medicine

^{j}Division of Epidemiology and Prevention, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya 464-8681, Japan

^{k}Servicio de Oncologia Medica, Hospital Clinico Universitario, Av. San Juan Bosco, 15 50009 Zaragoza, Spain

^{l}Karmanos Cancer Institute and Wayne State University School of Medicine, Department of Oncology, 4100 John R, Detroit, MI 48201, USA

^{m}Life and Medical Sciences Bonn, Genomics and Immunoregulation, University of Bonn, Bonn, Germany

^{n}Helmholtz Zentrum Munchen, Deutsches Forschungszentrum fur Gesundheit und Umwelt (GmbH), Ingolstadter Landstr. 1, 85764 Neuherberg, Germany

^{o}Mayo Clinic Cancer Center, 200 First Street SW, Rochester, MN 55905, USA

In genetic regression analysis using functional models, multicollinearity arises naturally between the additive and dominance components, and it becomes more complicated between main effects and interaction effects when two or more genes or environmental factors are involved. When multicollinearity is present, standard errors can become large, so coefficients must be very large to reach statistical significance. In the NOIA framework, we solve the collinearity problem by orthogonalizing the dominance regressor with respect to the additive regressor, which preserves the *natural* meaning of the coefficient of the additive regressor, i.e. the effect of allele substitution. As a result, the NOIA statistical and functional models share the same additive regressor (i.e. the number of variant alleles) and the same dominance coefficient, but differ in the additive coefficient and the dominance regressor. This strategy is the same as the Gram-Schmidt process in mathematics for orthonormalizing a set of vectors. The orthogonalization assigns all of the variance shared by the additive and dominance components in the functional model to the additive component in the statistical model, which usually increases the power to detect additive effects. Our simulations and real-data analysis confirmed this expectation: the statistical model usually showed higher power to detect main and/or interaction effects, both in linear regression for quantitative traits and in logistic regression for binary traits.
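The orthogonalization described above can be illustrated with a minimal sketch. It assumes a single biallelic locus, a functional coding in which the additive regressor is the variant-allele count (0, 1, 2) and the dominance regressor is a heterozygote indicator, and simulated data; the function names (`orthogonalize_dominance`, `fit_ols`, `demo`) and all parameter values are hypothetical and not taken from the NOIA software. The Gram-Schmidt step is applied with sample means and without the final normalization.

```python
import numpy as np


def orthogonalize_dominance(n_variant):
    """Build additive and dominance regressors and orthogonalize the latter."""
    n = np.asarray(n_variant, dtype=float)
    dom = (n == 1).astype(float)        # functional dominance regressor (assumed coding)
    add = n - n.mean()                  # additive regressor: centered variant-allele count
    dom_centered = dom - dom.mean()
    # Projection coefficient of the dominance regressor on the additive one.
    slope = (add @ dom_centered) / (add @ add)
    # Gram-Schmidt step: remove the part of dominance explained by additive.
    dom_orth = dom_centered - slope * add
    return add, dom, dom_orth, slope


def fit_ols(columns, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def demo(seed=0, size=5000, p=0.3, a=0.5, d=0.4):
    """Simulate a quantitative trait (illustrative values) and fit both codings."""
    rng = np.random.default_rng(seed)
    n = rng.binomial(2, p, size=size).astype(float)
    add, dom, dom_orth, slope = orthogonalize_dominance(n)
    y = 1.0 + a * n + d * dom + rng.normal(size=size)
    beta_f = fit_ols([n, dom], y)           # functional-style coding
    beta_s = fit_ols([add, dom_orth], y)    # orthogonalized coding
    return {
        "corr_functional": np.corrcoef(n, dom)[0, 1],
        "inner_statistical": add @ dom_orth,
        "a_hat": beta_f[1],
        "d_hat": beta_f[2],
        "alpha_hat": beta_s[1],
        "delta_hat": beta_s[2],
        "slope": slope,
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.4f}")
```

In this sketch the two fits span the same column space, so the dominance coefficient is unchanged, while the additive coefficient of the orthogonalized fit equals `a_hat + slope * d_hat`. This mirrors the statement that the two models share the dominance coefficient but differ in the additive coefficient.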

However, caution is needed in interpreting the results of the statistical model. Specifically, the additive effect in the statistical model (*α*) has a different meaning from that in the functional model (*a*). The statistical effect, *α*, is determined not only by the true additive effect, *a*, but also by the dominance effect, *d*, and the allele frequency. Nevertheless, tests for *α* and for *a* both give information on whether a genetic factor exists for a quantitative trait or for the risk of a disease. Our results, shown in the figures, indicated that transforming the parameters of the usual functional model into those of the statistical model leads to a more powerful test for the existence of a genetic factor while allowing for a dominance effect and a GxE interaction.
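The dependence of *α* on *a*, *d*, and allele frequency can be made explicit in a simple special case. The following worked derivation is a sketch under stated assumptions rather than the general NOIA transformation: a single locus in Hardy-Weinberg proportions, variant-allele frequency $p$ with $q = 1 - p$, allele count $N \in \{0, 1, 2\}$, heterozygote indicator $D$, and the functional coding $G_{11} = R$, $G_{12} = R + a + d$, $G_{22} = R + 2a$. The symbols $R$, $\mu$, and $\tilde{D}$ are introduced here for illustration.

```latex
% Assumptions: Hardy-Weinberg proportions; N = variant-allele count; D = 1 if heterozygous, else 0.
% \tilde{D} is D orthogonalized against the intercept and N:
% \tilde{D} = (-2p^2,\; 2pq,\; -2q^2) for N = 0, 1, 2.
\begin{align}
D &= 2pq + (q - p)\,(N - 2p) + \tilde{D}, \\
G &= R + aN + dD \\
  &= \underbrace{R + 2pa + 2pq\,d}_{\mu}
   + \underbrace{\bigl[a + (q - p)\,d\bigr]}_{\alpha}\,(N - 2p)
   + d\,\tilde{D}.
\end{align}
```

Under these assumptions $\alpha = a + (q - p)\,d$, so *α* coincides with *a* only when $d = 0$ or $p = 1/2$, whereas the coefficient of the orthogonalized dominance regressor remains $d$.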

Some important properties of the NOIA framework for linear regression of quantitative traits do not always hold for logistic regression of qualitative traits, when we generalize the statistical model to the latter case by treating the logit of the disease risk as the genotypic value and defining the genetic effects on that scale. Under the alternative model, that is, when the genotypes or environmental factors are associated with the disease, the estimates of the logistic regression parameters are no longer uncorrelated. Also, under the alternative model, the main effects of a full interaction model are not the same as the corresponding main effects of the reduced single-gene model or the environment-only model. Nevertheless, we still advocate applying the statistical model to case-control data, because it is more powerful in most cases.
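One way to see why the logistic estimates become correlated under the alternative is to inspect the Fisher information matrix, whose inverse gives the asymptotic covariance of the estimates. The sketch below is an illustration under simplifying assumptions, not the authors' derivation: a single locus in Hardy-Weinberg proportions, prospective (cohort-style) sampling rather than case-control sampling, and orthogonalized regressors computed from population genotype frequencies. The function names and coefficient values are hypothetical.

```python
import numpy as np


def noia_design_hwe(p):
    """Genotype frequencies and orthogonal design (intercept, additive, dominance)
    for variant-allele frequency p under Hardy-Weinberg proportions."""
    q = 1.0 - p
    freqs = np.array([q * q, 2 * p * q, p * p])           # genotypes with N = 0, 1, 2
    add = np.array([0.0, 1.0, 2.0]) - 2 * p               # centered allele count
    dom = np.array([-2 * p * p, 2 * p * q, -2 * q * q])   # orthogonalized dominance
    return freqs, np.column_stack([np.ones(3), add, dom])


def logistic_information(freqs, X, beta):
    """Per-subject Fisher information of a logistic model with coefficients beta."""
    eta = X @ np.asarray(beta, dtype=float)
    risk = 1.0 / (1.0 + np.exp(-eta))
    weights = freqs * risk * (1.0 - risk)   # genotype frequency times logistic variance
    return X.T @ (weights[:, None] * X)


def max_off_diagonal(matrix):
    """Largest absolute off-diagonal entry."""
    return np.abs(matrix - np.diag(np.diag(matrix))).max()


if __name__ == "__main__":
    freqs, X = noia_design_hwe(0.3)
    info_null = logistic_information(freqs, X, [-1.0, 0.0, 0.0])   # no genetic effect
    info_alt = logistic_information(freqs, X, [-1.0, 0.5, 0.3])    # illustrative effects
    print("null:", max_off_diagonal(info_null))
    print("alternative:", max_off_diagonal(info_alt))
```

Under the null the logistic variance is the same for every genotype, so the information matrix inherits the orthogonality of the design and is diagonal. Under the alternative the variance differs by genotype and off-diagonal terms appear.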

Application of the NOIA statistical model to the ILCCO data confirmed the associations of the following loci with lung cancer through main effects: rs2736100, rs402710, rs16969968, and rs8034191. Under the usual functional model, the main effects of these loci were not significant (or had larger P values) while allowing for gene-smoking interaction. Furthermore, the gene-smoking interaction was more significant under the statistical model for loci rs2256543, rs16969968, and rs8034191. Specifically, the statistical model revealed that the locus rs2256543 plays a role in the development of lung cancer through interaction with smoking, but not through a main effect.

Finally, the advantage of the statistical model over the usual functional model is not limited to the study of interaction effects. We propose that even for one-locus genetic analyses, such as GWAS, one should consider applying the statistical model, since it orthogonalizes the additive and dominance effects and hence improves the power to detect genetic effects. Although the genetic effects in the statistical model are usually determined not only by biological mechanisms but also by population properties, proper interpretation of the genetic effects can be achieved through the transformations established in the NOIA model.
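The cost of collinearity in a one-locus analysis can be quantified with the variance inflation factor (VIF) of the additive coefficient. The sketch below assumes Hardy-Weinberg proportions, a linear model, and a functional coding with allele count N in {0, 1, 2} and a heterozygote indicator as the dominance regressor; the function name `additive_vif_hwe` is hypothetical. It gives the factor by which the sampling variance of the functional additive coefficient exceeds that of the coefficient of the same regressor after orthogonalization. The two coefficients estimate different quantities (*a* versus *α*), so this is a statement about precision only.

```python
def additive_vif_hwe(p):
    """Variance inflation factor of the additive coefficient when the allele count
    and a heterozygote indicator are fitted together, for variant-allele
    frequency p under Hardy-Weinberg proportions (assumed coding)."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    q = 1.0 - p
    het = 2.0 * p * q                        # heterozygote frequency
    # Squared correlation between allele count and heterozygote indicator:
    # Cov = het * (q - p), Var(N) = het, Var(D) = het * (1 - het).
    r_squared = (q - p) ** 2 / (1.0 - het)
    return 1.0 / (1.0 - r_squared)


if __name__ == "__main__":
    for freq in (0.05, 0.1, 0.3, 0.5):
        print(f"p = {freq:.2f}  VIF = {additive_vif_hwe(freq):.2f}")
```

Under these assumptions the VIF simplifies to (1 - 2pq) / (2pq), so the two regressors are uncorrelated only at p = 0.5 and the inflation grows as the variant allele becomes rarer.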