Home | About | Journals | Submit | Contact Us | Français |

**|**Am J Epidemiol**|**PMC2732981

Formats

Article sections

Authors

Related links

Am J Epidemiol. 2009 January 15; 169(2): 219–226.

Published online 2008 November 20. doi: 10.1093/aje/kwn353

PMCID: PMC2732981

Correspondence to Cassandra E. Murcray, Department of Preventive Medicine, University of Southern California, 1540 Alcazar Street, CHP 222F, Los Angeles, CA 90089-9010 (e-mail: ude.csu@yarcrum).

Received 2007 November 27; Accepted 2008 May 5.

Copyright American Journal of Epidemiology © The Author 2008. Published by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org.

See "Invited Commentary: From Genome-Wide Association Studies to Gene-Environment-Wide Interaction Studies—Challenges and Opportunities" on page 227.

See "Invited Commentary: Efficient Testing of Gene-Environment Interaction" on page 231.

This article has been cited by other articles in PMC.

It is a commonly held belief that most complex diseases (e.g., diabetes, asthma, cancer) are affected in part by interactions between genes and environmental factors. However, investigators conducting genome-wide association studies typically test for only the marginal effects of each genetic marker on disease. In this paper, the authors propose an efficient and easily implemented 2-step analysis of genome-wide association study data aimed at identifying genes involved in a gene-environment interaction. The procedure complements screening for marginal genetic effects and thus has the potential to uncover new genetic signals that have not been identified previously.

Many common, complex traits are believed to be a result of the combined effect of genes, environmental factors, and their interactions. For example, Ito et al. (1) showed a significant interaction between smoking status and the apurinic/apyrimidinic endonuclease 1 protein coding gene (*APE1*) for lung cancer. Stern et al. (2) found smoking status to be an effect modifier of the association between the XPD codon 751 polymorphism and risk of bladder cancer. Understanding the relation between genetic polymorphisms and environmental exposures can help to identify high-risk subgroups in the population and provide better insight into pathway mechanisms for complex diseases.

Methods for identifying disease susceptibility genes include linkage analysis, candidate gene association studies, and, more recently, the genome-wide association study (GWAS). It is known that a GWAS can be more powerful than linkage analysis in detecting genes associated with modest increases in disease risk (3). Current GWAS methods are designed to detect main effects, that is, direct associations of a single nucleotide polymorphism (SNP) or clusters of SNPs with disease (4, 5). In the context of complex diseases, scanning for main effects might miss important genetic variants specific to subgroups of the population, as defined by some exposure. In fact, interactions with opposite effects in 2 different exposure groups (crossing-interaction) will not show a main effect and therefore will not be identified by using standard approaches.

Despite the importance of gene-environment interaction for complex diseases, little work has been done to develop methods for detecting these types of interactions in the context of a GWAS. In this paper, we focus on identifying SNPs that demonstrate heterogeneity between subgroups defined by some environmental exposure. We introduce an efficient 2-step approach for detecting loci involved in gene-environment interactions that is performed independently of any initial scans for main effects. Our method expands on the traditional test for gene-environment interaction in a case-control study by incorporating a preliminary screening step constructed to efficiently use all available information in the data. We demonstrate that this 2-step approach is more powerful than the standard test of interaction across a wide range of models.

Let *D* be an indicator of disease status, and assume we have a sample of cases (*D*=1) and unrelated controls (*D*=0). Assume that information is available for a binary environmental exposure, with *E* as an indicator for exposure. Further assume that for each individual we have genotyped *M* SNPs spanning the genome, with *g*_{1}, *g*_{2}, … , *g _{M}* denoting the genotypes at the

(1)

Under a dominant coding of the genotype, for example, $\mathrm{exp}\left({\text{\beta}}_{g}\right)={\text{OR}}_{g}$ is the odds ratio (OR) comparing carriers of at least 1 risk allele (*G*=1) with noncarriers (*G*=0) among those unexposed (*E*=0). $\text{Exp}\left({\text{\beta}}_{e}\right)={\text{OR}}_{e}$ is the odds ratio comparing risk for exposed (*E*=1) with that for unexposed (*E*=0) individuals among noncarriers of the risk allele. Lastly, $\mathrm{exp}\left({\text{\beta}}_{ge}\right)={\text{OR}}_{ge}$ is the ratio of the genetic odds ratios comparing exposed with unexposed subjects, that is, ${\text{OR}}_{g|E=1}/{\text{OR}}_{g|E=0}$. If this ratio is equal to 1.0, or ${\text{\beta}}_{ge}=0$, we say that there is no interaction between genotype and the environmental exposure.

In the context of a GWAS, a standard approach to test for G × E interaction would be to perform a 1-df test of ${\text{H}}_{0}:{\text{\beta}}_{ge}=0$ for each SNP based on the model in equation 1. We assume that a likelihood ratio test will be used to test this hypothesis and denote it as the “1-step” test of interaction. A correction for multiple comparisons (e.g., Bonferroni, controlling the false discovery rate (6)) is required to achieve a desired genome-wide type I error.

One might consider using a case-only analysis of diseased individuals as the sole approach for identifying interactions in a GWAS. Indeed, the case-only test has been shown to be more powerful than a case-control analysis for identifying interactions (7, 8). However, the case-only analysis depends heavily on an underlying assumption of G-E independence in the population, which would be untenable across all SNPs being scanned in a GWAS. If there is an underlying population association between genotype and environmental exposure, a case-only analysis will result in an inflated number of false positives (7, 9).

We propose an alternative 2-step test to scan for interactions that combines the power of the case-only test with the protection from bias of the case-control test. Our 2-step scan for interactions consists of the following 2 steps:

- Step 1, screening test: For each of the
*M*SNPs, we perform a likelihood ratio test of association between G and E based on the logistic model $\text{logit}P\left(E=1|g\right)={\text{\gamma}}_{0}+{\text{\gamma}}_{g}G$. This is the standard test that would be applied in a case-only analysis of G × E interaction (7, 9), although, in our context, the test is applied to the combined sample of cases and controls. The subset of*m*SNPs that exceeds a given significance threshold (i.e., with*P*< α_{1}) for the test of ${\text{H}}_{0}:{\text{\gamma}}_{g}=0$ is analyzed in step 2. - Step 2, case-control test: The
*m*SNPs that pass step 1 are assessed in the traditional test of G × E interaction, that is, based on a likelihood ratio test of ${\text{H}}_{0}:{\text{\beta}}_{ge}=0$ derived from the model in equation 1. Significance at this step is defined as having a*P*value less than α/*m*, where α is the desired overall type I error rate.

Although step 1 of our procedure is also sensitive to the assumption of independence between G and E in the population, the step 2 comparison of cases to controls is not. Therefore, our overall 2-step procedure will provide a valid test in the presence of population-level association between genotype and exposure, a claim we will verify by simulation.

Given the reported power of a case-only analysis applied to only diseased individuals (7, 9), one may be tempted to apply step 1 to only diseased individuals, that is, to perform a true case-only test of interaction in step 1 and use it to define the subset of *m* markers to analyze in step 2. However, this approach produces a correlation between the step 1 and step 2 test statistics and leads to an inflated type I error rate for the overall procedure. Our screening test of G × E interaction applied to the entire sample of cases and controls eliminates the correlation between tests in steps 1 and 2 (Appendix 1) and, as we show (Appendix 2) and verify by simulation, preserves the overall type I error rate.

We performed simulations to study the power achieved by our 2-step testing framework compared with the traditional 1-step method. For each of 1,000 replicate data sets, we simulated a sample of 500 cases and 500 controls, each with genotype information on a large number of markers (*M* =10,000, 25,000, and 50,000). Although larger marker sets are likely to be used in practice, our chosen set sizes are sufficient to demonstrate the relative power of our 2-step method. Markers were assumed to be independent loci distributed across the genome. For each replicate, a single marker was chosen to be the true disease susceptibility locus, with remaining markers assumed to have no association with disease. We considered a range of minor allele frequencies (*q _{A}*) for the disease susceptibility locus, including 0.10, 0.20, and 0.30. For the remaining null markers, we simulated a uniform distribution of allele frequencies between 0.05 and 0.30. We also considered a range of values for the exposure prevalence (

As described above, the traditional case-only method of testing for G × E interaction is based on the assumption of no population-level association between the G and E. We explored the sensitivity of our method to population-level association between G and E by introducing a parameter *p _{ge}*, defined to be the probability that a given marker is associated with the exposure at the population level. With

For each parameter setting, we applied the traditional 1-step G × E test and our 2-step approach. We estimated the experiment-wise type I error as the proportion of 1,000 replicates in which at least one of the null markers was found to be significant after a Bonferroni correction for multiple comparisons. Power was calculated as the number of replicates in which the disease susceptibility locus was detected at an overall significance level of 0.05, again after a Bonferroni correction for multiple comparisons. We also computed the proportion of replicates in which the disease susceptibility locus was among the top 10 or top 25 most significant SNPs, that is, on a short list that might warrant additional scrutiny following a genome-wide scan.

Across a range of interaction effect sizes (*R _{ge}* =exp(β

Power for 1-step and 2-step analyses for varying levels of interaction effect size (*R*_{ge}). All other parameter settings remain constant under “base” model specifications (*M*=10,000, number of cases/controls=500/500, **...**

For various alternatives to the above parameter settings, Table 1 shows type I error and power for the 1- and 2-step methods for detecting G-E interaction, holding the interaction effect size fixed (*R _{ge}*=3.0). Both methods approximated the nominal 0.05 type I error under all scenarios, even when there was a population-level association (

As expected, a population-level association between markers and environment (*p _{ge}*>0) increased the number of markers that proceeded to step 2. However, for the range of values we considered plausible in a genome-wide scan (

Across a range of interaction effect sizes (*R _{ge}*=exp(β

Percentage of replicates for which the *P* value for disease susceptibility locus is ranked in the top *k* (*k*=10 or *k*=25) marker *P* values for varying levels of interaction effect size (*R*_{ge}). All other parameter settings **...**

Our 2-step method was more robust to changes in exposure prevalence, minor allele frequency, number of markers, and so forth, when we focused on the rank statistics (Table 2). Under our base model settings, the disease susceptibility locus was in the top 10 *P* values in 94% of the 1,000 replicates using the 2-step method compared with only 79% of the replicates when the 1-step method was used. Even under the least ideal circumstances, with a low exposure prevalence, the ranked *P* value for the disease susceptibility locus was still in the top 25 in 63% of replicates using the 2-step method compared with 35% when the 1-step method was used. Under all other scenarios we examined, the ranked *P* value using the 2-step method was in the top 25 for at least 85% of replicates, and the 2-step method always outperformed the 1-step approach.

For a GWAS in a case-control sample, we have shown that our 2-step testing approach provides a powerful alternative for testing G × E interaction relative to a traditional 1-step test. For the parameter settings we examined, the 2-step method was always more powerful than the 1-step method. Our method was also more robust than the traditional case-control test to changes in allele frequency, exposure prevalence, and other parameters when comparing the ranked *P* value for the true disease-susceptibility locus. Given its increased power and ease of implementation, the 2-step test is an attractive alternative for identifying G × E interactions in a GWAS for complex diseases.

We assumed a dichotomous environmental factor and dominant susceptibility locus in our simulations. In practice, the investigator may be interested in an alternative type of environmental factor (e.g., continuous, multinomial) or genetic model (e.g., additive, codominant). Both steps in our method can naturally be extended to accommodate any parameterization of the environmental exposure or genetic coding. The absolute power of these extensions would depend on the underlying data distributions, but we would expect similar increases in power for our 2-step approach relative to a 1-step approach.

A typical GWAS is conducted on a large sample size to achieve power to detect modest-sized effects at genome-wide significance after correction for multiple testing. We considered a scenario with only 500 cases and controls genotyped on 10,000 markers for our base model. With an increase in sample size, power to detect a marker involved in a G × E interaction would increase for both the 1- and 2-step methods. An increase in number of markers could increase or decrease power, depending on whether the increase in linkage disequilibrium between the disease susceptibility locus and the markers offsets the penalty for a larger number of tests. However, we would expect variations in sample size and number of markers to affect both the 1- and 2-step approaches similarly and therefore not affect the relative comparison of power for the 2 methods.

We simulated scenarios in which a proportion (*p _{ge}*) of the available null markers were associated with the environmental factor in the population to establish that our 2-step approach preserves the desired type I error rate. It is possible for a causal locus to influence disease risk through an interaction with some environmental factor and simultaneously to be associated with the same environmental exposure in the general population. Under this scenario, the population-level association between the causal locus and the environmental factor would affect the power of our screening step. For example, a population-level G-E association at the causal locus in the opposite direction of the G × E interaction effect may reduce the power of our method. On the other hand, a positive G × E interaction combined with a positive G-E association in the population would inflate the estimated G-E association in our screening step and would likely increase the overall power of our 2-step test.

Incorporating a screening step to improve power in genetic analysis has been proposed in other contexts. For example, Van Steen et al. (10) developed a 2-step method for genome-wide association tests of genetic main effects using family data. Their screening step was based on a regression model using between-family information, and it was statistically independent of the family-based association test used in the second analysis step. Similar to their approach, our proposed analysis begins with a potentially biased test in the first step that is designed to efficiently screen for potentially important SNPs and then uses an unbiased second step to ensure an overall valid procedure. Millstein et al. (11) also used a screening step in their Focused Interaction Testing Framework software to identify genes involved in G × G interactions in a study of many candidate loci. Although this screening test was biased in the presence of population-level association among genes, their second-step model ensured that the overall Focused Interaction Testing Framework approach was unbiased. Our results, in combination with those of Van Steen et al. and Millstein et al., demonstrate that well-designed 2-step approaches can lead to improved power in a wide range of genetic applications.

The additional power of our 2-step procedure comes from exploiting independent information provided by oversampling of cases relative to their prevalence in the population. In the presence of G × E interaction, this oversampling of cases induces an association between G and E in the combined case-control sample. Although it would be possible to develop an alternative 1-step test based on a likelihood that incorporates this additional information, such a test would not preserve the type I error in the presence of population-level G-E association. In the GWAS context, however, we can use the additional information derived from the oversampling of cases in a screening step to reduce the number of SNPs to be tested in the second step. When the power of the first step test is high, the chance that a true positive will be carried to the second step is also high. At the same time, a large number of null SNPs will be eliminated by the first-step screen, which reduces the multiple testing burden and results in our observed, overall gain in power to detect interaction at the causal locus.

Our 2-step approach is preferable to a case-only analysis of affected individuals, since the latter will have an inflated type I error rate in the presence of population G-E association. Even if a small subset of null SNPs has a population G-E association, one would expect several thousand false-positive results using a case-only analysis of interaction given the overall number of SNPs being tested. On the other hand, the type I error of our 2-step procedure is maintained because the second-step test is unbiased and the 2 steps are independent. Thus, even if there is a strong association in the population between E and a specific null SNP such that the SNP passes our first-step screen, the type I error rate for our overall 2-step procedure will be maintained.

Kraft et al. (12) proposed a powerful 2-df test for assessing genetic main effects and interactions jointly. They showed that under a wide variety of parameter settings, the 2-df test was often more powerful than a test of the main effect or the traditional test for G × E interaction. Although it is possible that a 2-df test would be more powerful than our 2-step testing framework, many investigators conducting a GWAS will want to begin with a full scan for genetic main effects. After conducting such a scan, use of the 2-df test would assess redundant information through partial retesting of the main effect. If a secondary goal is to detect genes involved in G × E interactions, our method allows it to be performed independently of the main-effect scan.

Several reported genome-wide association studies have conducted an initial scan of all available genotypes for main effects by ignoring heterogeneity between exposure classifications (13–17). It is possible that by focusing completely on main effects, SNPs with disease associations modified by some environmental factor were not detected. Still, it has been argued that focusing on G × E interaction is not advantageous over testing for main effects (18). However, if there is strong evidence that an environmental factor contributes to risk and possibly modifies a genetic effect, it is potentially important to define a testing strategy that uses independent information in a second scan across the available markers. We developed this method in order to use this additional information to detect genetic heterogeneity across subgroups that might otherwise be missed in a direct main effect test. Although we focused on subgroups defined by an environmental factor, our approach can be used to assess heterogeneity across racial/ethnic groups, genotypes at another locus (G × G interaction), or any other variable that can be used to classify study subjects.

Our method relies on a priori knowledge of factors that might be expected to modify the risk of genotype on disease. For example, the Children's Health Study, a prospective cohort study designed to investigate respiratory outcomes in children in 12 communities throughout southern California, has shown evidence to suggest that both regional air quality and proximity to traffic contribute to risk of asthma, reduced lung function growth, and other respiratory outcomes (19–23). For the GWAS being conducted in this cohort, simply a scan of the main effects that ignored the potential modification of genetic effects by air pollutants might lead investigators to miss genetic variants that are important determinants of complex respiratory diseases.

We have shown in the context of a GWAS that utilizing ascertainment information through a screening step of available markers can lead to substantial increases in power to detect a gene involved in a G × E interaction. We furthermore showed that this 2-step approach is more robust to changes in environmental exposure, minor allele frequency, and so forth, than the traditional 1-step test for identifying highly ranked SNPs. Our approach therefore has the potential to increase the yield of a given GWAS by identifying additional, important loci that act in concert with an environmental or other factor to influence risk of a complex disease.

Authors affiliation: Department of Preventive Medicine, University of Southern California, Los Angeles, California (Cassandra E. Murcray, Juan Pablo Lewinger, W. James Gauderman).

This work was supported in part by the National Institute of Environmental Health Sciences (T32 ES013678, 5P30ES007048, 5P01ES009581, R826708, RD831861, 5P01ES011627, 5R01ES014447, and 5R01ES014708), the National Heart, Lung, and Blood Institute (5R01HL087680, 5R01HL61768, and 5R01HL76647), the National Human Genome Research Institute (P50 HG 002790-02), and the Hastings Foundation.

Conflict of interest: none declared.

- E
- environment
- G
- gene
- GWAS
- genome-wide association study
- OR
- odds ratio
- SNP
- single nucleotide polymorphism

We assume a binary exposure *E* and binary genotype *G*, although the following result can be readily extended to multilevel exposures and genotypes. Let us consider the 2 × 2 × 2 table for a case-control study in which, as before, *D* is the binary case-control indicator:

The standard interaction *G* × *E* odds ratio is OR_{2}=(*n*_{11}*n*_{14}/*n*_{12}*n*_{13})/(*n*_{01}*n*_{04}/*n*_{02}*n*_{03}), and the *G* × *E* odds ratio pooling cases and controls is OR_{1}=(*n*_{11}+*n*_{01})(*n*_{14}+*n*_{04})/(*n*_{12}+*n*_{02})(*n*_{13}+*n*_{03}). Log(OR_{1}) is the numerator of a Wald test statistic for step 1, and log(OR_{2}) is the numerator of a Wald statistic for step 2. We show that the asymptotic covariance Cov(log(OR_{1}), log(OR_{2}))=0 by using the delta method. The delta method establishes that if a random vector **T*** _{n}* is asymptotically multivariate normal

The vectors **Y**=**(***n*_{11}, *n*_{12}, *n*_{13}, *n*_{14}) and **Z**=(*n*_{01}, *n*_{02}, *n*_{03}, *n*_{04}) of observed counts in cases and controls are independent and have a multinomial distribution Mult(*p*_{1}, *p*_{2}, *p*_{3}, *p*_{4,} *n*_{1}) and Mult(*q*_{1}, *q*_{2}, *q*_{3}, *q*_{4,} *n*_{0}), respectively, where *n*_{1} is the number of cases; *n*_{0} is the number of controls; *p*_{1}, *p*_{2}, *p*_{3}, *p*_{4} are the cell frequencies of the 2 × 2 *G* × *E* table for the cases; and *q*_{1}, *q*_{2}, *q*_{3}, *q*_{4} are the cell frequencies of the 2 × 2 *G* × *E* table for the controls. By the standard approximation to the multinomial distribution, **X**=**(Y, Z)** is asymptotically normal with mean **μ**=E[**X**]=(*n _{1} p_{1}, n_{1} p_{2}, n_{1} p_{3}, n_{1} p_{4}, n_{0} q_{1}, n_{0} q_{2}, n_{0} q_{3}, n_{0} q_{4}*) and partitioned covariance matrix

If *f*(**X**)=(log(OR_{2}(**X**)), log(OR_{1}(**X**))), we have

and by the delta method, the asymptotic covariance matrix of *f*(**X**) is

where *g*=(*n*_{1}*, n*_{0}*, p*_{1}*, p*_{2}*, p*_{3}*, p*_{4}*, q*_{1}*, q*_{2}*, q*_{3}*, q*_{4}) is a rather lengthy expression involving *n*_{1}*, n*_{0}*, p*_{1}*, p*_{2}*, p*_{3}*, p*_{4}*, q*_{1}*, q*_{2}*, q*_{3}*,* and *q*_{4}. The asymptotic joint distribution of log(OR_{1}) and log(OR_{2}) is then bivariate normal with covariance given by the off-diagonal entry of the 2 × 2 matrix above, that is, zero. Note that this result holds for *any* values of the cell frequencies and thus for any values of the underlying model parameters. This establishes the asymptotic independence of the 2 Wald statistics and, because they are asymptotically equivalent, of the corresponding likelihood ratio test statistics that we propose.

Let *M* be the total number of SNP markers, ${\mathrm{\mathcal{M}}}_{0}$ be the subset of SNPs that are true negatives, ${\mathrm{\mathcal{M}}}_{1}$ be the subset of SNPs that are true positives (i.e., for which there is G × E interaction), and *M*_{0} and *M*_{1} be the number of SNPs in ${\mathrm{\mathcal{M}}}_{0}$ and ${\mathrm{\mathcal{M}}}_{1}$, respectively. Let *T*_{1k}, *T*_{2k} be the first- and second-step likelihood ratio statistics for SNP *k*, 1 ≤ *k* ≤ *M*, α the desired genome-wide type I error level, and α_{1} *≈* Pr(*T*_{1k}>*c*_{1}*|*H_{0}) the step-1 type I error.

By a standard Bonferroni inequality argument, the genome-wide type I error is guaranteed to be less than α if the critical values *c*_{1} and *c*_{2} are chosen so that $\text{Pr}\left({T}_{1k}>{c}_{1}\text{,}{T}_{2k}>{c}_{2}\right)\le \mathrm{\alpha}\text{/}{M}_{0}$ for *k* ${\mathrm{\mathcal{M}}}_{0}$. Because *T*_{1k} and *T*_{2k} are (asymptotically) independent (Appendix 1), it is equivalent to requiring

(A1)

Now, the number of true negative SNPs tested in the second step can be written as ${m}_{0}={\displaystyle {\sum}_{k\in {M}_{0}}I({T}_{1k}>{c}_{1})}={\displaystyle {\sum}_{k\in {M}_{0}}{I}_{k}}$, where *I _{k}*=

1. Ito H, Matsuo K, Hamajima N, et al. Gene-environment interactions between the smoking habit and polymorphisms in the DNA repair genes, APE1 Asp148Glu and XRCC1 Arg399Gln, in Japanese lung cancer risk. Carcinogenesis. 2004;25(8):1395–1401. [PubMed]

2. Stern MC, Johnson LR, Bell DA, et al. XPD codon 751 polymorphism, metabolism genes, smoking, and bladder cancer risk. Cancer Epidemiol Biomarkers Prev. 2002;11(10 pt 1):1004–1011. [PubMed]

3. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273(5281):1516–1517. [PubMed]

4. Browning BL, Browning SR. Efficient multilocus association testing for whole genome association studies using localized haplotype clustering. Genet Epidemiol. 2007;31(5):365–375. [PubMed]

5. Zhao J, Jin L, Xiong M. Nonlinear tests for genomewide association studies. Genetics. 2006;174(3):1529–1538. [PubMed]

6. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc (B) 1995;57:289–300.

7. Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol. 1996;144(3):207–213. [PubMed]

8. Gauderman WJ. Sample size requirements for association studies of gene-gene interaction. Am J Epidemiol. 2002;155(5):478–484. [PubMed]

9. Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med. 1994;13(2):153–162. [PubMed]

10. Van Steen K, McQueen MB, Herbert A, et al. Genomic screening and replication using the same data set in family-based association testing. Nat Genet. 2005;37(7):683–691. [PubMed]

11. Millstein J, Conti DV, Gilliland FD, et al. A testing framework for identifying susceptibility genes in the presence of epistasis. Am J Hum Genet. 2006;78(1):15–27. [PubMed]

12. Kraft P, Yen YC, Stram DO, et al. Exploiting gene-environment interaction to detect genetic associations. Hum Hered. 2007;63(2):111–119. [PubMed]

13. Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007;447(7148):1087–1093. [PMC free article] [PubMed]

14. Hunter DJ, Kraft P, Jacobs KB, et al. A genome-wide association study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal breast cancer. Nat Genet. 2007;39(7):870–874. [PMC free article] [PubMed]

15. Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research. Saxena R, Voight BF, et al. Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science. 2007;316(5829):1331–1336. [PubMed]

16. Scott LJ, Mohlke KL, Bonnycastle LL, et al. A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science. 2007;316(5829):1341–1345. [PMC free article] [PubMed]

17. Zeggini E, Weedon MN, Lindgren CM, et al. Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science. 2007;316(5829):1336–1341. [PubMed]

18. Clayton D, McKeigue PM. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet. 2001;358(9290):1356–1360. [PubMed]

19. Gauderman WJ, Avol E, Gilliland F, et al. The effect of air pollution on lung development from 10 to 18 years of age. N Engl J Med. 2004;351(11):1057–1067. [PubMed]

20. Gauderman WJ, Avol E, Lurmann F, et al. Childhood asthma and exposure to traffic and nitrogen dioxide. Epidemiology. 2005;16(6):737–743. [PubMed]

21. Gauderman WJ, Vora H, McConnell R, et al. Effect of exposure to traffic on lung development from 10 to 18 years of age: a cohort study. Lancet. 2007;369(9561):571–577. [PubMed]

22. McConnell R, Berhane K, Gilliland F, et al. Air pollution and bronchitic symptoms in Southern California children with asthma. Environ Health Perspect. 1999;107(9):757–760. [PMC free article] [PubMed]

23. McConnell R, Berhane K, Gilliland F, et al. Asthma in exercising children exposed to ozone: a cohort study. Lancet. 2002;359(9304):386–391. [PubMed]

24. van der Vaart AW. Asymptotic Statistics. Cambridge, United Kingdom: Cambridge University Press; 1998.

Articles from American Journal of Epidemiology are provided here courtesy of **Oxford University Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |