Meta-analysis has been widely used in genetic association studies to combine information from different populations and increase sample size. However, it is rarely used to combine different types of data from a single population. Genetic researchers often collect phenotypic information in addition to disease status to better understand the pathophysiology of disease development and to maximize study findings; in many of these instances, the information concerns intermediate phenotypes. A meta-analysis method incorporating both the disease status and an intermediate phenotype should be more powerful than a traditional case-control analysis. In the present study, we examined a modified inverse-variance weighted meta-analytical method. Simulation studies showed that this method is more powerful than the traditional case-control method in association analysis of complex diseases, especially for identifying disease loci with very small effects. Also, compared with Fisher's combined probability test, the inverse-variance weighted meta-analysis is more robust, as it has greater power and a lower type I error rate. We set the minor allele frequencies (MAFs) of the SNP marker and disease locus to be equal in our simulation studies, and we observed that the results of the tests were similar when the MAFs were set differently (results not shown). In addition, the intermediate phenotype was available in both patients and controls in this study. Such a phenotype is sometimes available only in patients, either because the quantitative trait is expressed only in them or because the cost of measuring it in controls is too high. Our simulation study showed that the meta-analysis still outperformed the case-control method when the quantitative trait was available only for patients (results not shown).
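For orientation, the two combination rules compared above can be sketched in their textbook forms. This is a minimal illustration, not the modified method examined in this study: the function names `inverse_variance_combine` and `fisher_combine` are hypothetical, both rules as written assume the inputs are independent, and the inverse-variance rule assumes the estimates are on a comparable scale. Estimates obtained from the disease status and an intermediate phenotype in the same sample are generally correlated, which the standard forms below do not account for.

```python
import math


def inverse_variance_combine(estimates, std_errors):
    """Textbook fixed-effect inverse-variance weighted combination.

    Assumes independent estimates on a comparable scale (an assumption of
    this sketch, not a description of the modified method in the paper).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


def fisher_combine(p_values):
    """Fisher's combined probability test for k independent p-values."""
    k = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom (2k), the chi-square upper tail has the
    # closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    half = stat / 2.0
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half / i
        series += term
    return stat, math.exp(-half) * series


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    print(inverse_variance_combine([0.2, 0.4], [0.1, 0.1]))
    print(fisher_combine([0.01, 0.03]))
```

The inverse-variance rule uses the effect estimates and their precision, whereas Fisher's test uses only the p-values and therefore ignores the direction of each effect.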
We further applied meta-analysis to empirical data. Smoking behavior, which can be quantified as smoking duration or smoking quantity, is the most important risk factor for lung cancer development. In 2008, several replicated studies showed a strong association between the nicotinic acetylcholine receptor subunit gene cluster (CHRNA) on chromosome 15q25.1 and lung cancer. However, no consensus was reached on whether the association was direct or mediated via smoking behavior. Hung's group observed an increased risk even in non-smokers, which implied that at least some of the risk was not mediated via smoking. Thorgeirsson et al. suggested that the association with lung cancer was mainly mediated through smoking behavior. In 2010, researchers using genome-wide approaches provided conclusive evidence for a strong association between CHRNA genes and smoking behavior. There is therefore reason to believe that CHRNA genes are associated with both smoking and lung cancer. Smoking behavior is an attribute associated with increased lung cancer risk. The method that we derived can be applied equally well to intermediate phenotypes or to behavioral attributes that are associated with increased risk for a disease. The GWA study incorporating the quantitative trait of cigarettes per day (CPD) with the imputed genotype data detected significant SNPs on chromosomes 3, 15, and 19. The signal in the CHRNA3-CHRNA5-CHRNB4
region was much stronger in the meta-analysis than in the case-control study. The most significant p-value was 1.98 × 10⁻⁹, which is a very strong signal given our small sample size (1154 cases and 1146 controls). Many independent studies have replicated the association of CHRNA3-CHRNA5-CHRNB4 on 15q24 with lung cancer and smoking behavior. Our results further confirm this finding. They also suggest that CPD is a behaviorally mediated risk factor for lung cancer or an intermediate phenotype that is involved in lung cancer risk. Whether the genetic effects of the nicotinic receptor variants on chromosome 15q25.1 contribute directly to lung cancer risk or only through their effects on smoking behavior remains a topic of ongoing debate and further study. Mediation analyses
have shown both direct and indirect (through smoking behavior) effects of the SNPs in this region on lung cancer risk. In contrast, a study comparing the SNP effects on cigarettes per day with their effects on cotinine levels among smokers showed a much stronger effect on cotinine levels. This finding suggests that reported cigarettes per day inadequately captures individuals' actual exposure to nicotine, but it does not yet establish the exact relationship between the genetic effects on smoking and those on lung cancer risk.
The SNPs rs1800469 and rs2241712 in the promoter of the TGFB1 gene on chromosome 19 were associated with chronic obstructive pulmonary disease and lung cancer in smokers in previous studies. In our study, these polymorphisms were detected only by the meta-analysis. Thus, meta-analysis combining an intermediate phenotype and the disease status is a powerful tool for detecting genetic variants in complex disease association studies, especially when the effects of the susceptibility loci are small. The significant SNPs detected in these previously verified regions indicate that our modified inverse-variance weighted meta-analysis is a reliable method for genetic association studies when an intermediate phenotype is available.
In the lung cancer study, the intermediate or behaviorally related phenotype, smoking quantity, is positively related to disease status. This positive correlation may not always hold. For example, brain size is negatively related to Alzheimer's disease. In this case, the quantitative trait can be defined as the overall shrinkage of the brain from the patient's normal brain size, which is positively related to the disease. Researchers may use prior studies to assess the correlation between the intermediate phenotype and the disease of interest to help determine how this information should be combined in the joint analysis.
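The recoding described above can be illustrated with a short sketch. The function names `shrinkage_from_baseline` and `orient_trait`, and the numbers used, are hypothetical; the sign of the trait-disease correlation is assumed to come from prior studies.

```python
def shrinkage_from_baseline(baseline_volume, current_volume):
    """Recode brain volume as shrinkage, so larger values go with disease."""
    return baseline_volume - current_volume


def orient_trait(values, prior_correlation_with_disease):
    """Flip the sign of a trait whose prior correlation with disease is negative.

    After this step, larger trait values correspond to higher disease risk,
    so the trait and the disease status point in the same direction.
    """
    sign = 1.0 if prior_correlation_with_disease >= 0 else -1.0
    return [sign * v for v in values]


if __name__ == "__main__":
    # Hypothetical volumes in arbitrary units.
    print(shrinkage_from_baseline(1200.0, 1100.0))
    print(orient_trait([1.0, 2.0, 3.0], -0.3))
```

Either recoding leaves the strength of the association unchanged and only fixes its direction before the two sources of information are combined.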
In this study, the modified inverse-variance based test was applied when only one intermediate phenotype was available. Statistically, it can also be applied when multiple intermediate phenotypes are available, as the method is based on combining estimators from several regression tests with the modified inverse variances as weights. However, the disease model becomes more complicated when multiple intermediate phenotypes are present and needs careful consideration. The disease model could include multiple disease pathways, each with its own intermediate phenotype, or one pathway with more than one intermediate phenotype, or a mixture of the two. Further investigation is needed before this method is applied in such complicated situations.
Besides being very useful in GWA studies, an intermediate phenotype also has the potential to help researchers understand the intricate interactions among disease-associated genes and elucidate the complicated mechanisms underlying human diseases. The rapid development of microarray technology has made genome-wide gene expression profiles available to researchers. Gene expression levels are closely linked with both genetic variants and disease status, providing a large number of intermediate phenotypes for complex diseases. Meta-analysis combining the disease status and gene expression data should be very powerful in identifying the functional genetic variants associated with complex diseases. This modified inverse-variance weighted meta-analytic approach is a promising tool for deciphering the genetic basis of complex diseases.