The simulation is designed around the asthma study discussed in the data analysis section of this paper. The markers of interest comprise a five-SNP haplotype modeled after five SNPs in the ARG1
gene. We generated the parental haplotypes by drawing from a uniform distribution, in which the probability that any parent has a given haplotype is the haplotypic frequency as measured in the Childhood Asthma Management Program (CAMP) population.21
The haplotypes of the probands are obtained by simulating Mendelian transmissions of the parental haplotypes, assuming complete linkage disequilibrium in each haplotype. For the computation of the MFBAT statistics, the genotypes of probands and their parents are assumed to be known.
We simulate 1000 trios with five phenotypes and five SNPs and then evaluate the power of the proposed testing strategy to other existing testing strategies. Using the haplotypes that were generated from these five SNPs in the CAMP population, the haplotypes with frequencies of 0.1, 0.2, and 0.3 are each selected to be the disease susceptibility loci, and the genotypic distribution under the alternative hypothesis is generated using E
) for the marker score as described by Lange and Laird.16
The haplotypes for the remaining SNPs are simulated under the null hypothesis, assuming Hardy–Weinberg equilibrium and complete linkage disequilibrium. Five phenotypes are generated, in which one phenotype is associated with one haplotype and the four remaining phenotypes are only associated with the haplotype by their correlation with the associated phenotype. Three different phenotypic correlations were used in the simulations: (1) a low phenotypic correlation, in which the phenotypic correlation between all phenotypes is 0.2; (2) a moderate phenotypic correlation, in which the correlation between each phenotype is 0.38, which reflects the average phenotypic correlation matrix of the five asthma phenotype measurements in the CAMP clinical trial; and (3) a high phenotypic correlation, in which the phenotypic correlation between all phenotypes is 0.8. The strength of the additive effect relative to the phenotypic variance is measured by the heritability, h2
. Specifically, when the phenotypes are generated for the simulation, they are generated using a regression equation that results in the specified heritability measurement between the phenotype and genotype. The phenotypic vector Yi
for each offspring is a random sample from a multivariate normal distribution, that is, Yi
), in which al
is the additive effect for the l
th phenotype, xi
is the individual genotype, and V
is the (5 × 5) variance matrix. We measure the strength of the additive genetic effect on a phenotypic trait by the heritability h2l
which is the proportion of phenotypic variation explained by the genetic variation – that is, h2l
This expression for h2l
can be solved for al
which is a measure of the genetic effect size.
We evaluate the power of the proposed testing strategy with other analysis approaches, including testing five phenotypes at five SNPs separately using the FBAT statistic (denoted as a single-SNP/single-phenotype test or SSSP), testing five phenotypes separately with a five-SNP haplotype using a family-based haplotype test (denoted as hap), testing five phenotypes at each SNP separately using the FAT-PC methodology (denoted as FBAT-PC), using MFBAT on each SNP but combining all five phenotypes (denoted as MFBAT-1S), testing all five SNPs with each phenotype individually using the MFBAT approach (denoted as MFBAT-1P), and testing all five SNPs and five phenotypes simultaneously using the MFBAT statistic (denoted as MFBAT). All multiple comparison corrections were made using the Bonferroni procedure with overall α
=0.05. The R2
measures the degree of linkage disequilibrium between two SNPs, in which an R2
of 0 indicates that there is no linkage disequilibrium between two SNPs and an R2
of 1 indicates perfect linkage disequilibrium between two SNPs. When simulating the SNPs with an R2
of 0.7, we used the actual structure of the SNPs in the ARG1
gene that is used in the data application. These SNPs have the following haplotypic distributions:
- Haplotype AATAT=0.534
- Haplotype TGATT=0.303
- Haplotype TATAT=0.108
- Haplotype TGATC=0.028
- Haplotype AGATT=0.019
- Haplotype TGAAT=0.008
We assumed that the disease haplotype had frequencies of 0.10 (haplotype TATAT), 0.20 (we changed haplotype TGATT to have this frequency), and 0.30 (haplotype TGATT). An additive mode of inheritance was assumed in the simulations. The reported power estimates are based on 1000 replicates. The results of the simulations are shown in . Similarly, when simulating the haplotypes with an R2
of 0.2, we used the following haplotypes:
- Haplotype AACAA=0.367
- Haplotype TACAA=0.129
- Haplotype TGTAA=0.100
- Haplotype AGCGT=0.012
- Haplotype AGTAA=0.066
- Haplotype AGCAA=0.039
- Haplotype TGTAT=0.038
- Haplotype AATAA=0.029
- Haplotype AACGT=0.026
- Haplotype TGCGT=0.092
Power simulations of the MFBAT methodology compared with the standard approaches using 1000 trios
Because we simulate the data using a five-SNP haplotype, presumably the statistical tests using all five SNPs would have the most power. In the simulations, we use all five SNPs together (MFBAT, MFBAT-1P, haplotype) and scenarios in which we only use one SNP at a time (MFBAT-1S, FBAT-PC). Because the simulations presented in the paper analyze the power using all five SNPs individually and together, we believe that we have captured the power that MFBAT approach has to detect the DSL in the two extreme situations, one in which what is being tested is exactly the DSL (ie, when all five markers are used simultaneously) and one in which we are testing only a part of the true DSL. The results of the simulations are shown in .
In all simulation scenarios, MFBAT or MFBAT-1S had the highest power estimates of the approaches that were used, and in many cases these approaches substantially outperformed both the haplotype and SSSP approaches. As MFBAT-1S and FBAT-PC are methodologically very similar, it is not surprising that these two approaches have comparable power estimate in all of the tested scenarios. When the R2 between the SNPs is 0.20, MFBAT is consistently the highest powered of the various testing strategies regardless of the haplotype frequency or the phenotypic correlation. In this scenario, MFBAT has between two and three times the power of the haplotype approach and over 10 times the power of SSSP. In fact, when the phenotypic correlation is 0.8 and R2 is 0.20, MFBAT is nine times as powerful as the haplotypic approach and over 60 times as powerful as SSSP. When the R2 between the SNPs is 0.70 and the phenotypic correlation is moderate (0.4) or low (0.20) with a minor allele frequency >0.2, then the power of FBAT-PC, MFBAT-IS, and MFBAT do not deviate notably from each other. The power of any of these three approaches is between 2 and 10 times higher than the overall haplotype test and SSSP. When R2 is high (0.7) and the phenotypic correlation is high (0.8), then MFBAT-1S and FBAT-PC substantially outperform the other approaches, with three times the power of MFBAT and approximately 30 times the power of the haplotypic approach or SSSP. In virtually all of the simulations, SSSP has the lowest power, which is not surprising because there are many comparisons that need to be adjusted for and the correlation between tests is not accounted for in any way. When the heritability is increased from 1 to 5%, the overall power increases, but the relative relationships among the statistical tests remains constant. In addition, simulations using an R2 of 0.5 show results in between 0.2 and 0.7 (data not shown).
We generated simulations under the null hypothesis of no genetic effect and calculated the empirical type I error rates for the MFBAT under three scenarios: (1) single phenotype and five SNPs, (2) five phenotypes and single SNP, and (3) five phenotypes and five SNPs. Simulations of 10
000 replicates were performed at minor allele frequencies of 10 and 30% and the α
was set at 5.0%. These results were compared with FBAT-PC, in which accurate type I error rates has been established many times. In all cases, the type I error rate does not vary by more then 0.2% above or below 5.0%. This small deviation suggests that the different MFBAT scenarios do not vary from the selected alpha level in any meaningful or systematic way. Therefore, we conclude that the type I error is being maintained appropriately.
Data analysis: an asthma study (CAMP)
We applied MFBAT to a collection of parent/child trios in the CAMP Genetics Ancillary Study. The CAMP study randomized asthmatic children to three different asthma treatments.21
Blood samples for DNA were collected from 696 complete parent/child trios from 640 nuclear families in the CAMP Ancillary Genetics Study. Genotyping was performed on five polymorphic loci located in the ARG1
Previously, an association with the ARG1
gene and bronchodilator response was identified and replicated in three independent populations.23
Therefore, we used this gene along with five phenotypes, including bronchodilator response and four measures of pulmonary function. The related phenotypes were pre- and post-bronchodilator measures of forced expiratory volume in one second (FEV1
) and forced vital capacity (FVC). FEV1
is the amount of air that can be forcibly blown out of the lung in 1
s. FVC is the total amount of air that can forcibly be blown out of the lung after full inspiration. Therefore, the following phenotypes were used in the analysis: measurements of (1) bronchodilator response, (2) pre-bronchodilator response to FEV1
, (3) post-bronchodilator response to FEV1
, (4) pre-bronchodilator response to FVC, and (4) post-bronchodilator response to FVC.
is known to be an important asthma phenotype that also depends on the age, height, weight, and sex of the individual. It is standard practice to adjust the FEV1
measurements for these covariates before analyzing them. Therefore, using the conditional mean model approach,16
we regressed the FEV1
and FVC measurements on the recorded values for age, height, weight, and sex and used the residuals in MFBAT.
lists the results from the various analyses that were used in the power simulations sections. We list the statistical test, the SNPs analyzed, the phenotypes used, the observed P-values, and the significance level. The column labeled significance level refers to the significance level (ie, alpha) that is necessary when accounting for the multiple testing. When there is one statistical test, then the P-value must be <0.05 to be significant whereas when there are five statistical tests, then the P-value must be <0.01 to be significant.
Analysis of ARG1 and five asthmatic phenotypes
All five SNPs in the ARG1 gene were also included in the analysis. The data were analyzed in six ways: (1) using MFBAT with all phenotypes and SNPs, (2) using MFBAT with all phenotypes and each SNP individually, (3) using MFBAT with all SNPs and each phenotype individually, (4) using the FBAT-PC association test for each SNP individually, (5) using a haplotype test for each phenotype individually, and finally (6) analyzing each SNP/phenotype combination individually. The results of these findings are summarized in . The minor allele frequency of the SNPs in this analysis ranged between 0.46 and 0.03, whereas the average phenotypic correlation was 0.38 and the average R2 was in between 0.2 and 0.7. In this example, significant associations are identified with FBAT-PC, MFBAT-1S, and MFBAT; however, the association findings for FBAT-PC and MFBAT-1S are not nearly as striking at low P-value observed using MFBAT (P-value=9.53 × 10−6).
From the data analysis we see that the MFBAT analysis, using all phenotypes and all SNPs simultaneously, has the strongest genetic association with a P-value of 9.5 × 10−6. When we analyzed the association P-values for individual SNPs and phenotypes, there is no specific phenotype/SNP combination that seems to be driving the effect. When analyzing the FBAT-PC and MFBAT-IS, it seems that rs2781659 and rs2781663 are driving the association. It also seems as if the pre- and post-FVC measures have the strongest association. When examined individually, however, the association is not as strong as when all SNPs and phenotypes are examined together. This is likely because the combinations of phenotypes are more informative than any single phenotype that is evaluated individually.
From the simulations described in , MFBAT would have the greatest power to detect an association, followed by MFBAT-1S and FBAT-PC. The haplotype and SSSP analyses would have the lowest power to detect an effect. In this analysis, MFBAT resulted in an overall P-value on the order of 10−6, which clearly shows a very strong association with the asthma-related phenotypes. Two SNPs were significantly associated with the phenotypes for both MFBAT-1S and FBAT-PC. As was observed in the simulations, these two test statistics have similar results and their association P-values track together closely. Both the haplotype analysis and SSSP had no significant associations after adjusting for multiple comparisons. This analysis illustrates that the MFBAT approach, which uses all of the information simultaneously, stands out as a possible association P-value that one would carry forward for further analysis.