The case-control and family-based PCC methods provide a computationally fast method of simultaneously testing rare and common variants and adjusting for covariates. Using a case-control design, the results obtained with the GAW17 simulations were similar to the results obtained using the WBC and SUP methods. The PCC methods are also much faster than the other two permutation-based approaches. Comparing our results to the true model used for simulating the data, we found that genes FLT1 and KDR are indeed part of the gene list that controls the quantitative trait Q1. These genes have a common variant (MAF equal to 0.07 and 0.16, respectively) with a moderate or small effect (odds ratio of 1.92 and 1.15, respectively) and nine rare variants, each with an odds ratios ranging from 1.17 to 2.9. Some of the other genes controlling Q1 are represented in the top 30 lists obtained with the three methods, but none reaches significance when correction for multiple testing is implemented.
Although the top two genes (VEGFA and FLT1) obtained with the family-based association test using the PCC method are in the list of genes controlling Q1, their test statistics are not significant. This can be explained by the simulation procedures used for the GAW17 family data sets, which consisted of eight extended families. Our analysis split the eight pedigrees into 194 nuclear families and assumed that the nuclear families were independent.
Using only the first principal component seemed to be a reasonable choice where adjustment on multiple covariates was needed. However, the first principal component accounted on average for 60% of the variability within the genes. In practice, one might choose to include even more principal components to capture more genetic variability. The impact of this on power would be interesting to evaluate in future studies.
A strength of the PCC approach is its ability to evaluate large numbers of variants that are not amenable to fitting with a conventional logistic regression model. Another solution to this issue is to use shrinkage and variable selection methods, such as the least absolute shrinkage and selection operator (LASSO) [15
]. Future work could compare the PCC method to these approaches.
In the PCA of each gene, the number of variables included was small (less than five in most cases) because rare variants were aggregated beforehand and only nonsynonymous variants were considered. However, for a larger number of variables, sparse PCA may provide a better alternative [16