As shown, directly comparing the power and type I error rates of the eight multiallelic approaches was not straightforward for Group 13. However, we were able to draw some useful conclusions regarding the analysis of rare variant data with multiallelic approaches. The first is that a key contributor to success in detecting associations is using a method that matches the genetic model of inheritance of the trait under analysis. For the GAW17 data, this model was primarily polygenic. The values of quantitative traits Q1 and Q2 were based on the contributions of numerous rare variants, although some common SNPs were also causal. The binary trait was derived from these two quantitative traits. The models are described much more extensively by Almasy et al. We therefore assessed the success of the multiallelic methods in the context of those models by examining the Group 13 detection rate for the two types of genes contributing to the simulated traits. Genes rather than variants were selected because all the methods were designed for multiallelic rather than individual variant data, although the gene was not used consistently as the unit of analysis. For each trait with a genetic contribution, we selected the gene with the largest number of causal rare variants and the gene with the causal common SNP. Both criteria were likely to provide the best chance for detection.
When comparisons were possible, we were pleased to see that detection of the six genes (three traits with two genes each) was fairly consistent across the methods. For the genes with the largest number of rare variants, FLT1 was detected for Q1 by all three work groups that analyzed it, but only one of the four work groups analyzing Q2 detected BCHE (Wilson and co-workers, who used a clustering method). For the four groups analyzing the binary trait, only Bueno Filho et al., using a Bayesian hierarchical mixture model, reported PIK3C2B. Unfortunately, that group had a large false-positive rate, making their finding less remarkable and more consistent with the others. To address this, Bueno Filho and colleagues suggest that frequency weighting factors in their Bayesian mixture models might become clearer once methods to implement penalties for overfitting are incorporated.
Regarding the three genes with common SNPs: for Q1, KDR was selected by Wilson et al. and Yan et al., indicating that both methods may be sensitive to SNPs with higher allele frequencies; the method of Christensen and Lambert did not detect KDR, possibly because of the model they used to select genes for testing. All groups detected VNN1 for Q2, and none of the groups detected HSP90AA1 for the binary trait. This consistency in detection applies only to the GAW17 models; how rare variants actually contribute to complex traits remains an open question. It will be interesting to see how the models used in the GAW17 simulations match what is observed in the analysis of rare variants as a substantial amount of sequence data becomes available over the next few years.
An additional lesson derived from the work of Group 13 is that decisions regarding the entities to be analyzed should be considered as seriously as the method of analysis. The entities can vary widely and should be constructed to match the presumed models of inheritance and the methods that will be used. As an example of a biological model used by Group 13, Christensen and Lambert applied biological insight regarding the relevance of haplotyping when they proposed a model of inheritance that required risk genotypes (rare variants on opposite haplotypes) rather than risk alleles (rare variants on the same haplotype) for testing a gene. Although their ability to capitalize on this model could not be addressed with the polygenic GAW17 data, it is reasonable to consider their model when analyzing sequence data. We expect that insights into the best entities will evolve over time as more work is done to investigate the role of rare variants.
Other decisions regarding the definition of the entities (e.g., genes, sliding windows, or pathways) will also affect the results. In its simplest form, the number of entities determines the significance level required, thus affecting the adjustment for multiple testing. The decision will also affect coverage of the genes and genome, which can influence the power to detect associations. Perhaps Sykes et al. are correct in suggesting that, when multiallelic approaches lack adequate power to detect associations with the rare variants alone, we should boost power by filtering genes to select those with associated common SNPs and then assess the role of rare variants in those genes. Pathways, as illustrated by Cherkas et al., are beginning to be considered and may provide a useful strategy for rare variant aggregation based on functional implication.
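To make the link between the number of entities and the required significance level concrete, the following is a minimal sketch that assumes a simple Bonferroni correction at a family-wise error rate of 0.05. The entity counts are hypothetical round numbers chosen for illustration, not counts from the GAW17 data, and the function name is ours; none of the Group 13 contributions is implied to have used this particular adjustment.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Hypothetical numbers of entities tested under three definitions (assumed values).
entity_counts = {"pathways": 200, "genes": 3000, "sliding windows": 25000}

# The more entities are tested, the smaller the p-value each one must reach.
thresholds = {name: bonferroni_threshold(n) for name, n in entity_counts.items()}

for name, threshold in thresholds.items():
    print(f"{name:>16}: {entity_counts[name]:>6} tests -> p < {threshold:.2e}")
```

Under these assumed counts, moving from pathways to sliding windows tightens the per-test threshold by more than two orders of magnitude, which is one way the choice of entity can affect power independently of the statistical method.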
Thus the overarching lesson is that creativity in modeling the genetic and biological processes, through both the genetic entities and the statistical methods chosen, is an important factor in the detection of causal rare variants, and that the process is not simple.