Research efforts over the past few years have yielded an explosion of exome sequencing studies and exomic variation data (reviewed in 
). One surprising result has been the discovery of hundreds of thousands of novel and rare nonsilent variants in protein-coding genes, some of which may have functional consequences related to human health. Common diseases, once hypothesized to be primarily due to common variants, are now believed to have heterogeneous genetic causes, due to both common and rare variants.
These developments have created demand for a new generation of statistical and informatics approaches. Increasingly powerful analysis methods have been developed to enable detection of association between phenotype and variants with small to moderate effect sizes (reviewed in 
). Rather than testing each variant individually, variants can be collapsed or summed with a “burden” approach, in which the strength of phenotypic association is considered with respect to a group of variants occurring in a common region or at a common allele frequency threshold. The contribution of each variant to the association may be weighted by frequency or bioinformatically predicted impact. Burden strategies gain power relative to independent tests of single variants, but they lose power when variants with a neutral or protective effect are included. Regression models and overdispersion tests have been designed to detect variants that affect phenotype, regardless of the direction of the effect (deleterious or protective). New approaches continue to be introduced, such as a mixture model that incorporates gene-gene interactions and an adaptive weighting procedure. A recent study has even suggested that single-variant test statistics may be more powerful than collapsing strategies on real data. Importantly, no single method appears to be superior for all phenotypes, genomic regions, disease models, and populations.
Here we describe a new hybrid likelihood test, BOMP (Burden Or Mutation Position test), designed for case-control exome sequencing studies to detect the presence of causal variants in a functional group. The functional group can be defined as a gene, genomic region, or gene set (multiple genes involved in a pathway or biological process). The test can incorporate variant weighting by bioinformatically predicted functional impact. We combine, into a single statistic, a directional burden test in which low-frequency variants have increased weight and a non-directional position distribution test that does not consider allele frequency. Our burden test uses a collapsing strategy and metrics of variant functional importance, which are similar to those of previously published burden tests (Table S1). An advantage of our test is that its formulation as a likelihood ratio uniquely allows us to combine it with the position distribution test. The two tests complement each other and together yield increased power to detect biologically important variants, particularly when applied to a gene set containing genes with different kinds of variants (e.g., rare, low frequency, common, protective).
To assess the utility of BOMP, we compare its power to three leading methods for variant case-control association testing: VT, a mutation burden statistic; SKAT, a regression model and overdispersion test; and KBAC, which uses mixture modeling and kernel density estimation. We generate simulated case-control studies ranging from 200 to 5,000 individuals, using two demographic growth models and eight disease etiologies (models of disease causation). We also apply BOMP to dichotomized empirical data from a study of quantitative traits, the Dallas Heart Study, which investigated the association between variants in angiopoietin-like (ANGPTL) proteins and triglyceride metabolism.
In these experiments, BOMP is consistently powerful across a spectrum of disease causality models, in simulations of case-control studies drawn from populations of African-American and European-American individuals, and for the ANGPTL variants from the Dallas Heart Study. It appears to be particularly useful for detecting genes containing causal variants when protective variants are present, when a disease phenotype is associated with variants that cluster in key regions of a gene, when a causal variant is common, or when applied to a candidate gene set rather than a single candidate gene.
Finally, we apply BOMP to identify causal gene sets in an ongoing, whole-exome case-control sequencing study of bipolar disorder. We find that seven gene sets are nominally associated with bipolar disorder and that one, the “MAPK signaling pathway” (KEGG), trends toward significance after correcting for the multiple gene sets tested. Notably, this pathway has been implicated previously in bipolar disorder.
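To make the distinction between nominal and corrected significance concrete, the sketch below adjusts a list of gene-set p-values with two standard procedures, Bonferroni and Benjamini-Hochberg. The text does not state which correction was used in the bipolar disorder analysis, so both the choice of procedures and the p-values here are hypothetical and purely illustrative.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for five gene sets (not results from any study).
gene_set_pvals = [0.001, 0.02, 0.03, 0.2, 0.6]

bonf = bonferroni(gene_set_pvals)
bh = benjamini_hochberg(gene_set_pvals)

# A set is "nominally" associated if its raw p-value is below 0.05,
# but it may no longer pass the threshold once adjusted.
nominal = [p < 0.05 for p in gene_set_pvals]
```

In this toy input, three sets are nominally associated at 0.05, while only the smallest p-value remains below 0.05 after the Bonferroni adjustment.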