Our detection of bimodal gene expression in skeletal muscle tissue reveals an apparently rare (<0.2%), yet potentially new, predictor of human disease and phenotypes. Of the eight confirmed bimodal genes, we found that seven were expression quantitative trait loci (eQTL), as they were strongly to completely associated with possibly causal SNPs occurring within a 200-kb cis region of the gene. This finding suggests that most bimodal genes found in healthy human tissue may result from polymorphisms within such gene regulatory regions. In the remaining potential bimodal genes, highly significant SNP associations were less common (possibly reflecting misclassification of bimodal genes), though still more frequent than genome-wide. Hence, in healthy individuals, eQTL are prime (yet not exclusive) locations for finding bimodally expressed genes. We also found that known copy number variation, while a potential explanation for some bimodal expression, was not more abundant among bimodal genes than across the genome, and is hence unlikely to be the main source of such distributions.
The current study also assessed the relationship of RNA expression mode with known predictors of diabetes. We observed a likely association of insulin sensitivity with mode of ACTN3 expression: 42.2% of the individuals fell in the higher-expression mode, which was associated with insulin resistance. Although this association did not meet strict correction for multiple testing, ACTN3 has previously been associated with slow- and fast-twitch muscle fiber, with correlation to athletic performance [29]. The bimodal expression of ACTN3 could reflect a bimodal distribution in the proportion of type 2 fibers present in the subjects, though we are not able to assess this directly. As insulin resistance is a known predictor of diabetes [30], the role of ACTN3 expression deserves further investigation.
Many of the other bimodal genes identified have previously been reported to be associated with other diseases and characteristics ranging from cancer to arthritis [31], with some of these reports also describing marked dichotomies in expression levels [38]. Perhaps the most intriguing disease-related bimodal genes are the HLA loci, as the HLA system is known not only to convey risk for a variety of immune-related diseases, but also to contain the main genetic risk factors for type 1 diabetes [32]. That three of the HLA genes (DRB1, DRB5, and C) were found to have bimodal distributions - as well as ERAP2
which plays a final role in producing HLA
], is of great interest, and we are currently further investigating the role of these genes in relation to diabetes and various metabolic phenotypes in the Pima population.
The prevalence of bimodal gene expression we have observed in a single healthy tissue is approximately 100 times lower than that reported in mixed-tissue studies [21]. Since these studies assessed distinct expression differences across heterogeneous tissues, a larger number of bimodally expressed genes would be expected. Our consistently low estimate gives confidence that, at least in healthy muscle tissue, unambiguous bimodal expression is not common, providing a new reference prevalence for bimodality in healthy tissue to which studies of bimodality in diseased and mixed tissues may be compared.
Our estimate also differs greatly from the only other prevalence reported for bimodal expression in a population composed solely of healthy persons: 28% in lymphoblastoid cells [18]. As the threshold for assessing bimodality in that study was much more liberal, we re-analyzed the data from this population and found that only 2.9% of the limited gene set previously analyzed showed bimodality consistent with the thresholds used in our analysis (see Methods). We also assessed two other publicly available data sets containing muscle biopsies, finding estimates of clear bimodality in these populations of 0.72% and 1.4%, and again a prevalence of <0.2% when bimodality was required to replicate across the two data sets (data not shown). These validation sets used chips that had on average 1 or 2 probes per gene, whereas our present study is the first to assess bimodality in expression data from exon arrays, which contained on average 49 probes per gene. Hence, the bimodality we observed represents a strong, replicated signal that a single probe could miss; conversely, averaging at the gene level could mask probes that are uniquely bimodal. The additional validation of our confirmed bimodal genes in these other populations with low bimodal prevalence (Table ) provides even more convincing evidence of their authenticity.
We highlight two technical issues for future studies to be aware of: the use of a model that allows for unequal variances in the bimodal distribution, and the identification of batch effects. We observed that a model allowing unequal variances was clearly more appropriate than a constant-variance model for approximately 20% of the genuine bimodal genes identified in our data; hence, failure to allow for unequal variances, as well as differences in data scaling, may contribute to some variability in prevalence estimates. Even more important is the potential of batch effects to cause artifactual bimodal expression. Such confounding can occur when any set of experimental factors is not uniform, causing the expression levels of a given transcript to be uniformly over- or under-estimated for a portion of the samples. We noted that these batch effects were associated more frequently with genes of higher transcript abundance. We also found that chips with distinct lot numbers that had been processed on different sets of fluidics stations gave rise to such artifactual bimodality, though it was not possible to determine which of these factors played the primary role in this study. However, because this grouped source of heterogeneity was identified for most arrays, bimodality could be analyzed within homogeneous groups, which not only eliminated the unwanted bias but also provided distinct replication sets in which tests for bimodality could be confirmed.
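The distinction between constant-variance and unequal-variance mixture models can be illustrated with a minimal sketch. It is not the fitting procedure used in this study: the data are simulated, the function names (`normal_pdf`, `fit_two_component`) are hypothetical, the expectation-maximization routine is hand-written for illustration, and the comparison by the Bayesian information criterion (BIC) is an assumed model-selection rule.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component(xs, equal_variance, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (params, log_likelihood, bic). Illustrative only.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Crude initialisation: split the sorted values at the median.
    lower, upper = ordered[: n // 2], ordered[n // 2:]
    mu = [sum(lower) / len(lower), sum(upper) / len(upper)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    var = [var_all, var_all]
    pi = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to each mode.
        resp = []
        for x in xs:
            w = [pi[k] * normal_pdf(x, mu[k], var[k]) for k in (0, 1)]
            total = max(w[0] + w[1], 1e-300)
            resp.append((w[0] / total, w[1] / total))
        # M-step: update mixing weights, means, and variances.
        nk = [max(sum(r[k] for r in resp), 1e-12) for k in (0, 1)]
        pi = [nk[k] / n for k in (0, 1)]
        mu = [sum(r[k] * x for r, x in zip(resp, xs)) / nk[k] for k in (0, 1)]
        ss = [sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) for k in (0, 1)]
        if equal_variance:
            pooled = (ss[0] + ss[1]) / n  # one variance shared by both modes
            var = [pooled, pooled]
        else:
            var = [ss[k] / nk[k] for k in (0, 1)]  # one variance per mode
        var = [max(v, 1e-6) for v in var]  # guard against variance collapse

    log_lik = sum(
        math.log(max(pi[0] * normal_pdf(x, mu[0], var[0])
                     + pi[1] * normal_pdf(x, mu[1], var[1]), 1e-300))
        for x in xs
    )
    # Free parameters: one mixing weight, two means, and one or two variances.
    n_params = 4 if equal_variance else 5
    bic = n_params * math.log(n) - 2.0 * log_lik
    return {"pi": pi, "mu": mu, "var": var}, log_lik, bic


# Simulated log-expression values (assumed, not study data):
# a tight low mode and a much more dispersed high mode.
random.seed(0)
expression = ([random.gauss(5.0, 0.2) for _ in range(60)]
              + [random.gauss(8.0, 1.0) for _ in range(40)])

params_equal, loglik_equal, bic_equal = fit_two_component(expression, equal_variance=True)
params_unequal, loglik_unequal, bic_unequal = fit_two_component(expression, equal_variance=False)

print("BIC, constant variance:", round(bic_equal, 1))
print("BIC, unequal variances:", round(bic_unequal, 1))
print("Preferred model:", "unequal" if bic_unequal < bic_equal else "constant")
```

In this sketch a lower BIC indicates the preferred model. Applying the same comparison separately within each homogeneous batch group, rather than to the pooled samples, would mirror the stratified analysis described above.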