Here we demonstrate the performance of FDR-FET from three perspectives. First, we assessed the selectivity and sensitivity of the method. Second, we compared FDR-FET with other gene set enrichment analysis methods. Because FDR-FET takes P values as input and does not differentiate the directions of gene regulation, we chose two popular implementations of the same category, ie, a simple FET and PAGE. Third, we compared the results generated from different reference set options.
In general, the sensitivity of gene set enrichment analysis can be improved by removal of background noise, which can have a strong impact on the FDR result through removing the bottom n percentile of low intensity probes or probes flagged as “absent”, or similar. Consolidation of probes onto the gene level is also recommended to improve independence of measures, which is one assumption of FET.3
For example, Affymetrix probe sets can be consolidated by associating each gene with the most significant P
value among all probe sets for the gene. Alternatively, one can utilize the updated probe set definitions, which have been shown to improve the precision and accuracy of microarray data analysis.17
We utilized a microarray dataset from a published study on the cellular effects of three human immunodeficiency virus (HIV) protease inhibitors.19
It is well known that patients taking protease inhibitor drugs to treat HIV-autoimmune deficiency syndrome often develop a lipodystrophy-like syndrome, including hyperlipidemia, peripheral lipoatrophy, and central fat accumulation.20
Parker et al19
have shown that protease inhibitors could induce gene expression changes indicative of dysregulation of lipid metabolism, endoplasmic reticulum stress, and metabolic disturbance. These results are consistent with clinical observations, and provide a basis for a molecular mechanism for the pathophysiology of protease inhibitor-induced lipodystrophy.
The probe set level expression data was generated using the MAS 5.0 algorithm with quantile normalization,21
and the 20% lowest expressed probe sets were removed. A one-way analysis of variance with respect to the “drug treatment” factor was performed to generate the sorted gene list by P
values. We utilized gene sets from both the Gene Ontology22
project and the Kyoto Encyclopedia of Genes and Genomes
Validation of FDR-FET
To demonstrate the sensitivity and selectivity of FDR-FET, we generated 1000 randomized gene lists while retaining the same set of P values from the analysis of variance. We ran FDR-FET on each of these gene lists using reference set option 1 (ie, “genes”) and maximal FDR at 35% for every gene set in KEGG. The 95th and 99th percentiles of the negative log of P values were calculated for every gene set, and these values are found to center around 1.9 and 2.6, respectively (). As expected, no gene set shows any large deviation from the others. By contrast, the P values generated from the real dataset exhibit a nonuniform distribution with only a few highly significant gene sets. Importantly, the top three gene sets with the largest separations from the 99th percentiles are the targets of HIV protease inhibitors, ie, aminoacyl-tRNA biosynthesis (KEGG:hsa00970), biosynthesis of steroids (KEGG:hsa00100), and glycolysis/gluconeogenesis (KEGG:hsa00010).
Figure 1 Performance assessment of FDR-FET using simulated datasets. P values are calculated for gene sets from the KEGG for each of the 1000 randomized gene lists using FDR-FET (with the option “genes” and maximal FDR 35%). The 95th (red, squares) (more ...)
Comparison of FDR-FET with a simple FET test
Many of the existing gene set enrichment analysis implementations are based on FET with a fixed P value or fold-change cutoff. To compare the performance of FDR-FET, which employs a flexible cutoff criterion, with that of a typical gene set enrichment analysis, we analyzed the regulated gene list generated with an arbitrary FDR cutoff (35%). Table 1 contains the 10 most significant gene set hits calculated by FDR-FET using reference set option 1 (ie, “genes”) and maximal FDR at 35%. This list includes all the established major targets of the HIV protease inhibitors (lipid metabolism, amino acid metabolism, gluconeogenesis, and endoplasmic reticulum). By contrast, when a single arbitrary FDR cutoff (35%) is used, the effect on gluconeogenesis associated with the pathophysiology of protease inhibitors is missed. Moreover, as depicted in , the P values for three representative gene sets reach the maximal significance at different FDR cutoffs, demonstrating that the utilization of a flexible cutoff criterion indeed maximizes the signal to noise ratio of a gene list for individual gene sets.
Figure 2 The impact of cutoff criterion on gene set analysis result. The influence of the FDR cutoff on the size of regulated gene list (bars, right axis) and on the significance of selected gene sets (calculated with the option “genes”) for the (more ...)
Comparison of FDR-FET with PAGE
PAGE analysis was performed using the whole vector of P
values from the one way analysis of variance as input. Because PAGE is based on the central limit theorem that requires gene sets to be sufficiently large, we only examined those gene sets with sizes ≥10. The negative log of P
values for three gene sets (ie, GO:0006418, GO:0004812, and KEGG:hsa00970) are set to 20 because they all have a P
value of zero by PAGE analysis. Again, we could identify all the major targets of HIV protease inhibitors in the top 10 gene set hits from PAGE output (Appendix 1
). Interestingly, the results from FDR-FET and PAGE show high concordance, despite the fundamental difference in their underlining methodologies (). Using a gene set negative log P
value cutoff of 3, PAGE identified 76 significant affected gene sets, whereas FDR-FET identified 79, among which 63 are shared between the two methods. In particular, the two top 10 hit lists have eight gene sets in common.
Figure 3 Comparison of the analysis result of FDR-FET with that of PAGE. P values are calculated for gene sets from the Gene Ontology and KEG for the human immunodeficiency virus protease inhibitor experiment using FDR-FET (with the option “genes” (more ...)
Because PAGE is a parametric test, it is generally more liable to gene outliers. In other words, a gene (or a few genes) with a sufficiently large fold-change may lead to significant testing results for the gene set of which the gene is a member. For instance, GO:0008652 and GO:0000049 have highly significant P
values by PAGE, but only modest P
values by FDR-FET (). A close examination of the genes annotated to these two gene sets reveals that both contain a couple of genes with extremely low P
values from the analysis of variance test (Appendix 2
). By contrast, genes in FET-based methods have equal weight, and the P
value reflects the gene set enrichment in the regulated gene list, true to the name of gene set enrichment analysis. There are areas where FDR-FET and PAGE can complement each other. For example, FDR-FET is more robust when the gene set size is small and when PAGE cannot produce a reliable P
value. On the other hand, incomplete gene annotation may affect FET-based methods more than PAGE because lack of knowledge is counted as a “true negative” in the contingency table.
Comparison of different reference set options
When the biological experiment is performed using a focused gene array (ie, a subset of genes from a genome), the whole genome is used as the reference set, and the number of “true negative” is inflated, leading to unrealistic small P values in gene set enrichment analysis outputs. Therefore, one must evaluate what is (close to) the true “universe” for an enrichment analysis. We have introduced new options to address this issue:
- “Genes” whereby all genes tested are counted in the gene set enrichment analysis calculation, assuming that the gene sets are universally representing the genome universe
- “Intersection” can be used when the gene sets are selected to represent a restricted universe, eg, signaling pathways; in this case, only genes that are present in at least one of the signaling pathways are counted
- “Union” represents the general case by which any genes are counted once they are present in either the regulated gene list or the gene sets (“genome as reference set”). In options “genes” and “union”, annotated and unannotated
genes are both counted in the reference set, while in option “intersection”, genes are only counted when they are annotated in at least one of the gene sets. Table 2 contains the 10 most significant gene set hits by the option “genes” and the corresponding P
values, and ranks by options “union” and “intersection” calculated using maximal FDR at 35%. All three options identified the main HIV protease inhibitor targets, present in the top 10 s, except for gluconeogenesis, which is ranked 12th in results generated from the “union” option. Using a gene set negative log P
value cutoff of 3, the options “genes” and “intersection” identified similar numbers of affected gene sets, 79 and 73, respectively, among which 71 gene sets are shared between the two hit lists. By contrast, the “union” option identified 96 gene sets, of which 21 are unique to this option and appear to be nonspecific and unrelated to the drug effects upon close examination, suggesting a possible loss of selectivity with this option (Appendix 1
). The effect of “intersection” becomes more apparent when smaller gene sets are used. The P
values and the order of the hits are altered when considering smaller reference sets (Appendix 1
and Appendix 3
). By selecting an appropriate reference set, we can enhance the sensitivity and selectivity and reduce the number of spurious hits.