GOEAST is an online GO term enrichment analysis tool. It was developed with improved functionality to meet the new needs that have arisen from the broad application of multiple high-throughput experimental technologies, such as microarrays, SAGE and mass spectrometry.
Because GOEAST is web-based, it is very easy to use; an analysis can be completed with a few clicks. The source data files used by GOEAST are updated automatically or semi-automatically, which ensures that users always receive the most up-to-date analysis results. The broad range of supported microarray types and species allows more users to take advantage of GOEAST. As shown by the aforementioned examples, the graphical output and the multiple-experiment comparison function of GOEAST can extract more useful information from the input data.
Some available GO enrichment analysis tools use a universal gene set or the input gene list itself as the statistical background for calculating GO term enrichment P-values, which would produce biased enrichment analysis results. To overcome this problem, GOEAST uses all probes on each microarray platform, or all genes of a given species, to calculate the background GO distribution, thereby ensuring the accuracy of the analysis results. The statistical methods commonly used for GO enrichment analysis include the binomial test, the χ²-test, Fisher's exact test and the hypergeometric test (Supplementary Table 3). Previous research has shown that Fisher's exact test and the hypergeometric test are more accurate than the binomial test and the χ²-test for GO enrichment analysis, because the latter two tests require large sample sizes that most GO terms do not have (29). Since Fisher's exact test and the hypergeometric test are statistically equivalent (29), GOEAST uses the hypergeometric test by default because R packages compute it much faster than Fisher's exact test. Fisher's exact test and the χ²-test are also supported by GOEAST.
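To illustrate the hypergeometric test discussed above, the following minimal Python sketch computes the upper-tail P-value for a single GO term. The function name, argument names and toy numbers are illustrative assumptions; they are not taken from the GOEAST implementation, which relies on R packages.

```python
from math import comb


def hypergeom_enrichment_p(background_size, background_hits, list_size, list_hits):
    """Upper-tail hypergeometric P-value, P(X >= list_hits).

    background_size: number of probes/genes in the background (hypothetical name)
    background_hits: background probes/genes annotated with the GO term
    list_size:       number of probes/genes in the input list
    list_hits:       input probes/genes annotated with the GO term
    """
    N, K, n, k = background_size, background_hits, list_size, list_hits
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")

    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated genes.
    # math.comb returns 0 when the lower argument exceeds the upper one, so
    # impossible configurations contribute nothing to the sum.
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Toy example: 10 background genes, 4 annotated with the term;
# an input list of 3 genes contains 2 annotated genes.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_value)
```

In this sketch the choice of background enters only through `background_size` and `background_hits`, so replacing a platform-specific background with a universal gene set changes both counts and therefore the resulting P-value.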
When the same statistical test is performed many times, the multiple-testing problem becomes significant and produces more false-positive results. This problem is usually addressed by controlling the overall false discovery rate (FDR) of the entire result set, rather than controlling the type I error (the raw P-value) of each individual test or controlling the family-wise error rate (FWER), which is considered too strict for biological data (32). Two methods are commonly used to control the FDR: the Benjamini–Hochberg method and the Benjamini–Yekutieli method (33). The former is designed for independent tests, whereas the latter remains valid when the tests are dependent. Since the enriched GO terms among a given list of genes are often positively correlated, we chose the Benjamini–Yekutieli method to calculate FDRs in GOEAST by default. Several other adjustment methods are also supported.
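As a rough illustration of the Benjamini–Yekutieli adjustment mentioned above, the following Python sketch converts a list of raw P-values into adjusted values. It follows the standard step-up formulation with the harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m. The function name and the example values are assumptions made for illustration and do not reproduce the GOEAST code.

```python
def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted P-values in the input order."""
    m = len(p_values)
    if m == 0:
        return []

    # Harmonic correction factor that distinguishes BY from Benjamini-Hochberg.
    c_m = sum(1.0 / i for i in range(1, m + 1))

    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Toy example with four hypothetical raw P-values.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_yekutieli(raw))
```

Dropping the factor `c_m` from `candidate` would give the Benjamini–Hochberg adjustment instead, which makes the extra conservativeness of the Benjamini–Yekutieli method explicit.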
Because of the hierarchical dependencies among GO terms, the enrichment of some GO terms might also cause over-representation of their neighboring terms when the correlations of neighboring GO terms and their enrichment status are considered in the calculation. To overcome this problem, Alexa et al. introduced an improved weight scoring algorithm for calculating the significance of GO terms, which is thought to reduce the FDR caused by over-representation of neighboring GO terms. This algorithm is also supported by GOEAST, but because analysis with it is time consuming, it is not the default method; users can activate it via the advanced parameter settings.
All functionalities of GOEAST have been tested thoroughly with different input data on various operating systems. We believe that GOEAST will help more biologists discover hidden information in their high-throughput experimental results.