The most common workflow in Gitools consists of four simple steps that can be followed in different ways (see ): (i) prepare data, (ii) perform analysis, (iii) browse data and results, and (iv) export tables, figures and reports.
Main steps and features of Gitools.
There are three data types that Gitools understands: matrices, modules and tables (see ). A matrix is a two-dimensional structure that holds a value for each combination of row and column. Modules (also known as gene-sets or concepts) are lists of genes or other biological elements sharing a common biological property. For example, all the genes involved in the cell cycle could form the module “cell cycle”. Gene Ontology terms and pathways are commonly used as modules. A table is a list of elements with their attributes, where each row is an element of the list and each column an attribute. Gitools supports many different file formats that represent these data types, which are easy to generate using any text editor or any spreadsheet application such as Excel. Some of the supported file formats (e.g. GMX and GMT) are used by existing data repositories such as the Molecular Signatures Database (MSigDB), facilitating the use of gene-sets from this resource in Gitools.
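To make the relation between modules and matrices concrete, the following Python sketch reads modules from GMT-formatted text (one module per line: name, description, then the member genes, separated by tabs) and turns them into a binary gene-by-module matrix. It is only an illustration: the function names and the example genes are hypothetical and are not part of Gitools.

```python
def parse_gmt(text):
    """Parse GMT text into a dict mapping module name -> set of genes.

    Each GMT line is tab-separated: name, description, then member genes.
    """
    modules = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        modules[name] = {g for g in genes if g}
    return modules


def modules_to_matrix(modules):
    """Build a binary gene-by-module membership matrix as nested dicts."""
    genes = sorted(set().union(*modules.values()))
    names = sorted(modules)
    return {g: {m: int(g in modules[m]) for m in names} for g in genes}


# Hypothetical two-module example.
example = "cell cycle\tna\tCDK1\tCCNB1\tTP53\napoptosis\tna\tTP53\tBAX\n"
modules = parse_gmt(example)
matrix = modules_to_matrix(modules)
for gene, row in matrix.items():
    print(gene, row)
```

Each row of the resulting matrix is a gene and each column a module, with 1 marking membership.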
Gitools allows retrieving data from several external data sources in the form of Gitools data types (matrices, modules and tables). The current version implements importers for Biomart, IntOGen, KEGG and Gene Ontology. The user can import matrices, modules and tables from those sources, and save them for later analysis and/or visualisation. One important advantage of the Gitools data importers is that they allow the user to choose among many different types of gene identifiers for many different organisms. This makes Gitools a powerful application to work with genomic data independently of the organism or the type of gene identifiers.
IntOGen contains information on genes altered in hundreds of oncogenomics experiments. The alterations can be up-regulation and down-regulation for transcriptomic data, and gain and loss for copy number genomic data. Gitools can retrieve this information in the form of matrices and modules. Each of the available data types can be accessed at the level of individual experiments comparing tumour versus normal tissues among several samples, or at the level of combinations of experiments by tumour type. The ability to import data from IntOGen makes Gitools a valuable tool for oncogenomic data mining, and allows the comparison of user experimental data with hundreds of already analysed oncogenomic experiments and combinations of experiments.
Gitools also allows retrieving data from Biomart systems. Biomart comprises many public databases as well as local installations that can be accessed from Gitools. One important database that can be accessed is Ensembl, which provides gene annotations for many different organisms with cross-references for many types of gene identifiers. Further, the user can retrieve modules and annotations from different Ensembl releases. The generic importer from Biomart is very powerful as it allows users to obtain any type of data present in Biomart databases in the form of Gitools files. However, it is also complex to use if one is not familiar with the data available in Biomart and how it works. For that reason we have created dedicated importers to facilitate the import of the most commonly used modules and gene-sets, such as Gene Ontology terms.
In addition, we have created a dedicated importer for the KEGG database, which has the advantage of providing KEGG pathways for many organisms with many different types of gene identifiers.
The implemented methods in Gitools are enrichment, oncodrive, correlations, overlaps and combination of p-values ().
The enrichment analysis is useful when the analysis at the level of genes is not enough to capture the full complexity of biological systems and a higher level view is required, for example at the level of pathways or biological processes. We have implemented different methods of hypothesis testing that can be used depending on the nature of the data. For matrices with real-valued data (e.g. expression log2 ratios) we implement a z-score test using bootstrapping for mean or median estimation. For data measuring events (e.g. whether a gene has been found to be differentially expressed or not) we implement a binomial test and Fisher's exact test. As many tests are performed at the same time, multiple test correction for the p-values can be applied.
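The two families of tests described above can be illustrated with a short Python sketch. It computes a bootstrapped z-score for one module in one sample, and a right-tailed Fisher's exact test for event data. This is a minimal sketch under stated assumptions, not the Gitools implementation: the function names, the choice of the mean as the statistic, the number of resamples and the example values are all hypothetical.

```python
import random
import statistics
from math import comb


def zscore_enrichment(values, module, n_resamples=1000, seed=0):
    """Z-score of a module's mean value against random gene sets of equal size.

    values: dict gene -> real value (e.g. log2 ratio) for one sample (column).
    module: iterable of gene names; genes absent from `values` are ignored.
    """
    members = [g for g in module if g in values]
    if not members:
        raise ValueError("module has no genes present in the data")
    rng = random.Random(seed)
    observed = statistics.mean(values[g] for g in members)
    population = list(values.values())
    # Bootstrap: means of random gene sets with the same size as the module,
    # drawn with replacement from all values in the column.
    null_means = [
        statistics.mean(rng.choices(population, k=len(members)))
        for _ in range(n_resamples)
    ]
    mu = statistics.mean(null_means)
    sigma = statistics.stdev(null_means)
    return (observed - mu) / sigma


def fisher_right_tail(k, n, K, N):
    """Right-tailed Fisher's exact test p-value, P(X >= k).

    N: genes in the background, K: background genes flagged as events,
    n: genes in the module, k: module genes flagged as events.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical data: 100 genes with increasing values; the module holds the top 10.
values = {f"g{i}": float(i) for i in range(100)}
module = [f"g{i}" for i in range(90, 100)]
print(zscore_enrichment(values, module))
print(fisher_right_tail(k=4, n=10, K=20, N=100))
```

In a multi-sample analysis the same computation would be repeated for every column and every module, giving one statistic per cell of the results matrix.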
The oncodrive analysis is a method to find genes or items that are altered more often than expected by chance, taking into account the whole matrix. It has been designed to identify genes that are significantly altered in sets of tumours (see the IntOGen article for more details). The combination of p-values is used to produce a combined test of significance across a set of experiments. When several experiments testing the same hypothesis are analysed, a natural question that arises is whether the combined evidence among them supports the hypothesis. However, the individual experiments are often not directly compatible, so they cannot be merged into a single large data set to be analysed together. Nevertheless, one can still produce a combined test of significance across a set of experiments. After computing p-values for each experiment independently we can integrate those results using the weighted Z-method. This method is very convenient for integrating results obtained with other analyses like oncodrive or enrichment.
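The combination step can be sketched in Python as follows. Each one-sided p-value is converted into a standard normal deviate, the deviates are summed with weights, and the sum is rescaled so that it is again standard normal under the null hypothesis. This is a hedged illustration of the weighted Z-method in general, not of the Gitools code: the function name, the clipping bounds and the default of equal weights are assumptions, and in practice the weights would be chosen by the analyst (for instance according to the size of each experiment).

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def combine_pvalues_weighted_z(pvalues, weights=None):
    """Combine independent one-sided p-values with the weighted Z-method.

    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)), where z_i is the standard
    normal deviate whose upper-tail probability is p_i.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0] * len(pvalues)  # assumption: equal weights by default
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    z_scores = []
    for p in pvalues:
        # Clip to avoid infinite deviates for p-values of exactly 0 or 1.
        p = min(max(p, 1e-15), 1.0 - 1e-15)
        z_scores.append(-_STD_NORMAL.inv_cdf(p))
    combined_z = sum(w * z for w, z in zip(weights, z_scores))
    combined_z /= sqrt(sum(w * w for w in weights))
    return _STD_NORMAL.cdf(-combined_z)  # upper-tail probability of Z


# Hypothetical p-values for one gene in three experiments of different size.
print(combine_pvalues_weighted_z([0.04, 0.10, 0.20], weights=[3.0, 2.0, 1.0]))
```

Applied gene by gene (row by row) to a matrix of p-values, this yields one combined p-value per gene.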
Other commonly used methods that we have implemented are correlation and overlap analysis. It is possible to perform correlations to compare patterns among matrix columns and rows, for example to compare patterns of expression among genes for different samples. It is also possible to analyse the overlap of positive elements between columns and rows in a binary matrix.
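Both analyses reduce to pairwise comparisons of columns (or rows), as the following Python sketch shows. It is an illustration under assumptions rather than the Gitools implementation: the text does not state which correlation coefficient is used, so Pearson's coefficient is assumed here, and the function names and example matrices are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def columns(matrix):
    """Return the columns of a row-major matrix (list of equal-length rows)."""
    return [list(col) for col in zip(*matrix)]


def column_correlations(matrix):
    """Square matrix of Pearson correlations between all pairs of columns."""
    cols = columns(matrix)
    return [[pearson(a, b) for b in cols] for a in cols]


def column_overlaps(binary_matrix):
    """Square matrix counting rows that are positive (1) in both columns."""
    cols = columns(binary_matrix)
    return [[sum(x * y for x, y in zip(a, b)) for b in cols] for a in cols]


# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [[1, 2, 3], [2, 4, 1], [3, 6, 2]]
correlations = column_correlations(expression)

# Hypothetical binary matrix: 1 marks a gene altered in a sample.
altered = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 1]]
overlaps = column_overlaps(altered)
print(correlations)
print(overlaps)
```

Transposing the input matrix gives the corresponding row-wise comparisons.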
Browse data and results
In Gitools results and matrices are represented as interactive heat-maps. This type of visualisation is very convenient because it gives a wide view of the data and allows quick comparison of columns and rows. Several actions can be performed on the heat-maps: sorting by different criteria, displaying annotations of rows and columns, searching, filtering the matrix by value or label, and moving rows and columns freely. Additionally, one can perform clustering and calculate correlations and overlaps by columns or by rows. Every cell in the heat-map is associated with one or several values. For example, in a heat-map resulting from an enrichment analysis every cell contains the statistical values derived from the analysis, such as the right, left and two-sided p-values, corrected p-values, observed and expected numbers, etc. By default a relevant value (e.g. the right p-value for an enrichment analysis) is shown in colours in the heat-map, but the user can choose any associated value to be shown with the colour scale. By clicking on a cell the user can see all the values associated with that cell (e.g. the details of the enrichment statistics). All these actions allow the user to explore the data interactively. Any matrix file can be opened as a heat-map without performing any analysis beforehand, which allows the use of Gitools as a generic heat-map browser and visualizer.
To represent values of matrices and results we have implemented several colour scales, which are fully configurable for colours and significance level. In addition to changing the colour scale used for heat-maps there are other editable features that help to customize the heat-maps, such as the size of the cells, the grid lines, the font and the size for labels, etc. (see ).
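As an illustration of what a colour scale configurable for colours and significance level involves, the following Python sketch maps a p-value to an RGB colour. It is purely hypothetical and does not reproduce the scales shipped with Gitools: the function names, the default colours and the linear interpolation are all assumptions.

```python
def interpolate_colour(start, end, t):
    """Linear interpolation between two RGB tuples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def pvalue_colour(p, significance=0.05,
                  min_colour=(255, 0, 0),       # assumed colour for p = 0
                  max_colour=(255, 255, 0),     # assumed colour at the cutoff
                  non_significant=(187, 187, 187)):
    """Map a p-value to a colour; values above the cutoff get a flat colour."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > significance:
        return non_significant
    return interpolate_colour(min_colour, max_colour, p / significance)


# Hypothetical p-values from one row of a results matrix.
for p in (0.0, 0.01, 0.05, 0.3):
    print(p, pvalue_colour(p))
```

Changing `significance` or the colours changes how the same matrix of p-values is drawn, without touching the underlying values.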
Export tables, figures and reports
The last step is to generate tables and figures that can be used in presentations or manuscripts, as well as reports that can be shared with collaborators or published on the Internet. Tables are exported in tab-delimited text format, which can be easily opened with any spreadsheet editor such as Excel; heat-maps and scales can be exported as images; and reports are exported in HTML.
Command line tools
Gitools is focused on being user friendly but also powerful enough to be used by advanced users in complex pipelines. This is why we have implemented some of the features available through the graphical interface using command line tools. Currently the following functions are implemented: gitools-convert to convert between file formats, gitools-enrichment to perform enrichment analysis, gitools-oncodrive to perform oncodrive analysis, gitools-correlation to perform correlations and gitools-overlaps to perform overlap analysis. Upgrades are scheduled for the near future.
Comparing Gitools with other programs
Gitools has important unique features compared to other existing tools for genomic data analysis, which make it a valuable addition and complement to the many software tools that permit the analysis and integration of novel data with previous knowledge, such as MeV, GenePattern, geWorkbench, DAVID, Babelomics, GoMiner, ConceptGen and GSEA.
The most important distinctive feature of Gitools is the capacity to navigate data and results in the form of interactive heat-maps. This capacity makes Gitools useful for representing any matrix as a heat-map and navigating the data in it effectively, even without doing any analysis. In addition, after any analysis performed in Gitools the results are shown as heat-maps in which each cell contains detailed information on the analysis results, and it is possible to navigate from the results heat-map to the original data heat-map. Table S1 describes the features of Gitools for depicting and navigating heat-maps compared to other commonly used tools for depicting heat-maps, namely MeV, GenePattern, Genesis, PageMan, CIMminer and matrix2png.
Gitools currently performs five different types of analysis. To our knowledge, two of these analyses, Oncodrive and combination of p-values, are unique to Gitools and not present in any other genomic software. On the other hand, there are many tools that perform enrichment analysis over a variety of gene sets and modules (see for a review). The most important advantage of enrichment analysis in Gitools is that many conditions can be analysed at the same time and the results compared intuitively using interactive heat-maps. For example, when many cancer transcriptomes (or various experimental conditions) are analysed and we want to perform an enrichment analysis for modules (e.g. pathways) in each of them and compare the results for the different tumours, Gitools provides important advantages (see next section and for a real case study). First, all the samples (or conditions) are analysed in a single run, instead of one analysis run per sample as in most enrichment analysis tools. Second, the results are shown in a heat-map with one column per sample and one row per module, which facilitates the comparison between samples and modules. Third, different actions, such as sorting, filtering, clustering, correlations and overlaps, can be performed on the results heat-map, which facilitates the exploration and interpretation of the results. Fourth, navigation from the results heat-map to the original data heat-map is possible; in the proposed example, this would allow navigating from the module heat-map to a heat-map containing the genes in a particular module. Another advantage of enrichment analysis in Gitools is the availability of many gene sets from generic and dedicated importers (i.e. Biomart, Gene Ontology, KEGG, IntOGen), which are available for many organisms and many different types of gene identifiers. It is also important to note that Gitools understands various file formats for data matrices and gene sets. Examples are the gmt and gmx file formats, which allow the use of the extensive collection of gene sets in MSigDB in Gitools, and the tcm format (two column mapping), a common format for gene sets that is used, for example, by the DAVID knowledge base. Table S2 describes the similarities and differences between Gitools and other commonly used software for enrichment analysis (GSEA, DAVID, ConceptGen, ToppGene, Babelomics and GoMiner).
To illustrate the use of Gitools we have prepared one case study in which all five analysis methods currently available in Gitools are used (). We used a data set containing 156 non-small cell lung carcinomas and adjacent normal lung tissue samples by Hou et al. We started with a matrix of expression values (median-centered log-intensity values divided by standard deviation) for the 156 samples (). We performed Z-score enrichment analysis for KEGG pathways for each sample of the dataset to identify pathways in which genes tend to have significantly higher or lower expression values (). For example, we found that genes involved in cell cycle and homologous recombination, and genes that encode proteins of the spliceosome, have higher expression values in the tumour samples compared to normal samples. In contrast, genes involved in the MAPK signalling pathway, apoptosis, osteoclast differentiation and focal adhesion have significantly lower expression values in tumour samples, especially in the large cell carcinoma (LCC) subtype. We performed correlation analysis between expression values of different samples, finding higher similarities among samples with the same clinical classification (). This recapitulates the result obtained in the original manuscript describing this dataset. We next obtained log2 ratios between tumour and normal samples and used Oncodrive to identify genes that are significantly upregulated in this set of tumours (). Next we imported from IntOGen p-values for upregulation for other experiments analysing lung tumours. We combined the p-values of the imported experiments and the Hou et al. experiment to identify genes that are significantly upregulated in lung cancer taking into account several lung cancer experiments (). We also compared the overlap of genes upregulated in the Hou et al. experiment and the other lung cancer experiments imported from IntOGen ().
Documentation and tutorials
The documentation on Gitools includes a user's guide, practical tutorials (including a tutorial to perform all the analyses presented in the Case Study section and shown in ), data and results examples, and links to courses and presentations. It can be accessed either from the main Gitools website at http://www.gitools.org or directly from http://help.gitools.org. All analysis wizards include a real biological example that allows the user to fill in the wizards automatically with practical cases in biology, which helps in exploring Gitools features step by step. Users are welcome to subscribe to the newsletter to stay updated about new releases or relevant events.
The use of high-throughput techniques, such as micro-arrays and more recently sequencing techniques, is very common in genome research. The analysis of these data requires dedicated tools. The types of analysis and visualization offered by Gitools have proven very useful in a number of different types of projects, ranging from cancer genomics, plant genomics and molecular biology to evolutionary genomics. For instance, Gitools has been used for the integration of cancer genomics data, the study of the role of RBP2 in differentiation, a genomic analysis in melon and the study of functional divergence in the evolution of Homo sapiens.
In summary, we have presented Gitools, a desktop application for genomics data analysis, whose main features are the use of interactive heat-maps to navigate the data and results and the ready-to-use data import systems for several sources (i.e. Biomart, KEGG, IntOGen and Gene Ontology). These features are available to researchers without advanced knowledge of bioinformatics as well as to more experienced users who need to perform many of the available operations from the command line.