shows AGRA's main interface with uploaded gene lists. The application offers a novel way to compare ranked lists of genes with the help of BCS. A BCS is a set of ranked biomedical concepts gathered through FACTA, where they are grouped into six categories. FACTA can be queried with a word (e.g. P53), a concept ID (e.g. UNIPROT: P04637) or a combination of these, e.g. ‘[UNIPROT: P04637 AND (lung OR gastric)]’. AGRA calculates the BCS for a single gene list in three steps: (i) calculation of the protein BCS; (ii) calculation of the gene symbol BCS; and (iii) calculation of the gene list BCS.
To achieve this, each gene symbol from the gene lists is associated with its protein(s), and their UniProt IDs are extracted with the help of the Affymetrix annotation file (HG-U133 Plus 2 Annotations, Release 31). AGRA then queries FACTA with these UniProt identifiers and extracts at most 50 of the most important biomedical concepts from each category, ranked by their frequency of occurrence in MEDLINE abstracts. The concepts gathered in this step form the six BCS categories of each associated protein.
Next, the BCS categories for the gene symbol are calculated. If the gene symbol is associated with only one protein, its BCS is identical to that protein's BCS. When the symbol is associated with more than one protein, the frequencies in each category are averaged across those proteins.
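The averaging step can be sketched as follows. This is a minimal illustration, not AGRA's actual code: the function name `gene_symbol_bcs` and the nested-dictionary layout (category, then concept, then frequency) are hypothetical, and the sketch assumes that a concept missing from one protein's BCS contributes a frequency of 0 to the average, which the text does not specify.

```python
from collections import defaultdict


def gene_symbol_bcs(protein_bcs_list):
    """Average per-category concept frequencies over a symbol's proteins.

    Each protein BCS is a dict: category -> {concept: frequency}.
    Assumption: a concept absent from a protein's BCS counts as frequency 0.
    """
    if not protein_bcs_list:
        return {}
    n = len(protein_bcs_list)
    totals = defaultdict(lambda: defaultdict(float))
    for bcs in protein_bcs_list:
        for category, concepts in bcs.items():
            for concept, freq in concepts.items():
                totals[category][concept] += freq
    # Divide every summed frequency by the number of proteins.
    return {
        category: {concept: total / n for concept, total in concepts.items()}
        for category, concepts in totals.items()
    }


# Illustrative (made-up) frequencies for two proteins of one gene symbol.
p1 = {"disease": {"lung cancer": 120, "gastric cancer": 40}}
p2 = {"disease": {"lung cancer": 80}}
symbol_bcs = gene_symbol_bcs([p1, p2])
```

With a single protein the function returns that protein's frequencies unchanged, which matches the one-protein case described above.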
In the final step, the six categories of each gene list BCS are calculated by summing the values from the BCS categories of all gene symbols in the list. Because the order of the gene symbols in the list is crucial, AGRA weights each gene symbol BCS according to the position of the gene symbol in the list. The weight w for a single symbol x_i is defined as w(x_i) = (n − (i − 1))/n, where n is the number of all its concepts and i is the rank, in the gene list, of the gene to which the concept belongs (starting from 1).
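A small sketch of the rank weighting and the weighted aggregation may help make the formula concrete. The names `rank_weight` and `gene_list_bcs` are hypothetical, as is the dictionary layout. The sketch takes n as a parameter; in the usage example n defaults to the length of the gene list, which is an assumption made here for illustration and may differ from how AGRA chooses n.

```python
from collections import defaultdict


def rank_weight(i, n):
    """w(x_i) = (n - (i - 1)) / n for a 1-based rank i."""
    if i < 1 or n < 1:
        raise ValueError("rank i and n must both be >= 1")
    return (n - (i - 1)) / n


def gene_list_bcs(ranked_symbol_bcs, n=None):
    """Weighted sum of gene symbol BCSs, given in rank order (best first).

    Each gene symbol BCS is a dict: category -> {concept: frequency}.
    Assumption: n defaults to the number of gene symbols in the list.
    """
    if n is None:
        n = len(ranked_symbol_bcs)
    totals = defaultdict(lambda: defaultdict(float))
    for i, bcs in enumerate(ranked_symbol_bcs, start=1):
        w = rank_weight(i, n)
        for category, concepts in bcs.items():
            for concept, freq in concepts.items():
                totals[category][concept] += w * freq
    return {category: dict(concepts) for category, concepts in totals.items()}


# Illustrative (made-up) frequencies for a two-gene ranked list.
ranked = [
    {"disease": {"lung cancer": 100.0}},                 # rank 1, weight 1.0
    {"disease": {"lung cancer": 40.0, "asthma": 20.0}},  # rank 2, weight 0.5
]
result = gene_list_bcs(ranked)
```

Under this choice of n, the top-ranked gene always receives weight 1 and the weights decrease linearly with rank, so concepts tied to highly ranked genes dominate the gene list BCS.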
Finally, to avoid querying the FACTA system too often, AGRA saves BCSs in a local database. Whenever a gene symbol whose BCS has not yet been computed appears in one of the gene lists, the system queries FACTA, calculates the BCS and saves it locally.
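The caching behaviour described above can be illustrated with a small read-through cache. This is only a sketch under stated assumptions: the text does not say which database AGRA uses, so SQLite is chosen here for convenience; the class `BcsCache` and the `fetch_bcs` callable are hypothetical, and `fake_fetch` merely stands in for the FACTA query and BCS calculation rather than calling any real FACTA interface.

```python
import json
import sqlite3


class BcsCache:
    """Read-through cache: look up locally first, compute and store on a miss."""

    def __init__(self, fetch_bcs, db_path=":memory:"):
        # fetch_bcs is a hypothetical callable: gene symbol -> BCS dict.
        self._fetch_bcs = fetch_bcs
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS bcs "
            "(symbol TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def get(self, symbol):
        row = self._db.execute(
            "SELECT payload FROM bcs WHERE symbol = ?", (symbol,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: no remote query
        bcs = self._fetch_bcs(symbol)  # cache miss: compute once
        self._db.execute(
            "INSERT INTO bcs (symbol, payload) VALUES (?, ?)",
            (symbol, json.dumps(bcs)),
        )
        self._db.commit()
        return bcs


calls = []


def fake_fetch(symbol):
    """Stand-in for the remote query; records how often it is invoked."""
    calls.append(symbol)
    return {"disease": {"lung cancer": 100.0}}


cache = BcsCache(fake_fetch)
first = cache.get("TP53")   # triggers fake_fetch
second = cache.get("TP53")  # served from the local table
```

Passing a file path as `db_path` instead of `":memory:"` would make the stored BCSs persist between runs.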
When the BCSs for all gene lists have been extracted, AGRA calculates the overlap value for every pair of BCSs to evaluate the effectiveness of the FS methods. Overlap is a simple measure of similarity between two BCSs: the biomedical concepts that appear in both BCSs are counted, and the count is divided by the number of concepts in the shorter BCS. Another way to compare FS methods is to look up the position of relevant biomedical concepts in the final gene list BCS. The position of a single biomedical concept is defined as its rank among all the concepts in one of the categories. In this way, researchers can decide which FS method selects the most important concepts and ranks them higher than other methods do.
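Both comparison measures are simple enough to sketch directly. In this illustration each BCS is reduced to a ranked list of concept names for a single category, which is an assumption about the data layout; the function names `overlap` and `concept_position` and the example concepts are hypothetical.

```python
def overlap(concepts_a, concepts_b):
    """Shared concepts divided by the size of the shorter concept set."""
    set_a, set_b = set(concepts_a), set(concepts_b)
    shorter = min(len(set_a), len(set_b))
    if shorter == 0:
        return 0.0  # assumption: overlap with an empty BCS is taken as 0
    return len(set_a & set_b) / shorter


def concept_position(ranked_concepts, concept):
    """1-based rank of a concept in a ranked list, or None if it is absent."""
    try:
        return ranked_concepts.index(concept) + 1
    except ValueError:
        return None


# Illustrative ranked concept lists for one category from three FS methods.
bcs_a = ["lung cancer", "asthma", "fibrosis"]
bcs_b = ["asthma", "lung cancer"]
bcs_c = ["asthma", "diabetes"]

ab = overlap(bcs_a, bcs_b)  # both concepts of the shorter list are shared
ac = overlap(bcs_a, bcs_c)  # one of the two concepts is shared
pos = concept_position(bcs_a, "asthma")
```

Dividing by the size of the shorter BCS means that a BCS fully contained in a longer one yields an overlap of 1, so the measure is not penalized by differences in length.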