SOURCE allows users to query individual genes as well as retrieve selected attributes for many genes in batch. When searching for individual genes, users can query the database via a gene's name (whether the official HGNC name or a historical alias), the LocusLink identifier, the current UniGene cluster identifier, the GenBank accession of a sequence associated with the gene through UniGene, or a cDNA clone identifier. The flexibility of this search interface is important, since users may have access to only a few of these attributes for the genes they are studying. In order to increase the likelihood of successful gene name searches, we have assembled the largest collection of gene aliases available on the web by combining synonym data from a large number of sources.
The capacity to access gene-level data through searches using clone identifiers is particularly practical for users of DNA microarrays, as most spotted array platforms employ cDNA clones, each of which may be represented by multiple ESTs. By grouping the ESTs associated with each clone, SOURCE can also reveal potentially chimeric cDNA clones, that is, clones whose ESTs map to more than one UniGene cluster or gene. Currently, no other publicly available database offers this search functionality for accessing both gene- and clone-level data.
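The chimeric-clone check described above can be illustrated with a minimal Python sketch. It is not SOURCE's actual implementation: the record layout, the function names, and all identifiers in `EST_RECORDS` are hypothetical and chosen only for illustration. The sketch assumes each EST record carries the clone it was derived from and the UniGene cluster it maps to.

```python
from collections import defaultdict

# Hypothetical records: (clone_id, est_accession, unigene_cluster).
# All identifiers are invented for this example.
EST_RECORDS = [
    ("CLONE:0001", "EST_A1", "Hs.1001"),
    ("CLONE:0001", "EST_A2", "Hs.1001"),
    ("CLONE:0002", "EST_B1", "Hs.2002"),
    ("CLONE:0002", "EST_B2", "Hs.3003"),
    ("CLONE:0003", "EST_C1", "Hs.4004"),
]


def clusters_by_clone(records):
    """Group the UniGene clusters seen for each clone across all of its ESTs."""
    clusters = defaultdict(set)
    for clone_id, _est_accession, cluster in records:
        clusters[clone_id].add(cluster)
    return dict(clusters)


def potentially_chimeric(records):
    """Return clones whose ESTs map to more than one UniGene cluster."""
    return {
        clone_id: sorted(clusters)
        for clone_id, clusters in clusters_by_clone(records).items()
        if len(clusters) > 1
    }


if __name__ == "__main__":
    for clone_id, clusters in potentially_chimeric(EST_RECORDS).items():
        print(clone_id, "maps to", ", ".join(clusters))
```

A clone flagged this way is only a candidate: ESTs from one clone can also land in different clusters because of clustering errors, so the result is best treated as a prompt for closer inspection.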
SOURCE allows for dynamic linking to both GeneReports and CloneReports. This feature is particularly useful when browsing large data sets. For example, when visualizing datasets with TreeView (16), linking of the gene or clone names to SOURCE allows users to find detailed information about each gene or clone with just a click. Similarly, external websites, such as supplements to published functional genomic datasets (e.g., see http://genome-www.stanford.edu/hostresponse/), are made much more generally useful by linking of each gene or clone name to SOURCE.
One of the most important and unique features of SOURCE is the ability to simultaneously extract data for thousands of genes in batch, thus eliminating the need for laborious cross-referencing of data from external databases. This is particularly useful for functional genomic studies, where it is necessary to continually update information on the genes and clones being examined. For instance, researchers interested in the mapped position or subcellular localization of a list of genes can extract these attributes with ease, and perform statistical analyses such as assessing the enrichment of certain functional attributes within clusters of genes (17). Since the data in SOURCE are refreshed weekly, users can also use this utility to regularly update annotations associated with genes or cDNA clones of interest. Input can be via a text file uploaded to the server or by pasting the queries into a text box. Batch SOURCE can be searched by clone identifier, accession number, gene name, gene symbol, UniGene identifier, or LocusLink identifier. Retrieval options include gene name, aliases, LocusLink ID, chromosome location, subcellular localization, representative accessions (protein or mRNA) and Gene Ontology annotations.
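As an illustration of the kind of enrichment analysis that batch-retrieved annotations make possible, the sketch below computes a one-sided hypergeometric tail probability. The hypergeometric test is one common choice for this task and is used here as an assumption; it is not necessarily the method of the cited study. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided enrichment p-value, P(X >= hits).

    population: total number of genes considered (e.g. all genes on the array)
    annotated:  genes in the population carrying the attribute of interest
    sample:     number of genes in the cluster being tested
    hits:       genes in the cluster carrying the attribute
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie between 0 and population")
    lower = max(hits, 0)
    upper = min(annotated, sample)
    # Sum the hypergeometric probability mass from `hits` up to the maximum
    # number of annotated genes the cluster could contain.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(lower, upper + 1)
    )
    return tail / comb(population, sample)


if __name__ == "__main__":
    # Hypothetical counts: 5000 genes on the array, 200 annotated as nuclear,
    # a cluster of 50 genes of which 8 are nuclear.
    p_value = hypergeom_enrichment_p(5000, 200, 50, 8)
    print(f"enrichment p-value: {p_value:.3g}")
```

When many attributes or clusters are tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch does not include.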
Use of SOURCE has grown steadily over the past two years. Today, thousands of researchers query the system daily, generating over 100 000 hits per month. Individual GeneReports make up the majority of accesses, and the gene expression browser and the batch retrieval utility are also extremely popular. Reciprocal links now exist to and from a number of databases, including SwissProt, GeneCards, and the UCSC Genome Browser.