Written in Perl, ASSIMILATOR retrieves, queries and processes information for the desired SNPs from the UCSC Genome Browser's public MySQL database and displays it in a simplified, user-friendly manner. All available ENCODE tracks are queried in addition to predefined tracks, such as mRNAs, ESTs and CpG islands. In addition, eQTL data hosted by the Pritchard laboratories (http://eqtl.uchicago.edu), PolyPhen2 functional annotation (Adzhubei et al., 2010) and SNP location relative to the gene are displayed. Several mechanisms have been designed to improve the efficiency of data retrieval: an XML-based track database, which minimizes the number of database queries, and multi-threading support, which allows multiple SNPs to be queried simultaneously and reduces overall processing time with minimal reduction in individual performance.
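The two efficiency measures described above can be illustrated with a minimal sketch. It is written in Python rather than ASSIMILATOR's Perl, and it is not the tool's actual code: the XML layout, the two example track entries, the function names and the `run_query` callback are all assumptions made for this example. The sketch parses track metadata once from XML, so that table and column names do not have to be looked up in the database for every SNP, and uses a thread pool to process several SNPs concurrently.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor

# Hypothetical track database: one entry per UCSC table, recording which
# columns hold the chromosome and the 0-based, half-open feature interval.
TRACK_XML = """
<tracks>
  <track table="cpgIslandExt" chrom="chrom" start="chromStart" end="chromEnd"/>
  <track table="all_mrna" chrom="tName" start="tStart" end="tEnd"/>
</tracks>
"""

def load_tracks(xml_text):
    """Parse the track database once; return a list of attribute dicts."""
    return [dict(t.attrib) for t in ET.fromstring(xml_text).iter("track")]

def build_overlap_query(track, chrom, pos):
    """Return (sql, params) selecting features that overlap a 1-based SNP position.

    Table and column names come from the trusted XML file; the SNP
    coordinates are passed as bound parameters.
    """
    sql = (
        f"SELECT * FROM {track['table']} "
        f"WHERE {track['chrom']} = %s AND {track['start']} < %s AND {track['end']} >= %s"
    )
    return sql, (chrom, pos, pos)

def annotate_snp(snp, tracks, run_query):
    """Query every track for one SNP; keep only tracks that return rows."""
    name, chrom, pos = snp
    hits = {}
    for track in tracks:
        sql, params = build_overlap_query(track, chrom, pos)
        rows = run_query(sql, params)
        if rows:
            hits[track["table"]] = rows
    return name, hits

def annotate_snps(snps, tracks, run_query, workers=4):
    """Annotate several SNPs concurrently; return {snp_name: {table: rows}}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda s: annotate_snp(s, tracks, run_query), snps))
```

Here `run_query` stands in for whatever function executes SQL against the database; in a real setting each worker thread would typically need its own database connection, which this sketch does not show.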
The output can be viewed in a standard web browser and allows the user to quickly identify SNPs that could be functionally important. For extra functionality, selected SNPs can be viewed in NCBI's dbSNP (Sherry et al., 2001) and in the UCSC Genome Browser directly from the output. To display features for a SNP efficiently in the UCSC Genome Browser, only tracks that contain features in the SNP region are shown. The user interface has been designed to allow further mining of the output, displaying information from the multiple cell types and links to external data. This includes the ability to view the detailed experimental data, thereby allowing users to assess the biological relevance of the results in the context of the thresholds and criteria used. ASSIMILATOR automatically queries any new tracks appearing from the ENCODE project on UCSC and includes these in the analysis. To further ensure that ASSIMILATOR stays up to date, an option is available that searches all UCSC database versions for ENCODE tracks and automatically uses the latest suitable version [currently March 2006 (NCBI36/hg18)]. The ENCODE data release policy places restrictions on the publication of ENCODE data; therefore, the date on which the data become unrestricted is also displayed to aid the user.
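Restricting the Genome Browser view to tracks with features near the SNP can be sketched with the browser's URL parameters. This is an illustrative Python sketch under stated assumptions, not ASSIMILATOR's implementation: the function names, the 500 bp flank, the `pack` visibility and the dbSNP URL form are choices made for this example. It relies on the UCSC `hideTracks=1` parameter, which hides all tracks, followed by `<track>=<visibility>` pairs that switch selected tracks back on.

```python
import re
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"
DBSNP_BASE = "https://www.ncbi.nlm.nih.gov/snp/"  # assumed current dbSNP URL form

def ucsc_url(chrom, pos, tracks_with_hits, db="hg18", flank=500):
    """Build a Genome Browser link that shows only tracks with features near the SNP."""
    start = max(1, pos - flank)
    params = [
        ("db", db),
        ("position", f"{chrom}:{start}-{pos + flank}"),
        ("hideTracks", "1"),  # hide every track first
    ]
    # Re-enable only the tracks that returned features (sorted for a stable URL).
    params += [(track, "pack") for track in sorted(tracks_with_hits)]
    return f"{UCSC_HGTRACKS}?{urlencode(params, safe=':')}"

def dbsnp_url(rsid):
    """Build a dbSNP link for an rs identifier such as 'rs12345'."""
    if not re.fullmatch(r"rs\d+", rsid):
        raise ValueError(f"not an rs identifier: {rsid!r}")
    return DBSNP_BASE + rsid
```

For example, `ucsc_url("chr1", 1000, {"cpgIslandExt"})` yields a link to `chr1:500-1500` on hg18 in which only the CpG island track is visible.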
To analyse the data, the user can employ a hierarchical approach. Isolated evidence, such as conservation across species, histone modification or mapping to a methylated region, might be assigned a low weighting by the user. Conversely, consistent evidence that a region is active, such as histone modification, DNase I hypersensitivity and open chromatin in the same cell line, coupled with evidence that a SNP lies within a transcription factor binding site (TFBS), would receive a higher weighting. Such evidence could help to prioritize that SNP for functional work and may inform the design of such studies.
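One way a user might formalize such a hierarchy is sketched below in Python. The text leaves the weighting to the user and prescribes no numerical scheme, so the weights, the evidence labels and the `score_snp` function are purely hypothetical. Each piece of evidence scores a small amount on its own, and a larger amount is added when two or more activity marks coincide in the same cell line, with a further bonus when the SNP also lies in a TFBS.

```python
# Hypothetical weights chosen only for illustration.
ISOLATED_WEIGHT = 1     # any single piece of evidence
CONSISTENT_WEIGHT = 3   # per activity mark when marks agree within one cell line
TFBS_BONUS = 5          # SNP in a TFBS within a consistently active region

ACTIVITY_MARKS = {"histone", "dnase", "open_chromatin"}

def score_snp(evidence):
    """Score a SNP from (kind, cell_line) pairs.

    cell_line is None for cell-independent evidence such as conservation.
    """
    evidence = set(evidence)

    # Group activity marks by the cell line in which they were observed.
    marks_by_cell = {}
    for kind, cell in evidence:
        if kind in ACTIVITY_MARKS and cell is not None:
            marks_by_cell.setdefault(cell, set()).add(kind)

    has_tfbs = any(kind == "tfbs" for kind, _ in evidence)

    # Every piece of evidence contributes a low, isolated weight.
    score = ISOLATED_WEIGHT * len(evidence)

    # Largest number of distinct activity marks seen together in one cell line.
    best = max((len(marks) for marks in marks_by_cell.values()), default=0)
    if best >= 2:
        score += CONSISTENT_WEIGHT * best
        if has_tfbs:
            score += TFBS_BONUS
    return score
```

Under these assumed weights, a SNP supported only by conservation scores far lower than one with histone modification, DNase I hypersensitivity and open chromatin in the same cell line together with a TFBS, which matches the ordering described above.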