The TSC repository consists of two subcomponents: (A) a primary Oracle database, which contains all TSC SNP data, and (B) a secondary MySQL database, serving as the DCC website back end, which contains the subset of the primary database covering publicly released SNPs. Data used to discover the SNPs, such as tracefiles and related data, are stored on a fileserver, because they are not normally needed after the SNPs have been called. The exception is when website users want to view traces for individual SNPs, in which case the file is retrieved from our FTP site.
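The relationship between the primary and secondary databases can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for both Oracle and MySQL, and the table layout, the `released` flag, the lab names and the SNP IDs are all hypothetical; the actual TSC schema and export procedure are not described here.

```python
import sqlite3


def make_primary():
    """Build a toy 'primary' database holding all SNPs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snp ("
        "snp_id TEXT PRIMARY KEY, alleles TEXT, lab TEXT, released INTEGER)"
    )
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?, ?, ?)",
        [
            ("TSC0000001", "A/G", "lab_a", 1),  # made-up example rows
            ("TSC0000002", "C/T", "lab_b", 0),  # not yet publicly released
            ("TSC0000003", "G/T", "lab_a", 1),
        ],
    )
    return conn


def export_public(primary, secondary):
    """Copy only publicly released SNPs into the 'secondary' database.

    Returns the number of rows copied.
    """
    secondary.execute(
        "CREATE TABLE IF NOT EXISTS snp ("
        "snp_id TEXT PRIMARY KEY, alleles TEXT, lab TEXT)"
    )
    rows = primary.execute(
        "SELECT snp_id, alleles, lab FROM snp "
        "WHERE released = 1 ORDER BY snp_id"
    ).fetchall()
    # INSERT OR REPLACE keeps repeated exports idempotent.
    secondary.executemany("INSERT OR REPLACE INTO snp VALUES (?, ?, ?)", rows)
    secondary.commit()
    return len(rows)
```

Calling `export_public(make_primary(), sqlite3.connect(":memory:"))` copies the two released rows and leaves the unreleased one behind, which is the subset relationship described above.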
The website software layer consists of a set of simple Perl CGI scripts that run through the ubiquitous Apache webserver, an HTML-rendering module and a database module. This layer produces webpages for single SNPs (Fig. 2) and performs keyword searches for SNP IDs, gene descriptions and more.
Figure 2. A SNP report showing SNP details, including observed alleles, submitting lab, flanking sequences, genomic location, reported allele frequencies and more. Information on this page can be dumped to a text file in various formats, for one or many SNPs.
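As an illustration of the keyword search performed by the website layer, the following sketch matches a keyword against SNP IDs and gene descriptions. It is written in Python with `sqlite3` rather than the Perl CGI scripts and MySQL back end used on the site, and the table, column names and example rows are assumptions made for the example only.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table of SNPs (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snp (snp_id TEXT PRIMARY KEY, gene_description TEXT)"
    )
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?)",
        [
            ("TSC0000001", "protein kinase C"),
            ("TSC0000002", "zinc finger protein"),
            ("TSC0000003", "Tyrosine Kinase receptor"),
        ],
    )
    return conn


def keyword_search(conn, keyword):
    """Return IDs of SNPs whose ID or gene description contains the keyword.

    The keyword is passed as a bound parameter, never interpolated into
    the SQL string, so user input cannot alter the query structure.
    SQLite's LIKE is case-insensitive for ASCII characters by default.
    """
    pattern = "%" + keyword + "%"
    cursor = conn.execute(
        "SELECT snp_id FROM snp "
        "WHERE snp_id LIKE ? OR gene_description LIKE ? "
        "ORDER BY snp_id",
        (pattern, pattern),
    )
    return [row[0] for row in cursor]
```

In this sketch a `%` or `_` typed by the user still acts as a LIKE wildcard; a production search would probably escape those characters or use a full-text index.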
Recent deployment of the Generic Genome Browser (GBrowse) package (10) from the GMOD project for genome browsing on our website has allowed us to simplify the TSC codebase greatly. GBrowse uses a Bio::DB::GFF database, a subcomponent of the Bioperl toolkit (11), as its back end, which we have loaded with genome annotations available from NCBI. These annotations include genomic coordinates for RefSeq mRNAs, corresponding LocusLink genes, NT contigs, STS markers and, in particular, reference SNP clusters (12). We now use NCBI-produced coordinates for the corresponding reference SNP clusters as the genomic locations for TSC submitter SNPs contained within those clusters. We plan to import features from Ensembl (3) into the Bio::DB::GFF database as well, and possibly features from other sources.
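The idea of assigning each submitter SNP the coordinates of its reference SNP cluster can be sketched as follows. The function emits nine-column, tab-delimited GFF lines of the kind a GFF-based database loads. The input dictionaries, the `TSC` source label, the `SNP` feature type, the layout of the group column and all IDs and coordinates are assumptions made for illustration, not the loader actually used for the website.

```python
def cluster_locations_to_gff(submitter_to_cluster, cluster_coords, source="TSC"):
    """Build GFF lines placing submitter SNPs at their cluster's coordinates.

    submitter_to_cluster: dict mapping submitter SNP ID -> cluster ID
    cluster_coords: dict mapping cluster ID -> (seqname, start, end, strand)
    SNPs whose cluster has no known coordinates are skipped.
    """
    lines = []
    for snp_id in sorted(submitter_to_cluster):
        cluster_id = submitter_to_cluster[snp_id]
        coords = cluster_coords.get(cluster_id)
        if coords is None:
            continue  # cluster not placed on the genome; no location to assign
        seqname, start, end, strand = coords
        fields = [
            seqname,        # reference sequence
            source,         # source label (hypothetical)
            "SNP",          # feature type (hypothetical)
            str(start),
            str(end),
            ".",            # score: not applicable
            strand,
            ".",            # frame: not applicable
            f"SNP {snp_id} ; Cluster {cluster_id}",  # group column
        ]
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    # Made-up IDs and coordinates, for demonstration only.
    mapping = {"TSC0000001": "rs1000001", "TSC0000002": "rs9999999"}
    coords = {"rs1000001": ("chr1", 12345, 12345, "+")}
    for line in cluster_locations_to_gff(mapping, coords):
        print(line)
```

Keeping the cluster ID in the group column preserves the link between a submitter SNP and the cluster that supplied its location.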