We have created a new web-based browser for visualizing NGS data. The browser retrieves genomic features, such as predicted gene models and their annotations, from VMD, and retrieves the transcriptome information from OTD. The first track on the browser displays the gene models predicted from the genomes, followed by the transcript depth-of-coverage track. For P. sojae, the transcript depth of coverage is plotted in two different colors: yellow for infected samples and blue for mycelia samples. For H. arabidopsidis, the track displays only one color, orange, since the samples come only from the infection library. The next two tracks are for transcript assembly, where the transcripts are color-coded by their expression values, calculated in FPKM. Transcripts with expression values above 1000 FPKM are considered very highly expressed and are color-coded in red; transcripts with moderate to high levels of expression (FPKM between 100 and 1000) are coded in green; transcripts with low to moderate levels of expression (between 10 and 100) are represented in blue; transcripts with low expression (< 10) are represented in black [Figure
A, B]. The remaining tracks are the EST-derived unigenes mapped onto the reference genome assembly. These are also color-coded, according to the quality of alignment to the genome sequence, and the coloring scheme is similar to that of GenBank BLAST results. The best alignments, in which there are no gaps in the query or subject alignments, are color-coded in red; the next best alignments, which have subject gaps but no query gaps, are coded in green; the third category, in which there are both query gaps and subject gaps, is coded in blue; and the poorest category, which contains query gaps as well as mismatches, is coded in black.
Screenshots of the transcriptomics browser. (A) Transcripts page for the P. sojae V1.0 assembly. (B) H. arabidopsidis 8.3 assembly.
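The expression color scheme of the transcript assembly tracks amounts to a simple threshold lookup. The following Python sketch is illustrative only and is not the browser's own code: the function name `fpkm_color` is hypothetical, and since the text does not state which class the boundary values (exactly 10, 100 or 1000 FPKM) belong to, the assignment used here is an assumption.

```python
def fpkm_color(fpkm: float) -> str:
    """Map an FPKM expression value to a track color (illustrative sketch).

    Assumption: boundary values go to the lower-numbered range named in
    the text, i.e. 1000 is "green", 100 is "green" and 10 is "blue".
    """
    if fpkm < 0:
        raise ValueError("FPKM values cannot be negative")
    if fpkm > 1000:
        return "red"    # very highly expressed
    if fpkm >= 100:
        return "green"  # moderate to high expression
    if fpkm >= 10:
        return "blue"   # low to moderate expression
    return "black"      # low expression


if __name__ == "__main__":
    for value in (2500.0, 350.0, 42.0, 3.5):
        print(value, fpkm_color(value))
```

A track renderer could call such a function once per transcript to choose the glyph color.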
The transcriptomics browser is the central component of the resource; it enables users to walk along the genome assembly and discover important transcribed elements that may be missing from the annotation. One can switch from one organism to another by selecting the organism from the drop-down box in the top panel of the main transcriptomics browser page [Figure
All the tracks are clickable and lead to the transcript assembly page or the EST unigene page, depending on the track (more details in Additional file
). The transcript assembly page contains extensive information, starting with the location of the transcript on the genome. If the transcript overlaps with ESTs or predicted gene models, links to the EST and gene model pages are provided [Figure
A]. From the scaffold location link, one can reach the transcriptomics browser [Figure
B]. Recently, we have mirrored data from PTD, which is displayed on the main transcript page. An on-the-fly GenBank BLAST feature is available from the main transcript page [Figure
C]. The reads assembly and SNP viewer are linked from the main transcript page [Figures
Figure 3 (A) Screenshot of the transcript assembly page. The circled area representing the genomic location of the transcript links into the transcript browser (B). The on-the-fly BLAST link carries out an nr BLAST search against GenBank (C). The reads assembly link opens (more ...)
Figure 4 Screenshot of the SNP viewer. In this view, three different screens are merged. The first view (A) indicates there are some polymorphisms with the reads. The reference sequence remains fixed to the top of the browser window. As the user scrolls down ( (more ...)
Web-based reads alignment viewer or SNP viewer
We have created a web-based text viewer for reads aligned to the reference genome. This viewer can also be used for viewing SNPs and for correcting gene models based on the alignment of the transcript reads to the reference genome. Links to the text-based viewer, which is built on the assembly of reads on the reference strand, are provided from the main transcript assembly page. The topmost row is the genomic reference, followed by the reads mapped to it, arranged in rows. As the number of reads increases, the page needs to be scrolled down and to the right to view the alignment. We have used JavaScript to fix the vertical position of the reference strand on the screen, so that users can always compare the reference bases with the read bases (Figure
). This greatly helps in detecting substitutions, intron-exon locations and false assemblies.
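The layout described above, with the reference on the top row and one row per mapped read, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the viewer's implementation: reads are given as hypothetical `(start, sequence)` pairs with 0-based start offsets, alignments are ungapped (so introns and indels are not handled), and matching bases are drawn as dots so that substitutions stand out.

```python
from typing import List, Tuple


def render_pileup(reference: str, reads: List[Tuple[int, str]]) -> List[str]:
    """Return text rows: the reference first, then one row per read."""
    rows = [reference.upper()]
    for start, seq in reads:
        if start < 0 or start + len(seq) > len(reference):
            raise ValueError(f"read at offset {start} does not fit on the reference")
        cells = []
        for offset, base in enumerate(seq.upper()):
            ref_base = reference[start + offset].upper()
            # A dot marks agreement with the reference; a letter marks a substitution.
            cells.append("." if base == ref_base else base)
        rows.append(" " * start + "".join(cells))
    return rows


def substitution_positions(reference: str, reads: List[Tuple[int, str]]) -> List[int]:
    """Return the sorted reference positions at which any read disagrees."""
    positions = set()
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            if base != reference[start + offset].upper():
                positions.add(start + offset)
    return sorted(positions)


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = [(0, "ACGA"), (2, "GTAC")]
    print("\n".join(render_pileup(reference, reads)))
    print(substitution_positions(reference, reads))
```

Keeping the reference as a separate first row mirrors the design choice in the text: it is the one row that must stay visible while the read rows scroll.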
Transcriptomics BLAST site
We have significantly upgraded the transcriptomics BLAST utility, which runs WU-BLAST against 23 transcriptomics databases. The graphical user interface of the BLAST utility uses the standard BioPerl Bio::GMOD::Blast::Graph package. We replaced the HTML writer utility with our own Perl package, so that subject values point to the correct links in our database [Figure
B, C]. All of the transcripts assembled from the NGS reads, along with several additional datasets such as soybean CDS, are available for BLAST searches. If the user chooses external databases, such as the soybean genome and soybean predicted transcripts, the links point to the Phytozome web site [Figure
B]. For the internal databases, such as the transcript assembly database, the link leads to the main transcript page in OTD [Figure
C]. If an EST or in-house database is searched, the link leads to the appropriate page.
Figure 5 Screenshot of transcriptomics Blast page and output pages. The blast output page against a query sequence opens into (A). The subject sequences from the blast output provide links to the appropriate databases. For example, if the data has additional relevant (more ...)
Main annotation page for unigenes
Each unigene, whether derived from ABI SOLiD, Illumina or EST data, is given a unique id and has a primary annotation page and a detailed annotation page. The primary annotation page includes the component ESTs that make up the unigene. Unigenes can be queried by name from the query page with a wildcard search or an exact string search. If a wildcard search is performed, the matching unigenes are displayed on the output page, along with information such as the number of component ESTs making up each unigene, their primary annotations, etc. [Figure
A]. Clicking a unigene opens a new page that lists basic information about the unigene, the assembly plot, the primary annotation, links to the unigene annotation detail page, etc. [Figure
B]. The assembly plot of the component EST sequences displays the matching and non-matching regions in a sequence cluster. This helps users judge the quality of the assembly. From the unigene primary annotation page, a one-click link is provided for BLAST searches against the NCBI nr database. If the unigene has an overlap with a gene model predicted from the genome sequence, then a link to the gene is provided on the primary annotation page. Also, users can choose to run a BLAT alignment of the unigene against the reference genome on-the-fly [Figure
C]. The detailed annotation pages for unigenes and contigs have details on InterProScan, TMHMM and SignalP annotations, and coding frame and ORF information [Figure
Figure 6 Screenshots illustrating query by unigene name. Results shown are from a wildcard search with “CL1C.*”. (A) The output of the search. Clicking on a contig (second column) leads to the unigene or contig page (B) that has several information (more ...)
Figure 7 Main Contig/Unigene annotation page. From the contig page described in Figure
A, there is a link to the detailed annotation page (A). There are several annotations available such as interproscan, TMHMM, SignalP etc. We used an in-house (more ...)
Each component EST sequence of a unigene, if present, is provided with a link, so that the user can reach the EST details with a click. The individual EST page lists the quality trimming protocols, other ESTs that overlap with the sequence and other relevant information [Figure
B]. An on-the-fly BLAT option is also available for EST sequences against the respective reference genome [Figure
A user-friendly query page enables users to query OTD using the following categories [Figure
1. By fold change in treated versus untreated samples.
2. By expression value.
3. By names of the unigenes or ESTs or contigs.
4. By primary and secondary annotation.
5. By number of ESTs present in a unigene.
Expression values of transcripts are represented as FPKM values. Users can query the database with a single value or a value range such as 10–20, <10 or >10. If a range is chosen, the matching records are retrieved and displayed on a page. Links are provided from this page to individual transcript pages or to the page for an overlapping EST or gene model (if available) in VMD [Figure
Figure 9 Screenshot of query by expression value. In this example, an expression value between 10–20 was chosen for genes from P. sojae V5.0 assembly (A). The search retrieves a number of records (B), where the first column contains the links to the assembled (more ...)
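The expression query accepts a single value or a range written as, for example, 10–20, <10 or >10. One way to interpret such input is to turn it into a predicate over FPKM values, as in the Python sketch below. It is an assumption-laden illustration rather than the OTD query code: the function name `parse_fpkm_query` is hypothetical, ranges written with a dash are treated as inclusive at both ends, and only non-negative decimal numbers are accepted.

```python
import re
from typing import Callable

_NUMBER = r"\d+(?:\.\d+)?"


def parse_fpkm_query(text: str) -> Callable[[float], bool]:
    """Turn '10-20', '<10', '>10' or '15' into a test on an FPKM value."""
    # Normalize an en dash to a hyphen and drop spaces, so '> 10' and '10–20' work.
    query = text.strip().replace("\u2013", "-").replace(" ", "")

    match = re.fullmatch(rf"({_NUMBER})-({_NUMBER})", query)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        if low > high:
            raise ValueError(f"empty range: {text!r}")
        return lambda value: low <= value <= high  # assumed inclusive

    match = re.fullmatch(rf"([<>])({_NUMBER})", query)
    if match:
        bound = float(match.group(2))
        if match.group(1) == "<":
            return lambda value: value < bound
        return lambda value: value > bound

    if re.fullmatch(_NUMBER, query):
        exact = float(query)
        return lambda value: value == exact

    raise ValueError(f"unrecognized expression query: {text!r}")


if __name__ == "__main__":
    in_range = parse_fpkm_query("10\u201320")
    print([v for v in (5.0, 12.5, 20.0, 31.0) if in_range(v)])
```

Returning a predicate keeps the parsing step separate from the record lookup, so the same parsed query can be applied to any collection of transcripts.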
Another useful query feature is the ability to retrieve transcripts that show a given fold change between two conditions. For example, in the case of the P. sojae V1.0 assembly, one can query and find all the transcripts that show a certain fold change (e.g. two-fold) between infection and non-infection conditions. Similar search options are also available for soybean datasets. Due to the data size and query time, options are currently restricted to searching by individual scaffolds.
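A fold-change query of this kind can be sketched as a filter over per-transcript FPKM values. The Python example below is a hypothetical illustration, not the method used by OTD: the record fields (`id`, `scaffold`, `fpkm_infected`, `fpkm_mycelia`) and the sample values are invented for the example, a pseudocount is added to avoid division by zero (an assumption, since the text does not say how zero expression is handled), and a change in either direction is counted.

```python
from typing import Dict, List, Optional


def fold_change(treated: float, untreated: float, pseudocount: float = 1.0) -> float:
    """Ratio of treated to untreated expression, with an assumed pseudocount."""
    return (treated + pseudocount) / (untreated + pseudocount)


def filter_by_fold_change(
    records: List[Dict],
    min_fold: float = 2.0,
    scaffold: Optional[str] = None,
) -> List[str]:
    """Return ids of transcripts changing at least min_fold, up or down."""
    hits = []
    for record in records:
        # Mirror the scaffold-level restriction mentioned in the text.
        if scaffold is not None and record["scaffold"] != scaffold:
            continue
        ratio = fold_change(record["fpkm_infected"], record["fpkm_mycelia"])
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits.append(record["id"])
    return hits


# Invented sample records, for illustration only.
SAMPLE = [
    {"id": "t1", "scaffold": "scaffold_1", "fpkm_infected": 30.0, "fpkm_mycelia": 9.0},
    {"id": "t2", "scaffold": "scaffold_1", "fpkm_infected": 10.0, "fpkm_mycelia": 12.0},
    {"id": "t3", "scaffold": "scaffold_1", "fpkm_infected": 4.0, "fpkm_mycelia": 19.0},
    {"id": "t4", "scaffold": "scaffold_2", "fpkm_infected": 100.0, "fpkm_mycelia": 1.0},
]

if __name__ == "__main__":
    print(filter_by_fold_change(SAMPLE, min_fold=2.0, scaffold="scaffold_1"))
```

Filtering by scaffold before computing any ratio keeps the work proportional to one scaffold's transcripts, which is the point of the restriction noted above.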
EST-derived unigenes and contigs can be searched by exact id name or by a regular expression. For example, most of the EST contigs begin with CL1, so users can query the database with CL1* [Figure
A]. If the user chooses to query by a single contig name, then the primary contig page, with the primary annotation and quality scores, is displayed. If a contig has an overlapping gene model, the gene_id is provided along with its VMD link.
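The difference between an exact id search and a pattern search can be illustrated with a short Python sketch. It is a hedged example, not the database's query code: the function `search_ids` and the sample ids are hypothetical, and because the text shows both a shell-style pattern (CL1*) and a regular expression (CL1C.*), the sketch offers both as separate modes instead of guessing which syntax the site expects.

```python
import fnmatch
import re
from typing import Iterable, List


def search_ids(ids: Iterable[str], pattern: str, mode: str = "exact") -> List[str]:
    """Return the ids that match pattern under the chosen mode."""
    if mode == "exact":
        return [name for name in ids if name == pattern]
    if mode == "wildcard":
        # Shell-style matching: '*' stands for any run of characters.
        return [name for name in ids if fnmatch.fnmatchcase(name, pattern)]
    if mode == "regex":
        compiled = re.compile(pattern)
        # fullmatch requires the whole id to match, not just a prefix.
        return [name for name in ids if compiled.fullmatch(name)]
    raise ValueError(f"unknown search mode: {mode!r}")


# Invented ids, for illustration only.
SAMPLE_IDS = ["CL1Contig1", "CL1Contig2", "CL2Contig1"]

if __name__ == "__main__":
    print(search_ids(SAMPLE_IDS, "CL1*", mode="wildcard"))
    print(search_ids(SAMPLE_IDS, "CL1C.*", mode="regex"))
    print(search_ids(SAMPLE_IDS, "CL2Contig1"))
```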
In addition to the utilities described above, there are a number of miscellaneous items available from the home page. Sequence statistics, cluster statistics, metadata information and library construction methods are accessible from this page. For P. sojae EST datasets, cluster statistics and details of the sequence distribution in EST clusters are listed with proper links to the main annotation pages.
The download site currently provides 39 curated data types. Users can request additional information, if necessary, through the request form provided on the page.