Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information, especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose; however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image.
BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically.
There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/.
With the dramatic improvement of next-generation sequencing technologies over the last five years, there has been a corresponding increase in the amount of publicly available genomic data. As of February 2011, Entrez Genome Projects catalogued 6,071 bacterial and archaeal genome projects. Of these, 1,444 had complete genome sequences, 42 percent of which were released within the last three years. In addition, 3,872 on-going genome projects were registered with the database, 1,734 of which had a draft sequence publicly available. These projects do not include the ten terabase-pairs of sequence data across more than 6,500 entries currently available in the Short Read Archive, the public repository specifically for raw data from next-generation sequencing. Current genome visualisation and data analysis methods are struggling to keep up as it becomes a routine requirement for biologists to compare a new genome to scores, if not hundreds, of other genomes at once.
Genome visualisation methods use linear or circular representations. Linear representations, like those that can be generated using Artemis Comparison Tool (ACT), Genome2D, Combo, VISTA, Mauve, BugView and Genomorama, have advantages in showing insertions and deletions between genomic sequences, and certain programs, like Mauve and ACT, can show genome rearrangements. However, it is difficult to summarise large datasets using these tools. Programs that generate circular figures, like Microbial Genome Viewer and Genome Projector, are designed to annotate a single chromosome and have no support for whole genome comparative data. These programs are restricted to published genomes and do not let users analyse their own genomic sequences. DNAPlotter allows the user to input their own genome sequences and can show genome comparisons, but only by generating this information separately and loading it in as custom annotation tracks.
There are comparative circular genome visualisation alternatives available online, such as CGView Server and GeneWiz browser, which allow users to upload their own sequences and provide a similar service; GeneWiz browser can also display mapped read data, whereas CGView Server cannot. However, both of these tools are only available as internet resources and limit the number of genome comparisons that can be shown on a single image. Command-line based alternatives and imaging libraries, such as Circos, CGView, Genome Diagram and BLASTAtlas, also exist; these require users to prepare all data and customisation through text files. While these programs are very powerful, they require command-line manipulation and scripting to use, putting them out of reach of many biologist end-users.
To address these issues, we present the BLAST Ring Image Generator (BRIG), an easy-to-use, cross-platform desktop application that enables rapid visualisation of BLAST comparisons to one or more central reference sequences using complete, draft or unassembled genome data.
The BLAST Ring Image Generator (BRIG) is a cross-platform desktop application written in Java 1.6. It uses CGView for image rendering and BLAST for genome comparisons. It has a graphical user interface, programmed on the Swing framework, which takes the user step-by-step through the generation of a circular image. The settings used to generate a particular image can be saved for re-use with different genome data, or the entire session can be bundled and saved for later. The image can be generated in JPEG, PNG, SVG or SVGZ format. An example of BRIG's output can be seen in Figure 1. A user guide describing step-by-step tutorials for several visualisation tasks and accompanying example files are provided at http://sourceforge.net/projects/brig/files/.
BRIG is capable of generating circular comparison images for prokaryote genomes, showing multiple genome comparisons in a single image, and displaying similarity between a reference genome in the centre against other query sequences as a set of concentric rings coloured according to BLAST identity. An example image (Figure 1) produced by BRIG shows a comparison of a draft Escherichia coli genome with 13 other E. coli and 14 Salmonella genomes (Table 1). The varying colour gradient of rings 5-16 in Figure 1 indicates a BLAST match of a particular percentage identity, as shown in the key. BLAST matches can be filtered according to a minimum percentage identity or E-value cut-off (or indeed any available BLAST option). These matches are calculated from the perspective of the reference sequence; consequently, regions that are absent from the reference genome but present in one or more of the query sequences will not be displayed. Data from different genomes can be collated into a single lane, which enables visualisation of a large number of genomes and allows users to compare genomes as a group against the central reference sequence. This is shown in Figure 1, where the comparison results from 3 E. coli strains, MG1655, HS and W3110, have been grouped together to represent regions of the reference genome that are found in non-pathogenic E. coli.
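The filtering and colouring described above can be sketched as follows. This is an illustrative Python sketch, not BRIG's own code: it assumes BLAST tabular output (`-outfmt 6`) with the central reference as the subject, and the names `Hit`, `parse_tabular`, `filter_hits` and `shade`, the default cut-offs and the white-to-blue gradient are all hypothetical choices made for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start: int       # position on the reference (1-based, inclusive)
    end: int
    identity: float  # percentage identity
    evalue: float


def parse_tabular(text):
    """Parse BLAST tabular rows, assuming the reference is the subject."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        s, e = int(f[8]), int(f[9])  # sstart, send (may be reversed)
        hits.append(Hit(min(s, e), max(s, e), float(f[2]), float(f[10])))
    return hits


def filter_hits(hits, min_identity=70.0, max_evalue=1e-5):
    """Keep hits that pass both cut-offs (default values are assumptions)."""
    return [h for h in hits
            if h.identity >= min_identity and h.evalue <= max_evalue]


def shade(identity, lower=70.0, upper=100.0, base=(0, 0, 255)):
    """Blend from white at `lower` to the ring's base colour at `upper`."""
    t = (identity - lower) / (upper - lower)
    t = max(0.0, min(1.0, t))
    return tuple(round(255 + (c - 255) * t) for c in base)
```

Raising `lower` narrows the dynamic range of the gradient, so that only the most similar matches appear strongly coloured.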
Users can highlight regions of the reference genome with custom annotations by specifying the label text, colour, shape, and position of features either manually, or by uploading this information as a tab-delimited file. Alternatively, selected annotations can be uploaded from a GenBank or EMBL file; for instance, the annotations shown in the outermost ring in Figure 1 have been read from the GenBank file of E. coli O157:H7 str. Sakai by selecting 'misc_features' that contain the text 'Sp' or 'SpLE', which correspond to annotated prophage regions.
Generating comparisons of a large number of genomes raises the issue of memory usage. To produce Figure 1, with its comparison against 27 genomes each of approximately 5 Megabase-pairs in size, one Gigabyte of RAM was required on a standard desktop computer. The memory requirement can be reduced by filtering the BLAST results according to E-value and percentage identity cut-offs within BRIG. Alternatively, the amount of memory allocated to BRIG can be altered from within the program.
A variety of genomic data sources can be used to produce an image, including BLAST comparisons of protein or nucleotide sequences from GenBank, EMBL and FASTA files. BRIG will internally handle all genome comparisons by converting GenBank or EMBL files into FASTA format, creating any necessary BLAST databases, running BLAST and converting the results into a format that CGView renders as the circular image. Users do not have to interact directly with BLAST or CGView, nor is any knowledge of command-line programs assumed. By default the central reference sequence is treated as the subject BLAST database, with the rings representing matches to individual query sequences.
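For readers curious about what is being automated, the two BLAST steps can be sketched as command lines. This is only an approximation of the workflow under stated assumptions: it assumes the NCBI BLAST+ programs `makeblastdb` and `blastn` are the ones in use, and the function name `blast_commands`, the file-naming scheme and the default E-value are hypothetical. The sketch builds the argument lists without running anything.

```python
from pathlib import Path


def blast_commands(reference_fasta, query_fasta, out_dir, evalue=1e-5):
    """Build (but do not run) the two commands for one ring.

    The reference becomes the BLAST database (the subject) and the
    query genome is searched against it, matching the default described
    in the text. File names here are illustrative only.
    """
    out = Path(out_dir)
    db = out / (Path(reference_fasta).stem + "_db")
    result = out / (Path(query_fasta).stem + "_vs_ref.tab")
    make_db = [
        "makeblastdb",
        "-in", str(reference_fasta),
        "-dbtype", "nucl",
        "-out", str(db),
    ]
    search = [
        "blastn",
        "-query", str(query_fasta),
        "-db", str(db),
        "-evalue", str(evalue),
        "-outfmt", "6",          # tab-delimited hit table
        "-out", str(result),
    ]
    return make_db, search
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed; one search is needed per query genome.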
Users are taken step-by-step through the process to create a circular comparison image via a graphical user interface (Figure 2). In the first screen (Figure 2A), users specify data they would like to compare to a central reference sequence. In the second screen (Figure 2B), users are able to configure the individual concentric rings, choosing which data they would like to show in each lane and making aesthetic choices including colour or ring size. Lastly, the settings can be reviewed and submitted for BLAST alignment and image drawing using CGView (Figure 2C). Image rendering settings and genome comparison configurations for a particular BRIG image can be saved and reused as an XML profile file. Alternatively, a number of sample templates are available to users, in order to quickly generate an image with optimised size and colour settings.
Users can add their own annotations to a BRIG image through the 'add custom features' dialog, to produce complex yet informative images. Users can alter every aspect of visualisation, including: image size, label visibility, texts and fonts, colours of ring lanes, the gradient reflecting percentage identity, and custom labels inside or outside of a ring.
Data such as transcriptome and microarray expression values can be graphed and displayed as a ring in the circular image. These custom graphs can be produced from user-defined data in a space or tab-delimited file that includes either the start and stop positions and the value for that region, or a single value for every base pair, with one value per line. To ensure that a useful visualisation is produced regardless of the data source, the default graphing function is to display skew from the mean value, similar to the coverage graph in Figure 1. Users can choose to override this behaviour and scale the graph between zero and a user-defined value.
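The two graphing modes can be illustrated with a short Python sketch. It is not taken from BRIG: the function names are hypothetical, the input is assumed to be the three-column form (start, stop, value) with 1-based inclusive coordinates and no header row, and "skew from the mean" is interpreted here simply as each value minus the mean of all values.

```python
def read_graph(text, length):
    """Expand 'start stop value' rows into one value per base."""
    values = [0.0] * length
    for line in text.splitlines():
        parts = line.split()  # accepts spaces or tabs
        if len(parts) != 3:
            continue
        start, stop, value = int(parts[0]), int(parts[1]), float(parts[2])
        for pos in range(start, stop + 1):
            values[pos - 1] = value
    return values


def skew_from_mean(values):
    """Default mode: deviation of each position from the overall mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def scale_to_max(values, maximum):
    """Alternative mode: clamp to [0, maximum] and scale to [0, 1]."""
    return [max(0.0, min(v, maximum)) / maximum for v in values]
```

Plotting deviation from the mean keeps the graph readable whatever the units of the input, which is the motivation the text gives for making it the default.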
In many instances users may only be interested in the presence, absence or variation of a certain set of sequences amongst a number of different genomes. BRIG can visualise this kind of comparison if provided with a multi-FASTA sequence file (of genes, proteins or sequence regions) that will be concatenated to form the central reference ring. An example of such analysis can be seen in Figure 3A, where the translated nucleotide sequences from genes encoded by the Locus of Enterocyte Effacement (LEE) pathogenicity island in the Enterohaemorrhagic E. coli strain O157:H7 Sakai genome were compared to the translated nucleotide sequence of whole genomes of other published Enterohaemorrhagic E. coli; two Enteropathogenic E. coli and one Citrobacter rodentium (related bacterial pathogens that also carry the LEE); and E. coli K-12 MG1655 (a non-pathogenic strain that does not contain the LEE) (Table 2). Comparing translated nucleotide sequences through protein alignment offers better sensitivity for divergent sequences than comparing nucleotide sequences only.
BRIG is capable of using raw sequencing reads as query sequences to provide rapid preliminary insights into unassembled draft genome or meta-genome data. To illustrate this feature, we have simulated unassembled Illumina data using MetaSim by randomly sampling one million 100 base pair sequences from the complete genome sequences shown in Figure 3A and applying an Illumina error model. Reads were translated into peptides and used as query sequences in BRIG (using BLASTx) to search against the same central reference sequence in Figure 3A, producing the image shown in Figure 3B. Despite being based only on raw sequencing reads, the representation of sequence presence, absence and variation in Figure 3B is highly similar to that found when using whole genome sequences in Figure 3A.
Figure 3 represents the presence of protein encoding genes within each query genome as a full and vividly coloured bar (e.g. see the E. coli O157:H7 strains for the translated espD gene). Gene absence can be observed as a blank/white region, like any of the results for E. coli K-12 MG1655, whose genome does not carry the LEE. Variation in the translated sequences results in a lower sequence identity compared to the reference genome and appears either as a fully coloured but slightly faded bar, as seen in Figure 3 for E. coli O103:H2 and C. rodentium when searching for EspZ, or as a bar that is not fully coloured, such as for E. coli O111:H- and O127:H6 when searching for EspH. As with any BRIG image, percentage identity cut-off values can be customised to alter the dynamic range of colour shown in each ring. The annotations in Figure 3 illustrate a feature of BRIG where users can opt to load the FASTA headers from a multi-FASTA reference sequence and use these headers to annotate their image.
BRIG is a valuable tool for analysing draft genome sequences. A draft genome that has been assembled into a set of contiguous sequences (contigs) or scaffolds (ordered contigs separated by gaps denoted by N's) in multi-FASTA format can be used as a reference sequence. Contig or scaffold boundaries can be shown as alternating blue or red segments as a custom ring. In addition, by uploading standard genome assembly files (e.g. ACE or SAM), the underlying sequencing reads can be included as a custom graph to show genome coverage. This procedure can help to highlight misassemblies, areas of low coverage and repeat regions that warrant further attention. For instance, the read coverage and contig boundaries in Figure 1 were generated from the ACE file produced by GS De Novo Assembler (454 Life Sciences/Roche). ACE files produced by Consed/Phrap are also acceptable. The reordering of contigs in a draft genome is often carried out after assembly without reordering the corresponding assembly files. To address this, users can use BRIG's graph conversion module to reposition the coverage information from the original ACE file to be consistent with the modified draft genome sequence based on a BLASTn comparison.
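The contig boundary ring described above amounts to laying the contigs end to end and recording where each one starts and stops. A minimal Python sketch of that idea follows; it is an illustration rather than BRIG's implementation, and the function name `contig_segments` and the tuple layout are hypothetical.

```python
def contig_segments(fasta_text, colours=("blue", "red")):
    """Return (name, start, end, colour) for each contig in file order.

    Coordinates are 1-based and inclusive on the concatenated
    reference; colours alternate so adjacent contigs are distinguishable.
    """
    lengths = []  # [name, length] pairs in file order
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            lengths.append([header[0] if header else "", 0])
        elif line and lengths:
            lengths[-1][1] += len(line)

    segments, offset = [], 0
    for name, length in lengths:
        if length == 0:
            continue  # skip empty records
        colour = colours[len(segments) % len(colours)]
        segments.append((name, offset + 1, offset + length, colour))
        offset += length
    return segments
```

The same offsets could also be used to place labels taken from the FASTA headers at the midpoint of each segment.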
The genome coverage feature of BRIG can also show read mapping information. This can be a useful approach for determining differences amongst multiple unassembled genome datasets relative to the central reference sequence(s). As described previously, BRIG supports read or contig mapping by using BLAST (e.g. Figure 3B). Alternatively, read mapping can be performed externally and read into BRIG as an ACE or SAM file and shown as a coverage graph. ACE files can be produced by the 454 Life Sciences/Roche GS Reference Mapper application, which maps 454 reads to a reference sequence, and there are a number of tools that use the SAM format as the standard file format for mapping short reads to a reference sequence. To illustrate this feature, the simulated reads from Figure 3B were mapped to the E. coli O157:H7 Sakai genome [GenBank:BA000007] using BWA and the genome coverage from resulting SAM files was calculated and visualised by BRIG. The resulting image is shown in Figure 4A, where the complete E. coli O157:H7 Sakai genome is used as the central reference sequence with the read mapping graphs as rings. These results are broadly comparable to a standard BLAST comparison between O157:H7 Sakai and the original complete genome sequences (Figure 4B). Notably, as a member of a different genus, the genome of Citrobacter rodentium is more divergent from the O157:H7 Sakai genome than the other E. coli genomes shown and could not be mapped accurately (Figure 4A). This illustrates that the read mapping utility should be restricted to the analysis of strains from the same species, with BLAST being the preferable option for more distant comparisons.
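Calculating coverage from a SAM file can be sketched in a few lines of Python. This is one possible approach and not necessarily how BRIG does it: the function name `sam_coverage` is hypothetical, only the mandatory SAM columns are read, and a difference array is used so that each aligned block costs constant time regardless of its length.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def sam_coverage(sam_text, ref_name, ref_length):
    """Per-base read depth on one reference sequence (1-based input)."""
    diff = [0] * (ref_length + 2)  # difference array, index 0 unused
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip header lines
        f = line.split("\t")
        if len(f) < 6:
            continue
        flag, rname, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
        if flag & 4 or rname != ref_name or cigar == "*":
            continue  # unmapped, or mapped to another sequence
        for count, op in CIGAR_OP.findall(cigar):
            n = int(count)
            if op in "M=X":          # aligned bases cover the reference
                if pos <= ref_length:
                    diff[pos] += 1
                    diff[min(pos + n, ref_length + 1)] -= 1
                pos += n
            elif op in "DN":         # deletions/skips advance, no cover
                pos += n
            # I, S, H and P do not consume reference positions
    coverage, depth = [], 0
    for i in range(1, ref_length + 1):
        depth += diff[i]
        coverage.append(depth)
    return coverage
```

The resulting list has one value per base, which is the second of the two custom graph input forms described earlier.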
There are already a number of resources that produce circular representations of prokaryote genomes, each with their own unique features and advantages. Table 3 shows a comparison between the major features of BRIG and other GUI or internet based applications that produce circular images for prokaryote genomes. Of these resources, CGView Server, GeneWiz Browser and DNAPlotter bear the most resemblance to BRIG.
BRIG presents a solution to visualising prokaryote genome comparisons for a large number of genomes. Unlike DNAPlotter, BRIG does not show a preview of the image as the user edits it and only produces an image after the user has specified all of their settings. This is a common drawback of other genome comparison applications, including Circos, GeneWiz Browser and CGView Server. To address this, image templates are available in BRIG to help first time users to gauge appropriate settings for image aesthetics and scaling. Furthermore, the ability to save template files at any point during a BRIG session enables users to return to previous versions and modify images as needed.
Unlike BRIG, similar tools generally limit the number of genome comparisons that can be shown on a single image and they do not offer the option to collate multiple sequences into a single lane (Table 2). These drawbacks prevent the use of these resources in large-scale genome comparisons that are increasingly necessary as the number of publicly available genome sequences increases. BRIG has been designed with the task of draft genome analysis in mind. GeneWiz Browser, like BRIG, supports mapping and visualising short read sequences onto a reference genome; however, it does not explicitly support easy visualisation of contig boundaries within a reference sequence.
Standard BRIG comparisons rely on BLAST, so an understanding of BLAST parameters and behaviours is required in order to produce informative images. A common pitfall for first time users is the low-complexity filters, which are active by default in BLAST. These filters mask repetitive and low complexity sequences that could cause spurious low-scoring matches when searching large datasets. In BRIG, filtering often results in short (~30 base pairs long) blank regions spanning all query sequences, which may be misinterpreted as unique regions in the reference genome. Filtering can be turned off in the BLAST options field in BRIG. In addition, BLAST comparisons will often produce overlapping hits, which are difficult to visualise on a static flat image. To address this, BRIG was implemented to sort BLAST results so that the highest scoring hits are drawn last by CGView and displayed on top of other lower-scoring matches. As a result, high scoring matches are prominent over low scoring ones.
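The effect of drawing the highest scoring hits last can be shown with a small Python sketch. It is illustrative only: the source does not say which BLAST score is used for sorting, so ordering by bit score is an assumption, and the dictionary keys and function names are hypothetical.

```python
def draw_order(hits):
    """Sort hits so the highest scoring one is drawn last (on top)."""
    return sorted(hits, key=lambda h: h["bitscore"])


def paint(hits, length):
    """Simulate drawing onto a flat track: later hits overwrite earlier.

    Returns the percentage identity visible at each reference position,
    or None where no hit covers the position.
    """
    track = [None] * length
    for h in draw_order(hits):
        for pos in range(h["start"], h["end"] + 1):
            track[pos - 1] = h["identity"]
    return track
```

For example, where a strong hit over positions 1 to 6 overlaps a weak hit over positions 4 to 8, the strong hit remains visible across the whole of 1 to 6 and the weak hit shows only at positions 7 and 8, whatever order the hits arrived in.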
BRIG is actively maintained with a manual that includes step-by-step tutorials and sample data providing walk-throughs of all the major features. In future we plan to develop support for genome comparisons generated by programs like MUMmer and for BRIG to calculate the co-ordinates of major regions of difference between genomes 'on-the-fly' for use in downstream analyses.
Here we report the development of the BLAST Ring Image Generator (BRIG), a user-friendly desktop application for comparing and visualising prokaryote genomes using BLAST. BRIG is highly versatile; it can visualise information derived from draft genome data, including contig boundaries, read coverage or read mapping data; it can display the presence, absence or variation of a user-defined set of reference sequences in multiple datasets simultaneously, including unassembled next-generation sequencing reads; and it can display several types of custom graphs and annotations. All facets of the program are customisable through an easy-to-use graphical user interface bringing comparative genome visualisation well within the reach of any user.
Project name: BLAST Ring Image Generator (BRIG)
Project home page: http://sourceforge.net/projects/brig/
Operating system(s): Platform independent
Programming language: Java
Other requirements: Java 1.6 or greater
Licence: GNU GPLv3
Any restrictions to use by non-academics: None.
NFA developed and implemented the BRIG application, and helped to draft the manuscript. NKP and NLBZ participated in the design and coordination of the study, and helped to draft the manuscript. SAB conceived the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.
The authors would like to thank Kirstin Hanks-Thomson, Nathan Bachmann, Makrina Totsika and Mark Schembri for their feedback in testing and development. This work was supported by a grant from the Australian National Health and Medical Research Council (511224). SAB is the recipient of an Australian Research Council Australian Research Fellowship (DP0881347).