In this section, we give examples of interrogating the evolutionary relatedness of genomes using the Cinteny server. Different types of queries and different parameter settings are illustrated. In particular, we show how the Cinteny web server can be used to identify synteny blocks and compute the reversal distance (RD) between whole genomes as well as between any two chromosomes of two genomes.
To support sensitivity analysis of the RD computation and of synteny block identification, Cinteny allows the user to interactively adjust several parameters, including: i) the minimum length of synteny blocks (denoted as min_len); ii) the maximum gap between adjacent blocks for aggregation (denoted as max_gap); and iii) the minimum number of markers in a block (denoted as min_num). Aggregation refers here to combining smaller synteny blocks to form larger blocks, wherever feasible. For example, by increasing the maximum allowed gap between adjacent blocks (max_gap), one may effectively join segments which are otherwise far apart and obtain longer synteny blocks. Thus, such aggregation provides an effective coarse-graining of synteny blocks and affects the resulting RDs.
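The effect of the three parameters can be sketched with a toy example. The following is only an illustrative sketch and not the Cinteny implementation: it assumes that blocks are given as coordinate intervals on a single chromosome of one genome, that neighbouring blocks are merged first and filtered afterwards, and it ignores orientation and the position of the blocks in the second genome. The names Block and aggregate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int      # start coordinate (bp) on one chromosome
    end: int        # end coordinate (bp)
    n_markers: int  # number of markers contained in the block


def aggregate(blocks, min_len, max_gap, min_num):
    """Merge neighbouring blocks separated by at most max_gap, then keep
    only blocks with length >= min_len and at least min_num markers.

    Assumption: merge-then-filter order; the source does not specify it.
    """
    merged = []
    for b in sorted(blocks, key=lambda blk: blk.start):
        if merged and b.start - merged[-1].end <= max_gap:
            last = merged[-1]
            merged[-1] = Block(last.start, max(last.end, b.end),
                               last.n_markers + b.n_markers)
        else:
            merged.append(Block(b.start, b.end, b.n_markers))
    return [b for b in merged
            if b.end - b.start >= min_len and b.n_markers >= min_num]


# Toy data (made up for illustration only)
toy = [Block(0, 200_000, 2), Block(250_000, 400_000, 2),
       Block(5_000_000, 5_050_000, 1)]
coarse = aggregate(toy, min_len=300_000, max_gap=1_000_000, min_num=3)
fine = aggregate(toy, min_len=0, max_gap=0, min_num=2)
```

In this toy case the coarse setting joins the first two blocks into one longer block and discards the short isolated one, whereas the setting without aggregation keeps the first two blocks separate.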
Another problem in applying algorithms for synteny block identification is posed by degenerate markers, such as paralogs (i.e., multiple copies of the same gene). In particular, the Hannenhalli-Pevzner theorem assumes that markers are unique [2]. In order to address this problem, Cinteny offers several different options for dealing with paralogs and enabling the assessment of these heuristics. One rational strategy is to use the paralog which lies within the most conserved region (i.e., the largest synteny block). This is the default option, and it is used for all the examples shown in this paper. Other options, which are provided to enable an assessment of the results of such arbitrary choices, include the use of a random paralog or ignoring all genes which have paralogs.
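The three paralog options described above can be illustrated as follows. This is a minimal sketch under stated assumptions, not the procedure used by the server: it assumes that the size of the synteny block surrounding each copy has already been computed, and the function name resolve_paralogs, the strategy labels and the input layout are hypothetical.

```python
import random

STRATEGIES = ("largest_block", "random", "ignore")


def resolve_paralogs(copies, strategy="largest_block", seed=0):
    """Pick one position per gene so that every marker is unique.

    copies maps a gene name to a non-empty list of (position, block_size)
    tuples, one tuple per copy of the gene (assumed input layout).
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    rng = random.Random(seed)
    chosen = {}
    for gene, locs in copies.items():
        if len(locs) == 1:
            chosen[gene] = locs[0][0]          # unique gene: nothing to resolve
        elif strategy == "largest_block":
            chosen[gene] = max(locs, key=lambda loc: loc[1])[0]
        elif strategy == "random":
            chosen[gene] = rng.choice(locs)[0]
        # strategy == "ignore": genes with several copies are dropped
    return chosen


# Toy data (made up): gene B has two copies, in blocks of size 2 and 7
toy_copies = {"A": [(100, 5)], "B": [(200, 2), (900, 7)]}
by_block = resolve_paralogs(toy_copies, "largest_block")
ignored = resolve_paralogs(toy_copies, "ignore")
```

Running the same comparison under each strategy and comparing the resulting block counts and RDs is one way to assess how much the outcome depends on this choice.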
Whole genome analysis
Finding synteny and the RD between whole genomes is discussed here as an example of a typical application in comparative genomics. Figure shows the synteny blocks for human and mouse genomes. The genome shown in the right panel (mouse in this case) is the source genome and the genome on the left (human) is the target genome. All chromosomes of the source genome are shown in unique colors. Each chromosome of the target genome is shown as composed of segments of some chromosome of the source genome, as indicated by the corresponding color. For example, the majority of human chromosome 1 is composed of (i.e. is syntenic to) mouse chromosomes 4, 3 and 1.
Visualization of synteny between human and mouse genomes.
The figure was generated using a particular level of coarse-graining (aggregation), as defined by min_len = 300 kb, max_gap = 1 Mb and min_num = 3. With these parameters, and using the set of 15,645 human-mouse orthologs identified by HomoloGene 46.1 [11], 359 synteny blocks are found and the RD is 261. These results are in qualitative agreement with previous studies [17]. However, it should be noted that changing the level of coarse-graining may yield very different results. For example, using min_len = 0, max_gap = 0 and min_num = 2 (which is equivalent to no aggregation of synteny blocks), the number of synteny blocks increases to 828 and the RD to 348. The dependence of the results on the parameters is discussed further in the Using multiple genomes section.
Chromosome level analysis
Cinteny can also be used to identify synteny blocks and compute the reversal distance between two chromosomes. In this case we use common markers located on the chromosomes of interest. In Figure , we show an example of synteny for the X chromosome of the mouse and the rat genomes. Figure shows the syntenic blocks obtained without any aggregation (except for imposing that min_num = 2). The number of synteny blocks is 85 and the RD is 52 in this case. On the other hand, when using the aggregation described in the previous section, the number of synteny blocks is reduced to 17 and the RD to 8 (see Figure ). As discussed later, similar results were obtained by imposing a natural coarse-graining that utilizes multiple genomes.
Comparison of X chromosomes of mouse and rat with (Panel B) and without (Panel A) aggregation.
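To make the case without aggregation concrete, the sketch below finds runs of collinear markers on a pair of chromosomes. It is an illustration under simplifying assumptions rather than the algorithm used by Cinteny: the markers of one chromosome are assumed to be numbered 1..n in order, and the other chromosome is given as a signed sequence of these numbers, with a negative sign marking the opposite orientation. Two neighbouring entries are treated as collinear when the second equals the first plus one, which covers both forward runs (+3 +4) and inverted runs (-4 -3). The function name collinear_runs is hypothetical.

```python
def collinear_runs(signed_order, min_num=2):
    """Split a signed marker order into maximal collinear runs and keep
    the runs that contain at least min_num markers."""
    if not signed_order:
        return []
    runs = []
    current = [signed_order[0]]
    for prev, cur in zip(signed_order, signed_order[1:]):
        if cur - prev == 1:        # same block, forward or inverted
            current.append(cur)
        else:                      # adjacency broken: start a new run
            runs.append(current)
            current = [cur]
    runs.append(current)
    return [run for run in runs if len(run) >= min_num]


# Toy order (made up): a forward run, an inverted run, two isolated markers
toy_order = [1, 2, 3, -6, -5, -4, 7, 9]
toy_blocks = collinear_runs(toy_order, min_num=2)
```

With min_num = 2 the isolated markers 7 and 9 do not form blocks, which mirrors the only constraint imposed in the comparison without aggregation.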
Analysis of individual synteny blocks
In addition to the queries illustrated above, one may use Cinteny for visualization and analysis of the synteny around a specific marker or gene. For example, starting with human and mouse genomes and default aggregation parameters, we find that the human BRCA1 gene is present in a conserved region of size 7.2 Mb on human chromosome 17, whereas its mouse ortholog (Brca1) is present in a conserved region of size 6.1 Mb on mouse chromosome 11 (see Figure ). On the other hand, the DKK1 gene is present in a conserved region of size 2 Mb in the human genome and its ortholog, Dkk1, is present in a conserved region of size 1.7 Mb in the mouse genome. Several examples of such queries are included in the online help at the Cinteny web site.
Synteny block view: human chromosome 17 (top) and mouse chromosome 11 (bottom) segments which contain the gene BRCA1.
Using multiple genomes
The use of multiple genomes for genome rearrangement analysis has been proposed before, e.g., in order to derive relationships between canine and other mammalian genomes [18]. The Cinteny web server allows one to perform a 2-way (two genome-based) as well as a multi-way (multiple genome-based) analysis. As an example, Figure compares the whole genome synteny between rat and mouse genomes, as identified using 2-way and 5-way strategies that utilize the subset of markers common to two (rat and mouse) or five genomes (human, mouse, dog, chimpanzee and rat), respectively (see also the Implementation section).
Comparison of whole genome synteny between rat and mouse using two-way (Panel A) and multi-way (Panel B) approach.
An intermediate aggregation level is used here for the two-way comparison (Figure ), with min_len = 100 kb and max_gap = 100 kb, leading to a reversal distance of 128. In Figure , the same setup of parameters is used, except that only orthologs common to the five mammalian genomes are used. In the latter case, an RD of 86 is obtained. In addition, one can see that there are, in general, fewer gaps (represented by white spaces) in the 5-way analysis (Figure ), as a result of the natural coarse-graining that arises from selecting only the highly conserved markers included in the 5-way analysis.
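The difference between the 2-way and 5-way marker sets amounts to a set intersection over the genomes considered. The sketch below assumes that each genome is described by the set of ortholog group identifiers it contains; the function name common_markers, the data layout and the toy identifiers are hypothetical and do not reflect the server's internal data model.

```python
def common_markers(markers_by_genome, genomes):
    """Return the ortholog group identifiers present in every listed genome."""
    if not genomes:
        return set()
    sets = [set(markers_by_genome[g]) for g in genomes]
    return set.intersection(*sets)


# Toy ortholog groups (made up for illustration only)
toy_markers = {
    "human": {"g1", "g2", "g3", "g5"},
    "mouse": {"g1", "g2", "g3", "g4"},
    "rat": {"g1", "g2", "g4"},
    "dog": {"g1", "g2", "g5"},
    "chimpanzee": {"g1", "g2", "g3"},
}
two_way = common_markers(toy_markers, ["rat", "mouse"])
five_way = common_markers(
    toy_markers, ["human", "mouse", "dog", "chimpanzee", "rat"])
```

Because the 5-way set is always a subset of the 2-way set, the multi-way analysis works with fewer but more widely conserved markers.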
Thus, as illustrated above, the absolute values of reversal distances may vary significantly with different choices of markers and aggregation strategies. We note, however, that the RD is much more sensitive to the choice of parameters (min_len in particular) when using the 2-way approach, which can be easily verified using the Cinteny server. It is also interesting to note that in relative terms (e.g., when using the mouse to rat distance normalized by the human to mouse distance) the reversal distances appear to be quite constant for a range of aggregation parameters, especially when using the multiple genome approach (data not shown). This may suggest that RDs can be used to indicate evolutionary relatedness in relative terms, as long as a proper parametrization of the problem is used. This is the subject of future work.
To further illustrate the usefulness of the multi-way approach, we performed a 5-way analysis for the X chromosome of mouse and rat without any aggregation, yielding an RD of 14 and 19 synteny blocks. Comparison with Figure , as well as comparison of the number of synteny blocks and RDs obtained using different aggregation strategies, suggests that the multiple genome approach provides a natural coarse-graining that allows one to select appropriate aggregation parameters for genomes of interest. Additional examples, e.g., regarding fungal genomes that are characterized by very different gene densities and high levels of genome rearrangements (making the choice of suitable aggregation parameters even more difficult), are included in the on-line help. We also note that recent efforts to better and more fully annotate orthologous genes in hundreds of sequenced genomes [19] will likely make tools for multiple genome-based analyses even more important.