‘Sequence logos’ are simple graphical representations of conserved elements in multiple sequence alignments. They were first introduced by Tom Schneider and colleagues (Schneider and Stephens, 1990). However, their popularity was greatly boosted by the advent of WebLogo (Crooks et al., 2004), which provides a web-based interface for sequence logo generation. WebLogo processes a multiple sequence alignment and generates a logo in which each column of the alignment is represented by a stack of letters. The height of the entire stack is proportional to the information content of the column (a maximum of 2 bits for nucleotides and 4.32 bits for amino acids), whereas the height of each symbol within the stack is proportional to its frequency in that column.
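The stack and symbol heights described above can be illustrated with a minimal sketch. It assumes the uncorrected form of the information content, log2(alphabet size) minus the Shannon entropy of the column, and omits the small-sample correction used by Schneider and Stephens (1990). The function names `column_information` and `symbol_heights` are hypothetical and are not taken from WebLogo.

```python
import math
from collections import Counter


def column_information(column, alphabet_size):
    """Information content (bits) of one alignment column.

    Assumption: no small-sample correction is applied.
    """
    counts = Counter(column)
    total = len(column)
    # Shannon entropy of the observed symbol frequencies
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def symbol_heights(column, alphabet_size):
    """Height of each symbol: its frequency times the column information."""
    counts = Counter(column)
    total = len(column)
    info = column_information(column, alphabet_size)
    return {symbol: (n / total) * info for symbol, n in counts.items()}


# A fully conserved nucleotide column reaches the 2-bit maximum,
# and a fully conserved amino acid column reaches log2(20) bits.
print(column_information("AAAA", 4))
print(round(column_information("LLLL", 20), 2))
print(symbol_heights("AACC", 4))
```

In this sketch a column split evenly between two nucleotides carries 1 bit, which is divided equally between the two letters in the stack.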
Sequence logos inspired the development of several other tools that use principles of Shannon's information theory (Shannon, 1948) for graphical visualization of conserved biological elements. For example, RNALogo (Chang et al., 2008) allows visualization of nucleotide conservation in the context of RNA secondary structure diagrams. CorreLogo (Bindewald et al., 2006) generates 3D images that represent not only local conservation of nucleotides but also mutual information, thus allowing visualization of double-stranded regions in RNA structures, the characteristic signature of which is compensatory mutations (Dixon and Hillis, 1993). BLogo (Li et al., 2008) allows one to visualize both overrepresented and underrepresented symbols in multiple alignments. Logopaint improves visualization of patterns within alignments of coding regions by removing distortion caused by unequal evolutionary rates for synonymous and non-synonymous substitutions (Schreiber and Brown, 2002). We have been able to identify 13 different tools (data not shown), freely available through the Web, that are closely related to the idea behind sequence logos. Despite this proliferation of logo-based tools, we have not been able to find a single one that enables visualization of codon patterns.
Codons have specific biological meaning during translation. They are the units that interact with transfer RNAs (tRNAs) during decoding, and on numerous occasions synonymous codons do not have the same meaning. Synonymous codon substitutions can have drastic effects on phenomena such as programmed ribosomal frameshifting (Baranov et al., 2002; Namy et al., 2004), and they can also affect the speed (Tuller et al., 2010) and accuracy of translation (Drummond and Wilke, 2008). Moreover, altered combinations of codons can greatly affect the overall efficiency of translation (Coleman et al., 2008). Codon patterns therefore have biological significance. However, as we show, standard sequence logos cannot discriminate between conserved patterns of codons and conserved patterns of nucleotides when the nucleotide composition of the alignment columns is the same. To overcome this limitation, we have developed a new tool that we have named CodonLogo.
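The limitation can be made concrete with a small sketch. The two toy alignments below are hypothetical and are constructed so that every nucleotide column has the same composition, while the codons themselves differ. The sketch assumes the uncorrected information content (log2 of the alphabet size minus the Shannon entropy) and, as a further assumption, a codon alphabet of all 64 triplets. It is not CodonLogo's implementation.

```python
import math
from collections import Counter


def information(symbols, alphabet_size):
    """Uncorrected information content (bits) of a list of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def nucleotide_profile(codons):
    """Information at each of the three nucleotide positions of a codon column."""
    return [information([codon[i] for codon in codons], 4) for i in range(3)]


def codon_information(codons):
    """Information of the codon column, treating each triplet as one symbol.

    Assumption: all 64 triplets form the alphabet.
    """
    return information(codons, 64)


# Hypothetical codon columns: every nucleotide position is half A, half C in both.
conserved = ["AAA", "AAA", "CCC", "CCC"]  # only two distinct codons
shuffled = ["AAA", "ACC", "CAC", "CCA"]   # four distinct codons

print(nucleotide_profile(conserved), nucleotide_profile(shuffled))
print(codon_information(conserved), codon_information(shuffled))
```

Under these assumptions both columns give 1 bit at every nucleotide position, so a nucleotide logo would draw them identically, whereas the codon-level information differs (5 bits versus 4 bits).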