Mitochondrial genome sequences are widely used in evolutionary studies of many different groups of organisms. They range in size from just under 6 kb in Plasmodium falciparum and Plasmodium reichenowi to over 360 kb in some plants such as Beta vulgaris and Arabidopsis thaliana. The 69 kb mitochondrial genome of Reclinomonas americana is the least derived and most gene-rich discovered to date, containing 94 genes and bearing a clear relationship to its eubacterial ancestor (1). Most metazoan mitochondrial genomes are much smaller (~16 kb) and highly derived (2); they code for a reduced set of 13 proteins, 22 tRNAs and two rRNAs.
Although the gene content of metazoan mitochondria is highly consistent, the order of those genes varies substantially between taxa. At the time of writing, we observe 14 unique gene orders amongst the set of 177 available complete vertebrate mitochondrial genomes and 50 unique gene orders amongst the 75 invertebrate genomes. This reflects the fact that many closely related vertebrate species have identical gene orders, whilst for the invertebrate phyla, where the species sampling is less dense, almost all species differ from one another in gene order. As the number of available invertebrate genomes increases, we expect to observe a rapid increase in the number of unique gene orders. Several mechanisms are envisaged for the rearrangement of mitochondrial genomes, including inversions, translocations, duplications and deletions (3). Gene order information can also be used for phylogenetic purposes (3). Various algorithms, based on inversion distances, breakpoints and related edit distances, have been developed for measuring distances between gene orders and subsequently reconstructing phylogenetic trees (6).
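
As an illustration of the simplest of these measures, the following minimal Python sketch counts breakpoints between two circular gene orders. It assumes that both genomes contain the same genes, each exactly once, and that each gene is encoded as a signed integer whose sign gives the strand; the function names and the toy gene orders are hypothetical and are not taken from any published implementation.

```python
def adjacencies(order):
    """Return every signed adjacency of a circular gene order.

    Each adjacency (a, b) is stored together with (-b, -a), which is the
    same pair of neighbouring genes read from the opposite strand.
    """
    n = len(order)
    adj = set()
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]  # wrap around: genome is circular
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(order_a, order_b):
    """Count adjacencies of order_a that are not conserved in order_b."""
    if sorted(abs(g) for g in order_a) != sorted(abs(g) for g in order_b):
        raise ValueError("both gene orders must contain the same genes")
    adj_b = adjacencies(order_b)
    n = len(order_a)
    return sum(
        1
        for i in range(n)
        if (order_a[i], order_a[(i + 1) % n]) not in adj_b
    )


# Toy example (hypothetical gene orders): genes 2 and 3 are inverted together.
ancestral = [1, 2, 3, 4, 5]
inverted = [1, -3, -2, 4, 5]
print(breakpoint_distance(ancestral, inverted))  # 2
```

Under this encoding a single inversion creates two breakpoints, whereas rotating the circular genome or reading it from the opposite strand leaves the distance at zero.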
In addition to gene orders, mitochondrial gene sequences have proved extremely popular in molecular phylogenetics. They have been used for reconstructing phylogenies at various levels, from intraspecies (12) to the divergence of large taxa (13). The conserved sets of single-copy orthologous genes found in complete mitochondrial genomes greatly facilitate the reconstruction of combined gene phylogenies (15).
Variation in the frequencies of nucleotide bases can bias the results of phylogenetic methods (16). Base frequencies in mitochondrial genomes differ quite substantially between species, sometimes even between those that are closely related. In addition, it has been shown that base frequencies can vary between the two DNA strands and sometimes from one part of the genome to another (18). Mitochondria provide a closed system within which to study variations in base frequency and therefore codon usage. The frequencies of synonymous codons in mitochondrial genes are closely dependent on the base frequencies, suggesting that the asymmetry of mutational rates between the four bases is one of the most important factors in determining codon usage patterns (20). It is also possible, however, that selective effects such as increasing efficiency of translation or avoiding certain unfavourable DNA motifs may play a role.
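
The two quantities discussed above, base frequencies and codon counts, can be computed from a coding sequence along the following lines. This is only a sketch under stated assumptions: the sequence is a toy string rather than real mitochondrial data, the function names are hypothetical, and ambiguous bases are simply skipped.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq):
    """Return the relative frequency of A, C, G and T in a sequence."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: counts[b] / total for b in BASES}


def codon_counts(cds):
    """Count in-frame codons, skipping any codon with an ambiguous base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if set(c) <= set(BASES))


# Toy coding sequence (hypothetical, not a real gene).
toy_cds = "ATGCTACTATAA"
print(base_frequencies(toy_cds))
print(codon_counts(toy_cds))
```

Comparing the codon counts for a set of synonymous codons with the overall base frequencies of the same genome gives a first impression of how far codon usage follows base composition.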
Gene order, combined gene phylogeny and codon usage within mitochondria are related and actively developing areas of research. Large-scale sequencing programs such as those on protists and fungi (1), invertebrates (22), fish (23) and mammals (25) mean that there are now over 300 complete mitochondrial genome sequences available in the public databases. We have designed a database, Organellar Genome Retrieval (OGRe), to extract the complete sequence data and provide new resources based around gene order information and codon usage data. Here we describe OGRe and the facilities that it provides.