Conceived and designed the experiments: SJW MS XXC. Performed the experiments: SJW MS. Analyzed the data: SJW MS. Contributed reagents/materials/analysis tools: XXC. Wrote the paper: SJW XXC. Intellectual contributions during the design and implementation of this study: XXC MS. Intellectual contributions during the writing of the manuscript: XXC MS CvA GYY JHH. Provided funds in support of this study: XXC.
Strand asymmetry in nucleotide composition is a remarkable feature of animal mitochondrial genomes. Understanding the mutation processes that shape strand asymmetry is essential for comprehensive knowledge of genome evolution, demographical population history and accurate phylogenetic inference. Previous studies found that the relative contributions of different substitution types to strand asymmetry are associated with replication alone or both replication and transcription. However, the relative contributions of replication and transcription to strand asymmetry remain unclear. Here we conducted a broad survey of strand asymmetry across 120 insect mitochondrial genomes, with special reference to the correlation between the signs of skew values and replication orientation/gene direction. The results show that the sign of GC skew on entire mitochondrial genomes is reversed in all species of three distantly related families of insects, Philopteridae (Phthiraptera), Aleyrodidae (Hemiptera) and Braconidae (Hymenoptera); the replication-related elements in the A+T-rich regions of these species are inverted, confirming that reversal of strand asymmetry (GC skew) was caused by inversion of replication origin; and finally, the sign of GC skew value is associated with replication orientation but not with gene direction, while that of AT skew value varies with gene direction, replication and codon positions used in analyses. These findings show that deaminations during replication and other mutations contribute more than selection on amino acid sequences to strand compositions of G and C, and that the replication process has a stronger effect on A and T content than does transcription. Our results may contribute to genome-wide studies of replication and transcription mechanisms.
Most animal mitochondrial genomes are about 16 Kb in size and contain 37 genes: 13 protein-coding genes, 22 transfer RNA genes (tRNA) and two ribosomal RNA genes (rRNA). Additionally, there is an A+T-rich region which contains essential regulatory elements for the initiation of transcription and replication, and is therefore referred to as the control region. Features of highly economized organization, lack of recombination, maternal inheritance and a high mutation rate relative to the nuclear genome have resulted in the wide use of mitochondrial genomes in studies of genome evolution, population genetic structure and phylogenetic inference.
A remarkable feature of mitochondrial DNA is the violation of Chargaff's second parity rule, called strand asymmetry (strand compositional bias). Strand asymmetry is usually reflected by AT skew, as expressed by (A−T)/(A+T), and GC skew, as expressed by (G−C)/(G+C). Positive AT skew values indicate more A than T on the target strand, and positive GC skew values indicate more G than C, and vice versa. In insect mitochondrial genomes, the two DNA strands are referred to as the majority strand (light strand in mammal mitochondrial genomes), on which more genes are coded, and the minority strand (heavy strand in mammal mitochondrial genomes). Additionally, there is usually more A than T and more C than G on the majority strand. However, in some arthropods, flatworms, brachiopods, echinoderms and fish, strand asymmetry is reversed and there is less A than T and less C than G on the majority strand.
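The two skew statistics can be illustrated with a minimal sketch. The function name `skew` and the example sequence are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def skew(seq, first, second):
    """Return (first - second) / (first + second) using base counts in seq.

    Returns None when neither base occurs, because the ratio is undefined.
    """
    seq = seq.upper()
    x, y = seq.count(first), seq.count(second)
    if x + y == 0:
        return None
    return (x - y) / (x + y)


# Made-up sequence standing in for a majority strand.
majority = "AAACCCTGAC"
at_skew = skew(majority, "A", "T")  # (A - T) / (A + T)
gc_skew = skew(majority, "G", "C")  # (G - C) / (G + C)
```

With this toy input the AT skew is positive and the GC skew is negative, which is the pattern described above for the majority strand of most insect mitochondrial genomes.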
Replication and transcription processes, during which one strand is transiently in a single-stranded state and thereby exposed to more DNA damage, have been widely considered to bias the occurrence of mutations between the two complementary DNA strands. Therefore, inversion of the replication origin located in the A+T-rich region would change the replication order of the two mitochondrial DNA strands and consequently lead to reversal of strand asymmetry. It has been demonstrated in a crustacean and in two vertebrates that replication order is responsible for the reversal of strand asymmetry. Moreover, it has been demonstrated experimentally that rates of spontaneous deamination of A and C nucleotides are higher in single-stranded DNA than in double-stranded DNA. Deamination of A yields a base, hypoxanthine, that pairs with C rather than T, while deamination of C yields a base, uracil, that pairs with A instead of G.
In order to decipher the underlying mutation processes causing strand asymmetry in mitochondrial genomes, much research has gone into assessing the contributions of different substitution types associated with replication. C deamination that promotes C:G to T:A transitions was shown to be the major source of mutation in vertebrate mitochondrial DNA , , . Additionally, genes closer to the replication origin of leading strands remain exposed to mutation for a longer period of time during replication and this should result in a positive correlation between strand asymmetry and duration of time spent in the single-stranded state (DssH) . However, different types of mutation were found to respond differently to DssH gradients . There is no direct evidence to demonstrate the existence of transcription-associated mutational asymmetry in mitochondrial genomes although this phenomenon has been observed in enterobacteria , ,  and in the nuclear DNA of eukaryotes , . Another way to examine the mutation spectrum in mitochondrial DNA is to directly estimate the rates of different substitution types without discerning the processes of replication and transcription. In fruit fly and mouse mitochondrial DNA, transition of G:C to A:T is the dominating mutation , , but in a nematode A:T to G:C  is dominant, indicating that the spectrum of mitochondrial mutations varies across taxa . In summary, the relative contributions of replication and transcription on strand asymmetry are still unclear.
Here, we conduct a broad survey of strand asymmetry in 120 insect mitochondrial genomes, with special reference to the correlation between skew values and gene direction/replication orientation.
We calculated the AT and GC skews for the entire majority strand of 120 insect mitochondrial genomes. Most species have positive AT skews and negative GC skews, i.e., most have a strand asymmetry characterized by an excess of A relative to T and an excess of C relative to G. However, ten species, i.e., two species in Braconidae (Hymenoptera), two species in Philopteridae (Phthiraptera) and six species in Aleyrodidae (Hemiptera), showed negative AT skews and positive GC skews (Figure 1), implying that these species have strand asymmetry reversal on the entire majority strand, with less A than T and less C than G (Figure 1, Table S1). All species with sequenced mitochondrial genomes in the above three families showed reversal of strand asymmetry, suggesting that reversal of strand asymmetry may have occurred in basal members of these taxa and may be phylogenetically associated.
Strand asymmetry is presumed to be predominantly due to the prevalence of deamination of A and C in single-stranded DNA during replication and transcription, although the relative contributions of these two processes are uncertain. Based on replication theory, inversion of the replication origin located in the A+T-rich region would lead to reversal of strand asymmetry. However, it is not easy to detect an inversion of the A+T-rich region because this is the most variable region of the mitochondrial genome, making it impossible to align between distant species. Consequently, the orientation of the control region cannot be determined by simple sequence comparisons. The direction of the A+T-rich region was determined in the 10 species with reversal of strand asymmetry on the entire majority strand by examination of the elements related to the regulation of transcription and control of DNA replication (Figure 2, and Figure 2A and 2C in ).
For the family Braconidae, the A+T-rich region is located between trnM and trnQ both in Cotesia vestalis and Spathius agrili. Apart from the conserved elements in insect A+T-rich region, tandem repetition was also found to be a characteristic of A+T-rich regions . In S. agrili, a repeat sequence is present at the 3′-end upstream of the trnQ. In both species, all elements related to the regulation of transcription and control of DNA replication in the A+T-rich region were present and arranged in the conserved order, except for the G+A-rich sequence downstream of the stem and loop structure, and an A[TA(A)]n-like stretch in S. agrili. However, all of these elements are in opposite directions and strands compared to those of other insects , , revealing an inversion of the A+T-rich region in these two species of Braconidae , .
Amongst Aleyrodidae, all species have one A+T-rich region except Bemisia tabaci, which has two, one small and one large. Tandem repeat sequences were found in the A+T-rich region of all species, except in the small A+T-rich region of B. tabaci. All of these repeat regions are located at the 3′-end of the A+T-rich regions except for that of Neomaskellia andropogonis. Replication and transcription related elements in the A+T-rich regions are present in conserved order in all species except for N. andropogonis and Tetraleurodes acaciae. In B. tabaci, conserved elements were found in both A+T-rich regions. In Trialeurodes vaporariorum all conserved elements were found in each of five repeat sequences of the A+T-rich region, indicating that these elements were repeated five times. In T. acaciae, a long polyT stretch is present in the middle of the A+T-rich region upstream of the tandem repeat region. These elements are in opposite directions and on opposite strands, as in the two braconid species, indicating that the mitochondrial genomes of aleyrodid species also have an inversion of the A+T-rich region.
For the Philopteridae, there are several A+T-rich regions in each species and the inferred stem-loop structures in each A+T-rich region are located on the majority strand , , which might indicate reversals of the A+T-rich regions.
In summary, examination of regulatory elements in A+T-rich regions directly supports the hypothesis that the reversal of strand asymmetry is caused by inversion of replication origin.
Mitochondrial gene arrangement is highly conserved in most animals , though there are exceptions. The 10 species with reversal of strand asymmetry over the entire mitochondrial genome were found to have accelerated gene rearrangement rates , , . However, species that have accelerated gene rearrangement rates do not always show a reversal of strand asymmetry, e.g., three Nasonia species (Insecta: Hymenoptera)  and Thrips imaginis (Insecta: Thysanoptera) . Below, we explore the relationship between gene arrangement and strand asymmetry in species with reversal of strand asymmetry over the entire mitochondrial majority strand following the traditional classification of rearrangement events: translocation, local inversion (inverted in the local position), shuffling and remote inversion (translocated and inverted) .
For these 10 species, gene rearrangement varied greatly both among and within families (Figure 3). In Braconidae, the S. agrili mitochondrial genome is relatively conserved in gene arrangement with five tRNA genes rearranged. However, in C. vestalis, the mitochondrial genes are highly rearranged with 15 tRNA and seven protein-coding genes inverted, representing 5968 bp, more than one third of the genome . As discussed in Wei et al. , it is more parsimonious to suggest that the A+T-rich region, trnI and trnM were inverted simultaneously in Braconidae.
Extensive gene rearrangements occur in the two species of Philopteridae. All mitochondrial genes coded on the minority strand are inverted in both species, forming novel patterns, with all genes coded on the majority strand except for trnQ in Campanulotes bidentatus. Thus, A+T-rich regions are probably inverted independently of the rearrangements of other genes in Philopteridae.
Amongst Aleyrodidae, in Aleurodicus dugesii and T. vaporariorum the rearranged genes are tRNAs, largely restricted to the tRNA clusters between the A+T-rich region and nad2, and between nad2 and cox1. In the mitochondrial genomes of the other four species, a large segment containing at least cox3, trnG and nad3 is remotely inverted.
Based on our analyses of the structures of A+T-rich regions and gene arrangement patterns, inversion of the A+T-rich region is the only character state unique to these 10 species. In both Braconidae and Aleyrodidae, some species show conserved mitochondrial gene arrangement whereas others are extensively rearranged. Thus, the inversion of the A+T-rich region may have occurred prior to extensive gene rearrangements in, at least, some species.
In C. vestalis, Bothriometopus macrocnemis and C. bidentatus, protein-coding genes are extensively inverted. Since replication and transcription processes have been widely considered to bias the occurrence of mutations between the two complementary DNA strands, the inversion of a large number of protein-coding genes should have counteracted some of the effect of the reversed replication origin on strand asymmetry.
In eukaryotes, many non-coding regions can be used for the comparison of strand asymmetry between transcribed and untranscribed regions , ; however, most animal mitochondrial genomes are highly economized with few non-coding regions to facilitate the study of transcription-associated strand asymmetry . Fortunately, in the putative ancestral insect mitochondrial genome, four of the 13 protein-coding genes are located on the minority strand and the total length of protein-coding genes coded on the minority strand accounts for about 2/3 of those of the entire mitochondrial genome. This provided us with the opportunity to compare the strand asymmetry of genes coded on different strands. Contrastingly, in mammal mitochondrial genomes, which are frequently used models for studies of strand asymmetry, only one protein-coding gene is coded on the minority strand, and this gene has been excluded from many ,  but not all  studies. Here, we analyzed strand asymmetry for individual protein-coding genes in 120 insect mitochondrial genomes with special reference to the relationship between strand asymmetry and gene direction/genome replication orientation.
Third codon positions are less affected by selection on amino acids, but many third codon positions are not free to change because of the existence of two-fold redundant codon positions. Four-fold redundant third codon positions, which can freely alternate between all nucleotides without changing the resulting amino acid, are considered to be under little or no selection. Here, we calculated AT and GC skews for all codon positions, third codon positions, two-fold redundant third codon positions, and four-fold redundant third codon positions for individual protein-coding genes (Figure 4, Figure S1 and Table S1).
At all codon positions and two-fold redundant third codon positions, most genes coded on the minority strand show positive AT skew values, whereas most genes coded on the majority strand show negative AT skew values. This is the case in all analyzed insects with normal replication origins and inverted replication origins, except for six Aleyrodidae species with inverted replication origins. In these, some genes coded on the minority strand showed negative AT skew values. Other exceptions are mostly restricted to specific taxa or genes. At all codon positions, genes which coded on the majority strand in all Japygoidea (Diplura), Isoptera, Acrididae (Orthoptera), Archostemata (Coleoptera), Elateriformia (Coleoptera), and Archaeognatha showed positive AT skew values. All genes coded on the majority strand of all Oedipodinae have negative AT skew values. At two-fold redundant third codon positions, nad4L, coded on the minority strand, showed negative AT skew values in most species unlike other genes coded on the minority strand. At third codon positions and four-fold redundant third codon positions, most genes coded on the minority strand showed positive AT skew values, however, genes coded on the majority strand showed either positive or negative AT skew values (Figure 4A and Figure S1).
Most genes coded on both the majority and minority strands have negative GC skew values in genomes with normal replication origin (Figure 4B and Figure S1); whereas most genes coded on both the majority and the minority strands show positive GC skew values in the genomes of the 10 species with reversed strand asymmetry over the entire majority strand (Figure 4C and Figure S1).
We tested the correlation between the sign of skew values on individual genes and gene direction/genome replication orientation using contingency tables and chi-square tests. The results support the idea that the sign of AT skew on individual genes is associated with gene direction, but the sign of GC skew on individual genes is not associated with gene direction at third codon positions and four-fold redundant third codon positions. The p-values support the correlation between the sign of GC skews of individual genes and gene direction at all codon positions and two-fold redundant third codon positions; however, the chi-square values were lower than those for AT skew. The signs of both AT and GC skew values are associated with replication origin; however, the chi-square values were lower for AT skew than for GC skew (Table 1). In conclusion, the sign of GC skew is associated with replication orientation but not associated with gene direction. The sign of AT skew varies with gene direction, replication and codon position.
We conducted a broad survey of strand asymmetry in 120 insects and showed that reversal of GC skew sign over the entire majority strand evolved three times in insects. Further, we demonstrated that reversal of GC skew sign over the entire majority strand appears to be correlated phylogenetically. This is in contrast with other animal taxa, in which reversal of strand asymmetry is randomly distributed, though fewer species across a larger phylogenetic space have been sampled. More mitochondrial genome sequences are needed to confirm the correlation between the reversal of strand asymmetry in mitochondrial genomes and taxonomic groups. That reversal of strand asymmetry was caused by inversion of replication origin was demonstrated by the examination of replication-related elements in the A+T-rich region. All species with reversal of strand asymmetry over the entire mitochondrial genome were found to have accelerated gene rearrangement rates, whereas species with accelerated mitochondrial genome rearrangement did not always show reversal of strand asymmetry. This may indicate that inversion of the A+T-rich region, leading to reversal of strand asymmetry, is a type of gene rearrangement event unique to mitochondrial genomes. The causes of frequent gene rearrangements in mitochondrial genomes have yet to be examined.
By comparing six protein-coding genes in 49 metazoan mitochondrial genomes, Hassanin et al. (2005) found that absolute values for GC are always higher than those of AT skews at all codon positions and suggested that strand asymmetry is best reflected in the GC skew. Skew values on individual genes for all codon positions could help to explain this phenomenon. GC skews of different genes coded on the same strand are all positive or negative, whereas the AT skews of different genes coded on the same strand are either positive or negative depending on the direction of the gene, thus the GC skew on a strand is the accumulative effect of all genes, and the AT skew on a strand is the homogenized result of those AT skews with different signs on individual genes. Our conclusion is that the criterion for detecting reversal of strand asymmetry on mitochondrial genomes should be the sign of GC skew values not AT skew values.
Strand asymmetry is the consequence of selection and asymmetric patterns of mutation between the two strands , . Two processes are widely accepted to bias mutations, i.e., replication and transcription. It is well supported that the deamination of A and C at exposed single-stranded regions results in an increase of C and A content on the complementary sequences , , .
Although some but not all previous analyses excluded genes encoded on the minority strand or genes of short length, and were limited to the positions under the weakest selection pressure, the results on vertebrate mitochondrial genomes show that different types of mutations respond differently to the DssH gradient, indicating that mutation patterns in mitochondrial DNA are more complicated than previously thought.
Under the widely accepted “strand-displacement model” of animal mitochondrial genome replication, the parent majority strand is first used as a template to synthesize the nascent minority strand, i.e., the leading strand, with the parent minority strand left single-stranded and consequently experiencing more A and C deaminations. Thus, the synthesized nascent majority strand, i.e., the lagging strand, using the parent minority strand as a template, has more A and C than the leading strand. This replication theory is congruent with our observations that more A and C are present on the entire majority strand in most mitochondrial genomes with normal replication origin, whereas there is less A and C on the entire majority strand in most genomes with inverted replication origin. On a protein-coding gene sequence, all positions should be affected by both selection and mutation, except for four-fold redundant third codon positions. In our study, the sign of GC skew values on individual protein-coding genes was almost identical among analyses of the four different partitions of gene sequences, indicating that deaminations occurring during replication, in addition to other mutations, contributed more than selection on amino acid sequences to the strand composition of G and C. This is congruent with the hypothesis, for mammalian mitochondrial genomes and nuclear DNA, that one of the crucial processes for the origin of strand asymmetry is the spontaneous deamination of C and A in the H-strand (referred to as the “minority strand” in insects) during replication, with deamination of C being twofold higher than deamination of A.
Under the transcription model, while RNA is being synthesized on the transcribed strand of DNA, the nontranscribed DNA strand remains transiently single stranded. It has been shown experimentally that transcription biases the mutational patterns between the transcribed and nontranscribed strands . In mammalian mitochondria, both strands are transcribed as a single polycistronic precursor RNA . Therefore, Hassanin et al. (2005) suggested that transcription can be considered to be an asymmetric process, because the L-strand (named as “majority strand” in insects) is transcribed two or three times more frequently than the H-strand (named as “minority strand” in insects), and the H-strand is expected to be more prone to deamination and transcription-coupled repair mutations due to its single-stranded state during transcription of the L-strand. Thus, regions on the same strand have the same tendency of A and C content variation produced by transcription. However, this model does not explain the variation of AT skew values depending on gene direction examined at all codon positions.
It is likely that there is a second initiation site for H-strand transcription in mammalian mitochondria, one that produces an RNA transcript spanning the rDNA region. Furthermore, an alternative model of transcription in Drosophila proposed that four major blocks of genes, coded on the same strand, have unique transcription initiation sites upstream of their coding region. This implies that transcription of one coding region is possible. We suggest that similar multiple polycistronic transcription models may exist in insect mitochondria, under which each block of genes coded on the same strand is transcribed in at least one transcript, resulting in only antisense strands being transcribed. Thus, antisense strands are paired with nascent mRNA, and sense strands are exposed in a single-stranded state, and more A and C deamination occurs during transcription. This leads to the increase of A and C content on the nascent antisense strand, but hardly affects A and C content on the nascent sense strand. At all codon positions there is more A than T in genes coded on the minority strand and more T than A in genes coded on the majority strand, with both normal and inverted replication origins, indicating that selection on amino acid sequence, in addition to mutation during transcription, is stronger than mutation during replication. At four-fold redundant third codon positions, where selection has little or no effect on nucleotide composition, AT skew values for most genes, coded on both the majority and minority strands in mitochondrial genomes with normal replication origin, are positive. In contrast, those with inverted replication origin are negative, indicating that the replication process has a stronger effect on A and T content than transcription.
In a previous study, Faith and Pollock also concluded that transcription plays no detectable role in producing the gradients of asymmetry at redundant sites by separately analysing the four-fold redundant sites in nad6, the only gene transcribed on the minority strand in vertebrate mitochondrial genomes.
In conclusion, we hope to have demonstrated the relative contributions of selection on amino acid sequences and mutations during replication and transcription to strand asymmetry in animal mitochondrial DNA. Further we show that replication plays a key role in generating strand asymmetry, and that the relative contribution of selection and mutation varies among nucleotides. This, in turn, indicates that multifactorial studies on mutation patterns are necessary to uncover mutation processes in insect mitochondrial genomes.
Our results not only shed significant light on the mechanisms underlying strand asymmetry, but may also contribute to genome-wide studies of replication and transcription mechanisms.
One hundred and twenty insect mitochondrial genomes were used for strand asymmetry analyses, belonging to 20 orders in three classes of insects (Table S1), including all those available in GenBank at the inception of this study and two recently sequenced ones. Sequences of whole mitochondrial genome strands and individual protein-coding genes were downloaded from the Mitome database.
For each mitochondrial genome, AT and GC skews were calculated for the majority strand, all codon positions, third codon positions, two-fold redundant third codon positions and four-fold redundant third codon positions of protein-coding genes. If a gene was coded on the majority strand, the sense strand was used, whereas if the gene was coded on the minority strand, the antisense strand was used for calculation.
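The partitioning and strand convention described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the function names are hypothetical, the input is assumed to be an in-frame coding sequence on the sense strand, and the set of four-fold codon families is assumed from the invertebrate mitochondrial genetic code (translation table 5). Because skew depends only on base counts, complementing the selected bases is enough to express a minority-strand gene as it reads on the majority strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumption: codon prefixes whose four third-position variants all encode the
# same amino acid under the invertebrate mitochondrial code (table 5).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG", "AG"}


def skew(seq, first, second):
    """Return (first - second) / (first + second), or None if undefined."""
    x, y = seq.count(first), seq.count(second)
    return None if x + y == 0 else (x - y) / (x + y)


def third_positions(cds, fourfold_only=False):
    """Collect third codon position bases from an in-frame sense-strand CDS."""
    cds = cds.upper()
    bases = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if fourfold_only and codon[:2] not in FOURFOLD_PREFIXES:
            continue  # skip codons that are not in a four-fold family
        bases.append(codon[2])
    return "".join(bases)


def majority_strand_view(bases, on_majority_strand):
    """Express the selected bases as they read on the majority strand."""
    return bases if on_majority_strand else bases.translate(COMPLEMENT)


# Made-up coding sequence (codons CTA GGT ATT CCG), treated as a minority-strand gene.
cds = "CTAGGTATTCCG"
p3 = third_positions(cds)
r4p3 = third_positions(cds, fourfold_only=True)
at_skew_p3 = skew(majority_strand_view(p3, on_majority_strand=False), "A", "T")
```

Complementing the bases simply flips the sign of both skews, so the same result could be obtained by negating the sense-strand skew for genes coded on the minority strand.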
Statistical analyses were conducted in DPS version 9.50. Contingency table and Chi-square tests for categorical data were used to estimate the association between gene direction and the sign of AT and GC skews, as well as between the replication orientation and the sign of AT and GC skews.
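For readers who want to reproduce this kind of test outside DPS, a 2×2 contingency test can be sketched in a few lines. The sketch below computes the Pearson chi-square statistic without continuity correction (whether DPS applied a correction is not stated here), and the counts are hypothetical placeholders rather than data from this study.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (1 degree of freedom) for a 2x2 table.

    table is ((a, b), (c, d)); rows could be gene direction (majority or
    minority strand) and columns the sign of the skew (positive or negative).
    Returns (chi2, p_value). No continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: genes by strand (rows) and skew sign (columns).
chi2, p_value = chi_square_2x2(((10, 20), (30, 40)))
```

A larger chi-square value indicates a stronger departure from independence, which is how the chi-square values for AT and GC skew are compared in the Results.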
Mitochondrial genomes used in strand asymmetry analyses and skew values calculated for the whole mitochondrial genomes and individual protein-coding genes. All-AT, AT skew values for all codon positions of individual genes; All-GC, GC skew values for all codon positions of individual genes; P3-AT, AT skew values for third codon positions of individual genes; P3-GC, GC skew values for third codon positions of individual genes; R2P3-AT, AT skew values for two-fold redundant third codon positions of individual genes; R2P3-GC, GC skew values for two-fold redundant third codon positions of individual genes; R4P3-AT, AT skew values for four-fold redundant third codon positions of individual genes; R4P3-GC, GC skew values for four-fold redundant third codon positions of individual genes.
(2.82 MB XLS)
Scatterplots of AT and GC skew values calculated for all codon positions, two-fold redundant third codon positions and four-fold redundant third codon positions of individual protein-coding genes in insect mitochondrial genomes. A. Scatterplots of AT skew values calculated for all codon positions of individual protein-coding genes in 120 insect mitochondrial genomes. B. Scatterplots of AT skew values calculated for two-fold redundant third codon positions of individual protein-coding genes in 120 insect mitochondrial genomes. C. Scatterplots of AT skew values calculated for four-fold redundant third codon positions of individual protein-coding genes in 120 insect mitochondrial genomes. D. Scatterplots of GC skew values calculated for all codon positions of individual protein-coding genes in 10 insect mitochondrial genomes with inverted replication origin. E. Scatterplots of GC skew values calculated for all codon positions of individual protein-coding genes in 110 insect mitochondrial genomes with normal replication origin. F. Scatterplots of GC skew values calculated for two-fold redundant third codon positions of individual protein-coding genes in 10 insect mitochondrial genomes with inverted replication origin. G. Scatterplots of GC skew values calculated for two-fold redundant third codon positions of individual protein-coding genes in 110 insect mitochondrial genomes with normal replication origin. H. Scatterplots of GC skew values calculated for four-fold redundant third codon positions of individual protein-coding genes in 10 insect mitochondrial genomes with inverted replication origin. I. Scatterplots of GC skew values calculated for four-fold redundant third codon positions of individual protein-coding genes in 110 insect mitochondrial genomes with normal replication origin. A gene name with a minus sign indicates that the gene is coded on the minority strand; a gene name without a minus sign indicates that it is coded on the majority strand.
(0.06 MB PDF)
We thank Professor Qi-yi Tang (Zhejiang University) for his help with statistical analysis during the study. We also thank two anonymous reviewers for their important comments on the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Funding: Funding for this study was provided jointly by the National Science Fund for Distinguished Young Scholars (30625006), the 973 Program (2006CB102005), the National Science Foundation of China (30970384, 30700063 and 30499341), the National Special Basic Research Funds (2006FY110500-3, 2006FY120100), Special Fund for Agro-Scientific Research of the Public Interest (200803005-04) and Zhejiang Key Program of Agriculture (2009C12048) to XXC, and the NSF (EF-0337220) to MJS. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.