General features of the mitochondrial- and plastid-DNA sequences from V. carteri
Using a long-range PCR approach in conjunction with cloning, we sequenced from V. carteri
29,961 nt of the mitochondrial genome and 420,650 nt of the plastid genome; partial genetic maps of these genomes, showing the coding and noncoding regions that were sequenced, are presented in Figures 1 and 2, respectively. Regions of the V. carteri
organelle DNA that were previously characterized (~8 kb of mtDNA and ~5 kb of ptDNA) are highlighted in pink on these maps. Although we attempted to sequence the mitochondrial and plastid genomes completely, presumed secondary structures in the mtDNA and ptDNA templates likely caused many of the sequencing reactions to stop abruptly, even when using protocols designed to alleviate this problem [22]. Furthermore, the repetitive nature of the organelle genomes means that much of the mtDNA and ptDNA sequence data cannot be resolved with currently available genome-assembly software: many of the organelle intergenic and intronic DNA sequences collapse into networks of spurious repetitive motifs upon assembly. Because most of the mtDNA and ptDNA intergenic regions are much longer than a typical sequencing read (some intergenic regions exceed 15 kb), individual reads cannot span these collapsed repeats. At present, the most sophisticated assembly programs use the paired-end sequencing data from whole-genome shotgun reads to resolve complex repeat regions. Because there is a V. carteri
nuclear genome sequencing project [18], we have access to paired-end sequencing reads for the mitochondrial and plastid genomes (see Methods for details); but even with these data, neither the assembly programs nor our own manual, by-eye assembly methods can untangle these repeats. Because of these difficulties, our V. carteri mitochondrial-genome assembly, although contained in a single contig, has six regions where the mtDNA sequence is either unreadable or unavailable (Figure 1), and the assembly of the ptDNA is divided into 34 contigs (Figure 2). Nevertheless, we sequenced and characterized enough mtDNA and ptDNA to describe with confidence the abundance and types of noncoding DNA in each of these organelle genomes.
Figure 1. Partial genetic map of the Volvox carteri mitochondrial genome compared to the complete mtDNA genetic map of Chlamydomonas reinhardtii. Protein-coding regions are yellow and their exons are labelled with an "E" followed by a number denoting their position ...
Figure 2. Partial genetic map of the Volvox carteri plastid genome compared to the complete ptDNA genetic map of Chlamydomonas reinhardtii. Regions encoding proteins are yellow and their exons are labelled with an "E" followed by a number signifying their order ...
The organelle-DNA sequences presented in this study were validated by collecting and assembling mtDNA and ptDNA sequence data that were generated by the DOE JGI V. carteri
nuclear genome sequencing project [18]. This was done by: 1) downloading DNA-sequence trace files corresponding to the V. carteri
mitochondrial and plastid genomes; 2) assembling these trace files into contigs; and 3) mapping the trace-file contigs to the V. carteri
mtDNA and ptDNA sequences produced in this study. Ultimately, the mtDNA and ptDNA sequences coming from the DOE JGI covered all of our self-generated V. carteri
sequence data with >50-fold redundancy. It is important to note that the DOE JGI data that we used to confirm our mtDNA and ptDNA sequences came from V. carteri
strain HK10 (UTEX 1885), whereas the V. carteri
mtDNA and ptDNA sequences that we generated came from strain 72-52 (UTEX 2908), which is a dissociator mutant derived from HK10 [23]. In all instances, the organelle DNA sequence data coming from strain HK10 were identical to those of strain 72-52 (i.e., no ambiguities between the DOE JGI trace-file contigs and our sequences were observed), with the exception of a group-I intron that is present in the mtDNA of HK10 but absent in that of 72-52 (see below for details).
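As an illustration of the redundancy check described above, the following minimal Python sketch computes per-base depth from a set of alignment intervals and reports the mean and minimum fold coverage. The function names and the toy intervals are hypothetical and are not taken from the pipeline used in this study; real trace-file contigs would first have to be aligned to the reference with a mapping program.

```python
from typing import List, Tuple


def per_base_depth(ref_len: int, alignments: List[Tuple[int, int]]) -> List[int]:
    """Return the read depth at every reference position.

    Alignments are 0-based, half-open (start, end) intervals on the reference.
    A difference array keeps the cost linear in ref_len + number of alignments.
    """
    diff = [0] * (ref_len + 1)
    for start, end in alignments:
        start = max(0, start)
        end = min(ref_len, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth = []
    running = 0
    for i in range(ref_len):
        running += diff[i]
        depth.append(running)
    return depth


def mean_and_min_depth(depth: List[int]) -> Tuple[float, int]:
    """Summarize a depth profile as (mean fold coverage, minimum depth)."""
    return sum(depth) / len(depth), min(depth)


# Toy example (hypothetical data): four reads on a 10-nt reference.
toy_alignments = [(0, 6), (2, 10), (4, 10), (0, 10)]
toy_depth = per_base_depth(10, toy_alignments)
toy_mean, toy_min = mean_and_min_depth(toy_depth)
```

With real data, a statement such as ">50-fold redundancy" would correspond to checking that the minimum (or mean, depending on the definition adopted) depth over the self-generated sequence exceeds 50.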
Of the 29,961 nt of V. carteri
mtDNA sequence data presented here, 18,355 nt (61%) are noncoding, comprising 7,870 nt (26%) of intronic DNA and 10,485 nt (35%) of intergenic DNA; the remaining 11,606 nt (39%) consist of 8,166 nt (27%) coding for proteins and 3,440 nt (12%) coding for structural RNAs. The intergenic regions range from 0 to >1,400 nt in length and, on average, are 455 nt long. The AT content of the 29,961 nt mtDNA sequence is 66%. Our annotation of the V. carteri mtDNA includes 7 protein-coding genes; the full suite of rRNA-coding modules required for the formation of the large-subunit and small-subunit rRNAs; 3 tRNA-coding genes; and 3 introns: 2 of group-I affiliation, located in cox1, and 1 of group-II affiliation, located in cob (Figure 1). Both group-I introns contain an open reading frame (ORF) encoding a putative LAGLIDADG endonuclease. The sole group-II intron has an ORF for which the deduced amino-acid sequence shows similarity to a reverse transcriptase (Figure 1). The DOE JGI V. carteri
mtDNA sequences that we assembled (derived from V. carteri
strain HK10) have, as mentioned above, an additional group-I intron in cox1
that is not present in V. carteri
strain 72-52 (Figure 1). The coding suite that we acquired for the V. carteri
mtDNA is identical to that of the C. reinhardtii
mitochondrial genome [16], as is the gene order save for two rearrangements, which are outlined in Figure 1. There are two notable features of the V. carteri
mtDNA relative to its C. reinhardtii
counterpart. First, the V. carteri
L8 rRNA-coding module harbours a 725 nt insertion composed of short palindromic repeats, whereas that of C. reinhardtii
contains no repeats (Figure 3). When the V. carteri L8 module is folded into a putative secondary-structure model within the context of the LSU rRNA, it contains two structural constituents, L8a and L8b (corresponding to the LSU rRNA domains V and VI, respectively), where the 3' end of L8a and the 5' end of L8b border the 725 nt insertion (Figure 3). At present, we do not know whether this insertion is removed from the primary transcript, so that separate L8a and L8b mature transcripts are produced, or whether a single mature L8 transcript is generated with the insertion. The second point of interest is that although a putative reverse-transcriptase gene is found in the mtDNA of both V. carteri (ORF750) and C. reinhardtii (rtl), that of V. carteri appears to be part of a group-II intron located in cob, whereas that of C. reinhardtii is a free-standing gene that lacks an intron but is speculated to have originated from one [27]; see Popescu and Lee [29] for further discussion. The deduced amino-acid sequences of both ORF750 and rtl have a conserved domain that resembles that of a reverse transcriptase with group-II intron affiliation; however, the amino-acid sequence of ORF750 also has a conserved domain with similarity to a type-II intron maturase, whereas that of rtl does not. This extra domain encoded in ORF750 also explains why this ORF is twice the size of rtl (2,250 nt versus 1,119 nt).
Figure 3. Schema of the L8 rRNA-coding module in the Volvox carteri mitochondrial genome and its relationship to that in the Chlamydomonas reinhardtii mtDNA. The grey bars in A denote regions of sequence identity between the L8 rRNA-coding modules of the V. carteri ...
Of the 420,650 nt of V. carteri ptDNA sequence data that were generated, 338,557 nt (80%) are noncoding, of which 16,005 nt are intronic DNA and 322,552 nt are intergenic DNA; 77,335 nt (19%) code for proteins and 4,758 nt (1%) code for structural RNAs. The intergenic regions that were sequenced range from 87 nt to >12,444 nt in length and have an average size of 5,103 nt. The 420 kb of ptDNA are 57% AT. Our annotation of the ptDNA sequences includes 91 genes: 60 coding for standard plastid proteins, 27 coding for structural RNAs (23 tRNAs and 4 rRNAs), and 4 corresponding to ORFs (ORF494, ORF2032, ycf12, ORF2828) that have been previously found in plastid genomes (Figure 2). Three group-I introns were observed, located in chlL, psaA, and atpA; those in the latter two genes contain an ORF encoding a putative LAGLIDADG endonuclease. Five group-II introns were discerned, situated in psaA, cemA, psaB, atpA, and atpB; the introns of the latter two genes have an ORF for which the inferred amino-acid sequence resembles that of a reverse transcriptase. The group-II intron of psaA is fragmented into two separate modules, which is also the case for C. reinhardtii (Figure 2). The 91 V. carteri ptDNA genes presented here are all found in the C. reinhardtii plastid genome with the exception of ORF494. The only apparent homolog of ORF494 is the ribosomal operon-associated gene (roaA) found in the Euglena gracilis plastid genome. Note that the C. reinhardtii ptDNA encodes a further four tRNA-coding regions and one rRNA-coding region that we were unable to amplify from V. carteri.
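The nucleotide tallies reported for the two organelle genomes can be cross-checked with a few lines of arithmetic. The Python sketch below is only an illustration; the dictionary layout and function name are our own assumptions, while the counts are those given in the text.

```python
# Nucleotide counts as reported in the text (nt).
MT_COUNTS = {"total": 29_961, "intronic": 7_870, "intergenic": 10_485,
             "protein": 8_166, "rna": 3_440}
PT_COUNTS = {"total": 420_650, "intronic": 16_005, "intergenic": 322_552,
             "protein": 77_335, "rna": 4_758}


def summarize(counts: dict) -> dict:
    """Return noncoding/coding totals and the percentage of each category."""
    total = counts["total"]
    noncoding = counts["intronic"] + counts["intergenic"]
    coding = counts["protein"] + counts["rna"]
    if noncoding + coding != total:
        raise ValueError("category counts do not sum to the total")
    summary = {"noncoding_nt": noncoding, "coding_nt": coding,
               "noncoding_pct": 100.0 * noncoding / total}
    for name in ("intronic", "intergenic", "protein", "rna"):
        summary[name + "_pct"] = 100.0 * counts[name] / total
    return summary


mt_summary = summarize(MT_COUNTS)
pt_summary = summarize(PT_COUNTS)

for label, s in (("mtDNA", mt_summary), ("ptDNA", pt_summary)):
    # Percentages are printed to one decimal place; the text rounds to integers,
    # so small differences (e.g. the ptDNA protein-coding share) can arise.
    print(label, s["noncoding_nt"], "nt noncoding,",
          f'{s["noncoding_pct"]:.1f}% of the sequenced DNA')
```

In both genomes the four categories sum exactly to the stated totals, which is a quick way of confirming that no category was double-counted.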
A graph comparing both the estimated sizes and the fraction of noncoding nucleotides in the mitochondrial and plastid genomes of V. carteri
relative to those of the currently available complete organelle-genome sequences from chlorophytes, streptophytes, and other plastid-harbouring taxa is shown in Figure 4 [and see Additional file 1]. Values of 30 kb and 420 kb, respectively, were chosen, based on our sequence data, as minimum-estimate genome sizes for the V. carteri
mtDNA and ptDNA.
Figure 4. Fraction of noncoding DNA plotted against genome size for the available organelle genomes from streptophytes, chlorophytes, and other plastid-harbouring taxa. The data points corresponding to the mtDNA and the ptDNA of V. carteri and those of its close ...
Short palindromic repeats in the mitochondrial and plastid genomes of V. carteri
Scanning the V. carteri mitochondrial- and plastid-DNA sequences for repetitive elements led to the identification of a series of short palindromic repeats in both organelle genomes; the consensus sequences, complementary bases, and copy numbers of the mtDNA and ptDNA palindromic elements are outlined in Figures 5 and 6, respectively. Although the short palindromic repeats of the mtDNA share many of the same structural traits as those of the ptDNA (discussed below), they differ by >50% in sequence identity and, therefore, must be considered distinct from those of the plastid genome.
Figure 5. Abundance and classification of the Volvox carteri mitochondrial-DNA palindromic repeat elements. Regions of high sequence identity among the different repeat families are shaded in blue; variable sites are orange; the loop portions of the putative hairpin ...
Figure 6. Abundance and classification of the Volvox carteri plastid-DNA palindromic repeat elements. Regions of high sequence identity among the different repeat families are shaded in blue; the loop portions of the putative hairpin structures are shaded in either ...
The short palindromic repeats in the V. carteri
mtDNA are restricted to intergenic and intronic regions, with the exception of the palindromic elements in the L8 rRNA-coding module. All of the intergenic regions that measure >50 nt in length consist predominantly of palindromic repeats; the few intergenic regions with lengths <50 nt are composed of non-repetitive DNA. Within the intronic regions, the palindromic repeats are confined to the non-ORF portions of the group-I and group-II introns. All four of the identified mitochondrial introns contain short palindromic repeats in their non-ORF regions, including the optional group-I intron of cox1
, which was found in the mtDNA of V. carteri
strain HK10. Approximately 14,600 nt (~80%) of the 18,355 nt of noncoding mtDNA that were sequenced are composed of short palindromic repeats. A dotplot similarity matrix of the V. carteri
mtDNA plotted against itself, shown in Supplementary Figure S1 [see Additional file 2], emphasizes the magnitude of repetitive DNA in this genome and draws attention to the high degree of sequence identity between the different palindromic elements within and among the various intergenic and intronic regions.
The short palindromic repeat elements identified in the V. carteri
mtDNA show >50% sequence identity with one another and share similar structural and compositional traits (Figure 5). The individual palindromes range from 11 to 77 nt in length (the average size is 50 nt) and from 71 to 84% in AT content. When the palindromes are folded into hairpin structures, the stem component of the hairpin varies from 4 to 37 nt in length, and the loop portion is usually 3–5 nt long and frequently has the sequence 5'-TAAA-3' or 5'-TTTA-3' (Figure 5). In many instances, a short palindromic repeat is found inserted within another palindromic repeat, resulting in larger, more elaborate repetitive elements; these larger repeats have a maximum length of 633 nt and, in a few cases, are found at multiple locations in the mitochondrial genome. For example, a 550 nt repeat sequence composed of complete and incomplete short palindromic units is found in the group-I introns of cox1
, in the intergenic regions between cob
, and in the group-II intron of cob
. Some of these more complex repeats can also be folded into tRNA-like structures, as shown in Supplementary Figure S2 [see Additional file 3].
Like the mitochondrial genome, the noncoding regions of the V. carteri
plastid genome abound with short palindromic sequences. These palindromic elements are observed in all of the sequenced intergenic regions that have lengths >100 nt and in the non-ORF portions of the psaA and atpA group-I introns. No palindromic elements are located in the chlL
group-I intron or in any of the identified group-II introns. Overall, the short palindromic repeats constitute ~80% (~270 kb) of the 338,557 noncoding nucleotides in the V. carteri
ptDNA. A dotplot similarity matrix of the V. carteri
plastid genome plotted against itself (Supplementary Figure S3 [see Additional file 4]) shows the high level of sequence identity among the various palindromic repeats.
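A self-dotplot of the kind referred to above can be approximated by recording every pair of positions that share an exact word on either strand. The Python sketch below is a simplified illustration and is not the program used to generate the supplementary figures; the word length, function names, and toy sequences are assumptions made for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement; any non-ACGT symbol becomes N."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq))


def self_dotplot(seq: str, word: int = 4) -> set:
    """Return the set of (i, j, strand) dots for a sequence plotted against itself.

    A "+" dot means the words starting at i and j are identical; a "-" dot means
    the word at i equals the reverse complement of the word at j, which is the
    signal expected from palindromic (inverted-repeat) elements.
    """
    seq = seq.upper()
    index = {}
    for i in range(len(seq) - word + 1):
        index.setdefault(seq[i:i + word], []).append(i)
    dots = set()
    for j in range(len(seq) - word + 1):
        w = seq[j:j + word]
        for i in index.get(w, []):
            dots.add((i, j, "+"))
        for i in index.get(revcomp(w), []):
            dots.add((i, j, "-"))
    return dots


# Toy sequence (hypothetical): the word GATTACA occurs twice, so dots appear
# off the main diagonal as well as on it.
toy = "GATTACACCCGATTACA"
dots = self_dotplot(toy)
off_diagonal = sorted(d for d in dots if d[0] != d[1])
```

In a genome dominated by related palindromic repeats, the off-diagonal dots become dense across all intergenic regions, which is the visual pattern the supplementary dotplots are described as showing.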
In the V. carteri ptDNA, most of the short palindromic repeats contain the sequence motif 5'-TCCCCTTTAGGGA-3' (Figure 6). The palindromes range in size from 14 to 79 nt, with an average length of 50 nt, and, when folded into hairpin structures, their stems and loops vary in length from 5 to 29 nt and from 3 to 5 nt, respectively. In most cases, the loops of the hairpin structures contain the sequence 5'-TAAA-3' or 5'-TTTA-3' (Figure 6). The AT content of the ptDNA palindromes varies from 39 to 55%. As observed for the mtDNA, the ptDNA palindromic elements are often found inserted into one another, producing a diverse series of compound repetitive sequences.
The short palindromic repeats of the mitochondrial and plastid compartments have similar structural attributes: 1) they have proliferated in intergenic regions and non-ORF segments of introns; 2) they have an average size of 50 nt and a maximum length of ~78 nt; and 3) the loops of their hairpin structures are generally 3–5 nt long with the sequence 5'-TAAA-3' or 5'-TTTA-3'.
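The structural attributes listed above (a base-paired stem flanking a 3–5 nt loop) suggest a simple way to screen a sequence for candidate hairpins. The following Python sketch is a hedged illustration only: it accepts perfect Watson-Crick stems exclusively, the thresholds are assumptions loosely based on the ranges quoted in the text, and it is not the repeat-finding procedure used in this study.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def at_content(seq: str) -> float:
    """Percentage of A and T bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "AT" for base in seq) / len(seq)


def find_hairpins(seq: str, min_stem: int = 4, loop_sizes=(3, 4, 5)) -> list:
    """Return (start, end, stem_length, loop_sequence) for perfect stem-loops.

    For every candidate loop, the stem is extended outward for as long as the
    base on the left is complementary to the base on the right. Coordinates
    are 0-based and half-open.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for loop_len in loop_sizes:
        for loop_start in range(min_stem, n - loop_len - min_stem + 1):
            left = loop_start - 1
            right = loop_start + loop_len
            stem = 0
            while left >= 0 and right < n and COMPLEMENT.get(seq[left]) == seq[right]:
                stem += 1
                left -= 1
                right += 1
            if stem >= min_stem:
                loop_seq = seq[loop_start:loop_start + loop_len]
                hits.append((left + 1, right, stem, loop_seq))
    return hits


# Toy palindrome (hypothetical): a 5-nt stem around a TAAA loop.
toy_palindrome = "GGGACTAAAGTCCC"
toy_hits = find_hairpins(toy_palindrome)
toy_at = at_content(toy_palindrome)
```

Real repeats are frequently imperfect or nested within one another, so a production screen would need to tolerate mismatches and overlapping hits; this sketch deliberately does neither.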
The nuclear genome of V. carteri shares sequence identity with organelle DNA
To investigate if the short palindromic repeats in the mtDNA and ptDNA of V. carteri
are present in the nucDNA, we analyzed the draft nuclear genome sequence of V. carteri
at the DOE JGI [18]. Manual curation of the V. carteri
nuclear genome is still underway; therefore, only the first 75 scaffolds of the nuclear-genome assembly were analyzed. Approximately 78% of the V. carteri
nucDNA is contained in these 75 scaffolds, which have a cumulative length of 109.2 Mb and are each at least 0.5 Mb long. The amount of nucDNA in these 75 scaffolds that maps to mtDNA and ptDNA is described in Table 1; the approximate number of nucDNA-located organelle-like repeats is outlined in Figures 5 and 6.
Table 1. Amount of nuclear DNA in Volvox carteri that maps to the mitochondrial and plastid genomes.
Thirty-three kilobases of nucDNA (~0.03% of the nuclear genome) share >90% identity with mtDNA; 14.7 kb (44%) of this shared sequence are homologous to the short palindromic repeats in the intergenic and non-ORF intronic regions of the mitochondrial genome, and the remaining 18.6 kb (56%) map to the coding and intronic-ORF portions of the mtDNA (Table 1). In the nucDNA, 802 distinct regions show homology to mtDNA; the average mapping length of these regions is 39 nt. Of the 75 nuclear scaffolds that were analyzed, all but two (scaffolds 66 and 75) have at least one region that shows homology to mtDNA.
Seventy-three kilobases of the V. carteri nucDNA (~0.07% of the nuclear genome) share >90% identity with ptDNA; 50.6 kb (69%) of this shared sequence are homologous to the short palindromic repeats in the intergenic and non-ORF intronic portions of the ptDNA, and 22.5 kb (31%) are homologous to the coding regions and intronic ORFs of the plastid genome (Table 1). In the nuclear genome, 1,450 different regions show homology to ptDNA, and the average similarity length is 29 nt. All of the 75 nuclear scaffolds that were examined have at least one region that shows homology to ptDNA.
In total, 65.4 kb (~0.06%) of the V. carteri nuclear-genome-sequence data that were analyzed share sequence identity with the short palindromic repeats of the organelle genomes. The secondary structure and general characteristics of the nuclear palindromic repeat elements are the same as those described for the organelle palindromes in Figures 5 and 6.
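The proportions quoted above follow from dividing the organelle-like sequence by the 109.2 Mb of nuclear scaffolds that were analyzed. The short Python sketch below reproduces that arithmetic from the values given in the text; the variable and function names are assumptions introduced for illustration.

```python
ANALYZED_NUC_KB = 109_200.0  # 109.2 Mb of nuclear scaffolds, expressed in kb


def pct_of_nuclear(kb: float) -> float:
    """Percentage of the analyzed nuclear sequence represented by `kb`."""
    return 100.0 * kb / ANALYZED_NUC_KB


mt_like_kb = 33.0    # nucDNA sharing >90% identity with mtDNA
pt_like_kb = 73.0    # nucDNA sharing >90% identity with ptDNA
mt_repeat_kb = 14.7  # portion matching mtDNA palindromic repeats
pt_repeat_kb = 50.6  # portion matching ptDNA palindromic repeats

# The sum is 65.3 kb; the text reports 65.4 kb, presumably because the
# published subtotals were rounded before being listed.
repeat_total_kb = mt_repeat_kb + pt_repeat_kb

mt_like_pct = pct_of_nuclear(mt_like_kb)
pt_like_pct = pct_of_nuclear(pt_like_kb)
repeat_pct = pct_of_nuclear(repeat_total_kb)
pt_repeat_share = 100.0 * pt_repeat_kb / pt_like_kb

print(f"mtDNA-like: {mt_like_pct:.3f}% of analyzed nucDNA")
print(f"ptDNA-like: {pt_like_pct:.3f}% of analyzed nucDNA")
print(f"organelle-like palindromic repeats: {repeat_pct:.3f}%")
```

Because the inputs are rounded to the nearest 0.1 kb or 1 kb, the recomputed percentages agree with the reported ones only to the precision given in the text.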
In order to place the V. carteri
data described above in a broader context, we analyzed the C. reinhardtii
nuclear genome for regions that show sequence identity to its mtDNA and ptDNA; these results are summarized in Supplementary Table S2 [see Additional file 5]. Only 0.007% of the C. reinhardtii
nuclear genome maps to organelle DNA (0.0035% to the mtDNA and 0.0035% to the ptDNA), which is 10-times less than what is observed for V. carteri