The origin of new genes has long been considered a fundamental question in evolutionary biology. In eukaryotes, a major pathway for the ‘birth’ of new nuclear genes has been the transfer of genes from the cytoplasmic organelles (mitochondria and plastids) to the nucleus. While the vast majority of gene transfer occurred shortly after endosymbiosis, the process continues today and is still driving the evolution of nuclear genomes. In tobacco (Nicotiana tabacum) a number of studies have indicated that DNA can transfer from the chloroplast to the nucleus at relatively high frequency. Less has been known, however, about how a newly transferred organelle gene can become activated in its new genetic environment. In a recent report we observed, in real time, the activation of a plastid reporter gene newly transferred to the nucleus. A key observation from this study was that non-homologous repair is an important generator of novel sequence combinations which, in rare instances, can result in the nuclear activation of plastid genes. In addition, the activation of relocated genes can be aided by the fortuitous presence of plastid sequences able to promote nuclear expression.
The cells of plant, algal and some protist lineages contain three genetic compartments. These are the nucleus, which houses the majority of the genes, and the two cytoplasmic organelles: mitochondria and plastids. The genomes of each of these compartments are derived from erstwhile free-living prokaryotes. In contrast to the nuclear genome, plastid (chloroplast) and mitochondrial genomes are vastly reduced in size when compared with those of their ancestors and extant relatives.1 This reduction in genome size is due both to the loss of redundant organelle genes and to mass relocation of organelle genes to the nucleus.1,2 The majority of this gene transfer probably occurred early in eukaryote evolution,1 but molecular and bioinformatic analyses (for some examples see refs. 3–10) have shown that transfer of functional genes and DNA fragments from the plastid to the nucleus continues today.
A look at almost any sequenced eukaryote nuclear genome reveals large tracts of DNA essentially identical to the extant plastid and/or mitochondrial genomes.4 These sequences are referred to as nupts (nuclear integrants of plastid DNA) and numts (nuclear integrants of mitochondrial DNA) respectively, or collectively as norgs (nuclear integrants of cytoplasmic organelle DNA). The norgs found in published genome sequences may only be the ‘tip of the iceberg’, as contig assembly tends to minimize clone length, meaning large duplications of organelle DNA can go undetected.11 This is particularly relevant given that many de novo insertions of plastid DNA contain multiple copies of large regions of the plastome at a single locus.5,9 The common practice of discarding ‘contaminating’ organelle DNA when assembling nuclear genome sequences must also contribute to the under-representation of norgs in published whole genome sequences.
In tobacco (Nicotiana tabacum) it has been possible to observe experimentally, in real time, the transfer of plastid DNA to the nucleus.5,9,10 These observations were made in plants containing, in their plastome, an aminoglycoside resistance gene (aadA) used to select the transplastomic lines, together with a closely linked kanamycin resistance gene (neo) designed for exclusive nuclear expression. As neo was not active in the chloroplast but would be active if relocated to the nucleus, transfer of this gene could be detected simply by screening seedling progeny, or cells in tissue culture, for kanamycin resistance (Fig. 1 and screen 1). As neo was already equipped for nuclear expression, these experiments did not measure the frequency of endosymbiotic gene transfer per se but only the frequency at which plastid DNA transfers to the nucleus.
The transfer frequency was found to be remarkably high, particularly in the male germline, with one in 11,000 to 16,000 pollen grains containing a new nuclear copy of the gene.5,9 This frequency was at least 15-fold higher than that observed in the female germline9 and ~300-fold higher than that observed in somatic tissue.10 The exceptionally high rate is thought to relate to the programmed degeneration and exclusion of plastids in developing male gametophytes,9 both a result and probable cause of uniparental inheritance. During this process plastid DNA is presumably released into the cytoplasm, from where it can transfect nuclear chromosomes. It follows that other factors that compromise organelle integrity, such as environmental stress,12 may also lead to an increase in the frequency with which plastid DNA enters the nucleus.
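As a rough back-of-envelope reading of these figures, the fold differences can be converted into implied ‘one in N’ frequencies for the female germline and for somatic tissue. The sketch below is illustrative only. It assumes that the fold differences apply across the whole quoted pollen range, which the cited studies may not have intended, and all names in it are hypothetical.

```python
# Back-of-envelope arithmetic on the frequencies quoted in the text.
# Assumption (not from the source): the fold differences hold across
# the whole quoted pollen range.

# One new nuclear copy per this many pollen grains (range from the text).
POLLEN_ONE_IN = (11_000, 16_000)

FEMALE_FOLD = 15    # pollen frequency is at least 15-fold the female germline frequency
SOMATIC_FOLD = 300  # pollen frequency is roughly 300-fold the somatic frequency


def implied_one_in(pollen_one_in, fold):
    """Return N such that a tissue with a transfer frequency 'fold' times
    lower than pollen shows about one event per N cells."""
    return pollen_one_in * fold


female_one_in = tuple(implied_one_in(n, FEMALE_FOLD) for n in POLLEN_ONE_IN)
somatic_one_in = tuple(implied_one_in(n, SOMATIC_FOLD) for n in POLLEN_ONE_IN)

print("female germline: at most one in {:,} to {:,}".format(*female_one_in))
print("somatic tissue: about one in {:,} to {:,}".format(*somatic_one_in))
```

Under these assumptions the female germline frequency is no higher than about one in 165,000 to 240,000, and the somatic frequency is roughly one in 3.3 to 4.8 million cells.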
While the rate at which plastid DNA relocates is an important question, perhaps more interesting, from both evolutionary and biotechnological perspectives, is the rate at which a cytoplasmic organelle gene may become functionally active in the nucleus. This is not a trivial process. It requires not only transfer of the protein-coding DNA to the nucleus but also acquisition of a nuclear promoter and polyadenylation signal. If the gene product is to be targeted back to the organelle, the gene must also acquire a sequence specifying a transit peptide or another mechanism of protein targeting.
We addressed the question of functional gene transfer using the kanamycin-resistant lines generated in the genetic screens outlined above. In these lines the chloroplast fragments transferred to the nucleus were large and so, in almost all cases, aadA was co-transferred with neo to the nucleus. Although these plants contained a copy of aadA in the nucleus, they were sensitive to aminoglycosides, as the gene had a plastid promoter and terminator and was therefore inactive (these lines no longer contained aadA in the plastome as they were the progeny of backcrosses to wild-type females; Fig. 1). By screening cells in tissue culture for aminoglycoside resistance we were able to regenerate plants in which aadA had undergone nuclear activation (Fig. 1 and screen 2), paralleling the pathway of genes functionally transferred during endosymbiotic evolution. This approach had been used once previously in a study by Stegemann and Bock.13 We extended their work by screening a much larger number of independent lines and by using a significantly different arrangement of the experimental genes, which enabled us to uncover more diverse and evolutionarily relevant activation events. Further, we reported the complete sequence of a de novo nupt and its flanking sequences, providing insight into how chloroplast DNA fragments are incorporated into nuclear chromosomes.
Insertions of plastid DNA are notoriously difficult to characterize.14 This is due to their large size and complex nature and also to the presence of many copies of identical or highly similar sequences in the chloroplast and throughout the nuclear genome (due to historic transfer events). Despite these difficulties, we reported the first full sequence of a de novo nupt and showed that it comprised three fragments of plastid DNA from disparate regions of the plastome.15 Analysis of the plastid/plastid and plastid/nuclear sequence junctions indicated that the insertion was mediated by synthesis-dependent non-homologous end joining (NHEJ), probably at a site of double-strand break (DSB) repair. Despite its tripartite structure, this particular locus is likely to represent a relatively simple chloroplast DNA insertion. Indeed, it was experimentally tractable because of its comparatively short length (~17 kb) and the absence of any internal duplications. Several other lines had far more complex insertions containing approximately 10–15 copies of a >20 kb region of the chloroplast genome, which together must exceed several hundred kilobases in length. Recent insertions of similar sizes have been observed in Arabidopsis11 and maize16,17 indicating that insertions of this size are not unique to tobacco. Clearly, the insertion of plastid DNA can generate remarkably large loci, containing complex arrangements of sequence from dispersed parts of the plastid genome. The probable generator of these loci is the cell's endogenous non-homologous DNA repair machinery. By ‘stitching’ together and integrating a selection of what must be a plethora of available DNA fragments, this nuclear repair pathway generates complex libraries of novel sequence combinations from which new nuclear genes can arise. Our results suggest cytoplasmic organelle DNA is a foremost contributor to the spectrum of fragments inserted during repair.
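The size estimate for the complex insertions follows from simple multiplication. As an illustrative sketch (the names are hypothetical, and the region length is only a floor because the text gives it as >20 kb), the minimum span implied by the quoted copy numbers can be worked out as follows:

```python
# Lower-bound size of a complex nupt locus, using figures from the text.
COPY_RANGE = (10, 15)    # approximate copies of the repeated plastome region
MIN_REGION_KB = 20       # region is stated as >20 kb, so this is a floor
SIMPLE_INSERT_KB = 17    # the fully sequenced, comparatively short insertion


def min_locus_kb(copies, region_kb=MIN_REGION_KB):
    """Minimum locus length in kb contributed by the repeated region alone."""
    return copies * region_kb


min_sizes_kb = tuple(min_locus_kb(c) for c in COPY_RANGE)
fold_vs_simple = tuple(size / SIMPLE_INSERT_KB for size in min_sizes_kb)

print("repeated region alone: more than {} to {} kb".format(*min_sizes_kb))
print("at least {:.0f}x to {:.0f}x the ~17 kb insertion".format(*fold_vs_simple))
```

These values are floors rather than estimates: the repeated sequence alone exceeds 200 to 300 kb, more than ten times the length of the ~17 kb insertion that could be sequenced in full.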
Activation of the dormant aadA gene in its new nuclear environment was revealed by screens for aminoglycoside resistance in somatic cells.13,15 The activation events were the result of local sequence rearrangements such as deletions, inversions, duplications and insertions that recruited nearby nuclear promoters and enhancers. Analyses of the resulting novel sequence junctions suggest that, similar to the insertion of plastid sequences, rearrangements causing “eukaryotization” of aadA are primarily the result of non-homologous end joining at DNA double-strand breaks (DSBs). Thus it is possible that integration may favor areas of the genome more susceptible to DSBs,18 a propensity which may also promote subsequent rearrangements at these loci. This is supported by the finding that large insertions of chloroplast DNA preferentially locate to pericentromeric regions6 which are known DSB hotspots.19 These regions also contain a high density of transposable elements20 which may further increase the frequency of sequence rearrangements. Accordingly, work in rice indicates that the nuclear genome continually integrates, shuffles and eliminates chloroplast DNA sequences6 and in tobacco it has been shown that about 50% of nupts are deleted at high frequency within a generation of insertion.14
Notably, the majority of sequence rearrangements appear to be local in nature. This is indicated by the fact that in both studies13,15 all activations of aadA resulting from sequence rearrangements were due to the recruitment of the nearby 35S promoter, the promoter used to drive expression of the closely linked neo gene. In no case was activation due to the recruitment of native nuclear regulatory sequences. The activation frequencies determined in both screens are therefore dependent on their unique experimental arrangements and at most provide a rough upper limit of the activation frequency expected in the absence of a closely linked strong nuclear promoter. Given the large size of our screen, it is now obvious that recruitment of a native nuclear promoter is very rare, probably too rare to investigate through experimental simulation.
A surprising finding of this work was that chloroplast DNA can contain fortuitous sequence elements able to promote nuclear expression. All cells from two of the lines screened were found to be uniformly resistant to aminoglycosides, indicating that in these plants aadA was expressed in the absence of any secondary sequence changes. It appears that, in these two lines, aadA was transferred to the nucleus in sufficiently high copy number that weak nuclear activity of the psbA promoter21 upstream of aadA resulted in significant nuclear expression. Interestingly, we also found that polyadenylation can occur, albeit inefficiently, at two sites within the psbA 3′ UTR of aadA transcripts. Both sites fortuitously match the loose AU-rich plant polyadenylation consensus sequence.22 Polyadenylation of nuclear transcripts from a consensus sequence within the psbA terminator was also observed in the study by Stegemann and Bock.13 They suggest that the AT-rich nature of chloroplast non-coding regions may provide many fortuitous polyadenylation sites, greatly aiding the process of gene activation after DNA transfer. This may also be true of other AT-rich regulatory elements such as TATA and CAAT boxes, both of which are found within the psbA promoter.21 Whether this weak nuclear expression is a unique characteristic of the psbA promoter, or is true more widely of chloroplast promoters, remains to be seen.
The use of chloroplast biotechnology will undoubtedly play an important role in securing food supply for an increasing world population.23 An often-advocated advantage of chloroplast biotechnology is that transgene containment is provided through maternal inheritance of chloroplasts.24 However, to fully understand the level of transgene containment provided, the rate of functional gene transfer to the nucleus must be considered. By our estimates, although functional transfer of a chloroplast transgene to the nucleus is clearly possible, paternal chloroplast leakage remains the major route of transgene escape, provided the chloroplast promoter has no latent nuclear activity.25,26 As little is currently known about the nuclear activity of chloroplast promoters, it may be prudent to assess chloroplast promoters for nuclear activity prior to use in chloroplast biotechnology. In addition, we observed that nuclear activation of a chloroplast transgene primarily occurs via the recruitment of a closely linked nuclear promoter. This point should be considered if simultaneously transforming nuclear and chloroplast genomes.27 Simultaneous transformation has the potential to co-integrate nuclear and chloroplast transgenes at a single nuclear locus, inevitably increasing the chance of generating an active nuclear copy of the chloroplast transgene.
Over the past decade a very dynamic picture of organelle DNA in the nuclear genome has arisen, leading to the current model of functional gene transfer (Fig. 2). Organelle DNA is incorporated into the nuclear genome at a strikingly high rate and the resulting loci are often very large, containing multiple copies of DNA fragments from disparate regions of the organelle genomes. Once in the nucleus, these sequences evolve rapidly, generating many new sequence arrangements. In some rare instances these new sequence combinations give rise to new nuclear genes. The process of gene activation may be aided by the presence of latent gene regulatory elements in the AT-rich chloroplast non-coding DNA. While these new nuclear genes may be inefficiently expressed initially, a low level of expression may be sufficient to selectively maintain the gene while more effective transcription and polyadenylation are established through further mutation. The rarity of these events may deter experimental investigation by laboratory scientists but this should not lead to complacency among gene regulation authorities. Broadacre crops surely offer the cell numbers required to uncover endosymbiotic evolutionary gene transfer events.
This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP0986973).