Phenotypes and behaviors respond to resource constraints via adaptation, but the influence of ecological limitations on the composition of eukaryotic genomes is still unclear. We trace connections between plant ecology and genomes through their elemental composition. Inorganic sources of nitrogen (N) are severely limiting to plants in natural ecosystems. This constraint would favor the use of N-poor nucleotides in plant genomes. We show that the transcribed segments of undomesticated plant genomes are the most N-poor, with genomes and proteomes bearing signatures of N limitation. Consistent with the predictions of natural selection for N conservation, the precursors of the transcriptome show the greatest deviations from Chargaff's second parity rule. Furthermore, crops show higher N contents than undomesticated plants, likely because the use of N-rich fertilizers has relaxed natural selection. These findings indicate a fundamental role of N limitation in the evolution of plant genomes, and they link genomes with the ecosystem context within which biota evolve.
Ammonia, nitrates, and nitrites constitute the primary sources of the nitrogen (N) atoms incorporated in the nitrogenous bases that make up plant genomes (Berg et al. 2006). These inorganic N sources are severely limited in natural ecosystems (Elser et al. 2007), which may promote the use of bases requiring fewer N atoms. Thymine (T) contains two N atoms, cytosine (C) three, and adenine (A) and guanine (G) five; uracil, which replaces T in RNA, also contains two. Although avoiding N-rich nucleotides in DNA sequences can reduce the N cost, the savings differ between double- and single-stranded sequences. Because every base in double-stranded DNA is paired with its complement, genomic regions afford minimal opportunities for N conservation (0.5 N atoms per base): A–T pairs cost 3.5 N atoms per base and G–C pairs cost 4 N atoms per base. In contrast, transcribed genomic regions give rise to single-stranded RNA, in which the difference in N cost between bases can be as high as three N atoms per base (five for A or G vs. two for T or U). Thus, the effects of natural selection for N conservation, if any, should be more pronounced in these transcribed templates.
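The per-base costs above follow directly from the N counts of the individual bases. The short Python sketch below (the names `N_ATOMS`, `PAIR`, and the helper functions are our own, for illustration only) averages the N atoms of each Watson–Crick pair over its two bases and compares the spread of costs in duplex DNA with the spread in a single strand.

```python
# N atoms per nucleotide base (T and U both carry two N atoms).
N_ATOMS = {"A": 5, "G": 5, "C": 3, "T": 2}

# Watson-Crick partners in a duplex.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def duplex_cost_per_base(base: str) -> float:
    """N atoms per base when `base` sits in double-stranded DNA (pair cost / 2)."""
    return (N_ATOMS[base] + N_ATOMS[PAIR[base]]) / 2


def single_strand_cost(base: str) -> int:
    """N atoms contributed by `base` in a single-stranded transcript."""
    return N_ATOMS[base]


duplex = {b: duplex_cost_per_base(b) for b in "ACGT"}
single = {b: single_strand_cost(b) for b in "ACGT"}

# Largest possible per-base saving from choosing one base over another.
duplex_range = max(duplex.values()) - min(duplex.values())
single_range = max(single.values()) - min(single.values())

print(duplex)        # {'A': 3.5, 'C': 4.0, 'G': 4.0, 'T': 3.5}
print(duplex_range)  # 0.5
print(single_range)  # 3
```

The sixfold gap between the two ranges (0.5 vs. 3 N atoms per base) is what makes transcribed, single-stranded products the more sensitive place to look for N-conserving selection.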
To test these predictions, we analyzed the Arabidopsis thaliana (thale cress) genome and found that the transcribed segments of the genome have a significantly lower N content than the whole genome (5.3% difference; P ≪ 0.01, Z-test; fig. 1A). Only 5% of the transcribed segments show a higher N content than the whole genome. In contrast, animal genomes and transcriptomes are almost identical in N content (<0.02% difference; table 1 and fig. 1B). This is expected, because nucleotide biosynthesis in animals is unlikely to be N limited: animals feed on N-rich biomass that already contains N in preformed amino acids (Berg et al. 2006). In these analyses, we excluded exons when estimating the N content of transcribed regions, because their nucleotide composition is dictated by the encoded protein sequence (considered separately in the amino acid sequence analysis below) and by secondary and tertiary structural constraints on the mature transcripts.
In comparisons of N-limited (plants) and N-sufficient (animals) organisms, the difference in N content is an order of magnitude greater in transcriptomes than in genomes. This disparity reflects the different contributions of RNA and DNA to cellular biomass: DNA generally contributes less than 2% of overall organismal biomass, whereas RNA can constitute up to 15% of the biomass in multicellular eukaryotes (Sterner and Elser 2002; Elser et al. 2003). These results are also consistent with the observation that, on average, the most highly expressed proteins in plant species have the lowest N content, whereas in animals the use of N-rich amino acids does not depend on expression level (Elser et al. 2006). Plant proteomes contain 7% fewer N atoms than those of the two animal taxa (P ≪ 0.01, Student's t-test; table 1). Taken together, our results add to accumulating evidence for an imprint of natural selection for N conservation on a genomic scale in plants.
Local deviations from Chargaff's second parity rule in the transcriptome provide an independent test of the hypothesis of natural selection for N conservation in plants. Chargaff's second parity rule states that, within any single strand of DNA, the frequencies of A and T are approximately equal, as are the frequencies of C and G. Deviations from this rule are known to correlate with function, for example, with the direction of transcription (Bell and Forsdyke 1999; Paz et al. 2007). Under an N-conservation regime, we expect the sense strand of plant transcripts to deviate strongly from Chargaff's rule in the N-saving direction, that is, toward T rather than A and toward C rather than G. This is indeed the case: the A. thaliana transcriptome shows a 5-fold greater deviation toward low-N nucleotides than animal transcriptomes do (fig. 1C; P ≪ 0.01, t-test). As expected, whole-genome analyses do not show appreciable deviations from Chargaff's second parity rule. Furthermore, animal transcriptomes do not show significant deviations from the rule (fig. 1C).
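To make the direction of the expected deviation explicit, take the strand bias as the N content of the sense strand minus that of its complement, and use the single-stranded N counts per base (A = G = 5, C = 3, T = 2). For a transcript with base proportions $p_A, p_C, p_G, p_T$, the complement swaps A with T and C with G, so the bias reduces to a simple expression (a worked derivation added for illustration, not a formula quoted from the original analysis):

$$
\begin{aligned}
N_{\text{sense}} &= 5p_A + 3p_C + 5p_G + 2p_T,\\
N_{\text{antisense}} &= 5p_T + 3p_G + 5p_C + 2p_A,\\
\Delta N = N_{\text{sense}} - N_{\text{antisense}} &= 3(p_A - p_T) + 2(p_G - p_C).
\end{aligned}
$$

When the second parity rule holds exactly ($p_A = p_T$ and $p_G = p_C$), $\Delta N = 0$. A negative $\Delta N$ means that the sense strand is enriched in T relative to A and/or in C relative to G, which is the N-conserving direction predicted for plants.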
Laboratory evidence for the role of natural selection for N conservation in plant genomes is not yet available. However, we can take advantage of a natural experiment by examining the N content of crop plant genomes, because massive N enrichment of cultivated soils by fertilization is tantamount to removing the selection pressure exerted by N limitation. In this case, purifying selection would no longer act strongly against mutations that lead to N-rich nucleotides and amino acids. Consequently, crop species should show a higher N content and a smaller deviation from Chargaff's rule than A. thaliana.
Analyses of the completely sequenced and annotated genome of domesticated rice (Oryza sativa) produce N content estimates that are significantly higher than those for A. thaliana (fig. 2A; P ≪ 0.01), consistent with our predictions. In addition, deviations from Chargaff's rule in O. sativa are half as large as those in A. thaliana (fig. 2B). Proteomic N content is also higher in O. sativa than in A. thaliana (P ≪ 0.01, t-test). In fact, a variety of domesticated species show higher proteomic N contents than undomesticated plants (fig. 2C). Plants harboring N-fixing bacteria also show a higher N content than the nondomesticated taxa (fig. 3). Interestingly, phylogenetically divergent crop species (e.g., the dicots Medicago truncatula and Lotus japonicus) show an N content more similar to that of a monocot crop (Zea mays) than to that of another dicot, the undomesticated A. thaliana (fig. 2C). Therefore, the patterns observed for rice should extend to other domesticated plants as well. It is remarkable that the removal of selection pressure (via the use of fertilizers or the presence of N-fixing symbiotic bacteria) has quickly erased ancestral signatures of N limitation and altered the genomic architecture of many species at a fundamental level.
In summary, our findings directly implicate ecological limitations in altering the composition of molecular moieties associated with the flow of genetic information from genome to transcriptome. Thus, environmental growth limitations directly impact the biochemical structure of information storage and processing in both microbial and multicellular forms of life (McEwan et al. 1998; Baudouin-Cornu et al. 2001, 2004; Bragg and Hyder 2004; Bragg et al. 2006; Elser et al. 2006; Kolkman et al. 2006; Acquisti et al. 2007; Bragg and Wagner 2007). In the future, the availability of completely sequenced and better annotated plant genomes (including those of undomesticated species, species symbiotic with N-fixing bacteria, and crop species) will permit further quantification of the genomic impact of selection pressures due to N limitation across multiple plant phyla and ecosystems.
Completely sequenced and annotated genomes of O. sativa (rice, a crop species) and A. thaliana (thale cress) were used to represent plants. Although the sequencing of several other plant genomes is under way, the absence of well-assembled and well-annotated genomic data and gene models precludes their use at present. Genomic sequences and gene models were retrieved from the TIGR database (ftp://ftp.tigr.org/pub/data) for O. sativa (release 5.0) and from the TAIR database (ftp://ftp.arabidopsis.org/home/tair/) for A. thaliana (release TAIR 7). Available protein sequences of other domesticated and undomesticated plants were obtained from the Eukaryotic Genomics database (http://genome.jgi-psf.org/) for Populus trichocarpa, Sorghum bicolor, and Selaginella moellendorffii, and from the TIGR database (ftp://ftp.tigr.org/pub/data) for Z. mays, M. truncatula, L. japonicus, and Ricinus communis. The genomes of two highly divergent animal species were analyzed: Homo sapiens (a vertebrate) and Drosophila melanogaster (an invertebrate). Sequences and gene models were obtained from the UCSC database (ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomes/) for D. melanogaster (Release 5, FlyBase Gene Models) and H. sapiens (hg18, RefSeq Gene Models). In a previous analysis of many complete animal genomes (Elser et al. 2006), these genomes were found to be typical animal representatives, with the exception of species in the genus Caenorhabditis, which, for unknown reasons, stand out as outliers among animals in their atomic composition (see Elser et al. 2006; Acquisti et al. 2007). To estimate the N content of the transcribed genomic segments (referred to as the transcriptome), we used the intron sequences of protein-coding, tRNA, and rRNA genes.
N content for DNA, RNA, and protein sequences was measured in atoms per base or per amino acid residue as $N = \sum_i n_i p_i$, where $n_i$ is the number of N atoms in the $i$-th base (or residue) and $p_i$ is the proportion of that base in the data ($\sum_i p_i = 1$). For genomes (double-stranded DNA), $n_A = n_T = 3.5$ and $n_C = n_G = 4$. For transcriptomes (single-stranded RNA), $n_A = 5$, $n_T = 2$, $n_C = 3$, and $n_G = 5$. For amino acid side chains in proteomes, $n = 1$ for asparagine, glutamine, lysine, and tryptophan; $n = 2$ for histidine; $n = 3$ for arginine; and $n = 0$ for the rest. For the transcribed sequences, the strand bias (deviation from Chargaff's second parity rule) of each gene was calculated as the difference between the N contents of the transcript (sense strand) and its complement (antisense strand).
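The formula and the strand-bias definition translate directly into code. The following is a minimal Python sketch, not the authors' pipeline: the table names and helper functions are hypothetical, symbols outside the expected alphabet (for example ambiguity codes or stop symbols) are simply skipped as an assumption, and the antisense strand is built as the reverse complement, which does not change a composition-based measure.

```python
from collections import Counter

# N atoms per symbol, as listed in the methods (table names are hypothetical).
DUPLEX_N = {"A": 3.5, "T": 3.5, "C": 4.0, "G": 4.0}  # double-stranded DNA, per base
SINGLE_N = {"A": 5, "T": 2, "C": 3, "G": 5}          # single-stranded transcript
SIDE_CHAIN_N = {"N": 1, "Q": 1, "K": 1, "W": 1, "H": 2, "R": 3}  # all others: 0

NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def n_content(seq: str, n_table: dict, alphabet: str) -> float:
    """Return sum_i n_i * p_i, with p_i the proportion of symbol i in seq.

    Assumption: symbols outside `alphabet` are ignored; symbols inside
    `alphabet` but absent from `n_table` contribute n_i = 0.
    """
    counts = Counter(ch for ch in seq.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no symbols from the expected alphabet")
    return sum(n_table.get(sym, 0) * c for sym, c in counts.items()) / total


def genome_n(seq: str) -> float:
    return n_content(seq, DUPLEX_N, NUCLEOTIDES)


def transcript_n(seq: str) -> float:
    return n_content(seq, SINGLE_N, NUCLEOTIDES)


def proteome_n(seq: str) -> float:
    return n_content(seq, SIDE_CHAIN_N, AMINO_ACIDS)


def strand_bias(sense: str) -> float:
    """N content of the sense strand minus that of its complement (antisense)."""
    antisense = sense.upper().translate(COMPLEMENT)[::-1]
    return transcript_n(sense) - transcript_n(antisense)


if __name__ == "__main__":
    print(genome_n("TTTC"))      # 3.625
    print(transcript_n("TTTC"))  # 2.25
    print(strand_bias("TTTC"))   # -2.75
    print(proteome_n("MKRH"))    # 1.5
```

A negative strand bias, as for the toy T- and C-rich sequence above, indicates a sense strand that is poorer in N than its complement; a sequence that is its own reverse complement, such as ACGT, gives a bias of zero.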
For proteomes, the independence of N content from genomic G + C content has been discussed previously (Elser et al. 2006); we found that this independence extends to the transcriptome as well. The genomic G + C content of O. sativa was the highest of all the species we examined, yet its transcriptome had a significantly lower N content per nucleotide than those of animals, contrary to the high N content expected if G + C content dictated N content. In addition, the analysis of deviations from Chargaff's second parity rule (figs 1, 2A, and 2B) showed that strand compositional bias, and not G + C content alone, was the main factor underlying the different patterns observed in plant and animal transcriptomes.
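A toy illustration shows why G + C content alone cannot determine the N content of a single strand. The sequences are invented for this purpose and are not data from the study, and the helper names are hypothetical; per-base N counts follow the methods. Two strands with identical G + C content differ sharply depending on whether G or C sits on the sense strand, and an A + T-rich strand can even exceed a G + C-rich one.

```python
# Single-stranded N atoms per base, as in the methods.
SINGLE_N = {"A": 5, "T": 2, "C": 3, "G": 5}


def gc_content(seq: str) -> float:
    """Fraction of G and C among the bases of seq."""
    return sum(seq.count(b) for b in "GC") / len(seq)


def transcript_n(seq: str) -> float:
    """Mean N atoms per base for a single strand."""
    return sum(SINGLE_N[b] for b in seq) / len(seq)


# Hypothetical toy strands, chosen only to contrast G + C and N content.
toy = {
    "GC-rich, C on sense": "CCCCCCAT",
    "GC-rich, G on sense": "GGGGGGAT",
    "AT-rich, A on sense": "AAAAAAGC",
    "AT-rich, T on sense": "TTTTTTGC",
}

for label, seq in toy.items():
    print(f"{label}: GC = {gc_content(seq):.2f}, N per base = {transcript_n(seq):.3f}")
```

Here the C-rich strand (G + C = 0.75) carries 3.125 N atoms per base, fewer than the A-rich strand (G + C = 0.25) at 4.75, because the single-stranded cost depends on which member of each base pair is on the sense strand rather than on G + C content as such.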
We thank William Fagan, James Gilbert, Thomas Wiehe, Alan Filipski, Peter Stadler, Antonio Marco, Fabia Battistuzzi, Bernhard Haubold, and Gregory Babbitt for scientific discussions; Bernard Van Emden and Revak Raj Tyagi for technical support; and Kristi Garboushian for editorial support. This work was funded by the National Science Foundation (J.J.E. and S.K.) and National Institutes of Health (S.K.).