The analysis of 12 plant proteomes reveals an occurrence of disordered proteins similar to that found in other eukaryotic organisms [1]. Therefore, there is no clear separation among animals, yeast and plants in terms of the total amount of predicted disordered segments. Nor were clear differences observed among plant species belonging to bryophyta, chlorophyta and vascular plants, or between eudicots and monocots.
The amino acid composition of disordered segments in plants corresponds well with that reported for other eukaryotes [3]. It is characterized by a low frequency of bulky hydrophobic residues, which normally form the core of a folded protein, and a high frequency of polar residues contributing to net charge. The scarcity of cysteine residues within disordered regions was also a characteristic feature of chloroplast, mitochondrial and nuclear proteins alike, which fits well with other predicted disordered protein profiles [5]. This finding supports the view that these features of disordered protein regions are stable during evolution. On the other hand, disordered regions were slightly more frequent in the internal parts than in the terminal parts of proteins. This feature was common to all the plant proteomes investigated, and no differences were found among species. This observation differs from the data obtained from protein 3D structures in the Protein Data Bank [31]. These authors reported that disordered residues are more abundant in the terminal parts (72%), comprising the 40 residues near the N-terminus and the C-terminus, than in the middle part (all other residues).
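As an illustration of the terminal versus internal comparison discussed above, the following minimal sketch computes the fraction of disordered residues in each part of a single protein. It is not the procedure used in this study or in [31]: the function name, the boolean per-residue disorder mask (as might be derived from any disorder predictor) and the choice of 40 residues at each terminus are assumptions made only for the example.

```python
from typing import Optional, Sequence, Tuple


def terminal_vs_internal_disorder(
    mask: Sequence[bool], terminal_len: int = 40
) -> Tuple[float, Optional[float]]:
    """Return (terminal_fraction, internal_fraction) of disordered residues.

    mask[i] is True when residue i is predicted to be disordered
    (hypothetical input; any predictor could supply it).
    terminal_len is the assumed number of residues counted at EACH terminus.
    """
    n = len(mask)
    if n <= 2 * terminal_len:
        # The protein is too short to have an internal part:
        # report the overall fraction and no internal value.
        return (sum(mask) / n if n else 0.0, None)

    # Residues within terminal_len of either end form the terminal part.
    terminal = list(mask[:terminal_len]) + list(mask[-terminal_len:])
    # All remaining residues form the internal (middle) part.
    internal = list(mask[terminal_len:-terminal_len])

    return (sum(terminal) / len(terminal), sum(internal) / len(internal))


if __name__ == "__main__":
    # Toy 120-residue protein: 10 disordered residues at the N-terminus
    # and a 20-residue disordered stretch in the middle.
    toy = [True] * 10 + [False] * 30 + [True] * 20 + [False] * 60
    print(terminal_vs_internal_disorder(toy))
```

Averaging these per-protein fractions over a proteome would give a comparison of the kind described in the text. The result depends on the chosen terminal length and on how short proteins are handled, both of which are left open here.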
Interestingly, a survey of chloroplasts and mitochondria revealed significant differences in the occurrence of disordered regions compared with the nuclear genome. The percentages calculated for these organelles are of the same order of magnitude as those determined in Archaea and bacteria [1]. These data are in agreement with the bacterial origin of the genes coding for these proteins. We also observed differences in the distribution of disordered regions along the protein chain.
It has been suggested that between 800 and 2,000 genes in the Arabidopsis thaliana genome might come from cyanobacteria, with a majority of the proteins falling in the functional category of biosynthesis and metabolism [32]. Furthermore, the analysis of 15 sequenced chloroplast genomes revealed 117 nuclear-encoded proteins that are also still present in at least one chloroplast genome [16]. Based on these reports, we evaluated the degree of disorder both in nuclear-encoded proteins whose genes were transferred from the plastid to the nuclear genome and in those transferred to the nucleus that still retain a copy in the chloroplast genome. Our results indicate that transferred proteins acquired disorder with a frequency similar to that of nucleus-encoded proteins. During evolution, organelles export their genes to the nucleus, but many of the encoded proteins are imported back into the chloroplast, with the help of transit peptides and the protein-import machinery, to carry out their function. This gain of disorder can be hypothesized to be an advantage during import across a double-membrane barrier. However, these disordered segments are not preferentially associated with transit peptides localized in the N-terminal region. Indeed, they were found to be slightly more abundant in the internal region of the protein chain. Moreover, transferred protein-coding genes that maintain a copy in the chloroplast genome exhibit much lower disorder than those that have lost the plastid copy, a level similar to that of proteins encoded by chloroplast or bacterial genes. This might reflect selection pressure during evolution. These proteins are mainly involved in translation, transcription or RNA biosynthesis, being structural constituents of the ribosome and the ribonucleoprotein complex. The disorder in proteins encoded by ancient chloroplast genes now located in the nucleus follows the order bryophyta
chlorophyta. In this context, the data suggest that the level of disorder introduced into plastid proteins that have moved to the nuclear genome has increased during evolutionary time, but further investigations will be necessary to clarify this issue.
The gain or loss of disorder in transferred proteins might be to some extent a stochastic process, since orthologous copies found in different plant species do not necessarily conserve disordered segments, despite presumably carrying out similar functions. This observation is in agreement with the finding that gene transfer events from the chloroplast to the nuclear genome occur much more frequently than generally believed, contributing significantly to genetic variation [35]. In this respect, it is also worth noting that the distribution of disorder in ribosomal proteins among bacterial species appears rather random (Additional file 7: Table S6).
Unstructured proteins and regions might be expected to change more rapidly during evolution than structured proteins, because buried amino acid residues are highly constrained while disordered regions are not constrained by the structure [11]. It is believed that disordered proteins do not exist as a single structure but rather as an equilibrium of conformational states, which interconvert over a range of time scales. This feature can be an evolutionary advantage for adaptation, for instance under stress conditions. Additionally, intrinsically disordered proteins could be more susceptible to proteolytic degradation in vitro.
The classical PEST hypothesis states that the presence in proteins of segments rich in Pro, Glu (Asp) and Ser/Thr flanked by Arg/Lys residues correlates with a short lifetime in the cell [36]. Accordingly, the fact that a group of proteins related to ribosome biogenesis preserved their ordered character when transferred to the nucleus could be explained by their critical role within the protein synthesis machinery, which must be maintained.
On the other hand, around 25% of chloroplast ribosomal proteins transferred to the nucleus are predicted to be intrinsically disordered in our analysis. In this respect, it has been argued that flexibility favours the structural assembly of components of large complexes such as the ribosome, and therefore such a characteristic should be prevalent in certain ribosomal proteins [38]. Moreover, RNA-binding proteins usually contain unstructured regions, as is the case for ribosomal protein L5, which is reported to be associated with 5S rRNA [39]. Our results also indicate that intrinsic disorder is a well-conserved character in some ribosomal proteins. This is the case for L4 and L15, predicted to contain unstructured segments in all the bacterial and plant proteomes analysed. Ribosomal protein L4 is localized near the peptidyl transferase center of the bacterial ribosome [40] and displays significant RNA chaperone activity [41]. The L15 protein is involved in the later stages of assembly [41].
The comparison of disorder between bacterial and chloroplast ribosomal proteins unveiled an increase in disorder in the chloroplast large 50S subunit, where proteins are on average 55 residues longer, as previously reported by Yamaguchi and Subramanian [42], and the majority are produced by nuclear genes. This finding contrasts with the data obtained for the whole proteome, which show no differences in length between disordered and non-disordered proteins. In the case of the small 30S subunit such differences were less clear, probably owing to the higher content of chloroplast-encoded proteins, most of which are predicted to be non-disordered. These results support our hypothesis that proteins encoded in the nuclear genome are more likely to stochastically acquire disorder. However, we cannot rule out that differences in rRNA composition between the chloroplast (23S, 5S and 4.5S) and bacterial (23S and 5S) large 50S ribosomal subunits could also explain the gain of disorder observed in this subunit [43].
Differences in the genetic machinery between plastids (prokaryotic) and the nucleus (eukaryotic) could also help to explain our observations. When plastid genes reach the nucleus, they move from a genetic apparatus that is compact, operon-harbouring and intron-poor to one that is more complex, operon-splitting and intron-rich [45]. While the gain of disorder is thought to be advantageous or neutral in many cases, there must be selective pressures that restrict this apparently random process, as is the case for the chloroplast RUBISCO small subunit, a nuclear-encoded protein of plastid origin, which was found to be ordered in most of the plant proteomes investigated (see Figure ).
The comparison of 3D structures of bacterial and chloroplast ribosomal subunits revealed the localization of the extra disordered proteins. For instance, S11 is localized in the mRNA path, next to the intrinsically disordered S21, which directly interacts with the 5’ untranslated region of the mRNA [46]. In the ribosomal 50S subunit, L24 and L29 are localized around the polypeptide tunnel exit site. It is worth noting that some of these disordered chloroplast proteins are normally found in cyanobacteria (see Additional file 7: Table S6), but in some cases they are unstructured in gram-positive bacteria and not in cyanobacteria (i.e. S9, L29 and L31). This might be related to the fact that more Arabidopsis proteins branched with their homologues from gram-positive bacteria (Mycobacterium) than with those from cyanobacteria (Prochlorococcus, Synechocystis). This has been interpreted as evidence that the Arabidopsis lineage acquired genes specifically from gram-positive bacteria subsequent to its divergence from the yeast lineage [16].