Cysteine proteases and their inhibitors have been studied in detail in many plants, but, to date, little work has been done on these genes or proteins in coffee. Here we describe full-length cDNAs representing two cysteine proteinase genes, called CcCP1 and CcCP4, whose expression is largely restricted to grain maturation and germination. We also present the characterization of full-length cDNAs representing four cysteine proteinase inhibitor genes (CcCPI-1 to CcCPI-4) and describe the quantitative expression of these genes in coffee.
Sequence comparisons indicated that the two Robusta cysteine proteinases described here represent different CPs. BLAST analysis against the protein database indicates that CcCP1 is very closely related to a papain-type CP called VsCPR4 from Vicia sativa (GenBank Accession CAB16316), as well as to a putative CP of Arabidopsis (AT3G54940, GenBank Accession NP_567010). Examination of Figure shows that the protein sequences of CcCP1 and its putative homologues contain a partial ERFNIN box (ERFNAQ) within an N-terminal cathepsin propeptide inhibitor domain, indicating that these polypeptides fall within the CP subfamily having an I29 domain, which is found at the N-terminus of some C1 peptidases, such as cathepsin L, where it acts as a propeptide (http://merops.sanger.ac.uk). The I29 domains of CcCP1 and its putative homologues are located just upstream of a clear peptidase C1 superfamily domain, which has strong homology with peptidase C1A_cathepsins_B/C/X. Several other CP-specific elements are also completely conserved in the three highly similar protein sequences presented in Figure . The CPR4 polypeptide of Vicia sativa has been shown to be expressed during seed maturation and during the early part of seedling germination/growth in both the embryonic axis and the cotyledons [29]. Although the functional activity of this protein has not been demonstrated in recombinant form, these authors nevertheless suggested that the processed, and thus presumably activated, polypeptide is involved in seed storage protein mobilization. Microarray expression analysis of the potential Arabidopsis CcCP1 homologue (AT3G54940) indicates that this gene is significantly expressed in developing endosperm and in the embryos of developing and germinating seedlings (https://www.genevestigator.com), supporting the idea that these highly related polypeptides play a role in storage protein modification and/or mobilization. Little expression was seen for the Arabidopsis gene in other tissues under normal conditions. The expression profile of CcCP1 transcripts (Figure ) mirrors the expression of the candidate homologues VsCPR4 and AT3G54940, i.e., CcCP1 is expressed in the later stages of grain development and during germination, implying that CcCP1 performs a function similar to that of its putative homologues in the other two plants. To date, it has apparently not been possible to express and/or correctly activate any of the CcCP1 homologues in recombinant form in order to confirm their function. The most likely explanation for this inability to verify the activity of these proteins is that the precise conditions needed to process/activate them have not yet been identified.
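The ERFNIN box comparison discussed above can be illustrated with a short motif scan. The sketch below is only an illustration and not the analysis used in this work: it assumes the commonly cited interspersed form of the consensus, E-x3-R-x3-F-x2-N-x3-I-x3-N, anchors on the first four key residues (E, R, F, N), and reports whatever occupies the last two key positions, so that variants such as ERFNVN or ERFNAQ can be recognised. The function names and the peptide used in the usage note are hypothetical placeholders, not coffee sequences.

```python
import re

# Assumed interspersed ERFNIN consensus: E-x3-R-x3-F-x2-N-x3-I-x3-N.
# The first four key residues are fixed; the last two are captured so that
# partial boxes (for example ERFNAQ) are reported rather than missed.
# The lookahead allows overlapping candidates to be found.
ERFN_ANCHOR = re.compile(r"(?=E.{3}R.{3}F.{2}N.{3}(.).{3}(.))")


def find_erfnin_like(protein):
    """Return (start_index, six key residues) for each ERFN-anchored hit."""
    hits = []
    for match in ERFN_ANCHOR.finditer(protein.upper()):
        key_residues = "ERFN" + match.group(1) + match.group(2)
        hits.append((match.start(), key_residues))
    return hits


def classify_box(key_residues):
    """Label a hit as a full ERFNI/VN box or a partial one."""
    return "full" if key_residues in ("ERFNIN", "ERFNVN") else "partial"
```

Calling `find_erfnin_like` on a propeptide region returns the key residues of each candidate box, and `classify_box` separates the ERFNI/VN form from partial forms such as ERFNAQ. A dedicated domain search tool would still be needed to confirm that a hit lies within an I29 domain.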
The alignment of CcCP4 with two of the closest well-characterized plant sequences (NtCP56 and SlCysEP) indicates that CcCP4 and its proposed homologues have a clear ERFNI/VN box within an N-terminal I29-type propeptide domain, followed by a peptidase C1 superfamily domain containing conserved cysteine proteinase-specific sequence elements (see Figure for details). All three polypeptides contain N-terminal signal sequences, and the two homologues have a C-terminal endoplasmic reticulum retention sequence (KDEL). Interestingly, the CcCP4 cDNA sequence characterized here instead encodes the C-terminal sequence KDDL. To explore whether the KDDL sequence was unusual in coffee, we examined the sequences of the other cDNAs in the coffee unigene (SGN-U613447). This unigene, which is the only clear hit obtained when the Coffea canephora Unigene set at http://solgenomics.net is searched by BLAST with the CcCP4 protein sequence, contains 22 ESTs. Analysis of these ESTs showed that 14 have sequence data for the C-terminal end and that, interestingly, 2 of these end with the KDEL sequence. This observation suggests that Robusta may have both KDDL and KDEL alleles of CcCP4. A preliminary PCR analysis of genomic fragments from this region of the CcCP4 gene in Coffea eugenioides and Coffea arabica suggests that the KDEL allele is more prominent in these species (data not shown). The significance of CcCP4 alleles with a C-terminal KDDL sequence is currently unclear. However, the fact that several other plant sequences in the protein database related to CcCP4 also have C-terminal KDEL or RDEL sequences (data not shown) suggests that these are the more common forms in plants. Also, several groups have proposed that unprocessed CP-KDEL proteins are retained in the endoplasmic reticulum (ER) after synthesis and are only processed/transported further upon specific signaling [34], and another group suggested that proteins with a C-terminal KDDL could be poorly retained in the ER [37]. These observations raise the possibility that an expressed CcCP4-KDDL protein might be poorly retained in the ER and thus could be present in unintended compartments of developing coffee grain cells, with unknown consequences. Future experiments comparing physiological or other differences between seeds of Robusta trees homozygous for the CcCP4-KDDL or CcCP4-KDEL alleles could be illuminating.
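The EST survey described above amounts to tallying the last four residues of every translated EST that covers the C-terminal end. A minimal sketch of such a tally is given below, assuming the ESTs have already been translated in the correct reading frame; the function name is hypothetical, and any input peptides are to be supplied by the user (none of the SGN-U613447 sequences are reproduced here).

```python
from collections import Counter


def c_terminal_tetrapeptides(peptides):
    """Count the final four residues of each translated C-terminal peptide.

    A trailing '*' (translated stop codon) is ignored, and peptides shorter
    than four residues are skipped because they cannot be classified.
    """
    counts = Counter()
    for peptide in peptides:
        cleaned = peptide.strip().upper().rstrip("*")
        if len(cleaned) >= 4:
            counts[cleaned[-4:]] += 1
    return counts
```

The returned counter can be inspected directly, for example `counts["KDEL"]` and `counts["KDDL"]`, to compare how many ESTs support each C-terminal form.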
Both the proposed tobacco and tomato CcCP4 homologues are known to be involved in pollen development. Zhang et al. [4] confirmed that the tobacco gene (NtCP56) encodes a functional, acid-activated CP and then went on to show that antisense suppression of this gene can disrupt normal pollen development and cause male sterility. The tomato SlCysEP gene product was also shown to be an acid-activated CP and an important component of the tomato ricinosome, a subcellular structure believed to orchestrate the final processing/recycling of cellular proteins during plant programmed cell death (PCD) [5]. In each case, the recombinant CP polypeptides produced in E. coli were insoluble and, as shown here for coffee CcCP4, needed to be refolded to demonstrate auto-cleavage and cysteine protease activity. Analysis of SlCysEP transcripts showed that they could be detected in flowers at a specific period and that this expression was primarily limited to the stamens [5]. While the expression of NtCP56 and SlCysEP was not studied in seeds, our examination of ESTs encoding SlCysEP (http://solgenomics.net) confirmed that cDNAs representing this gene can be found in Solanum lycopersicum EST banks from fruit, seeds, and young leaves, as well as flowers (with seed libraries having the highest number of ESTs). Tomato database analysis indicates that the SlCysEP gene has three introns and that two other highly related KDEL-containing "unigene" sequences can be found, which potentially represent other members of this specific CP gene family. Three potential CcCP4 homologues were also identified in the Arabidopsis genome (AT5G50260, AT3G48350, and AT3G48340). Examination of the expression patterns for these genes using microarray data [33] showed that AT5G50260 expression was limited to seeds, siliques, and stamens/anthers, although lower levels could also be found in roots, but not in stems, of plants subjected to osmotic stress. Interestingly, some induction of AT5G50260 also appeared in nematode-infested roots. No significant expression was seen for this gene in other tissues. In contrast, low levels of expression were found for the Arabidopsis CcCP4-like gene AT3G48350 in many tissues, suggesting that this gene may play a more general role in plant cells. The only two conditions that appeared to increase AT3G48350 transcript levels were UV treatment and dramatic changes in light conditions (dark/light shifts). No probe sets were identified for AT3G48340, so the expression of this gene is not known.
Overall, the CcCP4 expression data presented are consistent with our proposal that CcCP4 may be involved in the PCD associated with coffee grain germination and post-germination stages. Although no significant CcCP4 expression was detected in the Robusta BP409 flower sample tested (data not shown), this may be due to the limited developmental time frame we analysed. Further analysis, using several different stages of flower development, is clearly needed to clarify the expected participation of CcCP4 in coffee pollen development. It is interesting to note that the Vicia sativa Proteinase A (a CcCP4 homologue) was not detected during vetch seed development, but was detected in the cotyledons, though not in the seedling axis, in the later stages of germination and post-germination [29]. These observations, together with the fact that purified Vicia sativa Proteinase A was capable of completely digesting the vetch storage proteins vicilin and legumin, led the authors to propose that Proteinase A is not involved in seed development or in the early part of storage protein mobilization, but is important for the later stages of germination, which involve much more extensive proteolysis [38]. This contrasts with coffee, where there appear to be two periods of CcCP4 expression: one in the developing grain (which may continue into the first part of germination), and a second burst of transcription beginning around the T2 stage of germination and continuing up to the T5 stage. We currently do not know the significance of the low CcCP4 expression in the developing grain, although we note that the "absence" of Proteinase A in developing Vicia sativa seeds [38] could be due to the less sensitive detection method used in the earlier study (northern blotting versus QRT-PCR here).
The quantitative expression analysis of the four CPI genes (Figure ) showed that CcCPI-2 and CcCPI-3 are expressed in most tissues and that their expression levels do not vary widely. In contrast, CcCPI-1 expression rose steadily as grain development progressed (> 100-fold increase from immature to mature stages) and was also relatively strong during the T2 to T5 stages of grain germination and post-germination. Little CcCPI-1 expression was detected in the other tissues tested. For CcCPI-4, extremely high transcript levels were seen exclusively in the T5 and T6 stages of post-germination, corresponding to stages in which the cotyledons are forming. The significance of this observation is not known, but one interesting line of future investigation will be to determine whether CcCPI-4 expression contributes to insect tolerance/resistance at this delicate stage of plantlet development. By examining gene expression at different stages of leaf development, we also found that while CcCPI-4 is weakly expressed in young leaves, its expression increases dramatically in mature leaves (Figure ). No significant expression of CcCPI-4 was found in the other tissues tested, except for a low level in the roots from the Robusta BP409 cDNA set used in Figure (RQ = 0.53), which raises the interesting possibility that higher levels of one or more CcCPI proteins could reduce damage by root pests such as nematodes. Overall, the coffee CPI gene expression data suggest that CcCPI-2 and CcCPI-3 could be CP inhibitors with mostly "house-keeping" functions, while CcCPI-1 may play an important role during grain development, and CcCPI-4 could contribute to reducing damage by insects during the early life of the plantlets (first cotyledons), and perhaps in mature/old leaves and roots. Finally, as the peptide/amino acid profile of a coffee has an important impact on flavour and aroma generation during coffee grain roasting [24], further research is warranted to investigate possible links between allelic variation in genes encoding coffee cysteine proteinases and cysteine proteinase inhibitors and the flavour/aroma quality associated with the grain of different coffee varieties.