All eukaryotic genomes sequenced so far contain a number of genes that encode proteins whose functions are still unknown. These proteins have been documented to be induced under specific sets of conditions, to participate in protein-protein interactions and/or, in some cases, to be associated with mutant phenotypes [1]. Proteins with unknown functions are called either proteins with obscure features (POFs), when they contain no previously defined domains/motifs, or proteins with defined features (PDFs), when they contain at least one previously defined domain/motif [1, 2]. A protein domain is an evolutionarily conserved unit of protein sequence that can evolve, function and exist independently of the rest of the protein chain. In general, each domain is assumed to perform a specific function. However, an identical domain may appear in evolutionarily and functionally unrelated proteins, and it is therefore challenging to relate the presence of a domain to the overall function of the protein. One possible approach to this issue is to use microarray data as a tool to predict the function of proteins with unknown functions, as suggested previously [3]. Recently, a number of such proteins have been characterized in Arabidopsis and Oryza using transcriptome studies as well as functional genomics tools, including the raising of transgenic plants. Some of these proteins of unknown function have been reported to improve the tolerance of transgenic plants to oxidative stress [4].
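The POF/PDF distinction described above reduces to a single rule: a protein of unknown function is a PDF if it carries at least one previously defined domain/motif, and a POF otherwise. A minimal sketch of that rule follows; the class name, function name, protein identifiers and domain lists are hypothetical and serve only as illustration, not as the procedure used in the cited studies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UnknownProtein:
    """A protein of unknown function with its annotated domains/motifs."""
    name: str                                          # hypothetical identifier
    domains: List[str] = field(default_factory=list)   # previously defined domains/motifs


def classify_unknown_protein(protein: UnknownProtein) -> str:
    """Return 'PDF' if at least one defined domain/motif is present, else 'POF'."""
    return "PDF" if protein.domains else "POF"


# Illustrative (made-up) entries: one protein with a CBS domain, one with none.
examples = [
    UnknownProtein("protein_A", ["CBS"]),
    UnknownProtein("protein_B"),
]
labels = {p.name: classify_unknown_protein(p) for p in examples}
```

Under this rule, a protein of unknown function that carries a CBS domain is a PDF, even though the domain alone does not reveal the protein's overall function.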
To understand the probable mechanism of abiotic stress tolerance in Oryza sativa, we have attempted to characterize several unknown members of the stress-responsive machinery [5]. A group of these proteins of unknown function were found to contain a cystathionine-β-synthase (CBS) domain and to be differentially regulated in contrasting genotypes of rice, pointing to a probable role in salinity tolerance. We therefore assume that these proteins may participate in known pathways and networks, may be involved in basic or specialized processes, and may also form part of new, as yet undiscovered pathways.
CBS domains are found in several proteins of unrelated function, such as inosine-5'-monophosphate dehydrogenase (IMPDH), AMP-activated protein kinase (AMPK), chloride channels (CLC) and cystathionine-β-synthase (CBS). The importance of the CBS domain was realized from the observation that point mutations in it cause several hereditary diseases in humans [6]. The CBS domain was first discovered by Bateman [7] in the genome of the archaebacterium Methanococcus jannaschii as a conserved domain in a group of proteins. It occurs not only in archaebacterial proteins but also in eubacterial and eukaryotic proteins [6]. The domain was named after its discovery in the human CBS enzyme, the first enzyme of the reverse transsulfuration pathway, in which homocysteine is converted to cysteine via cystathionine. In plants and bacteria, the transsulfuration pathway operates in the forward direction, converting cysteine to homocysteine through the action of cystathionine-γ-synthase and β-cystathionase. In mammals, by contrast, the reverse transsulfuration pathway operates, and cysteine is derived from homocysteine by the CBS and γ-cystathionase enzymes (Figure ). Yeast and some archaebacteria possess both transsulfuration pathways [8].
In the CBS protein, the C-terminal CBS domain exerts an autoinhibitory effect on CBS activity, while binding of S-adenosylmethionine (SAM) to the CBS domain induces a conformational change that relieves this autoinhibition. Mutations in the CBS domains abolish or strongly reduce activation by SAM and cause homocystinuria. The CBS domains of the γ-subunit of AMPK act as a sensor of cellular energy status, and mutations in them cause a glycogen storage disease that is clinically expressed as familial hypertrophic cardiomyopathy (Wolff-Parkinson-White syndrome) [9-12]. Scott et al. [13] reported that the CBS domain of IMPDH binds ATP in a positively cooperative manner and activates IMPDH. ATP binding and activation were abolished by a point mutation corresponding to a mutation that causes retinitis pigmentosa [14]. Earlier work showed that the CBS module in the ATP-binding cassette transporter OpuA constitutes the ionic strength sensor, whose activity is modulated by the C-terminal anionic tail [15]. In CLCs, the function of the CBS domain remains unresolved and controversial. However, CBS domains in human CLCs have been shown to be required for channel function and/or expression, because mutations in the CBS domains of CLCs cause diseases due to CLC dysfunction [16-21]. In plants, the information available on CLCs is very limited. The first plant CLC gene (CLC-Nt1) was identified in tobacco [22]; thereafter, CLC genes were identified and characterized in Arabidopsis [23-25] and rice [26], but the function of the CBS domain in these CLC channel genes has not been resolved.
The availability of the complete genome sequence, along with microarray expression and Massively Parallel Signature Sequencing (MPSS) data, makes Arabidopsis an ideal plant for the study of a newly identified protein family [27]. In the present work, we have performed a genome-wide analysis of two highly finished plant genomes, Arabidopsis thaliana and Oryza sativa, in which we have identified the CBS domain-containing proteins (CDCPs) and classified them based on their conserved features. To further establish their possible involvement in development and abiotic stress, we have analyzed the expression of genes encoding CDCPs using the MPSS database and the existing Arabidopsis microarray database (http://www.arabidopsis.org).