The first plant gene encoding a cellulose synthase catalytic subunit (CesA) was identified in cotton in 1996 on the basis of its sequence similarity to a bacterial CesA [1]. In 2000, Richmond and Somerville identified 10 CesA genes and 31 cellulose synthase-like (Csl) genes in Arabidopsis, which were further classified into one CesA family and six Csl families (CslA/B/C/D/E/G) based on phylogenetic analyses [2]. Since then, the whole CesA and Csl gene repertoire has been cataloged in fully sequenced plants, including rice [3], poplar [4
] and the moss Physcomitrella patens
]. Additional CesA and Csl genes have also been found in diverse land plants that are not fully sequenced, such as maize [7], barley [8] and pine [9
]; CesAs have been identified in streptophyte green algae such as Mesotaenium caldariorum
] and in the red alga Porphyra yezoensis
] as well. Two additional Csl families (CslF and CslH) were found in these studies; together with the other six Csl families and one CesA family, they comprise the CesA superfamily.
The CesA superfamily genes are among the most important players in the biosynthesis of plant cell walls, which are composed mainly of biopolymers such as celluloses, hemicelluloses, pectins and lignins. Because the Csl genes share sequence similarity with the CesA genes, they are hypothesized to be involved in synthesizing the backbones of various polysaccharide polymers [2], in particular hemicelluloses [14]. This so-called "CSL hypothesis" has been supported by recent experimental studies, which suggest that the CslA genes encode mannan synthases [15], that the CslF and CslH genes encode mixed-linkage glucan synthases [17], and that the CslC genes are probably involved in xyloglucan biosynthesis [19]. Therefore, the backbone synthases of all major hemicellulose classes except xylans are known. However, the functional roles of the other Csl families (CslB/D/E/G) remain unclear.
The phylogenetic classification and function of the CesA superfamily were reviewed by Lerouxel et al. in 2006 [14], and there have since been a few updates to the phylogenetic analyses of these important genes. Fincher et al. found a new Csl family (CslJ) in cereals [20]. Roberts and Bushoven mined the P. patens genomic and EST data and found CesA, CslA, CslC and CslD genes in this lower plant [6
]; their phylogenetic analyses revealed that seven P. patens CesA genes form a monophyletic clade by themselves and that the moss has no one-to-one orthologs of the Arabidopsis CesA triplet subunits (CesA1/3/6 for the primary cell wall and CesA4/7/8 for the secondary cell wall). Furthermore, comprehensive phylogenetic analyses of the plant CesA superfamily that included CesAs from other organismal groups (e.g., bacteria, fungi and animals) indicated that plant CslA and CslC genes have a different origin from the remaining plant genes [22]. Evidence has been reported that these remaining genes of the CesA superfamily were anciently acquired from cyanobacteria [23]. It was proposed [22
] that the plant CslG genes evolved first, followed by the CslE, CslB, CesA and CslD/F genes. However, a more recent study could not find homologs of the CslG/E/B/H/F genes in P. patens
], suggesting that these Csl families are narrowly distributed and unlikely to be the earliest evolved.
To date, 17 plant and algal genomes have been fully or nearly fully sequenced, and their gene predictions and annotations are publicly available (Table ). The availability of these genomes and their annotated genes facilitates comparative genomic studies of plants, making it possible to address major plant biology questions in silico
]. We have performed comparative analyses of the CesA superfamily genes in the 17 sequenced plant and algal genomes. Our goals are to define CesA and Csl gene homologs across these genomes and to investigate the evolution of the different Csl gene families. We have built a catalog of all the Csl genes and classified them phylogenetically. We have also studied the gene structure, the evolutionary rate, and the distribution of the Csl families across the genomes. Throughout this paper, we use "Csl genes" to denote all cellulose synthase-like genes, including CesAs.
Plant and algal genomes used in the present study