Results 1-5 (5)

author:("Yin, yanbian")
1.  Caldicellulosiruptor Core and Pangenomes Reveal Determinants for Noncellulosomal Thermophilic Deconstruction of Plant Biomass 
Journal of Bacteriology  2012;194(15):4015-4028.
Extremely thermophilic bacteria of the genus Caldicellulosiruptor utilize carbohydrate components of plant cell walls, including cellulose and hemicellulose, facilitated by a diverse set of glycoside hydrolases (GHs). From a biofuel perspective, this capability is crucial for deconstruction of plant biomass into fermentable sugars. While all species from the genus grow on xylan and acid-pretreated switchgrass, growth on crystalline cellulose is variable. The basis for this variability was examined using microbiological, genomic, and proteomic analyses of eight globally diverse Caldicellulosiruptor species. The open Caldicellulosiruptor pangenome (4,009 open reading frames [ORFs]) encodes 106 GHs, representing 43 GH families, but only 26 GHs from 17 families are included in the core (noncellulosic) genome (1,543 ORFs). Differentiating the strongly cellulolytic Caldicellulosiruptor species from the others is a specific genomic locus that encodes multidomain cellulases from GH families 9 and 48, which are associated with cellulose-binding modules. This locus also encodes a novel adhesin associated with type IV pili, which was identified in the exoproteome bound to crystalline cellulose. Taking into account the core genomes, pangenomes, and individual genomes, the ancestral Caldicellulosiruptor was likely cellulolytic and evolved, in some cases, into species that lost the ability to degrade crystalline cellulose while maintaining the capacity to hydrolyze amorphous cellulose and hemicellulose.
PMCID: PMC3416521  PMID: 22636774
2.  Genome-scale identification of cell-wall related genes in Arabidopsis based on co-expression network analysis 
BMC Plant Biology  2012;12:138.
Identification of the novel genes relevant to plant cell-wall (PCW) synthesis represents a highly important and challenging problem. Although substantial efforts have been invested into studying this problem, the vast majority of the PCW related genes remain unknown.
Here we present a computational study focused on identification of the novel PCW genes in Arabidopsis based on the co-expression analyses of transcriptomic data collected under 351 conditions, using a bi-clustering technique. Our analysis identified 217 highly co-expressed gene clusters (modules) under some experimental conditions, each containing at least one gene annotated as PCW related according to the Purdue Cell Wall Gene Families database. These co-expression modules cover 349 known/annotated PCW genes and 2,438 new candidates. For each candidate gene, we annotated the specific PCW synthesis stages in which it is involved and predicted the detailed function. In addition, for the co-expressed genes in each module, we predicted and analyzed their cis regulatory motifs in the promoters using our motif discovery pipeline, providing strong evidence that the genes in each co-expression module are transcriptionally co-regulated. From all the co-expression modules, we infer that 108 modules are related to four major PCW synthesis components, using three complementary methods.
We believe our approach and data presented here will be useful for further identification and characterization of PCW genes. All the predicted PCW genes, co-expression modules, motifs and their annotations are available at a web-based database:
PMCID: PMC3463447  PMID: 22877077
Plant cell wall; Arabidopsis; Co-expression network analysis; Bi-clustering; Cis regulatory motifs
3.  The percentage of bacterial genes on leading versus lagging strands is influenced by multiple balancing forces 
Nucleic Acids Research  2012;40(17):8210-8218.
The majority of bacterial genes are located on the leading strand, and the percentage of such genes has a large variation across different bacteria. Although some explanations have been proposed, these are at most partial explanations as they cover only small percentages of the genes and do not even consider the ones biased toward the lagging strand. We have carried out a computational study on 725 bacterial genomes, aiming to elucidate other factors that may have influenced the strand location of genes in a bacterium. Our analyses suggest that (i) genes of some functional categories such as ribosome have higher preferences to be on the leading strands; (ii) genes of some functional categories such as transcription factor have higher preferences on the lagging strands; (iii) there is a balancing force that tends to keep genes from all moving to the leading and more efficient strand and (iv) the percentage of leading-strand genes in a bacterium can be accurately explained based on the numbers of genes in the functional categories outlined in (i) and (ii), genome size and gene density, indicating that these numbers implicitly contain the information about the percentage of genes on the leading versus lagging strand in a genome.
PMCID: PMC3458553  PMID: 22735706
4.  dbCAN: a web resource for automated carbohydrate-active enzyme annotation 
Nucleic Acids Research  2012;40(Web Server issue):W445-W451.
Carbohydrate-active enzymes (CAZymes) are very important to the biotech industry, particularly the emerging biofuel industry because CAZymes are responsible for the synthesis, degradation and modification of all the carbohydrates on Earth. We have developed a web resource, dbCAN, to provide a capability for automated CAZyme signature domain-based annotation for any given protein data set (e.g. proteins from a newly sequenced genome) submitted to our server. To accomplish this, we have explicitly defined a signature domain for every CAZyme family, derived based on the CDD (conserved domain database) search and literature curation. We have also constructed a hidden Markov model to represent the signature domain of each CAZyme family. These CAZyme family-specific HMMs are our key contribution and the foundation for the automated CAZyme annotation.
PMCID: PMC3394287  PMID: 22645317
5.  Genomic Arrangement of Regulons in Bacterial Genomes 
PLoS ONE  2012;7(1):e29496.
Regulons, as groups of transcriptionally co-regulated operons, are the basic units of cellular response systems in bacterial cells. While the concept has been long and widely used in bacterial studies since it was first proposed in 1964, very little is known about how its component operons are arranged in a bacterial genome. We present a computational study to elucidate the organizational principles of regulons in a bacterial genome, based on the experimentally validated regulons of E. coli and B. subtilis. Our results indicate that (1) genomic locations of transcriptional factors (TFs) are under stronger evolutionary constraints than those of the operons they regulate, so changing a TF's genomic location will have a larger impact on the bacterium than changing the genomic position of any of its target operons; (2) operons of regulons are generally not uniformly distributed in the genome but tend to form a few closely located clusters, which generally consist of genes working in the same metabolic pathways; and (3) the global arrangement of the component operons of all the regulons in a genome tends to minimize a simple scoring function, indicating that the global arrangement of regulons follows simple organizational principles.
PMCID: PMC3250446  PMID: 22235300
