Results 1-5 (5)

author:("Yin, yanbian")
1.  Glycosyltransferase Family 43 Is Also Found in Early Eukaryotes and Has Three Subfamilies in Charophycean Green Algae 
PLoS ONE  2015;10(5):e0128409.
The glycosyltransferase family 43 (GT43) has been suggested to be involved in the synthesis of xylans in plant cell walls and proteoglycans in animals. Very recently, the GT43 family was also found in Charophycean green algae (CGA), the closest relatives of extant land plants. Here we present evidence that non-plant and non-animal early eukaryotes such as fungi, Haptophyceae, Choanoflagellida and Ichthyosporea also have GT43-like genes, which are phylogenetically close to animal GT43 genes. By mining RNA sequencing (RNA-Seq) data of selected plants, we showed that CGA have evolved three major groups of GT43 genes: one orthologous to IRX14 (IRREGULAR XYLEM14), one orthologous to IRX9/IRX9L, and a third ancestral to all land plant GT43 genes. We confirmed that land plant GT43 has two major clades, A and B. In angiosperms, clade A further evolved into three subclades, and the expression and motif patterns of A3 (containing IRX9) differ considerably from those of the other two subclades, likely due to rapid evolution. Our in-depth sequence analysis contributes to the overall understanding of the early evolution of the GT43 family and could serve as an example for the study of other plant cell wall-related enzyme families.
PMCID: PMC4449007  PMID: 26023931
2.  AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees 
PLoS ONE  2014;9(6):e98844.
A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods cannot handle a large pool of sequences because of their high demand for computing resources; (2) applications analyzing a collection of gene trees prefer trees with fewer operational taxonomic units (OTUs), for instance when detecting horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we applied it to four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyltransferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, making the tool generally useful for phylogenetic studies by non-phylogeny specialists. The program is available at
PMCID: PMC4044049  PMID: 24892935
3.  Bi-Factor Analysis Based on Noise-Reduction (BIFANR): A New Algorithm for Detecting Coevolving Amino Acid Sites in Proteins 
PLoS ONE  2013;8(11):e79764.
Previous statistical analyses have shown that amino acid sites in a protein evolve in a correlated way rather than independently. Even when located far apart in the linear sequence, coevolving amino acids can be spatially adjacent in the tertiary structure and constitute specific protein sectors. Moreover, these protein sectors are independent of one another in structure, function, and even evolution. Thus, systematic studies of the protein sectors within a protein will contribute to the clarification of protein function. In this paper, we propose a new algorithm, BIFANR (Bi-factor Analysis Based on Noise-reduction), for detecting protein sectors in amino acid sequences. After applying BIFANR to the S1A and PDZ families, we carried out an internal correlation test, a statistical independence test, evolutionary rate analysis, evolutionary independence analysis, and function analysis to assess the predictions. The results showed that the amino acids within a given predicted protein sector are closely correlated in structure, function, and evolution, while different protein sectors are nearly statistically independent. The results also indicated that the protein sectors have distinct evolutionary directions. In addition, compared with other algorithms, BIFANR has higher accuracy and robustness under the influence of noise sites.
PMCID: PMC3835919  PMID: 24278175
4.  Genomic Arrangement of Regulons in Bacterial Genomes 
PLoS ONE  2012;7(1):e29496.
Regulons, as groups of transcriptionally co-regulated operons, are the basic units of cellular response systems in bacterial cells. While the concept has been widely used in bacterial studies since it was first proposed in 1964, very little is known about how a regulon's component operons are arranged in a bacterial genome. We present a computational study to elucidate the organizational principles of regulons in a bacterial genome, based on the experimentally validated regulons of E. coli and B. subtilis. Our results indicate that (1) the genomic locations of transcription factors (TFs) are under stronger evolutionary constraints than those of the operons they regulate, so changing a TF's genomic location will have a larger impact on the bacterium than changing the genomic position of any of its target operons; (2) the operons of a regulon are generally not uniformly distributed in the genome but tend to form a few closely located clusters, which generally consist of genes working in the same metabolic pathways; and (3) the global arrangement of the component operons of all the regulons in a genome tends to minimize a simple scoring function, indicating that the global arrangement of regulons follows simple organizational principles.
PMCID: PMC3250446  PMID: 22235300
5.  Evolution of Plant Nucleotide-Sugar Interconversion Enzymes 
PLoS ONE  2011;6(11):e27995.
Nucleotide-diphospho-sugars (NDP-sugars) are the building blocks of diverse polysaccharides and glycoconjugates in all organisms. In plants, 11 families of NDP-sugar interconversion enzymes (NSEs) have been identified, each of which interconverts one NDP-sugar to another. While the functions of these enzyme families have been characterized in various plants, very little is known about their evolution and origin. Our phylogenetic analyses indicate that all 11 plant NSE families are distantly related and that most of them originated from different progenitor genes, which had already diverged in ancient prokaryotes. For instance, all NSE families are found in mosses, which are lower land plants, and most of them are also found in aquatic algae, implying that these organisms had already evolved the capability of synthesizing all 11 different NDP-sugars. Particularly interesting is that the evolution of RHM (UDP-L-rhamnose synthase) manifests the fusion of genes of three enzymatic activities in early eukaryotes in a rather intriguing manner. The plant NRS/ER (nucleotide-rhamnose synthase/epimerase-reductase), on the other hand, evolved much later from the ancient plant RHMs through loss of the N-terminal domain. Based on these findings, an evolutionary model is proposed to explain the origin and evolution of the different NSE families. For instance, the UGlcAE (UDP-D-glucuronic acid 4-epimerase) family is suggested to have evolved from some chlamydial bacteria. Our data also show considerably higher sequence diversity among NSE-like genes in modern prokaryotes, consistent with the higher sugar diversity found in prokaryotes. All the NSE families are widely found in plants and algae containing carbohydrate-rich cell walls, but only sporadically in animals, fungi and other eukaryotes, which either lack cell walls or have cell walls of distinct composition. The results of this study were shown to be highly useful for identifying unknown genes for further experimental characterization to determine their functions in the synthesis of diverse glycosylated molecules.
PMCID: PMC3220709  PMID: 22125650
