1.  PSC: protein surface classification 
Nucleic Acids Research  2012;40(Web Server issue):W435-W439.
We recently proposed to classify proteins by their functional surfaces. Using the structural attributes of functional surfaces, we inferred the pairwise relationships of proteins and constructed an expandable database of protein surface classification (PSC). As the functional surface(s) of a protein is the local region where the protein performs its function, our classification may reflect the functional relationships among proteins. Currently, PSC contains a library of 1974 surface types that include 25 857 functional surfaces identified from 24 170 bound structures. The search tool in PSC empowers users to explore related surfaces that share similar local structures and core functions. Each functional surface is characterized by structural attributes, which are geometric, physicochemical or evolutionary features. The attributes have been normalized as descriptors and integrated to produce a profile for each functional surface in PSC. In addition, binding ligands are recorded for comparisons among homologs. PSC allows users to exploit related binding surfaces to reveal the changes in functionally important residues on homologs that have led to functional divergence during evolution. The substitutions at the key residues of a spatial pattern may determine the functional evolution of a protein. In PSC, a pool of changes in residues on similar functional surfaces is provided.
PMCID: PMC3394246  PMID: 22669905
2.  Increasing MicroRNA Target Prediction Confidence by the Relative R-squared Method 
Journal of theoretical biology  2009;259(4):793-798.
MicroRNAs (miRNAs) are short noncoding RNAs involved in post-transcriptional gene regulation via binding to mRNAs. Studies show that in a multicellular organism miRNAs downregulate a large number of target mRNAs. However, predicting the target genes of a miRNA is challenging. Microarray expression profiling has been proposed as a complementary method to increase the confidence of miRNA target prediction, but it can become computationally costly or even intractable when many miRNAs and their effects across multiple tissues are to be considered. Here, we propose a statistical method, the relative R² method, to find high-confidence targets among the set of potential targets predicted by a computational method such as TargetScanS or by microarray analysis, when expression data of both miRNAs and mRNAs are available for multiple tissues. Applying this method to existing data, we obtain many high-confidence targets in mouse.
PMCID: PMC2744435  PMID: 19463832
microRNA; microarray; regression model; TargetScanS
3.  Improved variance estimators for one- and two-parameter models of nucleotide substitution 
Journal of theoretical biology  2008;254(1):164-167.
The current variance estimators for Jukes and Cantor’s one-parameter model and Kimura’s two-parameter model tend to seriously underestimate the true variances when the proportion of nucleotide differences between the two sequences under study is not small. In this paper, we developed improved variance estimators, using a higher order Taylor expansion and empirical methods. The new estimators outperform the conventional estimators and provide accurate estimates of the true variances.
PMCID: PMC2580800  PMID: 18571203
substitution model; variance estimator; Taylor expansion; empirical formulas
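For orientation, the conventional estimators that this abstract refers to can be sketched as follows. This is a minimal sketch, assuming the standard textbook first-order (delta-method) forms of the Jukes-Cantor and Kimura two-parameter distances and variances; it does not reproduce the improved estimators of the paper, which the abstract does not specify. The function names are hypothetical, `p` is the proportion of differing sites, `P` and `Q` are the proportions of transitional and transversional differences, and `n` is the number of sites compared.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_variance_first_order(p, n):
    """Conventional first-order variance of the Jukes-Cantor distance."""
    return p * (1.0 - p) / (n * (1.0 - 4.0 * p / 3.0) ** 2)


def k2p_distance(P, Q):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    return -0.5 * math.log((1.0 - 2.0 * P - Q) * math.sqrt(1.0 - 2.0 * Q))


def k2p_variance_first_order(P, Q, n):
    """Conventional first-order variance of the Kimura two-parameter distance."""
    a = 1.0 / (1.0 - 2.0 * P - Q)
    b = 0.5 * (a + 1.0 / (1.0 - 2.0 * Q))
    return (a * a * P + b * b * Q - (a * P + b * Q) ** 2) / n


if __name__ == "__main__":
    # Illustrative values only (not data from the paper).
    n_sites = 500
    for p in (0.05, 0.3, 0.6):
        d = jc_distance(p)
        v = jc_variance_first_order(p, n_sites)
        print(f"p={p:.2f}  JC distance={d:.4f}  first-order variance={v:.6f}")
```

Because these formulas come from a first-order Taylor expansion, they are expected to be least reliable when `p` is large, which is the regime the abstract singles out.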
4.  Roles of cis- and trans-changes in the regulatory evolution of genes in the gluconeogenic pathway in yeast 
Molecular biology and evolution  2008;25(9):1863-1875.
The yeast Saccharomyces cerevisiae proliferates rapidly in glucose-containing media. As glucose becomes depleted, yeast cells enter the transition from fermentative to non-fermentative metabolism, known as the diauxic shift, which is associated with major changes in gene expression. To understand the expression evolution of genes involved in the diauxic shift and in non-fermentative metabolism within species, a laboratory strain (BY), a wild strain (RM), and a clinical isolate (YJM) were used in this study. Our data showed that the RM strain enters the diauxic shift ∼1 hour earlier than the BY strain with an earlier, higher induction of many key transcription factors (TFs) involved in the diauxic shift. Our sequence data revealed sequence variations between BY and RM in both coding and promoter regions of the majority of these TFs. The key TF Cat8p, a zinc-finger cluster protein, is required for the expression of many genes in gluconeogenesis under non-fermentative growth and its derepression is mediated by deactivation of Mig1p. Our kinetic study of CAT8 expression revealed that CAT8 induction corresponded to the timing of glucose depletion in both BY and RM and that CAT8 was induced by up to 50- to 90-fold in RM but only 20- to 30-fold in BY. In order to decipher the relative importance of cis- and trans-variations in expression divergence in the gluconeogenic pathway during the diauxic shift, we studied the expression levels of MIG1, CAT8, and their downstream target genes in the co-cultures and in the hybrid diploids of BY-RM, BY-YJM, and RM-YJM, and in strains with swapped promoters. Our data showed that the differences between BY and RM in the expression of MIG1, the upstream regulator of CAT8, were affected mainly by changes in cis elements, though also by changes in trans-acting factors, whereas those of CAT8 and its downstream target genes were predominantly affected by changes in trans-acting factors.
PMCID: PMC2515871  PMID: 18573843
cis-regulation; trans-regulation; diauxic shift; expression evolution
5.  Protein complexity, gene duplicability and gene dispensability in the yeast genome 
Gene  2006;387(1-2):109-117.
Using functional genomic and protein structural data we studied the effects of protein complexity (here defined as the number of subunit types in a protein) on gene dispensability and gene duplicability. We found that in terms of gene duplicability the major distinction in protein complexity is between hetero-complexes, each of which includes at least two different types of subunits (polypeptides), and homo-complexes, which include monomers and complexes that consist of only subunits of one polypeptide type. However, gene dispensability decreases only gradually as the number of subunit types in a protein complex increases. These observations suggest that the dosage balance hypothesis can explain gene duplicability of complex proteins well, but cannot completely explain the difference in dispensabilities between hetero-complex subunits. It is likely that knocking out a gene coding for a hetero-complex subunit would disrupt the function of the whole complex, so that the deletion effect on fitness would increase with protein complexity. We also found that multi-domain polypeptide genes are less dispensable but more duplicable than single domain polypeptide genes. Duplicate genes derived from the whole genome duplication event in yeast are more dispensable (except for ribosomal protein genes) than other duplicate genes. Further, we found that subunits of the same protein complex tend to have similar expression levels and similar effects of gene deletion on fitness. Finally, we estimated that in yeast the contribution of duplicate genes to genetic robustness against null mutation is ~ 9%, smaller than previously estimated. In yeast, protein complexity may serve as a better indicator of gene dispensability than do duplicate genes.
PMCID: PMC2707112  PMID: 17049186
Protein complex; Gene deletion; Fitness effect; Duplicate gene; Protein domain; Whole genome duplication
6.  MYBS: a comprehensive web server for mining transcription factor binding sites in yeast 
Nucleic Acids Research  2007;35(Web Server issue):W221-W226.
Correct interactions between transcription factors (TFs) and their binding sites (TFBSs) are of central importance to gene regulation. Recently developed chromatin-immunoprecipitation DNA chip (ChIP-chip) techniques and the phylogenetic footprinting method provide ways to identify TFBSs with high precision. In this study, we constructed a user-friendly interactive platform for dynamic binding site mapping using ChIP-chip data and phylogenetic footprinting as two filters. MYBS (Mining Yeast Binding Sites) is a comprehensive web server that integrates an array of both experimentally verified and predicted position weight matrixes (PWMs) from eleven databases, including 481 binding motif consensus sequences and 71 PWMs that correspond to 183 TFs. MYBS users can search within this platform for motif occurrences (possible binding sites) in the promoters of genes of interest via simple motif or gene queries in conjunction with the above two filters. In addition, MYBS enables users to visualize in parallel the potential regulators for a given set of genes, a feature useful for finding potential regulatory associations between TFs. MYBS also allows users to identify target gene sets of each TF pair, which could be used as a starting point for further explorations of TF combinatorial regulation. MYBS is available at
PMCID: PMC1933147  PMID: 17537814
7.  The prognostic significance of RUNX2 and miR-10a/10b and their inter-relationship in breast cancer 
The major cancer related mortality is caused by metastasis and invasion. It is important to identify genes regulating metastasis and invasion in order to curtail metastatic spread of cancer cells.
This study investigated the association between RUNX2 and miR-10a/miR-10b and the risk of breast cancer relapse. Expression levels of RUNX2 and miR-10a/b in 108 pairs of tumor and non-tumor tissue of breast cancer were assayed by quantitative PCR analysis and evaluated for their prognostic implications.
The median expression levels of RUNX2 and miR-10b in tumor tissue normalized using adjacent non-tumor tissue were significantly higher in relapsed patients than in relapse-free patients. Higher expression of each of these three genes was significantly associated with an increased hazard ratio for breast cancer recurrence (RUNX2: 3.02, 95% CI = 1.50 ~ 6.07; miR-10a: 2.31, 95% CI = 1.00 ~ 5.32; miR-10b: 3.96, 95% CI = 1.21 ~ 12.98). The joint effect of higher expression of all three genes was associated with a hazard ratio of 12.37 (95% CI = 1.62 ~ 94.55) for relapse. In a breast cancer cell line, RUNX2 silencing reduced the expression of miR-10a/b and also impaired cell motility, while RUNX2 overexpression elicited opposite effects.
These findings indicate that higher expression of RUNX2 and miR-10a/b was associated with adverse outcome of breast cancer. Expression levels of RUNX2 and miR-10a/b individually or jointly are potential prognostic factors for predicting breast cancer recurrence. Data from in vitro studies support the notion that RUNX2 promoted cell motility by upregulating miR-10a/b.
Electronic supplementary material
The online version of this article (doi:10.1186/s12967-014-0257-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4189660  PMID: 25266482
RUNX2; miR-10a; miR-10b; Breast cancer prognosis
8.  Maize and millet transcription factors annotated using comparative genomic and transcriptomic data 
BMC Genomics  2014;15(1):818.
Transcription factors (TFs) contain DNA-binding domains (DBDs) and regulate gene expression by binding to specific DNA sequences. In addition, there are proteins, called transcription coregulators (TCs), which lack DBDs but can alter gene expression through interaction with TFs or RNA Polymerase II. Therefore, it is interesting to identify and classify the TFs and TCs in a genome. In this study, maize (Zea mays) and foxtail millet (Setaria italica), two important species for the study of C4 photosynthesis and kranz anatomy, were selected.
We conducted a comprehensive genome-wide annotation of TFs and TCs in maize B73 and in two strains of foxtail millet, Zhang gu and Yugu1, and classified them into families. To gain additional support for our predictions, we searched for their homologous genes in Arabidopsis or rice and studied their gene expression level using RNA-seq and microarray data. We identified many new TF and TC families in these two species, and described some evolutionary and functional aspects of the 9 new maize TF families. Moreover, we detected many pseudogenes and transposable elements in current databases. In addition, we examined tissue expression preferences of TF and TC families and identified tissue/condition-specific TFs and TCs in maize and millet. Finally, we identified potential C4-related TF and TC genes in maize and millet.
Our results significantly expand current TF and TC annotations in maize and millet. We provided supporting evidence for our annotation from genomic and gene expression data and identified TF and TC genes with tissue preference in expression. Our study may facilitate the study of regulation of gene expression, tissue morphogenesis, and C4 photosynthesis in maize and millet. The data we generated in this study are available at
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-818) contains supplementary material, which is available to authorized users.
PMCID: PMC4189582  PMID: 25261191
Transcription factor annotation; Coregulators; Comparative genomics; Functional annotation
9.  Genomic Organization, Transcriptomic Analysis, and Functional Characterization of Avian α- and β-Keratins in Diverse Feather Forms 
Genome Biology and Evolution  2014;6(9):2258-2273.
Feathers are hallmark avian integument appendages, although they were also present on theropods. They are composed of flexible corneous materials made of α- and β-keratins, but their genomic organization and their functional roles in feathers have not been well studied. First, we made an exhaustive search of α- and β-keratin genes in the new chicken genome assembly (Galgal4). Then, using transcriptomic analysis, we studied α- and β-keratin gene expression patterns in five types of feather epidermis. The expression patterns of β-keratin genes were different in different feather types, whereas those of α-keratin genes were less variable. In addition, we obtained extensive α- and β-keratin mRNA in situ hybridization data, showing that α-keratins and β-keratins are preferentially expressed in different parts of the feather components. Together, our data suggest that feather morphological and structural diversity can largely be attributed to differential combinations of α- and β-keratin genes in different intrafeather regions and/or feather types from different body parts. The expression profiles provide new insights into the evolutionary origin and diversification of feathers. Finally, functional analysis using mutant chicken keratin forms based on those found in the human α-keratin mutation database led to abnormal phenotypes. This demonstrates that the chicken can be a convenient model for studying the molecular biology of human keratin-based diseases.
PMCID: PMC4202321  PMID: 25152353
keratin; feather; skin appendage; evolution; transcriptome; RNA-seq; chicken; zebra finch; in situ hybridization
10.  The genome and occlusion bodies of marine Penaeus monodon nudivirus (PmNV, also known as MBV and PemoNPV) suggest that it should be assigned to a new nudivirus genus that is distinct from the terrestrial nudiviruses 
BMC Genomics  2014;15(1):628.
Penaeus monodon nudivirus (PmNV) is the causative agent of spherical baculovirosis in shrimp (Penaeus monodon). This disease causes significant mortalities at the larval stage and early postlarval (PL) stage and may suppress growth and reduce survival and production in aquaculture. The nomenclature and classification status of PmNV has been changed several times due to morphological observation and phylogenetic analysis of its partial genome sequence. In this study, we therefore completed the genome sequence and constructed phylogenetic trees to clarify PmNV’s taxonomic position. To better understand the characteristics of the occlusion bodies formed by this marine occluded virus, we also compared the chemical properties of the polyhedrin produced by PmNV and the baculovirus AcMNPV (Autographa californica nucleopolyhedrovirus).
We used next generation sequencing and traditional PCR methods to obtain the complete PmNV genome sequence of 119,638 bp encoding 115 putative ORFs. Phylogenetic tree analysis showed that several PmNV genes and sequences clustered with the non-occluded nudiviruses and not with the baculoviruses. We also investigated the characteristics of PmNV polyhedrin, which is a functionally important protein and the major component of the viral OBs (occlusion bodies). We found that both recombinant PmNV polyhedrin and wild-type PmNV OBs were sensitive to acid conditions, but unlike the baculoviral OBs, they were not susceptible to alkali treatment.
From the viral genome features and phylogenetic analysis we conclude that PmNV is not a baculovirus, and that it should be assigned to the proposed Nudiviridae family with the other nudiviruses, but into a distinct new genus (Gammanudivirus).
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-628) contains supplementary material, which is available to authorized users.
PMCID: PMC4132918  PMID: 25063321
PmNV; Genome; Baculovirus; Nudivirus; OBs; Polyhedrin
11.  Systematic screening of glycosylation- and trafficking-associated gene knockouts in Saccharomyces cerevisiae identifies mutants with improved heterologous exocellulase activity and host secretion 
BMC Biotechnology  2013;13:71.
As a strong fermenter, Saccharomyces cerevisiae has the potential to be an excellent host for ethanol production by consolidated bioprocessing. For this purpose, it is necessary to transform cellulase genes into the yeast genome because it contains no cellulase genes. However, heterologous protein expression in S. cerevisiae often suffers from hyper-glycosylation and/or poor secretion. Thus, there is a need to genetically engineer the yeast to reduce its glycosylation strength and to increase its secretion ability.
Saccharomyces cerevisiae gene-knockout strains were screened for improved extracellular activity of a recombinant exocellulase (PCX) from the cellulose digesting fungus Phanerochaete chrysosporium. Knockout mutants of 47 glycosylation-related genes and 10 protein-trafficking-related genes were transformed with a PCX expression construct and screened for extracellular cellulase activity. Twelve of the screened mutants were found to have a more than 2-fold increase in extracellular PCX activity in comparison with the wild type. The extracellular PCX activities in the glycosylation-related mnn10 and pmt5 null mutants were, respectively, 6 and 4 times higher than that of the wild type; and the extracellular PCX activities in 9 protein-trafficking-related mutants, especially in the chc1, clc1 and vps21 null mutants, were at least 1.5 times higher than the parental strains. Site-directed mutagenesis studies further revealed that the degree of N-glycosylation also plays an important role in heterologous cellulase activity in S. cerevisiae.
Systematic screening of knockout mutants of glycosylation- and protein trafficking-associated genes in S. cerevisiae revealed that: (1) blocking Golgi-to-endosome transport may force S. cerevisiae to export cellulases; and (2) both over- and under-glycosylation may alter the enzyme activity of cellulases. This systematic gene-knockout screening approach may serve as a convenient means for increasing the extracellular activities of recombinant proteins expressed in S. cerevisiae.
PMCID: PMC3766678  PMID: 24004614
Cellulase production; Glycosylation; Protein secretion
12.  Evolutionary Conservation of Histone Modifications in Mammals 
Molecular Biology and Evolution  2012;29(7):1757-1767.
Histone modification is an important mechanism of gene regulation in eukaryotes. Why many histone modifications can be stably maintained in the midst of genetic and environmental changes is a fundamental question in evolutionary biology. We obtained genome-wide profiles of three histone marks, H3 lysine 4 tri-methylation (H3K4me3), H3 lysine 4 mono-methylation (H3K4me1), and H3 lysine 27 acetylation (H3K27ac), for several cell types from human and mouse. We identified histone modifications that were stable among different cell types in human and histone modifications that were evolutionarily conserved between mouse and human in the same cell type. We found that histone modifications that were stable among cell types were also likely to be conserved between species. This trend was consistently observed in promoter, intronic, and intergenic regions for all of the histone marks tested. Importantly, the trend was observed regardless of the expression breadth of the nearby gene, indicating that slow evolution of housekeeping genes was not the major reason for the correlation. These regions showed distinct genetic and epigenetic properties, such as clustered transcription factor binding sites (TFBSs), high GC content, and CTCF binding at flanking sides. Based on our observations, we proposed that TFBS clustering in or near a histone modification plays a significant role in stabilizing and conserving the histone modification because TFBS clustering promotes TFBS conservation, which in turn promotes histone modification conservation. In summary, the results of this study support the view that in mammalian genomes a common mechanism maintains histone modifications against both genetic and environmental (cellular) changes.
PMCID: PMC3375473  PMID: 22319170
histone modification; transcription factor binding site; evolution of chromatin state
13.  Genome-Wide Patterns of Genetic Variation in Two Domestic Chickens 
Genome Biology and Evolution  2013;5(7):1376-1392.
Domestic chickens are excellent models for investigating the genetic basis of phenotypic diversity, as numerous phenotypic changes in physiology, morphology, and behavior in chickens have been artificially selected. Genomic study is required to characterize genome-wide patterns of DNA variation for dissecting the genetic basis of phenotypic traits. We sequenced the genomes of the Silkie and the Taiwanese native chicken L2 at ∼23- and 25-fold average coverage depth, respectively, using Illumina sequencing. The reads were mapped onto the chicken reference genome (including 5.1% Ns) to 92.32% genome coverage for the two breeds. Using a stringent filter, we identified ∼7.6 million single-nucleotide polymorphisms (SNPs) and 8,839 copy number variations (CNVs) in the mapped regions; 42% of the SNPs have not been found in other chickens before. Among the 68,906 SNPs annotated in the chicken sequence assembly, 27,852 were nonsynonymous SNPs located in 13,537 genes. We also identified hundreds of shared and divergent structural and copy number variants in intronic and intergenic regions and in coding regions in the two breeds. Functional enrichments of identified genetic variants were discussed. Radical nsSNP-containing immunity genes were enriched in the QTL regions associated with some economic traits for both breeds. Moreover, genetic changes involved in selective sweeps were detected. From the selective sweeps identified in our two breeds, several genes associated with growth, appetite, and metabolic regulation were identified. Our study provides a framework for genetic and genomic research of domestic chickens and facilitates the use of the domestic chicken as an avian model for genomic, biomedical, and evolutionary studies.
PMCID: PMC3730349  PMID: 23814129
single nucleotide polymorphism; whole genome resequencing; genetic variation; CNVs; chicken
14.  Identifying Cis-Regulatory Changes Involved in the Evolution of Aerobic Fermentation in Yeasts 
Genome Biology and Evolution  2013;5(6):1065-1078.
Gene regulation change has long been recognized as an important mechanism for phenotypic evolution. We used the evolution of yeast aerobic fermentation as a model to explore how gene regulation has evolved and how this process has contributed to phenotypic evolution and adaptation. Most eukaryotes fully oxidize glucose to CO2 and H2O in mitochondria to maximize energy yield, whereas some yeasts, such as Saccharomyces cerevisiae and its relatives, predominantly ferment glucose into ethanol even in the presence of oxygen, a phenomenon known as aerobic fermentation. We examined the genome-wide gene expression levels among 12 different yeasts and found that a group of genes involved in the mitochondrial respiration process showed the largest reduction in gene expression level during the evolution of aerobic fermentation. Our analysis revealed that the downregulation of these genes was significantly associated with massive loss of binding motifs of Cbf1p in the fermentative yeasts. Our experimental assays confirmed the binding of Cbf1p to the predicted motif and the activator role of Cbf1p. In summary, our study laid a foundation to unravel the long-time mystery about the genetic basis of evolution of aerobic fermentation, providing new insights into understanding the role of cis-regulatory changes in phenotypic evolution.
PMCID: PMC3698916  PMID: 23650209
yeast; fermentation; cis-regulation; CBF1; evolution; gene expression
15.  Predicting the probability of H3K4me3 occupation at a base pair from the genome sequence context 
Bioinformatics  2013;29(9):1199-1205.
Motivation: Histone modifications regulate chromatin structure and gene expression. Although nucleosome formation is known to be affected by primary DNA sequence composition, no sequence signature has been identified for histone modifications. It is known that dense H3K4me3 nucleosome sites are accompanied by a low density of other nucleosomes and are associated with gene activation. This observation suggests a different sequence composition of H3K4me3 from other nucleosomes.
Approach: To understand the relationship between genome sequence and chromatin structure, we studied DNA sequences at histone modification sites in various human cell types. We found sequence specificity for H3K4me3, but not for other histone modifications. Using the sequence specificities of H3 and H3K4me3 nucleosomes, we developed a model that computes the probability of H3K4me3 occupation at each base pair from the genome sequence context.
Results: A comparison of our predictions with experimental data suggests a high performance of our method, revealing a strong association between H3K4me3 and specific genomic DNA context. The high probability of H3K4me3 occupation occurs at transcription start and termination sites, exon boundaries and binding sites of transcription regulators involved in chromatin modification activities, including histone acetylases and enhancer- and insulator-associated factors. Thus, the human genome sequence contains signatures for chromatin modifications essential for gene regulation and development. Our method may be applied to find new sequence elements functioning by chromatin modulation.
Availability: Software and supplementary data are available at Bioinformatics online.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3658463  PMID: 23511541
16.  Assembling a cellulase cocktail and a cellodextrin transporter into a yeast host for CBP ethanol production 
Many microorganisms possess enzymes that can efficiently degrade lignocellulosic materials, but do not have the capability to produce a large amount of ethanol. Thus, attempts have been made to transform such enzymes into fermentative microbes to serve as hosts for ethanol production. However, an efficient host for a consolidated bioprocess (CBP) remains to be found. For this purpose, a synthetic biology technique that can transform multiple genes into a genome is instrumental. Moreover, a strategy to select cellulases that interact synergistically is needed.
To engineer a yeast for CBP bio-ethanol production, a synthetic biology technique, called “promoter-based gene assembly and simultaneous overexpression” (PGASO), that can simultaneously transform and express multiple genes in a kefir yeast, Kluyveromyces marxianus KY3, was recently developed. To formulate an efficient cellulase cocktail, a filter-paper-activity assay for selecting heterologous cellulolytic enzymes was established in this study and used to select five cellulase genes, including two cellobiohydrolase genes, two endo-β-1,4-glucanase genes and one β-glucosidase gene, from different fungi. In addition, a fungal cellodextrin transporter gene was chosen to transport cellodextrin into the cytoplasm. These six genes plus a selection marker gene were assembled into the KY3 genome in one step using PGASO. Our experimental data showed that the recombinant strain KR7 could express the five heterologous cellulase genes and that KR7 could convert crystalline cellulose into ethanol.
Seven heterologous genes, including five cellulases, a cellodextrin transporter and a selection marker, were simultaneously transformed into the KY3 genome to derive a new strain, KR7, which could directly convert cellulose to ethanol. The present study demonstrates the potential of our strategy of combining a cocktail formulation protocol and a synthetic biology technique to develop a designer yeast host.
PMCID: PMC3599373  PMID: 23374631
Cellulosic ethanol; Crystalline cellulose; Cocktail formulation; Synthetic biology; Consolidated bioprocess
17.  Evolution of 5′ Untranslated Region Length and Gene Expression Reprogramming in Yeasts 
The sequences of the untranslated regions (UTRs) of mRNAs play important roles in posttranscriptional regulation, but whether a change in UTR length can significantly affect the regulation of gene expression is not clear. In this study, we examined the connection between UTR length and Expression Correlation with cytosolic ribosomal proteins (CRP) genes (ECC), which measures the level of expression similarity of a group of genes with CRP genes under various growth conditions. We used data from the aerobic fermentation yeast Saccharomyces cerevisiae and the aerobic respiration yeast Candida albicans. To reduce statistical fluctuations, we computed the ECC for the genes in a Gene Ontology (GO) functional group. We found that in both species, ECC is strongly correlated with the 5′ UTR length but not with the 3′ UTR length and that the 5′ UTR length is evolutionarily better conserved than the 3′ UTR length. Interestingly, we found 11 GO groups that have had a substantial increase in 5′ UTR length in the S. cerevisiae lineage and that the length increase was associated with a substantial decrease in ECC. Moreover, 9 of the 11 GO groups of genes are involved in mitochondrial respiration function, whose expression reprogramming has been shown to be a major factor for the evolution of aerobic fermentation. Finally, we found that an increase in 5′ UTR length may decrease the +1 nucleosome occupancy. This study provides a new angle to understand the role of 5′ UTR in gene expression regulation and evolution.
PMCID: PMC3245540  PMID: 21965341
UTR length; gene expression evolution; aerobic fermentation
18.  MicroRNA 3' end nucleotide modification patterns and arm selection preference in liver tissues 
BMC Systems Biology  2012;6(Suppl 2):S14.
The expression of microRNA (miRNA) genes undergoes several maturation steps. Recent studies brought new insights into the maturation process, but also raised debates on the maturation mechanism. To understand the mechanism better, we downloaded small RNA sequence reads from NCBI SRA and quantified the expression profiles of miRNAs in normal and tumor liver tissues.
From these miRNA expression profiles, we studied several issues related to miRNA biogenesis. First, the 3' ends of mature miRNAs usually carried modified nucleotides, generated from nucleotide addition or RNA editing. We found that adenine accounted for more than 50% of all miRNA 3' end modification events in all libraries. However, uracil dominated over adenine in several miRNA types. Moreover, the miRNA reads in the HBV-associated libraries have much lower rates of nucleotide modification. These results indicate that miRNA 3' end modifications are miRNA specific and may differ between normal and tumor tissues. Second, according to the hydrogen-bonding theory, the expression ratio of 5p arm to 3p arm miRNAs, derived from the same pre-miRNA, should be constant across tissues. However, a comparison of the expression profiles of the 5p arm and 3p arm miRNAs showed that one arm is preferred in the normal liver tissue whereas the other is preferred in the tumor liver tissue. In other words, different liver tissues have their own preferences in selecting which arm becomes the mature miRNA.
The results suggest that besides the traditional miRNA biogenesis theory, another mechanism may also participate in the miRNA biogenesis pathways.
PMCID: PMC3521178  PMID: 23282006
19.  Comprehensive analysis of microRNAs in breast cancer 
BMC Genomics  2012;13(Suppl 7):S18.
MicroRNAs (miRNAs) are short noncoding RNAs (approximately 22 nucleotides in length) that play important roles in breast cancer progression by downregulating gene expression. The detailed mechanisms and biological functions of miRNA molecules in breast carcinogenesis have yet to be fully elucidated. This study used bioinformatics and experimental approaches to conduct detailed analysis of the dysregulated miRNAs, arm selection preferences, 3' end modifications, and position shifts in isoforms of miRNAs (isomiRs) in breast cancer.
Next-generation sequencing (NGS) data on breast cancer was obtained from the NCBI Sequence Read Archive (SRA). The miRNA expression profiles and isomiRs in normal breast and breast tumor tissues were determined by mapping the clean reads back to human miRNAs. Differences in miRNA expression and pre-miRNA 5p/3p arm usage between normal and breast tumor tissues were further investigated using stem-loop reverse transcription and real-time polymerase chain reaction.
The analysis identified and confirmed the aberrant expression of 22 miRNAs in breast cancer. Results from pathway enrichment analysis further indicated that the aberrantly expressed miRNAs play important roles in breast carcinogenesis by regulating the mitogen-activated protein kinase (MAPK) signaling pathway. Data also indicated that the position shifts in isomiRs and 3' end modifications were consistent in breast tumor and adjacent normal tissues, and that 5p/3p arm usage of some miRNAs displayed significant preferences in breast cancer.
Analysis of NGS data combined with experimental validation showed that the expression patterns and arm selection of miRNAs vary significantly in breast cancer. These miRNA candidates have high potential to play critical roles in the progression of breast cancer and could potentially serve as targets for future therapy.
PMCID: PMC3521236  PMID: 23281739
20.  The Relationships Among MicroRNA Regulation, Intrinsically Disordered Regions, and Other Indicators of Protein Evolutionary Rate 
Molecular Biology and Evolution  2011;28(9):2513-2520.
Many indicators of protein evolutionary rate have been proposed, but some of them are interrelated. The purpose of this study is to disentangle their correlations. We assess the strength of each indicator by controlling for the other indicators under study. We find that the number of microRNA (miRNA) types that regulate a gene is the strongest rate indicator (a negative correlation), followed by disorder content (the percentage of disordered regions in a protein, a positive correlation); the strength of disorder content as a rate indicator is substantially increased after controlling for the number of miRNA types. By dividing proteins into lowly and highly intrinsically disordered proteins (L-IDPs and H-IDPs), we find that proteins interacting with more H-IDPs tend to evolve more slowly, which largely explains the previous observation of a negative correlation between the number of protein–protein interactions and evolutionary rate. Moreover, all of the indicators examined here, except for the number of miRNA types, have different strengths in L-IDPs and in H-IDPs. Finally, the number of phosphorylation sites is weakly correlated with the number of miRNA types, and its strength as a rate indicator is substantially reduced when other indicators are considered. Our study reveals the relative strength of each rate indicator and increases our understanding of protein evolution.
PMCID: PMC3163433  PMID: 21398349
protein evolution; disordered proteins; microRNA regulation; protein–protein interaction; phosphorylation
21.  PGASO: A synthetic biology tool for engineering a cellulolytic yeast 
To achieve an economical cellulosic ethanol production, a host that can do both cellulosic saccharification and ethanol fermentation is desirable. However, to engineer a non-cellulolytic yeast to be such a host requires synthetic biology techniques to transform multiple enzyme genes into its genome.
A technique named Promoter-based Gene Assembly and Simultaneous Overexpression (PGASO), which employs overlapping oligonucleotides for recombinatorial assembly of gene cassettes with individual promoters, was developed. PGASO was applied to engineer Kluyveromyces marxianus KY3, which is a thermo- and toxin-tolerant yeast. We obtained a recombinant strain, called KR5, that is capable of simultaneously expressing exoglucanase and endoglucanase (both of Trichoderma reesei), a beta-glucosidase (from a cow rumen fungus), a neomycin phosphotransferase, and a green fluorescent protein. High transformation efficiency and accuracy were achieved, as ~63% of the transformants were confirmed to be correct. KR5 can utilize beta-glycan, cellobiose or CMC as the sole carbon source for growth and can directly convert cellobiose and beta-glycan to ethanol.
This study provides the first example of multi-gene assembly in a single step in a yeast species other than Saccharomyces cerevisiae. We successfully engineered a yeast host with a five-gene cassette assembly and the new host is capable of co-expressing three types of cellulase genes. Our study shows that PGASO is an efficient tool for simultaneous expression of multiple enzymes in the kefir yeast KY3 and that KY3 can serve as a host for developing synthetic biology tools.
PMCID: PMC3462719  PMID: 22839502
Consolidated bioprocess; Synthetic biology; Yeast; Cellulolytic enzymes; Bio-ethanol
22.  The Chicken Frizzle Feather Is Due to an α-Keratin (KRT75) Mutation That Causes a Defective Rachis 
PLoS Genetics  2012;8(7):e1002748.
Feathers have complex forms and are an excellent model to study the development and evolution of morphologies. Existing chicken feather mutants are especially useful for identifying genetic determinants of feather formation. This study focused on the gene F, underlying the frizzle feather trait that has a characteristic curled feather rachis and barbs in domestic chickens. Our developmental biology studies identified defects in feather medulla formation, and physical studies revealed that the frizzle feather curls in a stepwise manner. The frizzle gene is transmitted in an autosomal incomplete dominant mode. A whole-genome linkage scan of five pedigrees with 2678 SNPs revealed association of the frizzle locus with a keratin gene-enriched region within the linkage group E22C19W28_E50C23. Sequence analyses of the keratin gene cluster identified a 69 bp in-frame deletion in a conserved region of KRT75, an α-keratin gene. Retroviral-mediated expression of the mutated F cDNA in the wild-type rectrix qualitatively changed the bending of the rachis with some features of frizzle feathers including irregular kinks, severe bending near their distal ends, and substantially higher variations among samples in comparison to normal feathers. These results confirmed KRT75 as the F gene. This study demonstrates the potential of our approach for identifying genetic determinants of feather forms.
Author Summary
With the availability of a sequenced chicken genome, the reservoir of variant plumage genes found in domestic chickens can provide insight into the molecular mechanisms underlying the diversity of feather forms. In this paper, we identify the molecular basis of the distinctive frizzle (F) feather phenotype that is caused by a single autosomal incomplete dominant gene in which heterozygous individuals show less severe phenotypes than homozygous individuals. Feathers in frizzle chickens curve backward. We used computer-assisted analysis to establish that the rachis of the frizzle feather was irregularly kinked and more severely bent than normal. Moreover, microscopic evaluation of regenerating feathers found reduced proliferating cells that give rise to the frizzle rachis. Analysis of a pedigree of frizzle chickens showed that the phenotype is linked to two single-nucleotide polymorphisms in a cluster of keratin genes within the linkage group E22C19W28_E50C23. Sequencing of the gene cluster identified a 69-base pair in-frame deletion of the protein coding sequence of the α-keratin-75 gene. Forced expression of the mutated gene in normal chickens produced a twisted rachis. Although chicken feathers are primarily composed of beta-keratins, our findings indicate that alpha-keratins have an important role in establishing the structure of feathers.
PMCID: PMC3400578  PMID: 22829773
23.  A highly efficient β-glucosidase from the buffalo rumen fungus Neocallimastix patriciarum W5 
Cellulose, which is the most abundant renewable biomass on earth, is a potential bio-resource of alternative energy. The hydrolysis of plant polysaccharides is catalyzed by microbial cellulases, including endo-β-1,4-glucanases, cellobiohydrolases, cellodextrinases, and β-glucosidases. Converting cellobiose by β-glucosidases is the key factor for reducing cellobiose inhibition and enhancing the efficiency of cellulolytic enzymes for cellulosic ethanol production.
In this study, a cDNA encoding β-glucosidase was isolated from the buffalo rumen fungus Neocallimastix patriciarum W5 and is named NpaBGS. It has a length of 2,331 bp with an open reading frame coding for a protein of 776 amino acid residues, corresponding to a theoretical molecular mass of 85.1 kDa and isoelectric point of 4.4. Two GH3 catalytic domains were found at the N and C termini of NpaBGS by sequence analysis. The cDNA was expressed in Pichia pastoris and, after protein purification, the enzyme displayed a specific activity of 34.5 U/mg against cellobiose as the substrate. Enzymatic assays showed that NpaBGS was active on short cello-oligosaccharides from various substrates. A weak activity in carboxymethyl cellulose (CMC) digestion indicated that the enzyme might also have the function of an endoglucanase. The optimal activity was detected at 40°C and pH 5–6, showing that the enzyme prefers a weakly acidic condition. Moreover, its activity could be enhanced at 50°C by adding Mg2+ or Mn2+ ions. Interestingly, in simultaneous saccharification and fermentation (SSF) experiments using Saccharomyces cerevisiae BY4741 or Kluyveromyces marxianus KY3 as the fermentation yeast, NpaBGS showed advantages in cell growth, glucose production, and ethanol production over the commercial enzyme Novo 188. Moreover, we showed that the KY3 strain engineered with the NpaBGS gene can utilize 2% dry napiergrass as the sole carbon source to produce 3.32 mg/ml ethanol when Celluclast 1.5 L was added to the SSF system.
Our characterizations of the novel β-glucosidase NpaBGS revealed that it has a preference of weak acidity for optimal yeast fermentation and an optimal temperature of ~40°C. Since NpaBGS performs better than Novo 188 under the living conditions of fermentation yeasts, it has the potential to be a suitable enzyme for SSF.
PMCID: PMC3403894  PMID: 22515264
Endoglucanase; β-glucosidase; Neocallimastix patriciarum; Rumen fungi; Simultaneous saccharification and fermentation
24.  Transcriptomes of Mouse Olfactory Epithelium Reveal Sexual Differences in Odorant Detection 
Genome Biology and Evolution  2012;4(5):703-712.
To sense numerous odorants and chemicals, animals have evolved a large number of olfactory receptor genes (Olfrs) in their genome. In particular, the house mouse has ∼1,100 genes in the Olfr gene family. This makes the mouse a good model organism to study Olfr genes and olfaction-related genes. To date, whether male and female mice possess the same ability in detecting environmental odorants is still unknown. Using next-generation sequencing technology (paired-end mRNA-seq), we detected 1,088 expressed Olfr genes in both male and female olfactory epithelium. We found that not only Olfr genes but also odorant-binding protein (Obp) genes have evolved rapidly in the mouse lineage. Interestingly, Olfr genes tend to be expressed at a higher level in males than in females, whereas the Obp genes clustered on the X chromosome show the opposite trend. These observations may imply a more efficient odorant-transporting system in females and a more active Olfr gene expression system in males. In addition, we detected the expression of two genes encoding major urinary proteins, which have been proposed to bind and transport pheromones or act as pheromones in mouse urine. This observation suggests a role of the main olfactory system (MOS) in pheromone detection, contrary to the view that only the accessory olfactory system (AOS) is involved in pheromone detection. This study suggests sexual differences in detecting environmental odorants in the MOS and demonstrates that mRNA-seq provides a powerful tool for detecting genes with low expression levels and with high sequence similarities.
PMCID: PMC3381674  PMID: 22511034
mRNA-seq; olfactory epithelium; olfactory receptor; odorant-binding protein; major urinary protein; sexual differentiation
25.  The Evolution of Aerobic Fermentation in Schizosaccharomyces pombe Was Associated with Regulatory Reprogramming but not Nucleosome Reorganization 
Molecular Biology and Evolution  2010;28(4):1407-1413.
Aerobic fermentation has evolved independently in two yeast lineages, the Saccharomyces cerevisiae and the Schizosaccharomyces pombe lineages. In the S. cerevisiae lineage, the evolution of aerobic fermentation was shown to be associated with transcriptional reprogramming of the genes involved in respiration and was recently suggested to be linked to changes in nucleosome occupancy pattern in the promoter regions of respiration-related genes. In contrast, little is known about the genetic basis for the evolution of aerobic fermentation in the Sch. pombe lineage. In particular, it is not known whether respiration-related genes in Sch. pombe have undergone transcriptional reprogramming or changes in nucleosome occupancy pattern in their promoter regions. In this study, we compared genome-wide gene expression profiles of Sch. pombe with those of S. cerevisiae and the aerobic respiration yeast Candida albicans. We found that the expression profile of respiration-related genes in Sch. pombe is similar to that of S. cerevisiae, but different from that of C. albicans, suggesting that their transcriptional regulation has been reprogrammed during the evolution of aerobic fermentation. However, we found no significant change in nucleosome organization in the promoters of respiration-related genes in Sch. pombe.
PMCID: PMC3058771  PMID: 21127171
aerobic fermentation; nucleosome organization; gene expression; Schizosaccharomyces pombe
