Results 1-15 (15)
1.  DMINDA: an integrated web server for DNA motif identification and analyses 
Nucleic Acids Research  2014;42(Web Server issue):W12-W19.
DMINDA (DNA motif identification and analyses) is an integrated web server for DNA motif identification and analyses, which is accessible at http://csbl.bmb.uga.edu/DMINDA/. This web site is freely available to all users and there is no login requirement. This server provides a suite of cis-regulatory motif analysis functions on DNA sequences, which are important to elucidation of the mechanisms of transcriptional regulation: (i) de novo motif finding for a given set of promoter sequences along with statistical scores for the predicted motifs derived based on information extracted from a control set, (ii) scanning motif instances of a query motif in provided genomic sequences, (iii) motif comparison and clustering of identified motifs, and (iv) co-occurrence analyses of query motifs in given promoter sequences. The server is powered by a backend computer cluster with over 150 computing nodes, and is particularly useful for motif prediction and analyses in prokaryotic genomes. We believe that DMINDA, as a new and comprehensive web server for cis-regulatory motif finding and analyses, will benefit the genomic research community in general and prokaryotic genome researchers in particular.
doi:10.1093/nar/gku315
PMCID: PMC4086085  PMID: 24753419
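Function (ii) in the DMINDA abstract above, scanning for instances of a query motif in genomic sequences, can be illustrated with a small position-weight-matrix scan. This is only a sketch of the general technique, not DMINDA's own algorithm: the log-odds scoring, the pseudocount, the uniform background, and the names `build_log_odds` and `scan` are all assumptions made for illustration.

```python
from math import log2

def build_log_odds(sites, pseudocount=1.0, background=0.25):
    """Log-odds (base 2) position weight matrix from equal-length aligned sites.

    Assumes uppercase A/C/G/T only and a uniform background (illustrative choice).
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        matrix.append({
            base: log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Yield (start, score) for every window scoring at or above threshold."""
    width = len(matrix)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            yield start, score

# Hypothetical binding sites and promoter fragment, for illustration only.
example_sites = ["TATAAT", "TATAAT", "TATGAT"]
example_matrix = build_log_odds(example_sites)
example_hits = list(scan("GGTATAATGG", example_matrix, threshold=0.0))
```

In practice the threshold would be chosen against a control set, as the abstract describes for its statistical scores; the value 0.0 here is arbitrary.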
2.  DOOR 2.0: presenting operons and their functions through dynamic and integrated views 
Nucleic Acids Research  2013;42(Database issue):D654-D659.
We have recently developed a new version of the DOOR operon database, DOOR 2.0, which is available online at http://csbl.bmb.uga.edu/DOOR/ and will be updated on a regular basis. DOOR 2.0 contains genome-scale operons for 2072 prokaryotes with complete genomes, three times the number of genomes covered in the previous version published in 2009. DOOR 2.0 has a number of new features, compared with its previous version, including (i) more than 250 000 transcription units, experimentally validated or computationally predicted based on RNA-seq data, providing a dynamic functional view of the underlying operons; (ii) an integrated operon-centric data resource that provides not only operons for each covered genome but also their functional and regulatory information such as their cis-regulatory binding sites for transcription initiation and termination, gene expression levels estimated based on RNA-seq data and conservation information across multiple genomes; (iii) a high-performance web service for online operon prediction on user-provided genomic sequences; (iv) an intuitive genome browser to support visualization of user-selected data; and (v) a keyword-based Google-like search engine for finding the needed information intuitively and rapidly in this database.
doi:10.1093/nar/gkt1048
PMCID: PMC3965076  PMID: 24214966
3.  Elucidation of How Cancer Cells Avoid Acidosis through Comparative Transcriptomic Data Analysis 
PLoS ONE  2013;8(8):e71177.
The rapid growth of cancer cells fueled by glycolysis produces large amounts of protons in cancer cells, which trigger mechanisms to transport them out, hence leading to increased acidity in their extracellular environments. It has been well established that the increased acidity will induce cell death of normal cells but not cancer cells. The main question we address here is how cancer cells deal with the increased acidity to avoid the activation of apoptosis. We have carried out a comparative analysis of transcriptomic data of six solid cancer types, breast, colon, liver, two lung (adenocarcinoma, squamous cell carcinoma) and prostate cancers, and proposed a model of how cancer cells utilize a few mechanisms to keep the protons outside of the cells. The model consists of a number of previously, well or partially, studied mechanisms for transporting out the excess protons, such as through the monocarboxylate transporters, V-ATPases, NHEs and the one facilitated by carbonic anhydrases. In addition we propose a new mechanism that neutralizes protons through the conversion of glutamate to γ-aminobutyrate, which consumes one proton per reaction. We hypothesize that these processes are regulated by cancer related conditions such as hypoxia and growth factors and by the pH levels, making these encoded processes not available to normal cells under acidic conditions.
doi:10.1371/journal.pone.0071177
PMCID: PMC3743895  PMID: 23967163
4.  CINPER: An Interactive Web System for Pathway Prediction for Prokaryotes 
PLoS ONE  2012;7(12):e51252.
We present a web-based network-construction system, CINPER (CSBL INteractive Pathway BuildER), to help a user build a user-specified gene network for a prokaryotic organism in an intuitive manner. CINPER builds a network model based on different types of information provided by the user and stored in the system. CINPER’s prediction process has four steps: (i) collection of template networks based on (partially) known pathways of related organism(s) from the SEED or BioCyc database and the published literature; (ii) construction of an initial network model based on the template networks using the P-Map program; (iii) expansion of the initial model, based on the association information derived from operons, protein-protein interactions, co-expression modules and phylogenetic profiles; and (iv) computational validation of the predicted models based on gene expression data. To facilitate easy application, CINPER provides an interactive visualization environment for a user to enter, search and edit relevant data and for the system to display (partial) results and prompt for additional data. Evaluation of CINPER on 17 well-studied pathways in the MetaCyc database shows that the program achieves an average recall rate of 76% and an average precision rate of 90% on the initial models; and a higher average recall rate at 87% and an average precision rate at 28% on the final models. The reduced precision rate in the final models versus the initial models reflects the reality that the final models have large numbers of novel genes that have no experimental evidence and hence are not yet collected in the MetaCyc database. To demonstrate the usefulness of this server, we have predicted an iron homeostasis gene network of Synechocystis sp. PCC6803 using the server. The predicted models along with the server can be accessed at http://csbl.bmb.uga.edu/cinper/.
doi:10.1371/journal.pone.0051252
PMCID: PMC3517448  PMID: 23236458
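The recall and precision rates quoted in the CINPER abstract above follow the usual set-based definitions, and the trade-off it reports (recall up, precision down after expansion) is easy to reproduce on toy data. The sketch below assumes a model and a reference pathway are each just a set of gene identifiers; the function name and all identifiers are hypothetical and do not come from CINPER.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of a predicted gene set against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical gene identifiers, for illustration only.
reference_pathway = {"g1", "g2", "g3", "g4"}
initial_model = {"g1", "g2", "g3"}
expanded_model = {"g1", "g2", "g3", "g4", "n1", "n2", "n3", "n4"}

initial_scores = precision_recall(initial_model, reference_pathway)
expanded_scores = precision_recall(expanded_model, reference_pathway)
```

Here the expanded model recovers every reference gene but also adds four genes absent from the reference, so its precision drops. That mirrors the abstract's point that novel predictions count against precision until they are experimentally confirmed.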
5.  The percentage of bacterial genes on leading versus lagging strands is influenced by multiple balancing forces 
Nucleic Acids Research  2012;40(17):8210-8218.
The majority of bacterial genes are located on the leading strand, and the percentage of such genes has a large variation across different bacteria. Although some explanations have been proposed, these are at most partial explanations as they cover only small percentages of the genes and do not even consider the ones biased toward the lagging strand. We have carried out a computational study on 725 bacterial genomes, aiming to elucidate other factors that may have influenced the strand location of genes in a bacterium. Our analyses suggest that (i) genes of some functional categories such as ribosome have higher preferences to be on the leading strands; (ii) genes of some functional categories such as transcription factor have higher preferences on the lagging strands; (iii) there is a balancing force that tends to keep genes from all moving to the leading and more efficient strand and (iv) the percentage of leading-strand genes in a bacterium can be accurately explained based on the numbers of genes in the functional categories outlined in (i) and (ii), genome size and gene density, indicating that these numbers implicitly contain the information about the percentage of genes on the leading versus lagging strand in a genome.
doi:10.1093/nar/gks605
PMCID: PMC3458553  PMID: 22735706
7.  dbCAN: a web resource for automated carbohydrate-active enzyme annotation 
Nucleic Acids Research  2012;40(Web Server issue):W445-W451.
Carbohydrate-active enzymes (CAZymes) are very important to the biotech industry, particularly the emerging biofuel industry because CAZymes are responsible for the synthesis, degradation and modification of all the carbohydrates on Earth. We have developed a web resource, dbCAN (http://csbl.bmb.uga.edu/dbCAN/annotate.php), to provide a capability for automated CAZyme signature domain-based annotation for any given protein data set (e.g. proteins from a newly sequenced genome) submitted to our server. To accomplish this, we have explicitly defined a signature domain for every CAZyme family, derived based on the CDD (conserved domain database) search and literature curation. We have also constructed a hidden Markov model to represent the signature domain of each CAZyme family. These CAZyme family-specific HMMs are our key contribution and the foundation for the automated CAZyme annotation.
doi:10.1093/nar/gks479
PMCID: PMC3394287  PMID: 22645317
8.  A Comparative Study of Gene-Expression Data of Basal Cell Carcinoma and Melanoma Reveals New Insights about the Two Cancers 
PLoS ONE  2012;7(1):e30750.
A comparative analysis of genome-scale transcriptomic data of two types of skin cancers, melanoma and basal cell carcinoma in comparison with other cancer types, was conducted with the aim of identifying key regulatory factors that either cause or contribute to the aggressiveness of melanoma, while basal cell carcinoma generally remains a mild disease. Multiple cancer-related pathways such as cell proliferation, apoptosis, angiogenesis, cell invasion and metastasis, are considered, but our focus is on energy metabolism, cell invasion and metastasis pathways. Our findings include the following. (a) Both types of skin cancers use both glycolysis and increased oxidative phosphorylation (electron transfer chain) for their energy supply. (b) Advanced melanoma shows substantial up-regulation of key genes involved in fatty acid metabolism (β-oxidation) and oxidative phosphorylation, with aerobic metabolism being far more efficient than anaerobic glycolysis, providing a source of the energetics necessary to support the rapid growth of this cancer. (c) While advanced melanoma is similar to pancreatic cancer in terms of the activity level of genes involved in promoting cell invasion and metastasis, the main metastatic form of basal cell carcinoma is substantially reduced in this activity, partially explaining why this cancer type has been considered as far less aggressive. Our method of using comparative analyses of transcriptomic data of multiple cancer types focused on specific pathways provides a novel and highly effective approach to cancer studies in general.
doi:10.1371/journal.pone.0030750
PMCID: PMC3266277  PMID: 22295108
9.  Integration of sequence-similarity and functional association information can overcome intrinsic problems in orthology mapping across bacterial genomes 
Nucleic Acids Research  2011;39(22):e150.
Existing methods for orthologous gene mapping suffer from two general problems: (i) they are computationally too slow and their results are difficult to interpret for automated large-scale applications when based on phylogenetic analyses; or (ii) they are too prone to making mistakes in dealing with complex situations involving horizontal gene transfers and gene fusion due to the lack of a sound basis when based on sequence similarity information. We present a novel algorithm, Global Optimization Strategy (GOST), for orthologous gene mapping through combining sequence similarity and contextual (working partners) information, using a combinatorial optimization framework. Genome-scale applications of GOST show substantial improvements over the predictions by three popular sequence similarity-based orthology mapping programs. Our analysis indicates that our algorithm overcomes the intrinsic issues faced by sequence similarity-based methods, when orthology mapping involves gene fusions and horizontal gene transfers. Our program runs as efficiently as the most efficient sequence similarity-based algorithm in the public domain. GOST is freely downloadable at http://csbl.bmb.uga.edu/~maqin/GOST.
doi:10.1093/nar/gkr766
PMCID: PMC3239196  PMID: 21965536
10.  SEAS: A System for SEED-Based Pathway Enrichment Analysis 
PLoS ONE  2011;6(7):e22556.
Pathway enrichment analysis represents a key technique for analyzing high-throughput omic data, and it can help to link individual genes or proteins found to be differentially expressed under specific conditions to well-understood biological pathways. We present here a computational tool, SEAS, for pathway enrichment analysis over a given set of genes in a specified organism against the pathways (or subsystems) in the SEED database, a popular pathway database for bacteria. SEAS maps a given set of genes of a bacterium to pathway genes covered by SEED through gene ID and/or orthology mapping, and then calculates the statistical significance of the enrichment of each relevant SEED pathway by the mapped genes. Our evaluation of SEAS indicates that the program provides highly reliable pathway mapping results and identifies more organism-specific pathways than similar existing programs. SEAS is publicly released under the GPL license agreement and freely available at http://csbl.bmb.uga.edu/~xizeng/research/seas/.
doi:10.1371/journal.pone.0022556
PMCID: PMC3142180  PMID: 21799897
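The SEAS abstract above speaks of calculating the statistical significance of the enrichment of a pathway by the mapped genes but does not name the test. A common choice for this kind of analysis is the one-sided hypergeometric test, sketched below as an assumption rather than as SEAS's actual method; the function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, pathway_size, sample_size, overlap):
    """P(X >= overlap) when sample_size genes are drawn without replacement
    from `population` genes, of which pathway_size belong to the pathway."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: a 4000-gene genome, a 40-gene pathway,
# and 100 differentially expressed genes of which 8 fall in the pathway.
example_p = hypergeom_enrichment_p(4000, 40, 100, 8)
```

When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before any pathway is reported as enriched.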
11.  KOBAS 2.0: a web server for annotation and identification of enriched pathways and diseases 
Nucleic Acids Research  2011;39(Web Server issue):W316-W322.
High-throughput experimental technologies often identify dozens to hundreds of genes related to, or changed in, a biological or pathological process. From these genes one wants to identify biological pathways that may be involved and diseases that may be implicated. Here, we report a web server, KOBAS 2.0, which annotates an input set of genes with putative pathways and disease relationships based on mapping to genes with known annotations. It allows for both ID mapping and cross-species sequence similarity mapping. It then performs statistical tests to identify statistically significantly enriched pathways and diseases. KOBAS 2.0 incorporates knowledge across 1327 species from 5 pathway databases (KEGG PATHWAY, PID, BioCyc, Reactome and Panther) and 5 human disease databases (OMIM, KEGG DISEASE, FunDO, GAD and NHGRI GWAS Catalog). KOBAS 2.0 can be accessed at http://kobas.cbi.pku.edu.cn.
doi:10.1093/nar/gkr483
PMCID: PMC3125809  PMID: 21715386
12.  Computational prediction of the osmoregulation network in Synechococcus sp. WH8102 
BMC Genomics  2010;11:291.
Background
Osmotic stress is caused by sudden changes in the impermeable solute concentration around a cell, which induces instantaneous water flow in or out of the cell to balance the concentration. Very little is known about the detailed response mechanism to osmotic stress in marine Synechococcus, one of the major oxygenic phototrophic cyanobacterial genera that contribute greatly to the global CO2 fixation.
Results
We present here a computational study of the osmoregulation network in response to hyperosmotic stress of Synechococcus sp. strain WH8102 using comparative genome analyses and computational prediction. In this study, we identified the key transporters, synthetases, signal sensor proteins and transcriptional regulator proteins, and found experimentally that, among the genes encoding these proteins, 15 showed significantly changed expression levels under a mild hyperosmotic stress.
Conclusions
From the predicted network model, we have made a number of interesting observations about WH8102. Specifically, we found that (i) the organism likely uses glycine betaine as the major osmolyte, and others such as glucosylglycerol, glucosylglycerate, trehalose, sucrose and arginine as the minor osmolytes, making it efficient and adaptable to its changing environment; and (ii) σ38, one of the seven types of σ factors, probably serves as a global regulator coordinating the osmoregulation network and the other relevant networks.
doi:10.1186/1471-2164-11-291
PMCID: PMC2874817  PMID: 20459751
13.  Genes and (Common) Pathways Underlying Drug Addiction 
PLoS Computational Biology  2008;4(1):e2.
Drug addiction is a serious worldwide problem with strong genetic and environmental influences. Different technologies have revealed a variety of genes and pathways underlying addiction; however, each individual technology can be biased and incomplete. We integrated 2,343 items of evidence from peer-reviewed publications between 1976 and 2006 linking genes and chromosome regions to addiction by single-gene strategies, microarray, proteomics, or genetic studies. We identified 1,500 human addiction-related genes and developed KARG (http://karg.cbi.pku.edu.cn), the first molecular database for addiction-related genes with extensive annotations and a friendly Web interface. We then performed a meta-analysis of 396 genes that were supported by two or more independent items of evidence to identify 18 molecular pathways that were statistically significantly enriched, covering both upstream signaling events and downstream effects. Five molecular pathways significantly enriched for all four different types of addictive drugs were identified as common pathways which may underlie shared rewarding and addictive actions, including two new ones, GnRH signaling pathway and gap junction. We connected the common pathways into a hypothetical common molecular network for addiction. We observed that fast and slow positive feedback loops were interlinked through CAMKII, which may provide clues to explain some of the irreversible features of addiction.
Author Summary
Drug addiction has become one of the most serious problems in the world. It has been estimated that genetic factors contribute to 40%–60% of the vulnerability to drug addiction, and environmental factors provide the remainder. What are the genes and pathways underlying addiction? Is there a common molecular network underlying addiction to different abusive substances? Is there any network property that may explain the long-lived and often irreversible molecular and structural changes after addiction? These important questions were traditionally studied experimentally. The explosion of genomic and proteomic data in recent years both enabled and necessitated bioinformatic studies of addiction. We integrated data derived from multiple technology platforms and collected 2,343 items of evidence linking genes and chromosome regions to addiction. We identified 18 statistically significantly enriched molecular pathways. In particular, five of them were common for four types of addictive drugs, which may underlie shared rewarding and addictive actions, including two new ones, GnRH signaling pathway and gap junction. We connected the common pathways into a hypothetical common molecular network for addiction. We observed that fast and slow positive feedback loops were interlinked through CAMKII, which may provide clues to explain some of the irreversible features of addiction.
doi:10.1371/journal.pcbi.0040002
PMCID: PMC2174978  PMID: 18179280
14.  Genome comparison using Gene Ontology (GO) with statistical testing 
BMC Bioinformatics  2006;7:374.
Background
Automated comparison of complete sets of genes encoded in two genomes can provide insight on the genetic basis of differences in biological traits between species. Gene ontology (GO) is used as a common vocabulary to annotate genes for comparison. Current approaches calculate the fold of unweighted or weighted differences between two species at the high-level GO functional categories. However, to ensure the reliability of the differences detected, it is important to evaluate their statistical significance. It is also useful to search for differences at all levels of GO.
Results
We propose a statistical approach to find reliable differences between the complete sets of genes encoded in two genomes at all levels of GO. The genes are first assigned GO terms from BLAST searches against genes with known GO assignments, and for each GO term the abundance of genes in the two genomes is compared using a chi-squared test followed by false discovery rate (FDR) correction. We applied this method to find statistically significant differences between two cyanobacteria, Synechocystis sp. PCC6803 and Anabaena sp. PCC7120. We then studied how the set of identified differences varies when different BLAST cutoffs are used. We also studied how the results vary when only subsets of the genes were used in the comparison of human vs. mouse and that of Saccharomyces cerevisiae vs. Schizosaccharomyces pombe.
Conclusion
There is a surprising lack of statistical approaches for comparing complete genomes at all levels of GO. With the rapid increase in the number of sequenced genomes, we hope that the approach we proposed and tested can make a valuable contribution to comparative genomics.
doi:10.1186/1471-2105-7-374
PMCID: PMC1569881  PMID: 16901353
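The per-term procedure described in the Results of the entry above (a chi-squared test on gene abundance in two genomes, followed by FDR correction) can be sketched as follows. Several details are assumptions not stated in the abstract: the 2x2 table layout (genes with versus without the GO term in each genome), the absence of a continuity correction, and the use of the Benjamini-Hochberg procedure for the FDR step. The function names and counts are hypothetical.

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 d.o.f.) for the table
    [[a, b], [c, d]], with no continuity correction."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, erfc(sqrt(stat / 2))  # chi-squared survival function, 1 d.o.f.

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts per GO term: (with term in A, without in A,
#                                   with term in B, without in B).
go_counts = {
    "GO:term_1": (120, 3050, 60, 6070),
    "GO:term_2": (45, 3125, 88, 6042),
    "GO:term_3": (10, 3160, 21, 6109),
}
raw_p = [chi2_2x2(*counts)[1] for counts in go_counts.values()]
adjusted_p = dict(zip(go_counts, benjamini_hochberg(raw_p)))
```

Terms whose adjusted p-value falls below a chosen FDR level would then be reported as differing between the two genomes.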
15.  KOBAS server: a web-based platform for automated annotation and pathway identification 
Nucleic Acids Research  2006;34(Web Server issue):W720-W724.
There is an increasing need to automatically annotate a set of genes or proteins (from genome sequencing, DNA microarray analysis or protein 2D gel experiments) using controlled vocabularies and identify the pathways involved, especially the statistically enriched pathways. We have previously demonstrated the KEGG Orthology (KO) as an effective alternative controlled vocabulary and developed a standalone KO-Based Annotation System (KOBAS). Here we report a KOBAS server with a friendly web-based user interface and enhanced functionalities. The server can support input by nucleotide or amino acid sequences or by sequence identifiers in popular databases and can annotate the input with KO terms and KEGG pathways by BLAST sequence similarity or by direct ID mapping to genes with known annotations. The server can then identify both frequent and statistically enriched pathways, offering the choices of four statistical tests and the option of multiple testing correction. The server also has a ‘User Space’ in which frequent users may store and manage their data and results online. We demonstrate the usability of the server by finding statistically enriched pathways in a set of upregulated genes in Alzheimer's Disease (AD) hippocampal cornu ammonis 1 (CA1). KOBAS server can be accessed at .
doi:10.1093/nar/gkl167
PMCID: PMC1538915  PMID: 16845106
