
Results 1-25 (79)

author:("Li, yixie")
1.  SysPTM 2.0: an updated systematic resource for post-translational modification 
Post-translational modifications (PTMs) of proteins play essential roles in almost all cellular processes, and are closely related to physiological activity and disease development of living organisms. The development of tandem mass spectrometry (MS/MS) has resulted in a rapid increase of PTMs identified on proteins from different species. The collection and systematic ordering of PTM data should provide invaluable information for understanding cellular processes and signaling pathways regulated by PTMs. For this purpose, in 2009 we developed SysPTM, a systematic resource offering comprehensive PTM data and a suite of web tools for annotation of PTMs. Four years later, the generation of PTM data has advanced significantly and, consequently, more sophisticated analysis requirements have to be met. Here we present SysPTM 2.0, an updated version with almost doubled data content and enhanced web-based analysis tools: PTMBlast, PTMPathway, PTMPhylog and PTMCluster. Moreover, a new section, SysPTM-H, is constructed to graphically represent the combinatorial histone PTMs and dynamic regulation of histone modifying enzymes, and a new tool, PTMGO, is added for functional annotation and enrichment analysis. SysPTM 2.0 not only facilitates resourceful annotation of PTM sites but also allows systematic investigation of PTM functions by the user. Citation details: Li,J., Jia,J., Li,H. et al. SysPTM 2.0: an updated systematic resource for post-translational modification. Database (2014) Vol. 2014: article ID bau025; doi:10.1093/database/bau025.
Database URL:
PMCID: PMC3975108
2.  LAceP: Lysine Acetylation Site Prediction Using Logistic Regression Classifiers 
PLoS ONE  2014;9(2):e89575.
Lysine acetylation is a crucial type of protein post-translational modification, which is involved in many important cellular processes and serious diseases. However, identification of protein acetylated sites through traditional experimental methods is time-consuming and laborious. Those methods are not suitable for identifying a large number of acetylated sites quickly. Therefore, computational methods are still very valuable for accelerating the discovery of lysine acetylated sites.
In this study, many biological characteristics of acetylated sites have been investigated, such as the amino acid sequence around the acetylated sites, the physicochemical properties of the amino acids and the transition probability of adjacent amino acids. A logistic regression method was then used to integrate this information into a novel lysine acetylation prediction system named LAceP. When compared with existing methods, LAceP outperforms most state-of-the-art methods. In particular, LAceP has a more balanced prediction capability for positive and negative datasets.
LAceP can integrate different biological features to predict lysine acetylation with high accuracy. An online web server is freely available at
PMCID: PMC3930742  PMID: 24586884
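The abstract above describes combining sequence-derived features with logistic regression. The following is only a minimal sketch of that general idea, not the LAceP implementation: it assumes a one-hot encoding of a five-residue window as the sole feature set (the physicochemical and transition-probability features are omitted), fits the model by plain gradient descent, and uses made-up toy windows. All function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot_window(window):
    """Encode a peptide window as a flat 0/1 vector (20 slots per position)."""
    vec = []
    for residue in window:
        slot = [0.0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:  # unknown residues stay all-zero
            slot[idx] = 1.0
        vec.extend(slot)
    return vec

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, features):
    """Probability that the central lysine is a positive site."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train(samples, labels, epochs=200, lr=0.5):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n_features = len(samples[0])
    n = len(samples)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict(weights, bias, x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Toy, made-up windows centred on a lysine (K); not real acetylation data.
positives = ["GGKGA", "AGKGG", "GAKGA"]
negatives = ["LLKLE", "ELKLL", "LEKEL"]
X = [one_hot_window(w) for w in positives + negatives]
y = [1.0] * len(positives) + [0.0] * len(negatives)

weights, bias = train(X, y)
p_pos = predict(weights, bias, one_hot_window("GGKGG"))
p_neg = predict(weights, bias, one_hot_window("LLKLL"))
```

In practice a tested library implementation of logistic regression, with regularization and cross-validation, would be a more sensible choice than this hand-rolled loop.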
3.  A Genome-Wide Association Study Identifies GRK5 and RASGRP1 as Type 2 Diabetes Loci in Chinese Hans 
Diabetes  2012;62(1):291-298.
Substantial progress has been made in identification of type 2 diabetes (T2D) risk loci in the past few years, but our understanding of the genetic basis of T2D in ethnically diverse populations remains limited. We performed a genome-wide association study and a replication study in Chinese Hans comprising 8,569 T2D case subjects and 8,923 control subjects in total, from which 10 single nucleotide polymorphisms were selected for further follow-up in a de novo replication sample of 3,410 T2D case and 3,412 control subjects and an in silico replication sample of 6,952 T2D case and 11,865 control subjects. Besides confirming seven established T2D loci (CDKAL1, CDKN2A/B, KCNQ1, CDC123, GLIS3, HNF1B, and DUSP9) at genome-wide significance, we identified two novel T2D loci, including G-protein–coupled receptor kinase 5 (GRK5) (rs10886471: P = 7.1 × 10^−9) and RASGRP1 (rs7403531: P = 3.9 × 10^−9), of which the association signal at GRK5 seems to be specific to East Asians. In nondiabetic individuals, the T2D risk-increasing allele of RASGRP1-rs7403531 was also associated with higher HbA1c and lower homeostasis model assessment of β-cell function (P = 0.03 and 0.0209, respectively), whereas the T2D risk-increasing allele of GRK5-rs10886471 was also associated with higher fasting insulin (P = 0.0169) but not with fasting glucose. Our findings not only provide new insights into the pathophysiology of T2D, but may also shed light on the ethnic differences in T2D susceptibility.
PMCID: PMC3526061  PMID: 22961080
4.  Network Analysis Reveals Functional Cross-links between Disease and Inflammation Genes 
Scientific Reports  2013;3:3426.
Connections between inflammation and diseases are thought to be important in understanding the genetic mechanisms of diseases. However, studies on the functional cross-links between inflammation and disease genes are still in their early stages. We integrated protein–protein interaction (PPI) data, inflammation genes, and gene–disease associations to construct a disease-inflammation network (DIN). We found that nodes that are both inflammation and disease genes (namely inter-genes) are topologically important in the DIN structure. By mapping inter-genes to the PPI network, we classified diseases into two categories, which differ significantly in Intimacy, a measure of the contribution of inflammation genes to the connections between disease pairs. Furthermore, we constructed a cross-talking subpathways network. The cross-subpathway analysis performs well in capturing higher-level relationships among inflammation and disease processes. Collectively, the network-based analysis provides promising insight into the intricate relationship between inflammation and disease genes.
PMCID: PMC3851881  PMID: 24305783
5.  Epitope Predictions Indicate the Presence of Two Distinct Types of Epitope-Antibody-Reactivities Determined by Epitope Profiling of Intravenous Immunoglobulins 
PLoS ONE  2013;8(11):e78605.
Epitope-antibody-reactivities (EAR) of intravenous immunoglobulins (IVIGs) determined for 75,534 peptides by microarray analysis demonstrate that roughly 9% of peptides derived from 870 different human protein sequences react with antibodies present in IVIG. Computational prediction of linear B cell epitopes was conducted using machine learning with an ensemble of classifiers in combination with position weight matrix (PWM) analysis. Machine learning slightly outperformed PWM with area under the curve (AUC) of 0.884 vs. 0.849. Two different types of epitope-antibody recognition-modes (Type I EAR and Type II EAR) were found. Peptides of Type I EAR are high in tyrosine, tryptophan and phenylalanine, and low in asparagine, glutamine and glutamic acid residues, whereas for peptides of Type II EAR it is the other way around. Representative crystal structures present in the Protein Data Bank (PDB) of Type I EAR are PDB 1TZI and PDB 2DD8, while PDB 2FD6 and 2J4W are typical for Type II EAR. Type I EAR peptides share predicted propensities for being presented by MHC class I and class II complexes. The latter interaction possibly favors T cell-dependent antibody responses including IgG class switching. Peptides of Type II EAR are predicted not to be preferentially presented by MHC complexes, thus implying the involvement of T cell-independent IgG class switch mechanisms. The high extent of IgG immunoglobulin reactivity with human peptides implies that circulating IgG molecules are prone to bind to human protein/peptide structures under non-pathological, non-inflammatory conditions. A webserver for predicting EAR of peptide sequences is available at
PMCID: PMC3823795  PMID: 24244326
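The abstract above mentions position weight matrix (PWM) analysis of peptides. As a rough illustration of how a PWM scores a sequence, and not the authors' pipeline, the sketch below builds log-odds scores from a handful of made-up aligned peptides. The uniform background, the pseudocount of 1, and the function names are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(peptides, pseudocount=1.0):
    """Return per-position log2-odds scores against a uniform background."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1.0
        total = sum(counts.values())
        pwm.append({aa: math.log2((c / total) / background)
                    for aa, c in counts.items()})
    return pwm

def score(pwm, peptide):
    """Sum the log-odds of each residue at its position."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))

# Toy, made-up aligned peptides rich in Y, W and F; not data from the study.
reactive = ["YWFA", "YWYA", "FWFA", "YFFA"]
pwm = build_pwm(reactive)
high = score(pwm, "YWFA")  # resembles the training peptides
low = score(pwm, "NQEN")   # built from residues never seen in training
```

A positive total means the peptide looks more like the training set than like the background; a negative total means the opposite.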
6.  A Quantitative Analysis of the Impact on Chromatin Accessibility by Histone Modifications and Binding of Transcription Factors in DNase I Hypersensitive Sites 
BioMed Research International  2013;2013:914971.
It is known that chromatin features such as histone modifications and the binding of transcription factors exert a significant impact on the “openness” of chromatin. In this study, we present a quantitative analysis of the genome-wide relationship between chromatin features and chromatin accessibility in DNase I hypersensitive sites. We found that these features show a distinct preference for localizing in open chromatin. In order to elucidate the exact impact, we derived quantitative models to directly predict the “openness” of chromatin using histone modification features and transcription factor binding features, respectively. We show that these two types of features are highly predictive of chromatin accessibility from a statistical viewpoint. Moreover, our results indicate that these features are highly redundant and that only a small number of features are needed to achieve very high predictive power. Our study provides new insights into the true biological phenomena and the combinatorial effects of chromatin features on differential DNase I hypersensitivity.
PMCID: PMC3819824  PMID: 24236298
7.  The Schistosoma japonicum genome reveals features of host-parasite interplay 
Nature  2009;460(7253):345-351.
Schistosoma japonicum is a parasitic flatworm that causes human schistosomiasis, a significant cause of morbidity in China and the Philippines. Here we present a draft genomic sequence for the worm, which is the first reported for any flatworm, indeed for the superphylum Lophotrochozoa. The genome provides a global insight into the molecular architecture and host interaction of this complex metazoan pathogen, revealing that it can exploit host nutrients, neuroendocrine hormones and signaling pathways for growth, development and maturation. Having a complex nervous system and a well developed sensory system, S. japonicum can accept stimulation of the corresponding ligands as a physiological response to different environments, such as fresh water or the tissues of its intermediate and mammalian hosts. Numerous proteinases, including cercarial elastase, are implicated in mammalian skin penetration and haemoglobin degradation. The genomic information will serve as a valuable platform to facilitate development of new interventions for schistosomiasis control.
PMCID: PMC3747554  PMID: 19606140
8.  Effects of smoking and smoking cessation on human serum metabolite profile: results from the KORA cohort study 
BMC Medicine  2013;11:60.
Metabolomics helps to identify links between environmental exposures and intermediate biomarkers of disturbed pathways. We previously reported variations in phosphatidylcholines in male smokers compared with non-smokers in a cross-sectional pilot study with a small sample size, but knowledge of the reversibility of smoking effects on metabolite profiles is limited. Here, we extend our metabolomics study with a large prospective study including female smokers and quitters.
Using a targeted metabolomics approach, we quantified 140 metabolite concentrations for 1,241 fasting serum samples in the population-based Cooperative Health Research in the Region of Augsburg (KORA) human cohort at two time points: a baseline survey conducted between 1999 and 2001 and a follow-up after seven years. Metabolite profiles were compared among groups of current smokers, former smokers and never smokers, and were further assessed for their reversibility after smoking cessation. Changes in metabolite concentrations from baseline to follow-up were investigated in a longitudinal analysis comparing current smokers, never smokers and smoking quitters, who were current smokers at baseline but former smokers by the time of follow-up. In addition, we constructed protein-metabolite networks with smoking-related genes and metabolites.
We identified 21 smoking-related metabolites in the baseline investigation (18 in men and six in women, with three overlaps) enriched in amino acid and lipid pathways, which were significantly different between current smokers and never smokers. Moreover, 19 out of the 21 metabolites were found to be reversible in former smokers. In the follow-up study, 13 reversible metabolites in men were measured, of which 10 were confirmed to be reversible in male quitters. Protein-metabolite networks are proposed to explain the consistent reversibility of smoking effects on metabolites.
We showed that smoking-related changes in human serum metabolites are reversible after smoking cessation, consistent with the known cardiovascular risk reduction. The metabolites identified may serve as potential biomarkers to evaluate the status of smoking cessation and characterize smoking-related diseases.
PMCID: PMC3653729  PMID: 23497222
metabolic network; metabolomics; molecular epidemiology; smoking; smoking cessation
9.  Genome Sequence of Bacillus licheniformis CGMCC3963, a Stress-Resistant Strain Isolated in a Chinese Traditional Solid-State Liquor-Making Process 
Genome Announcements  2013;1(1):e00060-12.
Bacillus licheniformis CGMCC3963 is an important mao-tai flavor-producing strain. It was isolated from the starter (Daqu) of a Chinese mao-tai-flavor liquor fermentation process with solid-state fermentation. We report its genome of 4,525,096 bp here. Many potential insertion genes that are responsible for the unique properties of B. licheniformis CGMCC3963 in mao-tai-flavor liquor production were identified.
PMCID: PMC3569315  PMID: 23405325
10.  Evolutionary relationships between miRNA genes and their activity 
BMC Genomics  2012;13:718.
The emergence of vertebrates is characterized by a strong increase in miRNA families. MicroRNAs interact broadly with many transcripts, and the evolution of such a system is intriguing. However, evolutionary questions concerning the origin of miRNA genes and their subsequent evolution remain unexplained.
In order to systematically understand the evolutionary relationship between miRNA genes and their function, we classified known human miRNAs into eight groups based on their evolutionary ages estimated by the maximum parsimony method. New miRNA genes with new functional sequences accumulated more dynamically in vertebrates than in Drosophila. Different levels of evolutionary selection were observed over miRNA gene sequences with different times of origin. Most genic miRNAs differ from their host genes in time of origin: there is no particular relationship between the age of a miRNA and the age of its host gene, and genic miRNAs are mostly younger than the corresponding host genes. MicroRNAs that originated over different time-scales are often predicted or verified to target the same or overlapping sets of genes, opening the possibility of substantial functional redundancy among miRNAs of different ages. A higher degree of tissue specificity and a lower expression level were found in young miRNAs.
Our data showed that, compared with protein coding genes, miRNA genes are more dynamic in terms of emergence and decay. Evolution patterns are quite different between miRNAs of different ages. MicroRNA activity is under tight control, with well-regulated expression increased and targeting decreased over time. Our work calls attention to the study of miRNA activity with a consideration of their time of origin.
PMCID: PMC3544654  PMID: 23259970
11.  Optimizing hybrid assembly of next-generation sequence data from Enterococcus faecium: a microbe with highly divergent genome 
BMC Systems Biology  2012;6(Suppl 3):S21.
Sequencing of bacterial genomes has become an essential approach to studying pathogen virulence and the phylogenetic relationships among closely related strains. The bacterium Enterococcus faecium has emerged as an important nosocomial pathogen that is often associated with resistance to common antibiotics in hospitals. With its highly divergent gene content, it presents a challenge to next generation sequencing (NGS) technologies, which feature high throughput and shorter read lengths. This study was designed to investigate the properties and systematic biases of NGS technologies and evaluate critical parameters influencing the outcomes of hybrid assemblies using combinations of NGS data.
A hospital strain of E. faecium was sequenced using three different NGS platforms: 454 GS-FLX, Illumina GAIIx, and ABI SOLiD4.0, to approximately 28-, 500-, and 400-fold coverage depth. We built a pipeline that merged contigs from each NGS data set into hybrid assemblies. The results revealed that each single NGS assembly had a ceiling in continuity that could not be overcome by simply increasing data coverage depth. Each NGS technology displayed some intrinsic properties, such as base calling error and systematic bias. The gaps and low coverage regions of each NGS assembly were associated with lower GC content. In order to optimize the hybrid assembly approach, we tested varying amounts and different combinations of NGS data, and obtained optimal conditions for assembly continuity. We also showed, for the first time, that SOLiD data could help make much improved assemblies of the E. faecium genome using the hybrid approach when combined with other types of NGS data.
The current study addressed the difficult issue of how to most effectively construct a complete microbial genome using today's state of the art sequencing technologies. We characterized the sequence data and genome assembly from each NGS technology, tested conditions for hybrid assembly with combinations of NGS data, and obtained optimized parameters for achieving the most cost-efficient assembly. Our study helped form some guidelines to direct genomic work on other microorganisms, and thus has important practical implications.
PMCID: PMC3524012  PMID: 23282199
12.  A cross-species analysis method to analyze animal models' similarity to human's disease state 
BMC Systems Biology  2012;6(Suppl 3):S18.
Animal models are indispensable tools in studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess drug molecules' efficiency in drug research. We therefore hoped to show that it is feasible and useful to compare gene expression profiles across species in studies of pathology, toxicology, drug repositioning, and drug action mechanisms.
We developed a cross-species analysis method to analyze animal models' similarity to human diseases and effectiveness in drug research by utilizing the existing animal gene expression data in the public database, and mined some meaningful information to help drug research, such as potential drug candidates, possible drug repositioning, side effects and analysis in pharmacology. New animal models could be evaluated by our method before they are used in drug discovery.
We applied the method to several cases of known animal model expression profiles and obtained some useful information to help drug research. We found that trichostatin A and some other HDAC inhibitors could have very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, while the mouse diabetes drug model might have some limitations. The transgenic mouse model of Alzheimer's disease was a useful model, and we analyzed in depth the biological mechanisms of some drugs in this case. In addition, all the cases could provide some ideas for drug discovery and drug repositioning.
We developed a new cross-species gene expression module comparison method to use animal models' expression data to analyse the effectiveness of animal models in drug research. Moreover, through data integration, our method could be applied for drug research, such as potential drug candidates, possible drug repositioning, side effects and information about pharmacology.
PMCID: PMC3524072  PMID: 23282076
13.  An improved dimensionality reduction method for meta-transcriptome indexing based diseases classification 
BMC Systems Biology  2012;6(Suppl 3):S12.
Bacterial 16S ribosomal RNA profiling has been widely used in the classification of microbiota-associated diseases. Dimensionality reduction is among the keys to mining high-dimensional 16S rRNA expression data. High levels of sparsity and redundancy are common in 16S rRNA gene microbial surveys. Traditional feature selection methods are generally restricted to measuring correlated abundances, and are limited in discrimination when so few microbes are actually shared across communities.
Here we present a Feature Merging and Selection algorithm (FMS) to deal with 16S rRNA expression data. By integrating the Linear Discriminant Analysis method, FMS can reduce the feature dimension with higher accuracy while preserving the relationships between different features. Two 16S rRNA expression datasets from pneumonia and dental decay patients were used to test the validity of the algorithm. Combined with SVM, FMS discriminated different classes of both pneumonia and dental caries better than other popular feature selection methods.
FMS projects data into a lower dimension while preserving enough features, and thus improves the intelligibility of the result. The results showed that FMS is a more valid and reliable method for feature reduction.
PMCID: PMC3524076  PMID: 23281712
14.  Differential combinatorial regulatory network analysis related to venous metastasis of hepatocellular carcinoma 
BMC Genomics  2012;13(Suppl 8):S14.
Hepatocellular carcinoma (HCC) is one of the most fatal cancers in the world, and metastasis is a significant cause of the high mortality in patients with HCC. However, the molecular mechanism behind HCC metastasis is not fully understood. The study of regulatory networks may help investigate HCC metastasis by means of systems biology profiling.
By utilizing both sequence information and parallel microRNA(miRNA) and mRNA expression data on the same cohort of HBV related HCC patients without or with venous metastasis, we constructed combinatorial regulatory networks of non-metastatic and metastatic HCC which contain transcription factor(TF) regulation and miRNA regulation. Differential regulation patterns, classifying marker modules, and key regulatory miRNAs were analyzed by comparing non-metastatic and metastatic networks.
Globally, TFs accounted for the main part of regulation and miRNAs for the minor part. However, miRNAs displayed a more active role in the metastatic network than in the non-metastatic one. Seventeen differential regulatory modules discriminative of the metastatic status were identified as a cumulative-module classifier, which could also distinguish survival time. MiR-16, miR-30a, Let-7e and miR-204 were identified as key miRNA regulators contributing to HCC metastasis.
In this work we demonstrated an integrative approach to conducting differential combinatorial regulatory network analysis in the specific context of venous metastasis of HBV-HCC. Our results proposed possible transcriptional regulatory patterns underlying the different metastatic subgroups of HCC. The workflow in this study can be applied in similar contexts of cancer research and could also be extended to other clinical topics.
PMCID: PMC3535701  PMID: 23282077
15.  Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics 
BMC Genomics  2012;13(Suppl 8):S19.
Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in a genome sequence. However, the accuracy of previous border-finding methods based on entropic segmentation still needs to be improved.
In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets.
Compared with the previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences.
This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared with the A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences.
PMCID: PMC3535712  PMID: 23282225
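The entry above describes entropic segmentation based on Jensen-Rényi divergence. To illustrate the core step only, the sketch below scans a sequence for the single cut that maximizes the length-weighted Jensen-Rényi divergence between the two sides. It is a simplified illustration under stated assumptions, not the published method: it uses the plain four-letter nucleotide alphabet instead of the 22-symbol alphabet, a Rényi order of 0.5 chosen arbitrarily, no recursion, and no significance test. Function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def renyi_entropy(probs, alpha=0.5):
    """Rényi entropy of order alpha (alpha != 1), in nats."""
    return math.log(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)

def frequencies(segment):
    """Relative frequency of each alphabet symbol in the segment."""
    counts = Counter(segment)
    return [counts[s] / len(segment) for s in ALPHABET]

def jensen_renyi(left, right, alpha=0.5):
    """Length-weighted Jensen-Rényi divergence between two segments."""
    n = len(left) + len(right)
    w_left, w_right = len(left) / n, len(right) / n
    p_left, p_right = frequencies(left), frequencies(right)
    mixture = [w_left * a + w_right * b for a, b in zip(p_left, p_right)]
    return (renyi_entropy(mixture, alpha)
            - w_left * renyi_entropy(p_left, alpha)
            - w_right * renyi_entropy(p_right, alpha))

def best_cut(sequence, alpha=0.5):
    """Return (position, divergence) of the cut with maximal divergence."""
    best_pos, best_div = None, -1.0
    for pos in range(1, len(sequence)):
        div = jensen_renyi(sequence[:pos], sequence[pos:], alpha)
        if div > best_div:
            best_pos, best_div = pos, div
    return best_pos, best_div

# Toy sequence with an obvious compositional border at position 50.
sequence = "A" * 50 + "C" * 50
cut, divergence = best_cut(sequence)
```

A recursive version would repeat the search on each side of an accepted cut and stop when the best divergence no longer passes a chosen significance threshold.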
16.  Systematic Comparison of C3 and C4 Plants Based on Metabolic Network Analysis 
BMC Systems Biology  2012;6(Suppl 2):S9.
The C4 photosynthetic cycle supercharges photosynthesis by concentrating CO2 around ribulose-1,5-bisphosphate carboxylase and significantly reduces the oxygenation reaction. Therefore engineering C4 features into C3 plants has been suggested as a feasible way to increase photosynthesis and yield of C3 plants, such as rice, wheat, and potato. To identify the possible transition from C3 to C4 plants, a systematic comparison of C3 and C4 metabolism is necessary.
We compared C3 and C4 metabolic networks using the improved constraint-based models for Arabidopsis and maize. Using graph theory, we found that the C3 network exhibits a denser topology than the C4 network. The simulation of enzyme knockouts demonstrated that both C3 and C4 networks are very robust, especially when optimizing CO2 fixation. Moreover, the C4 plant is more robust regardless of whether the objective function is biomass synthesis or CO2 fixation. In addition, all the essential reactions in the C3 network are also essential for C4, while some other reactions are specifically essential for C4, which confirms that the basic metabolism of the C4 plant is similar to that of C3 but more complex. We also identified more correlated reaction sets in C4, and demonstrated that C4 plants have better modularity than C3 plants, with complex mechanisms coordinating the reactions and pathways. We also found that biomass production and CO2 fixation increase faster with light intensity and CO2 concentration in C4 than in C3, which reflects more efficient use of light and CO2 in the C4 plant. Finally, we explored the contribution of different C4 subtypes to biomass production by setting specific constraints.
All results are consistent with the actual situation, which indicates that Flux Balance Analysis is a powerful method to study plant metabolism at the systems level. We demonstrated that, in contrast to C3, C4 plants have less dense topology, higher robustness, better modularity, and higher CO2 and radiation use efficiency. In addition, preliminary analysis indicated that the rates of CO2 fixation and biomass production in the PCK subtype are superior to those in the NADP-ME and NAD-ME subtypes under a sufficient supply of water and nitrogen.
PMCID: PMC3521184  PMID: 23281598
17.  SchistoDB: an updated genome resource for the three key schistosomes of humans 
Nucleic Acids Research  2012;41(D1):D728-D731.
The new release of SchistoDB provides a rich resource of genomic data for key blood flukes (genus Schistosoma) which cause disease in hundreds of millions of people worldwide. SchistoDB integrates whole-genome sequence and annotation of three species of the genus and provides enhanced bioinformatics analyses and data-mining tools. A simple, yet comprehensive web interface provided through the Strategies Web Development Kit is available for the mining and visualization of the data. Genomic scale data can be queried based on BLAST searches, annotation keywords and gene ID searches, gene ontology terms, sequence motifs, protein characteristics and phylogenetic relationships. Search strategies can be saved within a user’s profile for future retrieval and may also be shared with other researchers using a unique web address.
PMCID: PMC3531198  PMID: 23161692
18.  Novel biomarkers for pre-diabetes identified by metabolomics 
A targeted metabolomics approach was used to identify candidate biomarkers of pre-diabetes. The relevance of the identified metabolites is further corroborated with a protein-metabolite interaction network and gene expression data.
Three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine C2) were found with significantly altered levels in pre-diabetic individuals compared with normal controls. Lower levels of glycine and LPC (18:2) were found to predict risks for pre-diabetes and type 2 diabetes (T2D). Seven T2D-related genes (PPARG, TCF7L2, HNF1A, GCK, IGF1, IRS1 and IDE) are functionally associated with the three identified metabolites. The unique combination of methodologies, including prospective population-based and nested case–control, as well as cross-sectional studies, was essential for the identification of the reported biomarkers.
Type 2 diabetes (T2D) can be prevented in pre-diabetic individuals with impaired glucose tolerance (IGT). Here, we have used a metabolomics approach to identify candidate biomarkers of pre-diabetes. We quantified 140 metabolites for 4297 fasting serum samples in the population-based Cooperative Health Research in the Region of Augsburg (KORA) cohort. Our study revealed significant metabolic variation in pre-diabetic individuals that is distinct from known diabetes risk indicators, such as glycosylated hemoglobin levels, fasting glucose and insulin. We identified three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine) that had significantly altered levels in IGT individuals as compared to those with normal glucose tolerance, with P-values ranging from 2.4 × 10^−4 to 2.1 × 10^−13. Lower levels of glycine and LPC were found to be predictors not only for IGT but also for T2D, and were independently confirmed in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort. Using metabolite–protein network analysis, we identified seven T2D-related genes that are associated with these three IGT-specific metabolites by multiple interactions with four enzymes. The expression levels of these enzymes correlate with changes in the metabolite concentrations linked to diabetes. Our results may help develop novel strategies to prevent T2D.
PMCID: PMC3472689  PMID: 23010998
early diagnostic biomarkers; IGT; metabolomics; prediction; T2D
19.  Towards biological characters of interactions between transcription factors and their DNA targets in mammals 
BMC Genomics  2012;13:388.
In the post-genomic era, the study of transcriptional regulation is pivotal to decoding genetic information. Transcription factors (TFs) are central proteins for transcriptional regulation, and interactions between TFs and their DNA targets (TFBSs) are important for the expression of downstream genes. However, the lack of knowledge about interactions between TFs and TFBSs still hampers investigation of the mechanism of transcription.
To expand the knowledge about interactions between TFs and TFBSs, three biological features (sequence feature, structure feature, and evolution feature) were used to build TFBS identification models for studying binding preference between TFs and their DNA targets in mammals. Results show that each feature performs fairly well in capturing TFBSs, and that the hybrid model combining all three features is more robust for TFBS identification. Subsequently, correspondence between TFs and their TFBSs was investigated to explore interactions among them in mammals. Results indicate that TFs and TFBSs are reciprocal at the sequence, structure, and evolution levels.
Our work demonstrates that, to some extent, TFs and TFBSs have developed a coevolutionary relationship in order to keep their physical binding and maintain their regulatory functions. In summary, our work will help understand transcriptional regulation and interpret binding mechanism between proteins and DNAs.
PMCID: PMC3472306  PMID: 22888987
20.  Comparative proteogenomic analysis of the Leptospira interrogans virulence-attenuated strain IPAV against the pathogenic strain 56601 
Cell Research  2011;21(8):1210-1229.
The virulence-attenuated Leptospira interrogans serovar Lai strain IPAV was derived by prolonged laboratory passage from a highly virulent ancestral strain isolated in China. We studied the genetic variations of IPAV that render it avirulent via comparative analysis against the pathogenic L. interrogans serovar Lai strain 56601. The complete genome sequence of the IPAV strain was determined and used to compare with, and then rectify and reannotate the genome sequence of strain 56601. Aside from their highly similar genomic structure and gene order, a total of 33 insertions, 53 deletions and 301 single-nucleotide variations (SNVs) were detected throughout the genome of IPAV directly affecting 101 genes, either in their 5′ upstream region or within their coding region. Among them, the majority of the 44 functional genes are involved in signal transduction, stress response, transmembrane transport and nitrogen metabolism. Comparative proteomic analysis based on quantitative liquid chromatography (LC)-MS/MS data revealed that among 1 627 selected pairs of orthologs, 174 genes in the IPAV strain were upregulated, with enrichment mainly in classes of energy production and lipid metabolism. In contrast, 228 genes in strain 56601 were upregulated, with the majority enriched in the categories of protein translation and DNA replication/repair. The combination of genomic and proteomic approaches illustrated that altered expression or mutations in critical genes, such as those encoding a Ser/Thr kinase, carbon-starvation protein CstA, glutamine synthetase, GTP-binding protein BipA, ribonucleotide-diphosphate reductase and phosphate transporter, and alterations in the translational profile of lipoproteins or outer membrane proteins are likely to account for the virulence attenuation in strain IPAV.
PMCID: PMC3193473  PMID: 21423275
genome; proteome; Leptospira; virulence; mutation
21.  SyStemCell: A Database Populated with Multiple Levels of Experimental Data from Stem Cell Differentiation Research 
PLoS ONE  2012;7(7):e35230.
Elucidation of the mechanisms of stem cell differentiation is of great scientific interest. Increasing evidence suggests that stem cell differentiation involves changes at multiple levels of biological regulation, which together orchestrate the complex differentiation process; many related studies have been performed to investigate the various levels of regulation. The resulting valuable data, however, remain scattered. Most of the current stem cell-relevant databases focus on a single level of regulation (mRNA expression) from limited stem cell types; thus, a unifying resource would be of great value to compile the multiple levels of research data available. Here we present a database for this purpose, SyStemCell, deposited with multi-level experimental data from stem cell research. The database currently covers seven levels of stem cell differentiation-associated regulatory mechanisms, including DNA CpG 5-hydroxymethylcytosine/methylation, histone modification, transcript products, microRNA-based regulation, protein products, phosphorylation proteins and transcription factor regulation, all of which have been curated from 285 peer-reviewed publications selected from PubMed. The database contains 43,434 genes, recorded as 942,221 gene entries, for four organisms (Homo sapiens, Mus musculus, Rattus norvegicus, and Macaca mulatta) and various stem cell sources (e.g., embryonic stem cells, neural stem cells and induced pluripotent stem cells). Data in SyStemCell can be queried by Entrez gene ID, symbol, alias, or browsed by specific stem cell type at each level of genetic regulation. An online analysis tool is integrated to assist researchers to mine potential relationships among different regulations, and the potential usage of the database is demonstrated by three case studies. SyStemCell is the first database to bridge multi-level experimental information of stem cell studies, which can become an important reference resource for stem cell researchers. 
The database is available at
PMCID: PMC3396617  PMID: 22807998
22.  Associations between Ionomic Profile and Metabolic Abnormalities in Human Population 
PLoS ONE  2012;7(6):e38845.
Few studies have assessed the effects of individual and multiple ions simultaneously on metabolic outcomes, owing to methodological limitations.
Methodology/Principal Findings
By combining advanced ionomics and mutual information, a measure that quantifies the mutual dependence between two random variables, we investigated associations of ion modules/networks with overweight/obesity, metabolic syndrome (MetS) and type 2 diabetes (T2DM) in 976 middle-aged Chinese men and women. Fasting plasma ions were measured by inductively coupled plasma mass spectroscopy. Significant ion modules were selected by mutual information to construct disease-related ion networks. Plasma copper and phosphorus always ranked as the first two in the three specific ion networks associated with overweight/obesity, MetS and T2DM. Comparing the ranking of ions individually and in networks, three patterns were observed: (1) “Individual ion,” such as potassium and chromium, which tends to work alone; (2) “Module ion,” such as iron in T2DM, which tends to act in modules/networks; and (3) “Module-individual ion,” such as copper in overweight/obesity, which seems to work equivalently in either way.
In conclusion, by combining the ionomics strategy with information theory, we observed potential associations of ions, individually or as modules/networks, with metabolic disorders. These findings need to be confirmed in future biological studies.
PMCID: PMC3374762  PMID: 22719963
23.  Combining ZHENG Theory and High-Throughput Expression Data to Predict New Effects of Chinese Herbal Formulae 
ZHENG is the key theory in traditional Chinese medicine (TCM), and it is very important for finding the molecular pharmacology of traditional Chinese herbal formulae. One ZHENG is related to many diseases, and herbal formulae are targeted at ZHENG. Therefore, many herbal formulae whose effects on a certain disease have been confirmed might also treat other diseases with the same ZHENG. In this study, microarrays collected from patients with QiXuXueYu ZHENG (Qi-deficiency and Blood-stasis syndrome) before and after treatment with Fuzheng Huayu Capsule were analyzed by a high-throughput gene microarray-based drug similarity comparison method, which could find small molecules with effects similar to those of Fuzheng Huayu Capsule. Besides anti-inflammatory and anti-fibrosis drugs, which embody the known effects of Fuzheng Huayu Capsule, many other small molecules were screened out that could reflect other types of effects of this formula in treating QiXuXueYu ZHENG, including anti-hyperglycemic, anti-hyperlipidemic and hyposensitive effects. We then integrated this information to display the effects of Fuzheng Huayu Capsule and its potential multiple-target molecular pharmacology. Moreover, when clinical blood-test data were used to verify our prediction, Fuzheng Huayu Capsule was shown to have effects on diabetes and dyslipidemia.
PMCID: PMC3359832  PMID: 22666299
24.  Insight into the Stability of Cross-β Amyloid Fibril from VEALYL Short Peptide with Molecular Dynamics Simulation 
PLoS ONE  2012;7(5):e36382.
Amyloid fibrils are found in many fatal neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, type II diabetes, and prion disease. The VEALYL short peptide from insulin has been confirmed to aggregate into amyloid-like fibrils. However, the aggregation mechanism of amyloid fibrils is poorly understood. Here, we used molecular dynamics simulation to analyse the stability of the VEALYL hexamer. The statistical results indicate that hydrophobic residues play key roles in stabilizing the VEALYL hexamer. Single point and two linkage mutants confirmed that Val1, Leu4, and Tyr5 of VEALYL are key residues. The consistency of the results for the VEALYL oligomer suggests that the intermediate states might be a trimer (3-0) and a pentamer (3-2). These results can help us gain insight into the aggregation mechanism of amyloid fibrils. These methods can be used to study the stability of amyloid fibrils from other short peptides.
PMCID: PMC3349666  PMID: 22590535
25.  A new strategy for better genome assembly from very short reads 
BMC Bioinformatics  2011;12:493.
With the rapid development of next generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated. Because of repetitive regions in genomes and other factors, assembly of very short reads is still a challenging issue.
A novel strategy for improving genome assembly from very short reads is proposed. It can increase the accuracy of assemblies by integrating de novo contigs, and can produce comparative contigs by allowing multiple references, without being limited to genomes of closely related strains. Comparative contigs are used to scaffold de novo contigs. Using simulated and real datasets, it is shown that our strategy can effectively improve the quality of assemblies of isolated microbial genomes and metagenomes.
With more and more reference genomes available, our strategy will be useful to improve qualities of genome assemblies from very short reads. Some scripts are provided to make our strategy applicable at
PMCID: PMC3268122  PMID: 22208765
