Schistosoma japonicum is a parasitic flatworm that causes human schistosomiasis, a significant cause of morbidity in China and the Philippines. Here we present a draft genomic sequence for the worm, which is the first reported for any flatworm, indeed for the superphylum Lophotrochozoa. The genome provides a global insight into the molecular architecture and host interaction of this complex metazoan pathogen, revealing that it can exploit host nutrients, neuroendocrine hormones and signaling pathways for growth, development and maturation. Having a complex nervous system and a well developed sensory system, S. japonicum can accept stimulation of the corresponding ligands as a physiological response to different environments, such as fresh water or the tissues of its intermediate and mammalian hosts. Numerous proteinases, including cercarial elastase, are implicated in mammalian skin penetration and haemoglobin degradation. The genomic information will serve as a valuable platform to facilitate development of new interventions for schistosomiasis control.
Metabolomics helps to identify links between environmental exposures and intermediate biomarkers of disturbed pathways. We previously reported variations in phosphatidylcholines in male smokers compared with non-smokers in a cross-sectional pilot study with a small sample size, but knowledge of the reversibility of smoking effects on metabolite profiles is limited. Here, we extend our metabolomics study with a large prospective study including female smokers and quitters.
Using a targeted metabolomics approach, we quantified 140 metabolite concentrations in 1,241 fasting serum samples from the population-based Cooperative Health Research in the Region of Augsburg (KORA) human cohort at two time points: a baseline survey conducted between 1999 and 2001 and a follow-up seven years later. Metabolite profiles were compared among groups of current smokers, former smokers and never smokers, and were further assessed for their reversibility after smoking cessation. Changes in metabolite concentrations from baseline to follow-up were investigated in a longitudinal analysis comparing current smokers, never smokers and smoking quitters, who were current smokers at baseline but former smokers by the time of follow-up. In addition, we constructed protein-metabolite networks with smoking-related genes and metabolites.
We identified 21 smoking-related metabolites in the baseline investigation (18 in men and six in women, with three overlaps) enriched in amino acid and lipid pathways, which were significantly different between current smokers and never smokers. Moreover, 19 out of the 21 metabolites were found to be reversible in former smokers. In the follow-up study, 13 reversible metabolites in men were measured, of which 10 were confirmed to be reversible in male quitters. Protein-metabolite networks are proposed to explain the consistent reversibility of smoking effects on metabolites.
We showed that smoking-related changes in human serum metabolites are reversible after smoking cessation, consistent with the known cardiovascular risk reduction. The metabolites identified may serve as potential biomarkers to evaluate the status of smoking cessation and characterize smoking-related diseases.
metabolic network; metabolomics; molecular epidemiology; smoking; smoking cessation
Bacillus licheniformis CGMCC3963 is an important mao-tai flavor-producing strain. It was isolated from the starter (Daqu) of a Chinese mao-tai-flavor liquor fermentation process with solid-state fermentation. We report its genome of 4,525,096 bp here. Many potential insertion genes that are responsible for the unique properties of B. licheniformis CGMCC3963 in mao-tai-flavor liquor production were identified.
The emergence of vertebrates is characterized by a strong increase in miRNA families. MicroRNAs interact broadly with many transcripts, and the evolution of such a system is intriguing. However, evolutionary questions concerning the origin of miRNA genes and their subsequent evolution remain unexplained.
To systematically understand the evolutionary relationship between miRNA genes and their function, we classified known human miRNAs into eight groups based on their evolutionary ages, estimated by the maximum parsimony method. New miRNA genes with new functional sequences accumulated more dynamically in vertebrates than in Drosophila. Different levels of evolutionary selection were observed over miRNA gene sequences with different times of origin. Most genic miRNAs differ from their host genes in time of origin: there is no particular relationship between the age of a miRNA and the age of its host gene, although genic miRNAs are mostly younger than the corresponding host genes. MicroRNAs that originated over different time-scales are often predicted or verified to target the same or overlapping sets of genes, opening the possibility of substantial functional redundancy among miRNAs of different ages. Young miRNAs showed a higher degree of tissue specificity and lower expression levels.
Our data showed that, compared with protein-coding genes, miRNA genes are more dynamic in terms of emergence and decay. Evolutionary patterns differ considerably between miRNAs of different ages. MicroRNA activity is under tight control, with well-regulated expression increasing and targeting decreasing over time. Our work calls attention to the study of miRNA activity with consideration of time of origin.
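The age classification described above can be illustrated with a small sketch. It assumes a Dollo-style simplification of parsimony (a miRNA family is gained once and may later be lost), so the family's age is the smallest human-containing clade that includes every species with a homolog. The clade list, species names and the function `origin_clade` are hypothetical and are not taken from the study, which used eight age groups.

```python
# Hypothetical nested clades, ordered from youngest to oldest along the human lineage.
CLADES = [
    ("human", {"human"}),
    ("primates", {"human", "chimp", "macaque"}),
    ("mammals", {"human", "chimp", "macaque", "mouse", "dog"}),
    ("vertebrates", {"human", "chimp", "macaque", "mouse", "dog",
                     "chicken", "zebrafish"}),
    ("bilaterians", {"human", "chimp", "macaque", "mouse", "dog",
                     "chicken", "zebrafish", "fly", "worm"}),
]


def origin_clade(present_in, nested_clades=CLADES):
    """Return the youngest clade that contains every species with a homolog.

    Under a single-gain assumption, this clade's ancestor is the most
    parsimonious point of origin for the miRNA family.
    """
    if "human" not in present_in:
        raise ValueError("the family must be present in human")
    for name, members in nested_clades:
        if present_in <= members:  # all observed species fall inside this clade
            return name
    raise ValueError("a species lies outside the listed clades")


if __name__ == "__main__":
    print(origin_clade({"human", "mouse"}))       # family shared with mouse
    print(origin_clade({"human", "zebrafish"}))   # family shared with zebrafish
```

A fuller treatment would score gains and losses on the whole species tree; this shortcut only locates the deepest split at which a homolog is still observed.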
Sequencing of bacterial genomes has become an essential approach for studying pathogen virulence and the phylogenetic relationships among closely related strains. The bacterium Enterococcus faecium has emerged as an important nosocomial pathogen that is often associated with resistance to common antibiotics in hospitals. With its highly divergent gene content, it presents a challenge to next-generation sequencing (NGS) technologies, which feature high throughput and shorter read lengths. This study was designed to investigate the properties and systematic biases of NGS technologies and to evaluate critical parameters influencing the outcomes of hybrid assemblies using combinations of NGS data.
A hospital strain of E. faecium was sequenced using three different NGS platforms, 454 GS-FLX, Illumina GAIIx, and ABI SOLiD4.0, to approximately 28-, 500-, and 400-fold coverage depth, respectively. We built a pipeline that merged contigs from each NGS dataset into hybrid assemblies. The results revealed that each single NGS assembly had a ceiling in continuity that could not be overcome by simply increasing data coverage depth. Each NGS technology displayed some intrinsic properties, e.g. base-calling errors and systematic bias. The gaps and low-coverage regions of each NGS assembly were associated with lower GC content. To optimize the hybrid assembly approach, we tested varying amounts and different combinations of NGS data, and obtained optimal conditions for assembly continuity. We also showed, for the first time, that SOLiD data could help produce much improved assemblies of the E. faecium genome using the hybrid approach when combined with other types of NGS data.
The current study addressed the difficult issue of how to most effectively construct a complete microbial genome using today's state-of-the-art sequencing technologies. We characterized the sequence data and genome assembly from each NGS technology, tested conditions for hybrid assembly with combinations of NGS data, and obtained optimized parameters for achieving the most cost-efficient assembly. Our study helped form guidelines to direct genomic work on other microorganisms and thus has important practical implications.
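The fold-coverage figures quoted in this abstract follow the usual definition of sequencing depth: total sequenced bases divided by genome size. The sketch below shows that arithmetic only; the read counts, read lengths and genome size in the usage lines are hypothetical and are not values from the study.

```python
import math


def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


def reads_needed(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_depth * genome_size / read_length)


if __name__ == "__main__":
    # Hypothetical example: a 3 Mb genome sequenced with 400 bp reads.
    print(fold_coverage(210_000, 400, 3_000_000))
    print(reads_needed(28, 400, 3_000_000))
```

Average depth says nothing about how evenly reads are distributed, which is why the abstract reports gaps and low-coverage regions even at several-hundred-fold depth.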
Animal models are indispensable tools for studying the causes of human diseases and searching for treatments. The scientific value of an animal model depends on how accurately it mimics the human disease. The primary goal of the current study was to develop a cross-species method that uses animal models' expression data to evaluate their similarity to human diseases and to assess the efficiency of drug molecules in drug research. We thereby aimed to show that it is feasible and useful to compare gene expression profiles across species in studies of pathology, toxicology, drug repositioning, and drug action mechanisms.
We developed a cross-species analysis method to analyze animal models' similarity to human diseases and effectiveness in drug research by utilizing the existing animal gene expression data in the public database, and mined some meaningful information to help drug research, such as potential drug candidates, possible drug repositioning, side effects and analysis in pharmacology. New animal models could be evaluated by our method before they are used in drug discovery.
We applied the method to several cases of known animal model expression profiles and obtained useful information to help drug research. We found that trichostatin A and some other HDAC inhibitors could have very similar responses across cell lines and species at the gene expression level. The mouse hypoxia model could accurately mimic human hypoxia, while the mouse diabetes drug model might have some limitations. The transgenic mouse model of Alzheimer's disease was a useful model, and we analyzed in depth the biological mechanisms of some drugs in this case. In addition, all the cases could provide ideas for drug discovery and drug repositioning.
We developed a new cross-species gene expression module comparison method to use animal models' expression data to analyse the effectiveness of animal models in drug research. Moreover, through data integration, our method could be applied for drug research, such as potential drug candidates, possible drug repositioning, side effects and information about pharmacology.
Bacterial 16S ribosomal RNA (rRNA) profiling has been widely used in the classification of microbiota-associated diseases. Dimensionality reduction is key to mining high-dimensional 16S rRNA expression data. High levels of sparsity and redundancy are common in 16S rRNA gene microbial surveys. Traditional feature selection methods are generally restricted to measuring correlated abundances, and are limited in discrimination when so few microbes are actually shared across communities.
Here we present a Feature Merging and Selection algorithm (FMS) to deal with 16S rRNA expression data. By integrating the linear discriminant analysis (LDA) method, FMS can reduce the feature dimension with higher accuracy while preserving the relationships between different features. Two 16S rRNA expression datasets from pneumonia and dental caries patients were used to test the validity of the algorithm. Combined with a support vector machine (SVM), FMS discriminated the different classes of both pneumonia and dental caries better than other popular feature selection methods.
FMS projects data into a lower dimension while preserving enough features, and thus improves the intelligibility of the result. The results showed that FMS is a more valid and reliable method for feature reduction.
Hepatocellular carcinoma (HCC) is one of the most fatal cancers in the world, and metastasis is a significant cause of the high mortality in patients with HCC. However, the molecular mechanism behind HCC metastasis is not fully understood. The study of regulatory networks may help investigate HCC metastasis through systems biology profiling.
By utilizing both sequence information and parallel microRNA (miRNA) and mRNA expression data from the same cohort of HBV-related HCC patients with or without venous metastasis, we constructed combinatorial regulatory networks of non-metastatic and metastatic HCC that contain both transcription factor (TF) regulation and miRNA regulation. Differential regulation patterns, classifying marker modules, and key regulatory miRNAs were analyzed by comparing the non-metastatic and metastatic networks.
Globally, TFs accounted for the main part of regulation and miRNAs for a minor part. However, miRNAs displayed a more active role in the metastatic network than in the non-metastatic one. Seventeen differential regulatory modules discriminative of metastatic status were identified as a cumulative-module classifier, which could also distinguish survival time. MiR-16, miR-30a, Let-7e and miR-204 were identified as key miRNA regulators contributing to HCC metastasis.
In this work we demonstrated an integrative approach to differential combinatorial regulatory network analysis in the specific context of venous metastasis of HBV-HCC. Our results suggest possible transcriptional regulatory patterns underlying the different metastatic subgroups of HCC. The workflow in this study can be applied in similar contexts of cancer research and could also be extended to other clinical topics.
Detecting the borders between coding and non-coding regions is an essential step in genome annotation. Information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous border-finding methods based on entropy segmentation still needs to be improved.
In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets.
Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences.
This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, our method improved on the A12_JR method in the accuracy of finding the borders between protein-coding and non-coding regions in DNA sequences.
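To make the idea of an entropic cut concrete, the following sketch computes a Jensen-Rényi divergence between the two sides of every candidate cut and returns the position where it is largest. It is a simplified illustration, not the published method: it uses the plain 4-letter nucleotide alphabet rather than the 22-symbol alphabet, fixes the Rényi order at 0.5 as an assumption, performs a single cut rather than a recursive segmentation with a significance test, and assumes the input contains only A, C, G and T. All function names are hypothetical.

```python
import math
from collections import Counter


def renyi_entropy(probs, alpha=0.5):
    """Rényi entropy of order alpha (alpha != 1), in bits."""
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1 - alpha)


def distribution(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in seq."""
    counts = Counter(seq)
    return [counts[s] / len(seq) for s in alphabet]


def jensen_renyi(left, right, alpha=0.5, alphabet="ACGT"):
    """Entropy of the length-weighted mixture minus the weighted mean entropy."""
    n = len(left) + len(right)
    w1, w2 = len(left) / n, len(right) / n
    p1, p2 = distribution(left, alphabet), distribution(right, alphabet)
    mixture = [w1 * a + w2 * b for a, b in zip(p1, p2)]
    return renyi_entropy(mixture, alpha) - (
        w1 * renyi_entropy(p1, alpha) + w2 * renyi_entropy(p2, alpha)
    )


def best_cut(seq, alpha=0.5, min_len=1):
    """Return (position, divergence) of the cut that maximizes the divergence."""
    best_pos, best_val = None, float("-inf")
    for i in range(min_len, len(seq) - min_len + 1):
        d = jensen_renyi(seq[:i], seq[i:], alpha)
        if d > best_val:
            best_pos, best_val = i, d
    return best_pos, best_val


if __name__ == "__main__":
    toy = "A" * 20 + "G" * 20  # two compositionally distinct halves
    print(best_cut(toy))
```

In a recursive scheme, each accepted cut would split the sequence and the same search would be repeated on both halves until no cut is judged significant.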
The C4 photosynthetic cycle supercharges photosynthesis by concentrating CO2 around ribulose-1,5-bisphosphate carboxylase, significantly reducing the oxygenation reaction. Therefore, engineering C4 features into C3 plants has been suggested as a feasible way to increase photosynthesis and yield in C3 plants such as rice, wheat, and potato. To identify the possible transition from C3 to C4 plants, a systematic comparison of C3 and C4 metabolism is necessary.
We compared C3 and C4 metabolic networks using improved constraint-based models for Arabidopsis and maize. Using graph theory, we found that the C3 network exhibits a denser topology than the C4 network. Simulation of enzyme knockouts demonstrated that both C3 and C4 networks are very robust, especially when optimizing CO2 fixation. Moreover, the C4 plant shows better robustness whether the objective function is biomass synthesis or CO2 fixation. In addition, all the essential reactions in the C3 network are also essential in C4, while some other reactions are essential specifically in C4, which indicates that the basic metabolism of the C4 plant is similar to that of C3 but more complex. We also identified more correlated reaction sets in C4, showing that C4 plants have better modularity than C3 plants, with complex mechanisms coordinating reactions and pathways. We also found that biomass production and CO2 fixation increase faster with light intensity and CO2 concentration in C4 than in C3, reflecting more efficient use of light and CO2 in the C4 plant. Finally, we explored the contribution of different C4 subtypes to biomass production by setting specific constraints.
All results are consistent with the actual situation, indicating that flux balance analysis is a powerful method for studying plant metabolism at the systems level. We demonstrated that, in contrast to C3 plants, C4 plants have a less dense topology, higher robustness, better modularity, and higher CO2 and radiation use efficiency. In addition, preliminary analysis indicated that the rates of CO2 fixation and biomass production in the PCK subtype are superior to those in the NADP-ME and NAD-ME subtypes under a sufficient supply of water and nitrogen.
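The topology comparison above rests on graph-theoretic measures of how densely a network is connected. The abstract does not say which measures were used, so the sketch below shows only one common choice, the density of an undirected simple graph, on two hypothetical toy networks. The function name and the example edges are illustrative assumptions, not data from the Arabidopsis or maize models.

```python
def graph_density(edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1)).

    Self-loops are ignored and repeated edges are counted once.
    """
    nodes = set()
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue
        nodes.update((u, v))
        unique_edges.add(frozenset((u, v)))
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(unique_edges) / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical metabolite graphs: a fully connected triangle and a chain.
    dense_net = [("a", "b"), ("b", "c"), ("a", "c")]
    sparse_net = [("a", "b"), ("b", "c"), ("c", "d")]
    print(graph_density(dense_net), graph_density(sparse_net))
```

A density of 1 means every pair of nodes is linked; lower values indicate a sparser network of the kind the abstract attributes to C4 metabolism.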
The new release of SchistoDB (http://SchistoDB.net) provides a rich resource of genomic data for key blood flukes (genus Schistosoma) which cause disease in hundreds of millions of people worldwide. SchistoDB integrates whole-genome sequence and annotation of three species of the genus and provides enhanced bioinformatics analyses and data-mining tools. A simple, yet comprehensive web interface provided through the Strategies Web Development Kit is available for the mining and visualization of the data. Genomic scale data can be queried based on BLAST searches, annotation keywords and gene ID searches, gene ontology terms, sequence motifs, protein characteristics and phylogenetic relationships. Search strategies can be saved within a user’s profile for future retrieval and may also be shared with other researchers using a unique web address.
A targeted metabolomics approach was used to identify candidate biomarkers of pre-diabetes. The relevance of the identified metabolites is further corroborated with a protein-metabolite interaction network and gene expression data.
Three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine C2) were found with significantly altered levels in pre-diabetic individuals compared with normal controls. Lower levels of glycine and LPC (18:2) were found to predict risks for pre-diabetes and type 2 diabetes (T2D). Seven T2D-related genes (PPARG, TCF7L2, HNF1A, GCK, IGF1, IRS1 and IDE) are functionally associated with the three identified metabolites. The unique combination of methodologies, including prospective population-based and nested case–control, as well as cross-sectional studies, was essential for the identification of the reported biomarkers.
Type 2 diabetes (T2D) can be prevented in pre-diabetic individuals with impaired glucose tolerance (IGT). Here, we have used a metabolomics approach to identify candidate biomarkers of pre-diabetes. We quantified 140 metabolites in 4297 fasting serum samples from the population-based Cooperative Health Research in the Region of Augsburg (KORA) cohort. Our study revealed significant metabolic variation in pre-diabetic individuals that is distinct from known diabetes risk indicators, such as glycosylated hemoglobin levels, fasting glucose and insulin. We identified three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine) that had significantly altered levels in IGT individuals as compared to those with normal glucose tolerance, with P-values ranging from 2.4 × 10⁻⁴ to 2.1 × 10⁻¹³. Lower levels of glycine and LPC were found to be predictors not only for IGT but also for T2D, and were independently confirmed in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort. Using metabolite–protein network analysis, we identified seven T2D-related genes that are associated with these three IGT-specific metabolites by multiple interactions with four enzymes. The expression levels of these enzymes correlate with changes in the metabolite concentrations linked to diabetes. Our results may help develop novel strategies to prevent T2D.
early diagnostic biomarkers; IGT; metabolomics; prediction; T2D
In the post-genomic era, the study of transcriptional regulation is pivotal to decoding genetic information. Transcription factors (TFs) are central proteins in transcriptional regulation, and interactions between TFs and their DNA targets (TF binding sites, TFBSs) are important for the expression of downstream genes. However, limited knowledge about interactions between TFs and TFBSs still hinders investigation of the mechanism of transcription.
To expand the knowledge about interactions between TFs and TFBSs, three biological features (sequence, structure, and evolution) were used to build TFBS identification models for studying the binding preference between TFs and their DNA targets in mammals. Results show that each feature performs fairly well in capturing TFBSs, and that the hybrid model combining all three features is more robust for TFBS identification. Subsequently, the correspondence between TFs and their TFBSs was investigated to explore the interactions among them in mammals. Results indicate that TFs and TFBSs are reciprocal at the sequence, structure, and evolution levels.
Our work demonstrates that, to some extent, TFs and TFBSs have developed a coevolutionary relationship in order to keep their physical binding and maintain their regulatory functions. In summary, our work will help understand transcriptional regulation and interpret binding mechanism between proteins and DNAs.
The virulence-attenuated Leptospira interrogans serovar Lai strain IPAV was derived by prolonged laboratory passage from a highly virulent ancestral strain isolated in China. We studied the genetic variations of IPAV that render it avirulent via comparative analysis against the pathogenic L. interrogans serovar Lai strain 56601. The complete genome sequence of the IPAV strain was determined and used to compare with, and then rectify and reannotate the genome sequence of strain 56601. Aside from their highly similar genomic structure and gene order, a total of 33 insertions, 53 deletions and 301 single-nucleotide variations (SNVs) were detected throughout the genome of IPAV directly affecting 101 genes, either in their 5′ upstream region or within their coding region. Among them, the majority of the 44 functional genes are involved in signal transduction, stress response, transmembrane transport and nitrogen metabolism. Comparative proteomic analysis based on quantitative liquid chromatography (LC)-MS/MS data revealed that among 1 627 selected pairs of orthologs, 174 genes in the IPAV strain were upregulated, with enrichment mainly in classes of energy production and lipid metabolism. In contrast, 228 genes in strain 56601 were upregulated, with the majority enriched in the categories of protein translation and DNA replication/repair. The combination of genomic and proteomic approaches illustrated that altered expression or mutations in critical genes, such as those encoding a Ser/Thr kinase, carbon-starvation protein CstA, glutamine synthetase, GTP-binding protein BipA, ribonucleotide-diphosphate reductase and phosphate transporter, and alterations in the translational profile of lipoproteins or outer membrane proteins are likely to account for the virulence attenuation in strain IPAV.
genome; proteome; Leptospira; virulence; mutation
Elucidation of the mechanisms of stem cell differentiation is of great scientific interest. Increasing evidence suggests that stem cell differentiation involves changes at multiple levels of biological regulation, which together orchestrate the complex differentiation process; many related studies have been performed to investigate the various levels of regulation. The resulting valuable data, however, remain scattered. Most of the current stem cell-relevant databases focus on a single level of regulation (mRNA expression) from limited stem cell types; thus, a unifying resource would be of great value to compile the multiple levels of research data available. Here we present a database for this purpose, SyStemCell, deposited with multi-level experimental data from stem cell research. The database currently covers seven levels of stem cell differentiation-associated regulatory mechanisms, including DNA CpG 5-hydroxymethylcytosine/methylation, histone modification, transcript products, microRNA-based regulation, protein products, phosphorylation proteins and transcription factor regulation, all of which have been curated from 285 peer-reviewed publications selected from PubMed. The database contains 43,434 genes, recorded as 942,221 gene entries, for four organisms (Homo sapiens, Mus musculus, Rattus norvegicus, and Macaca mulatta) and various stem cell sources (e.g., embryonic stem cells, neural stem cells and induced pluripotent stem cells). Data in SyStemCell can be queried by Entrez gene ID, symbol, alias, or browsed by specific stem cell type at each level of genetic regulation. An online analysis tool is integrated to assist researchers to mine potential relationships among different regulations, and the potential usage of the database is demonstrated by three case studies. SyStemCell is the first database to bridge multi-level experimental information of stem cell studies, which can become an important reference resource for stem cell researchers. 
The database is available at http://lifecenter.sgst.cn/SyStemCell/.
Few studies have simultaneously assessed the effects of individual and multiple ions on metabolic outcomes, owing to methodological limitations.
By combining advanced ionomics with mutual information, a quantitative measure of the mutual dependence between two random variables, we investigated associations of ion modules/networks with overweight/obesity, metabolic syndrome (MetS) and type 2 diabetes (T2DM) in 976 middle-aged Chinese men and women. Fasting plasma ions were measured by inductively coupled plasma mass spectrometry. Significant ion modules were selected by mutual information to construct disease-related ion networks. Plasma copper and phosphorus always ranked in the top two in the three specific ion networks associated with overweight/obesity, MetS and T2DM. Comparing the ranking of ions individually and in networks, three patterns were observed: (1) “Individual ion,” such as potassium and chromium, which tends to work alone; (2) “Module ion,” such as iron in T2DM, which tends to act in modules/networks; and (3) “Module-individual ion,” such as copper in overweight/obesity, which seems to work equivalently in either way.
In conclusion, by combining an ionomics strategy with information theory, we observed potential associations of ions, individually or as modules/networks, with metabolic disorders. These findings need to be confirmed in future biological studies.
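Mutual information, the dependence measure named in this abstract, can be estimated from paired observations once continuous values are binned. The sketch below is a minimal plug-in estimator for discrete data with an equal-width binning helper. The binning scheme, the number of bins and both function names are assumptions made for illustration; the abstract does not describe how ion concentrations were discretized or how modules were scored.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Map continuous values to equal-width bin indices 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical ion concentrations and a binary disease label.
    ion = discretize([0.8, 0.9, 1.6, 1.7], n_bins=2)
    disease = [0, 0, 1, 1]
    print(mutual_information(ion, disease))
```

With small samples the plug-in estimate is biased upward, so in practice a permutation test or a bias correction would usually accompany it.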
ZHENG is the key theory in traditional Chinese medicine (TCM), and it is very important to elucidate the molecular pharmacology of traditional Chinese herbal formulae. One ZHENG is related to many diseases, and herbal formulae are aimed at the ZHENG. Therefore, many herbal formulae whose effects on a certain disease have been confirmed might also treat other diseases with the same ZHENG. In this study, microarrays collected from patients with QiXuXueYu ZHENG (Qi-deficiency and Blood-stasis syndrome) before and after treatment with Fuzheng Huayu Capsule were analyzed by a high-throughput, gene microarray-based drug similarity comparison method, which can find small molecules with effects similar to those of Fuzheng Huayu Capsule. Besides anti-inflammatory and anti-fibrosis drugs, which reflect the known effects of Fuzheng Huayu Capsule, many other small molecules were identified that could reflect other types of effects of this formula in treating QiXuXueYu ZHENG, including anti-hyperglycemic, anti-hyperlipidemic and hyposensitive effects. We then integrated this information to display the effects of Fuzheng Huayu Capsule and its potential multiple-target molecular pharmacology. Moreover, by using clinical blood test data to verify our prediction, Fuzheng Huayu Capsule was shown to have effects on diabetes and dyslipidemia.
Amyloid fibrils are found in many fatal neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, type II diabetes, and prion disease. The VEALYL short peptide from insulin has been confirmed to aggregate into amyloid-like fibrils. However, the aggregation mechanism of amyloid fibrils is poorly understood. Here, we used molecular dynamics simulation to analyse the stability of the VEALYL hexamer. The statistical results indicate that hydrophobic residues play key roles in stabilizing the VEALYL hexamer. Single-point and two-linkage mutants confirmed that Val1, Leu4, and Tyr5 of VEALYL are key residues. The consistency of the results for the VEALYL oligomer suggests that the intermediate states might be a trimer (3-0) and a pentamer (3-2). These results provide insight into the aggregation mechanism of amyloid fibrils. These methods can be used to study the stability of amyloid fibrils formed from other short peptides.
With the rapid development of the next generation sequencing (NGS) technology, large quantities of genome sequencing data have been generated. Because of repetitive regions of genomes and some other factors, assembly of very short reads is still a challenging issue.
A novel strategy for improving genome assembly from very short reads is proposed. It can increase the accuracy of assemblies by integrating de novo contigs, and it produces comparative contigs by allowing multiple references, without being limited to genomes of closely related strains. Comparative contigs are used to scaffold de novo contigs. Using simulated and real datasets, we show that our strategy can effectively improve the quality of assemblies of isolated microbial genomes and metagenomes.
With more and more reference genomes available, our strategy will be useful for improving the quality of genome assemblies from very short reads. Scripts that implement our strategy are available at http://code.google.com/p/cd-hybrid/.
In the post-genomic era, transcriptomics and proteomics provide important information for understanding genomes. With the fast development of high-throughput technology, transcriptomics and proteomics data are being generated at an unprecedented rate. This creates a need for software to annotate these omics data and explore their biological nature. In the past decade, some pioneering works were presented to address this issue, but limitations still exist. For example, some of these tools offer only a command line, which is not suitable for users with little or no programming experience. In addition, some tools do not support large-scale gene and protein analysis.
To overcome these limitations, an integrated gene and protein annotation server named iGepros has been developed. The server provides user-friendly interfaces and detailed online examples, so most researchers, even those with little or no programming experience, can use it easily. Moreover, the server provides many functions for comparing transcriptomics and proteomics data. In particular, the server is built on a model-view-controller framework, which makes it easy to add more functions in the future.
In this paper, we present a server with powerful capability not only for gene and protein functional annotation, but also for transcriptomics and proteomics data comparison. Researchers can survey biological characters behind gene and protein datasets and accelerate their investigation of transcriptome and proteome by applying the server. The server is publicly available at http://www.biosino.org/iGepros/.
Pyrrolidone carboxylic acid (PCA) is formed during a common post-translational modification (PTM) of extracellular and multi-pass membrane proteins. In this study, we developed a new predictor of PCA modification sites based on maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS). We incorporated 727 features belonging to 7 kinds of protein properties to predict the modification sites: sequence conservation, residual disorder, amino acid factor, secondary structure and solvent accessibility, gain/loss of amino acids during evolution, propensity of amino acids to be conserved at the protein-protein interface and protein surface, and deviation of side chain carbon atom number. Among these 727 features, 244 were selected by mRMR and IFS as the optimized features for prediction, with which the prediction model achieved a maximum Matthews correlation coefficient (MCC) of 0.7812. Feature analysis showed that all feature types contributed to the modification process. Further site-specific feature analysis showed that the features derived from the sites surrounding PCA contributed more to the determination of PCA sites than other sites. The detailed feature analysis in this paper might provide important clues for understanding the mechanism of PCA formation and guide relevant experimental validations.
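The two quantities that drive the feature selection described above can be sketched briefly: the Matthews correlation coefficient, computed from a confusion matrix, and an incremental feature selection loop that grows a ranked feature list one feature at a time and keeps the prefix with the best score. The mRMR ranking and the classifier are not reproduced here; `evaluate` is a hypothetical callback standing in for "train and cross-validate a model on this feature subset", and the toy scores are invented.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


def incremental_feature_selection(ranked_features, evaluate):
    """Score every prefix of the ranked list and return the best one.

    ranked_features: features ordered from most to least important
                     (e.g. by an mRMR ranking computed elsewhere).
    evaluate:        callable mapping a feature subset to a score such as MCC.
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


if __name__ == "__main__":
    # Hypothetical scores indexed by subset size, in place of a real classifier.
    toy_scores = {1: 0.20, 2: 0.70, 3: 0.50}
    features = ["conservation", "disorder", "accessibility"]
    print(incremental_feature_selection(features, lambda s: toy_scores[len(s)]))
    print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Because every prefix is evaluated, the cost is one model fit per feature; with 727 ranked features that means 727 evaluations before the best-scoring subset is known.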
Protein lysine acetylation has emerged as a key posttranslational modification in cellular regulation, in particular through the modification of histones and nuclear transcription regulators. We show that lysine acetylation is a prevalent modification in enzymes that catalyze intermediate metabolism. Virtually every enzyme in glycolysis, gluconeogenesis, the tricarboxylic acid (TCA) cycle, the urea cycle, fatty acid metabolism, and glycogen metabolism was found to be acetylated in human liver tissue. The concentration of metabolic fuels, such as glucose, amino acids, and fatty acids, influenced the acetylation status of metabolic enzymes. Acetylation activated enoyl–coenzyme A hydratase/3-hydroxyacyl–coenzyme A dehydrogenase in fatty acid oxidation and malate dehydrogenase in the TCA cycle, inhibited argininosuccinate lyase in the urea cycle, and destabilized phosphoenolpyruvate carboxykinase in gluconeogenesis. Our study reveals that acetylation plays a major role in metabolic regulation.
Viral integration plays an important role in the development of malignant diseases. Viruses differ in their preferred integration sites and flanking sequences. Viral integration sites (VIS) have been found next to oncogenes and common fragile sites. Understanding the typical DNA features near VIS is useful for the identification of potential oncogenes, the prediction of malignant disease development and the assessment of the probability of malignant transformation in gene therapy. Therefore, we have built a database of human disease-related VIS (Dr.VIS, http://www.scbit.org/dbmi/drvis) to collect and maintain human disease-related VIS data, including characteristics of the malignant disease, chromosome region, genomic position and viral–host junction sequence. The current build of Dr.VIS covers about 600 natural VIS of 5 oncogenic viruses representing 11 diseases. Among them, about 200 VIS have viral–host junction sequences.
A large number of differentially expressed proteins (DEPs) have been identified in various cancer proteomics experiments. Curation and annotation of these proteins are important for deciphering their roles in oncogenesis and tumor progression, and may further help to discover potential protein biomarkers for clinical applications. In 2009, we published the first database of DEPs in human cancers (dbDEPC). In this updated 2011 version, dbDEPC 2.0 has more than doubled in size to over 4000 protein entries, curated from 331 experiments across 20 types of human cancers. This resource allows researchers to check whether their proteins of interest have been reported as changed in certain cancers, to compare their own proteomic discoveries with previous studies, to view an expression heatmap of selected proteins across multiple cancers and to relate protein expression changes to aberrations at other genetic levels. Important new developments include the addition of experiment design information, advanced filter tools for user-specified analysis and a network analysis tool. We expect dbDEPC 2.0 to be a much more powerful tool than it was in its first release and to serve as a reference for both proteomics and cancer researchers. dbDEPC 2.0 is available at http://lifecenter.sgst.cn/dbdepc/index.do.
The identification of human cancer-related microRNAs (miRNAs) is important for cancer biology research. Although several identification methods have achieved remarkable success, they have overlooked the functional information associated with miRNAs. We present a computational framework that can be used to prioritize human cancer miRNAs by measuring the association between cancer and miRNAs based on the functional consistency score (FCS) of the miRNA target genes and the cancer-related genes. This approach proved successful in identifying the validated cancer miRNAs for 11 common human cancers, with the area under the ROC curve (AUC) ranging from 71.15% to 96.36%. The FCS method had a significant advantage over miRNA differential expression analysis when identifying cancer-related miRNAs with a fine regulatory mechanism, such as miR-27a in colorectal cancer. Furthermore, a case study examining thyroid cancer showed that the FCS method can uncover novel cancer-related miRNAs such as miR-27a/b, which were shown to be significantly upregulated in thyroid cancer samples by qRT-PCR analysis. Our method is available through a web-based server, CMP (cancer miRNA prioritization), which is freely accessible at http://bioinfo.hrbmu.edu.cn/CMP. This time- and cost-effective computational framework can be a valuable complement to experimental studies and can assist with future studies of miRNA involvement in the pathogenesis of cancers.
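As an illustration of how a prioritization of this kind can be scored, the following sketch computes the area under the ROC curve from a list of per-miRNA scores and binary labels marking validated cancer miRNAs. It uses the standard rank-based (Mann-Whitney) formulation of the AUC. The function name `auc_from_scores` and the toy inputs are hypothetical, and the FCS computation itself is not reproduced here because the abstract does not specify it.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen positive item
    receives a higher score than a randomly chosen negative item
    (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: four candidate miRNAs, two of them labelled as validated.
example_auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise loop is quadratic in the number of items, which is adequate for a sketch; a sort-based rank computation would be preferable for large candidate lists.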