Results 1-6 (6)

author:("Hu, fengyun")
1.  A comparative study of RNA-seq analysis strategies 
Briefings in Bioinformatics  2015;16(6):932-940.
Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach uses curated annotations, which assumes the transcripts in a sample are a subset of the transcripts listed in a curated database. A more ambitious method involves aligning reads to a reference genome and using the alignments to infer the transcript structures, possibly with the aid of a curated transcript database. The most challenging approach is to assemble reads into putative transcripts de novo without the aid of reference data. We have systematically assessed the properties of these three approaches through a simulation study. We have found that the sensitivity of computational transcript set estimation is severely limited. Computational approaches (both genome-guided and de novo assembly) produce a large number of artefacts, which are assigned large expression estimates and absorb a substantial proportion of the signal when performing expression analysis. The approach using curated annotations shows good expression correlation even when the annotations are incomplete. Furthermore, any incorrect transcripts present in a curated set do not absorb much signal, so it is preferable to have a curated set with high sensitivity rather than high precision. Software to simulate transcript sets, expression values and sequence reads under a wide range of parameter values and to compare sensitivity, precision and signal-to-noise ratios of different methods is freely available online and can be expanded by interested parties to include methods other than the exemplars presented in this article.
PMCID: PMC4652615  PMID: 25788326
RNA-seq; transcriptome assembly; gene expression; RNA splicing
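As an illustration of the sensitivity and precision measures mentioned in the abstract above, the following minimal sketch compares an inferred transcript set against a known (for example, simulated) one. It is not the authors' software: the function name `set_metrics` and the transcript identifiers are hypothetical, and a transcript is assumed to count as recovered only when its identifier matches exactly.

```python
def set_metrics(true_transcripts, inferred_transcripts):
    """Return (sensitivity, precision) of an inferred transcript set.

    Assumption: a transcript counts as recovered only if its identifier
    appears in both sets (exact match).
    """
    truth = set(true_transcripts)
    inferred = set(inferred_transcripts)
    true_positives = len(truth & inferred)
    # Sensitivity: fraction of true transcripts that were recovered.
    sensitivity = true_positives / len(truth) if truth else 0.0
    # Precision: fraction of inferred transcripts that are real.
    precision = true_positives / len(inferred) if inferred else 0.0
    return sensitivity, precision


# Hypothetical example: four expressed transcripts, three inferred,
# one of which ("x9") is an artefact.
truth = ["t1", "t2", "t3", "t4"]
inferred = ["t1", "t2", "x9"]
sensitivity, precision = set_metrics(truth, inferred)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

In this toy case half of the true transcripts are recovered and one in three inferred transcripts is an artefact. A real comparison would also need a rule for matching transcript structures rather than identifiers.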
2.  InterMine: extensive web services for modern biology 
Nucleic Acids Research  2014;42(Web Server issue):W468-W472.
InterMine is a biological data warehousing system providing extensive automatically generated and configurable RESTful web services that underpin the web interface and can be re-used in many other applications: to find and filter data; export it in a flexible and structured way; to upload, use, manipulate and analyze lists; to provide services for flexible retrieval of sequence segments, and for other statistical and analysis tools. Here we describe these features and discuss how they can be used separately or in combinations to support integrative and comparative analysis.
PMCID: PMC4086141  PMID: 24753429
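To make the idea of finding and filtering data through a RESTful service more concrete, here is a minimal sketch that only builds a request URL and performs no network access. Everything specific in it is an assumption rather than something stated in the abstract: the `/service/query/results` endpoint with `query` and `format` parameters, the XML query layout, the model name "genomic", the host `mine.example.org` and the helper name `build_query_url`.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr


def build_query_url(base_url, view, path, op, value, fmt="json"):
    """Build a URL for a path-query results request (no network access).

    Assumptions: the service exposes a `/service/query/results` endpoint
    taking `query` (XML) and `format` parameters, and the data model is
    named "genomic".
    """
    # quoteattr() escapes the value and wraps it in quotes for use as an
    # XML attribute.
    query_xml = (
        f'<query model="genomic" view={quoteattr(" ".join(view))}>'
        f"<constraint path={quoteattr(path)} op={quoteattr(op)} "
        f"value={quoteattr(value)}/></query>"
    )
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{base_url.rstrip('/')}/service/query/results?{params}"


# Hypothetical mine location and gene symbol.
url = build_query_url(
    "https://mine.example.org/mine",
    view=["Gene.symbol", "Gene.organism.name"],
    path="Gene.symbol",
    op="=",
    value="zen",
)
print(url)
```

The resulting URL could be passed to any HTTP client. Check the endpoint and parameter names against the documentation of the specific mine before relying on them.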
3.  metabolicMine: an integrated genomics, genetics and proteomics data warehouse for common metabolic disease research 
Common metabolic and endocrine diseases such as diabetes affect millions of people worldwide and have a major health impact, frequently leading to complications and mortality. In a search for better prevention and treatment, there is ongoing research into the underlying molecular and genetic bases of these complex human diseases, as well as into the links with risk factors such as obesity. Although an increasing number of relevant genomic and proteomic data sets have become available, the quantity and diversity of the data make their efficient exploitation challenging. Here, we present metabolicMine, a data warehouse with a specific focus on the genomics, genetics and proteomics of common metabolic diseases. Developed in collaboration with leading UK metabolic disease groups, metabolicMine integrates data sets from a range of experiments and model organisms alongside tools for exploring them. The current version brings together information covering genes, proteins, orthologues, interactions, gene expression, pathways, ontologies, diseases, genome-wide association studies and single nucleotide polymorphisms. Although the emphasis is on human data, key data sets from mouse and rat are included. These are complemented by interoperation with the RatMine rat genomics database, with a corresponding mouse version under development by the Mouse Genome Informatics (MGI) group. The web interface contains a number of features including keyword search, a library of Search Forms, the QueryBuilder and list analysis tools. This provides researchers with many different ways to analyse, view and flexibly export data. Programming interfaces and automatic code generation in several languages are supported, and many of the features of the web interface are available through web services. The combination of diverse data sets integrated with analysis tools and a powerful query system makes metabolicMine a valuable research resource. 
The web interface makes it accessible to first-time users, whereas the Application Programming Interface (API) and web services provide convenient data access and tools for bioinformaticians. metabolicMine is freely available online at
Database URL:
PMCID: PMC4438919  PMID: 23935057
4.  InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data 
Bioinformatics  2012;28(23):3163-3165.
Summary: InterMine is an open-source data warehouse system that facilitates the building of databases with complex data integration requirements and a need for a fast customizable query facility. Using InterMine, large biological databases can be created from a range of heterogeneous data sources, and the extensible data model allows for easy integration of new data types. The analysis tools include a flexible query builder, genomic region search and a library of ‘widgets’ performing various statistical analyses. The results can be exported in many commonly used formats. InterMine is a fully extensible framework where developers can add new tools and functionality. Additionally, there is a comprehensive set of web services, for which client libraries are provided in five commonly used programming languages.
Availability: Freely available under the LGPL license.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3516146  PMID: 23023984
5.  Understanding Regulation of Metabolism through Feasibility Analysis 
PLoS ONE  2012;7(7):e39396.
Understanding cellular regulation of metabolism is a major challenge in systems biology. Thus far, the main assumption was that enzyme levels are key regulators in metabolic networks. However, regulation analysis recently showed that metabolism is rarely controlled via enzyme levels only, but through non-obvious combinations of hierarchical (gene and enzyme levels) and metabolic regulation (mass action and allosteric interaction). Quantitative analyses relating changes in metabolic fluxes to changes in transcript or protein levels have revealed a remarkable lack of understanding of the regulation of these networks. We study metabolic regulation via feasibility analysis (FA). Inspired by the constraint-based approach of Flux Balance Analysis, FA incorporates a model describing kinetic interactions between molecules. We enlarge the portfolio of objectives for the cell by defining three main physiologically relevant objectives for the cell: function, robustness and temporal responsiveness. We postulate that the cell assumes one or a combination of these objectives and search for enzyme levels necessary to achieve this. We call the subspace of feasible enzyme levels the feasible enzyme space. Once this space is constructed, we can study how different objectives may (if possible) be combined, or evaluate the conditions under which the cells are faced with a trade-off among those. We apply FA to the experimental scenario of long-term carbon limited chemostat cultivation of yeast cells, studying how metabolism evolves optimally. Cells employ a mixed strategy composed of increasing enzyme levels for glucose uptake and hexokinase and decreasing levels of the remaining enzymes. This trade-off renders the cells specialized in this low-carbon flux state to compete for the available glucose and get rid of overcapacity. Overall, we show that FA is a powerful tool for systems biologists to study regulation of metabolism, interpret experimental data and evaluate hypotheses.
PMCID: PMC3392259  PMID: 22808034
6.  modMine: flexible access to modENCODE data 
Nucleic Acids Research  2011;40(Database issue):D1082-D1088.
In an effort to comprehensively characterize the functional elements within the genomes of the important model organisms Drosophila melanogaster and Caenorhabditis elegans, the NHGRI model organism Encyclopaedia of DNA Elements (modENCODE) consortium has generated an enormous library of genomic data along with detailed, structured information on all aspects of the experiments. The modMine database described here has been built by the modENCODE Data Coordination Center to allow the broader research community to (i) search for and download data sets of interest among the thousands generated by modENCODE; (ii) access the data in an integrated form together with non-modENCODE data sets; and (iii) facilitate fine-grained analysis of the above data. The sophisticated search features are possible because of the collection of extensive experimental metadata by the consortium. Interfaces are provided to allow both biologists and bioinformaticians to exploit these rich modENCODE data sets now available via modMine.
PMCID: PMC3245176  PMID: 22080565
