
Results 1-12 (12)

1.  Populus tremula (European aspen) shows no evidence of sexual dimorphism 
BMC Plant Biology  2014;14(1):276.
Evolutionary theory suggests that males and females may evolve sexually dimorphic phenotypic and biochemical traits concordant with each sex having different optimal strategies of resource investment to maximise reproductive success and fitness. Such sexual dimorphism would result in sex biased gene expression patterns in non-floral organs for autosomal genes associated with the control and development of such phenotypic traits.
We examined morphological, biochemical and herbivory traits to test for sexually dimorphic resource allocation strategies within collections of sexually mature and immature Populus tremula (European aspen) trees. In addition we profiled gene expression in mature leaves of sexually mature wild trees using whole-genome oligonucleotide microarrays and RNA-Sequencing.
We found no evidence of sexual dimorphism or differential resource investment strategies between males and females in either sexually immature or mature trees. Similarly, single-gene differential expression and machine learning approaches revealed no evidence of large-scale sex biased gene expression. However, two significantly differentially expressed genes were identified from the RNA-Seq data, one of which is a robust diagnostic marker of sex in P. tremula.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-014-0276-5) contains supplementary material, which is available to authorized users.
PMCID: PMC4203875  PMID: 25318822
Sexual dimorphism; RNA-Sequencing; transcriptomics; Populus tremula; dioecious
2.  ComPlEx: conservation and divergence of co-expression networks in A. thaliana, Populus and O. sativa 
BMC Genomics  2014;15:106.
Divergence in gene regulation has emerged as a key mechanism underlying species differentiation. Comparative analysis of co-expression networks across species can reveal conservation and divergence in the regulation of genes.
We inferred co-expression networks of A. thaliana, Populus spp. and O. sativa using state-of-the-art methods based on mutual information and context likelihood of relatedness, and conducted a comprehensive comparison of these networks across a range of co-expression thresholds. In addition to quantifying gene-gene link and network neighbourhood conservation, we also applied recent advancements in network analysis to do cross-species comparisons of network properties such as scale free characteristics and gene centrality as well as network motifs. We found that in all species the networks emerged as scale free only above a certain co-expression threshold, and that the high-centrality genes upholding this organization tended to be conserved. Network motifs, in particular the feed-forward loop, were found to be significantly enriched in specific functional subnetworks but were much less conserved across species than gene centrality. Although individual gene-gene co-expression had massively diverged, up to ~80% of the genes still had a significantly conserved network neighbourhood. For genes with multiple predicted orthologs, about half had one ortholog with conserved regulation and another ortholog with diverged or non-conserved regulation. Furthermore, the most sequence-similar ortholog was not the one with the most conserved gene regulation in over half of the cases.
We have provided a comprehensive analysis of gene regulation evolution in plants and built a web tool for Comparative analysis of Plant co-Expression networks (ComPlEx). The tool can be particularly useful for identifying the ortholog with the most conserved regulation among several sequence-similar alternatives and can thus be of practical importance in e.g. finding candidate genes for perturbation experiments.
PMCID: PMC3925997  PMID: 24498971
3.  OnPLS integration of transcriptomic, proteomic and metabolomic data shows multi-level oxidative stress responses in the cambium of transgenic hipI-superoxide dismutase Populus plants 
BMC Genomics  2013;14:893.
Reactive oxygen species (ROS) are involved in the regulation of diverse physiological processes in plants, including various biotic and abiotic stress responses. Thus, oxidative stress tolerance mechanisms in plants are complex, and diverse responses at multiple levels need to be characterized in order to understand them. Here we present system responses to oxidative stress in Populus by integrating data from analyses of the cambial region of wild-type controls and plants expressing high-isoelectric-point superoxide dismutase (hipI-SOD) transcripts in antisense orientation showing a higher production of superoxide. The cambium, a thin cell layer, generates cells that differentiate to form either phloem or xylem and is hypothesized to be a major reason for phenotypic perturbations in the transgenic plants. Data from multiple platforms including transcriptomics (microarray analysis), proteomics (UPLC/QTOF-MS), and metabolomics (GC-TOF/MS, UPLC/MS, and UHPLC-LTQ/MS) were integrated using the most recent development of orthogonal projections to latent structures called OnPLS. OnPLS is a symmetrical multi-block method that does not depend on the order of analysis when more than two blocks are analysed. Significantly affected genes, proteins and metabolites were then visualized in painted pathway diagrams.
The main categories that appear to be significantly influenced in the transgenic plants were pathways related to redox regulation, carbon metabolism and protein degradation, e.g. the glycolysis and pentose phosphate pathways (PPP). The results provide system-level information on ROS metabolism and responses to oxidative stress, and indicate that some initial responses to oxidative stress may share common pathways.
The proposed data evaluation strategy shows an efficient way of compiling complex, multi-platform datasets to obtain significant biological information.
PMCID: PMC3878592  PMID: 24341908
Statistical integration; OnPLS; Poplar; Oxidative stress; Systems biology
4.  Synergy: A Web Resource for Exploring Gene Regulation in Synechocystis sp. PCC6803 
PLoS ONE  2014;9(11):e113496.
Despite being a highly studied model organism, most genes of the cyanobacterium Synechocystis sp. PCC 6803 encode proteins with completely unknown function. To facilitate studies of gene regulation in Synechocystis, we have developed Synergy, a web application integrating co-expression networks and regulatory motif analysis. Co-expression networks were inferred from publicly available microarray experiments, while regulatory motifs were identified using a phylogenetic footprinting approach. Automatically discovered motifs were shown to be enriched in the network neighborhoods of regulatory proteins much more often than in the neighborhoods of non-regulatory genes, showing that the data provide a sound starting point for studying gene regulation in Synechocystis. Concordantly, we provide several case studies demonstrating that Synergy can be used to find biologically relevant regulatory mechanisms in Synechocystis. Synergy can be used to interactively perform analyses such as gene/motif search, network visualization and motif/function enrichment. Considering the importance of Synechocystis for photosynthesis and biofuel research, we believe that Synergy will become a valuable resource to the research community.
PMCID: PMC4242644  PMID: 25420108
5.  Characterization of cytokinin signaling and homeostasis gene families in two hardwood tree species: Populus trichocarpa and Prunus persica 
BMC Genomics  2013;14:885.
Through the diversity of cytokinin regulated processes, this phytohormone has a profound impact on plant growth and development. Cytokinin signaling is involved in the control of apical and lateral meristem activity, branching pattern of the shoot, and leaf senescence. These processes influence several traits, including the stem diameter, shoot architecture, and perennial life cycle, which define the development of woody plants. To facilitate research about the role of cytokinin in regulation of woody plant development, we have identified genes associated with cytokinin signaling and homeostasis pathways from two hardwood tree species.
Taking advantage of the sequenced black cottonwood (Populus trichocarpa) and peach (Prunus persica) genomes, we have compiled a comprehensive list of genes involved in these pathways. We identified genes belonging to the six families of cytokinin oxidases (CKXs), isopentenyl transferases (IPTs), LONELY GUY genes (LOGs), two-component receptors, histidine containing phosphotransmitters (HPts), and response regulators (RRs). All together 85 Populus and 45 Prunus genes were identified, and compared to their Arabidopsis orthologs through phylogenetic analyses.
In general, when compared to Arabidopsis, differences in gene family structure were often seen in only one of the two tree species. However, one class of genes associated with cytokinin signal transduction, the CKI1-like family of two-component histidine kinases, was larger in both Populus and Prunus than in Arabidopsis.
PMCID: PMC3866579  PMID: 24341635
Cytokinin signaling; Cytokinin homeostasis; Populus trichocarpa; Black cottonwood; Prunus persica; Peach
6.  Classification of microarrays; synergistic effects between normalization, gene selection and machine learning 
BMC Bioinformatics  2011;12:390.
Machine learning is a powerful approach for describing and predicting classes in microarray data. Although several comparative studies have investigated the relative performance of various machine learning methods, these often do not account for the fact that performance (e.g. error rate) is a result of a series of analysis steps of which the most important are data normalization, gene selection and machine learning.
In this study, we used seven previously published cancer-related microarray data sets to compare the effects on classification performance of five normalization methods, three gene selection methods with 21 different numbers of selected genes and eight machine learning methods. Performance in terms of error rate was rigorously estimated by repeatedly employing a double cross validation approach. Since performance varies greatly between data sets, we devised an analysis method that first compares methods within individual data sets and then visualizes the comparisons across data sets. We discovered both well performing individual methods and synergies between different methods.
Support Vector Machines with a radial basis kernel, linear kernel or polynomial kernel of degree 2 all performed consistently well across data sets. We show that there is a synergistic relationship between these methods and gene selection based on the T-test and the selection of a relatively high number of genes. Also, we find that these methods benefit significantly from using normalized data, although it is hard to draw general conclusions about the relative performance of different normalization procedures.
PMCID: PMC3229535  PMID: 21982277
7.  A systems biology model of the regulatory network in Populus leaves reveals interacting regulators and conserved regulation 
BMC Plant Biology  2011;11:13.
Green plant leaves have always fascinated biologists as hosts for photosynthesis and providers of basic energy to many food webs. Today, comprehensive databases of gene expression data enable us to apply increasingly more advanced computational methods for reverse-engineering the regulatory network of leaves, and to begin to understand the gene interactions underlying complex emergent properties related to stress-response and development. These new systems biology methods are now also being applied to organisms such as Populus, a woody perennial tree, in order to understand the specific characteristics of these species.
We present a systems biology model of the regulatory network of Populus leaves. The network is reverse-engineered from promoter information and expression profiles of leaf-specific genes measured over a large set of conditions related to stress and development. The network model incorporates interactions between regulators, such as synergistic and competitive relationships, by evaluating increasingly complex regulatory mechanisms, and is therefore able to identify new regulators of leaf development not found by traditional genomics methods based on pair-wise expression similarity. The approach is shown to explain available gene function information and to provide robust prediction of expression levels in new data. We also use the predictive capability of the model to identify condition-specific regulation as well as conserved regulation between Populus and Arabidopsis.
We outline a computationally inferred model of the regulatory network of Populus leaves, and show how treating genes as interacting, rather than individual, entities identifies new regulators compared to traditional genomics analysis. Although systems biology models should be used with care considering the complexity of regulatory programs and the limitations of current genomics data, methods describing interactions can provide hypotheses about the underlying cause of emergent properties and are needed if we are to identify target genes other than those constituting the "low hanging fruit" of genomic analysis.
PMCID: PMC3030533  PMID: 21232107
8.  Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering 
BMC Bioinformatics  2010;11:503.
Cluster analysis, and in particular hierarchical clustering, is widely used to extract information from gene expression data. The aim is to discover new classes, or sub-classes, of either individuals or genes. Performing a cluster analysis commonly involves decisions on how to handle missing values, standardize the data and select genes. In addition, pre-processing, involving various types of filtration and normalization procedures, can have an effect on the ability to discover biologically relevant classes. Here we consider cluster analysis in a broad sense and perform a comprehensive evaluation that covers several aspects of cluster analyses, including normalization.
We evaluated 2780 cluster analysis methods on seven publicly available 2-channel microarray data sets with common reference designs. Each cluster analysis method differed in data normalization (5 normalizations were considered), missing value imputation (2), standardization of data (2), gene selection (19) or clustering method (11). The cluster analyses are evaluated using known classes, such as cancer types, and the adjusted Rand index. The performances of the different analyses vary between the data sets and it is difficult to give general recommendations. However, normalization, gene selection and clustering method are all variables that have a significant impact on the performance. In particular, gene selection is important and it is generally necessary to include a relatively large number of genes in order to get good performance. Selecting genes with high standard deviation or using principal component analysis are shown to be the preferred gene selection methods. Hierarchical clustering using Ward's method, k-means clustering and Mclust are the clustering methods considered in this paper that achieve the highest adjusted Rand index. Normalization can have a significant positive impact on the ability to cluster individuals, and there are indications that background correction is preferable, in particular if the gene selection is successful. However, this is an area that needs to be studied further in order to draw any general conclusions.
The choice of cluster analysis, and in particular gene selection, has a large impact on the ability to cluster individuals correctly based on expression profiles. Normalization has a positive effect, but the relative performance of different normalizations is an area that needs more research. In summary, although clustering, gene selection and normalization are considered standard methods in bioinformatics, our comprehensive analysis shows that selecting the right methods, and the right combinations of methods, is far from trivial and that much is still unexplored in what is considered to be the most basic analysis of genomic data.
PMCID: PMC3098084  PMID: 20937082
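The abstract above scores each clustering against known classes with the adjusted Rand index. As a rough illustration only, and not the authors' code, the following sketch computes the standard (Hubert and Arabie) form of the index from two label lists. The function name and the convention of returning 1.0 in degenerate cases are assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    Hypothetical helper for illustration; returns 1.0 for degenerate
    inputs (fewer than two items, or both labelings trivial).
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    # Number of item pairs grouped together in both, in truth, in prediction.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The index is 1 for identical partitions regardless of how clusters are named, is close to 0 for random agreement, and can be negative when agreement is worse than chance.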
9.  Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue–residue contacts 
Bioinformatics  2009;25(10):1264-1270.
Motivation: Correct prediction of residue–residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail.
Results: We propose a novel hidden Markov model (HMM)-based method for predicting residue–residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 · L predictions (L=sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2677742  PMID: 19289446
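The accuracy figures in the abstract above are computed over the top 0.2 · L predicted contacts, where L is the sequence length. The following is a minimal sketch of that kind of evaluation, not the authors' implementation. The function name, the (i, j, score) input format and the `min_separation` filter are assumptions made for this example; the abstract does not give the sequence-separation cut-offs that define short-, medium- and long-range contacts.

```python
def top_fraction_accuracy(predictions, true_contacts, seq_length,
                          fraction=0.2, min_separation=0):
    """Fraction of the top-scoring predicted contacts that are true contacts.

    predictions: iterable of (i, j, score) residue pairs (hypothetical format).
    true_contacts: iterable of (i, j) pairs observed in the known structure.
    min_separation: assumed filter on |i - j|, e.g. to keep long-range pairs.
    """
    # Evaluate the k = fraction * L best-scoring pairs (at least one).
    k = max(1, int(fraction * seq_length))
    eligible = [p for p in predictions if abs(p[0] - p[1]) >= min_separation]
    top = sorted(eligible, key=lambda p: p[2], reverse=True)[:k]
    if not top:
        return 0.0
    # Store pairs in a canonical order so (i, j) and (j, i) match.
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}
    hits = sum((min(i, j), max(i, j)) in truth for i, j, _ in top)
    return hits / len(top)
```

For a 10-residue toy sequence, k is 2, so only the two highest-scoring pairs are checked against the known contacts.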
10.  A Comprehensive Analysis of the Structure-Function Relationship in Proteins Based on Local Structure Similarity 
PLoS ONE  2009;4(7):e6266.
Sequence similarity to characterized proteins provides testable functional hypotheses for less than 50% of the proteins identified by genome sequencing projects. With structural genomics it is believed that structural similarities may give functional hypotheses for many of the remaining proteins.
Methodology/Principal Findings
We provide a systematic analysis of the structure-function relationship in proteins using the novel concept of local descriptors of protein structure. A local descriptor is a small substructure of a protein which includes both short- and long-range interactions. We employ a library of commonly reoccurring local descriptors general enough to assemble most existing protein structures. We then model the relationship between these local shapes and Gene Ontology using rule-based learning. Our IF-THEN rule model offers legible, high resolution descriptions that combine local substructures and is able to discriminate functions even for functionally versatile folds such as the frequently occurring TIM barrel and Rossmann fold. By evaluating the predictive performance of the model, we provide a comprehensive quantification of the structure-function relationship based only on local structure similarity. Our findings are, among others, that conserved structure is a stronger prerequisite for enzymatic activity than for binding specificity, and that structure-based predictions complement sequence-based predictions. The model is capable of generating correct hypotheses, as confirmed by a literature study, even when no significant sequence similarity to characterized proteins exists.
Our approach offers a new and complete description and quantification of the structure-function relationship in proteins. By demonstrating how our predictions offer higher sensitivity than using global structure, and complement the use of sequence, we show that the presented ideas could advance the development of meta-servers in function prediction.
PMCID: PMC2705683  PMID: 19603073
11.  Revealing cell cycle control by combining model-based detection of periodic expression with novel cis-regulatory descriptors 
BMC Systems Biology  2007;1:45.
We address the issue of explaining the presence or absence of phase-specific transcription in budding yeast cultures under different conditions. To this end we use a model-based detector of gene expression periodicity to divide genes into classes depending on their behavior in experiments using different synchronization methods. While computational inference of gene regulatory circuits typically relies on expression similarity (clustering) in order to find classes of potentially co-regulated genes, this method instead takes advantage of known time profile signatures related to the studied process.
We explain the regulatory mechanisms of the inferred periodic classes with cis-regulatory descriptors that combine upstream sequence motifs with experimentally determined binding of transcription factors. By systematic statistical analysis we show that periodic classes are best explained by combinations of descriptors rather than single descriptors, and that different combinations correspond to periodic expression in different classes. We also find evidence for additive regulation in that the combinations of cis-regulatory descriptors associated with genes periodically expressed in fewer conditions are frequently subsets of combinations associated with genes periodically expressed in more conditions. Finally, we demonstrate that our approach retrieves combinations that are more specific towards known cell-cycle related regulators than the frequently used clustering approach.
The results illustrate how a model-based approach to expression analysis may be particularly well suited to detect biologically relevant mechanisms. Our new approach makes it possible to provide more refined hypotheses about regulatory mechanisms of the cell cycle and it can easily be adjusted to reveal regulation of other, non-periodic, cellular processes.
PMCID: PMC2200664  PMID: 17939860
12.  Using local gene expression similarities to discover regulatory binding site modules 
BMC Bioinformatics  2006;7:505.
We present an approach designed to identify gene regulation patterns using sequence and expression data collected for Saccharomyces cerevisiae. Our main goal is to relate the combinations of transcription factor binding sites (also referred to as binding site modules) identified in gene promoters to the expression of these genes. The novel aspects include local expression similarity clustering and an exact IF-THEN rule inference algorithm. We also provide a method of rule generalization to include genes with unknown expression profiles.
We have implemented the proposed framework and tested it on publicly available datasets from the yeast S. cerevisiae. The testing procedure consists of thorough statistical analyses of the groups of genes matching the rules we infer from expression data against known sets of co-regulated genes. For this purpose we have used published ChIP-Chip data and Gene Ontology annotations. In order to make these tests more objective we compare our results with recently published similar studies.
The results we obtain show that local expression similarity clustering greatly enhances the overall quality of the derived rules, both in terms of enrichment of Gene Ontology functional annotation and coherence with ChIP-Chip binding data. Our approach thus provides reliable hypotheses on co-regulation that can be experimentally verified. An important feature of the method is its reliance only on widely accessible sequence and expression data. The same procedure can be easily applied to other microbial organisms.
PMCID: PMC2001304  PMID: 17109764