Results 1-11 (11)

1.  Statistical Inference and Reverse Engineering of Gene Regulatory Networks from Observational Expression Data 
In this paper, we present a systematic and conceptual overview of methods for inferring gene regulatory networks from observational gene expression data. Further, we discuss two classic approaches to infer causal structures and compare them with contemporary methods by providing a conceptual categorization thereof. We complement the above by surveying global and local evaluation measures for assessing the performance of inference algorithms.
PMCID: PMC3271232  PMID: 22408642
gene regulatory networks; statistical inference; reverse engineering; causal relations; directed acyclic graphs; Bayesian network; information-theory methods
2.  Large Scale Chemical Cross-linking Mass Spectrometry Perspectives 
The spectacular heterogeneity of a complex protein mixture from biological samples becomes even more difficult to tackle when one’s attention is shifted towards different protein complex topologies, transient interactions, or localization of protein-protein interactions (PPIs). Meticulous protein-by-protein affinity pull-downs and yeast-two-hybrid screens are the two approaches currently used to decipher proteome-wide interaction networks. Another method is to employ chemical cross-linking, which gives not only the identities of interactors, but could also provide information on the sites of interactions and interaction interfaces. Despite significant advances in mass spectrometry instrumentation over the last decade, mapping PPIs using chemical cross-linking remains time-consuming and requires substantial expertise, even in the simplest of systems. While robust methodologies and software exist for the analysis of binary PPIs and also for single-protein structure refinement using cross-linking-derived constraints, undertaking a proteome-wide cross-linking study is highly complex. Difficulties include i) identifying cross-linkers of the right length and selectivity that could capture interactions of interest; ii) enrichment of the cross-linked species; iii) identification and validation of the cross-linked peptides and cross-linked sites.
In this review we examine existing literature aimed at large-scale protein cross-linking and discuss possible paths for improvement. We also discuss short-length cross-linkers of broad specificity such as formaldehyde and diazirine-based photo-cross-linkers. These cross-linkers could potentially capture many types of interactions, without a strict requirement for a particular amino acid to be present at a given protein-protein interface. How can these short-length, broad-specificity cross-linkers be applied to proteome-wide studies? We suggest specific advances in methodology, instrumentation and software that are needed to make such a leap.
PMCID: PMC4101816  PMID: 25045217
Chemical cross linking; Mass spectrometry; Proteomics; Large-scale PPI
3.  A Stress-Activated, p38 Mitogen-Activated Protein Kinase–ATF/CREB Pathway Regulates Posttranscriptional, Sequence-Dependent Decay of Target RNAs 
Molecular and Cellular Biology  2013;33(15):3026-3035.
Broadly conserved, mitogen-activated/stress-activated protein kinases (MAPK/SAPK) of the p38 family regulate multiple cellular processes. They transduce signals via dimeric, basic leucine zipper (bZIP) transcription factors of the ATF/CREB family (such as Atf2, Fos, and Jun) to regulate the transcription of target genes. We report additional mechanisms for gene regulation by such pathways exerted through RNA stability controls. The Spc1 (Sty1/Phh1) kinase-regulated Atf1-Pcr1 (Mts1-Mts2) heterodimer of the fission yeast Schizosaccharomyces pombe controls the stress-induced, posttranscriptional stability and decay of sets of target RNAs. Whole transcriptome RNA sequencing data revealed that decay is associated nonrandomly with transcripts that contain an M26 sequence motif. Moreover, the ablation of an M26 sequence motif in a target mRNA is sufficient to block its stress-induced loss. Conversely, engineered M26 motifs can render a stable mRNA into one that is targeted for decay. This stress-activated RNA decay (SARD) provides a mechanism for reducing the expression of target genes without shutting off transcription itself. Thus, a single p38-ATF/CREB signal transduction pathway can coordinately induce (promote transcription and RNA stability) and repress (promote RNA decay) transcript levels for distinct sets of genes, as is required for developmental decisions in response to stress and other stimuli.
PMCID: PMC3719685  PMID: 23732911
4.  Evolution of gene expression and expression plasticity in long-term experimental populations of Drosophila melanogaster maintained under constant and variable ethanol stress 
Molecular ecology  2012;21(17):4287-4299.
Gene expression responds to the environment, and can also evolve rapidly in response to altered selection regimes. Little is known, however, about the extent to which evolutionary adaptation to a particular type of stress involves changes in the within-generation (“plastic”) responses of gene expression to the stress. We used microarrays to quantify gene expression plasticity in response to ethanol in laboratory populations of Drosophila melanogaster differing in their history of ethanol exposure. Two populations (“R” populations) were maintained on regular medium, two (“E”) were maintained on medium supplemented with ethanol, and two (“M”) were maintained in a mixed regime in which half of the population was reared on one medium type, and half on the other, each generation. After more than 300 generations, embryos from each population were collected and exposed to either ethanol or water as a control, and RNA was extracted from the larvae shortly after hatching. Nearly 2000 transcripts showed significant within-generation responses to ethanol exposure. Evolutionary history also affected gene expression: the E and M populations were largely indistinguishable in expression, but differed significantly in expression from the R populations for over 100 transcripts, the majority of which did not show plastic responses. Notably, in no case was the interaction between selection regime and ethanol exposure significant after controlling for multiple comparisons, indicating that adaptation to ethanol in the E and M populations did not involve substantial changes in gene expression plasticity. The results give evidence that expression plasticity evolves considerably more slowly than mean expression.
PMCID: PMC3654693  PMID: 22774776
adaptation; alcohol; genetic correlation; genotype-environment interaction
5.  Ensuring the statistical soundness of competitive gene set approaches: gene filtering and genome-scale coverage are essential 
Nucleic Acids Research  2013;41(7):e82.
In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data, to the point that such an analysis is no longer reconcilable with the principles of statistical inference and, in the worst case, yields uninformative results. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, this is not enough to ensure the statistical soundness of GSEA, GSEArot and GAGE for gene expression data from chips that do not provide genome-scale coverage of the expression values of all mRNAs. For this reason, for biomedical and clinical studies, we strongly advise against using GSEA, GSEArot and GAGE for such data sets.
PMCID: PMC3627569  PMID: 23389952
6.  Computational Prediction of Polycomb-Associated Long Non-Coding RNAs 
PLoS ONE  2012;7(9):e44878.
Among thousands of long non-coding RNAs (lncRNAs), only a small subset is functionally characterized, and the functional annotation of lncRNAs on the genomic scale remains inadequate. In this study we computationally characterized two functionally different parts of the human lncRNA transcriptome based on their ability to bind the polycomb repressive complex, PRC2. This classification is enabled by the fact that, while lncRNAs as a whole constitute a diverse set of sequences, the classes of PRC2-binding and PRC2 non-binding lncRNAs possess characteristic combinations of sequence-structure patterns and, therefore, can be separated within the feature space. Based on the specific combination of features, we built several machine-learning classifiers and identified the SVM-based classifier as the best performing. We further showed that the SVM-based classifier is able to generalize to independent data sets. We observed that this classifier, trained on human lncRNAs, can predict up to 59.4% of PRC2-binding lncRNAs in mice. This suggests that, despite the low degree of sequence conservation, many lncRNAs play functionally conserved biological roles.
PMCID: PMC3441527  PMID: 23028655
7.  A Bayesian analysis of the chromosome architecture of human disorders by integrating reductionist data 
Scientific Reports  2012;2:513.
In this paper, we present a Bayesian approach to estimate a chromosome and a disorder network from the Online Mendelian Inheritance in Man (OMIM) database. In contrast to other approaches, we obtain statistical rather than deterministic networks, enabling parametric control of the uncertainty of the underlying disorder-disease gene associations contained in OMIM, on which the networks are based. From a structural investigation of the chromosome network, we identify three chromosome subgroups that reflect architectural differences in chromosome-disorder associations that are predictively exploitable for a functional analysis of diseases.
PMCID: PMC3400933  PMID: 22822426
9.  Unite and conquer: univariate and multivariate approaches for finding differentially expressed gene sets 
Bioinformatics  2009;25(18):2348-2354.
Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, there is still a need to quantify their relative performance when they are testing different null hypotheses.
Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. In the simulation study we demonstrate that high correlations equally affect the power of both univariate and multivariate tests. In addition, for most of them the power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set whose expression changes between two phenotypes. The application of different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T², N-statistic), testing different null hypotheses, find some common but also some complementary differentially expressed gene sets under specific settings. This demonstrates that, because the null hypotheses are complementary, each test highlights different aspects of the data, and that for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2735665  PMID: 19574285
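As an illustration of one of the univariate statistics named in the abstract above, the following is a minimal sketch of a sum of squared t-statistics for a gene set. It assumes a pooled-variance two-sample t-statistic per gene; the article's exact definition may differ. The function names, data layout and toy values are hypothetical and not taken from the article.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Two-sample t-statistic with pooled variance for one gene."""
    n1, n2 = len(group1), len(group2)
    # Pooled sample variance across the two phenotypes.
    sp2 = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def sum_squared_t(gene_set):
    """Sum of squared per-gene t-statistics over a gene set.

    gene_set maps a gene name to a pair of expression lists,
    one list per phenotype (hypothetical layout).
    """
    return sum(pooled_t(g1, g2) ** 2 for g1, g2 in gene_set.values())


# Toy gene set: geneX differs between phenotypes, geneY does not.
toy_set = {
    "geneX": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "geneY": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
}
score = sum_squared_t(toy_set)
```

In practice the significance of such a score would typically be assessed by permuting phenotype labels; that step is omitted here to keep the sketch deterministic.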
10.  Mutational hotspots in the TP53 gene and, possibly, other tumor suppressors evolve by positive selection 
Biology Direct  2006;1:4.
The mutation spectra of the TP53 gene and other tumor suppressors contain multiple hotspots, i.e., sites of non-random, frequent mutation in tumors and/or the germline. The origin of the hotspots remains unclear, the general view being that they represent highly mutable nucleotide contexts which likely reflect effects of different endogenous and exogenous factors shaping the mutation process in specific tissues. The origin of hotspots is of major importance because it has been suggested that mutable contexts could be used to infer mechanisms of mutagenesis contributing to tumorigenesis.
Here we apply three independent tests, accounting for non-uniform base compositions in synonymous and non-synonymous sites, to test whether the hotspots emerge via selection or due to mutational bias. All three tests consistently indicate that the hotspots in the TP53 gene evolve, primarily, via positive selection. The results were robust to the elimination of the highly mutable CpG dinucleotides. By contrast, only one, the least conservative test reveals the signature of positive selection in BRCA1, BRCA2, and p16. Elucidation of the origin of the hotspots in these genes requires more data on somatic mutations in tumors.
The results of this analysis seem to indicate that positive selection for gain-of-function in tumor suppressor genes is an important aspect of tumorigenesis, blurring the distinction between tumor suppressors and oncogenes.
This article was reviewed by Sandor Pongor, Christopher Lee and Mikhail Blagosklonny.
PMCID: PMC1403748  PMID: 16542006
11.  Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns 
Genome Biology  2004;5(5):R32.
A hierarchy of 3,688 phyletic patterns was characterized encompassing more than 5,000 known protein-coding genes from 66 complete microbial genomes. The results indicate that gene loss and displacement has occurred in the evolution of most pathways.
Phyletic patterns denote the presence and absence of orthologous genes in completely sequenced genomes and are used to infer functional links between genes, on the assumption that genes involved in the same pathway or functional system are co-inherited by the same set of genomes. However, this basic premise has not been quantitatively tested, and the limits of applicability of the phyletic-pattern method remain unknown.
We characterized a hierarchy of 3,688 phyletic patterns encompassing more than 5,000 known protein-coding genes from 66 complete microbial genomes, using different distances, clustering algorithms, and measures of cluster quality. The most sensitive set of parameters recovered 223 clusters, each consisting of genes that belong to the same metabolic pathway or functional system. Fifty-six clusters included unexpected genes with plausible functional links to the rest of the cluster. Only a small percentage of known pathways and multiprotein complexes are co-inherited as one cluster; most are split into many clusters, indicating that gene loss and displacement has occurred in the evolution of most pathways.
Phyletic patterns of functionally linked genes are perturbed by differential gains, losses and displacements of orthologous genes in different species, reflecting the high plasticity of microbial genomes. Groups of genes that are co-inherited can, however, be recovered by hierarchical clustering, and may represent elementary functional modules of cellular metabolism. The phyletic patterns approach alone can confidently predict the functional linkages for about 24% of the entire data set.
PMCID: PMC416468  PMID: 15128446
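To make the idea of clustering phyletic patterns concrete, the following is a minimal sketch. It assumes each gene is represented as a presence/absence vector over genomes, uses the Jaccard distance, and cuts a single-linkage clustering at a fixed threshold. The article reports trying several distances and clustering algorithms, so this is only one hypothetical choice, and all names and toy data below are illustrative.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two presence/absence patterns (0/1 tuples)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def single_linkage_clusters(patterns, threshold):
    """Group genes whose patterns are linked by distances <= threshold.

    Cutting a single-linkage hierarchy at `threshold` is equivalent to
    taking connected components of the graph whose edges join genes at
    distance <= threshold, which is what the union-find below computes.
    """
    parent = {gene: gene for gene in patterns}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path compression
            gene = parent[gene]
        return gene

    for g1, g2 in combinations(patterns, 2):
        if jaccard_distance(patterns[g1], patterns[g2]) <= threshold:
            parent[find(g1)] = find(g2)

    clusters = {}
    for gene in patterns:
        clusters.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in clusters.values())


# Toy phyletic patterns over six hypothetical genomes (1 = ortholog present).
toy_patterns = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (1, 1, 0, 1, 1, 1),
    "geneD": (0, 0, 1, 0, 1, 0),
    "geneE": (0, 0, 1, 0, 0, 0),
}
tight = single_linkage_clusters(toy_patterns, 0.25)
loose = single_linkage_clusters(toy_patterns, 0.5)
```

With the tighter threshold only the nearly identical patterns are grouped; relaxing it also joins geneD and geneE, which share a single genome. This mirrors the trade-off between cluster purity and coverage that the abstract describes.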