Genome-wide experiments are routinely used to increase the understanding of the biological processes involved in the development and maintenance of a variety of pathologies. Although the technical feasibility of this type of experiment has improved in recent years, data analysis remains challenging. In this context, gene set analysis has emerged as a fundamental tool for the interpretation of the results. Here, we review strategies used in the gene set approach, and using datasets for the pig cardiocirculatory system as a case study, we demonstrate how the use of a combination of these strategies can enhance the interpretation of results. Gene set analyses are able to distinguish vessels from the heart and arteries from veins in a manner that is consistent with the different cellular composition of smooth muscle cells. By integrating microRNA elements in the regulatory circuits identified, we find that vessel specificity is maintained through specific miRNAs, such as miR-133a and miR-143, which show anti-correlated expression with their mRNA targets.
pathway analysis; miRNA; cardiocirculatory; network reconstruction; integrative analysis; pig; artery; vein; vessel
Muscles in Duchenne dystrophy patients are characterized by the absence of dystrophin, yet transverse sections show a small percentage of fibers (termed “revertant fibers”) positive for dystrophin expression. This phenomenon, whose biological bases have not been fully elucidated, is also present in the murine and canine models of DMD and can confound the evaluation of therapeutic approaches. We analyzed 11 different muscles in a cohort of 40 mdx mice, the model most commonly used in pre-clinical studies, belonging to four age groups; this number of animals allowed us to perform a robust ANOVA statistical analysis. We assessed the average number of dystrophin-positive fibers, both absolute and normalized for muscle size, and the correlation between their formation and the ageing process. Our results indicate that various muscles develop different numbers of revertant fibers, with different time trends; moreover, they suggest that the biological mechanism(s) behind dystrophin re-expression might not be limited to the early development phases but could actually continue during adulthood. Importantly, this finding was also seen in cardiac muscle, a fact that does not fit into the current hypothesis of the clonal origin of “revertant” myonuclei from satellite cells. This work represents the largest, statistically significant analysis of revertant fibers in mdx mice so far, which can now be used as a reference point for improving the evaluation of therapeutic approaches for DMD. At the same time, it provides new clues about the formation of revertant fibers/cardiomyocytes in dystrophic skeletal and cardiac muscle.
Background: In developing muscle, stimulation of mitochondrial biogenesis and mtDNA expansion occur with down-regulation of deoxynucleotide synthesis.
Results: siRNA silencing of mitochondrial thymidine or deoxyguanosine kinase impacts myotube differentiation causing depletion of mtDNA and of all four deoxynucleotides.
Conclusion: Shortage of even a single deoxynucleotide may upset the regulation of all DNA precursors.
Significance: Deoxynucleotide analysis in myotubes unveils unexpected outcomes of synthetic enzyme deficiencies.
During myogenesis, myoblasts fuse into multinucleated myotubes that acquire the contractile fibrils and accessory structures typical of striated skeletal muscle fibers. To support the high energy requirements of muscle contraction, myogenesis entails an increase in mitochondrial (mt) mass with stimulation of mtDNA synthesis and consumption of DNA precursors (dNTPs). Myotubes are quiescent cells and as such down-regulate dNTP production despite a high demand for dNTPs. Although myogenesis has been studied extensively, changes in dNTP metabolism have not been examined specifically. In differentiating cultures of C2C12 myoblasts and purified myotubes, we analyzed expression and activities of enzymes of dNTP biosynthesis, dNTP pools, and the expansion of mtDNA. Myotubes exhibited pronounced post-mitotic modifications of dNTP synthesis with a particularly marked down-regulation of de novo thymidylate synthesis. Expression profiling revealed the same pattern of enzyme down-regulation in adult murine muscles. The mtDNA increased steadily after myoblast fusion, turning over rapidly, as revealed after treatment with ethidium bromide. We individually down-regulated p53R2 ribonucleotide reductase, thymidine kinase 2, and deoxyguanosine kinase by siRNA transfection to examine how a further reduction of these synthetic enzymes impacted myotube development. Silencing of p53R2 had little effect, but silencing of either mt kinase caused 50% mtDNA depletion and an unexpected decrease of all four dNTP pools independently of the kinase specificity. We suggest that during development of myotubes the shortage of even a single dNTP may affect all four pools through dysregulation of ribonucleotide reduction and/or dissipation of the non-limiting dNTPs during unproductive elongation of new DNA chains.
Mitochondrial Diseases; Mitochondrial DNA; Nucleoside Nucleotide Metabolism; Ribonucleotide Reductase; Skeletal Muscle; Deoxyguanosine Kinase; mtDNA Depletion; siRNA Silencing; Thymidine Kinase 2
Gene set analysis using biological pathways has become a widely used statistical approach for gene expression analysis. A biological pathway can be represented through a graph where genes and their interactions are, respectively, nodes and edges of the graph. From a biological point of view only some portions of a pathway are expected to be altered; however, few methods using pathway topology have been proposed and none of them tries to identify the signal paths, within a pathway, that are most involved in the biological problem. Here, we present clipper, a novel algorithm for pathway analysis that tries to fill this gap. clipper implements a two-step empirical approach based on the decomposition of the graph into a junction tree to reconstruct the most relevant signal path. In the first step clipper selects significant pathways according to statistical tests on the means and the concentration matrices of the graphs derived from pathway topologies. Then, it identifies within these pathways the signal paths having the greatest association with a specific phenotype. We test our approach on simulated data and on two real expression datasets. Our results demonstrate the efficacy of clipper in the identification of signal transduction paths fully consistent with the biological problem.
MAGIA2 (http://gencomp.bio.unipd.it/magia2) is an update, extension and evolution of the MAGIA web tool. It is dedicated to the integrated analysis of in silico target prediction, microRNA (miRNA) and gene expression data for the reconstruction of post-transcriptional regulatory networks. miRNAs are fundamental post-transcriptional regulators of several key biological and pathological processes. As miRNAs act prevalently through target degradation, their expression profiles are expected to be inversely correlated to those of the target genes. Low specificity of target prediction algorithms makes integration approaches an interesting solution for target prediction refinement. MAGIA2 performs this integrative approach supporting different association measures, multiple organisms and almost all target prediction algorithms. Nevertheless, miRNA activity should be viewed as part of a more complex scenario where regulatory elements and their interactors generate a highly connected network and where gene expression profiles are the result of different levels of regulation. The updated MAGIA2 tries to dissect this complexity by reconstructing mixed regulatory circuits involving either miRNA or transcription factor (TF) as regulators. Two types of circuits are identified: (i) a TF that regulates both a miRNA and its target and (ii) a miRNA that regulates both a TF and its target.
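The two circuit types can be illustrated with a minimal sketch. It is not MAGIA2 code: the function name `find_mixed_circuits`, the dictionary-based input format and the element names in the example are all hypothetical, and regulatory relations are assumed to be already available as simple lookup tables.

```python
def find_mixed_circuits(tf_targets, mirna_targets):
    """Enumerate the two mixed circuit types described above.

    tf_targets:    dict mapping a TF to the set of elements it regulates
                   (genes and/or miRNAs). Hypothetical input format.
    mirna_targets: dict mapping a miRNA to the set of its targets
                   (genes and/or TFs). Hypothetical input format.
    """
    tf_driven = []     # type (i): TF -> miRNA, TF -> gene, miRNA -> gene
    mirna_driven = []  # type (ii): miRNA -> TF, miRNA -> gene, TF -> gene

    for tf in sorted(tf_targets):
        regulated = tf_targets[tf]
        # miRNAs regulated by this TF
        for mirna in sorted(regulated & set(mirna_targets)):
            # genes regulated by the TF that are also targets of the miRNA
            for gene in sorted(regulated & mirna_targets[mirna]):
                tf_driven.append((tf, mirna, gene))

    for mirna in sorted(mirna_targets):
        targets = mirna_targets[mirna]
        # TFs targeted by this miRNA
        for tf in sorted(targets & set(tf_targets)):
            # genes targeted by the miRNA that are also regulated by the TF
            for gene in sorted(targets & tf_targets[tf]):
                mirna_driven.append((mirna, tf, gene))

    return tf_driven, mirna_driven


# Toy example with made-up element names
tf_targets = {"TF1": {"miR-a", "G1", "G2"}, "TF2": {"G3"}}
mirna_targets = {"miR-a": {"G1", "G4"}, "miR-b": {"TF2", "G3"}}
type_i, type_ii = find_mixed_circuits(tf_targets, mirna_targets)
```

In the toy example, TF1 regulates both miR-a and its target G1 (type i), while miR-b targets both TF2 and TF2's target G3 (type ii).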
Trabectedin, a new antitumor compound originally derived from a marine tunicate, is clinically effective in soft tissue sarcoma. The drug has shown a high selectivity for myxoid liposarcoma, characterized by the translocation t(12;16)(q13; p11) leading to the expression of the FUS-CHOP fusion gene. Trabectedin appears to act by interfering with mechanisms of transcription regulation. In particular, the transactivating activity of FUS-CHOP was found to be impaired by trabectedin treatment. Even after a prolonged response, resistance occurs, and thus it is important to elucidate the mechanisms of resistance to trabectedin. To this end we developed and characterized a myxoid liposarcoma cell line resistant to trabectedin (402-91/ET), obtained by exposing the parental 402-91 cell line to stepwise increases in drug concentration. The aim of this study was to compare the mRNA, miRNA and protein profiles of 402-91 and 402-91/ET cells through a systems biology approach. We identified 3,083 genes, 47 miRNAs and 336 proteins differentially expressed between 402-91 and 402-91/ET cell lines. Interestingly, three miRNAs among those differentially expressed, miR-130a, miR-21 and miR-7, harbored CHOP binding sites in their promoter region. We used computational approaches to integrate the three regulatory layers and to generate a molecular map describing the altered circuits in sensitive and resistant cell lines. By combining transcriptomic and proteomic data, we reconstructed two different networks, i.e. apoptosis and cell cycle regulation, that could play a key role in modulating trabectedin resistance. This approach highlights the central role of genes such as CCDN1, RB1, E2F4, TNF, CDKN1C and ABL1 in both pre- and post-transcriptional regulatory networks. The validation of these results in in vivo models might be clinically relevant to stratify myxoid liposarcoma patients with different sensitivity to trabectedin treatment.
Gene set analysis is moving towards considering pathway topology as a crucial feature. Pathway elements are complex entities such as protein complexes, gene family members and chemical compounds. The conversion of pathway topology to a gene/protein network (where each node is a simple element such as a gene/protein) is a critical and challenging task that enables topology-based gene set analyses.
Unfortunately, currently available R/Bioconductor packages provide pathway networks only from single databases. They do not propagate signals through chemical compounds and do not differentiate between complexes and gene families.
Here we present graphite, a Bioconductor package addressing these issues. Pathway information from four different databases is interpreted following specific biologically-driven rules that allow the reconstruction of gene-gene networks taking into account protein complexes, gene families and sensibly removing chemical compounds from the final graphs. The resulting networks represent a uniform resource for pathway analyses. Indeed, graphite provides easy access to three recently proposed topological methods. The graphite package is available as part of the Bioconductor software suite.
graphite is an innovative package able to gather and make easily available the contents of the four major pathway databases. In the field of topological analysis graphite acts as a provider of biological information by reducing the pathway complexity considering the biological meaning of the pathway elements.
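The kind of conversion described above can be sketched as follows. This is not the graphite implementation or its API: the function `to_gene_network`, the node encoding and the specific rules (complex members linked to each other, family members expanded but not linked to each other, compounds bypassed by connecting their predecessors to their successors) are assumptions chosen for illustration only.

```python
from itertools import combinations, product


def to_gene_network(nodes, edges):
    """Convert a pathway with composite nodes into directed gene-gene edges.

    nodes: dict name -> (kind, members), where kind is one of
           "gene", "complex", "family" or "compound" (hypothetical encoding).
    edges: iterable of (source, target) pathway edges.
    """
    edges = set(edges)

    # Assumed rule 1: remove each compound and propagate the signal
    # through it, so that A -> compound -> B becomes A -> B.
    for compound in [n for n, (kind, _) in nodes.items() if kind == "compound"]:
        preds = {s for s, t in edges if t == compound and s != compound}
        succs = {t for s, t in edges if s == compound and t != compound}
        edges = {(s, t) for s, t in edges if compound not in (s, t)}
        edges |= set(product(preds, succs))

    # Assumed rule 2: an edge between composite nodes is expanded to all
    # pairs of member genes (self-loops are dropped).
    gene_edges = set()
    for source, target in edges:
        for gs, gt in product(nodes[source][1], nodes[target][1]):
            if gs != gt:
                gene_edges.add((gs, gt))

    # Assumed rule 3: members of a complex are linked to each other in
    # both directions; members of a gene family are not.
    for kind, members in nodes.values():
        if kind == "complex":
            for a, b in combinations(members, 2):
                gene_edges.add((a, b))
                gene_edges.add((b, a))

    return gene_edges


# Toy pathway with made-up node names
nodes = {
    "A": ("gene", ["A"]),
    "CPX": ("complex", ["B", "C"]),
    "FAM": ("family", ["D", "E"]),
    "Ca2+": ("compound", []),
}
edges = [("A", "Ca2+"), ("Ca2+", "CPX"), ("CPX", "FAM")]
network = to_gene_network(nodes, edges)
```

In the toy pathway the compound disappears from the result, gene A is connected to both members of the complex, and the two family members D and E each receive edges from B and C without being connected to one another.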
Dysregulation of miRNA expression plays a critical role in the pathogenesis of genetic, multifactorial disorders and in human cancers. We exploited sequence, genomic and expression information to investigate two main aspects of post-transcriptional regulation in miRNA biogenesis, namely strand selection regulation and expression relationships between intragenic miRNAs and host genes. We considered miRNA expression profiles, measured in five sizeable microarray datasets, including samples from different normal cell types and tissues, as well as different tumours and disease states. First, the study of expression profiles of “sister” miRNA pairs (miRNA/miRNA*, 5′ and 3′ strands of the same hairpin precursor) showed that strand selection is highly regulated, since it shows tissue-/cell-/condition-specific modulation. We used information about the direction and the strength of the strand selection bias to perform an unsupervised cluster analysis for sample classification, showing that it is able to distinguish among different tissues, and sometimes between normal and malignant cells. Then, considering a minimum expression threshold, in a few miRNA pairs only one mature miRNA is present in all considered cell types, whereas the majority of pairs were concurrently expressed in some cell types and alternatively in others. In a significant fraction of concurrently expressed pairs, the major and the minor forms found at comparable levels may contribute to post-transcriptional gene silencing, possibly in a coordinated way. In the second part of the study, the commonly assumed tendency to co-expression of intragenic miRNAs and their “host” mRNA genes was refuted by examination of expression profiles, suggesting that the expression profile of a given host gene can hardly be a good estimator of co-transcribed miRNA(s) for the inference of post-transcriptional regulatory networks.
Our results point out the regulatory importance of post-transcriptional phases of miRNAs biogenesis, reinforcing the role of such layer of miRNA biogenesis in miRNA-based regulation of cell activities.
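One simple way to express the direction and strength of a strand selection bias is a log-ratio between the two sister miRNAs. The sketch below assumes this definition, together with a pseudocount to avoid division by zero; neither is stated in the text above, and the function name `strand_bias` is hypothetical.

```python
import math


def strand_bias(expr_5p, expr_3p, pseudocount=1.0):
    """Per-sample log2 ratio between the 5' and 3' strand expression.

    The sign gives the direction of the bias (positive: 5' strand
    prevails) and the absolute value gives its strength. The log-ratio
    definition and the pseudocount are illustrative assumptions.
    """
    return [
        math.log2((a + pseudocount) / (b + pseudocount))
        for a, b in zip(expr_5p, expr_3p)
    ]


# Three samples: 5' dominant, 3' dominant, balanced
bias = strand_bias([7, 1, 3], [1, 7, 3])
```

A vector of such values per miRNA pair could then serve as input to an unsupervised clustering of the samples.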
Fish has been deemed suitable to study the complex mechanisms of vertebrate skeletogenesis and gilthead seabream (Sparus aurata), a marine teleost with acellular bone, has been successfully used in recent years to study the function and regulation of bone and cartilage related genes during development and in adult animals. Tools recently developed for gilthead seabream, e.g. mineralogenic cell lines and a 4 × 44K Agilent oligo-array, were used to identify molecular determinants of in vitro mineralization and genes involved in anti-mineralogenic action of vanadate.
Global analysis of gene expression identified 4,223 and 4,147 genes differentially expressed (fold change - FC > 1.5) during in vitro mineralization of VSa13 (pre-chondrocyte) and VSa16 (pre-osteoblast) cells, respectively. Comparative analysis indicated that nearly 45% of these genes are common to both cell lines and gene ontology (GO) classification is also similar for both cell types. Up-regulated genes (FC > 10) were mainly associated with transport, matrix/membrane, metabolism and signaling, while down-regulated genes were mainly associated with metabolism, calcium binding, transport and signaling. Analysis of gene expression in proliferative and mineralizing cells exposed to vanadate revealed 1,779 and 1,136 differentially expressed genes, respectively. Of these genes, 67 exhibited reverse patterns of expression upon vanadate treatment during proliferation or mineralization.
Comparative analysis of expression data from fish and data available in the literature for mammalian cell systems (bone-derived cells undergoing differentiation) indicates that the same type of genes, and in some cases the same orthologs, are involved in mechanisms of in vitro mineralization, suggesting their conservation throughout vertebrate evolution and across cell types. Array technology also allowed identification of genes differentially expressed upon exposure of fish cell lines to vanadate and likely involved in its anti-mineralogenic activity. Many were unknown or had never previously been associated with bone homeostasis, thus providing a set of potential candidates whose study will likely bring insights into the complex mechanisms of tissue mineralization and bone formation.
Motivation: Many models and analyses of signaling pathways have been proposed. However, none of them takes into account that a biological pathway is not a fixed system, but instead depends on the organism, tissue and cell type as well as on physiological, pathological and experimental conditions.
Results: The Biological Connection Markup Language (BCML) is a format to describe, annotate and visualize pathways. BCML is able to store multiple types of information, permitting a selective view of the pathway as it exists and/or behaves in specific organisms, tissues and cells. Furthermore, BCML can be automatically converted into data formats suitable for analysis and into a fully SBGN-compliant graphical representation, making it an important tool that can be used by both computational biologists and ‘wet lab’ scientists.
Availability and implementation: The XML schema and the BCML software suite are freely available under the LGPL for download at http://bcml.dc-atlas.net. They are implemented in Java and supported on MS Windows, Linux and OS X.
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
In the last decades, microarray technology has spread, leading to a dramatic increase of publicly available datasets. The first statistical tools developed were focused on the identification of significant differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. The analysis of gene expression linked to the physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while gene set analysis focuses on the detection of coordinated changes in transcriptional levels among sets of biologically related genes.
In this field, meta-analysis offers the possibility to compare different studies, addressing the same biological question to fully exploit public gene expression datasets.
We describe STEPath, a method that starts from gene expression profiles and integrates the analysis of imbalanced regions as an a priori step before performing gene set analysis. The application of STEPath in individual studies produced gene set scores weighted by chromosomal activation. As a final step, we propose a way to compare these scores across different studies (meta-analysis) on related biological issues. One complication with meta-analysis is batch effects, which occur because molecular measurements are affected by laboratory conditions, reagent lots and personnel differences. Major problems occur when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. We evaluated the power of combining chromosome mapping and gene set enrichment analysis, performing the analysis on a dataset of leukaemia (example of an individual study) and on a dataset of skeletal muscle diseases (meta-analysis approach).
In leukaemia, we identified the Hox gene set, a gene set closely related to the pathology that other algorithms of gene set analysis do not identify, while the meta-analysis approach on muscular disease discriminates between related pathologies and correlates similar ones from different studies.
STEPath is a new method that integrates gene expression profiles, genomic co-expressed regions and the information about the biological function of genes. The usage of the STEPath-computed gene set scores overcomes batch effects in the meta-analysis approaches allowing the direct comparison of different pathologies and different studies on a gene set activation level.
Recently, a great effort in microarray data analysis has been directed towards the study of the so-called gene sets. A gene set is defined by genes that are, somehow, functionally related. For example, genes appearing in a known biological pathway naturally define a gene set. Gene sets are usually identified from a priori biological knowledge. Nowadays, many bioinformatics resources store this kind of knowledge (see, for example, the Kyoto Encyclopedia of Genes and Genomes, among others). Although pathway maps carry important information about the structure of correlation among genes that should not be neglected, the currently available multivariate methods for gene set analysis do not fully exploit it.
We propose a novel gene set analysis specifically designed for gene sets defined by pathways. Such analysis, based on graphical models, explicitly incorporates the dependence structure among genes highlighted by the topology of pathways. The analysis is designed to be used for overall surveillance of changes in a pathway in different experimental conditions. In fact, under different circumstances, not only the expression of the genes in a pathway, but also the strength of their relations may change. The methods resulting from the proposal make it possible both to test for variations in the strength of the links and to properly account for heteroscedasticity in the usual tests for differential expression.
The use of graphical models allows a deeper look at the components of the pathway that can be tested separately and compared marginally. In this way it is possible to test single components of the pathway and highlight only those involved in its deregulation.
MAGIA (miRNA and genes integrated analysis) is a novel web tool for the integrative analysis of target predictions, miRNA and gene expression data. MAGIA is divided into two parts: the query section allows the user to retrieve and browse updated miRNA target predictions computed with a number of different algorithms (PITA, miRanda and Target Scan) and Boolean combinations thereof. The analysis section comprises a multistep procedure for (i) direct integration through different functional measures (parametric and non-parametric correlation indexes, a variational Bayesian model, mutual information and a meta-analysis approach based on P-value combination) of mRNA and miRNA expression data, (ii) construction of bipartite regulatory network of the best miRNA and mRNA putative interactions and (iii) retrieval of information available in several public databases of genes, miRNAs and diseases and via scientific literature text-mining. MAGIA is freely available for Academic users at http://gencomp.bio.unipd.it/magia.
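The text above mentions a meta-analysis approach based on P-value combination without naming the method. As one common example, and only as an assumption about what such a step could look like, Fisher's method combines k independent P-values into a chi-square statistic with 2k degrees of freedom. The sketch below uses the closed-form survival function available for even degrees of freedom, so it needs only the standard library; the function name is hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even
    degrees of freedom the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("at least one P-value is required")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # X / 2

    term = 1.0   # (X/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two studies, each giving P = 0.05 for the same miRNA-mRNA pair
combined = fisher_combined_pvalue([0.05, 0.05])
```

With a single P-value the function returns that value unchanged, which is a quick sanity check of the closed form.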
MicroRNAs (miRNAs) are small non-coding RNAs that mediate gene expression at the post-transcriptional and translational levels by an imperfect binding to target mRNA 3′UTR regions. While the ab-initio computational prediction of miRNA–mRNA interactions still poses significant challenges, it is possible to overcome some of its limitations by carefully integrating into the analysis the paired expression profiles of miRNAs and mRNAs. In this work, we show how the choice of a proper probe annotation for microarray platforms is an essential requirement to achieve good sensitivity in the identification of miRNA–mRNA interactions. We compare the results obtained from the analysis of the same expression profiles using both gene and transcript based custom CDFs that we have developed for a number of different annotations (ENSEMBL, RefSeq, AceView). In all cases, transcript-based annotations clearly improve the effectiveness of data integration and thus provide a more reliable confirmation of computationally predicted miRNA–mRNA interactions.
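The idea of using paired expression profiles to support predicted interactions can be sketched as a simple anti-correlation filter. This is an illustration rather than the procedure used in the work: the Pearson coefficient, the threshold of -0.5, the function names and the toy identifiers are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # constant profile: treated here as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def supported_interactions(predicted_pairs, mirna_expr, mrna_expr, threshold=-0.5):
    """Keep predicted miRNA-target pairs whose paired profiles are anti-correlated.

    predicted_pairs: iterable of (miRNA, target) identifiers.
    mirna_expr, mrna_expr: dicts mapping identifiers to expression
    values measured on the same samples, in the same order.
    threshold: hypothetical cut-off on the correlation coefficient.
    """
    kept = []
    for mirna, target in predicted_pairs:
        r = pearson(mirna_expr[mirna], mrna_expr[target])
        if r <= threshold:
            kept.append((mirna, target, r))
    return kept


# Toy data with made-up identifiers
mirna_expr = {"miR-x": [1, 2, 3, 4]}
mrna_expr = {"T1": [8, 6, 4, 2], "T2": [1, 2, 3, 5]}
pairs = [("miR-x", "T1"), ("miR-x", "T2")]
kept = supported_interactions(pairs, mirna_expr, mrna_expr)
```

Under a transcript-based annotation, the keys of `mrna_expr` would be transcript identifiers rather than gene identifiers; the filter itself is unchanged.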
The application of high-throughput genomic tools in nutrition research is a widespread practice. However, it is becoming increasingly clear that the outcome of individual expression studies is insufficient for the comprehensive understanding of such a complex field. Currently, the availability of large amounts of expression data in public repositories has opened up new challenges for microarray data analysis. We have focused on PPARα, a ligand-activated transcription factor functioning as a fatty acid sensor that controls the expression of a large set of genes in various metabolic organs such as liver, small intestine or heart. The function of PPARα is strictly connected to the function of its target genes and, although many of these have already been identified, major elements of its physiological function remain to be uncovered. To further investigate the function of PPARα, we have applied a cross-species meta-analysis approach to integrate sixteen microarray datasets studying high fat diet and PPARα signal perturbations in different organisms.
We identified 164 genes (MDEGs) that were consistently differentially expressed in response to a high fat diet or to perturbations in PPAR signalling. In particular, we found five genes in yeast which were highly conserved and homologous to PPARα targets in mammals, potential candidates to be used as models for the equivalent mammalian genes. Moreover, a screening of the MDEGs for all known transcription factor binding sites, and the comparison with a human genome-wide screening of Peroxisome Proliferator Response Elements (PPRE), enabled us to identify 20 new potential candidate genes that show both a binding site and a change in expression in the conditions studied. Lastly, we found a non-random localization of the differentially expressed genes in the genome.
The results presented are potentially of great interest for summarizing the currently available expression data, exploiting the power of in silico analysis filtered by evolutionary conservation. The analysis enabled us to indicate potential gene candidates that could fill in the gaps with regard to the signalling of PPARα; moreover, the non-random localization of the differentially expressed genes in the genome suggests that epigenetic mechanisms are of importance in the regulation of the transcription operated by PPARα.
Publicly available datasets of microarray gene expression signals represent an unprecedented opportunity for extracting genomic relevant information and validating biological hypotheses. However, the exploitation of this exceptionally rich mine of information is still hampered by the lack of appropriate computational tools, able to overcome the critical issues raised by meta-analysis.
This work presents A-MADMAN, an open source web application which allows the retrieval, annotation, organization and meta-analysis of gene expression datasets obtained from Gene Expression Omnibus. A-MADMAN addresses and resolves several open issues in the meta-analysis of gene expression data.
A-MADMAN allows i) the batch retrieval from Gene Expression Omnibus and the local organization of raw data files and of any related meta-information, ii) the re-annotation of samples to fix incomplete, or otherwise inadequate, metadata and to create user-defined batches of data, iii) the integrative analysis of data obtained from different Affymetrix platforms through custom chip definition files and meta-normalization. Software and documentation are available on-line at .
Various normalisation techniques have been developed in the context of microarray analysis to try to correct expression measurements for experimental bias and random fluctuations. Major techniques include: total intensity normalisation; intensity dependent normalisation; and variance stabilising normalisation. The aim of this paper is to discuss the impact of normalisation techniques for two-channel array technology on the process of identification of differentially expressed genes.
Through three precise simulation plans, we quantify the impact of normalisations: (a) on the sensitivity and specificity of a specified test statistic for the identification of deregulated genes, (b) on the gene ranking induced by the statistic.
Although we found a limited difference of sensitivities and specificities for the test after each normalisation, the study highlights a strong impact in terms of gene ranking agreement, resulting in different levels of agreement between competing normalisations. However, we show that the combination of two normalisations, such as glog and lowess, that handle different aspects of microarray data, is able to outperform other individual techniques.
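Two of the techniques named above can be sketched in a few lines. The code below is a simplified illustration, not the simulation setup of the paper: total intensity normalisation is implemented as a single global rescaling of one channel, and the generalised logarithm (glog) uses one common parametrisation whose constant `c` is an arbitrary assumption. Lowess normalisation is omitted because it needs a local regression routine.

```python
import math


def total_intensity_log_ratios(red, green):
    """Log2 ratios after total intensity normalisation of a two-channel array.

    The red channel is rescaled so that its total intensity equals that
    of the green channel (a single global factor, assumed here).
    """
    scale = sum(green) / sum(red)
    return [math.log2(r * scale / g) for r, g in zip(red, green)]


def glog2(x, c=1.0):
    """Generalised log transform in base 2.

    Uses the parametrisation log2((x + sqrt(x^2 + c^2)) / 2), which
    approaches log2(x) for x much larger than c and stays finite at
    x = 0. The value of c is a hypothetical tuning constant.
    """
    return math.log2((x + math.sqrt(x * x + c * c)) / 2.0)


# The red channel is uniformly twice as bright as the green one
ratios = total_intensity_log_ratios([200, 400, 800], [100, 200, 400])
```

After rescaling, a purely global dye imbalance such as the one in the toy data yields log ratios of zero for every spot.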
Skeletal muscle mass can be markedly reduced through a process called atrophy, as a consequence of many diseases or critical physiological and environmental situations. Atrophy is characterised by loss of contractile proteins and reduction of fiber volume. Although in the last decade the molecular aspects underlying muscle atrophy have received increased attention, the fine mechanisms controlling muscle degeneration are still incompletely understood. In this study we applied meta-analysis on gene expression signatures pertaining to different types of muscle atrophy for the identification of novel key regulatory signals implicated in these degenerative processes.
We found a general down-regulation of genes involved in energy production and carbohydrate metabolism and up-regulation of genes for protein degradation and catabolism. Six functional pathways occupy central positions in the molecular network obtained by the integration of atrophy transcriptome and molecular interaction data. They are TGF-β pathway, apoptosis, membrane trafficking/cytoskeleton organization, NFKB pathways, inflammation and reorganization of the extracellular matrix. Protein degradation pathway is evident only in the network specific for muscle short-term response to atrophy. TGF-β pathway plays a central role with proteins SMAD3/4, MYC, MAX and CDKN1A in the general network, and JUN, MYC, GNB2L1/RACK1 in the short-term muscle response network.
Our study offers a general overview of the molecular pathways and cellular processes regulating the establishment and maintenance of atrophic state in skeletal muscle, showing also how the different pathways are interconnected. This analysis identifies novel key factors that could be further investigated as potential targets for the development of therapeutic treatments. We suggest that the transcription factors SMAD3/4, GNB2L1/RACK1, MYC, MAX and JUN, whose functions have been extensively studied in tumours but only marginally in muscle, appear instead to play important roles in regulating muscle response to atrophy.
Aquaculture represents the most sustainable alternative of seafood supply to substitute for the declining marine fisheries, but severe production bottlenecks remain to be solved. The application of genomic technologies offers much promise to rapidly increase our knowledge on biological processes in farmed species and overcome such bottlenecks. Here we present an integrated platform for mRNA expression profiling in the gilthead sea bream (Sparus aurata), a marine teleost of great importance for aquaculture.
A public database was constructed, consisting of 19,734 unique clusters (3,563 contigs and 16,171 singletons). Functional annotation was obtained for 8,021 clusters. Over 4,000 sequences were also associated with a GO entry. Two 60mer probes were designed for each gene and in-situ synthesized on glass slides using Agilent SurePrint™ technology. Platform reproducibility and accuracy were assessed on two early stages of sea bream development (one-day-old and four-day-old larvae). Correlation between technical replicates was always > 0.99, with strong positive correlation between paired probes. A two-class SAM test identified 1,050 differentially expressed genes between the two developmental stages. Functional analysis suggested that down-regulated transcripts (407) in older larvae are mostly essential/housekeeping genes, whereas tissue-specific genes are up-regulated in parallel with the formation of key organs (eye, digestive system). Cross-validation of microarray data was carried out using quantitative RT-PCR (qRT-PCR) on 11 target genes, selected to reflect the whole range of fold-change and both up-regulated and down-regulated genes. A statistically significant positive correlation was obtained comparing expression levels for each target gene across all biological replicates. Good concordance between qRT-PCR and microarray data was observed between 2- and 7-fold change, while fold-change compression in the microarray was present for differences greater than 10-fold in the qRT-PCR.
A highly reliable oligo-microarray platform was developed and validated for the gilthead sea bream despite the presently limited knowledge of the species transcriptome. Because of the flexible design this array will be able to accommodate additional probes as soon as novel unique transcripts are available.
Vibrionaceae represent a significant portion of the cultivable heterotrophic sea bacteria; they strongly affect nutrient cycling and some species are devastating pathogens.
In this work we propose an improved phylogenetic profile analysis on 14 Vibrionaceae genomes, to study the evolution of this family on the basis of gene content.
The phylogenetic profile is based on the observation that genes involved in the same process (e.g. metabolic pathway or structural complex) tend to be concurrently present or absent within different genomes. This allows the prediction of hypothetical functions on the basis of shared phylogenetic profiles. Moreover, this approach is useful for identifying putative laterally transferred elements on the basis of their presence in phylogenetically distant bacteria.
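The idea of concurrent presence or absence can be sketched with binary profiles. This is a simplified illustration only: it assumes 0/1 presence calls and exact profile matches, and the gene names and values are hypothetical.

```python
from collections import defaultdict

def group_by_profile(presence):
    """Group genes that share an identical presence/absence profile.

    presence maps a gene name to a list of 0/1 flags, one per genome.
    Only groups with at least two genes are returned.
    """
    groups = defaultdict(list)
    for gene, profile in presence.items():
        groups[tuple(profile)].append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]

# Hypothetical presence (1) or absence (0) of four genes in four genomes
presence = {
    "geneA": [1, 0, 1, 1],
    "geneB": [1, 0, 1, 1],
    "geneC": [0, 1, 0, 0],
    "geneD": [1, 1, 1, 1],
}
shared = group_by_profile(presence)
```

Here geneA and geneB share a profile, so an unknown function for one of them could be hypothesized from the other.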
Vibrionaceae ORFs were aligned against all the available bacterial proteomes. A phylogenetic profile is defined as an array of distances, based on amino acid substitution matrices, from a single gene to all its orthologues. Final phylogenetic profiles, derived from a non-redundant list of all ORFs, were defined as the median of all the profiles belonging to each cluster. The resulting phylogenetic profile matrix contains gene clusters in the rows and organisms in the columns.
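The median step can be sketched as a column-wise median over the profiles of the genes in one cluster. The function name and the distance values are hypothetical, and the sketch assumes that every profile covers the same organisms in the same order.

```python
from statistics import median

def cluster_profile(profiles):
    """Column-wise median of the profiles of all genes in one cluster."""
    if not profiles:
        raise ValueError("cluster has no profiles")
    n_organisms = len(profiles[0])
    if any(len(p) != n_organisms for p in profiles):
        raise ValueError("all profiles must cover the same organisms")
    # zip(*profiles) yields, for each organism, the distances of every gene
    return [median(column) for column in zip(*profiles)]

# Hypothetical distances from three clustered ORFs to four organisms
cluster = [
    [0.10, 0.40, 0.90, 0.20],
    [0.20, 0.50, 0.70, 0.20],
    [0.30, 0.30, 0.80, 0.80],
]
profile = cluster_profile(cluster)  # one row of the final profile matrix
```

Stacking one such row per cluster gives a matrix with gene clusters in the rows and organisms in the columns.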
Cluster analysis identified groups of "core genes" with widespread high similarity across all the organisms and several clusters that contain genes homologous only to a limited set of organisms. For each of these clusters, COG class enrichment was calculated. The analysis reveals that clusters of core genes have the highest number of enriched classes, while the others are enriched for only a few of them, such as DNA replication, recombination and repair.
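The abstract does not say which test was used for COG class enrichment. A common choice is the one-sided hypergeometric test, sketched below as an assumption; the function name, parameter names and counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total, annotated, cluster_size, hits):
    """P(X >= hits) under the hypergeometric distribution.

    total        -- genes in the background set
    annotated    -- background genes assigned to the COG class
    cluster_size -- genes in the cluster being tested
    hits         -- cluster genes assigned to the COG class
    """
    upper = min(annotated, cluster_size)
    favourable = sum(
        comb(annotated, k) * comb(total - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(total, cluster_size)

# Hypothetical counts: 12 of 40 cluster genes fall in a class that
# covers 100 of the 2,000 background genes
p_value = hypergeom_enrichment_p(2000, 100, 40, 12)
```

When many classes and clusters are tested, the resulting p-values would normally need a multiple-testing correction.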
We found that mobile elements have heterogeneous profiles not only across the entire set of organisms, but also within Vibrionaceae; this confirms their great influence on bacterial evolution even within the same family. Furthermore, several hypothetical proteins correlate highly with mobile element profiles, suggesting a possible horizontal transfer mechanism for the evolution of these genes. Finally, we suggest a putative role for some ORFs of unknown function on the basis of their phylogenetic profile similarity to well-characterized genes.
Rhabdomyosarcoma (RMS) is a highly malignant soft tissue sarcoma of childhood that arises as a consequence of regulatory disruption of the growth and differentiation pathways of myogenic precursor cells. The pathogenic pathways involved in this tumor are mostly unknown, and therefore a better characterization of the RMS gene expression profile would represent a considerable advance. The availability of public gene expression datasets has opened up new challenges, especially for the integration of data generated by different research groups and different array platforms with the purpose of obtaining new insights into the biological process investigated.
In this work we performed a meta-analysis on four microarray and two SAGE datasets of gene expression data on RMS in order to evaluate the degree of agreement of the biological results obtained by these different studies and to identify common regulatory pathways that could be responsible for tumor growth. Significantly enriched regulatory pathways and biological processes were investigated, and a list of differentially expressed meta-profiles was identified as possible candidate markers of RMS aggressiveness.
Our results point to a general down-regulation of the energy production pathways, suggesting a hypoxic physiology for RMS cells. This result agrees with the high malignancy of RMS and with its resistance to most therapeutic treatments. In this context, different isoforms of the ANT gene have been consistently identified for the first time as differentially expressed in RMS. This gene is involved in anti-apoptotic processes when cells grow in low oxygen conditions. These new insights into the biological processes responsible for RMS growth and development demonstrate the advantage of integrated analysis of gene expression studies.
MIDAW (microarray data analysis web tool) is a web interface integrating a series of statistical algorithms that can be used for processing and interpretation of microarray data. MIDAW consists of two main sections: data normalization and data analysis. In the normalization phase, several experiments are processed simultaneously, with background correction and global and local mean and variance normalization. The data analysis section allows graphical display of expression data for descriptive purposes, estimation of missing values, reduction of data dimension, discriminant analysis and identification of marker genes. The statistical results are organized in dynamic web pages and tables, where the transcript/gene probes contained in a specific microarray platform can be linked (according to user choice) to external databases (GenBank, Entrez Gene, UniGene). Tutorial files help the user throughout the statistical analysis to ensure that the forms are filled out correctly. MIDAW has been developed using Perl and PHP and uses R/Bioconductor routines. MIDAW is GPL licensed and freely accessible at . Perl and PHP source code is available from the authors upon request.
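As an illustration of global mean and variance normalization, the sketch below rescales the log-ratios of one array to zero mean and unit variance. It is not MIDAW's code (MIDAW relies on R/Bioconductor routines); the function name and the values are hypothetical, and the local, intensity-dependent variant is not shown.

```python
from statistics import mean, pstdev

def global_normalize(log_ratios):
    """Centre one array's log-ratios on zero and scale them to unit variance."""
    centre = mean(log_ratios)
    spread = pstdev(log_ratios)  # population standard deviation
    if spread == 0:
        raise ValueError("cannot scale an array with constant values")
    return [(value - centre) / spread for value in log_ratios]

# Hypothetical log2 ratios from two arrays with different overall scale
array_a = [0.5, 1.5, -0.5, 2.5]
array_b = [2.0, 6.0, -2.0, 10.0]
norm_a = global_normalize(array_a)
norm_b = global_normalize(array_b)
```

After this step the two arrays share the same centre and spread, which makes them comparable when several experiments are processed together.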
The variability of results in microarray technology is in part due to the fact that independent scans of a single hybridised microarray give spot images that are not quite the same. To solve this problem and turn it to our advantage, we introduced the approach of multiple scanning and of image integration of microarrays. To this end, we have developed specific software that creates a virtual image that statistically summarises a series of consecutive scans of a microarray. We provide evidence that the use of multiple imaging (i) enhances the detection of differentially expressed genes; (ii) increases the image homogeneity; and (iii) reveals false-positive results such as differentially expressed genes that are detected by a single scan but not confirmed by successive scanning replicates. The increase in the final number of differentially expressed genes detected in a microarray experiment with this approach is remarkable; 50% more for microarrays hybridised with targets labelled by reverse transcriptase, and 200% more for microarrays developed with the tyramide signal amplification (TSA) technique. The results have been confirmed by semi-quantitative RT–PCR tests.
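The image integration step can be sketched as a pixel-wise summary over consecutive scans. The abstract does not name the statistic used to build the virtual image, so the median is an assumption here, chosen because it is robust to a single aberrant scan. The function name and the pixel values are hypothetical, and the scans are assumed to be aligned and of equal size.

```python
from statistics import median

def virtual_image(scans):
    """Pixel-wise median of several aligned scans of the same microarray.

    scans is a list of images, each a list of rows of pixel intensities.
    """
    if not scans:
        raise ValueError("at least one scan is required")
    shape = [len(row) for row in scans[0]]
    if any([len(row) for row in scan] != shape for scan in scans):
        raise ValueError("all scans must have the same dimensions")
    # Outer zip pairs up matching rows, inner zip pairs up matching pixels
    return [
        [median(pixels) for pixels in zip(*rows)]
        for rows in zip(*scans)
    ]

# Three hypothetical 2x2 scans; the third has two aberrant pixels
scan_1 = [[100, 200], [50, 0]]
scan_2 = [[110, 190], [55, 0]]
scan_3 = [[90, 400], [45, 30]]
summary = virtual_image([scan_1, scan_2, scan_3])
```

In this toy case the values 400 and 30 from the third scan do not reach the virtual image, which mirrors how a signal seen in only one scan can be exposed as a likely false positive.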