Human mesenchymal stromal/stem cells (MSCs) are adult multipotent cells that behave in a highly plastic manner and inhabit the stroma of several tissues. The potential utility of MSCs is now being intensively investigated in regenerative medicine and cell therapy, although many questions about their molecular identity remain unanswered.
MSC primary cultures from human bone marrow (BM) and placenta (PL) were derived and verified by their standard immunophenotype pattern and trilineage differentiation potential. A broad characterization of the transcriptome of these MSCs was then performed using RNA deep sequencing (RNA-Seq). Quantitative analysis of these data rendered an extensive expression footprint that includes 5,271 protein-coding genes. Flow cytometry assays of canonical MSC CD markers were congruent with their expression levels detected by RNA-Seq. Expression of other recently proposed MSC markers (CD146, Nestin and CD271) was tested in the placenta samples, and only CD146 and Nestin were found. Functional analysis revealed enrichment in stem cell-related genes and mesenchymal regulatory transcription factors (TFs). Analysis of TF binding sites (TFBSs) identified 11 meta-regulators, including KLF4 and MYC. Epigenetically, hypomethylated promoter patterns supported the active expression of the MSC TFs found. An interaction network of these TFs was built to show their links and relations. Assessment of dissimilarities between cell origins (BM versus PL) disclosed two hundred differentially expressed genes involved in microenvironment processes related to the cellular niche, such as regulation of bone formation and blood vessel morphogenesis in the case of BM-MSCs. By contrast, genes overexpressed in PL-MSCs showed functional enrichment in mitosis, negative regulation of cell death and embryonic morphogenesis, which supported the higher growth rates observed in the cultures of these fetal cells and their closer links with developmental processes.
The results present a transcriptomic portrait of the human MSCs isolated from bone marrow and placenta. The data are released as a cell-specific resource, providing a comprehensive expression footprint of the MSCs useful to better understand their cellular and molecular biology and for further investigations on the isolation and biomedical use of these multipotent cells.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-910) contains supplementary material, which is available to authorized users.
Stromal cells; Mesenchymal stem cells; Placenta; Bone marrow; Human gene expression; RNA-Seq; Transcription factors
Accurate analysis of whole-gene expression and individual-exon expression is essential to characterize different transcript isoforms and identify alternative splicing events in human genes. One of the omic technologies widely used in studies on human samples is the exon-specific expression microarray platform.
Since there are not many validated comparative analyses to identify specific splicing events using data derived from these types of platforms, we have developed an algorithm (called ESLiM) to detect significant changes in exon use, and applied it to a reference dataset of 270 human genes that show alternative expression in different tissues. We compared the results with three other methodological approaches and provided the R source code to be applied elsewhere. The genes positively detected by these analyses also provide a verified subset of human genes that present tissue-regulated isoforms. Furthermore, we performed a validation analysis on human patient samples comparing two different subtypes of acute myeloid leukemia (AML) and we experimentally validated the splicing in several selected genes that showed exons with highly significant signal change.
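The idea of detecting a change in exon use can be illustrated with the classic splicing index, a generic approach in which the exon signal is normalized by the signal of its gene before two sample groups are compared. The sketch below is not the ESLiM algorithm, whose details are not given here; the function name, the log2-scale input and the toy values are all assumptions made for illustration.

```python
from statistics import mean

def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Difference in gene-normalized exon signal between groups A and B.

    Each argument is a list of log2 signals, one value per sample.
    On the log2 scale, normalizing an exon by its gene is a subtraction.
    """
    norm_a = [e - g for e, g in zip(exon_a, gene_a)]
    norm_b = [e - g for e, g in zip(exon_b, gene_b)]
    return mean(norm_a) - mean(norm_b)

# Hypothetical toy data: the gene level is similar in both groups,
# but the exon signal is about 2 log2 units lower in group B.
si = splicing_index(
    exon_a=[8.0, 8.2, 7.8], gene_a=[9.0, 9.1, 8.9],
    exon_b=[6.0, 6.1, 5.9], gene_b=[9.0, 9.0, 9.0],
)
```

A value far from zero suggests that the exon is used differently in the two groups independently of the overall gene level; a real analysis would add a statistical test per exon and a correction for multiple testing.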
The comparative analyses with other methods using a fair set of human genes that show alternative splicing and the validation on clinical samples demonstrate that the proposed novel algorithm is a reliable tool for detecting differential splicing in exon-level expression data.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-879) contains supplementary material, which is available to authorized users.
Alternative splicing; Splicing index; Human genomics; Exons; Transcripts; Gene expression; Differential expression; Bioinformatics; R algorithm; Acute myeloid leukemia
TET2 is involved in a variety of hematopoietic malignancies, mainly myeloid malignancies. Most mutations of TET2 have been identified in myeloid disorders, but some have also recently been described in mature lymphoid neoplasms. In contrast to the large amount of data about mutations of TET2, few data are available on its expression. Moreover, the role of TET2 in chronic lymphocytic leukemia (CLL) is unknown. This study analyzes both TET2 expression and mutations in 48 CLL patients. TET2 expression was analyzed by exon arrays and quantitative real-time polymerase chain reaction (qRT-PCR). Next-generation sequencing (NGS) technology was applied to investigate the presence of TET2 variations. Overexpression of TET2 was observed in B-cell lymphocytes from CLL patients compared with healthy donors (P = 0.004). In addition, in CLL patients, an overexpression of TET2 was also observed in the clonal B cells compared with the nontumoral cells (P = 0.002). However, no novel mutations were observed. Therefore, overexpression of TET2 in CLL seems to be unrelated to the presence of genomic TET2 variations.
Insulin-like Growth Factor 1 (IGF1) is a multifunctional regulator of somatic growth and development throughout evolution. IGF1 signaling through IGF type 1 receptor (IGF1R) controls cell proliferation, survival and differentiation in multiple cell types. IGF1 deficiency in mice disrupts lung morphogenesis, causing altered prenatal pulmonary alveologenesis. Nevertheless, little is known about the cellular and molecular basis of IGF1 activity during lung development.
Prenatal Igf1−/− mutant mice with a C57Bl/6J genetic background displayed severe disproportional lung hypoplasia, leading to lethal neonatal respiratory distress. Immuno-histological analysis of their lungs showed a thickened mesenchyme, alterations in extracellular matrix deposition, thinner smooth muscles and dilated blood vessels, which indicated immature and delayed distal pulmonary organogenesis. Transcriptomic analysis of Igf1−/− E18.5 lungs using RNA microarrays identified deregulated genes related to vascularization, morphogenesis and cellular growth, and to MAP-kinase, Wnt and cell-adhesion pathways. Up-regulation of immunity-related genes was verified by an increase in inflammatory markers. Increased expression of Nfib and reduced expression of Klf2, Egr1 and Ctgf regulatory proteins as well as activation of ERK2 MAP-kinase were corroborated by Western blot. Among IGF-system genes only IGFBP2 revealed a reduction in mRNA expression in mutant lungs. Immuno-staining patterns for IGF1R and IGF2, similar in both genotypes, correlated to alterations found in specific cell compartments of Igf1−/− lungs. IGF1 addition to Igf1−/− embryonic lungs cultured ex vivo increased airway septa remodeling and distal epithelium maturation, processes accompanied by up-regulation of Nfib and Klf2 transcription factors and Cyr61 matricellular protein.
We demonstrated the functional, tissue-specific involvement of IGF1 in fetal lung development in mice. The results revealed novel target genes and gene networks that mediate IGF1 action on pulmonary cellular proliferation, differentiation, adhesion and immunity, and on vascular and distal epithelium maturation during prenatal lung development.
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.
bioinformatics; training; bioinformatics courses; training life scientists; train the trainers
Summary: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available.
Patients with chronic lymphocytic leukemia and 13q deletion as their only FISH abnormality could have a different outcome depending on the number of cells displaying this aberration. Thus, cases with a high number of 13q- cells (13q-H) had both shorter overall survival and time to first therapy. The goal of the study was to analyze the genetic profile of 13q-H patients.
Design and Methods:
A total of 102 samples were studied, 32 of which served as a validation cohort and five were healthy donors.
Chronic lymphocytic leukemia patients with higher percentages of 13q- cells (>80%) showed a different level of gene expression as compared to patients with lower percentages (<80%, 13q-L). This deregulation affected genes involved in apoptosis and proliferation (BCR and NFkB signaling), leading to increased proliferation and decreased apoptosis in 13q-H patients. Deregulation of several microRNAs, such as miR-15a, miR-155, miR-29a and miR-223, was also observed in these patients. In addition, our study suggests that the gene expression pattern of 13q-H cases could be similar to that of patients with 11q- or 17p-.
This study provides new evidence regarding the heterogeneity of 13q deletion in chronic lymphocytic leukemia patients, showing that apoptosis, proliferation as well as miRNA regulation are involved in cases with higher percentages of 13q- cells.
Analysis of DNA copy number alterations and gene expression changes in human samples has been used to find potential target genes in complex diseases. Recent studies have combined these two types of data using different strategies, but have focused on finding gene-based relationships. However, it has been proposed that these data can be used to identify key genomic regions, which may enclose causal genes under the assumption that disease-associated gene expression changes are caused by genomic alterations.
Following this proposal, we undertake a new integrative analysis of genome-wide expression and copy number datasets. The analysis is based on the combined location of both types of signals along the genome. Our approach takes into account the genomic location in the copy number (CN) analysis and also in the gene expression (GE) analysis. To achieve this, we apply a segmentation algorithm to both types of data using paired samples. We then perform a correlation analysis and a frequency analysis of the gene loci in the segmented CN regions and the segmented GE regions, selecting in both cases the statistically significant loci. In this way, we find CN alterations that show strong correspondence with GE changes. We applied our method to a human dataset of 64 Glioblastoma Multiforme samples, finding key loci and hotspots that correspond to major alterations previously described for this type of tumor.
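The correlation step can be sketched as follows, assuming the segmentation has already produced one CN value and one GE value per locus and per paired sample. This is only an illustration: the function names, the locus identifiers and the fixed threshold `min_r` are hypothetical, and a fixed cutoff stands in for the statistical significance testing used in the study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def correlated_loci(cn, ge, min_r=0.8):
    """Return loci whose segmented CN and GE profiles correlate across samples.

    cn, ge: dict mapping locus id -> list of segmented values, with the
    paired samples in the same order in both dictionaries.
    """
    return sorted(
        locus for locus in cn
        if locus in ge and pearson(cn[locus], ge[locus]) >= min_r
    )

# Hypothetical toy data for four paired samples.
cn = {"locus_A": [0.1, 1.2, 0.9, 0.0], "locus_B": [0.0, 0.5, 0.0, 0.5]}
ge = {"locus_A": [0.2, 1.5, 1.1, -0.1], "locus_B": [1.0, 1.0, 0.2, 0.1]}
selected = correlated_loci(cn, ge)
```

Here only `locus_A` is kept, because its expression follows its copy number across the paired samples, which is the kind of correspondence the method looks for.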
Identification of key altered genomic loci constitutes a first step to find the genes that drive the alteration in a malignant state. These driver genes can be found in regions that show high correlation in copy number alterations and expression changes.
Most sporadic colorectal cancer (sCRC) deaths are caused by metastatic dissemination of the primary tumor. New advances in genetic profiling of sCRC suggest that the primary tumor may contain a cell population with metastatic potential. Here we compare the cytogenetic profile of primary tumors from liver metastatic versus non-metastatic sCRC.
We prospectively analyzed the frequency of numerical/structural abnormalities of chromosomes 1, 7, 8, 13, 14, 17, 18, 20, and 22 by iFISH in 58 sCRC patients: 31 with non-metastatic (54%) vs. 27 with metastatic (46%) disease. From a total of 18 probes, significant differences emerged only for the 17p11.2 and 22q11.2 chromosomal regions. Patients with liver metastatic sCRC showed an increased frequency of del(17p11.2) (10% vs. 67%; p<.001) and del(22q11.2) (0% vs. 22%; p = .02) versus non-metastatic cases. Multivariate analysis of prognostic factors for overall survival (OS) showed that the only clinical and cytogenetic parameters that had an independent adverse impact on patient outcome were the presence of del(17p) with a 17p11.2 breakpoint and del(22q11.2). Based on these two cytogenetic variables, patients were classified into three groups, low-risk (no adverse features), intermediate-risk (one adverse feature) and high-risk (two adverse features), with significantly different OS rates at 5 years (p<.001): 92%, 53% and 0%, respectively.
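The three-group classification described above amounts to counting adverse cytogenetic features per patient. A minimal sketch, with a hypothetical function name and boolean inputs assumed for illustration:

```python
def risk_group(has_del_17p11_2, has_del_22q11_2):
    """Classify a patient by the number of adverse cytogenetic features.

    0 features -> "low", 1 -> "intermediate", 2 -> "high".
    """
    n_adverse = int(bool(has_del_17p11_2)) + int(bool(has_del_22q11_2))
    return ("low", "intermediate", "high")[n_adverse]

# Hypothetical patients, for illustration only.
groups = [
    risk_group(False, False),
    risk_group(True, False),
    risk_group(True, True),
]
```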
Our results unravel the potential implication of del(17p11.2) in sCRC patients with liver metastasis as this cytogenetic alteration appears to be intrinsically related to an increased metastatic potential and a poor outcome, providing additional prognostic information to that associated with other cytogenetic alterations such as del(22q11.2). Additional prospective studies in larger series of patients would be required to confirm the clinical utility of the new prognostic markers identified.
Transgenic expression of the MafB oncogene in haematopoietic stem/progenitor cells induces plasma cell neoplasia reminiscent of human multiple myeloma and suggests DNA methylation as cause of malignant transformation.
Understanding the cellular origin of cancer can help to improve disease prevention and therapeutics. Human plasma cell neoplasias are thought to develop from either differentiated B cells or plasma cells. However, when the expression of Maf oncogenes (associated with human plasma cell neoplasias) is targeted to mouse B cells, the resulting animals fail to reproduce the human disease. Here, to explore early cellular changes that might take place in the development of plasma cell neoplasias, we engineered transgenic mice to express MafB in haematopoietic stem/progenitor cells (HS/PCs). Unexpectedly, we show that plasma cell neoplasias arise in the MafB-transgenic mice. Beyond their clinical resemblance to human disease, these neoplasias highly express genes that are known to be upregulated in human multiple myeloma. Moreover, gene expression profiling revealed that MafB-expressing HS/PCs were more similar to B cells and tumour plasma cells than to any other subset, including wild-type HS/PCs. Consistent with this, genome-scale DNA methylation profiling revealed that MafB imposes an epigenetic program in HS/PCs, and that this program is preserved in mature B cells of MafB-transgenic mice, demonstrating a novel molecular mechanism involved in tumour initiation. Our findings suggest that, mechanistically, the haematopoietic progenitor population can be the target for transformation in MafB-associated plasma cell neoplasias.
cancer therapy; MafB; multiple myeloma mouse model; oncogenes; reprogramming stem cells
Motivation: Recent developments in experimental methods facilitate increasingly larger signal transduction datasets. Two main approaches can be taken to derive a mathematical model from these data: training a network (obtained, e.g., from literature) to the data, or inferring the network from the data alone. Purely data-driven methods scale up poorly and have limited interpretability, whereas literature-constrained methods cannot deal with incomplete networks.
Results: We present an efficient approach, implemented in the R package CNORfeeder, to integrate literature-constrained and data-driven methods to infer signalling networks from perturbation experiments. Our method extends a given network with links derived from the data via various inference methods, and uses information on physical interactions of proteins to guide and validate the integration of links. We apply CNORfeeder to a network of growth and inflammatory signalling. We obtain a model with superior data fit in the human liver cancer HepG2 and propose potential missing pathways.
Availability: CNORfeeder is in the process of being submitted to Bioconductor and in the meantime available at www.cellnopt.org.
Supplementary data are available at Bioinformatics online.
Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of ‘high-throughput biology’, the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community, which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and review it in relation to similar initiatives and collections.
Bioinformatics; training; end users; bioinformatics courses; learning bioinformatics
Functional analysis of large sets of genes and proteins is becoming increasingly necessary as experimental biomolecular data accumulate at omic scale. Enrichment analysis is by far the most popular methodology available to derive functional implications of sets of cooperating genes. The problem with these techniques lies in the redundancy of the resulting information, which in most cases generates many trivial results with a high risk of masking key biological events. We present and describe a computational method, called GeneTerm Linker, that filters and links enriched output data, identifying sets of associated genes and terms and producing metagroups of coherent biological significance. The method uses fuzzy reciprocal linkage between genes and terms to unravel their functional convergence and associations. The algorithm is tested with a small set of well known interacting proteins from yeast and with a large collection of reference sets from three heterogeneous resources: multiprotein complexes (CORUM), cellular pathways (SGD) and human diseases (OMIM). Statistical Precision, Recall and balanced F-score are calculated, showing robust results even when different levels of random noise are included in the test sets. Although we could not find an equivalent method, we present a comparative analysis with a widely used method that combines enrichment and functional annotation clustering. A web application for using the proposed method is provided at http://gtlinker.cnb.csic.es.
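For readers unfamiliar with the enrichment step that produces the input to such a method, the usual test is a one-sided hypergeometric test on the overlap between a query gene set and the genes annotated to a term. The sketch below shows only that generic test, not GeneTerm Linker itself; the function name and the toy numbers are assumptions.

```python
from math import comb

def enrichment_pvalue(n_universe, n_annotated, n_query, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_universe:  number of genes in the background
    n_annotated: background genes annotated to the term
    n_query:     genes in the query list
    n_overlap:   query genes annotated to the term
    """
    total = comb(n_universe, n_query)
    upper = min(n_annotated, n_query)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical toy case: 20 background genes, 5 annotated to a term,
# 5 query genes of which 3 carry the annotation.
p = enrichment_pvalue(20, 5, 5, 3)
```

Because many terms share most of their genes, running this test over a whole annotation vocabulary returns many overlapping significant terms, which is the redundancy that a filtering and linking step is meant to reduce.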
Interactome networks represent sets of possible physical interactions between proteins. They lack spatio-temporal information by construction. However, the specialized functions of the differentiated cell types which are assembled into tissues or organs depend on the combinatorial arrangements of proteins and their physical interactions. Is tissue-specificity, therefore, encoded within the interactome? In order to address this question, we combined protein-protein interactions, expression data, functional annotations and interactome topology. We first identified a subnetwork formed exclusively of proteins whose interactions were observed in all tested tissues. These are mainly involved in housekeeping functions and are located at the topological center of the interactome. This ‘Largest Common Interactome Network’ represents a ‘functional interactome core’. Interestingly, two types of tissue-specific interactions are distinguished when considering function and network topology: tissue-specific interactions involved in regulatory and developmental functions are central whereas tissue-specific interactions involved in organ physiological functions are peripheral. Overall, the functional organization of the human interactome reflects several integrative levels of functions with housekeeping and regulatory tissue-specific functions at the center and physiological tissue-specific functions at the periphery. This gradient of functions recapitulates the organization of organs, from cells to organs. Given that several gradients have already been identified across interactomes, we propose that gradients may represent a general principle of protein-protein interaction network organization.
For years, the genetics of metastatic colorectal cancer (CRC) have been studied using a variety of techniques. However, most of the approaches employed so far have a relatively limited resolution which hampers detailed characterization of the common recurrent chromosomal breakpoints as well as the identification of small regions carrying genetic changes and the genes involved in them.
Here we applied 500K SNP arrays to map the most common chromosomal lesions present at diagnosis in a series of 23 primary tumours from sporadic CRC patients who had developed liver metastasis. Overall our results confirm that the genetic profile of metastatic CRC is defined by imbalanced gains of chromosomes 7, 8q, 11q, 13q, 20q and X together with losses of the 1p, 8p, 17p and 18q chromosome regions. In addition, SNP-array studies allowed the identification of small (<1.3 Mb) and extensive/large (>1.5 Mb) altered DNA sequences, many of which contain cancer genes known to be involved in CRC and the metastatic process. Detailed characterization of the breakpoint regions for the altered chromosomes showed four recurrent breakpoints at chromosomes 1p12, 8p12, 17p11.2 and 20p12.1; interestingly, the most frequently observed recurrent chromosomal breakpoint was localized at 17p11.2 and systematically targeted the FAM27L gene, whose role in CRC deserves further investigations.
In summary, we provide a detailed map of the genetic abnormalities of primary tumours from metastatic CRC patients, which confirms and extends previous observations regarding the identification of genes potentially involved in the development of CRC and the metastatic process.
Genome-wide expression studies have developed exponentially in recent years as a result of extensive use of microarray technology. However, expression signals are typically calculated using the assignment of "probesets" to genes, without addressing the problem of "gene" definition or proper consideration of the location of the measuring probes in the context of the currently known genomes and transcriptomes. Moreover, as our knowledge of metazoan genomes improves, the number of both protein-coding and noncoding genes, as well as their associated isoforms, continues to increase. Consequently, there is a need for new databases that combine genomic and transcriptomic information and provide updated mapping of expression probes to current genomic annotations.
GATExplorer (Genomic and Transcriptomic Explorer) is a database and web platform that integrates a gene loci browser with nucleotide level mappings of oligo probes from expression microarrays. It allows interactive exploration of gene loci, transcripts and exons of human, mouse and rat genomes, and shows the specific location of all mappable Affymetrix microarray probes and their respective expression levels in a broad set of biological samples. The web site allows visualization of probes in their genomic context together with any associated protein-coding or noncoding transcripts. In the case of all-exon arrays, this provides a means by which the expression of the individual exons within a gene can be compared, thereby facilitating the identification and analysis of alternatively spliced exons. The application integrates data from four major source databases: Ensembl, RNAdb, Affymetrix and GeneAtlas; and it provides the users with a series of files and packages (R CDFs) to analyze particular query expression datasets. The maps cover both the widely used Affymetrix GeneChip microarrays based on 3' expression (e.g. human HG U133 series) and the all-exon expression microarrays (Gene 1.0 and Exon 1.0).
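The core of a nucleotide-level probe mapping is an interval check between probe coordinates and annotated exon coordinates. The following sketch is a simplified illustration and not GATExplorer's actual pipeline: the function name, the identifiers, the coordinates and the rule that a probe must lie entirely within an exon are all assumptions.

```python
def map_probes_to_exons(probes, exons):
    """Assign each probe to the exons that fully contain it.

    probes, exons: dict mapping id -> (chromosome, start, end), using
    closed intervals on the same strand and genome assembly.
    Returns a dict mapping probe id -> sorted list of exon ids.
    """
    mapping = {}
    for probe_id, (p_chr, p_start, p_end) in probes.items():
        mapping[probe_id] = sorted(
            exon_id
            for exon_id, (e_chr, e_start, e_end) in exons.items()
            if e_chr == p_chr and e_start <= p_start and p_end <= e_end
        )
    return mapping

# Hypothetical coordinates; the probes are 25 nucleotides long.
exons = {"exon_1": ("chr1", 90, 200), "exon_2": ("chr1", 300, 400)}
probes = {
    "probe_a": ("chr1", 100, 124),   # inside exon_1
    "probe_b": ("chr1", 190, 214),   # crosses the end of exon_1
    "probe_c": ("chr2", 100, 124),   # different chromosome
}
mapping = map_probes_to_exons(probes, exons)
```

Probes that map to no exon, such as `probe_b` and `probe_c` here, are the ones whose signal would be misattributed if probesets were assigned to genes without checking their genomic location.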
GATExplorer is an integrated database that combines genomic/transcriptomic visualization with nucleotide-level probe mapping. By considering expression at the nucleotide level rather than the gene level, it shows that the arrays detect expression signals from entities that most researchers do not contemplate or discriminate. This approach provides the means to undertake a higher resolution analysis of microarray data and potentially extract considerably more detailed and biologically accurate information from existing and future microarray experiments.
Transcriptional and functional analysis reveals that the H-Ras and N-Ras isoforms have different roles in the initial phases of the mouse cell cycle
Using oligonucleotide microarrays, we compared transcriptional profiles corresponding to the initial cell cycle stages of mouse fibroblasts lacking the small GTPases H-Ras and/or N-Ras with those of matching, wild-type controls.
Serum-starved wild-type and knockout ras fibroblasts had very similar transcriptional profiles, indicating that H-Ras and N-Ras do not significantly control transcriptional responses to serum deprivation stress. In contrast, genomic disruption of H-ras or N-ras, individually or in combination, determined specific differential gene expression profiles in response to post-starvation stimulation with serum for 1 hour (G0/G1 transition) or 8 hours (mid-G1 progression). The absence of N-Ras caused significantly higher changes than the absence of H-Ras in the wave of transcriptional activation linked to G0/G1 transition. In contrast, the absence of H-Ras affected the profile of the transcriptional wave detected during G1 progression more strongly than did the absence of N-Ras. H-Ras was predominantly functionally associated with growth and proliferation, whereas N-Ras had a closer link to the regulation of development, the cell cycle, immunomodulation and apoptosis. Mechanistic analysis indicated that extracellular signal-regulated kinase (ERK)-dependent activation of signal transducer and activator of transcription 1 (Stat1) mediates the regulatory effect of N-Ras on defense and immunity, whereas the pro-apoptotic effects of N-Ras are mediated through ERK and p38 mitogen-activated protein kinase signaling.
Our observations confirm the notion of an absolute requirement for different peaks of Ras activity during the initial stages of the cell cycle and document the functional specificity of H-Ras and N-Ras during those processes.
DNA microarrays provide rich profiles that are used in cancer prediction considering the gene expression levels across a collection of related samples. Support Vector Machines (SVM) have been applied to the classification of cancer samples with encouraging results. However, they rely on Euclidean distances that fail to reflect accurately the proximities among sample profiles. Non-Euclidean dissimilarities therefore provide additional information that should be considered to reduce the misclassification errors.
In this paper, we incorporate in the ν-SVM algorithm a linear combination of non-Euclidean dissimilarities. The weights of the combination are learnt in a Hyper Reproducing Kernel Hilbert Space (HRKHS) using a Semidefinite Programming algorithm. This approach allows us to incorporate a smoothing term that penalizes the complexity of the family of distances and avoids overfitting. The experimental results suggest that the proposed method helps to reduce the misclassification errors in several human cancer problems.
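To make the notion of a linear combination of dissimilarities concrete, the sketch below combines a Euclidean distance with a correlation-based dissimilarity into a single matrix. It is only an illustration of the combination itself: the weights are fixed by hand rather than learnt by semidefinite programming, no SVM is trained, and all function names and values are hypothetical. Profiles are assumed to be non-constant so that the correlation is defined.

```python
from math import sqrt

def euclidean(x, y):
    """Euclidean distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_dissimilarity(x, y):
    """1 - Pearson correlation; near 0 for perfectly correlated profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / sqrt(sxx * syy)

def combined_dissimilarity(samples, measures, weights):
    """Matrix D with D[i][j] = sum_k weights[k] * measures[k](s_i, s_j)."""
    n = len(samples)
    return [
        [
            sum(w * d(samples[i], samples[j]) for d, w in zip(measures, weights))
            for j in range(n)
        ]
        for i in range(n)
    ]

# Hypothetical expression profiles for three samples.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
D = combined_dissimilarity(
    samples, [euclidean, correlation_dissimilarity], [0.5, 0.5]
)
```

In this toy case the second sample is further from the first than the third is in Euclidean terms, yet the combined measure ranks it closer because its profile has the same shape, which is the kind of information a purely Euclidean kernel misses.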
Analysis of gene expression data using genome-wide microarrays is a technique often used in genomic studies to find coexpression patterns and locate groups of co-transcribed genes. However, most studies done at a global “omic” scale are not focused on human samples, and those that do use human samples very often include heterogeneous datasets that mix normal with disease-altered samples. Moreover, the technical noise present in genome-wide expression microarrays is another well-reported problem that is often not addressed with robust statistical methods, and the estimation of errors in the data is not provided.
Human genome-wide expression data from a controlled set of normal, healthy tissues are used to build a confident human gene coexpression network that avoids both pathological and technical noise. To achieve this, we describe a new method that combines several statistical and computational strategies: robust normalization and expression signal calculation; correlation coefficients obtained by parametric and non-parametric methods; random cross-validations; and estimation of the statistical accuracy and coverage of the data. All these methods provide a series of coexpression datasets where the level of error is measured and can be tuned. To define the errors, the rates of true positives are calculated by assignment to biological pathways. The results provide a confident human gene coexpression network that includes 3327 gene-nodes and 15841 coexpression-links, and a comparative analysis shows good improvement over previously published datasets. Further functional analysis of a subset core network, validated by two independent methods, shows coherent biological modules that share common transcription factors. The network reveals a map of coexpression clusters organized in well-defined functional constellations. Two major regions in this network correspond to genes involved in nuclear and mitochondrial metabolism, and investigations on their functional assignment indicate that more than 60% are house-keeping and essential genes. The network displays previously undescribed gene associations and allows some unknown, unassigned genes to be placed in a functional context based on their interactions with known gene families.
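The combination of parametric and non-parametric correlation can be sketched as follows: a link between two genes is kept only when both the Pearson and the Spearman coefficients exceed a threshold. This is a simplified illustration, not the published pipeline; the function names, the threshold `min_r` and the toy profiles are assumptions, and the normalization, cross-validation and error estimation steps are omitted.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def coexpression_links(expr, min_r=0.8):
    """Gene pairs whose Pearson AND Spearman correlations both reach min_r."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if pearson(expr[g1], expr[g2]) >= min_r
        and spearman(expr[g1], expr[g2]) >= min_r
    ]

# Hypothetical expression profiles across five tissues.
expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.8],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
links = coexpression_links(expr)
```

Requiring agreement between the two coefficients is one simple way to discard links driven by a single outlying sample; raising or lowering `min_r` trades coverage for accuracy.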
The identification of stable and reliable human gene-to-gene coexpression networks is essential to unravel the interactions and functional correlations between human genes at an omic scale. This work contributes to this aim, and we make the validated human gene coexpression networks available to the scientific community to allow further analyses of the network or of specific gene associations.
The data are available free online at http://bioinfow.dep.usal.es/coexpression/.
The 90S preribosomal particle is required for the production of the 18S rRNA from a pre-rRNA precursor. Despite the identification of the protein components of this particle, its mechanism of assembly and structural design remain unknown. In this work, we have combined biochemical studies, proteomic techniques, and bioinformatic analyses to shed light on the rules of assembly of the yeast 90S preribosome. Our results indicate that several protein subcomplexes work as discrete assembly subunits that bind in defined steps to the 35S pre-rRNA. The assembly of the t-UTP subunit is an essential step for the engagement of at least five additional subunits in two separate, mutually independent assembly routes. One of these routes leads to the formation of an assembly intermediate composed of the U3 snoRNP, the Pwp2p/UTP-B subunit, and the Mpp10p complex. The other assembly route involves the stepwise binding of Rrp5p and the UTP-C subunit. We also report the use of a bioinformatic approach that provides a model for the topological arrangement of protein components within the fully assembled particle. Together, our data identify the mechanism of assembly of the 90S preribosome and offer novel information about its internal architecture.
The pyridine nucleotide disulfide reductase (PNDR) family is a large and heterogeneous protein family divided into two classes (I and II), which reflect the divergent evolution of its characteristic disulfide redox active site. However, not all PNDR members fit into these categories, which suggests the need for further studies to achieve a more comprehensive classification of this complex family.
A workflow is designed to improve the clustering of protein families based on their array of linear conserved motifs. The method is applied to the large PNDR family and finds two main groups, which correspond to PNDR classes I and II. However, two other separate protein clusters, previously classified as class I in most databases, fall outside these groups: the peroxide reductases (NAOX, NAPE) and the type II NADH dehydrogenases (NDH-2). Two novel PNDR classes, III and IV, are therefore proposed for NAOX/NAPE and NDH-2 respectively. Knowledge-driven analyses of biochemical and functional data for the new class IV detect a linear array of motifs putatively related to Cu(II)-reductase activity in a specific subset of NDH-2.
The results presented are a novel contribution to the classification of the large and complex PNDR protein family, supporting its reclustering into four classes. The linear array of motifs detected within the class IV PNDR subfamily could be useful as a signature for a particular subgroup of NDH-2.
Agile Protein Interaction DataAnalyzer (APID) is an interactive bioinformatics web tool developed to integrate and analyze, in a unified and comparative platform, the main currently known information about protein–protein interactions demonstrated by specific small-scale or large-scale experimental methods. At present, the application includes information from five main source databases, gathered in a unified server to explore >35 000 different proteins and 111 000 different proven interactions. The web tool includes search functions to query and browse the data, allowing selection of interaction pairs based on calculated parameters that weight and qualify the reliability of each given protein interaction. For the ‘proteins’, these parameters are: connectivity, cluster coefficient, Gene Ontology (GO) functional environment, and GO environment enrichment; for the ‘interactions’, they are: number of methods, GO overlapping, and iPfam domain–domain interaction. APID also includes a graphic interactive tool to visualize selected sub-networks and to navigate on them or along the whole interaction network. The application is available open access at .
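Two of the protein-level parameters listed above, connectivity and cluster coefficient, have standard graph-theoretic definitions, and the following minimal sketch computes them on a toy interaction list. It is an illustration only: the function names and protein identifiers are hypothetical, the exact formulas used by APID are not stated in the text, and skipping self-interactions is an assumption of this example.

```python
from itertools import combinations


def build_network(pairs):
    """Build an undirected neighbor map from (protein, protein) pairs."""
    neighbors = {}
    for a, b in pairs:
        if a == b:
            continue  # assumption: self-interactions are ignored here
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def connectivity(neighbors, protein):
    """Number of distinct interaction partners (node degree)."""
    return len(neighbors.get(protein, ()))


def cluster_coefficient(neighbors, protein):
    """Fraction of partner pairs that also interact with each other.

    This is the standard local clustering coefficient; proteins with fewer
    than two partners get 0.0 by convention.
    """
    partners = neighbors.get(protein, set())
    k = len(partners)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(partners, 2) if v in neighbors[u])
    return 2.0 * linked / (k * (k - 1))


# Toy interaction list (hypothetical identifiers).
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4")]
network = build_network(pairs)

for protein in sorted(network):
    print(protein,
          connectivity(network, protein),
          round(cluster_coefficient(network, protein), 3))
```

In this toy network, P1 has three partners of which only one pair (P2, P3) interacts, so its cluster coefficient is 1/3, whereas P2 has two partners that interact with each other, giving 1.0.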