Gene/protein recognition and normalization is an important preliminary step for many biological text mining tasks. In this paper, we present a multistage gene normalization system consisting of four major subtasks: pre-processing, dictionary matching, ambiguity resolution and filtering. For the first subtask, we apply the gene mention tagger developed in our earlier work, which achieves an F-score of 88.42% on the BioCreative II GM test set. In the dictionary matching stage, we combine exact and approximate matching between gene names and the EntrezGene lexicon. For the ambiguity resolution subtask, we propose a semantic similarity disambiguation method based on Munkres' Assignment Algorithm. In the final step, a Wikipedia-based filter removes false positives. Experimental results show that the presented system achieves an F-score of 90.1%, outperforming most state-of-the-art systems.
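The abstract does not say exactly how Munkres' Assignment Algorithm enters the similarity computation. One plausible use, assumed here purely for illustration, works as follows. A mention's context terms are aligned one-to-one with a candidate gene's annotation terms so that total similarity is maximal. The candidate with the best alignment then wins. The sketch below implements the Hungarian (Kuhn-Munkres) algorithm in its potential-based O(n²m) form. The helper names `hungarian` and `alignment_score`, the similarity matrices, and the normalization by the smaller term set are hypothetical.

```python
def hungarian(cost):
    """Minimum-cost assignment (Kuhn-Munkres, potential-based O(n^2 m) form).

    cost: n x m matrix (list of lists) with n <= m.
    Returns result where result[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at most as many rows as columns")
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j]: 1-based row matched to column j (0 = free)
    way = [0] * (m + 1)   # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow a shortest alternating path until a free column is hit
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment: flip matches along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            result[p[j] - 1] = j - 1
    return result


def alignment_score(sim):
    """Mean similarity of the best one-to-one pairing of rows and columns.

    sim: similarities in [0, 1], e.g. context term (row) vs. candidate term (column).
    """
    if len(sim) > len(sim[0]):
        sim = [list(col) for col in zip(*sim)]  # transpose so rows <= columns
    cost = [[1.0 - s for s in row] for row in sim]  # max similarity = min (1 - sim)
    assignment = hungarian(cost)
    return sum(sim[i][j] for i, j in enumerate(assignment)) / len(sim)


# Hypothetical term-similarity matrices for two candidate gene IDs of one mention.
candidates = {
    "candidate_A": [[0.9, 0.1], [0.2, 0.8]],
    "candidate_B": [[0.3, 0.4], [0.5, 0.2]],
}
best = max(candidates, key=lambda c: alignment_score(candidates[c]))
print(best)  # candidate_A
```

Using an optimal assignment, instead of summing each term's best match, prevents one highly similar term from being counted for several partners.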
Altered DNA methylation patterns represent an attractive mechanism for understanding the phenotypic changes associated with human aging. Several studies have described global and complex age-related methylation changes, but their structural and functional significance has remained largely unclear.
We have used transcriptome sequencing to characterize age-related gene expression changes in the human epidermis. The results revealed a significant set of 75 differentially expressed genes with a strong functional relationship to skin homeostasis. We then used whole-genome bisulfite sequencing to identify age-related methylation changes at single-base resolution. Data analysis revealed no global aberrations, but rather highly localized methylation changes, particularly in promoter and enhancer regions that were associated with altered transcriptional activity.
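Sodium bisulfite converts unmethylated cytosines to uracil, which is read as thymine after amplification, while methylated cytosines remain cytosines. The methylation level of a CpG can therefore be estimated as C/(C+T) over the reads covering it. Below is a minimal sketch of a two-group comparison at single-base resolution. It uses pooled read counts per group and hypothetical coverage and difference thresholds. It is not the study's statistical procedure, which would normally account for replicates. The function names are illustrative.

```python
def methylation_level(c_reads, t_reads):
    """Fraction of reads reporting C (methylated) at a CpG; None if uncovered."""
    depth = c_reads + t_reads
    return c_reads / depth if depth else None


def differential_sites(young, old, min_depth=10, min_delta=0.2):
    """CpG positions whose methylation level differs by at least min_delta.

    young, old: dicts mapping position -> (C count, T count), pooled per group.
    min_depth and min_delta are illustrative thresholds, not the study's.
    Returns {position: old_level - young_level}, sorted by position.
    """
    hits = {}
    for pos in young.keys() & old.keys():
        (cy, ty), (co, to) = young[pos], old[pos]
        if cy + ty < min_depth or co + to < min_depth:
            continue  # too few reads for a reliable estimate
        delta = methylation_level(co, to) - methylation_level(cy, ty)
        if abs(delta) >= min_delta:
            hits[pos] = delta
    return dict(sorted(hits.items()))


# Hypothetical read counts at three CpG positions.
young = {100: (18, 2), 250: (10, 10), 400: (3, 1)}
old = {100: (8, 12), 250: (11, 9), 400: (0, 4)}
print(differential_sites(young, old))  # only position 100 passes both filters
```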
Our results suggest that the core developmental program of human skin is stably maintained through the aging process and that aging is associated with a limited destabilization of the epigenome at gene regulatory elements.
Aging; DNA methylation; Epidermis; Methylome sequencing; Transcriptome sequencing
K-ras gene mutations are common in colorectal cancer patients, but their relationship with prognosis is unclear.
To assess prognostic differences between patients with and without K-ras mutations by reviewing the published evidence.
Systematic reviews and databases were searched for cohort and case-control studies comparing the prognosis of colorectal cancer patients with and without detected K-ras mutations, all of whom had received chemotherapy. The number of patients, chemotherapy regimens, and short-term or long-term survival rates (disease-free or overall) were extracted. Study quality was also evaluated.
Seven studies with a control group for comparison were identified. K-ras status was associated with neither short-term disease-free survival (OR=1.01, 95% CI, 0.73-1.38, P=0.97) nor overall survival (OR=1.06, 95% CI, 0.82-1.36, P=0.66) in colorectal cancer (CRC) patients who received chemotherapy. Comparison of long-term survival between the two groups also showed no significant difference after heterogeneity was eliminated (OR=1.09, 95% CI, 0.85-1.40, P=0.49).
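The pooled odds ratios and confidence intervals above are typical meta-analysis outputs. The abstract does not state the pooling model used. The sketch below assumes the common fixed-effect inverse-variance method on log odds ratios and reports Cochran's Q and I² as heterogeneity measures. The 2×2 counts in the usage lines are made up for illustration and are not data from the seven studies.

```python
import math


def log_or(a, b, c, d):
    """Log odds ratio and its variance for one 2x2 table.

    a/b: events/non-events in the mutant group; c/d: same for the wild-type group.
    Adds 0.5 to every cell if any cell is zero (Haldane correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_or(tables, z=1.96):
    """Fixed-effect inverse-variance pooled OR with 95% CI, Cochran's Q and I^2."""
    stats = [log_or(*t) for t in tables]
    weights = [1 / var for _, var in stats]
    total_w = sum(weights)
    mean = sum(w * y for w, (y, _) in zip(weights, stats)) / total_w
    se = math.sqrt(1 / total_w)
    q = sum(w * (y - mean) ** 2 for w, (y, _) in zip(weights, stats))
    df = len(tables) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "OR": math.exp(mean),
        "CI": (math.exp(mean - z * se), math.exp(mean + z * se)),
        "Q": q,
        "I2": i2,
    }


# Hypothetical 2x2 tables (events, non-events) x (mutant, wild type).
tables = [(20, 30, 25, 35), (15, 20, 18, 22), (10, 40, 12, 38)]
result = pooled_or(tables)
print(result["OR"], result["CI"], result["I2"])
```

Under high heterogeneity, a random-effects model such as DerSimonian-Laird would usually replace the fixed-effect weights. The abstract's phrase "after heterogeneity was eliminated" does not specify which approach was taken.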
K-ras gene mutations may not be a prognostic index for colorectal cancer patients who received chemotherapy.
Oligodendroglial tumors form a distinct subgroup of gliomas, characterized by a better response to treatment and prolonged overall survival. Most oligodendrogliomas and some oligoastrocytomas carry a unique and typical unbalanced translocation, der(1,19), resulting in a 1p/19q co-deletion. Candidate tumor suppressor genes targeted by these losses, CIC on 19q13.2 and FUBP1 on 1p31.1, were discovered only recently. We analyzed 17 oligodendrogliomas and oligoastrocytomas using a comprehensive approach consisting of RNA expression analysis, DNA sequencing of CIC, FUBP1 and IDH1/2, and array CGH. We confirmed three different genetic subtypes in our samples: i) the “oligodendroglial” subtype with 1p/19q co-deletion in twelve of 17 tumors; ii) the “astrocytic” subtype in three tumors; iii) the “other” subtype in two tumors. All twelve tumors with the 1p/19q co-deletion carried the most common IDH1 mutation, R132H. In seven of these tumors, we found protein-disrupting point mutations in the remaining allele of CIC, four of which are novel. One of these tumors also had a deleterious mutation in FUBP1. Only by integrating RNA expression and array CGH data were we able to discover an exon-spanning homozygous microdeletion within the remaining allele of CIC in an additional tumor with 1p/19q co-deletion. We therefore propose that the CIC mutation rate might be underestimated when only sequence variants are considered. In conclusion, the high frequency and the spectrum of CIC mutations in our 1p/19q-codeleted tumor cohort support the hypothesis that CIC acts as a tumor suppressor in these tumors, whereas FUBP1 might play only a minor role.
Hepatitis C virus (HCV) infection becomes chronic in 80% of all patients and is characterized by persistent low-level replication. To understand how the virus establishes its tightly controlled intracellular RNA replication cycle, we developed the first detailed mathematical model of the initial dynamic phase of intracellular HCV RNA replication. To this end, we quantitatively measured viral RNA and protein translation upon synchronous delivery of viral genomes to host cells, and thoroughly validated the model using additional, independent experiments. Model analysis was used to predict the efficacy of different classes of inhibitors and identified sensitive substeps of replication that could be targeted by current and future therapeutics. A protective replication compartment proved to be essential for sustained RNA replication, balancing translation against replication and thus effectively limiting RNA amplification. The model predicts that host factors involved in the formation of this compartment determine cellular permissiveness to HCV replication. Using gene expression profiling, we identified several key processes that potentially determine cellular HCV replication efficiency.
Hepatitis C is a severe disease and a leading cause of liver transplantation. Up to 3% of the world's population is chronically infected with its causative agent, the hepatitis C virus (HCV). This capacity to establish persistent infections lasting decades sets HCV apart from other plus-strand RNA viruses, which typically cause acute, self-limiting infections. A prerequisite for this capacity to persist is HCV's complex and tightly regulated intracellular replication strategy. In this study, we therefore aimed to develop a comprehensive understanding of the molecular processes governing HCV RNA replication in order to pinpoint the most vulnerable substeps in the viral life cycle. For that purpose, we combined biological experiments with mathematical modeling. Using the model to study HCV's replication strategy, we identified diverse but crucial roles for the membranous replication compartment of HCV in regulating RNA amplification. We further predict the existence of an essential limiting host factor (or function) required for establishing active RNA replication and thereby determining cellular permissiveness for HCV. Our model also proved valuable for understanding and predicting the effects of pharmacological inhibitors of HCV and might provide a solid basis for developing similar models for other plus-strand RNA viruses.
Perturbation experiments, for example using RNA interference (RNAi), offer an attractive way to elucidate gene function in a high-throughput fashion. Placing hit genes in their functional context and inferring the underlying networks from such data, however, are challenging tasks. One of the problems in network inference is the exponential number of possible network topologies for a given number of genes. Here, we introduce a novel mathematical approach to address this problem. We formulate network inference as a linear optimization problem, which can be solved efficiently even for large-scale systems. We use simulated data to evaluate our approach and show improved performance over state-of-the-art methods, in particular on larger networks. We achieve increased sensitivity and specificity, as well as a significant reduction in computing time. Furthermore, we show superior performance on noisy data. We then apply our approach to study intracellular signaling in primary human naïve CD4+ T cells, as well as ErbB signaling in trastuzumab-resistant breast cancer cells. In both cases, our approach recovers known interactions and points to additional relevant processes. In ErbB signaling, our results predict an important role for negative and positive feedback in controlling cell cycle progression.
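The abstract does not give the optimization problem itself. One way to cast network inference from perturbation data as a linear program, assumed here for illustration and not necessarily the authors' formulation, treats each node as a linear threshold unit:

- $x_{jk} \in \{0,1\}$ is the observed activity of node $j$ in perturbation experiment $k$; a knocked-down node has $x_{jk}=0$.
- $w_{ij}$ is the influence of node $j$ on node $i$, split into nonnegative parts $w_{ij} = w^{+}_{ij} - w^{-}_{ij}$ so that an L1 sparsity penalty stays linear.
- $\xi_{ik}$ are slack variables that tolerate noisy observations.
- $\delta$ (activation margin) and $C$ (slack penalty) are hypothetical tuning parameters.

```latex
\begin{aligned}
\min_{w^{+},\,w^{-},\,b,\,\xi}\quad
  & \sum_{i}\sum_{j\neq i}\bigl(w^{+}_{ij}+w^{-}_{ij}\bigr) + C\sum_{i,k}\xi_{ik} \\
\text{s.t.}\quad
  & b_i + \sum_{j\neq i}\bigl(w^{+}_{ij}-w^{-}_{ij}\bigr)\,x_{jk} \ge \delta - \xi_{ik}
    \quad\text{if node } i \text{ is active in experiment } k, \\
  & b_i + \sum_{j\neq i}\bigl(w^{+}_{ij}-w^{-}_{ij}\bigr)\,x_{jk} \le \xi_{ik}
    \quad\text{if node } i \text{ is inactive in experiment } k, \\
  & w^{+}_{ij} \ge 0,\quad w^{-}_{ij} \ge 0,\quad \xi_{ik} \ge 0 .
\end{aligned}
```

At an optimum, at most one of $w^{+}_{ij}$ and $w^{-}_{ij}$ is nonzero, so the first sum equals $\sum |w_{ij}|$. An edge $j \to i$ is then read off wherever $w_{ij} \neq 0$. Because the objective and all constraints are linear, the problem can be solved by standard LP solvers in polynomial time rather than by searching the exponential space of topologies.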
Viruses are extremely heterogeneous entities; the size and nature of their genetic information, as well as the strategies employed to amplify and propagate their genomes, are highly variable. However, as obligate intracellular parasites, all viruses rely on the host cell for replication. Having co-evolved with their hosts for several million years, viruses have developed very sophisticated strategies to hijack cellular factors that promote virus uptake, replication, and spread. Identifying the host cell factors (HCFs) required for these processes is a major challenge for researchers, but it enables the discovery of new, highly selective targets for antiviral therapeutics. To this end, the establishment of platforms for genome-wide high-throughput RNA interference (HT-RNAi) screens has led to the identification of several key factors involved in the viral life cycle. A number of genome-wide HT-RNAi screens have been performed for major human pathogens, enabling the first inter-viral comparisons of HCF requirements. Although several cellular functions appear to be uniformly required for the life cycle of most viruses tested (such as the proteasome and the Golgi-mediated secretory pathways), some factors, like the lipid kinase phosphatidylinositol 4-kinase IIIα in the case of hepatitis C virus, are selectively required by individual viruses. However, despite the amount of data available, we are still far from a comprehensive understanding of the interplay between viruses and host factors. Major limitations towards this goal are the low sensitivity and specificity of such screens, resulting in limited overlap between different screens performed with the same virus. This review focuses on how statistical and bioinformatic analysis methods applied to HT-RNAi screens can help overcome these issues, thus increasing the reliability and impact of such studies.
RNA interference; High-throughput; Cell population; Dependency factors; Bioinformatics; Human immunodeficiency virus; Hepatitis C virus; Dengue virus; Viral infection; Virus-host interactions
Oligodendroglioma poses a biological conundrum among malignant adult human gliomas: it is a tumor type that is universally incurable for patients, and yet only a few human oligodendrogliomas have been established as cell populations in vitro or as intracranial xenografts in vivo. Their survival may thus depend on a specific environmental context. To determine the fate of human oligodendroglioma in an experimental model, we studied the development of an anaplastic tumor after intracranial implantation into enhanced green fluorescent protein (eGFP)-positive NOD/SCID mice. Remarkably, after nearly nine months, the tumor not only engrafted but also retained classic histological and genetic features of human oligodendroglioma. These included cells with a clear cytoplasm, an infiltrative growth pattern, and mutations of IDH1 (R132H) and of the tumor suppressor genes FUBP1 and CIC. The xenografts were highly invasive, exhibiting a distinct migration and growth pattern around neurons, especially in the hippocampus, and following white matter tracts of the corpus callosum, with tumor cells accumulating around established vasculature. Although the tumors exhibited a high growth fraction in vivo, neither cells from the original patient tumor nor the xenograft grew significantly in vitro over a six-month period. This glioma xenograft is the first to display a pure oligodendroglioma histology and expression of IDH1 R132H. The cells unexpectedly fail to grow in vitro, even after passage through the mouse. This gives us a unique opportunity to investigate the relationship of this oligodendroglioma with its in vivo microenvironment.
The miRNA cluster miR-17-92 is known as oncomir-1 because of its potent oncogenic function. miR-17-92 is a polycistronic cluster that encodes six miRNAs and can both facilitate and inhibit cell proliferation. Known targets of the miRNAs encoded by this cluster are largely regulators of cell cycle progression and apoptosis. Here, we show that miRNAs encoded by this cluster and sharing the seed sequence of miR-17 influence one of the most essential cellular processes, endocytic trafficking. Using mRNA expression analysis, we found that miR-17 can potentially regulate endocytic trafficking by targeting a number of trafficking regulators. We thoroughly validated TBC1D2/Armus, a GTPase-activating protein (GAP) for the Rab7 GTPase, as a novel target of miR-17. Our study reveals regulation of endocytic trafficking as a novel function of miR-17, which might act cooperatively with other functions of miR-17 and related miRNAs in health and disease.
Using a genome-wide screening approach, we have established the genetic requirements for proper telomere structure in Saccharomyces cerevisiae. We uncovered 112 genes, many of which have not previously been implicated in telomere function, that are required to form a fold-back structure at chromosome ends. Among other biological processes, lysine deacetylation through the Rpd3L, Rpd3S, and Hda1 complexes emerged as a critical regulator of telomere structure. The telomere-bound protein Rif2 was also found to promote telomere fold-back through the recruitment of Rpd3L to telomeres. In the absence of Rpd3 function, telomeres show increased susceptibility to nucleolytic degradation, telomere loss, and the initiation of premature senescence, suggesting that an Rpd3-mediated structure may have protective functions. Together, these data reveal that multiple genetic pathways may directly or indirectly impinge on telomere structure, thus broadening the potential targets available to manipulate telomere function.
Impaired telomere elongation eventually results in telomere dysfunction and can lead to diseases such as dyskeratosis congenita, which is associated with bone-marrow failure and pulmonary fibrosis. Cancer cells require continuous telomere maintenance to ensure continued cellular proliferation. Therefore, regulating telomere function, both positively (in the case of dyskeratosis congenita) and negatively (for cancer), may be of therapeutic benefit. In this study, we used yeast to determine which genetic factors are important for a particular telomeric structure (the loop structure), which may help to maintain chromosome ends in a protected state. We found that multiple genetic factors and pathways, ranging from metabolic signaling to specific telomere-binding proteins, affect telomere structure. Proper chromatin structure at the telomere is essential to maintain the telomere fold-back structure. Importantly, telomere structure and function were strongly correlated: the looping-defective mutants found in our screen were often associated with rapid senescence and telomere dysfunction phenotypes. We believe that, by regulating the various genetic pathways uncovered in our screen, it may be possible to influence telomere function both positively and negatively.
Hepatitis C virus (HCV) is a major causative agent of chronic liver disease in humans. To gain insight into host factor requirements for HCV replication, we performed an siRNA screen of the human kinome and identified 13 different kinases, including phosphatidylinositol 4-kinase III alpha (PI4KIIIα), as required for HCV replication. Elevated levels of the PI4KIIIα product phosphatidylinositol 4-phosphate (PI4P) were detected in HCV-infected cultured hepatocytes and in liver tissue from chronic hepatitis C patients. Consistent with this, the enzymatic activity of PI4KIIIα was critical for HCV replication. Viral nonstructural protein 5A (NS5A) was found to interact with PI4KIIIα and stimulate its kinase activity. The absence of PI4KIIIα activity induced a dramatic change in the ultrastructural morphology of the membranous HCV replication complex. Our analysis suggests that the direct activation of a lipid kinase by HCV NS5A contributes critically to the integrity of the membranous viral replication complex.
Hepatitis C virus (HCV) has infected around 160 million individuals. Current therapies have limited efficacy and are fraught with side effects. To identify cellular HCV dependency factors as possible therapeutic targets, we manipulated signaling cascades with pathway-specific inhibitors. Using this approach, we identified the MAPK/ERK-regulated, cytosolic, calcium-dependent group IVA phospholipase A2 (PLA2G4A) as a novel HCV dependency factor. Inhibition of PLA2G4A activity reduced core protein abundance at lipid droplets, core envelopment, and particle secretion. Moreover, the released particles displayed an aberrant protein composition and were 100-fold less infectious. Exogenous addition of arachidonic acid, the cleavage product of PLA2G4A-catalyzed lipolysis, restored infectivity, but addition of other related polyunsaturated fatty acids did not. Strikingly, production of infectious Dengue virus, a relative of HCV, was also dependent on PLA2G4A. These results highlight previously unrecognized parallels in the assembly pathways of these human pathogens and define PLA2G4A-dependent lipolysis as a crucial prerequisite for the production of highly infectious viral progeny.
The human genome encodes more than 30 phospholipase A2s (PLA2s). These enzymes cleave fatty acids at the C2 atom of phosphoglycerides and thus modulate membrane properties. PLA2G4A is recruited to perinuclear membranes by Ca2+ and activated by extracellular stimuli via the mitogen-activated protein kinase pathway. Among all PLA2s, only PLA2G4A specifically cleaves lipids containing arachidonic acid. Metabolism of arachidonic acid yields prostaglandins and leukotrienes, important lipid mediators of inflammation. We show that inhibition of PLA2G4A produces aberrant HCV particles and that infectivity is rescued by the addition of arachidonic acid. Our results suggest that a specific lipid (arachidonic acid) is essential for the production of highly infectious HCV progeny. It likely acts by creating a membrane environment conducive to the efficient incorporation of crucial host and viral factors into the lipid envelope of nascent particles. Strikingly, PLA2G4A is also essential for the production of highly infectious Dengue virus (DENV) particles but not for vesicular stomatitis virus (VSV). Unlike VSV, HCV and DENV produce particles at intracellular membranes. These observations argue that HCV and DENV usurp a common host factor (PLA2G4A) for the assembly of highly infectious progeny. These findings open new perspectives for antiviral intervention and highlight previously unrecognized parallels in the assembly pathways of HCV and DENV.
Network inference deals with the reconstruction of biological networks from experimental data. A variety of reverse engineering techniques are available; they differ in their underlying assumptions and in the mathematical models used. One problem common to all approaches stems from the complexity of the task: the number of possible network topologies explodes combinatorially as network size increases. To handle this problem, constraints are frequently used, for example on the node degree, the number of edges, or the regulation functions between network components. We propose to exploit topological considerations in the inference of gene regulatory networks. Such systems are often controlled by a small number of hub genes, while most other genes have only limited influence on the network's dynamics. We model gene regulation using a Bayesian network with discrete, Boolean nodes. A hierarchical prior is employed to identify hub genes. The first layer of the prior regularizes the weights on edges emanating from a given node. A second prior, on the hyperparameters, controls the magnitude of this regularization for different nodes. The net effect is that central nodes tend to form in reconstructed networks. Network reconstruction is then performed by maximizing, or sampling from, the posterior distribution. We evaluate our approach on simulated and real experimental data, indicating that we can reconstruct the main regulatory interactions from the data. We furthermore compare our approach to other state-of-the-art methods, showing superior performance in identifying hubs. Using a large publicly available dataset of over 800 cell cycle regulated genes, we are able to identify several main hub genes. Our method may thus provide a valuable tool to identify interesting candidate genes for further study. Furthermore, the approach presented may stimulate further developments in regularization methods for network reconstruction from data.
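The effect of the hierarchical prior can be illustrated with one concrete choice. This choice is an assumption rather than the paper's stated model. All outgoing weights of node i share a Gaussian precision λ_i, and each λ_i has a Gamma(a, b) hyperprior. Integrating λ_i out gives a closed-form marginal per node:

log p(w_i) = const + lgamma(a + k/2) − (a + k/2)·log(b + S_i/2),

where k is the number of outgoing weights and S_i is their sum of squares. The penalty grows only logarithmically in S_i. Concentrating strong edges on one node therefore costs less than spreading them over many nodes, which is one way hubs can be favored. The function names and the values of `a` and `b` are hypothetical.

```python
import math


def log_node_prior(weights, a=1.0, b=1.0):
    """Log marginal prior of one node's outgoing weights.

    Assumes w_ij ~ N(0, 1/lam) for all j with a shared precision
    lam ~ Gamma(shape=a, rate=b), and lam integrated out analytically.
    """
    k = len(weights)
    s = sum(w * w for w in weights)
    return (-0.5 * k * math.log(2 * math.pi)
            + a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + 0.5 * k)
            - (a + 0.5 * k) * math.log(b + 0.5 * s))


def log_hub_prior(weight_matrix, a=1.0, b=1.0):
    """Sum of per-node log priors; row i holds the weights leaving regulator i."""
    return sum(log_node_prior(row, a, b) for row in weight_matrix)


# Four regulators (rows) acting on four target genes (columns), four unit edges.
hub = [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]     # one hub
spread = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # one edge each
print(log_hub_prior(hub) > log_hub_prior(spread))  # True: the hub layout is favored
```

With the same total edge strength, the difference between the two layouts comes only from the concavity of the logarithm in the marginal. This is the "net effect" described above, expressed in a single number.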
High-content, high-throughput RNA interference (RNAi) offers unprecedented possibilities to elucidate gene function and involvement in biological processes. Microscopy-based screening allows phenotypic observations at the level of individual cells. It was recently shown that a cell's population context significantly influences results. However, standard analysis methods for cellular screens do not currently take individual cell data into account unless this is important for the phenotype of interest, e.g., when studying cell morphology.
We present a method that normalizes and statistically scores microscopy-based RNAi screens, exploiting single-cell information from hundreds of cells per knockdown. Each cell's individual population context is employed in normalization. We present results on two infection screens for hepatitis C and dengue virus, both showing considerable effects of population context on the observed phenotypes. In addition, using a non-virus screen, we show that these effects also occur in RNAi data in the absence of any virus. Using our approach to normalize against these effects, we achieve improved performance compared with an analysis without this normalization and hit scoring strategy. Furthermore, our approach identifies considerably more significantly enriched pathways in hepatitis C virus replication than a standard analysis approach.
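The abstract does not detail the normalization model. The sketch below only illustrates the general idea under stated assumptions:

1. Each cell's population context is summarized by its local density, the number of neighbors within a hypothetical radius.
2. A straight line of infection readout versus density is fitted over all cells.
3. Each cell's residual from that line is taken as its context-corrected readout.
4. Each knockdown is scored by a robust z-score of its mean residual.

The function names, the radius and the linear model are illustrative choices, not the published method.

```python
import math
import statistics


def local_density(cells, radius=50.0):
    """For each cell, count the other cells within `radius` (O(n^2), for clarity)."""
    return [
        sum(1 for j, (xj, yj, _) in enumerate(cells)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius)
        for i, (xi, yi, _) in enumerate(cells)
    ]


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def score_knockdowns(screen, radius=50.0):
    """Context-corrected robust z-score per knockdown.

    screen: dict mapping knockdown name -> list of (x, y, infection_intensity),
    one tuple per segmented cell in that knockdown's image(s).
    """
    density = {kd: local_density(cells, radius) for kd, cells in screen.items()}
    xs = [d for kd in screen for d in density[kd]]
    ys = [cell[2] for kd in screen for cell in screen[kd]]
    intercept, slope = fit_line(xs, ys)  # infection expected from density alone
    mean_resid = {}
    for kd, cells in screen.items():
        resid = [cell[2] - (intercept + slope * d) for cell, d in zip(cells, density[kd])]
        mean_resid[kd] = statistics.fmean(resid)
    med = statistics.median(mean_resid.values())
    mad = 1.4826 * statistics.median(abs(m - med) for m in mean_resid.values())
    return {kd: (m - med) / mad if mad else 0.0 for kd, m in mean_resid.items()}
```

A strongly negative score flags a knockdown whose cells are less infected than their local density alone would predict. A knockdown whose cells are merely sparse is not penalized for that sparsity.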
Using a cell-based analysis and normalization for population context, we achieve improved sensitivity and specificity not only at the level of individual proteins but especially at the pathway level. This leads to the identification of new host dependency factors of the hepatitis C and dengue viruses and to higher reproducibility of results.
Autoimmune pancreatitis (AIP) is thought to be an immune-mediated inflammatory process, directed against the epithelial components of the pancreas.
In order to explore key targets of the inflammatory process, we analysed gene expression at the RNA and protein levels using genomics and proteomics, immunohistochemistry, Western blot and immunoassay. An animal model of AIP, LP-BM5 murine leukemia virus-infected mice, was studied in parallel. RNA microarrays of pancreatic tissue from 12 patients with AIP were compared with those of 8 patients with non-AIP chronic pancreatitis (CP).
Expression profiling revealed 272 upregulated genes, including those encoding immunoglobulins, chemokines and their receptors, and 86 downregulated genes, including those encoding pancreatic proteases such as three trypsinogen isoforms. Protein profiling showed that the expression of trypsinogens and other pancreatic enzymes was greatly reduced. Immunohistochemistry demonstrated a near-loss of trypsin-positive acinar cells, which was also confirmed by Western blotting. The serum of AIP patients contained high titres of autoantibodies against the trypsinogens PRSS1 and PRSS2, but not against PRSS3. In addition, there were autoantibodies against the trypsin inhibitor PSTI (the product of the SPINK1 gene). In the pancreas of AIP animals, we found similar protein patterns and a reduction in trypsinogen.
These data indicate that the immune-mediated process characterizing AIP involves pancreatic acinar cells and their secretory enzymes such as trypsin isoforms. Demonstration of trypsinogen autoantibodies may be helpful for the diagnosis of AIP.
autoimmune pancreatitis; chronic pancreatitis; trypsinogen; proteomics; transcriptomics; autoantibody
Motivation: Detecting human proteins that are involved in virus entry and replication is facilitated by modern high-throughput RNAi screening technology. However, hit lists from different laboratories have shown little consistency. This may be caused not only by experimental discrepancies but also by data analysis possibilities that have not been fully explored. We wanted to improve the reliability of such screens by combining a population analysis of infected cells with an established dye intensity readout.
Results: Viral infection spreads mainly through cell–cell contacts, and clustering of infected cells can be observed as the infection spreads in situ and in vivo. We used this clustering feature to identify knockdowns that impair the infection efficiency of human hepatitis C virus. Images of knocked-down cells for 719 human kinase genes were analyzed with an established point pattern analysis method (Ripley's K-function). The aim was to detect knockdowns in which virally infected cells did not show any clustering and were therefore prevented from spreading the infection to their neighboring cells. The results were compared with a statistical analysis using a common intensity readout of the GFP-expressing viruses and with a luciferase-based secondary screen. This yielded five promising host factors that may serve as potential targets for drug therapy.
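Ripley's K-function counts how many other points lie within distance r of a typical point, scaled by point density. For a completely random (Poisson) pattern, K(r) ≈ πr². Values well above πr² indicate clustering, which in this setting would signal cell-to-cell spread of infection. The common L-transform, L(r) = sqrt(K(r)/π), makes the comparison easier: L(r) − r > 0 suggests clustering and L(r) − r < 0 suggests regularity. The sketch below is the naive estimator without edge correction. The estimator and edge correction used in the study are not stated here, and the function names and point sets are hypothetical.

```python
import math
from itertools import combinations


def ripley_k(points, r, area):
    """Naive Ripley's K estimate (no edge correction) for 2-D points in a window of given area."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) <= r)
    # Each unordered pair counts once for i->j and once for j->i.
    return area * 2 * close_pairs / (n * (n - 1))


def l_minus_r(points, r, area):
    """Besag's L(r) - r: positive values suggest clustering, negative values regularity."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r


area = 100.0 * 100.0  # hypothetical 100 x 100 image window
clustered = [(10 + 0.1 * i, 10.0) for i in range(10)] + [(90 + 0.1 * i, 90.0) for i in range(10)]
grid = [(10.0 + 20 * i, 10.0 + 20 * j) for i in range(5) for j in range(5)]
print(l_minus_r(clustered, 1.0, area) > 0)  # True: two tight foci of "infected" cells
print(l_minus_r(grid, 5.0, area) < 0)       # True: evenly spaced cells
```

In a screen, this would be computed on the positions of infected cells in each knockdown image. A knockdown whose infected cells show no excess clustering compared with the random expectation would be a candidate hit.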
Conclusion: We report an alternative analysis method for high-throughput imaging screens to detect host factors relevant to the infection efficiency of viruses. The method is generic and has the potential to be used for a large variety of viruses and treatments screened by imaging techniques.
Supplementary information: Supplementary data are available at Bioinformatics online.
Human immunodeficiency virus type 1 (HIV-1) group M viruses have achieved a global distribution, while HIV-1 group O viruses are endemic only in particular regions of Africa. Here, we evaluated biological characteristics of group O and group M viruses in ex vivo models of HIV-1 infection. The replicative capacity and ability to induce CD4 T-cell depletion of eight group O and seven group M primary isolates were monitored in cultures of human peripheral blood mononuclear cells and tonsil explants. Comparative and longitudinal infection studies revealed HIV-1 group-specific activity patterns: CCR5-using (R5) viruses from group M varied considerably in their replicative capacity but showed similar levels of cytopathicity. In contrast, R5 isolates from group O were relatively uniform in their replicative fitness but displayed a high and unprecedented variability in their potential to deplete CD4 T cells. Two R5 group O isolates were identified that caused massive depletion of CD4 T cells, to an extent comparable to CXCR4-using viruses and not documented for any R5 isolate from group M. Intergroup comparisons found five- to eightfold lower replicative fitness for group O isolates than for group M isolates, yet similar overall intrinsic pathogenicity in tonsil cultures. This study establishes biological ex vivo characteristics of HIV-1 group O primary isolates. The current findings challenge the belief that grossly reduced replicative fitness or inherently impaired cytopathicity of viruses from this group underlies their low global prevalence.
The reconstruction of gene regulatory networks from time series gene expression data is one of the most difficult problems in systems biology. This is due to several reasons, among them the combinatorial explosion of possible network topologies, limited information content of the experimental data with high levels of noise, and the complexity of gene regulation at the transcriptional, translational and post-translational levels. At the same time, quantitative, dynamic models, ideally with probability distributions over model topologies and parameters, are highly desirable.
We present a novel approach to infer such models from data, based on nonlinear differential equations, which we embed into a stochastic Bayesian framework. We thus address both the stochasticity of experimental data and the need for quantitative dynamic models. Furthermore, the Bayesian framework makes it easy to integrate prior knowledge into the inference process. Using stochastic sampling from the Bayesian posterior distribution, our approach can infer different likely network topologies and model parameters, along with their respective probabilities, from the given data. We evaluate our approach on simulated data and on the challenge #3 data from the DREAM 2 initiative. On the simulated data, we study the effects of different noise levels and dataset sizes. Results on real data show that the dynamics and main regulatory interactions are correctly reconstructed.
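How an ODE model enters a Bayesian sampler can be sketched on a toy two-gene system. Gene 1 is expressed constitutively and activates gene 2 through a Hill function. The model, the fixed Hill constant, the Gaussian noise level, the log-normal priors and the random-walk Metropolis sampler are all illustrative assumptions, not the paper's model or sampler. Sampling over alternative topologies would need additional moves, such as adding or removing edges, which are not shown.

```python
import math
import random


def simulate(params, t_end=10.0, dt=0.02, n_obs=10):
    """Euler-integrate a hypothetical two-gene ODE and return x2 at n_obs time points.

    dx1/dt = 1 - x1
    dx2/dt = w * x1^2 / (K^2 + x1^2) - d * x2,   with K = 0.5 fixed
    """
    w, d = params
    x1 = x2 = 0.0
    steps = int(round(t_end / dt))
    every = steps // n_obs
    out = []
    for step in range(1, steps + 1):
        hill = x1 * x1 / (0.25 + x1 * x1)
        x1, x2 = x1 + dt * (1.0 - x1), x2 + dt * (w * hill - d * x2)
        if step % every == 0:
            out.append(x2)
    return out


def log_posterior(theta, data, sigma=0.05, prior_sd=2.0):
    """theta = [log w, log d]; Gaussian likelihood plus N(0, prior_sd^2) prior on theta."""
    pred = simulate([math.exp(t) for t in theta])
    log_lik = -sum((p - y) ** 2 for p, y in zip(pred, data)) / (2 * sigma ** 2)
    log_prior = -sum(t * t for t in theta) / (2 * prior_sd ** 2)
    return log_lik + log_prior


def metropolis(data, n_iter=1000, step=0.1, seed=0):
    """Random-walk Metropolis in log-parameter space; returns sampled (w, d) pairs."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # start at w = d = 1
    current = log_posterior(theta, data)
    samples = []
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        cand = log_posterior(proposal, data)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        if math.isfinite(cand) and rng.random() < math.exp(min(0.0, cand - current)):
            theta, current = proposal, cand
        samples.append(tuple(math.exp(t) for t in theta))
    return samples


data = simulate((2.0, 0.5))           # noise-free synthetic data, illustration only
samples = metropolis(data)
burned = samples[len(samples) // 2:]  # discard the first half as burn-in
w_mean = sum(w for w, _ in burned) / len(burned)
d_mean = sum(d for _, d in burned) / len(burned)
```

The retained samples approximate a distribution over parameters rather than a single best fit. That distribution is what makes the probabilities mentioned above, and their use in experimental design, available.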
Our approach combines dynamic modeling using differential equations with a stochastic learning framework, thus bridging the gap between biophysical modeling and stochastic inference approaches. Results show that the method can reap the advantages of both worlds, and allows the reconstruction of biophysically accurate dynamic models from noisy data. In addition, the stochastic learning framework used permits the computation of probability distributions over models and model parameters, which holds interesting prospects for experimental design purposes.
Single-nucleotide polymorphism (SNP) analysis is a powerful tool for mapping and diagnosing disease-related alleles. Mutation analysis by polymerase-mediated single-base primer extension (minisequencing) can be massively parallelized using DNA microchips, or flow cytometry with microspheres as the solid support. By adding a unique oligonucleotide tag to the 5′ end of the minisequencing primer and attaching the complementary antitag to the array or bead surface, the assay can be ‘demultiplexed’. Such high-throughput SNP scoring requires a high level of primer multiplexing to analyze multiple loci in one assay, thus enabling inexpensive and fast polymorphism scoring. We present a computer program that automates the assay design process. The software automatically selects the oligonucleotide primers for the reaction and generates a unique DNA tag/antitag system. It then pairs primers with DNA tags so as to avoid any cross-reactivity. We report results on a 45-plex genotyping assay, indicating that minisequencing can be adapted into a powerful tool for high-throughput, massively parallel genotyping. The software is available to academic users on request.
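The abstract does not describe how the tag/antitag set is generated. A simple heuristic, assumed here for illustration, uses greedy selection of random oligonucleotides. Each candidate must fall within a GC-content window. It must also differ by a minimum Hamming distance from every accepted tag and from the reverse complements of all accepted tags, including its own. This limits both mis-hybridization to foreign antitags and tag-tag pairing. Tag length, GC window and distance threshold are hypothetical. A real design might additionally check melting temperature or duplex stability.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def design_tags(n_tags, length=20, min_dist=8, gc_range=(0.4, 0.6),
                max_tries=100_000, seed=0):
    """Greedily pick random tags that are mutually dissimilar.

    A candidate is accepted only if it differs in at least `min_dist` positions
    from every accepted tag, from their reverse complements, and from its own
    reverse complement.
    """
    rng = random.Random(seed)
    tags = []
    for _ in range(max_tries):
        if len(tags) == n_tags:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        gc = (cand.count("G") + cand.count("C")) / length
        if not gc_range[0] <= gc <= gc_range[1]:
            continue
        forbidden = tags + [revcomp(t) for t in tags] + [revcomp(cand)]
        if all(hamming(cand, f) >= min_dist for f in forbidden):
            tags.append(cand)
    return tags


def antitags(tags):
    """Antitags for the array or bead surface: reverse complements that capture each tag."""
    return [revcomp(t) for t in tags]


tags = design_tags(10)
capture = antitags(tags)
```

Each primer would then be assigned one tag, with its antitag on the solid support. Any additional check between primer sequences and tags is outside this sketch.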