With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used to inform function prediction systems. In this study, we extracted seven sequence features in intrinsically disordered regions and developed a scheme to use them to predict Gene Ontology Slim terms associated with proteins. We evaluated the function prediction performance of each feature. Our results indicate that the residue composition based features have the highest precision while bigram probabilities, based on sequence profiles of intrinsically disordered regions obtained from PSIBlast, have the highest recall. Amino acid bigrams and features based on secondary structure show an intermediate level of precision and recall. Almost all features showed a high prediction performance for GO Slim terms related to extracellular matrix, nucleus, RNA and DNA binding. However, feature performance varied significantly for different GO Slim terms emphasizing the need for a unique classifier optimized for the prediction of each functional term. These findings provide a first comprehensive and quantitative evaluation of sequence features in intrinsically disordered regions and will help in the development of a more informative protein function predictor.
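Two of the feature families named above, residue composition and amino acid bigrams, can be illustrated with a minimal sketch. The function names and the example segment are hypothetical, and the study's actual feature definitions (for instance the profile-based bigram probabilities from PSIBlast) are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_composition(region):
    """Fraction of each standard amino acid in one disordered region."""
    counts = Counter(region)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: (counts[a] / total if total else 0.0) for a in AMINO_ACIDS}


def bigram_frequencies(region):
    """Relative frequency of each adjacent residue pair in the region."""
    pairs = [region[i:i + 2] for i in range(len(region) - 1)]
    # Ignore pairs that contain non-standard residue symbols.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(pairs)
    total = len(pairs)
    return {p: c / total for p, c in counts.items()}


# Hypothetical disordered segment, used only for illustration.
idr = "SSPEESKKSS"
comp = residue_composition(idr)
bigrams = bigram_frequencies(idr)
```

Each dictionary can then be flattened into a fixed-length vector and passed to a per-term classifier, which matches the abstract's point that each GO Slim term may need its own optimized predictor.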
The quiescent (G0) phase of the cell cycle is a reversible state in which cells have exited the cell cycle. Due to the difficulty of defining the G0 phase, quiescent cells have not been well characterized. In this study, a fusion protein consisting of mVenus and a defective mutant of the CDK inhibitor p27 (p27K−) was shown to identify and isolate a population of quiescent cells and to effectively visualize the G0 to G1 transition. By comparing the expression profiles of the G0 and G1 cells defined by mVenus-p27K−, we have identified molecular features of quiescent cells. Quiescence is also an important feature of many types of stem cells, and mVenus-p27K−-transgenic mice enabled the detection of quiescent cells with muscle stem cell markers in muscle in vivo. The mVenus-p27K− probe could be useful in investigating stem cells as well as quiescent cells.
SETBP1; SECONDARY AML; CMML; MONOSOMY 7; MUTATION
Kampo medicine is the Japanese adaptation of traditional medicine. The “medical interview” plays an important role in Kampo medicine: it covers not only the chief complaint but also a questionnaire about the patient's lifestyle and subjective symptoms. A Kampo diagnosis is called “Sho” and is determined from a viewpoint completely different from that of Western medicine. Specialists gather all available information to decide the “Sho,” which is why non-specialists without technical knowledge of Kampo have difficulty using traditional medicine. We analyzed “medical interview” data to establish an indicator that would help non-specialists choose suitable traditional medicine. We predicted “Sho” using the random forests algorithm, a powerful classification algorithm. First, we used the data of all 2830 first-visit patients. The discriminant ratio was perfect on the training data but only 67.0% on the test data. Second, to achieve prediction power high enough for practical use, we cleaned the data, and the discriminant ratio on the test data rose to 72.4%. Third, we added body mass index (BMI) data to the “medical interview” data, and the discriminant ratio on the test data reached 91.2%. The deficiency and excess categories originally describe whether a patient is strongly or poorly built. We found that BMI was the most important variable for classification.
Ctf18-replication factor C complex including Dscc1 (DNA replication and sister chromatid cohesion 1) is implicated in sister chromatid cohesion, DNA replication, and genome stability in S. cerevisiae and C. elegans. We previously performed gene expression profiling in primary colorectal cancer cells in order to identify novel molecular targets for the treatment of colorectal cancer. A feature of the cancer-associated transcriptional signature revealed from this effort is the elevated expression of the proto-oncogene DSCC1. Here, we have interrogated the molecular basis for deviant expression of human DSCC1 in colorectal cancer and its ability to promote survival of cancer cells. Quantitative PCR and immunohistochemical analyses corroborated that the expression level of DSCC1 is elevated in 60–70% of colorectal tumors compared to their matched noncancerous colonic mucosa. An in silico evaluation of the presumptive DSCC1 promoter region for consensus DNA transcriptional regulatory elements revealed a potential role for the E2F family of DNA-binding proteins in controlling DSCC1 expression. RNAi-mediated reduction of E2F1 reduced expression of DSCC1 in colorectal cancer cells. Gain- and loss-of-function experiments demonstrated that DSCC1 is involved in the viability of cancer cells in response to genotoxic stimuli. We reveal that E2F-dependent expression of DSCC1 confers anti-apoptotic properties in colorectal cancer cells, and that its suppression may be a useful option for the treatment of colorectal cancer.
A cold sensation (hie) is common in Japanese women and is an important treatment target in Kampo medicine. Physicians diagnose patients as having hiesho (cold disorder) when hie disturbs their daily activity. However, differences between hie and hiesho in men and women are not well described. Hie can be classified into three types depending on the body part where patients feel it. We aimed to clarify the characteristics of patients with hie and hiesho by analyzing data from new patients seen at the Kampo Clinic at Keio University Hospital between 2008 and 2013. We collected information about patients' subjective symptoms and their severity using visual analogue scales. Of 4,016 new patients, 2,344 complained about hie and 524 of those were diagnosed with hiesho. Hie was most common in the legs/feet and in combination with the hands or lower back, rather than in the whole body. Almost 30% of patients with hie felt upper body heat symptoms such as hot flushes. Cold sensation was stronger in hiesho than in non-hiesho patients. Patients with hie had more complaints. Men with hiesho had the same distribution of hie and had symptoms similar to those of women. The results of our study may increase awareness of hiesho and help doctors treat hie and other symptoms.
Kampo medicine, or traditional Japanese medicine, has been used under Japan's National Health Insurance scheme for 46 years. Recent research has shown that more than 80% of physicians use Kampo in daily practice. However, the use of Kampo from the patient perspective has received scant attention. To assess the current use of Kampo drugs in the National Health Insurance Program, we analysed a total of 67,113,579 health care claim records, which had been collected by Japan's Ministry of Health, Labour and Welfare in 2009. We found that Kampo drugs were prescribed for 1.34% of all patients. Among these, 92.2% simultaneously received biomedical drugs. Shakuyakukanzoto was the most frequently prescribed Kampo drug. The usage of frequently prescribed Kampo drugs differed between the youth and the elderly, males and females, and inpatients and outpatients. Kampo medicine has been employed in a wide variety of conditions, but the prescription rate was highest for disorders associated with pregnancy, childbirth, and the puerperium (4.08%). Although physicians have widely adopted Kampo medicine for a variety of diseases, the prescription rate of Kampo drugs remains very limited.
Vaccination is a preventive measure against influenza that does not require placing restrictions on social activities. However, since the stockpile of vaccine that can be prepared before the arrival of an emerging pandemic strain is generally quite limited, one has to select priority target groups to which the first stockpile is distributed. In this paper, we study a simulation-based priority target selection method with the goal of enhancing the collective immunity of the whole population. To model the region in which the disease spreads, we consider an urban area composed of suburbs and central areas connected by a single commuter train line. Human activity is modelled following an agent-based approach. The degree to which collective immunity is enhanced is judged by the attack rate in unvaccinated people. The simulation results show that if students and office workers are given exclusive priority in the first three months, the attack rate can be reduced from in the baseline case down to 1–2%. In contrast, random vaccination only slightly reduces the attack rate. It should be noted that giving preference to active social groups does not mean sacrificing those at high risk, which corresponds to the elderly in our simulation model. Compared with the random administration of vaccine to all social groups, this design successfully reduces the attack rate across all age groups.
Assigning a protein to its fold is a transitional step in discovering three-dimensional protein structure, which is a challenging task in biomolecular (biological) science. The present research focuses on: 1) the development of classifiers, and 2) the development of feature extraction techniques based on syntactic and/or physicochemical properties.
Apart from the above two main categories of research, we have shown that the selection of physicochemical attributes of the amino acids is an important step in protein fold recognition and has not been explored adequately. We have presented a multi-dimensional successive feature selection (MD-SFS) approach to systematically select attributes. The proposed method is applied on protein sequence data and an improvement of around 24% in fold recognition has been noted when selecting attributes appropriately.
The MD-SFS has been applied successfully in selecting physicochemical attributes of the amino acids. The selected attributes show improved protein fold recognition performance.
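As a rough illustration of systematic attribute selection, the sketch below runs a plain greedy forward selection. It is not the MD-SFS procedure itself, whose details are not given here. The attribute names, `toy_score` and its values are hypothetical stand-ins for a cross-validated fold-recognition accuracy.

```python
def forward_select(attributes, score, max_attributes=None):
    """Greedy forward selection: keep adding the attribute that most improves score(subset)."""
    selected = []
    best = score(selected)
    remaining = list(attributes)
    while remaining and (max_attributes is None or len(selected) < max_attributes):
        candidates = [(score(selected + [a]), a) for a in remaining]
        top_score, top_attr = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            break  # no remaining attribute improves the score
        selected.append(top_attr)
        remaining.remove(top_attr)
        best = top_score
    return selected, best


# Hypothetical per-attribute contributions, for illustration only.
TOY_VALUE = {"hydrophobicity": 0.30, "polarity": 0.20, "volume": 0.05, "charge": 0.0}


def toy_score(subset):
    """Stand-in for classifier accuracy; the quadratic term penalises larger attribute sets."""
    return sum(TOY_VALUE[a] for a in subset) - 0.04 * len(subset) ** 2


selected, best = forward_select(list(TOY_VALUE), toy_score)
```

In practice `score` would train and evaluate a fold classifier on the chosen attributes, so the selection is only as reliable as that evaluation.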
MicroRNAs (miRNAs) are key post-transcriptional regulators of gene expression and commonly deregulated in carcinogenesis. To explore functionally crucial tumor-suppressive (TS)-miRNAs in hepatocellular carcinoma (HCC), we performed integrative function- and expression-based screenings of TS-miRNAs in six HCC cell lines. The screenings identified seven miRNAs, which showed growth-suppressive activities through the overexpression of each miRNA and were endogenously downregulated in HCC cell lines. Further expression analyses using a large panel of HCC cell lines and primary tumors demonstrated four miRNAs, miR-101, -195, -378 and -497, as candidate TS-miRNAs frequently silenced in HCCs. Among them, two clustered miRNAs miR-195 and miR-497 showed significant growth-suppressive activity with induction of G1 arrest. Comprehensive exploration of their targets using Argonaute2-immunoprecipitation-deep-sequencing (Ago2-IP-seq) and genome-wide expression profiling after their overexpression, followed by pathway analysis, revealed a significant enrichment of cell cycle regulators. Among the candidates, we successfully identified CCNE1, CDC25A, CCND3, CDK4, and BTRC as direct targets for miR-497 and miR-195. Moreover, target genes frequently upregulated in HCC in a tumor-specific manner, such as CDK6, CCNE1, CDC25A and CDK4, showed an inverse correlation in the expression of miR-195 and miR-497, and their targets. These results suggest the molecular pathway regulating cell cycle progression to be integrally altered by downregulation of miR-195 and miR-497 expression, leading to the aberrant cell proliferation in hepatocarcinogenesis.
The quantification of social media impacts on societal and political events is a difficult undertaking. The Japanese Society of Oriental Medicine started a signature-collecting campaign to oppose a medical policy of the Government Revitalization Unit to exclude a traditional Japanese medicine, “Kampo,” from the public insurance system. The signature count showed a series of aberrant bursts from November 26 to 29, 2009. In the same interval, the number of messages on Twitter including the keywords “Signature” and “Kampo,” increased abruptly. Moreover, the number of messages on an Internet forum that discussed the policy and called for signatures showed a train of spikes.
Methods and Findings
In order to estimate the contributions of social media, we developed a statistical model within a state-space modeling framework that distinguishes the contributions of multiple social media in time-series of collected public opinions. We applied the model to the time-series of signature counts of the campaign and quantified the contributions of two social media, i.e., Twitter and an Internet forum. We found that a considerable portion (78%) of the signatures was affected by either of the social media throughout the campaign and that the Twitter effect (26%) was smaller than the Forum effect (52%) in total, although Twitter probably triggered the initial two bursts of signatures. Comparisons of the estimated profiles of both effects suggested distinctions between the social media in terms of the sustained impact of messages or tweets. Twitter shows messages on various topics on a time-line; newer messages push out older ones. Twitter may therefore diminish the impact of messages that are tweeted intermittently.
The quantification of social media impacts is beneficial to better understand people’s tendency and may promote developing strategies to engage public opinions effectively. Our proposed method is a promising tool to explore information hidden in social phenomena.
Recent advances in high-throughput sequencing technologies have enabled a comprehensive dissection of the cancer genome clarifying a large number of somatic mutations in a wide variety of cancer types. A number of methods have been proposed for mutation calling based on a large amount of sequencing data, which is accomplished in most cases by statistically evaluating the difference in the observed allele frequencies of possible single nucleotide variants between tumours and paired normal samples. However, an accurate detection of mutations remains a challenge under low sequencing depths or tumour contents. To overcome this problem, we propose a novel method, Empirical Bayesian mutation Calling (https://github.com/friend1ws/EBCall), for detecting somatic mutations. Unlike previous methods, the proposed method discriminates somatic mutations from sequencing errors based on an empirical Bayesian framework, where the model parameters are estimated using sequencing data from multiple non-paired normal samples. Using 13 whole-exome sequencing data with 87.5–206.3 mean sequencing depths, we demonstrate that our method not only outperforms several existing methods in the calling of mutations with moderate allele frequencies but also enables accurate calling of mutations with low allele frequencies (≤10%) harboured within a minor tumour subpopulation, thus allowing for the deciphering of fine substructures within a tumour specimen.
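The idea of learning a site-specific error model from unpaired normal samples can be sketched with a beta-binomial distribution. This is a generic illustration rather than the EBCall implementation: the method-of-moments fit, the function names and all numbers below are assumptions made for the example.

```python
import math


def fit_beta_moments(rates):
    """Method-of-moments Beta(a, b) fit to per-sample mismatch rates at one site."""
    mean = sum(rates) / len(rates)
    var = sum((r - mean) ** 2 for r in rates) / (len(rates) - 1)
    if var <= 0 or var >= mean * (1 - mean):
        raise ValueError("rates do not admit a method-of-moments Beta fit")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_pmf(k, n, a, b):
    """P(X = k) for a beta-binomial with n trials and Beta(a, b) mixing."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def error_tail_probability(k, n, a, b):
    """P(X >= k): chance of seeing k or more variant reads from sequencing error alone."""
    return sum(betabinom_pmf(i, n, a, b) for i in range(k, n + 1))


# Hypothetical mismatch rates observed at one position in five normal samples.
normal_rates = [0.001, 0.002, 0.0015, 0.003, 0.0005]
a, b = fit_beta_moments(normal_rates)
# Hypothetical tumour observation: 8 variant reads out of 100.
p_error = error_tail_probability(8, 100, a, b)
```

A small `p_error` means the variant reads are hard to explain by the error profile seen in normal samples, which is the kind of evidence that helps separate low-frequency mutations from sequencing errors.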
Apoptosis is a critical process in endothelial cell (EC) biology and pathology, which has been extensively studied at the protein level. Numerous gene expression studies of EC apoptosis have also been performed; however, few attempts have been made to use gene expression data to identify the molecular relationships and master regulators that underlie EC apoptosis. Therefore, we sought to understand these relationships by generating a Bayesian gene regulatory network (GRN) model.
ECs were induced to undergo apoptosis using serum withdrawal and followed over a time course in triplicate, using microarrays. When generating the GRN, this EC time course data was supplemented by a library of microarray data from EC treated with siRNAs targeting over 350 signalling molecules.
The GRN model proposed Vasohibin-1 (VASH1) as one of the candidate master-regulators of EC apoptosis with numerous downstream mRNAs. To evaluate the role played by VASH1 in EC, we used siRNA to reduce the expression of VASH1. Of 10 mRNAs downstream of VASH1 in the GRN that were examined, 7 were significantly up- or down-regulated in the direction predicted by the GRN. Further supporting an important biological role of VASH1 in EC, targeted reduction of VASH1 mRNA abundance conferred resistance to serum withdrawal-induced EC death.
We have utilised Bayesian GRN modelling to identify a novel candidate master regulator of EC apoptosis. This study demonstrates how GRN technology can complement traditional methods to hypothesise the regulatory relationships that underlie important biological processes.
Vasohibin; HUVEC; Bayesian; Gene regulatory network
TNF (Tumor Necrosis Factor-α) induces HUVEC (Human Umbilical Vein Endothelial Cells) to proliferate and form new blood vessels. This TNF-induced angiogenesis plays a key role in cancer and rheumatic disease. However, the molecular system that underlies TNF-induced angiogenesis is largely unknown.
We analyzed the gene expression changes stimulated by TNF in HUVEC over a time course using microarrays to reveal the molecular system underlying TNF-induced angiogenesis. Traditional k-means clustering analysis was performed to identify informative temporal gene expression patterns buried in the time course data. Functional enrichment analysis using DAVID was then performed for each cluster. The genes that belonged to informative clusters were then used as the input for gene network analysis using a Bayesian network and nonparametric regression method. Based on this TNF-induced gene network, we searched for sub-networks related to angiogenesis by integrating existing biological knowledge.
k-means clustering of the TNF stimulated time course microarray gene expression data, followed by functional enrichment analysis identified three biologically informative clusters related to apoptosis, cellular proliferation and angiogenesis. These three clusters included 648 genes in total, which were used to estimate dynamic Bayesian networks. Based on the estimated TNF-induced gene networks, we hypothesized that a sub-network including IL6 and IL8 inhibits apoptosis and promotes TNF-induced angiogenesis. More particularly, IL6 promotes TNF-induced angiogenesis by inducing NF-κB and IL8, which are strong cell growth factors.
Computational gene network analysis revealed a novel molecular system that may play an important role in the TNF-induced angiogenesis seen in cancer and rheumatic disease. This analysis suggests that Bayesian network analysis linked to functional annotation may be a powerful tool to provide insight into disease.
A common approach for time series gene expression data analysis includes the clustering of genes with similar expression patterns throughout time. Clustered gene expression profiles point to the joint contribution of groups of genes to a particular cellular process. However, since genes belong to intricate networks, other features, besides comparable expression patterns, should provide additional information for the identification of functionally similar genes.
In this study we perform gene clustering through the identification of Granger causality between and within sets of time series gene expression data. Granger causality is based on the idea that the cause of an event cannot come after its consequence.
This kind of analysis can be used as a complementary approach for functional clustering, wherein genes would be clustered not solely based on their expression similarity but on their topological proximity built according to the intensity of Granger causality among them.
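The pairwise building block of such an analysis can be sketched as a one-lag Granger test: does adding the past of series x reduce the error in predicting y beyond what the past of y already achieves? The code below is a minimal illustration with a single lag and simulated data. It is an assumption-laden sketch and does not reproduce the set-level method of the study.

```python
import random


def solve_least_squares(rows, targets):
    """Ordinary least squares via the normal equations and Gauss-Jordan elimination."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def residual_sum_of_squares(rows, targets):
    beta = solve_least_squares(rows, targets)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_statistic(x, y):
    """F statistic for 'x Granger-causes y' using one lag and an intercept."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    full = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    rss_restricted = residual_sum_of_squares(restricted, targets)
    rss_full = residual_sum_of_squares(full, targets)
    dof = len(targets) - 3  # observations minus parameters of the full model
    return (rss_restricted - rss_full) / (rss_full / dof)


# Simulated series in which x drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0.0, 0.1) for t in range(1, 200)]
f_x_to_y = granger_f_statistic(x, y)
f_y_to_x = granger_f_statistic(y, x)
```

A large statistic in one direction and a small one in the other gives a directed, weighted edge, and it is the intensity of such edges that a topology-based clustering could use.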
Structural variations (SVs) in genomes are commonly observed even in healthy individuals and play key roles in biological functions. To understand their functional impact or to infer molecular mechanisms of SVs, they have to be characterized with the maximum resolution. However, high-resolution analysis is a difficult task because it requires investigation of the complex structures involved in an enormous number of alignments of next-generation sequencing (NGS) reads and genome sequences that contain errors.
We propose a new method called ChopSticks that improves the resolution of SV detection for homozygous deletions even when the depth of coverage is low. Conventional methods based on read pairs use only discordant pairs to localize the positions of deletions, where a discordant pair is a read pair whose alignment has an aberrant strand or distance. In contrast, our method exploits concordant reads as well. We theoretically proved that when the depth of coverage approaches zero or infinity, the expected resolution of our method is asymptotically equal to that of methods based only on discordant pairs under double coverage. To confirm the effectiveness of ChopSticks, we conducted computational experiments against both simulated NGS reads and real NGS sequences. The resolution of deletion calls by other methods was significantly improved, thus demonstrating the usefulness of ChopSticks.
ChopSticks can generate high-resolution deletion calls of homozygous deletions using information independent of other methods, and it is therefore useful to examine the functional impact of SVs or to infer SV generation mechanisms.
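One way to see why concordant reads carry information: in a homozygous deletion no read can originate from the deleted segment, so any concordantly aligned read inside a candidate interval marks sequence that is still present. The sketch below is a deliberately simplified illustration of that idea, not the ChopSticks algorithm. It shrinks a candidate interval, such as one bounded by discordant pairs, to its widest stretch without concordant coverage. Coordinates are half-open and all values are hypothetical.

```python
def narrow_deletion_interval(lower, upper, concordant_reads):
    """Return the widest sub-interval of [lower, upper) not covered by any concordant read."""
    # Keep only reads overlapping the candidate interval, clipped to its bounds.
    clipped = sorted(
        (max(start, lower), min(end, upper))
        for start, end in concordant_reads
        if end > lower and start < upper
    )
    best = (lower, lower)
    cursor = lower  # right edge of the coverage seen so far
    for start, end in clipped:
        if start - cursor > best[1] - best[0]:
            best = (cursor, start)
        cursor = max(cursor, end)
    if upper - cursor > best[1] - best[0]:
        best = (cursor, upper)
    return best


# Hypothetical candidate interval and concordant read alignments.
reads = [(950, 1050), (1040, 1140), (1800, 1900), (1890, 1990)]
narrowed = narrow_deletion_interval(1000, 2000, reads)
```

Here the candidate interval of 1,000 bases shrinks to the 660-base gap between the two groups of reads. Taking the widest gap is a heuristic chosen for this example.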
XiP (eXtensible integrative Pipeline) is a flexible, editable and modular environment
with a user-friendly interface that does not require previous advanced programming skills
to run, construct and edit workflows. XiP allows the construction of workflows by linking
components written in both R and Java, the analysis of high-throughput data in grid engine
systems and also the development of customized pipelines that can be encapsulated in a
package and distributed. XiP already comes with several ready-to-use pipeline flows for
the most common genomic and transcriptomic analysis and ∼300 computational
Availability: XiP is open source, freely available under the Lesser General
Public License (LGPL) and can be downloaded from http://xip.hgc.jp.
To identify stage I lung adenocarcinoma patients with a poor prognosis who will benefit from adjuvant therapy.
Patients and Methods
Whole gene expression profiles were obtained at 19 time points over a 48-hour time course from human primary lung epithelial cells that were stimulated with epidermal growth factor (EGF) in the presence or absence of a clinically used EGF receptor tyrosine kinase (RTK)-specific inhibitor, gefitinib. The data were subjected to a mathematical simulation using the State Space Model (SSM). “Gefitinib-sensitive” genes, the expressional dynamics of which were altered by addition of gefitinib, were identified. A risk scoring model was constructed to classify high- or low-risk patients based on expression signatures of 139 gefitinib-sensitive genes in lung cancer using a training data set of 253 lung adenocarcinomas of North American cohort. The predictive ability of the risk scoring model was examined in independent cohorts of surgical specimens of lung cancer.
The risk scoring model enabled the identification of high-risk stage IA and IB cases in another North American cohort for overall survival (OS) with a hazard ratio (HR) of 7.16 (P = 0.029) and 3.26 (P = 0.0072), respectively. It also enabled the identification of high-risk stage I cases without bronchioalveolar carcinoma (BAC) histology in a Japanese cohort for OS and recurrence-free survival (RFS) with HRs of 8.79 (P = 0.001) and 3.72 (P = 0.0049), respectively.
The set of 139 gefitinib-sensitive genes includes many genes known to be involved in biological aspects of cancer phenotypes, but not known to be involved in EGF signaling. The present result strongly re-emphasizes that EGF signaling status in cancer cells underlies an aggressive phenotype of cancer cells, which is useful for the selection of early-stage lung adenocarcinoma patients with a poor prognosis.
The Gene Expression Omnibus (GEO) GSE31210
Epidemiological studies have suggested that the encounter with commensal microorganisms during the neonatal period is essential for normal development of the host immune system. Basic research involving gnotobiotic mice has demonstrated that colonization at the age of 5 weeks is too late to reconstitute normal immune function. In this study, we examined the transcriptome profiles of the large intestine (LI), small intestine (SI), liver (LIV), and spleen (SPL) of 3 bacterial colonization models—specific pathogen-free mice (SPF), ex-germ-free mice with bacterial reconstitution at the time of delivery (0WexGF), and ex-germ-free mice with bacterial reconstitution at 5 weeks of age (5WexGF)—and compared them with those of germ-free (GF) mice.
Hundreds of genes were affected in all tissues in each of the colonized models; however, a gene set enrichment analysis method, MetaGene Profiler (MGP), demonstrated that the specific changes of Gene Ontology (GO) categories occurred predominantly in 0WexGF LI, SPF SI, and 5WexGF SPL, respectively. MGP analysis on signal pathways revealed prominent changes in toll-like receptor (TLR)- and type 1 interferon (IFN)-signaling in LI of 0WexGF and SPF mice, but not 5WexGF mice, while 5WexGF mice showed specific changes in chemokine signaling. RT-PCR analysis of TLR-related genes showed that the expression of interferon regulatory factor 3 (Irf3), a crucial rate-limiting transcription factor in the induction of type 1 IFN, prominently decreased in 0WexGF and SPF mice but not in 5WexGF and GF mice.
The present study provides important new information regarding the molecular mechanisms of the so-called "hygiene hypothesis".
Hygiene hypothesis; Germ-free; Toll-like receptor; Type 1 interferon; MetaGene Profiler
Motivation: In cancer genomes, chromosomal regions harboring cancer genes are often subjected to genomic aberrations like copy number alteration and loss of heterozygosity. Given this, finding recurrent genomic aberrations is considered an apt approach for screening cancer genes. Although several permutation-based tests have been proposed for this purpose, none of them are designed to find recurrent aberrations from the genomic dataset without paired normal sample controls. Their application to unpaired genomic data may lead to false discoveries, because they retrieve pseudo-aberrations that exist in normal genomes as polymorphisms.
Results: We develop a new parametric method named parametric aberration recurrence test (PART) to test for the recurrence of genomic aberrations. The introduction of Poisson-binomial statistics allows us to compute small P-values more efficiently and precisely than the previously proposed permutation-based approach. Moreover, we extended PART to cover unpaired data (PART-up) so that there is a statistical basis for analyzing unpaired genomic data. PART-up uses information from unpaired normal sample controls to remove pseudo-aberrations in unpaired genomic data. Using PART-up, we successfully predict recurrent genomic aberrations in cancer cell line samples whose paired normal sample controls are unavailable. This article thus proposes a powerful statistical framework for the identification of driver aberrations, which would be applicable to ever-increasing amounts of cancer genomic data seen in the era of next generation sequencing.
Availability: Our implementations of PART and PART-up are available from http://www.hgc.jp/~niiyan/PART/manual.html.
Supplementary data are available at Bioinformatics online.
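The Poisson-binomial statistics mentioned in the PART abstract describe the number of successes among independent trials with unequal probabilities, for example the number of samples carrying an aberration at a locus when each sample has its own aberration rate. A minimal sketch of an exact upper-tail P-value by dynamic programming follows. The function names and the per-sample probabilities are assumptions for illustration, not PART's actual null model.

```python
def poisson_binomial_pmf(probs):
    """Exact distribution of the number of successes among independent Bernoulli(p_i) trials."""
    pmf = [1.0]  # with zero trials, P(0 successes) = 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        pmf = nxt
    return pmf


def recurrence_p_value(probs, observed):
    """P(at least `observed` samples are aberrant) under the null probabilities."""
    # Summing the tail directly avoids the cancellation of 1 - CDF for small P-values.
    return sum(poisson_binomial_pmf(probs)[observed:])


# Hypothetical per-sample aberration probabilities at one locus.
null_probs = [0.1, 0.2, 0.3]
p_all_three = recurrence_p_value(null_probs, 3)
```

Unlike a permutation test, whose smallest attainable P-value is limited by the number of permutations, this computation is exact and costs on the order of n² operations for n samples.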
Summary: Protein–protein interactions (PPIs) are mediated through specific regions on proteins. Some proteins have two or more protein interacting regions (IRs) and some IRs are competitively used for interactions with different proteins. IRView currently contains data for 3417 IRs in human and mouse proteins. The data were obtained from different sources and combined with annotated region data from InterPro. Information on non-synonymous single nucleotide polymorphism sites and variable regions owing to alternative mRNA splicing is also included. The IRView web interface displays all IR data, including user-uploaded data, on reference sequences so that the positional relationship between IRs can be easily understood. IRView should be useful for analyzing underlying relationships between the proteins behind the PPI networks.
Availability: IRView is publicly available on the web at http://ir.hgc.jp/.
Our understanding of the molecular pathways that underlie melanoma remains incomplete. Although several published microarray studies of clinical melanomas have provided valuable information, we found only limited concordance between these studies. Therefore, we took an in vitro functional genomics approach to understand melanoma molecular pathways.
Affymetrix microarray data were generated from A375 melanoma cells treated in vitro with siRNAs against 45 transcription factors and signaling molecules. Analysis of these data using unsupervised hierarchical clustering and Bayesian gene networks identified proliferation-associated RNA clusters, which were co-ordinately expressed across the A375 cells and also across melanomas from patients. The abundance in metastatic melanomas of these cellular proliferation clusters and their putative upstream regulators was significantly associated with patient prognosis. An 8-gene classifier derived from gene network hub genes correctly classified the prognosis of 23/26 metastatic melanoma patients in a cross-validation study. Unlike the RNA clusters associated with cellular proliferation described above, co-ordinately expressed RNA clusters associated with immune response were clearly identified across melanoma tumours from patients but not across the siRNA-treated A375 cells, in which immune responses are not active. Three uncharacterised genes, which the gene networks predicted to be upstream of apoptosis- or cellular proliferation-associated RNAs, were found to significantly alter apoptosis and cell number when over-expressed in vitro.
This analysis identified co-expression of RNAs that encode functionally-related proteins, in particular, proliferation-associated RNA clusters that are linked to melanoma patient prognosis. Our analysis suggests that A375 cells in vitro may be valid models in which to study the gene expression modules that underlie some melanoma biological processes (e.g., proliferation) but not others (e.g., immune response). The gene expression modules identified here, and the RNAs predicted by Bayesian network inference to be upstream of these modules, are potential prognostic biomarkers and drug targets.
In analyzing the effects of cell treatments such as drug dosing, identifying changes in gene network structure between normal and treated cells is a key task. One way to identify such changes is to compare the structures of networks estimated separately from data on normal and treated cells. However, this approach usually fails to estimate accurate gene networks because of the limited length of time series data and measurement noise. Approaches that identify changes in regulation by using time series data from both conditions efficiently are therefore needed.
We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks under two different conditions in order to identify changes in regulation between the conditions. In the mathematical model of our approach, hidden binary variables are newly introduced to indicate the presence of each regulation under each condition. The hidden binary variables enable efficient use of the data: data from both conditions are used for regulations common to both, while only the corresponding data are used for condition-specific regulations. In addition, the similarity of the networks under the two conditions is automatically taken into account through the design of the potential function for the hidden binary variables. To estimate the hidden binary variables, we derive a new variational annealing method that searches for the configuration of the binary variables that maximizes the marginal likelihood.
For the performance evaluation, we use time series data from two topologically similar synthetic networks and confirm that our proposed approach estimates both common regulations and changes in regulation with higher coverage and precision than existing approaches in almost all experimental settings. For a real data application, our proposed approach is applied to time series data from normal human lung cells and from human lung cells treated by stimulating EGF-receptors and dosing with an anticancer drug, Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF-receptors, but this effect would be expected to be counteracted by the selective inhibition of EGF-receptors by Gefitinib. However, the gene expression profiles actually differ between the conditions, and the genes related to the identified changes are considered possible off-targets of Gefitinib.
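As a rough illustration of how estimated regulations might be scored against a synthetic network, the sketch below computes precision and coverage over sets of directed edges. It assumes that coverage means the fraction of true regulations that are recovered (recall); the abstract does not define the term, and the function name and example edges are hypothetical.

```python
def edge_precision_coverage(predicted, true):
    """Return (precision, coverage) for predicted versus true directed edges.

    Coverage is assumed here to be the fraction of true edges recovered.
    """
    predicted, true = set(predicted), set(true)
    hits = len(predicted & true)  # correctly predicted regulations
    precision = hits / len(predicted) if predicted else 0.0
    coverage = hits / len(true) if true else 0.0
    return precision, coverage

# Hypothetical edges written as (regulator, target).
TRUE_EDGES = {(0, 1), (1, 2), (2, 0)}
PREDICTED_EDGES = {(0, 1), (1, 2), (0, 2), (2, 1)}

if __name__ == "__main__":
    precision, coverage = edge_precision_coverage(PREDICTED_EDGES, TRUE_EDGES)
    print(f"precision={precision:.2f} coverage={coverage:.2f}")
```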
On synthetically generated time series data, our proposed approach identifies changes in regulation more accurately than existing methods. Applying the proposed approach to time series data from normal and treated human lung cells yields candidate off-target genes of Gefitinib. According to published clinical information, one of these genes may be related to a factor in interstitial pneumonia, which is a known side effect of Gefitinib.
Although protein–RNA interactions (PRIs) are involved in various important cellular processes, compiled data on PRIs are still limited. This contrasts with protein–protein interactions, which have been intensively recorded in public databases and subjected to network-level analysis. Here, we introduce PRD, an online database that compiles PRIs dispersed across several sources, including the scientific literature. Currently, over 10,000 interactions have been stored in PRD using PSI-MI 2.5, which is a standard model for describing detailed molecular interactions, with an emphasis on gene-level data. Users can browse all recorded interactions and execute flexible keyword searches against the database via a web interface. Our database is not only a reference for PRIs, but will also be a valuable resource for studying the characteristics of PRI networks.
PRD can be freely accessed at http://pri.hgc.jp/
protein-RNA interaction; Biomolecular interaction; RNA binding protein; Database
Structural variations (SVs) change the structure of the genome and can therefore cause various diseases. Next-generation sequencing yields large amounts of sequence data, some of which can be used to infer the positions of SVs.
We developed a new method and implementation named ClipCrop for detecting SVs with single-base resolution using soft-clipping information. A soft-clipped sequence is an unmatched fragment in a partially mapped read. To compare the performance of ClipCrop with that of other SV-detecting tools, we generated simulation data with various patterns of SV length, read length, and depth of coverage of short reads, containing insertions, deletions, tandem duplications, inversions and single nucleotide alterations in a human chromosome. For comparison, we selected BreakDancer, CNVnator and Pindel, which adopt different approaches to detecting SVs: the discordant pair approach, the depth of coverage approach and the split read approach, respectively.
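To make the idea of soft-clipping information concrete, the following minimal sketch derives candidate breakpoint positions from SAM-style alignment records, using only the 1-based leftmost mapped position and the CIGAR string. It is not ClipCrop's algorithm: the function names, the `min_clip` threshold and the example reads are hypothetical, and classifying the SV type from the clipped sequence is not attempted.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference

def clip_breakpoints(pos, cigar, min_clip=1):
    """Return [(side, reference_position)] for the soft-clipped ends of one read.

    pos is the 1-based position of the first aligned base, as in SAM.
    """
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # hard clips may flank soft clips
    if not ops:
        return []
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    found = []
    if ops[0][1] == "S" and ops[0][0] >= min_clip:
        # Clipped bases precede the first aligned base.
        found.append(("left", pos))
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        # Clipped bases follow the last aligned base.
        found.append(("right", pos + ref_len - 1))
    return found

def breakpoint_support(reads, min_clip=5):
    """Count how many reads support each (side, position) candidate."""
    support = Counter()
    for pos, cigar in reads:
        support.update(clip_breakpoints(pos, cigar, min_clip))
    return support

# Hypothetical reads given as (POS, CIGAR).
READS = [(100, "30M20S"), (110, "20M30S"), (130, "25S25M"), (90, "50M")]

if __name__ == "__main__":
    for (side, position), count in sorted(breakpoint_support(READS).items()):
        print(side, position, count)
```

Because the clip boundary is read directly from the alignment, each candidate is located to a single base, and several reads clipped at the same position give it more support.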
Our method outperformed BreakDancer and CNVnator in both discovering rate and call accuracy for every type of SV. Pindel performed similarly to our method, but our method was clearly better at detecting small duplications. In our experiments, ClipCrop inferred reliable SVs for data sets with read lengths of more than 50 bases and 20x depth of coverage, both of which are reasonable values for current NGS data sets.
ClipCrop can detect SVs with higher discovering rate and call accuracy than any other tool in our simulation data set.