Management of Retinoblastoma (RB), a pediatric ocular cancer, is limited by drug resistance and drug-dosage-related side effects during chemotherapy. Molecular de-regulation in post-chemotherapy RB tumors was investigated.
Materials and Methods
cDNA microarray analysis of two post-chemotherapy and one pre-chemotherapy RB tumor tissues was performed, followed by Principal Component Analysis, Gene Ontology, Pathway Enrichment analysis and Biological Analysis Network (BAN) modeling. The drug modulation role of two significantly up-regulated genes (p ≤ 0.05), Ect2 (Epithelial-cell-transforming-sequence-2) and PRAME (preferentially-expressed-Antigen-in-Melanoma), was assessed by qRT-PCR, immunohistochemistry and cell viability assays.
Differential up-regulation of 1672 genes and down-regulation of 2538 genes was observed in RB tissues (relative to normal adult retina), while 1419 genes were commonly de-regulated between pre-chemotherapy and post-chemotherapy RB. Twenty-one key gene ontology categories, pathways, biomarkers and phenotype groups harboring 250 differentially expressed genes were dys-regulated (EZH2, NCoR1, MYBL2, RB1, STAMN1, SYK, JAK1/2, STAT1/2, PLK2/4, BIRC5, LAMN1, Ect2, PRAME and ABCC4). Differential molecular expression of PRAME and Ect2 in RB tumors with and without chemotherapy was analyzed. There was neither up-regulation of MRP1 nor any significant shift in chemotherapeutic IC50 in PRAME over-expressed versus non-transfected RB cells.
Cell cycle regulatory genes were dys-regulated post-chemotherapy. The Ect2 gene was expressed in response to chemotherapy-induced stress. PRAME does not contribute to drug resistance in RB, yet its nuclear localization and BAN information point to its possible regulatory role in RB.
RB; Ect2; PRAME; MYBL2; NCoR1; drug resistance; micro array; chemotherapy
Histone modifications occur in precise patterns, with several modifications known to affect the binding of proteins. These interactions affect chromatin structure, gene regulation, and cell cycle events. The dual modifications on the H3 tail, serine10 phosphorylation and lysine14 acetylation (H3Ser10PLys14Ac), are reported to be crucial for interaction with 14-3-3ζ. However, the mechanism by which H3Ser10P, along with neighboring site-specific acetylation(s), is targeted by its regulatory proteins, including kinase and phosphatase, is not fully understood. We carried out molecular modeling studies to understand the interaction of 14-3-3ζ and its regulatory proteins, mitogen-activated protein kinase phosphatase-1 (MKP1) and mitogen- and stress-activated protein kinase-1 (MSK1), with phosphorylated H3Ser10 alone or in combination with acetylated H3Lys9 and Lys14. In silico molecular association studies suggested that H3 with acetylated Lys14 and phosphorylated Ser10 shows the highest binding affinity towards 14-3-3ζ. In addition, acetylation of H3Lys9 along with Ser10PLys14Ac favors the interaction of the phosphatase, MKP1, for dephosphorylation of H3Ser10P. Further, the MAP kinase MSK1 phosphorylates the N-terminal tail containing unmodified H3Ser10 with maximum affinity compared to the N-terminal tail with H3Lys9AcLys14Ac. The data suggest that the opposing enzymatic activities of MSK1 and MKP1 correspond to non-acetylated and acetylated H3Lys9Lys14, respectively. Our in silico data highlight that site-specific phosphorylation (H3Ser10P) and acetylation (H3Lys9 and H3Lys14) of H3 are essential for the interaction with their regulatory proteins (MKP1, MSK1, and 14-3-3ζ) and play a major role in the regulation of chromatin structure.
modeling; histone H3 modifications; 14-3-3ζ; MSK1; MKP1
Transcriptome dynamics in the longissimus muscle (LM) of young Angus cattle were evaluated at 0, 60, 120, and 220 days from early weaning. Bioinformatic analysis was performed using the dynamic impact approach (DIA) by means of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Database for Annotation, Visualization and Integrated Discovery (DAVID) databases. Between 0 and 120 days (growing phase), most of the highly impacted pathways (e.g., ascorbate and aldarate metabolism, drug metabolism, cytochrome P450 and Retinol metabolism) were inhibited. The phase from 120 to 220 days (finishing phase) was characterized by the most striking differences, with 3,784 differentially expressed genes (DEGs). Analysis of those DEGs revealed that the most impacted KEGG canonical pathway was glycosylphosphatidylinositol (GPI)-anchor biosynthesis, which was inhibited. Furthermore, inhibition of calpastatin and activation of tyrosine aminotransferase ubiquitination at 220 days promote proteasomal degradation, while the concurrent activation of ribosomal proteins promotes protein synthesis. Therefore, the balance of these processes likely results in a steady state of protein turnover during the finishing phase. Results underscore the importance of transcriptome dynamics in LM during growth.
longissimus muscle; intramuscular fat; growth; nutrition
We recently constructed a computable cell proliferation network (CPN) model focused on lung tissue to unravel complex biological processes and their exposure-related perturbations from molecular profiling data. The CPN consists of edges and nodes representing upstream controllers of gene expression largely generated from transcriptomics datasets using Reverse Causal Reasoning (RCR). Here, we report an approach to biologically verify the correctness of upstream controller nodes using a specifically designed, independent lung cell proliferation dataset. Normal human bronchial epithelial cells were arrested at G1/S with a cell cycle inhibitor. Gene expression changes and cell proliferation were captured at different time points after release from inhibition. Gene set enrichment analysis demonstrated cell cycle response specificity via an overrepresentation of proliferation related gene sets. Coverage analysis of RCR-derived hypotheses returned statistical significance for cell cycle response specificity across the whole model as well as for the Growth Factor and Cell Cycle sub-network models.
cell proliferation; biological network model; reverse causal reasoning
Proteins may be related to each other very specifically as homologous subfamilies. Proteins can also be related to diverse proteins at the super family level. It has become highly important to characterize the existing sequence databases by their signatures to facilitate the function annotation of newly added sequences. The algorithm described here uses a scheme for the classification of odorant binding proteins on the basis of functional residues and Cys-pairing. The cysteine-based scoring scheme not only helps in unambiguously identifying families like odorant binding proteins (OBPs), but also aids in their classification at the subfamily level with reliable accuracy. The algorithm was also applied to yet another cysteine-rich family, where similar accuracy was observed that ensures the application of the protocol to other families.
cysteine-based scoring scheme; Classification of proteins; Functionally important residues; Ligand binding residues
We used the newly-developed Dynamic Impact Approach (DIA) and gene network analysis to study the sow mammary transcriptome at 80, 100, and 110 days of pregnancy. A swine oligoarray with 13,290 inserts was used for transcriptome profiling. An ANOVA with false discovery rate (FDR < 0.15) correction resulted in 1,409 genes with a significant time effect across time comparisons. The DIA uncovered that Fatty acid biosynthesis, Interleukin-4 receptor binding, Galactose metabolism, and mTOR signaling were among the most-impacted pathways. IL-4 receptor binding, ABC transporters, cytokine-cytokine receptor interaction, and Jak-STAT signaling were markedly activated at 110 days compared with 80 and 100 days. Epigenetic and transcription factor regulatory mechanisms appear important in coordinating the final stages of mammary development during pregnancy. Network analysis revealed a crucial role for TP53, ARNT2, E2F4, and PPARG. The bioinformatics analyses revealed a number of pathways and functions that perform an irreplaceable role during late gestation to farrowing.
systems biology; transcriptomics; mammary gland; sow; dynamic impact approach
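The false discovery rate step mentioned in the sow mammary abstract can be illustrated with a minimal sketch. The abstract does not name the FDR procedure, so the Benjamini-Hochberg method is assumed here, and the function name and the p-values in the usage line are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says "FDR correction"; BH is one
    common choice and is used here purely for illustration.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes; keep those with adjusted p < 0.15.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [i for i, q in enumerate(adj) if q < 0.15]
```

With the 0.15 cutoff quoted in the abstract, the first three hypothetical genes would be retained and the fourth discarded.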
Exposure to environmental stressors such as cigarette smoke (CS) elicits a variety of biological responses in humans, including the induction of inflammatory responses. These responses are especially pronounced in the lung, where pulmonary cells sit at the interface between the body’s internal and external environments. We combined a literature survey with a computational analysis of multiple transcriptomic data sets to construct a computable causal network model (the Inflammatory Process Network (IPN)) of the main pulmonary inflammatory processes. The IPN model predicted decreased epithelial cell barrier defenses and increased mucus hypersecretion in human bronchial epithelial cells, and an attenuated pro-inflammatory (M1) profile in alveolar macrophages following exposure to CS, consistent with prior results. The IPN provides a comprehensive framework of experimentally supported pathways related to CS-induced pulmonary inflammation. The IPN is freely available to the scientific community as a resource with broad applicability to study the pathogenesis of pulmonary disease.
inflammation; cigarette smoke; network model; gene expression; biological expression language (BEL); reverse causal reasoning (RCR)
Limno-terrestrial tardigrades are small invertebrates that are subjected to periodic drought of their micro-environment. They have evolved to cope with these unfavorable conditions by anhydrobiosis, an ametabolic state of low cellular water. During drying and rehydration, tardigrades go through drastic changes in cellular water content. Through transcriptome sequencing of the limno-terrestrial tardigrade Milnesium tardigradum, combined with cloning and targeted sequence assembly, we identified transcripts encoding eleven putative aquaporins. Analysis of these sequences suggested 2 classical aquaporins, 8 aquaglyceroporins and a single potentially intracellular unorthodox aquaporin. Using quantitative real-time PCR, we analyzed aquaporin transcript expression in the anhydrobiotic context. We have identified additional unorthodox aquaporins in various insect genomes and have identified a novel common conserved structural feature in these proteins. Analysis of the genomic organization of insect aquaporin genes revealed several conserved gene clusters.
unorthodox aquaporin; anhydrobiosis; tardigrade
To date, single genetic markers used to improve disease risk assessment explain only a small proportion of the genetic variance for many complex diseases. This missing heritability may be explained by additional variants with weak effects. To discover and incorporate these additional genetic factors, statistical and computational methods must be evaluated and developed. We develop a multi-locus genetic risk score (GRS)-based approach to analyze genes in the NADPH oxidase complex that may confer susceptibility to inflammatory bowel disease (IBD). We find that the complex is highly associated with IBD (P = 7.86 × 10⁻¹⁴) using the GRS-based association method. Similar results are also shown in permutation analysis (P = 6.65 × 10⁻¹¹). A likelihood ratio test shows that the single nucleotide polymorphisms (SNPs) in the complex without nominal signals contribute significantly to the overall genetic effect within the complex (P = 0.015). Our results show that the multi-locus GRS association model can improve the genetic risk assessment of IBD by taking into account both confirmed and as yet unconfirmed disease susceptibility variants.
genetic risk score; inflammatory bowel disease; permutation analysis; association analysis
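As a rough illustration of the genetic risk score idea in the abstract above, the sketch below computes a weighted sum of risk-allele counts per individual and a simple label-permutation p-value for the difference in mean score between cases and controls. The abstract does not give the weighting scheme or test statistic, so both are assumptions, as are all function names and example values.

```python
import random
from typing import List, Sequence


def genetic_risk_score(allele_counts: Sequence[int],
                       weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts (0, 1, or 2 per SNP)."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    if any(c not in (0, 1, 2) for c in allele_counts):
        raise ValueError("allele counts must be 0, 1, or 2")
    return sum(c * w for c, w in zip(allele_counts, weights))


def permutation_p_value(case_scores: Sequence[float],
                        control_scores: Sequence[float],
                        n_perm: int = 1000,
                        seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in mean GRS.

    Assumption: case/control labels are shuffled and the absolute
    difference in means is the test statistic.
    """
    def mean(values: List[float]) -> float:
        return sum(values) / len(values)

    rng = random.Random(seed)
    n_case = len(case_scores)
    pooled = list(case_scores) + list(control_scores)
    observed = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical individual with three SNPs and per-SNP weights.
score = genetic_risk_score([0, 1, 2], [0.5, 0.25, 0.125])
```

The add-one correction bounds the smallest reportable p-value by the number of permutations, so very small values would need correspondingly many permutations.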
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression by targeting mRNAs, especially in their 3′UTR regions. miRNAs have been identified both by biological experiment and by computational prediction. Computational prediction uses two major methods: comparative and noncomparative. The comparative method depends on the conservation of miRNA sequences and secondary structure. The noncomparative method, on the other hand, does not rely on conservation. We hypothesized that each miRNA class has its own unique set of features; therefore, grouping miRNAs by class before using them as training data will improve sensitivity and specificity. The average sensitivity was 88.62% for miR-Explore, which relies on within-class miRNA alignment, and 70.82% for miR-abela, which relies on global alignment. Compared with global alignment, grouping miRNAs by class yields better sensitivity with very high specificity for pre-miRNA prediction, even when a simple position-based alignment of secondary and primary structure is used.
miR-explore; chicken; miRNA class alignment; miRNA
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.
SNP prediction; inductive logic programming; human monogenic disease; genotype-phenotype relation
Bacterial small RNAs were once regarded as potent regulators of gene expression and are now being considered as essential for their diversified roles. Many small RNAs are now reported to have a wide array of regulatory functions, ranging from environmental sensing to pathogenesis. Traditionally, noncoding transcripts were rarely detected by means of genetic screens. However, the availability of approximately 2200 prokaryotic genome sequences in public databases facilitates efficient computational searches for those molecules, followed by experimental validation. In principle, four major computational methods have been applied to predict sRNA locations from bacterial genome sequences: (1) comparative genomics, (2) secondary structure and thermodynamic stability, (3) ‘Orphan’ transcriptional signals and (4) ab initio methods regardless of sequence or structure similarity; most of these tools were applied to locate putative genomic sRNA locations, followed by experimental validation of those transcripts. Computational screening has therefore simplified the sRNA identification process in bacteria. In this review, the small RNA prediction methods and tools reported in the past decade are discussed comprehensively and assessed based on their attributes, compatibility, and prediction accuracy.
comparative genomics; base composition; ncRNA; sRNA prediction; structure stability; transcriptional signal
Towards the development of a systems biology-based risk assessment approach for environmental toxicants, including tobacco products in a systems toxicology setting such as the “21st Century Toxicology”, we are building a series of computable biological network models specific to non-diseased pulmonary and cardiovascular cells/tissues which capture the molecular events that can be activated following exposure to environmental toxicants. Here we extend on previous work and report on the construction and evaluation of a mechanistic network model focused on DNA damage response and the four main cellular fates induced by stress: autophagy, apoptosis, necroptosis, and senescence. In total, the network consists of 34 sub-models containing 1052 unique nodes and 1538 unique edges which are supported by 1231 PubMed-referenced literature citations. Causal node-edge relationships are described using the Biological Expression Language (BEL), which allows for the semantic representation of life science relationships in a computable format. The Network is provided in .XGMML format and can be viewed using freely available network visualization software, such as Cytoscape.
computable; network model; DNA damage; autophagy; apoptosis; necroptosis; senescence; Biological Expression Language (BEL)
Three-dimensional models of the alpha- and beta-1 subunits of the calcium-activated potassium channel (BK) were predicted by threading modeling. A recursive approach comprising sequence alignment and model building based on three templates was used to build these models, with the refinement of non-conserved regions carried out using threading techniques. The complex formed by the subunits was studied by means of docking techniques, using 3D models of the two subunits and an approach based on rigid-body structures. Structural effects of the complex were analyzed with respect to hydrogen-bond interactions and binding-energy calculations. Potential interaction sites of the complex were determined by referencing a study of the difference accessible surface area (DASA) of the protein subunits in the complex.
potassium channel; docking; molecular interactions; binding energy
DNA of apparently recent bacterial origin is found in the genomic sequences of Caenorhabditis angaria and Caenorhabditis remanei. Here we present evidence that the DNA belongs to a single species of the genus Leucobacter (high-GC Gram+ Actinobacteria). Metagenomic tools enabled the assembly of the contaminating sequences in a draft genome of 3.2 Mb harboring 2,826 genes. This information provides insight into a microbial organism intimately associated with Caenorhabditis as well as a solid basis for the reassignment of 3,373 metazoan entries of the public database to a novel bacterial species (Leucobacter sp. AEAR). The application of metagenomic techniques can thus prevent annotation errors and reveal unexpected genetic information in data obtained by conventional genomics.
host-microbe interactions; organism identification in DNA sequences; contamination in sequence database; next-generation sequencing; purine degradation
In this study, we investigated the modalities of coding open reading frame (cORF) classification of expressed sequence tags (EST) by using the universal feature method (UFM). The UFM algorithm is based on the scoring of purine bias (Rrr) and stop codon frequencies. UFM classifies ORFs as coding or non-coding through a score based on 5 factors: (i) stop codon frequency; (ii) the product of the probabilities of purines occurring in the three positions of nucleotide triplets; (iii) the product of the probabilities of Cytosine (C), Guanine (G), and Adenine (A) occurring in the 1st, 2nd, and 3rd positions of triplets, respectively; (iv) the probabilities of a G occurring in the 1st and 2nd positions of triplets; and (v) the probabilities of a T occurring in the 1st and an A in the 2nd position of triplets. Because UFM is based on primary determinants of coding sequences that are conserved throughout the biosphere, it is suitable for cORF classification of any sequence in eukaryote transcriptomes without prior knowledge. Considering the protein sequences of the Protein Data Bank (RCSB PDB or more simply PDB) as a reference, we found that UFM classifies cORFs of ≥200 bp (if the coding strand is known) and cORFs of ≥300 bp (if the coding strand is unknown), and releases them in their coding strand and coding frame, which allows their automatic translation into protein sequences with a success rate equal to or higher than 95%. We first established the statistical parameters of UFM using ESTs from Plasmodium falciparum, Arabidopsis thaliana, Oryza sativa, Zea mays, Drosophila melanogaster, Homo sapiens and Chlamydomonas reinhardtii in reference to the protein sequences of PDB. Second, we showed that the success rate of cORF classification using UFM is expected to apply to approximately 95% of higher eukaryote genes that encode for proteins. 
Third, we used UFM in combination with CAP3 to assemble large EST samples into cORFs that we used to analyze transcriptome phenotypes in rice, maize, and humans. We discuss the error rate and the interference of noisy sequences such as pseudogenes, transposons, and retrotransposons. This method is suitable for rapid cORF extraction from transcriptome data and allows correct description of the genome phenotypes of plant genomes without prior knowledge. Additional care is necessary when addressing the human transcriptome due to the interference caused by large amounts of noisy sequences. UFM can be regarded as a low complexity tool for prior knowledge extraction concerning the coding fraction of the transcriptome of any eukaryote. Due to its low level of complexity, UFM is also very robust to variations of codon usage.
genomics; RNY; EST; ORF; CDS; UFM; classification
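Two of the five UFM factors listed in the abstract above, the stop codon frequency (i) and the product of purine probabilities at the three codon positions (ii), can be computed per reading frame as in the sketch below. This is only an illustration of those two quantities: how UFM combines the factors into a final score is not described in the abstract and is not attempted here, and the function names and example sequence are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
PURINES = {"A", "G"}


def codons(seq: str, frame: int) -> List[str]:
    """Split a DNA sequence into complete triplets starting at `frame`."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def stop_codon_frequency(seq: str, frame: int) -> float:
    """Fraction of triplets in the frame that are stop codons."""
    triplets = codons(seq, frame)
    if not triplets:
        return 0.0
    return sum(c in STOP_CODONS for c in triplets) / len(triplets)


def purine_probabilities(seq: str, frame: int) -> Tuple[float, float, float]:
    """Probability of a purine at codon positions 1, 2 and 3."""
    triplets = codons(seq, frame)
    if not triplets:
        return (0.0, 0.0, 0.0)
    n = len(triplets)
    p1, p2, p3 = (sum(c[pos] in PURINES for c in triplets) / n
                  for pos in range(3))
    return (p1, p2, p3)


def purine_product(seq: str, frame: int) -> float:
    """Product of the three positional purine probabilities."""
    p1, p2, p3 = purine_probabilities(seq, frame)
    return p1 * p2 * p3


# Hypothetical short sequence: ATG GAA GCA TAA in frame 0.
example = "ATGGAAGCATAA"
per_frame = [(f, stop_codon_frequency(example, f), purine_product(example, f))
             for f in range(3)]
```

Comparing these values across the three frames (and, for an unknown strand, across the reverse complement as well) is the kind of frame-by-frame comparison the abstract implies; the actual decision rule is left to the original method.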
RNA editing is extensive in some genetic systems, with up to thousands of targeted C-to-U and U-to-C substitutions in mitochondria and chloroplasts of certain plants. Efficient prediction of RNA editing in organelle genomes will help to reveal overlooked cases of editing. We present PREPACT 2.0 (http://www.prepact.de) with numerous enhancements of our previously developed Plant RNA Editing Prediction & Analysis Computer Tool. Reference organelle transcriptomes for editing prediction have been extended and reorganized to include 19 curated mitochondrial and 13 chloroplast genomes, now making it possible to distinguish RNA editing sites from “pre-edited” sites. Queries may be run against multiple references, and a new “commons” function identifies and highlights orthologous candidate editing sites congruently predicted by multiple references. Enhancements to the BLASTX mode in PREPACT 2.0 allow querying of complete novel organelle genomes within a few minutes, identifying protein genes and candidate RNA editing sites simultaneously without prior user analyses.
pyrimidine substitutions; RNA editing prediction; plants; protists; mitochondrial DNA; chloroplast DNA; BLASTX
Retinoblastoma (RB) is a malignant tumor of the retina seen in children, and non-invasive biomarkers are needed for rapid diagnosis and for prognosticating the therapy. This study was undertaken to identify the differentially expressed miRNAs in the serum of children with RB in comparison with normal age-matched serum, to analyze their concurrence with the existing RB tumor miRNA profile, to identify novel gene targets specific to RB, and to study the expression of a few of the identified oncogenic miRNAs in serum samples from patients with advanced-stage primary RB. MiRNA profiling was performed on 14 pooled serum samples from children with advanced RB and 14 normal age-matched serum samples, wherein 21 miRNAs were found to be upregulated (fold change ≥ +2.0, P ≤ 0.05) and 24 to be downregulated (fold change ≤ −2.0, P ≤ 0.05). Furthermore, intersection of 59 significantly deregulated miRNAs identified from RB tumor profiles with the miRNAs detected in the serum profile revealed that 33 miRNAs followed a similar deregulation pattern in RB serum. We then validated a few of the miRNAs (miRNA 17-92) identified by microarray in the RB patient serum samples (n = 20) by qRT-PCR. Expression of the oncogenic miRNAs miR-17, miR-18a, and miR-20a by qRT-PCR was significant in the serum samples, indicating the potential of serum miRNA identification for noninvasive diagnosis. Moreover, from miRNA gene target prediction, key regulatory genes of cell proliferation, apoptosis, and positive and negative regulatory networks involved in RB progression were identified in the gene expression profile of RB tumors. Therefore, these identified miRNAs and their corresponding target genes could give insights into potential biomarkers and key events involved in the RB pathway.
retinoblastoma; micro RNA; biomarkers; bioinformatics tools
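The fold-change and P-value filtering described in the serum miRNA abstract can be sketched as follows, assuming the usual signed fold-change convention in which positive values denote higher expression in RB serum and negative values denote lower expression. The function name and the example values are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def classify_mirna(fold_change: float, p_value: float,
                   fc_cutoff: float = 2.0, p_cutoff: float = 0.05) -> str:
    """Label a miRNA as 'up', 'down', or 'unchanged'.

    Assumption: fold change is signed, so +2.0 means two-fold higher
    and -2.0 means two-fold lower in patient serum than in controls.
    """
    if p_value > p_cutoff:
        return "unchanged"
    if fold_change >= fc_cutoff:
        return "up"
    if fold_change <= -fc_cutoff:
        return "down"
    return "unchanged"


# Hypothetical (fold change, P) pairs keyed by placeholder miRNA names.
profile: Dict[str, Tuple[float, float]] = {
    "miR-example-1": (3.1, 0.01),
    "miR-example-2": (-2.4, 0.03),
    "miR-example-3": (2.8, 0.20),
    "miR-example-4": (1.2, 0.01),
}
labels = {name: classify_mirna(fc, p) for name, (fc, p) in profile.items()}
```

Applying the P-value filter before the fold-change test keeps large but statistically unsupported changes out of both lists.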
In this paper we present a novel method for genome ranking according to gene lengths. The main outcomes described in this paper are the following: the formulation of the genome ranking problem, a presentation of relevant approaches to solve it, and a demonstration of preliminary results from the ordering of prokaryotic genomes. Using a subset of prokaryotic genomes, we attempted to uncover factors affecting gene length. We have demonstrated that hyperthermophilic species have shorter genes than mesophilic organisms, which probably means that environmental factors affect gene length. Moreover, these preliminary results show that environmental factors group evolutionarily distant species together in the ranking.
adaptation; evolution of prokaryotes; orthologs; machine learning; dimension-reduction techniques; factor analysis; clustering; rating; ranking
In the face of growing resistance in malaria parasites to drugs, pharmacological combination therapies are important. There is accumulating evidence that methylene blue (MB) is an effective drug against malaria. Here we explore the biological effects of both MB alone and in combination therapy using modeling and experimental data.
We built a model of the central metabolic pathways in P. falciparum. Metabolic flux modes and their changes under MB were calculated by integrating experimental data (RT-PCR data on mRNAs for redox enzymes) as constraints and results from the YANA software package for metabolic pathway calculations. Several different lines of MB attack on Plasmodium redox defense were identified by analysis of the network effects. Next, chloroquine resistance based on pfmdr and pfcrt transporters, as well as pyrimethamine/sulfadoxine resistance (by mutations in DHF/DHPS), were modeled in silico. Further modeling shows that MB has a favorable synergism on antimalarial network effects with these commonly used antimalarial drugs.
Theoretical and experimental results support that methylene blue should, because of its resistance-breaking potential, be further tested as a key component in drug combination therapy efforts in holoendemic areas.
methylene blue; resistance; drug; elementary mode analysis; malaria; combination therapy; pathway; metabolic flux
Bacillus species form a heterogeneous group of Gram-positive bacteria that includes members that are disease-causing, biotechnologically relevant, or useful as biological research tools. A common feature of Bacillus species is their ability to survive in harsh environmental conditions by formation of resistant endospores. Genes encoding the universal stress protein (USP) domain confer cellular and organismal survival during unfavorable conditions such as nutrient depletion. As of February 2012, the genome sequences and a variety of functional annotations for at least 123 Bacillus isolates, including 45 Bacillus cereus isolates, were available in public domain bioinformatics resources. Additionally, the genome sequencing status of 10 of the B. cereus isolates was annotated as finished, with each genome encoding 3 USP genes. The conservation of the gene neighborhood of the 140 aa universal stress protein in the B. cereus genomes led to the identification of a predicted plasmid-encoded transcriptional unit that includes a USP gene and a sulfate uptake gene in the soil-inhabiting Bacillus megaterium. Gene neighborhood analysis combined with visual analytics of chemical ligand binding site data provided biological insights into possible cellular functions of B. megaterium universal stress proteins. These functions include sulfate and potassium uptake, acid extrusion, cellular energy-level sensing, survival in high oxygen conditions and acetate utilization. Of particular interest was a two-gene transcriptional unit that consisted of genes for a universal stress protein and a sirtuin Sir2 (deacetylase enzyme for NAD+-dependent acetate utilization). The predicted transcriptional units for stress-responsive inorganic sulfate uptake and acetate utilization could explain biological mechanisms for survival of soil-inhabiting Bacillus species in sulfate- and acetate-limiting conditions. 
Considering the key role of sirtuins in mammalian physiology additional research on the USP-Sir2 transcriptional unit of B. megaterium could help explain mammalian acetate metabolism in glucose-limiting conditions such as caloric restriction. Finally, the deep-rooted position of B. megaterium in the phylogeny of Bacillus species makes the investigation of the functional coupling acetate utilization and stress response compelling.
ATP-binding; acetate utilization; Bacillus; Bacillus cereus; Bacillus megaterium; Sir2; sirtuins; sulfate uptake; universal stress proteins
A cold-adapted marine alkaline protease (MP, accession no. ACY25898) was produced by a marine bacterium strain isolated from Yellow Sea sediment in China. Previous studies showed that this protease has potential application as a detergent additive. It was therefore crucial to determine the tertiary structure of MP. In this study, a homology model of MP was constructed using the multiple-template alignment method. The tools PROCHECK, ERRAT, and Verify_3D were used to check the quality of the model. The results showed that 94% of residues were found in the most favored regions, 6% were in the additional allowed regions, and 96.50% of the residues had average 3D-1D scores of no less than 0.2. Meanwhile, the overall quality factor (ERRAT) of our model was 80.657. In this study, we also focused on elucidating the molecular mechanism of the two “flap” motions. Based on the optimized model, molecular dynamics simulations of the entire protein in explicit solvent were carried out using the AMBER11 package in order to characterize the dynamical behavior of the two flaps. Our results showed an opening motion of the two flaps in water. This research may facilitate virtual screening of inhibitors for MP and may also lay the foundation for understanding the mechanism of the inhibitors.
marine alkaline protease; homology modeling; molecular dynamic simulation; zinc-metalloprotease; explicit water
Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes showing similar expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of change in gene expression. However, the huge number of genes and the intricacy of biological networks have greatly increased the challenges of comprehending and interpreting the resulting groups of data, increasing processing time. The proposed technique uses a quad tree (QT)-based fast 2-dimensional hierarchical clustering algorithm to perform clustering. The construction of the closest-pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves analysis of gene expression data.
clustering; euclidean distance; quad tree; hierarchical clustering
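To make the closest-pair step in the clustering abstract concrete, the sketch below computes Euclidean distances between expression profiles and finds the closest pair by exhaustive comparison. This brute-force search is a baseline for illustration only, not the quad tree structure the abstract proposes; it is the quadratic cost of this step at every merge level that such a structure would aim to reduce. Gene names and values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_pair(
    profiles: Dict[str, Sequence[float]]
) -> Optional[Tuple[str, str, float]]:
    """Return (name1, name2, distance) for the two nearest profiles.

    Brute force over all pairs; returns None for fewer than two profiles.
    """
    best: Optional[Tuple[str, str, float]] = None
    for (n1, v1), (n2, v2) in combinations(profiles.items(), 2):
        d = euclidean(v1, v2)
        if best is None or d < best[2]:
            best = (n1, n2, d)
    return best


# Hypothetical 2-dimensional profiles for three genes.
genes = {"g1": (0.0, 0.0), "g2": (3.0, 4.0), "g3": (0.0, 1.0)}
pair = closest_pair(genes)
```

In agglomerative hierarchical clustering, the pair returned here would be merged into one cluster and the search repeated on the reduced set.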
Myosins are one of the largest protein superfamilies, with 24 classes. They have conserved structural features and catalytic domains yet show huge variation in other domains, resulting in a variety of functions. Myosins are molecules that drive various kinds of cellular processes and motility up to the level of organisms. They are ATPases that utilize the chemical energy released by ATP hydrolysis to bring about conformational changes leading to a motor function. Myosins are important as they are involved in almost all cellular activities, ranging from cell division to transcriptional regulation. They are crucial due to their involvement in many congenital diseases characterized by muscular malfunctions, cardiac diseases, deafness, neural and immunological dysfunction, and so on, many of which lead to death at an early age. We present Myosinome, a database of selected myosin classes (myosin II, V, and VI) from five model organisms. This knowledge base provides the sequences, phylogenetic clustering, and domain architectures of myosins, as well as molecular models, structural analyses, and relevant literature of their coiled-coil domains. The current version of Myosinome presents information about 71 myosin sequences belonging to three myosin classes (myosin II, V, and VI) in five model organisms (Homo sapiens, Mus musculus, D. melanogaster, C. elegans and S. cerevisiae) identified using bioinformatics surveys, several of which are yet to be functionally characterized. As these proteins are involved in congenital diseases, such a database would be useful in short-listing candidates for gene therapy and drug development. The database can be accessed from http://caps.ncbs.res.in/myosinome.
myosin; Myosinome; myosin II; myosin V; myosin VI; myosin database
Gene and protein networks offer a powerful approach for integration of the disparate yet complementary types of data that result from high-throughput analyses. Although many tools and databases are currently available for accessing such data, they are left unutilized by bench scientists as they generally lack features for effective analysis and integration of both public and private datasets and do not offer an intuitive interface for use by scientists with limited computational expertise. We describe BioNetwork Bench, an open source, user-friendly suite of database and software tools for constructing, querying, and analyzing gene and protein network models. It enables biologists to analyze public as well as private gene expression data; interactively query gene expression datasets; integrate data from multiple networks; and store and selectively share the data and results. Finally, we describe an application of BioNetwork Bench to the assembly and iterative expansion of a gene network that controls the differentiation of retinal progenitor cells into rod photoreceptors. The tool is available from http://bionetworkbench.sourceforge.net/
The emergence of high-throughput technologies has allowed many biological investigators to collect a great deal of information about the behavior of genes and gene products over time or during a particular disease state. Gene and protein networks offer a powerful approach for integration of the disparate yet complementary types of data that result from such high-throughput analyses. There are a growing number of public databases, as well as tools for visualization and analysis of networks. However, such databases and tools have yet to be widely utilized by bench scientists, as they generally lack features for effective analysis and integration of both public and private datasets and do not offer an intuitive interface for use by biological scientists with limited computational expertise.
We describe BioNetwork Bench, an open source, user-friendly suite of database and software tools for constructing, querying, and analyzing gene and protein network models. BioNetwork Bench currently supports a broad class of gene and protein network models (e.g., weighted and un-weighted, undirected graphs, multi-graphs). It enables biologists to analyze public as well as private gene expression, macromolecular interaction and annotation data; interactively query gene expression datasets; integrate data from multiple networks; query multiple networks for interactions of interest; and store and selectively share the data as well as results of analyses. BioNetwork Bench is implemented as a plug-in for, and hence is fully interoperable with, Cytoscape, a popular open-source software suite for visualizing macromolecular interaction networks. Finally, we describe an application of BioNetwork Bench to the problem of assembly and iterative expansion of a gene network that controls the differentiation of retinal progenitor cells into rod photoreceptors.
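To make the idea of integrating and querying multiple networks concrete, the following is a minimal sketch in plain Python. It is not the BioNetwork Bench or Cytoscape API; the function names, the edge representation, and the placeholder gene names and weights are all assumptions made for illustration. Each network is treated as a weighted undirected graph, and the merged result behaves like a multi-graph in which one gene pair can carry evidence from several sources.

```python
from collections import defaultdict


def edge_key(a, b):
    """Order-independent key, so (a, b) and (b, a) are the same undirected edge."""
    return tuple(sorted((a, b)))


def integrate(*networks):
    """Merge named networks into one multi-graph.

    Each network is a (name, edges) pair, where edges is a list of
    (gene_a, gene_b, weight) tuples. The result maps each undirected
    gene pair to the list of (network_name, weight) records supporting it.
    """
    merged = defaultdict(list)
    for name, edges in networks:
        for a, b, weight in edges:
            merged[edge_key(a, b)].append((name, weight))
    return merged


def interactions_of(merged, gene):
    """Return every merged edge that involves the given gene."""
    return {pair: evidence for pair, evidence in merged.items() if gene in pair}


# Placeholder networks with made-up gene names and weights (not real data).
coexpression = ("coexpression", [("geneA", "geneB", 0.9), ("geneB", "geneC", 0.4)])
interaction = ("interaction", [("geneB", "geneA", 1.0), ("geneC", "geneD", 1.0)])

merged = integrate(coexpression, interaction)
hits = interactions_of(merged, "geneB")
print(hits)
```

Keeping the source network name on every record means a query can still distinguish, for example, a co-expression link from a physical interaction after the networks have been combined.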
BioNetwork Bench provides a suite of open source software for construction, querying, and selective sharing of gene and protein networks. Although initially aimed at a community of biologists interested in retinal development, the tool can be adapted easily to work with other biological systems simply by populating the associated database with the relevant datasets.
network analysis; software; network construction; network integration