In the last few decades, metabolic networks have proven to be powerful tools for analyzing cellular metabolism. Many research fields (eg, metabolic engineering, diagnostic medicine, pharmacology, biochemistry, biology and physiology) have improved their understanding of the cell by combining experimental assays with metabolic network-based computations. This process led to the rise of the “systems biology” approach, where theory meets experiment and where two complementary perspectives cooperate in the study of biological phenomena. Here, the reconstruction of metabolic networks is presented, along with established and new algorithms to improve the description of cellular metabolism. Then, the advantages and limitations of modeling algorithms and network reconstruction are discussed.
metabolic network; metabolic adjustments; enzymatic perturbations; metabolic impairments; genome-scale models; pathway simulation; -omics dataset integration
Sea anemone neurotoxins are peptides that interact with Na+ and K+ channels, resulting in specific alterations of their functions. Some of these neurotoxins (1ROO, 1BGK, 2K9E, 1BEI) are important for the treatment of about 80 autoimmune disorders because of their specificity for the Kv1.3 channel. The aim of this study was to identify the common residues among these neurotoxins by computational methods, and to establish whether there is a pattern useful for the future development of a treatment for autoimmune diseases. Our results showed eight new key common residues among the studied neurotoxins that interact with a histidine ring and the selectivity filter of the receptor, thus showing a possible pattern of interaction. This knowledge may serve as an input for the design of more promising drugs for autoimmune treatments.
neurotoxins; potassium channel; Kv1.3; computational methods; autoimmune diseases
Pyrococcus furiosus is a hyperthermophilic archaeon. A hypothetical protein of this archaeon, PF0847, was selected for computational analysis. The Basic Local Alignment Search Tool (BLAST) and a multiple sequence alignment (MSA) tool were employed to search for related proteins. Both secondary and tertiary structure predictions were obtained for further analysis. The three-dimensional model was assessed with the PROCHECK and QMEAN6 programs. To gain insight into the physical and functional associations of the protein, STRING network analysis was performed. Docking of the SAM (S-adenosyl-l-methionine) ligand, taken from an antibiotic-related methyltransferase (PDB code: 3P2K: D), to our protein showed high docking energy and suggested that the protein functions as a methyltransferase. Finally, we looked for a specific function of the proposed methyltransferase; binding of geneticin bound to the eubacterial 16S rRNA A-site (PDB code: 1MWL) in the active site of PF0847 indicated that the protein may be responsible for aminoglycoside antibiotic resistance.
methyltransferase; aminoglycoside antibiotic resistance; 16S rRNA A-site; molecular docking
Transcriptome alterations in liver and adipose tissue of cows with subclinical endometritis (SCE) at 29 d postpartum were evaluated. Bioinformatics analysis was performed using the Dynamic Impact Approach by means of KEGG and DAVID databases. Milk production, blood metabolites (non-esterified fatty acids, magnesium), and disease biomarkers (albumin, aspartate aminotransferase) did not differ greatly between healthy and SCE cows. In liver tissue of cows with SCE, alterations in gene expression revealed an activation of complement and coagulation cascade, steroid hormone biosynthesis, apoptosis, inflammation, oxidative stress, MAPK signaling, and the formation of fibrinogen complex. Bioinformatics analysis also revealed an inhibition of vitamin B3 and B6 metabolism with SCE. In adipose, the most activated pathways by SCE were nicotinate and nicotinamide metabolism, long-chain fatty acid transport, oxidative phosphorylation, inflammation, T cell and B cell receptor signaling, and mTOR signaling. Results indicate that SCE in dairy cattle during early lactation induces molecular alterations in liver and adipose tissue indicative of immune activation and cellular stress.
uterine infection; liver; adipose; cow genomics
Plant hormones including salicylic acid (SA), jasmonic acid (JA), ethylene (Et), auxin, gibberellins, and abscisic acid (ABA) are known to regulate host immune responses. The plant hormone cytokinin also has the potential to modulate defense signaling, including SA and JA signaling. It promotes plant resistance to pathogens and herbivores, but the underlying mechanisms are still unknown. Using systems biology approaches, we unravel hub points of immune interaction mediated by cytokinin signaling in Arabidopsis. High-confidence Arabidopsis protein–protein interactions (PPI) are coupled to changes in cytokinin-mediated gene expression. Nodes of the cellular interactome that are enriched in immune functions also reconstitute sub-networks. Topological analyses and their specific immunological relevance lead to the identification of functional hubs in the cellular interactome. We discuss our identified immune hubs in light of an emerging model of cytokinin-mediated immune defense against pathogen infection in plants.
systems biology; plant hormones; interaction networks; gene expression; cytokinin
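The hub identification described in the abstract above can be illustrated with a minimal sketch. The abstract does not state which topological measure was used, so node degree is an assumption here, and the function name `degree_hubs` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def degree_hubs(edges, top_n=3):
    """Rank nodes of an undirected interaction network by degree.

    edges: iterable of (node_a, node_b) pairs; self-loops are ignored.
    Returns the top_n (node, degree) pairs, ties broken alphabetically.
    Degree is only one possible hub criterion (an assumption of this sketch).
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # a protein interacting with itself adds no neighbor
        neighbors[a].add(b)
        neighbors[b].add(a)
    ranked = sorted(neighbors, key=lambda n: (-len(neighbors[n]), n))
    return [(n, len(neighbors[n])) for n in ranked[:top_n]]


# Toy network with placeholder node names, not real Arabidopsis proteins.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
print(degree_hubs(toy_edges, top_n=2))
```

In practice the edge list would come from a curated PPI resource and the ranking would be restricted to nodes whose expression changes under cytokinin treatment.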
In this study, we explored a time course of peripheral whole blood transcriptomes from kidney transplantation patients who either experienced an acute rejection episode or did not, in order to better delineate the immunological and biological processes measurable in blood leukocytes that are associated with acute renal allograft rejection. Using microarrays, we generated gene expression data from 24 acute rejectors and 24 nonrejectors. We filtered the data to obtain the most unambiguous and robustly expressing probe sets and selected a subset of patients with the clearest phenotype. We then performed a data-driven exploratory analysis using data reduction and differential gene expression analysis tools in order to reveal gene expression signatures associated with acute allograft rejection. Using a template-matching algorithm, we then expanded our analysis to include time course data, identifying genes whose expression is modulated leading up to acute rejection. We have identified molecular phenotypes associated with acute renal allograft rejection, including a significantly upregulated signature of neutrophil activation and accumulation following transplant surgery that is common to both acute rejectors and nonrejectors. Our analysis shows that this expression signature appears to stabilize over time in nonrejectors but persists in patients who go on to reject the transplanted organ. In addition, we describe an expression signature characteristic of lymphocyte activity and proliferation. This lymphocyte signature is significantly downregulated in both acute rejectors and nonrejectors following surgery; however, patients who go on to reject the organ show a persistent downregulation of this signature relative to the neutrophil signature.
blood transcriptomics; microarray; kidney transplant rejection; peripheral whole blood; neutrophil to lymphocyte ratio
A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted the viability of the introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. The predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism.
dengue virus (DENV); quasispecies; genomic sequence variability; mutant viability; protein structure
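The prediction accuracies quoted in the GeneSV abstract above follow directly from the reported counts. The short sketch below reproduces that arithmetic; the helper name `accuracy` is hypothetical and is not part of GeneSV.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of predictions that matched the experimental outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return correct / total


# Counts taken from the abstract: 26 of 32 single-substitution mutants
# and 4 of 5 double-substitution mutants were predicted correctly.
single = accuracy(26, 32)
double = accuracy(4, 5)
print(f"single substitutions: {single:.4f}")
print(f"double substitutions: {double:.4f}")
```

With only 5 double mutants, the second proportion rests on a very small sample and should be read accordingly.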
CD36 is an integral membrane protein which is thought to have a hairpin-like structure with alpha-helices at the C and N termini projecting through the membrane as well as a larger extracellular loop. This receptor interacts with a number of ligands including oxidized low density lipoprotein and long chain fatty acids (LCFAs). It is also implicated in lipid metabolism and heart diseases. It is therefore important to determine the 3D structure of the CD36 site involved in lipid binding. In this study, we predict the 3D structure of the fatty acid (FA) binding site [127–279 aa] of the CD36 receptor based on homology modeling with the X-ray structure of Human Muscle Fatty Acid Binding Protein (PDB code: 1HMT). Qualitative and quantitative analysis of the resulting model suggests that the model is reliable and stable, taking into consideration that over 97.8% of the residues lie in the most favored regions as well as the significant overall quality factor. Protein analysis, which relied on the secondary structure prediction of the target sequence and the comparison of 1HMT and CD36 [127–279 aa] secondary structures, led to the determination of the amino acid sequence consensus. These results also led to the identification of the functional sites on CD36 and revealed the presence of residues which may play a major role during ligand-protein interactions.
CD36; fatty acids binding site; homology modeling; 3D model
The neuron-restrictive silencer factor (NRSF) is a zinc finger transcription factor that represses neuronal gene transcription in non-neuronal cells by binding to the consensus repressor element-1 (RE1) located in regulatory regions of target genes. NRSF silences the expression of a wide range of target genes involved in neuron-specific functions. Previous studies showed that aberrant regulation of NRSF plays a key role in the pathological process of human neurodegenerative diseases. However, a comprehensive set of NRSF target genes relevant to human neuronal functions has not yet been characterized. We performed genome-wide data mining from chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) datasets of NRSF binding sites in human embryonic stem cells (ESC) and the corresponding ESC-derived neurons, retrieved from the database of the ENCODE/HAIB project. Using bioinformatics tools such as Avadis NGS and MACS, we identified 2,172 NRSF target genes in ESC and 308 genes in ESC-derived neurons based on stringent criteria. Only 40 NRSF target genes overlapped between both data sets. According to motif analysis, binding regions showed an enrichment of the consensus RE1 sites in ESC, whereas they were mainly located in poorly defined non-RE1 sites in ESC-derived neurons. Molecular pathways of NRSF target genes were linked with various neuronal functions in ESC, such as neuroactive ligand-receptor interaction, CREB signaling, and axonal guidance signaling, while they were not directed to neuron-specific functions in ESC-derived neurons. Remarkable differences in ChIP-Seq-based NRSF target genes and pathways between ESC and ESC-derived neurons suggested that NRSF-mediated silencing of target genes is highly effective in human ESC but not in ESC-derived neurons.
ChIP-seq; data mining; ESC; GenomeJack; Huntington’s disease; human neurons; NRSF; REST
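The overlap reported in the NRSF abstract above (2,172 target genes in ESC, 308 in ESC-derived neurons, 40 shared) can be put in proportion with a small set-based sketch. The function name `overlap_stats` is hypothetical, and integer IDs stand in for gene symbols because the actual gene lists are not given in the abstract.

```python
def overlap_stats(genes_a, genes_b):
    """Summarize the overlap between two gene sets.

    Returns the shared count, the Jaccard index (shared / union) and the
    fraction of each set that is shared with the other.
    """
    a, b = set(genes_a), set(genes_b)
    shared = len(a & b)
    union = len(a | b)
    return {
        "shared": shared,
        "jaccard": shared / union if union else 0.0,
        "fraction_of_a": shared / len(a) if a else 0.0,
        "fraction_of_b": shared / len(b) if b else 0.0,
    }


# Placeholder IDs sized to match the counts in the abstract:
# 2,172 ESC targets, 308 neuron targets, 40 in common.
esc_targets = set(range(2172))
neuron_targets = set(range(2132, 2132 + 308))
stats = overlap_stats(esc_targets, neuron_targets)
print(stats["shared"], round(stats["jaccard"], 4))
```

Under these counts, fewer than 2% of ESC targets and roughly 13% of neuron targets are shared, which is the scale of difference the abstract describes as remarkable.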
Saint Louis encephalitis virus, a member of the Flaviviridae subgroup, is a Culex mosquito-borne pathogen. Despite severe epidemic outbreaks on several occasions, not much progress has been made with regard to an epitope-based vaccine designed for Saint Louis encephalitis virus. The envelope proteins were collected from a protein database and analyzed with an in silico tool to identify the most immunogenic protein. The protein was then verified through several parameters to predict the T-cell and B-cell epitopes. Both T-cell and B-cell immunity were assessed to determine whether the protein can induce humoral as well as cell-mediated immunity. The peptide sequence from amino acids 330–336 and the sequence REYCYEATL from position 57 were found to be the most promising B-cell and T-cell epitopes, respectively. Furthermore, because this is an RNA virus, it was important to establish that the epitope is conserved; this was also done with in silico tools, showing 63.51% conservancy. The epitope was further tested for binding against the HLA molecule by computational docking techniques to verify the interaction between the binding cleft and the epitope. However, this is a preliminary study on designing an epitope-based peptide vaccine against Saint Louis encephalitis virus; the results await validation by in vitro and in vivo experiments.
epitope; computational tools; humoral; cell-mediated immunity; conservancy
Biological networks with a structured syntax are a powerful way of representing biological information generated from high density data; however, they can become unwieldy to manage as their size and complexity increase. This article presents a crowd-verification approach for the visualization and expansion of biological networks.
Web-based graphical interfaces allow visualization of causal and correlative biological relationships represented using Biological Expression Language (BEL). Crowdsourcing principles enable participants to communally annotate these relationships based on evidence from the literature. Gamification principles are incorporated to further engage domain experts throughout biology to gather robust peer-reviewed information from which relationships can be identified and verified.
The resulting network models will represent the current status of biological knowledge within the defined boundaries, here processes related to human lung disease. These models are amenable to computational analysis. For some period following conclusion of the challenge, the published models will remain available for continuous use and expansion by the scientific community.
community curation; biological network models; reputation system; Biological Expression Language
A good understanding of the population dynamics of algal communities is crucial in several ecological and pollution studies of freshwater and oceanic systems. This paper reviews the automatic identification of algal communities from microscope images using image processing techniques. The diverse techniques of image preprocessing, segmentation, feature extraction and recognition are considered one by one and their parameters are summarized. Automatic identification and classification of algal communities is very difficult due to various factors such as changes in size and shape with climatic changes, various growth periods, and the presence of other microbes. Therefore, the significance, uniqueness, and various approaches are discussed and the analyses in image processing methods are evaluated. Algal identification and associated problems in water organisms have been projected as challenges in image processing applications. Various image processing approaches based on textures, shapes, and object boundaries, as well as some segmentation methods such as edge detection and color segmentation, are highlighted. Finally, artificial neural networks and some machine learning algorithms were used to classify and identify the algae. Further, some of the benefits and drawbacks of the schemes are examined.
Algae identification; segmentation; neural network; feature extraction; identification
The purpose of this study was to investigate the balance between transfer ribonucleic acid (tRNA) supply and demand in retrovirus-infected cells, seeking the best targets for antiretroviral therapy based on the hypothetical tRNA Inhibition Therapy (TRIT). Codon usage and tRNA gene data were retrieved from public databases. Based on logistic principles, a therapeutic score (T-score) was calculated for all sense codons in each retrovirus-host system. Codons that are critical for viral protein translation, but not as critical for the host, have the highest T-score values. Theoretically, inactivating the cognate tRNA species should imply a severe reduction of the elongation rate during viral mRNA translation. We developed a method to predict tRNA species critical for retroviral protein synthesis. Four of the best TRIT targets in HIV-1 and HIV-2 encode Large Hydrophobic Residues (LHR), which have a central role in protein folding. One of them, codon CUA, is also a TRIT target in both HTLV-1 and HTLV-2. Therefore, a drug designed to inactivate or reduce the cytoplasmic concentration of tRNA species with anticodon TAG could significantly attenuate both HIV and HTLV protein synthesis rates. Conversely, replacing codons ending in UA with synonymous codons should increase expression, which is relevant for DNA vaccine design.
codon usage; tRNA; HIV; HTLV; therapy
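The link between codon CUA and anticodon TAG in the abstract above is the Watson-Crick reverse complement, written in the DNA alphabet. The sketch below shows only that pairing rule; it ignores wobble pairing and modified bases, does not reproduce the T-score (whose formula is not given in the abstract), and the function name `anticodon` is hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str, dna_alphabet: bool = False) -> str:
    """Return the Watson-Crick anticodon (5'->3') for an mRNA codon.

    The anticodon is the reverse complement of the codon. With
    dna_alphabet=True the result is written with T instead of U,
    as is common when naming tRNA genes.
    """
    rna = codon.upper().replace("T", "U")
    if len(rna) != 3 or any(base not in COMPLEMENT for base in rna):
        raise ValueError(f"not a valid codon: {codon!r}")
    anti = "".join(COMPLEMENT[base] for base in reversed(rna))
    return anti.replace("U", "T") if dna_alphabet else anti


print(anticodon("CUA"))                     # RNA alphabet
print(anticodon("CUA", dna_alphabet=True))  # DNA alphabet, as in the abstract
```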
Management of Retinoblastoma (RB), a pediatric ocular cancer, is limited by drug resistance and drug-dosage-related side effects during chemotherapy. Molecular de-regulation in post-chemotherapy RB tumors was investigated.
Materials and Methods
cDNA microarray analysis of two post-chemotherapy and one pre-chemotherapy RB tumor tissues was performed, followed by Principal Component Analysis, Gene Ontology, Pathway Enrichment analysis and Biological Analysis Network (BAN) modeling. The drug modulation role of two significantly up-regulated genes (p ≤ 0.05), Ect2 (Epithelial-cell-transforming-sequence-2) and PRAME (preferentially-expressed-Antigen-in-Melanoma), was assessed by qRT-PCR, immunohistochemistry and cell viability assays.
Differential up-regulation of 1672 genes and down-regulation of 2538 genes were observed in RB tissues (relative to normal adult retina), while 1419 genes were commonly de-regulated between pre-chemotherapy and post-chemotherapy RB. Twenty-one key gene ontology categories, pathways, biomarkers and phenotype groups harboring 250 differentially expressed genes were dys-regulated (EZH2, NCoR1, MYBL2, RB1, STAMN1, SYK, JAK1/2, STAT1/2, PLK2/4, BIRC5, LAMN1, Ect2, PRAME and ABCC4). Differential molecular expression of PRAME and Ect2 in RB tumors with and without chemotherapy was analyzed. There was neither up-regulation of MRP1 nor any significant shift in chemotherapeutic IC50 in PRAME over-expressed versus non-transfected RB cells.
Cell cycle regulatory genes were dys-regulated post-chemotherapy. The Ect2 gene was expressed in response to chemotherapy-induced stress. PRAME does not contribute to drug resistance in RB, yet its nuclear localization and BAN information point to its possible regulatory role in RB.
RB; Ect2; PRAME; MYBL2; NCoR1; drug resistance; micro array; chemotherapy
Histone modifications occur in precise patterns, with several modifications known to affect the binding of proteins. These interactions affect the chromatin structure, gene regulation, and cell cycle events. The dual modifications on the H3 tail, serine10 phosphorylation and lysine14 acetylation (H3Ser10PLys14Ac), are reported to be crucial for interaction with 14-3-3ζ. However, the mechanism by which H3Ser10P along with neighboring site-specific acetylation(s) is targeted by its regulatory proteins, including kinase and phosphatase, is not fully understood. We carried out molecular modeling studies to understand the interaction of 14-3-3ζ and its regulatory proteins, mitogen-activated protein kinase phosphatase-1 (MKP1) and mitogen- and stress-activated protein kinase-1 (MSK1), with phosphorylated H3Ser10 alone or in combination with acetylated H3Lys9 and Lys14. In silico molecular association studies suggested that acetylated Lys14 and phosphorylated Ser10 of H3 show the highest binding affinity towards 14-3-3ζ. In addition, acetylation of H3Lys9 along with Ser10PLys14Ac favors the interaction of the phosphatase, MKP1, for dephosphorylation of H3Ser10P. Further, the MAP kinase MSK1 phosphorylates the N-terminal tail containing unmodified H3Ser10 with maximum affinity compared to the N-terminal tail with H3Lys9AcLys14Ac. The data suggest that the opposing enzymatic activities of MSK1 and MKP1 correspond to non-acetylated and acetylated H3Lys9Lys14, respectively. Our in silico data highlight that site-specific phosphorylation (H3Ser10P) and acetylation (H3Lys9 and H3Lys14) of H3 are essential for the interaction with their regulatory proteins (MKP1, MSK1, and 14-3-3ζ) and play a major role in the regulation of chromatin structure.
modeling; histone H3 modifications; 14-3-3ζ; MSK1; MKP1
Transcriptome dynamics in the longissimus muscle (LM) of young Angus cattle were evaluated at 0, 60, 120, and 220 days from early weaning. Bioinformatic analysis was performed using the dynamic impact approach (DIA) by means of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Database for Annotation, Visualization and Integrated Discovery (DAVID) databases. Between 0 and 120 days (growing phase), most of the highly impacted pathways (eg, ascorbate and aldarate metabolism, drug metabolism, cytochrome P450 and retinol metabolism) were inhibited. The phase between 120 and 220 days (finishing phase) was characterized by the most striking differences, with 3,784 differentially expressed genes (DEGs). Analysis of those DEGs revealed that the most impacted KEGG canonical pathway was glycosylphosphatidylinositol (GPI)-anchor biosynthesis, which was inhibited. Furthermore, inhibition of calpastatin and activation of tyrosine aminotransferase ubiquitination at 220 days promote proteasomal degradation, while the concurrent activation of ribosomal proteins promotes protein synthesis. Therefore, the balance of these processes likely results in a steady state of protein turnover during the finishing phase. Results underscore the importance of transcriptome dynamics in LM during growth.
longissimus muscle; intramuscular fat; growth; nutrition
We recently constructed a computable cell proliferation network (CPN) model focused on lung tissue to unravel complex biological processes and their exposure-related perturbations from molecular profiling data. The CPN consists of edges and nodes representing upstream controllers of gene expression largely generated from transcriptomics datasets using Reverse Causal Reasoning (RCR). Here, we report an approach to biologically verify the correctness of upstream controller nodes using a specifically designed, independent lung cell proliferation dataset. Normal human bronchial epithelial cells were arrested at G1/S with a cell cycle inhibitor. Gene expression changes and cell proliferation were captured at different time points after release from inhibition. Gene set enrichment analysis demonstrated cell cycle response specificity via an overrepresentation of proliferation related gene sets. Coverage analysis of RCR-derived hypotheses returned statistical significance for cell cycle response specificity across the whole model as well as for the Growth Factor and Cell Cycle sub-network models.
cell proliferation; biological network model; reverse causal reasoning
Proteins may be related to each other very specifically as homologous subfamilies. Proteins can also be related to diverse proteins at the super family level. It has become highly important to characterize the existing sequence databases by their signatures to facilitate the function annotation of newly added sequences. The algorithm described here uses a scheme for the classification of odorant binding proteins on the basis of functional residues and Cys-pairing. The cysteine-based scoring scheme not only helps in unambiguously identifying families like odorant binding proteins (OBPs), but also aids in their classification at the subfamily level with reliable accuracy. The algorithm was also applied to yet another cysteine-rich family, where similar accuracy was observed that ensures the application of the protocol to other families.
cysteine-based scoring scheme; Classification of proteins; Functionally important residues; Ligand binding residues
We used the newly-developed Dynamic Impact Approach (DIA) and gene network analysis to study the sow mammary transcriptome at 80, 100, and 110 days of pregnancy. A swine oligoarray with 13,290 inserts was used for transcriptome profiling. An ANOVA with false discovery rate (FDR < 0.15) correction resulted in 1,409 genes with a significant time effect across time comparisons. The DIA uncovered that Fatty acid biosynthesis, Interleukin-4 receptor binding, Galactose metabolism, and mTOR signaling were among the most-impacted pathways. IL-4 receptor binding, ABC transporters, cytokine-cytokine receptor interaction, and Jak-STAT signaling were markedly activated at 110 days compared with 80 and 100 days. Epigenetic and transcription factor regulatory mechanisms appear important in coordinating the final stages of mammary development during pregnancy. Network analysis revealed a crucial role for TP53, ARNT2, E2F4, and PPARG. The bioinformatics analyses revealed a number of pathways and functions that perform an irreplaceable role during late gestation to farrowing.
systems biology; transcriptomics; mammary gland; sow; dynamic impact approach
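The sow mammary abstract above selects genes at FDR < 0.15 but does not name the correction procedure. The sketch below assumes the Benjamini-Hochberg step-up method, which is a common choice; the function name and the toy P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each raw P value is scaled by m / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values for four genes (not data from the study).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.15]
print(significant)
```

A threshold of 0.15 is more permissive than the conventional 0.05, so about 15% of the selected genes are expected to be false discoveries under the procedure's assumptions.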
Exposure to environmental stressors such as cigarette smoke (CS) elicits a variety of biological responses in humans, including the induction of inflammatory responses. These responses are especially pronounced in the lung, where pulmonary cells sit at the interface between the body’s internal and external environments. We combined a literature survey with a computational analysis of multiple transcriptomic data sets to construct a computable causal network model (the Inflammatory Process Network (IPN)) of the main pulmonary inflammatory processes. The IPN model predicted decreased epithelial cell barrier defenses and increased mucus hypersecretion in human bronchial epithelial cells, and an attenuated pro-inflammatory (M1) profile in alveolar macrophages following exposure to CS, consistent with prior results. The IPN provides a comprehensive framework of experimentally supported pathways related to CS-induced pulmonary inflammation. The IPN is freely available to the scientific community as a resource with broad applicability to study the pathogenesis of pulmonary disease.
inflammation; cigarette smoke; network model; gene expression; biological expression language (BEL); reverse causal reasoning (RCR)
Limno-terrestrial tardigrades are small invertebrates that are subjected to periodic drought of their micro-environment. They have evolved to cope with these unfavorable conditions by anhydrobiosis, an ametabolic state of low cellular water. During drying and rehydration, tardigrades go through drastic changes in cellular water content. Through transcriptome sequencing of the limno-terrestrial tardigrade Milnesium tardigradum and a combination of cloning and targeted sequence assembly, we identified transcripts encoding eleven putative aquaporins. Analysis of these sequences suggested 2 classical aquaporins, 8 aquaglyceroporins and a single potentially intracellular unorthodox aquaporin. Using quantitative real-time PCR, we analyzed aquaporin transcript expression in the anhydrobiotic context. We have identified additional unorthodox aquaporins in various insect genomes and have identified a novel common conserved structural feature in these proteins. Analysis of the genomic organization of insect aquaporin genes revealed several conserved gene clusters.
unorthodox aquaporin; anhydrobiosis; tardigrade
To date, single genetic markers used to improve disease risk assessment still explain only a small proportion of the genetic variance for many complex diseases. This missing heritability may be explained by additional variants with weak effects. To discover and incorporate these additional genetic factors, statistical and computational methods must be evaluated and developed. We develop a multi-locus genetic risk score (GRS) based approach to analyze genes in the NADPH oxidase complex which may result in susceptibility to development of inflammatory bowel disease (IBD). We find the complex is highly associated with IBD (P = 7.86 × 10⁻¹⁴) using the GRS-based association method. Similar results are also shown in permutation analysis (P = 6.65 × 10⁻¹¹). A likelihood ratio test shows that the single nucleotide polymorphisms (SNPs) in the complex without nominal signals contribute significantly to the overall genetic effect within the complex (P = 0.015). Our results show that the multi-locus GRS association model can improve the genetic risk assessment of IBD by taking into account both confirmed and as yet unconfirmed disease susceptibility variants.
genetic risk score; inflammatory bowel disease; permutation analysis; association analysis
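The multi-locus GRS and permutation analysis in the abstract above can be sketched in a few lines. The abstract does not specify the weighting scheme or the test statistic, so this sketch assumes a weighted sum of risk-allele counts and a case-control difference in mean score; all function names, weights and genotypes are hypothetical.

```python
import random


def genetic_risk_score(genotypes, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2) across loci."""
    if len(genotypes) != len(weights):
        raise ValueError("one weight is required per locus")
    return sum(g * w for g, w in zip(genotypes, weights))


def mean_difference(scores, labels):
    """Mean score of cases (label 1) minus mean score of controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_pvalue(scores, labels, n_perm=1000, seed=0):
    """Two-sided permutation P value for the case-control mean difference.

    Labels are shuffled n_perm times; the +1 terms keep the estimate
    strictly positive, as is standard for Monte Carlo permutation tests.
    """
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_difference(scores, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example: three loci with made-up log-odds weights.
weights = [0.5, 1.0, 0.25]
people = [[2, 2, 1], [1, 2, 2], [2, 1, 2], [2, 2, 0],
          [0, 0, 1], [1, 0, 0], [0, 0, 0], [0, 1, 0]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [genetic_risk_score(g, weights) for g in people]
print(permutation_pvalue(scores, labels))
```

A Monte Carlo permutation test cannot report a P value below 1 / (n_perm + 1), so a value as small as the one in the abstract would need a very large number of permutations or an analytic approximation.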
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression by targeting mRNAs, especially in their 3′UTR regions. miRNAs have been identified by biological experiment and by computational prediction. Computational prediction uses two major methods: comparative and noncomparative. The comparative method depends on the conservation of miRNA sequences and secondary structure. The noncomparative method, on the other hand, does not rely on conservation. We hypothesized that each miRNA class has its own unique set of features; therefore, grouping miRNAs by class before using them as training data will improve sensitivity and specificity. The average sensitivity was 88.62% for miR-Explore, which relies on within-class miRNA alignment, and 70.82% for miR-abela, which relies on global alignment. Compared with global alignment, grouping miRNAs by class yields better sensitivity with very high specificity for pre-miRNA prediction, even when a simple position-based alignment of secondary and primary structure is used.
miR-explore; chicken; miRNA class alignment; miRNA
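Sensitivity and specificity, the two measures compared in the miRNA abstract above, are defined from the confusion matrix of a binary predictor. The sketch below shows those standard definitions; the function name and the toy labels are hypothetical and are not results from miR-Explore or miR-abela.

```python
def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from paired binary labels.

    truth, predicted: sequences of 0/1 where 1 marks a real or predicted
    pre-miRNA. Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in truth")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels: 10 real pre-miRNAs (8 recovered) and 10 decoys (9 rejected).
truth = [1] * 10 + [0] * 10
predicted = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1
sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)
```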
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from databases (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.
SNP prediction; inductive logic programming; human monogenic disease; genotype-phenotype relation
Bacterial small RNAs were once regarded as potent regulators of gene expression and are now being considered essential for their diversified roles. Many small RNAs are now reported to have a wide array of regulatory functions, ranging from environmental sensing to pathogenesis. Traditionally, noncoding transcripts were rarely detected by means of genetic screens. However, the availability of approximately 2200 prokaryotic genome sequences in public databases facilitates the efficient computational search for those molecules, followed by experimental validation. In principle, the following four major computational methods have been applied for the prediction of sRNA locations from bacterial genome sequences: (1) comparative genomics, (2) secondary structure and thermodynamic stability, (3) ‘Orphan’ transcriptional signals and (4) ab initio methods regardless of sequence or structure similarity. Most of these tools were applied to locate putative genomic sRNA locations, followed by experimental validation of those transcripts. Computational screening has therefore simplified the sRNA identification process in bacteria. In this review, a plethora of small RNA prediction methods and tools that have been reported in the past decade are discussed comprehensively and assessed based on their attributes, compatibility, and prediction accuracy.
comparative genomics; base composition; ncRNA; sRNA prediction; structure stability; transcriptional signal