Results 26-50 (145)

26.  Three Dimensional Structure Prediction of Fatty Acid Binding Site on Human Transmembrane Receptor CD36 
CD36 is an integral membrane protein which is thought to have a hairpin-like structure with alpha-helices at the C and N termini projecting through the membrane as well as a larger extracellular loop. This receptor interacts with a number of ligands including oxidized low density lipoprotein and long chain fatty acids (LCFAs). It is also implicated in lipid metabolism and heart diseases. It is therefore important to determine the 3D structure of the CD36 site involved in lipid binding. In this study, we predict the 3D structure of the fatty acid (FA) binding site [127–279 aa] of the CD36 receptor based on homology modeling with the X-ray structure of Human Muscle Fatty Acid Binding Protein (PDB code: 1HMT). Qualitative and quantitative analysis of the resulting model suggests that this model is reliable and stable, taking into consideration that over 97.8% of the residues are in the most favored regions, as well as the significant overall quality factor. Protein analysis, which relied on the secondary structure prediction of the target sequence and the comparison of 1HMT and CD36 [127–279 aa] secondary structures, led to the determination of the amino acid sequence consensus. These results also led to the identification of the functional sites on CD36 and revealed the presence of residues which may play a major role during ligand-protein interactions.
PMCID: PMC3859822  PMID: 24348024
CD36; fatty acids binding site; homology modeling; 3D model
27.  ChIP-Seq Data Mining: Remarkable Differences in NRSF/REST Target Genes between Human ESC and ESC-Derived Neurons 
The neuron-restrictive silencer factor (NRSF) is a zinc finger transcription factor that represses neuronal gene transcription in non-neuronal cells by binding to the consensus repressor element-1 (RE1) located in regulatory regions of target genes. NRSF silences the expression of a wide range of target genes involved in neuron-specific functions. Previous studies showed that aberrant regulation of NRSF plays a key role in the pathological process of human neurodegenerative diseases. However, a comprehensive set of NRSF target genes relevant to human neuronal functions has not yet been characterized. We performed genome-wide data mining from chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) datasets of NRSF binding sites in human embryonic stem cells (ESC) and the corresponding ESC-derived neurons, retrieved from the database of the ENCODE/HAIB project. Using bioinformatics tools such as Avadis NGS and MACS, we identified 2,172 NRSF target genes in ESC and 308 genes in ESC-derived neurons based on stringent criteria. Only 40 NRSF target genes overlapped between both data sets. According to motif analysis, binding regions showed an enrichment of the consensus RE1 sites in ESC, whereas they were mainly located in poorly defined non-RE1 sites in ESC-derived neurons. Molecular pathways of NRSF target genes were linked with various neuronal functions in ESC, such as neuroactive ligand-receptor interaction, CREB signaling, and axonal guidance signaling, while they were not directed to neuron-specific functions in ESC-derived neurons. Remarkable differences in ChIP-Seq-based NRSF target genes and pathways between ESC and ESC-derived neurons suggested that NRSF-mediated silencing of target genes is highly effective in human ESC but not in ESC-derived neurons.
PMCID: PMC3855043  PMID: 24324330
ChIP-seq; data mining; ESC; GenomeJack; Huntington’s disease; human neurons; NRSF; REST
28.  A Computational Assay to Design an Epitope-Based Peptide Vaccine Against Saint Louis Encephalitis Virus 
Saint Louis encephalitis virus, a member of the Flaviviridae subgroup, is a Culex mosquito-borne pathogen. Despite severe epidemic outbreaks on several occasions, not much progress has been made with regard to an epitope-based vaccine designed for Saint Louis encephalitis virus. The envelope proteins were collected from a protein database and analyzed with an in silico tool to identify the most immunogenic protein. The protein was then verified through several parameters to predict the T-cell and B-cell epitopes. Both T-cell and B-cell immunity were assessed to determine that the protein can induce humoral as well as cell-mediated immunity. The peptide sequence from amino acids 330–336 and the sequence REYCYEATL from position 57 were found to be the most promising B-cell and T-cell epitopes, respectively. Furthermore, because this is an RNA virus, it was important to establish that the epitope is conserved; this was also done with in silico tools, showing 63.51% conservancy. The epitope was further tested for binding against the HLA molecule by computational docking techniques to verify the binding cleft epitope interaction. However, this is a preliminary study of designing an epitope-based peptide vaccine against Saint Louis encephalitis virus; the results await validation by in vitro and in vivo experiments.
PMCID: PMC3855041  PMID: 24324329
epitope; computational tools; humoral; cell-mediated immunity; conservancy
29.  On Crowd-verification of Biological Networks 
Biological networks with a structured syntax are a powerful way of representing biological information generated from high density data; however, they can become unwieldy to manage as their size and complexity increase. This article presents a crowd-verification approach for the visualization and expansion of biological networks.
Web-based graphical interfaces allow visualization of causal and correlative biological relationships represented using Biological Expression Language (BEL). Crowdsourcing principles enable participants to communally annotate these relationships based on literature evidences. Gamification principles are incorporated to further engage domain experts throughout biology to gather robust peer-reviewed information from which relationships can be identified and verified.
The resulting network models will represent the current status of biological knowledge within the defined boundaries, here processes related to human lung disease. These models are amenable to computational analysis. For some period following conclusion of the challenge, the published models will remain available for continuous use and expansion by the scientific community.
PMCID: PMC3798292  PMID: 24151423
community curation; biological network models; reputation system; Biological Expression Language
30.  Automatic Identification of Algal Community from Microscopic Images 
A good understanding of the population dynamics of algal communities is crucial in several ecological and pollution studies of freshwater and oceanic systems. This paper reviews the automatic identification of algal communities from microscope images using image processing techniques. The diverse techniques of image preprocessing, segmentation, feature extraction and recognition are considered one by one and their parameters are summarized. Automatic identification and classification of algal communities are very difficult due to various factors such as change in size and shape with climatic changes, various growth periods, and the presence of other microbes. Therefore, the significance, uniqueness, and various approaches are discussed and the analyses in image processing methods are evaluated. Algal identification and associated problems in water organisms have been projected as challenges in image processing applications. Various image processing approaches based on textures, shapes, and object boundaries, as well as some segmentation methods such as edge detection and color segmentation, are highlighted. Finally, artificial neural networks and some machine learning algorithms were used to classify and identify the algae. Further, some of the benefits and drawbacks of the schemes are examined.
PMCID: PMC3798295  PMID: 24151424
Algae identification; segmentation; neural network; feature extraction; identification
31.  Human Retrovirus Codon Usage from tRNA Point of View: Therapeutic Insights 
The purpose of this study was to investigate the balance between transfer ribonucleic acid (tRNA) supply and demand in retrovirus-infected cells, seeking the best targets for antiretroviral therapy based on the hypothetical tRNA Inhibition Therapy (TRIT). Codon usage and tRNA gene data were retrieved from public databases. Based on logistic principles, a therapeutic score (T-score) was calculated for all sense codons, in each retrovirus-host system. Codons that are critical for viral protein translation, but not as critical for the host, have the highest T-score values. Theoretically, inactivating the cognate tRNA species should imply a severe reduction of the elongation rate during viral mRNA translation. We developed a method to predict tRNA species critical for retroviral protein synthesis. Four of the best TRIT targets in HIV-1 and HIV-2 encode Large Hydrophobic Residues (LHR), which have a central role in protein folding. One of them, codon CUA, is also a TRIT target in both HTLV-1 and HTLV-2. Therefore, a drug designed for inactivating or reducing the cytoplasmatic concentration of tRNA species with anticodon TAG could attenuate significantly both HIV and HTLV protein synthesis rates. Inversely, replacing codons ending in UA by synonymous codons should increase the expression, which is relevant for DNA vaccine design.
PMCID: PMC3798314  PMID: 24151425
codon usage; tRNA; HIV; HTLV; therapy
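As an illustration of the codon-level bookkeeping behind entry 31, the following minimal sketch counts codon usage in a coding sequence and derives the Watson-Crick anticodon of a codon. The function names and the toy sequence are hypothetical, wobble pairing and tRNA modifications are ignored, and the T-score itself is not reproduced because the abstract does not give its formula.

```python
from collections import Counter

# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon(codon: str) -> str:
    """Return the strict Watson-Crick anticodon (written 5'->3') of an RNA codon."""
    return "".join(COMPLEMENT[base] for base in reversed(codon.upper()))


def codon_usage(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence; a trailing partial codon is ignored."""
    rna = cds.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


# Made-up toy sequence, not taken from any retroviral genome.
toy_cds = "AUGCUACUAGGGUAA"
usage = codon_usage(toy_cds)
print(usage.most_common(3))
print(anticodon("CUA"))
```

For codon CUA the sketch returns UAG, which corresponds to the anticodon the abstract writes as TAG in DNA notation.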
32.  Molecular Insights on Post-chemotherapy Retinoblastoma by Microarray Gene Expression Analysis 
Management of Retinoblastoma (RB), a pediatric ocular cancer, is limited by drug resistance and drug-dosage-related side effects during chemotherapy. Molecular de-regulation in post-chemotherapy RB tumors was investigated.
Materials and Methods
cDNA microarray analysis of two post-chemotherapy and one pre-chemotherapy RB tumor tissues was performed, followed by Principal Component Analysis, Gene ontology, Pathway Enrichment analysis and Biological Analysis Network (BAN) modeling. The drug modulation role of two significantly up-regulated genes (p≤0.05), Ect2 (Epithelial-cell-transforming-sequence-2) and PRAME (preferentially-expressed-Antigen-in-Melanoma), was assessed by qRT-PCR, immunohistochemistry and cell viability assays.
Differential up-regulation of 1672 genes and down-regulation of 2538 genes was observed in RB tissues (relative to normal adult retina), while 1419 genes were commonly de-regulated between pre-chemotherapy and post-chemotherapy RB. Twenty one key gene ontology categories, pathways, biomarkers and phenotype groups harboring 250 differentially expressed genes were dys-regulated (EZH2, NCoR1, MYBL2, RB1, STAMN1, SYK, JAK1/2, STAT1/2, PLK2/4, BIRC5, LAMN1, Ect2, PRAME and ABCC4). Differential molecular expressions of PRAME and Ect2 in RB tumors with and without chemotherapy were analyzed. There was neither up-regulation of MRP1, nor any significant shift in chemotherapeutic IC50, in PRAME over-expressed versus non-transfected RB cells.
Cell cycle regulatory genes were dys-regulated post-chemotherapy. Ect2 gene was expressed in response to chemotherapy-induced stress. PRAME does not contribute to drug resistance in RB, yet its nuclear localization and BAN information point to its possible regulatory role in RB.
PMCID: PMC3785389  PMID: 24092970
RB; Ect2; PRAME; MYBL2; NCoR1; drug resistance; micro array; chemotherapy
33.  Molecular Modeling of Differentially Phosphorylated Serine 10 and Acetylated lysine 9/14 of Histone H3 Regulates their Interactions with 14-3-3ζ, MSK1, and MKP1 
Histone modifications occur in precise patterns, with several modifications known to affect the binding of proteins. These interactions affect the chromatin structure, gene regulation, and cell cycle events. The dual modifications on the H3 tail, serine10 phosphorylation and lysine14 acetylation (H3Ser10PLys14Ac), are reported to be crucial for interaction with 14-3-3ζ. However, the mechanism by which H3Ser10P along with neighboring site-specific acetylation(s) is targeted by its regulatory proteins, including kinase and phosphatase, is not fully understood. We carried out molecular modeling studies to understand the interaction of 14-3-3ζ, and its regulatory proteins, mitogen-activated protein kinase phosphatase-1 (MKP1), and mitogen- and stress-activated protein kinase-1 (MSK1) with phosphorylated H3Ser10 alone or in combination with acetylated H3Lys9 and Lys14. In silico molecular association studies suggested that acetylated Lys14 and phosphorylated Ser10 of H3 show the highest binding affinity towards 14-3-3ζ. In addition, acetylation of H3Lys9 along with Ser10PLys14Ac favors the interaction of the phosphatase, MKP1, for dephosphorylation of H3Ser10P. Further, the MAP kinase MSK1 phosphorylates the unmodified H3Ser10 containing N-terminal tail with maximum affinity compared to the N-terminal tail with H3Lys9AcLys14Ac. The data clearly suggest that opposing enzymatic activity of MSK1 and MKP1 corroborates with non-acetylated and acetylated, H3Lys9Lys14, respectively. Our in silico data highlight that site-specific phosphorylation (H3Ser10P) and acetylation (H3Lys9 and H3Lys14) of H3 are essential for the interaction with their regulatory proteins (MKP1, MSK1, and 14-3-3ζ) and play a major role in the regulation of chromatin structure.
PMCID: PMC3767654  PMID: 24027420
modeling; histone H3 modifications; 14-3-3ζ; MSK1; MKP1
34.  Bioinformatics Analysis of Transcriptome Dynamics During Growth in Angus Cattle Longissimus Muscle 
Transcriptome dynamics in the longissimus muscle (LM) of young Angus cattle were evaluated at 0, 60, 120, and 220 days from early-weaning. Bioinformatic analysis was performed using the dynamic impact approach (DIA) by means of Kyoto Encyclopedia of Genes and Genomes (KEGG) and Database for Annotation, Visualization and Integrated Discovery (DAVID) databases. Between 0 and 120 days (growing phase) most of the highly-impacted pathways (eg, ascorbate and aldarate metabolism, drug metabolism, cytochrome P450 and Retinol metabolism) were inhibited. The phase between 120 and 220 days (finishing phase) was characterized by the most striking differences with 3,784 differentially expressed genes (DEGs). Analysis of those DEGs revealed that the most impacted KEGG canonical pathway was glycosylphosphatidylinositol (GPI)-anchor biosynthesis, which was inhibited. Furthermore, inhibition of calpastatin and activation of tyrosine aminotransferase ubiquitination at 220 days promote proteasomal degradation, while the concurrent activation of ribosomal proteins promotes protein synthesis. Therefore, the balance of these processes likely results in a steady-state of protein turnover during the finishing phase. Results underscore the importance of transcriptome dynamics in LM during growth.
PMCID: PMC3738383  PMID: 23943656
longissimus muscle; intramuscular fat; growth; nutrition
35.  Systematic Verification of Upstream Regulators of a Computable Cellular Proliferation Network Model on Non-Diseased Lung Cells Using a Dedicated Dataset 
We recently constructed a computable cell proliferation network (CPN) model focused on lung tissue to unravel complex biological processes and their exposure-related perturbations from molecular profiling data. The CPN consists of edges and nodes representing upstream controllers of gene expression largely generated from transcriptomics datasets using Reverse Causal Reasoning (RCR). Here, we report an approach to biologically verify the correctness of upstream controller nodes using a specifically designed, independent lung cell proliferation dataset. Normal human bronchial epithelial cells were arrested at G1/S with a cell cycle inhibitor. Gene expression changes and cell proliferation were captured at different time points after release from inhibition. Gene set enrichment analysis demonstrated cell cycle response specificity via an overrepresentation of proliferation related gene sets. Coverage analysis of RCR-derived hypotheses returned statistical significance for cell cycle response specificity across the whole model as well as for the Growth Factor and Cell Cycle sub-network models.
PMCID: PMC3733638  PMID: 23926424
cell proliferation; biological network model; reverse causal reasoning
36.  Association of Putative Members to Family of Mosquito Odorant Binding Proteins: Scoring Scheme Using Fuzzy Functional Templates and Cys Residue Positions 
Proteins may be related to each other very specifically as homologous subfamilies. Proteins can also be related to diverse proteins at the super family level. It has become highly important to characterize the existing sequence databases by their signatures to facilitate the function annotation of newly added sequences. The algorithm described here uses a scheme for the classification of odorant binding proteins on the basis of functional residues and Cys-pairing. The cysteine-based scoring scheme not only helps in unambiguously identifying families like odorant binding proteins (OBPs), but also aids in their classification at the subfamily level with reliable accuracy. The algorithm was also applied to yet another cysteine-rich family, where similar accuracy was observed that ensures the application of the protocol to other families.
PMCID: PMC3728099  PMID: 23908587
cysteine-based scoring scheme; Classification of proteins; Functionally important residues; Ligand binding residues
37.  Bioinformatics and Gene Network Analyses of the Swine Mammary Gland Transcriptome during Late Gestation 
We used the newly-developed Dynamic Impact Approach (DIA) and gene network analysis to study the sow mammary transcriptome at 80, 100, and 110 days of pregnancy. A swine oligoarray with 13,290 inserts was used for transcriptome profiling. An ANOVA with false discovery rate (FDR < 0.15) correction resulted in 1,409 genes with a significant time effect across time comparisons. The DIA uncovered that Fatty acid biosynthesis, Interleukin-4 receptor binding, Galactose metabolism, and mTOR signaling were among the most-impacted pathways. IL-4 receptor binding, ABC transporters, cytokine-cytokine receptor interaction, and Jak-STAT signaling were markedly activated at 110 days compared with 80 and 100 days. Epigenetic and transcription factor regulatory mechanisms appear important in coordinating the final stages of mammary development during pregnancy. Network analysis revealed a crucial role for TP53, ARNT2, E2F4, and PPARG. The bioinformatics analyses revealed a number of pathways and functions that perform an irreplaceable role during late gestation to farrowing.
PMCID: PMC3728096  PMID: 23908586
systems biology; transcriptomics; mammary gland; sow; dynamic impact approach
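Entry 37 filters genes with a false discovery rate threshold (FDR < 0.15) but does not name the correction procedure. The sketch below assumes the widely used Benjamini-Hochberg step-up adjustment; that choice, the function name, and the toy p-values are assumptions for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values standing in for per-gene ANOVA time effects.
toy_p = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(adj) if q < 0.15]
print(adj, significant)
```

Genes whose adjusted value falls below 0.15 would be retained under this assumed procedure.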
38.  A Modular Cell-Type Focused Inflammatory Process Network Model for Non-Diseased Pulmonary Tissue 
Exposure to environmental stressors such as cigarette smoke (CS) elicits a variety of biological responses in humans, including the induction of inflammatory responses. These responses are especially pronounced in the lung, where pulmonary cells sit at the interface between the body’s internal and external environments. We combined a literature survey with a computational analysis of multiple transcriptomic data sets to construct a computable causal network model (the Inflammatory Process Network (IPN)) of the main pulmonary inflammatory processes. The IPN model predicted decreased epithelial cell barrier defenses and increased mucus hypersecretion in human bronchial epithelial cells, and an attenuated pro-inflammatory (M1) profile in alveolar macrophages following exposure to CS, consistent with prior results. The IPN provides a comprehensive framework of experimentally supported pathways related to CS-induced pulmonary inflammation. The IPN is freely available to the scientific community as a resource with broad applicability to study the pathogenesis of pulmonary disease.
PMCID: PMC3700945  PMID: 23843693
inflammation; cigarette smoke; network model; gene expression; biological expression language (BEL); reverse causal reasoning (RCR)
39.  The Aquaporin Channel Repertoire of the Tardigrade Milnesium tardigradum 
Limno-terrestrial tardigrades are small invertebrates that are subjected to periodic drought of their micro-environment. They have evolved to cope with these unfavorable conditions by anhydrobiosis, an ametabolic state of low cellular water. During drying and rehydration, tardigrades go through drastic changes in cellular water content. By our transcriptome sequencing effort of the limno-terrestrial tardigrade Milnesium tardigradum and by a combination of cloning and targeted sequence assembly, we identified transcripts encoding eleven putative aquaporins. Analysis of these sequences proposed 2 classical aquaporins, 8 aquaglyceroporins and a single potentially intracellular unorthodox aquaporin. Using quantitative real-time PCR we analyzed aquaporin transcript expression in the anhydrobiotic context. We have identified additional unorthodox aquaporins in various insect genomes and have identified a novel common conserved structural feature in these proteins. Analysis of the genomic organization of insect aquaporin genes revealed several conserved gene clusters.
PMCID: PMC3666991  PMID: 23761966
unorthodox aquaporin; anhydrobiosis; tardigrade
40.  Association Between a Multi-Locus Genetic Risk Score and Inflammatory Bowel Disease 
To date, the utility of single genetic markers to improve disease risk assessment still explains only a small proportion of genetic variance for many complex diseases. This missing heritability may be explained by additional variants with weak effects. To discover and incorporate these additional genetic factors, statistical and computational methods must be evaluated and developed. We develop a multi-locus genetic risk score (GRS) based approach to analyze genes in NADPH oxidase complex which may result in susceptibility to development of inflammatory bowel disease (IBD). We find the complex is highly associated with IBD (P = 7.86 × 10^−14) using the GRS-based association method. Similar results are also shown in permutation analysis (P = 6.65 × 10^−11). Likelihood ratio test shows that the single nucleotide polymorphisms (SNPs) in the complex without nominal signals have significant contribution to the overall genetic effect within the complex (P = 0.015). Our results show that the multi-locus GRS association model can improve the genetic risk assessment on IBD by taking into account both confirmed and as yet unconfirmed disease susceptibility variants.
PMCID: PMC3662393  PMID: 23761965
genetic risk score; inflammatory bowel disease; permutation analysis; association analysis
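The multi-locus genetic risk score and permutation analysis in entry 40 can be sketched as follows. This is a minimal illustration that assumes a weighted sum of risk-allele counts and a permutation test on the difference in mean score between cases and controls; the abstract does not specify the weighting or the test statistic, and all names and numbers here are hypothetical.

```python
import random
from statistics import mean


def genetic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP) for one individual."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(w * g for g, w in zip(allele_counts, weights))


def permutation_p_value(case_scores, control_scores, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in mean score."""
    rng = random.Random(seed)
    observed = abs(mean(case_scores) - mean(control_scores))
    pooled = list(case_scores) + list(control_scores)
    n_case = len(case_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        diff = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-SNP weights (for example log odds ratios) and genotypes.
weights = [0.5, 1.0, 2.0]
score = genetic_risk_score([2, 1, 0], weights)
p = permutation_p_value([3.0, 3.5, 4.0], [1.0, 1.5, 2.0], n_perm=200)
print(score, p)
```

With real data the number of permutations bounds the smallest reportable p-value, so very small values such as those in the abstract require a correspondingly large number of permutations or an analytic approximation.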
41.  miR-Explore: Predicting MicroRNA Precursors by Class Grouping and Secondary Structure Positional Alignment 
MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression by targeting mRNAs, especially in the 3′UTR regions. The identification of miRNAs has been done by biological experiment and computational prediction. The computational prediction approach has been done using two major methods: comparative and noncomparative. The comparative method is dependent on the conservation of the miRNA sequences and secondary structure. The noncomparative method, on the other hand, does not rely on conservation. We hypothesized that each miRNA class has its own unique set of features; therefore, grouping miRNA by classes before using them as training data will improve sensitivity and specificity. The average sensitivity was 88.62% for miR-Explore, which relies on within miRNA class alignment, and 70.82% for miR-abela, which relies on global alignment. Compared with global alignment, grouping miRNA by classes yields a better sensitivity with very high specificity for pre-miRNA prediction even when a simple position-based secondary and primary structure alignment is used.
PMCID: PMC3623602  PMID: 23645986
miR-explore; chicken; miRNA class alignment; miRNA
42.  Knowledge Discovery in Variant Databases Using Inductive Logic Programming 
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at
PMCID: PMC3615990  PMID: 23589683
SNP prediction; inductive logic programming; human monogenic disease; genotype-phenotype relation
43.  Computational Small RNA Prediction in Bacteria 
Bacterial small RNAs were once regarded as potent regulators of gene expression and are now being considered as essential for their diversified roles. Many small RNAs are now reported to have a wide array of regulatory functions, ranging from environmental sensing to pathogenesis. Traditionally, noncoding transcripts were rarely detected by means of genetic screens. However, the availability of approximately 2200 prokaryotic genome sequences in public databases facilitates the efficient computational search of those molecules, followed by experimental validation. In principle, the following four major computational methods were applied for the prediction of sRNA locations from bacterial genome sequences: (1) comparative genomics, (2) secondary structure and thermodynamic stability, (3) ‘Orphan’ transcriptional signals and (4) ab initio methods regardless of sequence or structure similarity; most of these tools were applied to locate the putative genomic sRNA locations followed by experimental validation of those transcripts. Therefore, computational screening has simplified the sRNA identification process in bacteria. In this review, a plethora of small RNA prediction methods and tools that have been reported in the past decade are discussed comprehensively and assessed based on their attributes, compatibility, and prediction accuracy.
PMCID: PMC3596055  PMID: 23516022
comparative genomics; base composition; ncRNA; sRNA prediction; structure stability; transcriptional signal
44.  Construction of a Computable Network Model for DNA Damage, Autophagy, Cell Death, and Senescence 
Towards the development of a systems biology-based risk assessment approach for environmental toxicants, including tobacco products in a systems toxicology setting such as the “21st Century Toxicology”, we are building a series of computable biological network models specific to non-diseased pulmonary and cardiovascular cells/tissues which capture the molecular events that can be activated following exposure to environmental toxicants. Here we extend on previous work and report on the construction and evaluation of a mechanistic network model focused on DNA damage response and the four main cellular fates induced by stress: autophagy, apoptosis, necroptosis, and senescence. In total, the network consists of 34 sub-models containing 1052 unique nodes and 1538 unique edges which are supported by 1231 PubMed-referenced literature citations. Causal node-edge relationships are described using the Biological Expression Language (BEL), which allows for the semantic representation of life science relationships in a computable format. The Network is provided in .XGMML format and can be viewed using freely available network visualization software, such as Cytoscape.
PMCID: PMC3596057  PMID: 23515068
computable; network model; DNA damage; autophagy; apoptosis; necroptosis; senescence; Biological Expression Language (BEL)
45.  Integrative Approach for Computationally Inferring Interactions between the Alpha and Beta Subunits of the Calcium-Activated Potassium Channel (BK): a Docking Study 
Three-dimensional models of the alpha- and beta-1 subunits of the calcium-activated potassium channel (BK) were predicted by threading modeling. A recursive approach comprising sequence alignment and model building based on three templates was used to build these models, with the refinement of non-conserved regions carried out using threading techniques. The complex formed by the subunits was studied by means of docking techniques, using 3D models of the two subunits, and an approach based on rigid-body structures. Structural effects of the complex were analyzed with respect to hydrogen-bond interactions and binding-energy calculations. Potential interaction sites of the complex were determined by referencing a study of the difference in accessible surface area (DASA) of the protein subunits in the complex.
PMCID: PMC3588595  PMID: 23492851
potassium channel; docking; molecular interactions; binding energy
46.  A Microbial Metagenome (Leucobacter sp.) in Caenorhabditis Whole Genome Sequences 
DNA of apparently recent bacterial origin is found in the genomic sequences of Caenorhabditis angaria and Caenorhabditis remanei. Here we present evidence that the DNA belongs to a single species of the genus Leucobacter (high-GC Gram+ Actinobacteria). Metagenomic tools enabled the assembly of the contaminating sequences in a draft genome of 3.2 Mb harboring 2,826 genes. This information provides insight into a microbial organism intimately associated with Caenorhabditis as well as a solid basis for the reassignment of 3,373 metazoan entries of the public database to a novel bacterial species (Leucobacter sp. AEAR). The application of metagenomic techniques can thus prevent annotation errors and reveal unexpected genetic information in data obtained by conventional genomics.
PMCID: PMC3583267  PMID: 23585714
host-microbe interactions; organism identification in DNA sequences; contamination in sequence database; next-generation sequencing; purine degradation
47.  A Statistical Method without Training Step for the Classification of Coding Frame in Transcriptome Sequences 
In this study, we investigated the modalities of coding open reading frame (cORF) classification of expressed sequence tags (EST) by using the universal feature method (UFM). The UFM algorithm is based on the scoring of purine bias (Rrr) and stop codon frequencies. UFM classifies ORFs as coding or non-coding through a score based on 5 factors: (i) stop codon frequency; (ii) the product of the probabilities of purines occurring in the three positions of nucleotide triplets; (iii) the product of the probabilities of Cytosine (C), Guanine (G), and Adenine (A) occurring in the 1st, 2nd, and 3rd positions of triplets, respectively; (iv) the probabilities of a G occurring in the 1st and 2nd positions of triplets; and (v) the probabilities of a T occurring in the 1st and an A in the 2nd position of triplets. Because UFM is based on primary determinants of coding sequences that are conserved throughout the biosphere, it is suitable for cORF classification of any sequence in eukaryote transcriptomes without prior knowledge. Considering the protein sequences of the Protein Data Bank (RCSB PDB or more simply PDB) as a reference, we found that UFM classifies cORFs of ≥200 bp (if the coding strand is known) and cORFs of ≥300 bp (if the coding strand is unknown), and releases them in their coding strand and coding frame, which allows their automatic translation into protein sequences with a success rate equal to or higher than 95%. We first established the statistical parameters of UFM using ESTs from Plasmodium falciparum, Arabidopsis thaliana, Oryza sativa, Zea mays, Drosophila melanogaster, Homo sapiens and Chlamydomonas reinhardtii in reference to the protein sequences of PDB. Second, we showed that the success rate of cORF classification using UFM is expected to apply to approximately 95% of higher eukaryote genes that encode for proteins. 
Third, we used UFM in combination with CAP3 to assemble large EST samples into cORFs that we used to analyze transcriptome phenotypes in rice, maize, and humans. We discuss the error rate and the interference of noisy sequences such as pseudogenes, transposons, and retrotransposons. This method is suitable for rapid cORF extraction from transcriptome data and allows correct description of the genome phenotypes of plant genomes without prior knowledge. Additional care is necessary when addressing the human transcriptome due to the interference caused by large amounts of noisy sequences. UFM can be regarded as a low complexity tool for prior knowledge extraction concerning the coding fraction of the transcriptome of any eukaryote. Due to its low level of complexity, UFM is also very robust to variations of codon usage.
PMCID: PMC3561939  PMID: 23400232
genomics; RNY; EST; ORF; CDS; UFM; classification
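The first two UFM factors listed in entry 47 (stop codon frequency and the product of purine probabilities at the three triplet positions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `codons` and `frame_features` are hypothetical, the standard stop codons TAA, TAG, and TGA are assumed, and the abstract does not say how the five factors are combined into a score, so no combined score is computed here.

```python
# Minimal sketch of UFM factors (i) and (ii); names are hypothetical,
# and the standard genetic code's stop codons are assumed.
STOP_CODONS = {"TAA", "TAG", "TGA"}
PURINES = {"A", "G"}


def codons(seq, frame):
    """Split seq into complete triplets starting at offset frame (0, 1, or 2)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def frame_features(seq, frame):
    """Return (stop codon frequency, product of purine probabilities
    at triplet positions 1, 2, and 3) for one reading frame."""
    triplets = codons(seq, frame)
    if not triplets:
        return 0.0, 0.0
    n = len(triplets)
    # Factor (i): fraction of triplets that are stop codons.
    stop_freq = sum(t in STOP_CODONS for t in triplets) / n
    # Factor (ii): product over the three positions of P(purine at position).
    purine_product = 1.0
    for pos in range(3):
        purine_product *= sum(t[pos] in PURINES for t in triplets) / n
    return stop_freq, purine_product


if __name__ == "__main__":
    est = "ATGGCATAA"  # toy sequence, for illustration only
    for frame in range(3):
        print(frame, frame_features(est, frame))
```

Comparing these values across the three frames of a strand (and of its reverse complement when the coding strand is unknown) is the kind of per-frame comparison the abstract describes.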
48.  PREPACT 2.0: Predicting C-to-U and U-to-C RNA Editing in Organelle Genome Sequences with Multiple References and Curated RNA Editing Annotation 
RNA editing is vast in some genetic systems, with up to thousands of targeted C-to-U and U-to-C substitutions in mitochondria and chloroplasts of certain plants. Efficient prognoses of RNA editing in organelle genomes will help to reveal overlooked cases of editing. We present PREPACT 2.0 with numerous enhancements of our previously developed Plant RNA Editing Prediction & Analysis Computer Tool. Reference organelle transcriptomes for editing prediction have been extended and reorganized to include 19 curated mitochondrial and 13 chloroplast genomes, now making it possible to distinguish RNA editing sites from “pre-edited” sites. Queries may be run against multiple references, and a new “commons” function identifies and highlights orthologous candidate editing sites congruently predicted by multiple references. Enhancements to the BLASTX mode in PREPACT 2.0 allow querying of complete novel organelle genomes within a few minutes, identifying protein genes and candidate RNA editing sites simultaneously without prior user analyses.
PMCID: PMC3547502  PMID: 23362369
pyrimidine substitutions; RNA editing prediction; plants; protists; mitochondrial DNA; chloroplast DNA; BLASTX
49.  Identification and Insilico Analysis of Retinoblastoma Serum microRNA Profile and Gene Targets Towards Prediction of Novel Serum Biomarkers 
Retinoblastoma (RB) is a malignant tumor of the retina seen in children, and noninvasive biomarkers are needed for rapid diagnosis and for prognostication of therapy. This study was undertaken to identify the differentially expressed miRNAs in the serum of children with RB in comparison with normal age-matched serum, to analyze their concurrence with the existing RB tumor miRNA profile, to identify their novel gene targets specific to RB, and to study the expression of a few of the identified oncogenic miRNAs in serum samples from patients with advanced-stage primary RB. MiRNA profiling was performed on 14 pooled serum samples from children with advanced RB and 14 normal age-matched serum samples, wherein 21 miRNAs were found to be upregulated (fold change ≥ +2.0, P ≤ 0.05) and 24 to be downregulated (fold change ≤ −2.0, P ≤ 0.05). Furthermore, intersection of 59 significantly deregulated miRNAs identified from RB tumor profiles with the miRNAs detected in the serum profile revealed that 33 miRNAs followed a similar deregulation pattern in RB serum. We then validated a few of the miRNAs (miRNA 17-92) identified by microarray in the RB patient serum samples (n = 20) by using qRT-PCR. Expression of the oncogenic miRNAs miR-17, miR-18a, and miR-20a by qRT-PCR was significant in the serum samples, indicating the potential of serum miRNA identification for noninvasive diagnosis. Moreover, from miRNA gene target prediction, key regulatory genes of cell proliferation, apoptosis, and positive and negative regulatory networks involved in RB progression were identified in the gene expression profile of RB tumors. Therefore, these identified miRNAs and their corresponding target genes could give insights into potential biomarkers and key events involved in the RB pathway.
PMCID: PMC3547501  PMID: 23400111
retinoblastoma; micro RNA; biomarkers; bioinformatics tools
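The thresholding and intersection steps described in entry 49 can be sketched in Python as follows. This is an illustrative sketch only, assuming the conventional signed fold-change reading (upregulated at fold change ≥ +2.0, downregulated at fold change ≤ −2.0, both with P ≤ 0.05). The function names `classify` and `concordant` and the placeholder miRNA labels are hypothetical and do not come from the study.

```python
# Illustrative sketch; cutoffs follow the conventional signed fold-change
# reading, and all names and toy values below are hypothetical.
def classify(fold_change, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Label one miRNA as 'up', 'down', or 'unchanged'."""
    if p_value > p_cutoff:
        return "unchanged"
    if fold_change >= fc_cutoff:
        return "up"
    if fold_change <= -fc_cutoff:
        return "down"
    return "unchanged"


def concordant(serum_calls, tumor_calls):
    """Return miRNAs deregulated in the same direction in both profiles."""
    return {
        name
        for name, call in serum_calls.items()
        if call != "unchanged" and tumor_calls.get(name) == call
    }


if __name__ == "__main__":
    # Placeholder measurements: (signed fold change, P value).
    serum = {"mir_a": (3.1, 0.01), "mir_b": (-2.4, 0.03), "mir_c": (1.2, 0.40)}
    serum_calls = {name: classify(fc, p) for name, (fc, p) in serum.items()}
    tumor_calls = {"mir_a": "up", "mir_b": "up", "mir_c": "unchanged"}
    print(serum_calls)
    print(concordant(serum_calls, tumor_calls))
```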
50.  Methods of Combinatorial Optimization to Reveal Factors Affecting Gene Length 
In this paper we present a novel method for genome ranking according to gene lengths. The main outcomes described in this paper are the following: the formulation of the genome ranking problem, presentation of relevant approaches to solve it, and the demonstration of preliminary results from prokaryotic genome ordering. Using a subset of prokaryotic genomes, we attempted to uncover factors affecting gene length. We have demonstrated that hyperthermophilic species have shorter genes than mesophilic organisms, which probably means that environmental factors affect gene length. Moreover, these preliminary results show that environmental factors group together evolutionarily distant species in the ranking.
PMCID: PMC3528112  PMID: 23300345
adaptation; evolution of prokaryotes; orthologs; machine learning; dimension-reduction techniques; factor analysis; clustering; rating; ranking
