The effectiveness of current therapeutic regimens for Mycobacterium tuberculosis (Mtb) is diminished by the need for prolonged therapy and the rise of drug-resistant and drug-tolerant strains. This global health threat persists, despite decades of basic research and a wealth of legacy knowledge, because of a lack of systems-level understanding that could accelerate the discovery of fast-acting, high-efficacy drugs.
The enhanced functional annotations of the Mtb genome, previously obtained through a crowd-sourcing approach, were used to reconstruct the metabolic network of Mtb in a bottom-up manner. We represent this information by developing a novel Systems Biology Spindle Map of Metabolism (SBSM) and analyse its static and dynamic structure using various computational approaches based on simulation and design.
The reconstructed metabolism of Mtb encompasses 961 metabolites involved in 1152 reactions catalyzed by 890 protein-coding genes, organized into 50 pathways. Through static and dynamic analyses of the SBSM, we identified various critical proteins required for the growth and survival of the bacteria and assessed their potential as putative drug targets that are fast acting and less toxic. We further formulated a novel concept of metabolic persister genes (MPGs) and compared our predictions with published in vitro and in vivo experimental evidence. Through these analyses, we report for the first time that de novo biosynthesis of NAD may give rise to bacterial persistence in Mtb under conditions of metabolic stress induced by conventional anti-tuberculosis therapy. We propose such MPGs as potential targets for combination therapy with existing antibiotics, which could improve the efficacy and efficiency of these antibiotics against drug-tolerant bacteria.
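The abstract does not detail how critical proteins were identified. One common static approach for genome-scale networks is a reachability (network-expansion) check. Starting from nutrient "seed" metabolites, every reaction whose substrates are all available is fired repeatedly, and a gene is called essential if deleting it makes a biomass precursor unreachable. The sketch below applies this idea to a hypothetical three-reaction network. All reaction, metabolite and gene names are invented, and this is not presented as the SBSM method.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy reactions: name -> (substrates, products, catalysing genes).
# A reaction is assumed active if ANY of its listed genes is present (isozymes).
REACTIONS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str]]] = {
    "R1": (frozenset({"glc"}), frozenset({"g6p"}), frozenset({"geneA"})),
    "R2": (frozenset({"g6p"}), frozenset({"pyr"}), frozenset({"geneB", "geneC"})),
    "R3": (frozenset({"pyr", "nh3"}), frozenset({"ala"}), frozenset({"geneD"})),
}

def reachable(seeds: Set[str], knocked_out: Set[str]) -> Set[str]:
    """Network expansion: fire reactions until no new metabolite appears."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for subs, prods, genes in REACTIONS.values():
            # Skip reactions whose every catalysing gene has been removed.
            if not (genes - knocked_out):
                continue
            if subs <= available and not prods <= available:
                available |= prods
                changed = True
    return available

def essential_genes(seeds: Set[str], target: str) -> List[str]:
    """Genes whose single deletion makes `target` unreachable from `seeds`."""
    all_genes = sorted(set().union(*(g for _, _, g in REACTIONS.values())))
    return [g for g in all_genes if target not in reachable(seeds, {g})]

print(essential_genes({"glc", "nh3"}, "ala"))  # → ['geneA', 'geneD']
```

In this toy network, geneB and geneC are not reported because each can substitute for the other in R2. That is the kind of redundancy a single-deletion screen is designed to expose.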
The systems-level framework we formulated to identify potential non-toxic drug targets and strategies for circumventing bacterial persistence can substantially aid TB drug discovery and translational research.
Electronic supplementary material
The online version of this article (doi:10.1186/s12967-014-0263-5) contains supplementary material, which is available to authorized users.
Systems biology spindle map; Complexity; Bacterial persistence; Mathematical modeling; Metabolic persister genes
Anurag Agrawal and colleagues describe their experience of setting up a readily deployable cargo container-based health center in rural India.
The explosion of genome sequencing data, along with genotype-to-phenotype correlation studies, has created a data deluge in the biomedical sciences. The aim of the Medical bioinformatics section is to aid the development and maturation of the field by providing a platform for translating these datasets into useful clinical applications. Increasing computing capabilities and the availability of diverse data from advanced technologies will allow researchers to build systems biology models of various diseases, in order to develop new therapeutic interventions efficiently and reduce the currently prohibitive costs of drug discovery.
The section welcomes studies on the development of Biomedical Informatics for translational medicine and clinical applications, including tools, methodologies and data integration.
A decade after the Mycobacterium tuberculosis (Mtb) genome sequence became available, no promising drug has emerged. This indicates not only the challenges in discovering new drugs but also a gap in our current understanding of Mtb biology. We attempt to bridge this gap by carrying out extensive re-annotation and constructing a systems-level protein interaction map of Mtb, with the objective of finding novel drug target candidates. To this end, we combined crowd-sourcing and social networking methods through an initiative, ‘Connect to Decode’ (C2D), to generate the first and largest manually curated interactome of Mtb, termed the ‘interactome pathway’ (IPW), encompassing a total of 1434 proteins connected through 2575 functional relationships. Interactions leading to gene regulation, signal transduction, metabolism and structural complex formation have been catalogued. In the process, we have functionally annotated 87% of the Mtb genome in the context of gene products. We further combine IPW with a STRING-based network to report central proteins, which may be assessed as potential drug targets for the development of drugs with the fewest possible side effects. The fact that five of the 17 predicted drug targets have already been validated experimentally, either genetically or biochemically, lends credence to our unique approach.
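The abstract does not say which centrality measures were computed on the combined IPW/STRING network. As one hedged illustration, betweenness centrality measures how often a protein lies on shortest paths between other proteins, and it can be computed with Brandes' algorithm. The sketch below uses plain Python on a hypothetical five-protein network. The protein names are placeholders, not Mtb genes.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}

# Hypothetical five-protein network; identifiers are placeholders.
graph = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P5")])
scores = betweenness(graph)
ranked = sorted(scores, key=lambda p: (-scores[p], p))
print(ranked[:2])  # → ['P2', 'P3']
```

Other measures, such as degree or closeness, could be ranked in the same way. Which measure best predicts drug-target suitability is a separate question that the sketch does not address.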
Of the ∼4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential for a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation of the genome using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding-site-based ligand association. New algorithms for binding site detection and genome-scale binding site comparison at the structural level, recently reported from our laboratory, were utilized. In addition, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights were also obtained into the folds that predominate in the genome and the fold combinations that make up multi-domain proteins. In total, 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.
MicroRNAs (miRNAs) regulate several biological processes through post-transcriptional gene silencing. The efficiency of binding of miRNAs to target transcripts depends on the sequence as well as intramolecular structure of the transcript. Single Nucleotide Polymorphisms (SNPs) can contribute to alterations in the structure of regions flanking them, thereby influencing the accessibility for miRNA binding.
The entire human genome was analyzed for SNPs in and around predicted miRNA target sites. Polymorphisms within 200 nucleotides that could alter the intramolecular structure at the target site, and thereby alter regulation, were annotated. The collated information was ported into a MySQL database with a user-friendly interface accessible through the URL: .
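The windowing step can be sketched under two assumptions. First, "within 200 nucleotides" is taken to mean a SNP inside the predicted site or up to 200 nt on either side of it, boundaries inclusive. Second, sites and SNPs are assumed to share one coordinate system. Under these assumptions, sorting the SNPs lets each site's window be found by binary search. The coordinates below are hypothetical, and the structural-change filter applied in the study is not modelled.

```python
import bisect
from typing import List, Tuple

FLANK = 200  # window in nucleotides, taken from the text

def snps_near_sites(
    sites: List[Tuple[int, int]], snps: List[int], flank: int = FLANK
) -> List[Tuple[Tuple[int, int], List[int]]]:
    """For each (start, end) site, return SNP positions inside the site or
    within `flank` nt of either edge (boundaries inclusive)."""
    ordered = sorted(snps)
    result = []
    for start, end in sites:
        lo = bisect.bisect_left(ordered, start - flank)
        hi = bisect.bisect_right(ordered, end + flank)
        result.append(((start, end), ordered[lo:hi]))
    return result

# Hypothetical coordinates on a single transcript.
sites = [(1000, 1021), (5000, 5022)]
snps = [750, 810, 1010, 1221, 1222, 4790]
for site, nearby in snps_near_sites(sites, snps):
    print(site, nearby)
```

Sorting once and bisecting keeps each lookup logarithmic in the number of SNPs. This matters more at genome scale than in this toy example.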
The database interface allows the information to be queried by gene name, microRNA name, polymorphism ID or transcript ID. Combination queries using 'AND' or 'OR' are also possible, and users can specify the degree of change in intramolecular bonding with and without the polymorphism. Such a resource would enable researchers to address questions such as the role of regulatory SNPs in 3' UTRs and population-specific regulatory modulation in the context of microRNA targets.
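The combination queries can be illustrated with Python's built-in sqlite3 module. This is a sketch only. The actual database runs on MySQL, and the table name, the column names, the delta_bonds measure and the rs identifiers below are all hypothetical placeholders.

```python
import sqlite3

# Hypothetical schema; the real database's tables and columns are not given.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE target_snps (
    gene TEXT, mirna TEXT, snp_id TEXT, transcript_id TEXT,
    delta_bonds INTEGER  -- hypothetical: change in intramolecular base pairs
)""")
rows = [
    ("GENE1", "miR-A", "rs0001", "TX1", 4),
    ("GENE1", "miR-B", "rs0002", "TX1", 1),
    ("GENE2", "miR-A", "rs0003", "TX2", 6),
]
conn.executemany("INSERT INTO target_snps VALUES (?, ?, ?, ?, ?)", rows)

# Combination query: (gene = ? OR mirna = ?) AND |change in bonding| >= ?
query = """SELECT snp_id FROM target_snps
           WHERE (gene = ? OR mirna = ?) AND ABS(delta_bonds) >= ?
           ORDER BY snp_id"""
result = [r[0] for r in conn.execute(query, ("GENE1", "miR-A", 3))]
print(result)  # → ['rs0001', 'rs0003']
conn.close()
```

Parameter placeholders (`?`) keep user-supplied names out of the SQL string, which is the usual safeguard for a query form of this kind.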
Cellular miRNAs play an important role in the regulation of gene expression in eukaryotes. Recently, miRNAs have also been shown to target and inhibit viral gene expression. Earlier computational predictions revealed that the HIV-1 genome includes regions that may potentially be targeted by human miRNAs. Here we report the functionality of the predicted miR-29a target site in the HIV-1 nef gene.
We find that the human miRNAs hsa-miR-29a and 29b are expressed in human peripheral blood mononuclear cells. Expression of a luciferase reporter bearing the nef miR-29a target site was decreased compared to the luciferase construct without the target site. Locked nucleic acid modified anti-miRNAs targeted against hsa-miR-29a and 29b specifically reversed the inhibitory effect mediated by cellular miRNAs on the target site. Ectopic expression of the miRNA results in repression of the target Nef protein and reduction of virus levels.
Our results show that the cellular miRNA hsa-miR-29a downregulates the expression of Nef protein and interferes with HIV-1 replication.
The Human Genome Variation database of Genotype to Phenotype information (HGVbaseG2P) is a new central database for summary-level findings produced by human genetic association studies, both large and small. Such a database is needed so that researchers have an easy way to access all the available association study data relevant to their genes, genome regions or diseases of interest. Such a repository will allow true positive signals to be more readily distinguished from false positives (type I errors) that fail to replicate consistently. In this paper we describe how HGVbaseG2P has been constructed and how its data are gathered and organized. We present a range of user-friendly but powerful website tools for searching, browsing and visualizing G2P study findings. HGVbaseG2P is available at http://www.hgvbaseg2p.org.
Ayurveda is an ancient system of personalized medicine documented and practiced in India since 1500 B.C. According to this system, an individual's basic constitution largely determines predisposition and prognosis to diseases as well as therapy and lifestyle regime. Ayurveda describes seven broad constitution types (Prakritis), each with a varying degree of predisposition to different diseases. Amongst these, the three most contrasting types, Vata, Pitta and Kapha, are the most vulnerable to diseases. In the realm of modern predictive medicine, efforts are being directed towards capturing disease phenotypes with greater precision in order to identify markers for prospective disease conditions. In this study, we explore whether the different constitution types described in Ayurveda have molecular correlates.
Normal individuals of the three most contrasting constitutional types were identified, following the phenotyping criteria described in Ayurveda, in an Indian population of Indo-European origin. Peripheral blood samples from these individuals were analysed for genome-wide expression levels and for biochemical and hematological parameters. Gene Ontology (GO) and pathway-based analyses were carried out on differentially expressed genes to explore whether functional categories were significantly enriched among Prakriti types.
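The abstract does not name the statistical test used for GO and pathway enrichment. A common choice is the one-sided hypergeometric test, sketched below with Python's math.comb (Python 3.8+). The counts are hypothetical. In practice, the p-values across many terms would also need multiple-testing correction.

```python
from math import comb

def enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: differentially expressed genes, k: of those, annotated to the term.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts for one GO term: 3 of 4 DE genes carry a term
# that annotates 5 of 20 background genes.
p = enrichment_p(N=20, K=5, n=4, k=3)
print(round(p, 4))  # → 0.032
```

Summing exact integer counts before dividing avoids the rounding error that would accumulate from adding many small floating-point probabilities.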
Individuals from the three most contrasting constitutional types exhibit striking differences in biochemical and hematological parameters and in genome-wide expression levels. Biochemical profiles such as liver function tests and lipid profiles, and hematological parameters such as haemoglobin, differed between Prakriti types. Functional categories of genes showing differential expression among Prakriti types were significantly enriched in core biological processes such as transport, regulation of cyclin-dependent protein kinase activity, immune response and regulation of blood coagulation. A significant enrichment of housekeeping, disease-related and hub genes was observed in these extreme constitution types.
The Ayurveda-based method of phenotypic classification of extreme constitutional types allows us to uncover genes that may contribute to system-level differences in normal individuals, which could lead to differential disease predisposition. This is a first attempt towards unraveling the clinical phenotyping principle of a traditional system of medicine in terms of modern biology. An integration of Ayurveda with genomics holds potential and promise for future predictive medicine.
Expansion of trinucleotide repeats in coding and non-coding regions of genes is associated with sixteen neurodegenerative disorders. However, the molecular effects that lead to neurodegeneration have remained elusive. We have explored the role of transcriptional dysregulation by TATA-box binding protein (TBP) containing an expanded polyglutamine stretch in a mouse neuronal cell culture-based model. We find that mouse neuronal cells expressing a variant of human TBP harboring an abnormally expanded polyQ tract not only form intranuclear aggregates but also show transcriptional dysregulation of the voltage-dependent anion channel Vdac1, increased cytochrome c release from the mitochondria and upregulation of genes involved in localized neuronal translation. In contrast, the unfolded protein response appeared to be unaffected. Consistent with an increased transcriptional effect, we observe elevated promoter occupancy by TBP in vivo at both TATA-containing and TATA-less promoters of differentially expressed genes. Our study suggests a link between transcriptional dysfunction and cell death in trinucleotide repeat-mediated neuronal dysfunction through the voltage-dependent anion channel Vdac1, which has recently been recognized as a critical determinant of cell death.
Quantitative variation in gene expression has been proposed to underlie phenotypic variation among human individuals. A facilitating step towards understanding the basis for gene expression variability is associating genome wide transcription patterns with potential cis modifiers of gene expression.
EXPOLDB, a novel database, addresses this need by providing information on the variability of gene expression levels across individuals, as well as on the presence and features of potentially polymorphic (TG/CA)n repeats. EXPOLDB thus enables associating transcription levels with the presence and length of (TG/CA)n repeats. One of the unique features of this database is the display of expression data for 5 pairs of monozygotic twins, which allows identification of genes whose expression variability is influenced by non-genetic factors, including environment. In addition to queries by gene name, EXPOLDB allows queries by pathway name. Users can also upload their list of HGNC (HUGO (The Human Genome Organisation) Gene Nomenclature Committee) symbols to interrogate expression patterns. The online application 'SimRep' can be used to find simple repeats in a given nucleotide sequence. To illustrate the primary applications, case examples of housekeeping genes and the RUNX gene family, as well as an example of glycolytic pathway genes, are provided.
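SimRep's algorithm and thresholds are not described here. As a minimal, hypothetical sketch, (TG)n and (CA)n runs can be located with a regular expression. The default minimum of 6 repeat units is an assumption chosen for illustration.

```python
import re
from typing import List, Tuple

def find_tg_ca_repeats(seq: str, min_units: int = 6) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, unit count) for each uninterrupted run of
    at least `min_units` TG units or CA units. (TG)n on one strand reads as
    (CA)n on the complementary strand, so both are reported."""
    pattern = re.compile(r"(?:TG){%d,}|(?:CA){%d,}" % (min_units, min_units))
    hits = []
    for m in pattern.finditer(seq.upper()):
        run = m.group()
        hits.append((m.start(), run[:2], len(run) // 2))
    return hits

# Hypothetical sequence: (TG)8, a short spacer, then (CA)5.
seq = "GGA" + "TG" * 8 + "CCT" + "CA" * 5 + "AAA"
print(find_tg_ca_repeats(seq))               # → [(3, 'TG', 8)]
print(find_tg_ca_repeats(seq, min_units=5))  # → [(3, 'TG', 8), (22, 'CA', 5)]
```

Because the quantifiers are greedy, each match reports the full run length. That length is the property EXPOLDB relates to expression levels.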
The uniqueness of EXPOLDB lies in facilitating the association of genome-wide transcription variation with the presence and type of polymorphic repeats, while offering a feature for identifying genes whose expression variability is influenced by non-genetic factors, including environment. In addition, the database allows comprehensive querying, including functional information on the biochemical pathways of human genes.
EXPOLDB can be accessed at
MicroRNAs (miRNAs) are a new class of 18–23 nucleotide long non-coding RNAs that play critical roles in a wide spectrum of biological processes. Recent reports also throw light on the role of microRNAs as critical effectors in intricate host-pathogen interaction networks. Evidence suggests that both viruses and hosts encode microRNAs. The exclusive dependence of viruses on the host cellular machinery for their propagation and survival also makes them highly susceptible to the vagaries of the cellular environment, such as small RNA-mediated interference. It also gives the virus an opportunity to fight and/or modulate the host to suit its needs. Thus the range of interactions possible through miRNA-mRNA cross-talk at the host-pathogen interface is large. These interactions can be further fine-tuned in the host by changes in gene expression, mutations and polymorphisms. In the pathogen, the high rate of mutation adds to the complexity of the interaction network. Although evidence of microRNA-mediated cross-talk in viral infections is only now emerging, it offers an immense opportunity to understand the intricacies of host-pathogen interactions, to find possible explanations for viral tropism, latency and oncogenesis, and to develop novel biomarkers and therapeutics.
The creation of human gene families was facilitated significantly by gene duplication and diversification. (TG/CA)n repeats exhibit length variability, are distributed genome-wide and are abundant in the human genome. Accumulating evidence for their multiple functional roles, including regulation of transcription and stimulation of recombination and splicing, marks them as functional elements. Here, we report an analysis of the distribution of (TG/CA)n repeats in human gene families.
The 1,317 human gene families were classified into six functional classes. The distribution of (TG/CA)n repeats was analyzed both from a global perspective and from a stratified perspective based on their biological properties. The number of genes with repeats decreased with increasing repeat length, and many genes (53%) had repeats of multiple types in various combinations. Repeats were positively associated with the class of Signaling and communication, whereas they were negatively associated with the classes of Immune and related functions and of Information. The proportion of genes with (TG/CA)n repeats in each class was proportional to the corresponding average gene length. The repeat distribution pattern in large gene families generally mirrored the global distribution pattern but differed notably for the Collagen gene family, which was rich in repeats. The position and flanking sequences of the repeats in Collagen genes were highly conserved in the chimpanzee genome; however, the majority of these repeats displayed length polymorphism.
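A simple null model helps show why the fraction of genes carrying repeats could track gene length. This model is assumed here for illustration and is not taken from the study. If repeats occurred independently at a constant density λ per kilobase, a gene of length L kb would carry at least one repeat with probability 1 − exp(−λL), which is approximately λL when λL is small.

```python
import math

def p_at_least_one_repeat(length_kb: float, density_per_kb: float) -> float:
    """P(gene carries >= 1 repeat) if repeats fall as a Poisson process."""
    return 1.0 - math.exp(-density_per_kb * length_kb)

# Hypothetical density of 0.02 repeats per kb; compare with the linear approximation.
DENSITY = 0.02
for length_kb in (1.0, 5.0, 20.0, 80.0):
    exact = p_at_least_one_repeat(length_kb, DENSITY)
    linear = DENSITY * length_kb
    print(f"{length_kb:5.1f} kb  exact={exact:.3f}  linear={linear:.3f}")
```

Under this model, strict proportionality to gene length is expected only while λL stays small. For longer genes, the probability saturates towards 1.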
The positive association of repeats with genes of Signaling and communication points to their role in modulating transcription. The negative association of repeats with genes of Information relates to their smaller gene length, higher expression and fundamental role in cellular physiology. In genes of Immune and related functions, the negative association of repeats perhaps relates to smaller gene length and to the directional nature of the recombinogenic processes that generate immune diversity. Thus, multiple factors, including gene length, function and the directionality of recombinogenic processes, have shaped the observed distribution of (TG/CA)n repeats. Furthermore, the distribution of repeat patterns is consistent with the current model that long repeats tend to contract more than expand, whereas the reverse dynamics operate in short repeats.
Global regulatory mechanisms involving chromatin assembly and remodelling in the promoter regions of genes are implicated in eukaryotic transcription control, especially for genes subject to spatial and temporal regulation. The potential to use global regulatory mechanisms for controlling gene expression might depend on the architecture of the chromatin in and around the gene. In silico analysis can yield important insights into this aspect by facilitating the comparison of two or more classes of genes, each comprising a large number of genes.
In the present study, we carried out a comparative analysis of chromatin characteristics, in terms of scaffold/matrix attachment regions, nucleosome formation potential and the occurrence of repetitive sequences, in the upstream regulatory regions of housekeeping and tissue-specific genes. Our data show that putative scaffold/matrix attachment regions are more abundant, and nucleosome formation potential is higher, in the 5' regions of tissue-specific genes than in those of housekeeping genes.
The differences in chromatin features between the two groups of genes indicate that chromatin organisation is involved in the control of gene expression. Global regulatory mechanisms mediated through chromatin organisation can decrease the need to invoke gene-specific regulators for maintaining the active or silenced state of gene expression. This could partially explain the relatively low number of genes estimated for the human genome.
The primate-specific Alu elements, which originated 65 million years ago, are present in over a million copies in the human genome. These elements have been involved in genome shuffling and various diseases, not only through retrotransposition but also through large-scale Alu-Alu-mediated recombination. Only a few subfamilies of Alus are currently retropositionally active and show insertion/deletion polymorphisms with associated phenotypes. Retroposition occurs by means of RNA intermediates transcribed from an RNA polymerase III promoter residing in the A-Box and B-Box of these elements. Alus have also been shown to harbour a number of transcription factor binding sites, as well as hormone-responsive elements. The distribution of Alus in the human genome has been shown to be non-random, and these elements are increasingly being implicated in diverse functions such as transcription, translation, response to stress, nucleosome positioning and imprinting.
We conducted a retrospective analysis of putative functional sites, such as RNA pol III promoter elements and pol II regulatory elements like hormone-responsive elements and ligand-activated receptor binding sites, in Alus of various evolutionary ages. We observe a progressive loss of RNA pol III transcriptional potential with a concomitant accumulation of RNA pol II regulatory sites. We also observe a significant over-representation of Alus harboring these sites in the promoter regions of signaling and metabolism genes on chromosome 22, compared with genes encoding information pathway components and structural and transport proteins. This difference between functional categories is less significant in the intronic regions of the same genes.
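Scanning Alu sequences for promoter elements amounts to searching for degenerate consensus motifs. The minimal sketch below assumes that motifs are written in IUPAC nucleotide codes. It converts each motif to a regular expression and reports overlapping matches. The B-box-like pattern GGTTCGANTCC is an illustrative placeholder and not necessarily the motif definition used in the study.

```python
import re
from typing import List

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into an equivalent regular expression."""
    return "".join(IUPAC[c] for c in motif.upper())

def scan(seq: str, motif: str) -> List[int]:
    """0-based start positions of (possibly overlapping) motif matches."""
    # A zero-width lookahead lets finditer report overlapping hits.
    pattern = re.compile("(?=(%s))" % iupac_to_regex(motif))
    return [m.start() for m in pattern.finditer(seq.upper())]

MOTIF = "GGTTCGANTCC"  # placeholder B-box-like consensus (illustrative only)
seq = "AAGGTTCGAATCCTTGGTTCGAGTCC"  # hypothetical sequence
print(scan(seq, MOTIF))  # → [2, 15]
```

Exact consensus matching is the simplest option. Position weight matrices would be a common alternative when partial matches to a degenerate element also need to be scored.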
Our study clearly suggests that Alu elements, through retrotransposition, could distribute functional and regulatable promoter elements, which in the course of subsequent selection might be stabilized in the genome. Exaptation of regulatory elements in pre-existing genes through Alus could thus have contributed to the evolution of novel regulatory networks in primate genomes. With such a wide spectrum of regulatory sites present in Alus, it also becomes imperative to screen for variations in these sites in candidate genes, since Alus are otherwise repeat-masked in studies aimed at identifying predisposition markers.
Our recent work on an A→G single nucleotide polymorphism (SNP) at the quasi-palindromic sequence d(TGGGG[A/G]CCCCA) of HS4 of the human β-globin locus control region in an Indian population showed a significant association between the G allele and the occurrence of β-thalassemia. Using UV-thermal denaturation, gel assays, circular dichroism (CD) and nuclease digestion experiments, we have demonstrated that the undecamer quasi-palindromic sequence d(TGGGGACCCCA) (HPA11) and its reported polymorphic (SNP) version d(TGGGGGCCCCA) (HPG11) exist in hairpin–duplex equilibria. The biphasic nature of the melting profiles of both oligonucleotides persisted at low as well as high salt concentrations. The HPG11 hairpin showed a higher Tm than HPA11. The presence of unimolecular and bimolecular species was also shown by non-denaturing gel electrophoresis experiments. The CD spectra of both oligonucleotides showed features of both A- and B-type conformations and, moreover, exhibited a concentration dependence. The disappearance of the 265 nm positive CD signal in an oligomer concentration-dependent manner is indicative of an A→B transition. The results give unprecedented insight into the in vitro structure of the quasi-palindromic sequence and provide the first report in which a hairpin–duplex equilibrium has been correlated with an A→B interconversion of DNA. Nuclease digestion suggests that HPG11 is more resistant to nucleases than HPA11. Multiple sequence alignment of the HS4 region of the β-globin gene cluster from different organisms revealed that this quasi-palindromic stretch is unique to Homo sapiens. We propose that quasi-palindromic sequences may form stable mini-hairpins or cruciforms in the HS4 region and might play a role in regulating β-globin gene expression by affecting the binding of transcription factors.
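The sequence features of the two undecamers can be checked directly. The sketch below compares each sequence with its reverse complement and computes its GC content. Both HPA11 and HPG11 differ from their reverse complement only at the central position 6, where the A/G SNP lies, and the G allele raises the GC content from 8/11 to 9/11. GC content is only one generic contributor to DNA thermal stability, and the abstract does not attribute the higher Tm of HPG11 to it.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def palindrome_mismatches(seq: str) -> List[int]:
    """1-based positions where seq differs from its own reverse complement."""
    rc = reverse_complement(seq)
    return [i + 1 for i, (a, b) in enumerate(zip(seq, rc)) if a != b]

def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)

for name, seq in (("HPA11", "TGGGGACCCCA"), ("HPG11", "TGGGGGCCCCA")):
    print(name, palindrome_mismatches(seq), round(gc_fraction(seq), 3))
# prints:
# HPA11 [6] 0.727
# HPG11 [6] 0.818
```

An odd-length sequence can never be a perfect reverse-complement palindrome, because its central base would have to pair with itself. The single central mismatch is therefore consistent with the term "quasi-palindromic".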
Poly purine.pyrimidine sequences have the potential to adopt intramolecular triplex structures and are overrepresented upstream of genes in eukaryotes. These sequences may regulate gene expression by modulating the interaction of transcription factors with DNA sequences upstream of genes.
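Candidate poly purine.pyrimidine tracts can be located with a simple scan for runs of only A/G or, equivalently on the opposite strand, only C/T. The 15-nt minimum length used here is hypothetical. Intramolecular triplex (H-DNA) formation is generally favoured at mirror-repeat tracts, so a separate mirror-symmetry check is included rather than built into the scan.

```python
import re
from typing import List, Tuple

def pu_py_tracts(seq: str, min_len: int = 15) -> List[Tuple[int, int, str]]:
    """Return (0-based start, end-exclusive, kind) for each run of only A/G
    ('purine') or only C/T ('pyrimidine') of at least `min_len` nt."""
    pattern = re.compile(r"[AG]{%d,}|[CT]{%d,}" % (min_len, min_len))
    hits = []
    for m in pattern.finditer(seq.upper()):
        kind = "purine" if m.group()[0] in "AG" else "pyrimidine"
        hits.append((m.start(), m.end(), kind))
    return hits

def is_mirror_repeat(tract: str) -> bool:
    """True if the tract reads the same forwards and backwards on one strand."""
    return tract == tract[::-1]

# Hypothetical sequence with a 20-nt purine run and a 9-nt pyrimidine run.
seq = "TT" + "GA" * 10 + "C" + "CT" * 4 + "A"
print(pu_py_tracts(seq))             # → [(2, 22, 'purine')]
print(is_mirror_repeat("AGGAAGGA"))  # → True
```

Keeping the tract scan and the symmetry test separate lets each criterion be tuned or dropped independently.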
A poly purine.pyrimidine sequence with the potential to adopt an intramolecular triplex DNA structure was designed. The sequence was inserted within a nucleosome positioned upstream of the β-galactosidase gene in the yeast Saccharomyces cerevisiae, between the cyc1 promoter and the gal10 Upstream Activating Sequences (UASg). Upon derepression with galactose, β-galactosidase gene expression is reduced 12-fold in cells carrying single-copy poly purine.pyrimidine sequences. This reduction in expression correlates with reduced transcription. Furthermore, we show that plasmids carrying a poly purine.pyrimidine sequence are not specifically lost from yeast cells.
We propose that a poly purine.pyrimidine sequence upstream of a gene affects transcription. Plasmids carrying this sequence are not specifically lost from cells and thus no additional effort is needed for the replication of these sequences in eukaryotic cells.