The explosion of genome sequencing data, along with genotype-to-phenotype correlation studies, has created a data deluge in the biomedical sciences. The aim of the Medical Bioinformatics section is to aid the development and maturation of the field by providing a platform for the translation of these datasets into useful clinical applications. The increase in computing capabilities and the availability of diverse data from advanced technologies will allow researchers to build Systems Biology models of various diseases in order to develop new therapeutic interventions efficiently and reduce the currently prohibitive costs of drug discovery.
The section welcomes studies on the development of Biomedical Informatics for translational medicine and clinical applications, including tools, methodologies and data integration.
Of the ∼4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding site based ligand association. New algorithms for binding site detection and genome scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold-combinations that make up multi-domain proteins are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.
We propose an innovative, integrated, cost-effective health system to combat major non-communicable diseases (NCDs), including cardiovascular, chronic respiratory, metabolic, rheumatologic and neurologic disorders and cancers, which together are the predominant health problem of the 21st century. This proposed holistic strategy involves comprehensive patient-centered integrated care and multi-scale, multi-modal and multi-level systems approaches to tackle NCDs as a common group of diseases. Rather than studying each disease individually, it will take into account their intertwined gene-environment, socio-economic interactions and co-morbidities that lead to individual-specific complex phenotypes. It will implement a road map for predictive, preventive, personalized and participatory (P4) medicine based on a robust and extensive knowledge management infrastructure that contains individual patient information. It will be supported by strategic partnerships involving all stakeholders, including general practitioners associated with patient-centered care. This systems medicine strategy, which will take a holistic approach to disease, is designed to allow the results to be used globally, taking into account the needs and specificities of local economies and health systems.
With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustaceans is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes.
A set of ∼30K unique sequences (UniSeqs) representing ∼19K clusters was generated from ∼98K high quality ESTs from a set of tissue-specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66% of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences, along with annotation results and coordinated cDNA microarray datasets, have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.
The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is as diverse as the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource for the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.
MicroRNAs (miRNAs) regulate several biological processes through post-transcriptional gene silencing. The efficiency of binding of miRNAs to target transcripts depends on the sequence as well as intramolecular structure of the transcript. Single Nucleotide Polymorphisms (SNPs) can contribute to alterations in the structure of regions flanking them, thereby influencing the accessibility for miRNA binding.
The entire human genome was analyzed for SNPs in and around predicted miRNA target sites. Polymorphisms within 200 nucleotides of a target site that could alter its intramolecular structure, and thereby alter regulation, were annotated. The collated information was loaded into a MySQL database with a user-friendly interface accessible through the URL: .
The database has a user-friendly interface where the information can be queried using the gene name, microRNA name, polymorphism ID or transcript ID. Combination queries using 'AND' or 'OR' are also possible, as is specifying the degree of change in intramolecular bonding with and without the polymorphism. Such a resource would enable researchers to address questions like the role of regulatory SNPs in the 3' UTRs and population-specific regulatory modulations in the context of microRNA targets.
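The annotation step described above, flagging polymorphisms that lie within 200 nucleotides of a predicted miRNA target site, can be illustrated with a minimal sketch. The record types, field names and identifiers below are hypothetical and are not taken from the database itself. The sketch covers only the positional filter, not the structural prediction.

```python
from dataclasses import dataclass

WINDOW = 200  # flanking distance in nucleotides, as stated in the text


@dataclass(frozen=True)
class TargetSite:
    """A predicted miRNA target site (hypothetical record layout)."""
    transcript_id: str
    mirna: str
    start: int  # transcript coordinate, inclusive
    end: int    # transcript coordinate, inclusive


@dataclass(frozen=True)
class Snp:
    """A polymorphism mapped onto a transcript (hypothetical record layout)."""
    snp_id: str
    transcript_id: str
    position: int


def snps_near_site(site, snps, window=WINDOW):
    """Return SNPs on the site's transcript lying in or within `window` nt of the site."""
    lo, hi = site.start - window, site.end + window
    return [
        s for s in snps
        if s.transcript_id == site.transcript_id and lo <= s.position <= hi
    ]


# Illustrative data only; none of these identifiers are real.
site = TargetSite("tx1", "miR-x", start=1000, end=1021)
snps = [
    Snp("snp_a", "tx1", 799),   # just outside the upstream window
    Snp("snp_b", "tx1", 800),   # on the upstream window boundary
    Snp("snp_c", "tx1", 1221),  # on the downstream window boundary
    Snp("snp_d", "tx1", 1222),  # just outside the downstream window
    Snp("snp_e", "tx2", 1010),  # different transcript
]
nearby_ids = [s.snp_id for s in snps_near_site(site, snps)]
print(nearby_ids)
```

In a full pipeline, each SNP that passes this filter would then be assessed for its effect on the predicted intramolecular structure at the target site.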
The prevalence of overweight is increasing globally and has become a serious health problem. Low-grade chronic inflammation in overweight subjects is thought to play an important role in disease development. Novel tools to understand these processes are needed. Metabolic profiling is one such tool that can provide novel insights into the impact of treatments on metabolism.
To study the metabolic changes induced by a mild anti-inflammatory drug intervention, plasma metabolic profiling was applied in overweight human volunteers with elevated levels of the inflammatory plasma marker C-reactive protein. Liquid and gas chromatography mass spectrometric methods were used to detect high and low abundant plasma metabolites both in fasted conditions and during an oral glucose tolerance test. This is based on the concept that the resilience of the system can be assessed after perturbing a homeostatic situation.
Metabolic changes were subtle and were only detected using metabolic profiling in combination with an oral glucose tolerance test. The repeated measurements during the oral glucose tolerance test increased statistical power, but the metabolic perturbation also revealed metabolites that respond differentially to the oral glucose tolerance test. Specifically, multiple metabolic intermediates of the glutathione synthesis pathway showed time-dependent suppression in response to the glucose challenge test. The fact that this is an insulin sensitive pathway suggests that inflammatory modulation may alter insulin signaling in overweight men.
Cellular miRNAs play an important role in the regulation of gene expression in eukaryotes. Recently, miRNAs have also been shown to target and inhibit viral gene expression. Earlier computational predictions revealed that the HIV-1 genome includes regions that may be targeted by human miRNAs. Here we report the functionality of a predicted miR-29a target site in the HIV-1 nef gene.
We find that the human miRNAs hsa-miR-29a and 29b are expressed in human peripheral blood mononuclear cells. Expression of a luciferase reporter bearing the nef miR-29a target site was decreased compared to the luciferase construct without the target site. Locked nucleic acid modified anti-miRNAs targeted against hsa-miR-29a and 29b specifically reversed the inhibitory effect mediated by cellular miRNAs on the target site. Ectopic expression of the miRNA results in repression of the target Nef protein and reduction of virus levels.
Our results show that the cellular miRNA hsa-miR-29a downregulates the expression of Nef protein and interferes with HIV-1 replication.
Presence of the human Y chromosome in females with Turner Syndrome (TS) enhances the risk of developing gonadoblastoma, besides causing several other phenotypic abnormalities. In the present study, we analyzed the Y chromosome in 15 clinically diagnosed TS patients and detected high levels of mosaicism, with 45,XO:46,XY = 100:0% in 4; 45,XO:46,XY:46,XX = 4:94:2 in 8; and 45,XO:46,XY:46,XX = 50:30:20 cells in 3 TS patients, unlike previous reports showing 5–8% cells with Y material. Also, no ring, marker or dicentric Y was observed in any of the cases. Of the two TS patients having an intact Y chromosome in >85% cells, one was exceptionally tall. Both patients were positive for SRY, DAZ, CDY1, DBY, UTY and AZFa, b and c specific STSs. Real Time PCR and FISH demonstrated tandem duplication/multiplication of the SRY and DAZ genes. At the sequence level, SRY was normal in 8 TS patients, while the remaining 7 showed either absence of this gene or known and novel mutations within and outside the HMG box. SNV/SFV analysis showed the normal four copies of the DAZ genes in these 8 patients. All the TS patients showed an aplastic uterus with no ovaries and no symptoms of gonadoblastoma. The present study demonstrates new types of polymorphisms, indicating that no two TS patients have identical genotype-phenotype profiles. Thus, a comprehensive analysis of a larger number of samples is warranted to reach a consensus on the loci affected, so that they can be used as potential diagnostic markers.
The Human Genome Variation database of Genotype to Phenotype information (HGVbaseG2P) is a new central database for summary-level findings produced by human genetic association studies, both large and small. Such a database is needed so that researchers have an easy way to access all the available association study data relevant to their genes, genome regions or diseases of interest. Such a depository will allow true positive signals to be more readily distinguished from false positives (type I error) that fail to consistently replicate. In this paper we describe how HGVbaseG2P has been constructed, and how its data are gathered and organized. We present a range of user-friendly but powerful website tools for searching, browsing and visualizing G2P study findings. HGVbaseG2P is available at http://www.hgvbaseg2p.org.
Ayurveda is an ancient system of personalized medicine documented and practiced in India since 1500 B.C. According to this system, an individual's basic constitution to a large extent determines predisposition to and prognosis of diseases, as well as therapy and life-style regime. Ayurveda describes seven broad constitution types (Prakritis), each with a varying degree of predisposition to different diseases. Amongst these, the three most contrasting types, Vata, Pitta and Kapha, are the most vulnerable to diseases. In the realm of modern predictive medicine, efforts are being directed towards capturing disease phenotypes with greater precision for successful identification of markers for prospective disease conditions. In this study, we explore whether the different constitution types described in Ayurveda have molecular correlates.
Normal individuals of the three most contrasting constitutional types were identified, following phenotyping criteria described in Ayurveda, in an Indian population of Indo-European origin. Peripheral blood samples of these individuals were analysed for genome wide expression levels and for biochemical and hematological parameters. Gene Ontology (GO) and pathway based analysis was carried out on differentially expressed genes to explore whether there were significant enrichments of functional categories among Prakriti types.
Individuals from the three most contrasting constitutional types exhibit striking differences with respect to biochemical and hematological parameters and at genome wide expression levels. Biochemical profiles like liver function tests and lipid profiles, and hematological parameters like haemoglobin, exhibited differences between Prakriti types. Functional categories of genes showing differential expression among Prakriti types were significantly enriched in core biological processes like transport, regulation of cyclin dependent protein kinase activity, immune response and regulation of blood coagulation. A significant enrichment of housekeeping, disease related and hub genes was observed in these extreme constitution types.
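Enrichment of a functional category among differentially expressed genes is commonly assessed with a one-sided hypergeometric test. The text does not state which test was used here, so the sketch below is only an illustration of that general technique. The function name and the toy numbers are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided enrichment p-value P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the functional category
    sample:     number of differentially expressed genes
    hits:       differentially expressed genes carrying the category
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 4 background genes, 2 annotated, 2 sampled, both sampled genes annotated.
p_small = hypergeom_enrichment_p(4, 2, 2, 2)
print(p_small)  # 1/6, since only one of the six possible gene pairs is fully annotated
```

In practice, the p-values obtained for many categories would also need a multiple-testing correction before any category is called significantly enriched.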
Ayurveda based method of phenotypic classification of extreme constitutional types allows us to uncover genes that may contribute to system level differences in normal individuals which could lead to differential disease predisposition. This is a first attempt towards unraveling the clinical phenotyping principle of a traditional system of medicine in terms of modern biology. An integration of Ayurveda with genomics holds potential and promise for future predictive medicine.
The chromosome 18q22-23 region has been shown to be implicated in bipolar disorder (BPAD) by several studies. PHLPP1 gene, in the locus (chromosome 18q22-23), is involved in circadian pathways and bears modules like ‘PH domain and leucine rich repeat protein phosphatase’. This gene also contains a polyglutamine (CAG or PolyQ) repeat motif at the carboxyl terminal end. A comparative analysis of the PolyQ repeats of the PHLPP1 gene in humans, non-human primates and other species has been attempted in order to investigate the possible significance of repeat length as seen in other triplet-repeat associated diseases. Sequencing of the CAG repeat in humans and in non-human primates revealed that the CAG repeat is not polymorphic in humans; whereas, in other species it shows an area of high variability, both in length and sequence composition. Despite the conservation of circadian clock components in different species, there is remarkable diversity in the protein structure, regulation and biochemical functions of the circadian orthologs. These can be due to specific adaptations in accordance with the physiology of the particular species providing a species-specific biological advantage.
PHLPP1; bipolar disorder; CAG repeat; PolyQ; phylogenetic analysis; evolution
Expansion of trinucleotide repeats in coding and non-coding regions of genes is associated with sixteen neurodegenerative disorders. However, the molecular effects that lead to neurodegeneration have remained elusive. We have explored the role of transcriptional dysregulation by TATA-box binding protein (TBP) containing an expanded polyglutamine stretch in a mouse neuronal cell culture based model. We find that mouse neuronal cells expressing a variant of human TBP harboring an abnormally expanded polyQ tract not only form intranuclear aggregates, but also show transcription dysregulation of the voltage dependent anion channel, Vdac1, increased cytochrome c release from the mitochondria and upregulation of genes involved in localized neuronal translation. On the other hand, unfolded protein response seemed to be unaffected. Consistent with an increased transcriptional effect, we observe an elevated promoter occupancy by TBP in vivo in TATA containing and TATA-less promoters of differentially expressed genes. Our study suggests a link between transcriptional dysfunction and cell death in trinucleotide repeat mediated neuronal dysfunction through voltage dependent anion channel, Vdac1, which has been recently recognized as a critical determinant of cell death.
Quantitative variation in gene expression has been proposed to underlie phenotypic variation among human individuals. A facilitating step towards understanding the basis for gene expression variability is associating genome wide transcription patterns with potential cis modifiers of gene expression.
EXPOLDB, a novel database, addresses this need by providing information on the variability of gene expression levels across individuals, as well as the presence and features of potentially polymorphic (TG/CA)n repeats. EXPOLDB thus enables associating transcription levels with the presence and length of (TG/CA)n repeats. One of the unique features of this database is the display of expression data for 5 pairs of monozygotic twins, which allows identification of genes whose variability in expression is influenced by non-genetic factors, including environment. In addition to queries by gene name, EXPOLDB allows queries by pathway name. Users can also upload their own list of HUGO (Human Genome Organisation) Gene Nomenclature Committee (HGNC) symbols to interrogate expression patterns. The online application 'SimRep' can be used to find simple repeats in a given nucleotide sequence. To illustrate primary applications, case examples of housekeeping genes and the RUNX gene family, as well as one example of glycolytic pathway genes, are provided.
The uniqueness of EXPOLDB lies in facilitating the association of genome wide transcription variations with the presence and type of polymorphic repeats, while also allowing identification of genes whose expression variability is influenced by non-genetic factors, including environment. In addition, the database allows comprehensive querying, including functional information on biochemical pathways of the human genes.
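As an illustration of what locating (TG/CA)n repeats in a nucleotide sequence involves, the following minimal sketch scans a sequence for uninterrupted TG or CA runs. It is not the SimRep implementation, whose algorithm is not described here. The function name and the minimum repeat length of 6 units are assumptions made for the example.

```python
import re


def find_tg_ca_repeats(sequence, min_units=6):
    """Return (start, end, motif, units) for each uninterrupted TG or CA run.

    Coordinates are 0-based and half-open; `min_units` is an assumed threshold.
    """
    seq = sequence.upper()
    pattern = re.compile(r"(?:TG){%d,}|(?:CA){%d,}" % (min_units, min_units))
    hits = []
    for match in pattern.finditer(seq):
        run = match.group()
        hits.append((match.start(), match.end(), run[:2], len(run) // 2))
    return hits


# Synthetic test sequence: a (TG)7 run, a short (CA)3 run, and a lower-case (CA)6 run.
example = "AAA" + "TG" * 7 + "CCC" + "CA" * 3 + "GGG" + "ca" * 6
repeats = find_tg_ca_repeats(example)
print(repeats)  # [(3, 17, 'TG', 7), (29, 41, 'CA', 6)]
```

Because (TG)n on one strand pairs with (CA)n on the complementary strand, scanning a single strand for both motifs covers the repeat in either orientation.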
EXPOLDB can be accessed at
MicroRNAs (miRNAs) are a new class of 18–23 nucleotide long non-coding RNAs that play critical roles in a wide spectrum of biological processes. Recent reports also shed light on the role of microRNAs as critical effectors in the intricate host-pathogen interaction networks. Evidence suggests that both viruses and hosts encode microRNAs. The exclusive dependence of viruses on the host cellular machinery for their propagation and survival also makes them highly susceptible to the vagaries of the cellular environment, such as small RNA mediated interference. It also gives the virus an opportunity to fight and/or modulate the host to suit its needs. Thus the range of interactions possible through miRNA-mRNA cross-talk at the host-pathogen interface is large. These interactions can be further fine-tuned in the host by changes in gene expression, mutations and polymorphisms. In the pathogen, the high rate of mutations adds to the complexity of the interaction network. Though evidence regarding microRNA mediated cross-talk in viral infections is just emerging, it offers an immense opportunity not only to understand the intricacies of host-pathogen interactions and find possible explanations for viral tropism, latency and oncogenesis, but also to develop novel biomarkers and therapeutics.
Creation of human gene families was facilitated significantly by gene duplication and diversification. The (TG/CA)n repeats exhibit length variability, display genome-wide distribution, and are abundant in the human genome. Accumulating evidence for their multiple functional roles, including regulation of transcription and stimulation of recombination and splicing, marks them as functional elements. Here, we report an analysis of the distribution of (TG/CA)n repeats in human gene families.
The 1,317 human gene families were classified into six functional classes. The distribution of (TG/CA)n repeats was analyzed both from a global perspective and from a stratified perspective based on their biological properties. The number of genes with repeats decreased with increasing repeat length, and many genes (53%) had repeats of multiple types in various combinations. Repeats were positively associated with the class of Signaling and communication, whereas they were negatively associated with the classes of Immune and related functions and of Information. The proportion of genes with (TG/CA)n repeats in each class was proportional to the corresponding average gene length. The repeat distribution pattern in large gene families generally mirrored the global distribution pattern but differed particularly for the Collagen gene family, which was rich in repeats. The position and flanking sequences of the repeats of Collagen genes showed high conservation in the Chimpanzee genome. However, the majority of these repeats displayed length polymorphism.
Positive association of repeats with genes of Signaling and communication points to their role in modulation of transcription. Negative association of repeats with genes of Information relates to their smaller gene length, higher expression and fundamental role in cellular physiology. In genes of Immune and related functions, negative association of repeats perhaps relates to the smaller gene length and the directional nature of the recombinogenic processes that generate immune diversity. Thus, multiple factors, including gene length, function and directionality of recombinogenic processes, steered the observed distribution of (TG/CA)n repeats. Furthermore, the distribution of repeat patterns is consistent with the current model that long repeats tend to contract more than expand, whereas the reverse dynamics operates in short repeats.
Global regulatory mechanisms involving chromatin assembly and remodelling in the promoter regions of genes are implicated in eukaryotic transcription control, especially for genes subjected to spatial and temporal regulation. The potential to utilise global regulatory mechanisms for controlling gene expression might depend upon the architecture of the chromatin in and around the gene. In-silico analysis can yield important insights into this aspect, facilitating comparison of two or more classes of genes, each comprising a large number of genes.
In the present study, we carried out a comparative analysis of chromatin characteristics in terms of the scaffold/matrix attachment regions, nucleosome formation potential and the occurrence of repetitive sequences, in the upstream regulatory regions of housekeeping and tissue specific genes. Our data show that putative scaffold/matrix attachment regions are more abundant and nucleosome formation potential is higher in the 5' regions of tissue specific genes as compared to the housekeeping genes.
The differences in the chromatin features between the two groups of genes indicate the involvement of chromatin organisation in the control of gene expression. The presence of global regulatory mechanisms mediated through chromatin organisation can decrease the burden of invoking gene specific regulators for maintenance of the active/silenced state of gene expression. This could partially explain the lower number of genes estimated in the human genome.
The primate-specific Alu elements, which originated 65 million years ago, exist in over a million copies in the human genome. These elements have been involved in genome shuffling and various diseases, not only through retrotransposition but also through large scale Alu-Alu mediated recombination. Only a few subfamilies of Alus are currently retropositionally active and show insertion/deletion polymorphisms with associated phenotypes. Retroposition occurs by means of RNA intermediates synthesised from an RNA polymerase III promoter residing in the A-Box and B-Box of these elements. Alus have also been shown to harbour a number of transcription factor binding sites, as well as hormone responsive elements. The distribution of Alus has been shown to be non-random in the human genome, and these elements are increasingly being implicated in diverse functions such as transcription, translation, response to stress, nucleosome positioning and imprinting.
We conducted a retrospective analysis of putative functional sites, such as the RNA pol III promoter elements, pol II regulatory elements like hormone responsive elements and ligand-activated receptor binding sites, in Alus of various evolutionary ages. We observe a progressive loss of the RNA pol III transcriptional potential with concomitant accumulation of RNA pol II regulatory sites. We also observe a significant over-representation of Alus harboring these sites in promoter regions of signaling and metabolism genes of chromosome 22, when compared to genes of information pathway components, structural and transport proteins. This difference is not so significant between functional categories in the intronic regions of the same genes.
Our study clearly suggests that Alu elements, through retrotransposition, could distribute functional and regulatable promoter elements, which in the course of subsequent selection might be stabilized in the genome. Exaptation of regulatory elements in the preexisting genes through Alus could thus have contributed to evolution of novel regulatory networks in the primate genomes. With such a wide spectrum of regulatory sites present in Alus, it also becomes imperative to screen for variations in these sites in candidate genes, which are otherwise repeat-masked in studies pertaining to identification of predisposition markers.
Our recent work on an A→G single nucleotide polymorphism (SNP) at the quasi-palindromic sequence d(TGGGG[A/G]CCCCA) of HS4 of the human β-globin locus control region in an Indian population showed a significant association between the G allele and the occurrence of β-thalassemia. Using UV-thermal denaturation, gel assay, circular dichroism (CD) and nuclease digestion experiments, we have demonstrated that the undecamer quasi-palindromic sequence d(TGGGGACCCCA) (HPA11) and its reported polymorphic (SNP) version d(TGGGGGCCCCA) (HPG11) exist in hairpin–duplex equilibria. The biphasic nature of the melting profiles for both oligonucleotides persisted at low as well as high salt concentrations. The HPG11 hairpin showed a higher Tm than HPA11. The presence of unimolecular and bimolecular species was also shown by non-denaturing gel electrophoresis experiments. The CD spectra of both oligonucleotides showed features of the A- as well as B-type conformations and, moreover, exhibited a concentration dependence. The disappearance of the 265 nm positive CD signal in an oligomer concentration-dependent manner is indicative of an A→B transition. The results give unprecedented insight into the in vitro structure of the quasi-palindromic sequence and provide the first report in which a hairpin–duplex equilibrium has been correlated with an A→B interconversion of DNA. The nuclease-dependent degradation suggests that HPG11 is more resistant to nuclease than HPA11. Multiple sequence alignment of the HS4 region of the β-globin gene cluster from different organisms revealed that this quasi-palindromic stretch is unique to Homo sapiens. We propose that quasi-palindromic sequences may form stable mini-hairpins or cruciforms in the HS4 region and might play a role in regulating β-globin gene expression by affecting the binding of transcription factors.
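The quasi-palindromic character of the two undecamers can be checked by comparing each sequence with its own reverse complement. A perfect DNA palindrome equals its reverse complement, whereas a quasi-palindrome differs at a few positions. The sketch below is an illustrative check only, with helper names chosen for the example, and it says nothing about the experimentally observed structures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def palindrome_mismatches(seq):
    """Return 0-based positions where a sequence differs from its reverse complement."""
    upper = seq.upper()
    rc = reverse_complement(upper)
    return [i for i, (a, b) in enumerate(zip(upper, rc)) if a != b]


HPA11 = "TGGGGACCCCA"  # A allele, sequence as given in the text
HPG11 = "TGGGGGCCCCA"  # G allele, sequence as given in the text

mismatches_a = palindrome_mismatches(HPA11)
mismatches_g = palindrome_mismatches(HPG11)
print(mismatches_a, mismatches_g)  # [5] [5]
```

For both alleles the only mismatch is the central position, which is also the polymorphic site. The five nucleotides on either side are fully self-complementary.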
Poly purine.pyrimidine sequences have the potential to adopt intramolecular triplex structures and are overrepresented upstream of genes in eukaryotes. These sequences may regulate gene expression by modulating the interaction of transcription factors with DNA sequences upstream of genes.
A poly purine.pyrimidine sequence with the potential to adopt an intramolecular triplex DNA structure was designed. The sequence was inserted within a nucleosome positioned upstream of the β-galactosidase gene in the yeast Saccharomyces cerevisiae, between the cyc1 promoter and the gal10 Upstream Activating Sequences (UASg). Upon derepression with galactose, β-galactosidase gene expression was reduced 12-fold in cells carrying single copy poly purine.pyrimidine sequences. This reduction in expression correlated with reduced transcription. Furthermore, we show that plasmids carrying a poly purine.pyrimidine sequence are not specifically lost from yeast cells.
We propose that a poly purine.pyrimidine sequence upstream of a gene affects transcription. Plasmids carrying this sequence are not specifically lost from cells and thus no additional effort is needed for the replication of these sequences in eukaryotic cells.