Metazoan genomes contain many ultra-conserved elements (UCEs), long sequences identical between distant species. In this study we identified UCEs in drosophilid and vertebrate species with a similar level of phylogenetic divergence measured at protein-coding regions, and demonstrated that both the length and number of UCEs are larger in vertebrates. The proportion of non-exonic UCEs declined in distant drosophilids, whereas the opposite trend was observed in vertebrates. We generated a set of 2,126 Sophophora UCEs by merging elements identified in several Drosophila species and compared these to the eutherian UCEs identified in placental mammals. In contrast to vertebrates, the Sophophora UCEs are depleted around transcription start sites. Analysis of 52,954 P-element, piggyBac and Minos insertions in the D. melanogaster genome revealed depletion of the P-element and piggyBac insertions in and around the Sophophora UCEs. We examined eleven fly strains with transposon insertions into the intergenic UCEs and identified associated phenotypes in five strains. Four insertions behave as recessive lethals, and in one case we observed a suppression of the marker gene within the transgene, presumably by silenced chromatin around the integration site. To confirm that the lethality is caused by integration of transposons, we performed a phenotype rescue experiment for two stocks and demonstrated that the excision of the transposons from the intergenic UCEs restores viability. Sequencing of DNA after the transposon excision in one fly strain with the restored viability revealed a 47 bp insertion at the original transposon integration site, suggesting that the nature of the mutation is important for the appearance of the phenotype. Our results suggest that the UCEs in flies and vertebrates have both common and distinct features, and demonstrate that a significant proportion of intergenic Drosophila UCEs are sensitive to disruption.
An expansion of the hexanucleotide GGGGCC repeat in the first intron of the C9ORF72 gene was recently linked to amyotrophic lateral sclerosis (ALS). It is not known whether the mutation results in a gain of function, a loss of function, or whether perhaps both mechanisms are linked to pathogenesis. We generated a genetic model of ALS to explore the biological consequences of a null mutation of the Caenorhabditis elegans C9ORF72 orthologue, F18A1.6, also called alfa-1. alfa-1 mutants displayed age-dependent motility defects leading to paralysis and the specific degeneration of GABAergic motor neurons. alfa-1 mutants showed differential susceptibility to environmental stress, with osmotic stress provoking neurodegeneration. Finally, we observed that the motor defects caused by loss of alfa-1 were additive with the toxicity caused by mutant TDP-43 proteins, but not by mutant FUS proteins. These data suggest that a loss of alfa-1/C9ORF72 expression may contribute to motor neuron degeneration in a pathway associated with other known ALS genes.
Transgene technology is one of the most heavily relied upon tools in modern biological research. Expression of an exogenous gene within cells, for research and therapeutic applications, nearly always includes promoters and other regulatory sequences. We found that repeats of a non-protein coding transgenic sequence produced profound changes to the behavior of the nematode Caenorhabditis elegans. These changes were produced by a glial promoter sequence but, unexpectedly, major deficits were observed specifically in backward locomotion, a neuron-driven behavior. We also present evidence that this behavioral phenotype is transpromoter copy number-dependent and manifests early in development and is maintained into adulthood of the worm.
Genetic and chemical biology screens of C. elegans have been of enormous benefit in providing fundamental insight into neural function and neuroactive drugs. Recently the exploitation of microfluidic devices has added greater power to this experimental approach providing more discrete and higher throughput phenotypic analysis of neural systems. Here we make a significant addition to this repertoire through the design of a semi-automated microfluidic device, NeuroChip, which has been optimised for selecting worms based on the electrophysiological features of the pharyngeal neural network. We demonstrate this device has the capability to sort mutant from wild-type worms based on high definition extracellular electrophysiological recordings. NeuroChip resolves discrete differences in excitatory, inhibitory and neuromodulatory components of the neural network from individual animals. Worms may be fed into the device consecutively from a reservoir and recovered unharmed. It combines microfluidics with integrated electrode recording for sequential trapping, restraining, recording, releasing and recovering of C. elegans. Thus mutant worms may be selected, recovered and propagated enabling mutagenesis screens based on an electrophysiological phenotype. Drugs may be rapidly applied during the recording thus permitting compound screening. For toxicology, this analysis can provide a precise description of sub-lethal effects on neural function. The chamber has been modified to accommodate L2 larval stages showing applicability for small size nematodes including parasitic species which otherwise are not tractable to this experimental approach. We also combine NeuroChip with optogenetics for targeted interrogation of the function of the neural circuit. NeuroChip thus adds a new tool for exploitation of C. elegans and has applications in neurogenetics, drug discovery and neurotoxicology.
The C. elegans genome has been extensively annotated by the WormBase consortium that uses state of the art bioinformatics pipelines, functional genomics and manual curation approaches. As a result, the identification of novel genes in silico in this model organism is becoming more challenging requiring new approaches. The Oligonucleotide-oligosaccharide binding (OB) fold is a highly divergent protein family, in which protein sequences, in spite of having the same fold, share very little sequence identity (5–25%). Therefore, evidence from sequence-based annotation may not be sufficient to identify all the members of this family. In C. elegans, the number of OB-fold proteins reported is remarkably low (n = 46) compared to other evolutionary-related eukaryotes, such as yeast S. cerevisiae (n = 344) or fruit fly D. melanogaster (n = 84). Gene loss during evolution or differences in the level of annotation for this protein family, may explain these discrepancies.
This study examines the possibility that novel OB-fold coding genes exist in the worm. We developed a bioinformatics approach that uses the most sensitive sequence-sequence, sequence-profile and profile-profile similarity search methods, followed by 3D-structure prediction as a filtering step to eliminate false positive candidate sequences. We have predicted 18 coding genes containing the OB-fold that, remarkably, have been only partially characterized in C. elegans.
This study raises the possibility that the annotation of highly divergent protein fold families can be improved in C. elegans. Similar strategies could be implemented for large scale analysis by the WormBase consortium when novel versions of the genome sequence of C. elegans, or other evolutionary related species are being released. This approach is of general interest to the scientific community since it can be used to annotate any genome.
Human gene catalogs are fundamental to the study of human biology and medicine. But they are all based on open reading frames (ORFs) in a reference genome sequence (with allowance for introns). Individual genomes, however, are polymorphic: their sequences are not identical. There has been much research on how polymorphism affects previously-identified genes, but no research has been done on how it affects gene identification itself. We computationally predict protein-coding genes in a straightforward manner, by finding long ORFs in mRNA sequences aligned to the reference genome. We systematically test the effect of known polymorphisms with this procedure. Polymorphisms can not only disrupt ORFs, they can also create long ORFs that do not exist in the reference sequence. We found 5,737 putative protein-coding genes that do not exist in the reference, whose protein-coding status is supported by homology to known proteins. On average 10% of these genes are located in the genomic regions devoid of annotated genes in 12 other catalogs. Our statistical analysis showed that these ORFs are unlikely to occur by chance.
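The ORF-based procedure described above can be illustrated with a minimal sketch. It assumes a simplified definition of an ORF (an ATG followed in frame by a stop codon, forward strand only) and uses a toy sequence, a toy single-nucleotide variant and a hypothetical length threshold (`min_codons`); none of these values come from the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(mrna, min_codons=3):
    """Return (start, end, frame) for each ATG-to-stop ORF on the forward
    strand that has at least min_codons codons (stop codon excluded)."""
    seq = mrna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # resume scanning after the stop
    return orfs

def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]

# Toy reference transcript: ATG AAA TAG CCC GGG TTT TAA
reference = "ATGAAATAGCCCGGGTTTTAA"
# Toy polymorphism: T->C at position 6 turns the early stop TAG into CAG
variant = apply_snp(reference, 6, "C")

reference_orfs = find_orfs(reference)   # early stop leaves only a 2-codon ORF
variant_orfs = find_orfs(variant)       # read-through to the final TAA
```

In this toy case the reference contains no ORF that passes the threshold, while the variant allele yields one spanning the whole transcript, which is the kind of polymorphism-created ORF the abstract refers to. The reverse situation (a variant introducing a stop codon) disrupts an ORF in the same way.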
Caenorhabditis elegans, especially the N2 isolate, is an invaluable biological model system. Numerous additional natural C. elegans isolates have been shown to have unexpected genotypic and phenotypic variations, which has encouraged researchers to use next-generation sequencing methodology to develop a more complete picture of genotypic variations among the isolates. To understand the phenotypic effects of a genomic variation (GV) on a single gene in a variation-rich genetic background, one should analyze that particular GV in a well-understood genetic background. In C. elegans, the analysis is usually done in N2, which requires extensive crossing to bring in the GV. This can be a very time-consuming procedure; thus, it is important to establish a fast and efficient approach to test the effect of GVs from different isolates in N2. Here we use a Mos1-mediated single-copy insertion (MosSCI) method for phenotypic assessments of GVs from the variation-rich Hawaiian strain CB4856 in N2. Specifically, we investigate effects of variations identified in the CB4856 strain on tac-1, which is an essential gene that is necessary for mitotic spindle elongation and pronuclear migration. We show the usefulness of the MosSCI method by using EU1004 tac-1(or402) as a control. or402 is a temperature-sensitive lethal allele within a well-conserved TACC domain (transforming acidic coiled-coil) that results in a leucine to phenylalanine change at amino acid 229. CB4856 contains a variation that affects the second exon of tac-1, causing a cysteine to tryptophan change at amino acid 94, also within the TACC domain. Using the MosSCI method, we analyze tac-1 from CB4856 in the N2 background and demonstrate that the C94W change, albeit significant, does not cause any obvious decrease in viability. This MosSCI method has proven to be a rapid and efficient way to analyze GVs.
A network of DNA damage response (DDR) mechanisms functions coordinately to maintain genome integrity and prevent disease. The Nucleotide Excision Repair (NER) pathway is known to function in the response to UV-induced DNA damage. Although a number of coding genes and miRNAs have been identified and reported to participate in the UV-induced DNA damage response (UV-DDR), the precise role of non-coding RNAs (ncRNAs) in UV-DDR remains largely unknown.
We used high-throughput RNA-sequencing (RNA-Seq) to discover intermediate-size (70–500 nt) ncRNAs (is-ncRNAs) in C. elegans, using L4 larvae of wild-type (N2), UV-irradiated (N2/UV100) and NER-deficient mutant (xpa-1) strains, and initially identified 450 novel non-coding transcripts. A customized microarray assay was then applied to examine the expression profiles of both novel transcripts and known is-ncRNAs, and 57 UV-DDR-related is-ncRNA candidates showed expression variations at different levels between UV-irradiated and non-irradiated strains. The top-ranked is-ncRNA candidates with expression differences were further validated by qRT-PCR analysis; of these, 8 novel is-ncRNAs were significantly up-regulated after UV irradiation. Knockdown of two novel is-ncRNAs, ncRNA317 and ncRNA415, by RNA interference resulted in higher UV sensitivity and significantly decreased expression of NER-related genes in C. elegans.
The discovery of the two novel is-ncRNAs described above indicates functional roles for is-ncRNAs in the regulation of the UV-DDR network and aids our understanding of the significance of ncRNA involvement in the UV-induced DNA damage response.
Neuroligins are cell adhesion proteins that interact with neurexins at the synapse. This interaction may contribute to differentiation, plasticity and specificity of synapses. In humans, single mutations in neuroligin encoding genes lead to autism spectrum disorder and/or mental retardation. Caenorhabditis elegans mutants deficient in nlg-1, an orthologue of human neuroligin genes, have defects in different behaviors. Here we show that the expression of human NLGN1 or rat Nlgn1 cDNAs in C. elegans nlg-1 mutants rescues the fructose osmotic strength avoidance and gentle touch response phenotypes. Two specific point mutations in the NLGN3 and NLGN4 genes, involved in autism spectrum disorder, were further characterized in this experimental system. The R451C allele described in NLGN3 was analyzed with both human NLGN1 (R453C) and worm NLG-1 (R437C) proteins, and neither was functional in rescuing the osmotic avoidance behavior or the gentle touch response phenotype. The D396X allele described in NLGN4, which produces a truncated protein, was studied with human NLGN1 (D432X), which did not rescue any of the behavioral phenotypes analyzed. In addition, RNAi feeding experiments measuring gentle touch response in the wild-type strain and in worms expressing SID-1 in neurons (which increases the response to dsRNA), both fed with bacteria expressing dsRNA for nlg-1, provided evidence for a postsynaptic in vivo function of neuroligins both in muscle cells and neurons, equivalent to that proposed in mammals. This finding was further confirmed by generating transgenic nlg-1 deficient mutants expressing NLG-1 under pan-neuronal (nrx-1) or pan-muscular (myo-3) specific promoters. All these results suggest that the nematode could be used as an in vivo model for studying particular synaptic mechanisms involving orthologues of human proteins implicated in pervasive developmental disorders.
Understanding the molecular mechanisms that underlie plant responses to drought stress is challenging due to the complex interplay of numerous different genes. Here, we used network-based gene clustering to uncover the relationships between drought-responsive genes from large microarray datasets. We identified 2,607 rice genes that showed significant changes in gene expression under drought stress; 1,392 genes were highly intercorrelated to form 15 gene modules. These drought-responsive gene modules are biologically plausible, with enrichments for genes in common functional categories, stress response changes, tissue-specific expression and transcription factor binding sites. We observed that a gene module (referred to as module 4) consisting of 134 genes was significantly associated with drought response in both drought-tolerant and drought-sensitive rice varieties. This module is enriched for genes involved in controlling the response of the plant to water and embryonic development, including a heat shock transcription factor as the key regulator in the expression of ABRE-containing genes. These results suggest that module 4 is highly conserved in the ABA-mediated drought response pathway in different rice varieties. Moreover, our study showed that many hub genes clustered in rice chromosomes had significant associations with QTLs for drought stress tolerance. The relationship between hub gene clusters and drought tolerance QTLs may provide a key to understand the genetic basis of drought tolerance in rice.
C. elegans is an excellent model system for studying neuroscience using genetics because of its relatively simple nervous system, sequenced genome, and the availability of a large number of transgenic and mutant strains. Recently, microfluidic devices have been used for high-throughput genetic screens, replacing traditional methods of manually handling C. elegans. However, the orientation of nematodes within microfluidic devices is random and often not conducive to inspection, hindering visual analysis and overall throughput. In addition, while previous studies have utilized methods to bias head and tail orientation, none of the existing techniques allow for orientation along the dorso-ventral body axis. Here, we present the design of a simple and robust method for passively orienting worms into lateral body positions in microfluidic devices to facilitate inspection of morphological features with specific dorso-ventral alignments. Using this technique, we can position animals into lateral orientations with up to 84% efficiency, compared to 21% using existing methods. We isolated six mutants with neuronal development or neurodegenerative defects, showing that our technology can be used for on-chip analysis and high-throughput visual screens.
Nonsense-mediated decay (NMD) is an mRNA surveillance pathway that selectively recognizes and degrades defective mRNAs carrying premature translation-termination codons. However, several studies have shown that NMD also targets physiological transcripts that encode full-length proteins, modulating their expression. Indeed, some features of physiological mRNAs can render them NMD-sensitive. Human HFE is a MHC class I protein mainly expressed in the liver that, when mutated, can cause hereditary hemochromatosis, a common genetic disorder of iron metabolism. The HFE gene structure comprises seven exons; although the sixth exon is 1056 base pairs (bp) long, only the first 41 bp encode amino acids. Thus, the remaining downstream 1015 bp sequence corresponds to the HFE 3′ untranslated region (UTR), along with exon seven. Therefore, this 3′ UTR encompasses an exon/exon junction, a feature that can make the corresponding physiological transcript NMD-sensitive. Here, we demonstrate that in UPF1-depleted or in cycloheximide-treated HeLa and HepG2 cells the HFE transcripts are clearly upregulated, meaning that the physiological HFE mRNA is in fact an NMD target. This role of NMD in controlling HFE expression levels was further confirmed in HeLa cells transiently expressing the human HFE gene. In addition, we show by 3′-RACE analysis in several human tissues that HFE mRNA expression results from alternative cleavage and polyadenylation at four different sites – two were previously described and two are novel polyadenylation sites: one located at exon six, which confers NMD-resistance to the corresponding transcripts, and another located at exon seven. We also show that the amount of HFE mRNA isoforms resulting from cleavage and polyadenylation at exon seven, although present in both cell lines, is higher in HepG2 cells.
These results reveal that NMD and alternative polyadenylation may act coordinately to control HFE mRNA levels, possibly varying its protein expression according to the physiological cellular requirements.
Second-generation sequencing is a powerful method for identifying and quantifying small-RNA components of cells. However, little attention has been paid to the effects of the choice of sequencing platform and library preparation protocol on the results obtained. We present a thorough comparison of small-RNA sequencing libraries generated from the same embryonic stem cell lines, using different sequencing platforms, which represent the three major second-generation sequencing technologies, and protocols. We have analysed and compared the expression of microRNAs, as well as populations of small RNAs derived from repetitive elements. Despite the fact that different libraries display a good correlation between sequencing platforms, qualitative and quantitative variations in the results were found, depending on the protocol used. Thus, when comparing libraries from different biological samples, it is strongly recommended to use the same sequencing platform and protocol in order to ensure the biological relevance of the comparisons.
The Protein Kinase G, EGL-4, is required within the C. elegans AWC sensory neurons to promote olfactory adaptation. After prolonged stimulation of these neurons, EGL-4 translocates from the cytosol to the nuclei of the AWC. This nuclear translocation event is both necessary and sufficient for adaptation of the AWC neuron to odor. A cGMP binding motif within EGL-4 and the Gα protein ODR-3 are both required for this translocation event, while loss of the guanylyl cyclase ODR-1 was shown to result in constitutively nuclear localization of EGL-4. However, the molecular changes that are integrated over time to produce a stably adapted response in the AWC are unknown. Here we show that odor-induced fluctuations in cGMP levels in the adult cilia may be responsible in part for sending EGL-4 into the AWC nucleus to produce long-term adaptation. We found that reductions in cGMP that result from mutations in the genes encoding the cilia-localized guanylyl cyclases ODR-1 and DAF-11 result in constitutively nuclear EGL-4 even in naive animals. Conversely, increases in cGMP levels that result from mutations in cGMP phosphodiesterases block EGL-4 nuclear entry even after prolonged odor exposure. Expression of a single phosphodiesterase in adult, naive animals was sufficient to modestly increase the number of animals with nuclear EGL-4. Further, coincident acute treatment of animals with odor and the phosphodiesterase inhibitor 3-isobutyl-1-methylxanthine (IBMX) decreased the number of animals with nuclear EGL-4. These data suggest that reducing cGMP levels in AWC is necessary and even partially sufficient for nuclear translocation of EGL-4 and adaptation as a result of prolonged odor exposure. Our genetic analysis and chemical treatment of C. elegans further indicate that cilia morphology, as defined by fluorescent microscopic observation of the sensory endings, may allow for odor-induced fluctuations in cGMP levels and this fluctuation may be responsible for sending EGL-4 into the AWC nucleus.
Construction of recombinant DNA from multiple fragments is widely required in molecular biology, especially for synthetic biology purposes. Here we describe a new method, successive hybridization assembling (SHA) which can rapidly do this in a single reaction in vitro. In SHA, DNA fragments are prepared to overlap one after another, so after simple denaturation-renaturation treatment they hybridize in a successive manner and thereby assemble into a recombinant molecule. In contrast to traditional methods, SHA eliminates the need for restriction enzymes, DNA ligases and recombinases, and is sequence-independent. We first demonstrated its feasibility by constructing plasmids from 4, 6 and 8 fragments with high efficiencies, and then applied it to constructing a customized vector and two artificial pathways. As SHA is robust, easy to use and can tolerate repeat sequences, we expect it to be a powerful tool in synthetic biology.
Cells respond to changes in the internal and external environment by a complex regulatory system whose end-point is the activation of transcription factors controlling the expression of a pool of ad-hoc genes. Recent experiments have shown that certain stimuli may trigger oscillations in the concentration of transcription factors such as NF-κB and p53, influencing the final outcome of the genetic response. In this study we investigate the role of oscillations in the case of three different well-known gene regulatory mechanisms using mathematical models based on ordinary differential equations and numerical simulations. We considered the cases of direct regulation, two-step regulation and feed-forward loops, and characterized their response to oscillatory input signals both analytically and numerically. We show that in the case of indirect two-step regulation the expression of genes can be turned on or off in a frequency-dependent manner, and that feed-forward loops are also able to selectively respond to the temporal profile of oscillating transcription factors.
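One way frequency-dependent switching could arise in a two-step cascade is sketched below. This is an illustrative toy model, not the model used in the study: it assumes a sinusoidal transcription factor x(t), a linear intermediate y that relaxes towards x, and a target z activated by y through a Hill function. The function name and all parameter values (`alpha_y`, `alpha_z`, `K`, `n`, the two periods) are hypothetical, and the equations are integrated with a simple forward Euler scheme.

```python
import math

def late_peak_output(period, t_end=200.0, dt=0.01,
                     alpha_y=1.0, alpha_z=1.0, K=0.75, n=8):
    """Integrate  dy/dt = alpha_y * (x - y)
                  dz/dt = alpha_z * (y^n / (K^n + y^n) - z)
    with x(t) = 0.5 * (1 + sin(2*pi*t/period)) using forward Euler, and
    return the maximum of z over the second half of the simulation."""
    y = 0.0
    z = 0.0
    z_peak = 0.0
    steps = int(t_end / dt)
    for k in range(steps):
        t = k * dt
        x = 0.5 * (1.0 + math.sin(2.0 * math.pi * t / period))  # input in [0, 1]
        hill = y ** n / (K ** n + y ** n)   # sharp activation of z by y
        dy = alpha_y * (x - y)
        dz = alpha_z * (hill - z)
        y += dt * dy
        z += dt * dz
        if t > t_end / 2:                   # ignore the initial transient
            z_peak = max(z_peak, z)
    return z_peak

# Same amplitude and mean input, different oscillation periods
slow_response = late_peak_output(period=100.0)  # y tracks x and crosses K
fast_response = late_peak_output(period=1.0)    # y is averaged, stays below K
```

Under these assumptions the intermediate acts as a low-pass filter: slow oscillations let y follow the input above the activation threshold K, so z switches on, whereas fast oscillations are averaged to the mean input and z stays close to off.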
microRNAs (miRNAs) are small (20–23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html.
Genome-wide analysis of gene expression or protein binding patterns using different array or sequencing based technologies is now routinely performed to compare different populations, such as treatment and reference groups. It is often necessary to normalize the data obtained to remove technical variation introduced in the course of conducting experimental work, but standard normalization techniques are not capable of eliminating technical bias in cases where the distribution of the truly altered variables is skewed, i.e. when a large fraction of the variables are either positively or negatively affected by the treatment. However, several experiments are likely to generate such skewed distributions, including ChIP-chip experiments for the study of chromatin, gene expression experiments for the study of apoptosis, and SNP-studies of copy number variation in normal and tumour tissues. A preliminary study using spike-in array data established that the capacity of an experiment to identify altered variables and generate unbiased estimates of the fold change decreases as the fraction of altered variables and the skewness increases. We propose the following work-flow for analyzing high-dimensional experiments with regions of altered variables: (1) Pre-process raw data using one of the standard normalization techniques. (2) Investigate if the distribution of the altered variables is skewed. (3) If the distribution is not believed to be skewed, no additional normalization is needed. Otherwise, re-normalize the data using a novel HMM-assisted normalization procedure. (4) Perform downstream analysis. Here, ChIP-chip data and simulated data were used to evaluate the performance of the work-flow. It was found that skewed distributions can be detected by using the novel DSE-test (Detection of Skewed Experiments). 
Furthermore, applying the HMM-assisted normalization to experiments where the distribution of the truly altered variables is skewed results in considerably higher sensitivity and lower bias than can be attained using standard and invariant normalization methods.
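The reason standard normalization can fail on skewed experiments can be shown with a small deterministic sketch. It is not the DSE-test or the HMM-assisted procedure described above; it only assumes median centering as a stand-in for a standard normalization and uses made-up log-ratios (60 unaltered variables, 40 altered ones, a technical shift of 1.0 and a true effect of 2.0).

```python
from statistics import median

N_UNALTERED = 60
TECHNICAL_SHIFT = 1.0   # hypothetical technical bias added to every variable
EFFECT = 2.0            # hypothetical true log fold change of altered variables

# Deterministic stand-in for measurement noise, evenly spaced in [-0.5, 0.5]
noise = [-0.5 + i / (N_UNALTERED - 1) for i in range(N_UNALTERED)]
unaltered = [TECHNICAL_SHIFT + e for e in noise]

# Balanced experiment: 20 variables go up, 20 go down
balanced = unaltered + [TECHNICAL_SHIFT + EFFECT] * 20 \
                     + [TECHNICAL_SHIFT - EFFECT] * 20
# Skewed experiment: all 40 altered variables go up
skewed = unaltered + [TECHNICAL_SHIFT + EFFECT] * 40

# Median centering estimates the technical shift as the overall median
balanced_offset = median(balanced)
skewed_offset = median(skewed)

# Error of the estimated shift; any error is carried into every fold change
balanced_bias = balanced_offset - TECHNICAL_SHIFT
skewed_bias = skewed_offset - TECHNICAL_SHIFT
```

In the balanced case the median recovers the technical shift, whereas in the skewed case the median is pulled towards the altered variables, so the shift is overestimated and the fold changes of the altered variables are underestimated after centering.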
Experimental evidence is increasingly being used to reassess the quality and accuracy of genome annotation. Proteomics data used for this purpose, called proteogenomics, can alleviate many of the problematic areas of genome annotation, e.g. short protein validation and start site assignment. We performed a proteogenomic analysis of 46 genomes spanning eight bacterial and archaeal phyla across the tree of life. These diverse datasets facilitated the development of a robust approach for proteogenomics that is functional across genomes varying in %GC, gene content, proteomic sampling depth, phylogeny, and genome size. In addition to finding evidence for 682 novel proteins, 1336 new start sites, and numerous dubious genes, we discovered sites of post-translational maturation in the form of proteolytic cleavage of 1175 signal peptides. The number of novel proteins per genome is highly variable (median 7, mean 15, stdev 20). Moreover, comparison of novel genes with the current genes did not reveal any consistent abnormalities. Thus, we conclude that proteogenomics fulfills a yet to be understood deficiency in gene prediction. With the adoption of new sequencing technologies which have higher error rates than Sanger-based methods and the advances in proteomics, proteogenomics may become even more important in the future.
Regulation of pre-mRNA splicing is achieved through the interaction of RNA sequence elements and a variety of RNA-splicing related proteins (splicing factors). The splicing machinery in humans is not yet fully elucidated, partly because splicing factors in humans have not been exhaustively identified. Furthermore, experimental methods for splicing factor identification are time-consuming and labor-intensive. Although many computational methods have been proposed for the identification of RNA-binding proteins, no method so far focuses on the identification of RNA-splicing related proteins. Therefore, we are motivated to design a method that focuses on the identification of human splicing factors using experimentally verified splicing factors. The investigation of amino acid composition reveals that there are remarkable differences between splicing factors and non-splicing proteins. A support vector machine (SVM) is utilized to construct a predictive model, and five-fold cross-validation indicates that the SVM model trained with amino acid composition provides a promising accuracy (80.22%). Another basic feature, amino acid dipeptide composition, is also examined and yields a predictive performance similar to that of amino acid composition. In addition, this work shows that the incorporation of evolutionary information and domain information could improve the predictive performance. The constructed models have been demonstrated to effectively classify (73.65% accuracy) an independent data set of human splicing factors. The result of independent testing indicates that in silico identification could be a feasible means of conducting preliminary analyses of splicing factors and significantly reducing the number of potential targets that require further in vivo or in vitro confirmation.
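The two basic sequence features mentioned above, amino acid composition and dipeptide composition, can be computed as in the following minimal sketch. The function names, the residue ordering and the handling of non-standard residues are assumptions made for illustration; the resulting fixed-length vectors are the kind of input an SVM could be trained on, but the classifier itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                 # 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def _standard_residues(protein):
    """Upper-case the sequence and drop anything outside the 20 standard
    residues (assumption: X, B, Z, gaps etc. are simply ignored, so the
    neighbours of a dropped residue become adjacent)."""
    return "".join(c for c in protein.upper() if c in AMINO_ACIDS)

def aa_composition(protein):
    """Return a 20-element list with the fraction of each amino acid."""
    seq = _standard_residues(protein)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(protein):
    """Return a 400-element list with the fraction of each overlapping
    residue pair (positions i and i+1)."""
    seq = _standard_residues(protein)
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        return [0.0] * len(DIPEPTIDES)
    counts = {}
    for i in range(n_pairs):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return [counts.get(d, 0) / n_pairs for d in DIPEPTIDES]

# Toy peptide, chosen only to make the numbers easy to check by hand
toy = "RSRS"
toy_aac = aa_composition(toy)
toy_dpc = dipeptide_composition(toy)
```

For the toy peptide, arginine and serine each account for half of the residues, and the three overlapping pairs are RS, SR and RS. Both vectors have a fixed length regardless of protein length, which is what makes them convenient as classifier input.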
Of the ∼4000 ORFs identified through the genome sequence of Mycobacterium tuberculosis (TB) H37Rv, experimentally determined structures are available for 312. Since knowledge of protein structures is essential to obtain a high-resolution understanding of the underlying biology, we seek to obtain a structural annotation for the genome, using computational methods. Structural models were obtained and validated for ∼2877 ORFs, covering ∼70% of the genome. Functional annotation of each protein was based on fold-based functional assignments and a novel binding site based ligand association. New algorithms for binding site detection and genome scale binding site comparison at the structural level, recently reported from the laboratory, were utilized. Besides these, the annotation covers detection of various sequence and sub-structural motifs and quaternary structure predictions based on the corresponding templates. The study provides an opportunity to obtain a global perspective of the fold distribution in the genome. The annotation indicates that cellular metabolism can be achieved with only 219 folds. New insights about the folds that predominate in the genome, as well as the fold-combinations that make up multi-domain proteins are also obtained. 1728 binding pockets have been associated with ligands through binding site identification and sub-structure similarity analyses. The resource (http://proline.physics.iisc.ernet.in/Tbstructuralannotation), being one of the first to be based on structure-derived functional annotations at a genome scale, is expected to be useful for better understanding of TB and for application in drug discovery. The reported annotation pipeline is fairly generic and can be applied to other genomes as well.
Forward genetic screens in vertebrates are powerful tools to generate models relevant to human diseases, including neuropsychiatric disorders. Variability in phenotypic penetrance and expressivity is common in these disorders and behavioral mutant models, making their molecular-genetic mapping a formidable task. Using a ‘phenotyping by segregation’ strategy, we molecularly map the hypersensitive zebrafish houdini mutant despite its variable phenotypic penetrance, providing a generally applicable strategy to map zebrafish mutants with subtle phenotypes.
Ankrd2 (also known as Arpp) together with Ankrd1/CARP and DARP are members of the MARP mechanosensing proteins that form a complex with titin (N2A)/calpain 3 protease/myopalladin. In muscle, Ankrd2 is located in the I-band of the sarcomere and moves to the nucleus of adjacent myofibers on muscle injury. In myoblasts it is predominantly in the nucleus and on differentiation shifts from the nucleus to the cytoplasm. In agreement with its role as a sensor it interacts both with sarcomeric proteins and transcription factors.
Expression profiling of endogenous Ankrd2 silenced in human myotubes was undertaken to elucidate its role as an intermediary in cell signaling pathways. Silencing Ankrd2 expression altered the expression of genes involved in both intercellular communication (cytokine-cytokine receptor interaction, endocytosis, focal adhesion, tight junction, gap junction and regulation of the actin cytoskeleton) and intracellular communication (calcium, insulin, MAPK, p53, TGF-β and Wnt signaling). The significance of Ankrd2 in cell signaling was strengthened by our showing, for the first time, that Nkx2.5 and p53 are upstream effectors of the Ankrd2 gene and that Ankrd1/CARP, another MARP member, can modulate the transcriptional ability of MyoD on the Ankrd2 promoter. Another novel finding was the interaction between Ankrd2 and proteins with PDZ and SH3 domains, further supporting its role in signaling. Notably, we demonstrated that the transcription factors PAX6, LHX2, NFIL3 and MECP2 were able to bind both the Ankrd2 protein and its promoter, indicating the presence of a regulatory feedback loop mechanism.
In conclusion, we demonstrate that Ankrd2 is a potent regulator in muscle cells affecting a multitude of pathways and processes.
Loss of muscle mass via protein degradation is an important clinical problem, but we know little of how muscle protein degradation is regulated genetically. To gain insight, our labs developed C. elegans into a model for understanding the regulation of muscle protein degradation. Past studies uncovered novel functional roles for genes affecting muscle and/or involved in signalling in other cells or tissues. Here we examine most of the genes previously identified as the sites of mutations affecting muscle for novel roles in regulating degradation. We evaluate genomic (RNAi knockdown) approaches and combine them with our established genetic (mutant) and pharmacologic (drug) approaches to examine these 159 genes. We find that RNAi usually recapitulates both organismal and sub-cellular mutant phenotypes, but RNAi, unlike mutants, can frequently be used acutely to study gene function solely in differentiated muscle. In the majority of cases where RNAi does not produce organismal-level phenotypes, sub-cellular defects can be detected; disrupted proteostasis is most commonly observed. We identify 48 genes in which mutation or RNAi knockdown causes excessive protein degradation; myofibrillar and/or mitochondrial morphologies are also disrupted in 19 of these 48 cases. These 48 genes appear to act via at least three sub-networks to control bulk degradation of protein in muscle cytosol. Attachment to the extracellular matrix regulates degradation via unidentified proteases and affects myofibrillar and mitochondrial morphology. Growth factor imbalance and calcium overload promote lysosome-based degradation, whereas calcium deficit promotes proteasome-based degradation; in both cases myofibrillar and mitochondrial morphologies are largely unaffected. Our results provide a framework for effectively using RNAi to identify and functionally cluster novel regulators of degradation. This clustering allows prioritization of candidate genes/pathways for future mechanistic studies.
HIV proteins target host hub proteins for transient binding interactions. The presence of viral proteins in the infected cell results in out-competition of host proteins in their interaction with hub proteins, drastically affecting cell physiology. Functional genomics and interactome datasets can be used to quantify the sequence hotspots on the HIV proteome mediating interactions with host hub proteins. In this study, we used the HIV and human interactome databases to identify HIV targeted host hub proteins and their host binding partners (H2). We developed a high throughput computational procedure utilizing motif discovery algorithms on sets of protein sequences, including sequences of HIV and H2 proteins. We identified as HIV sequence hotspots those linear motifs that are highly conserved on HIV sequences and at the same time have a statistically enriched presence on the sequences of H2 proteins. The HIV protein motifs discovered in this study are expressed by subsets of H2 host proteins potentially outcompeted by HIV proteins. A large subset of these motifs is involved in cleavage, nuclear localization, phosphorylation, and transcription factor binding events. Many such motifs are clustered on an HIV sequence in the form of hotspots. The sequential positions of these hotspots are consistent with the curated literature on phenotype altering residue mutations, as well as with existing binding site data. The hotspot map produced in this study is the first global portrayal of HIV motifs involved in altering the host protein network at highly connected hub nodes.
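The notion of a motif having a "statistically enriched presence" on H2 sequences can be illustrated with a minimal sketch. The abstract does not name the statistical test used, so the one-sided hypergeometric test below is an assumption, as are the function name and its parameters.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of background proteins
    successes:  background proteins that contain the motif
    sample:     number of proteins in the set of interest (e.g. H2 proteins)
    hits:       proteins in that set that contain the motif
    All four names are hypothetical and chosen for this illustration.
    """
    if not (0 <= successes <= population and 0 <= sample <= population):
        raise ValueError("successes and sample must lie in [0, population]")
    upper = min(successes, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(successes, k) * comb(population - successes, sample - k)
        for k in range(max(hits, 0), upper + 1)
    )
    return tail / total
```

A small p-value would indicate that the motif occurs in the set of interest more often than expected from its background frequency. When many motifs are screened, a multiple-testing correction would also be needed, which this sketch omits.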