There is a growing appreciation of the extent of transcriptome variation across individual cells of the same cell type. While expression variation may be a byproduct of, for example, dynamic or homeostatic processes, here we consider whether single‐cell molecular variation per se might be crucial for population‐level function. Under this hypothesis, molecular variation indicates a diversity of hidden functional capacities within an ensemble of “identical” cells, and this functional diversity facilitates collective behavior that would be inaccessible to a homogeneous population. In reviewing this topic, we explore possible functions that might be carried by a heterogeneous ensemble of cells; however, this question has proven difficult to test, both because methods to manipulate molecular variation are limited and because it is complicated to define, and measure, population‐level function. We consider several possible methods to further pursue the hypothesis that “variation is function” through the use of comparative analysis and novel experimental techniques.
bet‐hedging; evolution of variation; fractional response; functional variation; single cell transcriptome; single cell variation
Intra-tumoral genetic and functional heterogeneity correlates with cancer clinical prognoses. However, the mechanisms by which intra-tumoral heterogeneity impacts therapeutic outcome remain poorly understood. RNA sequencing (RNA-seq) of single tumor cells can provide comprehensive information about gene expression and single-nucleotide variations in individual tumor cells, which may allow for the translation of heterogeneous tumor cell functional responses into customized anti-cancer treatments.
We isolated 34 patient-derived xenograft (PDX) tumor cells from a lung adenocarcinoma patient tumor xenograft. Individual tumor cells were subjected to single-cell RNA-seq for gene expression profiling and expressed mutation profiling. Fifty tumor-specific single-nucleotide variations, including KRAS G12D, were observed to be heterogeneous in individual PDX cells. Semi-supervised clustering, based on KRAS G12D mutant expression and a risk score representing expression of 69 lung adenocarcinoma-prognostic genes, classified PDX cells into four groups. PDX cells that survived in vitro anti-cancer drug treatment displayed transcriptome signatures consistent with the group characterized by KRAS G12D and low risk score.
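The four-group classification described above can be illustrated with a minimal sketch. It is not the semi-supervised clustering used in the study: it simply crosses mutation status with a median split on a weighted risk score. The gene names, weights, and the median cutoff are all hypothetical choices made for illustration.

```python
from statistics import median

def risk_score(expression, weights):
    """Weighted sum of expression over prognostic genes (weights are hypothetical)."""
    return sum(w * expression.get(gene, 0.0) for gene, w in weights.items())

def classify_cells(cells, weights):
    """Assign each cell to one of four groups: mutation status x risk level.

    `cells` maps a cell name to {"expr": {gene: value}, "kras_g12d": bool}.
    The risk cutoff is the median score across cells (an assumption).
    """
    scores = {name: risk_score(c["expr"], weights) for name, c in cells.items()}
    cutoff = median(scores.values())
    groups = {}
    for name, c in cells.items():
        mutation = "KRAS-mut" if c["kras_g12d"] else "KRAS-wt"
        risk = "high-risk" if scores[name] > cutoff else "low-risk"
        groups[name] = (mutation, risk)
    return groups

# Toy data: two hypothetical prognostic genes, A (adverse) and B (protective).
toy_weights = {"A": 1.0, "B": -1.0}
toy_cells = {
    "c1": {"expr": {"A": 5.0, "B": 1.0}, "kras_g12d": True},
    "c2": {"expr": {"A": 1.0, "B": 4.0}, "kras_g12d": True},
    "c3": {"expr": {"A": 3.0, "B": 3.0}, "kras_g12d": False},
    "c4": {"expr": {"A": 0.0, "B": 0.5}, "kras_g12d": False},
}
toy_groups = classify_cells(toy_cells, toy_weights)
```

In this toy example, a cell such as `c2` (mutant, low score) would fall in the group that the abstract associates with drug-surviving cells.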
Single-cell RNA-seq on viable PDX cells identified a candidate tumor cell subgroup associated with anti-cancer drug resistance. Thus, single-cell RNA-seq is a powerful approach for identifying unique tumor cell-specific gene expression profiles which could facilitate the development of optimized clinical anti-cancer strategies.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0692-3) contains supplementary material, which is available to authorized users.
Differentiation of metazoan cells requires execution of different gene expression programs, but recent single-cell transcriptome profiling has revealed considerable variation within cells of seemingly identical phenotype. This brings into question the relationship between transcriptome states and cell phenotypes. Additionally, single-cell transcriptomics presents unique analysis challenges that need to be addressed to answer this question.
We present high-quality, deep read-depth single-cell RNA sequencing for 91 cells from five mouse tissues and 18 cells from two rat tissues, along with 30 control samples of bulk RNA diluted to single-cell levels. We find that transcriptomes differ globally across tissues with regard to the number of genes expressed, the average expression patterns, and within-cell-type variation patterns. We develop methods to filter genes for reliable quantification and to calibrate biological variation. All cell types include genes with high variability in expression, in a tissue-specific manner. We also find evidence that single-cell variability of neuronal genes in mice is correlated with that in rats, consistent with the hypothesis that levels of variation may be conserved.
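One simple way to think about calibrating biological variation against technical variation is sketched below. It compares the squared coefficient of variation (CV²) of each gene across single cells with the CV² of the same gene across diluted bulk controls, and reports the excess. This is an illustrative assumption, not the calibration method developed in the paper; the function names and data layout are hypothetical.

```python
from statistics import mean, pvariance

def cv_squared(values):
    """Squared coefficient of variation; defined as 0.0 when the mean is zero."""
    m = mean(values)
    if m == 0:
        return 0.0
    return pvariance(values) / (m * m)

def excess_variation(cell_counts, control_counts):
    """Per-gene CV^2 in single cells minus CV^2 in diluted bulk controls.

    Both arguments map gene -> list of expression values. Controls are taken
    as a proxy for technical variation, so a positive excess hints at
    biological variability (a simplifying assumption).
    """
    return {
        gene: cv_squared(cell_counts[gene]) - cv_squared(control_counts[gene])
        for gene in cell_counts
        if gene in control_counts
    }

# Toy data: geneA is bimodal across cells but stable in the controls.
toy_cells = {"geneA": [0, 10, 0, 10], "geneB": [5, 5, 5, 5]}
toy_controls = {"geneA": [4, 6, 4, 6], "geneB": [5, 5, 5, 5]}
toy_excess = excess_variation(toy_cells, toy_controls)
```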
Single-cell RNA-sequencing data provide a unique view of transcriptome function; however, careful analysis is required in order to use single-cell RNA-sequencing measurements for this purpose. Technical variation must be considered in single-cell RNA-sequencing studies of expression variation. For a subset of genes, biological variability within each cell type appears to be regulated in order to perform dynamic functions, rather than reflecting solely molecular noise.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0683-4) contains supplementary material, which is available to authorized users.
Nibbler (Nbr) is a 3′-to-5′ exonuclease that trims the 3′ end of microRNAs (miRNAs) to generate different length patterns of miRNAs in Drosophila. Despite its effect on miRNAs, we lack knowledge of its biological significance and whether Nbr affects other classes of small RNAs such as piRNAs and endo-siRNAs. Here, we characterized the in vivo function of nbr by defining the Nbr protein expression pattern and loss-of-function effects. Nbr protein is enriched in the ovary and head. Analysis of nbr null animals reveals adult-stage defects that progress with age, including held-up wings, decreased locomotion, and brain vacuoles, indicative of accelerated age-associated processes upon nbr loss. Importantly, these effects depend on catalytic residues in the Nbr exonuclease domain, indicating that the catalytic activity is responsible for these effects. Given the impact of nbr on miRNAs, we also analyzed the effect of nbr on piRNA and endo-siRNA lengths by deep-sequence analysis of libraries from ovaries. As with miRNAs, nbr mutation led to longer piRNAs – an effect that was dependent on the catalytic residues of the exonuclease domain. These analyses indicate a role for nbr in age-associated processes and in modulating the length of multiple classes of small RNAs, including miRNAs and piRNAs, in Drosophila.
aging; endo-siRNA; miRNA; Nibbler; piRNA
Cytoplasmic splicing represents a newly emerging level of transcriptional regulation adding to the molecular diversity of mammalian cells. As examples of this noncanonical form of transcript processing are discovered, the evidence of its importance to normal cellular function grows. Work from a number of groups using a variety of cell types is steadily identifying a large number of transcripts (and soon to be even larger as genome-wide analyses of retained introns across a number of cellular phenotypes are currently underway) that undergo some level of regulated endogenous extranuclear splicing as part of their normal biosynthetic pathway. Here, we review the existing data covering cytoplasmic retained intron sequences and suggest that such sequences may be a component of ‘sentinel RNA’ that serves to generate transcript variants within the cytoplasm as well as a source for RNA-based secondary messages.
Protein synthesis in neuronal dendrites underlies long-term memory formation in the brain. Local translation of reporter mRNAs has demonstrated translation in dendrites at focal points called translational hotspots. Various reports have shown that hundreds to thousands of mRNAs are localized to dendrites, yet the dynamics of translation of multiple dendritic mRNAs has remained elusive. Here, we show that the protein translational activities of two dendritically localized mRNAs are spatiotemporally complex but constrained by the translational hotspots in which they are colocalized. Cotransfection of glutamate receptor 2 (GluR2) and GluR4 mRNAs (engineered to encode different fluorescent proteins) into rat hippocampal neurons demonstrates a heterogeneous distribution of translational hotspots for the two mRNAs along dendrites. Stimulation with s-3,5-dihydroxy-phenylglycine modifies the translational dynamics of both of these RNAs in a complex saturable manner. These results suggest that the translational hotspot is a primary structural regulator of the simultaneous yet differential translation of multiple mRNAs in the neuronal dendrite.
Neurons display a highly polarized architecture. Their ability to modify their features under intracellular and extracellular stimuli, known as synaptic plasticity, is a key component of the neurochemical basis of learning and memory. A key feature of synaptic plasticity involves the delivery of mRNAs to distinct sub-cellular domains where they are locally translated. Regulatory coordination of these spatio-temporal events is critical for synaptogenesis and synaptic plasticity, as defects in these processes can lead to neurological diseases. In this work, we use microarrays to investigate the subcellular localization of mRNAs in microdissected dendrites from primary cultures of hippocampal neurons of two mouse strains (C57BL/6 and Balb/c) and one rat strain (Sprague–Dawley), in order to assay evolutionary differences in dendritic transcript localization.
Our microarray analysis highlighted significantly greater evolutionary diversification of RNA localization in the dendritic transcriptomes (81% gene identity difference among the top 5% highly expressed genes) compared to the transcriptomes of 11 different central nervous system (CNS) and non-CNS tissues (average of 44% gene identity difference among the top 5% highly expressed genes). Differentially localized genes include many genes involved in CNS function.
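The "gene identity difference among the top 5% highly expressed genes" can be made concrete with a small sketch. The abstract does not give the exact formula, so the definition below (one minus the shared fraction of the two top sets, relative to their union) is an assumption, and the function names are hypothetical.

```python
def top_fraction(expr, fraction=0.05):
    """Return the set of genes in the top `fraction` by expression level."""
    k = max(1, round(len(expr) * fraction))
    ranked = sorted(expr, key=expr.get, reverse=True)
    return set(ranked[:k])

def identity_difference(expr_a, expr_b, fraction=0.05):
    """Fraction of top-expressed genes NOT shared between two samples.

    Assumed definition: 1 - |A & B| / |A | B|, where A and B are the
    top-`fraction` gene sets of each sample.
    """
    top_a = top_fraction(expr_a, fraction)
    top_b = top_fraction(expr_b, fraction)
    return 1.0 - len(top_a & top_b) / len(top_a | top_b)

# Toy data: 40 genes, so the top 5% is two genes per sample.
toy_a = {f"g{i}": float(i) for i in range(40)}
toy_b = dict(toy_a)
toy_b["g38"] = 0.0     # drops out of the top set in sample B
toy_b["g0"] = 100.0    # enters the top set in sample B
toy_difference = identity_difference(toy_a, toy_b)
```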
Species differences in sub-cellular localization may reflect non-functional neutral drift. However, the functional categories of mRNA showing differential localization suggest that at least part of the divergence may reflect activity-dependent functional differences of neurons, mediated by species-specific RNA subcellular localization mechanisms.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-883) contains supplementary material, which is available to authorized users.
Dendritic RNA; Evolution; Transcriptome; Comparative genomics; Rodentia
Transcriptome profiling is an indispensable tool in advancing the understanding of single cell biology, but depends upon methods capable of isolating mRNA at the spatial resolution of a single cell. Current capture methods lack sufficient spatial resolution to isolate mRNA from individual in vivo resident cells without damaging adjacent tissue. Because of this limitation, it has been difficult to assess the influence of the microenvironment on the transcriptome of individual neurons. Here, we engineered a Transcriptome In Vivo Analysis (TIVA)-tag, which upon photoactivation enables mRNA capture from single cells in live tissue. Using the TIVA-tag in combination with RNA-seq to analyze transcriptome variance among single dispersed cells and in vivo resident mouse and human neurons, we show that the tissue microenvironment shapes the transcriptomic landscape of individual cells. The TIVA methodology provides the first noninvasive approach for capturing mRNA from single cells in their natural microenvironment.
RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, our understanding of the biases it introduces remains incomplete, even though that understanding is critical to realizing its value.
We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation.
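The within-transcript coverage statistics quoted above can be sketched as follows. The sketch assumes per-base read depth is already available for each transcript and defines the fold difference as the ratio of maximum to minimum depth with a pseudocount to avoid division by zero. The pseudocount and the function names are illustrative assumptions, not the paper's stated procedure.

```python
def coverage_fold_change(coverage, pseudocount=1.0):
    """Ratio of highest to lowest per-base depth within one transcript."""
    return (max(coverage) + pseudocount) / (min(coverage) + pseudocount)

def fraction_exceeding(coverages, threshold):
    """Fraction of transcripts whose within-transcript fold change exceeds `threshold`.

    `coverages` maps transcript name -> list of per-base read depths.
    """
    folds = [coverage_fold_change(depths) for depths in coverages.values()]
    return sum(1 for f in folds if f > threshold) / len(folds)

# Toy data: one flat transcript, one mildly uneven, one with a coverage gap.
toy_coverage = {
    "t1": [99, 99, 99],
    "t2": [9, 29, 19],
    "t3": [0, 10, 21],
}
over_two_fold = fraction_exceeding(toy_coverage, 2.0)
over_ten_fold = fraction_exceeding(toy_coverage, 10.0)
```

If every base of a full-length in vitro transcript were equally represented, both fractions would be zero, which is the expectation the study tests against.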
These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.
Two independent studies, one of them using a computational approach, identified CHRONO, a gene shown to modulate the activity of circadian transcription factors and alter circadian behavior in mice.
Over the last decades, researchers have characterized a set of “clock genes” that drive daily rhythms in physiology and behavior. This arduous work has yielded results with far-reaching consequences in metabolic, psychiatric, and neoplastic disorders. Recent attempts to expand our understanding of circadian regulation have moved beyond the mutagenesis screens that identified the first clock components, employing higher throughput genomic and proteomic techniques. In order to further accelerate clock gene discovery, we utilized a computer-assisted approach to identify and prioritize candidate clock components. We used a simple form of probabilistic machine learning to integrate biologically relevant, genome-scale data and ranked genes on their similarity to known clock components. We then used a secondary experimental screen to characterize the top candidates. We found that several physically interact with known clock components in a mammalian two-hybrid screen and modulate in vitro cellular rhythms in an immortalized mouse fibroblast line (NIH 3T3). One candidate, Gene Model 129, interacts with BMAL1 and functionally represses the key driver of molecular rhythms, the BMAL1/CLOCK transcriptional complex. Given these results, we have renamed the gene CHRONO (computationally highlighted repressor of the network oscillator). Bi-molecular fluorescence complementation and co-immunoprecipitation demonstrate that CHRONO represses by abrogating the binding of BMAL1 to its transcriptional co-activator CBP. Most importantly, CHRONO knockout mice display a prolonged free-running circadian period similar to, or more drastic than, six other clock components. We conclude that CHRONO is a functional clock component providing a new layer of control on circadian molecular dynamics.
Daily rhythms are ever-present in the living world, driving the sleep–wake cycle and many other physiological changes. In the last two decades, several labs have identified “clock genes” that interact to generate underlying molecular oscillations. However, many aspects of circadian molecular physiology remain unexplained. Here, we used a simple “machine learning” approach to identify new clock genes by searching the genome for candidate genes that share clock-like features such as cycling, broad-based tissue RNA expression, in vitro circadian activity, genetic interactions, and homology across species. Genes were ranked by their similarity to known clock components and the candidates were screened and validated for evidence of clock function in vitro. One candidate, which we renamed CHRONO (Gm129), interacted with the master regulator of the clock, BMAL1, disrupting its transcriptional activity. We found that Chrono knockout mice had prolonged locomotor activity rhythms, getting up progressively later each day. Our experiments demonstrated that CHRONO interferes with the ability of BMAL1 to recruit CBP, a bona fide histone acetylase and key transcriptional coactivator of the circadian clock.
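The idea of ranking genes by their similarity to known clock components can be sketched with a naive Bayes log-likelihood-ratio score over binary features. The abstracts only say that a simple form of probabilistic machine learning was used, so the naive Bayes choice, the binary encoding, the Laplace smoothing, and the feature names below are all assumptions for illustration.

```python
import math

def train_llr(features, known):
    """Per-feature log-likelihood ratios (known clock genes vs. all others).

    `features` maps gene -> {feature_name: bool}. Laplace smoothing keeps
    every probability strictly between 0 and 1.
    """
    names = sorted({f for flags in features.values() for f in flags})
    pos = [g for g in features if g in known]
    neg = [g for g in features if g not in known]
    llr = {}
    for f in names:
        p_pos = (sum(features[g].get(f, False) for g in pos) + 1) / (len(pos) + 2)
        p_neg = (sum(features[g].get(f, False) for g in neg) + 1) / (len(neg) + 2)
        llr[f] = (math.log(p_pos / p_neg), math.log((1 - p_pos) / (1 - p_neg)))
    return llr

def score(flags, llr):
    """Sum the evidence for one gene, treating features as independent."""
    return sum(present if flags.get(f, False) else absent
               for f, (present, absent) in llr.items())

def rank_candidates(features, known):
    """Rank genes that are not known clock components, most clock-like first."""
    llr = train_llr(features, known)
    candidates = [g for g in features if g not in known]
    return sorted(candidates, key=lambda g: score(features[g], llr), reverse=True)

# Toy data with hypothetical genes and two of the feature types listed above.
toy_features = {
    "K1": {"cycling": True, "broad_expression": True},
    "K2": {"cycling": True, "broad_expression": True},
    "X": {"cycling": True, "broad_expression": True},
    "Y": {"cycling": False, "broad_expression": False},
    "Z": {"cycling": False, "broad_expression": False},
}
toy_ranking = rank_candidates(toy_features, known={"K1", "K2"})
```

Top-ranked candidates from such a list would then go to an experimental screen, as the abstracts describe.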
Recent findings have revealed the complexity of the transcriptional landscape in mammalian cells. One recently described class of novel transcripts is the Cytoplasmic Intron-sequence Retaining Transcripts (CIRTs), hypothesized to confer post-transcriptional regulatory function. For instance, the neuronal CIRT KCNMA1i16 contributes to the firing properties of hippocampal neurons. Intronic sub-sequence retention within IL1-β mRNA in anucleate platelets has been implicated in activity-dependent splicing and translation. In a recent study, we showed CIRTs harbor functional SINE ID elements which are hypothesized to mediate dendritic localization in neurons. Based on these studies and others, we hypothesized that CIRTs may be present in a broad set of transcripts and comprise novel signals for post-transcriptional regulation. We carried out a transcriptome-wide survey of CIRTs by sequencing micro-dissected subcellular RNA fractions. We sequenced two batches of 150-300 individually dissected dendrites from primary cultures of hippocampal neurons in rat and three batches from mouse hippocampal neurons. After statistical processing to minimize artifacts, we found a broad prevalence of CIRTs in the neurons in both species (44-60% of the expressed transcripts). The sequence patterns, including stereotypical length, biased inclusion of specific introns, and intron-intron junctions, suggested CIRT-specific nuclear processing. Our analysis also suggested that these cytoplasmic intron-sequence retaining transcripts may serve as a primary transcript for ncRNAs. Our results show that retaining intronic sequences is not isolated to a few loci but may be a genome-wide phenomenon for embedding functional signals within certain mRNA. The results suggest a novel source of cis-sequences for post-transcriptional regulation.
Our results suggest two potentially novel splicing pathways: one, within the nucleus for CIRT biogenesis; and another, within the cytoplasm for removing CIRT sequences before translation. We also speculate that release of CIRT sequences prior to translation may form RNA-based signals within the cell, potentially comprising a novel class of signaling pathways.
The building blocks of complex biological systems are single cells. Fundamental insights gained from single-cell analysis promise to provide the framework for understanding normal biological systems development as well as the limits on systems/cellular ability to respond to disease. The interplay of cells to create functional systems is not well understood. Until recently, the study of single cells has concentrated primarily on morphological and physiological characterization. With the application of new highly sensitive molecular and genomic technologies, the quantitative biochemistry of single cells is now accessible.
quantitative biology; single neurons; single cells; transcriptomics; proteomics; splicing
Viruses are exceedingly diverse in their evolved strategies to manipulate hosts for viral replication. However, despite these differences, most virus populations will occasionally experience two commonly-encountered challenges: growth in variable host environments, and growth under fluctuating population sizes. We used the segmented RNA bacteriophage ϕ6 as a model for studying the evolutionary genomics of virus adaptation in the face of host switches and parametrically varying population sizes. To do so, we created a bifurcating deme structure that reflected lineage splitting in natural populations, allowing us to test whether phylogenetic algorithms could accurately resolve this ‘known phylogeny’. The resulting tree yielded 32 clones at the tips and internal nodes; these strains were fully sequenced and measured for phenotypic changes in selected traits (fitness on original and novel hosts).
We observed that RNA segment size was negatively correlated with the extent of molecular change in the imposed treatments; molecular substitutions tended to cluster on the Small and Medium RNA chromosomes of the virus, and not on the Large segment. Our study yielded a very large molecular and phenotypic dataset, fostering possible inferences on genotype-phenotype associations. Using further experimental evolution, we confirmed an inference on the unanticipated role of an allelic switch in a viral assembly protein, which governed viral performance across host environments.
Our study demonstrated that varying complexities can be simultaneously incorporated into experimental evolution, to examine the combined effects of population size, and adaptation in novel environments. The imposed bifurcating structure revealed that some methods for phylogenetic reconstruction failed to resolve the true phylogeny, owing to a paucity of molecular substitutions separating the RNA viruses that evolved in our study.
Adaptation; Bacteria; Bacteriophage; Experimental evolution; Known phylogeny; Pseudomonas; Virus
RNA precursors give rise to mRNA after splicing of intronic sequences traditionally thought to occur in the nucleus. Here, we show that intron sequences are retained in a number of dendritically-targeted mRNAs, using microarray and Illumina sequencing of isolated dendritic mRNA as well as in situ hybridization. Many of the retained introns contain ID elements, a class of SINE retrotransposon. A portion of these SINEs confers dendritic targeting to exogenous and endogenous transcripts, showing the necessity of ID-mediated mechanisms for the targeting of different transcripts to dendrites. ID elements are capable of selectively altering the distribution of endogenous proteins, providing a link between intronic SINEs and protein function. As such, the ID element represents the first common dendritic targeting element to be found across multiple RNAs. Retention of intronic sequence is a more general phenomenon than previously thought and plays a functional role in the biology of the neuron, partly mediated by co-opted repetitive sequences.
SNP (single nucleotide polymorphism) discovery using next-generation sequencing data remains difficult primarily because of redundant genomic regions, such as interspersed repetitive elements and paralogous genes, present in all eukaryotic genomes. To address this problem, we developed Sniper, a novel multi-locus Bayesian probabilistic model and a computationally efficient algorithm that explicitly incorporates sequence reads that map to multiple genomic loci. Our model fully accounts for sequencing error, template bias, and multi-locus SNP combinations, maintaining high sensitivity and specificity under a broad range of conditions. An implementation of Sniper is freely available at http://kim.bio.upenn.edu/software/sniper.shtml.
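As background for the Bayesian approach, the sketch below computes genotype posteriors at a single locus from reference and alternate read counts under a fixed sequencing error rate. It is deliberately much simpler than Sniper: it ignores template bias and multi-locus read placement, and it is not the Sniper model or its API. The prior values and the error rate are illustrative assumptions.

```python
import math

def genotype_posteriors(ref_reads, alt_reads, error=0.01, priors=None):
    """Posterior over diploid genotypes at one locus (single-locus sketch only).

    Each read is treated as an independent draw that shows the alternate
    allele with probability `error`, 0.5, or 1 - `error` depending on
    genotype. The binomial coefficient is shared by all genotypes and cancels.
    """
    if priors is None:
        # Hypothetical priors: variants are assumed rare.
        priors = {"ref/ref": 0.998, "ref/alt": 0.001, "alt/alt": 0.001}
    p_alt = {"ref/ref": error, "ref/alt": 0.5, "alt/alt": 1.0 - error}
    log_post = {
        g: math.log(priors[g])
        + alt_reads * math.log(p_alt[g])
        + ref_reads * math.log(1.0 - p_alt[g])
        for g in priors
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post.values())
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

balanced = genotype_posteriors(ref_reads=10, alt_reads=10)
all_reference = genotype_posteriors(ref_reads=20, alt_reads=0)
```

Reads that map to several loci break the independence assumed here, which is the problem the multi-locus model described above is designed to address.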
It has become increasingly clear that the genome is dynamic and exquisitely sensitive, changing expression patterns in response to age, environmental stimuli and pharmacological and physiological manipulations. Similarly, cellular phenotype, traditionally viewed as a stable end-state, should be viewed as versatile and changeable. The phenotype of a cell is better defined as a “homeostatic phenotype” implying plasticity resulting from a dynamically-changing yet characteristic pattern of gene/protein expression. A stable change in phenotype is the result of the movement of a cell between different multi-dimensional identity spaces. Here, we describe a key driver of this transition and the stabilizer of phenotype: the relative abundances of the cellular RNAs. We argue that the quantitative state of RNA can be likened to a state memory, that when transferred between cells, alters the phenotype in a predictable manner.
Gene expression is a dynamic trait, and the evolution of gene regulation can dramatically alter the timing of gene expression without greatly affecting mean expression levels. Moreover, modules of co-regulated genes may exhibit coordinated shifts in expression timing patterns during evolutionary divergence. Here, we examined transcriptome evolution in the dynamical context of the budding yeast cell-division cycle, to investigate the extent of divergence in expression timing and the regulatory architecture underlying timing evolution.
Using a custom microarray platform, we obtained 378 measurements for 6,263 genes over 18 timepoints of the cell-division cycle in nine strains of S. cerevisiae and one strain of S. paradoxus. Most genes show significant divergence in expression dynamics at all scales of transcriptome organization, suggesting broad potential for timing changes. A model test comparing expression level evolution versus timing evolution revealed a better fit with timing evolution for 82% of genes. Analysis of shared patterns of timing evolution suggests the existence of seven dynamically-autonomous modules, each of which shows coherent evolutionary timing changes. Analysis of transcription factors associated with these gene modules suggests a modular pleiotropic source of divergence in expression timing.
We propose that transcriptome evolution may generally entail changes in timing (heterochrony) rather than changes in levels (heterometry) of expression. Evolution of gene expression dynamics may involve modular changes in timing control mediated by module-specific transcription factors. We hypothesize that genome-wide gene regulation may utilize a general architecture comprised of multiple semi-autonomous event timelines, whose superposition could produce combinatorial complexity in timing control patterns.
We investigate the logic by which sensory input is translated into behavioral output. First we provide a functional analysis of the entire odor receptor repertoire of an olfactory system. We construct tuning curves for the 21 functional odor receptors of the Drosophila larva, and show that they sharpen at lower odor doses. We construct a 21-dimensional odor space from the responses of the receptors and find that the distance between two odors correlates with the extent to which one odor masks the other. Mutational analysis shows that different receptors mediate the responses to different concentrations of an odorant. The summed response of the entire receptor repertoire correlates with the strength of the behavioral response. The activity of a small number of receptors is a surprisingly powerful predictor of behavior. Odors that inhibit more receptors are more likely to be repellents. Odor space is largely conserved between two dissimilar olfactory systems.
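The receptor-based quantities mentioned above can be written down in a few lines. The sketch represents each odor as a 21-element vector of receptor responses and uses Euclidean distance; the choice of metric, the zero baseline for inhibition, and the function names are assumptions, since the abstract does not specify them.

```python
import math

def odor_distance(resp_a, resp_b):
    """Euclidean distance between two odors in receptor-response space."""
    if len(resp_a) != len(resp_b):
        raise ValueError("response vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(resp_a, resp_b)))

def summed_response(resp):
    """Total response across the receptor repertoire."""
    return sum(resp)

def inhibited_count(resp, baseline=0.0):
    """Number of receptors whose response falls below `baseline`."""
    return sum(1 for r in resp if r < baseline)

# Toy data: 21 receptors, responses in arbitrary units.
odor_x = [0.0] * 21
odor_y = [3.0, 4.0] + [0.0] * 19
odor_z = [-1.0, 2.0, -3.0] + [0.0] * 18
toy_distance = odor_distance(odor_x, odor_y)
```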
Olfaction; Drosophila; odor receptor; larva; behavior
The genetics underlying the autism spectrum disorders (ASDs) is complex and remains poorly understood. Previous work has demonstrated an important role for structural variation in a subset of cases, but has lacked the resolution necessary to move beyond detection of large regions of potential interest to identification of individual genes. To pinpoint genes likely to contribute to ASD etiology, we performed high density genotyping in 912 multiplex families from the Autism Genetics Resource Exchange (AGRE) collection and contrasted results to those obtained for 1,488 healthy controls. Through prioritization of exonic deletions (eDels), exonic duplications (eDups), and whole gene duplication events (gDups), we identified more than 150 loci harboring rare variants in multiple unrelated probands, but no controls. Importantly, 27 of these were confirmed on examination of an independent replication cohort comprised of 859 cases and an additional 1,051 controls. Rare variants at known loci, including exonic deletions at NRXN1 and whole gene duplications encompassing UBE3A and several other genes in the 15q11–q13 region, were observed in the course of these analyses. Strong support was likewise observed for previously unreported genes such as BZRAP1, an adaptor molecule known to regulate synaptic transmission, with eDels or eDups observed in twelve unrelated cases but no controls (p = 2.3×10⁻⁵). Less is known about MDGA2, likewise observed to be case-specific (p = 1.3×10⁻⁴). It is notable, however, that the encoded protein shows an unexpectedly high similarity to Contactin 4 (BLAST E-value = 3×10⁻³⁹), which has also been linked to disease. That hundreds of distinct rare variants were each seen only once further highlights complexity in the ASDs and points to the continued need for larger cohorts.
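To give a feel for how case-only observations translate into a p-value, here is a one-sided Fisher exact test written with the standard library. The abstract does not state which test or which denominators were used, so the pooled counts in the example (912 + 859 cases and 1,488 + 1,051 controls, one proband per family) are assumptions made purely for illustration.

```python
from math import comb

def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """P(at least `case_hits` carriers fall among cases), margins held fixed.

    This is the upper tail of the hypergeometric distribution, i.e. a
    one-sided Fisher exact test for enrichment in cases.
    """
    total = n_cases + n_controls
    hits = case_hits + control_hits
    denominator = comb(total, hits)
    numerator = 0
    for k in range(case_hits, min(hits, n_cases) + 1):
        numerator += comb(n_cases, k) * comb(n_controls, hits - k)
    return numerator / denominator

# Hypothetical pooled cohort sizes (an assumption, see the lead-in).
pooled_cases = 912 + 859
pooled_controls = 1488 + 1051
p_twelve_case_only = fisher_one_sided(12, pooled_cases, 0, pooled_controls)
```

Under these assumed denominators, twelve carriers among cases and none among controls gives a p-value on the order of 10⁻⁵.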
Autism spectrum disorders (ASDs) are common neurodevelopmental syndromes with a strong genetic component. ASDs are characterized by disturbances in social behavior, impaired verbal and nonverbal communication, as well as repetitive behaviors and/or a restricted range of interests. To identify genes likely to contribute to ASD etiology, we performed high density genotyping in 912 multiplex families from the Autism Genetics Resource Exchange (AGRE) collection and contrasted results to those obtained for 1,488 healthy controls. To enrich for variants most likely to interfere with gene function, we restricted our analyses to deletions and gains encompassing exons. Of the many genomic regions highlighted, 27 were seen to harbor rare variants in cases and not controls, both in the first phase of our analysis, and also in an independent replication cohort comprised of 859 cases and 1,051 controls. More work in a larger number of individuals will be required to determine which of the rare alleles highlighted here are indeed related to the ASDs and how they act to shape risk.
RNA molecules will tend to adopt a folded conformation through the pairing of bases on a single strand; the resulting so-called secondary structure is critical to the function of many types of RNA. The secondary structure of a particular substring of functional RNA may depend on its surrounding sequence. Yet, some RNAs such as microRNAs retain their specific structures during biogenesis, which involves extraction of the substructure from a larger structural context, while other functional RNAs may be composed of a fusion of independent substructures. Such observations raise the question of whether particular functional RNA substructures may be selected for invariance of secondary structure to their surrounding nucleotide context. We define the property of self containment to be the tendency for an RNA sequence to robustly adopt the same optimal secondary structure regardless of whether it exists in isolation or is a substring of a longer sequence of arbitrary nucleotide content. We measured degree of self containment using a scoring method we call the self-containment index and found that miRNA stem loops exhibit high self containment, consistent with the requirement for structural invariance imposed by the miRNA biogenesis pathway, while most other structured RNAs do not. Further analysis revealed a trend toward higher self containment among clustered and conserved miRNAs, suggesting that high self containment may be a characteristic of novel miRNAs acquiring new genomic contexts. We found that miRNAs display significantly enhanced self containment compared to other functional RNAs, but we also found a trend toward natural selection for self containment in most functional RNA classes. We suggest that self containment arises out of selection for robustness against perturbations, invariance during biogenesis, and modular composition of structural function. Analysis of self containment will be important for both annotation and design of functional RNAs. 
A Python implementation and Web interface to calculate the self-containment index are available at http://kim.bio.upenn.edu/software/.
An RNA molecule is made up of a linear sequence of nucleotides, which form pairwise interactions that define its folded three-dimensional structure; the particular structure largely depends on the specific sequence. These base-pairing interactions are stabilizing, and the RNA will tend to fold in a particular way to maximize stability. Consider some nucleotide sequence that optimally folds into some structure in isolation; if this sequence is now embedded inside a larger sequence, then either the original structure will be a robust subcomponent of the larger folded structure, or it will be disrupted due to new interactions between the original sequence and the surrounding sequence. We explore this property of context robustness of structure and in particular define the property of “self containment” to describe intrinsic context robustness—i.e., the tendency for certain sequences to be structurally robust in many different sequence contexts. Self containment turns out to be a strong characteristic of a class of RNAs called microRNAs, whose biogenesis process depends on the maintenance of structural robustness. This finding will be useful in future efforts to characterize novel miRNAs, as well as in understanding the regulation and evolution of noncoding functional RNAs as modular units.
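The embedding experiment described above can be sketched in a self-contained way. Two simplifications are swapped in and should be read as assumptions: folding uses a Nussinov-style maximum base-pairing rule instead of minimum-free-energy prediction, and the score is the average fraction of the isolated structure's base pairs that survive in random flanking contexts. It is not the published self-containment index or the authors' implementation.

```python
import random

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def fold(seq, min_loop=3):
    """Nussinov-style fold: return one maximum set of nested base pairs (i, j)."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]                      # case: j unpaired
            for k in range(i, j - min_loop):            # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    value = max(value, left + 1 + best[k + 1][j - 1])
            best[i][j] = value
    pairs, stack = set(), [(0, n - 1)]
    while stack:                                        # traceback
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    pairs.add((k, j))
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return pairs

def containment_score(seq, flank_len=20, trials=10, seed=0):
    """Mean fraction of isolated-fold base pairs kept inside random flanks."""
    rng = random.Random(seed)
    native = fold(seq)
    if not native:
        return 0.0
    shifted = {(i + flank_len, j + flank_len) for i, j in native}
    kept = 0.0
    for _ in range(trials):
        left = "".join(rng.choice("ACGU") for _ in range(flank_len))
        right = "".join(rng.choice("ACGU") for _ in range(flank_len))
        kept += len(shifted & fold(left + seq + right)) / len(native)
    return kept / trials

hairpin_pairs = fold("GGGAAAACCC")
hairpin_score = containment_score("GGGAAAACCC", flank_len=5, trials=3)
```

Because maximum-pairing folds often have many equally good solutions, scores from this sketch are noisier than those from an energy-based folding method would be.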
Computational prediction and in vivo protein coupling experiments identify candidate plant G-protein coupled receptors in Arabidopsis, rice and poplar.
The classic paradigm of heterotrimeric G-protein signaling describes a heptahelical, membrane-spanning G-protein coupled receptor that physically interacts with an intracellular Gα subunit of the G-protein heterotrimer to transduce signals. G-protein coupled receptors comprise the largest protein superfamily in metazoa and are physiologically important as they sense highly diverse stimuli and play key roles in human disease. The heterotrimeric G-protein signaling mechanism is conserved across metazoa, and also readily identifiable in plants, but the low sequence conservation of G-protein coupled receptors hampers the identification of novel ones. Using diverse computational methods, we performed whole-proteome analyses of the three dominant model plant species, the herbaceous dicot Arabidopsis thaliana (mouse-eared cress), the monocot Oryza sativa (rice), and the woody dicot Populus trichocarpa (poplar), to identify plant protein sequences most likely to be GPCRs.
Our stringent bioinformatic pipeline allowed the high confidence identification of candidate G-protein coupled receptors within the Arabidopsis, Oryza, and Populus proteomes. We extended these computational results through actual wet-bench experiments where we tested over half of our highest ranking Arabidopsis candidate G-protein coupled receptors for the ability to physically couple with GPA1, the sole Gα in Arabidopsis. We found that seven out of eight tested candidate G-protein coupled receptors do in fact interact with GPA1. We show through G-protein coupled receptor classification and molecular evolutionary analyses that both individual G-protein coupled receptor candidates and candidate G-protein coupled receptor families are conserved across plant species and that, in some cases, this conservation extends to metazoans.
Our computational and wet-bench results provide the first step toward understanding the diversity, conservation, and functional roles of plant candidate G-protein coupled receptors.
Comparative sequence analysis and annotation of genomic regions surrounding 150 presynaptic genes identified over 26,000 elements highly conserved in eight vertebrate species; these results are made available in the SynapseDB database.
The neuronal synapse is a fundamental functional unit in the central nervous system of animals. Because synaptic function is evolutionarily conserved, we reasoned that functional sequences of genes and related genomic elements known to play important roles in neurotransmitter release would also be conserved.
Evolutionary rate analysis revealed that presynaptic proteins evolve slowly, although some members of large gene families exhibit accelerated evolutionary rates relative to other family members. Comparative sequence analysis of 46 megabases spanning 150 presynaptic genes identified more than 26,000 elements that are highly conserved in eight vertebrate species, as well as a small subset of sequences (6%) that are shared among unrelated presynaptic genes. Analysis of large gene families revealed that upstream and intronic regions of closely related family members are extremely divergent. We also identified 504 exceptionally long conserved elements (≥360 base pairs, ≥80% pair-wise identity between human and other mammals) in intergenic and intronic regions of presynaptic genes. Many of these elements form a highly stable stem-loop RNA structure and consequently are candidates for novel regulatory elements, whereas some conserved noncoding elements are shown to correlate with specific gene expression profiles. The SynapseDB online database integrates these findings and other functional genomic resources for synaptic genes.
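The thresholds quoted above for exceptionally long conserved elements (≥360 base pairs, ≥80% pair-wise identity) can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, the input is assumed to be two already-aligned sequences of equal length with `-` as the gap character, gap columns are counted as mismatches, and element length is taken to be the number of aligned columns.

```python
def percent_identity(aligned_a, aligned_b):
    """Fraction of alignment columns where both sequences carry the same base.

    Assumption: inputs are pre-aligned, equal-length strings; any column
    containing a gap ('-') is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return matches / len(aligned_a)


def is_long_conserved(aligned_human, aligned_other, min_len=360, min_identity=0.80):
    """Apply the length and identity cutoffs quoted in the text.

    Assumption: length is measured in aligned columns, a simplification
    chosen for this sketch.
    """
    if len(aligned_human) < min_len:
        return False
    return percent_identity(aligned_human, aligned_other) >= min_identity
```

In practice the identity would be computed against each mammalian species in turn, and how the per-species results are combined is a design choice that the text does not specify.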
Highly conserved elements in nonprotein coding regions of 150 presynaptic genes represent sequences that may be involved in the transcriptional or post-transcriptional regulation of these genes. Furthermore, comparative sequence analysis will facilitate selection of genes and noncoding sequences for future functional studies and analysis of variation studies in neurodevelopmental and psychiatric disorders.
A computationally efficient statistical framework for estimating networks of coexpressed genes is presented that exploits first-order conditional independence relationships among gene expression measurements.
We describe a computationally efficient statistical framework for estimating networks of coexpressed genes. This framework exploits first-order conditional independence relationships among gene-expression measurements to estimate patterns of association. We use this approach to estimate a coexpression network from microarray gene-expression measurements from Saccharomyces cerevisiae. We demonstrate the biological utility of this approach by showing that a large number of metabolic pathways are coherently represented in the estimated network. We describe a complementary unsupervised graph search algorithm for discovering locally distinct subgraphs of a large weighted graph. We apply this algorithm to our coexpression network model and show that subgraphs found using this approach correspond to particular biological processes or contain representatives of distinct gene families.
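To make the idea of first-order conditional independence concrete, the following Python sketch keeps an edge between two genes only if their correlation remains strong after conditioning on every single third gene. It is an illustration under stated assumptions rather than the framework described above: the function names are hypothetical, a fixed correlation threshold stands in for whatever statistical test the framework uses, and every expression profile is assumed to be non-constant.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y given z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom < 1e-12:
        # z is perfectly correlated with x or y, so the partial correlation
        # is undefined; this sketch returns 0.0, which drops the edge.
        return 0.0
    return (r_xy - r_xz * r_yz) / denom


def first_order_network(expr, threshold=0.5):
    """Return the set of gene pairs that survive first-order conditioning.

    expr maps gene name -> list of expression measurements.
    threshold is an illustrative cutoff (assumption), not a fitted value.
    """
    genes = sorted(expr)
    r = {(a, b): pearson(expr[a], expr[b]) for a, b in combinations(genes, 2)}

    def corr(a, b):
        return r[(a, b)] if (a, b) in r else r[(b, a)]

    edges = set()
    for a, b in combinations(genes, 2):
        if abs(corr(a, b)) < threshold:
            continue  # weak marginal association: no edge
        others = (g for g in genes if g not in (a, b))
        if all(
            abs(partial_corr(corr(a, b), corr(a, g), corr(b, g))) >= threshold
            for g in others
        ):
            edges.add((a, b))
    return edges
```

Conditioning on one gene at a time requires only pairwise correlations, which is what keeps the approach cheap relative to estimating full-order partial correlations across thousands of genes.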
With the availability of increasing amounts of genomic sequences, it is becoming clear that genomes experience horizontal transfer and incorporation of genetic information. However, to what extent such horizontal gene transfer (HGT) affects the core genealogical history of organisms remains controversial. Based on initial analyses of complete genomic sequences, HGT has been suggested to be so widespread that it might be the “essence of phylogeny” and might leave the treelike form of genealogy in doubt. On the other hand, possible biased estimation of HGT extent and the findings of coherent phylogenetic patterns indicate that phylogeny of life is well represented by tree graphs. Here, we reexamine this question by assessing the extent of HGT among core orthologous genes using a novel statistical method based on statistical comparisons of tree topology. We apply the method to 40 microbial genomes in the Clusters of Orthologous Groups database over a curated set of 297 orthologous gene clusters, and we detect significant HGT events in 33 out of 297 clusters over a wide range of functional categories. Estimates of positions of HGT events suggest a low mean genome-specific rate of HGT (2.0%) among the orthologous genes, which is in general agreement with other quantitative estimates of HGT. We propose that HGT events, even when relatively common, still leave the treelike history of phylogenies intact, much like cobwebs hanging from tree branches.
A statistical approach applied to 297 orthologous gene clusters in 40 microbial genomes suggests a low rate of interspecies gene transfer. Species relationships can therefore be modeled with a tree structure.