Results 1-25 (70)

1.  Chromatin-associated RNAi components contribute to transcriptional regulation in Drosophila 
Nature  2011;480(7377):391-395.
RNAi pathways have evolved as important modulators of gene expression that act in the cytoplasm by degrading RNA target molecules via the activity of short (21-30 nt) RNAs [1-6]. RNAi components have also been reported to play a role in the nucleus, as they are involved in epigenetic regulation and heterochromatin formation [7-10]. However, although RNAi-mediated post-transcriptional gene silencing (PTGS) is well documented, the mechanisms of RNAi-mediated transcriptional gene silencing (TGS), and in particular the role of RNAi components in chromatin, especially in higher eukaryotes, remain elusive. Here we show that the key RNAi components Dicer-2 (Dcr2) and Argonaute-2 (AGO2) associate with chromatin, with a strong preference for euchromatic, transcriptionally active loci, and interact with the core transcription machinery. Notably, Dcr2 and AGO2 loss-of-function experiments show that transcriptional defects are accompanied by perturbation of Pol II positioning on promoters. Further, both Dcr2 and Ago2 null mutations, as well as missense mutations compromising RNAi activity, impair global Pol II dynamics upon heat shock. Finally, AGO2 RIP-seq experiments reveal that AGO2 is strongly enriched in small RNAs encompassing promoters as well as other parts of heat shock and other gene loci, in both sense and antisense orientations, with a strong bias for antisense, particularly after heat shock. Taken together, our results reveal a new scenario in which Dcr2 and AGO2 are globally associated with transcriptionally active loci and may play a pivotal role in shaping the transcriptome by controlling RNA Pol II processivity.
PMCID: PMC4082306  PMID: 22056986
2.  MOIRAI: a compact workflow system for CAGE analysis 
BMC Bioinformatics  2014;15:144.
Cap analysis of gene expression (CAGE) is a sequencing-based technology to capture the 5′ ends of RNAs in a biological sample. After mapping, a CAGE peak on the genome indicates the position of an active transcriptional start site (TSS), and the number of reads corresponds to its expression level. CAGE is prominently used in both the FANTOM and ENCODE projects, but presently there is no software package to perform the essential data processing steps.
Here we describe MOIRAI, a compact yet flexible workflow system designed to carry out the main steps in data processing and analysis of CAGE data. MOIRAI has a graphical interface allowing wet-lab researchers to create, modify and run analysis workflows. Embedded within the workflows are graphical quality control indicators allowing users to assess data quality and to quickly spot potential problems. We describe three main workflows allowing users to map, annotate and perform an expression analysis over multiple samples.
Due to its many built-in quality control features, MOIRAI is especially suitable to support the development of new sequencing-based protocols.
The MOIRAI source code is freely available at
PMCID: PMC4033680  PMID: 24884663
CAGE; Pipeline; Next generation sequencing
3.  RECLU: a pipeline to discover reproducible transcriptional start sites and their alternative regulation using capped analysis of gene expression (CAGE) 
BMC Genomics  2014;15:269.
Next-generation sequencing-based technologies are being extensively used to study transcriptomes. Among these, cap analysis of gene expression (CAGE) is specialized in detecting the 5′-most ends of RNA molecules. After the sequenced reads are mapped back to a reference genome, CAGE data highlight the transcriptional start sites (TSSs) and their usage at single-nucleotide resolution.
We propose a pipeline to group single-nucleotide TSSs into larger reproducible peaks and compare their usage across biological states. Importantly, our pipeline discovers broad peaks as well as the fine structure of individual transcriptional start sites embedded within them. We assess the performance of our approach on a large CAGE dataset including 156 primary cell types and two cell lines with biological replicates. We demonstrate that genes have complicated structures of transcription initiation events. In particular, we discover that narrow peaks embedded in broader regions of transcriptional activity can be differentially used even if the larger region is not.
By examining the reproducible fine-scale organization of TSSs, we can detect many differentially regulated peaks that previous approaches missed.
PMCID: PMC4029093  PMID: 24779366
CAGE; Peak finding; Reproducibility; Hierarchical stability
4.  Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines 
BMC Genomics  2014;15:120.
Deciphering the most common modes by which chromatin regulates transcription, and how this is related to cellular status and processes, is an important task for improving our understanding of human cellular biology. The FANTOM5 and ENCODE projects represent two independent large-scale efforts to map regulatory and transcriptional features to the human genome. Here we investigate chromatin features around a comprehensive set of transcription start sites in four cell lines by integrating data from these two projects.
Transcription start sites can be distinguished by chromatin states defined by specific combinations of both chromatin mark enrichment and the profile shapes of these chromatin marks. The observed patterns can be associated with cellular functions and processes, and they also show association with expression level, location relative to nearby genes, and CpG content. In particular, we find a substantial number of repressed inter- and intra-genic transcription start sites enriched for active chromatin marks and Pol II, and these sites are strongly associated with immediate-early response processes and cell signaling. Associations between start sites with similar chromatin patterns are validated by significant correlations in their global expression profiles.
The results confirm the link between chromatin state and cellular function for expressed transcripts, and also indicate that active chromatin states at repressed transcripts may poise transcripts for rapid activation during immune response.
PMCID: PMC3986914  PMID: 24669905
Fantom; Encode; Cage; Transcription start sites; Chromatin states; Gene expression
5.  NanoCAGE analysis of the mouse olfactory epithelium identifies the expression of vomeronasal receptors and of proximal LINE elements 
By coupling laser capture microdissection to nanoCAGE technology and next-generation sequencing, we have identified the genome-wide collection of active promoters in the mouse Main Olfactory Epithelium (MOE). Transcription start sites (TSSs) for the large majority of Olfactory Receptors (ORs) have been mapped previously, increasing our understanding of their promoter architecture. Here we show that in our nanoCAGE libraries of the mouse MOE we detect a large number of tags mapped in loci hosting Type-1 and Type-2 Vomeronasal Receptor genes (V1Rs and V2Rs). These loci also show massive expression of Long Interspersed Nuclear Elements (LINEs). We have validated the expression of selected receptors detected by nanoCAGE with in situ hybridization, RT-PCR and qRT-PCR. This work extends the repertoire of receptors capable of sensing chemical signals in the MOE, suggesting intriguing interplays between the MOE and the VNO for pheromone processing, and positions transcribed LINEs as candidate regulatory RNAs for VR expression.
PMCID: PMC3927265  PMID: 24600346
vomeronasal receptors; main olfactory epithelium; vomeronasal organ; VNO; MOE; V1Rs; V2Rs
6.  NMDA Receptor Regulation Prevents Regression of Visual Cortical Function in the Absence of Mecp2 
Neuron  2012;76(6):1078-1090.
Brain function is shaped by postnatal experience and is vulnerable to disruption of Methyl-CpG-binding protein, Mecp2, in multiple neurodevelopmental disorders. How Mecp2 contributes to the experience-dependent refinement of specific cortical circuits and their impairment remains unknown. We analyzed vision in gene-targeted mice and observed initially normal development in the absence of Mecp2. Visual acuity then rapidly regressed after postnatal day P35–40, and cortical circuits largely fell silent by P55–60. Enhanced inhibitory gating and an excess of parvalbumin-positive, perisomatic input preceded the loss of vision. Both cortical function and inhibitory hyperconnectivity were strikingly rescued independent of Mecp2 by early sensory deprivation or genetic deletion of the excitatory NMDA receptor subunit, NR2A. Thus, vision is a sensitive biomarker of progressive cortical dysfunction and may guide novel, circuit-based therapies for Mecp2 deficiency.
PMCID: PMC3733788  PMID: 23259945
7.  A comprehensive promoter landscape identifies a novel promoter for CD133 in restricted tissues, cancers, and stem cells 
Frontiers in Genetics  2013;4:209.
PROM1 is the gene encoding prominin-1 or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSS) associated with distinct tissue and cancer expression profiles. Using high-resolution Cap Analysis of Gene Expression (CAGE) sequencing, we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) within CD133+ melanoma cell lines and stem cells. Additional exon array sampling finds P6 to be active in populations enriched for mesenchyme and neural stem cells, and within CD133+ enriched Ewing sarcomas. Compared with previously characterized PROM1 promoters, the P6 promoter is enriched for an HMGI/Y (HMGA1) family transcription factor binding site motif and exhibits different epigenetic modifications relative to the canonical promoter region of PROM1.
PMCID: PMC3810939  PMID: 24194746
PROM1 protein; human; AC133 antigen; transcription start site; promoter regions; genetic; melanoma; cancer stem cells
8.  Temporal dynamics and transcriptional control using single-cell gene expression analysis 
Genome Biology  2013;14(10):R118.
Changes in environmental conditions lead to expression variation that manifests at the level of gene regulatory networks. Despite a strong understanding of the role noise plays in synthetic biological systems, it remains unclear how expression heterogeneity propagating through an endogenous regulatory network is distributed and utilized by cells transitioning through a key developmental event.
Here we investigate the temporal dynamics of a single-cell transcriptional network of 45 transcription factors in THP-1 human myeloid monocytic leukemia cells undergoing differentiation to macrophages. We systematically measure temporal regulation of expression and variation by profiling 120 single cells at eight distinct time points, and infer highly controlled regulatory modules through which signaling operates with stochastic effects. This reveals dynamic and specific rewiring as a cellular strategy for differentiation. The integration of both positive and negative co-expression networks further identifies the proto-oncogene MYB as a network hinge to modulate both the pro- and anti-differentiation pathways.
Compared to averaged cell populations, temporal single-cell expression profiling provides a much more powerful technique to probe for mechanistic insights underlying cellular differentiation. We believe that our approach will form the basis of novel strategies to study the regulation of transcription at a single-cell level.
PMCID: PMC4015031  PMID: 24156252
9.  Comparison of RNA- or LNA-hybrid oligonucleotides in template-switching reactions for high-speed sequencing library preparation 
BMC Genomics  2013;14:665.
Analyzing the RNA pool or transcription start sites requires effective means to convert RNA into cDNA libraries for digital expression counting. With current high-speed sequencers, it is necessary to flank the cDNAs with specific adapters. Adding template-switching oligonucleotides to reverse transcription reactions is the most commonly used approach when working with very small quantities of RNA even from single cells.
Here we compared the performance of DNA-RNA, DNA-LNA and DNA oligonucleotides in template-switching during nanoCAGE library preparation. Test libraries from rat muscle and HeLa cell RNA were prepared in technical triplicates and sequenced for comparison of the gene coverage and distribution of the reads within transcripts. The DNA-RNA oligonucleotide showed the highest specificity for capped 5′ ends of mRNA, whereas the DNA-LNA provided similar gene coverage with more reads falling within exons.
While confirming the cap-specific preference of DNA-RNA oligonucleotides in template-switching reactions, our data indicate that DNA-LNA hybrid oligonucleotides could potentially find other applications in random RNA sequencing.
PMCID: PMC3853366  PMID: 24079827
CAGE; Template-switching; LNA; Transcriptome; Quantitative sequencing
10.  Trehalose-enhanced isolation of neuronal sub-types from adult mouse brain 
BioTechniques  2012;52(6):381-385.
Efficient isolation of specific, intact, living neurons from the adult brain is problematic due to the complex nature of the extracellular matrix consolidating the neuronal network. Here, we present significant improvements to the protocol for isolating pure populations of neurons from mature postnatal mouse brain using fluorescence-activated cell sorting (FACS). The 10-fold increase in cell yield enables cell-specific transcriptome analysis by protocols such as nanoCAGE and RNA-seq.
PMCID: PMC3696583  PMID: 22668417
FACS; parvalbumin; pyramidal; nanoCAGE; RNA seq
11.  Endogenous Retrotransposition Activates Oncogenic Pathways in Hepatocellular Carcinoma 
Cell  2013;153(1):101-111.
LINE-1 (L1) retrotransposons are mobile genetic elements comprising ∼17% of the human genome. New L1 insertions can profoundly alter gene function and cause disease, though their significance in cancer remains unclear. Here, we applied enhanced retrotransposon capture sequencing (RC-seq) to 19 hepatocellular carcinoma (HCC) genomes and elucidated two archetypal L1-mediated mechanisms enabling tumorigenesis. In the first example, 4/19 (21.1%) donors presented germline retrotransposition events in the tumor suppressor mutated in colorectal cancers (MCC). MCC expression was ablated in each case, enabling oncogenic β-catenin/Wnt signaling. In the second example, suppression of tumorigenicity 18 (ST18) was activated by a tumor-specific L1 insertion. Experimental assays confirmed that the L1 interrupted a negative feedback loop by blocking ST18 repression of its enhancer. ST18 was also frequently amplified in HCC nodules from Mdr2−/− mice, supporting its assignment as a candidate liver oncogene. These proof-of-principle results substantiate L1-mediated retrotransposition as an important etiological factor in HCC.
Highlights
► L1 retrotransposons promote tumorigenesis in hepatocellular carcinoma (HCC)
► Germline L1 and Alu insertions in MCC activate β-catenin/Wnt signaling
► L1 mobilization in tumor cells accelerates transformation of the HCC genome
► A tumor-specific L1 insertion interrupts a negative feedback loop regulating ST18
L1 retrotransposons, which are widespread in the human genome, can mobilize and activate oncogenes in the livers of individuals infected with the hepatitis B or hepatitis C virus, promoting the development and growth of hepatocellular carcinoma. Genes identified by the L1 insertions present new options for cancer screening and intervention.
PMCID: PMC3898742  PMID: 23540693
12.  Site-specific DICER and DROSHA RNA products control the DNA damage response 
Nature  2012;488(7410):231-235.
Non-coding RNAs (ncRNAs) are involved in an increasing number of cellular events [1]. Some ncRNAs are processed by DICER and DROSHA ribonucleases to give rise to small double-stranded RNAs involved in RNA interference (RNAi) [2]. The DNA-damage response (DDR) is a signaling pathway that originates from the DNA lesion and arrests cell proliferation [3]. So far, DICER or DROSHA RNA products have not been reported to control DDR activation. Here we show that DICER and DROSHA, but not downstream elements of the RNAi pathway, are necessary to activate the DDR upon oncogene-induced genotoxic stress and exogenous DNA damage, as shown by DDR foci formation in mammalian cells and zebrafish and by checkpoint assays. DDR foci are sensitive to RNase A treatment, and DICER- and DROSHA-dependent RNA products are required to restore DDR foci in treated cells. Through RNA deep sequencing and studies of DDR activation at an inducible unique DNA double-strand break (DSB), we demonstrate that DDR foci formation requires site-specific DICER- and DROSHA-dependent small RNAs, named DDRNAs, which act in an MRE11-RAD50-NBS1 (MRN) complex-dependent manner. Whether chemically synthesized or generated in vitro by DICER cleavage, DDRNAs are sufficient to restore the DDR in RNase A-treated cells, even in the absence of other cellular RNAs. Our results describe an unanticipated direct role of a novel class of ncRNAs in the control of DDR activation at sites of DNA damage.
PMCID: PMC3442236  PMID: 22722852
DICER; DROSHA; small non coding RNAs; DNA damage response (DDR); ATM; cellular senescence; zebrafish
13.  piRNAs Warrant Investigation in Rett Syndrome: An Omics Perspective 
Disease markers  2012;33(5):261-275.
Mutations in the MECP2 gene are found in a large proportion of girls with Rett Syndrome. Despite extensive research, the principal role of the MeCP2 protein remains elusive. Is MeCP2 a regulator of genes, acting in concert with co-activators and co-repressors, predominantly as an activator of target genes, or is it a methyl-CpG binding protein acting globally to change the chromatin state and to suppress transcription from repeat elements? If MeCP2 has no specific targets in the genome, what causes the differential expression of specific genes in the Mecp2 knockout mouse brain? We discuss the discrepancies in current data and propose a hypothesis to reconcile some differences between the two viewpoints. Since transcripts from repeat elements contribute to piRNA biogenesis, we propose that piRNA levels may be higher in the absence of MeCP2 and that increased piRNA levels may contribute to the mis-regulation of some genes seen in the Mecp2 knockout mouse brain. We provide preliminary data showing an increase in piRNAs in the Mecp2 knockout mouse cerebellum. Our investigation suggests that global piRNA levels may be elevated in the Mecp2 knockout mouse cerebellum and strongly supports further investigation of piRNAs in Rett syndrome.
PMCID: PMC3810717  PMID: 22976001
Rett Syndrome; MeCP2; piRNAs; LINE 1; short RNAs
14.  Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching 
Nucleic Acids Research  2012;41(3):e44.
Template switching (TS) is an inherent property of reverse transcriptase that has been exploited in several transcriptome analysis methods, such as CAGE, RNA-Seq and short RNA sequencing. TS is an attractive option given the simplicity of the protocol, which does not require an adaptor-mediated step and thus minimizes sample loss. As such, it has been used in several studies that deal with limited amounts of RNA, such as single-cell studies. Additionally, TS has been used to introduce DNA barcodes or indexes into different samples, cells or molecules. This labeling allows one to pool several samples into one sequencing flow cell, taking advantage of the increasing throughput of current sequencers. Here, we report TS artifacts that form owing to a process called strand invasion. Because of the way in which barcodes/indexes are introduced by TS, strand invasion becomes more problematic, as it introduces unsystematic biases. We describe a strategy that eliminates these artifacts in silico and propose an experimental solution that suppresses biases from TS.
PMCID: PMC3562004  PMID: 23180801
15.  Somatic retrotransposition alters the genetic landscape of the human brain 
Nature  2011;479(7374):534-537.
Retrotransposons are mobile genetic elements that employ a germ line "copy-and-paste" mechanism to spread throughout metazoan genomes [1]. At least 50% of the human genome is derived from retrotransposons, with three active families (L1, Alu and SVA) associated with insertional mutagenesis and disease [2-3]. Epigenetic and post-transcriptional suppression block retrotransposition in somatic cells [4-5], except during early embryonic development and in some malignancies [6-7]. Recent reports of L1 expression [8-9] and copy number variation (CNV) [10-11] in the human brain suggest that L1 mobilization may also occur during later development. However, the corresponding integration sites have not been mapped. Here we apply a high-throughput method to identify numerous L1, Alu and SVA germ line mutations, as well as 7,743 putative somatic L1 insertions, in the hippocampus and caudate nucleus of three individuals. Surprisingly, we also found 13,692 and 1,350 somatic Alu and SVA insertions, respectively. Our results demonstrate that retrotransposons mobilize to protein-coding genes differentially expressed and active in the brain. Thus, somatic genome mosaicism driven by retrotransposition may reshape the genetic circuitry that underpins normal and abnormal neurobiological processes.
PMCID: PMC3224101  PMID: 22037309
16.  Automated Workflow for Preparation of cDNA for Cap Analysis of Gene Expression on a Single Molecule Sequencer 
PLoS ONE  2012;7(1):e30809.
Cap analysis of gene expression (CAGE) is a 5′ sequence tag technology to globally determine transcriptional starting sites in the genome and their expression levels, and it has most recently been adapted to the HeliScope single molecule sequencer. Despite significant simplifications, the CAGE protocol has until now remained labour-intensive.
In this study we set out to adapt the protocol to a robotic workflow, which would increase throughput and reduce handling. The automated CAGE cDNA preparation system we present here can prepare 96 'HeliScope ready' CAGE cDNA libraries in 8 days, as opposed to 6 weeks for a manual operator. We compare the results obtained using the same RNA in manual libraries and across multiple automation batches to assess reproducibility.
We show that the sequencing was highly reproducible and comparable to manual libraries, with an 8-fold increase in productivity. The automated CAGE cDNA preparation system can prepare 96 CAGE sequencing samples simultaneously. Finally, we discuss how the system could be used for CAGE on Illumina/SOLiD platforms, RNA-seq and full-length cDNA generation.
PMCID: PMC3268765  PMID: 22303458
18.  The Short Non-Coding Transcriptome of the Protozoan Parasite Trypanosoma cruzi 
The pathway for RNA interference is widespread in metazoans and participates in numerous cellular tasks, from gene silencing to chromatin remodeling and protection against retrotransposition. The unicellular eukaryote Trypanosoma cruzi lacks the canonical RNAi pathway and is unable to induce RNAi-related processes. To further understand alternative RNA pathways operating in this organism, we have performed deep sequencing and genome-wide analyses of a size-fractionated cDNA library (16–61 nt) from the epimastigote life stage. Deep sequencing generated 582,243 short sequences, of which 91% could be aligned with the genome sequence. About 95–98% of the aligned data (depending on the haplotype) corresponded to small RNAs derived from tRNAs, rRNAs, snRNAs and snoRNAs. The largest class consisted of tRNA-derived small RNAs, which primarily originated from the 3′ end of tRNAs, followed by small RNAs derived from rRNA. The remaining sequences revealed the presence of 92 novel transcribed loci, of which 79 did not show homology to known RNA classes.
Author Summary
Chagas' disease is a major health problem in Latin America and is caused by the protozoan parasite Trypanosoma cruzi. T. cruzi lacks the pathway for RNA interference, which is widespread among eukaryotes, and is therefore unable to induce RNAi-related processes. In many organisms, small RNAs play an important role in regulating gene expression and other cellular processes. To understand whether other small RNA pathways operate in this organism, we performed high-throughput sequencing and genome-wide analyses of the short transcriptome. We identified an abundance of small RNAs derived from non-coding RNA genes, including transfer RNAs, ribosomal RNAs, small nucleolar RNAs and small nuclear RNAs. Certain tRNA types were overrepresented as precursors for small RNAs. Further, we identified 79 novel small non-coding RNAs not previously reported. We did not identify canonical small RNAs, such as microRNAs and small interfering RNAs, and concluded that these do not exist in T. cruzi. This study provides insights into the short transcriptome of a major human pathogen and starting points for further functional investigation of small RNAs and their biological roles.
PMCID: PMC3166047  PMID: 21912713
19.  Nuclear Pore Complex Protein Mediated Nuclear Localization of Dicer Protein in Human Cells 
PLoS ONE  2011;6(8):e23385.
Human DICER1 protein cleaves double-stranded RNA into small fragments, a crucial step in the production of the single-stranded RNAs that mediate cytoplasmic RNA interference. Here, we clearly demonstrate that human DICER1 protein localizes not only to the cytoplasm but also to the nucleoplasm. We also find that human DICER1 protein associates with the NUP153 protein, a component of the nuclear pore complex. This association is detected predominantly in the cytoplasm but is also clearly distinguishable at the nuclear periphery. Additional characterization of the NUP153-DICER1 association suggests that NUP153 plays a crucial role in the nuclear localization of the DICER1 protein.
PMCID: PMC3156128  PMID: 21858095
20.  Genetics, Gene Expression and Bioinformatics of the Pituitary Gland 
Hormone research  2009;71(Suppl 2):101-115.
Genetic cases of congenital pituitary hormone deficiency are common, and many are caused by transcription factor defects. Mouse models with orthologous mutations are invaluable for uncovering the molecular mechanisms that lead to problems in organ development and typical patient characteristics. We are using mutant mice defective in the transcription factors PROP1 and POU1F1 for gene expression profiling to identify target genes for these critical transcription factors and candidates for cases of pituitary hormone deficiency of unknown etiology. These studies reveal critical roles for Wnt signalling pathways, including the TCF/LEF transcription factors and interacting proteins of the groucho family, bone morphogenetic protein antagonists, and targets of notch signalling. Current studies are investigating roles of novel homeobox genes and pathways that regulate the transition from proliferation to differentiation, cell adhesion and cell migration.
Pituitary adenomas are a common human health problem, yet most cases are sporadic, necessitating alternative approaches to traditional Mendelian genetic studies. Mouse models of adenoma formation offer the opportunity for gene expression profiling during progressive stages of hyperplasia, adenoma and tumorigenesis. This approach holds promise for identification of relevant pathways and candidate genes as risk factors for adenoma formation, understanding mechanisms of progression, and identifying drug targets and clinically relevant biomarkers.
PMCID: PMC3140954  PMID: 19407506
cell proliferation; apoptosis; transcription factors; Prop1; Emx2
21.  The Reality of Pervasive Transcription 
PLoS Biology  2011;9(7):e1000625.
Despite recent controversies, the evidence that the majority of the human genome is transcribed into RNA remains strong.
PMCID: PMC3134446  PMID: 21765801
22.  Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan 
Nature methods  2010;7(7):528-534.
Large-scale sequencing projects have revealed an unexpected complexity in the origins, structures and functions of mammalian transcripts. Many loci are known to produce overlapping coding and non-coding RNAs with capped 5′ ends that vary in size. Methods that identify the 5′ ends of transcripts will facilitate the discovery of novel promoters and 5′ ends derived from secondary capping events. Such methods often require high input amounts of RNA not obtainable from highly refined samples such as tissue microdissections and subcellular fractions. Therefore, we have developed nanoCAGE (Cap Analysis of Gene Expression), a method that captures the 5′ ends of transcripts from as little as 10 nanograms of total RNA and CAGEscan, a mate-pair adaptation of nanoCAGE that captures the transcript 5′ ends linked to a downstream region. Both of these methods allow further annotation-agnostic studies of the complex human transcriptome.
PMCID: PMC2906222  PMID: 20543846
23.  High Sensitivity TSS Prediction: Estimates of Locations Where TSS Cannot Occur 
PLoS ONE  2010;5(11):e13934.
Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTR, coding exons, etc.), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5′ completeness of expressed sequences, contribute to more successful experimental designs, as well as more accurate gene annotation.
Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that, by performing computational TSS prediction with very high sensitivity, allows us to annotate, with high accuracy and in a strand-specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. In our algorithm we utilize various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.
Our analysis of human chromosomes 4, 21 and 22 estimates ∼46%, ∼41% and ∼27% of these chromosomes, respectively, as being NTLs. This suggests that on average more than 40% of the human genome can be expected to be highly unlikely to initiate transcription. Our method represents the first one that utilizes high-sensitivity TSS prediction to identify, with high accuracy, large portions of mammalian genomes as NTLs. The server with our algorithm implemented is available at
PMCID: PMC2981523  PMID: 21085627
24.  Update of the FANTOM web resource: from mammalian transcriptional landscape to its dynamic regulation 
Nucleic Acids Research  2010;39(Database issue):D856-D860.
The international Functional Annotation Of the Mammalian Genomes 4 (FANTOM4) research collaboration set out to better understand the transcriptional network that regulates macrophage differentiation and to uncover novel components of the transcriptome, employing a series of high-throughput experiments. The primary and unique technique is cap analysis of gene expression (CAGE), which sequences mRNA 5′-ends with a second-generation sequencer to quantify promoter activities even in the absence of gene annotation. Additional genome-wide experiments complement the setup, including short RNA sequencing, microarray gene expression profiling on large-scale perturbation experiments, and ChIP–chip for epigenetic marks and transcription factors. All the experiments are performed in a differentiation time course of the THP-1 human leukemic cell line. Furthermore, we performed a large-scale mammalian two-hybrid (M2H) assay between transcription factors and monitored their expression profile across human and mouse tissues with qRT-PCR to address combinatorial effects of regulation by transcription factors. These interdependent data have been analyzed individually and in combination with each other and are published in related but distinct papers. We provide all data together with systematic annotation in an integrated view as a resource for the scientific community ( Additionally, we assembled a rich set of derived analysis results, including published, predicted and validated regulatory interactions. Here we introduce the resource and its update after the initial release.
PMCID: PMC3013704  PMID: 21075797
25.  Annotating non-coding transcription using functional genomics strategies 
Non-coding RNA (ncRNA) transcripts are RNA molecules that do not code for proteins but elicit function by other mechanisms. The vast majority of RNA produced in a cell is non-coding ribosomal RNA, produced from relatively few loci; however, more recent complementary DNA (cDNA) cloning, tag sequencing, and genome tiling array studies suggest that ncRNAs also account for the majority of RNA species produced by a cell. ncRNA-based regulation has been referred to as a 'hidden layer' of signals or 'dark matter' that controls gene expression in cellular processes by poorly described mechanisms. These terms arose because, until recently, ncRNAs were ignored by expression profiling and cDNA annotation projects, and because their modes of action are diverse (e.g. influencing chromatin structure and epigenetics, translational silencing, transcriptional silencing). Here, we highlight recent functional genomics strategies toward identifying and assigning function to ncRNA transcription.
PMCID: PMC2762128  PMID: 19833699
non-coding RNA; Sequencing; transcription; annotation