Results 1-25 (31)

1.  The Effects of Mary Rose Conservation Treatment on Iron Oxidation Processes and Microbial Communities Contributing to Acid Production in Marine Archaeological Timbers 
PLoS ONE  2014;9(2):e84169.
The Tudor warship the Mary Rose has reached an important transition point in her conservation. The 19-year-long process of spraying with polyethylene glycol (PEG) has been completed (April 29th 2013) and the hull is air drying under tightly controlled conditions. Acidophilic bacteria capable of oxidising iron and sulfur have been previously identified and enriched from unpreserved timbers of the Mary Rose, demonstrating that biological pathways of iron and sulfur oxidization potentially existed in this wood before preservation with PEG. This study was designed to establish if the recycled PEG spray system was a reservoir of microorganisms capable of iron and sulfur oxidization during preservation of the Mary Rose. Microbial enrichments derived from PEG-impregnated biofilm collected from underneath the Mary Rose hull were examined to better understand the processes of cycling of iron. X-ray absorption spectroscopy was utilised to demonstrate the biological contribution to production of sulfuric acid in the wood. Using molecular microbiological techniques to examine these enrichment cultures, PEG was found to mediate a shift in the microbial community from a co-culture of Stenotrophomonas and Brevundimonas sp. to a co-culture of Stenotrophomonas and the iron oxidising Alicyclobacillus sp. Evidence is presented that PEG is not an inert substance in relation to the redox cycling of iron. This is the first demonstration that solutions of PEG used in the conservation of the Mary Rose are promoting the oxidation of ferrous iron in acidic solutions, in which spontaneous abiotic oxidation does not occur in water. Critically, these results suggest PEG mediated redox cycling of iron between valence states in solutions of 75% PEG 200 and 50% PEG 2000 (v/v) at pH 3.0, with serious implications for the future use of PEG as a conservation material of iron-rich wooden archaeological artefacts.
PMCID: PMC3929279  PMID: 24586230
2.  Increased plasma levels of soluble vascular endothelial growth factor (VEGF) receptor 1 (sFlt-1) in women by moderate exercise and increased plasma levels of VEGF in overweight/obese women 
The incidence of breast cancer is increasing worldwide, and this seems to be related to an increase in lifestyle risk factors, including physical inactivity and overweight/obesity. We previously reported that exercise induced a circulating angiostatic phenotype characterized by increased sFlt-1 and endostatin and decreased unbound-VEGF in men. However, there are no data on women. The present study determines the following: 1) whether moderate exercise increased sFlt-1 and endostatin and decreased unbound-VEGF in the circulation of adult female volunteers; 2) whether overweight/obese women have a higher plasma level of unbound-VEGF than lean women. Seventy-two African American and Caucasian adult women volunteers aged 18–44 were enrolled into the exercise study. All the participants walked on a treadmill for 30 minutes at a moderate intensity (55–59% heart rate reserve), and oxygen consumption (VO2) was quantified by utilizing a metabolic cart. Blood samples were obtained before and immediately after exercise from 63 participants. ELISA assays (R&D Systems) showed that plasma levels of sFlt-1 were 67.8±3.7 pg/ml immediately after exercise (30 minutes), significantly higher than basal levels, 54.5±3.3 pg/ml, before exercise (P < 0.01; n=63). There was no significant difference in the % increase of sFlt-1 levels after exercise between African American and Caucasian (P=0.533) or between lean and overweight/obese women (P=0.892). There was no significant difference in plasma levels of unbound VEGF (35.28±5.47 vs. 35.23±4.96 pg/ml; P=0.99) or endostatin (111.12±5.48 vs. 115.45±7.15 ng/ml; P=0.63) before and after exercise. Basal plasma levels of unbound-VEGF in overweight/obese women were 52.26±9.6 pg/ml, significantly higher than basal levels of unbound-VEGF in lean women, 27.34±4.99 pg/ml (P < 0.05).
The results support our hypothesis that exercise-induced plasma levels of sFlt-1 could be an important clinical biomarker to explore the mechanisms of exercise training in reducing breast cancer progression and that VEGF is an important biomarker in obesity and obesity-related cancer progression.
PMCID: PMC3449013  PMID: 22609636
Exercise; Young adult women; Overweight/obese; sFlt-1; Endostatin; VEGF
3.  Regionally Specific and Genome-Wide Analyses Conclusively Demonstrate the Absence of CpG Methylation in Human Mitochondrial DNA 
Molecular and Cellular Biology  2013;33(14):2683-2690.
Although CpG methylation clearly distributes genome-wide in vertebrate nuclear DNA, the state of methylation in the vertebrate mitochondrial genome has been unclear. Several recent reports using immunoprecipitation, mass spectrometry, and enzyme-linked immunosorbent assay methods concluded that human mitochondrial DNA (mtDNA) has much more than the 2 to 5% CpG methylation previously estimated. However, these methods do not provide information as to the sites or frequency of methylation at each CpG site. Here, we have used the more definitive bisulfite genomic sequencing method to examine CpG methylation in HCT116 human cells and primary human cells to independently answer these two questions. We found no evidence of CpG methylation at a biologically significant level in these regions of the human mitochondrial genome. Furthermore, unbiased next-generation sequencing of sodium bisulfite treated total DNA from HCT116 cells and analysis of genome-wide sodium bisulfite sequencing data sets from several other DNA sources confirmed this absence of CpG methylation in mtDNA. Based on our findings using regionally specific and genome-wide approaches with multiple human cell sources, we can definitively conclude that CpG methylation is absent in mtDNA. It is highly unlikely that CpG methylation plays any role in direct control of mitochondrial function.
PMCID: PMC3700126  PMID: 23671186
4.  A Reference Methylome Database and Analysis Pipeline to Facilitate Integrative and Comparative Epigenomics 
PLoS ONE  2013;8(12):e81148.
DNA methylation is implicated in a surprising diversity of regulatory, evolutionary processes and diseases in eukaryotes. The introduction of whole-genome bisulfite sequencing has enabled the study of DNA methylation at a single-base resolution, revealing many new aspects of DNA methylation and highlighting the usefulness of methylome data in understanding a variety of genomic phenomena. As the number of publicly available whole-genome bisulfite sequencing studies reaches into the hundreds, reliable and convenient tools for comparing and analyzing methylomes become increasingly important. We present MethPipe, a pipeline for both low and high-level methylome analysis, and MethBase, an accompanying database of annotated methylomes from the public domain. Together these resources enable researchers to extract interesting features from methylomes and compare them with those identified in public methylomes in our database.
PMCID: PMC3855694  PMID: 24324667
5.  scAAV-Mediated Gene Transfer of Interleukin 1-Receptor Antagonist to Synovium and Articular Cartilage in Large Mammalian Joints 
Gene therapy  2012;20(6):670-677.
With the long-term goal of developing a gene-based treatment for osteoarthritis (OA), we performed studies to evaluate the equine joint as a model for AAV-mediated gene transfer to large, weight-bearing human joints. A self-complementary AAV2 vector containing the coding regions for human interleukin-1 receptor antagonist (hIL-1Ra) or green fluorescent protein (GFP) was packaged in AAV capsid serotypes 1, 2, 5, 8 and 9. Following infection of human and equine synovial fibroblasts in culture, we found that both were only receptive to transduction with AAV1, 2 and 5. For these serotypes, however, transgene expression from the equine cells was consistently at least 10-fold higher. Analyses of AAV surface receptor molecules and intracellular trafficking of vector genomes implicate enhanced viral uptake by the equine cells. Following delivery of 1 × 10^11 vector genomes of serotypes 2, 5 and 8 into the forelimb joints of the horse, all three enabled hIL-1Ra expression at biologically relevant levels and effectively transduced the same cell types, primarily synovial fibroblasts and, to a lesser degree, chondrocytes in articular cartilage. These results provide optimism that AAV vectors can be effectively adapted for gene delivery to large human joints affected by OA.
PMCID: PMC3577988  PMID: 23151520
Osteoarthritis; Self-complementary Adeno-Associated Virus; Interleukin-1 Receptor Antagonist; Synovium; Cartilage; Equine
6.  Site identification in high-throughput RNA–protein interaction data 
Bioinformatics  2012;28(23):3013-3020.
Motivation: Post-transcriptional and co-transcriptional regulation is a crucial link between genotype and phenotype. The central players are the RNA-binding proteins, and experimental technologies [such as cross-linking with immunoprecipitation- (CLIP-) and RIP-seq] for probing their activities have advanced rapidly over the course of the past decade. Statistically robust, flexible computational methods for binding site identification from high-throughput immunoprecipitation assays are, however, largely lacking.
Results: We introduce a method for site identification which provides four key advantages over previous methods: (i) it can be applied on all variations of CLIP and RIP-seq technologies, (ii) it accurately models the underlying read-count distributions, (iii) it allows external covariates, such as transcript abundance (which we demonstrate is highly correlated with read count) to inform the site identification process and (iv) it allows for direct comparison of site usage across cell types or conditions.
Availability and implementation: We have implemented our method in a software tool called Piranha. Source code and binaries, licensed under the GNU General Public License (version 3) are freely available for download from
Supplementary information: Supplementary data available at Bioinformatics online.
PMCID: PMC3509493  PMID: 23024010
7.  Predicting the molecular complexity of sequencing libraries 
Nature methods  2013;10(4):325-327.
Predicting the molecular complexity of a genomic sequencing library has emerged as a critical but difficult problem in modern applications of genome sequencing. Available methods to determine either how deeply to sequence, or predict the benefits of additional sequencing, are almost completely lacking. We introduce an empirical Bayesian method to implicitly model any source of bias and accurately characterize the molecular complexity of a DNA sample or library in almost any sequencing application.
PMCID: PMC3612374  PMID: 23435259
8.  MLML: consistent simultaneous estimates of DNA methylation and hydroxymethylation 
Bioinformatics  2013;29(20):2645-2646.
Motivation: The two major epigenetic modifications of cytosines, 5-methylcytosine (5-mC) and 5-hydroxymethylcytosine (5-hmC), coexist with each other in a range of mammalian cell populations. Increasing evidence points to important roles of 5-hmC in demethylation of 5-mC and epigenomic regulation in development. Recently developed experimental methods allow direct single-base profiling of either 5-hmC or 5-mC. Meaningful analyses seem to require combining these experiments with bisulfite sequencing, but doing so naively produces inconsistent estimates of 5-mC or 5-hmC levels.
Results: We present a method to jointly model read counts from bisulfite sequencing, oxidative bisulfite sequencing and Tet-Assisted Bisulfite sequencing, providing simultaneous estimates of 5-hmC and 5-mC levels that are consistent across experiment types.
Supplementary information: Supplementary material is available at Bioinformatics online.
PMCID: PMC3789553  PMID: 23969133
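The inconsistency mentioned in the Motivation above can be illustrated with a minimal sketch. This is not the MLML method itself; it only shows the naive per-experiment arithmetic that such a joint model is meant to replace. It relies on the standard facts that bisulfite sequencing reads both 5-mC and 5-hmC as methylated, while oxidative bisulfite sequencing reads only 5-mC. The function names and read counts are hypothetical.

```python
def naive_levels(bs_meth, bs_total, oxbs_meth, oxbs_total):
    """Naive per-site estimates from BS-seq and oxBS-seq read counts.

    BS-seq reports 5-mC + 5-hmC together; oxBS-seq reports 5-mC only,
    so the naive 5-hmC level is the difference of the two fractions.
    """
    mc_plus_hmc = bs_meth / bs_total
    mc = oxbs_meth / oxbs_total
    hmc = mc_plus_hmc - mc  # may fall below zero from sampling noise
    return mc, hmc


def is_consistent(mc, hmc):
    """Both levels must be valid proportions that sum to at most 1."""
    return 0 <= mc <= 1 and 0 <= hmc <= 1 and mc + hmc <= 1


# Hypothetical counts: 6 of 10 BS-seq reads methylated, 8 of 10 oxBS-seq reads.
mc, hmc = naive_levels(6, 10, 8, 10)
print(mc, hmc, is_consistent(mc, hmc))  # the 5-hmC estimate is negative here
```

With low coverage, a negative 5-hmC estimate like this one arises readily from sampling noise alone, which is the motivation for estimating both levels jointly under the constraint that they are valid proportions.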
9.  Coping with continuous human disturbance in the wild: insights from penguin heart rate response to various stressors 
BMC Ecology  2012;12:10.
A central question for ecologists is the extent to which anthropogenic disturbances (e.g. tourism) might impact wildlife and affect the systems under study. From a research perspective, identifying the effects of human disturbance caused by research-related activities is crucial in order to understand and account for potential biases and derive appropriate conclusions from the data.
Here, we document a case of biological adjustment to chronic human disturbance in a colonial seabird, the king penguin (Aptenodytes patagonicus), breeding on remote and protected islands of the Southern ocean. Using heart rate (HR) as a measure of the stress response, we show that, in a colony with areas exposed to the continuous presence of humans (including scientists) for over 50 years, penguins have adjusted to human disturbance and habituated to certain, but not all, types of stressors. When compared to birds breeding in relatively undisturbed areas, birds in areas of high chronic human disturbance were found to exhibit attenuated HR responses to acute anthropogenic stressors of low-intensity (i.e. sounds or human approaches) to which they had been subjected intensely over the years. However, such attenuation was not apparent for high-intensity stressors (i.e. captures for scientific research) which only a few individuals experience each year.
Habituation to anthropogenic sounds/approaches could be an adaptation to deal with chronic innocuous stressors, and beneficial from a research perspective. Alternately, whether penguins have actually habituated to anthropogenic disturbances over time or whether human presence has driven the directional selection of human-tolerant phenotypes, remains an open question with profound ecological and conservation implications, and emphasizes the need for more knowledge on the effects of human disturbance on long-term studied populations.
PMCID: PMC3543187  PMID: 22784366
Stress; Heart rate; Habituation; Selection; Seabird; Human disturbance; Long-term monitoring
10.  A Geometric Interpretation for Local Alignment-Free Sequence Comparison 
Journal of Computational Biology  2013;20(7):471-485.
Local alignment-free sequence comparison arises in the context of identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant D2* as the similarity measure, local alignment-free comparison between a pair of sequences can be formulated as the problem of finding the maximum bichromatic dot product between two sets of points in high dimensions. We introduce a geometric framework that reduces this problem to that of finding the bichromatic closest pair (BCP), allowing the properties of the underlying metric to be leveraged. Local alignment-free sequence comparison can be solved by making a quadratic number of alignment-free substring comparisons. We show both theoretically and through empirical results on simulated data that our approximation algorithm requires a subquadratic number of such comparisons and trades only a small amount of accuracy to achieve this efficiency. Therefore, our algorithm can extend the current usage of alignment-free–based methods and can also be regarded as a substitute for local alignment algorithms in many biological studies.
PMCID: PMC3704055  PMID: 23829649
algorithms; alignment; dynamic programming; metagenomics
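The dot-product formulation in the abstract above can be sketched as follows. This is the exact quadratic baseline, not the authors' randomized approximation algorithm: D2 between two sequences is the dot product of their k-mer count vectors, and the local comparison takes the maximum over all pairs of fixed-length windows. The function names, window length and k are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """D2 statistic: dot product of the two k-mer count vectors."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items())


def max_local_d2(x, y, window, k):
    """Exhaustive search over all window pairs (quadratic comparisons).

    Returns (best score, start in x, start in y); ties keep the first pair.
    """
    best = (-1, None, None)
    for i in range(len(x) - window + 1):
        for j in range(len(y) - window + 1):
            score = d2(x[i:i + window], y[j:j + window], k)
            if score > best[0]:
                best = (score, i, j)
    return best


print(max_local_d2("TTTTACGT", "ACGTCCCC", window=4, k=2))  # → (3, 4, 0)
```

Each window corresponds to a point in k-mer count space, one "color" per sequence, which is why the maximum over window pairs is a maximum bichromatic dot product.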
11.  Transcription Start Site Evolution in Drosophila 
Molecular Biology and Evolution  2013;30(8):1966-1974.
Transcription start site (TSS) evolution remains largely undescribed in Drosophila, likely due to limited annotations in non-melanogaster species. In this study, we introduce a concise new method that selectively sequences from the 5′-end of mRNA and use it to identify TSS in four Drosophila species: Drosophila melanogaster, D. simulans, D. sechellia, and D. pseudoobscura. For verification, we compared our results in D. melanogaster with known annotations, published 5′-rapid amplification of cDNA ends data, and with RNAseq from the same mRNA pool. Then, we paired each of 2,849 D. melanogaster TSS with its closest equivalent TSS in each species (likely to be its true ortholog) using the available multiple sequence alignments. Most of the D. melanogaster TSSs were successfully paired with an ortholog in each species (83%, 86%, and 55% for D. simulans, D. sechellia, and D. pseudoobscura, respectively). On the basis of the number and distribution of reads mapped at each TSS, we also estimated promoter-specific expression (PSE) and TSS peak shape, respectively. Among paired TSS orthologs, the location and promoter activity were largely conserved. TSS location appears important, as PSE and TSS peak shape were more frequently divergent among TSS that had moved. Unpaired TSS were surprisingly common in D. pseudoobscura. An increased mutation rate upstream of TSS might explain this pattern. We found an enrichment of ribosomal protein genes among diverged TSS, suggesting that TSS evolution is not uniform across the genome.
PMCID: PMC3708499  PMID: 23649539
promoter; transcription start site; gene expression; CAGE; TSS; Drosophila
12.  Lebrikizumab in the personalized management of asthma 
There is a need for improved therapies for severe asthma. Lebrikizumab, a humanized monoclonal antibody that binds to interleukin (IL)-13, is under development for the treatment of poorly controlled asthma. This article reviews the potential role of IL-13 in the pathogenesis of asthma, the efficacy and safety of lebrikizumab in humans, and progress in patient selection for lebrikizumab therapy. IL-13 is a T-helper 2 (Th2) cell-derived cytokine implicated in inflammatory responses in asthma, including serum immunoglobulin-E synthesis, mucus hypersecretion, and subepithelial fibrosis. Blocking the pro-inflammatory effects of IL-13 with lebrikizumab has the potential to improve asthma control. Published data on the efficacy and safety of lebrikizumab in the treatment of asthma are relatively limited. The late asthmatic response after inhaled allergen challenge is reduced by almost 50% following treatment with lebrikizumab. In a Phase II study performed in 219 adults with poorly controlled asthma despite inhaled corticosteroids (MILLY trial), lebrikizumab produced an improvement in prebronchodilator forced expiratory volume in 1 second of 5.5% compared with placebo at 12 weeks, but had no effects on other efficacy end points. Adverse effects were similar to placebo, except that musculoskeletal side effects occurred slightly more often with lebrikizumab. Stratifying patients into a high Th2 phenotype using serum periostin, which is upregulated in lung epithelial cells by IL-13, may identify individuals responsive to blockade of IL-13. In the MILLY trial, lebrikizumab treatment was associated with greater improvement in lung function in patients with elevated serum periostin levels compared with those with low periostin levels. Two large Phase III randomized controlled trials in patients with uncontrolled asthma are underway to establish the safety and efficacy of lebrikizumab when administered over a 52-week period.
These studies will also help to determine whether identifying patients with a Th2 high inflammatory phenotype using serum periostin allows a personalized approach to the treatment of asthma.
PMCID: PMC3459551  PMID: 23055690
asthma; periostin; interleukin-13; phenotypes; exhaled nitric oxide; lebrikizumab
13.  Directional DNA Methylation Changes and Complex Intermediate States Accompany Lineage Specificity in the Adult Hematopoietic Compartment 
Molecular cell  2011;44(1):17-28.
DNA methylation has been implicated as an epigenetic component of mechanisms that stabilize cell-fate decisions. Here, we have characterized the methylomes of human female hematopoietic stem/progenitor cells (HSPCs) and mature cells from the myeloid and lymphoid lineages. Hypomethylated regions (HMRs) associated with lineage-specific genes were often methylated in the opposing lineage. In HSPCs, these sites tended to show intermediate, complex patterns that resolve to uniformity upon differentiation, by increased or decreased methylation. Promoter HMRs shared across diverse cell types typically display a constitutive core that expands and contracts in a lineage-specific manner to fine-tune the expression of associated genes. Many newly identified intergenic HMRs, both constitutive and lineage specific, were enriched for factor binding sites with an implied role in genome organization and regulation of gene expression, respectively. Overall, our studies represent an important reference data set and provide insights into directional changes in DNA methylation as cells adopt terminal fates.
PMCID: PMC3412369  PMID: 21924933
14.  Before It Gets Started: Regulating Translation at the 5′ UTR 
Translation regulation plays important roles in both normal physiological conditions and disease states. This regulation requires cis-regulatory elements located mostly in 5′ and 3′ UTRs and trans-regulatory factors (e.g., RNA binding proteins (RBPs)) which recognize specific RNA features and interact with the translation machinery to modulate its activity. In this paper, we discuss important aspects of 5′ UTR-mediated regulation by providing an overview of the characteristics and function of the main elements present in this region, such as upstream open reading frames (uORFs), secondary structures, and RBP binding motifs, as well as the different mechanisms of translation regulation and the impact they have on gene expression and human health when deregulated.
PMCID: PMC3368165  PMID: 22693426
15.  Sperm methylation profiles reveal features of epigenetic inheritance and evolution in primates 
Cell  2011;146(6):1029-1041.
During germ cell and preimplantation development, mammalian cells undergo nearly complete reprogramming of DNA methylation patterns. We profiled the methylomes of human and chimp sperm as a basis for comparison to methylation patterns of ES cells. While the majority of promoters escape methylation in both ES cells and sperm, the corresponding hypomethylated regions show substantial structural differences. Repeat elements are heavily methylated in both germ and somatic cells; however, retrotransposons from several subfamilies evade methylation more effectively during male germ cell development, while other subfamilies show the opposite trend. Comparing methylomes of human and chimp sperm revealed a subset of differentially methylated promoters and strikingly divergent methylation in retrotransposon subfamilies, with an evolutionary impact that is apparent in the underlying genomic sequence. Thus, the features that determine DNA methylation patterns differ between male germ cells and somatic cells, and elements of these features have diverged between humans and chimpanzees.
PMCID: PMC3205962  PMID: 21925323
16.  On the Value of Intra-Motif Dependencies of Human Insulator Protein CTCF 
PLoS ONE  2014;9(1):e85629.
The binding affinity of DNA-binding proteins such as transcription factors is mainly determined by the base composition of the corresponding binding site on the DNA strand. Most proteins do not bind only a single sequence, but rather a set of sequences, which may be modeled by a sequence motif. Algorithms for de novo motif discovery differ in their promoter models, learning approaches, and other aspects, but typically use the statistically simple position weight matrix model for the motif, which assumes statistical independence among all nucleotides. However, there is no clear justification for that assumption, leading to an ongoing debate about the importance of modeling dependencies between nucleotides within binding sites. In the past, modeling statistical dependencies within binding sites has been hampered by the problem of limited data. With the rise of high-throughput technologies such as ChIP-seq, this situation has now changed, making it possible to make use of statistical dependencies effectively. In this work, we investigate the presence of statistical dependencies in binding sites of the human enhancer-blocking insulator protein CTCF by using the recently developed model class of inhomogeneous parsimonious Markov models, which is capable of modeling complex dependencies while avoiding overfitting. These findings lead to a more detailed characterization of the CTCF binding motif, which is only poorly represented by independent nucleotide frequencies at several positions, predominantly at the 3′ end.
PMCID: PMC3899044  PMID: 24465627
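The independence assumption of the position weight matrix model described above can be made concrete with a small sketch: the probability of a site is the product of per-position nucleotide probabilities, so its log-likelihood is a sum with no terms linking one position to another. The three-position matrix below is a made-up toy, not the CTCF motif, and the function name is hypothetical.

```python
import math

# Hypothetical 3-position PWM; each row is a distribution over A, C, G, T.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.2, "C": 0.6, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.05, "G": 0.8, "T": 0.05},
]


def pwm_log_likelihood(site, pwm):
    """Sum of per-position log-probabilities (positions are independent)."""
    if len(site) != len(pwm):
        raise ValueError("site length must match the number of PWM positions")
    return sum(math.log(pwm[i][base]) for i, base in enumerate(site))


# exp of the score recovers the product 0.7 * 0.6 * 0.8
print(math.exp(pwm_log_likelihood("ACG", TOY_PWM)))
```

A model with intra-motif dependencies would instead condition the probability at a position on nucleotides at other positions, which this additive score cannot express.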
17.  Genomic Analyses Reveal Broad Impact of miR-137 on Genes Associated with Malignant Transformation and Neuronal Differentiation in Glioblastoma Cells 
PLoS ONE  2014;9(1):e85591.
miR-137 plays critical roles in the nervous system and tumor development; an increase in its expression is required for neuronal differentiation while its reduction is implicated in gliomagenesis. To evaluate the potential of miR-137 in glioblastoma therapy, we conducted genome-wide target mapping in glioblastoma cells by measuring the level of association between PABP and mRNAs in cells transfected with miR-137 mimics vs. controls via RIPSeq. Impact on mRNA levels was also measured by RNASeq. By combining the results of both experimental approaches, 1468 genes were found to be negatively impacted by miR-137 – among them, 595 (40%) contain miR-137 predicted sites. The most relevant targets include oncogenic proteins and key players in neurogenesis like c-KIT, YBX1, AKT2, CDC42, CDK6 and TGFβ2. Interestingly, we observed that several identified miR-137 targets are also predicted to be regulated by miR-124, miR-128 and miR-7, which are equally implicated in neuronal differentiation and gliomagenesis. We suggest that the concomitant increase of these four miRNAs in neuronal stem cells or their repression in tumor cells could produce a robust regulatory effect with major consequences to neuronal differentiation and tumorigenesis.
PMCID: PMC3899048  PMID: 24465609
18.  Identifying dispersed epigenomic domains from ChIP-Seq data 
Bioinformatics  2011;27(6):870-871.
Motivation: Post-translational modifications to histones have several well known associations with regulation of gene expression. While some modifications appear concentrated narrowly, covering promoters or enhancers, others are dispersed as epigenomic domains. These domains mark contiguous regions sharing an epigenomic property, such as actively transcribed or poised genes, or heterochromatically silenced regions. While high-throughput methods like ChIP-Seq have led to a flood of high-quality data about these epigenomic domains, there remain important analysis problems that are not adequately solved by current analysis tools.
Results: We present the RSEG method for identifying epigenomic domains from ChIP-Seq data for histone modifications. In contrast with other methods emphasizing the locations of ‘peaks’ in read density profiles, our method identifies the boundaries of domains. RSEG is also able to incorporate a control sample and find genomic regions with differential histone modifications between two samples.
Availability: RSEG, including source code and documentation, is freely available at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3051331  PMID: 21325299
19.  Diverse endonucleolytic cleavage sites in the mammalian transcriptome depend upon microRNAs, Drosha, and additional nucleases 
Molecular cell  2010;38(6):781-788.
The lifespan of a mammalian mRNA is determined, in part, by the binding of regulatory proteins and small RNA-guided complexes. The conserved endonuclease activity of Argonaute2 requires extensive complementarity between a small RNA and its target and is not used by animal microRNAs, which pair with their targets imperfectly. Here, we investigate the endonucleolytic function of Ago2 and other nucleases by transcriptome-wide profiling of mRNA cleavage products retaining 5′-phosphate groups in mouse ES cells. We detect a prominent signature of Ago2-dependent cleavage events and validate several such targets. Unexpectedly, a broader class of Ago2-independent cleavage sites is also observed, indicating participation of additional nucleases in site-specific mRNA cleavage. Within this class, we identify a cohort of Drosha-dependent mRNA cleavage events that functionally regulate mRNA levels in mES cells, including one in the Dgcr8 mRNA. Together, these results highlight the underappreciated role of endonucleolytic cleavage in controlling mRNA fates in mammals.
PMCID: PMC2914474  PMID: 20620951
20.  Updates to the RMAP short-read mapping software 
Bioinformatics  2009;25(21):2841-2842.
Summary: We report on a major new version of the RMAP software for mapping reads from short-read sequencing technology. General improvements to accuracy and space requirements are included, along with novel functionality. Included in the RMAP software package are tools for mapping paired-end reads, mapping using more sophisticated use of quality scores, collecting ambiguous mapping locations and mapping bisulfite-treated reads.
Availability: The applications described in this note are available for download at and are distributed as Open Source software under the GPLv3.0. The software has been tested on Linux and OS X platforms.
The RMAP algorithm was introduced by Smith et al. (2008) as one of the earliest available programs for mapping reads from the Illumina second-generation sequencing technology. One important contribution of RMAP was to incorporate the use of quality scores directly into the mapping process: read positions with too low a quality score were not considered while mapping, and that quality score cutoff could be adjusted by the user. Subsequently, numerous mapping algorithms have appeared (Langmead et al., 2009; Li,H. et al., 2008; Li,R. et al., 2008; Lin et al., 2008; Schatz, 2009; Yanovsky et al., 2008), with improvements in both efficiency and breadth of functionality (e.g. ability to map paired-end reads; integrated SNP calling). Investigators requiring solutions to mapping problems now have many options. As new applications of short-read sequencing emerge, so do many variations on the analysis task of read mapping, and diversity in the performance characteristics of existing mapping tools becomes potentially valuable.
We report the first major update to RMAP. The basic algorithmic framework in RMAP is still to preprocess reads and scan the genome, but several modifications have been made and much additional functionality has been included. Importantly, RMAP has a memory footprint that depends on the number of reads being mapped. This feature allows RMAP to be used effectively in cluster environments with commodity nodes, because partitioning the reads allows natural parallelizations with linear reduction in memory requirements per processor core used.
Included in this release of the RMAP software package is functionality for mapping paired-end reads, making more sophisticated use of quality scores, collecting mapping locations for ambiguously mapping reads and mapping bisulfite-treated reads.
PMCID: PMC2895571  PMID: 19736251
21.  An integrative genomics approach identifies Hypoxia Inducible Factor-1 (HIF-1)-target genes that form the core response to hypoxia 
Nucleic Acids Research  2009;37(14):4587-4602.
The transcription factor Hypoxia-inducible factor 1 (HIF-1) plays a central role in the transcriptional response to oxygen flux. To gain insight into the molecular pathways regulated by HIF-1, it is essential to identify the downstream-target genes. We report here a strategy to identify HIF-1-target genes based on an integrative genomic approach combining computational strategies and experimental validation. To identify HIF-1-target genes, microarray data sets were used to rank genes based on their differential response to hypoxia. The proximal promoters of these genes were then analyzed for the presence of conserved HIF-1-binding sites. Genes were scored and ranked based on their response to hypoxia and their HIF-binding site score. Using this strategy we recovered 41% of the previously confirmed HIF-1-target genes that responded to hypoxia in the microarrays and provide a catalogue of predicted HIF-1 targets. We present experimental validation for ANKRD37 as a novel HIF-1-target gene. Together these analyses demonstrate the potential to recover novel HIF-1-target genes and to discover mammalian regulatory elements operative in the context of microarray data sets.
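The two-score ranking described in this abstract can be sketched as follows. The abstract does not say how the hypoxia-response score and the binding-site score are combined, so the rank-sum rule below is an assumption made purely for illustration, and the function names and gene labels are hypothetical.

```python
def rank_descending(scores):
    """Map each gene to its 1-based rank, highest score first (ties broken by name)."""
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def combined_ranking(expression_scores, site_scores):
    """Order genes found in both inputs by the sum of their two ranks.

    A lower rank sum means a stronger candidate. The rank-sum rule is an
    assumption, not the scoring scheme used in the paper.
    """
    shared = set(expression_scores) & set(site_scores)
    expr_rank = rank_descending({g: expression_scores[g] for g in shared})
    site_rank = rank_descending({g: site_scores[g] for g in shared})
    return sorted(shared, key=lambda g: (expr_rank[g] + site_rank[g], g))
```

Genes missing from either input are dropped, so only genes with both a measured hypoxia response and a scored promoter are ranked.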
PMCID: PMC2724271  PMID: 19491311
22.  Gene set-based module discovery in the breast cancer transcriptome 
BMC Bioinformatics  2009;10:71.
Although microarray-based studies have revealed a global view of gene expression in cancer cells, we still have little knowledge about the regulatory mechanisms underlying the transcriptome. Several computational methods applied to yeast data have recently succeeded in identifying expression modules, which are defined as co-expressed gene sets under common regulatory mechanisms. However, such module discovery methods have not been applied to cancer transcriptome data.
In order to decode oncogenic regulatory programs in cancer cells, we developed a novel module discovery method termed EEM by extending a previously reported module discovery method, and applied it to breast cancer expression data. Starting from seed gene sets prepared based on cis-regulatory elements, ChIP-chip data, and gene locus information, EEM identified 10 principal expression modules in breast cancer based on their expression coherence. Moreover, EEM depicted their activity profiles, which predict regulatory programs in each subtype of breast tumor. For example, our analysis revealed that the expression module regulated by the Polycomb repressive complex 2 (PRC2) is downregulated in triple negative breast cancers, suggesting similarity of transcriptional programs between stem cells and aggressive breast cancer cells. We also found that the activity of the PRC2 expression module is negatively correlated with the expression of EZH2, a component of PRC2 which belongs to the E2F expression module. E2F-driven EZH2 overexpression may be responsible for the repression of the PRC2 expression module in triple negative tumors. Furthermore, our network analysis predicts regulatory circuits in breast cancer cells.
These results demonstrate that the gene set-based module discovery approach is a powerful tool to decode regulatory programs in cancer cells.
PMCID: PMC2674431  PMID: 19243633
23.  Analysis of the vertebrate insulator protein CTCF binding sites in the human genome 
Cell  2007;128(6):1231-1245.
Insulator elements affect gene expression by preventing the spread of heterochromatin and restricting transcriptional enhancers from activation of unrelated promoters. In vertebrates, insulator function requires association with the CCCTC-binding factor (CTCF), a protein that recognizes long and diverse nucleotide sequences. While insulators are critical in gene regulation, only a few have been reported. Here, we describe 13,804 CTCF binding sites in potential insulators of the human genome, discovered experimentally in primary human fibroblasts. Most of these sequences are located far from the transcriptional start sites, with their distribution strongly correlated with genes. The majority of them fit a highly conserved consensus motif that is suitable for predicting possible insulators driven by CTCF in other vertebrate genomes. In addition, CTCF localization is largely invariant across different cell types. Our results provide a resource for investigating insulator function and other possible general and evolutionarily conserved activities of CTCF sites.
PMCID: PMC2572726  PMID: 17382889
24.  Integrative bioinformatics analysis of transcriptional regulatory programs in breast cancer cells 
BMC Bioinformatics  2008;9:404.
Microarray technology has unveiled transcriptomic differences among tumors of various phenotypes and, in particular, has brought great progress in the molecular understanding of the phenotypic diversity of breast tumors. However, compared with the massive knowledge about the transcriptome, we have surprisingly little knowledge about the regulatory mechanisms underlying transcriptomic diversity.
To gain insights into the transcriptional programs that drive tumor progression, we integrated regulatory sequence data and expression profiles of breast cancer into a Bayesian Network, and searched for cis-regulatory motifs statistically associated with given histological grades and prognosis. Our analysis found that motifs bound by ELK1, E2F, NRF1 and NFY are potential regulatory motifs that positively correlate with malignant progression of breast cancer.
The results suggest that these 4 motifs are principal regulatory motifs driving malignant progression of breast cancer. Our method offers a more concise description of transcriptome diversity among breast tumors with different clinical phenotypes.
PMCID: PMC2572072  PMID: 18823535
25.  Using quality scores and longer reads improves accuracy of Solexa read mapping 
BMC Bioinformatics  2008;9:128.
Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science. New technologies will make re-sequencing widely available for such applications as identifying genome variations or interrogating the oligonucleotide content of a large sample (e.g. ChIP-sequencing). The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks. The Solexa/Illumina 1G sequencer can produce tens of millions of reads, ranging in length from ~25–50 nt, in a single experiment. Accurately mapping the reads back to a reference genome is a critical task in almost all applications. Two sources of information that are often ignored when mapping reads from the Solexa technology are the 3' ends of longer reads, which contain a much higher frequency of sequencing errors, and the base-call quality scores.
To investigate whether these sources of information can be used to improve accuracy when mapping reads, we developed the RMAP tool, which can map reads having a wide range of lengths and allows base-call quality scores to determine which positions in each read are more important when mapping. We applied RMAP to analyze data re-sequenced from two human BAC regions for varying read lengths, and varying criteria for use of quality scores. RMAP is freely available for downloading at .
Our results indicate that significant gains in Solexa read mapping performance can be achieved by considering the information in 3' ends of longer reads, and appropriately using the base-call quality scores. The RMAP tool we have developed will enable researchers to effectively exploit this information in targeted re-sequencing projects.
PMCID: PMC2335322  PMID: 18307793
