1.  Genomic Basis of a Polyagglutinating Isolate of Neisseria meningitidis 
Journal of Bacteriology  2012;194(20):5649-5656.
Containment strategies for outbreaks of invasive Neisseria meningitidis disease are informed by serogroup assays that characterize the polysaccharide capsule. We sought to uncover the genomic basis of conflicting serogroup assay results for an isolate (M16917) from a patient with acute meningococcal disease. To this end, we characterized the complete genome sequence of the M16917 isolate and performed a variety of comparative sequence analyses against N. meningitidis reference genome sequences of known serogroups. Multilocus sequence typing and whole-genome sequence comparison revealed that M16917 is a member of the ST-11 sequence group, which is most often associated with serogroup C. However, sequence similarity comparisons and phylogenetic analysis showed that the serogroup diagnostic capsule polymerase gene (synD) of M16917 belongs to serogroup B. These results suggest that a capsule-switching event occurred via homologous recombination at or around the capsule locus of M16917. Detailed analysis of this locus uncovered the locations of recombination breakpoints in the M16917 genome sequence, which led to the introduction of an ∼2-kb serogroup B sequence cassette into the serogroup C genomic background. Since there is no currently available vaccine for serogroup B strains of N. meningitidis, this kind of capsule-switching event could have public health relevance by producing vaccine escape mutants.
PMCID: PMC3458693  PMID: 22904290
2.  Transcriptional Activity, Chromosomal Distribution and Expression Effects of Transposable Elements in Coffea Genomes 
PLoS ONE  2013;8(11):e78931.
Plant genomes are massively invaded by transposable elements (TEs), many of which are located near host genes and can thus impact gene expression. In flowering plants, TE expression can be activated (de-repressed) under certain stressful conditions, both biotic and abiotic, as well as by genome stress caused by hybridization. In this study, we examined the effects of these stress agents on TE expression in two diploid species of coffee, Coffea canephora and C. eugenioides, and their allotetraploid hybrid C. arabica. We also explored the relationship of TE repression mechanisms to host gene regulation via the effects of exonized TE sequences. Similar to what has been seen for other plants, overall TE expression levels are low in Coffea plant cultivars, consistent with the existence of effective TE repression mechanisms. TE expression patterns are highly dynamic across the species and conditions assayed here and are unrelated to their classification at the level of TE class or family. In contrast to previous results, cell culture conditions per se do not lead to the de-repression of TE expression in C. arabica. Results obtained here indicate that differing plant drought stress levels relate strongly to TE repression mechanisms. TEs tend to be expressed at significantly higher levels in non-irrigated samples for the drought-tolerant cultivars, whereas drought-sensitive cultivars show the opposite pattern, with irrigated samples showing significantly higher TE expression. Thus, TE genome repression mechanisms may be finely tuned to the ideal growth and/or regulatory conditions of the specific plant cultivars in which they are active. Analysis of TE expression levels in cell culture conditions underscored the importance of nonsense-mediated mRNA decay (NMD) pathways in the repression of Coffea TEs. These same NMD mechanisms can also regulate plant host gene expression via the repression of genes that bear exonized TE sequences.
PMCID: PMC3823963  PMID: 24244387
3.  Inhibition of activated pericentromeric SINE/Alu repeat transcription in senescent human adult stem cells reinstates self-renewal 
Cell Cycle  2011;10(17):3016-3030.
Cellular aging is linked to deficiencies in efficient repair of DNA double strand breaks and authentic genome maintenance at the chromatin level. Aging poses a significant threat to adult stem cell function by triggering persistent DNA damage and ultimately cellular senescence. Senescence is often considered to be an irreversible process. Moreover, critical genomic regions engaged in persistent DNA damage accumulation are unknown. Here we report that 65% of naturally occurring repairable DNA damage in self-renewing adult stem cells occurs within transposable elements. Upregulation of Alu retrotransposon transcription upon ex vivo aging causes nuclear cytotoxicity associated with the formation of persistent DNA damage foci and loss of efficient DNA repair in pericentric chromatin. This occurs due to a failure to recruit condensin I and cohesin complexes. Our results demonstrate that the cytotoxicity of induced Alu repeats is functionally relevant to human adult stem cell aging. Stable suppression of Alu transcription can reverse the senescent phenotype, reinstating the cells' self-renewing properties and increasing their plasticity by altering so-called “master” pluripotency regulators.
PMCID: PMC3218602  PMID: 21862875
adult stem cells; senescence; SINE/Alu transposons; DNA damage; H2AX; ChIP-seq; cohesin; condensin; PML body; induced pluripotency
4.  On the presence and role of human gene-body DNA methylation 
Oncotarget  2012;3(4):462-474.
DNA methylation of promoter sequences is a repressive epigenetic mark that down-regulates gene expression. However, DNA methylation is more prevalent within gene-bodies than within promoters, and gene-body methylation has been observed to be positively correlated with gene expression levels. This paradox remains unexplained, and accordingly the role of DNA methylation in gene-bodies is poorly understood. We addressed the presence and role of human gene-body DNA methylation using a meta-analysis of human genome-wide methylation, expression and chromatin data sets. Methylation is associated with transcribed regions, as genic sequences have higher levels of methylation than intergenic or promoter sequences. We also find that the relationship between gene-body DNA methylation and expression levels is non-monotonic and bell-shaped. Mid-level expressed genes have the highest levels of gene-body methylation, whereas the lowest- and highest-expressed sets of genes both have low levels of methylation. While gene-body methylation can efficiently repress the initiation of intragenic transcription, the vast majority of methylated sites within genes are not associated with intragenic promoters. In fact, highly expressed genes initiate the most intragenic transcription, which is inconsistent with the previously held notion that gene-body methylation serves to repress spurious intragenic transcription to allow for efficient transcriptional elongation. These observations lead us to propose a model to explain the presence of human gene-body methylation. This model holds that the repression of intragenic transcription by gene-body methylation is largely epiphenomenal, and suggests that gene-body methylation levels are predominantly shaped via the accessibility of the DNA to methylating enzyme complexes.
PMCID: PMC3380580  PMID: 22577155
genome-wide methylation; epigenetic mark; intragenic transcription; methylating enzyme complexes
5.  Do human transposable element small RNAs serve primarily as genome defenders or genome regulators? 
Mobile Genetic Elements  2012;2(1):19-25.
It is currently thought that small RNA (sRNA) based repression mechanisms are primarily employed to mitigate the mutagenic threat posed by the activity of transposable elements (TEs). This can be achieved by the sRNA guided processing of TE transcripts via Dicer-dependent (e.g., siRNA) or Dicer-independent (e.g., piRNA) mechanisms. For example, potentially active human L1 elements are silenced by mRNA cleavage induced by element encoded siRNAs, leading to a negative correlation between element mRNA and siRNA levels. On the other hand, there is emerging evidence that TE derived sRNAs can also be used to regulate the host genome. Here, we evaluated these two hypotheses for human TEs by comparing the levels of TE derived mRNA and TE sRNA across six tissues. The genome defense hypothesis predicts a negative correlation between TE mRNA and TE sRNA levels, whereas the genome regulatory hypothesis predicts a positive correlation. On average, TE mRNA and TE sRNA levels are positively correlated across human tissues. These correlations are higher than seen for human genes or for randomly permuted control data sets. Overall, Alu subfamilies show the highest positive correlations of element mRNA and sRNA levels across tissues, although a few of the youngest, and potentially most active, Alu subfamilies do show negative correlations. Thus, Alu derived sRNAs may be related to both genome regulation and genome defense. These results are inconsistent with a simple model whereby TE derived sRNAs reduce levels of standing TE mRNA via transcript cleavage, and suggest that human cells efficiently process TE transcripts into sRNA based on the available message levels. This may point to a widespread role for processed TE transcripts in genome regulation or to alternative roles of TE-to-sRNA processing including the mitigation of TE transcript cytotoxicity.
PMCID: PMC3383446  PMID: 22754749
RNA interference; RNA processing; gene expression; genome regulation; small RNA
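The comparison in entry 5 contrasts two predictions: genome defense implies a negative correlation between TE mRNA and TE sRNA levels across tissues, genome regulation a positive one, judged against randomly permuted controls. A minimal sketch of that kind of test, assuming Pearson correlation and a simple tissue-label permutation null (neither the correlation measure nor the permutation scheme is specified in the abstract), with made-up expression values for one hypothetical subfamily:

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_null(x, y, n_perm=1000, seed=0):
    """Correlations obtained after randomly shuffling the tissue order of y."""
    rng = random.Random(seed)
    y_perm = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        null.append(pearson(x, y_perm))
    return null


# Hypothetical mRNA and sRNA levels for one TE subfamily in six tissues.
mrna = [2.0, 3.5, 1.0, 4.2, 2.8, 3.9]
srna = [1.1, 2.0, 0.4, 2.6, 1.5, 2.2]

r = pearson(mrna, srna)
null = permutation_null(mrna, srna)
# One-sided empirical p-value for a positive (regulatory-type) correlation.
p = (1 + sum(v >= r for v in null)) / (1 + len(null))
print(f"r = {r:.3f}, permutation p = {p:.3f}")
```

Under this framing, r < 0 would point toward genome defense and r > 0 toward genome regulation; with only six tissues, the permutation null helps calibrate how unusual a given r is.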
6.  Genome Sequences for Six Rhodanobacter Strains, Isolated from Soils and the Terrestrial Subsurface, with Variable Denitrification Capabilities 
Journal of Bacteriology  2012;194(16):4461-4462.
We report the first genome sequences for six strains of Rhodanobacter species isolated from a variety of soil and subsurface environments. Three of these strains are capable of complete denitrification and three others are not. However, all six strains contain most of the genes required for the respiration of nitrate to gaseous nitrogen. The nondenitrifying members of the genus lack only the gene for nitrate reduction, the first step in the full denitrification pathway. The data suggest that the environmental role of bacteria from the genus Rhodanobacter should be reevaluated.
PMCID: PMC3416251  PMID: 22843592
7.  Depletion of nuclear histone H2A variants is associated with chronic DNA damage signaling upon drug-evoked senescence of human somatic cells 
Aging (Albany NY)  2012;4(11):823-842.
Cellular senescence is associated with global chromatin changes, altered gene expression, and activation of chronic DNA damage signaling. These events ultimately lead to morphological and physiological transformations in primary cells. In this study, we show that chronic DNA damage signals caused by genotoxic stress impact the expression of histone H2A family members and lead to their depletion in the nuclei of senescent human fibroblasts. Our data reinforce the hypothesis that progressive chromatin destabilization may lead to the loss of epigenetic information and impaired cellular function associated with chronic DNA damage upon drug-evoked senescence. We propose that changes in histone biosynthesis and chromatin assembly may directly contribute to cellular aging. In addition, we outline a method that allows for quantitative and unbiased measurement of these changes.
PMCID: PMC3560435  PMID: 23235539
γH2A.X; DNA damage; senescence; LC-MS analysis; quantitative proteomics; SRM; histone H2A family; chromatin; DNA repair; HCA2 primary fibroblasts; epigenetics
8.  Repetitive DNA elements, nucleosome binding and human gene expression 
Gene  2009;436(1-2):12-22.
We evaluated the epigenetic contributions of repetitive DNA elements to human gene regulation. Human proximal promoter sequences show distinct distributions of transposable elements (TEs) and simple sequence repeats (SSRs). TEs are enriched distal from transcriptional start sites (TSSs), and their frequency decreases closer to TSSs; they are largely absent from the core promoter region. SSRs, on the other hand, are found at low frequency distal to the TSS and then increase in frequency starting ∼150 bp upstream of the TSS. The peak of SSR density is centered around the −35 bp position where the basal transcriptional machinery assembles. These trends in repetitive sequence distribution are strongly correlated, positively for TEs and negatively for SSRs, with relative nucleosome binding affinities along the promoters. Nucleosomes bind with highest probability distal from the TSS, and the nucleosome binding affinity steadily decreases, reaching its nadir just upstream of the TSS at the same point where SSR frequency is at its highest. Promoters that are enriched for TEs are more highly and broadly expressed, on average, than promoters that are devoid of TEs. In addition, promoters that have similar repetitive DNA profiles regulate genes that have more similar expression patterns and encode proteins with more similar functions than promoters that differ with respect to their repetitive DNA. Furthermore, distinct repetitive DNA promoter profiles are correlated with tissue-specific patterns of expression. These observations indicate that repetitive DNA elements mediate chromatin accessibility in proximal promoter regions and that the repeat content of promoters is relevant to both gene expression and function.
PMCID: PMC2921533  PMID: 19393174
9.  Cell type-specific termination of transcription by transposable element sequences 
Mobile DNA  2012;3:15.
Transposable elements (TEs) encode sequences necessary for their own transposition, including signals required for the termination of transcription. TE sequences within the introns of human genes show an antisense orientation bias, which has been proposed to reflect selection against TE sequences in the sense orientation owing to their ability to terminate the transcription of host gene transcripts. While there is evidence in support of this model for some elements, the extent to which TE sequences actually terminate transcription of human genes across the genome remains an open question.
Using high-throughput sequencing data, we have characterized over 9,000 distinct TE-derived sequences that provide transcription termination sites for 5,747 human genes across eight different cell types. Rarefaction curve analysis suggests that there may be twice as many TE-derived termination sites (TE-TTS) genome-wide among all human cell types. The local chromatin environment for these TE-TTS is similar to that seen for 3′ UTR canonical TTS and distinct from the chromatin environment of other intragenic TE sequences. However, those TE-TTS located within the introns of human genes were found to be far more cell type-specific than the canonical TTS. TE-TTS were much more likely to be found in the sense orientation than other intragenic TE sequences of the same TE family and TE-TTS in the sense orientation terminate transcription more efficiently than those found in the antisense orientation. Alu sequences were found to provide a large number of relatively weak TTS, whereas LTR elements provided a smaller number of much stronger TTS.
TE sequences provide numerous termination sites to human genes, and TE-derived TTS are particularly cell type-specific. Thus, TE sequences provide a powerful mechanism for the diversification of transcriptional profiles between cell types and among evolutionary lineages, since most TE-TTS are evolutionarily young. The extent of transcription termination by TEs seen here, along with the preference for sense-oriented TE insertions to provide TTS, is consistent with the observed antisense orientation bias of human TEs.
PMCID: PMC3517506  PMID: 23020800
Polyadenylation; Transcription termination; Orientation bias; Gene regulation
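Entry 9 uses rarefaction curve analysis to extrapolate the genome-wide number of TE-derived termination sites. A minimal sketch of one simple, sample-based form of rarefaction, assuming the sampling unit is the cell type (the abstract does not say whether reads, samples or cell types were subsampled) and using hypothetical site identifiers:

```python
from itertools import combinations
from statistics import mean


def rarefaction_curve(site_sets):
    """Mean number of distinct sites observed for every subset size k.

    site_sets: one set of site identifiers per cell type.
    Returns a list whose element k-1 is the mean, over all k-subsets of
    cell types, of the number of distinct sites in their union.
    """
    n = len(site_sets)
    curve = []
    for k in range(1, n + 1):
        counts = [len(set().union(*subset)) for subset in combinations(site_sets, k)]
        curve.append(mean(counts))
    return curve


# Hypothetical TE-TTS identifiers detected in four cell types.
cells = [
    {"tts1", "tts2", "tts3"},
    {"tts2", "tts4"},
    {"tts1", "tts5", "tts6"},
    {"tts3", "tts7"},
]
print(rarefaction_curve(cells))  # [2.5, 4.5, 6, 7]
```

A curve that is still rising at the largest k, as here, is the pattern that suggests additional cell types would reveal further sites; exhaustive enumeration of subsets is only practical for a handful of cell types, and random subsampling would replace it at larger scales.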
10.  Chromatin signature discovery via histone modification profile alignments 
Nucleic Acids Research  2012;40(21):10642-10656.
We report on the development of an unsupervised algorithm for the genome-wide discovery and analysis of chromatin signatures. Our Chromatin-profile Alignment followed by Tree-clustering algorithm (ChAT) employs dynamic programming of combinatorial histone modification profiles to identify locally similar chromatin sub-regions and provides complementary utility with respect to existing methods. We applied ChAT to genomic maps of 39 histone modifications in human CD4+ T cells to identify both known and novel chromatin signatures. ChAT was able to detect chromatin signatures previously associated with transcription start sites and enhancers as well as novel signatures associated with a variety of regulatory elements. Promoter-associated signatures discovered with ChAT indicate that complex chromatin signatures, made up of numerous co-located histone modifications, facilitate cell-type specific gene expression. The discovery of novel L1 retrotransposon-associated bivalent chromatin signatures suggests that these elements influence the mono-allelic expression of human genes by shaping the chromatin environment of imprinted genomic regions. Analysis of long gene-associated chromatin signatures points to a role for the H4K20me1 and H3K79me3 histone modifications in transcriptional pause release. The novel chromatin signatures and functional associations uncovered by ChAT underscore the ability of the algorithm to yield novel insight into chromatin-based regulatory mechanisms.
PMCID: PMC3505981  PMID: 22989711
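Entry 10 describes ChAT as using dynamic programming to find locally similar sub-regions of combinatorial histone modification profiles. The sketch below is a generic Smith-Waterman-style local alignment over such profiles, not ChAT's actual scoring: the per-bin Jaccard similarity, the `offset` and `gap` values, and the example mark sets are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of histone marks (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def local_align(p, q, gap=0.5, offset=0.5):
    """Smith-Waterman-style local alignment of two chromatin profiles.

    p, q: lists of sets, one set of histone marks per genomic bin.
    Bin-pair score = Jaccard similarity - offset, so dissimilar bins score < 0.
    Returns (best score, end bin index in p, end bin index in q).
    """
    rows, cols = len(p) + 1, len(q) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best = (0.0, 0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            match = H[i - 1][j - 1] + jaccard(p[i - 1], q[j - 1]) - offset
            # Local alignment: scores are floored at zero.
            H[i][j] = max(0.0, match, H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


# Two hypothetical four-bin profiles sharing a two-bin promoter-like segment.
prof_a = [set(), {"H3K4me3", "H3K27ac"}, {"H3K4me3", "H3K4me2"}, {"H3K36me3"}]
prof_b = [{"H3K27me3"}, {"H3K4me3", "H3K27ac"}, {"H3K4me3", "H3K4me2"}, set()]
print(local_align(prof_a, prof_b))  # (1.0, 3, 3)
```

The best local score ends at bin 3 of both profiles, i.e. the aligned segment covers the two shared promoter-like bins; a traceback step, omitted here for brevity, would recover the segment start.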
11.  Relating the Disease Mutation Spectrum to the Evolution of the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) 
PLoS ONE  2012;7(8):e42336.
Cystic fibrosis (CF) is the most common genetic disease among Caucasians, and accordingly the cystic fibrosis transmembrane conductance regulator (CFTR) protein has perhaps the best-characterized disease mutation spectrum, with more than 1,500 causative mutations having been identified. In this study, we took advantage of that wealth of mutational information in an effort to relate site-specific evolutionary parameters with the propensity and severity of CFTR disease-causing mutations. To do this, we devised a scoring scheme for known CFTR disease-causing mutations based on the Grantham amino acid chemical difference matrix. CFTR site-specific evolutionary constraint values were then computed for seven different evolutionary metrics across a range of increasing evolutionary depths. The CFTR mutational scores and the various site-specific evolutionary constraint values were compared in order to evaluate which evolutionary measures best reflect the disease-causing mutation spectrum. Site-specific evolutionary constraint values from the widely used comparative method PolyPhen2 show the best correlation with the CFTR mutation score spectrum, whereas more straightforward conservation-based measures (ConSurf and ScoreCons) show the greatest ability to predict individual CFTR disease-causing mutations. While far greater than could be expected by chance alone, the fraction of the variability in mutation scores explained by the PolyPhen2 metric (3.6%) and the best set of paired sensitivity (58%) and specificity (60%) values for the prediction of disease-causing residues were marginal. These data indicate that evolutionary constraint levels are informative but far from determinant with respect to disease-causing mutations in CFTR.
Nevertheless, this work shows that, when combined with additional lines of evidence, information on site-specific evolutionary conservation can and should be used to guide site-directed mutagenesis experiments by more narrowly defining the set of target residues, resulting in a potential savings of both time and money.
PMCID: PMC3413703  PMID: 22879944
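To put the figures reported in entry 11 in perspective: assuming the "fraction of the variability explained" is the squared correlation coefficient of a linear fit (the abstract does not name the statistic), and using the standard definitions of sensitivity and specificity in terms of true/false positives and negatives, the reported values correspond to:

```latex
R^2 = 0.036 \;\Rightarrow\; |r| = \sqrt{0.036} \approx 0.19

\text{sensitivity} = \frac{TP}{TP + FN} = 0.58,
\qquad
\text{specificity} = \frac{TN}{TN + FP} = 0.60
```

A correlation of roughly 0.19, together with sensitivity and specificity only modestly above the 0.5 expected from uninformative guessing, is consistent with the authors' description of these results as statistically significant but marginal.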
12.  Co-evolutionary Rates of Functionally Related Yeast Genes 
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN & ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
PMCID: PMC2674680  PMID: 18345352
Functional inference; Co-evolution; natural selection; genome evolution; gene ontology
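Entry 12 compares differences in evolutionary rate between gene pairs (ΔdN) with their GO-based functional similarity (sGO). A minimal sketch of that pairwise comparison, assuming Pearson correlation on absolute dN differences (the abstract does not name the correlation measure or the sGO formula); the gene names `YFG1` to `YFG4` and all values are hypothetical:

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rate_vs_function(dn, sgo):
    """Correlate pairwise |ΔdN| with pairwise functional similarity.

    dn: dict mapping gene -> dN value.
    sgo: dict mapping frozenset({gene1, gene2}) -> functional similarity.
    """
    delta, sim = [], []
    for g1, g2 in combinations(sorted(dn), 2):
        delta.append(abs(dn[g1] - dn[g2]))
        sim.append(sgo[frozenset((g1, g2))])
    return pearson(delta, sim)


# Hypothetical dN values: YFG1/YFG2 evolve slowly, YFG3/YFG4 faster.
dn = {"YFG1": 0.05, "YFG2": 0.06, "YFG3": 0.20, "YFG4": 0.22}
sgo = {
    frozenset(("YFG1", "YFG2")): 0.9,
    frozenset(("YFG3", "YFG4")): 0.8,
    frozenset(("YFG1", "YFG3")): 0.1,
    frozenset(("YFG1", "YFG4")): 0.2,
    frozenset(("YFG2", "YFG3")): 0.15,
    frozenset(("YFG2", "YFG4")): 0.1,
}
print(round(rate_vs_function(dn, sgo), 3))
```

If functionally related genes evolve at similar rates, pairs with small |ΔdN| should have high sGO, so the correlation comes out negative in this toy example; running the same function on dS values instead of dN would mirror the abstract's ΔdS control.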
13.  Genome Sequences for Five Strains of the Emerging Pathogen Haemophilus haemolyticus 
Journal of Bacteriology  2011;193(20):5879-5880.
We report the first whole-genome sequences for five strains, two carried and three pathogenic, of the emerging pathogen Haemophilus haemolyticus. Preliminary analyses indicate that these genome sequences encode markers that distinguish H. haemolyticus from its closest Haemophilus relatives and provide clues to the identity of its virulence factors.
PMCID: PMC3187195  PMID: 21952546
14.  Genome Sequence of the Mycobacterium colombiense Type Strain, CECT 3035 
Journal of Bacteriology  2011;193(20):5866-5867.
We report the first whole-genome sequence of the Mycobacterium colombiense type strain, CECT 3035, which was initially isolated from Colombian HIV-positive patients and causes respiratory and disseminated infections. Preliminary comparative analyses indicate that the M. colombiense lineage has experienced a substantial genome expansion, possibly contributing to its distinct pathogenic capacity.
PMCID: PMC3187203  PMID: 21952541
15.  Using Single-Nucleotide Polymorphisms To Discriminate Disease-Associated from Carried Genomes of Neisseria meningitidis
Journal of Bacteriology  2011;193(14):3633-3641.
Neisseria meningitidis is one of the main agents of bacterial meningitis, causing substantial morbidity and mortality worldwide. However, most of the time N. meningitidis is carried as a commensal not associated with invasive disease. The genomic basis of the difference between disease-associated and carried isolates of N. meningitidis may provide critical insight into mechanisms of virulence, yet it has remained elusive. Here, we have taken a comparative genomics approach to interrogate the difference between disease-associated and carried isolates of N. meningitidis at the level of individual nucleotide variations (i.e., single nucleotide polymorphisms [SNPs]). We aligned complete genome sequences of 8 disease-associated and 4 carried isolates of N. meningitidis to search for SNPs that show mutually exclusive patterns of variation between the two groups. We found 63 SNPs that distinguish the 8 disease-associated genomes from the 4 carried genomes of N. meningitidis, which is far more than can be expected by chance alone given the level of nucleotide variation among the genomes. The putative list of SNPs that discriminate between disease-associated and carriage genomes may be expected to change with increased sampling or changes in the identities of the isolates being compared. Nevertheless, we show that these discriminating SNPs are more likely to reflect phenotypic differences than shared evolutionary history. Discriminating SNPs were mapped to genes, and the functions of the genes were evaluated for possible connections to virulence mechanisms. A number of overrepresented functional categories related to virulence were uncovered among SNP-associated genes, including genes related to the category “symbiosis, encompassing mutualism through parasitism.”
PMCID: PMC3133314  PMID: 21622743
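Entry 15 searches aligned genomes for SNPs with mutually exclusive patterns of variation between disease-associated and carried isolates. A minimal sketch of that filter, assuming "mutually exclusive" means that no allele at a column is shared between the two groups and that gapped columns are skipped; the short sequences are hypothetical stand-ins for whole-genome alignments:

```python
def discriminating_columns(disease, carried):
    """Return alignment columns whose alleles separate the two groups.

    disease, carried: lists of equal-length aligned sequences.
    A column qualifies when the set of alleles seen in one group shares
    no allele with the set seen in the other group.
    """
    length = len(disease[0])
    hits = []
    for pos in range(length):
        a = {seq[pos] for seq in disease}
        b = {seq[pos] for seq in carried}
        if "-" in a or "-" in b:
            continue  # skip columns containing alignment gaps
        if a.isdisjoint(b):
            hits.append(pos)
    return hits


# Hypothetical aligned fragments: three disease-associated, two carried.
disease = ["ACGTA", "ACGTA", "ACCTA"]
carried = ["ATGTC", "ATGTC"]
print(discriminating_columns(disease, carried))  # [1, 4]
```

Column 2 is excluded because its G allele appears in both groups. As the abstract notes, such a list depends on which isolates are sampled, so in practice the count would be compared with what chance alone predicts given the overall level of variation.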
16.  Protein interactions with piALU RNA indicates putative participation of retroRNA in the cell cycle, DNA repair and chromatin assembly 
Mobile Genetic Elements  2012;2(1):26-35.
Recent analyses suggest that transposable element-derived transcripts are processed to yield a variety of small RNA species that play critical functional roles in gene regulation and chromatin organization as well as genome stability and maintenance. Here we report a mass spectrometry analysis of an RNA-affinity complex isolation using a piRNA homologous sequence derived from Alu retrotransposal RNA. Our data point to potential roles for piALU RNAs in DNA repair, cell cycle and chromatin regulation.
PMCID: PMC3383447  PMID: 22754750
Alu; DNA repair; PIWI; SINE; TE (transposable element); cell cycle; centromere; chromatin modifiers; histone modifiers; kinetochore assembly; microtubule dynamics; piRNA; transcriptional factors
18.  Transcription factor binding sites are highly enriched within microRNA precursor sequences 
Biology Direct  2011;6:61.
Transcription factors are thought to regulate the transcription of microRNA genes in a manner similar to that of protein-coding genes; that is, by binding to conventional transcription factor binding site DNA sequences located in or near promoter regions that lie upstream of the microRNA genes. However, in the course of analyzing the genomics of human microRNA genes, we noticed that annotated transcription factor binding sites commonly lie within 70- to 110-nt long microRNA small hairpin precursor sequences.
We report that about 45% of all human small hairpin microRNA (pre-miR) sequences contain at least one predicted transcription factor binding site motif that is conserved across human, mouse and rat, and this rises to over 75% if one excludes primate-specific pre-miRs. The association is robust and has extremely strong statistical significance; it affects both intergenic and intronic pre-miRs and both isolated and clustered microRNA genes. We also confirmed and extended this finding using a separate analysis that examined all human pre-miR sequences regardless of conservation across species.
The transcription factor binding sites localized within small hairpin microRNA precursor sequences may regulate their transcription. Transcription factors may also bind directly to nascent primary microRNA gene transcripts or small hairpin microRNA precursors and regulate their processing.
This article was reviewed by Guillaume Bourque (nominated by Jerzy Jurka), Dmitri Pervouchine (nominated by Mikhail Gelfand), and Yuriy Gusev.
PMCID: PMC3240832  PMID: 22136256
Transcription factors; microRNA biogenesis; drosha
19.  Prediction of Transposable Element Derived Enhancers Using Chromatin Modification Profiles 
PLoS ONE  2011;6(11):e27513.
Experimentally characterized enhancer regions have previously been shown to display specific patterns of enrichment for several different histone modifications. We modelled these enhancer chromatin profiles in the human genome and used them to guide the search for novel enhancers derived from transposable element (TE) sequences. To do this, a computational approach was taken to analyze the genome-wide histone modification landscape characterized by the ENCODE project in two human hematopoietic cell types, GM12878 and K562. We predicted the locations of 2,107 and 1,448 TE-derived enhancers in the GM12878 and K562 cell lines, respectively. The vast majority of these putative enhancers are unique to each cell line; only 3.5% of the TE-derived enhancers are shared between the two. We evaluated the functional effect of TE-derived enhancers by associating them with the cell-type specific expression of nearby genes, and found that the number of TE-derived enhancers is strongly positively correlated with the expression of nearby genes in each cell line. Furthermore, genes that are differentially expressed between the two cell lines also possess a divergent number of TE-derived enhancers in their vicinity. Specifically, genes that are up-regulated in the GM12878 cell line and down-regulated in K562 have significantly more TE-derived enhancers in their vicinity in the GM12878 cell line, and vice versa. These data indicate that human TE-derived sequences are likely to be involved in regulating cell-type specific gene expression on a broad scale and suggest that the enhancer activity of TE-derived sequences is mediated by epigenetic regulatory mechanisms.
PMCID: PMC3210180  PMID: 22087331
20.  Genome-wide prediction and analysis of human chromatin boundary elements 
Nucleic Acids Research  2011;40(2):511-529.
Boundary elements partition eukaryotic chromatin into active and repressive domains, and can also block regulatory interactions between domains. Boundary elements act via diverse mechanisms, making accurate feature-based computational predictions difficult. Therefore, we developed an unbiased algorithm that predicts the locations of human boundary elements based on the genomic distributions of chromatin and transcriptional states, as opposed to any intrinsic characteristics that they may possess. Application of our algorithm to ChIP-seq data for histone modifications and RNA Pol II-binding data in human CD4+ T cells resulted in the prediction of 2542 putative chromatin boundary elements genome wide. Predicted boundary elements display two distinct features: first, position-specific open chromatin and histone acetylation that is coincident with the recruitment of sequence-specific DNA-binding factors such as CTCF, EVI1 and YY1, and second, a directional and gradual increase in histone lysine methylation across predicted boundaries coincident with a gain of expression of non-coding RNAs, including examples of boundaries encoded by tRNA and other non-coding RNA genes. Accordingly, a number of the predicted human boundaries may function via the synergistic action of sequence-specific recruitment of transcription factors leading to non-coding RNA transcriptional interference and the blocking of facultative heterochromatin propagation by transcription-associated chromatin remodeling complexes.
PMCID: PMC3258141  PMID: 21930510
21.  Neisseria Base: a comparative genomics database for Neisseria meningitidis 
Neisseria meningitidis is an important pathogen, causing life-threatening diseases including meningitis, septicemia and, in some cases, pneumonia. Genomic studies hold great promise for N. meningitidis research, but substantial database resources are needed to deal with the wealth of information that comes with completely sequenced and annotated genomes. To address this need, we developed Neisseria Base (NBase), a comparative genomics database and genome browser that houses and displays publicly available N. meningitidis genomes. In addition to existing N. meningitidis genome sequences, we sequenced and annotated 19 new genomes using 454 pyrosequencing and the CG-Pipeline genome analysis tool. In total, NBase hosts 27 complete N. meningitidis genome sequences along with their associated annotations. The NBase platform is designed to be scalable, via the underlying database schema and modular code architecture, such that it can readily incorporate new genomes and their associated annotations. The front page of NBase provides user access to these genomes through searching, browsing and downloading. The NBase search utility includes BLAST-based sequence similarity searches along with a variety of semantic search options. All genomes can be browsed using a modified version of the GBrowse platform, and a wide range of information on each gene can be viewed on a customized details page. NBase also has a whole-genome comparison tool that reports single-nucleotide polymorphism differences between two user-defined groups of genomes. Using the virulent ST-11 lineage as an example, we demonstrate how this comparative genomics utility can be used to identify novel genomic markers for molecular profiling of N. meningitidis.
Database URL:
PMCID: PMC3263597  PMID: 21930505
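NBase's whole-genome comparison reports SNP differences between two user-defined groups of genomes. The sketch below shows one simple way such group-diagnostic SNPs could be found, assuming the sequences are already aligned to equal length. It is a hypothetical illustration, not the NBase implementation, and `diagnostic_snps` is an invented name.

```python
def diagnostic_snps(group_a, group_b, ignore="-N"):
    """Find alignment columns where group A is fixed for one base and group B for another.

    Sequences must be pre-aligned to equal length; columns where either fixed
    allele is a gap or ambiguous base (characters in `ignore`) are skipped.
    Positions are 0-based alignment columns.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sequence")
    length = len(group_a[0])
    if any(len(s) != length for s in list(group_a) + list(group_b)):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    for i in range(length):
        alleles_a = {s[i] for s in group_a}
        alleles_b = {s[i] for s in group_b}
        # A column is diagnostic only if each group is monomorphic and they differ.
        if len(alleles_a) == 1 and len(alleles_b) == 1 and alleles_a != alleles_b:
            a, b = alleles_a.pop(), alleles_b.pop()
            if a not in ignore and b not in ignore:
                snps.append((i, a, b))
    return snps

# Toy alignment (hypothetical sequences, not meningococcal data).
print(diagnostic_snps(["ACGTA", "ACGTA"], ["ACTTA", "ACTTG"]))  # [(2, 'G', 'T')]
```

Requiring each group to be fixed is the strictest criterion; a tool aimed at marker discovery might instead accept columns where allele frequencies differ above some threshold.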
22.  A method for visualization of “omic” datasets for sphingolipid metabolism to predict potentially interesting differences[S] 
Journal of Lipid Research  2011;52(6):1073-1083.
Sphingolipids are structurally diverse and their metabolic pathways are highly complex, which makes it difficult to follow all of the subspecies in a biological system, even using “lipidomic” approaches. This report describes a method that uses transcriptomic data to visualize and predict potential differences in sphingolipid composition, and it illustrates its use with published data for cancer cell lines and tumors. In addition, several novel sphingolipids that were predicted to differ between MDA-MB-231 and MCF7 cells based on published microarray data for these breast cancer cell lines were confirmed by mass spectrometry. For the data that we were able to find for these comparisons, there was a significant match between the gene expression data and sphingolipid composition (P < 0.001 by Fisher's exact test). Given the large number of gene expression datasets produced in recent years, this simple integration of two types of “omic” technologies (“transcriptomics” to direct “sphingolipidomics”) might facilitate the discovery of useful relationships between sphingolipid metabolism and disease, such as the identification of new biomarkers.
PMCID: PMC3090229  PMID: 21415121
lipidomics; pathway visualization; transcriptomics; cancer
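The reported P < 0.001 comes from Fisher's exact test on the agreement between predicted and measured differences. For readers who want to reproduce that kind of calculation, here is a one-sided version written with the standard library. The table layout is an assumption (the abstract gives neither the counts nor the sidedness of the test), and the counts in the example are invented; `scipy.stats.fisher_exact` offers the same test with an `alternative` argument.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table's margins held fixed.
    """
    row1, col1 = a + b, a + c
    n = a + b + c + d
    total = comb(n, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / total

# Hypothetical counts: rows = predicted higher/lower, columns = observed higher/lower.
print(fisher_exact_greater(3, 0, 0, 3))  # 0.05
```

Working with exact integer binomial coefficients avoids the underflow that factorial-based floating-point formulas can hit for larger tables.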
23.  Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins 
Genome Biology  2001;2(12):research0053.1-research0053.9.
Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat).
Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviate from this pattern and instead show variation that is consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the periphery of the cells and mediate their interaction with the environment.
Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins.
PMCID: PMC64838  PMID: 11790256
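The abstract does not spell out its relative rates test. As a point of reference, the classic Tajima (1993) relative rate test below compares two ingroup sequences against an outgroup. It is a stand-in for illustration, not the authors' method, and it assumes the three protein sequences are already aligned.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup, skip="-X"):
    """Tajima's (1993) 1D relative rate test on three aligned protein sequences.

    m1 counts sites where only seq1 differs (seq2 == outgroup != seq1);
    m2 counts sites where only seq2 differs (seq1 == outgroup != seq2).
    Under rate constancy m1 and m2 have equal expectation, and
    (m1 - m2)^2 / (m1 + m2) is approximately chi-square with 1 df.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if a in skip or b in skip or c in skip:
            continue  # ignore gaps and unknown residues
        if a != b and b == c:
            m1 += 1
        elif a != b and a == c:
            m2 += 1
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return m1, m2, chi2, p

# Toy sequences: one substitution unique to seq1.
print(tajima_relative_rate("MKVLA", "MKVLG", "MKVLG"))  # m1=1, m2=0, chi2=1.0, p ~ 0.317
```

Sites where all three sequences differ are uninformative for this test and are simply not counted; a genome-scale analysis would run a test like this (or a correlation of branch lengths) across every orthologous cluster and correct for multiple testing.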
24.  Effect of the Transposable Element Environment of Human Genes on Gene Length and Expression 
Independent lines of investigation have documented effects of both transposable elements (TEs) and gene length (GL) on gene expression. However, TE gene fractions are highly correlated with GL, suggesting that they cannot be considered independently. We evaluated the TE environment of human genes and GL jointly in an attempt to tease apart their relative effects. TE gene fractions and GL were compared with the overall level of gene expression and the breadth of expression across tissues. GL is strongly correlated with overall expression level but weakly correlated with the breadth of expression, confirming the selection hypothesis that attributes the compactness of highly expressed genes to selection for economy of transcription. However, TE gene fractions overall, and for the L1 family in particular, show stronger anticorrelations with expression level than GL, indicating that GL may not be the most important target of selection for transcriptional economy. These results suggest a specific mechanism, removal of TEs, by which highly expressed genes are selectively tuned for efficiency. MIR elements are the only family of TEs with gene fractions that show a positive correlation with tissue-specific expression, suggesting that they may provide regulatory sequences that help to control human gene expression. Consistent with this notion, MIR fractions are relatively enriched close to transcription start sites and associated with coexpression in specific sets of related tissues. Our results confirm the overall relevance of the TE environment to gene expression and point to distinct mechanisms by which different TE families may contribute to gene regulation.
PMCID: PMC3070429  PMID: 21362639
gene expression; gene regulation; selection hypothesis; genomic design hypothesis; L1; MIR
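Because TE gene fractions and gene length are correlated, one way to ask whether TE fraction tracks expression beyond what gene length explains is a partial rank correlation. The sketch assumes Spearman correlations and the standard first-order partial correlation formula; it is not necessarily the analysis used in the study, and the example values are invented.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def partial_spearman(x, y, z):
    """Rank correlation of x and y after controlling for z (first-order partial)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Invented per-gene values: TE fraction, expression level, gene length (kb).
te_fraction = [0.60, 0.50, 0.40, 0.30, 0.20]
expression = [1.0, 3.0, 2.0, 5.0, 4.0]
gene_length = [50, 40, 45, 20, 10]
print(partial_spearman(te_fraction, expression, gene_length))
```

In this toy data the raw TE fraction vs. expression correlation is strongly negative, but most of it disappears once gene length is controlled for, which is the kind of confounding the study set out to disentangle.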
25.  A Gibbs sampling strategy applied to the mapping of ambiguous short-sequence tags 
Bioinformatics  2010;26(20):2501-2508.
Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is widely used in biological research. ChIP-seq experiments yield many ambiguous tags that can be mapped with equal probability to multiple genomic sites. Such ambiguous tags are typically eliminated from consideration resulting in a potential loss of important biological information.
Results: We have developed a Gibbs sampling-based algorithm for the genomic mapping of ambiguous sequence tags. Our algorithm relies on the local genomic tag context to guide the mapping of ambiguous tags. The Gibbs sampling procedure we use simultaneously maps ambiguous tags and updates the probabilities used to infer correct tag map positions. We show that our algorithm is able to correctly map more ambiguous tags than existing mapping methods. Our approach is also able to uncover mapped genomic sites from highly repetitive sequences that cannot be detected based on unique tags alone, including transposable elements, segmental duplications and peri-centromeric regions. This mapping approach should prove useful for increasing biological knowledge of repetitive genomic regions, which are too often neglected.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2951085  PMID: 20871106
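The Gibbs sampling idea can be sketched in a few lines: each ambiguous tag is repeatedly reassigned to one of its candidate sites with probability proportional to the local tag density there, and the assignment frequencies after burn-in serve as mapping probabilities. This is a simplified sketch, not the published algorithm; the window size, pseudocount and sweep counts are hypothetical defaults, and tags are reduced to single positions on one chromosome.

```python
import random
from collections import Counter

def local_density(counts, pos, window):
    """Number of tags currently placed within +/- window bp of pos."""
    return sum(c for p, c in counts.items() if abs(p - pos) <= window)

def gibbs_map(unique_positions, ambiguous, window=100, sweeps=200,
              burn_in=50, pseudo=1.0, seed=0):
    """Return, for each ambiguous tag, a dict {candidate site: estimated probability}.

    unique_positions: positions of uniquely mapped tags (one chromosome).
    ambiguous: one list of candidate positions per ambiguous tag.
    """
    if sweeps <= burn_in:
        raise ValueError("sweeps must exceed burn_in")
    rng = random.Random(seed)
    counts = Counter(unique_positions)
    assign = [rng.choice(cands) for cands in ambiguous]  # random initial placement
    for pos in assign:
        counts[pos] += 1
    tallies = [Counter() for _ in ambiguous]
    for sweep in range(sweeps):
        for t, cands in enumerate(ambiguous):
            counts[assign[t]] -= 1  # condition on all other tags only
            weights = [local_density(counts, c, window) + pseudo for c in cands]
            assign[t] = rng.choices(cands, weights=weights, k=1)[0]
            counts[assign[t]] += 1
            if sweep >= burn_in:
                tallies[t][assign[t]] += 1
    kept = sweeps - burn_in
    return [{c: tallies[t][c] / kept for c in cands} for t, cands in enumerate(ambiguous)]

# Toy example: 20 unique tags cluster near 1,000 bp; nothing near 5,000 bp.
posterior = gibbs_map(list(range(1000, 1020)), [[1005, 5005]])
print(posterior)
```

The pseudocount keeps every candidate site reachable, so a tag whose candidates all lie in regions without unique tags is not forced to probability zero everywhere. Scanning all counts in `local_density` is fine for a toy example; a realistic implementation would index tag counts by position (for example, with sorted arrays and binary search).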
