Results 1-25 (107)

1.  Origin and evolution of the cystic fibrosis transmembrane regulator protein R domain 
Gene  2013;523(2):137-146.
The Cystic Fibrosis Transmembrane Conductance Regulator protein (CFTR) is a member of the ABC transporter superfamily. CFTR is distinguished from all other members of this superfamily by its status as an ion channel as well as the presence of its unique regulatory (R) domain. We investigated the origin and subsequent evolution of the R domain along the CFTR evolutionary lineage. The R domain protein coding sequence originated via the loss of a splice donor site at the 3′ end of exon 14, leading to the subsequent read-through and capture of formerly intronic sequence as novel coding sequence. Inclusion of the remaining part of the R domain coding sequence in the CFTR transcript involved a lineage-specific gain of exonic sequence with no homology to protein coding sequences outside of CFTR and loss of two exons conserved among ABC family members. These events occurred at the base of the Gnathostome evolutionary lineage ~550–650 million years ago. The apparent origination of the R domain de novo from previously non-coding sequence is consistent with its lack of sequence similarity to other domains as well as its intrinsically disordered structure, which has important implications for its function. In particular, this lack of structure may provide for a dynamic and inducible regulatory activity based on transient physical interactions with more structured domains of the protein. Since its acquisition along the CFTR evolutionary lineage, the R domain has evolved more rapidly than any other CFTR domain; however, there is no evidence for positive (adaptive) selection in the evolution of the domain. The R domain does show a distinct pattern of relative evolutionary rates compared to other CFTR domains, which sheds additional light on the connection between its function and evolution. 
The regulatory function of the R domain is dependent upon a fairly small number of sites that are subject to phosphorylation, and these sites were fixed very early in R domain evolution and have remained largely invariant since that time. In contrast, the rest of the R domain has been free to drift in sequence space leading to a more star-like phylogeny than seen for the other CFTR domains. The case of the R domain suggests that domain acquisition via the de novo creation of coding sequence, and the novel functional utility that such an event would seemingly entail, can be one route by which neo-functionalization is favored to occur.
PMCID: PMC3793851  PMID: 23578801
Cystic fibrosis; R domain; Molecular evolution; Coding sequence; Neo-functionalization
2.  Flow-dependent epigenetic DNA methylation regulates endothelial gene expression and atherosclerosis 
The Journal of Clinical Investigation  2014;124(7):3187-3199.
In atherosclerosis, plaques preferentially develop in arterial regions of disturbed blood flow (d-flow), which alters endothelial gene expression and function. Here, we determined that d-flow regulates genome-wide DNA methylation patterns in a DNA methyltransferase–dependent (DNMT-dependent) manner. Induction of d-flow by partial carotid ligation surgery in a murine model induced DNMT1 in arterial endothelium. In cultured endothelial cells, DNMT1 was enhanced by oscillatory shear stress (OS), and reduction of DNMT with either the inhibitor 5-aza-2′-deoxycytidine (5Aza) or siRNA markedly reduced OS-induced endothelial inflammation. Moreover, administration of 5Aza reduced lesion formation in 2 mouse models of atherosclerosis. Using both reduced representation bisulfite sequencing (RRBS) and microarray, we determined that d-flow in the carotid artery resulted in hypermethylation within the promoters of 11 mechanosensitive genes and that 5Aza treatment restored normal methylation patterns. Of the identified genes, HoxA5 and Klf3 encode transcription factors that contain cAMP response elements, suggesting that the methylation status of these loci could serve as a mechanosensitive master switch in gene expression. Together, our results demonstrate that d-flow controls epigenomic DNA methylation patterns in a DNMT-dependent manner, which in turn alters endothelial gene expression and induces atherosclerosis.
PMCID: PMC4071393  PMID: 24865430
3.  Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter 
PLoS ONE  2014;9(4):e94101.
Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences also known as genomic dark matter. Because of their high duplication rates as compared to other genomic features, transposable elements are predominant contributors to the repeatome, and the products of their decay are thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important for understanding genome evolution as well as host biology. In this study, we used a combination of tools to show that the repeatome of the small and reducing A. thaliana genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the A. thaliana dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the A. thaliana repeatome that may help to further address genome evolution as well as transcriptional and epigenetic regulation in this model plant.
PMCID: PMC3978025  PMID: 24709859
4.  The Role of Mutation Rate Variation and Genetic Diversity in the Architecture of Human Disease 
PLoS ONE  2014;9(2):e90166.
We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that the mutation rate and its genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified.
Consistent with our predictions, we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees, and they are predicted to have similar rates of non-synonymous mutation as other genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. The effect is nevertheless small.
Our results suggest that gene length contributes to whether a gene is associated with disease. However, the mutation rate and the genetic architecture of the locus appear to play only a minor role in determining whether a gene is associated with disease.
PMCID: PMC3937440  PMID: 24587257
5.  Nested Insertions and Accumulation of Indels Are Negatively Correlated with Abundance of Mutator-Like Transposable Elements in Maize and Rice 
PLoS ONE  2014;9(1):e87069.
Mutator-like transposable elements (MULEs) are widespread in plants and were first discovered in maize, where there are a total of 12,900 MULEs. In comparison, rice, with a much smaller genome, harbors over 30,000 MULEs. Since maize and rice are close relatives, the differential amplification of MULEs raised an inquiry into the underlying mechanism. We hypothesize that this is partly attributable to the differential copy number of autonomous MULEs with the potential to generate the transposase that is required for transposition. To this end, we mined the two genomes and detected 530 and 476 MULEs containing transposase sequences (candidate coding-MULEs) in maize and rice, respectively. Over 1/3 of the candidate coding-MULEs harbor nested insertions and the ratios are similar in the two genomes. Among the maize elements with nested insertions, 24% have insertions in coding regions and over half of them harbor two or more insertions. In contrast, only 12% of the rice elements have insertions in coding regions and 19% have multiple insertions, suggesting that nested insertions in maize are more disruptive. This is because most nested insertions in maize are from LTR retrotransposons, which are large in size and are prevalent in the maize genome. Our results suggest that the amplification of retrotransposons may limit the amplification of DNA transposons but not vice versa. In addition, more indels are detected among maize elements than rice elements, whereas defects caused by point mutations are comparable between the two species. Taken together, more disruptive nested insertions combined with a higher frequency of indels resulted in few (6%) coding-MULEs that may encode functional transposases in maize. In contrast, 35% of the coding-MULEs in rice retain a putative intact transposase. This is in addition to the higher expression frequency of rice coding-MULEs, which may explain the higher occurrence of MULEs in rice than in maize.
PMCID: PMC3903597  PMID: 24475224
6.  Transcriptional Activity, Chromosomal Distribution and Expression Effects of Transposable Elements in Coffea Genomes 
PLoS ONE  2013;8(11):e78931.
Plant genomes are massively invaded by transposable elements (TEs), many of which are located near host genes and can thus impact gene expression. In flowering plants, TE expression can be activated (de-repressed) under certain stressful conditions, both biotic and abiotic, as well as by genome stress caused by hybridization. In this study, we examined the effects of these stress agents on TE expression in two diploid species of coffee, Coffea canephora and C. eugenioides, and their allotetraploid hybrid C. arabica. We also explored the relationship of TE repression mechanisms to host gene regulation via the effects of exonized TE sequences. Similar to what has been seen for other plants, overall TE expression levels are low in Coffea plant cultivars, consistent with the existence of effective TE repression mechanisms. TE expression patterns are highly dynamic across the species and conditions assayed here and are unrelated to their classification at the level of TE class or family. In contrast to previous results, cell culture conditions per se do not lead to the de-repression of TE expression in C. arabica. Results obtained here indicate that differing plant drought stress levels relate strongly to TE repression mechanisms. TEs tend to be expressed at significantly higher levels in non-irrigated samples for the drought tolerant cultivars, but in drought sensitive cultivars the opposite pattern was shown, with irrigated samples showing significantly higher TE expression. Thus, TE genome repression mechanisms may be finely tuned to the ideal growth and/or regulatory conditions of the specific plant cultivars in which they are active. Analysis of TE expression levels in cell culture conditions underscored the importance of nonsense-mediated mRNA decay (NMD) pathways in the repression of Coffea TEs. These same NMD mechanisms can also regulate plant host gene expression via the repression of genes that bear exonized TE sequences.
PMCID: PMC3823963  PMID: 24244387
7.  On the presence and role of human gene-body DNA methylation 
Oncotarget  2012;3(4):462-474.
DNA methylation of promoter sequences is a repressive epigenetic mark that down-regulates gene expression. However, DNA methylation is more prevalent within gene-bodies than seen for promoters, and gene-body methylation has been observed to be positively correlated with gene expression levels. This paradox remains unexplained, and accordingly the role of DNA methylation in gene-bodies is poorly understood. We addressed the presence and role of human gene-body DNA methylation using a meta-analysis of human genome-wide methylation, expression and chromatin data sets. Methylation is associated with transcribed regions as genic sequences have higher levels of methylation than intergenic or promoter sequences. We also find that the relationship between gene-body DNA methylation and expression levels is non-monotonic and bell-shaped. Mid-level expressed genes have the highest levels of gene-body methylation, whereas the most lowly and highly expressed sets of genes both have low levels of methylation. While gene-body methylation can be seen to efficiently repress the initiation of intragenic transcription, the vast majority of methylated sites within genes are not associated with intragenic promoters. In fact, highly expressed genes initiate the most intragenic transcription, which is inconsistent with the previously held notion that gene-body methylation serves to repress spurious intragenic transcription to allow for efficient transcriptional elongation. These observations lead us to propose a model to explain the presence of human gene-body methylation. This model holds that the repression of intragenic transcription by gene-body methylation is largely epiphenomenal, and suggests that gene-body methylation levels are predominantly shaped via the accessibility of the DNA to methylating enzyme complexes.
PMCID: PMC3380580  PMID: 22577155
genome-wide methylation; epigenetic mark; intragenic transcription; methylating enzyme complexes
8.  Evaluation of Alignment Algorithms for Discovery and Identification of Pathogens Using RNA-Seq 
PLoS ONE  2013;8(10):e76935.
Next-generation sequencing technologies provide an unparalleled opportunity for the characterization and discovery of known and novel viruses. Because viruses are known to have the highest mutation rates when compared to eukaryotic and bacterial organisms, we assess the extent to which eleven well-known alignment algorithms (BLAST, BLAT, BWA, BWA-SW, BWA-MEM, BFAST, Bowtie2, Novoalign, GSNAP, SHRiMP2 and STAR) can be used for characterizing mutated and non-mutated viral sequences - including those that exhibit RNA splicing - in transcriptome samples. To evaluate aligners objectively we developed a realistic RNA-Seq simulation and evaluation framework (RiSER) and propose a new combined score to rank aligners for viral characterization in terms of their precision, sensitivity and alignment accuracy. We used RiSER to simulate both human and viral read sequences and suggest the best set of aligners for viral sequence characterization in human transcriptome samples. Our results show that significant and substantial differences exist between aligners and that a digital-subtraction-based viral identification framework can and should use different aligners for different parts of the process. We determine the extent to which mutated viral sequences can be effectively characterized and show that more sensitive aligners such as BLAST, BFAST, SHRiMP2, BWA-SW and GSNAP can accurately characterize substantially divergent viral sequences with up to 15% overall sequence mutation rate. We believe that the results presented here will be useful to researchers choosing aligners for viral sequence characterization using next-generation sequencing data.
PMCID: PMC3813700  PMID: 24204709
9.  Sequence Evolution and Expression Regulation of Stress-Responsive Genes in Natural Populations of Wild Tomato 
PLoS ONE  2013;8(10):e78182.
The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives.
PMCID: PMC3799731  PMID: 24205149
10.  Do human transposable element small RNAs serve primarily as genome defenders or genome regulators? 
Mobile Genetic Elements  2012;2(1):19-25.
It is currently thought that small RNA (sRNA) based repression mechanisms are primarily employed to mitigate the mutagenic threat posed by the activity of transposable elements (TEs). This can be achieved by the sRNA guided processing of TE transcripts via Dicer-dependent (e.g., siRNA) or Dicer-independent (e.g., piRNA) mechanisms. For example, potentially active human L1 elements are silenced by mRNA cleavage induced by element encoded siRNAs, leading to a negative correlation between element mRNA and siRNA levels. On the other hand, there is emerging evidence that TE derived sRNAs can also be used to regulate the host genome. Here, we evaluated these two hypotheses for human TEs by comparing the levels of TE derived mRNA and TE sRNA across six tissues. The genome defense hypothesis predicts a negative correlation between TE mRNA and TE sRNA levels, whereas the genome regulatory hypothesis predicts a positive correlation. On average, TE mRNA and TE sRNA levels are positively correlated across human tissues. These correlations are higher than seen for human genes or for randomly permuted control data sets. Overall, Alu subfamilies show the highest positive correlations of element mRNA and sRNA levels across tissues, although a few of the youngest, and potentially most active, Alu subfamilies do show negative correlations. Thus, Alu derived sRNAs may be related to both genome regulation and genome defense. These results are inconsistent with a simple model whereby TE derived sRNAs reduce levels of standing TE mRNA via transcript cleavage, and suggest that human cells efficiently process TE transcripts into sRNA based on the available message levels. This may point to a widespread role for processed TE transcripts in genome regulation or to alternative roles of TE-to-sRNA processing including the mitigation of TE transcript cytotoxicity.
PMCID: PMC3383446  PMID: 22754749
RNA interference; RNA processing; gene expression; genome regulation; small RNA
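The two hypotheses in entry 10 predict opposite signs for the correlation between TE mRNA and TE sRNA levels across tissues. The sketch below illustrates that comparison under stated assumptions: it uses Pearson correlation (the abstract does not name the statistic), the function names are hypothetical, and the six tissue values are invented for illustration rather than taken from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def predicted_by(r):
    """Map the sign of the correlation to the hypothesis that predicts it."""
    if r > 0:
        return "genome regulation"
    if r < 0:
        return "genome defense"
    return "neither"

# Illustrative values only (not from the paper): one TE subfamily, six tissues.
te_mrna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
te_srna = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

r = pearson(te_mrna, te_srna)
print(predicted_by(r))
```

A real analysis would also need the control comparisons the abstract mentions (human genes and randomly permuted data sets), which this sketch omits.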
11.  Repetitive DNA elements, nucleosome binding and human gene expression 
Gene  2009;436(1-2):12-22.
We evaluated the epigenetic contributions of repetitive DNA elements to human gene regulation. Human proximal promoter sequences show distinct distributions of transposable elements (TEs) and simple sequence repeats (SSRs). TEs are enriched distal from transcriptional start sites (TSSs) and their frequency decreases closer to TSSs being largely absent from the core promoter region. SSRs, on the other hand, are found at low frequency distal to the TSS and then increase in frequency starting ∼150bp upstream of the TSS. The peak of SSR density is centered around the -35bp position where the basal transcriptional machinery assembles. These trends in repetitive sequence distribution are strongly correlated, positively for TEs and negatively for SSRs, with relative nucleosome binding affinities along the promoters. Nucleosomes bind with highest probability distal from the TSS and the nucleosome binding affinity steadily decreases reaching its nadir just upstream of the TSS at the same point where SSR frequency is at its highest. Promoters that are enriched for TEs are more highly and broadly expressed, on average, than promoters that are devoid of TEs. In addition, promoters that have similar repetitive DNA profiles regulate genes that have more similar expression patterns and encode proteins with more similar functions than promoters that differ with respect to their repetitive DNA. Furthermore, distinct repetitive DNA promoter profiles are correlated with tissue-specific patterns of expression. These observations indicate that repetitive DNA elements mediate chromatin accessibility in proximal promoter regions and the repeat content of promoters is relevant to both gene expression and function.
PMCID: PMC2921533  PMID: 19393174
12.  Co-evolutionary Rates of Functionally Related Yeast Genes 
Evolutionary knowledge is often used to facilitate computational attempts at gene function prediction. One rich source of evolutionary information is the relative rates of gene sequence divergence, and in this report we explore the connection between gene evolutionary rates and function. We performed a genome-scale evaluation of the relationship between evolutionary rates and functional annotations for the yeast Saccharomyces cerevisiae. Non-synonymous (dN) and synonymous (dS) substitution rates were calculated for 1,095 orthologous gene sets common to S. cerevisiae and six other closely related yeast species. Differences in evolutionary rates between pairs of genes (ΔdN & ΔdS) were then compared to their functional similarities (sGO), which were measured using Gene Ontology (GO) annotations. Substantial and statistically significant correlations were found between ΔdN and sGO, whereas there is no apparent relationship between ΔdS and sGO. These results are consistent with a mode of action for natural selection that is based on similar rates of elimination of deleterious protein coding sequence variants for functionally related genes. The connection between gene evolutionary rates and function was stronger than seen for phylogenetic profiles, which have previously been employed to inform functional inference. The co-evolution of functionally related yeast genes points to the relevance of specific function for the efficacy of natural selection and underscores the utility of gene evolutionary rates for functional predictions.
PMCID: PMC2674680  PMID: 18345352
Functional inference; Co-evolution; natural selection; genome evolution; gene ontology
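Entry 12 pairs, for every two genes, a difference in substitution rate with a functional similarity score. The sketch below shows how such a pairwise table could be built, under stated assumptions: ΔdN is taken as the absolute difference in dN, and Jaccard overlap of GO term sets stands in for sGO, since the abstract defines neither measure. The gene names, rates, GO identifiers, and function names are placeholders.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two sets (assumed stand-in for sGO)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_table(dn, go_terms):
    """Return (gene1, gene2, delta_dN, similarity) for every gene pair."""
    rows = []
    for g1, g2 in combinations(sorted(dn), 2):
        delta = abs(dn[g1] - dn[g2])  # assumed definition of delta dN
        sim = jaccard(go_terms[g1], go_terms[g2])
        rows.append((g1, g2, delta, sim))
    return rows

# Placeholder data, not taken from the paper.
dn = {"geneA": 0.10, "geneB": 0.12, "geneC": 0.40}
go_terms = {
    "geneA": {"GO:1", "GO:2"},
    "geneB": {"GO:1", "GO:2", "GO:3"},
    "geneC": {"GO:9"},
}

for row in pairwise_table(dn, go_terms):
    print(row)
```

The delta and similarity columns of such a table are the two quantities whose correlation the abstract reports; the same construction with dS in place of dN would give the ΔdS comparison.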
13.  Genomic Basis of a Polyagglutinating Isolate of Neisseria meningitidis 
Journal of Bacteriology  2012;194(20):5649-5656.
Containment strategies for outbreaks of invasive Neisseria meningitidis disease are informed by serogroup assays that characterize the polysaccharide capsule. We sought to uncover the genomic basis of conflicting serogroup assay results for an isolate (M16917) from a patient with acute meningococcal disease. To this end, we characterized the complete genome sequence of the M16917 isolate and performed a variety of comparative sequence analyses against N. meningitidis reference genome sequences of known serogroups. Multilocus sequence typing and whole-genome sequence comparison revealed that M16917 is a member of the ST-11 sequence group, which is most often associated with serogroup C. However, sequence similarity comparisons and phylogenetic analysis showed that the serogroup diagnostic capsule polymerase gene (synD) of M16917 belongs to serogroup B. These results suggest that a capsule-switching event occurred based on homologous recombination at or around the capsule locus of M16917. Detailed analysis of this locus uncovered the locations of recombination breakpoints in the M16917 genome sequence, which led to the introduction of an ∼2-kb serogroup B sequence cassette into the serogroup C genomic background. Since there is no currently available vaccine for serogroup B strains of N. meningitidis, this kind of capsule-switching event could have public health relevance as a vaccine escape mutant.
PMCID: PMC3458693  PMID: 22904290
14.  The Patterns of Histone Modifications in the Vicinity of Transcription Factor Binding Sites in Human Lymphoblastoid Cell Lines 
PLoS ONE  2013;8(3):e60002.
Transcription factor (TF) binding at specific DNA sequences is the fundamental step in transcriptional regulation and is highly dependent on the chromatin structure context, which may be affected by specific histone modifications and variants, known as histone marks. The lack of a global binding map for hundreds of TFs means that previous studies have focused mainly on histone marks at binding sites for several specific TFs. We therefore studied 11 histone marks around computationally-inferred and experimentally-determined TF binding sites (TFBSs), based on 164 and 34 TFs, respectively, in human lymphoblastoid cell lines. For H2A.Z, methylation of H3K4, and acetylation of H3K27 and H3K9, the mark patterns exhibited bimodal distributions and strong pairwise correlations in the 600-bp region around enriched TFBSs, suggesting that these marks mainly coexist within the two nucleosomes proximal to the TF sites. Competition between TFs and nucleosomes for access to DNA at most binding sites contributes to the bimodal distribution, which is a common feature of histone marks for TF binding. Mark H3K79me2 showed a unimodal distribution on one side of TFBSs and the signals extended up to 4000 bp, indicating a longer-distance pattern. Interestingly, H4K20me1, H3K27me3, H3K36me3 and H3K9me3, which were more diffuse and less enriched surrounding TFBSs, showed unimodal distributions around the enriched TFBSs, suggesting that some TFs may bind to nucleosomal DNA. In addition, asymmetrical distributions of H3K36me3 and H3K9me3 indicated that repressors might establish a repressive chromatin structure in one direction to repress gene expression. In conclusion, this study demonstrated the ranges of histone marks associated with TF binding, and the common features of these marks around the binding sites. These findings have epigenetic implications for future analysis of regulatory elements.
PMCID: PMC3602107  PMID: 23527292
15.  Genome Sequences for Six Rhodanobacter Strains, Isolated from Soils and the Terrestrial Subsurface, with Variable Denitrification Capabilities 
Journal of Bacteriology  2012;194(16):4461-4462.
We report the first genome sequences for six strains of Rhodanobacter species isolated from a variety of soil and subsurface environments. Three of these strains are capable of complete denitrification and three others are not. However, all six strains contain most of the genes required for the respiration of nitrate to gaseous nitrogen. The nondenitrifying members of the genus lack only the gene for nitrate reduction, the first step in the full denitrification pathway. The data suggest that the environmental role of bacteria from the genus Rhodanobacter should be reevaluated.
PMCID: PMC3416251  PMID: 22843592
17.  Depletion of nuclear histone H2A variants is associated with chronic DNA damage signaling upon drug-evoked senescence of human somatic cells 
Aging (Albany NY)  2012;4(11):823-842.
Cellular senescence is associated with global chromatin changes, altered gene expression, and activation of chronic DNA damage signaling. These events ultimately lead to morphological and physiological transformations in primary cells. In this study, we show that chronic DNA damage signals caused by genotoxic stress impact the expression of histone H2A family members and lead to their depletion in the nuclei of senescent human fibroblasts. Our data reinforce the hypothesis that progressive chromatin destabilization may lead to the loss of epigenetic information and impaired cellular function associated with chronic DNA damage upon drug-evoked senescence. We propose that changes in histone biosynthesis and chromatin assembly may directly contribute to cellular aging. In addition, we outline a method that allows for quantitative and unbiased measurement of these changes.
PMCID: PMC3560435  PMID: 23235539
γH2A.X; DNA damage; senescence; LC-MS analysis; quantitative proteomic; SRM; histone H2A family; chromatin; DNA repair; HCA2 primary fibroblasts; epigenetics
18.  Cell type-specific termination of transcription by transposable element sequences 
Mobile DNA  2012;3:15.
Transposable elements (TEs) encode sequences necessary for their own transposition, including signals required for the termination of transcription. TE sequences within the introns of human genes show an antisense orientation bias, which has been proposed to reflect selection against TE sequences in the sense orientation owing to their ability to terminate the transcription of host gene transcripts. While there is evidence in support of this model for some elements, the extent to which TE sequences actually terminate transcription of human genes across the genome remains an open question.
Using high-throughput sequencing data, we have characterized over 9,000 distinct TE-derived sequences that provide transcription termination sites for 5,747 human genes across eight different cell types. Rarefaction curve analysis suggests that there may be twice as many TE-derived termination sites (TE-TTS) genome-wide among all human cell types. The local chromatin environment for these TE-TTS is similar to that seen for 3′ UTR canonical TTS and distinct from the chromatin environment of other intragenic TE sequences. However, those TE-TTS located within the introns of human genes were found to be far more cell type-specific than the canonical TTS. TE-TTS were much more likely to be found in the sense orientation than other intragenic TE sequences of the same TE family and TE-TTS in the sense orientation terminate transcription more efficiently than those found in the antisense orientation. Alu sequences were found to provide a large number of relatively weak TTS, whereas LTR elements provided a smaller number of much stronger TTS.
TE sequences provide numerous termination sites to human genes, and TE-derived TTS are particularly cell type-specific. Thus, TE sequences provide a powerful mechanism for the diversification of transcriptional profiles between cell types and among evolutionary lineages, since most TE-TTS are evolutionarily young. The extent of transcription termination by TEs seen here, along with the preference for sense-oriented TE insertions to provide TTS, is consistent with the observed antisense orientation bias of human TEs.
PMCID: PMC3517506  PMID: 23020800
Polyadenylation; Transcription termination; Orientation bias; Gene regulation
19.  WaveSeq: A Novel Data-Driven Method of Detecting Histone Modification Enrichments Using Wavelets 
PLoS ONE  2012;7(9):e45486.
Chromatin immunoprecipitation followed by next-generation sequencing is a genome-wide analysis technique that can be used to detect various epigenetic phenomena such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns.
To address these challenges we propose WaveSeq, a novel data-driven method of detecting regions of significant enrichment in ChIP-Seq data. Our approach utilizes the wavelet transform, is free of distributional assumptions and is robust to diverse data characteristics such as low signal-to-noise ratios and broad enrichment patterns. Using publicly available datasets we showed that WaveSeq compares favorably with other published methods, exhibiting high sensitivity and precision for both punctate and diffuse enrichment regions even in the absence of a control data set. The application of our algorithm to a complex histone modification data set helped make novel functional discoveries which further underlined its utility in such an experimental setup.
WaveSeq is a highly sensitive method capable of accurate identification of enriched regions in a broad range of data sets. WaveSeq can detect both narrow and broad peaks with a high degree of accuracy even in low signal-to-noise ratio data sets. WaveSeq is also suited for application in complex experimental scenarios, helping make biologically relevant functional discoveries.
PMCID: PMC3461018  PMID: 23029045
20.  Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins 
Genome Biology  2001;2(12):research0053.1-research0053.9.
Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat).
Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviates from this pattern and instead shows variation that is consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the periphery of the cells and mediate their interaction with the environment.
Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins.
PMCID: PMC64838  PMID: 11790256
21.  Chromatin signature discovery via histone modification profile alignments 
Nucleic Acids Research  2012;40(21):10642-10656.
We report on the development of an unsupervised algorithm for the genome-wide discovery and analysis of chromatin signatures. Our Chromatin-profile Alignment followed by Tree-clustering algorithm (ChAT) employs dynamic programming of combinatorial histone modification profiles to identify locally similar chromatin sub-regions and provides complementary utility with respect to existing methods. We applied ChAT to genomic maps of 39 histone modifications in human CD4+ T cells to identify both known and novel chromatin signatures. ChAT was able to detect chromatin signatures previously associated with transcription start sites and enhancers as well as novel signatures associated with a variety of regulatory elements. Promoter-associated signatures discovered with ChAT indicate that complex chromatin signatures, made up of numerous co-located histone modifications, facilitate cell-type specific gene expression. The discovery of novel L1 retrotransposon-associated bivalent chromatin signatures suggests that these elements influence the mono-allelic expression of human genes by shaping the chromatin environment of imprinted genomic regions. Analysis of long gene-associated chromatin signatures points to a role for the H4K20me1 and H3K79me3 histone modifications in transcriptional pause release. The novel chromatin signatures and functional associations uncovered by ChAT underscore the ability of the algorithm to yield novel insight into chromatin-based regulatory mechanisms.
PMCID: PMC3505981  PMID: 22989711
22.  Inhibition of activated pericentromeric SINE/Alu repeat transcription in senescent human adult stem cells reinstates self-renewal 
Cell Cycle  2011;10(17):3016-3030.
Cellular aging is linked to deficiencies in efficient repair of DNA double strand breaks and authentic genome maintenance at the chromatin level. Aging poses a significant threat to adult stem cell function by triggering persistent DNA damage and ultimately cellular senescence. Senescence is often considered to be an irreversible process. Moreover, critical genomic regions engaged in persistent DNA damage accumulation are unknown. Here we report that 65% of naturally occurring repairable DNA damage in self-renewing adult stem cells occurs within transposable elements. Upregulation of Alu retrotransposon transcription upon ex vivo aging causes nuclear cytotoxicity associated with the formation of persistent DNA damage foci and loss of efficient DNA repair in pericentric chromatin. This occurs due to a failure to recruit condensin I and cohesin complexes. Our results demonstrate that the cytotoxicity of induced Alu repeats is functionally relevant for human adult stem cell aging. Stable suppression of Alu transcription can reverse the senescent phenotype, reinstating the cells' self-renewing properties and increasing their plasticity by altering so-called “master” pluripotency regulators.
PMCID: PMC3218602  PMID: 21862875
adult stem cells; senescence; SINE/Alu transposons; DNA damage; H2AX; ChIP-seq; cohesin; condensin; PML body; induced pluripotency
23.  Integrated Analysis of Residue Coevolution and Protein Structure in ABC Transporters 
PLoS ONE  2012;7(5):e36546.
Intraprotein side chain contacts can couple the evolutionary process of amino acid substitution at one position to that at another. This coupling, known as residue coevolution, may vary in strength. Conserved contacts thus not only define 3-dimensional protein structure, but also indicate which residue-residue interactions are crucial to a protein’s function. Therefore, prediction of strongly coevolving residue-pairs helps clarify molecular mechanisms underlying function. Previously, various coevolution detectors have been employed separately to predict these pairs purely from multiple sequence alignments, while disregarding available structural information. This study introduces an integrative framework that improves the accuracy of such predictions, relative to previous approaches, by combining multiple coevolution detectors and incorporating structural contact information. This framework is applied to the ABC-B and ABC-C transporter families, which include the drug exporter P-glycoprotein involved in multidrug resistance of cancer cells, as well as the CFTR chloride channel linked to cystic fibrosis. The predicted coevolving pairs are further analyzed based on conformational changes inferred from outward- and inward-facing transporter structures. The analysis suggests that some pairs coevolved to directly regulate conformational changes of the alternating-access transport mechanism, while others coevolved to stabilize rigid-body-like components of the protein structure. Moreover, some identified pairs correspond to residues previously implicated in cystic fibrosis.
PMCID: PMC3348156  PMID: 22590562
24.  Genome Sequences for Five Strains of the Emerging Pathogen Haemophilus haemolyticus 
Journal of Bacteriology  2011;193(20):5879-5880.
We report the first whole-genome sequences for five strains, two carried and three pathogenic, of the emerging pathogen Haemophilus haemolyticus. Preliminary analyses indicate that these genome sequences encode markers that distinguish H. haemolyticus from its closest Haemophilus relatives and provide clues to the identity of its virulence factors.
PMCID: PMC3187195  PMID: 21952546
25.  Genome Sequence of the Mycobacterium colombiense Type Strain, CECT 3035 
Journal of Bacteriology  2011;193(20):5866-5867.
We report the first whole-genome sequence of the Mycobacterium colombiense type strain, CECT 3035, which was initially isolated from Colombian HIV-positive patients and causes respiratory and disseminated infections. Preliminary comparative analyses indicate that the M. colombiense lineage has experienced a substantial genome expansion, possibly contributing to its distinct pathogenic capacity.
PMCID: PMC3187203  PMID: 21952541
