Results 1-4 (4)

1.  Transcription factor binding sites are highly enriched within microRNA precursor sequences 
Biology Direct  2011;6:61.
Transcription factors are thought to regulate the transcription of microRNA genes in a manner similar to that of protein-coding genes; that is, by binding to conventional transcription factor binding site DNA sequences located in or near promoter regions that lie upstream of the microRNA genes. However, in the course of analyzing the genomics of human microRNA genes, we noticed that annotated transcription factor binding sites commonly lie within 70- to 110-nt long microRNA small hairpin precursor sequences.
We report that about 45% of all human small hairpin microRNA (pre-miR) sequences contain at least one predicted transcription factor binding site motif that is conserved across human, mouse and rat, and this rises to over 75% if one excludes primate-specific pre-miRs. The association is robust and has extremely strong statistical significance; it affects both intergenic and intronic pre-miRs and both isolated and clustered microRNA genes. We also confirmed and extended this finding using a separate analysis that examined all human pre-miR sequences regardless of conservation across species.
The transcription factor binding sites localized within small hairpin microRNA precursor sequences may regulate their transcription. Transcription factors may also bind directly to nascent primary microRNA gene transcripts or small hairpin microRNA precursors and regulate their processing.
This article was reviewed by Guillaume Bourque (nominated by Jerzy Jurka), Dmitri Pervouchine (nominated by Mikhail Gelfand), and Yuriy Gusev.
PMCID: PMC3240832  PMID: 22136256
Transcription factors; microRNA biogenesis; drosha
2.  Endogenous retroviruses of the chicken genome 
Biology Direct  2008;3:9.
We analyzed the chicken (Gallus gallus) genome sequence to search for previously uncharacterized endogenous retrovirus (ERV) sequences using ab initio and combined evidence approaches. We discovered 11 novel families of ERVs that occupy more than 21 million base pairs, approximately 2%, of the chicken genome. These novel families include a number of recently active full-length elements possessing identical long terminal repeats (LTRs) as well as intact gag and pol open reading frames. The abundance and diversity of chicken ERVs we discovered underscore the utility of an approach that combines multiple methods for the identification of interspersed repeats in vertebrate genomes.
This article was reviewed by Igor Zhulin and Itai Yanai.
PMCID: PMC2329609  PMID: 18361801
3.  Evaluating the protein coding potential of exonized transposable element sequences 
Biology Direct  2007;2:31.
Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analyses of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency-based analysis of the protein coding potential of TE-derived exons.
We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences.
The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression rather than as coding sequences, and may act through a variety of double-stranded RNA-related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence.
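The abstract above mentions a probabilistic, codon-frequency-based analysis of coding potential but gives no details of it. As a rough illustration only, and not the authors' procedure, one generic way to score coding potential is a per-codon log-odds ratio between a codon-usage table from known coding sequences and a uniform background. The function name `coding_log_odds`, the `codon_freqs` table and the pseudo-frequency value are all hypothetical choices made for this sketch.

```python
from math import log

def coding_log_odds(seq, codon_freqs, pseudo=1e-6):
    """Average log-odds per codon of `seq` (read in frame 0).

    codon_freqs: hypothetical dict mapping codon -> frequency in known
    coding sequences. Codons missing from the table get `pseudo`.
    The background model assumes all 64 codons are equally likely.
    """
    seq = seq.upper()
    # Split into complete codons; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")
    background = 1.0 / 64.0
    total = sum(log(codon_freqs.get(c, pseudo) / background) for c in codons)
    return total / len(codons)

# Toy usage with a made-up two-codon table (not real codon-usage data).
toy_freqs = {"ATG": 0.5, "GCC": 0.5}
print(coding_log_odds("ATGGCC", toy_freqs))  # positive: resembles the table
print(coding_log_odds("TTTTTT", toy_freqs))  # negative: unlike the table
```

Under these assumptions a positive score means the sequence looks more like the coding table than the uniform background, and a negative score means the opposite. A realistic analysis would use an empirical codon-usage table and a more careful background model.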
This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.).
PMCID: PMC2203978  PMID: 18036258
4.  Transposable element derived DNaseI-hypersensitive sites in the human genome 
Biology Direct  2006;1:20.
Transposable elements (TEs) are abundant genomic sequences that have been found to contribute to genome evolution in unexpected ways. Here, we examine the evolutionary and functional characteristics of TE-derived human genome regulatory sequences uncovered by the high-throughput mapping of DNaseI-hypersensitive (HS) sites.
Human genome TEs were found to contribute substantially to HS regulatory sequences characterized in CD4+ T cells: 23% of HS sites contain TE-derived sequences. While HS sites are far more evolutionarily conserved than non-HS sites in the human genome, consistent with their functional importance, TE-derived HS sites are highly divergent. Nevertheless, TE-derived HS sites were shown to be functionally relevant in terms of driving gene expression in CD4+ T cells. Genes involved in immune response are statistically over-represented among genes with TE-derived HS sites. A number of genes with both TE-derived HS sites and immune tissue-related expression patterns were found to encode proteins involved in immune response, such as T cell-specific receptor antigens and secreted cytokines, as well as proteins with clinical relevance to HIV and cancer. Genes with TE-derived HS sites have higher average levels of sequence and expression divergence between human and mouse orthologs compared to genes with non-TE-derived HS sites.
The results reported here support the notion that TEs provide a specific genome-wide mechanism for generating functionally relevant gene regulatory divergence between evolutionary lineages.
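The abstract above reports that immune response genes are statistically over-represented among genes with TE-derived HS sites, without naming the test used. One common way to assess over-representation of this kind is a one-sided hypergeometric test; the sketch below assumes that choice for illustration and is not taken from the article. The function name and all counts in the usage lines are made up.

```python
from math import comb

def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes considered
    successes:  genes in the category of interest (e.g. immune response)
    draws:      genes in the selected set (e.g. with TE-derived HS sites)
    observed:   selected genes that fall in the category
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Toy usage with invented counts (not data from the article).
p_value = hypergeom_upper_tail(population=20, successes=5, draws=6, observed=3)
print(p_value)
```

A small tail probability would indicate that the overlap is larger than expected if genes were drawn at random. When many categories are tested, a multiple-testing correction would also be needed.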
This article was reviewed by Wolfgang J. Miller (nominated by Jerzy Jurka), Itai Yanai and Mikhail S. Gelfand.
PMCID: PMC1538576  PMID: 16857058
