Results 1-6 (6)

1.  Genetic and epigenetic variations contributed by Alu retrotransposition 
BMC Genomics  2011;12:617.
De novo retrotransposition of Alu elements has been recognized as a major driver for insertion polymorphisms in human populations. In this study, we exploited Alu-anchored bisulfite PCR libraries to identify evolutionarily recent Alu element insertions, and to investigate their genetic and epigenetic variation.
A total of 327 putatively recent Alu insertions were identified, altogether represented by 1,762 sequence reads. Nearly all such de novo retrotransposition events (316/327) were novel. Forty-seven out of forty-nine randomly selected events, corresponding to nineteen genomic loci, were sequence-verified. Alu element insertions remained hemizygous in one or more individuals in sixteen of the nineteen genomic loci. The Alu elements were found to be enriched for young Alu families with characteristic sequence features, such as the presence of a longer poly(A) tail. In addition, we documented the occurrence of a duplication of the AT-rich target site in their immediate flanking sequences, a hallmark of retrotransposition. Furthermore, we found the sequence motif (TT/AAAA) that is recognized by the ORF2P protein encoded by LINE-1 in their 5'-flanking regions, consistent with the fact that Alu retrotransposition is facilitated by LINE-1 elements. While most of these Alu elements were heavily methylated, we identified an Alu localized 1.5 kb downstream of TOMM5 that exhibited a completely unmethylated left arm. Interestingly, we observed differential methylation of its immediate 5' and 3' flanking CpG dinucleotides, in concordance with the unmethylated and methylated statuses of its internal 5' and 3' sequences, respectively. Importantly, TOMM5's CpG island and the 3 Alu repeats and 1 MIR element localized upstream of this newly inserted Alu were also found to be unmethylated. Methylation analyses of two additional genomic loci revealed no methylation differences in CpG dinucleotides flanking the Alu insertion sites in the two homologous chromosomes, irrespective of the presence or absence of the insertion.
We anticipate that the combination of methodologies utilized in this study, which included repeat-anchored bisulfite PCR sequencing and the computational analysis pipeline herein reported, will prove invaluable for the generation of genetic and epigenetic variation maps.
PMCID: PMC3272032  PMID: 22185517
2.  Construction of a medicinal leech transcriptome database and its application to the identification of leech homologs of neural and innate immune genes 
BMC Genomics  2010;11:407.
The medicinal leech, Hirudo medicinalis, is an important model system for the study of nervous system structure, function, development, regeneration and repair. It is also a unique species in being presently approved for use in medical procedures, such as clearing of pooled blood following certain surgical procedures. It is a current, and potentially also future, source of medically useful molecular factors, such as anticoagulants and antibacterial peptides, which may have evolved as a result of its parasitizing large mammals, including humans. Despite the broad focus of research on this system, little has been done at the genomic or transcriptomic levels and there is a paucity of openly available sequence data. To begin to address this problem, we constructed whole embryo and adult central nervous system (CNS) EST libraries and created a clustered sequence database of the Hirudo transcriptome that is available to the scientific community.
A total of ~133,000 EST clones from two directionally-cloned cDNA libraries, one constructed from mRNA derived from whole embryos at several developmental stages and the other from adult CNS cords, were sequenced in one or both directions by three different groups: Genoscope (French National Sequencing Center), the University of Iowa Sequencing Facility and the DOE Joint Genome Institute. These were assembled using the phrap software package into 31,232 unique contigs and singletons, with an average length of 827 nt. The assembled transcripts were then translated in all six frames and compared to proteins in NCBI's non-redundant (NR) and to the Gene Ontology (GO) protein sequence databases, resulting in 15,565 matches to 11,236 proteins in NR and 13,935 matches to 8,073 proteins in GO. Searching the database for transcripts of genes homologous to those thought to be involved in the innate immune responses of vertebrates and other invertebrates yielded a set of nearly one hundred evolutionarily conserved sequences, representing all known pathways involved in these important functions.
The sequences obtained for Hirudo transcripts represent the first major database of genes expressed in this important model system. Comparison of translated open reading frames (ORFs) with the other openly available leech datasets, the genome and transcriptome of Helobdella robusta, shows an average identity at the amino acid level of 58% in matched sequences. Interestingly, comparison with other available Lophotrochozoans shows similarly high levels of amino acid identity where sequences match: for example, 64% with Capitella capitata (a polychaete) and 56% with Aplysia californica (a mollusk), as well as 58% with Schistosoma mansoni (a platyhelminth). Phylogenetic comparisons of putative Hirudo innate immune response genes present within the Hirudo transcriptome database described here show a strong resemblance to the corresponding mammalian genes, indicating that this important physiological response may have older origins than previously proposed.
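The six-frame translation step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline; it assumes the standard genetic code, and the function names (`reverse_complement`, `translate`, `six_frame_translations`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(seq):
    """Translate three forward and three reverse-complement frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six protein strings would then be compared against a protein database; that search step is outside the scope of this sketch.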
PMCID: PMC2996935  PMID: 20579359
3.  An insight into the sialome of Glossina morsitans morsitans 
BMC Genomics  2010;11:213.
Blood feeding evolved independently in worms, arthropods and mammals. Among the adaptations to this peculiar diet, these animals developed an armament of salivary molecules that disarm their host's anti-bleeding defenses (hemostasis), inflammatory and immune reactions. Recent sialotranscriptome analyses (from the Greek sialo = saliva) of blood feeding insects and ticks have revealed that the saliva contains hundreds of polypeptides, many unique to their genus or family. Adult tsetse flies feed exclusively on vertebrate blood and are important vectors of human and animal diseases. Thus far, only limited information exists regarding the Glossina sialome, or any other fly belonging to the Hippoboscidae.
As part of the effort to sequence the genome of Glossina morsitans morsitans, several organ specific, high quality normalized cDNA libraries have been constructed, from which over 20,000 ESTs from an adult salivary gland library were sequenced. These ESTs have been assembled using previously described ESTs from the fat body and midgut libraries of the same fly, thus totaling 62,251 ESTs, which have been assembled into 16,743 clusters (8,506 of which had one or more EST from the salivary gland library). Coding sequences were obtained for 2,509 novel proteins, 1,792 of which had at least one EST expressed in the salivary glands. Despite library normalization, 59 transcripts were overrepresented in the salivary library indicating high levels of expression. This work presents a detailed analysis of the salivary protein families identified. Protein expression was confirmed by 2D gel electrophoresis, enzymatic digestion and mass spectrometry. Concurrently, an initial attempt to determine the immunogenic properties of selected salivary proteins was undertaken.
The sialome of G. m. morsitans contains over 250 proteins that are possibly associated with blood feeding. This set includes alleles of previously described gene products, reveals new evidence that several salivary proteins are multigenic and identifies at least seven new polypeptide families unique to Glossina. Most of these proteins have no known function and thus, provide a discovery platform for the identification of novel pharmacologically active compounds, innovative vector-based vaccine targets, and immunological markers of vector exposure.
PMCID: PMC2853526  PMID: 20353571
4.  Gene discovery in an invasive tephritid model pest species, the Mediterranean fruit fly, Ceratitis capitata 
BMC Genomics  2008;9:243.
The medfly, Ceratitis capitata, is a highly invasive agricultural pest that has become a model insect for the development of biological control programs. Despite research into the behavior and classical and population genetics of this organism, the quantity of sequence data available is limited. We have utilized an expressed sequence tag (EST) approach to obtain detailed information on transcriptome signatures that relate to a variety of physiological systems in the medfly; this information emphasizes reproduction, sex determination, and chemosensory perception, since the study was based on normalized cDNA libraries from embryos and adult heads.
A total of 21,253 high-quality ESTs were obtained from the embryo and head libraries. Clustering analyses performed separately for each library resulted in 5201 embryo and 6684 head transcripts. Considering an estimated 19% overlap in the transcriptomes of the two libraries, they represent about 9614 unique transcripts involved in a wide range of biological processes and molecular functions. Of particular interest are the sequences that share homology with Drosophila genes involved in sex determination, olfaction, and reproductive behavior. The medfly transformer2 (tra2) homolog was identified among the embryonic sequences, and its genomic organization and expression were characterized.
The sequences obtained in this study represent the first major dataset of expressed genes in a tephritid species of agricultural importance. This resource provides essential information to support the investigation of numerous questions regarding the biology of the medfly and other related species and also constitutes an invaluable tool for the annotation of complete genome sequences. Our study has revealed intriguing findings regarding the transcript regulation of tra2 and other sex determination genes, as well as insights into the comparative genomics of genes implicated in chemosensory reception and reproduction.
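The transcript counts quoted in this abstract can be cross-checked with a short calculation. The interpretation below is an assumption, not the authors' stated method: it treats the overlap as the share of the combined per-library transcript count that is common to both libraries, and derives that share from the reported figures.

```python
# Figures taken from the abstract.
embryo_transcripts = 5201
head_transcripts = 6684
reported_unique = 9614

# Assumption: unique = combined - shared, so the shared count can be
# recovered from the reported unique total.
combined = embryo_transcripts + head_transcripts
implied_shared = combined - reported_unique
implied_overlap_pct = 100 * implied_shared / combined

print(f"combined transcripts: {combined}")
print(f"implied shared transcripts: {implied_shared}")
print(f"implied overlap: {implied_overlap_pct:.1f}%")
```

Under this reading, the implied overlap rounds to the 19% quoted in the abstract.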
PMCID: PMC2427042  PMID: 18500975
5.  A pig multi-tissue normalised cDNA library: large-scale sequencing, cluster analysis and 9K micro-array resource generation 
BMC Genomics  2008;9:17.
Domestic animal breeding and product quality improvement require the control of reproduction, nutrition, health and welfare in these animals. It is thus necessary to improve our knowledge of the major physiological functions and their interactions. This would be greatly enhanced by the availability of expressed gene sequences in the databases and by cDNA arrays allowing the transcriptome analysis of any function.
The objective within the French AGENAE program was to initiate high-throughput cDNA sequencing of a 38-tissue normalised library and to generate a diverse microarray for transcriptome analysis in the pig.
We constructed a multi-tissue cDNA library, which was normalised and subtracted to reduce the redundancy of the clones. Expressed Sequence Tags were produced and 24449 high-quality sequences were released in the EMBL database. The assembly of all the public ESTs (available through the SIGENAE website) resulted in 40786 contigs and 54653 singletons. At least one Agenae sequence is present in 11969 contigs (12.5%) and in 9291 of the deeper-than-one contigs (22.8%). Sequence analysis showed that both normalisation and subtraction processes were successful and that the initial tissue complexity was maintained in the final libraries. A 9K nylon cDNA microarray was produced and is available through CRB-GADIE. It will allow high-sensitivity transcriptome analyses in pigs.
In the present work, a pig multi-tissue cDNA library was constructed and a 9K cDNA microarray designed. It contributes to the Expressed Sequence Tags pig data, and offers a valuable tool for transcriptome analysis.
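The two percentages quoted in this abstract appear to use different denominators. The sketch below is an inference from the numbers, not something the abstract states: it assumes 12.5% is measured against contigs plus singletons, and 22.8% against the 40786 multi-sequence contigs only.

```python
# Figures taken from the abstract.
contigs = 40786            # assemblies containing more than one sequence
singletons = 54653         # sequences left unassembled
agenae_in_any = 11969      # assemblies/singletons with at least one Agenae sequence
agenae_in_contigs = 9291   # multi-sequence contigs with at least one Agenae sequence

# Assumed denominators (inferred, not stated in the abstract).
share_of_all = 100 * agenae_in_any / (contigs + singletons)
share_of_contigs = 100 * agenae_in_contigs / contigs

print(f"share of all assembled sequences: {share_of_all:.1f}%")
print(f"share of multi-sequence contigs: {share_of_contigs:.1f}%")
```

With these denominators both reported percentages are reproduced to one decimal place, which supports the reading but does not confirm it.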
PMCID: PMC2257943  PMID: 18194535
6.  Insights into a dinoflagellate genome through expressed sequence tag analysis 
BMC Genomics  2005;6:80.
Dinoflagellates are important marine primary producers and grazers and cause toxic "red tides". These taxa are characterized by many unique features such as immense genomes, the absence of nucleosomes, and photosynthetic organelles (plastids) that have been gained and lost multiple times. We generated EST sequences from non-normalized and normalized cDNA libraries from a culture of the toxic species Alexandrium tamarense to elucidate dinoflagellate evolution. Previous analyses of these data have clarified plastid origin and here we study the gene content, annotate the ESTs, and analyze the genes that are putatively involved in DNA packaging.
Approximately 20% of the 6,723 unique ESTs (11,171 total 3'-reads) could be annotated using BLAST searches against GenBank. Several putative dinoflagellate-specific mRNAs were identified, including one novel plastid protein. Dinoflagellate genes, similar to those of other eukaryotes, have a high GC-content that is reflected in the amino acid codon usage. Highly represented transcripts include those encoding histone-like (HLP) and luciferin-binding proteins, and several genes occur in families that encode nearly identical proteins. We also identified rare transcripts encoding a predicted protein highly similar to histone H2A.X. We speculate this histone may be retained for its role in DNA double-strand break repair.
This is the most extensive collection to date of ESTs from a toxic dinoflagellate. These data will be instrumental to future research to understand the unique and complex cell biology of these organisms and for potentially identifying the genes involved in toxin production.
PMCID: PMC1173104  PMID: 15921535
