
Results 1-9 (9)

1.  Updated genome assembly and annotation of Paenibacillus larvae, the agent of American foulbrood disease of honey bees 
BMC Genomics  2011;12:450.
As scientists continue to pursue various 'omics-based research, there is a need for high quality data for the most fundamental 'omics of all: genomics. The bacterium Paenibacillus larvae is the causative agent of the honey bee disease American foulbrood. If untreated, it can lead to the demise of an entire hive; the highly social nature of bees also leads to easy disease spread, between both individuals and colonies. Biologists have studied this organism since the early 1900s, and a century later, the molecular mechanism of infection remains elusive. Transcriptomics and proteomics, because of their ability to analyze multiple genes and proteins in a high-throughput manner, may be very helpful to its study. However, the power of these methodologies is severely limited without a complete genome; we undertake to address that deficiency here.
We used the Illumina GAIIx platform and conventional Sanger sequencing to generate a 182-fold sequence coverage of the P. larvae genome, and assembled the data using ABySS into a total of 388 contigs spanning 4.5 Mbp. Comparative genomics analysis against fully-sequenced soil bacteria P. JDR2 and P. vortex showed that regions of poor conservation may contain putative virulence factors. We used GLIMMER to predict 3568 gene models, and named them based on homology revealed by BLAST searches; proteases, hemolytic factors, toxins, and antibiotic resistance enzymes were identified in this way. Finally, mass spectrometry was used to provide experimental evidence that at least 35% of the genes are expressed at the protein level.
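As a rough sanity check on the figures above, fold coverage is conventionally the total number of sequenced bases divided by the genome length. The sketch below is illustrative only: the function names are hypothetical, it treats the 4.5 Mbp assembly span as the genome length (an assumption), and it reads "at least 35%" as an exact lower bound rather than a rounded value.

```python
def implied_sequenced_bases(fold_coverage: float, genome_length_bp: float) -> float:
    """Total sequenced bases implied by a fold coverage (coverage = bases / length)."""
    return fold_coverage * genome_length_bp


def min_count_for_percent(total: int, percent: int) -> int:
    """Smallest whole count that reaches at least `percent` percent of `total`."""
    return (total * percent + 99) // 100  # integer ceiling division


# Figures quoted in the abstract. The 4.5 Mbp assembly span stands in for the
# true genome length, which is an assumption of this sketch.
bases = implied_sequenced_bases(182, 4.5e6)
genes_with_protein_evidence = min_count_for_percent(3568, 35)

print(f"about {bases / 1e6:.0f} Mbp of raw sequence")
print(f"at least {genes_with_protein_evidence} gene models with protein-level evidence")
```

Integer arithmetic is used for the percentage so that the lower bound does not depend on floating-point rounding.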
This update on the genome of P. larvae and annotation represents an immense advancement from what we had previously known about this species. We provide here a reliable resource that can be used to elucidate the mechanism of infection, and by extension, more effective methods to control and cure this widespread honey bee disease.
PMCID: PMC3188533  PMID: 21923906
2.  Gene discovery for the bark beetle-vectored fungal tree pathogen Grosmannia clavigera 
BMC Genomics  2010;11:536.
Grosmannia clavigera is a bark beetle-vectored fungal pathogen of pines that causes wood discoloration and may kill trees by disrupting nutrient and water transport. Trees respond to attacks from beetles and associated fungi by releasing terpenoid and phenolic defense compounds. It is unclear which genes are important for G. clavigera's ability to overcome antifungal pine terpenoids and phenolics.
We constructed seven cDNA libraries from eight G. clavigera isolates grown under various culture conditions, and Sanger sequenced the 5' and 3' ends of 25,000 cDNA clones, resulting in 44,288 high quality ESTs. The assembled dataset of unique transcripts (unigenes) consists of 6,265 contigs and 2,459 singletons that mapped to 6,467 locations on the G. clavigera reference genome, representing ~70% of the predicted G. clavigera genes. Although only 54% of the unigenes matched characterized proteins at the NCBI database, this dataset extensively covers major metabolic pathways, cellular processes, and genes necessary for response to environmental stimuli and genetic information processing. Furthermore, we identified genes expressed in spores prior to germination, and genes involved in response to treatment with lodgepole pine phloem extract (LPPE).
We provide a comprehensively annotated EST dataset for G. clavigera that represents a rich resource for gene characterization in this and other ophiostomatoid fungi. Genes expressed in response to LPPE treatment are indicative of fungal oxidative stress response. We identified two clusters of potentially functionally related genes responsive to LPPE treatment. Furthermore, we report a simple method for identifying contig misassemblies in de novo assembled EST collections caused by gene overlap on the genome.
PMCID: PMC3091685  PMID: 20920358
3.  Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome 
BMC Genomics  2010;11:279.
Salmonids are among the most intensely studied fishes, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution.
From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates.
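The idea of asymmetric relaxation can be illustrated with a minimal sketch: for each duplicated pair, compute dN/dS for each salmon copy against the pike out-group and ask whether one copy is evolving markedly faster. This is not the authors' pipeline. The class, the function names, the `min_ratio` threshold, and the input values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DuplicatePair:
    """dN and dS for each salmon duplicate, measured against the pike ortholog."""
    dn_a: float
    ds_a: float
    dn_b: float
    ds_b: float


def omega(dn: float, ds: float) -> float:
    """dN/dS ratio; values well below 1 are usually read as purifying selection."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def faster_copy(pair: DuplicatePair, min_ratio: float = 2.0) -> str:
    """Return 'a', 'b', or 'neither' according to which copy has the higher omega.

    `min_ratio` is an arbitrary illustrative threshold, not a value from the paper.
    """
    w_a = omega(pair.dn_a, pair.ds_a)
    w_b = omega(pair.dn_b, pair.ds_b)
    if w_a > min_ratio * w_b:
        return "a"
    if w_b > min_ratio * w_a:
        return "b"
    return "neither"


# Hypothetical values for illustration only; they are not data from the study.
example = DuplicatePair(dn_a=0.02, ds_a=0.40, dn_b=0.08, ds_b=0.40)
print(faster_copy(example))
```

A real analysis would estimate dN and dS from codon alignments and apply a statistical test rather than a fixed ratio threshold.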
A total of 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.
PMCID: PMC2886063  PMID: 20433749
4.  Genomic sequence of a mutant strain of Caenorhabditis elegans with an altered recombination pattern 
BMC Genomics  2010;11:131.
The original sequencing and annotation of the Caenorhabditis elegans genome along with recent advances in sequencing technology provide an exceptional opportunity for the genomic analysis of wild-type and mutant strains. Using the Illumina Genome Analyzer, we sequenced the entire genome of Rec-1, a strain that alters the distribution of meiotic crossovers without changing the overall frequency. Rec-1 was derived from ethylmethane sulfonate (EMS)-treated strains, one of which had a high level of transposable element mobility. Sequencing of this strain provides an opportunity to examine the consequences on the genome of altering the distribution of meiotic recombination events.
Using Illumina sequencing and MAQ software, 83% of the sequence reads were aligned to the reference genome available at Wormbase, providing 21-fold coverage of the genome. Using the software programs MAQ and Slider, we observed 1124 base pair differences between Rec-1 and the reference genome in Wormbase (WS190), and 441 between the mutagenized Rec-1 (BC313) and the wild-type N2 strain (VC2010). The most frequent base substitution was G:C to A:T, with 141 occurrences across the entire genome, most of which were on chromosomes I (55) or X (31). With these data removed, no obvious pattern in the distribution of the base differences along the chromosomes was apparent. No major chromosomal rearrangements were observed, but additional insertions of transposable elements were detected. There are 11 extra copies of Tc1 and 8 of Tc2 in the Rec-1 genome, most likely the remains of past high-hopper activity in a progenitor strain.
Our analysis of high-throughput sequencing was able to detect regions of direct repeat sequences, deletions, insertions of transposable elements, and base pair differences. A subset of sequence alterations affecting coding regions were confirmed by an independent approach using oligo array comparative genome hybridization. The major phenotype of the Rec-1 strain is an alteration in the preferred position of the meiotic recombination event with no other significant phenotypic consequences. In this study, we observed no evidence of a mutator effect at the nucleotide level attributable to the Rec-1 mutation.
PMCID: PMC2837035  PMID: 20178641
5.  Identification of novel androgen-responsive genes by sequencing of LongSAGE libraries 
BMC Genomics  2009;10:476.
The development and maintenance of the prostate is dependent on androgens and the androgen receptor. The androgen pathway continues to be important in prostate cancer. Here, we evaluated the transcriptome of prostate cancer cells in response to androgen using long serial analysis of gene expression (LongSAGE) libraries.
There were 131 tags (87 genes) that displayed statistically significant (p ≤ 0.001) differences in expression in response to androgen. Many of the genes identified by LongSAGE (35/87) have not been previously reported to change expression in the direction or sense observed. The promoter and/or enhancer regions of some of these genes contain confirmed or potential androgen response elements (AREs). The expression trends of 24 novel genes were validated using quantitative real time-polymerase chain reaction (qRT-PCR). These genes were: ARL6IP5, BLVRB, C19orf48, C1orf122, C6orf66, CAMK2N1, CCNI, DERA, ERRFI1, GLUL, GOLPH3, HM13, HSP90B1, MANEA, NANS, NIPSNAP3A, SLC41A1, SOD1, SVIP, TAOK3, TCP1, TMEM66, USP33, and VTA1. The physiological relevance of these expression trends was evaluated in vivo using the LNCaP Hollow Fibre model. Novel androgen-responsive genes identified here participate in protein synthesis and trafficking, response to oxidative stress, transcription, proliferation, apoptosis, and differentiation.
These processes may represent the molecular mechanisms of androgen-dependency of the prostate. Genes that participate in these pathways may be targets for therapies or biomarkers of prostate cancer.
PMCID: PMC2766392  PMID: 19832994
6.  A conifer genomics resource of 200,000 spruce (Picea spp.) ESTs and 6,464 high-quality, sequence-finished full-length cDNAs for Sitka spruce (Picea sitchensis) 
BMC Genomics  2008;9:484.
Members of the pine family (Pinaceae), especially species of spruce (Picea spp.) and pine (Pinus spp.), dominate many of the world's temperate and boreal forests. These conifer forests are of critical importance for global ecosystem stability and biodiversity. They also provide the majority of the world's wood and fiber supply and serve as a renewable resource for other industrial biomaterials. In contrast to angiosperms, functional and comparative genomics research on conifers, or other gymnosperms, is limited by the lack of a relevant reference genome sequence. Sequence-finished full-length (FL)cDNAs and large collections of expressed sequence tags (ESTs) are essential for gene discovery, functional genomics, and for future efforts of conifer genome annotation.
As part of a conifer genomics program to characterize defense against insects and adaptation to local environments, and to discover genes for the production of biomaterials, we developed 20 standard, normalized or full-length enriched cDNA libraries from Sitka spruce (P. sitchensis), white spruce (P. glauca), and interior spruce (P. glauca-engelmannii complex). We sequenced and analyzed 206,875 3'- or 5'-end ESTs from these libraries, and developed a resource of 6,464 high-quality sequence-finished FLcDNAs from Sitka spruce. Clustering and assembly of 147,146 3'-end ESTs resulted in 19,941 contigs and 26,804 singletons, representing 46,745 putative unique transcripts (PUTs). The 6,464 FLcDNAs were all obtained from a single Sitka spruce genotype and represent 5,718 PUTs.
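The clustering figures above can be cross-checked with simple arithmetic: contigs plus singletons should equal the reported number of putative unique transcripts. The sketch below uses hypothetical function names. Its second calculation assumes that every one of the 147,146 3'-end ESTs ended up either in a contig or as a singleton, which the abstract does not state.

```python
def putative_unique_transcripts(contigs: int, singletons: int) -> int:
    """PUT count is the number of contigs plus the number of singletons."""
    return contigs + singletons


def mean_ests_per_contig(total_ests: int, contigs: int, singletons: int) -> float:
    """Average contig depth, assuming no ESTs were discarded during assembly."""
    ests_in_contigs = total_ests - singletons
    return ests_in_contigs / contigs


# Figures quoted in the abstract.
puts = putative_unique_transcripts(contigs=19_941, singletons=26_804)
depth = mean_ests_per_contig(total_ests=147_146, contigs=19_941, singletons=26_804)

print(f"{puts} PUTs; about {depth:.1f} ESTs per contig under the stated assumption")
```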
This paper provides detailed annotation and quality assessment of a large EST and FLcDNA resource for spruce. The 6,464 Sitka spruce FLcDNAs represent the third largest sequence-verified FLcDNA resource for any plant species, behind only rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), and the only substantial FLcDNA resource for a gymnosperm. Our emphasis on capturing FLcDNAs and ESTs from cDNA libraries representing herbivore-, wound- or elicitor-treated induced spruce tissues, along with incorporating normalization to capture rare transcripts, resulted in a rich resource for functional genomics and proteomics studies. Sequence comparisons against five plant genomes and the non-redundant GenBank protein database revealed that a substantial number of spruce transcripts have no obvious similarity to known angiosperm gene sequences. Opportunities for future applications of the sequence and clone resources for comparative and functional genomics are discussed.
PMCID: PMC2579922  PMID: 18854048
7.  Analysis of 4,664 high-quality sequence-finished poplar full-length cDNA clones and their utility for the discovery of genes responding to insect feeding 
BMC Genomics  2008;9:57.
The genus Populus includes poplars, aspens and cottonwoods, which will be collectively referred to as poplars hereafter unless otherwise specified. Poplars are the dominant tree species in many forest ecosystems in the Northern Hemisphere and are of substantial economic value in plantation forestry. Poplar has been established as a model system for genomics studies of growth, development, and adaptation of woody perennial plants including secondary xylem formation, dormancy, adaptation to local environments, and biotic interactions.
As part of the poplar genome sequencing project and the development of genomic resources for poplar, we have generated a full-length (FL)-cDNA collection using the biotinylated CAP trapper method. We constructed four FLcDNA libraries using RNA from xylem, phloem and cambium, and green shoot tips and leaves from the P. trichocarpa Nisqually-1 genotype, as well as insect-attacked leaves of the P. trichocarpa × P. deltoides hybrid. Following careful selection of candidate cDNA clones, we used a combined strategy of paired end reads and primer walking to generate a set of 4,664 high-accuracy, sequence-verified FLcDNAs, which clustered into 3,990 putative unique genes. Mapping FLcDNAs to the poplar genome sequence combined with BLAST comparisons to previously predicted protein coding sequences in the poplar genome identified 39 FLcDNAs that likely localize to gaps in the current genome sequence assembly. Another 173 FLcDNAs mapped to the genome sequence but were not included among the previously predicted genes in the poplar genome. Comparative sequence analysis against Arabidopsis thaliana and other species in the non-redundant database of GenBank revealed that 11.5% of the poplar FLcDNAs display no significant sequence similarity to other plant proteins. By mapping the poplar FLcDNAs against transcriptome data previously obtained with a 15.5 K cDNA microarray, we identified 153 FLcDNA clones for genes that were differentially expressed in poplar leaves attacked by forest tent caterpillars.
This study has generated a high-quality FLcDNA resource for poplar and the third largest FLcDNA collection published to date for any plant species. We successfully used the FLcDNA sequences to reassess gene prediction in the poplar genome sequence, perform comparative sequence annotation, and identify differentially expressed transcripts associated with defense against insects. The FLcDNA sequences will be essential to the ongoing curation and annotation of the poplar genome, in particular for targeting gaps in the current genome assembly and further improvement of gene predictions. The physical FLcDNA clones will serve as useful reagents for functional genomics research in areas such as analysis of gene functions in defense against insects and perennial growth. Sequences from this study have been deposited in NCBI GenBank under the accession numbers EF144175 to EF148838.
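The GenBank accession range quoted above can be checked against the reported number of clones. The following sketch is a hypothetical helper, not an NCBI tool: it counts the accessions in an inclusive range and assumes the range is contiguous with a shared alphabetic prefix.

```python
import re

# Simple accession pattern: an uppercase letter prefix followed by digits.
_ACCESSION = re.compile(r"^([A-Z]+)(\d+)$")


def accession_range_size(first: str, last: str) -> int:
    """Count accessions in an inclusive, contiguous range such as EF144175..EF148838."""
    m_first = _ACCESSION.match(first)
    m_last = _ACCESSION.match(last)
    if m_first is None or m_last is None:
        raise ValueError("accessions must look like letters followed by digits")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same prefix")
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return end - start + 1


n_records = accession_range_size("EF144175", "EF148838")
print(n_records)
```

If the range is contiguous, the count agrees with the 4,664 sequence-verified FLcDNAs described in the abstract.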
PMCID: PMC2270264  PMID: 18230180
8.  Novel expressed sequences identified in a model of androgen independent prostate cancer 
BMC Genomics  2007;8:32.
Prostate cancer is the most frequently diagnosed cancer in American men, and few effective treatment options are available to patients who develop hormone-refractory prostate cancer. The molecular changes that occur to allow prostate cells to proliferate in the absence of androgens are not fully understood.
Subtractive hybridization experiments performed with samples from an in vivo model of hormonal progression identified 25 expressed sequences representing novel human transcripts. Intriguingly, these 25 sequences have small open-reading frames and are not highly conserved through evolution, suggesting many of these novel expressed sequences may be derived from untranslated regions of novel transcripts or from non-coding transcripts. Examination of a large metalibrary of human Serial Analysis of Gene Expression (SAGE) tags demonstrated that only three of these novel sequences had been previously detected. RT-PCR experiments confirmed that the 6 sequences tested were expressed in specific human tissues, as well as in clinical samples of prostate cancer. Further RT-PCR experiments for five of these fragments indicated they originated from large untranslated regions of unannotated transcripts.
This study underlines the value of using complementary techniques in the annotation of the human genome. The tissue-specific expression of 4 of the 6 clones tested indicates the expression of these novel transcripts is tightly regulated, and future work will determine the possible role(s) these novel transcripts may play in the progression of prostate cancer.
PMCID: PMC1790899  PMID: 17257419
9.  Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach 
BMC Genomics  2006;7:246.
High throughput sequencing-by-synthesis is an emerging technology that allows the rapid production of millions of bases of data. Although the sequence reads are short, they can readily be used for re-sequencing. By re-sequencing the mRNA products of a cell, one may rapidly discover polymorphisms and splice variants particular to that cell.
We present the utility of massively parallel sequencing by synthesis for profiling the transcriptome of a human prostate cancer cell line, LNCaP, that has been treated with the synthetic androgen R1881. Through the generation of approximately 20 megabases (Mb) of EST data, we detect transcription from over 10,000 gene loci, 25 previously undescribed alternative splicing events involving known exons, and over 1,500 high quality single nucleotide discrepancies with the reference human sequence. Further, we map nearly 10,000 ESTs to positions on the genome where no transcription is currently predicted to occur. We also characterize various obstacles to using sequencing by synthesis for transcriptome analysis and propose solutions to these problems.
The use of high-throughput sequencing-by-synthesis methods for transcript profiling allows the specific and sensitive detection of many of a cell's transcripts, and also allows the discovery of high quality base discrepancies and alternative splice variants. Thus, this technology may provide an effective means of understanding various disease states, discovering novel targets for disease treatment, and discovering novel transcripts.
PMCID: PMC1592491  PMID: 17010196
