
Results 1-8 (8)

1.  Single nucleotide polymorphism discovery in elite North American potato germplasm 
BMC Genomics  2011;12:302.
Current breeding approaches in potato rely almost entirely on phenotypic evaluations; molecular markers, with the exception of a few linked to disease resistance traits, are not widely used. Large-scale sequence datasets generated primarily through Sanger Expressed Sequence Tag projects are available from a limited number of potato cultivars and access to next generation sequencing technologies permits rapid generation of sequence data for additional cultivars. When coupled with the advent of high throughput genotyping methods, an opportunity now exists for potato breeders to incorporate considerably more genotypic data into their decision-making.
To identify a large number of Single Nucleotide Polymorphisms (SNPs) in elite potato germplasm, we sequenced normalized cDNA prepared from three commercial potato cultivars: 'Atlantic', 'Premier Russet' and 'Snowden'. For each cultivar, we generated 2 Gb of sequence, which was assembled into a representative transcriptome of ~28-29 Mb. Using the Maq SNP filter, which filters on read depth, density, and quality, 575,340 SNPs were identified within these three cultivars. In parallel, 2,358 SNPs were identified within existing Sanger sequences for three additional cultivars, 'Bintje', 'Kennebec', and 'Shepody'. Using a stringent set of filters in conjunction with the potato reference genome, we identified 69,011 high confidence SNPs from these six cultivars for use in genotyping with the Infinium platform. Ninety-six of these SNPs were used with a BeadXpress assay to assess allelic diversity in a germplasm panel of 248 lines; 82 of the SNPs proved sufficiently informative for subsequent analyses. Within diverse North American germplasm, the chip processing market class was most distinct, clearly separated from all other market classes. The round white and russet market classes both include fresh market and processing cultivars. Nevertheless, the russet and round white market classes are more distant from each other than processing types are from fresh market types within these two groups.
The genotype data generated in this study, albeit limited in number, has revealed distinct relationships among the market classes of potato. The SNPs identified in this study will enable high-throughput genotyping of germplasm and populations, which in turn will enable more efficient marker-assisted breeding efforts in potato.
PMCID: PMC3128068  PMID: 21658273
2.  Identification and characterization of pseudogenes in the rice gene complement 
BMC Genomics  2009;10:317.
The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As such, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models within the Osa1 Release 5 were investigated as potential pseudogenes as these genes exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, short coding region, long untranslated region, or, for genes residing within a segmentally duplicated region, lack of a paralog or significantly shorter corresponding paralog.
A total of 1,439 pseudogenes, identified among genes with pseudogene features, were characterized by similarity to fully-supported gene models and the presence of frameshifts or premature translational stop codons. Significant difference in the length of duplicated genes within segmentally-duplicated regions was the optimal indicator of pseudogenization. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events while 25% were the result of retrotransposition events. A total of 12% of the pseudogenes were expressed. Finally, F-box proteins, BTB/POZ proteins, terpene synthases, chalcone synthases and cytochrome P450 protein families were found to harbor large numbers of pseudogenes.
These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions which typically lack definable open reading frames. Families containing the highest number of pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.
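One of the criteria named in the abstract above, a premature translational stop codon, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `first_premature_stop`, the use of the standard stop codons, and the assumption that the input is a coding sequence already in frame are all choices made here for illustration only.

```python
# Illustrative sketch only; not the method used in the study.
# Assumes the input is a DNA coding sequence that starts in frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear stop codons


def first_premature_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final codon, or None if there is none."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    # Skip the last codon: a stop there is the normal terminator.
    for i in range(n_codons - 1):
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    print(first_premature_stop("ATGAAATAA"))     # intact toy ORF
    print(first_premature_stop("ATGTAAAAATAA"))  # stop at second codon
```

Detecting frameshifts, the other criterion mentioned, generally requires aligning the candidate to a supported gene model and is not covered by this sketch.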
PMCID: PMC2724416  PMID: 19607679
3.  Analysis of the Pythium ultimum transcriptome using Sanger and Pyrosequencing approaches 
BMC Genomics  2008;9:542.
Pythium is an agriculturally important genus of plant pathogens, yet it is not well understood at the molecular, genetic, or genomic level. Pythium species are closely related to other oomycete plant pathogens such as Phytophthora species and are ubiquitous in their geographic distribution and host range. To gain a better understanding of the gene complement of Pythium ultimum, we generated Expressed Sequence Tags (ESTs) from the transcriptome of Pythium ultimum DAOM BR144 (= ATCC 200006 = CBS 805.95) using two high throughput sequencing methods, Sanger-based chain termination sequencing and pyrosequencing-based sequencing-by-synthesis.
A single half-plate pyrosequencing (454 FLX) run on adapter-ligated cDNA from a normalized cDNA population generated 90,664 reads with an average read length of 190 nucleotides following cleaning and removal of sequences shorter than 100 base pairs. After clustering and assembly, a total of 35,507 unique sequences were generated. In parallel, 9,578 reads were generated from a library constructed from the same normalized cDNA population using dideoxy chain termination Sanger sequencing, which upon clustering and assembly generated 4,689 unique sequences. A hybrid assembly of both Sanger- and pyrosequencing-derived ESTs resulted in 34,495 unique sequences, of which 1,110 (3.2%) were derived solely from Sanger sequencing. A high degree of similarity was seen between P. ultimum sequences and other sequenced plant pathogenic oomycetes, with 91% of the hybrid assembly derived sequences > 500 bp having similarity to sequences from plant pathogenic Phytophthora species. An analysis of Gene Ontology assignments revealed a similar representation of molecular function ontologies in the hybrid assembly in comparison to the predicted proteomes of three Phytophthora species, suggesting a broad representation of the P. ultimum transcriptome was present in the normalized cDNA population. P. ultimum sequences with similarity to oomycete RXLR and Crinkler effectors, Kazal-like and cystatin-like protease inhibitors, and elicitins were identified. Sequences with similarity to thiamine biosynthesis enzymes that are lacking in the genome sequences of three Phytophthora species and one downy mildew were identified and could serve as useful phylogenetic markers. Furthermore, we identified 179 candidate simple sequence repeats that can be used for genotyping strains of P. ultimum.
Through these two technologies, we were able to generate a robust set (~10 Mb) of transcribed sequences for P. ultimum. We were able to identify known sequences present in oomycetes as well as identify novel sequences. An ample number of candidate polymorphic markers were identified in the dataset providing resources for phylogenetic and diagnostic marker development for this species. On a technical level, in spite of the depth possible with 454 FLX platform, the Sanger and pyro-based sequencing methodologies were complementary as each method generated sequences unique to each platform.
PMCID: PMC2612028  PMID: 19014603
4.  Analysis of 90 Mb of the potato genome reveals conservation of gene structures and order with tomato but divergence in repetitive sequence composition 
BMC Genomics  2008;9:286.
The Solanaceae family contains a number of important crop species including potato (Solanum tuberosum), which is grown for its underground storage organ known as a tuber. Although potato is the 4th most important food crop in the world, limited genomic sequence information is currently available for it beyond a collection of ~220,000 Expressed Sequence Tags, and advances in potato yield and nutrition content would be greatly assisted through access to a complete genome sequence. While morphologically diverse, Solanaceae species such as potato, tomato, pepper, and eggplant share not only genes but also gene order, thereby permitting highly informative comparative genomic analyses.
In this study, we report on the analysis of 89.9 Mb of potato genomic sequence, representing 10.2% of the genome, generated through end sequencing of a potato bacterial artificial chromosome (BAC) clone library (87 Mb) and sequencing of 22 potato BAC clones (2.9 Mb). The GC content of potato is very similar to Solanum lycopersicon (tomato) and other dicotyledonous species yet distinct from the monocotyledonous grass species, Oryza sativa. Parallel analyses of repetitive sequences in potato and tomato revealed substantial differences in their abundance, 34.2% in potato versus 46.3% in tomato, which is consistent with the increased genome size per haploid genome of these two Solanum species. Specific classes and types of repetitive sequences were also differentially represented between these two species including a telomeric-related repetitive sequence, ribosomal DNA, and a number of unclassified repetitive sequences. Comparative analyses between tomato and potato at the gene level revealed a high level of conservation of gene content, genic features, and gene order, although discordances in synteny were observed.
Genomic level analyses of potato and tomato confirm that gene sequence and gene order are conserved between these solanaceous species and that this conservation can be leveraged in genomic applications including cross-species annotation and genome sequencing initiatives. While tomato and potato share genic features, they differ in their repetitive sequence content and composition suggesting that repetitive sequences may have a more significant role in shaping speciation than previously reported.
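The sequence totals quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated above (87 Mb of BAC end sequence, 2.9 Mb of BAC clone sequence, 10.2% of the genome); the genome size it derives is merely what those rounded figures imply, not a value reported in the abstract, and the variable names are illustrative.

```python
# Back-of-the-envelope check using only numbers stated in the abstract.
bac_end_mb = 87.0       # BAC end sequence, in Mb
bac_clone_mb = 2.9      # sequence from 22 BAC clones, in Mb
fraction_of_genome = 0.102  # 10.2% of the genome, as stated

total_mb = bac_end_mb + bac_clone_mb
# Genome size implied by the rounded figures (an inference, not a reported value).
implied_genome_mb = total_mb / fraction_of_genome

print(f"total sequence analysed: {total_mb:.1f} Mb")
print(f"implied genome size: {implied_genome_mb:.0f} Mb")
```

Because the inputs are rounded, the implied genome size should be read as approximate.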
PMCID: PMC2442093  PMID: 18554403
5.  Diversity in conserved genes in tomato 
BMC Genomics  2007;8:465.
Tomato has excellent genetic and genomic resources including a broad set of Expressed Sequence Tag (EST) data and high-density genetic maps. In addition, emerging physical maps and bacterial artificial chromosome (BAC) clone sequence data serve as templates to investigate genetic variation within the cultivated germplasm pool with the goal of manipulating agriculturally important traits. Unfortunately, the nearly exclusive focus of resource development on interspecific populations for genetic analyses and diversity studies has left a void in our understanding of genotypic variation within tomato breeding programs that focus on intra-specific populations. We describe the results of a study to identify nucleotide variation within tomato breeding germplasm and mapping parents for a set of conserved single-copy ESTs that are orthologous between tomato and Arabidopsis.
Using a pooled sequencing strategy, 967 tomato transcripts were screened for polymorphism in 12 tomato lines. Although intron position was conserved, intron lengths were 2-fold larger in tomato than in Arabidopsis. A total of 1,487 single nucleotide polymorphisms and 282 insertion/deletions were identified, of which 579 and 206 were polymorphic in breeding germplasm, respectively. Fresh market and processing germplasm were clearly divergent, as were Solanum lycopersicum var. cerasiformae and Solanum pimpinellifolium, tomato's closest relatives. The polymorphisms identified serve as marker resources for tomato. The conserved ortholog set (COS) is also applicable to other Solanaceae crops.
The results from this research enabled significant progress towards bridging the gap between genetic and genomic resources developed for populations derived from wide crosses and those applicable to intra-specific crosses for breeding in tomato.
PMCID: PMC2249608  PMID: 18088428
6.  EuCAP, a Eukaryotic Community Annotation Package, and its application to the rice genome 
BMC Genomics  2007;8:388.
Despite the improvements of tools for automated annotation of genome sequences, manual curation at the structural and functional level can provide an increased level of refinement to genome annotation. The Institute for Genomic Research Rice Genome Annotation (hereafter named the Osa1 Genome Annotation) is the product of an automated pipeline and, for this reason, will benefit from the input of biologists with expertise in rice and/or particular gene families. Leveraging knowledge from a dispersed community of scientists is a demonstrated way of improving a genome annotation. This requires tools that facilitate 1) the submission of gene annotation to an annotation project, 2) the review of the submitted models by project annotators, and 3) the incorporation of the submitted models in the ongoing annotation effort.
We have developed the Eukaryotic Community Annotation Package (EuCAP), an annotation tool, and have applied it to the rice genome. The primary level of curation by community annotators (CA) has been the annotation of gene families. Annotation can be submitted by email or through the EuCAP Web Tool. The CA models are aligned to the rice pseudomolecules and the coordinates of these alignments, along with functional annotation, are stored in the MySQL EuCAP Gene Model database. Web pages displaying the alignments of the CA models to the Osa1 Genome models are automatically generated from the EuCAP Gene Model database. The alignments are reviewed by the project annotators (PAs) in the context of experimental evidence. Upon approval by the PAs, the CA models, along with the corresponding functional annotations, are integrated into the Osa1 Genome Annotation. The CA annotations, grouped by family, are displayed on the Community Annotation pages of the project website, as well as in the Community Annotation track of the Genome Browser.
We have applied EuCAP to rice. As of July 2007, the structural and/or functional annotation of 1,094 genes representing 57 families has been deposited and integrated into the current gene set. All of the EuCAP components are open-source, thereby allowing the implementation of EuCAP for the annotation of other genomes. EuCAP is available at .
PMCID: PMC2151081  PMID: 17961238
7.  Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis 
BMC Genomics  2006;7:327.
Recently, genomic sequencing efforts were finished for Oryza sativa (cultivated rice) and Arabidopsis thaliana (Arabidopsis). Additionally, these two plant species have extensive cDNA and expressed sequence tag (EST) libraries. We employed the Program to Assemble Spliced Alignments (PASA) to identify and analyze alternatively spliced isoforms in both species.
A comprehensive analysis of alternative splicing was performed in rice that started with >1.1 million publicly available spliced ESTs and over 30,000 full length cDNAs in conjunction with the newly enhanced PASA software. A parallel analysis was performed with Arabidopsis to compare and ascertain potential differences between monocots and dicots. Alternative splicing is a widespread phenomenon (observed in greater than 30% of the loci with transcript support) and we have described nine alternative splicing variations. While alternative splicing has the potential to create many RNA isoforms from a single locus, the majority of loci generate only two or three isoforms and transcript support indicates that these isoforms are generally not rare events. For the alternate donor (AD) and acceptor (AA) classes, the distance between the splice sites for the majority of events was found to be less than 50 basepairs (bp). In both species, the most frequent distance between AA is 3 bp, consistent with reports in mammalian systems. Conversely, the most frequent distance between AD is 4 bp in both plant species, as previously observed in mouse. Most alternative splicing variations are localized to the protein coding sequence and are predicted to significantly alter the coding sequence.
Alternative splicing is widespread in both rice and Arabidopsis and these species share many common features. Interestingly, alternative splicing may play a role beyond creating novel combinations of transcripts that expand the proteome. Many isoforms will presumably have negative consequences for protein structure and function, suggesting that their biological role involves post-transcriptional regulation of gene expression.
PMCID: PMC1769492  PMID: 17194304
8.  Comparative analyses of six solanaceous transcriptomes reveal a high degree of sequence conservation and species-specific transcripts 
BMC Genomics  2005;6:124.
The Solanaceae is a family of closely related species with diverse phenotypes that have been exploited for agronomic purposes. Previous studies involving a small number of genes suggested sequence conservation across the Solanaceae. The availability of large collections of Expressed Sequence Tags (ESTs) for the Solanaceae now provides the opportunity to assess sequence conservation and divergence on a genomic scale.
All available ESTs and Expressed Transcripts (ETs), 449,224 sequences for six Solanaceae species (potato, tomato, pepper, petunia, tobacco and Nicotiana benthamiana), were clustered and assembled into gene indices. Examination of gene ontologies revealed that the transcripts within the gene indices encode a similar suite of biological processes. Although the ESTs and ETs were derived from a variety of tissues, 55–81% of the sequences had significant similarity at the nucleotide level with sequences among the six species. Putative orthologs could be identified for 28–58% of the sequences. This high degree of sequence conservation was supported by expression profiling using heterologous hybridizations to potato cDNA arrays that showed similar expression patterns in mature leaves for all six solanaceous species. 16–19% of the transcripts within the six Solanaceae gene indices did not have matches among Solanaceae, Arabidopsis, rice or 21 other plant gene indices.
Results from this genome scale analysis confirmed a high level of sequence conservation at the nucleotide level of the coding sequence among Solanaceae. Additionally, the results indicated that part of the Solanaceae transcriptome is likely to be unique for each species.
PMCID: PMC1249569  PMID: 16162286
