Related Articles

1.  Identifying and mapping novel retinal-expressed ESTs from humans 
Molecular vision  1999;5:5.
Purpose
The goal of this study was to develop efficient methods to identify tissue-specific expressed sequence tags (ESTs) and to map their locations in the human genome. Through a combination of database analysis and laboratory investigation, unique retina-specific ESTs were identified and mapped as candidate genes for inherited retinal diseases.
Methods
DNA sequences from retina-specific EST clusters were obtained from the TIGR Human Gene Index Database. Further processing of the EST sequence data was necessary to ensure that each EST cluster represented a novel, non-redundant mapping candidate. Processing involved screening for homologies to known genes and proteins using BLAST, excluding known human gene sequences and repeat sequences, and developing primers for PCR amplification of the gene encoding each cDNA cluster from genomic DNA. The EST clusters were mapped using the GeneBridge 4.0 Radiation Hybrid Mapping Panel with standard PCR conditions.
Results
A total of 83 retinal-expressed EST clusters were examined as potential novel, non-redundant mapping candidates. Fifty-five clusters were mapped successfully and their locations compared to the locations of known retinal disease genes. Fourteen EST clusters localize to candidate regions for inherited retinal diseases.
Conclusions
This pilot study developed methodology for mapping uniquely expressed retinal ESTs and for identifying potential candidate genes for inherited retinal disorders. Despite the overall success, several complicating factors contributed to the high failure rate (33%) for mapping EST-clustered sequences. These include redundancy in the sequence data, widely dispersed sequences, ambiguous nucleotides within the sequences, the possibility of amplifying through introns and the presence of repetitive elements within the sequence. However, the combination of database analysis and laboratory mapping is a powerful method for identification of candidate genes for inherited diseases.
PMCID: PMC2583080  PMID: 10228186
2.  Annotation and analysis of 10,000 expressed sequence tags from developing mouse eye and adult retina 
Genome Biology  2003;4(10):R65.
The generation and analysis of 10,000 expressed sequence tags (ESTs) from three mouse eye tissue cDNA libraries is reported that identifies a large number of potentially interesting genes for biological investigation.
Background
As a biomarker of cellular activities, the transcriptome of a specific tissue or cell type during development and disease is of great biomedical interest. We have generated and analyzed 10,000 expressed sequence tags (ESTs) from three mouse eye tissue cDNA libraries: embryonic day 15.5 (M15E) eye, postnatal day 2 (M2PN) eye and adult retina (MRA).
Results
Annotation of 8,633 non-mitochondrial and non-ribosomal high-quality ESTs revealed that 57% of the sequences represent known genes and 43% are unknown or novel ESTs, with M15E having the highest percentage of novel ESTs. Of these, 2,361 ESTs correspond to 747 unique genes and the remaining 6,272 are represented only once. Phototransduction genes are preferentially identified in MRA, whereas transcripts for cell structure and regulatory proteins are highly expressed in the developing eye. Map locations of human orthologs of known genes uncovered a high density of ocular genes on chromosome 17, and identified 277 genes in the critical regions of 37 retinal disease loci. In silico expression profiling identified 210 genes and/or ESTs over-expressed in the eye; of these, more than 26 are known to have vital retinal function. Comparisons between libraries provided a list of temporally regulated genes and/or ESTs. A few of these were validated by qRT-PCR analysis.
Conclusions
Our studies present a large number of potentially interesting genes for biological investigation, and the annotated EST set provides a useful resource for microarray and functional genomic studies.
PMCID: PMC328454  PMID: 14519200
3.  The Prostate Expression Database (PEDB): status and enhancements in 2000 
Nucleic Acids Research  2000;28(1):212-213.
The Prostate Expression Database (PEDB) is an online resource designed to access and analyze gene expression information derived from the human prostate. PEDB archives >55 000 expressed sequence tags (ESTs) from 43 cDNA libraries in a curated relational database that provides detailed library information including tissue source, library construction methods, sequence diversity and sequence abundance. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library species comparisons. Recent enhancements to PEDB include: (i) the functional categorization of annotated EST assemblies using a classification scheme developed at The Institute for Genomic Research; (ii) catalogs of expressed genes in specific prostate tissue sources designated as transcriptomes; and (iii) the addition of prostate proteome information derived from two-dimensional electrophoresis and mass spectrometry of prostate cancer cell lines. PEDB may be accessed via the WWW at http://www.mbt.washington.edu/PEDB/
PMCID: PMC102457  PMID: 10592228
4.  Comparative Gene Expression Analysis of Susceptible and Resistant Near-Isogenic Lines in Common Wheat Infected by Puccinia triticina 
Gene expression after leaf rust infection was compared in near-isogenic wheat lines differing in the Lr10 leaf rust resistance gene. RNA from susceptible and resistant plants was used for cDNA library construction. In total, 55 008 ESTs were sequenced from the two libraries, then combined and assembled into 14 268 unigenes for further analysis. Of these ESTs, 89% encoded proteins similar to (E value of ≤10⁻⁵) characterized or annotated proteins from the NCBI non-redundant database representing diverse molecular functions, cellular localization and biological processes based on gene ontology classification. Further, the unigenes were classified into susceptible and resistant classes based on the EST members assembled from the respective libraries. Several genes from the resistant sample (14-3-3 protein, wali5 protein, actin-depolymerization factor and ADP-ribosylation factor) and the susceptible sample (brown plant hopper resistance protein, caffeic acid O-methyltransferase, pathogenesis-related protein and senescence-associated protein) were selected and their differential expression in the resistant and susceptible samples collected at different time points after leaf rust infection was confirmed by RT–PCR analysis. This study examined the molecular pathogenicity of leaf rust in wheat, and the EST data generated provide a foundation for future studies.
doi:10.1093/dnares/dsq009
PMCID: PMC2920755  PMID: 20360266
wheat; leaf rust; ESTs; resistance; susceptible
5.  Analysis of the Asian Seabass Transcriptome Based on Expressed Sequence Tags 
Analysis of transcriptomes is of great importance in genomic studies. Asian seabass is an important fish species. A number of genomic tools have been developed for it, but large expressed sequence tag (EST) datasets are lacking. We sequenced ESTs from nine normalized cDNA libraries and obtained 11 431 high-quality ESTs. We retrieved 8524 ESTs from the dbEST database and analyzed all 19 975 ESTs using bioinformatics tools. After clustering, we obtained 8837 unique sequences (2838 contigs and 5999 singletons). The average contig length was 574 bp. Annotation of these unique sequences revealed that 48.9% of them showed significant homology to RNA sequences in GenBank. Functional classification of the unique ESTs identified a broad range of genes involved in different functions. We identified 6114 putative single-nucleotide polymorphisms and 634 microsatellites in ESTs. We discovered different temporal and spatial expression patterns of some immune-related genes in the Asian seabass after challenge with the pathogen Vibrio harveyi. The unique EST sequences are being used in developing a cDNA microarray to examine global gene expression and will also facilitate future whole-genome sequence assembly and annotation of Asian seabass and comparative genomics.
doi:10.1093/dnares/dsr036
PMCID: PMC3223082  PMID: 22086997
Asian seabass; EST; function; expression
6.  Construction and Application of an Electronic Spatiotemporal Expression Profile and Gene Ontology Analysis Platform Based on the EST Database of the Silkworm, Bombyx mori  
An Expressed Sequence Tag (EST) is a short sub-sequence of a transcribed cDNA sequence. ESTs represent gene expression and give good clues for gene expression analysis. Based on EST data obtained from NCBI, an EST analysis package was developed (apEST). This tool was programmed for electronic expression, protein annotation and Gene Ontology (GO) category analysis in Bombyx mori (L.) (Lepidoptera: Bombycidae). A total of 245,761 ESTs (as of 01 July 2009) were searched and downloaded in FASTA format, from which information for tissue type, development stage, sex and strain was extracted, classified and summed by running apEST. Then, corresponding distribution profiles were formed after redundant parts had been removed. Gene expression profiles for one tissue at different developmental stages and for one developmental stage across different tissues were obtained. A housekeeping gene and tissue-and-stage-specific genes were selected by running apEST, and the results were contrasted with two other online analysis approaches, the microarray-based gene expression profile on SilkDB (BmMDB) and the EST profile on NCBI. A spatio-temporal expression profile of catalase run by apEST was then presented as a three-dimensional graph for the intuitive visualization of patterns. A total of 37 genes confirmed from microarray data and RT-PCR experiments were selected as queries to test apEST. The results agreed closely among the three approaches. Nevertheless, there were minor differences between apEST and BmMDB because of the unique items investigated. Therefore, complementary analysis was proposed. Application of apEST also led to the acquisition of corresponding protein annotations for EST datasets and eventually for their functions. The results were presented according to statistical information on protein annotation and Gene Ontology (GO) category. These all verified the reliability of apEST and the operability of this platform.
apEST can also be applied to other species by modifying some parameters and serves as a model for gene expression studies in Lepidoptera.
doi:10.1673/031.010.11401
PMCID: PMC3016962  PMID: 20874595
EST analysis package; UniGene; Lepidoptera
7.  PEDB: the Prostate Expression Database. 
Nucleic Acids Research  1999;27(1):204-208.
The Prostate Expression Database (PEDB) is a curated relational database and suite of analysis tools designed for the study of prostate gene expression in normal and disease states. Expressed Sequence Tags (ESTs) and full-length cDNA sequences derived from more than 40 human prostate cDNA libraries are maintained and represent a wide spectrum of normal and pathological conditions. Detailed library information including tissue source, library construction methods, sequence diversity and abundance are available in a library archive. Prostate ESTs are assembled into distinct species groups using the multiple alignment program CAP2 and are annotated with information from the GenBank, dbEST and Unigene public sequence databases. Annotated sequences in PEDB are searched using the BLAST algorithm. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library species comparisons. PEDB may be accessed via the World Wide Web at http://www.mbt.washington.edu/PEDB/
PMCID: PMC148136  PMID: 9847181
8.  Annotation of expressed sequence tags for the East African cichlid fish Astatotilapia burtoni and evolutionary analyses of cichlid ORFs 
BMC Genomics  2008;9:96.
Background
The cichlid fishes in general, and the exceptionally diverse East African haplochromine cichlids in particular, are famous examples of adaptive radiation and explosive speciation. Here we report the collection and annotation of more than 12,000 expressed sequence tags (ESTs) generated from three different cDNA libraries obtained from the East African haplochromine cichlid species Astatotilapia burtoni and Metriaclima zebra.
Results
We first annotated more than 12,000 newly generated cichlid ESTs using the Gene Ontology classification system. For evolutionary analyses, we combined these ESTs with all available sequence data for haplochromine cichlids, which resulted in a total of more than 45,000 ESTs. The ESTs represent a broad range of molecular functions and biological processes. We compared the haplochromine ESTs to sequence data from those available for other fish model systems such as pufferfish (Takifugu rubripes and Tetraodon nigroviridis), trout, and zebrafish. We characterized genes that show a faster or slower rate of base substitutions in haplochromine cichlids compared to other fish species, as this is indicative of a relaxed or reinforced selection regime. Four of these genes showed the signature of positive selection as revealed by calculating Ka/Ks ratios.
Conclusion
About 22% of the surveyed ESTs were found to have cichlid specific rate differences suggesting that these genes might play a role in lineage specific characteristics of cichlids. We also conclude that the four genes with a Ka/Ks ratio greater than one appear as good candidate genes for further work on the genetic basis of evolutionary success of haplochromine cichlid fishes.
doi:10.1186/1471-2164-9-96
PMCID: PMC2279125  PMID: 18298844
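The Ka/Ks criterion mentioned in this abstract can be illustrated with a minimal Python sketch. It only classifies a ratio that has already been estimated; obtaining Ka and Ks requires codon-aware alignment and substitution models, which are not shown. The function name and the `neutral_band` parameter are hypothetical and do not come from the study.

```python
def selection_regime(ka, ks, neutral_band=0.0):
    """Classify a gene by its Ka/Ks ratio (often written omega).

    ka: nonsynonymous substitutions per nonsynonymous site
    ks: synonymous substitutions per synonymous site
    neutral_band: hypothetical tolerance around 1.0 treated as neutral
    """
    if ks <= 0:
        # The ratio is undefined without synonymous substitutions.
        raise ValueError("Ks must be positive to form the ratio")
    omega = ka / ks
    if omega > 1 + neutral_band:
        return omega, "positive selection"
    if omega < 1 - neutral_band:
        return omega, "purifying selection"
    return omega, "neutral"
```

A ratio above one, as reported for four genes here, falls in the first branch; most protein-coding genes are expected to fall in the second.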
9.  TomatEST database: in silico exploitation of EST data to explore expression patterns in tomato species 
Nucleic Acids Research  2006;35(Database issue):D901-D905.
TomatEST is a secondary database integrating expressed sequence tag (EST)/cDNA sequence information from different libraries of multiple tomato species. Redundant EST collections from each species are organized into clusters (gene indices). A cluster consists of one or multiple contigs. Multiple contigs in a cluster represent alternatively transcribed forms of a gene. The set of stand-alone EST sequences (singletons) and contigs, representing all the computationally defined ‘Transcript Indices’, are annotated according to similarity versus protein and RNA family databases. Sequence function description is integrated with the Gene Ontologies and the Enzyme Commission identifiers for a standard classification of gene products and for the mapping of the expressed sequences onto metabolic pathways. Information on the origin of the ESTs, on their structural features, on clusters and contigs, as well as on functional annotations are accessible via a user-friendly web interface. Specific facilities in the database allow Transcript Indices from a query be automatically classified in Enzyme classes and in metabolic pathways. The ‘on the fly’ mapping onto the metabolic maps is integrated in the analytical tools. The TomatEST database website is freely available at .
doi:10.1093/nar/gkl921
PMCID: PMC1669777  PMID: 17142232
10.  Pattern analysis approach reveals restriction enzyme cutting abnormalities and other cDNA library construction artifacts using raw EST data 
BMC Biotechnology  2012;12:16.
Background
Expressed Sequence Tag (EST) sequences are widely used in applications such as genome annotation, gene discovery and gene expression studies. However, some of GenBank dbEST sequences have proven to be “unclean”. Identification of cDNA termini/ends and their structures in raw ESTs not only facilitates data quality control and accurate delineation of transcription ends, but also furthers our understanding of the potential sources of data abnormalities/errors present in the wet-lab procedures for cDNA library construction.
Results
After analyzing a total of 309,976 raw Pinus taeda ESTs, we uncovered many distinct variations of cDNA termini, some of which prove to be good indicators of wet-lab artifacts, and characterized each raw EST by its cDNA terminus structure patterns. In contrast to the expected patterns, many ESTs displayed complex and/or abnormal patterns that represent potential wet-lab errors such as: a failure of one or both of the restriction enzymes to cut the plasmid vector; a failure of the restriction enzymes to cut the vector at the correct positions; the insertion of two cDNA inserts into a single vector; the insertion of multiple and/or concatenated adapters/linkers; the presence of 3′-end terminal structures in designated 5′-end sequences or vice versa; and so on. With a close examination of these artifacts, many problematic ESTs that have been deposited into public databases by conventional bioinformatics pipelines or tools could be cleaned or filtered by our methodology. We developed a software tool for Abnormality Filtering and Sequence Trimming for ESTs (AFST, http://code.google.com/p/afst/) using a pattern analysis approach. To compare AFST with other pipelines that submitted ESTs into dbEST, we reprocessed 230,783 Pinus taeda and 38,709 Arachis hypogaea GenBank ESTs. We found 7.4% of Pinus taeda and 29.2% of Arachis hypogaea GenBank ESTs are “unclean” or abnormal, all of which could be cleaned or filtered by AFST.
Conclusions
cDNA terminal pattern analysis, as implemented in the AFST software tool, can be utilized to reveal wet-lab errors such as restriction enzyme cutting abnormalities and chimeric EST sequences, detect various data abnormalities embedded in existing Sanger EST datasets, improve the accuracy of identifying and extracting bona fide cDNA inserts from raw ESTs, and therefore greatly benefit downstream EST-based applications.
doi:10.1186/1472-6750-12-16
PMCID: PMC3424822  PMID: 22554190
cDNA terminus; cDNA library construction; Pattern analysis; Restriction enzyme cutting abnormality; Chimeric EST sequences
11.  Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum 
Chinese Medicine  2010;5:9.
Background
Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library.
Methods
A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis.
Results
A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84 were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified.
Conclusion
The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum.
doi:10.1186/1749-8546-5-9
PMCID: PMC2848221  PMID: 20230644
12.  A White Campion (Silene latifolia) floral expressed sequence tag (EST) library: annotation, EST-SSR characterization, transferability, and utility for comparative mapping 
BMC Genomics  2009;10:243.
Background
Expressed sequence tag (EST) databases represent a valuable resource for the identification of genes in organisms with uncharacterized genomes and for development of molecular markers. One class of markers derived from EST sequences are simple sequence repeat (SSR) markers, also known as EST-SSRs. These are useful in plant genetic and evolutionary studies because they are located in transcribed genes and a putative function can often be inferred from homology searches. Another important feature of EST-SSR markers is their expected high level of transferability to related species that makes them very promising for comparative mapping. In the present study we constructed a normalized EST library from floral tissue of Silene latifolia with the aim to identify expressed genes and to develop polymorphic molecular markers.
Results
We obtained a total of 3662 high quality sequences from a normalized Silene cDNA library. These represent 3105 unigenes, with 73% of unigenes matching genes in other species. We found 255 sequences containing one or more SSR motifs. More than 60% of these SSRs were trinucleotides. A total of 30 microsatellite loci were identified from 106 ESTs having sufficient flanking sequences for primer design. The inheritance of these loci was tested via segregation analyses and their usefulness for linkage mapping was assessed in an interspecific cross. Tests for cross-amplification of the EST-SSR loci in other Silene species established their applicability to related species.
Conclusion
The newly characterized genes and gene-derived markers from our Silene EST library represent a valuable genetic resource for future studies on Silene latifolia and related species. The polymorphism and transferability of EST-SSR markers facilitate comparative linkage mapping and analyses of genetic diversity in the genus Silene.
doi:10.1186/1471-2164-10-243
PMCID: PMC2689282  PMID: 19467153
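As an illustration of how SSR motifs such as those counted above can be located in EST sequences, here is a minimal Python sketch using regular expressions. It is not the procedure used in the study; the function name and the minimum repeat thresholds are assumptions chosen only for the example, and only perfect (uninterrupted) repeats are found.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each perfect SSR in a sequence."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # The group captures one motif; \1{n,} demands at least n more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (AA, ATAT, ...),
            # so a dinucleotide run is not also reported as a tetranucleotide.
            if (motif + motif).find(motif, 1) != motif_len:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return hits
```

For example, `find_ssrs("GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "T")` reports one dinucleotide and one trinucleotide repeat. Primer design would additionally need sufficient flanking sequence on both sides of each hit, which this sketch does not check.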
13.  Quantitative Gene Expression Profiles in Real Time From Expressed Sequence Tag Databases 
Gene expression  2010;14(6):321-336.
An accumulation of expressed sequence tag (EST) data in the public domain and the availability of bioinformatic programs have made EST gene expression profiling a common practice. However, the utility and validity of using EST databases (e.g., dbEST) has been criticized, particularly for quantitative assessment of gene expression. Problems with EST sequencing errors, library construction, EST annotation, and multiple paralogs make generation of specific and sensitive qualitative and quantitative expression profiles a concern. In addition, most EST-derived expression data exists in previously assembled databases. The Virtual Northern Blot (VNB) (http://tlab.bu.edu/vnb.html) allows generation, evaluation, and optimization of expression profiles in real time, which is especially important for alternatively spliced, novel, or poorly characterized genes. Representative gene families with variable nucleotide sequence identity, tissue specificity, and levels of expression (bcl-xl, aldoA, and cyp2d9) are used to assess the quality of VNB’s output. The profiles generated by VNB are more sensitive and specific than those constructed with ESTs listed in preindexed databases at UCSC and NCBI. Moreover, quantitative expression profiles produced by VNB are comparable to quantification obtained from Northern blots and qPCR. The VNB pipeline generates real-time gene expression profiles for single-gene queries that are both qualitatively and quantitatively reliable.
PMCID: PMC2954622  PMID: 20635574
Expressed sequence tag (EST); Transcriptomics; Bioinformatics; Quantitative PCR; Northern blot
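The general idea behind EST-based expression profiling can be sketched as counting the ESTs that match a gene in each tissue library and normalizing by library size. The Python sketch below is a generic illustration under that assumption, not the VNB implementation, whose internals the abstract does not describe; the function name, the per-million scaling, and the sample counts are all hypothetical.

```python
def est_expression_profile(gene_est_counts, library_sizes):
    """Return matching ESTs per million sequenced ESTs for each tissue.

    gene_est_counts: tissue -> number of ESTs assigned to the gene
    library_sizes:   tissue -> total number of ESTs sequenced from that tissue
    """
    profile = {}
    for tissue, total in library_sizes.items():
        if total <= 0:
            raise ValueError("library size must be positive: %s" % tissue)
        # Tissues with no matching ESTs get an explicit zero.
        hits = gene_est_counts.get(tissue, 0)
        profile[tissue] = hits * 1_000_000 / total
    return profile

# Made-up counts, for illustration only.
example = est_expression_profile(
    {"liver": 12, "brain": 3},
    {"liver": 60_000, "brain": 150_000, "kidney": 40_000},
)
```

Normalizing by library size matters because raw EST counts otherwise mostly reflect how deeply each tissue was sequenced; normalized or subtracted libraries distort such counts further, which is one of the concerns the abstract raises.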
14.  Evaluation of the G protein coupled receptor-75 (GPR75) in age related macular degeneration 
BACKGROUND—A long term project was initiated to identify and to characterise genes that are expressed exclusively or preferentially in the retina as candidates for a genetic susceptibility to age related macular degeneration (AMD). A transcript represented by a cluster of five human expressed sequence tags (ESTs) derived exclusively from retinal cDNA libraries was identified.
METHODS—Northern blot and RT-PCR analyses confirmed preferential retinal expression of the gene, which encodes a G protein coupled receptor, GPR75. Following isolation of the full length cDNA and determination of the genomic organisation, the coding sequence of GPR75 was screened for mutations in 535 AMD patients and 252 controls from Germany, the United States, and Italy. Employed methods included single stranded conformational polymorphism (SSCP) analysis, denaturing high performance liquid chromatography (DHPLC), and direct sequencing.
RESULTS—Nine different sequence variations were identified in patients and control individuals. Three of these (-30A>C, 150G>A, and 346G>A) likely represent polymorphic variants. Each of six alterations (-4G>A, N78K, P99L, S108T, T135P, and Q234X) was found once in a single AMD patient and was considered a variant that could affect the protein function and potentially cause retinal pathology.
CONCLUSION—The presence of six potential pathogenic variants in a cohort of 535 AMD patients alone does not provide statistically significant evidence for the association of sequence variation in GPR75 with genetic predisposition to AMD. However, a possible connection between the variants and age related retinal pathology cannot be discarded. Functional studies are needed to clarify the role of GPR75 in retinal physiology.


doi:10.1136/bjo.85.8.969
PMCID: PMC1724093  PMID: 11466257
15.  ESTIMA, a tool for EST management in a multi-project environment 
BMC Bioinformatics  2004;5:176.
Background
Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users.
Results
A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is very flexible and accepts output from any EST processing and assembly pipeline.
ESTIMA has been used for the management of EST projects of many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects.
Conclusions
The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from . The entity-relationship (E-R) diagrams and the programs used to generate the Oracle database tables are also available. We have also provided detailed installation instructions and a tutorial at the same website. Presently the chromatograms, EST databases and their annotations have been made available for cattle and honeybee brain EST projects. Non-academic users need to contact the W.M. Keck Center for Functional and Comparative Genomics, University of Illinois at Urbana-Champaign, Urbana, IL, for licensing information.
doi:10.1186/1471-2105-5-176
PMCID: PMC533868  PMID: 15527510
16.  Defining the Human Macula Transcriptome and Candidate Retinal Disease Genes Using EyeSAGE 
Purpose
To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE).
Methods
Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, including microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type specific expression was validated by quantitative and single-cell RT-PCR.
Results
Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of candidate gene expression tables for the identified Bardet-Biedl syndrome disease gene (BBS5) in the BBS5 disease region table yielded BBS5 as the top candidate. Compelling candidates for inherited retina diseases were identified.
Conclusions
The EyeSAGE database, combining three different gene-profiling platforms including the authors’ multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions.
doi:10.1167/iovs.05-1437
PMCID: PMC2813776  PMID: 16723438
17.  Construction of a full-length cDNA Library from Chinese oak silkworm pupa and identification of a KK-42-binding protein gene in relation to pupa-diapause termination 
In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10⁵ cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding for a KK-42-binding protein in A. pernyi (named ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of the lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene during the diapause stage, suggesting it may also be involved in pupa-diapause termination.
PMCID: PMC2702828  PMID: 19564928
Chinese oak silkworm; Antheraea pernyi; cDNA library; Expressed sequence tag; KK-42-binding protein; diapause termination
18.  Identification of transcripts involved in meiosis and follicle formation during ovine ovary development 
BMC Genomics  2008;9:436.
Background
The key steps in germ cell survival during ovarian development are the entry into meiosis of oogonia and the formation of primordial follicles, which then determine the reproductive lifespan of the ovary. In sheep, these steps occur during fetal life, between 55 and 80 days of gestation, respectively. The aim of this study was to identify differentially expressed ovarian genes during prophase I meiosis and early folliculogenesis in sheep.
Results
In order to elucidate the molecular events associated with early ovarian differentiation, we generated two ovary stage-specific subtracted cDNA libraries using SSH. Large-scale sequencing of these SSH libraries identified 6,080 ESTs representing 2,535 contigs. Clustering and assembly of these ESTs resulted in a total of 2,101 unique sequences, comprising 1,305 singletons (62.11%) and 796 contigs (37.9%). BLASTX evaluation indicated that 99% of the ESTs were homologous to various known genes/proteins in a broad range of organisms, especially ovine, bovine and human species. The remaining 1%, which did not exhibit any homology to known gene sequences, was considered novel. Detailed study of the expression patterns of some of these genes using RT-PCR revealed promising new candidates for ovary differentiation genes in sheep.
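The singleton and contig shares quoted above can be re-derived from the reported counts. The following is only a minimal arithmetic sketch; the variable names and the `percent` helper are illustrative and do not come from the article.

```python
# Counts as reported in the abstract.
singletons = 1305
contigs = 796
unique_sequences = 2101

# The two classes should account for every unique sequence.
assert singletons + contigs == unique_sequences


def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100 * part / whole, 2)


singleton_pct = percent(singletons, unique_sequences)
contig_pct = percent(contigs, unique_sequences)

print(f"singletons: {singleton_pct}%  contigs: {contig_pct}%")
```

The contig share computes to 37.89%, which agrees with the 37.9% in the abstract once rounded to one decimal place.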
Conclusion
We showed that the SSH approach was relevant to determining new mammalian genes which might be involved in oogenesis and early follicle development, and enabled the discovery of new potential oocyte and granulosa cell markers for future studies. These genes may have significant implications regarding our understanding of ovarian function in molecular terms, and for the development of innovative strategies to both promote and control fertility.
doi:10.1186/1471-2164-9-436
PMCID: PMC2566313  PMID: 18811939
19.  Gene identification and analysis of transcripts differentially regulated in fracture healing by EST sequencing in the domestic sheep 
BMC Genomics  2006;7:172.
Background
The sheep is an important model animal for testing novel fracture treatments and other medical applications. Despite these medical uses and the well known economic and cultural importance of the sheep, relatively little research has been performed into sheep genetics, and DNA sequences are available for only a small number of sheep genes.
Results
In this work we have sequenced over 47 thousand expressed sequence tags (ESTs) from libraries developed from healing bone in a sheep model of fracture healing. These ESTs were clustered with the previously available 10 thousand sheep ESTs to a total of 19087 contigs with an average length of 603 nucleotides. We used the newly identified sequences to develop RT-PCR assays for 78 sheep genes and measured differential expression during the course of fracture healing between days 7 and 42 postfracture. All genes showed significant shifts at one or more time points. 23 of the genes were differentially expressed between postfracture days 7 and 10, which could reflect an important role for these genes for the initiation of osteogenesis.
Conclusion
The sequences we have identified in this work are a valuable resource for future studies on musculoskeletal healing and regeneration using sheep and represent an important head-start for genomic sequencing projects for Ovis aries, with partial or complete sequences being made available for over 5,800 previously unsequenced sheep genes.
doi:10.1186/1471-2164-7-172
PMCID: PMC1578570  PMID: 16822315
20.  The Human EST Ontology Explorer: a tissue-oriented visualization system for ontologies distribution in human EST collections 
BMC Bioinformatics  2009;10(Suppl 12):S2.
Background
The NCBI dbEST currently contains more than eight million human Expressed Sequence Tags (ESTs). This wide collection represents an important source of information for gene expression studies, provided it can be inspected according to biologically relevant criteria. EST data can be browsed using different dedicated web resources, which allow investigation of library-specific gene expression levels and comparisons among libraries, highlighting significant differences in gene expression. Nonetheless, no tool is available to examine distributions of quantitative EST collections in Gene Ontology (GO) categories, nor to retrieve information concerning library-dependent EST involvement in metabolic pathways. In this work we present the Human EST Ontology Explorer (HEOE), a web facility for comparison of expression levels among libraries from several healthy and diseased tissues.
Results
The HEOE provides library-dependent statistics on the distribution of sequences in the GO Directed Acyclic Graph (DAG) that can be browsed at each GO hierarchical level. The tool is based on large-scale BLAST annotation of EST sequences. Due to the huge number of input sequences, this BLAST analysis was performed with the aid of grid computing technology, which is particularly suitable for data-parallel tasks. Relying on the resulting annotation, library-specific distributions of ESTs in the GO graph were inferred. A pathway-based search interface was also implemented, for a quick evaluation of the representation of libraries in metabolic pathways. EST processing steps were integrated in a semi-automatic procedure that relies on Perl scripts and stores results in a MySQL database. A PHP-based web interface offers the possibility to simultaneously visualize, retrieve and compare data from the different libraries. Statistically significant differences in GO categories among user-selected libraries can also be computed.
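The abstract does not say which statistical test the HEOE applies when comparing GO categories between libraries. As a hedged illustration only, the sketch below assumes a two-sided Fisher's exact test on a 2x2 table (ESTs inside versus outside one GO category, for two libraries). The function name and the example counts are hypothetical and are not taken from the HEOE.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are libraries, columns are 'in GO category' / 'not in GO category'.
    The p-value is the summed probability of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x category hits in library 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical toy counts: library A has 3 ESTs in the category and 0 outside,
# library B has 0 inside and 3 outside.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
print(p_value)
```

An exact test is used here because per-category EST counts in a single library can be small; for large counts a chi-squared test would be a common alternative.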
Conclusion
The HEOE provides an alternative and complementary way to inspect EST expression levels with respect to approaches currently offered by other resources. Furthermore, BLAST computation on the whole human EST dataset was a suitable test of grid scalability in the context of large-scale bioinformatics analysis. The HEOE currently comprises sequence analysis from 70 non-normalized libraries, representing a comprehensive overview of healthy and unhealthy tissues. As the analysis procedure can be easily applied to other libraries, the number of represented tissues is intended to increase.
doi:10.1186/1471-2105-10-S12-S2
PMCID: PMC2762067  PMID: 19828078
21.  Transcriptomic analysis of the entomopathogenic nematode Heterorhabditis bacteriophora TTO1 
BMC Genomics  2009;10:205.
Background
The entomopathogenic nematode Heterorhabditis bacteriophora and its symbiotic bacterium, Photorhabdus luminescens, are important biological control agents of insect pests. This nematode-bacterium-insect association represents an emerging tripartite model for research on mutualistic and parasitic symbioses. Elucidation of mechanisms underlying these biological processes may serve as a foundation for improving the biological control potential of the nematode-bacterium complex. This large-scale expressed sequence tag (EST) analysis effort enables gene discovery and development of microsatellite markers. These ESTs will also aid in the annotation of the upcoming complete genome sequence of H. bacteriophora.
Results
A total of 31,485 high quality ESTs were generated from cDNA libraries of the adult H. bacteriophora TTO1 strain. Cluster analysis revealed the presence of 3,051 contigs and 7,835 singletons, representing 10,886 distinct EST sequences. About 72% of the distinct EST sequences had significant matches (E value < 1e-5) to proteins in GenBank's non-redundant (nr) and Wormpep190 databases. We have identified 12 ESTs corresponding to 8 genes potentially involved in RNA interference, 22 ESTs corresponding to 14 genes potentially involved in dauer-related processes, and 51 ESTs corresponding to 27 genes potentially involved in defense and stress responses. Comparison to ESTs and proteins of free-living nematodes led to the identification of 554 parasitic nematode-specific ESTs in H. bacteriophora, among which are those encoding F-box-like/WD-repeat protein theromacin, Bax inhibitor-1-like protein, and PAZ domain containing protein. Gene Ontology terms were assigned to 6,685 of the 10,886 ESTs. A total of 168 microsatellite loci were identified with primers designable for 141 loci.
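The E-value cutoff quoted above (E value < 1e-5) can be illustrated with a short filter over BLAST hits. This sketch assumes the hits are available in the 12-column tabular layout of BLAST+ (`-outfmt 6`), where the E value is the eleventh column. The abstract does not state the output format the authors used, and the function name and sample rows below are invented for illustration.

```python
import csv
import io

EVALUE_CUTOFF = 1e-5  # threshold quoted in the abstract


def queries_with_significant_hit(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return the set of query IDs with at least one hit whose E value is below cutoff.

    Assumes tab-separated rows: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    significant = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query_id, evalue = row[0], float(row[10])
        if evalue < cutoff:
            significant.add(query_id)
    return significant


# Invented example rows, not data from the study.
sample = "\n".join([
    "est1\tprotA\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-30\t120",
    "est2\tprotB\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20",
    "est3\tprotC\t60.0\t80\t32\t0\t1\t80\t1\t80\t1e-5\t55",
])
significant = queries_with_significant_hit(sample)
print(sorted(significant))

# Consistency check on the reported cluster counts.
contigs, singletons, distinct = 3051, 7835, 10886
assert contigs + singletons == distinct
```

Because the comparison is strict, a hit with an E value of exactly 1e-5 (est3 in the sample) is not counted as significant.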
Conclusion
A total of 10,886 distinct EST sequences were identified from adult H. bacteriophora cDNA libraries. BLAST searches revealed ESTs potentially involved in parasitism, RNA interference, defense responses, stress responses, and dauer-related processes. The putative microsatellite markers identified in H. bacteriophora ESTs will enable genetic mapping and population genetic studies. These genomic resources provide the material base necessary for genome annotation, microarray development, and in-depth gene functional analysis.
doi:10.1186/1471-2164-10-205
PMCID: PMC2686736  PMID: 19405965
22.  Comparative genomic mapping of uncharacterized canine retinal ESTs to identify novel candidate genes for hereditary retinal disorders 
Molecular Vision  2009;15:927-936.
Purpose
To identify the genomic location of previously uncharacterized canine retina-expressed expressed sequence tags (ESTs), and thus identify potential candidate genes for heritable retinal disorders.
Methods
A set of over 500 retinal canine ESTs were mapped onto the canine genome using the RHDF5000–2 radiation hybrid (RH) panel, and the resulting map positions were compared to their respective localization in the CanFam2 assembly of the canine genome sequence.
Results
Unique map positions could be assigned for 99% of the mapped clones, of which only 29% showed significant homology to known RefSeq sequences. A comparison between RH map and sequence assembly indicated some areas of discrepancy. Retinal expressed genes were not concentrated in particular areas of the canine genome, and also were located on the canine Y chromosome (CFAY). Several of the EST clones were located within areas of conserved synteny to human retinal disease loci.
Conclusions
RH mapping of canine retinal ESTs provides insight into the location of potential candidate genes for hereditary retinal disorders, and, by comparison with the assembled canine genome sequence, highlights inconsistencies with the current assembly. Regions of conserved synteny between the canine and the human genomes allow this information to be extrapolated to identify potential positional candidate genes for mapped human retinal disorders. Furthermore, these ESTs can help identify novel or uncharacterized genes of significance for better understanding of retinal morphology, physiology, and pathology.
PMCID: PMC2683029  PMID: 19452016
23.  Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs 
BMC Genomics  2007;8:177.
Background
EST sequencing is one of the most efficient means for gene discovery and molecular marker development, and can be additionally utilized in both comparative genome analysis and evaluation of gene duplications. While much progress has been made in catfish genomics, large-scale EST resources have been lacking. The objectives of this project were to construct primary cDNA libraries, to conduct initial EST sequencing to generate catfish EST resources, and to obtain baseline information about highly expressed genes in various catfish organs to provide a guide for the production of normalized and subtracted cDNA libraries for large-scale transcriptome analysis in catfish.
Results
A total of 17 cDNA libraries were constructed including 12 from channel catfish (Ictalurus punctatus) and 5 from blue catfish (I. furcatus). A total of 31,215 ESTs, with average length of 778 bp, were generated including 20,451 from the channel catfish and 10,764 from blue catfish. Cluster analysis indicated that 73% of channel catfish and 67% of blue catfish ESTs were unique within the project. Over 53% and 50% of the channel catfish and blue catfish ESTs, respectively, had significant similarities to known genes. All ESTs have been deposited in GenBank. Evaluation of the catfish EST resources demonstrated their potential for molecular marker development, comparative genome analysis, and evaluation of ancient and recent gene duplications. Subtraction of abundantly expressed genes in a variety of catfish tissues, identified here, will allow the production of low-redundancy libraries for in-depth sequencing.
Conclusion
The sequencing of 31,215 ESTs from channel catfish and blue catfish has significantly increased the EST resources in catfish. The EST resources should provide the potential for microarray development, polymorphic marker identification, mapping, and comparative genome analysis.
doi:10.1186/1471-2164-8-177
PMCID: PMC1906771  PMID: 17577415
24.  OREST: the online resource for EST analysis 
Nucleic Acids Research  2008;36(Web Server issue):W140-W144.
The generation of expressed sequence tag (EST) libraries offers an affordable approach to investigating organisms for which no genome sequence is available. OREST (http://mips.gsf.de/genre/proj/orest/index.html) is a server-based EST analysis pipeline, which allows the rapid analysis of large amounts of ESTs or cDNAs from mammals and fungi. In order to assign the ESTs to genes or proteins, OREST maps DNA sequences to reference datasets of gene products and, in a second step, to complete genome sequences. Mapping against genome sequences recovers an additional 13% of EST data, which otherwise would escape further analysis. To enable functional analysis of the datasets, ESTs are functionally annotated using the hierarchical FunCat annotation scheme as well as GO annotation terms. OREST also allows prediction of associations between gene products and diseases by Morbid Map (OMIM) classification. A statistical analysis of the results of the dataset is possible with the included PROMPT software, which provides information about enrichment and depletion of functional and disease annotation terms. OREST was successfully applied for the identification and functional characterization of more than 3000 EST sequences of the common marmoset monkey (Callithrix jacchus) as part of an international collaboration.
doi:10.1093/nar/gkn253
PMCID: PMC2447738  PMID: 18463135
25.  Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis 
Nucleic Acids Research  2002;30(24):5579-5592.
We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5′- and 3′-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M. truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M. truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by ‘electronic northern’ representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M. truncatula. A searchable database has been built and can be accessed through a public interface.
PMCID: PMC140066  PMID: 12490726
