1.  Identifying and mapping novel retinal-expressed ESTs from humans 
Molecular Vision  1999;5:5.
The goal of this study was to develop efficient methods to identify tissue-specific expressed sequence tags (ESTs) and to map their locations in the human genome. Through a combination of database analysis and laboratory investigation, unique retina-specific ESTs were identified and mapped as candidate genes for inherited retinal diseases.
DNA sequences from retina-specific EST clusters were obtained from the TIGR Human Gene Index Database. Further processing of the EST sequence data was necessary to ensure that each EST cluster represented a novel, non-redundant mapping candidate. Processing involved screening for homologies to known genes and proteins using BLAST, excluding known human gene sequences and repeat sequences, and developing primers for PCR amplification of the gene encoding each cDNA cluster from genomic DNA. The EST clusters were mapped using the GeneBridge 4.0 Radiation Hybrid Mapping Panel with standard PCR conditions.
A total of 83 retinal-expressed EST clusters were examined as potential novel, non-redundant mapping candidates. Fifty-five clusters were mapped successfully and their locations compared to the locations of known retinal disease genes. Fourteen EST clusters localize to candidate regions for inherited retinal diseases.
This pilot study developed methodology for mapping uniquely expressed retinal ESTs and for identifying potential candidate genes for inherited retinal disorders. Despite the overall success, several complicating factors contributed to the high failure rate (33%) for mapping EST-clustered sequences. These include redundancy in the sequence data, widely dispersed sequences, ambiguous nucleotides within the sequences, the possibility of amplifying through introns and the presence of repetitive elements within the sequence. However, the combination of database analysis and laboratory mapping is a powerful method for identification of candidate genes for inherited diseases.
PMCID: PMC2583080  PMID: 10228186
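Entry 1 names ambiguous nucleotides within the EST sequences as one cause of mapping failures. The following Python sketch shows one way such sequences could be flagged before primer design. It is not the authors' pipeline. The function names and the 1% cutoff are illustrative assumptions.

```python
# Hypothetical pre-screen: flag EST cluster sequences that contain too many
# ambiguous (non-ACGT, e.g. IUPAC N/R/Y) bases before designing PCR primers.
# The 1% default threshold is an illustrative assumption, not from the study.


def ambiguous_fraction(seq: str) -> float:
    """Return the fraction of positions that are not A, C, G or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    ambiguous = sum(1 for base in seq if base not in "ACGT")
    return ambiguous / len(seq)


def passes_screen(seq: str, max_fraction: float = 0.01) -> bool:
    """True if the sequence is clean enough to attempt primer design."""
    return ambiguous_fraction(seq) <= max_fraction


if __name__ == "__main__":
    clusters = {
        "cluster_A": "ACGTACGTACGTACGTACGT",  # no ambiguous bases
        "cluster_B": "ACGNNRYACGTACGTACGTA",  # 4 of 20 bases ambiguous
    }
    for name, seq in clusters.items():
        print(name, round(ambiguous_fraction(seq), 2), passes_screen(seq))
```

A fraction-based cutoff is used rather than rejecting any sequence containing a single N, on the assumption that isolated ambiguities outside the primer sites may still amplify.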
2.  Annotation and analysis of 10,000 expressed sequence tags from developing mouse eye and adult retina 
Genome Biology  2003;4(10):R65.
The generation and analysis of 10,000 expressed sequence tags (ESTs) from three mouse eye tissue cDNA libraries is reported that identifies a large number of potentially interesting genes for biological investigation.
As a biomarker of cellular activities, the transcriptome of a specific tissue or cell type during development and disease is of great biomedical interest. We have generated and analyzed 10,000 expressed sequence tags (ESTs) from three mouse eye tissue cDNA libraries: embryonic day 15.5 (M15E) eye, postnatal day 2 (M2PN) eye and adult retina (MRA).
Annotation of 8,633 non-mitochondrial and non-ribosomal high-quality ESTs revealed that 57% of the sequences represent known genes and 43% are unknown or novel ESTs, with M15E having the highest percentage of novel ESTs. Of these, 2,361 ESTs correspond to 747 unique genes and the remaining 6,272 are represented only once. Phototransduction genes are preferentially identified in MRA, whereas transcripts for cell structure and regulatory proteins are highly expressed in the developing eye. Map locations of human orthologs of known genes uncovered a high density of ocular genes on chromosome 17, and identified 277 genes in the critical regions of 37 retinal disease loci. In silico expression profiling identified 210 genes and/or ESTs over-expressed in the eye; of these, more than 26 are known to have vital retinal function. Comparisons between libraries provided a list of temporally regulated genes and/or ESTs. A few of these were validated by qRT-PCR analysis.
Our studies present a large number of potentially interesting genes for biological investigation, and the annotated EST set provides a useful resource for microarray and functional genomic studies.
PMCID: PMC328454  PMID: 14519200
3.  Onecut 1 and Onecut 2 Are Potential Regulators of Mouse Retinal Development 
Our current study focuses on the expression of two members of the onecut transcription factor family, Onecut1 (Oc1) and Onecut2 (Oc2), in the developing mouse retina. By immunofluorescence staining, we found that Oc1 and Oc2 had very similar expression patterns throughout retinal development. Both factors started to be expressed in the retina at around embryonic day (E) 11.5. At early stages (E11.5 and E12.5), they were expressed in both the neuroblast layer (NBL) and ganglion cell layer (GCL). As development progressed (from E14.5 to postnatal day [P] 0), expression diminished in the retinal progenitor cells and became more restricted to the GCL. By P5, Oc1 and Oc2 were expressed at very low levels in the GCL. By co-labeling with transcription factors known to be involved in retinal ganglion cell (RGC) development, we found that Oc1 and Oc2 had extensive overlap with Math5 in the NBL, and that they completely overlapped with Pou4f2 and Isl1 in the GCL, but only partially in the NBL. Co-labeling of Oc1 with cell cycle markers confirmed that Oc1 was expressed in both proliferating retinal progenitors and postmitotic retinal cells. In addition, we demonstrated that expression of Oc1 and Oc2 did not require Math5, Isl1, or Pou4f2. Thus, Oc1 and Oc2 may regulate the formation of RGCs in a pathway independent of Math5, Pou4f2, and Isl1. Furthermore, we showed that Oc1 and Oc2 were expressed in both developing and mature horizontal cells (HCs). Therefore, the two factors may also function in the genesis and maintenance of HCs. J. Comp. Neurol. 520:952–969, 2012.
PMCID: PMC3898336  PMID: 21830221
retina; transcription factors; cell differentiation; retinal ganglion cells; horizontal cells; retinal development
4.  The Prostate Expression Database (PEDB): status and enhancements in 2000 
Nucleic Acids Research  2000;28(1):212-213.
The Prostate Expression Database (PEDB) is an online resource designed for accessing and analyzing gene expression information derived from the human prostate. PEDB archives >55 000 expressed sequence tags (ESTs) from 43 cDNA libraries in a curated relational database that provides detailed library information including tissue source, library construction methods, sequence diversity and sequence abundance. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library species comparisons. Recent enhancements to PEDB include: (i) the functional categorization of annotated EST assemblies using a classification scheme developed at The Institute for Genomic Research; (ii) catalogs of expressed genes in specific prostate tissue sources designated as transcriptomes; and (iii) the addition of prostate proteome information derived from two-dimensional electrophoresis and mass spectrometry of prostate cancer cell lines. PEDB may be accessed via the WWW at
PMCID: PMC102457  PMID: 10592228
5.  Evaluation of the G protein coupled receptor-75 (GPR75) in age related macular degeneration 
BACKGROUND—A long term project was initiated to identify and to characterise genes that are expressed exclusively or preferentially in the retina as candidates for a genetic susceptibility to age related macular degeneration (AMD). A transcript represented by a cluster of five human expressed sequence tags (ESTs) derived exclusively from retinal cDNA libraries was identified.
METHODS—Northern blot and RT-PCR analyses confirmed preferential retinal expression of the gene, which encodes a G protein coupled receptor, GPR75. Following isolation of the full length cDNA and determination of the genomic organisation, the coding sequence of GPR75 was screened for mutations in 535 AMD patients and 252 controls from Germany, the United States, and Italy. Employed methods included single stranded conformational polymorphism (SSCP) analysis, denaturing high performance liquid chromatography (DHPLC), and direct sequencing.
RESULTS—Nine different sequence variations were identified in patients and control individuals. Three of these (-30A>C, 150G>A, and 346G>A) likely represent polymorphic variants. Six alterations (-4G>A, N78K, P99L, S108T, T135P, and Q234X) were each found once, in single AMD patients, and were considered variants that could affect protein function and potentially cause retinal pathology.
CONCLUSION—The presence of six potential pathogenic variants in a cohort of 535 AMD patients alone does not provide statistically significant evidence for the association of sequence variation in GPR75 with genetic predisposition to AMD. However, a possible connection between the variants and age related retinal pathology cannot be ruled out. Functional studies are needed to clarify the role of GPR75 in retinal physiology.

PMCID: PMC1724093  PMID: 11466257
6.  Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs 
BMC Genomics  2007;8:177.
EST sequencing is one of the most efficient means for gene discovery and molecular marker development, and can be additionally utilized in both comparative genome analysis and evaluation of gene duplications. While much progress has been made in catfish genomics, large-scale EST resources have been lacking. The objectives of this project were to construct primary cDNA libraries, to conduct initial EST sequencing to generate catfish EST resources, and to obtain baseline information about highly expressed genes in various catfish organs to provide a guide for the production of normalized and subtracted cDNA libraries for large-scale transcriptome analysis in catfish.
A total of 17 cDNA libraries were constructed including 12 from channel catfish (Ictalurus punctatus) and 5 from blue catfish (I. furcatus). A total of 31,215 ESTs, with average length of 778 bp, were generated including 20,451 from the channel catfish and 10,764 from blue catfish. Cluster analysis indicated that 73% of channel catfish and 67% of blue catfish ESTs were unique within the project. Over 53% and 50% of the channel catfish and blue catfish ESTs, respectively, had significant similarities to known genes. All ESTs have been deposited in GenBank. Evaluation of the catfish EST resources demonstrated their potential for molecular marker development, comparative genome analysis, and evaluation of ancient and recent gene duplications. Subtraction of abundantly expressed genes in a variety of catfish tissues, identified here, will allow the production of low-redundancy libraries for in-depth sequencing.
The sequencing of 31,215 ESTs from channel catfish and blue catfish has significantly increased the EST resources in catfish. The EST resources should provide the potential for microarray development, polymorphic marker identification, mapping, and comparative genome analysis.
PMCID: PMC1906771  PMID: 17577415
7.  Construction and Application of an Electronic Spatiotemporal Expression Profile and Gene Ontology Analysis Platform Based on the EST Database of the Silkworm, Bombyx mori  
An Expressed Sequence Tag (EST) is a short sub-sequence of a transcribed cDNA sequence. ESTs reflect gene expression and provide useful clues for gene expression analysis. Based on EST data obtained from NCBI, an EST analysis package (apEST) was developed. This tool was programmed for electronic expression, protein annotation and Gene Ontology (GO) category analysis in Bombyx mori (L.) (Lepidoptera: Bombycidae). A total of 245,761 ESTs (as of 01 July 2009) were searched and downloaded in FASTA format, and information on tissue type, developmental stage, sex and strain was extracted, classified and summed by running apEST. Corresponding distribution profiles were then formed after redundant entries had been removed. Gene expression profiles were obtained for a single tissue across developmental stages and for a single developmental stage across tissues. A housekeeping gene and tissue- and stage-specific genes were selected by running apEST, and the results were compared with two other online analysis approaches: the microarray-based gene expression profile on SilkDB (BmMDB) and the EST profile on NCBI. A spatiotemporal expression profile of catalase generated by apEST was then presented as a three-dimensional graph for intuitive visualization of patterns. A total of 37 genes confirmed by microarray data and RT-PCR experiments were selected as queries to test apEST. The results were highly consistent among the three approaches. Nevertheless, there were minor differences between apEST and BmMDB because of differences in the items investigated; complementary analysis was therefore proposed. Application of apEST also yielded protein annotations for the EST datasets and, eventually, their functions. The results were presented as statistics on protein annotation and Gene Ontology (GO) category. Together, these results verified the reliability of apEST and the operability of this platform.
The apEST can also be applied in other species by modifying some parameters and serves as a model for gene expression study for Lepidoptera.
PMCID: PMC3016962  PMID: 20874595
EST analysis package; UniGene; Lepidoptera
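Entry 7 describes apEST extracting tissue type, developmental stage, sex and strain from downloaded FASTA records, removing redundant entries, and summing counts into profiles for one tissue across stages or one stage across tissues. The Python sketch below is not apEST's code. It assumes a hypothetical header layout of `>ACCESSION key=value ...` purely for illustration; real NCBI FASTA headers are formatted differently.

```python
from collections import Counter

# Hypothetical header layout ">ACCESSION key=value key=value ...".
# Real NCBI/dbEST FASTA headers differ; this format is an assumption.


def parse_headers(fasta_text: str) -> list:
    """Extract one attribute dict per unique accession from FASTA headers."""
    records = []
    seen = set()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue
        accession = fields[0]
        if accession in seen:
            continue  # drop redundant entries for the same accession
        seen.add(accession)
        attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
        attrs["accession"] = accession
        records.append(attrs)
    return records


def stage_profile(records: list, tissue: str) -> Counter:
    """Count ESTs per developmental stage for a single tissue."""
    return Counter(
        r.get("stage", "unknown") for r in records if r.get("tissue") == tissue
    )


def tissue_profile(records: list, stage: str) -> Counter:
    """Count ESTs per tissue for a single developmental stage."""
    return Counter(
        r.get("tissue", "unknown") for r in records if r.get("stage") == stage
    )
```

Deduplicating on accession before counting keeps a repeated record from inflating one tissue or stage.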
8.  Comparative Gene Expression Analysis of Susceptible and Resistant Near-Isogenic Lines in Common Wheat Infected by Puccinia triticina 
Gene expression after leaf rust infection was compared in near-isogenic wheat lines differing in the Lr10 leaf rust resistance gene. RNA from susceptible and resistant plants was used for cDNA library construction. In total, 55 008 ESTs were sequenced from the two libraries, then combined and assembled into 14 268 unigenes for further analysis. Of these ESTs, 89% encoded proteins similar (E value ≤ 10⁻⁵) to characterized or annotated proteins from the NCBI non-redundant database, representing diverse molecular functions, cellular localizations and biological processes based on Gene Ontology classification. Further, the unigenes were classified into susceptible and resistant classes based on the EST members assembled from the respective libraries. Several genes from the resistant sample (14-3-3 protein, wali5 protein, actin-depolymerization factor and ADP-ribosylation factor) and the susceptible sample (brown plant hopper resistance protein, caffeic acid O-methyltransferase, pathogenesis-related protein and senescence-associated protein) were selected, and their differential expression in resistant and susceptible samples collected at different time points after leaf rust infection was confirmed by RT–PCR analysis. The molecular pathogenicity of leaf rust in wheat was studied, and the EST data generated provide a foundation for future studies.
PMCID: PMC2920755  PMID: 20360266
wheat; leaf rust; ESTs; resistance; susceptible
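Entry 8 applies an E-value cutoff of ≤10⁻⁵ for similarity and classifies unigenes by the libraries their member ESTs came from. A minimal Python sketch of both steps follows. Only the cutoff comes from the abstract; the data layout, the library labels "R"/"S", and the "shared" class for mixed-origin unigenes are hypothetical.

```python
from typing import Iterable, Optional

# Cutoff from the abstract (E value <= 1e-5). Everything else here,
# including the "R"/"S" labels and the "shared" class, is hypothetical.
E_VALUE_CUTOFF = 1e-5


def is_annotated(best_hit_evalue: Optional[float]) -> bool:
    """True if the best similarity-search hit passes the E-value cutoff."""
    return best_hit_evalue is not None and best_hit_evalue <= E_VALUE_CUTOFF


def classify_unigene(member_libraries: Iterable[str]) -> str:
    """Assign a unigene to a class from the libraries of its member ESTs.

    "R" marks ESTs from the resistant library, "S" from the susceptible one.
    """
    libs = set(member_libraries)
    if not libs:
        raise ValueError("a unigene must contain at least one EST")
    if libs == {"R"}:
        return "resistant"
    if libs == {"S"}:
        return "susceptible"
    return "shared"  # assumed label for unigenes with ESTs from both libraries
```

Treating unigenes with members from both libraries as a separate class avoids forcing them into either the resistant or susceptible group.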
9.  Analysis of the Asian Seabass Transcriptome Based on Expressed Sequence Tags 
Analysis of transcriptomes is of great importance in genomic studies. Asian seabass is an important fish species. A number of genomic tools have been developed for it, but large-scale expressed sequence tag (EST) data are lacking. We sequenced ESTs from nine normalized cDNA libraries and obtained 11 431 high-quality ESTs. We retrieved 8524 ESTs from the dbEST database and analyzed all 19 975 ESTs using bioinformatics tools. After clustering, we obtained 8837 unique sequences (2838 contigs and 5999 singletons). The average contig length was 574 bp. Annotation of these unique sequences revealed that 48.9% of them showed significant homology to RNA sequences in GenBank. Functional classification of the unique ESTs identified a broad range of genes involved in different functions. We identified 6114 putative single-nucleotide polymorphisms and 634 microsatellites in the ESTs. We discovered different temporal and spatial expression patterns of some immune-related genes in Asian seabass after challenge with the pathogen Vibrio harveyi. The unique EST sequences are being used to develop a cDNA microarray to examine global gene expression and will also facilitate future whole-genome sequence assembly and annotation of Asian seabass, as well as comparative genomics.
PMCID: PMC3223082  PMID: 22086997
Asian seabass; EST; function; expression
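Entry 9 reports clustering results as contigs plus singletons. The Python sketch below shows that bookkeeping, assuming a hypothetical input that maps each EST id to the cluster it was placed in. Clusters with two or more ESTs count as contigs and single-EST clusters as singletons. It does not perform the clustering itself.

```python
from collections import Counter


def summarize_clusters(assignments: dict) -> dict:
    """Summarize an EST-to-cluster assignment (hypothetical input format).

    assignments maps each EST id to the id of the cluster it was placed in.
    Clusters with two or more ESTs are assembled into contigs; clusters with
    a single EST remain singletons. Unique sequences = contigs + singletons.
    """
    sizes = Counter(assignments.values())
    contigs = sum(1 for n in sizes.values() if n > 1)
    singletons = sum(1 for n in sizes.values() if n == 1)
    return {"contigs": contigs, "singletons": singletons, "unique": contigs + singletons}


if __name__ == "__main__":
    example = {"e1": "c1", "e2": "c1", "e3": "c2", "e4": "c3", "e5": "c3", "e6": "c3"}
    print(summarize_clusters(example))
```

This is the same accounting as in the abstract: 2,838 contigs plus 5,999 singletons give the 8,837 unique sequences reported.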
10.  Transcriptional Profiling of ESTs from the Biocontrol Fungus Chaetomium cupreum 
The Scientific World Journal  2012;2012:340565.
Comparative analysis was applied to two cDNA/EST libraries (C1 and C2) from Chaetomium cupreum. A total of 5538 ESTs were sequenced and assembled into 2162 unigenes, comprising 585 contigs and 1577 singletons. BLASTX analysis identified 1211 unigenes with similarity to sequences in public databases. An MFS monosaccharide transporter was the most highly expressed gene in library C2 but was not expressed in C1. The majority of unigenes were library-specific. Comparative analysis of the ESTs further revealed differences in gene expression and metabolic pathways of C. cupreum between the libraries. Two different sequences, similar to the 48-kDa and 46-kDa endochitinases, were identified in libraries C1 and C2, respectively.
PMCID: PMC3289965  PMID: 22448129
11.  Pepper EST database: comprehensive in silico tool for analyzing the chili pepper (Capsicum annuum) transcriptome 
BMC Plant Biology  2008;8:101.
There is no dedicated database available for Expressed Sequence Tags (EST) of the chili pepper (Capsicum annuum), although the interest in a chili pepper EST database is increasing internationally due to the nutritional, economic, and pharmaceutical value of the plant. Recent advances in high-throughput sequencing of the ESTs of chili pepper cv. Bukang have produced hundreds of thousands of complementary DNA (cDNA) sequences. Therefore, a chili pepper EST database was designed and constructed to enable comprehensive analysis of chili pepper gene expression in response to biotic and abiotic stresses.
We built the Pepper EST database to mine the complexity of chili pepper ESTs. The database was built on 122,582 sequenced ESTs and 116,412 refined ESTs from 21 pepper EST libraries. The ESTs were clustered and assembled into virtual consensus cDNAs and the cDNAs were assigned to metabolic pathway, Gene Ontology (GO), and MIPS Functional Catalogue (FunCat). The Pepper EST database is designed to provide a workbench for (i) identifying unigenes in pepper plants, (ii) analyzing expression patterns in different developmental tissues and under conditions of stress, and (iii) comparing the ESTs with those of other members of the Solanaceae family. The Pepper EST database is freely available at .
The Pepper EST database is expected to provide a high-quality resource, which will contribute to gaining a systemic understanding of plant diseases and facilitate genetics-based population studies. The database is also expected to contribute to analysis of gene synteny as part of the chili pepper sequencing project by mapping ESTs to the genome.
PMCID: PMC2575210  PMID: 18844979
12.  A comparison of expressed sequence tags (ESTs) to human genomic sequences. 
Nucleic Acids Research  1997;25(8):1626-1632.
The Expressed Sequence Tag (EST) division of GenBank, dbEST, is a large repository of the data being generated by human genome sequencing centers. ESTs are short, single pass cDNA sequences generated from randomly selected library clones. The approximately 415 000 human ESTs represent a valuable, low priced, and easily accessible biological reagent. As many ESTs are derived from yet uncharacterized genes, dbEST is a prime starting point for the identification of novel mRNAs. Conversely, other genes are represented by hundreds of ESTs, a redundancy which may provide data about rare mRNA isoforms. Here we present an analysis of >1000 ESTs generated by the WashU-Merck EST project. These ESTs were collected by querying dbEST with the genomic sequences of 15 human genes. When we aligned the matching ESTs to the genomic sequences, we found that in one gene, 73% of the ESTs which derive from spliced or partially spliced transcripts either contain intron sequences or are spliced at previously unreported sites; other genes have lower percentages of such ESTs, and some have none. This finding suggests that ESTs could provide researchers with novel information about alternative splicing in certain genes. In a related analysis of pairs of ESTs which are reported to derive from a single gene, we found that as many as 26% of the pairs do not BOTH align with the sequence of the same gene. We suspect that some of these unusual ESTs result from artifacts in EST generation, and caution researchers that they may find such clones while analyzing sequences in dbEST.
PMCID: PMC146621  PMID: 9092672
13.  PEDB: the Prostate Expression Database. 
Nucleic Acids Research  1999;27(1):204-208.
The Prostate Expression Database (PEDB) is a curated relational database and suite of analysis tools designed for the study of prostate gene expression in normal and disease states. Expressed Sequence Tags (ESTs) and full-length cDNA sequences derived from more than 40 human prostate cDNA libraries are maintained and represent a wide spectrum of normal and pathological conditions. Detailed library information including tissue source, library construction methods, sequence diversity and abundance are available in a library archive. Prostate ESTs are assembled into distinct species groups using the multiple alignment program CAP2 and are annotated with information from the GenBank, dbEST and Unigene public sequence databases. Annotated sequences in PEDB are searched using the BLAST algorithm. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library species comparisons. PEDB may be accessed via the World Wide Web at
PMCID: PMC148136  PMID: 9847181
14.  Generation, annotation and analysis of ESTs from Trichoderma harzianum CECT 2413 
BMC Genomics  2006;7:193.
The filamentous fungus Trichoderma harzianum is used as biological control agent of several plant-pathogenic fungi. In order to study the genome of this fungus, a functional genomics project called "TrichoEST" was developed to give insights into genes involved in biological control activities using an approach based on the generation of expressed sequence tags (ESTs).
Eight different cDNA libraries from T. harzianum strain CECT 2413 were constructed under different growth conditions, mainly involving different nutrient conditions and/or stresses. We here present the analysis of the 8,710 ESTs generated. A total of 3,478 unique sequences were identified, of which 81.4% had sequence similarity with GenBank entries using the BLASTX algorithm. Using the Gene Ontology hierarchy, we annotated 51.1% of the unique sequences and compared their distribution among the cDNA libraries. Additionally, the InterProScan algorithm was used to further characterize the sequences. Putatively secreted proteins were also identified. Based on EST abundance, we then examined the highly expressed genes and identified a hydrophobin as the gene expressed at the highest level. We compared our collection of ESTs with previous collections obtained from Trichoderma species, and we also compared our sequence set with the complete genomes of several eukaryotes, including animals, plants and fungi. In this way, the presence of similar sequences in different kingdoms was also studied.
This EST collection and its annotation provide a significant resource for basic and applied research on T. harzianum, a fungus with a high biotechnological interest.
PMCID: PMC1562415  PMID: 16872539
15.  The human (PEDB) and mouse (mPEDB) Prostate Expression Databases 
Nucleic Acids Research  2002;30(1):218-220.
The Prostate Expression Databases (PEDB and mPEDB) are online resources designed to allow researchers to access and analyze gene expression information derived from the human and murine prostate, respectively. Human PEDB archives more than 84 000 Expressed Sequence Tags (ESTs) from 38 prostate cDNA libraries in a curated relational database that provides detailed library information including tissue source, library construction methods, sequence diversity and sequence abundance. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library sequence comparisons. Recent enhancements to PEDB include (i) the development of a murine prostate expression database, mPEDB, that complements the human gene expression information in PEDB, (ii) the assembly of a non-redundant sequence set or ‘prostate unigene’ that represents the diversity of gene expression in the prostate, and (iii) an expanded search tool that supports both text-based and BLAST queries. PEDB and mPEDB are accessible via the World Wide Web at and
PMCID: PMC99083  PMID: 11752298
16.  Gene Discovery in the Auditory System: Characterization of Additional Cochlear-Expressed Sequences  
To identify genes involved in hearing, 8494 expressed sequence tags (ESTs) were generated from a human fetal cochlear cDNA library in two distinct sequencing projects. Analysis of the first set of 4304 ESTs revealed clones representing 517 known human genes, 41 mammalian genes not previously detected in human tissues, 487 ESTs from other human tissues, and 541 cochlear-specific ESTs ( ). We now report results of a DNA sequence similarity (BLAST) analysis of an additional 4190 cochlear ESTs and a comparison to the first set. Among the 4190 new cochlear ESTs, 959 known human genes were identified; 594 were found only among the new ESTs and 365 were found among ESTs from both sequencing projects. COL1A2 was the most abundant transcript among both sets of ESTs, followed in order by COL3A1, SPARC, EEF1A1, and TPT1. An additional 22 human homologs of known nonhuman mammalian genes and 1595 clusters of ESTs, of which 333 are cochlear-specific, were identified among the new cochlear ESTs. Map positions were determined for 373 of the new cochlear ESTs and revealed 318 additional loci. Forty-nine of the mapped ESTs are located within the genetic intervals of 23 deafness loci. Reanalysis of unassigned ESTs from the prior study revealed 338 additional known human genes. The total number of known human genes identified from 8494 cochlear ESTs is 1449 and is represented by 4040 ESTs. Among the known human genes are 14 deafness-associated genes, including GJB2 (connexin 26) and KVLQT1. The total number of nonhuman mammalian genes identified is 43 and is represented by 58 ESTs. The total number of ESTs without sequence similarity to known genes is 4055. Of these, 778 also do not have sequence similarity to any other ESTs, are categorized into 700 clusters, and may represent genes uniquely or preferentially expressed in the cochlea.
Identification of additional known genes, ESTs, and cochlear-specific ESTs provides new candidate genes for both syndromic and nonsyndromic deafness disorders.
PMCID: PMC3202364  PMID: 12083723
ESTs; genes; cochlea; cochlear-expressed genes
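Entry 16 counts mapped ESTs that fall within the genetic intervals of known deafness loci, and entries 1 and 2 make the same kind of comparison for retinal disease loci. A minimal Python sketch of that interval test follows. The locus names, coordinates and map units are hypothetical placeholders, not real deafness-locus boundaries.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str     # hypothetical locus label
    chrom: str
    start: float  # interval bounds in whatever map units the panel uses
    end: float


def ests_in_loci(est_positions: dict, loci: list) -> dict:
    """Return {locus name: [EST ids]} for ESTs whose map position falls
    inside the locus interval on the same chromosome (bounds inclusive)."""
    hits = {}
    for est_id, (chrom, pos) in est_positions.items():
        for locus in loci:
            if locus.chrom == chrom and locus.start <= pos <= locus.end:
                hits.setdefault(locus.name, []).append(est_id)
    return hits
```

Because locus intervals can overlap, one EST may be reported under several loci. Counting distinct EST ids across all hits gives a figure comparable to "49 mapped ESTs within 23 loci".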
17.  A White Campion (Silene latifolia) floral expressed sequence tag (EST) library: annotation, EST-SSR characterization, transferability, and utility for comparative mapping 
BMC Genomics  2009;10:243.
Expressed sequence tag (EST) databases represent a valuable resource for the identification of genes in organisms with uncharacterized genomes and for the development of molecular markers. One class of markers derived from EST sequences is simple sequence repeat (SSR) markers, also known as EST-SSRs. These are useful in plant genetic and evolutionary studies because they are located in transcribed genes, and a putative function can often be inferred from homology searches. Another important feature of EST-SSR markers is their expected high level of transferability to related species, which makes them very promising for comparative mapping. In the present study we constructed a normalized EST library from floral tissue of Silene latifolia with the aim of identifying expressed genes and developing polymorphic molecular markers.
We obtained a total of 3662 high-quality sequences from a normalized Silene cDNA library. These represent 3105 unigenes, 73% of which match genes in other species. We found 255 sequences containing one or more SSR motifs. More than 60% of these SSRs were trinucleotide repeats. A total of 30 microsatellite loci were identified from 106 ESTs having sufficient flanking sequence for primer design. The inheritance of these loci was tested via segregation analyses, and their usefulness for linkage mapping was assessed in an interspecific cross. Tests for cross-amplification of the EST-SSR loci in other Silene species established their applicability to related species.
The newly characterized genes and gene-derived markers from our Silene EST library represent a valuable genetic resource for future studies on Silene latifolia and related species. The polymorphism and transferability of EST-SSR markers facilitate comparative linkage mapping and analyses of genetic diversity in the genus Silene.
PMCID: PMC2689282  PMID: 19467153
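The abstract for entry 17 does not state the criteria used to detect SSR motifs. The Python sketch below is a generic regex-based detector for perfect di- to hexanucleotide repeats. The minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, not the study's settings.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str) -> list:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer captured in group 1, then at least (min_rep - 1) copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # which are reported under the shorter motif length or are homopolymers.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    est = "GGG" + "CA" * 7 + "TTT" + "AGC" * 5 + "GGG"
    print(find_ssrs(est))
```

The backreference `\1` forces every repeat unit to be identical to the first, so only perfect (uninterrupted) repeats are found. Compound or imperfect SSRs would need a more tolerant search.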
18.  The transcriptome analysis of early morphogenesis in Paracoccidioides brasiliensis mycelium reveals novel and induced genes potentially associated to the dimorphic process 
BMC Microbiology  2007;7:29.
Paracoccidioides brasiliensis is a human pathogen with a broad distribution in Latin America. The fungus is thermally dimorphic, with two distinct forms corresponding to completely different lifestyles. When the temperature rises to that of the mammalian body, the fungus adopts a yeast-like form that is exclusively associated with its pathogenic lifestyle. We describe an expressed sequence tag (EST) analysis to assess the expression profile of the mycelium-to-yeast transition. To identify P. brasiliensis sequences differentially expressed during conversion, we performed a large-scale comparative analysis between the ESTs identified in the transition transcriptome and databases.
Our analysis was based on 1107 ESTs from a transition cDNA library of P. brasiliensis. A total of 639 consensus sequences were assembled. Genes of primary metabolism, energy, protein synthesis and fate, cellular transport, and biogenesis of cellular components were represented in the transition cDNA library. A considerable number of genes (7.51%) had not been previously reported for P. brasiliensis in public databases. Gene expression analysis using in silico EST subtraction revealed that numerous genes were more highly expressed during the transition phase than in the mycelial ESTs [1]. Classes of differentially expressed sequences selected for further analysis included genes related to the synthesis/remodeling of the cell wall/membrane; thirty-four genes from this family were induced. Ten genes related to signal transduction and twelve genes encoding putative virulence factors also showed increased expression. The in silico approach was validated by northern blot and semi-quantitative RT-PCR.
The developmental program of P. brasiliensis is characterized by significant positive differential modulation of cell wall/membrane-related transcripts and signal transduction proteins, suggesting that these processes are important contributors to dimorphism. Putative virulence factors are also more highly expressed during the transition, suggesting adaptation of the incoming parasitic yeast phase to the host. These genes provide ideal candidates for further studies aimed at understanding fungal morphogenesis and its regulation.
PMCID: PMC1855332  PMID: 17425801
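The abstract for entry 18 does not specify how the in silico EST subtraction was scored. As a hedged illustration only, the Python sketch below compares library-normalized EST frequencies per gene between a transition library and a mycelium library. It reports genes whose frequency is at least `min_ratio` times higher in the transition library. The pseudocount, the 2-fold default and the input layout are assumptions, and this simple ratio is not a significance test.

```python
def induced_genes(transition_counts: dict, mycelium_counts: dict,
                  min_ratio: float = 2.0, pseudocount: int = 1) -> dict:
    """Return {gene: ratio} for genes whose normalized EST frequency is at
    least min_ratio times higher in the transition library, sorted by ratio
    (highest first). Counts are ESTs per gene in each library."""
    total_t = sum(transition_counts.values())
    total_m = sum(mycelium_counts.values())
    if total_t == 0 or total_m == 0:
        raise ValueError("both libraries must contain ESTs")
    induced = {}
    for gene in set(transition_counts) | set(mycelium_counts):
        # The pseudocount keeps genes absent from one library from
        # producing a zero frequency (and a division by zero).
        freq_t = (transition_counts.get(gene, 0) + pseudocount) / total_t
        freq_m = (mycelium_counts.get(gene, 0) + pseudocount) / total_m
        ratio = freq_t / freq_m
        if ratio >= min_ratio:
            induced[gene] = ratio
    return dict(sorted(induced.items(), key=lambda kv: kv[1], reverse=True))
```

Normalizing by library size matters because the two libraries are unlikely to contain the same number of ESTs. With small counts, a statistical test would be needed before calling a gene induced.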
19.  Bioinformatic analysis of fruit-specific expressed sequence tag libraries of Diospyros kaki Thunb.: view at the transcriptome at different developmental stages 
3 Biotech  2011;1(1):35-45.
We present here a systematic analysis of the Diospyros kaki expressed sequence tags (ESTs) generated from development stage-specific libraries. A total of 2,529 tentative unigenes were identified in the MF library, whereas the OYF library contained 3,775 tentative unigenes. In the MF library, 325 EST simple sequence repeats (EST-SSRs) were detected in 296 putative unigenes, an occurrence of 11.7% with a frequency of 1 SSR per 3.16 kb, whereas the OYF library had an EST-SSR occurrence of 10.8%, with 407 EST-SSRs in 352 putative unigenes and a frequency of 1 SSR per 2.92 kb. We observed a higher frequency of SNPs and indels in the OYF library (20.94 SNPs/indels per 100 bp) than in the MF library (0.74 SNPs/indels per 100 bp). A combined homology and secondary structure analysis identified a potential miRNA precursor, an ortholog of miR159, and potential miR159 targets in the development-specific ESTs of D. kaki.
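The SSR figures above combine two measures: occurrence (the share of unigenes carrying at least one SSR, which reproduces the MF value, 296/2,529 ≈ 11.7%) and frequency (kilobases of sequence per SSR). The Python sketch below detects perfect mono- to hexanucleotide repeats with a regular-expression backreference and computes both measures. The minimum-repeat thresholds and all function names are assumptions for illustration; the abstract does not state the detection criteria or tool used.

```python
import re

# Minimum number of tandem copies per motif length (1-6 bp). These thresholds
# are an assumption for illustration; the abstract does not state the criteria used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_compound(unit):
    """True if a unit is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    k = len(unit)
    return any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return (start, unit, copies) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # ([ACGT]{k}) captures a unit; \1{n,} requires at least n further copies of it
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if not _is_compound(unit):  # avoid reporting an A-run again as (AA)n
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def ssr_stats(unigenes):
    """Summarize SSRs across a dict of unigene name -> sequence."""
    total_ssrs = with_ssr = total_bp = 0
    for seq in unigenes.values():
        n = len(find_ssrs(seq))
        total_ssrs += n
        with_ssr += n > 0
        total_bp += len(seq)
    # Occurrence: percentage of unigenes carrying at least one SSR
    occurrence = 100 * with_ssr / len(unigenes)
    # Frequency: kilobases of sequence per SSR
    kb_per_ssr = (total_bp / 1000) / total_ssrs if total_ssrs else float("inf")
    return {"ssrs": total_ssrs, "unigenes_with_ssr": with_ssr,
            "occurrence_pct": occurrence, "kb_per_ssr": kb_per_ssr}


example = {"u1": "GG" + "AT" * 6 + "CCC" + "A" * 10 + "G", "u2": "ACGTTGCA"}
print(find_ssrs(example["u1"]))  # [(2, 'AT', 6), (17, 'A', 10)]
print(ssr_stats(example))
```

Only perfect repeats are found; imperfect or compound microsatellites, which dedicated SSR finders often report, are outside this sketch.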
Electronic supplementary material
The online version of this article (doi:10.1007/s13205-011-0005-9) contains supplementary material, which is available to authorized users.
PMCID: PMC3339603  PMID: 22558534
Diospyros kaki; Expressed sequence tag; GC3 biology; MicroRNA; SSRs; SSR-FDM; SNPs; Chemistry; Biotechnology; Stem Cells; Biomaterials; Bioinformatics; Agriculture; Cancer Research
21.  Analysis of Expressed Sequence Tags and Characterization of a Novel Gene, Slmg7, in the Midgut of the Common Cutworm, Spodoptera litura 
PLoS ONE  2012;7(3):e33621.
Of a total of 3,081 assembled expressed sequence tag (EST) sequences, representing 6,815 high-quality ESTs identified in three cDNA libraries constructed with RNA isolated from the midgut of Spodoptera litura, 1,039 ESTs showed significant hits and 1,107 ESTs did not show significant hits in BLAST searches. It is of interest to determine whether these no-hit ESTs are functional in S. litura.
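Sorting ESTs into "hit" and "no-hit" groups, as described above, can be done from BLAST tabular output. The Python sketch below assumes the default 12-column `-outfmt 6` layout of BLAST+, in which the e-value is the 11th column; the e-value cut-off of 1e-5, the function name and the subject IDs in the example are hypothetical, since the abstract does not define what counted as a significant hit.

```python
def split_hits(query_ids, blast_tab_lines, max_evalue=1e-5):
    """Split ESTs into 'hit' and 'no-hit' lists from BLAST tabular output.

    Assumes the default 12-column -outfmt 6 layout, where the e-value is the
    11th column (index 10). The e-value cut-off is an assumed example.
    """
    significant = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            significant.add(fields[0])  # qseqid
    hits = [q for q in query_ids if q in significant]
    # ESTs with no alignment at all, or only weak ones, end up in the no-hit list
    no_hits = [q for q in query_ids if q not in significant]
    return hits, no_hits


# Hypothetical BLAST rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
lines = [
    "est1\tsp|P00001\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "est2\tsp|P00002\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t20",
]
print(split_hits(["est1", "est2", "est3"], lines))  # (['est1'], ['est2', 'est3'])
```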
Twenty “no-hit” ESTs containing at least one putative open reading frame were selected for further expression analysis. Northern blot analysis showed that six of the selected ESTs are expressed in the larval midgut of this insect at different levels, suggesting that these ESTs represent true mRNA products, whereas the other 14 ESTs could not be detected. Homologues of the four larval midgut-predominant genes (Slmg2, Slmg7, Slmg9 and Slmg17) were detected in the genomes of other lepidopteran insects but not in Drosophila melanogaster. A novel gene, Slmg7, is expressed at a high level specifically in the midgut during each of the larval stages. Slmg7 is a single-copy gene and encodes a 143-amino-acid protein. The SLMG7 protein was localized to the cytoplasm of Spli-221 cells.
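Selecting no-hit ESTs that contain "at least one putative open reading frame" requires scanning all six reading frames for a start codon followed by an in-frame stop. The Python sketch below returns the longest ATG-to-stop ORF on either strand; the 30-amino-acid minimum and the function name are assumptions, and the abstract does not describe the authors' ORF criteria. Because ESTs are often partial, a stricter or looser rule (for example, accepting ORFs that run off the end of the read) may be more appropriate in practice.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf(seq, min_aa=30):
    """Return the longest ATG...stop ORF on either strand, or '' if none reaches min_aa."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    best = ""
    for strand in (seq, reverse_complement):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None  # look for the next ORF in this frame
    # Amino-acid length excludes the stop codon
    return best if len(best) // 3 - 1 >= min_aa else ""


est = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"  # toy EST with a 41-codon ORF
orf = longest_orf(est)
print(len(orf), len(orf) // 3 - 1)  # 126 41
```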
Six ESTs from the no-hit list are transcribed into mRNA and are mainly expressed in the midgut of S. litura. Slmg7 is a novel gene whose protein product is localized to the cytoplasm.
PMCID: PMC3314667  PMID: 22470457
22.  Identification of transcripts involved in meiosis and follicle formation during ovine ovary development 
BMC Genomics  2008;9:436.
The key steps in germ cell survival during ovarian development are the entry of oogonia into meiosis and the formation of primordial follicles, which then determine the reproductive lifespan of the ovary. In sheep, these steps occur during fetal life, around day 55 and day 80 of gestation, respectively. The aim of this study was to identify ovarian genes differentially expressed during prophase I of meiosis and early folliculogenesis in sheep.
To elucidate the molecular events associated with early ovarian differentiation, we generated two ovary stage-specific subtracted cDNA libraries using suppression subtractive hybridization (SSH). Large-scale sequencing of these SSH libraries identified 6,080 ESTs representing 2,535 contigs. Clustering and assembly of these ESTs resulted in a total of 2,101 unique sequences, comprising 1,305 singletons (62.11%) and 796 contigs (37.9%). BLASTX evaluation indicated that 99% of the ESTs were homologous to various known genes/proteins in a broad range of organisms, especially ovine, bovine and human species. The remaining 1%, which did not show any homology to known gene sequences, were considered novel. A detailed RT-PCR study of the expression patterns of some of these genes revealed promising new candidate genes for ovary differentiation in sheep.
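The split of unique sequences into contigs and singletons follows from grouping overlapping ESTs. The Python sketch below is a deliberately simplified, hypothetical stand-in for that clustering step: it links any two ESTs that share an exact k-mer and merges the links with a union-find structure. It is not the method used in the study; dedicated EST assemblers such as CAP3 also handle reverse complements, mismatches and consensus building, none of which this sketch does. The k-mer size and function name are assumptions.

```python
def cluster_ests(ests, k=20):
    """Group ESTs sharing at least one exact k-mer; return (contigs, singletons).

    ests: dict of EST id -> sequence. Reverse complements are ignored.
    """
    ids = list(ests)
    parent = {est_id: est_id for est_id in ids}

    def find(x):
        # Path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # k-mer -> first EST in which it was observed
    for est_id in ids:
        seq = ests[est_id].upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                union(first_seen[kmer], est_id)
            else:
                first_seen[kmer] = est_id

    groups = {}
    for est_id in ids:
        groups.setdefault(find(est_id), []).append(est_id)
    contigs = [g for g in groups.values() if len(g) > 1]
    singletons = [g[0] for g in groups.values() if len(g) == 1]
    return contigs, singletons


toy = {"e1": "ACGTACCGTT", "e2": "ACCGTTGGCA", "e3": "TTTTTGGGGG"}
print(cluster_ests(toy, k=5))  # ([['e1', 'e2']], ['e3'])
```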
We showed that the SSH approach was effective for identifying new mammalian genes that might be involved in oogenesis and early follicle development, and that it enabled the discovery of potential new oocyte and granulosa cell markers for future studies. These genes may have significant implications for our understanding of ovarian function in molecular terms and for the development of innovative strategies to both promote and control fertility.
PMCID: PMC2566313  PMID: 18811939
23.  Gene identification and analysis of transcripts differentially regulated in fracture healing by EST sequencing in the domestic sheep 
BMC Genomics  2006;7:172.
The sheep is an important model animal for testing novel fracture treatments and other medical applications. Despite these medical uses and the well known economic and cultural importance of the sheep, relatively little research has been performed into sheep genetics, and DNA sequences are available for only a small number of sheep genes.
In this work we have sequenced over 47,000 expressed sequence tags (ESTs) from libraries developed from healing bone in a sheep model of fracture healing. These ESTs were clustered together with the 10,000 previously available sheep ESTs, yielding a total of 19,087 contigs with an average length of 603 nucleotides. We used the newly identified sequences to develop RT-PCR assays for 78 sheep genes and measured differential expression over the course of fracture healing between days 7 and 42 postfracture. All genes showed significant shifts at one or more time points. Twenty-three of the genes were differentially expressed between postfracture days 7 and 10, which could reflect an important role for these genes in the initiation of osteogenesis.
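Differential expression between two time points, such as days 7 and 10 postfracture, is often reported as a relative fold change. The Python sketch below uses the 2^-ΔΔCt method for quantitative real-time RT-PCR; this is an assumption, since the abstract does not state how expression was quantified. The method also assumes near-100% amplification efficiency for both the target and the reference gene, and the Ct values in the example are hypothetical.

```python
from statistics import mean


def fold_change_ddct(target_test, reference_test, target_calibrator, reference_calibrator):
    """Relative expression by the 2^-ddCt method from replicate Ct values.

    Assumes near-100% amplification efficiency for both target and reference.
    """
    # Normalize the target gene to the reference gene within each sample
    delta_test = mean(target_test) - mean(reference_test)
    delta_calibrator = mean(target_calibrator) - mean(reference_calibrator)
    # Each cycle of difference corresponds to a two-fold change in template
    return 2 ** -(delta_test - delta_calibrator)


# Hypothetical Ct values: day 10 (test) versus day 7 (calibrator)
print(fold_change_ddct([20.0, 20.2], [15.0, 15.2], [22.0, 22.2], [15.0, 15.2]))  # about 4.0
```

A lower target Ct at day 10 relative to the reference gene means more transcript, so the example reports roughly a four-fold increase.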
The sequences we have identified in this work are a valuable resource for future studies on musculoskeletal healing and regeneration using sheep and represent an important head start for genomic sequencing projects for Ovis aries, with partial or complete sequences being made available for over 5,800 previously unsequenced sheep genes.
PMCID: PMC1578570  PMID: 16822315
24.  The Human EST Ontology Explorer: a tissue-oriented visualization system for ontologies distribution in human EST collections 
BMC Bioinformatics  2009;10(Suppl 12):S2.
The NCBI dbEST currently contains more than eight million human Expressed Sequence Tags (ESTs). This large collection is an important source of information for gene expression studies, provided it can be inspected according to biologically relevant criteria. EST data can be browsed using various dedicated web resources, which allow users to investigate library-specific gene expression levels and to compare libraries, highlighting significant differences in gene expression. Nonetheless, no tool is available to examine quantitative distributions of EST collections across Gene Ontology (GO) categories, or to retrieve information on library-dependent EST involvement in metabolic pathways. In this work we present the Human EST Ontology Explorer (HEOE), a web facility for comparing expression levels among libraries from several healthy and diseased tissues.
The HEOE provides library-dependent statistics on the distribution of sequences in the GO Directed Acyclic Graph (DAG), which can be browsed at each GO hierarchical level. The tool is based on large-scale BLAST annotation of EST sequences. Because of the huge number of input sequences, this BLAST analysis was performed with grid computing technology, which is particularly well suited to data-parallel tasks. Based on the resulting annotation, library-specific distributions of ESTs in the GO graph were inferred. A pathway-based search interface was also implemented for a quick evaluation of how libraries are represented in metabolic pathways. EST processing steps were integrated into a semi-automatic procedure that relies on Perl scripts and stores results in a MySQL database. A PHP-based web interface makes it possible to simultaneously visualize, retrieve and compare data from the different libraries. Statistically significant differences in GO categories among user-selected libraries can also be computed.
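Library-dependent GO statistics of this kind rest on two steps: crediting each annotated EST to its GO terms and all of their ancestors in the DAG, and then testing whether a term's share of ESTs differs between two libraries. The Python sketch below shows both steps on toy data. It is not the HEOE code (which the abstract describes as Perl with a MySQL back end); the function names and term IDs are hypothetical, and Fisher's exact test is an assumed choice because the abstract does not name the statistic used.

```python
from math import comb


def propagate_counts(annotations, parents):
    """Count ESTs per GO term, crediting every ancestor of each direct annotation.

    annotations: EST id -> set of directly assigned GO terms.
    parents: GO term -> set of its parent terms in the DAG.
    Each EST is counted at most once per term, even when several paths lead there.
    """
    counts = {}
    for terms in annotations.values():
        closure, stack = set(), list(terms)
        while stack:  # depth-first walk up the DAG
            term = stack.pop()
            if term not in closure:
                closure.add(term)
                stack.extend(parents.get(term, ()))
        for term in closure:
            counts[term] = counts.get(term, 0) + 1
    return counts


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    observed = prob(a)
    support = range(max(0, row1 - (n - col1)), min(row1, col1) + 1)
    # Sum all tables no more probable than the observed one (small tolerance for rounding)
    return min(1.0, sum(p for p in map(prob, support) if p <= observed * (1 + 1e-7)))


def compare_term(term, counts1, n1, counts2, n2):
    """p-value for a GO term's share of the n1 and n2 annotated ESTs of two libraries."""
    a, c = counts1.get(term, 0), counts2.get(term, 0)
    return fisher_exact_two_sided(a, n1 - a, c, n2 - c)


# Toy DAG: GO_d has two parents, both of which lead to GO_root
parents = {"GO_b": {"GO_root"}, "GO_c": {"GO_root"}, "GO_d": {"GO_b", "GO_c"}}
lib1 = {"e1": {"GO_d"}, "e2": {"GO_b"}}
print(sorted(propagate_counts(lib1, parents).items()))
# [('GO_b', 2), ('GO_c', 1), ('GO_d', 1), ('GO_root', 2)]
print(fisher_exact_two_sided(3, 1, 1, 3))  # about 0.4857
```

Counting each EST once per term via the ancestor closure matters in a DAG: without it, an EST annotated to a term with two parents would be counted twice at their shared ancestor.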
The HEOE provides an alternative and complementary way to inspect EST expression levels relative to the approaches currently offered by other resources. Furthermore, BLAST computation on the whole human EST dataset was a suitable test of grid scalability in the context of large-scale bioinformatics analysis. The HEOE currently comprises sequence analysis of 70 non-normalized libraries, providing a comprehensive overview of healthy and diseased tissues. As the analysis procedure can easily be applied to other libraries, the number of represented tissues is expected to increase.
PMCID: PMC2762067  PMID: 19828078
25.  Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum 
Chinese Medicine  2010;5:9.
Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is available. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library.
A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences (contigs and singletons). The unique sequences were annotated according to their similarity to genes or proteins available in public databases. Simple sequence repeats (SSRs) were detected by online analysis.
A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarity to a diverse range of genes. Sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84, were detected for the first time in G. lucidum. Thirteen potential SSR-motif microsatellite loci were also identified.
The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum.
PMCID: PMC2848221  PMID: 20230644

