The emergence of next-generation sequencing technologies has allowed access to the vast amount of information contained in the human genome. This information has contributed to the understanding of individual and population-based variability and improved the understanding of the evolutionary history of different human groups. However, the genome of a representative of the Amerindian populations had not been previously sequenced. Thus, the genome of an individual from a South American tribe was completely sequenced to further the understanding of the genetic variability of Amerindians. A total of 36.8 giga base pairs (Gbp) were sequenced and aligned with the human genome. These Gbp corresponded to 95.92% of the human genome with an estimated miscall rate of 0.0035 per sequenced bp. The data obtained from the alignment were used for SNP (single-nucleotide polymorphism) and INDEL (insertion-deletion) calling, which resulted in the identification of 502,017 polymorphisms, of which 32,275 were potentially new high-confidence SNPs and 33,795 were new INDELs specific to South Native American populations. The authenticity of the sample as a member of the South Native American populations was confirmed through the analysis of the uniparental (maternal and paternal) lineages. The autosomal comparison distinguished the investigated sample from other continental populations and revealed a close relation to the Eastern Asian and Aboriginal Australian populations. Although the findings did not discard the classical model of the settlement of the Americas, they brought new insights into the understanding of human population history. The present study indicates a remarkable genetic variability in human populations that must still be identified and contributes to the understanding of the genetic variability of South Native American populations and of human population history.
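To put the reported miscall rate on the scale commonly used for sequencing quality, the sketch below converts a per-base error probability into a Phred-scaled score. Only the value 0.0035 comes from the abstract; the function names are illustrative, and the abstract does not say that the authors reported their error estimate this way.

```python
import math


def phred_from_error_rate(p: float) -> float:
    """Convert a per-base error probability into a Phred-scaled quality score."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_rate_from_phred(q: float) -> float:
    """Inverse conversion: Phred-scaled score back to an error probability."""
    return 10.0 ** (-q / 10.0)


miscall_rate = 0.0035  # estimated miscall rate per sequenced bp, from the abstract
q = phred_from_error_rate(miscall_rate)
print(f"Q = {q:.1f}")
```

A rate of 0.0035 therefore corresponds to a score between Q20 (1 error in 100) and Q30 (1 error in 1,000).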
Fever is typically treated empirically in rural Mozambique. We examined the distribution and antimicrobial susceptibility patterns of bacterial pathogens isolated from blood-culture specimens, and clinical characteristics of ambulatory HIV-infected febrile patients with and without bacteremia. This analysis was nested within a larger prospective observational study to evaluate the performance of new Mozambican guidelines for fever and anemia in HIV-infected adults (clinical trial registration NCT01681914, www.clinicaltrials.gov); the guidelines were designed to be used by non-physician clinicians who attended ambulatory HIV-infected patients in very resource-constrained peripheral health units. In 2012 (April-September), we recruited 258 HIV-infected adults with documented fever or history of recent fever in three sites within Zambézia Province, Mozambique. Although febrile patients were routinely tested for malaria, blood culture capacity was unavailable in Zambézia prior to study initiation. We confirmed bacteremia in 39 (15.1%) of 258 patients. The predominant organisms were non-typhoid Salmonella, nearly all resistant to multiple first-line antibiotics (ampicillin, chloramphenicol, and trimethoprim-sulfamethoxazole). Features most associated with bacteremia included higher temperature, lower CD4+ T-lymphocyte count, lower hemoglobin, and headache. Introduction of blood cultures allowed us to: 1) confirm bacteremia in a substantial proportion of patients; 2) tailor specific antimicrobial therapy for confirmed bacteremia based on known susceptibilities; 3) make informed choices of presumptive antibiotics for patients with suspected bacteremia; and 4) construct a preliminary clinical profile to help clinicians determine who would most likely benefit from presumptive bacteremia treatment. 
Our findings demonstrate that in resource-limited settings, there is an urgent need to expand local microbiologic capacity to better identify and treat cases of bacteremia in HIV-infected and other patients, and to support surveillance. Data on the prevalence and susceptibility patterns of important pathogens can guide national formulary and prescribing practices.
The use of the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make this feasible, we need computational methods that handle the large amount of information arising from bench to bedside and deal with its heterogeneity. A computational challenge that must be faced is the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information.
We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and the data integration process; and web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a real clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of the head and neck. We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of an ontology-based model to describe the legacy clinical data; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool, as well as other applications, to Chado.
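The Entity-Attribute-Value idea mentioned above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample values are hypothetical and are not taken from Chado or from the Clinical Module; the point is only that a new clinical attribute becomes a new row rather than a new column.

```python
import sqlite3

# Hypothetical EAV layout: one row per (patient, attribute) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    label      TEXT NOT NULL
);
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    term         TEXT NOT NULL UNIQUE   -- ideally a term from a reference ontology
);
CREATE TABLE patient_value (
    patient_id   INTEGER NOT NULL REFERENCES patient(patient_id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value        TEXT NOT NULL,
    PRIMARY KEY (patient_id, attribute_id)
);
""")


def record(conn, patient_id, term, value):
    """Store one attribute value, creating the attribute term on first use."""
    conn.execute("INSERT OR IGNORE INTO attribute (term) VALUES (?)", (term,))
    (attribute_id,) = conn.execute(
        "SELECT attribute_id FROM attribute WHERE term = ?", (term,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO patient_value VALUES (?, ?, ?)",
        (patient_id, attribute_id, str(value)),
    )


def profile(conn, patient_id):
    """Return all recorded attributes of a patient as a dictionary."""
    rows = conn.execute(
        """SELECT a.term, v.value
           FROM patient_value v JOIN attribute a USING (attribute_id)
           WHERE v.patient_id = ?""",
        (patient_id,),
    )
    return dict(rows.fetchall())


# Illustrative data only.
conn.execute("INSERT INTO patient VALUES (1, 'P001')")
record(conn, 1, "tumor_site", "larynx")
record(conn, 1, "age_at_diagnosis", 58)
record(conn, 1, "smoking_status", "former")
print(profile(conn, 1))
```

The trade-off of this layout is that values are stored as text and typed queries become more verbose, which is one reason an ontology-backed attribute table is useful for standardizing terms.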
Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patients’ clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments performed on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
More than 50 mutations in the UBE3A gene (E6-AP ubiquitin protein ligase gene) have been found in Angelman syndrome patients with no deletion, no uniparental disomy, and no imprinting defect.
We here describe a novel UBE3A frameshift mutation in two siblings who have inherited it from their asymptomatic mother. Despite carrying the same UBE3A mutation, the proband shows a more severe phenotype whereas his sister shows a milder phenotype presenting the typical AS features.
We hypothesized that the mutation Leu125Stop causes both severe and milder phenotypes. Potential mechanisms include: i) the proband may have an additional problem (genetic or environmental) besides the UBE3A mutation; ii) since the two siblings have different fathers, the UBE3A mutation may be interacting with a different genetic variant in the proband that, by itself, does not cause problems but in combination with the UBE3A mutation causes the severe phenotype; iii) this UBE3A mutation alone can cause either typical AS or the severe clinical picture seen in the proband.
Angelman syndrome; UBE3A gene; Imprinting; Novel mutation; Distinct phenotypes; HRM
The antidepressant fluoxetine has been under discussion because of its potential influence on cancer risk. It was found to inhibit the development of carcinogen-induced preneoplastic lesions in colon tissue, but the mechanisms of action are not well understood. Therefore, we investigated anti-proliferative effects, and used HT29 colon tumor cells in vitro, as well as C57BL/6 mice exposed to intra-rectal treatment with the carcinogen N-methyl-N’-nitro-N-nitrosoguanidine (MNNG) as models. Fluoxetine increased the percentage of HT29 cells in the G0/G1 phase of cell-cycle, and the expression of p27 protein. This was not related to an induction of apoptosis, reactive oxygen species or DNA damage. In vivo, fluoxetine reduced the development of MNNG-induced dysplasia and vascularization-related dysplasia in colon tissue, which was analyzed by histopathological techniques. An anti-proliferative potential of fluoxetine was observed in epithelial and stromal areas. It was accompanied by a reduction of VEGF expression and of the number of cells with angiogenic potential, such as CD133, CD34, and CD31-positive cell clusters. Taken together, our findings suggest that fluoxetine treatment targets steps of early colon carcinogenesis. This confirms its protective potential, explaining at least partially the lower colon cancer risk under antidepressant therapy.
Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which result from root infection by Fusarium solani and Phytophthora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper.
The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology.
This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms.
Up-regulation of S100A7 (Psoriasin), a small calcium-binding protein, is associated with the development of several types of carcinomas, but its function and its potential to serve as a diagnostic or prognostic marker have not been fully defined. In order to prepare antibodies to the protein for immunohistochemical studies, we produced the recombinant S100A7 protein in E. coli. mRNA extracted from human tracheal tumor tissue was amplified by RT-PCR to provide the coding region of the S100A7 gene. The amplified fragment was cloned in the vector pCR2.1-TOPO and sub-cloned in the expression vector pAE. The protein rS100A7 (His-tag) was expressed in E. coli BL21::DE3, purified by affinity chromatography on an Ni-NTA column, recovered in the 2.0 to 3.5 mg/mL range in culture medium, and used to produce a rabbit polyclonal antibody anti-rS100A7 protein. The profile of this polyclonal antibody was evaluated in a tissue microarray.
The rS100A7 (His-tag) protein was homogeneous by SDS-PAGE and mass spectrometry and was used to produce an anti-recombinant S100A7 (His-tag) rabbit serum (polyclonal antibody anti-rS100A7). The molecular weight of rS100A7 (His-tag) protein determined by linear MALDI-TOF-MS was 12,655.91 Da. The theoretical mass calculated for the protein with the nonapeptide attached to the amino terminus is 12,653.26 Da (delta 2.65 Da). Immunostaining with the generated polyclonal anti-rS100A7 antibody showed reactivity with little or no background staining in head and neck squamous cell carcinoma cells, detecting S100A7 in both the nucleus and the cytoplasm. Lower levels of S100A7 were detected in non-neoplastic tissue.
The polyclonal anti-rS100A7 antibody generated here yielded a good signal-to-noise contrast and should be useful for immunohistochemical detection of S100A7 protein. Its potential use for other epithelial lesions besides human larynx squamous cell carcinoma and non-neoplastic larynx should be explored in the future.
S100A7 (Psoriasin); Recombinant protein; Production of a polyclonal antibody; E. coli BL21::DE3; Mass spectrometry
Although patterns of somatic alterations have been reported for tumor genomes, little is known about how they compare with alterations present in non-tumor genomes. A comparison of the two would be crucial to better characterize the genetic alterations driving tumorigenesis. We sequenced the genomes of a lymphoblastoid (HCC1954BL) and a breast tumor (HCC1954) cell line derived from the same patient and compared the somatic alterations present in both. The lymphoblastoid genome presents a comparable number and similar spectrum of nucleotide substitutions to those found in the tumor genome. However, a significant difference in the ratio of non-synonymous to synonymous substitutions was observed between the two genomes (P = 0.031). Protein–protein interaction analysis revealed that mutations in the tumor genome preferentially affect hub-genes (P = 0.0017) and are co-selected to present synergistic functions (P < 0.0001). KEGG analysis showed that in the tumor genome most mutated genes were organized into signaling pathways related to tumorigenesis. No such organization or synergy was observed in the lymphoblastoid genome. Our results indicate that endogenous mutagens and replication errors can generate the overall number of mutations required to drive tumorigenesis and that it is the combination rather than the frequency of mutations that is crucial to complete tumorigenic transformation.
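One common way to compare the proportion of non-synonymous to synonymous substitutions between two genomes is a test on a 2x2 contingency table. The sketch below implements a two-sided Fisher's exact test with the standard library only. The abstract does not state which test produced P = 0.031, so the choice of test is an assumption, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts, for illustration only:
#                    non-synonymous  synonymous
# tumor genome             30            10
# lymphoblastoid genome    18            20
p_value = fisher_exact_two_sided(30, 10, 18, 20)
print(f"two-sided P = {p_value:.4f}")
```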
A total of 172 persons from nine South Amerindian, three African and one Eskimo populations were studied in relation to the Paired box gene 9 (PAX9) exon 3 (138 base pairs) as well as its 5′ and 3′ flanking intronic segments (232 bp and 220 bp, respectively), and the results were integrated with the information available for the same genetic region from individuals of different geographical origins. Nine mutations were scored in exon 3 and six in its flanking regions; four of them are new South American tribe-specific singletons. Exon 3 nucleotide diversity is several orders of magnitude higher than that of its flanking intronic regions. Additionally, a set of variants in PAX9 and 101 other genes related to dentition can define at least some dental morphological differences between Sub-Saharan Africans and non-Africans, probably associated with adaptations after the modern human exodus from Africa. Exon 3 of PAX9 could be a good molecular example of how evolvability works.
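Nucleotide diversity, the statistic compared between exon 3 and its flanking regions above, is usually defined as the average number of pairwise differences per site among aligned sequences. The sketch below computes it for a toy alignment. The sequences are invented for illustration, and the abstract does not specify the exact estimator or software the authors used.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site (pi) for aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(
        sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment of four 8-bp sequences (not real PAX9 data).
alignment = ["ACGTACGT", "ACGTACGA", "ACGTTCGT", "ACGTACGT"]
print(nucleotide_diversity(alignment))
```

In this toy case the six sequence pairs differ at 6 positions in total over 8 sites, giving 6 / (6 × 8) = 0.125.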
Butyrylcholinesterase (BChE) is a plasma enzyme that catalyzes the hydrolysis of choline esters, including the muscle relaxants succinylcholine and mivacurium. Patients who present sustained neuromuscular blockade after using succinylcholine usually carry BChE variants with reduced enzyme activity or have an acquired BChE deficiency. We report here the molecular basis of the BCHE gene underlying the slow catabolism of succinylcholine in a patient who underwent endoscopic nasal surgery. We measured the enzyme activity of BChE and extracted genomic DNA in order to study the promoter region and all exons of the BCHE gene of the patient, her parents and siblings. PCR products were sequenced and compared with reference sequences from GenBank. We found that the patient and one of her brothers have two homozygous mutations: nt1615 GCA > ACA (Ala539Thr), responsible for the K variant, and nt209 GAT > GGT (Asp70Gly), which produces the atypical variant A. Her parents and two of her brothers were found to be heterozygous for the AK allele, and another brother is homozygous for the normal allele. Sequence analysis of exon 1 including the 5′ UTR showed that the proband and her brother are homozygous for –116GG. The AK/AK genotype is considered the most frequent in hereditary hypocholinesterasemia (44%). This work demonstrates the importance of defining the phenotype and genotype of the BCHE gene in patients who are subjected to neuromuscular block by succinylcholine, because of the risk of prolonged neuromuscular paralysis.
hereditary hypocholinesterasemia; butyrylcholinesterase; succinylcholine; BCHE gene; DNA polymorphism
While microRNAs (miRNAs) play important roles in tissue differentiation and in maintaining basal physiology, little is known about the miRNA expression levels in stomach tissue. Alterations in the miRNA profile can lead to cell deregulation, which can induce neoplasia.
A small RNA library of stomach tissue was sequenced using high-throughput SOLiD sequencing technology. We obtained 261,274 quality reads with perfect matches to the human miRnome, and 42% of known miRNAs were identified. Digital Gene Expression profiling (DGE) was performed based on read abundance and showed that fifteen miRNAs were highly expressed in gastric tissue. Subsequently, the expression of these miRNAs was validated in 10 healthy individuals by RT-PCR, which showed a significant correlation of 83.97% (P<0.05). Six miRNAs showed a low variable pattern of expression (miR-29b, miR-29c, miR-19b, miR-31, miR-148a, miR-451) and could be considered part of the expression pattern of healthy gastric tissue.
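Expression profiling based on read abundance typically starts by scaling raw read counts to a common library size so that miRNAs can be ranked. The sketch below shows reads-per-million scaling and a simple ranking. The abstract does not describe the normalization actually used, so this scaling is an assumption, and the read counts are invented (only the miRNA names appear in the text).

```python
def reads_per_million(counts):
    """Scale raw read counts so that the library total equals one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {name: n * 1_000_000 / total for name, n in counts.items()}


def top_expressed(counts, k):
    """Return the names of the k most abundant entries after scaling."""
    rpm = reads_per_million(counts)
    return sorted(rpm, key=rpm.get, reverse=True)[:k]


# Hypothetical read counts, for illustration only.
library = {"miR-29b": 500, "miR-31": 300, "miR-451": 200}
print(reads_per_million(library))
print(top_expressed(library, 2))
```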
This study aimed to validate normal miRNA profiles of human gastric tissue to establish a reference profile for healthy individuals. Determining the regulatory processes acting in the stomach will be important in the fight against gastric cancer, which is the second-leading cause of cancer mortality worldwide.
The post-genomic era has brought new challenges regarding the understanding of the organization and function of the human genome. Many of these challenges are centered on the meaning of differential gene regulation under distinct biological conditions and can be addressed by analyzing the Multiple Differential Expression (MDE) of genes associated with normal and abnormal biological processes. Currently, MDE analyses are limited to the usual methods of differential expression, which were initially designed for paired analysis.
We proposed a web platform named ProbFAST for MDE analysis, which uses Bayesian inference to identify key genes that are intuitively prioritized by means of probabilities. A simulation study revealed that our method performs better than other approaches, and when it was applied to public expression data, we demonstrated its flexibility in obtaining relevant genes biologically associated with normal and abnormal biological processes.
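The idea of prioritizing genes by a probability rather than a P value can be illustrated with a generic Bayesian comparison of two count-based expression levels. This is not the ProbFAST model, whose details are not given in the abstract; it is a minimal sketch assuming binomial sampling of tags, uniform Beta(1, 1) priors and Monte Carlo integration, with hypothetical counts.

```python
import random


def prob_higher_expression(count_a, total_a, count_b, total_b,
                           draws=20000, seed=0):
    """Estimate P(proportion in A > proportion in B) from Beta posteriors.

    Each library is treated as a binomial sample; with a uniform prior the
    posterior of the gene's proportion is Beta(count + 1, total - count + 1).
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    wins = 0
    for _ in range(draws):
        p_a = rng.betavariate(count_a + 1, total_a - count_a + 1)
        p_b = rng.betavariate(count_b + 1, total_b - count_b + 1)
        wins += p_a > p_b
    return wins / draws


# Hypothetical tag counts for one gene in two libraries of 1,000 tags each.
print(prob_higher_expression(90, 1000, 10, 1000))
print(prob_higher_expression(50, 1000, 50, 1000))
```

A gene with a probability close to 1 (or 0) would rank high for differential expression, while values near 0.5 indicate no evidence of a difference.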
ProbFAST is a freely accessible web-based application that enables MDE analysis on a global scale. It offers an efficient methodological approach for MDE analysis of a set of genes that are turned on and off, in relation to functional information, during the evolution of a tumor or tissue differentiation. The ProbFAST server can be accessed at http://gdm.fmrp.usp.br/probfast.
Head and neck squamous cell carcinoma (HNSCC) is a heterogeneous disease affecting the epithelium of the oral cavity, pharynx and larynx. Most patients are diagnosed at late stages of the disease, and no sensitive and specific predictors of aggressive behavior have been identified yet. Therefore, early detection and prognostic biomarkers are highly desirable for a more rational management of the disease. Hypermethylation of CpG islands is one of the most important epigenetic mechanisms that leads to gene silencing in tumors and has been extensively used for the identification of biomarkers. In this study, we combined rapid subtractive hybridization and microarray analysis in a hierarchical manner to select genes that are putatively reactivated by the demethylating agent 5-aza-2′-deoxycytidine (5Aza-dC) in HNSCC cell lines (FaDu, UM-SCC-14A, UM-SCC-17A, UM-SCC-38A). This combined analysis identified 78 genes, 35 of which were reactivated in at least 2 cell lines and harbored a CpG island at their 5′ region. Reactivation of 3 of these 35 genes (CRABP2, MX1, and SLC15A3) was confirmed by quantitative real-time polymerase chain reaction (PCR; fold change, ≥3). Bisulfite sequencing of their CpG islands revealed that they are indeed differentially methylated in the HNSCC cell lines. Using methylation-specific PCR, we detected a higher frequency of CRABP2 (58.1% for region 1) and MX1 (46.3%) hypermethylation in primary HNSCC when compared with lymphocytes from healthy individuals. Finally, absence of the CRABP2 protein was associated with decreased disease-free survival rates, supporting a potential use of CRABP2 expression as a prognostic biomarker for HNSCC patients.
High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, termed sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of tags considered more reliable for further gene expression analysis.
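Scoring tags against a rule set can be pictured as summing the weights of the rules each tag satisfies and keeping the tags above a threshold. The sketch below shows that pattern. The abstract does not list the S3T rules, so the three rules, their weights, the `Tag` fields and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    sequence: str        # tag sequence as read
    count: int           # number of times the tag was observed
    matched_genes: int   # number of genes the tag maps to


# Hypothetical rule set: (description, predicate, weight).
RULES = [
    ("only unambiguous bases", lambda t: set(t.sequence) <= set("ACGT"), 1),
    ("observed more than once", lambda t: t.count > 1, 1),
    ("maps to exactly one gene", lambda t: t.matched_genes == 1, 2),
]


def score(tag: Tag) -> int:
    """Sum the weights of all rules the tag satisfies."""
    return sum(weight for _, rule, weight in RULES if rule(tag))


def select(tags, threshold: int):
    """Keep the tags whose score reaches the threshold."""
    return [t for t in tags if score(t) >= threshold]


tags = [
    Tag("ACGTACGTAC", count=5, matched_genes=1),
    Tag("ACGTNCGTAC", count=1, matched_genes=3),
    Tag("TTGCATGCAA", count=4, matched_genes=2),
]
for t in tags:
    print(t.sequence, score(t))
print([t.sequence for t in select(tags, threshold=3)])
```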
This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system.
These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at . S3T source code and datasets can also be downloaded from the aforementioned website.
Head and neck squamous cell carcinoma (HNSCC) is one of the most common malignancies in humans. The average 5-year survival rate is one of the lowest among aggressive cancers, showing no significant improvement in recent years. When detected early, HNSCC has a good prognosis, but most patients present metastatic disease at the time of diagnosis, which significantly reduces survival rate. Despite extensive research, no molecular markers are currently available for diagnostic or prognostic purposes.
Aiming to identify differentially-expressed genes involved in laryngeal squamous cell carcinoma (LSCC) development and progression, we generated individual Serial Analysis of Gene Expression (SAGE) libraries from a metastatic and non-metastatic larynx carcinoma, as well as from a normal larynx mucosa sample. Approximately 54,000 unique tags were sequenced in three libraries.
Statistical data analysis identified a subset of 1,216 differentially expressed tags between tumor and normal libraries, and 894 differentially expressed tags between metastatic and non-metastatic carcinomas. Three genes displaying differential regulation, one down-regulated (KRT31) and two up-regulated (BST2, MFAP2), as well as one with a non-significant differential expression pattern (GNA15) in our SAGE data were selected for real-time polymerase chain reaction (PCR) in a set of HNSCC samples. Consistent with our statistical analysis, quantitative PCR confirmed the upregulation of BST2 and MFAP2 and the downregulation of KRT31 when samples of HNSCC were compared to tumor-free surgical margins. As expected, GNA15 presented a non-significant differential expression pattern when tumor samples were compared to normal tissues.
To the best of our knowledge, this is the first study reporting SAGE data in head and neck squamous cell tumors. Statistical analysis was effective in identifying differentially expressed genes reportedly involved in cancer development. The differential expression of a subset of genes was confirmed in additional larynx carcinoma samples and in carcinomas from a distinct head and neck subsite. This result suggests the existence of potential common biomarkers for prognosis and targeted-therapy development in this heterogeneous type of tumor.
American tegumentary leishmaniasis (ATL) represents one of the most important public health issues in the world. An increased number of autochthonous cases of ATL in the Northeastern region of São Paulo State has been documented in the last few years, leading to a desire to determine the Leishmania species implicated.
PCR followed by DNA sequencing was carried out to identify a 120 bp fragment from the universal kDNA minicircle of the genus Leishmania in 61 skin or mucosal biopsies from patients with ATL.
DNA sequencing permitted the identification of a particular 15 bp fragment (5’ …GTC TTT GGG GCA AGT... 3’) in all samples. Analysis by the neighbor-joining method showed the occurrence of two distinct groups related to the subgenera Viannia (V) and Leishmania (L), each with two subgroups. Autochthonous cases with identity to a special Leishmania sequence not referenced in GenBank predominated in subgroup V.1, suggesting the possible existence of a subtype or mutation of Leishmania Viannia in this region. In subgroup L.2, which showed identity with a known sequence of L. (L.) amazonensis, there was a balanced distribution of autochthonous and non-autochthonous cases, including the mucosal and mucocutaneous forms in four patients. This last observation may direct us to new concepts, since mucosal involvement has commonly been attributed to L. (V.) braziliensis, even though L. (L.) amazonensis is more frequent in the Amazonian region.
These results confirm the pattern of distribution and possible mutations of these species, as well as the change in the clinical presentation of ATL in São Paulo State.
Tegumentary; Leishmaniasis; Phylogenetic analysis; L. (L.) amazonensis; L. (V.) braziliensis; Molecular epidemiology
Identification of genes that are upregulated in tumors,
and whose normal expression excludes adult somatic tissues but includes germline
and/or embryonic tissues, has resulted in a rich variety
of cancer antigens that are attractive targets for cancer vaccine
and other therapeutic approaches. In the present study, we extended
this approach to include genes strongly and restrictively expressed
in the placenta by mining publicly available SAGE and EST databases.
We identified a number of genes with high expression in placenta
and different cancer types but with relatively restricted expression
in normal tissues. The gene with the most distinctive expression
pattern was found to be PLAC1, which encodes a
putative cell surface protein that is highly expressed in placenta,
testis, cancer cell lines and lung tumors. Hence we have designated
it CT92. We found by ELISA that PLAC1 is immunogenic in a subset
of cancer patients and healthy women. Its physical and expression
characteristics render it a potential target for both active and
passive cancer immunotherapeutic strategies.
human; tumor antigens; PLAC1; mRNA; tissue distribution; humoral
In the present study we describe the cases of two patients with cluster-like headache related to intracranial carotid artery aneurysm. One of these patients responded to verapamil prescription with headache resolution. In both cases the surgical clipping of the aneurysm resolved the cluster pain. These findings strongly suggest a pathophysiological link between the two conditions. The authors discuss the potential pathophysiological mechanisms underlying cluster-like headache due to intracranial carotid artery aneurysm.
Cerebral aneurysm; Cluster headache; Parasympathetic; Third cranial nerve; Internal carotid artery; Pathophysiology
Besides their variable presence in fetal and adult
germ cells, CT antigens have occasionally been detected in placental
tissue. However, these data are scarce and solely based on mRNA
analyses; nothing is known about their presence at the protein level.
Here, we analyzed the expression of various CT antigens in placental
tissues from gestational age week 5 to week 42 using monoclonal
antibodies to various antigens of the MAGE-A and -C families, NY-ESO-1,
as well as GAGE. We show that CT antigen expression in placenta
varies widely for the various antigens, ranging from completely
negative to abundant. Since little is known about the function and
biology of CT antigens, interpretation of this highly variable expression
pattern is purely speculative. However, our data indicate that the
various CT antigens have different functions during placental development.
human; placenta; CT antigens; immunohistochemistry
The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~ 14,000), indicating that mechanisms acting on the generation of transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of these are estimated to be missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome.
Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis. In agreement with the proposal that this locus is co-regulated in response to infection by microorganisms, we show here that SP212 is also up-regulated upon injury.
Using the ORESTES methodology we identified 17 novel exons from low abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that the computational identification and manual inspection are not sufficient to annotate a genome in the absence of experimentally derived data.
The ongoing efforts to sequence the honey bee genome require additional initiatives to define its transcriptome. Towards this end, we employed the Open Reading Frame ESTs (ORESTES) strategy to generate profiles for the life cycle of Apis mellifera workers.
Of the 5,021 ORESTES, 35.2% matched previously deposited Apis ESTs. The analysis of the remaining sequences defined a set of putative orthologs, most of which had their best-match hits with Anopheles and Drosophila genes. CAP3 assembly of the Apis ORESTES with the already existing 15,500 Apis ESTs generated 3,408 contigs. BLASTX comparison of these contigs with protein sets of organisms representing distinct phylogenetic clades revealed a total of 1,629 contigs that Apis mellifera shares with different taxa. Most (41%) represent genes that are common to all taxa, another 21% are shared between metazoans (Bilateria), and 16% are shared only within the Insecta clade. A set of 23 putative genes presented a best match with human genes, many of which encode factors related to cell signaling/signal transduction. A total of 1,779 contigs (52%) did not match any known sequence. Applying a correction factor deduced from a parallel analysis performed with Drosophila melanogaster ORESTES, we estimate that approximately half of these no-match EST contigs (22%) should represent Apis-specific genes.
The versatile and cost-efficient ORESTES approach produced minilibraries for honey bee life cycle stages. Such information on central gene regions contributes to genome annotation and also lends itself to cross-transcriptome comparisons to reveal evolutionary trends in insect genomes.