Bacterial degradation of steroid compounds is of high ecological and biotechnological relevance. Pseudomonas sp. strain Chol1 is a model organism for studying the degradation of the steroid compound cholate. Its draft genome sequence is presented and reveals one gene cluster responsible for the metabolism of steroid compounds.
Mutations in any genome may lead to phenotype characteristics that determine the ability of an individual to adapt to environmental challenges. In studies of human biology, among the most interesting are phenotype characteristics that determine responses to drug treatments, responses to infections, or predisposition to specific inherited diseases. Most of the research in this field has focused on the effects of mutations on the final gene products, peptides, and their alterations. Considerably less attention has been given to mutations that may affect the regulatory mechanism(s) of gene expression, although these may also affect the phenotype characteristics. In this study we present a pilot analysis of mutations observed in the regulatory regions of 24,667 human RefSeq genes. Our study reveals that, out of eight studied mutation types, “insertions” are the only one that alters predicted transcription factor binding sites (TFBSs) in a statistically significant manner. We also find that 25 families of TFBSs have been altered by mutations in a statistically significant manner in the promoter regions we considered. Moreover, we find that the related transcription factors are, for example, prominent in processes related to intracellular signaling; cell fate; morphogenesis of organs and epithelium; development of urogenital system, epithelium, and tube; neuron fate commitment. Our study highlights the significance of studying mutations within the regulatory regions of genes and opens the way for further detailed investigations on this topic, particularly on the downstream affected pathways.
SNP; insertion; deletion; mutation; transcription factor; transcription factor binding site; promoter region; bioinformatics
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in improved model quality, probably due to partial correction of source-specific technique bias.
We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.
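As a generic illustration of what a position weight matrix is, the sketch below builds a log-odds matrix from a handful of already aligned binding sites and scores a candidate sequence against it. It is not the ChIPMunk procedure and does not reproduce any of its modes; the example sites, the pseudocount of 1, the uniform background of 0.25 and the function names `build_pwm` and `score` are all assumptions made for illustration.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return a log-odds position weight matrix as a list of dicts, one per column.

    `sites` are equal-length, already aligned DNA strings (hypothetical input).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background.
        pwm.append({b: math.log2((c / total) / background) for b, c in counts.items()})
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds weights for a sequence of matching length."""
    return sum(column[base] for column, base in zip(pwm, seq))

# Hypothetical aligned binding sites, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
pwm = build_pwm(sites)
print(score(pwm, "TGACTCA"), score(pwm, "AAAAAAA"))
```

A sequence resembling the training sites scores higher than an unrelated one; in practice a threshold on this score decides whether a site is reported.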
Summary: In higher eukaryotes, the identification of translation initiation
sites (TISs) has been focused on finding these signals in cDNA or mRNA sequences. Using
Arabidopsis thaliana (A.t.) information, we developed
a prediction tool for signals within genomic sequences of plants that correspond to TISs.
Our tool requires only genome sequence, not expressed sequences. Its
sensitivity/specificity is 90.75%/92.2% for A.t., 66.8%/94.4% for
Vitis vinifera and 81.6%/94.4% for Populus trichocarpa, which
suggests that our tool can be used in annotation of different plant
genomes. We provide a list of features used in our
model. Further study of these features may improve our understanding of mechanisms of the
Availability and implementation: Our tool is implemented as an artificial
neural network. It is available as a web-based tool and, together with the source code,
the list of features, and data used for model development, is accessible at http://cbrc.kaust.edu.sa/dts.
Supplementary information: Supplementary data are available at Bioinformatics online.
Estrogen therapy has positively impacted the treatment of several cancers, such as prostate, lung and breast cancers. Moreover, several groups have reported the importance of estrogen-induced gene regulation in esophageal cancer (EC). This suggests that there could be a potential for estrogen therapy for EC. The efficient design of estrogen therapies requires as complete a list as possible of genes responsive to estrogen. Our study develops a systems biology methodology using esophageal squamous cell carcinoma (ESCC) as a model to identify estrogen responsive genes. These genes, in turn, could be affected by estrogen therapy in ESCC.
Based on different sources of information we identified 418 genes implicated in ESCC. Putative estrogen responsive elements (EREs) mapped to the promoter region of the ESCC genes were used to initially identify candidate estrogen responsive genes. EREs mapped to the promoter sequence of 30.62% (128/418) of ESCC genes, of which 43.75% (56/128) are known to be estrogen responsive, while 56.25% (72/128) are new candidate estrogen responsive genes. EREs did not map to 290 ESCC genes. Of these 290 genes, 50.34% (146/290) are known to be estrogen responsive. By analyzing transcription factor binding sites (TFBSs) in the promoters of the 202 (56+146) known estrogen responsive ESCC genes under study, we found that their regulatory potential may be characterized by 44 significantly over-represented co-localized TFBSs (cTFBSs). We were able to map these cTFBSs to promoters of 32 of the 72 new candidate estrogen responsive ESCC genes, thereby increasing confidence that these 32 ESCC genes are responsive to estrogen, since their promoters contain both (a) mapped EREs and (b) at least four cTFBSs characteristic of ESCC genes that are responsive to estrogen. Recent publications confirm that 47% (15/32) of these 32 predicted genes are indeed responsive to estrogen.
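The gene counts and percentages reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names and the helper `pct` are illustrative.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

total_escc = 418          # genes implicated in ESCC
with_ere = 128            # genes with EREs mapped to the promoter
known_with_ere = 56       # of those, already known to be estrogen responsive
known_without_ere = 146   # known estrogen responsive genes without mapped EREs
confirmed = 15            # predicted genes confirmed by recent publications
predicted = 32            # new candidates that also carry the cTFBSs

new_candidates = with_ere - known_with_ere         # 72 new candidate genes
without_ere = total_escc - with_ere                # 290 genes without mapped EREs
known_total = known_with_ere + known_without_ere   # 202 known responsive genes

print(pct(with_ere, total_escc))           # share of ESCC genes with mapped EREs
print(pct(known_with_ere, with_ere))       # known responsive among ERE-mapped genes
print(pct(new_candidates, with_ere))       # new candidates among ERE-mapped genes
print(pct(known_without_ere, without_ere)) # known responsive among genes without EREs
print(round(100 * confirmed / predicted))  # confirmed share, as a whole percentage
```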
To the best of our knowledge our study is the first to use a cancer disease model as the framework to identify hormone responsive genes. Although we used ESCC as the disease model and estrogen as the hormone, the methodology can be extended analogously to other diseases as the model and other hormones. We believe that our results provide useful information for those interested in genes responsive to hormones and in the design of hormone-based therapies.
Motivation: Burgeoning sequencing technologies have generated massive amounts of genomic and proteomic data. Annotating the functions of proteins identified in this data has become a big and crucial problem. Various computational methods have been developed to infer the protein functions based on either the sequences or domains of proteins. The existing methods, however, ignore the recurrence and the order of the protein domains in this function inference.
Results: We developed two new methods to infer protein functions based on protein domain recurrence and domain order. Our first method, DRDO, calculates the posterior probability of the Gene Ontology terms based on domain recurrence and domain order information, whereas our second method, DRDO-NB, relies on the naïve Bayes methodology using the same domain architecture information. Our large-scale benchmark comparisons show strong improvements in the accuracy of the protein function inference achieved by our new methods, demonstrating that domain recurrence and order can provide important information for inference of protein functions.
Availability: The new models are provided as open source programs at http://sfb.kaust.edu.sa/Pages/Software.aspx.
Supplementary data are available at Bioinformatics Online.
Cone snails produce a distinctive repertoire of venom peptides that are used both as a defense mechanism and to facilitate the immobilization and digestion of prey. These peptides target a wide variety of voltage- and ligand-gated ion channels, which makes them an invaluable resource for studying the properties of these ion channels in normal and diseased states, as well as a collection of compounds of potential pharmacological use in their own right. Examples include the United States Food and Drug Administration (FDA) approved pharmaceutical drug Ziconotide (Prialt®; Elan Pharmaceuticals, Inc.), which is the synthetic equivalent of the naturally occurring ω-conotoxin MVIIA, whilst several other conotoxins are currently being used as standard research tools and screened as potential therapeutic drugs in pre-clinical or clinical trials. These developments highlight the importance of driving conotoxin-related research. A PubMed query from 1 January 2007 to 31 August 2011 combined with hand-curation of the retrieved articles allowed for the collation of 98 recently identified conotoxins with therapeutic potential, which are selectively discussed in this review. Protein sequence similarity analysis tentatively assigned uncharacterized conotoxins to predicted functional classes. Furthermore, conotoxin therapeutic potential for neurodegenerative disorders (NDD) was also inferred.
Conus; cone snail; peptide; neuropeptide; conotoxin; nicotinic acetylcholine receptor; sodium channel; calcium channel; potassium channel
Protein interaction networks (PINs) specific within a particular context contain crucial information regarding many cellular biological processes. For example, PINs may include information on the type and directionality of interaction (e.g. phosphorylation), location of interaction (i.e. tissues, cells), and related diseases. Currently, very few tools are capable of deriving context-specific PINs for conducting exploratory analysis.
We developed a literature-based online system, Context-specific Protein Network Miner (CPNM), which derives context-specific PINs in real-time from the PubMed database based on a set of user-input keywords and enhanced PubMed query system. CPNM reports enriched information on protein interactions (with type and directionality), their network topology with summary statistics (e.g. most densely connected proteins in the network; most densely connected protein-pairs; and proteins connected by most inbound/outbound links) that can be explored via a user-friendly interface. Some of the novel features of the CPNM system include PIN generation, ontology-based PubMed query enhancement, real-time, user-queried, up-to-date PubMed document processing, and prediction of PIN directionality.
CPNM provides a tool for biologists to explore PINs. It is freely accessible at http://www.biotextminer.com/CPNM/.
We present the draft genome of Haloplasma contractile, isolated from a deep-sea brine and representing a new order between Firmicutes and Mollicutes. Its complex morphology with contractile protrusions might be strongly influenced by the presence of seven MreB/Mbl homologs, which appears to be the highest copy number ever reported.
We present the draft genome of Halorhabdus tiamatea, the first member of the Archaea ever isolated from a deep-sea anoxic brine. Genome comparison with Halorhabdus utahensis revealed some striking differences, including a marked increase in genes associated with transmembrane transport and putative genes for a trehalose synthase and a lactate dehydrogenase.
We present the genome of Salinisphaera shabanensis, isolated from a brine-seawater interface and representing a new order within the Gammaproteobacteria. Its adaptations to physicochemical and nutrient availability fluctuations include six genes encoding heavy metal-translocating P-type ATPases and multiple genes involved in iron uptake, siderophore production, and poly-β-hydroxybutyrate synthesis.
The demand for antimicrobial peptides (AMPs) is rising because of the increased occurrence of pathogens that are tolerant or resistant to conventional antibiotics. Since naturally occurring AMPs could serve as templates for the development of new anti-infectious agents to which pathogens are not resistant, a resource that contains relevant information on AMPs is of great interest. To that end, we developed the Dragon Antimicrobial Peptide Database (DAMPD, http://apps.sanbi.ac.za/dampd) that contains 1232 manually curated AMPs. DAMPD is an update and a replacement of the ANTIMIC database. In DAMPD, an integrated interface allows simple querying based on taxonomy, species, AMP family, citation, keywords and a combination of search terms and fields (Advanced Search). A number of tools such as Blast, ClustalW, HMMER, Hydrocalculator, SignalP and AMP predictor, as well as a number of other resources that provide additional information about the results, are also integrated into DAMPD to augment biological analysis of AMPs.
Motivation: Recognition of poly(A) signals in mRNA is relatively straightforward due to the presence of an easily recognizable polyadenylic acid tail. However, the task of identifying poly(A) motifs in the primary genomic DNA sequence that correspond to poly(A) signals in mRNA is a far more challenging problem. Recognition of poly(A) signals is important for better gene annotation and understanding of gene regulation mechanisms. In this work, we present one such poly(A) motif prediction method based on properties of the human genomic DNA sequence surrounding a poly(A) motif. These properties include thermodynamic, physico-chemical and statistical characteristics. For predictions, we developed Artificial Neural Network and Random Forest models. These models are trained to recognize the 12 most common poly(A) motifs in human DNA. Our predictors are available as a free web-based tool accessible at http://cbrc.kaust.edu.sa/dps. Compared with other reported predictors, our models achieve higher sensitivity and specificity and furthermore provide a consistent level of accuracy for the 12 poly(A) motif variants.
Supplementary information: Supplementary data are available at Bioinformatics online.
MicroRNAs (miRNAs) are small non-coding RNA molecules that repress the translation of messenger RNAs (mRNAs) or degrade mRNAs. These functions of miRNAs allow them to control key cellular processes such as development, differentiation and apoptosis, and they have also been implicated in several cancers such as leukaemia, lung, pancreatic and ovarian cancer (OC). Unfortunately, the specific machinery of miRNA regulation, involving transcription factors (TFs) and transcription co-factors (TcoFs), is not well understood. In the present study we focus on computationally deciphering the underlying network of miRNAs, their targets, and their control mechanisms that have an influence on OC development.
We analysed experimentally verified data from multiple sources that describe miRNA influence on diseases, miRNA targeting of mRNAs, and on protein-protein interactions, and combined this data with ab initio transcription factor binding site predictions within miRNA promoter regions. From these analyses, we derived a network that describes the influence of miRNAs and their regulation in human OC. We developed a methodology to analyse the network in order to find the nodes that have the largest potential of influencing the network's behaviour (network hubs). We further show the potentially most influential miRNAs, TFs and TcoFs, showing subnetworks illustrating the involved mechanisms as well as regulatory miRNA network motifs in OC. We find an enrichment of miRNA targeted OC genes in the highly relevant pathways cell cycle regulation and apoptosis.
We combined several sources of interaction and association data to analyse and place miRNAs within regulatory pathways that influence human OC. These results represent the first comprehensive miRNA regulatory network analysis for human OC. This suggests that miRNAs and their regulation may play a major role in OC and that further directed research in this area is of utmost importance to enhance our understanding of the molecular mechanisms underlying human cancer development and OC in particular.
Our study focuses on identifying potential biomarkers for diagnosis and early detection of ovarian cancer (OC) through the study of transcription regulation of genes affected by estrogen hormone.
The results are based on a set of 323 experimentally validated OC-associated genes compiled from several databases, and their subset controlled by estrogen. For these two gene sets we computationally determined transcription factors (TFs) that putatively regulate transcription initiation. We ranked these TFs based on the number of genes they are likely to control. In this way, we selected 17 top-ranked TFs as potential key regulators and thus possible biomarkers for a set of 323 OC-associated genes. For 77 estrogen controlled genes from this set we identified three unique TFs as potential biomarkers.
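The ranking step described above, ordering TFs by the number of genes they putatively control, can be sketched as a simple count. This is an illustration under assumptions, not the study's pipeline: the gene-to-TF mapping, the placeholder gene and TF names, and the function `rank_tfs` are hypothetical.

```python
from collections import Counter

def rank_tfs(gene_to_tfs, top_n):
    """Rank TFs by how many genes list them as a putative regulator."""
    counts = Counter()
    for tfs in gene_to_tfs.values():
        # Count each TF at most once per gene, even if it has several predicted sites.
        counts.update(set(tfs))
    # Sort by descending gene count, then by name so ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

# Hypothetical gene -> putative regulator mapping, for illustration only.
gene_to_tfs = {
    "GENE_A": ["TF1", "TF2"],
    "GENE_B": ["TF1", "TF3"],
    "GENE_C": ["TF1", "TF2", "TF2"],
    "GENE_D": ["TF3"],
}

top = rank_tfs(gene_to_tfs, top_n=2)
print(top)
```

Applying the same function to a subset of genes (for example, only the estrogen controlled ones) gives a separate ranking for that subset.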
We introduced a new methodology to identify potential diagnostic biomarkers for OC. This report is the first bioinformatics study that explores multiple transcriptional regulators of OC-associated genes as potential diagnostic biomarkers in connection with estrogen responsiveness. We show that 64% of TF biomarkers identified in our study are validated based on real-time data from microarray expression studies. As an illustration, our method could identify CP2 that in combination with CA125 has been reported to be sensitive in diagnosing ovarian tumors.
The barnacle Balanus amphitrite is a globally distributed biofouler and a model species in intertidal ecology and larval settlement studies. However, a lack of genomic information has hindered the comprehensive elucidation of the molecular mechanisms coordinating its larval settlement. The pyrosequencing-based transcriptomic approach is thought to be useful to identify key molecular changes during larval settlement.
Methodology and Principal Findings
Using 454 pyrosequencing, we collected a total of 630,845 reads, including 215,308 from the larval stages and 415,537 from the adults; 23,451 contigs were generated while 77,785 reads remained as singletons. We annotated 31,720 of the 92,322 predicted open reading frames, which matched hits in the NCBI NR database, and identified 7,954 putative genes that were differentially expressed between the larval and adult stages. Of these, several genes were further characterized with quantitative real-time PCR and in situ hybridization, revealing some key findings: 1) vitellogenin was uniquely expressed in the late nauplius stage, suggesting it may be an energy source for the subsequent non-feeding cyprid stage; 2) the locations of mannose receptors suggested they may be involved in the sensory system of cyprids; 3) 20 kDa-cement protein homologues were expressed in the cyprid cement gland and probably function during attachment; and 4) receptor tyrosine kinases were more highly expressed in the cyprid stage and may be involved in signal perception during larval settlement.
Our results provide the basis for several new hypotheses about gene functions during larval settlement and make available a large transcriptome dataset for B. amphitrite for further exploration of larval settlement and developmental pathways in this important marine species.
Physical interactions between transcription factors (TFs) are necessary for forming regulatory protein complexes and thus play a crucial role in gene regulation. Currently, knowledge about the mechanisms of these TF interactions is incomplete and the number of known TF interactions is limited. Computational prediction of such interactions can help identify potential new TF interactions as well as contribute to better understanding the complex machinery involved in gene regulation.
We propose here such a method for the prediction of TF interactions. The method uses only the primary sequence information of the interacting TFs, resulting in a much greater simplicity of the prediction algorithm. Through an advanced feature selection process, we determined a subset of 97 model features that constitute the optimized model in the subset we considered. The model, based on quadratic discriminant analysis, achieves a prediction accuracy of 85.39% on a blind set of interactions. This result is achieved despite the selection for the negative data set of only those TF from the same type of proteins, i.e. TFs that function in the same cellular compartment (nucleus) and in the same type of molecular process (transcription initiation). Such selection poses significant challenges for developing models with high specificity, but at the same time better reflects real-world problems.
The performance of our predictor compares well to those of much more complex approaches for predicting TF and general protein-protein interactions, particularly when taking the reduced complexity of model utilisation into account.
Despite intense efforts to develop non-cytotoxic anticancer treatments, effective agents are still not available. Therefore, novel apoptosis-inducing drug leads that may be developed into effective targeted cancer therapies are of interest to the cancer research community. Targeted cancer therapies affect specific aberrant apoptotic pathways that characterize different cancer types and, for this reason, are a more desirable type of therapy than chemotherapy or radiotherapy, as they are less harmful to normal cells. In this regard, marine sponge derived metabolites that induce apoptosis continue to be a promising source of new drug leads for cancer treatments. A PubMed query from 01/01/2005 to 31/01/2011 combined with hand-curation of the retrieved articles allowed for the identification of 39 recently confirmed apoptosis-inducing anticancer lead compounds isolated from marine sponges, which are selectively discussed in this review.
marine sponge; apoptosis; cancer treatment; targeted cancer therapy; anticancer
Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTR, coding exons, etc.), most locations on genomes are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). The identification of large portions of NTLs can contribute to better focusing the search for TSS locations and thus contribute to promoter and gene finding. It can help in the assessment of 5′ completeness of expressed sequences, contribute to more successful experimental designs, as well as more accurate gene annotation.
Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that allows us, by performing computational TSS prediction with very high sensitivity, to annotate, with a high accuracy in a strand specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. In our algorithm we utilize various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.
Our analysis of human chromosomes 4, 21 and 22 estimates ∼46%, ∼41% and ∼27% of these chromosomes, respectively, as being NTLs. This suggests that on average more than 40% of the human genome can be expected to be highly unlikely to initiate transcription. Our method represents the first one that utilizes high-sensitivity TSS prediction to identify, with high accuracy, large portions of mammalian genomes as NTLs. The server with our algorithm implemented is available at http://cbrc.kaust.edu.sa/ddm/.
The initiation and regulation of transcription in eukaryotes is complex and involves a large number of transcription factors (TFs), which are known to bind to the regulatory regions of eukaryotic DNA. Apart from TF–DNA binding, protein–protein interaction involving TFs is an essential component of the machinery facilitating transcriptional regulation. Proteins that interact with TFs in the context of transcription regulation but do not bind to the DNA themselves, we consider transcription co-factors (TcoFs). The influence of TcoFs on transcriptional regulation and initiation, although indirect, has been shown to be significant with the functionality of TFs strongly influenced by the presence of TcoFs. While the role of TFs and their interaction with regulatory DNA regions has been well-studied, the association between TFs and TcoFs has so far been given less attention. Here, we present a resource that is comprised of a collection of human TFs and the TcoFs with which they interact. Other proteins that have a proven interaction with a TF, but are not considered TcoFs are also included. Our database contains 157 high-confidence TcoFs and additionally 379 hypothetical TcoFs. These have been identified and classified according to the type of available evidence for their involvement in transcriptional regulation and their presence in the cell nucleus. We have divided TcoFs into four groups, one of which contains high-confidence TcoFs and three others contain TcoFs which are hypothetical to different extents. We have developed the Dragon Database for Human Transcription Co-Factors and Transcription Factor Interacting Proteins (TcoF-DB). A web-based interface for this resource can be freely accessed at http://cbrc.kaust.edu.sa/tcof/ and http://apps.sanbi.ac.za/tcof/.
Prostate cancer (PC) is one of the most commonly diagnosed cancers in men. PC is relatively difficult to diagnose due to a lack of clear early symptoms. Extensive research of PC has led to the availability of a large amount of data on PC. Several hundred genes are implicated in different stages of PC, which may help in developing diagnostic methods or even cures. In spite of this accumulated information, effective diagnostics and treatments remain evasive. We have developed Dragon Database of Genes associated with Prostate Cancer (DDPC) as an integrated knowledgebase of genes experimentally verified as implicated in PC. DDPC is distinctive from other databases in that (i) it provides pre-compiled biomedical text-mining information on PC, which otherwise require tedious computational analyses, (ii) it integrates data on molecular interactions, pathways, gene ontologies, gene regulation at molecular level, predicted transcription factor binding sites on promoters of PC implicated genes and transcription factors that correspond to these binding sites and (iii) it contains DrugBank data on drugs associated with PC. We believe this resource will serve as a source of useful information for research on PC. DDPC is freely accessible for academic and non-profit users via http://apps.sanbi.ac.za/ddpc/ and http://cbrc.kaust.edu.sa/ddpc/.
Multiple factors underlie susceptibility to essential hypertension, including a significant genetic and ethnic component, and environmental effects. Blood pressure response of hypertensive individuals to salt is heterogeneous, but salt sensitivity appears more prevalent in people of indigenous African origin. The underlying genetics of salt-sensitive hypertension, however, are poorly understood. In this study, computational methods including text- and data-mining have been used to select and prioritize candidate aetiological genes for salt-sensitive hypertension. Additionally, we have compared allele frequencies and copy number variation for single nucleotide polymorphisms in candidate genes between indigenous Southern African and Caucasian populations, with the aim of identifying candidate genes with significant variability between the population groups: identifying genetic variability between population groups can exploit ethnic differences in disease prevalence to aid with prioritisation of good candidate genes. Our top-ranking candidate genes include parathyroid hormone precursor (PTH) and type-1 angiotensin II receptor (AGTR1). We propose that the candidate genes identified in this study warrant further investigation as potential aetiological genes for salt-sensitive hypertension.
An ever increasing amount of transcriptomic data and analysis tools provide novel insight into complex responses of biological systems. Given these resources we have undertaken to review aspects of transcriptional regulation in response to the plant hormone gibberellic acid (GA) and its second messenger guanosine 3′,5′-cyclic monophosphate (cGMP) in Arabidopsis thaliana, both wild type and selected mutants. Evidence suggests enrichment of GA-responsive (GARE) elements in promoters of genes that are transcriptionally upregulated in response to cGMP but downregulated in a GA insensitive mutant (ga1-3). In contrast, in the genes upregulated in the mutant, no enrichment in the GARE is observed, suggesting that GARE motifs are diagnostic for GA-induced and cGMP-dependent transcriptional upregulation. Further, we review how expression studies of GA-dependent transcription factors and transcriptional networks based on common promoter signatures derived from ab initio analyses can contribute to our understanding of plant responses at the systems level.
gibberellic acid; gibberellic acid response elements (GARE); guanosine 3′,5′-cyclic monophosphate (cGMP); plant homeostasis
The purpose of this study is to: i) develop a computational model of promoters of human histone-encoding genes (in short, histone genes), an important class of genes that participate in various critical cellular processes; ii) use the model so developed to identify regions across the human genome that have a similar structure to promoters of histone genes; such regions could represent potential genomic regulatory regions, e.g. promoters, of genes that may be coregulated with histone genes; and iii) identify in this way genes that have a high likelihood of being coregulated with the histone genes.
We successfully developed a histone promoter model using a comprehensive collection of histone genes. Based on leave-one-out cross-validation test, the model produced good prediction accuracy (94.1% sensitivity, 92.6% specificity, and 92.8% positive predictive value). We used this model to predict across the genome a number of genes that shared similar promoter structures with the histone gene promoters. We thus hypothesize that these predicted genes could be coregulated with histone genes. This hypothesis matches well with the available gene expression, gene ontology, and pathways data. Jointly with promoters of the above-mentioned genes, we found a large number of intergenic regions with similar structure as histone promoters.
This study represents one of the most comprehensive computational analyses conducted thus far on a genome-wide scale of promoters of human histone genes. Our analysis suggests a number of other human genes that share a high similarity of promoter structure with the histone genes and thus are highly likely to be coregulated, and consequently coexpressed, with the histone genes. We also found that there are a large number of intergenic regions across the genome with their structures similar to promoters of histone genes. These regions may be promoters of yet unidentified genes, or may represent remote control regions that participate in regulation of histone and histone-coregulated gene transcription initiation. While these hypotheses still remain to be verified, we believe that these form a useful resource for researchers to further explore regulation of human histone genes and human genome. It is worthwhile to note that the regulatory regions of the human genome remain largely un-annotated even today and this study is an attempt to supplement our understanding of histone regulatory regions.
Ovarian epithelial cancer (OEC) usually presents in the later stages of the disease. Factors, especially those associated with cell-cycle genes, affecting the genesis and tumour progression for ovarian cancer are largely unknown. We hypothesized that over-expressed transcription factors (TFs), as well as those that are driving the expression of the OEC over-expressed genes, could be the key for OEC genesis and potentially useful tissue and serum markers for malignancy associated with OEC.
Using a combination of computational (selection of candidate TF markers and malignancy prediction) and experimental approaches (tissue microarray and western blotting on patient samples) we identified and evaluated E2F5, a transcription factor involved in cell proliferation, as a promising candidate regulatory target in early stage disease. Our hypothesis was supported by our tissue array experiments, which showed E2F5 expression only in OEC samples and not in normal or benign tissues, and by significantly positively biased expression in serum samples assessed by western blotting.
Analysis of clinical cases shows that the E2F5 status is characteristic of a different population group than the one covered by CA125, a conventional OEC biomarker. When E2F5 is used in different combinations with CA125 to distinguish malignant from benign cysts, the presence of CA125 or E2F5 increases the sensitivity of OEC detection to 97.9% (an increase from 87.5% if only CA125 is used) and, more importantly, the presence of both CA125 and E2F5 increases the specificity of OEC detection to 72.5% (an increase from 55% if only CA125 is used). This significantly improved accuracy suggests the possibility of improved OEC diagnostics. Furthermore, detection of malignancy status in 86 cases (38 benign, 48 early and late OEC) shows that the use of E2F5 status in combination with other clinical characteristics allows for an improved detection of malignant cases with sensitivity, specificity, F-measure and accuracy of 97.92%, 97.37%, 97.92% and 97.67%, respectively.
Overall, our findings, in addition to opening a realistic possibility for improved OEC diagnosis, provide indirect evidence that the cell-cycle regulatory protein E2F5 might play a significant role in OEC pathogenesis.
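The four performance figures for the 86-case analysis follow from standard confusion-matrix definitions. The sketch below assumes 47 true positives, 1 false negative, 37 true negatives and 1 false positive; these counts are not stated in the text but are inferred here as the only split of 48 malignant and 38 benign cases that reproduces the reported percentages. The function name `diagnostic_metrics` is illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of malignant cases detected
    specificity = tn / (tn + fp)   # fraction of benign cases correctly cleared
    precision = tp / (tp + fp)     # fraction of positive calls that are malignant
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }

# Assumed counts: 48 malignant (47 detected), 38 benign (37 correctly cleared).
metrics = diagnostic_metrics(tp=47, fn=1, tn=37, fp=1)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these assumed counts precision equals sensitivity (both 47/48), which is why the F-measure coincides with the sensitivity.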