We applied fluorescence microscopy-based quantitative assays to living cells to identify
regulators of ER to Golgi trafficking and/or Golgi complex maintenance. We first validated an
automated procedure to identify factors that influence Golgi to ER re-localization of GalT-CFP
after brefeldin A (BFA) addition and/or wash-out. We then tested 14 proteins that localize to the ER
and/or Golgi complex when over-expressed for a role in ER to Golgi trafficking. Nine of them
interfered with the rate of BFA induced redistribution of GalT-CFP from the Golgi complex to the ER,
6 of them interfered with GalT-CFP redistribution from the ER to a juxtanuclear region (i.e., Golgi
complex) after BFA wash-out, and 6 of them were positive effectors in both assays. Notably, our
live-cell approach captures regulator functions in ER to Golgi trafficking that were missed in
previous fixed-cell assays, and assigns putative roles to other, less characterized proteins.
Moreover, we show that our assays can be extended to RNAi and chemical screens.
Brefeldin A (BFA); GalT; ER to Golgi trafficking; YIPF; GOT1B; USE1; SACM1L
Non-coding RNAs are much more common than previously thought. However, for the vast majority of non-coding RNAs, the cellular function remains enigmatic. The two long non-coding RNA (lncRNA) genes DLEU1 and DLEU2 map to a critical region at chromosomal band 13q14.3 that is recurrently deleted in solid tumors and hematopoietic malignancies like chronic lymphocytic leukemia (CLL). While no point mutations have been found in the protein-coding candidate genes at 13q14.3, they are deregulated in malignant cells, suggesting an epigenetic tumor suppressor mechanism. We therefore characterized the epigenetic makeup of 13q14.3 in CLL cells. Using chromatin immunoprecipitation (ChIP), we found histone modifications that are associated with activated transcription; using 5 different semi-quantitative and quantitative methods (aPRIMES, BioCOBRA, MCIp, MassARRAY, and bisulfite sequencing), we found significant DNA demethylation at the transcriptional start sites of DLEU1 and DLEU2. These epigenetic aberrations were correlated with transcriptional deregulation of the neighboring candidate tumor suppressor genes, suggesting a coregulation in cis of this gene cluster. We found that the 13q14.3 genes, in addition to their previously known functions, regulate NF-kB activity. We showed this after overexpression, siRNA-mediated knockdown, and expression of dominant-negative mutant genes, using Western blots with previously undescribed antibodies, a customized ELISA, and reporter assays. In addition, we performed an unbiased screen of 810 human miRNAs and identified the miR-15/16 family of genes at 13q14.3 as the strongest inducers of NF-kB activity. In summary, the tumor suppressor mechanism at 13q14.3 is a cluster of genes controlled by two lncRNA genes that are regulated by DNA methylation and histone modifications and whose members all regulate NF-kB.
Therefore, the tumor suppressor mechanism in 13q14.3 underlines the role both of epigenetic aberrations and of lncRNA genes in human tumorigenesis and is an example of colocalization of a functionally related gene cluster.
Recent results suggest that genome regions not coding for proteins are read and transcribed into RNA. While the function of the majority of the resulting non-coding RNA molecules remains unclear, some of them, termed long non-coding RNAs (lncRNAs) according to their length (typically 200–2,000 nucleotides), play a role in regulating the activity of target genes. In most instances, this regulation involves changes of so-called “epigenetic” marks associated with the DNA that are passed on to the cellular progeny without changes in the DNA sequence. Here we describe an example where two lncRNA genes (DLEU1 and DLEU2) are epigenetically deregulated together with a cluster of neighboring protein-coding tumor suppressor genes in almost all patients suffering from chronic lymphocytic leukemia. Such a common regulation suggests that the affected genes are involved in the same cellular pathway. In line with this notion, the 13q14.3 genes modulate the NF-kB signalling pathway, either inducing or repressing its activity. Activation of NF-kB has previously been shown to promote survival of the leukemic cells, underlining the importance of the 13q14.3 tumor suppressor locus for the pathomechanism of the disease.
Targeting receptor tyrosine kinases (RTKs) with kinase inhibitors is a clinically validated anti-cancer approach. However, blocking one signaling pathway is often not sufficient to cause tumor regression and the effectiveness of individual inhibitors is often short-lived. As alterations in fibroblast growth factor receptor (FGFR) activity have been implicated in breast cancer, we examined in breast cancer models with autocrine FGFR activity the impact of targeting FGFRs in vivo with a selective kinase inhibitor in combination with an inhibitor of PI3K/mTOR or with a pan-ErbB inhibitor.
Using 4T1 or 67NR models of basal-like breast cancer, tumor growth was measured in mice treated with an FGFR inhibitor (dovitinib/TKI258), a PI3K/mTOR inhibitor (NVP-BEZ235) or a pan-ErbB inhibitor (AEE788) individually or in combination. To uncover mechanisms underlying inhibitor action, signaling pathway activity was examined in tumor lysates and transcriptome analysis carried out to identify pathways upregulated by FGFR inhibition. Anti-phosphotyrosine receptor antibody arrays (P-Tyr RTK) were also used to screen 4T1 tumors.
The combination of dovitinib + NVP-BEZ235 caused tumor stasis and strong down-regulation of the FRS2/Erk and PI3K/Akt/mTOR signaling pathways. P-Tyr RTK arrays identified high levels of P-EGFR and P-ErbB2 in 4T1 tumors. Testing AEE788 in the tumor models revealed that the combination of dovitinib + AEE788 resulted in blockade of the PI3K/Akt/mTOR pathway, prolonged tumor stasis and, in the 4T1 model, a significant decrease in lung metastasis. The results show that in vivo these breast cancer models become dependent upon co-activation of FGFR and ErbB receptors for PI3K pathway activity.
The work presented here shows that in the breast cancer models examined, the combination of dovitinib + NVP-BEZ235 or dovitinib + AEE788 results in strong inhibition of tumor growth and a block in metastatic spread. Only these combinations strongly down-regulate the FGFR/FRS2/Erk and PI3K/Akt/mTOR signaling pathways. The resultant decrease in mitosis and increase in apoptosis was consistently stronger in the dovitinib + AEE788 treatment-group, suggesting that targeting ErbB receptors has broader downstream effects compared to targeting only PI3K/mTOR. Considering that sub-classes of human breast tumors co-express ErbB receptors and FGFRs, these results have implications for targeted therapy.
A genome-wide microRNA (miRNome) screen coupled with high-throughput monitoring of protein levels reveals complex, modular miRNA regulation of the EGFR-driven cell-cycle network, and identifies new miRNAs that can suppress breast cancer cell proliferation.
We interrogated, for the first time, a mammalian oncogenic signaling network with the miRNome and report the outputs at the protein level. Whole-genome microRNA (miRNA) effects on a given protein are generally mild, supporting a fine-tuning role for miRNAs, and these effects are dominated by sequence-matching mechanisms. We developed a novel network-analysis methodology with a bipartite graph model to identify proteins co-regulated by miRNAs. Besides the sequence-based mechanism, our results demonstrated that miRNAs simultaneously regulate several proteins belonging to the same functional module. We identified three miRNAs, miR-124, miR-147 and miR-193a-3p, as novel tumor suppressors that co-regulate EGFR-driven cell-cycle network proteins, and inhibit cell-cycle progression and proliferation in breast cancer. Our results demonstrate the potential to steer miRNA research toward the network level, underlining the need for systematic approaches before positioning miRNAs as drugs or drug targets.
The EGFR-driven cell-cycle pathway has been extensively studied due to its pivotal role in breast cancer proliferation and pathogenesis. Although several studies reported regulation of individual pathway components by microRNAs (miRNAs), little is known about how miRNAs coordinate the EGFR protein network on a global miRNA (miRNome) level. Here, we combined a large-scale miRNA screening approach with a high-throughput proteomic readout and network-based data analysis to identify which miRNAs are involved, and to uncover potential regulatory patterns. Our results indicated that the regulation of proteins by miRNAs is dominated by the nucleotide matching mechanism between seed sequences of the miRNAs and 3′-UTR of target genes. Furthermore, the novel network-analysis methodology we developed implied the existence of consistent intrinsic regulatory patterns where miRNAs simultaneously co-regulate several proteins acting in the same functional module. Finally, our approach led us to identify and validate three miRNAs (miR-124, miR-147 and miR-193a-3p) as novel tumor suppressors that co-target EGFR-driven cell-cycle network proteins and inhibit cell-cycle progression and proliferation in breast cancer.
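The idea of a bipartite miRNA-protein graph can be illustrated with a small sketch. The Python code below is not the authors' network-analysis method; it only shows one simple way to read co-regulation off a bipartite edge list, namely by collecting protein pairs that share at least `min_shared` regulating miRNAs. The function name, the threshold and the identifiers in the usage note are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coregulated_pairs(edges, min_shared=2):
    """Find protein pairs that share regulating miRNAs.

    edges: iterable of (mirna, protein) tuples, i.e. the bipartite graph.
    Returns {(protein_a, protein_b): set_of_shared_mirnas} for pairs that
    share at least `min_shared` miRNAs (threshold is an assumption).
    """
    # One side of the bipartite graph: miRNA -> set of regulated proteins.
    targets = defaultdict(set)
    for mirna, protein in edges:
        targets[mirna].add(protein)

    # Project onto the protein side: every pair of proteins hit by the
    # same miRNA gains that miRNA as a shared regulator.
    shared = defaultdict(set)
    for mirna, proteins in targets.items():
        for a, b in combinations(sorted(proteins), 2):
            shared[(a, b)].add(mirna)

    return {pair: m for pair, m in shared.items() if len(m) >= min_shared}
```

With the toy edge list `[("miR-x", "P1"), ("miR-x", "P2"), ("miR-y", "P1"), ("miR-y", "P2"), ("miR-y", "P3")]`, only the pair `("P1", "P2")` is reported, because it is the only pair regulated by both hypothetical miRNAs.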
breast cancer; EGFR signaling; microRNA; miRNA–protein interaction network; network analysis
MicroRNA-200c (miR-200c) has been shown to suppress epithelial-mesenchymal transition (EMT), which is attributed mainly to targeting of ZEB1/ZEB2, repressors of the cell-cell contact protein E-cadherin. Here we demonstrated that modulation of miR-200c in breast cancer cells regulates cell migration, cell elongation, and transforming growth factor β (TGF-β)-induced stress fiber formation by impacting a reorganization of the cytoskeleton that is independent of the ZEB/E-cadherin axis. We identified FHOD1 and PPM1F, direct regulators of the actin cytoskeleton, as novel targets of miR-200c. Remarkably, expression levels of FHOD1 and PPM1F were inversely correlated with the level of miR-200c in breast cancer cell lines, breast cancer patient samples, and 58 cancer cell lines of various origins. Furthermore, individual knockdown/overexpression of these target genes phenocopied the effects of miR-200c overexpression/inhibition on cell elongation, stress fiber formation, migration, and invasion. Mechanistically, targeting of FHOD1 by miR-200c resulted in decreased expression and transcriptional activity of serum response factor (SRF), mediated by interference with the translocation of the SRF coactivator myocardin-related transcription factor A (MRTF-A). This finally led to downregulation of the expression and phosphorylation of the SRF target myosin light chain 2 (MLC2) gene, required for stress fiber formation and contractility. Thus, miR-200c impacts on metastasis by regulating several EMT-related processes, including a novel mechanism involving the direct targeting of actin-regulatory proteins.
Cell migration is essential during development and in human disease progression including cancer. Most cell migration studies concentrate on known or predicted components of migration pathways.
Here we use data from a genome-wide RNAi morphology screen in Drosophila melanogaster cells together with bioinformatics to identify 26 new regulators of morphology and cytoskeletal organization in human cells. These include genes previously implicated in a wide range of functions, from mental retardation, Down syndrome and Huntington's disease to RNA and DNA-binding genes. We classify these genes into seven groups according to phenotype and identify those that affect cell migration. We further characterize a subset of seven genes, FAM40A, FAM40B, ARC, FMNL3, FNBP3/FBP11, LIMD1 and ZRANB1, each of which has a different effect on cell shape, actin filament distribution and cell migration. Interestingly, in several instances closely related isoforms with a single Drosophila homologue have distinct phenotypes. For example, FAM40B depletion induces cell elongation and tail retraction defects, whereas FAM40A depletion reduces cell spreading.
Our results identify multiple regulators of cell migration and cytoskeletal signalling that are highly conserved between Drosophila and humans, and show that closely related paralogues can have very different functions in these processes.
Network inference from high-throughput data has become an important means of analysing biological systems. For instance, in cancer research, the functional relationships of cancer-related proteins, summarised into signalling networks, are of central interest for the identification of pathways that influence tumour development. Cancer cell lines can be used as model systems to study the cellular response to drug treatments in a time-resolved way. Based on this kind of data, modelling approaches for the signalling relationships are needed that allow hypotheses to be generated on potential interference points in the networks.
We present the R-package 'ddepn' that implements our recent approach on network reconstruction from longitudinal data generated after external perturbation of network components. We extend our approach by two novel methods: a Markov Chain Monte Carlo method for sampling network structures with two edge types (activation and inhibition) and an extension of a prior model that penalises deviances from a given reference network while incorporating these two types of edges. Further, as alternative prior we include a model that learns signalling networks with the scale-free property.
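As a rough illustration of sampling network structures with two edge types, the sketch below runs a plain Metropolis sampler over signed adjacency matrices (1 for activation, -1 for inhibition, 0 for no edge). It is written in Python rather than R and is not the ddepn implementation: the function names, the user-supplied `log_lik` callback, the penalty weight `lam` and the absolute-difference penalty against the reference network are all assumptions made for this example.

```python
import math
import random


def log_prior(net, ref, lam=1.0):
    """Penalise edges that differ from the reference network (assumed form)."""
    n = len(net)
    diff = sum(abs(net[i][j] - ref[i][j]) for i in range(n) for j in range(n))
    return -lam * diff


def propose(net, rng):
    """Change one off-diagonal edge to one of the two other edge types."""
    n = len(net)
    i, j = rng.sample(range(n), 2)  # two distinct nodes, so no self-loops
    new = [row[:] for row in net]
    new[i][j] = rng.choice([v for v in (-1, 0, 1) if v != net[i][j]])
    return new


def sample_networks(log_lik, ref, n_iter=2000, lam=1.0, seed=0):
    """Metropolis sampling of signed networks, starting from the empty net."""
    rng = random.Random(seed)
    n = len(ref)
    current = [[0] * n for _ in range(n)]
    cur_score = log_lik(current) + log_prior(current, ref, lam)
    samples = []
    for _ in range(n_iter):
        cand = propose(current, rng)
        cand_score = log_lik(cand) + log_prior(cand, ref, lam)
        # The proposal is symmetric, so the acceptance ratio only needs
        # the difference of the (unnormalised) log posterior scores.
        if rng.random() < math.exp(min(0.0, cand_score - cur_score)):
            current, cur_score = cand, cand_score
        samples.append(current)
    return samples
```

In practice `log_lik` would score a candidate network against the longitudinal measurements; edge frequencies over the collected samples then give a simple summary of which activations and inhibitions are supported.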
The package 'ddepn' is freely available on R-Forge (http://ddepn.r-forge.r-project.org) and CRAN (http://cran.r-project.org). It allows convenient network inference from longitudinal high-throughput data using two different sampling-based network structure search algorithms.
Analysis of biological processes is frequently performed with the help of phenotypic assays in which data are mostly acquired in single end-point analyses. Alternative phenotypic profiling techniques are desirable where time-series information is essential to the biological question, for instance to differentiate early and late regulators of cell proliferation in loss-of-function studies. So far, no study has addressed this question despite high interest, mostly because of the limitations of conventional end-point assay technologies. We present the first human kinome screen with a real-time cell analysis (RTCA) system to capture dynamic RNAi phenotypes, employing time-resolved monitoring of cell proliferation via electrical impedance. RTCA allowed us to investigate the dynamics of cell proliferation phenotypes instead of relying on conventional end-point analysis. By transforming the data with the first-order derivative, i.e. the cell-index growth rate, we demonstrate that this system is suitable for high-throughput screening (HTS). The screen validated previously identified inhibitor genes and, additionally, identified activators of cell proliferation. With the time kinetics available, we could establish that a network of genes related to mitotic events is among the first to display inhibiting effects after RNAi knockdown. The time-resolved screen captured the kinetics of cell proliferation caused by RNAi targeting the human kinome, serving as a resource for researchers. Our work establishes RTCA technology as a novel, robust tool with biological and pharmacological relevance that is amenable to high-throughput screening.
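The first-order derivative transformation can be sketched in a few lines. The Python example below is only an assumed, minimal version of the idea: it estimates the cell-index growth rate by finite differences and then reports the first time point at which a knockdown curve grows more slowly than a control by more than a chosen margin. The function names and the `threshold` parameter are hypothetical and do not come from the screen itself.

```python
def growth_rate(time_h, cell_index):
    """Finite-difference estimate of d(cell index)/dt at each time point.

    Uses central differences in the interior and one-sided differences
    at the two ends of the series.
    """
    n = len(time_h)
    if n != len(cell_index) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    rates = []
    for k in range(n):
        lo = max(k - 1, 0)
        hi = min(k + 1, n - 1)
        rates.append((cell_index[hi] - cell_index[lo]) / (time_h[hi] - time_h[lo]))
    return rates


def first_deviation_time(time_h, rate_knockdown, rate_control, threshold):
    """Earliest time at which control rate exceeds knockdown rate by > threshold."""
    for t, kd, ctrl in zip(time_h, rate_knockdown, rate_control):
        if ctrl - kd > threshold:
            return t
    return None
```

Comparing such deviation times across knockdowns is one simple way to separate early from late effects, which a single end-point measurement cannot do.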
Reverse phase protein arrays (RPPA) have been demonstrated to be a useful experimental platform for quantitative protein profiling in a high-throughput format. Target protein detection relies on the readout obtained from a single detection antibody. For this reason, antibody specificity is a key factor for RPPA. RNAi allows the specific knockdown of a target protein in complex samples and was therefore examined for its utility to assess antibody performance for RPPA applications.
To prove the feasibility of our strategy, two different anti-EGFR antibodies were compared by RPPA. Both detected the knockdown of EGFR but at a different rate. Western blot data were used to identify the most reliable antibody. The RNAi approach was also used to characterize commercial anti-STAT3 antibodies. Out of ten tested anti-STAT3 antibodies, four antibodies detected the STAT3 knockdown at 80-85%, and the most sensitive anti-STAT3 antibody was identified by comparing detection limits. Thus, the use of RNAi for RPPA antibody validation was demonstrated to be a stringent approach to identify highly specific and highly sensitive antibodies. Furthermore, the RNAi/RPPA strategy is also useful for the validation of isoform-specific antibodies, as shown for the identification of AKT1/AKT2 and CCND1/CCND3-specific antibodies.
RNAi is a valuable tool for the identification of very specific and highly sensitive antibodies, and is therefore especially useful for the validation of RPPA-suitable detection antibodies. On the other hand, when a set of well-characterized RPPA-antibodies is available, large-scale RNAi experiments analyzed by RPPA might deliver useful information for network reconstruction.
Motivation: Network modelling in systems biology has become an important tool to study molecular interactions in cancer research, because understanding the interplay of proteins is necessary for developing novel drugs and therapies. De novo reconstruction of signalling pathways from data makes it possible to unravel interactions between proteins and to make qualitative statements on possible aberrations of the cellular regulatory program. We present a new method for reconstructing signalling networks from time course experiments after external perturbation and show an application of the method to data measuring abundance of phosphorylated proteins in a human breast cancer cell line, generated on reverse phase protein arrays.
Results: Signalling dynamics is modelled using active and passive states for each protein at each timepoint. A fixed signal propagation scheme generates a set of possible state transitions on a discrete timescale for a given network hypothesis, reducing the number of theoretically reachable states. A likelihood score is proposed, describing the probability of measurements given the states of the proteins over time. The optimal sequence of state transitions is found via a hidden Markov model and network structure search is performed using a genetic algorithm that optimizes the overall likelihood of a population of candidate networks. Our method shows increased performance compared with two different dynamical Bayesian network approaches. For our real data, we were able to find several known signalling cascades from the ERBB signalling pathway.
Availability: Dynamic deterministic effects propagation networks is implemented in the R programming language and available at http://www.dkfz.de/mga2/ddepn/
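To make the notion of decoding active and passive states over time concrete, the following Python sketch applies the Viterbi algorithm to a single protein's measurement series. It is a simplified stand-in, not the published method: it assumes Gaussian emissions with a shared standard deviation, a fixed transition matrix, and one protein decoded in isolation, whereas the approach described above restricts transitions through a signal propagation scheme for a given network hypothesis. All names and parameters are hypothetical.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def viterbi(obs, means, sigma, log_trans, log_init):
    """Most likely state sequence (0 = passive, 1 = active) for one protein.

    obs: measurements over time; means[s]: emission mean of state s;
    log_trans[p][s]: log probability of moving from state p to state s.
    """
    n_states = len(means)
    score = [log_init[s] + gaussian_logpdf(obs[0], means[s], sigma)
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for x in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + gaussian_logpdf(x, means[s], sigma))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

For a series that starts near the passive mean and rises to the active mean, the decoded path switches from 0 to 1 at the time point where the measurements jump.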
Reverse phase protein arrays (RPPA) emerged as a useful experimental platform to analyze biological samples in a high-throughput format. Different signal detection methods have been described to generate a quantitative readout on RPPA including the use of fluorescently labeled antibodies. Increasing the sensitivity of RPPA approaches is important since many signaling proteins or posttranslational modifications are present at a low level.
A new antibody-mediated signal amplification (AMSA) strategy relying on sequential incubation steps with fluorescently labeled secondary antibodies reactive against each other is introduced here. The signal quantification is performed in the near-infrared (NIR) range. The RPPA-based analysis of 14 endogenous proteins in seven different cell lines demonstrated a strong correlation (r = 0.89) between AMSA and standard NIR detection. Probing serial dilutions of human cancer cell lines with different primary antibodies demonstrated that the new amplification approach improved the limit of detection, especially for low-abundance target proteins.
Antibody-mediated signal amplification is a convenient and cost-effective approach for the robust and specific quantification of low-abundance proteins on RPPAs. In contrast to other amplification approaches, it allows target protein detection over a large linear range.
A report on the conference 'Systems Genomics 2008', Heidelberg, Germany, 2-3 May 2008.
Motivation: KEGG PATHWAY is a service of Kyoto Encyclopedia of Genes and Genomes (KEGG), constructing manually curated pathway maps that represent current knowledge on biological networks in graph models. While valuable graph tools have been implemented in R/Bioconductor, to our knowledge there is currently no software package to parse and analyze KEGG pathways with graph theory.
Results: We introduce the software package KEGGgraph in R and Bioconductor, an interface between KEGG pathways and graph models as well as a collection of tools for these graphs. Superior to existing approaches, KEGGgraph captures the pathway topology and allows further analysis or dissection of pathway graphs. We demonstrate the use of the package by the case study of analyzing human pancreatic cancer pathway.
Availability: KEGGgraph is freely available at the Bioconductor web site (http://www.bioconductor.org). KGML files can be downloaded from the KEGG FTP site (ftp://ftp.genome.jp/pub/kegg/xml).
Supplementary information: Supplementary data are available at Bioinformatics online.
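The conversion of a pathway file into a graph, as described for KEGGgraph above, can be sketched roughly as follows. This Python example is not the KEGGgraph API (that package is written in R); it only assumes the general KGML layout of `entry` elements carrying `id` and `name` attributes and `relation` elements carrying `entry1`, `entry2` and nested `subtype` elements. The function names and the toy pathway are invented for illustration.

```python
import xml.etree.ElementTree as ET


def kgml_to_graph(kgml_text):
    """Return (nodes, edges) parsed from a KGML document given as a string.

    nodes: {entry_id: entry_name}
    edges: list of (source_id, target_id, subtype_names)
    """
    root = ET.fromstring(kgml_text)
    nodes = {entry.get("id"): entry.get("name") for entry in root.iter("entry")}
    edges = []
    for rel in root.iter("relation"):
        source, target = rel.get("entry1"), rel.get("entry2")
        if source not in nodes or target not in nodes:
            continue  # skip relations that point at unknown entries
        subtypes = tuple(sub.get("name") for sub in rel.findall("subtype"))
        edges.append((source, target, subtypes))
    return nodes, edges


def out_degree(nodes, edges):
    """Number of outgoing relations per node, a basic topology measure."""
    degree = {node_id: 0 for node_id in nodes}
    for source, _target, _subtypes in edges:
        degree[source] += 1
    return degree


# Invented toy pathway in KGML-like form, for illustration only.
TOY_KGML = """<pathway name="path:toy" org="toy" number="00000">
  <entry id="1" name="toy:geneA" type="gene"/>
  <entry id="2" name="toy:geneB" type="gene"/>
  <entry id="3" name="toy:geneC" type="gene"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
  <relation entry1="1" entry2="3" type="PPrel">
    <subtype name="inhibition" value="--|"/>
  </relation>
</pathway>"""
```

Calling `kgml_to_graph(TOY_KGML)` yields three nodes and two directed edges, and `out_degree` then shows that the first entry is the only node with outgoing relations. Keeping the edge subtypes preserves the pathway topology for later graph analysis.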
In breast cancer, overexpression of the transmembrane tyrosine kinase ERBB2 is an adverse prognostic marker, and occurs in almost 30% of the patients. For therapeutic intervention, ERBB2 is targeted by monoclonal antibody trastuzumab in adjuvant settings; however, de novo resistance to this antibody is still a serious issue, requiring the identification of additional targets to overcome resistance. In this study, we have combined computational simulations, experimental testing of simulation results, and finally reverse engineering of a protein interaction network to define potential therapeutic strategies for de novo trastuzumab resistant breast cancer.
First, we employed Boolean logic to model regulatory interactions and simulated single and multiple protein loss-of-functions. Then, our simulation results were tested experimentally by producing single and double knockdowns of the network components and measuring their effects on G1/S transition during cell cycle progression. Combinatorial targeting of ERBB2 and EGFR did not affect the response to trastuzumab in de novo resistant cells, which might be due to decoupling of receptor activation and cell cycle progression. Furthermore, examination of c-MYC in resistant as well as in sensitive cell lines, using a specific chemical inhibitor of c-MYC (alone or in combination with trastuzumab), demonstrated that both trastuzumab sensitive and resistant cells responded to c-MYC perturbation.
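The simulation of single and multiple loss-of-function perturbations in a Boolean model can be illustrated with a toy example. The Python sketch below is not the network from this study: the node names and update rules are invented, and synchronous updating until a fixed point is an assumed choice. It only shows why a double knockdown can block an output that either single knockdown leaves intact.

```python
def simulate(rules, state, knocked_out=(), max_steps=50):
    """Synchronously update a Boolean network until it stops changing.

    rules: {node: function(state_dict) -> bool}
    knocked_out: nodes forced to False (loss of function).
    """
    state = dict(state)
    for node in knocked_out:
        state[node] = False
    for _ in range(max_steps):
        new_state = {node: (False if node in knocked_out else rule(state))
                     for node, rule in rules.items()}
        if new_state == state:
            return new_state  # fixed point reached
        state = new_state
    return state


# Invented toy wiring: two redundant receptors feed one kinase, which
# activates a transcription factor that drives the G1/S transition.
TOY_RULES = {
    "ligand": lambda s: s["ligand"],
    "receptor_a": lambda s: s["ligand"],
    "receptor_b": lambda s: s["ligand"],
    "kinase": lambda s: s["receptor_a"] or s["receptor_b"],
    "tf": lambda s: s["kinase"],
    "g1s": lambda s: s["tf"],
}

TOY_START = {"ligand": True, "receptor_a": False, "receptor_b": False,
             "kinase": False, "tf": False, "g1s": False}
```

In this toy wiring, knocking out `receptor_a` alone leaves `g1s` active because `receptor_b` compensates, whereas knocking out both receptors, or `tf` alone, switches `g1s` off. Predictions of this kind are what single and double knockdown experiments can then test.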
In this study, we connected ERBB signaling with G1/S transition of the cell cycle via two major cell signaling pathways and two key transcription factors, to model an interaction network that allows for the identification of novel targets in the treatment of trastuzumab resistant breast cancer. Applying this new strategy, we found that, in contrast to trastuzumab sensitive breast cancer cells, combinatorial targeting of ERBB receptors or of key signaling intermediates does not have potential for treatment of de novo trastuzumab resistant cells. Instead, c-MYC was identified as a novel potential target protein in breast cancer cells.
An arbitrary set of 96 human proteins was selected and tested to set up a fully automated protein production strategy, covering all steps from DNA preparation to protein purification and analysis. The target proteins are encoded by functionally uncharacterized open reading frames (ORFs) identified by the German cDNA Consortium. Fusion proteins were produced in E. coli with four different fusion tags and tested in five different purification strategies depending on the respective fusion tag. The automated strategy relies on standard liquid handling and clone picking equipment.
A robust automated strategy for the production of recombinant human proteins in E. coli was established based on a set of four different protein expression vectors resulting in NusA/His, MBP/His, GST and His-tagged proteins. The yield of soluble fusion protein was correlated with the induction temperature and the respective fusion tag. NusA/His and MBP/His fusion proteins are best expressed at low temperature (25°C), whereas the yield of soluble GST fusion proteins was higher when protein expression was induced at elevated temperature. In contrast, the induction of soluble His-tagged fusion proteins was independent of the temperature. Amylose was not found useful for affinity-purification of MBP/His fusion proteins in a high-throughput setting, and metal chelating chromatography is recommended instead.
Soluble fusion proteins can be produced in E. coli in sufficient quality and in μg/ml culture quantities for downstream applications such as microarray-based assays and studies on protein-protein interactions, employing a fully automated protein expression and purification strategy. Future applications might include the optimization of experimental conditions for the large-scale production of soluble recombinant proteins from libraries of open reading frames.
High-throughput technologies like functional screens and gene expression analysis produce extended lists of candidate genes. Gene-Set Enrichment Analysis is a commonly used and well-established technique to test for the statistically significant over-representation of particular pathways. A shortcoming of this method, however, is that most genes investigated in the experiments have very sparse functional or pathway annotation and therefore cannot be the target of such an analysis. The approach presented here aims to assign lists of genes with limited annotation to previously described functional gene collections or pathways. This works by comparing InterPro domain signatures of the candidate gene lists with domain signatures of gene sets derived from known classifications, e.g. KEGG pathways.
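For orientation, the kind of gene-level over-representation test referred to above is commonly computed as a one-sided hypergeometric tail probability. The Python sketch below shows that classic test only; it is not the domain-signature method of this work, and the function and parameter names are chosen for illustration.

```python
from math import comb


def hypergeom_pvalue(n_universe, n_pathway, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn at random.

    n_universe: number of annotated genes considered in total
    n_pathway:  number of those genes that belong to the pathway
    n_list:     size of the candidate gene list
    n_overlap:  candidate genes that belong to the pathway
    """
    upper = min(n_pathway, n_list)
    total = comb(n_universe, n_list)
    tail = sum(comb(n_pathway, k) * comb(n_universe - n_pathway, n_list - k)
               for k in range(n_overlap, upper + 1))
    return tail / total
```

The test can only count genes that carry a pathway annotation, which is the limitation that motivates comparing domain signatures instead.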
In order to validate our approach, we designed a simulation study. Based on all pathways available in the KEGG database, we create test gene lists by randomly selecting pathway genes, removing these genes from the known pathways and adding variable amounts of noise in the form of genes not annotated to the pathway. We show that we can recover pathway memberships based on the simulated gene lists with high accuracy. We further demonstrate the applicability of our approach on a biological example.
Results based on simulation and data analysis show that domain based pathway enrichment analysis is a very sensitive method to test for enrichment of pathways in sparsely annotated lists of genes. An R based software package domainsignatures, to routinely perform this analysis on the results of high-throughput screening, is available via Bioconductor.
With the completion of the human genome sequence, the functional analysis and characterization of the encoded proteins has become the next urgent challenge in the post-genome era. The lack of comprehensive ORFeome resources has thus far hampered systematic applications by protein gain-of-function analysis. Gene and ORF coverage with full-length ORF clones thus needs to be extended. In combination with a unique and versatile cloning system, these will provide the tools for genome-wide systematic functional analyses, to achieve a deeper insight into complex biological processes.
Here we describe the generation of a full-ORF clone resource of human genes applying the Gateway cloning technology (Invitrogen). A pipeline for efficient cloning and sequencing was developed and a sample tracking database was implemented to streamline the clone production process targeting more than 2,200 different ORFs. In addition, a robust cloning strategy was established, permitting the simultaneous generation of two clone variants that contain a particular ORF with as well as without a stop codon by the implementation of only one additional working step into the cloning procedure. Up to 92 % of the targeted ORFs were successfully amplified by PCR and more than 93 % of the amplicons successfully cloned.
The German cDNA Consortium ORFeome resource currently consists of more than 3,800 sequence-verified entry clones representing ORFs, cloned with and without stop codon, for about 1,700 different gene loci. 177 splice variants were cloned representing 121 of these genes. The entry clones have been used to generate over 5,000 different expression constructs, providing the basis for functional profiling applications. As a member of the recently formed international ORFeome collaboration, we substantially contribute to generating and providing a whole-genome human ORFeome collection in a unique cloning system that is made freely available to the community.
Neurons, with their long axons and elaborate dendritic arbour, establish the complex circuitry that is essential for the proper functioning of the nervous system. Whereas a catalogue of structural, molecular, and functional differences between axons and dendrites is accumulating, the mechanisms involved in early events of neuronal differentiation, such as neurite initiation and elongation, are less well understood, mainly because the key molecules involved remain elusive. Here we describe the establishment and application of a microscopy-based approach designed to identify novel proteins involved in neurite initiation and/or elongation. We identified 21 proteins that affected neurite outgrowth when ectopically expressed in cells. Complementary time-lapse microscopy allowed us to discriminate between early and late effector proteins. Localization experiments with GFP-tagged proteins in fixed and living cells revealed a further 14 proteins that associated with neurite tips either early or late during neurite outgrowth. Coexpression experiments with the new effector proteins provide a first glimpse of a possible functional relationship between these proteins during neurite outgrowth. Altogether, we demonstrate the potential of the systematic microscope-based screening approaches described here to tackle the complex biological process of neurite outgrowth regulation.
The German cDNA Consortium has been cloning full length cDNAs and continued with their exploitation in protein localization experiments and cellular assays. However, the efficient use of large cDNA resources requires the development of strategies that are capable of a speedy selection of truly useful cDNAs from biological and experimental noise. To this end we have developed a new high-throughput analysis tool, CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their systematic annotation and application in functional genomics.
CAFTAN is built around the mapping of cDNAs to the genome assembly, and the subsequent analysis of their genomic context. It uses sequence features like the presence and type of PolyA signals, inner and flanking repeats, the GC-content, splice site types, etc. All these features are evaluated in individual tests and classify cDNAs according to their sequence quality and likelihood to have been generated from fully processed mRNAs. Additionally, CAFTAN compares the coordinates of mapped cDNAs with the genomic coordinates of reference sets from publicly available resources (e.g., VEGA, ENSEMBL). This provides detailed information about overlapping exons and the structural classification of cDNAs with respect to the reference set of splice variants.
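Two of the sequence features named above, GC content and the presence of a polyA signal, are simple enough to sketch. The Python example below is an illustration only and is not taken from CAFTAN, which is written in Perl: the function names, the 50-nucleotide search window and the restriction to the two most common signal hexamers (AATAAA and ATTAAA) are assumptions.

```python
POLYA_SIGNALS = ("AATAAA", "ATTAAA")  # the two most common signal hexamers


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_polya_signal(seq, window=50):
    """Look for a polyA signal in the last `window` bases of a cDNA.

    Returns (hexamer, start_position_in_seq) for the signal closest to
    the 3' end, or None if neither hexamer is found.
    """
    tail = seq.upper()[-window:]
    offset = len(seq) - len(tail)
    best = None
    for signal in POLYA_SIGNALS:
        pos = tail.rfind(signal)
        if pos != -1 and (best is None or pos > best[1]):
            best = (signal, pos)
    if best is None:
        return None
    return best[0], offset + best[1]
```

A classifier in this spirit would run each such test independently and combine the outcomes into a quality class for every cDNA.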
The evaluation of CAFTAN showed that it is able to correctly classify more than 85% of 5950 selected "known protein-coding" VEGA cDNAs as high quality multi- or single-exon. It identified as good 80.6% of the single-exon cDNAs and 85% of the multiple-exon cDNAs.
The program is written in Perl in a modular way, so that the strategy can be adapted to other tasks such as EST annotation, or extended by adding new classification rules and new organism databases as they become available. We think that it is a very useful program for the annotation and study of unfinished genomes.
CAFTAN is a high-throughput sequence analysis tool that performs a fast and reliable quality prediction of cDNAs. Several thousand cDNAs can be analyzed in a short time, giving the curator/scientist a first quick overview of the quality and the existing annotation of a set of cDNAs. It supports the rejection of low-quality cDNAs and helps in the selection of likely novel splice variants and/or completely novel transcripts for new experiments.
A software tool for the analysis of high-throughput cell-based assays is presented.
High-throughput cell-based assays with flow cytometric readout provide a powerful technique for identifying components of biologic pathways and their interactors. Interpretation of these large datasets requires effective computational methods. We present a new approach that includes data pre-processing, visualization, quality assessment, and statistical inference. The software is freely available in the Bioconductor package prada. The method permits analysis of large screens to detect the effects of molecular interventions in cellular systems.
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants.
The identification of patterns in biological sequences is a key challenge in genome analysis and in proteomics. Such patterns are often complex and highly variable, especially in protein sequences. They are frequently described with regular expressions (RegEx) because of the user-friendly terminology. As patterns grow more complex, however, regular-expression queries reach their limits and enhanced capabilities are required. This is especially true for patterns containing ambiguous characters and positions and/or length ambiguities.
We have implemented the 3of5 web application in order to enable complex pattern matching in protein sequences. 3of5 is named after a special use of its main feature, the novel n-of-m pattern type. This feature allows for an extensive specification of variable patterns where the individual elements may vary in their position, order, and content within a defined stretch of sequence. The number of distinct elements can be constrained by operators, and individual characters may be excluded. The n-of-m pattern type can be combined with common regular expression terms and thus also allows for a comprehensive description of complex patterns. 3of5 increases the fidelity of pattern matching and, for length-ambiguous patterns, finds all possible solutions in protein sequences instead of simply reporting the longest or shortest hits. Grouping and combined search for patterns provide a hierarchical arrangement of larger pattern sets. The algorithm is implemented as an internet application and is freely accessible. The application is available at .
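The two ideas above, an n-of-m element and the reporting of all solutions for length-ambiguous patterns, can be illustrated with a short Python sketch. This is a simplified reading and not the 3of5 algorithm or its pattern syntax: here an n-of-m match is taken to mean a window of m residues in which at least n belong to a given residue set, and the function names and the `max_len` parameter are hypothetical.

```python
import re
from typing import List, Tuple

Hit = Tuple[int, str]  # (0-based start position, matched substring)


def n_of_m_matches(sequence: str, residues: str, n: int, m: int) -> List[Hit]:
    """Report every window of length m containing at least n residues
    drawn from `residues`. Overlapping windows are all reported."""
    if not 0 < n <= m:
        raise ValueError("require 0 < n <= m")
    allowed = set(residues.upper())
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        if sum(1 for aa in window if aa in allowed) >= n:
            hits.append((start, window))
    return hits


def all_solutions(pattern: str, sequence: str, max_len: int) -> List[Hit]:
    """Enumerate every substring (up to max_len residues) that the regular
    expression matches in full, rather than only the longest or shortest
    hit per start position."""
    compiled = re.compile(pattern)
    hits = []
    for start in range(len(sequence)):
        last_end = min(len(sequence), start + max_len)
        for end in range(start + 1, last_end + 1):
            candidate = sequence[start:end]
            if compiled.fullmatch(candidate):
                hits.append((start, candidate))
    return hits
```

For example, `n_of_m_matches("AKRAKAAAA", "KR", 3, 5)` reports the two overlapping windows `AKRAK` and `KRAKA`, and `all_solutions("KA{1,3}", "KAAA", 4)` reports `KA`, `KAA`, and `KAAA`, whereas a standard greedy regular expression search would return only `KAAA`. The exhaustive enumeration is quadratic in the worst case and is meant only to make the behaviour explicit.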
The 3of5 application offers an extended vocabulary for the definition of search patterns and thus allows the user to comprehensively specify and identify peptide patterns with variable elements. The n-of-m pattern type offers an improved accuracy for pattern matching in combination with the ability to find all solutions, without compromising the user friendliness of regular expression terms.
Well known for its gene density and the large number of mapped diseases, the human sub-chromosomal region Xq28 has long been a focus of genome research. Over 40 of approximately 300 X-linked diseases map to this region, and systematic mapping, transcript identification, and mutation analysis have led to the identification of causative genes for 26 of these diseases, leaving another 17 diseases mapped to Xq28 for which the causative gene is still unknown. To expedite disease gene identification, we have initiated the functional characterisation of all known Xq28 genes.
By using a systematic approach, we describe the Xq28 genes by RNA in situ hybridisation and Northern blotting of the mouse orthologs, as well as subcellular localisation and data mining of the human genes. We have developed a relational web-accessible database with comprehensive query options integrating all experimental data. Using this database, we matched gene expression patterns with affected tissues for 16 of the 17 remaining Xq28-linked diseases for which the causative gene is unknown.
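The matching of expression patterns with affected tissues can be pictured as a simple set comparison. The Python sketch below is an assumption about how such a prioritisation might be expressed; it does not reproduce the database's actual queries or scoring, and the function name and any gene or tissue labels passed to it are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def rank_candidates(expression: Dict[str, Set[str]],
                    affected: Set[str]) -> List[Tuple[str, int]]:
    """Rank genes by the number of disease-affected tissues in which
    they are expressed; genes with no overlap are dropped."""
    scored = [(gene, len(tissues & affected))
              for gene, tissues in expression.items()]
    scored = [item for item in scored if item[1] > 0]
    # Highest overlap first; ties are broken alphabetically by gene name.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Called with a mapping from each gene in a linkage interval to the tissues in which it is expressed, together with the set of tissues affected in a disease, the function returns a shortlist ordered by overlap, which is the kind of reduced candidate list that makes a mutational screen tractable.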
By using this systematic approach, we have prioritised genes in linkage regions of Xq28-mapped diseases to an amenable number for mutational screens. Our database can be queried by any researcher performing highly specified searches including diseases not listed in OMIM or diseases that might be linked to Xq28 in the future.
LIFEdb integrates data from large-scale functional genomics assays and manual cDNA annotation with bioinformatics gene expression and protein analysis. New features of LIFEdb include (i) an updated user interface with enhanced query capabilities, (ii) a configurable output table and the option to download search results in XML, (iii) the integration of data from cell-based screening assays addressing the influence of protein-overexpression on cell proliferation and (iv) the display of the relative expression (‘Electronic Northern’) of the genes under investigation using curated gene expression ontology information. LIFEdb enables researchers to systematically select and characterize genes and proteins of interest, and presents data and information via its user-friendly web-based interface.
Given the complexity of higher organisms, the number of genes encoded by their genomes is surprisingly small. Tissue specific regulation of expression and splicing are major factors enhancing the number of the encoded products. Commonly these mechanisms are intragenic and affect only one gene.
Here we provide evidence that the IL4I1 gene is specifically transcribed from the apparent promoter of the upstream NUP62 gene, and that the first two exons of NUP62 are also contained in the novel IL4I1_2 variant. While expression of IL4I1 driven from its previously described promoter is found mostly in B cells, the expression driven by the NUP62 promoter is restricted to cells in testis (Sertoli cells) and in the brain (e.g., Purkinje cells). Since NUP62 is itself ubiquitously expressed, the IL4I1_2 variant likely derives from cell type specific alternative pre-mRNA processing.
Comparative genomics suggest that the promoter upstream of the NUP62 gene originally belonged to the IL4I1 gene and was later acquired by NUP62 via insertion of a retroposon. Since both genes are apparently essential, the promoter had to serve two genes afterwards. Expression of the IL4I1 gene from the "NUP62" promoter and the tissue specific involvement of the pre-mRNA processing machinery to regulate expression of two unrelated proteins indicate a novel mechanism of gene regulation.