
Results 1-5 (5)

1.  Integrating heterogeneous sequence information for transcriptome-wide microarray design; a Zebrafish example 
BMC Research Notes  2010;3:192.
A complete gene-expression microarray should preferably detect all genomic sequences that can be expressed as RNA in an organism, i.e. the transcriptome. However, our knowledge of the transcriptome of any organism is still incomplete, and transcriptome information is continuously being updated. Here, we present a strategy to integrate heterogeneous sequence information that can be used as input for an up-to-date microarray design.
Our algorithm consists of four steps. In the first step, transcripts from different resources are grouped into Transcription Clusters (TCs) based on the similarity of all transcripts. TCs are groups of transcripts with a similar length. If a transcript is much shorter than a TC to which it is highly similar, it is annotated as a subsequence of that TC and is used for probe design only if the probe designed for the TC does not query the subsequence. Secondly, all TCs are mapped to a genome assembly and gene information is added to the design. Thirdly, TC members are ranked according to their trustworthiness and the most reliable sequence is used for the probe design. The last step is the actual array design. We have used this strategy to build an up-to-date zebrafish microarray.
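The first step (grouping transcripts into TCs and flagging much shorter, highly similar transcripts as subsequences) can be illustrated with a minimal Python sketch. It is not the authors' software: the k-mer containment score, the greedy longest-first strategy, the thresholds `min_similarity` and `min_length_ratio`, and all names below are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptCluster:
    """One hypothetical TC: a representative sequence plus its members."""
    representative: str
    members: list = field(default_factory=list)       # transcripts of similar length
    subsequences: list = field(default_factory=list)  # much shorter, highly similar


def kmer_set(seq, k):
    """All overlapping substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query, target, k):
    """Fraction of the query's k-mers that also occur in the target.

    Assumption: a crude stand-in for a real sequence-similarity score.
    """
    query_kmers = kmer_set(query, k)
    if not query_kmers:
        return 0.0
    return len(query_kmers & kmer_set(target, k)) / len(query_kmers)


def cluster_transcripts(transcripts, min_similarity=0.9,
                        min_length_ratio=0.8, k=8):
    """Greedily group transcripts (name -> sequence) into clusters."""
    clusters = []
    # Longest first, so a representative is never shorter than later arrivals.
    for name, seq in sorted(transcripts.items(), key=lambda kv: -len(kv[1])):
        for tc in clusters:
            if containment(seq, tc.representative, k) >= min_similarity:
                ratio = len(seq) / len(tc.representative)
                if ratio >= min_length_ratio:
                    tc.members.append(name)        # similar length: TC member
                else:
                    tc.subsequences.append(name)   # much shorter: subsequence
                break
        else:
            # No sufficiently similar cluster exists yet: start a new one.
            clusters.append(TranscriptCluster(representative=seq, members=[name]))
    return clusters
```

In this sketch, raising `min_similarity` or `min_length_ratio` yields more, tighter clusters, which mirrors the idea of controlling the number of candidate sequences through the procedure's parameters.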
With our strategy and the software developed, it is possible to use a set of heterogeneous transcript resources for microarray design, reduce the number of candidate target sequences on which the design is based, and reduce redundancy. By changing the parameters in the procedure, it is possible to control the similarity within the TCs and thus the number of candidate sequences for the design. The annotation of the microarray is carried out simultaneously with the design.
PMCID: PMC2913925  PMID: 20626891
2.  RNA isolation method for single embryo transcriptome analysis in zebrafish 
BMC Research Notes  2010;3:73.
Transcriptome analysis during embryogenesis usually requires pooling of embryos to obtain sufficient RNA. Hence, the measured levels of gene expression represent the average mRNA levels of pooled samples, and the biological variation among individuals is confounded. This can irreversibly reduce the robustness, resolution, or expressiveness of the experiment. Therefore, we developed a robust method to isolate abundant high-quality RNA from individual embryos to perform single embryo transcriptome analyses using zebrafish as a model organism. Available methods for embryonic zebrafish RNA isolation require at least ten embryos. Further downscaling of these methods to one embryo is not feasible in practice.
We developed a single embryo RNA extraction method based on sample homogenization in liquid nitrogen, RNA extraction with phenol and column purification. Evaluation of this method showed that: the quality of the RNA was very good with an average RIN value of 8.3-8.9; the yield was always ≥ 200 ng RNA per embryo; the method was applicable to all stages of zebrafish embryogenesis; the success rate was almost 100%; and the extracted RNA performed excellently in microarray experiments in that the technical variation was much lower than the biological variation.
Presented is a high-quality, robust RNA isolation method. Obtaining sufficient RNA from single embryos eliminates the necessity of sample pooling and its associated drawbacks. Although our RNA isolation method has been set up for transcriptome analysis in zebrafish, it can also be used for other model systems and other applications like (q)PCR and transcriptome sequencing.
PMCID: PMC2845602  PMID: 20233395
3.  SigWinR; the SigWin-detector updated and ported to R 
BMC Research Notes  2009;2:205.
Our SigWin-detector discovers significantly enriched windows of (genomic) elements in any sequence of values (genes or other genomic elements in a DNA sequence) in a fast and reproducible way. However, since it is grid based, only (life) scientists with access to the grid can use this tool. Therefore and on request, we have developed the SigWinR package which makes the SigWin-detector available to a much wider audience. At the same time, we have introduced several improvements to its algorithm as well as its functionality, based on the feedback of SigWin-detector end users.
To allow usage of the SigWin-detector on a desktop computer, we have rewritten it as a package for R: SigWinR. R is a free and widely used multi-platform software environment for statistical computing and graphics. The package can be installed and used on all platforms for which R is available. The improvements involve: a visualization of the input-sequence values supporting the interpretation of Ridgeograms; a visualization allowing for an easy interpretation of enriched or depleted regions in the sequence using windows of pre-defined size; an option that allows the analysis of circular sequences, which results in rectangular Ridgeograms; an application to identify regions of co-altered gene expression (ROCAGEs) with a real-life biological use-case; and adaptation of the algorithm to allow analysis of non-regularly sampled data using a constant window size in physical space without resampling the data. To achieve this, support for analysis of windows with an even number of elements was added.
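The general idea of scanning a sequence of values with a sliding window, including the circular option and even window sizes, can be sketched in Python as follows. This is a simplified illustration and not the SigWinR algorithm or its R interface: the use of window medians, the permutation-based threshold, and the names `window_medians` and `enriched_windows` are assumptions made for this example.

```python
import random
from statistics import median


def window_medians(values, size, circular=False):
    """Median of every window of `size` consecutive values.

    With circular=True the sequence wraps around, so there is one window
    per element; otherwise only windows that fit entirely are scored.
    statistics.median also handles even window sizes.
    """
    values = list(values)
    n = len(values)
    if not 1 <= size <= n:
        raise ValueError("window size must be between 1 and len(values)")
    if circular:
        extended = values + values[:size - 1]
        starts = range(n)
    else:
        extended = values
        starts = range(n - size + 1)
    return [median(extended[s:s + size]) for s in starts]


def enriched_windows(values, size, circular=False,
                     n_perm=200, quantile=0.95, seed=0):
    """Start indices of windows whose median exceeds a permutation threshold.

    Assumption: the null distribution comes from shuffling the values,
    and no multiple-testing correction is applied.
    """
    rng = random.Random(seed)
    observed = window_medians(values, size, circular)
    shuffled = list(values)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.extend(window_medians(shuffled, size, circular))
    null.sort()
    threshold = null[min(len(null) - 1, int(quantile * len(null)))]
    return [i for i, m in enumerate(observed) if m > threshold]
```

Repeating such a scan over a range of window sizes would give one row of scores per size, which is roughly the kind of data a Ridgeogram-style plot summarizes.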
By porting the SigWin-detector to R as the SigWinR package and improving its algorithm and functionality while maintaining adequate performance, we have made the SigWin-detector more useful as well as more easily accessible to scientists without a grid infrastructure.
PMCID: PMC2762987  PMID: 19807919
4.  Using R in Taverna: RShell v1.2 
BMC Research Notes  2009;2:138.
R is the statistical language commonly used by many life scientists in (omics) data analysis. At the same time, these complex analyses benefit from a workflow approach, such as used by the open source workflow management system Taverna. However, Taverna had limited support for R, because it supported just a few data types and only a single output. Also, there was no support for graphical output and persistent sessions. Altogether this made using R in Taverna impractical.
We have developed an R plugin for Taverna: RShell, which provides R functionality within workflows designed in Taverna. In order to fully support the R language, our RShell plugin directly uses the R interpreter. The RShell plugin consists of a Taverna processor for R scripts and an RShell Session Manager that communicates with the R server. We made the RShell processor highly configurable, allowing the user to define multiple inputs and outputs. Also, various data types are supported, such as strings, numeric data and images. To limit data transport between multiple RShell processors, the RShell plugin also supports persistent sessions. Here, we describe the architecture of RShell and the new features that are introduced in version 1.2, i.e.: i) Support for R up to and including R version 2.9; ii) Support for persistent sessions to limit data transfer; iii) Support for vector graphics output through PDF; iv) Syntax highlighting of the R code; v) Improved usability through fewer port types.
Our new RShell processor is backwards compatible with workflows that use older versions of the RShell processor. We demonstrate the value of the RShell processor by a use-case workflow that maps oligonucleotide probes designed with DNA sequence information from Vega onto the Ensembl genome assembly.
Our RShell plugin enables Taverna users to employ R scripts within their workflows in a highly configurable way.
PMCID: PMC2717104  PMID: 19607662
5.  Salvaging Affymetrix probes after probe-level re-annotation 
BMC Research Notes  2008;1:66.
Affymetrix GeneChips can be re-annotated at the probe-level by breaking up the original probe-sets and recomposing new probe-sets based on up-to-date genomic knowledge, such as available in Entrez Gene. This results in custom Chip Description Files (CDF). Using these custom CDFs improves the quality of the data and thus the results of related gene expression studies. However, 44–71% of the probes on a GeneChip are lost in this re-annotation process. Although these probes are generally aimed at lesser-known genes, losing them obviously means a substantial loss of expensive experiment data. Biologists are therefore very reluctant to adopt this approach.
We aimed to re-introduce the non-affected Affymetrix probe-sets after these re-annotation procedures. For this, we developed an algorithm (CDF-Merger) and applied it to standard Affymetrix CDFs and custom Brainarray CDFs to obtain Hybrid CDFs. Thus, salvaging lost Affymetrix probes with our CDF-Merger restored probe content up to 94%. Because the salvaged probes (up to 54% of the probe content on the arrays) represent less-reliable probe-sets, we made the origin of all probe-set definitions traceable, so biologists can choose at any time in their analyses which subset of probe-sets they want to use.
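The merging idea can be sketched in Python as follows. This is not the CDF-Merger itself and does not read real CDF files: probe-set definitions are modeled as plain dictionaries, "non-affected" is assumed to mean that none of a probe-set's probes were reused by a custom probe-set, and the function and field names are hypothetical.

```python
def merge_cdfs(original, custom):
    """Combine custom probe-sets with untouched original probe-sets.

    Both arguments map a probe-set name to a list of probe identifiers.
    Every entry in the result records its origin so it stays traceable.
    Assumption: original and custom probe-set names do not collide.
    """
    # Probes that the re-annotation assigned to some custom probe-set.
    used = {probe for probes in custom.values() for probe in probes}

    hybrid = {
        name: {"probes": list(probes), "origin": "custom"}
        for name, probes in custom.items()
    }
    for name, probes in original.items():
        # Salvage an original probe-set only if none of its probes were reused.
        if used.isdisjoint(probes):
            hybrid[name] = {"probes": list(probes), "origin": "original"}
    return hybrid


def probe_content(hybrid, total_probes):
    """Fraction of all probes on the array covered by the hybrid definition."""
    covered = {p for entry in hybrid.values() for p in entry["probes"]}
    return len(covered) / total_probes
```

Because every probe-set carries an `origin` field, a downstream analysis could filter on it to use only the custom probe-sets, only the salvaged ones, or both.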
The availability of up-to-date Hybrid CDFs plus the R environment allows for easy implementation of our approach.
PMCID: PMC2547102  PMID: 18710586
