Results 1-14 (14)

1.  New insights into the performance of human whole-exome capture platforms 
Nucleic Acids Research  2015;43(11):e76.
Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. However, it is unknown which recent WES platform is best suited to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina, each applied to six different DNA samples and sequenced by two vendors per platform. Our results suggest that both Agilent and NimbleGen perform better overall than Illumina, and that the high enrichment performance of Agilent is stable across samples and vendors, whereas NimbleGen achieves the best exome coverage only for specific vendors and samples. Moreover, the recent Agilent platform captures more coding exons with sufficient read depth overall than NimbleGen and Illumina. However, because of considerable gaps in effective exome coverage, the three platforms cannot capture all known coding exons, either alone or in combination, and require improvement. Our data emphasize the importance of evaluating updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.
PMCID: PMC4477645  PMID: 25820422
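The abstract above compares platforms by the share of coding exons captured with "sufficient read depth". A minimal Python sketch of such a metric follows; the 20x depth threshold, the requirement that every base of an exon reach it, and the exon IDs are illustrative assumptions, not the definition used in the paper.

```python
from typing import Dict, List


def effective_coverage(exon_depths: Dict[str, List[int]],
                       min_depth: int = 20,
                       min_fraction: float = 1.0) -> float:
    """Return the fraction of exons in which at least `min_fraction` of the
    bases reach `min_depth` reads. Exons with no depth values count as
    not covered. Thresholds are illustrative assumptions."""
    if not exon_depths:
        return 0.0
    covered = 0
    for depths in exon_depths.values():
        if not depths:
            continue
        well_covered = sum(1 for d in depths if d >= min_depth)
        if well_covered / len(depths) >= min_fraction:
            covered += 1
    return covered / len(exon_depths)


# Hypothetical per-base read depths for three exons.
example = {
    "exon_A": [25, 30, 22, 40],
    "exon_B": [5, 8, 30, 31],   # half of the bases fall below 20x
    "exon_C": [20, 21, 20, 20],
}
print(effective_coverage(example))                    # 2 of 3 exons pass
print(effective_coverage(example, min_fraction=0.5))  # all 3 exons pass
```

Relaxing `min_fraction` shows how strongly the reported "effective coverage" depends on how strictly sufficient depth is defined.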
2.  The draft genome of Primula veris yields insights into the molecular basis of heterostyly 
Genome Biology  2015;16(1):12.
The flowering plant Primula veris is a common spring-blooming perennial that is widely cultivated throughout Europe. This species is an established model system in the study of the genetics, evolution, and ecology of heterostylous floral polymorphisms. Despite the long history of research focused on this and related species, the continued development of this system has been restricted due to the absence of genomic and transcriptomic resources.
We present here a de novo draft genome assembly of P. veris covering 301.8 Mb, or approximately 63% of the estimated 479.22 Mb genome, with an N50 contig size of 9.5 kb, an N50 scaffold size of 164 kb, and an estimated 19,507 genes. The results of a RADseq bulk segregant analysis allow the confident identification of four genome scaffolds that are linked to the P. veris S-locus. RNAseq data from both P. veris and the closely related species P. vulgaris allow the characterization of 113 candidate heterostyly genes that show significant floral morph-specific differential expression. One candidate gene of particular interest is a duplicated GLOBOSA homolog (PveGLO2) that may be unique to Primula and is completely silenced in L-morph flowers.
The P. veris genome represents the first genome assembled from a heterostylous species, and thus provides an immensely important resource for future studies focused on the evolution and genetic dissection of heterostyly. As the first genome assembled from the Primulaceae, the P. veris genome will also facilitate the expanded application of phylogenomic methods in this diverse family and the eudicots as a whole.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0567-z) contains supplementary material, which is available to authorized users.
PMCID: PMC4305239  PMID: 25651398
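The assembly statistics above can be illustrated with a short Python sketch of the standard N50 calculation (sort sequences from longest to shortest and report the length at which the running total first reaches half of the assembled bases), plus the arithmetic behind "approximately 63%". The contig lengths in the example are made up for illustration.

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Sort sequences longest first and return the length at which the
    running total first reaches half of all assembled bases."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly is empty")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise RuntimeError("not reached: running total always reaches total")


# Hypothetical contig lengths: 10 + 8 = 18 >= 30 / 2, so N50 = 8.
print(n50([10, 8, 6, 4, 2]))

# Assembled span as a fraction of the estimated genome size (values from the abstract).
print(round(301.8 / 479.22, 2))  # 0.63
```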
3.  Blind spots of quantitative RNA-seq: the limits for assessing abundance, differential expression, and isoform switching 
BMC Bioinformatics  2013;14:370.
RNA-seq is now widely used to quantitatively assess gene expression, expression differences and isoform switching, and promises to deliver results for the entire transcriptome. However, whether the transcriptional state of a gene can be captured accurately depends critically on library preparation, read alignment, expression estimation and the tests for differential expression and isoform switching. Comparisons are available for the individual steps, but there is not yet a systematic investigation of which specific genes are impacted by biases throughout the entire analysis workflow. It is especially unclear whether, for a given gene, expression changes and isoform switches can be detected with current methods and protocols.
For the human genes, we report their detectability under various conditions using different approaches. Overall, we find that the input material has the biggest influence: depending on the protocol and on RNA degradation, it may already show strong length-dependent over- and underrepresentation of transcripts. For 50% of the isoforms, the alignment step correctly aligns up to 99% of the reads; only in the presence of transcript modifications do isoforms, mainly short ones, have a low alignment rate. In our dataset, depending on the aligner and the input material used, the expression estimates of up to 93% of the genes were accurate within a factor of two, with the deviations being due to ambiguous alignments. Detection of differential expression using a negative-binomial count model works reliably for our simulated data but depends on the count accuracy. Interestingly, with three replicates and a true two-fold change, using the fold-change instead of the p-value as the score for differential expression yields the same performance. Isoform switching is harder to detect, and for at least 109 genes the isoform differences evade detection regardless of the method used.
RNA-seq is a reliable tool, but the repetitive nature of the human genome makes the origin of reads ambiguous and limits the detectability of certain genes. RNA-seq does not represent all isoforms equally well regardless of their length, which may range from ~200 nt to ~100,000 nt. Researchers are advised to verify that their target genes do not have extreme properties with respect to repeated regions, GC content, and isoform length and complexity.
PMCID: PMC3879183  PMID: 24365034
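One of the accuracy criteria above, an expression estimate being correct "within a factor of two", can be written as |log2(estimate / truth)| ≤ 1. The sketch below assumes that genes with a zero or negative value on either side count as misses; this convention and the example values are assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def fraction_within_factor(estimated: Sequence[float],
                           truth: Sequence[float],
                           factor: float = 2.0) -> float:
    """Fraction of genes whose estimate lies within `factor`-fold of the truth.
    A gene with a zero or negative value on either side counts as a miss."""
    if len(estimated) != len(truth) or not truth:
        raise ValueError("need two non-empty sequences of equal length")
    bound = math.log2(factor)
    hits = 0
    for est, ref in zip(estimated, truth):
        if est > 0 and ref > 0 and abs(math.log2(est / ref)) <= bound:
            hits += 1
    return hits / len(truth)


# Hypothetical simulated expression values for four genes.
truth = [12.0, 20.0, 3.0, 60.0]
estimated = [10.0, 50.0, 3.0, 100.0]
print(fraction_within_factor(estimated, truth))  # 0.75: only 50 vs 20 misses
```

Working on the log2 scale makes over- and underestimation by the same factor symmetric.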
4.  Identification of a SIRT1 Mutation in a Family with Type 1 Diabetes 
Cell metabolism  2013;17(3):448-455.
Type 1 diabetes is caused by autoimmune-mediated β cell destruction leading to insulin deficiency. The histone deacetylase SIRT1 plays an essential role in modulating several age-related diseases. Here we describe a family carrying a mutation in the SIRT1 gene, in which all five affected members developed an autoimmune disorder: four developed type 1 diabetes, and one developed ulcerative colitis. Initially, a 26-year-old man was diagnosed with the typical features of type 1 diabetes, including lean body mass, autoantibodies, T cell reactivity to β cell antigens, and a rapid dependence on insulin. Direct and exome sequencing identified the presence of a T-to-C exchange in exon 1 of SIRT1, corresponding to a leucine-to-proline mutation at residue 107. Expression of SIRT1-L107P in insulin-producing cells resulted in overproduction of nitric oxide, cytokines, and chemokines. These observations identify a role for SIRT1 in human autoimmunity and unveil a monogenic form of type 1 diabetes.
PMCID: PMC3746172  PMID: 23473037
5.  Drop drying on surfaces determines chemical reactivity - the specific case of immobilization of oligonucleotides on microarrays 
BMC Biophysics  2013;6:8.
Drop drying is a key factor in a wide range of technical applications, including spotted microarrays. The applied nL liquid volume provides specific reaction conditions for the immobilization of probe molecules to a chemically modified surface.
We investigated the influence of nL and μL liquid drop volumes on probe immobilization and compared the results with those obtained in liquid solution. In our data, we observe a strong relationship between drop drying effects on immobilization and surface chemistry. In this work, we present results on the immobilization of dye-labeled 20mer oligonucleotides, with and without an activating 5′-aminoheptyl linker, onto a 2D epoxysilane surface and a 3D NHS-activated hydrogel surface.
Our experiments identified two basic processes that determine immobilization. The first is the rate of drop drying, which depends on the drop volume and the ambient relative humidity. Oligonucleotides in a dried spot react unspecifically with the surface, and long reaction times are needed. 3D hydrogel surfaces allow immobilization in a liquid environment under diffusive conditions; here, oligonucleotide immobilization is much faster and a specific reaction with the reactive linker group is observed. The second is the increase in probe concentration as the drop dries. On a 3D hydrogel, the increasing concentration of probe molecules in nL spotting volumes accelerates immobilization dramatically. In the case of μL volumes, immobilization depends on whether the drop is allowed to dry completely. Under non-drying conditions, very limited immobilization is observed because of the low oligonucleotide concentration used in microarray spotting solutions. The results of our study provide a general guideline for microarray assay development. They allow the initial definition and further optimization of reaction conditions for immobilizing oligonucleotides and other classes of probe molecules on different surfaces, depending on the applied spotting and reaction volume.
PMCID: PMC3694035  PMID: 23758982
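The concentration effect described above follows from mass conservation: if the amount of probe stays constant while the solvent evaporates, the concentration scales as c = c0 · V0 / V. The sketch below ignores probe that is immobilized while the drop dries, so it is an idealized illustration, not a model from the paper; the volumes and concentration are hypothetical.

```python
def concentration_after_drying(c0: float, v0: float, v: float) -> float:
    """Probe concentration after a drop shrinks from volume v0 to v.
    Assumes (idealized) that no probe is lost or immobilized during drying.
    Any consistent units work, e.g. uM and nL."""
    if not 0 < v <= v0:
        raise ValueError("remaining volume must be in (0, v0]")
    return c0 * v0 / v


# Hypothetical: a 1 nL spot at 10 uM that has evaporated to 0.25 nL.
print(concentration_after_drying(10.0, 1.0, 0.25))  # 40.0 (uM)
```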
6.  Manager of Next Generation Sequencing Orders – MANGO 
The Functional Genomics Center Zurich (FGCZ) is a joint state-of-the-art research and training facility of ETH Zurich and the University of Zurich. With the latest technologies and expert support in genomics, transcriptomics, and bioinformatics, the FGCZ carries out research projects and technology development in collaboration with the Zurich Life Science research community. The FGCZ offers services for different applications on the Illumina HiSeq2000, Illumina MiSeq, Ion Torrent, Roche 454 and PacBio RS. At the FGCZ, we handle hundreds of NGS projects a year, and a working tool is necessary to monitor and document these sequencing projects. Because of our specialized needs, we conceptualized, developed and implemented MANGO to help manage, track, monitor and document our various and diverse NGS service orders. MANGO works on multiple levels. First, it is a web-accessible sample tracking system: it can be accessed, and sample data can be added in real time, from a computer, an Android tablet or an iPad. Second, it manages the multiplexing of sequencing runs, because it can detect sub-optimal index combinations from various popular commercial kits and self-made indices. Third, MANGO creates well-formatted sample sheets for the various sequencers available at the FGCZ. Lastly, it can accept data in .csv format from instruments used for QC during library preparation. MANGO is a reliable and secure cross-platform manager of our NGS service orders.
PMCID: PMC3635301
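The abstract says MANGO detects sub-optimal index combinations but does not state which criteria it applies. One common check, sketched here in Python as an assumption, flags index pairs in the same run whose Hamming distance is too small for reliable demultiplexing; the threshold of 3 and the function names are hypothetical, and other criteria (such as per-cycle base composition) are not covered.

```python
from itertools import combinations
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("indices must have the same length")
    return sum(x != y for x, y in zip(a, b))


def close_index_pairs(indices: List[str],
                      min_distance: int = 3) -> List[Tuple[str, str, int]]:
    """Return index pairs whose Hamming distance is below `min_distance`
    (a hypothetical threshold)."""
    flagged = []
    for a, b in combinations(indices, 2):
        d = hamming(a, b)
        if d < min_distance:
            flagged.append((a, b, d))
    return flagged


# Hypothetical 6-nt indices pooled in one lane.
lane = ["ACGTAC", "ACGTTC", "TTGACA"]
print(close_index_pairs(lane))  # [('ACGTAC', 'ACGTTC', 1)]
```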
7.  iTRAQ-Based and Label-Free Proteomics Approaches for Studies of Human Adenovirus Infections 
Both isobaric tags for relative and absolute quantitation (iTRAQ) and label-free methods are widely used for quantitative proteomics. Here, we provide a detailed evaluation of these proteomics approaches based on large datasets from biological samples. iTRAQ-label-based and label-free quantitations were compared using protein lysate samples from noninfected human lung epithelial A549 cells and from cells infected for 24 h with human adenovirus type 3 or type 5. Either iTRAQ-label-based or label-free methods were used, and the resulting samples were analyzed by liquid chromatography (LC) and tandem mass spectrometry (MS/MS). To reduce possible bias from the quantitation software, we applied several software packages for each procedure: ProteinPilot and Scaffold Q+ were used for iTRAQ-labeled samples, while Progenesis LC-MS and ProgenesisF-T2PQ/T3PQ were employed for label-free analyses. Two software packages applied to the same datasets correlated well, with R² values between 0.48 and 0.78 for iTRAQ-label-based quantitations and between 0.5 and 0.86 for label-free quantitations. Analyses of label-free samples showed greater protein up- or downregulation than iTRAQ-labeled samples. The concentration differences were further evaluated by Western blotting for four downregulated proteins. These data suggested that the label-free method was more accurate than the iTRAQ method.
PMCID: PMC3608280  PMID: 23555056
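The agreement between two software packages reported above is expressed as R². A minimal sketch, assuming R² is the squared Pearson correlation between the two packages' quantitation values for the same proteins (the abstract does not specify the exact computation); the example values are hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two quantitation vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant vector")
    return sxy * sxy / (sxx * syy)


# Hypothetical log2 ratios for the same three proteins from two packages.
package_a = [1.0, 2.0, 3.0]
package_b = [1.0, 3.0, 2.0]
print(r_squared(package_a, package_b))  # 0.25
```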
8.  PPINGUIN: Peptide Profiling Guided Identification of Proteins improves quantitation of iTRAQ ratios 
BMC Bioinformatics  2012;13:34.
Recent technological developments have paved the way for quantitative proteomics. One of the most important of these is iTRAQ, which employs isobaric tags for relative or absolute quantitation. Despite great progress in technology development, many challenges remain in deriving and interpreting quantitative results. One of these challenges is the consistent assignment of peptides to proteins.
We have developed Peptide Profiling Guided Identification of Proteins (PPINGUIN), a statistical analysis workflow for iTRAQ data that addresses the problem of ambiguous peptide quantitations. Motivated by the assumption that peptides uniquely derived from the same protein are correlated, our method employs clustering as a very early step in data processing, prior to protein inference. Our method increases experimental reproducibility and decreases the variability of quantitations of peptides assigned to the same protein. Further supporting our method, its application to a type 2 diabetes dataset identifies a list of protein candidates in very good agreement with a previously performed transcriptomics meta-analysis. Using the quantitative properties of the identified signal patterns, PPINGUIN can also reveal new isoform candidates.
Given the increasing importance of quantitative proteomics, we think that this method will be useful in practical applications such as model fitting or functional enrichment analysis. We recommend using this method if quantitation is a major objective of the research.
PMCID: PMC3368728  PMID: 22340093
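The idea behind PPINGUIN, grouping peptides by the similarity of their quantitation profiles before protein inference, can be illustrated with a greedy correlation-threshold grouping. This is a simplified stand-in chosen for illustration, not PPINGUIN's actual clustering or statistics; the threshold, profiles, and peptide names are hypothetical.

```python
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def group_peptides(profiles: Dict[str, List[float]],
                   threshold: float = 0.9) -> List[List[str]]:
    """Greedily add each peptide to the first group whose founding peptide
    correlates with it at >= threshold; otherwise start a new group."""
    groups: List[List[str]] = []
    for peptide, profile in profiles.items():
        for group in groups:
            if pearson(profile, profiles[group[0]]) >= threshold:
                group.append(peptide)
                break
        else:
            groups.append([peptide])
    return groups


# Hypothetical log2 iTRAQ ratios of three peptides across four channels.
profiles = {
    "pepA": [0.0, 0.5, 1.0, 1.5],
    "pepB": [0.1, 0.6, 1.0, 1.6],
    "pepC": [1.5, 1.0, 0.4, 0.0],
}
print(group_peptides(profiles))  # [['pepA', 'pepB'], ['pepC']]
```

A greedy scheme like this depends on input order; a proper hierarchical or model-based clustering would avoid that.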
9.  Preferred analysis methods for single genomic regions in RNA sequencing revealed by processing the shape of coverage 
Nucleic Acids Research  2011;40(9):e63.
The informational content of RNA sequencing is currently far from completely explored. Most analyses focus on processing tables of counts or on isoform deconvolution via exon junctions. This article compares several techniques that can be used to estimate differential expression of exons or small genomic regions of expression based on the shapes of their coverage functions. The problem is defined as finding the exons that are differentially expressed between two samples, using local expression profile normalization and statistical measures to spot the differences between two profile shapes. Initial experiments were performed using synthetic data and real data modified with synthetically created differential patterns. Then, 160 pipelines (5 types of generator × 4 normalizations × 8 difference measures) were compared, and the best analysis pipelines were selected based on the linearity of the differential expression estimation and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They identify exons with differential expression or internal splicing even when read counts may not show this. Areas of application include significant difference searches, splicing identification algorithms and finding suitable regions for qPCR primers.
PMCID: PMC3351146  PMID: 22210855
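The shape-based comparison described above pairs a profile normalization with a difference measure. The Python sketch below shows one such pairing: scaling each coverage profile to sum to 1 and comparing the results with total variation distance. Both choices are assumptions for illustration and are not taken from rnaSeqMap, which is an R/Bioconductor package. The example shows how two regions with identical read counts can still differ in shape.

```python
from typing import List, Sequence


def normalize_profile(coverage: Sequence[float]) -> List[float]:
    """Scale a per-base coverage profile to sum to 1 so only its shape remains."""
    total = sum(coverage)
    if total <= 0:
        raise ValueError("profile has no coverage")
    return [c / total for c in coverage]


def shape_difference(cov1: Sequence[float], cov2: Sequence[float]) -> float:
    """Total variation distance between normalized profiles
    (0 = identical shape, 1 = no overlap)."""
    if len(cov1) != len(cov2):
        raise ValueError("profiles must cover the same positions")
    p, q = normalize_profile(cov1), normalize_profile(cov2)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


sample1 = [10, 10, 10, 10]
sample2 = [20, 20, 20, 20]  # twice the reads, same shape
sample3 = [30, 10, 0, 0]    # same total as sample1, different shape
print(shape_difference(sample1, sample2))  # 0.0
print(shape_difference(sample1, sample3))  # 0.5
```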
10.  Phosphoproteomic Analysis Reveals Interconnected System-Wide Responses to Perturbations of Kinases and Phosphatases in Yeast 
Science signaling  2010;3(153):rs4.
The phosphorylation and dephosphorylation of proteins by kinases and phosphatases constitute an essential regulatory network in eukaryotic cells. This network supports the flow of information from sensors through signaling systems to effector molecules, and ultimately drives the phenotype and function of cells, tissues, and organisms. Dysregulation of this process has severe consequences and is one of the main factors in the emergence and progression of diseases, including cancer. Thus, major efforts have been invested in developing specific inhibitors that modulate the activity of individual kinases or phosphatases; however, it has been difficult to assess how such pharmacological interventions would affect the cellular signaling network as a whole. Here, we used label-free, quantitative phosphoproteomics in a systematically perturbed model organism (Saccharomyces cerevisiae) to determine the relationships between 97 kinases, 27 phosphatases, and more than 1000 phosphoproteins. We identified 8814 regulated phosphorylation events, describing the first system-wide protein phosphorylation network in vivo. Our results show that, at steady state, inactivation of most kinases and phosphatases affected large parts of the phosphorylation-modulated signal transduction machinery, and not only the immediate downstream targets. The observed cellular growth phenotype was often well maintained despite the perturbations, arguing for considerable robustness in the system. Our results serve to constrain future models of cellular signaling and reinforce the idea that simple linear representations of signaling pathways might be insufficient for drug development and for describing organismal homeostasis.
PMCID: PMC3072779  PMID: 21177495
12.  PhosphoPep—a phosphoproteome resource for systems biology research in Drosophila Kc167 cells 
Understanding the mechanisms by which cells process information is a key question in systems biology research. Such mechanisms critically depend on reversible phosphorylation of cellular proteins, a process catalyzed by protein kinases and phosphatases. Here, we present PhosphoPep, a database containing more than 10,000 unique high-confidence phosphorylation sites mapping to nearly 3,500 gene models and 4,600 distinct phosphoproteins of the Drosophila melanogaster Kc167 cell line. This constitutes the most comprehensive phosphorylation map of any single source to date. To enhance the utility of PhosphoPep, we also provide an array of software tools that allow users to browse phosphorylation sites on single proteins or pathways, to easily integrate the data with other external data types such as protein–protein interactions, and to search the database via spectral matching. Finally, all data can be readily exported, for example for targeted proteomics approaches, and the data thus generated can again be validated using PhosphoPep, supporting the iterative cycles of experimentation and analysis that are typical of systems biology research.
PMCID: PMC2063582  PMID: 17940529
data integration; Drosophila; interactive database; phosphoproteomics; systems biology
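The abstract mentions searching PhosphoPep via spectral matching without describing the scoring. A widely used approach is cosine similarity between binned spectra; the sketch below assumes that approach, a bin width of 1 m/z unit, and hypothetical peak lists, and should not be read as PhosphoPep's implementation.

```python
import math
from typing import Dict, Iterable, Tuple


def bin_spectrum(peaks: Iterable[Tuple[float, float]],
                 bin_width: float = 1.0) -> Dict[int, float]:
    """Sum (m/z, intensity) peak intensities into fixed-width m/z bins."""
    binned: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine of the angle between two binned spectra (1 = identical pattern)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = bin_spectrum([(100.2, 10.0), (200.5, 20.0)])
library_entry = bin_spectrum([(100.4, 10.0), (200.1, 20.0)])
print(round(cosine_similarity(query, library_entry), 6))  # 1.0
```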
13.  MAGMA: analysis of two-channel microarrays made easy 
Nucleic Acids Research  2007;35(Web Server issue):W86-W90.
The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user-friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow, consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R scripts that document all of MAGMA's data processing steps, thereby allowing users to regenerate all results in their local R installation. The implementation of MAGMA follows the model-view-controller design pattern, which strictly separates the R-based statistical data processing, the web representation and the application logic. This modular design makes the application flexible and easily extensible by experts in any one of the fields: statistical microarray analysis, web design or software development. State-of-the-art JavaServer Faces technology was used to generate the web interface and to process user input. MAGMA's object-oriented modular framework makes it easily extensible and applicable to other fields, and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at
PMCID: PMC1933123  PMID: 17517778
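The normalization and statistical analysis steps that MAGMA guides users through typically start from per-spot log-ratios of the two channels. As an illustration only (MAGMA itself generates R scripts), the Python sketch below computes M = log2(R/G) and A = 0.5·log2(R·G) and applies a simple global median normalization; that normalization method and the intensities are assumptions, and all intensities must be positive.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def ma_values(red: Sequence[float],
              green: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Per-spot log-ratio M = log2(R/G) and average log-intensity
    A = 0.5 * log2(R*G). Intensities must be positive."""
    if len(red) != len(green):
        raise ValueError("channels must have the same number of spots")
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_normalize(m: Sequence[float]) -> List[float]:
    """Shift all log-ratios so that their median is 0 (a simple global method)."""
    center = median(m)
    return [x - center for x in m]


# Hypothetical background-corrected intensities for three spots.
red = [200.0, 400.0, 800.0]
green = [100.0, 400.0, 200.0]
m, a = ma_values(red, green)
print(m)                    # [1.0, 0.0, 2.0]
print(median_normalize(m))  # [0.0, -1.0, 1.0]
```

Intensity-dependent methods such as loess normalization are often preferred over a global shift; the median version is used here only to keep the sketch short.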
14.  Gain and Loss of Multiple Genes During the Evolution of Helicobacter pylori 
PLoS Genetics  2005;1(4):e43.
Sequence diversity and gene content distinguish most isolates of Helicobacter pylori. Even greater sequence differences differentiate distinct populations of H. pylori from different continents, but it was not clear whether these populations also differ in gene content. To address this question, we tested 56 globally representative strains of H. pylori and four strains of Helicobacter acinonychis with whole genome microarrays. Of the weighted average of 1,531 genes present in the two sequenced genomes, 25% are absent in at least one strain of H. pylori and 21% were absent or variable in H. acinonychis. We extrapolate that the core genome present in all isolates of H. pylori contains 1,111 genes. Variable genes tend to be small and possess unusual GC content; many of them have probably been imported by horizontal gene transfer. Phylogenetic trees based on the microarray data differ from those based on sequences of seven genes from the core genome. These discrepancies are due to homoplasies resulting from independent gene loss by deletion or recombination in multiple strains, which distort phylogenetic patterns. The patterns of these discrepancies versus population structure allow a reconstruction of the timing of the acquisition of variable genes within this species. Variable genes that are located within the cag pathogenicity island were apparently first acquired en bloc after speciation. In contrast, most other variable genes are of unknown function or encode restriction/modification enzymes, transposases, or outer membrane proteins. These seem to have been acquired prior to speciation of H. pylori and were subsequently lost by convergent evolution within individual strains. Thus, the use of microarrays can reveal patterns of gene gain or loss when examined within a phylogenetic context that is based on sequences of core genes.
The Gram-negative pathogenic bacterium Helicobacter pylori colonizes the stomachs of 50% of humanity and has probably infected humans since their origins. Due to geographic isolation and frequent local recombination, phylogeographic differences have arisen within H. pylori, resulting in multiple populations and subpopulations that mirror ancient human migrations and genetic diversity. We have examined the gene content of representatives of these populations with whole genome microarrays. Of the 1,531 genes present on average in the two sequenced genomes, only 1,111 are predicted to exist in all H. pylori. Missing genes fall into two classes. One class contains genes within the cag pathogenicity island, which was acquired en bloc after speciation and is present only in particular populations. The second class contains a variety of genes whose function may be unimportant for the cell and that were acquired prior to speciation; their absence in individual isolates reflects convergent evolution through gene loss. Thus, patterns of gene gain or loss can be identified by whole genome microarrays within a phylogenetic context supplied by sequences of genes from the core genome.
PMCID: PMC1245399  PMID: 16217547
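The core-genome reasoning above reduces to set operations on a gene presence/absence table: core genes are present in every strain, and variable genes are absent from at least one. A minimal Python sketch with hypothetical strain and gene names:

```python
from typing import Dict, Set, Tuple


def core_and_variable(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into a core set (present in every strain) and a variable
    set (absent from at least one strain)."""
    strains = list(presence.values())
    if not strains:
        return set(), set()
    all_genes = set().union(*strains)
    core = set.intersection(*strains)
    return core, all_genes - core


# Hypothetical presence calls from a microarray experiment.
calls = {
    "strain_1": {"geneA", "geneB", "geneC", "geneD"},
    "strain_2": {"geneA", "geneB", "geneC"},
    "strain_3": {"geneA", "geneB", "geneE"},
}
core, variable = core_and_variable(calls)
print(sorted(core))      # ['geneA', 'geneB']
print(sorted(variable))  # ['geneC', 'geneD', 'geneE']
```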
