Results 1-5 (5)
 

1.  Performance assessment of copy number microarray platforms using a spike-in experiment 
Bioinformatics  2011;27(8):1052-1060.
Motivation: Changes in the copy number of chromosomal DNA segments [copy number variants (CNVs)] have been implicated in human variation, heritable diseases and cancers. Microarray-based platforms are the current established technology of choice for studies reporting these discoveries and constitute the benchmark against which emergent sequence-based approaches will be evaluated. Research that depends on CNV analysis is rapidly increasing, and systematic platform assessments that distinguish strengths and weaknesses are needed to guide informed choice.
Results: We evaluated the sensitivity and specificity of six platforms, provided by four leading vendors, using a spike-in experiment. NimbleGen and Agilent platforms outperformed Illumina and Affymetrix in accuracy and precision of copy number dosage estimates. However, Illumina and Affymetrix algorithms that leverage single nucleotide polymorphism (SNP) information make up for this disadvantage and perform well at variant detection. Overall, the NimbleGen 2.1M platform outperformed others, but only with the use of an alternative data analysis pipeline to the one offered by the manufacturer.
Availability: The data are available from http://rafalab.jhsph.edu/cnvcomp/.
Contact: pevsner@jhmi.edu; fspencer@jhmi.edu; rafa@jhu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr106
PMCID: PMC3072561  PMID: 21478196
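The sensitivity and specificity mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each tested region has a known spike-in status and a yes/no variant call from a platform; the function name and the input layout are hypothetical and are not taken from the paper or its data release.

```python
def sensitivity_specificity(truth, called):
    """Compute (sensitivity, specificity) for binary variant calls.

    truth[i]  : True if region i carries a spiked-in copy number change.
    called[i] : True if the platform flagged region i as a variant.
    Both inputs are illustrative assumptions, not the paper's data format.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")

    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fn = sum(1 for t, c in pairs if t and not c)      # missed variants
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives
    fp = sum(1 for t, c in pairs if not t and c)      # false alarms

    # Return NaN when a rate is undefined (no positives or no negatives).
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Sensitivity is the fraction of spiked-in regions that were detected, and specificity is the fraction of unaltered regions that were correctly left uncalled.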
2.  A framework for oligonucleotide microarray preprocessing 
Bioinformatics  2010;26(19):2363-2367.
Motivation: The availability of flexible open source software for the analysis of gene expression raw level data has greatly facilitated the development of widely used preprocessing methods for these technologies. However, the expansion of microarray applications has exposed the limitation of existing tools.
Results: We developed the oligo package to provide a more general solution that supports a wide range of applications. The package is based on the Bioconductor principles of transparency, reproducibility and efficiency of development. It extends the existing tools and leverages existing code for visualization, accessing data and widely used preprocessing routines. The oligo package implements a unified paradigm for preprocessing data and interfaces with other Bioconductor tools for downstream analysis. Our infrastructure is general and can be used by other Bioconductor packages.
Availability: The oligo package is freely available through Bioconductor, http://www.bioconductor.org.
Contact: benilton.carvalho@cancer.org.uk; rafa@jhu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btq431
PMCID: PMC2944196  PMID: 20688976
3.  Quantifying uncertainty in genotype calls 
Bioinformatics  2009;26(2):242-249.
Motivation: Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently with findings reported in top journals and the mainstream media. Microarrays are the genotype calling technology of choice in GWAS as they permit exploration of more than a million single nucleotide polymorphisms (SNPs) simultaneously. The starting point for the statistical analyses used by GWAS to determine association between loci and disease is making genotype calls (AA, AB or BB). However, the raw data, microarray probe intensities, are heavily processed before arriving at these calls. Various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays and different sample batches has a substantial influence on the accuracy of genotype calls made by existing algorithms. Failure to account for these sources of variability can adversely affect the quality of findings reported by GWAS.
Results: We developed a method based on an enhanced version of the multi-level model used by CRLMM version 1. Two key differences are that we now account for variability across batches and improve the call-specific assessment of each call. The new model permits the development of quality metrics for SNPs, samples and batches of samples. Using three independent datasets, we demonstrate that the CRLMM version 2 outperforms CRLMM version 1 and the algorithm provided by Affymetrix, Birdseed. The main advantage of the new approach is that it enables the identification of low-quality SNPs, samples and batches.
Availability: Software implementing the method described in this article is available as free and open source code in the crlmm R/Bioconductor package.
Contact: rafa@jhu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btp624
PMCID: PMC2804295  PMID: 19906825
4.  R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips 
Bioinformatics  2009;25(19):2621-2623.
Summary: Illumina produces a number of microarray-based technologies for human genotyping. An Infinium BeadChip is a two-color platform that types between 10^5 and 10^6 single nucleotide polymorphisms (SNPs) per sample. Despite being widely used, there is a shortage of open source software to process the raw intensities from this platform into genotype calls. To this end, we have developed the R/Bioconductor package crlmm for analyzing BeadChip data. After careful preprocessing, our software applies the CRLMM algorithm to produce genotype calls, confidence scores and other quality metrics at both the SNP and sample levels. We provide access to the raw summary-level intensity data, allowing users to develop their own methods for genotype calling or copy number analysis if they wish.
Availability and Implementation: The crlmm Bioconductor package is available from http://www.bioconductor.org. Data packages and documentation are available from http://rafalab.jhsph.edu/software.html.
Contact: mritchie@wehi.edu.au; rafa@jhu.edu
doi:10.1093/bioinformatics/btp470
PMCID: PMC2752620  PMID: 19661241
5.  High-resolution spatial normalization for microarrays containing embedded technical replicates 
Bioinformatics (Oxford, England)  2006;22(24):3054-3060.
Motivation: Microarray data are susceptible to a wide range of artifacts, many of which occur on physical scales comparable to the spatial dimensions of the array. These artifacts introduce biases that are spatially correlated. The ability of current methodologies to detect and correct such biases is limited.
Results: We introduce a new approach for analyzing spatial artifacts, termed ‘conditional residual analysis for microarrays’ (CRAM). CRAM requires a microarray design that contains technical replicates of representative features and a limited number of negative controls, but is free of the assumptions that constrain existing analytical procedures. The key idea is to extract residuals from sets of matched replicates to generate residual images. The residual images reveal spatial artifacts with single-feature resolution. Surprisingly, spatial artifacts were found to coexist independently as additive and multiplicative errors. Efficient procedures for bias estimation were devised to correct the spatial artifacts on both intensity scales. In a survey of 484 published single-channel datasets, variance fell 4- to 12-fold in 5% of the datasets after bias correction. Thus, inclusion of technical replicates in a microarray design affords benefits far beyond what one might expect with a conventional ‘n = 5’ averaging, and should be considered when designing any microarray for which randomization is feasible.
doi:10.1093/bioinformatics/btl542
PMCID: PMC2262854  PMID: 17060357
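The idea of extracting residuals from matched replicates to form a residual image can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how replicate sets are centered, so the median is used here as a stand-in, and the function name and grid layout are hypothetical rather than the CRAM implementation.

```python
from statistics import median


def residual_image(intensity, replicate_id):
    """Build a residual image from technical replicates on an array grid.

    intensity    : 2D list (rows x columns) of feature intensities.
    replicate_id : 2D list of the same shape; each entry names the replicate
                   set a feature belongs to, or None if it has no replicates.
    Returns a 2D list where each replicated feature holds its intensity minus
    the median of its replicate set (an assumed choice of center), and
    non-replicated features hold None.
    """
    # Collect the intensities belonging to each replicate set.
    groups = {}
    for row_vals, row_ids in zip(intensity, replicate_id):
        for value, group in zip(row_vals, row_ids):
            if group is not None:
                groups.setdefault(group, []).append(value)

    centers = {group: median(values) for group, values in groups.items()}

    # Subtract each set's center, keeping every residual at its grid position.
    return [
        [None if group is None else value - centers[group]
         for value, group in zip(row_vals, row_ids)]
        for row_vals, row_ids in zip(intensity, replicate_id)
    ]
```

Because replicates of the same feature should agree, any spatial pattern left in the residuals points to a position-dependent artifact. Applying the same function to log-transformed intensities would be one way to look at multiplicative rather than additive effects, though that split is an assumption of this sketch.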
