Results 1-18 (18)
 

1.  Multi-platform and cross-methodological reproducibility of transcriptome profiling by RNA-seq in the ABRF Next-Generation Sequencing Study 
Nature biotechnology  2014;32(9):915-925.
High-throughput RNA sequencing (RNA-seq) dramatically expands the potential for novel genomics discoveries, but the wide variety of platforms, protocols and performance has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We tested replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (polyA-selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies’ PGM and Proton, Pacific Biosciences RS and Roche’s 454). The results show high intra-platform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms. These data also demonstrate that ribosomal RNA depletion can both enable effective analysis of degraded RNA samples and be readily compared to polyA-enriched fractions. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
doi:10.1038/nbt.2972
PMCID: PMC4167418  PMID: 25150835
2.  Metagenomic Assay for Identification of Microbial Pathogens in Tumor Tissues 
mBio  2014;5(5):e01714-14.
ABSTRACT
Screening for thousands of viruses and other pathogenic microorganisms, including bacteria, fungi, and parasites, in human tumor tissues will provide a better understanding of the contributory role of the microbiome in the predisposition for, causes of, and therapeutic responses to the associated cancer. Metagenomic assays designed to perform these tasks will have to include rapid and economical processing of large numbers of samples, supported by a straightforward data analysis pipeline and flexible sample preparation options for multiple input tissue types from individual patients, mammals, or environmental samples. To meet these requirements, the PathoChip platform was developed by targeting viral, prokaryotic, and eukaryotic genomes with multiple DNA probes in a microarray format that can be combined with a variety of upstream sample preparation protocols and downstream data analysis. PathoChip screening of DNA plus RNA from formalin-fixed, paraffin-embedded tumor tissues demonstrated the utility of this platform, and the detection of oncogenic viruses was validated using independent PCR and deep sequencing methods. These studies demonstrate the use of the PathoChip technology combined with PCR and deep sequencing as a valuable strategy for detecting the presence of pathogens in human cancers and other diseases.
IMPORTANCE
This work describes the design and testing of a PathoChip array containing probes with the ability to detect all known publicly available virus sequences as well as hundreds of pathogenic bacteria, fungi, parasites, and helminths. PathoChip provides wide coverage of microbial pathogens in an economical format. PathoChip screening of DNA plus RNA from formalin-fixed, paraffin-embedded tumor tissues demonstrated the utility of this platform, and the detection of oncogenic viruses was validated using independent PCR and sequencing methods. These studies demonstrate that the PathoChip technology is a valuable strategy for detecting the presence of pathogens in human cancers and other diseases.
doi:10.1128/mBio.01714-14
PMCID: PMC4172075  PMID: 25227467
4.  Longer Reads Do Not Significantly Improve RNA-seq Results 
The initial next-generation sequencing technologies produced reads of 25 or 36 base pairs and read from only a single end of the library sequence. Currently, it is possible to reliably produce 300bp paired-end sequences for RNA-expression analysis. As read lengths have steadily increased, people have assumed that longer reads are better and that paired-end reads produce better results than single-end reads. These assumptions have been based upon intuition rather than hard experimentation. Using the RNA-seq standards from the Association of Biomolecular Resource Facilities – Next Generation Sequencing (ABRF-NGS) Study, we were able to evaluate the impact of read length on RNA-seq results. We started with paired-end 100bp reads, trimmed them to simulate different read lengths, and separated the pairs to produce single-end reads. For each read length and paired status, we evaluated differential expression levels between two standard samples and compared the results to those obtained by qPCR. We found that, with the exception of reads trimmed to 25bp, there is little difference in the detection of differential expression regardless of the read length. Once single-end 50bp reads are used, the results do not change substantially for any level up to and including 100bp paired-end reads. Thus, a researcher could save substantial resources by using 50bp single-end reads for their expression analysis. We replicated these results by using multiple computational pipelines to confirm that they were not a result of the particular algorithm we were using. Additionally, we performed the same analysis on two ENCODE samples and found consistent results, affirming that our conclusions have broad application.
PMCID: PMC4162222
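The trimming step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `trim_read` and `simulate_layout` are hypothetical, reads are modeled as simple (name, sequence, quality) tuples, trimming is assumed to keep the 5' end of each read, and "separating the pairs" is assumed here to mean keeping only the first mate.

```python
from typing import Iterable, Iterator, Tuple

# A read is modeled as (name, sequence, quality string); this layout is an
# assumption made for illustration, not a format taken from the abstract.
Read = Tuple[str, str, str]


def trim_read(read: Read, length: int) -> Read:
    """Keep only the first `length` bases and their quality values."""
    name, seq, qual = read
    return name, seq[:length], qual[:length]


def simulate_layout(
    pairs: Iterable[Tuple[Read, Read]], length: int, paired: bool
) -> Iterator[Tuple[Read, ...]]:
    """Derive a shorter paired-end or single-end data set from read pairs.

    With paired=True both mates are trimmed and kept together.
    With paired=False only mate 1 is kept (an assumption about how
    single-end data would be simulated).
    """
    for mate1, mate2 in pairs:
        if paired:
            yield trim_read(mate1, length), trim_read(mate2, length)
        else:
            yield (trim_read(mate1, length),)


if __name__ == "__main__":
    # Made-up 100bp pair, used only to exercise the functions.
    example = [
        (("frag1/1", "ACGT" * 25, "I" * 100), ("frag1/2", "TGCA" * 25, "I" * 100))
    ]
    for length in (25, 50, 75, 100):
        single = next(simulate_layout(example, length, paired=False))
        print(length, len(single), len(single[0][1]))
```

Each simulated data set would then go through the same alignment and differential-expression steps, so that read length and pairing are the only variables that change.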
5.  The ABRF Next Generation Sequencing Study: Multi-platform and Cross-methodological Reproducibility of Transcriptome Profiling by RNA-seq 
Next generation sequencing (NGS) has dramatically expanded the potential for novel genomics discoveries, but the wide variety of platforms, protocols, and performance has created the need for reference data sets to understand the sources of variation in results. The goals of the ABRF-NGS Study are to use standard references to evaluate the performance of NGS platforms and to identify optimal methods and best practices. For the first phase of this study, over 20 core facility laboratories performed replicate RNA-seq experiments, using titrated reference RNA standards and a set of synthetic RNA spike-ins, evaluated over a wide range of methods: polyA-enriched, ribo-depleted, size-specific fractionations, and degraded RNA, on six NGS platforms (Illumina HiSeq 2000/2500 and MiSeq, Life Technologies PGM and Proton, Roche 454 GS FLX+, and PacBio RS). Two RT-qPCR data sets were used as orthogonal tools to gauge the RNA-seq results. The results show high intra-platform consistency and inter-platform concordance for expression measures, but also demonstrate highly variable rates of efficiency and costs for splice isoform detection between platforms. The data also add evidence that ribosomal RNA depletion can both salvage degraded RNA samples and be readily compared to polyA-enriched fractions. Comparisons of alternative aligners for each platform show that algorithm choice affects mapping rates and transcript coverage more than gene quantification. Surrogate variable analysis (SVA) proved to be an optimal method to combine data within and between platforms, increasing sensitivity and reducing false positives by over 90%. Taken together, these data represent a broad cross-platform characterization of RNA standards and provide a comprehensive comparison of results from degraded, full-length, and size-selected RNA across the latest NGS platforms. The next phase of this study is focusing on use of DNA reference standards. 
Results of the ABRF-NGS Study provide a broad foundation for cross-platform standardization, evaluation, and improvement of NGS applications.
PMCID: PMC4162241
6.  Development of a Genotyping Microarray for Studying the Role of Gene-Environment Interactions in Risk for Lung Cancer 
A microarray (LungCaGxE), based on Illumina BeadChip technology, was developed for high-resolution genotyping of genes that are candidates for involvement in environmentally driven aspects of lung cancer oncogenesis and/or tumor growth. The iterative array design process illustrates techniques for managing large panels of candidate genes and optimizing marker selection, aided by a new bioinformatics pipeline component, Tagger Batch Assistant. The LungCaGxE platform targets 298 genes and the proximal genetic regions in which they are located, using ∼13,000 DNA single nucleotide polymorphisms (SNPs), which include haplotype linkage markers with a minimum allele frequency of 1% and additional specifically targeted SNPs, for which published reports have indicated functional consequences or associations with lung cancer or other smoking-related diseases. The overall assay conversion rate was 98.9%; 99.0% of markers with a minimum Illumina design score of 0.6 successfully generated allele calls using genomic DNA from a study population of 1873 lung-cancer patients and controls.
doi:10.7171/jbt.13-2404-004
PMCID: PMC3792704  PMID: 24294113
genetic association; environmental exposures; Tagger Batch Assistant; LungCaGxE
7.  Metabolite and transcriptome analysis during fasting suggest a role for the p53-Ddit4 axis in major metabolic tissues 
BMC Genomics  2013;14:758.
Background
Fasting induces specific molecular and metabolic adaptations in most organisms. In biomedical research, fasting is used in metabolic studies to synchronize the nutritional states of study subjects. Because this procedure lacks standardization, we need a deeper understanding of the dynamics and the molecular mechanisms of fasting.
Results
We investigated the dynamic changes of liver gene expression and serum parameters of mice at several time points during a 48-hour fasting experiment and then focused on the global gene expression changes in epididymal white adipose tissue (WAT) as well as on pathways common to WAT, liver, and skeletal muscle. This approach produced several intriguing insights: (i) rather than a sequential activation of biochemical pathways in fasted liver, as current knowledge dictates, our data indicate a concerted parallel response; (ii) this first characterization of the transcriptome signature of WAT of fasted mice reveals a remarkable activation of components of the transcription apparatus; (iii) most importantly, our bioinformatic analyses indicate p53 as a central node in the regulation of fasting in major metabolic tissues; and (iv) forced expression of Ddit4, a fasting-regulated p53 target gene, is sufficient to augment lipolysis in cultured adipocytes.
Conclusions
In summary, this combination of focused and global profiling approaches provides a comprehensive molecular characterization of the processes operating during fasting in mice and suggests a role for p53, and its downstream target Ddit4, as novel components in the transcriptional response to food deprivation.
doi:10.1186/1471-2164-14-758
PMCID: PMC3907060  PMID: 24191950
Fasting; Starvation; Nutrient deprivation; Adipose tissue; p53 signaling; Ddit4; Lipolysis
9.  [No title available] 
RNA sequencing is a rich assay for delineating the transcriptome but few RNA-Seq standard data sets exist to help quantification of gene or splice form expression. Moreover, each next-generation sequencing (NGS) platform has unique aspects of library synthesis, sequencing, alignment, and data processing. Little is known about cross-site reproducibility, technical variance and interoperability of NGS platforms for RNA-Seq.
The goals of the ABRF-NGS study are to evaluate the performance of NGS platforms and to identify optimal methods and best practices. The study includes five ABRF Research Groups and over 20 core facility laboratories. To address RNA-Seq issues, we performed sequencing on five NGS platforms at multiple sites using two standardized RNA samples with synthetic RNA spike-ins. Platforms tested included Illumina HiSeq 2000/2500, Roche 454 GS FLX, Life Technologies Ion PGM and Proton, and PacBio. We evaluated a wide range of variables, including varying input amount (1-1000 ng), alternate library preparation methods, specific size fractionation (1, 2, and 3 kb), and performance on degraded RNA (using heat, sonication, and RNase A). We used a set of 18,250 RT-PCR reactions as an orthogonal tool to gauge the linear and dynamic range of the RNA-Seq results.
Our results show that unique transcripts and isoforms are revealed by each method and NGS platform. We found that the majority of the human transcriptome can be found with each method and platform. We also discovered thousands of transcriptionally active regions (TARs) beyond existing gene annotations, which demonstrate that conservative annotation sets are inappropriate for analysis, versus larger annotation sets. Moreover, while we see high correlation of RNA-Seq within sites, we observed that “site effect” is the largest variance factor outside of biological sources. Additionally, we observed that the “bioinformatics noise” of aligners and annotations contributes substantial variance, underscoring the need for data provenance for long-term studies.
PMCID: PMC3635248
10.  Genomics Research Group Studies on Transcriptome Analysis by Microarrays and NGS 
Global transcriptome analysis is of growing importance in understanding how altered expression of genetic variants contributes to complex diseases such as cancer, diabetes, and heart disease and the adverse effects of environmental pollutants on living organisms. The Genomics Research Group presentation describes the group's current activities in applying the latest tools and technologies for transcriptome analysis, in order to determine the advantages and disadvantages of each platform and its suitability for different studies. We have specifically evaluated microarrays, qPCR and NGS platforms for examining the sensitivity and specificity of microRNA detection using synthetic miRNA standards. We have also used low-input RNA from Asian oysters to decipher their transcriptome and study the effect of PAHs and oxygen on them using next generation sequencing technology. We will discuss in detail the technical challenges and the results obtained from each of these projects.
PMCID: PMC3635278
11.  The ABRF-Next Generation Sequencing Study: A Five-Platform, Cross-site, Cross-Protocol Examination of RNA Sequencing 
RNA sequencing is a rich assay for delineating the transcriptome but few RNA-Seq standard data sets exist to help quantification of gene or splice form expression. Moreover, each next-generation sequencing (NGS) platform has unique aspects of library synthesis, sequencing, alignment, and data processing. Little is known about cross-site reproducibility, technical variance and interoperability of NGS platforms for RNA-Seq.
The goals of the ABRF-NGS study are to evaluate the performance of NGS platforms and to identify optimal methods and best practices. The study includes five ABRF Research Groups and over 20 core facility laboratories. To address RNA-Seq issues, we performed sequencing on five NGS platforms at multiple sites using two standardized RNA samples with synthetic RNA spike-ins. Platforms tested included Illumina HiSeq 2000/2500, Roche 454 GS FLX, Life Technologies Ion PGM and Proton, and PacBio. We evaluated a wide range of variables, including varying input amount (1-1000 ng), alternate library preparation methods, specific size fractionation (1, 2, and 3 kb), and performance on degraded RNA (using heat, sonication, and RNase A). We used a set of 18,250 RT-PCR reactions as an orthogonal tool to gauge the linear and dynamic range of the RNA-Seq results.
Our results show that unique transcripts and isoforms are revealed by each method and NGS platform. We found that the majority of the human transcriptome can be found with each method and platform. We also discovered thousands of transcriptionally active regions (TARs) beyond existing gene annotations, which demonstrate that conservative annotation sets are inappropriate for analysis, versus larger annotation sets. Moreover, while we see high correlation of RNA-Seq within sites, we observed that “site effect” is the largest variance factor outside of biological sources. Additionally, we observed that the “bioinformatics noise” of aligners and annotations contributes substantial variance, underscoring the need for data provenance for long-term studies.
PMCID: PMC3635422
12.  Genomics Research Group (GRG): Elucidating the Effects of the Deepwater Horizon Oil Spill on the Atlantic Oyster Using Global Transcriptome Analysis 
Global transcriptome analysis is of growing importance in understanding how altered expression of genetic variants contributes to complex diseases such as cancer, diabetes, and heart disease as well as the effect of environmental pollutants on living organisms. The Genomics Research Group applied next generation sequencing technologies to study the effects of the Deepwater Horizon oil spill on the transcriptome of Atlantic oysters. The Deepwater Horizon oil spill resulted in the release of over 200 million gallons of crude oil into the waters of the Gulf of Mexico. Over two million gallons of chemicals were used to emulsify and disperse oil plumes, posing further risks to the environment in addition to the direct impacts of crude oil. Biota such as the commercially important Atlantic oyster, Crassostrea virginica, were inevitably exposed to spill-related contaminants in the Gulf. The potential effects of oiled water and sediments on oysters range from non-detectable to reduced settlement to impaired immune function, acute intoxication, and death due to bioaccumulation of contaminants. Oil also may affect oxygen diffusion through the water column, and in some cases lead to hypoxic conditions that prompt avoidance migration by mobile species. Sedentary organisms such as oysters are even more susceptible to these negative effects of oil contamination. The mechanisms of toxicity of the oil and spill-related compounds are not well understood. In order to understand these mechanisms, we used RNA sequencing of oyster samples from before and after the spill. As the C. virginica genome is not available, we used two different approaches. First, the sequences were mapped to the recently released Pacific oyster genome. Second, a de novo transcriptome assembly was performed. The de novo transcriptome assembly returned a 66-70% alignment rate. Finally, 9,469 transcripts were identified as homologs between the Atlantic and the Pacific oyster.
PMCID: PMC3635446
13.  Association of the Nicotine Metabolite Ratio and CHRNA5/CHRNA3 Polymorphisms With Smoking Rate Among Treatment-Seeking Smokers 
Nicotine & Tobacco Research  2011;13(6):498-503.
Introduction:
Genome-wide association studies have linked single-nucleotide polymorphisms (SNPs) in the CHRNA5/A3/B4 gene cluster with heaviness of smoking. The nicotine metabolite ratio (NMR), a measure of the rate of nicotine metabolism, is associated with the number of cigarettes per day (CPD) and likelihood of cessation. We tested the potential interacting effects of these two risk factors on CPD.
Methods:
Pretreatment data from three prior clinical trials were pooled for analysis. One thousand and thirty treatment seekers of European ancestry with genotype data for the CHRNA5/A3/B4 SNPs rs578776 and rs1051730 and complete data for NMR and CPD at pretreatment were included. Data for the third SNP, rs16969968, were available for 677 individuals. Linear regression models estimated the main and interacting effects of genotype and NMR on CPD.
Results:
We confirmed independent associations between the NMR and CPD as well as between the SNPs rs16969968 and rs1051730 and CPD. We did not detect a significant interaction between NMR and any of the SNPs examined.
Conclusions:
This study demonstrates the additive and independent association of the NMR and SNPs in the CHRNA5/A3/B4 gene cluster with smoking rate in treatment-seeking smokers.
doi:10.1093/ntr/ntr012
PMCID: PMC3103715  PMID: 21385908
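The Methods above describe linear regression models with main and interacting effects of genotype and NMR on CPD. A minimal sketch of such a model is CPD = b0 + b1*G + b2*NMR + b3*(G*NMR), fitted by ordinary least squares. The additive 0/1/2 genotype coding, the function names, and the data in the usage example are all assumptions for illustration; the abstract does not state the coding or covariates used, and this sketch estimates coefficients only, without the significance tests the study reports.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_interaction_model(
    genotype: Sequence[float], nmr: Sequence[float], cpd: Sequence[float]
) -> List[float]:
    """Return [b0, b1, b2, b3] for CPD = b0 + b1*G + b2*NMR + b3*G*NMR."""
    rows = [[1.0, g, r, g * r] for g, r in zip(genotype, nmr)]
    k = 4
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, cpd)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Made-up values: risk-allele counts, NMR, and cigarettes per day.
    g = [0, 1, 2, 0, 1, 2, 0, 1]
    r = [0.2, 0.3, 0.5, 0.6, 0.1, 0.4, 0.9, 0.7]
    y = [12.0, 15.0, 20.0, 14.0, 13.0, 19.0, 18.0, 17.0]
    print(fit_interaction_model(g, r, y))
```

An interaction coefficient b3 near zero, with non-zero b1 and b2, would correspond to the additive and independent pattern the abstract reports; judging that formally requires standard errors, which this sketch does not compute.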
14.  [No title available] 
The goals of the ABRF Next Generation Sequencing (ABRF-NGS) study are to evaluate the performance of all available NGS platforms and to identify optimal methods and best practices across sites. The study is a coordinated effort of five ABRF Research Groups, involving over 20 core facility laboratories. The ABRF-NGS study currently includes the Illumina HiSeq, Roche 454, Life Technologies Ion Torrent PGM and Pacific Biosciences PacBio RS platforms. The first phase of the study is focused on transcriptome analysis using RNA reference samples from the Microarray Quality Control (MAQC) study together with spike-in controls developed by the External RNA Controls Consortium (ERCC). The aim of this first phase is to assess sequencing accuracy, absolute and relative expression levels, and differential expression detection. The ABRF-NGS study is not intended to be a “bake-off” but rather is an effort to establish a reference data set for each platform to help sites improve their methods. Future phases of the study will include evaluation of results with degraded RNA and DNA, microRNA profiling, DNA and RNA sequencing of a HapMap trio, and DNA sequencing of reference sets of samples with well defined “difficult-to-sequence” regions. The long-term goals of the ABRF-NGS study are to optimize the detection of genetic variation with the latest sequencing tools and to establish a community resource for self-evaluation and self-improvement that will allow users of next generation sequencing technologies to readily compare their own performance data as instruments and protocols change. This is a key feature of an evaluation resource given the rapid pace of development of NGS technologies. This session will present the ABRF-NGS project design and participants and the current status of data collection and analysis.
PMCID: PMC3630658
15.  Convergent Evidence that Choline Acetyltransferase Gene Variation is Associated with Prospective Smoking Cessation and Nicotine Dependence 
Neuropsychopharmacology  2010;35(6):1374-1382.
The ability to quit smoking is heritable, yet few genetic studies have investigated prospective smoking cessation. We conducted a systems-based genetic association analysis in a sample of 472 treatment-seeking smokers of European ancestry after 8 weeks of transdermal nicotine therapy for smoking cessation. The genotyping panel included 169 single-nucleotide polymorphisms (SNPs) in 7 nicotinic acetylcholine receptor subunit genes and 4 genes in the endogenous cholinergic system. The primary outcome was smoking cessation (biochemically confirmed) at the end of treatment. SNPs clustered in the choline acetyltransferase (ChAT) gene were individually identified as nominally significant, and a 5-SNP haplotype (block 6) in ChAT was found to be significantly associated with quitting success. Single SNPs in ChAT haplotype block 2 were also associated with pretreatment levels of nicotine dependence in this cohort. To replicate associations of SNPs in haplotype blocks 2 and 6 of ChAT with nicotine dependence in a non-treatment-seeking cohort, we used data from an independent community-based sample of 629 smokers representing 200 families of European ancestry. Significant SNP and haplotype associations were identified for multiple measures of nicotine dependence. Although the effect sizes in both cohorts are modest, converging data across cohorts and phenotypes suggest that ChAT may be involved in nicotine dependence and ability to quit smoking. Additional sequencing and characterization of ChAT may reveal functional variants that contribute to nicotine dependence and smoking cessation.
doi:10.1038/npp.2010.7
PMCID: PMC2855736  PMID: 20147892
nicotine; smoking cessation; choline acetyltransferase ChAT; pharmacogenetics; addiction; Pharmacogenetics/Pharmacogenomics; Addiction & Substance Abuse; Clinical Pharmacology/Trials; Psychiatry & Behavioral Sciences
16.  Convergent Evidence that Choline Acetyltransferase Gene Variation is Associated with Prospective Smoking Cessation and Nicotine Dependence 
The ability to quit smoking is heritable, yet few genetic studies have investigated prospective smoking cessation. We conducted a systems-based genetic association analysis in a sample of 472 treatment-seeking smokers of European ancestry following eight weeks of transdermal nicotine therapy for smoking cessation. The genotyping panel included 169 SNPs in 7 nicotinic acetylcholine receptor subunit genes and 4 genes in the endogenous cholinergic system. The primary outcome was smoking cessation (biochemically confirmed) at the end of treatment. SNPs clustered in the choline acetyltransferase (ChAT) gene were individually identified as nominally significant, and a 5-SNP haplotype (block 6) in ChAT was found to be significantly associated with quitting success. Single SNPs in ChAT haplotype block 2 were also associated with pre-treatment levels of nicotine dependence in this cohort. To replicate associations of SNPs in haplotype blocks 2 and 6 of ChAT with nicotine dependence in a non-treatment-seeking cohort, we utilized data from an independent community-based sample of 629 smokers representing 200 families of European ancestry. Significant SNP and haplotype associations were identified for multiple measures of nicotine dependence. Although the effect sizes in both cohorts are modest, converging data across cohorts and phenotypes suggest that ChAT may be involved in nicotine dependence and ability to quit smoking. Additional sequencing and characterization of ChAT may reveal functional variants that contribute to nicotine dependence and smoking cessation.
doi:10.1038/npp.2010.7
PMCID: PMC2855736  PMID: 20147892
nicotine; smoking cessation; choline acetyltransferase ChAT; pharmacogenetics; addiction
17.  Global genomic analysis reveals rapid control of a robust innate response in SIV-infected sooty mangabeys 
The Journal of Clinical Investigation  2009;119(12):3556-3572.
Natural SIV infection of sooty mangabeys (SMs) is nonprogressive despite chronic virus replication. Strikingly, it is characterized by low levels of immune activation, while pathogenic SIV infection of rhesus macaques (RMs) is associated with chronic immune activation. To elucidate the mechanisms underlying this intriguing phenotype, we used high-density oligonucleotide microarrays to longitudinally assess host gene expression in SIV-infected SMs and RMs. We found that acute SIV infection of SMs was consistently associated with a robust innate immune response, including widespread upregulation of IFN-stimulated genes (ISGs) in blood and lymph nodes. While SMs exhibited a rapid resolution of ISG expression and immune activation, both responses were observed chronically in RMs. Systems biology analysis indicated that expression of the lymphocyte inhibitory receptor LAG3, a marker of T cell exhaustion, correlated with immune activation in SIV-infected RMs but not SMs. Our findings suggest that active immune regulatory mechanisms, rather than intrinsically attenuated innate immune responses, underlie the low levels of immune activation characteristic of SMs chronically infected with SIV.
doi:10.1172/JCI40115
PMCID: PMC2786806  PMID: 19959874
18.  Assessing the Significance of Conserved Genomic Aberrations Using High Resolution Genomic Microarrays 
PLoS Genetics  2007;3(8):e143.
Genomic aberrations recurrent in a particular cancer type can be important prognostic markers for tumor progression. Typically in early tumorigenesis, cells incur a breakdown of the DNA replication machinery that results in an accumulation of genomic aberrations in the form of duplications, deletions, translocations, and other genomic alterations. Microarray methods allow for finer mapping of these aberrations than has previously been possible; however, data processing and analysis methods have not taken full advantage of this higher resolution. Attention has primarily been given to analysis on the single sample level, where multiple adjacent probes are necessarily used as replicates for the local region containing their target sequences. However, regions of concordant aberration can be short enough to be detected by only one, or very few, array elements. We describe a method called Multiple Sample Analysis for assessing the significance of concordant genomic aberrations across multiple experiments that does not require a priori definition of aberration calls for each sample. If there are multiple samples representing a class, then by exploiting the replication across samples our method can detect concordant aberrations at much higher resolution than can be derived from current single sample approaches. Additionally, this method provides a meaningful approach to addressing population-based questions such as determining important regions for a cancer subtype of interest or determining regions of copy number variation in a population. Multiple Sample Analysis also provides single sample aberration calls in the locations of significant concordance, producing high resolution calls per sample in concordant regions.
The approach is demonstrated on a dataset representing a challenging but important resource: breast tumors that have been formalin-fixed, paraffin-embedded, archived, and subsequently UV-laser capture microdissected and hybridized to two-channel BAC arrays using an amplification protocol. We demonstrate the accurate detection on simulated data, and on real datasets involving known regions of aberration within subtypes of breast cancer at a resolution consistent with that of the array. Similarly, we apply our method to previously published datasets, including a 250K SNP array, and verify known results as well as detect novel regions of concordant aberration. The algorithm has been fully implemented and tested and is freely available as a Java application at http://www.cbil.upenn.edu/MSA.
Author Summary
Cancer is a genetic disease caused by genomic mutations that confer an increased ability to proliferate and survive in a specific environment. It is now known that many regions of genomic DNA are deleted or amplified in specific cancer types. These aberrations are believed to occur randomly in the genome. If these aberrations overlap more than would be expected by chance across individual occurrences of the cancer, this suggests a selective pressure on the aberration. These conserved aberrations likely represent regions that are important for the development, progression, and survival of a specific cancer type in its environment. We present a method for identifying these conserved aberrations within a class of samples. The applications for this method include accurate high resolution mapping of aberrations characteristic of cancer subtypes as well as other genetic diseases and determination of conserved copy number variations in the population. With the use of high resolution microarray methods we have profiled different tumor types. We have been able to create high resolution profiles of conserved aberrations in specific cancer types. These conserved aberrations are prime targets for cancer therapies and many of these regions have already been used to develop effective cancer therapeutics.
doi:10.1371/journal.pgen.0030143
PMCID: PMC1950957  PMID: 17722985
