Summary: The introduction of in vitro nucleic acid amplification techniques, led by real-time PCR, into the clinical microbiology laboratory has transformed the laboratory detection of viruses and select bacterial pathogens. However, the progression of the molecular diagnostic revolution currently relies on the ability to efficiently and accurately offer multiplex detection and characterization for a variety of infectious disease pathogens. Microarray analysis has the capability to offer robust multiplex detection but has just started to enter the diagnostic microbiology laboratory. Multiple microarray platforms exist, including printed double-stranded DNA and oligonucleotide arrays, in situ-synthesized arrays, high-density bead arrays, electronic microarrays, and suspension bead arrays. One aim of this paper is to review microarray technology, highlighting the technical differences between platforms and each platform's advantages and disadvantages. Although the use of microarrays to generate gene expression data has become routine, applications pertinent to clinical microbiology continue to rapidly expand. This review highlights uses of microarray technology that impact diagnostic microbiology, including the detection and identification of pathogens, determination of antimicrobial resistance, epidemiological strain typing, and analysis of microbial infections using host genomic expression and polymorphism profiles.
A common technique used for sensitive and specific diagnostic virus detection in clinical samples is PCR, which can identify one or several viruses in one assay. However, a diagnostic microarray containing probes for all human pathogens could replace hundreds of individual PCR reactions and remove the need for a clear clinical hypothesis regarding a suspected pathogen. We have established such a diagnostic platform for random amplification and subsequent microarray identification of viral pathogens in clinical samples. We show that Phi29 polymerase amplification of a diverse set of clinical samples generates enough viral material for successful identification by the Microbial Detection Array, demonstrating the potential of the microarray technique for broad-spectrum pathogen detection. We conclude that this method detects both DNA and RNA viruses present in the same sample and differentiates between virus subtypes. We propose this assay for diagnostic analysis of viruses in clinical samples.
Comparative genomic hybridization (CGH) microarrays have been used to determine copy number variations (CNVs) and their effects on complex diseases. Detection of absolute CNVs independent of genomic variants of an arbitrary reference sample has been a critical issue in CGH array experiments. Whole genome analysis using massively parallel sequencing with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform. The algorithm enables the removal and rescue of false positive and false negative CNVs, respectively, which appear due to the effects of genomic variants of the reference sample in raw CGH array experiments. We found that the CARA remarkably enhanced the accuracy of CGH array in determining absolute CNVs. Our method thus provides a new approach to interpret CGH array data for personalized medicine.
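The way reference variants distort raw CGH calls can be illustrated with a small sketch. It is not the CARA algorithm itself; it only assumes that the array reports log2(test copy number / reference copy number) and that the reference copy number at a locus is known from sequencing. The function names and the calling threshold are hypothetical.

```python
def absolute_copy_number(log2_ratio, reference_copy_number):
    """Recover the test sample's copy number from a test/reference log2 ratio,
    assuming log2_ratio = log2(test_cn / reference_cn)."""
    return reference_copy_number * 2 ** log2_ratio


def raw_call(log2_ratio, threshold=0.3):
    """Naive call made directly on the ratio (threshold is an illustrative choice)."""
    if log2_ratio > threshold:
        return "gain"
    if log2_ratio < -threshold:
        return "loss"
    return "no change"


# Case 1: the reference carries a deletion (1 copy), the test sample is normal (2 copies).
# The raw ratio suggests a gain, a false positive caused by the reference.
print(raw_call(1.0), absolute_copy_number(1.0, 1))

# Case 2: both reference and test carry the same deletion (1 copy each).
# The raw ratio shows no change, a false negative; the absolute value reveals the loss.
print(raw_call(0.0), absolute_copy_number(0.0, 1))
```

Under these assumptions, the first case is "removed" as a false positive and the second is "rescued" as a true loss once the reference copy number is taken into account.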
To address the limitations of traditional virus and pathogen detection methodologies in clinical diagnosis, scientists have developed high-throughput oligonucleotide microarrays to rapidly identify infectious agents. However, objectively identifying pathogens from the complex hybridization patterns of these massively multiplexed arrays remains challenging.
In this study, we conceived an automated method based on the hypergeometric distribution for identifying pathogens in multiplexed arrays and compared it to five other methods. We evaluated these metrics: 1) accurate prediction, whether the top ranked prediction(s) match the real virus(es); 2) four accuracy scores.
Though accurate prediction and high specificity and sensitivity can be achieved with several methods, the method based on the hypergeometric distribution provides a significant advantage in terms of positive predictive value, with two to sixty times the positive predictive values of the other methods.
The proposed multi-species array analysis based on the hypergeometric distribution addresses shortcomings of previous methods by enhancing signals of positively hybridized probes.
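A hypergeometric test of this kind can be sketched as follows. This is a generic enrichment calculation, not the authors' implementation: it assumes that probes have already been dichotomized into hybridized and non-hybridized, and it asks how surprising it is that a given number of a candidate pathogen's probes fall among the hybridized ones. All names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, total_probes, pathogen_probes, positive_probes):
    """P(X >= k) for X ~ Hypergeometric: drawing `positive_probes` probes without
    replacement from `total_probes`, of which `pathogen_probes` target the candidate."""
    upper = min(pathogen_probes, positive_probes)
    favourable = sum(
        comb(pathogen_probes, i)
        * comb(total_probes - pathogen_probes, positive_probes - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(total_probes, positive_probes)


# Hypothetical array: 10 probes in total, 3 target the candidate virus,
# 4 probes hybridized, and 2 of those 4 belong to the candidate.
p_value = hypergeom_upper_tail(2, total_probes=10, pathogen_probes=3, positive_probes=4)
print(round(p_value, 4))  # 0.3333
```

Candidates would then be ranked by ascending p-value, so that the pathogen whose probes are most over-represented among the hybridized probes comes first.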
Multiplexed detection assays that analyze a modest number of nucleic acid targets over large sample sets are emerging as the preferred testing approach in such applications as routine pathogen typing, outbreak monitoring, and diagnostics. However, very few DNA testing platforms have proven to offer a solution for mid-plexed analysis that is high-throughput, sensitive, and low in cost per test. In this work, an enhanced genotyping method based on MassCode technology was devised and integrated as part of a high-throughput mid-plexing analytical system that facilitates robust qualitative differential detection of DNA targets. Samples are first analyzed using MassCode PCR (MC-PCR) performed with an array of primer sets encoded with unique mass tags. Lambda exonuclease and an array of MassCode probes are then contacted with MC-PCR products for further interrogation, and target sequences are specifically identified. Primer and probe hybridizations occur in homogeneous solution, a clear advantage over micro- or nanoparticle suspension arrays. The two cognate tags coupled to resultant MassCode hybrids are detected in an automated process using a benchtop single quadrupole mass spectrometer. The prospective value of using MassCode probe arrays for multiplexed bioanalysis was demonstrated by developing a 14-plex proof-of-concept assay designed to subtype a select panel of Salmonella enterica serogroups and serovars. This MassCode system is very flexible, and test panels can be customized to include more, fewer, or different markers.
Rapid and multiplexed measurement is vital in the detection of food-borne pathogens. While highly specific and sensitive, traditional immunochemical assays such as enzyme-linked immunosorbent assays (ELISAs) often require expensive read-out equipment (e.g. fluorescent labels) and lack the capability of multiplex detection. By combining the superior specificity of immunoassays with the sensitivity and simplicity of magnetic detection, we have developed a novel multiplex magnetic nanotag-based detection platform for mycotoxins that functions on a sub-picomolar concentration level. Unlike fluorescent labels, magnetic nanotags (MNTs) can be detected with inexpensive giant magnetoresistive (GMR) sensors such as spin-valve sensors. In the system presented here, each spin-valve sensor has an active area of 90 × 90 µm², arranged in an 8 × 8 array. Sample is added to the antibody-immobilized sensor array prior to the addition of the biotinylated detection antibody. The sensor response is recorded in real time upon the addition of streptavidin-linked MNTs on the chip. Here we demonstrate the simultaneous detection of multiple mycotoxins (aflatoxin B1, zearalenone and HT-2) and show that a detection limit of 50 pg/mL can be achieved.
Microarrays are becoming a very popular tool for microbial detection and diagnostics. Although these diagnostic arrays are much simpler than traditional transcriptome arrays, their high-throughput nature means that data analysis requirements still form a bottleneck for their widespread use. Hence we developed a new online data sharing and analysis environment customised for diagnostic arrays.
Microbial Diagnostic Array Workstation (MDAW) is a database-driven application built on MS Access with a front end designed in ASP.NET.
MDAW is a new resource that is customised for the data analysis requirements for microbial diagnostic arrays.
New design and optimization of pathogen detection microarrays is shown to allow robust and accurate detection of a range of pathogens. The customized microarray platform includes a method for reducing PCR bias during DNA amplification.
DNA microarrays used as 'genomic sensors' have great potential in clinical diagnostics. Biases inherent in random PCR-amplification, cross-hybridization effects, and inadequate microarray analysis, however, limit detection sensitivity and specificity. Here, we have studied the relationships between viral amplification efficiency, hybridization signal, and target-probe annealing specificity using a customized microarray platform. Novel features of this platform include the development of a robust algorithm that accurately predicts PCR bias during DNA amplification and can be used to improve PCR primer design, as well as a powerful statistical concept for inferring pathogen identity from probe recognition signatures. Compared to real-time PCR, the microarray platform identified pathogens with 94% accuracy (76% sensitivity and 100% specificity) in a panel of 36 patient specimens. Our findings show that microarrays can be used for the robust and accurate diagnosis of pathogens, and further substantiate the use of microarray technology in clinical diagnostics.
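For readers comparing such figures across studies, the relationship between accuracy, sensitivity, specificity and positive predictive value can be written out explicitly. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not the data of the 36-specimen panel.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.
    Assumes every denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),            # fraction of true positives detected
        "specificity": tn / (tn + fp),            # fraction of true negatives recognised
        "ppv": tp / (tp + fp),                    # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical comparison against a reference method such as real-time PCR:
# 8 true positives, 0 false positives, 10 true negatives, 2 false negatives.
metrics = diagnostic_metrics(tp=8, fp=0, tn=10, fn=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The example shows how an assay with no false positives can reach perfect specificity and positive predictive value while its sensitivity, and therefore its overall accuracy, remains limited by false negatives.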
Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm—Tissue Array Co-Occurrence Matrix Analysis (TACOMA)—for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, is absent of sensitive tuning parameters and has the ability to report salient pixels in an image that contribute to its score. Pathologists’ input via informative training patches is an important aspect of the algorithm that allows the training for any specific marker or cell type. With co-training, the error rate of TACOMA can be reduced substantially for a very small training sample (e.g., with size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high dimensional setting when there is “sufficient” redundancy among the features. TACOMA is flexible, transparent and provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists’ performance in terms of accuracy and repeatability.
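The "local inter-pixel relationships" that such a method summarizes are commonly captured by a gray-level co-occurrence matrix. The sketch below shows only that counting step on a toy image and is not TACOMA itself; the function name, the number of gray levels and the pixel offset are illustrative assumptions.

```python
def cooccurrence_matrix(image, levels, offset=(0, 1)):
    """Count how often gray level a at (r, c) co-occurs with gray level b
    at (r + dr, c + dc), for every pixel pair that lies inside the image."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


# Toy 3 x 3 image quantized to two gray levels (0 = background, 1 = stained).
toy_image = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
print(cooccurrence_matrix(toy_image, levels=2))  # [[1, 2], [0, 3]]
```

In a texture-based scoring pipeline, matrices like this one (for several offsets) would serve as the feature vector passed to a classifier trained on pathologist-labelled examples.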
We have developed and validated a consolidated bead-based genotyping platform, the Bioplex suspension array for simultaneous detection of multiple single nucleotide polymorphisms (SNPs) of the ATP-binding cassette transporters. Genetic polymorphisms have been known to influence therapeutic response and risk of disease pathologies. Genetic screening for therapeutic and diagnostic applications thus holds great promise in clinical management. The allele-specific primer extension (ASPE) reaction was used to assay 22 multiplexed SNPs for eight subjects. Comparison of the microsphere-based ASPE assay results to sequencing results showed complete concordance in genotype assignments. The Bioplex suspension array thus proves to be a reliable, cost-effective and high-throughput technological platform for genotyping. It can be easily adapted to customized SNP panels for specific applications involving large-scale mutation screening of clinically relevant markers.
Genotype; Microspheres; Polymorphism, Genetic
Microarrays are the most common method of studying global gene expression, and may soon enter the realm of FDA-approved clinical/diagnostic testing of cancer and other diseases. However, the acceptance of array data has been made difficult by the proliferation of widely different array platforms with gene probes ranging in size from 25 bases (oligonucleotides) to several kilobases (complementary DNAs or cDNAs). The algorithms applied for image and data analysis are also as varied as the microarray platforms, perhaps more so. In addition, there is a total lack of universally accepted standards for use among the different platforms and even within the same array types. Due to this lack of coherency in array technologies, confusion in interpretation of data within and across platforms has often been the norm, and studies of the same biological phenomena have, in many cases, led to contradictory results. In this commentary/review, some of the causes of this confusion will be summarized, and progress in overcoming these obstacles will be described, with the goal of providing an optimistic view of the future for the use of array technologies in global expression profiling and other applications.
microarray; expression profiling; RNA standards; controls; MGED; MAQC; NIST; ERCC
Due to insufficient biomarker validation and poor performance in diagnostic assays, the candidate biomarker verification process has to be improved. Multi-analyte immunoassays are the tool of choice for the identification and detailed validation of protein biomarkers in serum. The process of identification and validation of serum biomarkers, as well as their implementation in diagnostic routine, requires the application of independent immunoassay platforms with high-throughput capability. This review will focus on three main multi-analyte immunoassay platforms: planar microarrays, multiplex bead systems, and array-based surface plasmon resonance (SPR) chips. Recent developments of each platform will be discussed with respect to application in clinical proteomics, principles, detection methods, and performance strength. The requirements for specific surface functionalization of assay platforms are continuously increasing. The reasons for this increase are the demand for highly sensitive assays and the need to reduce non-specific adsorption from complex samples, and with it to achieve high signal-to-noise ratios. To achieve this, different support materials were adapted to the immobilized biomarker/ligand, allowing a high binding capacity and immobilization efficiency. In the case of immunoassays, the immobilized ligands are proteins, antibodies or peptides, which exhibit a diversity of chemical properties (acidic/alkaline; hydrophobic/hydrophilic; secondary or tertiary structure/linear). Consequently, it is more challenging than for DNA immobilization to develop immobilization strategies that ensure a homogeneously covered surface and a reliable assay. New developments concerning material support for each platform are discussed, especially with regard to increasing the immobilization efficiency and reducing the non-specific adsorption from complex samples like serum and cell lysates.
clinical proteomics and diagnostic; multi-analyte immunoassays; serum screening; antibody-antigen interaction
Identifying the bacteria and viruses present in a complex sample is useful in disease diagnostics, product safety, environmental characterization, and research. Array-based methods have proven useful for detecting, in a single assay and at a reasonable cost, any microbe among the thousands that have been sequenced.
We designed a pan-Microbial Detection Array (MDA) to detect all known viruses (including phages), bacteria and plasmids and developed a novel statistical analysis method to identify mixtures of organisms from complex samples hybridized to the array. The array has broader coverage of bacterial and viral targets and is based on more recent sequence data and more probes per target than other microbial detection/discovery arrays in the literature. Family-specific probes were selected for all sequenced viral and bacterial complete genomes, segments, and plasmids. Probes were designed to tolerate some sequence variation to enable detection of divergent species with homology to sequenced organisms, and to have no significant matches to the human genome sequence.
In blinded testing on spiked samples with single or multiple viruses, the MDA was able to correctly identify species or strains. In clinical fecal, serum, and respiratory samples, the MDA was able to detect and characterize multiple viruses, phage, and bacteria in a sample to the family and species level, as confirmed by PCR.
The MDA can be used to identify the suite of viruses and bacteria present in complex samples.
Recent studies have shown that copy number variations (CNVs) are frequent in higher eukaryotes and associated with a substantial portion of inherited and acquired risk for various human diseases. The increasing availability of high-resolution genome surveillance platforms provides opportunity for rapidly assessing research and clinical samples for CNV content, as well as for determining the potential pathogenicity of identified variants. However, few informatics tools for accurate and efficient CNV detection and assessment currently exist.
We developed a suite of software tools and resources (CNV Workshop) for automated, genome-wide CNV detection from a variety of SNP array platforms. CNV Workshop includes three major components: detection, annotation, and presentation of structural variants from genome array data. CNV detection utilizes a robust and genotype-specific extension of the Circular Binary Segmentation algorithm, and the use of additional detection algorithms is supported. Predicted CNVs are captured in a MySQL database that supports cohort-based projects and incorporates a secure user authentication layer and user/admin roles. To assist with determination of pathogenicity, detected CNVs are also annotated automatically for gene content, known disease loci, and gene-based literature references. Results are easily queried, sorted, filtered, and visualized via a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data, integration with the UCSC Genome Browser, and tabular displays of genomic attributes for each CNV.
To our knowledge, CNV Workshop represents the first cohesive and convenient platform for detection, annotation, and assessment of the biological and clinical significance of structural variants. CNV Workshop has been successfully utilized for assessment of genomic variation in healthy individuals and disease cohorts and is an ideal platform for coordinating multiple associated projects.
Availability and Implementation
Available on the web at: http://sourceforge.net/projects/cnv
Many rapid methods have been developed for screening foods for the presence of pathogenic microorganisms. Rapid methods that have the additional ability to identify microorganisms via multiplexed immunological recognition have the potential for classification or typing of microbial contaminants, thus facilitating epidemiological investigations that aim to identify outbreaks and trace back the contamination to its source. This manuscript introduces a novel, high-throughput typing platform that employs microarrayed multiwell plate substrates and laser-induced fluorescence of the nucleic acid intercalating dye/stain SYBR Gold for detection of antibody-captured bacteria. The aim of this study was to use this platform for comparison of different sets of antibodies raised against the same pathogens as well as to demonstrate its potential effectiveness for serotyping. To that end, two sets of antibodies raised against each of the “Big Six” non-O157 Shiga toxin-producing E. coli (STEC) as well as E. coli O157:H7 were array-printed into microtiter plates, and serial dilutions of the bacteria were added and subsequently detected. Though antibody specificity was not sufficient for the development of an STEC serotyping method, the STEC antibody sets performed reasonably well, showing that specificity increased at lower capture antibody concentrations or, conversely, at lower bacterial target concentrations. The favorable results indicated that with sufficiently selective and ideally concentrated sets of biorecognition elements (e.g., antibodies or aptamers), this high-throughput platform can be used to rapidly type microbial isolates derived from food samples within ca. 80 min of total assay time. It can also potentially be used to detect the pathogens from food enrichments and at least serve as a platform for testing antibodies.
antibody; microarray; bacteria; fluorescence; microtiter plate; typing
Despite the known relevance of genomic structural variants to pathogen behavior, cancer, development, and evolution, certain repeat-based structural variants may evade detection by existing high-throughput techniques. Here, we present ruler arrays, a technique to detect genomic structural variants including insertions and deletions (indels), duplications, and translocations. A ruler array exploits DNA polymerase’s processivity to detect physical distances between defined genomic sequences regardless of the intervening sequence. The method combines a sample preparation protocol, tiling genomic microarrays, and a new computational analysis. The analysis of ruler array data from two genomic samples enables the identification of structural variation between the samples. In an empirical test between two closely related haploid strains of yeast, ruler arrays detected 78% of the structural variants larger than 100 bp.
Genotyping platforms such as single nucleotide polymorphism (SNP) arrays are powerful tools to study genomic aberrations in cancer samples. Allele-specific information from SNP arrays provides valuable information for interpreting copy number variation (CNV) and allelic imbalance, including loss-of-heterozygosity (LOH), beyond that obtained from the total DNA signal available from array comparative genomic hybridization (aCGH) platforms. Several algorithms based on hidden Markov models (HMMs) have been designed to detect copy number changes and copy-neutral LOH, making use of the allele information on SNP arrays. However, heterogeneity in clinical samples, due to stromal contamination and somatic alterations, complicates analysis and interpretation of these data.
We have developed MixHMM, a novel hidden Markov model using hidden states based on chromosomal structural aberrations. MixHMM allows CNV detection for copy numbers up to 7 and allows more complete and accurate description of other forms of allelic imbalance, such as increased copy number LOH or imbalanced amplifications. MixHMM also incorporates a novel sample mixing model that allows detection of tumor CNV events in heterogeneous tumor samples, where cancer cells are mixed with a proportion of stromal cells.
We validate MixHMM and demonstrate its advantages with simulated samples, clinical tumor samples and a dilution series of mixed samples. We have shown that the CNVs of cancer cells in a tumor sample contaminated with up to 80% of stromal cells can be detected accurately using Illumina BeadChip and MixHMM.
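The effect of stromal contamination on allele signals can be illustrated with a simple linear mixing calculation. This is a generic sketch under the assumption that the observed B-allele frequency is the copy-weighted average of tumor and normal cells; it is not taken from the MixHMM package, and the function and parameter names are hypothetical.

```python
def expected_baf(tumor_fraction, tumor_b, tumor_total, normal_b, normal_total=2):
    """Expected B-allele frequency at one SNP in a tumor/stroma mixture.

    tumor_fraction: proportion of cells that are tumor cells (0..1)
    tumor_b, tumor_total: B-allele copies and total copies in tumor cells
    normal_b, normal_total: the same quantities in stromal (normal) cells
    """
    b_copies = tumor_fraction * tumor_b + (1 - tumor_fraction) * normal_b
    all_copies = tumor_fraction * tumor_total + (1 - tumor_fraction) * normal_total
    return b_copies / all_copies


# Heterozygous SNP (AB in normal cells); the tumor has lost the B allele (genotype A).
for fraction in (1.0, 0.5, 0.2):
    print(fraction, round(expected_baf(fraction, tumor_b=0, tumor_total=1, normal_b=1), 3))
```

As the tumor fraction falls, the expected value drifts from 0 back toward the heterozygous value of 0.5, which is why an explicit mixing model is needed to call events in heavily contaminated samples.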
The MixHMM is available as a Python package provided with some other useful tools at http://genecube.med.yale.edu:8080/MixHMM.
Affymetrix microarrays are used by many laboratories to generate gene expression profiles. Generally, only large differences (> 1.7-fold) between conditions have been reported. Computational methods to reduce inter-array variability might be of value when attempting to detect smaller differences. We examined whether inter-array variability could be reduced by using data based on the Affymetrix algorithm for pairwise comparisons between arrays (ratio method) rather than data based on the algorithm for analysis of individual arrays (signal method). Six HG-U95A arrays that probed mRNA from young (21–31 yr old) human muscle were compared with six arrays that probed mRNA from older (62–77 yr old) muscle.
Differences in mean expression levels of young and old subjects were small, rarely > 1.5-fold. The mean within-group coefficient of variation for 4629 mRNAs expressed in muscle was 20% according to the ratio method and 25% according to the signal method. The ratio method yielded more differences according to t-tests (124 vs. 98 differences at P < 0.01), rank sum tests (107 vs. 85 differences at P < 0.01), and the Significance Analysis of Microarrays method (124 vs. 56 differences with false detection rate < 20%; 20 vs. 0 differences with false detection rate < 5%). The ratio method also improved consistency between results of the initial scan and results of the antibody-enhanced scan.
The ratio method reduces inter-array variance and thereby enhances statistical power.
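The within-group coefficient of variation used as the variability measure here can be computed as in the following sketch. The gene names and expression values are hypothetical; only the definition (standard deviation divided by mean, averaged over genes) is standard.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate measurements."""
    return stdev(values) / mean(values)


def mean_within_group_cv(expression_by_gene):
    """Average the per-gene CV across all genes measured within one group of arrays."""
    return mean(coefficient_of_variation(v) for v in expression_by_gene.values())


# Hypothetical expression values for two mRNAs across three arrays of one group.
group = {
    "gene_1": [90, 100, 110],   # CV = 10 / 100 = 0.10
    "gene_2": [40, 50, 60],     # CV = 10 / 50  = 0.20
}
print(round(mean_within_group_cv(group), 3))  # 0.15
```

A method that lowers this value for the same samples leaves less unexplained variance per gene, which is what translates into more differences reaching significance at a fixed threshold.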
Affecting the core functional microbiome, peculiar high-level taxonomic imbalances of the human intestinal microbiota have recently been associated with specific diseases, such as obesity, inflammatory bowel diseases, and intestinal inflammation.
In order to specifically monitor microbiota imbalances that impact human physiology, here we develop and validate an original DNA microarray (HTF-Microbi.Array) for the high taxonomic level fingerprint of the human intestinal microbiota. Based on the Ligase Detection Reaction-Universal Array (LDR-UA) approach, the HTF-Microbi.Array enables specific detection and approximate relative quantification of 16S rRNAs from 30 phylogenetically related groups of the human intestinal microbiota. The HTF-Microbi.Array was used in a pilot study of the faecal microbiota of eight young adults. Cluster analysis revealed the good reproducibility of the high-level taxonomic microbiota fingerprint obtained for each of the subjects.
The HTF-Microbi.Array is a fast and sensitive tool for the high taxonomic level fingerprint of the human intestinal microbiota in terms of presence/absence of the principal groups. Moreover, analysis of the relative fluorescence intensity for each probe pair of our LDR-UA platform can provide an estimate of the relative abundance of the microbial target groups within each sample. Focusing the phylogenetic resolution at division, order and cluster levels, the HTF-Microbi.Array is blind with respect to the inter-individual variability at the species level.
Alternative splicing (AS) is an important regulatory mechanism for gene expression and protein diversity in eukaryotes. Previous studies have demonstrated that it can be causative for, or specific to, splicing-related diseases. Understanding the regulation of AS will be helpful for diagnostic efforts and drug discovery for those splicing-related diseases. As a novel exon-centric microarray platform, the exon array enables a comprehensive analysis of AS by investigating the expression of known and predicted exons. Identifying AS events from exon array data has attracted much attention; however, new and powerful algorithms for exon array data analysis are still lacking.
Here, we considered the identification of AS events in the framework of variable selection and developed a regression method for AS detection (REMAS). First, features of alternatively spliced exons were scaled by reasonably defined variables. Second, we designed a hierarchical model that represents gene structure and transcriptional influence on exons, and lasso-type penalties were introduced in the calculation because of the large number of variables. Third, an iterative two-step algorithm was developed to select alternatively spliced genes and exons. To avoid negative effects introduced by small sample size, we ranked genes as parameters indicating their AS capabilities in an iterative manner. Both simulation and real data evaluation showed that REMAS could efficiently identify potential AS events, some of which had been validated by RT-PCR or supported by literature evidence.
As a new lasso regression algorithm based on a hierarchical model, REMAS has been demonstrated to be a reliable and effective method to identify AS events from exon array data.
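Why a lasso-type penalty performs variable selection can be seen in a minimal coordinate-descent sketch. This is plain lasso regression on a toy design, not the hierarchical REMAS model; it assumes the objective (1/2n)·||y − Xβ||² + λ·||β||₁, and all names and data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly zero."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/2n) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except j (the partial fit).
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return beta


# Toy data: the first variable has a strong effect, the second a weak one.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
print(lasso_coordinate_descent(X, y, lam=0.5))  # [2.0, 0.0]
```

The weak variable is set exactly to zero while the strong one is retained (and shrunk), which is the selection behaviour that makes such penalties attractive when the number of candidate exons and genes is very large.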
Mental retardation is a heterogeneous condition, affecting 1-3% of the general population. In the last few years, several emerging clinical entities have been described, owing to the advent of new genetic techniques such as array Comparative Genomic Hybridization. The detection of cryptic microdeletion/microduplication abnormalities has allowed genotype-phenotype correlations, delineating recognizable syndromic conditions that are herein reviewed. With the aim of providing paediatricians with a combined clinical and genetic approach to the child with cognitive impairment, a practical diagnostic algorithm is also illustrated. The use of microarray platforms has further reduced the percentage of "idiopathic" forms of mental retardation, which previously accounted for about half of total cases. We discuss the putative pathways underlying the remaining "pure idiopathic" forms of mental retardation, highlighting possible environmental and epigenetic mechanisms as causes of altered cognition.
Emerging known and unknown pathogens create profound threats to public health. Platforms for rapid detection and characterization of microbial agents are critically needed to prevent and respond to disease outbreaks. Available detection technologies cannot provide broad functional information about known or novel organisms. As a step toward developing such a system, we have produced and tested a series of high-density functional gene arrays to detect elements of virulence and antibiotic resistance mechanisms. Our first generation array targets genes from Escherichia coli strains K12 and CFT073, Enterococcus faecalis and Staphylococcus aureus. We determined optimal probe design parameters for gene family detection and discrimination. When tested with organisms at varying phylogenetic distances from the four target strains, the array detected orthologs for the majority of targeted gene families present in bacteria belonging to the same taxonomic family. In combination with whole-genome amplification, the array detects femtogram concentrations of purified DNA, either spiked in to an aerosol sample background, or in combinations from one or more of the four target organisms. This is the first report of a high density NimbleGen microarray system targeting microbial antibiotic resistance and virulence mechanisms. By targeting virulence gene families as well as genes unique to specific biothreat agents, these arrays will provide important data about the pathogenic potential and drug resistance profiles of unknown organisms in environmental samples.
Phylogenetic microarrays present an attractive strategy to high-throughput interrogation of complex microbial communities. In this work we present several approaches to optimize the analysis of intestinal microbiota with the recently developed Microbiota Array. First, we determined how 16S rDNA-specific PCR amplification influenced bacterial detection and the consistency of measured abundance values. Bacterial detection improved with an increase in the number of PCR amplification cycles, but 25 cycles were sufficient to achieve the maximum possible detection. A PCR-caused deviation in the measured abundance values was also observed. We also developed two mathematical algorithms aimed to account for a predicted cross-hybridization of 16S rDNA fragments among different species, and to adjust the measured hybridization signal based on the number of 16S rRNA gene copies per species genome. The 16S rRNA gene copy adjustment indicated that the presence of members of class Clostridia might be over-estimated in some 16S rDNA-based studies. Finally, we show that the examination of total community RNA with phylogenetic microarray can provide estimates of the relative metabolic activity of individual community members. Complementary profiling of genomic DNA and total RNA isolated from the same sample presents an opportunity to assess population structure and activity in the same microbial community.
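The gene copy adjustment can be sketched as a simple rescaling. This is an illustration of the idea rather than the authors' algorithm: it assumes that each taxon's hybridization signal is proportional to cell number times 16S rRNA gene copies per genome. The taxon names, signals and copy numbers below are hypothetical.

```python
def adjust_for_copy_number(signals, copies_per_genome):
    """Divide each taxon's signal by its 16S rRNA gene copy number,
    then renormalize so the adjusted abundances sum to 1."""
    adjusted = {taxon: signals[taxon] / copies_per_genome[taxon] for taxon in signals}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical community: taxon_A carries 6 gene copies per genome, taxon_B carries 2.
signals = {"taxon_A": 600.0, "taxon_B": 400.0}
copies = {"taxon_A": 6, "taxon_B": 2}

for taxon, share in adjust_for_copy_number(signals, copies).items():
    print(taxon, round(share, 3))
```

Here taxon_A accounts for 60% of the raw signal but only about one third of the adjusted abundance, showing how groups with many 16S rRNA gene copies can appear over-represented in unadjusted data.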
The delineation of genomic copy number abnormalities (CNAs) from cancer samples has been instrumental for identification of tumor suppressor genes and oncogenes and has proven useful for clinical marker detection. An increasing number of projects have mapped CNAs using high-resolution microarray-based techniques. So far, no single resource provides a global collection of readily accessible oncogenomic array data.
We here present arrayMap, a curated reference database and bioinformatics resource targeting copy number profiling data in human cancer. The arrayMap database provides a platform for meta-analysis and systems level data integration of high-resolution oncogenomic CNA data. To date, the resource incorporates more than 40,000 arrays in 224 cancer types extracted from several resources, including the NCBI’s Gene Expression Omnibus (GEO), EBI’s ArrayExpress (AE), The Cancer Genome Atlas (TCGA), publication supplements and direct submissions. For the majority of the included datasets, probe level and integrated visualization facilitate gene level and genome wide data review. Results from multi-case selections can be connected to downstream data analysis and visualization tools.
To our knowledge, no data source currently provides an extensive collection of high-resolution oncogenomic CNA data that could readily be used for genomic feature mining across a representative range of cancer entities. arrayMap represents our effort to provide a long-term platform for oncogenomic CNA data independent of specific platform considerations or specific project dependence. The online database can be accessed at http://www.arraymap.org.
Assessment of array quality is an essential step in the analysis of data from microarray experiments. Once detected, less reliable arrays are typically excluded or "filtered" from further analysis to avoid misleading results.
In this article, a graduated approach to array quality is considered based on empirical reproducibility of the gene expression measures from replicate arrays. Weights are assigned to each microarray by fitting a heteroscedastic linear model with shared array variance terms. A novel gene-by-gene update algorithm is used to efficiently estimate the array variances. The inverse variances are used as weights in the linear model analysis to identify differentially expressed genes. The method successfully assigns lower weights to less reproducible arrays from different experiments. Down-weighting the observations from suspect arrays increases the power to detect differential expression. In smaller experiments, this approach outperforms the usual method of filtering the data. The method is available in the limma software package which is implemented in the R software environment.
This method complements existing normalisation and spot quality procedures, and allows poorer quality arrays, which would otherwise be discarded, to be included in an analysis. It is applicable to microarray data from experiments with some level of replication.
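The difference between filtering and down-weighting can be illustrated with inverse-variance weights on a single gene. The sketch assumes the per-array variances are already known; it does not reproduce the estimation procedure implemented in limma, and the values and function names are hypothetical.

```python
def inverse_variance_weights(array_variances):
    """Arrays with larger variance (less reproducible) receive smaller weights."""
    return [1.0 / v for v in array_variances]


def weighted_mean(values, weights):
    """Weighted average of one gene's measurements across arrays."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical log-ratios for one gene on three replicate arrays;
# the third array is noisier (variance 4 instead of 1) and reads high.
log_ratios = [2.0, 2.0, 6.0]
variances = [1.0, 1.0, 4.0]

weights = inverse_variance_weights(variances)
print(weights)                                       # [1.0, 1.0, 0.25]
print(round(sum(log_ratios) / len(log_ratios), 3))   # unweighted mean: 3.333
print(round(weighted_mean(log_ratios, weights), 3))  # weighted mean: 2.444
```

The suspect array still contributes information, but its influence is reduced in proportion to its unreliability, rather than being discarded outright.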