The majority of next-generation sequencing short-reads can be properly aligned by leading aligners at high speed. However, the alignment quality can still be further improved, since usually not all reads can be correctly aligned to large genomes, such as the human genome, even for simulated data. Moreover, even slight improvements in this area are important but challenging, and usually require significantly more computational effort. In this paper, we present CUSHAW3, an open-source, parallelized, sensitive and accurate short-read aligner for both base-space and color-space sequences. In this aligner, we have investigated a hybrid seeding approach to improve alignment quality, which incorporates three different seed types, i.e. maximal exact match seeds, exact-match k-mer seeds and variable-length seeds, into the alignment pipeline. Furthermore, three techniques (a weighted seed-pairing heuristic, paired-end alignment pair ranking and read mate rescuing) have been devised to facilitate accurate paired-end alignment. For base-space alignment, we have compared CUSHAW3 to Novoalign, CUSHAW2, BWA-MEM, Bowtie2 and GEM, by aligning both simulated and real reads to the human genome. The results show that CUSHAW3 consistently outperforms CUSHAW2, BWA-MEM, Bowtie2 and GEM in terms of single-end and paired-end alignment. Furthermore, our aligner has demonstrated better paired-end alignment performance than Novoalign for short-reads with high error rates. For color-space alignment, CUSHAW3 is consistently one of the best aligners compared to SHRiMP2 and BFAST. The source code of CUSHAW3 and all simulated data are available at http://cushaw3.sourceforge.net.
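To make one of the seed types named above concrete, the following is a minimal sketch of exact-match k-mer seeding against a hash index. It is not CUSHAW3's implementation: the function names, the toy reference and the choice of k = 4 are all assumptions made for illustration, and a real aligner would typically use a compressed index rather than a Python dictionary.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_kmer_seeds(read, index, k):
    """Return (read_offset, reference_position) pairs for every exact k-mer hit."""
    seeds = []
    for offset in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict
        for ref_pos in index.get(read[offset:offset + k], []):
            seeds.append((offset, ref_pos))
    return seeds


# Toy data (hypothetical): the read matches the reference starting at position 5.
reference = "ACGTACGTTAGC"
index = build_kmer_index(reference, 4)
seeds = find_kmer_seeds("CGTTAG", index, 4)
print(seeds)
```

Seeds that share the same value of `ref_pos - offset` lie on one alignment diagonal, so in this toy case all three hits support a single candidate location.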
PROM1 is the gene encoding prominin-1 or CD133, an important cell surface marker for the isolation of both normal and cancer stem cells. PROM1 transcripts initiate at a range of transcription start sites (TSS) associated with distinct tissue and cancer expression profiles. Using high resolution Cap Analysis of Gene Expression (CAGE) sequencing we characterize TSS utilization across a broad range of normal and developmental tissues. We identify a novel proximal promoter (P6) within CD133+ melanoma cell lines and stem cells. Additional exon array sampling finds P6 to be active in populations enriched for mesenchyme, neural stem cells and within CD133+ enriched Ewing sarcomas. The P6 promoter is enriched with respect to previously characterized PROM1 promoters for an HMGI/Y (HMGA1) family transcription factor binding site motif and exhibits different epigenetic modifications relative to the canonical promoter region of PROM1.
PROM1 protein; human; AC133 antigen; transcription start site; promoter regions; genetic; melanoma; cancer stem cells
Little is known about the mechanisms of persistent airflow obstruction that result from chronic occupational endotoxin exposure. We sought to analyze the inflammatory response underlying persistent airflow obstruction as a result of chronic occupational endotoxin exposure. We developed a murine model of daily inhaled endotoxin for periods of 5 days to 8 weeks. We analyzed physiologic lung dysfunction, lung histology, bronchoalveolar lavage fluid and total lung homogenate inflammatory cell and cytokine profiles, and pulmonary gene expression profiles. We observed an increase in airway hyperresponsiveness as a result of chronic endotoxin exposure. After 8 weeks, the mice exhibited an increase in bronchoalveolar lavage and lung neutrophils that correlated with an increase in proinflammatory cytokines. Detailed analyses of inflammatory cell subsets revealed an expansion of dendritic cells (DCs), and in particular, proinflammatory DCs, with a reduced percentage of macrophages. Gene expression profiling revealed the up-regulation of a panel of genes that was consistent with DC recruitment, and lung histology revealed an accumulation of DCs in inflammatory aggregates around the airways in 8-week–exposed animals. Repeated, low-dose LPS inhalation, which mirrors occupational exposure, resulted in airway hyperresponsiveness, associated with a failure to resolve the proinflammatory response, an inverted macrophage to DC ratio, and a significant rise in the inflammatory DC population. These findings point to a novel underlying mechanism of airflow obstruction as a result of occupational LPS exposure, and suggest molecular and cellular targets for therapeutic development.
airway resistance; inhalation; neutrophils; macrophages; dendritic cells; endotoxin
The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and rule out others.
This helps to identify the proper sequencing platform for whole genome studies with different application scopes.
Endotoxin is a near-ubiquitous environmental exposure that has been associated with both asthma and chronic obstructive pulmonary disease (COPD). These obstructive lung diseases have a complex pathophysiology, making them difficult to study comprehensively in the context of endotoxin. Genome-wide gene expression studies have been used to identify a molecular snapshot of the response to environmental exposures. Identification of differentially expressed genes shared across all published murine models of chronic inhaled endotoxin will provide insight into the biology underlying endotoxin-associated lung disease.
We identified three published murine models with gene expression profiling after repeated low-dose inhaled endotoxin. All array data from these experiments were re-analyzed, annotated consistently, and tested for shared genes found to be differentially expressed. Additional functional comparison was conducted by testing for significant enrichment of differentially expressed genes in known pathways. The importance of this gene signature in smoking-related lung disease was assessed using hierarchical clustering in an independent experiment where mice were exposed to endotoxin, smoke, and endotoxin plus smoke.
A 101-gene signature was detected in three murine models, more than expected by chance. The three model systems exhibit additional similarity beyond shared genes when compared at the pathway level, with increasing enrichment of inflammatory pathways associated with longer duration of endotoxin exposure. Genes and pathways important in both asthma and COPD were shared across all endotoxin models. Mice exposed to endotoxin, smoke, and smoke plus endotoxin were accurately classified with the endotoxin gene signature.
Despite the differences in laboratory, duration of exposure, and strain of mouse used in three experimental models of chronic inhaled endotoxin, surprising similarities in gene expression were observed. The endotoxin component of tobacco smoke may play an important role in disease development.
Comparisons of stem cell experiments at both molecular and semantic levels remain challenging due to inconsistencies in results, data formats, and descriptions among biomedical research discoveries. The Harvard Stem Cell Institute (HSCI) has created the Stem Cell Commons (stemcellcommons.org), an open, community-based approach to data sharing. Experimental information is integrated using the Investigation-Study-Assay tabular format (ISA-Tab) used by over 30 organizations (ISA Commons, isacommons.org). The early adoption of this format permitted the novel integration of three independent systems to facilitate stem cell data storage, exchange and analysis: the Blood Genomics Repository, the Stem Cell Discovery Engine, and the new Refinery platform that links the Galaxy analytical engine to data repositories.
Although ovarian cancer is often initially chemotherapy-sensitive, the vast majority of tumors eventually relapse and patients die of increasingly aggressive disease. Cancer stem cells are believed to have properties that allow them to survive therapy and may drive recurrent tumor growth. Cancer stem cells or cancer-initiating cells are a rare cell population and difficult to isolate experimentally. Genes that are expressed by stem cells may characterize a subset of less differentiated tumors and aid in prognostic classification of ovarian cancer. The purpose of this study was the genomic identification and characterization of a subtype of ovarian cancer that has stem cell-like gene expression. Using human and mouse gene signatures of embryonic, adult, or cancer stem cells, we performed an unsupervised bipartition class discovery on expression profiles from 145 serous ovarian tumors to identify a stem-like and a more differentiated subgroup. Subtypes were reproducible and were further characterized in four independent, heterogeneous ovarian cancer datasets. We identified a stem-like subtype characterized by a 51-gene signature, which is significantly enriched in tumors with properties of Type II ovarian cancer: high grade, serous tumors, and poor survival. Conversely, the differentiated tumors share properties with Type I, including lower grade and mixed histological subtypes. The stem cell-like signature was prognostic within high-stage serous ovarian cancer, classifying a small subset of high-stage tumors with better prognosis in the differentiated subtype. In multivariate models that adjusted for common clinical factors (including grade, stage, age), the subtype classification was still a significant predictor of relapse.
The prognostic stem-like gene signature yields new insights into prognostic differences in ovarian cancer, provides a genomic context for defining Type I/II subtypes, and suggests potential gene targets that, following further validation, may be valuable in the clinical management or treatment of ovarian cancer.
Background & Aims
The precise mechanisms by which IFN exerts its antiviral effect against HCV have not yet been elucidated. We sought to identify host genes that mediate the antiviral effect of IFN-α by conducting a whole-genome siRNA library screen.
High throughput screening was performed using an HCV genotype 1b replicon, pRep-Feo. Those pools with replicate robust Z scores ≥ 2.0 entered secondary validation in full-length OR6 replicon cells. Huh7.5.1 cells infected with JFH1 were then used to validate the rescue efficacy of selected genes for HCV replication under IFN-α treatment.
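The hit-selection rule above relies on robust Z scores. The abstract does not define them, so the sketch below assumes the common median/MAD form, z = (x − median) / (1.4826 × MAD), and assumes that a pool counts as a hit only when every replicate reaches the threshold. The function names and the toy readouts are hypothetical.

```python
from statistics import median

# Scaling constant that makes the MAD comparable to a standard deviation
# for normally distributed data.
MAD_TO_SD = 1.4826


def robust_z_scores(values):
    """Robust Z score of each value: (x - median) / (1.4826 * MAD)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust Z scores are undefined")
    return [(v - center) / (MAD_TO_SD * mad) for v in values]


def replicate_hits(replicates, threshold=2.0):
    """Indices of pools whose robust Z score is >= threshold in every replicate."""
    scored = [robust_z_scores(rep) for rep in replicates]
    n_pools = len(scored[0])
    return [i for i in range(n_pools) if all(z[i] >= threshold for z in scored)]


# Toy readouts for seven siRNA pools in two replicates (hypothetical numbers).
rep1 = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 3.0]
rep2 = [1.0, 0.9, 1.1, 1.0, 2.9, 1.2, 3.1]
hits = replicate_hits([rep1, rep2])
print(hits)
```

Using the median and MAD instead of the mean and standard deviation keeps a few strong hits from inflating the scale and masking themselves, which is why this form is popular in high-throughput screens.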
We identified and confirmed 93 human genes involved in the IFN-α anti-HCV effect using a whole-genome siRNA library. Gene ontology analysis revealed that mRNA processing (23 genes, P=2.756e-22), translation initiation (9 genes, P=2.42e-6), and IFN signaling (5 genes, P=1.00e-3) were the most enriched functional groups. Nine genes were components of U4/U6.U5 tri-snRNP. We confirmed that silencing squamous cell carcinoma antigen recognized by T cells (SART1), a specific factor of tri-snRNP, abrogates IFN-α's suppressive effects against HCV in both replicon cells and JFH1-infected cells. We further found that SART1 was not IFN-α inducible, and that its anti-HCV effect in the JFH1 infectious model was mediated through regulation of interferon-stimulated genes (ISGs) with or without IFN-α.
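Enrichment P values like those quoted for the functional groups are often obtained from a one-sided hypergeometric test, although the abstract does not state which test was used. The sketch below shows that calculation under this assumption; the function name is hypothetical, and the background size and category size in the example are invented placeholders, not values from the study.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` genes are sampled without replacement
    from `population` genes, of which `annotated` belong to the category."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favorable / total


# Small case that can be checked by hand: (36 + 4) / 120 = 1/3.
p_small = hypergeom_upper_tail(10, 4, 3, 2)

# Shape of a screen-sized query: 23 of 93 hits in a category of 400 genes
# out of a 20,000-gene background (400 and 20,000 are placeholders).
p_screen = hypergeom_upper_tail(20000, 400, 93, 23)
print(p_small, p_screen)
```

Exact integer binomials keep the calculation free of rounding error until the final division, at the cost of speed on very large backgrounds.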
We identified 93 genes that mediate the anti-HCV effect of IFN-α through genome-wide siRNA screening; 23 and 9 genes were involved in mRNA processing and translation initiation, respectively. These findings reveal an unexpected role for mRNA processing in generation of the antiviral state, and suggest a new avenue for therapeutic development in HCV.
Hepatitis C Virus, HCV; Interferon-α, IFN-α; Small interfering RNA, siRNA; Squamous cell carcinoma Antigen Recognized by T cells, SART1; U4/U6.U5 tri-small nuclear ribonucleoproteins, U4/U6.U5 tri-snRNP
► We describe monolayers consisting of molecules with polar repeat units. ► The global energy gap in such layers can vanish. ► In such films the dipole moment per molecule saturates beyond a certain length. ► These effects result from electrostatic dimensionality effects. ► This represents an “organic electronics” analogue to inorganic “polar surfaces”.
In conjugated organic molecules, excitation gaps typically decrease reciprocally with increasing the number of repeat units, n. This usually holds for individual molecules as well as for the corresponding bulk materials. Here, we show using density-functional theory calculations that a qualitatively different evolution is found for layers built from molecules consisting of polar repeat units. Whereas a 1/n-dependence is still observed in the case of isolated polar molecules, the global gap decreases essentially linearly with n in the corresponding 2D-periodic systems and vanishes beyond a certain molecular length, with the frontier states being localized at opposite ends of the layer. The latter is accompanied by a saturation of the dipole moment per molecule, an effect not observed in the isolated polar molecules. Interestingly, in both cases the limit of the gap for long (but finite) molecules differs qualitatively from that of infinite length obtained in 1D-periodic and 3D-periodic calculations, the latter serving as models for polymers and the bulk. We rationalize these dimensionality effects as a consequence of the potential gradient within the finite-length layers. They arise from the collective action of intra-molecular dipoles in the 2D periodic layers and can be traced back to surface effects.
Energy gap; Polar surface; Dimensionality effects; Band-structure calculations; Polar molecules; Thin-film electrostatics
To make full use of research data, the bioscience community needs to adopt technologies and reward mechanisms that support interoperability and promote the growth of an open ‘data commoning’ culture. Here we describe the prerequisites for data commoning and present an established and growing ecosystem of solutions using the shared ‘Investigation-Study-Assay’ framework to support that vision.
Background: Animal studies suggest that early-life lead exposure influences gene expression and production of proteins associated with Alzheimer’s disease (AD).
Objectives: We attempted to assess the relationship between early-life lead exposure and potential biomarkers for AD among young men and women. We also attempted to assess whether early-life lead exposure was associated with changes in expression of AD-related genes.
Methods: We used sandwich enzyme-linked immunosorbent assays (ELISA) to measure plasma concentrations of amyloid β proteins Aβ40 and Aβ42 among 55 adults who had participated as newborns and young children in a prospective cohort study of the effects of lead exposure on development. We used RNA microarray techniques to analyze gene expression.
Results: Mean plasma Aβ42 concentrations were lower among 13 participants with high umbilical cord blood lead concentrations (≥ 10 μg/dL) than in 42 participants with lower cord blood lead concentrations (p = 0.08). Among 10 participants with high prenatal lead exposure, we found evidence of an inverse relationship between umbilical cord lead concentration and expression of ADAM metallopeptidase domain 9 (ADAM9), reticulon 4 (RTN4), and low-density lipoprotein receptor-related protein associated protein 1 (LRPAP1) genes, whose products are believed to affect Aβ production and deposition. Gene network analysis suggested enrichment in gene sets involved in nerve growth and general cell development.
Conclusions: Data from our exploratory study suggest that prenatal lead exposure may influence Aβ-related biological pathways that have been implicated in AD onset. Gene network analysis identified further candidates to study the mechanisms of developmental lead neurotoxicity.
Alzheimer’s disease; children; fetal basis of adult disease; human; lead
Gene expression quantitative trait loci (eQTL) are useful for identifying single nucleotide polymorphisms (SNPs) associated with diseases. At times, a genetic variant may be associated with a master regulator involved in the manifestation of a disease. The downstream target genes of the master regulator are typically co-expressed and share biological function. Therefore, it is practical to screen for eQTLs by identifying SNPs associated with the targets of a transcript-regulator (TR). We used a multivariate regression with the gene expression of known targets of TRs and SNPs to identify TReQTLs in European (CEU) and African (YRI) HapMap populations. A nominal p-value of <1×10⁻⁶ revealed 234 SNPs in CEU and 154 in YRI as TReQTLs. These represent 36 independent (tag) SNPs in CEU and 39 in YRI affecting the downstream targets of 25 and 36 TRs respectively. At a false discovery rate (FDR) = 45%, one cis-acting tag SNP (within 1 kb of a gene) in each population was identified as a TReQTL. In CEU, the SNP (rs16858621) in Pcnxl2 was found to be associated with the genes regulated by CREM whereas in YRI, the SNP (rs16909324) was linked to the targets of miRNA hsa-miR-125a. To infer the pathways that regulate expression, we ranked TReQTLs by connectivity within the structure of biological process subtrees. One TReQTL SNP (rs3790904) in CEU maps to Lphn2 and is associated (nominal p-value = 8.1×10⁻⁷) with the targets of the X-linked breast cancer suppressor Foxp3. The structure of the biological process subtree and a gene interaction network of the TReQTL revealed that tumor necrosis factor, NF-kappaB and variants in G-protein coupled receptors signaling may play a central role as communicators in Foxp3 functional regulation. The potential pleiotropic effect of the Foxp3 TReQTLs was gleaned from integrating mRNA-Seq data and SNP-set enrichment into the analysis.
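The FDR figure above implies a multiple-testing adjustment of the nominal p-values, but the abstract does not say which procedure was applied. As one widely used possibility, the sketch below assumes the Benjamini-Hochberg step-up procedure; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four SNP-regulator tests.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
print(adjusted)
```

A test is then reported at a given FDR level when its adjusted value falls at or below that level.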
Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE)—an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at http://discovery.hsci.harvard.edu.
A simple biochemical method to isolate mRNAs pulled down with a transfected, biotinylated microRNA was used to identify direct target genes of miR-34a, a tumor suppressor gene. The method reidentified most of the known miR-34a regulated genes expressed in K562 and HCT116 cancer cell lines. Transcripts for 982 genes were enriched in the pull-down with miR-34a in both cell lines. Despite this large number, validation experiments suggested that ∼90% of the genes identified in both cell lines can be directly regulated by miR-34a. Thus miR-34a is capable of regulating hundreds of genes. The transcripts pulled down with miR-34a were highly enriched for their roles in growth factor signaling and cell cycle progression. These genes form a dense network of interacting gene products that regulate multiple signal transduction pathways that orchestrate the proliferative response to external growth stimuli. Multiple candidate miR-34a–regulated genes participate in RAS-RAF-MAPK signaling. Ectopic miR-34a expression reduced basal ERK and AKT phosphorylation and enhanced sensitivity to serum growth factor withdrawal, while cells genetically deficient in miR-34a were less sensitive. Fourteen new direct targets of miR-34a were experimentally validated, including genes that participate in growth factor signaling (ARAF and PIK3R2) as well as genes that regulate cell cycle progression at various phases of the cell cycle (cyclins D3 and G2, MCM2 and MCM5, PLK1 and SMAD4). Thus miR-34a tempers the proliferative and pro-survival effect of growth factor stimulation by interfering with growth factor signal transduction and downstream pathways required for cell division.
microRNAs (miRNAs) are small RNAs that regulate gene expression by binding to mRNAs bearing a partially complementary sequence. miRNAs decrease the stability or translation of mRNA targets, leading to reduced protein expression. Understanding the biological function of a miRNA requires identifying its targets. Here we developed a sensitive and specific biochemical method to identify candidate microRNA targets that are enriched by pull-down with a tagged, transfected microRNA mimic. The method was applied to miR-34a, a miRNA that inhibits cell proliferation. We found that miR-34a can potentially regulate hundreds of genes. Computational analysis of these genes suggested a novel function for miR-34a—suppression of the pro-proliferative response to diverse growth factors. This function complements the previously known role of miR-34a in blocking cell cycle progression. Thus, by reducing the expression of an extensive network of genes, miR-34a dampens growth factor signaling as well as its downstream consequences, promotion of cell survival and proliferation.
By combining genome-wide association data from 8,130 individuals with type 2 diabetes (T2D) and 38,987 controls of European descent and following up previously unidentified meta-analysis signals in a further 34,412 cases and 59,925 controls, we identified 12 new T2D association signals with combined P < 5 × 10⁻⁸. These include a second independent signal at the KCNQ1 locus; the first report, to our knowledge, of an X-chromosomal association (near DUSP9); and a further instance of overlap between loci implicated in monogenic and multifactorial forms of diabetes (at HNF1A). The identified loci affect both beta-cell function and insulin action, and, overall, T2D association signals show evidence of enrichment for genes involved in cell cycle regulation. We also show that a high proportion of T2D susceptibility loci harbor independent association signals influencing apparently unrelated complex traits.
Because they play important roles in gene regulation, microRNAs (miRNAs) are
believed to be indispensable participants in the pathogenesis of myocardial
infarction (MI), which causes significant morbidity and mortality. Working on
the hypothesis that modulating only a few key members of the miRNA
superfamily could benefit the ischemic heart, we proposed a microarray-based
network biology approach to identify them, using the recognized clinical
effect of propranolol as a prompt.
A long-term rat model of MI was established in this study. Microarray
technology was applied to determine the global change in miRNA expression
induced by propranolol. Multiple network analyses were applied sequentially
to evaluate the regulatory capacity, efficiency and emphasis of the miRNAs
whose dysregulated expression in MI was significantly reversed by
propranolol.
Microarray data analysis indicated that long-term propranolol administration
reversed the expression of 18 of the 31 miRNAs dysregulated in MI, implying
that intentional modulation of miRNA expression might benefit the ischemic
heart. Our network analysis identified miR-1, miR-29b and miR-98 as the
prime players in MI among these miRNAs. Further analysis revealed that miR-1
focused on the regulation of myocyte growth, whereas miR-29b and miR-98
focused on fibrosis and inflammation, respectively.
Our study illustrates how a combination of microarray technology and
functional protein network analysis can be used to identify disease-related
miRNAs.
A series of π-extended phosphorescent palladium(II) and platinum(II) porphyrin complexes were synthesized, in which additional benzene rings are fused radially onto at least one of the four peripheral benzo groups. The photophysical properties of the metalloporphyrins palladium(II)-meso-tetra-(4-fluorophenyl)mononaphthotribenzoporphyrin (Pd1NF), cis-palladium(II)-meso-tetra-(4-fluorophenyl)dibenzodinaphthoporphyrin (Pd2NF), and palladium(II)-meso-tetra-(4-fluorophenyl)monobenzotrinaphthoporphyrin (Pd3NF) and the corresponding platinum(II) compounds (Pt1NF, cis-Pt2NF, Pt3NF) were investigated. The compounds under investigation absorb intensively in the near-infrared region (628−691 nm) and emit at room temperature at 815−882 nm. Phosphorescence quantum yields of the platinum(II) porphyrins range from 25 to 53% with luminescence decay times of 21 to 44 μs in deoxygenated toluene solutions at room temperature. The corresponding palladium(II) complexes exhibit quantum yields in the range of 7 to 18% with lifetimes of 106 to 206 μs. Density functional theory (DFT) calculations revealed nonplanar geometries for all complexes and corroborate the absorption characteristics. The subsequent π extension of the porphyrin system leads to near-infrared absorbing oxygen indicators with tailor-made luminescence properties as well as tunable oxygen sensitivity.
The synthesis and characterization of novel “hybrid” porphyrin complexes is presented. Photophysical properties, such as absorption in the near-infrared, emission maxima, quantum yields, and lifetimes, are determined by the conjugation length of the porphyrin system, which leads to “tailor-made” NIR indicators for optical oxygen detection.
Summary: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) helps users adopt community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories.
Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org
Computational modeling is used to describe the mechanisms governing energy level alignment between an organic semiconductor (OSC) and a metal covered by various self-assembled monolayers (SAMs). In particular, we address the question to what extent and under what circumstances SAM-induced work-function modifications lead to an actual change of the barriers for electron and hole injection from the metal into the OSC layer. Depending on the nature of the SAM, we observe clear transitions between Fermi level pinning and vacuum-level alignment regimes. Surprisingly, although in most cases the pinning occurs only when the metal is present, it is not related to charge transfer between the electrode and the organic layer. Instead, charge rearrangements at the interface between the SAM and the OSC are observed, accompanied by a polarization of the SAM.
computational modeling; density functional theory; electronic structure/processes/mechanisms; monolayers; organic electronics; molecular electronics; self-assembly; metal/organic interfaces
With the arrival of the postgenomic era, there is increasing interest in the discovery of biomarkers for the accurate diagnosis, prognosis, and early detection of cancer. Blood-borne cancer markers are favored by clinicians, because blood samples can be obtained and analyzed with relative ease. We have used a combined mining strategy based on an integrated cancer microarray platform, Oncomine, and the biomarker module of the Ingenuity Pathways Analysis (IPA) program to identify potential blood-based markers for six common human cancer types.
In the Oncomine platform, the genes overexpressed in cancer tissues relative to their corresponding normal tissues were filtered by Gene Ontology keywords, with the extracellular environment stipulated and a corrected Q value (false discovery rate) cut-off implemented. The identified genes were imported to the IPA biomarker module to separate out those genes encoding putative secreted or cell-surface proteins as blood-borne (blood/serum/plasma) cancer markers. The filtered potential indicators were ranked and prioritized according to normalized absolute Student t values. The retrieval of numerous marker genes that are already clinically useful or under active investigation confirmed the effectiveness of our mining strategy. To identify the biomarkers that are unique for each cancer type, the upregulated marker genes that are in common between each two tumor types across the six human tumors were also analyzed by the IPA biomarker comparison function.
The upregulated marker genes shared among the six cancer types may serve as a molecular tool to complement histopathologic examination, and the combination of the commonly upregulated and unique biomarkers may serve as differentiating markers for a specific cancer. This approach will be increasingly useful to discover diagnostic signatures as the mass of microarray data continues to grow in the ‘omics’ era.