The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and rule out others.
This helps to identify the proper sequencing platform for whole genome studies with different application scopes.
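The GC-bias comparison described in this abstract can be illustrated with a minimal sketch. It assumes the genome has already been cut into windows and that a mean read depth is available for each window. The function names `gc_fraction` and `coverage_by_gc` and the choice of ten GC bins are hypothetical and are not taken from the study.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def coverage_by_gc(windows, n_bins=10):
    """Average the read depth of genomic windows grouped by GC content.

    windows: iterable of (sequence, mean_depth) pairs (assumed input format).
    Returns {gc_bin_index: mean depth}, with bin 0 the most GC-poor.
    """
    sums = {}
    for seq, depth in windows:
        # A GC fraction of exactly 1.0 is placed in the last bin.
        gc_bin = min(int(gc_fraction(seq) * n_bins), n_bins - 1)
        total, count = sums.get(gc_bin, (0.0, 0))
        sums[gc_bin] = (total + depth, count + 1)
    return {b: total / count for b, (total, count) in sorted(sums.items())}


if __name__ == "__main__":
    toy_windows = [("AAAA", 30), ("GGCC", 10), ("ATAT", 20)]
    print(coverage_by_gc(toy_windows))
```

A platform without GC bias would show roughly the same mean depth in every bin; a drop in the high-GC bins would correspond to the kind of bias reported above for GC-rich regions.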
Malignant melanoma is a highly aggressive type of malignancy with considerable metastatic potential and frequent resistance to cytotoxic agents. Mutant BRAF protein was recently recognized as a therapeutic target in metastatic melanoma. We present a newly developed U-BRAFV600 approach, a universal pyrosequencing-based assay for mutation detection within the activation segment in exon 15 of human BRAF. We identified 5 different BRAF mutations in a single assay analyzing 75 different formalin-fixed paraffin-embedded (FFPE) samples of cutaneous melanoma metastases from 29 patients. We found BRAF mutations in 21 of 29 metastases. All mutant variants were quantitatively detectable by the newly developed U-BRAFV600 assay. These results were confirmed by ultra-deep-sequencing validation (∼60,000-fold coverage). In contrast to all other BRAF state detection methods, the U-BRAFV600 assay is capable of automated quantitative identification of at least 36 previously published BRAF mutations. Provided that at least 3% of cells are mutated against a background of wild-type cells, the U-BRAFV600 assay design completely excludes false wild-type results. The corresponding algorithm for classification of BRAF-mutated variants is provided. The single-reaction assay and data analysis automation make our approach suitable for the assessment of large clinical sample sizes. Therefore, we suggest the U-BRAFV600 assay as a most powerful sequencing-based diagnostic tool to automatically identify BRAF state as a prerequisite to targeted therapy.
Genomic rearrangements are thought to occur progressively during tumor development. Recent findings, however, suggest an alternative mechanism, involving massive chromosome rearrangements in a one-step catastrophic event termed chromothripsis. We report the whole-genome sequencing-based analysis of a Sonic-Hedgehog medulloblastoma (SHH-MB) brain tumor from a patient with a germline TP53 mutation (Li-Fraumeni syndrome), uncovering massive, complex chromosome rearrangements. Integrating TP53 status with microarray and deep sequencing-based DNA rearrangement data in additional patients reveals a striking association between TP53 mutation and chromothripsis in SHH-MBs. Analysis of additional tumor entities substantiates a link between TP53 mutation and chromothripsis, and indicates a context-specific role for p53 in catastrophic DNA rearrangements. Among these, we observed a strong association between somatic TP53 mutations and chromothripsis in acute myeloid leukemia. These findings connect p53 status and chromothripsis in specific tumor types, providing a genetic basis for understanding particularly aggressive subtypes of cancer.
Amyotrophic lateral sclerosis (ALS) is a fatal disorder of the motor neuron system with poor prognosis and marginal therapeutic options. Current clinical diagnostic criteria are based on electrophysiological examination and exclusion of other ALS-mimicking conditions. Neuroprotective treatments are, however, most promising in early disease stages. Identification of disease-specific CSF biomarkers and associated biochemical pathways is therefore most relevant to monitor disease progression, response to neuroprotective agents and to enable early inclusion of patients into clinical trials.
Methods and Findings
CSF from 35 patients with ALS diagnosed according to the revised El Escorial criteria and 23 age-matched controls was processed using paramagnetic bead chromatography for protein isolation and subsequently analyzed by MALDI-TOF mass spectrometry. CSF protein profiles were integrated into a Random Forest model constructed from 153 mass peaks. After reducing this peak set to the top 25%, a classifier was built which enabled prediction of ALS with high accuracy, sensitivity and specificity. Further analysis of the identified peptides resulted in a panel of five highly sensitive ALS biomarkers. Upregulation of secreted phosphoprotein 1 in ALS-CSF samples was confirmed by univariate analysis of ELISA and mass spectrometry data. Further quantitative validation of the five biomarkers was achieved in an 80-plex Multiple Reaction Monitoring mass spectrometry assay.
ALS classification based on the CSF biomarker panel proposed in this study could become a valuable predictive tool for early clinical risk stratification. Of the numerous CSF proteins identified, many have putative roles in ALS-related metabolic processes, particularly in chromogranin-mediated secretion signaling pathways. While a stand-alone clinical application of this classifier will only be possible after further validation and a multicenter trial, it could be readily used to complement current ALS diagnostics and might also provide new insights into the pathomechanisms of this disease in the future.
Mitochondria exist as a network of interconnected organelles undergoing constant fission and fusion. Current approaches to study mitochondrial morphology are limited by low data sampling coupled with manual identification and classification of complex morphological phenotypes. Here we propose an integrated mechanistic and data-driven modeling approach to analyze heterogeneous, quantified datasets and infer relations between mitochondrial morphology and apoptotic events. We initially performed high-content, multi-parametric measurements of mitochondrial morphological, apoptotic, and energetic states by high-resolution imaging of human breast carcinoma MCF-7 cells. Subsequently, decision tree-based analysis was used to automatically classify networked, fragmented, and swollen mitochondrial subpopulations, at the single-cell level and within cell populations. Our results revealed subtle but significant differences in morphology class distributions in response to various apoptotic stimuli. Furthermore, key mitochondrial functional parameters including mitochondrial membrane potential and Bax activation, were measured under matched conditions. Data-driven fuzzy logic modeling was used to explore the non-linear relationships between mitochondrial morphology and apoptotic signaling, combining morphological and functional data as a single model. Modeling results are in accordance with previous studies, where Bax regulates mitochondrial fragmentation, and mitochondrial morphology influences mitochondrial membrane potential. In summary, we established and validated a platform for mitochondrial morphological and functional analysis that can be readily extended with additional datasets. We further discuss the benefits of a flexible systematic approach for elucidating specific and general relationships between mitochondrial morphology and apoptosis.
Autoimmune pancreatitis (AIP) is thought to be an immune-mediated inflammatory process, directed against the epithelial components of the pancreas.
In order to explore key targets of the inflammatory process we analysed the expression of proteins at the RNA and protein level using genomics and proteomics, immunohistochemistry, Western blot and immunoassay. An animal model of AIP with LP-BM5 murine leukemia virus infected mice was studied in parallel. RNA microarrays of pancreatic tissue from 12 patients with AIP were compared to those of 8 patients with non-AIP chronic pancreatitis (CP).
Expression profiling revealed 272 upregulated genes, including those encoding immunoglobulins, chemokines and their receptors, and 86 downregulated genes, including those for pancreatic proteases such as three trypsinogen isoforms. Protein profiling showed that the expression of trypsinogens and other pancreatic enzymes was greatly reduced. Immunohistochemistry demonstrated a near-loss of trypsin-positive acinar cells, which was also confirmed by Western blotting. The serum of AIP patients contained high titres of autoantibodies against the trypsinogens PRSS1 and PRSS2, but not against PRSS3. In addition, there were autoantibodies against the trypsin inhibitor PSTI (the product of the SPINK1 gene). In the pancreas of AIP animals we found similar protein patterns and a reduction in trypsinogen.
These data indicate that the immune-mediated process characterizing AIP involves pancreatic acinar cells and their secretory enzymes such as trypsin isoforms. Demonstration of trypsinogen autoantibodies may be helpful for the diagnosis of AIP.
autoimmune pancreatitis; chronic pancreatitis; trypsinogen; proteomics; transcriptomics; autoantibody
The β-amyloid precursor protein (APP) and the related β-amyloid precursor-like proteins (APLPs) undergo complex proteolytic processing giving rise to several fragments. Whereas it is well established that Aβ accumulation is a central trigger for Alzheimer's disease, the physiological role of APP family members and their diverse proteolytic products is still largely unknown. The secreted APPsα ectodomain has been shown to be involved in neuroprotection and synaptic plasticity. The γ-secretase-generated APP intracellular domain (AICD) functions as a transcriptional regulator in heterologous reporter assays although its role for endogenous gene regulation has remained controversial.
To gain further insight into the molecular changes associated with knockout phenotypes and to elucidate the physiological functions of APP family members including their proposed role as transcriptional regulators, we performed DNA microarray transcriptome profiling of prefrontal cortex of adult wild-type (WT), APP knockout (APP-/-), APLP2 knockout (APLP2-/-) and APPsα knockin mice (APPα/α) expressing solely the secreted APPsα ectodomain. Biological pathways affected by the lack of APP family members included neurogenesis, transcription, and kinase activity. Comparative analysis of transcriptome changes between mutant and wild-type mice, followed by qPCR validation, identified co-regulated gene sets. Interestingly, these included heat shock proteins and plasticity-related genes that were both down-regulated in knockout cortices. In contrast, we failed to detect significant differences in expression of previously proposed AICD target genes including Bace1, Kai1, Gsk3b, p53, Tip60, and Vglut2. Only Egfr was slightly up-regulated in APLP2-/- mice. Comparison of APP-/- and APPα/α with wild-type mice revealed a high proportion of co-regulated genes indicating an important role of the C-terminus for cellular signaling. Finally, comparison of APLP2-/- on different genetic backgrounds revealed that background-related transcriptome changes may dominate over changes due to the knockout of a single gene.
Shared transcriptome profiles corroborated closely related physiological functions of APP family members in the adult central nervous system. As expression of proposed AICD target genes was not altered in adult cortex, this may indicate that these genes are not affected by lack of APP under resting conditions or only in a small subset of cells.
Gastrointestinal stromal tumors (GIST) represent the most common mesenchymal tumors of the gastrointestinal tract. About 85% carry an activating mutation in the KIT or PDGFRA gene. Approximately 10% of GIST are so-called wild type GIST (wt-GIST) without mutations in the hot spots. In the present study we evaluated appropriate reference genes for the expression analysis of formalin-fixed, paraffin-embedded and fresh frozen samples from gastrointestinal stromal tumors. We evaluated the gene expression of KIT as well as of the alternative receptor tyrosine kinase genes FLT3, CSF1R, PDGFRB, AXL and MET by qPCR. wt-GIST were compared to samples with mutations in KIT exon 9 and 11 and PDGFRA exon 18 in order to evaluate whether overexpression of these alternative RTKs might contribute to the pathogenesis of wt-GIST.
Gene expression variability of the pooled cDNA samples is much lower than that of the single reverse transcription cDNA synthesis. By combining the lowest variability values of fixed and fresh tissue, the genes POLR2A, PPIA, RPLP0 and TFRC were chosen for further analysis of the GIST samples. Overexpression of KIT compared to the corresponding normal tissue was detected in each GIST subgroup except in GIST with PDGFRA exon 18 mutation. Comparing our sample groups, no significant differences in the gene expression levels of FLT3, CSF1R and AXL were determined. An exception was the sample group with KIT exon 9 mutation, in which a significantly reduced expression of CSF1R, FLT3 and PDGFRB compared to the normal tissue was detected. GIST with mutations in KIT exon 9 and 11 and in PDGFRA exon 18 showed a significant PDGFRB downregulation.
As the variability of expression levels for the reference genes is very high when comparing fresh frozen and formalin-fixed tissue, there is a strong need for validation in each tissue type. None of the alternative receptor tyrosine kinases analyzed is associated with the pathogenesis of wild-type or mutated GIST. It remains to be clarified whether an autocrine or paracrine mechanism by overexpression of receptor tyrosine kinase ligands is responsible for the tumorigenesis of wt-GIST.
In the past, molecular mechanisms that drive the initiation of an inflammatory response have been studied intensively. However, corresponding mechanisms that sustain the expression of inflammatory response genes and hence contribute to the establishment of chronic disorders remain poorly understood. Recently, we provided genetic evidence that signaling via the receptor for advanced glycation end products (Rage) drives the strength and maintenance of an inflammatory reaction. In order to decipher the mode of Rage function on gene transcription levels during inflammation, we applied global gene expression profiling on time-resolved samples of mouse back skin, which had been treated with the phorbol ester TPA, a potent inducer of skin inflammation.
Ranking of TPA-regulated genes according to their time average mean and peak expression and superimposition of data sets from wild-type (wt) and Rage-deficient mice revealed that Rage signaling is not essential for initial changes in TPA-induced transcription, but absolutely required for sustained alterations in transcript levels. Next, we used a data set of differentially expressed genes between TPA-treated wt and Rage-deficient skin and performed computational analysis of their proximal promoter regions. We found a highly significant enrichment for several transcription factor binding sites (TFBS) leading to the prediction that corresponding transcription factors, such as Sp1, Tcfap2, E2f, Myc and Egr, are regulated by Rage signaling. Accordingly, we could confirm aberrant expression and regulation of members of the E2f protein family in epidermal keratinocytes of Rage-deficient mice.
In summary, our data support the model that engagement of Rage converts a transient cellular stimulation into sustained cellular dysfunction and highlight a novel role of the Rb-E2f pathway in Rage-dependent inflammation during pathological conditions.
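The abstract above reports an enrichment of transcription factor binding sites in the promoters of differentially expressed genes but does not name the statistical test. As an illustration only, the sketch below assumes a one-sided hypergeometric test, which is a common choice for this kind of question. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_background_with_site,
                           n_selected, n_selected_with_site):
    """One-sided hypergeometric p-value, P(X >= n_selected_with_site).

    n_background: number of promoters in the background set
    n_background_with_site: background promoters containing the binding site
    n_selected: promoters of differentially expressed genes
    n_selected_with_site: selected promoters containing the binding site
    """
    total = comb(n_background, n_selected)
    upper = min(n_background_with_site, n_selected)
    tail = sum(
        comb(n_background_with_site, k)
        * comb(n_background - n_background_with_site, n_selected - k)
        for k in range(n_selected_with_site, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 promoters, 5 carry the site, 5 are selected, all 5 carry it.
    print(hypergeom_enrichment_p(10, 5, 5, 5))
```

In practice, one p-value would be computed per binding-site motif and then corrected for multiple testing.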
Normalization of microarrays is a standard practice to account for and minimize effects which are not due to the controlled factors in an experiment. There is an overwhelming number of different methods that can be applied, none of which is ideally suited for all experimental designs. Thus, it is important to identify a normalization method appropriate for the experimental setup under consideration that is neither too negligent nor too stringent. The major aim is to derive optimal results from the underlying experiment. Comparisons of different normalization methods have already been conducted, none of which, to our knowledge, compared more than a handful of methods.
In the present study, 25 different ways of pre-processing Illumina Sentrix BeadChip array data are compared. Among others, methods provided by the BeadStudio software are taken into account. Looking at different statistical measures, we point out the ideal versus the actual observations. Additionally, we compare qRT-PCR measurements of transcripts from different ranges of expression intensities to the respective normalized values of the microarray data. Taking all the different kinds of measures together, the ideal method for our dataset is identified.
Pre-processing of microarray gene expression experiments has been shown to influence further downstream analysis to a great extent and thus has to be carefully chosen based on the design of the experiment. This study provides a recommendation for deciding which normalization method is best suited for a particular experimental setup.
Theme-driven cancer survival studies address whether the expression signature of genes related to a biological process can predict patient survival time. Although this should ideally be achieved by testing two separate null hypotheses, current methods treat both hypotheses as one. The first test should assess whether a geneset, independent of its composition, is associated with prognosis (frequently done with a survival test). The second test then verifies whether the theme of the geneset is relevant (usually done with an empirical test that compares the geneset of interest with random genesets). Current methods do not test this second null hypothesis because it has been assumed that the distribution of p-values for random genesets (when tested against the first null hypothesis) is uniform. Here we demonstrate that such an assumption is generally incorrect and consequently, such methods may erroneously associate the biology of a particular geneset with cancer prognosis.
To assess the impact of non-uniform distributions for random genesets in such studies, an automated theme-driven method was developed. This method empirically approximates the p-value distribution of sets of unrelated genes based on a permutation approach, and tests whether predefined sets of biologically-related genes are associated with survival. The results from a comparison with a published theme-driven approach revealed non-uniform distributions, suggesting that a significant problem exists with false positive rates in the original study. When applied to two public cancer datasets, our technique revealed novel ontological categories with prognostic power, including significant correlations of "fatty acid metabolism" with overall survival in breast cancer, as well as of "receptor mediated endocytosis", "brain development", "apical plasma membrane" and "MAPK signaling pathway" with overall survival in lung cancer.
Current methods of theme-driven survival studies assume uniformity of p-values for random genesets, which can lead to false conclusions. Our approach provides a method to correct for this pitfall, and provides a novel route to identifying higher-level biological themes and pathways with prognostic power in clinical microarray datasets.
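The second test described in this abstract, comparing a geneset of interest with random genesets, can be sketched as follows. This is a simplified illustration under stated assumptions and not the published method. The per-gene association scores are assumed to be precomputed, `geneset_score` is a hypothetical stand-in for the survival test, and the number of random draws is arbitrary.

```python
import random


def geneset_score(gene_scores, geneset):
    """Stand-in for the survival test: mean per-gene association score."""
    return sum(gene_scores[g] for g in geneset) / len(geneset)


def empirical_p_value(gene_scores, geneset, n_random=1000, seed=0):
    """Compare a geneset with random genesets of the same size.

    Returns the fraction of random genesets scoring at least as high as the
    geneset of interest, with a +1 correction so the estimate is never zero.
    """
    rng = random.Random(seed)
    all_genes = sorted(gene_scores)
    observed = geneset_score(gene_scores, geneset)
    hits = 0
    for _ in range(n_random):
        random_set = rng.sample(all_genes, len(geneset))
        if geneset_score(gene_scores, random_set) >= observed:
            hits += 1
    return (hits + 1) / (n_random + 1)


if __name__ == "__main__":
    # Toy data: 100 genes whose score simply equals their index.
    scores = {f"g{i}": i for i in range(100)}
    top_set = [f"g{i}" for i in range(95, 100)]
    print(empirical_p_value(scores, top_set))
```

Because the reference distribution is built from the data itself, no assumption of uniform p-values for random genesets is needed.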
Human herpesvirus 8 (HHV-8) is the etiologic agent of Kaposi's sarcoma and primary effusion lymphoma. Activation of the cellular transcription factor nuclear factor-kappa B (NF-κB) is essential for latent persistence of HHV-8, survival of HHV-8-infected cells, and disease progression. We used reverse-transfected cell microarrays (RTCM) as an unbiased systems biology approach to systematically analyze the effects of HHV-8 genes on the NF-κB signaling pathway. All HHV-8 genes individually (n = 86) and, additionally, all K and latent genes in pairwise combinations (n = 231) were investigated. Statistical analyses of more than 14,000 transfections identified ORF75 as a novel and confirmed K13 as a known HHV-8 activator of NF-κB. K13 and ORF75 showed cooperative NF-κB activation. Small interfering RNA-mediated knockdown of ORF75 expression demonstrated that this gene contributes significantly to NF-κB activation in HHV-8-infected cells. Furthermore, our approach confirmed K10.5 as an NF-κB inhibitor and newly identified K1 as an inhibitor of both K13- and ORF75-mediated NF-κB activation. All results obtained with RTCM were confirmed with classical transfection experiments. Our work describes the first successful application of RTCM for the systematic analysis of pathofunctions of genes of an infectious agent. With this approach, ORF75 and K1 were identified as novel HHV-8 regulatory molecules on the NF-κB signal transduction pathway. The genes identified may be involved in fine-tuning of the balance between latency and lytic replication, since this depends critically on the state of NF-κB activity.
MicroRNAs (miRNAs) play key roles in mammalian gene expression and several cellular processes, including differentiation, development, apoptosis and cancer pathomechanisms. Recently the biological importance of primary cilia has been recognized in a number of human genetic diseases. Numerous disorders are related to cilia dysfunction, including polycystic kidney disease (PKD). Although involvement of certain genes and transcriptional networks in PKD development has been shown, little is known about how they are regulated at the molecular level.
Given the emerging role of miRNAs in gene expression, we explored the possibilities of miRNA-based regulation in PKD. Here, we analyzed the simultaneous expression changes of miRNAs and mRNAs by microarrays. 935 genes, classified into 24 functional categories, were differentially regulated between PKD and control animals. In parallel, 30 miRNAs were differentially regulated in PKD rats: our results suggest that several miRNAs might be involved in regulating genetic switches in PKD. Furthermore, we describe some miRNAs newly detected in the kidney, miR-31 and miR-217, which have not been reported previously. We determine functionally related gene sets, or pathways, to reveal the functional correlation between differentially expressed mRNAs and miRNAs.
We find that the functional patterns of predicted miRNA targets and differentially expressed mRNAs are similar. Our results suggest an important role of miRNAs in specific pathways underlying PKD.
Differences in MYCN/c-MYC target gene expression are associated with distinct neuroblastoma subtypes and clinical outcome.
Amplified MYCN oncogene resulting in deregulated MYCN transcriptional activity is observed in 20% of neuroblastomas and identifies a highly aggressive subtype. In MYCN single-copy neuroblastomas, elevated MYCN mRNA and protein levels are paradoxically associated with a more favorable clinical phenotype, including disseminated tumors that subsequently regress spontaneously (stage 4s-non-amplified). In this study, we asked whether distinct transcriptional MYCN or c-MYC activities are associated with specific neuroblastoma phenotypes.
We defined a core set of direct MYCN/c-MYC target genes by applying gene expression profiling and chromatin immunoprecipitation (ChIP, ChIP-chip) in neuroblastoma cells that allow conditional regulation of MYCN and c-MYC. Their transcript levels were analyzed in 251 primary neuroblastomas. Compared to localized-non-amplified neuroblastomas, MYCN/c-MYC target gene expression gradually increases from stage 4s-non-amplified through stage 4-non-amplified to MYCN amplified tumors. This was associated with MYCN activation in stage 4s-non-amplified and predominantly c-MYC activation in stage 4-non-amplified tumors. A defined set of MYCN/c-MYC target genes was induced in stage 4-non-amplified but not in stage 4s-non-amplified neuroblastomas. In line with this, high expression of a subset of MYCN/c-MYC target genes identifies a patient subtype with poor overall survival independent of the established risk markers amplified MYCN, disease stage, and age at diagnosis.
High MYCN/c-MYC target gene expression is a hallmark of malignant neuroblastoma progression, which is predominantly driven by c-MYC in stage 4-non-amplified tumors. In contrast, moderate MYCN function gain in stage 4s-non-amplified tumors induces only a restricted set of target genes that is still compatible with spontaneous regression.
The opportunistic food-borne gram-positive pathogen Listeria monocytogenes can exist as a free-living microorganism in the environment and grow in the cytoplasm of vertebrate and invertebrate cells following infection. The general stress response, controlled by the alternative sigma factor, σB, has an important role for bacterial survival both in the environment and during infection. We used quantitative real-time PCR analysis and immuno-blot analysis to examine σB expression during growth of L. monocytogenes EGD-e. Whole genome-based transcriptional profiling was used to identify σB-dependent genes at different growth phases.
We detected 105 σB-positively regulated genes and 111 genes which appeared to be under negative control of σB and validated 36 σB-positively regulated genes in vivo using a reporter gene fusion system.
Genes comprising the σB regulon encode solute transporters, novel cell-wall proteins, universal stress proteins, transcriptional regulators and include those involved in osmoregulation, carbon metabolism, ribosome- and envelope-function, as well as virulence and niche-specific survival genes such as those involved in bile resistance and exclusion. Ten of the σB-positively regulated genes of L. monocytogenes are absent in L. innocua. A total of 75 σB-positively regulated listerial genes had homologs in B. subtilis, but only 33 have been previously described as being σB-regulated in B. subtilis even though both species share a highly conserved σB-dependent consensus sequence. The low overlap of genes may reflect adaptation of these bacteria to their respective environmental conditions.
The most fatal and prevalent form of malaria is caused by the bloodborne pathogen Plasmodium falciparum (henceforth P.f). Annually, approximately three million people die of malaria. Despite the devastating global effect of P.f, the vast majority of its proteins have not been characterized experimentally. In this work, we provide computational insights that explore the modalities of regulation for some important groups of P.f genes, namely components of the glycolytic pathway and those involved in apicoplast metabolism. Glycolysis is a crucial pathway in the maintenance of the parasite, while the recently discovered apicoplast contains a range of metabolic pathways and housekeeping processes that differ radically from those of the host, which makes it an ideal target for drug therapy.
We have been able to validate some of our findings against the available literature and therefore provide a basis for theoretical insight into the regulation of some genes that has not been characterized experimentally.
Neuroblastoma patients show heterogeneous clinical courses ranging from life-threatening progression to spontaneous regression. Recently, gene expression profiles of neuroblastoma tumours were associated with clinically different phenotypes. However, such data is still rare for important patient subgroups, such as patients with MYCN non-amplified advanced stage disease. Prediction of the individual course of disease and optimal therapy selection in this cohort is challenging. Additional research effort is needed to describe the patterns of gene expression in this cohort and to identify reliable prognostic markers for this subset of patients.
We combined gene expression data from two studies in a meta-analysis in order to investigate differences in gene expression of advanced stage (3 or 4) tumours without MYCN amplification that show contrasting outcomes (alive or dead) at five years after initial diagnosis. In addition, a predictive model for outcome was generated. Gene expression profiles from 66 patients were included from two studies using different microarray platforms.
In the combined data set, 72 genes were identified as differentially expressed by meta-analysis at a false discovery rate (FDR) of 8.33%. Meta-analysis detected 34 differentially expressed genes that were not found as significant in either single study. Outcome prediction based on data of both studies resulted in a predictive accuracy of 77%. Moreover, the genes that were differentially expressed in subgroups of advanced stage patients without MYCN amplification accurately separated MYCN amplified tumours from low stage tumours without MYCN amplification.
Our findings support the hypothesis that neuroblastoma consists of two biologically distinct subgroups that differ by characteristic gene expression patterns, which are associated with divergent clinical outcome.
The role of central tolerance induction has recently been revised after the discovery of promiscuous expression of tissue-restricted self-antigens in the thymus. The extent of tissue representation afforded by this mechanism and its cellular and molecular regulation are barely defined. Here we show that medullary thymic epithelial cells (mTECs) are specialized to express a highly diverse set of genes representing essentially all tissues of the body. Most, but not all, of these genes are induced in functionally mature CD80hi mTECs. Although the autoimmune regulator (Aire) is responsible for inducing a large portion of this gene pool, numerous tissue-restricted genes are also up-regulated in mature mTECs in the absence of Aire. Promiscuously expressed genes tend to colocalize in clusters in the genome. Analysis of a particular gene locus revealed expression of clustered genes to be contiguous within such a cluster and to encompass both Aire-dependent and -independent genes. A role for epigenetic regulation is furthermore implied by the selective loss of imprinting of the insulin-like growth factor 2 gene in mTECs. Our data document a remarkable cellular and molecular specialization of the thymic stroma in order to mimic the transcriptome of multiple peripheral tissues and, thus, maximize the scope of central self-tolerance.
MicroRNAs (miRNAs) constitute a recently discovered class of small non-coding RNAs that regulate expression of target genes either by decreasing the stability of the target mRNA or by translational inhibition. They are involved in diverse processes, including cellular differentiation, proliferation and apoptosis. Recent evidence also suggests their importance for carcinogenesis. By far the most important model systems in cancer research are mammalian organisms. Thus, we decided to compile comprehensive information on mammalian miRNAs, their origin and regulated target genes in an exhaustive, curated database called Argonaute. Argonaute collects the latest information from both literature and other databases. In contrast to current databases on miRNAs like miRBase::Sequences, NONCODE or RNAdb, Argonaute hosts additional information on the origin of an miRNA, i.e. in which host gene it is encoded, its expression in different tissues and its known or proposed function, its potential target genes including Gene Ontology annotation, as well as miRNA families and proteins known to be involved in miRNA processing. Additionally, target genes are linked to an information retrieval system that provides comprehensive information from sequence databases and a simultaneous search of MEDLINE with all synonyms of a given gene. The web interface allows the user to get information for a single or multiple miRNAs, either selected or uploaded through a text file. Argonaute currently has information on 839 miRNAs from human, mouse and rat.
The extensive use of DNA microarray technology in the characterization of the cell transcriptome is leading to an ever-increasing amount of microarray data from cancer studies. Although these studies address similar questions for the same type of cancer, a comparative analysis of their results is hampered by the use of heterogeneous microarray platforms and analysis methods.
In contrast to a meta-analysis approach where results of different studies are combined on an interpretative level, we investigate here how to directly integrate raw microarray data from different studies for the purpose of supervised classification analysis. We use median rank scores and quantile discretization to derive numerically comparable measures of gene expression from different platforms. These transformed data are then used for training of classifiers based on support vector machines. We apply this approach to six publicly available cancer microarray gene expression data sets, which consist of three pairs of studies, each examining the same type of cancer, i.e. breast cancer, prostate cancer or acute myeloid leukemia. For each pair, one study was performed by means of cDNA microarrays and the other by means of oligonucleotide microarrays. In each pair, high classification accuracies (>85%) were achieved with training and testing on data instances randomly chosen from both data sets in a cross-validation analysis. To exemplify the potential of this cross-platform classification analysis, we use two leukemia microarray data sets to show that an integrated analysis selects genes important to the biology of leukemia that are missed when either data set is analyzed alone.
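The two transformations named above can be pictured with the following minimal sketch. It is not the authors' implementation: the function names, the default of eight bins, and the reading of median rank scores as "replace each value by the median of the same-rank values in a reference data set" are assumptions made for illustration only. The point it shows is that both transforms depend only on the rank order of genes within a sample, so measurements on different scales become directly comparable.

```python
from statistics import median


def ranks(values):
    """Return the 0-based rank of each value within one sample (stable for ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def quantile_discretize(sample, n_bins=8):
    """Assumed form of quantile discretization: split the genes of one
    sample into n_bins equally sized rank groups (n_bins=8 is hypothetical)."""
    n = len(sample)
    return [rank * n_bins // n for rank in ranks(sample)]


def median_rank_scores(sample, reference_samples):
    """Assumed form of median rank scores: build a reference profile whose
    r-th entry is the median of the r-th smallest value across the reference
    samples, then give each gene the profile value at its own rank."""
    sorted_refs = [sorted(s) for s in reference_samples]
    profile = [median(column) for column in zip(*sorted_refs)]
    return [profile[rank] for rank in ranks(sample)]


# Hypothetical toy data: the same eight genes measured on two platforms.
cdna_log_ratios = [0.1, -1.2, 2.3, 0.5, -0.3, 1.1, -2.0, 0.9]
oligo_intensities = [520.0, 80.0, 9100.0, 1300.0, 210.0, 4000.0, 35.0, 2500.0]

# Same gene ordering on both platforms gives identical discretized profiles.
bins_cdna = quantile_discretize(cdna_log_ratios, n_bins=4)
bins_oligo = quantile_discretize(oligo_intensities, n_bins=4)

# Map a three-gene sample onto the scale of a small reference data set.
reference = [[1.0, 3.0, 2.0], [2.0, 6.0, 4.0], [3.0, 9.0, 6.0]]
scores = median_rank_scores([500.0, 20.0, 90.0], reference)
```

In this toy case the cDNA log ratios and the oligonucleotide intensities share the same gene ordering, so their discretized vectors coincide even though the raw scales differ by orders of magnitude. Vectors of this kind could then be passed to any support vector machine implementation for training.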
Cross-platform classification of multiple cancer microarray data sets yields discriminative gene expression signatures that are found and validated on a large number of microarray samples, generated by different laboratories and microarray technologies. Predictive models generated by this approach are better validated than those generated on a single data set, while showing high predictive power and improved generalization performance.
gene expression profiling; DNA microarray; cross-platform analysis; classification; cancer
Promiscuous expression of tissue-specific self-antigens in the thymus imposes T cell tolerance and protects from autoimmune diseases, as shown in animal studies. Analysis of promiscuous gene expression in purified stromal cells of the human thymus, at both the single-gene and the global level, documents the species conservation of this phenomenon. Medullary thymic epithelial cells overexpress a highly diverse set of genes (>400) including many tissue-specific antigens, disease-associated autoantigens, and cancer-germline genes. Although there are no apparent structural or functional commonalities among these genes and their products, they cluster along chromosomes. These findings have implications for human autoimmune diseases, immunotherapy of tumors, and the understanding of the nature of this unorthodox regulation of gene expression.
self-tolerance; autoimmunity; tumor antigens; epigenetics; gene array