The most common risk factor for developing hepatocellular carcinoma (HCC) is chronic infection with hepatitis B virus (HBV). To better understand the evolutionary forces driving HCC, we performed a near-saturating transposon mutagenesis screen in a mouse HBV model of HCC. This screen identified 21 candidate early-stage drivers and a bewildering number (2,860) of candidate later-stage drivers. These were enriched for genes that are mutated or deregulated in human HCC, or that function in signaling pathways important for it, with a striking 1,199 genes linked to cellular metabolic processes. Our study provides a comprehensive overview of the genetic landscape of HCC.
A major challenge for cancer genetics is to determine which low frequency somatic mutations are drivers of tumorigenesis. Here we interrogate the genomes of 7,651 diverse human cancers to identify novel drivers and find inactivating mutations in the homeodomain transcription factor CUX1 (cut-like homeobox 1) in ~1-5% of tumors. Meta-analysis of CUX1 mutational status in 2,519 cases of myeloid malignancies reveals disruptive mutations associated with poor survival, highlighting the clinical significance of CUX1 loss. In parallel, we validate CUX1 as a bona fide tumor suppressor using mouse transposon-mediated insertional mutagenesis and Drosophila cancer models. We demonstrate that CUX1 deficiency activates phosphoinositide 3-kinase (PI3K) signaling through direct transcriptional downregulation of the PI3K inhibitor PIK3IP1 (phosphoinositide-3-kinase interacting protein 1), leading to increased tumor growth, while exposing susceptibility to PI3K-AKT inhibition. Thus, our complementary approaches identify CUX1 as a new pan-driver of tumorigenesis and uncover a potential strategy for treating CUX1-mutant tumors.
The tumor suppressor gene RASSF1A is inactivated through point mutation or promoter hypermethylation in many human cancers. In this study, we performed a Sleeping Beauty transposon-mediated insertional mutagenesis screen in Rassf1a null mice to identify candidate genes that collaborate with loss of Rassf1a in tumorigenesis. We identified 10 genes, including the transcription factor Runx2. Runx2 is a transcriptional partner of Yes-associated protein (YAP1) and displays tumor suppressive activity by competing with the oncogenic TEA domain family of transcription factors (TEAD) for YAP1 association. Loss of RASSF1A promoted the formation of oncogenic YAP1-TEAD complexes, and combined loss of both RASSF1A and RUNX2 further increased YAP1-TEAD levels. This cooperation between RASSF1A and RUNX2 loss is consistent with the multi-step model of tumorigenesis. Clinically, RUNX2 expression was frequently down-regulated in various cancers, and reduced RUNX2 expression was associated with poor survival in patients with diffuse large B-cell or atypical Burkitt’s/Burkitt’s-like lymphomas. Interestingly, decreased expression levels of RASSF1 and RUNX2 were observed in both precursor T-cell acute lymphoblastic leukemia and colorectal cancer, further supporting the hypothesis that dual regulation of YAP1-TEAD promotes oncogenic activity. Together, our findings provide evidence that loss of RASSF1A expression switches YAP1 from a tumor suppressor to an oncogene by regulating its association with transcription factors, thereby suggesting a novel mechanism for RASSF1A-mediated tumor suppression.
RASSF1A; RUNX2; YAP; tumor suppressor; transposon; hippo
The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have integration biases that may strongly affect research outcomes. To address this issue, we generated very large datasets of unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons and for the Mouse Mammary Tumor Virus (MMTV). We analyzed (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. The two transposons showed more distinct preferences, and PB closely resembled the bias profiles of the Murine Leukemia Virus. Furthermore, we present a model in which target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems. It is defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are observed mainly at smaller scales and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7%–33% putative false positive CISs, which are likely not the result of the oncogenic selection process. Moreover, our results indicate that PB is better suited than SB to tagging oncogenes.
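In an IM screen, a CIS is a genomic window that holds more insertions than expected by chance. The following is a minimal sketch of window-based CIS calling under a Poisson background. It is not the procedure used in the screens discussed above. By default it assumes a uniform expected insertion rate, which is exactly the simplification these bias maps argue against. The hypothetical `expected` argument lets a per-window expectation, for example one derived from unselected integration data, replace that rate. The window size, significance level and Bonferroni correction are illustrative choices.

```python
import math
from bisect import bisect_left
from typing import Callable, List, Optional, Sequence, Tuple


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) computed from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_cis(
    positions: Sequence[int],
    genome_size: int,
    window: int = 50_000,
    alpha: float = 0.05,
    expected: Optional[Callable[[int, int], float]] = None,
) -> List[Tuple[int, int, float]]:
    """Return (window_start, insertion_count, p_value) for over-represented windows.

    positions: insertion coordinates on one chromosome (hypothetical input format).
    expected:  optional function (start, end) -> insertions expected without selection,
               e.g. read from an unselected-integration bias map; if omitted, a
               uniform background rate is assumed.
    """
    pos = sorted(positions)
    n = len(pos)
    n_tests = max(1, genome_size // window)  # approximate number of windows tested
    hits = []
    for start in sorted(set(pos)):  # one window anchored at each insertion
        end = start + window
        count = bisect_left(pos, end) - bisect_left(pos, start)
        lam = expected(start, end) if expected is not None else n * window / genome_size
        p = poisson_sf(count, lam)
        if p * n_tests < alpha:  # Bonferroni correction
            hits.append((start, count, p))
    return hits
```

Overlapping windows around one hotspot are reported separately here. A practical pipeline would merge them into a single CIS region. A biased background makes `lam` larger in favoured regions, so fewer windows there pass the threshold. That adjustment is what filters out bias-driven false positive CISs.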
Retroviruses and transposons are widely used in cancer research and gene therapy. However, these systems show integration biases that may strongly affect results. To address this issue, we generated very large datasets of unselected integrations for the Sleeping Beauty and piggyBac transposons and for the Mouse Mammary Tumor Virus (MMTV). We analyzed (epi)genomic features to generate bias maps at local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome, and a striking similarity was observed between piggyBac and the Murine Leukemia Virus. Moreover, we find that target site selection is directed at multiple scales. At larger scales, it is similar across systems and is directed by a set of domain-oriented features, including chromatin compaction, replication timing, and CpG islands. Notable differences between systems are defined at smaller scales by a diverse range of epigenetic features. As a practical application of our findings, we determined that three recent insertional mutagenesis screens, a method commonly used for cancer gene discovery, contained 7%–33% putative false positive integration hotspots.
Retroviral insertional mutagenesis (RIM) is a powerful tool for cancer genomics. In this study, we combined it with deep sequencing (RIM/DS) to enable a comprehensive analysis of lymphoma progression. Transgenic mice expressing two potent collaborating oncogenes in the germ line (CD2-MYC, -Runx2) develop rapid-onset tumours that can be accelerated and rendered polyclonal by neonatal Moloney murine leukaemia virus (MoMLV) infection. RIM/DS analysis of 28 polyclonal lymphomas identified 771 common insertion sites (CISs). These defined a ‘progression network’ that encompassed a remarkably large fraction of known MoMLV target genes, with further strong indications of oncogenic selection above the background of MoMLV integration preference. Progression driven by RIM was characterised as a Darwinian process of clonal competition engaging proliferation control networks downstream of cytokine and T-cell receptor signalling. Enhancer-mode activation accounted for the most efficiently selected CIS target genes. These included Ccr7, the most prominent of a set of chemokine receptors driving paracrine growth stimulation and lymphoma dissemination. Another large target gene subset, including candidate tumour suppressors, was disrupted by intragenic insertions. A second RIM/DS screen compared lymphomas of wild-type and parental transgenic mice. It showed that CD2-MYC tumours are virtually dependent on activation of Runx family genes, in strong preference to other potent Myc collaborating genes (Gfi1, Notch1). Ikzf1 was identified as a novel collaborating gene for Runx2 and illustrated the interface between integration preference and oncogenic selection. Lymphoma target genes for MoMLV can be classified into two groups: (a) a small set of master regulators that confer self-renewal and overcome p53 and other failsafe pathways, and (b) a large group of progression genes that control autonomous proliferation in transformed cells.
These findings provide insights into retroviral biology, human cancer genetics and the safety of vector-mediated gene therapy.
Cancers are known to arise by a series of mutational and non-mutational (epigenetic) events. The advent of cancer genome sequencing highlights the growing challenge of separating important (driver) mutations from irrelevant (passenger) ones. Retroviruses induce cancer by inserting into host DNA and thereby altering key genes. They are valuable tools because each insertion acts as a ‘tag’ that identifies a critical target. In this study we combined retroviral tagging with next generation sequencing to achieve a comprehensive description of lymphoma development and progression in transgenic mouse model systems. Our study suggests that three events may be sufficient for lymphoma development. It also identifies a genetic bottleneck at a small gene set that regulates tumour cell self-renewal, including the Myc oncogene and the p53 tumour suppressor. In contrast, many genes can provide the final step, in which the lymphoma cell acquires the ability to divide independently of external stimuli. Many of the target genes are conserved and play roles in cancers of non-viral origin, so this study may provide a paradigm for the gene interactions that underlie cancer biology. It also elucidates the risks entailed in the recent use of retrovirus-based vectors for human gene therapy.
Summary: We have developed Cake, a bioinformatics software pipeline that integrates four publicly available somatic variant-calling algorithms to identify single nucleotide variants with higher sensitivity and accuracy than any one algorithm alone. Cake can be run on a high-performance computer cluster or used as a stand-alone application.
Availability: Cake is open-source and is available from http://cakesomatic.sourceforge.net/
Supplementary data are available at Bioinformatics online.
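The abstract does not state exactly how Cake combines its four variant callers. The sketch below assumes a simple majority-style vote to illustrate how integrating callers can trade sensitivity for precision: an SNV is kept when at least `min_support` callers report it. The caller names, the variant key and the threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(calls: Dict[str, Set[Variant]], min_support: int = 2) -> Dict[Variant, int]:
    """Keep SNVs reported by at least `min_support` callers.

    calls maps a (hypothetical) caller name to the set of SNVs it reported.
    Returns each retained variant with the number of callers supporting it.
    """
    # Each caller contributes a set, so a caller can vote at most once per variant.
    support = Counter(v for variants in calls.values() for v in variants)
    return {v: n for v, n in support.items() if n >= min_support}
```

Raising `min_support` discards variants seen by few callers, which usually improves precision at the cost of sensitivity. Lowering it does the reverse.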
Medulloblastomas, the most frequent malignant brain tumours affecting children, comprise at least four distinct clinicogenetic subgroups. Aberrant sonic hedgehog (SHH) signalling is observed in approximately 25% of tumours and defines one subgroup. Although alterations in SHH pathway genes (e.g. PTCH1, SUFU) are observed in many of these tumours, high-throughput genomic analyses have identified few other recurring mutations. Here, we have mutagenised the Ptch+/- murine tumour model using the Sleeping Beauty transposon system to identify additional genes and pathways involved in SHH subgroup medulloblastoma.
Mutagenesis significantly increased medulloblastoma frequency and identified 17 candidate cancer genes. These included orthologs of genes somatically mutated (PTEN, CREBBP) or associated with poor outcome (PTEN, MYT1L) in the human disease. Strikingly, these candidate genes were enriched for transcription factors (p = 2 × 10⁻⁵). Most of these (6/7: Crebbp, Myt1L, Nfia, Nfib, Tead1 and Tgif2) were linked within a single regulatory network enriched for genes associated with a differentiated neuronal phenotype. Furthermore, activity of this network varied significantly between the human subgroups, was associated with metastatic disease, and predicted poor survival specifically within the SHH subgroup of tumours. Igf2, previously implicated in medulloblastoma, was the most differentially expressed gene in murine tumours with network perturbation. Network activity in both mouse and human tumours was characterised by enrichment for multiple gene sets indicating increased cell proliferation, IGF signalling and MYC target upregulation, and decreased neuronal differentiation.
Collectively, our data support a model of medulloblastoma development in SB-mutagenised Ptch+/- mice in which disruption of a novel transcription factor network leads to Igf2 upregulation, proliferation of granule neuron precursors (GNPs), and tumour formation. Moreover, our results identify rational therapeutic targets for SHH subgroup tumours, alongside prognostic biomarkers for the identification of poor-risk SHH patients.
Medulloblastoma; Mutagenesis; Transcription network; Differentiation
Antiviral responses must be tightly regulated to rapidly defend against infection while minimizing inflammatory damage. Type 1 interferons (IFN-I) are crucial mediators of antiviral responses [1], and their transcription is regulated by a variety of transcription factors [2]; principal amongst these is the family of interferon regulatory factors (IRFs) [3]. The IRF gene regulatory networks are complex and contain multiple feedback loops. The tools of systems biology are well suited to elucidate the complex interactions that give rise to precise coordination of the interferon response. Here we have used an unbiased systems approach to predict that a member of the forkhead family of transcription factors, FOXO3, is a negative regulator of a subset of antiviral genes. This prediction was validated using macrophages isolated from Foxo3-null mice. Genome-wide location analysis combined with gene deletion studies identified the Irf7 gene as a critical target of FOXO3. FOXO3 was identified as a negative regulator of Irf7 transcription, and we have further demonstrated that FOXO3, IRF7 and IFN-I form a coherent feed-forward regulatory circuit. Our data suggest that the FOXO3-IRF7 regulatory circuit represents a novel mechanism for establishing the requisite set points in the interferon pathway, balancing the beneficial effects and deleterious sequelae of the antiviral response.
The t(12;21) translocation, which generates the ETV6-RUNX1 (TEL-AML1) fusion gene, is the most common chromosomal rearrangement in childhood cancer and is exclusively associated with B-cell precursor acute lymphoblastic leukemia (BCP-ALL). The translocation arises in utero and is necessary but insufficient for the development of leukemia. SNP array analysis of ETV6-RUNX1 patient samples has identified multiple additional genetic alterations; however, the role of these lesions in leukemogenesis remains undetermined. Moreover, murine models of ETV6-RUNX1 ALL that faithfully recapitulate the human disease are lacking. To identify novel genes that co-operate with ETV6-RUNX1 in leukemogenesis, we generated a mouse model that uses the endogenous Etv6 locus to co-express the ETV6-RUNX1 fusion and Sleeping Beauty (SB) transposase. An insertional mutagenesis screen was performed by intercrossing these mice with mice carrying an SB transposon array. In contrast to previous models, a substantial proportion (20%) of the offspring developed BCP-ALL. Isolation of the transposon insertion sites identified genes known to be associated with BCP-ALL, including Ebf1 and Epor, in addition to other novel candidates. This is the first mouse model of ETV6-RUNX1 to develop BCP-ALL, and it provides important insights into the cooperating genetic alterations in ETV6-RUNX1 leukemia.
ETV6-RUNX1; leukemia; precursor-B cell; insertional mutagenesis
Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.
Pancreatic ductal adenocarcinoma (PDA) remains a lethal malignancy despite tremendous progress in its molecular characterization. Indeed, PDA tumors harbor four signature somatic mutations [1–4] and a plethora of lower frequency genetic events of uncertain significance [5]. Here, we used Sleeping Beauty (SB) transposon-mediated insertional mutagenesis [6,7] in a mouse model of pancreatic ductal preneoplasia [8] to identify genes that cooperate with oncogenic KrasG12D to accelerate tumorigenesis and promote progression. Our screen revealed new candidates and confirmed the importance of many genes and pathways previously implicated in human PDA. Interestingly, the most commonly mutated gene was the X-linked deubiquitinase Usp9x, which was inactivated in over 50% of the tumors. Prior work had attributed a pro-survival role to USP9X in human neoplasia [9]. We found instead that loss of Usp9x enhances transformation and protects pancreatic cancer cells from anoikis. Clinically, low USP9X protein and mRNA expression in PDA correlates with poor survival following surgery, and USP9X levels are inversely associated with metastatic burden in advanced disease. Furthermore, chromatin modulation with trichostatin A or 5-aza-2′-deoxycytidine elevates USP9X expression in human PDA cell lines, suggesting a clinical approach for certain patients. The conditional deletion of Usp9x cooperated with KrasG12D to rapidly accelerate pancreatic tumorigenesis in mice, validating their genetic interaction. Therefore, we propose USP9X as a major new tumor suppressor gene with prognostic and therapeutic relevance in PDA.
We recently proposed that competitive endogenous RNAs (ceRNAs) sequester microRNAs to regulate mRNA transcripts containing common microRNA recognition elements (MREs). However, the functional role of ceRNAs in cancer remains unknown. Loss of PTEN, a tumor suppressor regulated by ceRNA activity, frequently occurs in melanoma. Here, we report the discovery of significant enrichment of putative PTEN ceRNAs among genes whose loss accelerates tumorigenesis following Sleeping Beauty insertional mutagenesis in a mouse model of melanoma. We validated several putative PTEN ceRNAs and further characterized one, the ZEB2 transcript. We show that ZEB2 modulates PTEN protein levels in a microRNA-dependent, protein coding-independent manner. Attenuation of ZEB2 expression activates the PI3K/AKT pathway, enhances cell transformation, and commonly occurs in human melanomas and other cancers expressing low PTEN levels. Our study genetically identifies multiple putative microRNA decoys for PTEN, validates ZEB2 mRNA as a bona fide PTEN ceRNA, and demonstrates that abrogated ZEB2 expression cooperates with BRAFV600E to promote melanomagenesis.
Haploinsufficiency of the human 5q35 region spanning the NSD1 gene results in a rare genomic disorder known as Sotos syndrome (Sotos). Patients display a variety of clinical features, including pre- and postnatal overgrowth, intellectual disability, and urinary/renal abnormalities. We used chromosome engineering to generate a segmental monosomy: mice carrying a heterozygous 1.5-Mb deletion of 36 genes on mouse chromosome 13 (4732471D19Rik-B4galt7), syntenic with 5q35.2–q35.3 in humans (Df(13)Ms2Dja+/− mice). Surprisingly, and in contrast to Sotos patients, Df(13)Ms2Dja+/− mice were significantly smaller for their gestational age and also showed decreased postnatal growth. Df(13)Ms2Dja+/− mice did, however, display deficits in long-term memory retention and dilation of the pelvicalyceal system. These may in part model the learning difficulties and renal abnormalities observed in Sotos patients. Thus, genes within the mouse 4732471D19Rik–B4galt7 deletion interval play dosage-sensitive roles in growth, memory retention, and the development of the renal pelvicalyceal system.
Electronic supplementary material
The online version of this article (doi:10.1007/s00335-012-9416-0) contains supplementary material, which is available to authorized users.
The evolution of colorectal cancer suggests the involvement of many genes. We performed insertional mutagenesis with the Sleeping Beauty (SB) transposon system in mice carrying germline or somatic Apc mutation. Analysis of common insertion sites (CISs) isolated from 446 tumors revealed many hundreds of candidate cancer drivers. Comparison to human datasets suggested that 234 CIS genes are also deregulated in human colorectal cancers. 183 CIS genes are candidate Wnt targets, and 20 are shown to be novel modifiers of canonical Wnt signaling. We also identified gene mutations associated with a subset of tumors containing an expanded number of Paneth cells, a hallmark of deregulated Wnt signaling, and genes associated with more severe dysplasia included members of the FGF signaling cascade. Some 70 genes showed pairwise co-occurrence clustering into 38 sub-networks that may regulate tumor development.
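The abstract reports pairwise co-occurrence of CIS genes but does not name the statistic used. A common way to test whether insertions in two genes co-occur in more tumours than expected is a one-sided Fisher's exact test on a 2×2 table. The sketch below implements that test from the hypergeometric distribution. It is offered as an illustration, not as the authors' method. The function name and the table layout are assumptions.

```python
from math import comb


def cooccurrence_p(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact test for co-occurrence of insertions in genes A and B.

    Counts are tumours with insertions in both genes, only in A, only in B, or in neither.
    Returns P(overlap >= observed) under the hypergeometric null of independence.
    """
    n = both + only_a + only_b + neither
    row_a = both + only_a  # tumours with an insertion in gene A
    col_b = both + only_b  # tumours with an insertion in gene B
    # Sum hypergeometric numerators for every overlap at least as large as observed.
    upper = sum(
        comb(row_a, k) * comb(n - row_a, col_b - k)
        for k in range(both, min(row_a, col_b) + 1)
    )
    return upper / comb(n, col_b)
```

With many CIS genes, the number of pairs tested grows quadratically, so the p-values need multiple-testing correction. Pairs that pass a chosen threshold can then be treated as edges of a graph, and its connected groups are one way to obtain sub-networks like those described above.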
CADM1 encodes an immunoglobulin superfamily (IGSF) cell adhesion molecule. Inactivation of CADM1, by either promoter hypermethylation or loss of heterozygosity, has been reported in a wide variety of tumor types, and CADM1 has therefore been postulated to be a tumor suppressor gene.
We show for the first time that Cadm1 homozygous null mice die significantly faster than wild-type controls. This is due to the spontaneous development of tumors at an earlier age and an increased incidence of tumors, predominantly lymphomas but also some solid tumors. Tumorigenesis was accelerated after irradiation of Cadm1-null mice, and the reduced latency of tumor formation suggests that other genes collaborate with loss of Cadm1 in tumorigenesis. To identify these co-operating genetic events, we performed a Sleeping Beauty transposon-mediated insertional mutagenesis screen in Cadm1-null mice. We identified several common insertion sites (CISs) found specifically on a Cadm1-null background (and not on a wild-type background).
We confirm that Cadm1 is indeed a bona fide tumor suppressor gene and provide new insights into genetic partners that co-operate in tumorigenesis when Cadm1 expression is lost.
Cell adhesion molecule; Tumor suppressor; Transposon; Glucocorticoid; Cell junction
Nuclear receptor binding protein 1 regulates intestinal progenitor cell homeostasis and tumour formation
Arising from a ras-interaction screen in C. elegans, nuclear receptor binding protein 1 (NRBP1) is shown to impose a crypt progenitor phenotype in mice and is proposed as a novel tumour suppressor in human cancer.
Genetic screens in simple model organisms have identified many of the key components of the conserved signal transduction pathways that are oncogenic when misregulated. Here, we identify H37N21.1 as a gene that regulates vulval induction in let-60(n1046gf), a strain with a gain-of-function mutation in the Caenorhabditis elegans Ras orthologue. We show that somatic deletion of Nrbp1, the mouse orthologue of this gene, results in an intestinal progenitor cell phenotype that leads to profound changes in the proliferation and differentiation of all intestinal cell lineages. Nrbp1 interacts with key components of the ubiquitination machinery, and loss of Nrbp1 in the intestine results in the accumulation of Sall4, a key mediator of stem cell fate, and of Tsc22d2. We also reveal that somatic loss of Nrbp1 results in tumourigenesis, with haematological and intestinal tumours predominating. Nuclear receptor binding protein 1 (NRBP1) is downregulated in a range of human tumours, where low expression correlates with a poor prognosis. Thus, NRBP1 is a conserved regulator of cell fate that plays an important role in tumour suppression.
intestine; progenitor cell; Ras; tumour suppressor gene; WNT
The genetics of renal cancer is dominated by inactivation of the VHL tumour suppressor gene in clear cell carcinoma (ccRCC), the commonest histological subtype. A recent large-scale screen of ~3500 genes by PCR-based exon re-sequencing identified several new cancer genes in ccRCC, including UTX (KDM6A) [1], JARID1C (KDM5C) and SETD2 [2]. These genes encode enzymes that demethylate (UTX, JARID1C) or methylate (SETD2) key lysine residues of histone H3. Modification of the methylation state of these lysine residues of histone H3 regulates chromatin structure and is implicated in transcriptional control [3]. However, together these mutations are present in fewer than 15% of ccRCC, suggesting the existence of additional, currently unidentified cancer genes. Here, we have sequenced the protein coding exome in a series of primary ccRCC. We report the identification of the SWI/SNF chromatin remodeling complex gene PBRM1 [4] as a second major ccRCC cancer gene, with truncating mutations in 41% (92/227) of cases. These data further elucidate the somatic genetic architecture of ccRCC and emphasize the marked contribution of aberrant chromatin biology.
Genomic alterations identified in human tumors using techniques such as comparative genomic hybridisation (CGH) may be recurrent, but they frequently encompass large regions, in some cases containing hundreds of genes. Here we combine high-resolution CGH analysis of 598 human cancer cell lines with insertion sites isolated from 1,005 mouse tumors induced with the Murine Leukaemia Virus (MuLV). This cross-species oncogenomic analysis revealed candidate tumor suppressor genes and oncogenes recurrently mutated in both human and mouse tumors, making them strong candidate cancer genes. A significant number of these genes contained binding sites for the transcription factors Oct4 and Nanog. Mice carrying tumors with insertions in or near stem cell module genes (genes thought to participate in self-renewal) died significantly faster than mice without these insertions. We compared the profile of MuLV insertions identified here with insertions isolated from 73 tumors induced using the Sleeping Beauty (SB) transposon system and found significant differences in the profile of recurrently mutated genes. Collectively, this work provides a rich catalogue of candidate genes for follow-up functional analysis.
Cross-species analysis; insertional mutagenesis; bioinformatics; oncogenomics; comparative genomic hybridization
The innate immune system is a two-edged sword; it is absolutely required for host defense against infection but, uncontrolled, can trigger a plethora of inflammatory diseases. Here we used systems biology approaches to predict and validate a gene regulatory network involving a dynamic interplay between the transcription factors NF-κB, C/EBPδ, and ATF3 that controls inflammatory responses. We mathematically modeled transcriptional regulation of Il6 and Cebpd genes and experimentally validated the prediction that the combination of an initiator (NF-κB), an amplifier (C/EBPδ) and an attenuator (ATF3) forms a regulatory circuit that discriminates between transient and persistent Toll-like receptor 4-induced signals. Our results suggest a mechanism that enables the innate immune system to detect the duration of infection and to respond appropriately.
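The abstract describes an initiator (NF-κB), an amplifier (C/EBPδ) and an attenuator (ATF3) that together discriminate transient from persistent TLR4 signals. The toy model below is not the authors' mathematical model. It shows how such a circuit can act as a persistence detector, under hypothetical assumptions. Il6 is transcribed only while NF-κB is active AND C/EBPδ exceeds a threshold. C/EBPδ accumulates slowly, and ATF3 dampens its induction. All rate constants are invented for illustration, and the equations are integrated with a simple forward Euler scheme.

```python
def simulate_il6(
    duration: float,
    t_end: float = 30.0,
    dt: float = 0.01,
    k_c: float = 1.0,      # C/EBPdelta induction rate by NF-kB (hypothetical)
    gamma_c: float = 0.1,  # C/EBPdelta decay rate (hypothetical)
    k_a: float = 0.5,      # ATF3 induction rate by NF-kB (hypothetical)
    gamma_a: float = 0.1,  # ATF3 decay rate (hypothetical)
    K_a: float = 5.0,      # ATF3 level that halves C/EBPdelta induction (hypothetical)
    K_c: float = 2.0,      # C/EBPdelta level needed, with NF-kB, to drive Il6 (hypothetical)
) -> float:
    """Return the total time Il6 is transcribed for a stimulus lasting `duration`."""
    c = 0.0  # C/EBPdelta (amplifier)
    a = 0.0  # ATF3 (attenuator)
    il6_time = 0.0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        s = 1.0 if t < duration else 0.0  # NF-kB activity (initiator)
        if s > 0 and c > K_c:  # AND gate: NF-kB present and amplifier above threshold
            il6_time += dt
        dc = k_c * s / (1.0 + a / K_a) - gamma_c * c  # ATF3 represses C/EBPdelta induction
        da = k_a * s - gamma_a * a
        c += dc * dt
        a += da * dt
    return il6_time
```

A brief pulse of NF-κB activity ends before C/EBPδ crosses its threshold, so Il6 never switches on. A sustained stimulus lets C/EBPδ cross the threshold while NF-κB is still active. ATF3 limits how high the amplifier can rise.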
We present a computational framework for predicting targets of transcription factor regulation. The framework is based on the integration of a number of sources of evidence, derived from DNA sequence and gene expression data, using a weighted sum approach. Sources of evidence are prioritized based on a training set, and their relative contributions are then optimized. The performance of the proposed framework is demonstrated in the context of BCL6 target prediction. We show that this framework is able to uncover BCL6 targets reliably when biological prior information is utilized effectively, particularly in the case of sequence analysis. The framework results in a considerable gain in performance over scores in which sequence information was not incorporated. This analysis shows that with assessment of the quality and biological relevance of the data, reliable predictions can be obtained with this computational framework.
network inference; transcription factor binding site prediction; data integration
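The BCL6 framework above ranks candidate targets by a weighted sum of evidence sources whose weights are tuned on a training set. The sketch below shows one way such a scheme could work. Its specific choices are assumptions, not the published framework: each source is min-max normalised, weights are non-negative and sum to 1, and weights are chosen by grid search to maximise the area under the ROC curve (AUC) on known targets versus non-targets. Source names and data are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale one evidence source to [0, 1] so sources are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(evidence: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """evidence[i][g] is source i's raw score for gene g; returns the weighted sum per gene."""
    normed = [min_max(src) for src in evidence]
    return [sum(w * src[g] for w, src in zip(weights, normed)) for g in range(len(evidence[0]))]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a known target outscores a known non-target (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_weights(evidence, labels, step: float = 0.1) -> Tuple[Tuple[float, ...], float]:
    """Grid-search non-negative weights summing to 1 that maximise AUC on a training set."""
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    best_w, best_auc = (), -1.0
    for w in product(grid, repeat=len(evidence)):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # keep only weight vectors on the simplex
        score = auc(combined_scores(evidence, w), labels)
        if score > best_auc:
            best_w, best_auc = w, score
    return best_w, best_auc
```

The grid search grows exponentially with the number of sources, which is acceptable for a handful of evidence types. With more sources, a continuous optimiser or a regression model would be the more natural choice.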
Methods for accurate identification of nucleotide and structural variation using de novo short read sequencing of mouse chromosomes are described.
Genome sequences are essential tools for comparative and mutational analyses. Here we present the short read sequence of mouse chromosome 17 from the Mus musculus domesticus derived strain A/J, and the Mus musculus castaneus derived strain CAST/Ei. We describe approaches for the accurate identification of nucleotide and structural variation in the genomes of vertebrate experimental organisms, and show how these techniques can be applied to help prioritize candidate genes within quantitative trait loci.
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources. These include multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources.
Test data set, a web tool, source codes and supplementary data are available at: http://www.probtf.org.
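ProbTF itself uses likelihood and Bayesian (MCMC) models. The sketch below swaps in a much simpler technique to illustrate the idea of fusing evidence into a binding probability: a naive-Bayes log-odds combination. It assumes each evidence source (motif score, conservation, DNase hypersensitivity, and so on) is conditionally independent and is summarised by a likelihood ratio. It also answers the promoter-level question, whether the promoter contains any binding site, under the crude assumption that positions bind independently. Function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def posterior_binding(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Posterior P(bound | evidence), assuming conditionally independent evidence sources.

    Each likelihood ratio is P(evidence | bound) / P(evidence | unbound) for one source.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior)) + sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


def promoter_binding_probability(position_probs: Sequence[float]) -> float:
    """P(at least one bound site in the promoter), treating positions as independent."""
    p_none = 1.0
    for p in position_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none
```

Working in log-odds turns the product of likelihood ratios into a sum and avoids numerical underflow when many sources are combined. The independence assumptions are the main simplification compared with a full probabilistic model.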
Macrophages are versatile immune cells that can detect a variety of pathogen-associated molecular patterns through their Toll-like receptors (TLRs). In response to microbial challenge, the TLR-stimulated macrophage undergoes an activation program controlled by a dynamically inducible transcriptional regulatory network. Mapping a complex mammalian transcriptional network poses significant challenges and requires the integration of multiple experimental data types. In this work, we inferred a transcriptional network underlying TLR-stimulated murine macrophage activation. Microarray-based expression profiling and transcription factor binding site motif scanning were used to infer a network of associations between transcription factor genes and clusters of co-expressed target genes. The time-lagged correlation was used to analyze temporal expression data in order to identify potential causal influences in the network. A novel statistical test was developed to assess the significance of the time-lagged correlation. Several associations in the resulting inferred network were validated using targeted ChIP-on-chip experiments. The network incorporates known regulators and gives insight into the transcriptional control of macrophage activation. Our analysis identified a novel regulator (TGIF1) that may have a role in macrophage activation.
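The network above was inferred with time-lagged correlation, which asks whether a transcription factor's expression at time t predicts a target cluster's expression at a later time point. The following is a minimal sketch of that quantity for an evenly sampled time course. The function names are hypothetical. The novel significance test mentioned above is not reproduced here, and with short time courses a high correlation at some lag can easily arise by chance.

```python
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def lagged_correlation(tf: Sequence[float], target: Sequence[float], lag: int) -> float:
    """Correlate TF expression at time t with target expression at time t + lag.

    Assumes an evenly sampled time course; lag is counted in sampling intervals.
    """
    if not 0 <= lag <= len(tf) - 2:
        raise ValueError("lag must leave at least two overlapping time points")
    return pearson(tf[: len(tf) - lag], target[lag:])


def best_lag(tf, target, max_lag: int) -> Tuple[int, Dict[int, float]]:
    """Return the lag with the highest correlation, plus the scores for every lag tried."""
    scores = {lag: lagged_correlation(tf, target, lag) for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores
```

Each extra lag shortens the overlapping series by one point. Long lags therefore rest on very few observations, which is one reason a dedicated significance test is needed.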
Macrophages play a vital role in host defense against infection by recognizing pathogens through pattern recognition receptors, such as the Toll-like receptors (TLRs), and mounting an immune response. Stimulation of TLRs initiates a complex transcriptional program in which induced transcription factor genes dynamically regulate downstream genes. Microarray-based transcriptional profiling has proved useful for mapping such transcriptional programs in simpler model organisms; however, mammalian systems present difficulties such as post-translational regulation of transcription factors, combinatorial gene regulation, and a paucity of available gene-knockout expression data. Additional evidence sources, such as DNA sequence-based identification of transcription factor binding sites, are needed. In this work, we computationally inferred a transcriptional network for TLR-stimulated murine macrophages. Our approach combined sequence scanning with time-course expression data in a probabilistic framework. Expression data were analyzed using the time-lagged correlation. A novel, unbiased method was developed to assess the significance of the time-lagged correlation. The inferred network of associations between transcription factor genes and co-expressed gene clusters was validated with targeted ChIP-on-chip experiments, and yielded insights into the macrophage activation program, including a potential novel regulator. Our general approach could be used to analyze other complex mammalian systems for which time-course expression data are available.