Variability in the quality of antibodies to histone post-translational modifications (PTMs) presents a widely recognized hindrance in epigenetics research. Here, using antibody engineering technologies, we produced recombinant antibodies directed to the trimethylated lysine residues of histone H3 with high specificity and affinity and no lot-to-lot variation. These recombinant antibodies performed well in common epigenetics applications, and their high specificity enabled us to identify positive and negative correlations among histone PTMs.
Researchers have now had access to the fully sequenced Drosophila melanogaster genome for over a decade, and the sequenced genomes of 11 additional Drosophila species have been available for almost 5 years, with more species’ genomes becoming available every year [Adams MD, Celniker SE, Holt RA, et al. The genome sequence of Drosophila melanogaster. Science 2000;287:2185–95; Clark AG, Eisen MB, Smith DR, et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature 2007;450:203–18]. Although the best studied of the D. melanogaster transcription factors (TFs) were cloned before sequencing of the genome, the availability of sequence data promised to transform our understanding of TFs and gene regulatory networks. Sequenced genomes have allowed researchers to generate tools for high-throughput characterization of gene expression levels, genome-wide TF localization and analyses of evolutionary constraints on DNA elements across multiple species. With an estimated 700 DNA-binding proteins in the Drosophila genome, it will be many years before each potential sequence-specific TF is studied in detail, yet the last decade of functional genomics research has already impacted our view of gene regulatory networks and TF DNA recognition.
Drosophila; transcription factor; genomics; enhancer; Zelda
The autosomal dominant spinocerebellar ataxias (SCAs) are a genetically heterogeneous group of disorders characterized by cerebellar atrophy and Purkinje cell degeneration; their subtypes arise from 31 distinct genetic loci. Our group previously mapped the SCA26 locus to chromosome 19p13.3. In this study, we performed targeted deep sequencing of the critical interval to identify candidate causative variants in individuals from the SCA26 family. We identified a single variant that co-segregates with the disease phenotype and produces a single amino acid substitution in eukaryotic elongation factor 2. This substitution, P596H, lies in a domain critical for maintaining the reading frame during translation. The equivalent yeast mutant, P580H EF2, showed impaired translocation, detected as an increased rate of −1 programmed ribosomal frameshift read-through in a dual-luciferase assay for translational recoding. This substitution also increases susceptibility to proteostatic disruption, as shown by more robust activation of an unfolded protein response (UPR)-driven reporter gene upon challenge with dithiothreitol or heat shock in our yeast model system. Our results present a compelling candidate mutation and mechanism for the pathogenesis of SCA26 and further support a role for proteostatic disruption in neurodegenerative diseases.
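Dual-luciferase frameshift assays commonly express read-through as a firefly/Renilla luminescence ratio from the frameshift reporter, normalized to the same ratio from an in-frame control construct. The sketch below is a minimal illustration under that assumption. It assumes a hypothetical layout with Renilla upstream of the slippery site and firefly downstream in the −1 frame. The function names and readings are hypothetical, and the normalization used in this study may differ.

```python
from statistics import mean


def percent_frameshift(test_fluc, test_rluc, ctrl_fluc, ctrl_rluc):
    """Percent -1 PRF for one replicate.

    Assumes a hypothetical reporter layout: Renilla luciferase upstream of the
    frameshift signal, firefly luciferase downstream in the -1 frame, plus an
    in-frame (zero-frame) control construct measured in the same strain.
    """
    if test_rluc <= 0 or ctrl_rluc <= 0 or ctrl_fluc <= 0:
        raise ValueError("control and Renilla signals must be positive")
    test_ratio = test_fluc / test_rluc   # firefly is made only after a -1 shift
    ctrl_ratio = ctrl_fluc / ctrl_rluc   # firefly made without any shift needed
    return 100.0 * test_ratio / ctrl_ratio


def fold_change(mutant_replicates, wild_type_replicates):
    """Mean mutant PRF relative to mean wild-type PRF (e.g. P580H vs WT EF2)."""
    return mean(mutant_replicates) / mean(wild_type_replicates)


# Hypothetical luminescence readings, not data from this study.
wt = [percent_frameshift(20, 1000, 1000, 1000), percent_frameshift(22, 1000, 1000, 1000)]
mut = [percent_frameshift(40, 1000, 1000, 1000), percent_frameshift(44, 1000, 1000, 1000)]
print(round(fold_change(mut, wt), 2))
```

Normalizing to an in-frame control removes construct-independent differences in expression between strains. This means a higher ratio can be read as more frameshifting rather than more reporter mRNA.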
The Yorkie/Yap transcriptional coactivator is a well-known regulator of cellular proliferation in both invertebrates and mammals. As a coactivator, Yorkie (Yki) lacks a DNA binding domain and must partner with sequence-specific DNA binding proteins in the nucleus to regulate gene expression; in Drosophila, the developmental regulators Scalloped (Sd) and Homothorax (Hth) are two such partners. To determine the range of target genes regulated by these three transcription factors, we performed genome-wide chromatin immunoprecipitation experiments for each factor in both the wing and eye-antenna imaginal discs. Strong, tissue-specific binding patterns are observed for Sd and Hth, while Yki binding is remarkably similar across both tissues. Binding events common to the eye and wing are also present for Sd and Hth; these are associated with genes regulating cell proliferation and “housekeeping” functions, and account for the majority of Yki binding. In contrast, tissue-specific binding events for Sd and Hth significantly overlap enhancers that are active in the given tissue, are enriched in Sd and Hth DNA binding sites, respectively, and are associated with genes that are consistent with each factor's previously established tissue-specific functions. Tissue-specific binding events are also significantly associated with Polycomb-targeted chromatin domains. To provide mechanistic insights into tissue-specific regulation, we identify and characterize eye and wing enhancers of the Yki-targeted bantam microRNA gene and demonstrate that they depend on direct binding by Hth and Sd, respectively. Overall, these results suggest that Sd and Hth each use two distinct strategies to regulate distinct gene sets during development: one shared between tissues and associated with Yki, and the other tissue-specific, generally Yki-independent, and associated with developmental patterning.
The Hippo tumor suppressor pathway controls proliferation in a tissue-nonspecific fashion in Drosophila epithelial progenitor tissues via the transcriptional coactivator Yorkie (Yki). However, despite the tissue-nonspecific role that Yki plays in tissue growth, the transcription factors that recruit Yki to DNA, most notably Scalloped (Sd) and Homothorax (Hth), are important regulators of developmental patterning with many tissue-specific functions. Thus, these three transcriptional regulators (Yki, Sd, and Hth) provide a model for exploring the properties of protein-DNA interactions that regulate both tissue-shared and tissue-specific functions. With this goal in mind, we identified the positions in the fly genome that are bound by Yki, Sd, and Hth in the progenitors of the wing and eye-antenna structures of the fly. These data not only provide a global view of the Yki gene regulatory network but also reveal an unusual degree of tissue specificity in the genomic regions targeted by Sd and Hth, a specificity not seen for Yki. The data also show that tissue-specific binding is very likely to overlap tissue-specific enhancer regions, provide important clues to how tissue-specific Sd and Hth binding occurs, and support the idea that gene regulatory networks are plastic, with spatial differences in binding significantly affecting network structure.
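The split into tissue-shared and tissue-specific binding events can be illustrated with a simple interval-overlap rule. Under this rule, a peak in one tissue counts as shared if it overlaps at least one peak in the other tissue. This is a minimal sketch that assumes a 1 bp overlap is the criterion. The study's actual criteria (for example, minimum overlap or peak-calling thresholds) are not stated here, and the coordinates are made up.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end), as in BED.
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_shared_and_specific(tissue_peaks: List[Peak], other_peaks: List[Peak]):
    """Partition one tissue's peaks by whether any peak in the other tissue overlaps them.

    A quadratic scan for clarity; genome-scale data would use sorted
    intervals or an interval tree instead.
    """
    shared, specific = [], []
    for peak in tissue_peaks:
        if any(overlaps(peak, other) for other in other_peaks):
            shared.append(peak)
        else:
            specific.append(peak)
    return shared, specific


# Hypothetical Sd peaks in wing and eye-antenna discs.
wing = [("2L", 100, 200), ("2L", 500, 600), ("3R", 10, 50)]
eye = [("2L", 150, 250), ("3R", 50, 90)]
shared, wing_only = split_shared_and_specific(wing, eye)
print(shared)     # [('2L', 100, 200)]
print(wing_only)  # [('2L', 500, 600), ('3R', 10, 50)]
```

With half-open coordinates, intervals that merely touch (such as `("3R", 10, 50)` and `("3R", 50, 90)`) do not count as overlapping.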
“Candidatus Portiera aleyrodidarum” is the primary endosymbiont of whiteflies. We report two complete genome sequences of this bacterium from the worldwide invasive B and Q biotypes of the whitefly Bemisia tabaci. Differences in the two genome sequences may add insights into the complex differences in the biology of both biotypes.
“Candidatus Portiera aleyrodidarum” is the obligate primary endosymbiotic bacterium of whiteflies, including the sweet potato whitefly Bemisia tabaci, and provides essential nutrients to its host. Here we report two complete genome sequences of this bacterium from the B and Q biotypes of B. tabaci.
The Hippo pathway regulates growth through the transcriptional co-activator Yorkie, but how Yorkie promotes transcription remains poorly understood. We address this by characterizing Yorkie’s association with chromatin, and by identifying nuclear partners that effect transcriptional activation. Co-immunoprecipitation and mass spectrometry identify GAGA Factor (GAF), Brahma complex, and Mediator complex as Yorkie-associated nuclear protein complexes. All three are required for Yorkie’s transcriptional activation of downstream genes, and GAF and the Brahma complex subunit Moira interact directly with Yorkie. Genome-wide chromatin binding experiments identify thousands of Yorkie sites, most of which are associated with elevated transcription, based on genome-wide analysis of mRNA and histone H3K4Me3 modification. Chromatin binding also supports extensive functional overlap between Yorkie and GAF. Our studies suggest a widespread role for Yorkie as a regulator of transcription, and identify recruitment of the chromatin modifying GAF protein and BRM complex as a molecular mechanism for transcriptional activation by Yorkie.
Motivation: Identifying the target genes regulated by transcription factors (TFs) is the most basic step in understanding gene regulation. Recent advances in high-throughput sequencing technology, together with chromatin immunoprecipitation (ChIP), enable mapping TF binding sites genome wide, but it is not possible to infer function from binding alone. This is especially true in mammalian systems, where regulation often occurs through long-range enhancers in gene-rich neighborhoods, rather than proximal promoters, preventing straightforward assignment of a binding site to a target gene.
Results: We present EMBER (Expectation Maximization of Binding and Expression pRofiles), a method that integrates high-throughput binding data (e.g. ChIP-chip or ChIP-seq) with gene expression data (e.g. DNA microarray) via an unsupervised machine learning algorithm for inferring the gene targets of sets of TF binding sites. Genes selected are those that match overrepresented expression patterns, which can be used to provide information about multiple TF regulatory modes. We apply the method to genome-wide human breast cancer data and demonstrate that EMBER confirms a role for the TFs estrogen receptor alpha, retinoic acid receptors alpha and gamma in breast cancer development, whereas the conventional approach of assigning regulatory targets based on proximity does not. Additionally, we compare several predicted target genes from EMBER to interactions inferred previously, examine combinatorial effects of TFs on gene regulation and illustrate the ability of EMBER to discover multiple modes of regulation.
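For contrast with EMBER, the conventional proximity approach mentioned above assigns each binding site to the gene whose transcription start site (TSS) is nearest. The sketch below implements only that baseline, not EMBER's expectation-maximization model. Gene names and coordinates are hypothetical.

```python
import bisect
from collections import defaultdict


def nearest_gene(sites, tss_records):
    """Assign each binding site to the gene with the closest TSS on the same chromosome.

    sites:       list of (chrom, position)
    tss_records: list of (gene_name, chrom, tss_position)
    Returns a dict mapping each site to a gene name, or None when its
    chromosome has no annotated TSS.
    """
    genes_by_chrom = defaultdict(list)
    for gene, chrom, pos in tss_records:
        genes_by_chrom[chrom].append((pos, gene))
    positions_by_chrom = {}
    for chrom, entries in genes_by_chrom.items():
        entries.sort()  # sort in place so indices match positions_by_chrom
        positions_by_chrom[chrom] = [pos for pos, _ in entries]

    assignments = {}
    for chrom, pos in sites:
        positions = positions_by_chrom.get(chrom)
        if not positions:
            assignments[(chrom, pos)] = None
            continue
        i = bisect.bisect_left(positions, pos)
        # The closest TSS is either just before or just after the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(candidates, key=lambda j: abs(positions[j] - pos))
        assignments[(chrom, pos)] = genes_by_chrom[chrom][best][1]
    return assignments


# Hypothetical sites and TSS annotations.
print(nearest_gene([("chr1", 1200), ("chr1", 4000), ("chr2", 10)],
                   [("GENE_A", "chr1", 1000), ("GENE_B", "chr1", 5000)]))
```

The rule ignores expression entirely. Where regulation runs through long-range enhancers in gene-rich neighborhoods, the nearest TSS can therefore belong to a gene the site does not regulate.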
Availability: All code used for this work is available at http://dinner-group.uchicago.edu/downloads.html
Supplementary Information: Supplementary data are available at Bioinformatics online.
Disseminated tumor cells (DTCs) detected in the bone marrow have been shown to be an independent prognostic factor for women with breast cancer. However, the mechanisms behind tumor cell dissemination are still unclear, and more detailed knowledge is needed to understand why some cells remain dormant while others metastasize. Single-cell sequencing has opened up the possibility of dissecting the genetic content of subclones of a primary tumor, as well as of DTCs. Previous studies of genetic changes in DTCs have employed single-cell array comparative genomic hybridization, which provides information about larger aberrations. Next-generation sequencing now makes it possible to discover new, smaller, and copy-neutral genetic changes. In this study, we performed whole-genome amplification followed by next-generation sequencing to analyze DTCs from two breast cancer patients. We compared copy-number profiles of the DTCs, generated from sequencing data, with those of the corresponding primary tumors, generated from SNP-comparative genomic hybridization (CGH) data. While one tumor showed mostly whole-arm gains and losses, the other had more complex alterations, as well as subclonal amplifications and deletions. Whole-arm gains or losses in the primary tumor were generally also observed in the corresponding DTC. Both primary tumors showed amplification of chromosome 1q and deletion of parts of chromosome 16q, which were recaptured in the corresponding DTCs. Interestingly, clear differences were also observed, indicating that the DTCs underwent further evolution at the copy-number level. This study provides a proof of principle for sequencing of DTCs and correlation with primary tumor copy-number profiles. The analyses provide insight into tumor cell dissemination and show ongoing copy-number evolution in DTCs compared with the primary tumors.
single tumor cell sequencing; disseminating tumor cells; circulating tumor cells; tumor heterogeneity; clonal evolution
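Sequencing-based copy-number profiles are often derived from read counts in fixed genomic bins. Counts are normalized for library size, compared with a reference, and expressed as log2 ratios. The sketch below shows that idea under stated assumptions. It assumes median-centering, which presumes most bins are copy-neutral, and a small pseudocount. It omits GC and mappability correction, segmentation, and the amplification-bias handling that single-cell whole-genome-amplified data usually need. The bin counts are hypothetical.

```python
import math
from statistics import median


def log2_copy_ratios(sample_counts, reference_counts, pseudocount=0.5):
    """Median-centred log2 ratios of depth-normalised read counts per genomic bin."""
    if len(sample_counts) != len(reference_counts) or not sample_counts:
        raise ValueError("count vectors must be non-empty and equal length")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        # Divide by library size so deeper sequencing does not look like a gain.
        s_frac = (s + pseudocount) / sample_total
        r_frac = (r + pseudocount) / reference_total
        ratios.append(math.log2(s_frac / r_frac))
    # Assume most bins are copy-neutral, so the median ratio defines "no change".
    centre = median(ratios)
    return [x - centre for x in ratios]


# Hypothetical bins: the third bin has twice the normalised depth of the others.
print([round(x, 3) for x in log2_copy_ratios([100, 100, 200, 100], [100, 100, 100, 100],
                                              pseudocount=0)])
```

A centred log2 ratio near +1 corresponds to a doubling relative to the typical bin, and one near −1 to a halving.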
We performed a systematic evaluation of how variations in sequencing depth and other parameters influence the interpretation of chromatin immunoprecipitation (ChIP) followed by sequencing (ChIP-seq) experiments. Using Drosophila S2 cells, we generated ChIP-seq datasets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected and had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP library complexity at high coverage. Removing reads originating at the same base reduced false positives while having little effect on detection sensitivity. Even at a depth of ~1 read/bp coverage of the mappable genome, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithmic improvements are required to handle datasets with deep coverage.
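For single-end alignments, removing reads that originate at the same base can be sketched by keying each read on chromosome, strand and 5' end. For reverse-strand reads, the 5' end is the rightmost aligned base. This is a simplified illustration built on a hypothetical `Read` type. Production tools such as Picard MarkDuplicates or samtools markdup work on BAM records and handle details such as paired-end reads and soft clipping, which this sketch ignores.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int   # 0-based leftmost aligned base
    end: int     # exclusive end of the alignment
    strand: str  # "+" or "-"


def five_prime_position(read: Read) -> int:
    """Genomic coordinate of the read's 5' end (its first sequenced base)."""
    return read.start if read.strand == "+" else read.end - 1


def drop_same_start_reads(reads: List[Read]) -> List[Read]:
    """Keep only the first read seen at each (chromosome, strand, 5' position)."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.strand, five_prime_position(read))
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Hypothetical 36 bp reads: the first two share a 5' end and strand.
reads = [Read("2L", 100, 136, "+"), Read("2L", 100, 136, "+"),
         Read("2L", 100, 136, "-"), Read("2L", 64, 100, "-")]
print(len(drop_same_start_reads(reads)))
```

Under this rule, at most one read per strand can start at any base. Single-end deduplication therefore caps the attainable read density, which becomes relevant as depth approaches ~1 read/bp.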
Comparative ChIP-seq data reveal adaptive evolution of insulator protein CTCF binding in multiple Drosophila species.
Changes in the physical interaction between cis-regulatory DNA sequences and proteins drive the evolution of gene expression. However, it has proven difficult to accurately quantify evolutionary rates of such binding change or to estimate the relative effects of selection and drift in shaping the binding evolution. Here we examine the genome-wide binding of CTCF in four species of Drosophila separated by between ∼2.5 and 25 million years. CTCF is a highly conserved protein known to be associated with insulator sequences in the genomes of human and Drosophila. Although the binding preference for CTCF is highly conserved, we find that CTCF binding itself is highly evolutionarily dynamic and has adaptively evolved. Between species, binding divergence increased linearly with evolutionary distance, and CTCF binding profiles are diverging rapidly at the rate of 2.22% per million years (Myr). At least 89 new CTCF binding sites have originated in the Drosophila melanogaster genome since the most recent common ancestor with Drosophila simulans. Comparing these data to genome sequence data from 37 different strains of Drosophila melanogaster, we detected signatures of selection in both newly gained and evolutionarily conserved binding sites. Newly evolved CTCF binding sites show a significantly stronger signature for positive selection than older sites. Comparative gene expression profiling revealed that expression divergence of genes adjacent to CTCF binding sites is significantly associated with the gain and loss of CTCF binding. Further, the birth of new genes is associated with the birth of new CTCF binding sites. Our data indicate that binding of Drosophila CTCF protein has evolved under natural selection, and CTCF binding evolution has shaped both the evolution of gene expression and genome evolution during the birth of new genes.
A large proportion of the diversity of living organisms results from differential regulation of gene transcription. Transcriptional regulation is thought to differ between species because of evolutionary changes in the physical interactions between regulatory DNA elements and DNA-binding proteins; these can generate variation in the spatial and temporal patterns of gene expression. The mechanisms by which these protein–DNA interactions evolve are therefore an important question in evolutionary biology. Does adaptive evolution play a role, or is the process dominated by neutral genetic drift? Insulator proteins are a special group of DNA-binding proteins: instead of directly activating or repressing genes, they can coordinate the interactions between other regulatory elements (such as enhancers and promoters). Additionally, insulator proteins can limit the spreading of chromatin condensation and help to demarcate the boundaries of regulatory domains in the genome. In spite of their critical role in genome regulation, little is known about the evolution of interactions between insulator proteins and DNA. Here, we use ChIP-seq to examine the distribution of binding sites for CTCF, a highly conserved insulator protein, in four closely related Drosophila species. We find that genome-wide binding profiles of CTCF are highly dynamic across evolutionary time, with frequent births of new CTCF-DNA interactions, and we demonstrate that this evolutionary process is driven by natural selection. By comparing these with RNA-seq data, we find that gain or loss of CTCF binding affects the expression levels of nearby genes and correlates with structural evolution of the genome. Together, these results suggest a potential mechanism of regulatory re-wiring through adaptive evolution of CTCF binding.
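A linear relationship between binding divergence and divergence time can be summarized by a single rate, here in percent per million years. The sketch below fits divergence = rate × time by least squares, with the line forced through the origin on the assumption that there is no divergence at zero separation time. That constraint is an illustrative choice. The authors' exact fitting procedure and divergence metric are not specified here, and the input values are synthetic.

```python
def divergence_rate(times_myr, divergence_pct):
    """Least-squares rate for the model divergence = rate * time (no intercept).

    Minimizing sum((d - rate * t) ** 2) gives rate = sum(t * d) / sum(t ** 2).
    """
    if len(times_myr) != len(divergence_pct) or not times_myr:
        raise ValueError("need equal-length, non-empty inputs")
    sum_tt = sum(t * t for t in times_myr)
    if sum_tt == 0:
        raise ValueError("at least one divergence time must be non-zero")
    sum_td = sum(t * d for t, d in zip(times_myr, divergence_pct))
    return sum_td / sum_tt


# Synthetic points lying exactly on a line with slope 2.22 %/Myr.
times = [2.5, 5.0, 10.0, 25.0]
divergence = [2.22 * t for t in times]
print(round(divergence_rate(times, divergence), 2))  # 2.22
```

Dropping the intercept reflects the biological constraint that identical lineages have not yet diverged. Allowing a free intercept would instead absorb any constant offset, such as measurement noise shared by all species pairs.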
The retinoblastoma (RB) tumor suppressor protein is a transcriptional cofactor with essential roles in cell cycle and development. Physical and functional targets of RB and its paralogs p107/p130 have been studied largely in cultured cells, but the full biological context of the activities of this protein family will likely be revealed only in whole-organism studies. To identify direct targets of the major Drosophila RB counterpart in a developmental context, we carried out ChIP-Seq analysis of Rbf1 in the embryo. The association of the protein with promoters is developmentally controlled; early promoter access is globally inhibited, whereas later in development Rbf1 is found to associate with promoter-proximal regions of approximately 2000 genes. In addition to conserved cell-cycle–related genes, a wholly unexpected finding was that Rbf1 targets many components of the insulin, Hippo, JAK/STAT, Notch, and other conserved signaling pathways. Rbf1 may thus directly affect the output of these essential growth-control and differentiation pathways by regulating the expression of receptors, kinases and downstream effectors. Rbf1 was also found to target multiple levels of its own regulatory hierarchy. Bioinformatic analysis indicates that different classes of genes exhibit distinct constellations of motifs associated with the Rbf1-bound regions, suggesting that the context of Rbf1 recruitment may vary within the Rbf1 regulon. Many of these targeted genes are bound by Rbf1 homologs in human cells, indicating that a conserved role of RB proteins may be to adjust the set point of interlinked signaling networks essential for growth and development.
retinoblastoma; Rbf1; cell-cycle; Drosophila
Retinoic acid (RA) triggers growth-suppressive effects in tumor cells; RA and its synthetic analogs therefore have great potential as anti-carcinogenic agents. RA effects are mediated by retinoic acid receptors (RARs), which regulate gene expression in an RA-dependent manner. To define the genetic network regulated by RARs in breast cancer, we identified RAR genomic targets using chromatin immunoprecipitation and expression analysis. We found that RAR binding throughout the genome is highly coincident with estrogen receptor α (ERα) binding, and identified widespread crosstalk between RA and estrogen signaling that antagonistically regulates breast cancer-associated genes. ERα and RAR binding sites appear to have co-evolved on a large scale throughout the human genome, allowing competitive binding between these transcription factors via nearby or overlapping cis-regulatory elements. Together, these data indicate a highly coordinated intersection between these two critical nuclear hormone receptor signaling pathways, providing a global mechanism for balancing gene expression output via local regulatory interactions dispersed throughout the genome.
Behavior is among the most dynamic animal phenotypes, modulated by a variety of internal and external stimuli. Behavioral differences are associated with large-scale changes in gene expression, but little is known about how these changes are regulated. Here we show how a transcription factor (TF), ultraspiracle (usp; the insect homolog of the Retinoid X Receptor), working within complex transcriptional networks, can regulate behavioral plasticity and associated changes in gene expression. We first show that RNAi knockdown of USP in honey bee abdominal fat bodies delayed the transition from working in the hive (primarily “nursing” brood) to foraging outside. We then demonstrate through transcriptomics experiments that USP induced many maturation-related transcriptional changes in the fat bodies by mediating transcriptional responses to juvenile hormone (JH). These maturation-related transcriptional responses to USP occurred without changes in USP's genomic binding sites, as revealed by ChIP–chip. Instead, behaviorally related gene expression is likely determined by combinatorial interactions between USP and other TFs whose cis-regulatory motifs were enriched at USP's binding sites. Many modules of JH- and maturation-related genes were co-regulated in both the fat body and brain, predicting that usp and its cofactors influence shared transcriptional networks in both of these maturation-related tissues. Our findings demonstrate how “single gene effects” on behavioral plasticity can involve complex transcriptional networks in both brain and peripheral tissues.
Animals use behavior as one of the principal means of meeting their basic needs and responding flexibly to changes in their environment. An emerging insight is that changes in behavior are associated with massive changes in gene expression in the brain, but we know relatively little about how these changes are regulated. One important class of gene regulators is transcription factors (TFs), proteins that orchestrate the expression of tens to thousands of genes. We discovered that ultraspiracle (USP), a TF previously known primarily for its role in development, regulates behavioral change in the honey bee; and we show that USP causes behaviorally related changes in gene expression by mediating responses to an endocrine regulator, juvenile hormone. We present evidence that these effects on gene expression occur through combinatorial interactions between USP and other TFs, and that these hormonally related transcriptional networks are preserved between two tissues with causal roles in behavioral plasticity: the brain and the fat body, a peripheral nutrient-sensing organ. These results suggest that behavior is subserved by complex interactions between genes and gene networks, occurring both in the brain and in peripheral tissues. More generally, our results suggest that molecular systems biology is a promising paradigm by which to understand the mechanistic basis for behavior.
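Motif enrichment at binding sites is commonly assessed with a one-sided hypergeometric test. The test asks how likely it is to see at least the observed number of motif-containing regions among bound regions if the bound set were a random draw from the background. The sketch below assumes that test and a background that contains the bound regions. The study's actual enrichment method and background definition are not given here, and the counts are hypothetical.

```python
from math import comb


def motif_enrichment_p(bound_with_motif, n_bound, background_with_motif, n_background):
    """One-sided hypergeometric P(X >= bound_with_motif).

    Treats the n_bound bound regions as a random draw, without replacement,
    from n_background regions of which background_with_motif carry the motif.
    """
    k, n, K, N = bound_with_motif, n_bound, background_with_motif, n_background
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)
    # math.comb(a, b) returns 0 when b > a, so impossible draws contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def fold_enrichment(bound_with_motif, n_bound, background_with_motif, n_background):
    """Motif frequency in bound regions relative to the background."""
    return (bound_with_motif / n_bound) / (background_with_motif / n_background)


# Hypothetical counts: 30 of 100 bound regions vs 1,000 of 10,000 background regions.
print(fold_enrichment(30, 100, 1000, 10000))
print(motif_enrichment_p(30, 100, 1000, 10000))
```

Exact integer arithmetic with `math.comb` (Python 3.8+) avoids overflow in the binomial coefficients. Only the final division produces a float.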
Breast cancers expressing estrogen receptor α (ERα) are often more differentiated histologically than ERα-negative tumors, but the reasons for this difference are poorly understood. One possible explanation is that transcriptional cofactors associated with ERα determine the expression of genes that promote a more differentiated phenotype. In this study, we identify one such cofactor, coactivator-associated arginine methyltransferase 1 (CARM1), a unique coactivator of ERα that can simultaneously block cell proliferation and induce differentiation through global regulation of ERα-regulated genes. CARM1 was shown to act as an ERα coactivator in cell-based assays, gene expression microarrays, and mouse xenograft models. In human breast tumors, CARM1 expression positively correlated with ERα levels in ER+ tumors but was inversely correlated with tumor grade. Our findings suggest that co-expression of CARM1 and ERα may provide a better biomarker of well-differentiated breast cancer. Further, our findings define an important functional role for this histone arginine methyltransferase in reprogramming ERα-regulated cellular processes, implicating CARM1 as a putative epigenetic target in ER-positive breast cancers.
CARM1; histone methylation; breast cancer; differentiation; epigenetics
The regulatory logic of time- and tissue-specific gene expression has mostly been dissected in the context of the smallest DNA fragments that, when isolated, recapitulate native expression in reporter assays. It is not known whether the genomic sequences surrounding such fragments, which are often evolutionarily conserved, have any biological function. Using an enhancer of the even-skipped gene of Drosophila as a model, we investigate the functional significance of the genomic sequences surrounding empirically identified enhancers. A 480 bp “minimal stripe element” is able to drive even-skipped expression in the second of seven stripes but is embedded in a larger 800 bp region containing evolutionarily conserved binding sites for required transcription factors. To assess the overall fitness contribution of these binding sites in their native genomic context, we employed a gene-replacement strategy in which whole-locus transgenes, capable of rescuing the lethality of even-skipped mutants to adulthood, were substituted for the native gene. The molecular phenotypes were characterized by tagging Even-skipped with a fluorescent protein and monitoring gene expression dynamics in living embryos. We used recombineering to excise the sequences surrounding the minimal enhancer and site-specific transgenesis to create co-isogenic strains differing only in their stripe 2 sequences. Remarkably, the flanking sequences were dispensable for viability, demonstrating that the minimal element is sufficient for biological function under normal conditions. Instead, these sequences are required for robustness to genetic and environmental perturbation. The mutant enhancers had measurable sex- and dose-dependent effects on viability. At the molecular level, the mutants showed destabilized stripe placement and improper activation of downstream genes. Finally, we demonstrate through live measurements that the peripheral sequences are required for temperature compensation.
These results imply that seemingly redundant regulatory sequences beyond the minimal enhancer are necessary for robust gene expression and that “robustness” itself must be an evolved characteristic of the wild-type enhancer.
In this study, we provide evidence that eukaryotic enhancers contain regulatory sequences that provide robustness of gene expression to genetic and environmental perturbation. The regulatory logic of tissue-specific gene expression is encoded by compact non-coding enhancer sequences. We hypothesized that enhancers function not merely to turn genes “on” or “off” but to do so under the range of genetic and temperature conditions experienced by developing embryos. We tested this hypothesis using an enhancer of the even-skipped gene of Drosophila as a model. The enhancer is composed of a “minimal element,” capable of recapitulating native expression in reporter assays, and potentially redundant but evolutionarily conserved sequences surrounding the minimal element. We assayed the functional impact of the peripheral sequences on development, from in vivo gene expression to adult viability, to show that they are required for optimal performance under temperature and X chromosome dosage perturbations. Our results suggest that the architecture of enhancers is adjusted by natural selection to ensure robust gene expression. Such adaptive fine-tuning may explain how enhancers experience rapid sequence divergence between closely related species while exhibiting functional conservation.
Target-specific antibodies are pivotal for the design of vaccines, immunodiagnostic tests, proteomics studies for cancer biomarker discovery, identification of protein-DNA and other interactions, and small- and large-scale biochemical assays. It is therefore important to understand the properties of protein sequences that are important for antigenicity, and to identify small peptide epitopes and large regions in the linear sequence of proteins whose use results in specific antibodies.
Our analysis of protein properties suggested that sequence composition, combined with evolutionary information, predicted secondary structure and solvent accessibility, is sufficient to predict successful peptide epitopes. Antigenicity and the specificity of the immune response were also found to depend on epitope length. We trained the B-Cell Epitope Oracle (BEOracle), a support vector machine (SVM) classifier, to identify continuous B-cell epitopes using these protein properties as learning features. BEOracle achieved an F1-measure of 81.37% on a large validation set and outperformed both classical propensity-based methods and more sophisticated methods such as BCPred and Bepipred for B-cell epitope prediction. BEOracle also identified peptides for the ChIP-grade antibodies from the modENCODE/ENCODE projects with 96.88% accuracy. High BEOracle scores for peptides showed some correlation with antibody staining intensity in immunofluorescence studies of fly embryos. Finally, a second SVM classifier, the B-Cell Region Oracle (BROracle), was trained with BEOracle scores as features to predict, with high accuracy, the performance of antibodies generated against large protein regions. BROracle achieved accuracies of 63.88–75.26% on a validation set with immunofluorescence, immunohistochemistry, protein array and western blot results from the Protein Atlas database.
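The F1-measure reported for BEOracle is the harmonic mean of precision P and recall R, F1 = 2PR/(P + R). Accuracy, by contrast, is the fraction of all predictions that are correct. The sketch below computes both from binary confusion-matrix counts. The counts in the example are hypothetical, not BEOracle's validation results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1 and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical epitope predictions: 8 true positives, 2 false positives,
# 5 true negatives, 2 false negatives.
print(classification_metrics(tp=8, fp=2, tn=5, fn=2))
```

F1 ignores true negatives. For this reason it is often preferred over accuracy when epitopes are a minority class and a classifier could score well simply by predicting "non-epitope".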
Together our results suggest that antigenicity is a local property of the protein sequences and that protein sequence properties of composition, secondary structure, solvent accessibility and evolutionary conservation are the determinants of antigenicity and specificity in immune response. Moreover, specificity in immune response could also be accurately predicted for large protein regions without the knowledge of the protein tertiary structure or the presence of discontinuous epitopes. The dataset prepared in this work and the classifier models are available for download at https://sites.google.com/site/oracleclassifiers/.
Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor binding and histone modifications. Previous reports compared only a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries on ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster.
Both technologies produce highly reproducible profiles within each platform; ChIP-seq generally produces profiles with a better signal-to-noise ratio and allows detection of more, and narrower, peaks. The sets of peaks identified by the two technologies can differ significantly, but the extent of the difference varies with the factor and the analysis algorithm. Importantly, we found significant variation among multiple sequencing profiles of input DNA libraries, most likely arising from differences in both experimental conditions and sequencing depth. We further show that using an inappropriate input DNA profile can affect the average signal profiles around genomic features and the peak-calling results, highlighting the importance of high-quality input DNA data for normalization in ChIP-seq analysis.
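ChIP-seq analyses commonly compare ChIP read counts with a background rate estimated from an input library after scaling for sequencing depth. This is one way a poorly matched input can shift peak calls. The following minimal sketch makes several assumptions: fixed-width bins, a Poisson background, simple linear depth scaling, and an arbitrary floor on the background rate (the hypothetical `min_lambda`). It is not the algorithm of any tool evaluated in this study, and all counts are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the lower terms.

    Adequate for moderate lam; very large lam or very small p-values would
    need a log-space or regularized-gamma implementation instead.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i            # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def bin_enrichment_pvalues(chip_counts, input_counts, chip_depth, input_depth,
                           min_lambda=1.0):
    """Per-bin Poisson p-values of ChIP counts against depth-scaled input counts."""
    scale = chip_depth / input_depth   # put input counts on the ChIP library's scale
    pvalues = []
    for chip, control in zip(chip_counts, input_counts):
        expected = max(control * scale, min_lambda)  # floor avoids a zero background
        pvalues.append(poisson_upper_tail(chip, expected))
    return pvalues


# The same hypothetical ChIP bins scored against two different input libraries.
chip = [12, 3]
print(bin_enrichment_pvalues(chip, [4, 4], chip_depth=1e6, input_depth=2e6))
print(bin_enrichment_pvalues(chip, [20, 4], chip_depth=1e6, input_depth=2e6))
```

In the second call, the higher input count in the first bin raises its expected background from 2 to 10 reads. This makes 12 ChIP reads far less surprising, which shows how the choice of input library alone can change whether a bin looks enriched.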
Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis.
Therapy-related myeloid neoplasm (t-MN) is a distinctive clinical syndrome occurring after exposure to chemotherapy or radiotherapy. In most cases t-MN arises from a multipotential hematopoietic stem cell or, less commonly, from a lineage-committed progenitor cell. The prognosis for patients with t-MN is poor, as current forms of therapy are largely ineffective. Cytogenetic, molecular and gene expression profiling analyses of t-MN have revealed distinct subtypes of the disease; however, our understanding of the genetic basis of t-MN is incomplete. Elucidating the genetic pathways and molecular networks that are perturbed in t-MN may facilitate the identification of therapeutic targets that can be exploited to develop urgently needed targeted therapies.
Annotating and interpreting the results of genome-wide association studies (GWAS) remains challenging. Assigning function to genetic variants as expression quantitative trait loci is an expanding and useful approach, but it focuses exclusively on mRNA rather than protein levels, and many variants remain unannotated. To address this problem, we measured the steady-state abundance of 441 human signaling and transcription factor proteins in 68 Yoruba HapMap lymphoblastoid cell lines to identify novel relationships among inter-individual protein levels, genetic variants and sensitivity to chemotherapeutic agents. Proteins were measured using micro-western arrays and reverse-phase protein arrays from three independent cell line thaws, permitting mixed-effect modeling of protein biological replicates. We observed enrichment of protein quantitative trait loci (pQTLs) for cellular sensitivity to two commonly used chemotherapeutics: cisplatin and paclitaxel. We functionally validated the target protein of a genome-wide significant trans-pQTL for its relevance to paclitaxel-induced apoptosis. Overlap analysis with GWAS results for drug-induced apoptosis and cytotoxicity of paclitaxel and cisplatin revealed unique SNPs associated with these pharmacologic traits (p<0.001). Interestingly, GWAS SNPs from various regions of the genome implicated the same target protein (p<0.0001), which correlated with drug-induced cytotoxicity or apoptosis (p≤0.05). Two genes were functionally validated for association with drug response using siRNA: SMC1A with cisplatin response and ZNF569 with paclitaxel response. This work allows pharmacogenomic discovery to progress from the transcriptome to the proteome and offers potential for identifying new therapeutic targets. This approach, linking targeted proteomic data to variation in pharmacologic response, can be generalized to other studies evaluating genotype-phenotype relationships and can provide insight into chemotherapeutic mechanisms.
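The simplest form of a pQTL test regresses protein abundance on allele dosage. This minimal sketch assumes an additive genetic model and averages the replicate thaws for each cell line. Averaging is a deliberately simpler stand-in for the mixed-effect model described above, and `additive_pqtl` is a hypothetical name.

```python
from statistics import mean


def additive_pqtl(dosages, replicate_levels):
    """Fit protein_level ~ intercept + slope * dosage by ordinary least squares.

    dosages: allele count (0, 1 or 2) for each cell line.
    replicate_levels: one list of replicate protein measurements per cell line.
    Replicates are averaged first; this is a simplification, not mixed-effect
    modeling. Returns (slope, intercept).
    """
    if len(dosages) != len(replicate_levels):
        raise ValueError("need one genotype per cell line")
    y = [mean(reps) for reps in replicate_levels]
    x_bar, y_bar = mean(dosages), mean(y)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - x_bar) * (yi - y_bar) for x, yi in zip(dosages, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

A real pQTL scan would also compute a p-value for the slope and correct for the number of protein-SNP pairs tested. A mixed-effect model keeps replicate-level variance rather than discarding it by averaging.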
The central dogma of biology states that DNA is transcribed into mRNA, which is then translated into protein. Many genome-wide studies have implicated genetic variation that influences gene expression and ultimately affects downstream complex traits, including response to drugs. However, because of technical limitations, few studies have evaluated the contribution of genetic variation to protein expression and the ensuing effects on downstream phenotypes. To overcome this challenge, we used a novel technology to simultaneously measure the baseline expression of 441 proteins in lymphoblastoid cell lines and compared these measurements with publicly available genetic data. To further illustrate the utility of this approach, we compared protein-level measurements with data on chemotherapy-induced apoptosis and cell-growth inhibition. This study demonstrates the importance of using protein information to understand the functional consequences of genetic variants identified in genome-wide association studies. This protein dataset will also be broadly useful for interpreting other genome-wide studies of complex traits.
In the largest E3 ligase subfamily, Cul3 binds a BTB domain, and an associated protein-interaction domain such as MATH recruits substrates for ubiquitination. Here we present biochemical and structural analyses of the MATH-BTB protein SPOP. We define a SPOP-binding consensus (SBC) and determine structures revealing recognition of SBCs from the phosphatase Puc, the transcriptional regulator Ci and the chromatin component MacroH2A. We identify a dimeric SPOP-Cul3 assembly involving a conserved helical structure C-terminal to BTB domains. We call this structure the “3-box” because it facilitates Cul3 binding and resembles the F-box and SOCS-box of other cullin-based E3s. Structural flexibility between the substrate-binding MATH domain and the Cul3-binding BTB/3-box domains potentially allows a SPOP dimer to engage multiple SBCs within a single substrate, such as Puc. These studies provide a molecular understanding of how MATH-BTB proteins recruit substrates to Cul3, and of how their dimerization and conformational variability may facilitate avid interactions with diverse substrates.
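Because the SBC is a short linear motif, candidate SBCs in a substrate sequence can be located with a pattern scan. The pattern below is an assumption for illustration. It reads the consensus as nonpolar, polar, Ser, Ser/Thr, Ser/Thr, and the residues assigned to "nonpolar" and "polar" are our own choices. Check both against the published SBC definition before use. A sequence match does not by itself show SPOP binding. Scanning a whole substrate returns every match, so a substrate carrying several SBCs, as described for Puc, would show multiple hits.

```python
import re

# Assumed residue classes; published definitions of these classes vary.
NONPOLAR = "AVLIMFWPG"
POLAR = "STNQCYDEKRH"

# Assumed reading of the SBC: nonpolar, polar, Ser, Ser/Thr, Ser/Thr.
# Since S and T are not in NONPOLAR, matches cannot overlap, so
# re.finditer's non-overlapping scan misses nothing.
SBC_PATTERN = re.compile(rf"[{NONPOLAR}][{POLAR}]S[ST][ST]")


def find_sbc_candidates(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 5-mer) for each candidate SBC."""
    return [(m.start(), m.group()) for m in SBC_PATTERN.finditer(sequence.upper())]
```

For example, `find_sbc_candidates("GLTSSTA")` reports one candidate, `"LTSST"`, starting at index 1.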
Despite the successes of genomics, little is known about how genetic information produces complex organisms. A look at the crucial functional elements of fly and worm genomes could change that.
We constructed Drosophila melanogaster BAC libraries with 21-kb and 83-kb inserts in the P(acman) system. Clones representing 12-fold coverage and encompassing more than 95% of annotated genes were mapped onto the reference genome. These clones can be integrated into predetermined attP sites in the genome using ΦC31 integrase to rescue mutations. They can be modified through recombineering, for example to incorporate protein tags and assess expression patterns.
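Fold-coverage figures such as the 12-fold value above can be interpreted with the idealized Lander-Waterman model. This model is an assumption here: it treats clone positions as uniformly random, which real BAC libraries only approximate. It relates coverage c to the number of clones N, the mean insert length L and the genome size G. It also predicts that the chance of a base being missed by every clone falls exponentially with c:

```latex
c = \frac{N L}{G}, \qquad
\Pr(\text{a given base is covered by no clone}) \approx e^{-c}, \qquad
e^{-12} \approx 6.1 \times 10^{-6}
```

This is a base-level estimate. A gene counts as encompassed only if a single clone spans it completely. That becomes harder as gene length approaches the 21-kb or 83-kb insert size, so gene-level completeness (more than 95% of annotated genes here) is expected to lag behind base-level coverage.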
Codon usage bias (CUB) is a ubiquitous phenomenon in molecular evolution. Drosophila has been a particularly well-studied model, and evidence indicates that selection at least partially shapes codon usage, probably through selection for translational efficiency. Although many aspects of Drosophila CUB have been studied, this is the first study relating codon usage to development in this holometabolous insect, whose life stages differ greatly. Here we ask: which developmental stage of Drosophila melanogaster has the greatest CUB? Genes with maximum expression in the larval stage have the greatest overall CUB when compared with those of embryos, pupae and adults. (The same pattern was observed in Drosophila pseudoobscura; see Supplementary Material online.) We hypothesize that this is related to the very rapid growth of larvae, which places increased selective pressure on producing large amounts of protein: a 300-fold increase requires protein content to double approximately every 10 h. Genes with highest expression in adult males and early embryos, the stages with the least de novo protein synthesis, display the least CUB. These results are consistent with the hypothesis that CUB is caused, at least in part, by selection for efficient protein production. This seems to hold at the individual-gene level (highly expressed genes are more biased than lowly expressed genes) as well as on a more global scale. At that scale, genes with maximum expression during periods of very rapid growth and protein synthesis are more biased than genes with maximum expression during periods of low growth.
codon usage bias; protein synthesis; Drosophila; development; melanogaster; pseudoobscura; larval stage
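The growth figure in the larval hypothesis above can be checked with a line of arithmetic. A 300-fold increase corresponds to about eight successive doublings. One doubling of protein content every 10 h therefore amounts to roughly 82 h, about three and a half days, of sustained growth. This calculation assumes constant exponential growth, a simplification the abstract does not state.

```latex
300 = 2^{n} \;\Longrightarrow\; n = \log_2 300 = \frac{\ln 300}{\ln 2} \approx 8.23 \text{ doublings},
\qquad 8.23 \times 10\ \text{h} \approx 82\ \text{h}
```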
Motivation: The highly coordinated expression of thousands of genes in an organism is regulated by the concerted action of transcription factors, chromatin proteins and epigenetic mechanisms. High-throughput experimental data on genome-wide in vivo protein–DNA interactions and epigenetic marks are becoming available from large projects, such as the model organism ENCyclopedia Of DNA Elements (modENCODE), and from individual labs. Disseminating and visualizing these datasets in an explorable form is an important challenge.
Results: To support research on Drosophila melanogaster transcriptional regulation and make genome-wide in vivo protein–DNA interaction data available to the scientific community as a whole, we have developed a system called Flynet. Currently, Flynet contains 101 datasets for 38 transcription factors and chromatin regulator proteins under different experimental conditions. These factors exhibit different types of binding profiles, ranging from sharp localized peaks to broad binding regions. The protein–DNA interaction data in Flynet were obtained from the analysis of chromatin immunoprecipitation experiments on one-color and two-color genomic tiling arrays, as well as from chromatin immunoprecipitation followed by massively parallel sequencing. A web-based interface, integrated with an AJAX-based genome browser, has been built for querying the data and presenting analysis results. Flynet also makes available cis-regulatory modules reported in the literature, known and de novo identified sequence motifs across the genome, and other resources for studying gene regulation.
Availability: Flynet is available at https://www.cistrack.org/flynet/.
Supplementary information: Supplementary data are available at Bioinformatics online.