The successful application of MRM in biological specimens raises the exciting possibility that assays can be configured to measure all human proteins, resulting in an assay resource that would promote advances in biomedical research. We report the results of a pilot study designed to test the feasibility of a large-scale, international effort in MRM assay generation. We have configured, validated across three laboratories, and made publicly available as a resource to the community 645 novel MRM assays representing 319 proteins expressed in human breast cancer. Assays were multiplexed in groups of >150 peptides and deployed to quantify endogenous analyte in a panel of breast cancer-related cell lines. Median assay precision was 5.4%, with high inter-laboratory correlation (R² > 0.96). Peptide measurements in breast cancer cell lines were able to discriminate amongst molecular subtypes and identify genome-driven changes in the cancer proteome. These results establish the feasibility of a scaled, international effort.
Access to a wider range of quantitative protein assays would significantly impact the number and use of tissue markers in guiding disease treatment. Quantitative mass spectrometry-based peptide and protein assays, such as immuno-SRM assays, have seen tremendous growth in recent years in application to protein quantification in biological fluids such as plasma or urine. Here, we extend the capability of the technique by demonstrating the application of a multiplexed immuno-SRM assay for quantification of estrogen receptor (ER) and human epidermal growth factor receptor 2 (HER2) levels in cell line lysates and human surgical specimens. The performance of the assay was characterized using peptide response curves, with linear ranges covering approximately 4 orders of magnitude and limits of detection in the low fmol/mg lysate range. Reproducibility was acceptable with median coefficients of variation of approximately 10%. We applied the assay to measurements of ER and HER2 in well-characterized cell line lysates with good discernment based on ER/HER2 status. Finally, the proteins were measured in surgically resected breast cancers, and the results showed good correlation with ER/HER2 status determined by clinical assays. This is the first implementation of the peptide-based immuno-SRM assay technology in cell lysates and human surgical specimens.
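The reproducibility figure quoted above is a median coefficient of variation (CV) across peptides. As a minimal sketch of how such a summary is typically computed, assuming replicate measurements grouped per peptide (the peptide names and values below are hypothetical and are not data from the study):

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Return the percent CV: sample standard deviation / mean * 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100.0


def median_cv(replicates_by_peptide):
    """Return the median percent CV across all peptides."""
    return median(
        coefficient_of_variation(v) for v in replicates_by_peptide.values()
    )


# Hypothetical replicate measurements (arbitrary units), for illustration only.
example = {
    "peptide_A": [100.0, 110.0, 90.0],
    "peptide_B": [50.0, 55.0, 45.0],
    "peptide_C": [200.0, 202.0, 198.0],
}

print(f"median CV = {median_cv(example):.1f}%")
```

Reporting the median rather than the mean keeps a few poorly behaved peptides from dominating the summary.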
Estrogen receptor; HER2/Neu; immunoaffinity; peptides; tissue
We generated extensive transcriptional and proteomic profiles from a Her2-driven mouse model of breast cancer that closely recapitulates human breast cancer. This report makes these data publicly available in raw and processed forms, as a resource to the community. Importantly, we previously made biospecimens from this same mouse model freely available through a sample repository, so researchers can obtain samples to test biological hypotheses without needing to breed animals and collect biospecimens.
Twelve datasets are available, encompassing 841 LC-MS/MS experiments (plasma and tissues) and 255 microarray analyses of multiple tissues (thymus, spleen, liver, blood cells, and breast). Cases and controls were rigorously paired to avoid bias.
In total, 18,880 unique peptides were identified (PeptideProphet peptide error rate ≤1%), with 3884 and 1659 non-redundant protein groups identified in plasma and tissue datasets, respectively. Sixty-one of these protein groups overlapped between cancer plasma and cancer tissue.
Conclusions and clinical relevance
These data are of use for advancing our understanding of cancer biology, for software and quality control tool development, investigations of analytical variation in MS/MS data, and selection of proteotypic peptides for MRM-MS. The availability of these datasets will contribute positively to clinical proteomics.
Breast cancer; Her2; mouse; proteome; transcriptome
Mice lacking all three Rb genes in the liver develop tumors resembling specific subgroups of human hepatocellular carcinomas, and Notch activity appears to suppress the growth and progression of these tumors.
Hepatocellular carcinoma (HCC) is the third leading cause of cancer death worldwide, with >600,000 deaths every year. Although the major risk factors are known, therapeutic options in patients remain limited, in part because of our incomplete understanding of the cellular and molecular mechanisms influencing HCC development. Evidence indicates that the retinoblastoma (RB) pathway is functionally inactivated in most cases of HCC by genetic, epigenetic, and/or viral mechanisms. To investigate the functional relevance of this observation, we inactivated the RB pathway in the liver of adult mice by deleting the three members of the Rb (Rb1) gene family: Rb, p107, and p130. Rb family triple knockout mice develop liver tumors with histopathological features and gene expression profiles similar to human HCC. In this mouse model, cancer initiation is associated with the specific expansion of populations of liver stem/progenitor cells, indicating that the RB pathway may prevent HCC development by maintaining the quiescence of adult liver progenitor cells. In addition, we show that during tumor progression, activation of the Notch pathway via E2F transcription factors serves as a negative feedback mechanism to slow HCC growth. The level of Notch activity is also able to predict survival of HCC patients, suggesting novel means to diagnose and treat HCC.
Functional characterization of the human genome requires tools for systematically modulating gene expression in both loss- and gain-of-function experiments. We describe the production of a sequence-confirmed, clonal collection of over 16,100 human open-reading frames (ORFs) encoded in a versatile Gateway vector system. Utilizing this ORFeome resource, we created a genome-scale expression collection in a lentiviral vector, thereby enabling both targeted experiments and high-throughput screens in diverse cell types.
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in organisms from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
High-throughput technologies can now identify hundreds of candidate protein biomarkers for any disease with relative ease. However, because there are no assays for the majority of proteins and de novo immunoassay development is prohibitively expensive, few candidate biomarkers are tested in clinical studies. We tested whether the analytical performance of a biomarker identification pipeline based on targeted mass spectrometry would be sufficient for data-dependent prioritization of candidate biomarkers, de novo development of assays and multiplexed biomarker verification. We used a data-dependent triage process to prioritize a subset of putative plasma biomarkers from >1,000 candidates previously identified using a mouse model of breast cancer. Eighty-eight novel quantitative assays based on selected reaction monitoring mass spectrometry were developed, multiplexed and evaluated in 80 plasma samples. Thirty-six proteins were verified as being elevated in the plasma of tumor-bearing animals. The analytical performance of this pipeline suggests that it should support the use of an analogous approach with human samples.
With sequencing of thousands of organisms completed or in progress, there is a growing need to integrate gene prediction with metabolic network analysis. Using Chlamydomonas reinhardtii as a model, we describe a systems-level methodology bridging metabolic network reconstruction with experimental verification of enzyme encoding open reading frames. Our quantitative and predictive metabolic model and its associated cloned open reading frames provide useful resources for metabolic engineering.
Acute leukemias induced by MLL chimeric oncoproteins are among the subset of cancers distinguished by a paradoxical dependence on GSK-3 kinase activity for sustained proliferation. We demonstrate here that GSK-3 maintains the MLL leukemia stem cell transcriptional program by promoting the conditional association of CREB and its co-activators TORC and CBP with homeodomain protein MEIS1, a critical component of the MLL-subordinate program, which in turn facilitates HOX-mediated transcription and transformation. This mechanism also applies to hematopoietic cells transformed by other HOX genes, including CDX2, which is highly expressed in a majority of acute myeloid leukemias, thus providing a molecular approach based on GSK-3 inhibitory strategies to target HOX-associated transcription in a broad spectrum of leukemias.
GSK-3; MLL; Hox; Creb; Meis1; Pbx; c-fos
Small cell lung carcinoma (SCLC) is a neuroendocrine subtype of lung cancer. While SCLC patients often initially respond to therapy, tumors nearly always recur, resulting in a 5-year survival rate of less than 10%. A mouse model has been developed based on the fact that the RB and p53 tumor suppressor genes are mutated in more than 90% of human SCLCs. Emerging evidence in patients and mouse models suggests that p130, a gene related to RB, may act as a tumor suppressor in SCLC cells. To test this idea, we used conditional mutant mice to delete p130 in combination with Rb and p53 in adult lung epithelial cells. We found that loss of p130 resulted in increased proliferation and significant acceleration of SCLC development in this triple knockout mouse model. The histopathological features of the triple mutant mouse tumors closely resembled those of human SCLC. Genome-wide expression profiling experiments further showed that Rb/p53/p130 mutant mouse tumors were similar to human SCLC. These findings indicate that p130 plays a key tumor suppressor role in SCLC. Rb/p53/p130 mutant mice provide a novel pre-clinical mouse model for identifying new therapeutic targets against SCLC.
p53; Rb; p130; mouse model; small cell lung carcinoma
To provide accurate biological hypotheses and inform upon global properties of cellular networks, systematic identification of protein–protein interactions has to meet high-quality standards. We present an expanded Caenorhabditis elegans protein–protein interaction network, or “interactome” map, derived from testing a matrix of ~10,000 × ~10,000 proteins using a highly specific high-throughput yeast two-hybrid system. Through a new empirical quality control framework, we show that the resulting dataset (Worm Interactome 2007 or WI-2007) is similar in quality to low-throughput data curated from the literature. Previous interaction datasets have been filtered and integrated with WI-2007 to generate a high confidence consolidated map (Worm Interactome version 8 or WI8). This work allows us to estimate the size of the worm interactome at ~116,000 interactions. Comparison with other types of functional genomic data shows the complementarity of distinct experimental approaches in predicting different functional relationship features between genes or proteins.
KRAS is one of the most frequently mutated human oncogenes. In some settings, oncogenic KRAS can trigger cellular senescence, whereas in others it produces hyperproliferation. Elucidating the mechanisms regulating these two drastically distinct outcomes would help identify novel therapeutic approaches in RAS-driven cancers. Using a combination of functional genomics and mouse genetics, we identified a role for the transcription factor Wilms tumor 1 (WT1) as a critical regulator of senescence and proliferation downstream of oncogenic KRAS signaling. Deletion or suppression of Wt1 led to senescence of mouse primary cells expressing physiological levels of oncogenic Kras but had no effect on wild-type cells, and Wt1 loss decreased tumor burden in a mouse model of Kras-driven lung cancer. In human lung cancer cell lines dependent on oncogenic KRAS, WT1 loss decreased proliferation and induced senescence. Furthermore, WT1 inactivation defined a gene expression signature that was prognostic of survival only in lung cancer patients exhibiting evidence of oncogenic KRAS activation. These findings reveal an unexpected role for WT1 as a key regulator of the genetic network of oncogenic KRAS and provide important insight into the mechanisms that regulate proliferation or senescence in response to oncogenic signals.
Several attempts have been made at systematically mapping protein-protein interaction, or “interactome” networks. However, it remains difficult to assess the quality and coverage of existing datasets. We describe a framework that uses an empirically-based approach to rigorously dissect quality parameters of currently available human interactome maps. Our results indicate that high-throughput yeast two-hybrid (HT-Y2H) interactions for human are superior in precision to literature-curated interactions supported by only a single publication, suggesting that HT-Y2H is suitable to map a significant portion of the human interactome. We estimate that the human interactome contains ~130,000 binary interactions, most of which remain to be mapped. Similar to estimates of DNA sequence data quality and genome size early in the human genome project, estimates of protein interaction data quality and interactome size are critical to establish the magnitude of the task of comprehensive human interactome mapping and to illuminate a path towards this goal.
The Myc oncoprotein, a transcriptional regulator involved in the etiology of many different tumor types, has been demonstrated to play an important role in the functions of embryonic stem (ES) cells. Nonetheless, it is still unclear whether Myc has unique targets and functions in ES cells.
To elucidate the role of c-Myc in murine ES cells, we mapped its genomic binding sites by chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip). In addition to previously identified targets, we identified genes involved in pluripotency, early development, and chromatin modification/structure that are bound and regulated by c-Myc in murine ES cells. Myc also binds and regulates loci previously identified as Polycomb (PcG) targets, including genes that contain bivalent chromatin domains. To determine whether c-Myc influences the epigenetic state of Myc-bound genes, we assessed the patterns of trimethylation of histone H3-K4 and H3-K27 in mES cells containing normal, increased, and reduced levels of c-Myc. Our analysis reveals widespread and surprisingly diverse changes in repressive and activating histone methylation marks both proximal and distal to Myc binding sites. Furthermore, analysis of bulk chromatin from phenotypically normal c-myc null E7 embryos demonstrates a 70–80% decrease in H3-K4me3, with little change in H3-K27me3, compared to wild-type embryos, indicating that Myc is required to maintain normal levels of histone methylation.
We show that Myc induces widespread and diverse changes in histone methylation in ES cells. We postulate that these changes are indirect effects of Myc mediated by its regulation of target genes involved in chromatin remodeling. We further show that a subset of PcG-bound genes with bivalent histone methylation patterns are bound and regulated in response to altered c-Myc levels. Our data indicate that in mES cells c-Myc binds, regulates, and influences the histone modification patterns of genes involved in chromatin remodeling, pluripotency, and differentiation.
Cellular functions are mediated through complex systems of macromolecules and metabolites linked through biochemical and physical interactions, represented in interactome models as ‘nodes’ and ‘edges’, respectively. Better understanding of genotype-to-phenotype relationships in human disease will require modeling of how disease-causing mutations affect systems or interactome properties. Here we investigate how perturbations of interactome networks may differ between complete loss of gene products (‘node removal’) and interaction-specific or edge-specific (‘edgetic’) alterations. Global computational analyses of ∼50 000 known causative mutations in human Mendelian disorders revealed clear separations of mutations probably corresponding to those of node removal versus edgetic perturbations. Experimental characterization of mutant alleles in various disorders identified diverse edgetic interaction profiles of mutant proteins, which correlated with distinct structural properties of disease proteins and disease mechanisms. Edgetic perturbations seem to confer distinct functional consequences from node removal because a large fraction of cases in which a single gene is linked to multiple disorders can be modeled by distinguishing edgetic network perturbations. Edgetic network perturbation models might improve both the understanding of dissemination of disease alleles in human populations and the development of molecular therapeutic strategies.
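The distinction between node removal and an edgetic perturbation can be made concrete on a toy graph. The sketch below is only an illustration of the two concepts, not the analysis used in the study; the network, the protein labels, and the helper names (`remove_node`, `remove_edge`, `partners`) are hypothetical.

```python
def remove_node(edges, node):
    """Model complete loss of a gene product: drop every edge touching the node."""
    return {e for e in edges if node not in e}


def remove_edge(edges, a, b):
    """Model an edgetic perturbation: drop one interaction, keep the node."""
    return edges - {frozenset((a, b))}


def partners(edges, node):
    """Return the sorted interaction partners that remain for a node."""
    return sorted(p for e in edges if node in e for p in e if p != node)


# Hypothetical undirected interaction network; each edge is an unordered pair.
edges = {
    frozenset(pair)
    for pair in [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
}

after_node_loss = remove_node(edges, "A")       # A loses all of its interactions
after_edgetic = remove_edge(edges, "A", "B")    # A loses only its link to B

print(partners(after_node_loss, "A"))
print(partners(after_edgetic, "A"))
```

In this picture, two different alleles of the same gene that each remove a different edge leave different residual networks, which is one way a single gene could be linked to more than one disorder.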
binary protein interaction; genotype-to-phenotype relationships; human Mendelian disorders; network perturbation
Describing the “ORFeome” of an organism, including all major isoforms, is essential for a systems understanding of any species; however, conventional cloning and sequencing approaches are prohibitively costly and labor-intensive. We describe a potentially genome-wide methodology for efficiently capturing novel coding isoforms using RT-PCR recombinational cloning, “deep well” pooling, and a “next generation” sequencing platform. This ORFeome discovery pipeline will be applicable to any eukaryotic species with a sequenced genome.
RACE (Rapid Amplification of cDNA Ends) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. Here, we describe a strategy that uses array hybridization to improve sampling efficiency of human transcripts. The products of the RACE reaction are hybridized onto tiling arrays, and the exons detected are used to delineate a series of RT-PCR reactions, through which the original RACE mixture is segregated into simpler RT-PCR reactions. These are independently cloned, and randomly selected clones are sequenced. This approach is superior to direct cloning and sequencing of RACE products: it specifically targets novel transcripts, and often results in overall normalization of transcript abundances. We show theoretically and experimentally that this strategy indeed leads to efficient sampling of novel transcripts, and we investigate multiplexing it by pooling RACE reactions from multiple interrogated loci prior to hybridization.
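Why random clone selection struggles with a large dynamic range can be seen from a simple calculation. The sketch below is an illustrative model only, not the theoretical treatment referred to above: it assumes each picked clone is an independent draw from a large pool, so a transcript present at fraction f is seen at least once in n clones with probability 1 − (1 − f)^n. The function names are hypothetical.

```python
import math


def prob_detect(fraction, n_clones):
    """Probability of picking a transcript at least once in n independent draws."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(fraction, target_prob=0.95):
    """Smallest number of clones giving at least target_prob of detection."""
    if not 0.0 < fraction < 1.0 or not 0.0 < target_prob < 1.0:
        raise ValueError("fraction and target_prob must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - fraction))


# Illustrative abundances: a dominant transcript versus progressively rarer ones.
for fraction in (0.5, 0.01, 0.001):
    print(f"fraction {fraction}: {clones_needed(fraction)} clones for 95% detection")
```

Under this model the number of clones required grows roughly in inverse proportion to the transcript's fraction of the mixture, which is why splitting the mixture into simpler reactions with more even abundances improves sampling.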
The genes encoding XMT and DXMT, the enzymes from Coffea canephora (robusta) that catalyse the three independent N-methyl transfer reactions in the caffeine-biosynthesis pathway, have been cloned and the proteins have been expressed in Escherichia coli. Both proteins have been crystallized in the presence of the demethylated cofactor S-adenosyl-l-homocysteine (SAH) and substrate (xanthosine for XMT and theobromine for DXMT).
Caffeine is a secondary metabolite produced by a variety of plants including Coffea canephora (robusta) and there is growing evidence that caffeine is part of a chemical defence strategy protecting young leaves and seeds from potential predators. The genes encoding XMT and DXMT, the enzymes from Coffea canephora (robusta) that catalyse the three independent N-methyl transfer reactions in the caffeine-biosynthesis pathway, have been cloned and the proteins have been expressed in Escherichia coli. Both proteins have been crystallized in the presence of the demethylated cofactor S-adenosyl-l-homocysteine (SAH) and substrate (xanthosine for XMT and theobromine for DXMT). The crystals are orthorhombic, with space group P2₁2₁2₁ for XMT and C222₁ for DXMT. X-ray diffraction data to 2.8 Å for XMT and to 2.5 Å for DXMT have been collected on beamline ID23-1 at the ESRF.
caffeine; SAM; N-methyltransferases
• Background and Aims Dehydrins, or group 2 late embryogenesis abundant (LEA) proteins, are hydrophilic Gly-rich proteins that are induced in vegetative tissues in response to dehydration, elevated salt, and low temperature, in addition to being expressed during the late stages of seed maturation. With the aim of characterizing and studying genes involved in osmotic stress tolerance in coffee, several full-length cDNAs encoding dehydrins (CcDH1, CcDH2 and CcDH3) and an LEA protein (CcLEA1) from Coffea canephora (robusta) were isolated and characterized.
• Methods The protein sequences deduced from the full-length cDNA were analysed to classify each dehydrin/LEA gene product and RT–PCR was used to determine the expression pattern of all four genes during pericarp and grain development, and in several other tissues of C. arabica and C. canephora. Primer-assisted genome walking was used to isolate the promoter region of the grain specific dehydrin gene (CcDH2).
• Key Results The CcDH1 and CcDH2 genes encode Y3SK2 dehydrins and the CcDH3 gene encodes an SK3 dehydrin. CcDH1 and CcDH2 are expressed during the final stages of arabica and robusta grain development, but only the CcDH1 transcripts are clearly detected in other tissues such as pericarp, leaves and flowers. CcDH3 transcripts are also found in developing arabica and robusta grain, in addition to being detected in pericarp, stem, leaves and flowers. CcLEA1 transcripts were only detected during a brief period of grain development. Finally, over 1 kb of genomic sequence potentially encoding the entire grain-specific promoter region of the CcDH2 gene was isolated and characterized.
• Conclusions cDNA sequences for three dehydrins and one LEA protein have been obtained and the expression of the associated genes has been determined in various tissues of arabica and robusta coffees. Because induction of dehydrin gene expression is associated with osmotic stress in other plants, the dehydrin sequences presented here will facilitate future studies on the induction and control of the osmotic stress response in coffee. The unique expression pattern observed for CcLEA1, and the expression of a related gene in other plants, suggests that this gene may play an important role in the development of grain endosperm tissue. Genomic DNA containing the grain-specific CcDH2 promoter region has been cloned. Sequence analysis indicates that this promoter contains several putative regulatory sites implicated in the control of both seed- and osmotic stress-specific gene expression. Thus, the CcDH2 promoter is likely to be a useful tool for basic studies on the control of gene expression during both grain maturation and osmotic stress in coffee.
Dehydrins; late embryogenesis abundant protein (LEA); seed development; Coffea; C. canephora; C. arabica; Rubiaceae
An EST database has been generated for coffee based on sequences from approximately 47,000 cDNA clones derived from five different stages/tissues, with a special focus on developing seeds. When computationally assembled, these sequences correspond to 13,175 unigenes, which were analyzed with respect to functional annotation, expression profile and evolution. Compared with Arabidopsis, the coffee unigenes encode a higher proportion of proteins related to protein modification/turnover and metabolism, an observation that may explain the high diversity of metabolites found in coffee and related species. Several gene families were found to be either expanded or unique to coffee when compared with Arabidopsis. A high proportion of these families encode proteins assigned to functions related to disease resistance. Such families may have expanded and evolved rapidly under the intense pathogen pressure experienced by a tropical, perennial species like coffee. Finally, the coffee gene repertoire was compared with that of Arabidopsis and Solanaceous species (e.g. tomato). Unlike Arabidopsis, tomato has a nearly perfect gene-for-gene match with coffee. These results are consistent with the facts that coffee and tomato have a similar genome size, chromosome karyotype (tomato, n=12; coffee, n=11) and chromosome architecture. Moreover, both belong to the Asterid I clade of dicot plant families. Thus, the biology of coffee (family Rubiaceae) and tomato (family Solanaceae) may be united into one common network of shared discoveries, resources and information.
Coffea canephora; Rubiaceae; Solanaceae; Seed development; Comparative genomics