1.  The p130 Isoform of Angiomotin Is Required for Yap-Mediated Hepatic Epithelial Cell Proliferation and Tumorigenesis 
Science signaling  2013;6(291):ra77.
The Hippo-Yap signaling pathway regulates a number of developmental and adult cellular processes, including cell fate determination, tissue growth, and tumorigenesis. Members of the scaffold protein angiomotin (Amot) family interact with several Hippo pathway components, including Yap (Yes-associated protein), and either stimulate or inhibit Yap activity. We used a combination of genetic, biochemical, and transcriptional approaches to assess the functional consequences of the Amot-Yap interaction in mice and in human cells. Mice with a liver-specific Amot knockout exhibited reduced hepatic “oval cell” proliferation and tumorigenesis in response to toxin-induced injury or when crossed with mice lacking the tumor suppressor Nf2. Biochemical examination of the Amot-Yap interaction revealed that the p130 splicing isoform of Amot (Amot-p130) and Yap interacted in both the cytoplasm and nucleus, which involved binding of PPxY and LPxY motifs in Amot-p130 to WW domains of Yap. In the cytoplasm, Amot-p130 prevented the phosphorylation of Yap by blocking access of the WW domains to the kinase Lats1. Within the nucleus, Amot-p130 was associated with the transcriptional complex containing Yap and Teads (TEA domain family members) and contributed to the regulation of a subset of Yap target genes, many of which are associated with tumorigenesis. These findings indicated that Amot acts as a Yap cofactor, preventing Yap phosphorylation and augmenting its activity toward a specific set of genes that facilitate tumorigenesis.
PMCID: PMC4175526  PMID: 24003254
2.  Resection of Non-Small Cell Lung Cancers Reverses Tumor-Induced Gene Expression Changes in the Peripheral Immune System 
To characterize the interactions of Non-small Cell Lung Cancer (NSCLC) tumors with the immune system at the level of mRNA and microRNA (miRNA) expression and to define expression signatures that characterize the presence of a malignant tumor vs. a non-malignant nodule.
We have examined the changes of both mRNA and miRNA expression levels in peripheral blood mononuclear cells (PBMC) between paired samples collected from NSCLC patients before and after tumor removal using Illumina gene expression arrays.
We found that malignant tumor removal significantly changes expression of more than 3,000 protein-coding genes, especially genes in pathways associated with suppression of the innate immune response, including NK cell signaling and apoptosis-associated ceramide signaling. Binding sites for the ETS-domain transcription factors ELK1, ELK4 and SPI1 were enriched in promoter regions of genes upregulated in the presence of a tumor. Additional important regulators included five miRNAs expressed at significantly higher levels before tumor removal. Repressed protein-coding targets of those miRNAs included many transcription factors, several involved in immunologically important pathways. While there was a significant overlap in the effects of malignant tumors and benign lung nodules on PBMC gene expression, we identified one gene panel which indicates a tumor or nodule presence and a second panel that can distinguish malignant from non-malignant nodules.
The presence of a tumor in the lung influences mRNA and miRNA expression in PBMC, and this influence is reversed by tumor removal. These results suggest that PBMC gene expression signatures could be used for lung cancer diagnosis.
PMCID: PMC3618688  PMID: 21807633
lung cancer; pbmc; peripheral immune system
3.  Inflammation and Its Correlates in Regenerative Wound Healing: An Alternate Perspective 
Advances in Wound Care  2014;3(9):592-603.
Objective: The wound healing response may be viewed as partially overlapping sets of two physiological processes, regeneration and wound repair, with the former overrepresented in some lower species such as newts and the latter more typical of mammals. A robust and quantitative model of regenerative healing has been described in Murphy Roths Large (MRL) mice, in which through-and-through hole wounds in the ear pinna lead to scarless healing and replacement of all tissue, including cartilage, through blastema formation. Since these mice are naturally autoimmune and display many aspects of an enhanced inflammatory response, we chose to examine the inflammatory status during regenerative ear hole closure and observed that inflammation has a clear positive effect on regenerative healing.
Approach: The inflammatory gene expression patterns (Illumina microarrays) of early healing ear tissue from regenerative MRL and nonregenerative C57BL/6 (B6) strains are presented along with a survey of innate inflammatory cells found in this tissue type pre and postinjury. The role of inflammation on healing is tested using a COX-2 inhibitor.
Innovation and Conclusion: We conclude that (1) enhanced inflammation is consistent with, and probably necessary for, a full regenerative response and (2) the inflammatory gene expression and cell distribution patterns suggest a novel mast cell population with markers found in both immature and mature mast cells that may be a key component of regeneration.
PMCID: PMC4152783  PMID: 25207202
4.  The transcription factor Foxp1 is a critical negative regulator of the differentiation of follicular helper T cells 
Nature immunology  2014;15(7):667-675.
CD4+ follicular helper T cells (TFH cells) are essential for germinal center (GC) responses and long-lived antibody responses. Here we report that naive CD4+ T cells deficient in the transcription factor Foxp1 ‘preferentially’ differentiated into TFH cells, which resulted in substantially enhanced GC and antibody responses. We found that Foxp1 used both constitutive Foxp1A and Foxp1D induced by stimulation of the T cell antigen receptor (TCR) to inhibit the generation of TFH cells. Mechanistically, Foxp1 directly and negatively regulated interleukin 21 (IL-21); Foxp1 also dampened expression of the costimulatory molecule ICOS and its downstream signaling at early stages of T cell activation, which rendered Foxp1-deficient CD4+ T cells partially resistant to blockade of the ICOS ligand (ICOSL) during TFH cell development. Our findings demonstrate that Foxp1 is a critical negative regulator of TFH cell differentiation.
PMCID: PMC4142638  PMID: 24859450
5.  The peripheral immune response and lung cancer prognosis 
Oncoimmunology  2012;1(8):1414-1416.
Attempts to refine and improve outcome predictions using tumor gene expression have been recently reported. We show that peripheral blood mononuclear cell (PBMC)-associated gene signatures can predict outcome in non-small cell lung carcinoma patients independent of demographic data or TNM staging, and that this information may persist after tumor resection.
PMCID: PMC3518521  PMID: 23243612
NSCLC; PBMC; T cell; gene expression; lung cancer; myeloid; prognosis; survival
6.  CD164 and FCRL3 are highly expressed on CD4+CD26-T cells in Sézary syndrome patients 
The Journal of investigative dermatology  2013;134(1):10.1038/jid.2013.279.
Sézary syndrome (SS) cells express cell surface molecules also found on normal activated CD4 T cells. In an effort to find a more specific surface marker for malignant SS cells, a microarray analysis of gene expression was performed. Results showed significantly increased levels of mRNA for CD164, a sialomucin found on human CD34+ hematopoietic stem cells, and FCRL3, a molecule present on a subset of human natural T regulatory cells. Both markers were increased in CD4 T cells from SS patients compared to healthy donors. Flow cytometry studies confirmed the increased expression of CD164 and FCRL3 primarily on CD4+CD26− T cells of SS patients. Importantly, a statistically significant correlation was found between an elevated percentage of CD4+CD164+ T cells and an elevated percentage of CD4+CD26− T cells in all tested SS patients but not in patients with Mycosis Fungoides and atopic dermatitis or healthy donors. FCRL3 expression was significantly increased only in high tumor burden patients. CD4+CD164+ cells displayed cerebriform morphology and their loss correlated with clinical improvement in treated patients. Our results suggest that CD164 can serve as a marker for diagnosis and for monitoring progression of CTCL/SS and that FCRL3 expression correlates with a high circulating tumor burden.
PMCID: PMC3869886  PMID: 23792457
7.  Sp100 as a Potent Tumor Suppressor: Accelerated Senescence and Rapid Malignant Transformation of Human Fibroblasts through Modulation of an Embryonic Stem Cell Program* 
Cancer research  2010;70(23):9991-10001.
Identifying the functions of proteins, which associate with specific subnuclear structures, is critical to understanding eukaryotic nuclear dynamics. Sp100 is a prototypical protein of ND10/PML nuclear bodies, which colocalizes with Daxx and the proto-oncogenic PML. Sp100 isoforms contain SAND, PHD, Bromo and HMG domains and are highly sumoylated, all characteristics suggestive of a role in chromatin-mediated gene regulation. A role for Sp100 in oncogenesis has not previously been defined. Using selective Sp100 isoform-knockdown approaches, we show that normal human diploid fibroblasts with reduced Sp100 levels rapidly senesce. Subsequently, small rapidly dividing Sp100 minus cells emerge from the senescing fibroblasts and are found to be highly tumorigenic in nude mice. The derivation of these tumorigenic cells from the parental fibroblasts is confirmed by micro-satellite analysis. The small rapidly dividing Sp100 minus cells now also lack ND10/PML bodies, exhibit genomic instability and p53 cytoplasmic sequestration. They also have activated MYC, RAS and TERT pathways and express mesenchymal to epithelial (MET) trans-differentiation markers. Reintroduction of expression of only the Sp100A isoform is sufficient to maintain senescence and to inhibit emergence of the highly tumorigenic cells. Global transcriptome studies, quantitative PCR and protein studies as well as immunolocalization studies during the course of the transformation reveal that a transient expression of stem cell markers precedes the malignant transformation. These results identify a role for Sp100 as a tumor suppressor in addition to its role in maintaining ND10/PML bodies and in the epigenetic regulation of gene expression.
PMCID: PMC3059726  PMID: 21118961
Immortalization; senescence; MYC deregulation; reprogramming; KRAS activation
8.  Identification of a 251 Gene Expression Signature That Can Accurately Detect M. tuberculosis in Patients with and without HIV Co-Infection 
PLoS ONE  2014;9(2):e89925.
Co-infection with tuberculosis (TB) is the leading cause of death in HIV-infected individuals. However, diagnosis of TB, especially in the presence of an HIV co-infection, is limited by the high inaccuracy of conventional diagnostic methods. Here we report a gene signature that can identify a tuberculosis infection in patients co-infected with HIV as well as in the absence of HIV.
We analyzed global gene expression data from peripheral blood mononuclear cell (PBMC) samples of patients that were either mono-infected with HIV or co-infected with HIV/TB and used support vector machines to identify a gene signature that can distinguish between the two classes. We then validated our results using publicly available gene expression data from patients mono-infected with TB.
Our analysis successfully identified a 251-gene signature that accurately distinguishes patients co-infected with HIV/TB from those infected with HIV only, with an overall accuracy of 81.4% (sensitivity = 76.2%, specificity = 86.4%). Furthermore, we show that our 251-gene signature can also accurately distinguish patients with active TB in the absence of an HIV infection from both patients with a latent TB infection and healthy controls (88.9–94.7% accuracy; 69.2–90% sensitivity and 90.3–100% specificity). We also demonstrate that the expression levels of the 251-gene signature diminish as a correlate of the length of TB treatment.
A 251-gene signature is described to (a) detect TB in the presence or absence of an HIV co-infection, and (b) assess response to treatment following anti-TB therapy.
PMCID: PMC3934945  PMID: 24587128
9.  Isoform-level gene signature improves prognostic stratification and accurately classifies glioblastoma subtypes 
Nucleic Acids Research  2014;42(8):e64.
Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful, computational methods to accurately translate a gene signature from a high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive and then transfer gene-signatures from one analytical platform to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumor. Unsupervised clustering of TCGA (The Cancer Genome Atlas Consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. The PIGExClass-derived four-class classifier, which requires only 121 transcript-variants, assigns GBM patients’ molecular subtype with 92% accuracy. This classifier was translated to an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of clinically adaptable molecular subtyping assays and have implications for developing robust diagnostic assays for cancer patient stratification.
PMCID: PMC4005667  PMID: 24503249
10.  Immune Modulators as Therapeutic Agents for Cutaneous T-Cell Lymphoma 
Clinical lymphoma, myeloma & leukemia  2010;10(0 2):10.3816/CLML.2010.s.017.
Cutaneous T-cell lymphoma at all stages appears to be responsive to immune modulatory therapeutic approaches. We describe here the mechanistic rationale for the use of interferons, interleukin-12, retinoids, Toll-like receptor agonists, photopheresis and combinations of immune preserving, immune stimulatory therapies for CTCL.
PMCID: PMC3877673  PMID: 20826407
11.  The Rac1 splice form Rac1b promotes K-ras-induced lung tumorigenesis 
Oncogene  2012;32(7):903-909.
Rac1b, an alternative splice form of Rac1, has been previously shown to be upregulated in colon and breast cancer cells, suggesting an oncogenic role for Rac1b in these cancers. Our analysis of NSCLC tumor and matched normal tissue samples indicates Rac1b is upregulated in a significant fraction of lung tumors in correlation with mutational status of K-ras. To directly assess the oncogenic potential of Rac1b in vivo, we employed a mouse model of lung adenocarcinoma, in which the expression of Rac1b can be conditionally activated specifically in the lung. While expression of Rac1b alone is insufficient to drive tumor initiation, the expression of Rac1b synergizes with an oncogenic allele of K-ras resulting in increased cellular proliferation and accelerated tumor growth. Finally, we show that in contrast to our previous findings demonstrating a requirement for Rac1 in K-ras-driven cell proliferation, Rac1b is not required in this context. Given the partially overlapping spectrum of downstream effectors regulated by Rac1 and Rac1b, our findings further delineate the signaling pathways downstream of Rac1 that are required for K-ras driven tumorigenesis.
PMCID: PMC3384754  PMID: 22430205
12.  Classification and Prediction of Survival in Patients with the Leukemic Phase of Cutaneous T Cell Lymphoma 
The Journal of Experimental Medicine  2003;197(11):1477-1488.
We have used cDNA arrays to investigate gene expression patterns in peripheral blood mononuclear cells from patients with leukemic forms of cutaneous T cell lymphoma, primarily Sezary syndrome (SS). When expression data for patients with high blood tumor burden (Sezary cells >60% of the lymphocytes) and healthy controls are compared by Student's t test, at P < 0.01, we find 385 genes to be differentially expressed. Highly overexpressed genes include Th2 cells–specific transcription factors Gata-3 and Jun B, as well as integrin β1, proteoglycan 2, the RhoB oncogene, and dual specificity phosphatase 1. Highly underexpressed genes include CD26, Stat-4, and the IL-1 receptors. Message for plastin-T, not normally expressed in lymphoid tissue, is detected only in patient samples and may provide a new marker for diagnosis. Using penalized discriminant analysis, we have identified a panel of eight genes that can distinguish SS in patients with as few as 5% circulating tumor cells. This suggests that, even in early disease, Sezary cells produce chemokines and cytokines that induce an expression profile in the peripheral blood distinctive to SS. Finally, we show that using 10 genes, we can identify a class of patients who will succumb within six months of sampling regardless of their tumor burden.
PMCID: PMC2193901  PMID: 12782714
cDNA microarrays; discriminant analysis; class prediction; prognosis; CTCL
Cancer research  2012;72(13):3251-3259.
Survivin is an oncogene that functions in cancer cell cytoprotection and mitosis. Here we report that differential expression in cancer cells of a C-terminal splice variant of survivin, termed survivin-ΔEx3, is tightly associated with aggressive disease and markers of unfavorable prognosis. In contrast to other survivin variants, survivin-ΔEx3 localized exclusively to nuclei in tumor cells and was phosphorylated at multiple residues by the checkpoint kinase Chk2 during DNA damage. Mutagenesis of the Chk2 phosphorylation sites enhanced the stability of survivin-ΔEx3 in tumor cells, inhibited the expression of phosphorylated H2AX (γH2AX) in response to double strand DNA breaks, and impaired growth after DNA damage. DNA damage-induced Chk2 phosphorylation, stabilization of p53, induction of the cyclin-dependent kinase inhibitor p21, and homologous recombination-induced repair were not affected. In vivo, active Chk2 was detected at the earliest stages of the colorectal adenoma-to-carcinoma transition, persisted in advanced tumors, and correlated with increased survivin expression. Together, our findings suggest that Chk2-mediated phosphorylation of survivin-ΔEx3 contributes to a DNA damage-sensing checkpoint that may affect cancer cell sensitivity to genotoxic therapies.
PMCID: PMC3695608  PMID: 22586065
Survivin; ΔEx3; alternative splicing; Chk2; DNA damage
15.  Synergistic Enhancement of Cellular Immune Responses by the Novel Toll Receptor 7/8 Agonist 3M 007 and Interferon-γ: Implications for Therapy of Cutaneous T-Cell Lymphoma 
Leukemia & lymphoma  2011;52(10):1970-1979.
CTCL is responsive at all stages to immunotherapy. We determined whether a novel agonist for TLR 7/8 (3M007) combined with either IFN-γ or IL-15 enhanced patients' immune responses in vitro. Our data demonstrate synergism between IFN-γ and 007 in the activation of patients' NK cytolytic activity against CTCL tumor cell lines and increased production of cytokines by dendritic cells compared to 007 alone. Microarray studies of gene expression of patients' PBMC primed with IFN-γ followed by stimulation with 007 identified significant upregulation of expression of IL-12-p35 (α-chain), IL-12-p40 (β-chain) and nine IFN-α genes. Importantly, the underlying mechanism of increased levels of IFN-α and IL-12 from combined treatment appears to involve IFN regulatory factor 8 (IRF-8). These results further support our hypothesis that combinations of biological modifiers activating different arms of the immune system may provide significant therapeutic benefits for patients with advanced CTCL.
PMCID: PMC3612958  PMID: 21942329
CTCL; immunotherapy; TLR7/8 agonist; IFN-γ; IRF-8
16.  Human breast cancer associated fibroblasts exhibit subtype specific gene expression profiles 
BMC Medical Genomics  2012;5:39.
Breast cancer is a heterogeneous disease for which prognosis and treatment strategies are largely governed by the receptor status (estrogen, progesterone and Her2) of the tumor cells. Gene expression profiling of whole breast tumors further stratifies breast cancer into several molecular subtypes which also co-segregate with the receptor status of the tumor cells. We postulated that cancer associated fibroblasts (CAFs) within the tumor stroma may exhibit subtype specific gene expression profiles and thus contribute to the biology of the disease in a subtype specific manner. Several studies have reported gene expression profile differences between CAFs and normal breast fibroblasts but in none of these studies were the results stratified based on tumor subtypes.
To address whether gene expression in breast cancer associated fibroblasts varies between breast cancer subtypes, we compared the gene expression profiles of early passage primary CAFs isolated from twenty human breast cancer samples representing three main subtypes: seven ER+, seven triple negative (TNBC) and six Her2+.
We observed significant expression differences between CAFs derived from Her2+ breast cancer and CAFs from TNBC and ER+ cancers, particularly in pathways associated with cytoskeleton and integrin signaling. In the case of Her2+ breast cancer, the signaling pathways found to be selectively upregulated in CAFs likely contribute to the enhanced migration of breast cancer cells in transwell assays and may contribute to the unfavorable prognosis of Her2+ breast cancer.
These data demonstrate that in addition to the distinct molecular profiles that characterize the neoplastic cells, CAF gene expression is also differentially regulated in distinct subtypes of breast cancer.
PMCID: PMC3505468  PMID: 22954256
17.  Peripheral Immune Cell Gene Expression Predicts Survival of Patients with Non-Small Cell Lung Cancer 
PLoS ONE  2012;7(3):e34392.
Prediction of cancer recurrence in patients with non-small cell lung cancer (NSCLC) currently relies on the assessment of clinical characteristics including age, tumor stage, and smoking history. A better prediction of early stage cancer patients with poorer survival and late stage patients with better survival is needed to design patient-tailored treatment protocols. We analyzed gene expression in RNA from peripheral blood mononuclear cells (PBMC) of NSCLC patients to identify signatures predictive of overall patient survival. We find that PBMC gene expression patterns from NSCLC patients, like patterns from tumors, have information predictive of patient outcomes. We identify and validate a 26 gene prognostic panel that is independent of clinical stage. Many additional prognostic genes are specific to myeloid cells and are more highly expressed in patients with shorter survival. We also observe that significant numbers of prognostic genes change expression levels in PBMC collected after tumor resection. These post-surgery gene expression profiles may provide a means to re-evaluate prognosis over time. These studies further suggest that patient outcomes are not solely determined by tumor gene expression profiles but can also be influenced by the immune response as reflected in peripheral immune cells.
PMCID: PMC3315526  PMID: 22479623
18.  Modulation of gene expression in heart and liver of hibernating black bears (Ursus americanus) 
BMC Genomics  2011;12:171.
Hibernation is an adaptive strategy to survive in highly seasonal or unpredictable environments. The molecular and genetic basis of hibernation physiology in mammals has only recently been studied using large scale genomic approaches. We analyzed gene expression in the American black bear, Ursus americanus, using a custom 12,800 cDNA probe microarray to detect differences in expression that occur in heart and liver during winter hibernation in comparison to summer active animals.
We identified 245 genes in heart and 319 genes in liver that were differentially expressed between winter and summer. The expression of 24 genes was significantly elevated during hibernation in both heart and liver. These genes are mostly involved in lipid catabolism and protein biosynthesis and include RNA binding protein motif 3 (Rbm3), which enhances protein synthesis at mildly hypothermic temperatures. Elevated expression of protein biosynthesis genes suggests induction of translation that may be related to adaptive mechanisms reducing cardiac and muscle atrophies over extended periods of low metabolism and immobility during hibernation in bears. Coordinated reduction of transcription of genes involved in amino acid catabolism suggests redirection of amino acids from catabolic pathways to protein biosynthesis. We identify transcriptional changes in the liver that are common to black bears and small mammalian hibernators, including induction of genes responsible for fatty acid β oxidation and carbohydrate synthesis and depression of genes involved in lipid biosynthesis, carbohydrate catabolism, cellular respiration and detoxification pathways.
Our findings show that modulation of gene expression during winter hibernation represents a molecular mechanism of adaptation to extreme environments.
PMCID: PMC3078891  PMID: 21453527
19.  Gene Expression Profiles in Peripheral Blood Mononuclear Cells Can Distinguish Patients with Non-Small-Cell Lung Cancer from Patients with Non-Malignant Lung Disease 
Cancer research  2009;69(24):9202-9210.
Early diagnosis of lung cancer followed by surgery presently is the most effective treatment for non-small-cell lung cancer (NSCLC). An accurate, minimally invasive test that could detect early disease would permit timely intervention and potentially reduce mortality. Recent studies have shown that the peripheral blood can carry information related to the presence of disease, including prognostic information and information on therapeutic response. We have analyzed gene expression in peripheral blood mononuclear cell (PBMC) samples including 137 patients with NSCLC tumors and 91 patient controls with non-malignant lung conditions, including histologically diagnosed benign nodules. Subjects were primarily smokers and former smokers. We have identified a 29-gene signature that separates these two patient classes with 86% accuracy (91% sensitivity, 80% specificity). Accuracy in an independent validation set, including samples from a new location, was 78% (sensitivity of 76% and specificity of 82%). An analysis of this NSCLC-gene signature in 18 NSCLCs taken pre-surgery, with matched samples from 2-5 months post-surgery, showed that in 78% of cases, the signature was reduced post-surgery and disappeared entirely in 33%. Our results demonstrate the feasibility of using peripheral blood gene expression signatures to identify early-stage NSCLC in at-risk populations.
PMCID: PMC2798582  PMID: 19951989
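The training-set figures quoted in the abstract above (86% accuracy, 91% sensitivity, 80% specificity, 137 NSCLC patients, 91 controls) can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `overall_accuracy` is an assumption, and because the published percentages are rounded, the result can only be expected to agree approximately.

```python
def overall_accuracy(sensitivity, specificity, n_cases, n_controls):
    """Combine sensitivity and specificity into overall accuracy.

    Accuracy is the class-size-weighted average of the two rates:
    (correctly called cases + correctly called controls) / all samples.
    """
    true_positives = sensitivity * n_cases      # expected correctly called cases
    true_negatives = specificity * n_controls   # expected correctly called controls
    return (true_positives + true_negatives) / (n_cases + n_controls)


# Rounded values taken from the abstract; cohort sizes are 137 cases, 91 controls.
acc = overall_accuracy(0.91, 0.80, 137, 91)
print(f"{acc:.3f}")  # prints 0.866
```

The computed value of roughly 86.6% is in line with the reported 86% once rounding of the input percentages is taken into account.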
20.  Genome-wide mapping of RNA Pol-II promoter usage in mouse tissues by ChIP-seq 
Nucleic Acids Research  2010;39(1):190-201.
Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity of mammalian genomes. Identification of alternative promoters and the annotation of their activity in different tissues is one of the major challenges in understanding the transcriptional regulation of mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using an antibody against RNA Pol-II in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38 639 Pol-II promoters, including 12 270 novel promoters, for both protein coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific, and these are CpG poor; we find that only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II bound promoter(s) of each annotated gene in a given tissue, we found that 37% of the protein coding genes use alternative promoters in the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts of characterizing gene regulatory regions in mammalian genomes.
PMCID: PMC3017616  PMID: 20843783
21.  KLF17 is a negative regulator of epithelial-mesenchymal transition and metastasis in breast cancer 
Nature cell biology  2009;11(11):1297-1304.
Metastasis is a complex multi-step process requiring the concerted action of many genes and is the primary cause of cancer deaths. Pathways that regulate metastasis enhancement and suppression both contribute to the tumor dissemination process. In order to identify novel metastasis suppressors, we set up a forward genetic screen in a mouse model. We transduced a genome-wide RNAi library into the non-metastatic 168FARN breast cancer cell line, orthotopically transplanted the cells into mouse mammary fat pads, and then selected for cells that could metastasize to the lung and identified an RNAi for the KLF17 gene. Conversely, we demonstrate that ectopic expression of KLF17 in the highly metastatic 4T1 breast cancer cell line inhibited the ability of these cells to metastasize from the mammary fat pad to the lung. We also show that suppression of KLF17 expression promotes breast cancer cell invasion and epithelial-mesenchymal transition (EMT) and that KLF17 functions by directly binding to the promoter of Id-1, a key metastasis regulator in breast cancer, to inhibit its transcription. Finally, we demonstrate that KLF17 expression is significantly down-regulated in primary human breast cancer samples and that the combined expression patterns of KLF17 and Id-1 can serve as a potential biomarker for lymph node metastasis in breast cancer.
PMCID: PMC2784164  PMID: 19801974
22.  Classification and biomarker identification using gene network modules and support vector machines 
BMC Bioinformatics  2009;10:337.
Classification using microarray datasets is usually based on a small number of samples for which tens of thousands of gene expression measurements have been obtained. The selection of the genes most significant to the classification problem is a challenging issue in high dimension data analysis and interpretation. A previous study with SVM-RCE (Recursive Cluster Elimination), suggested that classification based on groups of correlated genes sometimes exhibits better performance than classification using single genes. Large databases of gene interaction networks provide an important resource for the analysis of genetic phenomena and for classification studies using interacting genes.
We now demonstrate that an algorithm which integrates network information with recursive feature elimination based on SVM exhibits good performance and improves the biological interpretability of the results. We refer to the method as SVM with Recursive Network Elimination (SVM-RNE).
Initially, one thousand genes selected by t-test from a training set are filtered so that only genes that map to a gene network database remain. The Gene Expression Network Analysis Tool (GXNA) is applied to the remaining genes to form n clusters of genes that are highly connected in the network. Linear SVM is used to classify the samples using these clusters, and a weight is assigned to each cluster based on its importance to the classification. The least informative clusters are removed while retaining the remainder for the next classification step. This process is repeated until an optimal classification is obtained.
More than 90% accuracy can be obtained in classification of selected microarray datasets by integrating the interaction network information with the gene expression information from the microarrays.
The Matlab version of SVM-RNE can be downloaded from
PMCID: PMC2774324  PMID: 19832995
23.  An integrative ChIP-chip and gene expression profiling to model SMAD regulatory modules 
BMC Systems Biology  2009;3:73.
The TGF-β/SMAD pathway is part of a broader signaling network in which crosstalk between pathways occurs. While the molecular mechanisms of the TGF-β/SMAD signaling pathway have been studied in detail, the global networks downstream of SMAD remain largely unknown. The regulatory effect of the SMAD complex likely depends on transcriptional modules, in which the SMAD binding elements and partner transcription factor binding sites (SMAD modules) are present in a specific context.
To address this question and develop a computational model for SMAD modules, we simultaneously performed chromatin immunoprecipitation followed by microarray analysis (ChIP-chip) and mRNA expression profiling to identify TGF-β/SMAD regulated and synchronously coexpressed gene sets in ovarian surface epithelium. Intersecting the ChIP-chip and gene expression data yielded 150 direct targets, of which 141 were grouped into 3 co-expressed gene sets (sustained up-regulated, transient up-regulated and down-regulated), based on their temporal changes in expression after TGF-β activation. We developed a data-mining method driven by the Random Forest algorithm to model SMAD transcriptional modules in the target sequences. The predicted SMAD modules contain a SMAD binding element and up to 2 of 7 other transcription factor binding sites (E2F, P53, LEF1, ELK1, COUPTF, PAX4 and DR1).
Together, the computational results further the understanding of the interactions between SMAD and other transcription factors at specific target promoters, and provide the basis for more targeted experimental verification of the co-regulatory modules.
PMCID: PMC2724489  PMID: 19615063
24.  Learning from positive examples when the negative class is undetermined- microRNA gene identification 
The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designation of the negative class can be problematic and if it is not properly done can affect the performance of the classifier dramatically and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using naïve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species.
Of all methods tested, we found that two-class naïve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70–80% versus 90% for the two two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs.
One-class and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when these are not well defined. In such cases, one-class methods can be superior to two-class methods, provided the features chosen as representative of the positive class are well defined.
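The contrast between the two settings can be illustrated with a small sketch. It is an assumption-laden toy, not the OneClassmiRNA program: a per-feature z-score envelope stands in for the one-class learners, a nearest-centroid rule stands in for the two-class naïve Bayes and SVM classifiers, and the two numeric features and all data values are hypothetical.

```python
import statistics


def fit_one_class(positives):
    """Learn per-feature mean and standard deviation from positives only."""
    columns = list(zip(*positives))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 if a feature is constant, to avoid dividing by zero.
    stdevs = [statistics.stdev(col) or 1.0 for col in columns]
    return means, stdevs


def one_class_predict(model, x, max_z=3.0):
    """Accept x only if every feature lies within max_z standard deviations
    of the positive-class mean."""
    means, stdevs = model
    return all(abs(v - m) / s <= max_z for v, m, s in zip(x, means, stdevs))


def centroid(rows):
    """Per-feature mean of a list of feature vectors."""
    return [statistics.mean(col) for col in zip(*rows)]


def fit_two_class(positives, negatives):
    """Nearest-centroid model: needs an explicit negative class."""
    return centroid(positives), centroid(negatives)


def two_class_predict(model, x):
    """Label x positive if it is closer to the positive centroid."""
    pos_c, neg_c = model

    def sq_dist(c):
        return sum((v - m) ** 2 for v, m in zip(x, c))

    return sq_dist(pos_c) < sq_dist(neg_c)


# Hypothetical feature vectors (two made-up numeric features per candidate).
positives = [(0.90, 0.30), (1.00, 0.35), (1.10, 0.32), (0.95, 0.28)]
negatives = [(3.0, 0.90), (3.2, 1.00), (2.9, 0.95)]  # an arbitrary negative set

one_class = fit_one_class(positives)
two_class = fit_two_class(positives, negatives)

typical = (1.0, 0.31)    # resembles the positives
outlier = (-2.0, -0.50)  # unlike the positives and unlike the chosen negatives

if __name__ == "__main__":
    for name, x in (("typical", typical), ("outlier", outlier)):
        print(name, one_class_predict(one_class, x), two_class_predict(two_class, x))
```

In this toy, the outlier is accepted by the two-class rule only because the chosen negatives happen to lie on the other side of the positives, which is the kind of dependence on the negative set that the one-class formulation avoids.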
The OneClassmiRNA program is available at: [1]
PMCID: PMC2248178  PMID: 18226233
25.  A Novel Cross-Disciplinary Multi-Institute Approach to Translational Cancer Research: Lessons Learned from Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC) 
Cancer Informatics  2007;3:255-274.
The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC) is one of the first major project-based initiatives stemming from the Pennsylvania Cancer Alliance and was funded for four years by the Department of Health of the Commonwealth of Pennsylvania. The objective was to initiate a prototype biorepository and bioinformatics infrastructure with a robust data warehouse by developing (1) a statewide data model for bioinformatics and a repository of serum and tissue samples; (2) a data model for biomarker data storage; and (3) a public access website for disseminating research results and bioinformatics tools. The members of the Consortium cooperate closely, exploring the opportunity for sharing clinical, genomic and other bioinformatics data on patient samples in oncology, for the purpose of developing collaborative research programs across cancer research institutions in Pennsylvania. The Consortium’s intention was to establish a virtual repository of many clinical specimens residing in various centers across the state, in order to make them available for research. One of our primary goals was to facilitate the identification of cancer-specific biomarkers and encourage collaborative research efforts among the participating centers.
The PCABC has developed unique partnerships so that every region of the state can effectively contribute and participate. It includes over 80 individuals from 14 organizations, and plans to expand to partners outside the State. This has created a network of researchers, clinicians, bioinformaticians, cancer registrars, program directors, and executives from academic and community health systems, as well as external corporate partners - all working together to accomplish a common mission.
The various sub-committees have developed a common IRB protocol template, common data elements for standardizing data collections for three organ sites, intellectual property/tech transfer agreements, and material transfer agreements that have been approved by each of the member institutions. This was the foundational work that has led to the development of a centralized data warehouse that has met each of the institutions’ IRB/HIPAA standards.
Currently, this “virtual biorepository” has over 58,000 annotated samples from 11,467 cancer patients available for research purposes. The clinical annotation of tissue samples is done either manually over the internet or in semi-automated batch mode by mapping local data elements to PCABC common data elements. The database currently holds information on 7188 cases (associated with 9278 specimens and 46,666 annotated blocks and blood samples) of prostate cancer, 2736 cases (associated with 3796 specimens and 9336 annotated blocks and blood samples) of breast cancer and 1543 cases (including 1334 specimens and 2671 annotated blocks and blood samples) of melanoma. These numbers continue to grow, and plans to integrate new tumor sites are in progress. Furthermore, the group has developed a central web-based tool on the Consortium’s web site that allows investigators to share translational (genomics/proteomics) experiment data from research evaluating potential biomarkers.
The technological achievements and the statewide informatics infrastructure that have been established by the Consortium will enable robust and efficient studies of biomarkers and their relevance to the clinical course of cancer. Studies resulting from the creation of the Consortium may allow for better classification of cancer types, more accurate assessment of disease prognosis, a better ability to identify the most appropriate individuals for clinical trial participation, and better surrogate markers of disease progression and/or response to therapy.
PMCID: PMC2675833  PMID: 19455246

