1.  Relationship of DNA Methylation and Gene Expression in Idiopathic Pulmonary Fibrosis 
Rationale: Idiopathic pulmonary fibrosis (IPF) is an untreatable and often fatal lung disease that is increasing in prevalence and is caused by complex interactions between genetic and environmental factors. Epigenetic mechanisms control gene expression and are likely to regulate the IPF transcriptome.
Objectives: To identify methylation marks that modify gene expression in IPF lung.
Methods: We assessed DNA methylation (comprehensive high-throughput arrays for relative methylation [CHARM]) and gene expression (Agilent gene expression arrays) in 94 patients with IPF and 67 control subjects, and performed integrative genomic analyses to define methylation–gene expression relationships in IPF lung. We validated methylation changes by a targeted analysis (Epityper), and performed functional validation of one of the genes identified by our analysis.
Measurements and Main Results: We identified 2,130 differentially methylated regions (DMRs; <5% false discovery rate), of which 738 are associated with significant changes in gene expression and enriched for the expected inverse relationship between methylation and expression (P < 2.2 × 10⁻¹⁶). We validated 13/15 DMRs by targeted analysis of methylation. Methylation–expression quantitative trait loci (methyl-eQTL) identified methylation marks that control cis and trans gene expression, with an enrichment for cis relationships (P < 2.2 × 10⁻¹⁶). We found five trans methyl-eQTLs where a methylation change at a single DMR is associated with transcriptional changes in a substantial number of genes; four of these DMRs are near transcription factors (castor zinc finger 1 [CASZ1], FOXC1, MXD4, and ZDHHC4). We studied the in vitro effects of change in CASZ1 expression and validated its role in regulation of target genes in the methyl-eQTL.
Conclusions: These results suggest that DNA methylation may be involved in the pathogenesis of IPF.
PMCID: PMC4315819  PMID: 25333685
DNA methylation; gene expression; pulmonary fibrosis; quantitative trait; mapping
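The abstract of entry 1 reports DMRs at a "<5% false discovery rate" but does not say which procedure was used. As a rough illustration only, the sketch below applies the Benjamini-Hochberg adjustment, which is an assumption and not necessarily the authors' method. The function names and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at_fdr(p_values, fdr=0.05):
    """Flag tests whose adjusted p-value falls below the FDR threshold."""
    return [q < fdr for q in benjamini_hochberg(p_values)]


# Toy per-region p-values (hypothetical, not from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_q = benjamini_hochberg(toy_p)
toy_calls = significant_at_fdr(toy_p)
```

In this toy input only the first two regions pass the 5% threshold; the third and fourth share an adjusted value just above it.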
2.  Differentiating progressive from nonprogressive T1 bladder cancer by gene expression profiling: Applying RNA-sequencing analysis on archived specimens 
Urologic oncology  2013;32(3):327-336.
To identify gene signatures in transitional cell carcinoma that can differentiate high-grade T1 nonprogressive (T1NP) bladder cancer (BCa) from those T1 progressive (T1P) tumors that progress to muscularis propria–invasive T2 tumors.
Materials and methods
We performed high-throughput RNA sequencing (RNA-Seq) on formalin-fixed and paraffin-embedded BCa specimens with clinical pathologic characteristics best representing the general clinical development of the disease. For the T1NP group, only patients with long-term follow-up (6–17 y) and periodic examinations (average of 4 resections and 9 cytology tests) were selected. For the T1P group, only patients in whom a complete resection was performed at least 8 months after the initial T1 diagnosis were selected, therefore eliminating the possibility of underdiagnosis. Only samples in which muscularis propria was present and uninvolved were included, further assuring a correct diagnosis. The RNA-Seq reads were mapped to the human genome build NCBI 36 (hg18) using TopHat with no mismatch. After alignment to the transcriptome and expression quantification, a linear statistical model was built using Limma between T1NP and T1P samples to identify differentially expressed genes.
Overall, 5,561 genes were mapped to all samples and used for RNA-Seq analysis to identify a gene signature that was significantly and differentially expressed between patients with T1NP BCa and patients with T1P BCa. Signature-based stratification indicated the gene signature correlated notably with the time of T1 development to T2 tumor, suggesting that the molecular signature might be used as an independent predictor for the pace of high-grade T1 BCa progression.
This is the first demonstration that RNA-Seq can be applied as a powerful tool to study BCa using formalin-fixed and paraffin-embedded specimens. We identified a gene signature that can distinguish patients diagnosed with high-grade T1 BCas that remain as non–muscle invasive tumors from those patients with cancers progressing to muscle-invasive tumors. Our findings will make future large-scale clinical cohort studies and clinical trial–based studies possible and help the development of prognostic tools for accurate prediction of T1 BCa progression that may considerably influence the clinical decision–making process, treatment regimen, and patient survival.
PMCID: PMC4446054  PMID: 24055427
T1 bladder cancer; RNA-Seq; Tumor progression; Molecular signature
3.  CXCR4 pathway associated with family history of melanoma 
Cancer causes & control : CCC  2013;25(1):125-132.
Genetic predisposition plays a major role in the etiology of melanoma, but known genetic markers only account for a limited fraction of family history-associated melanoma cases. Expression microarrays have offered the opportunity to identify further genomic profiles correlated with family history of melanoma. We aimed to distinguish mRNA expression signatures between melanoma cases with and without a family history of melanoma.
Based on the Nurses’ Health Study, family history was defined as having one or more first-degree family members diagnosed with melanoma. Melanoma diagnosis was confirmed by reviewing pathology reports and tumor blocks were collected by mail from across the United States. Genomic interrogation was accomplished through evaluating expression profiling of formalin-fixed paraffin-embedded tissues from 78 primary cutaneous invasive melanoma cases, on either a 6K or whole-genome (24K) Illumina gene chip. Gene Set Enrichment Analysis was performed for each batch to determine the differentially enriched pathways and key contributing genes.
The CXC chemokine receptor 4 (CXCR4) pathway was consistently up-regulated within cases of familial melanoma in both platforms. Leading edge analysis showed four genes from the CXCR4 pathway, including MAPK1, PLCG1, CRK, and PTK2, were among the core members that contributed to the enrichment of this pathway. There was no association between the enrichment of CXCR4 pathway and NRAS, BRAF mutation, or Breslow thickness of the primary melanoma cases.
We found that the CXCR4 pathway might constitute a novel susceptibility pathway associated with family history of melanoma in first-degree relatives.
PMCID: PMC4100594  PMID: 24158781
melanoma; genetic pathway analysis; genome-wide expression profiling
4.  Alterations in Diversity of the Oral Microbiome in Pediatric Inflammatory Bowel Disease 
Inflammatory bowel diseases  2011;18(5):935-942.
Oral pathology is a commonly reported extraintestinal manifestation of Crohn’s disease (CD). The host–microbe interaction has been implicated in the pathogenesis of inflammatory bowel disease (IBD) in genetically susceptible hosts, yet limited information exists about oral microbes in IBD. We hypothesize that the microbiology of the oral cavity may differ in patients with IBD. Our laboratory has developed a 16S rRNA-based technique known as the Human Oral Microbe Identification Microarray (HOMIM) to study the oral microbiome of children and young adults with IBD.
Tongue and buccal mucosal brushings from healthy controls, CD, and ulcerative colitis (UC) patients were analyzed using HOMIM. Shannon Diversity Index (SDI) and Principal Component Analysis (PCA) were employed to compare population and phylum-level changes among our study groups.
In all, 114 unique subjects from the Children’s Hospital Boston were enrolled. Tongue samples from patients with CD showed a significant decrease in overall microbial diversity as compared with the same location in healthy controls (P = 0.015) with significant changes seen in Fusobacteria (P < 0.0002) and Firmicutes (P = 0.022). Tongue samples from patients with UC did not show a significant change in overall microbial diversity as compared with healthy controls (P = 0.418).
As detected by HOMIM, we found a significant decrease in overall diversity in the oral microbiome of pediatric CD. Considering the proposed microbe–host interaction in IBD, the ease of visualization and direct oral mucosal sampling of the oral cavity, further study of the oral microbiome in IBD is of potential diagnostic and prognostic value.
PMCID: PMC4208308  PMID: 21987382
inflammatory bowel disease; Crohn’s disease; ulcerative colitis; microbiome; oral microbial biomarkers
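Entry 4 compares groups by Shannon Diversity Index (SDI). A minimal sketch of the standard formula H = -Σ p_i ln p_i follows. It assumes non-negative per-taxon abundance values as input; how HOMIM signals were converted to abundances is not stated in the abstract, and the example profiles are hypothetical.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with nonzero abundance."""
    total = sum(abundances)
    if total == 0:
        return 0.0
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical abundance profiles for two samples with four taxa each.
even_sample = [25, 25, 25, 25]   # evenly spread across taxa
skewed_sample = [85, 5, 5, 5]    # dominated by one taxon

h_even = shannon_diversity(even_sample)
h_skewed = shannon_diversity(skewed_sample)
```

An evenly distributed community gives the maximum value ln(number of taxa), and dominance by a single taxon lowers the index; a decrease of this kind is what the abstract reports for CD tongue samples.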
5.  Mutational analysis clopidogrel resistance and platelet function in patients scheduled for coronary artery bypass grafting 
Genomics  2013;101(6):313-317.
Clopidogrel is an oral antiplatelet pro-drug prescribed to 40 million patients worldwide who are at risk for thrombotic events or receiving percutaneous coronary intervention (PCI). However, about a fifth of patients treated with clopidogrel do not respond adequately to the drug. From a cohort of 105 patients on whom we had functional data on clopidogrel response, we used ultra-high throughput sequencing to assay mutations in CYP2C19 and ABCB1, the two genes genetically linked to response. Testing for mutations in CYP2C19, as recommended by the FDA, only correctly predicted if a patient would respond to clopidogrel 52.4% of the time. Similarly, testing of the ABCB1 gene only correctly foretold response in 51 (48.6%) patients. These results are clinically relevant and suggest that until additional genetic factors are discovered that predict response more completely, functional assays are more appropriate for clinical use.
PMCID: PMC4149181  PMID: 23462555
Cardiovascular surgery; Ultra-high throughput sequencing; Gene polymorphisms; Platelets
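The percentages in entry 5 can be checked with simple arithmetic. The sketch below reproduces the reported 48.6% for ABCB1 from 51 of 105 patients. The CYP2C19 patient count is not given in the abstract, so the value 55 is an inference that assumes the same 105-patient denominator.

```python
cohort_size = 105  # patients with functional clopidogrel-response data

# ABCB1: the abstract reports 51 correct predictions.
abcb1_correct = 51
abcb1_accuracy = round(100 * abcb1_correct / cohort_size, 1)

# CYP2C19: only the percentage (52.4%) is reported; the count below is
# inferred under the assumption that all 105 patients were evaluated.
cyp2c19_correct_inferred = round(0.524 * cohort_size)
cyp2c19_accuracy = round(100 * cyp2c19_correct_inferred / cohort_size, 1)
```

Both accuracies sit close to 50%, which is about what a coin flip would achieve for a two-outcome prediction and is consistent with the abstract's preference for functional assays.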
6.  Next-generation sequencing and microarray-based interrogation of microRNAs from formalin-fixed, paraffin-embedded tissue: Preliminary assessment of cross-platform concordance 
Genomics  2013;102(1):8-14.
Next-generation sequencing is increasingly employed in biomedical investigations. Strong concordance between microarray and mRNA-seq levels has been reported in high quality specimens but information is lacking on formalin-fixed, paraffin-embedded (FFPE) tissues, and particularly for microRNA (miRNA) analysis. We conducted a preliminary examination of the concordance between miRNA-seq and cDNA-mediated annealing, selection, extension, and ligation (DASL) miRNA assays. Quantitative agreement between platforms is moderate (Spearman correlation 0.514–0.596) and there is discordance of detection calls on a subset of miRNAs. Quantitative PCR (q-RT-PCR) performed for several discordant miRNAs confirmed the presence of most sequences detected by miRNA-seq but not by DASL, but also showed that miRNA-seq did not detect some sequences that DASL confidently detected. Our results suggest that miRNA-seq is specific, with few false positive calls, but it may not detect certain abundant miRNAs in FFPE tissue. Further work is necessary to fully address these issues that are pertinent for translational research.
PMCID: PMC4116671  PMID: 23562991
RNA-sequencing; microarrays; paraffin-tissue
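Entry 6 summarizes cross-platform agreement with a Spearman correlation. A minimal sketch of that statistic is given below: values are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is returned. The function names and the toy expression values are hypothetical and are not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-miRNA measurements on two platforms (hypothetical values).
platform_a = [1.0, 2.0, 3.0, 4.0, 5.0]
platform_b = [3.0, 1.0, 4.0, 2.0, 5.0]
rho = spearman(platform_a, platform_b)
```

Because only ranks are compared, the statistic tolerates the different measurement scales of sequencing counts and array intensities.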
7.  Brain and Testicular Tumors in Mice with Progenitor Cells Lacking BAX and BAK 
Oncogene  2012;32(35):4078-4085.
The pro-apoptotic BCL-2 family proteins BAX and BAK serve as essential gatekeepers of the intrinsic apoptotic pathway and, when activated, transform into pore forming homo-oligomers that permeabilize the mitochondrial outer membrane. Deletion of Bax and Bak causes marked resistance to death stimuli in a variety of cell types. Bax−/−Bak−/− mice are predominantly nonviable and survivors exhibit multiple developmental abnormalities characterized by cellular excess, including accumulation of neural progenitor cells in the periventricular, hippocampal, cerebellar, and olfactory bulb regions of the brain. To explore the long-term pathophysiologic consequences of BAX/BAK deficiency in a stem cell niche, we generated Bak−/− mice with conditional deletion of Bax in Nestin-positive cells. Aged NestinCreBaxfl/flBak−/− mice manifest progressive brain enlargement with a profound accumulation of NeuN- and Sox2-positive neural progenitor cells within the subventricular zone. One-third of the mice develop frank masses comprised of neural progenitors, and in 20% of these cases, more aggressive, hypercellular tumors emerged. Unexpectedly, 60% of NestinCreBaxfl/flBak−/− mice harbored high-grade tumors within the testis, a peripheral site of Nestin expression. This in vivo model of severe apoptotic blockade highlights the constitutive role of BAX/BAK in long-term regulation of Nestin-positive progenitor cell pools, with loss of function predisposing to adult-onset tumorigenesis.
PMCID: PMC3529761  PMID: 22986529
BAX; BAK; neural progenitor cell; apoptosis; tumorigenesis
8.  Live Attenuated Rev-Independent Nef¯SIV Enhances Acquisition of Heterologous SIVsmE660 in Acutely Vaccinated Rhesus Macaques 
PLoS ONE  2013;8(9):e75556.
Rhesus macaques (RMs) inoculated with live-attenuated Rev-Independent Nef¯ simian immunodeficiency virus (Rev-Ind Nef¯SIV) as adults or neonates controlled viremia to undetectable levels and showed no signs of immunodeficiency over 6-8 years of follow-up. We tested the capacity of this live-attenuated virus to protect RMs against pathogenic, heterologous SIVsmE660 challenges.
Methodology/Principal Findings
Three groups of four RM were inoculated with Rev-Ind Nef¯SIV and compared. Group 1 was inoculated 8 years prior and again 15 months before low dose intrarectal challenges with SIVsmE660. Group 2 animals were inoculated with Rev-Ind Nef¯SIV at 15 months and Group 3 at 2 weeks prior to the SIVsmE660 challenges, respectively. Group 4 served as unvaccinated controls. All RMs underwent repeated weekly low-dose intrarectal challenges with SIVsmE660. Surprisingly, all RMs with acute live-attenuated virus infection (Group 3) became superinfected with the challenge virus, in contrast to the two other vaccine groups (Groups 1 and 2) (P=0.006 for each) and controls (Group 4) (P=0.022). Gene expression analysis showed significant upregulation of innate immune response-related chemokines and their receptors, most notably CCR5 in Group 3 animals during acute infection with Rev-Ind Nef¯SIV.
We conclude that although Rev-Ind Nef¯SIV remained apathogenic, acute replication of the vaccine strain was not protective but associated with increased acquisition of heterologous mucosal SIVsmE660 challenges.
PMCID: PMC3787041  PMID: 24098702
9.  Integrative Genomic Analysis Implicates Gain of PIK3CA at 3q26 and MYC at 8q24 in Chronic Lymphocytic Leukemia 
The disease course of chronic lymphocytic leukemia (CLL) varies significantly within cytogenetic groups. We hypothesized that high resolution genomic analysis of CLL would identify additional recurrent abnormalities associated with short time to first therapy (TTFT).
Experimental Design
We undertook high resolution genomic analysis of 161 prospectively enrolled CLLs using Affymetrix 6.0 SNP arrays, and integrated analysis of this dataset with gene expression profiles.
Copy number analysis (CNA) of nonprogressive CLL reveals a stable genotype, with a median of only 1 somatic CNA per sample. Progressive CLL with 13q deletion was associated with additional somatic CNAs, and a greater number of CNAs was predictive of TTFT. We identified other recurrent CNAs associated with short TTFT: 8q24 amplification focused on the cancer susceptibility locus near MYC in 3.7%; 3q26 amplifications focused on PIK3CA in 5.6%; and 8p deletions in 5% of patients. Sequencing of MYC further identified somatic mutations in two CLLs. We determined which catalytic subunits of PI3K were in active complex with the p85 regulatory subunit, and demonstrated enrichment for the alpha subunit in three CLLs carrying PIK3CA amplification.
Our findings implicate amplifications of 3q26 focused on PIK3CA and 8q24 focused on MYC in CLL.
PMCID: PMC3719990  PMID: 22623730
CLL; PIK3CA; MYC; genomics; copy number
10.  Ovarian reserve status in young women is associated with altered gene expression in membrana granulosa cells 
Molecular Human Reproduction  2012;18(7):362-371.
Diminished ovarian reserve (DOR) is a challenging diagnosis of infertility, as there are currently no tests to predict who may become affected with this condition, or at what age. We designed the present study to compare the gene expression profile of membrana granulosa cells from young women affected with DOR with those from egg donors of similar age and to determine if distinct genetic patterns could be identified to provide insight into the etiology of DOR. Young women with DOR were identified based on FSH level in conjunction with poor follicular development during an IVF cycle (n = 13). Egg donors with normal ovarian reserve (NOR) comprised the control group (n = 13). Granulosa cells were collected following retrieval, RNA was extracted and microarray analysis was conducted to evaluate genetic differences between the groups. Confirmatory studies were undertaken with quantitative RT–PCR (qRT–PCR). Multiple significant differences in gene expression were observed between the DOR patients and egg donors. Two genes linked with ovarian function, anti-Mullerian hormone (AMH) and luteinizing hormone receptor (LHCGR), were further analyzed with qRT–PCR in all patients. The average expression of AMH was significantly higher in egg donors (adjusted P-value = 0.01), and the average expression of LHCGR was significantly higher in DOR patients (adjusted P-value = 0.005). Expression levels for four additional genes, progesterone receptor membrane component 2 (PGRMC2), prostaglandin E receptor 3 (subtype EP3) (PTGER3), steroidogenic acute regulatory protein (StAR), and StAR-related lipid transfer domain containing 4 (StarD4), were validated in a group consisting of five NOR and five DOR patients. We conclude that gene expression analysis has substantial potential to determine which young women may be affected with DOR. 
More importantly, our analysis suggests that DOR patients fall into two distinct subgroups based on gene expression profiles, indicating that different mechanisms may be involved during development of this pathology.
PMCID: PMC3378309  PMID: 22355044
granulosa cells; oocyte quality; diminished ovarian reserve; IVF; microarray analysis
Clinical chemistry  2011;58(3):580-589.
Despite widespread interest in the application of next-generation sequencing (NGS) to the mutation profiling of individual cancer specimens, the onset of personalized clinical genomics is currently stalled due in part to technical hurdles. As tumors are genetically heterogeneous and often mixed with normal/stromal cells, the resulting low-abundance DNA somatic mutations often produce ambiguous results or fall below the current NGS detection limit, thus hindering mutation calling that abides by clinical sensitivity/specificity standards. Here we examine the feasibility of applying COLD-PCR, a form of PCR that selectively magnifies mutations, to boost the detection of unknown rare somatic mutations prior to applying NGS-based amplicon re-sequencing to clinical samples. We amplified DNA from mutation-containing human cell lines serially diluted into wild-type (WT) DNA, as well as from lung adenocarcinoma and colorectal cancer specimens, using COLD-PCR or conventional PCR for comparison. Following individual amplification of TP53, KRAS, IDH1, and EGFR regions, PCR products were barcoded, pooled for library preparation and sequenced on the Illumina-HiSeq2000 platform. Regardless of sequencing depth, sequencing errors dictated a mutation-detection limit of ~1–2% mutation abundance in conventional PCR amplicons analyzed by NGS. In contrast, COLD-PCR amplicons enabled genuine mutations to exceed the sequence noise levels, thus allowing reliable identification of mutation abundances of ~0.04%. Sequencing depth was not a significant factor in the identification of COLD-PCR-magnified mutations. The analyzed clinical specimens revealed several TP53 and KRAS missense mutations that could not be called following NGS of conventional amplicons, yet were clearly detectable in COLD-PCR amplicons. Extensive tumor heterogeneity in the TP53 gene was revealed in some samples.
As cancer care shifts toward personalized intervention, based on the unique genetic abnormalities in each patient’s tumor genome, we anticipate that COLD-PCR-NGS will elucidate the role of rare mutations in tumors, enable NGS-based analysis of diverse clinical specimens and the broad interfacing of NGS with clinical practice.
PMCID: PMC3418918  PMID: 22194627
COLD-PCR; mutation enrichment; low-abundance mutations; next generation sequencing; cancer
12.  Interpreting cancer genomes using systematic host perturbations by tumour virus proteins 
Nature  2012;487(7408):491-495.
Genotypic differences greatly influence susceptibility and resistance to disease. Understanding genotype-phenotype relationships requires that phenotypes be viewed as manifestations of network properties, rather than simply as the result of individual genomic variations¹. Genome sequencing efforts have identified numerous germline mutations associated with cancer predisposition and large numbers of somatic genomic alterations². However, it remains challenging to distinguish between background, or “passenger” and causal, or “driver” cancer mutations in these datasets. Human viruses intrinsically depend on their host cell during the course of infection and can elicit pathological phenotypes similar to those arising from mutations³. To test the hypothesis that genomic variations and tumour viruses may cause cancer via related mechanisms, we systematically examined host interactome and transcriptome network perturbations caused by DNA tumour virus proteins. The resulting integrated viral perturbation data reflects rewiring of the host cell networks, and highlights pathways that go awry in cancer, such as Notch signalling and apoptosis. We show that systematic analyses of host targets of viral proteins can identify cancer genes with a success rate on par with their identification through functional genomics and large-scale cataloguing of tumour mutations. Together, these complementary approaches result in increased specificity for cancer gene identification. Combining systems-level studies of pathogen-encoded gene products with genomic approaches will facilitate prioritization of cancer-causing driver genes so as to advance understanding of the genetic basis of human cancer.
PMCID: PMC3408847  PMID: 22810586
16.  A microRNA activity map of human mesenchymal tumors: connections to oncogenic pathways; an integrative transcriptomic study 
BMC Genomics  2012;13:332.
MicroRNAs (miRNAs) are nucleic acid regulators of many human mRNAs, and are associated with many tumorigenic processes. miRNA expression levels have been used in profiling studies, but some evidence suggests that expression levels do not fully capture miRNA regulatory activity. In this study we integrate multiple gene expression datasets to determine miRNA activity patterns associated with cancer phenotypes and oncogenic pathways in mesenchymal tumors – a very heterogeneous class of malignancies.
Using a computational method, we identified differentially activated miRNAs between 77 normal tissue specimens and 135 sarcomas and we validated many of these findings with microarray interrogation of an independent, paraffin-based cohort of 18 tumors. We also showed that miRNA activity is imperfectly correlated with miRNA expression levels. Using next-generation miRNA sequencing we identified potential base sequence alterations which may explain differential activity. We then analyzed miRNA activity changes related to the RAS-pathway and found 21 miRNAs that switch from silenced to activated status in parallel with RAS activation. Importantly, nearly half of these 21 miRNAs were predicted to regulate integral parts of the miRNA processing machinery, and our gene expression analysis revealed significant reductions of these transcripts in RAS-active tumors. These results suggest an association between RAS signaling and miRNA processing in which miRNAs may attenuate their own biogenesis.
Our study represents the first gene expression-based investigation of miRNA regulatory activity in human sarcomas, and our findings indicate that miRNA activity patterns derived from integrated transcriptomic data are reproducible and biologically informative in cancer. We identified an association between RAS signaling and miRNA processing, and demonstrated sequence alterations as plausible causes for differential miRNA activity. Finally, our study highlights the value of systems level integrative miRNA/mRNA assessment with high-throughput genomic data, and the applicability of paraffin-tissue-derived RNA for validation of novel findings.
PMCID: PMC3443663  PMID: 22823907
MicroRNA; Microarray; RAS; Mesenchymal tumors; MicroRNA biogenesis
17.  A function for cyclin D1 in DNA repair uncovered by interactome analyses in human cancers 
Nature  2011;474(7350):230-234.
Cyclin D1 is a component of the core cell cycle machinery¹. Abnormally high levels of cyclin D1 are detected in many human cancer types². To elucidate the molecular functions of cyclin D1 in human cancers, here we performed a proteomic screen for cyclin D1 protein partners in several types of human tumors. Analyses of cyclin D1-interactors revealed a network of DNA repair proteins, including RAD51, a recombinase that drives the homologous recombination process³. We found that cyclin D1 directly binds RAD51, and that cyclin D1-RAD51 interaction is induced by radiation. Like RAD51, cyclin D1 is recruited to DNA damage sites in a BRCA2-dependent fashion. Reduction of cyclin D1 levels in human cancer cells impaired recruitment of RAD51 to damaged DNA, impeded the homologous recombination-mediated DNA repair, and increased sensitivity of cells to radiation in vitro and in vivo. This effect was seen in cancer cells lacking the retinoblastoma protein, which do not require D-cyclins for proliferation⁴,⁵. These findings reveal an unexpected function of a core cell cycle protein in DNA repair and suggest that targeting cyclin D1 may be beneficial also in retinoblastoma-negative cancers which are currently thought to be oblivious to cyclin D1 inhibition.
PMCID: PMC3134411  PMID: 21654808
18.  The Stem Cell Discovery Engine: an integrated repository and analysis system for cancer stem cell comparisons 
Nucleic Acids Research  2011;40(Database issue):D984-D991.
Mounting evidence suggests that malignant tumors are initiated and maintained by a subpopulation of cancerous cells with biological properties similar to those of normal stem cells. However, descriptions of stem-like gene and pathway signatures in cancers are inconsistent across experimental systems. Driven by a need to improve our understanding of molecular processes that are common and unique across cancer stem cells (CSCs), we have developed the Stem Cell Discovery Engine (SCDE)—an online database of curated CSC experiments coupled to the Galaxy analytical framework. The SCDE allows users to consistently describe, share and compare CSC data at the gene and pathway level. Our initial focus has been on carefully curating tissue and cancer stem cell-related experiments from blood, intestine and brain to create a high quality resource containing 53 public studies and 1098 assays. The experimental information is captured and stored in the multi-omics Investigation/Study/Assay (ISA-Tab) format and can be queried in the data repository. A linked Galaxy framework provides a comprehensive, flexible environment populated with novel tools for gene list comparisons against molecular signatures in GeneSigDB and MSigDB, curated experiments in the SCDE and pathways in WikiPathways. The SCDE is available at
PMCID: PMC3245064  PMID: 22121217
19.  GeneSigDB: a manually curated database and resource for analysis of gene expression signatures 
Nucleic Acids Research  2011;40(Database issue):D1060-D1066.
GeneSigDB ( or is a database of gene signatures that have been extracted and manually curated from the published literature. It provides a standardized resource of published prognostic, diagnostic and other gene signatures of cancer and related disease to the community so they can compare the predictive power of gene signatures or use these in gene set enrichment analysis. Since GeneSigDB release 1.0, we have expanded from 575 to 3515 gene signatures, which were collected and transcribed from 1604 published articles largely focused on gene expression in cancer, stem cells, immune cells, development and lung disease. We have made substantial upgrades to the GeneSigDB website to improve accessibility and usability, including adding a tag cloud browse function, faceted navigation and a ‘basket’ feature to store genes or gene signatures of interest. Users can analyze GeneSigDB gene signatures, or upload their own gene list, to identify gene signatures with significant gene overlap, and results can be viewed on a dynamic editable heatmap that can be downloaded as a publication quality image. All data in GeneSigDB can be downloaded in numerous formats including .gmt file format for gene set enrichment analysis or as an R/Bioconductor data file. GeneSigDB is available from
PMCID: PMC3245038  PMID: 22110038
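Entry 19 mentions downloads in .gmt format and comparison of a user gene list against stored signatures. The sketch below parses the common GMT layout (one gene set per line: name, description, then genes, all tab-separated) and ranks sets by the number of shared genes. It only counts overlap and does not attempt the significance test the abstract refers to. The signature names and their contents are made-up toy data, and the function names are hypothetical rather than part of GeneSigDB.

```python
def parse_gmt(text):
    """Parse GMT text into {set_name: set_of_genes}.

    Each line is: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines with fewer than three fields are skipped.
    """
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def rank_by_overlap(query_genes, gene_sets):
    """Return (set_name, shared_gene_count) pairs, largest overlap first."""
    query = set(query_genes)
    hits = [(name, len(query & genes)) for name, genes in gene_sets.items()]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Toy GMT content (hypothetical signatures, not real GeneSigDB records).
toy_gmt = (
    "TOY_SIG_A\ttoy description\tTP53\tKRAS\tEGFR\n"
    "TOY_SIG_B\ttoy description\tMYC\tPIK3CA\n"
)
toy_sets = parse_gmt(toy_gmt)
toy_ranking = rank_by_overlap(["TP53", "KRAS", "MYC"], toy_sets)
```

Sets are used for the gene lists so that duplicate symbols within a signature do not inflate the overlap count.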
20.  Predictive networks: a flexible, open source, web application for integration and analysis of human gene networks 
Nucleic Acids Research  2011;40(Database issue):D866-D875.
Genomics provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is to use these ‘known’ interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from The PN code base is freely available at
PMCID: PMC3245161  PMID: 22096235
21.  GeneSigDB—a curated database of gene expression signatures 
Nucleic Acids Research  2009;38(Database issue):D716-D725.
The primary objective of most gene expression studies is the identification of one or more gene signatures; lists of genes whose transcriptional levels are uniquely associated with a specific biological phenotype. Whilst thousands of experimentally derived gene signatures are published, their potential value to the community is limited by their computational inaccessibility. Gene signatures are embedded in published article figures, tables or in supplementary materials, and are frequently presented using non-standard gene or probeset nomenclature. We present GeneSigDB ( a manually curated database of gene expression signatures. GeneSigDB release 1.0 focuses on cancer and stem cells gene signatures and was constructed from more than 850 publications from which we manually transcribed 575 gene signatures. Most gene signatures (n = 560) were successfully mapped to the genome to extract standardized lists of EnsEMBL gene identifiers. GeneSigDB provides the original gene signature, the standardized gene list and a fully traceable gene mapping history for each gene from the original transcribed data table through to the standardized list of genes. The GeneSigDB web portal is easy to search, allows users to compare their own gene list to those in the database, and download gene signatures in most common gene identifier formats.
PMCID: PMC2808880  PMID: 19934259
22.  CRX Is a Diagnostic Marker of Retinal and Pineal Lineage Tumors 
PLoS ONE  2009;4(11):e7932.
CRX is a homeobox transcription factor whose expression and function is critical to maintain retinal and pineal lineage cells and their progenitors. To determine the biologic and diagnostic potential of CRX in human tumors of the retina and pineal, we examined its expression in multiple settings.
Methodology/Principal Findings
Using in situ hybridization and immunohistochemistry we show that Crx RNA and protein expression are exquisitely lineage restricted to retinal and pineal cells during normal mouse and human development. Gene expression profiling analysis of a wide range of human cancers and cancer cell lines also supports that CRX RNA is highly lineage restricted in cancer. Immunohistochemical analysis of 22 retinoblastomas and 13 pineal parenchymal tumors demonstrated strong expression of CRX in over 95% of these tumors. Importantly, CRX was not detected in the majority of tumors considered in the differential diagnosis of pineal region tumors (n = 78). The notable exception was medulloblastoma, 40% of which exhibited CRX expression in a heterogeneous pattern readily distinguished from that seen in retino-pineal tumors.
These findings describe new potential roles for CRX in human cancers and highlight the general utility of lineage restricted transcription factors in cancer biology. They also identify CRX as a sensitive and specific clinical marker and a potential lineage dependent therapeutic target in retinoblastoma and pineoblastoma.
PMCID: PMC2775954  PMID: 19936203

