Copy number alterations (CNAs) can be observed in most cancer patients. Several oncogenes and tumor suppressor genes with CNAs have been identified in different kinds of tumors. However, a systematic survey of CNA-affected functions is still lacking. By employing systems biology approaches, instead of examining individual genes, we directly identified functional hotspots on the human genome. A total of 838 hotspots with 540 enriched Gene Ontology functions were identified. Array comparative genomic hybridization (aCGH) data from 76 hepatocellular carcinoma (HCC) tumors were employed in this study. A total of 150 regions putatively affected by CNAs, together with their encoded functions, were identified. Our results indicate that two immune-related hotspots had copy number alterations in most patients. In addition, our data implied that these immune-related regions might be involved in HCC oncogenesis. We also identified 39 hotspots whose copy number status was associated with patient survival. Our data implied that copy number alterations of these regions may contribute to the dysregulation of the encoded functions. These results further demonstrated that our method enables researchers to survey the biological functions of CNAs and to construct regulation hypotheses at the pathway and functional levels.
Copy number alteration; Gene set enrichment; Pathway analysis; Liver cancer
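As an illustration of how a genomic window might be tested for functional enrichment, the following is a minimal sketch that assumes a one-sided hypergeometric test. The abstract does not state which test was used, and the function names, gene identifiers, and toy data are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def window_enrichment(window_genes, annotation, background):
    """Enrichment p-value of one annotation term within one genomic window."""
    background = set(background)
    window = set(window_genes) & background
    annotated = set(annotation) & background
    k = len(window & annotated)
    return hypergeom_sf(k, len(background), len(annotated), len(window))

# Toy data (hypothetical): 100 background genes, 10 annotated as "immune",
# and a 10-gene window that contains 5 of the annotated genes.
background = [f"g{i}" for i in range(100)]
immune = background[:10]
window = background[:5] + background[50:55]
p_value = window_enrichment(window, immune, background)
```

In a genome-wide scan, a test of this kind would be repeated for every window and every term, so the resulting p-values would need multiple-testing correction before any window is called a hotspot.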
The role of adjuvant radiotherapy (RT) for patients with stage III thymoma after complete resection remains unclear. Some authors have advocated postoperative RT after complete tumor resection, whereas others have suggested observation. In this study, we retrospectively evaluated the effect of postoperative RT on survival as well as tumor control in patients with Masaoka stage III thymoma.
Between June 1982 and December 2010, 65 patients who underwent complete resection of stage III thymoma entered the study. Fifty-three patients had adjuvant RT after surgery (S + R) and 12 had surgery only (S alone). Of patients who had adjuvant RT, 28 had three-dimensional conformal RT (3D-CRT)/intensity modulated RT (IMRT) and 25 had conventional RT. A median prescribed dose of 56 Gy (range, 28–60 Gy) was given.
The median follow-up time was 50 months (range, 5–360 months). Five- and 10-year overall survival (OS) rates were 91.7% and 71.6%, respectively, for S + R and 81.5% and 65.2%, respectively, for S alone (P = 0.5). In the subgroup analysis, patients treated with 3D-CRT/IMRT showed a trend toward an improved 5-year OS rate compared with those treated with conventional RT (100% vs. 86.9%, P = 0.12). Compared with S alone, 3D-CRT/IMRT significantly improved the 5-year OS rate (100% vs. 81.5%, P = 0.049). Relapses occurred in 15 patients (23.1%). There was a trend toward a lower crude local recurrence rate for S + R (3.8%) compared with S alone (16.7%) (P = 0.09), whereas the crude regional recurrence rates were similar (P = 0.9). No clear dose–response relationship was found according to prescribed doses.
Adjuvant 3D-CRT/IMRT showed potential advantages in improving survival and reducing relapse in patients with stage III thymoma after complete resection, whereas adjuvant RT did not significantly improve survival or reduce recurrence for the cohort as a whole. Doses of ≤ 50 Gy may be effective and could be prescribed for adjuvant RT. To confirm the role of adjuvant 3D-CRT/IMRT in patients who undergo a complete resection of thymoma, a multicenter randomized study should be performed.
Thymoma; Radiation; Surgery; Failure pattern
Extracting maximal information from gene signature sets (GSSs) via microarray-based transcriptional profiling involves assigning function to up- and down-regulated genes. Here we present a novel sample scoring method called Signature-score (S-score), which can be used to quantify the expression pattern of tumor samples from previously identified gene signature sets. A simulation demonstrated improved accuracy and robustness of the S-score method compared with other scoring methods. By applying the S-score method to cholangiocarcinoma (CAC), an aggressive hepatic cancer that arises from bile duct cells, we identified enriched oncogenic pathways in two large CAC data sets. Thirteen pathways were enriched in CAC compared with normal liver and bile duct. Moreover, using S-score, we were able to dissect correlations between CAC-associated oncogenic pathways and Gene Ontology functions. Two major oncogenic clusters and associated functions were identified. Cluster 1, which included beta-catenin and Ras, showed a positive correlation with the cell cycle, while cluster 2, which included TGF-beta, cytokeratin 19 and EpCAM, was inversely correlated with immune function. We also used S-score to identify pathways that are differentially expressed in CAC and hepatocellular carcinoma (HCC), the more common subtype of liver cancer. Our results demonstrate the utility and effectiveness of S-score in assigning functional roles to tumor-associated gene signature sets and in identifying potential therapeutic targets for specific liver cancer subtypes.
gene signature set; pathway analysis; S-score method; tumor classification
DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. With high-throughput sequencing technologies, genome-wide methylation profiles in cancer cells can now be determined efficiently. In addition, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters.
Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. A total of 191 patient samples (169 tumor and 22 normal specimens) and 41 breast cancer cell lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. To date, this provides comprehensive, genome-wide epigenetic portraits of human breast cancer and endometrial cancer. Two views are provided to help users better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for the interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, and interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework.
CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/.
MicroRNAs (miRNAs) have been implicated in the control of many biological processes, and their deregulation has been associated with many cancers. In recent years, the cancer stem cell (CSC) concept has been applied to many cancers, including pediatric cancers. We hypothesized that a common signature of deregulated miRNAs in the CSC fraction may explain the disrupted signaling pathways in CSCs.
Using a high-throughput qPCR approach, we identified 26 CSC-associated differentially expressed miRNAs (DEmiRs). Using the BCmicrO algorithm, 865 potential CSC-associated DEmiR targets were obtained. These potential targets were subjected to KEGG, Biocarta and Gene Ontology pathway and biological process analysis. Four annotated pathways were enriched: cell cycle, cell proliferation, p53 and TGF-beta/BMP. Knocking down hsa-miR-21-5p, hsa-miR-181c-5p and hsa-miR-135b-5p using antisense oligonucleotides and small interfering RNA in cell lines led to the depletion of the CSC fraction and impairment of sphere formation (CSC surrogate assays).
Our findings indicated that CSC associated DEmiRs and the putative pathways they regulate may have potential therapeutic applications in pediatric cancers.
Background & Aims
Hepatocellular carcinoma (HCC) is an aggressive malignancy; its mechanisms of development and progression are poorly understood. We used an integrative approach to identify HCC driver genes, defined as genes whose copy numbers associate with gene expression and cancer progression.
We combined data from high-resolution, array-based comparative genomic hybridization (CGH) and transcriptome analysis of HCC samples from 76 patients with hepatitis B virus infection with data on patient survival times. Candidate genes were functionally validated using in vitro and in vivo models.
Unsupervised analyses of array CGH data associated loss of chromosome 8p with poor outcome (reduced survival time); somatic copy number alterations correlated with expression of 27.3% of genes analyzed. We associated expression levels of 10 of these genes with patient survival times in 2 independent cohorts (comprising 319 cases of HCC with mixed etiology) and 3 breast cancer cohorts (637 cases). Within the 10-gene signature, a cluster of 6 genes on 8p (DLC1, CCDC25, ELP3, PROSC, SH2D4A, and SORBS3) was deleted in HCCs from patients with poor outcomes. In vitro and in vivo analyses indicated that the products of PROSC, SH2D4A, and SORBS3 have tumor-suppressive activities, along with the known tumor suppressor gene, DLC1.
We used an unbiased approach to identify 10 genes associated with HCC progression. These might be used in assisting diagnosis and to stage tumors based on gene expression patterns.
Liver Cancer; Tumor Profiling; Cancer Driver Genes
Myelodysplastic syndrome (MDS) is a complex family of pre-leukemic diseases in which hematopoietic stem cell defects lead to abnormal differentiation in one or more blood lineages. Disease progression is associated with increasing genomic instability and a large proportion of patients go on to develop acute myeloid leukemia. Primarily a disease of the elderly, it can also develop following chemotherapy. We have previously reported that CREB binding protein (Crebbp) heterozygous mice have an increased incidence of hematological malignancies, and others have shown that CREBBP is one of the genes altered by chromosomal translocations found in patients suffering from therapy-related MDS. This led us to investigate whether hematopoietic tumor development in Crebbp+/- mice is preceded by a myelodysplastic phase and whether we could uncover molecular mechanisms that might contribute to its development. We report here that Crebbp+/- mice invariably develop myelodysplastic/myeloproliferative neoplasm within 9-12 months of age. They are also hypersensitive to ionizing radiation and show a marked decrease in PARP1 activity after irradiation. In addition, protein levels of XRCC1 and APEX1, key components of base excision repair machinery, are reduced in unirradiated Crebbp+/- cells or upon targeted knock down of CREBBP levels. Our results thus provide validation of a novel myelodysplastic/myeloproliferative neoplasm mouse model and, more importantly, point to defective repair of DNA damage as a contributing factor to the pathogenesis of this currently incurable disease.
CREBBP; MDS/MPN; DNA repair; radiation hypersensitivity; PARP1
Common analyses of microarray and next-generation sequencing data concentrate on tumor subtype classification, marker detection, and the discovery of transcriptional regulation during biological processes by exploring correlated gene expression patterns and their shared functions. Genetic regulatory network (GRN) based approaches have been employed in many large studies to scrutinize dysregulation and potential treatment controls. In addition to gene regulation and network construction, the concept of a network modulator that has significant systemic impact has been proposed, and detection algorithms have been developed in past years. Here we provide a unified mathematical description of these methods, followed by a brief survey of these modulator identification algorithms. As an early attempt to extend the modulator framework to a new RNA regulation mechanism, competitive endogenous RNA (ceRNA), we provide two applications to illustrate the network construction, the modulation effect, and the preliminary findings from these networks. The methods we surveyed and developed are used to dissect the regulated network under different modulators. Beyond these applications, the concept of “modulation” can be adapted to various biological mechanisms to discover novel gene regulation mechanisms.
Increasing evidence suggests that chromosomal regions containing microRNAs are functionally important in cancers. Here, we show that genomic loci encoding miR-204 are frequently lost in multiple cancers, including ovarian cancers, pediatric renal tumors, and breast cancers. MiR-204 shows drastically reduced expression in several cancers and acts as a potent tumor suppressor, inhibiting tumor metastasis in vivo when systemically delivered. We demonstrated that miR-204 exerts its function by targeting genes involved in tumorigenesis, including brain-derived neurotrophic factor (BDNF), a neurotrophin family member that is known to promote tumor angiogenesis and invasiveness. Analysis of primary tumors shows that increased expression of BDNF or its receptor tropomyosin-related kinase B (TrkB) parallels a markedly reduced expression of miR-204. Our results reveal that loss of miR-204 results in BDNF overexpression and subsequent activation of the small GTPase Rac1 and actin reorganization through the AKT/mTOR signaling pathway, leading to cancer cell migration and invasion. These results suggest that microdeletion of genomic loci containing miR-204 is directly linked with the deregulation of key oncogenic pathways that provide crucial stimulus for tumor growth and metastasis. Our findings provide a strong rationale for manipulating miR-204 levels therapeutically to suppress tumor metastasis.
MicroRNAs (miRNAs) are non-coding RNAs of 19-25 nucleotides known to have important post-transcriptional regulatory functions. Computational target prediction is vital to effective experimental testing. However, because existing algorithms rely on different features and classifiers, their results agree poorly with one another. To benefit from the advantages of different algorithms, we proposed an algorithm called BCmicrO that combines the predictions of different algorithms using a Bayesian network. BCmicrO was evaluated using the training data and the proteomic data. The results show that BCmicrO improves on both the sensitivity and the specificity of each individual algorithm. All the related materials, including genome-wide prediction of human targets and a web-based tool, are available at http://compgenomics.utsa.edu/gene/gene_1.php.
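The idea of combining several predictors probabilistically can be sketched with the simplest Bayesian network, a naive Bayes model in which each predictor's call is conditionally independent given the true label. This is an illustrative assumption only: the abstract does not describe the network structure BCmicrO uses, and the predictor names and error rates below are hypothetical.

```python
def combine_predictions(calls, sensitivity, specificity, prior=0.5):
    """Posterior probability that a site is a true target.

    calls       -- dict: predictor name -> bool (predicted as target or not)
    sensitivity -- dict: predictor name -> P(call is True  | true target)
    specificity -- dict: predictor name -> P(call is False | non-target)
    prior       -- prior probability of being a true target
    """
    like_pos = prior          # running joint probability with "target"
    like_neg = 1.0 - prior    # running joint probability with "non-target"
    for name, called in calls.items():
        sens, spec = sensitivity[name], specificity[name]
        like_pos *= sens if called else (1.0 - sens)
        like_neg *= (1.0 - spec) if called else spec
    return like_pos / (like_pos + like_neg)

# Hypothetical predictors and error rates, for illustration only.
sens = {"algo_A": 0.8, "algo_B": 0.8}
spec = {"algo_A": 0.7, "algo_B": 0.7}
both_agree = combine_predictions({"algo_A": True, "algo_B": True}, sens, spec)
only_one = combine_predictions({"algo_A": True}, sens, spec)
disagree = combine_predictions({"algo_A": True, "algo_B": False}, sens, spec)
```

Under these assumptions, two agreeing predictors yield a higher posterior than either alone, which is the intuition behind pooling algorithms that rely on different features.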
Despite an initial response to adjuvant chemotherapy, ovarian cancer patients treated with the combination of paclitaxel and carboplatin frequently suffer from recurrence after a few cycles of treatment, and the underlying mechanisms causing the chemoresistance remain unclear. Recently, The Cancer Genome Atlas (TCGA) research network concluded an ovarian cancer study and released the dataset to the public. The TCGA dataset has a large sample size, comprehensive molecular profiles, and clinical outcome information; however, because of the unknown molecular subtypes in ovarian cancer and the great diversity of adjuvant treatments that TCGA patients went through, studying chemotherapeutic response using the TCGA data is difficult. Additionally, factors such as sample batches, patient ages, and tumor stages further confound or suppress the identification of relevant genes, and thus of the biological functions and disease mechanisms.
To address these issues, we propose an analysis procedure designed to reduce the suppression effect by focusing on a specific chemotherapeutic treatment, and to remove confounding effects such as batch effect, patient age, and tumor stage. The proposed procedure starts with a batch effect adjustment, followed by a rigorous sample selection process. Then, the gene expression, copy number, and methylation profiles from the TCGA ovarian cancer dataset are analyzed using a semi-supervised clustering method combined with a novel scoring function. As a result, two molecular classifications enriched with unfavorable scores are identified, one with poor copy number profiles and one with poor methylation profiles. Compared with the samples enriched with favorable scores, these two classifications exhibit poor progression-free survival (PFS) and might be associated with poor chemotherapy response specifically to the combination of paclitaxel and carboplatin. Significant genes and biological processes are subsequently detected using classical statistical approaches and enrichment analysis.
The proposed procedure for the reduction of confounding and suppression effects and the semi-supervised clustering method are essential steps to identify genes associated with the chemotherapeutic response.
One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets.
After gene set enrichment finds representative pathways for large gene sets, pathways are consolidated into representative pathway concepts. Three complementary but different methods of pathway consolidation are explored. Enrichment Consolidation combines the pathways enriched for the signature gene list by iteratively merging enriched pathways with other pathways that have similar signature gene sets; Weighted Consolidation utilizes a Protein-Protein Interaction network based gene-weighting approach that finds clusters of both enriched and non-enriched pathways limited to the experiment's resultant gene list; and finally the de novo Consolidation method uses several measurements of pathway similarity to find static pathway clusters independent of any given experiment.
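The general idea of merging redundant pathways by gene overlap can be sketched as follows. This is a generic illustration that assumes Jaccard similarity and a single greedy pass; it is not one of the three consolidation methods described above, and the pathway names, gene sets, and threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def consolidate(pathways, threshold=0.5):
    """Greedily merge pathways whose gene sets overlap strongly.

    pathways -- dict: pathway name -> iterable of gene symbols
    Returns a list of concepts, each with member pathway names and pooled genes.
    """
    concepts = []
    for name, genes in pathways.items():
        genes = set(genes)
        for concept in concepts:
            if jaccard(genes, concept["genes"]) >= threshold:
                concept["names"].append(name)   # join the first matching concept
                concept["genes"] |= genes
                break
        else:
            concepts.append({"names": [name], "genes": genes})
    return concepts

# Toy input: two near-duplicate pathways from different databases plus one other.
pathways = {
    "cell_cycle_db1": {"CDK1", "CCNB1", "CDC20", "PLK1"},
    "cell_cycle_db2": {"CDK1", "CCNB1", "CDC20", "BUB1"},
    "tgf_beta": {"TGFB1", "SMAD2", "SMAD3"},
}
concepts = consolidate(pathways)
```

Because the pass is greedy, the result depends on input order; a hierarchical clustering over the full similarity matrix would avoid that at a higher computational cost.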
We demonstrate that the three consolidation methods provide unified yet different functional insights into a resultant gene set derived from a genome-wide profiling experiment. Results from the methods are presented, demonstrating their applications in biological studies and comparing them with a web-based pathway framework that also combines several pathway databases. Additionally, a web-based consolidation framework that encompasses all three methods discussed in this paper, Pathway Distiller (http://cbbiweb.uthscsa.edu/PathwayDistiller), is established to give researchers access to the methods and example microarray data described in this manuscript, and to let them analyze their own gene lists using our consolidation methods.
By combining several pathway systems, implementing different but complementary pathway consolidation methods, and providing a user-friendly web-accessible tool, we have enabled users to extract functional explanations of their genome-wide experiments.
Although decades of research have established that androgen is essential for spermatogenesis, androgen's mechanism of action remains elusive. This is in part because only a few androgen-responsive genes have been definitively identified in the testis. Here, we propose that microRNAs – small, non-coding RNAs – are one class of androgen-regulated trans-acting factors in the testis. Specifically, by using androgen suppression and androgen replacement in mice, we show that androgen regulates the expression of several microRNAs in Sertoli cells. Our results reveal that several of these microRNAs are preferentially expressed in the testis and regulate genes that are highly expressed in Sertoli cells. Because androgen receptor-mediated signaling is essential for the pre- and post-meiotic germ cell development, we propose that androgen controls these events by regulating Sertoli/germ cell-specific gene expression in a microRNA-dependent manner.
Embryonal rhabdomyosarcoma (eRMS) shows the most myodifferentiation amongst sarcomas, yet the precise cell of origin remains undefined. Using Ptch1, p53 and/or Rb1 conditional mouse models and controlling prenatal or postnatal myogenic cell of origin, we demonstrate that eRMS and undifferentiated pleomorphic sarcoma (UPS) lie in a continuum, with satellite cells predisposed to giving rise to UPS. Conversely, p53 loss in maturing myoblasts gives rise to eRMS, which have the highest myodifferentiation potential. Irrespective of origin, Rb1 loss modifies tumor phenotype to mimic UPS. In human sarcomas that lack pathognomonic chromosomal translocations, p53 loss of function is prevalent whereas Shh or Rb1 alterations likely act primarily as modifiers. Thus, sarcoma phenotype is strongly influenced by cell of origin and mutational profile.
Transcriptional regulation by transcription factors (TFs) controls the timing and abundance of mRNA transcription. Owing to the limitations of current proteomics technologies, large-scale measurements of the protein-level activities of TFs are usually infeasible, making computational reconstruction of transcriptional regulatory networks a difficult task.
We propose here a novel Bayesian non-negative factor model for TF-mediated regulatory networks. In particular, the non-negative TF activities and the sample clustering effect are modeled as factors from a Dirichlet process mixture of rectified Gaussian distributions, and the sparse regulatory coefficients are modeled as loadings from a sparse distribution whose sparsity is constrained using knowledge from databases; meanwhile, a Gibbs sampling solution was developed to infer the underlying network structure and the unknown TF activities simultaneously. The developed approach has been applied to a simulated system and to breast cancer gene expression data. The results show that the proposed method was able to systematically uncover the TF-mediated transcriptional regulatory network structure, the regulatory coefficients, the TF protein-level activities and the sample clustering effect. The regulation target predictions are highly consistent with prior knowledge, and the sample clustering results show superior performance over a previous molecular-based clustering method.
The results demonstrated the validity and effectiveness of the proposed approach in reconstructing transcriptional networks mediated by TFs through simulated systems and real data.
Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test, but evaluating the significance of change-points using permutations imposes an extensive computational burden. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) was subsequently proposed to improve speed. However, a time analysis revealed that a major portion of the computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the upper or lower bound of the significance, not an approximation of the significance of the change-points itself.
We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data sets under the null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data with the GEV parameters, constitute lookup tables (the eXtreme model). Using the eXtreme model, the significance of change-points can be evaluated in constant time through a table lookup process.
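The table-lookup step described above can be sketched as follows, assuming the lookup table is keyed by the number of probes and stores fitted GEV parameters (location, scale, shape). The table contents, the key choice, and the function names are hypothetical placeholders rather than values from the eXtreme model.

```python
import math

def gev_sf(x, mu, sigma, xi):
    """Survival function P(X > x) of a GEV(mu, sigma, xi) distribution."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:                  # Gumbel limit (shape parameter -> 0)
        return -math.expm1(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:                         # x lies outside the support
        return 1.0 if xi > 0 else 0.0
    return -math.expm1(-(t ** (-1.0 / xi)))

# Hypothetical table: number of probes -> (mu, sigma, xi) fitted to simulated
# null maximal-t statistics. Real entries would come from the simulations.
GEV_TABLE = {
    50: (2.6, 0.45, -0.05),
    100: (2.9, 0.42, -0.05),
    200: (3.1, 0.40, -0.05),
}

def change_point_pvalue(max_t, n_probes):
    """Approximate significance of an observed maximal-t statistic.

    The cost depends only on the (fixed) table size, not on the data or on
    any permutation count.
    """
    key = min(GEV_TABLE, key=lambda k: abs(k - n_probes))  # nearest entry
    mu, sigma, xi = GEV_TABLE[key]
    return gev_sf(max_t, mu, sigma, xi)

p_strong = change_point_pvalue(5.0, 100)   # large statistic
p_weak = change_point_pvalue(3.0, 100)     # statistic near the null location
```

A practical table would presumably interpolate between entries and index on more data characteristics than the probe count alone; nearest-key lookup is used here only to keep the sketch short.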
A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS by 4× to 20× in computation time without loss of accuracy. Source code, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary.
Replication of mammalian genomes requires the activation of thousands of origins which are both spatially and temporally regulated by as yet unknown mechanisms. At the most fundamental level, our knowledge about the distribution pattern of origins in each of the chromosomes, among different cell types, and whether the physiological state of the cells alters this distribution is at present very limited.
We have used standard λ-exonuclease resistant nascent DNA preparations in the size range of 0.7–1.5 kb obtained from the breast cancer cell line MCF-7 hybridized to a custom tiling array containing 50–60 nt probes evenly distributed among genic and non-genic regions covering about 1% of the human genome. A similar DNA preparation was used for high-throughput DNA sequencing. Array experiments were also performed with DNA obtained from BT-474 and H520 cell lines. By determining the sites showing nascent DNA enrichment, we have localized several thousand origins of DNA replication. Our major findings are: (a) both array and DNA sequencing assay methods produced essentially the same origin distribution profile; (b) origin distribution is largely conserved (>70%) in all cell lines tested; (c) origins are enriched at the 5′ ends of expressed genes and at evolutionarily conserved intergenic sequences; and (d) ChIP-on-chip experiments in MCF-7 showed an enrichment of H3K4Me3 and RNA Polymerase II chromatin binding sites at origins of DNA replication.
Our results suggest that the program for origin activation is largely conserved among different cell types. Also, our work supports recent studies connecting transcription initiation with replication, and in addition suggests that evolutionarily conserved intergenic sequences have the potential to participate in origin selection. Overall, our observations suggest that replication origin selection is a stochastic process significantly dependent upon local accessibility to replication factors.
In vitro cell culture experiments with primary cells have reported that cell proliferation is retarded in the presence of ambient compared to physiological O2 levels. Cancer is primarily a disease of aberrant cell proliferation; therefore, studying cancer cells grown under ambient O2 may be undesirable. To better understand the impact of O2 on the propagation of cancer cells in vitro, we compared the growth potential of a panel of ovarian cancer cell lines under ambient (21%) or physiological (3%) O2.
Our observations demonstrate that similar to primary cells, many cancer cells maintain an inherent sensitivity to O2, but some display insensitivity to changes in O2 concentration. Further analysis revealed an association between defective G2/M cell cycle transition regulation and O2 insensitivity resultant from overexpression of 14-3-3 σ. Targeting 14-3-3 σ overexpression with RNAi restored O2 sensitivity in these cell lines. Additionally, we found that metastatic ovarian tumors frequently overexpress 14-3-3 σ, which in conjunction with phosphorylated RB, results in poor prognosis.
Cancer cells show differential proliferative sensitivity to changes in O2 concentration. Although a direct link between O2 insensitivity and metastasis was not determined, this investigation showed that an O2-insensitive phenotype in cancer cells correlates with metastatic tumor progression.
The response of cells to changing endogenous or exogenous conditions is governed by intricate molecular interactions, or regulatory networks. To produce appropriate responses, a regulatory network should be 1) context-specific, i.e., its constituents and topology depend on the phenotypical and experimental context, including tissue types and cell conditions such as damage, stress, and the macroenvironment of the cell, and 2) time-varying, i.e., network elements and their regulatory roles change actively over time to control the endogenous cell states, e.g., the different stages of a cell cycle.
A novel network model, PathRNet, and a reconstruction approach, PATTERN, are proposed for reconstructing context-specific, time-varying regulatory networks by integrating microarray gene expression profiles and existing knowledge of pathways and transcription factors. The nodes of the PathRNet are transcription factors (TFs) and pathways, and the edges represent the regulation between pathways and TFs. The reconstructed PathRNet for Kaposi's sarcoma-associated herpesvirus infection of human endothelial cells reveals the complicated dynamics of the underlying regulatory mechanisms that govern this intricate process. All the related materials including source code are available at http://compgenomics.utsa.edu/tvnet.html.
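One way to picture a network whose nodes are TFs and pathways and whose edges change over time is a container of per-time-point edge maps. The sketch below is only a hypothetical data structure for illustration, not the PathRNet model or the PATTERN inference procedure; the class name, node labels, and weights are invented.

```python
from collections import defaultdict

class TimeVaryingNetwork:
    """Edges between TF and pathway nodes, stored separately per time point."""

    def __init__(self):
        # time point -> {(source node, target node): regulation weight}
        self._edges = defaultdict(dict)

    def add_edge(self, time, source, target, weight):
        self._edges[time][(source, target)] = weight

    def snapshot(self, time):
        """Copy of the network at one time point (empty if none recorded)."""
        return dict(self._edges.get(time, {}))

    def changed_edges(self, t0, t1):
        """Edges that appear, disappear, or change weight between t0 and t1."""
        a, b = self.snapshot(t0), self.snapshot(t1)
        return {edge for edge in a.keys() | b.keys() if a.get(edge) != b.get(edge)}

# Toy example with hypothetical node labels and weights.
net = TimeVaryingNetwork()
net.add_edge(0, "TF:NFKB1", "pathway:apoptosis", 0.8)
net.add_edge(0, "pathway:interferon", "TF:STAT1", 0.5)
net.add_edge(1, "TF:NFKB1", "pathway:apoptosis", 0.8)    # unchanged edge
net.add_edge(1, "pathway:interferon", "TF:STAT1", -0.2)  # regulation flips sign
changed = net.changed_edges(0, 1)
```

Comparing snapshots in this way is one simple route to asking which regulatory relationships differ between conditions or stages.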
The proposed PathRNet provides a system level landscape of the dynamics of gene regulatory circuitry. The inference approach PATTERN enables robust reconstruction of the temporal dynamics of pathway-centric regulatory networks. The proposed approach for the first time provides a dynamic perspective of pathway, TF regulations, and their interaction related to specific endogenous and exogenous conditions.
MicroRNAs (miRNAs) are single-stranded non-coding RNAs shown to play important regulatory roles in a wide range of biological processes and diseases. The functions and regulatory mechanisms of most miRNAs are still poorly understood, in part because of the difficulty in identifying miRNA regulatory targets. To this end, computational methods have evolved as important tools for genome-wide target screening. Although considerable work in the past few years has produced many target prediction algorithms, most of them are based solely on sequence, and their accuracy is still poor. In contrast, gene expression profiling from miRNA transfection experiments can provide additional information about miRNA targets. However, most existing research assumes that down-regulated mRNAs are targets. Given that the primary function of miRNA is protein inhibition, this assumption is neither sufficient nor necessary.
A novel Bayesian approach is proposed in this paper that integrates sequence-level prediction with expression profiling of miRNA transfection. This approach does not restrict the targets to be down-expressed and thus improves the performance of existing target prediction algorithms. The proposed algorithm was tested on simulated data, proteomics data, and IP pull-down data and was shown to achieve better performance than existing approaches for target prediction. All the related materials including source code are available at http://compgenomics.utsa.edu/expmicro.html.
The proposed Bayesian algorithm properly integrates the sequence pairing data and mRNA expression profiles for miRNA target prediction. This algorithm is shown to have better prediction performance than existing algorithms.
MicroRNAs (miRNAs) are single-stranded non-coding RNAs known to regulate a wide range of cellular processes by silencing gene expression at the protein and/or mRNA levels. Computational prediction of miRNA targets is essential for elucidating the detailed functions of miRNAs. However, the prediction specificity and sensitivity of the existing algorithms are still too poor to generate meaningful, workable hypotheses for subsequent experimental testing. Constructing a richer and more reliable training data set and developing an algorithm that properly exploits this data set would be the key to improving the performance of current prediction algorithms.
A comprehensive training data set is constructed for mammalian miRNAs, with positive targets obtained from miRecords, the most up-to-date miRNA target depository, and negative targets derived from 20 microarray data sets. A new algorithm, SVMicrO, is developed, which has a 2-stage structure comprising a site support vector machine (SVM) followed by a UTR-SVM. SVMicrO makes predictions based on 21 optimal site features and 18 optimal UTR features, selected by training from a comprehensive collection of 113 site and 30 UTR features. A comprehensive evaluation of SVMicrO performance has been carried out on the training data, proteomics data, and immunoprecipitation (IP) pull-down data. Comparisons with some popular algorithms demonstrate consistent improvements in prediction specificity, sensitivity and precision in all tested cases. All related materials, including source code and genome-wide predictions of human targets, are available at http://compgenomics.utsa.edu/svmicro.html.
A new 2-stage SVM-based miRNA target prediction algorithm called SVMicrO is developed. SVMicrO is shown to achieve robust performance. It holds the promise of continued improvement whenever better training data that contain additional verified or high-confidence positive targets and properly selected negative targets become available.
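The 2-stage structure described above, in which site-level scores feed a UTR-level classifier, can be sketched as follows. This is a minimal illustration and not the SVMicrO implementation: each stage is represented only by the decision function of a linear SVM (a weighted sum plus a bias), the weights are hypothetical rather than trained, and the three UTR-level summary features (site count, best site score, total site score) are assumptions that stand in for the 18 UTR features used in the paper.

```python
def linear_decision(weights, bias, features):
    """Decision value of a linear SVM: w . x + b (weights assumed pre-trained)."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def utr_features_from_sites(site_scores):
    """Summarize stage-1 site scores into hypothetical UTR-level features."""
    if not site_scores:
        return [0.0, 0.0, 0.0]
    return [float(len(site_scores)), max(site_scores), sum(site_scores)]


def score_utr(sites, site_weights, site_bias, utr_weights, utr_bias):
    """Stage 1 scores every candidate site; stage 2 scores the whole UTR."""
    site_scores = [linear_decision(site_weights, site_bias, s) for s in sites]
    utr_features = utr_features_from_sites(site_scores)
    return linear_decision(utr_weights, utr_bias, utr_features)
```

In this sketch a positive return value would be read as a predicted target and a negative value as a predicted non-target, following the usual sign convention for SVM decision values.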
Hepatocellular carcinoma is a common and aggressive cancer that occurs mainly in men. We examined microRNA expression patterns, survival, and response to interferon alfa in both men and women with the disease.
We analyzed three independent cohorts that included a total of 455 patients with hepatocellular carcinoma who had undergone radical tumor resection between 1999 and 2003. MicroRNA-expression profiling was performed in a cohort of 241 patients with hepatocellular carcinoma to identify tumor-related microRNAs and determine their association with survival in men and women. In addition, to validate our findings, we used quantitative reverse-transcriptase–polymerase-chain-reaction assays to measure microRNAs and assess their association with survival and response to therapy with interferon alfa in 214 patients from two independent, prospective, randomized, controlled trials of adjuvant interferon therapy.
In patients with hepatocellular carcinoma, the expression of miR-26a and miR-26b in nontumor liver tissue was higher in women than in men. Tumors had reduced levels of miR-26 expression, as compared with paired noncancerous tissues, which indicated that the level of miR-26 expression was also associated with hepatocellular carcinoma. Moreover, tumors with reduced miR-26 expression had a distinct transcriptomic pattern, and analyses of gene networks revealed that activation of signaling pathways between nuclear factor κB and interleukin-6 might play a role in tumor development. Patients whose tumors had low miR-26 expression had shorter overall survival but a better response to interferon therapy than did patients whose tumors had high expression of the microRNA.
The expression patterns of microRNAs in liver tissue differ between men and women with hepatocellular carcinoma. The miR-26 expression status of such patients is associated with survival and response to adjuvant therapy with interferon alfa.
Expression profile analysis clusters Gpnmb with the known pigment genes Tyrp1, Dct, and Si. During development, Gpnmb is expressed in a pattern similar to that of Mitf, Dct and Si, with expression vastly reduced in Mitf mutant animals. Unlike Dct and Si, Gpnmb remains expressed in a discrete population of caudal melanoblasts in Sox10-deficient embryos. To understand the transcriptional regulation of Gpnmb, we performed a whole genome annotation of 2,460,048 consensus MITF binding sites and cross-referenced this with evolutionarily conserved genomic sequences at the GPNMB locus. One conserved element, GPNMB-MCS3, contained two MITF consensus sites, significantly increased luciferase activity in melanocytes, and was sufficient to drive expression in melanoblasts in vivo. Deletion of the 5’-most MITF consensus site dramatically reduced enhancer activity, indicating a significant role for this site in Gpnmb transcriptional regulation. Future analysis of the Gpnmb locus will provide insight into the transcriptional regulation of melanocytes, and Gpnmb expression can be used as a marker for analyzing melanocyte development and disease progression.
Comparative analysis of gene expression profiles using melanocyte lines derived from mice provides a powerful resource to explore genetic components of melanocyte development and pigment cell function. Using expression data, we identified Gpnmb as a new marker for early melanoblast development. We show that Gpnmb is dependent on Mitf for in vivo expression and marks a unique set of Sox10-independent melanoblasts. We identified an 89 basepair evolutionarily conserved genomic sequence at the Gpnmb locus that can enhance expression in melanocytes and tested MITF E-box consensus sequences for their involvement in melanocyte-restricted expression. Gpnmb and the panel of genes identified in this study will be valuable resources for understanding the genetic components involved in melanocyte development and diseases.
Gpnmb; Mitf; Sox10; melanoblast; melanocyte; melanoma
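The annotation of consensus MITF binding sites mentioned above amounts to scanning sequence for short motifs. The sketch below shows one simple way to do this in Python. It is an assumption that the motifs of interest are the E-box CACGTG and the M-box core CATGTG (with its reverse complement CACATG); the abstract does not state the exact consensus used, and the function name `find_mitf_like_sites` is hypothetical.

```python
import re

# Assumed motifs: palindromic E-box, M-box core, and the M-box reverse complement.
MOTIFS = ("CACGTG", "CATGTG", "CACATG")


def find_mitf_like_sites(sequence):
    """Return sorted (start, motif) pairs for every motif occurrence (0-based)."""
    sequence = sequence.upper()
    hits = []
    for motif in MOTIFS:
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer("(?=%s)" % motif, sequence):
            hits.append((match.start(), motif))
    return sorted(hits)
```

For a whole genome, the same scan would be run per chromosome and the hit coordinates intersected with conserved regions; that intersection step is not shown here.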
Motivation: Genomic instability is one of the fundamental factors in tumorigenesis and tumor progression. Many studies have shown that copy-number abnormalities at the DNA level are important in the pathogenesis of cancer. Array comparative genomic hybridization (aCGH), developed based on expression microarray technology, can reveal the chromosomal aberrations in segmental copies at a high resolution. However, due to the nature of aCGH, many standard expression data processing tools, such as data normalization, often fail to yield satisfactory results.
Results: We demonstrated a novel aCGH normalization algorithm, which provides accurate aCGH data normalization by utilizing the dependency of neighboring probe measurements in aCGH experiments. To facilitate the study, we have developed a hidden Markov model (HMM) to simulate a series of aCGH experiments with random DNA copy number alterations that are used to validate the performance of our normalization. In addition, we applied the proposed normalization algorithm to an aCGH study of lung cancer cell lines. By using the proposed algorithm, data quality and the reliability of experimental results are significantly improved, and distinct patterns of DNA copy number alterations are observed among those lung cancer cell lines.
Supplementary information: Source code and figures may be found at http://ntumaps.cgm.ntu.edu.tw/aCGH_supplementary
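As an illustration of how an HMM can simulate aCGH data with random copy number alterations, the following Python sketch generates log2 ratios along a chromosome. It is not the authors' simulator: the three hidden states, the transition rule, the emission means (log2 of 1/2, 2/2, and 3/2 copies) and the noise level are all assumptions chosen for illustration, and `simulate_acgh` is a hypothetical name.

```python
import random

STATES = ("loss", "normal", "gain")
# Assumed emission means: log2(1/2), log2(2/2), and approximately log2(3/2).
MEAN_LOG2_RATIO = {"loss": -1.0, "normal": 0.0, "gain": 0.58}


def simulate_acgh(n_probes, stay_prob=0.98, noise_sd=0.15, seed=0):
    """Simulate hidden copy-number states and noisy log2 ratios for n_probes probes.

    Neighboring probes tend to share a state because the chain keeps its
    current state with probability stay_prob at each step.
    """
    rng = random.Random(seed)
    state = "normal"
    states, ratios = [], []
    for _ in range(n_probes):
        if rng.random() > stay_prob:
            # Switch to one of the other two states with equal probability.
            state = rng.choice([s for s in STATES if s != state])
        states.append(state)
        ratios.append(rng.gauss(MEAN_LOG2_RATIO[state], noise_sd))
    return states, ratios
```

Because the true states are returned alongside the simulated ratios, a normalization method can be checked against known ground truth, which is the role the simulation plays in the abstract.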