GWAS identify highly significant SNP-phenotype associations ($p < 5 \times 10^{-8}$), the vast majority of which (80%) occur within non-protein-coding sequences.1
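The genome-wide significance threshold quoted above is conventionally motivated as a Bonferroni correction: a family-wise error rate of 0.05 divided by roughly one million independent common-variant tests across the genome. The figure of one million is the usual approximation and is not a value derived in this study:

$$\alpha_{\text{GWAS}} = \frac{\alpha_{\text{FWER}}}{m} = \frac{0.05}{1 \times 10^{6}} = 5 \times 10^{-8}$$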
These consistent and reproducible findings highlight a major gap in our understanding of the phenotype-defining functions of human genome segments that lack protein-coding potential. Our experiments reveal widespread transcription at IDAGLs, raising the possibility that non-protein-coding RNA molecules may play an important role in predisposition to multiple common human disorders. We demonstrate that forced expression of 52-nt snpRNAs imposes a castration-resistant phenotype on human prostate carcinoma cells, transforming low-malignancy, hormone-dependent human prostate cancer cells into highly malignant, androgen depletion-independent prostate cancer. To facilitate assessment of the clinical significance of the discovered snpRNAs, we developed Q-PCR methods for quantitative analysis of snpRNAs and validated their utility by analyzing snpRNA expression in clinical samples. This analysis reveals markedly elevated snpRNA expression levels in prostate cancer tissues compared with the adjacent normal prostate. The higher snpRNA expression levels in human prostate adenocarcinoma samples, together with the apparent association of increased snpRNA expression with pathohistological features of clinically significant disease (high Gleason score), highlight the potential translational relevance of our findings. Collectively, our data imply that prostate cancer cells that emerge in individuals expressing high levels of the prostate cancer susceptibility snpRNAs are more likely to progress early to hormone-independent, incurable, metastatic disease. Our work defines the intergenic 8q24 region as a RAD-regulatory locus of critical significance for human prostate cancer, reveals previously unknown molecular links between the innate immunity/inflammasome system and the development of hormone-independent prostate cancer, and identifies novel diagnostic and therapeutic targets whose successful validation should be highly beneficial for the clinical management of prostate cancer patients.
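A tumor-versus-normal comparison of this kind is often expressed with the comparative Ct (2^−ΔΔCt) method. The sketch below is a hypothetical illustration of that calculation only: the text does not state which quantification scheme, reference RNA, or amplification efficiencies were used, so it assumes a single reference RNA and ~100% efficiency for both assays, and the Ct values are invented.

```python
"""Relative snpRNA expression by the comparative Ct (2^-ddCt) method.

Hypothetical sketch: assumes one reference RNA and ~100% amplification
efficiency for both the snpRNA and reference assays. Ct values are invented.
"""
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct for one sample (technical replicates)."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(tumor_target, tumor_ref, normal_target, normal_ref):
    """Fold change of the snpRNA in tumor relative to matched adjacent normal tissue."""
    dd_ct = delta_ct(tumor_target, tumor_ref) - delta_ct(normal_target, normal_ref)
    return 2 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical triplicate Ct values; not data from this study.
    fc = fold_change(
        tumor_target=[24.0, 24.1, 23.9],   # snpRNA assay, tumor
        tumor_ref=[18.0, 18.0, 18.0],      # reference RNA, tumor
        normal_target=[27.0, 27.0, 27.0],  # snpRNA assay, adjacent normal
        normal_ref=[18.0, 18.0, 18.0],     # reference RNA, adjacent normal
    )
    print(f"fold change (tumor vs normal): {fc:.1f}")
```

A lower ΔCt in tumor than in normal tissue yields a fold change above 1; each cycle of difference corresponds to a doubling under the 100% efficiency assumption.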
Our experiments demonstrate that IDAGLs represent multifunctional genomic trans-regulatory domains with a broad range of intrinsic regulatory functions mediated by both DNA sequences and transcribed RNA molecules. Many IDAGLs harbor a consensus chromatin signature comprising the H3K27Me3 and H3K4Me1 histone marks, Ezh2 and disease state-specific sets of transcription factors. The functions of IDAGLs as cell type-specific, long-range enhancers or insulators appear to depend on the allelic status of a disease-linked SNP and are regulated by snpRNAs. Our experiments indicate that microRNAs with complementary sequences in the corresponding snpRNAs may constitute one of the primary targets of snpRNA-induced genome-wide epigenetic regulatory networks, whose engagement is triggered by distinctive single-base-level molecular recognition events (see also S8). Altered microRNA expression and activity would facilitate epigenetic amplification of a single-base-driven regulatory event by inducing downstream changes in mRNA expression of many (perhaps thousands of) protein-coding genes, which would ultimately cause clinically significant alterations of cellular functions. Examples of experimentally identified components of such a regulatory network are key genetic targets related to the inflammasome/innate immunity pathway.4
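The notion that a single SNP base can decide how well a snpRNA is recognized by a complementary microRNA can be illustrated with a toy pairing score. This is a hypothetical sketch, not the model used in this work: it counts Watson-Crick pairs in the best ungapped antiparallel alignment of an invented microRNA against two invented snpRNA alleles that differ at one position, and it ignores G:U wobble pairs, gaps and duplex thermodynamics.

```python
"""Allele-specific complementarity between a microRNA and a snpRNA.

Hypothetical sketch: two snpRNA alleles differing at one SNP position are
scored for ungapped Watson-Crick pairing with the same microRNA. Sequences
are invented; G:U wobble pairs and gapped duplexes are ignored.
"""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairs_in_window(mirna: str, window: str) -> int:
    """Count paired positions with the miRNA (5'->3') antiparallel to the window (5'->3')."""
    n = len(mirna)
    return sum((mirna[i], window[n - 1 - i]) in WATSON_CRICK for i in range(n))


def best_pairing(mirna: str, target: str) -> int:
    """Highest pair count over all ungapped placements of the miRNA on the target."""
    n = len(mirna)
    return max(
        pairs_in_window(mirna, target[start:start + n])
        for start in range(len(target) - n + 1)
    )


if __name__ == "__main__":
    mirna = "AUGCAUGC"          # hypothetical microRNA segment
    allele_1 = "CCGCAUGCAUCC"   # hypothetical snpRNA, allele 1
    allele_2 = "CCGCCUGCAUCC"   # same snpRNA with a single-base substitution (A->C)
    for name, seq in [("allele 1", allele_1), ("allele 2", allele_2)]:
        print(name, best_pairing(mirna, seq), "of", len(mirna), "positions paired")
```

Here allele 1 pairs at all eight positions while allele 2 loses one pair at the substituted base, which is the kind of single-base difference in recognition the allele affinity idea relies on.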
In agreement with this mechanism, we found markedly altered expression of prostate cancer susceptibility snpRNAs in cell lines genetically engineered to stably express either NLRP1-locus snpRNAs or snpRNA-regulated microRNAs (see also S8). Further, our data suggest that microRNAs may contribute to the biogenesis of snpRNAs by guiding Argonaute family endonucleases to execute sequence-specific cleavage of snpRNAs and of long noncoding snpRNAs, the putative precursors of small snpRNAs. Consistent with this, we found that many small snpRNAs exhibit cell type-specific expression profiles, whereas long noncoding snpRNAs containing disease-associated SNP sequences show more ubiquitous expression patterns.4 These observations indicate that small snpRNAs may be products of cell type-specific processing of long noncoding snpRNAs and support the hypothesis that microRNAs are intrinsic regulatory components of snpRNA/enhancer IDAGL networks that contribute to maintaining the epigenetic regulatory state of a cell.
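The proposed Argonaute-mediated processing can be made concrete using the canonical slicer rule for AGO2, the principal catalytically active human Argonaute: a target with extensive complementarity to the guide is cut between the nucleotides paired with guide positions 10 and 11, counted from the guide 5′ end. The sketch below is hypothetical; it assumes perfect complementarity, uses invented sequences, and is not a description of how snpRNAs are actually processed.

```python
"""Predicted Argonaute slicer cleavage site on a fully complementary target.

Hypothetical sketch: assumes a guide microRNA perfectly complementary to its
target site and the canonical AGO2 rule (cut between the target nucleotides
paired with guide positions 10 and 11). Sequences are invented.
"""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def slicer_products(guide: str, target: str):
    """Return (5' fragment, 3' fragment) of the target, or None if no perfect site."""
    if len(guide) < 11:
        raise ValueError("guide must be at least 11 nt long")
    site = reverse_complement(guide)
    start = target.find(site)  # first perfectly complementary site, 0-based
    if start == -1:
        return None
    # Guide nt k (1-based) pairs with target index start + len(guide) - k, so the
    # cut between guide nt 10 and 11 falls just before target index start + len - 10.
    cut = start + len(guide) - 10
    return target[:cut], target[cut:]


if __name__ == "__main__":
    guide = "UAGGCUACGAUCCAGUUAGCA"                            # hypothetical 21-nt guide
    target = "GGGAAA" + reverse_complement(guide) + "CCCAAA"   # hypothetical long snpRNA
    five_prime, three_prime = slicer_products(guide, target)
    print("5' fragment:", five_prime)
    print("3' fragment:", three_prime)
```

Real microRNA-target duplexes are usually only partially complementary, in which case slicing is typically inefficient; the perfect-match case is used here only because it makes the cut position unambiguous.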
Figure 7 Application of the allele affinity model of snpRNA-mediated regulation of microRNA expression and activity (see also S8) to development of the allele equilibrium hypothesis explaining a dynamic transition, the allele-specific phenotype-altering effects
Our experiments indicate that activation of the NLRP1-locus snpRNA/miR-205 axis may contribute to the development of clinically significant prostate cancer by reducing expression of the PTEN tumor suppressor. We found that NLRP1-locus snpRNAs induce expression and activity of miR-205 in human cells, and that forced expression of miR-205 recapitulates many molecular, phenotypic and clinical features associated with expression of NLRP1-locus snpRNAs, including markedly decreased expression of PTEN. PTEN has been identified as a miR-205 target on the basis of target prediction algorithms and miR-205 overexpression experiments using pGL3-PTEN 3′UTR luciferase reporter constructs with mutated miR-205-binding sites, microarray and TaqMan Q-PCR analyses, and protein gel blot analysis.25 Altered expression and activity of miR-205 have been associated with epithelial-to-mesenchymal transition (EMT), the emergence of stem cell-like properties and the maintenance of mammary epithelial progenitor cells.25–27
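The logic behind seed-based target prediction and binding-site-mutant reporters can be sketched as a scan for "7mer-m8" seed matches, i.e. occurrences of the reverse complement of microRNA nucleotides 2–8 in a 3′UTR. This is a hypothetical illustration: the sequences are invented and do not reproduce miR-205 or the PTEN 3′UTR, and real prediction tools also weigh site context, conservation and pairing beyond the seed.

```python
"""Seed-match scan of a 3'UTR and the effect of mutating the sites.

Hypothetical sketch: a site is counted when the UTR contains the reverse
complement of microRNA nucleotides 2-8 (a "7mer-m8" seed match). Sequences
are invented and do not reproduce miR-205 or the PTEN 3'UTR.
"""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions of 7mer-m8 seed matches in the UTR (5'->3')."""
    site = reverse_complement(mirna[1:8])  # miRNA nt 2-8 (1-based)
    hits, pos = [], utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # continue scanning, allowing overlaps
    return hits


if __name__ == "__main__":
    mirna = "UAGGCUACGAUCCAGUUAGCA"           # hypothetical microRNA
    wild_type = "AAAGUAGCCUAAAACCCGUAGCCUUUU"  # hypothetical 3'UTR with two seed sites
    mutant = "AAAGUACGGUAAAACCCGUACGGUUUU"     # same UTR with both seed sites mutated
    print("wild type sites:", seed_sites(mirna, wild_type))
    print("mutant sites:   ", seed_sites(mirna, mutant))
```

In a reporter assay the same reasoning applies experimentally: if repression of the wild-type 3′UTR reporter by the microRNA is lost when the predicted sites are mutated, the sites are likely to mediate the effect.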
Results of our experiments are highly consistent with the emerging concept of pervasive, global transcriptional activity of human genomes, which is supported by observations that the vast majority of transcripts in human cells are noncoding RNAs (refs. 28–37; see Supplementary Material for discussion and additional references). Collectively, these results lend credence to the idea that intergenic DNA sequence variations may contribute to disease pathogenesis via noncoding RNA intermediaries that exert trans-regulatory effects on the epigenetic regulatory circuitry of a cell. This concept challenges the dominant protein-centric experimental paradigm, which focuses on the effects of genetic variations on protein-coding genes located within or near the regions harboring those variants. Conclusive evidence that intergenic disease-associated genetic loci are ubiquitously expressed in human cells indicates the potential clinical benefit of analyzing DNA sequences concurrently with expression profiling of the corresponding RNA products. This is critically important because transcriptional activity and transcript abundance are genetically defined, quantitative traits whose assessment should enhance the disease risk prediction power of the corresponding genetic tests. We anticipate that the findings reported here will have major near-term implications for the design and execution of genome-wide association studies and of follow-up mechanistic studies of non-protein-coding disease-linked loci. To this end, we provide sequences of validated primers and experimental protocols (Tables S1–S5) to facilitate immediate implementation of this type of analysis for 96 intergenic SNPs associated with increased risk of developing 21 common human disorders.
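The argument that transcript abundance could add predictive power to genotype-based tests can be made concrete with a minimal logistic model that includes both risk-allele dosage and log2 relative snpRNA expression. This is a hypothetical sketch: the coefficients are invented placeholders, not estimates from this or any study, and in practice they would be fitted to case-control data.

```python
"""Combining SNP genotype and snpRNA abundance in a single risk model.

Hypothetical sketch: the log-odds of disease depend on risk-allele dosage
(0, 1 or 2 copies) and on log2 relative snpRNA expression. Coefficients are
invented placeholders, not fitted estimates.
"""
import math

# Hypothetical model coefficients (illustrative only).
INTERCEPT = -2.0
BETA_DOSAGE = 0.3      # change in log-odds per risk allele
BETA_EXPRESSION = 0.5  # change in log-odds per doubling of snpRNA expression


def risk_probability(dosage: int, log2_expression: float) -> float:
    """Predicted probability of disease from genotype dosage and log2 expression."""
    if dosage not in (0, 1, 2):
        raise ValueError("dosage must be 0, 1 or 2")
    log_odds = INTERCEPT + BETA_DOSAGE * dosage + BETA_EXPRESSION * log2_expression
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Two hypothetical carriers with the same genotype but different snpRNA levels.
    for log2_expr in (0.0, 3.0):
        p = risk_probability(dosage=1, log2_expression=log2_expr)
        print(f"dosage=1, log2 expression={log2_expr}: p={p:.3f}")
```

Whether an expression term genuinely improves risk prediction would have to be established empirically, for example by comparing the discrimination (area under the ROC curve) of models fitted with and without it on the same cohort.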
Experimental progress in defining candidate SNP variants associated with human disorders is rapidly generating leads of potential clinical relevance. Analysis of defined SNP variants appears to discriminate among distinct autoimmune disorders and has prognostic significance in leukemia and lymphoma.44,45 This progress is not limited to investigations of disease phenotypes: aging and longevity phenotypes in human populations have also been associated with multiple SNP variants.46–48 New conceptual principles and comprehensive analytical approaches are beginning to emerge, signaling the growth and maturity of this exciting field. There is increasing recognition of the need for systematic, in-depth analysis of the functional and clinical significance of non-protein-coding risk regions, particularly cancer risk loci.49,50 Rapidly accumulating experimental and clinical evidence supports the growing recognition of the important roles of microRNAs and other classes of noncoding RNAs in human diseases.51,52 Based on theoretical considerations as well as computational and bioinformatic analyses, we proposed a disease phenocode hypothesis that integrates the potential mechanistic relationships among structural features and gene expression patterns of disease-linked SNPs, microRNAs and mRNAs of protein-coding genes in relation to the phenotypes of multiple common human disorders.53–58 In this context, our work documents several important steps toward critical experimental testing and validation of the practical utility and clinical relevance of the disease phenocode hypothesis.