Human microRNAs (miRNAs) are potent regulators of gene expression and are thus involved in a broad range of biological processes. The objective of this study was to update the properties of human miRNAs and to search for SNPs and CNVs with potential effects on them. Based on the latest miRBase 13.0 database, we identified 380 (53.9%) precursor miRNAs (pre-miRNAs) embedded in gene loci that are enriched in biological processes such as “Neuronal activities”, “Cell cycle” and “Protein phosphorylation” (Bonferroni p < 0.05). The pre-miRNA host genes are significantly longer than other genes in the genome (p < 2.2E-16). By mining public data resources, we performed a genome-scale search for regulatory polymorphisms in the loci of pre-miRNAs and their related genes. Altogether, we found 187 SNPs in the pre-miRNAs, 497 consensus SNPs in the seed-matching untranslated regions of target genes, 385 CNVs harboring pre-miRNAs and 9 CNVs covering important miRNA processing genes. We also noticed that the minimum free energy changes caused by pre-miRNA-residing SNPs could be ranked from low to high as follows: SNPs in the loop domain, SNPs in the adjacent stem and basal stem domains, and SNPs in the mature miRNA and its complementary sequence domains (p = 0.0065). By providing a full list of miRNA-related polymorphisms, this study will facilitate future association studies between genetic polymorphisms in miRNA targets or pre-miRNAs and disease susceptibility or therapeutic outcome.
As a new class of abundantly distributed small non-coding RNA molecules, miRNAs are initially transcribed as primary miRNAs (pri-miRNAs) that are further processed through stem-loop pre-miRNAs into a single-stranded mature form.1 Generally, miRNAs are partially complementary to the 3′-untranslated regions (UTRs) of target mRNAs and subsequently invoke a series of posttranscriptional silencing events on the target genes. These include interference with translational initiation and elongation, induction of deadenylation and interruption of mRNA and protein synthesis.2,3 The core of the miRNA’s pairing sequence is termed the “seed”, which is usually conserved across multiple species.4 Since this seed sequence is on average 8 nucleotides in length and critical for miRNA-target binding, miRNAs are estimated to have anywhere from several to thousands of targets.4 Therefore, mature miRNAs, as potent regulators of gene expression, have been implicated in various developmental processes and in disease progression.5,6
SNPs within the sequences of human miRNAs and their targets have been shown to affect various phenotypes, including blood pressure,7 drug resistance,8 outcomes of therapeutic intervention,9 psychiatric disorders,10 the development of gastric mucosal atrophy,11 and the risk of diseases such as asthma,12 diabetes,13 Parkinson’s disease,14 and colorectal,15 breast,16 bladder,17 papillary thyroid,18,19 lung20,21 and esophageal cancers.22 Moreover, miRNAs and their target genes may be located in genomic regions of high instability, a feature often observed in cancer and other genetic diseases. Of note, deletion of genes encoding key enzymes such as Dicer1 (dicer 1, ribonuclease type III) may cause global impairment of miRNA processing, leading to severe abnormalities.23
As shown in Table 1, several researchers have uncovered links between genomic variations and miRNAs.9,15,16,18–22,24–36 However, a systematic update of the genetic factors that influence miRNA activities according to the most recent miRBase is not yet in place. Taking advantage of the latest miRNA databases and bioinformatics tools, we analyzed human miRNA for their biological properties and performed a genome-wide scan for functional genetic variations that may potentially affect human miRNA processing and targeting.
As shown in Figure 1, there are a total of 718 loci for human pre-miRNA genes in the genome based on miRBase 13.0. The pre-miRNAs with multiple copies include mir-1184 (3 copies), mir-1233 (2 copies), mir-1244 (4 copies), mir-1302-2 (4 copies), mir-1972 (2 copies), mir-1974 (2 copies), mir-1977 (2 copies) and mir-1978 (2 copies). These 706 pre-miRNAs can be processed into 885 mature miRNAs (including miR* products) with lengths ranging from 17 to 27 nucleotides. Based on the common seed sequence of mature miRNA products, the pre-miRNAs may be further grouped into families. In the current release of miRBase, there are altogether 381 miR-families (644 members) in humans, each with up to 42 members. The largest pre-miRNA families include mir-515 (n = 42), mir-548 (n = 31), mir-154 (n = 19), mir-506 (n = 18) and let-7 (n = 12).
There are 380 human pre-miRNA genes residing in the loci of 340 protein-coding genes (PCGs) (Suppl. Table 1). The genes hosting the most pre-miRNAs include HTR2C (5 pre-miRNAs), RTL1 (5 pre-miRNAs) and LARP7 (5 pre-miRNAs). By comparing the length of gene-spanning regions between pre-miRNA host PCGs and other PCGs in humans, we found that the pre-miRNA host PCGs have significantly larger gene-spanning regions (Fig. 2A, Kolmogorov-Smirnov test, D = 0.38, p < 2.2E-16). In addition, the host genes were analyzed for enriched categories in the Gene Ontology (GO)37 and Kyoto Encyclopedia of Genes and Genomes (KEGG)38 pathways. As shown in Table 2, the host genes are significantly enriched in the pathways of “Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase cascade” and “Coenzyme A biosynthesis”, the biological processes of “Neuronal activities”, “Protein phosphorylation”, “Cell cycle” and “Cell structure and motility”, and the molecular function of “Kinase” (Bonferroni corrected p < 0.05). Among these pre-miRNA host genes, only 33 genes are not annotated, a significantly smaller proportion than among all NCBI annotated genes (p < 0.0001, df = 1, χ2 = 67.98, OR = 0.25). In addition, the category of “unclassified” in both the pathway and GO analyses is significantly underrepresented for miRNA host genes (Bonferroni p < 0.05), implying that miRNA host genes carry important functions. Of note, the pre-miRNA host PCGs tend to be longer than the other PCGs in the human genome, so they are more likely (by chance) to harbor miRNAs. This may bias the observations for certain GO terms.
In the present study, we performed a genome-scale search for both SNPs and CNVs that may potentially affect miRNA processing or targeting. Our results revealed that only 188 SNPs are located in 138 pre-miRNA regions (Table 3). In contrast, on average there are over 300 SNPs per 100 bps in the flanking regions of all the pre-miRNAs (Fig. 2B). This observation agrees with previous findings based on the miRNAs in earlier miRBase releases.31,36 Among the SNPs in the hairpins of pre-miRNAs, there are 16 SNPs in the adjacent stem, 77 SNPs in the basal stem, 17 SNPs in the loop, 44 SNPs in the mature miRNA and 54 SNPs in the complementary sequence of the mature miRNA (Table 3). We used the RNAfold web server to determine the minimum free energy (MFE) of the hairpin structures for all the SNP-residing pre-miRNAs.39 The MFE changes caused by the SNPs in the pre-miRNAs are also given in Table 3. SNPs in the mature miRNA and its complementary sequence domains cause significantly larger MFE changes than SNPs in the adjacent stem and basal stem domains, which in turn cause larger changes than SNPs in the loop domain (Fig. 3, Kruskal-Wallis test, χ2 = 14.25, df = 4, p = 0.0065).
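The reported p-value can be checked from the test statistic alone, because the Kruskal-Wallis statistic is approximately chi-square distributed with one degree of freedom fewer than the number of groups (five domains here, so df = 4). The following is a minimal Python sketch of that check, not part of the original analysis. It uses the closed-form upper tail that holds for even degrees of freedom, and the function name is illustrative.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)

# Statistic reported for the five pre-miRNA domains (df = 5 - 1 = 4).
p_value = chi2_upper_tail_even_df(14.25, 4)
print(f"p = {p_value:.4f}")
```

The result should agree with the reported p = 0.0065 to the precision given in the text.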
We also evaluated the known CNV coverage of human pre-miRNA genes using the CNVs deposited in the Database of Genomic Variants (DGV).40 We found that 193 pre-miRNAs were located in the regions covered by 385 CNV markers (Table 4 and Suppl. Table 2). There was no significant difference between the distribution of these pre-miRNAs in PCGs (n = 109) and in intergenic regions (n = 84, χ2 = 0.71, df = 1, p = 0.39).
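The comparison above can be illustrated as a 2 × 2 chi-square test. The text does not state how the contingency table was built, so the Python sketch below rests on an assumption: the 193 CNV-covered pre-miRNAs are compared against the remaining pre-miRNAs, using the totals reported elsewhere in the paper (705 pre-miRNAs, 380 of them in PCGs). The function names are illustrative, and no continuity correction is applied.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def chi2_p_df1(x):
    """Upper-tail probability for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Assumed totals: 705 pre-miRNAs, 380 of them inside PCGs (so 325 intergenic).
in_cnv_pcg, in_cnv_intergenic = 109, 84
no_cnv_pcg = 380 - in_cnv_pcg                 # 271
no_cnv_intergenic = 325 - in_cnv_intergenic   # 241

chi2 = chi2_2x2(in_cnv_pcg, in_cnv_intergenic, no_cnv_pcg, no_cnv_intergenic)
p = chi2_p_df1(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")
```

Under this assumed table the statistic comes out at about 0.71, matching the reported value, and the p-value lands close to the reported 0.39. The small difference in p may reflect rounding or a slightly different table.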
Using the targets predicted by TargetScanS41 and PITA,42 we found 1,238 and 4,235 SNPs, respectively, located in the putative seed-matching regions of target genes (Suppl. Tables 3 and 4). As shown in Table 5, eleven 3′-UTR SNPs may disrupt miRNA-target regulation that has been supported by experimental evidence maintained in TarBase.43 A total of 497 overlapping SNPs are found in the same seed-matching regions of 434 target genes by both TargetScanS and PITA (Suppl. Table 5). As shown in Table 6, the 434 overlapping target genes are significantly enriched in the KEGG pathways of “Angiogenesis” (Bonferroni p = 4 × 10−6) and “T cell activation” (Bonferroni p = 0.004) as well as the GO biological processes of “Developmental processes” (Bonferroni p = 3 × 10−21), “Signal transduction” (Bonferroni p = 5 × 10−12), “mRNA transcription regulation” (Bonferroni p = 2 × 10−6), “Neurogenesis” (Bonferroni p = 2 × 10−6) and “Oncogenesis” (Bonferroni p = 8 × 10−5). For the pre-miRNA host genes and the miRNA target genes with SNPs in the consensus target sites, the pathway and GO analyses show some shared categories, such as “Neuronal activities”, “Cell cycle”, “Protein phosphorylation” and “Cell structure and motility” in the biological processes, and “Kinase” in the molecular function. This suggests a potential involvement of miRNAs in these biological activities.
Given the delicate processes in miRNA biogenesis (Fig. 4), any alteration of the key proteins involved in miRNA processing and targeting could lead to a global deregulation of miRNA-mediated posttranscriptional silencing. Altogether, we found 3,921 SNPs in these miRNA-processing genes. A total of 83 SNPs may change the coding sequence of protein products. However, only 35 SNPs have reported allele frequencies (Suppl. Table 6). Their contributions to gene functions remain to be discovered. A further analysis of CNVs overlapping 13 miRNA-related genes shows that there are deletions in the loci of SNIP1, RNASEN, DICER1 and DGCR8, suggesting a potentially disrupted miRNA-processing pathway in those CNV carriers (Table 7).
In the present study, we evaluated the properties of human pre-miRNAs based on the miRBase database (release 13). Our survey shows that 53.9% of pre-miRNAs are located in PCGs. We found that pre-miRNA host genes have longer spanning regions than other genes in the genome and pre-miRNA host genes are more likely to be annotated with functional descriptions as compared with other genes in the genome (p < 0.0001, OR = 4.05), although this may be biased by a trend in the published studies of pre-miRNAs. A total of 193 pre-miRNAs (27.4%) are located in regions with genome instability. Interestingly, there are 10 out of 12 let-7 family members found in CNV regions (Suppl. Table 2). Given that let-7 plays a role in tumorigenesis, this finding suggests a non-random connection between pre-miRNA, CNVs and cancer development which is in agreement with previous findings.44
PCGs may host multiple pre-miRNAs and thus have the potential to network with other genes. For example, HTR2C, a host gene for 5 pre-miRNAs, is a G-protein coupled receptor that mediates the signaling of neuronal activities.45 RTL1, a host gene for 5 pre-miRNAs, is a reverse transcriptase and aspartic protease46 that is involved in angiogenesis, apoptosis and the Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase cascade. LARP7, which also harbors 5 pre-miRNAs, is a ribonucleoprotein that plays an important role in tRNA metabolism.47
Given the extensive variation found in the human genome, miRNA-mediated functions may be affected by polymorphisms in the miRNA target loci, pre-miRNA gene loci and/or the miRNA regulating gene loci.48 Polymorphism-driven alterations of miRNA activity have been observed in numerous association studies.43 Several studies have catalogued the SNPs at human pre-miRNAs and mature miRNA binding sites based on earlier miRBase versions (Table 1). In the present study, we also evaluated the roles of genomic variants, including both SNPs and CNVs, that may influence miRNA-associated biological functions. Using the 718 genomic coordinates of 705 human pre-miRNAs (not including mir-941-4), we found 188 SNPs in 138 pre-miRNAs, with varying numbers of SNPs in the different pre-miRNA domains. Ranked from the smallest to the largest MFE change, the SNP-residing domains are the loop, the stem (including adjacent stem and basal stem), and the mature miRNA and its complementary sequence (Fig. 3). SNPs in pre-miRNAs could potentially change the stem-loop structures and thus may influence miRNA processing and maturation. SNPs in the mature miRNAs, especially in the seed regions, are likely to affect the specificity of gene silencing.
Bioinformatics databases along with experimental evidence implicate eleven SNPs in miRNA target sites (Table 5). In our analysis, we found three SNPs with allele frequencies reported in NCBI dbSNP (rs3783620, rs1621 and rs17620927) that may affect miRNA:gene regulation recorded in TarBase v5.0.49 For example, miR-126 was reported to repress the translation of VCAM1 by binding to the 3′-UTR of the gene,50 and this regulation might be influenced by SNP rs3783620 in the seed-matching region of the VCAM1 3′-UTR. Of note, VCAM1 is an important gene in regulating leukocyte trafficking to sites of inflammation.51 In addition, Kim et al. reported that miR-199a* mediated the downregulation of the MET gene by targeting its 3′-UTR.52 However, SNP rs1621 in the seed-matching sequence of MET could affect this activity. Another example is miR-373 and the MKRN1 gene. In a microarray assay, the expression of miR-373 was found to be inversely associated with the expression of MKRN1.53 SNP rs17620927, located in the miR-373 target site of MKRN1, is likely to affect miRNA-mediated gene silencing.
It is notable that the pre-miRNAs with multiple genomic copies, including mir-1244 (4 copies) and mir-1302-2 (4 copies), reside in CNV regions. This implies that the multiple copies of these pre-miRNAs are likely to have been generated by genomic instability. The copy number changes may affect both the miRNAs and their host genes. Some of the pre-miRNAs and host genes in the CNV regions are implicated in important biological functions. For example, mir-200b was reported to mediate gene silencing of ZFHX1B, a gene that is important in the TGFbeta signaling pathway.54 As shown in Table 4, mir-555 is located in a CNV loss region and is embedded in ASH1L, a gene identified as a histone methyltransferase that regulates gene transcription.55 AATK, a tyrosine kinase involved in apoptosis,56 is also located in a CNV loss region and harbors three pre-miRNAs: mir-657, mir-338 and mir-1250. These structural variants of the pre-miRNAs may have important biological effects in humans. Eleven out of 95 individuals were found to carry a deletion that covers mir-199a-1 and its host PCG (DNM2). DNM2 encodes a GTPase and is a candidate gene for dominant intermediate Charcot-Marie-Tooth disease57 and autosomal dominant centronuclear myopathy.58 HTR2C, which harbors 5 pre-miRNAs, was found to be located in a CNV with genotypes of 4 gains and 9 losses. The HTR2C gene encodes a serotonin receptor that is associated with mental disorders59,60 and side effects induced by antipsychotic drugs.61,62 Since little evidence is available about the functional connection between pre-miRNAs and their host PCGs, the current study provides researchers with a comprehensive list for future analysis.
As shown in Figure 4, pre-miRNAs have been shown to be produced through either alternative splicing from their host genes or processing by Drosha (RNASEN, nuclear ribonuclease type III).2 Subsequently, these intermediate precursors are exported by Exportin-5 (XPO5) and Ran-GTP (RAN) from the nucleus to the cytoplasm, where Dicer (DICER1) processes the stem-loop pre-miRNAs into single-stranded mature miRNAs.2,3,63 These mature miRNAs then act through the argonaute family proteins (EIF2C1, EIF2C2)64 and other proteins (TARBP2, TNRC6A, PIWIL1)65,66 to evoke posttranscriptional silencing of target genes. Besides these well-known genes, recent evidence shows that SNIP1,67 and DGCR8,68 along with Drosha, are involved in the initiation of miRNA biogenesis. Gemin3 (DDX20) and Gemin4 (GEMIN4) are two other important genes for miRNA function. They can form novel ribonucleoprotein complexes that perform gene silencing together with miRNAs and eIF2C2, a member of the Argonaute protein family.69
In this study, we evaluated the contribution of CNVs to 13 miRNA-processing pathway genes. SNIP1 and DGCR8 are important proteins in the initiation of pri-miRNA transcription. EIF2C2 belongs to the Argonaute family of genes, which play an important role in gene silencing. The CNVs with genomic loss suggest that other proteins may rescue their biological functions. Interestingly, we also noticed a 350 bp loss in the DICER1 locus, which would produce a truncated form of DICER1. Given the severe abnormality caused by Dicer1 gene deletion,23 we speculate that this truncated form retains the function of DICER1. Since the CNV frequencies at most genes except DGCR8 are relatively low, more evidence is needed for an in-depth exploration.
In summary, we have performed an in-depth analysis of human miRNAs based on multiple association studies. Compared with previous studies, we focused on both SNP and CNV information that may potentially affect miRNA processing and maturation. Since aberrant miRNA expression has been implicated in oncogenesis and other diseases,70 the miRNA-related polymorphisms reported here will facilitate future studies and increase our understanding of the role of miRNAs in human gene regulation.
The chromosomal coordinates for 705 pre-miRNAs (not including mir-941-4) and over 30,000 human genes were obtained from miRBase version 13.0 and NCBI dbGene, respectively. We compared the genomic start and end positions of the genes with those of the 705 pre-miRNAs. A gene was defined as the host of a pre-miRNA when their loci overlapped. All of the genomic coordinates in the present study were based on hg18 (March, 2006).
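The host-gene definition amounts to an interval-overlap test on shared chromosomes. The Python sketch below illustrates that idea under stated assumptions: coordinates are treated as closed intervals, strand is ignored (the text does not say whether strand was considered), and the function names and example coordinates are hypothetical rather than taken from miRBase or NCBI.

```python
from collections import defaultdict

def overlaps(start_a, end_a, start_b, end_b):
    """True when two closed intervals on the same chromosome share at least one base."""
    return start_a <= end_b and start_b <= end_a

def find_host_genes(pre_mirnas, genes):
    """Map each pre-miRNA name to the genes whose loci overlap it.

    Both inputs are lists of (name, chromosome, start, end) tuples.
    """
    # Group genes by chromosome so each pre-miRNA is only compared
    # against genes that could possibly overlap it.
    genes_by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        genes_by_chrom[chrom].append((name, start, end))

    hosts = {}
    for mir_name, chrom, mir_start, mir_end in pre_mirnas:
        hosts[mir_name] = [
            gene_name
            for gene_name, g_start, g_end in genes_by_chrom[chrom]
            if overlaps(mir_start, mir_end, g_start, g_end)
        ]
    return hosts

# Made-up coordinates for illustration only (not real hg18 positions).
genes = [("GENE_A", "chr1", 1000, 9000), ("GENE_B", "chr2", 500, 700)]
pre_mirnas = [("mir-x", "chr1", 4000, 4080), ("mir-y", "chr2", 800, 880)]
hosts = find_host_genes(pre_mirnas, genes)
print(hosts)
```

A pre-miRNA with an empty list would be counted as intergenic. The same overlap test applies unchanged to the comparison of pre-miRNA loci with CNV regions.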
The genomic coordinates of mature miRNAs were inferred from the pre-miRNAs by sequence matching. The miRNA target seed-matching regions predicted by TargetScan v4.1 and PITA were downloaded from the UCSC genome browser.71 Genomic positions of over 10 million SNPs were retrieved from NCBI dbSNP129 and were then used to search for SNPs within the start and end positions of pre-miRNAs, mature miRNAs and miRNA target sites.
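With millions of SNP positions, a linear scan per region is slow. One common alternative is to sort the positions for each chromosome once and use binary search to pull out those falling inside a region. The Python sketch below shows that approach. It is an assumption about how such a search could be done, not a description of the study's actual pipeline, and the positions are hypothetical.

```python
import bisect

def snps_in_region(sorted_positions, start, end):
    """Return SNP positions p with start <= p <= end from a sorted list."""
    lo = bisect.bisect_left(sorted_positions, start)   # first index with p >= start
    hi = bisect.bisect_right(sorted_positions, end)    # first index with p > end
    return sorted_positions[lo:hi]

# Hypothetical SNP positions on one chromosome, sorted once up front.
snp_positions = sorted([150, 1020, 1055, 1300, 2500])
found = snps_in_region(snp_positions, 1000, 1100)
print(found)
```

Each lookup costs two binary searches, so querying every pre-miRNA, mature miRNA and target site stays fast even for a SNP set of this size.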
The genomic coordinates of CNVs (after excluding inversions) were downloaded from the Database of Genomic Variants (DGV) version 7.40 The genomic start and end positions of over 20,000 CNVs were compared with those of the 705 pre-miRNAs to identify overlapping regions.
The PANTHER database72 was used to identify enriched functional annotation categories for pre-miRNA host genes and for the genes with SNPs in the miRNA target sites. KEGG pathways and two GO categories (biological process and molecular function) were evaluated. Bonferroni-corrected p < 0.05 was considered statistically significant.
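The enrichment results in this study are reported with Bonferroni-corrected p-values. As a reminder of what that correction does, the Python sketch below multiplies each raw p-value by the number of tests and caps the result at 1. The raw p-values are hypothetical, and the sketch does not reproduce how PANTHER itself computes its statistics.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for four annotation categories.
raw = [0.001, 0.02, 0.04, 0.5]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]

for r, a, s in zip(raw, adjusted, significant):
    print(f"raw = {r:.3f}  adjusted = {a:.3f}  significant = {s}")
```

In this toy example only the first category survives correction, even though three of the four raw p-values are below 0.05.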
NCBI dbSNP: http://www.ncbi.nih.gov/SNP/
Database of Genomic Variants: http://projects.tcag.ca/variation/
UCSC genome browser: http://genome.ucsc.edu/
TargetScanS miRNA target database: http://www.targetscan.org/vert_42/
PITA miRNA target database: http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html
RNAfold web server: http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
This Pharmacogenetics of Anticancer Agents Research (PAAR) Group (http://pharmacogenetics.org/) study was supported by NIH/NIGMS grant UO1GM61393 and data deposits are supported by UO1GM61374 (http://pharmgkb.org/). This study was also supported by P50 CA125183 University of Chicago Breast Cancer SPORE grant.