A significant portion of the genome is transcribed as long non-coding RNAs (lncRNAs), several of which are known to control gene expression. The repertoire and regulation of lncRNAs in disease-relevant tissues, however, have not been systematically explored. We report a comprehensive strand-specific transcriptome map of human pancreatic islets and β-cells, and uncover >1100 intergenic and antisense islet-cell lncRNA genes. We find islet lncRNAs that are dynamically regulated, and show that they are an integral component of the β-cell differentiation and maturation program. We sequenced the mouse islet transcriptome, and identify lncRNA orthologs that are regulated like their human counterparts. Depletion of HI-LNC25, a β-cell specific lncRNA, downregulated GLIS3 mRNA, thus exemplifying a gene regulatory function of islet lncRNAs. Finally, selected islet lncRNAs were dysregulated in type 2 diabetes or mapped to genetic loci underlying diabetes susceptibility. These findings reveal a new class of islet-cell genes relevant to β-cell programming and diabetes pathophysiology.
During recent years, it has become apparent that the genomes of species as diverse as zebrafish, mice and humans transcribe thousands of RNAs that do not encode proteins (Bertone et al., 2004; Birney et al., 2007; Carninci et al., 2005; Guttman et al., 2009; Ulitsky et al., 2011). A subset of non-coding transcripts are larger than 200 nucleotides, and are known as long non-coding RNAs (lncRNAs) (Mattick and Makunin, 2006). The function of most lncRNAs remains unknown. However, several dozen lncRNAs are known to exert non-redundant roles in processes such as X-inactivation, imprinting, splicing, transcriptional regulation, pluripotency, cancer, cell cycle, or survival (Gupta et al., 2010; Guttman et al., 2011; Hu et al., 2011; Penny et al., 1996; Rinn et al., 2007; Sleutels et al., 2002). In one example, a lncRNA has been shown to promote reprogramming of pluripotent cells from somatic cells (Loewer et al., 2010). Available evidence thus indicates that lncRNAs represent a still poorly understood layer of gene regulation.
Numerous mammalian lncRNAs are expressed in a cell-type specific manner (Cabili et al., 2011; Mercer et al., 2008). Together with knowledge that several such transcripts are functional, this raises the intriguing possibility that lncRNAs could be previously unsuspected mediators of lineage-specific differentiation or specialized cellular functions. Defects in lncRNAs could thus underlie human disease, and cell-specific regulatory lncRNAs might provide therapeutic targets. This warrants exploring the lncRNA repertoires of disease-relevant cell types and tissues.
Pancreatic islets of Langerhans are an excellent model of a specialized tissue that is closely linked to human disease. Islets comprise insulin-secreting β-cells and other polypeptide hormone-producing cells, including glucagon-secreting α-cells. Islet-cell dysfunction is central to the pathophysiology of Type 2 diabetes (T2D), the most prevalent form of diabetes (Bell and Polonsky, 2001). Recent genome-wide association studies for T2D and related traits have revealed >50 susceptibility loci, most of which are not known to carry variants that alter protein-coding sequences (McCarthy, 2010). A common hypothesis is that such variants impact regulatory elements of protein-coding genes, although they could equally affect other non-protein coding elements such as lncRNAs.
In Type 1 diabetes, β-cells are destroyed by autoimmune mechanisms, and consequently several experimental approaches are being developed to replace destroyed cells (Halban et al., 2001). One approach is based on the recent discovery of β-cell transcription factors, some of which have been misexpressed in somatic cells to create insulin-expressing cells (Collombat et al., 2009; Zhou et al., 2008). Another approach is to derive β-cells from pluripotent cells (Kroon et al., 2008). However, existing strategies have not yet succeeded in generating fully functional therapeutic β-cells in vitro. Clearly, the identification of novel regulators of β-cell differentiation and maturation remains a major challenge, and islet-specific regulatory transcripts are logical targets for this effort. Despite the potential implications for human diabetes, information on islet-cell lncRNAs is lacking.
In this study, we integrated sequence-based transcriptome and chromatin maps of human islets and β-cells to define 1128 islet lncRNA genes. We show that these lncRNAs are an integral component of the dynamic β-cell specific differentiation program, suggesting a role as biomarkers and potential regulators for programming efforts. We extend existing knowledge on lncRNAs by disclosing orthologous mouse transcripts that are regulated in an evolutionarily conserved manner. We focused on one islet-specific lncRNA, and show that it impacts the expression of a regulatory target. Finally, we show dysregulation of islet lncRNAs in T2D, and map selected lncRNAs to human diabetes genetic susceptibility loci. Collectively, these studies describe a new class of islet genes, and provide diverse lines of evidence to suggest that islet lncRNAs may impact diabetes pathophysiology and efforts to program therapeutic β-cells.
To identify human β-cell lncRNAs we integrated transcriptional and chromatin maps of purified human islet-cells (Figure 1A). We generated directional and non-directional cDNA libraries from PolyA+ and Ribo- RNA fractions of six human islets and two FACS-purified β-cell samples, and collected >460 million uniquely mapped paired-end sequence reads (Table S1). To increase our ability to define the structure of novel genes, we mapped epigenetic gene landmarks in three human islet samples, including H3K4me3 which is enriched in active promoter regions (Table S1). These datasets were then integrated to build strand-specific models of active human islet-cell genes with their predominant splicing patterns (Figure 1A).
As a measure of quality, we first examined annotated genes, and recovered ~70% as expressed above a threshold level of 0.5 reads per kilobase per million mapped reads (RPKM) in islet cells. Despite the fact that the samples originated from diverse human islet donors (Table S2), both mRNA and H3K4me3 levels showed a high correlation between the different human islet samples (r² > 0.94, Figure S1A, B). Consistent with the high islet purity, the most abundant H3K4me3-positive transcripts included known islet-cell and β-cell genes (Figure 1B, Table S3). Remarkably, ~20% of exonic reads in islets and ~45% in β-cells originated from the Proinsulin mRNA (Figure 1B). Moreover, known islet-specific genes displayed coherent chromatin enrichment patterns (see, for example, Figure 1C, Figure S1C).
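The expression threshold used throughout can be illustrated with a minimal sketch of the standard RPKM normalization (reads per kilobase of transcript per million mapped reads). The function names and the example counts below are illustrative assumptions, not taken from the study's pipeline.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def is_expressed(read_count: int, transcript_length_bp: int,
                 total_mapped_reads: int, threshold: float = 0.5) -> bool:
    """Apply the >0.5 RPKM cutoff quoted in the text (default threshold)."""
    return rpkm(read_count, transcript_length_bp, total_mapped_reads) > threshold


# Hypothetical numbers: 100 reads on a 2 kb transcript in a 50-million-read library.
example_value = rpkm(100, 2000, 50_000_000)
```

Because the value is scaled by both transcript length and library size, a single cutoff such as 0.5 can be applied across transcripts and samples of different sizes.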
Genomic inventories of human islets are unavoidably confounded by signals from contaminant exocrine cells. Our approach, however, allowed us to define lncRNAs that are truly expressed in β-cells and other islet cell-types. Transcripts from the few contaminating acinar cells were efficiently excluded because they lacked H3K4me3 enrichment in islet preparations (Figure S1D, H; Table S3). Accordingly, <1% of genes that we defined as active in islet cells were contaminant acinar transcripts, on the basis of a relative enrichment in acinar vs. islet-cell RNA. This initial analysis thus validated our strategy to map bona-fide active genes in human islets and β-cells.
Over 19% of the transcribed genome in human islets mapped outside of annotated protein-coding genes (Figure S1E). To define discrete islet-cell lncRNA genes, we selected 1128 transcripts with the following properties: (i) length >200 bp, (ii) H3K4me3 enrichment in a coherent location relative to the transcribed strand, (iii) expression >0.5 RPKM in all five islet and β-cell PolyA+ samples, (iv) no splicing or overlap with any coding gene present in RefSeq, UCSC or Ensembl annotations, and (v) low protein-coding potential. Of these, 761 were antisense (AS) lncRNAs, located <1 kb from an annotated gene but in a divergent orientation (Figure 1D). Thirty-two were <1 kb from an annotated gene but in a convergent orientation. Another set of 335 were intergenic (IG) lncRNAs, located >1kb from any coding gene. Finally, we identified 55 annotated lncRNAs that were located within the boundaries of coding genes, and termed these overlapping antisense lncRNAs (Figure 1D). For subsequent analysis we merged convergent and intergenic lncRNAs (a total of 367) as neither was closely associated with a promoter of a protein-coding gene. Comparison with acinar RNA-seq showed that only one lncRNA showed >3 fold acinar enrichment. Furthermore, qPCR analysis confirmed expression of 31/31 lncRNAs in human islets, and 26/31 in the human β-cell line EndoC-β-H1 (Ravassard et al., 2011) (data not shown). These transcripts are therefore bona-fide islet-cell and in most cases β-cell lncRNAs. Their genomic location and exon annotations are provided in Table S4.
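The selection criteria and the positional classes described above can be sketched as a small filter. This is only an illustration under stated assumptions: the `Candidate` record, its field names, and the treatment of a transcript lying exactly 1 kb from a gene are hypothetical, and the 55 overlapping antisense lncRNAs taken from existing annotations are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical summary record for one transcript model."""
    length_bp: int
    rpkm_per_sample: List[float]      # one value per PolyA+ sample
    has_coherent_h3k4me3: bool        # promoter mark consistent with the strand
    overlaps_coding_gene: bool        # splices into or overlaps an annotated coding gene
    low_coding_potential: bool
    distance_to_nearest_gene_bp: int
    orientation: str                  # "divergent" or "convergent"


def passes_filters(c: Candidate, min_length: int = 200, min_rpkm: float = 0.5) -> bool:
    """Criteria (i) to (v) from the text, applied to every sample."""
    return (
        c.length_bp > min_length
        and c.has_coherent_h3k4me3
        and len(c.rpkm_per_sample) > 0
        and all(x > min_rpkm for x in c.rpkm_per_sample)
        and not c.overlaps_coding_gene
        and c.low_coding_potential
    )


def classify(c: Candidate, proximity_bp: int = 1000) -> str:
    """Positional class; a distance of exactly 1 kb is treated as proximal (assumption)."""
    if c.distance_to_nearest_gene_bp > proximity_bp:
        return "intergenic"
    return "antisense" if c.orientation == "divergent" else "convergent"
```

In the analysis described in the text, the convergent and intergenic classes are subsequently merged, which would correspond to mapping both labels to a single group after `classify`.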
Further analysis of the genomic properties of the 1128 islet lncRNAs showed that their overall protein-coding parameters resembled those of randomly chosen intergenic regions, confirming that most are likely to be truly non-coding transcripts (Figure 1E, Figure S1F). Antisense and intergenic lncRNAs had a similar average length as RefSeq genes, yet showed ~10 fold lower expression (Figure 1F). Despite the low abundance of lncRNAs, the top quartile were expressed at levels comparable to mRNAs encoding transcriptional regulators linked to human diabetes (HNF1A, HNF4A, and TCF7L2; 6.4, 7.1 and 5.7 RPKM in human islets, respectively). Compared with RefSeq transcripts, islet lncRNA genes displayed similar RNA Pol II-enrichment, but were less often H3K36me3-enriched, in keeping with less frequent splicing (36% for intergenic lncRNAs, vs. 93% in RefSeq genes) (Table S5). Finally, lncRNAs overlapped interspersed repeat regions more often than coding exons, yet less than random intergenic control regions (Figure S1G). In conclusion, these studies uncover a high-confidence set of 1128 human islet-cell genes that have several expected properties for lncRNAs.
Because many islet lncRNAs have escaped annotation, we reasoned that this could be in part because this class of transcripts is often cell-type specific. We realigned 16 human non-pancreatic RNA-seq datasets (Table S1), and found that 9.4% of RefSeq annotated genes were islet-specific (Figure 2A, Table S6, see methods for a definition of the islet-specificity score). By contrast, 55% of intergenic lncRNAs, and 40% of antisense islet lncRNAs were islet-specific, a significant enrichment over protein-coding genes (p<3·10⁻¹⁶ for both comparisons) (Figure 2B). This set of lncRNAs was similarly enriched in purified β-cells (Figure 2A and not shown). Many examples of lncRNAs were thus found to be unique or highly specific to β-cells within the entire panel of 18 tissues (Figure 2C). The islet selectivity was confirmed by qPCR in 12/12 lncRNAs using an independent panel of nine tissues (Figure S2). Thus, human islet-cell lncRNAs are frequently transcribed in a highly cell-type specific manner.
To explore the biological significance of human islet lncRNAs, we next examined their relationship with nearby protein-coding genes. The abundance of each lncRNA transcript in human islets was not related to that of the neighboring protein-coding mRNA (Figure S3A). However, the expression of lncRNAs and their most proximal protein-coding gene significantly correlated across different tissues (p<3·10⁻¹⁴) (Figure 3A). Accordingly, the closest protein-coding genes to islet intergenic and antisense lncRNAs were more often islet-specific than the entire collection of annotated genes (p<10⁻¹⁰), up to >3-fold when considering only islet-specific lncRNAs (p<2·10⁻¹¹) (Figure 3B, C). Notably, islet lncRNAs were preferentially located in genomic regions where the closest protein-coding gene has been linked to β-cell function, development, and transcription (Figure 3C). Examples of known islet protein-coding genes paired with intergenic lncRNAs include MAFB, FOXA2, PCSK1, and ISL1 (Figure S3B and data not shown). We compiled a list of 20 genes encoding known islet-enriched transcription factors, and found that remarkably 13 (65%), including HNF1A, PDX1, PAX6, ISL1, INSM1, NEUROD1, GATA6, NKX2-2, and RFX6, were associated with antisense lncRNAs (Figure S3B and data not shown). Thus, islet lncRNAs often map to the same genomic region as islet-enriched protein-coding genes.
It was also apparent that lncRNAs were often located in gene-poor areas. We examined the genomic intergenic spaces where lncRNAs reside, and found that they are on average nearly 3-fold larger than the entire set of intergenic intervals in the human genome (p<3·10⁻¹²) (Figure 3D). The fact that islet lncRNAs were often located in large gene-poor spaces, yet associated to genes with similar cell-type specific expression, suggested the existence of broad cell-specific regulatory domains. Human islets harbor tissue-specific Clusters of Open chromatin Regulatory Elements (COREs), often linked to a single islet-enriched protein-coding gene (Gaulton et al., 2010). We thus examined the relationship of islet lncRNAs to islet-specific COREs (see methods). Both intergenic and antisense lncRNAs, as well as their nearby annotated genes, were more often located near islet-selective COREs than random intergenic regions (p<3·10⁻⁹) or expressed annotated genes (p<0.013) (Figure 3E). Thus, islet lncRNAs are often associated to broad cell-specific regulatory domains, many of which appear to be shared by protein coding genes.
HI-LNC25 illustrates many of the above-mentioned features. It is a multiexonic transcript located in a broad ~1.6 Mb space that lacks any protein-coding gene, but contains clusters of islet-specific active chromatin (Figure 3F). The most proximal protein-coding gene is MAFB, an essential regulator of islet-cell maturation that is abundantly expressed in human β-cells (Artner et al., 2007; Dorrell et al., 2011). Interestingly, whereas MAFB is expressed in several tissues, HI-LNC25 shows a much more restricted tissue distribution (Figure 3F). Taken together, these results indicate that islet-selective lncRNAs are frequently associated with cell-specific higher-order chromatin domains that potentially underlie co-regulation of non-coding and coding transcript pairs.
To understand the developmental regulation of islet lncRNAs, we first used qPCR to test the expression of β-cell selective lncRNAs (10 intergenic, 3 antisense) in dissected Carnegie stage 17–19 human embryonic pancreas, a progenitor stage that shows scarce signs of cytodifferentiation (Piper et al., 2004). Of the 13 lncRNAs that were examined, all except one were silent or expressed at low levels in pancreatic progenitors, and were subsequently active in adult islets (Figure 4A). Thus, islet lncRNAs are not only often islet-specific, but they are also linked to the pancreatic endocrine differentiation program.
We next examined the dynamics of islet lncRNAs in a human embryonic stem-cell (hES) differentiation model. Current methods to differentiate β-cells from stem-cells are limited by difficulties in completing the maturation of β-cells in vitro. We thus used a hES cell protocol that involves multiple in vitro differentiation steps, followed by encapsulation of differentiated pancreatic endoderm, which after implantation into mice produces mature endocrine cells over a 140-day in vivo incubation period (Kroon et al., 2008). We profiled the same lncRNAs at each stage of the protocol, and discovered that all islet lncRNAs were markedly induced during the in vivo maturation step (Figure 4B). Notably, 6 lncRNAs were expressed at very low or undetectable levels throughout all in vitro differentiation steps, and were only activated during the in vivo maturation step (Figure 4B). Thus, islet-specific lncRNA gene activation is linked to pancreatic endocrine differentiation during embryogenesis, and during in vivo differentiation of hES cell-derived pancreatic progenitors. Islet lncRNAs are therefore candidate markers and/or regulators of β-cell differentiation and maturation.
Mature β-cells exhibit a transcriptional response to increased demand (Bensellam et al., 2009; Schuit et al., 2002). Selected islet protein-coding transcripts are thus moderately induced in response to high glucose under appropriate experimental conditions. To assess if islet lncRNAs are also dynamically regulated, we exposed human islets to 4 mM or 11 mM glucose for 72 hrs. This revealed glucose-dependent changes in protein-coding mRNAs related to insulin secretion (INS, IAPP, PCSK1), whereas the stress-responsive marker DDIT3 (CHOP) was not induced (Figure 4C). We then tested the set of 13 islet lncRNAs and discovered that HI-LNC78 and HI-LNC80, two intergenic multiexonic lncRNAs that are not located near islet-enriched protein coding genes, were consistently upregulated in a glucose-dependent manner in five individual donors (Figure 4C). Thus, selected islet lncRNAs are dynamically regulated in settings that are relevant for mature islet-cell physiology.
We next assessed the evolutionary conservation of islet lncRNAs. Mammalian sequence conservation scores of islet lncRNAs were markedly lower than protein-coding exons, yet higher than random intergenic regions, in keeping with previous results (Pauli et al., 2012; Ulitsky et al., 2011) (Figure 5A). For nearly 70% of human lncRNAs, we identified orthologous mouse genomic regions of >200 bp, compared to 58% of random human intergenic regions (p<2·10⁻⁹) (Figure 5B).
The degree of sequence identity that is required for functional conservation of non-coding transcripts is uncertain. To improve our understanding of the evolutionary conservation of islet lncRNAs, we examined if orthologous lncRNAs were transcribed in mouse islets. To this end, we generated 78 million paired-end reads from purified mouse islet PolyA+ RNA directional libraries (Table S7). We found that 47% of the mouse orthologous regions were transcribed in mouse islets, as opposed to <3% of random intergenic regions of the same size (p<3·10⁻¹⁶) (Figure 5C). This is likely an underestimate of orthologous mouse lncRNAs since qPCR analysis confirmed the expression of 7/7 orthologous mouse lncRNAs, and also showed that 5/7 orthologous mouse genomic regions that did not surpass our detection threshold in the RNA-seq analysis were nevertheless detected at low levels in mouse islet and/or β-cell line RNA, whereas control intergenic regions were undetectable (Figure 5D, E).
Next, we assessed if mouse orthologous lncRNA transcripts are regulated in a similar manner as their human counterparts. Like human lncRNAs, mouse orthologous transcripts were >4-fold more frequently islet-cell specific than protein-coding genes (p<3·10⁻¹⁶) (Figure 5F, G, Figure S4A). Moreover, the analysis of E13.5 embryonic pancreas showed that 5 out of 8 lncRNAs were inactive prior to endocrine differentiation, indicating that they are typically regulated in a similar stage-specific fashion as their human orthologs (Figure 5H, Figure S4B). Furthermore, a comparison of islets from neonate and adult mice showed that most lncRNAs were further upregulated during postnatal islet maturation (Figure S4C). Likewise, five out of eight tested mouse islet lncRNAs were induced upon exposure to high glucose concentrations, including Mi-Linc80, whose ortholog is glucose-responsive in human islets (Figure 5I). Several islet lncRNAs were also regulated in vivo in glucose intolerant Leptin-deficient mice (Figure S4D). Thus, genomic mouse regions that are orthologous to human islet lncRNA genes are frequently also transcribed in islets, and like their human counterparts they exhibit a highly dynamic and cell-specific regulation.
The evolutionary conserved regulation of islet lncRNAs points to a functional role. LncRNAs have been linked to diverse types of functions, frequently involving direct or indirect regulation of expression of protein-coding genes (Wang and Chang, 2011). To begin to understand islet lncRNA function we focused on HI-LNC25, which fulfills several typical features of human islet lncRNAs, as shown in Figure 3F. As an experimental model we used a human β-cell line (EndoC-β-H1) that exhibits glucose-induced insulin secretion (Ravassard et al., 2011). We transduced human β-cells with lentiviral vectors expressing two independent shRNA hairpins that suppressed HI-LNC25 RNA and examined the expression of a panel of 24 islet mRNAs. This screen identified GLIS3 mRNA as a potential regulatory target of HI-LNC25. GLIS3 encodes an islet transcription factor, it is mutated in a form of monogenic diabetes, and contains T2D risk variants (Cho et al., 2012; Senee et al., 2006). We therefore performed four independent HI-LNC25 knockdown experiments with two separate shRNA hairpins. Although this depletion of HI-LNC25 did not cause significant changes in glucose-stimulated insulin secretion (not shown), we observed a consistent reduction of GLIS3 mRNA in comparison to five control shRNA hairpins, whereas other control genes remained unaltered (Figure 6A). This effect was stably maintained over several days in culture (Figure 6B). Thus, HI-LNC25 positively regulates GLIS3 mRNA, supporting the notion that our collection of islet lncRNAs contains regulatory transcripts.
To determine the potential role of islet lncRNAs in the pathogenesis of diabetes, we first examined whether lncRNAs are abnormally expressed in human islets from donors with T2D. We examined the panel of 13 lncRNAs, and added KCNQ1OT1, a lncRNA that was detected as an overlapping antisense lncRNA in our analysis, and has been previously genetically associated with T2D (Voight et al., 2010). We compared these lncRNAs in islets from 19 non-diabetic and 16 T2D donors. We found that two lncRNAs, namely KCNQ1OT1 and HI-LNC45, were significantly increased or decreased in T2D islets, respectively (p<0.02) (Figure 7A). To ensure that this result was not due to differences in islet purity between groups of samples, we normalized the expression levels of HI-LNCs relative to the islet transcription factor PAX6, and found similar results (data not shown). Thus, selected lncRNAs are dysregulated in T2D islets.
To further examine the potential role of islet lncRNAs in the genetic susceptibility for diabetes, we first examined loci implicated in monogenic syndromes of β-cell dysfunction disorders. Islet antisense lncRNAs were found within loci underlying neonatal diabetes (ABCC8/KCNJ11), pancreas agenesis (GATA6), and monogenic diabetes (HNF1A) (Allen et al., 2012; Bell and Polonsky, 2001; Gloyn et al., 2004). Because several antisense lncRNAs have been implicated in cis regulatory effects (Wilusz et al., 2009; Xu et al., 2011), these islet lncRNAs provide additional candidate regulatory elements that may contain pathogenic variants.
Next, we examined genetic association data from multifactorial T2D and related continuous glycemic measures, including fasting glucose in non-diabetic individuals (Dupuis et al., 2010; Voight et al., 2010). Currently, >50 loci are known to show association to T2D and related traits, and at most loci non-coding variants are likely to be causal (McCarthy, 2010). Earlier studies showed that two annotated lncRNAs, ANRIL/CDKN2BAS and KCNQ1OT1, map within established T2D susceptibility loci (Voight et al., 2010; Zeggini et al., 2007). To evaluate the potential overlap between newly identified islet lncRNAs with T2D loci, we used MAGENTA (Segre et al., 2010). This tool tests whether a pre-specified set of genes is enriched for trait associations in genome-wide data, and was applied here to available data sets for T2D and related continuous glycemic measures, as well as 5 non-islet related control phenotypes.
We found that islet lncRNA transcripts were, as a group, enriched for association with both T2D (p=0.02) and fasting glucose (p=0.01), with no enrichment for the control phenotypes apart from height (p=0.02) (Lango Allen et al., 2010) and waist-hip ratio (p=0.001) (Heid et al., 2010). When lncRNAs were examined by type (intergenic vs. antisense), enrichment of association signal for intergenic lncRNAs was only observed for T2D and fasting glucose (T2D p=0.03; fasting glucose p=0.003) (Table S8). Of 55 T2D susceptibility loci, 9 contained islet lncRNAs within 150 kb of the reported lead SNP, 6 of which have been linked directly to β-cell dysfunction (Figure 7A) (Cho et al., 2012; Dupuis et al., 2010; Kooner et al., 2011; Strawbridge et al., 2011; van de Bunt and Gloyn, 2010; Voight et al., 2010). Examples include a lncRNA in the vicinity of PROX1, which overlaps the region of strongest association, very likely to contain the causal SNP (Figure 7B), and the most significant lncRNA in the MAGENTA analysis, near WFS1 (Figure 7C). These studies therefore offer a new class of genomic elements that can be interrogated to dissect the functional etiology of T2D susceptibility.
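The proximity criterion of 150 kb around each lead SNP can be sketched as a simple interval check. This is a hedged illustration only: the tuple layouts and function names are hypothetical, distance is assumed to be measured from the SNP to the nearest edge of the lncRNA, and "within" is assumed to include exactly 150 kb.

```python
from typing import Iterable, List, Tuple

Snp = Tuple[str, int]             # (chromosome, position)
Interval = Tuple[str, int, int]   # (chromosome, start, end), with start <= end


def distance_to_interval(pos: int, start: int, end: int) -> int:
    """Distance in bp from a position to the nearest edge of an interval (0 if inside)."""
    if start > end:
        raise ValueError("start must not exceed end")
    if start <= pos <= end:
        return 0
    return min(abs(pos - start), abs(pos - end))


def loci_with_nearby_lncrna(lead_snps: Iterable[Snp],
                            lncrnas: Iterable[Interval],
                            window_bp: int = 150_000) -> List[Snp]:
    """Return the lead SNPs that have at least one lncRNA within window_bp."""
    lncrna_list = list(lncrnas)
    hits = []
    for chrom, pos in lead_snps:
        if any(c == chrom and distance_to_interval(pos, s, e) <= window_bp
               for c, s, e in lncrna_list):
            hits.append((chrom, pos))
    return hits
```

Applied to real coordinates, the length of the returned list would correspond to the number of susceptibility loci that harbor a nearby islet lncRNA; no real coordinates are assumed here.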
Increasing evidence points to a regulatory function of many lncRNAs, which suggests a pivotal role in physiology and disease (Guttman and Rinn, 2012; Wang and Chang, 2011). We have integrated transcriptional and chromatin maps to systematically annotate lncRNA genes in pancreatic islet cells, a key tissue for human diabetes. Strand-specific analysis uncovered hundreds of intergenic islet lncRNAs, and many others that were antisense to protein-coding genes and hence not discernable using conventional cDNA libraries. Human islet lncRNAs were found to be highly cell-type specific, developmentally regulated, and tightly linked to the differentiation of ES cell-derived islet-cells. We identified mouse orthologous transcripts, and discovered that they are dynamically regulated in a similar manner as human islet lncRNAs. We focused on a prototypical islet lncRNA and demonstrate that it acts as a positive regulator of an islet mRNA. Finally, we show examples of lncRNAs that are dysregulated in T2D islets and others that map to T2D susceptibility loci. These studies open new avenues to study the role of functional lncRNAs in islet-cell disease and therapeutic programming.
Islet lncRNAs show a striking cell-type specific expression pattern. This finding is in line with a recent analysis of intergenic lncRNAs across multiple tissues (Pauli et al., 2012), and with earlier descriptions of lncRNAs that have distinct cell-type specific expression patterns in brain (Mercer et al., 2008). Our studies additionally demonstrate that islet-specific lncRNA genes are inactive in embryonic pancreatic progenitors, and subsequently become activated in islet-cells. Likewise, numerous lncRNAs were activated during the final endocrine differentiation step of a pluripotent cell differentiation protocol. This indicates that islet lncRNAs are an integral component of the endocrine differentiation program.
Islet-specific lncRNA transcription was linked to clusters of open chromatin. Such clusters were previously shown to be associated with islet-specific transcription (Gaulton et al., 2010). Interestingly, we also observed concordant cell-specific transcription of lncRNAs with their nearest protein-coding genes, in accordance with other genomic analysis of lncRNAs (Ponjavic et al., 2009). These data suggest that concordant tissue-specific expression of lncRNA and protein-coding pairs is linked to shared chromosomal regulatory domains.
The islet specificity of lncRNAs is reminiscent of the recent discovery of transcription factors that show a similar stage- and cell-specific activation pattern (reviewed in (Servitja and Ferrer, 2004)). Several such islet-enriched factors have now been shown to control islet development, and have consequently been exploited to program insulin-expressing cells from somatic cells (Collombat et al., 2009; Zhou et al., 2008). By analogy, islet lncRNAs that exert gene regulatory functions could also be employed in efforts to program functional β-cells. This prospect is supported by recent studies showing that a lncRNA can promote reprogramming pluripotent cells from somatic cells (Loewer et al., 2010).
Several islet lncRNAs were specifically activated during the in vivo maturation step of the ES cell differentiation protocol. This result is significant because the major roadblock for efforts to derive fully functional β-cells from pluripotent cells lies in the inability to complete the differentiation process in vitro (Van Hoof et al., 2009). Our findings suggest that lncRNAs that are induced at this late stage can be used as biomarkers of mature β-cells, and potentially be exploited as effectors to promote β-cell programming.
For a major fraction of lncRNAs we identified orthologous mouse genomic regions, although the overall sequence identity was clearly not subject to a similar evolutionary constraint as protein-coding genes, in keeping with earlier studies of lncRNAs (Pauli et al., 2012; Ulitsky et al., 2011). Several mechanisms have been previously proposed to explain this comparatively low sequence conservation, including a hypothetically less stringent requirement for primary sequence conservation to maintain functional secondary structures, or the existence of very short stretches of functional sequences (Ulitsky et al., 2011). It is also conceivable that evolutionary changes in lncRNA sequences are functionally coupled to other parallel genomic changes, such as the acquisition of species-specific repetitive elements.
Importantly, in the current study we discovered that a significant fraction of orthologous mouse sequences were transcribed in mouse islet-cells and β-cell lines. Orthologous lncRNA pairs display similar cell-type specific expression and stage-specific regulation during embryonic development. This finding points to an evolutionary conserved functional property of lncRNAs that extends beyond primary sequence, and further supports that lncRNAs are an integral component of the mammalian islet-cell differentiated phenotype.
The identification of lncRNAs opens a new framework to study human pathophysiology. Our study revealed lncRNA genes at 6 neonatal diabetes loci and in HNF1A, the most common monogenic diabetes locus (Bell and Polonsky, 2001). Most such lncRNAs run antisense from the coding gene. Antisense transcripts have been shown to control the transcription of protein-coding genes in cis (Wilusz et al., 2009; Xu et al., 2011). Antisense lncRNAs identified in this study are thus candidate regulators of protein-coding genes implicated in human β-cell monogenic disorders, and can be potentially affected by pathogenic mutations.
These results are also relevant to the molecular etiology of T2D, a disease caused by abnormal β-cell function or growth (Bell and Polonsky, 2001; McCarthy, 2010). We identified examples of islet lncRNAs that are dysregulated in T2D. Our findings also revealed islet-cell lncRNAs as candidates to dissect the underpinnings of non-coding variants underlying T2D risk. More generally, these results set the stage for future studies to dissect the potential role of inherited and acquired defects in lncRNA genes in human β-cell physiology and disease. Several experimental model systems, including mouse genetics coupled with perturbation studies in human β-cells, are now available to address this challenge.
Human islets were isolated at the University of Virginia, University of Geneva, University of Lille, and San Raffaele Scientific Institute islet centers (Bucher et al., 2005). Human islets used for RNA-seq and ChIP-seq were cultured with CMRL 1066 medium containing 10% Fetal Calf Serum (FCS) before shipment, after which they were cultured for three days with RPMI 1640 medium containing 11 mM glucose, supplemented with 10% FCS. For glucose regulation experiments, human or mouse islets were cultured for 72 hrs in RPMI 1640 medium containing 4 or 11 mM glucose. Donor information for non-diabetic and diabetic donors is provided in Table S2. Samples were selected based on islet cell purity, as assessed by dithizone staining, immunofluorescence analysis, and qPCR analysis of cell-specific mRNAs.
RNA was isolated with Trizol (Invitrogen) or RNeasy (Qiagen). DNase I treatment and a control lacking reverse transcriptase were performed on all RNA samples. Quantitative PCR was performed as described based on SYBR green detection (van Arensbergen et al., 2010). See Table S9 for oligonucleotide sequences.
Lentiviral vectors carrying two miRNA-based shRNAs targeting HI-LNC25 and five non-targeting control sequences were transduced into the EndoC-β-H1 human β-cell line (Castaing et al., 2005; Ravassard et al., 2011; Scharfmann et al., 2008). Oligonucleotide sequences are shown in Table S9. Non-transduced cells were assayed in parallel. Cells were assayed three or seven days post transduction.
Human ES cells were differentiated to pancreatic insulin-producing cells as described (Kroon et al., 2008).
Chromatin immunoprecipitations were performed as described (Boj et al., 2009), with modifications described in supplemental methods. ChIP libraries were prepared according to Illumina protocols, and single-end reads were sequenced on a GAIIx system. RNA-seq was performed on unidirectional and non-directional cDNA libraries prepared from PolyA+ or rRNA-depleted pancreatic islet RNA, and sequenced using GAIIx, HiSeq2000 or SOLiD 4 systems. Non-pancreatic RNA-seq reads were retrieved from the Illumina human BodyMap2 dataset (http://www.ebi.ac.uk/arrayexpress/browse.html?keywords=E-MTAB-513) and the ENCODE/LICR Project (Birney et al., 2007). Reads were aligned to the NCBI36/hg18 and NCBI37/mm9 genomes using Bowtie v0.11.3 for ChIP-seq, allowing 1 mismatch per read, no multi-mapping and no clonal reads. Non-directional RNA-seq libraries were mapped with TopHat v1.2.0, and unidirectional libraries with Bowtie, using parameters described in supplemental methods. The number of uniquely aligned reads, read length, and sample information are shown in Tables S1 and S7.
Transcriptional units were defined from the RNA-seq alignments and joined by splice junctions and paired-end mapping to create putative gene models. Transcriptional units that were expressed at >0.5 RPKM, with H3K4me3 enrichment in the 5’ region in at least two samples, and that either did not overlap with annotated protein-coding genes or did overlap but were transcribed from the opposite strand, were assessed for protein-coding potential and processed through our lncRNA discovery pipeline. To assess evolutionary sequence conservation, human islet lncRNA exon locations, or the same fragments randomized 1000 times in the alignable portion of intergenic space, were analyzed. Orthologous mouse genomic sequences were identified with the LiftOver tool from the UCSC browser. Mouse islet RNA-seq was used to detect transcripts expressed at >0.5 RPKM in orthologous or randomized control regions. To evaluate islet specificity of lncRNAs, Cluster 3.0 v1.5a was used for hierarchical clustering of transcript expression in islet cells and 16 non-pancreatic tissues. Furthermore, a score was created that measures the difference between the average expression in 3 human islet PolyA+ samples and the average + 2 standard deviations in the remaining 16 tissues. MAGENTA (Segrè et al., 2010) was used to evaluate whether lncRNAs were enriched for association to T2D and related glycemic traits, with adaptations described in supplemental methods.
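As an illustration only, the candidate pre-filter and the islet-specificity score described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the pipeline used in this study: the class and function names are hypothetical, the "at least two samples" requirement is applied to H3K4me3 enrichment only, and the sample standard deviation (n − 1 denominator) is assumed because the text does not specify the estimator.

```python
from dataclasses import dataclass
from statistics import mean, stdev

RPKM_THRESHOLD = 0.5      # expression cutoff stated in the text
MIN_H3K4ME3_SAMPLES = 2   # "at least two samples" (assumed to refer to H3K4me3)


@dataclass
class TranscriptionalUnit:
    """Hypothetical record for one assembled transcriptional unit."""
    rpkm: float                 # expression level in islets
    h3k4me3_samples: int        # samples with 5' H3K4me3 enrichment
    overlaps_coding_gene: bool  # overlaps an annotated protein-coding gene
    antisense_to_overlap: bool  # transcribed from the opposite strand of that gene


def passes_prefilter(unit: TranscriptionalUnit) -> bool:
    """True if the unit would go on to coding-potential assessment."""
    expressed = unit.rpkm > RPKM_THRESHOLD
    marked = unit.h3k4me3_samples >= MIN_H3K4ME3_SAMPLES
    # Intergenic units pass; overlapping units pass only when antisense.
    located = (not unit.overlaps_coding_gene) or unit.antisense_to_overlap
    return expressed and marked and located


def islet_specificity_score(islet_rpkm, other_rpkm):
    """Islet mean minus (mean + 2 SD) of the non-pancreatic tissues.

    Assumption: sample standard deviation, which needs >= 2 tissues.
    """
    return mean(islet_rpkm) - (mean(other_rpkm) + 2 * stdev(other_rpkm))


# Usage with made-up numbers (3 islet samples, 16 other tissues).
candidate = TranscriptionalUnit(1.2, 2, True, True)
keep = passes_prefilter(candidate)
score = islet_specificity_score([10.0, 12.0, 14.0], [1.0] * 16)
```

Under this definition, a positive score means that average islet expression exceeds the non-pancreatic mean by more than two standard deviations.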
Chi-square, Pearson’s r coefficient, and independent 2-group Mann-Whitney U-tests were implemented with the R statistical package (http://www.r-project.org). Student’s t-test was performed with Microsoft Excel for quantitative PCR experiments. Error bars in Figures 4–6 represent SEM.
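For reference, the standard error of the mean (SEM) shown as error bars is conventionally the sample standard deviation divided by the square root of the number of replicates. The short Python sketch below illustrates that conventional definition; it is not the spreadsheet or R code used in this study, and the function name is hypothetical.

```python
from math import sqrt
from statistics import stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("SEM requires at least two replicate values")
    return stdev(values) / sqrt(n)


# Usage with made-up replicate measurements.
error_bar = sem([1.0, 2.0, 3.0, 4.0])
```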
This work was funded by grants from the NIH-BCBC (2U01 DK072473-06 to J.F., P.R., L.S.; U01-DK089567 to M.S. and J.F.), the Juvenile Diabetes Research Foundation (26-2008-633 to J.F., and 31-2008-416 to TB, FP, LP to support human islet isolation), and the Ministerio de Economía y Competitividad (SAF2008-03116 to J.F. and a PhD fellowship to Ignasi Morán). ALG is a Wellcome Trust Senior Fellow in Basic Biomedical Research (095101/Z/10/Z). We thank Rudolph Leibel (Columbia University) for providing B6.V-Lepob/J mice, Bing Ren (Ludwig Institute for Cancer Research) for generating the ENCODE mouse tissue RNA-seq data, Kelli Bramlett (Genome Sequencing Collaborations Group) for her help in generating the SOLiD data, and Viacyte Inc for performing implantations of hES cell derivatives. We are also grateful to Ayellet Segrè (Broad Institute) for helpful discussion on MAGENTA, and to Chris Stoeckert and Elisabetta Manduchi (University of Pennsylvania) for help with the online submission.
Data sets have been deposited in the ArrayExpress Archive under accession number E-MTAB-1294. Islet lncRNA coordinates are available at http://www.betacellregulation.net.
Supplemental information includes Extended Experimental Procedures, 9 tables, and 4 figures.