Functional genomics, the analysis of the wealth of data produced by genome-wide analyses of gene expression, protein-protein, and protein-DNA interactions, has revolutionized biomedical research. Our ability to determine global gene expression profiles, transcription factor binding sites, and histone modification maps using microarray-based technologies and next generation sequencing applications has greatly enhanced our understanding of gene regulatory networks and the molecular wiring diagrams of cells and tissues. The organogenesis of the endocrine pancreas involves numerous signaling events within the endoderm-derived pancreatic epithelium and the surrounding mesenchyme, as well as complex transcription factor networks. Detailed understanding of the differentiation process from foregut endoderm to mature endocrine cells has enabled the rational design of in vitro differentiation protocols that coax embryonic stem cells into β-like cells that might enable cell replacement therapy for diabetes in the future. In this review, we summarize the research studies that have utilized genomic tools to elucidate endocrine pancreatic organogenesis.
The pancreas, a complex organ consisting of both exocrine and endocrine compartments, is critical for nutrient digestion and blood glucose homeostasis. The endocrine pancreas is organized into the islets of Langerhans, which constitute less than 2% of the mass of the pancreas, and comprise five endocrine cell types: the α, β, δ, ε, and pancreatic polypeptide (PP) cells. The α cells produce glucagon and β cells produce insulin, two hormones that act in an opposing fashion to maintain blood glucose homeostasis. Glucagon mobilizes glucose from peripheral tissues and therefore elevates blood glucose levels to prevent severe hypoglycemia during fasting, whereas insulin stimulates glucose storage and lowers blood glucose levels in the postprandial state.
In diabetes, there is an insufficient amount of insulin, which in type 1 diabetes is caused by autoimmune destruction of pancreatic β cells and absolute insulin deficiency, and in type 2 diabetes is the result of decreased peripheral insulin sensitivity and relative insulin deficiency. The resulting hyperglycemia can lead to severe complications such as stroke, heart attack, and renal failure. The incidence of diabetes, especially type 2, has increased dramatically over the past 30 years, even among adolescents, and it has been estimated that by the year 2030, about 439 million adults worldwide will be affected by the disease. Because both α and β cells of the endocrine pancreas are central to the control of glucose homeostasis, understanding their biology and genesis is critical for the future development of new treatment paradigms, including cell replacement therapy.
Eleven years ago, following the establishment of the so-called ‘Edmonton protocol’ for islet transplantation for severely ill type 1 diabetics, pancreatic endocrine cell differentiation and proliferation became an intense research focus . Through a novel immunosuppression regimen, cadaveric islets transplanted into a recipient liver via the portal vein could survive for years and normalize blood sugar levels, even allowing complete insulin independence. However, given the permanent shortage of organ donors and the increasing incidence of type 1 diabetes , the exploration of novel sources of β (and possibly α) cells became an urgent research endeavor. With a detailed understanding of pancreatic development, the in vitro production of functional β cells from human embryonic stem cells (hESC) to produce an infinite source of functional β cells for diabetes treatment has developed into a realistic possibility [5,6].
The field of functional genomics focuses on the genome-wide evaluation of gene transcription, gene translation, and DNA-protein interactions [7,8]. About 15 years ago, the sequencing of the first genomes of model organisms (such as yeast) and the development of DNA array-based technologies (such as microarrays for genome-wide expression analysis and later assays for genome-wide location of transcription factor binding sites) revolutionized the field [9,10]. The application of hierarchical clustering analysis made it possible to analyze the large data sets resulting from expression microarrays in a comprehensive fashion by assembling genes into groups of similar and different expression patterns  and elucidating tissue- and development-specific gene regulation modules. Even though these techniques have transformed biomedical research, there are certain limitations to the microarray approach, because the coverage of the genome is restricted, and because this hybridization-based method has limited sensitivity . The development of new sequencing technologies, including ultra high-throughput sequencing strategies such as the sequencing of mRNA (RNA-Seq) and chromatin immunoprecipitation followed by sequencing (ChIP-Seq), has overcome these restrictions, and has enabled biomedical researchers to analyze global changes in gene expression and identify transcription factor binding sites across the entire genome with extremely high accuracy and sensitivity [12,13].
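The grouping of genes by similar expression pattern described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in any of the cited studies: it assumes average-linkage agglomerative clustering with a Pearson correlation distance, and the gene names and expression values are invented for the example.

```python
from math import sqrt

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = min(candidates, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical log-expression values at four time points (invented data).
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.2, 3.9, 2.0],
}
clusters = average_linkage(toy_profiles, 2)
print(clusters)
```

In this toy data set the two rising profiles and the two falling profiles end up in separate groups. The correlation-based distance is chosen here because it compares the shape of a profile rather than its absolute level; other distance measures and linkage rules are equally possible.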
Transcription factor networks control the maintenance of the pluripotent state in stem cells and the progression of development toward the α and β cell fates [14,15]. Therefore, global gene expression analysis and changes in transcription factor occupancy during the time course of development provide valuable information for in vitro differentiation protocols from hESC or other cell sources toward mature α and β cells. Gene regulatory networks are controlled by the binding of transcription factors to promoter or enhancer regions of other transcription factors and numerous additional target genes that execute enzymatic and structural functions within the cell . The ChIP-Seq method, which will be discussed in detail below, can be applied to determine transcription factor binding sites and to derive complex maps of histone modifications , thereby determining the precise epigenetic landscape of different tissues and cell types, developmental stages, and disease states. This allows for the identification of gene regulatory networks; for example, those that control development and function of the endocrine pancreas, and their epigenetic regulation [18,19]. In-depth analysis of these networks is crucial for the understanding of physiology and pathophysiology of α and β cells, and can provide important clues for new diabetes treatment options. This review will focus on the organogenesis of the endocrine pancreas, and summarize how functional genomics has contributed to the understanding of its development and function.
Historically, the embryonic origin of the endocrine pancreas was a source of controversy. Because pancreatic endocrine cells share many features (such as hormone secretion and electrical excitability) with neuroendocrine cells like the chromaffin cells of the adrenal medulla, scientists in the 1970s and 1980s suggested that pancreatic endocrine cells are of neural crest origin. However, embryological and genetic lineage tracing experiments have established beyond doubt that the endocrine pancreas - just like its exocrine counterpart - is derived from the embryonic endoderm, which also gives rise to the gut and liver, among other organs.
Complex signaling networks control the proper induction of pancreatic endoderm from pre-pancreatic endoderm and the formation of the dorsal and the ventral pancreatic buds. These signaling events include the repression of sonic hedgehog (Shh) signaling in the pre-pancreatic endoderm through inhibitory signals such as FGF2 and activin β emanating from the notochord, which facilitates the formation of the dorsal pancreatic bud . Furthermore, inductive signaling from the neighboring endothelium, and the mesenchyme via FGF10 for instance, is crucial for dorsal pancreatic budding and the activation of Ptf1a, a transcription factor critical for pancreas development, in the dorsal pancreatic endoderm [23,24]. Interestingly, the emergence of the ventral pancreatic bud is controlled by different mechanisms. This was clearly demonstrated through genetic models in which only the development of the ventral pancreatic Anlage was affected. For example, Hhex-/- embryos lack a ventral pancreatic bud, and careful investigation of Hhex-/- embryos revealed that Hhex is crucial for the proliferation and correct positioning of the ventral definitive endoderm, leading to induction of the ventral pancreatic fate . Conversely, retinoic acid signaling is essential for the formation of the dorsal pancreas, as shown by mutations of the retinaldehyde dehydrogenase 2 (Raldh2) gene . In addition, an exact interplay and switch of inhibiting and activating signaling through BMP and TGFβ between the 3-4 somite and 5-6 somite stages allows for the specification of liver and pancreas in adjacent domains .
Pancreatic organogenesis and the differentiation of progenitor cells to endocrine pancreatic cells take place in three stages, referred to as the primary, secondary, and tertiary transitions. The primary transition occurs from embryonic day (E) 8.5 to E12.5 of mouse development, during which time the foregut endoderm thickens and buds into the surrounding mesenchyme, giving rise to the dorsal and the ventral pancreatic Anlagen. The two buds contain pancreatic progenitor cells (E9), which generate a small number of early hormone-positive cells. Most of these elaborate glucagon (E9.5), but some also stain for insulin (E10.5) [21,28,29].
Interestingly, first-wave insulin/glucagon co-expressing cells do not seem to contribute to the mature islet, at least in mice, where this question has been studied using genetic lineage tracing. However, the contribution of other first-wave hormone-positive cells to the adult islet still remains uncertain. The first wave of endocrine cell differentiation is accompanied by the beginning of pancreatic epithelial branching (E11.5) and followed by a second, more extensive, wave of endocrine cell differentiation during the secondary transition (E13-E16.5). This major wave of endocrine cell differentiation gives rise to glucagon-expressing α cells, insulin-expressing β cells, somatostatin-expressing δ cells, ghrelin-expressing ε cells, and pancreatic polypeptide-expressing cells [29,31,32]. The proportion of endocrine cells produced varies with developmental time; α cells initially predominate, and then β cells. The second-wave endocrine cells originate from a specialized domain of multipotent progenitor cells, termed the ‘trunk’ domain, which is distinguishable molecularly around E14.5 due to the expression of different markers and transcription factors. Cells in the trunk region are primarily labeled by markers of endocrine precursors, and are thought to differentiate into duct and endocrine cells, whereas cells in the tip region are labeled by exocrine markers and give rise to acinar cells. The tertiary and final transition phase (E16.5 to birth) includes islet cell proliferation, migration of differentiated cells, and formation of the islets of Langerhans, the final endocrine structure.
In summary, the development of the pancreas is an elaborate process in which all pancreatic cell types (endocrine, acinar, and duct cells) are derived from the epithelium, itself a descendant of the endoderm germ layer . Extensive genetics research has established the indispensable role in pancreatic organogenesis of several transcription factors, such as neurogenin 3 (Ngn3), the forkhead box proteins A1 and A2 (Foxa1 and Foxa2), regulatory factor x 6 (Rfx6), pancreatic and duodenal homeobox 1 (Pdx1), the nk2 and nk6 homeobox factors (Nkx2.2 and Nkx6.1), and pancreas-specific transcription factor 1 (Ptf1a) [37-50]. While some of the crucial transcription factor genes are still active and have known functions in subsets of mature endocrine pancreatic cells, for example Pdx1 , other essential developmental factors, such as Ngn3, exhibit much reduced levels of expression in differentiated pancreatic cells . Thus, some transcription factors play roles in both endocrine development and maintenance of endocrine cell function, while others are only relevant during specific developmental stages.
During the last twenty years, gene targeting in the mouse and, more recently, functional genomics have been vital for gaining in-depth knowledge about the transcriptional networks governing endocrine pancreatic development and maintenance of mature endocrine cells. Analysis of mice carrying null, point, and conditional mutations in transcription factor genes helped to establish a hierarchy of activation and function of these factors by establishing their relevance during development and in adult function. In addition, as our understanding of early endocrine cell markers deepens and cell sorting strategies are optimized, the characterization of the transcriptome of cells at early developmental stages and the comparison to mature β cells using DNA microarray or RNA-Seq technology can provide important insights about the temporal importance of each transcription factor, and provide crucial knowledge for the directed differentiation of hESC (as well as other cell types) into functional β cells.
The invention of microarray technology during the mid 1990s enabled, for the first time, massively parallel analyses of steady-state mRNA levels for thousands of genes in a process later termed ‘expression profiling.’ Briefly, short single-stranded DNA sequences are printed or synthesized on a glass slide, and one or two cDNA samples (labeled separately with fluorescent dyes) are hybridized to this ‘microarray’ slide containing many thousands of DNA probes. After extensive washing to remove excess labeled sample, the resulting fluorescence intensity at each spot on the array is determined using a high-resolution fluorescence scanner. The fluorescence intensity is then a direct measure of the relative abundance of the corresponding mRNA in the sample. Though this method initially was used only for global mRNA expression analysis, its utility was greatly expanded through the invention of ‘ChIP-on-chip’ by Ren and colleagues in 2000. ChIP-on-chip combines chromatin immunoprecipitation and microarray analysis to make possible the simultaneous analysis of transcription factor binding sites at thousands of loci in a given genome. Labeling of an immunoprecipitated sample and a control sample (not immunoprecipitated/enriched) with two different fluorescent dyes, followed by hybridization to and scanning of a DNA microarray, allows for detection of DNA-protein interaction sites. Two limitations of the ChIP-on-chip technology, however, eventually became apparent. First, because the targets on the array are relatively large and represent genomic areas containing gaps, the spatial resolution of binding site determination was limited. Second, and even more important, ChIP-on-chip is not truly genome-wide for species with complex genomes such as mice or humans, because these large genomes cannot be tiled completely even on the highest density arrays.
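The two-dye comparison described above is commonly summarized as a log ratio per array spot. The following sketch shows that arithmetic only; the function name, the optional uniform background value, and the intensities are assumptions made for illustration, and the dye-bias normalization that real array analysis requires is deliberately left out.

```python
from math import log2

def log_ratios(sample_intensity, control_intensity, background=0.0):
    """Return the log2(sample / control) ratio for each spot.

    A positive value means the spot is brighter in the sample channel
    (for ChIP-on-chip, the immunoprecipitated DNA) than in the control.
    Spots whose background-corrected signal is not positive yield None.
    """
    ratios = []
    for s, c in zip(sample_intensity, control_intensity):
        s_corr, c_corr = s - background, c - background
        if s_corr <= 0 or c_corr <= 0:
            ratios.append(None)  # no usable signal at this spot
        else:
            ratios.append(log2(s_corr / c_corr))
    return ratios

# Hypothetical scanner intensities for three spots (invented values).
sample = [800.0, 100.0, 300.0]
control = [200.0, 400.0, 300.0]
print(log_ratios(sample, control))
```

With these invented numbers the first spot is four-fold enriched in the sample channel (log ratio 2), the second is four-fold depleted (log ratio -2), and the third is unchanged (log ratio 0).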
In the last five years, advances in DNA sequencing technology have dramatically reduced the cost and time required for large-scale sequencing, such that at the time of this writing, a human genome can be sequenced within a week and for less than $5,000. As a result of these methodological advances, next generation sequencing applications, such as RNA-Seq and ChIP-Seq, have begun to replace microarray-based expression profiling and genome-wide location analysis by ChIP-on-chip [12,13].
Next generation sequencing applications include the RNA-Seq and ChIP-Seq methods. There are multiple ultra-high-throughput sequencing technologies available today, but we will focus on one commonly used methodology, the so-called “Solexa sequencing” technology marketed by Illumina. In this section, we focus on the ChIP-Seq application, describing it in more detail and pointing out the advantages of this technique over the formerly used ChIP-on-chip assay. In ChIP-Seq, cells or tissues are first treated with formaldehyde, which crosslinks the DNA-binding proteins or histones to the DNA. Sequencing analysis of the DNA requires random shearing of the chromatin by sonication, after which the majority of the resultant DNA fragments ideally range from about 200 to 500 bp in size. For some applications (in particular the analysis of histone modifications) chromatin can instead be digested with micrococcal nuclease (MNase). This enzyme digests all DNA except that which is wrapped around the nucleosome core, which is made up of the histone octamer. Once fragments in the optimal size range have been obtained, the chromatin is immunoprecipitated with an antibody, for example one against a transcription factor or a specific posttranslational histone modification. After reversal of the crosslinks, a ‘library’ is prepared that allows for sequencing analysis of the DNA fragments that were initially bound by the protein of interest. Briefly, the immunoprecipitated DNA is deproteinized, modified with adapters, size-selected, and amplified (Fig. 1). Because both sonication and MNase digestion produce DNA fragments with ragged ends (either 5′ or 3′ overhangs of unknown length and sequence), the DNA fragments require repair to produce blunt ends. This is typically achieved using T4 DNA polymerase and Klenow polymerase in the presence of deoxynucleotides.
The 5′ ends of the blunt-ended fragments are then phosphorylated using polynucleotide kinase before the addition of adapters of defined sequence by DNA ligation. The DNA fragments are subsequently amplified by limited PCR. Finally, the resulting library is melted to single-stranded DNA. The ligated adapters allow one single DNA molecule to bind to one of billions of randomly spaced complementary ‘anchor’ oligonucleotides on the flow cell surface. Limited PCR then amplifies the captured molecules to millions of clusters, each consisting of hundreds to thousands of identical DNA fragments (Fig. 1). Adenosine, cytosine, guanine, and thymine, each labeled with a different fluorescent dye and blocked at the 3′ end to limit DNA synthesis to one base per cycle, are incorporated using DNA polymerase at the first base of each cluster. After laser excitation, a separate image for each of the four fluorophores of the first base of each cluster is captured and saved. The fluorescent signal is then quenched, and the 3′ end of the added nucleotide is deblocked. Addition of a single fluorescent base, laser excitation, image capturing, signal quenching, and deblocking of the 3′ end are then repeated base by base, providing the sequence of each of the DNA fragments, which are then aligned to the reference genome. Depending upon the antibody used in the ChIP experiment, this procedure can allow for detection of transcription factor binding sites, the determination of histone modification maps, or the determination of nucleosome positions.
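The cycle-by-cycle readout described above can be illustrated with a deliberately simplified base caller. The sketch assumes that each cycle yields one intensity per dye for a given cluster, listed in the order A, C, G, T, and that the brightest channel identifies the incorporated base; the function name and the intensity values are invented, and the additional signal corrections applied by real sequencing software are not modeled.

```python
CHANNELS = ("A", "C", "G", "T")

def call_bases(cycle_intensities):
    """Translate per-cycle four-channel intensities into a read sequence.

    cycle_intensities holds one (A, C, G, T) intensity tuple per
    sequencing cycle for a single cluster.
    """
    read = []
    for intensities in cycle_intensities:
        # One image per fluorophore: pick the channel with the strongest signal.
        brightest = max(range(4), key=lambda k: intensities[k])
        read.append(CHANNELS[brightest])
    return "".join(read)

# Hypothetical intensities for one cluster over three cycles (invented values).
cluster_signal = [
    (0.9, 0.1, 0.0, 0.1),  # cycle 1: A channel brightest
    (0.1, 0.1, 0.8, 0.2),  # cycle 2: G channel brightest
    (0.0, 0.1, 0.1, 0.7),  # cycle 3: T channel brightest
]
print(call_bases(cluster_signal))
```

Each cycle contributes exactly one base because the 3′ block limits synthesis to a single incorporation before the next image is taken.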
The ChIP-Seq method enormously increases the accuracy of our knowledge about DNA-protein interactions. Compared to the ChIP-on-chip method, the resolution is enhanced and the genome coverage is much more comprehensive, as ChIP-Seq does not depend upon sequences spotted on a microarray . Another important advantage is the low amount of DNA required for a ChIP-Seq experiment. This is especially beneficial for determining exact histone modification maps and studying DNA-protein binding events in tissues at early developmental stages, in small, sorted cell populations, and in scarce samples, such as human tissue samples. Some of the first published studies using the ChIP-Seq method included the description of histone modification profiles in CD4+ T cells  and the mapping of neuron-restrictive silencer factor (NRSF) binding sites in the human genome  in 2007, and this technology has now been adopted by hundreds of laboratories world-wide.
The large volume of data that results from these applications necessitates the creation of new algorithms for proper data analysis and the determination of transcription factor targets as well as histone modification profiles. Differences in peak type, such as strong and narrow peaks resulting from classic transcription factor binding to its sequence motif, or low and broad peaks such as those for the repressive histone 3 lysine 27 trimethylation mark (H3K27me3), determine which algorithm is suitable for use. Publicly available algorithms include CisGenome, Model-based Analysis of ChIP-Seq (MACS), PeakSeq, Quantitative Enrichment of Sequence Tags (QuEST), GLobal Identifier of Target Regions (GLITR), and Hypergeometric Optimization of Motif EnRichment (HOMER) [61,62], which differ in their peak calling criteria and in their use of background data [63,64].
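To make the notions of peak calling criteria and background data concrete, the sketch below flags fixed-size genomic windows in which ChIP reads are enriched over a control sample. It is a naive illustration and does not reproduce the statistical model of any of the tools named above; the function name, window size, fold threshold, pseudocount, and read positions are all assumptions chosen for the example.

```python
def call_enriched_windows(chip_starts, control_starts, genome_length,
                          window=200, min_fold=4.0, pseudocount=1.0):
    """Return (start, end, fold) for windows enriched in ChIP over control.

    Read start positions are 0-based and must be smaller than genome_length.
    The pseudocount avoids division by zero in windows without control reads.
    """
    n_windows = (genome_length + window - 1) // window
    chip = [0] * n_windows
    ctrl = [0] * n_windows
    for pos in chip_starts:
        chip[pos // window] += 1
    for pos in control_starts:
        ctrl[pos // window] += 1

    peaks = []
    for w in range(n_windows):
        fold = (chip[w] + pseudocount) / (ctrl[w] + pseudocount)
        if fold >= min_fold:
            end = min((w + 1) * window, genome_length)
            peaks.append((w * window, end, fold))
    return peaks

# Hypothetical read start positions on a 1,000 bp toy region (invented data).
chip_reads = [10, 50, 420, 430, 450, 460, 470, 480, 490, 900]
control_reads = [30, 440, 700]
peaks = call_enriched_windows(chip_reads, control_reads, genome_length=1000)
print(peaks)
```

A narrow window suits sharp transcription factor peaks, whereas broad marks such as H3K27me3 would call for wider windows or for merging adjacent enriched windows, which is one reason different algorithms suit different peak types.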
The possibility of studying defined cell populations at early developmental stages and comparing transcription factor targets and the epigenetic landscape to those of mature endocrine cells greatly improves our understanding of pancreatic development, and will help us to attain the goal of producing functional β cells in vitro. Previously unknown enhancer and promoter sites, determined by the histone 3 lysine 4 monomethylation mark (H3K4me1) and the histone 3 lysine 4 trimethylation mark (H3K4me3), can be identified across the entire genome, elucidating the regulation of crucial genes. In the cancer research field, an extensive effort to develop epigenetic therapeutics is being mounted, as abnormal epigenetic modifications can lead to dysregulation of cell differentiation and proliferation . For example, histone deacetylase inhibitors are presently being tested in clinical trials and have been approved for the treatment of several cancer types . Interestingly, it has been shown that histone deacetylase inhibitors also play a role in pancreatic development . Treatment of embryonic rat pancreata with different histone deacetylase inhibitors resulted in an increase of the endocrine progenitor population and the β cell pool, indicating a possible use of these inhibitors in the development of new therapeutic strategies for diabetes and improving the efficacy of in vitro techniques for differentiating stem cells into β cells [67,68], and highlighting the importance of epigenetic studies in diabetes research.
The RNA-Seq method is used for precise profiling of the transcriptome using high-throughput sequencing . Some of the first studies using this method were published in 2008 describing the yeast transcriptome , the transcriptome of mouse embryonic stem cells and embryoid bodies , and different mouse tissues . A comparison of RNA-sequencing to microarray data showed that 81% of differentially-expressed genes characterized by the microarray method were also detected with the RNA-Seq method, conferring validity to this novel approach . Though RNA-Seq offers many advantages over the microarray method, it likewise presents several challenges. It permits the detection of a large range of expression levels with an enhanced sensitivity, and allows one to map exact gene boundaries, as well as to discover novel transcripts and rare or tissue-specific alternative splicing isoforms . High sequencing depth and the production of large numbers of sequence reads, which lead to the desired high sensitivity and accuracy, also result in a vast amount of data that need to be analyzed, and present bioinformatical challenges . The possibility of accurately determining the entire transcriptome, especially for transcripts at low expression levels, is extremely attractive for studies focusing on pancreatic organogenesis and can greatly improve our understanding about the differentiation process from pancreatic progenitor cells to mature α and β cells, which promises to impact the development of novel strategies for diabetes treatment.
The exciting possibility of determining global gene expression levels prompted many type 1 and type 2 diabetes research groups to perform microarray analyses on tissues involved in the pathogenesis of type 2 diabetes, such as white adipose tissue, liver, and pancreatic islets. One such analysis, using the Affymetrix Genechip Mu6500, involved the comparison of gene expression levels in white adipose tissue of wild-type, ob/ob, and transgenic mice expressing low levels of leptin. Other studies sought to further our understanding of the β cell transcriptome, for example by performing gene expression analysis on a pancreatic β cell line after the induction of growth arrest (using the mouse GDA system, Genome Systems Inc.), and on primary rat β cells after cytokine stimulation, in order to elucidate the signaling pathways and mechanisms involved in β cell dysfunction and to find targets for new treatment options. However, the arrays used in these early studies were small and not specifically designed for diabetes research, which presented limitations and prompted the development of the “PancChip,” a resource developed specifically for the diabetes research field. This microarray contained 3,400 clones corresponding mostly to mRNAs expressed in the pancreas, but also including diabetes-related pathways and housekeeping genes, and was used for the study of pancreatic tissue at different time points during development and in adulthood. Recognizing that the early expressed sequence tag (EST) sequencing efforts in the 1990s had completely ignored the endocrine pancreas, the NIDDK-sponsored Endocrine Pancreas Consortium set out to construct cDNA libraries and sequence hundreds of thousands of transcripts from various stages of pancreatic development.
This led not only to the discovery of several thousand novel genes, but also to the construction of a new microarray called “PancChip 4.0,” comprising 13,848 elements, with more than 10,000 unique genes, a development that greatly improved diabetes research resources at the time. These and other array types have been extensively used to determine gene expression patterns and to elucidate the impact of transcription factors and signaling pathways on the differentiation process from endocrine progenitor cells to α and β cells.
Arguably the most important protein controlling pancreatic development is the homeobox transcription factor encoded by the Pdx1 gene. Loss of Pdx1 (termed IPF-1 in humans) results in pancreatic agenesis in mice and humans alike, and heterozygous mutations in the human IPF-1/PDX1 gene lead to the development of MODY4 [78,40,38,79-81,39]. Gene expression profiling of sixty single epithelial cells isolated from dorsal pancreatic buds at murine E10.5, using a small custom microarray (95 genes spotted using an Affymetrix GMS417 arrayer), showed the presence of six different gene expression patterns, most of which included Pdx1 expression and some of which included coexpression of pancreatic hormones, suggesting that even the early dorsal pancreatic bud is made up of a heterogeneous cell population. Gu and colleagues analyzed the transcriptional profiles at four different stages of murine pancreas development, including unspecified endoderm at E7.5, FACS-sorted Pdx1-expressing cells at E10.5, Ngn3-expressing cells at E13.5, and mature islets of Langerhans, by microarray analysis utilizing Mu11K, Mu74Av1, and Mu74Av2 Affymetrix arrays. They detected 193, 60, 71, and 217 genes with enriched expression at the respective stages, and identified genes and pathways not previously implicated in pancreatic development, such as the ‘myelin transcription factor 1’ (Myt1) gene, which was found to be expressed in endocrine progenitors. Inhibiting the function of Myt1 using a dominant negative Myt1 under the Ngn3 promoter led to a reduction of insulin- and glucagon-expressing cells at E14.5, illustrating the utility of this “discovery research” for establishing new, testable hypotheses on gene function in pancreas development.
To decipher the pathways regulated by Pdx1 in early pancreatic progenitor cells and to elucidate the mechanisms by which pancreatic organogenesis is prevented, global gene expression analysis was performed using two different microarray types (containing ~15,000 and ~21,000 clones from different sources) on control and Pdx1-/- dorsal pancreatic buds at E10.5. The analyses revealed differential expression for 111 genes, of which 73 genes were down- and 38 genes were up-regulated. Pax6 and Nkx6.1 were among the down-regulated genes, and the expression of Foxa2 and glucagon remained unchanged. The relatively low number of differentially-regulated genes could be due to technical difficulties and the small sample size employed, which most likely limited the outcome of this experiment. Adult Pdx1+/- mice displayed limited β cell mass expansion when metabolically challenged by a high fat diet. This finding was attributed to enhanced ER stress-induced apoptosis, and was examined more closely by measuring global transcript levels using microarray analysis (Mouse PancChip 6.1) in MIN6 cells after acute silencing of Pdx1, and also by comparing Pdx1+/+ and Pdx1+/- mouse islets. Additionally, ChIP-on-chip analysis (Mouse PromoterChip BCBC-5A) was used to characterize Pdx1-binding events in MIN6 cells. The expression of two genes encoding ER proteins, Ero1lb and Nnat, was down-regulated both in MIN6 cells transduced with an adenovirus expressing a short hairpin RNA targeting Pdx1 and in Pdx1+/- mouse islets, implying a role for Pdx1 in ER homeostasis.
In 2007, Keller and colleagues employed a promoter microarray (Mouse PromoterChip BCBC-5B), including the promoters of the miRNA genes known at the time, to determine the direct genomic targets of Pdx1 in a mouse insulinoma cell line . 583 new Pdx1 targets were identified, and the transcriptional regulation of the majority of a selection of genes was validated in a cell culture system expressing a dominant negative mutant of Pdx1. In addition, NeuroD1 targeting events were determined, and it was discovered that Pdx1 and NeuroD1 share binding sites in 440 genes, among them the gene encoding miR-375, important for glucose-stimulated insulin secretion in MIN6 cells  and the maintenance of proper pancreatic endocrine cell function in mice . There is growing evidence that miRNAs, a group of small non-coding RNA molecules, serve as a crucial component of the gene regulatory network and play an important role in health and disease states, including those of pancreatic β cells . The application of functional genomics will be needed to discover unknown miRNAs, predict miRNA targets and expand our knowledge about the nature of their regulation and function.
The transcription factor Ngn3 is expressed in endocrine progenitor cells and is indispensable for the development of pancreatic endocrine cells . Therefore, it was important to understand the regulatory networks that depend on Ngn3. In 2006, the microarray method was used to compare the transcriptome of Ngn3-deficient and wild-type dorsal murine pancreatic buds at E13 and E15, capturing the secondary transition phase of embryonic development . The microarray used in this study (ArrayTAG 20k murine gene collection spotted in duplicate on CodeLink slides) allowed for the expression analysis of 12,140 genes, but unfortunately did not include some of the genes of interest, such as insulin, glucagon, Pdx1, and NeuroD1. At either or both time points, 504 genes were differentially expressed in the Ngn3-deficient mice. After the application of different filtering strategies, 52 genes showed significant and robust differential expression, all characterized by decreased expression in the Ngn3-deficient state. Among the robustly down-regulated genes were genes important for the processing of hormones, genes encoding vesicle proteins (such as synaptotagmin 7 and 13), the hormones ghrelin and neuropeptide Y, and two transcription factors (‘Iroquois related homeobox 1’ (Irx1) and Myt1). Given the complete absence of endocrine cells in the Ngn3-deficient state, finding these endocrine cell-expressed transcripts reduced in the mutant pancreas was not surprising. Irx1 and Irx2 were co-expressed in glucagon-positive, but not insulin- or somatostatin-expressing, cells at E18, suggesting the discovery of novel transcription factors specific for glucagon-expressing cells. 
A more recent study analyzed mRNA expression profiles from E12.5, E15.5, and E18.5 pancreata of Ngn3-/- and wild-type mice using the MGU74Av2 and/or the MOE430 2.0 Affymetrix microarrays, generating lists of differentially-expressed genes at each time point and comparing the expression patterns to adult mouse islets and pancreatic endocrine tumor cell lines. The highest number of down-regulated transcripts was detected at E18.5 (645 genes) and included MafB, Nkx6.1, Isl1, and Iapp. Juhl and colleagues pointed out that the insulin granule zinc transporter Znt8 (Slc30a8), a gene associated with type 2 diabetes risk, was decreased 18-fold in Ngn3-/- mice at E18.5, reflecting the loss of endocrine cells in the mutant tissue.
White and colleagues chose a different approach to determine the gene expression profiles of endocrine progenitors, which are marked by Ngn3 expression, and their direct descendants . The Ngn3-enhanced green fluorescent protein (EGFP) ‘knock-in’ mouse model was used to sort Ngn3+/EGFP-positive and -negative cells from E13.5, E14.5, E15.5, E16.5, and E17.5 pancreata. RNA was isolated from Ngn3+/EGFP-positive cells and adult mouse islets, and subjected to microarray analysis (Mouse PancChip 6.0) to define the precise mRNA expression profiles at these various developmental stages. This led to the identification of 1,029 genes, including 237 transcription factors that are temporally regulated in endocrine cells during development, and provided an extensive transcriptional profile of the endocrine cell lineage from the progenitor to the differentiated state.
In-depth examination of the regulatory function of diabetes-related genes, as discussed above for the MODY4 gene Pdx1, is critical for the discovery of new aspects and important transcriptional networks in diabetes pathogenesis. In a microarray study (Affymetrix 133A and B arrays, ~44,928 genes) analyzing the global gene expression of non-diabetic and type 2 diabetic human islets, 370 genes were found to be differentially expressed, 243 up-regulated, and 137 down-regulated in the type 2 diabetic samples. A marked reduction of the Hnf-4α (hepatocyte nuclear factor 4 alpha) transcript, the gene mutated in ‘Maturity onset diabetes of the young 1’ (MODY1), was reported in type 2 diabetic islets. The expression of genes involved in the insulin-signaling pathway, such as the insulin receptor (IR) and AKT2 (which encodes a serine/threonine-protein kinase), was significantly reduced as well, while the hormone expression levels did not differ significantly. The transcript of the aryl hydrocarbon receptor nuclear translocator (ARNT) gene showed the most significant reduction in type 2 diabetic islets, suggesting an important function of ARNT in β cell function. Mice carrying a β cell-specific ablation of Arnt showed impaired glucose tolerance, decreased glucose-stimulated insulin secretion (more pronounced in female mice), and reduced levels of Hnf-4α, further suggesting a role of Arnt in the regulation of this MODY gene. Global gene expression profiling of mice deficient for Hnf-4α in β cells (Hnf-4αLoxP/LoxP;InsCre mice) revealed up-regulation of 128 genes and down-regulation of 57 genes using the PancChip version 5.0, 13K cDNA microarray. Gene ontology analysis disclosed that a large percentage of the differentially-expressed genes were annotated to play a role in cell proliferation, a previously unsuspected function of Hnf-4α.
While β cell mass was unchanged in unstressed four month-old Hnf-4α mutant mice, there was a 50% decrease in β cell mass in four month-old pregnant Hnf-4α-deficient mice, and a complete loss of the pregnancy-induced increase in β cell replication. The expression profiling thereby formed the basis for uncovering a novel role for Hnf-4α in the physiological β cell expansion in response to proliferative stimuli. Further data mining of the expression profile suggested that Hnf-4α contributes to β cell proliferation at least partly through the Ras/ERK signaling pathway.
The forkhead box A1 and A2 proteins (Foxa1 and Foxa2), two winged-helix transcription factors, are expressed and serve important roles in the development and functional maintenance of tissues derived from the foregut endoderm, such as the pancreas and liver. The loss of Foxa2 in β cells and its effect on gene expression patterns in islets was studied in two different mouse models, the Foxa2LoxP/LoxP;RIP-Cre model and the tamoxifen-inducible Foxa2LoxP/LoxP;Pdx1-CreERT2 model, using PancChip version 4.0 and 6.1, respectively. Microarray analysis revealed the reduced expression of Kir6.2 and Sur1, encoding the two subunits of the ATP-sensitive K+ channel, in Foxa2 mutant islets from both studies. The latter study determined differential expression of 143 genes, among them genes that encode proteins for vesicle trafficking, granule biosynthesis, and exocytosis, highlighting the importance of Foxa2 in insulin secretion and therefore mature β cell function. The additional simultaneous deletion of the related Foxa1 gene in Foxa1LoxP/LoxP;Foxa2LoxP/LoxP;Pdx1-CreERT2 mice led to a more severe phenotype with impaired glucose homeostasis. Expression levels of 566 genes were significantly altered in Foxa1/a2-mutant islets (Whole Mouse Genome Oligo Microarray G4122A; Agilent Technologies): 294 were down-regulated and mainly involved in metabolic processes and ion transport, and 272 were up-regulated and annotated to function in neuronal differentiation. The up-regulation of neuronal genes was surprising and uncovered a repressive role of the Foxa factors in the regulation of neuronal genes in mature β cells. The carbohydrate response element-binding protein (ChREBP), a main regulator of carbohydrate metabolism, showed a 13-fold reduction in mRNA expression levels, which prompted Gao and colleagues to further investigate the role of this factor and its association with the Foxa factors during development.
Foxa2 ChIP-Seq analysis and Foxa1 and Foxa2 ChIP assays determined Foxa1 and Foxa2 binding regions in the first intron of the ChREBP gene from E13.5 onward, with the highest chromatin occupancy at E18.5. This discovery of a direct regulation of the ChREBP gene by Foxa1 and Foxa2 suggested a novel role of the two Foxa factors in the regulation of metabolic processes associated with ChREBP. In 2008, an additional study on the regulatory functions of Foxa1 and Foxa2 with respect to Pdx1 was performed using ChIP and ChIP-Seq analyses on fetal pancreas and adult islets. Deletion of Foxa1 and Foxa2 starting at E9.0 in Foxa1LoxP/LoxP;Foxa2LoxP/LoxP;Pdx1-CreEarly mice resulted in dramatic pancreatic hypoplasia, severe hyperglycemia, and early postnatal lethality. Loss of Pdx1 expression in the Foxa1/a2 mutant mice suggested direct or indirect regulation of Pdx1 by the Foxa factors. ChIP and ChIP-Seq analyses demonstrated binding of both Foxa1 and Foxa2 at the Pdx1 locus. Interestingly, these experiments showed that the location of Foxa1 and Foxa2 binding differed according to developmental stage, with preferred binding at Area IV of the Pdx1 enhancer during fetal life, and increased binding at the Area I-II-III enhancer in adult islets. These findings demonstrated that Foxa1 and Foxa2 regulate Pdx1 and highlighted the importance of the two winged-helix transcription factors in pancreatic development.
The homeobox transcription factor Nkx2.2 is important for endocrine cell differentiation, particularly terminal β cell differentiation, as Nkx2.2-null mutant mice completely lack β cells but display small numbers of α and PP cells. Microarray analysis (using PancChip 6 containing 13,059 cDNAs) of control and Nkx2.2-null mutant pancreata at E12.5 and E13.5 was employed to determine differential gene expression patterns and elucidate the mechanism of the arrest in endocrine cell specification that occurs during development. Differential expression of 65 genes was observed at E12.5 (49 down-regulated and 16 up-regulated genes) and an additional 15 genes were down-regulated at E13.5. Among the down-regulated genes were nine that encode transcription factors, including early endocrine transcription factors such as Ngn3 and MafB. In addition, it was shown that deletion of Nkx2.2 leads to differential expression of exocrine genes, such as the up-regulation of Elastase 1 and the down-regulation of Spink3, the functional repercussions of which are yet to be elucidated.
MafA and MafB belong to the basic leucine zipper transcription factor family and are dynamically regulated during pancreatic development. Global gene expression analysis using Mouse PancChip 6.1 was performed on E18.5 murine pancreata from wild-type, MafB-/-, pancreas-wide MafA deletion (MafAΔPanc), and MafAΔPanc; MafB-/- mutants, and revealed differing expression patterns. The authors concluded that MafB regulates genes important for mature β cell function, including insulin secretion and glucose sensing through Nnat and Slc2a2, but not for early β cell development, since transcription factors important for early endocrine differentiation, such as Ngn3, were not differentially expressed.
NeuroD1-deficient mice are severely diabetic at birth and die perinatally, with NeuroD1-deficient pancreata displaying a reduction of endocrine cells, especially β cells. The importance of NeuroD1 for the development and function of β cells was studied utilizing two gene ablation models, one deleting NeuroD1 at the beginning of β cell formation and one deleting it in adult β cells after tamoxifen-induced activation of Cre recombinase. Deletion of NeuroD1 in β cells resulted in impaired glucose tolerance despite maintained expression of Insulin 2, suggesting that a defect in insulin secretion was responsible for the observed phenotype. Microarray analysis (Mouse PancChip 6.0) revealed the differential expression of 68 genes in β cell-specific NeuroD1-deficient adult islets, among them genes encoding glycolytic enzymes, resulting in a gene expression program reminiscent of neonatal, immature β cells and thereby suggesting a role of NeuroD1 in terminal β cell maturation.
To gain insight into the transcriptional landscape and the biology of human endodermal cells in vitro, Wang and colleagues recently analyzed hESCs and purified, in vitro-derived Sox17+ cells at stage 1 (early definitive endoderm) and stage 2 (primitive gut tube endoderm) by microarray (Affymetrix Human Genome U133 Plus 2.0 Array). They determined that 13,209 genes are differentially expressed between these three stages, which clustered into six groups of similar expression patterns, and they discovered new surface antigens that provide a novel sorting strategy for the endoderm at the primitive gut tube stage. Of course, given the dramatic alterations in cellular phenotype produced in this in vitro differentiation system, it is not surprising that nearly half the genome was affected.
Analysis of gene expression patterns in mature islets promises to yield important insights into their functional state, and establishes a valuable resource for the diabetes research community. Massively parallel signature sequencing (MPSS) on two human pancreatic islet samples identified 6,941 expressed genes, of which 3,552 were shared between both samples. Apart from the pancreatic hormones INS, GCG, and SST, the most abundant transcripts included the regenerating islet-derived 3 alpha gene (REG3A), REG1B, REG1A, and bone morphogenetic protein 5 (BMP5). Gene ontology analysis showed that the top 200 transcripts were enriched for categories such as protein synthesis and metabolism, reflecting the function of islets as protein hormone-secreting cells.
Martens and colleagues examined the transcriptional landscape of human β cell-enriched samples, mouse islets, and rat FACS-purified β cells by microarray analysis (Affymetrix HG133A and RG230A) and provided a set of 332 conserved β cell marker genes. The conserved β cell marker genes were distributed into three clusters, with 15% of the genes being β cell-specific, 30% shared with immune and gut mucosal cells, and 15% with neuronal tissue. These gene sets were functionally related to hormone processing, protein synthesis, and vesicle transport, respectively. Motif analysis for transcription factor binding sites on all identified conserved genes showed an overrepresentation of established β cell transcription factors, but also of muscle-related regulators, a finding that remains unexplained. In addition, the effect of fasting was assessed by comparing the transcriptomes of β cells isolated from fed and 24h-fasted rats, which showed that pathways involved in protein folding and ER/Golgi processing were especially affected.
In 2011, the first detailed transcriptional analysis of sorted human pancreatic α, β, acinar, small and large duct cells was reported (using Agilent 4 × 44 Whole Human Genome Array). Genes expressed 20-fold higher in β cells compared to α cells included INS, IAPP, oestrogen receptor 1 (ESR1), and the solute carrier protein SLC17A6, whereas GCG, ARX, IRX2, and the glutamate receptor GRIN3A were expressed at least 20-fold higher in α cells. Interestingly, MAFB, expressed in the adult mouse specifically in α cells, but not β cells, is active in both mature human α and β cells, revealing important differences of transcription factor expression patterns among species. Analysis of signaling events between different cell types identified 121 ligand-receptor interactions among pancreatic cell types, and suggested that Ephrin (EFN) signaling is implicated in pancreatic function.
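A fold-change cutoff such as the 20-fold threshold mentioned above can be illustrated with a short Python sketch. It assumes linear-scale (not log-transformed) expression values; the function name `enriched_genes` and all numbers are hypothetical and chosen only to show the arithmetic, not to reproduce the published analysis.

```python
import math


def enriched_genes(expr_a, expr_b, min_fold=20.0):
    """Map each gene with expression in A at least min_fold times that in B
    to its log2 fold change. Expects linear-scale, positive intensities."""
    hits = {}
    for gene, a in expr_a.items():
        b = expr_b.get(gene)
        if b is None or a <= 0 or b <= 0:
            continue  # skip genes that are missing or have non-positive values
        fold = a / b
        if fold >= min_fold:
            hits[gene] = math.log2(fold)
    return hits


# Hypothetical toy intensities; not measurements from the study.
beta = {"INS": 4000.0, "GCG": 10.0, "ACTB": 500.0}
alpha = {"INS": 100.0, "GCG": 3000.0, "ACTB": 480.0}

beta_enriched = enriched_genes(beta, alpha)   # genes higher in beta cells
alpha_enriched = enriched_genes(alpha, beta)  # genes higher in alpha cells
print(sorted(beta_enriched), sorted(alpha_enriched))
```

A real analysis would also account for replicate variability and multiple testing, which a fold-change cutoff alone does not address.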
The study of epigenetics, including DNA methylation and histone modifications, has become increasingly important to many research fields. The discovery that chromatin structure regulates gene transcription along with DNA replication and repair, coupled with the fact that the information stored in the chromatin structure is heritable, prompted researchers in many fields to study the epigenetic landscape of several tissues in different disease states and developmental stages. The presence and function of different groups of histone modifications, such as acetylation, phosphorylation, and methylation, have now been characterized. Many studies have focused on the activating histone 3 lysine 4 trimethylation (H3K4me3) mark and the repressing H3K27me3 mark due to their regulatory role in transcription. The colocalization of the H3K4me3 and H3K27me3 marks in the gene promoter region is referred to as a bivalent mark, and has been predominantly observed in undifferentiated cell types, such as embryonic stem cells [109,110]. Upon differentiation, the bivalent mark is resolved at many loci via removal of the H3K27me3 mark, which results in activation of the corresponding genes.
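The definition of a bivalent mark as colocalized H3K4me3 and H3K27me3 at a promoter can be sketched in Python as an interval-overlap test. This is a simplified illustration under stated assumptions: promoters and peaks are given as half-open coordinate intervals, a promoter counts as marked if any peak overlaps it, and all names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def marked_genes(promoters, peaks):
    """Genes whose promoter overlaps at least one peak on the same chromosome."""
    marked = set()
    for gene, (chrom, start, end) in promoters.items():
        for p_chrom, p_start, p_end in peaks:
            if p_chrom == chrom and overlaps(start, end, p_start, p_end):
                marked.add(gene)
                break
    return marked


def classify_promoters(promoters, k4_peaks, k27_peaks):
    """Label each promoter by which of the two marks overlap it."""
    k4 = marked_genes(promoters, k4_peaks)
    k27 = marked_genes(promoters, k27_peaks)
    labels = {}
    for gene in promoters:
        if gene in k4 and gene in k27:
            labels[gene] = "bivalent"
        elif gene in k4:
            labels[gene] = "H3K4me3 only"
        elif gene in k27:
            labels[gene] = "H3K27me3 only"
        else:
            labels[gene] = "unmarked"
    return labels


# Hypothetical promoter windows and peak calls for illustration only.
promoters = {
    "geneA": ("chr1", 1000, 3000),
    "geneB": ("chr1", 8000, 10000),
    "geneC": ("chr2", 500, 2500),
    "geneD": ("chr2", 9000, 11000),
}
k4_peaks = [("chr1", 1500, 2200), ("chr1", 8500, 9000)]
k27_peaks = [("chr1", 2800, 4000), ("chr2", 400, 900)]

labels = classify_promoters(promoters, k4_peaks, k27_peaks)
print(labels)
```

Overlap of peaks measured in a cell population does not by itself show that both marks sit on the same allele or in the same cell, which is one reason analyses of sorted cell populations are informative.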
Despite elevated interest in chromatin research and the genome-wide analysis of histone modifications, only a few related studies of the endocrine pancreas have been published to date, likely due in part to the difficulty of obtaining pure islet tissue and sorted endocrine cells. Genome-wide analysis of open chromatin regions in human pancreatic islets was performed utilizing the formaldehyde-assisted isolation of regulatory elements with high-throughput sequencing (FAIRE-Seq) approach. Approximately 80,000 regions were identified and, after comparison to open chromatin sites of non-islet human cell lines, about 3,300 were implicated as islet-specific, one of which contained a TCF7L2 variant that is linked to type 2 diabetes. Bhandare and colleagues performed ChIP-Seq analysis on human pancreatic islets and provided the first genome-wide histone modification map for the H3K4me1, H3K4me2, H3K4me3, and H3K27me3 marks in this tissue. Comparison with H3K4me3 levels in CD4+ T-cells revealed that a group of promoters exhibited tissue-specific enrichment for histone modification marks. Interestingly, a set of developmental regulators showed bivalent marks, previously thought to be characteristic of pluripotent and not differentiated cells. Furthermore, H3K4me3 enrichment levels did not always correlate with gene expression levels, but were dependent upon the specific promoter structure. While promoters with CpG islands exhibited the expected loose correlation of H3K4me3 enrichment with mRNA expression, promoters without CpG islands, such as those for Insulin and Glucagon, were not marked strongly by H3K4me3 even though they are very active in islet cells. Future studies on sorted endocrine cell populations will undoubtedly increase the resolution and power of these epigenetic analyses.
Genome-wide analysis of H3K4me1-, H3K4me3-, and H3K79me2-marked histones and CTCF binding sites was performed in human pancreatic islets, identifying ~18,000 putative promoters, approximately 30% of which were not previously annotated and some of which were islet-specific. This study confirmed earlier findings indicating that promoters of highly-transcribed genes, such as Insulin and Glucagon, were not strongly marked by the standard activating histone modifications, suggesting alternative activating mechanisms.
Gene expression and histone modification (H3K4me3 and H3K27me3) profiles of different mouse tissue types (including neural, adipose, and liver) as well as pancreatic β and acinar cells, were determined by microarray (Affymetrix Mouse Genome 430 2.0 Array) and ChIP-Seq analyses. Interestingly, the global gene expression and active H3K4me3 profiles of pancreatic β cells clustered with the profiles of neural tissues, and not with the acinar cell profiles. However, the repressive H3K27me3 profile of β cells was most similar to acinar cells, pancreatic progenitor cells, and liver tissue, which is reflective of their common embryonic origin. In concordance with one of the epigenetic studies on human pancreatic islets, bivalent marks were also observed in genes of differentiated tissues although they were more common in pluripotent cells. These epigenetic studies provide genome-wide histone modification profiles of human and mouse pancreatic tissue. They present interesting findings on the comparison of these profiles with gene expression levels and with the epigenetic landscape of other tissues, and highlight that it is crucial to investigate a combination of histone modifications and other transcription factor and coregulator binding events in order to fully elucidate the molecular mechanisms directing cell differentiation and other vital cell functions.
In summary, functional genomics technologies have complemented gene loss-of-function and gain-of-function experiments throughout the last decade and have greatly improved our understanding of the transcriptional networks governing α and β cell differentiation and pancreatic organogenesis. They have also helped to refine the in vitro differentiation protocols that are being deployed to coax hESCs toward the β cell fate (Fig. 2). Development of the more sensitive RNA-Seq method and the ability to sequence RNA obtained from only a few hundred cells, and in the future possibly from single cells, will further increase this understanding and allow for global gene expression analysis even at the earliest developmental stages. In addition, these technical improvements will enable researchers to address the critical question of whether α and β cells are homogeneous or composed of distinct subpopulations. The combination of precise gene expression profiles and detailed histone modification maps will provide insight into the molecular biology of distinct endocrine cell populations at early and late developmental stages and enhance our understanding of endocrine pancreatic organogenesis.
We thank Amber Riblett for careful copy-editing of the manuscript. Related work in the Kaestner lab has been supported through NIH grants DK088383, DK055342, and DK089529.