The evolutionary conservation of transcriptional mechanisms has been widely exploited to understand human biology and disease. Recent findings, however, unexpectedly showed that the transcriptional regulators hepatocyte nuclear factor (HNF)-1α and -4α rarely bind to the same genes in mice and humans, leading to the proposal that tissue-specific transcriptional regulation has undergone extensive divergence in the two species. Such observations have major implications for the use of mouse models to understand HNF-1α– and HNF-4α–deficient diabetes. However, the significance of studies that assess binding without considering regulatory function is poorly understood.
We compared previously reported mouse and human HNF-1α and HNF-4α binding studies with independent binding experiments. We also integrated binding studies with mouse and human loss-of-function gene expression datasets.
First, we confirmed the existence of species-specific HNF-1α and -4α binding, yet observed incomplete detection of binding in the different datasets, causing an underestimation of binding conservation. Second, only a minor fraction of HNF-1α– and HNF-4α–bound genes were downregulated in the absence of these regulators. This subset of functional targets did not show evidence of evolutionary divergence of binding or binding sequence motifs. Finally, conserved and species-specific binding events differed in their properties. For example, conserved binding was more frequently located near transcription start sites and more often involved multiple binding events in the same gene.
Despite evolutionary changes in binding, essential direct transcriptional functions of HNF-1α and -4α are largely conserved between mice and humans.
Changes in gene transcription are central for evolution (1,2). At the same time, the conservation of a large body of gene regulatory mechanisms has enabled the use of genetic models and comparative genomics to provide a wealth of insights into the role of gene regulation in human biology and disease (3–7).
Recent studies have challenged preconceived ideas concerning the extent of conservation of gene regulation. A systematic comparison of ~4,000 orthologous genes showed that the transcription factors hepatocyte nuclear factor (HNF)-1α, HNF-4α, FOXA2 (forkhead box A2), and HNF-6 frequently bind to different genes in mice and humans, leading to the conclusion that tissue-specific transcriptional regulation has diverged significantly between these two species (8). A similarly striking divergence of regulator binding sites has been observed among related yeast species (9). Such results have major implications for human disease. For example, of all mouse genes bound by HNF-1α, a regulator encoded by the most frequently mutated gene in human monogenic diabetes (maturity-onset diabetes of the young 3 [MODY3]) (10), only 20% showed binding to human orthologs (8). This finding calls into question the value of mouse models of human MODY3. By extension, the same concern applies to other diseases caused by defects in genes encoding transcriptional regulators, including several susceptibility variants recently implicated in type 2 diabetes (11–13).
The significance of such observations, however, is uncertain, because many genomic binding events could be functionally dispensable. Only essential functions of regulators are expected to be under strong evolutionary constraints. Essential regulatory functions are also the most relevant to the phenotypic consequences of human disease. We have now assessed the conservation of HNF-1α and -4α binding in genes where we could document that these regulators are required for transcription. In contrast to the previous global comparative study (8), our results reveal a high conservation of the essential functions of HNF-1α and -4α in mice and humans.
Mouse gene expression datasets from Hnf1a- and Hnf4a-deficient liver are available in ArrayExpress (accession numbers E-MEXP-1733 and E-MEXP-1709, respectively). A more comprehensive analysis of the Hnf1a-deficient expression datasets is reported elsewhere (14). Briefly, Affymetrix Mouse Genome 430 2.0 arrays were used to compare liver RNA from 4-week-old male C57BL/6J Hnf1a−/− and wild-type mice (14), or from mice with liver-specific Hnf4a deletion (albumin Cre+/− / Hnf4 fl/fl) and wild-type controls. The Hnf1a−/− and albumin Cre+/− / Hnf4 fl/fl mouse models have been described previously (15,16). Affymetrix expression data were normalized with RMA, and the LIMMA package was used for statistical analysis of triplicate hybridizations to identify downregulated genes at an adjusted P value <0.05. For genes with multiple probes, we selected the single most informative probe, defined as the one with the lowest P value in the mutant/wild-type comparison. For human expression studies, we used the results of a published microarray analysis of human hepatocellular adenomas with biallelic HNF1A mutations, taking the entire set of genes that were downregulated relative to normal tissue as listed in the supplementary data of Rebouissou et al. (17). To compare expression ratios of bound genes with those of all genes, we reprocessed the published human hepatocellular adenoma and control tissue HG-U133A Affymetrix dataset (GEO GSE7473) with RMA under the same conditions as the mouse datasets.
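The probe-collapsing and downregulation criteria described above can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes that a per-probe results table (log2 fold change, raw P value, and adjusted P value, as produced by a LIMMA analysis) has already been exported, and the class and function names and all example values are hypothetical. It also assumes that "downregulated" means a negative log2 mutant/wild-type ratio.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    probe_id: str
    gene: str
    log_fc: float       # log2(mutant / wild type)
    p_value: float      # raw moderated P value
    adj_p_value: float  # multiple-testing adjusted P value


def best_probe_per_gene(results):
    """Keep, for each gene, the probe with the lowest raw P value."""
    best = {}
    for r in results:
        current = best.get(r.gene)
        if current is None or r.p_value < current.p_value:
            best[r.gene] = r
    return best


def downregulated_genes(results, alpha=0.05):
    """Genes whose selected probe is lower in mutants at adjusted P < alpha."""
    return {
        gene
        for gene, r in best_probe_per_gene(results).items()
        if r.log_fc < 0 and r.adj_p_value < alpha
    }


# Hypothetical example: GeneA has two probes; the lower-P probe is kept.
example = [
    ProbeResult("p1", "GeneA", -0.2, 0.30, 0.60),
    ProbeResult("p2", "GeneA", -1.5, 0.0001, 0.002),
    ProbeResult("p3", "GeneB", 0.1, 0.80, 0.95),
    ProbeResult("p4", "GeneC", 1.2, 0.001, 0.01),  # upregulated, not counted
]
print(sorted(downregulated_genes(example)))
```

Collapsing to one probe before thresholding keeps each gene counted once, so gene-level overlaps with binding lists are not inflated by multi-probe genes.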
We used the genomic binding datasets from human hepatocytes and mouse liver reported by Odom et al. (8). Unless otherwise stated, we selected bound genes using the default criterion (P < 0.01) of the JBD (joint binding deconvolution) algorithm reported in that study (8). Analogous results were obtained with the alternative binding criteria presented in the same study (8).
To assess independent binding datasets, we used mouse hepatocyte HNF-1α and HNF-4α ChIP-chip experiments performed with β-Cell Biology Consortium (BCBC) promoter arrays. The BCBC HNF-1α binding studies are described in more detail elsewhere (14). Data for the BCBC HNF-1α and -4α binding studies are available in ArrayExpress (accession numbers E-MEXP-1714 and E-MEXP-1730, respectively). Briefly, freshly isolated mouse hepatocytes were used for chromatin immunoprecipitation as described (18,19). After reversal of cross-links, immunoprecipitated DNA was amplified by ligation-mediated PCR and hybridized to BCBC promoter microarrays: version BCBC 5A0 for HNF-1α and version BCBC 5A1 for HNF-4α. Six microarrays with dye swapping were used for each antibody. Normalized data were analyzed with the LIMMA package. Unless otherwise stated, we used a stringent threshold to define genes as bound (P < 0.001 and log2 immunoprecipitate/input ratio [M] >0.8), although alternative thresholds ranging from M >0.3 to M >1 did not alter the conclusions. Control experiments with IgG showed negligible binding with these criteria. We used antibodies SC-6556 for HNF-4α and SC-8986 for HNF-1 (Santa Cruz Biotechnology). The HNF-1 antibody cross-reacts with HNF-1β. However, in our experience the low abundance of HNF-1β in wild-type hepatocytes is insufficient to produce detectable binding, even with an HNF-1β–specific antibody that shows robust enrichment under experimental conditions in which HNF-1β is induced (14). Thus, HNF-1β cross-reactivity in our studies was negligible.
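The binding call can be expressed as a simple filter over per-feature statistics. The sketch below is illustrative only: it assumes a table of (gene, P value, M) rows from the LIMMA analysis, treats a gene as bound if any of its array features passes both thresholds (an assumption about how features map to genes), and uses hypothetical gene names and values. The threshold arguments make it easy to repeat the call over the range of M cutoffs mentioned above.

```python
def bound_genes(rows, p_max=0.001, m_min=0.8):
    """Return genes with at least one feature passing both thresholds.

    rows: iterable of (gene, p_value, m), where m = log2(IP / input).
    """
    return {gene for gene, p_value, m in rows if p_value < p_max and m > m_min}


# Hypothetical example rows (gene, P value, M)
example_rows = [
    ("GeneA", 0.0002, 1.4),
    ("GeneB", 0.0005, 0.5),
    ("GeneC", 0.0100, 2.0),  # strong ratio but not significant
    ("GeneD", 0.0001, 0.9),
]

# Sensitivity of the call to the M threshold
for m_min in (0.3, 0.5, 0.8, 1.0):
    print(m_min, sorted(bound_genes(example_rows, m_min=m_min)))
```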
Of the 4,022 genes reported by Odom et al. (8), we matched 3,665 to probes on the Affymetrix Mouse Genome 430 2.0 arrays, based either on identical RefSeq identifiers or on mouse gene symbols linked to the RefSeqs; in the latter case, we verified the genomic positions of RefSeq entries and gene symbols to eliminate errors caused by ambiguous nomenclature. A similar approach was used to match the other gene sets described in this analysis. A compilation of the gene expression and binding findings can be found in an online appendix, available at http://diabetes.diabetesjournals.org/cgi/content/full/db08-0812/DC1.
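The two-step matching can be illustrated with a short sketch: match on an identical RefSeq identifier first, and fall back to the gene symbol only when the annotated genomic positions agree. The record layout, the 50-kb tolerance used to decide that positions agree, and all identifiers in the example are hypothetical; the study does not state how positional agreement was assessed.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    key: str     # gene or probe identifier
    refseq: str
    symbol: str
    chrom: str
    start: int


def match_to_probes(genes, probes, max_distance=50_000):
    """Map gene keys to probe keys: RefSeq first, then symbol plus position check."""
    by_refseq = {p.refseq: p for p in probes}  # assumes one probe per RefSeq
    by_symbol = {}
    for p in probes:
        by_symbol.setdefault(p.symbol, []).append(p)

    matches = {}
    for g in genes:
        if g.refseq in by_refseq:
            matches[g.key] = by_refseq[g.refseq].key
            continue
        # Symbol fallback only if chromosome and position agree, to avoid
        # mismatches caused by ambiguous gene nomenclature.
        for p in by_symbol.get(g.symbol, []):
            if p.chrom == g.chrom and abs(p.start - g.start) <= max_distance:
                matches[g.key] = p.key
                break
    return matches


# Hypothetical records
example_genes = [
    Locus("gene1", "RS_A", "Sym1", "chr1", 1_000_000),
    Locus("gene2", "RS_B", "Sym2", "chr2", 5_000_000),
    Locus("gene3", "RS_C", "Sym3", "chr3", 7_000_000),
]
example_probes = [
    Locus("probe1", "RS_A", "Sym1", "chr1", 1_000_500),
    Locus("probe2", "RS_X", "Sym2", "chr2", 5_010_000),
    Locus("probe3", "RS_Y", "Sym3", "chr5", 7_000_000),  # same symbol, other chromosome
]
print(match_to_probes(example_genes, example_probes))
```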
We extracted 5′ flanking sequences (−500 to +1 bp) from the mouse (mm8) and human (hg17) genome assemblies based on Ensembl release 49 annotations. After retrieving sequences in one species, we extracted the aligned sequence in the other species from the University of California, Santa Cruz multiple genome alignments using the Galaxy platform (20). We considered the latter sequence the putative orthologous promoter if at least 50% of its nucleotides aligned. We then scanned sequences with the HNF-1α matrix (M00132) from TRANSFAC Professional using Patser (21). We retained hits above a threshold of 90% of the matrix score range, which corresponds to high-affinity HNF-1α binding sequences (22).
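The score-range threshold can be made concrete with a small position weight matrix scanner. This is a simplified stand-in for Patser, not a reimplementation of it: it converts a count matrix to log2 odds scores with a pseudocount and a uniform background (both assumptions), scans both strands, and keeps windows scoring at least the minimum possible score plus 90% of the range between the minimum and maximum possible scores. The four-position matrix in the example is a hypothetical toy, not TRANSFAC M00132.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """counts: list of {base: count} dicts, one per motif position."""
    matrix = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((col[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def score_threshold(matrix, fraction=0.9):
    """Minimum possible score plus `fraction` of the attainable score range."""
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    return lo + fraction * (hi - lo)


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, matrix, fraction=0.9):
    """Return (position, strand, score) for windows at or above the threshold.

    Minus-strand positions are indices into the reverse complement.
    """
    width = len(matrix)
    cutoff = score_threshold(matrix, fraction)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq.upper()))):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            score = sum(col[b] for col, b in zip(matrix, window))
            if score >= cutoff:
                hits.append((i, strand, round(score, 3)))
    return hits


# Hypothetical toy motif favoring "GTTA"
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
print(scan("CCGTTACC", toy_matrix))
```

With this toy matrix, a single mismatch already drops the score below the 90% cutoff, which illustrates why such a threshold selects only close matches to the consensus.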
Statistical significance was calculated with the two-sided Fisher's exact test or with the hypergeometric test, as stated. To assess whether HNF-1α binding enrichment among downregulated genes differed between mouse and human samples, we used binary logistic regression implemented in SPSS 14.0.2.
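For readers who want to see what these tests compute, the following standard-library sketch implements the one-sided hypergeometric (enrichment) test and the two-sided Fisher's exact test on a 2 × 2 table such as bound/unbound by downregulated/not downregulated. It is an illustration only, not the software used for the analyses reported here; the two-sided P value follows a common convention of summing the probabilities of all tables with the same margins that are no more probable than the observed table.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n genes from N, of which K are 'successes'."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def hypergeom_upper_tail(k, N, K, n):
    """One-sided enrichment P value: P(X >= k)."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the observed margins that are
    no more probable than the observed table (small tolerance for rounding).
    """
    N = a + b + c + d
    K = a + c  # first column total
    n = a + b  # first row total
    p_obs = hypergeom_pmf(a, N, K, n)
    p_value = 0.0
    for k in range(max(0, n - (N - K)), min(K, n) + 1):
        p_k = hypergeom_pmf(k, N, K, n)
        if p_k <= p_obs * (1 + 1e-7):
            p_value += p_k
    return min(p_value, 1.0)


# Hypothetical table: rows = bound / not bound, columns = down / not down
print(fisher_exact_two_sided(3, 1, 1, 3))
print(hypergeom_upper_tail(3, 8, 4, 4))
```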
Microarray data presented in this article have been deposited in ArrayExpress (http://www.ebi.ac.uk) under the accession numbers E-MEXP-1733, E-MEXP-1709, E-MEXP-1714, and E-MEXP-1730.
We first integrated the mouse and human liver HNF-1α binding results reported in a systematic comparison of ~4,000 orthologous genes (8) with gene expression studies in HNF-1α–deficient mouse and human tissues. We studied expression profiles from Hnf1a−/− versus wild-type mouse liver and from a previously reported study comparing gene expression in human hepatocellular adenomas carrying biallelic mutations of HNF1A versus control tissue (17). The results showed that most genes bound by HNF-1α in mouse or human chromatin did not exhibit changes in gene expression in HNF-1α–deficient mouse and human tissues (Fig. 1).
The reasons for the lack of perturbation of many HNF-1α–bound genes in cells lacking HNF-1α are currently unknown (see discussion). However, for a subset of HNF-1α–bound genes, we could clearly ascertain that HNF-1α plays an essential regulatory role in liver because they showed significant downregulation in the loss-of-function models. HNF-1α binding frequency was significantly enriched 2.7-fold in genes that were downregulated in Hnf1a−/− liver (P < 0.0001) and 4.9-fold in human genes downregulated in HNF1A-deficient tumors (P < 0.0001). This enrichment reflects the essential transactivating function of HNF-1α in a subset of its direct targets.
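The fold enrichment quoted above is a ratio of binding frequencies. The sketch below computes it from three gene sets; which genes form the comparison set is an assumption here (all analyzed genes outside the regulated set), and the gene names and counts are placeholders rather than data from this study.

```python
def binding_enrichment(bound, regulated, universe):
    """Ratio of binding frequency in regulated genes to that in non-regulated genes.

    All arguments are collections of gene identifiers; bound and regulated
    genes outside the analyzed universe are ignored.
    """
    universe = set(universe)
    bound = set(bound) & universe
    regulated = set(regulated) & universe
    background = universe - regulated
    freq_regulated = len(bound & regulated) / len(regulated)
    freq_background = len(bound & background) / len(background)
    return freq_regulated / freq_background


# Hypothetical example: 100 analyzed genes, 10 of them downregulated.
genes = [f"g{i}" for i in range(100)]
regulated = genes[:10]
bound = genes[:5] + genes[10:20]  # 5 of 10 regulated, 10 of 90 others
print(binding_enrichment(bound, regulated, genes))  # expected (5/10) / (10/90) = 4.5
```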
We next assessed HNF-1α binding conservation specifically in the subset of genes where the mouse and human expression studies could document that HNF-1α function is essential (Fig. 2A and E). Of note, throughout this analysis we focused on binding conservation in gene orthologs irrespective of whether this occurred in precisely aligned sequences because it is thought that regulatory functions can be conserved through compensatory sequence changes (8,23,24). Only 17% of HNF-1α–bound mouse genes that were not downregulated in Hnf1a−/− mice showed conserved binding in human orthologs, as opposed to 46% of downregulated targets (P < 0.0001) (Fig. 2B). We estimated that binding was conserved in as many as 65% of the genes that accounted for the increase in binding frequency among HNF-1α–dependent genes. Similarly, HNF-1α binding was conserved in only 15% of cases among genes that were not downregulated in HNF1A-deficient tumors, in contrast to 43% conservation of downregulated targets (P < 0.0001) (Fig. 2F). Thus, HNF-1α binding exhibits much greater human-mouse conservation in genes in which it is essential for transcription.
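The conservation percentages above are fractions of bound genes, split by expression class, whose orthologs are also bound in the other species. A minimal sketch, assuming one-to-one ortholog pairs supplied as a dictionary and restricting each fraction to bound genes that have an ortholog in the compared set (both assumptions); all identifiers are hypothetical.

```python
def conservation_by_class(mouse_bound, human_bound, ortholog, regulated):
    """Fraction of bound mouse genes whose human ortholog is also bound.

    mouse_bound: set of mouse genes bound in mouse chromatin
    human_bound: set of human genes bound in human chromatin
    ortholog: dict mapping mouse gene -> human ortholog (one-to-one assumed)
    regulated: set of mouse genes downregulated in the knockout
    """
    result = {}
    classes = {
        "regulated": mouse_bound & regulated,
        "not_regulated": mouse_bound - regulated,
    }
    for label, members in classes.items():
        with_ortholog = [g for g in members if g in ortholog]
        conserved = sum(ortholog[g] in human_bound for g in with_ortholog)
        result[label] = conserved / len(with_ortholog) if with_ortholog else float("nan")
    return result


# Hypothetical example
example_mouse_bound = {"m1", "m2", "m3", "m4", "m5"}
example_regulated = {"m1", "m2"}
example_orthologs = {f"m{i}": f"h{i}" for i in range(1, 6)}
example_human_bound = {"h1", "h3"}
print(conservation_by_class(example_mouse_bound, example_human_bound,
                            example_orthologs, example_regulated))
```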
Even among target genes where HNF-1α was functionally essential, binding was not conserved in all cases (Fig. 2B and F). However, the extent to which this reflects true species-specific regulation or the effect of experimental variables is uncertain. Significant false-negative and false-positive binding results in both species can theoretically lead to a marked overestimation of binding divergence. This notion is important because even in optimized chromatin immunoprecipitation microarray (ChIP-chip) protocols, the reported false-negative rate is >20% (25,26).
To provide an independent test of HNF-1α binding accuracy, we compared the data published by Odom et al. (8), obtained with Agilent arrays with 10-kb tiled regions surrounding transcription start sites, with another mouse liver HNF-1α binding experiment based on BCBC promoter arrays containing 1- to 2-kb PCR product tiles. Despite major platform and analytical differences, there was a considerable overlap of targets (Fig. 3A). This analysis also confirmed species-specific binding: HNF-1α binding in mouse BCBC arrays overlapped more with mouse-specific than with human-specific binding events (Fig. 3A and B).
We also observed that binding in mouse BCBC arrays overlapped disproportionately with the conserved subset of mouse Agilent targets, rather than with mouse-specific Agilent targets (Fig. 3B). This could reflect false-positive mouse-specific events and/or, as discussed below, distinct properties of species-specific events that are captured less efficiently by the BCBC platform.
Importantly, several HNF-1α targets classified as human-specific in the report by Odom et al. (8) were strongly bound in mouse BCBC arrays (Fig. 3A), and up to 26–37% were bound in mouse chromatin at less stringent thresholds (Fig. 3C and D). This demonstrates false-negative binding in ChIP-chip studies and indicates that overlaps of lists of bound genes from different species do not provide an unequivocal measure of HNF-1α binding conservation.
Other factors that could lead to an overestimation of binding divergence were not tested but remain plausible. These include the very different experimental conditions inherent to the mouse-human binding comparison, and the likelihood that, in at least some instances, regulator binding relocates in one species to a region that is not interrogated by the array platforms. Thus, documented and presumed factors can collectively lead to an overestimation of interspecies binding divergence.
To overcome the nonexhaustive nature of binding conservation estimates, we undertook an alternative analytical approach that makes no assumptions about the completeness of binding detection. The increased frequency with which HNF-1α binds to genes that are downregulated in HNF-1α deficiency, compared with nonregulated genes, provides a measure of the direct essential function of HNF-1α within those genes. It follows that if the function of HNF-1α is conserved in only ~20% of its target genes, as implied by Odom et al. (8), then the enrichment of HNF-1α binding observed in the HNF-1α–dependent gene set from one species should be diluted in the set of orthologous genes from the other species. The results did not show differences in binding enrichment between genes downregulated in HNF-1α deficiency and their orthologs (Fig. 2C and G). Thus, human orthologs of the gene set that was downregulated in Hnf1a−/− mice showed a similar increase in HNF-1α binding frequency as the regulated mouse gene set, and the same was true for mouse orthologs of genes that are HNF-1α–dependent in human tissues (Fig. 2C and G).
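The dilution argument can be made quantitative with a deliberately simple model, which is an illustrative assumption rather than an analysis reported here. Suppose regulated genes in one species are bound at E times the background rate, that the excess (E − 1) reflects functional binding, that only a fraction f of these functional events is conserved, and that binding at orthologs otherwise occurs at the background rate (assumed equal in both species and detected with the same sensitivity). The expected enrichment among orthologs is then 1 + f(E − 1).

```python
def expected_ortholog_enrichment(enrichment, conserved_fraction):
    """Expected binding enrichment among orthologs under the toy model above.

    enrichment: observed fold enrichment E (>= 1) in the regulated gene set
    conserved_fraction: assumed fraction f (0..1) of functional binding conserved
    """
    if enrichment < 1 or not 0 <= conserved_fraction <= 1:
        raise ValueError("need enrichment >= 1 and 0 <= conserved_fraction <= 1")
    return 1 + conserved_fraction * (enrichment - 1)


# With a 2.7-fold enrichment and ~20% conservation, the toy model predicts
# roughly 1.34-fold enrichment among orthologs; full conservation keeps 2.7.
print(expected_ortholog_enrichment(2.7, 0.2))
print(expected_ortholog_enrichment(2.7, 1.0))
```

Under this model, the two hypotheses make clearly different predictions for the orthologous gene set, which is what makes the comparison informative even when binding detection is incomplete in both species.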
Further inspection of regulated genes revealed a remarkable enrichment of conserved binding events (Fig. 2D and H). A more moderate enrichment of species-specific binding was also observed (Fig. 2D and H). However, this was not restricted to the species where regulation was observed (as would be expected if it reflected species-specific regulation). For example, human-specific binding was paradoxically enriched in genes that were HNF-1α–dependent in mouse liver (Fig. 2D). This is consistent with the incomplete detection of binding outlined above (Fig. 2D and H). Taken together, these findings fail to detect evidence for major human-mouse divergence of functionally essential HNF-1α binding events.
The analysis of HNF-1α binding focused on a large but incomplete subset of genes. To provide an independent confirmation of the binding studies, we analyzed computationally predicted high-affinity HNF-1α binding sequence motifs (22). We did not assess the conservation of precisely aligned motifs, because its significance may be obscured by the high degree of interspecies binding site turnover (factor A binds to gene X in both species, but in different regions) (8,23,24). We therefore studied the conservation of HNF1 motif enrichment among HNF-1α–dependent genes. HNF1 motifs were enriched 11.5- and 6-fold in the immediate 5′ flanking regions of experimentally defined mouse and human HNF-1α–dependent genes, respectively (Fig. 4). We thus used the enrichment of HNF1 motifs in regulated genes as a surrogate quantitative measure of direct HNF-1α functional effects within such genes. As in the binding analysis, we asked whether the enrichment of HNF1 motifs was absent or markedly decreased in promoter regions of orthologs of HNF-1α–dependent genes, as predicted by the hypothesis that HNF-1α function has undergone major evolutionary divergence. High-affinity HNF-1α binding motifs were highly enriched in human orthologs of genes that showed HNF-1α dependence in mice (albeit at a marginally lower rate than in the mouse genes themselves) and in mouse orthologs of genes that showed HNF-1α dependence in human tumors (Fig. 4). This finding further supports the conclusion that a substantial fraction of functional HNF-1α targets is conserved in mice and humans.
We also studied HNF-4α, another regulator involved in human diabetes (27). As with HNF-1α, most HNF-4α–bound genes were not perturbed in Hnf4a-deficient liver (Fig. 1). Among the subset of genes that did show decreased expression in Hnf4a-deficient liver, similar numbers were bound by HNF-4α in mice and humans, contrary to what would be expected if these genes were selectively regulated in mice (Fig. 5A). The overall conservation of mouse HNF-4α binding was in fact high: even among nonregulated mouse genes, binding was conserved in 58% of cases, and this increased to 66% in Hnf4a-dependent genes (Fig. 5B). The true extent of conservation is likely to be even higher, because several genes classified as human-specific targets were also bound in mice in an independent experiment (Fig. 5C). This analysis therefore also failed to support an extensive divergence of HNF-4α function between mice and humans.
Because conserved and species-specific binding showed different functional properties, we predicted that they would also differ in other properties. We studied binding multiplicity and observed that conserved HNF-1α and -4α targets were more frequently bound at multiple sites on the same gene than genes bound in a species-specific manner (Fig. 6A). Interestingly, HNF-4α dependence strongly correlated with HNF-4α binding multiplicity, suggesting that multiplicity may be a critical attribute of functional HNF-4α binding (Fig. 6B). Conserved binding was also more likely than species-specific binding to be located in proximal promoter regions (Fig. 6C). Because BCBC arrays are built from large proximal PCR fragments rather than oligonucleotide tiles, these two properties could in principle partly explain the differential detection of conserved events by the two platforms noted above. The data presented by Odom et al. (8) also indicate that genes with conserved HNF-1α binding were twice as likely to contain a canonical HNF1 sequence motif. Collectively, these findings indicate that conserved and nonconserved binding events may differ not only in function, but also in location, multiplicity, and binding site sequence.
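The two binding properties compared here, multiplicity and promoter proximity, can be summarized per gene from peak coordinates. In this sketch the input format, the ±1-kb window used to call a site proximal, and the example coordinates are hypothetical choices for illustration, not the definitions used in the study.

```python
from collections import defaultdict


def binding_properties(peaks, tss, proximal_window=1_000):
    """Per-gene binding multiplicity and presence of a TSS-proximal site.

    peaks: iterable of (gene, peak_center) pairs
    tss: dict mapping gene -> transcription start site coordinate
         (every gene in peaks is assumed to have an entry)
    Returns {gene: (number_of_sites, has_site_within_window)}.
    """
    sites = defaultdict(list)
    for gene, center in peaks:
        sites[gene].append(center)
    return {
        gene: (len(centers),
               any(abs(c - tss[gene]) <= proximal_window for c in centers))
        for gene, centers in sites.items()
    }


def summarize(genes, props):
    """Fraction of genes bound at >1 site and fraction with a proximal site."""
    genes = [g for g in genes if g in props]
    if not genes:
        return float("nan"), float("nan")
    multi = sum(props[g][0] > 1 for g in genes) / len(genes)
    proximal = sum(props[g][1] for g in genes) / len(genes)
    return multi, proximal


# Hypothetical peaks and TSS coordinates
example_peaks = [("GeneA", 1_000), ("GeneA", 5_000), ("GeneB", 20_500), ("GeneC", 40_000)]
example_tss = {"GeneA": 1_200, "GeneB": 20_000, "GeneC": 35_000}
props = binding_properties(example_peaks, example_tss)
print(summarize(["GeneA", "GeneB"], props))  # e.g. a "conserved" class
print(summarize(["GeneC"], props))           # e.g. a "species-specific" class
```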
The results presented here are consistent with a recent report indicating that HNF-1α and -4α binding has undergone evolutionary divergence across mice and humans (8), yet they qualify this information in two critically important ways. First, the data suggest that current large-scale binding assays overestimate the evolutionary divergence of transcription factor binding. Second, and more importantly, we show that binding to gene targets where HNF-1α and -4α exert essential functions is considerably conserved between mice and humans.
Our analysis rests on the observation that only a small fraction of genes bound by HNF-1α and -4α show altered expression in loss-of-function studies. This result is striking, but entirely consistent with several recent studies that compared gene expression changes with binding patterns for Oct4, Nanog, glucocorticoid receptor, and p63 (28–30). This is central to our analysis because high evolutionary conservation is not expected among binding events that are not functionally essential. Consistent with this prediction, we observed that binding conservation was markedly dependent on the gene expression phenotype in loss-of-function studies.
There are several likely reasons why numerous direct targets of HNF-1α and -4α do not depend functionally on these factors. First, HNF-1α or -4α is expected to be dispensable in many bound genes because of redundant regulatory factors. Second, in an undetermined number of genes, binding could simply have limited functional consequences, as recently proposed for many binding sites of several Drosophila regulators (31). On the other hand, some bound genes with unperturbed expression may depend on HNF-1α or -4α only in specific physiological or developmental settings. For example, functional dependence of HNF-1α– or HNF-4α–bound genes is highly tissue specific, although most bound genes show no changes in gene expression in either liver or pancreatic islets of mice lacking these factors (J.M.S., S.F.B., J.F., unpublished observations). Even though some targets that are unperturbed in null mutant cells are likely to be truly dependent on HNF-1α or -4α in other settings, the observed differences in binding conservation between perturbed and unperturbed genes suggest that this classification is largely correct. In fact, we predict that the differences in binding conservation between gene expression classes would be larger if all functionally significant targets were correctly classified.
Our results highlight that comparing two incomplete binding datasets from different species can lead to an overestimation of evolutionary divergence. One expected cause of incomplete detection is the high false-negative rate of ChIP-chip (25,26). In part, this is because ChIP-chip relies on en masse amplification of thousands of DNA templates, which unavoidably results in poor amplification of a subset of sequences in each of the two species. Failure to detect binding conservation can also result from transcription factors binding outside the interrogated regions in only one species. Furthermore, extreme differences in experimental conditions between the two species can differentially affect binding measurements. These include differences in age, leanness, nutritional status, recent exposure to drug therapies, cause of death, and the use of cultured cells versus freshly isolated tissue in the mouse models and human organ donors (8).
To circumvent the limitation that current assays do not capture all binding events, we asked whether the increased frequency of HNF-1α binding to HNF-1α–regulated genes in one species is also observed in their orthologs in the other species. Because HNF-1α has a complex, well-characterized DNA binding motif (22), we also studied whether the enrichment of high-affinity HNF1 motifs is conserved among regulated ortholog pairs. Both comparisons independently tested the hypothesis that functional binding has diverged between mice and humans. Neither approach makes assumptions about the fraction of binding events that are detected or about the extent of turnover of evolutionarily conserved transcription factor binding sites. For both experimentally and computationally defined sites, we observed no evidence to support an evolutionary divergence of functional HNF-1α binding between mice and humans.
Taken together, these results suggest that functionally important binding events exhibit a much stronger evolutionary conservation than anticipated from studies that only measure the conservation of binding. Similar conclusions were drawn in a recent study that related binding of muscle regulators with the conservation of bound sequences in 12 Drosophila genomes (32). That study concluded that binding to conserved sequences was more likely to be biologically significant because it occurred more frequently in the proximity of muscle genes than binding events occurring in nonconserved sequences (32).
We expect that the degree of conservation will vary for different regulators, depending on the nature of the cellular functions they regulate. Comparative studies using accurate genome-wide sequencing approaches are warranted to fully understand the evolutionary conservation of different regulators, but, importantly, such studies should not be restricted to assaying genomic occupancy.
Our findings also showed that compared with conserved binding, species-specific binding events differed not only in function, but also in several binding properties. This suggests that a subset of species-specific binding events could be fundamentally distinct from conserved, functionally relevant binding events. We speculate that such species-specific binding events may be less exposed to evolutionary pressure, but they could be instrumental in the acquisition of new functions.
Recent data proposing that transcriptional regulation has diverged between mice and humans questioned the value of mouse genetic models (8). Our findings therefore have important implications for the use of mouse models of human monogenic diabetes and more generally for the use of animal models and comparative genomics to understand transcriptional regulation and human disease.
This work was funded by the Ministerio de Educación y Ciencia and the E.U. VI Framework program. J.M.S. was supported by the Ramon y Cajal Programme.
No potential conflicts of interest relevant to this article were reported.
We thank the Instituto Nacional de Bioinformatica de Genoma España for support, Frank Gonzalez (National Cancer Institute) for Hnf1a mice, Jose Antonio Rios for statistical advice, Duncan Odom for helpful insights, Natalia del Pozo for animal assistance, Thien Vu Manh for initial database development, and Pedro Jares (Institut d'Investigacions Biomèdiques August Pi i Sunyer) and Lauro Sumoy (Centre de Regulació Genòmica) for microarray hybridizations and processing.
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.