Comparative analysis of genomes is a major tool for the identification of regulatory elements
[2]. In this context, a comparative analysis of the human and mouse orthologous regions spanning the SRO revealed 25 CNCs with an average length of 165 bp and average homology of 82.5%. These identified CNCs were an important focus here. We included 57 patients with a diagnosis of BPES who tested negative for intragenic mutations and copy number changes of
FOXL2. First, these patients were screened for copy number changes outside
FOXL2, with special interest in the initial SRO region of upstream deletions. This was carried out by one or a combination of the following assays: microsatellite analysis, arrayCGH and two qPCR assays called qPCR-3q23 and qPCR-CNC respectively. Different techniques were used because more convenient assays became available in the course of the study. In a second step, the remaining negative patients were specifically screened for sequence variants of CNCs within the initial SRO. In addition, functional analyses (i.e. luciferase assays) were performed for wild type and variant CNCs in different cellular systems. Finally, the chromosome conformation of the
FOXL2 locus was investigated by 3C.
Table 1. Mapped CNCs within the initial SRO.
ArrayCGH revealed 1 novel extragenic deletion 5′ to
FOXL2, which was further delineated by qPCR-CNC at the centromeric end (Deletion A). In addition, qPCR-3q23 with 3 amplicons located in the SRO revealed 3 more novel extragenic deletions (Deletions B–D). Deletions B and C, both encompassing all 3 amplicons and identified in typical BPES patients, were subsequently further delineated using additional amplicons. Deletions B and C were found to be 190 kb–478 kb and 1.12 Mb–2.3 Mb in size respectively. Deletions A, B and C can be added to the previously described relatively large deletions 5′ to
FOXL2, which were believed to be pathogenic through the deletion of
cis-regulatory elements
[10]. Here, the
de novo occurrence could be assessed for deletion C for which parental DNA was available.
Most remarkable, however, was the identification of the very subtle deletion D, which encompassed only 1 amplicon. Deletion D could be mapped to a region of at least 6 kb and at most 12.5 kb in size using qPCR-CNC. Subsequent long-range PCR and direct sequencing of the junction PCR fragment allowed us to define its exact size (7358 bp) and location (chr3:140,431,841-140,439,199), 283 kb upstream of FOXL2. This deletion lies entirely within the previously described SRO of 126 kb and thus defines a drastically reduced SRO. Furthermore, segregation analysis suggested a de novo occurrence of this small deletion, supporting its pathogenic potential. Despite its small size, the deletion is presumed to lead to a classic BPES phenotype in a 7-year-old sporadic male.
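The interval arithmetic behind this mapping can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names and the example amplicon positions are hypothetical and are not taken from the study, and each amplicon is reduced to a single genomic position. Only the junction coordinates of deletion D come from the text.

```python
from typing import List, Tuple

# Junction coordinates of deletion D as reported in the text (chr3).
DEL_D_START = 140_431_841
DEL_D_END = 140_439_199

# Hypothetical amplicon calls: (position in bp, copy number).
# Copy number 1 = heterozygously deleted, 2 = normal. Positions are invented.
EXAMPLE_AMPLICONS = [(0, 2), (3000, 1), (9000, 1), (12500, 2)]


def deletion_size(start: int, end: int) -> int:
    """Size as end minus start, the convention that reproduces the reported 7358 bp."""
    return end - start


def deletion_bounds(amplicons: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Bound a heterozygous deletion from qPCR copy-number calls.

    Assumes one contiguous run of deleted amplicons flanked on both sides
    by normal amplicons. Returns (minimum size, maximum size) in bp.
    """
    ordered = sorted(amplicons)
    deleted = [i for i, (_, cn) in enumerate(ordered) if cn == 1]
    if not deleted:
        raise ValueError("no deleted amplicon")
    first, last = deleted[0], deleted[-1]
    if first == 0 or last == len(ordered) - 1:
        raise ValueError("deletion is not flanked by normal amplicons")
    if any(ordered[i][1] != 1 for i in range(first, last + 1)):
        raise ValueError("deleted amplicons are not contiguous")
    # Minimum: span between the outermost deleted amplicons.
    min_size = ordered[last][0] - ordered[first][0]
    # Maximum: span between the nearest normal amplicons on either side.
    max_size = ordered[last + 1][0] - ordered[first - 1][0]
    return min_size, max_size
```

With the invented positions above, `deletion_bounds(EXAMPLE_AMPLICONS)` gives a 6 kb to 12.5 kb window, and `deletion_size(DEL_D_START, DEL_D_END)` falls inside it, which mirrors how the sequenced junction refines the qPCR estimate.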
The observation that none of the known and novel regulatory deletions show recurrent breakpoint regions argues against non-allelic homologous recombination (NAHR) as a possible mechanism underlying this subtle deletion
[24]. Other models such as non-homologous end joining (NHEJ) or Fork Stalling and Template Switching (FoSTeS) might explain the formation of the deletion, although there is no scar at the junction fragment
[24]. To unravel the mechanism responsible for this deletion, bioinformatics analyses of the breakpoint junctions were performed. A 70 bp ClustalW alignment of the abnormal junction sequence with the reference genomic sequence from both breakpoint regions did not reveal any significant homologies, although there is some minor sequence similarity. Similarly, BLAST2 analysis of the 2 kb breakpoint regions did not reveal any significant similarities. Analysis with RepeatMasker indicated a 36-bp low complexity region at the centromeric end of the deletion, but no additional repetitive elements. At the telomeric end it revealed a LINE2 repeat in very close proximity to the breakpoint and, at a larger distance, a 25-bp simple repeat and a 123-bp low complexity region. Tandem repeats and palindromes were excluded in a region of 300 bp around the breakpoints using Mreps and Palindrome. In addition, the GC content of a 1-Mb region around the 7.4 kb deletion did not appear to be above average. Interestingly, with DNA Pattern Finder three motifs, known to be implicated in DNA rearrangements elsewhere, were identified in 70-bp regions surrounding the breakpoints, including one of the immunoglobulin heavy chain class switch repeats (
GGGCT), a deletion hotspot consensus site (
TG[AG][AG][GT][AC]) and a DNA polymerase α pause site core sequence
GC [GC]. It cannot be excluded, however, that the occurrence of these motifs is coincidental. Manual inspection of the breakpoint regions and the junction fragment revealed a mirror repeat at the telomeric breakpoint. Such mirror repeats have the capacity to form noncanonical, three-stranded structures referred to as H-DNA, one of the non-B DNA structures
[25]. H-DNA-forming sequences have previously been identified in regions that are prone to genomic rearrangements
[25]–
[28]. Interestingly, the pentanucleotide motif present in this mirror repeat is also seen on the reverse strand at the centromeric end of the deletion. We thus hypothesize that a double-strand break (DSB) at the telomeric side triggered the deletion, followed by a DSB repair mechanism guided by the formation of a knot loop between the reverse complement of the pentanucleotide motif at the centromeric end.
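The kind of breakpoint inspection described above (motif matching on both strands and a search for mirror repeats) can be sketched in a few lines of Python. This is not the DNA Pattern Finder tool or the manual inspection used in the study; it is a minimal illustration in which the function names and all test sequences are hypothetical. Only two of the reported motif patterns, GGGCT and TG[AG][AG][GT][AC], are included.

```python
import re

# Two of the motif patterns quoted in the text, written as regular expressions.
MOTIFS = {
    "class_switch_repeat": "GGGCT",
    "deletion_hotspot": "TG[AG][AG][GT][AC]",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def scan_motifs(seq, motifs=MOTIFS):
    """List (name, strand, forward-strand start, matched text) for every hit.

    A lookahead is used so that overlapping occurrences are also reported.
    Minus-strand hits are reported at the leftmost forward-strand position.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, pattern in motifs.items():
        lookahead = re.compile("(?=(" + pattern + "))")
        for m in lookahead.finditer(seq):
            hits.append((name, "+", m.start(), m.group(1)))
        for m in lookahead.finditer(rc):
            start = len(seq) - m.start() - len(m.group(1))
            hits.append((name, "-", start, m.group(1)))
    return sorted(hits, key=lambda h: (h[2], h[1]))


def mirror_repeats(seq, arm, max_spacer=0):
    """Find mirror repeats: an arm followed on the same strand by its reverse.

    Returns (start of left arm, spacer length) pairs. The arm is reversed,
    not complemented, which distinguishes a mirror repeat from a palindrome.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            right = seq[j:j + arm]
            if len(right) == arm and right == left[::-1]:
                hits.append((i, spacer))
    return hits
```

Scanning both strands matters here because the pentanucleotide motif was observed on the forward strand at one breakpoint and on the reverse strand at the other; a forward-only search would miss the second occurrence.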
The drastically reduced SRO contains 8 out of 25 CNCs identified in the initial SRO. Moreover, 4 out of 8 are conserved up to chicken, adding weight to an assumed functional role. According to several miRNA databases the reduced SRO does not contain any miRNAs. We also investigated the regulatory potential of the deleted region using regulatory tracks. Based on the currently available data, the region is devoid of CpG islands, transcription start sites, conserved transcription factor binding sites, miRNA regulatory sites, VISTA enhancers, regulatory elements from ORegAnno,
DNaseI hypersensitivity sites and CTCF binding sites. While the new SRO is devoid of known human genes, it does contain 4 human ESTs. Three are unspliced ESTs from two testis cDNA libraries, sharing a common telomeric end position. BLASTn analysis with EST AI204197 as query sequence retrieved 51 hits, including a significant alignment with
Capra hircus PISRT1 mRNA and
Mus musculus Pisrt1 partial mRNA sequence.
PISRT1 is one of the genes affected by the causal PIS deletion in goat. The PIS goat is the only known natural animal model for BPES associated with absence of horns (polledness) and intersexuality, caused by a regulatory 11.7 kb deletion located 280 kb upstream of goat
FOXL2. It was shown that the deletion does not contain, but alters the transcription of at least three genes:
FOXL2, the non-protein coding gene
PISRT1, and
PFOXic [14],
[15]. Pailhoux et al. (2001) suggested that the PIS deletion harboured elements involved in long-range
cis-regulation of goat
FOXL2 and
PISRT1, as the expression of both genes is affected by the deletion
[14]. This was further supported by our previous findings, revealing that the initial 126 kb SRO 5′ to
FOXL2 contains the PIS locus
[10]. Here, the 7.4 kb deletion proved to contain the
PISRT1 orthologue, but not the PIS deletion. This suggests the existence of distinct interspecies
cis-regulatory elements, which have similar effects when disrupted. Caprine
PISRT1 encodes a long non-coding transcript (ncRNA) of 1.5 kb that is highly expressed in adult testis
[13]. A full-length cDNA of 758 bp was identified by 5′ RACE PCR starting from the known testis ESTs containing a polyadenylation site (
Figure S1). These findings confirm its expression in human testis. In addition, no expression could be detected in fibroblasts, while a low
PISRT1 expression could be observed in KGN cells, indicating a co-expression of
PISRT1 and
FOXL2 in adult ovarian granulosa cells. These findings are consistent with expression profiles in goat and mice
[13]. The latter is in line with a presumed regulatory function of
PISRT1, requiring a tissue and cell-type specific expression.
Apart from copy number analyses, the remaining negative patients were specifically screened for sequence variants of CNCs within the initial SRO. To date, only a few human phenotypes have been found to be associated with sequence variations within
cis-regulators
[19],
[20],
[29],
[30]. In this study, we identified 15 single nucleotide substitutions within CNCs or in flanking nucleotides and a 4-bp deletion mapping immediately upstream of CNC14 (
Table S1). Only 3 nucleotide substitutions were found in BPES patients exclusively. However, parental DNA of these particular patients was not available for segregation analysis (
Table S1). Moreover, computational transcription factor binding site (TFBS) prediction on any of the wild type and variant CNCs did not support the creation or abolition of a TFBS.
Although comparative sequence analysis has been proven to be a powerful approach to identify regulatory elements, experimental studies are required to confirm their role in gene regulation. The ability to modulate expression of a linked minimal promoter element in transient cell transfections is a widely exploited
in vitro test of
cis-regulatory potential
[31]. Thus, for 24 of the 25 CNCs identified in the original SRO,
in vitro luciferase assays were conducted (CNC19 could not be cloned). In both the KGN and 293T cell lines, 29% (7/24) of the tested CNCs showed a significant difference in luciferase activity compared to the basal activity of the vector itself (t-test, P<0.05). Interestingly, cell-type specific regulatory potential could be observed among the constructs tested, three of which map within the 7.4 kb reduced SRO (CNC14, CNC5 and CNC15). This cell-type specific regulatory activity suggests that at least a fraction of the tested CNCs might be involved in the tissue-specific expression of
FOXL2. We also addressed the putative functional impact of the identified nucleotide variants, but did not detect significant effects. A small quantitative and tissue-specific
cis-regulatory effect of an individual CNC variation cannot be ruled out, however. These results suggest that sequence variations within individual CNCs do not directly contribute to the molecular pathogenesis of BPES in our study.
As an additional experimental tool, 3C was conducted for a large region of 625 kb flanking the
FOXL2 gene. Using 3C, physical interactions between regulatory elements and their target genes can be demonstrated
[32]. In the
FOXL2 expressing KGN cell line, the
FOXL2 core promoter containing
EcoRI fragment 58 proved to come in close vicinity to
EcoRI restriction fragments 109, 133 and 158, located 177, 283 and 360 kb upstream of
FOXL2 respectively (
Figure S2). Moreover, an identical but lower interaction profile was detected in expressing fibroblast cells from a normal individual (F2) (
Figure S2). These data demonstrate that in the nucleus of expressing cells, the promoter region of the
FOXL2 gene interacts with three long-distance
cis-regulatory sequences.
To validate mutual interactions between these three regulatory regions, 3C was performed in EBV, KGN and F2 cells with fragments 109, 133 and 158 respectively as anchor fragments in a second step. It was found that in expressing cells, all three distant sequences mutually interact and contact the
FOXL2 core promoter, assuming that the intervening DNA loops out. Interestingly, fragment 133 contains the 7.4 kb fragment that is deleted in deletion D. To investigate the consequences of a heterozygous deletion of interacting fragment 133 on the interaction profile of the
FOXL2 locus, we analysed the mutual interactions of these three fragments in a fibroblast cell line F1, obtained from a BPES patient carrying an upstream deletion defining the initial SRO
[10]. As a control, we used the fibroblast cell line F2. Interactions of the promoter with the two elements 109 and 158 that are not located within the deletion are not reduced. Thus, this suggests that even on the deleted chromosome these elements can interact with the
FOXL2 promoter despite the absence of fragment 133. Furthermore, fragments 109 and 158 appear to mutually interact even in the absence of 133. The upstream deletion disrupts fragment 133 within the reduced SRO, and causes a BPES phenotype. The latter might lead to the conclusion that the retained interactions between fragments 109 and 158 and the
FOXL2 core promoter are not sufficient to correctly regulate
FOXL2 transcription in the adult expressing cell system studied here. Moreover, it implies that the interactions between the
cis-regulatory element(s) located in fragment 133 and the
FOXL2 core promoter are essential for this.
General conclusions and perspectives
We identified a
de novo distant 7.4-kb deletion that is causally related to BPES. To our knowledge, this is the smallest fully characterized distant deletion implicated in the causation of a human genetic condition (
Table S2). This deletion disrupts a long ncRNA
PISRT1 and 8 CNCs, 4 of which are conserved up to chicken. Functional assays suggest a
cis-regulatory and tissue-specific potential of 3 of them. The biological relevance of these findings was corroborated by the 3C study of a normal and aberrant
FOXL2 locus in expressing adult cellular systems respectively, demonstrating a close proximity of the 7.4 kb deleted fragment and two other conserved regions with the
FOXL2 core promoter, and the necessity of the integrity of the regulatory domain for correct
FOXL2 expression.
Altogether, we identified and characterized a novel tissue-specific cis-regulatory domain of FOXL2 expression. As we demonstrated the consequences of its disruption, our findings impact mutation screening of strictly regulated developmental and other disease genes. Specifically, our study emphasizes the need for high-resolution copy number screening of their cis-regulatory domains. Genome-wide tools such as oligonucleotide or SNP arrays and next-generation sequencing will play a prominent role in this. In addition, a well-selected patient population is another requirement, as illustrated here: (1) we only included patients with a diagnosis of BPES, a clinically distinguishable but rare disorder, and (2) they all underwent a uniform pre-screening excluding intragenic FOXL2 mutations and gene deletions.
Sequence variations within individual CNCs did not contribute to the molecular pathogenesis of BPES in our study. This can be explained by the fact that sequence changes within individual CNCs might result in a more subtle, different or even normal phenotype, as the
cis-regulatory elements they represent might act in a tissue-specific and quantitative manner
[5],
[6],
[19],
[33]. The most striking example of the latter is the differential phenotype caused by point mutations in
SHH and in its limb-specific enhancer ZRS of
SHH, resulting in holoprosencephaly type III (HPE3) (OMIM 142945) and preaxial polydactyly (PPD) respectively
[20],
[34].
Other mechanisms may explain the phenotype in the remaining 53 molecularly undefined BPES patients. Although there is no clear evidence for locus heterogeneity in BPES, mutations in other disease genes apart from
FOXL2 cannot be excluded in some of the remaining molecularly unresolved cases. Another possibility is the occurrence of regulatory variants within the untranslated regions (UTRs) or the core promoter. A number of non-pathogenic sequence variants have been reported in the
FOXL2 putative core promoter and untranslated regions (UTRs) up to now. However, a single basepair insertion in the
FOXL2 3′UTR was found to co-segregate with BPES in a large Chinese type II BPES family, and was shown to be located in an AU rich repeat
[35]. However, no functional studies were provided to unequivocally prove a relationship between the insertion and the phenotype in this family. Interestingly, in the
FOXP3 gene (NM_014009), encoding another forkhead transcription factor, a presumed disease-causing sequence change was found in the 3′UTR within the poly(A) signal, in affected members of a five-generation family with X-linked immune dysfunction, polyendocrinopathy, enteropathy (IPEX) (MIM 304790)
[36]. The occurrence of interesting pathogenic or modifying variants in 3′UTRs is in line with their important role in the regulation of gene expression at the pre-mRNA, mature mRNA and post-transcriptional levels through
cis-acting elements that interact with a variety of
trans-acting factors
[37]. This is highlighted by their many conserved sequence motifs, including microRNA (miRNA) targets
[37]. It cannot be ruled out that changes in post-transcriptional regulation by altered miRNA targeting may result in BPES. A unique example of a variant that alters the gene expression level by modifying miRNA targeting activity is a 3′UTR SNP in human
SLITRK1 (NM_052910), which is implicated in Tourette syndrome (MIM 137580)
[38].
Finally, this study considerably strengthens the case for the importance of an intact tissue-specific
cis-regulatory domain in this and other developmental disorders. This impacts upon the concept of mutation screening of developmental disease in particular, and of human genetic disease in general. In the future, online databases such as DECIPHER and the Database of Genomic Variants, which collect information on copy number changes, might help to interpret copy number changes affecting putative regulatory regions that might lead to disease
[39],
[40].