Comparative genome analysis is a major tool for identifying regulatory elements. In this context, a comparative analysis of the human and mouse orthologous regions spanning the SRO revealed 25 CNCs with an average length of 165 bp and an average homology of 82.5% (). These CNCs were a central focus of this study. We included 57 patients with a diagnosis of BPES who had tested negative for intragenic mutations and copy number changes of FOXL2. First, these patients were screened for copy number changes outside FOXL2, with particular attention to the initial SRO of upstream deletions. This was done using one or a combination of the following assays: microsatellite analysis, arrayCGH, and two qPCR assays, qPCR-3q23 () and qPCR-CNC (). Different techniques were used because more convenient techniques became available during the course of the study. In a second step, the remaining negative patients were specifically screened for sequence variants of CNCs within the initial SRO (). In addition, functional analyses (i.e. luciferase assays) were performed for wild-type and variant CNCs in different cellular systems. Finally, the chromosome conformation of the FOXL2 locus was investigated by 3C.
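The qPCR assays above report copy number relative to a normal control. A common way to turn threshold-cycle (Ct) values into a copy-number ratio is the comparative 2^-ΔΔCt method. The text does not describe how qPCR-3q23 and qPCR-CNC were normalised, so this minimal sketch assumes one reference amplicon of normal copy number, one normal control DNA and about 100% amplification efficiency. The function names and the cut-offs in `call_copy_state` are hypothetical illustration choices, not values from this study.

```python
from statistics import mean


def relative_copy_number(ct_target_patient, ct_ref_patient,
                         ct_target_control, ct_ref_control):
    """Copy-number ratio (patient / control) for one target amplicon.

    Each argument is a list of replicate Ct values. Assumes ~100 %
    amplification efficiency, i.e. one cycle equals a two-fold change.
    """
    # Normalise the target amplicon to the reference amplicon in each sample.
    delta_patient = mean(ct_target_patient) - mean(ct_ref_patient)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    # Compare the patient with the normal control DNA.
    delta_delta = delta_patient - delta_control
    return 2.0 ** (-delta_delta)


def call_copy_state(ratio, del_cutoff=0.7, dup_cutoff=1.3):
    """Classify a ratio; the default cut-offs are hypothetical."""
    if ratio < del_cutoff:
        return "deletion"
    if ratio > dup_cutoff:
        return "duplication"
    return "normal"


# Hypothetical Ct values: relative to the reference amplicon, the target
# amplifies one cycle later in the patient than in the control.
ratio = relative_copy_number([25.0, 25.1, 24.9], [22.0, 22.0, 22.0],
                             [24.0, 24.0, 24.0], [22.0, 22.0, 22.0])
print(round(ratio, 2), call_copy_state(ratio))  # 0.5 deletion
```

A heterozygous deletion is expected to give a ratio near 0.5 and a single-copy duplication near 1.5, which is why the illustrative cut-offs sit between those values and 1.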
Mapped CNCs within the initial SRO.
Human Genome Browser view of the FOXL2 region.
Human Genome Browser view of the initial and reduced SRO.
ArrayCGH revealed one novel extragenic deletion 5′ to FOXL2, which was further delineated at its centromeric end by qPCR-CNC (Deletion A) ( and ). In addition, qPCR-3q23, with 3 amplicons located in the SRO, revealed 3 more novel extragenic deletions (Deletions B–D). Deletions B and C, which both encompassed all 3 amplicons and were identified in patients with typical BPES, were subsequently delineated further using additional amplicons (). Deletions B and C were found to be 190–478 kb and 1.12–2.3 Mb in size, respectively (). Deletions A, B and C add to the previously described relatively large deletions 5′ to FOXL2, which are believed to be pathogenic through the loss of cis-regulatory elements. De novo occurrence could be assessed for deletion C, for which parental DNA was available.
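Size ranges such as 190–478 kb for Deletion B follow from which amplicons are deleted and which are retained. As a minimal sketch, assuming amplicon coordinates on one chromosome and a single contiguous deletion, the minimum size spans the outermost deleted amplicons and the maximum size runs between the nearest retained amplicons on either side. The function name and all coordinates are hypothetical.

```python
from typing import List, Optional, Tuple

Amplicon = Tuple[int, int, bool]  # (start, end, deleted?) on one chromosome


def bracket_deletion(amplicons: List[Amplicon]) -> Tuple[int, Optional[int]]:
    """Return (minimum, maximum) deletion size in bp.

    `amplicons` must be sorted by start. The minimum spans the outermost
    deleted amplicons; the maximum runs between the nearest retained
    amplicons flanking them (None if one side has no retained amplicon).
    """
    deleted = [i for i, (_, _, is_del) in enumerate(amplicons) if is_del]
    if not deleted:
        raise ValueError("no deleted amplicon")
    first, last = deleted[0], deleted[-1]
    # A single deletion should not contain retained amplicons.
    if any(not amplicons[i][2] for i in range(first, last + 1)):
        raise ValueError("deleted amplicons are not contiguous")
    min_size = amplicons[last][1] - amplicons[first][0]
    if first == 0 or last == len(amplicons) - 1:
        return min_size, None
    # Breakpoints lie after the last retained amplicon on the left and
    # before the first retained amplicon on the right.
    max_size = amplicons[last + 1][0] - amplicons[first - 1][1]
    return min_size, max_size


# Hypothetical amplicon positions and qPCR calls.
calls = [(1_000, 1_100, False), (5_000, 5_100, True),
         (9_000, 9_100, True), (15_000, 15_100, False)]
print(bracket_deletion(calls))  # (4100, 13900)
```

Adding amplicons between the last retained and first deleted positions narrows the gap between the two bounds, which is how further delineation tightens these ranges.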
Most remarkable, however, was the identification of the very subtle deletion D, which encompassed only 1 amplicon. Using qPCR-CNC, deletion D could be mapped to a region of at least 6 kb and at most 12.5 kb. Subsequent long-range PCR and direct sequencing of the junction fragment allowed us to define its exact size (7358 bp) and location (chr3:140,431,841-140,439,199), 283 kb upstream of FOXL2 (). This deletion lies entirely within the previously described 126-kb SRO and thus defines a drastically reduced SRO. Furthermore, segregation analysis suggested a de novo occurrence of this small deletion, supporting its pathogenic potential. Despite its small size, the deletion is presumed to cause a classic BPES phenotype in a sporadic 7-year-old male patient.
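Once the junction fragment is sequenced, the size follows directly from the breakpoint coordinates. The sketch below parses the region string given in the text: the reported 7358 bp equals end minus start, whereas counting both listed endpoints as deleted (a 1-based, fully inclusive reading) gives one base more. Which reading applies depends on how the breakpoint positions were defined, which the text does not specify. `parse_region` and `region_length` are hypothetical helper names.

```python
import re


def parse_region(region: str):
    """Split a 'chrN:start-end' string (commas allowed) into its parts."""
    m = re.fullmatch(r"(chr\w+):([\d,]+)-([\d,]+)", region.strip())
    if m is None:
        raise ValueError(f"unrecognised region: {region!r}")
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if end < start:
        raise ValueError("end precedes start")
    return m.group(1), start, end


def region_length(region: str, inclusive: bool = False) -> int:
    """end - start, or end - start + 1 when both endpoints are counted."""
    _, start, end = parse_region(region)
    return end - start + (1 if inclusive else 0)


deletion_d = "chr3:140,431,841-140,439,199"
print(region_length(deletion_d))                  # 7358
print(region_length(deletion_d, inclusive=True))  # 7359
```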
The absence of recurrent breakpoint regions among all known and novel regulatory deletions argues against non-allelic homologous recombination (NAHR) as a possible mechanism underlying this subtle deletion. Other models, such as non-homologous end joining (NHEJ) or Fork Stalling and Template Switching (FoSTeS), might explain the formation of the deletion, although there is no scar at the junction. To unravel the mechanism responsible for this deletion, bioinformatic analyses of the breakpoint junctions were performed. A 70-bp ClustalW alignment of the junction sequence with the reference genomic sequences of both breakpoint regions did not reveal any significant homology, apart from some minor sequence similarity. Similarly, BLAST2 analysis of the 2-kb breakpoint regions revealed no significant similarity. RepeatMasker identified a 36-bp low-complexity region at the centromeric end of the deletion, but no other repetitive elements. At the telomeric end, it revealed a LINE2 repeat very close to the breakpoint and, further away, a 25-bp simple repeat and a 123-bp low-complexity region. Mreps and Palindrome excluded tandem repeats and palindromes within 300 bp of the breakpoints. In addition, the GC content of a 1-Mb region around the 7.4-kb deletion did not appear to be above average. Interestingly, in the 70-bp regions surrounding the breakpoints, DNA Pattern Finder identified three motifs that have been implicated in DNA rearrangements elsewhere: one of the immunoglobulin heavy chain class switch repeats (GGGCT), a deletion hotspot consensus site (TG[AG][AG][GT][AC]) and a DNA polymerase α pause site core sequence (GC[GC]). It cannot be excluded, however, that the occurrence of these motifs is coincidental. Manual inspection of the breakpoint regions and the junction fragment revealed a mirror repeat at the telomeric breakpoint. Such mirror repeats can form noncanonical, three-stranded structures referred to as H-DNA, one of the non-B DNA structures. H-DNA-forming sequences have previously been identified in regions prone to genomic rearrangements. Interestingly, the pentanucleotide motif present in this mirror repeat is also seen on the reverse strand at the centromeric end of the deletion (). We therefore hypothesize that a double-strand break (DSB) at the telomeric side triggered the deletion, followed by DSB repair guided by the formation of a knot loop involving the reverse complement of the pentanucleotide motif at the centromeric end ().
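The motif and repeat searches described above can be approximated in a few lines. This sketch is not DNA Pattern Finder, Mreps or Palindrome: it writes the three consensus motifs named in the text as regular expressions, scans both strands, and separates a mirror repeat (identical when read in either direction along the same strand, the symmetry associated with H-DNA) from a palindrome in the molecular sense (equal to its own reverse complement). The arm length and spacer limit are hypothetical parameters, and the test sequences are made up.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Consensus motifs named in the text, written as regular expressions.
MOTIFS = {
    "Ig heavy chain class switch repeat": "GGGCT",
    "deletion hotspot consensus": "TG[AG][AG][GT][AC]",
    "DNA polymerase alpha pause site core": "GC[GC]",
}


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motifs(seq: str, motifs=MOTIFS):
    """Return (name, strand, start, match) for every hit on both strands.

    Starts are 0-based forward-strand coordinates; the lookahead lets
    overlapping occurrences be reported.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = []
    for name, pattern in motifs.items():
        finder = re.compile(f"(?=({pattern}))")
        for m in finder.finditer(seq):
            hits.append((name, "+", m.start(), m.group(1)))
        for m in finder.finditer(rc):
            # Map the reverse-strand hit back to forward coordinates.
            start = n - m.start() - len(m.group(1))
            hits.append((name, "-", start, m.group(1)))
    return hits


def is_mirror_repeat(seq: str) -> bool:
    """Same sequence read 5'->3' and 3'->5' along one strand."""
    seq = seq.upper()
    return seq == seq[::-1]


def is_palindrome(seq: str) -> bool:
    """Inverted repeat: equal to its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


def find_mirror_repeats(seq: str, arm: int = 5, max_spacer: int = 3):
    """Windows whose two arms are mirror images around a short spacer.

    H-DNA additionally favours homopurine/homopyrimidine tracts; base
    composition is not checked here.
    """
    seq = seq.upper()
    found = []
    for spacer in range(max_spacer + 1):
        width = 2 * arm + spacer
        for i in range(len(seq) - width + 1):
            left = seq[i:i + arm]
            right = seq[i + arm + spacer:i + width]
            if left == right[::-1]:
                found.append((i, seq[i:i + width]))
    return found


print(is_mirror_repeat("GAAGTGAAG"), is_palindrome("GAATTC"))  # True True
print(find_mirror_repeats("TTAGGAGCATGAGGATT"))
```

With equal base frequencies, GC[GC] is expected about once every 32 positions per strand, so several chance hits are likely in any 70-bp window; this is consistent with the caution that such motif occurrences may be coincidental.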
Characterization of 7.4 kb deletion.
The drastically reduced SRO contains 8 of the 25 CNCs identified in the initial SRO. Moreover, 4 of these 8 are conserved up to chicken, adding weight to an assumed functional role (). According to several miRNA databases, the reduced SRO does not contain any miRNAs. We also investigated the regulatory potential of the deleted region using regulatory tracks. Based on currently available data, the region is devoid of CpG islands, transcription start sites, conserved transcription factor binding sites, miRNA regulatory sites, VISTA enhancers, regulatory elements from OregAnno, DNase I hypersensitivity sites and CTCF binding sites. While the new SRO contains no known human genes, it does contain 4 human ESTs. Three of these are unspliced ESTs from two testis cDNA libraries that share a common telomeric end position (). BLASTn analysis with EST AI204197 as the query sequence retrieved 51 hits, including significant alignments with Capra hircus PISRT1 mRNA and a partial Mus musculus Pisrt1 mRNA sequence. PISRT1 is one of the genes affected by the causal PIS deletion in goat. The PIS goat is the only known natural animal model for BPES; its absence of horns (polledness) and intersexuality are caused by a regulatory 11.7-kb deletion located 280 kb upstream of goat FOXL2. This deletion does not contain, but alters the transcription of, at least three genes: FOXL2, the non-protein-coding gene PISRT1, and PFOXic. Pailhoux et al. (2001) suggested that the PIS deletion harbours elements involved in long-range cis-regulation of goat FOXL2, as the expression of both FOXL2 and PISRT1 is affected by the deletion. This was further supported by our previous finding that the initial 126-kb SRO 5′ to FOXL2 contains the PIS locus. Here, the 7.4-kb deletion proved to contain the PISRT1 orthologue, but not the region orthologous to the PIS deletion (). This suggests that distinct cis-regulatory elements in the two species have similar effects when disrupted. Caprine PISRT1 encodes a 1.5-kb long non-coding RNA (ncRNA) that is highly expressed in adult testis. Starting from the known testis ESTs, which contain a polyadenylation site, 5′ RACE PCR identified a full-length cDNA of 758 bp (Figure S1). These findings confirm its expression in human testis. In addition, PISRT1 expression was undetectable in fibroblasts, but low-level expression was observed in KGN cells, indicating co-expression of PISRT1 with FOXL2 in adult ovarian granulosa cells. These findings are consistent with expression profiles in goat and mice. This restricted expression pattern is in line with a presumed regulatory function of PISRT1, which would require tissue- and cell-type-specific expression.
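Statements such as "8 of the 25 CNCs lie within the reduced SRO" reduce to interval containment. As a minimal sketch with hypothetical element names and coordinates (half-open intervals on one chromosome, not the real CNC positions), the function below reports elements fully contained in a region or, optionally, any element that merely overlaps it; the choice matters for elements that straddle a breakpoint.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def elements_in_region(elements: Dict[str, Interval], region: Interval,
                       fully_contained: bool = True) -> List[str]:
    """Names of elements inside `region`.

    With fully_contained=True an element must lie entirely within the
    region; otherwise any overlap of at least 1 bp counts.
    """
    r_start, r_end = region
    hits = []
    for name, (start, end) in elements.items():
        if fully_contained:
            inside = start >= r_start and end <= r_end
        else:
            inside = start < r_end and end > r_start
        if inside:
            hits.append(name)
    return sorted(hits)


# Hypothetical elements and region (not the real CNC coordinates).
cncs = {"cnc_a": (100, 250), "cnc_b": (900, 1_100), "cnc_c": (2_000, 2_150)}
reduced = (1_000, 2_200)
print(elements_in_region(cncs, reduced))                         # ['cnc_c']
print(elements_in_region(cncs, reduced, fully_contained=False))  # ['cnc_b', 'cnc_c']
```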
Apart from the copy number analyses, the remaining negative patients were specifically screened for sequence variants of CNCs within the initial SRO (). To date, only a few human phenotypes have been found to be associated with sequence variations within cis-regulatory elements. In this study, we identified 15 single nucleotide substitutions within CNCs or their flanking nucleotides, and a 4-bp deletion immediately upstream of CNC14 (Table S1). Only 3 nucleotide substitutions were found exclusively in BPES patients; however, parental DNA of these patients was not available for segregation analysis (Table S1). Moreover, computational transcription factor binding site (TFBS) prediction for the wild-type and variant CNCs did not support the creation or abolition of any TFBS.
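TFBS prediction of the kind mentioned above is typically done by scoring sequences against position weight matrices (PWMs). This sketch assumes a log2-odds PWM with a uniform background, scans both strands, and calls a site present when its relative score (0 for the worst possible window, 1 for the best) reaches a threshold. The 4-position matrix, the 0.8 threshold and the two sequences are hypothetical illustration values, not a real transcription factor model or the prediction tool used in this study.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2(probability / background)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def best_site(seq, matrix):
    """Highest-scoring window on either strand: (score, start, strand).

    `start` is 0-based on the strand that was scanned.
    """
    seq = seq.upper()
    width = len(matrix)
    best = None
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for i in range(len(s) - width + 1):
            window = s[i:i + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other symbols
            score = sum(col[b] for col, b in zip(matrix, window))
            if best is None or score > best[0]:
                best = (score, i, strand)
    return best


def relative_score(score, matrix):
    """Scale a score to [0, 1] between the worst and best possible windows."""
    lo = sum(min(col.values()) for col in matrix)
    hi = sum(max(col.values()) for col in matrix)
    return (score - lo) / (hi - lo)


def site_present(seq, matrix, threshold=0.8):
    hit = best_site(seq, matrix)
    return hit is not None and relative_score(hit[0], matrix) >= threshold


# Hypothetical 4-position matrix with consensus TGAC (not a real factor).
counts = [{b: (10 if b == c else 0) for b in BASES} for c in "TGAC"]
pwm = log_odds_matrix(counts)

wild_type, variant = "AAATGACAAA", "AAATGGCAAA"
print(site_present(wild_type, pwm), site_present(variant, pwm))  # True False
```

With this toy matrix the wild-type sequence holds a perfect match, whereas the variant keeps at most three of four consensus bases (relative score 0.75), so the site would be called abolished; a real analysis would repeat the comparison over a curated matrix collection.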
Although comparative sequence analysis has proven to be a powerful approach to identify regulatory elements, experimental studies are required to confirm their role in gene regulation. The ability to modulate expression from a linked minimal promoter in transient cell transfections is a widely used in vitro test of cis-regulatory potential. Thus, in vitro luciferase assays were conducted for 24 of the 25 CNCs identified in the original SRO (CNC19 could not be cloned). In both the KGN and 293T cell lines, 29% (7/24) of the tested CNCs showed a significant difference in luciferase activity compared with the basal activity of the vector alone (t-test, P < 0.05) (). Interestingly, cell-type-specific regulatory potential was observed among the tested constructs, three of which map within the 7.4-kb reduced SRO (CNC14, CNC5 and CNC15). This cell-type-specific regulatory activity suggests that at least some of the tested CNCs might be involved in the tissue-specific expression of FOXL2. We also addressed the putative functional impact of the identified nucleotide variants, but did not detect significant effects. However, a small quantitative and tissue-specific cis-regulatory effect of an individual CNC variant cannot be ruled out. These results suggest that, in our study, sequence variations within individual CNCs do not directly contribute to the molecular pathogenesis of BPES.
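The comparison above (construct versus vector alone, t-test, P < 0.05) can be sketched as follows. Assumptions not stated in the text: firefly activity is normalised to a co-transfected Renilla control, replicates are independent, and an unpaired two-sample Student t-test with pooled variance is used. To stay within the Python standard library, the two-sided P value is obtained by numerically integrating the t density; in practice a statistics library would normally be used. All readings are hypothetical.

```python
import math
from statistics import mean, variance


def normalised(firefly, renilla):
    """Firefly / Renilla ratio per replicate (assumed dual-luciferase design)."""
    return [f / r for f, r in zip(firefly, renilla)]


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided P value: 1 - 2 * integral of the density over [0, |t|].

    The integral uses Simpson's rule, so `steps` must be even.
    """
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def student_t_test(x, y):
    """Unpaired two-sample t-test with pooled variance: (t, df, p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    if pooled == 0:
        raise ValueError("zero variance in both groups")
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df, two_sided_p(t, df)


# Hypothetical triplicate readings for one CNC construct and the vector alone.
construct = normalised([2.0, 2.2, 1.8], [1.0, 1.0, 1.0])
vector = normalised([1.0, 1.1, 0.9], [1.0, 1.0, 1.0])
t, df, p = student_t_test(construct, vector)
print(f"fold = {mean(construct) / mean(vector):.2f}, t = {t:.2f}, df = {df}, P = {p:.4f}")
```

With three replicates per group the test has 4 degrees of freedom, and the two-sided critical value at P = 0.05 is about 2.776; the fold value expresses activity relative to the basal vector level.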
Regulatory activity of wild-type and variant CNCs in FOXL2 expressing and non-expressing cells.
As an additional experimental tool, 3C was performed across a 625-kb region flanking the FOXL2 gene. 3C can demonstrate physical interactions between regulatory elements and their target genes. In the FOXL2-expressing KGN cell line, EcoRI fragment 58, which contains the FOXL2 core promoter, was found in close proximity to EcoRI restriction fragments 109, 133 and 158, located 177, 283 and 360 kb upstream of FOXL2, respectively (Figure S2). Moreover, an identical but weaker interaction profile was detected in FOXL2-expressing fibroblasts from a normal individual (F2) (Figure S2). These data demonstrate that, in the nucleus of expressing cells, the promoter region of the FOXL2 gene interacts with three long-distance cis-regulatory elements.
In a second step, to validate mutual interactions between these three regulatory regions, 3C was performed in EBV, KGN and F2 cells using fragments 109, 133 and 158, respectively, as anchor fragments (). In expressing cells, all three distant sequences interacted with each other and with the FOXL2 core promoter, suggesting that the intervening DNA loops out. Interestingly, fragment 133 contains the 7.4-kb region that is removed by deletion D ( and ; ). To investigate the consequences of a heterozygous deletion of the interacting fragment 133 on the interaction profile of the FOXL2 locus, we analysed the mutual interactions of these three fragments in fibroblast cell line F1, obtained from a BPES patient carrying an upstream deletion that defined the initial SRO. Fibroblast cell line F2 served as a control. Interactions of the promoter with elements 109 and 158, which are not located within the deletion, were not reduced. This suggests that, even on the deleted chromosome, these elements can interact with the FOXL2 promoter despite the absence of fragment 133. Furthermore, fragments 109 and 158 appear to interact with each other even in the absence of fragment 133. Because the upstream deletion disrupts fragment 133 within the reduced SRO and causes a BPES phenotype, the retained interactions between fragments 109 and 158 and the FOXL2 core promoter might not be sufficient to correctly regulate FOXL2 transcription in the adult expressing cell system studied here. This further implies that the interactions between the cis-regulatory element(s) in fragment 133 and the FOXL2 core promoter are essential for correct regulation.
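3C signals are usually interpreted only after correcting for differences in primer-pair efficiency and template amount. A common scheme, assumed here because the text does not describe how the 3C products were quantified, divides each 3C signal by the signal of the same primer pair on a control template in which all ligation products are present at comparable amounts, and then scales to a reference interaction. Signals can be band intensities or linearised qPCR quantities; the pair labels and numbers below are hypothetical and do not represent data from this study.

```python
def interaction_frequencies(sample, control_template, reference_pair):
    """Correct each 3C signal for primer efficiency, then scale to a reference.

    sample, control_template: dict mapping a primer-pair label to a signal
    (band intensity or linearised qPCR quantity). The control template is
    assumed to contain all ligation products at comparable amounts.
    """
    corrected = {pair: sample[pair] / control_template[pair] for pair in sample}
    ref = corrected[reference_pair]
    return {pair: value / ref for pair, value in corrected.items()}


def fold_change(patient, control):
    """Per-pair ratio of normalised interaction frequencies (patient / control)."""
    return {pair: patient[pair] / control[pair] for pair in patient if pair in control}


# Hypothetical signals for a reference pair and two anchor-fragment pairs.
template = {"ref": 1.0, "anchor-x": 1.0, "anchor-y": 2.0}
control_cells = interaction_frequencies(
    {"ref": 2.0, "anchor-x": 1.0, "anchor-y": 3.0}, template, "ref")
patient_cells = interaction_frequencies(
    {"ref": 2.0, "anchor-x": 1.0, "anchor-y": 1.5}, template, "ref")
print(control_cells)                              # {'ref': 1.0, 'anchor-x': 0.5, 'anchor-y': 0.75}
print(fold_change(patient_cells, control_cells))  # {'ref': 1.0, 'anchor-x': 1.0, 'anchor-y': 0.5}
```

Normalising both cell lines to the same reference pair is what makes a per-pair comparison between a patient line and a control line meaningful under this scheme.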
3C analysis of FOXL2 region: mutual interactions between three regulatory sequences upstream of FOXL2.
General conclusions and perspectives
We identified a de novo, distant 7.4-kb deletion that is causally related to BPES. To our knowledge, this is the smallest fully characterized distant deletion implicated in a human genetic condition (Table S2). This deletion disrupts the long ncRNA PISRT1 and 8 CNCs, 4 of which are conserved up to chicken. Functional assays suggest a tissue-specific cis-regulatory potential for 3 of them. The biological relevance of these findings was corroborated by 3C studies of the normal and aberrant FOXL2 locus in expressing adult cellular systems, which demonstrated the close proximity of the 7.4-kb deleted fragment and two other conserved regions to the FOXL2 core promoter, and showed that an intact regulatory domain is necessary for correct FOXL2 expression.
Altogether, we identified and characterized a novel tissue-specific cis-regulatory domain controlling FOXL2 expression. Because we demonstrated the consequences of its disruption, our findings have implications for mutation screening of tightly regulated developmental genes and other disease genes. Specifically, our study emphasizes the need for high-resolution copy number screening of their cis-regulatory domains. Genome-wide tools such as oligonucleotide or SNP arrays and next-generation sequencing will play a prominent role in this. A well-selected patient population is another requirement, as illustrated here: (1) we only included patients with a diagnosis of BPES, a clinically distinguishable but rare disorder, and (2) all of them underwent a uniform pre-screening that excluded intragenic FOXL2 mutations and gene deletions.
Sequence variations within individual CNCs did not contribute to the molecular pathogenesis of BPES in our study. A possible explanation is that sequence changes within individual CNCs might result in a more subtle, different or even normal phenotype, as the cis-regulatory elements they represent might act in a tissue-specific and quantitative manner. The most striking example of such differential phenotypes is provided by point mutations in SHH and in its limb-specific enhancer, the ZRS, which result in holoprosencephaly type III (HPE3) (OMIM 142945) and preaxial polydactyly (PPD), respectively.
Other mechanisms may explain the phenotype in the remaining 53 molecularly undefined BPES patients. Although there is no clear evidence for locus heterogeneity in BPES, mutations in disease genes other than FOXL2 cannot be excluded in some of these unresolved cases. Another possibility is the occurrence of regulatory variants within the untranslated regions (UTRs) or the core promoter. To date, a number of non-pathogenic sequence variants have been reported in the putative FOXL2 core promoter and UTRs. However, a single-base-pair insertion in the FOXL2 3′UTR was found to co-segregate with BPES in a large Chinese type II BPES family, and was shown to be located in an AU-rich repeat. No functional studies were provided, however, to prove unequivocally a relationship between the insertion and the phenotype in this family. Interestingly, in the FOXP3 gene (NM_014009), which encodes another forkhead transcription factor, a presumed disease-causing sequence change was found in the poly(A) signal of the 3′UTR in affected members of a five-generation family with X-linked immune dysregulation, polyendocrinopathy and enteropathy (IPEX) (MIM 304790). The occurrence of pathogenic or modifying variants in 3′UTRs is in line with the important role of 3′UTRs in regulating gene expression at the pre-mRNA, mature mRNA and post-transcriptional levels, through cis-acting elements that interact with a variety of trans-acting factors. This role is highlighted by the many conserved sequence motifs in 3′UTRs, including microRNA (miRNA) target sites. It cannot be ruled out that altered miRNA targeting, and thus altered post-transcriptional regulation, may result in BPES. A unique example of a variant that alters gene expression levels by modifying miRNA targeting is a 3′UTR SNP in human SLITRK1 (NM_052910), which is implicated in Tourette syndrome (MIM 137580).
Finally, this study underscores the importance of an intact tissue-specific cis-regulatory domain in this and other developmental disorders. This has implications for mutation screening in developmental disease in particular, and in human genetic disease in general. In the future, online databases that collect information on copy number changes, such as DECIPHER and the Database of Genomic Variants, might help to interpret copy number changes affecting putative regulatory regions that might lead to disease.