Cockayne syndrome (CS) is a neurodevelopmental disorder most often caused by loss of functional CSB or CSA protein (OMIM #133540 and #216400, respectively). CSB is a SWI2/SNF2-like ATPase and chromatin remodeling protein that plays a key role in transcription-coupled nucleotide excision repair (TC-NER) of helix-distorting DNA lesions. When RNA polymerase II (RNAPII) stalls at a site of DNA damage, CSB is among the first proteins to bind, and it is required to recruit other NER factors, including CSA and the TFIIH complex, which contains the XPB and XPD helicases. CSB is also known to activate RNA polymerase I (RNAPI) transcription of ribosomal RNA and to induce changes in gene expression resembling those caused by chromatin remodeling and histone modification.
We recently discovered a domesticated PGBD3 (piggyBac transposable element-derived 3) transposon that inserted into intron 5 of the CSB gene at least 43 Mya in the common ancestor of marmoset and humans. As a result, primate CSB genes, including our own, now generate both full-length CSB (coding exons 2–21) and, by alternative splicing and polyadenylation, a CSB-PGBD3 fusion protein that joins the N-terminal domain of CSB (coding exons 2–5) to the intact PGBD3 transposase. CSB-PGBD3 is startlingly well conserved from marmoset to humans, whereas the four other identifiable copies of the PGBD3 transposon elsewhere in the human genome have all decayed into pseudogenes (PGBD3P1–4). The PGBD3 transposon contains a splice acceptor (3′ splice) site just upstream of the transposase ORF and a polyadenylation signal downstream of the ORF; together these allow alternative splicing of CSB exon 5 to the intact transposase without precluding continued expression of full-length CSB. In fact, the insertion of PGBD3 expanded the repertoire of the CSB locus from one protein to three: full-length CSB; the more abundant CSB-PGBD3 fusion protein; and, most abundant of all, the intact PGBD3 transposase, transcribed from a cryptic promoter near the 3′ end of CSB exon 5. Coexpression of the CSB-PGBD3 fusion protein with CSB initially suggested that the fusion protein might contribute to or modulate CS disease. However, CS-causing mutations are distributed across the entire length of the CSB gene (except within the PGBD3 transposon), and no consistent clinical differences have been observed between CS patients with CSB mutations in coding exons 2–5 (many of whom do not make the CSB-PGBD3 fusion protein) and patients with mutations in exons 6–21 (who continue to make it).
(Figure: The CSB-PGBD3 fusion protein is abundantly expressed by alternative splicing and polyadenylation of the CSB transcript.)
Unlike the function of the ATPase domain (CSB exons 6–21), the function of the N-terminal domain (coding exons 2–5), which is shared by CSB and the CSB-PGBD3 fusion protein, is not yet well understood. The only recognizable motif in exons 2–5 is a highly acidic stretch between E356 and E403 in which 25 of the 48 residues are aspartate or glutamate, but this domain does not appear to be essential for recovery of RNA synthesis following UV damage. Interestingly, the N-terminal domain autoinhibits the association of CSB with chromatin in both unirradiated and UV-irradiated cells, and ATP hydrolysis is required to relieve this inhibition. The isolated N-terminal domain has also been shown to interfere with transcription and repair: truncated CSB protein expressed in the patient-derived cell line CS1AN represses elongation by RNAPI, and the N-terminus of CSB interacts with topoisomerase I (Top1) to inhibit repair of Top1 adducts, both on its own and as part of the CSB-PGBD3 fusion protein.
We have recently shown that expression of the CSB-PGBD3 fusion protein in CSB-null UVSS1KO cells induces a strong transcriptional response dominated by an interferon-like innate antiviral immune program, which may be driven by upregulation of STAT1, STAT2, and IRF9, the three components of the heterotrimeric transcription factor ISGF3 (interferon-stimulated gene factor 3). As might be expected for a fusion protein conserved for more than 43 My, the interferon-like response induced by CSB-PGBD3 is dramatically repressed by coexpression of full-length CSB, and it is not induced by CSB alone. However, the mechanisms by which the CSB-PGBD3 fusion protein induces the interferon-like response, and by which CSB represses it, remain unclear.
The CSB-PGBD3 fusion protein may affect RNAPII transcription through both global and local mechanisms. Globally, CSB-PGBD3 may modulate CSB functions by interacting with complexes that normally contain functional CSB; this could explain how the fusion protein modulates DNA repair without inducing or repressing transcription of known DNA repair factors. Locally, CSB-PGBD3 may bind to dispersed DNA elements called MER85s and thereby regulate expression of nearby genes.
PGBD3, like many autonomous mobile elements, has given rise to a family of internally deleted, nonautonomous elements that can be mobilized by the PGBD3 transposase. These 140-bp MER85 elements retain about 100 bp from the 5′ end of PGBD3 and about 40 bp from the 3′ end, but have lost the transposase ORF along with the upstream splice acceptor site and the downstream poly(A) site. We have identified 889 MER85 elements dispersed throughout the human genome, most of which include the 13-bp terminal inverted repeats (TIRs) that the PGBD3 transposase requires for excision and reinsertion into TTAA target sites. We have also demonstrated that MER85 elements bind PGBD3 and CSB-PGBD3 in vitro. Thus, CSB-PGBD3 may enable MER85s to recruit the N-terminus of CSB to specific genomic loci, where it could affect local chromatin structure or recruit transcription and repair factors.
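As a purely illustrative aid, and not a description of how the 889 MER85 elements were identified, the two structural features named above (13-bp terminal inverted repeats and TTAA target sites) can be expressed as a minimal Python sketch. The 13-bp TIR length and the TTAA motif come from the text; the function names and the example TIR sequence are hypothetical, and the TIR check assumes a perfect inverted repeat, whereas real TIRs may tolerate mismatches.

```python
# Illustrative sketch only: the 13-bp TIR length and the TTAA target motif come
# from the text; every function name and sequence here is hypothetical.

# Translation table for base complementation (upper- and lowercase).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_perfect_tirs(element: str, tir_len: int = 13) -> bool:
    """True if the first tir_len bases equal the reverse complement of the last tir_len bases.

    Real TIRs may carry mismatches; this toy check demands an exact match.
    """
    if len(element) < 2 * tir_len:
        return False
    left = element[:tir_len].upper()
    right = element[-tir_len:].upper()
    return left == reverse_complement(right)


def ttaa_sites(seq: str) -> list[int]:
    """Return 0-based start positions of every TTAA site.

    TTAA is its own reverse complement, so scanning one strand finds
    the sites on both strands.
    """
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "TTAA"]


if __name__ == "__main__":
    # Hypothetical 13-bp TIR, not the actual PGBD3/MER85 sequence.
    tir = "CCCTAGAAAGATA"
    toy_element = tir + "G" * 20 + reverse_complement(tir)
    print(has_perfect_tirs(toy_element))  # True
    print(ttaa_sites("GGTTAACCTTAA"))     # [2, 8]
```

Because TTAA is palindromic, a single-strand scan is sufficient here; the TIR check, by contrast, must compare one end against the reverse complement of the other.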
We wish to understand why the CSB-PGBD3 fusion protein is so well conserved, and to determine what roles it may play in health and in CS. Here, we explore the connection between the genome-wide DNA binding profile of CSB-PGBD3 and transcriptional regulation in UVSS1KO cells. As expected, we find that CSB-PGBD3 binds directly in vivo to many MER85 elements throughout the genome. Surprisingly, we also find that CSB-PGBD3 binds indirectly to TRE motifs (TPA response elements, where TPA is the tumor promoter 12-O-tetradecanoylphorbol-13-acetate) recognized by activator protein-1 (AP-1) family transcription factors, as well as to motifs for the TEAD1 (TEA domain family member 1) and CTCF (CCCTC-binding factor) transcription factors. We show that CSB-PGBD3 physically interacts with the AP-1 protein c-Jun, and that genes upregulated by CSB-PGBD3 correlate with binding of CSB-PGBD3 to nearby TRE motifs but not to MER85 elements. We also show that CSB-PGBD3 interacts with RNAPII, and that the interactions with both RNAPII and c-Jun are mediated primarily by the N-terminal CSB domain of CSB-PGBD3. Thus, despite the ability of the CSB-PGBD3 fusion protein to bind specifically to MER85s both in vitro and in vivo, this binding does not appear to have widespread transcriptional consequences. In contrast, binding of the CSB-PGBD3 fusion protein to TRE motifs, through protein-protein interactions with c-Jun and possibly other AP-1 family members, is associated with genes involved in angiogenesis, innate immunity, and the Smad2/3 and TGF-β pathways, indicating that the CSB-PGBD3 fusion protein modulates a preexisting AP-1-based regulatory network. Whether these regulatory effects were responsible for the initial fixation of the CSB-PGBD3 fusion protein in the common ancestor of humans and marmoset at least 43 Mya, or instead evolved later, remains to be seen.