SOX5 encodes a transcription factor involved in the regulation of chondrogenesis and the development of the nervous system. Despite its important developmental roles, SOX5 disruption has yet to be associated with human disease. We report one individual with a reciprocal translocation breakpoint within SOX5, eight individuals with intragenic SOX5 deletions (four are apparently de novo and one inherited from an affected parent), and seven individuals with larger 12p12 deletions encompassing SOX5. Common features in these subjects include prominent speech delay, intellectual disability, behavior abnormalities, and dysmorphic features. The phenotypic impact of the deletions may depend on the location of the deletion and consequently which of the three major SOX5 protein isoforms are affected. One intragenic deletion involving only untranslated exons was present in a more mildly affected subject, was inherited from a healthy parent and grandparent, and is similar to a deletion found in a control cohort. Therefore, some intragenic SOX5 deletions may have minimal phenotypic effect. Based on the location of the deletions in the subjects compared to the controls, the de novo nature of most of these deletions, and the phenotypic similarities among cases, SOX5 appears to be a dosage-sensitive, developmentally important gene.
Molecular cytogenetic techniques, such as array comparative genomic hybridization (aCGH), have precipitated a change in diagnostic emphasis from phenotype to genotype. Traditionally, identification of genetic causes of a syndrome first required ascertainment of multiple patients with similar phenotypes followed by a search for the underlying genetic cause. In contrast, techniques such as aCGH allow for identification of patients with similar genotypes followed by characterization of the associated phenotype. This genotype-first approach (Shaffer et al., 2007) is the only way to appreciate how similar genetic changes can lead to a phenotypic spectrum that may include nonspecific features, be variably expressed, and have overlapping features that may be found in other syndromes.
To increase the likelihood of identifying previously uncharacterized copy-number imbalances that may be causing a phenotypic spectrum of nonspecific neurodevelopmental features, we constructed whole-genome microarrays with enhanced coverage of over 500 functionally significant genes including transcription factors and other developmentally important genes. This has facilitated identification of intragenic, disease-causing deletions (Rosenfeld et al., 2009a; Rosenfeld et al., 2009b; Talkowski et al., 2011b). Another of the targeted genes, located on 12p12.1, SOX5 (SRY-box 5; OMIM 604975), has had multiple, small, apparently de novo deletions identified in patients referred for clinical aCGH testing.
SOX5, SOX6, and SOX13 encode members of the SOXD family of transcription factors. SOXD proteins play a role in multiple developmental pathways, including cartilage formation (Aza-Carmona et al., 2011; Lefebvre et al., 1998) and nervous system development (Kwan et al., 2008; Lai et al., 2008; Lefebvre, 2010). There are three major SOX5 transcription products, two long forms (NM_006940.4 and NM_152989.2) that code for proteins similar in size to those coded for by SOX6 and SOX13 (NP_008871.3 and NP_694534.1, respectively) and a unique short form (NM_178010.1, encoding NP_821078.1; Figure 1) (Kiselak et al., 2010). In humans, the long forms are highly expressed in chondrocytes and striated muscles (Ikeda et al., 2002) and have been seen in the fetal brain (Wunderle et al., 1996), while the short form is expressed mainly in the testes (Wunderle et al., 1996). Mouse studies have shown the long and short forms to be expressed in the brain (Kiselak et al., 2010). Mouse models support a role for long Sox5 and Sox6 in chondrogenesis (Smits et al., 2001) and in the development of neocortical projection neurons (Kwan et al., 2008; Lai et al., 2008). Homozygous loss of long Sox5 (through deletion of a coding exon specific to the long transcripts) leads to respiratory distress causing death at birth, apparently due to cleft palate and a small thoracic cage. Complete knockout of Sox6 is frequently lethal at birth; a short sternum is the only apparent skeletal defect observed, although severe dwarfism develops postnatally. Inactivation of both genes is lethal 3 days before birth, with restricted skeletal growth and ossification (Dy et al., 2008; Smits et al., 2001).
The short Sox5 protein also functions as a transcription factor that drives testis-specific gene expression (Blaise et al., 1999; Budde et al., 2002; Kiselak et al., 2010; Xu et al., 2009) and likely plays a major role in the formation and function of motile cilia in brain, lung, testis, and sperm (Kiselak et al., 2010).
Each of the long transcripts is associated with a separate promoter region and transcription start site (TSS), supported by the presence of H3K4Me3 histone modifications, a mark commonly associated with the promoter regions of actively transcribed genes (Bernstein, 2002; Ng et al., 2003; Pokholok et al., 2005; Santos-Rosa et al., 2002; Schneider et al., 2004; Schubeler et al., 2004) (Figure 1). The two long isoforms have slightly different translation start sites, resulting in the inclusion of 13 additional amino acids at the N-terminus of NP_008871.3 (the protein product encoded by NM_006940.4; Figure 1C). The short transcript includes only 7 exons from the 3′ end of the gene, encoding a smaller protein containing the high mobility group (HMG) domain, which is involved in DNA binding and some interaction with other proteins (Aza-Carmona et al., 2011), and only one of the two coiled-coil domains found in the larger protein, which allow for the homo- and heterodimerization necessary for the dimer to bind to some paired DNA binding sites (Figure 1). The TSS for the short transcript does not have a consensus H3K4Me3 mark (Birney et al., 2007), which may be due to its more restricted expression (Kiselak et al., 2010).
The role of SoxD genes in many developmental pathways is well established in mouse and suggests that alterations of SOXD genes in humans could impact human disease (Lefebvre, 2010); however, no genetic studies to date have established such a link. Therefore, to understand how SOX5 alterations may contribute to disease, we analyzed the chromosomal abnormalities and phenotypes in a series of 16 subjects with structural variations disrupting SOX5, including an individual with autism that we previously reported with a small, apparently de novo, intragenic SOX5 deletion (Rosenfeld et al., 2010).
Subjects were identified after referral for clinical molecular cytogenetic testing, either to Signature Genomic Laboratories, Seattle Children’s Hospital, Pittsburgh Cytogenetic Laboratories, Nantes University Hospital, or Hôpital Jean Verdier, or through enrollment in the Developmental Genome Anatomy Project (DGAP). Informed consent was obtained to publish the subject photographs shown here, according to protocols approved by IRB-Spokane.
Oligonucleotide-based aCGH was performed on subjects 2, 3, 6, 11, 14, and subject 6’s mother using a 105K-feature whole-genome microarray [SignatureChip Oligo Solution (OS) version 1, custom-designed by Signature Genomics; manufactured by Agilent Technologies, Santa Clara, CA] as previously described (Ballif et al., 2008). Oligonucleotide-based aCGH was performed on subjects 1, 8, 9, 10, 12, 13, and subject 9’s father using a 135K-feature whole-genome microarray (SignatureChipOS version 2, custom-designed by Signature Genomics; manufactured by Roche NimbleGen, Madison, WI) as previously described (Duker et al., 2010). DNA from subject 4 was analyzed using Illumina HumanHap 300 single nucleotide polymorphism (SNP) microarray (Illumina, San Diego, CA); DNA from subject 7 was analyzed using an Agilent oligonucleotide-based 105K whole-genome microarray (SignatureSelect OS version 1.0); DNA from subject 15 was analyzed using an Agilent oligonucleotide-based 180K-feature whole-genome microarray; DNA from subject 16 was analyzed using a RocheNimbleGen oligonucleotide-based 135K-feature whole-genome CGX microarray, all according to manufacturers’ instructions.
Metaphase FISH analysis was performed using a BAC clone from the abnormal region as determined by aCGH to visualize the abnormalities as previously described (Traylor et al., 2009). When available, parental samples were also assayed for the abnormal region detected by aCGH in the proband, using FISH.
Subject 5 (DGAP189) was sequenced using a custom large-insert jumping library for Illumina sequencing as previously described (Talkowski et al., 2011a). In brief, 20 μg of genomic DNA from subject 5 was sheared to ~3.5-kb fragments that were size-selected, end-repaired, and ligated to cap adaptors containing an EcoP15I restriction site and a GT overhang. Fragments were circularized with an oligonucleotide containing an AC overhang, a subject-specific bar code, and a single biotinylated thymine at the circularization junction. Circularized fragments were restriction-digested, and fragments containing the biotinylated base were captured onto streptavidin beads and purified before Illumina paired-end adaptors were ligated. The sample was run on a single lane of a HiSeq 2000 (Illumina), using paired-end 25-bp sequencing. Reads were aligned with the Burrows-Wheeler Alignment Tool (Li and Durbin, 2009) then processed with SAMtools (Li et al., 2009) and BamStat, a customized program designed to isolate anomalous read pairs indicating a chromosomal rearrangement (Talkowski et al., 2011a).
The proximal and distal breakpoint interval sequences were compared using BLAST sequence similarity (Altschul et al., 1990), and all sequence alignments were manually visualized for stretches of high sequence identity. Analysis for repeats within these breakpoint intervals was then performed using RepeatMasker (http://www.repeatmasker.org).
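The idea of isolating anomalous read pairs can be illustrated with a minimal Python sketch. It does not reproduce BamStat, whose internals are not described here. The `ReadPair` record, the example coordinates, and the 5,000-bp spacing threshold are all hypothetical and chosen only for illustration. A pair is flagged when its two ends map to different chromosomes, or map to the same chromosome farther apart than the assumed threshold.

```python
from typing import List, NamedTuple


class ReadPair(NamedTuple):
    """Simplified aligned read pair (hypothetical record, not a SAM/BAM parser)."""
    name: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_anomalous(pair: ReadPair, max_insert: int = 5000) -> bool:
    """Return True if the pair is inconsistent with a single contiguous fragment.

    max_insert is an assumed cutoff, set loosely above the ~3.5-kb fragment size.
    """
    if pair.chrom1 != pair.chrom2:
        # Ends on different chromosomes: candidate translocation support.
        return True
    # Same chromosome: flag pairs whose ends lie farther apart than expected.
    return abs(pair.pos2 - pair.pos1) > max_insert


def anomalous_pairs(pairs: List[ReadPair], max_insert: int = 5000) -> List[ReadPair]:
    """Keep only the pairs flagged as anomalous."""
    return [p for p in pairs if is_anomalous(p, max_insert)]


# Invented example records; positions are illustrative only.
pairs = [
    ReadPair("r1", "chr12", 23_600_000, "chr12", 23_603_400),  # expected spacing
    ReadPair("r2", "chr12", 23_602_000, "chr11", 35_034_500),  # interchromosomal
    ReadPair("r3", "chr11", 35_030_000, "chr11", 35_033_200),  # expected spacing
]
flagged = anomalous_pairs(pairs)
print([p.name for p in flagged])
```

In practice, clusters of such pairs that agree on both chromosomes and on approximate position would be needed before calling a breakpoint; a single flagged pair is not evidence of a rearrangement.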
We identified eight subjects with heterozygous deletions that only involved SOX5 and ranged in size from 72 kb to 466 kb. Most deletions involved at least some of the coding exons and/or a region likely to be involved in transcription initiation, while subject 9’s deletion only involved two of the 5′ untranslated exons. Additionally, in subject 5 (DGAP189), with an apparently balanced de novo translocation [46,XX,t(11;12)(p13;p12.1)dn], we identified a translocation breakpoint within SOX5; sequencing revealed a 9-bp deletion at the breakpoint in intron 11 (chr12:23,602,450–23,602,458) and a 16-bp deletion at the breakpoint in 11p13 (chr11:35,033,903–35,033,918). No genes were present within 50 kb on either side of the 11p13 breakpoint. Depending on the abnormality size and location, the translocation or deletions are predicted to impact different protein isoforms to varying degrees (Figure 1, Tables 1 and 2). However, it is not always known which protein isoforms will be altered by the deletions. The deletions in subjects 7 and 8 remove the transcription initiation site of NM_006940.4, so while this likely prevents expression of that isoform, it is not known whether the other long transcript is affected. The deletion of two untranslated exons in subject 9 would alter the 5′ untranslated region of NM_152989.2, though it is uncertain if this ultimately affects gene expression or protein translation.
We identified seven additional subjects with 12p deletions encompassing multiple genes including SOX5, ranging from 1.4 Mb to 12.1 Mb and including 8 to 63 genes (Figure 2, Table 3). Two of these deletions have breakpoints within SOX5, one (in subject 15) between coding exons 3 and 4 of NM_006940.4 and extending 5′ and the other (in subject 16) between untranslated exons 3 and 4 of NM_152989.2 and extending 5′ (Figure 1). Therefore, while subject 15’s deletion is predicted to impact both long isoforms of the gene, subject 16’s deletion may only impact NM_152989.2. However, it should be noted for both of these deletions that it is not known what effect, if any, deletion of the promoter region 5′ of the untranslated exons has on expression of the shorter transcripts (Table 1).
No other clinically significant gains or losses of DNA were identified in any of the 16 subjects.
Two subjects (12 and 13) had apparently identical 12p12.3p11.23 deletions. Query of Signature’s database of abnormalities revealed two additional cases carrying this apparently identical deletion, one referred for developmental delay (DD) and microcephaly and the other referred for pituitary dwarfism, lack of coordination, pervasive developmental delay (PDD), attention deficit-hyperactivity disorder (ADHD), and optic nerve abnormality. No additional follow-up clinical information was available. The similarity in the breakpoints of these alterations suggests that underlying genomic architecture may play a role in mediating these recurrent deletions. The aCGH results refined the intervals containing the distal and proximal breakpoints to approximately 30 kb (chr12:17,755,660–17,785,732) and 49 kb (chr12:26,583,349–26,632,432), respectively. A search for repeats within these breakpoint intervals using RepeatMasker in the reference sequence (Build 36, hg18) showed enrichment for long and short interspersed repeats (Supp. Figure S1). Specifically, L1MA4 repetitive elements with high sequence identity are present within the breakpoint intervals (Supp. Figure S2), which may be mediating recurrent deletions via non-allelic homologous recombination (NAHR), as has been proposed for long stretches of highly homologous sequences such as LINEs and Alus (Deininger and Batzer, 1999; Han et al., 2008; Shaw and Lupski, 2004).
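The lengths of the two breakpoint intervals follow directly from the hg18 coordinates quoted above. The short Python sketch below only performs that subtraction; the helper name `span_bp` is an illustrative choice, and the span is taken as end minus start (counting both end positions inclusively would add 1 bp, which does not change the rounded values).

```python
def span_bp(start: int, end: int) -> int:
    """Length in base pairs between two coordinates on the same chromosome."""
    return end - start


# Breakpoint intervals as quoted in the text (Build 36, hg18).
intervals = {
    "chr12:17,755,660-17,785,732": (17_755_660, 17_785_732),
    "chr12:26,583,349-26,632,432": (26_583_349, 26_632_432),
}

# Round each span to the nearest kilobase for comparison with the stated sizes.
sizes_kb = {
    label: round(span_bp(start, end) / 1000)
    for label, (start, end) in intervals.items()
}

for label, kb in sizes_kb.items():
    print(f"{label}: ~{kb} kb")
```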
FISH using BAC probes to the deleted region confirmed the deletion in all subjects, including diminished signals for the smallest deletions in which the BAC probes used in FISH are larger than the deletion intervals. Parental FISH testing indicated the deletions in subjects 1–4 and 15 were apparently de novo in origin; additionally, subject 12’s mother did not have the deletion, while her father was unavailable for testing. Three deletions were inherited. Two of these segregated with a developmental phenotype in the family: one was inherited from a more severely affected mother and was also present in an affected sister (subject 6), and one was inherited from an affected father (subject 16). The third was inherited from an apparently normal father (subject 9) (Tables 2 and 3). For the parents of subjects 6 and 9, aCGH confirmed that the deletions were identical in parents and children. Additionally, FISH revealed that the healthy paternal grandmother of subject 9 also carried the deletion. All other parents were unavailable for testing.
Clinical information is presented for subjects 1–9 in Table 2 and for subjects 10–16 in Table 3. Major features for the nine subjects with abnormalities limited to SOX5 include developmental delay/intellectual disability (DD/ID) (9/9), speech delay (8/9), behavior problems (5/9), strabismus (6/9), mild dysmorphic appearance (6/9), brain anomalies (2/5), seizures (2/9), and genital anomalies (2/9) (Table 4). Behavioral aspects include aggressive behavior in subjects 2–4, self-injurious behavior in subject 1, and ADHD in subject 7. Subjects 2 and 4 demonstrated stereotypies but were not formally assessed for autism, while subject 1 received a diagnosis of PDD from his therapists and primary care physician, and subject 3 had a diagnosis of PDD and atypical autism through the TEACCH program (Mesibov and Shea, 2010), which uses assessment batteries including the Childhood Autism Rating Scale (CARS-2) and Psychoeducational Profile – Third Edition (PEP-3). Some minor dysmorphic features were noted in all but subject 2, with the only common feature of frontal bossing seen in 4/9 subjects (Figure 3). Skeletal system involvement was noted as butterfly vertebrae in one and scoliosis in two subjects.
Major features for the seven subjects with larger, SOX5-encompassing deletions include: DD/ID (7/7), speech delay (5/5), behavior problems (5/5), dysmorphic features (6/7; Figure 3), clinodactyly/deviated fingers or toes (4/7), skeletal anomalies (4/7), and brain malformations (4/5). No aggressive behavior was noted in this group (Tables 3 and 4).
Among 24,081 probands tested with oligonucleotide-based microarrays at Signature Genomics between February 2008 and April 2011, seven deletions within SOX5 were identified; excluding the deletions in subject 8, which may or may not involve a coding exon, due to gaps in probe coverage, and subject 9, which only includes untranslated exons, five of these are known to include coding exons. Additionally, one deletion immediately 5′ of exon 1 was identified in a parental sample (chr12:24,001,784–24,041,797); this healthy parent’s affected child did not carry this deletion. Ten additional larger deletions, involving all or part of SOX5 and additional genes, were identified during this time period. In comparison, in one series of 8,329 control subjects studied on high-resolution Illumina genome-wide SNP arrays (mostly with >550,000 probes) with denser coverage of SOX5 than our arrays (Cooper et al., 2011), 62 deletions were identified in SOX5. Most were intronic, one involved an untranslated exon, and three involved coding exons (Figure 1). Unlike the coding exon deletions in cases, these control deletions may still allow the production of functional long SOX5 isoforms. No whole-gene deletions were detected. Unfortunately, comparison of deletion frequency in cases to controls is complicated by incomplete knowledge of how the deletions affect expression of the various SOX5 isoforms.
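For orientation only, the raw counts quoted above can be converted to rates per 10,000 individuals, as in the Python sketch below. This is plain arithmetic on the stated numbers, not a statistical test and not an analysis performed in this study. The helper name `per_10k` is an illustrative choice. Because the two cohorts were assayed on platforms with different probe coverage of SOX5, and because the functional effect of each deletion differs, the rates are not directly comparable.

```python
def per_10k(count: int, total: int) -> float:
    """Raw rate per 10,000 individuals."""
    return 10_000 * count / total


# Counts taken from the text above.
CASE_PROBANDS = 24_081      # probands tested, February 2008 to April 2011
CONTROL_SUBJECTS = 8_329    # control series (Cooper et al., 2011)

# Intragenic deletions known to include coding exons.
cases_coding = per_10k(5, CASE_PROBANDS)
controls_coding = per_10k(3, CONTROL_SUBJECTS)

# All deletions reported within SOX5, regardless of exon content.
cases_all = per_10k(7, CASE_PROBANDS)
controls_all = per_10k(62, CONTROL_SUBJECTS)

print(f"coding-exon deletions per 10,000: cases {cases_coding:.2f}, controls {controls_coding:.2f}")
print(f"all SOX5 deletions per 10,000:    cases {cases_all:.2f}, controls {controls_all:.2f}")
```

The much larger number of control deletions overall is dominated by intronic events detected on the denser control arrays, which is one reason a direct frequency comparison is not straightforward.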
SOXD genes—SOX5, SOX6, and SOX13—encode transcription factors that play important roles in the development of many systems and processes such as cell proliferation, differentiation, terminal maturation, and survival. Although no known association of these genes with human disease has been previously noted, it has been hypothesized that such associations will be observed due to the critical role of these genes in a large number of pathways (Lefebvre, 2010). Furthermore, predictive modeling shows SOX5 and SOX6 as being likely haploinsufficient (Huang et al., 2010). A survey of aCGH results among patients referred for clinical testing in our laboratories shows multiple cases with deletions affecting SOX5, including apparently de novo intragenic deletions. Interestingly, almost no cases of deletions or small duplications involving the coding regions of SOX6 or SOX13 have been observed in our patient populations, with only one exception of an intragenic, apparently de novo duplication in SOX6 in a male referred for DD, autism spectrum disorder, and morbid obesity. This difference in the number of copy number variants affecting these genes may be due to critical developmental functions performed by SOX6 and/or SOX13 that cannot be compensated for by SOX5, or it may reflect a greater susceptibility of the SOX5 locus to rearrangement. Our analysis of SOX5 abnormalities suggests that haploinsufficiency of this gene results in speech delays, behavioral problems, and minor dysmorphic features.
SOX5 encodes three major transcription products, and the phenotypic consequences of intragenic deletions may depend upon the protein isoforms affected. Subjects 1–4 have de novo deletions that are predicted to result in loss of the primary DNA-binding domain and lead to haploinsufficiency of all three protein isoforms. The de novo translocation in subject 5 would lead to expression of truncated versions of all three protein isoforms that lack the primary DNA-binding domain. The deletions in subject 6 and his affected mother and sister may prevent expression of functional long forms of SOX5 through two mechanisms: either by inducing a translational frameshift by removing exons 4–6 or by losing a putative coiled-coil domain partially encoded by these same exons and presumably critical to homo- and heterodimerization potential. This frameshift would not be predicted if exon 3 is also included in the deletion; however, loss of the coiled-coil domain would still be expected. Deletions in subjects 7 and 8 may eliminate proximal regulatory elements of NM_006940.4, thereby preventing effective transcription initiation. Potential effects, if any, of this deletion on the expression of the other isoforms cannot be predicted without further characterization of the regulation of SOX5 transcription (Figure 1, Table 1). Subjects 1–8 all demonstrate DD, with greatest delay in speech. Additionally, subjects 1–4 and 8 also demonstrate behavior problems, including a diagnosis of PDD in subjects 1 and 3; behavior problems were not noted in subjects 5 or 6. This may be due to variable expressivity, as subjects 5 and 6 are predicted to have altered expression of different isoforms (Table 1).
To help interpret the clinical significance of CNVs in our patient population, comparisons need to be made to rearrangements in the gene observed in a control population (Cooper et al., 2011; Girirajan and Eichler, 2010; Sharp, 2009). A deletion similar to those observed in subjects 7 and 8 was detected in one control sample (Figure 1) and in a healthy parent in our clinical aCGH testing population. The deletion in the control sample retained approximately 3 kb of sequence upstream of exon 1, whereas the deletion in subject 7 removed this 3-kb region as well as the first exon, and, due to gaps between probes, it is unknown if this region is deleted in subject 8 and the parent. The presence of the sequence upstream of exon 1 may allow for normal gene expression to continue, although SOX5 expression levels were not assayed in these subjects. Interestingly, a deletion affecting both exon 9 of NM_006940.4 and the TSS of the short form was observed in one control individual, and two additional control individuals had deletion of the TSS of the short form. While deletion of exon 9 would not be predicted to cause a frameshift in the larger protein products, deletion of the TSS of the short form would be expected to cause reduced expression of this transcript. This may suggest that haploinsufficiency for the short form alone is not generally detrimental to normal phenotypic development. Because the short and long proteins each have distinct tissue-specific expression patterns and contain different functional domains (Kiselak et al., 2010), it is unlikely that short and long forms can completely compensate for each other’s functions, and it remains possible that adding haploinsufficiency of the short form to haploinsufficiency of the long forms can further impact phenotypic expression.
Unlike the deletions in subjects 1–4 and 6, subject 9’s deletion was inherited from a phenotypically normal father and grandmother. The deletion removes two untranslated exons, and a very similar deletion affecting the fourth untranslated exon was observed in one control individual. Deletions within the 5′ untranslated region may not affect expression of the gene or may only affect one of the long forms, leaving the other long form intact. Subject 9’s phenotype is milder than seen in our other subjects; the child does not have delays in language. Therefore, it is possible that her phenotype may not be caused by the deletion in SOX5, or that it may be attributed to reduced penetrance or variable expression.
We attempted to determine if the phenotypic observations in subjects 1–9 were also seen in subjects with larger deletions containing SOX5. Subjects 10–14 had whole-gene deletions (Figure 2). Subjects 15–16 had deletions of the 5′ end of the gene that remove a transcription start site and the control region and, therefore, should result in haploinsufficiency of at least one of the long forms (Figure 1, Table 1). It is not known if this would affect expression of all products. All of the subjects older than one year with large deletions including SOX5 have speech delay, consistent with the effects of SOX5 haploinsufficiency observed in subjects 1–8. Additionally, all subjects older than one year demonstrate some type of abnormal behaviors, including subject 15 and the father of subject 16, who was described to be Asperger-like. This is similar to what was observed in subjects 1–5, where behavioral problems were seen in most subjects with haploinsufficiency for the long and short isoforms of SOX5 (4/5).
Additional genes within these large deletions may be contributing to these subjects’ phenotypes. Consistent with this hypothesis, we observed that the subjects with larger deletions that include 30–63 genes tend to show more dysmorphic features and have more musculoskeletal anomalies (Tables 3 and 4). In the literature, common features reported among individuals with 12p12 deletions include DD/ID, short stature, microcephaly, brachydactyly, clinodactyly, and dysmorphic features including low-set ears, broad nasal bridge, and microretrognathia (Bahring et al., 1997; Boilly-Dartigalongue et al., 1985; Fryns et al., 1990; Glaser et al., 2003; Lu et al., 2009; Magenis et al., 1981; Magnelli and Therman, 1975; Malpuech et al., 1975; Mayeda et al., 1974; Nagai et al., 1995; Orye and Craen, 1975; Stumm et al., 2007; Tenconi et al., 1975) (Table 4). Behavior problems have only been described in one 13-month-old male with poor psychosocial contact (Orye and Craen, 1975), although a majority of these cases were identified through traditional cytogenetic techniques, and the inclusion of SOX5 in the deleted intervals is uncertain. The brachydactyly observed in these individuals is type E, with shortening of the metacarpals and metatarsals, and, along with the short stature and oligodontia seen in some of these individuals, may be due to the deletion of PTHLH within 12p11.22 (Klopocki et al., 2010). In our series, subjects 14 and 16 are deleted for this gene; subject 14 demonstrates brachydactyly, and subject 16 has short stature. There is also an autosomal-dominant hypertension with brachydactyly syndrome (OMIM 112410) due to an inversion of a minimum ~450-kb segment immediately distal to SOX5 and containing no known protein-coding genes but containing putative microRNA-coding gene(s) that show altered splicing in inversion carriers (Bahring et al., 2008). Expression of SOX5 is not altered in these individuals (Bahring et al., 2004).
This suggests a gain-of-function mechanism for this disease, and, consistent with that, the individuals in our cohort with deletions of SOX5 and not PTHLH do not demonstrate brachydactyly. However, in our cohort we do not have information on hypertension, which has been described in an individual with a deletion of 12p11.22 (Bahring et al., 1997).
In summary, deletions within SOX5 result in prominent speech delay and frequently in behavior problems. Larger deletions that include all of SOX5 or that remove the 5′ regulatory region, which may or may not alter expression of all protein isoforms, also show language delay, behavioral problems, and more dysmorphic features. These findings support the role of SOX5 in human neurodevelopment. Complete haploinsufficiency of SOX5, with roles in chondrogenesis, may only occasionally result in skeletal abnormalities, such as the butterfly vertebrae and scoliosis in some of our subjects with deletions. Haploinsufficiency of SOX5 may be compensated for by SOX6 so that the resulting phenotype is milder than may have been hypothesized for the developmentally important SOXD family of genes (Lefebvre, 2010). Further research into how various SOX5 deletions impact the function of the SOX5 protein isoforms and identification of additional individuals with SOX5 abnormalities will be helpful in understanding further how loss of this developmentally important gene contributes to neurodevelopmental disease.
The authors thank the patients and families who contributed clinical information for this study; Erin Dodge (Signature Genomic Laboratories) for her critical editing and preparation of the manuscript; and Beth Torchia, Carrie Hanscom, and Shahrin Ahsan for their technical assistance. This work is supported in part by NIH grants GM061354 (DGAP; C.C.M), HD065286 (J.F.G.), and F32MH087123 (M.E.T.).
J.A.R., N.J.N., R.A.S., B.C.B., and L.G.S. disclose the following possible conflict of interest: they are employees of Signature Genomic Laboratories, PerkinElmer, Inc.
All other authors have no conflict of interest to report.