|Home | About | Journals | Submit | Contact Us | Français|
Mutations in the Otopetrin 1 gene (Otop1) in mice and fish produce an unusual bilateral vestibular pathology that involves the absence of otoconia without hearing impairment. The encoded protein, Otop1, is the only functionally characterized member of the Otopetrin Domain Protein (ODP) family; the extended sequence and structural preservation of ODP proteins in metazoans suggest a conserved functional role. Here, we use the tools of sequence- and cytogenetic-based comparative genomics to study the Otop1 and the Otop2-Otop3 genes and to establish their genomic context in 25 vertebrates. We extend our evolutionary study to include the gene mutated in Usher syndrome (USH) subtype 1G (Ush1g), both because of the head-to-tail clustering of Ush1g with Otop2 and because Otop1 and Ush1g mutations result in inner ear phenotypes.
We established that OTOP1 is the boundary gene of an inversion polymorphism on human chromosome 4p16 that originated in the common human-chimpanzee lineage more than 6 million years ago. Other lineage-specific evolutionary events included a three-fold expansion of the Otop genes in Xenopus tropicalis and of Ush1g in teleostei fish. The tight physical linkage between Otop2 and Ush1g is conserved in all vertebrates. To further understand the functional organization of the Ushg1-Otop2 locus, we deduced a putative map of binding sites for CCCTC-binding factor (CTCF), a mammalian insulator transcription factor, from genome-wide chromatin immunoprecipitation-sequencing (ChIP-seq) data in mouse and human embryonic stem (ES) cells combined with detection of CTCF-binding motifs.
The results presented here clarify the evolutionary history of the vertebrate Otop and Ush1g families, and establish a framework for studying the possible interaction(s) of Ush1g and Otop in developmental pathways.
Although bilateral vestibular pathology is an important cause of imbalance in humans, it is underdiagnosed and poorly understood. Approximately 50% of dizziness is attributed to benign paroxysmal positional vertigo (BBPV), a major risk factor for falls, bone fractures, and accidental death, particularly in the elderly . The etiology of BBPV often relates to the degeneration or displacement of otoconia [2,3], the minute biomineral particles in the utricle and saccule within the inner ear involved in detecting linear acceleration and gravity .
Currently, there are no known human hereditary vestibular disorders attributed to mutations in genes selectively affecting otoconial development or maintenance. Otopetrin 1 (Otop1) is one of a handful of known genes that, when mutated, cause an imbalance phenotype with selective otoconial involvement in animal models [4,5]. In the mouse, Otop1 is expressed in the supporting cells of the sensory epithilium patches (maculae) of the utricule and saccule from embryonic day 13.5 to adulthood, as well as in several other tissues . In contrast, Otop1 expression in zebrafish is predominantly restricted to the hair cells of the sensory epithelium and the neuromasts of the lateral line organ [7,8]. Specific Otop1 mutations have been reported in mouse (tilted [tlt, ], mergulhador [mlh, ], and inner ear defect [ied, ]) and zebrafish (backstroke [bks, ]). Despite the inter-species differences in Otop1-expression patterns, mutant mice and fish lack otoconia and otoliths respectively in all cases, but otherwise have normal inner ear structures, hearing, and overall development. Similarly, the knockdown of Otop1 in mouse and zebrafish results in a phenocopy of the selective lack of otoconia/otoliths seen in the mutants [6,7].
The Otop gene family in most vertebrates is comprised of three members clustered on two chromosomes: Otop1 (e.g., mouse chromosome 5) and the paralogous tandem genes Otop2 and Otop3 (e.g., mouse chromosome 11). The origins of the Otop gene family have been traced far back in metazoans ; for example, the phylogenetic relationships of vertebrate Otop and arthropod and nematode Otop-like proteins (also DUF270 proteins [10,12]) have been deduced from 62 open reading frames in 25 species, demonstrating that they constitute a single family, named the Otopetrin Domain Protein (ODP) family. Fragmentary, but clearly ODP-related, sequences have also been identified in urochordates (Ciona), echinoderms (urchin), and cnidarians (Nematostella) (unpublished data). Signature features of ODP proteins are 12 transmembrane domains organized into three "Otopetrin Domains" (highly conserved among metazoans) and a highly constrained predicted loop structure. Although ODP proteins do not show homology to any transporter, channel, exchanger, or receptor families, the extensive sequence and structural similarity among them suggests a conserved functional role(s) . It has been postulated that Otop1 inhibits P2Y purinergic receptor-mediated calcium release in macular epithelial cells in a calcium-dependent manner, and promotes an influx of calcium in response to ATP during otoconial development [6,13]. In this model, Otop1 acts as a sensor of the extracellular calcium concentration near supporting cells, and responds to ATP in the endolymph to increase intracellular calcium levels during otoconia mineralization. Otop2 and Otop3 functions remain unknown.
The larger syntenic context of the Otop1 and tandem Otop2-Otop3 genomic loci may reflect lineage-specific genomic features and gene associations worthy of exploration. For instance, established genetic and physical maps of the Otop1-containing region suggest an inverted gene order in the mouse and human genomes [14,15]; further, in these genomes, the Otop2-Otop3 gene tandem is physically clustered (head-to-tail) with the Usher syndrome (USH) subtype 1G gene (USH1G; also called SANS). USH1G encodes a scaffold protein with three ankyrin repeats and a SAM domain, which is preferentially expressed in tissues containing motile cilia, such as cochlea and retina [16,17]. Mouse Ush1g on chromosome 11 and its human ortholog USH1G on chromosome 17 are mutated in the Jackson shaker (js) mouse mutant and in human USH1G patients, respectively. The former is a recessive condition associated with deafness, circling, and disorganization of the hair cell bundles (stereocilia) ; the latter is also a recessive condition characterized by congenital deafness, vestibular dysfunction, and prepubertal onset of visual loss (retinitis pigmentosa) . USH is generally a clinically and genetically heterogeneous disease and the most common cause of hereditary deaf-blindness in humans ; invariably, USH is associated with highly disorganized stereocilia.
Ush1g and Otop1 thus play important roles in the vestibular (Ush1g and Otop1), auditory (Ush1g), and visual (Ush1g) systems in vertebrates. In addition to the experimentally confirmed expression of Ush1g in the mouse and human inner ear and retina [16,17] and of Otop1 in the mouse and zebrafish inner ear [7,10], an Otop-like gene (Nlo, for neural crest-lateral otic vesicle localization) is expressed in the anterior placodal ectoderm area of the X. laevis embryo that forms both the eye lens and otic vesicle neurogenic placodes . Furthermore, analyses of expressed-sequence tag (EST) databases reveal evidence for expression of Otop1, Otop2, and Otop3 in the mouse and human retina. The challenge remains to establish whether Ush1g and Otop participate in common developmental pathways affecting ear and/or eye physiology.
To examine the evolutionary diversity of Otop and Ush1g and their larger genomic context in vertebrates, we performed a comprehensive comparative genomic study that analyzed 25 available genome sequences (from fish to human) and sequenced bacterial artificial-chromosome (BAC) clones from seven species. Through cytogenetic and comparative genomic methods, we established that OTOP1 is the boundary gene of an inversion polymorphism on human chromosome 4p16 that originated in the common human-chimpanzee lineage more than 6 million years ago. We could further infer evolutionary scenarios for gene family expansions in individual lineages, including a three-fold expansion of the Otop family in Xenopus tropicalis and the Ush1g family in fish lineages. Ush1g genes have remained tightly linked to Otop2-related sequences throughout vertebrate evolution, raising questions about whether they are functionally insulated or whether they participate in common developmental pathways. To further understand their functional organization, we deduced and analyzed a map of putative CCCTC-binding factor (CTCF)-binding sites within the Ushg1-Otop2 locus. These analyses offer some hints about chromatin organization of the Ush1g-Otop2 locus in mouse and human, and are a resource for further investigating the role of mammalian insulator CTCF in orchestrating gene expression at this locus.
We sought to characterize the genomic regions encompassing Otop1 and the Ush1g-Otop2-Otop3 cluster in a diverse set of vertebrates. As the reference sequence for these studies, we extracted relevant portions of the NCBI mouse genome sequence (build 37; ), specifically, the intervals containing Otop1 on chromosome 5B2 and Ush1g-Otop2-Otop3 on chromosome 11E2, plus the 100-kb segments immediately flanking each interval (see Materials and Methods). The human sequence was an unsuitable reference because the human OTOP1 locus diverged significantly from the orthologous regions in most other vertebrate genomes (see below).
We compiled the homologous sequences of these two genomic regions from 25 vertebrate species for comparative analyses (Table (Table1).1). For seven species, we isolated  and sequenced  BAC clones spanning one or both of the targeted genomic regions (Table (Table2);2); these efforts generated ~3.8 Mb of high-quality sequence data  specifically for this study. For the remaining species (and in cases where we were unable to isolate suitable BACs from the species listed in Table Table2),2), the orthologous sequences were retrieved from whole-genome sequence assemblies available on the UCSC Genome Browser (http://genome.ucsc.edu). With few exceptions (i.e., hedgehog and platypus), the generated data sets contain orthologous sequences of the Otop1 and the Ush1g-Otop2-Otop3 loci from multiple phylogenetic lineages.
In 20 of the 25 vertebrate genomes studied, the immediate genomic context for Otop1 is identical (with Otop1 flanked by Tmem128 and Drd5 on the 5' and 3' side, respectively; Figure Figure1a).1a). For three of the remaining species (cow, marmoset, and hedgehog), there were insufficient sequence data to draw firm conclusions about the genomic context of Otop1. In X. tropicalis, instead of a single Otop1 gene, Tmem128 and Drd5 flank two closely spaced Otop1 genes, which we have named Otop1a and Otop1b (Figure (Figure1b;1b; and additional file 1).
In our maximum likelihood analysis of the Otop family (Figure (Figure2),2), the Otop1a-encoding gene resides within a clade containing the mammalian Otop1-encoding genes, while the Otop1b-encoding gene resides within a sister clade. A strict interpretation of this tree (Figure (Figure2)2) requires the loss of an Otop1b gene in the lineage leading to mammals. Having the genomic sequence of the Otop1-containing region of X. laevis or another amphibian might be useful for timing this genomic event (note that it has not been determined whether the X. laevis genome contains multiple Otop1 genes). The X. tropicalis Otop1a and Oto1b genes encode predicted proteins that are 69% identical to each other at the amino-acid sequence level, and 72% and 65% identical to the mouse Otop1 protein, respectively. Otop1a more closely resembles mammalian Otop1, suggesting that, upon duplication, Otop1b may have acquired novel functions.
The most dramatic difference in the architecture of the OTOP1-containing genomic region is in the human genome, in which OTOP1 is the boundary gene of a submicroscopic inversion polymorphism on chromosome 4p16. In the inverted configuration (represented in the hg17 reference sequence), OTOP1 and DRD5 are separated by ~5 Mb, with the break in synteny (with respect to mouse chromosome 5) occurring immediately adjacent to the 3' end of OTOP1 (Figure (Figure1c).1c). All genes in this large genomic segment have the opposite orientation to most other vertebrate genomes studied.
The inversion polymorphism on human chromosome 4p16 was previously observed in a heterozygous state in 12.5% of a Caucasian population . Large (e.g., ~5 Mb) inversions are uncommon in the human genome; in fact, within euchromatin, there are relatively few greater than 1 Mb. We thus sought to clarify the evolutionary origins of this inversion by comparisons with three non-human primate species serving as outgroups. The orientation of the OTOP1-containing region was validated by three-color interphase fluorescence in situ hybridization (FISH)  studies involving a single human individual, five chimpanzees, three orangutans, and one macaque. To visualize the inversion, we selected two probes inside (red and blue) and one outside (green) the inverted region. The presence of an inversion changes the order of the red and blue probes and their position relative to the green probe (Figure (Figure3).3). These results demonstrate the inversion of the OTOP1-containing region in the human individual and in all chimpanzees sampled, but not in any orangutan or macaque studied, suggesting that the inversion might have occurred in the shared human-chimpanzee lineage and could still reflect a polymorphism in humans. It is not possible to establish the polymorphic or fixed status of this inversion in the chimpanzee lineage because of the small number of chimpanzees studied (five).
The polymorphic OTOP1-containing inverted segment is flanked by ~260-kb duplicated segments, also referred to as segmental duplications (SD; Figure Figure1d;1d; and additional file 2). These are highly similar sequences (>95% sequence identity) located near the OTOP1-proximal and OTOP1-distal inversion breakpoints at positions ~4 Mb and ~9 Mb on human chromosome 4 (referred to as positions Chr4:4 and Chr4:9, respectively, by Darai-Ramqvist et al. ). The SDs are oriented in an inverted configuration, suggesting that they originated by a non-allelic homologous recombination event. Each SD consists of a mosaic of duplicated sequences originating from multiple chromosomes, most likely compiled in a step-wise fashion [27,28]. The immediate boundary region between OTOP1 and the OTOP1-proximal duplicon contains one retroposed pseudogene (ΨUNC93B1). Both SD boundaries are enriched for olfactory receptor (OR)-gene sequences from the 7E subfamily ; specifically, each cluster of 7E OR sequences consists of one or more ~13-kb segment(s) containing a 7E OR gene interspersed with LINEs, SINEs, the LTR-containing satellite repeat SATR, and HERVE retroviruses arranged in a characteristic pattern (Figure (Figure1d;1d; also see ).
Of note, the OTOP1-distal SD (also flanked by clusters of OR-gene sequences) lies immediately adjacent to the RS447 megasatellite, a tandem repetitive sequence containing copies of the deubiquitinating enzyme gene USP17 (Figure (Figure1d).1d). Bioinformatic analyses have revealed the presence of a less-conserved, minor RS447 locus on human chromosome 8p23, which is relevant to our study (as discussed below).
We next analyzed a map of known SDs in the primate genome [30,31] to infer the evolutionary history of the OTOP1-flanking SD. As depicted in Figure Figure4a4a (and expanded in additional file 3), this family of SDs, also called the tumor break-prone segmental duplication (TBSD) family , is specific to great apes, emerging and spreading out in hominid genomes since the divergence of the human, chimpanzee, and orangutan common ancestor from the macaque lineage, roughly 12-16 million years ago. All SDs belonging to the TBSD family are >95% identical and have distinctive boundaries enriched in 7E OR genes, LTR-containing retroviruses, and copies of the retroposed pseudogene ΨUNC93B1. The phylogenetic relationships among the 7E OR genes have been studied by others  revealing unique features, such as a high frequency of gene conversion with distant neighbors and evidence of inter-and intra-chromosomal duplication events. This pattern is distinct from the entire set of human SDs, of which 86% duplicated intra-chromosomally . Collectively, these data support the idea that the highly similar 7E OR gene clusters at the boundaries of TBSD family members may predispose to chromosome breakage and rearrangement; this likely reflects the history of the human chromosome 4p16 inversion described here. Additional details about the TBSD family in primates can be found in additional files 4, 5, and 6.
Our analyses also revealed a second SD (~60 kb in size) in the human genome containing OTOP1- and TMEM128-like sequences (ΨOTOP1 and ΨTEM128, respectively; Figure Figure4a4a and and4b).4b). Here, the boundaries lacked the distinctive features of the TBSD family members. This small duplication most likely resulted from a typical duplication-shadowing event [30,31]. Furthermore, oligonucleotide-based comparative genomic hybridization (array CGH)  confirmed that this duplicated segment is present in single copy in human and gorilla genomes but, interestingly, not in the chimpanzee or bonobo genomes (additional file 3). This duplication pattern is inconsistent with the generally accepted human-great ape phylogeny; such a scenario may have arisen from a deletion event in the chimpanzee lineage, incomplete lineage sorting or, less likely, a recurrent duplication event in the human and gorilla lineages . In the human genome, this 60-kb duplicon resides in the pericentromeric region of chromosome 2 (Figure (Figure4b).4b). Human ΨOTOP1 is 96% identical to OTOP1 at the sequence level, but contains multiple stop codons and indels (e.g., complete deletion of exon 4); it is thus unlikely to encode a functional protein. However, the promoter associated with ΨOTOP1 may still be active, as multiple truncated ΨOTOP1 ESTs exist in GenBank (grouped within UniGene Hs.615109).
The Ush1g-Otop2-Otop3 locus spans ~40 kb in the mouse genome (Figure (Figure5a).5a). The tandem genes Otop2 and Otop3 have the same orientation and are separated by a 2.4-kb intergenic region, while Ush1g has the opposite transcriptional orientation. Based on analyses of mouse ESTs, Otop2-derived transcripts yield three alternatively spliced variants derived from the use of four non-coding first exons (1a to 1d) as well as two internal splice sites in exon 1d and the non-coding portion of exon 2 (Figure (Figure5a5a inset). The presence of exon 1a, 1b, or 1c in an Otop2-derived transcript is mutually exclusive of the others, while all transcripts share parts of exon 1d and the non-coding portion of exon 2. Consistent with its presence in all Otop2-derived splice forms, exon 1d is highly conserved among vertebrates (e.g., mouse and opossum exon 1d are 60% identical at the nucleotide level, with highly conserved canonic donor and acceptor splice sites), and the exon's internal acceptor splice site is conserved across rodents, primates, carnivores, and ungulates. Exon 1c is well conserved among vertebrates, with the exception of mouse and rat, which have divergent sequences but a conserved splice-donor site. Finally, exons 1a and 1b show neither sequence nor donor splice-site conservation among vertebrates except in mouse and rat, suggesting that these exons may be rodent-specific. Thus, Ush1g is not adjacent to, but embedded within, the first intron of Otop2 in rodent genomes due to the presence of lineage-specific Otop2 untranslated exons.
An identical repertoire of genes near the Ush1g-Otop2-Otop3 locus is present in 18 of the 25 genomes studied, including those encoding: (1) fatty acid desaturase domain family member 6 (Fads6) nearest the 5' end of Otop2; and (2) hypothetical protein (C17orf2) and cerebellar degeneration-related protein 2-like (Cdr21) nearest the 3' end of Otop3 (Figure (Figure5a).5a). The exceptions are platypus (for which we were unable to generate or recover a sequence for this genomic region); and Xenopus and the five fish lineages (all of which have notably different structures for this genomic region).
The Ush1g-Otop2-Otop3 locus is markedly different in frog and fish lineages compared to the other vertebrates studied. Specifically, the X. tropicalis genome contains only one copy of the Ush1g gene (similar to other vertebrate genomes), but Otop2 and Otop3 are uniquely expanded into a cluster of six paralogous genes (Figure (Figure5b).5b). A maximum likelihood (ML) unrooted tree that includes all known amphibian, mouse, human, and dog Otop sequences is shown in Figure Figure2.2. Three well-supported clades representing the Otop1, Otop2, and Otop3 subfamilies (labeled in red, green, and blue, respectively) are seen, indicating that the genome of the last common ancestor of placental mammals and amphibians (~320-360 million years ago) had at least three Otop paralogs.
Determining the timing of the Otop duplication events is difficult. The tree in Figure Figure22 suggests that the ancestry of Otop1- and Otop2-encoding genes may have been shaped by a combination of ancient duplication events and mammalian-specific gene loss. Alternatively, our tree may be thrown off by to the asymmetric evolutionary rates of daughter genes following duplication and divergence--with the duplicate that retains the ancestral function evolving more slowly than the other(s) (see discussion in Larroux et al., ). Including additional X. laevis Otop1 and Otop2 sequences (if they exist and are found in the future) may help break longer branches and clarify these relationships. On the other hand, the Otop3 scenario seems to be more clearly a result of amphibian-specific duplication. Furthermore, the topology of the two X. laevis Otop3-like sequences (Nlo  and this study) suggests that at least one of the Otop3 duplication events occurred prior to the last common ancestor of X. laevis and X. tropicalis. Notably, Nlo is expressed within the non-neural ectoderm surrounding the anterior neural plate of X. laevis embryos; at tailbud stages, Nlo is expressed in the dorsolateral region of the otic vesicle, which later gives rise to the gravity organs. Given the phylogenetic relationship of Nlo and Otop3 as well as the apparent role of amphibian Nlo, mouse Otop1, and zebrafish Otop1 in the development of the vestibular system, it seems possible that the ancestral Otop gene was involved in vestibular system development.
We examined whole-genome sequence assemblies for five fish species (zebrafish, Danio rerio; tetraodon, Tetraodon nigroviridis; fugu, Takifugu rubripes; stickleback, Gasterosteus aculeatus; and medaka, Oryzias latipes). We emphasize our analyses of the stickleback genome in Figure Figure5c5c because its whole-genome sequence assembly included the best coverage of the Ush1g-Otop2-Otop3 locus among fish. Consistent with the whole-genome duplication that occurred in teleost fish lineage subsequent to its divergence from mammalian ancestors ~230 million years ago , the Ush1g-Otop2-Otop3 locus occupies two locations in the stickleback genome. One segment (not assigned to a chromosome, ChrUn: 25945000-25970000) shows conserved synteny with other vertebrate genomes, with Fads6, Ush1ga, Otop2, and C17orf28 similarly ordered and oriented (but not Otop3; see Figure Figure5c).5c). The other segment, residing on chromosome XI, appears to result from duplication and rearrangement events, yielding broken synteny among the genes (Figure (Figure5c);5c); this region is not fully characterized, but in the stickleback whole genome assembly it contains at least two Otop2-related genes (Otop2d and Otop2e), one Otop3-related gene (Otop3d), and two copies of Ush1g (Ush1gb and Ush1gc). Note that Ush1ga and Ush1gb (but not Ush1gc) have remained close to Otop-related sequences. A total of 15 Ushg1 genes were annotated in the whole-genome sequence assemblies of the five fish (for the complete listing and coordinates of all fish Ushg1 genes see additional files 7, 8, 9, 10 and 11). Multi-species comparisons of the predicted protein sequences revealed that the defining architectural features of Ush1g in placental mammals (i.e., three ankyrin domains and a SAM domain) are highly conserved in the three fish Ush1g paralogs, with most sequence variation residing within the central region of the protein of unknown function (additional file 12).
We next set out to identify conserved non-coding sequences of potential functional relevance within the ~40-kb Ush1g-Otop2-Otop3 locus. We analyzed the high-quality sequences generated from galago, dog, armadillo, cow, and hedgehog in conjunction with the orthologous mouse and human sequences from the UCSC Genome Browser. The X. tropicalis sequence was not included in this analysis because its unique Otop family expansion complicates the alignment of non-coding sequences with other species' sequences. ExactPlus  was used to identify multi-species conserved sequences (MCSs) within the multi-sequence alignments. MCSs overlapping known coding sequences were removed, thereby enriching conserved non-coding sequences. The resulting set of 67 non-coding MCSs (from 7-31 bp long and averaging 12 bp) together span 806 bp (2%) of the analyzed interval.
To assess ExactPlus' performance, we compared its output with data from the "PhastCons Conserved Elements, 30-way Vertebrate Multiz Alignment" track on the UCSC Human Genome Browser; this track depicts the top 5% most-conserved MCSs based on PhastCons analyses of 30 vertebrate genome sequences . Roughly 350 bp of non-coding sequence were identified by both ExactPlus and PhastCons, accounting for ~0.8% of the analyzed ~40-kb interval. The first intron of USH1G, in particular, appears to contain a set of regulatory sequences with characteristics that are well conserved among vertebrates: a uniform intron size of ~2.2 kb; virtual absence of repetitive elements across species; and richer in highly conserved non-coding elements than the surrounding genomic regions (see additional file 13 for the complete listing and coordinates of non-coding MCSs identified by ExactPlus and PhastCons).
An open question is whether there are non-coding sequences in and around the Ushg1-Otop2 locus that act as 'functional boundaries'- either to insulate these genes from each other or to orchestrate their regulation. Chromatin immunoprecipitation-sequencing (ChIP-seq) combines chromatin immunoprecipitation of a DNA-binding protein with high-throughput DNA sequencing and mapping of the 'sequence tags' to a reference genome. An obvious candidate for organizing chromatin domains is the mammalian insulator CTCF. We thus mined data from in vivo genome-wide ChIP-seq studies in human and mouse ES cells for CTCF 'sequence tags' that map to the Ush1g-Otop2 locus (Duke/UNC/UT-Austin/EBI ENCODE group [http://genome-test.cse.ucsc.edu] and Chen et al., , respectively). Four putative CTCF-binding sites (Figure (Figure6a)6a) were found with the following species-specific occupancies: CTCF1 (located at the 3' end of Ush1g) and CTCF4 (within exon6 of Otop2), mouse-specific; CTCF2 (within exon 2 of Ush1g) human-specific; and CTCF3 (in between Ushg1 and Otop2) occupied in mouse and human ES cells. Notably, the binding of CTCF to CTCF3 has been replicated in all available genome-wide ChIP-Seq studies, including resting human CD4+T cells , five differentiated non-cancerous, karyotypically normal human cell lines (Figure (Figure6a),6a), and a number of immortalized human cancer cell lines ( and http://genome-test.cse.ucsc.edu; data not shown).
In silico analyses confirmed consensus CTCF-binding motifs within the sequence of all four CTCF-binding locations (Figure (Figure6f)6f) . Typical CTCF-binding sites are 20 bp long, with two conserved cores: one spanning bases 4-8 and the other bases 10-18 (Figure (Figure6f).6f). Furthermore, CTCF-binding sites can be grouped into low-, medium-, and high-occupancy sites (LowOc, MedOc, and HighOc, respectively) depending upon how closely they match the consensus CTCF-binding motif and the reported density of mapped tags. Each class has statistically divergent features, suggesting distinct functional roles . CTCF3 contains a HighOc binding site that is highly conserved across placental mammals. Statistically, HighOc sites tend to be more conserved than their flanking regions, are ubiquitously bound by CTCF, and act as chromatin barriers less often than LowOc sites; however, paradoxically, they have a greater tendency to delimit domains of co-regulated genes .On the other hand, CTCF2 (not occupied in mouse ES cells and some differentiated human cell lines) contains a LowOc motif. LowOc binding motifs demonstrate greater conservation in their flanking regions compared to HighOc sites, and tend to be cell-specific, with CTCF-binding being more variable between cell types than at HighOc sites. LowOc sites have been proposed to play a role in establishing chromatin barriers. Neither CTCF2- nor CTCF3-binding motifs are conserved in non-placental vertebrates (data not shown).
CTCF1 and CTCF4 are only occupied in mouse ES cells. Of note, the spacing of the cores within the CTCF1 motif was uniquely lost in human, chimpanzee, and gorilla due to the deletion of spacer base 9; otherwise the binding motif is conserved in placental mammals. Also, CTCF4 contains a strong CTCF-binding site (5' -GAGACCTGCAGGGGGCGCTC) in the mouse genome, although the binding motif is biased towards less-commonly reported bases (e.g., an almost fixed T in position 10, instead of an A) in other placental mammals.
In this study, we used comparative genome sequencing and cytogenetics to examine the evolutionary history and the genomic context of three Otop genes in 25 evolutionarily diverse vertebrate species. We also extended our evolutionary studies to the Ush1g deafness gene because of the tight head-to-tail physical clustering of Ush1g with Otop2 and Otop3 in vertebrate genomes, and because mutations in Otop1 and Ush1g result in inner ear phenotypes in vertebrates. Based on our analyses, we conclude that the evolution of the Otop family in hominoids, amphibians, and rodents significantly departs from that of most vertebrate genomes, as does the evolution of Ush1g in the teleostei fish lineages.
The most striking difference found between hominoid species and other vertebrates is that the OTOP1 locus is flanked by a large SD of high complexity and sequence identity, belonging to the TBSD family, and arranged in an inverted orientation. Furthermore, OTOP1 sits only 3 kb away from the OTOP1-proximal inversion boundary. Thus, the immediate genomic context of human OTOP1 differs significantly from that of mouse and zebrafish, the only other vertebrates in which Otop1 has been closely studied [4,10]. Therefore, information on regulation of mouse and zebrafish Otop1 may not accurately reflect human OTOP1 regulation, and/or Otop1 developmental and biochemical function(s) in mouse and fish may be represented by another OTOP gene in humans. Using cytogenetic and comparative genomic approaches, we examined the evolutionary history of the hominoid Otop1 locus, including the flanking TBSD duplicons. Our findings indicate that the TBSD family emerged at some point after the divergence of the human, chimpanzee, and orangutan common ancestor from the macaque lineage ~12-16 million years ago, and later underwent significant expansion (perhaps within the common ancestor of the great apes). Further, these SDs contribute to plasticity and instability in multiple regions of the genome .On human chromosome 4p16 we describe a large (~5 Mb) inversion polymorphism flanked by palindromic TBSD sequences with OTOP1 as the boundary gene; this polymorphic arrangement, occurring in one in eight individuals of the Caucasian population, likely originated in the common human-chimpanzee lineage prior to ~6 million years ago.
Our studies, combined with others , have yielded a predicted timeline of genomic events affecting human chromosome loci 4p16 and 8p23 that have contributed to the structural complexity and genomic instability of the OTOP1 locus in humans; this timeline highlights interesting evolutionary parallels between these two regions. The macaque and mouse genomes are orthologous across the Otop1 locus, meaning that Otop1 and Drd5 are tightly linked, with no evidence for the presence of 7E OR gene clusters, complex SDs, or RS447 microsatellite sequences. RS447 sequences are, however, present in the macaque region orthologous to human chromosome 8p23. In orangutan, clusters of 7E OR and RS447 sequences reside close to each other in both regions, yet there is no evidence for complex mosaic duplications or inversions in either location. Chimpanzee (and bonobo) show increased copy number of the RS447 megasatellite on chromosome 4p16 (as deduced by array CGH; see additional file 3) and evidence for significant expansion of the TBSD family across a number of chromosomes. Furthermore, inversion polymorphisms of similar size (~5 Mb) have been reported on chromosomes 8p23 and 4p16 in humans and chimpanzees.
The mechanism underlying such genomic rearrangements is not completely clear. One possible scenario is that flanking clusters of SDs triggered the inversions via a non-allelic homologous recombination event (; and present study). Alternatively, rather than rearrangements being mediated by the duplications themselves, the inversions could have helped to create the complex segmental duplication architecture present at their breakpoints . Whatever the causal mechanism, the frequency of 8p23 and 4p16 inversion polymorphisms is relatively high in the human population, oftentimes with serious consequences for double heterozygotes. Specifically, mothers who are double heterozygotes for the inversion polymorphisms on chromosomes 4p16 and 8p23 are prone to recurrent de novo t(4;8)(p16;p23) translocations through unusual meiotic exchanges, resulting in offspring with Wolf-Hirschhorn syndrome. The phenotype of patients with this syndrome consists of mental retardation and an array of developmental defects, often including hypotonia (decreased muscle tone; [15,44,45]). It is conceivable that individuals broadly labeled hypotonic might also suffer from vestibular dysfunction, which could go undiagnosed because it is often not clinically assessed. It would thus be interesting to study Wolf-Hirschhorn syndrome patients with balanced t(4;8)(p16;p23) translocations in search of individuals with disrupted or dysregulated OTOP1 (via OTOP1 copy number changes, formation of OTOP1 chimeras, or altered OTOP1 position). The resulting physiological consequences could include complete abrogation of OTOP1 function, haploinsufficiency, elevated OTOP1 expression, or creation of a fusion protein with a dominant-negative effect or novel gain of function , any of which would contribute to our understanding of the role of OTOP1 in human development.
X. tropicalis is the other vertebrate in which we found significant differences in genomic organization of the Otop family. Specifically, the X. tropicalis genome contains multiple paralogs for each Otop gene. We have determined that amphibian Nlo has a phylogenetic relationship to Otop3-like genes. Both Nlo and Otop1 are apparently involved in the development of the vestibular system in amphibians and placental mammals, respectively, suggesting that multiple Otop genes may have roles in vestibular system development. Studying the degree of sub-functionalization of Otop paralogs in X. tropicalis may help to define the unique functions attributable to each paralog.
Amphibians and other vertebrates share similar auditory and vestibular physiology, with their vestibular organs particularly well conserved in position, structure, and function . However, a striking difference is the unusual calcium carbonate crystalline forms of amphibian otoconia. The general trend during vertebrate evolution has been a replacement of otoconial vaterite and aragonite crystals by calcite, a calcium carbonate crystal polymorph of increased stability . Such a trend correlates with the transition of vertebrates from aquatic to terrestrial life. Chondrostei fish have vaterite and/or aragonite crystals, teleostean fish only have aragonitic otoliths, and both mammalian and avian otoconia consist exclusively of calcite crystals. During vertebrate evolution, calcitic otoconia appeared for the first time in the amphibian inner ear, yielding an intriguing intermediate situation with two crystalline forms: calcite in the utricle and aragonite in the saccule, lagena, and endolymphatic sac [49-52].
The structural variation of otoconia among vertebrates is thought to result from different properties of their scaffold proteins. However, the main scaffold protein in amphibians, otoconin-22 (Oc-22), is produced in most portions of the developing inner ear (i.e., saccule, crista ampularis, endolymphatic sac, and utricule ), making it unclear how two entirely different crystalline forms of otoconia are regionally specified during amphibian development. Therefore, additional factors unique to either the utricule (with calcitic otoconia) or saccule (with aragonitic otoconia) likely lead to distinct regional nucleation and crystal growth during otoconial formation in amphibians. Reverse genetic screens that might clarify the peculiarities of otoconial formation in amphibians have been impossible to undertake because animal models with clean, non-syndromic balance phenotypes are generally very rare , to the point that frog mutants with complete otoconial agenesis have yet to be described. In this regard, a candidate gene approach involving targeted disruption of Otop genes in X. tropicalis, individually or in combinations, could help discern whether different Otop paralogs have acquired discrete functions contributing to the regional differences in otoconial formation; such a study could be performed using morpholinos, which have been successfully used to specifically examine the role of Otop1 in vestibular system development in zebrafish  and general otic vesicle development in Xenopus .
Of the 16,000 deaf-blind persons in the United States, more than half are believed to have Usher syndrome (USH), a combination of progressive retinopathy and congenital hearing loss. USH type I is especially devastating because of its early onset and extreme phenotype of profound congenital deafness with unintelligible speech, retinitis pigmentosa within the first decade of life, and vestibular dysfunction. USHIG is one of five USH type 1 causative genes identified to date. Those include the genes encoding the molecular motor myosin VIIa (MYO7A; USH1B); two cell adhesion cadherin proteins, cadherin 23 (CDH23; USH1D) and protocadherin 15 (PCDH15; USH1F); and two scaffold proteins, harmonin (harmonin; USH1C) and USH1G (USH1G; USH1G). These proteins are involved in a protein network or "interactome" within sensory cells of the eye and inner ear .
Our evolutionary studies of Ush1g in vertebrates revealed a unique expansion of this gene family to include three members in all five fish species studied. A number of zebrafish models of USH have been described, including mariner (Myo7A, ), sputnik (Cdh23, ), and orbiter (Pcdh15a, ). Our efforts have revealed that multiple copies of Myo7A, Cdh23, Pcdh15, and Ush1g are present in fish genomes (two, two, two, and three copies, respectively; data not shown). Interestingly, there is evidence that the "USH interactome" may be more complex in fish than in other vertebrates . Specifically, the two Pcdh15 orthologs in zebrafish have distinct functions in hearing and vision: Pcdh15a is required for normal auditory and vestibular function, while Pcdh15b is required for normal photoreceptor outer segment organization and retinal function. Because Ush1g is a scaffold protein, the presence of three Ush1g genes in fish may affect the fish "USH interactome" (and consequently eye and inner-ear development) differently than in humans; such differences should be considered when using fish systems as general models for vertebrate ear physiology.
Functional studies that shed light on the cis-regulatory elements and fine genomic architecture of the Ush1g-Otop2 locus are needed to establish the mechanisms underlying the lineage-specific differences in Ush1g function. For instance, like human USH1g patients, the Ush1g-defective mouse model Js is profoundly deaf with vestibular dysfunction; however, unlike human USH1g patients, it does not have abnormal retinal phenotype . It is notable that the transcriptional boundaries between Ush1g and Otop2 are indistinct in rodent genomes due the presence of two rodent-specific Otop2, 5' untranslated exons that cause Ush1g to overlap with (or be embedded within) the Otop2 transcriptional unit; the effect of this configuration on the expression of these two genes is not known.
CTCF is a zinc-finger transcriptional repressor that serves an insulator function to limit the spread of heterochromatin; it can also operate as a transcriptional activator, regulate nuclear localization, and participate in the control of imprinting . The first intron of Ush1g contains an extensive number of evolutionarily conserved sequences and is bracketed by two putative CTCF-binding sites, CTCF2 and CTCF3. As a caveat, the predicted CTCF binding sites are based on positional information of sequence reads in chIP-seq studies and need to be further validated. Nonetheless, we suggest that the HighOc CTCF3 site between Ush1g and Otop2 may be ubiquitously bound by CTCF in mouse and human, while CTCF1, CTCF2, and CTCF4 may participate in gene expression regulation in a cell- and/or species-specific fashion. Further functional characterization of CTCF1 to CTCF4 will be pivotal for determining whether Ush1g and Otop are functionally insulated or participate in common developmental pathways.
Distinct evolutionary events have affected the Otop and Ush1g genes in vertebrate genomes, particularly in humans and commonly used animal models (specifically, mouse, fish, and Xenopus). The lineage-specific evolutionary history of these genes may therefore limit the functional information that can be inferred from animal studies. In this regard, our findings should help guide the choice of which animal system to use when investigating Otop and Ush1g function. For example, sub-functionalization of the Otop paralogs in amphibians may make X. tropicalis an appropriate model for exploring the organization, function, and regulation of Otop genes and their role in inner ear development in vertebrates; and fish may be an effective vertebrate model for discriminating the different functions of Ush1g, perhaps offering the ability to define the unique functions in inner ear and eye development attributable to each paralog. We also suggest that humans with chromosome 4p16 rearrangements should be studied for altered OTOP1 function, which may clarify this gene's role in human development.
This study also establishes a framework for defining whether and how Ush1g and Otop participate in common developmental pathways. Future studies will focus on functional validation of the highly conserved non-coding sequences that were identified in the Ush1g-Otop locus, including four putative CTCF-binding sites that likely play a role in orchestrating gene expression at this locus.
We assimilated a 25-species comparative sequence data set for studying the Otop1 and Ush1g-Otop2-Otop3 loci (species are listed in Table Table1).1). For seven species, we generated the sequence by targeted mapping and sequencing. Briefly, BAC clones were isolated from the following libraries (see http://bacpac.chori.org for details), as described [22,23]: galago (Otolemur garnettii; CHORI-256), dog (Canis familiaris; RPCI-81), cow (Bos taurus; CHORI-240), armadillo (Dasypus novemcinctus; VMRC-5), hedgehog (Atelerix albiventris; LB-4), frog (Xenopus tropicalis; CHORI-216), and platypus (Ornithorhynchus anatinus; CHORI-236). Each library was screened using pooled sets of oligonucleotide-based hybridization probes designed from the established reference sequence of the genomic regions encompassing the mouse Otop1 or Ush1g-Otop2-Otop3 loci (NCBI37, Chr5: 38602000-38894000 and Chr11: 115128000-115293000, respectively). After isolation and mapping, 25 BACs were selected and subjected to shotgun sequencing and sequence finishing, as previously described . Ultimately, we generated the orthologous sequence of the (1) Otop1 locus in galago, dog, armadillo, and platypus; and the (2) Ush1g-Otop2-Otop3 locus in galago, dog, armadillo, cow, hedgehog, and frog (see Table Table2).2). For each species and locus, a single non-redundant sequence was generated with the individual BAC sequences (i.e., a multi-BAC sequence assembly) using the program TPF Processor (http://www.ncbi.nlm.nih.gov/projects/zoo_seq). The resulting assemblies were manually verified and submitted to GenBank ([GenBank: DP000186-DP000190], [GenBank: DP000192], [GenBank: DP000198], and [GenBank: DP000200-DP000202]; Table Table22).
For the remaining species, and to capture data not derived from our targeted sequencing efforts, we retrieved the orthologous sequences from whole-genome sequence assemblies available on the UCSC Genome Browser (; see genome.ucsc.edu). Identifying the sequences of interest within these assemblies was generally straightforward. One exception involved the detection of a transcribed pseudogene (ΨOTOP1) located in the pericentromeric region of human chromosome 2 (NCBI human genome sequence build 35, Chr2:91186050-91237900); this region was not included in our data set and was identified by BLAST  analysis using OTOP1 as a query. Our collective efforts yielded the sequences of both genomic regions from nearly all of the 25 vertebrates listed in Table Table1.1. In addition to the data reported here, 22 additional low-coverage (~2-fold redundancy) vertebrate whole-genome sequence assemblies were examined. At the level of resolution offered by these sequences, no new instances of lineage-specific evolutionary events affecting the Otop and Ush1g families were detected (data not shown).
The assembled BAC sequences were annotated for gene content based on alignments to human RefSeq mRNA (or species-specific mRNA, if available) sequences using Spidey (http://www.ncbi.nlm.nih.gov/spidey) and BLAST . Known repetitive sequences were detected by RepeatMasker (http://www.repeatmasker.org) using appropriate repeat libraries for each species. In addition, Sequin (http://www.ncbi.nlm.nih.gov/Sequin) was used to import and confirm all annotations, including verifying splice-site consensus sequences, exon structure, and predicted protein sequences. Pair-wise and multi-species sequence comparisons were performed using MultiPipMaker , first using the mouse sequence and then the human sequence as the reference. Multi-species alignments of nucleotide or deduced protein sequences were refined with Sequence Alignment Editor (Se-Al, http://evolve.zoo.ox.ac.uk). Sequence Logos for CTCF1 to CTCF4 were based on multi-species sequence alignments in placental mammals, as prepared with WebLogo v2.8.2 (http://weblogo.berkeley.edu/). All data associated with these analyses are available in GenBank or in the form of UCSC custom track bed files (see additional files 1, 7, 8, 9, 10 and 11).
The program Miropeats  was used to align genome sequence assemblies, to determine the length, location, and relative orientations of the segmental duplications (see hg17 versus hg19, additional file 5; and OTOP1-containing region in Chr4 versus ΨOTOP1-containing region in Chr2, Figure Figure4b),4b), and to display such DNA sequence similarity information graphically. The order and orientation of mosaic duplicons within complex duplication blocks was delineated with DupMasker  with customized Perl scripts.
Interphase nuclei were prepared from lymphoblast cell lines from human and three primate outgroup species as follows: human cell line GM12004 (Coriell Cell Repository, Camden, NJ); chimpanzees Marcus, Cochise, Douglas, Katie, and Veronica (unknown source; samples donated by Dr. Mariano Rocchi and Dr. Mario Ventura); macaque MMU2 (Macaca mulatta; sample donated by Dr. Mariano Rocchi and Dr. Mario Ventura); and orangutan cell line pr01109 (Susie; Coriell Cell Repository, Camden, NJ) and individuals PPY9 and PPY6 (unknown source; samples donated by Dr. Mariano Rocchi and Dr. Mario Ventura). FISH analyses were performed using fosmids WIBR2-1849E16, WIBR2-1416B12, and WIBR2-1634L14, which were labeled by nick-translation with Cy3-dUTP (Perkin-Elmer), Cy5-dUTP (Perkin-Elmer), and fluorescein-dUTP (Enzo), respectively, as generally described by Lichter et al. . Briefly, 300 ng of labeled probe were used; hybridizations were performed at 37°C in 2XSSC, 50% (v/v) formamide, 10% (w/v) dextran sulphate, and 3 μg sonicated salmon sperm DNA in a volume of 10 μl; posthybridization washing was performed at 60°C in 0.1XSSC (three times, high stringency); and nuclei were simultaneously DAPI stained. Digital images were obtained using a Leica DMRXA2 epifluorescence microscope equipped with a cooled CCD camera (Princeton Instruments). DAPI, Cy3, Cy5, and fluorescein fluorescence signals (each detected with specific filters) were recorded separately as gray-scale images. Pseudocoloring and merging of images were performed using Adobe Photoshop software. A minimum of 50 interphase cells were analyzed for each hybridization.
SDs in genomic regions of interest were identified by two analytical approaches: assembly-dependent (whole-genome assembly comparison, or WGAC) and assembly-independent (whole-genome shotgun sequence detection or WSSD) . WGAC is a self-self BLAST-based strategy optimized to provide pair-wise relationships among the SDs within one species. WSSD detects SDs as sequences with overrepresented depth of coverage in randomly selected sequences from a human and non-human whole-genome sequence assembly, using the human sequence assembly as a reference. Inferences can thus be made about the duplication status in different primates . The duplicon content analysis was based on the study by Jiang et al. , in which the evolutionary history of every duplicon of the human genome was reconstructed with a De-Brujin algorithm using macaque as an outgroup.
Multi-species conserved sequences (MCSs) were detected across the 40-kb Ush1g-Otop2-Otop3 locus using ExactPlus (; http://research.nhgri.nih.gov/projects/exactplus/). Briefly, a line-by-line MultiPipMaker alignment was generated with the high-quality genomic sequence from galago, dog, armadillo, cow, and hedgehog, along with the mouse (build 37; Chr11:115168000-115209000) and human (hg17; Chr17:70408200-70458400) orthologous sequences from the UCSC Genome Browser. Conservation parameters (designated as 7-5-4) were set such that ExactPlus would scan the MultiPipMaker alignment (acgt) file and find blocks of bases (or 'seeds') where a minimum of five species had identical sequences for seven bases; a subsequent base-by-base extension step in either direction was allowed on condition that each extended base was identical in a minimum of four species. Detected MCSs that coincided with UCSC Genome Browser gene annotations, gene predictions, mouse mRNAs, or spliced mouse ESTs were removed to enrich for non-coding MCSs. Subsequently, we intersected the ExactPlus-detected MCSs with MCSs represented on the "PhastCons Conserved Elements, 30-way Vertebrate Multiz Alignment" track in the UCSC Genome Browser (phastConsElement30way; http://genome.ucsc.edu). This track displays conserved sequences detected in alignments of 30 vertebrate sequences using phastCons from the PHAST package ; note that the phastConsElement30way default parameters are tuned to detect the top 5% most-conserved sequences in the genome. The intersection of the two data sets revealed 24 shared non-coding MCSs, accounting for ~0.8% of the analyzed sequence. These 24 are considered the most conserved non-coding MCSs across this ~40-kb genomic region. Comprehensive listings of the ExactPlus-identified MCSs and their intersection with phastConsElement30way are available in additional file 13.
A comprehensive phylogeny of the ODP family in vertebrates and invertebrates has been described elsewhere . Here, we specifically examined the phylogenetic relationships between the multiple amphibian Otop genes with respect to the set of three subfamilies (Otop1, -2, and -3) typically found in other vertebrates. In X. tropicalis, the Otop family consists of eight genes, as determined by a combination of targeted BAC sequencing and database searches. Specifically, six genes were identified and annotated in BAC AC166187 (which was specifically isolated and sequenced for this study), and two genes were identified and annotated in X. tropicalis scaffold_441 (genome.ucsc.edu; see additional file 1 for Otop1a and Otop1b coordinates in the X. tropicalis assembly). Thirteen identified X. tropicalis ESTs and a full-length cDNA were used to verify the Otop gene annotations ([GenBank: CX926385], Otop1a; [GenBank: DN050494], non-spliced Otop2a; [GenBank: DN023283], Otop2b; [GenBank: EL816442], Otop2c; [GenBank: DN086801, BX774194, EL846279, DN086800 and CR448138], Otop3b; [GenBank: EL846278, BX774945, AL858287 and AL886472], Otop3c; and [GenBank: BC155406], full-length Otop3b). For X. laevis, only a partial gene complement could be identified by literature and database searches, including one EST ([GenBank: EB479546], Otop3a) and a full-length cDNA, Nlo ; also three partial Nlo ESTs were identified [GenBank:EB479546, BJ065669 and BJ077411]. A multi-species protein sequence alignment was created that included all known amphibian, mouse, human, and dog Otop sequences. Otop proteins contain 12 trans-membrane domains ; for these analyses, the entire intracellular domain and four loops (L5, L6, L8, and L10) were removed from the multi-sequence alignment because of their highly variable length across phyla. The resulting alignment was analyzed using RaxML with a JTT mixed model of amino-acid substitution to generate the Maximum Likelihood (ML) tree. Of the ten inferences generated from ten distinct randomized maximum parsimony (MP) starting trees, four of the resulting phylogenetic trees produced a maximum likelihood value of -8429.55674. The resulting tree was drawn with FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
(USH): Usher syndrome; (USH1G; also SANS): USH subtype 1G gene; (SAM): Sterile alpha motif domain; (Otop): otopetrin; (ODP): Otopetrin Domain Protein; (Nlo): neural crest-lateral otic vesicle localization gene; (Zfp509): zinc finger protein 509; (Lyar): Ly1 antibody reactive clone; (Tmem128): transmembrane protein 128; (Drd5): dopamine receptor 5; (Slc2a9): facilitated glucose transporter member 9; (OR): olfactory receptor; (SD): segmental duplication; (TBSD): tumor break-prone segmental duplication; (USP17): deubiquitinating enzyme gene; (Fads6): fatty acid desaturase domain family member 6; (C17orf2): hypothetical protein; (Cdr21): cerebellar degeneration-related protein 2-like; (BAC): Bacterial Artificial-Chromosome clone; (FISH): Fluorescence in situ hybridization; (CTCF): CCCTC-binding factor, zinc finger protein; (ChIP-seq): chromatin immunoprecipitation-sequencing; (ES): embryonic stem cells; (MCSs): Multi-species Conserved Sequences; (HRE): human renal epithelial cells; (BJ): skin fibroblasts; (HUVEC): human umbilical vein endothelial cells; (SAEC): small airway epithelial cells; (NHEK): normal human epidermal keratinocytes; (EST): expressed sequence tag; (Oc-22): otoconin-22; (MYO7A): myosin VIIa; (CDH23): cadherin 23; (PCDH15): protocadherin 15; (js): Jackson shaker; (tlt): tilted; (mlh): mergulhador; (ied): inner ear defect; (bsk): backstroke; (WGAC): genome assembly comparison; (WSSD): genome shotgun sequence detection; (ML): Maximum Likelihood; (MP): maximum parsimony; (array CGH): Array Comparative Genome Hybridization.
BH and EDG designed the overall study. BH, TMB, FA, and JFR performed all analyses and drafted the manuscript. NISC Comparative Sequencing Program provided sequence data. IH, EEE, DMO, and EDG edited the manuscript. All authors read and approved the final manuscript.
UCSC custom track bed file: Xenopus tropicalis Otop1b; Xenopus tropicalis Otop1a.
Table S1. Pair-wise alignment between the proximal and distal segmental duplications flanking human cytogenetic band 4p16.1-p16.3
Figure S1. Comparative genomic architecture of the chromosome 4p16.3 region
Segmental duplication content of the Otop1-proximal and distal regions in the h17 and h19 human genome sequence assemblies; orthologous conservation of the TBSD family in the primate genomes
Figure S2. Segmental duplication content of the Otop1-proximal and Otop1-distal regions in the hg17 and hg19 genome assemblies
Figure S3. Distribution of the TBSD segmental duplication family across the human genome
UCSC custom track bed file: Gasterosteus aculeatus Sansa; Gasterosteus aculeatus Sansb; Gasterosteus aculeatus Sansc
UCSC custom track bed file: Tetraodon nigroviridis Sansa; Tetraodon nigroviridis Sansb; Tetraodon nigroviridis Sansc
UCSC custom track bed file: Oryzias latipes Sansa; Oryzias latipes Sansab; Oryzias latipes Sansac;
UCSC custom track bed file: Takifugu rubripes Sansa; Takifugu rubripes Sansb: Takifugu rubripes Sansc
UCSC custom track bed file: Danio rerio Sansb; Danio rerio Sansc
Figure S4. Comparative analysis of the Ush1g proteins in five fish species
UCSC custom track bed file: ExactPlus 7-5-4 MCSs; ExactPlus 7-5-4 overlaps phastConsElement30way MCSs
We thank Mariano Rocchi and Mario Ventura (University of Bari) for providing chimpanzee and macaque samples used in the tri-color interphase FISH analyses; Laura Elnitski (NHGRI) for advice about analyzing non-coding conserved sequences; Zhaoshi Jiang (University of Washington) for advice about analyzing the evolutionary breakpoints on chromosome 4p16; Bechara Kachar (NIDCD) for critically reading the manuscript and for insightful discussions; Julia Fekecs and Darryl Leja, Rebecca Chodroff, and Shurjo Sen (NHGRI) for input about figure generation; Jacquelyn K. Beals for the editing; and the three anonymous reviewers for their helpful feedback. We also thank numerous people associated with the NISC Comparative Sequencing Program, in particular Robert Blakesley, Gerry Bouffard, Jennifer McDowell, Baishali Maskeri, Nancy Hansen, Morgan Park, Pamela Thomas, Alice Young, and the many dedicated technicians. This work was supported by the NHGRI Intramural Research Program, NIDCD grant DC02236 (DMO); and NHGRI grants HG002385 and HG0058815 (EEE). EEE is an investigator of the Howard Hughes Medical Institute.