|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The mammalian vomeronasal organ (VNO) expresses two G-protein coupled receptor gene families that mediate pheromone responses, the V1R and V2R receptor genes. In rodents, there are ~150 V1R genes comprising 12 subfamilies organized in gene clusters at multiple chromosomal locations. Previously, we showed that several of these subfamilies had been extensively modulated by gene duplications, deletions, and gene conversions around the time of the evolutionary split of the mouse and rat lineages, consistent with the hypothesis that V1R repertoires might be involved in reinforcing speciation events. Here, we generated genome sequence for one large cluster containing two V1R subfamilies in Mus spretus, a closely related and sympatric species to Mus musculus, and investigated evolutionary change in these repertoires along the two mouse lineages.
We describe a comparison of spretus and musculus with respect to genome organization and synteny, as well as V1R gene content and phylogeny, with reference to previous observations made between mouse and rat. Unlike the mouse-rat comparisons, synteny seems to be largely conserved between the two mouse species. Disruption of local synteny is generally associated with differences in repeat content, although these differences appear to arise more from deletion than new integrations. Even though unambiguous V1R orthology is evident, we observe dynamic modulation of the functional repertoires, with two of seven V1Rb and one of eleven V1Ra genes lost in spretus, two V1Ra genes becoming pseudogenes in musculus, two additional orthologous pairs apparently subject to strong adaptive selection, and another divergent orthologous pair that apparently was subjected to gene conversion.
Therefore, eight of the 18 (~44%) presumptive V1Ra/V1Rb genes in the musculus-spretus ancestor appear to have undergone functional modulation since these two species diverged. As compared to the rat-mouse split, where modulation is evident by independent expansions of these two V1R subfamilies, divergence between musculus and spretus has arisen more by mutations within coding sequences. These results support the hypothesis that adaptive changes in functional V1R repertoires contribute to the delineation of very closely related species.
Pheromones are secreted chemicals recognized by members of the same species (conspecifics) that convey information about individual members. The vomeronasal organ (VNO) of terrestrial vertebrates is responsible for pheromonal responses that evoke social and reproductive behaviors, including male territorial aggression, sexual preference, and sexual maturity (reviewed in ). These responses are thought to be mediated by at least two unrelated gene families, referred to as V1Rs  and V2Rs [3-5], encoding G protein-coupled receptors (GPCRs) that are expressed on surfaces of sensory neurons in the VNO.
In addition to being the principle chemoreceptors in the olfactory and gustatory sensory systems, the large GPCR superfamily has prominent roles in transducing signals for diverse ligands, including ions, hormones, neurotransmitters, nucleotides, and photons . The GPCR repertoire has a common seven transmembrane structure, and mammalian GPCR proteins are classified into six major families: the peptide-binding secretin types, the adhesion types that contain N-terminal domains with motifs implicated in cell adhesion functions (e.g., EGF-like repeats), the glutamate types (including TAS1 taste receptors and the V2Rs expressed in the VNO), the frizzled types, the Taste2 types (TAS2 taste receptors), and the large set of rhodopsin types (including the olfactory receptors) . V1Rs share a distant relationship with the Taste2- and rhodopsin-types of GPCRs that bind ligands within transmembrane cavities (and that lack the large N-terminal binding domains characteristic of other GPCRs).
V1R gene repertoires vary significantly among species. In mouse and rat, there are a total of approximately 150 V1R genes from 12 phylogenetically distinct subfamilies [7,8]; members of each subfamily tend to be clustered at one or two chromosomal locations, reflective of a recent history of expansion by tandem duplications. In human and chimpanzee, the V1R repertoire consists almost entirely of non-functional pseudogenes, consistent with the loss of VNO function possibly concurrent with the advent of trichromatic color vision during primate evolution . Surprisingly, the dog genome encodes only five intact V1R genes , the cow and opossum genomes encode less than a third the number of V1Rs as mouse , and several other mammalian species (such as pig , sheep , and ferret ) seem to similarly exhibit reduced VNO function as compared to rodents. Therefore, rodents appear to be exceptional with respect to their reliance on the VNO and V1R gene repertoires for mediating pheromone responses.
The mouse and rat V1R repertoires have modulated significantly since these two species diverged. In general, orthologous relationships between genes and syntenic relationships between clusters are difficult to map as a result of lineage-specific duplications and deletions. Two striking examples of delineation between the two species are the complete deletion of two subfamilies (V1Rh and V1Ri) in rat that represent the largest gene cluster in mouse, and the independent generation of two subfamilies (V1Ra and V1Rb) in both mouse and rat from a small number of ancestral genes in the presumptive ancestor [7,8,13]. Moreover, the V1Ra/V1Rb duplications in both species appear to have occurred over a very short period of evolution just following mouse-rat speciation, probably driven in part by a wave of LINE repeat integration within the locus . These observations suggest that adaptive modulation of V1R repertoires were important in the reinforcement of interspecies communication barriers in the period immediately following speciation.
Unlike the odorant receptors of the main olfactory system, which exhibit ligand promiscuity as part of a combinatorial system of odor recognition (reviewed in ), the V1Rs of the VNO might respond to a very narrow range of ligands, if not exclusively to one ligand . Moreover, while odorant receptor sensory neurons of the main olfactory system exhibit responsiveness in a concentration-dependent manner, the subset of vomeronasal sensory neurons responding to a particular stimulus do not change with the concentration of that stimulus. These observations suggest that different members of a V1R subfamily detect highly specific ligands, implying that modulation of subfamily repertoires could modulate physiologically distinct functions.
Little is currently known about V1R function. A recent knockout of the entire V1Ra-V1Rb cluster in mouse perturbed normal patterns of female aggression and caused male mice to show reduced sexual behavior towards females . Similarly broad effects in sexual and social behaviors are evident in mice genetically deficient in Trp2, a protein required for vomeronasal neuronal signalling . The low resolution of these phenotypes and the apparent phenotypic overlap between mice deficient in a subset of V1Rs and those fully deficient in V1R response suggest that individual V1Rs make combinatorial or additive contributions to broad behavioural patterns, as opposed to each V1R directing a distinct behaviour. To our knowledge, the hypothesis that V1Rs contribute to the establishment of mating barriers between species has not yet been investigated.
It is presumed that one function of pheromones is the establishment of prezygotic mating barriers so that unproductive mating is not attempted (reviewed in ). Prezygotic mating barriers can quickly arise to prevent non-productive interbreeding attempts between subpopulations otherwise capable of mating , and therefore, modulation of pheromones and their receptors might be especially important for selective breeding within sympatric populations. For example, the Mediterranean short-tailed mouse, Mus spretus, is sympatric with certain subspecies of Mus musculus in parts of southern Europe and North Africa. No hybrids of these two species have been observed in nature, suggesting that they will not interbreed (presumably due to a selective disadvantage in hybrid offspring), even though they can produce viable offspring in the laboratory . These two species diverged approximately 1.1 million years ago [21,22]; with an estimated neutral substitution rate of ~1% per million years [23-25], we expect that typical orthologs will be ~98% identical between the two species.
In this study, we compare V1Ra and V1Rb gene repertoires in Mus spretus and Mus musculus in order to investigate whether, even over these very short evolutionary periods and in a background of very high sequence identity, these V1R repertoires exhibit the dynamic functional modulation observed between mouse and rat. Our results indicate that functional modulation of rodent V1R repertoires has occurred, albeit by diverse evolutionary paths, supporting the hypothesis that adaptive changes in these V1R repertoires have contributed to the delineation of even very closely related species.
We produced a mixture of Mus musculus V1R probes composed of 200–300 bp PCR products from three members of the V1Ra and four members of the V1Rb subfamily (Table (Table1).1). We confirmed probe specificity by conducting a Southern blotting of musculus BAC clones known to contain V1R sequences from the V1Ra, V1Rb, V1Rc, V1Rh, and V1Ri subfamilies; only clones containing V1Ra/V1Rb sequences hybridized to our probes (not shown). We then used these V1Ra/V1Rb probes to screen the CHORI-35 SPRET/Ei BAC library http://bacpac.chori.org/library.php?id=170. We identified 17 positive clones; we expected to identify approximately 12 positive clones given the redundancy of the library (three-fold) and the anticipated size of the V1Ra/V1Rb cluster (~700 kb, or approximately four BAC-sized inserts in length). We sequenced the ends of all 17 BACs and confirmed that all non-repeat BAC-end sequences were unambiguously orthologous to the V1Ra/V1Rb locus in the most recent musculus genome assembly (UCSC Genome Browser, July 2007). These BAC-end sequences, along with an independent Southern blot restriction fragment analysis of the isolated clones, were used to select a minimal tiling path of four BAC clones that we thought would span the entire Spretus V1Ra/V1Rb cluster, with the two flanking BACs extending out to at least one non-V1R gene at either end of the cluster. These four BACs were then subjected to shotgun sequencing, yielding near 'comparative-grade' finished sequence as described in Blakesley et al.  (Genbank accession numbers: AC225052, AC225271, AC225873, AC229624).
We mapped contigs from spretus BAC sequences onto the most recent musculus assembly using PipMaker http://pipmaker.bx.psu.edu/pipmaker. We generated a presumptive locus assembly of ordered/oriented spretus contigs that maximizes contiguous alignment between the two species (Fig. (Fig.1).1). Chained alignments between musculus and spretus sequences encompass ~485 kb, or approximately 68% of the spretus assembly and 67% of the musculus assembly. The remaining portion of the two mouse sequences that cannot be unambiguously aligned consists predominantly of repeats that have integrated or deleted in one but not the other lineage (see following section). It is possible that this presumptive assembly is incorrect if there have been additional evolutionary events that have disrupted local synteny. However, we note that the resulting spretus sequence produces an array of putative orthologous V1R genes that exactly matches the order and orientation observed in musculus.
The synteny map shown in Figure Figure11 reveals three ambiguities. First, the spretus sequences at one end of the cluster have subregions that exhibit lower than expected levels of orthology. As noted, neutral sequences are expected to diverge ~2% between the musculus-spretus split, and yet, orthologous sequences in the vicinity of the musculus V1rb7 gene, as well as other subregions (e.g., in the vicinity of the spretus YUA.6pg/musculus MUSpg.89648 orthologs), exhibit unexpectedly high divergence (Fig. (Fig.2).2). The V1rb7-YUA.5 are closest mutual homologs in the same relative position and orientation in their respective clusters, and therefore are putative orthologs, yet exhibit a synonymous substitution rate (dS) of ~17%, which is much greater than should be the case in the roughly one million years since the two Mus species diverged. It seems likely that these subregions have been subject to gene conversion events in one or both species that have obscured what is otherwise a much closer relationship. Nevertheless, this high level of sequence divergence might underlie divergent functions between these orthologs.
Second, the spretus locus appears to have deleted a segment in the vicinity of the musculus V1ra8-V1rb3 genes, although it is possible the segment is missing at this phase of sequence completion (Fig. (Fig.1).1). In musculus, the similar V1ra7 and V1ra8 paralogs exhibit a dS = ~6.5%, consistent with a duplication event that occurred prior to the musculus-spretus split. Therefore, the absence of V1ra8 in the spretus assembly is more consistent with loss in spretus than gain in musculus. The fact that V1rb3, the next gene in the musculus cluster, is also missing in spretus reinforces the hypothesis that a segmental deletion that includes both V1ra8 and V1rb3 occurred in the spretus lineage.
Third, the spretus locus is unresolved in the vicinity of the musculus V1ra3-V1ra4 paralogs (Fig. (Fig.1).1). This region is near the junction of the two middle BACs (CH35-373N1 and CH35-336J18). There is no apparent overlap in these two BACs (i.e., with 100% sequence identity), although both BACs have highly similar subsequences as a consequence of each having possible orthologs to V1ra3/V1ra4 (YUB.3 and YUC.6, respectively). dS values predict that paralagous duplications in this region occurred prior to the musculus-spretus split (dS for V1ra3-V1ra4 = ~6%; dS for YUB.3-YUC.6 = ~5%), and yet the dS values of cross-species comparisons are greater than expected for orthology (dS range = ~4–7%). Therefore, the evolutionary history relating these four homologs might again be obscured by lineage-specific gene conversions. We also note that no putative ortholog to the YUC.7pg V1R pseudogene (most similar to musculus V1rb4), located just downstream of YUC.6 on CH35-336J18, is present in the musculus assembly, possibly indicating that lineage-specific rearrangements might have disrupted synteny in this region. Further resolution of this phylogeny would require higher-quality and more complete sequence across this region, so that the full V1R content/positioning and surrounding non-coding gene blocks through this region could be more rigorously analyzed.
We previously observed that the V1Ra and V1Rb subfamilies underwent a burst of duplications independently in both the mouse and rat lineage shortly after these two species diverged . The timing of these gene duplications was correlated with the timing of LINE repeat integrations at this locus, possibly suggesting a cause-effect relationship. As in musculus and rat, LINE repeats are abundant within the spretus V1Ra/V1Rb locus, comprising ~39% of the sequence and clearly demarking this V1R gene cluster from surrounding LINE-poor non-V1R territory (Fig. (Fig.1;1; Table Table2).2). As noted above, a significant fraction of the broken synteny between musculus and spretus is LINE content present in one but not the other lineage. However, unlike with mouse versus rat, where lineage-specific LINE integration accounted for the vast majority of broken synteny, we find very little evidence for lineage-specific LINE integration since the spretus-musculus split (Fig. (Fig.3;3; Table Table2).2). Therefore, the LINE (and other repeat) blocks present in one but not the other Mus species are more likely due to lineage-specific deletion, recombination, and/or gene conversion of pre-existing LINE content.
V1R orthologs are easily identified based on two criteria: first, orthologous pairs occupy the same relative position and orientation in the syntenic map (Fig. (Fig.1);1); and second, orthologous pairs exhibit synonymous substitution (dS) rates approximating the expected ~2% in codon-aligned sequences (Table (Table3).3). Noted exceptions with conserved synteny but with higher than expected dS levels (the YUA.5-MUS.B7 and YUA.6pg-MUSpg.89648 putative orthologous pairs) and where orthologous relationships are ambiguous (the YUB.3-YUC.6-MUS.A3-MUS.A4 orthologous groups) are discussed separately below. A phylogenetic tree of spretus V1R genes and pseudogenes that includes members of the V1Ra and V1Rb subfamilies from musculus and rat illustrates these orthologous relationships (Fig. (Fig.44).
This tree suggests that the rodent ancestral cluster probably had at least five V1R genes, four from the V1Ra subfamily (V1ra1-like, V1ra7-like, V1ra9-like, and V1ra16-like) and one from the V1Rb subfamily. The V1ra9-like ancestor at one end of the cluster has been deleted in rat, and the V1ra16-like ancestor at the other end of the cluster became a pseudogene in mouse. The three other ancestral genes (the V1Rb-like, V1Ra1-like, and the V1ra7-like ancestors) duplicated independently in the mouse and rat lineages (Fig. (Fig.4).4). Therefore, the functional repertoire is significantly different between mouse and rat, driven by lineage-specific mutations, deletions and duplications.
We do not identify any lineage-specific V1R duplications since the musculus-spretus split. The YUD.3-MUS.A7-MUS.A8 clade in Figure Figure44 shows an apparent musculus-specific duplication, however as noted previously, the synonymous substitution rate (dS) between the A7-A8 pair is ~6.5%, or much greater than ~2% expected for the spretus-musculus split, and thus, the paralogous duplication probably occurred before the two mouse species diverged. Therefore, this clade is better explained by spretus-specific gene deletion of the A8 homolog (assuming it is not missing within gaps of the unfinished spretus sequence). In addition, the YUB.3-YUC.6-MUS.A4-MUS.A3 clade (Figure (Figure4)4) also shows apparent species-specific duplications, however as noted previously, neither the spretus pair (YUB.3-YUC.6, dS = 5%) nor the musculus pair (MUS.A4-MUS.A3, dS = 6%) exhibit a low enough synonymous substitution rate to be consistent with post-speciation duplication. Instead, it seems more likely that a paralogous duplication occurred prior to the musculus-spretus split, and that the unexpected topology of this clade is due to noise in the analysis or to gene conversion events that might have obscured relationships. Therefore, in contrast to mouse and rat, the musculus-spretus V1Ra/V1Rb repertoires have not significantly diverged by lineage-specific gene duplications, a result that seems consistent with the observed low incidence of lineage-specific repeat activity.
Instead, divergence of V1Ra/V1Rb repertoires between spretus and musculus is evident by selective loss of function (Table (Table3).3). Of the eight orthologs in the V1ra1-like clade, two have apparently become pseudogenes in musculus. Of the three orthologs in the V1ra7-like-V1ra9-like clade, one has apparently deleted in spretus. And of the seven orthologs in the V1Rb clade, one has deleted and another has become a pseudogene in spretus. Therefore, of the 18 V1Ra/V1Rb genes presumed to be functional in the common Mus ancestor, five (~28%) have lost function in one but not the other Mus lineage.
We emphasize however, that additional V1R sequences could reside within the remaining gaps in the current spretus sequence assembly. We compared in silico digests of the BAC sequences by three separate restriction endonucleases (BamH1, EcoR1, and HindIII) to the same generated in the laboratory and imaged on a high resolution gel electropherogram. This comparison of sequence to physical data allows a validation of sequence assembly (order and orientation) for each BAC. From this analysis, we estimate that all gaps internal to each BAC assembly are <2 kb (not shown), too small to be expected to contain additional V1R gene blocks (~5–10 kb in size). It is also unlikely that a V1R gene resides between the leftmost two BACs (CH35-319B4 and CH35-373N1) in the spretus assembly, since based on mapping to the musculus assembly, the missing inter-BAC region is only ~3.5 kb and this region does not contain a V1R gene in musculus. In contrast, the unresolved region between the two middle BACs (CH35-373N1 and CH35-336J18) might be >40 kb and is likely to contain several V1Rs. In particular, identification of the complete set of the "A3-like" and "A4-like" homologs in spretus, and their evolutionary relationship with V1ra3 and V1ra4 in musculus, will require the identification and analysis of a genomic DNA clone that spans this gap in our assembly. Thus, it remains possible that additional functional divergence between spretus and musculus will be revealed by future sequence from this region.
It has been previously noted that rodent V1R homologs exhibit unusually low dS/dN ratios as compared to typical gene pairs in these species , suggestive of a tendency for adaptive selection, presumably to meet changing niche- and species-specific requirements. Previous studies have focused on mouse paralogs, rat paralogs, and mouse-rat homologs, but not mouse-rat orthology, because orthologs cannot be unambiguously assigned due to excessive lineage-specific duplications that obscure orthologous relationships. In the current study, we specifically investigated selective pressures acting on putative functional orthologs, an investigation made possible by the unambiguous assignment of orthologous gene pairs.
We observe that two orthologous pairs, YUB.2-MUS.A5 (dS/dN = 0.99) and YUC.1-MUS.B2 (dS/dN = 0.25), exhibit greater non-synonymous than synonymous substitution rates (Table (Table3).3). This observation is consistent with adaptive selection acting on these pairs since the musculus-spretus split. Selection for new amino acids in these V1R proteins could indicate divergence from ancestral functions, and therefore, these V1Rs might contribute to additional functional delineation between the two species. It is important to note that with so few nucleotide substitutions accumulated between these orthologs (9 and 15 mutations, respectively), dS/dN ratios can be misleading due to small sample sizes. For example, we cannot assert that the dS/dN ratio for the YUB.2-MUS.A5 pair is significantly different than neutrality (dS/dN = 1), a possibility if one or both genes have become cryptic pseudogenes (i.e., non-functional as opposed to diverged function). The argument for adaptive selection between YUC.1-MUS.B2 is much more compelling, where 14 of 15 observed mutations have caused amino acid changes.
In order to assess the possible biological significance of amino acid substitutions observed in these two V1R gene pairs, we mapped mutations onto predicted protein structures. We compared these locations to other gene pairs presumed to be under purifying selection (six orthologous pairs with dS/dN > 2) (Table (Table4).4). Non-synonymous mutations in the six V1R pairs presumed to be under purifying selection have accumulated more frequently within transmembrane domain-4 (TM4), TM6, and the intracellular loop between TM5 and TM6. Non-synonymous mutations in YUB.2-MUS.A5 and YUC.1-MUS.B2, the two V1R pairs postulated to be under positive selection, also exhibit a bias for accumulation within the intracellular loop between TM5 and TM6, but exhibit a greater incidence of amino acid change in TM3 and TM1. Three non-synonymous mutations (3/44 opportunities = ~7% incidence) have accumulated within TM3 (two for YUB2-MUSA5 and one for YUC1-MUSB2), whereas no amino acid substitutions are observed in this transmembrane domain among the other intact genes (0/132 opportunities = 0% incidence). Five non-synonymous mutations (5/24 opportunities = ~21% incidence) have accumulated within TM1 (one for YUB2-MUSA5 and four for YUC1-MUSB2), whereas only one amino acid substitution is observed in this transmembrane domain among the other intact genes (1/72 opportunities = ~1% incidence). Interestingly, previous analysis of mouse-rat gene pairs indicated a paucity of non-synonymous mutations within TM1 and TM3 , raising the possibility that different regions of the V1R structure adapted during the mouse-rat split than during the musculus-spretus split.
At higher resolution, we observe an interesting mutational trend for YUB.2-MUS.A5 and YUC.1-MUS.B2: a tendency to accumulate non-synonymous substitutions immediately adjacent to the most universally conserved amino acid residues (Table (Table4).4). Assuming a random distribution, we would expect a 21% incidence of amino acid changes immediately adjacent (+/- 1 amino acid position) to the 21 most well-conserved residues (see Methods). This expectation is slightly greater than what we observe for six gene pairs presumed to be evolving under purifying selection (6/36 amino acid changes = 17%). In contrast, 35% (8/23) of the amino acid changes in YUB.2-MUS.A5 and YUC.1-MUS.B2 occur immediately adjacent to these well-conserved residues. Note that only in one of these eight cases did the well-conserved amino acid itself change, therefore this bias would appear to have more to do with modulating rather than eliminating the function of these conserved residues. Analysis of a larger sample size when additional Mus spretus V1R sequences become available are required to further evaluate the significance of these observations, and elucidation of the physiological significance of these amino acid changes awaits structure-function and ligand-binding studies.
We have generated ~700 kb of genome sequence encompassing a V1R putative pheromone receptor locus (V1Ra and V1Rb subfamilies) in Mus spretus. The genome features of this locus resemble the syntenic locus in Mus musculus, including a comparable number of V1R genes with little evidence for gene duplication/deletion, organized in a comparably sized region flanked by the same non-V1R genes on either side. These two mouse V1R loci also exhibit a similar high density of LINE repeat content that appears to have predominantly integrated just before and since the mouse-rat split. Although disruptions in synteny are largely accounted for by the presence/absence of LINE repeat content in one or the other lineage, these LINEs appear to be too old to have integrated by retrotransposition since the musculus-spretus split, and thus, seem to have either arrived by other mechanisms or been lost in one of the lineages. Therefore, LINE integrations and V1R gene birth/death processes are not characteristics of the divergence between musculus and spretus, in contrast to the previous observations made for mouse and rat. Instead, functional V1R repertoires in musculus and spretus appear to have diverged by adaptive evolution in which complementary V1R subsets have deleted along the two lineages and specific orthologs appear to be subject to diversifying selection and gene conversion. While the evolutionary paths differ, the outcome is similar among mouse and rat species: nearly half of the mouse V1R repertoire (8/18) have been subject to dynamic modulation, consistent with the hypothesis that even very closely related species, such as Mus musculus and Mus spretus that are separated by roughly ~ one million years, have evolved species-specific chemosensory functions.
A panel of mouse V1R probes was selected that would together hybridize to all V1R genes in the Mus musculus V1Ra/V1Rb cluster on chromosome 6, assuming hybridization required >90% nucleotide identity (Table (Table1).1). Gene-specific PCR primers were used to amplify 200–300 bp from each gene; PCR was conducted using digoxigenin- (Dig-) labeled dUTP (Roche). Probes were mixed for subsequent medium-high stringency hybridization to genomic filters of an arrayed Mus spretus BAC library (CHORI-35 SPRET/Ei BAC library, Children's Hospital Oakland Research Institute). Glycerol stocks were prepared for positive clones, and BAC DNA was isolated using the Qiagen Midi Prep kit (Qiagen).
BAC ends were sequenced using proprietary protocols by Agencourt Bioscience QuickLane Sequencing service using the SP6 and T7 priming sites on the pTARBAC2.1 vector DNA. Resulting BAC-end sequences were BLAT-matched onto the Mus musculus sequence assembly (UCSC Genome Browser, July 2007) in order to provide initial mapping information, assuming that synteny was conserved between the two closely related Mus species. In addition, EcoR1-digested BAC DNA was Southern blotted and hybridized with the above V1R probe mixture to estimate the number of V1R genes per BAC and to provide additional mapping resolution. From these analyses, four BAC clones (CH35-319B4, CH35-373N1, CH35-336J18, CH35-362A11) that appeared to be an efficient tiling path spanning the entire V1R cluster were selected for sequencing.
The selected BACs were subjected to shotgun sequencing using standard methods, resulting in 7, 6, 8, and 18 contigs in the four BAC assemblies, respectively. These assemblies were subjected to near 'comparative-grade' sequence finishing as described in Blakesley et al. . A majority of contigs were ordered and oriented by read pairs of gap spanning subclones. Other contigs were established by alignment to the reference (M. musculus) sequence (see below). Contig maps were verified by comparing restriction digest patterns from laboratory produced gels to those generated in silico from the consensus sequences. The four BAC sequences are available in Genbank with the following accession numbers (gi numbers indicate draft versions used in this analysis): AC225052 (gi:189095727), AC225271 (gi:183227749), AC225873 (gi:187960227), AC229624 (gi:192807370).
The CH35-319B4 BAC sequence was assembled into seven contigs that include the ortholog to the Txnrd3 gene that flanks the "left" end of the V1Ra/V1Rb locus in the Mus musculus assembly, as well as putative orthologs to V1rb7, V1rb9, and V1Ra6, the three "left"-most V1Rs in this assembly. The CH35-373N1 BAC sequence was assembled into six contigs that include putative orthologs to V1ra5, V1rb4, V1ra2, V1rb8, and V1ra4, consistent with synteny to the next segment (moving "right") in the musculus assembly. The CH35-336J18 BAC sequence was assembled into eight contigs that include putative orthologs to V1Ra3, V1rb2, V1rb1, and V1ra1, consistent with sytneny to the next segment (moving "right") in the musculus assembly. The CH35-362A11 BAC sequence was assembled into 18 contigs that include putative orthologs to V1ra7 and V1ra9, two "rightmost" V1Rs in the musculus cluster, as well as the ortholog to the Uroc1 gene that flanks the "right" end of the musculus locus. We confirmed a ~9.5 kb overlap of the CH35-336J18 and CH35-362A11 BACs (the two "rightmost" BACs), but cannot confirm overlap for the other two junctions. Based on synteny mapping, we estimate that the gap between CH35-319B4 and CH35-373N1 (the two "leftmost" BACs) is approximately 3.5 kb, a region not predicted to contain V1R-like sequence. The junction between the two middle BACs, CH35-373N1 and CH35-336J18, is less resolved. As indicated, the former BAC contains a putative ortholog to V1rb8, and the latter BAC contains a putative ortholog to V1rb2 (located ~95 kb to the "right" of V1rb8), and both BACs contain putative orthologs to the similar V1ra3 and V1ra4 gene pair located between V1rb8 and V1rb2 in the musculus assembly (Fig. (Fig.1).1). However, our phylogenetic and neutral substitution analysis (see text) indicates that there might have been lineage-specific gene conversions or recombinations involving the V1ra3-V1ra4 region, and therefore, it is not possible to unambiguously map synteny where these two middle BACs intersect. Contigs were assembled and oriented for each spretus BAC in order to maximize contiguous alignment with the musculus assembly; short contigs without unambiguous alignment to musculus were omitted from this "locus assembly". The resulting spretus sequence assembly was analyzed for repeat content using the RepeatMasker algorithm (http://www.repeatmasker.org; Institute for Systems Biology, Seattle).
All spretus contigs (including short contigs excluded from the "locus assembly") were surveyed for V1R gene content using a virtual V1R probe designed from a composite of all known musculus V1R sequences. A total of 22 V1R-like sequences were identified, including 15 intact V1Rs that encode open reading frames. The 15 intact V1Rs, along with five of the pseudogenes, could be confidently aligned over a 906-bp stretch. One of the excluded pseudogenes, YUA.1pg (located between orthologs to V1rb9 and V1ra6 on CH35-319B4), is located at a syntenic position to a pseudogene in musculus located at the 89739 kb position on chromosome 6 (UCSC Genome Browser, July 2007 assembly). The other pseudogene excluded from the alignment, YUD.1 pg (located between orthologs to V1ra7 and V1ra9 on CH35-362A11), is located at a syntenic position to a pseudogene in musculus located at the 90192 kb position on chromosome 6. We note that the spretus YUC.4 V1R was present in the overlapping sequence of both the CH35-362A11 and CH35-336J18 BAC assemblies. We produced nucleotide alignments derived from amino acid alignments of the 20 V1R sequences ("codon-aligned"; see Additional file 1), as well as homologous sequences from musculus and rat. We used the SNAP on-line tool http://www.hiv.lanl.gov/content/sequence/SNAP/SNAP.html to analyze synonymous and non-synonymous substitution histories, and the Paup program (Sinauer Associates) for phylogenetic reconstructions.
VCK designed and produced V1R probes, screened and isolated clones from the spretus BAC library, and conducted Southern blots used to select BAC clones for subsequent sequencing. MG prepared BAC clones for sequencing and conducted sequence analyses. The NISC Comparative Sequencing Program performed shotgun sequencing of the BAC clones to produce high-quality sequence data; this included detailed sequencing finishing and confirmatory bioinformatics analyses to validate the accuracy of the generated sequence. EDG supervised the BAC sequencing process and contributed to the intellectual content of the manuscript. RPL conducted evolutionary analyses and wrote the manuscript.
V1R gene alignments used in this study.
This work was supported by National Institutes of Health Grant R01-DC006267. The work was also supported in part by the Intramural Program of the National Human Genome Research Institute, National Institutes of Health. We thank various members of the NIH Intramural Sequencing Center (NISC) for their role in generating the BAC sequences. In particular, we thank Robert Blakesley for his leadership in all aspects of the sequence generation and finishing process, including ensuring the accurate description of these activities in this manuscript, and Gerry Bouffard for his leadership in the bioinformatics components of the BAC-sequencing effort.