In vertebrates, olfaction is mediated by several families of G protein–coupled receptors (GPCRs) including odorant receptors (ORs). In this study, we investigated the antiquity of OR genes by searching for amino acid motifs found in chordate ORs among the protein predictions from 12 nonchordate species. Our search uncovered a novel group of genes in the cnidarian Nematostella vectensis. Phylogenetic analysis that included representatives from the other major lineages of rhodopsin-like GPCRs showed that the cnidarian genes, the cephalochordate and vertebrate ORs, and a family of genes from the echinoderm, Strongylocentrotus purpuratus, form a monophyletic clade. The taxonomic distribution of these genes indicates that the formation of this clade and therefore the diversification of the rhodopsin-like GPCR family began at least 700 million years ago, prior to the divergence of cnidarians and bilaterians. ORs and other rhodopsin-like GPCRs have roles in cell migration, axon guidance, and neurite growth; therefore, duplication and divergence in this family may have played a key role in the evolution of cell type diversity (including the emergence of complex nervous systems) and in the evolution of metazoan body plan diversity.
Vertebrate olfaction involves several families of G protein–coupled receptors (GPCRs), including the odorant receptors (ORs) (Buck and Axel 1991), the trace amine–associated receptors (TAARs) (Liberles and Buck 2006), the type 1 (Dulac and Axel 1995) and type 2 vomeronasal receptors (Herrada and Dulac 1997; Matsunami and Buck 1997; Ryba and Tirindelli 1997), and the formyl peptide receptor–like proteins (FPRs) (Rivière et al. 2009). In this study, we focused on the ORs, which are GPCRs that belong to the rhodopsin-like gene family (Fredriksson and Schiöth 2005). Like other GPCRs, genes in the rhodopsin-like family encode seven-transmembrane (TM) domain receptors that convert external stimuli into intracellular biochemical signals. TAARs and FPRs also belong to the rhodopsin-like gene family, whereas the type 1 and type 2 vomeronasal receptors are not rhodopsin-like (reviewed in Bargmann 2006). Rhodopsin-like GPCRs are present in placozoans (Srivastava et al. 2008) and sponges (Srivastava et al. 2010), and the first representatives of this family are believed to have appeared between 580 and 800 million years ago (Römpler et al. 2007). The rhodopsin-like GPCRs found in humans can be divided into four subgroups: the α subgroup contains genes such as the amine receptors, melatonin receptors, TAARs, and opsins; the β subgroup contains peptide receptors (e.g., gonadotropin-releasing hormone receptors and oxytocin receptors); the γ subgroup includes chemokine, somatostatin, and opioid receptors; and the δ subgroup includes ORs, purinergic receptors, FPRs, and leucine-rich repeat containing GPCRs (Fredriksson et al. 2003).
The OR repertoires in several mammals, fish, amphibians, and cephalochordates have been described, yet orthologs of chordate ORs do not appear to occur in protostomes (reviewed in Kaupp 2010). Though chemosensory receptors have been described in protostomes (e.g., insects and nematode worms), these genes are unrelated to the ORs found in chordates. The apparent absence of chordate ORs in the protostomes does not mean that they are a deuterostome innovation. To explore the origin of ORs relative to the emergence of early metazoan lineages, we used amino acid motifs (table 1) that are commonly found in cephalochordate and vertebrate ORs (Churcher and Taylor 2009) to query the protein predictions of 12 nonchordate species. Sequences possessing these motifs were then used as queries in Blastp searches for paralogous genes. Motif-possessing sequences and Blastp hits were then used in two phylogenetic analyses.
Using this approach, we uncovered a family of genes from the cnidarian Nematostella vectensis that belongs to a monophyletic clade that includes echinoderm, cephalochordate, and vertebrate ORs (fig. 1, supplementary fig. S1, Supplementary Material online). This N. vectensis gene family includes 35 full-length and 11 partial OR-like genes (supplementary table S1, Supplementary Material online) and contains six of the seven sequences with motif 1 and seven of the nine sequences with motif 2 (table 1). Sequence identity among the full-length OR-like genes over the TM-spanning domains ranges from 14% to 98%. Like the chordate ORs, many of these genes have only one exon and are tandemly arrayed in the N. vectensis genome (supplementary table S1, Supplementary Material online). Our data suggest that chordate ORs evolved from a distinct clade of rhodopsin-like genes that was present in the ancestor of cnidarians and bilaterians and that is approximately 700 million years (My) old (Putnam et al. 2007).
Our search also uncovered 27 full-length OR-like genes in the echinoderm Strongylocentrotus purpuratus that, together with the N. vectensis OR-like genes, form a monophyletic clade with the chordate ORs (fig. 1, supplementary fig. S1 and table S1, Supplementary Material online). Most of the 27 S. purpuratus genes are single exon genes that are tandemly arrayed in the sea urchin genome (supplementary table S1, Supplementary Material online). The 27 S. purpuratus genes belong to a family of at least 40 OR-like genes that were characterized by Burke et al. (2006) and Raible et al. (2006) (fig. 1, supplementary table S2, Supplementary Material online). Although 13 of the 40 genes do not possess our query motifs and were not Blastp hits to motif-containing sequences, they are single exon genes and most are linked to genes that were detected in our survey. We therefore suspected that the 13 genes may be related to the genes from this survey. We added them to our alignment and phylogenetic analysis, and our suspicions were confirmed: the 13 additional genes form a monophyletic clade with the 27 genes from this survey. More importantly, this new analysis clarifies the position of the S. purpuratus OR-like gene family relative to chordate ORs and other rhodopsin-like GPCRs.
Three other proteins in our survey contained a query motif. Two were from N. vectensis and one was from the gastropod snail Lottia gigantea. One of the N. vectensis genes has motif 2 (table 1). Phylogenetic analysis showed that this gene, and eight genes that were Blastp hits to this gene, form a monophyletic clade with the α subgroup of rhodopsin-like GPCRs (data not shown). The other N. vectensis gene has motifs 1 and 2 (table 1) and retrieved a single Blastp hit. Although these two genes are clearly related to each other, we were unable to assign them to a rhodopsin-like GPCR clade based on the amino acid positions used in our phylogenetic analysis. The L. gigantea protein has motif 5 (table 1) and retrieved a single hit in our Blastp search for paralogs. Both of the L. gigantea genes are most closely related to receptors in the α subgroup of rhodopsin-like GPCRs (data not shown).
Consistent with previous genome surveys, we did not uncover orthologs of chordate ORs in the protostomes surveyed. Therefore, chordate ORs join the list of genes that are present in N. vectensis and vertebrates but appear to have been lost in flies and nematode worms (Putnam et al. 2007). In mammals, fish, and cephalochordates, the OR genes appear to have arisen from only a few ancestral genes (Niimura and Nei 2005; Niimura 2009; Churcher and Taylor 2009) and therefore the apparent lack of chordate ORs in the protostomes may be the result of the loss of one or a few genes in the common ancestor of protostomes. Although our survey included representatives from both the ecdysozoa and lophotrochozoa, it remains possible that OR orthologs occur in unsurveyed protostomes.
Although our survey uncovered orthologs of chordate ORs in N. vectensis, we did not find representatives of this clade in the freshwater hydrozoan Hydra magnipapillata. This result is not surprising given that the amino acid substitution rates in H. magnipapillata proteins relative to their human orthologs are typically greater than the substitution rates found in N. vectensis and that the divergence time between anthozoans and H. magnipapillata is estimated to be 540 My (Chapman et al. 2010). It is possible that OR-like genes are present in H. magnipapillata and other hydrozoans but that these genes lack our query motifs. Many of the N. vectensis genes described above lack the motifs, as do many of the S. purpuratus genes. Moreover, although the motifs are common in chordate ORs, not all chordate ORs possess them. Therefore, these motifs can uncover OR orthologs in very distantly related metazoans, but the lack of motif-possessing sequences in a given genome does not rule out the possibility that OR orthologs are present.
Although the cnidarian and echinoderm genes are more closely related to the chordate ORs than any other rhodopsin-like GPCRs, the function of these genes is not yet clear. Expression data suggest that the amphioxus genes in this clade function as ORs (Satoh 2005), but without experimental data from cnidarians and echinoderms, we cannot attribute the same role to these genes. The conservation of motif 1 over time, however, combined with the results of our phylogenetic analysis makes these N. vectensis and S. purpuratus OR-like genes excellent candidates for expression and functional analysis.
Several amino acid positions in the N. vectensis OR-like genes (fig. 2) are also conserved in cephalochordate (Churcher and Taylor 2009) and vertebrate ORs (Alioto and Ngai 2005). In more closely related sequences (e.g., mouse or rat ORs), these sites are not obvious because residues at other positions can appear highly conserved simply because they occur in a large number of genes produced by recent duplication events. The discovery of orthologs of chordate ORs in cnidarians, however, allows us to compare genes among lineages that diverged a long time ago and therefore exposes sites that are likely to be functionally significant.
Conserved amino acid positions found in N. vectensis and chordate ORs include an asparagine (N) in transmembrane domain one (TM1) and several leucines (L), a proline (P), and an aspartate (D) in intracellular loop one (IL1) and TM2 (fig. 2). These residues were part of our query motifs (motifs 1 and 2) that also include a tyrosine (Y) residue. Although the tyrosine residue is present in a subset of N. vectensis OR-like genes, it appears to have been lost in the majority. This residue is also absent in some cephalochordate ORs (Churcher and Taylor 2009) and zebrafish ORs (Alioto and Ngai 2005). In chordate ORs, the KAxxTxxxH (where x represents a variable amino acid position) motif in IL3 occurs frequently (Churcher and Taylor 2009). In N. vectensis OR-like genes, however, only the lysine (K) and the threonine (T) residues are common.
The N. vectensis OR-like genes also have several amino acid residues that are common to rhodopsin-like GPCRs and that are believed to have roles in receptor activation. These include the D/ERY motif at the boundary between TM3 and IL2, a tryptophan (W) residue in TM4 and the NPxxY motif in TM7. The arginine (R) in the D/ERY motif and the NPxxY motif are believed to function in receptor activation (reviewed in Nygaard et al. 2009).
Most of the conserved residues occur in or very close to the intracellular loops of the protein (fig. 2). In other GPCRs, the intracellular loops interact with G proteins and other proteins on the cytosolic side of the cell (reviewed in Ritter and Hall 2009). In the mouse odorant receptor, mOR-EG, conserved positions within the intracellular loops are required for receptor function (Kato et al. 2008). It is therefore possible that signal transduction in cnidarians and chordates is regulated in a similar fashion.
Our phylogenetic analysis, combined with the data presented by Anctil (2009), shows that there was a diversity of rhodopsin-like GPCRs in the cnidarian–bilaterian ancestor; cnidarians have representatives of at least three of the four main subgroups of rhodopsin-like GPCRs found in humans (fig. 1, supplementary table S2, Supplementary Material online). Because the receptors involved in olfaction represent much of the diversity within the rhodopsin-like gene family (e.g., ORs, TAARs, and FPRs) and receptors for environmental sensing are likely to have evolved prior to those for cell–cell communication, sensory receptors such as the ORs may be appropriate outgroups for understanding the evolution of rhodopsin-like GPCRs that have other functions.
We suggest that the evolution of rhodopsin-like GPCRs, including ORs, played a key role in metazoan evolution. These genes may have been associated with the emergence of cells that could migrate from one location to another and cells that form connections with distant cells. This hypothesis is based on the observation that several rhodopsin-like GPCRs have roles in cellular development. For example, the δ subgroup includes genes that influence axon path-finding (Mombaerts et al. 1996; Wang et al. 1998; Bozza et al. 2002); the γ subgroup includes genes that regulate the formation of nerve cell projections and cell migration (Li and Ransohoff 2008; Tiveron and Cremer 2008); and the α subgroup contains genes involved in neurite growth (Gaspar et al. 2003; Jordan et al. 2005; McKenna et al. 2008; Bhide 2009; Galve-Roperh et al. 2009). This hypothesis may explain why organisms with unorthodox body plans and sensory organs such as the echinoderms have unusual repertoires of rhodopsin-like GPCRs (described in Raible et al. 2006). Thus, the duplication and divergence of rhodopsin-like GPCRs in early metazoans may have been a prerequisite for the cellular and neuronal flexibility that led to the evolution of complex body plans.
The protein predictions from Monosiga brevicollis (n = 9196, genome assembly version 1.0; King et al. 2008), Capitella sp. I (n = 32,415, genome assembly version 1.0, unpublished data), Helobdella robusta (n = 23,432, genome assembly version 1.0, unpublished data), L. gigantea (n = 23,851, genome assembly version 1.0, unpublished data), Trichoplax adhaerens (n = 11,520, genome assembly version 1.0; Srivastava et al. 2008), N. vectensis (n = 27,273, genome assembly version 1.0; Putnam et al. 2007), and Ciona intestinalis (n = 14,002, genome assembly version 2.0; Dehal et al. 2002) were downloaded from the US Department of Energy Joint Genome Institute (http://www.jgi.doe.gov). The sea urchin, S. purpuratus, proteins (n = 28,944, assembly version 0.5; Sea Urchin Genome Sequencing Consortium 2006) were downloaded from the Human Genome Sequencing Center at Baylor College of Medicine (ftp://ftp.hgsc.bcm.tmc.edu/pub/data). The Caenorhabditis elegans proteins (n = 39,620, release WS190; C. elegans Sequencing Consortium 1998) were downloaded from WormBase (http://www.wormbase.org). The Drosophila melanogaster (n = 20,815, release 54; Adams et al. 2000) and Anopheles gambiae (n = 13,133, release 54; Holt et al. 2002) proteins were downloaded from Ensembl (www.ensembl.org). The H. magnipapillata proteins (n = 17,398, assembly version 1.0; Chapman et al. 2010) were downloaded from the National Center for Biotechnology Information (ftp://ftp.ncbi.nih.gov). Protein sets were used to construct 12 MySQL databases, which were searched using five amino acid motifs (table 1) that are present in less than 1% of non-OR rhodopsin-like GPCRs (Churcher and Taylor 2009). These motifs were derived from a diverse set of lamprey, fish, mammalian, and cephalochordate ORs and were selected because of their ability to discriminate between ORs and non-ORs from the rhodopsin-like GPCR family. Proteins containing a query motif were used in Blastp (Altschul et al. 1997) searches for paralogs. 
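The motif query described above can be illustrated with a minimal sketch. It is not the MySQL-based procedure used in this study: it swaps in plain Python regular expressions, and the function names, the toy FASTA records, and the motif label are all hypothetical. The five query motifs are listed in table 1 and are not reproduced here; the KAxxTxxxH pattern mentioned in the text for IL3 is used only as a stand-in, with x read as any residue.

```python
import re

def motif_to_regex(motif):
    """Compile a motif written with lowercase 'x' wildcards (e.g. 'KAxxTxxxH')."""
    pattern = "".join("." if ch == "x" else re.escape(ch) for ch in motif)
    return re.compile(pattern)

def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]   # identifier is the first token of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}

def scan(proteins, motifs):
    """Return {protein_id: [motif names found]} for proteins with at least one motif."""
    compiled = {name: motif_to_regex(m) for name, m in motifs.items()}
    hits = {}
    for pid, seq in proteins.items():
        found = [name for name, rx in compiled.items() if rx.search(seq)]
        if found:
            hits[pid] = found
    return hits

# Toy input (hypothetical sequences, not taken from any genome used in the study)
EXAMPLE_FASTA = """>prot_A toy sequence
MKTLLKAGSTWQEHLL
>prot_B toy sequence
MSSNNPLLGGAAVV
"""
EXAMPLE_MOTIFS = {"stand_in_motif": "KAxxTxxxH"}  # placeholder, not a table 1 motif

motif_hits = scan(parse_fasta(EXAMPLE_FASTA), EXAMPLE_MOTIFS)
print(motif_hits)
```

In this toy input only prot_A contains the stand-in pattern. In a real survey, each motif-containing protein would then become a query in the Blastp search for paralogs.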
Sequences that were at least 40% identical to the query sequence over a minimum of 100 amino acids were used in iterative Blastp searches until no more genes meeting our search criteria could be identified. Only hits to at least part of any of the TM domains were retained; however, three sequences were excluded because they do not appear to be GPCRs (S. purpuratus protein 028230, N. vectensis proteins 205877 and 212306). Proteins with all seven TM domains were considered full-length sequences; all others were considered partial sequences.
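The iterative search can be sketched as a simple closure over accepted hits. The sketch below assumes that each hit has already been reduced to a tuple of subject identifier, percent identity, and alignment length; the `search` callable and the toy hit table are hypothetical stand-ins for a real Blastp run, and the additional TM-domain and GPCR checks described above are not modeled.

```python
from collections import deque

MIN_IDENTITY = 40.0  # percent identity threshold stated in the text
MIN_LENGTH = 100     # minimum aligned length in amino acids stated in the text

def passes(hit):
    """hit is (subject_id, percent_identity, alignment_length)."""
    _, pident, length = hit
    return pident >= MIN_IDENTITY and length >= MIN_LENGTH

def iterative_search(seeds, search):
    """Query with every newly accepted sequence until no new sequences are found."""
    accepted = set(seeds)
    queue = deque(seeds)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            subject = hit[0]
            if passes(hit) and subject not in accepted:
                accepted.add(subject)
                queue.append(subject)  # the new member becomes a query in turn
    return accepted

# Hypothetical hit table standing in for Blastp output
TOY_HITS = {
    "seed1": [("geneA", 62.0, 250), ("geneB", 35.0, 300)],
    "geneA": [("geneC", 41.5, 120), ("geneD", 80.0, 60)],
    "geneC": [("seed1", 45.0, 200)],
}

def toy_search(query):
    return TOY_HITS.get(query, [])

family = iterative_search(["seed1"], toy_search)
print(sorted(family))
```

Here geneB is rejected for low identity and geneD for a short alignment, while geneC is reached only through geneA, which is the point of iterating.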
Motif-containing sequences and Blastp hits were aligned to vertebrate and cephalochordate ORs using ClustalW (Thompson et al. 1994) and the alignment was adjusted by hand in BioEdit (Hall 1999). The alignment file is included as supplementary figure S2 (Supplementary Material online). The Neighbor-Joining (NJ) tree was constructed in MEGA version 4.0 (Tamura et al. 2007) using the pairwise deletion option. The unrooted NJ tree is based on approximately 200 amino acid positions and Poisson-corrected distances. Support for nodes was estimated using 1,000 bootstrap replicates. Outgroup sequences from the protostomes, deuterostomes, cnidarians, and a placozoan were carefully selected to ensure that, where possible, representatives of the α, β, γ, and δ subgroups of rhodopsin-like GPCRs were included. The list of sequences can be found in supplementary table S2 (Supplementary Material online). Nematostella vectensis protein (214496) is very short and was excluded from the NJ tree to allow pairwise distances to be calculated from the alignment. Therefore, figure 1 includes 45 of the 46 N. vectensis OR-like genes. Tree topology was confirmed using PHYML (Guindon and Gascuel 2003) and the tree is included in supplementary figure S1 (Supplementary Material online).
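For readers unfamiliar with the distance measure, the following sketch shows a Poisson-corrected distance with pairwise deletion, d = -ln(1 - p), where p is the proportion of differing residues among the positions at which neither sequence of the pair has a gap. It is an illustration only, not the MEGA implementation; the function name and toy sequences are hypothetical.

```python
import math

def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected amino acid distance using pairwise deletion of gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: skip sites gapped in either sequence
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no shared ungapped positions; distance is undefined")
    p = differences / compared
    if p >= 1.0:
        raise ValueError("sequences differ at every compared site; distance is undefined")
    return -math.log(1.0 - p)

# Toy aligned sequences (hypothetical)
d_example = poisson_distance("ACDEFGHIKL", "ACDEFGHIKV")  # 1 difference in 10 sites
d_gapped = poisson_distance("ACDE-GHIKL", "ACDEFGHIKV")   # 1 difference in 9 sites
print(round(d_example, 4), round(d_gapped, 4))
```

The `compared == 0` branch shows one way a very short sequence can prevent a pairwise distance from being calculated: if it shares no ungapped positions with another sequence in the alignment, the distance for that pair is undefined.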
To highlight conserved amino acid residues we constructed a WebLogo (Crooks et al. 2004) from the alignment of 35 full-length N. vectensis OR-like genes (fig. 2). Amino acid residues that are commonly found in fish (Alioto and Ngai 2005), mouse (Alioto and Ngai 2005), and cephalochordate ORs (Churcher and Taylor 2009) are marked in the figure.
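The quantity displayed in a sequence logo can be approximated per alignment column as log2(20) minus the Shannon entropy of the observed residues. The sketch below makes that calculation explicit under simplifying assumptions: it ignores gaps, omits the small-sample correction that logo tools typically apply, and uses a hypothetical toy alignment and an arbitrary 3-bit cutoff rather than the 35 N. vectensis sequences.

```python
import math
from collections import Counter

def column_information(column, alphabet_size=20, gap="-"):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return math.log2(alphabet_size) - entropy

def conserved_positions(alignment, min_bits=3.0, gap="-"):
    """Return (position, most common residue, bits) for columns at or above min_bits."""
    result = []
    for pos, column in enumerate(zip(*alignment), start=1):
        bits = column_information(column, gap=gap)
        if bits >= min_bits:
            top = Counter(c for c in column if c != gap).most_common(1)[0][0]
            result.append((pos, top, round(bits, 2)))
    return result

# Toy alignment (hypothetical), loosely shaped like an NPxxY-type motif
TOY_ALIGNMENT = ["NPLLY", "NPAIY", "NPVLY", "NPGMY"]

conserved = conserved_positions(TOY_ALIGNMENT)
print(conserved)
```

Fully conserved columns reach the maximum of log2(20), about 4.32 bits, whereas the two variable middle columns fall below the cutoff.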
We are grateful to George O. Mackie for his contribution and to Christine E. Churcher for her editorial support. This work was supported by grants from the Canadian Foundation for Innovation (9639 to J.S.T.), the British Columbia Knowledge Development Fund, and the Natural Sciences and Engineering Research Council of Canada (262783 to J.S.T. and OGP0001427 to G.O.M.).