The availability of the complete genome sequence of 12 phylogenetically close Drosophila
spp. represents a milestone in comparative genomics. The phylogenetically guided analysis of genome information can provide a fine description of the gene structure of the complete set of genes, and can clearly improve our knowledge of the evolution of Drosophila
spp. Indeed, analysis of currently available genome information can provide insight into the major forces that shape gene family evolution. Here, we conducted an exhaustive gene-by-gene bioinformatic analysis, precisely identifying orthologous members and the number of gene gains and losses, which allows one to draw accurate evolutionary inferences. We show that, in the Drosophila
genus, the number of OBP genes is moderately variable (from 40 to 61 genes) across species that diverged only about 40 to 60 million years ago [39
], and that the MRCA of the 12 Drosophila
species had 47 genes. Moreover, the species visibly differ with respect to pseudogenes; although the number of nonfunctional genes is nearly the same (from 0 to 3 copies per species), they represent different pseudogenization events, except for the two pseudogenes identified in the closest species D. pseudoobscura
and D. persimilis
. The 12 Drosophila
spp. nevertheless maintain the basic chromosomal organization of the OBP genes in clusters; although the relative physical position of clusters differs among species, the clusters are always maintained within the same Muller element. This feature therefore points to chromosomal inversions as the main mechanism responsible for chromosomal rearrangements [46].
Several models have been proposed to explain the global evolution of multigene families (for review [49
]). There are three basic models. The divergent evolution model [50
] postulates that duplicate copies diverge in a gradual manner and that new functions are acquired progressively. The concerted evolution scenario [51
] proposes that gene family members evolve in a concerted manner through gene conversion, unequal crossover, or gene amplification [52
]. More recently proposed [53
] is the birth-and-death model of gene family evolution, which states that new genes are created by gene duplication and are lost by deletion or become nonfunctional by accumulating deleterious mutations. Under the latter model, gene duplicates would differ in how long they are maintained in the genome. The controversy over the relative importance and interplay of these multigene family evolution models currently remains active [49
]; two critical limiting issues are the lack of DNA sequence data from the complete set of genes and pseudogenes in multiple genomes, and the partial knowledge of the gene conversion mechanism and therefore of its significance.
Analysis of the whole set of OBP genes and pseudogenes in the complete genomes of the 12 Drosophila spp. clearly points to birth-and-death as the major model for the evolution of the OBP multigene family. First, the phylogenetic analysis shows that orthologous groups share an MRCA more recent than that of paralogous groups (the average amino acid divergence within orthologous groups is much lower than estimates that include both orthologous and paralogous comparisons), and there is no evidence supporting gene conversion. Second, orthologous copies fit the accepted phylogeny of the species very well. Third, we detected a number of gene gain and loss events in numerous lineages of the phylogeny. Fourth, we also identified several nonfunctional members (pseudogenes). Therefore, OBP genes would evolve independently from their origin by gene duplication until their loss, either directly by deletion or after a transient period as a pseudogene.
Under the birth-and-death model, new duplicate genes are eventually lost from the genome by one of two basic processes: deletion or pseudogenization. Our study shows that all pseudogenization events, except for the two inferred in the ancestral branch leading to the short D. pseudoobscura and D. persimilis lineages, occurred in terminal branches. The failure to detect pseudogenes on internal branches of the phylogeny is most likely caused by the short half-life of pseudogenes (the time that elapses before a pseudogene can no longer be recognized as a member of its original sequence family is very short). Therefore, we cannot quantify the relative magnitude of the two processes. Nonetheless, the uneven distribution of deletions and pseudogenizations on internal and external branches of the phylogeny suggests that several gene losses detected as deletions were initially triggered by a pseudogenization event.
Our comparative genome analysis has also provided insights into the rates of origin and loss of duplicate genes. In particular, we estimated the birth-and-death rate by two approaches: the stochastic birth-and-death process, which uses information on the number of genes in extant species and assumes equal rates of gene gain and loss [37
]; and comparative genome analysis of the numbers of gene gains and losses inferred on each branch of the phylogeny and those inferred at the internal nodes (see Materials and methods, below). We estimated the birth and death rates as β
= 0.002 to 0.004 and δ
= 0.002 to 0.003 per gene per million years. These estimates are slightly lower than those obtained using the method proposed by Hahn and coworkers [37] (λ = 0.005 to 0.008); the latter method, however, assumes equal gain and loss rates. The present OBP estimates are higher than the average value for the whole Drosophila genome (λ = 0.0013 [55
]), although similar to (or lower than) the estimates obtained for the two other major olfactory multigene families of Drosophila
, namely the Ors (λ
= 0.006 to 0.009) and the Grs (λ
= 0.011 to 0.015). (These estimates were derived using the numbers of genes identified by McBride and Arguello [56
] and those identified by Gardner and Ritchie [personal communication].) OBP birth rates are quite similar to previous estimates for the complete set of gene families in Drosophila
(β = 0.001 to 0.002 [57
]). However, these estimates are not completely comparable because the methodological approach used by Lynch and Conery [57
] is quite different from ours in that they made use of single-genome information. We also show that high-density tandem OBP gene regions are more likely to generate new duplicates. Therefore, a given gene family might present different birth (or birth-and-death) rates across the genome. As more genome information becomes available, it will be possible to determine whether the birth-and-death rates differ among gene families with different functions, chromosomal locations, or gene sizes, or among different groups of species, and whether they correlate with gene-specific functional importance [59
]. These studies will undoubtedly provide valuable insight into the molecular evolution and biologic importance of multigene families.
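As a rough illustration of how per-gene rates of this kind can be derived from branch-wise counts, the sketch below pools gains and losses over branches and divides by the total "exposure" (gene number at the start of each branch multiplied by branch length). This pooling formula is an assumption made for illustration and is not necessarily the estimator described in Materials and methods; the `Branch` class, the `birth_death_rates` function, and the toy numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    """One branch of the species phylogeny (hypothetical container)."""
    name: str
    length_my: float      # branch length in million years
    genes_at_start: int   # gene count inferred at the parent node
    gains: int            # gene gains inferred on this branch
    losses: int           # gene losses inferred on this branch


def birth_death_rates(branches):
    """Return (beta, delta) in events per gene per million years.

    Assumed estimator: pooled event counts divided by the pooled
    exposure, where exposure = genes_at_start * length_my per branch.
    """
    exposure = sum(b.genes_at_start * b.length_my for b in branches)
    if exposure <= 0:
        raise ValueError("total exposure must be positive")
    beta = sum(b.gains for b in branches) / exposure
    delta = sum(b.losses for b in branches) / exposure
    return beta, delta


# Toy input for illustration only; these are NOT values from this study.
toy_branches = [
    Branch("lineage_A", length_my=10.0, genes_at_start=50, gains=2, losses=1),
    Branch("lineage_B", length_my=20.0, genes_at_start=50, gains=3, losses=2),
]

beta, delta = birth_death_rates(toy_branches)
print(f"beta  = {beta:.4f} per gene per million years")
print(f"delta = {delta:.4f} per gene per million years")
```

Keeping β and δ separate, rather than fitting a single rate, is what allows gain and loss to be compared directly; a method that assumes equal gain and loss rates would report one value for both.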
The present study also provides significant clues as to the origin and fate of duplicate genes. We show that the majority of gene gains occur in extant chromosomal clusters, suggesting that gene duplications are mostly produced in tandem by unequal crossing over. Furthermore, the highly significant relationship between chromosomal clusters and phylogenetic groups would indicate that OBP members evolve gradually from their origin in existing clusters. It is known that transposable element-rich regions could generate these 'gene factories' by increasing the levels of unequal crossing over [60
]. Although we did detect a number of repetitive elements neighboring most of the genome clusters, no confident conclusion could be drawn. Further analysis of the relative distribution of these repetitive elements will provide more information about the origin of OBP gene duplicates and genome clusters.
We found that OBP genes exhibit high functional constraints, with an average ω
value of 0.153, and confirmed that the results obtained in two individual members are a general feature of the family [63
]. Although the selective constraint levels are not clearly associated with phylogenetic groups, they differ both among individual genes and across chromosome clusters. This feature supports the contention that, concurrent with the sequence divergence, duplicate copies would also diverge functionally. Although OBP members would essentially maintain the same global function, they would probably acquire subtle functional differences (a micro-functionalization [62
]), perhaps in their gene expression levels or in their ligand-binding specificity or affinity properties. In addition, Andronopoulou and coworkers [65
] have demonstrated that some Anopheles
OBPs might form homodimers and specific heterodimers, suggesting a high combinatorial complexity that will allow for new binding or kinetic properties (also see Sánchez-Gracia and coworkers [63
]). Hence, small differences in the number and pattern of protein-protein OBP interactions might have an appreciable functional meaning and might underlie the observed functional constraint differences [66
]. In this context, it is suggestive that dimer OBPs, which might have originated from a gene duplication event followed by in-frame fusion, produce a single-chain multidomain protein that retains structural features of the original dimeric unit.
The present estimates of the protein evolutionary rates at the OBP genes are in contrast to the strong conservation pattern of genome clusters across the genus. However, this feature does not occur in the Or and Gr gene families, which have a comparable number of genes [56
] (Gardner and Ritchie, personal communication), suggesting the action of some mechanism that actively prevents the OBP clusters from breaking up. Indeed, genes belonging to the same cluster might exhibit a spatiotemporally coordinated expression. For instance, in D. melanogaster
some OBP genes are co-expressed either in the same developmental stage or in the same local region of the chemosensory organs [31
]. Although we did not detect clear evidence that genes in the same cluster are expressed at particular developmental stages, the incomplete gene expression data preclude us from drawing any firm conclusion. To shed light on this issue, it is essential to determine how expression patterns correlate with genomic gene organization in these olfactory system gene families.
The evolutionary analysis of the complete set of genes in a family involved in the response to environmental chemicals is also very attractive because it may provide insights into the selective pressures that result from changes in the species 'lifestyle' during and after speciation. Here we find that OBP genes in specialist lineages (those that recently underwent a host specialization episode) evolve at significantly higher ω
rates than do those in generalist lineages. Consequently, either purifying selection is more relaxed in several OBP genes, probably because of loss (or partial loss) of function during the specialization process, or positive selection acted throughout this process. McBride and Arguello [56
] also found a significant increase in the evolutionary rate at the Ors and Grs in the D. sechellia
and D. erecta
lineages; that study also detected a genome-wide increase in the amino acid fixation rate in these species, although it was lower than that observed at the receptor repertory. Because the genome-wide higher ω
values detected in the specialist species could reflect some demographic changes, we also conducted on the OBP family the same analysis as McBride and Arguello [56
] did, using the same genome-wide set of genes. We also found that the OBP repertory in specialist species has evolved under lower functional constraints (higher ω
values) than the genome-wide trend (the median difference between specialists and generalists ω
for the OBP family [0.0556] is significantly greater than that for genes across the genome [0.0087]; P
= 0.0031). This feature, together with the similarity in the birth-and-death evolution pattern, suggests that these two olfactory system multigene families might have co-evolved in response to ecologic changes across the Drosophila
genus. Therefore, it would be very interesting to establish the precise contribution of the OBP gene family to this specialization process, and to identify the specific members involved in this phenomenon. This knowledge will provide fundamental insight into the roles played by the various selective forces in shaping patterns of nucleotide variation associated with host-switching or ecologic specialization processes.
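The comparison of median ω differences between the OBP family and the genome-wide gene set can be sketched as a one-sided permutation test on the difference of medians. The text above does not state which statistical test produced the reported P value, so the permutation test here is a swapped-in technique chosen for illustration; the function name `median_diff_permutation_test` and all input values are hypothetical and do not reproduce the reported figures.

```python
import random
from statistics import median


def median_diff_permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """One-sided permutation test for median(group_a) > median(group_b).

    Each value is assumed to be a per-gene difference in omega
    (specialist minus generalist). Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = median(group_a) - median(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # random relabeling of the genes
        diff = median(pooled[:n_a]) - median(pooled[n_a:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated P value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy per-gene omega differences for illustration only; NOT data from this study.
obp_diffs = [0.06, 0.05, 0.07, 0.04, 0.08, 0.055]
genome_diffs = [0.01, 0.00, 0.02, 0.005, 0.015, 0.01, -0.005, 0.012]

observed, p_value = median_diff_permutation_test(obp_diffs, genome_diffs)
print(f"observed median difference = {observed:.4f}, P = {p_value:.4f}")
```

A permutation test makes no distributional assumption about ω, which suits the small number of genes in a single family; a rank-based test would be a reasonable alternative.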