(a) Saturation in morphological changes
Previous morphological analyses did not account for saturation in character states as a source of homoplasy that could contribute to conflict with molecular phylogenies. If saturation in morphological character states explained phylogenetic incongruence, then: (i) the morphological phylogeny obtained from BYS analyses would contain fewer nodes conflicting with the molecular tree than the phylogeny obtained from MP analyses; and (ii) comparisons of the fit of the BYS morphological tree to the molecular data would yield fewer significant results than comparisons using the MP tree. The results were consistent with the first prediction: only half the major conflicting nodes recovered in the MP analysis of the morphological data were also recovered with BYS. The conflicting nodes supported by morphology in both MP and BYS analyses were mainly those supported by potentially convergent characters associated with feeding ecology. However, the BYS morphological tree was still not congruent with the molecular data (P ≤ 6E-09).
Applying a model of character evolution to a combination of true synapomorphies and homoplastic characters could help recover an underlying phylogeny that would be obscured when using MP alone (Lewis, 2001). The node-by-node analysis showed that both MP and the model-based approach recovered a small number of conflicting nodes. Agreement with MP in recovering “incorrect” nodes, together with significant incongruence between the Bayesian morphology tree and all molecular data sets, implies that the model of morphological evolution did not fully account for the ahistorical signal in the morphological data. Both developmental and functional constraints limit the range of observed changes in morphology, and these limits may require more complex models than were implemented here. For example, the model of evolution applied to the morphological data assumed symmetrical rates of change between states (Lewis, 2001), but more complex models can implement asymmetrical rates of change from one state to another (Schultz & Churchill, 1999). Asymmetries in rates of change are expected when a character state can arise through more than one developmental pathway, resulting in inadequate assessments of homology between states (e.g. Bharathan et al., 2002; Geeta et al., 2012). Allowing asymmetrical rates of change for characters that arise through multiple pathways does not correct the faulty homology assessment, but it accounts for the greater ease of change in one direction, providing a better fit of the model to the observations and a better estimate of phylogeny.
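The difference between symmetric and asymmetric rates can be illustrated with transition probabilities for a single binary character under a continuous-time Markov model. This is a minimal sketch, assuming a two-state character with rate `q01` for changes from 0 to 1 and rate `q10` for changes from 1 to 0; equal rates give the symmetric (Mk-type) case and unequal rates give an asymmetric model. The function name and rate values are hypothetical and are not taken from the analyses reported here.

```python
import math

def two_state_transition_probs(q01, q10, t):
    """Transition probabilities P(t) for a binary character.

    q01: instantaneous rate of change from state 0 to state 1
    q10: instantaneous rate of change from state 1 to state 0
    t:   branch length, in the same units as the rates
    Returns a 2x2 nested list where P[i][j] = Pr(state j at end | state i at start).
    """
    total = q01 + q10
    if total == 0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no change is possible
    decay = math.exp(-total * t)
    p01 = (q01 / total) * (1.0 - decay)
    p10 = (q10 / total) * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

# Symmetric rates (as in the Mk family): gains and losses are equally probable.
sym = two_state_transition_probs(q01=0.5, q10=0.5, t=1.0)
# Hypothetical asymmetry: state 1 is assumed easier to gain than to lose,
# e.g. because it can arise through more than one developmental pathway.
asym = two_state_transition_probs(q01=0.9, q10=0.1, t=1.0)
print("symmetric  P(0->1)=%.3f  P(1->0)=%.3f" % (sym[0][1], sym[1][0]))
print("asymmetric P(0->1)=%.3f  P(1->0)=%.3f" % (asym[0][1], asym[1][0]))
```

With the same total rate, the asymmetric model makes the 0 to 1 change far more probable than its reverse along a branch of the same length, which is the property a symmetric model cannot capture.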
Another mechanism that results in multiple hits and saturation is adaptive convergence, in which selective pressure on morphological function produces homoplasious character states. Across mammals, molecular phylogenies have helped identify convergent morphological changes associated with foraging ecology at multiple hierarchical levels, from genera to orders (Hunter & Jernvall, 1995; Delsuc et al., 2001; Reiss, 2001; Ruedi & Mayer, 2001). The Mkv model implemented here can potentially account for convergent changes, but only when the data include the single-taxon changes (autapomorphies) that characterize long branches and can help identify adaptive convergent evolution (Lewis, 2001). Unfortunately, we collected only shared derived morphological characters, eschewing autapomorphies. This may explain why the BYS morphological phylogeny also recovered a number of the conflicting nodes supported in MP: despite being homoplasious, changes associated with dietary specialization could not be identified as such by the model. If this was the case, then the proportion of potentially convergent diet characters supporting conflicting nodes should be higher in nodes recovered with both MP and BYS than in those recovered only with MP. The difference was in the predicted direction but not statistically significant: the proportion of diet characters supporting conflicting nodes recovered with MP and BYS was 0.70, compared with 0.33 in nodes recovered only with MP (unpaired one-sided t-test, P = 0.11). To take advantage of currently available models of evolution, an approach to morphological data collection that includes autapomorphies may be necessary to minimize the consequences of saturation in morphological data sets.
Based on extensive experience with saturation in molecular data, we anticipate that analyzing morphological data in a model-based framework will require: (i) excluding fast-evolving characters; (ii) identifying functional categories of characters and accounting for their properties (e.g. rates of change, change asymmetry); (iii) using partitioned models; and (iv) modeling site heterogeneity for functional partitions of the data (Lartillot, Brinkmann & Philippe, 2007; Rodríguez-Ezpeleta et al., 2007; Dávalos & Perkins, 2008; Wagner, 2012).
(c) Saturation in molecular substitutions
To date, analyses of molecular data aimed at resolving relationships across Phyllostomidae have not applied models that explicitly account for saturation in substitutions, particularly at silent third codon positions (e.g. Baker et al., 2000, 2003; Datzmann et al., 2010). We uncovered extensive saturation in mitochondrial sequences, particularly at third codon positions and in 16S loops, and these sites also differed significantly in base composition. If biases in base composition superimposed on sites where substitutions were saturated produced phylogenetic conflict, then: (i) support for conflicting nodes would decline or disappear after down-weighting or excluding those sites; (ii) support for conflicting nodes would concentrate at mt third codon positions and mtr loops; and (iii) analyses down-weighting or excluding these data would result in fewer significant conflicts between molecular partitions and individual trees.
There is some evidence for the first prediction: support for the position of Lonchorhina declined after excluding saturated sites and down-weighting semi-saturated sites. Mt third codon positions and loops provided significant support for Micronycterinae as the sister taxon of the remaining phyllostomids. However, the source of support for that node changed when analyses fitted separate models to each partition. RAG2 third positions, which were neither saturated nor heterogeneous in base composition, consistently supported that node. PLS analyses showed that saturated sites did not consistently support Micronycterinae as the sister taxon of the remaining phyllostomids, the position of Lonchorhina, or the monophyly of nectar feeders. Saturation did, however, generate conflict between the mitochondrial data and RAG2 sequences: that conflict went from significant (P = 0.008) to non-significant (P ≥ 0.547) once saturation was accounted for.
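The logic of partitioned likelihood support can be sketched as follows, assuming that PLS here denotes per-site log-likelihood differences between two competing trees summed within each data partition. The partition labels (`mt_3rd`, `RAG2_3rd`), the function name, and the per-site values are hypothetical. A positive total means that the partition favours the first tree; a negative total means that it favours the alternative.

```python
from collections import defaultdict

def partitioned_likelihood_support(site_lnl_a, site_lnl_b, partition_of_site):
    """Sum per-site log-likelihood differences (tree A minus tree B) by partition."""
    if not (len(site_lnl_a) == len(site_lnl_b) == len(partition_of_site)):
        raise ValueError("inputs must have one entry per alignment site")
    support = defaultdict(float)
    for lnl_a, lnl_b, part in zip(site_lnl_a, site_lnl_b, partition_of_site):
        support[part] += lnl_a - lnl_b
    return dict(support)

# Hypothetical per-site log-likelihoods under two trees, e.g. tree A with
# Micronycterinae sister to the remaining phyllostomids and tree B without it.
lnl_tree_a = [-2.0, -3.0, -1.5, -4.1]
lnl_tree_b = [-2.5, -2.8, -2.3, -4.0]
partitions = ["mt_3rd", "mt_3rd", "RAG2_3rd", "RAG2_3rd"]
for part, value in partitioned_likelihood_support(lnl_tree_a, lnl_tree_b, partitions).items():
    print(f"{part}: {value:+.2f}")
```

Summing within partitions rather than over the whole alignment is what allows support for a node to be traced to particular classes of sites, such as saturated third positions versus unsaturated nuclear sites.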
Down-weighting or excluding saturated sites resulted in lower measures of support in most phylogenies. The low slopes (0.00–0.02) of transitions at mitochondrial third codon positions and 16S loops indicated that these sites were accumulating changes independently of shared evolutionary history; by definition, their contribution to phylogenetic resolution is noise. Transversions at those sites were less saturated (aside from third positions of COX1, slopes ranged from 0.24 to 0.43), indicating that in some parts of the phylogeny these sites contributed signal, and not only noise.
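A saturation plot regresses observed (uncorrected) differences on model-corrected distances; a slope near zero means that observed differences have stopped increasing with evolutionary distance. The sketch below assumes the Jukes-Cantor correction as a stand-in for whatever substitution model was used to correct distances, and uses ordinary least squares from the standard library; the function names and sequences are hypothetical.

```python
import math
from itertools import combinations

def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(p):
    """Jukes-Cantor corrected distance; undefined (infinite) once p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    if len(x) < 2:
        raise ValueError("need at least two points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def saturation_slope(alignment):
    """Slope of uncorrected p-distance on JC69 distance over all sequence pairs."""
    observed, corrected = [], []
    for s1, s2 in combinations(alignment, 2):
        p = p_distance(s1, s2)
        d = jc69_distance(p)
        if math.isfinite(d):  # pairs beyond the JC69 limit cannot be corrected
            observed.append(p)
            corrected.append(d)
    return ols_slope(corrected, observed)

# Hypothetical aligned fragments (e.g. third codon positions extracted beforehand).
toy_alignment = ["ACGTACGTAC", "ACGTACGTTC", "ACGAACTTTC", "TCGAACTTTG"]
print(round(saturation_slope(toy_alignment), 2))
```

Slopes close to 1 indicate that observed differences still track corrected distances, whereas slopes close to 0, such as those reported above for transitions, indicate saturation. Computing the slope separately for transitions and transversions would follow the same pattern, counting only the relevant class of differences in `p_distance`.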
(d) Incongruence between gene trees
Conflict between mitochondrial and nuclear phylogenies among phyllostomids has been highlighted before (Velazco & Patterson, 2008), and is implied by the different resolutions obtained in separate analyses of mt and nuclear data (Baker et al., 2000, 2003). If different gene histories explained phylogenetic conflict between molecular data sets, then: (i) there should be significant incongruence between different genes; and (ii) conflict should persist even after correcting for systematic biases such as base compositional heterogeneity superimposed on mutational saturation. We uncovered significant incongruence between RAG2 and the mtrDNA data, despite using a partitioned model to reduce the impact of saturation on the phylogeny (P ≤ 0.044). Down-weighting saturated sites when estimating the mtrDNA phylogeny did not render the conflict non-significant (P ≥ 0.009). Our results are consistent with different gene trees underlying the phylogenetic conflict between mt and nuclear data.
Several processes can generate incongruence between gene trees, particularly when these genes belong to genomes with distinct modes of inheritance, as is the case here. These processes include: (i) paralogy (Fawcett, Maere & Van de Peer, 2009); (ii) lateral gene transfer (Bapteste et al., 2005; Hotopp et al., 2007); (iii) introgression (Hailer & Leonard, 2008); (iv) incomplete lineage sorting (Pollard et al., 2006; Heckman et al., 2007); (v) poor taxon sampling and outgroup choice (Hedtke, Townsend & Hillis, 2006; de la Torre-Bárcena et al., 2009); and (vi) adaptive convergence (Li et al., 2008, 2010; Liu et al., 2010).
Paralogy could affect either the nuclear or the mt sequences, so that gene copies from some individuals are not orthologous to those of others and, having a different gene history, yield significantly incongruent trees. The structure, expression, and function of the recombination activating genes RAG1 and RAG2 have been studied in depth in humans (Oettinger et al., 1990; Corneo et al., 2001; Sadofsky, 2004), and paralogy with other genes has not been reported. In bats, both RAG1 and RAG2 have been used as phylogenetic markers, and their sequences, although paralogous, are distinct and do not co-amplify (Teeling et al., 2003, 2005). Alternatively, paralogy could affect the mitochondrial data through co-amplification of insertions of mitochondrial sequences into the host nuclear genome (Bensasson, Feldman & Petrov, 2003; Anthony et al., 2007). These insertion events are relatively common within mammals (e.g. Olson & Yoder, 2002; Antunes et al., 2007; Hazkani-Covo & Graur, 2007), and could mislead phylogenetic analyses because the nuclear copy (or Numt, for nuclear mitochondrial insertion) changes more slowly and is inherited biparentally, unlike the orthologous copy of the gene in the mitochondrial genome. We searched for frame-shifts and stop codons in the concatenated CYTB alignment to detect Numts in our data, but found no indication of such insertions. Similar comparisons using the structural mtr alignment also revealed no evidence of Numts [although these might not always be detectable through structural analysis; see Olson & Yoder (2002)]. There is currently no support for paralogy underlying incongruence between mt and nuclear genes with these data.
Although frequently invoked in prokaryote evolution, lateral gene transfer (LGT) can also affect multicellular eukaryotes, e.g. through the transfer of genes from bacterial endosymbionts to the host (Hotopp et al., 2007). This kind of LGT is an unlikely explanation for the observed incongruence because bacterial endosymbionts have not been reported in mammals, and gene loss in other genomes is the more likely explanation for genes thought to have been transferred from prokaryotes to eukaryotes (Salzberg et al., 2001). Most cases of LGT reported in prokaryotes involve genes present in more than one copy (Lerat, Daubin & Moran, 2003), or are mediated by transposable genetic elements called “transposons” (Xu et al., 2007). The nuclear marker studied here (RAG2) could fit the profile of laterally transferred genes because it has a paralogue, RAG1, and is therefore present in more than one copy (Sadofsky, 2004). Further, the function and structure of the genes are consistent with an origin by insertion of a transposon or “jumping gene” (Market & Papavasiliou, 2003), and recombination activating genes appear abruptly in evolution beginning with jawed vertebrates (Schluter et al., 1999). The proposed transfer event into ancestral vertebrates, however, would not explain phylogenetic conflict within phyllostomids unless evidence of more recent transposon activity were found. Since the conflict we document here is with the virtually independently evolving mitochondrial genome, comparisons with other nuclear genes are necessary to identify a pattern consistent with more recent gene transfer.
A pattern of phylogenetic conflict between RAG2 and a mitochondrial gene is also consistent with introgression and incomplete lineage sorting, both of which have been documented in vertebrates much more commonly than LGT. Introgression has been invoked to explain conflict between mitochondrial haplotype diversity and coalescent patterns, on the one hand, and either morphologically delineated evolutionary units or phylogenies based on nuclear data, on the other (e.g. Dávalos, 2005; Russell et al., 2008). More extensive analyses with dense population samples from more than two independent loci have uncovered patterns of genetic diversity and differentiation consistent with female-mediated (Berthier, Excoffier & Ruedi, 2006) or male-mediated introgression (Mao et al., 2010). Although previous analyses have all suggested or uncovered introgression among close relatives, the long-term phylogenetic signature of such events would be incongruence between genomes inherited maternally, paternally, and/or biparentally. Sampling more nuclear markers could help test introgression as an explanation for phylogenetic conflict: if sex-specific introgression explains the conflict, nuclear phylogenies should be congruent with one another and in conflict with the mt phylogeny. More complicated scenarios, such as recurrent hybridization leading to genetic admixture in an important fraction (25%) of the population of one species (Berthier et al., 2006), should produce more complex patterns of incongruence among nuclear genes, and would require dense sampling of the genome to uncover after a series of speciation events (Eckert & Carstens, 2008).
In contrast with a simple sex-specific introgression scenario, lineage sorting of ancestral polymorphism is expected to generate incongruence between phylogenies from genes in the same genome (Degnan & Rosenberg, 2006). The development of new approaches to infer species phylogenies while accounting for conflicting gene trees arising from independent coalescent processes (reviewed by Liu et al., 2009) has renewed interest in accounting for lineage sorting. These methods model the coalescent process and therefore estimate some of the population-level parameters that give rise to lineage sorting in the first place (e.g. ancestral molecular diversity, effective population size). Parameterizing such models requires multiple alleles per species from several loci, significantly increasing data requirements (e.g. Brumfield et al., 2008; Liu et al., 2008). Determining whether lineage sorting and/or introgression drive the observed conflict requires multi-allele, multi-locus nuclear data that are currently unavailable for this system.
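The dependence of lineage sorting on population-level parameters can be illustrated with the standard three-taxon result under the multispecies coalescent: if the internal branch of the species tree lasts T coalescent units, the probability that a gene tree disagrees with the species tree is (2/3)exp(-T). This sketch assumes a constant effective population size, no gene flow, and a 1:1 sex ratio, so that the mitochondrial effective size is one quarter of the autosomal one; the function name, population size, and branch duration are hypothetical.

```python
import math

def three_taxon_discordance(internal_branch_generations, gene_copies):
    """Probability that a rooted three-taxon gene tree differs from the species tree.

    internal_branch_generations: duration of the species-tree internal branch
    gene_copies: gene copies in the ancestral population, e.g. 2 * Ne for an
                 autosomal locus, or about Ne / 2 for mtDNA (haploid, maternal)
    """
    t_coalescent_units = internal_branch_generations / gene_copies
    # Each of the two discordant topologies has probability (1/3) * exp(-T).
    return (2.0 / 3.0) * math.exp(-t_coalescent_units)

ne = 100_000      # hypothetical diploid effective population size
branch = 50_000   # hypothetical internal branch length, in generations
nuclear = three_taxon_discordance(branch, gene_copies=2 * ne)
mito = three_taxon_discordance(branch, gene_copies=ne / 2)
print(f"autosomal locus: {nuclear:.3f}; mtDNA: {mito:.3f}")
```

Under these illustrative values the autosomal locus is about twice as likely as the mitochondrial locus to be discordant (roughly 0.52 versus 0.25). Estimating T from real data is exactly what requires the multi-allele, multi-locus sampling described above.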
Sampling more taxa to break up long branches and reduce systematic bias can improve estimates of phylogeny (Graybeal, 1998; Zwickl & Hillis, 2002) and resolve significant conflict among gene trees (Hedtke et al., 2006). Further, using a single distantly related outgroup to polarize characters results in incongruence if it effectively randomizes the position of the root in each gene tree (de la Torre-Bárcena et al., 2009). Incongruence among gene trees in mammals, and bats in particular, has been traced to differing taxon samples and poor choices in rooting phylogenies (Van Den Bussche & Hoofer, 2004). Our taxon sample aimed to include the diversity of the family and, although individual genes were not available for the entire sample, tests of incongruence encompassed at least 45 species representing the taxonomic diversity of the family. The outgroups (see Section II.1a) comprised a suite of nine species from both closely related (i.e. in the superfamily Noctilionoidea) and more distantly related taxa. If phylogenetic conflict were the result of poor taxonomic sampling and/or a poor choice of outgroups, expanding taxon and outgroup sampling should reduce the conflict between gene trees. Instead, incongruence persisted despite adding five more species, two of them outgroups. These results indicate that taxon and outgroup sampling are probably not driving the conflict.
Paralogy, lateral gene transfer, and taxon/outgroup sampling can be ruled out as drivers of phylogenetic conflict between gene trees with our data. Evaluating the roles of lineage sorting of ancestral polymorphism and introgression in generating conflict will require collecting data from many more individuals from each species and across more genes than were analyzed here.
(f) Potentially adaptive convergent molecular evolution
Just as morphological features may converge on a similar phenotype under selective pressure associated with ecological specialization, genes facing selection for a particular function may also converge. For example, convergence in the protein sequences of the Prestin gene, associated with the sensitivity and frequency selectivity of the mammalian cochlea, has been proposed to explain spurious clades found in trees based on protein sequences in bats and cetaceans (Li et al., 2008, 2010; Liu et al., 2010). Based on the results of analyses of the concatenated molecular data, we examined the node uniting the nectar-feeding Lonchophyllinae and Glossophaginae. If adaptive convergence caused the conflict between the molecular resolution obtained here and earlier molecular phylogenies, then: (i) support for this node should come from functional parts of the genes involved; (ii) selection should be operating on those genes at rates different from those in other lineages; and (iii) there should be a link between the genes and ecological function. As in studies of snakes and agamid lizards (Castoe et al., 2009), we found that support for the spurious nectar-feeding clade in the molecular data stemmed from substitutions that result in amino acid changes in mitochondrial protein-coding genes. This is also consistent with the fact that the inclusion of mitochondrial protein-coding data is the main difference in gene sampling between the data analyzed here and those of Baker et al. (2003).
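Whether a substitution changes the encoded amino acid depends on the genetic code, and mitochondrial protein-coding genes use the vertebrate mitochondrial code rather than the standard one (for example, AGA and AGG are stop codons, ATA codes for methionine, and TGA codes for tryptophan). The sketch below classifies codon differences under NCBI translation table 2; the function names are hypothetical, each differing codon is counted once regardless of how many positions differ, and the code does not reproduce the analyses reported here.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT, TTC, TTA, ..., GGG.
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}

def translate(codon):
    """One-letter amino acid ('*' for stop) for a DNA or RNA codon."""
    return CODON_TABLE[codon.upper().replace("U", "T")]

def classify_substitution(codon_a, codon_b):
    """Label the difference between two aligned codons."""
    if codon_a.upper() == codon_b.upper():
        return "identical"
    return "synonymous" if translate(codon_a) == translate(codon_b) else "nonsynonymous"

def count_changes(seq_a, seq_b):
    """Count synonymous and nonsynonymous codon differences between aligned coding sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        label = classify_substitution(seq_a[i:i + 3], seq_b[i:i + 3])
        if label != "identical":
            counts[label] += 1
    return counts

print(classify_substitution("ATA", "ATG"))  # both Met in vertebrate mtDNA
print(classify_substitution("ATA", "ATT"))  # Met -> Ile
```

Under the standard code the first pair would be nonsynonymous (Ile to Met), so using the correct code matters when attributing phylogenetic support to amino acid replacements.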
It is more difficult to implicate selection in producing the spurious clade. The shift in selection pressure detected at 18 amino acids in the CYTB gene across the two nectar-feeding lineages was linked to both high support for and high rejection of the spurious clade, rather than to high support alone. The amino acids under shifting selection comprise a proportion of changes shared by nectar-feeding phyllostomids, but also a smaller proportion of changes distinctive to each branch, i.e. autapomorphies of these two subfamilies. Half the amino acid changes (eight of 16) that provided significant support to the spurious clade correspond to regions of CYTB, and to two amino acids in COX1, experiencing background selection. Finally, the shift in selection detected can be interpreted either as relaxation of purifying selection (e.g. Zhao et al., 2010) or as the result of a brief burst of positive selection in the nectar-feeding lineages (e.g. Li et al., 2007). Relaxed selection is more parsimoniously explained by genetic drift resulting from small population size (e.g. Moran, 1996; Ingman & Gyllensten, 2007) than by changes in selection pressure. Demographic changes are expected to leave signatures across the entire genome (Russell et al., 2011), and examining Ka/Ks across many loci can help test this hypothesis. The critical importance of the cytochrome b protein in all eukaryotes (discussed below), the conservation of its function across mitochondria, chloroplasts, and cyanobacteria, and the non-random distribution of sites under shifting selection relative to the structure of the protein weakly support the hypothesis of a brief burst of positive selection. A change in Ka/Ks in the evolutionary history of these lineages is consistent with positive selection toward a particular genotype, but it cannot explain all the phylogenetic support for the spurious clade. Selection as a source of support for the spurious clade is a hypothesis that merits further study.
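Examining Ka/Ks across loci requires estimating nonsynonymous (Ka) and synonymous (Ks) substitution rates. The sketch below is a deliberately simplified counting method in the spirit of the Nei-Gojobori approach, not the likelihood-based codon models typically used for lineage-specific tests of selection. It assumes the vertebrate mitochondrial code, counts mutations to stop codons as nonsynonymous, compares only codons that differ at no more than one position, and applies a Jukes-Cantor correction; the function names and sequences are hypothetical.

```python
import math
from itertools import product

BASES = "TCAG"
# NCBI translation table 2 (vertebrate mitochondrial); codons ordered TTT ... GGG.
VERT_MITO_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), VERT_MITO_AA)}

def synonymous_sites(codon):
    """Synonymous sites in a codon: each synonymous point change adds 1/3 of a site."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:  # stop mutants count as nonsynonymous
                sites += 1.0 / 3.0
    return sites

def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differences."""
    if p == 0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0) if p < 0.75 else math.inf

def ka_ks(seq_a, seq_b):
    """Simplified (Ka, Ks, Ka/Ks) for two aligned coding sequences of equal length."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    syn_sites = syn_diffs = nonsyn_diffs = 0.0
    n_codons = 0
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        if CODE[ca] == "*" or CODE[cb] == "*":
            continue  # skip stop codons
        diffs = sum(x != y for x, y in zip(ca, cb))
        if diffs > 1:
            continue  # multi-difference codons need pathway averaging; skipped here
        n_codons += 1
        syn_sites += (synonymous_sites(ca) + synonymous_sites(cb)) / 2.0
        if diffs == 1:
            if CODE[ca] == CODE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = 3.0 * n_codons - syn_sites
    if syn_sites == 0 or nonsyn_sites == 0:
        raise ValueError("not enough comparable codons")
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks, (ka / ks if ks > 0 else math.nan)

ka, ks, ratio = ka_ks("ATGCTTAAAGGGCCC", "ATGCTCAAAGGGCCC")  # hypothetical sequences
print(f"Ka={ka:.3f} Ks={ks:.3f} Ka/Ks={ratio:.3f}")
```

Applied branch by branch or locus by locus, a ratio well below 1 indicates purifying selection; a genome-wide rise in the ratio would point to demographic effects, whereas a rise restricted to one locus in particular lineages would be more consistent with selection on that gene.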
The link between the CYTB gene and function is clear, but not exclusive to the particular ecology of nectar-feeding phyllostomids. The cytochrome b protein is a component of the respiratory complex involved in electron transport across the mitochondrial membrane, which generates an electrochemical potential indispensable for ATP synthesis (Esposti et al., 1993). Specifically, cytochrome b is a respiratory subunit that forms reaction centres on each side of the mitochondrial membrane; these react with the coenzyme ubiquinone, resulting in oxidation/reduction of the ferricytochrome complex and trans-membrane electron transfer (Mitchell, 1975; Iwata et al., 1998; Cramer et al., 2006). In humans, mutations in CYTB can lead to sudden infant death, cardiac defects, neurological defects, deafness, epilepsy, growth retardation, mental retardation, and muscle weakness (Keightley et al., 2000). Although the function of this enzyme has not been documented in bats, its centrality to respiration and energy production in the cell, and the range of deleterious consequences of mutation in humans, imply that the cytochrome b protein is of critical importance to fitness. The sites under shifting selection were significantly associated with two broad regions: the trans-membrane helices, responsible for trans-membrane electron transfer (Trumpower & Gennis, 1994), and the carboxy-terminal domain of the protein, essential for correct assembly of the entire respiratory complex (di Rago et al., 1993).
Based on what is known about the protein's function from model organisms, we speculate that these sites might be of particular importance to nectar-feeding bats because of their high energy requirements. Phyllostomid nectar-feeders have extremely high metabolic rates (Kelm et al., 2011), which in turn result in extreme metabolic adaptations such as near-maximum rates of nutrient absorption (Winter, 1998) and almost exclusive use of dietary carbohydrates, rather than fat reserves, in respiration (Voigt & Speakman, 2007). At present, a relationship between amino acid replacements in the trans-membrane helices and carboxy-terminus of CYTB in nectar-feeding phyllostomids and their energy demands remains a conjecture. It could be tested with studies coupling whole-organism performance to allelic variants and their in vitro biochemical activity, and corroborated by examining CYTB variation among other “high-energy” lineages descended from generalized plant-visiting ancestors.
The first two predictions concerning the impact of adaptive convergence on incongruence were met: amino acid substitutions in functional parts of the CYTB gene (trans-membrane helices and carboxy-terminus) support the potentially spurious node, and these replacements may be the result of a short burst of positive selection. There is also a potential link between the gene and ecological function, and a potential mechanism generating convergence: selection driven by demand for high-performance energy metabolism in these lineages. However, the link between support for the clade and molecular selection is ambiguous, and the connection between bat ecology and protein function is inferential. For these reasons we consider the adaptive convergence hypothesis weakly supported, but worthy of further research.