Completely sequenced eukaryotic genomes represent large and complex data sets, and phylogenomic investigations generally uncover numerous conflicts among individual gene phylogenies [
1,
2]. When a given gene produces a phylogeny with strong support for an alternative relationship to what generally is accepted, it is viewed as a likely candidate for horizontal (or lateral) gene transfer (HGT) [
3]. Multiple genes from a given genome supporting the same discrepant relationship are interpreted as evidence of correlated HGT, stemming from an historical endosymbiosis in the organism's ancestors [
4-
6].
Genomes of all photosynthetic eukaryotes contain numerous sequences acquired via HGT from cyanobacterial ancestors of plastids. Interpreting cases of endosymbiotic gene transfer (EGT) and endosymbiotic gene replacement ("EGR" - when the endosymbiont's gene replaces a previously existing homolog) can be relatively straightforward in organisms like green plants and red algae that harbor "primary plastids" (the endosymbiont was a cyanobacterium). Some algal groups, however, are products of secondary or higher-order endosymbioses, meaning they adopted a eukaryotic endosymbiont along with its pre-existing plastid. In these cases, the host genome acquires not only cyanobacterial genes via EGT and EGR, but also eukaryotic sequences from the nucleus of the endosymbiont [
7].
Large-scale impacts from EGT and EGR have important implications for understanding eukaryotic relationships and, in particular, whether and how plastids have moved among major lineages [
7-
9]. For example, they could provide evidence of a transient endosymbiosis in taxa for which there is no current cytological indication of an active or vestigial plastid. Consequently, a number of efforts have been made to look for evidence of EGT/EGR from a photosynthetic endosymbiont that could have been lost from parasitic and heterotrophic relatives of various algal groups [
10-
12].
The completed genomes of the oomycetes
Phytophthora ramorum and
P. sojae were found to contain multiple genes that imply phylogenetic affiliations with red algae and cyanobacteria [
13]. The presence of these genes has been interpreted widely as support for the chromalveolate model [
7,
8,
12,
13], which argues that algal groups (ochrophytes, cryptophytes, haptophytes, dinoflagellates, apicomplexans) with red algal-derived plastids trace to a common photosynthetic ancestor [
14]. During the establishment of this endosymbiont and its transition to a fully integrated organelle, the host cell nucleus would have accumulated some unknown fraction of red algal and cyanobacterial genes via EGT and EGR. The model further stipulates that this "red" plastid subsequently was inherited by genealogical descent, meaning that extant, aplastidic relatives of these algae must have lost the organelle along the way. Thus, the presence of "algal" genes in
Phytophthora genomes is cited as key evidence that non-photosynthetic heterokonts (stramenopiles) once harbored the same plastid now present in their close relatives, the ochrophytes (e.g. diatoms and brown algae).
"Algal" genes also have been found in several other aplastidic members of the "Chromalveolata" and likewise have been interpreted as potential support for this broader model of plastid and organismal evolution [
5,
11,
12]. Such
a posteriori results from genome-level data mining are difficult to interpret, however, because they do not address whether the amount of aberrant phylogenetic signal found is significantly greater than expected from null or alternative models. Persistently discordant gene phylogenies have a number of possible explanations; they are consistent with directional phylogenetic artifacts [
15-
17], horizontal gene transfers associated with feeding preferences or other symbiotic associations [
8,
18], and alternative models of endosymbiotic plastid transfer [
19-
21]. Therefore, it is critical to test whether algal genes in aplastidic protists explicitly support a given evolutionary model such as the Chromalveolata. It is particularly important that tests be structured to include appropriate controls that demonstrate observed phylogenetic affinities are not simply an expected outcome from intragenomic co-variation in tree-building signal.
In genome-level surveys, comparisons of raw similarity scores are the most sensitive method for detecting cases of gene transfer [
22], and provide rapid, quantitative and reproducible data for identifying and ranking HGT candidates [
23]. To improve selectivity, individual genes extracted by genome-wide BLAST surveys and/or automated phylogenetic pipelines generally are examined more thoroughly using broader sampling and model-based phylogenetic approaches [
8,
24]. These more rigorous phylogenetic treatments remain computationally intractable on a gene-by-gene basis, particularly across four large eukaryotic genomes as we investigate here (see below). Moreover, it is unclear how the relative strengths of cumulative phylogenetic signals favoring competing hypotheses would be assessed statistically. Because of these limitations, most comparative genomic investigations [
25], including a number with important phylogenomic implications [
26-
28], have been based on recognized correlations between similarities in BLAST scores and phylogenetic signal [
29]. This well-demonstrated relationship is an explicit assumption of automated pipelines used to identify likely HGT/EGT candidates from whole genomes for more detailed phylogenetic analyses [
8,
22,
30]. Therefore, we analyzed the relative strength of support for top blastp hits to designated eukaryotic groups as a statistical proxy for aggregate phylogenetic signal. We also employed clear positive controls that validate the use of this methodology.
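As an illustration only, the tallying of top blastp hits by taxonomic group can be sketched as follows. This is a minimal sketch, not the pipeline used in this study: the function names, the sequence identifiers, the group labels and the toy bit scores are all hypothetical, and the input is assumed to be (query, subject, bit score) triples such as might be parsed from tabular BLAST output.

```python
from collections import Counter


def best_hit_per_query(blast_rows):
    """Keep the highest-bit-score hit for each query.

    blast_rows: iterable of (query_id, subject_id, bit_score) tuples
    (an assumed input layout, not a format prescribed by the text).
    """
    best = {}
    for query, subject, bits in blast_rows:
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best


def tally_top_hits(blast_rows, subject_to_group):
    """Count how many queries have their top hit in each taxon group."""
    best = best_hit_per_query(blast_rows)
    return Counter(
        subject_to_group.get(subject, "unassigned")
        for subject, _bits in best.values()
    )


# Toy, made-up data for illustration only (hypothetical identifiers).
rows = [
    ("q1", "redalga_001", 210.0),
    ("q1", "green_017", 180.5),
    ("q2", "green_044", 95.2),
    ("q2", "redalga_009", 90.1),
    ("q3", "redalga_030", 300.0),
]
groups = {
    "redalga_001": "red algae",
    "redalga_009": "red algae",
    "redalga_030": "red algae",
    "green_017": "green plants",
    "green_044": "green plants",
}
counts = tally_top_hits(rows, groups)
print(counts)
```

Counts of this kind, gathered for the query genome and for each control genome, are the raw material for the proportional comparisons described in the text.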
We identified three explicit assumptions of the chromalveolate model that can be tested directly, to determine whether they are supported over null or alternative hypotheses (Figure ). These are: 1) if putative cyanobacterial genes in oomycete genomes are to be considered evidence of a red algal endosymbiosis (given that most already resided in the nuclear genome of the engulfed rhodophyte), then, as a group, they should show greater affinity to red algal genomes than do genes with stronger similarities to other bacterial groups; 2) the signal from red algal genes should be proportionally stronger in oomycete genomes than signal from control eukaryotic taxa thought to be unrelated to heterokonts, either phylogenetically or through endosymbiosis; and 3) because of the relative antiquity of the presumed chromalveolate endosymbiosis and associated EGT/EGR, signal from red algal genes specifically unrelated to plastid function should be shared between oomycetes and diatoms. To determine whether putative "red algal" and "cyanobacterial" genes in Phytophthora genomes provide support for the chromalveolate model, we applied statistical tests of these clear a priori expectations to comparative results against defined control groups.
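To make the form of such a test concrete, expectation 1 can be cast as a 2x2 contingency table: genes with cyanobacterial versus other bacterial affinities, cross-classified by whether their top eukaryotic hit is a red alga. The sketch below uses a one-sided Fisher's exact test, which is an assumption made here for illustration; this section does not state which statistical test was applied, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with fixed margins,
    where X is the count in the top-left cell.
    """
    row1 = a + b          # e.g. all cyanobacterial-like genes
    col1 = a + c          # e.g. all genes whose top eukaryotic hit is a red alga
    n = a + b + c + d
    denom = comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p


# Made-up counts for illustration only:
#                          top hit red alga   top hit other eukaryote
# cyanobacterial-like            3                    1
# other-bacterial-like           1                    3
p_value = fisher_exact_greater(3, 1, 1, 3)
print(f"one-sided p = {p_value:.4f}")
```

A small p-value would indicate that cyanobacterial-like genes are enriched for red algal top hits relative to genes resembling other bacteria, which is what expectation 1 predicts; a large one would indicate no such enrichment.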