Pathogen genes involved in interactions with their plant hosts are expected to evolve under positive Darwinian selection or balancing selection. In this study a single copy avirulence gene, AvrP4, in the plant pathogen Melampsora lini, was used to investigate the evolution of such a gene across species. Partial translation elongation factor 1-alpha sequences were obtained to establish phylogenetic relationships among the Melampsora species. We amplified AvrP4 homologues from species pathogenic on hosts from different plant families and orders, across the inferred phylogeny. Translations of the AvrP4 sequences revealed a predicted signal peptide and towards the C-terminus of the protein, six identically spaced cysteines were identified in all sequences. Maximum likelihood analysis of synonymous versus non-synonymous substitution rates indicated that positive selection played a role in the evolution of the gene during the diversification of the genus. Fourteen codons under significant positive selection reside in the C-terminal 28 amino acid region, suggesting that this region interacts with host molecules in most sequenced accessions. Selection pressures on the gene may be either due to the pathogenicity or avirulence function of the gene or both.
The gene-for-gene model of plant–pathogen interactions was proposed by Flor (1942, 1955) over 60 years ago, and has shaped our current thinking about host–pathogen coevolution in both animal and plant systems. This paradigm underlies much of the theoretical literature on disease evolution (e.g. Frank 1993; Damgaard 1999; Thrall & Burdon 1999, 2002; Sasaki 2000; Agrawal & Lively 2002). Under this model, host resistance (R) genes control recognition of pathogens carrying specific avirulence (Avr) genes leading to disease resistance. Many plant R genes have been cloned and most encode proteins of a conserved class that together represent the pathogen recognition components of plant innate immunity (Dangl & Jones 2001). Pathogen Avr proteins on the other hand are highly diverse, and are generally thought to act as pathogenicity effectors that promote infection in the absence of recognition by corresponding R proteins. Many such Avr/effectors appear specific to particular pathogen classes, and recent genome sequencing of two Phytophthora species has shown very little overlap between their predicted effector protein repertoires (Tyler et al. 2006). Studies of Avr gene evolution have focused on genetic variation within single host species, with an emphasis on the presence or absence and characteristics of gene homologues among races and pathotypes, e.g. Pseudomonas syringae Avr genes (Stevens et al. 1998; Arnold et al. 2001), weighing up the role of Darwinian selection versus balancing selection in introducing and maintaining polymorphisms (Holub 2001; Tellier & Brown 2007). Loss of gene function through deletions, gene disruption or point mutations has been shown to play a role in overcoming host resistance (Farman et al. 2002; Tosa et al. 2005), and in some cases Avr gene diversification correlates with the ability to overcome host resistance (Allen et al. 2004; Dodds et al. 2006).
The interaction between flax rust (Melampsora lini) and cultivated flax (Linum usitatissimum) has been a useful model system for understanding the molecular basis of the gene-for-gene interaction (Ellis et al. 2007; Lawrence et al. 2007). To date, 19 flax rust resistance genes have been isolated in flax and four Avr genes have been isolated from the rust. All four Avr genes, AvrL567, AvrM, AvrP4 and AvrP123 encode small secreted proteins that are expressed in haustoria, and are thought to be transported into the host cytoplasm (Dodds et al. 2004; Catanzariti et al. 2006). Dodds et al. (2006) showed that for the AvrL567 locus in M. lini, and the L5, L6 and L7 resistance specificities in L. usitatissimum, direct protein interactions underlie gene-for-gene specificity. Differences in recognition specificity between corresponding R and Avr protein variants, and evidence for diversifying selection acting on these genes suggest a sustained gene-specific coevolutionary interaction between L. usitatissimum and M. lini. Considerable sequence variation and evidence for diversifying selection also occurs at the AvrM, AvrP4 and AvrP123 loci (Catanzariti et al. 2006; P. N. Dodds 2007, unpublished data).
Much of the theoretical and empirical work on gene-for-gene interactions has focused on within-species gene evolution and diversity. Sackton et al. (2007) investigated the evolution of innate immune-related genes across species of Drosophila; in plants, however, little work has been done to characterize variation in pathogen avirulence genes across species boundaries, despite the importance of such a perspective for understanding the evolutionary history of these genes both within and among host–pathogen interactions. For example, determining whether homologues are present in sister taxa or within a genus or family can answer questions about sequence variation and selective constraints in gene sequences, and whether specific areas of the genes are prone to selection or relaxed selective constraint. The potential for conservation in function of host resistance and pathogen avirulence genes among species remains unclear and unexplored. In contrast to systems where deletion of a locus renders the pathogen virulent, all characterized isolates of M. lini from both cultivated flax and native Linum marginale populations contain intact copies of each of the Avr genes (Catanzariti et al. 2006; Dodds et al. 2006; P. N. Dodds & L. G. Barrett 2006, unpublished data). Avr gene sequence diversification, rather than major gene disruptions such as deletions, suggests an important functional role. It is possible that these genes remain conserved at a higher phylogenetic level, raising the question of whether genes related to M. lini Avr genes are present in the genomes of related rust species.
In this study, we examine the evolution of the flax rust AvrP4 gene within the rust genus Melampsora. Melampsora includes species with heteroecious (alternating between two unrelated host plants) and autoecious (completing the entire life cycle on one host plant) life cycles. All heteroecious Melampsora taxa have their dikaryotic stage on a Populus or Salix (Salicaceae) host. The haplont hosts are found among diverse plant families both in the gymnosperms (Pinaceae) and angiosperms (Araceae, Alliaceae, Orchidaceae, Papaveraceae, Grossulariaceae, Saxifragaceae, Celastraceae, Clusiaceae, Euphorbiaceae, Linaceae and Violaceae). The autoecious taxa are mainly found on hosts in the families Clusiaceae, Euphorbiaceae, Linaceae (all Malpighiales) and Pinaceae. Little information is available on specific relationships among the 80 plus species (Hawksworth et al. 1995) of Melampsora, but molecular phylogenies of the Uredinales (rust fungi) support Melampsora as a monophyletic genus (Maier et al. 2003; Wingfield et al. 2004).
The AvrP4 gene encodes a protein of 95 amino acids including a 28-amino acid cleavable secretion signal peptide and six cysteine (cys) residues, spaced according to the consensus of a cystine-knot, found in the C-terminal 28 amino acids of the protein (Catanzariti et al. 2006). While the function of the protein is unknown, a cys knot is commonly found in toxins and in inhibitors of receptors and proteases (Pallaghy et al. 1994). Here we assess the extent of sequence maintenance of the AvrP4 locus across a range of Melampsora species, and evaluate selection pressures acting on the AvrP4 gene by investigating the non-synonymous versus synonymous rate ratio using maximum-likelihood methods. This is compared to a phylogeny obtained using the translation elongation factor 1-alpha ‘house-keeping’ gene sequence.
Infected host material was obtained for 17 species (26 specimens) of Melampsora, including the aggregate species M. lini and Melampsora epitea (which includes M. aff. capraearum, Melampsora repentis and Melampsora reticulatae). The position of the Caeoma sp. is unclear, although the habitat indicates that it alternates with Salix and is a member of the M. epitea aggregate. The material represents autoecious and heteroecious life cycles, and pathogenicity on a range of diverse plant families. For all species, the dikaryont host is a member of the order Malpighiales among the eurosids I, while the haplont hosts include members of Asparagales (monocots) and Saxifragales (core eudicots) (APG II 2003). Hosts and life cycle information is given in figure 1 and table S1 in the electronic supplementary material. Additional information on each isolate is available in table S1 in the electronic supplementary material. Two out-group taxa, Chrysomyxa woronini (Coleosporiaceae) and Pucciniastrum epilobii (Pucciniastraceae) were included. Owing to quarantine regulations most DNA extractions were conducted on spore material stored and shipped in ethanol (see table S1 in the electronic supplementary material for type of material used), using a modified CTAB method described by Van der Merwe et al. (2007). Melampsora lini isolates from three Hesperolinon species were collected, but it is not clear whether host specific isolates have evolved (Springer 2007; Y. P. Springer 2006, personal communication).
Partial sequences of the translation elongation factor 1-alpha (TEF) gene were obtained for the accessions following the methods of Van der Merwe et al. (2007).
We designed primers to amplify 180 bp of 5′ flanking sequence, 285 bp of open reading frame (ORF), and 103 bp of 3′ flanking sequence for the AvrP4 gene (F: CATCAAAATCTAACCCGTAC and R: GTAGCATTGAGATCCATGG). Polymerase chain reactions (PCRs) were carried out in 50 μl reactions with 10 pmol of each primer, 1 unit of Immolase polymerase (a high-fidelity polymerase from Bioline), 1.2 μl Betaine (Sigma), 5 μl PCR buffer (supplied with the polymerase), 2.5 mM dNTPs and 2 mM MgCl2. Best amplification was found when using a touch-down PCR with the following conditions: 7 min at 97°C followed by 10 cycles of 30 s at 94°C; 40 s at 55°C (with a temperature decrease of 1°C with every cycle) and 1 min 30 s at 70°C; followed by 28 cycles of 30 s at 94°C; 30 s at 54°C and 1 min at 70°C; and a final extension of 8 min at 72°C.
PCR fragments were purified using the Qiagen PCR clean-up system and sequenced using the PCR primers. Standard sequencing reactions using BigDye v. 3.1 were performed and run on an ABI3730 at ACRF Biomolecular Resource Facility (ANU). The pGEM-T easy vector system (Promega) was used for cloning where initial sequencing indicated the presence of multiple variant copies. Ten colonies were sequenced from each cloned PCR product and colony PCRs were carried out using M13 primers. To test for the level of PCR and sequencing error, we amplified and cloned the AvrP4 gene from the same isolates used by Catanzariti et al. (2006). No sequence variation was detected between our sequences and those obtained in their study.
Amplification attempts were made on all accessions in figure 1. Accessions from eight Melampsora species and out-group taxa failed to amplify, leaving AvrP4 sequence data from nine species of Melampsora available for analysis. Unsuccessful amplification may have resulted from poor sample quality, gene deletions or mutations within the primer sites.
Sequences were aligned using MEGA v. 3.1. Phylogenetic trees inferred using Bayesian methods were constructed using MrBayes (Huelsenbeck & Ronquist 2001; Ronquist & Huelsenbeck 2003). The data were partitioned into the intronic and exonic regions with the latter further partitioned according to the first, second and third codon positions. A general time reversible (GTR) model was implemented as determined by ModelTest v. 3.7 (Posada & Crandall 1998). Two million Markov chain Monte Carlo generations were run and trees were sampled every 100th generation. The first 20 per cent of sampled trees were removed and a 50 per cent majority rule consensus tree was constructed. Maximum likelihood (ML) analysis was conducted at the Cipres Portal v. 1.13 (available at http://www.phylo.org/) using the RAxML algorithm for finding the best known likelihood tree from the data and from 1000 non-parametric bootstrap datasets (Stamatakis et al. 2008). The data were partitioned into first, second and third codon positions and non-coding sequence with a distinct GTR+Γ substitution model for each partition.
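The burn-in and majority-rule steps described above can be illustrated with a minimal Python sketch. It is not the MrBayes implementation: the representation of each sampled tree as a set of clades (each clade a frozenset of taxon names), the function name `majority_rule_clades` and the toy taxon labels are all assumptions made for illustration.

```python
from collections import Counter


def majority_rule_clades(sampled_trees, burnin_fraction=0.2, threshold=0.5):
    """Return {clade: frequency} for clades present in more than `threshold`
    of the trees that remain after discarding the burn-in.

    Each tree is assumed (hypothetically) to be a set of clades, where a
    clade is a frozenset of taxon names.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]
    if not kept:
        raise ValueError("no trees left after burn-in")
    counts = Counter(clade for tree in kept for clade in tree)
    return {
        clade: n / len(kept)
        for clade, n in counts.items()
        if n / len(kept) > threshold
    }


# Sample counts implied by the run settings in the text
# (ignoring any extra sample recorded for the starting tree).
generations, sample_every = 2_000_000, 100
n_samples = generations // sample_every   # 20000 sampled trees
n_discarded = int(n_samples * 0.2)        # 4000 removed as burn-in
n_retained = n_samples - n_discarded      # 16000 used for the consensus
```

The clade frequencies returned by such a function correspond to the posterior support values usually reported on a Bayesian consensus tree.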
Sequences were manually aligned in MEGA v. 3.1 (Kumar et al. 2004) by using conserved amino acid motifs as reference points. Bayesian and ML methods similar to those used for the TEF data above were implemented. ML analyses were conducted in PAUP (Swofford 2001) and RAxML with the AvrP4 data partitioned according to the three codon positions. Substitution rates for each codon position were estimated from the data and included in the GTR model with invariable sites and a gamma distribution (GTR+I+G) using a heuristic search method. The resultant ML tree was used in the CODEML analysis. Bootstrap support values were obtained by analysing 100 replicates without codon partitioning.
Four tests available in RDP3 (Martin et al. 2005) were implemented to detect possible recombination blocks with the following settings: RDP (Martin & Rybicki 2000)—six nucleotide windows with no reference sequences used; GENECONV (Padidam et al. 1999)—sequence triplets with a scale of 1; BOOTSCAN (Maynard Smith 1992)—100 nucleotide windows with a 20 nucleotide step distance scanned, using a 70 per cent bootstrap cut-off and 100 bootstrap replicates; and CHIMAERA (Maynard Smith 1992) set to 30 variable sites per window, 10 permutations with a maximum permutation p-value of 0.01.
The non-synonymous/synonymous rate ratio parameter ω was estimated using the program CODEML in the Phylogenetic Analysis by Maximum Likelihood (PAML) package v. 3.15 (Yang 1997). To investigate variation in selection among branches we applied a model that estimated ω for each branch in the phylogeny (ML tree obtained with PAUP). Tests for positive selection were performed using the site class models implemented in CODEML that estimate ω for amino acid sites. Neutral sites have ω=1, while those under purifying selection have ω<1 and those under positive selection, ω>1 (Nielsen & Yang 1998; Yang et al. 2000). Likelihood ratio tests were performed using the log-likelihood values λ₀ and λ₁ from the nested models M7 and M8 by comparing the test statistic 2Δλ=2(λ₁−λ₀) with the χ² distribution (d.f.=2). The null model M7 allowed for 10 site classes that provide a discrete approximation of the beta distribution of site-wise ω-values between 0 and 1. Model M8 added an 11th class with ω>1 and estimated the proportion of sites within this class and the value of ω for that class. Simulations have suggested that the beta distribution null model provides greater power to detect positive selection than simpler models implemented in CODEML (Anisimova et al. 2001). For model M8, the Bayes empirical Bayes (Yang et al. 2005) procedure estimated the mean ω-value for each codon site, and the posterior probability that the site is under positive selection.
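The likelihood ratio test described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of CODEML: the function name `lrt_m7_vs_m8` and the log-likelihood values in the usage lines are hypothetical. It relies on the fact that, for two degrees of freedom, the upper tail probability of the χ² distribution has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def lrt_m7_vs_m8(lnl_m7, lnl_m8):
    """Likelihood ratio test of model M8 against the nested null model M7.

    Returns the test statistic 2 * (lnL_M8 - lnL_M7) and its p-value under
    a chi-square distribution with 2 degrees of freedom.
    """
    # The nested model cannot fit better; clamp small negative values
    # that arise from numerical optimisation noise.
    statistic = max(0.0, 2.0 * (lnl_m8 - lnl_m7))
    # Chi-square survival function for d.f. = 2 is exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
statistic, p_value = lrt_m7_vs_m8(-1200.0, -1190.0)
# statistic is 20.0 and p_value is exp(-10), about 4.5e-5
```

A p-value below the chosen threshold indicates that allowing a class of sites with ω>1 improves the fit significantly over the beta-only null model.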
Total length of the sequence alignments for the TEF data was 696 bp, including 150 codons and 246 bp of intronic sequence (GenBank accessions: EF487218 to EF487243). The average pairwise nucleotide divergence was 8 per cent, ranging between 0.2 and 13 per cent. Since no evidence for recombination was detected, the sequences were used for phylogenetic data analysis (figure 1). Topologies obtained from the best ML and Bayesian trees were identical. Despite poor support for relationships along the backbone, the following relationships received significant support from both the ML and Bayesian analysis: (i) all M. lini isolates, (ii) M. repentis and Melampsora salicis-albae, (iii) Melampsora hypericorum and Melampsora euphorbiae, (iv) M. reticulatae and the M. epitea complex in a clade with Melampsora larici-populina and Melampsora ricini, (v) Melampsora rostrupii and Melampsora magnusiana, and (vi) Melampsora ribesii-viminalis and Melampsora vernalis.
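Average, minimum and maximum pairwise divergence of the kind reported above can be computed from an alignment as in the following Python sketch. The text does not state which distance measure produced these percentages, so the uncorrected proportion of differing sites (p-distance, skipping gapped positions) is used here as an assumption; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_summary(alignment):
    """Return (mean, minimum, maximum) p-distance over all sequence pairs.

    `alignment` maps sequence names to aligned sequences.
    """
    distances = [
        p_distance(a, b) for a, b in combinations(alignment.values(), 2)
    ]
    return sum(distances) / len(distances), min(distances), max(distances)


# Toy alignment, for illustration only.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAA",
    "s3": "ACGAACG-AA",
}
mean_d, min_d, max_d = pairwise_summary(toy)
```

Multiplying the returned values by 100 gives percentages comparable in form to those quoted in the text.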
Primers flanking the AvrP4 gene of M. lini were used to amplify homologous sequences (GenBank accessions EF639296 to EF639314) from 9 of the 17 species of Melampsora (including the Caeoma sp.) occurring across the TEF phylogeny. The sequence length varied between 509 and 526 bp. Average pairwise nucleotide divergence was 20 per cent, ranging between 2 and 32 per cent. No significant evidence for recombination was detected among the sequences. Each sequence contained an ORF of between 91 and 98 codons that corresponded to the predicted coding region of the M. lini AvrP4 gene (figure 2). The first methionine codon conserved in all sequences corresponded to amino acid five of the predicted M. lini sequence. Beginning at the conserved methionine, the 24 amino acids corresponding to the predicted signal peptide for the M. lini isolates (Catanzariti et al. 2006) were highly conserved. Following the predicted signal peptide was a variable region of nine amino acids, followed by a conserved region with the sequence I(Q/R)GFS (figure 2). This motif was followed by a variable region of eight amino acids, which was followed by another highly conserved region of five amino acids with a consensus sequence L(E/Q)E(D/E)S. The last 49 amino acids in the alignment were highly variable apart from six cys residues present in all sequences.
The spacing of the six cys residues in the predicted amino acid alignment (Cx4Cx5Cx4Cx2Cx4-7C; figure 2) was in accordance with the consensus sequence of an inhibitor cys-knot structure (Cx3-7Cx4-6Cx0-5Cx1-4Cx4-10C). All sequences in the alignment contained a termination codon at an equivalent position with the exception of Melampsora amygdalinae, which contained an additional four codons (PIVQ). Within the 123 bp 5′ non-coding region, 45 variable nucleotide sites and one indel were present. There were 35 variable nucleotide sites in the 97 bp non-coding region at the 3′ end. The overall mean (Nei's) genetic distance between sequences was 0.231 (std 0.018).
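The two spacing patterns quoted above translate directly into regular expressions, as in the following Python sketch. It is an illustration only: the function names and the toy peptide are hypothetical, and the sketch assumes that each "x" stands for any residue other than cysteine, so that the six matched cysteines are consecutive.

```python
import re

# Spacing observed in the AvrP4 alignment: Cx4Cx5Cx4Cx2Cx4-7C
OBSERVED_SPACING = re.compile(
    r"C[^C]{4}C[^C]{5}C[^C]{4}C[^C]{2}C[^C]{4,7}C"
)

# Inhibitor cys-knot consensus: Cx3-7Cx4-6Cx0-5Cx1-4Cx4-10C
KNOT_CONSENSUS = re.compile(
    r"C[^C]{3,7}C[^C]{4,6}C[^C]{0,5}C[^C]{1,4}C[^C]{4,10}C"
)


def cys_gaps(protein):
    """Number of residues between each pair of consecutive cysteines."""
    positions = [i for i, residue in enumerate(protein) if residue == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def matches_observed_spacing(protein):
    return OBSERVED_SPACING.search(protein) is not None


def matches_knot_consensus(protein):
    return KNOT_CONSENSUS.search(protein) is not None


# Toy peptide built to have the observed spacing; not a real AvrP4 sequence.
toy_peptide = "MKCAAAACGGGGGCSSSSCTTCPPPPPCK"
gaps = cys_gaps(toy_peptide)  # [4, 5, 4, 2, 5]
```

Because every interval in the observed pattern falls inside the corresponding interval of the consensus, any sequence matching the first expression also matches the second.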
Two AvrP4 variants were amplified from each of three accessions of M. lini pathogenic on Hesperolinon and the one accession of M. reticulatae (A and B in figure 3). The AvrP4 A variant sequences from accessions pathogenic on Hesperolinon bicarpellatum and Hesperolinon serpentinum were identical, as were the B variant sequences (figure 3), but differed from the A and B variant sequences obtained from Hesperolinon micranthum (three non-synonymous changes among the A copies and one non-synonymous and one synonymous change among the B copies). Sequence variation between the A and B variants for these rusts involved 45 substitutions in the coding region (20 amino acids). The two M. reticulatae variants differed from each other by 28 amino acid substitutions and two indels (four codons), but unlike the Hesperolinon rusts, these two sequences were not sister to each other on the AvrP4 phylogeny (figure 3).
We investigated positive selection in the AvrP4 phylogeny by examining the rates of synonymous and non-synonymous nucleotide substitutions using ML methods. The results from pairwise comparisons are given in table S2 in the electronic supplementary material. The sequence of M. amygdalinae contained unique deletions, and as CODEML requires the removal of codons corresponding to gaps in the alignment, we conducted the analyses with and without the M. amygdalinae data. Including M. amygdalinae allowed 78 codons to be used, and the subsequent analysis indicated 17 codons to be under positive selection (14 highly significant (posterior probability p>0.99), one significant (p>0.95) and two not significant). All occurred after the L(E/Q)E(E/D)S motif (figure 2) and all with significant posterior probabilities were in the 3′ region encoding the six conserved cys residues. In a likelihood ratio test (see §2), model M8 provided a significantly better fit to the data (p<0.001) than model M7 (table 1). Parameter estimates from the two models are given in table 1. For model M8, the Bayes empirical Bayes (Yang et al. 2005) procedure estimated ω for each codon site (figure 2). When M. amygdalinae was excluded from the analysis, the dataset contained 81 codons of which 21 were found to be under positive selection (figure 2).
The AvrP4 gene tree (figure 3) is divided into four clades: (i) the M. lini clade that includes the rusts on flax and other Linum species, (ii) a clade with M. ricini and M. euphorbiae, (iii) a clade with the six sequences amplified from the M. lini accessions derived from Hesperolinon, and (iv) a clade consisting of sequences from rusts that are mostly pathogenic on various Salix species. High values of ω were estimated in several Melampsora species lineages (nine branches with ω>1), although for several internal branches, the presence of only non-synonymous changes and no synonymous changes precluded the estimation of ω (indicated by X in figure 3). The remaining branches for which ω could be estimated indicated conservation of the protein sequence, with ω estimated to be between 0.4 and 1.
We have demonstrated that highly variable AvrP4 homologues are present in a range of species in the rust genus Melampsora across a phylogenetic tree constructed from the TEF gene. The species are pathogenic on a variety of hosts from distantly related families. Five characteristics of the alignment (present in all sequences), support the new sequences as homologues of the M. lini AvrP4 locus: (i) the highly conserved flanking regions, (ii) the conserved signal peptide sequence, (iii) two highly conserved amino acid motifs, I(Q/R)GFS and L(E/Q)E(E/D)S, (iv) the six cys residues towards the C-terminus of the predicted protein (similar to a cys-knot structure), and (v) the termination codons, adjacent to the six cys codons. AvrP4 is one of a class of secreted proteins expressed in haustoria of M. lini and translocated into host cells during infection (Catanzariti et al. 2006), where they are presumed to enhance pathogenicity. The presence of the AvrP4 locus across the TEF phylogenetic spectrum indicates that this gene arose prior to the diversification of the genus Melampsora, and is not specific to the system in which it was first identified. This contrasts with the host specific appearance and disappearance of an Avr gene in rice specific Magnaporthe strains (Tosa et al. 2005). Although gene deletions may have occurred in the evolution of the genus, reflected in the non-amplification of certain accessions, it is possible that divergence in the primer target sites precluded amplification of AvrP4 homologues from these species. Conservation of AvrP4 homologues across the Melampsora genus suggests that the AvrP4 gene product is important for the biology of these rust species.
Although the focus of this study was not phylogenetic relationships, it is important to note that the gene trees are neither fully resolved nor congruent, and that they lack support for the deeper nodes. This may be due to poor species sampling or rapid speciation during the early diversification of the genus. Good support was found for a sister relationship between the AvrP4 genes of the two autoecious species on Euphorbiaceae (M. ricini and M. euphorbiae) reflecting host relationships. TEF, however, placed M. euphorbiae closer to M. hypericorum (host family Clusiaceae) (figures 1 and 3; and S2 in the electronic supplementary material). The close relationship between M. lini from Hesperolinon and those from Linum inferred from the TEF gene, beta-tubulin 1 sequence data (M. M. Van der Merwe 2006, unpublished data) and morphology (Y. P. Springer 2005, personal communication) was not reflected in the AvrP4 phylogeny. The relationship between M. repentis and M. salicis-albae, with monocot haplont hosts (Orchidaceae and Alliaceae), was supported by both genes. Although there seems to be a tendency for the heteroecious species to form a supported clade on the AvrP4 phylogeny, it would be premature to comment on the evolutionary significance of hosts/life cycles and the AvrP4 gene.
The high ω-values estimated for 15 codons indicate that the AvrP4 gene is under strong positive selection within Melampsora. The ω estimated along the branches of the gene tree imply that strong positive selection has occurred throughout the evolutionary history of the gene. Diversifying selection has previously been observed among allelic variants of AvrP4 in M. lini, associated with differences in recognition by the corresponding host R genes, and suggesting that R gene-imposed selection has driven the accumulation of sequence diversity in this species (Catanzariti et al. 2006; L. G. Barrett 2006, unpublished data). Such R gene-imposed selection is a likely explanation for the rapid divergence of AvrP4 homologues among Melampsora species. However, we do not know whether the AvrP4 proteins from other Melampsora species function as avirulence factors that are recognized by R genes in their host plants. The role of AvrP4, similar to that of bacterial Avr proteins (Tsiamis et al. 2000; Alfano & Collmer 2004; Janjusevic et al. 2006) may be as an effector of pathogen virulence, promoting pathogen growth or suppressing host defence pathways. Thus, AvrP4 may be under selection to maintain a host specific pathogenicity function across different host species, and evade specific host resistance mechanisms. This group of Melampsora species infects diverse host plants, so it is possible that variation in the host target proteins may have driven AvrP4 divergence. Between species variation indicative of co-phylogenetic patterns between hosts and pathogens may reflect co-speciation, involving selection on the effector function of a gene, whereas strong selection acting on Avr genes within host specific lineages is a strong indication that recognition by R genes drives selection. Positive selection detected within the AvrP4 locus across species may have been driven by the effector function of the locus, or the interaction between R and Avr genes or a combination of both.
If the AvrP4 gene does function as an avirulence elicitor across these pathogen species, it raises an important question regarding host defence systems. Do these hosts have a single conserved homologous defence system, or do the hosts each have their own independently evolved mechanisms for recognizing the AvrP4 gene products? Kruijt et al. (2005) have suggested that the interaction between Cladosporium fulvum and host species in the genus Lycopersicon pre-dates speciation in Lycopersicon, since several functional homologues of R genes (Cf-4 and Cf-9) from multiple species in Lycopersicon interact with Avr4 and Avr9 elicitors from C. fulvum. However, the AvrB protein from P. syringae pv. glycinea can act as an avirulence factor in both Arabidopsis thaliana and Glycine max (phylogenetically diverse plants), interacting with the unrelated R genes Rpm1 and Rpg1b, respectively (Ong & Innes 2006).
Strong diversifying selection has been found in several fungal and oomycete Avr genes within a single pathogen species such as ATR1 and ATR13 of Hyaloperonospora parasitica (Allen et al. 2004; Rehmany et al. 2005), scr74 in Phytophthora infestans (Liu et al. 2005), the flax rust AvrL567, AvrM, AvrP123 and AvrP4 genes (Catanzariti et al. 2006; Dodds et al. 2006), and Avr4 in C. fulvum (Stergiopoulos et al. 2007), consistent with R-Avr gene coevolution. Other Avr loci are characterized by presence and absence polymorphisms within species (see DeWit & Stergiopoulos in press). Few studies have observed variation in Avr gene homologues across species. Avr3 of Fusarium oxysporum f. sp. lycopersici is restricted to tomato infecting isolates of this species and does not occur in other formae speciales that infect different hosts (Rep et al. 2004). Comparisons of the complete genomes of three oomycetes Phytophthora ramorum, Phytophthora sojae and H. parasitica infecting oak, soya bean and Arabidopsis, respectively, have been more informative (Win et al. 2007; Jiang et al. 2008). Oomycetes encode a class of host-translocated effectors that are defined by the presence of an N-terminal RxLR uptake motif that occurs shortly after the secretion signal peptide. Approximately 400 such genes were predicted in the two Phytophthora genomes and approximately 100–200 in H. parasitica. Only approximately 10 per cent of the H. parasitica effectors have homologues in the Phytophthora species, while about a third of the Phytophthora effectors are common to both species. This class of effector genes seems to have been subject to a birth and death evolutionary process, with the rapid emergence of new effectors and loss of older ones since speciation. Thus gene conservation, such as AvrP4 across Melampsora species, is relatively uncommon in the limited datasets currently available and suggests that these genes perform important functions in rust infection.
The conserved cys-knot structure (six cys residues and their spacing) indicates that these residues may be important for gene function across these accessions. Because sequence differences in the C-terminal cys-rich domain of the M. lini isolates resulted in non-recognition by the host, Catanzariti et al. (2006) suggested that this region is important for R gene recognition. As codons with highly significant ω-values were concentrated in this region (figure 2, black bars), we speculate that the corresponding region interacts with the host defence system, and this interaction drives the diversification of the gene. The rapid accumulation of non-synonymous changes detected in the cys knot compared with the rate of synonymous changes indicates that fixation of new mutations played a more important role in the accumulation of sequence variation than stochastic processes (Tellier & Brown 2007). This suggests that Darwinian selection outweighed balancing selection in the gene-for-gene interaction in the AvrP4 locus. It is possible that the N-terminus of AvrP4 is subject to proteolytic processing, as occurs in the C. fulvum cys-knot protein Avr9 (Vervoort et al. 1997), which would explain the relaxed selection in this region. Likewise, diversifying selection is confined to the C-terminal regions of ATR1 and ATR13 (Allen et al. 2004; Rehmany et al. 2005). Win et al. (2007) found a similar uneven pattern of diversification in many effector gene families in Phytophthora. In these cases, it is the N-terminal RxLR motif that is protected from diversification, presumably due to its conserved function in effector transport. It is possible that the N-terminal I(Q/R)GFS and L(E/Q)E(E/D)S motifs in AvrP4 have a conserved role such as in protein processing or transport into host cells.
Contrasting patterns of selection were observed for the accessions where two variant sequences were amplified. In the M. lini accessions from the Hesperolinon hosts (M5–M10; figures 2 and 3), the higher ω-values along the B copy branch contrasted markedly with those along the A copy branch (figure 3), indicating that positive selection may only be acting on the one copy (B). While multiple copies may act as reservoirs of mutational variation, selection may not act on all copies. Values of ω close to 1 may also indicate that the gene or a specific copy is non-functional. However, maintenance of conserved areas most probably indicates that the copy has been functional in the recent past.
In contrast to the Hesperolinon rusts, for the two sequences from M. reticulatae, values of ω>1 for both branches leading to the terminal nodes indicate positive selection driving amino acid substitutions (M16 and M17; figures 2 and 3) in both variants. The origin of divergent A and B variants may not be the same for the two species. Processes such as incomplete lineage sorting, hybridization, allelic diversity and gene duplication could all result in divergent copies of a gene from one species (Maddison 1997).
Investigation of genetic variation in a known avirulence gene has revealed a locus that has been under strong positive selection in a number of rusts occurring on highly diverse hosts. The high values of ω across the gene phylogeny support the idea that the AvrP4 gene is an avirulence elicitor across the genus Melampsora and that it has an important function that has been maintained through structurally conserved regions within the gene. Selection may have acted on these sequences because of the effector function of the locus or in order to escape recognition by host resistance factors or both. These findings are important for understanding the evolution of genes involved in the interaction between pathogens and their hosts and highlight that such genes could have originated and diversified long before a specific host–pathogen interaction.
The authors acknowledge the financial support of the Grains Research Development Corporation of Australia, Uri Springer for supplying material of the Hesperolinon rusts, John Walker for rust accessions and Celeste Linde (Australian National University, School of Botany and Zoology) for computer and office facilities, and the valuable comments of anonymous reviewers.