|Home | About | Journals | Submit | Contact Us | Français|
The American Eel (Anguilla rostrata) has an exceptional life cycle characterized by panmictic reproduction at the species scale, random dispersal, and selection in a highly heterogeneous habitat extending from subtropical to subarctic latitudes. The genetic consequences of spatially-varying selection in this species have been investigated for decades, revealing subtle clines in allele frequency at a few loci that contrast with complete panmixia on the vast majority of the genome. Because reproduction homogenizes allele frequencies every generation, sampling size, and genomic coverage are critical to reach sufficient power to detect selected loci in this context. Here, we used a total of 710 individuals from 12 sites and 12,098 high-quality single nucleotide polymorphisms to re-evaluate the extent to which local selection affects the spatial distribution of genetic diversity in this species. We used environmental association methods to identify markers under spatially-varying selection, which indicated that selection affects ~1.5% of the genome. We then evaluated the extent to which candidate markers collectively vary with environmental factors using additive polygenic scores. We found significant correlations between polygenic scores and latitude, longitude and temperature which are consistent with polygenic selection acting against maladapted genotypes in different habitats occupied by eels throughout their range of distribution. Gene functions associated with outlier markers were significantly enriched for the insulin signaling pathway, indicating that the trade-offs inherent to occupying such a large distribution range involve the regulation of metabolism. Overall, this study highlights the potential of the additive polygenic scores approach in detecting selective effects in a complex environment.
Panmictic species challenge the classic view of local adaptation, whereby individuals from local populations are more fit to their own habitat compared with migrants from different populations (Kawecki and Ebert 2004). Indeed, local adaptation cannot develop in panmictic species because random mating and random dispersal repeatedly bring individuals carrying “maladaptive” alleles every generation over the entire species range (Yeaman and Otto 2011). Locally, deleterious alleles should be eventually removed by selection, but this elimination can take several hundred generations, especially if selected alleles have variable effects across different environments (Gagnaire et al. 2012). In certain conditions, such as when species occupy different niches where different alleles are favored, spatially-varying selection could maintain polymorphism at selected loci (Levene 1953). Most of the time, however, locally selected loci are only transiently polymorphic and eventually attain fixation if selection is not well balanced (Yeaman 2015).
The American Eel (Anguilla rostrata) is a catadromous panmictic species (Avise et al. 1986; Wirth and Bernatchez 2003; Côté et al. 2013; Pavey et al. 2017) occupying a large geographic range that extends from the Caribbean to Greenland all along the North American Atlantic coast. Environmental conditions dramatically differ between the southern tropical and the northern subarctic part of the species range, translating into a strong latitudinal gradient of selection (Gagnaire et al. 2012; Williams et al. 1973). The species also occupies a wide variety of freshwater, brackish and saltwater habitats, as well as clean versus polluted sites, which represent additional selective challenges (Pavey et al. 2015; Laporte et al. 2016). Although eels spend most of their life in continental or coastal waters, maturing adult eels eventually return to the Sargasso Sea to mate and then die (Béguer-Pon et al. 2015). Young planktonic larvae (leptocephali) that hatch in the Sargasso Sea are then advected to the American coast by passive drift in the Antilles Current and the Gulf Stream (Bonhommeau et al. 2010; Vélez-Espino and Koops 2010). As they reach the continental shelf, the leptocephali undergo metamorphosis into glass eels that use a selective tidal stream transport to reach the coast (McCleave and Kleckner 1982). Recruiting glass eels can either stay in saltwater along the coast, settle in estuarine brackish water, or continue their inland migration upstream into freshwater (Cairns et al. 2009). In this complex life cycle, the panmictic allelic combinations that are produced every generation are randomly dispersed across the whole habitat, making it impossible for local adaptation to appear even if eels have to face strong environmental selective mortality rates due to varying environmental constraints across their range.
The genetic consequences of spatially-varying selection have been studied in the American Eel for decades, leading to the finding of temporally stable latitudinal clines at allozyme loci (Williams et al. 1973; Koehn and Williams 1978). The question has been revisited more recently with population genomic approaches, with the aim to screen for gene-associated SNPs exhibiting genetic differentiation between the extreme parts of the species range despite panmixia (Gagnaire et al. 2012). Candidate outlier SNPs scrutinized for association with environmental variables in a large cohort of individuals sampled throughout the species range allowed validating the influence of spatially-varying selection at 13 coding gene SNPs, most of which were partly associated with river mouth temperature when eel larvae approach the coast. Most of these SNPs were found to be transient polymorphisms contributing to genetic-by-environment interactions during progress toward allelic fixation (Gagnaire et al. 2012).
A less resolved issue concerns the proportion of the genome that is influenced by spatially-varying selection. In a genome scan study based on 50,000 RAD-seq markers, Pujolar et al. (2014) identified 710 loci probably influenced by spatially-varying selection in the European Eel (Anguilla anguilla), the American Eel sister species. Among these, only 74 loci displayed significant associations with environment variables, representing a small fraction (~0.15%) of the SNPs considered. Higher proportions (0.8–1%) were found in the American Eel in two RAD-Seq studies that documented differential selection between freshwater and saltwater habitats (Pavey et al. 2015) and between clean versus polluted sites (Laporte et al. 2016). Both of these studies, however, applied a polygenic statistical framework (multi-locus random forest algorithm approach) that differs substantially from genome-scan statistical methods searching for single-locus selection based on allele frequencies. Previous studies comparing the efficiency of polygenic multi-locus analysis versus single locus genome scan to detect selection in other fish species revealed that the former approach surpassed the latter (e.g. Bourret et al. 2014). More generally, population genomic studies face statistical challenge to detect individual outliers in panmictic species for two reasons (Gagnaire and Gaggiotti 2016). First, moderate selection on single gene traits can only generate small allele frequency changes within a single generation. Second, a comparable intensity of selection acting on polygenic traits results in even smaller allele frequency changes, because the selective pressure is spread over the many loci contributing to the traits under selection (Pritchard and Di Rienzo 2010; Le Corre and Kremer 2012). Because sample size determines the power of such tests to detect a selected locus with a given effect size, increasing sample size while controlling for confounding effects (e.g. temporal variation) remains the most effective way to detect loci influenced by selection.
The goal of this study was to re-evaluate the extent to which spatially-varying selection affects the spatial distribution of genetic diversity in the American Eel. We implemented a reference-guided RAD-Seq genome scan based on the published sequence and annotations of the American Eel genome (Pavey et al. 2017). In order to detect small allele frequency changes generated by selection, we more than doubled the sample size and localities used in previous RAD-Seq studies in Atlantic Eels, and controlled for temporal effects by focusing on glass eel cohorts from two consecutive years. We used environmental association methods and confirm the results of these analyses by simulations to reduce the false positive detection of markers influenced by selection, building on the principle that genetic-by-environment associations are not confounded by neutral geographic differentiation given the panmictic nature of American Eel. We then used additive polygenic scores (Gagnaire and Gaggiotti 2016) to evaluate how the cumulative signal displayed by outlier loci correlates with environmental factors, both at the individual and the sampling site levels. The genes associated with outlier markers and their functions were then analyzed to test for functional enrichments that could provide indications on the currently unknown phenotypes that are undergoing spatially-varying selection.
A total of 710 glass eels studied by Gagnaire et al. (2012) were used for this study. Those individuals were sampled in 2008 at the mouth of 12 tributaries, soon after recruitment to estuaries. Among these, three geographically distant sites were sampled again in 2009 for temporal replication and a 13th site was sampled in Nova Scotia in 2009 only (fig. 1). Sampling sites cover the core of the American Eel's distribution range spanning from Newfoundland to Florida. Details on sampled individuals are provided in table 2. Except for one site from Georgia (n=24), the mean number of samples was 45 by site.
Whole genomic DNA was extracted from tissues using a salt extraction protocol (Aljanabi and Martinez 1997) with RNase A treatment. DNA quality was checked on agarose gel and DNA concentration was quantified using PicoGreen (Invitrogen). DNA libraries were prepared with the RAD-sequencing protocol used by Benestan (2015) using the restriction enzyme, EcoRI which recognized a 6pb cutting site. The libraries were prepared in pools of 12 individuals. For the sequencing, four pools of 12 individuals with different sets of barcodes were put in the same sequencing lane for a total of 17 sequencing lanes. 82 individuals with a low sequencing success were sequenced twice to increase their coverage. Single-end 100bp sequencing was performed on an Illumina HiSeq 2000 at the Genome Quebec Innovation Center (McGill University, Montreal, Canada).
Following sequencing, adapters were removed from the reads using Cutadapt (Martin 2011). The reads were then demultiplexed and trimmed to 80bp using the process_radtags module of Stacks version 1.32 (Catchen et al. 2011, 2013). Reads were then mapped on the American Eel genome (Pavey et al. 2017) using BWA aln (Li and Durbin 2009). The American Eel genome has a size of 1.41Gb and is assembled on 79,000 scaffolds (Pavey et al. 2017). SNPs were called using SAMtools mpileup and BCFtools view modules (Li et al. 2009). Only markers present in at least 80% of the individuals and with a global minor allele frequency of at least 0.02 were retained. A maf of 0.02 was applied as a compromise between minimizing false SNPs caused by sequencing errors and retaining rare alleles. The remaining markers were then filtered based on their sequencing depth to keep only SNPs with an overall coverage of at least 1,500 reads for downstream analyses. The SNPs allele frequencies were then estimated using estpEM (Gompert et al. 2014) which uses the EM algorithm (Li 2011) to take into account genotype likelihoods of the different individuals present in the population to deal with low coverage sequencing data. The program was run independently for each site and temporal replicate.
To identify markers influenced by spatially-varying selection, potential explanatory variables were considered, including the latitude and longitude of the sampling sites, the river mouth temperature and year at recruitment (table 1). Temperature data were obtained from a National Oceanic and Atmospheric Administration database (SST14NA; https://www.class.ngdc.noaa.gov/saa/products/search?datatype_family=SST14NA; last accessed November 6, 2017). For the 2008 individuals, the river mouth temperature for each site was determined by the average sea-surface temperature near the river mouth during the 10days prior to the sampling day during recruitment. That time period was chosen because it corresponded to the most highly correlated temperature with loci under selection in Gagnaire et al. (2012). For the 2009 individuals, the exact sampling date was not available, and therefore, the temperature used for those samples was the average sea-surface temperature of the river mouth during the sampling month.
A RDA was performed on all the markers retained after the filtering steps using the R package “VEGAN” v. 2.3-2 (Oksanen et al. 2017). The RDA was used to estimate the proportion of genetic variance in the entire data set that is explained by the environmental factors. Then, the significance of the RDA was tested with an ANOVA. Both, the joint effects of the four constraining variables as well as their marginal effects were tested using ANOVAs performed with 1,000 permutations. Also, the effect of each variable was partitioned using conditioned RDA, in order to evaluate the percentage of genetic variance explained by each variable separately, after removing the variance due to partial correlation with other variables. Here again, an analysis of variance (ANOVA) was performed to verify the significance level of each of the four partial RDAs. Finally, the explicative importance of the different variables was represented as vectors in biplot graphs.
In order to identify subsets of loci showing extreme contributions (RDA outliers) to the main directions of constrained variation, we selected the markers with the 0.5% most positive and negative scores for the first and the second RDA axes. We chose a threshold of 0.5% as a compromise between neither being too conservative nor too permissive. To minimize the effect of linkage disequilibrium, only one marker per RAD locus pair was retained. When more than one outlier was found on the same locus pair among the potential outliers, the marker with the highest score was retained. A locus pair is two reads that are separated by the same restriction site. A new RDA was then performed with the subset of markers identified as outliers in order to evaluate the extent to which genetic variation at outlier loci was explained by each explanatory variable.
In order to detect outlier loci possibly influenced by spatially-varying selection, we used the program Bayenv2 (Coop et al. 2010; Gunther and Coop 2013) which searches for exceptional correlations between explanatory variables and allele frequencies at individual locus. Although this method specifically takes into account the underlying population structure, it can also deal with the particular case of a panmixia. We used the markers retained after the different filtering steps to calculate the covariance matrix using all samples with 100,000 iterations. Then, 12 runs of 100,000 iterations each were performed using the covariance matrix to identify the most strongly associated markers with each of the four explanatory variables. These 12 replicates aimed at ensuring high repeatability, in order to reduce the number of false positive detection due to the variance in the level of support observed among runs for some markers (Blair et al. 2014). Markers were considered to be outlier if their average Bayes factor across the 12 runs was >3, and above 3 for at least 6 of the 12 runs. We used a threshold >3 following the recommendation of Jeffreys (1961) for the significance of Bayes factor. Then, a RDA was performed with the subset of outlier identified with Bayenv2 in order to evaluate the proportion of their total variance explained by the constraining variables taken together and separately.
The rate of false positives detected by Bayenv2 was determined using simulated data sets. A total of 100 data sets were simulated by randomly assigning the individuals to another site, to simulate a random dispersal without the effect of selection. A covariance matrix was calculated for each of these data sets. Then only one run of Bayenv2 was performed for each data set given our goal of estimating the number of false positives. We then compared the number of outlier detected in each simulation for the different variables to the average number of outlier detected by the 12 runs made on the real data set for the different variables.
Conditioned RDAs (partial RDAs) were performed on a combined data set comprising all the outliers detected by the RDA and by the Bayenv2 approaches. These partial RDAs aimed at identifying which explanatory variable contributed the most to each outlier SNP variation. Marker scores were ranked for each partial RDA performed separately with each variable, and the variable for which a given marker had the highest rank was considered to be the most influencing variable.
The cumulative signal of multiple outlier SNPs markers can be assessed using polygenic scores to evaluate how their joint effect mirrors environmental variation (Gagnaire and Gaggiotti 2016). When the selective effects of individual allele are unknown, polygenic scores are obtained for each individual by summing over loci the number of alleles that are inferred to be favored in a given environment (Hancock et al. 2011; Aarnegard 2014). This relates directly to the concept of additive genetic variance within a theoretical quantitative genetics framework (Roff 1997). Here, alleles that were associated with increasing values of temperature, latitude or longitude were identified based on the sign of their correlation with a given variable for each outlier. Polygenic scores were then computed at the individual eel level. Individual polygenic scores for each eel were obtained by summing over loci the number of favored alleles in a given environmental condition. Correlation between individual and latitude, longitude and temperature were then independently assessed to evaluate how well the cumulated signal of locally putative adaptive alleles varies with each explanatory variable. For each variable, three models (a linear, a quadratic and a logistic) were tested to identify which one fits best the relationship between the polygenic scores and the environmental variable. For each variable, we selected the model with the lowest Akaike information criterion (AIC) value. To evaluate the extent to which extreme polygenic scores found in each locality were correlated with the environmental variables, we assessed the correlations between the 10th and 90th centile of the polygenic score of each site and the environmental variables values. Correlations were tested using a linear and a quadratic model.
Using the mapping of outlier markers onto the American Eel genome, functional annotations were used to identify the closest gene sequences for each outlier. Only the closest genes located within 100kb from each outlier were considered. A gene ontology (GO) term enrichment analysis was then performed on the list of candidate genes potentially associated with outlier loci using DAVID web-server version 6.7 (https://david.ncifcrf.gov/tools.jsp; last accessed November 6, 2017; Huang et al. 2007). Zebrafish (Danio rerio) annotations were used as a reference genome for the GO term enrichment analysis, because this species has the most complete list of annotated genes with known functions among teleost fishes.
Sequencing of RAD libraries resulted in a total of 3.2×109 reads. After the alignment, 7.5 millions of SNPs were identified by SAMtools. Following SNP calling, 656,244 SNPs that were present in at least 80% of the individuals were kept, among which 12,098 were retained after applying minor allele frequency and coverage depth thresholds. The 12,098 SNPs, which were used to detect candidate markers potentially affected by spatially-varying selection in downstream analyses, were present on 837 scaffolds.
The RDA conducted using all 12,098 retained markers did not reveal a significant amount of genetic variation associated with the four constraining variables (F=1.04, P=0.188). Although the RDA model including the four explanatory variables explained 27.5% of the genetic variance observed among sites, this amount was lower than the cumulated variance explained by the first four unconstrained PC axes. The four variables explained similar proportions of genetic variance varying between 6.6% and 6.8%, but their marginal effects were not significant. Despite this lack of significance, the projection of sampling locations on the first two RDA axes closely reflected the geography and thermal characteristics of sampling sites along the directions indicated by the geographic and temperature vectors (fig. 2A). The 0.5% of the markers with the most positive and negative scores for the first two axes that were considered as RDA outliers represented 197 markers after retaining only one marker per RAD locus pair (supplementary table 1, Supplementary Material online).
The RDA performed using the subset of 197 outliers detected by the global RDA explained 35% (F=1.4759, P=0.018) of the genetic variance present at these markers. The first two RDA axes respectively explained 15.4 and 11.6% of genetic variance among sites (fig. 2B). The temperature vector was in an opposite direction comparing to the vectors for latitude and longitude and the vector for the year of capture was orthogonal to the three other constraining variables.
A total of 90 SNPs detected as candidate outliers with Bayenv2 were retained for further downstream analyses (supplementary table 2, Supplementary Material online). Among these, 15 outlier markers were associated with the latitude of the sampling site, 16 to the longitude (among which four were also latitude-related outliers), 16 markers to local temperature, and 47 markers were related to the year of recruitment of the glass eels. Among the 90 outliers detected with Bayenv2, 19 were among the 197 RDA outliers, and 9 additional ones were located within 100bp of one of these RDA outliers. Thus, 31% percent of the Bayenv2 outliers were also detected by the RDA outlier detection approach. The RDA performed on this subset of 90 Bayenv2 outlier markers explained 45.6% (F=2.3086, P<0.001) of the genetic variance present at these markers. The first 2 axes respectively explained 19.4 and 12.9% of genetic variance among sites (fig. 2C). The temperature was mostly related the first axis and in opposite directions to both longitude and latitude whereas the year of capture was mostly related to the second axis.
The simulations revealed that a significantly higher number of outliers were detected by Bayenv2 with the real data set compared with the simulated ones for the four variables (fig. 3A–D). Thus, except for the temperature for which one simulation out of 100 returned a number of outliers higher than the real data set, no run detected a higher number of outlier than the average of the 12 runs made with the real data set. These results therefore indicate that false positives are unlikely to have driven the observed patterns of association between allelic and environmental variation observed at some loci, which is therefore suggestive of selection driven by local environmental condition. Note that the number of outliers detected in each run was higher than the number of outliers used for further analyses. This is because, not always the same outliers were detected in each run and in order to remain conservative in downstream analyses, we retained only the markers that were significant in 6 runs or more out of 12.
Partial RDAs were performed on a data set combining all 259 outliers that were detected by one or both approaches. The year of recruitment explained the most important proportion of the total variance (10.3%, P=0.025), followed by the temperature (8%, P=0.12) as well as the latitude and the longitude (6.1%, P=0.4). The variable with the highest influence on each marker was identified. The year of recruitment was related to 76 outliers, followed by the temperature with 70 markers. The latitude was related to 64 markers and the longitude to 49 markers. In some cases, the variable that was detected to be related to an outlier by Bayenv was not the same as for the partial RDA. The variable related to each outlier marker is indicated in the supplementary table 3, Supplementary Material online.
The correlations assessed between each variable and the corresponding additive scores calculated using outlier loci (fig. 4A–C) and were all significant. For the three variables, the quadratic model had the lowest AIC, and it was followed by linear model and then the logistic model. Longitude was the variable with the highest correlation coefficient (R2=0.154, P≤10E-15), followed by the latitude (R2=0.152, P≤E10-15) and the temperature (R2=0.142, P≤E10-15). Correlations between the 10th and 90 centile of polygenic scores in a site and variables were significant for all three variables (R2=0.65–0.87; table 2). Correlations for the latitude and the temperature were best explained by a linear model. For longitude, the quadratic model was the best one.
The 259 outliers detected combining Bayenv2 and RDA analyses were distributed across 204 different scaffolds, 45 of which had more than one outlier. A total of 217 outliers were within 100kb from at least one nearby gene enabling the identification of 241 genes associated with an outlier (supplementary table 3, Supplementary Material online). Among them, eight genes had an exonic outlier SNP and 50 genes had an intronic outlier, one gene had both exonic and intronic outliers and 56 genes were within a 10kb distance from an outlier. For the 28 markers detected both by Bayenv2 and RDA, two markers were located in exons, 8 markers were in introns, and 10 were within 10kb from the nearest gene. The proportion of markers present in introns (20%) and exons (3%) was similar between the outliers and the whole data set.
No GO term was found significantly enriched in the analysis performed with DAVID. However, the KEGG pathway insulin signaling pathway showed a significant enrichment (P-value=0.04). Among the 171 genes implicated in this pathway, five (PRKACAA, GYS1, PTPN1, IKBKB, PHKG1A) were represented in our data set of genes potentially affected by selection. The gene GYS1 has an outlier into one of its intron. This gene codes for the glycogen synthase muscle isoform which produce glycogen in the muscle (Palm et al. 2013). The gene IKBKB, which has an exonic outlier, codes for a subunit of the protein IKK involved in the NF-kappa-B pathway which regulates the expression of multiple genes (Häcker and Karin 2006).
The list of outlier-associated genes also contained four genes coding for proteins implicated in the mitochondrial electron transport (NDUFA4, ATP5F1, NDUFA10, COX10) and five genes (SNX14, PHACTR4B, SHANK3, ESRP2, and NFATC3) that were previously detected to be potentially under spatially-varying selection between the glass eel and the yellow eel stages in the European Eel (Pujolar et al. 2015). Among them, two have intronic outliers (ESRP2 and NFATC3).
The goal of this study was to implement a genome-scan approach to evaluate the extent to which spatially-varying selection shapes genetic variation across the American Eel genome. Environmental correlation methods combined with simulations were used to reduce the false positive detection rate of outlier loci, controlling for temporal effects to identify polymorphisms influenced by selection within a single generation. We also evaluated the extent to which candidate markers collectively reflect environmental variation using an approach based on additive polygenic scores which has been rarely applied thus far in studies of nonmodel species. After screening a total of 12,098 stringently filtered RAD SNP markers in 710 glass eels from the entire species range, we identified 183 loci potentially affected by spatially-varying selection after controlling for temporal outliers. This represents ~1.5% of the genome, which is higher but still close to previous comparable estimates (Pavey et al. 2015; Laporte et al. 2016; Pujolar et al. 2014). This result is thus consistent with polygenic selection acting on multiple mutations with spatially-varying effects on fitness. Below, we discuss why the list of candidate loci detected here represents only a fraction of the genomic regions affected by spatially-varying selection. To support the “many genes of small effects” hypothesis, we show that subsets of outlier loci individually associated to a similar environmental variable collectively display a highly significant association with environmental variation. Finally, we searched for functional enrichment in biological pathways that may provide cues for the unknown gene networks under polygenic selection. We found an overrepresentation of the insulin signaling pathway, which together with previous findings (Gagnaire et al. 2012), indicate that lipid and saccharide metabolism pathways are among the main biological functions under spatially-varying selection across one the most ecologically contrasted species range in the marine realm.
The small changes in allele frequency at individual loci generated by spatially-varying selection in panmixia challenge the power of detecting the molecular footprints of selection, especially when selection acts on polygenic traits (Gagnaire and Gaggiotti 2016). Environmental association methods (e.g. Coop et al. 2010; Frichot et al. 2013; Guillot et al. 2014) offer interesting approaches to detect polygenic selection, although their relative power and error rate differ depending on the scenario considered (de Villemereuil et al. 2014; Wellenreuther and Hansson 2016). Here, we were interested in detecting patterns of allele frequencies associated with a selection gradient, a scenario in which Bayenv usually shows a low error rate. Although Bayenv takes into account the covariance in allele frequency across samples to test for locus-specific association with each environmental variable, the absence of neutral population structure to account for in our data set should at worst make outlier detection more conservative. The panmictic reproduction regime of the American Eel should thus eliminate the risk of detecting false positives due to spatial autocorrelation, a classical pitfall for genetic-environment association methods (Hoban et al. 2016). To complete this approach, we combined it with a constrained ordination method (RDA) that uses linear combinations of environmental variables to describe how environmental variations explain patterns of allele frequencies across samples. Because in that case outliers were identified as the strongest contributors to the constrained axes, the RDA approach seems appropriate to detect markers potentially influenced by multiple explanatory variables simultaneously. On the downside, this approach does not include a statistical framework to address the significance of candidate markers individually, which implies defining a somewhat arbitrary empirical threshold for outlier detection. Despite these differences between the two methods, 31% of the outliers that were detected by Bayenv were also found with the RDA approach, thus providing additional evidence for spatially-varying selection at these loci.
When combining all the 259 outliers detected by the Bayenv and RDA approaches, 29.3% of these markers were related to the year of recruitment which explained 10.3% of the total outlier genetic variance after accounting for confounding effects in a conditioned RDA. The importance of the recruitment year could be explained by the fact that environmental conditions vary among years potentially resulting in selection for different alleles. For example, the river mouth temperature at the time of larval recruitment to the coast was different between the two samplings years for the sites that were sampled both in 2008 and in 2009 at about the same time of the year. The genetic composition of the pool of reproducing adults in the Sargasso Sea may also vary from year to year. For instance, Côté et al. (2013) found yearly variation in the effective number of breeders (Nb) associated to the North Atlantic Oscillation (NAO), which could affect the gene pool composition of glass eels. The NAO has also a marked influence on North Atlantic surface winds which affects the Atlantic thermohaline circulation (Hurrell et al. 2001). A recent meta-analysis examining factors influencing selection in >160 species of plants and animals has found that the NAO was one of the factor explaining the biggest part of the temporal variation in natural selection observed in these species (Sipielsky et al. 2017). Finally, the sweepstake reproductive success of adults (Hedgecock and Pudovkin 2011) may generate temporal variation in the genetic composition of yearly cohorts, as already observed in the European Eel (A. anguilla; Maes et al. 2006; Pujolar et al. 2006). Therefore, the predominance of temporal outliers could be explained by a combination of temporally variable selective regimes acting on stochastically changing gene pools due to environmental forcing and sweepstake reproduction. The 183 remaining outlier loci were individually correlated with either temperature (38%), latitude (35%), or longitude (27%), and were therefore likely affected by spatially-varying selection acting during the early glass eel life stage (Gagnaire et al. 2012).
Inevitably, a genome scan approach based on environmental correlations may be underpowered to detect selected alleles with small individual effect, because subtle correlations between allele frequencies and explanatory variables may not reach statistical significance. This effect is also amplified by the reduced capacity of detecting linkage disequilibrium between our RAD markers and the actual causative alleles under selection, due to limited marker density (<1 SNP per 100kb). Because there is a tradeoff between the power to detect true positive associations and the risk of false positive detections, our list of candidate loci detected here probably contains false positives, while only representing a fraction of the genomic regions affected by spatially-varying selection.
Because of the above limitations, and in order to determine the extent to which individual outlier loci collectively mirror environmental variation across the American Eel distribution range, we also used a multilocus approach based on additive polygenic scores for different environmental factors. Ideally, genotypic variation at outlier loci detected in genome scans should be compared with individual phenotypic data to establish associations between genotype, phenotype and fitness (Barrett and Hoekstra 2011). In the case of polygenic adaptation, additive polygenic scores can be used to assess the joint contribution of individual loci to a quantitative phenotype or a measure of fitness (Gagnaire and Gaggiotti 2016; Hancock et al. 2011; Aarnegard 2014). Because spatially-varying selection effects on phenotype remain unknown in the American Eel, we could not use this approach to validate the genotype–phenotype link at outlier loci. However, we used a similar approach to evaluate the additive signal of outlier loci against environmental variation. In order to circumvent the possible circularity of an analysis based on the pooled polygenic scores of whole samples, additive polygenic scores were calculated at the individual level. Outlier loci were detected using single locus association tests that do not consider individual genotype information separately, nor preferential allelic associations among loci. Given the panmictic nature of eel population structure, and in absence of selective effects, genotypic similarity among individuals from a given sampling site would not be expected to be more important than among individuals from different sites. Here, correlations between individual polygenic scores and environmental variables were highly significant and best followed a quadratic model. Interestingly, these quadratic relationships closely mirror the one previously documented between the condition factor and the latitude of glass eels (Laflamme et al. 2012). This provides cues for a possible link between genotype and phenotype and suggests that the total number of locally favorable alleles displayed by individuals affects their chance of survival in a given location. Compared with previous studies on the same topic, our results thus provide new indications on the spatial patterns of polygenic variation generated by spatially-varying selection in eels. This should encourage further developments to specifically detect the molecular footprints of polygenic selection within a multilocus quantitative genetics framework (e.g. Stephan 2016).
Although the selective factors associated with longitude are still not very clear, the finding of higher polygenic scores at intermediate longitudes as supported by the quadratic model suggest that selection favors different optimal genotypes between the core region of the species range and its periphery on a longitudinal axis. For latitude the correlation may be explained by its strong covariation with environmental and biotic parameters, which is consistent with the molecular footprints of selection along latitudinal gradients observed in many species, such as Drosphila melanogaster (Adrion et al. 2015), Arabidopsis thaliana (Debieu et al. 2013), and Anopheles gambiae (Cheng et al. 2012). Moreover, eels that were captured at higher latitude had to migrate for a longer time and distance compared with the ones that were captured at lower latitudes. During their oceanic dispersal, eel larvae face different environmental conditions depending on where they are brought by the ocean currents. This could result in the selection for different alleles during larval transport which mostly follows a latitudinal trajectory. The relationship between polygenic scores and latitude follows a quadratic model which revealed that polygenic scores increase more rapidly at the lowest latitude than at the highest ones, which is selection gradient appears to be steeper in the southern part of the range.
The river mouth temperature during glass eel recruitment was the third strongest predictor of outlier polygenic scores. River mouth temperature, which can vary from 3 to 18°C at time of recruitment, has previously been identified to be an important selective agent responsible for spatially-varying selection in both the American Eel (Gagnaire et al. 2012) and the European Eel (Pujolar et al. 2014). As for the latitude, the relationship between the polygenic score and temperature follows a quadratic model into which the scores increase more rapidly in the warmer sites than in the colder ones, supporting that the strongest selection gradient is imposed by high temperatures. These observations, as well as the distribution of extreme polygenic scores across sampling sites, suggest that selection favors different optima throughout the species range by eliminating locally maladapted extreme genotypes.
In order to find cues regarding the phenotypic effects of the selected loci, we analyzed the functions of the 241 genes that could be associated with outliers. Among these, four genes coding for subunits of different mitochondrial electron transport chain complexes were found. This is a relevant result considering the role of the electron transport chain in ATP production, and the range of distances that glass eels have to travel to reach different parts of the species range during dispersal from the Sargasso Sea.
Another result of interest was the finding of five genes that were previously identified to be under selection between life stages in the European Eel (Pujolar et al. 2015). Because American and European eels are only partially reproductively isolated and exchange genes through introgressive hybridization (Albert et al. 2006; Gagnaire et al. 2009; Jacobsen et al. 2017), the sharing of locally adaptive variants is expected to occur. However, previous studies have found only limited evidence for parallel patterns of spatially-varying selection in Atlantic eels (Ulrik et al. 2014). Among the five candidate genes shared between species, three (pactr4b, SNX14, and SHANK3) are implicated in brain development (Cho et al. 2014; Ha et al. 2015; Kozol et al. 2015), perhaps reflecting similar selective constraints on this biological pathway in the two species.
Finally, the insulin signaling pathway, which was represented by five outlier genes, was found to be significantly enriched among the list of outlier genes potentially affected by selection. Here again, this pathway was previously found to be significantly enriched among genes affected by selection between life stages in the European Eel (Pujolar et al. 2015), which strengthens its candidate functional role. The insulin pathway is activated in response to the binding of the hormone on its receptor and plays a central role in the regulation of usage and storage of cellular glucose (Bevan 2001). Phenotypically, this pathway has also been related to body size in Drosophila (De Jong and Bochdanovitz 2003). In American glass eels, measures of otolith daily growth increments have shown a latitudinal gradient in growth rate (Wang and Tzeng 1998), with faster growing and earlier maturing glass eels at intermediate latitudes. The overrepresentation of the insulin signaling pathway playing a role in the use of energy reserve also corroborates previous findings indicating selection on lipid and saccharide metabolism pathways (Gagnaire et al. 2012). Altogether, this suggests that metabolic and energy productions are among the more important ones pathways under spatially-varying selection in the American Eel. Because spatial variation in individual growth rates has been reported (Wang and Tzeng 1998), future studies should aim at testing whether outlier polygenic scores partly account for differences in growth rate across the species range, or whether these are mainly caused by phenotypic plasticity in response to environmental conditions.
Finally, the admittedly small subset of annotated genes found here to be potentially under selection represents a good first set of candidates for further functional studies on the mechanism implicated in the spatially-varying selection in American Eel. For instance, the expression of these genes in individuals facing different environmental conditions (e.g. different water temperatures) could be analyzed in order to test for differential allelic expression and its association with growth related phenotypes in the different conditions.
C.B. and L.B. conceived the study. C.B. performed the lab work. C.B., P.A.G., and S.A.P. analyzed the data. C.B. and P.A.G. wrote the first draft of the manuscript. All authors contributed to the revision of the manuscript.
Supplementary data are available at Genome Biology and Evolution online.
We thank Clément Rougeux who helped for library preparation, Damien Boivin-Delisle who helped for DNA extraction, and all the Bernatchez lab members for discussion and support. We are also thankful to two anonymous referees and Associate Editor C. Baer for their constructive and useful comments on an earlier version of the manuscript. This work was supported by a Natural Science and Engineering Research Council of Canada (NSERC) discovery grant and Canadian Research Chair in Genomics and Conservation of Aquatic Resources to L.B. [grant number STGP 493768].