Individual genotyping data were modelled according to the reparameterized extension, recently proposed by Riebler et al. [
8], of the initial Bayesian hierarchical model developed by Beaumont and Balding [
7]. Briefly and in the bi-allelic SNP case, let
aij be the observed reference allele (defined arbitrarily) count in population
j = 1, ...,
J at locus
i = 1, ...,
I. The conditional distribution given the true (unobserved) allele frequency
pij in that population at that locus is assumed to be binomial with parameters
nij (twice the number of genotyped individuals in population
j at locus
i) and
pij:
aij |
pij~
Bin(
nij,
pij) (1). Note that 1) implicitly assumes populations are in HWE or their respective inbreeding coefficients (
FIS) are null. Non null
FIS could be taken into account in the model by considering instead that the three possible genotypes are drawn from a multinomial distribution with parameters corresponding to the number of individuals genotyped and genotype probabilities: (1-
FISj)
pij2 +
FISjpij; 2(1-
FISj)
pij(1-
pij) and (1-
FISj)(1-
pij)
2 +
FISj(1-
pij) [
18]. Nevertheless, for co-dominant markers such as SNP and given the range of
FIS values in our study (see Results), the binomial distribution seems to be reasonable (Figure S3 in additional file
1). The second step assumes that the true allele frequencies
pij are themselves sampled from a Beta distribution:
pij |
λij,
xi~
Beta(
λijxi,
λij(1-
xi)) (2). This distribution relies on an infinite Wright island model involving mutation, drift and migration at its equilibrium state [
76,
77]. Under this model,
xi might be interpreted as the frequency of the chosen reference allele at locus
i in the gene pool from which each allele frequency
pij is sampled. The scaling parameter
λij reflects the gene flow from the gene pool to population
j. Under Wright's model, this parameter is specific to each population
j but remains homogeneous over loci so that any departure from that property may indicate that locus
i is no longer neutral. Note that under the Beta distribution, the first two moments can be simply expressed as
E(pij) =
xiand
Var(pij) =
xi(1-
xi)(1+
λij)
-1. Actually (1+
λij)
-1 can be identified as a
FSTij coefficient [
17]. The next level of the model consists of specifying the distributions of
λij (or
FSTij) and of
xi. These frequencies are nuisance parameters and uncertainty about them will be taken into account by assigning to them non informative prior such as a
Beta(1,1) distribution (
e.g. [
78]). Following Beaumont and Balding [
7], the
λij's are modelled via a linear model on the logistic transformation of the
FSTij. Since F
STij/(1-F
STij) = 1/
λij, we can write this model in terms of
ηij =
log(FSTij/(1-
FSTij)) = -log(
λij) as:
ηij =
αi+
βj+
γij where
αi is a locus effect,
βj a population effect and
γij an error term corresponding to a departure of the logit of
FSTij from the additive decomposition. For theoretical and computational reasons, it is more efficient to consider the previous decomposition under a hierarchical Bayesian structure than under its basic form, and implement it as follows:
ηij |
αi,
βj,
σγ2~
iidN(
αi +
βj,
σγ2) with
βj |
μβ,
σβ2~
iidN(
μβ,
σβ2) and
αi |
σα2~
iidN(0,
σα2). Introduction of additional levels by defining priors on the variance components
σα2,
σβ2 and
σγ2 is straightforward but adds marginal gain in the estimates and requires high supplementary computational costs (Gautier and Foulley, unpublished data). We thus gave for these components the known values proposed initially by Beaumont and Balding (2004):
σα2 = 1,
σβ2 = 3.24 and
σγ2 = 0.25. The prior mean
μ for the
βj was also taken as -2.0. Recently, Riebler
et al. [
8] introduced in the above logistic model an auxiliary indicator variable
δi attached to each locus specifying whether it can be regarded as selected (
δi = 1) or neutral (
δi = 0). They demonstrated the efficiency of this approach for improving the power of the statistical procedure, particularly for the detection of loci under balancing selection. Under this reparameterized model, the previous parameters
αi are written as:
αi =
δiαi * with
αi * |
σα*
2~
iidN(0,
σα*
2). The model further assumes a Bernoulli distribution for the indicator
δi variable with parameter
P:
δi | P ~
Bin(1, P). P is itself assumed to be
Beta distributed to take into account uncertainty on this crucial parameter. Here we took
P ~Beta(0.2,1.8) [
8], thus assuming that a priori 10% of the loci are on average under selection, but within a very large range of possible values as the 95% credibility interval goes from almost 0 to 65%. Note that the value
σα*
2 can be derived from
σα2 since by construction
σα2 =
E(P) σα*
2 (hence
σα*
2 = 10).