Home | About | Journals | Submit | Contact Us | Français |

**|**Springer Open Choice**|**PMC2987561

Formats

Article sections

- Abstract
- Introduction
- Background: MCMC framework
- EMCMC framework
- Recombinative proposal distributions for EMCMCs
- MH acceptance rules for recombinative EMCMC
- Three experimental tests
- Conclusions and discussion
- References

Authors

Related links

Evolutionary Intelligence

Evol Intell. 2010 August; 3(2): 79–101.

Published online 2010 July 21. doi: 10.1007/s12065-010-0040-1

PMCID: PMC2987561

Department of Information and Computing Sciences, Utrecht University, PO. Box 80.089, 3508 TB Utrecht, The Netherlands

Madalina M. Drugan, Email: ln.uu.sc@aniladam.

Received 2010 February 23; Revised 2010 June 25; Accepted 2010 July 2.

Copyright © The Author(s) 2010

*Markov Chain Monte Carlo* (MCMC) methods are often used to sample from intractable target distributions. Some MCMC variants aim to improve the performance by running a population of MCMC chains. In this paper, we investigate the use of techniques from Evolutionary Computation (EC) to design population-based MCMC algorithms that exchange useful information between the individual chains. We investigate how one can ensure that the resulting class of algorithms, called *Evolutionary MCMC* (EMCMC), samples from the target distribution as expected from any MCMC algorithm. We analytically and experimentally show—using examples from discrete search spaces—that the proposed EMCMCs can outperform standard MCMCs by exploiting common partial structures between the more likely individual states. The MCMC chains in the population interact through recombination and selection. We analyze the required properties of recombination operators and acceptance (or selection) rules in EMCMCs. An important issue is how to preserve the detailed balance property which is a sufficient condition for an irreducible and aperiodic EMCMC to converge to a given target distribution. Transferring EC techniques to population-based MCMCs should be done with care. For instance, we prove that EMCMC algorithms with an elitist acceptance rule do not sample the target distribution correctly.

Markov Chain Monte Carlo (MCMC) is a framework of algorithms for sampling from complicated distributions. The use of MCMC in Machine Learning has recently been advocated by [1]. Usually, a single MCMC is run until it converges to the stationary distribution. To improve their efficiency, some MCMC variants consist of a population of chains that interact by exchanging useful information and at the same time preserve the MCMC convergence characteristics at the population level. In this paper, we are particularly interested in techniques that use multiple interacting chains in parallel as opposed to a single chain.

The stochastic process of Evolutionary Computation (EC) and MCMC algorithms is basically similar: both are Markov chains with fixed transition matrices between individual states, for instance transition matrices given by mutation and recombination operators for EC and by perturbation operators for MCMC. Furthermore, both Metropolis-Hastings—a subclass of MCMCs—and EC algorithms have a selection step, the acceptance rule, to propagate good individuals to the next generation. There are also many differences induced by the different scope of these algorithms: EC is used for optimization and MCMC is used for sampling. Additionally, MCMC uses a single chain whereas EC algorithms use a population of individuals that interact. Motivated by the common points of these two algorithms, we have previously discussed the Evolutionary MCMC (EMCMC) framework which aims to improve the efficiency of standard MCMC algorithms [7, 8]. EMCMC is a population-based MCMC that exchanges information between the individual chains such that at population level it is still an MCMC.

In general, it is not straightforward to integrate interaction between chains, like recombination or selection, into population based MCMCs and to preserve the convergence to the target distribution. To ease proving that EMCMCs converge to the stationary distribution the individuals generated with recombinative operators (an alternation between mutation and recombination operators) should be all accepted or all rejected [8, 16] with a so called *coupled acceptance rule*. Note the difference between this coupled acceptance and the popular selection strategies in EC; the coupled acceptance rule is selective at the family (i.e., the set of children generated by a set of parents) level whereas the selection strategies are selective at individual level that is one of the children competes against one of its parents. Using the standard MH acceptance rule where only one of the multiple children generated from multiple parents is accepted/rejected is a straightforward alternative algorithm [27]. It is interesting to note that Mahfoud and Goldberg [17] also obtained good results for *Simulated Annealing* (SA) [14] algorithms where one child competes against one of the parents. However, such a recombinative EMCMC does not fit in the standard framework of Metropolis-Hastings algorithms. Some alternative solutions proposed previously restrict the proposal distributions that generate new individuals by generating only one child at the time from a family of parents [3, 5, 15, 23, 24]. For example, [15] proposed an EMCMC algorithm that uses a population based univariate distribution to sample from likely Bayesian network structures. Other algorithms, for example some population-based adaptive MCMCs [9] and sequential Monte Carlo [6], relax the Markov property at the price of more difficult convergence properties and usage by practitioners.

In this paper, we theoretically and experimentally study various recombination operators and their interaction with acceptance rules resulting into *EMCMCs* with a required target distribution. We investigate the properties of several popular recombination operators in GAs (i.e., uniform recombination) when integrated in the EMCMC framework. We show that the individuals that interact in generating candidate individuals should also interact in the acceptance rule to sample from the target distribution. Acceptance rules that are directly derived from the EC’s selection strategies are more useful for optimization than for sampling. The sampled distribution is skewed compared with the target distribution: the fit states of the search space are amplified and the less fit states are diminished.

We propose a general method that corrects the target distribution of a recombinative EMCMC that does not sample from the intended distribution. This technique simply considers the recombinative EMCMC as the proposal distribution and the generated children are all accepted/rejected with a coupled acceptance rule. In this way we postpone the acceptance or rejection of all children with the hope that the recombinative EMCMC generates fit individuals that will increase the chance that children are accepted and, consequently, that the algorithm converges faster to the target distribution. This method has theoretical value constructing a correction term with which the sampled distribution should be multiplied to transform it into the target distribution.

We compare in practice the performance of various recombinative and non-recombinative EMCMCs with the standard and the population-based MCMC. When comparing (E)MCMCs we respond to three questions: (1) how useful are EMCMCs when compared with MCMCs, (2) how useful are the recombinative operators and (3) what is the difference in performance between EMCMCs using the standard MH acceptance rule selective at individual level and EMCMCs using the coupled acceptance rules. The recombinative operators chosen are the most popular operators in EC: discrete space uniform recombination and uniform mutation. As a consequence, the theory and the practical examples are formulated for the discrete space (E)MCMCs. We also mention that it is straightforward to extend these results to the continuous space (E)MCMCs.

For our first experiment we analytically compare the algorithms an a toy example such that the exact performance of algorithms is calculated from all the transitions between all the states of an (E)MCMCs. In the second experiment we calculate the Kullback-Leiber distance between the target distribution and the distribution output by an algorithm after a finite number of steps on a relatively small size *binary quadratic programming problem* (BQP) to exactly compute the target distribution. The next experiment is on a larger size BQP where we can compare the performance of (E)MCMCs using only graphical (and more imprecise) tests. Note that BQP is related to the popular mathematical problem in statistical mechanics known as the Ising model [10]. The obtained results show that recombination improves the mixing of the EMCMC especially when the standard MH acceptance rule is used with recombination.

Section 2 presents some basic knowledge of MCMC algorithms and introduces the notation used in the rest of the paper. For an in depth study on MCMCs we refer the reader to [12]. In Sect. 3 the EMCMC framework is presented. In Sect. 4 we investigate several recombination operators and their desired properties for EMCMCs. Section 5 proposes and analyzes various MH acceptance rules and the properties of the resulting EMCMCs when combined with the recombinative operators. We also establish rules to design recombinative EMCMCs for sampling and optimization. In Sect. 6 we analytically investigate the discussed EMCMCs on a toy problem and experimentally test them on two BQP problem instances. Section 7 concludes and discusses the results of the paper.

MCMC is a general framework to generate samples *X*^{(t)} from a probability distribution *P*(·) while exploring its so-called countable -dimensional state (or search) space *E* using a *Markov chain*. We assume the state space is compact. MCMC does not sample directly from *P*(·), but only requires that it can be evaluated within a multiplicative constant , where *Z* is a normalization constant and the unnormalized target distribution. A discrete time Markov chain is a stochastic process (*X*^{(0)}, *X*^{(1)}, …) with the property that the probability distribution for the state *X*^{(t)} given all previous values (*X*^{(0)}, *X*^{(1)},…, *X*^{(t−1)}) only depends on *X*^{(t−1)}. Mathematically, we can write

We call the *transition matrix* of the Markov chain. A *homogeneous* Markov chain in addition, has a time-independent transition matrix. In the following we only consider homogeneous Markov chains, unless specified otherwise. Aperiodicity excludes for instance that certain points can only be reached at even times. For any starting point a Markov chain with a finite state-space converges to a unique invariant distribution if it is irreducible and aperiodic. A Markov chain is called irreducible if, and only if, every state can be reached from every other state in a finite number of steps.

A sufficient, but not necessary, condition to ensure that the given distribution *P*(·) is the stationary distribution is that it satisfies the *detailed balance condition* [1]. A MCMC satisfies the detailed balance condition if, and only if, the probability to move from *X* to *Y* multiplied by the probability to be in *X* is equal to the probability to move from *Y* to *X* multiplied by the probability to be in *Y*:

Many MCMC algorithms are Metropolis-Hastings (MH) algorithms [13, 18]. Since we cannot sample directly from the distribution *P*(·), MH algorithms consider a simpler distribution , called the *proposal distribution* to generate the next state of a MCMC chain. generates the candidate state *Y* from the current state *X*^{(t)}, and the new state *Y* is accepted with probability:

If the candidate state is accepted, the next state becomes *X*^{(t+1)} = *Y*. Otherwise, *X*^{(t+1)} = *X*^{(t)}. For finite search spaces, the transition probability for arriving in *Y* when the current state is *X*^{(t)}, where *X*^{(t)} ≠ *Y*, is

The rejection probability is,

An MH algorithm is aperiodic, since the chain can remain in the same state with a probability greater than 0, and by construction it satisfies the detailed balance condition,

If, in addition, the chain is irreducible, then it converges to the stationary distribution *P*(·). The rate of convergence depends on the relationship between the proposal distribution and the target distribution: the closer the proposal distribution is to the stationary distribution, the faster the chain converges. A popular Metropolis-Hastings algorithm is the *Metropolis algorithm* where the proposal distribution is *symmetrical* and the acceptance rule becomes

A popular and often used set of irreducible proposal distributions for MH algorithms can be described by a *mutation operator*. We generically denote the proposal distributions resulting from mutation operators with *Q*_{m}. We consider a state in the discrete space as a string of characters, **X** = (*X*_{1}, *X*_{2},…, *X*_{}). The *h*-th position in **X** is called the *locus* of *X*_{h}, where 1 ≤ *h* ≤ , and the value of *X* in the locus *h* is called an allele. Each position (or *locus*) *h* in an individual **X** is instantiated with an *allele**X*_{h} *E*(*X*_{·}), where *E*(*X*_{·}) is the multi-set of all possible values of *X*_{·}.

The *uniform mutation operator* randomly changes every value of each variable of the current state with a non-zero probability, called the mutation rate [8, 16, 17, 23]. The bigger the uniform mutation rate, the bigger the jump in the search space of the child state from the parent state. *Q*_{m} denotes the *uniform mutation proposal distribution*. When the context is not ambiguous, we simply refer to it as mutation. The uniform mutation operator defines an irreducible, symmetric and stationary proposal distribution [8].

In the sequel, the *uniform mutation transition matrix*, *K*_{m}, proposes candidate individuals with *Q*_{m} and accepts them with the MH acceptance rule. The uniform mutation transition matrix, *K*_{m}, defines an irreducible MH algorithm which converges to its stationary distribution [8].

When we talk about the performance of an MCMC, we refer to how well an MCMC is mixing or how “fast” it converges to the target distribution. We say that an MCMC is mixing “well” if it rapidly traverses the search space and, at the same time, accurately samples the target distribution. Note that the mixing concept in *MCMC* is not related to the mixing of building blocks in the EC literature.

In an attempt to improve the mixing behavior of MCMCs one could make use of multiple chains that run independently (MICs). The chains are started at different initial states and their output is observed at the same time. It is hoped that this way a more reliable sampling of the target distribution *P*(·) is obtained. It is important to note that no information exchange between the chains takes place.

Recommendations in the literature are conflicting regarding the efficiency of multiple independent chains. Yet there are at least theoretical advantages of multiple independent chains MCMC for establishing its convergence to *P*(·) [12]. Let’s consider a large dimensional distribution where an MCMC takes a long time to find a relevant region of the search space and to escape from it to search for other relevant regions. Then, the time necessary for a long MCMC can be larger than just starting multiple MCMCs spread over the search space sampling in different regions. However, MIC converges only after all the component MCMC chains have converged.

Since the chains do not interact, MIC is at the population level an MCMC with transition probabilities equal to the product of component chains transition probabilities, or

where . If the MCMCs have detailed balance, are irreducible and aperiodic, then MIC inherits these properties and it converges, at the population level, to the product of their target distributions, *P*_{1}(·)××*P*_{N}(·), where *P*_{i}(·) is the target distribution of the *i*-th chain.

EMCMCs use a population of chains that allow interactions between the individuals under the assumption that individuals in the current population exchange information that helps the EMCMC to sample the desired distribution. Note that, in EMCMCs, the population is a multi-set of individual states rather than a collection of MCMCs: the current individual states depend on several states from the previous population. Now the sample at time *t* is the population of *N* states (or individuals) .

An *evolutionary Markov chain Monte Carlo* (EMCMC) algorithm is a population MCMC that exchanges information between individual states such that, at the population level, the EMCMC is an MCMC.

Similarly to an MCMC, the main goal of an EMCMC is to sample from a given distribution, *P*(·). Ideally, an MCMC algorithm generates individuals directly from the target distribution. Unfortunately, we do not know where the most likely—or equivalently, the most fit—individual states can be found in the search space. Furthermore, MCMCs can poorly “mix” when individual states are disproportionately proposed with their probability. A standard MCMC, for example, generates individuals with some mutation proposal distribution (e.g., the uniform mutation proposal distribution *Q*_{m}) that does not have any knowledge of the sampled distribution. A method to speed up the mixing is to propose individuals using proposal distributions that are “close” to the target distribution. For that, we can use recombination operators that exploit the common structure of the parents. Sampling from a distribution implies that the more fit individuals are more often generated than less fit ones. As a consequence, the commonalities of more likely individuals are used by recombination to create other more likely individuals. Intuitively, such a proposal distribution approximates better the target distribution than a proposal distribution that does not make any assumption about the generated individuals, like uniform mutation. In this perspective, the recombination operators adapt the proposal probabilities to generate an individual from the current population. Note that, the allowed types of proposal distribution are the ones that preserve the Markov chain property at the population level: we can only use the information in the current population for generating new individuals.

We call EMCMCs that use recombination to exchange information between individuals *recombinative EMCMCs*.

A recombination operator used as proposal distribution of an EMCMC generates one or more children from two or more parents using some function that is independent of the EMCMCs’ sampled distribution. Each generation, the population is uniform randomly grouped in disjunct families of few (i.e., two, three) individuals such that each individual belongs to exactly one family. All the chains from an EMCMC eventually interact in population recombinations. We call *recombination proposal distribution*, *Q*_{r}, the distribution defined by the recombination probabilities at the population level.

It is important to note that at the individual family level, the proposal probabilities of recombination are not stationary since they depend on the family members with which they are grouped. At population level, however, the recombination proposal distribution generating the next population from the current one is stationary.

We only consider recombination operators that are respectful—this is, the common substructures of the parents are inherited by the offspring [20]. With respectful recombination the common parts of strings are protected against disruption.

An important aspect of any recombination operator is to establish whether it is symmetrical or not: for non symmetrical recombinations, we have to compute the proposal probabilities, whereas for symmetrical operators we can simply use the Metropolis algorithm. In Sect. 4.1 we design and investigate several recombination operators that generate symmetrical proposal distributions and in Sect. 4.2 we give examples of recombination operators that generate non-symmetrical distributions. We focus on the most popular type of recombination operators in GAs that swap alleles between two or more parents with some probability to generate one or more children. Since respectful recombination by definition is reducible [8], in Sect. 4.3 we combine recombination with mutation to obtain irreducible proposal distributions following the simple mathematical rules of mixtures and cycles [8].

The recombination operators usually have no information about how fit the individuals in the current and proposed population are. Then, like for the standard MCMCs, we need acceptance rules to sample from the target distribution. Detailed balance is a sufficient, but not a necessary condition, for an irreducible aperiodic EMCMC to converge to a desired target distribution *P*(·). By definition, MH algorithms are aperiodic and have detailed balance. Most EMCMCs are irreducible MH algorithms—by use of mutation—and apply recombination in the proposal distribution.

In Sect. 5.1 we propose an EMCMC where individuals are generated with recombinative proposal distributions and the parents and children are competing in a Metropolis-Hasting acceptance rule. Such an EMCMC has detailed balance if and only if the individuals that interact through recombination also interact in the acceptance rule. We further call these acceptance rules where two or more chains interact the *coupled acceptance rule*. We prove that such an algorithm is ergodic—that is irreducible and aperiodic—with the stationary distribution *P*_{1}(·)××*P*_{N}(·), where *P*_{i}(·) is the target distribution of the *i*-th chain. However, such a coupled acceptance rule has a negative effect on the performance of an EMCMC. If some children are fit individuals but the others are not, this acceptance rule can reject “good” individuals whereas the standard MH acceptance rule will always accept them.

We investigate the convergence properties of recombinative EMCMCs using variations of the Metropolis-Hasting acceptance rule. In Sect. 5.2 we prove that the recombinative population-based MCMCs that accept/reject each candidate state using the standard Metropolis acceptance rule does not have detailed balance. Its advantage is that the probability of accepting at least one individual of this EMCMC is larger than the acceptance probability of an EMCMC using the coupled acceptance rule. In Sect. 5.3 an example of an MH acceptance rule derived from the elitist replacements selection strategy [25] is designed. The sampled distribution is even more skewed towards probable states and the acceptance probability of one individual is even larger. In Sect. 5.4 we propose and analyze a methodology, we call it *nested EMCMC*. It corrects the sampled distributions of skewed EMCMCs by accepting/rejecting all the individuals generated with the EMCMCs with the coupled acceptance rule. This nested EMCMC has detailed balance even though the initial EMCMC does not.

In this section we propose and analyze various recombinative proposal distributions and their properties for EMCMCs that sample from the desired target distribution.

In EMCMCs, the symmetry is obtained by preserving the distance between the parents and their children. For example, the distance between *N* children is equal with the distance between the *N* parents that generate the children, or the distance between a parent and its child is constant as compared with the distance between two other individuals in the population.

When the distance, i.e. Hamming distance, between the generated children is the same as the distance between their parents, the recombination operator is symmetrical.

*Consider**N**parents uniform randomly chosen without replacement from the current population*, , *and an associated distance metric**with*

*where*. *Let the recombination operator where**N**candidate individuals*, {**y**_{i},…, **y**_{i+N−1}}*, are generated by swapping alleles of parents such that the corresponding proposal probability**is a function of the distance between parents such that the distance between parents is equal with the distance between their children*

then *S*_{r} is symmetrical.

The probability to generate children from their parents is *S*_{r} is a distance function *f*: *E*^{2} → between parents

Thus, this recombination is symmetrical.

Note that if the number of children is different from *N*, in general, the symmetry condition does not hold. We discuss such examples in the next section.

The swapping recombinations, often used in EMCMCs and the standard GAs, are particular cases of the above proposition where the distance between individuals are kept constant by swapping alleles.

*Recombination proposal distributions which swap parts of individuals in between chains using a uniform distribution are symmetrical, respectful and stationary*.

Since there are equal probabilities to swap alleles (parts) in between parents and in between children, this recombination is symmetrical and the distance between them remains equal. If the parents have the same allele on a locus, so do the children since the swapping does not change the values of alleles.

We have recombinations which exchange non-common alleles, e.g., uniform crossover, or parts of individuals, e.g., 1 and 2 point crossover [16, 17]. These recombinations are often used only with two parents.

In binary spaces, an example of swapping recombination is *parameterized uniform crossover*, *Q*_{unif}, which generates two candidate individuals by swapping alleles between two parents with a uniform probability, *p*_{x}. Thus, it is impossible to generate children that have other common alleles than their parents. Where the two parents differ, an allele is swapped with the probability *p*_{x} and is not swapped with the probability 1 − *p*_{x}. It is interesting to observe that the time complexity to generate two children from two parents with *Q*_{unif}, like for uniform mutation, is linear with the dimensionality, .

For *p*_{x} = 0.5, the operator is called uniform crossover and is used with all codings: for strings of bits [16] and for strings of real numbers [12].

In the following, we introduce a general condition to design symmetrical recombinations using three parents which generate one child.

*Consider three parents uniform randomly chosen without replacement from the current population*, . *The recombination operator where a candidate individual*, **y**_{i}*, is generated from the three parents such that the total distance between parents is equal with the total distance between the candidate individual and*

*is symmetrical, where**is a distance metric*.

The parent and the child **y**_{i} are interchangeable; they have the same total distance with the other two parents. Thus, this recombination is symmetrical.

As an example in the binary space, we propose the *total difference crossover*, *Q*_{dif}. This type of recombination is imported from real coded EAs [22] and EMCMCs [24]. The new individual, **y**_{i} has the same alleles like on the positions where the two other parents coincide. On the other positions, we flip the alleles of with the probability *p*_{x}.

*Q*_{dif}*is symmetric, respectful and stationary. The time complexity of**Q*_{dif}*, like for**Q*_{unif}*, is linear with the dimensionality*, .

The xor crossover [23] is a special case of *Q*_{dif} where the probability to flip a bit is 1 for ’s bits where and disagree.

The main difference between the two symmetrical types of recombination is that one preserves the sum of distances between the three parents when generating a child and the other preserves the distance between two parents when generating two children.

Given the number of chains that interact, we distinguish between *family* and *population* recombinations. Recombining few chains (e.g., two or three chains) is an example of the first approach, while in the latter all chains from the population exchange information. The above recombination proposal distributions are all family recombinations.

We assume that, for family recombination, each generation, the population is uniform randomly grouped in disjunct families such that each individual belongs to exactly one family. All the chains from an EMCMC, eventually, interact in population recombinations. We call *recombination proposal distribution* the distribution defined by the recombination probabilities at the population level. We denote it with *Q*_{r}. In the case of an individual at the family level, the proposal probabilities of recombination are not stationary since they depend on the family members with which they are grouped. At population level, the probability to generate with recombination one population from another one is stationary.

For the above family recombinations (e.g., *Q*_{unif} and *Q*_{dif}), the time complexity at the population level is linear with the number of individuals in the population: each generation, each individual is randomly paired in exactly one family. The complexity of these recombination proposal distributions at the population level therefore is . Note that, at the population level, the complexity of the mutation proposal distribution depends linearly on the number of individuals in the population .

We investigate two non-symmetric recombinations where the alleles are exchanged between parents but, this time, the distance between parents and children is not preserved

This recombination swaps the alleles between two parents like the parameterized recombination but it generates one child instead of two. Then, the distance between parents, in general, is not same with the distance between the child and one of the parents. Thus, the recombination is not symmetrical. We call this recombination the *masked recombination*, *Q*_{mask}.

A child **y**_{i} is generated from a parent, , and a mask, . The common alleles of and are passed to **y**_{i}, but the non-common alleles are flipped in with the rate *p*_{x}. Note that this crossover and the parameterized uniform crossover have the same probabilities to generate one child. But, *Q*_{unif} is symmetrical and *Q*_{mask} is not symmetrical, because *Q*_{mask} generates only one child. Consequently, we have to compute the probabilities to generate a candidate individual with *Q*_{mask} in the acceptance rule of the MH algorithm. *Q*_{mask} also resembles *Q*_{dif} where two parents are identical. However, by replacing the identical parents with the child in the candidate generation, the symmetry condition does not hold.

*Q*_{mask}*is reducible and stationary. Consider that from a parent**and a mask**we generate a child*, **y**_{i}*with**Q*_{mask}*. Then*, *Q*_{mask}*is non-symmetrical. The time complexity to generate a child with**Q*_{mask}*is linear with the string size*.

Let’s consider that because bits are flipped on some positions. In those positions, the mask and the child **y**_{i} have the same values, whereas and do not. Then, it is impossible to generate from **y**_{i} and . The rest of the properties follow directly.

This recombination builds a probabilistic model of the parents to generate the children. It is analogous with the operator that generates individuals for the estimation distribution (EDA) algorithms applied in Evolutionary Computation for solving optimization problems [19].

We propose the *tree frequencies probabilistic recombination*, *Q*_{tree}, closely related with the probabilistic model of Baluja [2]. Unlike the previous recombination operators where an allele is generated only given the alleles on the same position, *Q*_{tree} considers the dependencies between two positions in the population using the Chow and Liu [4] algorithm.

In the following, we describe the algorithm we use to generate individuals with *Q*_{tree}. This algorithm constructs from the population of current individuals a tree with maximum entropy using a mutual information function. The entropy describes the level of uncertainty in a statistical variable. Here, the frequencies of the alleles in a position define a statistical variable for that position. Mutual information captures the extent to which two statistical variables are dependent. This algorithm keeps adding dependencies between variables based upon their mutual information under the constraint of building a tree (e.g., there are no cyclic paths between variables). The higher the mutual information is, the sooner the algorithm tries to add the dependency in the tree.

A root for this tree is chosen at random from the set of positions. The allele for the root position is chosen based on its frequencies in the current population. We iteratively generate the other alleles based on their dependency with an allele—called parent—which was already instantiated in the tree. If *h* is the root of the tree, then the allele **y**_{ih} is generated using the distribution *N*(**y**_{ih})/*N*, where *N*(**y**_{ih}) is the number of alleles **y**_{ih} in the current population. Otherwise, if *h* has the parent *h*_{1} in the tree, then the allele **y**_{ih} is generated with the probability where is the allele already generated in position *h*_{1}, and is the number of individuals in the current population that have allele **y**_{ih} on position *h* and allele in position *h*_{1}.

We observe that *Q*_{tree} is the most expensive recombinative proposal distribution we have investigated for EMCMC. Unlike the other discrete space recombinations, *Q*_{tree} exploits some relationships between dimensions: it computes the dependencies between two positions in order to construct the tree of maximum entropy and to assign a value to an allele. Then, the generation of an allele on a position also depends on the alleles on another position.

*Q*_{tree}*is respectful, non-symmetrical, stationary and biases the exploration according to the non-linear correlations between dimensions. The computational complexity to generate a child with**Q*_{tree}*is**, where**l**is the dimensionality and**N**the size of the population*.

When an individual is generated with *Q*_{tree} and replaces a parent, some allele frequencies can increase at the cost of the others. The computational complexity of this operator is given by building the maximum log-likelihood tree. Chow and Liu [4] show that this is .

*Q*_{tree} is a generalization of Laskey and Myers [15]’s recombination proposal distribution; when generating an allele, they consider only the frequencies of the alleles on the same position and not also on the other positions as *Q*_{tree} does. Therefore, their recombination, unlike *Q*_{tree}, does not exploit the relationships between dimensions.

Since respectful recombination by definition is reducible, in the following, we study how to combine it with mutation to obtain irreducible proposal distributions. We combine the proposal distributions following the same simple mathematical rules as for transition distributions. We study the properties (like symmetry and irreducibility) of the resulting proposal distributions. We show some examples where the properties of the component proposal distributions are inherited by the complex proposal distribution. However, in general, we have to check the properties for each distribution.

We combine mutation and recombination in mixtures and cycles.

A *mixture of proposal distributions* is a probabilistic sum of proposal distributions where each step one distribution is selected according to some constant positive probability. A *cycle of proposal distributions* is the product of proposal distributions where in each step one distribution is used in turn in a specific order.

*In a mixture of proposal distributions, if one distribution is irreducible, then the mixture is irreducible. A mixture is symmetrical if the component distributions are symmetrical. A mixture is stationary if all component distributions are stationary*.

If one distribution is irreducible, then there exists a non-zero probability to generate any population from any other population. The rest of the properties follows directly.

For example, the following mixture

is irreducible when *p*_{r} < 1, and symmetrical when the recombination is symmetrical. Note that the above operator is equivalent to recombination, *Q*_{m+r} = *Q*_{r}, for *p*_{r} = 1; then, like recombination, *Q*_{m+r} is reducible. Note that the computational cost of a mixture of proposal distributions is driven by the most expensive component proposal distribution. Furthermore, a mixture exploits some relationships between dimensions if a component proposal distribution does.

Unlike for mixtures, for cycles, there are no rules for irreducibility or symmetry. They have to be checked for each cycle. Cycles of proposal distributions are common for the standard GAs where one considers first mutation and then recombination, *Q*_{m×r}, or first recombination and then mutation, *Q*_{r×m}.

In general, since two matrices usually do not commute, *Q*_{m×r} and *Q*_{r×m} are non-symmetrical.

*Q*_{m×r}*and**Q*_{r×m}*, are symmetrical for any recombination that swaps alleles* [17]. *Q*_{m×dif}*and**Q*_{dif×m}*are non-symmetrical*. *Q*_{m×r}*and**Q*_{r×m}*are irreducible and stationary. If the recombination**Q*_{r}*is symmetrical, we have*

To ease the reading of the paper, we give the proof for the above proposition in Appendix 1.

Parallel Recombinative Simulated Annealing (PRSA) [17] uses recombination that swaps alleles followed by mutation. Note that it is impractical to compute the probabilities of a cycle: we have to sum over all possible intermediate populations. Therefore, in general, it is impractical to use non-symmetrical cycles. In the following, we show two cycles where the above non-symmetrical recombinations are efficiently combined with uniform mutation directly on each position of an individual.

Consider a parent and a mask chosen at random from the population. Like for *Q*_{mask}, for the non-common values of the two parents, is flipped with the probability *p*_{x} to generate the child **y**_{i}. Unlike for *Q*_{mask}, for the common parts of these parents, is flipped with the low probability 1/ to generate the child **y**_{i}. We generate from the mask a second child **y**_{i+1} with the uniform mutation with the mutation rate *p*_{m}. We denote this proposal distribution with *Q*_{m×mask} where

In the next proposition we show that *Q*_{m×mask}, unlike *Q*_{mask}, can be used with an MH algorithm. Furthermore, although it is a cycle, its computational time is similar with the one of uniform mutation.

*Q*_{m×mask}*is irreducible*. *Q*_{m×mask}*is symmetrical if**p*_{m} = 1/2 *or**p*_{x} = 1/*. If**p*_{m} ≠ 1/2 *and**p*_{x} ≠ 1/ *then**Q*_{m×mask}*is non-symmetrical. The time complexity of**Q*_{m×mask}*is linear with the string size*.

The prove is given in Appendix 2 to ease the reading of the paper.

Similarly, we combine the tree frequencies probabilistic recombination, *Q*_{tree}, with the uniform mutation in a cycle to be able to use it with the MH algorithm. We first construct the maximum entropy tree. We choose at random a position, *h*, which we consider the root, we propose an allele **y**_{ih} with the probability . Iteratively, we propose an allele **y**_{ih} with the probability

where the allele on the position *h*_{1}, **y**_{ih_1}, is already instantiated. We denote this operator with . Like *Q*_{tree} and unlike the other proposal distributions, *Q*_{m×tree} exploits some relationships between different dimensions.

*Q*_{m×tree}*is irreducible and non-symmetrical. The time complexity to generate an individual with**Q*_{m×tree}*is**, where* *is the string size and**N**the population size*.

The proof is immediate.

In Table 1 we present the operators composed from mutation and/or recombination, their irreducibility, their symmetry, and their number of parents compared with the number of children.

In this section we propose various MH acceptance rules and we discuss the properties of EMCMC algorithms resulting from the interaction between the recombinative operators and these acceptance rules.

We establish that the EMCMCs that generate individuals with irreducible recombinative proposal distributions and accept/reject them all has detailed balance and the target distribution for this EMCMC.

*Consider the EMCMC algorithm that proposes**N* ≥ 2 *children*, **Y** = (**y**_{1},…, **y**_{N}) *from**N**parents*, *using a irreducible proposal distribution**Q**that is independent of the target distribution. All children are accepted or all children are rejected with the probability*

This EMCMC is ergodic with unique stationary distribution *P*_{1}(·)××*P*_{N}(·), where *P*_{i}(·) is the unique stationary marginal distribution for the *i*th chain, ∀*i* = 1,…, *N*.

The prove is given in Appendix 3 to ease the reading of the paper.

Note that the EMCMC resulting from the interaction between the proposal distribution *Q* and the MH acceptance rule α_{C} is an MCMC over the *N* dimensional search space *E*^{N}. We denote the transition matrix for this EMCMC algorithm with *K*_{C}. The transition probability between a candidate state **Y** and the current state **X**^{(t)} is and the rejection probability is .

**The coupled acceptance rule**. The coupled acceptance rule α_{C} [11, 16] considers for acceptance two chains. Two children, **y**_{1} and **y**_{2}, generated from two parents, and , are both accepted or rejected with the coupled acceptance rule .

When α_{C} is associated with one of the irreducible recombinative proposal distributions—for instance *Q*_{m×unif} and *Q*_{m+unif} that generates two children from two parents—according with Theorem 1, the EMCMC algorithm has detailed balance and samples from the target distribution *P*_{1}(·)×*P*_{2}(·).

*Consider the EMCMC algorithm that proposes two children from two parents using an irreducible proposal distribution**Q**and accepts/rejects the children using the coupled acceptance rule* α_{C}. *We denote the corresponding transition matrix with**K*_{C}. *This EMCMC converges to**P*_{1}(·)×*P*_{2}(·)*, where**P*_{i}(·) *is the marginal distribution of the**i**-th chain**i* = 1,2.

However, in practice, such an acceptance rule is not always desired, since it is not selective at individual level. For example, usually, individuals with higher and lower probabilities are proposed; with α_{C} the fit individuals can be rejected and the acceptance of less fit individuals depends on the family’s fit individuals.

Note that the target distribution of this EMCMC is given by the product of distributions in the MH acceptance rule. By replacing this product with other mathematical functions (e.g., maximum of two values as in the next example), the corresponding EMCMC converges to a different distribution.

To sample from the *order two statistics* distribution

we create a variant of the coupled acceptance rule

where max is the maximum for the values of two individuals, and is any proposal distribution.

According to Lemma1, an EMCMC that proposes two candidate individuals from two parents and accepts/rejects them both with α_{2:1} has detailed balance. If the proposal distribution is also irreducible, this EMCMC converges to the stationary distribution *P*_{2:1}(·,·).

Most EMCMCs use family recombinations where, each generation, all individuals are randomly grouped such that each individual belongs to exactly one group. If the children generated with recombination are all accepted or all rejected with an acceptance rule as suggested in Theorem 1, we obtain *family transition probabilities* with detailed balance. At individual or family level, these transitions are not MCMCs, since their proposal probabilities are not stationary—they depend on how the individuals are grouped. At population level, for all possible groupings of the current population, the transition distribution is stationary. Then, the *population transition probabilities* obtained by combining the family transitions have detailed balance and define an MCMC.

In the following, we investigate the properties of EMCMCs that use irreducible recombinative proposal distributions and the standard MH acceptance rule. Such an EMCMC does not fit in the standard MH framework where the individuals that interact in the proposal distribution also interact in the acceptance rule. For this EMCMC individuals interact in the proposal distribution but children are accepted/rejected individually given only one parent.

To ease the reading, we consider that two children, **y**_{1} and **y**_{2}, are generated with a symmetrical proposal distribution *Q* from two parents and . Each child is accepted/rejected given one of the parents, randomly chosen without replacement, with the standard Metropolis acceptance rules, . Let’s denote with *K*_{1.1} the resulting transition matrix. The transition probability to accept both children is

The transition probability to accept only one child (i.e., **y**_{1}) and to reject the other child is

The rejection probability of both candidate states is

To analyze the behavior of this EMCMC, we compare its transition distribution with *K*_{C}, which we showed in Theorem 1 that it converges to the target distribution. We show that even though the acceptance and rejection transition probabilities are similar, *K*_{C} samples from the target distribution and *K*_{1.1} does not.

*Consider the two transition distributions**K*_{C}*and**K*_{1.1}*, the coupled acceptance rule* α_{C}*, the standard Metropolis acceptance rule* α *as before. Let’s further consider two parents**and their two children* (**y**_{1}, **y**_{2}) *generated with an irreducible symmetrical proposal distribution**Q*.

*The probability to accept a child that it fitter than one of its parents*, *, is higher for**K*_{1.1}*than for**K*_{C}

*The probability to reject a child less fit than one of its parents*, *, is higher for**K*_{1.1}*than for**K*_{C}*when the second child is more fit than the second parent*,

*The probability to reject a child less fit than one of its parents*, *, is lower for**K*_{1.1}* than for**K*_{C}* when the second child is less fit than the second parent,*.

*The EMCMC algorithm**K*_{1.1}* has detailed balance if and only if the probability to generate two children from two parents is equal with the probability to generate one child and one parent from the other parent and the other child*

1

*If Eq.* 1* holds, the algorithm converges to the target distribution**P*_{1} (·) · *P*_{2}(·).

Again, to ease the reading, we prove this theorem in Appendix 4.

According to the above proposition, an MH algorithm that accepts/rejects with the standard MH acceptance rule some, not all, of the individuals generated with some recombinative proposal distribution does exhibit detailed balance only for some particular types of recombinations.

Equation 1 holds, for example, for uniform mutation distribution *Q*_{m} and symmetrical recombination distributions that generate one child [8, 23]. It does not hold for other symmetrical recombinations that generate two or more children, like for example, uniform recombination. With uniform recombination for any four individuals, we have

Unfortunately, if the detailed balance condition does not hold, there is no standard method to know the target distribution.

It is interesting to notice that the MH algorithms generated with *K*_{1.1} have a higher probability of acceptance of at least one candidate state than the algorithms generated with *K*_{C} that accept or reject all individuals at once. As a consequence, for the same proposal distribution, the algorithm determined by *K*_{1.1} samples faster than an algorithm that uses *K*_{C}.

In this section we investigate an acceptance rule inspired from the elitist replacement strategy [25] which does not have detailed balance regardless of the proposal distribution used. Furthermore, we show that the marginal distribution of the generated EMCMC is different from the target distribution being amplified for the fit individuals and diminished for the less fit individuals.

The elitist coupled acceptance rule (ECA) algorithm is a family competitive acceptance rule where the best two solutions from the family of four is kept if at least one of them is a child. Otherwise, when both children have a lower fitness than both their parents, the children can probabilistically replace the parents.

ECA can be viewed as a combination between the elitist replacement rule from regular GAs and the coupled acceptance rule α_{C}. When compared with the elitist replacement, ECA is more exploratory but less elitist since it still accepts with some probability less fit individuals. When compared with α_{C} and α acceptance rules, ECA is more elitist but less exploratory. With ECA, if a child and a parent are the two most fit individual states from two parents and their children, they are *always* accepted whereas with α the other child will be accepted with some probability.

To establish the properties of ECA’s target distribution, we compare it with *K*_{C}. The probability to escape from the basin of attraction of a peak, as we show in the next paragraphs, is rather poor when compared with the transition distribution *K*_{C} generated with the same proposal distribution and the coupled acceptance rule α_{C}. The transition distribution generated by accepting with ECA the individuals proposed with the irreducible proposal distribution *Q* is denoted with *K*_{ECA}. We call max_{2} the function returning the two most fit solutions.

We distinguish three cases.

*Both children are better or worse than their parents*. Thenorwhere and . The transition probability to accept or reject both children, {**y**_{1},**y**_{2}}, proposed with the proposal distribution*Q*is non-zero only in this case. ThenNote that in this case, the transition probability of*ECA*is equal with the transition probability of an EMCMC using the coupled acceptance,*One of the children and one of the parents are most fit*. Then, for example,The transition probability to go from to {**y**_{1},**y**_{2}} is 0.Now,*K*_{C}is larger than 0 and*K*_{ECA}is 0.*Only one parent is replaced by its child*. The proposal probability where only one parent is replaced in the next generation, , is amplified with the sum over all proposal probabilities that generate a state**y**_{2}such thatThen

For irreducible proposal distributions *Q*, this EMCMC algorithm is irreducible because any two individuals can be generated from any other two individuals with a non-zero probability in two iterations of the algorithm Let’s assume again that a child **y**_{1} and one of the parents have the largest probabilities. In one iteration

and, for the second iteration, we also have

Following the above observation, we prove that this EMCMC converges to a stationary distribution and also it does not have detailed balance regardless of the proposal distribution. The proof is given in Appendix 5.

*Consider the EMCMC algorithm that generates candidate individuals using an irreducible proposal distribution**Q* and then accepts or rejects them with the ECA acceptance rule. This EMCMC algorithm does not have detailed balance for any non-uniform distribution *Q**and converges to a stationary distribution*.

This algorithm is climbing towards a local optima since it is very probable that a good solution remains a long time in the population to generate better solutions. Only when both children are worse than their parents this algorithm rejects the two candidate individuals with some probability. Otherwise, ECA always accepts at least one child. As a consequence, the probability to accept at least one proposed child is the largest from all the previous acceptance rules. Thus, an algorithm that uses ECA behaves more similar to a standard GA than to a sampling algorithm. As a consequence, the target distribution of ECA is biased towards high regions of *P*(·): the highest fitness states are sampled more often at the expense of the lower fitness states.

In the following, we propose a method to integrate the transition distributions without detailed balance in MH algorithms with detailed balance. To achieve this, we need to accept or reject all the individuals generated with an MH algorithm without detailed balance.

A *nested EMCMC algorithm* is an EMCMC algorithm where individuals are proposed using a transition distribution, and are further all accepted or all rejected by a coupled MH acceptance rule. A *nested transition distribution* is the transition distribution used as proposal distribution by the nested EMCMC algorithm.

Furthermore, the nested transition distribution that generates individuals with a recombination distribution is itself a recombinative proposal distribution: from two or more parents we propose two or more children.

*The nested EMCMC algorithm has detailed balance. The nested transition distribution composed by a respectful recombination proposal distribution and an acceptance rule is by itself a respectful recombination proposal distribution*.

The proof is immediate if we consider the nested transition distribution as a proposal distribution and Lemma 1. If parents have identical values at certain positions, then the individuals generated by respectful recombination have—by definition—the same values at those positions. An acceptance rule simply selects from parents and children, therefore, the accepted individuals have the same values on those positions.

Nested transitions are, usually, non-symmetrical. Thus, we need to compute these probabilities. In Fig. 1, we graphically depict the nested EMCMC framework.

**Correcting***K*_{1.1}. Consider the *nested EMCMC* that uses as proposal distribution the nested transition distribution, *K*_{1.1} where two candidate individuals are proposed from two parents with some recombinative proposal distribution, *Q*, and each child competes against one of the parents randomly chosen from the population with a standard MH acceptance rule. The candidate individuals proposed with *K*_{1.1} are, at their turn, accepted with the coupled acceptance rule, α_{C}. The nested EMCMC’s transition distribution is

where the coupled acceptance rule is

We observe that the nested EMCMC eliminates the influence of the proposal distribution on *K*_{1.1}’s target distribution with the coupled acceptance rule, α_{C}.

In the following proposition, we express *K*_{nEMCMC} as a function of *K*_{1.1} and the proposal distribution *Q*. The proof of this proposition is in Appendix 6.

*Consider that the symmetrical proposal distribution**Q**generates***y**_{1}*and***y**_{2}*from**and**. If both children are different from their parents*, , *the nested transition distribution is*

*If only one child is different from its parent*, *, then*

*where*

*Otherwise, if both children are rejected*,

From the above proposition, we note that the difference between the two transition distributions, *K*_{nEMCMC}, which samples from the target distribution, and *K*_{1.1}, which does not sample from the target distribution, is given by the correction term from Eq. 2. In other words, *K*_{1.1} has to be multipled with the above correction term to sample from the target distribution. If the irreducible proposal distribution *Q* has the property that

for any and **y**_{2}, the correction term is 1. In this case, according with both Proposition 13 and 10, *K*_{1.1} has detailed balance and converges to the target distribution. Note that the probability of acceptance of at least one candidate individual with *K*_{nEMCMC} is smaller than with *K*_{1.1} and larger than with *K*_{C}. Furthermore, *K*_{1.1}, as proposal distribution, is not symmetrical and to use it in *K*_{nEMCMC}, we have to compute the impractical correction term.

**K**_{C}**as nested proposal distribution**. The coupled transition distribution *K*_{C} is invariant for the nested method. The proof of this proposition is in Appendix 7.

*Consider that the symmetrical proposal distribution**Q**generates***y**_{1}*and***y**_{2}*from**and**. The nested transition distribution is*

The coupled transition distribution *K*_{C} does not need a correction term to converge to the target distribution.

We compare the performance of recombinative and non-recombinative population-based (E)MCMCs on two functions: a toy problem, the *hyper-geometrical* distribution, which we use to analytically compare the performance of the algorithms and a larger problem the *binary quadratic programming problem* (BQP). We show that recombinative EMCMCs can outperform the standard MCMCs. Furthermore, we show that the algorithms that use the coupled acceptance rule α_{C} are less efficient than the algorithms that use the standard acceptance rule α.

We compare the performance of five MCMCs: a single chain MCMC, two non-recombinative MCMCs with two recombinative EMCMCs. We take the size of population *N* = 2.

*MCMC*: one single chain MCMC that proposes new states with*Q*_{m}with the mutation rate*p*_{m}= 1/ and accepts (rejects) them using the Metropolis acceptance rule α.*MIC*: 2 independent MCMCs that propose new states with*Q*_{m}with the mutation rate*p*_{m}= 1/ and accept (reject) them using the Metropolis acceptance rule α.*mut*+α_{C}: a non-recombinative population-based MCMC that proposes each generation 2 new states with the same*Q*_{m}and accepts (rejects) all of them using the coupled acceptance rule α_{C}.*rEMCMC*: generates two individuals with a cycle between*Q*_{m}and parameterized uniform recombination,*Q*_{unif}, with*p*_{r}= 50%, and then accepts them with the standard Metropolis acceptance rule α.*rEMCMC*+α_{C}: generates two individuals with a cycle between*Q*_{m}and*Q*_{unif}and then accepts them with the coupled acceptance rule α_{C}.

As shown in previous sections, the target distribution of the three population based EMCMCs– *MIC*, *mut*+α_{C} and *rEMCMC*+α_{C}—is and the target distribution of single chain *MCMC* is *P*(·). The sampled distribution of *rEMCMC* is not the target distribution but, as the experimental results show it, it approximates quite well *P*(·) for large search spaces and a small number of samples.

To compare MH algorithms analytically, we compute the second largest eigenvalue of the transition matrices of the corresponding (E)MCMCs. Note that the second eigen value should be small to mix well.

A *hyper-geometric distribution* (Hyper) is

with the string size, *w* the number of bits-1 in the individuals with the lowest value 0.01, individual **x**_{0} with all bits equal to 0 is the second largest peak *h*_{2}, and the individual with all bits equal to 1 is the largest peak *h*_{1}. We set = 8 and *h*_{1} = 1.

In the first experiment, see Fig. 2a and b, we vary the distance of the lowest valued states to the optimum, *w* = {1, 2, 3} and we set the value of the second largest state to *h*_{2} = 0.75. In this case, the local and global optimum have a close value and we vary their basin of attraction: the greater the distance from the local optimum, the smaller the basin of attraction of the global optimum. Second, we vary *h*_{2} from 0.25 to 0.75 with a step size of 0.25 and we set *w* = 3. In this case, the optimum is isolated and its importance is decreasing with the height of the second largest peak. In Fig. 2a and c we show results for high mutation and swapping rates, 0.5; in Fig. 2b and d we have low mutation and swapping rates 0.125. We set the low mutation rate for the cycle *Q*_{m}×*Q*_{unif} to 0.125. Again, we reduce the computation costs by grouping individuals with the same number of ones and zeros in one individual because these individuals have the same fitness value and therefore, the same acceptance probability. The eigenvalues for *MIC* and *MCMC* are the same because the two MCMCs have the same acceptance rule and proposal distribution. Thus, we have chosen to show results only for one of the two algorithms.

Second largest eigenvalues for Hyper-geometrical function on 8 bits, that is two blocks each of 4 bits, where** a**,**b***w* = {1, 2, 3} and the peak heights are set to *h*_{1} = 1 and *h*_{2} = 0.75, and** c** **...**

In Fig. 2a and b, for *w* = 2, the basin of attraction is equal for the two peaks. Then, we obtain the highest eigenvalues, and thus the worst performance, for all the four algorithms. Here we have the largest amount of low fit states that separate two narrow regions with fit individuals; a random sampler, see *MCMC* with mutation rate of 0.5 in Fig. 2b, is the best algorithm since it covers a large area with low equal values in short time.

For the other values of *w*, the basin of attraction of one of the peaks is wider than the basin of attraction of the other peak; the narrower one region is, the harder to find and sample it. For *w* = {1,3} we have the lowest eigenvalues and, furthermore, the highest difference between the algorithms. The non-recombinative (E)MCMCs do well because the narrow peak is reduced now to one point. The recombinative EMCMCs do better than the non-recombinative (E)MCMCs with the same acceptance rule because recombination generates with higher probability more fit individuals by combining the two building blocks of this function, In Fig. 2c and d, we observe that the performance of all the (E)MCMC algorithms varies very little with the height of the second largest peak *h*_{2}. Thus, these eigenvalues are (approximatively) the same with the eigenvalues for *w* = {1,3} from Fig. 2a and b.

To conclude this example, we observe that due to the structure of the problem recombinative EMCMCs have provably a better performance than the non-recombinative EMCMCs. The performance of MCMCs are diminished by the coupled acceptance rule α_{C}; *MCMC* is sampling more efficient than *mut*+α_{C} and *rEMCMC* is better than *rEMCMC*+α_{C}. The mutation rate greatly influences the performance of non-recombinative MCMCs; a high mutation rate decreases the performance of the algorithm. The swapping probability influences less the efficiency of the recombinative EMCMCs. *rEMCMC* and *rEMCMC*+α_{C} perform best for high swapping probabilities, whereas *MCMC* and *mut*+α_{C} perform best for low mutation rates.

In the following, we have performed experiments with the *binary quadratic programming problem* (BQP) to show, on a more elaborated example, that recombination is useful for sampling. The fitness function of an individual *x* is *f*(*x*) = ∑_{j=1}^{} ∑_{k=1}^{}*F*[*j*][*k*] · *x*[*j*] · *x*[*k*], where *F*[*j*][*k*] is the element on the *j*-th row and on the *k*-th column of a matrix *F* of integers, both positive and negative and **x** a binary string (e.g., **x**_{·} is 0 or 1). Then, *F*’s size is · .

The interaction between two or more positions of the BQP problem depends on the matrix *F*’s density, which is defined as the number of non-zero elements divided by the number of total elements in the matrix. The density is then between 0 and 1, where 0 means no interaction between positions and 1 means maximum interaction—that is every position depends on every other position. For our experiments, we generate random matrices with density 0.1.

These fitness values are positive and negative. However, the (unnormalized) probabilities of a distribution can be only greater than 0. Therefore, we add to all the fitness values a fixed positive integer *transl*; every value that now is equal or below 0 is assigned with the value 0.01. The unnormalized probabilities are , when *f*(*x*) > *transl* and, otherwise, .

In this section, we show that recombination can improve sampling. We first discuss the experimental methodology available to measure and compare the performance of EMCMCs. Second, we show experimental results on a BQP problem on 20 bits. By expanding the target distribution, we are able to compute the distance between this distribution and the true distribution. At last, we show results on a larger search space, for *l* = 100 bits. Unlike for the previous example, we are not able to expand this distribution, and therefore we are constrained to use less precise methods to assess the performance of (E)MCMCs. For both experiments, we compare the five (E)MCMC algorithms described above: three non-recombinative (E)MCMCs—that are one long chain *MCMC*, *MIC* and *mut*+α_{C}—and two recombinative EMCMCs—that are *rEMCMC* and *rEMCMC*+α_{C}.

To assess the efficiency of various EMCMCs we focus on monitoring how fast an MCMC is mixing and how well the samples spread over the entire target distribution after a fixed and rather small number of generated individuals. There is no generally acknowledged methodology on measuring how “close” a set of samples generated with a real-coded MCMC is to the true target distribution. Wolpert and Lee [26] argue that a good approach is to use the Kullback-Leibler (KL) distance between an approximation of the sampled distribution and a discrete approximation of the true distribution.

To measure the speed with which an algorithm samples the search space, Roberts et al. [21] recommend to monitor the acceptance probability of an algorithm. They analytically and experimentally study the behavior of a standard MCMC using a normal distributed mutation with fixed and equal variances in all dimensions. The target distribution is a multivariate normal distribution with standard deviation of 1.0 in all dimensions and no correlations. They conclude that a very high or very low acceptance rate of the MCMC indicates slow mixing, and a good acceptance rate is between 0.2 and 0.5. A high acceptance rate and a high performance (e.g., the KL-measure close to 0) indicates a well performing algorithm that mixes fast. Analytically computing the optimal acceptance probability is only feasible for very simple target and proposal distributions and when using the Metropolis acceptance rule. Here, we restrict ourselves to experimentally monitoring the acceptance probability.

For the tested recombinative EMCMCs, we have good performance (e.g., KL distance) even for very high acceptance rates that shows that recombination can improve the mixing of MCMCs. Furthermore, we show that algorithms with similar acceptance probabilities can have rather different performance.

For the first experiment, we set the string length to = 20 and *transl* = 50. Since the *F*’s density is 0.1, only 40 elements of *F* have non-zero values. The non-zero integers are generated at random from the interval [−100, 100]. For the generated matrix *F*, we have found the maximum fitness value 146; when this is translated, the maximum unnormalized probability value is 196. We group the individuals with the same value to generate the histogram and we also store the number of individuals with the same value.

We set the population size *N* = 20. Each generation, all individuals are randomly coupled in *N*/2 pairs such that each individual belongs to exactly one pair. We have performed experiments for various mutation rates (from 0.05 to 0.5) and swapping probabilities for the uniform recombination (from 0.05 to 0.5). With each algorithm, we generate 20,000 individuals; our measurements are averaged over 50 runs. We throw away the first 10,000 generated individuals to diminish the impact of the starting points over the performance of the algorithms. This is called the *burn-in* period. Thus, in total, we sample 10,000 “useful” individuals from which we generate Table 2 and the graphs from Fig. 3.

The search space is 2^{20}. By expanding the target distribution, we are able to compare the frequencies of samples generated with the tested (E)MCMCs with their value in the true distribution. In Table 2, we compute the KL distance and acceptance ratio for the five (E)MCMCs. Mann-Whitney nonparametric two-sided test with significance level *p* < 0.05 is used to verify if KL distances of the five tested algorithms are sampled from different distributions. The algorithms have statistical significantly different output except with *mut*+α_{C} and *MCMC*.

The best algorithm, with the significantly lowest KL distance and the highest acceptance ratio, is the recombinative EMCMC, *rEMCMC*. The only difference between *MIC*, the algorithm with second best KL distance, and *rEMCMC* is that *rEMCMC* uses recombination and *MIC* does not. Further, we observe that the two recombinative EMCMCs, that are *rEMCMC* and *rEMCMC*+α_{C}, have a higher acceptance probability than the three non-recombinative EMCMCs, that are *MCMC*, *MIC* and *mut*+α_{C}. That indicates that the recombinative proposal distribution *Q*_{m}×*Q*_{unif}, by exploiting the commonalities of the search space, is a “better” proposal distribution than *Q*_{m}.

Furthermore, as we already observed in the analytical experiments, the coupled acceptance rule α_{C} has a negative influence over both recombinative and non-recombinative EMCMCs. Even though using α_{C}, the recombinative *rEMCMC*+α_{C} has the third best KL distance and the second acceptance ratio, whereas *mut*+α_{C} is the worst algorithm of all. We explain the good behavior of *rEMCMC*+α_{C} by synchronizing the individuals in the family with the uniform recombination: children that inherit the common parts of their parents have similar fitness with the parents and the algorithm accepts more individuals. In opposition, uniform mutation independently proposes two individuals in random directions; then, if one of the candidates has very low fitness, there is a big probability that both children are rejected. As a consequence, *mut*+α_{C} has a low acceptance rate and, thus, performance.

In accordance with the analytical results from the previous section, we observe that *MIC*, by using populations of MCMC chains has a lower KL distance than the standard *MCMC*. Note that the acceptance ratio for these two algorithms is the same, but their KL distance quite different.

In Fig. 3, we show experimental results for the two most performant (E)MCMCs presented in the previous section: *MIC* and *rEMCMC*. By using recombination, *rEMCMC* is a better sampler than *MIC* is: in frequencies, *rEMCMC*, see Fig. 3a, is closer to the true target distribution than *MIC* is. If *rEMCMC* samples with predilection in the high values of the target distribution, in opposition, *MIC* typically samples the low fit individuals. Furthermore, *rEMCMC* finds more higher probable solutions than *MIC*, see Fig. 3b. Overall, in Fig. 3c, we notice that the distribution sampled with *rEMCMC* is closer to the true distribution than *MIC* is. These results are in concordance with the ones in Table 2 from which we conclude that *rEMCMC* is the most performant algorithm for this particular problem by proposing individuals with recombination.

We now show that recombination can improve mixing on BQP with string size = 100, for which it is impractical to generate the target distribution. In this experiment, we cannot compute the KL distance. Furthermore, we do not know the maximum value of this function or if the values are uniformly distributed in some interval.

For this experiment, we compare all the five MCMCs as before, but we show the results only for the best two algorithms *MIC* and *rEMCMC*. Note that these two algorithms are the best performant algorithms in all three experiments.

Assuming, that the unnormalized values of the distribution are in a very large range we group our samples is using the individual’s number of ones to compare two (E)MCMCs algorithms. Given this grouping, we compute the frequencies, Fig. 4a, and the mean value, Fig. 4b, for each such a group.

Again, we set the density of *F* to 0.1; thus, approximately 1,000 elements of *F* are non-zero. We generate these non-zero integers with a uniform random distribution from the interval [−100, 100]. To generate a distribution with positive values, we set *transl* = 1,250. We set population size *N* = 100 and, each algorithm we run 50 times. With each algorithm, we generate 100,000 individuals which we throw away, and we use the next 100,000 generated individuals. Again, we vary the mutation rate and swapping rate from 0.05 and 0.5. In Fig. 4, we show results for *MIC*’s mutation rate 0.2, and *rEMCMC*’s swapping probability 0.5 and mutation rate 0.01. Then, *MIC* has an acceptance rate of 30%, whereas *rEMCMC* has an acceptance rate of 78%. We mention that we have performed experiments with various mutation and swapping probabilities but the results are not very different from the ones we currently show.

In Fig. 4a, we notice that *rEMCMC* samples slightly more individuals with a higher number of ones than *MIC* does. Except that, the distributions sampled by *MIC* and *rEMCMC* are similar and both are sampling especially from individuals with half number of zeros and ones indicating that the target distribution is symmetrically distributed around these individuals. Despite that, the mean values of the sampled individuals are remarkably larger for *rEMCMC* than for *MIC*. It seems that 100,000 individuals are not enough for *MIC*’s burn in whereas for *rEMCMC* it is. We also have performed experiments with single chain *MCMC*; we mention that the mean values are worse than of *MIC*. We explain that by the shape of this BQP: a lot of peaks with many low fit individuals. We therefore consider that *MIC* mixes slower than *rEMCMC*: *N* = 100 is not large enough to cover the number of these peaks and thus *MIC* will always have the problem to escape from these peaks to find the other useful ones. To sample the same amount of individuals, an increase in population size must be combined with a decrease in the *MCMC*’s time to run. The less time we allow an *MCMC* to run, the worse an *MCMC* samples from the search space and eventually, when population size goes to infinity, *MIC* is just a random sampler.

We discussed aspects from the Evolutionary MCMC framework, a class of population based MCMC algorithms that exchange useful information by using recombination and selection. The main issue for EMCMC algorithms is to improve the performance of the sampling process, or the convergence time to a desired distribution. Detailed balance is a straightforward and sufficient, but not necessary, condition for an irreducible and aperiodic EMCMC to converge to a given distribution.

We aim to increase the efficiency of MCMCs by the use of recombination. Recombination operators can generate “good” proposal distributions that exploit the structure of the search space such that EMCMCs using it converge faster to the target distribution. Of course, when the search space has no structure or the structure is not correctly matched with the recombination operator, the recombinative proposal distribution will offer no advantage and will most likely be as efficient as a uniform randomly generated distribution, or even worse in the worst case.

We proposed various recombinative proposal distributions on discrete spaces and we studied how to integrate them into EMCMCs with detailed balance. Since we consider only respectful recombinations, which are reducible, we have to combine recombination with mutation in order to obtain irreducible EMCMCs. We focus on discrete space recombinations and study the properties of discrete space EMCMCs resulting from the interaction of recombinative proposal distributions and MH acceptance rules. The analytical and experimental results show that EMCMCs can outperform the standard MCMC sampling algorithms by using recombination operators.

We have proposed and investigated various MH acceptance rules derived from EC’s selection strategies. In order to obtain a recombinative EMCMC with detailed balance, the children proposed by the recombination operator need to be all accepted or all rejected with the coupled acceptance rule.

Both analytical investigations and experimental tests show that the recombinative EMCMC in which a child individually competes against a parent in the standard MH acceptance rule is the best sampler. In the experimental section, for very large search spaces and small number of samples, this EMCMC, *rEMCMC*, samples high regions of the search space faster than an EMCMC using the coupled acceptance rule. Thus, in short time, *rEMCMC* approximates the desired distribution better. However, there is no theoretical guarantee that *rEMCMC* converges to the target distribution. We also proposed the nested EMCMCs that individually accept or reject fitted states with a EMCMC without detailed balance. Even though the nested EMCMCs on the proposed unbalanced EMCMC have theoretical value by indicating a correction term of the sampled distribution, its computation is impractical.

Finally, we also discussed a recombinative EMCMC without detailed balance but that can be useful for optimization purposes. It is a straightforward extension of the elitist replacement in an MH acceptance rule: two parents compete against two children and the best two from the four are selected for the next generation. This EMCMC can be used only for optimization since it is sampling mainly from the fittest regions of the sampled distribution at the expense of the less fit regions. Its disadvantage is that it can get stuck for a long time in good, but isolated, modes of the sampled distribution.

We conclude that one should be careful with adopting recombination and selection operators from EC into population-based MCMC framework. Population-based techniques that are suited for optimization can be less suitable for sampling and vice-versa. To have a positive impact on the sampling performance of interacting MCMC chains, recombination and selection techniques need to follow some design principles as outlined in this paper.

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

*Q*_{m×r} and *Q*_{r×m} are irreducible because they have non-zero probabilities to go from any population to any other population, *Q*_{m×r} > 0 and *Q*_{r×m} > 0.

When *Q*_{r} is symmetric, we have

*Q*_{r×m} and *Q*_{m×r} are symmetrical for recombinations that swap alleles because mutation generates the alleles which differ in the two populations and recombination swaps them or vice-versa.

By means of an example, we prove that *Q*_{dif×m} is non-symmetrical. Consider the current population of bits **X**^{(t)} = {0, 1, 0} and the candidate population *Y* = {1, 1, 1}, the mutation rate of 1/3, and, for simplicity, the xor operator. We compute the probability to generate *Y* from **X**^{(t)} with uniform mutation and then with xor recombination and the inverse probability to generate **X**^{(t)} from *Y*.

Let’s consider all possible parent choices for xor. With the xor recombination, given the distance between the first two bits, we generate 1 from the third bit of the current population 0; the intermediate population is now . The distance between the second and the third bit is also , and thus the intermediate population is again . Since the distance between first and second bits of the current population is , we generate 1 from 1 and the intermediate population is . When we mutate the intermediate populations, we have = (1/3)^{2} ·2/3 and = (2/3)^{2} ·1/3. Computing in a similar manner the inverse probability, for all possible intermediate populations, we have .

To generate **X**^{(t)} from *Y* with *Q*_{dif×}*m*, we mutate *Y* to and then swap with the xor operator the last bit given the difference between the first two bits resulting in **X**^{(t)}. Similarly, we mutate *Y* to and swap the first bit of or we mutate into and do not swap the middle bit with xor since the difference between the first and the last bit is 0. We then have = 1/3 · (2 · (2/3)^{2} · 1/3+1/3 ·(2/3)^{2}) = 4/81.

We conclude that *Q*_{dif×m} is not symmetrical since

*Q*_{m×mask} is irreducible, since is has If *p*_{x} = 1/, the *Q*_{m×mask} is equivalent with the mutation operator, since all alleles in the parents can be flipped with the probability 1/. Then *Q*_{m×mask} is symmetric.

For *p*_{m} = 1/2, we uniformly random generate the child **y**_{i+1} from the mask Then, *Q*_{m×mask} is symmetric since the common and uncommon parts of the parents and the children are randomly generated.

By means of an example, we show that *Q*_{m×mask} is non-symmetrical for other values of *p*_{m} and *p*_{x}. Consider and . When **y**_{i} = 1 and **y**_{i+1} = 0, the probability to generate **y**_{i} is 1/, and the probability to generate **y**_{i+1} is 1 − *p*_{m}. The inverse probability is = (1 − *p*_{x}) · (1 − *p*_{m}). When **y**_{i} = 0 and **y**_{i+1} = 1, the probability to generate **y**_{i} is 1 − 1/ and the probability to generate **y**_{i+1} is *p*_{m}. The reverse probability is now *p*_{x} · *p*_{m}. Then and = (1 − *p*_{x}) · (1 − *p*_{m})+*p*_{x} · *p*_{m}. We now have that if *p*_{x} ≠ 1/ and *p*_{m} ≠ 1/2, then *Q*_{m×mask} is non-symmetrical.

We consider that the EMCMC resulting from the interaction between transition matrix *Q* and the (generalized) Metropolis acceptance rule is an EMCMC over the *N* dimensional search space *E*^{N}. For ease of exposure and without loss of generality, let’s consider populations of two individuals *N* = 2. Two children {**y**_{1}, **y**_{2}} that are generated with some irreducible and symmetrical proposal distribution *Q* from two parents . Then .

The Metropolis acceptance rule in this case is . The transition matrix we denote with *K*_{C}. The transition probability that two children **y**_{1} and **y**_{2} are generated and both are accepted is

The rejection transition probability that both children are rejected is

Let’s assume without loss of generality that . Then,

and .

We now show that the detailed balance condition holds

The marginal transition probability to generate from **y**_{1} when summing over the variables of the second chain is

The stationary marginal distribution of the first chain is

From the above equations we infer

where we have used

We conclude that the marginal target distribution for the *i*-th chain is *P*_{i}(·) and that this EMCMC algorithm has the stationary distribution *P*_{1}(·)×*P*_{2}(·).

The EMCMC algorithm is irreducible since the proposal distribution *Q* is irreducible. This algorithm is aperiodic since the Metropolis algorithm, by construction is aperiodic. We conclude that this EMCMC is ergodic with the stationary distribution *P*_{1}(·)×*P*_{2}(·), where *P*_{i}(·) is the marginal target distribution of the *i*-th chain.

Consider two parents and and their two generated children **y**_{1} and **y**_{2}, and the coupled acceptance α_{C} that accepts/rejects both states and the probability to accept/reject one or both children with α_{1.1}. At first, we prove the first inequality from Proposition 10

The right side of this inequation can be rewriten as

because and, thus . The left side of the inequality 3 can be rewritten as

The inequality 3 holds since

We now prove that the rejection probability for a less fit child, , is larger for *K*_{1.1} than for *K*_{C} when the second child is more fit than the second parent, . Then

The right side of the inequality 4 is

Rewriting the left side of the inequality 4

The inequality 4 holds since

Following the same line of reasoning, the rejection probability that a less fit child, , is lower for *K*_{1.1} than for *K*_{C} when the second child is less fit than the second parent, follows directly.

We now show that the EMCMC defined by *K*_{1.1} has detailed balance if and only if

The detailed balance should hold only in the case that two different children are proposed but only one of them is accepted and the other is rejected

Or the above equation holds if and only if

If the detailed condition holds and the proposal distribution is irreducible and symmetrical, the EMCMC is ergodic and converge to the target distribution *P*_{1}(·)×*P*_{2}(·).

We show that ECA does not have detailed balance for any non-uniform distribution. Let’s consider three states, such that . In our discussion from Sect. 5.3 we show that

If ECA has detailed balance, then

Because *Q* is symmetrical we further have

*K*_{C} is also symmetrical, so further we have that

The above equation does not hold since for the first sum requires that

and for the second sum that

We conclude that *K*_{ECA} does not have detailed balance for any *Q*.

When *Q* is irreducible, this algorithm is irreducible since it can generate any state from any other state with a non-zero probability. Therefore, the algorithm is also aperiodic since the *K*_{C} is aperiodic and thus the target distribution of ECA exists.

The proof is split in three parts, corresponding with the three equations in the proposition.

*Both children are different from their parents*. ThenThe reverse transition probability isWe now havewhere*Q*is symmetrical and thus . By replacing the definition of acceptance rule α, we haveandThe coupled acceptance probability for the nested acceptance probability is nowThe nested transition probability now is*One child is different from its parent*. For example**y**_{2}is rejected and**y**_{1}is accepted. ThenThe reverse transition probability isWe now havewhere . Now, the coupled acceptance iswhere . The second equation from the proposition now follows directly.*Both children are rejected*. Then

The coupled acceptance probability for the nested acceptance probability is now

where *Q* is symmetrical

and

The equation in the proposition follows directly.

Madalina M. Drugan, Email: ln.uu.sc@aniladam.

Dirk Thierens, Email: ln.uu.sc@krid.

1. Andrieu C, de Freitas N, Doucet A, Jordan M (2003) An introduction to MCMC for machine learning. Mach Learn 50:5–43

2. Baluja S, Davies S (1997) Using optimal dependency-trees for combinational optimization. In: Proceedings of international conference of machine learning (ICML’97). Morgan Kaufmann, San Francisco, pp 30–38

3. Campillo F, Rakotozafy R, Rossi V. Parallel and interacting Markov chain Monte Carlo algorithm Math Comput Simul 2009. 79123424–3433.34331178.65003254716510.1016/j.matcom.2009.04.010 [Cross Ref]

4. Chow CK, Liu CN. Approximating discrete probability distributions with dependence trees IEEE Trans Inf Theory 1968. 14462–467.4670165.2230510.1109/TIT.1968.1054142 [Cross Ref]

5. Corander J, Ekdahl M, Koski Timo. Parallel interacting MCMC for learning of topologies of graphical models Data Min Knowl Discov 2008. 173431–456.456244865610.1007/s10618-008-0099-9 [Cross Ref]

6. Doucet A, de Freitas N, Gordon N, editors. Sequential Monte Carlo methods in practice. Berlin: Springer; 2001.

7. Drugan MM, Thierens D (2004) Evolutionary Markov chain Monte Carlo. In: Artificial evolution, LNCS 2936. Springer, Berlin, pp 63–76

8. Drugan MM, Thierens D (2005) Recombinative EMCMC algorithms. In: Proceeding of IEEE congress of evolutionary computation, CEC’05. IEEE Press, Piscataway, pp 2024–2031

9. Fearnhead P. Special issue: adaptive Monte Carlo methods Statistics and Computing 2008. 184341–480.480239081610.1007/s11222-008-9107-6 [Cross Ref]

10. Geraci J. A new connection between quantum circuits, graphs and the ising partition function Quantum Information Processing 2008. 75227–242.2421160.81330245360610.1007/s11128-008-0084-7 [Cross Ref]

11. Geyer CJ (1991) Markov Chain Monte Carlo maximum likelihood. In: Computing science and statistics: proceeding of the 23rd symposium on the interface. pp 156–163

12. Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo in practice. London: Chapman & Hall; 1996.

13. Hastings WK. Monte Carlo sampling methods using Markov chains and their applications Biometrika 1970. 5797–109.1090219.6500810.1093/biomet/57.1.97 [Cross Ref]

14. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680 [PubMed]

15. Laskey KB, Myers JW (2003) Population Markov Chain Monte Carlo. Mach Learn 50:175–196

16. Liang F, Wong WH (2000) Evolutionary Monte Carlo: applications to *C*_{p} model sampling and change point problem. In: Statistica sinica. pp 317–342

17. Mahfoud SW, Goldberg DE (1995) Parallel recombinative simulated annealing: a genetic algorithm. Parallel Comput 21:1–28

18. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of state calculations by fast computing machines. J Chem Phys. 1953;21:1087–1092. doi: 10.1063/1.1699114. [Cross Ref]

19. Pelikan M, Goldberg DE, Lobo F. A survey of optimization by building and using probabilistic models Comput Optim Appl 2002. 215–20.200988.90052188308710.1023/A:1013500812258 [Cross Ref]

20. Radcliffe NJ (1991) Forma analysis and random respectful recombination. In: Proceeding of the fourth international conference on genenetic algorithms. pp 222–229. Morgan Kaufmann

21. Roberts GO, Gelman A, Gilks WR. Weak convergence and optimal scaling of random walk Metropolis algorithms The Annals of Applied Probability 1997. 71110–120.1200876.60015142875110.1214/aoap/1034625254 [Cross Ref]

22. Storn R, Price K. Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces Journal Global Optimization 1997. 11341–359.3590888.90135147955310.1023/A:1008202821328 [Cross Ref]

23. Strens MJA (2003) Evolutionary MCMC sampling and optimization in discrete spaces. In: Proceeding of international conference of machine learning (ICML’03). pp 736–743

24. ter Braak CJF. Genetic algorithms and Markov chain Monte Carlo: differential evolution Markov Chain makes Bayesian computing easy Stat Comput 2006. 163239–249.249224223610.1007/s11222-006-8769-1 [Cross Ref]

25. Thierens D, Goldberg DE (1994) Elitist recombination: an integrated selection-recombination GA. In Proceeding of the congress on computational intelligence. pp 508–512

26. Wolpert DH, Lee CF (2005) Adaptive metropolis sampling and optimization with product distributions. Technical report, NASA Ames Research Center

27. Zhang BT, Cho DY. System identification using evolutionary Markov Chain Monte Carlo. Journal of Systems Architecture. 2001;47:587–599. doi: 10.1016/S1383-7621(01)00017-0. [Cross Ref]

Articles from Springer Open Choice are provided here courtesy of **Springer**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |