Adv Neural Inf Process Syst. Author manuscript; available in PMC 2017 March 22.


PMCID: PMC5361064

NIHMSID: NIHMS826679

Stanford University

Gibbs sampling is a Markov chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.

Gibbs sampling, or Glauber dynamics, is a Markov chain Monte Carlo method that draws approximate samples from multivariate distributions that are difficult to sample directly [9; 15, p. 40]. A major use of Gibbs sampling is marginal inference: the estimation of the marginal distributions of some variables of interest [8]. Some applications include various computer vision tasks [9, 23, 24], information extraction [7], and latent Dirichlet allocation for topic modeling [11]. Gibbs sampling is simple to implement and quickly produces accurate samples for many models, so it is widely used and available in popular libraries such as OpenBUGS [16], FACTORIE [17], JAGS [18], and MADlib [14].

**Algorithm 1** Gibbs sampling

**input:** variables *x _{i}* for 1 ≤ *i* ≤ *n*, and target distribution π

Initialize *x*_{1}, …, *x _{n}*
**loop**
    Select variable index *s* from {1, …, *n*}
    Sample *x _{s}* from the conditional distribution *P*_{π}(*X _{s}* | *X*_{{1, …, n}\{s}})
**end loop**

Gibbs sampling (Algorithm 1)
iteratively selects a single variable and resamples it from its conditional
distribution, given the other variables in the model. The method that selects the
variable index to sample (*s* in Algorithm 1) is called the *scan order*. Two scan orders
are commonly used: random scan and systematic scan (also known as deterministic or
sequential scan). In random scan, the variable to sample is selected uniformly and
independently at random at each iteration. In systematic scan, a fixed permutation
is selected, and the variables are repeatedly selected in that order. The existence
of these two distinct options raises an obvious question—which scan order
produces accurate samples more quickly? This question has two components: hardware
efficiency (how long does each iteration take?) and statistical efficiency (how many
iterations are needed to produce an accurate sample?).
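To make the two scan orders concrete, here is a minimal Gibbs sampler over binary variables supporting both; the interface (an unnormalized probability function `prob`) and all names are our own illustration, not from the paper.

```python
import random

def gibbs_sample(prob, n, steps, scan="random", order=None, seed=0):
    """Gibbs sampler over n binary variables.

    prob: unnormalized probability of a state (tuple of 0/1 values).
    scan: "random" picks a variable uniformly each step; "systematic"
          cycles through a fixed permutation (default: identity).
    """
    rng = random.Random(seed)
    if order is None:
        order = list(range(n))
    x = [0] * n
    for t in range(steps):
        s = rng.randrange(n) if scan == "random" else order[t % n]
        # Resample x_s from its conditional given the other variables:
        # P(x_s = 1 | rest) = w1 / (w0 + w1).
        w0 = prob(tuple(x[:s] + [0] + x[s + 1:]))
        w1 = prob(tuple(x[:s] + [1] + x[s + 1:]))
        x[s] = 1 if rng.random() * (w0 + w1) < w1 else 0
    return tuple(x)

# Toy target: independent bits, each true with probability 3/4.
target = lambda x: 3 ** sum(x)
print(gibbs_sample(target, 4, 400))                      # random scan
print(gibbs_sample(target, 4, 400, scan="systematic"))   # systematic scan
```

The only difference between the two scans is the single line selecting `s`; everything that follows in the paper concerns how much that line matters.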

From the hardware efficiency perspective, systematic scans are clearly superior [21, 22]. Systematic scans have good spatial locality because they access the variables in linear order, which makes their iterations run faster on hardware. As a result, systematic scans are commonly used in practice.

Comparing the two scan orders is much more interesting from the perspective
of statistical efficiency, which we focus on for the rest of this paper. Statistical
efficiency is measured by the *mixing time*, which is the number of
iterations needed to obtain an accurate sample [15, p. 55]. The mixing times of random scan and systematic scan have
been studied, and there is a longstanding conjecture [3; 15, p. 300] that systematic
scan (1) never mixes more than a constant factor slower than random scan and (2)
never mixes more than a logarithmic factor faster than random scan. This conjecture
implies that the choice of scan order does not have a large effect on
performance.

Recently, Roberts and Rosenthal [20] described a model in which systematic scan mixes more slowly than random scan by a polynomial factor; this disproves direction (1) of this conjecture. Independently, we constructed other models for which the scan order has a significant effect on mixing time. This raises the question: what are the true bounds on the difference between these mixing times? In this paper, we address this question and make the following contributions.

- In Section 3, we study the effect of the variable permutation chosen for systematic scan on the mixing time. In particular, in Section 3.1, we construct a model for which a systematic scan mixes a polynomial factor faster than random scan, disproving direction (2) of the conjecture, and in Section 3.2, we construct a model for which the systematic scan with the worst-case permutation results in a mixing time that is slower by a polynomial factor than both the best-case systematic scan permutation and random scan.
- In Section 4, we empirically verify the mixing times of the models we construct, and we analyze how the mixing time changes as a function of the permutation.
- In Section 5, we prove a weaker version of the conjecture described above, providing relative bounds on the mixing times of random and systematic scan. Specifically, under a mild condition, different scan orders can only change the mixing time by a polynomial factor. To obtain these bounds, we introduce a method of augmenting the state space of Gibbs sampling so that the method of conductance can be applied to analyze its dynamics.

Recent work has made progress on analyzing the mixing time of Gibbs
sampling, but there are still some major limitations to our understanding. In
particular, most known results are only for specific models or are only for random
scan. For example, mixing times are known for the Mallows model [1, 4], and
colorings of a graph [5] for both random and
systematic scan, but these are not applicable to general models. On the other hand,
random scan has been shown to mix in polynomial time for models that satisfy
structural conditions – such as having close-to-modular energy functions
[10] or having bounded hierarchy width
and factor weights [2] – but
corresponding results for systematic scan are not known. The major exception to
these limitations is Dobrushin’s condition, which guarantees
*O*(*n* log *n*) mixing for both
random scan and systematic scan [6, 13]. However, there are many models of interest
with close-to-modular energy functions or bounded hierarchy width that do not
satisfy Dobrushin’s condition.

A similar choice of scan order appears in stochastic gradient descent (SGD), where the standard SGD algorithm uses random scan, and the incremental gradient method (IGM) uses systematic scan. In contrast to Gibbs sampling, avoiding “bad permutations” in the IGM is known to be important to ensure fast convergence [12, 19]. In this paper, we bring some intuition about the existence of bad permutations from SGD to Gibbs sampling.

Despite a lack of theoretical results regarding the effect of scan order on mixing times, it is generally believed that scan order only has a small effect on mixing time. In this section, we first define relevant terms and state some common conjectures regarding scan order. Afterwards, we give several counterexamples showing that the scan order can have asymptotic effects on the mixing time.

The *total variation distance* between two probability
distributions μ and ν on Ω is [15, p. 47]

$${\Vert \mu -\nu \Vert}_{\mathrm{T}\mathrm{V}}=\underset{A\subseteq \mathrm{\Omega}}{\text{max}}|\mu (A)-\nu (A)|.$$

The *mixing time* is the minimum number of steps
needed to guarantee that the total variation distance between the true and estimated
distributions is below a given threshold ε from any starting distribution.
Formally, the *mixing time* of a stochastic process
*P* with transition matrix
*P*^{(t)} after *t*
steps and stationary distribution π is [15, p. 55]

$${t}_{\text{mix}}(P,\epsilon )=\text{min}\{t:\underset{\mu}{\text{max}}{\Vert {P}^{(t)}\mu -\pi \Vert}_{\mathrm{T}\mathrm{V}}\le \epsilon \},$$

where the maximum is taken over the distribution μ of the
initial state of the process. When comparing the statistical efficiency of
systematic scan and random scan, it would be useful to establish, for any systematic
scan process *S* and random scan process *R* on the
same *n*-variable model, a relative bound of the form

$${F}_{1}(\epsilon ,n,{t}_{\text{mix}}(R,\epsilon ))\le {t}_{\text{mix}}(S,\epsilon )\le {F}_{2}(\epsilon ,n,{t}_{\text{mix}}(R,\epsilon ))$$

(1)

for some functions *F*_{1} and
*F*_{2}. Similarly, to bound the effect that the choice
of permutation can have on the mixing time, it would be useful to know, for any two
systematic scan processes *S*_{α} and
*S*_{β} with different permutations on the same
model, that for some function *F*_{3},

$${t}_{\text{mix}}({S}_{\alpha},\epsilon )\le {F}_{3}(\epsilon ,n,{t}_{\text{mix}}({S}_{\beta},\epsilon )).$$

(2)

Diaconis [3] and Levin et al. [15, p. 300] conjecture that systematic scan is
never more than a constant factor slower or a logarithmic factor faster than random
scan. This is equivalent to choosing *F*_{1}(ε,
*n*, *t*) =
*C*_{1}(ε) · *t* ·
(log *n*)^{−1} and
*F*_{2}(ε, *n*, *t*) =
*C*_{2}(ε) · *t* in the
inequality in (1), for some functions
*C*_{1} and *C*_{2}. It is also
commonly believed that all systematic scans mix at the same asymptotic rate, which
is equivalent to choosing *F*_{3}(ε,
*n*, *t*) =
*C*_{3}(ε) · *t* in (2).
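These definitions can be computed exactly for small chains by powering the transition matrix. The following sketch is our own (using the row-stochastic convention, so `Pt[x]` is the distribution after *t* steps from state *x*); the lazy 3-cycle example is purely illustrative.

```python
import numpy as np

def tv_distance(mu, nu):
    # Total variation distance equals half the L1 distance.
    return 0.5 * np.abs(mu - nu).sum()

def mixing_time(P, pi, eps=0.25):
    """Smallest t such that, from the worst deterministic start,
    the t-step distribution is within eps of pi in total variation."""
    Pt = np.eye(len(pi))
    for t in range(1, 10_000):
        Pt = Pt @ P                      # rows of Pt are P^t(x, .)
        if max(tv_distance(Pt[x], pi) for x in range(len(pi))) <= eps:
            return t
    raise RuntimeError("chain did not mix within the step limit")

# Lazy walk on a 3-cycle; stationary distribution is uniform.
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi = np.ones(3) / 3
print(mixing_time(P, pi))  # → 1 (every row is already within 1/6 of uniform)
```

Restricting the maximum to deterministic starting states is without loss of generality, since total variation from an arbitrary starting distribution is a convex combination of the deterministic cases.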

These conjectures imply that using systematic scan instead of random scan will not result in significant consequences, at least asymptotically, and that the particular permutation used for systematic scan is not important. However, we show that neither conjecture is true by constructing models (listed in Table 1) in which the scan order has substantial asymptotic effects on mixing time.

In the rest of this section, we go through two models in detail to highlight
the diversity of behaviors that different scan orders can have. First, as a warm-up,
we construct a model, which we call the *sequence of dependencies*
model, for which a single “good permutation” of systematic scan
mixes faster, by a polynomial factor, than both random scan and systematic scans
using most other permutations. This serves as a counterexample to the conjectured
lower bounds (i.e. the choice of *F*_{1} and
*F*_{3}) on the mixing time of systematic scan. Second,
we construct a model, the *two islands* model, in which there is a
small set of “bad permutations” that mix very slowly in comparison
to random scan and most other systematic scans. This contradicts the conjectured
upper bounds (i.e. the choice of *F*_{2} and
*F*_{3}). For completeness, we also discuss the
*discrete pyramid* model introduced by Roberts and Rosenthal
[20] (which contradicts the conjectured
choice of *F*_{2}). Table
1 also lists the mixing times of a few additional models we constructed:
these models further explore the space of asymptotic comparisons among scan orders,
but for brevity we defer them to the appendix.

The first model we will describe is the sequence of dependencies model (Figure 1a). The goal of this model is to explore the question of how fast systematic scan can be, by constructing a model for which systematic scan mixes rapidly for one particular good permutation. The sequence of dependencies model achieves this by having the property that, at any time, progress towards mixing is only made if a particular variable is sampled; this variable is always the one that is chosen by the good permutation. As a result, while a systematic scan using a good permutation makes progress at every step, both random scan and other systematic scans often fail to make progress, which leads to a gap between their mixing times. Thus, this model exhibits two surprising behaviors: (1) one systematic scan is polynomially better than random scan and (2) systematic scans using different permutations have polynomial differences in mixing times. We now describe this model in detail.

There are *n* binary variables
*x*_{1}, …,
*x _{n}*. Independently, each variable has a very
strong prior of being true. However, variable *x _{i}* can only be true if
*x*_{i−1} is also true, which yields the distribution

$$P(x)\propto \{\begin{array}{cc}0\hfill & \text{if }{x}_{i}\text{ is true and }{x}_{i-1}\text{ is false for some }i\in \{2,\dots ,n\}\hfill \\ {M}^{|x|}\hfill & \text{otherwise,}\hfill \end{array}$$

where |*x*| denotes the number of true variables and *M* is a large constant.

There are *n* + 1 states with non-zero probability:
*s*_{0}, …,
*s _{n}*, where *s _{i}* is the state in which
*x*_{1}, …, *x _{i}* are true and the remaining variables are false.

The number of steps needed to transition from
*s*_{0} to *s*_{1} is
distributed as a geometric random variable with mean *n*
(variables are randomly selected, and specifically
*x*_{1} must be selected). Similarly, the number
of steps needed to transition from
*s*_{i−1} to
*s _{i}* is distributed as a geometric random
variable with mean *n*, since *x _{i}* specifically must be selected. Summing
over the *n* transitions, random scan therefore needs on the order of
*n*^{2} steps to reach *s _{n}*.

The best systematic scan uses the order
*x*_{1}, *x*_{2},
…, *x _{n}*. For this scan, one sweep will
reach *s _{n}*, because each variable is sampled exactly when it is the next
one needed, so only *n* steps are required.

The worst systematic scan uses the order
*x _{n}*, *x*_{n−1}, …, *x*_{1}. Each sweep of this scan advances the chain by
only one state, so roughly *n* sweeps, or on the order of *n*^{2} steps, are
needed.
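These step counts can be checked by simulation. The sketch below is ours and uses the large-*M* abstraction of the model, in which sampling the first false variable flips it and every other sample leaves the state unchanged.

```python
import random

def steps_to_absorb(n, next_var, seed=0):
    """Steps until all n variables are true, in the large-M limit where
    only sampling the first false variable makes progress."""
    rng = random.Random(seed)
    k = steps = 0                      # k = number of leading true variables
    while k < n:
        if next_var(steps, rng) == k:  # only the next needed variable helps
            k += 1
        steps += 1
    return steps

n, trials = 20, 500
best = steps_to_absorb(n, lambda t, r: t % n)            # scan x1, ..., xn
worst = steps_to_absorb(n, lambda t, r: n - 1 - t % n)   # scan xn, ..., x1
rand_avg = sum(steps_to_absorb(n, lambda t, r: r.randrange(n), seed=s)
               for s in range(trials)) / trials
print(best, worst, round(rand_avg))  # n, about n^2, about n^2 on average
```

With *n* = 20, the good permutation finishes in exactly *n* steps, while the reversed permutation and random scan both need on the order of *n*^{2}.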

With the sequence of dependencies model, we showed that a single good permutation can mix much faster than other scan orders. Next, we describe the two islands model (Figure 1b), which has the reverse behavior: it has bad permutations that yield much slower mixing times. The two islands model achieves this by having two disjoint blocks of variables such that consecutively sampling two variables from the same block accomplishes very little. As a result, a systematic scan that uses a permutation that frequently consecutively samples from the same block mixes a polynomial factor slower than both random scan and most other systematic scans. We now describe this model in detail.

There are 2*n* binary variables grouped into two
blocks: *x*_{1}, …,
*x _{n}* and *y*_{1}, …, *y _{n}*. No variable from one block may be true
while a variable from the other block is true, which yields the distribution

$$P(x,y)\propto \{\begin{array}{cc}0\hfill & \text{if }\exists {x}_{i}\text{ true and }\exists {y}_{j}\text{ true}\hfill \\ 1\hfill & \text{otherwise}\hfill \end{array}$$

(3)

This model can be interpreted as a machine learning inference
problem in the following way. Each variable represents whether the reasoning
in some sentence is sound. The sentences corresponding to
*x*_{1}, …,
*x _{n}* and the sentences corresponding to *y*_{1}, …, *y _{n}* reach
contradictory conclusions, so sentences from the two groups cannot be sound
simultaneously.

We can think of the states as being divided into three groups:
states in island *x* (at least one of the *x*
variables are true), states in island *y* (at least one of
the *y* variables are true), and a single *bridge
state b* (all variables are false). The islands are
well-connected internally, but it is impossible to directly move from one
island to the other – the only way to move from one island to the
other is through the bridge. To simplify the analysis, we assume that the
bridge state has very low mass. This assumption allows us to assume that
islands mix rapidly in comparison to the time required to move onto the
bridge and that chains always move off of the bridge when a variable is
sampled. The same asymptotic behavior results when the bridge state has the
same mass as the other states.

The bridge is the only way to move from one island to the other, so
it acts as a bottleneck. As a result, the efficiency of bridge usage is
critical to the mixing time. We will use *bridge efficiency*
to refer to the probability that the chain moves to the other island when it
reaches the bridge. Under the assumption that mixing within the islands is
rapid in comparison to the time needed to move onto the bridge, the mixing
time will be inversely proportional to the bridge efficiency of the
chain.

In random scan, the variable selected after getting on the bridge is independent of the previous variable. As a result, with probability 1/2, the chain will move onto the other island, and with probability 1/2, the chain will return to the same island, so the bridge efficiency is 1/2.

Several different systematic scans achieve the fastest mixing time.
One such scan is *x*_{1},
*y*_{1}, *x*_{2},
*y*_{2}, …, *x _{n},
y_{n}*. Since the sampled variables alternate
between the blocks, if the chain moves onto the bridge (necessarily by
sampling a variable from the island it was previously on), it will always
proceed to sample a variable from the other block, which will cause it to
move onto the other island. Thus, the bridge efficiency is 1. More
generally, any systematic scan that alternates between sampling from the
*x* block and the *y* block achieves a bridge efficiency of 1.

Several different systematic scans achieve the slowest mixing time.
One such scan is *x*_{1}, …,
*x _{n}*, *y*_{1}, …, *y _{n}*. Under this scan, when the chain moves onto
the bridge, the next variable sampled is usually from the same block, so the
chain falls back onto the island it came from; it crosses only when the bridge
is reached at the boundary between the blocks, giving a bridge efficiency of
roughly 1/*n*.

The mixing times of the chains are inversely proportional to the
bridge efficiency. As a result, random scan takes twice as long to mix as
the best systematic scan, and mixes *n*/2 times faster than
the worst systematic scan.
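The bridge-efficiency comparison can be sketched numerically. The code below is our own simplification: it adopts the paper's assumption that island mixing is fast, modeling bridge entry as occurring at a uniformly random position of the scan cycle.

```python
import random

def bridge_efficiency(order, n, trials=100_000, seed=0):
    """Estimate P(chain crosses | it is on the bridge), assuming bridge
    entry happens at a uniformly random scan position. Variables
    0..n-1 form block x; variables n..2n-1 form block y."""
    rng = random.Random(seed)
    m, hits = 2 * n, 0
    for _ in range(trials):
        if order is None:   # random scan: next variable independent of entry
            prev, nxt = rng.randrange(m), rng.randrange(m)
        else:               # systematic scan: next variable follows in cycle
            i = rng.randrange(m)
            prev, nxt = order[i], order[(i + 1) % m]
        hits += (prev < n) != (nxt < n)   # crossed iff blocks differ
    return hits / trials

n = 4
alternating = [v for i in range(n) for v in (i, n + i)]  # x1,y1,x2,y2,...
blockwise = list(range(2 * n))                           # x1..xn,y1..yn
print(bridge_efficiency(alternating, n))  # ~1.0
print(bridge_efficiency(blockwise, n))    # ~1/n
print(bridge_efficiency(None, n))         # ~0.5
```

The estimates match the analysis above: the alternating scan always crosses, the blockwise scan crosses only at the block boundary (efficiency about 1/*n*), and random scan crosses half the time.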

In the discrete pyramid model (Figure
1c) introduced by Roberts and Rosenthal [20], there are *n* binary variables
*x _{i}*, and the mass is uniformly distributed
over all states where at most one variable is true.

In this section, we run several experiments to illustrate the effect of scan order on mixing times. First, in Figure 2a, we plot the mixing times of the models from Section 3 as a function of the number of variables. These experiments validate our results about the asymptotic scaling of the mixing time, as well as show that the scan order can have a significant effect on the mixing time for even small models. (Due to the exponential state space of the two islands model, we modify it slightly to make the computation of mixing times feasible: we simplify the model by only considering the states that are adjacent to the bridge, and assume that the states on each individual island mix instantly.)

In the following experiments, we consider a modified version of the two islands model, in which the mass of the bridge state is set to 0.1 of the mass of the other states to allow the effect of scan order to be clear even for a small number of variables. Figure 2b illustrates the rate at which different scan orders explore this modified model. Due to symmetry, we know that half of the mass should be on each island in the stationary distribution, so getting half of the mass onto the other island is necessary for mixing. This experiment illustrates that random scan and a good systematic scan move to the other island quickly, while a bad systematic scan requires many more iterations.

Figure 2c illustrates the effect that the permutation chosen for systematic scan can have on the mixing time. In this experiment, the mixing time for each permutation was found and plotted in sorted order. For the sequence of dependencies model, there are a small number of good permutations which mix very quickly compared to the other permutations and random scan. However, no permutation is bad compared to random scan. In the two islands model, as we would expect based on the analysis in Section 3, there are a small number of bad permutations which mix very slowly compared to the other permutations and random scan. Some permutations are slightly better than random scan, but none of the scan orders are substantially better. In addition, the mixing times for systematic scan are approximately discretized, because the mixing time depends so heavily on the bridge efficiency.

In Section 3, we described two models for which a systematic scan can mix a polynomial factor faster or slower than random scan, thus invalidating conventional wisdom that the scan order does not have an asymptotically significant effect on mixing times. This raises a question of how different the mixing times of different scans can be. In this section, we derive the following weaker – but correct – version of the conjecture stated by Diaconis [3] and Levin et al. [15].

One of the obstacles to proving this result is that the systematic scan
chain is not *reversible*. A standard method of handling
non-reversible Markov chains is to study a lazy version of the Markov chain instead
[15, p. 9]. In the lazy version of a
Markov chain, each step has a probability of 1/2 of staying at the current state,
and acts as a normal step otherwise. This is equivalent to stopping at a random time
that is distributed as a binomial random variable. Due to the fact that systematic
scan is not reversible, our bounds are on the lazy systematic scan, rather than the
standard systematic scan.

*For any random scan Gibbs sampler R and lazy systematic scan
sampler S with the same stationary distribution* π,
*their relative mixing times are bounded as follows*.

$${(1/2-\epsilon )}^{2}{t}_{\mathit{\text{mix}}}(R,\epsilon )\le 2{t}_{\mathit{\text{mix}}}^{2}(S,\epsilon )\text{log}\left(\frac{1}{\epsilon {\pi}_{\mathit{\text{min}}}}\right)$$

$${(1/2-\epsilon )}^{2}{t}_{\mathit{\text{mix}}}(S,\epsilon )\le \frac{8{n}^{2}}{{({\text{min}}_{x,i}{P}_{i}(x,x))}^{2}}{t}_{\mathit{\text{mix}}}^{2}(R,\epsilon )\text{log}\left(\frac{1}{\epsilon {\pi}_{\mathit{\text{min}}}}\right),$$

*where P _{i} is the transition matrix corresponding to resampling
just variable i, and* π_{min} *is the smallest probability of any state under* π.

Under mild conditions, namely ε being fixed and the quantities $\text{log}({\pi}_{\text{min}}^{-1})$ and ${({\text{min}}_{x,i}{P}_{i}(x,x))}^{-1}$ being at most polynomial in *n*, Theorem 1 implies that the mixing times of random scan and systematic scan differ by at most a polynomial factor.

In the two islands models, the mixing time of a scan order was determined by its ability to move through a single bridge state that restricted flow. This suggests that a technique with the ability to model the behavior of this bridge state is needed to bound the relative mixing times of different scans. Conductance, also known as the bottleneck ratio, is a topological property of Markov chains used to bound mixing times by considering the flow of mass around the model [15, p. 88]. This ability to model bottlenecks in a Markov chain makes conductance a natural technique both for studying the two islands model and bounding mixing times in general.

More formally, consider a Markov chain on state space Ω with
transition matrix *P* and stationary distribution π. The
*conductance* of a set *S* and of the whole
chain are respectively defined as

$$\mathrm{\Phi}(S)=\frac{{\sum}_{x\in S,y\notin S}\pi (x)P(x,y)}{\pi (S)}\hspace{1em}\hspace{1em}\hspace{1em}{\mathrm{\Phi}}_{\star}=\underset{S:\pi (S)\le \frac{1}{2}}{\text{min}}\mathrm{\Phi}(S).$$
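For small chains, the conductance can be computed exactly by enumerating candidate sets *S*. This brute-force sketch is our own, as is the toy chain: two well-connected pairs of states joined by a weak link, in the spirit of the two islands model.

```python
import itertools
import numpy as np

def conductance(P, pi):
    """Bottleneck ratio: minimize (flow out of S) / pi(S) over all
    sets S with pi(S) <= 1/2. Brute force; small chains only."""
    n, best = len(pi), float("inf")
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            mass = pi[list(S)].sum()
            if mass <= 0.5:
                out = sum(pi[x] * P[x, y]
                          for x in S for y in range(n) if y not in S)
                best = min(best, out / mass)
    return best

# Symmetric chain, so the stationary distribution is uniform; states
# {0,1} and {2,3} are joined only by the weak 0 <-> 2 link.
P = np.array([[0.45, 0.45, 0.10, 0.00],
              [0.45, 0.55, 0.00, 0.00],
              [0.10, 0.00, 0.45, 0.45],
              [0.00, 0.00, 0.45, 0.55]])
pi = np.full(4, 0.25)
print(conductance(P, pi))  # ≈ 0.05: the weak link is the bottleneck
```

The minimizing set is one of the two pairs, and its conductance is small precisely because little mass flows across the weak link, mirroring the role of the bridge state above.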

Conductance can be directly applied to analyze random scan. Let
*P _{i}* be the transition matrix corresponding
to sampling variable *i*; the transition matrix of random scan is then the
average $P=\frac{1}{n}{\sum}_{i=1}^{n}{P}_{i}$. Random scan is reversible, so
conductance bounds on mixing times apply to it directly.

On the other hand, conductance cannot be directly applied to systematic
scan. Systematic scan is not a true Markov chain because it uses a sequence of
transition matrices rather than a single transition matrix. One standard method
of converting systematic scan into a true Markov chain is to consider each full
scan as one step of a Markov chain. However, this makes it difficult to compare
with random scan because it completely changes which states are connected by
single steps of the transition matrix. To allow systematic and random scan to be
compared more easily, we introduce an alternative way of converting systematic
scan to a true Markov chain by augmenting the state space. The augmented state
space is Ψ = Ω × [*n*], which represents
an ordered pair of the normal state and the index of the variable to be sampled.
The transition probability is $P((x,i),(y,j))={P}_{i}(x,y)$ if
*j* ≡ *i* + 1 (mod *n*) and 0 otherwise: the chain resamples variable
*i* and then advances deterministically to the next index in the scan
order.

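The augmented construction can be checked numerically on a toy model. The sketch below is our own: it assumes the systematic scan advances the index as *j* = *i* + 1 (mod *n*), uses the augmented random scan transition (1/*n*)*P _{i}*(*x*, *y*), and takes two independent biased bits as the target distribution.

```python
import numpy as np

n = 2                                  # two independent biased binary variables
states = [(a, b) for a in (0, 1) for b in (0, 1)]
pi = np.array([(2/3 if s[0] else 1/3) * (2/3 if s[1] else 1/3) for s in states])

def P_var(i):
    """Transition matrix for resampling variable i from its conditional."""
    P = np.zeros((4, 4))
    for a, x in enumerate(states):
        for b, y in enumerate(states):
            if all(x[k] == y[k] for k in range(n) if k != i):
                P[a, b] = 2/3 if y[i] else 1/3
    return P

m = 4 * n                              # augmented space: Omega x [n]
sys_scan, rnd_scan = np.zeros((m, m)), np.zeros((m, m))
for i in range(n):
    Pi = P_var(i)
    for a in range(4):
        for b in range(4):
            # Systematic: resample variable i, then advance to index i+1 mod n.
            sys_scan[a * n + i, b * n + (i + 1) % n] = Pi[a, b]
            # Random: resample variable i, then pick the next index uniformly.
            for j in range(n):
                rnd_scan[a * n + i, b * n + j] = Pi[a, b] / n

aug_pi = np.repeat(pi / n, n)          # pi((x, i)) = pi(x) / n
print(np.allclose(aug_pi @ sys_scan, aug_pi),
      np.allclose(aug_pi @ rnd_scan, aug_pi))  # True True
```

Both augmented chains leave π((*x*, *i*)) = π(*x*)/*n* invariant, which is what lets the two scans be compared on the same state space.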
Additionally, augmenting the state space for random scan allows easier
comparison with systematic scan in some cases. For augmented random scan, the
state space is also Ψ = Ω × [*n*], the
same as for systematic scan. The transition probability is $P((x,i),(y,j))=\frac{1}{n}{P}_{i}(x,y)$, which means that the next variable to sample is selected
uniformly. The stationary distributions of the augmented random scan and
systematic scan chains are both π ((*x, i*)) =
*n*^{−1}π(*x*).
Because the state space and stationary distribution are the same, augmented
random scan and augmented systematic scan can be compared directly, which lets
us prove the following lemma.

*For any random scan Gibbs sampler and systematic scan sampler
with the same stationary distribution* π,
*let* Φ_{RS}
*denote the conductance of the random scan process, let*
Φ_{RS-A}
*denote the conductance of the augmented random scan process, and
let* Φ_{SS-A}
*denote the conductance of the augmented systematic scan process.
Then,*

$$\frac{1}{2n}\cdot \underset{x,i}{\text{min}}\phantom{\rule{thinmathspace}{0ex}}{P}_{i}(x,x)\cdot {\mathrm{\Phi}}_{RS-A}\le {\mathrm{\Phi}}_{SS-A}\le {\mathrm{\Phi}}_{RS}.$$

In Lemma 1, the upper bound states that the conductance of systematic
scan is no larger than the conductance of random scan. We use this in the next
section to show that systematic scan cannot mix too much more quickly than
random scan. To prove this lemma, we show that for any set *S*
under random scan, the set *Ŝ* containing the
corresponding augmented states for systematic scan will have the same
conductance under systematic scan as *S* had under random
scan.

The lower bound in Lemma 1 states that the conductance of systematic
scan is no smaller than a function of the conductance of augmented random scan.
This function depends on the number of variables *n* and
min_{x,i} *P _{i}*(*x*, *x*), the smallest probability that a
single-variable resampling step leaves the state unchanged.

There are well-known bounds on the mixing time of a Markov chain in terms of its conductance, which we state in Theorem 2 [15, pp. 89, 235].

*For any lazy or reversible Markov chain,*

$$\frac{1/2-\epsilon}{{\mathrm{\Phi}}_{\star}}\le {t}_{\mathit{\text{mix}}}(\epsilon )\le \frac{2}{{\mathrm{\Phi}}_{\star}^{2}}\text{log}\left(\frac{1}{\epsilon {\pi}_{\mathit{\text{min}}}}\right).$$

It is straightforward to prove the result of Theorem 1 by combining these bounds with the conductance bounds from the previous section.

We studied the effect of scan order on mixing times of Gibbs samplers, and found that for particular models, the scan order can have an asymptotic effect on the mixing times. These models invalidate conventional wisdom about scan order and show that we cannot freely change scan orders without considering the resulting changes in mixing times. In addition, we found bounds on the mixing times of different scan orders, which replaces a common conjecture about the mixing times of random scan and systematic scan.

The authors acknowledge the support of: DARPA FA8750-12-2-0335; NSF IIS-1247701; NSF CCF-1111943; DOE 108845; NSF CCF-1337375; DARPA FA8750-13-2-0039; NSF IIS-1353606; ONR N000141210041 and N000141310129; NIH U54EB020405; NSF DGE-114747; DARPA’s SIMPLEX program; Oracle; NVIDIA; Huawei; SAP Labs; Sloan Research Fellowship; Moore Foundation; American Family Insurance; Google; and Toshiba. The views and conclusions expressed in this material are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, AFRL, NSF, ONR, NIH, or the U.S. Government.

| max_{α} t_{mix}(S_{α}) vs. t_{mix}(R) | min_{α} t_{mix}(S_{α}) faster than t_{mix}(R) | min_{α} t_{mix}(S_{α}) ≈ t_{mix}(R) | min_{α} t_{mix}(S_{α}) slower than t_{mix}(R) |
|---|---|---|---|
| faster than t_{mix}(R) | Memorize and Repeat | × | × |
| ≈ t_{mix}(R) | Sequence of Dependencies | Known Cases | × |
| slower than t_{mix}(R) | Soft Dependencies | Two Islands | Discrete Pyramid [20] |

Table 2 classifies the models based on the mixing times of the best systematic scan and worst systematic scan relative to random scan. For models in the first column, the best systematic scan mixes asymptotically faster than random scan. For models in the second column, they differ only up to logarithmic factors. For models in the third column, the best systematic scan mixes asymptotically slower than random scan. The rows are classified in the same way based on the worst systematic scan instead of the best systematic scan.

The models in Section 3 display possible problems that can occur when
changing scan orders. However, two behaviors were not shown by the models in
Section 3: a case where random scan is asymptotically worse than all systematic
scans, and a case where random scan is asymptotically better than a systematic
scan and asymptotically worse than another systematic scan. This section gives
two additional models that exhibit these behaviors, along with a more detailed
explanation of the discrete pyramid model. These two models are more complicated
than the models in Section 3 due to the fact that they require the number of
states for each variable to grow with *n*. Combined with the
models in Section 3, these models show that all logically consistent asymptotic
behaviors (that is, behaviors where the worst systematic scan is no better than
the best systematic scan) are possible.

The discrete pyramid model (Figure 1c) was first described by Roberts and Rosenthal [20] and is included for completeness to show that it is possible for random scan to mix asymptotically faster than all systematic scans.

There are *n* binary variables
*x*_{1}, …,
*x _{n}*. Conditioned on all other variables
being false, each variable is equally likely to be true or false.
However, the variables are all contradictory, and at most one variable
can be true. This model can also be interpreted as an

There are *n* + 1 states
*s*_{0}, *s*_{1},
…, *s _{n}* with nonzero probability.

The worst-case total variation occurs when the starting state is
not *s*_{0}. Suppose that the starting state is
*s _{k}*.

Once again, the worst-case total variation occurs when the
starting state is not *s*_{0}. Suppose that the
starting state is *s _{k}*. A systematic scan step
will change nothing until *x _{k}* is sampled.

Figure 3: Memorize and Repeat Model

We introduce the memorize and repeat model to show that it is possible for all systematic scans to be asymptotically better than random scan.

In this example (Figure
3), there are two types of states: states memorizing a
permutation, and states requiring the memorized permutation to be
repeated. In the stationary distribution, almost all of the mass is on
the state where a permutation has been memorized and repeated, and the
probabilities of the states increase exponentially from left to right.
There are *n* variables *x*_{1},
…, *x _{n}*. In the memorize phase,
sampling a variable will allow the state to change as long as the
variable has not been sampled before. In the repeat phase, sampling a
variable will only allow the state to change if it is the next variable
in the memorized permutation. The repeat phase is similar to the
sequence of dependencies example, with the variables rearranged to match
the memorized permutation.

In the worst-case starting distribution, the wrong permutation
can already be memorized, which results in no asymptotic gap between
mixing times. To allow the difference in mixing time to appear even in
the worst-case analysis, we repeat the memorize and repeat process
*n* times.

Random scan encounters the coupon collector problem to get
through the memorize phase, so *O*(*n* log
*n*) steps are needed. Afterwards, the repeat phase
is equivalent to a sequence of dependencies, which requires
*O*(*n*^{2}) steps. Thus, to
get through one memorize and repeat chain,
*O*(*n*^{2}) steps are needed.
This mechanism is repeated *n* times, so
*O*(*n*^{3}) steps are
needed.
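The coupon-collector count for a single memorize phase can be checked numerically; this standalone sketch is ours, with parameter values chosen only for illustration.

```python
import random

def coupon_collector_steps(n, rng):
    """Uniform draws from n variables until each has been seen at least once."""
    seen, steps = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        steps += 1
    return steps

rng = random.Random(0)
n, trials = 50, 2000
avg = sum(coupon_collector_steps(n, rng) for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, n + 1))
print(avg, n * harmonic)  # empirical mean ≈ n * H_n, i.e. O(n log n)
```

The empirical mean closely tracks *n* · *H _{n}* = *O*(*n* log *n*), the step count random scan pays per memorize phase.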

If a systematic scan starts at the beginning of a memorize
sequence, 2*n* steps are sufficient to move through the
memorize and repeat chain (*n* steps to memorize,
*n* steps to repeat). However, when analyzing the
worst-case total variation, we must consider the case where the wrong
permutation has already been memorized, which requires
*O*(*n*^{2}) steps to move
through. However, this can only happen once, so in the worst case
*O*(*n*^{2}) steps are needed
to move through the one incorrect permutation, and
*O*(*n*) steps are needed for the
remaining *n* − 1 memorize and repeat chains.
Thus, *O*(*n*^{2}) steps are
needed in total.

To simplify the explanation of the state space and formulation as a Gibbs sampler, this explanation is for only one memorize and repeat cycle. In the memorize phase, the states are labeled by the part of the permutation that has already been memorized. In the repeat phase, the states are labeled by the remainder of the permutation that still has to be repeated.

There are *n* integer-valued variables
*x*_{1}, …,
*x _{n}* with values ranging from 0 to
*n* + 1.

$${\text{log}}_{M}P(x)=-{\text{log}}_{M}Z+\begin{cases}\text{max}({x}_{1},\dots ,{x}_{n}) & \text{if }({x}_{1},\dots ,{x}_{n})\in \sigma (0,\dots ,0,1,\dots ,i)\\ n+{\sum}_{i=1}^{n}\mathtt{I}[{x}_{i}=n+1] & \text{if }({x}_{1},\dots ,{x}_{n})\in \sigma (i+1,\dots ,n,n+1,\dots ,n+1)\\ -\infty & \text{otherwise}\end{cases}\phantom{\rule{0ex}{0ex}}=-{\text{log}}_{M}Z+\begin{cases}(\text{Number of Memorized Variables}) & \text{if valid memorize state}\\ n+(\text{Number of Repeated Variables}) & \text{if valid repeat state}\\ -\infty & \text{otherwise}\end{cases}$$

First, for states in the memorize phase, if a variable that has
not yet been memorized is sampled, then its value will almost always
change to *i* + 1, where *i* is the number
of variables already memorized, due to the large value of
*M*. However, sampling a variable that has already
been memorized will not change its value because all other values will
either result in an invalid state (and have a probability of 0) or be
significantly less likely due to the large value of *M*.
Next, for states in the repeat phase, if the correct variable is
sampled, then its value will almost always change to *n*
+ 1. However, sampling the wrong variable will not change its value
because all other values result in an invalid state or decrease the
probability of the state.
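As a concrete sketch, the case analysis above can be written as an unnormalized log-probability function for a single memorize-and-repeat cycle (the function name and explicit validity checks are ours; a full sampler would also need the conditional distributions):

```python
import math

def log_prob_unnorm(x, n):
    """log_M P(x) + log_M Z for one memorize-and-repeat cycle.

    Memorize states are permutations of (0,...,0,1,...,i); repeat states
    are permutations of (i+1,...,n,n+1,...,n+1); everything else is -inf.
    """
    vals = sorted(x)
    for i in range(n + 1):
        # Valid memorize state: n - i zeros followed by 1..i.
        if vals == [0] * (n - i) + list(range(1, i + 1)):
            return max(x)                  # number of memorized variables
    for i in range(n + 1):
        # Valid repeat state: values i+1..n remain, i variables repeated.
        if vals == list(range(i + 1, n + 1)) + [n + 1] * i:
            return n + sum(1 for v in x if v == n + 1)
    return -math.inf
```

For n = 3, the all-zeros state has value 0, a state with one memorized variable has value 1, and the fully repeated state (4, 4, 4) has value 2n = 6, so probability increases monotonically along the intended path.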

In the soft dependencies model, some systematic scans mix asymptotically faster than random scan, and some systematic scans mix asymptotically slower.

This example resembles the memorize portion of the previous
example. However, no repeat phase is included, and only some
permutations are accepted. In particular, a permutation is accepted only
if each element of the permutation is followed by an element that is in
the next $\sqrt{n}$ unused variables that come after it (mod
*n*).

More formally, consider a permutation
(*a*_{1}, …,
*a _{n}*) of (1, …,
*n*). The permutation is accepted only if, for every
*i* < *n*,

$$\sqrt{n}\ge |\{j\in \{1,\dots ,n\}\mid j>i+1\ \text{and}\ ({a}_{j}-{a}_{i})\ (\text{mod}\ n)<({a}_{i+1}-{a}_{i})\ (\text{mod}\ n)\}|$$

In this condition, the requirement that
*j* > *i* + 1 means that
*a _{j}* is an unused element, and the
requirement that $({a}_{j}-{a}_{i})\ (\text{mod}\ n)<({a}_{i+1}-{a}_{i})\ (\text{mod}\ n)$ means that
*a _{j}* lies cyclically between
*a _{i}* and *a*_{i+1}.
Equivalently, the condition can be written as

$$\sqrt{n}\ge \begin{cases}{\sum}_{j=i+2}^{n}\mathtt{I}[{a}_{i}<{a}_{j}<{a}_{i+1}] & \text{if }{a}_{i}<{a}_{i+1}\\ {\sum}_{j=i+2}^{n}\mathtt{I}[{a}_{j}>{a}_{i}\ \text{or}\ {a}_{j}<{a}_{i+1}] & \text{if }{a}_{i}>{a}_{i+1}\end{cases}$$
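The acceptance rule (at most √n of the still-unused elements may lie cyclically between each consecutive pair) can be checked mechanically; a minimal sketch (the function name is ours, and we read the comparison as strict):

```python
import math

def is_valid_permutation(a):
    """Accept (a_1,...,a_n) iff, for every consecutive pair, at most
    sqrt(n) unused elements lie cyclically between a_i and a_{i+1}."""
    n = len(a)
    limit = math.sqrt(n)
    for i in range(n - 1):
        gap = (a[i + 1] - a[i]) % n
        # Unused elements are those appearing after position i + 1.
        unused_between = sum(1 for j in range(i + 2, n)
                             if (a[j] - a[i]) % n < gap)
        if unused_between > limit:
            return False
    return True
```

The identity permutation is accepted (each consecutive gap is 1, so no unused element ever lies between a pair), while the reversed permutation is rejected at its very first pair.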

First, consider when there are at least $\sqrt{n}$ remaining transitions. The first $n-\sqrt{n}=O(n)$ transitions have this property. During this
period, each step of random scan advances to the next state with probability $\sqrt{n}/n=1/\sqrt{n}$, so each transition takes $O(\sqrt{n})$ steps in expectation, and $O(n\sqrt{n})$ steps are needed until there are fewer than $\sqrt{n}$ remaining transitions. Once there are fewer than $\sqrt{n}$ remaining transitions, random scan needs
*O*(*n*) steps to make each
transition, so $O(n\sqrt{n})$ more steps are needed. In total, $O(n\sqrt{n})$ steps are needed.

The best systematic scan uses the order
*x*_{1}, …,
*x _{n}*. This uses only
*O*(*n*) steps, since a single sweep
memorizes every variable in order.

The worst systematic scan uses the order
*x _{n}*, …,
*x*_{1}, which requires
*O*(*n*^{2}) steps.

The state space has a similar form to the memorize portion of the previous example. The states are labeled by the portion of the permutation that has already been given, but only prefixes to valid permutations are accepted.

There are *n* integer-valued variables
*x*_{1}, …,
*x _{n}* with values ranging from 0 to
*n*.

In addition, to merge the permutations together into one state,
all variables need to be sampled one more time (note that this does not
require the variables to be sampled in a particular order). This extra
sampling process only adds lower order terms and constant factors to the
mixing time. Once each variable is sampled again, its value is changed
to *n*.

The log probability distribution is the following, where
*M* is a very large constant, *Z* is
the normalizing constant, and σ(*a*_{1},
…, *a _{n}*) denotes the set of all
permutations of (*a*_{1}, …,
*a _{n}*).

$${\text{log}}_{M}P(x)=-{\text{log}}_{M}Z+\begin{cases}\text{max}({x}_{1},\dots ,{x}_{n}) & \text{if }({x}_{1},\dots ,{x}_{n})\in \sigma (0,\dots ,0,1,\dots ,i)\text{ and is valid}\\ n-1+{\sum}_{i=1}^{n}\mathtt{I}[{x}_{i}=n] & \text{if at least two of }({x}_{1},\dots ,{x}_{n})\text{ have a value of }n\\ -\infty & \text{otherwise}\end{cases}\phantom{\rule{0ex}{0ex}}=-{\text{log}}_{M}Z+\begin{cases}(\text{Number of Memorized Variables}) & \text{if valid memorize state}\\ n-1+(\text{Number of Merged Variables}) & \text{if valid merge state}\\ -\infty & \text{otherwise}\end{cases}$$

First, in the memorize phase, sampling a valid unused variable
will almost always change its value to *i* + 1, where
*i* is the number of variables already memorized, due
to the large value of *M*. However, sampling a variable
that has already been memorized or sampling a variable that leads to an
invalid permutation will not change its value. Then, when the last
variable in the permutation is sampled, its value changes to
*n*. Afterwards, sampling any variable will also
change its value to *n*.

To simplify the analysis of the sequence of dependencies, we assumed
that the priors on the variables were strong enough that sampling the correct
variable essentially always caused the state to advance. We now analyze how
large *M* (the strength of the priors) needs to be in order to
allow the best systematic scan to mix in *O*(*n*)
time. In this section, we show that if *M* =
Ω(*n*), then the best systematic scan mixes in
*O*(*n*) time, and if *M* =
*o*(*n*), then the best systematic scan cannot
mix in *O*(*n*) time.

First, suppose that *M* = Ω(*n*).
This means that *M* ≥ *cn* for some
*c* > 0 and sufficiently large *n*.
The probability of transitioning from
*s*_{i−1} to
*s _{i}*, when variable
*x _{i}* is sampled, is at least
$$\frac{M}{1+M}\ge \frac{cn}{1+cn}=1-\frac{1}{1+cn}>1-\frac{1}{cn}.$$

The probability of transitioning from
*s*_{0} to *s _{n}* after
sampling the sequence
*x*_{1}, …, *x _{n}* in
order is therefore at least (1 −
1/(*cn*))^{*n*}, which converges to
*e*^{−1/*c*} > 0. Thus, a constant
number of sweeps of the best systematic scan suffices, and it mixes in
*O*(*n*) time.
On the other hand, suppose that *M* =
*o*(*n*). Then, for any *c*
> 0, *M* < *cn* for sufficiently
large *n*. This means that as *n* increases, the
probability that a single sweep of the best systematic scan will reach
*s _{n}* will go to 0, so no constant number of
sweeps will be sufficient to mix.
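Both regimes are easy to check numerically, since each correct sample succeeds independently with probability M/(1 + M); a small sketch (the constants c = 2 and M = √n are illustrative choices of ours):

```python
import math

def sweep_success_prob(M, n):
    """Probability that one sweep advances through all n dependencies when
    each correct sample succeeds independently with probability M/(1+M)."""
    return (M / (1.0 + M)) ** n

n = 10**6

# M = Omega(n): success probability per sweep stays bounded away from 0.
c = 2.0
p_big = sweep_success_prob(c * n, n)     # roughly e^{-1/c}

# M = o(n), e.g. M = sqrt(n): success probability per sweep vanishes.
p_small = sweep_success_prob(math.sqrt(n), n)
```

With M = cn the per-sweep success probability approaches e^{−1/c}, a constant, while with M = √n it decays like e^{−√n}.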

In the analysis of the two islands model, we assumed that the bridge has negligible mass. We now analyze the mixing times without the assumption that the bridge has small mass. The same asymptotic behavior still results.

Even when the bridge has the same mass as the other states, it still
acts as a bottleneck to the model. For sufficiently large *n*,
the islands will mix rapidly in comparison to reaching the bridge, so the mixing
time is still inversely proportional to the bridge efficiency. However, when the
bridge has the same mass as the other states, sampling a variable while on the
bridge will only have a 1/2 chance of moving off of the bridge.

In random scan, the variables that are sampled are completely independent, so the bridge efficiency is still 1/2.

Consider the scan *x*_{1},
*y*_{1}, *x*_{2},
*y*_{2}, …. Suppose that sampling
*x*_{1} changes the state to the bridge state
– a similar analysis will apply for any other variable. The next
variable is *y*_{1}, which will change the state to
the other island with probability 1/2. Afterwards,
*x*_{2} is sampled, which will change the state
to the same island with probability 1/4. Then,
*y*_{2} is sampled, which will change the state
to the other island with probability 1/8. Thus, the probability of moving to
the other island is 1/2 + 1/8 + 1/32 + … = 2/3.
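The geometric sums above can be verified directly; a quick numeric check:

```python
# With the bridge carrying full mass, successive sampled variables move the
# state with halving probabilities, alternating destination islands:
#   other island: 1/2 + 1/8 + 1/32 + ...   (geometric, ratio 1/4)
#   same island:  1/4 + 1/16 + 1/64 + ...  (geometric, ratio 1/4)
p_other = sum(0.5 * 0.25 ** k for k in range(60))
p_same = sum(0.25 * 0.25 ** k for k in range(60))
```

The two series sum to 2/3 and 1/3 respectively, and together account for all the probability of eventually leaving the bridge.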

Consider the scan *x*_{1}, …,
*x _{n}*,
*y*_{1}, …, *y _{n}*.

In this section, we prove our relative bounds on mixing times (Theorem 1), along with related claims and lemmas.

*The stationary distribution of augmented random and
systematic scan is*

$$\pi ((x,i))=\frac{1}{n}\pi (x)$$

We prove this claim by showing that applying the transition matrix for augmented random scan or augmented systematic scan does not change this distribution.

For augmented random scan,

$$\sum _{x,i}\pi ((x,i))P((x,i),(y,j))=\sum _{x,i}\frac{\pi (x)}{n}\cdot P((x,i),(y,j))=\frac{1}{n}\sum _{x,i}\pi (x)\cdot \frac{1}{n}{P}_{i}(x,y)\phantom{\rule{0ex}{0ex}}=\frac{1}{n}\sum _{i=1}^{n}[\sum _{x\in \mathrm{\Omega}}(\pi (x){P}_{i}(x,y))\cdot \frac{1}{n}]\phantom{\rule{0ex}{0ex}}=\frac{1}{n}\sum _{i=1}^{n}[\pi (y)\cdot \frac{1}{n}]=\frac{1}{n}\pi (y)\phantom{\rule{0ex}{0ex}}=\pi ((y,j))$$

For augmented systematic scan,

$$\sum _{x,i}\pi ((x,i))P((x,i),(y,j))=\sum _{x,i}\frac{\pi (x)}{n}\cdot P((x,i),(y,j))=\frac{1}{n}\sum _{x,i}\pi (x)\cdot {P}_{i}(x,y)s(i,j)\phantom{\rule{0ex}{0ex}}=\frac{1}{n}\sum _{i=1}^{n}[\sum _{x\in \mathrm{\Omega}}(\pi (x){P}_{i}(x,y))s(i,j)]\phantom{\rule{0ex}{0ex}}=\frac{1}{n}\sum _{i=1}^{n}[\pi (y)s(i,j)]=\frac{1}{n}\pi (y)\phantom{\rule{0ex}{0ex}}=\pi ((y,j))$$
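This claim can be sanity-checked numerically on a toy model; the sketch below builds a two-variable Gibbs sampler (the weights and helper names are ours) and verifies that π((x, i)) = π(x)/n is preserved by the augmented random scan kernel P((x, i), (y, j)) = (1/n)P_i(x, y):

```python
# Toy model: two binary variables with joint pi(x) proportional to w[x].
w = {(0, 0): 4.0, (0, 1): 2.0, (1, 0): 1.0, (1, 1): 3.0}
Z = sum(w.values())
pi = {x: v / Z for x, v in w.items()}
n = 2
states = list(w)

def P_var(i, x, y):
    """Gibbs kernel P_i: resample variable i from its conditional."""
    if any(x[k] != y[k] for k in range(n) if k != i):
        return 0.0
    denom = sum(w[x[:i] + (b,) + x[i + 1:]] for b in (0, 1))
    return w[y] / denom

# Augmented state space Omega x [n], with pi((x, i)) = pi(x) / n.
aug = [(x, i) for x in states for i in range(n)]
pi_aug = {s: pi[s[0]] / n for s in aug}

# Stationarity: inflow into every (y, j) equals pi((y, j)).
max_err = max(
    abs(sum(pi_aug[(x, i)] * (1.0 / n) * P_var(i, x, y)
            for (x, i) in aug) - pi_aug[(y, j)])
    for (y, j) in aug)
```

The maximum stationarity error is at floating-point precision, matching the calculation in the proof.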

*For any random scan Gibbs sampler and systematic scan
sampler with the same stationary distribution*
π*, let* Φ_{RS}
*denote the conductance of the random scan process, let*
Φ_{RS-A}
*denote the conductance of the augmented random scan process, and
let* Φ_{SS-A}
*denote the conductance of the augmented systematic scan process.
Then,*

$$\frac{1}{2n}\cdot \underset{x,i}{\text{min}}\phantom{\rule{thinmathspace}{0ex}}{P}_{i}(x,x)\cdot {\mathrm{\Phi}}_{RS-A}\le {\mathrm{\Phi}}_{SS-A}\le {\mathrm{\Phi}}_{RS}.$$

The conductance of the whole chain is the smallest
conductance of any set with mass no larger than $\frac{1}{2}$. Then, to prove that this inequality holds, we
will show that for any set *S* ⊆ Ω
with mass no larger than $\frac{1}{2}$, there exists a set *T*
⊆ Ψ with mass no larger than $\frac{1}{2}$ such that the conductance of *S*
under random scan is the same as the conductance of
*T* under augmented systematic scan.

From the standpoint of random scan, consider a set
*S* ⊆ Ω with mass no larger than $\frac{1}{2}$. The conductance is

$${\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}}(S)=\frac{{\sum}_{x\in S}{\sum}_{y\in {S}^{\mathrm{c}}}\pi (x)P(x,y)}{\pi (S)}=\frac{\frac{1}{n}{\sum}_{i=1}^{n}{\sum}_{x\in S}{\sum}_{y\in {S}^{\mathrm{c}}}\pi (x){P}_{i}(x,y)}{\pi (S)}$$

Then, for augmented systematic scan, consider the set
*T* = {(*x, i*) :
*x* ∈ *S, i* ∈
[*n*]}. First, notice that $\pi (T)=\pi (S)\le \frac{1}{2}$. The conductance is

$${\mathrm{\Phi}}_{\mathrm{S}\mathrm{S}-\mathrm{A}}(T)=\frac{{\sum}_{(x,i)\in T}{\sum}_{(y,j)\in {T}^{\mathrm{c}}}\pi ((x,i))P((x,i),(y,j))}{\pi (T)}\phantom{\rule{0ex}{0ex}}=\frac{{\sum}_{i=1}^{n}{\sum}_{j=1}^{n}{\sum}_{x\in S}{\sum}_{y\in {S}^{\mathrm{c}}}\pi ((x,i))P((x,i),(y,j))}{\pi (S)}\phantom{\rule{0ex}{0ex}}=\frac{1}{n}\frac{{\sum}_{i=1}^{n}{\sum}_{j=1}^{n}{\sum}_{x\in S}{\sum}_{y\in {S}^{\mathrm{c}}}\pi (x){P}_{i}(x,y)s(i,j)}{\pi (S)}\phantom{\rule{0ex}{0ex}}=\frac{1}{n}\frac{{\sum}_{i=1}^{n}{\sum}_{x\in S}{\sum}_{y\in {S}^{\mathrm{c}}}\pi (x){P}_{i}(x,y)}{\pi (S)}\phantom{\rule{0ex}{0ex}}={\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}}(S)$$

This implies that for any *S* ⊆
Ω with $\pi (S)\le \frac{1}{2}$, there exists *T* ⊆
Ψ with $\pi (T)\le \frac{1}{2}$ such that
Φ_{RS}(*S*) =
Φ_{SS-A}(*T*). Therefore,
Φ_{SS-A} ≤ Φ_{RS}.

In this proof, we will work with the *flow*
between two sets *A* and *B*,
$$Q(A,B)=\sum _{x\in A,y\in B}\pi (x)P(x,y).$$

Notice that the conductance of a set can then be defined as

$$\mathrm{\Phi}(S)=\frac{Q(S,{S}^{c})}{\pi (S)}.$$
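Both quantities are easy to compute for a small chain; the sketch below uses a Metropolis chain (our choice, purely for illustration) so that the stationary distribution is known in advance:

```python
def metropolis(pi):
    """Metropolis chain with uniform proposals; reversible wrt pi."""
    m = len(pi)
    P = [[0.0] * m for _ in range(m)]
    for x in range(m):
        for y in range(m):
            if x != y:
                P[x][y] = (1.0 / m) * min(1.0, pi[y] / pi[x])
        P[x][x] = 1.0 - sum(P[x])
    return P

def flow(pi, P, A, B):
    """Q(A, B) = sum over x in A, y in B of pi(x) P(x, y)."""
    return sum(pi[x] * P[x][y] for x in A for y in B)

def conductance(pi, P, S):
    """Phi(S) = Q(S, S^c) / pi(S)."""
    Sc = [x for x in range(len(pi)) if x not in S]
    return flow(pi, P, S, Sc) / sum(pi[x] for x in S)

pi = [0.5, 0.3, 0.2]
P = metropolis(pi)
phi = conductance(pi, P, [2])   # flow out of the lightest state
```

Reversibility means Q({x}, {y}) = Q({y}, {x}) for singletons, which the chain above satisfies by construction.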

To prove that this inequality holds for the whole chain, we
will show that the same inequality holds for any set
*S* ⊆ Ψ. Consider some arbitrary
state *x* ∈ Ω, and let
*S _{x}* = {(*x, i*) ∈
*S*} and ${S}_{x}^{c}$ = {(*x, j*) ∈
*S ^{c}*} denote the corresponding augmented
states in *S* and *S ^{c}*. Flow can leave from
the corresponding augmented states in two ways: flowing from some
(*x, i*) ∈ *S* to
(*x, j*) ∈ *S ^{c}*,
or flowing from (*x, i*) ∈ *S* to
(*y, j*) ∈ *S ^{c}* for some
*y* ≠ *x*.

Now, we will find upper bounds for the random scan flows
and lower bounds for the systematic scan flows. In the following
statements, it is implicit that *y* ≠
*x*, and γ = min_{x,
i}
*P _{i}*(*x*, *x*).

First, we bound the amount of flow from (*x,
i*) ∈ *S* to (*x,
j*) ∈ *S ^{c}*. For
augmented random scan, the following upper bound holds.

$${Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{x}^{c})=\sum _{(x,i)\in {S}_{x},(x,j)\in {S}_{x}^{c}}\pi ((x,i))P((x,i),(x,j))\phantom{\rule{0ex}{0ex}}=\sum _{(x,i)\in {S}_{x},(x,j)\in {S}_{x}^{c}}\frac{1}{n}\pi (x)\cdot \frac{1}{n}{P}_{i}(x,x)\phantom{\rule{0ex}{0ex}}\le \sum _{(x,i)\in {S}_{x},(x,j)\in {S}_{x}^{c}}\frac{1}{n}\pi (x)\cdot \frac{1}{n}\phantom{\rule{0ex}{0ex}}=\frac{|{S}_{x}|}{n}\pi (x)\frac{n-|{S}_{x}|}{n}\phantom{\rule{0ex}{0ex}}\le \{\begin{array}{cc}\frac{1}{4}\pi (x)\hfill & \text{if }|{S}_{x}|\ne 0,n\hfill \\ 0\hfill & \text{if }|{S}_{x}|=0,n\hfill \end{array}$$

For augmented systematic scan, the following lower bound holds.

$${Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{x}^{c})=\sum _{(x,i)\in {S}_{x},(x,j)\in {S}_{x}^{c}}\pi ((x,i))P((x,i),(x,j))\phantom{\rule{0ex}{0ex}}=\sum _{(x,i)\in {S}_{x},(x,j)\in {S}_{x}^{c}}\frac{1}{n}\pi (x)\cdot {P}_{i}(x,x)s(i,j)\phantom{\rule{0ex}{0ex}}\ge \sum _{(x,i)\in {S}_{x},(x,j)\in {S}_{x}^{c}}\frac{1}{n}\pi (x)\cdot \gamma \cdot s(i,j)\phantom{\rule{0ex}{0ex}}\ge \{\begin{array}{cc}\frac{1}{n}\pi (x)\gamma \hfill & \text{if }|{S}_{x}|\ne 0,n\hfill \\ 0\hfill & \text{if }|{S}_{x}|=0,n\hfill \end{array}$$

Similarly,

$${Q}_{\mathrm{S}\mathrm{S}}({S}_{y},{S}_{y}^{c})\ge \{\begin{array}{cc}\frac{1}{n}\pi (y)\gamma \hfill & \text{if}|{S}_{y}|\ne 0,n\hfill \\ 0\hfill & \text{if}|{S}_{y}|=0,n\hfill \end{array}$$

Now, we bound the amount of flow from *x* to
*y* for *y* ≠
*x*. Note that
*P _{i}*(*x*, *y*) is nonzero
only when *x* and *y* differ in at most
variable *i*; let *i* denote the variable in
which they differ. For augmented random scan, the following upper bound
holds.

$${Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{y}^{c})=\{\begin{array}{cc}\frac{1}{n}\pi (x){P}_{i}(x,y)\frac{n-|{S}_{y}|}{n}\hfill & \text{if}(x,i)\in S\hfill \\ 0\hfill & \text{if}(x,i)\notin S\hfill \end{array}\phantom{\rule{0ex}{0ex}}\le \{\begin{array}{cc}\frac{1}{n}\pi (x){P}_{i}(x,y)\hfill & \text{if}(x,i)\in S\text{and}|{S}_{y}^{c}|\ne 0\hfill \\ 0\hfill & \text{if}(x,i)\notin S\text{or}|{S}_{y}^{c}|=0\hfill \end{array}$$

In the derivation of the next bound, note that if $|{S}_{y}^{c}|=n$, then we are guaranteed that $(y,i+1(\text{mod}n))\in {S}_{y}^{c}$. For augmented systematic scan, the following lower bound holds.

$${Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{y}^{c})=\{\begin{array}{cc}\frac{1}{n}\pi (x){P}_{i}(x,y)\hfill & \text{if}(x,i)\in S\text{and}(y,i+1)\in {S}^{c}\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}\phantom{\rule{0ex}{0ex}}\ge \{\begin{array}{cc}\frac{1}{n}\pi (x){P}_{i}(x,y)\hfill & \text{if}(x,i)\in S\text{and}|{S}_{y}^{c}|=n\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}$$

Now, we can derive several inequalities between the
augmented random scan flow and the augmented systematic scan flow as
direct consequences of the bounds we just found. First, we bound the
relative flows from *S _{x}* to ${S}_{x}^{c}$.

$${Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{x}^{c})\ge \frac{4\gamma}{n}{Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{x}^{c})\ge \frac{\gamma}{n}{Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{x}^{c})$$

Next, we bound the relative flows from *x*
to *y*, where *x* and
*y* differ in exactly one variable,
*i*.

$${Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{y}^{c})+\frac{1}{n}{P}_{i}(y,x){Q}_{\mathrm{S}\mathrm{S}}({S}_{y},{S}_{y}^{c})\phantom{\rule{0ex}{0ex}}\ge \{\begin{array}{cc}\frac{1}{n}\pi (x){P}_{i}(x,y)\hfill & \text{if }(x,i)\in S\text{ and }|{S}_{y}^{c}|=n\hfill \\ \frac{1}{n}{P}_{i}(y,x)\cdot \frac{1}{n}\pi (y)\gamma \hfill & \text{if }(x,i)\in S\text{ and }|{S}_{y}^{c}|\ne 0,n\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}\phantom{\rule{0ex}{0ex}}=\{\begin{array}{cc}\frac{1}{n}\pi (x){P}_{i}(x,y)\hfill & \text{if }(x,i)\in S\text{ and }|{S}_{y}^{c}|=n\hfill \\ \frac{1}{{n}^{2}}\pi (x){P}_{i}(x,y)\gamma \hfill & \text{if }(x,i)\in S\text{ and }|{S}_{y}^{c}|\ne 0,n\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}\phantom{\rule{0ex}{0ex}}\ge \{\begin{array}{cc}\frac{1}{{n}^{2}}\pi (x){P}_{i}(x,y)\gamma \hfill & \text{if }(x,i)\in S\text{ and }|{S}_{y}^{c}|\ne 0\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}\phantom{\rule{0ex}{0ex}}=\frac{\gamma}{n}\{\begin{array}{cc}\frac{1}{n}\pi (x){P}_{i}(x,y)\hfill & \text{if }(x,i)\in S\text{ and }|{S}_{y}^{c}|\ne 0\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}\phantom{\rule{0ex}{0ex}}\ge \frac{\gamma}{n}{Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{y}^{c})$$

Finally, we bound the relative flows.

$${Q}_{\mathrm{S}\mathrm{S}}(S,{S}^{c})=\sum _{x\in \mathrm{\Omega}}\sum _{y\in \mathrm{\Omega}}{Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{y}^{c})\phantom{\rule{0ex}{0ex}}=\sum _{x\in \mathrm{\Omega}}({Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{x}^{c})+\sum _{y\ne x}{Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{y}^{c}))\phantom{\rule{0ex}{0ex}}\ge \frac{1}{2}\sum _{x\in \mathrm{\Omega}}(2{Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{x}^{c})+\sum _{y\ne x}{Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{y}^{c}))\phantom{\rule{0ex}{0ex}}=\frac{1}{2}\sum _{x\in \mathrm{\Omega}}({Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{x}^{c})+\sum _{y\in \mathrm{\Omega}}\sum _{i=1}^{n}\frac{1}{n}{P}_{i}(y,x){Q}_{\mathrm{S}\mathrm{S}}({S}_{y},{S}_{y}^{c})+\sum _{y\ne x}{Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{y}^{c}))\phantom{\rule{0ex}{0ex}}\ge \frac{1}{2}\sum _{x\in \mathrm{\Omega}}({Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{x}^{c})+\sum _{y\ne x}({Q}_{\mathrm{S}\mathrm{S}}({S}_{x},{S}_{y}^{c})+\sum _{i=1}^{n}\frac{1}{n}{P}_{i}(y,x){Q}_{\mathrm{S}\mathrm{S}}({S}_{y},{S}_{y}^{c})\left)\right)\phantom{\rule{0ex}{0ex}}\ge \frac{1}{2}\sum _{x\in \mathrm{\Omega}}(\frac{\gamma}{n}{Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{x}^{c})+\frac{\gamma}{n}\sum _{y\ne x}{Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{y}^{c}))\phantom{\rule{0ex}{0ex}}=\frac{\gamma}{2n}\sum _{x\in \mathrm{\Omega}}({Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{x}^{c})+\sum _{y\ne x}{Q}_{\mathrm{R}\mathrm{S}}({S}_{x},{S}_{y}^{c}))\phantom{\rule{0ex}{0ex}}=\frac{\gamma}{2n}{Q}_{\mathrm{R}\mathrm{S}}(S,{S}^{c})$$

The mass of *S* is the same for augmented
random scan and augmented systematic scan, so the same inequality
holds for the conductances of the sets. Finally, because this
inequality holds for any set *S*, the inequality also
holds for the conductance of the whole chain.

*For any lazy or reversible Markov chain,*

$$\frac{1/2-\epsilon}{{\mathrm{\Phi}}_{\star}}\le {t}_{\mathit{\text{mix}}}(\epsilon )\le \frac{2}{{\mathrm{\Phi}}_{\star}^{2}}\text{log}\left(\frac{1}{\epsilon {\pi}_{\mathit{\text{min}}}}\right).$$

The lower bound of this inequality,

$$\frac{1/2-\epsilon}{{\mathrm{\Phi}}_{\star}}\le {t}_{\text{mix}}(\epsilon )$$

is shown by Theorem 7.3 of [15] and holds for any Markov chain – that is, it does not actually require the Markov chain to be lazy or reversible.

The upper bound of this inequality,

$${t}_{\text{mix}}(\epsilon )\le \frac{2}{{\mathrm{\Phi}}_{\star}^{2}}\text{log}\left(\frac{1}{\epsilon {\pi}_{\text{min}}}\right),$$

is shown by Theorem 17.10 of [15].
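Both directions of this conductance-mixing relationship can be checked on a small lazy chain; a minimal sketch (the two-state chain and tolerance ε = 1/4 are our toy choices):

```python
import math

def tv(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def step(P, dist):
    m = len(P)
    return [sum(dist[x] * P[x][y] for x in range(m)) for y in range(m)]

# Lazy two-state chain: each step stays put with probability 1/2.
P = [[0.75, 0.25],
     [0.25, 0.75]]
pi = [0.5, 0.5]

# Conductance: by symmetry, S = {0} attains the minimum over sets of
# mass at most 1/2, with Phi(S) = pi(0) P(0,1) / pi(0) = P(0,1).
phi_star = P[0][1]

eps = 0.25
dist, t_mix = [1.0, 0.0], 0     # by symmetry, either state is worst-case
while tv(dist, pi) > eps:
    dist = step(P, dist)
    t_mix += 1

lower = (0.5 - eps) / phi_star
upper = (2.0 / phi_star ** 2) * math.log(1.0 / (eps * min(pi)))
```

For this chain the measured mixing time lands between the conductance lower and upper bounds, as the lemma requires.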

*For any random scan Gibbs sampler R and lazy systematic
scan sampler S with the same stationary distribution*
π*, their relative mixing times are bounded as
follows*.

$${(1/2-\epsilon )}^{2}{t}_{\mathit{\text{mix}}}(R,\epsilon )\le 2{t}_{\mathit{\text{mix}}}^{2}(S,\epsilon )\text{log}\left(\frac{1}{\epsilon {\pi}_{\mathit{\text{min}}}}\right)$$

$${(1/2-\epsilon )}^{2}{t}_{\mathit{\text{mix}}}(S,\epsilon )\le \frac{8{n}^{2}}{{({\text{min}}_{x,i}{P}_{i}(x,x))}^{2}}{t}_{\mathit{\text{mix}}}^{2}(R,\epsilon )\text{log}\left(\frac{1}{\epsilon {\pi}_{\mathit{\text{min}}}}\right),$$

*where P _{i} is the transition matrix corresponding to
resampling just variable i, and*
π_{min} = min_{x}
π(*x*) *is the probability of the least likely
state under* π.

First, we upper bound the mixing time of random scan.

$${t}_{\text{mix}}(R,\epsilon )\le \frac{2}{{\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}}^{2}}\text{log}\left(\frac{1}{\epsilon {\pi}_{\text{min}}}\right)$$

Next, we lower bound the mixing time for systematic scan.

$${t}_{\text{mix}}(S,\epsilon )\ge \frac{1/2-\epsilon}{{\mathrm{\Phi}}_{\mathrm{S}\mathrm{S}-\mathrm{A}}}\ge \frac{1/2-\epsilon}{{\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}}}$$

This theorem results from algebraic manipulation of the previous two inequalities.

$${t}_{\text{mix}}^{2}(S,\epsilon )\ge \frac{{(1/2-\epsilon )}^{2}}{{\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}}^{2}}$$

$${(1/2-\epsilon )}^{2}{t}_{\text{mix}}(R,\epsilon )\le 2{t}_{\text{mix}}^{2}(S,\epsilon )\text{log}\left(\frac{1}{\epsilon {\pi}_{\text{min}}}\right)$$

First, we lower bound the mixing time for random scan.

$${t}_{\text{mix}}(R,\epsilon )\ge \frac{1/2-\epsilon}{{\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}-\mathrm{A}}}$$

$${t}_{\text{mix}}{(R,\epsilon )}^{2}\ge \frac{{(1/2-\epsilon )}^{2}}{{\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}-\mathrm{A}}^{2}}$$

Next, we manipulate the lower bound of Lemma 1.

$${\mathrm{\Phi}}_{\mathrm{S}\mathrm{S}-\mathrm{A}}\ge \frac{1}{2n}\cdot \underset{x,i}{\text{min}}\phantom{\rule{thinmathspace}{0ex}}{P}_{i}(x,x)\cdot {\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}-\mathrm{A}}$$

$$\frac{1}{{\mathrm{\Phi}}_{\mathrm{S}\mathrm{S}-\mathrm{A}}^{2}}\le \frac{4{n}^{2}}{{({\text{min}}_{x,i}\phantom{\rule{thinmathspace}{0ex}}{P}_{i}(x,x))}^{2}}\cdot \frac{1}{{\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}-\mathrm{A}}^{2}}$$

Using this result, we upper bound the mixing time for systematic scan.

$${t}_{\text{mix}}(S,\epsilon )\le \frac{2}{{\mathrm{\Phi}}_{\mathrm{S}\mathrm{S}-\mathrm{A}}^{2}}\text{log}\left(\frac{1}{\epsilon {\pi}_{\text{min}}}\right)\le \frac{8{n}^{2}}{{({\text{min}}_{x,i}\phantom{\rule{thinmathspace}{0ex}}{P}_{i}(x,x))}^{2}}\cdot \frac{1}{{\mathrm{\Phi}}_{\mathrm{R}\mathrm{S}-\mathrm{A}}^{2}}\text{log}\left(\frac{1}{\epsilon {\pi}_{\text{min}}}\right)$$

This theorem results from the previous inequalities.

$${(1/2-\epsilon )}^{2}{t}_{\text{mix}}(S,\epsilon )\le \frac{8{n}^{2}}{{({\text{min}}_{x,i}\phantom{\rule{thinmathspace}{0ex}}{P}_{i}(x,x))}^{2}}{t}_{\text{mix}}{(R,\epsilon )}^{2}\text{log}\left(\frac{1}{\epsilon {\pi}_{\text{min}}}\right)$$

1. Benjamini Itai, Berger Noam, Hoffman Christopher, Mossel Elchanan. Mixing times of the biased card shuffling and the asymmetric exclusion process. Transactions of the American Mathematical Society. 2005;357(8):3013–3029.

2. De Sa Christopher, Zhang Ce, Olukotun Kunle, Ré Christopher. Rapidly mixing Gibbs sampling for a class of factor graphs using hierarchy width. Advances in Neural Information Processing Systems. 2015.

3. Diaconis Persi. Some things we've learned (about Markov chain Monte Carlo). Bernoulli. 2013;19(4):1294–1305.

4. Diaconis Persi, Ram Arun. Analysis of systematic scan Metropolis algorithms using Iwahori-Hecke algebra techniques. The Michigan Mathematical Journal. 2000;48(1):157–190.

5. Dyer Martin, Goldberg Leslie Ann, Jerrum Mark. Systematic scan for sampling colorings. The Annals of Applied Probability. 2006;16(1):185–230.

6. Dyer Martin, Goldberg Leslie Ann, Jerrum Mark. Dobrushin conditions and systematic scan. Combinatorics, Probability and Computing. 2008;17(06):761–779.

7. Finkel Jenny Rose, Grenager Trond, Manning Christopher. Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. 2005.

8. Gelfand Alan E, Smith Adrian FM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85(410):398–409.

9. Geman Stuart, Geman Donald. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1984;(6):721–741.

10. Gotovos Alkis, Hassani Hamed, Krause Andreas. Sampling from probabilistic submodular models. Advances in Neural Information Processing Systems. 2015.

11. Griffiths Thomas L, Steyvers Mark. Finding scientific topics. Proceedings of the National Academy of Sciences. 2004;101(suppl 1):5228–5235.

12. Gürbüzbalaban Mert, Ozdaglar Asuman, Parrilo Pablo. Convergence rate of incremental gradient and Newton methods. arXiv preprint arXiv:1510.08562. 2015.

13. Hayes Thomas P. A simple condition implying rapid mixing of single-site dynamics on spin systems. 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06). 2006.

14. Hellerstein Joseph M, Ré Christopher, Schoppmann Florian, Wang Daisy Zhe, Fratkin Eugene, Gorajek Aleksander, Ng Kee Siong, Welton Caleb, Feng Xixuan, Li Kun, et al. The MADlib analytics library: or MAD skills, the SQL. Proceedings of the VLDB Endowment. 2012;5(12):1700–1711.

15. Levin David Asher, Peres Yuval, Wilmer Elizabeth Lee. Markov Chains and Mixing Times. American Mathematical Society; 2009.

16. Lunn David, Spiegelhalter David, Thomas Andrew, Best Nicky. The BUGS project: Evolution, critique and future directions. Statistics in Medicine. 2009;28(25):3049–3067.

17. McCallum Andrew, Schultz Karl, Singh Sameer. FACTORIE: Probabilistic programming via imperatively defined factor graphs. Advances in Neural Information Processing Systems. 2009.

18. Plummer Martyn. JAGS: A program for analysis of Bayesian graphical models using Gibbs sampling. Proceedings of the 3rd International Workshop on Distributed Statistical Computing. 2003.

19. Recht Benjamin, Ré Christopher. Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences. Proceedings of the 25th Annual Conference on Learning Theory. 2012.

20. Roberts Gareth O, Rosenthal Jeffrey S. Surprising convergence properties of some simple Gibbs samplers under various scans. International Journal of Statistics and Probability. 2015;5(1):51–60.

21. Smola Alexander, Narayanamurthy Shravan. An architecture for parallel topic models. Proceedings of the VLDB Endowment. 2010;3(1):703–710.

22. Zhang Ce, Ré Christopher. Towards high-throughput Gibbs sampling at scale: A study across storage managers. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 2013.

23. Zhang Yongyue, Brady Michael, Smith Stephen. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging. 2001;20(1):45–57.

24. Zhu Song Chun, Wu Yingnian, Mumford David. Filters, random fields and maximum entropy (FRAME): Towards a unified theory for texture modeling. International Journal of Computer Vision. 1998;27(2):107–126.