


J Math Sociol. Author manuscript; available in PMC 2016 July 9.

Published in final edited form as:

J Math Sociol. 2015; 39(3): 174–202.

Published online 2015 July 9. doi: 10.1080/0022250X.2015.1022279. PMCID: PMC4645990

NIHMSID: NIHMS682292

Carter T. Butts


Stochastic models for finite binary vectors are widely used in sociology, with examples ranging from social influence models on dichotomous behaviors or attitudes to models for random graphs. Exact sampling for such models is difficult in the presence of dependence, leading to the use of Markov chain Monte Carlo (MCMC) as an approximation technique. While often effective, MCMC methods have variable execution time, and the quality of the resulting draws can be difficult to assess. Here, we present a novel alternative method for approximate sampling from binary discrete exponential families having fixed execution time and well-defined quality guarantees. We demonstrate the use of this sampling procedure in the context of random graph generation, with an application to the simulation of a large-scale social network using both geographical covariates and dyadic dependence mechanisms.

Stochastic models for finite binary vectors are widely used in sociology, with examples ranging from social influence models on dichotomous behaviors or attitudes (Coleman, 1964; Latané, 1996; Holyst et al., 2001; Robins et al., 2001) to models for random graphs (Holland and Leinhardt, 1981; Skvoretz et al., 2004; Robins et al., 2005; Wasserman and Robins, 2005). Exact sampling for such models is often quite difficult in the presence of dependence, generally leading to the use of Markov chain Monte Carlo (MCMC) as an approximation technique (Snijders, 2002; Hunter et al., 2008). While often effective, MCMC methods have variable execution time (i.e., time to convergence), can be slow to converge (Bhamidi et al., 2011), and produce draws whose quality can be difficult to assess (Gelman, 1996). Here, we present an alternative method for approximate sampling from binary discrete exponential families having fixed execution time and well-defined quality guarantees. In many cases, worst-case complexity of the sampling process is no worse than the worst-case complexity of an exact sampler for independent, inhomogeneous binary variables (i.e., order *N*, where *N* is the number of variables). We demonstrate the use of this sampling procedure in the context of random graph generation, with application to simulation of exponential family random graph models (ERGMs) fit to network data, and to the exploratory simulation of large-scale social networks using both geographical covariates and dyadic dependence mechanisms.

Let *Y* = *Y*_{1}, . . . , *Y _{N}* be a finite vector of binary-valued random variables with joint support ${\mathcal{Y}}_{N}$. In a social network context, each *Y _{i}* might represent, for example, the state of a potential edge in a random graph (1 if present, 0 if absent). We assume throughout that the joint distribution of *Y* can be written in discrete exponential family form as

$$\mathrm{Pr}(Y=y\mid \theta )=\frac{\mathrm{exp}\left[{\theta}^{T}t\left(y\right)\right]}{{\sum}_{{y}^{\prime}\in {\mathcal{Y}}_{N}}\mathrm{exp}\left[{\theta}^{T}t\left({y}^{\prime}\right)\right]}{I}_{{\mathcal{Y}}_{N}}\left(y\right),$$

(1)

where $t:{\mathcal{Y}}_{N}\mapsto {\mathbb{R}}^{p}$ is a vector of sufficient statistics, $\theta \in {\mathbb{R}}^{p}$ is a parameter vector, and ${I}_{{\mathcal{Y}}_{N}}$ is an indicator function for membership in the support. Note that we may also write *Y*'s joint distribution in a very different, hierarchical form:

$$\mathrm{Pr}(Y=y\mid \theta )=\mathrm{Pr}({Y}_{1}={y}_{1}\mid \theta )\mathrm{Pr}({Y}_{2}={y}_{2}\mid \theta ,{Y}_{1}={y}_{1})\dots \times \mathrm{Pr}({Y}_{N}={y}_{N}\mid \theta ,{Y}_{1}={y}_{1},\dots ,{Y}_{N-1}={y}_{N-1}).$$

(2)

With the exception of relatively trivial cases, neither Equation 1 nor Equation 2 is of much direct use: direct calculation of the normalizing factor of the former is generally prohibitive, and the partially marginalized distributions of the latter (which involve similar normalizing factors) are generally unknown. Standard approaches to simulating *Y* in the general case typically employ Markov chain Monte Carlo methods, exploiting the fact that

$$\frac{\mathrm{Pr}(Y={y}^{\prime}\mid \theta )}{\mathrm{Pr}(Y=y\mid \theta )}=\mathrm{exp}\left[{\theta}^{T}[t\left({y}^{\prime}\right)-t\left(y\right)]\right],$$

(3)

and hence that calculations depending only on probability ratios for alternative realizations of *Y* (as required, e.g., for Metropolis-Hastings algorithms) are often computationally inexpensive. (See, e.g. Morris et al. (2008) in the ERGM case.) Our approach in the present paper utilizes a different (but closely related) property of Equation 1, namely that

$$\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{i}^{c}={y}_{i}^{c})={[1+\mathrm{exp}\left[{\theta}^{T}[t\left({y}_{i}^{-}\right)-t\left({y}_{i}^{+}\right)]\right]]}^{-1},$$

(4)

where ${Y}_{i}^{c}$ refers to all elements of *Y* other than the *i*th, ${Y}_{i}^{+}$ refers to the vector *Y* such that the *i*th entry is set to 1, and ${Y}_{i}^{-}$ refers to the same vector with the *i*th entry set to 0. (We have here suppressed the support term for clarity of notation: $\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{i}^{c}={y}_{i}^{c})$ is obviously equal to 0 if ${I}_{{\mathcal{Y}}_{N}}\left({y}_{i}^{+}\right)=0$, and to 1 if ${I}_{{\mathcal{Y}}_{N}}\left({y}_{i}^{-}\right)=0$.)
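To make Equation 4 concrete, the following sketch (with a toy statistic vector of our own choosing, not one from the paper) checks the full-conditional formula against brute-force enumeration of Equation 1 for a small *N*:

```python
import itertools
import math

def exact_pmf(t, theta, N):
    """Pr(Y = y | theta) per Eq. 1, by brute-force enumeration over the full
    support {0,1}^N (feasible only for small N)."""
    support = list(itertools.product((0, 1), repeat=N))
    w = {y: math.exp(sum(th * s for th, s in zip(theta, t(y)))) for y in support}
    Z = sum(w.values())
    return {y: v / Z for y, v in w.items()}

def full_conditional(t, theta, y, i):
    """Pr(Y_i = 1 | theta, Y_i^c = y_i^c) per Eq. 4."""
    yp = y[:i] + (1,) + y[i+1:]   # y_i^+ : i-th entry set to 1
    ym = y[:i] + (0,) + y[i+1:]   # y_i^- : i-th entry set to 0
    d = sum(th * (sm - sp) for th, sm, sp in zip(theta, t(ym), t(yp)))
    return 1.0 / (1.0 + math.exp(d))

# Toy statistics (count of 1s, count of adjacent agreements); Eq. 4 should
# agree with the conditional computed directly from the enumerated pmf.
t_toy = lambda y: (sum(y), sum(y[k] == y[k + 1] for k in range(len(y) - 1)))
pmf = exact_pmf(t_toy, (0.5, 1.0), 4)
y, i = (0, 1, 1, 0), 2
p1, p0 = pmf[(0, 1, 1, 0)], pmf[(0, 1, 0, 0)]
assert abs(p1 / (p1 + p0) - full_conditional(t_toy, (0.5, 1.0), y, i)) < 1e-12
```

Note that the full conditional requires only the *change* in the sufficient statistics, not the normalizing factor, which is why it remains cheap even when Equation 1 is intractable.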

To see how we may leverage this property, let us consider Equation 2. While we cannot in general compute the marginal elements of this expression (the *i*th of which is a sum over a set of states that grows exponentially as 2^{N–i}), we can *bound* them using the full conditionals of Equation 4. Specifically, let ${Y}_{<i}^{c}$ refer to the elements *Y*_{1}, . . . , *Y*_{i–1} of *Y*. By definition,

$$\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{<i}^{c}={y}_{<i}^{c})=\sum _{{y}^{\prime}\in {\mathcal{Y}}_{N}:{y}_{<i}^{\prime c}={y}_{<i}^{c}}\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{i}^{c}={y}_{i}^{\prime c})\mathrm{Pr}({Y}_{i}^{c}={y}_{i}^{\prime c}\mid \theta ,{Y}_{<i}^{c}={y}_{<i}^{c}).$$

(5)

The marginal probability that *Y _{i}* will take state 1 (versus 0) given the states of the previous elements is thus a convex combination of the full conditionals for *Y _{i}*, and hence must lie between the smallest and largest values that those full conditionals can take over realizations consistent with ${y}_{<i}^{c}$.

This last observation has been used in prior work by Butts (2011) to provide bounds on the behavior of random graph models. It can also be used, however, as the basis for a novel approximate sampling scheme for any finite binary vector *Y* that differs considerably from the standard MCMC approach. We now turn to a description of this algorithm.

We begin by describing the approach in intuitive form, subsequently turning to the issue of efficient implementation. Let *R* = *R*_{1}, . . . , *R _{N}* be a vector of iid uniform random variables on the unit interval. Equation 2 suggests an obvious (exact) sampling scheme for *Y*: proceeding through the elements in order, set *y _{i}* = 1 if ${r}_{i}<\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{<i}^{c}={y}_{<i}^{c})$ and *y _{i}* = 0 otherwise. Since these partially marginalized probabilities are generally unavailable, our strategy is instead to bound them using the full conditionals of Equation 4, fixing each element exactly where the bounds allow and approximating it otherwise.

Our base algorithm is fairly straightforward: we advance through the elements of *y*, fixing each element as we proceed. (The order in which elements of *y* are visited is arbitrary, a fact that we exploit in section 2.2.1.) This process is shown in detail in Algorithm 1. Effectively, the operations performed when setting each *y _{i}* consist of three steps: (1) an *initialization* step, in which bounds $({\alpha}_{i}^{\prime},{\gamma}_{i}^{\prime})$ on the full conditional probability that *Y _{i}* = 1 are computed from the elements assigned so far; (2) a *fixation* step, in which *y _{i}* is set deterministically where *r _{i}* falls outside these bounds; and (3) a *perturbation* step, in which *y _{i}* is assigned approximately where fixation fails.

After initialization, we proceed to the fixation step (lines 5-8). Here, we exploit the fact that ${\alpha}_{i}^{\prime}\le {\beta}_{i}\le {\gamma}_{i}^{\prime}$ to immediately recognize that *y _{i}* must equal 1 whenever ${r}_{i}<{\alpha}_{i}^{\prime}$, and 0 whenever ${r}_{i}\ge {\gamma}_{i}^{\prime}$, irrespective of the states of the elements not yet determined; in either case, we say that *y _{i}* has been *fixed*. Where neither condition holds, we fall back on the perturbation step, assigning *y _{i}* = 1 with an approximation to the unknown probability *β _{i}*.

As we visit each element of *y* once, the complexity of Algorithm 1 is clearly $\mathcal{O}\left(Nf\left(N\right)\right)$, where *f*(*N*) is the worst-case complexity of the initialization step (i.e., determining the $({\alpha}_{i}^{\prime},{\gamma}_{i}^{\prime})$ bounds). In many cases (including network models based on degree, reciprocity, and mixing terms), this step can be performed in constant time, leading to a worst-case complexity that is linear in the number of variables. As this is also the worst-case complexity for simulation of *N* inhomogeneous Bernoulli trials, Algorithm 1 can clearly achieve optimal performance in such settings. Also, unlike MCMC algorithms (which must be run for an unknown period to obtain convergence), the execution time for Algorithm 1 is fixed and independent of the distributional properties of *Y*. These are both attractive features for applications requiring user-unsupervised sampling (e.g., normalizing factor computation), wherein convergence assessment is difficult and predictable execution time is highly desirable.
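A minimal sketch of this logic follows (helper names are ours, not the paper's); for illustration, the bounds are computed by brute-force minimization/maximization of the full conditional over the unassigned elements, standing in for the model-specific constant-time initialization described above:

```python
import itertools
import math
import random

def full_conditional(t, theta, y, i):
    """Eq. 4: Pr(Y_i = 1 | theta, rest of y)."""
    yp = y[:i] + (1,) + y[i+1:]
    ym = y[:i] + (0,) + y[i+1:]
    d = sum(th * (sm - sp) for th, sm, sp in zip(theta, t(ym), t(yp)))
    return 1.0 / (1.0 + math.exp(d))

def bounds(t, theta, y, i, undetermined):
    """(alpha'_i, gamma'_i): min/max of the full conditional over every setting
    of the not-yet-assigned elements (brute force; small N only)."""
    free = [j for j in undetermined if j != i]
    probs = []
    for bits in itertools.product((0, 1), repeat=len(free)):
        yy = list(y)
        for j, b in zip(free, bits):
            yy[j] = b
        probs.append(full_conditional(t, theta, tuple(yy), i))
    return min(probs), max(probs)

def bound_sampler(t, theta, N, rng=random):
    """Sketch of the base algorithm: fix y_i where the bounds permit,
    otherwise perturb using a uniform threshold rule."""
    y = [0] * N
    undet = set(range(N))
    for i in range(N):
        r = rng.random()
        a, g = bounds(t, theta, tuple(y), i, undet)
        if r < a:
            y[i] = 1                      # fixed at 1, regardless of the rest
        elif r >= g:
            y[i] = 0                      # fixed at 0
        else:
            # perturbation: treating beta_i as uniform on (a, g),
            # Pr(r < beta_i) = (g - r) / (g - a)
            y[i] = 1 if rng.random() < (g - r) / (g - a) else 0
        undet.discard(i)
    return tuple(y)
```

For an independent model (a statistic depending on each element separately), the bounds collapse to a point and the sketch reduces to an exact inhomogeneous Bernoulli sampler, consistent with the complexity claim above.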

Various modifications and/or refinements of Algorithm 1 are possible, either to enhance speed or to improve the quality of the resulting draws. Here, we discuss several such refinements, before turning to the problem of quality assessment.

One observation that can prove useful when constructing refinements to the sampling process of Algorithm 1 is the fact that the order in which the elements of *Y* are given in Equation 2 (and hence the sampling algorithm) is arbitrary; we can thus re-order the sequence in which we visit the elements of *Y* within the sampling algorithm, so long as our choice does not depend on the realizations of *R* or previous elements of *Y*. A straightforward application of this idea can be used for quality improvement when the elements of *Y* are heterogeneous, with some close *ex ante* to fixation. Specifically, we modify Algorithm 1 by adding an *ordering step* prior to the main loop, in which we sort the elements of *Y* in ascending order by *γ – α* . This results in an algorithm that begins by visiting the elements of *Y* most likely to be fixed, increasing the probability that the state of later elements will be decided by *Y* states assigned through a fixation cascade involving only *R*-fixed elements, rather than by a cascade “contaminated” by perturbation. In the general case, this results in a total complexity $\mathcal{O}(Nf\left(N\right)+N\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}N)$ for the sampling algorithm; where *f*(*N*) grows more slowly than log *N*, the pre-sorting operation will dominate the worst-case complexity of the sampler. Where *f*(*N*) grows more quickly, by contrast, the complexity of the pre-sort is on the same order as the complexity of the sampler itself, and cost is not a substantial consideration in its use.
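The ordering step amounts to a single sort by initial bound width; as a sketch (helper name ours, with `bounds_list[i]` holding the initial $({\alpha}_{i},{\gamma}_{i})$ pair for element *i*):

```python
def fixation_order(bounds_list):
    """Ordering step: indices of the elements sorted in ascending order of the
    initial bound width gamma_i - alpha_i, so that the elements most likely to
    be fixed are visited first."""
    return sorted(range(len(bounds_list)),
                  key=lambda i: bounds_list[i][1] - bounds_list[i][0])
```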

Another family of refinements that can be employed when most elements of *α* are large (close to 1) or most elements of *γ* are small (close to 0) is to take the elements of *y* equal to 1 (respectively 0) by default, and explicitly evaluate only elements at high risk of taking the alternate state. (This is a technique used in the simulation of sparse Bernoulli graphs, where the expected number of *Y _{i}* equal to 1 is often $\mathcal{O}\left(\sqrt{N}\right)$; see, e.g. Batagelj and Brandes (2005).) Consider, for example, the “sparse” case in which all ${\gamma}_{i}$ are small: taking *y* = 0 by default, we need explicitly visit only those elements for which ${r}_{i}<{\gamma}_{i}$, whose expected number is at most ${\sum}_{i}{\gamma}_{i}$. Where this quantity grows sublinearly in *N*, the resulting savings can be substantial.
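In the homogeneous Bernoulli case, the skipping trick referenced above can be implemented with geometric jumps between successive 1s, so that only the 1s are ever visited. A sketch (our implementation in the spirit of Batagelj and Brandes (2005), not their code):

```python
import math
import random

def sparse_bernoulli_ones(N, p, rng=random):
    """Return the indices i with Y_i = 1 for N iid Bernoulli(p) trials,
    visiting only the 1s via geometric jumps; expected cost O(pN)."""
    ones = []
    if p <= 0.0:
        return ones
    i = -1
    while True:
        if p < 1.0:
            # Number of 0s before the next 1 is geometric on {0, 1, ...}
            skip = int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
        else:
            skip = 0
        i += skip + 1
        if i >= N:
            return ones
        ones.append(i)
```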

As noted in Section 2, when fixation fails we must approximate the true probability that *Y _{i}* = 1 given the previously assigned elements (i.e., *β _{i}*), using this approximation as the threshold against which *r _{i}* is compared; we refer to this assignment as *perturbation*, since the resulting element may differ from the one an exact sampler would have produced.

Our default method shown in Algorithm 1 is equivalent to treating *β _{i}* as entirely unknown, save for the fact that it lies within the interval $({\alpha}_{i}^{\prime},{\gamma}_{i}^{\prime})$. In practice, however, we are not entirely ignorant regarding the likely location of *β _{i}* within this interval, and even crude information of this kind can be used to improve the quality of the perturbed draws.

To formalize this idea, let the pmf *f* represent our “prior” beliefs about an arbitrary *β _{i}*. (We will here assume that our “priors” are exchangeable; the term is used in quotation marks to emphasize that our use here is heuristic, and not assumed to satisfy the criteria for a proper Bayesian analysis.) Imagine that we subsequently learn that ${\alpha}_{i}^{\prime}\le {\beta}_{i}\le {\gamma}_{i}^{\prime}$. The corresponding “posterior” density for *β _{i}* is then

$$\begin{array}{rl}p({\beta}_{i}=z\mid {\alpha}_{i}^{\prime},{\gamma}_{i}^{\prime})& \propto {I}_{\mathbb{R}\ge {\alpha}_{i}^{\prime}}\left(z\right){I}_{\mathbb{R}\le {\gamma}_{i}^{\prime}}\left(z\right)f\left(z\right)\\ & =\frac{f\left(z\right)}{{\int}_{{\alpha}_{i}^{\prime}}^{{\gamma}_{i}^{\prime}}f\left(x\right)dx}{I}_{\{x\in \mathbb{R}:{\alpha}_{i}^{\prime}\le x\le {\gamma}_{i}^{\prime}\}}\left(z\right)\\ & =\frac{f\left(z\right)}{F\left({\gamma}_{i}^{\prime}\right)-F\left({\alpha}_{i}^{\prime}\right)}{I}_{\{x\in \mathbb{R}:{\alpha}_{i}^{\prime}\le x\le {\gamma}_{i}^{\prime}\}}\left(z\right),\end{array}$$

where *I* is defined as before as a set indicator function, and *F* is the prior cdf of *β _{i}*. For random input *r _{i}*, the corresponding “posterior” probability that *r _{i}* falls below the unknown threshold is then

$$\mathrm{Pr}({r}_{i}<{\beta}_{i}\mid {\alpha}_{i}^{\prime},{\gamma}_{i}^{\prime})=\{\begin{array}{cc}1\hfill & \text{if}\phantom{\rule{thickmathspace}{0ex}}{r}_{i}<{\alpha}_{i}^{\prime}\hfill \\ \frac{F\left({\gamma}_{i}^{\prime}\right)-F\left({r}_{i}\right)}{F\left({\gamma}_{i}^{\prime}\right)-F\left({\alpha}_{i}^{\prime}\right)}\hfill & \text{if}\phantom{\rule{thickmathspace}{0ex}}{\alpha}_{i}^{\prime}\le {r}_{i}<{\gamma}_{i}^{\prime}\hfill \\ 0\hfill & \text{if}\phantom{\rule{thickmathspace}{0ex}}{r}_{i}\ge {\gamma}_{i}^{\prime}\hfill \end{array}\phantom{\}},$$

with the middle term being equal to ${\int}_{{r}_{i}}^{{\gamma}_{i}^{\prime}}p({\beta}_{i}=z\mid {\alpha}_{i}^{\prime},{\gamma}_{i}^{\prime})dz$, as derived above. (Note that the middle term tends to 1 as ${r}_{i}$ decreases to ${\alpha}_{i}^{\prime}$ and to 0 as ${r}_{i}$ increases to ${\gamma}_{i}^{\prime}$, matching the outer cases.)
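Under a given prior cdf *F*, this threshold rule can be sketched directly (function name ours; the default *F* is the uniform cdf on (0, 1), which recovers the default perturbation method):

```python
def prob_below_beta(r, alpha, gamma, F=lambda x: x):
    """"Posterior" probability that r < beta_i, given that
    alpha'_i <= beta_i <= gamma'_i, under a prior cdf F
    (default: uniform on (0, 1)). The interior case meets the
    boundary cases continuously at alpha'_i and gamma'_i."""
    if r < alpha:
        return 1.0
    if r >= gamma:
        return 0.0
    return (F(gamma) - F(r)) / (F(gamma) - F(alpha))
```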

Taking *f* to be the uniform distribution on (0, 1) gives the default perturbation method. However, there is no reason that we must employ this distribution. For instance, a natural alternative choice would be Beta(*δk,* (1 – *δ*)*k*), where *δ* is the “1-density” (or an approximation thereto) and *k* is a user-selected dispersion parameter. In a conventional Bayesian setting, *k* = 2 provides a prior weight equal to that of a uniform distribution, with *k* = 1 being the weight of the Jeffreys prior for a binomial likelihood. (Specifically, both of these distributions can be realized by the appropriate choice of *k* for *δ* = 0.5.) Here, our selection is heuristic, and should be used to produce threshold distributions that have reasonable behavior when the bounding method fails. In the absence of additional information, small *k* (e.g., ≈ 1) with *δ* chosen as above leads to a distribution that is relatively flat in its middle range, but otherwise weighted heavily towards extreme thresholds (in the direction of *δ*). For models that combine low/high baseline probabilities of *Y _{i}* = 1 with (respectively) strong excitatory/inhibitory dependence effects, the resulting threshold probabilities are likely to be much better approximations of the true *β _{i}* values than those obtained under the uniform default.

As a final comment on this technique, it should be noted that additional information on the distribution of *β* can be gleaned in settings for which some observation from *Y* (or an approximation thereto) is available. A typical case is that in which *Y*|*θ* is chosen as a data model for observation *y*, which (to the extent that the model is adequate) can then be viewed as an approximate draw from the random process. In this latter case, Equation 4 gives a direct method for evaluating the full conditionals for a single *Y* element, given the rest of the realized process. Although the full conditionals for a single realization are not equal to the true *β* values for that realization, they nevertheless provide a rough approximation thereto. (This is, indeed, the basis for the pseudo-likelihood approximation of Besag (1974).) Inspection of the properties of the full conditional distribution may thus prove useful in choosing *f*, e.g., by suggesting good approximate *δ* values and/or choices for *k* in the Beta case, and/or alternative distributional forms. Since *f* is only used when the fixation process fails, its selection will have little impact for models requiring little perturbation. For less constrained models, however, intelligent approximation of *β* may greatly aid performance. Fortunately, one virtue of the sampling strategy employed here is that we may predict this degree of constraint on an *ex ante* basis. We turn to a discussion of this problem in Section 3.

As shown in Equation 5, the marginal probability $\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{<i}^{c}={y}_{<i}^{c})$ is a convex combination of the full conditionals of *Y _{i}*, mixed over the possible realizations of the elements not yet determined. Given a sample of *m* independent draws *y*^{(1)}, . . . , *y*^{(m)} from *Y*, we could in principle estimate this probability by

$${\widehat{\mu}}_{i}\left({y}_{<i}^{c}\right)=\frac{\sum _{j=1}^{m}\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{i}^{c}={\left({y}^{\left(j\right)}\right)}_{i}^{c}){I}_{\{{y}^{\prime}\in {\mathcal{Y}}_{N}:{y}_{<i}^{\prime c}={y}_{<i}^{c}\}}\left({y}^{\left(j\right)}\right)}{{\sum}_{j=1}^{m}{I}_{\{{y}^{\prime}\in {\mathcal{Y}}_{N}:{y}_{<i}^{\prime c}={y}_{<i}^{c}\}}\left({y}^{\left(j\right)}\right)},$$

(6)

with convergence occurring in the limit as *m* → *∞*. While we do not have an exact sample from *Y*, we may instead consider a set of draws from some random variable *Y′* whose distribution is close to *Y*; we refer to the calculation of $\widehat{\mu}$ under the resulting sample as “pseudo-marginalization.” Intuitively, if *Y′* is a reasonable approximation to *Y* (and/or if ${\widehat{\mu}}_{i}\left({y}_{<i}^{c}\right)$ is relatively insensitive to ${y}_{\ge i}^{c}$), then the pseudo-marginalized estimator of $\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{<i}^{c}={y}_{<i}^{c})$ (hence *β _{i}*) may be fairly close to the true value. Using the estimated *β _{i}* as the perturbation threshold can then substantially improve the quality of the resulting draws.

The procedure for pseudo-marginalization is as follows. We begin by positing a distribution *Y′*, and drawing an initial *marginalization sample M* = (*y′*^{(1)}, . . . , *y′*^{(m)})*, y′*^{(i)} ~ *Y′*. Ideally, *Y′* should be as close to *Y* as possible, while still being easily simulated. In practice, we suggest starting with an independent Bernoulli model, calibrated to preserve overall average features of *Y* (e.g., frequency of 1s, marginal covariate relationships, etc.). We then proceed with Algorithm 1, with the following simple modifications. Where fixation occurs (via the bounds), pseudo-marginalization is not employed. In the event that perturbation is necessary, the *τ* assignment of line 10 is replaced by $\tau \u2254{\widehat{\mu}}_{i}\left({y}_{<i}^{c}\right)$ per Equation 6, with *M* being used as the set of sample draws. Finally, once an assignment is made to *y _{i}* (either by perturbation or fixation), we set the *i*th element of every draw in *M* equal to the assigned value *y _{i}*, so that subsequent pseudo-marginalization steps condition on the elements already determined.

Although pseudo-marginalization does not improve the guaranteed quality of the sampler, it can in practice have a substantial effect. The total increase in computational and storage cost is proportional to *m*, and is thus a constant; however, this may still prove burdensome for large problems. In cases where dependence between elements is strong (especially if *N* is not too large), pseudo-marginalization can be a relatively “cheap” way of improving performance.
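The estimator of Equation 6 can be sketched as follows (function and argument names ours); `full_cond_i` is the Equation 4 full conditional for element *i*, and the sample plays the role of the marginalization sample *M*:

```python
def pseudo_marginal(full_cond_i, sample, y_prefix):
    """Eq. 6: estimate Pr(Y_i = 1 | theta, Y_{<i} = y_prefix) by averaging the
    full conditional (Eq. 4) over those sample draws whose leading elements
    match the already-assigned prefix. Returns None if no draw matches."""
    k = len(y_prefix)
    matches = [y for y in sample if tuple(y[:k]) == tuple(y_prefix)]
    if not matches:
        return None
    return sum(full_cond_i(y) for y in matches) / len(matches)
```

In the procedure described above, overwriting the assigned elements in *M* guarantees that every draw matches the prefix, so the `None` case does not arise there.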

One useful property of the construction of Section 2 is that it provides a very direct means of assessing the quality of the approximate sample (both ex post and ex ante). To make this notion precise, define *y ^{t}* to be the unique draw from the exact sampling scheme of Equation 2 that would have been produced by the same random input vector *R*. We then define the quality *Q*(*y*) of an approximate draw *y* as the fraction of its elements that are equal to the corresponding elements of *y ^{t}*.

While we do not know *Q*(*y*) exactly, we can provide a lower bound by exploiting the properties of the fixation cascade. Consider some element *y _{i}* of *y*, and let *h _{i}* = 1 indicate that *y _{i}* is known to agree with ${y}_{i}^{t}$ (with *h _{i}* = 0 otherwise); write ${y}_{<i}^{h}$ for the prior elements of *y* for which this agreement is known. We then define

$${\alpha}_{i}^{\u2033}=\underset{{y}^{\prime}\in {\mathcal{Y}}_{N}:{y}_{<i}^{\prime h}={y}_{<i}^{h}}{\mathrm{min}}\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{i}^{c}={y}_{i}^{\prime c})$$

and

$${\gamma}_{i}^{\u2033}=\underset{{y}^{\prime}\in {\mathcal{Y}}_{N}:{y}_{<i}^{\prime h}={y}_{<i}^{h}}{\mathrm{max}}\mathrm{Pr}({Y}_{i}=1\mid \theta ,{Y}_{i}^{c}={y}_{i}^{\prime c})$$

in direct analogy to the $({\alpha}_{i}^{\prime},{\gamma}_{i}^{\prime})$ pair of Algorithm 1. If ${r}_{i}<{\alpha}_{i}^{\u2033}$ or ${r}_{i}\ge {\gamma}_{i}^{\u2033}$, then we may fix the value of *y _{i}* by conditioning only on prior elements that are known to be equal to their corresponding elements in *y ^{t}*; in this case, *y _{i}* must itself equal ${y}_{i}^{t}$, and we set *h _{i}* = 1. The mean of *h* then provides a computable lower bound $\widehat{Q}\left(y\right)$ on *Q*(*y*).

Although it should be noted that the fixation under $({\alpha}_{i}^{\u2033},{\gamma}_{i}^{\u2033})$ required for *h* is stronger than the associated condition of Algorithm 1 (and hence one cannot simply set *h _{i}* = 1 where the latter occurs), computation of *h* requires only a second set of bound evaluations, and can be carried out during the sampling process itself at modest additional cost.

Ex post evaluation is useful, in that it provides us with a sense of how well the algorithm performed in a given instance. Ex ante evaluation, however, is also useful in determining whether the sampling algorithm is likely to prove successful. In this regard, note that we can place an immediate bound on the expected value of *Q*(*Y*) by exploiting the initial fixation condition: specifically, $\mathbf{E}Q\left(Y\right)\ge {\scriptstyle \frac{1}{N}}{\sum}_{i=1}^{N}(1-{\gamma}_{i}+{\alpha}_{i})$. This may, of course, understate the quality of the approximation method in systems that are prone to long fixation cascades; typically, however, one would expect such systems to have fairly small *γ* – *α* intervals for most elements (since cascades are otherwise difficult to propagate), and thus to have reasonably large quality bounds. Regardless, a distribution with very high ex ante quality bounds is likely to be effectively approximated by the algorithm, and high **E***Q* may thus be taken as diagnostic of situations in which use of the procedure is fairly safe.
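The ex ante bound above is just the mean initial fixation probability; a one-line sketch (helper name ours, with `bounds_list[i]` holding the initial $({\alpha}_{i},{\gamma}_{i})$ pair for element *i*):

```python
def ex_ante_quality_bound(bounds_list):
    """Ex ante lower bound on E[Q(Y)]:
    (1/N) * sum_i (1 - gamma_i + alpha_i)."""
    N = len(bounds_list)
    return sum(1.0 - g + a for a, g in bounds_list) / N
```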

While the above provide overall indicators of quality, it is apparent that we can do better: since we have, for each element, both the ex ante bound on its fixation (1 – *γ _{i}* + *α _{i}*) and the ex post fixation indicator *h _{i}*, we can assess approximation quality at the level of individual elements, identifying the portions of *y* that are most (and least) reliable.

As a final note, it must be emphasized that the ex post diagnostics should *not* be used to guide the sampling process itself (e.g., by repeatedly sampling until a high-quality draw is obtained). This is because *Q*(*y*) is not in general independent of *y ^{t}*, and the two may in fact be strongly related. (Consider, e.g., the case of simulating a random adjacency matrix with dependence induced by a triangle term. Extremely sparse realizations will have few potential two-paths, leading to long initial fixation cascades, and subsequently higher quality. Selecting for high quality draws will thus lead to a bias in favor of low-density, unclustered graphs – quite the opposite of what the model is intended to produce!) On the other hand, quality diagnostics can be safely used to place bounds on the possible states of *y ^{t}*: every element with *h _{i}* = 1 is known to equal its counterpart in the exact draw.

Where the lower bound on quality, $\widehat{Q}$, is high, it is evident that the bound sampler is close to exact; however, the bound sampler may still perform well even when $\widehat{Q}$ is relatively low. In general, assessing performance in such cases requires comparison with high-quality MCMC runs (as shown in section 5.1). In a few cases, however, one can posit model families with edgewise dependence whose behavior can be solved exactly; in these cases, we may compare the bound sampler (and MCMC) directly to the behavior of an exact sampler.

Here we provide an example of such an analysis, using an exponential family form of the well-known *U*|*man* model (Wasserman and Faust, 1994). The *U*|*man* model (standing for “uniform, given probabilities of mutual, asymmetric, and null dyads”) is a homogeneous random digraph family that incorporates biases in mean density and reciprocity. In exponential family form, this distribution may be written as

$$\mathrm{Pr}(Y=y\mid \theta )=\frac{\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}({\theta}_{\mathrm{e}}{t}_{\mathrm{e}}\left(y\right)+{\theta}_{\mathrm{m}}{t}_{\mathrm{m}}\left(y\right))}{{[1+2\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}\left({\theta}_{\mathrm{e}}\right)+\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}(2{\theta}_{\mathrm{e}}+{\theta}_{\mathrm{m}})]}^{\binom{n}{2}}},$$

(7)

where *Y* is a directed adjacency matrix of dimension *n* × *n*, *t*_{e} is the count of edges within *Y*, *t*_{m} is the count of mutual dyads (i.e., node pairs *i, j* such that *Y _{ij}* = *Y _{ji}* = 1), and *θ*_{e} and *θ*_{m} are the corresponding edge and mutuality parameters.
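Because the dyads are independent under this model, its exact behavior follows from the per-dyad state probabilities implied by Equation 7. A sketch (function names ours):

```python
import math

def uman_dyad_probs(theta_e, theta_m):
    """Exact dyad-state probabilities under the exponential-family U|man model
    (Eq. 7): states are (null, asymmetric i->j, asymmetric j->i, mutual)."""
    w = [1.0, math.exp(theta_e), math.exp(theta_e),
         math.exp(2.0 * theta_e + theta_m)]
    Z = sum(w)  # per-dyad normalizing factor, 1 + 2 e^te + e^(2te+tm)
    return [x / Z for x in w]

def uman_expected_counts(n, theta_e, theta_m):
    """Expected edge and mutual counts: dyads are iid, so the totals are
    C(n, 2) times the per-dyad expectations."""
    _, p_a1, p_a2, p_mut = uman_dyad_probs(theta_e, theta_m)
    ndyads = n * (n - 1) // 2
    return ndyads * (p_a1 + p_a2 + 2.0 * p_mut), ndyads * p_mut
```

Expectations of this kind supply the exact reference values against which the simulation results below are compared.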

A first natural question to ask of the *U*|*man* system is how (unperturbed) fixation probability varies with *θ*. Because this model is dyad independent, it suffices to consider the case of a representative *i, j* dyad. Per Algorithm 1, either (*i, j*), (*j, i*), or both may be initially fixed; if exactly one is initially fixed, then the other will necessarily be fixed by a cascade. (This last follows immediately from the fact that the probability of an (*i, j*) edge depends only on the state of the corresponding (*j, i*) edge.) It follows, then, that the probability that the (*i, j*) edge will be neither directly nor indirectly fixed is equal to the probability that neither (*i, j*) nor (*j, i*) is initially fixed. Let *k* be the position of the (*i, j*) edge variable in the Algorithm 1 sort order. By homogeneity, the non-fixation probability is then equal to

$$\begin{array}{cc}\hfill {({\gamma}_{k}-{\alpha}_{k})}^{2}=& {\left(\frac{1}{1+\mathrm{exp}(-{\theta}_{\mathrm{e}}-{\theta}_{\mathrm{m}})}-\frac{1}{1+\mathrm{exp}(-{\theta}_{\mathrm{e}})}\right)}^{2}\hfill \\ \hfill =& 1-\widehat{Q}\hfill \end{array}$$

(8)

(where we have exploited the symmetry between *α* and *γ* with respect to the sign of *θ*_{m} and the square to write the two terms in the above difference in arbitrary order).
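Equation 8 gives the quality bound in closed form; as a sketch (function name ours):

```python
import math

def uman_quality_bound(theta_e, theta_m):
    """Eq. 8: ex ante quality bound Q-hat for U|man. The non-fixation
    probability of an edge variable is (gamma_k - alpha_k)^2, with the bounds
    given by the full conditionals under a present vs. absent reciprocal edge."""
    g = 1.0 / (1.0 + math.exp(-theta_e - theta_m))
    a = 1.0 / (1.0 + math.exp(-theta_e))
    return 1.0 - (g - a) ** 2
```

Note that the bound equals 1 exactly when *θ*_{m} = 0 (the Bernoulli case), and is symmetric under simultaneous sign reversal of the two terms of the difference, as discussed above.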

To provide an intuitive sense of how this varies in practice, Figure 1 shows $\widehat{Q}$ (scaled to percentages for ease of display) as a function of *θ*_{e} and *θ*_{m}. As can be seen, fixation is near total except with respect to two “lobes” of high |*θ*_{m}|, one occurring where the mutuality effect is strongly positive and the edge effect is somewhat negative and the other occurring where the mutuality effect is strongly negative and the edge effect is somewhat positive. These cases (with a strong mutuality effect that is largely balanced by the baseline density) are those that maximize the *γ* – *α* gap, and hence that minimize $\widehat{Q}$.

To get a better sense of how performance varies above and beyond what can be guaranteed by $\widehat{Q}$, we here show a simple simulation study comparing mean edge, mutual, and triangle counts under the bound sampler to their *U*|*man* expectations. By way of comparison to standard methods, we also show the performance of a basic (single random edge update Metropolis) MCMC algorithm with various choices of burn-in time. For all simulation methods, we estimate the expected subgraph counts using the sample means from 1,000 independent draws; to ensure independence, each MCMC-based draw was taken from an independently run Markov chain seeded with an iid uniform graph and simulated forward for the indicated number of iterations.^{2} Perturbation for the bound sampler was performed using the pseudo-Bayesian method of Section 2.2.3, with the pseudo-prior being a beta distribution with expectation equal to the expected density and a total concentration (i.e., parameter sum) of 6 (chosen to produce unimodality while still being diffuse). The simulation target in each case was a *U*|*man* distribution for a digraph of order *n* = 25, with *θ*_{e} and *θ*_{m} each systematically varied in an 11-point sequence from −3 to 3. The parameter range was selected to cover density levels from the nearly empty (mean degree less than 1) to the nearly complete (mean degree greater than 23), with a corresponding range in tendencies towards or away from mutuality. MCMC samples were taken from chains of length *n* (25), *n*^{2} (625), and *n*^{3} (15,625).
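The single random edge-update Metropolis scheme used for comparison can be sketched as follows (our implementation for illustration, not the authors' code):

```python
import math
import random

def uman_metropolis(n, theta_e, theta_m, steps, rng=random):
    """Single random edge-toggle Metropolis sampler for U|man, seeded with an
    iid uniform digraph (density 1/2), as in the comparison study."""
    y = [[i != j and rng.random() < 0.5 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j:
            continue                        # no self-loops
        sign = -1 if y[i][j] else 1         # toggling adds (+1) or removes (-1)
        d_edges = sign
        d_mutuals = sign if y[j][i] else 0  # mutual count changes only if the
                                            # reciprocal edge is present
        logr = theta_e * d_edges + theta_m * d_mutuals  # Eq. 3 log-ratio
        if logr >= 0.0 or rng.random() < math.exp(logr):
            y[i][j] = not y[i][j]
    return y
```

Because only the change statistics enter the acceptance ratio (Equation 3), each update is cheap; the difficulty highlighted in the results below is the number of updates required for the chain to mix.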

To assess simulation quality, we compare the mean counts for edges, mutuals, and triangles under each sampling scheme to the exact values from the corresponding *U*|*man* distribution; we employ the mean absolute errors in these counts at each parameter value as our performance criterion. Figures 2–4 show the corresponding error surfaces (darker values indicate greater deviation) along with the mean absolute errors at each simulated value (rounded to the nearest integer). As a reference, the maximum numbers of edges, mutuals, and triangles for these digraphs are respectively 600, 300, and 2,300.

Mean absolute errors (rounded) in expected edge count for bound and MCMC samplers as a function of edge and mutuality parameters (edge/mutual model). Shading interpolates linearly from white (no error) to black (maximum observed error).

Mean absolute errors (rounded) in expected triangle count for bound and MCMC samplers as a function of edge and mutuality parameters (edge/mutual model). Shading interpolates linearly from white (no error) to black (maximum observed error).

Figure 2 shows the error in the expected edge count (and hence density) under each method. As expected, the “gold standard” MCMC with a chain length that is long compared to the number of edges (bottom right) produces nearly flawless performance over the entire parameter space. On the other hand, the bound sampler (top left) also does quite well: it shows no or very minimal errors over much of the parameter space, with error rates that are still quite modest (< 4%) in the regions of worst performance. Although error is obviously related to $\widehat{Q}$ (errors cannot be large when $\widehat{Q}$ is high), the sampler can still perform well when $\widehat{Q}$ is lower: for instance, the region near *θ*_{e} ∈ (–2, –1), *θ*_{m} ≈ 3 (and its mirror image) has the worst quality bounds over the parameter range considered, but the simulated densities here are quite accurate. It is interesting to contrast this performance with that of Markov chains that are poorly mixed. When chain length is on the order of the number of nodes (top right panel), densities are generally far from their target values. This is not surprising (given that the chain cannot update many edges), but serves as a cautionary note. Perhaps more significant is that chain lengths on the order of the number of edge variables (bottom left) are still quite insufficient to yield reasonable results. While intervals much greater than *n*^{2} are easy to achieve for small graphs like those used here, typical default chain lengths for current ERGM MCMC tools (e.g., **ergm** v3.1) will prove inadequate for approximately *n* ≥ 100.

Mean absolute errors for mutuality (number of mutuals) are shown in Figure 3. The results here largely mirror those of Figure 2: very long-chain (i.e., $\gg {n}^{2}$) MCMC performs flawlessly, the bound sampler works well (with modest errors over certain parts of the parameter space), and shorter-chain MCMC runs do very poorly.

Mean absolute errors (rounded) in expected mutual count for bound and MCMC samplers as a function of edge and mutuality parameters (edge/mutual model). Shading interpolates linearly from white (no error) to black (maximum observed error).

Finally, we consider the propagation of simulation error into higher order structures, by way of the triangle count. As Figure 4 shows, the results in this case are not qualitatively different from those shown in the other figures: we continue to see strong performance from the bound sampler and long-chain MCMC simulations, and poor performance for shorter MCMC runs. In particular, we note that we do not see a tendency for biases in mutuality rate to propagate destructively into the triangle formation rate for the bound sampler, which continues to show error rates of 4% or less across the sampled parameter space. While the bound sampler is indeed approximate, we thus find that it here approaches the performance of high-quality MCMC simulations even for the parts of the parameter space in which it is not provably near-exact.

As a final comment on these results, it should be noted that the edge/mutual model allows constant-time initialization, resulting in a worst-case time complexity of order *N* (in terms of graph size, *n*^{2}). The above experiments suggest that MCMC runs of substantially higher time complexity (e.g., as much as *n*^{3}) are necessary to obtain superior results for this system in the general case. Obviously, the implications of this for actual practice are not direct—in a practical setting, we could for instance enhance MCMC convergence by using a seed draw from a distribution close to the target, or employ an alternative proposal distribution, and we could likewise enhance the bound sampler by means of the techniques described elsewhere in this paper—and there is in any event little reason to choose either the bound sampler or MCMC over exact sampling for *U*|*man*! However, these results do demonstrate that there are settings in which the bound sampler can produce draws whose quality would require a much more time-complex MCMC run to match. In light of the fact that mixing times for current MCMC algorithms on ERGMs can be extremely long (Bhamidi et al., 2011), it is reasonable to suspect that more such cases will be found in practice.

As noted above, the bound sampler may be employed in any setting involving a finite series of binary random variables whose joint distribution is specified in exponential family form. Currently, the most active area of such research in sociology centers on social networks, with exponential family models (ERGMs) serving as an increasingly widely used tool for specifying models on graphs. Here, we show two applications of the bound sampler to network simulation, both reflective of typical use cases in the social network field. First, we show how the bound sampler may be used to simulate draws from an ERGM fitted to observed network data (here, a communication network), comparing the results to what would be obtained from standard MCMC methods. Second, we provide an example of exploratory simulation, in which we use the bound sampler to investigate the effect of dependence (specifically, isolation avoidance) on a large-scale spatially embedded network. In this latter case, the network in question is sufficiently large to make MCMC simulation prohibitive, making the bound sampler the only effective means of exploring the impact of edgewise dependence on the structure of the resulting network.

To illustrate the use of the bound sampler as a tool to simulate networks from fitted ERGMs, we consider a simple model fit to an intraorganizational communication network collected by Butts et al. (2007). Specifically, we consider radio communication ties among all named communicants in the Port Authority Trans-Hudson (PATH) control desk network during the emergency phase of the 2001 World Trade Center disaster; each vertex represents an individual, and two individuals are tied if at least one radio call was made between them (in either direction). The resulting network (see Figure 5) contains 229 individuals and 269 edges (mean degree 2.35). As is evident from the figure, this network shows substantial degree heterogeneity, some but not all of which is associated with the presence of individuals occupying institutionalized coordinative roles (e.g., desk operator) within the organization (orange nodes). Although not obvious from visual inspection, this network is also relatively triangulated given its size and density (38 triangles, CUG *p* < 0.001), suggesting an underlying tendency towards triadic closure.

PATH Control Desk communication network. Light colored nodes indicate individuals with institutionalized coordinative roles.

To capture these effects, we fit^{3} a simple ERGM to the PATH network using the following terms: differential mixing rates between and within nodes with institutionalized coordinative roles (ICRs); individual-specific fixed effects for degree (i.e., sociality terms); and a geometrically weighted edgewise shared partner (GWESP; Hunter, 2007) term to account for residual transitivity. The coefficients for the simulated model are summarized in Table 1. Note that the apparently negative effect of ICR status on interaction probability must be interpreted in light of the individual-level fixed effects: sociality is *positively* correlated with ICR status (Spearman's *ρ* = 0.33, *p* < 0.0001), and indeed the median sociality effect for those with ICRs is substantially higher than for those without (3.648 vs. 0.002). A more nuanced view is thus that those with ICRs are more likely to be highly active, but are less likely to mix with each other than might be expected given their overall activity level. Similarly, the negative GWESP effect indicates that triangulation is reduced, but this is relative to the level of clustering that would be observed in a random graph with strong degree heterogeneity effects;^{4} although there are more triangles than would be expected due to size and density alone, there are fewer (and they are more dispersed through the graph) than would be expected given the strong variation in individual activity level.

To illustrate the behavior of the bound sampler on the PATH network model, we simulate 500 draws from the fitted ERGM; ex ante fixation probability sorting was used, and perturbations were made using a diffuse Beta prior distribution with expectation equal to the observed graph density. As a point of comparison, we also simulate draws from the “gold standard” MCMC method (Hunter et al., 2008), with a burn-in of 5 × 10^{6} draws and a thinning interval of 5 × 10^{5} (enough to revisit each edge variable an average of approximately 20 times between each draw; convergence was also verified using standard diagnostics (Goodreau et al., 2008)). Results from both simulations are shown in Figure 6. Our simulation quality index, *Q*, suggests that the sampler works well here: on average, approximately 99% of edge variables were subject to unperturbed fixation, with very consistent performance across simulation runs (panel (A)). Although the structure of a sparse graph can be sensitive to small numbers of edge variables, Figure 6 shows that the distributions of edges and triangles (panel (B)), degree scores (panel (C)), and ESP statistics (panel (D)) generated by the bound sampler draws fairly closely approximate the high-quality MCMC sample. Unlike the MCMC sample, however, draws from the bound sampler are guaranteed to be independent, and require visiting each edge variable at most once.

Bound and MCMC sampler results for the PATH network model. (A) Distribution of ex post quality measure (*Q*) for the bound simulator; ex ante expected quality (here identical for all draws) shown via dotted vertical line. (B) Edge and triangle distributions; (C) degree distributions; (D) edgewise shared partner (ESP) distributions.

Simulated draws like those given here are clearly adequate to approximate model behavior in cases for which MCMC is too computationally expensive, or where independence of draws is a priority (e.g., quadrature problems). Alternatively, simulated draws from the bound sampler can be used to produce fully independent “seed” draws for MCMC simulations, allowing shorter MCMC runs and ensuring that the results from each run are independent. This may be particularly useful in settings such as goodness-of-fit assessment, in which seeding runs with the observed data is problematic (although widely performed).

Although the bound sampler can be used as a complement to MCMC where both are available, its scalability also allows it to be used to generate networks that are too large to be effectively simulated by current MCMC methods. As an illustration of such an application, we here demonstrate its use to simulate draws from a social network model combining geographical effects and degree distribution bias. The network model we employ is an extension of the “social friendship” model of Butts (2002), an inhomogeneous Bernoulli graph model based on friendship data from a housing study by Festinger et al. (1950), used by Butts et al. (2012) to investigate the potential impact of geographical heterogeneity on network structure. Written in conditional ERG form, this model is a curved exponential family (Hunter and Handcock, 2006) specified by

$$\mathrm{Pr}({Y}_{ij}=1\mid \theta ,t,{\mathcal{Y}}_{N},X,{Y}_{ij}^{c}={y}_{ij}^{c})={\left[1+\mathrm{exp}\left[\mathrm{log}\left(\frac{{(1+\alpha {D}_{ij})}^{\gamma}}{{p}_{b}}-1\right)\right]\right]}^{-1}=\frac{{p}_{b}}{{(1+\alpha {D}_{ij})}^{\gamma}}$$

(9)

where *D _{ij}* is the geographical distance between vertices *i* and *j*, *p _{b}* is a baseline tie probability, and *α* and *γ* govern the rate of attenuation with distance. To incorporate dyadic dependence effects, this baseline model is extended to

$$\mathrm{Pr}({Y}_{ij}=1\mid \theta ,t,{\mathcal{Y}}_{N},X,{Y}_{ij}^{c}={y}_{ij}^{c})={\left[1+\mathrm{exp}\left[\mathrm{log}\left(\frac{{(1+\alpha {D}_{ij})}^{\gamma}}{{p}_{b}}-1\right)+{\delta}^{T}\left(t\left({y}_{ij}^{-}\right)-t\left({y}_{ij}^{+}\right)\right)\right]\right]}^{-1}$$

(10)

where *t*_{0}, . . . , *t*_{N–1} is a vector of statistics such that *t _{i}*(*y*) is the number of vertices of degree *i* in *y* (allowing the model to bias the degree distribution), *δ* is a corresponding parameter vector, and *y*_{ij}^{+} and *y*_{ij}^{−} denote the graph with the (*i*, *j*) edge respectively present and absent.
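As a concrete illustration, the sketch below evaluates a conditional tie probability of the kind appearing in Equation 10, assuming the spatial term reduces to the Festinger-type attenuation *p _{b}*/(1 + *αD*)^{*γ*} and using a single isolate-count (degree-0) statistic with coefficient *δ*_{0}. The `alpha` and `gamma` values in the usage note are hypothetical placeholders, not the fitted distance parameters.

```python
import math

def spatial_tie_logit(d, p_b, alpha, gamma):
    """Logit of the baseline spatial tie probability p_b / (1 + alpha*d)**gamma."""
    p = p_b / (1.0 + alpha * d) ** gamma
    return math.log(p / (1.0 - p))

def conditional_tie_prob(d, p_b, alpha, gamma, delta0=0.0, isolates_removed=0):
    """Conditional tie probability at distance d, with an isolate-count
    penalty term.  isolates_removed is the change statistic: the number of
    endpoints that would be isolates were the edge absent (0, 1, or 2)."""
    # Adding the edge changes the isolate count by -isolates_removed, so the
    # dependence term contributes delta0 * (-isolates_removed) to the
    # conditional logit; a negative delta0 thus *raises* the probability of
    # ties that rescue would-be isolates.
    eta = spatial_tie_logit(d, p_b, alpha, gamma) + delta0 * (-isolates_removed)
    return 1.0 / (1.0 + math.exp(-eta))
```

For example, with the baseline *p _{b}* = 0.529 and purely illustrative `alpha = 1.0`, `gamma = 2.0`, `conditional_tie_prob(0.0, 0.529, 1.0, 2.0)` recovers the zero-distance baseline of 0.529, and setting `delta0 = -2` with `isolates_removed = 1` raises the tie probability at any given distance.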

Because vertex degrees are not independent of one another, the model of Equation 10 no longer has Bernoulli structure: the conditional probability of observing a particular edge depends upon other edges within the graph. (Specifically, this model exhibits Markov graph dependence; see Frank and Strauss (1986).) The traditional approach to simulation of such models is the use of MCMC (see, e.g., Snijders, 2002), which scales poorly for large networks (Bhamidi et al., 2011). Here, we employ instead our proposed sampling procedure, taking draws from the model of Equation 10 with fixed distance effects and various choices of *δ*. For the baseline parameters, we use the median parameter estimates from Butts (2002), specifically *p _{b}* = 0.529,

Population distribution for Choctaw, MS. Points are placed uniformly by census block (|*V*| = 9,758).

For the *δ* effects, we here focus on the phenomenon of *isolation avoidance*: where ties are valuable, those who would otherwise be socially isolated may be motivated to seek them (hence leading to a reduced proportion of vertices with degree zero). In ERGM terms, this amounts to setting *δ _{i}* = 0 for all *i* > 0, with *δ*_{0}, the coefficient on the isolate count, set to a negative value.

To explore the potential impact of isolation avoidance in the Choctaw model, we take 100 draws using the bound sampler, with *δ*_{0} set respectively to 0, −1, −2, and −4 (reflecting an increasingly sharp penalty for isolate formation). Mean values for selected network statistics are given in Table 2; standard errors (not shown) are less than 1% of the mean value in all cases, and all differences are significant at $p\ll 0.001$. The first column of Table 2 shows the behavior of the baseline model without dependence effects: this is an exact sample (*Q* = 1), in which approximately 32% of nodes are isolated on average (*Isolate Fraction*). The average mean degree in the baseline case is 1.28 (unsurprisingly low, given the large isolate fraction), but the graph is relatively clustered (transitivity of approximately 0.18). As a simple measure of embeddedness in cohesive subgroups, we employ the *mean core number*, equal to the mean, over vertices, of the order of the highest-order *k*-core (Wasserman and Faust, 1994) to which each vertex belongs. Vertices with a core number of 0 are isolates; those with a core number of 1 belong to pendant trees; those with a core number of 2 belong to bicomponents; and so on. The mean core number under the baseline model is approximately 0.95, reflecting the strong contribution from isolated and pendant vertices in the graph. Finally, we note that, as expected from the spatial model, there is a clear negative relationship between distance and the presence of a tie (expressed as a correlation between edge state and log distance).
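The mean core number can be computed with the standard minimum-degree peeling algorithm; the following self-contained sketch uses a small toy graph rather than the Choctaw network.

```python
def core_numbers(n, edges):
    """Core number of each vertex: the largest k such that the vertex
    belongs to the k-core.  Computed by repeatedly peeling off a
    minimum-degree vertex (Batagelj-Zaversnik style)."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    deg = [len(a) for a in adj]
    core = [0] * n
    remaining = set(range(n))
    k = 0
    while remaining:
        v = min(remaining, key=lambda u: deg[u])
        k = max(k, deg[v])  # the current core order can only grow as we peel
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return core

# Toy graph: a triangle (vertices 0-2), a pendant (3), and two isolates (4, 5).
core = core_numbers(6, [(0, 1), (1, 2), (0, 2), (2, 3)])
mean_core = sum(core) / len(core)
```

Here the triangle vertices receive core number 2, the pendant vertex 1, and the isolates 0, for a mean core number of 7/6 ≈ 1.17, illustrating how isolated and pendant vertices pull the mean down.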

How do these properties change when isolate avoidance is added to the model? For this, we look to the succeeding columns of Table 2, which show mean values for the above statistics at increasingly negative parameter values. First, we verify that, as *δ*_{0} falls, the fraction of isolated vertices does indeed decline (dropping from a 32% baseline to just over 1% at *δ*_{0} = –4). Simulation quality (as evaluated via *Q*) remains high throughout, although the fixation rate falls slightly as the dependence among edges is increased (from 1 in the baseline model to approximately 0.994 at *δ*_{0} = –4). With these indications that the sampler is behaving as expected, we proceed to consider the impact of increasing isolation avoidance on other aspects of network structure. For instance, a reduction in isolation could in principle be achieved via two processes: a net increase in edges (with new edges forming from those who would otherwise be isolated to other vertices in the graph), or a reallocation of edges (with an increase in edges to otherwise isolated nodes being compensated by a loss of edges by nodes that would otherwise have higher degree). As the table shows, the former mechanism is active here: mean degree increases as *δ*_{0} falls, implying (given the fixed vertex set) a net increase in the number of edges in the graph. This increase is associated with an increase in the mean core number, reflecting the movement of vertices from the 0-shell (the set containing those in the 0-core but not the 1-core) into shells of higher order.

Does the movement of vertices from an isolated state to a better-connected one reflect embeddedness in dense subgroups, embeddedness in locally tree-like structures, or the formation of lone dyads? The declining transitivity score tells us that the second of these possibilities is the dominant effect: given that isolates are being removed via edge addition, a declining transitivity score implies the formation of new incomplete 2-paths. Lone pairs do not produce such structures, and hence cannot be the source of the change; nor would the creation of dense, heavily triangulated subgroups, which would raise rather than lower transitivity. Several conclusions follow. First, since would-be isolates are not primarily pairing with each other, they are necessarily forming ties to vertices that would otherwise have higher degree; suppression of isolates thus has “spillover” effects for higher-degree nodes. Second, since these potential isolates are generally embedding in locally tree-like structures, it seems plausible that isolate suppression increases connectivity without a corresponding increase in cohesion (i.e., the formation of redundant paths). Intuitively, cohesion is driven heavily by the fraction of nodes having degree at least 2 (since any node of degree less than 2 cannot lie on a cycle), and hence social processes that act primarily to shift vertices from degree 0 to degree 1 will have a very different impact than those that act to increase degree more uniformly.
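The reasoning above can be verified on a toy example: attaching a would-be isolate to a triangle by a single pendant edge creates open 2-paths but no new triangles, and so must lower transitivity. A minimal sketch (illustrative only, not the simulation code):

```python
from itertools import combinations

def transitivity(n, edges):
    """Graph transitivity: 3 * (number of triangles) divided by the
    number of connected 2-paths (pairs of neighbors of a common vertex)."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    triangles = sum(1 for a, b, c in combinations(range(n), 3)
                    if b in adj[a] and c in adj[a] and c in adj[b])
    two_paths = sum(d * (d - 1) // 2 for d in map(len, adj))
    return 3 * triangles / two_paths if two_paths else 0.0
```

A bare triangle has transitivity 1.0; adding one pendant edge (a rescued isolate) drops it to 0.6, with no change in the triangle count.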

In addition to asking how the positional structure of the network changes as isolates are suppressed, we can also ask how isolate avoidance alters the spatial structure of the network. From Table 2, we can see that the negative relationship between tie probability and distance inherent in the spatial component of the model persists (with some mild fluctuations) as *δ*_{0} becomes more negative; for a richer view of the relationship, it is useful to examine the detailed distribution of ego-alter distances at each parameter value. Figure 8 shows the cumulative distribution functions (CDFs) for geographical distances among adjacent alters for representative draws from the bound sampler at each value of *δ*_{0} (with the dotted black line showing the CDF for distances among all dyads). Consistent with the negative spatial correlation, there is a strong tendency for adjacent alters to be proximate relative to what would be expected under random mixing (all model CDFs are above the baseline). Introducing an isolation penalty, however, greatly increases the number of long-range ties in the network: while 90% of ties are within 100m in the purely spatial model, this drops to 83% for *δ*_{0} = –1, 72% for *δ*_{0} = –2, and 54% for *δ*_{0} = –4. Moreover, the disparity is greatest for edges to alters who are relatively close (e.g., < 1km) while being outside of ego's immediate household area.

Cumulative distance distributions for Choctaw, MS. Dotted black line shows the baseline CDF over all dyads, colored lines show CDFs for distances between adjacent alters for each value of *δ*.
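Curves like those in Figure 8 are empirical distribution functions over the endpoint distances of realized edges; the sketch below (with made-up coordinates, not the census-block placements) shows the computation.

```python
import math

def tie_distance_cdf(points, edges, x):
    """Fraction of edges whose endpoint-to-endpoint Euclidean distance is at
    most x, i.e. the empirical CDF of ego-alter distances evaluated at x."""
    dists = [math.dist(points[i], points[j]) for i, j in edges]
    return sum(d <= x for d in dists) / len(dists)
```

Evaluating this at a grid of `x` values for each simulated graph traces out one CDF curve per value of *δ*_{0}.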

Taken together with the above results, this gives us a more complete understanding of what happens as isolation avoidance increases. Nodes that would otherwise be isolated form ties to others (not merely other isolates) who are farther away than would otherwise be likely, though still favoring those at short distances. These ties tend not to be embedded in locally cohesive structures, but are instead part of tree-like complexes (possibly converting previously isolated vertices into pendants). The result is a graph that is better connected, with slightly greater spatial mixing, but without an increase in overall clustering. (Indeed, the median core number holds constant at 1, and the average maximum at approximately 4, over all simulations, consonant with the conclusion that most change involves movement from the 0-shell into the 1-shell.) This is a distinctive pattern of behavior that would be difficult or impossible to explore without the ability to efficiently simulate large networks, an ability that conventional MCMC-based sampling schemes do not provide. The bound sampler gives us the ability to investigate much larger systems than have been accessible in the past.

Like any sampling scheme, the bound sampler has inherent tradeoffs and complexities that may indicate—or contraindicate—its use in particular settings. Before concluding, we comment here on several issues related to the use of the bound sampler in practice, both in the context of network simulation and in other applications.

Section 4 analyzed bound sampler quality in the context of the exactly computable *U*|*man* family; Section 5 showed the behavior of the bound sampler in several more realistic applications. Beyond these special cases, what can be said regarding the quality of bound sampler draws in practice? Noting that a lower bound on *Q* can be obtained from the mean distance between the upper and lower Bernoulli bounds on the conditional probabilities of the individual variables in the absence of any prior fixation, it follows that distributions admitting tight Bernoulli bounds (in the sense of Butts (2011)) will have provably high quality. Butts (2011) discusses various factors influencing bound tightness, the most important of which are the degree of inhomogeneity in baseline probabilities and the degree of coupling between variables. In general, the most loosely bounded distributions are those involving strong global coupling with a high degree of homogeneity; highly inhomogeneous distributions with weaker and/or more local coupling admit tighter bounds. In the context of the bound sampler, the presence of fixation cascades can lead to better performance than a simple Bernoulli bound analysis would suggest; tight Bernoulli bounds should thus be viewed as a sufficient but not necessary condition for high-quality draws.
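One plausible reading of this bound can be sketched as follows (an illustration under our assumptions, not the paper's exact formula): if each element fixes whenever its shared uniform coupling draw falls outside the interval between its lower and upper bound probabilities, the expected fixation fraction is at least one minus the mean bound gap.

```python
def expected_fixation_lower_bound(upper, lower):
    """Lower bound on the expected fraction of fixed elements.  Element i is
    fixed whenever the coupling draw U ~ Uniform(0, 1) falls outside
    (lower[i], upper[i]), which occurs with probability 1 - (upper[i] - lower[i])."""
    gaps = [u - l for u, l in zip(upper, lower)]
    return 1.0 - sum(gaps) / len(gaps)
```

Tight bounds (small gaps) thus translate directly into a provably high expected fixation rate.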

As we have noted throughout, both the bound sampler and MCMC samplers are approximate; MCMC draws can in principle be made arbitrarily close to exact, but in practice this depends on mixing time (which can be quite long) and requires convergence assessment. Where circumstances permit the generation of high-quality MCMC samples (and neither computational cost nor convergence assessment is problematic), MCMC is recommended. This is particularly true when the *Q* statistic is low, and bound sampler quality cannot be guaranteed. On the other hand, the bound sampler is recommended as an alternative to MCMC in settings where the latter's mixing time is prohibitively long (due either to coupling between variables or to system size) and the quality of the former is acceptable; pilot simulations may be required to assess this. The bound sampler is also suggested as a possible alternative to MCMC when independence of draws is a serious concern, and/or where unsupervised execution is a priority (again, assuming that the sample quality is adequate). A third alternative, relatively short MCMC runs seeded with independent bound sampler draws, may be an even better option in such cases, and should be considered when neither approach alone appears to produce adequate results.

One exception to the above guidance arises when the objective is to *bound* a statistic on *Y* (e.g., to place an upper bound on the expected number of triangles formed within a network). In this case, the bound sampler allows provable bounds on, e.g., the noncentral moments or quantiles of the statistic that would be obtained from an exact sample, a property that MCMC does not share. In such applications, the bound sampler may be preferred to MCMC even when the latter produces higher-quality draws overall.

Although we have focused here on its use in the context of ERGM simulation, the bound sampler can be used in any setting in which one seeks to simulate binary vectors and can represent one's target distribution in exponential family form. One such setting is the modeling of influence on binary attitudes (approve/disapprove, agree/disagree) or behaviors within a fixed network or other social setting; the exponential family models employed by Holyst et al. (2001) to model social impact dynamics (Latané, 1981; 1996) are one example of the types of influence process to which the bound sampler could be applied. Other settings lending themselves naturally to representation as dependent binary processes are the modeling of dichotomous voting systems (for/against) and of attendance or participation by persons or organizations at events. Examples of such processes have been famously studied by, e.g., Coleman (1964) and Granovetter (1978), and exponential families for the latter have recently been used in dynamic network models (see, e.g., Almquist and Butts, 2014) but could easily be treated in their own right.

Finally, it should be noted that, while we have here been concerned exclusively with binary vectors, generalizations of the bound sampler to more general contexts are possible. For instance, consider the case in which we seek to simulate a vector, *Z* = *Z*_{1}, . . . , *Z _{N}*, such that each *Z _{i}* takes values in a finite set of possible states.

In this paper, we have introduced a novel alternative sampling algorithm for binary vectors with fixed execution time and quality guarantees. This approach stands in contrast to MCMC methods, which have variable convergence time (taking “convergence” to be defined in pragmatic rather than ideal terms) and which cannot be deterministically related ex ante to an exact sampler. Our approach is based on sequential approximations to the partial marginals of the joint distribution, and exploits the fact that bounds on these marginals can often be used to exactly determine the “correct” state of a given random variable (vis-à-vis an exact sampler to which our sampler is stochastically coupled). Where the state of the corresponding exact sampler is uncertain, the elements of the simulated draw that are fixed (i.e., known to match the exact draw) can still be used to place bounds on the properties of the vector as a whole. (For instance, the total number of 1s in the exact draw must be at least the number of 1s in the fixed portion of the simulated draw, and no more than this number plus the number of unfixed elements.) These properties make the bound sampler a potentially useful alternative to MCMC in unsupervised settings, or when principled bounds on a random quantity are more important than exact estimates.
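The interval logic in the parenthetical example is easy to make concrete; in the sketch below (illustrative, using `None` to mark unfixed elements), bounds on the 1-count of the coupled exact draw follow directly from the fixed portion.

```python
def ones_count_bounds(draw):
    """Bound the number of 1s in the exact draw coupled to a bound-sampler
    draw.  Fixed elements are 0 or 1 and are guaranteed to match the exact
    draw; unfixed elements (None) may turn out to be either value."""
    fixed_ones = sum(1 for v in draw if v == 1)
    unfixed = sum(1 for v in draw if v is None)
    return fixed_ones, fixed_ones + unfixed
```

Analogous interval bounds can be propagated to any statistic that is monotone in the individual elements.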

Although there are many problems that can be expressed in terms of random binary vectors, the most active in current sociological research is the simulation of random graphs (a fundamental task for both theory and empirical investigation in the study of social networks). To that end, we here illustrated the use of the bound sampler to simulate social networks from both empirically fitted ERGMs and from exploratory models. In the former case, we showed that the bound sampler can provide a reasonable proxy for MCMC-based draws, with the added benefits of fixed execution time and quality guarantees mentioned above. In the latter, we showed that the superior scalability of the bound sampler allowed it to be used to simulate very large networks, where current MCMC implementations are too slow to be useful. Given the increasing importance of large-scale networks within sociological research, there is a substantial need for principled techniques for statistical simulation of networks with tens of thousands (or more) vertices; the bound sampler would seem to have considerable virtue in such applications.

Finally, we note that there are many potential extensions and refinements of our basic algorithm (some of which were discussed in Section 2.2). While a drawback of our method is that, unlike the case of MCMC, one cannot improve simulation quality simply by running it longer, techniques such as pseudo-marginalization can be used to improve quality where necessary. Further research on when and where such methods can be used, and on further techniques to enhance simulation quality, would seem to be an important direction.

^{1}This research was supported in part by ONR award N00014-08-1-1015, NIH award 1R01HD068395-01, and NSF awards BCS-0827027 and IIS-1251267.

^{1}It should be borne in mind that *β* is a random vector, unlike the fixed vectors of *α* and *γ*. However, each *β _{i}* depends only on the outcomes preceding it, and is bounded by

^{2}Simulation was performed using the **sna** (Butts, 2008) and **ergm** (Hunter et al., 2008) libraries.

^{3}All MCMC-based simulation and inference was performed using **ergm** (Hunter et al., 2008) and the **statnet** software suite (Handcock et al., 2008).

^{4}Ceteris paribus, enhanced degree variance leads to clustering, since there are fewer ways to avoid creating triangles when edges are concentrated on a smaller set of nodes than when edges are evenly dispersed through the graph.

- Almquist ZW, Butts CT. Point process models for household distributions within small areal units. Demographic Research. 2012;26(22):593–632.
- Almquist ZW, Butts CT. Logistic network regression for scalable analysis of networks with joint edge/vertex dynamics. Sociological Methodology. 2014;44(1):273–321. [PMC free article] [PubMed]
- Batagelj V, Brandes U. Efficient generation of large random networks. Physical Review E. 2005;71(3):036113–1. [PubMed]
- Besag J. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B. 1974;36(2):192–236.
- Bhamidi S, Bresler G, Sly A. Mixing time of exponential random graphs. The Annals of Applied Probability. 2011;21(6):2146–2170.
- Butts CT. Doctoral Dissertation. Carnegie Mellon University; 2002. Spatial Models of Large-scale Interpersonal Networks.
- Butts CT. Social network analysis with sna. Journal of Statistical Software. 2008;24(6)
- Butts CT. Bernoulli graph bounds for general random graphs. Sociological Methodology. 2011;41:299–345.
- Butts CT, Acton RM. Spatial modeling of social networks. In: Nyerges T, Couclelis H, McMaster R, editors. The SAGE Handbook of GIS and Society Research. SAGE Publications; 2011. pp. 222–250. chapter 12.
- Butts CT, Acton RM, Hipp JR, Nagle NN. Geographical variability and network structure. Social Networks. 2012;34:82–100.
- Butts CT, Petrescu-Prahova M, Cross BR. Responder communication networks in the World Trade Center Disaster: Implications for modeling of communication within emergency settings. Journal of Mathematical Sociology. 2007;31(2):121–147.
- Coleman JS. Introduction to Mathematical Sociology. Free Press; New York: 1964.
- Festinger L, Schachter S, Back K. Social Pressures in Informal Groups. Stanford University Press; Stanford, California: 1950.
- Frank O, Strauss D. Markov graphs. Journal of the American Statistical Association. 1986;81:832–842.
- Gelman A. Inference and monitoring convergence. In: Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov Chain Monte Carlo In Practice. Chapman and Hall/CRC; Boca Raton, FL.: 1996.
- Goodreau SM, Handcock MS, Hunter DR, Butts CT, Morris M. A statnet tutorial. Journal of Statistical Software. 2008;24(9) [PMC free article] [PubMed]
- Granovetter M. Threshold models of collective behavior. American Journal of Sociology. 1978;83:1420–1443.
- Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M. statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software. 2008;24(1):1–11. [PMC free article] [PubMed]
- Holland PW, Leinhardt S. An exponential family of probability distributions for directed graphs (with discussion). Journal of the American Statistical Association. 1981;76(373):33–50.
- Holyst JA, Kacperski K, Schweitzer F. Social impact models of opinion dynamics. Annual Reviews of Computational Physics. 2001;9:253–273.
- Hunter DR. Curved exponential family models for social networks. Social Networks. 2007;29:216–230. [PMC free article] [PubMed]
- Hunter DR, Handcock MS. Inference in curved exponential family models for networks. Journal of Computational and Graphical Statistics. 2006;15:565–583.
- Hunter DR, Handcock MS, Butts CT, Goodreau SM, Morris M. ergm: A package to fit, simulate and diagnose exponential-family models for networks. Journal of Statistical Software. 2008;24(3) [PMC free article] [PubMed]
- Latané B. The psychology of social impact. American Psychologist. 1981;36(4):343–356.
- Latané B. Dynamic social impact: The creation of culture by communication. Journal of Communication. 1996;46(4):13–25.
- Morris M, Handcock MS, Hunter DR. Specification of exponential-family random graph models: Terms and computational aspects. Journal of Statistical Software. 2008;24(4):1–24. [PMC free article] [PubMed]
- Robins GL, Pattison PE, Elliott P. Network models for social influence processes. Psychometrika. 2001;66:161–190.
- Robins GL, Pattison PE, Woolcock J. Small and other worlds: Network structures from local processes. American Journal of Sociology. 2005;110(4):894–936.
- Skvoretz J, Fararo TJ, Agneessens F. Advances in biased net theory: Definitions, derivations, and estimations. Social Networks. 2004;26:113–139.
- Snijders TAB. Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure. 2002;3(2)
- Wasserman S, Faust K. Social Network Analysis: Methods and Applications. Cambridge University Press; Cambridge: 1994.
- Wasserman S, Robins GL. An introduction to random graphs, dependence graphs, and *p**. In: Carrington PJ, Scott J, Wasserman S, editors. Models and Methods in Social Network Analysis. Cambridge University Press; Cambridge: 2005. pp. 192–214. chapter 10.
