
Soc Networks. Author manuscript; available in PMC 2010 July 1.

Published in final edited form as:

Soc Networks. 2009 July 1; 31(3): 204–213.

PMCID: PMC2827882

NIHMSID: NIHMS106146

Pavel N. Krivitsky,^{*,1,2,3} Mark S. Handcock,^{1,3} Adrian E. Raftery,^{4,3,5} and Peter D. Hoff^{3,6}

University of Washington, Seattle


Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actor degrees. We propose a latent cluster random effects model to represent all of these features, and we describe a Bayesian estimation method for it. The model is applicable to both binary and non-binary network data. We illustrate the model using two real datasets. We also apply it to two simulated network datasets with the same, highly skewed, degree distribution, but very different network behavior: one unstructured and the other with transitivity and clustering. Models based on degree distributions, such as scale-free, preferential attachment and power-law models, cannot distinguish between these very different situations, but our model does.

Social network data consist of data about pairs of actors or nodes. Often these data represent the presence, absence or value of a relationship between pairs of actors, such as liking, respect, familial relationship, shared membership in a group of individuals, or volume of trade for collectivities such as countries or companies. In this article we primarily consider binary social network data, representing presence or absence of a relationship, and count data, representing the number of times a relationship between a pair of actors was observed. The methods we develop can also be extended to accommodate other types of relational data.

Much social network data share a number of features. One of these is *transitivity*, for example the fact that if actor *A* relates to actor *B* and actor *B* relates to actor *C*, then actor *A* is more likely to relate to actor *C*. Another is *homophily on observed attributes*, according to which actors with similar characteristics are more likely to relate. A third feature is *clustering*, in which actors cluster into groups such that ties are more dense within groups than between them. This can be due to social self-organization or to homophily on unobserved attributes, such as interest in the same sport, about which the analyst might not have information. A fourth feature is *degree heterogeneity*, namely the tendency of some actors to send and/or receive links more than others.

Hoff, Raftery, and Handcock (2002) proposed the latent space model for social networks. This postulates an unobserved Euclidean social space in which each actor has a position. The probability of a link between a pair of actors depends on the distance between them in the space and on their observed characteristics. Estimation of the model involves estimating both the latent positions and the parameters that specify how the probability of a link depends on distance and observed attributes. The latent space accounts for transitivity automatically, and the model is flexible enough to also include the other common features of social network data. Handcock, Raftery, and Tantrum (2007) (hereafter HRT) extended this model to include model-based clustering of the latent space positions, giving a way to detect groups of actors. Hoff (2005) added random sender and receiver effects to model heterogeneity among actors, similar to those in the *p*_{2} model (van Duijn, Snijders, and Zijlstra, 2004), described its generalized linear model formulation, and applied it to non-binary data.

No model proposed so far represents all four of the common features of social network data mentioned above. In this paper, we propose the Latent Cluster Random Effects Model, which explicitly models all four features by adding the random sender and receiver (or sociality) effects of Hoff (2005) to HRT's latent position cluster model. We apply it to count data as well as binary network data.

In Section 2, we introduce the latent cluster random effects model. In Section 3, we describe our Bayesian method for estimating it using Markov chain Monte Carlo, as well as heuristics for prior and starting value selection. In Section 4 we illustrate the model using two real network datasets, one binary and the other consisting of counts. We also apply our method to two simulated networks with the same, highly skewed degree distribution, but very different network behaviors: one unstructured and the other exhibiting transitivity and clustering. Currently popular methods based on degree distributions cannot distinguish between these situations, but our model does.

We first review the latent position cluster model of HRT and then expand it to allow for actor-specific random effects. The data we model consist of *y*_{i,j}, the value of the relation from actor *i* to actor *j*, together with observed dyad-level covariates *x*_{k,i,j}, *k* = 1, …, *p*.

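The model posits that each actor *i* has an unobserved position, *Z*_{i}, in a *d*-dimensional Euclidean latent social space. For binary data, the log-odds of a tie from *i* to *j* are modeled as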

$$\operatorname{logit} P(Y_{i,j} = 1 \mid Z, x, \beta) \equiv \eta_{i,j} = \sum_{k=1}^{p} \beta_k x_{k,i,j} - \lVert Z_i - Z_j \rVert, \tag{1}$$

where logit(*p*) = log(*p*/(1 – *p*)) and *β* denotes a vector of regression parameters to be estimated. The model accounts for transitivity and for homophily on the observed attributes *x*, as well as potential homophily on unobserved attributes via the latent space. As in HRT, we allow for clustering in the *Z*_{i} via a finite spherical multivariate normal mixture:

$$Z_i \overset{\text{i.i.d.}}{\sim} \sum_{g=1}^{G} \lambda_g \, \mathrm{MVN}_d(\mu_g, \sigma_g^2 I_d), \qquad i = 1, \dots, n, \tag{2}$$

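where *λ*_{g} is the probability that an actor belongs to the *g*th group, so that $\lambda_g \ge 0$ and $\sum_{g=1}^{G} \lambda_g = 1$, and *μ*_{g} and $\sigma_g^2$ are the mean and variance of the positions in that group.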

To represent heterogeneity in the propensity of actors to form ties that is not captured by the dyad-level covariates or actor positions, we introduce actor-specific random effects. The nature of the effects differs for directed and undirected relationships. For an undirected relationship, each actor *i* has a latent “sociality” *δ*_{i}, representing his or her propensity to form ties with other actors. These effects enter the linear predictor additively:

$$\eta_{i,j} = \sum_{k=1}^{p} \beta_k x_{k,i,j} - \lVert Z_i - Z_j \rVert + \delta_i + \delta_j. \tag{3}$$

The sociality *δ*_{i} is then the conditional log-odds ratio of actor *i* forming a tie, relative to an otherwise identical actor with zero sociality.

This model can also be used for directed relationships. In that case we define both sender and receiver random effects, *δ*_{i} and *γ*_{i}, for each actor *i*, and model

$$\eta_{i,j} = \sum_{k=1}^{p} \beta_k x_{k,i,j} - \lVert Z_i - Z_j \rVert + \delta_i + \gamma_j, \tag{4}$$

where

$$\delta_i \overset{\text{i.i.d.}}{\sim} \mathrm{N}(0, \sigma_\delta^2), \qquad \gamma_i \overset{\text{i.i.d.}}{\sim} \mathrm{N}(0, \sigma_\gamma^2), \qquad i = 1, \dots, n,$$

and the variances $\sigma_\delta^2$ and $\sigma_\gamma^2$ measure heterogeneity in the propensity to send and receive links. The use of random effects in the latent space model was proposed by Hoff (2003); van Duijn et al. (2004) made a similar proposal for the *p*_{2} model.
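To make equations (1) to (4) concrete, the following Python sketch draws one binary network from the model. It assumes the only covariate is an intercept (*x*_{1,i,j} = 1, so the sum over *k* reduces to *β*_{0}); the function name `simulate_lcre` and its arguments are illustrative choices, not part of any package. Leaving `sigma_gamma` unset gives the undirected sociality model (3); supplying it gives the directed sender/receiver model (4).

```python
import numpy as np

def simulate_lcre(n, G, d, beta0, lam, mu, sigma, sigma_delta,
                  sigma_gamma=None, rng=None):
    """Draw one binary network from the latent cluster random effects model.

    Illustrative sketch: intercept-only covariate term (beta0).
    mu: (G, d) cluster means; sigma: (G,) cluster standard deviations.
    Undirected sociality model (3) if sigma_gamma is None, else directed model (4).
    """
    rng = np.random.default_rng(rng)
    K = rng.choice(G, size=n, p=lam)                          # cluster memberships
    Z = mu[K] + sigma[K, None] * rng.standard_normal((n, d))  # mixture (2)
    delta = rng.normal(0.0, sigma_delta, size=n)
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    if sigma_gamma is None:
        eta = beta0 - dist + delta[:, None] + delta[None, :]  # eq. (3)
    else:
        gamma = rng.normal(0.0, sigma_gamma, size=n)
        eta = beta0 - dist + delta[:, None] + gamma[None, :]  # eq. (4)
    p = 1.0 / (1.0 + np.exp(-eta))                            # inverse logit, eq. (1)
    Y = (rng.random((n, n)) < p).astype(int)
    np.fill_diagonal(Y, 0)                                    # no self-ties
    if sigma_gamma is None:
        Y = np.triu(Y, 1)
        Y = Y + Y.T                                           # one draw per unordered pair
    return Y, Z, K
```

For example, `simulate_lcre(30, 3, 2, 1.0, np.ones(3) / 3, mu, np.full(3, 0.2), 1.0, rng=1)` returns a symmetric 30 × 30 adjacency matrix together with the positions and memberships that generated it.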

We propose a Bayesian approach to estimate the latent cluster random effects model given by (1), (2), and either (3) or (4). The approach estimates the latent positions, the clustering model and the actor-specific effects simultaneously. We implement the methods computationally using a Markov chain Monte Carlo (MCMC) algorithm.

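We introduce the new variables *K*_{i}, equal to the index of the cluster to which actor *i* belongs (*i* = 1, …, *n*), and use the following prior distributions: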

$$\begin{aligned}
\beta &\sim \mathrm{MVN}_p(\xi, \Psi), \\
\lambda &\sim \mathrm{Dirichlet}(\nu), \\
\sigma_\delta^2 &\sim \alpha_\delta \sigma_{0,\delta}^2 \, \text{Inv-}\chi^2_{\alpha_\delta}, \\
\sigma_\gamma^2 &\sim \alpha_\gamma \sigma_{0,\gamma}^2 \, \text{Inv-}\chi^2_{\alpha_\gamma}, \\
\sigma_g^2 &\overset{\text{i.i.d.}}{\sim} \alpha_Z \sigma_{0,Z}^2 \, \text{Inv-}\chi^2_{\alpha_Z}, \qquad g = 1, \dots, G, \\
\mu_g &\overset{\text{i.i.d.}}{\sim} \mathrm{MVN}_d(0, \omega^2 I_d), \qquad g = 1, \dots, G,
\end{aligned}$$

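where ξ, Ψ, ν = (ν_{1}, …, ν_{G}), $\sigma_{0,Z}^2$, *α*_{Z}, $\sigma_{0,\delta}^2$, *α*_{δ}, $\sigma_{0,\gamma}^2$, *α*_{γ}, and *ω*^{2} are hyperparameters.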
We set *ν*_{g} equal to the smallest group size we are willing to consider for the network of interest, and ξ = 0 and Ψ = 9

Our MCMC algorithm iterates over the model parameters with the priors given above, the latent positions *Z*_{i}, the random effects *δ*_{i} and *γ*_{i}, and the cluster memberships *K*_{i}.

We first describe the full conditional updates. Let an ellipsis (“…”) stand for the variables of which the variable being sampled is conditionally independent, and which therefore do not appear in its full conditional distribution. Because the relevant priors are conjugate, the full conditionals for the variables that can be Gibbs-sampled are as follows:

$$\begin{aligned}
\sigma_\delta^2 \mid \delta, \dots &\sim \Big(\alpha_\delta \sigma_{0,\delta}^2 + \sum_{i=1}^{n} \delta_i^2\Big) \, \text{Inv-}\chi^2_{\alpha_\delta + n}, \\
\sigma_\gamma^2 \mid \gamma, \dots &\sim \Big(\alpha_\gamma \sigma_{0,\gamma}^2 + \sum_{i=1}^{n} \gamma_i^2\Big) \, \text{Inv-}\chi^2_{\alpha_\gamma + n}, \\
\mu_g \mid Z, K, \sigma_g^2, \dots &\overset{\text{ind}}{\sim} \mathrm{MVN}_d\!\left(\frac{n_g \bar{Z}_g}{n_g + \sigma_g^2/\omega^2}, \; \frac{\sigma_g^2}{n_g + \sigma_g^2/\omega^2} I_d\right), \qquad g = 1, \dots, G, \\
\sigma_g^2 \mid Z, K, \mu_g, \dots &\overset{\text{ind}}{\sim} \left(\alpha_Z \sigma_{0,Z}^2 + SS_{Z_g}\right) \text{Inv-}\chi^2_{\alpha_Z + n_g d}, \qquad g = 1, \dots, G, \\
\lambda \mid K, \dots &\sim \mathrm{Dirichlet}(\nu_1 + n_1, \dots, \nu_G + n_G), \\
\Pr(K_i = g \mid \lambda, Z, \mu, \sigma^2, \dots) &= \frac{\lambda_g \, f_{\mathrm{MVN}_d(\mu_g, \sigma_g^2 I_d)}(Z_i)}{\sum_{k=1}^{G} \lambda_k \, f_{\mathrm{MVN}_d(\mu_k, \sigma_k^2 I_d)}(Z_i)}, \qquad i = 1, \dots, n,
\end{aligned}$$

where $SS_{Z_g} = \sum_{i=1}^{n} 1_{K_i = g} (Z_i - \mu_g)^{T} (Z_i - \mu_g)$ is the sum of squared deviations of the latent positions in cluster *g* from the cluster mean *μ*_{g}, $n_g = \sum_{i=1}^{n} 1_{K_i = g}$ is the number of actors assigned to cluster *g* in the current iteration, and $\bar{Z}_g$ is the average of their positions.
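The conjugate updates above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the `latentnet` implementation: a draw from $c\,\text{Inv-}\chi^2_{\nu}$ is generated as $c/X$ with $X \sim \chi^2_\nu$, each *μ*_{g} is drawn before the corresponding $\sigma_g^2$, and the helper names are hypothetical.

```python
import numpy as np

def scaled_inv_chi2(rng, df, scale_sum):
    """Draw from scale_sum * Inv-chi^2_df, i.e. scale_sum / X with X ~ chi^2_df."""
    return scale_sum / rng.chisquare(df)

def update_effect_variance(rng, effects, alpha, s2_0):
    """Full conditional draw for sigma_delta^2 (or sigma_gamma^2)."""
    effects = np.asarray(effects, dtype=float)
    return scaled_inv_chi2(rng, alpha + effects.size, alpha * s2_0 + np.sum(effects ** 2))

def update_clusters(rng, Z, K, sigma2, nu, omega2, alpha_Z, s2_0Z):
    """One conjugate sweep over mu_g, sigma_g^2, lambda and K_i.

    Z: (n, d) latent positions; K: (n,) labels in 0..G-1;
    sigma2: (G,) current cluster variances; nu: (G,) Dirichlet parameters.
    """
    n, d = Z.shape
    G = len(nu)
    mu = np.zeros((G, d))
    sigma2 = np.array(sigma2, dtype=float)
    for g in range(G):
        members = Z[K == g]
        n_g = members.shape[0]
        zbar = members.mean(axis=0) if n_g > 0 else np.zeros(d)
        denom = n_g + sigma2[g] / omega2
        mu[g] = rng.normal(n_g * zbar / denom, np.sqrt(sigma2[g] / denom))
        ss = np.sum((members - mu[g]) ** 2)                      # SS_{Z_g}
        sigma2[g] = scaled_inv_chi2(rng, alpha_Z + n_g * d, alpha_Z * s2_0Z + ss)
    counts = np.bincount(K, minlength=G)                         # n_g
    lam = rng.dirichlet(np.asarray(nu, dtype=float) + counts)
    # log of lambda_g * MVN_d(Z_i; mu_g, sigma_g^2 I_d), normalized over g
    sq = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)    # (n, G)
    logw = np.log(lam) - 0.5 * d * np.log(2 * np.pi * sigma2) - sq / (2 * sigma2)
    logw -= logw.max(axis=1, keepdims=True)
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    K_new = np.array([rng.choice(G, p=row) for row in w])
    return mu, sigma2, lam, K_new
```

Working on the log scale before normalizing the membership probabilities avoids underflow when a position is far from every cluster mean.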

We now describe the Metropolis-Hastings updates, which use two kinds of proposals. First, actor-specific parameters (latent space positions and random effects) are updated one actor at a time, in random order. Second, the covariate coefficients are block-updated together with a rescaling of the latent space positions and a shift of the random effects.

An independent *d*-variate normal jump is proposed for each actor (in random order). For a particular actor *i*, the proposal

$$Z_i^{*} \sim \mathrm{MVN}_d(Z_i, \tau_Z^2 I_d)$$

is made. At the same time, an independent proposal is made for the sender and receiver effects of that actor:

$$\delta_i^{*} \sim \mathrm{N}(\delta_i, \tau_\delta^2), \qquad \gamma_i^{*} \sim \mathrm{N}(\gamma_i, \tau_\gamma^2).$$

The parameters ${Z}_{i}^{\ast}$, ${\delta}_{i}^{\ast}$, and ${\gamma}_{i}^{\ast}$ are then accepted or rejected as a block. The reason for this block-updating is that parameters pertaining to a particular node are likely to have strong dependence: for example, a jump that moves an actor away from others would be associated with an increase in its random effect, to compensate.

This proposal is symmetric. Because each actor is assigned to one cluster at each MCMC iteration, the acceptance probability is

$$\min\left(1, \frac{f_{Y \mid Z_i, \delta_i, \gamma_i, \dots}(y \mid Z_i^{*}, \delta_i^{*}, \gamma_i^{*}, \dots)\, f_{\mathrm{MVN}_d(\mu_{K_i}, \sigma_{K_i}^2 I_d)}(Z_i^{*})\, f_{\mathrm{N}(0, \sigma_\delta^2)}(\delta_i^{*})\, f_{\mathrm{N}(0, \sigma_\gamma^2)}(\gamma_i^{*})}{f_{Y \mid Z_i, \delta_i, \gamma_i, \dots}(y \mid Z_i, \delta_i, \gamma_i, \dots)\, f_{\mathrm{MVN}_d(\mu_{K_i}, \sigma_{K_i}^2 I_d)}(Z_i)\, f_{\mathrm{N}(0, \sigma_\delta^2)}(\delta_i)\, f_{\mathrm{N}(0, \sigma_\gamma^2)}(\gamma_i)}\right).$$
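A sketch of this block update for the directed model (4), again assuming an intercept-only covariate term *β*_{0}. Only the dyads involving actor *i* change, so the hypothetical helper `loglik_actor` sums the Bernoulli log-likelihood over row and column *i*; the normalizing constants of the priors are dropped because *K*_{i} and the variances are held fixed during the step, so they cancel in the ratio.

```python
import numpy as np

def loglik_actor(i, Y, Z, delta, gamma, beta0):
    """Bernoulli log-likelihood of all dyads involving actor i under model (4)."""
    others = np.arange(Y.shape[0]) != i
    dist = np.linalg.norm(Z[others] - Z[i], axis=1)
    eta_out = beta0 - dist + delta[i] + gamma[others]    # ties i -> j
    eta_in = beta0 - dist + delta[others] + gamma[i]     # ties j -> i
    # log f(y | eta) = y * eta - log(1 + exp(eta)), computed stably
    return (np.sum(Y[i, others] * eta_out - np.logaddexp(0.0, eta_out))
            + np.sum(Y[others, i] * eta_in - np.logaddexp(0.0, eta_in)))

def log_actor_prior(z, d_i, g_i, mu_k, sigma2_k, s2_delta, s2_gamma):
    """Log prior of (Z_i, delta_i, gamma_i) up to constants that cancel in the ratio."""
    return (-np.sum((z - mu_k) ** 2) / (2.0 * sigma2_k)
            - d_i ** 2 / (2.0 * s2_delta) - g_i ** 2 / (2.0 * s2_gamma))

def update_actor(rng, i, Y, Z, delta, gamma, beta0, K, mu, sigma2,
                 s2_delta, s2_gamma, tau_Z, tau_delta, tau_gamma):
    """Random-walk block update of (Z_i, delta_i, gamma_i); edits Z, delta, gamma in place."""
    k = K[i]
    z_new = Z[i] + tau_Z * rng.standard_normal(Z.shape[1])
    d_new = delta[i] + tau_delta * rng.standard_normal()
    g_new = gamma[i] + tau_gamma * rng.standard_normal()
    log_old = (loglik_actor(i, Y, Z, delta, gamma, beta0)
               + log_actor_prior(Z[i], delta[i], gamma[i], mu[k], sigma2[k], s2_delta, s2_gamma))
    Z_p, d_p, g_p = Z.copy(), delta.copy(), gamma.copy()
    Z_p[i], d_p[i], g_p[i] = z_new, d_new, g_new
    log_new = (loglik_actor(i, Y, Z_p, d_p, g_p, beta0)
               + log_actor_prior(z_new, d_new, g_new, mu[k], sigma2[k], s2_delta, s2_gamma))
    # Symmetric proposal: accept with probability min(1, exp(log_new - log_old))
    if np.log(rng.random()) < log_new - log_old:
        Z[i], delta[i], gamma[i] = z_new, d_new, g_new
        return True
    return False
```

Proposing the three quantities together and accepting or rejecting them as one block is what lets a move away from other actors be offset by a larger random effect in a single step.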

Once per MCMC iteration, a correlated proposal is used to jointly update *β*, *Z*, *μ*, *σ*^{2}, *δ*, and *γ*. The jumps *h*_{β} (a *p*-vector) and *h*_{Z}, *h*_{δ}, *h*_{γ} (scalars) are drawn jointly:

$$\begin{bmatrix} h_\beta \\ h_Z \\ h_\delta \\ h_\gamma \end{bmatrix} \sim \mathrm{MVN}_{p+3}\!\left(0, \tau_{\beta, Z, \delta, \gamma}\right),$$

and updates are proposed as follows:

$$\begin{aligned}
\beta^{*} &= \beta + h_\beta, \\
Z_i^{*} &= \exp(h_Z)\, Z_i, \qquad i = 1, \dots, n, \\
\mu_g^{*} &= \exp(h_Z)\, \mu_g, \qquad g = 1, \dots, G, \\
\sigma_g^{2*} &= \exp(2 h_Z)\, \sigma_g^2, \qquad g = 1, \dots, G, \\
\delta_i^{*} &= \delta_i + h_\delta, \qquad i = 1, \dots, n, \\
\gamma_i^{*} &= \gamma_i + h_\gamma, \qquad i = 1, \dots, n.
\end{aligned}$$

This proposal accommodates the expected posterior dependence among these parameters. The proposals that rescale the latent space positions, cluster means, and cluster variances are not symmetric in the Metropolis sense, but they can be viewed as symmetric proposals on the logarithm of the magnitudes of these variables expressed in polar coordinates. Transforming back introduces a Jacobian factor: the acceptance ratio must be multiplied by $\exp(h_Z)^{nd}$ for the latent space positions, $\exp(h_Z)^{Gd}$ for the latent cluster means, and $\exp(h_Z)^{2G}$ for the latent cluster variances.

The acceptance probability is thus

$$\min\left(1, \frac{f_{Y \mid \beta, Z, \delta, \gamma, \dots}(y \mid \beta^{*}, Z^{*}, \delta^{*}, \gamma^{*}, \dots)\, f_{\text{Prior}}(\beta^{*}, \mu^{*}, \sigma^{2*}) \prod_{i=1}^{n} f^{*}_{\text{Actor } i}}{f_{Y \mid \beta, Z, \delta, \gamma, \dots}(y \mid \beta, Z, \delta, \gamma, \dots)\, f_{\text{Prior}}(\beta, \mu, \sigma^{2}) \prod_{i=1}^{n} f_{\text{Actor } i}} \, \exp(h_Z)^{(n+G)d + 2G}\right),$$

where

$$f_{\text{Prior}}(\beta, \mu, \sigma^2) = f_{\mathrm{MVN}_p(\xi, \Psi)}(\beta) \prod_{g=1}^{G} \left( f_{\mathrm{MVN}_d(0, \omega^2 I_d)}(\mu_g)\, f_{\alpha_Z \sigma_{0,Z}^2 \, \text{Inv-}\chi^2_{\alpha_Z}}(\sigma_g^2) \right),$$

$$f_{\text{Actor } i} = f_{\mathrm{MVN}_d(\mu_{K_i}, \sigma_{K_i}^2 I_d)}(Z_i)\, f_{\mathrm{N}(0, \sigma_\delta^2)}(\delta_i)\, f_{\mathrm{N}(0, \sigma_\gamma^2)}(\gamma_i),$$

and

$$f^{*}_{\text{Actor } i} = f_{\mathrm{MVN}_d(\mu_{K_i}^{*}, \sigma_{K_i}^{2*} I_d)}(Z_i^{*})\, f_{\mathrm{N}(0, \sigma_\delta^2)}(\delta_i^{*})\, f_{\mathrm{N}(0, \sigma_\gamma^2)}(\gamma_i^{*}).$$
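The joint proposal can be sketched as follows. The function returns the proposed state together with the logarithm of the change-of-variables factor for the rescaling, i.e. the Jacobian of the map $Z \mapsto e^{h_Z} Z$: one factor $e^{h_Z}$ for each of the $(n+G)d$ coordinates of the positions and cluster means, and $e^{2h_Z}$ for each of the *G* cluster variances. The name `propose_joint` and the dictionary layout are illustrative assumptions; `cov` plays the role of $\tau_{\beta,Z,\delta,\gamma}$.

```python
import numpy as np

def propose_joint(rng, beta, Z, mu, sigma2, delta, gamma, cov):
    """Correlated proposal: shift beta, rescale Z, mu, sigma^2, shift delta and gamma.

    cov: (p+3, p+3) covariance of (h_beta, h_Z, h_delta, h_gamma).
    Returns the proposed state and the log Jacobian of the rescaling.
    """
    p = len(beta)
    h = rng.multivariate_normal(np.zeros(p + 3), cov)
    h_beta, h_Z, h_delta, h_gamma = h[:p], h[p], h[p + 1], h[p + 2]
    proposal = {
        "beta": beta + h_beta,
        "Z": np.exp(h_Z) * Z,
        "mu": np.exp(h_Z) * mu,
        "sigma2": np.exp(2.0 * h_Z) * sigma2,
        "delta": delta + h_delta,
        "gamma": gamma + h_gamma,
    }
    n, d = Z.shape
    G = mu.shape[0]
    # exp(h_Z) for each of the (n + G) * d coordinates of Z and mu,
    # exp(2 h_Z) for each of the G cluster variances
    log_jacobian = ((n + G) * d + 2 * G) * h_Z
    return proposal, log_jacobian
```

In an MCMC loop, `log_jacobian` would be added to the difference of log posterior densities before comparing it with the log of a uniform draw.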

The likelihood is a function of the latent positions only through their distances, and so it is invariant to reflections, rotations and translations of the latent positions. The likelihood is also invariant to relabelling of the clusters, in the sense that permuting the cluster labels does not change the likelihood (Stephens, 2000).

We use the approach of HRT to resolve these near non-identifiabilities by post-processing the MCMC output. The approach is to find a configuration of cluster labels and positions whose implied distribution is close to the corresponding “true” distribution in terms of Bayes risk. This is done by minimizing the Kullback-Leibler divergence between the distribution of networks predicted by a configuration of positions and the posterior predictive distribution of networks. The resulting positions are called *Minimum Kullback-Leibler* (MKL) positions and are denoted by *Z*_{MKL}.
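For an intercept-only directed model, the MKL idea can be sketched as an optimization: given posterior predictive tie probabilities $\bar p_{i,j}$ (for example, the inverse logit of *η*_{i,j} averaged over MCMC draws), find positions and an intercept minimizing $\sum_{i \ne j} \mathrm{KL}\big(\mathrm{Bernoulli}(\bar p_{i,j}) \,\|\, \mathrm{Bernoulli}(p_{i,j})\big)$, which equals a cross-entropy up to a constant. This is a simplified illustration under those assumptions (plain gradient descent with step halving, no cluster labels, hypothetical function name), not HRT's or `latentnet`'s procedure.

```python
import numpy as np

def mkl_positions(p_bar, Z0, beta0=0.0, n_iter=200, step=0.1):
    """Sketch of Minimum Kullback-Leibler positions (intercept-only, directed ties).

    p_bar[i, j]: posterior predictive probability of the tie i -> j.
    Minimizes sum_{i != j} KL(Bern(p_bar_ij) || Bern(p_ij)) over (Z, beta0), with
    logit p_ij = beta0 - ||Z_i - Z_j||; up to a constant this is the cross-entropy below.
    """
    n = p_bar.shape[0]
    off = ~np.eye(n, dtype=bool)                          # exclude self-ties

    def loss_and_grad(Z, b):
        diff = Z[:, None, :] - Z[None, :, :]              # Z_i - Z_j
        dist = np.sqrt((diff ** 2).sum(axis=-1) + 1e-12)  # smoothed distance
        eta = b - dist
        loss = -np.sum((p_bar * eta - np.logaddexp(0.0, eta))[off])
        p = 0.5 * (1.0 + np.tanh(eta / 2.0))              # stable inverse logit
        r = np.where(off, p - p_bar, 0.0)                 # d loss / d eta_ij
        # eta_ij and eta_ji both depend on Z_i through -||Z_i - Z_j||
        coef = (r + r.T) / dist
        grad_Z = -(coef[:, :, None] * diff).sum(axis=1)
        return loss, grad_Z, r.sum()

    Z, b = np.array(Z0, dtype=float), float(beta0)
    loss, gZ, gb = loss_and_grad(Z, b)
    for _ in range(n_iter):
        # Backtracking: halve the step until the loss decreases, then grow it slightly
        while step > 1e-10:
            Z_new, b_new = Z - step * gZ, b - step * gb
            new_loss, gZ_new, gb_new = loss_and_grad(Z_new, b_new)
            if new_loss < loss:
                Z, b, loss, gZ, gb = Z_new, b_new, new_loss, gZ_new, gb_new
                step *= 1.1
                break
            step /= 2.0
    # The loss depends on Z only through distances, so centering leaves it unchanged
    return Z - Z.mean(axis=0), b, loss
```

Because only distances enter the objective, the result is still defined only up to rotation and reflection, which is why a fixed reference configuration is typically used when comparing fits.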

A further source of non-identifiability is that adding a constant to all of the actors' sender, receiver, or sociality effects and subtracting the corresponding amount from *β*_{0}, the density (intercept) coefficient, preserves the likelihood. Although the prior distributions resolve this non-identifiability, it still caused slow mixing in our MCMC sampler, which we addressed using the correlated proposal described above.

For visualization purposes, posterior cluster means and variances corresponding to the chosen positions are also needed. We use the full conditionals for *μ*_{g}, $\sigma_g^2$, λ, and *K*, conditional on the MKL positions.

The proposal distribution variance parameters, *τ*_{Z}, *τ*_{γ},

To speed convergence, we start the algorithm at an approximation to the posterior mode, obtained as follows:

1. Multidimensional scaling is performed on the geodesic distances between the graph vertices to get initial latent space positions *Z*_{MDS} (Breiger, Boorman, and Arabie, 1975). These are then centered at the origin.
2. Model-based clustering (Fraley and Raftery, 2002) is used to get a hard clustering *K*_{MDS} of *Z*_{MDS}. To improve robustness, the first time through, locations with Mahalanobis distances from the origin greater than 20 are excluded. This threshold was found experimentally to exclude small graph components and isolates while still leaving a good margin of safety for vertices that carry useful information about structure. The excluded points are arbitrarily assigned to the largest cluster in *K*_{MDS}.
3. Numerical optimization is used to find the posterior mode conditional on *K*_{MDS}.
4. Steps 2 and 3 are repeated until convergence.
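Step 1 can be sketched in Python as breadth-first-search geodesic distances followed by classical (Torgerson) multidimensional scaling. Two choices here are assumptions not stated in the text: tie direction is ignored, and pairs of vertices in different components are given a distance one greater than the largest finite geodesic distance so that the scaling is defined. The function names are hypothetical.

```python
from collections import deque

import numpy as np

def geodesic_distances(adj):
    """All-pairs shortest-path lengths by breadth-first search on a 0/1 adjacency matrix."""
    adj = np.asarray(adj)
    adj = np.maximum(adj, adj.T)                 # assumption: ignore tie direction
    n = adj.shape[0]
    neighbors = [np.flatnonzero(adj[i]) for i in range(n)]
    D = np.full((n, n), np.inf)
    for s in range(n):
        D[s, s] = 0.0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if np.isinf(D[s, v]):            # first visit gives the shortest path
                    D[s, v] = D[s, u] + 1.0
                    queue.append(v)
    return D

def mds_start(adj, d=2):
    """Classical MDS of geodesic distances, centered at the origin (Z_MDS)."""
    D = geodesic_distances(adj)
    finite = np.isfinite(D)
    D[~finite] = D[finite].max() + 1.0           # assumption: cap unreachable pairs
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]             # d largest eigenvalues
    Z = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
    return Z - Z.mean(axis=0)
```

For a path graph the geodesic distances are Euclidean distances along a line, so the embedding reproduces them exactly; for real networks it is only a rough starting configuration for the optimization in step 3.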

We implemented the algorithms described above in the R (R Development Core Team, 2008) package `latentnet` (Krivitsky and Handcock, 2008b), which was used to analyze the following examples.

We consider four datasets, summarized in Table 1. The first, liking among monks in a monastery, has previously been analyzed using latent position and latent position cluster models, and we compare the model fit to those previously obtained. The second and third datasets are simulated. Both have the same degree distribution, but one has both transitivity and clustering, while the other has neither. The last dataset is a network of Slovenian newspapers and magazines, with each pair of magazines having a count of Slovenians surveyed who reported reading both of them. This allows us to apply this family of models to non-binary data, and provides an example of a situation where heterogeneity of actors is better modeled using fixed effects.

Our first example is Sampson's Monks dataset: relations of “liking” among 18 monks in a monastery (Sampson, 1969). The network analyzed has a directed edge from one monk to another if the sender monk ranked the receiver monk among the top three monks for positive affection in any of the three interviews given over a twelve-month period. The sociogram of this dataset is shown in Figure 1.

Relationships among monks within a monastery and their affiliations as identified by Sampson: Young (T)urks, (L)oyal Opposition, (O)utcasts, and (W)averers.

The measurement process for these data constrained the monk-specific sender effects: Sampson asked each monk to name the three others he liked most, three times over the period of the study, so the out-degree of each monk is bounded. The dataset pools these nominations, so a tie from one monk to another exists if the first monk nominated the second as one of his top three most liked *at least once.* Thus, the number of out-ties a monk has is less a measure of his sociality and more a measure of how often he changes his friends. On the other hand, the in-ties were not constrained, so a monk's receiver effect can be interpreted as his popularity, to the extent that it is reflected by how many others nominate him as a friend.

Sampson (1969) identified three main groups of monks: the Young Turks (7 members), the Loyal Opposition (5 members) and the Outcasts (3 members). The other three monks wavered between the Loyal Opposition and the Young Turks, which he described as being in intense conflict (Sampson 1969, p. 370; White, Boorman, and Breiger 1976, p. 752–753).

We fit two versions of our clustering model: a two-dimensional, three-cluster latent space model without random effects, and the same model with receiver effects. In accordance with the heuristic described in Section 3.1, the hyperparameter values used were *ν*_{1} = *ν*_{2} = *ν*_{3} ≈ 2.45, $\sigma_{0}^{2}$ = 0.75, *α*_{Z} ≈ 2.54, $\sigma_{0,\delta}^{2}$ = 1.0,

The fits are summarized in Figure 2. From the plots, the monks are well separated into the three groups and our model assigns each monk to the same group that Sampson did: all monks of Loyal Opposition (and two of the Waverers) are reliably assigned to the “Red” cluster, all the Young Turks to the “Blue” cluster, and all the Outcasts (and one Waverer) to the “Green” cluster. The Young Turks are also more tightly clustered than the Loyal Opposition. (The posterior means of the variances for their clusters are, respectively, 0.716 and 1.09 for the model without receiver effects and 0.716 and 0.968 for the model with receiver effects.)

Minimum Kullback-Leibler estimates of positions in the social space of monks within a monastery. Panel (a) gives estimates from a latent cluster model without monk-specific random effects; panel (b) adds receiver random effects. For the latter, the area …

An interesting contrast between models with and without receiver effects is Monk #1 (Ramauld, a Waverer). This monk is relatively unpopular: he has out-ties to 4 of the 6 members of Loyal Opposition (as identified in Sampson's original paper), but few in-ties from anyone. In the model without receiver effects (Fig. 2a), this monk is thus pushed to the edge of the Loyal Opposition group. When the receiver effects are added (Fig. 2b), this monk moves toward the center of the Loyal Opposition group because of his out-ties to them, and has a small receiver effect to compensate. Thus, his position is determined more by his relations to other monks than by his overall unpopularity, which is accounted for by the receiver effect.

We use the results from fitting the latent cluster receiver effects model to verify that the model and our implementation of it are able to recover the latent positions. Among the 18 monks, there are only 18 × 17 = 306 directed dyads — binary observations — and the latent cluster receiver effects model of dimension 2 has 55 continuous parameters in the likelihood, so in order to test whether the model is able to recover latent space positions with any accuracy, we must artificially increase the precision of the estimates. To do this, we simulated 200 networks based on 200 draws of parameter configurations from the posterior distribution of the latent cluster random effects model, and, for every ordered pair of monks, counted the number of simulated networks in which a tie on that pair was observed. We then fit a latent cluster receiver effects model with binomial response with 200 trials.
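The construction of the summed network can be sketched as follows, assuming the posterior draws have already been converted into linear-predictor matrices *η*_{i,j} (one per draw, as in equation (4)). Each draw produces one simulated network, and the tie indicators are summed over draws, so each entry is a count out of the number of draws (200 in the text). The function name is hypothetical.

```python
import numpy as np

def summed_network(eta_draws, rng=None):
    """Sum of networks simulated from posterior draws of the linear predictor.

    eta_draws: array of shape (S, n, n), one matrix eta_{i,j} per posterior draw.
    Returns counts[i, j] = number of simulated networks containing the tie i -> j,
    which can then be modelled as Binomial(S, p_ij).
    """
    rng = np.random.default_rng(rng)
    eta_draws = np.asarray(eta_draws, dtype=float)
    p = 0.5 * (1.0 + np.tanh(eta_draws / 2.0))   # inverse logit, stable for large |eta|
    ties = rng.random(p.shape) < p               # one Bernoulli network per draw
    counts = ties.sum(axis=0)
    np.fill_diagonal(counts, 0)                  # no self-ties
    return counts
```

Summing over draws is what raises the information per dyad: each ordered pair now contributes a count out of 200 rather than a single binary observation.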

The results are summarized in Figure 3. The latent space positions from the fit based on the summed network are very close to those from the original fit (the average Euclidean distance between the two MKL estimates of each actor's position is 0.18), as are the receiver effects.

We now give results for two simulated network datasets with the same degree distribution. The first one does not exhibit either transitivity or clustering, while the second one has both.

There has been a focus in the literature on scale-free, preferential attachment and power-law models for networks, especially in the physics literature (Newman, 2003). These models assume that all networks with the same degree distribution are equally likely. As a result, methods based on these models cannot distinguish between networks that have the same degree distribution but network behavior that differs in other ways. The purpose of this simulated example is to show that our methods can make these distinctions.

Each of our simulated networks represents an undirected relationship among 150 actors. They are sparse networks with density 0.022. The first network was simulated from the preferential attachment model of Handcock and Jones (2004) using the methods of Handcock and Morris (2007). In this model the degree sequence follows a Yule probability distribution with *ρ* = 2.5, and the actors form ties independently given this sequence. The network generating process exhibits power-law behavior with scaling exponent 2.5. It is thus a scale-free network with a very right-skewed degree distribution, and it exhibits no transitivity or clustering. The degree sequence is generated from the Yule distribution, and the network is then generated from an exponential-family random graph model conditional on that degree sequence using `statnet` (Handcock, Hunter, Butts, Goodreau, and Morris, 2003b). The network is visualized in Figure 4(a). Note how the high-degree actors act as “hubs” for the other actors.
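The Yule degree distribution can be sampled through its mixture representation. The sketch below assumes the parameterization in which *ρ* is the power-law scaling exponent, $P(K = k) = (\rho - 1)\,B(k, \rho)$ for $k \ge 1$ (a Yule–Simon distribution with shape $\rho - 1$); under that assumption the mean degree is $(\rho - 1)/(\rho - 2) = 3$ for *ρ* = 2.5, broadly in line with the stated density (0.022 × 149 ≈ 3.3). It draws degrees only; conditioning an exponential-family random graph model on the resulting sequence, as done with `statnet`, is not shown. The function names are hypothetical.

```python
import math

import numpy as np

def yule_pmf(k, rho):
    """P(K = k) = (rho - 1) * B(k, rho) for k = 1, 2, ...; tail behaves like k^(-rho).

    Assumed parameterization: Yule-Simon with shape s = rho - 1.
    """
    s = rho - 1.0
    log_beta = math.lgamma(k) + math.lgamma(rho) - math.lgamma(k + rho)
    return s * math.exp(log_beta)

def sample_yule(size, rho, rng=None):
    """Mixture representation: W ~ Exponential(rate rho - 1), K | W ~ Geometric(exp(-W))."""
    rng = np.random.default_rng(rng)
    w = rng.exponential(scale=1.0 / (rho - 1.0), size=size)
    return rng.geometric(np.exp(-w))          # support {1, 2, ...}
```

With *ρ* = 2.5, `yule_pmf(1, 2.5)` is 1.5/2.5 = 0.6, so about 60% of sampled degrees equal 1, while the heavy tail produces the occasional very high-degree "hub".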

Two simulated networks, each with 150 actors and the same degree distribution shown in (c). (a) Yule network (with no transitivity or clustering); (b) Latent Cluster network, where the labels 1–3 give the true cluster memberships.

The second network has the same degree distribution as the first, but with latent positions drawn from model (2) with *G* = 3 groups in *d* = 2 dimensions. The clusters are dispersed, with *μ*_{1} = (0, 0), *μ*_{2} = (−1.5, 1.5), and *μ*_{3} = (1.5, 1.5). The intra-cluster standard deviation of the positions is *σ*_{g} = 0.2. The network is a random draw from the Latent Cluster Model conditional on the degree sequence of the first network, so it also has a power-law degree distribution. Unlike the first network, it exhibits transitivity and has clustered latent positions that lead to a highly clustered pattern of links.

The two networks are shown in Figure 4. They look very different, but they have the same degree distribution, shown in Figure 4(c). Note the extreme right tail that is characteristic of scale-free distributions.

We now report the results of fitting the Latent Cluster Random Effects Model to these networks. In each case, we fit two models: a latent 3-cluster model with no random effects, and a latent 3-cluster model with random sociality effects, both with 2-dimensional latent spaces ($Z_i \in \mathbb{R}^2$).

The fits of the two models (without and with random sociality effects) to the unstructured Yule network are shown in Figure 5. The estimated latent space positions vary very little for either model, and the estimated cluster distributions overlap almost completely. Thus, neither of the two latent space models that we fit finds much evidence of structure or distinct groups. And in fact there are no groups in the data, so both models reach the right conclusion in this case.

Minimum Kullback-Leibler locations from the models for the unclustered network in Figure 4(a). In plot (b), the area of the plotting symbol is proportional to the conditional odds ratio of a tie for its vertex, due to its random sociality effect.

The fits of the two models to the clustered network are shown in Figure 6. Both models were able to detect the distinct groups that are present in the data — the “Red” cluster is mostly group 1, “Green” is group 2, and “Blue” is group 3.

Minimum Kullback-Leibler locations from the models for the clustered network in Figure 4(b). In plot (b), the area of the plotting symbol is proportional to the conditional odds ratio of a tie for its vertex, due to its random sociality effect. The numbers …

To evaluate the quality of the clustering, we use a pairwise metric similar to the Fowlkes-Mallows Index (Fowlkes and Mallows, 1983): given that two nodes drawn at random are from the same true cluster, what is the probability that the clustering algorithm assigned them to the same cluster? When using hard clustering (by assigning a node to the cluster to which the plurality of MCMC iterations assign it) this probability is 79% for the model with random sociality effects, and 78% for the model without. However, looking at the soft clustering, where the metric defined above is averaged over the posterior distribution, the difference is more pronounced: 73% for the model with sociality effects and 65% without. Both models identified the clusters of actors in the data quite well, but the random effects model did so more robustly.
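The agreement measure described above can be sketched as follows. `same_cluster_recall` considers all unordered pairs of nodes that share a true cluster and returns the fraction that the estimate also places together; because it only compares "same" versus "different", it does not depend on how clusters are labelled in each MCMC draw. The plurality rule in `hard_labels` does assume labels have been aligned across draws. All function names are illustrative.

```python
import numpy as np

def same_cluster_recall(true_labels, est_labels):
    """P(estimate puts a pair together | pair shares a true cluster), over unordered pairs."""
    t = np.asarray(true_labels)
    e = np.asarray(est_labels)
    iu = np.triu_indices(len(t), k=1)
    same_true = (t[:, None] == t[None, :])[iu]
    same_est = (e[:, None] == e[None, :])[iu]
    return float(same_est[same_true].mean())

def soft_same_cluster_recall(true_labels, K_draws):
    """Posterior average of the pairwise measure over MCMC draws of the memberships K."""
    return float(np.mean([same_cluster_recall(true_labels, K) for K in K_draws]))

def hard_labels(K_draws, G):
    """Plurality cluster of each node across draws (labels assumed aligned)."""
    K_draws = np.asarray(K_draws)
    counts = np.apply_along_axis(np.bincount, 0, K_draws, minlength=G)  # (G, n)
    return counts.argmax(axis=0)
```

The soft version penalizes draws in which true cluster-mates are split, which is why it can separate two models that look similar under hard clustering.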

Also of note is the difference in the patterns of estimated latent positions. The model without random effects gives the “Red” and “Blue” clusters a hub-and-spokes shape: a few high-degree nodes in the middle, with many low-degree nodes in a ring around them, attracted by their ties to the “hub” nodes, but repelled by their lack of ties to each other. On the other hand, the model with random sociality effects addresses this by giving the high-degree nodes a high sociality effect, low degree nodes low sociality effects, and allowing them to be positioned together, reflecting structure adjusted for degree.

This example illustrates that networks with the same degree distribution can have very different network behavior. Methods based on degree distributions, such as those based on scale-free, preferential attachment and power-law models (Newman, 2003), cannot detect these differences. However, our model clearly distinguished between networks with and without transitivity and clustering behavior.
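A toy illustration of the same point, separate from the simulated networks in Figure 4: the two small undirected graphs below have identical degree sequences (every node has degree 2), but one consists of two triangles and the other is a 6-cycle. A model that sees only the degree distribution cannot tell them apart, while their transitivity (the fraction of connected triples that are closed into triangles) is 1 and 0 respectively. The function names are illustrative.

```python
from itertools import combinations


def degree_sequence(n_nodes, edges):
    """Sorted degrees of an undirected graph given as a list of (u, v) pairs."""
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree)


def transitivity(n_nodes, edges):
    """Fraction of connected triples (paths of length 2) that are closed into triangles."""
    neighbours = [set() for _ in range(n_nodes)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    triples = closed = 0
    for centre in range(n_nodes):
        # Each pair of neighbours of `centre` forms one connected triple centred there.
        for a, b in combinations(neighbours[centre], 2):
            triples += 1
            if b in neighbours[a]:
                closed += 1
    return closed / triples if triples else 0.0


two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
six_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(degree_sequence(6, two_triangles) == degree_sequence(6, six_cycle))  # True
print(transitivity(6, two_triangles), transitivity(6, six_cycle))  # 1.0 0.0
```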

In 1999 and 2000, CATI Center Ljubljana conducted a survey asking over 100,000 people which magazines and journals they read, producing a 2-mode (affiliation) network that records which respondents read which magazines. These data were then compiled into a 1-mode, undirected network of magazines: for each pair of magazines, the number of respondents who read both was counted, producing a weighted network of “coreaderships”. The dataset also classifies the magazines into 14 categories by type, topic, and audience: daily newspapers, weekly news and analysis, computers, business, home and gardening, fashion, men's interest, women's interest, special interest, women's, TV guides, regional, teen, and free. For each magazine, the total number of respondents who reported reading it was also recorded. These data are available as the Pajek dataset “Revije” or “Journals” (Batagelj and Mrvar, 2006).
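The projection from the 2-mode survey data to the weighted 1-mode coreadership network can be sketched as below. It assumes each respondent's answers are available as a set of magazine names; that input format and the function names are hypothetical (the Pajek dataset itself provides the already-projected network). A respondent who reads $m$ magazines contributes to $m(m-1)/2$ pair counts at once.

```python
from collections import Counter
from itertools import combinations


def coreadership(respondents):
    """Project a 2-mode reader-by-magazine network onto magazines.

    respondents: iterable of sets, each holding the magazines one respondent reads.
    Returns a Counter mapping sorted magazine pairs to coreader counts.
    """
    counts = Counter()
    for magazines in respondents:
        for pair in combinations(sorted(magazines), 2):
            counts[pair] += 1
    return counts


def readership(respondents):
    """Total number of respondents reading each magazine."""
    totals = Counter()
    for magazines in respondents:
        totals.update(set(magazines))
    return totals


# Toy survey with three respondents and made-up magazine names.
survey = [{"A", "B", "C"}, {"A", "B"}, {"C"}]
print(coreadership(survey))
print(readership(survey))
```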

We analyze this network to illustrate the application of our model to non-binary data, and as an example of a situation in which a fixed covariate effect can be used in conjunction with a latent cluster model.

The coreadership for each pair of magazines is a count of events (a respondent reporting that he or she reads both magazines in the pair) out of a very large number of potential events (over 100,000 respondents). Assuming that respondents answer independently and that each reads any given pair with small probability, each count is a sum of many independent, rare binary outcomes, so it is reasonable to approximate its distribution as Poisson. The model is as follows:

$$Y_{i,j} \mid \mu_{i,j} \sim \text{Poisson}(\mu_{i,j}) \tag{5}$$

$$\log(\mu_{i,j}) = \eta_{i,j} = \beta_0 - \Vert Z_i - Z_j \Vert . \tag{6}$$

Here, the latent position $Z_i$ of a magazine

Magazine-specific random sociality effects (i.e. $\delta_i$ and

$$\eta_{i,j} = \beta_0 + \beta_1 x_{1,i,j} - \Vert Z_i - Z_j \Vert ,$$

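where $x_{1,i,j}$ is a function of the numbers of readers of magazines $i$ and $j$. We would expect the number of coreaderships of a given pair of magazines to be approximately proportional to the product of their readerships, so we use $x_{1,i,j} = \log(r_i) + \log(r_j)$, where $r_i$ is the number of respondents who reported reading magazine $i$.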

This model somewhat resembles the association model of Goodman (1985), but the two specifications differ. Goodman's approach also uses category scores that are estimated from the data. However, this network cannot be treated as a contingency table, because each respondent in the original survey could name as many publications as he or she wanted, incrementing multiple coreadership counts at once.
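To make the count model concrete, the sketch below evaluates the Poisson log-likelihood of an undirected count network when the readership covariate is included in the linear predictor. It assumes $x_{1,i,j} = \log(r_i) + \log(r_j)$, the readership term that also appears in the quasi-independence model, and it omits the cluster structure on the $Z_i$ and all priors, so it covers the likelihood only and is not the `latentnet` implementation; the function names and toy inputs are hypothetical.

```python
import math


def linear_predictor(beta0, beta1, x1, z_i, z_j):
    """eta_{i,j} = beta0 + beta1 * x_{1,i,j} - ||Z_i - Z_j|| (Euclidean distance)."""
    return beta0 + beta1 * x1 - math.dist(z_i, z_j)


def poisson_log_likelihood(Y, Z, readers, beta0, beta1):
    """Log-likelihood of an undirected count network under the Poisson latent space model.

    Y: symmetric matrix of coreadership counts; Z: latent positions (tuples);
    readers: readership r_i of each magazine (assumed x_{1,i,j} = log r_i + log r_j).
    """
    n = len(Z)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            x1 = math.log(readers[i]) + math.log(readers[j])
            eta = linear_predictor(beta0, beta1, x1, Z[i], Z[j])
            y = Y[i][j]
            # log Poisson pmf with log-mean eta: y * eta - exp(eta) - log(y!)
            total += y * eta - math.exp(eta) - math.lgamma(y + 1)
    return total


# Toy 3-magazine example in a 3-dimensional latent space (made-up values).
Y = [[0, 3, 1], [3, 0, 0], [1, 0, 0]]
Z = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 1.0, 0.0)]
readers = [120, 80, 40]
print(poisson_log_likelihood(Y, Z, readers, beta0=-7.0, beta1=0.9))
```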

We found that a two-dimensional latent space could not adequately represent the structure in the data and produced no clusters. Using three dimensions, however, allowed the model to detect a fairly consistent clustering with up to 5 clusters, which successfully separates the magazine categories that exhibit within-category homophily, that is, categories whose magazines have greater-than-expected coreadership counts with one another.

To find which categories have this property, we fit a quasi-independence model without latent positions, of the following form:

$$\eta_{i,j} = \beta_0 + \beta_1 \left( \log(r_i) + \log(r_j) \right) + \sum_{k=1}^{14} \beta_{1+k} \, \mathbf{1}_{\{c_i = k \,\wedge\, c_j = k\}} ,$$

where $\eta_{i,j}$ are defined as in (5) and

We show the maximum likelihood estimates in Table 2. The estimated coefficient of $\log(r_i) + \log(r_j)$
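The quasi-independence model is an ordinary Poisson log-linear model, so its maximum likelihood estimates can be obtained by Newton-Raphson (equivalently, iteratively reweighted least squares). The plain-Python sketch below builds the design matrix for unordered magazine pairs and fits it; it is a generic illustration, not necessarily the software the authors used, and the function names are hypothetical. Categories are coded from 0 rather than 1.

```python
import math


def design_matrix(counts, readers, categories, n_categories):
    """Rows for unordered pairs (i, j): intercept, log r_i + log r_j, and one
    indicator per category that is 1 when both magazines belong to it.
    Categories are coded 0..n_categories-1 (the text numbers them 1..14)."""
    X, y = [], []
    n = len(readers)
    for i in range(n):
        for j in range(i + 1, n):
            row = [1.0, math.log(readers[i]) + math.log(readers[j])]
            row += [1.0 if categories[i] == k == categories[j] else 0.0
                    for k in range(n_categories)]
            X.append(row)
            y.append(float(counts[i][j]))
    return X, y


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_poisson_glm(X, y, max_iter=100, tol=1e-10):
    """Poisson log-linear MLE by Newton-Raphson (equivalently IRLS) with step halving."""
    p = len(X[0])

    def linear(beta):
        return [sum(b * x for b, x in zip(beta, row)) for row in X]

    def loglik(beta):  # omits the constant -sum(log y!)
        return sum(yi * e - math.exp(e) for yi, e in zip(y, linear(beta)))

    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(e) for e in linear(beta)]
        score = [sum(row[a] * (yi - m) for row, yi, m in zip(X, y, mu)) for a in range(p)]
        info = [[sum(row[a] * row[c] * m for row, m in zip(X, mu)) for c in range(p)]
                for a in range(p)]
        step = solve(info, score)
        t, current = 1.0, loglik(beta)
        # Halve the Newton step until the log-likelihood does not decrease.
        while t > 1e-8 and loglik([b + t * s for b, s in zip(beta, step)]) < current:
            t /= 2
        beta = [b + t * s for b, s in zip(beta, step)]
        if max(abs(t * s) for s in step) < tol:
            break
    return beta
```

With a coreadership matrix `Y`, readerships `r` and category codes `c` (hypothetical names), `fit_poisson_glm(*design_matrix(Y, r, c, 14))` would return the 16 coefficients in the order $\beta_0, \beta_1, \beta_2, \dots, \beta_{15}$. A category with fewer than two magazines gives an all-zero indicator column and a singular system, so such categories would need to be dropped first.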

The most informative fit in 3 dimensions was obtained using a 6-cluster model. After the MCMC draws were relabeled to deal with label switching, as recommended by Stephens (2000), one of the clusters was not the plurality assignment of any magazine. Including it nevertheless seemed to facilitate mixing: fitting a model with 5 clusters resulted in only 4 non-empty clusters. The estimated positions (more precisely, their first two principal components) and their clustering are shown in Figure 7. The clustering is not very strong, in the sense that for many of the magazines no single cluster is chosen by a clear majority of iterations. However, it does detect some of the categories.

Positions and estimated clusters of magazine coreaderships. The first two principal components of the 3-dimensional fit are plotted. Only those edges with the highest coreadership after adjusting for readership are plotted.
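The plurality assignment and the empty cluster described above can be illustrated with the following sketch. It uses a deliberately simple relabeling, permuting each draw's labels to agree as closely as possible with the first draw, which is a swapped-in stand-in rather than the Stephens (2000) algorithm used in the paper; brute force over all $K!$ permutations is only practical for small $K$ (720 permutations for $K = 6$). A node whose plurality share is well below 1 is one for which no cluster has a clear majority of draws. The function names and toy draws are hypothetical.

```python
from collections import Counter
from itertools import permutations


def relabel_to_reference(draw, reference, n_clusters):
    """Permute one draw's labels to maximise agreement with a reference labelling.

    Simplified stand-in for label-switching correction, not Stephens' (2000) method.
    """
    best_perm, best_agreement = None, -1
    for perm in permutations(range(n_clusters)):  # perm[old_label] = new_label
        agreement = sum(perm[z] == ref for z, ref in zip(draw, reference))
        if agreement > best_agreement:
            best_perm, best_agreement = perm, agreement
    return [best_perm[z] for z in draw]


def plurality_summary(draws, n_clusters):
    """Plurality label per node, its share of draws, and clusters assigned to no node."""
    reference = draws[0]
    aligned = [relabel_to_reference(d, reference, n_clusters) for d in draws]
    labels, shares = [], []
    for i in range(len(reference)):
        label, count = Counter(d[i] for d in aligned).most_common(1)[0]
        labels.append(label)
        shares.append(count / len(aligned))
    empty = [k for k in range(n_clusters) if k not in labels]
    return labels, shares, empty


# The second draw is the first with labels 0 and 1 swapped (label switching).
draws = [[0, 0, 1, 1, 1, 1], [1, 1, 0, 0, 0, 0], [0, 0, 1, 1, 0, 1]]
labels, shares, empty = plurality_summary(draws, n_clusters=3)
print(labels, empty)  # [0, 0, 1, 1, 1, 1] [2]
```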

The cross-tabulation between the clustering and the known categories is given in Table 3. Within each of the categories with very high homophily coefficients (Computers and Fashion), all magazines were assigned to the same cluster, and the MCMC sampler usually placed them in the same cluster. Men's Interest and Teen magazines also had high coefficients and tended to be sorted into the same clusters, though less consistently. Women's Interest magazines, on the other hand, were not grouped together to the same extent, despite their high coefficient. Categories with small or negative homophily coefficients tended to be spread across clusters. All this suggests that the clustering model successfully detects classes of magazines and target audiences.
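A cross-tabulation like Table 3 can be computed from the plurality cluster labels and the known categories as below. The "concentration" summary, the share of a category's magazines that fall in its most common cluster, is an illustrative statistic not reported in the paper, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def cross_tabulate(categories, clusters):
    """Counts of magazines by (known category, estimated cluster)."""
    counts = Counter(zip(categories, clusters))
    rows = sorted(set(categories))
    cols = sorted(set(clusters))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def category_concentration(categories, clusters):
    """Share of each category's magazines that fall in that category's most common cluster."""
    rows, _, table = cross_tabulate(categories, clusters)
    return {r: max(row) / sum(row) for r, row in zip(rows, table)}


cats = ["Computers", "Computers", "Fashion", "Fashion", "Teen", "Teen"]
clus = [1, 1, 2, 2, 1, 2]
print(cross_tabulate(cats, clus))
print(category_concentration(cats, clus))
```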

In this example actor degree effects are observed directly rather than being inferred, and are modeled as fixed rather than random. This example shows the usefulness of this class of models for detecting clusters in networks with weighted edges. This network's clusters, while meaningful, are not as clear-cut as in the other examples. We found that in this situation, the sampling algorithm may effectively use one of the clusters to facilitate detecting the others.

We have introduced an extension of the latent space model of Hoff et al. (2002) and the latent position cluster model of HRT that also represents heterogeneity in actors' sociality levels, either through random effects or through fixed covariates. It gave satisfactory fits to two real network datasets, one with binary data recording the presence or absence of relationships and one with count data. We also applied our method to two simulated networks with the same, highly skewed degree distribution but very different network behavior: one with transitivity and clustering and the other without. Currently popular methods based only on the degree distribution cannot distinguish between such different kinds of networks, but our model was able to do so.

For directed data we have limited ourselves to modeling the two random effects of each individual as uncorrelated. Hoff (2005) and van Duijn et al. (2004) modeled the sender and receiver effects for the same individual as correlated, using a bivariate normal with a Wishart prior. This would be an obvious further extension to the latent cluster random effects model.
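The extension mentioned above replaces independent sender and receiver effects by a bivariate normal pair per actor. The sketch below only simulates such pairs for a given covariance, using the 2×2 Cholesky factor, and plugs them into a directed linear predictor of the form $\beta_0 + \delta_i + \gamma_j - \Vert Z_i - Z_j \Vert$. The symbol $\gamma_j$ for the receiver effect, the function names, and all parameter values are assumptions for illustration, and no prior (Wishart or otherwise) is included.

```python
import math
import random


def correlated_effects(n_actors, sd_sender, sd_receiver, rho, seed=0):
    """Draw (delta_i, gamma_i) from a bivariate normal with mean 0, given SDs and correlation rho."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_actors):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of [[s1^2, rho*s1*s2], [rho*s1*s2, s2^2]] applied to (z1, z2).
        delta = sd_sender * z1
        gamma = sd_receiver * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        effects.append((delta, gamma))
    return effects


def directed_eta(beta0, effects, Z, i, j):
    """Linear predictor for the tie i -> j: sender effect of i plus receiver effect of j."""
    return beta0 + effects[i][0] + effects[j][1] - math.dist(Z[i], Z[j])


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


effects = correlated_effects(20000, 1.0, 2.0, 0.6)
print(round(sample_correlation(effects), 2))  # should be near 0.6
```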

One problem we have not addressed here is that of choosing the number of groups and the latent space dimension. This can be done by recasting the problem as one of statistical model selection and using Bayesian model selection to solve it. HRT did this for choosing the number of groups in their latent position cluster model, Oh and Raftery (2001) did so for choosing the dimension of the latent space for a related Bayesian multidimensional scaling model, and Oh and Raftery (2007) did this for choosing both the number of groups and the latent space dimension simultaneously in model-based clustering for dissimilarities. This work could be adapted and extended to the latent cluster random effects model.

We have used a Euclidean distance for our latent social space, but this is not the only possible measure on which to base the model. In particular, Hoff, Raftery, and Handcock (2002) and Hoff (2005) used an inner product, which has certain advantages. Schweinberger and Snijders (2003) proposed using an ultrametric distance.
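To show concretely how the latent term changes when the distance is replaced by an inner product, the sketch below compares $-\Vert Z_i - Z_j \Vert$ with $Z_i^\top Z_j$ for two points in the plane. One easily checked difference: the distance term is unchanged when all positions are translated by the same vector, whereas the inner product is not, so the two choices constrain the latent space differently. This is an illustration only and does not reproduce the exact parameterizations of the cited papers; the function names are hypothetical.

```python
import math


def distance_term(z_i, z_j):
    """Latent term of the distance model: -||Z_i - Z_j||."""
    return -math.dist(z_i, z_j)


def inner_product_term(z_i, z_j):
    """Latent term of an inner-product model: Z_i . Z_j."""
    return sum(a * b for a, b in zip(z_i, z_j))


def shift(z, offset):
    """Translate a position by a fixed offset vector."""
    return tuple(a + b for a, b in zip(z, offset))


z_i, z_j, offset = (1.0, 0.0), (0.0, 2.0), (3.0, -1.0)
print(distance_term(z_i, z_j), distance_term(shift(z_i, offset), shift(z_j, offset)))
print(inner_product_term(z_i, z_j), inner_product_term(shift(z_i, offset), shift(z_j, offset)))  # 0.0 11.0
```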

While we provide a reasonable heuristic for our choice of hyperparameters, the heuristic itself is a result of experimentation, and it would be desirable to have a more principled way of choosing the hyperparameters. One possibility would be to fit a logit model with node-specific effects, and then use the variances of these effects to obtain an empirical-Bayes-type prior.

- Amaral LAN, Scala A, Barthelemy M, Stanley HE. Classes of small-world networks. Proceedings of the National Academy of Sciences of the United States of America. 2000;97:11149–11152.
- Batagelj V, Mrvar A. Pajek datasets [WWW document] 2006. Available at: URL http://vlado.fmf.uni-lj.si/pub/networks/data/
- Breiger RL, Boorman SA, Arabie P. An algorithm for clustering relational data with application to social network analysis and comparison with multidimensional scaling. Journal of Mathematical Psychology. 1975;12:328–383.
- Diebolt J, Robert CP. Bayesian estimation of finite mixture distributions. Journal of the Royal Statistical Society, Series B. 1994;56:363–375.
- Fowlkes EB, Mallows CL. A method for comparing two hierarchical clusterings. Journal of the American Statistical Association. 1983;78:553–569.
- Fraley C, Raftery AE. Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association. 2002;97:611–631.
- Frank O, Strauss D. Markov graphs. Journal of the American Statistical Association. 1986;81:832–842.
- Goodman LA. The analysis of cross-classified data having ordered and/or unordered categories: association models, correlation models, and asymmetry models for contingency tables with or without missing entries. The Annals of Statistics. 1985;13:10–69.
- Handcock MS, Hunter DR, Butts CT, Goodreau SM, Morris M. Statnet Project; Seattle, WA: 2003b. statnet: software tools for the statistical modeling of network data, version 2.0. Available at: URL http://statnetproject.org, URL http://CRAN.R-project.org/package=statnet.
- Handcock MS, Jones JH. Likelihood-based inference for stochastic models of sexual network formation. Theoretical Population Biology. 2004;65:413–422.
- Handcock MS, Morris M. A simple model for complex networks with arbitrary degree distribution and clustering. In: Airoldi EM, editor. Workshop on Statistical Network Analysis, ICML 2006; Pittsburgh, USA. June 29, 2006. Vol. 4503 of Lecture Notes in Computer Science. Springer; 2007. pp. 103–114.
- Handcock MS, Raftery AE, Tantrum JM. Model-based clustering for social networks (with discussion). Journal of the Royal Statistical Society, Series A. 2007;170:301–354.
- Hoff PD, Raftery AE, Handcock MS. Latent space approaches to social network analysis. Journal of the American Statistical Association. 2002;97:1090–1098.
- Hoff PD. Random effects models for network data. In: Breiger R, Carley K, Pattison P, editors. Dynamic Social Network Modeling and Analysis. Vol. 126. Committee on Human Factors, Board on Behavioral, Cognitive, and Sensory Sciences, National Academy Press; Washington, DC: 2003. pp. 302–322.
- Hoff PD. Bilinear mixed-effects models for dyadic data. Journal of the American Statistical Association. 2005;100:286–295.
- Jones JH, Handcock MS. An assessment of preferential attachment as a mechanism for human sexual network formation. Proceedings of the Royal Society of London, B. 2003;270:1123–1128.
- Krivitsky PN, Handcock MS. Fitting position latent cluster models for social networks with `latentnet`. Journal of Statistical Software. 2008. Available at: URL http://www.jstatsoft.org/v24/i02/
- Krivitsky PN, Handcock MS. `latentnet`: Latent position and cluster models for statistical networks, version 2.2. 2008. Available at: URL http://statnetproject.org, URL http://CRAN.R-project.org/package=latentnet.
- Newman MEJ. Spread of epidemic disease on networks. Physical Review E. 2002;66:016128.
- Newman MEJ. The structure and function of complex networks. SIAM Review. 2003;45:167–256.
- Oh MS, Raftery AE. Bayesian multidimensional scaling and choice of dimension. Journal of the American Statistical Association. 2001;96:1031–1044.
- Oh MS, Raftery AE. Model-based clustering with dissimilarities: A Bayesian approach. Journal of Computational and Graphical Statistics. 2007;16 (to appear).
- Raftery AE, Lewis SM. Implementing MCMC. In: Gilks WR, Spiegelhalter DJ, Richardson S, editors. Markov Chain Monte Carlo in Practice. Chapman and Hall; London: 1996. pp. 115–130.
- R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. 2008. Available at: URL http://www.R-project.org.
- Sampson SF. Crisis in a cloister. PhD thesis. Cornell University; 1969.
- Schweinberger M, Snijders TAB. Settings in social networks: A measurement model. Sociological Methodology. 2003;33:307–341.
- Shortreed S, Handcock MS, Hoff PD. Positional estimation within the latent space model for networks. Methodology. 2006;2:24–33.
- Stephens M. Dealing with label-switching in mixture models. Journal of the Royal Statistical Society, Series B, Methodological. 2000;62:795–809.
- van Duijn MAJ, Snijders TAB, Zijlstra BH. $p_2$: A random effects model with covariates for directed graphs. Statistica Neerlandica. 2004;58:234–254.
- White HC, Boorman SA, Breiger RL. Social structure from multiple networks. I. Blockmodels of roles and positions. American Journal of Sociology. 1976;81:730–780.
