PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of plosonePLoS OneView this ArticleSubmit to PLoSGet E-mail AlertsContact UsPublic Library of Science (PLoS)
 
PLoS One. 2010; 5(4): e10012.
Published online 2010 April 8. doi:  10.1371/journal.pone.0010012
PMCID: PMC2851615

Efficient and Exact Sampling of Simple Graphs with Given Arbitrary Degree Sequence

Fabio Rapallo, Editor

Abstract

Uniform sampling from graphical realizations of a given degree sequence is a fundamental component in simulation-based measurements of network observables, with applications ranging from epidemics, through social networks to Internet modeling. Existing graph sampling methods are either link-swap based (Markov-Chain Monte Carlo algorithms) or stub-matching based (the Configuration Model). Both types are ill-controlled, with typically unknown mixing times for link-swap methods and uncontrolled rejections for the Configuration Model. Here we propose an efficient, polynomial time algorithm that generates statistically independent graph samples with a given, arbitrary, degree sequence. The algorithm provides a weight associated with each sample, allowing the observable to be measured either uniformly over the graph ensemble, or, alternatively, with a desired distribution. Unlike other algorithms, this method always produces a sample, without back-tracking or rejections. Using a central limit theorem-based reasoning, we argue, that for large An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e001.jpg, and for degree sequences admitting many realizations, the sample weights are expected to have a lognormal distribution. As examples, we apply our algorithm to generate networks with degree sequences drawn from power-law distributions and from binomial distributions.

Introduction

Network representation has become an increasingly widespread methodology of analysis to gain insight into the behavior of complex systems, ranging from gene regulatory networks to human infrastructures such as the Internet, power-grids and airline transportation, through metabolism, epidemics and social sciences [1][4]. These studies are primarily data driven, where connectivity information is collected, and the structural properties of the resulting graphs are analyzed for modeling purposes. However, rather frequently, full connectivity data is unavailable, and the modeling has to resort to considerations on the class of graphs that obeys the available structural data. A rather typical situation is when the only information available about the network is the degree sequence of its nodes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e002.jpg. For example, in epidemiology studies of sexually transmitted diseases [5], anonymous surveys may only collect the number of sexual partners of a person in a given period of time, not their identity. Epidemiologists are then faced with constructing a typical contact graph having the observed degree sequence, on which disease spread scenarios can be tested. Another reason for studying classes or ensembles of graphs obeying constraints comes from the fact that the network structure of many large-scale real-world systems is not the result of a global design, but of complex dynamical processes with many stochastic elements. Accordingly, a statistical mechanics approach [1] can be employed to characterize the collective properties of the system emerging from its node level (microscopic) properties. In this approach, statistical ensembles of graphs are defined [6], [7], representing “connectivity microstates” from which macroscopic system level properties are inferred via averaging. Here we focus on the degree as a node characteristic, which could represent, for example, the number of friends of a person, the valence of an atom in a chemical compound, the number of clients of a router, etc.

In spite of its practical importance, finding a method to construct degree-based graphs in a way that allows the corresponding graph ensemble to be properly sampled has been a long-standing open problem in the network modeling community (references using various approaches are given below). Here we present a solution to this problem, using a biased sampling approach. We consider degree-based graph ensembles on two levels: 1) sequence-level, where a specific sequence of degrees is given, and 2) distribution level, where the sequences are themselves drawn from a given degree distribution An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e003.jpg. In the remainder we will focus on the fundamental case of labeled, undirected simple graphs. In a simple graph any link connects a single pair of distinct nodes and self loops and multiple links between the same pair of nodes are not allowed. Without loss of generality, consider a sequence of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e004.jpg positive integers An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e005.jpg, arranged in non-increasing order: An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e006.jpg. If there is at least one simple graph An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e007.jpg with degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e008.jpg, the sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e009.jpg is called a graphical sequence and we say that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e010.jpg realizes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e011.jpg. Note that not every sequence of positive integers can be realized by simple graphs. For example, there is no simple graph with degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e012.jpg or An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e013.jpg, while the sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e014.jpg can obviously be realized by a simple graph. In general, if a sequence is graphical, then there can be several graphs having the same degree sequence. Also note that given a graphical sequence, the careless or random placing of links between the nodes may not result in a simple graph.

Recently, a direct, swap-free method to systematically construct all the simple graphs realizing a given graphical sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e015.jpg was presented [8]. However, in general (for exceptions see Ref. [9]), the number of elements of the set An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e016.jpg of all graphs that realize sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e017.jpg, increases very quickly with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e018.jpg: a simple upper bound is provided by the number of all graphs with sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e019.jpg, allowing for multiple links and loops: An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e020.jpg. Thus, typically, systematically constructing all graphs with a given sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e021.jpg is practical only for short sequences, such as when determining the structural isomers of alkanes [8]. For larger sequences, and in particular for modeling real-world complex networks, it becomes necessary to sample An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e022.jpg. Accordingly, several variants based on the Markov Chain Monte Carlo (MCMC) method were developed. They use link-swaps [10] (“switches”) to produce pseudo-random samples from An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e023.jpg. Unfortunately, most of them are based on heuristics, and apart from some special sequences, little has been rigorously shown about the methods' mixing time, and accordingly they are ill-controlled. The literature on such MCMC methods is simply too extensive to be reviewed here, instead, we refer the interested reader to Refs. [11][13] and the references therein. Finally, we recall the main swap-free method producing uniform random samples from An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e024.jpg, namely the configuration model (CM) [14][17]. This method picks a pair of nodes uniformly at random and connects them, until a rejection occurs due to a double link or a self-loop, in which case it restarts from the very beginning. For this reason, the CM can become very slow, as shown in the Discussion section. The CM has inspired approximation methods as well [18] and methods that construct random graphs with given expected degrees [19].

Here, by developing new results from the theorems in Ref. [8], we present an efficient algorithm that solves this fundamental graph sampling problem, and it is exact in the sense that it is not based on any heuristics. Given a graphical sequence, the algorithm always finishes with a simple graph realization in polynomial time, and it is rejection free. While the samples obtained are not uniformly generated, the algorithm also provides the exact weight for each sample, which can then be used to produce averages of arbitrary graph observables measured uniformly, or following any given distribution over An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e025.jpg.

Methods

Mathematical foundations

Before introducing the algorithm, we state some results that will be useful later on. We begin with the Erdös-Gallai (EG) theorem [20], which is a fundamental result that allows us to determine whether a given sequence of non-negative integers, called “degree sequence” hereafter, is graphical.

Theorem 1 (Erdö-Gallai)

A non-increasing degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e026.jpg is graphical if and only if their sum is even and, for all An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e027.jpg:

equation image
(1)

A necessary and sufficient condition for the graphicality of a degree sequence, which is constrained from having links between some node and a “forbidden set” of other nodes is given by the star-constrained graphicality theorem [8]. In this case the forbidden links are all incident on one node and thus form a “star”. To state the theorem, we first define the “leftmost adjacency set” of a node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e029.jpg with degree An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e030.jpg in a degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e031.jpg as the set consisting of the An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e032.jpg nodes with the largest degrees that are not in the forbidden set. If An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e033.jpg is non-increasing, then the nodes in the leftmost adjacency set are the first An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e034.jpg nodes in the sequence that are not in the forbidden set. The forbidden set could represent nodes that are either already connected to An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e035.jpg, and thus subsequent connections to them are forbidden, or just imposed arbitrarily. Using this definition, the theorem is:

Theorem 2 (Star-constrained graphical sequences)

Let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e036.jpg be a non-increasing graphical degree sequence. Assume there is a set of forbidden links incident on a node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e037.jpg. Then a simple graph avoiding the forbidden links can be constructed if and only if a simple graph can be constructed where An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e038.jpg is connected to all the nodes in its leftmost adjacency set.

A direct consequence [8] of Theorem 2 for the case of an empty forbidden set is the well-known Havel-Hakimi result [21], [22], which in turn implies:

Corollary 1

Let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e039.jpg be a non-increasing unconstrained graphical degree sequence. Then, given any node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e040.jpg, there is a realization of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e041.jpg that includes a link between the first node and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e042.jpg.

Another result we exploit here is Lemma 3 of Ref. [8], extended to star-constrained sequences:

Lemma 1

Let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e043.jpg be a graphical sequence, possibly with a star constraint incident on node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e044.jpg. Let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e045.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e046.jpg be distinct nodes not in the forbidden set and different from An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e047.jpg, such that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e048.jpg. Then An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e049.jpg is also a graphical sequence with the same star constraint.

Proof. Let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e050.jpg denote the set of nodes forbidden to connect to node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e051.jpg. Since An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e052.jpg is star-constrained graphical there is a simple graph An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e053.jpg realizing the sequence with no connections between An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e054.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e055.jpg. Since An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e056.jpg, there is a node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e057.jpg to which An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e058.jpg is connected but An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e059.jpg is not. Note that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e060.jpg could be in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e061.jpg. Now cut the edge An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e062.jpg of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e063.jpg creating a stub at An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e064.jpg and another at An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e065.jpg. Remove the stub at An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e066.jpg so that its degree becomes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e067.jpg, and add a stub at An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e068.jpg so that its degree becoming An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e069.jpg. Since there are no connections in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e070.jpg between An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e071.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e072.jpg, connect the two stubs at these nodes creating a simple graph An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e073.jpg thus realizing An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e074.jpg. Clearly there are still no connections between An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e075.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e076.jpg in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e077.jpg, and thus An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e078.jpg is also star-constrained graphical.

Finally, using Lemma 1 and Theorem 2, we prove:

Theorem 3

Let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e079.jpg be a degree sequence, possibly with a star-constraint incident on node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e080.jpg, and let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e081.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e082.jpg be two nodes with degrees such that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e083.jpg that are not constrained from linking to node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e084.jpg. If the residual degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e085.jpg obtained from An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e086.jpg by reducing the degrees at An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e087.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e088.jpg by unity is not graphical, then the degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e089.jpg obtained from An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e090.jpg by reducing the degrees at An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e091.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e092.jpg by unity is also not graphical.

Proof. By definition, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e093.jpg for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e094.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e095.jpg, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e096.jpg; An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e097.jpg for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e098.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e099.jpg, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e100.jpg. We consider An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e101.jpg, however, the proof is not affected by this assumption. By assumption, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e102.jpg is not graphical. Using proof by contradiction, assume that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e103.jpg is graphical. Clearly, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e104.jpg, and thus we can apply Lemma 1 on this sequence. As a result, the sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e105.jpg, that is exactly An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e106.jpg is graphical, a contradiction.

Note that if a sequence is non-graphical, then it is not star-constrained graphical either, and thus Theorem 3 is in its strongest form.

Biased sampling

The sampling algorithm described below is ergodic in the sense that every possible simple graph with the given finite degree sequence is generated with non-zero probability. However, it does not generate the samples with uniform probability; the sampling is biased. Nevertheless, the algorithm can be used to compute network observables that are unbiased, by appropriately weighing the averages measured from the samples. According to a well known principle of biased sampling [23],[24], if the relative probability of generating a particular sample An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e107.jpg is An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e108.jpg, then an unbiased estimator for an observable An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e109.jpg measured from a set of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e110.jpg randomly generated samples An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e111.jpg is the weighted average

equation image
(2)

where the weights are An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e113.jpg, and the denominator is a normalization factor. The key to this method is to find the appropriate weight An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e114.jpg to associate with each sample. Note that in addition to uniform sampling, it is in fact possible to sample with any arbitrary distribution by choosing an appropriate set of sample weights.

Results

The algorithm

Let An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e115.jpg be a non-increasing graphical sequence. We wish to sample the set An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e116.jpg of graphs that realize this sequence. The graphs can be systematically constructed by forming all the links involving each node. To do so, begin by choosing the first node in the sequence as the “hub” node and then build the set of the “allowed nodes” An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e117.jpg that can be connected to it. An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e118.jpg contains all the nodes that can be connected to the hub such that if a link is placed between the hub and a node from An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e119.jpg, then a simple graph can still be constructed, thus preserving graphicality. Choose uniformly at random a node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e120.jpg, and place a link between An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e121.jpg and the hub. If An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e122.jpg still has “stubs”, i.e. remaining links to be placed, then add it to the set of “forbidden nodes” An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e123.jpg that contains all the nodes which can't be linked anymore to the hub node and which initially contains only the hub; otherwise, if An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e124.jpg has no more stubs to connect, then remove it from further consideration. Repeat the construction of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e125.jpg and link the hub with one of its randomly chosen elements until the stubs of the hub are exhausted. Then remove the hub from further consideration, and repeat the whole procedure until all the links are made and the sample construction is complete. Each time the procedure is repeated, the degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e126.jpg considered is the “residual degree sequence”, that is the original degree sequence reduced by the links that have previously been made, and with any zero residual degree node removed from the sequence. Then, choose a new hub, empty the set of forbidden nodes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e127.jpg and add the new hub to it. It is convenient, but not necessary, to choose the new hub to be a node with maximum degree in the residual degree sequence.

The sample weights needed to obtain unbiased estimates using Eq. 2 are the inverse relative probabilities of generating the particular samples. If in the course of the construction of the sample An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e128.jpg different nodes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e129.jpg are chosen as the hub and they have An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e130.jpg residual degrees when they are chosen, then this sample weight can be computed by first taking the product of the sizes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e131.jpg of the allowed sets An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e132.jpg constructed, then dividing this quantity by a combinatorial factor which is the product of the factorials of the residual degrees of each hub:

equation image
(3)

The weight accounts for the fact that at each step the hub node has An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e134.jpg nodes it can be linked to, which is the size of the allowed set at that point, and that the number of equivalent ways to connect the residual stubs of a new hub is An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e135.jpg. Note that it is always true that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e136.jpg, with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e137.jpg occurring for sequences for which there is only one possible graph.

Building the allowed set

The most difficult step in the sampling algorithm is to construct the set of allowed nodes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e138.jpg. In order to do so first note that Theorem 3 implies that if a non-forbidden node, that is a node not in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e139.jpg, can be added to An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e140.jpg, then all non-forbidden nodes with equal or higher degree can also be added to An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e141.jpg. Conversely, if it is determined that a non-forbidden node cannot be added to An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e142.jpg, then all nodes with equal or smaller degree also cannot be added to An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e143.jpg. Therefore, referring to the degrees of nodes that cannot be added to An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e144.jpg as “fail-degrees”, the key to efficiently construct An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e145.jpg is to determine the maximum fail-degree, if fail-degrees exist.

The first time An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e146.jpg is constructed for a new hub, according to Corollary 1, there is no fail-degree and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e147.jpg consists of all the other nodes. However, constructing An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e148.jpg becomes more difficult once links have been placed from the hub to other nodes. In this case, to find the maximum fail-degree note that at any step during the construction of a sample the residual sequence being used is graphical. Then, since according to Theorem 2 any connection to the leftmost adjacency set of the hub preserves graphicality, it follows from Theorem 3 that any fail-degree has to be strictly less than the degree of any node in the leftmost adjacency set of the hub.

If there are non-forbidden nodes in the residual degree sequence that have degree less than any in its leftmost adjacency set, then the maximum fail-degree can be found with a procedure that exploits Theorem 2. In particular, if the hub is connected to a node with a fail-degree, then, by Theorem 2, even if all the remaining links from the hub were connected to the remaining nodes in the leftmost adjacency set, the residual sequence will not be graphical. Our method to find fail-degrees, given below, is based on this argument.

Begin by constructing a new residual sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e149.jpg by temporarily assuming that links exist between the hub and all the nodes in its leftmost adjacency set except for the last one, which has the lowest degree in the set. The nodes temporarily linked to the hub should also be temporarily added to the set of forbidden nodes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e150.jpg. The nodes in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e151.jpg should be ordered so that it is non-increasing, that forbidden nodes appear before non-forbidden nodes of the same degree, and that the hub, which now has residual degree 1, is last.

At this point, in principle one could find the maximum fail degree by systematically connecting the last link of the hub with non-forbidden nodes of decreasing degree, and testing each time for graphicality using Theorem 1. If it is not graphical then the degree of the last node connected to the hub is a fail-degree, and the node with the largest degree for which this is true will have the maximum fail-degree. However, this procedure is inefficient because each time a new node is linked with the hub the residual sequence changes and every new sequence must be tested for graphicality.

A more efficient procedure to find the maximum fail-degree instead involves only testing the sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e152.jpg. To see how this can be done, note that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e153.jpg is a graphical sequence, by Theorem 2. Thus, by Theorem 1, for all relevant values of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e154.jpg, the left hand side of Inequality 1, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e155.jpg, and the right hand side of it, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e156.jpg, satisfy An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e157.jpg. Furthermore, for the purposes of finding fail-degrees it is sufficient to consider linking the final stub of the hub with only the last non-forbidden node of a given degree, if any exists. After any such link is made, the resulting degree-sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e158.jpg will be non-increasing, and thus Theorem 1 can be applied to test it for graphicality. Therefore, if the degree of the node connected with the last stub of the hub is a fail-degree, then Inequality 1 for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e159.jpg must fail for some An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e160.jpg. For each An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e161.jpg, the possible differences in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e162.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e163.jpg between An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e164.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e165.jpg are as follows. An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e166.jpg is always reduced by 1 because the residual degree of the hub is reduced from 1 to 0. An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e167.jpg may be reduced by an another factor of 1 if the last node connected to the hub, having index An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e168.jpg and degree An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e169.jpg, is such that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e170.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e171.jpg. An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e172.jpg is reduced by 1 if An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e173.jpg, otherwise it is unchanged.

Considering these conditions that can cause Inequality 1 to fail for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e174.jpg, the set of allowed nodes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e175.jpg can be constructed with the following algorithm that requires only testing An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e176.jpg. Starting with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e177.jpg, compute the values of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e178.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e179.jpg for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e180.jpg. There are three possible cases: (1) An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e181.jpg, (2) An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e182.jpg, and (3) An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e183.jpg. In case (1) fail-degrees occur whenever An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e184.jpg is unchanged by making the final link to the hub. Thus, the degree of the first non-forbidden node whose index is greater than An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e185.jpg is the largest fail-degree found with this value of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e186.jpg. In case (2) fail-degrees occur whenever An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e187.jpg is unchanged and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e188.jpg is reduced by 2 by making the final link to the hub. Thus, the degree of the first non-forbidden node whose index is greater than An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e189.jpg and whose degree is less than An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e190.jpg is the largest fail-degree found with this value of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e191.jpg. In case (3) no fail-degree can be found with this value of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e192.jpg. Repeat this process sequentially increasing An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e193.jpg, until all the relevant An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e194.jpg values have been considered, then retain the maximum fail-degree. It can be shown that the algorithm can be stopped either after a case (1) occurs, or after An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e195.jpg where An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e196.jpg is the lowest index of any node in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e197.jpg with degree An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e198.jpg. Once the maximum fail-degree is found, remove the nodes that were temporarily added to An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e199.jpg and construct An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e200.jpg by including all non-forbidden nodes of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e201.jpg with a higher degree. If no fail-degree is ever found, then all non-forbidden nodes of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e202.jpg are included in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e203.jpg. An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e204.jpg will always include the leftmost adjacency set of the hub and any non-forbidden nodes of equal degree.

Note that after a link is placed in the sample construction process, the residual degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e205.jpg changes, and therefore, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e206.jpg has to be determined every time.

Implementing the Erdös-Gallai test

Finally, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e207.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e208.jpg should be calculated efficiently. Calculating the sums that comprise them for each new value of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e209.jpg can be computationally intensive, especially for long sequences. Even computing them only for as many distinct terms as there are in the sequence, as suggested in Ref. [25], can still become slow if the degree distribution is not quickly decreasing. Instead, it is much more efficient to use recurrence relations to calculate them.

A recurrence relation for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e210.jpg is simply

equation image
(4)

with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e212.jpg.

For non-increasing degree sequences, define the “crossing-index” An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e213.jpg for each An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e214.jpg as the index of first node that has degree less than An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e215.jpg, that is for which An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e216.jpg for all An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e217.jpg. If no such index exists, such as for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e218.jpg since the minimum degree of any node in the sequence is 1, then set An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e219.jpg. Then, a recurrence relation for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e220.jpg is

equation image
(5)

where An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e222.jpg is a discrete equivalent of the Heaviside function, defined to be 1 on positive integers and 0 otherwise, and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e223.jpg. Or, since the crossing-index can not increase with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e224.jpg, that is An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e225.jpg for all An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e226.jpg, a value An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e227.jpg will exist for which An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e228.jpg for all An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e229.jpg, and so Eq. 5 can be written

equation image
(6)

Thus, there is no need to find An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e231.jpg for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e232.jpg.

Using Eqs. 4 and 6, the mechanism of the calculation of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e233.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e234.jpg at sequential values of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e235.jpg is shifted from a slow repeated calculation of sums of many terms to the much less computationally intensive task of calculating the recurrence relations. In order to perform the test efficiently, a table of the values of crossing-index An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e236.jpg for each relevant An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e237.jpg can be created as An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e238.jpg is constructed.

It should be noted that the usefulness of this method for calculating An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e239.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e240.jpg is broader than its use for calculating fail-degrees in our sampling algorithm. In particular, it can be used in an Erdös-Gallai test to efficiently determine whether a degree-sequence is graphical.

Sample weights

As previously stated, the weight An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e241.jpg associated with a particular sample, given by Eq. 3, is the product of the sizes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e242.jpg of all the sets of allowed nodes that have been built for each hub node An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e243.jpg divided by the product of the factorials of the initial residual degrees of each hub node. The logarithm of this weight is

equation image
(7)

Generally, degree sequences with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e245.jpg admit many graphical realizations. When this is true, each of the An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e246.jpg terms in square brackets in Eq. 7 are effectively random and independent, and, by virtue of the central limit theorem, their sum will be normally distributed. That is, the weight An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e247.jpg of graph samples generated from a given degree sequence with large An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e248.jpg is typically log-normally distributed. However, degree sequences with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e249.jpg that have only a small number of realizations do exist, and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e250.jpg is not expected to be log-normally distributed for those sequences.

Furthermore, one can consider not just samples of a particular graphical sequence, but of an ensemble of sequences. By a similar argument to that given above for individual sequences, the weight An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e251.jpg of graph samples generated from an ensemble of sequences will also typically be log-normally distributed in the limit of large An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e252.jpg. For example, consider an ensemble of sequences of randomly chosen power-law distributed degrees, that is, sequences of random integers chosen from a probability distribution An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e253.jpg. Hereafter, we refer to such sequences as “power-law sequences.” Figure 1 shows the probability distribution of the logarithm of weights for realizations of power-law sequences with exponent An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e254.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e255.jpg. Note that this distribution is well approximated by a Gaussian fit.

Figure 1
Probability distribution An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e256.jpg of the logarithm of weights for an ensemble of power-law sequences with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e257.jpg and An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e258.jpg.

We have also studied the behavior of the mean and the standard deviation of the probability distribution of the logarithm of the weights of such power-law sequences as a function of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e262.jpg. As shown in Fig. 2, they scale as a power-law. We have found qualitatively similar results, including power-law scaling of the growth of the mean and variance of the distribution of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e263.jpg, for binomially distributed degree sequences that correspond to those of Erdös-Renyi random graphs with node connection probability An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e264.jpg such that An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e265.jpg, and for uniformly distributed degree sequences, that is power-law sequences with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e266.jpg, with an upper limit, or cutoff, of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e267.jpg for the degree of a node. However, for uniformly distributed degree sequences without an imposed upper limit on node degrees, we find that the sample weights are not log-normally distributed.

Figure 2
Mean An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e268.jpg and standard deviation An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e269.jpg of the distributions of the logarithm of the weights vs. number of nodes An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e270.jpg of samples from an ensemble of power-law sequences with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e271.jpg.

Complexity

In this section we discuss the algorithm's computational complexity. We first provide an upper bound on the worst case complexity, given a degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e281.jpg. Then, using extreme value arguments, we conservatively estimate the average case complexity for degree sequences of random integers chosen from a distribution An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e282.jpg. The latter is useful for realistically estimating the computational costs for sampling graphs from ensembles of long sequences.

To determine an upper bound on the worst case complexity for constructing a sample from a given degree sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e283.jpg, recall that the algorithm connects all the stubs of the current hub node before it moves on to the hub node of the new residual sequence. For every stub from the hub one must construct the allowed set An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e284.jpg. The algorithm for constructing An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e285.jpg, which includes constructing An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e286.jpg, performing the An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e287.jpg vs An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e288.jpg comparisons, and determining the maximum fail-degree, can be completed in An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e289.jpg steps, where An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e290.jpg is the maximum possible number of nodes in the residual sequence after eliminating An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e291.jpg hubs from the process. Therefore, an upper bound on the worst case complexity An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e292.jpg of the algorithm given a sequence An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e293.jpg is:

equation image
(8)

where the sum involves at most An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e295.jpg terms. Equivalently, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e296.jpg, with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e297.jpg being the number of links in the graph. For simple graphs, the maximum possible number of links is An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e298.jpg, and the minimum possible number is An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e299.jpg. If An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e300.jpg, then An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e301.jpg, and if An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e302.jpg, then An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e303.jpg, which is an upper bound, independent of the sequence.

From Eq. 8, the expected complexity for the algorithm to construct a sample for a degree sequence of random integers chosen from a distribution An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e304.jpg, normalized to unity, can be conservatively estimated as

equation image
(9)

Here An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e306.jpg is the expectation value for the degree of the node with index An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e307.jpg, which is the largest degree for which the expected number of nodes with equal or larger degree is at least An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e308.jpg. That is,

equation image
(10)

Notice that the sum in the above equation runs to the maximum allowed degree in the network An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e310.jpg, which is nominally An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e311.jpg, but a different value can be imposed. For example, in the case of power-law sequences, the so-called structural cutoff of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e312.jpg is necessary if degree correlations are to be avoided [19], [26], [27]. However, such a cutoff needs to be imposed only for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e313.jpg, because the expected maximum degree An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e314.jpg in a power-law network grows like An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e315.jpg. Thus, for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e316.jpg, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e317.jpg grows no faster than An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e318.jpg and no degree correlations exist for large An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e319.jpg [28].

Given a particular form of distribution An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e320.jpg, Eq. 9 can be computed for different values of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e321.jpg. Subsequent fits of the results to a power-law function allow the order of the complexity of the algorithm to be estimated. Figure 3 shows the results of such calculations for power-law sequences with and without the structural cutoff of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e322.jpg as a function of exponent An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e323.jpg. Note that, in the absence of cutoff, the results indicate that the order of the complexity goes to a value of 3 for An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e324.jpg, that is, in the limit of a uniform degree distribution. However, if the structural cutoff is imposed the order of the complexity is only An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e325.jpg in this limit. Both these results are easily verified analytically.

Figure 3
The estimated computational complexity of the algorithm for power-law sequences.

We have tested the estimates shown in Fig. 3 with our implementation of the sampling algorithm for power-law sequences with and without the structural cutoff for certain values of An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e332.jpg, including 0, 2, and 3. This was done by measuring the actual execution times for generating samples for different An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e333.jpg and fitting the results to a power-law function. In every case, the actual order of the complexity of our implementation of the sampling algorithm was equal to or slightly less than its estimated value shown in Fig. 3.

Discussion

We have solved the long standing problem of how to efficiently and accurately sample the possible graphs of any graphical degree sequence, and of any ensemble of degree sequences. The algorithm we present for this purpose is ergodic and is guaranteed to produce an independent sample in, at most, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e334.jpg steps. Although the algorithm generates samples non-uniformly, and, thus, it is biased, the relative probability of generating each sample can be calculated explicitly permitting unbiased measurements to be made. Furthermore, because the sample weights are known explicitly, the algorithm makes it possible to sample with any arbitrary distribution by appropriate re-weighting.

It is important to note that the sampling algorithm is guaranteed to successfully and systematically proceed in constructing a graph. This behavior contrasts with that of other algorithms, such as the configuration model (CM), which can run into dead ends that require back-tracking or restarting, leading to considerable losses of time and potentially introducing an uncontrollable bias into the results. While there are classes of sequences for which it is perhaps preferable to use the CM instead of our algorithm, in other cases its performance relative to ours can be remarkably poor. For example, a configuration model code failed to produce even a single sample of a uniformly distributed graphical sequence, An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e335.jpg, with An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e336.jpg, after running for more than 24 hours, while our algorithm produced An external file that holds a picture, illustration, etc.
Object name is pone.0010012.e337.jpg samples of the very same sequence in 30 seconds. Furthermore, each sample generated by our algorithm is independent. This behavior contrasts with that of algorithms based on MCMC methods. Because our algorithm works for any graphical sequence and for any ensemble of random sequences, it allows arbitrary classes of graphs to be studied.

One of the features of our algorithm that makes it efficient is a method of calculating the left and right sides of the inequality in the Erdös-Gallai theorem using recursion relations. Testing a sequence for graphicality can thus be accomplished without requiring repeated computations of long sums, and the method is efficient even when the sequence is nearly non-degenerate. The usefulness of this method is not limited to the algorithm presented for graph sampling, but can be used anytime a fast test of the graphicality of a sequence of integers is needed.

There are now over 6000 publications focusing on complex networks. In many of these publications various processes, such as network growth, flow on networks, epidemics, etc., are studied on toy network models used as “graph representatives” simply because they have become customary to study processes on. These include the Erdös-Rényi random graph model, the Barabási-Albert preferential attachment model, the Watts-Strogatz small-world network model, random geometric graphs, etc. However, these toy models are based on specific processes that constrain their structure beyond their degree-distribution, which in turn might not actually correspond to the processes that have led to the structure of the networks investigated with them, thus potentially introducing dangerous biases in the conclusions of these studies. The algorithm presented here provides a way to study classes of simple graphs constrained solely by their degree sequence, and nothing else. However, additional constraints, such as connectedness, or any functional of the adjacency matrix of the graph being constructed, can in principle be added to the algorithm to further restrict the graph class built.

After this paper was accepted for publication, we became aware of an unpublished work by J. Blitzstein and P. Diaconis that provides another direct construction method for sampling graphs with given degree sequences.

Acknowledgments

The authors gratefully acknowledge Y. Sun, B. Danila, M. M. Ercsey Ravasz, I. Miklós, E. P. Erdös and L. A. Székely for fruitful comments, discussions and support.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: CIDG and KEB are supported by the National Science Foundation (NSF) through grant DMR-0908286 and by the Norman Hackerman Advanced Research Program through grant 95921. HK and ZT are supported in part by the NSF BCS-0826958 and by the Defense Threat Reduction Agency (DTRA) through HDTRA 201473-35045. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Albert R, Barabási AL. Statistical mechanics of complex networks. Rev Mod Phys. 2002;74:47–97.
2. Newman MEJ. The structure and function of complex networks. SIAM Rev. 2003;45:167–256.
3. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. Complex networks: Structure and dynamics. Phys Rep. 2006;424:175–308.
4. Newman MEJ, Barabási AL. The structure and dynamics of networks. Princeton University Press; 2006.
5. Liljeros F, Edling CR, Amaral L, Stanley H, Åberg Y. The web of human sexual contacts. Nature. 2001;411:907–908. [PubMed]
6. Bianconi G. Entropy of network ensembles. Phys Rev E. 2009;79:036114. [PubMed]
7. Bianconi G, Coolen ACC, Perez Vicente CJ. Entropies of complex networks with hierarchically constrained topologies. Phys Rev E. 2008;78:016114. [PubMed]
8. Kim H, Toroczkai Z, Erdös P, Miklós I, Székely L. Degree-based graph construction. J Phys A: Math Theor. 2009;42:392001.
9. Koren M. Sequences with a unique realization by simple graphs. J Comb Theor B. 1976;21:235.
10. Taylor R. Constrained switchings in graphs. SIAM J Alg Disc Math. 1982;3:115–121.
11. Cooper C, Dyer M, Greenhill C. Sampling regular graphs and a peer-to-peer network. Comb Prob Comp. 2007;16:557–593.
12. Kannan R, Tetali P, Vempala S. Simple markov-chain algorithms for generating bipartite graphs and tournaments. Random Struct Alg. 1999;14:293–308.
13. Viger F, Latapy M. Efficient and simple generation of random simple connected graphs with prescribed degree sequence. Lect Notes Comp Sci. 2005;3595:440–449.
14. Bollobás B. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. Eur J Comb. 1980;1:311–316.
15. Bender E, Canfield E. The asymptotic number of labelled graphs with given degree sequences. J Comb Th A. 1978;24:296–307.
16. Molloy M, Reed B. A critical point for random graphs with a given degree sequence. Rand Struct Alg. 1995;6:161–179.
17. Newman MEJ, Strogatz SH, Watts DJ. Random graphs with arbitrary degree distributions and their applications. Phys Rev E. 2001;64:026118. [PubMed]
18. Britton T, Deijfen M, Martin-Löf A. Generating simple random graphs with prescribed degree distribution. J Stat Phys. 2006;124:1377–1397.
19. Chung F, Lu L. Connected components in random graphs with given expected degree sequences. Ann Combinatorics. 2002;6:125.
20. Erdös P, Gallai T. Graphs with prescribed degree of vertices. Mat Lapok. 1960;11:477.
21. Havel V. A remark on the existence of finite graphs. Časopis Pěst Mat. 1955;80:477.
22. Hakimi SL. On the realizability of integers as the degrees of the vertices of a linear graph - I. J SIAM Appl Math. 1962;10:496.
23. Newman MEJ, Barkema GT. Monte Carlo methods in statistical physics. Oxford University Press; 1999.
24. Cochran WG. Sampling techniques. Wiley; 1977.
25. Tripathi A, Vijay S. A note on a theorem of erdös & gallai. Discr Math. 2003;265:417.
26. Burda Z, Krzywicki A. Uncorrelated random networks. Phys Rev E. 2003;67:046118. [PubMed]
27. Boguñá M, Pastor-Satorras R, Vespignani A. Cut-offs and finite size effects in scale-free networks. Eur Phys J B. 2004;38:205.
28. Catanzaro M, Boguñá M, Pastor-Satorras R. Generation of uncorrelated random scale-free networks. Phys Rev E. 2005;71:027103. [PubMed]

Articles from PLoS ONE are provided here courtesy of Public Library of Science