Internet Math. Author manuscript; available in PMC 2017 March 24.
Published in final edited form as:
Internet Math. 2016; 12(3): 205–219.
Published online 2016 March 24. doi:  10.1080/15427951.2016.1164768
PMCID: PMC4971584
NIHMSID: NIHMS804398

Fast Sequential Creation of Random Realizations of Degree Sequences*

Abstract

We examine the problem of creating random realizations of very large degree sequences. Although fast in practice, the Markov chain Monte Carlo (MCMC) method for selecting a realization has limited usefulness for creating large graphs because of memory constraints. Instead, we focus on sequential importance sampling (SIS) schemes for random graph creation. A difficulty with SIS schemes is assuring that they terminate in a reasonable amount of time. We introduce a new sampling method by which we guarantee termination while achieving speed comparable to the MCMC method.

1 Introduction

Creating random graphs with a given degree sequence is useful for tasks from counting graphs with a given degree sequence to creating models of networks. Unfortunately, the problem of creating a uniformly sampled random graph for a given degree sequence, using a reasonable amount of time and space, is an open and difficult problem.

To formalize the problem, we use some basic terminology. A degree sequence α = (α1, α2, ..., αn) is a sequence of non-negative integers such that n−1 ≥ α1 ≥ α2 ≥ ... ≥ αn ≥ 0. For any simple, undirected graph G = (V,E), where V is a set of vertices and E is a set of edges, with node degrees α, G is termed a realization of α. We use the standard notation of n and m to represent the number of nodes and edges, respectively. If a degree sequence α has at least one realization, then the sequence is graphic.
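Since graphicality tests recur throughout this article, a minimal sketch may be helpful. The function below applies the classical Erdős–Gallai conditions to a nonincreasing sequence; the function name and structure are illustrative and not part of the paper's algorithms.

```python
def is_graphic(alpha):
    """Erdos-Gallai test: a nonincreasing sequence of non-negative
    integers is graphic iff its sum is even and, for every k,
    the first k terms sum to at most k(k-1) plus sum of min(a_i, k)
    over the remaining terms."""
    n = len(alpha)
    if sum(alpha) % 2 != 0:
        return False
    for k in range(1, n + 1):
        lhs = sum(alpha[:k])
        rhs = k * (k - 1) + sum(min(a, k) for a in alpha[k:])
        if lhs > rhs:
            return False
    return True
```

For example, (3,3,2,2,2) is graphic, while (3,3,3,1) is not. A linear-time variant of this test, cited later in the article [Tripathi and Vijay, 2003], is what the algorithm actually assumes.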

Using this notation, the basic problem is to select a near-uniform random realization of a graphic degree sequence α. Additionally, we are concerned with issues such as speed and memory usage that affect the creation of very large graphs. This article describes a new approach to this problem.

For the remainder of this article, we first review previous work on this problem and why there is a need for a new approach. Following that section, we give needed background results and definitions. The next four sections introduce the algorithm and provide analysis of its run-time and the quality of its results. Finally, we offer some conclusions and future directions of investigation.

2 Previous Work

The most common method for creating a random realization of a degree sequence is the Markov chain Monte Carlo (MCMC) approach. In this approach, an initial (nonrandom) realization is created, and then a series of random edge switches is performed, creating a walk among different realizations of the same degree sequence [Kleitman and Wang, 1973; Taylor, 1981]. This approach is popular because of its ease of implementation and its speed.

The major disadvantage of this approach is that it requires access to all edges in the graph in order to perform the random walk. This limits the size of graph that can be generated using this method. If the size of the graph is too large to fit within the memory of the computer, there is a significant access penalty incurred for each random edge selection.

Additionally, understanding the mixing time of the Markov chain is an open problem; it is unknown whether the random walk on a general degree sequence is rapidly mixing. Significant progress was made by Cooper et al. [2007], who showed that, for regular sequences, the MCMC approach is rapidly mixing, but the bound they give is

τ(ε) ≤ α_max^17 n^7 log(α_n ε^−1),

where αmax is the maximum value in the degree sequence α. More recently, Greenhill [2015] showed that if the maximum degree αmax of a sequence satisfies

3 ≤ α_max ≤ (1/4) √(Σ_{i=1}^{n} α_i),
(1)

then the resulting Markov chain is rapid mixing, but the resultant bound is an order of n larger than the bound for the regular case. Unfortunately, both bounds are much too large for practical use.

There is empirical evidence that the actual mixing times are much lower, perhaps as low as O(m) time [Gkantsidis et al., 2003]. With the time needed to create a Havel-Hakimi instance in order to start the random walk, a typical MCMC implementation uses O(m log n) time and O(m) space to generate a random realization.

Another approach to generating random realizations is by using a sequential importance sampling (SIS) method to randomly select edges until a realization is built. This allows for selected edges to be saved at each step, and thus, only requires that the degree sequence, not the entire graph, fits into memory.

The difficulty with an SIS approach is that while we are selecting edges to add to our random realization it is possible to become “stuck”. In other words, we cannot arbitrarily choose edges and guarantee that a graph exists that has the given degree sequence and the chosen edges.

The first SIS algorithm that could guarantee termination is from [Blitzstein and Diaconis, 2010]. Unfortunately, the Blitzstein-Diaconis algorithm (BD) has two drawbacks. The first is that the method has not been shown to sample uniformly; the authors were only able to show empirical evidence that their algorithm gave a reasonable sampling of the realizations. The second and more serious setback of this approach is that the algorithm's run-time of O(mn²) makes it much slower than the MCMC approach. This run-time can be lowered to O(mn) time [Moseman, 2015], but this is still significantly slower than the MCMC algorithm.

A second SIS method was introduced, which we will denote as BKS [Bayati et al., 2009]. The advantage of their method stems from the result that if α_max = O(m^{1/4−τ}) for some τ > 0, then their algorithm is expected to run in O(α_max · m) time and asymptotically approaches uniform sampling. This is a powerful result, but it requires a very strong constraint on the degree sequence.

In addition, unlike the BD algorithm, this algorithm does not have a termination guarantee; it only has a high probability of succeeding for sequences that meet the maximum degree-size bound. This lack of a termination guarantee becomes problematic when using the BKS method for instantiating sequences that are outside of the prescribed limits. For instance, from the set of threshold degree sequences [Mahadev and Peled, 1995], as the length of the sequences grows, the probability that the BKS method is able to construct a realization of the sequences quickly approaches zero.

Finding fast SIS methods is essential to being able to create very large random graphs. In this article we introduce a new SIS algorithm that maintains the termination guarantee of the BD algorithm while having a run-time that is competitive with the MCMC and BKS methods.

3 Definitions and Basic Results

We begin with some needed preliminary definitions and results. In order to compare sequences, we will use majorization, which is a partial order over the set of degree sequences. The degree sequence α majorizes (or dominates) the degree sequence β, denoted by α ⪰ β, if for all k from 1 to n

Σ_{i=1}^{k} α_i ≥ Σ_{i=1}^{k} β_i,
(2)

and if the sums of the two sequences are equal, i.e.,

Σ_{i=1}^{n} α_i = Σ_{i=1}^{n} β_i.
(3)

The sequence α strictly majorizes β, denoted by α ≻ β, if α ⪰ β and αi ≠ βi for at least one i.

If α is a degree sequence and αi ≥ αj +2 then the operation of subtracting 1 from αi and adding 1 to αj is called the unit transformation from i to j on α. The new sequence created by this operation is denoted by α[i → j]. To designate the application to a degree sequence α of a series of unit transformations, we use the shorthand notation α[i1 → j1][i2 → j2] = (α[i1 → j1]) [i2 → j2].
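The two definitions above can be made concrete with a short sketch. The functions below are illustrative helpers, not part of the paper's algorithms; they assume the two sequences have equal length.

```python
def majorizes(alpha, beta):
    """True if alpha majorizes beta (alpha >= beta in the majorization
    order): equal sums, and every prefix sum of alpha is at least the
    corresponding prefix sum of beta."""
    if len(alpha) != len(beta) or sum(alpha) != sum(beta):
        return False
    sa = sb = 0
    for a, b in zip(alpha, beta):
        sa += a
        sb += b
        if sa < sb:
            return False
    return True

def unit_transform(alpha, i, j):
    """The unit transformation alpha[i -> j] (0-based indices here):
    subtract 1 at i and add 1 at j, valid only when alpha[i] >= alpha[j] + 2.
    Note: indices must be chosen so the result stays nonincreasing."""
    assert alpha[i] >= alpha[j] + 2, "not a valid unit transformation"
    out = list(alpha)
    out[i] -= 1
    out[j] += 1
    return out
```

For instance, (3,2,1)[1 → 3] = (2,2,2), and (3,2,1) majorizes (2,2,2), illustrating that a unit transformation always moves downward in the majorization order (the content of Theorem 1 below, read in reverse).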

There is a close relationship between majorization and unit transformations as shown by the following theorem.

Theorem 1 (Muirhead [1903])

For any degree sequences α and β where α ⪰ β, β can be obtained from α by a finite sequence of unit transformations.

The power of comparing degree sequences using majorization comes from the following theorem.

Theorem 2 (Ruch and Gutman [1979], Theorem 1)

If the degree sequence α is graphic and α ⪰ β, then β is graphic.

We introduce an additional notation for showing modifications to a degree sequence. For a degree sequence α, index i, and a set of indices ω where i ∉ ω and |ω| = αi, a reduction R(α; i,ω) is the integer sequence obtained by deleting the term αi, subtracting 1 from the terms in the index set ω, and sorting the resulting sequence into nonincreasing order. The set of all reductions from an index i on the sequence α we will denote as R(α, i).

In the set R(α, i) there exists a reduction Rmax(α; i) that majorizes all the other reductions. This reduction is created by setting the index set ω to the αi largest indices where i ∉ ω. At the same time, there exists a reduction Rmin(α; i) that is majorized by all the other reductions and is created by setting the index set ω to the αi smallest indices where i ∉ ω. For example, for the sequence α = (5,5,5,5,4,4,1,1,1,1), Rmax(α;1) = (5,5,5,4,3,0,0,0,0) and Rmin(α;1) = (4,4,4,3,3,1,1,1,1). Applying the reduction Rmin(α; i) to a degree sequence is commonly called a “laying-off” procedure in the literature. We now restate a classic result in terms of reductions.
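The two extreme reductions can be computed directly from the definition. The sketch below (illustrative, with 0-based indices) reproduces the worked example from the text.

```python
def r_max(alpha, i):
    """R_max(alpha; i): delete alpha[i], subtract 1 from the alpha[i]
    largest indices (i.e., the smallest values, since the sequence is
    nonincreasing), and resort.  0-based index i."""
    d = alpha[i]
    rest = alpha[:i] + alpha[i + 1:]
    out = rest[:len(rest) - d] + [v - 1 for v in rest[len(rest) - d:]]
    return sorted(out, reverse=True)

def r_min(alpha, i):
    """R_min(alpha; i): delete alpha[i], subtract 1 from the alpha[i]
    smallest indices excluding i (i.e., the largest values), and resort.
    This is the classic 'laying-off' step."""
    d = alpha[i]
    rest = alpha[:i] + alpha[i + 1:]
    out = [v - 1 for v in rest[:d]] + rest[d:]
    return sorted(out, reverse=True)

alpha = [5, 5, 5, 5, 4, 4, 1, 1, 1, 1]
# Matches the example in the text:
# r_max(alpha, 0) -> [5, 5, 5, 4, 3, 0, 0, 0, 0]
# r_min(alpha, 0) -> [4, 4, 4, 3, 3, 1, 1, 1, 1]
```

By Theorem 3 below, r_min applied repeatedly from a graphic sequence is exactly the Havel-Hakimi construction.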

Theorem 3 (Kleitman and Wang [1973], Theorem 2.1)

A degree sequence α is graphic if and only if for any index i, Rmin(α; i) is graphic.

This result is a simple consequence of the theorem of [Ruch and Gutman, 1979]. If α is graphic, then for a given index i, there must exist some reduction R(α; i,ω) that is also graphic. This reduction comes from simply taking the vertices adjacent to vi in a realization of α. Since R(α; i,ω) ⪰ Rmin(α; i), Rmin(α; i) must also be graphic.

4 An SIS Approach Using Reductions

Previous SIS algorithms for creating random realizations involve choosing individual edges at each step. This approach leads to either slow and complicated checks to ensure that the resulting edge in the graph is viable, as in the case of the BD algorithm, or no guarantee of termination, as in the case of the BKS algorithm.

Rather than selecting individual edges, we instead choose all the potential edges to a given node at once, i.e., choosing a random reduction to the given node. Determining whether or not the selected edges are valid requires testing only if the resulting reduction is graphic; if the reduction is graphic, then there is a realization that contains the resulting edges in the reduction.

This leads to a direct SIS algorithm; for the sequence α and the index i, at each step, we choose a random reduction R(α; i,ω). If the chosen reduction is non-graphic, we use it as an upper bound for selecting a new reduction, R(α; i,ω′), such that R(α; i,ω′) ≺ R(α; i,ω). If this new sequence is still not graphic, it becomes the new upper bound and we iterate. From Theorem 3, this process will eventually find a graphic reduction, and thus, the algorithm itself is guaranteed to terminate with a random realization. This algorithm is shown in Figure 1.

Figure 1
SIS selection method based on random reductions.

The time needed for selecting a random reduction that respects the above constraints needs some explanation. First, let us consider the case when there is no previous upper bound for the reduction R(α; i,ω). We note that if a degree sequence is sorted and we guarantee that its sum is even, then we can test whether the sequence is graphic in O(m) time [Tripathi and Vijay, 2003]. Using a data structure such as a binary indexed tree [Fenwick, 1994], we can both sample from a weighted distribution of the nodes and update the weighting after selection in O(log n) time per operation. Thus, the total time to select a reduction set ω for the reduction R(α; i,ω) using a given weighting of the nodes is O(αi log n). If, for each iteration, we reduce from an index i where i ≥ αi (such as choosing the index of the smallest value at each step), then αi ≤ √(2m) and the total time needed to choose and test a random reduction becomes O(m log n).
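The binary indexed tree mentioned above can be sketched briefly. This is an illustrative implementation, not the paper's; it supports O(log n) point updates, prefix sums, and sampling an index with probability proportional to its weight, which is exactly what selecting the set ω under a node weighting requires.

```python
import random

class Fenwick:
    """Binary indexed tree over node weights [Fenwick, 1994]:
    O(log n) point update, O(log n) prefix sum, and O(log n)
    weight-proportional sampling."""
    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0] * (self.n + 1)
        for i, w in enumerate(weights):
            self.update(i, w)

    def update(self, i, delta):
        """Add delta to the weight at 0-based index i."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def prefix(self, i):
        """Sum of weights[0..i] inclusive."""
        i += 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def sample(self):
        """Return an index drawn with probability proportional to its
        weight, via a binary search down the implicit tree."""
        target = random.uniform(0, self.prefix(self.n - 1))
        idx, bit = 0, 1 << self.n.bit_length()
        while bit > 0:
            nxt = idx + bit
            if nxt <= self.n and self.tree[nxt] < target:
                target -= self.tree[nxt]
                idx = nxt
            bit >>= 1
        return idx  # 0-based index
```

To draw the αi members of ω without replacement, one would call sample(), record the node, and then update() its weight to zero before the next draw.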

Now consider if we have a previous upper bound S = R(α; i,ωS). In order to choose a new reduction N = R(α; i,ωN) where R(α; i,ωN) ≺ S using the described method, first select a new reduction R(α; i,ωP). Then define ωN as the αi smallest indices in the set ωS ∪ ωP. As we will show in Section 7, this selection process corresponds to the operation R(α; i,ωN) = R(α; i,ωS) ∧ R(α; i,ωP) ⪯ R(α; i,ωS). The only difficulty with this approach is when R(α; i,ωP) ⪰ R(α; i,ωS), in which case R(α; i,ωN) = R(α; i,ωS). When this occurs, we choose the largest index k ∉ ωS such that there exists an index j ∈ ωS where R(α; i,ωS)[k → j] is a valid reduction. By setting R(α; i,ωN) = R(α; i,ωS)[k → j] ≺ R(α; i,ωS), we guarantee that the algorithm will terminate within a set time bound.

Combining all these parts, the overall run-time of Algorithm 1 is O(n log n · m · g(α)), where g(α) is the number of non-graphic reductions selected during the construction of the realization. We now examine the value g(α).

5 Selecting Graphic Reductions

The run-time of Algorithm 1 depends on the expected value of the number of non-graphic reductions selected, g(α), and this value strongly depends on the distribution of the degree sequence. To discuss this dependency on the sequence distribution, we will compare degree sequences by their majorization gap [Arikati and Peled, 1994]. The majorization gap for a sequence α is the minimum number of unit transformations from some threshold sequence β to α where β ⪰ α. This measures how close a particular sequence is to being threshold; the closer a sequence is to being threshold [Mahadev and Peled, 1995], the smaller the sequence’s majorization gap is and the fewer graphic reductions there are in a reduction set for any given index. If the sequence α is a threshold sequence, i.e., has a majorization gap of 0, then there is exactly one reduction in the set R(α, i) that is graphic. However, as the majorization gap becomes larger, the sequence is closer to being regular and the number of possible graphic reductions grows. For any sequence sum, there is a majorization gap value beyond which, for every degree sequence with that sum, all possible reductions are graphic.

Figure 2 shows the dependency of g(α) on the sequence’s majorization gap. In this figure, we show the probability that a uniformly sampled reduction is graphic, computed over all degree sequences with a fixed length and majorization gap. The first column, where the majorization gap is 0, represents all the threshold graphs for a given length. This figure shows that the closer a sequence’s majorization gap is to zero, the higher the probability that a random reduction will not be graphic.

Figure 2
This figure displays the probabilities that a uniformly selected random reduction on the last index of a sequence will be graphic across all graphic degree sequences of a given length and majorization gap. This figure is a compilation of all graphic sequences ...

This dependency on the distribution is closely related to the probability distribution used for selecting the reductions. This weighting function for the selection probabilities is important for two reasons: the first is that achieving a uniform or near-uniform sampling of realizations requires the correct edge selection probabilities; the second, as we will see below, is that the weighting affects how often non-graphic reductions are chosen. Although it is an open problem to show that any weighting function produces uniform selection among the various realizations of a degree sequence, we use the weighting function described by Blitzstein and Diaconis for their algorithm. For the degree sequence α and index p, the weight given to an edge (p,q) to be selected for the realization G is proportional to the degree of q. Blitzstein and Diaconis empirically observed that this weighting gives a more uniform sampling of the realizations than simply using a uniform weighting of the nodes.

Since the BD weighting tends to create edges to vertices with larger degrees, the majorization gap of the resulting reduction is typically larger than that of a reduction using uniform selection. This tends to reduce the number of non-graphic reductions selected in the algorithm. In Figure 3, we see the effect of the degree distribution and the sampling method on the number of non-graphic reductions selected. Distributions that tend to have smaller majorization gaps (in this case, power-law distributions compared to Poisson distributions) are expected to have more non-graphic selections. Also, there are more non-graphic selections for uniform edge sampling than for the weighting scheme of Blitzstein and Diaconis. It should be noted that, for these test cases, very few non-graphic reductions were chosen. In fact, other than the case that used a power-law distribution along with a uniform weighting, the algorithm averaged 1 non-graphic reduction selected out of the 2000 selections. For those cases the algorithm essentially behaved as if it had an O(n log n · m) run-time.

Figure 3
This shows the average number of non-graphic random reductions where the x-axis of each plot is the sequence length. Each point is an average of 30 different degree distribution with the same sequence length and distribution parameters. Each graph shows ...

Because the maximum number of reductions the algorithm chooses before reaching the minimum reduction is polynomial in n [Greene and Kleitman, 1986], under uniform sampling from the reductions majorized by a given reduction we would expect to find a graphic reduction in no more than g(α) = O(log n) trial reductions. This gives an expected time for this algorithm of O(n log² n · m), which is within a logarithmic factor of the time for MCMC when the sequence is dense (m = O(n²)), and at worst an O(n log n) factor slower for sparse sequences (m = O(n)).

The worst-case bound for g(α) is much larger than its expected bound. From a result we will prove in Section 7, the number of nongraphic reductions that the algorithm can potentially choose before encountering a graphic reduction is g(α) = Ω(n2). We will now consider how to use the order structure of the reduction set to limit this worst-case behavior of the algorithm.

As we will show in Section 7, the partial order (R(α, i), ⪰) is a lattice. Theorem 3 shows that the bottom element Rmin(α, i) in (R(α, i), ⪰) will always be graphic if the original sequence is graphic. Because all the graphic sequences are grouped at the bottom of the lattice, we reduce the possibility of selecting a non-graphic reduction by pruning some of the non-graphic reductions at the top of the lattice. If we descend the lattice starting from the maximum element, we will eventually find a graphic sequence that we can use to provide a range for selecting reductions. But simply finding a maximal graphic reduction in the reduction lattice is not enough to create a selection range in which every realization has a non-zero probability of being selected. Consider the sequence with its corresponding reduction lattice in Figure 4. If we look at the set of graphic reductions, we notice that there are two maximal graphic reductions: (5,5,4,3,3,1,1,0,0) and (5,4,4,4,4,1,0,0,0). If we were to use one of those reductions as an upper bound for the set of reductions from which we are choosing, then the other reduction would have a zero probability of occurring in a realization.

Figure 4
This figure shows a lattice of reductions for the sequence (5,5,5,5,4,4,1,1,1,1) on its first index. The grey nodes signify non-graphic reductions while the white nodes are graphic.

In order to navigate the lattice to find an upper bound that does not preclude any graphic realizations, we use the following procedure. Start with the maximum reduction r = Rmax(α,d) and test whether it is graphic. If it is not, then choose the smallest and largest indices in r, i and j respectively, where neither i nor j has been previously used in this search procedure, and create the new sequence r[i → j]. It is straightforward to see that the sequence r[i → j] remains a reduction. We then test whether this new reduction is graphic and repeat the procedure; if the reduction is not graphic, then we continue by choosing a new pair of indices i, j that have not been used in this procedure. The procedure repeats at most αd times, because after αd iterations r becomes Rmin(α,d), which is graphic.

The point of this construction is that each time we choose the indices in this manner, the resulting reduction is majorized by all other valid reductions created by some unit transformation. Thus, for the non-graphic reduction r where r[i → j] is graphic, any maximal graphic reduction p will satisfy r[i → j] ⪯ p ≺ r.
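The descent procedure above can be sketched roughly in Python. This is a hypothetical sketch, not the paper's Algorithm 2: `is_graphic` is an Erdős–Gallai test standing in for the O(m) test cited earlier, the input is assumed to be the maximum reduction as a nonincreasing list, and sequences are re-sorted only when tested.

```python
def is_graphic(seq):
    """Erdos-Gallai test for a nonincreasing sequence (helper assumption)."""
    if sum(seq) % 2:
        return False
    for k in range(1, len(seq) + 1):
        if sum(seq[:k]) > k * (k - 1) + sum(min(v, k) for v in seq[k:]):
            return False
    return True

def reduction_bounds(r):
    """Walk down the reduction lattice from the maximum reduction r:
    repeatedly apply a unit transformation [i -> j] with i the smallest
    and j the largest index not yet used, until the current reduction is
    graphic.  Returns (last non-graphic reduction seen, first graphic
    reduction found); the first component is None if r is already graphic."""
    r = list(r)
    i, j = 0, len(r) - 1   # smallest / largest unused indices
    prev = None
    while not is_graphic(sorted(r, reverse=True)) and i < j:
        if r[i] >= r[j] + 2:       # valid unit transformation [i -> j]
            prev = list(r)
            r[i] -= 1
            r[j] += 1
        i += 1                     # each index is used at most once
        j -= 1
    return prev, sorted(r, reverse=True)
```

Starting from Rmax(α;1) = (5,5,5,4,3,0,0,0,0) of the running example, this walk reaches a graphic reduction after a few unit transformations, giving the bounds used by Algorithm 2.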

We now formalize this idea with a simple algorithm. As shown in Algorithm 2 in Figure 5, the basic idea is that, before any random reduction is selected for an index d, we call the procedure to give bounds for the selection range of the reductions. For the reduction R(α,d), the maximum number of iterations in the algorithm is αd. With an appropriate data structure, each unit transformation can be implemented in constant time, giving the algorithm a total run-time of O(αd).

Figure 5
SIS algorithm for creating a random model using the reduction bounds method. The reduction bounds routine returns two values, βN which is a minimal non-graphic reduction such that there exists a graphic reduction that is a unit transformation ...

6 Quality of Random Graphs

Determining the quality of the random graphs produced by the reduction algorithm, that is, how close to uniform the sampling is, remains an open question. While we cannot prove that the sampling approaches uniform, we can empirically compare the graphs produced by the different algorithms. Figure 6 shows a comparison of the four algorithms: Algorithm 1, MCMC, BD, and BKS.

Figure 6
The x-axis shows the length of the random sequences. The Poisson degree distributions come from an Erdős–Rényi model where p = 0.1 and the power-law degrees come from a power-law distribution with an exponent of 2.0. The points represent the mean ...

These algorithms were compared by creating a series of graph degree sequences drawn from Poisson and power-law distributions, where a single random sequence for both distributions was created for each length from 100 to 2,000, incremented by 100. The exact distributions are a Poisson distribution coming from an Erdős–Rényi random graph with an edge probability of p = 0.1 and a power-law distribution with an exponent of k = 2.0. For each sequence, 30 random graphs were created by each algorithm. For the MCMC algorithm, we took the running time as 30 · m switches, as suggested in Ray et al. [2012]. We compared three graph properties, again chosen from Ray et al. [2012], to examine the similarity of the random graphs: the global clustering coefficient, the maximum eigenvalue, and the diameter.

We make two comments about our experimental setup. The first is that for the power-law distribution, the BKS algorithm could not converge to a random graph; thus there are no results for BKS shown for those distributions. A second point is that a number of the random graphs produced for the power-law distributions by the other three methods were not connected and, thus, had an infinite diameter. Therefore, we do not show the diameter results for this case.

Examining the results, we note that all four algorithms produced virtually identical results for the Poisson distributions. At the scale shown in the figure, the average behavior of the four algorithms was indistinguishable.

For the power-law distributions, we note two points. First, the reduction and BD algorithms give essentially identical results. This can be expected, since the reduction algorithm uses the Blitzstein-Diaconis weighting for selecting its reductions. Thus the quality of the results given by Algorithm 1 is essentially the same as that of the BD algorithm. Second, the power-law case shows the only noticeable difference between the results of the algorithms: there is a slight but discernible separation between the MCMC results and those of Algorithm 1 and BD.

7 Bounding the Chain Lengths

Over the set of partitions Lp for some positive integer p, majorization forms the lattice (Lp, ⪰) [Brylawski, 1973], where the meet and join operators are defined as follows:

(α ∧ β)_k = min{Σ_{i=1}^{k} α_i, Σ_{i=1}^{k} β_i} − Σ_{i=1}^{k−1} (α ∧ β)_i = min{Σ_{i=1}^{k} α_i, Σ_{i=1}^{k} β_i} − min{Σ_{i=1}^{k−1} α_i, Σ_{i=1}^{k−1} β_i},
(4)

α ∨ β = ⋀ {φ : φ ⪰ α, φ ⪰ β}.
(5)
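Equation 4 says the prefix sums of the meet are the pointwise minima of the two prefix-sum sequences. A short illustrative sketch (assuming two partitions of the same integer, padded to equal length):

```python
def meet(alpha, beta):
    """Meet of two partitions in the majorization lattice (Equation 4):
    the k-th prefix sum of the meet is the minimum of the k-th prefix
    sums of alpha and beta; the terms are the successive differences.
    Since prefix-sum sequences of partitions are concave, the result is
    again nonincreasing."""
    out, prev, sa, sb = [], 0, 0, 0
    for a, b in zip(alpha, beta):
        sa += a
        sb += b
        m = min(sa, sb)
        out.append(m - prev)
        prev = m
    return out
```

For example, the meet of (3,1,1,1) and (2,2,2,0) has prefix sums min(3,2), min(4,4), min(5,6), min(6,6) = 2, 4, 5, 6, giving the partition (2,2,1,1).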

If the integer p is even, then there are a small number of elements at the bottom of the partition lattice that are graphic, while the majority of the partitions in the lattice are nongraphic [Pittel, 1999]. This integer partition lattice has provided a framework for considering realizability and uniqueness problems of degree sequences [Aigner and Triesch, 1994].

In this section, we use the lattice structure inherent in the set of reductions to establish bounds on the two algorithms. To show this, we will use the terminology and main result found in [Greene and Kleitman, 1986]. An H-step is a unit transformation [i → i+1], whereas a V-step is a unit transformation [i → j] where αi = αj + 2. An HV-chain between the sequences α and β is a series of transformations α = α(0) ≻ α(1) ≻ α(2) ≻ ... ≻ α(i) ≻ ... ≻ α(L) = β where every unit transformation from 1 to i is an H-step and every unit transformation from i to L is a V-step.

In an integer partition lattice, the length of the longest chain between two sequences α and β, where α ⪰ β, is defined as h(α,β). That HV-chains form the longest chains between sequences in these lattices was shown in [Greene and Kleitman, 1986].

Theorem 4 (Greene and Kleitman [1986], Theorem 12)

Suppose that α ⪰ β. Then all HV-chains from α to β have the same length, and this length is h(α,β).

The degree set D(α) of a sequence α is the set of values in the sequence, i.e., D(α)={d|d =αi for 1≤i≤n}. We denote the subsequence of α from the index i to the index j as α(i : j). We now relate the degree set between the indices in a unit transformation to the length of the longest chain between the two sequences.

Theorem 5

For a positive integer sequence α and indices 1 ≤ i < j ≤ n,

|D(α(i : j))| − 1 ≤ h(α, α[i → j]) ≤ |D(α(i : j))|.
(6)

Proof

Theorem 4 shows that we only need to construct an HV-chain from α to α[i → j] to establish the length of the maximum chain. To create an HV-chain, we will induct on the indices from i to j. To begin, we choose the smallest index p > i such that αi ≥ αp + 2. For the index p, there are three possibilities for the maximum chain length h(α,α[i → p]).

  1. If αi = αp + 2, then [i → p] defines an HV-chain of length 2 ≤ |D(α(i : p))|.
  2. If αi > αp + 2 and αi = α_{p−1}, then [p−1 → p] defines an HV-chain of length 2 = |D(α(i : p))|.
  3. If αi > αp + 2 and αi = α_{p−1} + 1, then [p−1 → p][i → p−1] defines an HV-chain of length 3 = |D(α(i : p))|.

We will now induct on the index number to show that this relationship continues to hold. Assume that for all indices p ≤ r ≤ k there is an HV-chain from α to α[i → r] whose length satisfies |D(α(i : r))| − 1 ≤ h(α, α[i → r]) ≤ |D(α(i : r))|. If αk+1 = αk, then α[i → k] = α[i → k+1], and so they have the same HV-chain. Else if αk > αk+1, then there are two cases. If αk ≥ αk+1 + 2, then α[i → k+1] = α[k → k+1][i → k]. By prepending the H-step [k → k+1] to the front of the HV-chain for [i → k], we then have an HV-chain from α to α[i → k+1] whose length is |D(α(i : k+1))|. If αk = αk+1 + 1, then there exists a smallest index s such that αs = αk. Now α[i → k+1] = α[i → s][s → k+1]. Again, by appending the V-step [s → k+1] to the end of the HV-chain for α[i → s], we have a new HV-chain that satisfies the condition.

This result bounds the number of possible non-graphic sequences that can be randomly selected before choosing a graphic sequence. The maximum number of non-graphic sequences in Algorithm 2 is no greater than the maximum chain length from α to α[i → j], i.e., h(α,α[i → j]) ≤ j − i + 1 ≤ n.

If we take the set of possible reductions for some index i in the sequence α, they form a subset of the integer partitions. The majorization operation also forms a lattice over the set of reductions, (R(α, i), ⪰), where the meet and join definitions are identical to those of the integer partition lattice. We see this by showing that the meet operator (and, by extension, the join operator) is closed over the set of reductions.

Theorem 6

For the sequence γ and the sequences α, β ∈ R(γ,d), α ∧ β ∈ R(γ,d).

Proof

From the definition of a reduction, 0 ≤ γi − αi ≤ 1 and 0 ≤ γi − βi ≤ 1 for all i ≠ d. It follows that for all i, |αi − βi| ≤ 1. For the sequence α ∧ β to not be a reduction requires that there is an index p where γp − (α ∧ β)p ≥ 2. From Equation 4, if Σ_{i=1}^{p} (α ∧ β)_i = Σ_{i=1}^{p} α_i and Σ_{i=1}^{p−1} (α ∧ β)_i = Σ_{i=1}^{p−1} α_i, then (α ∧ β)p = αp, and so γp − (α ∧ β)p ≤ 1. There is an identical argument for when both sums are equal to the β sums. For a contradiction, we assume that Σ_{i=1}^{p} α_i > Σ_{i=1}^{p} β_i and Σ_{i=1}^{p−1} α_i < Σ_{i=1}^{p−1} β_i. But these inequalities imply that αp − βp ≥ 2, which contradicts the fact that this difference can be no larger than 1. Thus α ∧ β ∈ R(γ,d).

To establish a worst-case time bound for Algorithm 1, we show there exist reduction lattices that contain quadratic-length chains of nongraphic reductions.

Theorem 7

There exist reduction lattices of length-n sequences that contain chains of non-graphic reductions of length Ω(n²).

Proof

Consider the degree sequence taken from the graph that is constructed by starting from the empty graph and then for m times adding an isolated node followed by a dominating node. From the construction of this graph, it is straightforward to see that this sequence is both threshold (see [Mahadev and Peled, 1995], Theorem 1.2.4) and that its degree set contains all the integers from 1 to n–1. Thus this degree sequence with its length of n = 2m is

α = (n−1, n−2, ..., m+1, m, m, m−1, ..., 2, 1).
(7)

Since α is threshold, then Rmin(α,m) must be the only graphic sequence in R(α,m). We construct Rmin(α,m) from Rmax(α,m) with the following series of unit transformations,

Rmin(α,m) = Rmax(α,m)[m−1 → n][m−2 → n−1] ⋯ [1 → m+1].
(8)

From Theorem 4, for each of these unit transformations, there is an HV-chain of length at least m in the integer partition lattice. The point of this construction is that each sequence in the HV-chain of a unit transformation is also a valid reduction. Thus between consecutive unit transformations there is a sequence of m reductions, and for the m unit transformations there exists a chain of reductions whose length is at least m² = Θ(n²) between Rmax(α,m) and Rmin(α,m).

8 Conclusions

We have examined the problem of sequentially creating a random realization of a very large degree sequence, and in particular in a reasonable amount of time. Current published algorithms all have failings that reduce their usefulness. The MCMC approach requires holding an entire graph in memory, the BKS algorithm does not guarantee termination and for some degree sequences has a high probability of not succeeding, and the BD algorithm is significantly slower than the other two approaches. The algorithm proposed in this article overcomes all of these difficulties.

This algorithm is competitive with the time requirement for the MCMC approach, but with a significant savings in memory usage. The quality of the selected realizations produced matches the output of the Blitzstein-Diaconis method. With this approach, it is feasible to create realizations containing millions of nodes.

Further research problems in this area start with determining whether a better sampling scheme for edge selection exists than the Blitzstein-Diaconis weighting. Another important question is to bound the expected runtime of this algorithm, and in particular, the probability of choosing a nongraphic reduction.

Acknowledgments

The author would like to thank Zoe Park for her assistance in writing code. The author would also like to thank Peter Mell, Isabel Beichl, and especially the anonymous reviewer for their helpful comments, which substantially improved the presentation of the paper.

Footnotes

*Official contribution of the National Institute of Standards and Technology; not subject to copyright in the United States.

References

  • Aigner Martin, Triesch Eberhard. Realizability and uniqueness in graphs. Discrete Mathematics. 1994 Dec;136(1–3):3–20. doi: 10.1016/0012-365X(94)00104-Q. [Cross Ref]
  • Arikati Srinivasa R, Peled Uri N. Degree sequences and majorization. Linear Algebra and its Applications. 1994 Mar;199(Supplement 1):179–211. doi: 10.1016/0024-3795(94)90349-2. [Cross Ref]
  • Bayati Mohsen, Kim Jeong Han, Saberi Amin. A sequential algorithm for generating random graphs. Algorithmica. 2009 Jul; doi: 10.1007/s00453-009-9340-1. [Cross Ref]
  • Blitzstein Joseph, Diaconis Persi. A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Mathematics. 2010;6(4):489. doi: 10.1080/15427951.2010.557277. [Cross Ref]
  • Brylawski Thomas. The lattice of integer partitions. Discrete Mathematics. 1973;6(3):201–219. doi: 10.1016/0012-365X(73)90094-0. [Cross Ref]
  • Cooper Colin, Dyer Martin, Greenhill Catherine. Sampling regular graphs and a peer-to-peer network. Combinatorics, Probability and Computing. 2007;16(4):557–593. doi: 10.1017/S0963548306007978. [Cross Ref]
  • Fenwick Peter M. A new data structure for cumulative frequency tables. Software: Practice and Experience. 1994;24(3):327–336. doi: 10.1002/spe.4380240306. [Cross Ref]
  • Gkantsidis Christos, Mihail Milena, Zegura Ellen W. The Markov chain simulation method for generating connected power law random graphs. ALENEX. 2003:16–25.
  • Greene Curtis, Kleitman Daniel J. Longest chains in the lattice of integer partitions ordered by majorization. European Journal of Combinatorics. 1986;7(1):1–10. doi: 10.1016/S0195-6698(86)80013-0. [Cross Ref]
  • Greenhill Catherine. The switch Markov chain for sampling irregular graphs: Extended abstract. Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’15; SIAM; 2015. pp. 1564–1572. http://dl.acm.org/citation.cfm?id=2722129.2722232.
  • Kleitman DJ, Wang DL. Algorithms for constructing graphs and digraphs with given valences and factors. Discrete Mathematics. 1973 Sep;6(1):79–88. doi: 10.1016/0012-365X(73)90037-X. [Cross Ref]
  • Mahadev NVR, Peled UN. Threshold graphs and related topics, volume 56 of Annals of Discrete Mathematics. North-Holland Publishing Co; Amsterdam: 1995.
  • Moseman Elizabeth. Improving the computational efficiency of the Blitzstein-Diaconis algorithm for generating random graphs of prescribed degree. Technical report. 2015 doi: 10.6028/NIST.IR.8066. [Cross Ref]
  • Muirhead RF. Some methods applicable to identities and inequalities of symmetric algebraic functions of n letters. Proceedings of the Edinburgh Mathematical Society. 1903;21:144–157. doi: 10.1017/S001309150003460X. [Cross Ref]
  • Pittel Boris. Confirming two conjectures about the integer partitions. Journal of Combinatorial Theory, Series A. 1999 Oct;88(1):123–135. doi: 10.1006/jcta.1999.2986. [Cross Ref]
  • Ray Jaideep, Pinar Ali, Seshadhri C. Are we there yet? When to stop a Markov chain while generating random graphs. Proceedings of the 9th International Conference on Algorithms and Models for the Web Graph, WAW’12; Berlin, Heidelberg. Springer-Verlag; 2012. pp. 153–164. [Cross Ref]
  • Ruch Ernst, Gutman Ivan. The branching extent of graphs. Journal of Combinatorics, Information, & System Sciences. 1979;4(4):285–295.
  • Taylor R. Constrained switchings in graphs. In: McAvaney Kevin L., editor. Combinatorial Mathematics VIII. Vol. 884. Springer; Berlin Heidelberg: 1981. pp. 314–336. [Cross Ref]
  • Tripathi Amitabha, Vijay Sujith. A note on a theorem of Erdős & Gallai. Discrete Mathematics. 2003;265(1–3):417–420. doi: 10.1016/S0012-365X(02)00886-5. [Cross Ref]