|Home | About | Journals | Submit | Contact Us | Français|
Let p be a graph parameter that assigns a positive integer value to every graph. The inverse problem for p asks for a graph within a prescribed class (here, we will only be concerned with trees), given the value of p. In this context, it is of interest to know whether such a graph can be found for all or at least almost all integer values of p. We will provide a very general setting for this type of problem over the set of all trees, describe some simple examples and finally consider the interesting parameter “number of subtrees”, where the problem can be reduced to some number-theoretic considerations. Specifically, we will prove that every positive integer, with only 34 exceptions, is the number of subtrees of some tree.
Numerous graph parameters have been introduced and thoroughly studied. Many of them are defined as the number of vertex or edge subsets of some sort. For instance, Klazar  describes twelve different tree parameters and analyses them by means of a generating functions approach. Recently, the inverse problem has gained a certain importance: given a graph parameter p that assigns an integer value to every graph within a given family of graphs and an integer n, determine some G such that p(G) = n. This is of interest in combinatorial chemistry, where graphical parameters are used as molecular descriptors. For applications in the design of combinatorial libraries and molecular recognitions, we refer to  and the references therein. In this context, acyclic graphs play a major role, and we will also concentrate on trees in this paper.
We will generally consider rooted trees, which will turn out to be useful for the treatment of the various parameters. (Some of the tree parameters are actually defined for rooted trees only). All rooted trees can be constructed recursively by means of the following two operations:
Many tree parameters can be described recursively in a natural way by means of these two operations. Let us present a few examples.
Note that all these recursive characterizations share a common scheme: in each case, a k-tuple (c1, c2, …, ck) is associated with each tree, the parameter we are actually interested in is some polynomial of the ci (typically linear), and our two operations induce polynomial maps f: k → Rk and g: k × k → k.
For this parameter, the problem is essentially trivial, as has also been noticed in : every integer is the number of matchings of some star, since a star with n vertices has exactly n matchings (the empty set and n − 1 single edges). Nevertheless, the problem can become much harder if one imposes further restrictions. For instance, it is not quite clear whether almost every positive integer is the number of matchings of some binary tree (at least, we are not aware of any result in this direction).
This example is also quite trivial, since it is easily observed that the number of antichains of a path, rooted at one of its ends, is just the number of its vertices (the only antichains are single vertices in this case).
Another simple example, but it shows a new phenomenon: it is easy to check from the recursive definition that the number of chains in any rooted tree is always odd. On the other hand, the inverse problem can be solved for all odd integers: indeed, one finds that a star with n vertices, rooted at its center, has exactly 2n − 1 chains (note that all chains are either single vertices or pairs consisting of the root and any leaf), proving the claim.
This problem is somewhat harder than the previous ones. In their paper , Lepović and Gutman conjecture that every positive integer, with only 49 explicit exceptions, is the Wiener index of some tree. This conjecture was verified independently in  and  and further extended to the class of trees with maximum degree 3 in . It should be noted that the Wiener index W (T) of a tree T with n vertices can be bounded above and below as follows:
This implies that the minimum does not increase linearly any more, which explains why no simple construction as in the previous examples can be given. However, since the growth is only polynomial, the problem can be reduced to number-theoretic considerations similar to those presented in the following section.
Here, the situation is much harder. It is conjectured, yet unproven that almost every positive integer is the number of independent subsets of some tree. This is a question that is also of relevance in combinatorial chemistry:  provides an algorithm for constructing such a tree provided that one exists. The biggest difference to the previously considered parameters lies in the fact that the minimum over all trees of size n grows exponentially with n, making it less amenable to the Diophantine methods applied in  and also in this paper. However, Linek  gives a partial result in this direction: he shows that every positive integer is the number of independent sets of some bipartite graph.
This problem will form the main part of the paper. “On average”, the behavior of this parameter is similar to the number of independent subsets, but the minimum only grows polynomially, which allows us the construction of a reasonably rich set of trees for which the inverse problem can be reduced to a question of elementary number theory. Specifically, we will prove the following theorem in the next section:
All positive integers, except for the 34 numbers 2, 4, 5, 7, 8, 9, 12, 13, 14, 16, 18, 19, 22, 23, 26, 27, 29, 31, 33, 35, 38, 39, 42, 43, 46, 50, 52, 54, 60, 65, 68, 72, 77 and 89, can be expressed as the number of subtrees of some tree.
We will show that it is sufficient to consider caterpillar trees, see Fig. 3. These can be constructed recursively by means of our two operations, where we only use a particular special case of Operation (II). In each step, we attach exactly one new vertex to the root; there are two possibilities for doing so, namely
These operations have the following effects on pairs (a, b) as described in the previous section:
To construct caterpillars, we proceed as follows: start with a single vertex. Now apply Operation (I) a0 times, then our special case of Operation (II), then a1 times Operation (I) again, etc. The final step involves ak applications of Operation (I). The variables ai can be interpreted as the number of horizontal edges between subsequent “legs” of the caterpillar. Let the resulting tree be denoted by C(a0, …, ak); it clearly has a0 + a1 + … + ak + k + 1 vertices.
Straightforward induction yields the following lemma:
The number of subtrees of C(a0, …, ak) containing [resp. not containing] the last root (the rightmost vertex in Figure 3) is
Thus the total number of subtrees of C(a0, …, ak) is given by
This is obviously true for a single vertex (k = 0, a0 = 0), and it is easy to see that it also holds for the path (a0 applications of Operation (I), k = 0 and a0 is arbitrary). Now the number of subtrees of C(a0, …, ak) can be calculated from C(a0, …, ak − 1) inductively:
After the application of our second operation, the pair becomes
and ak-fold application of Operation (I) yields
which completes the proof after some simplifications.
The formula for the number of subtrees C(a0, …, ak) represents a quadratic form, but it is not very practical as it stands. However, a carefully chosen substitution simplifies matters decisively: we set aj = xj − 2xj − 1, where . Then, (1) reduces to
Note that is a so-called pentagonal number. It is part of a more general theorem (see ) that every positive integer is a sum of five pentagonal numbers. However, it is not possible to apply this theorem to our problem, since we have aj ≥ 0 for all j, which yields the restrictions x0 ≥ 1 and xj ≥ 2xj − 1 for 1 ≤ j ≤ k. However, we can show the following, which will be sufficient for our purposes:
Every positive integer n ≥ 11034 can be written as
where x0 ≥ 1 and xj ≥ 2xj − 1 for 1 ≤ j ≤ k − 1.
We show that “for every integer ≥ 160, all integers within the closed interval [11034, ] can be written in the above form for some k, where in addition xk − 1 ≤ ”. This is done by induction on , the base case being 160 ≤ ≤ 319, which can be checked directly by means of a computer using a brute-force approach that just checks all the possibilities; a list of all numbers that are representable in this way is generated for all ≤ 319 in a recursive manner; then it only remains to check that the entire interval is contained. A Mathematica file containing all computations is available online at http://math.sun.ac.za/~swagner/inverseproblem. For the induction step, note that if n can be represented in the above form with xk − 1 ≤ /2, then can be written in this form as well, where now k − 1 is replaced by k and xk = . Therefore, under the induction hypothesis, all numbers in the union
(for the inclusion, simply note that for ≥ 160) can be represented in the desired form, proving the claim.
An immediate consequence of the preceding proposition and its proof is that every integer within the interval
is the number of subtrees of some caterpillar tree, provided that ≥ 160 (simple take xk = 2 in the above formula). Taking the union over all ≥ 160 (where we note that subsequent intervals overlap, since
holds for all ≥ 1), we find that all integers ≥ 1527 can be represented in this way. An additional computer search shows that in fact all positive integers, except for 2, 4, 5, 7, 8, 9, 12, 13, 14, 16, 18, 19, 22, 23, 26, 27, 29, 31, 33, 35, 38, 39, 42, 43, 46, 48, 50, 52, 54, 60, 61, 64, 65, 68, 72, 77, 79, 85, 89, 93, 96, 123, 157 and 183, are the number of subtrees of some caterpillar (one merely has to check the aforementioned lists of representable numbers). Finally, it is not difficult to compute the number of subtrees of all trees with ≤ 18 vertices (by means of our recursions or directly in a brute-force manner) to obtain the final list. All computations are again available online. This is sufficient, since a tree with n vertices has at least subtrees (and thus at least 190 if the number of vertices is ≥ 19): consider any pair of two (possibly equal) vertices; then the unique path between them forms a subtree, proving that the total number of subtrees is indeed at least .
On the meager evidence of two non-trivial tree parameters, the Wiener index and the number of subtrees of a tree, we tend to think that for most tree indices almost all natural numbers are values of that tree parameter, except when we see obvious reasons for the opposite (as in the case of chains, where only specific residue classes are covered). A general (but admittedly only heuristic) argument one is tempted to use is the following: if the number of trees with n vertices grows faster than the corresponding maximum value of a parameter (the number of non-isomorphic trees grows exponentially with an exponential base of 2.955765 —see ), then one would expect that almost all integer values (or at least those in certain residue classes) should be covered. A counterexample for this metaconjecture would be very instructive.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Éva Czabarka, Department of Mathematics, University of South Carolina, Columbia, SC 29208, United States of America.
László Székely, Department of Mathematics, University of South Carolina, Columbia, SC 29208, United States of America.
Stephan Wagner, Department of Mathematical Sciences, Mathematics Division, Stellenbosch University, Private Bag X1, Matieland 7602, South Africa.