


J R Soc Interface. 2015 July 6; 12(108): 20150330.

PMCID: PMC4528601

e-mail: stefan.thurner@meduniwien.ac.at

Received 2015 April 14; Accepted 2015 May 19.

Copyright © 2015 The Author(s) Published by the Royal Society. All rights reserved.


The formation of sentences is a highly structured and history-dependent process. The probability of using a specific word in a sentence strongly depends on the ‘history’ of word usage earlier in that sentence. We study a simple history-dependent model of text generation assuming that, on average, the sample-space of word usage reduces along sentence formation. We first show that the model explains the approximate Zipf law found in word frequencies as a direct consequence of sample-space reduction. We then empirically quantify the amount of sample-space reduction in the sentences of 10 famous English books, by analysis of corresponding word-transition tables that capture which words can follow any given word in a text. We find a highly nested structure in these transition tables and show that this ‘nestedness’ is tightly related to the power law exponents of the observed word frequency distributions. With the proposed model, it is possible to understand that the nestedness of a text can be the origin of the actual scaling exponent and that deviations from the exact Zipf law can be understood by variations of the degree of nestedness on a book-by-book basis. On a theoretical level, we are able to show that in the case of weak nesting, Zipf's law breaks down in a fast transition. Unlike previous attempts to understand Zipf's law in language, the sample-space reducing model is not based on assumptions of multiplicative, preferential or self-organized critical mechanisms behind language formation, but simply uses the empirically quantifiable parameter ‘nestedness’ to understand the statistics of word frequencies.

Written texts show the remarkable feature that the rank-ordered distribution of word frequencies follows an approximate power law

*f*(*r*) ∝ *r*^{−*α*},  (1.1)

where *r* is the rank that is assigned to every word in the text. For most texts, regardless of language, time of creation, genre of literature, its purpose, etc., one finds that *α* ~ 1, which is referred to as Zipf's law [1]. In figure 1, the word frequency is shown for Darwin's text, *The origin of species*. The quest for an understanding of the origin of this statistical regularity has been going on for almost a century. Zipf himself offered a qualitative explanation based on the efforts invested in communication events by a sender and a receiver [1]. These ideas were later formalized within an information-theoretic framework [2–5]. The first quantitative model based on linguistic assumptions about text generation was proposed by Simon [6]. The model assumes that as context emerges in the generation of a text, words that have already appeared in the text are favoured over others. By the simple assumption that words that have previously appeared are added to the text with a probability proportional to their previous appearance (preferential attachment), and that words that have so far not appeared are added at a constant rate, it is possible to derive Zipf's law, provided the latter rate is low. This preferential attachment model has been refined by incorporating the empirical fact that the rate of appearance of new words decreases as the length of the text increases [7]. It has been shown in classical works that random typewriting models can lead to Zipf-like distributions of word frequencies [8–10]. However, these works are based on unrealistic assumptions about word-length distributions and lead to unstructured and uninterpretable texts. As we will show, grammar structure, jointly with discourse generation mechanisms, may instead play an essential role in the origin of Zipf's law in a realistic context.
It is important to stress that the detailed statistical study of language properties does not end here; important work beyond Zipf's law has been put forward (e.g. [11–14]). Recent studies deal with the detailed dependence of the scaling exponents on the length of the body of text under study [15,16].
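The rank-ordered distribution of equation (1.1) and its exponent are easy to compute from a tokenized text. The sketch below uses a plain least-squares slope in log–log space; this is a minimal illustration, not the fitting procedure of the paper, and a real analysis would restrict the fit range and prefer a maximum-likelihood estimator.

```python
from collections import Counter
import math

def rank_frequency(words):
    """Word frequencies sorted in descending order (rank 1 = most frequent)."""
    return sorted(Counter(words).values(), reverse=True)

def zipf_exponent(freqs, max_rank=None):
    """Least-squares slope of log f(r) versus log r, returned as alpha,
    so that f(r) ~ r**(-alpha)."""
    n = len(freqs) if max_rank is None else min(max_rank, len(freqs))
    xs = [math.log(r) for r in range(1, n + 1)]
    ys = [math.log(f) for f in freqs[:n]]
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# an exact power law f(r) = 100 / r is recovered with alpha ≈ 1
alpha = zipf_exponent([100.0 / r for r in range(1, 101)])
```

For *The origin of species*, applying this to the full token list yields the approximate slope shown in figure 1.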

Rank-ordered distribution of word frequencies for *The origin of species* (blue) shows an approximate power law with a slope of approximately *α* ~ 0.9. The model result (red line) explains not only the power law exponent, but also captures **...**

Zipf's law is not limited to word frequencies but appears in countless, seemingly unrelated, systems and processes [17]. Just to mention a few, it has been found in the statistics of firm sizes [18], city sizes [1,6,19–22], the genome [23], family names [24], income [25,26], financial markets [27], Internet file sizes [28], or human behaviour [29]; for more examples see [30]. There have been tremendous efforts to understand the origin of Zipf's law, and more generally the origin of scaling in complex systems. There are three main routes to scaling: multiplicative processes [2,6,31], preferential processes [32–34] and self-organized criticality [35]. Several other mechanisms that are more or less related to these basic routes to scaling have been proposed (e.g. [5,36–40]).

Recently, a fourth, independent route to scaling has been introduced on the basis of stochastic processes that reduce their potential outcomes (sample-space) over time [41]. These are history-dependent random processes that have been studied in different contexts in the mathematical literature [42,43], and more recently in the context of scaling laws [44,45]. An example of a sample-space reducing process is the following. Think of a set of *N* dice, where die number 1 has one face, die number 2 has two faces (a coin), die number 3 has three faces, and so on. Die number *N* has *N* faces. Start by picking one of the *N* dice at random, say die number *i*. Throw it and record the obtained face value, which was say *k*. Then take die number *k* − 1, throw it, get *j*, record *j*, take die number *j* − 1, throw it, etc. Keep throwing dice in this way until you throw 1 for the first time. As there is no die with fewer than one face, the process ends here. The sequence of recorded face values in the above prescription (*i*, *k*, *j*, … , 1) is obviously strictly ordered or nested, *i* > *k* > *j* > … > 1. In [41], it was shown rigorously that if this process is repeated many times, the distribution of outcomes (face values 1, 2, … , *N*) is an exact Zipf law, i.e. the probability to observe a face value *m* in the above process (sequence of throws) is exactly *P*_{N}(*m*) ∝ *m*^{−1}.
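The dice process just described is straightforward to simulate. The sketch below follows the prescription above with plain Python and can be used to verify that low face values occur far more often than high ones, approaching the Zipf form *P*(*m*) ∝ 1/*m*:

```python
import random
from collections import Counter

def ssr_run(n, rng=random):
    """One run of the sample-space reducing dice process with dice 1..n.
    Returns the strictly decreasing sequence of recorded face values."""
    die = rng.randint(1, n)            # pick one of the n dice at random
    faces = []
    while True:
        face = rng.randint(1, die)     # throw the current die
        faces.append(face)
        if face == 1:                  # no die has fewer than one face: stop
            return faces
        die = face - 1                 # continue with die number face - 1

# accumulate face-value statistics over many repetitions of the process
counts = Counter()
rng = random.Random(42)
for _ in range(100_000):
    counts.update(ssr_run(10, rng))
```

Plotting `counts[m]` against *m* on log–log axes reproduces the slope of −1 discussed in [41].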

More formally, every die *N* has a sample-space, denoted by *Ω*_{N} = {1, 2, … , *N*}, the set of its possible face values. By construction, the sample-space of each die is fully contained in the sample-space of every larger die,

*Ω*_{1} ⊂ *Ω*_{2} ⊂ ⋯ ⊂ *Ω*_{*N*−1} ⊂ *Ω*_{N}.  (1.2)

The nestedness of sample-spaces in a history-dependent sequence is at the heart of the origin of scaling laws in this type of process. For details, see [41], where it is also shown that if noise is added to the history-dependent processes, the scaling law becomes *P*_{N}(*m*) ∝ *m*^{−λ}, with an exponent 0 < λ ≤ 1 that depends on the noise level.

In this paper, we present a derivation of Zipf's law of word frequencies, based on a simple model for sentence/discourse formation. The model is motivated by the observation that the process of forming a sentence—or more generally a discourse—is a history-dependent sample-space reducing process. Words are not randomly drawn from the sample-space of all possible words, but are used in strict relations to each other. The usage of specific words in a sentence highly restricts the usage of consecutive words, leading to a nesting (or sample-space reducing) process, similar to the one described above. Sample-space collapse in texts is necessary to convey meaningful information. Otherwise, any interpretation, even in metaphoric or poetic terms, would become impossible. Let us make the point more concrete with an example for the formation of a sentence, where both grammatical and contextual constraints (that reduce sample-space) are at work (figure 2). We form the sentence: ‘The wolf howls in the night’. In principle, the first word, ‘wolf’ (ignoring articles and prepositions for the moment), can be drawn from all possible words. Assume there exist *N* possible words, and denote the respective sample-space by *Ω*_{N} = {1, 2, … , *N*}. Choosing ‘wolf’ restricts the sample-space for the next word to those words that are grammatically and semantically compatible with it.

Schematic view of nestedness in sentence formation. (*a*) Among all the potential *N* words defining the initial sample-space, we choose ‘wolf’ (*b*). This choice restricts the sample-space for the next word (orange circle) that has to be grammatically **...**

The role of grammar for nesting is obvious. Typically in English, the first word is a noun with the grammatical role of the *subject*. The fact that the first word is a noun restricts the possibilities for the next word to the subset of *verbal phrases*. Depending on the particular verb chosen, the words that can now follow are typically playing the grammatical role of the *object* and are again more restricted. We use the terms sample-space reduction and nested hierarchical structure in sentences interchangeably. It is not only grammatical structure that imposes consecutive restrictions on the sample-space of words as the sentence progresses; the need for intelligibility has the same effect. Without (at least partial) hierarchical structures in the formation of sentences, their *interpretation* would become very hard [46]. However, nested structures in sentences will generally not be strictly realized. Otherwise the creative use and flexibility of language would be seriously constrained. Sometimes a word can act as a linguistic hinge, meaning that it allows for many more consecutive words than were available for the preceding word. One expects that nestedness will be realized only to some degree. Imperfect nestedness allows for a degree of ambiguity in the linguistic code and is one of the sources of its astonishing versatility [47].

In this paper, we quantify the degree of nestedness of a text from its word-transition matrix *M* (network). To characterize the hierarchical structure of a text with a single number, we define its nestedness *n* as a property of *M* by

*n*(*M*) = ⟨ |*Ω*_{i} ∩ *Ω*_{j}| / min(|*Ω*_{i}|, |*Ω*_{j}|) ⟩,  (1.3)

where the average is taken over all possible word pairs (*i*, *j*). Nestedness is a number between 0 and 1, and specifies to what extent sample-space reduction is present on average in the text.^{1} A strictly nested system, like the one shown in equation (1.2), has *n*(*M*) = 1. In linguistic terms, strict nestedness is clearly unrealistic.
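Such an overlap-based measure can be sketched as follows, representing each sample-space as a Python set. The specific overlap ratio |*Ω*_{i} ∩ *Ω*_{j}| / min(|*Ω*_{i}|, |*Ω*_{j}|) is our reading of equation (1.3): a pair of words scores 1 exactly when the smaller sample-space lies entirely inside the larger one.

```python
from itertools import combinations

def nestedness(omega):
    """Average pairwise sample-space overlap over all word pairs.
    `omega` maps each word to the set of words observed to follow it."""
    words = sorted(w for w in omega if len(omega[w]) >= 2)  # drop |Omega_i| < 2
    pairs = list(combinations(words, 2))
    return sum(len(omega[i] & omega[j]) / min(len(omega[i]), len(omega[j]))
               for i, j in pairs) / len(pairs)

# a strictly nested system, as in equation (1.2), has nestedness 1
strict = {1: {1, 2}, 2: {1, 2, 3}, 3: {1, 2, 3, 4}}
print(nestedness(strict))  # -> 1.0
```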

We use word-transition matrices from actual English texts, which serve as the input to a simple model for sentence formation. We then study the word frequency distributions of these artificially produced texts and compare them with the distributions of the original texts. For the first time, we show that it is possible to relate the topological feature of (local) nestedness in sentence formation to the global features of word frequency distributions of long texts. In this respect, we propose a way to understand the statistics of word frequencies—Zipf's law in particular—by the actual structural feature of language, nestedness, without the need to resort to previous attempts including multiplicative processes, preferential attachment or self-organized criticality, which, in the context of language, sometimes seem to rest on strong and implausible assumptions.

We assume a finite vocabulary of *N* words. From any given text, we obtain an empirical word-transition matrix *M*. Words are labelled with latin indices. *M*_{ij} = 1 means that in the text we find at least one occasion where word *i* is directly followed by word *j*; otherwise *M*_{ij} = 0. The non-zero entries of row *i* define the sample-space *Ω*_{i} of word *i*, i.e. the set of words observed to follow it; its size |*Ω*_{i}| is the sample-space volume of word *i*.

Section of word-transition matrix *M* for the 250 words that show the largest sample-space volume of consecutive words (*a*). A black entry (*M*_{ij} = 1) means that a given word *i* (*y*-axis) is followed by word *j* (*x*-axis). Non-trivial nestedness is seen by the **...**

Note that the profile in figure 3*b* is actually not well fitted by a power law; the parametrization serves a purely theoretical argument that will become clear below. We exclude words that are followed by fewer than two different words in the entire text, i.e. we remove all lines *i* from *M* for which |*Ω*_{i}| < 2. Strict nestedness is not to be confused with strong or weak nesting; the latter are properties of the sample-space profile.
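Extracting the sample-spaces *Ω*_{i} from a tokenized text, including the |*Ω*_{i}| < 2 exclusion, is a few lines of code. A minimal sketch (the toy input is hypothetical, and no punctuation or sentence-boundary handling is shown):

```python
from collections import defaultdict

def transition_sets(tokens):
    """Build Omega_i, the set of distinct words observed to directly
    follow word i; drop words followed by fewer than two distinct words."""
    omega = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        omega[a].add(b)
    return {w: s for w, s in omega.items() if len(s) >= 2}

toks = "the wolf howls in the night the wolf sleeps".split()
omega = transition_sets(toks)
# 'the' is followed by {'wolf', 'night'} and 'wolf' by {'howls', 'sleeps'};
# all other words are followed by only one word and are excluded
```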

For statistical testing, we construct two randomized versions of *M*, and denote them by *M*_{rand} and *M*_{row-perm}, respectively. *M*_{rand} is obtained by randomly permuting the entries within each row of the matrix *M*. This keeps the number of non-zero entries in every row the same as in the original matrix *M*, but destroys its nestedness and the information on which words follow each other. The second randomized version *M*_{row-perm} is obtained by permuting the (entire) rows of the matrix *M*. This keeps the nestedness of the matrix unchanged, but destroys the information on word transitions.
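The two randomizations can be sketched as follows, with *M* given as a list of 0/1 rows (a sketch of the idea, not the authors' code):

```python
import random

def randomize_within_rows(M, rng=random):
    """M_rand: shuffle the entries within every row. The number of non-zero
    entries per row is preserved; nestedness and word-transition
    information are destroyed."""
    rows = [list(row) for row in M]
    for row in rows:
        rng.shuffle(row)
    return rows

def permute_rows(M, rng=random):
    """M_row-perm: shuffle entire rows. The collection of sample-spaces
    (hence nestedness) is preserved; word-transition information is lost."""
    rows = [list(row) for row in M]
    rng.shuffle(rows)
    return rows
```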

Given *M*, we construct random sentences of length *L* with the following model:

- Pick one of the *N* words randomly. Say the word was *i*. Write *i* in a wordlist *W*, so that *W* = {*i*}.
- Jump to line *i* in *M* and randomly pick a word from the set *Ω*_{i}. Say the word chosen is *k*; update the wordlist *W* = {*i*, *k*}.
- Jump to line *k* and pick one of the words from *Ω*_{k}; say you get *j*, and update *W* = {*i*, *k*, *j*}.
- Repeat the procedure *L* times. At this stage, a random sentence is formed.
- Repeat the process to produce *N*_{sent} sentences.

In this way, we get a wordlist with *L* × *N*_{sent} entries, which is a random book that is generated with the word-transition matrix of an actual book. From the wordlist, we obtain the word frequency distribution *f*_{model}. The present model is similar to the one in [41] but differs in three aspects: it allows for non-perfect nesting *n* < 1, it has no explicit noise component, and it has a fixed sequence (sentence) length.
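The steps above can be sketched directly on top of the *Ω*_{i} sets. The strictly nested toy vocabulary below is a hypothetical stand-in for the transition matrix of a real book:

```python
import random
from collections import Counter

def generate_book(omega, L=10, n_sent=1000, rng=random):
    """Generate L * n_sent words: each sentence starts from a random word,
    then repeatedly jumps into the sample-space of the word just written."""
    vocab = sorted(omega)
    wordlist = []
    for _ in range(n_sent):
        w = rng.choice(vocab)
        wordlist.append(w)
        for _ in range(L - 1):
            w = rng.choice(sorted(omega[w]))   # pick a follower of w
            wordlist.append(w)
    return wordlist

# toy strictly nested sample-spaces: word i can be followed by words 0..i
omega = {i: set(range(i + 1)) for i in range(20)}
book = generate_book(omega, L=10, n_sent=500, rng=random.Random(7))
freqs = Counter(book)   # low-index words dominate, as in a nested text
```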

We analyse the model with computer simulations, specifying *L* = 10 and *N*_{sent} = 100 000. We use 10 randomly chosen books^{2} from Project Gutenberg (www.gutenberg.org). For every book, we determine its vocabulary *N*, its matrix *M*, its *Ω*_{i} for all words, and its nestedness *n*(*M*).

The distribution obtained from the model *f*_{model} is clearly able to reproduce the approximate power law exponent for *The origin of species*, *α*_{model} ~ 0.86 (same fit range). Moreover, it captures details of the distribution *f*. For large values of *r* in *f*_{model}(*r*), a plateau is forming before the exponential finite size cut-off is observed. Both plateau and cut-off can be fully understood with the randomized model.

In figure 4*a*, we compare the *α* exponents as extracted from the books with the model results *α*_{model}. The model obviously explains the actual values to a large extent, slightly underestimating the actual exponents. We get a correlation coefficient of *ρ* = 0.95 (*p* < 3.7 × 10^{−5}). In figure 4*b*, we show that nesting *n*(*M*) is related to the exponents *α* in an approximately linear way. We test the hypothesis that by destroying nestedness the exponents will vanish. Using the randomized *M*_{rand} (same fit range), the power law is effectively destroyed. Using the other randomized version that keeps the nestedness intact, *M*_{row-perm}, we find, for low-rank words (up to rank approx. 10), similar word frequency distributions as for *M*; however, as expected, the power law tail (high ranks) vanishes for *M*_{row-perm} due to the noise contribution of the randomization (not shown). To validate our assumption that word ordering is essential, we computed the model rank distributions using the transposed matrix *M*^{T}, meaning that we reverse the flow of time in the model. We find two results. First, the correlation between the exponents of the books *α* and the model vanishes, reflected by an insignificant correlation coefficient *ρ* = 0.47 (*p* = 0.17). Second, the exponents (averaged over the 10 books) are significantly smaller than for the correct time flow; the corresponding *p*-value of a *t*-test is 0.039.
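The book–model comparison rests on a standard Pearson correlation between the two sets of exponents. For completeness, a dependency-free sketch of the coefficient used here:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# perfectly linearly related samples give rho = 1 (the values are illustrative)
rho = pearson([0.85, 0.90, 1.00, 1.10], [0.80, 0.85, 0.95, 1.05])
```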

Finally, we try to understand the importance of the sample-space profile for the scaling exponents. For this, we generate a series of *M* matrices that have a profile parametrized with a power *κ*. In figure 4*c*, the model exponents *α*_{model} from these artificially generated *M* are shown as a function of *κ*, for various sizes of vocabulary *N*. For *κ* < 1 (weak nesting), we find exponents *α*_{model} ≈ 0, i.e. no scaling law. For large *N*, at *κ* = 1 a fast transition to *α*_{model} ≈ 1 (Zipf) occurs. For smaller *N*, we find a more complicated behaviour of the transition, with a maximum exponent at *κ* < 1. The book exponents *α* range between 0.85 and 1.1, which is exactly the range obtained for realistic vocabulary sizes *N* ~ 1000–10 000. We verified that variations in sentence length (with the exception of *L* = 1) do not change the reported results. For one-word sentences (*L* = 1), we obviously get a uniform word frequency distribution and, as a consequence, a flat rank distribution, as most words have almost the same rank. We varied the number of sentences from *N*_{sent} = 10^{4} to 10^{6}, and find practically no influence on the reported results.
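A family of *κ*-parametrized matrices can be generated, for instance, as strictly nested sample-spaces whose volumes follow a power profile. The precise parametrization is not spelled out in the text, so the form below is an illustrative assumption:

```python
import math

def power_profile_omega(n, kappa):
    """Strictly nested sample-spaces with a power-law volume profile:
    word i (i = 1..n) may be followed by the first ceil(n * (i/n)**kappa)
    words. kappa = 1 reproduces the linear dice profile; kappa < 1 gives
    weaker nesting (larger sample-spaces, weaker reduction)."""
    return {i: set(range(1, math.ceil(n * (i / n) ** kappa) + 1))
            for i in range(1, n + 1)}

om = power_profile_omega(10, 1.0)   # |Omega_i| = i, the dice case
```

Feeding such matrices into the sentence-formation model and measuring *α*_{model} as a function of *κ* reproduces the kind of scan shown in figure 4*c*.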

In this paper, we focus on the fundamental property of nestedness in any code that conveys meaningful information, such as language. We argue that if nesting was not present, one would easily end up in confusing situations as described in *La Biblioteca de Babel* by J. L. Borges, where a hypothetical library owns all books composed of *all* possible combinations of characters filling 410 pages. We define and quantify a degree of nestedness in the linguistic code. Low degrees of nestedness typically imply a less strict hierarchy on word usage, or a more *egalitarian* use of the vocabulary, than texts with high nestedness. As expected, texts have a well defined, but not strictly nested structure, which might arise from a compromise between specificity (to convey unambiguous messages) and flexibility (to allow a creative use of language). We find that nestedness varies between different texts, suggesting that different ways of using the vocabulary and grammar are at work. Our sample of texts included three plays by Shakespeare, three scientific texts and four novels. We find that the plays, maybe closest to spoken language, show a lower nestedness than the science books. The novels show the highest levels of nestedness. The sample is too small to draw conclusions on whether different types of texts are characterized by typical values of nestedness; however, it is remarkable that nestedness is correlated with the variations of the scaling exponents of word frequencies on a book-by-book basis.

The main finding of this paper is that a simple sample-space reducing model shows that nestedness indeed explains the emergence of scaling laws in word frequencies, in particular Zipf's law. More precisely, we were able to relate the emergence of scaling laws to the topological structure of the word-transition matrix, or ‘phasespace’. The result is remarkable, as the matrix does not encode any information about how often word *j* follows word *i*; it only records that *j* followed *i* at least once in the entire text. Random permutations of the matrix that destroy its nestedness can no longer explain the scaling, while permutations that keep nesting intact do indicate the existence of the power laws. It is further remarkable that no (non-local) preferential, multiplicative or self-organized critical assumptions are needed to understand the observed scaling, and that no parameters are needed beyond the word-transition matrices.

The fact that the simple model is so successful in reproducing the detailed scaling property of word frequency statistics might point to an important aspect of language that has not been noted so far: that overall word use is statistically strongly influenced by the local hierarchical structures and constraints we employ in generating sentences. We believe that the close relation between nestedness and the scaling exponent opens the door to an interpretation of word frequency distributions as a statistical observable that strongly depends on the usage of the vocabulary and grammar within a language. Accordingly, we conjecture that Zipf's law might not be universal, but that word-use statistics depend on local structures which may differ across texts and even within sentences. Further research is needed to clarify this point.

Finally, it is worth noting that the class of sample-space reducing processes provides an independent route to scaling that might have a wide range of applications for history-dependent and ageing processes [41]. In statistical physics, it is known that processes that successively reduce their phasespace as they unfold are characterized by power law or stretched exponential distribution functions. These distributions generically arise as a consequence of phasespace collapse [44].

^{1}Note that the nesting indicator in equation (1.3) is reasonable only for the case where the probability of two words *i*, *j* having the same sample-space is very low, *p*(*Ω*_{i} = *Ω*_{j}) ≈ 0.

^{2}In particular, we use *An American tragedy*, by Theodore Dreiser; *The origin of species*, *Descent of man* and *Different forms of plants* by Charles Darwin; *Tale of two cities* and *David Copperfield* by Charles Dickens; *Romeo and Juliet*, *Henry V* and *Hamlet* by William Shakespeare; and *Ulysses* by James Joyce. Vocabulary varies from *N* = 3102 (*Romeo and Juliet*) to 22 000 (*Ulysses*) words.

S.T. designed the research, performed numerical analysis and wrote the manuscript. R.H. and B.C.-M. performed numerical analysis and wrote the manuscript. B.L. did preprocessing of the books and performed numerical analysis.

The authors declare no competing financial interests.

This work was supported by the Austrian Science Fund FWF under KPP23378FW.

1. Zipf GK. 1949. Human behavior and the principle of least effort. Reading, MA: Addison-Wesley.
2. Mandelbrot B. 1953. An informational theory of the statistical structure of language. In Communication theory (ed. Jackson W). London, UK: Butterworths.
3. Harremoës P, Topsøe F. 2001. Maximum entropy fundamentals. Entropy 3, 191–226. (doi:10.3390/e3030191)
4. Ferrer i Cancho R, Solé RV. 2003. Least effort and the origins of scaling in human language. Proc. Natl Acad. Sci. USA 100, 788–791. (doi:10.1073/pnas.0335980100)
5. Corominas-Murtra B, Fortuny J, Solé RV. 2011. Emergence of Zipf's law in the evolution of communication. Phys. Rev. E 83, 036115. (doi:10.1103/PhysRevE.83.036115)
6. Simon HA. 1955. On a class of skew distribution functions. Biometrika 42, 425–440. (doi:10.1093/biomet/42.3-4.425)
7. Zanette DH, Montemurro MA. 2005. Dynamics of text generation with realistic Zipf's distribution. J. Quant. Linguist. 12, 29–40. (doi:10.1080/09296170500055293)
8. Li W. 1992. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Trans. Inform. Theory 38, 1842–1845. (doi:10.1109/18.165464)
9. Miller GA. 1957. Some effects of intermittent silence. Am. J. Psychol. 70, 311–314. (doi:10.2307/1419346)
10. Miller GA, Chomsky N. 1963. Finitary models of language users. In Handbook of mathematical psychology, vol. 2 (eds Luce RD, Bush R, Galanter E), pp. 419–491. New York, NY: Wiley.
11. Kosmidis K, Kalampokis A, Argyrakis P. 2006. Statistical mechanical approach to human language. Phys. A 366, 495–502. (doi:10.1016/j.physa.2005.10.039)
12. Wichmann S. 2005. On the power-law distribution of language family sizes. J. Linguist. 41, 117–131. (doi:10.1017/S002222670400307X)
13. Serrano MA, Flammini A, Menczer F. 2009. Modeling statistical properties of written text. PLoS ONE 4, e5372. (doi:10.1371/journal.pone.0005372)
14. Zanette DH, Montemurro MA. 2011. Universal entropy of word ordering across linguistic families. PLoS ONE 6, e19875. (doi:10.1371/journal.pone.0019875)
15. Font-Clos F, Boleda G, Corral A. 2013. A scaling law beyond Zipf's law and its relation to Heaps’ law. N. J. Phys. 15, 093033. (doi:10.1088/1367-2630/15/9/093033)
16. Yan X-Y, Minnhagen P. 2014. Comment on ‘A scaling law beyond Zipf's law and its relation to Heaps’ law’. (http://arxiv.org/abs/1404.1461)
17. Kawamura K, Hatano N. 2002. Universality of Zipf's law. J. Phys. Soc. Jpn 71, 1211–1213. (doi:10.1143/JPSJ.71.1211)
18. Axtell RL. 2001. Zipf distribution of US firm sizes. Science 293, 1818–1820. (doi:10.1126/science.1062081)
19. Makse HA, Havlin S, Stanley HE. 1995. Modelling urban growth patterns. Nature 377, 608–612. (doi:10.1038/377608a0)
20. Krugman P. 1996. Confronting the mystery of urban hierarchy. J. Jpn Int. Econ. 10, 399–418. (doi:10.1006/jjie.1996.0023)
21. Blank A, Solomon S. 2000. Power laws in cities population, financial markets and internet sites. Phys. A 287, 279–288. (doi:10.1016/S0378-4371(00)00464-7)
22. Decker EH, Kerkhoff AJ, Moses ME. 2007. Global patterns of city size distributions and their fundamental drivers. PLoS ONE 2, e934. (doi:10.1371/journal.pone.0000934)
23. Stanley HE, Buldyrev S, Goldberger A, Havlin S, Peng C, Simons M. 1999. Scaling features of noncoding DNA. Phys. A 273, 1–18. (doi:10.1016/S0378-4371(99)00407-0)
24. Zanette DH, Manrubia SC. 2001. Vertical transmission of culture and the distribution of family names. Phys. A 295, 1–8. (doi:10.1016/S0378-4371(01)00046-2)
25. Pareto V. 1896. Cours d'Economie Politique. Geneva, Switzerland: Droz.
26. Okuyama K, Takayasu M, Takayasu H. 1999. Zipf's law in income distribution of companies. Phys. A 269, 125–131. (doi:10.1016/S0378-4371(99)00086-2)
27. Gabaix X, Gopikrishnan P, Plerou V, Stanley HE. 2003. A theory of power-law distributions in financial market fluctuations. Nature 423, 267–270. (doi:10.1038/nature01624)
28. Reed WJ, Hughes BD. 2002. From gene families and genera to incomes and internet file sizes: why power laws are so common in nature. Phys. Rev. E 66, 067103. (doi:10.1103/PhysRevE.66.067103)
29. Thurner S, Szell M, Sinatra R. 2012. Emergence of good conduct, scaling and Zipf laws in human behavioral sequences in an online world. PLoS ONE 7, e29796. (doi:10.1371/journal.pone.0029796)
30. Newman MEJ. 2005. Power laws, Pareto distributions and Zipf's law. Contemp. Phys. 46, 323–351. (doi:10.1080/00107510500052444)
31. Solomon S, Levy M. 1996. Spontaneous scaling emergence in generic stochastic systems. Int. J. Mod. Phys. C 7, 745–751. (doi:10.1142/S0129183196000624)
32. Malcai O, Biham O, Solomon S. 1999. Power-law distributions and Lévy-stable intermittent fluctuations in stochastic systems of many autocatalytic elements. Phys. Rev. E 60, 1299–1303. (doi:10.1103/PhysRevE.60.1299)
33. Lu ET, Hamilton RJ. 1991. Avalanches and the distribution of solar flares. Astrophys. J. 380, 89–92. (doi:10.1086/186180)
34. Barabási A-L, Albert R. 1999. Emergence of scaling in random networks. Science 286, 509–512. (doi:10.1126/science.286.5439.509)
35. Bak P, Tang C, Wiesenfeld K. 1987. Self-organized criticality: an explanation of the 1/*f* noise. Phys. Rev. Lett. 59, 381–384. (doi:10.1103/PhysRevLett.59.381)
36. Saichev A, Malevergne Y, Sornette D. 2008. Theory of Zipf's law and of general power law distributions with Gibrat's law of proportional growth. (http://arxiv.org/abs/0808.1828)
37. Pietronero L, Tosatti E, Tosatti V, Vespignani A. 2001. Explaining the uneven distribution of numbers in nature: the laws of Benford and Zipf. Phys. A 293, 297–304. (doi:10.1016/S0378-4371(00)00633-6)
38. Thurner S, Tsallis C. 2005. Nonextensive aspects of self-organized scale-free gas-like networks. Europhys. Lett. 72, 197–203. (doi:10.1209/epl/i2005-10221-1)
39. Corominas-Murtra B, Solé RV. 2010. Universality of Zipf's law. Phys. Rev. E 82, 011102. (doi:10.1103/PhysRevE.82.011102)
40. Montroll EW, Shlesinger MF. 1982. On 1/f noise and other distributions with long tails. Proc. Natl Acad. Sci. USA 79, 3380–3383. (doi:10.1073/pnas.79.10.3380)
41. Corominas-Murtra B, Hanel R, Thurner S. 2015. Understanding scaling through history-dependent processes with collapsing sample space. Proc. Natl Acad. Sci. USA 112, 5348–5353. (doi:10.1073/pnas.1420946112)
42. Kac M. 1989. A history-dependent random sequence defined by Ulam. Adv. Appl. Math. 10, 270–277. (doi:10.1016/0196-8858(89)90014-6)
43. Clifford P, Stirzaker D. 2008. History-dependent random processes. Proc. R. Soc. A 464, 1105–1124. (doi:10.1098/rspa.2007.0291)
44. Hanel R, Thurner S, Gell-Mann M. 2014. How multiplicity of random processes determines entropy: derivation of the maximum entropy principle for complex systems. Proc. Natl Acad. Sci. USA 111, 6905–6910. (doi:10.1073/pnas.1406071111)
45. Hanel R, Thurner S. 2013. Generalized (c,d)-entropy and aging random walks. Entropy 15, 5324–5337. (doi:10.3390/e15125324)
46. Partee BH. 1976. Montague grammar. New York, NY: Academic Press.
47. Fortuny J, Corominas-Murtra B. 2013. On the origin of ambiguity in efficient communication. J. Logic Lang. Inform. 22, 249–267. (doi:10.1007/s10849-013-9179-3)

