There is considerable interest in the possibility that analyses of typological features may enable us to ‘push back the time barrier’ beyond the apparent 6000–10 000 year upper limit of the comparative method (Gray 2005). It has been suggested that typology can reveal historical signal dating back at least this far (Dunn et al.), or even tens of thousands of years earlier (Nichols 1994). The network analysis of WALS structural features points to some intriguing possible deep relationships, perhaps most notably the cluster linking together many of the major language families of Eurasia. However, our analysis of rates of evolution failed to identify any typological features that evolve at consistently slower rates than the basic lexicon. If the signal in the lexicon does stretch back as far as 10 000 years (Nichols 1992; Ringe 1995; Kaufman & Golla 2000), then our results suggest that typological data are constrained by a similar time horizon (e.g. Dunn et al.).
Beyond the difficulty of identifying consistently stable typological features, our findings suggest two further challenges to inferring deep ancestral relationships from structural language data. First, the typological features show relatively high rates of homoplasy. The classification of lexical data into cognate sets relies on isomorphism between sound and meaning within a vast possible state space of the items under comparison. The coupling of these two aspects reduces the possibility of chance similarity (Meillet 1948). In contrast, there is a ‘poverty of choice’ of possible typological states (Harrison 2003). For example, there are only six permutations for the ordering of the subject, object and verb that a language can use. Accordingly, there is a 1/6 chance that any two languages share the same ordering; in fact, since some configurations are much more likely than others, even this probability is an underestimate. This means that, even for a given rate of change, shared typological features are a less reliable indication of common ancestry than shared basic vocabulary, and are more likely to produce spurious relationships.
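The word-order argument above can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the two languages are drawn independently from the same distribution over the six orders, and the skewed weights are hypothetical placeholders chosen to show the direction of the effect, not measured frequencies from WALS or any other source.

```python
from fractions import Fraction


def chance_match_probability(weights):
    """Probability that two independently drawn languages share a state.

    `weights` are relative frequencies of each state; they are normalised
    here, and the match probability is the sum of squared proportions.
    """
    total = sum(weights)
    return sum((Fraction(w) / total) ** 2 for w in weights)


# Six possible orderings of subject, object and verb, all equally likely.
uniform = [1, 1, 1, 1, 1, 1]

# HYPOTHETICAL skewed weights (an assumption for illustration only).
skewed = [45, 40, 9, 3, 2, 1]

p_uniform = chance_match_probability(uniform)
p_skewed = chance_match_probability(skewed)

print("uniform:", p_uniform, float(p_uniform))
print("skewed: ", p_skewed, float(p_skewed))
```

Under the uniform assumption the function returns exactly 1/6. Any departure from uniformity raises the sum of squared proportions, which is why 1/6 is a lower bound on the chance of an accidental match.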
A second issue with identifying slowly evolving typological features is diffusion between geographically proximate languages (Matras et al. 2006). This can occur through processes like language shift (Thomason & Kaufman 1988), where speakers of one language change to another owing to societal influences yet retain morphology or phonology from their original language, or metatypy (Ross 1996), where a language rearranges some aspect of typology (e.g. morphosyntax) owing to contact between languages without explicit borrowing between the languages, usually as an outcome of intimate cultural contact. Our results show a substantial non-tree-like signal in the typological data and a poor fit with known language relationships within the Austronesian and Indo-European language families. On a global scale, the network shows some putative geographical clusters like the ‘Nostratic’ grouping in Eurasia. In this Nostratic cluster, Hindi does not group with Indo-European as expected but is located with its geographical neighbour, the Dravidian language Kannada, suggesting that the similarities seen here may indeed be due to diffusion. Likewise, a grouping of Indonesian, Thai, Vietnamese and Mandarin may be the result of areal diffusion in the Southeast Asian region (Bisang 2006; Matras et al. 2006). The areal diffusion of typological features, like lexical borrowing, makes it harder to identify common ancestry.
Diffusion and chance resemblances are serious challenges for historical inference based on typological data. The problem of diffusion can be lessened if known instances of diffusion are identified and removed (Ross 1996; Dunn et al. 2008), and the data are analysed with methods that are robust to the effect of diffusion (Greenhill et al. 2009). For example, the WALS contains information about word order (subject, object and verb), but additional distinctions can be made between word order for different kinds of clauses (e.g. main versus subordinate clauses) or between clausal and nominal objects. By identifying these and other more specific character states, it may be possible to increase the historical signal in typological data (Reesink et al. 2009), although rates of evolution will then necessarily increase. In addition, the WALS data are unfortunately sparse, containing only 138 characters (compared with the approx. 200 well-attested items of lexicon), and with many languages missing information; perhaps more signal will be evident in a more complete dataset.
While we were unable to identify a set of consistently stable typological features, rates of lexical evolution in one family were a good predictor of rates in the other. This fits with previous work showing that rates of change in lexical items are highly correlated across the Indo-European, Austronesian and Bantu language families (Pagel 2000; Pagel & Meade 2006). Recent work has also shown that rates of lexical change are predictable based on the frequency of use and part of speech (Pagel et al. 2007) and that some meanings have a lexical ‘half-life’ (the time after which there is a 50 per cent chance that the word is replaced) in excess of 20 000 years. These extremely slow and predictable rates of lexical change mean that basic vocabulary may be a more practical choice for investigating questions of deeper language origins.
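To illustrate what a lexical half-life implies, the sketch below converts a half-life into a retention probability over a given time span. It assumes a constant replacement rate (simple exponential decay), which is a simplifying assumption made here for illustration and not necessarily the model used in the cited work; the function names are hypothetical.

```python
import math


def retention_probability(years, half_life):
    """Chance that a word has NOT been replaced after `years`,
    assuming a constant replacement rate (exponential decay)."""
    return 0.5 ** (years / half_life)


def replacement_rate(half_life):
    """Per-year replacement rate implied by a half-life under the
    same constant-rate assumption: rate = ln(2) / half_life."""
    return math.log(2) / half_life


half_life = 20_000  # years, the lower bound quoted in the text

for span in (6_000, 10_000, 20_000):
    p = retention_probability(span, half_life)
    print(f"{span:>6} years: retention probability {p:.3f}")

print("implied rate per year:", replacement_rate(half_life))
```

By definition the retention probability at one half-life is 0.5. Under the constant-rate assumption, a meaning with a 20 000 year half-life would more likely than not still be expressed by the same word across the 6000–10 000 year range discussed for the comparative method.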
Finally, our findings highlight how little we know about the shape and tempo of language change. Contrary to what might be intuitively expected, our results indicate that dependencies between structural elements of language appear to do little to slow down rates of structural change, or to limit the diffusion of features between languages. In addition, we find that rates of structural evolution are specific to each language family, while lexical rates are correlated across families. One explanation for this observation may be that the frequency of use of different structural elements is an important determinant of rates of structural change, just as is the case for lexical change (Pagel et al. 2007). While frequency of word use is relatively constant across languages, the way structures are used depends on what other structural constraints operate in a language (Meillet 1948). This may explain the variation we see in rates of structural evolution between language families. In future, model-based approaches like those outlined here could be used to test hypotheses about macro-scale language change, and so shed light on the basic mechanisms driving the shape and tempo of language evolution.