PLoS One. 2012; 7(4): e35289.
Published online Apr 27, 2012. doi:  10.1371/journal.pone.0035289
PMCID: PMC3338724
Dating the Origin of Language Using Phonemic Diversity
Charles Perreault#1 and Sarah Mathew#2*
1Santa Fe Institute, Santa Fe, New Mexico, United States of America
2Centre for the Study of Cultural Evolution, Stockholm University, Stockholm, Sweden
Michael D. Petraglia, Editor
University of Oxford, United Kingdom
#Contributed equally.
* E-mail: sarah.mathew/at/arklab.su.se
Conceived and designed the experiments: SM CP. Performed the experiments: SM CP. Analyzed the data: SM CP. Contributed reagents/materials/analysis tools: SM CP. Wrote the paper: SM CP.
Received September 8, 2011; Accepted March 14, 2012.
Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that has sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case-study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data.
A capacity for language is a hallmark of our species [1], [2], yet we know little about the timing of its appearance. Language appears in the archaeological record only recently, with the advent of lexicographic writing around 5,400 years ago [3]. Therefore, investigators have addressed the origin of language by studying the evolutionary history of anatomical features [4]–[7] and genes [8]–[15] that are associated with speech production. This research suggests that other Homo species had the ability to produce speech sounds that overlap with the range of speech sounds of modern humans, and that species such as Neanderthals possessed genes that, in humans, play a role in language. But we do not know whether these archaic hominins actually produced speech, and if so, to what extent it was similar to our capacity for language. As of now, the anatomical and genetic data lack the resolution necessary to differentiate proto-language from modern human language. Until this resolution is improved, we need alternative lines of evidence in order to better understand the timing of language origin.
Here, we use phonemic diversity data to date the origin of language. Phonemic diversity denotes the number of perceptually distinct units of sound–consonants, vowels and tones–in a language. The worldwide pattern of phonemic diversity potentially contains the statistical signal of the expansion of modern humans on the planet [16]. As human populations left Africa, 60–70 kya, and expanded into the rest of the world [1], [17], they underwent a series of bottlenecks. This serial founder effect has led to a clinal loss of genetic [18]–[20], phenotypic [21]–[23] and phonemic diversity [16] that can be observed in present-day human populations. African languages today have some of the largest phonemic inventories in the world, while the smallest inventories are found in South America and Oceania, some of the last regions of the globe to be colonized. The loss of phonemes through serial founder effect is consistent with other lines of evidence that indicate that phonemic diversity is determined by cultural transmission forces, rather than cognitive or functional constraints. First, phonemic diversity varies considerably among languages, and several languages function with a restricted number of phonemes. Rotokas, a language of New Guinea, and Pirahã, spoken in South America, both have 11 phonemes [24], [25], while !Xun, a language spoken in Southern Africa, has 141 phonemes. Second, as predicted by theoretical models linking cultural transmission and demography [26]–[28], phonemic diversity correlates positively with speaker population size [16], [29]. And finally, phonemic diversity also correlates positively with the number of surrounding languages [16], suggesting that phonemes, like other cultural traits, can be borrowed. Phonemic diversity not only evolves culturally, but it also evolves slowly [16].
That the languages outside of Africa might not have recovered their original phonemic diversity, despite thousands of years of history in their respective continents, and despite all the historical, linguistic and social factors that lead to linguistic change [30]–[36], suggests that phonemic diversity changes over long time scales. Here, we take advantage of the fact that phonemic diversity evolves culturally and slowly, and use it as a slow clock to date the origin of language.
By focusing on phonemes rather than cognates–words that share a common ancestry–we are able to circumvent problems that prevent current historical linguistic approaches from tackling the problem of dating the origin of language. Glottochronology uses the number of cognates that languages share to estimate when they diverged [37]–[39]. However, because cognates change over short time scales, the time-depth resolution of glottochronology is limited to a few thousand years [8]. Several historical, social and demographic factors influence cognate evolution [31], [32], [34], [40], [41], a main one being frequency of word use. Common words evolve more slowly than rare ones [42]. Frequency of word use alone predicts 50% of the variation in rates of cognate change, and can generate cognate half-lives that range from 750 years to more than 10,000 years [42]. Such variation in rates of cognate change is problematic for glottochronology, because glottochronology assumes a constant rate of cognate change [43], [44]. The assumption of a constant rate of change can be relaxed by applying phylogenetic methods to cognate datasets. These methods are powerful tools for estimating the date of divergence of language families [45]–[47]. Nonetheless, the temporal scope of this method is, at least in its current state, too limited to address questions about the origin of language. For instance, the average word half-life among Indo-European languages is about 2,530 years [42]. Here we circumvent the problem of variation in rates by averaging rates of phoneme accumulation over a large spatial and temporal scale.
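To make the time-depth limitation concrete, the half-lives quoted above can be turned into expected retention fractions. The sketch below assumes simple exponential decay, which is what the term "half-life" implies but which the text does not spell out; the function name `retained_fraction` and the two time depths are illustrative choices, not part of the original analysis.

```python
def retained_fraction(years: float, half_life: float) -> float:
    """Expected fraction of cognates still retained after `years`,
    assuming exponential decay with the given half-life (an assumption)."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (years / half_life)


if __name__ == "__main__":
    # Half-lives in years quoted in the text: fast end, Indo-European
    # average, and slow end (the text says "more than 10,000").
    for half_life in (750, 2530, 10_000):
        # Illustrative time depths: a few millennia vs. a Pleistocene date.
        for years in (5_000, 50_000):
            fraction = retained_fraction(years, half_life)
            print(f"half-life {half_life:>6} y, after {years:>6} y: "
                  f"{fraction:.2e} retained")
```

Under this assumption, a word with the average Indo-European half-life has been through nearly 20 half-lives after 50,000 years, so essentially no shared cognates would remain to compare at that depth.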
Given that languages accumulate phonemes over long time scales, we ask how long African languages had to have been around in order to reach their current phonemic diversity. We start by building two related mathematical models that describe two ways by which phonemic diversity can rise through time. In the first model, phonemic inventory increases linearly with time, while in the second model phonemic inventory increases exponentially. Then, we parametrize the two models with empirical data. Finally, we use rewritten forms of the models to estimate the time span over which phonemes would have had to accumulate in Africa.
We do not attempt to capture all the factors that influence phonemic inventory size. The state of our knowledge does not allow us to formalize the specific mechanisms by which phonemic diversity increases and decreases. Therefore, our models are agnostic about the particular mechanisms of change in phonemic diversity, and capture only the net effect of these mechanisms on phonemic diversity. We summarize this net effect as a single number, a rate of phoneme accumulation through time. Note that phonemic changes that occur within a language and that do not lead to a net change in the size of the phonemic inventory are not relevant to our analysis. The crucial assumption underlying our models is that the net effect of the factors leading to phonemic gain is greater than the net effect of those leading to loss. When this assumption is met, all other things being equal, phonemic diversity increases through time.
The method used in this paper to date the origin of language is built upon various assumptions that require further testing. An assumption underlying the empirical parametrization of the model is that human populations have lost phonemes through a drift-loss process during their expansion across the world [16]. However, this hypothesis is not widely accepted among linguists. Problems with the drift-loss hypothesis are discussed in a collection of commentaries published in Linguistic Typology [48]–[60] and Science [61]–[63]. Overall, these commentaries highlight the fact that, while Atkinson’s hypothesis remains viable, alternative hypotheses for the worldwide pattern of phonemic diversity have yet to be satisfactorily rejected [64], [65]. As we describe our method and material below, we specify the other assumptions that we have made and that also need further investigation to be validated. Despite these caveats, our approach constitutes a novel solution to the difficult question of dating the origin of language.
We start by estimating the rate at which languages accumulate phonemes. Controlling for distance from Africa, the phonemic diversity of a language depends on the speaker population size, the geographic area over which the language is spoken, and local linguistic diversity [16]. This suggests that new phonemes are more likely to appear in large populations. It also suggests that phonemes can be borrowed through contact between groups and languages [16].
With that in mind, consider the hypothetical case of two small populations, B and C, that dispersed from the same parent population, A, t years ago (Figure 1). Suppose that B and C are similar in size so that they both experience approximately the same loss in phonemic diversity due to the founder effect. Now, suppose that population B colonizes a large continental territory and subsequently expands and diversifies linguistically [66], [67]. In contrast, population C settles on a small island that does not allow for population expansion and language diversification. Because of the differences between the regions colonized by B and C, population B will accumulate phonemes at a faster rate than population C. Furthermore, if population C evolves on a sufficiently small island and remains isolated for most of its history, then the rate of phoneme accumulation in C will be low, and its phonemic diversity will remain approximately stable through time. Consequently, the present-day difference between the phonemic diversity of B and C can be attributed to the new phonemes accumulated within population B. Thus, the current phonemic diversity of population C has remained through time a good approximation of the original phonemic diversity of population B. When this is true, and if the date of colonization, t, is known, then it is possible to estimate the phoneme accumulation rate in a large population as
$$r_l = \frac{P_B - P_C}{t} \tag{1}$$
assuming that phonemic inventories increase linearly, and
$$r_e = \frac{\ln(P_B / P_C)}{t} \tag{2}$$
assuming that phonemic inventories increase exponentially. $P_B$ and $P_C$ are the current phonemic diversity of populations B and C, and $t$ is the time elapsed between the divergence of B and C and the moment when their present phonemic inventories were measured. The linear model (Equation 1) is appropriate when phonemes increase independently of a language’s phonemic diversity. The exponential model (Equation 2) captures the alternative situation where the rate at which phonemes accumulate increases with a language’s phonemic diversity. Such dependence would arise, for instance, if each phoneme has the potential to give rise to new phonemes.
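For readers who want the intermediate step, the two rate estimates can be read as inversions of two simple growth laws. The following is a sketch in assumed notation ($r_l$ and $r_e$ for the linear and exponential rates, $P_B$ and $P_C$ for the present-day inventories of populations B and C, $t$ for the time since their divergence), taking $P_C$ as a stand-in for the inventory of B at the time of colonization, as argued above:

```latex
\begin{align*}
\text{linear:} \quad & P_B = P_C + r_l\, t
  && \Longrightarrow \quad r_l = \frac{P_B - P_C}{t}, \\
\text{exponential:} \quad & P_B = P_C\, e^{r_e t}
  && \Longrightarrow \quad r_e = \frac{\ln(P_B / P_C)}{t}.
\end{align*}
```

In the linear case the rate has units of phonemes per year; in the exponential case it is a proportional growth rate per year.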
Figure 1. A model of change in phonemic diversity through drift and recovery.
To estimate $r_l$ and $r_e$ empirically, we take advantage of a natural experiment that approximates the scenario outlined in Figure 1, the migration history of humans in mainland Southeast Asia and the Andaman Islands. Both Southeast Asia and the Andaman Islands were colonized during the Pleistocene dispersal of modern humans out of Africa, a process that started 70–60 kya [71]. Genetic data indicate that humans dispersed in Asia following a coastal route, from India to Australia [17], [68]–[70], and that both Southeast Asia and the Andaman Islands were colonized from a population that occupied the region spanning from southern India to the Malay Peninsula [69], [71], [72]. This dispersal was rapid. Genetic analyses estimate that it occurred approximately 65 kya [69], [71], and the archaeological record puts humans both in Southeast Asia [73] and Australia [74] at least 45 kya. Relative to the long temporal scale over which phonemes accumulate, we expect that the Andaman Islands and Mainland Southeast Asia were colonized simultaneously.
Populations in Southeast Asia and the Andaman Islands differed demographically and linguistically. Like population B above, human groups expanded considerably after their arrival in Southeast Asia. By 40–20 kya, more than half of the total human population is estimated to have lived in South and Southeast Asia [75]. Today, about 160 million people live in Mainland Southeast Asia, and speak more than 60 languages. Conversely, we expect the Andaman population to have mirrored population C in the example above, and to have gained few novel phonemes, because of their low population size and remarkable degree of isolation. The Andaman Islands constitute a fragmented landscape of about 200 small islands, with a carrying capacity estimated at about 5,000 individuals before contact with Europeans [76]. Genetic analyses suggest that the inhabitants of the Andaman Islands have remained isolated since their arrival during the Pleistocene, up until the mid-19th century [70], [72], [77]. The 13 languages spoken on the islands at that time are linguistic isolates, with no clear relationship to other Asian languages [78]–[81].
We estimate the parameters $P_B$, $P_C$ and $t$ in Equations 1 and 2 as follows. Assuming that Mainland Southeast Asia and the Andaman Islands were colonized at some point in time between 45 kya and 65 kya, we use 45 and 65 ky as lower and upper bounds of $t$. We obtained the phonemic diversity of languages of Mainland Southeast Asia and the Andaman Islands using data from the UCLA Phonological Segment Inventory Database (UPSID) [24], [25]. While the categorical scaled measurements of phonemic diversity of the World Atlas of Language Structures (WALS) [82] were sufficient to detect a potential global serial founder effect [16], they are inadequate for the calculation of a phoneme accumulation rate. The UPSID contains the number of phonemic segments of a global sample of 451 languages. We estimate $P_B$ by taking the average phonemic inventory size of the languages in Mainland Southeast Asia. Assuming an eastward, coastal migration route, we have excluded the Asian languages that are located west of the Andaman Islands (such as the languages from India and Nepal), as well as those spoken in Myanmar and the Malay Peninsula, because they could have served as departure points for the colonization of the Andaman Islands (Figure 2). The 20 languages retained in our sample are thus those spoken in Cambodia, Vietnam, Laos and Southwest China (Table 1). The average phonemic diversity of the resulting sample is $P_B = 41.21$ (errors represent one standard error). Great Andamanese (ISO 639-3: apq) is the only Andamanese language to appear in UPSID. Its phonemic diversity, 24, serves as our estimate of $P_C$.
Figure 2. Approximate location of the languages included in the Mainland Southeast Asia sample.
Table 1. Sample of Mainland Southeast Asian languages.
Setting $P_B$ to 41.21 and $P_C$ to 24, we obtain range estimates for the phoneme accumulation parameters $r_l$ and $r_e$ for a large, linguistically diverse population (Table 2). Note that, in the real world, we expect $r_l$ and $r_e$ to vary through time and space, both within and between languages, as a result of various linguistic forces and historical contingencies. In contrast, our estimates of $r_l$ and $r_e$ are averaged over 20 languages that are dispersed over a vast spatial area and that have been evolving in the region for perhaps as long as 60 ky. By using a time- and space-averaged value, we are attempting to eliminate the effect of local contingencies and estimate the expected value of the rate of phoneme accumulation of human languages. We need such a time- and space-averaged value especially since we are dating an event that happened thousands of years ago by using the average present-day phonemic diversity of multiple African languages.
Table 2. Phoneme accumulation rate estimates.
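The rate calculation can be sketched numerically from the values stated in the text (a mainland mean of 41.21, an Andamanese inventory of 24, and colonization between 45 and 65 ky ago). The sketch assumes that the linear rate is the inventory difference divided by elapsed time and that the exponential rate is the log ratio divided by elapsed time; the function and constant names are illustrative, and the printed values have not been checked against Table 2, which is not reproduced here.

```python
import math

# Values stated in the text: mean inventory of the mainland sample (P_B),
# the Great Andamanese inventory (P_C), and the bounds on colonization time.
P_B = 41.21
P_C = 24.0
T_BOUNDS_YEARS = (45_000, 65_000)


def linear_rate(p_b: float, p_c: float, t: float) -> float:
    """Phonemes gained per year if inventories grow linearly (assumed form)."""
    return (p_b - p_c) / t


def exponential_rate(p_b: float, p_c: float, t: float) -> float:
    """Proportional growth per year if inventories grow exponentially (assumed form)."""
    return math.log(p_b / p_c) / t


if __name__ == "__main__":
    for t in T_BOUNDS_YEARS:
        print(f"t = {t} y: r_l = {linear_rate(P_B, P_C, t):.3e}, "
              f"r_e = {exponential_rate(P_B, P_C, t):.3e}")
```

An earlier colonization date spreads the same gain of about 17 phonemes over more time, so the 65 ky bound gives the lower rate in each model.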
Using the rates of phoneme accumulation $r_l$ and $r_e$, we calculate $T$, the time it would take for a language to acquire the phonemic diversity observed today in African languages, $P_A$:
$$T = \frac{P_A - P_0}{r_l} \tag{3}$$
or
$$T = \frac{\ln(P_A / P_0)}{r_e} \tag{4}$$
where $P_0$ is the number of phonemes the first human languages started with. Phonemic diversity is assumed to have increased linearly in Equation 3, and exponentially in Equation 4.
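The two dating formulas are inversions of linear and exponential growth, which can be checked with a short round-trip calculation. This is a minimal sketch under that assumption; the function names are illustrative and the inputs in the demonstration are hypothetical round numbers, not values from the paper.

```python
import math


def years_linear(p_now: float, p_start: float, rate: float) -> float:
    """Years needed to grow from p_start to p_now at a constant gain per year."""
    return (p_now - p_start) / rate


def years_exponential(p_now: float, p_start: float, rate: float) -> float:
    """Years needed to grow from p_start to p_now at a constant proportional rate."""
    return math.log(p_now / p_start) / rate


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; not values from the paper.
    p_start, p_now = 10.0, 40.0
    r_lin, r_exp = 3.0e-4, 1.0e-5
    t_lin = years_linear(p_now, p_start, r_lin)
    t_exp = years_exponential(p_now, p_start, r_exp)
    # Round-trip check: growing forward for the computed time recovers p_now.
    assert math.isclose(p_start + r_lin * t_lin, p_now)
    assert math.isclose(p_start * math.exp(r_exp * t_exp), p_now)
    print(f"linear: {t_lin:.0f} y, exponential: {t_exp:.0f} y")
```

Both functions return a larger age when the starting inventory is smaller or the rate is lower, which is why the choice of the initial inventory and of the rate bounds matters for the final date range.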
To estimate $P_A$, we use the average phonemic diversity of African languages that natively possess clicks [83], [84]. We do so because they comprise the African languages that have had the longest continuous history, and as a result are the ones that have lost phonemes due to founder effect the least recently. The largest language groups in Africa–Afro-Asiatic, Niger-Congo and Nilo-Saharan–recently underwent considerable geographic expansion [85], which could have decreased their phonemic diversity through serial founder effect. This idea is consistent with the fact that the average phonemic diversity of Afro-Asiatic, Niger-Congo and Nilo-Saharan languages is 36, 33, and 29 respectively, while the average phonemic diversity of African languages outside these families is 75. The African languages in UPSID outside of these three families are Hadza, Khoekhoe, Sandawe and !Xun. All of these languages use click consonants. Genetic analyses suggest that the speakers of these languages may have had the longest continuous population history [85]–[89], with mitochondrial DNA and Y chromosome variation indicating that the divergence between the click language speakers is at least as old as the divergence between any other pair of human populations [85], [86]. The main click language branches–Hadza, Sandawe and South African Khoisan (the last one includes Khoekhoe and !Xun)–are estimated to have diverged as early as 55–35 kya [85], [86], with Hadza and Sandawe splitting 20–15 kya [85]. We have also included the Dahalo language in our sample. Dahalo is an Afro-Asiatic language, but the occurrence of click sounds in its core vocabulary suggests that it natively may have had clicks [90]. Using the five African click languages present in UPSID, we estimate $P_A$ as their average phonemic inventory size (Table 3).
Table 3. Sample of African languages.
We cannot know what the initial number of phonemes of the first human language, $P_0$, was. A reasonable assumption is that it is equal to the smallest phonemic inventory ever observed. Therefore, we have set $P_0$ to 11 phonemes. On the other hand, it is possible that the languages with the lowest phonemic diversity today are outliers, and that a central value of the world’s phonemic diversity better approximates the initial phonemic diversity of human languages. We show how changing $P_0$ to the median phonemic diversity of the languages in the UPSID sample (29) affects the result.
When $t$ is 45–65 kya, the linear and the exponential growth models yield $T$ values of 232–159 kya and 225–156 kya, respectively. Setting $P_0$ to the median phonemic diversity, 29, decreases our estimate to 163–112 kya and 108–75 kya for the linear and exponential growth models, respectively. We have also estimated intervals around $T$ using one standard error around $P_A$ and the rates of accumulation $r_l$ and $r_e$. The value of $T$ is minimized when phonemic diversity in Africa is low and phoneme accumulation rate is high. Conversely, $T$ is maximized when phonemic diversity in Africa is high and phoneme accumulation rate is low. Therefore, the upper bound for $T$ under linear growth is obtained by evaluating Equation 3 with $P_A$ one standard error above its mean and $r_l$ one standard error below its estimate, and its lower bound is obtained by evaluating Equation 3 with $P_A$ one standard error below its mean and $r_l$ one standard error above its estimate. Similarly, under exponential growth, the upper and lower bounds of $T$ are obtained from Equation 4 in the same way, with $r_e$ in place of $r_l$. The resulting date ranges are shown in Figure 3.
Figure 3. Date estimates for the origin of present-day languages.
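One way to see why the date ranges track the assumed colonization window so closely is to substitute the rate definitions into the dating formulas. This is a sketch in assumed notation ($T$ for the estimated age, $P_A$ and $P_0$ for the present-day African and initial inventories, $P_B$ and $P_C$ for the mainland and Andamanese inventories, $t$ for the colonization date, $r_l$ and $r_e$ for the linear and exponential rates); it is an algebraic consequence of the model rather than a formula stated in the source:

```latex
\begin{align*}
T_{\text{linear}} &= \frac{P_A - P_0}{r_l}
  = t \cdot \frac{P_A - P_0}{P_B - P_C}, \\
T_{\text{exponential}} &= \frac{\ln(P_A / P_0)}{r_e}
  = t \cdot \frac{\ln(P_A / P_0)}{\ln(P_B / P_C)}.
\end{align*}
```

In both models $T$ is proportional to $t$, so moving the colonization date from 45 to 65 ky stretches every estimate by the same factor of 65/45, or about 1.44, which is about the ratio between the two ends of each date range quoted above.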
These estimates are fairly insensitive to changes in model assumptions. We have considered the possibility that we are overestimating the phonemic diversity of African languages by restricting our sample to click languages. Click sounds may be evolving independently of non-click sounds. This would mean that a language could accumulate non-click phonemes at a certain rate, such as $r_l$ and $r_e$, while simultaneously accumulating click sounds at another rate. If this is true, then we cannot compare the African languages in our sample, which have been accumulating click and non-click phonemes simultaneously, to Mainland Southeast Asian and Andaman languages, which do not contain click sounds. To account for this possibility, we excluded click sounds from the phoneme inventory counts of African languages and recomputed the average non-click phonemic diversity of our sample of African click languages. Using this value decreases our estimate to 158–108 kya and 187–129 kya for linear and exponential growth, respectively. We have also tested the robustness of our results by excluding the Dahalo language from our sample of African languages. While Dahalo is thought to natively possess clicks, it is an Afro-Asiatic language [83] and as such might bias our sample of African languages towards lower phonemic diversity. Removing it from the sample increases our estimate to 244–167 kya and 230–159 kya for linear and exponential growth, respectively. Finally, we have also increased our sample of Mainland Southeast Asian languages. Previously, we had excluded the languages spoken in Thailand, Malaysia, and Myanmar, as the colonizers of the Andaman Islands could have departed from one of these regions. By relaxing this assumption and including in our sample all the Mainland Southeast Asian languages contained in UPSID (the languages spoken in Myanmar, Thailand, Malaysia, Laos, Cambodia, Vietnam and Southwest China), we obtain a new average phonemic diversity for the region, $P_B$, which increases our estimate to 242–168 kya and 236–163 kya for linear and exponential growth, respectively.
Our analysis suggests that language appears early in the history of our species. It does not support the idea that language is a recent adaptation that could have sparked the colonization of the globe by our species about 50 kya [1], [91]. Rather, our result is consistent with the archaeological evidence suggesting that human behavior became increasingly complex during the Middle Stone Age (MSA) in Africa, sometime between 350 and 150 kya [92]–[100]. However, we cannot rule out the possibility that other linguistic adaptations that are independent of phonemic evolution arose later and triggered the out-of-Africa expansion.
Our date estimate for the origin of language roughly coincides with the date range for the emergence of modern humans. Fossil evidence suggests that anatomically modern humans were present by 195–160 kya [101]–[104], and fossils classified as Homo helmei, which may be anatomically modern or nearly modern, are dated to 300–250 kya [95], [100]. Coalescence times from genetic data suggest that a genetic population bottleneck, possibly associated with a speciation event, occurred 200–100 kya [85], [105]–[108].
A population bottleneck causing a loss of phonemes would push back, or even reset the phonemic clock. As a result, our date estimates should be treated as minimum ages for the origin of language. It is thus possible that language arose before the last speciation event in our lineage, or even before the appearance of behavioral modernity.
Our date estimates should be treated with caution. Our results hinge on a series of assumptions in addition to the ones laid out in the Material and Method section. We assume that the rates of phoneme accumulation in Southeast Asia and Africa were similar. We assume that the Andaman languages did not accumulate new phonemes following the colonization of the Andaman Islands, or lose phonemes when their populations crashed upon contact with Europeans. We assume that the founding populations that settled the Andaman Islands and Mainland Southeast Asia lost an equivalent number of phonemes due to drift. Also, the UPSID phoneme counts do not include tonal distinctions. The absence of tonal distinctions in our data could add noise to our analysis, and bias it if it leads us to underestimate the phonemic diversity of one of the continental regions, Africa and Mainland Southeast Asia, more so than the other. We assume that the rate of accumulation of phonemes does not decrease as phonemic inventory size increases. An accumulation rate that decreases with phonemic diversity would lead us to underestimate the antiquity of present-day phonemic inventories. A similar bias would also occur if the phoneme accumulation rate changed through time as our species evolved. Furthermore, our estimate of the rate of phoneme accumulation is based on a single historical case. We are not aware of other colonization sequences that resemble the one outlined in Figure 1 and that would also be ancient enough to allow for phonemic inventories to increase. However, despite the caveats we have highlighted here, this analysis constitutes the first appraisal of when language evolved to be based directly on linguistic data.
Acknowledgments
We thank Tanmoy Bhattacharya, Robert Boyd, P. Jeffrey Brantingham, Juergen Neubauer and three anonymous reviewers for insightful comments and criticisms that helped improve this paper.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: Funding was provided by the Swedish Research Council [grant number 2009-2390]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
1. Klein RG. Chicago: University of Chicago Press, 989 pp; 2009. The human career.
2. Tattersall I. Human origins: Out of Africa. Proc Natl Acad Sci USA. 2009;106:16018–16021. [PubMed]
3. Powell BB. Malden, MA: Wiley-Blackwell; 2009. Writing: Theory and history of the technology of civilization.
4. Fitch W. The evolution of speech: a comparative review. Trends in Cognitive Sciences. 2000;4:258–267. [PubMed]
5. Lieberman DE. Speculations about the selective basis for modern human craniofacial form. Evolutionary Anthropology. 2008;17:55–68.
6. Lieberman P. The evolution of human speech: Its anatomical and neural bases. Current Anthropology. 2007;48:39–66.
7. Houghton P. Neandertal supralaryngeal vocal tract. American Journal of Physical Anthropology. 1993;90:139–46. [PubMed]
8. Campbell L. Time perspective in linguistics. Renfrew C, McMahon A, Trask L, editors, Time Depth in Historical Linguistics, Volume 1, Cambridge: McDonald Institute for Archaeological Research, chapter 1. 2000. pp. 3–31.
9. Enard W, Przeworski M, Fisher SE, Lai CSL, Wiebe V, et al. Molecular evolution of FOXP2, a gene involved in speech and language. Nature. 2002;418:869–72. [PubMed]
10. Fisher SE, Marcus GF. The eloquent ape: genes, brains and the evolution of language. Nature Reviews Genetics. 2006;7:9–20. [PubMed]
11. Fisher SE, Scharff C. FOXP2 as a molecular window into speech and language. Trends in Genetics. 2009;25:166–77. [PubMed]
12. Zhang J, Webb DM, Podlaha O. Accelerated protein evolution and origins of human-specific features: Foxp2 as an example. Genetics. 2002;162:1825–35. [PubMed]
13. Lieberman P. The FOXP2 gene, human cognition and language. International congress Series. 2006;1296:115–126.
14. Coop G, Bullaughey K, Luca F, Przeworski M. The timing of selection at the human FOXP2 gene. Molecular biology and evolution. 2008;25:1257–9. [PubMed]
15. Krause J, Lalueza-Fox C, Orlando L, Enard W, Green RE, et al. The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology. 2007;17:1908–12. [PubMed]
16. Atkinson QD. Phonemic diversity supports a serial founder effect model of language expansion from Africa. Science. 2011;332:346–349. [PubMed]
17. Mellars P. Why did modern human populations disperse from Africa ca. 60,000 years ago? A new model. Proc Natl Acad Sci USA. 2006;103:9381–9387. [PubMed]
18. Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science (New York, NY) 2008;319:1100–4. [PubMed]
19. Prugnolle F, Manica A, Balloux F. Geography predicts neutral genetic diversity of human populations. Current Biology. 2005;15:R159–R160. [PMC free article] [PubMed]
20. Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, et al. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci USA. 2005;102:15942–7. [PubMed]
21. Betti L, Balloux F, Amos W, Hanihara T, Manica A. Distance from Africa, not climate, explains within-population phenotypic diversity in humans. Proceedings of the Royal Society B Biological Sciences. 2009;276:809–814. [PMC free article] [PubMed]
22. Von Cramon-Taubadel N, Lycett SJ. Brief communication: human cranial variation fits iterative founder effect model with African origin. American Journal of Physical Anthropology. 2008;136:108–113. [PubMed]
23. Manica A, Amos W, Balloux F, Hanihara T. The effect of ancient population bottlenecks on human phenotypic variation. Nature. 2007;448:346–348. [PMC free article] [PubMed]
24. Maddieson I. Cambridge: Cambridge University Press; 1984. Patterns of sounds.
25. Maddieson I, Precoda K. Updating UPSID. UCLA Working Papers in Phonetics. 1990;74:104–111.
26. Henrich J. Demography and cultural evolution: How adaptive cultural processes can produce maladaptive losses: The Tasmanian case. American Antiquity. 2004;69:197–214.
27. Powell A, Shennan S, Thomas MG. Late Pleistocene demography and the appearance of modern human behavior. Science. 2009;324:1298–1301. [PubMed]
28. Shennan S. Demography and cultural innovation: a model and its implications for the emergence of modern human culture. Cambridge Archaeological Journal. 2001;11:5–16.
29. Hay J, Bauer L. Phoneme inventory size and population size. Language. 2007;83:388–400.
30. Croft W. Oxford, UK: Oxford university Press; 2000. Explaining language change: an evolutionary approach.
31. Labov W. Oxford, UK: Blackwell; 1994. Principles of linguistic change: Internal factors.
32. Labov W. Oxford, UK: Blackwell; 2001. Principles of linguistic change: Social factors.
33. Labov W. Principles of linguistic change: Cognitive and cultural factors. 2010.
34. Thomason SG, Kaufman T. Berkeley, CA: University of California Press; 1988. Language contact, creolization, and genetic linguistics.
35. Yang CD. Oxford: Oxford University Press; 2003. Knowledge and learning in natural language.
36. Trudgill P. Linguistic and social typology: The Austronesian migrations and phoneme inventories. Linguistic Typology. 2004;8:305–320.
37. Swadesh M. Lexicostatistic dating of prehistoric ethnic contacts. Proceedings American Philosophical Society. 1952;96:452–463.
38. Swadesh M. Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics. 1955;21:121–137.
39. Lees RB. The basis of glottochronology. Language. 1953;29:113–127.
40. Milroy J, Milroy L. Linguistic change, social network and speaker innovation. Journal of Linguistics. 1985;21:339–384.
41. Nettle D. Is the rate of linguistic change constant? Lingua. 1999;108:119–136.
42. Pagel M, Atkinson QD, Meade A. Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature. 2007;449:717–20. [PubMed]
43. Bergsland K, Vogt H. On the validity of glottochronology. Current Anthropology. 1962;3:115.
44. Blust R, Renfrew C, McMahon A, Trask RL. Why lexicostatistics doesn’t work: the “universal constant” hypothesis and the Austronesian languages. Renfrew C, McMahon A, Trask L, editors, Time depth in historical linguistics, McDonald Institute for Archaeological Research, volume 2, chapter 13. 2001. pp. 311–331.
45. Gray RD, Atkinson QD. Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature. 2003;426:435–439. [PubMed]
46. Gray RD, Drummond AJ, Greenhill SJ. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science. 2009;323:479–483. [PubMed]
47. Gray RD, Atkinson QD, Greenhill SJ. Language evolution and human history: what a difference a date makes. Philosophical Transactions of the Royal Society B: Biological Sciences. 2011;366:1090–1100. [PMC free article] [PubMed]
48. Bybee J. How plausible is the hypothesis that population size and dispersal are related to phoneme inventory size? Introducing and commenting on a debate. Linguistic Typology. 2011;15:147–153.
49. Trudgill P. Social structure and phoneme inventories. Linguistic Typology. 2011;15:155–160.
50. Donohue M, Nichols J. Does phoneme inventory size correlate with population size? Linguistic Typology. 2011;15:161–170.
51. Dahl O. Are small languages more or less complex than big ones? Linguistic Typology. 2011;15:171–175.
52. Wichmann, Rama T, Holman, Eric W. Phonological diversity, word length, and population sizes across languages: The ASJP evidence. Linguistic Typology. 2011;15:177–197.
53. Sproat R. Phonemic diversity and the out-of-Africa theory. Linguistic Typology. 2011;15:199–206.
54. Bowern C. Out of Africa? The logic of phoneme inventories and founder effects. Linguistic Typology. 2011;15:207–216.
55. Pericliev V. On phonemic diversity and the origin of language in Africa. Linguistic Typology. 2011;15:217–221.
56. Ringe D. A pilot study for an investigation into Atkinson’s hypothesis. Linguistic Typology. 2011;15:223–231.
57. Rice K. Athabaskan languages and serial founder effects. Linguistic Typology. 2011;15:233–250.
58. Ross B, Donahue M. The many origins of diversity and complexity in phonology. Linguistic Typology. 2011;15:251–265.
59. Maddieson I, Bhattacharya T, Smith DE, Croft W. Geographical distribution of phonological complexity. Linguistic Typology. 2011;15:267–279.
60. Jaeger TF, Graff P, Croft W, Pontillo D. Mixed e_ect models for genetic and areal dependencies in linguistic typology. 2011. 2011;15:281–319.
61. Cysouw M, Dediu D, Moran S. Comment on “Phonemic diversity supports a serial founder effect model of language expansion from Africa”. Science. 2012;335:657. [PubMed]
62. Wang C, Ding Q, Tao H, Li H. Comment on “Phonemic Diversity Supports a Serial Founder Effect Model”. Science. 2012;335:657. [PubMed]
63. Tuyl RV, Pereltsvaig A. Comment on “Phonemic Diversit’y Supports a Serial Founder Effect Model”. Science. 2012;335:657. [PubMed]
64. Atkinson QD. Linking spatial patterns of language variation to ancient demography and population migrations. Linguistic Typology. 2011;15:321–332.
65. Atkinson QD. Response to Comments on \Phonemic Diversity Supports a Serial Founder Effect Model of Expansion from Africa". Science. 2012;335:657.
66. Lightfoot D. Cambridge, UK: Cambridge University Press, 199 pp; 2006. How new languages emerge.
67. Nettle D. Explaining global patterns of language diversity. Journal of Anthropological Archaeology. 1998;17:354–374.
68. Forster P, Matsumura S. Did early humans go north or south? Science. 2006;308:965–966. [PubMed]
69. Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, et al. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science. 2005;308:1034–6. [PubMed]
70. Thangaraj K, Chaubey G, Kivisild T, Reddy AG, Singh VK, et al. Reconstructing the origin of Andaman Islanders. Science. 2005;308:996. [PubMed]
71. Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, et al. Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. American Journal of Human Genetics. 2004;75:966–78. [PubMed]
72. Endicott P, Gilbert MTP, Stringer C, Lalueza-Fox C, Willerslev E, et al. The genetic origins of the Andaman Islanders. American Journal of Human Genetics. 2003;72:178–84. [PubMed]
73. Barker G. The archaeology of foraging and farming at Niah Cave, Sarawak. Asian Perspectives. 2005;44:90–106.
74. O’Connell JF, Allen J. Dating the colonization of Sahul (Pleistocene Australia-New Guinea): a review of recent research. Journal of Archaeological Science. 2004;31:835–853.
75. Atkinson QD, Gray RD, Drummond AJ. mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory. Molecular biology and evolution. 2008;25:468–74. [PubMed]
76. Ericksen P, Beckerman S. Population determinants in the Andaman Islands. Mankind. 1975;10:105–107.
77. Thangaraj K, Singh L, Reddy AG, Rao VR, Sehgal SC, et al. Genetic affinities of the Andaman Islanders, a vanishing human population. Current Biology. 2003;13:86–93. [PubMed]
78. Greenberg JH. The Indo-Pacific hypothesis. Sebeok TA, editor, Current Trends in Linguistics vol 8: Linguistics in Oceania, Mouton de Gruyter, volume 8. 1971. pp. 809–871.
79. Wurm SA. Sebeok TA, editor, Current trends in linguistics, vol. 8: Linguistics in Oceania, Mouton de Gruyter; 1971. Classifications of Australian languages, including Tasmanian. pp. 721–778.
80. Wurm SA, McElhanan K. Canberra: Australian National University; 1975. New Guinea area languages and language study, vol 1: Papuan languages and the New Guinea linguistic scene.
81. Ruhlen M. Stanford, CA: Stanford University Press; 1987. A guide to the world’s languages.
82. Dryer MS, Haspelmath M. Munich: Max Planck Digital Library; 2011. The World Atlas of Language Structures online.
83. Güldemann T, Stoneking M. A historical appraisal of clicks: A linguistic and genetic population perspective. Annual Review of Anthropology. 2008;37:93–109.
84. Greenberg JH. Bloomington: Indiana University Press; 1963. The languages of Africa.
85. Tishkoff SA, Gonder MK, Henn BM, Mortensen H, Knight A, et al. History of clickspeaking populations of Africa inferred from mtDNA and Y chromosome genetic variation. Molecular biology and evolution. 2007;24:2180–95. [PubMed]
86. Knight A, Underhill PA, Mortensen HM, Zhivotovsky LA, Lin AA, et al. African Y chromosome and mtDNA divergence provides insight into the history of click languages. Current Biology. 2003;13:464–473. [PubMed]
87. Semino O, Santachiara-Benerecetti AS, Falaschi F, Cavalli-Sforza LL, Underhill PA. Ethiopians and Khoisan share the deepest clades of the human Y-chromosome phylogeny. American Journal of Human G’enetics. 2002;70:265–8. [PubMed]
88. Chen YS, Olckers A, Schurr TG, Kogelnik AM, Huoponen K, et al. mtDNA variation in the South African Kung and Khwe-and their genetic relationships to other African populations. American journal of human genetics. 2000;66:1362–83. [PubMed]
89. Watson E, Bauer K, Aman R, Weiss G, von Haeseler A, et al. mtDNA sequence diversity in Africa. American Journal of Human Genetics. 1996;59:437–44. [PubMed]
90. Sands B, Güldemann T. What click languages can and can’t tell us about language origins. Botha RP, Knight C, editors, The cradle of language (studies in the evolution of language), USA: Oxford University Press, chapter 11. 2009. pp. 204–218.
91. Diamond J. Hutchinson Radius (Vintage Edition 1992), 360 pp; 1991. The rise and fall of the third chimpanzee.
92. Brown KS, Marean CW, Herries AIR, Jacobs Z, Tribolo C, et al. Fire as an engineering tool of early modern humans. Science. 2009;325:859–62. [PubMed]
93. D’Errico F, Henshilwood C, Vanhaeren M, Van Niekerk K. Nassarius kraussianus shell beads from Blombos Cave: evidence for symbolic behaviour in the Middle Stone Age. Journal of Human Evolution. 2005;48:3–24. [PubMed]
94. D’Errico F, Henshilwood CS. Additional evidence for bone technology in the southern African Middle Stone Age. Journal of Human Evolution. 2007;52:142–163. [PubMed]
95. Foley R, Lahr MM. On stony ground: Lithic technology, human evolution, and the emergence of culture. Evolutionary Anthropology. 2003;12:109–122.
96. Henshilwood CS, Marean CW. The origin of modern human behavior: Critique of the models and their test implications. Current Anthropology. 2003;44:627–651. [PubMed]
97. Marean CA, Assefa Z. The Middle and Upper Pleistocene African record for the biological and behavioral origins of modern humans. Stahl A, editor, African archaeology: A critical introduction, Malden, MA: Blackwell Publishing, chapter 4. 2005. pp. 93–129.
98. Marean CW, Bar-Matthews M, Bernatchez J, Fisher E, Goldberg P, et al. Early human use of marine resources and pigment in South Africa during the Middle Pleistocene. Nature. 2007;449:905–908. [PubMed]
99. Marean CW. Coastal South Africa and the coevolution of the modern human lineage and the Coastal adaptation. Bicho NF, Haws JA, Davis LG, editors, Trekking the shore: Changing coastlines and the antiquity of coastal settlement, New York, NY: Springer New York, Interdisciplinary Contributions to Archaeology. 2011. pp. 421–440. doi: 10.1007/978–1-4419–8219–3.
100. Mcbrearty S, Brooks AS. The revolution that wasn’t: a new interpretation of the origin of modern human behavior. Journal of Human Evolution. 2000;39:453–563. [PubMed]
101. Clark JD, Beyene Y, WoldeGabriel G, Hart WK, Renne PR, et al. Stratigraphic, chronological and behavioural contexts of Pleistocene Homo sapiens from Middle Awash, Ethiopia. Nature. 2003;423:747–752. [PubMed]
102. McDougall I, Brown FH, Fleagle JG. Stratigraphic placement and age of modern humans from Kibish, Ethiopia. Nature. 2005;433:733–736. [PubMed]
103. Smith TM, Ta_oreau P, Reid DJ, Grün R, Eggins S, et al. Earliest evidence of modern human life history in North African early Homo sapiens. Proc Natl Acad Sci USA. 2007;104:6128–6133. [PubMed]
104. White TD, Asfaw B, DeGusta D, Gilbert H, Richards GD, et al. Pleistocene Homo sapiens from Middle Awash, Ethiopia. Nature. 2003;423:742–747. [PubMed]
105. Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, et al. The dawn of human matrilineal diversity. Journal of Human Genetics. 2008;82:1130–1140. [PubMed]
106. Fagundes NJR, Ray N, Beaumont M, Neuenschwander S, Salzano FM, et al. Statistical evaluation of alternative models of human evolution. Proc Natl Acad Sci USA. 2007;104:17614–17619. [PubMed]
107. Ingman M, Kaessmann H, Pääbo S, Gyllensten U. Mitochondrial genome variation and the origin of modern humans. Nature. 2000;408:708–713. [PubMed]
108. Gonder MK, Mortensen HM, Reed FA, De Sousa A, Tishkoff SA. Whole-mtDNA genome sequence analysis of ancient African lineages. Molecular Biology and Evolution. 2007;24:757–768. [PubMed]