Older adults (as a group) are less likely than younger adults to engage in an anticipatory mode of language comprehension, failing to successfully pre-activate information about upcoming likely (predictable) words during online processing. To assess (within one set of materials) age-related changes in the use of sentential context to affect processing of predictable words and in the consequences of violating predictions, event-related brain potentials were recorded while older adults read sentences that varied in sentence-level constraint and expectancy of sentence-final words. Strongly constraining sentences were completed by their most expected, predictable words and weakly constraining sentences were completed by their most expected, less predictable words. Both types of sentences also were completed by unexpected (but plausible) words. Older adults showed reduced and delayed effects of sentential context on processing predictable words. Whereas younger adults elicit an enhanced positive ERP (starting around 500 ms post-stimulus onset, largest over prefrontal electrode sites), specifically for unexpected words that violate strong expectancies for a different word, older adults as a group did not exhibit this neural consequence of disconfirmed predictions. Older adults were instead more likely to show a left-lateralized frontal negativity for predictable items. This ERP response has been attributed to processes needed to revisit contextual material in forming an interpretation of message-level meaning, which may be more likely when anticipatory modes of comprehension are not engaged. Taken together, the results suggest that normal aging can affect allocation of resources to different cognitive and neural pathways in achieving comprehension outcomes.
language; event-related potentials; sentential context; N400; frontal negativity
Millions of people search online for medical text, but these texts are often too complicated to understand. Readability evaluations are mostly based on surface metrics such as character or word counts and sentence syntax, but content is ignored. We compared four types of documents, easy and difficult WebMD documents, patient blogs, and patient educational material, for surface and content-based metrics. The documents differed significantly in reading grade levels and vocabulary used. WebMD pages with high readability also used terminology that was more consumer-friendly. Moreover, difficult documents are harder to understand due to their grammar and word choice and because they discuss more difficult topics. This indicates that we can simplify many documents by focusing on word choice in addition to sentence structure; however, for difficult documents this may be insufficient.
Text Readability; UMLS; Consumer-Friendly Display (CFD) Names; Blogs; WebMD
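As an illustration of the kind of surface metric the readability abstract above refers to, the following minimal sketch computes a Flesch-Kincaid grade level. The abstract does not say which reading-grade formula was used, so the choice of formula is an assumption, and the vowel-group syllable counter is a rough heuristic introduced only for this example.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level from word, sentence, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Hypothetical example sentences, not taken from the study's documents.
easy = "Take one pill each day. Call us if you feel sick."
hard = ("Administer the anticoagulant medication intravenously "
        "following cardiovascular evaluation.")
print(round(flesch_kincaid_grade(easy), 1), round(flesch_kincaid_grade(hard), 1))
```

A metric like this looks only at word and sentence length; it says nothing about whether the vocabulary is consumer-friendly, which is the content-based side of the comparison described above.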
Two experiments examined parafoveal preview for words located in the middle of sentences and at sentence boundaries. Parafoveal processing was shown to occur for words at sentence-initial, mid-sentence, and sentence-final positions. Both Experiments 1 and 2 showed reduced effects of preview on regressions out for sentence-initial words. In addition, Experiment 2 showed reduced preview effects on first-pass reading times for sentence-initial words. These effects of sentence position on preview could result from reduced parafoveal processing for sentence-initial words, or other processes specific to word reading at sentence boundaries. In addition to the effects of preview, the experiments also demonstrate variability in the effects of sentence wrap-up on different reading measures, indicating that the presence and time course of wrap-up effects may be modulated by text-specific factors. We also report simulations of Experiment 2 using version 10 of E-Z Reader (Reichle, Warren, & McConnell, 2009), designed to explore the possible mechanisms underlying parafoveal preview at sentence boundaries.
reading; eye movements; E-Z Reader; parafoveal preview; wrap-up effects
Repetition and semantic-associative priming effects have been demonstrated for words in nonstructured contexts (i.e., word pairs or lists of words) in numerous behavioral and electrophysiological studies. The processing of a word has thus been shown to benefit from the prior presentation of an identical or associated word in the absence of a constraining context. An examination of such priming effects for words that are embedded within a meaningful discourse context provides information about the interaction of different levels of linguistic analysis. This article reviews behavioral and electrophysiological research that has examined the processing of repeated and associated words in sentence and discourse contexts. It provides examples of the ways in which eye tracking and event-related potentials might be used to further explore priming effects in discourse. The modulation of lexical priming effects by discourse factors suggests the interaction of information at different levels in online language comprehension.
words; discourse; eye movement; ERPs; reading
In two experiments, the effects of word repetition, synonymy, and coreference on event-related brain potentials during text processing were studied. Participants read one-sentence (Experiment 1) or two-sentence (Experiment 2) texts in which critical nouns were preceded by the definite (the) or indefinite (a) articles. Experiment 1 was run as a control to verify that differences in article processing in the second sentences of Experiment 2 would not contaminate the ERPs to critical noun items. They did not. In Experiment 2, an initial sentence was used to set up a context and contained either a first presentation or synonym of the critical word from the second sentence. N400 (but not Late Positive Component; LPC) priming effects were found for repetitions and synonyms (larger for repetitions) in second sentences. This extends observations of priming in word lists and single sentences to two-sentence texts. There was also a greater left anterior negativity or “LAN” for coreferential critical nouns (those following the article “The”) compared to non-coreferential critical nouns (those following the article “A”), suggesting that ERPs are sensitive to working memory processes engaged during referential assignment. In response to the articles themselves, there was a greater N400-700 elicited by the article “A” vs. “The.” Finally, there was a greater N400-like negativity to the final words of non-coreferential sentences, implying that the meanings of these sentences were difficult to integrate with the discourse-level representation established by the prior sentence.
N400; LAN; ERPs; Coreference; Anaphoric processing; Sentence processing
Extracting key concepts from clinical texts for indexing is an important task in implementing a medical digital library. Several methods are proposed for mapping free text into standard terms defined by the Unified Medical Language System (UMLS). For example, natural language processing techniques are used to map identified noun phrases into concepts. They are, however, not appropriate for real time applications. Therefore, in this paper, we present a new algorithm for generating all valid UMLS concepts by permuting the set of words in the input text and then filtering out the irrelevant concepts via syntactic and semantic filtering. We have implemented the algorithm as a web-based service that provides a search interface for researchers and computer programs. Our preliminary experiment shows that the algorithm is effective at discovering relevant UMLS concepts while achieving a throughput of 43K bytes of text per second. The tool can extract key concepts from clinical texts for indexing.
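The permutation step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `TOY_CONCEPTS` is a hypothetical stand-in for a UMLS term index (the identifiers are made up), the `max_len` cap is an assumption added to keep the number of permutations manageable, and the syntactic and semantic filtering stage is omitted.

```python
from itertools import permutations

# Hypothetical toy dictionary standing in for a UMLS term index.
# The phrases and identifiers are illustrative only, not real UMLS data.
TOY_CONCEPTS = {
    "lung cancer": "C-TOY-001",
    "cancer": "C-TOY-002",
    "lung": "C-TOY-003",
    "small cell lung cancer": "C-TOY-004",
}


def candidate_concepts(text, max_len=4):
    """Permute the set of input words and keep phrases found in the dictionary."""
    words = sorted(set(text.lower().split()))
    found = {}
    for n in range(1, min(max_len, len(words)) + 1):
        for perm in permutations(words, n):
            phrase = " ".join(perm)
            if phrase in TOY_CONCEPTS:
                found[phrase] = TOY_CONCEPTS[phrase]
    return found


# "cancer of the lung" yields the reordered phrase "lung cancer"
# as well as the single-word concepts.
print(candidate_concepts("cancer of the lung"))
```

Because the word order of the input is ignored, a phrase such as "cancer of the lung" can still match a dictionary entry stored as "lung cancer"; the cost is that the number of permutations grows quickly with input length, which is why a later filtering stage is needed.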
We report two experiments that investigate the effects of sentence context on bilingual lexical access in Spanish and English. Highly proficient Spanish-English bilinguals read sentences in Spanish and English that included a marked word to be named. The word was either a cognate with similar orthography and/or phonology in the two languages, or a matched non-cognate control. Sentences appeared in one language alone (i.e., Spanish or English) and target words were not predictable on the basis of the preceding semantic context. In Experiment 1, we mixed the language of the sentence within a block such that sentences appeared in an alternating run in Spanish or in English. These conditions partly resemble normally occurring inter-sentential code-switching. In these mixed-language sequences, cognates were named faster than non-cognates in both languages. There were no effects of switching the language of the sentence. In Experiment 2, with Spanish-English bilinguals matched closely to those who participated in the first experiment, we blocked the language of the sentences to encourage language-specific processes. The results were virtually identical to those of the mixed-language experiment. In both cases, target cognates were named faster than non-cognates, and the magnitude of the effect did not change according to the broader context. Taken together, the results support the predictions of the Bilingual Interactive Activation + Model (Dijkstra and van Heuven, 2002) in demonstrating that bilingual lexical access is language non-selective even under conditions in which language-specific cues should enable selective processing. They also demonstrate that, in contrast to lexical switching from one language to the other, inter-sentential code-switching of the sort in which bilinguals frequently engage, imposes no significant costs to lexical processing.
bilingualism; language switching; switch costs; lexical access; sentence context; cognates
Native Chinese readers’ eye movements were monitored as they read text that did or did not demark word boundary information. In Experiment 1, sentences had 4 types of spacing: normal unspaced text, text with spaces between words, text with spaces between characters that yielded nonwords, and finally text with spaces between every character. The authors investigated whether the introduction of spaces into unspaced Chinese text facilitates reading and whether the word or, alternatively, the character is a unit of information that is of primary importance in Chinese reading. Global and local measures indicated that sentences with the unfamiliar word-spaced format were as easy to read as visually familiar unspaced text. Nonword spacing and a space between every character produced longer reading times. In Experiment 2, highlighting was used to create analogous conditions: normal Chinese text, highlighting that marked words, highlighting that yielded nonwords, and highlighting that marked each character. The data from both experiments clearly indicated that words, and not individual characters, are the unit of primary importance in Chinese reading.
Chinese reading; spaced and unspaced text; eye movements
Finding useful high-grade professional orthopaedic information on the Internet is often difficult. Orthopaedic Web Links (OWL) is a searchable database of vetted online orthopaedic resources. OWL uses a subject directory (OWL Directory) and a custom search engine (OWL Web) to provide a list of resources. The most effective way to find readily accessible, full text on-subject material suitable for education of an orthopaedic surgeon or trainee has not been defined.
We therefore (1) proposed a method for selecting topics and evaluating searches and (2) compared the search results from an orthopaedic-specific directory (OWL Directory), a custom search engine (OWL Web), and standard Google searches.
A scoring system for evaluation of the search results was developed for standardized comparison. Single words and sets of three words from randomly selected examination questions provided the search strings to compare the three strategies.
For single-keyword searches, the OWL Directory scored highest (16.4/50) of the three methods. For the three-keyword searches, OWL Web had the highest mean score (26.0/50), followed by Google (22.8/50), and the OWL Directory (1.0/50). OWL Web searches had higher scores than Google searches, while returning 800 times fewer search results.
The OWL Directory of orthopaedic subjects on the Internet provides a simple browsable category structure to find information. The OWL Web search engine scored higher than Google and resulted in a greater proportion of valid, on-subject, and accessible resources in the search results.
Electronic supplementary material
The online version of this article (doi:10.1007/s11999-011-1875-1) contains supplementary material, which is available to authorized users.
Words in human language interact in sentences in non-random ways, and allow humans to construct an astronomic variety of sentences from a limited number of discrete units. This construction process is extremely fast and robust. The co-occurrence of words in sentences reflects language organization in a subtle manner that can be described in terms of a graph of word interactions. Here, we show that such graphs display two important features recently found in a disparate number of complex systems. (i) The so called small-world effect. In particular, the average distance between two words, d (i.e. the average minimum number of links to be crossed from an arbitrary word to another), is shown to be d approximately equal to 2-3, even though the human brain can store many thousands. (ii) A scale-free distribution of degrees. The known pronounced effects of disconnecting the most connected vertices in such networks can be identified in some language disorders. These observations indicate some unexpected features of language organization that might reflect the evolutionary and social history of lexicons and the origins of their flexibility and combinatorial nature.
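The two graph measures named in the abstract above, word degree and the average distance d between words, can be illustrated with a short sketch. The abstract does not specify how links are defined, so linking words that occur within a fixed `window` of each other in a sentence is an assumption made here, and the toy sentences are hypothetical.

```python
from collections import defaultdict, deque


def cooccurrence_graph(sentences, window=2):
    """Link two distinct words if they occur within `window` positions
    of each other in the same sentence (the window is an assumption)."""
    graph = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, w in enumerate(words):
            for j in range(i + 1, min(i + window + 1, len(words))):
                if words[j] != w:
                    graph[w].add(words[j])
                    graph[words[j]].add(w)
    return dict(graph)


def average_distance(graph):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


toy = ["the cat sat on the mat", "the dog sat on the rug", "a cat saw a dog"]
g = cooccurrence_graph(toy)
degrees = sorted((len(nbrs) for nbrs in g.values()), reverse=True)
print(average_distance(g), degrees)
```

On a real corpus, the sorted `degrees` list is what one would inspect for a scale-free distribution, and `average_distance` is the quantity the abstract reports as roughly 2 to 3.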
Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, …) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random characters including blanks behaving as word delimiters - exhibit a Zipf's law-like word rank distribution.
In this article, we examine the flaws of such putative good fits of random texts. We demonstrate - by means of three different statistical tests - that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text. Our findings are valid for both the simplest random texts composed of equally likely characters as well as more elaborate and realistic versions where character probabilities are borrowed from a real text.
The good fit of random texts to real Zipf's law-like rank distributions has not yet been established. Therefore, we suggest that Zipf's law might in fact be a fundamental law in natural languages.
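To make the setup in the Zipf's law abstract above concrete, the sketch below builds a rank-frequency list, generates a simple random text from equally likely characters plus a blank, and estimates the slope of the rank-frequency relation on a double logarithmic scale. The alphabet, blank probability, and least-squares slope are assumptions chosen for illustration; the sketch does not reproduce the three statistical tests the article relies on, and a visually straight log-log line is exactly the kind of evidence the article argues is insufficient.

```python
import math
import random
from collections import Counter


def rank_frequency(words):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def random_text(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Random text: each character is a blank with probability p_space,
    otherwise a uniformly chosen letter. Blanks act as word delimiters."""
    rng = random.Random(seed)
    chars = [" " if rng.random() < p_space else rng.choice(alphabet)
             for _ in range(n_chars)]
    return "".join(chars).split()


pairs = rank_frequency(random_text(20000))
print(len(pairs), loglog_slope(pairs))
```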
Phonologic text alexia (PhTA) is a reading disorder in which reading of pseudowords is impaired, but reading of real words is impaired only when reading text. Oral reading accuracy remains well preserved when words are presented individually, but when presented in text the part-of-speech effect that is often seen in phonologic alexia (PhA) emerges.
To determine whether repetition priming could strengthen and/or maintain the activation of words during text reading.
Methods & Procedures
We trained NYR, a patient with PhTA, to use a strategy, Sentence Building, designed to improve accuracy of reading words in text. The strategy required NYR to first read the initial word, and then build up the sentence by adding on sequential words, in a step-wise manner, utilizing the benefits of repetition priming to enhance accuracy.
Outcomes & Results
When using the strategy, NYR displayed improved accuracy not only for sentences she practiced using the strategy, but unpracticed sentences as well. Additionally, NYR performed better on a test of comprehension when using the strategy, as compared to without the strategy.
In light of research linking repetition priming to increased neural processing efficiency, our results suggest that use of this compensatory strategy improves reading accuracy and comprehension by temporarily boosting phonologic activation levels.
phonologic text alexia; repetition priming; aphasia; alexia; rehabilitation
Although more individuals are sharing their experiences with chronic pain or illness through blogging (writing an Internet web log), research on the psychosocial effects and motivating factors for initiating and maintaining a blog is lacking.
The objective was to examine via online questionnaire the perceived psychosocial and health benefits of blogging among patients who use this medium to communicate their experience of chronic pain or illness.
A 34-item online questionnaire was created, tested, and promoted through online health/disease forums. The survey employed convenience sampling and was open from May 5 to July 2, 2011. Respondents provided information regarding demographics, health condition, initiation and upkeep of blogs, and dynamics of online communication. Qualitative data regarding respondents’ blogging experiences, expectations for blogging, and the perceived effects from blogging on the blogger’s health, interpersonal relationships, and quality of life were collected in the form of written narrative.
Out of 372 respondents who started the survey, 230 completed the entire questionnaire. Demographic data showed survey respondents to be predominantly female (81.8%) and highly educated (97.2% > high school education and 39.6% with graduate school or professional degrees). A wide spectrum of chronic pain and illness diagnoses and comorbidities was represented. Respondents reported that initiating and maintaining an illness blog increased connection with others, decreased isolation, and provided an opportunity to tell their illness story. Blogging promoted accountability (to self and others) and created opportunities for making meaning and gaining insights from the experience of illness, which nurtured a sense of purpose and furthered their understanding of their illness.
Results suggest that blogging about chronic pain and illness may decrease a sense of isolation through the establishment of online connections with others and increase a sense of purpose to help others in similar situations. Further study involving a larger sample size, a wider range of education levels, and respondents with different types and magnitudes of illnesses will be needed to better elucidate the mechanism of the observed associations in this understudied area.
Blogging; narrative medicine; disease management; Internet; pain; chronic illness; survey; psychosocial support systems; holistic health; selfcare
Previous research has demonstrated that properties of a currently fixated word
and of adjacent words influence eye movement control in reading. In contrast to
such local effects, little is known about the global effects on eye movement
control, for example global adjustments caused by processing difficulty of
previous sentences. In the present study, participants read text passages in
which voice (active vs. passive) and sentence structure (embedded vs.
non-embedded) were manipulated. These passages were followed by identical target
sentences. The results revealed effects of previous sentence structure on gaze
durations in the target sentence, implying that syntactic properties of
previously read sentences may lead to a global adjustment of eye movement control.
reading; eye movements; syntax; context; global effects; syntactic priming
Research on written language comprehension has generally assumed that the phonological properties of a word have little effect on sentence comprehension beyond the processes of word recognition. Two experiments investigated this assumption. Participants silently read relative clauses in which two pairs of words either did or did not have a high degree of phonological overlap. Participants were slower reading and less accurate comprehending the overlap sentences compared to the non-overlapping controls, even though sentences were matched for plausibility and differed by only two words across overlap conditions. A comparison across experiments showed that the overlap effects were larger in the more difficult object relative than in subject relative sentences. The reading patterns showed that phonological representations affect not only memory for recently encountered sentences but also the developing sentence interpretation during on-line processing. Implications for theories of sentence processing and memory are discussed.
Five experiments used ERPs and eye tracking to determine the interplay of word-level and discourse-level information during sentence processing. Subjects read sentences that were locally congruent but whose congruence with discourse context was manipulated. Furthermore, critical words in the local sentence were preceded by a prime word that was associated or not. Violations of discourse congruence had early and lingering effects on ERP and eye-tracking measures. This indicates that discourse representations have a rapid effect on lexical semantic processing even in locally congruous texts. In contrast, effects of association were more malleable: Very early effects of associative priming were only robust when the discourse context was absent or not cohesive. Together these results suggest that the global discourse model quickly influences lexical processing in sentences, and that spreading activation from associative priming does not contribute to natural reading in discourse contexts.
Discourse; Lexical; Association; Eye movements; Event-related potentials
The purpose of this study was to examine the influence of phonotactic probability, the frequency of different sound segments and segment sequences, on the overall fluency with which words are produced by preschool children who stutter (CWS), as well as to determine whether it has an effect on the type of stuttered disfluency produced.
A 500+ word language sample was obtained from 19 CWS. Each stuttered word was randomly paired with a fluently produced word that closely matched it in grammatical class, word length, familiarity, word and neighborhood frequency, and neighborhood density. Phonotactic probability values were obtained for the stuttered and fluent words from an online database.
Phonotactic probability did not have a significant influence on the overall susceptibility of words to stuttering, but it did impact the type of stuttered disfluency produced. Specifically, single-syllable word repetitions were significantly lower in phonotactic probability than fluently produced words, part-word repetitions, and sound prolongations.
In general, the differential impact of phonotactic probability on the type of stuttering-like disfluency produced by young CWS provides some support for the notion that different disfluency types may originate in the disruption of different levels of processing.
Stuttering; Language; Phonotactic Probability; Children
Reading comprehension depends on neural processes supporting the access, understanding, and storage of words over time. Examinations of the neural activity correlated with reading have contributed to our understanding of reading comprehension, especially for the comprehension of sentences and short passages. However, the neural activity associated with comprehending an extended text is not well-understood. Here we describe a current-source-density (CSD) index that predicts individual differences in the comprehension of an extended text. The index is the difference in CSD-transformed event-related potentials (ERPs) to a target word between two conditions: a comprehension condition with words from a story presented in their original order, and a scrambled condition with the same words presented in a randomized order. In both conditions participants responded to the target word, and in the comprehension condition they also tried to follow the story in preparation for a comprehension test. We reasoned that the spatiotemporal pattern of difference-CSDs would reflect comprehension-related processes beyond word-level processing. We used a pattern-classification method to identify the component of the difference-CSDs that accurately (88%) discriminated good from poor comprehenders. The critical CSD index was focused at a frontal-midline scalp site, occurred 400–500 ms after target-word onset, and was strongly correlated with comprehension performance. Behavioral data indicated that group differences in effort or motor preparation could not explain these results. Further, our CSD index appears to be distinct from the well-known P300 and N400 components, and CSD transformation seems to be crucial for distinguishing good from poor comprehenders using our experimental paradigm. Once our CSD index is fully characterized, this neural signature of individual differences in extended-text comprehension may aid the diagnosis and remediation of reading comprehension deficits.
reading comprehension; EEG/ERP; machine learning applied to neuroscience; current source density; working memory
A growing body of literature in psychology, linguistics, and the neurosciences has paid increasing attention to the understanding of the relationships between phonological representations of words and their meaning: a phenomenon also known as phonological iconicity. In this article, we investigate how a text's intended emotional meaning, particularly in literature and poetry, may be reflected at the level of sublexical phonological salience and the use of foregrounded elements. To extract such elements from a given text, we developed a probabilistic model to predict the exceeding of a confidence interval for specific sublexical units concerning their frequency of occurrence within a given text contrasted with a reference linguistic corpus for the German language. Implementing this model in a computational application, we provide a text analysis tool which automatically delivers information about sublexical phonological salience allowing researchers, inter alia, to investigate effects of the sublexical emotional tone of texts based on current findings on phonological iconicity.
phonological iconicity; sound symbolism; foregrounding; text analysis tool; neurocognitive poetics
Natural language processing is an important tool in biomedicine, and fails without successful segmentation of words and sentences. Tokenization is a form of segmentation that identifies boundaries separating semantic units, for example words, dates, numbers and symbols, within a text. We sought to construct a highly generalizable tokenization algorithm with no prior knowledge of characters or their function, based solely on the inherent statistical properties of token and sentence boundaries. Tokenizing clinician-entered free text, we achieved precision and recall of 92% and 93%, respectively, compared to a whitespace token boundary detection algorithm. We classified over 80% of punctuation characters correctly, based on manual disambiguation with high inter-rater agreement (kappa = 0.916). Our algorithm effectively discovered properties of whitespace and punctuation in the corpus without prior knowledge of either. Given the dynamic nature of biomedical language, and the variety of distinct sublanguages used, the effectiveness and generalizability of our novel tokenization algorithm make it a valuable tool.
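The evaluation described in the tokenization abstract above compares predicted token boundaries with those of a whitespace-based detector. The sketch below shows one way such a comparison could be scored; representing boundaries as character offsets and the example text and predicted boundary set are assumptions made for illustration, not details taken from the study.

```python
def whitespace_boundaries(text):
    """Character offsets at which a whitespace-delimited token starts or ends."""
    bounds = set()
    in_token = False
    for i, ch in enumerate(text):
        if not ch.isspace() and not in_token:
            bounds.add(i)       # token starts here
            in_token = True
        elif ch.isspace() and in_token:
            bounds.add(i)       # token ended just before this offset
            in_token = False
    if in_token:
        bounds.add(len(text))   # token runs to the end of the text
    return bounds


def precision_recall(predicted, reference):
    """Precision and recall of predicted boundaries against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


text = "BP 120/80 mmHg"
reference = whitespace_boundaries(text)
# Hypothetical output of a tokenizer that also splits "120/80" around "/".
predicted = {0, 2, 3, 6, 7, 9, 10, 14}
print(precision_recall(predicted, reference))  # (0.75, 1.0)
```

In this toy case the extra boundaries around "/" lower precision relative to the whitespace reference even though they may be linguistically sensible, which is why punctuation was assessed separately by manual disambiguation in the abstract above.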
This study was a comparison of the effects of oral speech with total communication (speech plus sign language) training on the ability of mentally retarded children to repeat 4-word sentences. Three children were chosen who used single words to communicate but who did not combine words into complete sentences. Three sentence pairs were trained, with each pair having one sentence trained using oral methods and an equivalent one trained using the total communication approach. Both training procedures involved chaining sentence parts, reinforcement, and prompting. Oral methods involved presenting vocal stimuli and requiring vocal responses whereas total communication methods involved presenting vocal and signed stimuli and requiring vocal and signed responses. For the initial sentence pair with each child, an alternating treatments design was used to determine the relative efficacy of the two language training approaches. This was repeated with a second and third sentence pair using a multiprobe technique within a multiple baseline design. Results pointed to the superiority of the total communication approach in facilitating sentence repetition. Possible explanations of these results are offered and the utility of the alternating treatments experimental design is discussed.
Previous language learning research reveals that the statistical properties of the input offer sufficient information to allow listeners to segment words from fluent speech in an artificial language. The current pair of studies uses a natural language to test the ecological validity of these findings and to determine whether a listener’s language background influences this process. In Study 1, the “guessibility” of potential test words from the Norwegian language was assessed by presenting them to 22 listeners, who were asked to differentiate between true words and nonwords. In Study 2, 22 adults who spoke one of 12 different primary languages learned to segment words from continuous speech in an implicit language learning paradigm. The task consisted of two sessions, approximately three weeks apart, each requiring participants to listen to 7.2 minutes of Norwegian sentences followed by a series of bisyllabic test items presented in isolation. The participants differentially accepted the Norwegian words and Norwegian-like nonwords in both test sessions, demonstrating the capability to segment true words from running speech. The results were consistent across three broadly defined language groups, despite differences in participants’ language background.
implicit learning; language; statistical learning; second language acquisition
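One statistical property often discussed in the word-segmentation literature is the transitional probability between adjacent syllables, which tends to be high inside words and lower across word boundaries. The abstract above does not name the statistic involved, so using transitional probabilities here is an assumption, and the syllables, lexicon, and threshold are hypothetical; the sketch only illustrates the general idea of segmenting a continuous stream from its statistics.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: count / first_counts[pair[0]]
            for pair, count in pair_counts.items()}


def segment(syllables, threshold=0.75):
    """Insert a word boundary wherever the transitional probability
    between two adjacent syllables falls below `threshold`."""
    if not syllables:
        return []
    tp = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical three-word artificial lexicon, concatenated without pauses.
lexicon = {"A": ["tu", "pi"], "B": ["go", "la"], "C": ["da", "ro"]}
order = ["A", "B", "C", "A", "C", "B", "C", "B", "A"]
stream = [syllable for word in order for syllable in lexicon[word]]
print(segment(stream))
```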
When looking for the referents of novel nouns, adults and young children are sensitive to cross-situational statistics (Yu and Smith, 2007; Smith and Yu, 2008). In addition, the linguistic context that a word appears in has been shown to act as a powerful attention mechanism for guiding sentence processing and word learning (Landau and Gleitman, 1985; Altmann and Kamide, 1999; Kako and Trueswell, 2000). Koehne and Crocker (2010, 2011) investigate the interaction between cross-situational evidence and guidance from the sentential context in an adult language learning scenario. Their studies reveal that these learning mechanisms interact in a complex manner: they can be used in a complementary way when context helps reduce referential uncertainty; they influence word learning about equally strongly when cross-situational and contextual evidence are in conflict; and contextual cues block aspects of cross-situational learning when both mechanisms are independently applicable. To address this complex pattern of findings, we present a probabilistic computational model of word learning which extends a previous cross-situational model (Fazly et al., 2010) with an attention mechanism based on sentential cues. Our model uses a framework that seamlessly combines the two sources of evidence in order to study their emerging pattern of interaction during the process of word learning. Simulations of the experiments of Koehne and Crocker (2010, 2011) reveal an overall pattern of results that is in line with their findings. Importantly, we demonstrate that our model does not need to explicitly assign priority to either source of evidence in order to produce these results: learning patterns emerge as a result of a probabilistic interaction between the two cue types. Moreover, using a computational model allows us to examine the developmental trajectory of the differential roles of cross-situational and sentential cues in word learning.
probabilistic modeling; cross-situational word learning; syntactic bootstrapping; context-based attention mechanisms
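The general idea of combining cross-situational evidence with context-based attention can be illustrated with a deliberately simple sketch. It is not the model described in the abstract above or the Fazly et al. (2010) model: it merely accumulates word-referent co-occurrence counts across trials, lets an optional attention weight boost particular pairings, and normalises the result. The trial format, the weights, and the novel words are all hypothetical.

```python
from collections import defaultdict


def learn(trials):
    """Accumulate weighted word-referent co-occurrences and normalise per word.

    Each trial is (words, referents, attention), where `attention` maps a
    (word, referent) pair to a weight; pairs not listed get weight 1.0.
    """
    assoc = defaultdict(lambda: defaultdict(float))
    for words, referents, attention in trials:
        for word in words:
            for referent in referents:
                assoc[word][referent] += attention.get((word, referent), 1.0)
    return {
        word: {ref: value / sum(refs.values()) for ref, value in refs.items()}
        for word, refs in assoc.items()
    }


trials = [
    # Sentential context is assumed to direct attention toward "dog" for "kede".
    (["bola", "kede"], ["ball", "dog"], {("kede", "dog"): 3.0}),
    # No contextual cue: only cross-situational co-occurrence is available.
    (["bola", "timo"], ["ball", "cup"], {}),
]
meanings = learn(trials)
print(meanings["bola"], meanings["kede"])
```

Here "bola" is resolved by cross-situational overlap alone (it co-occurs with "ball" in both trials), whereas "kede" is resolved by the attention weight, without the code assigning explicit priority to either source of evidence.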
The combining of individual concepts to form an emergent concept is a fundamental aspect of language, yet much less is known about it than about processing isolated words or sentences. To facilitate research on conceptual combination, we provide meaningfulness ratings for a large set of (2,160) noun–noun pairs. Half of these pairs (1,080) are reversed versions of the other half (e.g., ski jacket and jacket ski), to facilitate the comparison of successful and unsuccessful conceptual combination independently of constituent lexical items. The computer code used for obtaining these ratings through a Web interface is provided. To further enhance the usefulness of this resource, ancillary measures obtained from other sources are also provided for each pair. These measures include associate production norms, contextual relatedness in terms of latent semantic analysis distance, total number of letters, phrase-level usage frequency, and word-level usage frequency summed across the words in each pair. Results of correlation and regression analyses are also provided for a quantitative description of the stimulus set. A subset of these stimuli was used to identify neural correlates of successful conceptual combination (Graves, Binder, Desai, Conant, & Seidenberg, NeuroImage 53:638–646, 2010). The stimuli can be used in other research and also provide benchmark data for evaluating the effectiveness of computational algorithms for predicting meaningfulness of noun–noun pairs.
Electronic supplementary material
The online version of this article (doi:10.3758/s13428-012-0256-3) contains supplementary material, which is available to authorized users.
Conceptual combination; Lexical; Semantics; Ratings; Concepts
Human language may be described as a complex network of linked words. In such a treatment, each distinct word in language is a vertex of this web, and interacting words in sentences are connected by edges. The empirical distribution of the number of connections of words in this network is of a peculiar form that includes two pronounced power-law regions. Here we propose a theory of the evolution of language, which treats language as a self-organizing network of interacting words. In the framework of this concept, we completely describe the observed word web structure without any fitting. We show that the two regimes in the distribution naturally emerge from the evolutionary dynamics of the word web. It follows from our theory that the size of the core part of language, the 'kernel lexicon', does not vary as language evolves.