J Gerontol B Psychol Sci Soc Sci. 2010 November; 65B(6): 706–711.
Published online 2010 September 13. doi: 10.1093/geronb/gbq064
PMCID: PMC2954330

Propositional Density and Cognitive Function in Later Life: Findings From the Precursors Study



We used longitudinal data from the Johns Hopkins Precursors Study to test the hypothesis that written propositional density measured early in life is lower for people who develop dementia categorized as Alzheimer's disease (AD). This association was reported in 1996 for the Nun Study, and the Precursors Study offered an unprecedented chance to reexamine it among respondents with different gender, education, and occupation profiles.


Eighteen individuals classified as AD patients (average age at diagnosis: 74) were assigned 2 sex-and-age matched controls, and propositional density in medical school admission essays (average age at writing: 22) was assessed via Computerized Propositional Idea Density Rater 3 linguistic analysis software. Adjusted odds ratios (ORs) for the matched case-control study were calculated using conditional (fixed-effects) logistic regression.


Mean propositional density was lower for cases than for controls (4.70 vs. 4.99 propositions per 10 words, 1-sided p = .01). Higher propositional density was associated with substantially lower odds of AD (OR = 0.16, 95% confidence interval = 0.03–0.90, 1-sided p = .02).


Propositional density scores in writing samples from early adulthood appear to predict AD in later life for men as well as women. Studies of cognition across the life course might beneficially incorporate propositional density as a potential marker of cognitive reserve.

Keywords: AD, Cognitive reserve, Dementia, Propositional density

THE cognitive reserve hypothesis is based on two key findings: first, that Alzheimer's disease (AD) risk is lower among those with higher education or who come from higher socioeconomic backgrounds, and second, that such individuals manifest symptoms of the disease at later stages (Stern, 2006, 2009). The hypothesis posits that those with greater cognitive reserve will maintain better function at similar levels of brain disease due to an increased capacity to compensate for damage. Educational or occupational attainment, literacy, and IQ scores are commonly used proxies for cognitive reserve (Mortimer, Borenstein, Gosche, & Snowdon, 2005; Richards & Sacker, 2003; Stern et al., 1994).

Although AD patients are known to have difficulties in writing and other linguistic domains (Taler & Phillips, 2008), there are few systematic studies of writing performance as a diagnostic or prognostic marker, and the Nun Study is unique in examining the relationship of early-life writing to subsequent AD. In a seminal paper, Snowdon and colleagues (1996) used rich data on nuns from Baltimore and Milwaukee to show that low propositional density in autobiographies written in early life was a strong predictor of poor cognitive performance in later life and of neuropathologically confirmed AD.

Propositional density measures the complexity of written and spoken language by counting the number of interrelated ideas expressed by individuals in text or conversation (Kintsch, 1974; Turner & Greene, 1977). Often used in studies of aging (Kemper, Greiner, Marquis, Prenovost, & Mizner, 2001; Kemper, Kynette, Rash, O’Brien, & Sprott, 1989; Kemper, Marquis, & Thompson, 2001; Lyons et al., 1993), a propositional density score quantifies the extent to which a person is connecting ideas (via assertions, questions, etc.) rather than merely referring to entities. In addition to linking low propositional density in young adulthood to later AD diagnoses, the Nun Study researchers showed that propositional density scores were inversely correlated with the severity of AD, including neurofibrillary tangle counts in the frontal, temporal, and parietal lobes measured at autopsy (Riley, Snowdon, Desrosiers, & Markesbery, 2005; Snowdon, Greiner, & Markesbery, 2000). Higher propositional density has also been linked with intact cognition in late life despite the presence of AD lesions (Iacono et al., 2009).

The idea of a life-course trajectory of cognitive function has intrigued AD researchers, and propositional density has been recognized as a potential indicator of cognitive reserve. However, the demanding nature of the necessary data—measures of writing style early in life and cognitive functioning in later life for the same individuals—has made it difficult to replicate the Nun Study analyses.

Longitudinal data from the Johns Hopkins Precursors Study offered an unprecedented chance to reevaluate the hypothesis that propositional density in early-life writing samples is significantly lower for those who developed AD in later life. We examined the relationship in a sample with gender, education, and occupation profiles very different from the nuns’ and employed a novel computerized propositional density measurement tool (Computerized Propositional Idea Density Rater [CPIDR3]). A matched case–control design capitalized on the selected and relatively homogenous nature of the Precursors Study cohort to control for known confounders while maximizing statistical power for detecting a difference in propositional density between cases and controls.


Methods

Sample Selection and Study Design

The Precursors Study is an ongoing prospective study of disease risk factors and health outcomes among all entering members of the Johns Hopkins School of Medicine classes of 1948–1964 (Thomas, 1951). The medical school graduates in the study complete annual questionnaires listing, among other information, new diagnoses and health issues experienced in the past year. A committee of physicians then assigns codes based on the International Classification of Diseases (ICD-9-CM Millennium Edition, 2000) for conditions reported by respondents or family members or listed in death certificates. The ICD is the most widely used diagnostic classification system for morbidity and mortality statistics and is designed to promote international comparability in the collection, processing, and presentation of these statistics (World Health Organization, 2010).

The Precursors Study also contains a rich archive of documentation, including personal statements written by study participants when they sought admission to the medical school. The statements, written at an average age of 22 years, provide a fitting counterpart to the nuns’ autobiographies and offer an opportunity to reexamine the association between early writing style and later cognitive outcomes.

We employed a nested case–control design with cumulative incidence sampling (Mantel, 1973) from the Precursors Study cohorts. Because Snowdon et al. (2000) showed that propositional density was more strongly associated with AD than with other forms of dementia, we focused our analyses on 18 cases with reported clinical AD diagnoses. The cumulative incidence rate of AD (ICD-9 code 331.0) in the Precursors cohort is 22.1 per 100,000. Although the AD assessments used by the Precursors Study are not as definitive as postmortem neuropathological diagnoses, ICD-9 diagnoses are conservative, and AD tends to be under-recognized in ICD-9–based coding (see Fillit, Geldmacher, Welter, Maslow, & Fraser, 2002). Thus, although reliance on this classification may have resulted in the exclusion of unidentified AD cases from our sample, our case group was unlikely to include persons with non-AD forms of dementia. (Indeed, our preliminary analyses showed that cases with an AD designation had exceptionally low propositional density not only in comparison to healthy study participants but also relative to those with other forms of cognitive impairment [4.70 vs. 4.93 propositions per 10 words, p = .03], bolstering our conviction that the AD identification is meaningful.)

Each case was assigned two cognitively unimpaired controls matched on sex and age (within 1 year on average or up to 6.8 years for the oldest), for a total sample of 54. Inclusion criteria for controls were a lack of dementia-related diagnoses and a score of 33 or above on the Telephone Interview for Cognitive Status (Brandt, Specter, & Folstein, 1988), administered in 2004–2005. Variables describing respondents’ demographic characteristics and cognitive function were extracted into the analytic data set.

Propositional Density Coding

Propositional density is measured as the ratio of expressed propositions to total number of words in a text, and often reported as the number of propositions per 10 words (Snowdon et al., 1996). Measures of propositional density in early life were constructed from the “statement of activities” prepared by applicants in response to the instruction: “Write below, in your own handwriting, a connected statement (which should be more than a list) of your general activities and intellectual interests in college (150-300 words).” (Compare this with archival information from the convent, which indicates that each sister was asked to “write a short sketch of her own life. This account should not contain more than 200-300 words and should be written on a single sheet of paper … include the place of birth, parentage, interesting and edifying events of one's childhood, schools attended, influences that led to the convent, religious life, and its outstanding events” [Snowdon et al., 1996].) Photocopies of the handwritten admissions essays were transcribed and then verified for accuracy and completeness.
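As a minimal sketch of the arithmetic behind this definition, the Python function below converts a proposition count and a word count into propositions per 10 words. The function name and the example counts are hypothetical and are not taken from the Precursors data or from CPIDR3; identifying the propositions themselves requires a coding manual or tagging software and is not attempted here.

```python
def propositions_per_10_words(n_propositions: int, n_words: int) -> float:
    """Return propositional density scaled to propositions per 10 words."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return 10.0 * n_propositions / n_words


# Hypothetical counts for illustration only (not study data).
example_density = propositions_per_10_words(n_propositions=47, n_words=100)
print(f"{example_density:.2f} propositions per 10 words")
```

Scaling by 10 simply keeps scores in a convenient range; the underlying quantity is still the ratio of propositions to words.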

Following the guidelines described in a Nun Study manual (generously provided by Kemper), propositional density was assessed for the last 10 sentences of each essay, or, when essays contained fewer than 10 sentences, the entire essay. The total number of sentences in each sample was recorded to account for potential differences by essay length.
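The sampling rule above (the last 10 sentences, or the whole essay when it is shorter) can be sketched as follows. This is only an illustration: the regular-expression sentence splitter is a naive assumption that will mis-handle abbreviations and similar cases, and it is not the procedure from the Nun Study manual or from CPIDR3. The function name is hypothetical.

```python
import re


def last_sentences(essay: str, n: int = 10) -> list[str]:
    """Return the last n sentences of an essay, or all of them if fewer exist.

    Sentences are split naively at whitespace that follows '.', '!' or '?'.
    """
    parts = re.split(r"(?<=[.!?])\s+", essay.strip())
    sentences = [p for p in parts if p]
    return sentences[-n:]


# Hypothetical essay text for illustration only.
toy_essay = " ".join(f"This is sentence {i}." for i in range(1, 13))
sample = last_sentences(toy_essay)
print(len(sample), "sentences sampled out of 12")
```

Recording `len(sentences)` alongside the sample would mirror the step of noting essay length for later adjustment.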

Coding was conducted using the CPIDR3, version 3.2.2785.24603, a novel software program that generates a propositional density score for any English text on the basis of part-of-speech tags and additional rules for adjusting the total count (for examples, see Supplementary Appendix I). Brown, Snodgrass, Kemper, Herman, and Covington (2008) validated the software against Turner and Greene's examples and against human raters. CPIDR3 is highly consistent, furnishing the same rating for the same sentence and eliminating the nonreproducibility common to human coding. (The essays were also evaluated for propositional density by two independent coders who were blinded to case status. In multivariate analyses, similar results were obtained using the consensus score reached by the human coders and the score generated by CPIDR. The latter, more easily reproducible results are reported here.)

Statistical Analysis

Individual matching allowed us to evaluate the effect of a risk factor of primary interest (low propositional density) while achieving the most precise possible control for the covariates of age and sex, which are known to be associated with AD. As it was not possible to match cases and controls on the exact date of birth, any remaining age difference (in days) between cases and matched controls was included as a variable in the analysis to account for potential residual confounding. Because the study sample was selected from a cohort of medical students, cases and controls were similar in educational status and intellectual achievement, which are other known confounders.

Propositional density scores were treated as a continuous variable. After calculating descriptive statistics and a t test comparing mean propositional density for cases and controls, we employed a conditional (fixed-effects) logistic regression using Stata 10 to obtain Mantel–Haenszel odds ratio (OR) estimates (Rothman & Greenland, 1998). Conditional logistic regression offers the most robust analysis for matched data. With our 1:2 matched sample, we had 85% power to detect significant differences (at the .05 level) as low as 0.26 in the average scores of the two groups (Lachin, 2008). Consistent with our hypothesis of lower propositional density for AD cases, we report one-sided p values for all tests.
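To make the conditional approach concrete, the sketch below fits a one-covariate conditional logistic model to 1:2 matched sets. Within each set, the likelihood contribution is exp(beta * x_case) divided by the sum of exp(beta * x) over all three members, so anything shared within a matched set cancels out. The data are invented for illustration, the function names are hypothetical, and a simple bisection on the score function stands in for the optimizer in Stata; this is not the authors' code.

```python
import math

# Hypothetical matched sets: (case score, control score, control score).
TOY_SETS = [
    (4.5, 4.9, 5.1),
    (4.8, 4.7, 5.2),
    (4.6, 5.0, 4.4),
    (5.0, 4.9, 5.3),
    (4.4, 4.8, 4.6),
]


def score(beta, matched_sets):
    """Derivative of the conditional log-likelihood with respect to beta."""
    total = 0.0
    for case_x, *control_xs in matched_sets:
        xs = [case_x, *control_xs]
        shift = max(beta * x for x in xs)  # stabilizes the exponentials
        weights = [math.exp(beta * x - shift) for x in xs]
        expected_x = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        total += case_x - expected_x
    return total


def fit_beta(matched_sets, lo=-50.0, hi=50.0, iterations=200):
    """Find the root of the score by bisection (the score decreases in beta)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if score(mid, matched_sets) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


beta_hat = fit_beta(TOY_SETS)
odds_ratio = math.exp(beta_hat)
print(f"beta = {beta_hat:.3f}, odds ratio per unit = {odds_ratio:.3f}")
```

Bisection works here because the conditional log-likelihood is concave in a single coefficient; with several covariates, such as the residual age difference, a multivariate optimizer would be needed.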


Results

Table 1 summarizes the characteristics of the Precursors sample. Like the nuns (see Supplementary Appendix II), the Hopkins medical students are a relatively homogenous population, comprising mostly white males. The average ages at the time of the writing samples (22 years) are identical for our participants and the nuns, though the mean age of cognitive assessment is somewhat earlier in the Precursors Study (74 and 78 years for cases and controls, respectively) than in the Nun Study (80 years).

Table 1.
Characteristics of the Precursors Study Case–Control Sample

Mean propositional density is significantly lower for the cases relative to controls (4.70 vs. 4.99 propositions per 10 words, p = .01). Essays with low density tended to list activities (e.g., “I have participated in intramural sports, basketball, softball and tennis”), whereas those with higher scores provided more narrative (e.g., “I find foreign languages engrossing and I hope to continue my study of Spanish and German in the future”). Notably, the mean scores of the medical students, who were educated in Baltimore, are substantially lower than those of the Milwaukee nuns (Snowdon et al., 1996) but close to those of the Baltimore nuns (Kemper et al., 2001).

Table 2 lists the results of regression analyses testing the association of propositional density with case status while taking into account the matching of cases and controls. Model 1 shows considerably lower odds of AD diagnosis (OR = 0.16) for those with higher propositional density (p = .02). When the residual age difference between cases and controls is included in the analysis (Model 2) to control for any remaining confounding by age, the effect of propositional density is slightly less pronounced (OR = 0.22, p =.05), though the magnitude and significance of the OR continue to indicate a substantial reduction in AD risk for those whose writing exhibited higher propositional density.
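The Model 1 estimate can be unpacked with a few lines of arithmetic. The sketch below uses only figures reported in this article (OR = 0.16, the 95% confidence interval of 0.03 to 0.90 from the abstract, and the group means of 4.70 and 4.99). It assumes the interval is a Wald interval that is symmetric on the log-odds scale and uses a normal approximation for the p value; both are assumptions, so small discrepancies from the published values would reflect rounding or a different interval method.

```python
import math

OR_PER_UNIT = 0.16              # Model 1 odds ratio reported in the text
CI_LOW, CI_HIGH = 0.03, 0.90    # 95% confidence interval reported in the abstract
MEAN_DIFFERENCE = 4.99 - 4.70   # control mean minus case mean

# Log-odds coefficient and its approximate standard error (Wald assumption).
beta = math.log(OR_PER_UNIT)
standard_error = (math.log(CI_HIGH) - math.log(CI_LOW)) / (2 * 1.96)

# One-sided p value from the normal approximation.
z = beta / standard_error
one_sided_p = 0.5 * math.erfc(abs(z) / math.sqrt(2))

# Odds ratio implied for a shift equal to the observed mean difference.
or_for_mean_difference = math.exp(beta * MEAN_DIFFERENCE)

print(f"beta = {beta:.2f}, SE = {standard_error:.2f}, z = {z:.2f}")
print(f"one-sided p = {one_sided_p:.3f}")
print(f"OR for a {MEAN_DIFFERENCE:.2f}-point difference = {or_for_mean_difference:.2f}")
```

Under these assumptions the back-calculated one-sided p value is close to the reported .02, and a difference the size of the observed gap between group means corresponds to an odds ratio of roughly 0.59 rather than 0.16, because the reported OR refers to a full one-unit change in propositions per 10 words.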

Table 2.
Odds Ratios of Developing Alzheimer's Disease Associated With Propositional Density and 95% Confidence Intervals, Conditional Logistic Regressions

Discussion


The Precursors Study provided an unprecedented opportunity for reexamining the Nun Study findings among respondents with markedly different gender, education, and occupation profiles. Our results indicate that propositional density in early-life writing is associated with a later-life AD diagnosis in a sample of mostly male medical-school graduates. Low propositional density scores might identify those at greater risk of developing AD, whereas the ability to produce idea-packed sentences in young adulthood could be an early signal of cognitive reserve, an active buffer against the effects of neuropathology. Though the observed difference in propositional density between cases and controls was considerably smaller in the Precursors Study than it was among participants in the Nun Study, it was statistically significant in a simple test of mean differences and in a more robust regression analysis.

Whereas the Nun Study consisted exclusively of women, more than 72% of the Precursors Study participants were male. It has long been known that females tend to perform better than males at tasks involving verbal fluency, perceptual speed, and verbal and item memory (Kimura, 1996). Sex differences in size (adjusted for overall brain volumes) have been reported for numerous cortical regions, including those influencing language, such as the superior temporal gyrus and Broca's area (Goldstein et al., 2001; Harasty, Double, Halliday, Kril, & McRitchie, 1997; Schlaefer et al., 1995). Recent studies have not, however, established any consistent or statistically significant neurobiological basis for sex differences in language function (Ihnen et al., 2009; Sommer, Aleman, Somers, Boks, & Kahn, 2008). Although sex differences have not been linked to odds of diagnosed AD, the same level of AD pathology (neuritic plaques and neurofibrillary tangles) is more likely to be clinically diagnosed as AD in women than in men (Barnes et al., 2005). Our matched design controlled for potential sex differences in the pooled analysis, and propositional density differences between cases and controls were apparent in this predominantly male sample.

The Precursors sample differs from the nun sample on common proxy measures of cognitive reserve including education, occupational attainment, and socioeconomic status. Whereas most of the nuns completed a high-school diploma before entering the convent, the Precursors Study participants completed undergraduate education prior to entering medical school. The majority of nuns subsequently earned a bachelor's degree, whereas the Precursors Study members earned a medical degree. The nuns were professionally engaged in teaching, and the Hopkins graduates had careers in medicine or related fields. As members of a religious community, the nuns took vows of chastity, poverty, and obedience and shared a common lifestyle. The opportunities open to graduates of a prestigious medical school likely led the Precursors Study participants to lifestyles quite distinct from those of the nuns.

Though higher income and increased educational and occupational attainment are jointly and independently protective against dementia (Richards & Sacker, 2003; Stern et al., 1994), propositional density scores in the Precursors Study were lower than those for the nuns. This finding might reflect varied writing styles across the two populations (potentially associated with differences in sex composition, occupation, or other characteristics), differences in their essay prompts, or distinctions in the motivation with which the two groups approached their writing.

Notably, the nuns’ propositional density was assessed by humans, whereas our results are based on computer-generated scores. However, Brown et al. (2008) reported high correlation (r = .97) between CPIDR's scores and those by humans, and results for our sample using hand-coded scores (not shown) did not differ substantially from those reported above.

A few potential limitations have to do with the selection of cases and controls. First, selecting cases nested in a cohort makes it difficult to distinguish between the exposure/disease and exposure/survival associations. Because Snowdon, Greiner, Kemper, Nanayakkara, and Mortimer (1999) showed higher all-cause mortality for those with low propositional density, it is possible that individuals with lower propositional density died before developing AD. However, the relative youth of the sample may mitigate this potential selection bias. Second, cases were identified based on assessment of study data by a team of medical experts following ICD-9 guidelines, not via a definitive postmortem neuropathological diagnosis. Due to this classification system, our study may have excluded AD cases from the sample, but was unlikely to include cases with non-AD forms of dementia. Notably, this type of potential classification error would bias our analysis against a significant result, rendering the significant association we report above conservative.

Finally, due to the cumulative incidence sampling procedure, all study participants who were not diagnosed with cognitive impairment at the time of sampling (2007) were eligible for inclusion as controls. It remains possible that current controls may become cases in the future and that the propositional density association we found applies particularly to early-onset AD. This potential misclassification would also bias the results toward the null hypothesis of no difference between the groups and again render our findings conservative. Further analysis as the study cohort ages may allow more definitive conclusions about the longitudinal relationship of linguistic complexity with cognitive pathology.

An association between cognitive ability measured in early life and information-processing speed at older ages was reported by Luciano and colleagues (2009), though the correlation was lower in carriers of the APOE-e4 allele. Distinct from markers of biological vulnerability to AD such as APOE-e4, propositional density potentially reflects cognitive reserve, a factor that influences the brain's capacity for compensation through life span pathways that remain to be elucidated. Due to the selected nature of the Precursors sample, further studies involving participants with more varied educational, socioeconomic, and racial/ethnic backgrounds are needed. Although it remains unclear whether writing idea-dense prose is a protective activity or an indicator of underlying cognitive reserve, it is a compelling marker of AD risk that could be beneficially incorporated into future studies of cognitive change across the life course.


Funding

The National Institute on Aging (R01 AG01760); the National Institute of Diabetes and Digestive and Kidney Diseases (K24 DK02856); the Hopkins Population Center (R24 HD042854).


Supplementary material can be found at:



Acknowledgments

We thank Susan Kemper for generously providing us with the training materials used in the original Nun Study propositional density analyses. We also express our appreciation to Teri Whitehead for transcribing the Precursors admissions essays, Amelia Greiner for assisting with propositional density coding, and Audrey Chu for preparing the data for analysis. Finally, we thank Michelle Carlson, Joshua Garoon, and three anonymous reviewers for helpful comments on the manuscript.


References

  • Barnes LL, Wilson RS, Bienias JL, Schneider JA, Evans DA, Bennett DA. Sex differences in the clinical manifestations of Alzheimer disease pathology. Archives of General Psychiatry. 2005;62:685–691. [PubMed]
  • Brandt J, Specter M, Folstein MF. The telephone interview for cognitive status. Neuropsychiatry, Neuropsychology, and Behavioral Neurology. 1988;1:111–117.
  • Brown C, Snodgrass T, Kemper SJ, Herman R, Covington MA. Automatic measurement of propositional idea density from part-of-speech tagging. Behavior Research Methods. 2008;40:540–545. [PMC free article] [PubMed]
  • Fillit H, Geldmacher DS, Welter RT, Maslow K, Fraser M. Optimizing coding and reimbursement to improve management of Alzheimer's disease and related dementias. Journal of the American Geriatrics Society. 2002;50:1871–1878. [PubMed]
  • Goldstein JM, Seidman LJ, Horton NJ, Makris N, Kennedy DN, Caviness VS, Jr., Faraone SV, Tsuang MT. Normal sexual dimorphism of the adult human brain assessed by in vivo magnetic resonance imaging. Cerebral Cortex. 2001;11:490–497. [PubMed]
  • Harasty J, Double KL, Halliday GM, Kril JJ, McRitchie DA. Language-associated cortical regions are proportionally larger in the female brain. Archives of Neurology. 1997;54:171–176. [PubMed]
  • Iacono D, Markesbery WR, Gross M, Pletnikova O, Rudow G, Zandi P, Troncoso JC. The nun study: Clinically silent AD, neuronal hypertrophy, and linguistic skills in early life. Neurology. 2009;73:665–673. [PMC free article] [PubMed]
  • ICD-9-CM Millennium Edition. International Classification of Diseases 9th Revision, Clinical Modification. 6th ed. Los Angeles: Practice Management Information Corporation; 2000.
  • Ihnen SKZ, Church JA, Petersen SE, Schlaggar BL. Lack of generalizability of sex differences in the fMRI BOLD activity associated with language processing in adults. Neuroimage. 2009;45:1020–1032. [PMC free article] [PubMed]
  • Kemper S, Greiner LH, Marquis JG, Prenovost K, Mizner TL. Language decline across the life span: Findings from the Nun Study. Psychology and Aging. 2001;16:227–239. [PubMed]
  • Kemper S, Kynette D, Rash S, O’Brien K, Sprott R. Life-span changes to adults’ language. Applied Psycholinguistics. 1989;10:49–66.
  • Kemper S, Marquis J, Thompson M. Longitudinal change in language production: Effect of aging and dementia on grammatical complexity and propositional content. Psychology and Aging. 2001;16:600–614. [PubMed]
  • Kimura D. Sex, sexual orientation and sex hormones influence human cognitive function. Current Opinion in Neurobiology. 1996;6:259–263. [PubMed]
  • Kintsch W. The representation of meaning in memory. Hillsdale, NJ: Erlbaum; 1974.
  • Lachin JM. Sample size evaluation for a multiply matched case-control study using the score test from a conditional logistic regression model. Statistics in Medicine. 2008;27:2509–2523. [PMC free article] [PubMed]
  • Luciano M, Gow AJ, Harris SE, Hayward C, Allerhand M, Starr JM, Visscher PM, Deary IJ. Cognitive ability at age 11 and 70 years, information processing speed, and APOE variation: The Lothian birth cohort 1936 study. Psychology and Aging. 2009;24:129–138. [PubMed]
  • Lyons K, Kemper S, LaBarge E, Feraro F, Balota D, Storandt M. Language and Alzheimer's disease: A reduction in syntactic complexity. Aging Cognition. 1993;50:81–86.
  • Mantel N. Synthetic retrospective studies and related topics. Biometrics. 1973;29:479–486. [PubMed]
  • Mortimer JA, Borenstein AR, Gosche KM, Snowdon DA. Very early detection of Alzheimer neuropathology and the role of brain reserve in modifying its clinical expression. Journal of Geriatric Psychiatry and Neurology. 2005;18:218–223. [PMC free article] [PubMed]
  • Richards M, Sacker A. Lifetime antecedents of cognitive reserve. Journal of Clinical and Experimental Neuropsychology. 2003;25:614–624. [PubMed]
  • Riley KP, Snowdon DA. The challenges and successes of aging: Findings from the Nun Study. Advances in Medical Psychotherapy. 1999–2000;10:1–12.
  • Riley KP, Snowdon DA, Desrosiers MF, Markesbery WR. Early life linguistic ability, late life cognitive function, and neuropathology: Findings from the Nun Study. Neurobiology of Aging. 2005;26:341–347. [PubMed]
  • Rothman KJ, Greenland S. Modern epidemiology. 2nd ed. Philadelphia: Lippincott Williams & Wilkins; 1998.
  • Schlaefer TE, Harris GJ, Tien AY, Peng L, Lee S, Pearlson GD. Structural differences in the cerebral cortex of healthy female and male subjects: A magnetic resonance imaging study. Psychiatry Research. 1995;61:129–135. [PubMed]
  • Snowdon DA, Greiner LH, Kemper SJ, Nanayakkara N, Mortimer JA. Linguistic ability in early life and longevity: Findings from the Nun Study. In: Robine JM, Forette B, Franceschi C, Allard M, editors. The paradoxes of longevity. Heidelberg, Germany: Springer; 1999. pp. 103–113.
  • Snowdon DA, Greiner LH, Markesbery WR. Linguistic ability in early life and the neuropathology of Alzheimer's disease and cerebrovascular disease: Findings from the Nun Study. Vascular factors in Alzheimer's disease. 2000;903:34–38. [PubMed]
  • Snowdon DA, Kemper SJ, Mortimer JA, Greiner LH, Wekstein DR, Markesbery WR. Linguistic ability in early life and cognitive function and Alzheimer's disease in late life: Findings from the Nun Study. Journal of the American Medical Association. 1996;275:528–532. [PubMed]
  • Sommer IE, Aleman A, Somers M, Boks MP, Kahn RS. Sex differences in handedness, asymmetry of the planum temporale and functional language lateralization. Brain Research. 2008;1206:76–88. [PubMed]
  • Stern Y. Cognitive reserve and Alzheimer disease. Alzheimer Disease and Associated Disorders. 2006;20:112–117. [PubMed]
  • Stern Y. Cognitive reserve. Neuropsychologia. 2009;47:2015–2028. [PMC free article] [PubMed]
  • Stern Y, Gurland B, Tatemichi TK, Tang MX, Wilder D, Mayeux R. Influence of education and occupation on the incidence of Alzheimer's disease. Journal of the American Medical Association. 1994;271:1004–1010. [PubMed]
  • Taler V, Phillips NA. Language performance in Alzheimer's disease and mild cognitive impairment: A comparative review. Journal of Clinical and Experimental Neuropsychology. 2008;30:501–556. [PubMed]
  • Thomas CB. Observations on some possible precursors of essential hypertension and coronary artery disease. Bulletin of the Johns Hopkins Hospital. 1951;89:419–441. [PubMed]
  • Turner A, Greene E. The construction and use of a propositional text base (Tech. Rep. No. 63) Boulder, CO: University of Colorado Institute for the Study of Intellectual Behavior; 1977.
  • World Health Organization. International Classification of Diseases. 2010. Retrieved July 23, 2010, from

Articles from The Journals of Gerontology Series B: Psychological Sciences and Social Sciences are provided here courtesy of Oxford University Press