We evaluated genetic and environmental contributions to individual differences in language skills during early adolescence, measured by both language sampling and standardized tests, and examined the extent to which these genetic and environmental effects are stable across time.
We used structural equation modeling on latent factors to estimate additive genetic, shared environmental, and nonshared environmental effects on variance in standardized language skills (i.e., Formal Language) and productive language-sample measures (i.e., Productive Language) in a sample of 527 twins across 3 time points (mean ages 10–12 years).
Individual differences in the Formal Language factor were influenced primarily by genetic factors at each age, whereas individual differences in the Productive Language factor were primarily due to nonshared environmental influences. For the Formal Language factor, the stability of genetic effects was high across all 3 time points. For the Productive Language factor, nonshared environmental effects showed low but statistically significant stability across adjacent time points.
The etiology of language outcomes may differ substantially depending on assessment context. In addition, the potential mechanisms for nonshared environmental influences on language development warrant further investigation.
In the child-language literature, adolescent development often gets overshadowed by the exciting linguistic milestones of the earlier years. Nevertheless, it is difficult to imagine a more significant transitional period than adolescence. Defined as the maturational period spanning the onset of puberty to early adulthood, adolescence is associated with significant neurobiological, physical, cognitive, social, and behavioral changes (American Psychological Association, 2002; Dahl, 2004; D. Rice & Barone, 2000; Steinberg, 2005). Given evidence of increased brain plasticity during this period, some have referred to adolescence as a second critical period of development (Dahl, 2004; Huttenlocher, 1994; Peper et al., 2009; D. Rice & Barone, 2000). In regard to communication-related skills, the adolescent years are marked by growth and increasing individual differences in metalinguistic and abstract thinking, such as use and understanding of figurative language and morphologically complex nouns and adjectives (Nippold, 2000; Nippold & Sun, 2008; Paul, 2001), development of an adolescent register (Nippold, 2000), and increased sophistication in theory of mind, particularly as it relates to affect (Blakemore, 2012). In addition, the period of adolescence has been associated with a narrowing of linguistic flexibility, at least in terms of acquiring native competencies in domains such as phonology and morphosyntax (e.g., Johnson & Newport, 1989; Thompson et al., 2000). The present study examines the stability of etiological effects on language development explicitly during early adolescence.
Longitudinal twin studies that compare monozygotic (MZ) and dizygotic (DZ) twins are a useful way to examine genetic and environmental influences associated with adolescent development. Twin studies offer a natural control for genetic effects because MZ twins are twice as similar, genetically, as DZ twins. Twin studies also facilitate the estimation of environmental effects, which can be divided into two types: shared environmental factors, reflecting variation in nongenetic influences that contribute to familial resemblance (e.g., family socioeconomic status), and nonshared environmental factors, reflecting variation in environmental influences that cause individual family members to differ from one another (e.g., different educational experiences). Twin similarity is increased by genetic and shared environmental factors and reduced by nonshared environmental factors. The genetic effect size is typically referred to as heritability (a² or h²), with shared and nonshared environmental effects represented as c² and e², respectively. Estimates of genetic effects on individual differences in language development at different ages have varied widely across studies, with such variability attributed to differences in language domain (e.g., Stromswold, 2001), form of measurement (e.g., DeThorne et al., 2008), environmental circumstances (e.g., Rowe, Jacobson, & Van den Oord, 1999), and child age (e.g., Spinath, Price, Dale, & Plomin, 2004). Specific to child age, multivariate analyses with longitudinal data allow us to examine the stability of genetic and environmental effects over time.
In one example of a longitudinal twin analysis, Haworth et al. (2010) combined data across six twin studies from four countries to study potential developmental change in the etiology of individual differences in IQ using a total sample of 11,000 twin pairs. Twins were grouped according to age: childhood (4–10 years), adolescence (11–13 years), and young adulthood (14–34 years). Across these three developmental periods, estimates of heritability increased linearly from 41% in childhood to 66% in young adulthood, with a middle ground of 55% heritability during adolescence (see also Hoekstra, Bartels, & Boomsma, 2007; Oliver & Plomin, 2007). Haworth et al. note that increased heritability of general cognitive ability could be due to the emergence of new genetic effects or the process of gene–environment correlation (p. 1118). As an example, active gene–environment correlation suggests that as children gain increased control over their environments, they are likely to select experiences and contexts that are more consistent with their own genetic inclinations, thereby predicting increased heritability over time (cf. Hopper, 2000; Plomin, DeFries, & Loehlin, 1977; Scarr & McCartney, 1983).
Specific to language, Hayiou-Thomas, Dale, and Plomin (2012) offered “the first long-term longitudinal examination of the etiology of individual differences in language from early childhood through to adolescence” (p. 233). Their analyses included approximately 8,000 same-sex twin pairs from the Twins' Early Development Study (Oliver & Plomin, 2007) and focused on three time periods: early childhood (2–4 years), middle childhood (7–10 years), and early adolescence (12 years). They noted an increase in heritability as children moved from early childhood (a² = .24) into middle childhood (a² = .57 and .63) due to new genetic influences. In addition, they reported an increase in nonshared environmental effects on adolescent language (e² = .13 and .22) compared with early (e² = .02) and middle childhood (e² = .06 and .05). One particularly interesting feature from this study was the diversity of measures used to assess child language across ages: Early childhood focused on a parent-report measure of combined vocabulary and syntax; middle childhood included both a standardized vocabulary subtest and a curriculum-based teacher assessment of “speaking and listening”; and early adolescence included a web-based assessment of receptive vocabulary, syntax, and pragmatics derived from standardized language subtests and a teacher-report measure similar to the one from middle childhood. As a consequence, it is difficult to disentangle whether results reflect shifting etiological effects across time points or differences in the underlying language constructs being assessed.
The Western Reserve Reading and Math Project (WRRMP; Petrill, Deater-Deckard, Thompson, DeThorne, & Schatschneider, 2006), a longitudinal twin study of reading development and related abilities, is unique in that it includes language-sample data collected during annual home visits in addition to standardized language assessments. Using this sample, DeThorne et al. (2008) conducted a multivariate genetic analysis of language skills that included 380 twins in first grade. Language measures loaded on two latent factors: a Conversational factor that included language-sample measures such as mean length of utterance (MLU), number of total words (NTW), and number of different words (NDW), and a Formal factor that included two standardized vocabulary tests. Heritability for the Conversational and Formal factors was .70 and .37, respectively, with a genetic correlation of .37 between the two factors.
A follow-up study by DeThorne, Harlaar, Petrill, and Deater-Deckard (2012) examined the longitudinal stability of genetic and environmental influences on children's language-sample measures, again using a Conversational factor, across first and second grades. The analyses revealed that 62% of the variance in children's conversational language at first grade was due to genetic effects. At second grade, the heritability of the same Conversational factor dropped to 34% and overlapped entirely with the genetic effects from one year prior. In contrast to the stability in genetic effects, the nonshared environmental effects, estimated at 38% and 66% at first and second grade, respectively, did not overlap across the two time points.
The present work represents an extension of prior WRRMP analyses (i.e., DeThorne et al., 2008, 2012) by examining the extent and stability of etiological influences on children's language development, using both language samples and standardized measures, across the critical developmental window of early adolescence. We asked the following research questions:
Participants in the present study came from the WRRMP. Same-sex twins from Ohio were recruited during kindergarten and first grade through media advertisements, school nominations, Ohio state birth records, and mothers-of-twins clubs. The cohort was systematically followed through a series of annual home visits from kindergarten until at least fifth grade. The present analyses focused on data from the fifth, sixth, and seventh home visits, which centered on third, fourth, and fifth grades, respectively. Each annual home visit included a 2.5-hr assessment protocol of standardized measures and parent questionnaires. At each home visit, twins were assessed simultaneously, each by a separate examiner.
Demographic data, such as racial or ethnic background and parent education, were taken from questionnaires administered at the time families entered the study. Across the three time points, the sample was 91%–93% White and 57%–58% female and included 21%–42% MZ (vs. DZ) twins. According to parent reports of highest educational level attained, 100% of the respondents had completed high school and 92% of them had pursued postsecondary education: 32% of parents had completed a 4-year degree and 22% had completed graduate or professional school. Zygosity was determined through DNA testing from buccal swabs or a questionnaire of twin similarity reported to be 95% accurate (cf. Goldsmith, 1991). Information regarding hearing health, expressive language development, and speech-language pathology services was collected from a primary caregiver at entry into the study through the Speech-Language Survey (DeThorne et al., 2006).
Twins were selected from the WRRMP on the basis of having at least one type of language data (either language samples or standardized language-test scores) at each home visit, as well as complete data on age, biological sex, and zygosity; no different-sex pairs were included in the study. Five individuals were subsequently excluded due to a history of persistent hearing difficulties, as reported by parents on the Speech-Language Survey. This process identified 522 participants at Home Visit 5 (HV5), 95% of whom had both language samples and standardized language test scores; 498 at HV6, 87% of whom had both language samples and standardized language test scores; and 504 at HV7, 78% of whom had both language samples and standardized language-test scores. A total of 436 participants completed all three home visits. The mean age of twins was 9.83 years (SD = 0.99) at HV5, 10.88 years (SD = 1.02) at HV6, and 12.21 years (SD = 1.21) at HV7. On the basis of results from the Speech-Language Survey, between 7% and 8% of the sample was receiving speech-language pathology services when they entered the study.
Language data collected during each annual home visit included standardized-test scores from three subtests of the Clinical Evaluation of Language Fundamentals–Fourth Edition (CELF-4; Semel, Wiig, & Secord, 2003) and the Test of Narrative Language (TNL; Gillam & Pearson, 2004). In addition, the narratives produced during the TNL were supplemented by one additional picture plate taken from stimulus manual one of the Test of Language Competence–Expanded Edition (Wiig & Secord, 1989). The supplemental picture plate—which included a visual scene of three children, one of whom appeared to have gotten off of her bike because of a branch stuck through her spokes—was added to the protocol to lengthen the narrative samples. Similar to the final task in the TNL, participants were asked to look carefully at the photo and tell a related story. This picture elicitation task paired with the three expressive tasks from the TNL formed the narrative language sample. The entire narrative sample was digitally recorded and transcribed according to conventions associated with Systematic Analysis of Language Transcripts (Miller, Iglesias, & Nockerts, 2004). To prevent the inflation of utterance length on the basis of multiple conjoining conjunctions, independent clauses joined by common conjunctions (i.e., and, but, or) were segmented into separate utterances forming what has been referred to as communication units, or C-units (Loban, 1976; Nippold, 1998). In addition, repeated or reformulated units were removed from the linguistic analysis through parenthetical mazes.
Transcription was completed by research assistants within the Child Language and Literacy Laboratory at the University of Illinois. Twins within a pair were transcribed by different research assistants, so as not to procedurally inflate twin similarity. In addition, assistants were kept unaware of the zygosity of the twins they were transcribing. All assistants were trained through iterative review of the Systematic Analysis of Language Transcripts tutorial and lab manual until transcription of practice samples resulted in 85% point-by-point agreement with an experienced transcriber on both utterance boundaries and individual morphemes. In addition, approximately every 15th sample transcribed was subjected to a transcription reliability check in order to monitor and remedy potential drift in practiced conventions across transcribers. From this process, the point-by-point agreement across HV5 samples (n = 44) ranged from .67 to 1.00 (M = .92) for utterance boundaries and from .92 to .98 for morphemes (M = .96). In a similar vein, reliability for HV6 samples (n = 31) ranged from .62 to 1.00 (M = .93) for utterance boundaries and from .72 to .99 for morphemes (M = .95). Last, HV7 samples (n = 33) ranged in agreement from .84 to 1.00 for utterance boundaries (M = .94) and from .89 to .99 for morphemes (M = .97).
The transcription process resulted in language samples with an average length of 59 C-units at HV5 (SD = 31, range = 17–297), 62 at HV6 (SD = 25, range = 14–243), and 60 at HV7 (SD = 23, range = 28–236). Our elicitation procedures (i.e., story retell and single-picture elicitation method) and resulting sample lengths are well in line with prior published language-sample analyses of narrative samples (e.g., Miller et al., 2006; Nippold, Hesketh, Duthie, & Mansfield, 2005; Scott & Windsor, 2000; Tilstra & McMaster, 2007). A study by Heilmann, Nockerts, and Miller (2010) explicitly examined the reliability of productive-language measures from both conversational and narrative samples. Most relevant to the present study, that study reported Cronbach's α values of .74, .93, and .92, respectively, for MLU, NDW, and words per minute (a measure comparable to NTW) on narrative samples of comparable length to those in our study from children ages 6;0–13;3 (years;months; p. 398).
The present analysis included the Oral Narration score from the TNL, which is based on three expressive tasks: retelling a story about a trip to a fast-food restaurant without any picture prompts, telling a narrative related to a sequence of five drawings of a boy late for school, and creating a story on the basis of a single pictured scene in which two children appear to have stumbled across aliens landing in a park. Consistent with scoring procedures reported in the manual, results across the three expressive tasks were summed into a single raw score for Oral Narration with a possible maximum of 90 points. Designed for children ages 5;0 to 11;11, the internal consistency (coefficient alpha) for the Oral Narration score reported in the test manual ranged from .85 to .90 for children between the ages of 8 and 11 (Gillam & Pearson, 2004, p. 44). In addition, test–retest reliability for the Oral Narration score from a relatively small sample of children ages 5–10 years was reported as .80 (uncorrected).
Recalling Sentences (CELF-4-RS). Considered an assessment of expressive language and memory, this subtest involved repetition of spoken sentences such as “The rabbit was not put in the cage by the girl” and “Does anyone know who the new teacher is?” Raw scores could range from 0 to 98 depending on the number of completed items and the number of errors. Errors included omissions, repetitions, additions, transpositions, and substitutions. Subtest reliability reported in the manual ranged from .87 to .92 for test–retest and from .86 to .91 for internal consistency (alpha) across relevant age ranges (9–15 years).
Understanding Spoken Paragraphs (CELF-4-USP). Recognized as a measure of receptive language, this subtest involved asking children to answer questions about three short narratives read aloud (four to nine sentences), which varied with age. There were a total of 15 questions focused on main ideas, details, sequences, inferences, and predictions related to the spoken narratives. Raw scores could range from 0 to 15 regardless of child age. Subtest reliability reported in the manual ranged from .51 to .87 for test–retest and from .54 to .68 for internal consistency (alpha) across relevant age ranges (9–15 years).
Word Classes 2 Receptive and Expressive (CELF-4-WC). Each item from the Word Classes subtest involved a receptive and an expressive portion and required the child to listen to a set of four words read aloud. For the receptive portion of the task, the child had to state which two items from each set were related. For the expressive portion, the child had to state how the two items selected were related. Raw scores for both the receptive and expressive portions could range from 0 to 24, for a range of 0–48 when summed together. Reliability data for the summed scores (receptive + expressive) ranged from .83 to .90 for test–retest and from .87 to .91 for internal consistency (alpha) across relevant age ranges (9–15 years).
The number of total words (NTW), a frequency count of all root word tokens, was derived as a general measure of word productivity. To simultaneously minimize the potential influence of volubility and maximize the proportion of available data, NTW was derived from the first 30 complete and intelligible C-units in each sample. A prior study from the WRRMP (DeThorne, Deater-Deckard, Mahurin-Smith, Coletto, & Petrill, 2011) directly examined volubility as a potential confound for measures of NTW and NDW, and found that neither measure correlated with measures of child temperament, including surgency or extraversion, when derived from a standardized number of utterances.
To provide a measure of vocabulary diversity, the NDW used by each child was also derived from the first 30 complete and intelligible C-units within each sample (cf. DeThorne et al., 2011; Hutchins, Brannick, Bryant, & Silliman, 2005). We considered the alternative of deriving NDW from 100 tokens, consistent with Scott and Windsor (2000); however, this measure did not lead to many additional cases and actually tended to rely on shorter samples (i.e., samples of 30 C-units had an average of around 250 tokens as compared to 100 tokens).
The average length of child utterances in morphemes was derived from all complete and intelligible C-units as a measure of the child's productive abilities, including morphosyntactic development. Consistent with conventions (Miller et al., 2005), bound inflectional morphemes were counted as separate morphemes, but derivational morphemes were not.
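As an illustration of the three language-sample measures described above, the following is a minimal sketch rather than the procedure implemented in the Systematic Analysis of Language Transcripts software. It assumes that each C-unit has already been reduced to its complete and intelligible form with mazes removed, and it represents each C-unit as a list of (root word, morpheme count) pairs. That representation, the function name `sample_measures`, and the demonstration utterances are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one C-unit = list of (root_word, morpheme_count).
# Inflected forms carry a count above 1 (e.g., "dogs" -> ("dog", 2)).
CUnit = List[Tuple[str, int]]


def sample_measures(c_units: List[CUnit], n_units: int = 30) -> Dict[str, float]:
    """NTW and NDW from the first n_units C-units; MLU from all C-units."""
    window = c_units[:n_units]
    roots = [root.lower() for unit in window for root, _ in unit]
    ntw = len(roots)        # number of total words (root word tokens)
    ndw = len(set(roots))   # number of different words (distinct roots)
    total_morphemes = sum(count for unit in c_units for _, count in unit)
    mlu = total_morphemes / len(c_units) if c_units else float("nan")
    return {"NTW": ntw, "NDW": ndw, "MLU": mlu}


# Two made-up C-units: "the dogs barked" and "the girl walked home".
demo = [
    [("the", 1), ("dog", 2), ("bark", 2)],
    [("the", 1), ("girl", 1), ("walk", 2), ("home", 1)],
]
measures = sample_measures(demo)
```

Fixing the window at a set number of C-units for NTW and NDW, while letting MLU use every available C-unit, mirrors the distinction drawn in the text between the volubility-sensitive counts and the utterance-length average.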
Although evidence regarding the reliability of productive language-sample measures has been mixed, especially from shorter samples (e.g., Gavin & Giles, 1996; Heilmann, Nockerts, & Miller, 2010; Tilstra & McMaster, 2007), such measures have been associated with evidence of validity through developmental change with age up through age 13 years (Miller et al., 2005; Miller, Freiberg, Holland, & Reeves, 1992; M. L. Rice, Redmond, & Hoffman, 2006); correlation with other measures of linguistic complexity (DeThorne et al., 2008; Nippold, 2009; Nippold et al., 2014; M. L. Rice et al., 2006; Ukrainetz & Blomquist, 2002); and differentiation of clinical groups (e.g., Condouris, Meyer, & Tager-Flusberg, 2003; Heilmann, Miller, & Nockerts, 2010; Scott & Windsor, 2000; Watkins, Kelly, Harbers, & Hollis, 1995).
The primary goal of this study was to examine the stability of etiological effects on language development during early adolescence. This was addressed through structural equation modeling of correlated latent factors derived from the language-sample measures (referred to here as the Productive Language factor) and from the standardized tests (Formal Language factor). At each home visit, the Productive Language factor was indexed by MLU, NTW, and NDW, whereas the Formal Language factor was indexed by TNL and the three CELF-4 subtests. These factors are consistent with the conceptual framework of prior studies (DeThorne et al., 2008, 2012; Heilmann, Nockerts, & Miller, 2010; Miller et al., 2006; Scott & Windsor, 2000) and the phenotypic correlation data from the present study. Latent factors represent the shared variance among measures, independent of measure-specific variance and error. The models are depicted in Figure 1 for Productive Language and Figure 2 for Formal Language performance.
It is important to note that the models shown in Figures 1 and 2 include genetic (A), shared environmental (C), and nonshared environmental (E) latent factors that decompose the variance within and covariance between the latent Productive Language and Formal Language factors. These parameters are arranged in a Cholesky decomposition formation, meaning that the first set of A, C, and E factors (denoted A1, C1, and E1) accounts for the variance in Productive Language and Formal Language performance at HV5 as well as the covariance, or stability, of Productive Language (Figure 1) or Formal Language (Figure 2) ability across HV5, HV6, and HV7. One advantage of using latent factors rather than individual measures in the Cholesky modeling is that nonshared environmental effects (e²) reflect individualized effects that are shared across all variables in the factor, thereby removing any measure-specific error (though not measurement error that is correlated across measures, reflecting, for example, common method variance; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). The second set of A, C, and E factors (A2, C2, and E2) represents genetic and environmental effects that contribute to stability between Productive Language or Formal Language at HV6 and HV7 over and above genetic and environmental effects that influence Productive Language or Formal Language ability at HV5. The third set of A, C, and E factors (A3, C3, and E3) accounts for residual variance in Productive Language ability and Formal Language performance at HV7 that is independent of both HV5 and HV6; that is, these factors capture “new” genetic and environmental influences that emerge later in development and are not associated with language ability at earlier assessments.
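The logic of the Cholesky arrangement can be illustrated numerically. The sketch below is not the Mx model fitted in this study; it assumes a single source of variance (e.g., A) and uses made-up standardized path coefficients in a lower-triangular matrix, with rows for HV5, HV6, and HV7 and columns for the first, second, and third factors. Multiplying the matrix by its transpose gives the variances and covariances attributable to that source, and squaring each path gives the share of variance at each visit traced to each factor.

```python
import numpy as np

# Hypothetical standardized paths for one variance source (e.g., A).
# Rows: HV5, HV6, HV7. Columns: factor 1, factor 2, factor 3.
paths = np.array([
    [0.9, 0.0, 0.0],   # HV5 is influenced by factor 1 only
    [0.8, 0.3, 0.0],   # HV6 is influenced by factors 1 and 2
    [0.7, 0.2, 0.4],   # HV7 is influenced by factors 1, 2, and 3
])

# Variance (diagonal) and cross-visit covariance (off-diagonal) implied
# by this source alone.
implied_cov = paths @ paths.T

# Squared paths: variance at each visit attributable to each factor.
squared = paths ** 2
variance_by_visit = squared.sum(axis=1)

# Any HV5-HV7 covariance must run through factor 1, because HV5 has no
# path from factors 2 or 3.
cov_hv5_hv7 = implied_cov[0, 2]
```

In a full ACE model, one such matrix would be estimated for each of A, C, and E, and the model-implied covariance would be the sum of the three products. A nonzero path in the second or third column corresponds to what the text calls "new" influences at later visits.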
In addition to examining genetic and environmental influences on latent Productive Language and Formal Language factors, our model included parameters to estimate measure-specific (residual) A, C, and E effects on each manifest variable (i.e., MLU, NTW, and NDW in the model for Productive Language, and TNL and the three CELF-4 subtests for Formal Language). The measure-specific E effects incorporate measure-specific error variance. The total effects of A, C, and E on each of the measures can be estimated as the sum of the measure-specific A, C, and E effects and the effects that are shared with the other variables loading onto the common latent factor, weighted for that variable's loading on the latent factor.
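The weighting described above can be sketched as follows. This example assumes the usual convention for standardized common-factor models, in which the latent factor's contribution to a measure is weighted by the squared factor loading; the text states only that the shared effects are weighted for the loading, so the squaring is an assumption. The function name and all numeric values are hypothetical.

```python
from typing import Dict


def total_effects(
    loading: float,
    factor_ace: Dict[str, float],
    specific_ace: Dict[str, float],
) -> Dict[str, float]:
    """Total A, C, and E variance shares for one manifest variable.

    Assumes standardized variables, so the common factor explains
    loading**2 of the measure's variance and the measure-specific
    effects explain the remainder.
    """
    return {
        key: loading ** 2 * factor_ace[key] + specific_ace[key]
        for key in ("A", "C", "E")
    }


# Hypothetical values: factor shares sum to 1; specific shares sum to
# 1 - loading**2, so the totals sum to 1.
factor_ace = {"A": 0.80, "C": 0.10, "E": 0.10}
specific_ace = {"A": 0.05, "C": 0.00, "E": 0.31}
total = total_effects(0.8, factor_ace, specific_ace)
```

With these illustrative numbers, most of the measure's nonshared environmental share comes from the measure-specific term, which is where measure-specific error variance is absorbed.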
We used these models to estimate the relative contributions of A, C, and E to the phenotypic variance in Productive Language and Formal Language, which allowed us to address our first research question, regarding the extent of genetic and environmental effects on children's language skills. Similar to DeThorne et al. (2012), we then decomposed the sources of the temporal stability of Productive Language and Formal Language across the three home visits, thus: (a) A, C, and E effects that are stable across all three home visits; (b) A, C, and E effects that are stable across the last two visits only; and (c) A, C, and E effects that are specific to the last visit. This analysis allows us to address our second research question, regarding the extent of genetic and environmental stability across the three time points. A hypothesis of developmental stability predicts that genetic influences on language originate in the first latent genetic factor (A1), with no significant subsequent genetic influences (i.e., path coefficients from A2 and A3 are not significantly different from 0). In contrast, a hypothesis of developmental change predicts that new genetic variation will emerge at later home visits to affect language (i.e., path coefficients from A2 and/or A3 are significantly different from 0). Parallel hypotheses for shared environmental effects and nonshared environmental effects can also be tested.
All models were estimated in the Mx structural equation modeling package (Neale, Boker, Xie, & Maes, 2006) using full-information maximum likelihood to handle missing data. By fitting the model to twin data, we can estimate genetic and environmental effect sizes and 95% confidence intervals for the parameter estimates. Prior to analysis, language scores were standardized on the whole sample to a mean of 0 and an SD of 1. Outlying scores, defined as ±3 SDs from the mean, were removed. For the twin analyses, standardized residuals correcting for age and sex were used because the age of twins is perfectly correlated across pairs, which means that, unless corrected, variation within each age group at the time of testing would contribute to the correlation between twins and be misrepresented as shared environmental influence (McGue & Bouchard, 1984). The same applies to the sex of the twins, because MZ twins are always the same sex.
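The preprocessing steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code used in the study: the order of operations (standardize, remove outliers, then residualize) and the use of an ordinary least-squares regression on age and sex are assumptions, and the function names and simulated data are hypothetical.

```python
import numpy as np


def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize to mean 0 and SD 1, ignoring missing values."""
    return (x - np.nanmean(x)) / np.nanstd(x, ddof=1)


def preprocess(scores, age, sex, cutoff: float = 3.0) -> np.ndarray:
    """Standardize, drop scores beyond +/- cutoff SDs, residualize on age and sex."""
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    z = zscore(np.asarray(scores, dtype=float))
    z[np.abs(z) > cutoff] = np.nan                 # treat outliers as missing
    keep = ~np.isnan(z)
    # Design matrix: intercept, age, and sex for the retained cases.
    design = np.column_stack([np.ones(keep.sum()), age[keep], sex[keep]])
    beta, *_ = np.linalg.lstsq(design, z[keep], rcond=None)
    resid = np.full_like(z, np.nan)
    resid[keep] = z[keep] - design @ beta
    return zscore(resid)                           # standardized residuals


# Simulated data with one implausible score to show the outlier rule.
rng = np.random.default_rng(0)
n = 200
age = rng.normal(11.0, 1.0, n)
sex = rng.integers(0, 2, n).astype(float)
scores = 2.0 * age + 3.0 * sex + rng.normal(0.0, 1.0, n)
scores[0] = 1000.0
adjusted = preprocess(scores, age, sex)
```

Because the residuals are uncorrelated with age and sex by construction, twin resemblance that arises only from co-twins sharing an age and sex is not carried into the estimate of shared environmental influence.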
We considered two forms of evidence regarding the validity of our language-sample measures. First, we examined potential group differences in each language-sample measure as a function of whether or not caregivers reported a history of expressive-language difficulties at enrollment in the study. A consistent trend for higher productive-language scores for children without a history of reported expressive-language difficulties emerged from all measures across home visits, though differences reached statistical significance in only three instances. In terms of effect size, Cohen's d ranged from −0.14 in the case of MLU at HV6 to −0.71 for NDW at HV5.
Though the parent-report measure offers a positive indicator of social validity, the measure was taken approximately 3 years prior to the language-sample measures being studied here. As a consequence, a second form of validity evidence focused on children's concurrent performance using scores from the Recalling Sentences subtest of the CELF-4. Given prior evidence that sentence repetition serves as a relatively sensitive marker of language impairment (e.g., Conti-Ramsden, Botting, & Faragher, 2001; Meir, Walters, & Armon-Lotem, 2015; Stokes, Wong, Fletcher, & Leonard, 2006), we selected participants from each home visit who scored at least 1 SD above or below the mean on this subtest at each time point and compared the two groups on their productive-language measures at the same home visit. Significant group differences emerged across all nine comparisons (3 productive language-sample measures × 3 home visits), with Cohen's d varying from −0.92 to −1.08 at HV5, from −0.47 to −0.70 at HV6, and from −0.90 to −0.96 at HV7. Although neither form of validity evidence is conclusive, together they provide some support for the reliability and validity of our productive language-sample measures.
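For readers unfamiliar with the effect-size metric used in these group comparisons, the following is a minimal sketch of Cohen's d computed with a pooled standard deviation. The pooled-SD form and the direction of subtraction (first group minus second group) are assumptions, because the text does not specify either, and the data are made up.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d as (mean of a - mean of b) divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical scores: a lower-scoring group compared with a higher-scoring group.
low_group = [4.0, 5.0, 6.0]
high_group = [6.0, 7.0, 8.0]
d = cohens_d(low_group, high_group)
```

Under this convention, a negative d indicates that the first group scored lower than the second.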
Descriptive statistics for each dependent measure, presented as raw scores, are summarized in Table 1. Results from analyses of variance revealed no significant mean or variance differences by zygosity or by classification of twins as Twin 1 or Twin 2 (reflecting birth order). Cross-measure phenotypic correlations within each home visit are shown in Table 2. There were high correlations among the language-sample measures (.75–.90 among MLU, NTW, and NDW), moderate to high correlations among the standardized language measures (.31–.67), and moderate correlations across the language-sample measures and the standardized language measures (.21–.46).
We applied a phenotypic confirmatory factor analysis (that did not include the effects of A, C, and E factors) to the language-sample measures and standardized-test scores to further test our conceptual distinction between these two types of assessment. This analysis supported the distinction between a Productive Language factor for the language-sample measures (MLU, NTW, and NDW) and a Formal Language factor for the standardized scores (TNL, CELF-4-RS, CELF-4-USP, and CELF-4-WC). The loadings of the language-sample measures on the latent Productive Language factor at each home visit were substantial (> .80 at each home visit), whereas loadings of the standardized language measures on the Formal Language factor were moderate to large (.30–.44 for TNL, .88–.91 for CELF-4-RS, .41–.56 for CELF-4-USP, and .65–.73 for CELF-4-WC). As previously mentioned, the latent Productive Language and Formal Language factors represent the shared variance among the measures, independent of measure-specific variance and error.
In terms of longitudinal phenotypic analysis, the correlations between the Productive Language latent factors at HV5 and HV6 and between HV6 and HV7 were moderate (both rs = .31) and about double the magnitude of the phenotypic correlation between Productive Language at HV5 and HV7 (r = .15). In contrast, the phenotypic correlations for the Formal Language factor across the home visits (HV5 and HV6, HV6 and HV7, HV5 and HV7) were unity (r = 1.00), indicating that the common variance captured by the latent factors representing standardized language was perfectly stable.
An initial impression of the extent to which genetic and environmental factors contribute to individual differences in children's language skills can be gleaned from intraclass twin correlations, which are summarized in Table 3 for the two language factors at each home visit. For the Productive Language factor, correlations were small for both MZ and DZ twin pairs and reached statistical significance only at HV6. This pattern suggests that nonshared environmental influences accounted for the majority of the phenotypic variance in the language-sample measures. In contrast, the Formal Language factor demonstrated a consistent trend toward higher similarity between MZ pairs relative to DZ pairs, with large significant correlations for MZ pairs that were almost twice the magnitude of the DZ-twin correlations. This pattern is suggestive of high heritability.
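The reasoning from twin correlations to variance components can be made concrete with Falconer's formulas, a back-of-the-envelope approximation that is distinct from the structural equation models reported in this study. The sketch below uses made-up correlations, not the values in Table 3, and the function name is hypothetical.

```python
from typing import Dict


def falconer_ace(r_mz: float, r_dz: float) -> Dict[str, float]:
    """Rough ACE estimates from MZ and DZ intraclass correlations."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic share
    c2 = 2.0 * r_dz - r_mz     # shared environmental share
    e2 = 1.0 - r_mz            # nonshared environment plus measurement error
    return {"a2": a2, "c2": c2, "e2": e2}


# Hypothetical pattern: MZ correlation nearly twice the DZ correlation.
formal_like = falconer_ace(r_mz=0.80, r_dz=0.45)

# Hypothetical pattern: small correlations for both zygosity groups.
productive_like = falconer_ace(r_mz=0.10, r_dz=0.08)
```

An MZ correlation close to double the DZ correlation yields a large a² and a small c², whereas small correlations in both groups leave most of the variance in e². These formulas are not constrained to the 0–1 range, which is one reason model fitting with confidence intervals is preferred for formal estimation.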
Estimates of the proportion of variance in the latent Productive Language and Formal Language factors that is due to genetic and environmental effects (derived from the genetically informative Cholesky decomposition models shown in Figures 1 and 2, respectively) are largely aligned with the intraclass correlations. As shown in Table 4, variance in the Productive Language factor at each age was primarily due to nonshared environmental influences, which ranged from 90% of the variance at HV5 to 55% at HV7. Genetic (i.e., heritable) and shared environmental influences were small and nonsignificant. In contrast, variance in the Formal Language factor at each age was primarily due to genetic influences, which ranged from 82% at HV5 to 86% at HV7. Both shared and nonshared environmental influences were small. Estimates for the nonshared environment, but not the shared environment, were significantly different from 0, ranging in effect size from 5% to 6% of the variance for the Formal Language factor. Last, measure-specific E effects on the manifest variables, which include measurement-specific error variance, were generally significantly different from 0 in both the Productive Language and Formal Language models. In contrast, measure-specific A and C effects on each manifest variable were not significant for any of the measures. Estimates for these measure-specific effects, as well as the total genetic and shared and nonshared environmental influences on each of the measured variables, are shown in Supplemental Materials S2 and S3, respectively, for Productive Language and Formal Language.
The Cholesky decomposition model also provided information on the stability of genetic and environmental factors across the three home visits, thus allowing us to address our second research question. Here, we focus on the squared standardized parameter estimates, which provide information on the extent to which the variance in Productive Language and Formal Language at each home visit is due to additive genetic and shared and nonshared environmental influences that are shared (i.e., stable) across home visits. Table 5 shows the squared standardized estimates for Productive Language. For reference, the proportions of variance in the latent Productive Language factor due to additive genetic and shared and nonshared environmental influences (from Table 4) are shown in bold.
The primary variance component of interest is nonshared environmental factors, because the 95% confidence intervals from Table 4 indicate that neither additive genetic nor shared environmental factors made a significant contribution to variance in Productive Language at any home visit. Nonshared environmental effects common to all three factors (E1 in Figure 1) accounted for all of the nonshared environmental variance at HV5 (.90/.90 = 1, where the denominator, .90, is the total nonshared environmental variance in Productive Language at the home visit of interest, here HV5). They also accounted for around 5% of the nonshared environmental variance at HV6 (.04/.76 = .05) and 2% of the nonshared variance at HV7 (.01/.55 = .02). Nonshared environmental factors specific to HV6 and HV7 (E2 in Figure 1) accounted for the remaining 95% of the nonshared environmental variance at HV6 (.72/.76 = .95) and 2% of the nonshared variance at HV7 (.01/.55 = .02). Last, nonshared environmental effects specific to HV7 (E3 in Figure 1) accounted for the remaining 96% of the nonshared environmental variance at HV7 (.53/.55 = .96). In sum, there was evidence for nonshared environmental stability in Productive Language, but overall effect sizes were small, with most nonshared environmental effects being specific (i.e., new) to each home visit.
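The proportions in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the totals and carried-over components quoted in the text. The visit-specific components are derived by subtraction, which is an assumption of this sketch rather than a reading of Table 5, and the variable names are illustrative.

```python
# Total nonshared environmental variance in Productive Language at each
# home visit, as quoted in the text (from Table 4).
total_e = {"HV5": 0.90, "HV6": 0.76, "HV7": 0.55}

# Squared standardized paths carried over from earlier factors, as quoted.
e1 = {"HV5": 0.90, "HV6": 0.04, "HV7": 0.01}  # E1: common to all three visits
e2_hv7 = 0.01                                 # E2: HV6 factor acting on HV7

# Assumption: whatever is not carried over is new at that visit.
e2_hv6 = total_e["HV6"] - e1["HV6"]
e3_hv7 = total_e["HV7"] - e1["HV7"] - e2_hv7


def share(part, total):
    """Proportion of a visit's nonshared variance due to one factor."""
    return round(part / total, 2)


shares = {
    "HV5": {"E1": share(e1["HV5"], total_e["HV5"])},
    "HV6": {
        "E1": share(e1["HV6"], total_e["HV6"]),
        "E2": share(e2_hv6, total_e["HV6"]),
    },
    "HV7": {
        "E1": share(e1["HV7"], total_e["HV7"]),
        "E2": share(e2_hv7, total_e["HV7"]),
        "E3": share(e3_hv7, total_e["HV7"]),
    },
}

for visit, parts in shares.items():
    print(visit, parts)
```

Under these assumptions, the carried-over (stable) share never exceeds about 5% after HV5, which is the sense in which most nonshared environmental effects are new at each visit.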
Parallel evaluations can be made for Formal Language. Table 6 shows the squared standardized estimates of the Cholesky decomposition model of Formal Language, with information from Table 4 shown in bold. Because shared environmental effects were not significantly different from 0 (Table 4), the primary variance components of interest are genetic and nonshared environmental factors. Genetic effects common to all three factors (represented by A1 in Figure 2) accounted for all of the genetic variance at HV5 (.82). They also accounted for 100% of the genetic variance at HV6 (.83/.83 = 1) and at HV7 (.86/.86 = 1). Genetic factors specific to HV6 and HV7 (represented by A2) and genetic factors that influenced only HV7, independent of HV5 and HV6 (represented by A3), had zero effect on Formal Language at later home visits. The same pattern emerged for nonshared environmental factors, even though the proportion of variance in Formal Language due to nonshared environmental factors was much smaller than that due to genetic effects. Nonshared environmental effects common to all three factors (represented by E1) accounted for all of the nonshared environmental variance at HV5 (.05), HV6 (.05), and HV7 (.06). Overall, these estimates point to a pattern of strong developmental stability in genetic and environmental effects on Formal Language across early adolescence.
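To see why a single common factor implies complete overlap across visits, it may help to compute the correlations implied by a Cholesky path matrix. The sketch below is illustrative only. It takes the squared genetic estimates quoted above, assumes all paths are positive (the sign cannot be recovered from squared estimates), and uses a hypothetical helper, `implied_correlations`. The resulting genetic correlations are implied by the quoted estimates and are not values reported in the tables.

```python
import math


def implied_correlations(paths):
    """Correlations implied by a lower-triangular matrix of Cholesky paths.

    The implied covariance matrix is paths times its transpose; each
    covariance is then scaled by the square root of the two variances.
    """
    n = len(paths)
    cov = [
        [sum(paths[i][k] * paths[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]


# Genetic paths for Formal Language: rows are HV5, HV6, HV7; columns are
# A1, A2, A3. Only A1 has nonzero paths, per the estimates quoted in the text.
# Assumption: paths are the positive square roots of the squared estimates.
a_paths = [
    [math.sqrt(0.82), 0.0, 0.0],
    [math.sqrt(0.83), 0.0, 0.0],
    [math.sqrt(0.86), 0.0, 0.0],
]

genetic_r = implied_correlations(a_paths)
for row in genetic_r:
    print([round(r, 2) for r in row])
```

Because every visit loads only on A1, each implied genetic correlation is 1 up to floating-point error. The same helper could be applied to the nonshared environmental paths for Productive Language, where the visit-specific paths would pull the correlations well below 1.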
Fit statistics for nested submodels for both Productive Language and Formal Language are presented in Supplemental Materials S4 and S5, respectively, to facilitate comparisons between the full models and models in which we dropped nonsignificant parameters. These fit statistics indicate that the fit of the full models was generally comparable to or better than that of the nested submodels.
The present study of etiological influences on child language during early adolescence found strikingly different patterns of effects depending on whether language was measured through standardized assessments or less structured productive tasks. To be specific, the Formal Language factor demonstrated a strong and consistent pattern of genetic effects at each time point (a² = .82, .83, .86), with significant but small nonshared effects at each time point as well (e² = .05, .05, .06). In contrast, the Productive Language factor demonstrated a strong pattern of nonshared environmental effects at each visit (e² = .90, .76, .55), with no significant genetic or shared environmental effects. In terms of longitudinal stability within each factor, we found complete overlap in the genetic and nonshared environmental effects on the Formal Language factor across visits, as well as significant but limited stability in nonshared environmental effects on the Productive Language factor for adjacent time points. The remaining discussion centers on the following: (a) interpreting the evidence for differing etiologies of the two language factors, (b) integrating the information about etiological stability with prior research, (c) noting study limitations, and (d) highlighting implications for clinical practice and future research.
One potential interpretation of the etiological differences between Productive Language and Formal Language factors, particularly in light of the strong nonshared environmental effects on the Productive Language factor, is that the narrative language samples did not provide a reliable form of measurement. This is particularly important given that latent factors do not control for forms of measurement error that might be shared across measures (e.g., rapport with examiner, interest in the task). Despite this reasonable concern, support for the reliability and validity of our productive-language measures comes from at least three sources. First, our elicitation task, transcription procedures, and sample length are in accord with prior published procedures (e.g., Miller et al., 2006; Nippold et al., 2005; Scott & Windsor, 2000; Tilstra & McMaster, 2007). Second, we found relatively consistent validity evidence for our language-sample measures through group comparisons. To be specific, whether grouped on the basis of parents' report of expressive-language history or concurrent performance on the CELF-4-RS subtest, significant group differences in the productive-language measures consistently emerged in the anticipated direction. As a final matter, it seems important to note that the significant stability in nonshared effects across time points, albeit small in effect size, does lend additional support to the validity of our productive-language measures. It is interesting to note that even if the nonshared environmental influences on the Productive Language factor are attributed to measurement error or poor reliability, this result is still of critical importance for interpreting other studies that have relied on similar language-sampling procedures.
If one accepts support for at least partial validity of the productive-language measures, then the etiological differences across Productive Language and Formal Language are intriguing, suggesting either that an individual's language skills can differ greatly across contexts or that language assessments are influenced by traits other than linguistic prowess. In fact, the two possibilities are likely intertwined. Situated accounts of communication have highlighted that language use is always embedded within specified contexts (cf. Hengst, 2015), and prior literature has documented the impact of factors such as attention, motivation, compliance, frustration tolerance, persistence, anxiety, and cultural background on test-taking performance (Allan, 1992; Dreisbach & Keogh, 1982; Erickson, 1972; Fleege, Charlesworth, Burts, & Hart, 1992; Peña, Iglesias, & Lidz, 2001; Speltz, DeKlyen, Calderon, Greenberg, & Fisher, 1999). Productive measures such as MLU, NTW, and NDW have similarly been noted to differ depending on such factors as motivation, genre, modality, place, and partner (e.g., Bornstein, Haynes, Painter, & Genevro, 2000; Fields & Ashmore, 1980; Nippold, 2009, 2014; Scott & Windsor, 2000).
In terms of our Formal Language and Productive Language factors, a couple of potential differences come to mind, specifically language modality (expressive vs. receptive) and degree of on-demand constraints. The Formal Language factor included two subtests that explicitly measured receptive language abilities, whereas the Productive Language factor, as the name suggests, focused on expressive abilities. In addition, the measures in the Formal Language factor, particularly the CELF-4 subtests, are fairly constrained and adult directed, meaning children have less opportunity to showcase their linguistic skills aside from the specified targets. Many of us have been in situations where we want to credit a child for her test response because it feels clever or creative even if it is not the “expected” right answer. In contrast, narrative samples offer children more latitude for linguistic flexibility and creativity. For example, one child colorfully used the adjective “seafoam green” in her narrative description of a boy's shirt; such vocabulary is unlikely to be directly queried in a standardized assessment.
In sum, perhaps it is not particularly surprising that the etiological influences on language outcomes would differ in part on the basis of how language is being assessed. A number of studies have highlighted that language-sample measures correlate more strongly with each other than with standardized tests and that standardized tests across multiple domains tend to correlate with each other (cf. Condouris et al., 2003; DeThorne, Johnson, & Loeb, 2005; DeThorne & Watkins, 2006; Ukrainetz & Blomquist, 2002). Of course, from a psychometric standpoint, influences other than the construct of interest are often chalked up to “error”; however, it is interesting to consider just how much the concept of error applies when findings are stable both within forms of assessment (e.g., productive-language sample, standardized tests) and across measurement time points.
Given the distinct and consistent etiological patterns associated with the Productive Language versus Formal Language factors, it seems important to reconsider past behavioral genetic findings with direct consideration of form of measurement. To be specific, Haworth et al. (2010) reported linear increases in the heritability of general cognition from childhood to young adulthood using standardized IQ measures. Hayiou-Thomas et al. (2012) reported increased heritability in child language from early to middle childhood but not from middle childhood into adolescence. Of particular interest, the early childhood measure for Hayiou-Thomas et al. was collected through parent report, whereas the middle-childhood and adolescent time points included direct standardized assessment and teacher curricular assessments, consequently making it difficult to differentiate effects of age from those of form of assessment. Regardless, Hayiou-Thomas et al. reported similar etiological findings across both direct assessments and teacher ratings in middle childhood and adolescence. To the extent that these findings are akin to our Formal Language and Productive Language factors, respectively, the two studies' findings would appear to be discrepant for the productive/report measure (yet very similar for the formal/direct measures).
The present study found higher heritability for the Formal Language factor during early adolescence (ranging from .82 to .86 across time points) than the .45 reported for the same general participant pool at earlier ages (DeThorne et al., 2008, p. 430). In addition, findings suggest a trend toward decreasing heritability in productive language-sample measures when current estimates, obtained during early adolescence, are compared to first- and second-grade time points (DeThorne et al., 2012). Taken together, findings across studies support the proposition that the heritability of child language development increases across development as assessed through standardized measures, but perhaps not for language measures taken from other forms of assessment, such as discourse-based language samples.
The contributions of the present study should be interpreted in light of limitations associated with the design and the measures used. In terms of design, behavioral genetic studies rely on large samples, and the present study was underpowered to detect small effect sizes. As a consequence, our data should not be used to suggest that there are no shared environmental effects on individual differences in early adolescent language development, nor that genetic effects no longer contribute to differences in child language-sample measures at early adolescence. Second, behavioral genetic designs by nature are intended to explain etiological contributions to variance within a given population. The restricted composition of our sample in terms of variation in race, ethnicity, and parent education limits our examination of variance associated with such factors. Last, behavioral genetic analyses are not well positioned to examine the complexities of epigenetic effects, specifically environmental influences on genetic expression (cf. Kraft & DeThorne, 2014; Rogers, Nulty, Aparicio Betancourt, & DeThorne, 2015).
In terms of implications for research, studies of child language should be sure to incorporate multiple forms of assessment, both for diagnostic purposes and for assessment of outcome variables. This recommendation has been stressed for clinical purposes, but less so for research. In addition, our significant findings for nonshared environmental effects on both standardized and productive-language measures suggest that additional study of nonshared environmental factors is warranted. The significance of nonshared environmental effects tells us that individualized experiences shape language development, which to some extent is difficult to reconcile with the field's heavy focus on caregiver interaction style and amount of linguistic input (cf. Rogers et al., 2015). Given that caregivers are generally the same within twin pairs, such variables could not be contributing in isolation to the significant nonshared environmental effects observed here. Candidates for potential nonshared environmental effects include such factors as diet, brain injury, peer affiliations, classroom placement, sleep patterns, drug or toxin exposure, and stress. In considering such issues, it is not difficult to begin to see how complex and individualized the web of causal influences on human development truly is.
With regard to clinical applications, the significance of genetic and nonshared environmental effects on adolescent language development supports the need to approach child language development much more individualistically, recognizing that each child brings both unique capacities and experiences to the language-learning task. As a consequence, there is not likely to be a one-size-fits-all approach or strategy for every child. Although many clinicians would likely agree with this assertion in theory, clinical and educational practice often revolve around packaged programs and curricula that may not directly acknowledge individual differences. In a similar vein, reductionist experimental group paradigms often focus on mean differences across groups rather than the complexity of individual differences, thereby cultivating the impression that one approach is preferred for all. In closing, findings of significant genetic and nonshared environmental influences on individual differences in early adolescent language development call us to think explicitly about the complexity of individualized language-learning trajectories—a challenge well suited, one would hope, to a discipline such as communication sciences and disorders.
This project was supported by National Institute of Child Health and Human Development Grants HD38075, HD46167, and HD050307 (awarded to Stephen Petrill). In addition to our appreciation of all participating families, we want to extend thanks to other WRRMP coinvestigators: Kirby Deater-Deckard, Lee Anne Thompson, and Christopher Schatschneider. Last, thanks to all participating members of the Child Language and Literacy Laboratory, particularly Karissa Nulty and Clare Rogers for assistance with data management and analyses.
1 Given that adolescence is often defined relative to the onset of puberty, the specific age range varies on the basis of race and biological sex, among other factors (Herman-Giddens et al., 1997; Marshall & Tanner, 1969, 1970). For the purpose of this article, we are associating adolescence with an age range of 10–18 years, in accordance with the American Psychological Association (2002).
2 Given that an extra semiannual home visit was conducted between the third and fourth annual home visits, the fourth annual home visit is referred to as the fifth home visit (HV5) in the present article.
3 Calculating NTW on the basis of 100 utterances, consistent with our prior work (e.g., DeThorne et al., 2012), would have resulted in too few usable cases, given that many of the narrative samples were relatively short (mean of 59–60 C-units across the three home visits).
4 DeThorne et al. (2008) used the term Conversational language for the factor created from language-sample measures. The term Productive language is used here instead (consistent with Scott & Windsor, 2000), given that language samples were taken from narrative tasks rather than conversation.
5 An alternative model in which TNL was set to load on the latent Productive Language factor, rather than the Formal Language factor, resulted in a poorer model fit (details available in Supplemental Material S1), and was therefore not considered further.