All data were screened and distributions inspected for outliers and incorrect values. Missing data were present, to some degree, in all modelled variables. The average amount of missing data among the 1766 subjects was 2.2% and ranged from zero (mother’s place of birth, child gender and age) to 7.6% (Dimensions of Temperament, rhythmicity-sleep). To address this problem we carried out data imputation via a multiple imputation procedure using SAS PROC MI (SAS Institute, 2004). Five complete data sets were generated; each subsequent analysis was performed on each of the data sets and the results were then combined. This approach is preferable to single imputation, which substitutes a single number for each missing value, because multiple imputation accounts for the variability in plausible replacement values (Rubin, 1987). Using a Markov Chain Monte Carlo procedure, all data were imputed at the item level before computing the scale values.
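The combining step can be illustrated with the standard pooling rules for multiple imputation (Rubin, 1987). The sketch below is not the SAS procedure used in the study; the function name `pool_rubin` and the five estimates and variances are hypothetical values chosen only to show the arithmetic.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t

# Hypothetical coefficient estimates from five imputed data sets (not study data).
estimates = [0.70, 0.74, 0.69, 0.72, 0.75]
variances = [0.04] * 5
pooled_estimate, total_variance = pool_rubin(estimates, variances)
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across imputed data sets, which is how the uncertainty about the missing values is carried into the final standard error.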
Following imputation, characteristics of the mother, the family and the child were summarised (). Mothers were predominately between the ages of 24 and 34 years at the time of the birth of the child. Australia mandates 10 years of compulsory education. Years (i.e., Grades) 11 and 12 are principally used for college entry preparation. The majority of mothers completed 10 years of education and the distribution of maternal education was bimodal: about a quarter of mothers had less than 12 years of education, 19.3% had completed 12 years of schooling, 13.1% had completed a trade certificate, 13.7% had undertaken some study towards a post-school qualification, and another 29.2% had completed a post-school technical qualification or university degree. Three quarters of the mothers were born in Australia, and forty percent of them were in paid employment, working an average of 22 hours per week. Mean maternal DASS scores for depression, anxiety and stress were comparable to those of the normative sample (Lovibond & Lovibond, 1995b). The mean PS Score reported by the RASCALS mothers was also commensurate with the means reported originally by Arnold et al. (1993) for their clinical and non-clinical groups and is comparable to population means reported by Zubrick et al. (2005).
Maternal, family and child variables for total sample (N = 1766)
Families were predominately two-parent original families with 13% of the remaining families being either step/blended or sole parent families. The average number of children per family was two. Assessment of family income revealed a small proportion of families (5.5%) earning $A16,000 or less per annum. Area SEIFA indicators for disadvantage, resources, and occupation/education were well within population averages for these measures. About 9% of families were classified as having abnormal family function using the FAD. This compares well to the population proportion of Western Australian families reporting abnormal family function (10%) (Silburn et al., 1996). A family history of late talking was reported in 13.5% of the families.
Children in the study were an average age of 2.1 (sd 0.13) years and were nearly all Caucasian (96.6%). With respect to neonatal characteristics, fewer low birthweight infants were in the study sample (3.7%) relative to the Western Australian population proportion (6.4%) but otherwise the neonatal characteristics of the study sample were unremarkable with mean birthweight, mean gestational age and time to spontaneous respiration being comparable to Western Australian population averages (Gee, 1996).
With respect to normative development, study sample mean CBCL T scores were at the approximate 50th percentile and about 10% of the study children had a CBCL Total T-score in the clinical range. These are the first Australian data to be gathered on children as young as two years; however, the proportion of children scoring in the clinical range is comparable to Western Australian population studies of 4-11 year old children using the appropriate-for-age CBCL parent reported measure (Zubrick et al., 1995). The ASQ developmental measures ranged from 1.6% (Personal Social Score) to 8.3% (Adaptive Score) in the abnormal range with 2.3% of the sample having ASQ Communication Scores in the abnormal range. A little over one-third of the children were receiving day care with the mean number of hours being 16 per week.
Determination of late language emergence (LLE) and prevalence
The scale of the study required an assessment of late language emergence with minimal effort loading on the part of the parent respondents. The instrument used is the ASQ Communication Scale, comprised of a short list of language milestones drawn from the normative literature by Bricker & Squires (1999). The Communication Scale is part of an instrument developed as a parent report measure to screen for developmental impairments. Recently, Luinge, Post, Wit, & Goorhuis-Brouwer (2006) followed the same milestone method to develop a brief language screening instrument intended for public health assessments.
The ASQ Communication Scale uses six items to assess aspects of the child’s developing skills in speech production and comprehension. Mothers were asked to report whether their child could (1) point to pictures on request, (2) use two- or three-word phrases, (3) carry out simple directions on request, (4) name simple objects, (5) point to body parts on request, and (6) use personal pronouns such as “me”, “I” and “you”. The response categories for each item were (1) “Not Yet”, (2) “Sometimes”, and (3) “Yes”.
In our sample the Communication Scale had a Cronbach’s alpha of .71, essentially replicating the estimate provided in the test manual. Because the manual does not report validity estimates for the ASQ Communication subscale (only for the full instrument), we carried out analyses of criterion and concurrent validity for our ASQ outcome measure based on item response theory (ASQ IRT, described further below). Criterion validity is hampered by the lack of an external “gold standard” measure of late language emergence (McCardle, Cooper, & Freund, 2005; Tager-Flusberg & Cooper, 1999). We were, however, in a position to assess some aspects of concurrent validity against another measure of speech and language collected via parent-report at the time of the survey. For half of the cohort (N = 902), the LDS had been sought from the parent at the time the questionnaire was completed. Of the children with LDS data, 888 also had ASQ data. This permitted estimating concurrent validity for our measure against the LDS. Both the ASQ IRT score and the LDS score are continuous variables and were moderately correlated (r = .675, p < .001). Additionally, for those children for whom we had a parent-completed LDS, we were able to calculate mean LDS scores for children differentiated by LLE status on the ASQ. Children defined with LLE on the ASQ measure had significantly lower mean LDS scores than those children classified in the normal range on the ASQ measure (M_LLE = 62.5, sd 52.5 vs M_Normal = 196.2, sd 70.7; df = 198.1, t = 24.7, p < .001).
We also assessed the correspondence between the LDS item, “Does your child combine 2 or more words into phrases…?” and the ASQ item, “Does your child say 2 or 3 words together…?”. Complete data were available on both of these items for 896 of the children. Frequency distributions were obtained on both the LDS and ASQ items. Ninety percent of children were reported on the LDS to be combining two or more words into phrases and 89% of children were reported on the ASQ to be saying two or three words together. Cross tabulation of these items indicated complete correspondence of these items for 860 of these cases (Chi square = 547.9, df=1, p <.001; Kappa=.78). As initial reports of the validity of the ASQ Communication Scale, these findings suggest an acceptable level of concurrent validity with another measure frequently used to assess early language emergence.
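The agreement statistic can be checked approximately from the figures reported above. The sketch below computes Cohen's kappa from the observed agreement (860 of 896 cases) and the two rounded marginal proportions (90% and 89%); because the marginals are rounded and the full cross tabulation is not reproduced here, the result is only an approximation of the reported value. The function name `cohen_kappa` is illustrative.

```python
def cohen_kappa(p_observed, p_yes_a, p_yes_b):
    """Cohen's kappa for two binary ratings, from agreement and marginal rates."""
    # Agreement expected by chance: both 'yes' plus both 'no'.
    p_expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    return (p_observed - p_expected) / (1 - p_expected)

# 860 of 896 cases agreed; 90% 'yes' on the LDS item, 89% 'yes' on the ASQ item.
kappa = cohen_kappa(860 / 896, 0.90, 0.89)
```

With these inputs the chance-expected agreement is about .81, so most of the raw 96% agreement is attributable to the high base rate of word combining, and kappa comes out close to the reported .78.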
The Graded Response Model
To assess the suitability of the ASQ Communication scale to identify children with late language emergence, we undertook an item response analysis using a type of polytomous item response theory (IRT) model known as the Graded Response Model (GRM) (Samejima, 1969). The GRM models each of the three response categories simultaneously, creating a scaled value representing a person’s overall ability on the test. In general, Likert-type scales with fewer than five response choices and a small number of items are difficult to summarize with a single “scale” score that has a quantifiable standard error of measurement. The GRM is well suited to the ASQ analyses because it generates an ordering of persons on the ability scale where the responses for the scale are essentially ordered categorical responses. The GRM assumes: 1) that the relationship between ability level and the probability of endorsing a particular item response category (or a higher category) is monotonic, 2) that the items are unidimensional and have only one common factor and 3) that ability is distributed normally with a mean of zero and a standard deviation of one, even if the items do not measure the entire range of the distribution. This third assumption is not a necessary assumption of the model but is merely an identification condition to set the scale of ability and may be modified if desired. The major advantages of the IRT approach over other methods of scaling include: 1) use of all items rather than a reliance on a single item; 2) differential adjustment for item difficulty; 3) provisions for appropriate handling of missing data by determining estimates of ability that are based on all of the items answered and do not impute the individual’s mean score for missing items; and 4) use of a continuous estimate of (in this case) communication/verbal ability which is on a scale that is not sample dependent.
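As a minimal sketch of how the GRM assigns probabilities to the three ordered response categories, the function below uses the usual logistic form of Samejima's model: the probability of responding in category k or higher is a logistic function of ability, and each category probability is the difference between adjacent cumulative probabilities. The discrimination and threshold values are hypothetical and are not the item parameters estimated in this study.

```python
import math

def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds must be in ascending order; with two thresholds the result has
    three entries, here read as 'Not Yet', 'Sometimes' and 'Yes'.
    """
    # Cumulative probabilities of responding in category k or higher.
    cumulative = [1.0]
    for b in thresholds:
        cumulative.append(1.0 / (1.0 + math.exp(-discrimination * (theta - b))))
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item parameters (assumptions for illustration only).
low_ability = grm_category_probs(-2.0, 2.0, [-2.0, -1.2])
high_ability = grm_category_probs(1.0, 2.0, [-2.0, -1.2])
```

Under these assumed parameters the probability of a 'Not Yet' response falls and the probability of a 'Yes' response rises as ability increases, which is the monotonic pattern the first model assumption describes.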
We commenced our assessment of the ASQ Communication scale by testing the tenability of the dimensionality assumption. To do this we used a principal components analysis. In addition to this traditional analysis, we also used the DETECT algorithm (Stout, 1987), which is confirmatory in nature. The results of both of these analyses indicated that the six items represent only one dimension.
Item characteristic curves (ICCs, also known as item response functions) for each of the six items were then evaluated. For economy of space, an example of one of the items is shown in Figure 2. The lines show the probability of endorsing a certain response at a given level of ability. These figures show that with increasing ability the probability of a ‘Not yet’ response decreases while the probability of a ‘Yes’ response increases. At the upper end of the ability scale there is very little difference in the probability of a ‘Yes’ response. Thus, for Item 2, measuring the use of two or three word phrases (Figure 2), a child with an ability of one standard deviation above the mean would have about the same probability of a ‘Yes’ response as a child with an ability of 3 standard deviations above the mean. The graph in the right panel, the item information curve, represents how well the item can distinguish or discriminate between different levels of ability. We can see that Item 2 is best at discriminating individuals with ability near −1.5 standard deviations. This is where the item is most informative and where measurement error is the lowest. Our assessment of each of the item characteristic curves showed that the ASQ Communication scale measured the low end of ability quite well.
Figure 2. The Item Characteristic Curve and Information Curve for Communication Item 2, ‘Does your child say two or three words together that are different ideas, such as “see dog”, “mommy come home” or “cat gone”?’
The item parameters estimated from the children’s responses on the six items were then used to create an estimate of each child’s ability. This estimate gives the child’s most “likely” ability level that explains the child’s responses. As shown in the test information curve in , we can see that the six item scale provides increasing discrimination and lower measurement error in the range from 1.0 to 1.5 standard deviations below the mean. The IRT/GRM models do not generate an exact cut-off point for creating a dichotomous variable for LLE, but the choice of the cut-off point is guided, in part, by the range of scores within which the scale is more precise in discriminating between different ability levels and also by the researcher’s judgment based on previous research and clinical factors.
Communication Composite Information Curve.
For reasons of clinical benchmarking and to avoid missing children with LLE we chose −1.0 SD as the cut-off to demark those children with and without LLE (cf. Feldman et al., 2005). Of the 1766 children a total of 238 (13.4%) were classified as having LLE (). The 13.4% estimate from the IRT composite can be compared to an alternative estimate. Following precedents in the literature, the ability to combine words at 24 months was used as a criterion for grouping children. Of the sample, 10.7% of the children were reported to not combine words; 8.4% “sometimes” and 80.9% “yes,” yielding an overall estimate of 19.1% of the sample who were not routinely combining words in utterances.
Maternal, family and child variables for Control and LLE Children (N = 1766a) – IRT 1.00
Late language emergence – bivariate relationships with maternal, family and child characteristics
Comparisons of maternal, family and child characteristics were made for children differentiated by LLE (). Alpha levels were not adjusted for family-wise or study-wise error in order to detect any possible differences between the groups. As it turned out, when differences were evident they almost all were at conventional levels of adjustment, i.e., < .01 or .001. With respect to maternal characteristics, no significant differences for children with and without LLE were observed with regard to maternal age at the birth of the child, levels of maternal education, mother’s place of birth, maternal uptake of paid employment and cigarette use. There were no significant differences between these groups in their mean maternal DASS scores nor in the proportions of mothers reporting varying levels of clinical depression, anxiety, and stress. The only statistically significant difference observed with regard to maternal characteristics was in the Parenting Score – the mothers of children with LLE reported higher mean PS scores (M = 2.9, sd 0.6 vs M = 2.8, sd 0.6, p < .01) with a correspondingly higher proportion falling within the clinical range (36.9% vs 27.6%, Chi2 = 8.67, df = 1, p < .01), denoting a higher level of dysfunctional parenting.
Within families, LLE was associated with a family history of late talking (22.2% vs 12.1%, Chi2 = 18.2, df = 1, p < .001) and with larger family size as measured by the number of children in the family. When compared with children who did not have LLE, children with LLE were less likely to be the only child (20.1% vs 31.4%, Chi2 = 16.6, df = 3, p < .001). Otherwise there were no significant differences in the family characteristics of children with and without LLE in terms of family type (i.e., two parent, sole parent), income, area level indicators of socioeconomic status, and family function.
With respect to characteristics of the child, there were several significant differences between children with and without LLE. Children with LLE were significantly more likely to be male (70.8% vs 47.6%, Chi2 = 44.3, df = 1, p < .001). While comparisons of their mean ages showed children with LLE to be significantly younger (M = 2.08 years, sd 0.104 vs M = 2.11, sd 0.135, p < .001) this equates to a mean difference of ten days in age between these groups. In practical terms 99.8% of the children were between the ages of 23 and 24 months of age.
There was no significant difference between LLE groups on neonatal measures of birth weight, low birth weight status, and time to spontaneous respiration. However, children with LLE were significantly more likely to be born weighing less than 85% of their optimal birth weight (14.7% vs 8.2%, Chi2 = 10.5, df = 1, p < .01) and less than 37 weeks gestation. Because gestational age, and prematurity specifically, is frequently cited as a confounding factor for LLE, separate investigation of this as a possible threat to the validity of the findings is reported below.
With regard to development, significantly higher proportions of children with LLE were in the abnormal range on the ASQ Gross Motor, Fine Motor, Adaptive, and Personal Social scores. Results on the ASQ Communication Score, which is calculated from the six variables used to define LLE status, revealed all children with LLE to fall in the abnormal range of the Communication Score. In terms of behavioral and emotional adjustment, significantly higher proportions of children with LLE were in the abnormal range on the parent-reported CBCL Total Score (15.6% vs 9.6%, Chi2 = 7.94, df = 1, p < .001) with corresponding and statistically significant elevations in CBCL Internalising problems (11.0% vs 6.7%, Chi2 = 5.64, df = 1, p < .001) and Externalizing problems (23.8% vs 15.1%, Chi2 = 11.4, df = 1, p < .01).
The only temperament difference between those children with and without LLE was in negative mood quality. Relative to children without LLE, a significantly greater proportion of children with LLE were reported by their mothers to have negative mood quality (31.3% vs 23.7%, Chi2 = 3.44, df = 1, p < .05).
Finally, there was no difference in the proportion of children with and without LLE who were enrolled in day care nor in the amount of day care they received as measured by mean number of hours.
Late language emergence – multivariate relationships with maternal, family and child characteristics
The numerous relationships of maternal, family and child characteristics with LLE () were further investigated using multivariate logistic regression. Logistic regression allows the prediction of a discrete, binary outcome (in this case LLE) from a set of predictor variables (Hosmer & Lemeshow, 1989). The predictor variables may be continuous, dichotomous, discrete or a mix of these types. Estimated effects of the predictor variables are multivariately adjusted for the effects of the other predictors. In this study, the association between the outcome variable (LLE) and the candidate predictor variables was expressed as odds ratios. The odds of an event are the ratio of the probability of its occurrence to the probability of its non-occurrence; an odds ratio is the ratio of the odds of the event in one group to the odds in a comparison group. In this study, the ‘event’ is LLE and because LLE is an adverse outcome, the predictor variables are ‘risk’ variables. Where predictors are categorical these odds ratios are calculated with reference to a specific base or “reference” category.
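To make the definition concrete, the sketch below computes an unadjusted odds ratio from two proportions, using the bivariate figures reported earlier for child gender (70.8% of children with LLE were male versus 47.6% of children without LLE). This is an illustration of the arithmetic only, not the adjusted estimate from the regression model; the function names are illustrative.

```python
def odds(probability):
    """Odds of an event: probability of occurrence over non-occurrence."""
    return probability / (1.0 - probability)

def odds_ratio(p_group, p_reference):
    """Ratio of the odds in one group to the odds in the reference group."""
    return odds(p_group) / odds(p_reference)

# Proportion male among children with LLE vs. children without LLE.
# For a 2x2 table this equals the odds ratio of LLE for males vs. females.
unadjusted_or = odds_ratio(0.708, 0.476)
```

The unadjusted value is about 2.7. An estimate from the multivariate model can differ from it because the model adjusts each predictor for all of the others.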
The candidate predictor variables were selected from . In fitting the logistic model, virtually all variables were used and, following Hosmer and Lemeshow (1989), most were coded to be categorical, rather than continuous. Two exceptions were made. First, the country of birth of the mother was not entered in the model. The distribution of this variable reflects differential bias in the exclusion of cases owing to English language requirements. Second, the child’s age in months at the time of the interview was entered as a continuous variable. All other variables were coded as categorical variables (see ).
To account for data imputation procedures (described above) we undertook logistic regression using SAS 9.1 (PROC LOGISTIC and PROC MIANALYZE) (SAS Institute Inc., 2004). Rather than filling in a single value for each missing value, multiple imputation replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute; these procedures then combine the results of the analyses of the imputed data sets to generate valid statistical inferences (Rubin, 1976; Rubin, 1987).
All variables were entered into the model in a single step with LLE as the response variable. For each of the predictor variables, parameter estimates (Betas), their standard errors, 95% confidence intervals, degrees of freedom, t values, and their probabilities along with the odds ratios and their 95% confidence intervals are shown in .
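The relationship between a logistic regression coefficient and its odds ratio can be sketched as follows: the odds ratio is the exponential of the coefficient, and the confidence limits are the exponentials of the limits on the coefficient scale. The coefficient and standard error below are hypothetical values, back-calculated to approximate the odds ratio reported for male gender, and the normal critical value of 1.96 is an assumption; the pooled analysis uses t-based intervals whose degrees of freedom depend on the imputation.

```python
import math

def odds_ratio_with_ci(beta, standard_error, critical_value=1.96):
    """Convert a logit coefficient to an odds ratio and confidence limits."""
    lower = math.exp(beta - critical_value * standard_error)
    upper = math.exp(beta + critical_value * standard_error)
    return math.exp(beta), lower, upper

# Hypothetical coefficient and standard error (illustrative, not from the table).
or_estimate, ci_lower, ci_upper = odds_ratio_with_ci(1.008, 0.171)
```

Because the interval is symmetric on the coefficient scale, it is asymmetric around the odds ratio itself, which is why reported intervals extend further above the estimate than below it.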
Multivariate logistic regression: Prediction of LLE status by maternal, family and child variables (bolded entries are significant)
There were no statistically significant associations between the various maternal characteristics and LLE. No significant associations between LLE and maternal education, age, smoking, psychological state, or parenting style were observed.
In the variables characterising the family, LLE was significantly associated with the number of children in the family. Relative to singleton children, those children with LLE were significantly more likely to have one or more siblings (OR 2.07, 95% ci 1.39 – 3.09). Relative to families without a history of late talking, children with LLE were significantly more likely to be born to families in which a parent has a history of late talking (OR 2.11, 95% ci 1.39 – 3.19). All other statistical associations between LLE and the set of family variables were non-significant. These included family type, income, local area disadvantage, low economic resources, low education and occupational status, family function and day care use.
Several characteristics of the child were associated with LLE status. Relative to females, males were significantly more likely to have LLE (OR 2.74, 95% ci 1.96 – 3.83). LLE children were more likely to be born at 32 weeks or less gestation (OR 1.84, 95% ci 1.04 – 3.25) and weigh 85% or less of their optimal birth weight (OR 1.89, 95% ci 1.18 – 3.01). All ASQ variables were significantly associated with LLE. Relative to children in each of the respective normal categories, children with LLE were more likely to fall in the abnormal range of the ASQ on measures of Gross Motor Score (OR 3.12, 95% ci 1.29-7.51), Fine Motor Score (OR 2.39, 95% ci 1.19-4.77), Adaptive Score (OR 2.64, 95% ci 1.66 – 4.21) and Personal Social Score (OR 5.52, 95% ci 2.05 – 14.86).
Potential threats to validity
These findings are based upon a well defined and described sample of children aged 2 years. Exclusions from this sample included non-English background and medical conditions or syndromes known at the time of the 2 year observation. To what extent might “covert” disability – i.e., conditions not known at the time of the 2 year assessment but associated with late language emergence – impart bias to these findings? Although the focus of these findings is on the phenomenology of late language emergence at 2 years, the study children were followed until the age of 8 years.
Subsequent examination revealed that 19 additional children developed syndromal conditions that potentially were related to late language emergence. These children were assessed on the ASQ Communication Scale at age 2 and 37% were in the normal range while 63% were classified as having LLE. Of the 19 children a total of 10 were subsequently found to have intellectual disabilities, 4 were diagnosed with Autism Spectrum Disorders, and the remaining 5 with developmental syndromal conditions. The multivariate analysis () was repeated without these children. Only one change occurred in the estimates: prematurity was no longer a significant predictor of LLE status.
Further inspection of the data revealed an additional 7 children had been born less than 31 weeks of gestation. Six of these children had ASQ Communication Scale scores. Fifty percent of these children were measured at age 2 to have LLE. All 7 of these children were subsequently removed from the multivariate analysis along with the 19 children found later to have syndromal conditions. Aside from the non-significance of gestational age, results revealed no substantive changes to those reported in .