Twin and family studies of autistic traits and of cases diagnosed with autism suggest high heritability; however, the heritability of autistic traits in toddlers has not been investigated. Therefore, this study’s goals were to (1) screen a statewide twin population using items similar to the six critical social and communication items widely used for autism screening in toddlers (Modified Checklist for Autism in Toddlers); (2) assess the endorsement rates of these items in a general population; and (3) determine their heritability.
Participants composed a statewide, unselected twin population. Screening items were administered to mothers of 1,211 pairs of twins between 2 and 3 years of age. Twin similarity was calculated via concordance rates and tetrachoric and intraclass correlations, and the contribution of genetic and environmental factors was estimated with single-threshold ordinal models.
The population-based twin sample generated endorsement rates on the analogs of the six critical items similar to those reported by the scale’s authors, which they used to determine an autism threshold. The present twin-similarity and model-fitting analyses also used this threshold. Casewise concordance rates for MZ (43%) and DZ (20%) twins suggested moderate heritability of these early autism indicators in the general population. Variance component estimates from model-fitting also suggested moderate heritability of categorical scores.
Autism screener scores are moderately heritable in 2- to 3-year-old twin children from a population-based twin panel. Inferences about sex differences are limited by the scarcity of females who scored above the threshold on the toddler-age screener.
Despite a strong focus on genetics by contemporary autism research and despite diagnostic criteria for autism that require onset before age three, heritability of indicators for autism in the toddler-age range has not been investigated. Autism heritability estimates are usually provided by twin studies, but extant twin studies have examined the heritability of autistic traits using only school-age children (≥ 5 years) or adults.1–6 Because genetic factors need to be studied at an age relevant to diagnosis, the present study uses a birth-register sample of 2- and 3-year-old twins to examine the heritability of items similar to the critical items used on a popular autism screener, the Modified Checklist for Autism in Toddlers (M-CHAT).7
A traditional conceptualization of autism is categorical: autism represents a discrete diagnosis. Alternatively, autism can be viewed as one or more dimensional traits, with autistic individuals having extreme standing on traits that extend into the population. Three small studies of twins who were diagnosed as autistic (as opposed to studies of twins in an unselected population who possess autistic traits) initially motivated much of the current interest in the genetics of autism.
The seminal UK study included 42 individuals (21 twin pairs), 5–23 years old, 60% of whom met the following diagnostic criteria: (1) atypical development of social relationships, (2) delayed language development, and (3) repetitive play and interests.8 Concordance was 36% for the 11 monozygotic (MZ) twin pairs and 0% for the 10 dizygotic (DZ) twin pairs. When 5 MZ and 1 DZ cotwins with only delayed language development (and mild mental retardation in 2 cases) were included as concordant, the concordance rates increased to 82% for MZ twins and 10% for DZ twins.8 The authors concluded that not only was autism highly heritable, but related characteristics, such as language delay, reflected shared genetic factors. A similarly small (n=22 pairs) Scandinavian twin study reported 91% concordance for MZ twins and 0% for DZ twins using DSM-III-R diagnostic criteria.9 This sample included 2 twin pairs with some evidence of fragile-X and one set of triplets.
A third twin study augmented the sample used in the seminal UK study with 28 newly ascertained twin pairs.8,10 Both the original and the newly recruited twins were re-diagnosed using ICD-10 criteria, which eliminated some twin pairs from both groups. The autism concordance rates for the combined sample were 60% for 25 MZ twin pairs and 0% for 20 DZ twin pairs. When speech or language atypicality in a cotwin was considered concordant, concordance rates increased to 92% for MZ twins and 10% for DZ twins.10
These early twin studies, although smaller and less systematic in sampling than desirable, convinced the field that autism was highly heritable. Our own case-finding approach has also contributed evidence for heritability. Preliminary data from 79 twin pairs (1;7–19;8 years) with at least one twin receiving a credible community diagnosis on the autism spectrum (i.e. autism, Asperger’s syndrome, or Pervasive Developmental Disorder-Not Otherwise Specified) indicated that 77% of MZ twins and 28% of DZ twins were concordant for autism.11 These preliminary results confirm that autism spectrum diagnoses are heritable, although perhaps not as strongly as the earlier European twin studies suggested.
After the early twin studies of diagnosed autism, the focus of behavior-genetic analyses shifted. Folstein and Rutter’s observation that genetic factors might relate to broader variants of the autistic phenotype led to family studies.8 One estimate from these family studies concluded that 12–20% of autistic children’s siblings show a “broader autism phenotype.”12 One study of siblings of autistic children identified atypicality in both structural language and communicative use of language.13 However, other evidence suggested that parents and extended family members show increased rates of social atypicality and focused interests, but not atypical communication (e.g., language delay).14
Other recent studies have suggested that 20% of autistic children’s siblings show subtle atypicality in language and communication, and delays in gross and fine motor skills, as early as 14 months of age.15–16 By 7 years of age, as many as 40% of autistic children’s siblings are reported to show cognitive, learning, or language difficulties, although such high sibling rates of putatively autism-related atypicalities approach the constraints of most plausible genetic models.17 Other recent studies have reported language atypicalities in autistic children’s first-degree relatives, both with and without language difficulties, and familial aggregation of “insistence on sameness” and “intense preoccupations.”18–19
Thus, family studies suggest that a similar, but broader, autism phenotype appears in genetic relatives of autistic individuals. Evidence of a broader autism phenotype also supports reconceptualizing autism from a distinct, categorical diagnosis to a continuous distribution extending into the general population. If such a continuum exists, we need a dimensional framework that allows for quantitative, rather than qualitative, differences between the general population and autistic individuals.12,20
In an unselected Dutch sample of 18-year-old twins (n=155 pairs) assessed with the self-reported Autism-Spectrum Quotient, intraclass correlations for males were .59 for MZ twins (n=33) and .36 for DZ twins (n=31), leading to a simple Falconer estimate (twice the difference between MZ and DZ correlations) of h2 of .46.6,21 For females, the intraclass correlation was .51 for MZs (n=43 pairs) and .43 for DZs (n=35 pairs), yielding h2 of .16. The opposite sex DZ correlation was .35 (n=36 pairs). In an unselected UK sample of 5–17-year-old twins (n=458 pairs) assessed with the Social and Communication Disorders Checklist (SCDC), which has high sensitivity but low specificity for autism spectrum diagnoses, a heritability estimate of .74 (CI: 0.68–0.78) was reported.5,22 An earlier study using the same sample reported no evidence of differential genetic effect by sex.23
In an unselected U.S. sample of 7–15-year-old twins (n=788 pairs) assessed with the parent- or teacher-reported Social Responsiveness Scale (SRS), in which, despite its name, nearly half the items are non-social, the intraclass correlation for males was .73 for MZs and .37 for DZs, yielding a Falconer h2 of .72.1–2 For females, the intraclass correlation was .79 for MZs and .63 for DZs, yielding h2 of .32. More complex model-fitting, which included the opposite-sex twins, suggested that SRS-measured traits might be more heritable in males than females, with heritabilities of .51 (CI: 0.28–0.58) and .39 (CI: 0.27–0.49), respectively. However, a more parsimonious biometric model, with a similarly moderate h2 of .48, led the authors to conclude that the same additive genetic effects acted in both sexes, with these genetic effects likely either (1) amplified in males or dampened in females, or (2) complemented by shared environmental effects in females but not in males.
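The Falconer estimates quoted in the two paragraphs above are simple arithmetic on the published intraclass correlations, so they can be checked directly. The Python sketch below is only such a check; the function name `falconer_h2` and the dictionary of inputs are ours, with the correlations copied from the text.

```python
def falconer_h2(r_mz: float, r_dz: float) -> float:
    """Falconer's heritability estimate: twice the difference between MZ and DZ correlations."""
    return 2.0 * (r_mz - r_dz)


# (r_MZ, r_DZ) intraclass correlations as quoted in the text.
published = {
    "AQ, males": (0.59, 0.36),
    "AQ, females": (0.51, 0.43),
    "SRS, males": (0.73, 0.37),
    "SRS, females": (0.79, 0.63),
}

for label, (r_mz, r_dz) in published.items():
    print(f"{label}: h2 = {falconer_h2(r_mz, r_dz):.2f}")
```

The four printed values (.46, .16, .72, .32) match the estimates reported for these samples.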
Lastly, in an unselected UK sample of 8-year-old twins (n=3,419 pairs) assessed with the parent-reported Childhood Asperger Syndrome Test (CAST), intraclass correlations were in the high .70s and low .80s for MZ pairs and in the high .20s to mid .30s for DZ pairs.4,24 Concordance rates for different CAST total score cutoffs were also examined; for the top 5%, the concordance rate was 58% for MZ pairs and 20% and 17% for same-sex and opposite-sex DZ pairs, respectively. These data showed few instances of sex differences in twin similarity.4
In summary, twin studies have established substantial heritability of autism diagnoses; however, participants in these studies were ascertained using earlier diagnostic criteria. Family studies have shown autistic features in non-diagnosed relatives. Four general population twin studies have converged in finding autistic traits moderately to highly heritable in school-aged and young adult twins. However, the studies have diverged as to whether heritability differs for males and females (with some suggesting higher heritability in males), and none of the studies has provided information on heritability of autism indicators in children at the usual age of diagnosis.
Screening instruments tend to be intentionally over-inclusive to maximize sensitivity (i.e., to maximize identification of potential cases for follow-up); over-inclusiveness produces false positives. False negatives can also arise in early autism screening because autistic traits can appear later in some children.25–28
Our study employed items analogous to those of the Modified Checklist for Autism in Toddlers (M-CHAT), a widely used, 23-item, parent-report autism screener designed for 24-month-olds.7 The full M-CHAT includes items covering the three domains of autism diagnostic criteria. The M-CHAT incorporates the nine parent-report items from the original Checklist for Autism in Toddlers (CHAT) with 14 additional items developed from hypotheses in the literature, instruments designed for older children, and clinical experience.7,29
Estimates of the M-CHAT’s sensitivity (87%) and specificity (99%) were calculated from a cutoff of 2 of 6 critical items identified from a discriminant function analysis (Table 1).7 Another recent paper, in which children who screened negative were not followed up (which precluded calculating sensitivity and specificity), reported the M-CHAT’s positive predictive power (PPP; the ratio of true positive outcomes to all positive screens, that is, true plus false positive outcomes) for low-risk (PPP=0.28–0.61) and high-risk (PPP=0.74–0.79) samples of toddlers.30 Other recent reports suggest the initial sensitivity and specificity estimates may be high and that a cutoff of 3 rather than 2 items may be optimal.31–32
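Because sensitivity, specificity, and positive predictive power are easy to confuse, the following Python sketch computes all three from a 2×2 table of screen results against later diagnoses. The class name and the counts are hypothetical, chosen only for illustration; they are not data from any M-CHAT study.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Hypothetical 2x2 outcome counts: screen result crossed with later diagnosis."""
    true_pos: int   # screened positive, later diagnosed
    false_pos: int  # screened positive, not diagnosed
    false_neg: int  # screened negative, later diagnosed
    true_neg: int   # screened negative, not diagnosed

    def sensitivity(self) -> float:
        # Share of later-diagnosed children who screened positive.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Share of non-diagnosed children who screened negative.
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppp(self) -> float:
        # Positive predictive power uses only children who screened positive.
        return self.true_pos / (self.true_pos + self.false_pos)


counts = ScreeningCounts(true_pos=30, false_pos=20, false_neg=5, true_neg=945)
print(round(counts.sensitivity(), 3), round(counts.specificity(), 3), round(counts.ppp(), 3))
```

Only `ppp` avoids the screen-negative cells, which is why a study that does not follow up screen-negative children can report positive predictive power but not sensitivity or specificity.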
However, neither the M-CHAT items nor items from other early autism screeners have been subjected to genetic analyses; therefore, no information is available on heritability of indicators at the age when autistic traits are typically recognized and diagnosed. This lack of information, also acknowledged by Ronald and colleagues, motivated our examination in a birth register-based sample of twins between 2 and 3 years of age with items similar to the six discriminative items identified by the M-CHAT authors.33
The study was approved by the University of Wisconsin–Madison IRB, and the sample was part of the Wisconsin Twin Panel.34 Twins born during 1998–2003 were identified from statewide birth records, and informed consent was obtained from parents of 2422 twins (n=1211 pairs). Assessment primarily occurred after 24 and before 36 months of age (M=27 months, SD=2.8 months). Screener scores were not correlated with age (r=.02) and were not corrected for it. The sample comprised 828 MZ, 820 same-sex DZ, and 774 opposite-sex DZ individual twins. Approximately half the sample (n=1203 twins) was male. Ninety-two percent of the sample was Caucasian, 3 percent African American, and 5 percent other (e.g., Native American, Asian-American). The sample’s median income fell in the $50,001–60,000 range. Other demographic information on the sample may be found in Goldsmith et al., 2007.34
Twin zygosity was determined via phone interview with the Zygosity Questionnaire for Young Twins, which yields greater than 95% agreement with genotyping.35–37 Ambiguous parent responses were clarified by examining photographs. Twelve pairs were excluded from analyses due to uncertain zygosity.
Eight autism screener items were embedded in a longer questionnaire. The eight items (Table 1) were similar in content to the six critical items from the M-CHAT.7 The original M-CHAT items were not used because our study was designed prior to publication of the M-CHAT. As Table 1 shows, the first four M-CHAT items matched four of our eight items directly; each of the other two M-CHAT items mapped onto a pair of our items, and the two items in each pair were therefore averaged. Four of our items used a binary scale (“0”=No, “1”=Yes), as used in the M-CHAT, and four used a three-point scale (“0”=Not true/rarely, “1”=Somewhat true/sometimes, “2”=Very true/often), which was rescaled to match the 0–1 range (i.e., “0,1,2” recoded as “0,0.5,1”).
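The scoring rule described above can be sketched in Python as follows. The item labels, and the assignment of which four items use the three-point scale, are hypothetical placeholders rather than the study's actual items (those appear in Table 1); the sketch also assumes each response is already keyed so that higher values indicate endorsement of the autism indicator.

```python
# Hypothetical item labels; the real items are listed in Table 1.
DIRECT_ITEMS = ["item1", "item2", "item3", "item4"]       # each matches one M-CHAT item
PAIRED_ITEMS = [("item5", "item6"), ("item7", "item8")]   # each pair matches one M-CHAT item
THREE_POINT_ITEMS = {"item3", "item4", "item6", "item8"}  # assumed 0/1/2 items; others are 0/1


def rescale(item: str, response: int) -> float:
    """Map a response onto 0-1: binary items stay 0/1, three-point items 0/1/2 become 0/0.5/1."""
    allowed = (0, 1, 2) if item in THREE_POINT_ITEMS else (0, 1)
    if response not in allowed:
        raise ValueError(f"{item}: response {response} not in {allowed}")
    return response / (len(allowed) - 1)


def screener_score(responses: dict[str, int]) -> float:
    """Sum six indicator scores: four direct items plus two averaged item pairs (range 0-6)."""
    direct = sum(rescale(item, responses[item]) for item in DIRECT_ITEMS)
    paired = sum(
        (rescale(a, responses[a]) + rescale(b, responses[b])) / 2 for a, b in PAIRED_ITEMS
    )
    return direct + paired


example = {"item1": 1, "item2": 0, "item3": 1, "item4": 0,
           "item5": 1, "item6": 2, "item7": 0, "item8": 0}
print(screener_score(example))  # 2.5
```

Averaging within pairs lets the total take quarter-point values, which is why thresholds such as "the equivalent of 2.5 items" are meaningful.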
Our items were administered in two formats: as a questionnaire packet (for twins born from 1998–2000, excepting one family), or as part of a phone interview conducted with the twins’ primary caregiver, usually the mother (for twins born after 2000). Score means from the questionnaire administration (M=.17, SD=.49) were slightly but significantly lower than those from the interview administration (M=.25, SD=.63; p<.01, d=.14). However, because the mean difference between administration formats was small, analyses did not differentiate between questionnaire-based and interview-based administration.
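The effect sizes reported here and for the sex comparison in the Results can be reproduced from the means and SDs alone. This Python sketch assumes Cohen's d with an equally weighted pooled SD; the text does not state which pooling was used, but this choice recovers both reported values. The function name is ours.

```python
import math


def cohens_d(mean_1: float, sd_1: float, mean_2: float, sd_2: float) -> float:
    """Standardized mean difference (group 2 minus group 1) using an equally weighted pooled SD."""
    pooled_sd = math.sqrt((sd_1**2 + sd_2**2) / 2)
    return (mean_2 - mean_1) / pooled_sd


print(f"{cohens_d(0.17, 0.49, 0.25, 0.63):.2f}")  # questionnaire vs. interview: 0.14
print(f"{cohens_d(0.22, 0.56, 0.32, 0.77):.2f}")  # females vs. males: 0.15
```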
We summed the items to yield a single score ranging from 0 to 6, with 0 representing no endorsement of any and 6 representing endorsement of all of the autism indicators.
Multiple approaches were used to index cotwin similarity. Under a categorical approach, pairwise and casewise concordance rates were calculated using endorsement thresholds of the equivalent of 2 or more and 2.5 or more items.38 Pairwise concordance is the number of concordant twin pairs (both twins meeting or exceeding the threshold) over the number of pairs in which at least one twin met or exceeded the threshold. Casewise and probandwise concordance rates differ only in ascertainment methods: the casewise method identifies cases in already-identified pairs, whereas the probandwise method identifies cases as individuals and then follows up the cotwins.39
Our study ascertained cases as pairs rather than as individuals; therefore, the concordances were considered to be “casewise.” Casewise concordance is the number of above-threshold twins whose cotwin also exceeded threshold (which counts each member of a concordant pair), over the number of pairs with at least one above-threshold twin plus the number of concordant pairs (that is, over the total number of above-threshold twins). Categorical twin similarity (based on both 2- and 2.5-item cutoffs) was also indexed by tetrachoric correlations, which estimate the correlation between twins from binary data by assuming that the underlying continuous variable is normally distributed. Dimensional twin similarity was indexed by intraclass correlation coefficients.
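The two concordance definitions above can be written as functions of the number of concordant pairs (both twins above threshold) and discordant pairs (exactly one twin above threshold). This is a minimal Python sketch; the function names are ours and the counts in the example are hypothetical.

```python
def pairwise_concordance(concordant: int, discordant: int) -> float:
    """Concordant pairs divided by all pairs with at least one above-threshold twin."""
    return concordant / (concordant + discordant)


def casewise_concordance(concordant: int, discordant: int) -> float:
    """Above-threshold twins whose cotwin is also above threshold, divided by all
    above-threshold twins: each concordant pair contributes two such twins and
    each discordant pair contributes one."""
    return 2 * concordant / (2 * concordant + discordant)


# Hypothetical counts: 3 concordant pairs and 8 discordant pairs.
print(round(pairwise_concordance(3, 8), 3))  # 0.273
print(round(casewise_concordance(3, 8), 3))  # 0.429
```

Because concordant pairs are counted twice in the casewise numerator and once extra in its denominator, the casewise rate is never lower than the pairwise rate.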
The twin method allows estimation of genetic and environmental components of observed behavioral variance because MZ twins share all of their genes, whereas DZ twins share, on average, one-half of their segregating genes. Applying Falconer’s formula to tetrachoric and intraclass correlations yields an estimate of heritability (h2), the ratio of genetic variance to phenotypic variance. For the single-threshold scores, statistical model fitting provided a more accurate representation of the genetic and environmental contributions. That is, we used Mx software to fit threshold models to contingency tables that depict, for instance, the probability that a twin is above threshold given that his or her MZ (or DZ) cotwin is also above threshold. These models parse variation in categorical scores into latent additive genetic (A), shared environmental (C), and non-shared environmental and error (E) components for all MZ and DZ twins (including opposite-sex DZ twins).40
The distribution of scale scores was highly positively skewed (skewness=4.17), as expected for the assessment of a general population with items reflecting atypicality (Figure 1). With a threshold of the equivalent of 2 items, 2.8% of the sample (n=69) scored above threshold, including 42 males and yielding a slightly skewed male:female ratio of 1.5:1. With a threshold of the equivalent of 2.5 items, 1.3% of the sample (n=33) scored above threshold, including 21 males and yielding only a slightly more skewed male:female ratio of 1.8:1. The sex ratio only became highly skewed (7:1) when the threshold was raised to the equivalent of 4.5 items, which identified only eight children as above threshold.
To examine sex differences, we compared the summed screener scores of males and females. Males scored higher and were more variable (M=.32, SD=.77) than females (M=.22, SD=.56; means and SDs both differed at p<.001; d=.15). Moreover, Cronbach’s alpha (internal consistency reliability) showed that the scale was more reliable for males (α=.63) than for females (α=.49). Thus, sex differences were observed for score mean, variance, and reliability. [Insert Figure 1 here]
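For reference, Cronbach's alpha compares the summed variances of the individual items with the variance of the total score. The Python sketch below implements the standard formula on toy response matrices; it cannot reproduce the reported values (.63 for males, .49 for females) because item-level data are not given here, and the function name is ours.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix:
    k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)


# Toy data: items that move together give alpha of 1; unrelated items give alpha of 0.
consistent = [[0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(cronbach_alpha(consistent), cronbach_alpha(unrelated))
```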
The pairwise and casewise concordance rates (Table 2, rows 2–11) suggested moderate heritability for both screener thresholds (11–60%). The significant casewise differences (p<.01) indicated that above-threshold MZ twins were more likely than above-threshold DZ twins to have an above-threshold cotwin. Tetrachoric correlations (Table 2, rows 12–13) estimated twin similarity for the binary categories. The MZ tetrachoric correlations (threshold of 2=0.78, threshold of 2.5=0.92) were significantly greater than the DZ tetrachoric correlations (threshold of 2=0.53, threshold of 2.5=0.58; p<.01). Thus, extreme scores were moderately genetically influenced.
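A tetrachoric correlation treats each twin's above/below-threshold status as a cut on a standard normal liability and finds the liability correlation that reproduces the observed proportion of pairs with both twins above threshold. The Python sketch below is a simple two-step version of that idea, assuming SciPy is available: thresholds come from each twin's marginal proportion, and the correlation is then solved with a root finder. It is not the Mx estimation used in the study, which fits thresholds and correlations jointly, and the function names are ours.

```python
import math

from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm


def prob_both_above(t1: float, t2: float, r: float) -> float:
    """P(X > t1, Y > t2) for standard bivariate normal liabilities with correlation r."""
    s = math.sqrt(1.0 - r * r)

    def integrand(x: float) -> float:
        # Density of X at x times P(Y > t2 | X = x), where Y | X = x ~ N(r*x, 1 - r^2).
        return norm.pdf(x) * norm.sf((t2 - r * x) / s)

    value, _ = quad(integrand, t1, math.inf)
    return value


def tetrachoric(n_both: int, n_only_1: int, n_only_2: int, n_neither: int) -> float:
    """Two-step tetrachoric correlation from a 2x2 table of twin pairs."""
    n = n_both + n_only_1 + n_only_2 + n_neither
    t1 = norm.isf((n_both + n_only_1) / n)  # liability threshold for twin 1
    t2 = norm.isf((n_both + n_only_2) / n)  # liability threshold for twin 2
    target = n_both / n                     # observed proportion with both above
    return brentq(lambda r: prob_both_above(t1, t2, r) - target, -0.999, 0.999)


print(round(tetrachoric(40, 10, 10, 40), 3))  # 0.809
```

With half of each twin group above threshold, the result agrees with Sheppard's closed form, P(both above) = 1/4 + arcsin(r)/(2π); the example table of 40 pairs with both twins above, 10 and 10 discordant pairs, and 40 pairs with both below gives r ≈ 0.81. The root search is bounded at ±0.999, so a table implying a more extreme correlation would raise an error.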
For a threshold of the equivalent of 2 items, Falconer’s formulas estimated that 50% of the variation in the categorical score classification could be attributed to additive genetic variance, 28% could be attributed to the environment shared by both twins and 22% to non-shared environmental variance and error. For the more stringent threshold of the equivalent of 2.5 items, 68% of the score variation could be attributed to additive genetic variance, 24% to the environment shared by both twins, and 8% to non-shared environmental variance and error.
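Falconer's formulas follow from the expectations r_MZ = a² + c² and r_DZ = ½a² + c², with e² = 1 − r_MZ. The Python sketch below (function name ours) reproduces the decomposition above from the tetrachoric correlations reported in the preceding paragraph.

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Falconer decomposition into additive genetic (a2), shared environmental (c2),
    and non-shared environmental plus error (e2) proportions of variance."""
    a2 = 2 * (r_mz - r_dz)
    c2 = r_mz - a2  # equivalently 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


for label, r_mz, r_dz in [("2-item threshold", 0.78, 0.53), ("2.5-item threshold", 0.92, 0.58)]:
    a2, c2, e2 = falconer_ace(r_mz, r_dz)
    print(f"{label}: A = {a2:.2f}, C = {c2:.2f}, E = {e2:.2f}")
```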
The intraclass correlations (Table 2, rows 11–12) assessed similarity on total scores (0–6) for each twin pair. These correlations showed higher MZ (.58) than DZ (.35) similarity. Falconer’s formulas estimated that 46% of the variation could be attributed to additive genetic variance, 12% could be attributed to shared environmental influences, and 42% to non-shared environmental variance and error. Thus, the h2 estimates for the dimensional scores, 46%, and for the categorical scores, 50–68%, were similar.
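The text does not state which form of intraclass correlation was computed. One common choice for twin pairs is the one-way random-effects ICC, which does not depend on which member of a pair is labeled twin 1 or twin 2. The Python sketch below implements that form (an assumption on our part, with a function name of our own) for a list of score pairs.

```python
import numpy as np


def pair_icc(pairs) -> float:
    """One-way random-effects intraclass correlation for twin pairs.

    pairs: sequence of (twin_a_score, twin_b_score). Only pair means and
    within-pair spread enter, so swapping twins within a pair has no effect."""
    x = np.asarray(pairs, dtype=float)
    n, k = x.shape  # n pairs, k = 2 members per pair
    pair_means = x.mean(axis=1)
    grand_mean = x.mean()
    ms_between = k * np.sum((pair_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((x - pair_means[:, None]) ** 2) / (n * (k - 1))
    return float((ms_between - ms_within) / (ms_between + (k - 1) * ms_within))


print(pair_icc([(1, 1), (2, 2), (3, 3)]))  # 1.0: cotwins identical
print(pair_icc([(1, 2), (2, 1)]))          # -1.0: all variance lies within pairs
```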
The intraclass correlations assessing dimensional similarity moderated by sex are displayed in the bottom row of Table 2. MZ male twins (.62) were more similar than MZ female twins (.53), whereas DZ males (.25) were less similar than DZ females (.34). Opposite-sex DZ twins (.44) were more similar than same-sex DZ twins; however, the confidence intervals for the correlations overlapped in each of these cases.
Ordinal (polychoric) ACE models were fitted to the single-threshold categorical scores of MZ and DZ twins to estimate heritability. For a threshold of the equivalent of 2 items, additive genetic variance (A) accounted for almost 44% of the variation in categorical scores (95% CI: 0.00–0.90). Shared environmental variance (C) accounted for 32% of the variance (CI: 0.00–0.76), while non-shared environmental variance and error (E) accounted for the remaining 24% (CI: 0.10–0.49). The χ2 value of 2.635 was non-significant (p=.451, df=3, AIC=−3.365), indicating that the values expected under the model did not differ significantly from the observed values; that is, the model fit our data well.
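A quick way to read the ACE estimates is to compute the twin correlations they imply: under the model, MZ twins share all of A and C, whereas DZ twins share half of A and all of C. The Python sketch below (function name ours) does this for the 2-item estimates; the implied values can be compared with the tetrachoric correlations of .78 (MZ) and .53 (DZ) reported earlier. Small gaps are expected because the estimates are rounded and the model also includes opposite-sex DZ pairs.

```python
def implied_twin_correlations(a2: float, c2: float) -> tuple[float, float]:
    """Liability correlations implied by an ACE model: MZ pairs share all of A and C,
    DZ pairs share half of A (on average) and all of C."""
    return a2 + c2, 0.5 * a2 + c2


r_mz, r_dz = implied_twin_correlations(a2=0.44, c2=0.32)  # 2-item threshold estimates
print(f"implied r_MZ = {r_mz:.2f}, implied r_DZ = {r_dz:.2f}")  # 0.76 and 0.54
```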
Using a threshold of the equivalent of 2.5 items, additive genetic variance (A) accounted for almost 74% of the variation in categorical scores (95% CI: 0.03–0.99). Shared environmental variance (C) accounted for 19% of the variance (CI: 0.00–0.75), while non-shared environmental variance and error (E) accounted for the remaining 7% (CI: 0.01–0.33). This model with the higher, 2.5-item, threshold fit slightly better, in a descriptive sense, than the model with the 2-item threshold (χ2=2.125, p=.547, df=3, AIC=−3.875). Similar analyses based on multiple-threshold categorical models resulted in poor fits, perhaps due to zero counts in some cells of the contingency table, and are not presented.
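The reported p-values and AIC values follow from the χ² statistics and degrees of freedom. The Python sketch below assumes the Mx convention AIC = χ² − 2df, which matches both reported values, and uses SciPy's chi-square survival function; the helper name is ours.

```python
from scipy.stats import chi2


def fit_summary(chi_square: float, df: int) -> tuple[float, float]:
    """Goodness-of-fit p-value and AIC under the Mx convention AIC = chi-square - 2*df."""
    p_value = chi2.sf(chi_square, df)
    aic = chi_square - 2 * df
    return float(p_value), aic


for label, chi_square in [("2-item threshold", 2.635), ("2.5-item threshold", 2.125)]:
    p_value, aic = fit_summary(chi_square, df=3)
    print(f"{label}: p = {p_value:.3f}, AIC = {aic:.3f}")
```

Because both models have the same degrees of freedom, the lower χ² of the 2.5-item model translates directly into its lower (better) AIC.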
Mothers of toddler-age twins from an unselected population rated items reflecting behaviors used to screen for autism. These items were summed to yield scores comparable to M-CHAT scores. The sex ratio for above-threshold screener scores was lower than the typical 4:1 male:female ratio reported for autism diagnoses, and such a ratio was reached only in the range of very extreme scores. Genetic factors played a moderate role and appeared to be stronger for males than for females, although this difference might have been due to male scores being more extreme, more variable, and more reliable.
These analyses extend the investigation of heritability to autism indicators during toddlerhood, when behavioral manifestations of autism become apparent and initial diagnoses are made. The genetic influence on categorical scores was moderate for both screener thresholds (44–68% of phenotypic variance): similar to some estimates from general-population studies of school-age children and adolescents using other instruments, and lower than estimates from other such studies.2,4–6
Differences in age or assessment instrument may contribute to differences in estimated heritability across studies. For example, the CAST, SRS, and SCDC are parent- and teacher-reported questionnaires for children and adolescents that assess behaviors such as how well the child can maintain a two-way conversation or whether the child has friends rather than only acquaintances. By contrast, most of the M-CHAT items assess pointing, looking, and imitation. The developmental links between the early behaviors assessed by the M-CHAT and later social interaction are insufficiently understood, and we have little sense of how the influence of relevant genetic and environmental factors might change over time.41–44
To understand fully the implications of each twin study of diagnosed autism or autistic traits, sample characteristics must be scrutinized. Our identification of 2.8% and 1.3% of our toddler sample as above threshold (≥ the equivalent of 2 and 2.5 of the 6 M-CHAT analogous items, respectively) overestimates the number of children expected to qualify eventually for an autism spectrum diagnosis, based on prevalence estimates for the preschool population.45 This over-inclusiveness is acceptable in first-stage screeners such as the M-CHAT, which are intended to identify children for continued evaluation. However, the threshold proposed by the M-CHAT authors might need to be modified, especially if lowering the false positive rate is a priority.7,32 The currently accepted prevalence estimate of 0.6% (6 per 1,000), if applied to this sample (which is admittedly a questionable application), suggests that 55 of the 69 children scoring above the 2-item threshold but only 19 of the 33 children scoring above the 2.5-item threshold would be false positives.45
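The false-positive arithmetic above can be made explicit. The Python sketch below assumes, as the text implicitly does, that every child who will eventually be diagnosed is among those above threshold, and it rounds the expected number of cases (2,422 × 0.006 ≈ 14.5) down to 14 whole children; the text does not say how it rounded, but this choice reproduces its figures of 55 and 19. The function name is ours.

```python
import math


def expected_false_positives(n_children: int, prevalence: float, n_above: int) -> int:
    """Above-threshold children minus expected eventual diagnoses.

    Assumes perfect sensitivity (every future case is already above threshold)
    and rounds the expected number of cases down to whole children."""
    expected_cases = math.floor(n_children * prevalence)
    return n_above - expected_cases


print(expected_false_positives(2422, 0.006, 69))  # 55
print(expected_false_positives(2422, 0.006, 33))  # 19
```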
Our observed 1.5:1 male:female ratio (for the threshold of 2 items) and 1.8:1 ratio (for the threshold of 2.5 items) differ from the 4:1 ratio commonly observed for diagnosed cases.46 We observed a more skewed (7:1) male:female ratio only when the threshold was raised to the equivalent of 4.5 items. Ronald and colleagues also reported this atypical pattern with the CAST and suggested that different thresholds for males and females might be considered if the goal is for above-threshold scores to reflect the sex ratio of diagnosed autism.33 Conversely, further characterizing young girls who exceed screener thresholds but do not eventually receive a diagnosis would be a novel and potentially fruitful line of inquiry.
We found sex differences in heritability, but the literature on this issue remains mixed. Our sample’s sex ratio increased at extreme screener scores (Figures 2 and 3), and male dimensional scores were more heritable than female scores. However, the male dimensional scores were also more reliable. Higher scale reliability could account for MZ male correlations being higher than MZ female correlations but not for DZ male correlations being lower than DZ female correlations (both of which we observed). The presence of three concordant opposite-sex DZ twin pairs explains the higher correlation observed for opposite-sex DZ twin pairs than same-sex DZ twins (only one concordant pair), although the small numbers involved preclude any strong conclusions about same-sex versus opposite-sex DZ concordance.
The small number of individuals with extreme screener scores makes it difficult to disentangle sex and heritability in our sample, although one explanation for the low male:female ratio we observed might be mothers’ differing expectations for males’ and females’ social and communication skills during toddlerhood. Optimally, male and female scores of equal extremity would be compared to determine whether an apparent difference in heritability reflects sex or the extremity of the scores.
We emphasize that we did not identify children with a current or necessarily future diagnosis of autism, but rather with the presence of early behaviors used in autism screening; indeed, other researchers also suggest that these early behaviors are not confined to individuals who will eventually qualify for an autism spectrum diagnosis.7,30 However, the thresholds used to determine which individuals screen positive are still under consideration. Our results suggest that a slightly higher threshold may identify a more heritable, although still appropriately over-inclusive, group of individuals who screen positive.
Because our data collection began before the M-CHAT was published, we did not use the M-CHAT or its supplemental follow-up interview; this is a major limitation of our study (although other studies’ protocols have also omitted the interview31). Although our sample can be defended as large and representative, a key limitation is that the number of above-threshold children is nevertheless modest.
Because our items were chosen to approximate the most discriminative M-CHAT items, and those critical items do not assess restricted/repetitive or stereotyped behaviors or interests (one of the three diagnostic criteria for autism, and one that is more apparent at later ages), our study did not tap this domain. Thus, an assessment of toddlers that included all three diagnostic criteria might yield different heritability estimates. However, other researchers have found that repetitive behaviors in one-year-old infants are more closely linked to developmental delay than to autism.47 Like most other investigators, we also relied solely on parental report, which is subject to varying interpretation of items and other biases. Finally, a long-term follow-up of the sample will be needed to establish the screener’s sensitivity and specificity for this population.
This research was supported by NIMH awards R01-MH069793 (Goldsmith & Gernsbacher, PIs) and R01-MH059785 (Goldsmith, PI) and a Menzies Scholarship (Stilp). Infrastructure support was provided by NICHD award P30-HD03352 (Marsha Seltzer, Center Director).
The authors acknowledge the families in the Wisconsin Twin Panel for their participation and Dr. Frühling Rijsdijk for her statistical consultation.
Disclosure: Ms. Stilp, Ms. Schweigert, Ms. Arneson, and Drs. Gernsbacher and Goldsmith report no biomedical financial interests or potential conflicts of interest.