Specific language impairment (SLI) is a neurodevelopmental disorder characterized by impairments essentially restricted to the domain of language and language learning skills. This contrasts with autism, which is a pervasive developmental disorder defined by multiple impairments in language, social reciprocity, narrow interests and/or repetitive behaviors. Genetic linkage studies and family data suggest that the two disorders may have genetic components in common. Two samples, from Canada and the US, selected for specific language impairment were genotyped at loci where such common genes are likely to reside. Significant evidence for linkage was previously observed at chromosome 13q21 in our Canadian sample (HLOD 3.56) and was confirmed in our US sample (HLOD 2.61). Using the posterior probability of linkage (PPL) to combine evidence for linkage across the two samples yielded a PPL over 92%. Two additional loci on chromosomes 2 and 7 showed weak evidence for linkage. However, a marker in the cystic fibrosis transmembrane conductance regulator (7q31) showed evidence for association to SLI, confirming results from another group (O’Brien et al. 2003). Our results indicate that using samples selected for components of the autism phenotype may be a useful adjunct to autism genetics.
Specific language impairment (SLI) is a neurodevelopmental disorder clinically defined as a failure to acquire and/or use language normally given adequate education and a suitable environment and in the absence of mental retardation, speech-motor or sensory deficits. SLI affects approximately 7% of children entering kindergarten  and is associated with generally poor academic outcomes if unresolved [2–5]. SLI has a genetic component as evidenced by familial aggregation and twin studies [6–9]. Recently, two groups have performed genomewide screens to search for SLI susceptibility genes and have attained LOD scores greater than 3 at non-overlapping loci on chromosomes 13, 16 and 19 [10, 11].
In contrast to SLI, autism is a relatively rare disorder that presents with abnormal development of language and social responses/initiation, and is also characterized by stereotypic behavioral repertoires . The severity of language related deficits in autism can make administration of standardized tests such as IQ difficult if other confounding behavioral traits are present. Autism has a genetic basis evidenced by twin studies and numerous groups have undertaken the search for susceptibility genes [13, 14].
The language-only phenotype that is associated with specific language impairment is not as severe as the language difficulties seen in autism. Children with SLI are very able to perform many tasks useful in diagnosis or research, allowing the accurate measurement of reading, language and IQ. SLI is also a common disorder, making ascertainment of extended families with multiple affected individuals a reasonable task. Several lines of evidence have suggested that there may be genetic overlap between autism and SLI.
Familial studies have repeatedly found that after ascertaining a proband for autism, the first and second degree relatives are commonly observed to have cognitive deficits that differ from autism in severity [15–20]. Language deficits are reported in many family studies of autism, though subjects have not been expressly evaluated for SLI. Twin studies have shown that while the MZ concordance rate for a strict diagnosis of autism is only 36%, the co-twin commonly has cognitive deficits, usually involving language delay. Using a more liberal autism spectrum diagnosis that includes language delay raises the MZ concordance rate to 82% .
Language delay is more common in persons with SLI than in the overall population, and the language difficulties that appear in relatives of probands with autism do not co-occur with mental retardation or general cognitive impairments [23–25]. Thus, the language phenotype of co-twins and relatives of probands with autism appears quite similar to the phenotype of SLI. Additionally, language profiles of children with autism indicate that at least a portion of these children have language profiles similar to those of children with SLI. As standardized tests are not practical for many children with autism (only about half of Kjelgaard and Tager-Flusberg’s sample could be measured using all tests), it is unclear what percentage of cases with autism at the population level would have language profiles similar to SLI.
Genetic studies of autism have suggested several locations where susceptibility genes may reside including chromosomes 2q, 7q, and 13q [27–39]. All three loci at 2q31, 7q31–32 and 13q21 have been suggested by more than one group. Convergence of linkage results on these chromosomal regions in independent samples suggests their involvement in the etiology of autism.
The region on chromosome 7q31–32 was implicated in an autism linkage study of 83 sib pairs by the International Molecular Genetic Study of Autism Consortium (IMGSAC), and the evidence for linkage remained high after an additional 69 sib pairs were added (MLS 3.55). Several other groups have suggested linkage to the same region [29, 33, 37, 40]. This region was initially suspected to contain a language-related autism gene by coincident linkage with the SPCH1 locus, which has since been cloned (FOXP2). However, studies have failed to detect defects in FOXP2 that are associated with autism [42–44].
Genetic association of SLI has been examined in the 7q31–32 region using a categorical definition of language impairment and markers located in and around FOXP2. Two markers showed nominal significance after correction for multiple tests, further indicating the presence of a language gene not necessarily related to autism. The significant markers are located ~5 Mb centromeric and ~3 Mb telomeric to FOXP2, respectively. O’Brien et al. failed to find association of SLI within FOXP2 itself, as was the case in two UK SLI samples [46, 47]. Additionally, a genome scan for dyslexia in a sample of Finnish families suggested linkage to the region with an NPL score of 2.77, with one family showing an NPL of 4.21. Direct sequencing of six subjects with dyslexia and three controls failed to identify coding mutations in FOXP2. Taken together, these data suggest that FOXP2 is not involved in either autism-related language impairments or more common language impairments (i.e. dyslexia and SLI).
Using quantitative trait linkage analysis with age of onset for first word, a suggestive QTL was localized to the 7q autism region in the Autism Genetic Resource Exchange family collection. The quantitative trait for age at first word generated much higher evidence for linkage to this region than the original analysis using a qualitative autism diagnostic scheme. The Collaborative Linkage Study of Autism (CLSA) performed an additional analysis on their genome scan data. When they subset their families by the presence or absence of phrase speech delay (PSD), defined by concordance of both affected sibs for onset of phrase speech >36 months, the LOD score for markers in the 7q31–32 region increased in the PSD group.
Chromosome 13q21 is a second region that has been implicated in both autism and SLI through linkage analysis. The CLSA [31, 37] initially found a multipoint HLOD of 2.3 in this region. However, when the families were subset on the PSD criteria, the PSD group had an HLOD of 2.54 while in the non-PSD group the HLOD was 0.0. This area was also suggested in another genome scan using a screening set of families, but the MLS went to approximately zero when a second set of families was added . As autism is a heterogeneous disorder, simple admixture may account for the reduced MLS as power to detect linkage in the presence of heterogeneity can decrease as sample size increases when pooling is used to combine the data .
The SLI3 locus on chromosome 13 (OMIM # 607134) directly overlaps with the potential autism locus [31, 37]. As part of an SLI genome scan in 5 nuclear and extended Canadian families, three phenotypes were tested for linkage under dominant and recessive modes of inheritance. The SLI3 locus was found using a reading discrepancy phenotype (single nonword reading 1 SD below nonverbal IQ) assuming a recessive mode of inheritance (LOD = 3.52, genomewide empirical p value <0.05). Intriguingly, the same marker yielded the maximum multipoint HLOD score in both the SLI and autism studies (D13S800).
Chromosome 2 is a third location with evidence for linkage to a language/autism phenotype. The IMGSAC sample yielded a multipoint MLS of 4.8 using a narrow diagnosis phenotype . This region has also been examined through stratification of two independent samples based on the PSD criteria described for the CLSA (above) [36, 39]. In both cases, the evidence for linkage increased in the PSD group. In Buxbaum et al. , the peak linkage finding overlapped IMGSAC , while in Shao et al.  the peak was ~ 15 cM away.
Taken together, these data suggest that SLI and autism may share a genetic component. Using a sample ascertained for SLI has allowed us to carefully select our probands and create detailed language and reading profiles that will increase our power to detect loci that influence language acquisition. While SLI is not autism, the use of our sample may prove to be a valuable adjunct to autism research by finding linkage to a genetic component that overlaps both disorders or by finding a language impairment susceptibility gene that modulates the autism phenotype (either additively or epistatically).
This study presents linkage results from two samples selected for SLI. The first was a fairly homogeneous sample of Celtic ancestry described in detail previously as part of an SLI genome scan . The second sample was ascertained in the US without strict requirements for ethnic homogeneity. The two samples differ in marker allele frequencies and may differ in disease allele frequencies or display locus heterogeneity. We therefore sought to find a statistically rigorous way to combine linkage information across these two samples.
A variety of methods can be used to combine linkage information when different types of heterogeneity are suspected. This paper follows the notation of Vieland et al., which illuminates several key points about combining linkage information across samples. Briefly, two data sets can be pooled for calculation of one HLOD that assumes one theta and one alpha for both samples (HLOD-P); this statistic loses power if the genetic models differ between samples. Alternatively, the HLOD scores can be calculated separately and then added across samples (HLOD-S). As the maximum HLOD cannot be less than zero, the HLOD-S cannot accumulate negative evidence and is therefore anti-conservative.
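The difference between the two statistics can be sketched in a few lines of Python. This is a minimal illustration, not the software used in this study: it assumes the standard admixture form of the HLOD (the sum over families of log10[α·10^LOD + 1 − α]), evaluates it on a coarse grid, and uses invented per-family LOD scores. The function names and the data are hypothetical.

```python
import math

ALPHAS = [i / 100 for i in range(101)]  # grid for the admixture parameter

def hlod(lods_by_theta):
    """Maximize the admixture LOD over theta and alpha for one data set.

    lods_by_theta maps each theta on a grid to the list of per-family
    LOD scores at that theta.
    """
    best = 0.0  # alpha = 0 always gives exactly 0, so the maximum is never negative
    for family_lods in lods_by_theta.values():
        for alpha in ALPHAS:
            total = sum(math.log10(alpha * 10 ** z + 1 - alpha) for z in family_lods)
            best = max(best, total)
    return best

def hlod_p(sample_a, sample_b):
    """Pool families first: one theta and one alpha shared by both samples."""
    pooled = {theta: sample_a[theta] + sample_b[theta] for theta in sample_a}
    return hlod(pooled)

def hlod_s(sample_a, sample_b):
    """Maximize each sample separately, then add the two maxima."""
    return hlod(sample_a) + hlod(sample_b)

# Hypothetical per-family LOD scores, for illustration only.
linked = {0.0: [1.2, -0.8], 0.1: [0.9, -0.2]}
unlinked = {0.0: [-1.5, -0.4], 0.1: [-0.6, -0.1]}
print(hlod_s(linked, unlinked), hlod_p(linked, unlinked))
```

Because the second sample's HLOD is floored at zero, its negative evidence never reaches the HLOD-S, whereas pooling lets those families pull the shared maximum down.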
It is also possible to collect the linkage information from each sample in the form of a posterior probability of linkage (PPL). The PPL is a Bayesian method for collecting linkage information in a flexible statistical framework that allows for removal of nuisance parameters by integrating them out of the equation [50–52]. If the admixture parameter between two samples is different, then combining linkage information handling admixture as a nuisance parameter would be more efficient than either simple pooling (HLOD-P) or summation (HLOD-S). Furthermore, the ‘apparent’ mode of transmission could differ between families with the same complex disease locus based upon heterogeneity at other loci. Thus, integration of parameters such as disease gene frequency and the penetrance vector will allow the data to accumulate valid linkage information without constraining the entire data set to one or two less than ideal alternatives such as the MMLS-C [53, 54]. In order to examine these propositions about combining linkage data in a real data set, we calculated three statistics (HLOD-P, HLOD-S, and PPL) and discuss the potential differences.
The overall data set consisted of two independent samples. The first sample was described in Bartlett et al. Briefly, 2 nuclear and 3 extended families of Celtic ancestry living in Canada were phenotyped with language/reading measures (n = 73 subjects). Thirteen additional subjects (86 total) had DNA available. A speech language pathologist screened families by telephone interview for a history of language impairment segregating in the family. Families with a strong family history of language impairment were scheduled for assessment. All subjects received a comprehensive neuropsychological battery administered by an experienced tester in their own homes.
The second sample consisted of 22 nuclear and extended families from the United States (N = 279). In each of these families, a proband was ascertained by either clinical referral or by announcement of the study criteria at professional conferences (see below for proband behavioral criteria). Families with at least one additional affected family member were included; the final sample consisted of 19 Caucasian and 3 Hispanic families. Assessment of the US sample and proband designation were the same as described in Bartlett et al.; the assessment included measures of language, reading, non-verbal intelligence as well as an oral-speech mechanism screening. Additionally, hearing was assessed by positive identification of 500 Hz (at 30 dB), 1000, 2000 and 4000 Hz (at 20 dB) pure tones.
US families were included in the study if at least two persons met criteria for an SLI proband. All subjects were enrolled and tested after giving informed consent that conformed to the guidelines for treatment of human subjects approved by Rutgers University.
The three diagnostic classifications of impairment from Bartlett et al. were employed. The classifications were not mutually exclusive; an individual subject could meet the criteria for more than one of the following classifications. A subject was classified as Language Impaired if their spoken language quotient on the Test of Language Development (TOLD) was ≤ 85. A subject was classified as Reading Impaired if their single nonword reading score (Word Attack) was one standard deviation below their performance IQ. Clinical Impairment criteria are described in detail in Bartlett et al. Briefly, a subject was defined as clinically impaired if they fell into one of the following three groups. First, the subject was language impaired or reading impaired. Second, the subject was not language impaired, but scored one standard deviation below the mean on three individual subtests of TOLD or scored ≤85 on the receptive language measure (Token Test). Third, the subject had a history of language difficulty for at least two years during childhood. It was not necessary to exclude any subject from analysis because of mental retardation, abnormal hearing, oral motor or structural defects. Table 1 summarizes the diagnostic overlap for the American and Canadian samples.
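The three overlapping classifications can be written out as simple decision rules. The sketch below is illustrative only and rests on assumptions not stated in the text: all scores are treated as standard scores with a standard deviation of 15, "one standard deviation below" is read as "at least one standard deviation below", and "three subtests" is read as "at least three". The field names are hypothetical.

```python
from dataclasses import dataclass

SD = 15  # assumption: standard scores with mean 100 and SD 15

@dataclass
class Subject:
    told_slq: float         # TOLD spoken language quotient
    word_attack: float      # single nonword reading (Word Attack) score
    performance_iq: float
    told_subtests_low: int  # TOLD subtests at least 1 SD below the mean
    token_test: float       # receptive language measure
    history_2yr: bool       # language difficulty for >= 2 years in childhood

def language_impaired(s: Subject) -> bool:
    return s.told_slq <= 85

def reading_impaired(s: Subject) -> bool:
    # Discrepancy criterion: nonword reading at least 1 SD below performance IQ.
    return s.word_attack <= s.performance_iq - SD

def clinically_impaired(s: Subject) -> bool:
    if language_impaired(s) or reading_impaired(s):
        return True
    if s.told_subtests_low >= 3 or s.token_test <= 85:
        return True
    return s.history_2yr

# Hypothetical subject: normal language quotient, but a reading discrepancy.
example = Subject(told_slq=90, word_attack=80, performance_iq=100,
                  told_subtests_low=0, token_test=95, history_2yr=False)
print(language_impaired(example), reading_impaired(example), clinically_impaired(example))
```

Because the rules are nested rather than exclusive, every subject who is language or reading impaired is also clinically impaired, which is the overlap tabulated in Table 1.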
All family members who were willing to submit DNA samples (n = 365) were genotyped. DNA was extracted from peripheral blood samples or buccal swabs as described previously . Microsatellite markers were genotyped on chromosomes 2, 7, and 13 as shown in tables 2–4 using previously described methods . The marker, CFTR-TET, is on chromosome 7 as described in Gasparini et al  (GDB:182312). PCR primers were ordered from Research Genetics as part of the Human Map Pairs set or redesigned from the GDB locus sequence using the Primer 3 program .
Parametric analysis was performed with FASTLINK version 4.1P programs [58, 59]. The Language Impairment, Reading Impairment, and Clinical Impairment phenotypes were each analyzed under both a dominant and a recessive mode of inheritance, for a total of six analyses. Model parameters were the same as in Bartlett et al. Heterogeneity testing was performed with HOMOG. Marker allele frequencies were estimated separately for the two samples, by allele counting using all genotyped unrelated individuals. Genetic distances between markers were taken from the Marshfield Map. SimWalk2 v2.83 was used for genotype mistyping analysis as well as for generation of haplotypes. Files were analyzed several times using slightly different parameters and random number seeds to ensure convergence on a stable solution. Genotypes with a mistyping probability >0.05 were compared to the raw data by two independent evaluators. Ambiguous genotypes were repeated or excluded. Haplotypes were used to determine crossover events in affected individuals within 13q21. Families with LOD scores greater than 0.6 at markers D13S1317 and D13S1309 were included in the determination of the critical region.
To combine linkage information across samples, we have calculated the HLOD for each sample separately , the HLOD-S and HLOD-P , and the PPL [11, 50, 51, 63, 64]. The PPL directly measures the probability that the recombination fraction between the marker and a putative disease gene is <0.5 and incorporates a prior probability of linkage of 2% [65, 66]. Nuisance parameters such as the penetrance vector, disease gene frequency and the admixture parameter, α, were integrated out assuming essentially uniform prior distributions for these parameters.
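For orientation, the quantity described above can be written schematically as follows. The symbols are illustrative shorthand and may not match the notation of the cited papers: π = 0.02 is the prior probability of linkage, f(θ) is the prior density on the recombination fraction given linkage, and the barred likelihood ratio has already been averaged over the nuisance parameters (penetrance vector, disease gene frequency and α).

```latex
\mathrm{PPL}
  = \frac{\pi \displaystyle\int_{0}^{1/2} \overline{LR}(\theta)\, f(\theta)\, d\theta}
         {\pi \displaystyle\int_{0}^{1/2} \overline{LR}(\theta)\, f(\theta)\, d\theta + (1 - \pi)},
\qquad \pi = 0.02 .
```

When the integrated likelihood ratio equals 1, the PPL stays at the prior of 2%; values below 2% therefore represent evidence against linkage.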
Family-based association tests were performed with the Pedigree Disequilibrium Test (PDT) version 4.0 [67, 68]. The PDT is a valid test of linkage and association in general pedigrees even in the presence of population stratification and calculates two alternatives of the test statistic. The AVE PDT weights each individual family’s contribution equally, regardless of pedigree size and complexity while the SUM PDT weights families proportional to these factors. For the three markers where genotypes were omitted due to ambiguity (n = 4), the TDT-AE was run instead of the PDT to ensure the correct type-I error rate . To correct Type-I error for multiple correlated phenotypes and the use of both PDT statistics, we simulated 1000 unlinked markers using SIMULATE. All individuals (excluding those without available DNA) and all pedigrees were the same as used in the actual analysis, but marker genotypes were generated without regard to affection status. Each unlinked simulated marker was analyzed with both PDT statistics three times, once with each phenotype. The resultant p values were compiled into a single distribution and compared to the results of the actual analysis.
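The comparison of an observed p value with the simulated null distribution can be sketched as below. This is a minimal illustration under assumptions: the text does not specify exactly how the pooled distribution was used, so the two helpers show one plausible reading (a pooled empirical p value, and a per-marker minimum over the six tests for a marker-wise correction). The function names and the small input lists are hypothetical.

```python
def empirical_p(observed_p, null_p_values):
    """Fraction of simulated null p values at least as small as the observed one."""
    hits = sum(1 for p in null_p_values if p <= observed_p)
    return hits / len(null_p_values)

def per_marker_minimum(null_by_marker):
    """Smallest p value for each simulated marker across all of its tests
    (here, three phenotypes times two PDT statistics)."""
    return [min(p_values) for p_values in null_by_marker]

# Tiny placeholder input, for illustration only (not simulation output).
null_by_marker = [[0.30, 0.20], [0.90, 0.50], [0.04, 0.61], [0.75, 0.33]]
minima = per_marker_minimum(null_by_marker)
print(empirical_p(0.25, minima))
```

Because the null p values come from the same pedigrees and phenotypes as the real analysis, the correlation among the six tests is carried into the reference distribution rather than being ignored, as it would be with a Bonferroni correction.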
In the 13q21 region, ten markers were genotyped in both samples. The four markers D13S788, D13S1317, D13S800 and D13S1306 were reported previously for the Canadian sample in Bartlett et al. Data for 13q21 are summarized in table 2. Under the recessive reading impairment model, both samples maximize at D13S1317 (US, 2.616; Canadian, 3.565; HLOD-P, 6.031). Combining both samples with the HLOD-S shows a global maximum of 6.181 at the same location. Visual inspection of the by-family LOD scores in the US sample indicates that most of the linkage signal comes from 4 pedigrees. Despite three of these four pedigrees showing evidence for linkage throughout the 13q21 region, the HLOD for the whole US sample exceeds 1 at only two locations (D13S1309 and D13S1317). Multipoint analysis of the US sample using markers D13S1309 and D13S1317 decreased the HLOD slightly to 2.380. Marker D13S1231 did not show evidence for linkage in the US sample (HLOD = 0), despite close proximity to D13S1317 (<1 cM). The polymorphism information content (PIC) for D13S1231 was only 0.605 in the US sample; one large pedigree in particular showed a reduction of 0.6 LOD units due to homozygosity in several founders.
The PPL shows the effect of combining the Canadian and US samples. At D13S1317 the US PPL was 0.168 while the Canadian PPL was 0.542. The combined PPL for the reading impairment phenotype was 0.923. Haplotyping indicated critical recombination events in affected pedigree members between D13S1303 and D13S1309 as well as between D13S800 and D13S792. The critical region for SLI3 is still rather large, 7 cM with a corresponding physical distance of 16 Mb. However, this region contains only 12 known genes.
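A rough arithmetic check shows how two modest PPLs can combine into a large one. The sketch converts each reported PPL back to a Bayes ratio using the 2% prior, multiplies the ratios, and converts back. This is a simplification and an assumption on our part: the actual PPL is updated through the posterior density of the recombination fraction, so multiplying ratios only approximates the reported combined value. The function names are hypothetical.

```python
PRIOR = 0.02  # prior probability of linkage used in the PPL

def bayes_ratio(ppl, prior=PRIOR):
    """Ratio of posterior odds to prior odds implied by a PPL."""
    return (ppl / (1 - ppl)) / (prior / (1 - prior))

def ppl_from_ratio(ratio, prior=PRIOR):
    """Convert a Bayes ratio back to a posterior probability of linkage."""
    odds = ratio * prior / (1 - prior)
    return odds / (1 + odds)

# Reported single-sample PPLs at D13S1317 (US and Canadian).
combined = ppl_from_ratio(bayes_ratio(0.168) * bayes_ratio(0.542))
print(round(combined, 3))
```

The approximation gives a value of about 0.92, close to the reported combined PPL of 0.923, because each sample multiplies the odds of linkage well above the prior.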
Chromosome 7 yielded two HLOD-S scores greater than 1 (D7S3052, 1.579 recessive clinical impairment; D7S2426, 1.776 recessive language impairment). The various heterogeneity linkage statistics for these two loci are displayed in table 3 by marker and phenotype. The strongest evidence for linkage was at D7S2426 where the combined US and Canadian PPL was 0.087 and the HLOD-S was 1.776 using the language impairment phenotype. The combined PPL for marker D7S3052 was 0.017 using the clinical impairment phenotype, with a corresponding HLOD-S of 1.579. The highest PPL for CFTR-TET was 0.017 using the clinical impairment phenotype, though it should be noted that CFTR-TET was only biallelic in our sample with low information content for linkage (heterozygosity = 0.43). As these two markers (D7S3052 and CFTR-TET) were previously implicated in language by genetic association, we performed the PDT on all chromosome 7 markers using the three phenotypes. The only marker with p < 0.05 was CFTR-TET using the reading impairment phenotype (SUM PDT, p = 0.0262; AVE PDT, p = 0.0223). Simulation of 1000 unlinked markers analyzed with all three phenotypes shows this result to be nominally significant after correcting for multiple phenotypes (p < 0.045), but not after correcting for multiple markers (p < 0.32). Though the PDT is a statistically valid test of linkage and association in extended pedigrees such as ours, the power of 27 pedigrees is small.
Only one chromosome 2 marker gave rise to an HLOD-S greater than 1 (D2S2314, 1.716 recessive clinical impairment) with a corresponding PPL of 0.051. The language impairment model also showed evidence for linkage under the same genetic model (PPL 0.051, HLOD-S = 1.601). All chromosome 2 results are reported in table 4.
In this study, we have demonstrated linkage of SLI to 13q21 in an independent dataset ascertained in the US, lending further support to our previous linkage data in Canadian SLI families. This replication occurred using the same phenotype and trait model as in the Canadian families. The 13q21 locus is evident in our SLI families from the US, increasing the possibility that this locus is the same locus described in the CLSA’s autism families, which were also collected in the US. The linkage results of both groups are further supported by cytogenetic evidence from a 13q21 deletion implicated in autism. The paucity of known genes in the region, if true, will facilitate the joint cloning efforts that are currently underway.
Visual inspection of the pedigrees at 13q21 indicates that only a small minority of US families are providing evidence for linkage (<30%). It appears unlikely that the gene(s) on 13q21 are necessary for SLI susceptibility, and they may not be a general risk factor in other populations. All of the families that provide evidence for linkage (Canadian and US) are Caucasian and were selected to have more than 1 person with a language impairment. Since a meta-analysis of family studies has demonstrated that nuclear families ascertained through an SLI proband have no other affected family members ~31% of the time, our samples are likely to be biased towards finding genes that segregate in higher density families, which may not be common in singleton family units ascertained through the same or different phenotypic criteria.
Our combined Canadian and US samples failed to provide conclusive linkage results on either chromosomes 2 or 7. A number of different reasons could explain these results. Since the PPL was greater than the prior probability of linkage for both chromosomes, though not tremendously so, if genes influencing language acquisition reside at these locations, they could be of relatively low penetrance with small effect size and therefore difficult to detect with linkage analysis. If these genes have epistatic interactions with necessary or sufficient autism susceptibility loci, their effects may be more pronounced in autism samples. Additionally, the loci on chromosomes 2 and 7 could be providing weak evidence due to locus heterogeneity on a large scale, or could simply be false positives. Furthermore, our phenotypes are derived from scores on standardized tests of language and reading, which differs from affection status based on autism or phrase speech delay. Further work and collaboration will be required to refine our understanding of the role of these loci in both disorders.
Our sample demonstrated nominal association of a marker in CFTR with SLI. Use of extended pedigrees for categorical association studies is still being developed and for the statistical tests available, power is not very high for small samples . Despite low power, we have suggested one of the same markers for association with SLI as O’Brien et al. . They advocated an as of yet undefined mutation in FOXP2 or the promoter as a likely candidate for SLI susceptibility. However, none of their markers in FOXP2 showed evidence of association and the two markers they did implicate are ~5 Mb (D7S3052) and ~3 Mb (CFTR-TET) on either side of FOXP2. As 3–5 Mb is rather large in terms of linkage disequilibrium, it may be more parsimonious to assume that CFTR or immediately surrounding genes are stronger positional candidates to influence language acquisition in autism and SLI. WNT2 is adjacent to CFTR, and has also shown evidence for association to autism with the majority of the association signal attributable to the PSD group .
The results of pooling our different samples yielded several interesting points. For D7S2426 under the Reading and Language phenotypes, both samples converged on 1.0 as an estimate of the admixture parameter. However, the HLOD-P showed a decrease from 1.0 to 0.25 for the reading phenotype and 0.55 for the language phenotype. This illustrates simulation and analytical results in nuclear families [71, 72] and highlights the difficulty in interpreting the admixture parameter in finite samples of extended pedigrees such as ours. Despite this difficulty, the HLOD is still a powerful tool for linkage detection allowing for heterogeneity in a given single sample [50, 73–75]. As expected, in all cases the HLOD-P was lower than the HLOD-S, yet this was seen even when the admixture parameter for the HLOD-P was estimated to be 1.0.
Over the markers considered, the HLOD-P did not agree with the PPL as much as the HLOD-S did. This was suggested by Vieland et al. for simulated nuclear families, and also appears to be the case for extended pedigrees. There was, however, one data point where the HLOD-S and PPL results might lead to different interpretations about the likelihood of a susceptibility locus (D7S3052, clinical impairment in table 3), which could have several possible explanations. The computational differences between the HLOD-S and the PPL do not require these two statistics to be in agreement. The HLOD-S has been maximized over many parameters (2 thetas, 2 alphas, 2 modes of inheritance, and 3 phenotypes per marker) and is hard to interpret, as the expected value of the HLOD-S under the null hypothesis given our maximizations is certainly greater than zero. The PPL, by contrast, is a much better tool for accumulating evidence against linkage, because of the anticonservative nature of summing statistics that can never be less than zero (the constituent HLODs). Though we have, in essence, maximized the PPL by use of three different phenotypes, the PPL has two distinct advantages over traditional methods: the first is the ability to accumulate negative evidence, as mentioned above, and the second is a reduction in the probability of observing a misleading PPL as the sample size increases. We do acknowledge that maximizing over phenotypic models will tend to inflate the probability of larger scores when there is no linkage, which is of concern; however, the PPL increased when an independent data set was added, which provides another indicator that linkage is quite likely.
O’Brien et al.  is the second SLI genetics study to show the utility of dichotomizing quantitative traits for linkage and association analysis. In that study, the only results significant enough to report were with categorical phenotypes. The use of quantitative analysis for either linkage or association failed to demonstrate/suggest linkage or association. Though use of a categorical trait requires an arbitrary (in the genetic sense) threshold for affection, it appears that the information lost through dichotomy can be made up by the inherently greater statistical power of categorical linkage in at least some circumstances. This phenomenon seems similar in form to the simulation and analytical results of Terwilliger  where it was demonstrated that using a more deterministic genetic model (i.e. higher penetrances and lower disease frequencies) is more robust for linkage detection in categorical analysis, than a less deterministic (though possibly more realistic) genetic model since the latter assumes a priori that many meioses are uninformative. While categorical analysis is not a ‘one size fits all’ statistical technique and must still be used with caution and careful thought, it appears that the boundary between when researchers should use categorical versus quantitative analysis is very unclear. While categorical analysis seems to be useful in SLI research, it will be important to develop pooling techniques for quantitative traits that have equivalent properties to the PPL.
We would like to thank the participating families, who contributed their time and patience to make this study possible; Anne Bassett for her assistance in ascertainment of the Canadian families; Teresa Realpe-Bonilla, Linda Hirsch and Jason Nawyn for managing the phenotype database; Jared Hayter and Ray Zimmerman for technical assistance in the laboratory; the testers associated with the Laboratory of Paula Tallal; Neda Gharani for comments on earlier versions. This research was supported by grants from the March of Dimes (Support to LMB), the National Alliance for Autism Research (Support to CWB and LMB). PT, LMB and JFF were supported by NIDCD RO1 DC01654. VJV is supported by NIH MH52841.