|Home | About | Journals | Submit | Contact Us | Français|
While advances in network and pathway analysis have flourished in the era of genome-wide association analysis, understanding the genetic mechanism of individual loci on phenotypes is still readily accomplished using genetic modeling approaches. Here, we demonstrate two novel genotype-phenotype models implemented in a flexible genetic modeling platform. The examples come from analysis of families with specific language impairment (SLI), a failure to develop normal language without explanatory factors such as low IQ or inadequate environment. In previous genome-wide studies, we observed strong evidence for linkage to 13q21 with a reading phenotype in language-impaired families. First, we elucidate the genetic architecture of reading impairment and quantitative language variation in our samples using a bivariate analysis of reading impairment in affected individuals jointly with language quantitative phenotypes in unaffected individuals. This analysis largely recapitulates the baseline analysis using the categorical trait data (posterior probability of linkage (PPL) = 80%), indicating that our reading impairment phenotype captured poor readers who also have low language ability. Second, we performed epistasis analysis using a functional coding variant in the brain-derived neurotrophic factor (BDNF) gene previously associated with reduced performance on working memory tasks. Modeling epistasis doubled the evidence on 13q21 and raised the PPL to 99.9%, indicating that BDNF and 13q21 susceptibility alleles are jointly part of the genetic architecture of SLI. These analyses provide possible mechanistic insights for further cognitive neuroscience studies based on the models developed herein.
The hallmark of complex disease genetics is a lack of determinism between phenotype and the underlying susceptibility genotype(s). Unlike Mendelian diseases with underlying genes of major effect, where the relationship between phenotype and genotype may be complex but is largely deterministic and readily modeled by the concept of penetrances and disease allele frequencies, complex diseases lack such clear major effects. Unsurprisingly, statistical models used to map complex disease susceptibility loci have largely abandoned the concept of uniquely determined effects parameterized in terms of penetrance and disease allele frequencies for models that emphasize the overall strength of the genetic effect. Such models therefore ignore the specific allelic effects that underlie the relative risk defined by Risch , as one very common example, since any given relative risk value may have infinite combinations of multi-locus penetrances and susceptibility allele frequencies. However, the concept of modeling deterministic relationships is not without merit in complex disease under certain circumstances. Rather than focusing on mapability, where perhaps the strength of the effect is the main predictor of success, employing specific models of the genotype-phenotype relationship after evidence for a locus has been determined remains an underutilized tool in complex disease genetics.
Along these lines, we demonstrate a flexible genetic modeling platform (described in detail below) that allows interrogation of specific genotype-phenotype relationships in a collection of large extended pedigrees ascertained for specific language impairment (SLI). SLI is a neurodevelopmental disorder clinically defined as a significant delay in the acquisition and/or use of language despite adequate education. Classifying a subject as having SLI excludes a history of mental retardation, neurological or psychiatric impairments and speech-motor or gross sensory deficits that could have affected language acquisition. Affecting approximately 7% of children entering into kindergarten , SLI is associated with generally poor academic outcomes if unresolved [3, 4, 5]. Phenotypic presentation is heterogeneous. A large proportion of individuals with SLI will exhibit difficulty with some aspects of higher level phonological processing necessary for the development of both language and reading, while grammatical deficits are also commonly reported [for review see ]. Some subjects demonstrate concomitant difficulties in processing dynamic (rapidly changing) sensory information within a very brief time range [7, 8, 9, 10, 11, 12, 13]. Although some individuals with SLI appear to largely compensate as adults, many do not [4, 14, 15, 16, 17, 18, 19].
Familial aggregation studies, twin studies and prospective studies collectively indicate that SLI is heritable. Numerous case-control familial aggregation studies of SLI have previously been reported, and despite SLI incidence rates varying due to differences in the applied definition of SLI, the frequency of impairment is significantly increased in first-degree relatives in families containing a proband (18–42%) versus control families (3–26%) [20, 21, 22, 23, 24]. When using the categorical diagnosis of SLI, twin studies demonstrate near 100% concordance for monozygotic twins and ~50–70% concordance for dizygotic twins [25, 26], which shows that SLI as defined by categorical affection status does indeed have a genetic component. DeThorne et al.  determined heritability of language difficulty to be 0.54, which was congruent with previously reported heritability rates [28, 29]. An additional Twin Early Development Study determined the heritability of normal language in a set of 4-year-old twins to be 0.39 .
Prior linkage studies have found LOD scores greater than 3 at loci on chromosomes 13, 16, and 19 [31, 32, 33, 34]. The SLI Consortium found the locus on chromosome 16 with a non-word repetition measure (non-words are pseudowords that are consistent with standard English pronunciation) and the locus on chromosome 19 using an expressive language score trait . Two years later, in 2004, an independent sample replicated the locus on chromosome 16, increasing the total LOD score from 3.6 to 7.5 . However, the picture on chromosome 19 was more difficult to interpret since the replication sample only showed linkage with non-word repetition as opposed to the discovery sample that used the expressive language score. Pooling all samples for analysis with either univariate trait reduced the LOD to background noise. However, later investigations using multivariate analysis  demonstrated that the best-fitting model for the chromosome-19 locus gave greater weight to the expressive language score trait over the non-word repetition trait and indicated that differences seen in the 2004 linkage analysis were due primarily to the differences in ascertainment between the two samples sets. Evidence suggests that variations in both c-maf-inducing protein (CMIP) and calcium-transporting ATPase, type2C, member2 (ATP2C2) genes have independent effects that are responsible for the observed linkage to chromosome 16 . In a separate line of inquiry using these samples, a marginally significant association was seen between the contactin-associated protein-like 2 gene, CNTNAP2, and SLI . This gene is of interest due to its regulation by FOXP2, a protein known to influence human speech development, though it is not commonly mutated in developmental dyspraxias, autism or language impairments [38, 39, 40, 41, 42, 43].
Our group implicated 13q21 as a region containing an SLI susceptibility allele with an initial HLOD of 3.92 (posterior probability of linkage (PPL) of 53%) using a reading impairment phenotype ; however, a common language impairment phenotype did not provide evidence for linkage. Addition of a second group of families selected for SLI increased the PPL to 92% using the same reading impairment phenotype, thus supporting a susceptibility allele at 13q21 , though again, no linkage was observed with a language impairment phenotype. Despite very strong evidence for linkage to the region in a sample ascertained for language deficits, it has remained unclear why a reading impairment phenotype would provide stronger evidence for linkage in an SLI sample, i.e., a sample selected for language deficits, than a language impairment phenotype. While these findings are consistent with prior behavioral studies that showed a link between impaired speech and language abilities with lower reading scores , it remains to be explained if the 13q21 locus is related primarily to language, primarily to reading, or some combination thereof.
As the relationship between reading, language and the susceptibility allele on 13q21 clearly requires further elucidation, we sought ways to model this complex genotype-(multiple) phenotype relationship directly by constructing models that have strong assumptions and by assessing if the data are consistent with those assumptions. The PPL statistical framework has been extended to allow for joint analysis of clinical diagnosis (or other categorical phenotype) and a putatively related quantitative trait . This analysis has been successfully applied to datasets where (1) the relationship between the diagnosis and the quantitative trait was unknown (alcoholism diagnosis and EEG ); (2) quantitative assessment could not be completed for some individuals (autism diagnosis and parent's broader autism phenotype score), or (3) treatment effects render the quantitative trait uninformative in affected subjects, but the trait remains informative in the untreated/unaffected persons (autoimmune thyroid disease and serum auto-antibodies ). This method models the categorical phenotype as being related to the quantitative trait through an unspecified threshold, thus dichotomizing the quantitative trait into a categorical trait. In the present case, we are seeking to understand if reading impairment in our sample represents the low scores from the distribution of quantitative language variation. If true, then it would clearly demonstrate that the reading impairment phenotype is consistent with identification of language-impaired subjects. If the language trait is not linked in a univariate analysis this would additionally imply that the reading impairment phenotype preferentially selects a subset of language-impaired individuals from the sample. The converse relationship need not be present. Language- impaired individuals may not be a subset of reading-impaired individuals. Therefore, as part of our ongoing efforts to map the locus on 13q21, we sought to elucidate the relationship between reading and language as it relates to the susceptibility allele on 13q21.
We were also interested in examining the relationship between a known functional allele at a separate locus and our findings on 13q21. The advantage of incorporating gene × gene interaction, also known as epistasis, into a family study is theoretically very strong when such gene × gene interactions exist. While there are difficulties in assessing epistasis in the analysis of some study designs, our collection of large extended pedigrees contains suitable information about epistasis to make the analysis reasonable [47, 48]. Based on evidence for a marginal effect with reading impairment in our sample  (see also suppl. materials, www.karger.com/doi/10.1159/000320367), we decided to test epistasis by examining a coding variant in brain-derived neurotrophic factor gene (BDNF), a member of the nerve growth factor family known to regulate synaptic growth, proliferation, differentiation and the survival of neurons in the developing nervous system [50, 51]. A naturally occurring polymorphism in BDNF, rs6265, results in an amino acid substitution from valine to a methionine (p.Val66Met) within the prodomain of the precursor peptide affecting trafficking of the protein within the cell . Found only in humans, this variant has a population frequency of 20–30% in Caucasian populations and accounts for significant behavioral variance during N-back memory tasks  and word recall [53, 54]. The met allele has been linked to decreased neuronal integrity and synaptic abundance as well as reduced levels of episodic memory . Based on this body of compelling evidence that (1) the BDNF val/met alleles are functional and (2) these alleles produce a measurable effect on human cognition, we genotyped the underlying SNP, rs6265, in our SLI families with the hypothesis that working memory would have a detectable influence on language development, though we did not postulate as to the nature or the strength of the effect.
Using the PPL framework as a platform for examining evidence for and against specific genetic models, we examined the genotype-phenotype relationship of language impairment in a collection of large extended families. We first demonstrate the relationship between reading and language measures at our locus of interest to provide context for further work on 13q21. We then extended our previous work by examining the evidence for 13q21 when incorporating epistasis with a functional BDNF SNP. These complementary modeling approaches will be discussed in the context of the complex SLI phenotype.
The data set is derived from Bartlett et al.  with minor changes to the original samples as noted below. The study sample was comprised of 1 nuclear and 3 extended Canadian families of Celtic ancestry (n = 158). Of these, 71 subjects had both phenotypic and genotypic information, 14 only DNA for genotypes and 1 only phenotypic information. The remaining subjects were required to fully reconstruct the pedigrees but do not have phenotypic or genotypic information. We have added 55 new subjects to the existing 4 families and removed the 1 nuclear family (n = 4 subjects) from analysis due to limited DNA availability. These numbers represent a change of 59 samples from the 2002 publication. These pedigrees were originally identified while recruiting for a linkage study of schizophrenia  and were noted to have a history of language or reading impairments.
For inclusion in the study, the families were screened by a speech language pathologist via telephone interview for a history of language impairment segregating in the family. Those families with a strong family history of language impairment were scheduled for assessment. The largest family identified this way is only related to participants in the schizophrenia study by marriage (i.e., unrelated) and for the remaining families, no phenotypes from subjects with schizophrenia are used in any analysis. Prior to enrollment and testing, informed consent that conformed to the guidelines for treatment of human subjects approved by Rutgers University was obtained. All subjects received a comprehensive battery administered by an experienced tester in their own homes [see  for more details]. Assessment tools for the present study included:
For our quantitative analysis of language we used the TOLD Speaking (expressive language measures), Grammar (syntax and morphology measures) and SLQ composite scores. For our quantitative analysis of reading we used Word ID, Word Attack and Pass Comp. A table with the correlations between the quantitative measures used in this study is presented in online supplementary table 1.
We used two diagnostic classifications of impairment from Bartlett et al. . Due to the classifications not being mutually exclusive, it was possible for an individual subject to meet the criteria for more than one of the following classifications. To be language impaired, a subject's SLQ on the TOLD [56, 57] was ≤85. Further, a subject was classified as reading impaired if their single non-word reading score (Word Attack ) was one standard deviation below their performance IQ. There were 10 subjects in the language-impaired-only group, 4 in the reading-impaired-only group and 15 in the language-impaired + reading-impaired group. In our sample, it was not necessary to exclude any subject from analysis because of mental retardation, abnormal hearing, oral motor or structural defects.
To further assess linkage to the 13q21 region, we genotyped 30 SNPs at roughly 0.3 cM interval across the 10 cM region. These SNPs were selected from HapMap  (Phase 3) requiring a minor allele frequency >0.4 and a linkage disequilibrium (LD) (R2) <0.2 as calculated from the CEPH (Utah residents with ancestry from northern and western Europe) HapMap samples, and at least 200 base pairs away from any (1) dbSNP (build 129) variant ; (2) repetitive element as identified by RepeatMasker (http://www.repeatmasker.org/); (3) repetitive elements identified by Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html), or (4) ‘structural variations’ listed in the UCSC genome browser . From the candidates that met these criteria, final SNPs were chosen by walking down the chromosome in 0.3-cM intervals as determined by the Rutgers combined genetic map (for the list of chosen SNPs, see online suppl. table 2) . Genotyping was performed by multiplex PCR of amplicons containing the chosen SNPs followed by the ligase detection reaction (LDR) method of Bruse et al.  on a Luminex 200 Multiplex Bio-Assay Analyzer. All samples were stored and handled as described previously . Genotypes were called based on the same metric as in Bruse et al. , but initial clustering analysis was conducted by in-house Python 2.5.1 scripts using the scipy 0.6 library clustering routines (http://www.scipy.org/). This script automatically produces allele intensity plots color coded by called genotype. Markers with less than 85% completion were not considered further (n = 3). All plots were visually inspected (by T.R.S. and C.W.B.); SNPs that did not cluster appropriately were either dropped (n = 3) or called manually (n = 5).
Previously, a panel of 28 functional SNPs was also genotyped using the same genotyping procedures listed above and tested for association with SLI in a pilot study to examine genetic variation related to (possibly non-language) cognitive traits and their potential role in development of SLI . These 28 SNPs were chosen based on three criteria. The SNP was required to: (1) be associated with a cognitive trait in the literature; (2) have evidence of biological function from cell lines, mouse models or postmortem human brain tissue, and (3) affect a gene expressed in brain. Based on these results (for a summary, see online suppl. table 3), only the BDNF SNP, rs6265, showed evidence of a marginal effect and was chosen for analysis as a possible gene ×gene interaction partner to the 13q21 locus in this study. We re-genotyped rs6265 in single-plex PCR and LDR reactions to confirm the quality of the initial multiplex PCR/LDR genotypes. Since genotyping error detection in an isolated SNP is inefficient, even in our large extended pedigrees, we also validated the assay by direct-sequencing 2 samples in the center mass of each of the 3 genotyping clusters and 2 samples on each edge. Results from direct sequencing confirmed all genotype calls from the LDR assay.
All genotype data were automatically processed as they were generated via in-house Python scripts for consistent error checking to minimize human errors. Files containing the called genotypes were output into a pre-MAKEPED linkage format file. A linkage format locus data file was generated to form the two required inputs to PEDCHECK v1.1  which detects all Mendelian inconsistencies. PEDCHECK output was tabulated and merged with raw allele intensity data for manual evaluation of the genotype calls in the flagged nuclear family containing the error. Ambiguous genotypes were repeated and irreconcilable genotypes were handled by setting all of the flagged nuclear family's genotypes for that SNP to missing. SIMWALK2 v2.91 [69, 70] was used to detect Mendelian-consistent errors through its implementation of the Sobel et al.  pedigree likelihood that allows for errors. This likelihood is a multipoint method for detecting unlikely genotypes (input files were created with MEGA2 v4.0.r1 ). Files were analyzed 3 times using slightly different parameters and random number seeds to ensure convergence on a stable solution. Any genotype flagged in 2–3 of the runs was set to missing. Genotypes flagged in a single run were noted; repeating of analyses with these other potential genotyping errors yielded no or trivial changes in the results reported here (data not shown). Lastly, PEDSTATS v0.6.3 was used to check markers for Hardy-Weinberg equilibrium using both the founders-only function as well as the unrelated samples functions . Markers flagged at p < 0.01 were dropped (n = 4). After cleaning the data, a total of 20 SNP markers within the 10 cM 13q21 region remained for further analysis. Information content was estimated using Merlin v1.1.2  on 3 of the 4 families. The family that was too large to be run through Merlin did not differ in genotyping completion rates or heterozygosity.
We chose a unified analytical strategy for the analysis of categorical and quantitative traits in a framework that incorporates both linkage and LD with the causal disease variant on (essentially) the same scale. Use of the same scale across multiple analyses allows us to readily interpret results from one analysis in the context of another. The PPL framework is therefore used in this paper. It additionally has the advantage that it is essentially model-free, since the nuisance trait parameters have been integrated out of the likelihood and its implementation, the software program Kelvin [75, 76] has already been adapted for use with the types of analyses presented here. For additional information on how the models are calculated we refer the reader to the following papers [46, 77, 78, 79, 80, 81, 82, 83], cited in context below. Kelvin [75, 76] implements the PPL class of models for measurement of the strength of genetic evidence. The PPL is parameterized in terms of a general approximating likelihood, and all parameters of the trait model are then integrated out numerically, permitting the use of Bayes’ theorem for computation of the posterior probability of the hypothesis of interest . All results were based on multipoint analysis; Hardy-Weinberg equilibrium has been assumed throughout. The Rutgers combined genetic map was used for genetic locations . Both linkage and LD analysis can be performed in this framework; the difference between the two involves adding a single parameter for the trait-locus – marker-locus phase probability (in LD analysis) .
Our previous SLI studies have shown linkage to a dichotomous trait, reading impairment, as described above. Therefore, our baseline linkage analysis for interpretation of reading and language measures uses the dichotomous trait PPL, which is the standard admixture linkage likelihood with the trait parameters integrated out (penetrances, disease gene frequency, admixture). It is of interest to understand the genetic relationship between reading impairment or language impairment and various quantitative measures of language and reading because of the strong co-occurrence between the two disorders. To characterize these relationships we apply PPL analysis with a quantitative trait threshold. This model, as described previously [45, 46, 82], allows for the mixture of quantitative and dichotomous phenotypes in the same pedigree within the same analysis model. In this case, we classified individuals who were ‘affected’ with reading impairment or ‘affected’ with language impairment as having a phenotype beyond an unspecified threshold (this threshold is treated as a nuisance parameter and is integrated out of the likelihood to retain its model-free approach) while all ‘unaffected’ individuals were assigned their quantitative values. This analysis can be used to explore the relationship between two, in the most informative case, distinctly derived phenotypes that are not both quantitative or both dichotomous.
To model epistasis, an additional layer can be added to the above models in that multiple sets of penetrances (dichotomous traits) or multiple sets of genotypic means and variances may be modeled to have dependence on a separate, unlinked, genomic location, i.e., epistasis . Our method is an extension of the liability class model from linkage analysis which includes additional penetrances for the second (epistatic) locus and is therefore different from conditioning on a second locus in a regression framework which does not have the same genetic interpretation . The models are calculated the same way except for the inclusion of genotype-dependent parameters, where the genotype comes not from the locus under study, but a different locus of interest at an unlinked location. While there is a multiplicity of other statistical models for epistasis available in the literature [see for general discussion], we have chosen this particular model for its transparent interpretation in terms of the fundamental biology (see Discussion). Specifically, the model allows separate sets of 13q21 penetrances to differ based on genotypes from a functional SNP within BDNF. One limitation of the method we employ is that liability classes may not be unknown if a phenotype is specified due to practical limitations in computing. Therefore, 3 subjects who did not have DNA available but did have phenotypic information were set to unknown for both the phenotype and the liability class. This creates a different baseline with which the epistasis analysis must be compared since the non-epistasis baseline must also have those same 3 subjects set to phenotype unknown.
Of final note, Bayesian methods that quantify the probability of the linkage/association evidence do not generate p values . However, it is still true that a large PPL is more intellectually palatable if the specific value in question is not commonly seen across the genome. As a loose guide, we have performed a simulation where we generated 10,000 datasets. These datasets were identical to the families analyzed here in both phenotypes and configuration of available phenotypic and genotypic information. SNPs in an identical configuration of allele frequencies and map distances were simulated, but no genetic effect was simulated. In this simulation, we would characterize p values to be related to PPLs as follows: p < 0.05 related to PPL >1.5%; p < 0.01 related to PPL >4%; p < 0.001 related to PPL >23%. These numbers are virtually the same as suggested in another large-scale null simulation with the PPL .
As described in the Methods section, the pedigrees used differ slightly from our previous reports [31, 32]. We therefore did not make direct comparisons with those reported results. We first calculated a new baseline linkage analysis across the 13q21 region. Using the previously described reading impairment phenotype [31, 32], the maximum PPL in the region was 80% at rs1413119 (solid black line, fig. fig.1).1). This was larger than the previously reported 53% in Bartlett et al. [31, 32].
In order to determine if the observed linkage to the categorical reading impairment phenotype was indirectly selecting for language ability we assessed the relationship between quantitative language variation and the reading impairment in our sample. We first characterized the relationship between language and reading measures on a purely phenotypic level and then characterized the relationship between these measures in the context of the 13q21 linkage analysis. Figure Figure22 shows a boxplot analysis of how the dichotomous reading impairment trait separated out low versus high language ability. Based on these data, we may predict a clear dissociation between reading and language impairment in the linkage analysis just as was originally observed [31, 32].
We then investigated whether the 13q21 locus was related to poor reading ability due to poor language ability, using the quantitative trait threshold PPL. This analysis model explicitly assumes that reading-impaired persons are below an unspecified threshold on the quantitative language ability scale, which necessarily implies the two traits are related by way of the reading impairment, implying a co-occurring language impairment. We applied this method to 3 combinations of quantitative language measures from the TOLD family of tests (Speaking, Grammar and SLQ) and reading impairment (table (table1;1; see also online suppl. table 4). All 3 were consistent with linkage to 13q21. Both Speaking and SLQ showed strong evidence for linkage but were attenuated compared to reading impairment alone. Grammar was attenuated more where the evidence for linkage dropped below standards for declaring genome-wide linkage, but was still very suggestive (PPL = 18% where PPL = 21% is needed to declare a genome-wide linkage). These data are consistent with language processes being involved in the 13q21 linkage, though none of the three language traits selected for analysis are in fact the ‘optimal’ language trait to couple with the reading impairment phenotype highlighting the limitations of mapping complex phenotypes to the available constructs.
The reverse analogous scenario produced the opposite result, as expected. Using our language impairment trait and the quantitative reading measures Word ID, Word Attack and Passage Comprehension we examined a model that assumed language impairment was a consequence of being on the low end of quantitative reading variation. PPLs from these models were all inconsistent with linkage to 13q21. As our sample contains many language-impaired individuals who are not poor readers (see table table11 and fig. fig.2),2), analysis that mixed the two trait domains assuming a one-to-one relationship did not show linkage to 13q21. We also performed univariate quantitative trait analysis using 3 language measures and 3 reading measures, of which none showed evidence for linkage (table (table2;2; see also online suppl. materials). Thus, out of all the reading and language analyses presented, only the reading impairment phenotype with or without additional language phenotypic information showed evidence for linkage. Taken together, our reading and language analysis results are consistent with the reading impairment phenotype being related to impaired language processes.
Epistasis analysis by conditioning on alleles known to be functionally related to cognitive tasks relevant for language development was considered the most reasonable step to improve our mapping signal with this relatively small sample using the reading impairment phenotype. We previously indicated that rs6265 within BDNF showed evidence for association with the reading impairment phenotype (LD-PPL = 6%, see online suppl. materials). BDNF has a well-documented role in working memory which, although not a language measure itself, is related to language processing . We examined the effect of incorporating the BDNF rs6265 risk allele into our linkage analysis on 13q21, assuming either a dominant or recessive epistatic effect. The dominant epistatic model did not change the PPL appreciably (<2%) but the recessive epistatic model increased the PPL to 1 (fig. (fig.3).3). We would interpret this result to mean a posterior probability that is close to 1, but not 1 (which would mean certainty). The difficulty of such a PPL is that the scale has reached its ceiling, where one can no longer distinguish any changes in the strength of the evidence. In order to overcome this issue we also examined the LOD maximized over all parameters (MOD) for the two conditions since the LOD scale is unbounded. The MOD score is available as part of the standard Kelvin output as it is the maximum LOD encountered during the numerical integration procedure (and the associated model parameters are therefore the MLEs for the trait model). The increase in evidence was 4.19 LOD units to a MOD of 8.44, indicating a very clear effect of BDNF SNP rs6265 at this locus.
We sought to further improve interpretation of the difference in MOD scores, given that the allowing for epistasis adds more parameters and therefore introduces a different scale than that for the baseline (non-epistasis) MOD. Therefore, we also examined the integrated likelihood, which retains the same scale regardless of the number of parameters in the model since it is not maximized. This metric showed an increase of 2.26 log10 units to a final value of 4.86 that is nearly double the strength in evidence for linkage to the region. This measure therefore also supports a strong increase in the evidence for linkage to 13q21 and indicates that the large increase in the MOD score was not due to inflation from additional parameters in the epistasis likelihood. In 1,000 null simulations we did not find any replicates that were more extreme than this, suggesting p < 0.001. Notably, this p value would meet criteria for genome-wide significance even though the analysis was focused on 13q21, thus there was no need for genome-wide correction. Taken together, the results suggest that evidence for a susceptibility allele in 13q21 is substantially greater when an interaction with BDNF is included in the analysis.
The effect of the BDNF functional SNP can be quantified by changes in the maximum likelihood estimated penetrances from the single-locus model to the two-locus model. Table Table22 displays these results. In all cases, the penetrance for non-carriers (phenocopies) was estimated to be 0. Heterozygotes for the 13q21 risk allele show a clear dichotomy between BDNF SNP genotypes, where those with 0 or 1 less efficient met alleles are not at risk, but those with two met alleles have as much risk as persons who are homozygous for the 13q21 risk allele. The single-locus model shows some risk for 13q21 risk allele heterozygotes but substantially less than the two-locus model indicates for persons who are homozygous for the BDNF met allele.
In 2002 our group initially hypothesized that the reading impairment phenotype ‘in a population selected for SLI is most likely measuring the resultant reading outcome of an underlying language deficit as opposed to a reading deficit in isolation’ . However, this conjecture remained without evidence until two developments took place. First, we required a deeper understanding of the relationship between reading and language deficits as well as normal reading and language. In the past 8 years, numerous studies have been published that suggest reading impairments in persons with SLI may be related to their underlying language deficits [16, 89, 90, 91, 92]. Taken together, these studies indicate that abnormal language development may start the child on an abnormal developmental trajectory for the acquisition of reading skills, which increases the likelihood of a co-occurring reading deficit. Second, we required the correct analytical tools to be developed. Bartlett and Vieland  clean up presented a detailed series of simulations examining the characteristics of the PPL reparameterized for quantitative traits. That method was later adapted to allow for the unspecified threshold analysis as used in this paper and successfully applied to elucidate the genetic architecture of autoimmune thyroid disease in Vieland et al. . In this publication, we capitalized on advances in these two areas to perhaps explain why a sample selected for language impairment gives stronger linkage with a reading impairment phenotype. While the data do not suggest a reading impairment phenotype is universally better for finding SLI susceptibility alleles, as we have only examined one locus in depth in this paper, the reading impairment phenotype does appear to be the most relevant trait for the locus on 13q21.
To explore if the categorical reading impairment phenotype is indirectly selecting for a language trait we examined the data using models that assume reading impairment is related to quantitative language variation, specifically that reading-impaired persons will be on the low end of observed language variation. Our results are consistent with this assumption. The reverse was not true. Language-impaired subjects when assumed to be on the low end of reading variation did not show linkage to the 13q21 region. As one explanation for this pattern we suggest that the reading impairment trait in our sample is selecting subjects that have both reading and language impairments. This pattern is also largely seen in the phenotypic overlap between our definitions of language and reading impairments and is supported by the fact that language ability predicts reading impairment in our sample (fig. (fig.2).2). The individual quantitative traits did not show linkage to the region, indicating that no univariate analysis contains the most relevant information for linkage to 13q21. Taken together, these data support the notion that joint language and reading deficits are the most relevant phenotype for the 13q21 SLI locus.
Our analysis of reading and language phenotypes involved two levels: (1) the phenotypic analysis of how reading impairment separated out (essentially) non-overlapping groups on language scores but language impairment did not separate out groups on reading scores, and (2) the linkage analysis using the same principles. Both analyses displayed the same trends. However, we note that these two analyses were conducted on the same subjects from the same families and No. 1 is neither a replication study of the observed effects of No. 2, nor vice versa. The two analyses did not have to agree with one another by virtue of the fact that even though our families were selected based on phenotypic information, they were not selected on the basis of genotypic information. Therefore, the genotypes at 13q21 were not required to be correlated with our phenotypes (i.e., not selected for) as observed. In essence, the linkage information at this locus was a ‘free parameter’ in the second analysis that was not a simple recapitulation of the first phenotypic analysis. While this approach is intuitively not as revealing as a replication, it still represents convergence of evidence. Nonetheless, although our phenotypic analysis is concordant with the specific genetic analysis of 13q21, this finding alone does not imply that such a relationship will be observed generally in SLI. Ascertaining any subjects on language may distort the distribution of correlated reading scores, and here we have the additional complexity that these pedigrees were ascertained for apparent genetic transmission of language impairment. The relationship between language and reading may not generalize to other populations.
Further studies will be required to identify what mediates the reading-language relationship presented here. Our measure of reading involves only the recognition of isolated pseudowords (which relies strongly on phonological processing) and hence does not model the multidimensional components of reading that include vocabulary, grammar, and other language processes required for reading comprehension. In fact, it is possible that a measure of pseudoword reading actually measures a cognitive process that is fundamental to both reading and language – a process that presumably is more influenced by genetic variation on 13q21. If true, then selection for families based on language ability is a de facto selection of families based on phonological processing skills. Additionally, it is possible that aspects of language apart from phonological skills are more sensitive to effects of remediation or natural normalization of performance with age (compensation) and are therefore less suited as behavioral biomarkers.
This paper also applies a method for analysis of epistasis in pedigrees. We believe this is the first epistasis analysis to so dramatically increase the strength of the evidence for linkage to a cognitive trait in humans. The results provide much greater signal-to-noise compared to a typical genome-wide screen (either linkage or association), which is one of the theoretical benefits of exploiting epistasis. It is hoped that this will translate into greater signal for association fine mapping which is ongoing. However, even in the absence of the specific allele in 13q21 that increases susceptibility, the known functional allele in BDNF informs us about SLI in a limited but important fashion. The penetrance estimates from the two-locus modeling indicate that persons who are homozygous for the met allele at the BDNF SNP rs6265 are at a dramatically higher risk for developing SLI in conjunction with the as yet unidentified SLI risk allele for within the 13q21 locus. It remains to be seen if this effect translates to other loci as well.
Several studies have been done to try to establish the effects that the BDNF genotype has on hippocampal-dependent memory. As seen in mouse studies, human studies also give evidence to support a gene-dosage affect. It has been shown that Met carriers (Val/Met and Met/Met) consistently perform worse than their Val/Val counterparts on tasks that have been shown to be hippocampal dependent. Hariri et al.  have shown through functional magnetic resonance imaging (fMRI) that when asked to perform a test of place recognition, Met carriers exhibited significantly lower levels of activation in the hippocampus. This same result was seen during both the encoding and retrieval phases of the task. In further support, another set of fMRI studies was performed using a task (N-back task) that tested an individual's working memory and typically would result in the disengagement of the hippocampus. It was found that the Met carriers failed to disengage the hippocampus to the extent that the Val/Val individuals did . Another robust finding is a Met-associated decrease in hippocampal volume of up to 15% as measured by structural MRI. Transgenic mouse studies have reinforced the findings of human studies by showing a significant decrease in hippocampal volume of the Val/Met allele as compared to the Val/Val allele. Also, studies of the transgenic mouse have shown that the p.Val66Met is involved in activity-dependent secretion of Bdnf from neurons and results in a decrease in dendrite arbor complexity. Interestingly, there seems to be an additive gene dosage affect; Val/Met heterozygotes have less activity-dependent secretion than the Val/Val homozygote counterparts, but are still more active than the Met/Met homozygotes .
Although molecular neuroscience explanations for the interaction of BDNF with the SLI susceptibility allele on 13q21 cannot be confirmed until we have compelling evidence for a specific allele in 13q21, there are cognitive neuroscience theories that make predictions regarding possible interactions at the level of neural systems. One theory of SLI posits that language deficits either stem from, or are reinforced by, a deficient form of short-term memory for speech sounds known as phonological short-term memory . This buffer is required for the processing of speech sounds in order to string the sounds into more meaningful units for analysis, presumably at the linguistic level. Given the role of BDNF in hippocampal-related processes, if the phonological short-term memory buffer involves the hippocampus, then the interaction between BDNF and SLI alleles on 13q21 follows directly from the 13q21 SLI deficit being exacerbated by poor phonological short-term memory via hippocampal processes. Another theory of SLI suggests that an auditory processing deficit limits the ability of children with SLI to appropriately detect subtle but critical distinctions in the speech sound stream such as the subtle ‘-ed’ ending that is critical for learning the past tense . If a sensory processing deficit is present, such that phonemes are not processed correctly to begin with, an additional deficit in the ability to store those phonemes for additional processing, which may have included possible compensatory processing, could result in the kind of interaction suggested by our data. At present this is speculation, but it is possible to test these alternatives in a cognitive neuroscience framework. It is unfortunate that our Canadian pedigrees do not have an adequate measure of phonological short-term memory to test these alternatives in this sample. However, using suitable measures in other populations with rs6265 genotyping should be sufficient, given a large enough sample to test such a hypothesis.
Overall, these analyses demonstrate that thoughtful modeling of the genotype-phenotype relationship can elucidate important relationships in specific datasets that inform notions of which genetic mechanisms appear to mediate effects on phenotypic manifestations as well relationships between phenotypes. In the present case, the locus on 13q21 appears to have an interesting effect on non-word reading ability through effects on global language, presumably mediated developmentally through phonological processing. This last point is best tested at the phenotypic level and indeed evidence is provided in the literature [19, 96, 97]. Additionally, the strong genetic interaction between a BDNF variant associated with reduced working memory performance and the locus on 13q21 adds another level of phenotypic complexity in that it provides a potential genetic basis for the well-documented relationship between working memory and SLI [98, 99, 100, 101]. While it would be interesting to further test our interaction with phonological working memory phenotypes so prominently featured in the SLI literature, we do not have such a measure in these families. However, since the behavioral literature is quite extensive on the relationship of working memory as an important part of the SLI phenotype, it would appear reasonable to posit that the BDNF-13q21 interaction belies phonological working memory impairment. This possibility awaits testing in other samples.
We would like to thank the participating families, who contributed their time and patience to make this study possible, and Veronica J. Vieland for insightful comments on a previous version of the manuscript. This research was supported by Research Grant 12-FY02-107 (L.M.B.) from the March of Dimes, R01DC009453 (C.W.B.) and also supported in part by an allocation of computing time from the Ohio Supercomputer Center Grant PCCR0001-2 (C.W.B.).