Multipoint (MP) linkage analysis represents a valuable tool for whole-genome studies but suffers from the disadvantage that its probability distribution is unknown and varies as a function of marker information and density, genetic model, number and structure of pedigrees, and the affection status distribution [Xing and Elston: Genet Epidemiol 2006;30:447–458; Hodge et al.: Genet Epidemiol 2008;32:800–815]. This implies that the MP significance criterion can differ for each marker and each dataset, and this fact makes planning and evaluation of MP linkage studies difficult. One way to circumvent this difficulty is to use simulations or permutation testing. Another approach is to use an alternative statistical paradigm to assess the statistical evidence for linkage, one that does not require computation of a p value. Here we show how to use the evidential statistical paradigm for planning, conducting, and interpreting MP linkage studies when the disease model is known (lod analysis) or unknown (mod analysis). As a key feature, the evidential paradigm decouples uncertainty (i.e. error probabilities) from statistical evidence. In the planning stage, the user calculates error probabilities, as functions of one's design choices (sample size, choice of alternative hypothesis, choice of likelihood ratio (LR) criterion k) in order to ensure a reliable study design. In the data analysis stage one no longer pays attention to those error probabilities. In this stage, one calculates the LR for two simple hypotheses (i.e. trait locus is unlinked vs. trait locus is located at a particular position) as a function of the parameter of interest (position). The LR directly measures the strength of evidence for linkage in a given data set and remains completely divorced from the error probabilities calculated in the planning stage. An important consequence of this procedure is that one can use the same criterion k for all analyses. 
This contrasts with the situation described above, in which the value one uses to conclude significance may differ for each marker and each dataset in order to accommodate a fixed test size, α. In this study we accomplish two goals that lead to a general algorithm for conducting evidential MP linkage studies. (1) We provide two theoretical results that translate into guidelines for investigators conducting evidential MP linkage: (a) Comparing mods to lods, error rates (including probabilities of weak evidence) are generally higher for mods when the null hypothesis is true, but lower for mods in the presence of true linkage. Royall [J Am Stat Assoc 2000;95:760–780] has shown that errors based on lods are bounded and generally small. Therefore when the true disease model is unknown and one chooses to use mods, one needs to control misleading evidence rates only under the null hypothesis; (b) for any given pair of contiguous marker loci, error rates under the null are greatest at the midpoint between the markers spaced furthest apart, which provides an obvious simple alternative hypothesis to specify for planning MP linkage studies. (2) We demonstrate through extensive simulation that this evidential approach can yield low error rates under the null and alternative hypotheses for both lods and mods, despite the fact that mod scores are not true LRs. Using these results we provide a coherent approach to implement a MP linkage study using the evidential paradigm.
Evidential paradigm; Likelihood; Parametric linkage; Complex disease
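As a rough illustration of the planning-stage calculation described in the abstract above, the sketch below computes the probability of misleading evidence (M) and of weak evidence (W) under the null hypothesis. It uses a deliberately simple stand-in model: two simple hypotheses about a normal mean with known variance. This is not the multipoint linkage likelihood. The function names and the standardized effect `c` (the distance between the hypotheses in standard-error units, which grows with the square root of sample size) are assumptions introduced here for illustration only.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def misleading_prob(c: float, k: float) -> float:
    """P(LR >= k in favor of the alternative | null true), normal-mean model.

    c is the standardized distance between hypotheses: delta * sqrt(n) / sigma.
    """
    return norm_cdf(-c / 2.0 - math.log(k) / c)


def weak_prob(c: float, k: float) -> float:
    """P(1/k < LR < k | null true): evidence too weak to favor either side."""
    upper = c / 2.0 + math.log(k) / c
    lower = c / 2.0 - math.log(k) / c
    return norm_cdf(upper) - norm_cdf(lower)


if __name__ == "__main__":
    k = 32.0  # LR criterion chosen at the planning stage
    for c in (1.0, 2.0, 4.0, 6.0):
        print(f"c={c:.1f}  M={misleading_prob(c, k):.4f}  W={weak_prob(c, k):.4f}")
```

In this toy model M stays below the universal bound 1/k for every `c`, while W shrinks as `c` (and hence sample size) grows, which is why planning under this paradigm tends to focus on W.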
Genetic studies of lung disease in Cystic Fibrosis are hampered by the
lack of a severity measure that accounts for chronic disease progression and
mortality attrition. Further, combining analyses across studies requires common
phenotypes that are robust to study design and patient ascertainment.
Using data from the North American Cystic Fibrosis Modifier Consortium
(Canadian Consortium for CF Genetic Studies, Johns Hopkins University CF Twin
and Sibling Study, and University of North Carolina/Case Western Reserve
University Gene Modifier Study), the authors calculated age-specific CF
percentile values of FEV1 which were adjusted for CF age-specific mortality.
The phenotype was computed for 2061 patients representing the Canadian CF
population, 1137 extreme phenotype patients in the UNC/Case Western study, and
1323 patients from multiple CF sib families in the CF Twin and Sibling Study.
Despite differences in ascertainment and median age, our phenotype score was
distributed in all three samples in a manner consistent with ascertainment
differences, reflecting the lung disease severity of each individual in the
underlying population. The new phenotype score was highly correlated with the
previously recommended complex phenotype, but the new phenotype is more robust
for shorter follow-up and for extreme ages.
A disease progression and mortality adjusted phenotype reduces the need
for stratification or additional covariates, increasing statistical power and
avoiding possible distortions. This approach will facilitate large scale genetic
and environmental epidemiological studies which will provide targeted
therapeutic pathways for the clinical benefit of patients with CF.
Forced Expiratory Volume; Age Effects; Severity of Illness Index
Reading disability (RD) is a common neurodevelopmental disorder with a genetic basis established in families segregating “pure” dyslexia. RD commonly occurs in neurodevelopmental disorders including Rolandic Epilepsy (RE), a complex genetic disorder. We performed genomewide linkage analysis of RD in RE families, testing the hypotheses that RD in RE families is genetically heterogeneous to pure dyslexia, and shares genetic influences with other sub-phenotypes of RE.
We initially performed genome-wide linkage analysis using 1000 STR markers in 38 US families ascertained through a RE proband; most of these families were multiplex for RD. We analyzed the data by two-point and multipoint parametric LOD score methods. We then confirmed the linkage evidence in a second US dataset of 20 RE families. We also resequenced the SEMA3C gene at the 7q21 linkage locus in members of one multiplex RE/RD pedigree and the DISC1 gene in affected pedigrees at the 1q42 locus.
In the discovery dataset there was suggestive evidence of linkage for RD to chromosome 7q21 (two-point LOD score 3.05, multipoint LOD 3.08) and at 1q42 (two-point LOD 2.87, multipoint LOD 3.03). Much of the linkage evidence at 7q21 derived from families of French-Canadian origin, whereas the linkage evidence at 1q42 was well distributed across all the families. There was little evidence for linkage at known dyslexia loci. Combining the discovery and confirmation datasets increased the evidence at 1q42 (two-point LOD = 3.49, multipoint HLOD = 4.70), but decreased evidence at 7q21 (two-point LOD = 2.28, multipoint HLOD = 1.81), possibly because the replication sample did not have French Canadian representation.
Reading disability in rolandic epilepsy has a genetic basis and may be influenced by loci at 1q42 and, in some populations, at 7q21; there is little evidence of a role for known DYX loci discovered in “pure” dyslexia pedigrees. 1q42 and 7q21 are candidate novel dyslexia loci.
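For readers less used to the LOD scale, the sketch below converts the two-point LOD scores quoted above into likelihood ratios, using the standard definition of a LOD score as the base-10 logarithm of the likelihood ratio for linkage versus no linkage. The function names are hypothetical. Multipoint LODs and HLODs are maximized over additional parameters, so reading them as simple likelihood ratios is only approximate.

```python
import math


def lod_to_lr(lod: float) -> float:
    """Convert a LOD score (log10 likelihood ratio) to a likelihood ratio."""
    return 10.0 ** lod


def lr_to_lod(lr: float) -> float:
    """Convert a likelihood ratio back to the log10 (LOD) scale."""
    return math.log10(lr)


if __name__ == "__main__":
    # Two-point LOD scores from the discovery dataset reported above.
    for label, lod in [("7q21 two-point", 3.05), ("1q42 two-point", 2.87)]:
        print(f"{label}: LOD {lod} corresponds to LR of about {lod_to_lr(lod):.0f}:1")
```

On this scale a LOD of 3 corresponds to a likelihood ratio of 1000:1.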
A combined genome-wide association and linkage study was used to identify loci causing variation in CF lung disease severity. A significant association (P=3.34 × 10^-8) near EHF and APIP (chr11p13) was identified in F508del homozygotes (n=1,978). The association replicated in F508del homozygotes (P=0.006) from a separate family-based study (n=557), with P=1.49 × 10^-9 for the three-study joint meta-analysis. Linkage analysis of 486 sibling pairs from the family-based study identified a significant QTL on chromosome 20q13.2 (LOD=5.03). Our findings provide insight into the causes of variation in lung disease severity in CF and suggest new therapeutic targets for this life-limiting disorder.
Investigators performing genetic association studies grapple with how to measure strength of association evidence, choose sample size, and adjust for multiple testing. We apply the evidential paradigm (EP) to genetic association studies, highlighting its strengths. The EP uses likelihood ratios (LRs), as opposed to P-values or Bayes' factors, to measure strength of association evidence. We derive EP methodology to estimate sample size, adjust for multiple testing, and provide informative graphics for drawing inferences, as illustrated with a Rolandic Epilepsy (RE) fine-mapping study. We focus on controlling the probability of observing weak evidence for or against association (W) rather than type I errors (M). For example, for LR⩾32 representing strong evidence, at one locus with n=200 cases, n=200 controls, W=0.134, whereas M=0.005. For n=300 cases and controls, W=0.039 and M=0.004. These calculations are based on detecting an OR=1.5. Despite the common misconception, one is not tied to this planning value for analysis; rather one calculates the likelihood at all possible values to assess evidence for association. We provide methodology to adjust for multiple tests across m loci, which adjusts M and W for m. We do so for (a) single-stage designs, (b) two-stage designs, and (c) simultaneously controlling family-wise error rate (FWER) and W. Method (c) chooses larger sample sizes than (a) or (b), whereas (b) has smaller bounds on the FWER than (a). The EP, using our innovative graphical display, identifies important SNPs in elongator protein complex 4 (ELP4) associated with RE that may not have been identified using standard approaches.
evidential paradigm; multiple testing; profile likelihood
Investigators performing genetic association studies grapple with how to measure strength of association evidence, choose sample size, and adjust for multiple testing. We apply the evidential paradigm (EP) to genetic association studies, highlighting its strengths. The EP uses likelihood ratios (LRs), as opposed to p-values or Bayes’ factors, to measure strength of association evidence. We derive EP methodology to estimate sample size; adjust for multiple testing; and provide informative graphics for drawing inferences, as illustrated with a Rolandic Epilepsy fine-mapping study. We focus on controlling the probability of observing weak evidence for or against association (W) rather than Type I errors (M). E.g., for LR>32 representing strong evidence, at one locus with n=200 cases, n=200 controls, W=0.134, whereas M=0.005. For n=300 cases and controls, W=0.039 and M=0.004. These calculations are based on detecting an OR=1.5. Despite the common misconception, one is not tied to this planning value for analysis; rather one calculates the likelihood at all possible values to assess evidence for association. We provide methodology to adjust for multiple tests across m loci, which adjusts M and W for m. We do so for (a) single-stage designs, (b) two-stage designs, and (c) simultaneously controlling family-wise error rate (FWER) and W. Method (c) chooses larger sample sizes than (a) or (b), while (b) has smaller bounds on the FWER than (a). The EP, using our innovative graphical display, identifies important SNPs in Elongator Protein Complex 4 (ELP4) associated with Rolandic Epilepsy that may not have been identified using standard approaches.
Panic disorder (PD) and social anxiety disorder (SAD) are moderately heritable anxiety disorders. We analyzed five genes, derived from pharmacological or translational mouse models, in a new case-control study of PD and SAD in European Americans: (1) the serotonin transporter (SLC6A4), (2) the serotonin receptor 1A (HTR1A), (3) catechol-o-methyltransferase (COMT), (4) a regulator of g-protein signalling, RGS2, and (5) the gastrin releasing peptide receptor (GRPR). Cases were interviewed using the Schedule for Affective Disorders and Schizophrenia (SADS-LA-IV) and were required to have a probable or definite lifetime diagnosis of PD (N = 179), SAD (161) or both (140), with first onset by age 31 and a family history of anxiety. Final diagnoses were determined using the best estimate procedure, blind to genotyping data. Controls were obtained from the NIMH Human Genetics Initiative; only subjects above 25 years of age who screened negative for all psychiatric symptoms were included (N = 470). A total of 45 SNPs were successfully genotyped over the 5 selected genes using the Applied Biosystems SNPlex protocol. SLC6A4 provided strong and consistent evidence of association with the PD and PD+SAD groups, with the most significant association in both groups being at rs140701 (χ2=10.72, p=0.001 with PD and χ2=8.59, p=0.003 in the PD+SAD group). This association remained significant after multiple test correction. Those carrying at least one copy of the haplotype A-A-G constructed from rs3794808, rs140701 and rs4583306 have 1.7 times the odds of PD compared with those without the haplotype (90%CI 1.2-2.3). The SAD only group did not provide evidence of association, suggesting a PD driven association. The findings remained after adjustment for age and sex, and there was no evidence that the association was due to population stratification.
The promoter region of the gene, 5-HTTLPR, did not provide any evidence of association, regardless of whether analyzed as a triallelic or biallelic locus, nor did any of the other four candidate genes tested. Our findings suggest that the serotonin transporter gene may play a role in PD; however, the findings require replication. Future studies should attend to the entire genetic region rather than the promoter.
anxiety disorder; social phobia; association; SLC6A4; 5-HTTLPR; serotonin receptor 1A (HTR1A); catechol-o-methyltransferase (COMT); regulator of g-protein signalling; gastrin releasing peptide receptor (GRPR)
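The relationship between the reported χ2 statistics and p values for rs140701 can be checked with a few lines of code. The sketch below assumes a test with 1 degree of freedom (the abstract does not state the degrees of freedom, so this is an assumption) and uses the identity that a 1-df chi-square tail probability equals the two-sided tail of a standard normal. The function name is hypothetical.

```python
import math


def chi2_pvalue_1df(chi2: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Statistics reported above for rs140701 (1 df is assumed, not stated).
    for group, chi2 in [("PD", 10.72), ("PD+SAD", 8.59)]:
        print(f"{group}: chi2={chi2}  p={chi2_pvalue_1df(chi2):.4f}")
```

Under the 1-df assumption, both statistics round to the p values quoted in the abstract.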
Several studies have identified increased medical problems among individuals with panic disorder (PD). We previously found that specific conditions (interstitial cystitis (IC), mitral valve prolapse (MVP), migraines, and thyroid disorders) aggregated non-randomly among panic families (we called this the “PD syndrome”), and that families with and without the syndrome were genetically distinguishable on chromosome 13. We present data from a new case-control study that replicates and extends the syndrome phenotype clinically.
Probands with a definite diagnosis and family history of PD (N= 219), social anxiety disorder (SAD; 199), or both (173), and 102 controls with no personal/family history of anxiety, were interviewed using the SADS-LA diagnostic instrument. Medical history was obtained via medical checklist and the family history screen; IC symptoms were assessed using criteria developed by the National Institute for Diabetes and Digestive and Kidney Diseases. Subjects and interviewers were unaware of the syndrome hypothesis; final best-estimate diagnoses were blind to syndrome data.
Probands with PD or SAD, as compared to controls, were five or more times as likely to report IC symptoms, and twice as likely to report MVP and migraines (other genitourinary and cardiovascular problems were not elevated). First-degree relatives of probands with PD or SAD were also at increased risk for IC, MVP, thyroid problems and headaches, regardless of whether the proband reported the same condition.
These findings are consistent with previous data supporting a PD syndrome, and further suggest that this syndrome may include other anxiety disorders as well.
panic disorder; social anxiety disorder/social phobia; interstitial cystitis; mitral valve prolapse; migraines; headaches; chromosome 13
Rolandic epilepsy (RE) is the most common human epilepsy, affecting children between 3 and 12 years of age, boys more often than girls (3:2). Focal sharp waves in the centrotemporal area define the electroencephalographic (EEG) trait for the syndrome; are a feature of several related childhood epilepsies; and are frequently observed in common developmental disorders (e.g. speech dyspraxia, attention deficit hyperactivity disorder (ADHD) and developmental coordination disorder (DCD)). Here we report the first genome-wide linkage scan in RE for the EEG trait, centrotemporal sharp waves (CTS), with genomewide linkage of CTS to 11p13 (HLOD 4.30). Pure likelihood statistical analysis refined our linkage peak by fine-mapping CTS to variants in Elongator Protein Complex 4 (hELP4) in two independent datasets; the strongest evidence was with rs986527 in intron 9 of hELP4, providing a Likelihood Ratio of 629:1 (p=0.0002) in favor of an association. Resequencing of hELP4 coding, flanking and promoter regions revealed no significant exonic polymorphisms. This is the first report of a gene implicated in a common focal epilepsy and the first human disease association of hELP4. hELP4 is a component of the Elongator complex, involved in transcription and tRNA modification. Elongator depletion results in the brain-specific downregulation of genes implicated in cell motility and migration. We hypothesize that a non-coding mutation in hELP4 impairs brain-specific Elongator mediated interaction of genes implicated in brain development, resulting in susceptibility to seizures and neurodevelopmental disorders.
linkage; neurodevelopmental traits; centrotemporal spikes; attention deficit hyperactivity disorder; speech dyspraxia; developmental coordination disorder; association
We investigate the behavior of type I error rates in model-based multipoint (MP) linkage analysis, as a function of sample size (N). We consider both MP lods (i.e., MP linkage analysis that uses the correct genetic model) and MP mods (maximizing MP lods over 18 dominant and recessive models). Following Xing & Elston, we first consider MP linkage analysis limited to a single position; then we enlarge the scope and maximize the lods and mods over a span of positions. In all situations we examined, type I error rates decrease with increasing sample size, apparently approaching zero. We show: (a) For MP lods analyzed only at a single position, well-known statistical theory predicts that type I error rates approach zero. (b) For MP lods and mods maximized over position, this result has a different explanation, related to the fact that one maximizes the scores over only a finite portion of the parameter range.
The implications of these findings may be far-reaching: Although it is widely accepted that fixed nominal critical values for MP lods and mods are not known, this study shows that whatever the nominal error rates are, the actual error rates appear to decrease with increasing sample size. Moreover, the actual (observed) type I error rate may be quite small for any given study. We conclude that multipoint lod and mod scores provide reliable linkage evidence for complex diseases, despite the unknown limiting distributions of these multipoint scores.
Rolandic epilepsy (RE) is the most common human epilepsy, affecting children between 3 and 12 years of age, boys more often than girls (3:2). Focal sharp waves in the centrotemporal area define the electroencephalographic (EEG) trait for the syndrome, are a feature of several related childhood epilepsies and are frequently observed in common developmental disorders (eg, speech dyspraxia, attention deficit hyperactivity disorder and developmental coordination disorder). Here we report the first genome-wide linkage scan in RE for the EEG trait, centrotemporal sharp waves (CTS), with genome-wide linkage of CTS to 11p13 (HLOD 4.30). Pure likelihood statistical analysis refined our linkage peak by fine mapping CTS to variants in Elongator Protein Complex 4 (ELP4) in two independent data sets; the strongest evidence was with rs986527 in intron 9 of ELP4, providing a likelihood ratio of 629:1 (P=0.0002) in favor of an association. Resequencing of ELP4 coding, flanking and promoter regions revealed no significant exonic polymorphisms. This is the first report of a gene implicated in a common focal epilepsy and the first human disease association of ELP4. ELP4 is a component of the Elongator complex, involved in transcription and tRNA modification. Elongator depletion results in the brain-specific downregulation of genes implicated in cell motility and migration. We hypothesize that a non-coding mutation in ELP4 impairs brain-specific Elongator-mediated interaction of genes implicated in brain development, resulting in susceptibility to seizures and neurodevelopmental disorders.
linkage; neurodevelopmental traits; centrotemporal spikes; idiopathic partial epilepsy; association
Very few genetic associations for idiopathic epilepsy have been replicated and this has tempered enthusiasm for the results of genetic studies in epilepsy. What are the reasons for lack of replication? While type 1 error, population stratification, and multiple testing have been discussed extensively, the importance of genetic heterogeneity has been relatively neglected. In the first part of this review, we explore the sources of genetic heterogeneity and their importance for epilepsy genetic studies. In the second part, we review alternatives to the simple law of replication, revisiting Bradford Hill's guidelines for evidence of causality. A coherence perspective is applied to three examples. We conclude that adopting the perspective of integrating coherent and consistent evidence from different experimental approaches is a more appropriate requirement for proceeding to functional studies.
Genetics; Association studies; Epidemiology; Evidence; Causality; Genetic heterogeneity
Centrotemporal sharp (CTS) waves, the electroencephalogram (EEG) hallmark of rolandic epilepsy, are found in approximately 4% of the childhood population. The inheritance of CTS is presumed autosomal dominant but this is controversial. Previous studies have varied considerably in methodology, especially in the control of bias and confounding. We aimed to test the hypothesis of autosomal dominant inheritance of CTS in a well-designed family segregation analysis study.
Probands with rolandic epilepsy were collected through unambiguous single ascertainment. Siblings in the age range 4–16 years underwent sleep-deprived EEG; observations from those who remained awake were omitted. CTS were rated as present or absent by two independent observers blinded to the study hypothesis and subject identities. We computed the segregation ratio of CTS, corrected for ascertainment. We tested the segregation ratio estimate for consistency with dominant and recessive modes of inheritance, and compared the observed sex ratio of those affected with CTS for consistency with sex linkage.
Thirty siblings from 23 families underwent EEG examination. Twenty-three showed evidence of sleep in their EEG recordings. Eleven of 23 recordings demonstrated CTS, yielding a corrected segregation ratio of 0.48 (95% CI: 0.27–0.69). The male to female ratio of CTS affectedness was approximately equal.
The segregation ratio of CTS in rolandic epilepsy families is consistent with a highly penetrant autosomal dominant inheritance, with equal sex ratio. Autosomal recessive and X-linked inheritance are rejected. The CTS locus might act in combination with one or more loci to produce the phenotype of rolandic epilepsy.
Centrotemporal; Rolandic; Focal sharp wave; Epilepsy; Genetic; Family; EEG; Segregation
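The segregation ratio reported above can be approximated with a short calculation. The sketch below assumes that, under single ascertainment, the corrected estimate is simply the proportion affected among non-proband siblings with usable recordings (11 of 23), and it uses a plain Wald interval. The abstract does not say which interval method was used, so both the estimator and the interval here are assumptions, and the function name is hypothetical.

```python
import math


def segregation_ratio_wald(affected_sibs: int, total_sibs: int, z: float = 1.96):
    """Proportion of affected non-proband siblings with a Wald confidence interval.

    Returns (estimate, lower, upper). Probands are assumed to be excluded
    already, which is the usual correction under single ascertainment.
    """
    p_hat = affected_sibs / total_sibs
    se = math.sqrt(p_hat * (1.0 - p_hat) / total_sibs)
    return p_hat, p_hat - z * se, p_hat + z * se


if __name__ == "__main__":
    est, lo, hi = segregation_ratio_wald(11, 23)
    print(f"segregation ratio = {est:.2f} (95% CI: {lo:.2f} to {hi:.2f})")
```

This simple interval comes out at roughly 0.27 to 0.68, close to but not identical to the reported 0.27–0.69. In either case the interval contains 0.5, the value expected under fully penetrant autosomal dominant inheritance, and excludes 0.25, the value expected under recessive inheritance.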
Associations between rolandic epilepsy (RE) with reading disability (RD) and speech sound disorder (SSD) have not been tested in a controlled study. We conducted a case–control study to determine whether (1) RD and SSD odds are higher in RE probands than controls and (2) an RE proband predicts a family member with RD or SSD, hence suggesting a shared genetic etiology for RE, RD, and SSD.
Unmatched case–control study with 55 stringently defined RE cases, 150 controls in the same age range lacking a primary brain disorder diagnosis, and their siblings and parents. Odds ratios (OR) were calculated by multiple logistic regression, adjusted for sex and age, and for relatives, also adjusted for comorbidity of RD and SSD in the proband.
RD was strongly associated with RE after adjustment for sex and age: OR 5.78 (95% CI: 2.86–11.69). An RE proband predicts RD in family members: OR 2.84 (95% CI: 1.38–5.84), but not independently of the RE proband's RD status: OR 1.30 (95% CI: 0.55–12.79). SSD was also comorbid with RE: adjusted OR 2.47 (95%CI: 1.22–4.97). An RE proband predicts SSD in relatives, even after controlling for sex, age and proband SSD comorbidity: OR 4.44 (95% CI: 1.93–10.22).
RE is strongly comorbid with RD and SSD. Both RD and SSD are likely to be genetically influenced and may contribute to the complex genetic etiology of the RE syndrome. Siblings of RE patients are at high risk of RD and SSD and both RE patients and their younger siblings should be screened early.
Phonologic disorder; Articulation disorder; Speech delay; Developmental dysphasia; Developmental dyslexia; Centrotemporal sharp waves; Complex genetic; Familial aggregation; Comorbidity; Cognitive deficit; Family study
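Odds ratios from logistic regression usually come with confidence intervals that are symmetric on the log scale, so the reported OR should sit near the geometric mean of its limits, and the standard error of the log odds ratio can be recovered from the interval width. The sketch below applies this check to the figures above. It assumes Wald-type 95% intervals (not stated in the abstract), and the function names are hypothetical.

```python
import math


def geometric_mid(lower: float, upper: float) -> float:
    """Geometric mean of the CI limits; close to the OR for a Wald interval."""
    return math.sqrt(lower * upper)


def log_or_se(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


if __name__ == "__main__":
    # (reported OR, lower limit, upper limit) as quoted in the abstract above.
    reported = [
        (5.78, 2.86, 11.69),
        (2.84, 1.38, 5.84),
        (2.47, 1.22, 4.97),
        (4.44, 1.93, 10.22),
    ]
    for or_, lo, hi in reported:
        print(f"OR {or_}: geometric mid {geometric_mid(lo, hi):.2f}, "
              f"SE(log OR) {log_or_se(lo, hi):.3f}")
```

The four intervals listed in the sketch pass this check. The interval quoted for OR 1.30 does not, which may reflect a different interval method or a transcription issue; the sketch makes no attempt to correct it.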
The Genetic Analysis Workshop 14 simulated dataset was designed 1) To test the ability to find genes related to a complex disease (such as alcoholism). Such a disease may be given a variety of definitions by different investigators, have associated endophenotypes that are common in the general population, and is likely to be not one disease but a heterogeneous collection of clinically similar, but genetically distinct, entities. 2) To observe the effect on genetic analysis and gene discovery of a complex set of gene × gene interactions. 3) To allow comparison of microsatellite vs. large-scale single-nucleotide polymorphism (SNP) data. 4) To allow testing of association to identify the disease gene and the effect of moderate marker × marker linkage disequilibrium. 5) To observe the effect of different ascertainment/disease definition schemes on the analysis. Data was distributed in two forms. Data distributed to participants contained about 1,000 SNPs and 400 microsatellite markers. Internet-obtainable data consisted of a finer 10,000 SNP map, which also contained data on controls. While disease characteristics and parameters were constant, four "studies" used varying ascertainment schemes based on differing beliefs about disease characteristics. One of the studies contained multiplex two- and three-generation pedigrees with at least four affected members. The simulated disease was a psychiatric condition with many associated behaviors (endophenotypes), almost all of which were genetic in origin. The underlying disease model contained four major genes and two modifier genes. The four major genes interacted with each other to produce three different phenotypes, which were themselves heterogeneous. The population parameters were calibrated so that the major genes could be discovered by linkage analysis in most datasets. The association evidence was more difficult to calibrate but was designed to find statistically significant association in 50% of datasets. 
We also simulated some marker × marker linkage disequilibrium around some of the genes and also in areas without disease genes. We tried two different methods to simulate the linkage disequilibrium.
Variants associated with meconium ileus in cystic fibrosis (CF) were identified in 3,763 patients by GWAS. Five SNPs at two loci near SLC6A14 (min P=1.28×10^-12 at rs3788766), chr Xq23-24 and SLC26A9 (min P=9.88×10^-9 at rs4077468), chr 1q32.1 accounted for ~5% of the phenotypic variability, and were replicated in an independent patient collection (n=2,372; P=0.001 and 0.0001 respectively). By incorporating that disease-causing mutations in CFTR alter electrolyte and fluid flux across epithelia into a hypothesis-driven genome-wide analysis (GWAS-HD), we identified the same SLC6A14 and SLC26A9 associated SNPs, while establishing evidence for the involvement of SNPs in a third solute carrier gene, SLC9A3. In addition, GWAS-HD provided evidence of association between meconium ileus and multiple constituents of the apical plasma membrane where CFTR resides (P=0.0002, testing 155 apical genes jointly and replicated, P=0.022). These findings suggest that modulating activities of apical membrane constituents could complement current therapeutic paradigms for cystic fibrosis.
It is generally presumed that the Cystic Fibrosis (CF) population is relatively homogeneous, and predominantly of European origin. The complex ethnic make-up observed in the CF patients collected by the North American CF Modifier Gene Consortium has brought this assumption into question, and suggested the potential for population substructure in the three CF study samples collected from North America. It is well appreciated that population substructure can result in spurious genetic associations.
To understand the ethnic composition of the North American CF population, and to assess the need for population structure adjustment in genetic association studies with North American CF patients.
Genome-wide single-nucleotide polymorphisms on 3076 unrelated North American CF patients were used to perform population structure analyses. We compared self-reported ethnicity to genotype-inferred ancestry, and also examined whether geographic distribution and CFTR mutation type could explain the structure observed.
Although largely Caucasian, our analyses identified a considerable number of CF patients with admixed African-Caucasian, Mexican-Caucasian and Indian-Caucasian ancestries. Population substructure was present and comparable across the three studies of the consortium. Neither geographic distribution nor mutation type explained the population structure.
Given the ethnic diversity of the North American CF population, it is essential to carefully detect, estimate and adjust for population substructure to guard against potential spurious findings in CF genetic association studies. Other Mendelian diseases that are presumed to predominantly affect single ethnic groups may also benefit from careful analysis of population structure.
ethnicity; principal component analysis; population substructure; population stratification
Cystic fibrosis (CF) is a monogenic disease due to mutations in the CFTR gene. Yet, variability in CF disease presentation is presumed to be affected by modifier genes, such as those recently demonstrated for the pulmonary aspect. Here, we conduct a modifier gene study for meconium ileus (MI), an intestinal obstruction that occurs in 16–20% of CF newborns, providing linkage and association results from large family and case–control samples. Linkage analysis of modifier traits is different than linkage analysis of primary traits on which a sample was ascertained. Here, we articulate a source of confounding unique to modifier gene studies and provide an example of how one might overcome the confounding in the context of linkage studies. Our linkage analysis provided evidence of a MI locus on chromosome 12p13.3, which was segregating in up to 80% of MI families with at least one affected offspring (HLOD = 2.9). Fine mapping of the 12p13.3 region in a large case–control sample of pancreatic insufficient Canadian CF patients with and without MI pointed to the involvement of ADIPOR2 in MI (p = 0.002). This marker was substantially out of Hardy–Weinberg equilibrium in the cases only, and provided evidence of a cohort effect. The association with rs9300298 in the ADIPOR2 gene at the 12p13.3 locus was replicated in an independent sample of CF families. A protective locus, using the phenotype of no-MI, mapped to 4q13.3 (HLOD = 3.19), with substantial heterogeneity. A candidate gene in the region, SLC4A4, provided preliminary evidence of association (p = 0.002), warranting further follow-up studies. Our linkage approach was used to direct our fine-mapping studies, which uncovered two potential modifier genes worthy of follow-up.
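The abstract above notes that the associated marker was out of Hardy–Weinberg equilibrium in cases. A minimal sketch of the usual 1-df chi-square check for a biallelic SNP is shown below. The genotype counts in the example are invented for illustration and are not taken from the study, and the function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (chi-square statistic, 1-df p value) for departure from HWE.

    Assumes a biallelic marker with both alleles observed; a monomorphic
    marker would give zero expected counts and is not handled here.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 1-df chi-square upper tail via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AB, BB), for illustration only.
    for counts in [(25, 50, 25), (40, 20, 40)]:
        chi2, p_value = hwe_chi_square(*counts)
        print(f"counts={counts}  chi2={chi2:.2f}  p={p_value:.3g}")
```

A deficit or excess of heterozygotes relative to the expected 2pq proportion drives the statistic up. Testing cases and controls separately, as the abstract describes, helps distinguish genotyping artifacts from departures related to the trait.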