Cystic fibrosis (CF), the most common lethal single gene disorder in Caucasians, is due to mutations in the CFTR gene. Twin and sibling analysis indicates that modifier genes, rather than allelic variation in CFTR, are responsible for most of the variability in severity of lung disease, the major cause of mortality in CF patients. We used a family-based approach to test for association between lung function and two functional SNPs (rs1800469, ‘−509’ and rs1982073, ‘codon 10’) in the 5′ region of transforming growth factor-beta1 (TGFB1), a putative CF modifier gene. Quantitative transmission disequilibrium testing of 472 CF patient–parent–parent trios revealed that both TGFB1 SNPs showed significant transmission distortion when patients were stratified by CFTR genotype. Although lung function and nutritional status are correlated in CF patients, there was no evidence of association between the TGFB1 SNPs and variation in nutritional status. Additional tagging SNPs (rs8179181, rs2278422, rs8110090, rs4803455 and rs1982072) that capture most of the diversity in TGFB1 were also typed but none showed association with variation in lung function. However, a haplotype composed of the −509 C and codon 10 T alleles along with the C allele of the 3′ SNP rs8179181 was highly associated with increased lung function in patients grouped by CFTR genotype. These results demonstrate that TGFB1 is a modifier of CF lung disease and reveal a previously unrecognized beneficial effect of TGFB1 variants upon the pulmonary phenotype.
Cystic fibrosis (CF [MIM 602421]) is a common autosomal recessive genetic disorder caused by mutations in the CF transmembrane conductance regulator (CFTR) gene. There is a high degree of phenotypic variability in CF, especially in lung disease, the major cause of morbidity and mortality. Differences in lung disease severity are not entirely explained by CFTR genotype, as illustrated by the great extent of variability in patients homozygous for the most common mutation, ΔF508 (1,2). Environmental factors such as infection with bacterial pathogens contribute to lung disease variation (3). However, affected twins and siblings demonstrate that variation in lung disease severity also has a strong genetic component, with heritability estimated at 0.6–0.8 (4). These observations indicate that modifier genes make a substantial contribution to CF lung disease, independent of CFTR genotype.
The moderate degree of similarity of CF lung disease severity between siblings suggests that a number of modifier genes are operating. These genes may individually be of small effect, requiring multiple replications to substantiate their relevance to CF lung disease. Indeed, numerous modifier genes have been examined, but few have withstood the test of replication (5). The most compelling biological candidate is transforming growth factor-beta1 (TGFB1). The TGFB1 gene product is a secreted protein with numerous functions including involvement in cellular growth and differentiation, inflammation, and tissue fibrosis (6–8). Although several case–control studies have investigated TGFB1 as a modifier of CF lung disease, the results have been conflicting (9–12). Discrepancies among these case–control studies may be due to population stratification, confounding of the studied trait by other phenotypes, interaction among genes, undetected variants in the candidate gene that affect lung function or differences in power. The US CF Twin and Sibling Study has recruited twins and siblings affected with CF and their parents to evaluate association between candidate modifier genes and variation in lung function. Use of these families permits the employment of transmission-based methods that avoid spurious associations due to population stratification. Patients were recruited only on the condition that they had a living sibling with CF, thereby removing potential bias inherent in recruiting based on extreme phenotypes and facilitating the analysis of the confounding effect of one trait upon another. Additionally, patients bearing all CFTR genotypes were recruited, allowing the investigation of gene–gene interaction between modifier genes and CFTR. Whenever possible, the CF Twin and Sibling Study enrolled both parents of each patient to facilitate the construction of haplotypes and the search for occult variation in candidate genes that modify CF lung disease. 
Finally, the unfortunate relative commonness of CF and the presence of a highly organized nationwide CF care system enabled recruitment of hundreds of families, thus providing reasonable power for transmission-based association studies.
Patients in this study were similar for age, CFTR genotype distribution, body mass index (BMI) and sex ratio compared with CF patients in the US CF Patient Registry (Table 1) (13). Since lung disease is progressive in CF patients, pulmonary function measures of the forced expiratory volume in 1 s (FEV1) were converted into disease-specific percentiles (FEV1CF%) to facilitate comparisons among patients of different ages and sex (14). The mean percentiles of the CF-specific lung function measures were higher in our cohort than in unrelated patients (i.e. >0.5); however, the entire spectrum of disease severity was represented. Quantitative transmission disequilibrium tests (QTDT) were performed on patient–parent–parent trios to evaluate association between polymorphisms in TGFB1 and lung function in CF patients. QTDT tests for transmission distortion, or a skewing of the expected 50/50 ratio of alleles transmitted from heterozygous parents to offspring, across the spectrum of a quantitative trait. Though Z-scores of the cross-sectional (MaxFEV1CF%) and longitudinal (AvgFEV1CF%) lung function measures were employed, the results were essentially identical when non-transformed traits were used (data not shown). Two SNPs in TGFB1 (rs1800469, ‘−509’ and rs1982073, ‘codon 10’) that have shown variable association with lung function in case–control studies were typed in 472 trios (genotype frequency data provided in Supplementary Material, Table S1). The −509 SNP showed transmission distortion in patients stratified by the cross-sectional measure (P = 0.011, achieving study-wide statistical significance) while transmission distortion of the codon 10 SNP approached significance (P = 0.059; Table 2). No difference in the parental origin of transmitted codon 10 alleles was observed, in contrast to the report by Becker et al. (15) (data not shown). These studies confirm that TGFB1 is a modifier of lung function in patients recruited by the CF Twin and Sibling Study.
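The notion of transmission distortion can be illustrated with the classical, qualitative transmission disequilibrium test, which is considerably simpler than the quantitative test used in this study. The sketch below is illustrative only: the function names and the genotype coding (copies of a reference allele, 0 to 2) are assumptions, the trait value is ignored altogether, and it is not the orthogonal model implemented in QTDT.

```python
def tdt_counts(trios):
    """Count transmissions of allele A from heterozygous parents.

    trios: iterable of (father, mother, child) genotypes, each coded as the
    number of copies of allele A (0, 1 or 2). This coding is an assumption
    made for the sketch.
    Returns (transmitted, untransmitted) counts of allele A.
    """
    transmitted = 0
    untransmitted = 0
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        if n_het == 0:
            continue  # homozygous parents are uninformative
        if n_het == 2:
            # Each copy of A in the child came from a heterozygous parent.
            transmitted += child
            untransmitted += 2 - child
        else:
            other = mother if father == 1 else father
            from_het = child - other // 2  # the homozygous parent gives other/2 copies
            if from_het not in (0, 1):
                raise ValueError("Mendelian inconsistency in trio")
            transmitted += from_het
            untransmitted += 1 - from_het
    return transmitted, untransmitted


def tdt_chi2(transmitted, untransmitted):
    """McNemar-type statistic; about 1 df chi-square under the 50/50 null."""
    total = transmitted + untransmitted
    return (transmitted - untransmitted) ** 2 / total if total else 0.0
```

Under the null hypothesis the two counts are expected to be equal; a large statistic indicates skewing of the 50/50 ratio. The quantitative test extends this idea by asking whether the skew varies with the trait value.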
To determine if CFTR genotype influenced TGFB1 effect upon lung function, patients were divided into two groups of nearly equal size: those who were homozygous for the common ΔF508 mutation (‘ΔF508 homozygotes’) and all other genotypes (‘non-ΔF508 homozygotes’). Eight patients with unknown CFTR genotype were excluded from CFTR-genotype-specific analyses. The ΔF508 homozygote and non-ΔF508 homozygote groups had very similar means and standard deviations for cross-sectional (0.68 ± 0.28, 0.69 ± 0.26, respectively) and longitudinal lung function (0.59 ± 0.24, both groups). When stratified by CFTR genotype it became apparent that non-ΔF508 homozygotes were predominantly responsible for the observed transmission distortion of both SNPs (Table 2). The −509 SNP demonstrated study-wide significant association with the cross-sectional and the longitudinal measure of lung function in non-ΔF508 homozygotes. Association between the codon 10 SNP and the cross-sectional measure of lung function also achieved study-wide significance, and almost attained study-wide significance for the longitudinal measure of lung function in non-ΔF508 homozygotes (Table 2).
Nutritional status, as measured by BMI, is known to be correlated with lung function measures in CF patients (16,17). As expected, the average of Z-scores for BMI between ages 2–20 (AvgBMIZ) was correlated with both cross-sectional and longitudinal lung function measures (Pearson's R = 0.36 and 0.44, respectively; P < 0.0001). QTDT performed in 457 trios in which both pulmonary and BMI data were available revealed that neither TGFB1 SNP showed association with AvgBMIZ (data not shown). To further examine whether TGFB1 alleles were associated with variation in lung function or a composite of both lung function and BMI, we adjusted for the confounding effect of BMI upon lung function using linear regression. Association between the TGFB1 SNPs and the cross-sectional and longitudinal pulmonary traits adjusted for BMI (see Materials and Methods) was analyzed by QTDT. The results were nearly identical to those for the unadjusted lung function measures, demonstrating that variation in nutritional status did not account for the association between TGFB1 alleles and lung function (Table 2, ‘BMI-adjusted’ columns). As noted previously, transmission distortion of both the −509 and codon 10 SNPs was present only in non-ΔF508 homozygotes.
To determine if certain combinations of variants in or near the TGFB1 gene (i.e. haplotypes) were associated with lung function measures, an additional eight SNPs (rs8179181, rs28730295, rs2278422, rs8110090, rs11466334, rs1800472, rs4803455, rs1982072) were typed in 445 trios with pulmonary data. The allele and genotype frequencies observed in our population are similar to the frequencies reported in other Caucasian subjects (data provided in Supplementary Material, Table S1). A monomorphic SNP (rs28730295) and two SNPs with minor allele frequency (MAF) below 0.05 (rs11466334, rs1800472) were dropped from further analysis. None of these additional SNPs were individually associated with lung function measures in CF patients (data not shown), except for rs1982072, which was in almost complete linkage disequilibrium (LD) with the −509 SNP (r2 = 1.0, D′ = 1.0).
Screening tools within PBAT (pedigree-based association testing) software (18) were used to select haplotypes of varying lengths and combinations of seven SNPs (−509, codon 10, rs8179181, rs2278422, rs8110090, rs4803455 and rs1982072) that were most likely to show association with the cross-sectional measure of lung function. Haplotypes comprising −509 and codon 10 showed the highest power, especially in the presence of an additional SNP in intron 5 (rs8179181). Although the −509 and codon 10 SNPs were in strong LD with each other (r2 = 0.65, D′ = 0.99), both SNPs were in low LD with the intron 5 SNP (r2 < 0.01; D′ = 0.10, 0.22; respectively). The latter finding suggested that the improved power upon inclusion of the intron 5 SNP is not due to a high degree of ancestral correlation with the −509 and codon 10 SNPs, but that specific haplotypes composed of all three SNPs have greater association with variation in lung function than any SNP alone.
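The distinction between D′ and r2 drawn above can be made concrete with a short calculation. The following is a minimal sketch using the standard definitions of the two measures for a pair of biallelic loci; the function name and the frequencies in the usage note are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2
    p_a, p_b: frequencies of allele A and allele B (each strictly
              between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D' rescales |D| by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

For example, with hypothetical frequencies `ld_stats(0.3, 0.3, 0.4)`, allele A is always found with allele B, so D′ is 1 (up to rounding) while r2 is only about 0.64, because the allele frequencies differ. This is the general pattern seen for the −509 and codon 10 SNPs, where D′ is near 1 and r2 is lower.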
To test the above hypothesis, haplotypes composed of two SNPs (−509 and codon 10) and three SNPs (−509, codon 10 and intron 5; haplotype frequencies provided in Supplementary Material, Table S2) were tested for association with the cross-sectional lung function measure using QTDT and a second method, Family-Based Association Testing (FBAT). In the group of ‘All CFTR genotypes’, the −509 T–codon 10 C (‘T–C’) haplotype was significantly under-transmitted to patients with increasing lung function in both analyses (i.e. associated with more severe lung disease as indicated by the negative Z-statistic in FBAT; Table 3). The TGFB1 haplotypes also demonstrated interaction with CFTR genotype, as was observed for individual SNPs. In non-ΔF508 homozygotes, the T–C haplotype was under-transmitted with increasing lung function, as noted for the entire group. Intriguingly, when the T–C haplotype was divided into two 3-SNP haplotypes based on the allele present at intron 5, the −509 T–codon 10 C–intron 5 C (‘T–C–C’) haplotype was associated with decreased lung function, while the other haplotype, −509 T–codon 10 C–intron 5 T (‘T–C–T’), showed no transmission distortion, even though the latter haplotype also contains a T at −509 and a C at codon 10. On the other hand, the −509 C–codon 10 T (‘C–T’) haplotype was over-transmitted (i.e. associated with milder lung disease) in non-ΔF508 homozygotes, as shown by the positive Z-statistic. When the intron 5 SNP was included, the three-SNP haplotype −509 C–codon 10 T–intron 5 C (‘C–T–C’) was over-transmitted to non-ΔF508 homozygous patients with improved lung function in both analyses while the −509 C–codon 10 T–intron 5 T (‘C–T–T’) haplotype showed no transmission distortion (Table 3). Thus, haplotype transmission analysis provides further evidence that the association of genetic variants in TGFB1 with CF lung function is better defined by haplotypes rather than individual alleles.
To assess the relative magnitude of the effect of individual TGFB1 haplotypes on the cross-sectional measure of lung function, we performed linear regression analysis under the assumption of additive, dominant and recessive modes of inheritance (MOIs). A measure of goodness-of-fit [Akaike's information criterion (AIC)] was calculated to compare models (19). For both univariate and multivariate regression, models which coded haplotypes according to additive or dominant MOIs had better fits than a model employing a recessive MOI. Since the resulting association estimates were qualitatively similar in both additive and dominant scenarios, only the latter results are shown. Results derived from the use of additive or recessive MOIs are available upon request. In the univariate analysis, only the C–T–C haplotype had a significant relationship with lung function in the complete group of patients (‘All CFTR genotypes’; Table 4). The positive value of the regression coefficient (β) associated with the C–T–C haplotype indicates that this haplotype is associated with better lung function in CF patients. In contrast, the C–T–T haplotype demonstrated no association with variation in lung function, despite sharing with the C–T–C haplotype a C at −509 and a T at codon 10 (Table 4). Furthermore, the regression coefficient associated with the C–T–T haplotype has a negative value, as opposed to the positive value for the C–T–C haplotype. In non-ΔF508 homozygous patients, the C–T–C haplotype was associated with an increase in lung function of 15.9 percentile points, while the T–C–C haplotype was associated with a decrease in lung function of 11.9 percentile points. Neither the C–T–T haplotype nor the T–C–T haplotype demonstrated association with variation in lung function, despite having the same alleles at −509 and codon 10 as the C–T–C and T–C–C haplotypes, respectively (Table 4).
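The comparison of modes of inheritance can be sketched as follows. This is an illustration under stated assumptions rather than the analysis as run: haplotype dosage is assumed to be coded as 0, 1 or 2 copies per patient, the model is a simple one-predictor least-squares fit, and the AIC is written for a Gaussian model up to an additive constant. The function names are hypothetical.

```python
import math


def code_haplotype(copies, mode):
    """Recode haplotype copy number (0, 1, 2) under a mode of inheritance."""
    if mode == "additive":
        return copies
    if mode == "dominant":
        return 1 if copies >= 1 else 0
    if mode == "recessive":
        return 1 if copies == 2 else 0
    raise ValueError("unknown mode: " + mode)


def fit_and_aic(x, y):
    """Least-squares fit of y on x; returns (beta, AIC up to a constant).

    Assumes x is not constant and the fit is not perfect (RSS > 0).
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((b - intercept - beta * a) ** 2 for a, b in zip(x, y))
    k = 3  # intercept, slope and residual variance
    return beta, n * math.log(rss / n) + 2 * k


def compare_modes(copies, trait):
    """Return {mode: (beta, aic)}; the lowest AIC indicates the best fit."""
    return {
        mode: fit_and_aic([code_haplotype(c, mode) for c in copies], trait)
        for mode in ("additive", "dominant", "recessive")
    }
```

Because the three codings use the same number of parameters and the same patients, ranking them by AIC in this sketch is equivalent to ranking them by residual sum of squares. Under the dominant coding, β is simply the difference in mean lung function between carriers and non-carriers of the haplotype.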
To verify that the magnitude of the effect of the C–T–C haplotype (15.9 percentile points) was greater than the effect of individual SNPs, we performed linear regression on single alleles in 232 non-ΔF508 homozygotes. As expected, the regression coefficients were lower, at 12.0 (P = 0.045) and 13.2 percentile points (P = 0.009) for the −509 C and codon 10 T alleles, respectively. Additionally, the −509 C–codon 10 T (C–T) haplotype increased lung function by only 12.0 percentile points (P = 0.019, n = 218). These results demonstrate that the ‘C’ allele of the intron 5 SNP parses the ancestral haplotype block of −509 and codon 10 SNPs into 3-SNP haplotypes that show enhanced effect upon lung function.
The use of univariate regression described earlier allowed us to evaluate the contribution of individual haplotypes to variation in lung disease; however, this modeling could not account for the confounding effect of one haplotype upon another. For example, association of the C–T–C haplotype with increased lung function might simply be reflecting an absence of the T–C–C haplotype. Indeed, as these are the two most common haplotypes in our population, patients who do not carry the former haplotype are likely to carry the latter haplotype. To address this issue, we performed multivariate linear regression to determine the effect of TGFB1 C–T–C and T–C–C haplotypes (the two haplotypes that showed association by univariate analysis) upon lung function while simultaneously accounting for the effects of other haplotypes. When the relationship between lung function and both the C–T–C and the T–C–C haplotypes was modeled in non-ΔF508 homozygotes, the C–T–C haplotype continued to have a positive influence on CF lung disease while the T–C–C haplotype no longer showed a relationship with lung function (Table 4). Patients who carried the former haplotype had an estimated increase in lung function of 12.6 percentile points compared to patients who did not carry this haplotype. This analysis implies that non-ΔF508 homozygote CF patients with at least one C–T–C haplotype will have better lung function than patients carrying any other combination of the three SNPs that comprise this haplotype (i.e. T–C–C, T–C–T, C–T–T and C–C–C).
To confirm this conclusion, we plotted and compared the cross-sectional lung function of patients bearing different combinations of TGFB1 haplotypes (Fig. 1).
Non-ΔF508 homozygous patients carrying at least one C–T–C haplotype had significantly higher median lung function (MaxFEV1CF% = 0.84, n = 106) than patients carrying the T–C–C haplotype (MaxFEV1CF% = 0.58, n = 51; P < 0.0001, Mann–Whitney U test). Of particular note, patients bearing both the C–T–C and T–C–C haplotypes (i.e. C–T–C/T–C–C heterozygotes) had higher lung function (median MaxFEV1CF% = 0.79, n = 42) compared with those carrying the T–C–C haplotype in the absence of the C–T–C haplotype (P = 0.006, Mann–Whitney U test). This demonstrates the positive influence of the C–T–C haplotype on lung function and argues against a negative influence of the T–C–C haplotype. There was no difference in lung function between C–T–C/T–C–C heterozygotes and patients bearing a C–T–C haplotype in trans with a non-T–C–C haplotype. These results were unchanged when patients carrying no C–T–C or T–C–C haplotypes (i.e. C–T–T, T–C–T, C–C–C; n = 14) were also included in the group of T–C–C-bearing patients (data not shown). Thus, the C–T–C haplotype manifests a dominant positive effect on lung function in non-ΔF508 homozygotes.
Using family-based transmission analysis, we have shown that variation in the TGFB1 gene is associated with variability in lung function in CF patients. More importantly, our study design facilitated the discovery of three new attributes of the modifier effect of TGFB1. First, our analyses revealed interaction between TGFB1 and CFTR, as the effect of variation in TGFB1 on lung function was primarily observed in patients who were not homozygous for ΔF508. Secondly, variants in TGFB1 were shown to modify lung function and not nutritional status, despite strong correlation between these two traits. Thirdly, transmission and regression analyses revealed a TGFB1 haplotype that is associated with better lung function, and hence mild lung disease, an observation of substantial therapeutic potential.
Several studies have demonstrated association between SNPs in TGFB1 and variation in CF lung disease, while others have not. There are a number of key differences among these studies including design, power, definition of the pulmonary phenotype and assignment of affection status. The best-powered case–control study to date defined severe and mild lung disease as having forced expiratory volume in 1 s (FEV1) measurements in the lowest or highest quartiles for age, respectively. Of 808 ΔF508 homozygous patients, those with severe lung disease were two times more likely to have the −509 TT and codon 10 CC genotypes than those with mildly impaired lung function. The authors replicated this finding in a second population of 498 CF patients, of whom 70% were ΔF508 homozygotes and 30% had other ‘severe’ CFTR genotypes associated with poorer clinical outcomes. Interestingly, this association was not observed in the sub-group made up of ΔF508 homozygotes only, suggesting that TGFB1 variants had greater modifying capacity in non-ΔF508 homozygotes as seen in our population. In contrast, Arkwright et al. (9) showed in 171 ΔF508 homozygotes that the codon 10 TT genotype was associated with an earlier decline in lung function. Brazova et al. (12) did not see association of TGFB1 SNPs with variation in lung function in a study of 118 Czech CF patients, of whom about half were ΔF508 homozygotes and half had other ‘severe’ CFTR genotypes, and 268 control subjects. More recently, a study examined the rate of decline in lung function in 511 Canadian CF patients stratified by codon 10 genotype. Though a significant difference in the rate of decline was observed between the three genotype groups (CC, CT, TT), the pattern was not entirely consistent with previous reports as this study found that codon 10 heterozygotes (CT) had the smallest annual decline in lung function (20). 
In the only other family-based study, transmission distortion of −509 and codon 10 alleles at TGFB1 was not observed in 34 pairs of extreme concordant and discordant ΔF508 homozygote siblings using a composite measure of lung function and BMI. The current study of 472 trios demonstrated that the −509 C and codon 10 T alleles had a beneficial modifier effect on cross-sectional and longitudinal measures of lung function based upon FEV1. From our results we infer that the absence of the −509 C and codon 10 T alleles should be associated with reduced lung function in CF patients. Indeed, the case–control study of more than 1300 unrelated patients showed that the presence of the alternate alleles (−509 T and codon 10 C) was associated with lower FEV1, fundamentally the same measure of lung function used in this study. Thus, the results of our family-based study are consistent with the association observed in the case–control study conducted by Drumm et al. (10).
The observations of the current study differ from all previous association studies in that the modifier effect of TGFB1 was dependent upon CFTR genotype. To reduce genetic heterogeneity at the CFTR locus, prior studies primarily tested for association of TGFB1 with lung function in patients who were homozygous for the most common CF mutation, ΔF508. We detected association in patients who were not homozygous for ΔF508, but association was not observed in ΔF508 homozygotes. This disparity does not appear to be a function of power since the two groups had nearly identical numbers of patients and had similar means and variances in lung function. We favor the concept that the magnitude of the modifier effect conferred by TGFB1 is smaller in ΔF508 homozygotes than in CF patients carrying other CFTR mutations. By analyzing ΔF508 homozygous patients with extreme phenotypes, the study by Drumm et al. (10) had 80% power to detect modifiers that altered lung function by 0.7%. The 232 trios homozygous for ΔF508 in our study had only 11% power to detect the same change in lung function. However, this number of trios did have reasonable power (80%) to detect modifiers that account for ~11% of the total variance in lung function, an effect size comparable with what was observed in the non-ΔF508 homozygous trios. These findings suggest that having two copies of ΔF508 creates a unique lung disease environment that is less responsive to variation in TGFB1. On the other hand, other CFTR mutations may lead to lung pathology that is more amenable to alteration by modifiers such as TGFB1. The latter concept is supported by the observation that the modifier effect of the glutamate–cysteine ligase catalytic subunit (GCLC) gene upon severity of lung disease could be observed only in CF patients with ‘mild’ CFTR alleles (21).
Patients were enrolled in the CF Twin and Sibling Study based on having a living sibling with CF. Since patients were drawn from the entire spectrum of CF phenotypes, we were able to test for association of TGFB1 alleles with traits that are correlated with lung function, such as nutritional status. Due to the ascertainment of only patients with extreme phenotypes, prior studies could not distinguish with certainty whether TGFB1 variants were associated with lung disease severity, nutritional status or both. The absence of association with a longitudinal measure of nutritional status and the presence of association with lung function after adjustment for differences in nutritional status indicates that variation in TGFB1 primarily modifies CF lung disease. This observation suggests that the lung should be the focus of studies examining the mechanism of the TGFB1 modifier effect in CF.
Haplotype analysis is a powerful tool for discovering causal alleles that are highly associated with typed markers and for determining whether combinations of alleles lead to a greater effect upon phenotype than is caused by individual alleles. While single alleles have been studied as modifiers in CF, TGFB1 haplotypes have not previously been explored. The TGFB1 intron 5 SNP (rs8179181) by itself was not associated with variation in lung function in this study. However, when the C allele of the intron 5 SNP occurred on the same haplotype as −509 C and codon 10 T, this 3-SNP haplotype (C–T–C) was shown to correlate with improved lung function in non-ΔF508 homozygous patients. The intron 5 SNP has no functional role of which we are aware; thus we propose that the −509 C–codon 10 T–intron 5 C haplotype contains a variant (or variants) that modulates TGFB1 expression or function such that a protective outcome is conferred on CF lungs. Another possible mechanism is that the codon 10 T and/or −509 C alleles in combination with an additional variant on this haplotype are necessary to ameliorate CF lung disease. The requirement of patients in this study to have a surviving affected sibling probably accounts for the increased mean of cross-sectional and longitudinal lung function measures compared with the CF population mean (Table 1). The bias of the patients in this study toward better lung function may have aided in the detection of the ‘mild’ TGFB1 C–T–C haplotype.
Variation in TGFB1 has been linked to several chronic pulmonary disorders in addition to CF. For example, the −509 TT genotype was shown to associate with the severity and diagnosis of asthma (22,23), similar to the observations in this study of CF. In contrast, the −509 T and codon 10 C alleles were found to be protective against chronic obstructive pulmonary disease (24–26). These observations suggest that the mechanism of TGFB1 action upon lung function is context-specific. The TGFB1 −509 T and codon 10 C alleles have been linked to higher gene and protein expression than the alternate alleles at these loci (27–32). TGFB1 is known to promote fibrogenesis by stimulating extracellular matrix production (33) and by inhibiting matrix degradation (34). Alleles that decrease levels or activity of TGFB1 may be predicted to stimulate an appropriate balance between tissue repair and fibrosis that improves lung function in CF patients. Our discovery that TGFB1 variants associate with mild lung disease in a subset of CF patients (non-ΔF508 homozygotes) provides new opportunity to test the aforementioned prediction and to possibly develop therapeutics that retard progression of this life-limiting feature of CF.
CF twins and siblings (n = 617) and their parents (n = 606) from 303 families were recruited by the CF Twin and Sibling Study as previously described. Of these families, 88.4% had two children old enough to perform pulmonary function testing, 7.6% had three children and 4.0% had one child. Sixteen dizygotic and thirteen monozygotic (MZ) twin pairs were included. Blood samples were obtained from patients and parents for standard DNA phenol/chloroform extraction. Raw pulmonary function test data, CFTR genotypes and height and weight measurements were obtained from medical records. In some cases in which genotypes were unavailable, CFTR exons were sequenced to identify mutations. Written informed consent or assent was obtained from all subjects. Only families in which both parents of the patients were available (i.e. complete trios) were included in the present study. Analyses were conducted using data from all patients, as well as using data from subgroups of patients categorized by CFTR genotype: those homozygous for the ΔF508 mutation (ΔF508 homozygotes) and those bearing all other genotypes (non-ΔF508 homozygotes).
The forced expiratory volume in 1 s (FEV1), a lung function measure that is highly correlated with survival in CF patients (35,36), was used to derive cross-sectional (MaxFEV1CF%) and longitudinal (AvgFEV1CF%) measures, as previously described. All patients in this study with a longitudinal lung function measure also had a cross-sectional measure. However, 142 patients were too young to have sufficient data for the longitudinal measure. Of the MZ twin pairs in which both twins had pulmonary data, ten pairs had MaxFEV1CF% and four pairs had AvgFEV1CF%. To include as many subjects as possible and to avoid randomly excluding one member of each pair, lung function measures were averaged for MZ twin pairs and included in analyses only if the twins' values were within 10 percentiles of each other, so as not to double-count genetically identical individuals. For MZ twin pairs in which only one of the twins had pulmonary data, that twin's data were included. The average of Z-scores for BMI between ages 2 and 20 (AvgBMIZ) was used as a longitudinal marker of nutritional status. Pulmonary phenotypes were adjusted for nutritional status by regressing lung function measures on AvgBMIZ, and then for each individual the product of the regression coefficient and AvgBMIZ was subtracted from the lung function measure.
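The adjustment for nutritional status described above amounts to the following calculation. This is a minimal sketch assuming a simple one-predictor least-squares regression; the function name and the values in the usage note are hypothetical.

```python
def adjust_for_bmi(lung, avg_bmi_z):
    """Adjust lung function measures for nutritional status.

    lung: list of lung function measures (e.g. MaxFEV1CF%)
    avg_bmi_z: list of AvgBMIZ values for the same patients, same order
    Returns (slope, adjusted), where each adjusted value is the lung
    function measure minus slope * AvgBMIZ for that patient.
    """
    n = len(lung)
    mean_x = sum(avg_bmi_z) / n
    mean_y = sum(lung) / n
    sxx = sum((x - mean_x) ** 2 for x in avg_bmi_z)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(avg_bmi_z, lung))
    slope = sxy / sxx  # regression coefficient of lung function on AvgBMIZ
    adjusted = [y - slope * x for x, y in zip(avg_bmi_z, lung)]
    return slope, adjusted
```

For instance, `adjust_for_bmi([0.5, 0.6, 0.7, 0.8], [-1, 0, 1, 2])` gives a slope of 0.1 and adjusted values that are all approximately 0.6, since in this made-up example the lung function differences are entirely explained by BMI.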
Two SNPs in TGFB1, c.−1347C>T (rs1800469, −509) and c.29T>C (rs1982073, codon 10), were chosen based on the findings of a recent study on modifiers of CF lung disease. These SNPs were genotyped in all individuals using TaqMan Assays-on-Demand and Assays-by-Design, respectively (Applied Biosystems, Foster City, CA). Reactions were performed in 384-well plates in a total reaction volume of 10 µl with 10 ng of template DNA in a Bio-Rad iCycler thermal cycler. Quality control samples were included on each plate. Endpoint fluorescence readings were obtained using an ABI PRISM 7900HT sequence detection system (SDS; Applied Biosystems). Genotype-calling was conducted using SDS v.2.1 software and inheritance checking was performed by SIB-PAIR v.0.99.9 (http://www2.qimr.edu.au/davidD) (37). In addition, −509 genotypes were verified using allele-specific oligonucleotide linear arrays (Roche Molecular Systems, Alameda, CA). The discrepancy rate between these two methods was 0.91%. Unclear or erroneous genotypes were either repeated by TaqMan or sequenced using the BigDye Terminator v.3.1 Cycle Sequencing Kit on an ABI 3100 sequencer (Applied Biosystems).
Tagger (http://www.broad.mit.edu/mpg/tagger) (38), the tag SNP selection algorithm implemented in Haploview (http://www.broad.mit.edu/mpg/haploview) (39), was used to select a minimal set of tag SNPs that adequately represented the genetic diversity within the TGFB1 gene region, based on patterns of LD found in the HapMap CEPH data (40) and Perlegen Caucasian data. The data source at the time of SNP selection was HapMap data release 20/phaseII Jan06 on the NCBI build 35 assembly, dbSNP build 125. Criteria for selection of tag SNPs were an r2 >0.8 with the untyped SNP, an Illumina design score >0.6 and an inter-SNP spacing of no less than 60 bp. Eight tag SNPs (rs8179181, rs28730295, rs2278422, rs8110090, rs11466334, rs1800472, rs4803455, rs1982072) within and 5 kb upstream of the TGFB1 gene were genotyped in 445 trios with pulmonary data using Illumina BeadArray technology (Illumina, San Diego, USA). Allele and genotype frequencies are provided in Supplementary Material, Table S1.
Genotype distributions were tested for Hardy–Weinberg equilibrium using the ‘–unrelatedsOnly’ option in PEDSTATS v.0.6.6 (http://www.sph.umich.edu/csg/abecasis/Pedstats) (41), which performs an exact test in a subset of unrelated individuals so as to avoid bias from correlated genotypes within families. General statistics, linear regression, Mann–Whitney U tests and t-tests were performed in Intercooled Stata 8 (StataCorp, College Station, TX). Because correlation among sibling marker genotypes may invalidate the results of family-based tests of association in the presence of linkage, we tested for linkage of MaxFEV1CF% to the TGFB1 gene region on chromosome 19. Single- and multipoint parametric linkage analysis was performed with Sequential Oligogenic Linkage Analysis Routines (SOLAR v.4.0.7; http://www.sfbr.org/solar), using all SNPs with MAF > 0.05 and two previously typed short-tandem repeat markers downstream of TGFB1 (D19S400, D19S718). No linkage of the cross-sectional measure of lung function to this region was found in CF patients.
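For readers unfamiliar with exact tests of Hardy–Weinberg equilibrium, the following is a minimal sketch of one common formulation, which conditions on the observed allele counts and sums the probabilities of all heterozygote counts no more likely than the observed one. It is an illustration only: whether PEDSTATS computes its P-value in exactly this way is an assumption, and the function name is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact Hardy-Weinberg P-value from genotype counts
    in unrelated individuals."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het):
        # Log-probability of `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + lgamma(n + 1)
                - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Feasible heterozygote counts share the parity of the allele count.
    probs = {het: exp(log_prob(het))
             for het in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_obs = probs[n_ab]
    # Small tolerance so that ties with the observed probability are included.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))
```

A call such as `hwe_exact_p(50, 0, 50)` returns a very small P-value, reflecting a complete absence of heterozygotes at a marker with equal allele frequencies.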
Quantitative transmission disequilibrium testing (QTDT v.2.5.0; http://www.sph.umich.edu/csg/abecasis/QTDT) (42) was used to perform family-based tests of LD. The orthogonal model implemented in QTDT was adopted to test for association. To account for multiple testing in the presence of linked polymorphisms, empirical P-values were calculated from 1000 Monte-Carlo permutations using the ‘-m’ option. Though single-test P-values are reported, those meeting the threshold for a global empirical significance level of 0.05 are denoted by bold font.
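The logic of single-test and global empirical P-values can be sketched as follows. This is a simplified illustration and not the QTDT code: it assumes complete trios, takes the within-family genotype score to be the child's allele count minus the parental mean, and builds the null distribution by flipping the sign of that score at random per family, with the same flip applied to every SNP so that LD between markers is preserved. All function names are hypothetical.

```python
import math
import random


def within_family_scores(child, father, mother):
    """Child allele count minus the mean parental allele count, per trio."""
    return [c - (f + m) / 2 for c, f, m in zip(child, father, mother)]


def _stat(products):
    # Standardized absolute score; the denominator is invariant to sign flips.
    denom = math.sqrt(sum(p * p for p in products))
    return abs(sum(products)) / denom if denom > 0 else 0.0


def permutation_pvalues(w_by_snp, trait, n_perm=1000, seed=0):
    """Return (single-test, global) empirical P-values for each SNP.

    w_by_snp: one list of within-family scores per SNP, ordered by family.
    trait:    quantitative trait of the child in each family.
    """
    rng = random.Random(seed)
    mean_y = sum(trait) / len(trait)
    products = [[w * (y - mean_y) for w, y in zip(ws, trait)] for ws in w_by_snp]
    observed = [_stat(p) for p in products]
    exceed = [0] * len(products)
    max_exceed = [0] * len(products)
    for _ in range(n_perm):
        signs = [rng.choice((-1, 1)) for _ in trait]
        stats = [_stat([s * p for s, p in zip(signs, prod)]) for prod in products]
        top = max(stats)  # largest statistic over all SNPs in this replicate
        for j, obs in enumerate(observed):
            exceed[j] += stats[j] >= obs
            max_exceed[j] += top >= obs
    single = [(e + 1) / (n_perm + 1) for e in exceed]
    global_p = [(m + 1) / (n_perm + 1) for m in max_exceed]
    return single, global_p
```

Comparing each observed statistic with the permutation distribution of the maximum over all SNPs gives a global significance level that accounts for the correlation among linked markers, which a Bonferroni correction would ignore.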
Data analysis tools for continuous traits implemented within PBAT software (http://www.biostat.harvard.edu/~clange/default.htm) were used to estimate the power of the data set to detect an association between TGFB1 haplotypes and MaxFEV1CF% using the conditional mean model (43) while simultaneously minimizing the number of tests for which to correct. Haplotypes with >80% power (at α = 0.05) were analyzed by QTDT and by a second method, the FBAT program (http://www.biostat.harvard.edu/~fbat/fbat.htm) (44). Single-test P-values meeting the threshold for a global empirical significance level of 0.05 are denoted by bold font and P-values achieving significance after Bonferroni correction are marked with an asterisk. Because QTDT does not internally generate haplotypes, PHASE v.2.1 (http://stephenslab.uchicago.edu/software.html) (45) was used to construct haplotypes from the trio data via the ‘-P1’ option. Patients with recombinant or ambiguous haplotypes were excluded. The constructed haplotypes were treated as individual alleles in QTDT. The ‘hbat’ command implemented within FBAT was employed to construct and test haplotypes comprising specified SNPs. The ‘-p’ option in FBAT was employed to compute empirical P-values from Monte-Carlo permutations and also to perform the ‘minimal p test’, which calculates the significance of the smallest P-value. Since the family-based association tests employed (QTDT, PBAT and FBAT) have optimal power when traits are normally distributed, all quantitative phenotypes were also ranked and converted to Z-scores using an inverse normal transformation.
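The rank-based inverse normal transformation mentioned above can be sketched as follows. The rank-to-probability mapping (r − 0.5)/n used here is an assumption, since the exact offset applied in the analysis is not stated, and the function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def inverse_normal_transform(values):
    """Convert a quantitative phenotype to normal Z-scores via its ranks."""
    n = len(values)
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]
```

Because only the ranks enter the transformation, the resulting Z-scores are unaffected by outliers or skew in the original phenotype.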
For univariate and multivariate linear regression of the cross-sectional lung function measure (MaxFEV1CF%) on TGFB1 variants, alleles or haplotypes were coded in an additive, dominant or recessive fashion to determine the most likely mode of inheritance (MOI). For the various models, alleles or haplotypes were coded as follows: additive, 0 (zero copies of the allele/haplotype present), 1 (one copy, i.e. heterozygous) or 2 (two copies, i.e. homozygous); dominant, 0 (zero copies) or 1 (at least one copy); recessive, 0 (zero or one copy) or 1 (two copies). Akaike's information criterion (AIC), a measure of the goodness-of-fit of an estimated statistical model, was used to compare models, with a lower AIC indicating a better fit. Power analysis was performed using the online Genetic Power Calculator (http://pngu.mgh.harvard.edu/~purcell/gpc/) (46).
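The coding scheme and the model comparison can be illustrated with a minimal univariate sketch. It assumes ordinary least squares with Gaussian errors and counts the intercept, slope and residual variance as parameters. The analysis itself was run in Stata, which may count parameters differently; because all three models have the same number of parameters, their ranking is unaffected. Function names are hypothetical.

```python
import math


def code_genotype(copies, model):
    """Code the number of copies (0, 1 or 2) of an allele or haplotype."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    if model == "additive":
        return copies
    if model == "dominant":
        return 1 if copies >= 1 else 0
    if model == "recessive":
        return 1 if copies == 2 else 0
    raise ValueError("unknown model: " + model)


def ols_aic(x, y):
    """AIC of the simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has no variation under this coding")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    k = 3  # intercept, slope and residual variance
    # -2 * maximized Gaussian log-likelihood, plus the 2k penalty.
    return n * (math.log(2 * math.pi * rss / n) + 1) + 2 * k


def compare_models(copies, trait):
    """AIC for each coding of the same genotype data."""
    return {
        model: ols_aic([code_genotype(c, model) for c in copies], trait)
        for model in ("additive", "dominant", "recessive")
    }
```

The coding with the lowest AIC is taken as the most likely mode of inheritance for the variant under consideration.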
This work was supported by the National Institutes of Health (HL68927) and the Cystic Fibrosis Foundation (CUTTIN06P0).
The authors sincerely thank the many CF patients and their families, research coordinators, nurses and physicians involved with the CF Twin and Sibling Study, as this work would not have been possible without their participation. We also thank Nulang Wang for DNA extraction and CFTR genotyping, Michal Kulich, Ph.D. for providing conversion programs for CF-specific percentiles for FEV1 and John McGready, Ph.D. for helpful statistical discussions.
Conflict of Interest statement. None declared.