The twin sample contained 6,805 individuals, of whom 55.3% were male and 44.7% were female. The mean age was 36.2 years (SD 8.6; range 20.4–59.5). Overall, 78.2% of individuals reported lifetime SI and 54.2% lifetime RS, and 19.8% met criteria for lifetime ND. Individuals who reported being regular smokers were administered the Fagerström Questionnaire, from which the FTQ and FTND scores were calculated.
Univariate analysis of ND
Five scenarios, presented in , were used to evaluate the gain in power from adding phenotypes of ungenotyped twins and assigning or estimating their genotypes from the allele frequencies and, when available, the genotype of the co-twin. In all scenarios, twins with both phenotype and genotype data (GP) are included, regardless of whether they belong to a complete twin pair. The first scenario (I) includes only these twins and corresponds to a traditional association analysis, except that such analyses typically include only one twin per pair. This scenario includes 2,321 individuals.
In the second scenario (II), genotypes are also estimated for MZ twins whose co-twin was genotyped (P1). Assuming the measured genotypes are correct, this is equivalent to assigning the co-twin's genotype without error, and it increases the sample size to 2,610 individuals. Alternatively, in scenario III, genotypes for DZ twins whose co-twin was genotyped (P2), regardless of whether the co-twin was phenotyped, are estimated as probabilities for the three genotype categories, derived from the allele frequencies and the genotype of the other twin; this yields 2,815 individuals. Scenario IV uses information from all discordantly genotyped pairs (those in which one member of the pair is genotyped and the other is not) regardless of zygosity, thus combining scenarios II and III and resulting in 3,104 individuals with phenotype-genotype data.
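Because scenario IV pools the twins added in scenarios II and III, the reported sample sizes should be additive. The quick check below uses only the numbers reported above and assumes, as the text implies, that each added twin belongs to exactly one of the two groups (P1 or P2).

```python
# Sample sizes reported in the text for scenarios I-III
n = {"I": 2321, "II": 2610, "III": 2815}

# Twins added to the GP-only analysis (scenario I) in each scenario
added_mz = n["II"] - n["I"]   # MZ twins whose co-twin was genotyped (P1)
added_dz = n["III"] - n["I"]  # DZ twins whose co-twin was genotyped (P2)

# Scenario IV pools both groups of discordantly genotyped twins with the GP twins
n_iv = n["I"] + added_mz + added_dz
print(added_mz, added_dz, n_iv)  # 289 494 3104
```

The total of 3,104 matches the sample size reported for scenario IV.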
In the most complete scenario (V, N = 4,132), genotypes are also estimated, based on the allele frequencies, for MZ and DZ twins from pairs in which neither twin was genotyped, resulting in three possible genotype configurations for MZ pairs (P3) and nine for DZ pairs (P4). This scenario therefore uses all available phenotypic data. Note that twins without phenotypic data (combinations 4, 8 and 12 in ) do not contribute to the analyses, except to the estimation of the allele frequencies, even when they are genotyped (G), and were therefore omitted from .
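The genotype weights for ungenotyped twins can be sketched as follows, assuming a biallelic SNP in Hardy-Weinberg equilibrium and the usual DZ identity-by-descent (IBD) probabilities of 1/4, 1/2 and 1/4 for sharing 0, 1 or 2 alleles. The text does not give these formulas, so this is one standard way to obtain P1–P4, not necessarily the exact computation used here; all function names are hypothetical.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of copies of allele A carried (aa, Aa, AA)


def hwe(p):
    """Genotype frequencies under Hardy-Weinberg equilibrium; p = freq. of A."""
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def mz_given(g1):
    """P1: an MZ twin's genotype equals that of the genotyped co-twin."""
    return {g2: float(g2 == g1) for g2 in GENOTYPES}


def dz_given(g1, p):
    """P2: DZ twin genotype probabilities given the co-twin's genotype g1.

    Assumes IBD sharing of 0, 1 or 2 alleles with probabilities 1/4, 1/2, 1/4.
    """
    q = 1.0 - p
    pop = hwe(p)
    # IBD = 1: one of g1's alleles is shared, the other is drawn from the population
    ibd1 = {
        0: {0: q, 1: p, 2: 0.0},
        1: {0: q / 2, 1: 0.5, 2: p / 2},
        2: {0: 0.0, 1: q, 2: p},
    }[g1]
    return {g2: 0.25 * pop[g2] + 0.5 * ibd1[g2] + 0.25 * float(g2 == g1)
            for g2 in GENOTYPES}


def mz_pair_unknown(p):
    """P3: three possible shared genotypes for an ungenotyped MZ pair."""
    return {(g, g): f for g, f in hwe(p).items()}


def dz_pair_unknown(p):
    """P4: nine possible genotype combinations for an ungenotyped DZ pair."""
    pop = hwe(p)
    return {(g1, g2): pop[g1] * dz_given(g1, p)[g2]
            for g1, g2 in product(GENOTYPES, GENOTYPES)}


# Usage with a hypothetical allele frequency
print(dz_pair_unknown(0.35))
```

Each set of weights sums to one, and the DZ joint distribution has the population genotype frequencies as its marginals, which is what allows these twins to be added without distorting the allele frequencies.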
A breakdown of the sample size by availability of genotype and phenotype data, zygosity and twin order is presented in . Note that FTND is available only for regular smokers; non-regular smokers have missing FTND values, which reduces the overall sample sizes.
Sample sizes for combinations of genotyped and phenotyped twins
The results of the augmented univariate twin analyses for FTND showed that additive genetic factors accounted for slightly over half of the variance in liability to FTND (~55%, CI 37–63%), whereas the contribution of shared environmental factors was small (3%, CI 0–17%). Unique environmental factors accounted for the remaining variance (42%, CI 37–47%). Of the seven SNPs tested, three reached significance at the 0.05 level: rs16969968 and rs1051730, which were previously identified in association analyses, and rs2869546, which is novel. Markers rs16969968 and rs1051730 are highly correlated (D' = 0.98) and therefore likely represent the same association signal. Even though these three SNPs reached genome-wide significance in previous analyses, including one on the same sample, each accounts for less than 1% of the variance in liability to ND (0.35%, CI 0.03–0.99% for rs16969968; 0.30%, CI 0.02–0.99% for rs1051730; and 0.27%, CI 0.01–0.86% for rs2869546), or close to 1% of the genetic variance.
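Under an additive model, a biallelic SNP with allele frequency p and per-allele effect β on a standardized liability contributes 2p(1 − p)β² to the variance. The sketch below shows that formula with hypothetical p and β (neither is reported here) and expresses each reported SNP share relative to the ~55% additive genetic share, assuming that component is the "genetic variance" referred to in the text.

```python
def snp_variance(p, beta):
    """Variance from an additive biallelic SNP: 2p(1 - p) * beta**2."""
    return 2.0 * p * (1.0 - p) * beta ** 2


# Hypothetical allele frequency and per-allele effect (not reported in the text)
example = snp_variance(p=0.35, beta=0.1)  # share of a unit-variance liability

# Reported shares of total liability variance
additive_share = 0.55
snp_shares = {"rs16969968": 0.0035, "rs1051730": 0.0030, "rs2869546": 0.0027}
for snp, share in snp_shares.items():
    # Fraction of the additive genetic component attributable to this SNP
    print(f"{snp}: {share / additive_share:.2%} of A")
```

Even a SNP with a common allele and a per-allele effect of a tenth of a standard deviation explains well under 1% of the liability variance, which is in line with the small estimates reported above.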
By comparing models that include the effect of the measured genotype with models that exclude it, we can evaluate whether, when the SNP is not modeled, the variance it accounts for is absorbed into the residual additive genetic variance or into either environmental variance component. This matters because the estimates of background additive genetic and shared environmental variance are not entirely independent: both derive from the relative magnitudes of the MZ and DZ covariances. The relative differences in unstandardized variance components, together with the chi-squares and P values associated with estimating the allelic effects of the respective SNPs, are presented in and . For the three significant SNPs, the variance accounted for by the SNP is mostly absorbed by the additive genetic variance when the SNP is not modeled, although a small proportion is absorbed by the shared environmental variance component, possibly as a consequence of population stratification or between-family effects of unknown origin. The difference in variance accounted for by unique environmental factors is very small, suggesting that its estimate is not biased.
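The model comparison above can be sketched as a likelihood-ratio test plus a component-by-component difference. This assumes the fitting software reports −2 log-likelihoods and unstandardized A, C and E estimates for both models; the numbers and function names below are hypothetical and only illustrate the computation.

```python
import math


def allelic_lrt(m2ll_without, m2ll_with):
    """Likelihood-ratio chi-square (1 df) for dropping the allelic effect.

    Arguments are -2 log-likelihoods of the models without and with the SNP.
    """
    chi2 = max(m2ll_without - m2ll_with, 0.0)
    # Upper tail of a 1-df chi-square: P(Z^2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def component_shift(without_snp, with_snp):
    """Change in each unstandardized variance component when the SNP is dropped."""
    return {k: without_snp[k] - with_snp[k] for k in ("A", "C", "E")}


# Hypothetical estimates, for illustration only
with_snp = {"A": 0.5480, "C": 0.0300, "E": 0.4200}
without_snp = {"A": 0.5510, "C": 0.0305, "E": 0.4200}
print(component_shift(without_snp, with_snp))
print(allelic_lrt(1000.0, 996.159))
```

In this made-up example, most of the SNP variance moves into A, a little into C and none into E, which is the pattern described for the three significant SNPs.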
Difference in variance components between modeling with or without allelic effects
Difference in variance components and Chi-square associated with allelic effects
Significance of allelic effect
When individual twins who were phenotyped but not genotyped were added, using the genotype of the co-twin or the allele frequencies, the significance levels of the non-significant SNPs differed only slightly between scenarios, as shown in . For the two previously identified SNPs with the largest effects, adding phenotypic information from discordantly genotyped DZ twins slightly increased the significance level. Surprisingly, however, adding this information for discordantly genotyped MZ twins decreased the significance level, to below that obtained using only twins who were both phenotyped and genotyped. For rs2869546, the significance level increased when ungenotyped but phenotyped individuals were added, regardless of zygosity, resulting in a pattern of increasing chi-squares with increasing sample size.
Chi-squares associated with allelic effect of SNPs by scenario
Association with other smoking related phenotypes
We further evaluated whether the replicated SNPs associated with FTND also conferred risk for other smoking-related phenotypes. Given the correlated liabilities of TI, RS and ND, we were particularly interested in whether SNPs in the CHRNA receptor gene also contribute to variability in TI and RS. We also included CS, a dichotomous ND measure (FTND ≥ 7), and the individual items comprising the FTND scale: number of cigarettes smoked (Cigs), time to first cigarette after waking (Wake), difficulty refraining from smoking where it is prohibited (Refrain), smoking when ill in bed (Ill), smoking more in the morning (Morning) and whether the first cigarette of the day is the most satisfying (First).
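The FTND total and the CS indicator can be sketched as follows. The cut-points are those of the standard six-item FTND (0–3 points for Cigs and Wake, 0–1 for each of the other four items, total 0–10); the text does not list the scoring used in this sample, so treat them as an assumption, as are the hypothetical field and function names.

```python
from dataclasses import dataclass


@dataclass
class FTNDResponses:
    cigs_per_day: int        # Cigs
    minutes_to_first: int    # Wake: minutes from waking to first cigarette
    refrain_difficult: bool  # Refrain
    smokes_when_ill: bool    # Ill
    more_in_morning: bool    # Morning
    first_most_valued: bool  # First


def ftnd_score(r: FTNDResponses) -> int:
    """Total FTND score (0-10) using standard FTND cut-points (assumed)."""
    if r.cigs_per_day <= 10:
        cigs = 0
    elif r.cigs_per_day <= 20:
        cigs = 1
    elif r.cigs_per_day <= 30:
        cigs = 2
    else:
        cigs = 3
    if r.minutes_to_first <= 5:
        wake = 3
    elif r.minutes_to_first <= 30:
        wake = 2
    elif r.minutes_to_first <= 60:
        wake = 1
    else:
        wake = 0
    yes_no = (r.refrain_difficult, r.smokes_when_ill,
              r.more_in_morning, r.first_most_valued)
    return cigs + wake + sum(int(b) for b in yes_no)


def is_cs(r: FTNDResponses) -> bool:
    """Dichotomous ND measure CS, defined in the text as FTND >= 7."""
    return ftnd_score(r) >= 7


heavy = FTNDResponses(25, 10, True, False, True, True)
print(ftnd_score(heavy), is_cs(heavy))
```

Because Cigs and Wake are the only items scored 0–3, they can contribute up to six of the ten points, which is why they carry the most information in the total score.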
The chi-square test statistics for the allelic effects are presented in . For these analyses, we used the mixture approach and thus included all available phenotypes. As a result, sample sizes varied by phenotype: more individuals had non-missing values for TI and RS than for FTND, which was obtained only from individuals who had smoked regularly. The three SNPs identified for FTND did not contribute significantly to variability in TI or RS; some of the other SNPs appeared to be associated with these phenotypes, but their allelic effects were not statistically significant. The same three SNPs did contribute to risk for CS, but to a lesser degree than for FTND. Not surprisingly, the two items contributing the most information to the FTND score, Cigs and Wake, showed the most notable results. The two previously replicated SNPs appear to be most strongly associated with Cigs, whereas the third SNP (rs2869546) appears most strongly associated with Wake and with smoking when ill (Ill). No significant associations were found for Refrain, Morning and First.
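The mixture approach weights the pair likelihood over all genotype configurations that are possible for the pair. The text does not spell out the likelihood, so the sketch below rests on assumptions: a continuous, normally distributed phenotype, an additive SNP effect on the means, and the standard ACE covariance structure (MZ covariance A + C, DZ covariance 0.5A + C). Function and parameter names are hypothetical, and single twins would use the corresponding univariate density (not shown).

```python
import math


def bvn_logpdf(y1, y2, m1, m2, var, cov):
    """Log density of a bivariate normal with equal variances and covariance cov."""
    det = var * var - cov * cov
    d1, d2 = y1 - m1, y2 - m2
    quad = (var * d1 * d1 - 2.0 * cov * d1 * d2 + var * d2 * d2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def pair_loglik(y1, y2, geno_probs, mu, beta, a, c, e, zygosity):
    """Mixture log-likelihood of one twin pair under an ACE model with a SNP effect.

    geno_probs maps (g1, g2) -> probability; a fully genotyped pair has a
    single entry with probability 1.
    """
    var = a + c + e
    cov = a + c if zygosity == "MZ" else 0.5 * a + c
    terms = [
        math.log(w) + bvn_logpdf(y1, y2, mu + beta * g1, mu + beta * g2, var, cov)
        for (g1, g2), w in geno_probs.items()
        if w > 0.0
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Ungenotyped MZ pair, hypothetical allele frequency 0.35 (weights q^2, 2pq, p^2)
mz_unknown = {(0, 0): 0.4225, (1, 1): 0.455, (2, 2): 0.1225}
print(pair_loglik(0.2, -0.1, mz_unknown, 0.0, 0.1, 0.55, 0.03, 0.42, "MZ"))
```

A fully genotyped pair is the special case with a single configuration of weight 1, so genotyped and ungenotyped twins enter the same likelihood, which is what allows all available phenotypes to be used.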
Chi-squares for association of SNPs with smoking related phenotypes