Our proposed method can be used to estimate relative risks under a log-additive model using imputed genotypes from case-parent triads. These estimates can subsequently be used for meta-analysis. This method also provides a way to increase the number of triads that can be included for a particular estimation of relative risk, by using imputation software to fill in SNPs that are sporadically missing (those that cannot be “called”).
Perhaps the simplest available method for using imputed genotypes is to select the most likely genotype at each locus: one takes the highest of the 3 posterior probabilities and treats the corresponding allele count as if it were a measured genotype. In fact, for many SNPs, the linkage disequilibrium structure is strong enough that the most likely genotype corresponds to the actual genotype most of the time. For our chromosome 22 data, there was disagreement between the measured and “most likely” genotypes for approximately 1 in every 2,000 SNPs. The corresponding proportion of SNPs/triads with apparent Mendelian inconsistencies was 1/3,000. Thus, it is not surprising that log-linear analyses yielded high correlations between results based on our method and results based on these assigned values (excluding Mendelian inconsistencies). One issue with using “most likely” genotypes, however, is possible underestimation of the uncertainties associated with relative risk estimation. A related issue is that the accuracy of the “most likely” genotypes can vary across SNPs. For some SNPs in our chromosome 22 data, the percentage of disagreement between the measured genotype and the “most likely” genotype was as high as 21%, and the percentage of apparent Mendelian inconsistencies ranged up to 4.7%. For these few badly imputed SNPs, which cannot readily be identified, our method outperformed the “most likely” method, presumably because it effectively removes families with poor imputation genotype scores, namely those residing far away from the 15 nodes.
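To make the “most likely” genotype rule concrete, the following is a minimal Python sketch, not code from our analysis. The function name `most_likely_genotype` and the posterior probabilities are hypothetical and chosen only for illustration. It shows how a confidently imputed genotype and a poorly imputed one receive the same hard call once the posterior probabilities are discarded, which is the source of the understated uncertainty noted above.

```python
def most_likely_genotype(posteriors):
    """Return (allele_count, probability) for the most probable genotype.

    `posteriors` holds the imputation posterior probabilities for
    0, 1, and 2 copies of the variant allele, in that order.
    """
    if len(posteriors) != 3:
        raise ValueError("expected exactly 3 posterior probabilities")
    allele_count = max(range(3), key=lambda g: posteriors[g])
    return allele_count, posteriors[allele_count]


# Illustrative posteriors (made up, not taken from the chromosome 22 data).
well_imputed = (0.01, 0.98, 0.01)
poorly_imputed = (0.05, 0.50, 0.45)

for label, post in [("well imputed", well_imputed),
                    ("poorly imputed", poorly_imputed)]:
    call, prob = most_likely_genotype(post)
    # Both genotypes are called as 1 copy, but with very different support.
    print(f"{label}: call={call}, posterior={prob:.2f}")
```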
Although our results suggest that imputed data performed very well in these analyses, it is worth noting that relative risks are estimated with case-parent data, while odds ratios are estimated with case-control data. For a common disease such as asthma, this distinction could require a nullward correction of the case-control estimates (or an inflation away from the null of the triad-based estimates) to ensure full comparability before combining evidence across data from case-control and case-parent study designs. One would need a way to estimate $b_0$, the baseline risk among those carrying no copies of the variant, in order to translate a relative risk to the corresponding odds ratio. For example, suppose that $b$ is the proportion of children who develop asthma in the population under study and $p$ is the prevalence of the putative risk-related variant allele, with a relative risk of $R_1$ for a single copy and $R_2$ for 2 copies. We then can apply Hardy-Weinberg equilibrium to approximate $b = (1 - p)^2 b_0 + 2p(1 - p) b_0 R_1 + p^2 R_2 b_0$, so the baseline risk $b_0$ is approximately equal to $b/[(1 - p)^2 + 2p(1 - p)R_1 + p^2 R_2]$. Then, the odds ratio for inheritance of a single copy would be estimated by plugging the estimates for $b_0$ and $R_1$ into $R_1(1 - b_0)/(1 - R_1 b_0)$.
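The conversion from relative risk to odds ratio described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, and the input values (a 10% population risk, an allele prevalence of 0.3, and log-additive relative risks of 1.5 and 2.25) are invented for the example rather than taken from any data set.

```python
def baseline_risk(b, p, r1, r2):
    """Baseline risk b0 for noncarriers, assuming Hardy-Weinberg equilibrium.

    b  : overall proportion of children who develop the disease
    p  : prevalence of the variant allele
    r1 : relative risk for 1 copy; r2 : relative risk for 2 copies
    """
    denominator = (1 - p) ** 2 + 2 * p * (1 - p) * r1 + p ** 2 * r2
    return b / denominator


def odds_ratio_single_copy(b, p, r1, r2):
    """Odds ratio for 1 copy versus 0 copies: R1(1 - b0) / (1 - R1*b0)."""
    b0 = baseline_risk(b, p, r1, r2)
    if r1 * b0 >= 1:
        raise ValueError("implied risk for carriers must be below 1")
    return r1 * (1 - b0) / (1 - r1 * b0)


# Hypothetical inputs: common disease, log-additive model (r2 = r1 ** 2).
b, p, r1 = 0.10, 0.30, 1.5
r2 = r1 ** 2

b0 = baseline_risk(b, p, r1, r2)
odds_ratio = odds_ratio_single_copy(b, p, r1, r2)
print(f"b0 = {b0:.4f}, RR = {r1:.2f}, OR = {odds_ratio:.3f}")
```

With these inputs the odds ratio is somewhat farther from the null than the relative risk, which is the direction of the discrepancy that motivates the correction; for a rare disease ($b_0$ near 0) the two quantities nearly coincide.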
The choice of the cutoff value L determines the weight given to each triad with a non-nodal score. A high cutoff value gives less weight to triads that are not very close to any of the 11 informative nodes, while a low value allows triads relatively far from those nodes to contribute. Our sensitivity analysis suggests that results are not particularly sensitive to the choice of this parameter.
It is possible to modify our method by instead using posterior probabilities for the 3 possible genotypes directly. In this alternative approach, each family in the sample contributes a sum of increments that are weighted by the product of 3 respective genotype probabilities. For example, the $i$th family will contribute to the statistic an amount equal to [equation missing]. Let the total weight for the $i$th triad be defined by [equation missing]. Once contributions of all triads in the sample are added, the statistic is estimated as [equation missing], and the risk estimate is [equation missing]. This approach provides risk estimates that are similar to those from the method described in Materials and Methods assuming a log-additive risk model, but its $r^2$ with estimates based on genotyped markers was not quite as high as that based on the method described above ($r^2 = 0.93$ vs. $r^2$ = [value missing]).
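The weighting idea in this alternative approach, a product of 3 genotype probabilities for each possible triad, can be illustrated with a short Python sketch. It is an assumption-laden illustration and not the estimator itself: the function names are hypothetical, the posterior probabilities are invented, and "informative" is taken here to mean that at least one parent is heterozygous, which reproduces the counts of 15 nodes and 11 informative nodes but is our reading rather than a definition stated in this section.

```python
from itertools import product


def transmissible(genotype):
    """Allele counts (0 or 1) that a parent with this genotype can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def mendelian_triads():
    """All (mother, father, child) genotype triples consistent with Mendel."""
    triads = []
    for m, f in product(range(3), repeat=2):
        children = {a + b for a in transmissible(m) for b in transmissible(f)}
        triads.extend((m, f, c) for c in sorted(children))
    return triads


def triad_weights(post_m, post_f, post_c):
    """Weight each consistent triad by the product of 3 posterior probabilities."""
    return {(m, f, c): post_m[m] * post_f[f] * post_c[c]
            for (m, f, c) in mendelian_triads()}


nodes = mendelian_triads()
# Assumed definition: a node is informative if a parent is heterozygous.
informative = [t for t in nodes if t[0] == 1 or t[1] == 1]
print(len(nodes), len(informative))

# Hypothetical posteriors for one family (0, 1, 2 copies, in that order).
weights = triad_weights((0.1, 0.8, 0.1), (0.9, 0.1, 0.0), (0.3, 0.7, 0.0))
best = max(weights, key=weights.get)
print(best, round(weights[best], 3), round(sum(weights.values()), 3))
```

The sum of the weights falls below 1 whenever the imputed posteriors place probability on Mendelian-inconsistent combinations, so a family whose imputed genotypes conflict contributes less in total.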
The current version of the MACH program does not make inferential use of the family structure, and imputation software that exploits that structure could presumably work even better for analysis of family data. The posterior probabilities from such imputed genotypes, together with multiple imputation, could also be used for mapping of genetic variants associated with quantitative traits (e.g., by employing a likelihood-based method, such as quantitative polytomous logistic regression (12)) or could be used for extended pedigrees (e.g., by employing a method that can handle data with such pedigree structures, such as the pedigree disequilibrium test (13)).
For a condition with onset in early life or for a pregnancy complication, the maternal genome may also contribute to risk, so it may be of interest to carry out meta-analyses based on possible maternal effects (14). This can also be done using the multiple imputation method we have described. If both maternal and offspring-based effects are found for the same SNP, one can fit log-linear models that include both, as well as characterize possible synergistic effects between the mother and her offspring (15).
In summary, we have shown via simulations and a real data example that estimates based on our approach applied to imputed genotypes agree well with results based on the corresponding measured genotypes using maximum likelihood estimation and a log-linear model. It is reassuring that estimation based on imputed genotypes is consistently very close to what would have been estimated on the basis of actual genotyping. This method will be useful to consortia of investigators who wish to combine data from case-control and case-parent genome-wide association studies in meta-analyses.