The LRT we present in this paper expands upon the previous work of Weinberg [1999a] by allowing unaffected siblings to contribute to the EM algorithm when parental data are unavailable. Overall, our power calculations indicate that using genotype data from unaffected siblings improves the power of the LRT to detect risk due to the child's genotype, the maternal genotype, and POO effects under varying amounts of missing parental genotypes. When the child's genotype risk is of interest, adding one unaffected sibling appears to give the largest gain in power in the models we examined. A second unaffected sibling adds comparatively little power to the LRT in these scenarios. However, for a rare disease such as autism or neural tube defects, where power is severely constrained by the small number of affected individuals available, it may well be worth genotyping additional unaffected siblings. We see a similar improvement in power for the test of maternal genotype risk, particularly when all families are missing one parent. As expected, incorporating the genotypes of unaffected siblings also increases the power to detect POO effects when data are missing for one or both parents. For the simulated examples, the power to detect POO effects almost completely recovers to the level of the fully genotyped dataset (see … and page 17) when two unaffected siblings are used and all families are missing one parent.
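The sketch below shows why an unaffected sibling helps when a parent is untyped. It computes the posterior distribution of a missing father's genotype from the mother's genotype and the genotypes of her children. It is only an illustration under our own assumptions, not the paper's EM implementation:

- genotypes are coded as counts of the variant allele (0, 1, 2);
- the father's prior follows Hardy-Weinberg proportions with a known variant allele frequency `q`;
- disease effects are ignored (a null model), so affected and unaffected children enter identically.

The function names are hypothetical.

```python
from itertools import product


def transmit_probs(g):
    """Probability that a parent with genotype g (0, 1 or 2 variant copies)
    transmits the reference (0) or the variant (1) allele."""
    p = g / 2
    return {0: 1 - p, 1: p}


def child_prob(c, m, f):
    """Mendelian probability of child genotype c given parental genotypes m, f."""
    pm, pf = transmit_probs(m), transmit_probs(f)
    return sum(pm[a] * pf[b] for a, b in product((0, 1), repeat=2) if a + b == c)


def missing_parent_posterior(m, children, q):
    """Posterior over the untyped father's genotype (hypothetical helper).

    Assumes Hardy-Weinberg priors with variant allele frequency q and no
    disease effect, so every child's genotype is weighted by Mendelian
    transmission only; children are conditionally independent given parents."""
    prior = {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}
    weights = {}
    for f in (0, 1, 2):
        w = prior[f]
        for c in children:
            w *= child_prob(c, m, f)
        weights[f] = w
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}


# Heterozygous mother, heterozygous affected child, plus an unaffected sibling
# homozygous for the variant: a reference-homozygous father is ruled out.
post = missing_parent_posterior(1, [1, 2], q=0.2)
print({f: round(p, 3) for f, p in post.items()})  # {0: 0.0, 1: 0.8, 2: 0.2}
```

With the affected child alone (`children=[1]`), the posterior equals the prior (0.64, 0.32, 0.04). Adding the sibling moves all of the mass onto fathers who carry the variant.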
As has been previously noted for the LRT, multiple affected siblings cannot be considered independent if the locus under investigation is in linkage disequilibrium with the true disease locus, or if other loci contribute to susceptibility to the disease phenotype [Schaid and Sommer 1993; Weinberg et al. 1998; Martin et al. 2003]. Failing to account for the correlation between affected relatives complicates the interpretation of significant results, because it becomes difficult to distinguish evidence for linkage alone from evidence for both linkage and association. Consequently, the LRT is not a valid test of association in the presence of linkage when multiple affected siblings are used. An alternative method for testing maternal genotype and POO effects in family data is the conditioning on exchangeable parental genotypes (CEPG) test. It fits the same model as the LRT, but in a conditional logistic regression framework [Cordell 2004]. Unlike the LRT, this test can incorporate genotypes from multiple affected individuals in a family by using the Huber-White “information sandwich” variance estimator [Huber 1967; Whitehead et al. 1982]. However, the LRT has greater power than the CEPG to detect POO and mother-child genotype effects. This is because the LRT incorporates POO-ambiguous trios (1,1,1), in which mother, father, and child are all heterozygous, via the EM algorithm [Cordell 2004]. In addition, although families with genotypes missing for one parent can be analyzed with the CEPG, this test does not implement the more efficient EM-based missing-data likelihood method that is incorporated into the LRT.
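One way to see why (1,1,1) trios are POO-ambiguous is to ask, for a heterozygous child, which parent could have transmitted the variant allele. The sketch below does this for genotypes coded as variant-allele counts in (mother, father, child) order; `variant_origin` is a hypothetical helper. It only classifies complete trios. The LRT instead apportions the ambiguous trios probabilistically through the EM algorithm, which this code does not attempt.

```python
from itertools import product


def variant_origin(m, f, c):
    """Which parent transmitted the variant allele to a heterozygous child?

    Genotypes count copies of the variant allele (0, 1, 2). Returns "mother",
    "father" or "ambiguous"; returns None if the child is not heterozygous
    or the trio is not Mendelian-consistent."""
    if c != 1:
        return None
    # The mother can be the source if she carries the variant and the father
    # can still supply a reference allele; symmetrically for the father.
    from_mother = m >= 1 and f <= 1
    from_father = f >= 1 and m <= 1
    if from_mother and from_father:
        return "ambiguous"
    if from_mother:
        return "mother"
    if from_father:
        return "father"
    return None


# Enumerate every parental pair that can produce a heterozygous child.
for m, f in product(range(3), repeat=2):
    origin = variant_origin(m, f, 1)
    if origin is not None:
        print((m, f, 1), origin)
```

Only the (1, 1, 1) trio is classified as ambiguous. Every other consistent trio with a heterozygous child identifies the transmitting parent outright.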
Assumed genetic models can be fit within the log-linear model by restricting the maternal or offspring relative risks (RRs). For example, a dominant model sets the homozygous and heterozygous RRs equal. Specifying a genetic model can improve the power of the LRT if the specified model reflects the true mode of inheritance. However, if the underlying mode of inheritance differs from the specified model, the result can be a substantial loss of power. Starr et al. (2005) showed that this loss of power can be controlled by assuming a log-additive model.
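The sketch below shows how such restrictions tie the two RRs together. The unrestricted model has a heterozygous RR and a homozygous RR for each genotype source; we call them R1 and R2 here, and the labels are ours. Each restriction leaves a single free parameter `r`, so the corresponding test uses one degree of freedom instead of two. The helper name is hypothetical.

```python
import math


def genotype_relative_risks(model, r):
    """Hypothetical helper: relative risks for genotypes carrying 0, 1 or 2
    variant copies, relative to the reference homozygote, under a
    one-parameter genetic model with free risk parameter r > 0."""
    if r <= 0:
        raise ValueError("relative risk must be positive")
    if model == "dominant":
        return (1.0, r, r)        # R1 = R2: one variant copy is enough
    if model == "recessive":
        return (1.0, 1.0, r)      # R1 = 1: only the variant homozygote is at risk
    if model == "log-additive":
        return (1.0, r, r * r)    # R2 = R1**2: each copy multiplies risk by r
    raise ValueError(f"unknown model: {model!r}")


# Under the log-additive model the log RRs are linear in allele count, so
# successive genotypes differ by the same amount, log(r).
log_rr = [math.log(x) for x in genotype_relative_risks("log-additive", 1.5)]
steps = [b - a for a, b in zip(log_rr, log_rr[1:])]
print([round(s, 4) for s in steps])  # [0.4055, 0.4055]
```

The log-additive constraint is a log-linear trend in allele count, which makes it a natural middle ground between the dominant and recessive restrictions.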
We compared the LRT with another family-based association test, FBAT, using data from a bipolar disorder study. Our power calculations under an additive model suggest that the LRT has greater power than FBAT, regardless of allele frequency. Because FBAT is limited to models of child genotype risk, power comparisons for maternal and POO effects could not be performed.
The application of the Combined_LRT to the autism candidate gene RELN demonstrates that adding unaffected siblings gives more significant results for the child genotype risk and POO models than the alternative EM_LRT, which does not incorporate unaffected siblings. The significance of this SNP for the child risk model was seen previously with the PDT and Geno-PDT (global p-values of 0.028 and 0.033, respectively) [Skaar et al. 2005]. However, the LRTs in this paper provide the first significant evidence of POO effects.
In summary, we have shown that the power of the LRT can be improved by using unaffected siblings in the EM algorithm when parental genotypes are missing. This conclusion rests on simulation-based power comparisons for the parent-of-origin and maternal effects models with varying proportions of missing parental genotypes. It is also supported by improved significance in an application to real data from a study of autism. Power calculations for the maternal genotype risk model had previously been performed for triads with missing parents [Weinberg 1999a]. However, these did not include unaffected siblings, and the power to detect POO effects had not been studied. We demonstrated the improved power of the LRT for offspring genotype risk over a similar family-based test, FBAT, under additive and multiplicative models. We also provided a bootstrap approach for calculating appropriate confidence intervals for datasets with missing data. It has been implemented, along with the LRT, in a SAS macro that is available online. These results show that when nuclear families are studied and parental genotypes are missing, bias-resistant likelihood methods can be used to take full advantage of genotype data from unaffected siblings.
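The bootstrap procedure described above lives in the SAS macro and is not reproduced here. As a generic sketch under our own assumptions, a percentile bootstrap that resamples whole families as units could look like the following. The `estimator` argument is a placeholder for refitting the model by EM on each resample, and `family_bootstrap_ci` is a hypothetical name. The toy data are invented for illustration only.

```python
import random
from statistics import mean


def family_bootstrap_ci(families, estimator, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval that resamples whole families.

    families:  list of per-family records; each record is kept intact, so the
               within-family correlation and that family's missing-data pattern
               are carried into every resample
    estimator: function mapping a list of families to a scalar estimate
               (in practice, e.g., a log relative risk refit by EM)
    """
    rng = random.Random(seed)
    n = len(families)
    stats = sorted(
        estimator([families[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    k = int(alpha / 2 * n_boot)  # number of resampled estimates cut from each tail
    return stats[k], stats[n_boot - 1 - k]


# Toy usage: each "family" is just a number and the estimator is the mean.
toy_families = [0.2, 1.1, -0.4, 0.9, 0.5, 1.6, 0.0, 0.7]
print(family_bootstrap_ci(toy_families, mean))
```

Resampling at the family level rather than the individual level is the design choice that matters here. It keeps relatives together, so each resample reproduces the dependence structure and the pattern of missing parents seen in the real data.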