We conducted simulation studies to assess the performance of the proposed approach.
First, we studied age-independent penetrance. We assumed 2 independent disease-related single-nucleotide polymorphisms (SNPs): the study mutation, with minor allele frequency (MAF) 0.01 or 0.001, and an unobserved SNP with MAF 0.2. We assumed a dominant mode of inheritance for both SNPs. The disease and the risk factors were related by a logistic regression model:

$$\operatorname{logit} P(\text{affected} \mid g, r) = a + b\,g + (\log \mathrm{OR})\, r, \qquad (5.1)$$
where $g$ ($r$) is 1 if the genotype of the study SNP (unobserved SNP) is of higher risk and 0 otherwise, and OR is the odds ratio parameter for the unobserved SNP, which takes the value 1 or 2. The marginal penetrance $f_1$ for carriers was fixed at 0.5, and the marginal penetrance $f_0$ for noncarriers was 0.03 or 0.1. The values of the log-OR parameters $a$ and $b$ were determined by the other parameters. The genotypes of parents were generated under Hardy–Weinberg equilibrium and random mating, and the genotypes of offspring were generated independently given the parental genotypes. From a large number of generated families with 3 offspring, we randomly selected 1 000 000 families in which the first offspring was affected and carried the study mutation, and treated them as the source population from which the study sample was collected. A sample of 200, 500, or 1000 families was drawn from this population, and simulation results were based on 100 000 replications. Table 1 reports the relative bias (Rbias) of the estimates, defined as the mean estimated penetrance divided by the true penetrance, minus 1; the empirical standard errors (SE) and mean estimated standard errors (SEE) of the estimates; and the empirical coverage probabilities (ECP) of the penetrances.
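The data-generating steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes model (5.1) has the form logit P(affected | g, r) = a + b g + (log OR) r, that the marginal penetrance of each study-SNP genotype group averages over the unobserved SNP with P(r = 1) = 1 − (1 − 0.2)², and it uses rejection sampling (keep generating families until enough have an affected carrier as first offspring) instead of subsampling a pre-generated pool, which targets the same conditional distribution of families. All function names are hypothetical.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def carrier_prob(maf):
    """P(at least one minor allele) under Hardy-Weinberg equilibrium (dominant coding)."""
    return 1.0 - (1.0 - maf) ** 2


def solve_intercept(f_target, odds_ratio, pi_r, lo=-30.0, hi=30.0, tol=1e-12):
    """Bisection for c with pi_r*expit(c + log OR) + (1 - pi_r)*expit(c) = f_target.

    The left-hand side is increasing in c, so the root is unique.
    """
    log_or = math.log(odds_ratio)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f = pi_r * expit(mid + log_or) + (1.0 - pi_r) * expit(mid)
        if f < f_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def calibrate_logistic(f0, f1, odds_ratio, maf_r=0.2):
    """Return (a, b) so the marginal penetrances equal f0 (g = 0) and f1 (g = 1)."""
    pi_r = carrier_prob(maf_r)
    a = solve_intercept(f0, odds_ratio, pi_r)
    b = solve_intercept(f1, odds_ratio, pi_r) - a
    return a, b


def make_person(g_alleles, r_alleles, a, b, log_or, rng):
    g, r = int(g_alleles > 0), int(r_alleles > 0)  # dominant coding of both SNPs
    affected = rng.random() < expit(a + b * g + log_or * r)
    return {"g_alleles": g_alleles, "r_alleles": r_alleles, "g": g, "affected": affected}


def simulate_family(a, b, odds_ratio, maf_g, maf_r, n_offspring, rng):
    log_or = math.log(odds_ratio)
    # Parents: two independent allele draws per SNP (HWE, random mating)
    parents = [((rng.random() < maf_g) + (rng.random() < maf_g),
                (rng.random() < maf_r) + (rng.random() < maf_r)) for _ in range(2)]
    family = [make_person(pg, pr, a, b, log_or, rng) for pg, pr in parents]
    for _ in range(n_offspring):
        # Each parent transmits a minor allele with probability (allele count) / 2
        g_kid = sum(rng.random() < pg / 2 for pg, _ in parents)
        r_kid = sum(rng.random() < pr / 2 for _, pr in parents)
        family.append(make_person(g_kid, r_kid, a, b, log_or, rng))
    return family


def ascertained_families(n_families, a, b, odds_ratio, maf_g, maf_r=0.2,
                         n_offspring=3, seed=1):
    """Keep families whose first offspring (index 2) is an affected carrier."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_families:
        fam = simulate_family(a, b, odds_ratio, maf_g, maf_r, n_offspring, rng)
        if fam[2]["affected"] and fam[2]["g"] == 1:
            kept.append(fam)
    return kept
```

For example, `a, b = calibrate_logistic(0.03, 0.5, 2.0)` followed by `ascertained_families(1000, a, b, 2.0, maf_g=0.01)` draws 1000 qualifying families. At MAF = 0.001 the acceptance rate drops roughly tenfold, so pre-generating a large pool, as described in the text, is the more practical route at scale.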
Table 1. Age-independent penetrance estimates
Overall, the estimates have minor bias when the disease is rare ($f_0 = 0.03$) and the study mutation is rare (MAF = 0.001), with absolute relative biases of at most 1.2%. A common disease ($f_0 = 0.1$), a larger MAF (0.01) of the study mutation, and a positive effect of the unobserved mutation (OR = 2) have only a small impact on the estimates, with Rbias ranging from −7.9% to 3.4%. In all situations, the SEE are very close to the empirical SE. The relative bias remains stable and small as the sample size increases. We also estimated the penetrance of carriers using only the genotypes of unaffected relatives, assuming zero penetrance for noncarriers. The resulting Rbias is generally small when $f_0 = 0.03$ but can become considerably large when $f_0 = 0.1$ (results not shown).
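The four summary measures reported in Table 1 can be computed from simulation replicates as below. The Rbias, SE, and SEE definitions follow the text; the coverage calculation assumes nominal 95% Wald intervals (estimate ± 1.96 × estimated SE), which this section does not specify. The function name is hypothetical.

```python
from statistics import mean, stdev

Z_95 = 1.959963984540054  # standard normal 97.5% quantile (assumed 95% Wald intervals)


def summarize_replicates(estimates, std_errors, true_value, z=Z_95):
    """Summary measures over simulation replicates of one penetrance parameter."""
    rbias = mean(estimates) / true_value - 1.0  # Rbias = mean estimate / truth - 1
    se = stdev(estimates)                       # empirical SE: SD of the estimates
    see = mean(std_errors)                      # SEE: mean of the estimated SEs
    # A Wald interval est +/- z*s covers the truth iff |est - truth| <= z*s
    hits = [abs(est - true_value) <= z * s for est, s in zip(estimates, std_errors)]
    ecp = sum(hits) / len(hits)
    return {"Rbias": rbias, "SE": se, "SEE": see, "ECP": ecp}


res = summarize_replicates([0.48, 0.52, 0.50, 0.55], [0.02] * 4, true_value=0.5)
```

Comparing `SE` with `SEE` is what the statement "the SEE are very close to the empirical ones" refers to; a well-calibrated variance estimator makes the two agree.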
Table 1 also shows that the Rbias for a mutation with MAF = 0.001 tends to be smaller than that for a mutation with MAF = 0.01. Additional simulation results show that the relative biases grow as the MAF increases. For example, with the other parameters the same as those in Table 1, an MAF of 0.03 produces relative biases ranging from −10.3% to −23.7%, and an MAF of 0.1 produces relative biases ranging from −41.9% to −65.1%. The proposed approach therefore appears suitable for rare mutations with MAF ≤ 0.01.
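One way to see how MAF changes the family data is to ask how often a carrier proband's parents carry more than one copy of the mutation in total. Rare-variant approximations commonly treat the proband's mutation as coming from a single heterozygous parent; this section does not state whether the proposed likelihood approximation does so, so any link to the reported biases is an assumption. The sketch below only enumerates parental genotypes under Hardy–Weinberg equilibrium, random mating, and dominant carrier coding for the MAFs mentioned above; it does not reproduce the reported biases, and the function names are hypothetical.

```python
from itertools import product


def hwe_probs(maf):
    """Genotype probabilities for 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    q = maf
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def p_multi_allele_parents(maf):
    """P(parents carry >= 2 minor alleles in total | child carries >= 1), random mating."""
    probs = hwe_probs(maf)
    num = den = 0.0
    for gf, gm in product(range(3), repeat=2):
        # The child is a carrier unless neither parent transmits a minor allele
        p_child = 1 - (1 - gf / 2) * (1 - gm / 2)
        w = probs[gf] * probs[gm] * p_child
        den += w
        if gf + gm >= 2:
            num += w
    return num / den


for maf in (0.001, 0.01, 0.03, 0.1):
    print(maf, round(p_multi_allele_parents(maf), 4))
```

For small MAF this probability is roughly proportional to the MAF, so a single-carrier-parent configuration becomes a poorer description of the data as the MAF moves past 0.01.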
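Second, we studied the proposed approach when the penetrance is age dependent. We generated data from the following Cox model with a Weibull baseline hazard function $\lambda_0(t)$ indexed by the parameters $\xi$ and $\psi$:

$$\lambda(t \mid g, r) = \lambda_0(t)\exp\{\beta g + (\log \mathrm{OR})\, r\}, \qquad (5.2)$$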
where $g$ and $r$ are the same as in (5.1), with the same MAFs. The OR was fixed at 1 or 2. The remaining parameters $\xi$, $\psi$, and $\beta$ were determined by 3 cumulative risk probabilities: $p_{30,0} = P(T \le 30 \mid g = 0)$, $p_{60,0} = P(T \le 60 \mid g = 0)$, and $p_{60,1} = P(T \le 60 \mid g = 1)$, where $T$ is the age at onset. To mimic a common disease, we set $p_{30,0} = 0.03$ and $p_{60,0} = 0.09$; to mimic a rare disease, we set $p_{30,0} = 0.01$ and $p_{60,0} = 0.03$. In both situations, we set $p_{60,1} = 0.5$. The current age $a$ of the proband was uniformly distributed on the interval (20, 70), and the ages of the proband's relatives were generated from the uniform distribution on the interval $(a - 5, a + 5)$. Ages, genotypes, and disease status were generated for a large number of families in the same way as in the age-independent setting. In each family, data were generated for 2 parents and 3 offspring. In total, we obtained 1 000 000 families, each with an affected proband (the first offspring) carrying the study mutation. From these families, we sampled 400 or 1000 families and estimated $\xi$, $\psi$, and $\beta$ in model (5.2), ignoring the unobserved mutation. Substituting the estimated parameters gave estimates of the marginal survival functions of carriers and noncarriers. Based on 5000 replications, we calculated the mean estimated survival functions of carriers and noncarriers and the 90% confidence intervals of the survival functions.
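The calibration of $\xi$, $\psi$, and $\beta$ from the three cumulative risks can be sketched as follows. The Weibull parameterization $\Lambda_0(t) = (t/\xi)^\psi$, the use of OR as a hazard multiplier for $r$, and averaging over the unobserved SNP with P(r = 1) = 1 − (1 − 0.2)² are assumptions, as are all function names. Because the marginal survival is monotone in the cumulative hazard, each constraint is solved by bisection, and the two baseline values at ages 30 and 60 pin down the two Weibull parameters. Helper functions also draw ages and ages at onset as described in the text.

```python
import math
import random


def marginal_survival(h, odds_ratio, pi_r):
    """Survival averaged over the unobserved SNP, given the r = 0 cumulative hazard h."""
    return pi_r * math.exp(-odds_ratio * h) + (1.0 - pi_r) * math.exp(-h)


def solve_cumhaz(target_surv, odds_ratio, pi_r, tol=1e-13):
    """Bisection for h with marginal_survival(h) = target_surv (decreasing in h)."""
    lo, hi = 0.0, 1.0
    while marginal_survival(hi, odds_ratio, pi_r) > target_surv:
        hi *= 2.0  # expand the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if marginal_survival(mid, odds_ratio, pi_r) > target_surv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def calibrate_weibull(p30_0, p60_0, p60_1, odds_ratio, maf_r=0.2):
    """Return (xi, psi, beta), assuming Lambda_0(t) = (t / xi) ** psi."""
    pi_r = 1.0 - (1.0 - maf_r) ** 2
    h30 = solve_cumhaz(1.0 - p30_0, odds_ratio, pi_r)  # Lambda_0(30)
    h60 = solve_cumhaz(1.0 - p60_0, odds_ratio, pi_r)  # Lambda_0(60)
    psi = math.log(h60 / h30) / math.log(2.0)          # Lambda_0(60) / Lambda_0(30) = 2 ** psi
    xi = 30.0 / h30 ** (1.0 / psi)                     # (30 / xi) ** psi = h30
    h60_carrier = solve_cumhaz(1.0 - p60_1, odds_ratio, pi_r)  # exp(beta) * Lambda_0(60)
    beta = math.log(h60_carrier / h60)
    return xi, psi, beta


def marginal_risk(t, g, xi, psi, beta, odds_ratio, maf_r=0.2):
    """P(T <= t | g), averaging over the unobserved SNP."""
    pi_r = 1.0 - (1.0 - maf_r) ** 2
    h = math.exp(beta * g) * (t / xi) ** psi
    return 1.0 - marginal_survival(h, odds_ratio, pi_r)


def draw_onset_age(g, r, xi, psi, beta, odds_ratio, rng):
    """Invert Lambda(t | g, r) = exp(eta) * (t / xi) ** psi at a unit-exponential draw."""
    eta = beta * g + math.log(odds_ratio) * r
    return xi * (rng.expovariate(1.0) * math.exp(-eta)) ** (1.0 / psi)


def draw_ages(n_relatives, rng):
    """Proband age a ~ U(20, 70); each relative's age ~ U(a - 5, a + 5)."""
    a = rng.uniform(20.0, 70.0)
    return a, [rng.uniform(a - 5.0, a + 5.0) for _ in range(n_relatives)]
```

Under this sketch, a relative with current age `x` would be classified as affected when `draw_onset_age(...) <= x`; that is one plausible reading of how age-dependent disease status was generated, not a detail stated in the text.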
Presented in and are the results for carriers and noncarriers, respectively, with sample size 1000 and OR = 1 (the unobserved mutation plays no role in the disease). The bias of the estimates decreases dramatically when the MAF of the study mutation decreases from 0.01 to 0.001, showing that the likelihood approximation works well for relatively rare mutations. When the disease becomes common, the proposed method, which uses both affected and unaffected relatives, does not produce extra bias. However, the method that uses only unaffected relatives has much larger bias for a common disease; this extra bias arises from the inappropriate assumption that noncarriers have zero penetrance, which does not hold for a common disease. Other results, for sample size 400 or OR = 2, are presented in Figures s1–s6 of the supplementary material (available at Biostatistics online). In summary, the bias of the penetrance function estimates gets smaller as the sample size increases. The positive effect of the unobserved mutation (OR = 2) has only a limited impact on the penetrance function estimates; the impact is minimal when the MAF of the study mutation is small and the disease is rare.
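The mean estimated survival functions and their 90% intervals summarized above can be computed pointwise on an age grid. This sketch assumes the 90% interval is the 5th to 95th percentile of the replicate estimates at each age (the text does not say how the intervals were formed); the grid and the function name are hypothetical. The corresponding penetrance estimate at age t is 1 − S(t).

```python
from statistics import mean, quantiles


def summarize_curves(curves, true_curve):
    """Pointwise summary of replicate survival-curve estimates on a common age grid.

    curves: list of replicates, each a list of estimated S(t) values at the grid ages.
    Returns, per grid age, (mean estimate, bias, 5th percentile, 95th percentile).
    """
    summary = []
    for j, true_s in enumerate(true_curve):
        vals = [curve[j] for curve in curves]
        cuts = quantiles(vals, n=20, method="inclusive")  # 19 cut points: 5%, ..., 95%
        m = mean(vals)
        summary.append((m, m - true_s, cuts[0], cuts[-1]))
    return summary


res = summarize_curves([[1.0, 0.9], [1.0, 0.8], [1.0, 0.7]], [1.0, 0.8])
```

Plotting the bias column against age, separately for carriers and noncarriers, gives the kind of comparison described for the different MAFs and disease frequencies.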
Finally, we examined the robustness of the method to misspecification of the baseline hazard function. Our simulation studies showed that misspecifying the baseline hazard function could result in bias, whose magnitude depended on the true and the misspecified functions. We do not present these simulation results here but briefly summarize them. If the true baseline hazard function was gamma, Weibull, or log-normal but was misspecified as either of the other two, the resulting penetrance estimates had small bias; if the true baseline hazard function was piecewise constant but was misspecified as Weibull, the bias could be relatively large.
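A deterministic toy example shows why a Weibull curve can fit a piecewise-constant hazard poorly even when it matches the cumulative risk at two ages. The hazard values are hypothetical, and matching at ages 30 and 60 is a simplification of likelihood-based fitting; the code illustrates the mechanism and does not reproduce the authors' simulations.

```python
import math

# Hypothetical piecewise-constant baseline hazard: (start age, end age, rate per year)
PIECES = [(0.0, 35.0, 0.0005), (35.0, 55.0, 0.004), (55.0, math.inf, 0.0005)]


def pc_cumhaz(t, pieces=PIECES):
    """Cumulative hazard of a piecewise-constant hazard at age t."""
    return sum(rate * max(0.0, min(t, end) - start) for start, end, rate in pieces)


def weibull_through(t1, h1, t2, h2):
    """Weibull (xi, psi) whose cumulative hazard (t / xi) ** psi passes through both points."""
    psi = math.log(h2 / h1) / math.log(t2 / t1)
    xi = t1 / h1 ** (1.0 / psi)
    return xi, psi


xi, psi = weibull_through(30.0, pc_cumhaz(30.0), 60.0, pc_cumhaz(60.0))
for age in (20, 30, 40, 45, 50, 60, 70):
    true_risk = 1.0 - math.exp(-pc_cumhaz(age))
    weib_risk = 1.0 - math.exp(-((age / xi) ** psi))
    print(f"age {age}: true {true_risk:.4f}  Weibull {weib_risk:.4f}")
```

The two curves agree at the matching ages but diverge between them, because a single Weibull shape parameter cannot follow a hazard that jumps up and then back down.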