Maternal-fetal genotype (MFG) incompatibility is an interaction between the genes of a mother and offspring at a particular locus that adversely affects the developing fetus, thereby increasing susceptibility to disease. Statistical methods for examining MFG incompatibility as a disease risk factor have been developed for nuclear families. Because families collected as part of a study can be large and complex, containing multiple generations and marriage loops, we create the Extended-MFG (EMFG) Test, a model-based likelihood approach, to allow for arbitrary family structures. We modify the MFG test by replacing the nuclear-family based “mating type” approach with Ott’s representation of a pedigree likelihood and calculating MFG incompatibility along with the Mendelian transmission probability. In order to allow for extension to arbitrary family structures, we make a slightly more stringent assumption of random mating with respect to the locus of interest. Simulations show that the EMFG test has appropriate type-I error rate, power, and precise parameter estimation when random mating holds. Our simulations and real data example illustrate that the chief advantages of the EMFG test over the earlier nuclear family version of the MFG test are improved accuracy of parameter estimation and power gains in the presence of missing genotypes.
Studies show that prenatal environment may contribute to disease risk in offspring later in their lives [Cannon, 1997; Cantor-Graae et al., 2000; Louey and Thornburg, 2005; McKinney et al., 1999]. One example is Maternal-Fetal Genotype (MFG) incompatibility, which arises through an interaction between maternal and fetal gene products [Laing et al., 1995; Marcelis et al., 1998; Ober, 1998]. The underlying genetic basis for this incompatibility is what allows us to study it even decades after the adverse environment has passed.
Sinsheimer et al. [2003] developed the MFG incompatibility test, a likelihood-based, affected-only statistical method for examining MFG incompatibility as a disease risk factor. The original method uses parent-offspring trios in an adaptation of Weinberg’s log-linear method [Weinberg et al., 1998] for estimating genotypic relative risks for maternal and offspring main effects. The MFG test allows the user to jointly model offspring allelic effects, maternal allelic effects, and maternal-offspring genotype interactions, such as maternal-fetal genotype incompatibility. Kraft et al. [2004, 2005] extended the MFG test to allow multiple siblings per nuclear family and to model locus-specific effects of maternal immunological processes, such as prior exposure. Other researchers developed variations of the MFG test for use with nuclear families or with case-mother control-mother data [Chen et al., 2005, 2009; Cordell et al., 2004; Li et al., 2009]. Through the MFG and related tests, MFG incompatibility has been tested as a potential risk factor for diseases as diverse as autism, schizophrenia, and rheumatoid arthritis (RA) [Chen et al., 2005, 2009; Hsieh et al., 2006a; Insel et al., 2005; Palmer et al., 2002, 2006, 2008; Zandi et al., 2006].
Although the MFG test is powerful and adept at jointly modeling effects without confounding them, the restriction to nuclear families forces researchers who have extended family data to make a difficult choice. They must either risk losing power by selecting a single nuclear family per extended pedigree or risk introducing biases by partitioning their extended pedigrees into a number of nuclear families treated as independent. For this reason, we have created the Extended-MFG (EMFG) test to handle arbitrary pedigree structures in testing hypotheses about maternal allelic effects, offspring allelic effects, and maternal-offspring genotype incompatibilities. Before we develop the EMFG test, we review the current, nuclear family MFG test. We then use simulations to illustrate the properties and potential advantages and disadvantages of the EMFG test when compared to the nuclear family MFG test. Finally, we provide a simple illustration of the EMFG test on a real data set consisting of a single large pedigree.
Table I illustrates how the MFG test works in general for MFG incompatibility at a bi-allelic locus. There are seven possible combinations of mother and offspring’s genotypes (Table I, Columns 1 and 2). The most general model (Table I, Column 3) allows the offspring’s disease risk to differ for each MFG combination [Sinsheimer et al., 2003]. In Table I, these risks are designated by δij, where i and j represent the number of variant alleles (allele 2) present in the mother and offspring, respectively. Genotype interaction (MFG incompatibility) is reflected by the lack of constraints on δij that force the multiplicative model δij = ρi ηj and make maternal and offspring effects independent.
When prior evidence exists for a particular maternal-fetal genotype incompatibility mechanism, the number of parameters can be reduced. Two examples are provided in Table I and Figure 1A, B. Consider RHD incompatibility (Table I, Column 4 and Fig. 1A). In this case, allele 2 corresponds to the antigen-coding allele (often coded as D) and allele 1 corresponds to the null allele (often coded as d). RHD incompatibility occurs when the immune system of a mother with genotype 1/1 recognizes the fetus’ protein product from the 1/2 genotype as foreign and mounts an immune response that can be detrimental to her offspring. The potentially detrimental genotype combination has an increased risk of μ over the baseline risk of β. All other maternal-offspring genotype combinations have the baseline risk.
The second example (Table I, Column 5 and Fig. 1B) is derived from RA and HLA-DRB1, where offspring allelic effects are highly significant and non-inherited maternal antigens (NIMA) have been implicated to increase risk of disease in offspring [Harney et al., 2003; Newton et al., 2004; van der Horst-Bruinsma et al., 1998]. Offspring with 1/2 or 2/2 genotypes are at increased risk (ρ1 and ρ2, respectively) over the baseline risk. NIMA effects can occur to 1/1 offspring whose mother has genotype 1/2. This MFG combination carries increased risk to the offspring.
We start our model development with the affected-only, nuclear family MFG [Kraft et al., 2004, 2005], which allows for multiple siblings in a family. The MFG test conditional likelihood controls for ascertainment and has the following form for a completely genotyped family:
Here, G = (G1,…,Gk) represents the observed genotypes of the k affected offspring in the family, and Gr and Gs represent the genotypes of the mother and father. The denominator sums over all possible ordered (phased) genotypes for offspring and parents (g, gr, gs). The vector D = (D1,…,Dk) denotes the disease status of the k offspring in the family, with Dc equal to 1 for each affected offspring c. There are six mating types (MT) to consider for a bi-allelic locus (Table II). Using mating types controls for non-random mating at the trait locus.
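Written out under the definitions above, a conditional likelihood of this kind can be sketched as follows. The symbol Pr(MT(gr, gs)) for the mating-type probability is notation assumed here for illustration, and the layout may differ from the authors' equation (1):

```latex
L(G, G_r, G_s \mid D) =
\frac{\Pr(D \mid G, G_r)\,\Pr(G \mid G_r, G_s)\,\Pr\bigl(MT(G_r, G_s)\bigr)}
     {\sum_{g,\,g_r,\,g_s} \Pr(D \mid g, g_r)\,\Pr(g \mid g_r, g_s)\,\Pr\bigl(MT(g_r, g_s)\bigr)}
```

The numerator is evaluated at the observed genotypes, and the denominator sums the same three factors over every ordered genotype configuration (g, gr, gs), which is what conditions the likelihood on the affected status D.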
The MFG test is an affected-only analysis; however, the genotypes of unaffected or phenotype-unknown offspring contribute if there are missing parental data. In that case, the conditional likelihood (1) must be summed over all possible parental genotypes (gr, gs) consistent with the observed genotypes in the family, which now include the genotypes of unaffected or phenotype-unknown offspring [Hsieh et al., 2006b; Kraft et al., 2004; Minassian et al., 2006; Palmer et al., 2002]. The denominator remains the same. The likelihood for the entire sample is the product of the likelihoods for separate and independent families.
Reliance on mating types limits the traditional MFG test to nuclear family analysis and burdens the model with nuisance parameters. In extended pedigrees, it is unclear how to weight matings where one or both parents are non-founders. Assuming random mating at the locus of interest avoids mating types. Random mating takes place when selection of a mate is independent of marker genotype. Thus, the probability of a mating type equals the product of the genotype frequencies (Table II, Column 3). The model now depends on founder genotypes. Based on this assumption, the conditional likelihood of any pedigree can be evaluated by taking the ratio of two unconditional pedigree likelihoods [Ott, 1974].
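The random mating rule above can be illustrated with a minimal Python sketch. It assumes the six mating types are unordered pairs of parental genotypes, so a pair of unlike genotypes is counted in both parental orders; the function name and the example allele frequency are illustrative choices and are not taken from the Mendel implementation.

```python
from itertools import combinations_with_replacement

GENOTYPES = ("1/1", "1/2", "2/2")


def mating_type_probs(geno_freq):
    """Return Pr(mating type) for the six unordered parental genotype pairs,
    assuming mate choice is independent of genotype (random mating)."""
    probs = {}
    for a, b in combinations_with_replacement(GENOTYPES, 2):
        p = geno_freq[a] * geno_freq[b]
        # A pair of unlike genotypes can arise in two parental orders.
        probs[(a, b)] = p if a == b else 2.0 * p
    return probs


# Illustrative genotype frequencies: Hardy-Weinberg proportions with P(1) = 0.33.
p1 = 0.33
freq = {"1/1": p1 ** 2, "1/2": 2 * p1 * (1 - p1), "2/2": (1 - p1) ** 2}
mt = mating_type_probs(freq)

for pair, prob in mt.items():
    print(pair, round(prob, 4))
```

Because the mating-type probabilities are now functions of only the genotype frequencies, they no longer need to be estimated as separate nuisance parameters.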
The MFG likelihood for a pedigree with n members is defined as the conditional likelihood L(G|D) of the observed genotypes G given the trait phenotypes D in the pedigree. The affected-only analysis is achieved by treating all unaffected individuals as phenotype unknown. Hence, the vector D omits the disease phenotypes of pedigree founders. In contrast, G includes the marker genotypes of founders when these genotypes are known. If users have affected founders, they can be included in the analysis by introducing their parents without phenotype or genotype data. If genotypes are missing for some individuals or genotype phases are unknown, the likelihood is summed over the possible ordered genotypes g that are consistent with the observed genotypes G in the family.
The conditional likelihood for the maternal-fetal genotype incompatibility effects is:
The probability of disease in offspring c is a function of both her and her mother’s genotype. Prior(gj) represents founder j’s genotype frequency. These founder population genotype frequencies are estimated by maximizing the likelihood along with the risk parameters. The second term in the numerator, Pr(Gi|gi), is 1 if the proposed genotype for i, gi, is consistent with the observed genotype Gi, and 0 if it is inconsistent. When Gi is missing, Pr(Gi|gi) = 1. This term is calculated for each family member, regardless of affection status. Offspring and maternal allelic effects, and maternal-fetal genotype interactions are parameterized through Pr(Dc|gc,gr), which is calculated with Trans(gc|gr,gs), the transmission probability for offspring, mother, and father triple (c,r,s). The denominator sums over all possible ordered (phased) genotypes for the n family members, and is similar to standard ascertainment correction [Lange, 2002]. When parental genotypes are unavailable for founders, we treat them as phenotype unknown.
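Assembling the terms just described, the pedigree likelihood can be sketched in LaTeX as below. This is a reconstruction from the verbal definitions only (Prior, Pr(Gi|gi), Pr(Dc|gc,gr), and Trans), so the exact arrangement of the authors' equation (2) may differ; the product over (c, r, s) is assumed to run over all offspring-mother-father triples in the pedigree.

```latex
L(G \mid D) =
\frac{\displaystyle \sum_{g} \prod_{j \in \text{founders}} \mathrm{Prior}(g_j)
      \prod_{i=1}^{n} \Pr(G_i \mid g_i)
      \prod_{(c,r,s)} \Pr(D_c \mid g_c, g_r)\,\mathrm{Trans}(g_c \mid g_r, g_s)}
     {\displaystyle \sum_{g} \prod_{j \in \text{founders}} \mathrm{Prior}(g_j)
      \prod_{(c,r,s)} \Pr(D_c \mid g_c, g_r)\,\mathrm{Trans}(g_c \mid g_r, g_s)}
```

In this form the indicator Pr(Gi|gi) restricts the numerator to genotype configurations consistent with the observed data, while the denominator ranges over all ordered genotypes of the n members.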
To illustrate the adaptability of Pr(Dc|gc,gr), we consider a few pertinent examples. First, consider RHD incompatibility and hemolytic disease of the newborn where disease risk differs for boys and girls.
MFG incompatibility occurs when the mother’s genotype is 1/1 and the offspring’s is 1/2.
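One way to write such a gender-specific RHD penetrance, using the baseline risk β and the male and female incompatibility effects μm and μf that appear in the simulation results, is sketched below. The indicator notation I(·) is an assumption introduced here, not necessarily the authors' notation.

```latex
\Pr(D_c = 1 \mid g_c, g_r) =
\beta \;
\mu_m^{\, I(c \text{ male}) \, I(g_r = 1/1,\; g_c = 1/2)} \;
\mu_f^{\, I(c \text{ female}) \, I(g_r = 1/1,\; g_c = 1/2)}
```

Every mother-offspring combination other than the incompatible one leaves both exponents at zero, so the risk reduces to the baseline β.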
In our second example we modify Pr(Dc|gc,gr) to allow for both NIMA and offspring allelic effects as follows:
Parameters ρ1 and ρ2 represent the relative risk of disease for an offspring who carries one or two copies of a risk allele at the locus of interest compared to an offspring who carries no copies of a risk allele. MFG incompatibility occurs when the mother’s genotype is 1/2 and the offspring’s is 1/1. Since Di is considered missing if i is unaffected or phenotype unknown and the baseline population prevalence β cancels from the conditional likelihood, we fix β = 1, which is equivalent to δ00 = 1 (Table I, Column 3).
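Under the definitions above, the NIMA model with offspring allelic effects can be sketched in the same indicator form. The indicator notation I(·) is assumed for illustration and may not match the authors' display equation.

```latex
\Pr(D_c = 1 \mid g_c, g_r) =
\beta \;
\rho_1^{\, I(g_c = 1/2)} \;
\rho_2^{\, I(g_c = 2/2)} \;
\mu^{\, I(g_r = 1/2,\; g_c = 1/1)}
```

With β fixed at 1, an offspring with genotype 1/1 and a 1/1 mother has relative risk 1, which corresponds to δ00 = 1.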
The EMFG test is implemented in the Mendel software [Lange et al., 2001] for pedigrees of variable size and structure. The current program handles the generalized risk model for a bi-allelic locus (Table I, Column 3). Parameters under specific hypotheses are estimated by placing restrictions on this model. RHD incompatibility without gender effects imposes δ00 = δ10 = δ11 = δ12 = δ21 = δ22 = 1 and estimates only δ01. NIMA sets δ00 = 1 and imposes δ01 = δ11 = δ21 and δ12 = δ22, accounting for the offspring allelic effects. Thus, there are three free parameters, two for offspring allelic effects and one (δ10) accounting for the NIMA effect. Nested models are compared using a likelihood ratio (LR) test. For example, to test for a significant NIMA effect in the presence of an offspring allelic effect, one can compare a full model with three free parameters (NIMA and offspring allelic effects) to a restricted model with two free parameters (offspring allelic effects) using a one-degree of freedom LR test.
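The parameter restrictions and the nested-model comparison described above can be sketched in Python. This is not Mendel code: the function names are hypothetical, the δ table is represented as a dictionary keyed by (mother, offspring) counts of allele 2, and the log-likelihood values passed to the test are assumed to come from some separate model-fitting step.

```python
import math

# (i, j): copies of allele 2 in the mother (i) and the offspring (j).
# Only these seven combinations are compatible with Mendelian transmission.
COMBOS = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]


def rhd_model(mu):
    """RHD incompatibility without gender effects: only delta_01 is free."""
    return {(i, j): (mu if (i, j) == (0, 1) else 1.0) for i, j in COMBOS}


def nima_model(mu, rho1, rho2):
    """NIMA plus offspring allelic effects: three free parameters."""
    offspring_effect = {0: 1.0, 1: rho1, 2: rho2}
    table = {(i, j): offspring_effect[j] for i, j in COMBOS}
    table[(1, 0)] = mu  # mother 1/2, offspring 1/1
    return table


def lr_test_1df(loglik_full, loglik_restricted):
    """Likelihood ratio test for nested models differing by one parameter."""
    stat = max(2.0 * (loglik_full - loglik_restricted), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


print(rhd_model(1.7))
print(nima_model(2.5, 2.0, 2.5))
```

Testing the NIMA effect in the presence of offspring allelic effects would then amount to calling `lr_test_1df` with the maximized log-likelihoods of the three-parameter and two-parameter models.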
The simulations are designed for three purposes. In Simulation Study 1, we demonstrate the statistical properties of the EMFG test under realistic effect sizes for RHD and NIMA incompatibility. In Simulation Study 2, we compare three possible study designs using (a) extended pedigrees in their entirety, (b) one nuclear family per extended pedigree [Palmer et al., 2008], and (c) all nuclear families available from extended pedigrees and treated as independent. We anticipate that design (a) will be superior to (b) in power and (c) in accuracy. In Simulation Study 3, we determine the effects of violating the random mating assumption on parameter estimation, power, and type-I error rate of the EMFG test.
In our simulation studies, we characterize the properties of the EMFG test in terms of its rejection rates and estimation accuracy. Unless otherwise mentioned, founder genotypes are simulated according to Hardy-Weinberg equilibrium (HWE) with allele frequencies P(1) = 0.33 and P(2) = 0.67. Non-founder genotypes are simulated using Mendel’s gene dropping option [Lange et al., 2001]. Samples are simulated with no MFG effect and with detrimental MFG effects ranging from 1.5 to 2.5 in 0.1 increments (results only shown for certain values). We chose this range because of previous estimates of MFG effects [Insel et al., 2005; Kraft et al., 2004; Palmer et al., 2008]. We refer to these simulation conditions as scenarios. For each scenario, 1,000 data sets are simulated. Parameter estimates and their standard errors are averaged over the data sets. The number of pedigrees within a data set varies. For each model, rejection rate and coverage are shown. Coverage is the proportion of 95% confidence intervals that contain the MFG relative risk’s true value. The rejection rate is the proportion of samples where the LR test rejects the null hypothesis of no MFG effect at a significance level of 0.05. In all scenarios, two-sided tests are used. Since each performance statistic p is a proportion estimated from 1,000 data sets, its standard error is $\sqrt{p(1-p)/1000}$.
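As a quick check on the scale of Monte Carlo error in these summaries, the standard error of a proportion estimated from 1,000 data sets follows the usual binomial formula. The short Python sketch below also computes the Hardy-Weinberg founder genotype frequencies implied by P(1) = 0.33; both helper names are illustrative.

```python
import math


def proportion_se(p, n_datasets=1000):
    """Binomial standard error of a rejection rate or coverage estimate."""
    return math.sqrt(p * (1.0 - p) / n_datasets)


def hwe_genotype_freqs(p1):
    """Genotype frequencies under Hardy-Weinberg equilibrium for allele 1 frequency p1."""
    p2 = 1.0 - p1
    return {"1/1": p1 * p1, "1/2": 2.0 * p1 * p2, "2/2": p2 * p2}


se_at_nominal = proportion_se(0.05)
founder_freqs = hwe_genotype_freqs(0.33)

print(round(se_at_nominal, 4))
print({g: round(f, 4) for g, f in founder_freqs.items()})
```

A type-I error rate near 0.05 therefore carries a standard error of roughly 0.007, which is the yardstick for judging whether an observed rejection rate departs from the nominal level.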
For Simulation Study 1, each of 1,000 simulated data sets contains 200 three-generational pedigrees. Only individuals 7 and 8 are affected (with dark shading in Fig. 2). First, we use the EMFG test to estimate the MFG effect in simulated samples with no gender-specific MFG effect (Table III, Scenarios A and B). When μ = 1, the type-I error rate of 0.040 is close to the desired level of 0.05. The EMFG test has 0.816 power to detect an MFG effect of 1.7 when there are no gender effects (Scenario B, model μm = μf). When these same data sets are analyzed under a model that allows for gender differences (μm≠μf), power is reduced to 0.724. In all scenarios, the parameter estimates are close to the true MFG effect. Founder genotype frequencies and their standard errors are the same across models, and are equal to their expected values under HWE ((1/1) = 0.109, SE = 0.012; (1/2) = 0.443, SE = 0.018; (2/2) = 0.448, SE = 0.018). Accurate results are also obtained for data simulated with 1.5≤μ≤2.5 (data not shown).
Several studies found that MFG incompatibility effects are confined to a single gender [Insel et al., 2005; Palmer et al., 2006, 2008]. To mimic this scenario, samples are simulated with males at increased risk of disease but females at baseline risk (μf = 1.0). Under a correct model constraining the female effect to 1, power to detect the MFG effect is 0.536 (Table III, Scenario C). Power decreases because each family has exactly one affected male and one affected female, so there are now only half the individuals contributing to μm’s estimation. Under a model that allows gender differences, parameter estimates are close to the true values and power is reduced to 0.45. Genotype frequency estimates and standard errors are the same as those observed in the previous scenarios. Similar results are obtained for data simulated with any value of μm between 1.5 and 2.5 (data not shown).
To determine type I error, power, coverage, and parameter estimation of the EMFG when offspring allelic effects are present, we simulated data with NIMA MFG incompatibility and offspring allelic effects (Table I, Column 5). A total of 15% of RA patients do not carry the shared epitope encoded by certain HLA-DRB1 alleles [Harney et al., 2003], so bi-allelic gene frequencies are P(1) = 0.39 and P(2) = 0.61 where allele 2 represents the shared epitope. Data are simulated without NIMA (μ = 1) or offspring effects (ρ1 = ρ2 = 1) (Table IV, Scenario A), and with μ = 2.5, ρ1 = 2 and ρ2 = 2.5 (Scenario B). In both scenarios, we fit a full model that estimates NIMA and offspring allelic effects. We also fit three reduced models (results not shown), that represent the null models of (a) no offspring allelic or NIMA effects (μ = ρ1 = ρ2 = 1.0), (b) no offspring allelic effects (ρ1 = ρ2 = 1.0), and (c) no NIMA effect (μ = 1.0). Using LR tests, these three models allow us to test for (a) either offspring or NIMA effects, (b) offspring effects in the presence of NIMA effects, and (c) NIMA effects in the presence of offspring effects.
When simulated under the null hypothesis (μ = ρ1 = ρ2 = 1.0), the parameter estimates (1.001–1.014), coverages (0.941–0.956), and type-I error rates (0.048–0.060) are appropriate. When μ = 2.5, ρ1 = 2.0, and ρ2 = 2.5, parameter estimates are close to their true values (μ̂ = 2.57, ρ̂1 = 2.058, ρ̂2 = 2.575) and coverage is appropriate (0.94–0.95). There is ~80% power to detect all three effects (rejection rate = 0.818 of μ = 1.0, rejection rate = 0.771 of ρ1 = ρ2 = 1.0, and rejection rate = 0.797 of μ = ρ1 = ρ2 = 1.0). Founder genotype frequencies are equal to their expected values under HWE.
To avoid potential biases in analyzing related and sometimes overlapping nuclear families, the current recommended practice when using the MFG test is to select nuclear families from an extended pedigree so that the offspring in different nuclear families are no more related than second cousins [Palmer et al., 2008]. To examine the impact on power, we consider three simulation scenarios involving 80 three-generational pedigrees with affected siblings and first cousins (Fig. 2, individuals 5–8). Table V reports parameter estimates, standard errors, coverage, and rejection rates using the EMFG test. The 80 three-generational pedigrees are analyzed as full pedigrees (Scenario A), as 80 selected nuclear families by removing individuals 3, 4, 7, and 8 (Scenario B), and as 240 independent nuclear families (Scenario C). All three approaches give reasonable parameter estimates (μ̂ range: 0.979–0.993), type-I error rates (~0.060), and coverage (range: 0.942–0.965) when μ = 1.0.
When μ = 1.7, analyzing pedigrees in their entirety produces the most accurate result (μ̂ = 1.691, coverage = 0.953). Analyzing either 80 nuclear families or all 240 nuclear families slightly underestimates the MFG effect (Scenario B: μ̂ = 1.598, coverage = 0.948; Scenario C: μ̂ = 1.623, coverage = 0.932). As expected, analyzing pedigrees in their entirety is more powerful than analyzing one nuclear family per pedigree, but analyzing all 240 nuclear families recovers the power (Scenario A: rejection rate = 0.825; Scenario B: rejection rate = 0.607; and Scenario C: rejection rate = 0.80). More striking trends are seen when μ = 2.5, where there is plenty of power to detect the MFG effect with all three study designs (power > 0.999). Analyzing pedigrees in their entirety gives the most accurate estimates (μ̂ = 2.532, coverage = 0.954). Surprisingly, we see decided underestimation and reduced coverage when analyzing either one nuclear family per pedigree or all nuclear families (Scenario B: μ̂ = 2.231, coverage = 0.900; Scenario C: μ̂ = 2.356, coverage = 0.913).
To investigate whether the parameter underestimation observed in Scenarios B and C results from the random mating assumption, we analyzed the same samples simulated in Scenario C (μ = 1.7 and 2.5) using the original mating type based nuclear family MFG test (Scenario D). Under both values of μ, the parameters are again underestimated (μ̂ = 1.636, coverage = 0.940 and μ̂ = 2.376, coverage = 0.926, for μ = 1.7 and 2.5, respectively). Since we continue to see underestimation of the MFG parameter even when we do not assume random mating, we can rule it out as a contributing factor. When μ = 1.7, the rejection rate is slightly lower than in Scenario C, and this difference proves to be statistically significant according to McNemar’s test (P-value < 0.001). This suggests that assuming random mating when it holds increases the power of the MFG test.
Instead, the underestimation is the result of the loss of information regarding MFG incompatibility that occurs when analyzing extended pedigrees as nuclear families. For our simulations, in order for individual 8 to be MFG incompatible, her mother (individual 5) must have genotype 1/1, meaning that individual 5 cannot be MFG incompatible with her mother (individual 2). As a result, we expect individual 5 to carry the 1/1 genotype more often and the 1/2 genotype less often than individual 6, whose child (individual 7) can be incompatible with his mother (individual 3) even when individual 6 is incompatible with his mother (individual 2). We can make our intuition more concrete by calculating the average genotype frequencies for individuals 5 and 6 over all the simulations under Scenario A where μ = 2.5. The frequency of the 1/1 genotype for individual 5 (f1/1 = 0.166) is greater than the frequency for individual 6 (f1/1 = 0.109), and the frequency of the 1/2 genotype for individual 5 (f1/2 = 0.523) is less than the frequency for individual 6 (f1/2 = 0.664). When analyzed as a nuclear family, individual 5’s decreased tendency to be incompatible with individual 2 leads to an underestimation of μ. On the other hand, the extended pedigree’s likelihood correctly accounts for the relationships among the three generations.
There are additional situations where exploiting extended pedigrees has advantages. For example, if a parent of an affected individual is unavailable, we expect an increase in power if we genotype the parents of the missing parent. Using the 80 three-generational pedigrees from Table V, Scenario A, we deleted genotypes in the second generation so that 40 families are missing individuals 3–6, 20 families are missing individuals 5 and 6, 10 families are missing individual 5, and 10 families are missing individual 6. After analyzing the 80 extended pedigrees with missing genotypes (Table V, Scenario E), we conducted two nuclear family analyses. In the first analysis (Scenario F), we select a nuclear family with at least one genotyped affected offspring from each pedigree, resulting in 80 nuclear families, choosing trios over dyads, and dyads over singletons (individuals with two ungenotyped parents). If two nuclear families fit this criterion, one is chosen at random. Our final sample composition is 20 trios, 20 dyads, and 40 singletons. In the second nuclear family analysis (Scenario G), we analyze all nuclear families with at least one genotyped affected offspring, for a total of 180 nuclear families (40 trios, 60 dyads, and 80 singletons).
When μ = 1.0, parameter estimates (0.972–1.001), type-I error rates (0.047–0.057), and coverage (0.952–0.961) are appropriate if extended pedigrees are analyzed in their entirety, if a single nuclear family per pedigree is analyzed, or if all allowable nuclear families are analyzed. The most dramatic difference from the complete genotype data analyses is the increase in the SEs for μ when the pedigrees with missing data are analyzed as nuclear families (Scenario F: SE = 1.923 vs. Scenario B: SE = 0.277; Scenario G: SE = 0.680 vs. Scenario C: SE = 0.211), whereas the SE for μ is only slightly increased when using an extended pedigree study design (Scenario E: SE = 0.252 vs. Scenario A: SE = 0.219).
When μ = 1.7, the EMFG test still yields reasonable parameter estimates and power in the presence of missing genotypes when extended pedigrees are analyzed in their entirety (μ̂ = 1.697, coverage = 0.959, rejection rate = 0.736). There is a slight power loss compared to the situation where all individuals are genotyped (Scenario A, rejection rate = 0.825). When a single nuclear family per pedigree is analyzed, there is a large reduction in power relative to Scenario B, and much larger standard errors (1.585), but μ is only slightly underestimated (μ̂ = 1.612, coverage = 0.974, rejection rate = 0.222). When all allowable nuclear families are analyzed (Scenario G), μ is slightly underestimated, and power is larger in comparison to using only one nuclear family per pedigree, but is still much smaller than the power seen when analyzing extended pedigrees (μ̂ = 1.624, coverage = 0.960, rejection rate = 0.359). The SE of μ is still large (0.504) but not nearly as large as in Scenario F. Thus, analyzing extended pedigrees provides a dramatic increase in power over nuclear families when there are missing data.
When μ = 2.5 and pedigrees are analyzed in their entirety, parameter estimates are accurate, and power is very high (μ̂ = 2.536, coverage = 0.957, rejection rate ≥ 0.999). When nuclear families are analyzed, the MFG parameter is underestimated, standard errors are large, and power is reduced (Scenario F: μ̂ = 2.31, SE = 0.974, coverage = 0.972, rejection rate = 0.549; Scenario G: μ̂ = 2.306, SE = 0.627, coverage = 0.939, rejection rate = 0.853). Again, the large standard errors for μ offset the bias so that the coverage is not significantly reduced from the nominal value of 0.95.
Because the EMFG test assumes random mating, it is important to investigate violations of this assumption. Violations can occur in different ways. For example, population stratification (PS) may be present. Sinsheimer et al. found that the original MFG test, which does not assume random mating, is unaffected by PS, but it is possible that the EMFG test may be sensitive to it.
To understand the impact of PS, we turn to additional simulations. Each scenario involves one of three levels (No PS, Moderate PS, and Large PS), 1,000 samples, and the null hypothesis of μ = 1.0 or the alternative hypothesis of μ = 2.5. The true MFG effect is the same in the population but the baseline disease risk differs between populations. To allow direct comparison to the nuclear family MFG test, we simulate 240 parent-offspring trios. In the No PS example, trios are sampled from one population using an average of RHD frequencies for Finland and Germany (P(1) = 0.37, P(2) = 0.63). In the second example (Moderate PS), half the trios have Finnish RHD frequencies [Palmer et al., 2002; P(1) = 0.33, P(2) = 0.67], and half have German RHD frequencies [Wagner et al., 1995; P(1) = 0.41, P(2) = 0.59]. In the third example (Large PS), half the trios have Finnish RHD frequencies, and half have Asian RHD frequencies [Reid and Lomas-Francis, 2004; P(1) = 0.10, P(2) = 0.90]. Results are shown in Table VI.
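To see why pooling two populations can violate the random mating assumption, the following Python sketch compares the genotype frequencies of an equal mixture of two Hardy-Weinberg populations with the frequencies expected if the pooled sample were a single randomly mating population. It uses the Finnish and Asian allele frequencies quoted above; the equal mixing weight and the function names are assumptions made for illustration, and the sketch ignores the differing baseline disease risks in the simulations.

```python
def hwe(p1):
    """Genotype frequencies under Hardy-Weinberg equilibrium."""
    p2 = 1.0 - p1
    return {"1/1": p1 * p1, "1/2": 2.0 * p1 * p2, "2/2": p2 * p2}


def pooled_genotype_freqs(p1_a, p1_b, weight_a=0.5):
    """Genotype frequencies in a mixture of two HWE populations."""
    freqs_a, freqs_b = hwe(p1_a), hwe(p1_b)
    return {g: weight_a * freqs_a[g] + (1.0 - weight_a) * freqs_b[g]
            for g in freqs_a}


finnish_p1, asian_p1 = 0.33, 0.10
pooled = pooled_genotype_freqs(finnish_p1, asian_p1)

# Expectation if the pooled sample were one randomly mating population.
pooled_p1 = 0.5 * finnish_p1 + 0.5 * asian_p1
expected = hwe(pooled_p1)

excess_11 = pooled["1/1"] - expected["1/1"]
for g in pooled:
    print(g, round(pooled[g], 4), round(expected[g], 4))
```

The mixture carries more homozygotes and fewer heterozygotes than a single random-mating population with the same allele frequency, so genotype-frequency products no longer describe the parental mating types in the pooled sample.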
For the samples with μ = 1.0 and No PS or Moderate PS, coverage (0.954 and 0.953), type-I error rate (0.050 and 0.052), and parameter estimation (0.983 and 0.970) are appropriate. Under Large PS, type-I error rate is inflated (0.085), coverage is low (0.929), and μ is underestimated (0.804). For samples with μ = 2.5 and No PS (μ̂ = 2.506, coverage = 0.960, power = 0.988) or Moderate PS (μ̂ = 2.48, coverage = 0.949, power = 0.985), the results are very similar. In the Large PS scenario, the MFG effect is underestimated (2.054), coverage is low (0.900), and power is greatly reduced (rejection rate = 0.636). If we analyze these same samples using the nuclear family MFG test [Kraft et al., 2004, 2005], we obtain an appropriate parameter estimate and coverage (μ̂ = 2.492, coverage = 0.970). However, power is reduced (rejection rate = 0.768), reflecting the small proportion of Asians who are affected because of MFG incompatibility.
Because the EMFG test is sensitive to random mating violation, we suggest testing for departures from this assumption in a randomly selected sample of control parent pairs from the same study population. One option is to test for departure from HWE. A better option is to test for random mating violation by comparing observed parental mating type frequencies to expected mating type frequencies assuming random mating (Table II). Let P(MTj)Ho and P(MTj)Ha represent the probability of mating type j under the null (random mating holds) and alternative (random mating violated) hypotheses, respectively. If we denote the corresponding multinomial mating type counts in the control data by Nj, then the appropriate test statistic is
This statistic is compared to a χ2 distribution with three degrees of freedom. When we apply the HWE and LR tests to 1,000 samples, each composed of 120 Finnish control parents and 120 Asian control parents (Table VI, Scenario C), non-random mating that leads to 20% underestimation of μ (Table VI, Scenario D) can be detected with approximately 70% power under the LR test (5) vs. 40% power under a HWE test. The random mating test is more powerful despite having more degrees of freedom and allows us to determine situations where the nuclear family MFG test may be preferable to the EMFG test.
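A likelihood ratio statistic of the kind described, 2 Σj Nj ln[P(MTj)Ha / P(MTj)Ho], can be sketched in Python as follows. The sketch assumes six unordered mating types, estimates P(MTj)Ha by the observed proportions Nj/N, and estimates P(MTj)Ho from products of the genotype frequencies observed among the parents, which leaves 5 − 2 = 3 degrees of freedom. The function name and input layout are hypothetical, and this is an illustration, not the authors' implementation.

```python
import math
from itertools import combinations_with_replacement

GENOTYPES = ("1/1", "1/2", "2/2")


def random_mating_lr_test(counts):
    """counts maps unordered genotype pairs, e.g. ("1/1", "1/2"), to N_j.
    Returns the LR statistic and its chi-square (3 df) p-value."""
    pairs = list(combinations_with_replacement(GENOTYPES, 2))
    n_pairs = sum(counts.get(pair, 0) for pair in pairs)

    # Null hypothesis: genotype frequencies estimated from all 2N parents.
    geno_freq = {g: 0.0 for g in GENOTYPES}
    for a, b in pairs:
        c = counts.get((a, b), 0)
        geno_freq[a] += c
        geno_freq[b] += c
    geno_freq = {g: v / (2.0 * n_pairs) for g, v in geno_freq.items()}

    stat = 0.0
    for a, b in pairs:
        c = counts.get((a, b), 0)
        if c == 0:
            continue  # an empty cell contributes nothing to the sum
        p_alt = c / n_pairs
        p_null = geno_freq[a] * geno_freq[b] * (1.0 if a == b else 2.0)
        stat += 2.0 * c * math.log(p_alt / p_null)
    stat = max(stat, 0.0)  # guard against tiny negative rounding error

    # Closed-form survival function of a chi-square variable with 3 df.
    p_value = (math.erfc(math.sqrt(stat / 2.0))
               + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))
    return stat, p_value


# Hypothetical counts with an excess of like-with-like matings.
example = {("1/1", "1/1"): 30, ("1/1", "1/2"): 20, ("1/1", "2/2"): 5,
           ("1/2", "1/2"): 60, ("1/2", "2/2"): 40, ("2/2", "2/2"): 85}
print(random_mating_lr_test(example))
```

Counts that match the random mating expectation exactly give a statistic of zero, while strongly assortative counts give a large statistic and a small p-value.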
To illustrate the application of Mendel’s EMFG test and compare using an extended pedigree with nuclear families on actual data, we analyze a single, large pedigree from an internal isolate in Finland. This pedigree is part of a previously described genetic study of Finnish individuals affected with schizophrenia, schizoaffective psychosis disorder, and schizophrenia spectrum disorder [Hovatta et al., 1999; Ekelund et al., 2000, 2001; Paunio et al., 2001]. The pedigree spans 5 generations with 13 founders, 19 married-ins and 62 offspring; 14 individuals are known to be affected. A total of 51 of the 93 pedigree members have RHD genotypes. In previous studies, nuclear families were extracted from this pedigree and included in the data set for analysis to examine the question of whether RHD incompatibility is a risk factor for schizophrenia [Kraft et al., 2004; Palmer et al., 2002, 2008].
We estimate the RHD incompatibility effect in this large extended pedigree by using three study designs similar to those used in Simulation Study 2: using the extended pedigree, using nuclear families without regard to individuals’ interrelatedness (11 Nuclear Families), and using only nuclear families that are distantly related to one another (7 Selected Nuclear Families). To treat the data as nuclear families without regard to their interrelatedness, we first use the TRIM_PEDIGREE option of Mendel software [Lange and Sinsheimer, 2004] to extract the 11 nuclear families with at least one genotyped affected sibling. To select the nuclear families that are not closely related, we first calculate the theoretical kinship matrix for the pedigree using the KINSHIP_MATRICES option of Mendel and then select the seven nuclear families where the between nuclear family relatedness of the offspring is no closer than second cousins.
The estimate of the MFG incompatibility parameter μ (~1.8) is similar for all three analyses (Table VII) and is lowest for the complete pedigree analysis and highest when using the selected nuclear families. The standard error and the P-value testing μ’s deviation from the null value of 1 are smallest in the complete pedigree analysis and largest in the selected nuclear families. The founder genotype frequency estimates are consistent with the published Finnish RHD genotype frequencies [Laakso and Toiviainen, 2009]. These results further illustrate the advantage of keeping extended pedigrees intact.
The nuclear family MFG test [Kraft et al., 2004, 2005; Sinsheimer et al., 2003] can be extended to accommodate arbitrary pedigree structures. The EMFG test, like its predecessor, is capable of estimating gender-specific MFG incompatibility effects, maternal allelic effects, and offspring allelic effects. Our simulation studies address EMFG test properties, such as accuracy of parameter estimation, coverage and rejection rates, and sensitivity to model misspecification. Simulation Study I shows that the EMFG test has a reasonable type-I error rate, coverage, and parameter estimates, and adequate power to detect moderate effect sizes with as few as 200 affected cousin pairs in three-generation pedigrees.
Simulation Study II demonstrates the power advantage of the EMFG test over the nuclear family MFG test. It is hardly surprising that selecting one nuclear family per extended pedigree reduces the power. Additional power can also be gained by using genotypes rather than mating types. The most dramatic power gains occur when there are missing genotype data. In this setting, preserving pedigree structure allows genotype information to propagate from genotyped members to ungenotyped members. This information is lost when a pedigree is split into nuclear families. The EMFG test also gives accurate parameter estimates and reasonable type-I error rates in the presence of missing parental genotypes.
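To make the propagation of genotype information concrete, the following Python sketch computes the posterior distribution of an ungenotyped mother's genotype in two ways: treating her as a founder of a nuclear family, and keeping her genotyped parents in the pedigree. It is a toy calculation under stated assumptions, not the EMFG likelihood: it ignores disease status and MFG penetrances, assumes Hardy-Weinberg founder frequencies with a hypothetical allele frequency `q`, and codes genotypes as the number of copies of allele 2. All function names are illustrative.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # copies of allele "2" at a bi-allelic locus


def founder_prior(g, q):
    """Hardy-Weinberg genotype probability for allele-2 frequency q."""
    return ((1 - q) ** 2, 2 * q * (1 - q), q ** 2)[g]


def transmission(child, father, mother):
    """Mendelian probability of the child's genotype given both parents."""
    total = 0.0
    for af, am in product((0, 1), repeat=2):  # allele passed by each parent
        if af + am == child:
            pf = father / 2 if af else 1 - father / 2
            pm = mother / 2 if am else 1 - mother / 2
            total += pf * pm
    return total


def mother_posterior(child, father, q, grandparents=None):
    """Posterior over the ungenotyped mother's genotype.

    With grandparents=None the mother is treated as a founder (nuclear
    family view); otherwise her prior comes from her parents' genotypes.
    """
    weights = []
    for m in GENOTYPES:
        if grandparents is None:
            prior = founder_prior(m, q)
        else:
            prior = transmission(m, *grandparents)
        weights.append(prior * transmission(child, father, m))
    norm = sum(weights)
    return [w / norm for w in weights]


# Child is 2/2, father is 1/2, mother is ungenotyped; q = 0.3 is hypothetical.
nuclear = mother_posterior(child=2, father=1, q=0.3)
# Same family, but the mother's parents are known to be 1/1 and 2/2.
extended = mother_posterior(child=2, father=1, q=0.3, grandparents=(0, 2))
```

In the nuclear family view the mother's genotype remains uncertain between 1/2 and 2/2, whereas the grandparental genotypes force her to be heterozygous. Because the maternal genotype enters the MFG incompatibility term directly, resolving it in this way is one intuition for why intact pedigrees help when genotypes are missing.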
The EMFG test can decrease bias. The bias caused by treating related nuclear families as independent, and thereby losing information, can be large: as much as a 10.8% underestimation in our simulations. Analyzing pedigrees in their entirety avoids this problem without sacrificing power. Of course, the degree of the bias depends on who is affected in the pedigree. In our simulations, we see notable underestimation of the RHD incompatibility parameter because both mother and daughter are affected. We would not expect to see the same degree of underestimation if only the grandchildren were affected. Furthermore, the degree of bias and its direction depend in part on the nature of the MFG incompatibility. For example, if MFG incompatibility requires that both mother and daughter share the 2/2 genotype, then we would expect to see overestimation when analyzing nuclear families taken from the same pedigree structure as in Figure 2 (siblings and grandchildren affected).
In our analysis of a real data set, the EMFG test was capable of estimating the effects of RHD incompatibility on schizophrenia for a large family spanning five generations, where nearly half of the family members were missing genotypes. When analyzing only some nuclear families from this extended pedigree, precision was reduced. In practice, we do not recommend using the EMFG test on such a small sample, but we did so here to provide a particularly straightforward comparison between pedigree and nuclear family analyses in a more realistic way than was possible in our simulations.
Unlike the nuclear family MFG test, the EMFG test is sensitive to violations of random mating. When PS is present and large, the EMFG test loses power and underestimates the true MFG effect. Therefore, one should conduct a test of random mating prior to applying the EMFG test if a violation is suspected. A test of HWE is not a substitute for a test of random mating, which compares observed mating type frequencies with those expected under random mating. When a severe violation of random mating is observed, we recommend using the nuclear family version of the MFG test.
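One simple way to compare observed and expected mating type frequencies is a Pearson chi-square test of independence on the table of mother-by-father genotype counts. The sketch below is an assumption about how such a check might be carried out, not a procedure specified here: the counts are hypothetical, the couples are treated as independent, and every row and column total is assumed to be nonzero. The critical value 9.488 is the standard 5% point of the chi-square distribution with 4 degrees of freedom.

```python
def random_mating_chi_square(table):
    """Pearson chi-square statistic for independence of spouse genotypes.

    table[i][j] is the number of couples in which the mother has genotype i
    and the father has genotype j. Returns (statistic, degrees of freedom).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if mothers and fathers pair at random.
            exp = row_tot[i] * col_tot[j] / n
            stat += (obs - exp) ** 2 / exp
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


CRITICAL_5PCT_DF4 = 9.488  # chi-square critical value, 4 df, alpha = 0.05

# Hypothetical counts for genotypes 1/1, 1/2, 2/2 (rows: mothers, columns: fathers).
random_couples = [[25, 20, 5], [20, 16, 4], [5, 4, 1]]
assortative_couples = [[40, 8, 2], [8, 30, 2], [2, 2, 6]]

stat_random, df_random = random_mating_chi_square(random_couples)
stat_assort, df_assort = random_mating_chi_square(assortative_couples)
reject_random_mating = stat_assort > CRITICAL_5PCT_DF4
```

Both hypothetical tables have identical genotype totals for mothers and for fathers, so the marginal genotype frequencies alone cannot tell them apart; only the pairing of spouses differs. This is the sense in which a test of HWE cannot stand in for a test of random mating. With expected counts as small as some of those in this toy table, the chi-square approximation would be poor in practice.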
In spite of its sensitivity to violations of random mating, the EMFG test is an improvement over our previous test, particularly in ease of use. Although our simulation studies focus on specific examples of MFG incompatibility, the EMFG test software allows users to specify the risks under the general bi-allelic locus model (Table I). Our new EMFG test has been developed as part of the statistical genetics software program Mendel [Lange et al., 2001]. It is less data specific and more user friendly than our earlier software versions. The EMFG test will be released as an option in an upcoming version of Mendel.
We thank Drs. L. Peltonen, J. Lonnqvist, and J. Turunen for the use of the Finnish Pedigree. These data were collected with funding from the Center of Excellence in Disease Genetics, Academy of Finland, Biocentrum Helsinki Finland, and USPHS grant MH66001.