While multiple lines of evidence suggest the importance of genetic contributors to risk of preterm birth, the nature of the genetic component has not been identified. We perform segregation analyses to identify the best fitting genetic model for gestational age, a quantitative proxy for preterm birth.
Because either mother or infant can be considered the proband from a preterm delivery and there is evidence to suggest that genetic factors in either one or both may influence the trait, we performed segregation analysis for gestational age either attributed to the infant (infant's gestational age), or the mother (by averaging the gestational ages at which her children were delivered), using 96 multiplex preterm families.
These data lend further support to a genetic component contributing to birth timing since sporadic (i.e. no familial resemblance) and nontransmission (i.e. environmental factors alone contribute to gestational age) models are strongly rejected. Analyses of gestational age attributed to the infant support a model in which mother's genome and/or maternally-inherited genes acting in the fetus are largely responsible for birth timing, with a smaller contribution from the paternally-inherited alleles in the fetal genome.
Our findings suggest that genetic influences on birth timing are important and likely complex.
Preterm birth is a major public health concern. In the United States, 12.8% of births occur before term (<37 weeks). Infants born before term have an increased risk of neonatal mortality as well as serious health problems, such as respiratory illness, blindness and cerebral palsy. Moreover, the severity and incidence of these problems worsen with decreasing gestational age. The impact of this disorder grows with the increasing rate of preterm birth in recent decades.
A wealth of evidence supports maternal genetic influences on preterm birth. For example, birth timing is highly consistent across pregnancies in the same woman [4,5,6,7,8,9]. Moreover, the most likely gestational age for a recurrent preterm birth to a given mother is the same week as the first preterm birth [6, 9, 10], suggesting that factors that are stable over time, such as genetics, affect birth timing. Additionally, mothers and daughters, as well as sisters, share risk for delivering preterm. Heritability studies in twins indicate that genes account for about 30% of variation in preterm delivery [12, 13] and in child's gestational age as a continuous trait [12, 14], when the mother is considered the proband of a delivery. Similar studies comparing full and half siblings for children's gestational age estimated that 14% of variation may be due to maternal genetic factors.
Several lines of evidence further suggest that fetal genetic effects may influence birth timing. First, fetal genes that are paternally imprinted mainly control placental and fetal membrane growth. Because the placenta and fetal membranes likely play a role in preterm birth, fetal genes controlling these tissues may also contribute. Additionally, a study comparing the correlation in gestational age between full and half siblings suggests that preterm birth is influenced in part by fetal genetic factors. Lastly, several studies suggest that paternity affects risk for the disorder. For example, partner changes between pregnancies have been reported to reduce risk of preterm birth [17, 18]; however, changes in paternity may reflect association with long interpregnancy intervals rather than paternity effects per se. Paternal race also has been associated with preterm birth risk. Previous studies observed that preterm birth rates are highest when both parents are Black and remain higher when one parent is Black, whether that parent is the mother or father [19, 20], suggesting that fetal race also influences birth timing. However, father's family history of preterm birth has been shown to have only a weak association with risk. While an early study of a Norwegian birth registry demonstrated a correlation between father's and children's gestational ages, a more recent and extensive study of this registry suggested fathers contributed little to no risk to preterm delivery. Similarly, a recent study suggested that paternal genetics contributed little to gestational age, but could not refute the possible role of maternally-inherited genes expressed in the fetus. Hence, while paternally-inherited genes may contribute little to preterm birth or other disorders, maternally-inherited genes expressed in the fetus may still be important.
Together, these data suggest that the fetal genome may contribute to birth timing, motivating further study defining the infant as the proband.
While multiple lines of evidence suggest genetic contributors are important in preterm birth, a specific mode of inheritance has not been identified. No prominent simple Mendelian pattern of inheritance has been observed in multiplex pedigrees identified to date. Modeling procedures used by twin studies suggest that additive genetic factors and environmental risk factors that are not shared between siblings both influence preterm birth [12,13,14]. Moderate values of lambda sibling (λS), a measure of risk to siblings of affected individuals compared to the population risk for a disorder, estimated for preterm birth (λS (95% CI): 4.3 (4.0–4.6)) are also consistent with complex genetic and environmental etiologies. Moreover, association studies have reported gene-gene [25, 26] and gene-environment [27,28,29] interactions with preterm birth. Together, these studies imply that the etiology of preterm birth likely involves genetic as well as environmental factors in complex interactions. However, there has not been a systematic study of possible genetic models for preterm birth to date.
In this study, we performed segregation analyses to identify the best fitting genetic model for gestational age, a quantitative proxy for preterm birth. Because either mother or infant can be considered the proband from a preterm delivery, and there is evidence to suggest that genetic factors in either one or both may influence the trait, we performed segregation analysis for gestational age as a quantitative trait either attributed to the infant, infant's gestational age, or to the mother, by averaging the gestational ages at which her children were delivered, using 96 multiplex preterm families. We also tested parent of origin models for infant's gestational age to examine whether mother's genotype is the sole determinant of variation in this trait. Additionally, as pregnancies in which either the mother [4, 7] or father [19, 20] is Black are at increased risk for preterm delivery, we performed segregation analysis for each phenotype in the total sample, as well as stratified by Black and White race, to test for heterogeneity between these two groups.
Proband mother-infant pairs were initially identified through premature birth of a live singleton fetus before 37 weeks of gestation by review of delivery logs at university medical center hospitals at Washington University and University of Helsinki or by self-identification through the study's website from 2003 to October 2007. To avoid misclassification bias at borderline gestational ages, we defined preterm birth as <35 weeks in the US cohort and <36 weeks in the Finland cohort. Our gestational age criterion was less stringent in the Finland cohort due to the high number of early ultrasounds performed, leading to more accurate gestational ages in this cohort. To include only families with spontaneous onset of preterm singleton birth, the following mother-infant pairs were further excluded: elective deliveries without spontaneous onset of labor and deliveries in which either maternal (e.g. systemic infection) or fetal (e.g. malformation) disease with known predisposition to premature birth was indicated. Families were extended through affected individuals on both maternal and paternal sides until no additional first degree relatives were identified as either mother or infant of a preterm delivery. Families were recruited into the study only if two or more members were mothers and/or infants of preterm deliveries. Fifty-five families were recruited from the US, of which 31 were Black and 24 were White. Forty-one White families were recruited from Finland. In the US cohort, families ranged from 9 to 36 individuals with phenotypic information, with a median family size of 18 individuals. In the Finland cohort, families ranged from 8 to 73 individuals with phenotypic information, with a median family size of 16 individuals. Informed consent was obtained from participants and the study was approved by the institutional review board of Washington University School of Medicine and the ethics committee of Helsinki University Central Hospital.
From the US cohort, an individual's gestational age was calculated based on his or her mother's self report of expected due dates and actual delivery dates for a pregnancy or of how many weeks early that family member was born. For all individuals born at the Washington University School of Medicine, gestational ages were verified from medical records. From the Finland cohort, an individual's gestational age was obtained from medical records. Individuals for whom pedigree information indicated that they were born <35 or <36 weeks without a specific gestational age designated were assigned a gestational age of 34 or 35 weeks, respectively. Similarly, those individuals indicated as ‘term’ were designated 40 weeks. Infant's gestational age was treated as a quantitative trait, which was standardized to a normal distribution (μ = 0, σ = 1) prior to any analysis. A total of 1378 individuals had non-missing phenotypes, with a median of 13 individuals per family (range 3–80). The number of sibpairs with phenotypic information for this phenotype was 309, with a median of 3 sibpairs per family (range 1–7). The median number of generations with phenotypic information was 3 (range 1–5).
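The assignment rules and the standardization described above can be sketched in Python. This is an illustration only, not the SAS code used in the study. The function names, the label strings "preterm" and "term", the mapping of the <35 and <36 week rules onto the US and Finland cohorts, and the reading of "standardized" as a plain z-score with the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev

# Values the text assigns when no specific gestational age was recorded.
# Tying 34 weeks to the US cohort and 35 weeks to Finland is an assumption
# based on the cohort-specific preterm thresholds (<35 and <36 weeks).
PRETERM_DEFAULT_WEEKS = {"US": 34.0, "Finland": 35.0}
TERM_DEFAULT_WEEKS = 40.0


def assign_gestational_age(record, cohort):
    """Return gestational age in weeks, or None if unknown.

    `record` is a number of weeks, one of the hypothetical labels
    "preterm" / "term", or None when nothing was recorded.
    """
    if record is None:
        return None
    if record == "preterm":
        return PRETERM_DEFAULT_WEEKS[cohort]
    if record == "term":
        return TERM_DEFAULT_WEEKS
    return float(record)


def standardize(values):
    """Z-score the non-missing values (mean 0, SD 1); None stays None."""
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sigma = stdev(observed)  # sample SD (n - 1 divisor): an assumption
    return [None if v is None else (v - mu) / sigma for v in values]
```

A call such as `standardize([assign_gestational_age(r, "US") for r in records])` would then produce the standardized trait for a hypothetical list `records`.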
For both cohorts, a variable representing the average gestational age of all children born to a given mother was constructed. For mothers who had one or more children born before 37 weeks, this variable was calculated as the mean of the gestational ages for all children born to that woman. Mothers who gave birth to all of their children at term (>37 weeks) were assigned a value of 40 weeks. Mothers for whom one or more children had missing gestational ages were coded as missing. Additionally, all males and females who had not yet given birth were coded as missing. This phenotype was treated as a quantitative trait, which was standardized to a normal distribution (μ = 0, σ = 1) prior to any analysis. Univariate statistics and standardizations for each phenotype were performed with SAS language v. 9.1.3 for Linux OS (SAS Institute, Cary, N.C., USA). A total of 404 individuals had non-missing phenotypes, with a median of 4 individuals per family (range 1–17). The number of sibpairs with phenotypic information for this phenotype was 309, with a median of 0 sibpairs per family (range 0–5). The median number of generations with phenotypic information was 2 (range 1–4).
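The coding rules for the mother-based phenotype can be written as a small function. This is a hedged sketch of the rules as stated in the paragraph above, not the original SAS code; the function name and argument layout are hypothetical, and treating every delivery that is not <37 weeks as term is an assumption about the boundary.

```python
from statistics import mean

PRETERM_CUTOFF_WEEKS = 37.0   # preterm is defined in the text as <37 weeks
ALL_TERM_VALUE_WEEKS = 40.0   # value assigned to mothers with only term births


def mothers_average_ga(children_ga, is_female=True):
    """Average gestational age of a mother's children, in weeks.

    `children_ga` lists one gestational age per child (None if missing).
    Returns None (missing) for males, for women with no children, and for
    mothers with any child whose gestational age is missing.
    """
    if not is_female or not children_ga:
        return None
    if any(ga is None for ga in children_ga):
        return None
    if any(ga < PRETERM_CUTOFF_WEEKS for ga in children_ga):
        # At least one preterm child: mean over all of her children.
        return mean(children_ga)
    # Every child born at term.
    return ALL_TERM_VALUE_WEEKS
```

Under these rules a mother with deliveries at 34 and 39 weeks is coded 36.5, while a mother with deliveries at 39 and 41 weeks is coded 40 rather than her observed mean.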
We used the Pedigree Analysis Package (PAP), Version 5.0  to perform segregation analysis. Under the mixed Mendelian model (model 1), the phenotype is influenced by a major gene, polygenic background and an untransmitted environmental component. The major gene is biallelic (A, a), with allele A, occurring at frequency p, associated with lower trait values. Mean values for the three genotypes (μAA, μAa, μaa, where the order of the means is constrained to be μAA ≤ μAa ≤ μaa) and a common standard deviation for all genotypes are estimated. Parent-to-offspring transmission probabilities for the three genotypes (τAA, τAa, and τaa) also are included in the model. τAA, τAa, and τaa designate the probability of transmitting allele A for the genotypes AA, Aa, and aa, with Mendelian expectations of 1, 0.5, 0, respectively. When the τ's are set equal to p, there is no transmission of the major effect. Polygenic heritability (H) after accounting for the putative major gene effect was also estimated. For parent of origin models, heterozygotes who inherited A from their mother (Aa) and heterozygotes who inherited A from their father (aA) are distinguished from one another and allowed to have different means. Many of the free τ's models did not converge initially. In order to estimate these models, τAA and τaa were fixed to 1 and 0, respectively, and likelihoods calculated under these additional assumptions were used for the analysis.
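To make the role of the transmission probabilities concrete, the following sketch computes the genotype distribution of an offspring from the parental genotypes and the τ's. It is an illustration of the general idea and not PAP's implementation; the function and variable names are hypothetical. Following the text, Aa denotes a heterozygote who inherited A from the mother and aA one who inherited A from the father; the assumption here is that a heterozygous parent transmits A with probability τAa regardless of which of that parent's own parents supplied it.

```python
# Mendelian expectations for the probability of transmitting allele A.
MENDELIAN_TAU = {"AA": 1.0, "Aa": 0.5, "aa": 0.0}


def _tau_for(genotype, tau):
    # Both kinds of heterozygous parent use tau["Aa"] (an assumption).
    key = "Aa" if genotype in ("Aa", "aA") else genotype
    return tau[key]


def offspring_genotype_probs(mother, father, tau=MENDELIAN_TAU):
    """Distribution over ordered offspring genotypes given both parents."""
    t_m = _tau_for(mother, tau)  # P(mother transmits A)
    t_f = _tau_for(father, tau)  # P(father transmits A)
    return {
        "AA": t_m * t_f,
        "Aa": t_m * (1.0 - t_f),          # A inherited from the mother
        "aA": (1.0 - t_m) * t_f,          # A inherited from the father
        "aa": (1.0 - t_m) * (1.0 - t_f),
    }
```

Setting all three τ's to the same value p, as in the equal τ's models, makes the offspring distribution identical for every mating type, which is why those models correspond to no transmission of the major effect.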
All parameters were estimated using a maximum likelihood method. Nested models representing null hypotheses were tested against a more general model using a likelihood ratio test (LRT), in which the difference between negative twice the log-likelihood (−2 ln L) values for two models approximates a chi-square distribution with degrees of freedom (d.f.) equal to the number of independent parameter restrictions. The most parsimonious model of those not rejected by likelihood ratio test (p > 0.01) was determined using Akaike's Information Criterion (AIC), which is computed as −2 ln L of the model plus twice the number of parameters estimated. The model with the lowest AIC indicates the most parsimonious fit to the observed data.
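The model-comparison procedure above can be sketched with the Python standard library. This is a minimal illustration, not the software used in the study: the function names are hypothetical, the chi-square tail probability is computed from the closed-form recurrence for integer degrees of freedom, and the selection helper assumes that the general model carries no LRT p-value of its own.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) for even df or Q(1/2, y) for odd df, then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(m2lnl_null, m2lnl_general, df):
    """Return (statistic, p-value) from two -2 ln L values."""
    stat = m2lnl_null - m2lnl_general
    return stat, chi2_sf(stat, df)


def aic(m2lnl, n_params):
    """AIC = -2 ln L + 2 * (number of estimated parameters)."""
    return m2lnl + 2 * n_params


def most_parsimonious(models, alpha=0.01):
    """Pick the lowest-AIC model among those not rejected by the LRT.

    `models` maps a name to (-2 ln L, n_params, lrt_p), where lrt_p is
    None for the general model, which is never rejected here.
    """
    kept = {name: aic(m2lnl, k)
            for name, (m2lnl, k, p) in models.items()
            if p is None or p > alpha}
    return min(kept, key=kept.get)
```

For example, with purely hypothetical values, a nested model at −2 ln L = 110.0 tested against a general model at 100.0 with 2 d.f. gives a statistic of 10.0 and would be rejected at the 0.01 level.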
To account for the ascertainment of the families, the likelihood of each model was conditioned on the likelihood of the proband's phenotype under the model, an appropriate correction for the manner in which these families were extended . While our criterion required 2 or more preterm first degree relatives for a family to be enrolled in the study, the ascertainment scheme is approximately equivalent to single ascertainment . Because not all preterm deliveries in the metropolitan St. Louis or Finnish health care systems occurring during the study period were captured under our ascertainment scheme, the probability that a family was identified through multiple probands is small and should be proportional to the number of affected deliveries in a family, as expected under single ascertainment . Under single ascertainment, conditioning on the proband's phenotype is sufficient to adjust for the ascertainment of families . Hence, analyses for infant's gestational age were corrected for the proband infant's gestational age jointly with preterm birth (<37 weeks) affection status. Similarly, analyses attributing gestational age to the mother were corrected for the proband mother's average gestational age of children jointly with her preterm birth (<37 weeks) affection status.
To test genetic heterogeneity among races, we used the heterogeneity χ2 test [36, 37] in which the −2 ln L value under the best fitting model for the combined data is subtracted from the summation of −2 ln L values for stratified analyses to obtain the test statistic. This test statistic approximates a χ2 distribution with d.f. equal to K*J – K where J is the number of subgroups and K is the number of parameters in the model.
We first analyzed gestational age of the infant. The number of subjects and descriptive statistics for this phenotype are shown in table 1. Table 2 presents the LRTs and AIC values for segregation analysis of infant's gestational age for 17 models of inheritance. The parameter estimates for segregation analysis of infant's gestational age are listed in table 3. The hypotheses of no familial resemblance (model 2), no major gene effect (model 3), and no multifactorial effect (model 4) are rejected, suggesting the presence of both a major gene and a multifactorial effect. Additionally, the equal τ's hypotheses (models 10, 12 and 14) are rejected for the mixed, recessive mixed and dominant mixed models, respectively. In the free τ's models, the estimated τAa differed from 0.5, expected under the Mendelian model, and fit the data better than their respective general models (models 1, 7 and 8) for all groups. Together, this evidence supports a genetic component for preterm birth transmitted from parents to offspring and suggests that the transmitted effect is complex. Similar models were tested for general parent of origin effects, as well as maternal-specific and paternal-specific effects under Mendelian transmission (models 15–17). The parent of origin model (model 15) best fit the data as judged by AIC values and was chosen as the most parsimonious model (table 2). This model suggests that, when attributing gestational age to the infant of a delivery, genetic factors influence this trait and the parent from whom such factors are inherited influences the overall trait value.
We also analyzed gestational age attributed to the mother, by averaging the gestational ages at which her children were delivered (see table 1). Table 4 presents the likelihood ratio tests and AIC values for segregation analysis on mother's average gestational age of children for 14 models of inheritance. The parameter estimates for the combined dataset are listed in table 5. The hypotheses of no familial resemblance (model 2), no major gene effect (model 3), and no multifactorial effect (model 4) are rejected, suggesting the presence of both a major gene and a multifactorial effect. The equal τ's hypotheses (models 10, 12 and 14) are rejected. None of the free τ's models (models 9, 11 and 13) converged initially. In order to estimate these models, τAA and τaa were fixed to 1 and 0, respectively. Only τAa was estimated and, in each case, differed from 0.5, expected under the Mendelian model. Additionally, the free τ's models fit the data better than their respective general models (models 1, 7 and 8), perhaps suggesting that the major effect observed is more complex than the single biallelic locus modeled here. Together, this evidence supports a genetic component for preterm birth transmitted from parents to offspring and suggests that the transmitted effect is complex. The mixed free τ's model (model 9) best fit the data as judged by the AIC values and was chosen as the most parsimonious model (table 4). This model suggests that a complex genetic model most likely best accounts for variation in gestational age, when the trait is attributed to the mother of a delivery.
Since pregnancies in which either the mother [4, 7] or father [19, 20] is Black are at increased risk for preterm delivery, we tested for evidence of genetic heterogeneity between these two groups. Segregation analyses of infant's gestational age and mother's average gestational age over all her children also were performed in Black and White subgroups. Table 1 documents the number of subjects and descriptive statistics for both phenotypes by race. The segregation analyses of infant's gestational age supported similar conclusions in the combined sample and Black and White subgroups. However, several results differed in the segregation analysis of mother's average gestational age of children when the sample was stratified by Black and White race, compared to the combined sample. In the race-stratified samples, final estimates of the free τ's models had higher −2 ln likelihoods than did similar models with fewer parameters, indicating that the maximum likelihood was not reached. As a result, the equal τ's hypotheses (models 10, 12 and 14) and the free τ's hypotheses (models 9, 11 and 13) were not rejected for the mixed, recessive mixed and dominant mixed models, respectively, in Black and White subgroup analysis. The best fitting model according to AIC values was the mixed equal τ's model and was selected as the most parsimonious model in both race subgroups (data not shown).
To test formally for heterogeneity between Blacks and Whites, we used the heterogeneity χ2 test [36, 37] in which the −2 ln L value under a given model for the combined data is subtracted from the summed −2 ln L values from stratified analyses. For infant's gestational age, evidence for genetic heterogeneity among races was observed when comparing values under the multifactorial model (χ2 = Σ(698.07 Black + 2010.02 White) − 2690.72 combined = 17.37, 3 d.f., p = 0.0006). Of note, parameter estimates for Blacks and Whites differed, particularly estimates of p, H and μAA which were generally higher in Blacks than Whites. For the multifactorial model, H was estimated as 0.46 (95% CI: 0.27–0.65) for Blacks and 0.23 (95% CI: 0.13–0.33) for Whites. While these point estimates are quite different, the 95% CI overlap, indicating that this difference may not be statistically significant. For mother's average gestational age of children, evidence for genetic heterogeneity among races was observed when comparing values under the multifactorial model (χ2 = Σ(54.92 Black + 213.58 White) − 216.37 combined = 52.13, 3 d.f., p = 2.81 × 10^−11). Of note, parameter estimates for Blacks and Whites differed, particularly estimates of H, which are generally higher in Blacks than Whites. For the mixed free τ's model, H was estimated as 0.70 (95% CI: 0–1) for Blacks and 0.23 (95% CI: 0–0.57) for Whites. As the 95% confidence intervals overlap, this difference is not significant in the sample size analyzed here.
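The arithmetic of the heterogeneity test can be reproduced from the −2 ln L values quoted above. The sketch below is illustrative only: the function names are hypothetical, and K = 3 parameters for the multifactorial model is inferred from the reported 3 d.f. with J = 2 subgroups (d.f. = K*J − K) rather than stated in the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Q(1, y) for even df, Q(1/2, y) for odd df, then step a up by 1:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def heterogeneity_test(m2lnl_by_group, m2lnl_combined, n_params):
    """Return (statistic, d.f., p-value) for the heterogeneity chi-square.

    statistic = sum of stratified -2 ln L values - combined -2 ln L
    d.f.      = K*J - K, with J subgroups and K model parameters
    """
    n_groups = len(m2lnl_by_group)
    stat = sum(m2lnl_by_group.values()) - m2lnl_combined
    df = n_params * n_groups - n_params
    return stat, df, chi2_sf(stat, df)


# -2 ln L values quoted in the text for the multifactorial model.
infant = heterogeneity_test({"Black": 698.07, "White": 2010.02}, 2690.72, n_params=3)
mother = heterogeneity_test({"Black": 54.92, "White": 213.58}, 216.37, n_params=3)
```

Here `infant` and `mother` hold the statistic, degrees of freedom and p-value for each phenotype, which can be compared directly with the values reported in the paragraph above.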
Preterm birth likely has a complex etiology involving both genetic and environmental risk factors, based on evidence from previous studies. This study is the first to explicitly test different modes of inheritance for birth timing, by assessing gestational age as a phenotype of either mother or infant. These data lend further support to a genetic component contributing to birth timing since sporadic (i.e. no familial resemblance) and nontransmission models, in which gestational age is attributed to environmental factors alone, are strongly rejected (tables 2 and 4). Our findings suggest that genetic influences on birth timing are important and likely complex.
For infant's gestational age, the parent of origin model (model 15) best fit the data according to the AIC values and was chosen as the most parsimonious model (table 2). Based on the mean estimates for AA, Aa, aA and aa under this model (table 3), it appears that this parent of origin effect is largely maternal. Heterozygotes who inherit the A allele from their mother (Aa) have a mean closer to, but not equal to, AA. Under a strict maternal model, AA and Aa would be equivalent, since mother's A allele would be expected to be the sole determinant of phenotype. In contrast, heterozygotes who inherited the A allele from their father (aA) have a mean that is approximately equivalent to that of aa, suggesting that father's A allele has little effect on phenotype. A model in which the maternally-inherited allele was the sole determinant of the phenotype (model 16) did not fit the data better than the parent of origin model in which the entire genotype was considered. Importantly, father's genes also affect phenotype in this model, aligning with previous work showing that paternity [17, 18] as well as paternal race [19, 20] influence preterm birth risk. Hence, both maternal and paternal alleles seem to contribute to infant's gestational age, with maternally-inherited alleles having a stronger effect on phenotype than those inherited from the father. This finding may support previous studies that have observed stronger effects of mother's race [19, 20] and family history on risk for preterm birth than those of the father. These data suggest that maternally-inherited genes acting in the fetus and/or maternal genes acting in the mother are largely responsible for birth timing; however, these two possibilities are not easily distinguished. Maternal genetic effects can create the same pattern of phenotypic variation as genomic imprinting.
These two classes of effects can be distinguished by comparing the offspring of heterozygous mothers; however, such comparisons are not possible in our dataset, in which individuals were assigned probabilities for each possible genotypic state rather than having known genotypes measured empirically. Previous studies in cattle [39, 40] have observed maternal genetic effects on gestational age, but did not consider parent of origin effects. Consequently, further study is needed to determine whether maternal effects or imprinting account for our observations. In either case, considering the mother of a preterm delivery as proband may be most useful in identifying genetic contributions to preterm birth.
Segregation analysis of mother's average gestational age of children also supported a complex genetic model. The mixed free τ's model (model 9) best fit the data according to the AIC values and was chosen as the most parsimonious model (table 4). This model, in combination with results from LRTs, suggests that genetic influences on birth timing are important and likely complex. Importantly, fewer individuals are informative when mother is considered the proband in a preterm delivery (fig. 1), and the smaller sample size affects the power of the analysis. As a result, the parameter and likelihood estimates made when considering mother as proband are more affected by sampling variance; however, conclusions made by comparing across models should be less affected by sample size.
Overall, mother-based and infant-based analyses both support the importance of genetic factors, perhaps primarily acting in the mother or maternally-inherited alleles acting in the fetus, in birth timing. The genetic component influencing preterm birth likely involves many genes in interaction with environmental and other genetic factors. These results are consistent with previous studies suggesting that genes and environments [12,13,14], as well as gene-gene [25, 26] and gene-environment [27,28,29] interactions influence preterm birth. Estimates from the multifactorial model (model 3, tables 2 and 4) indicate 30–40% of variation in gestational age, attributed to either mother or infant, can be explained by genetics, consistent with estimates from previous twin studies [12, 14]. As twins may not be representative of the population as a whole, our heritability estimates corroborate the general importance of genetics in birth timing. Heritability estimates were generally higher in mother-based analyses (0.44 (95% CI: 0.11–0.77)) than in infant-based analysis (0.33 (95% CI: 0.24–0.42)), but not significantly different (tables 3 and 5). While many genes may be contributing to the observed genetic influence on birth timing, the moderate heritability observed suggests that the cumulative effect of these genes accounts for an important amount of variation in gestational age.
Although alternative methods for segregation analysis exist, we considered PAP to be the best method for our primary goal of identifying the best-fitting genetic model for birth timing. Using PAP, we were able to compare models directly and identify the most parsimonious model. In contrast, Markov Chain Monte Carlo (MCMC) methods, such as those used by the Loki and Morgan packages, generate a series of posterior probabilities for various models, but no one model is identified as superior. Moreover, MCMC methods model several Mendelian loci simultaneously but do not include a polygenic component, which we wanted to include in the models we examined. Furthermore, one cannot correct for ascertainment within MCMC analysis, in contrast to PAP. One of the disadvantages of using PAP exclusively was that we were not able to estimate the approximate number of loci contributing to birth timing, as could be done with MCMC methods. Additionally, MCMC methods may be better at handling large, complex pedigrees than PAP; however, since our families were relatively small and simple (e.g. containing no inbreeding), we did not consider this a limitation.
We have likely enriched for genetic and/or common environmental effects by using 96 multiplex families, all of whom were recruited based on having two or more first degree relatives delivered preterm. These multiplex families provided a large sample size from which genetic effects could be examined. However, the ascertainment scheme also introduces bias, as the families were not collected randomly and may overestimate the importance of certain genetic models. To minimize errors due to such bias, all analyses were corrected for ascertainment by conditioning on the initial proband, either the mother or the offspring depending which phenotype was considered, that led to the ascertainment of the family, assuming single ascertainment. However, if our assumption of single ascertainment is incorrect, conditioning on probands in this manner may create bias in estimating model parameters , leading to inaccurate conclusions about the best-fitting model for familial preterm birth. Because we believe our ascertainment scheme is consistent with single ascertainment, these results appear to be most appropriate for modeling genetic effects in familial preterm birth. Yet, it is possible that our conclusions drawn from familial cases of preterm birth may not generalize to all instances of this disorder. Familial cases of preterm birth may have a different genetic contribution than isolated cases, perhaps having different etiologies than isolated cases.
This study is also limited by the phenotypes studied. Information on gestational age was collected by self-report data from questionnaires for many individuals used in this analysis. While it was possible to verify gestational ages from medical records in some cases, including all births delivered at the participating institutions, many gestational ages could not be verified and may be subject to reporting errors. Additionally, many individuals for whom we could not verify precise gestational ages were reported as ‘full term’ and assigned a gestational age of 40 weeks. While this is the most likely gestational age for infants to be born, we may lose some variability in the overall distribution of gestational age by doing so. As modeling of trait variance is essential to segregation analysis, this approach also may have affected our results.
Our findings suggest that genetic influences on birth timing are important. Modeling of both mother and infant phenotypes indicates that a genetic component influences gestational age and is complex in nature. In analyses using either mother or infant as proband, monogenic Mendelian models were strongly rejected, suggesting that a single gene model cannot fully explain birth timing in these families. A number of genes probably contribute to the genetic influence on birth timing and preterm birth described. Analyses of gestational age attributed to the infant support a model in which mother's genome and/or maternally-inherited genes acting in the fetus are largely responsible for birth timing, with a smaller contribution from the paternally-inherited alleles in the fetal genome. Hence, considering the mother of a preterm delivery as proband may be more useful in identifying specific genetic contributions to preterm birth. Interestingly, results from the heterogeneity χ2 test comparing race-stratified analyses suggest that genetic influences on birth timing may differ between Blacks and Whites. Hence, in future association studies, race-stratified analyses or population stratification corrections may improve our ability to identify specific genes associated with preterm birth. Overall, as multiple genes in the mother's genome may explain the bulk of genetic influences on birth timing, future efforts to identify specific genes influencing preterm birth will perhaps be most fruitful if they use large-scale studies of mothers’ genomes.
This work was supported by grants from the Children's Discovery Institute at Washington University School of Medicine and St. Louis Children's Hospital and from the March of Dimes awarded to LJM. This research was also supported by T32 GM081739 from the National Institute of General Medical Science and the Mr. and Mrs. Spencer T. Olin Fellowship for Women in Graduate Study at Washington University in St. Louis awarded to JP, a grant from the Sigrid Juselius Foundation awarded to MH, and a grant from the Academy of Finland to RH.