Conceived and designed the experiments: RH MKK AL KT HP MO TV JCM SAM LP LJM AP MH. Performed the experiments: RH MKK BPC JP. Analyzed the data: RH MKK AL TV BPC JP SAM LJM AP. Contributed reagents/materials/analysis tools: LJM MH. Wrote the paper: RH MKK. Contributed in recruiting patients: KT HP. Contributed in obtaining patient data and DNA: HP. Contributed in obtaining the DNA array: LP. Contributed in writing the paper: AP MH. Served as a liaison between the investigators: MH.
Preterm birth is the major cause of neonatal death and serious morbidity. Most preterm births are due to spontaneous onset of labor without a known cause or effective prevention. Both maternal and fetal genomes influence the predisposition to spontaneous preterm birth (SPTB), but the susceptibility loci remain to be defined. We utilized a combination of unique population structures, family-based linkage analysis, and subsequent case-control association to identify a susceptibility haplotype for SPTB. Clinically well-characterized SPTB families from northern Finland, a subisolate founded by a relatively small founder population that has subsequently experienced a number of bottlenecks, were selected for the initial discovery sample. Genome-wide linkage analysis using a high-density single-nucleotide polymorphism (SNP) array in seven large northern Finnish non-consanguineous families identified a locus on 15q26.3 (HLOD 4.68). This region contains the IGF1R gene, which encodes the type 1 insulin-like growth factor receptor IGF-1R. Haplotype segregation analysis revealed that a 55 kb 12-SNP core segment within the IGF1R gene was shared identical-by-state (IBS) in five families. A follow-up case-control study in an independent sample representing the more general Finnish population showed an association of a 6-SNP IGF1R haplotype with SPTB in the fetuses, providing further evidence for IGF1R as an SPTB predisposition gene (frequency in cases versus controls 0.11 versus 0.05, P=0.001, odds ratio 2.3). This study demonstrates the identification of a predisposing, low-frequency haplotype in a multifactorial trait using a well-characterized population and a combination of family and case-control designs. Our findings support the identification of the novel susceptibility gene IGF1R for predisposition by the fetal genome to being born preterm.
Preterm birth is the major cause of infant deaths and life-long neurologic and cardiopulmonary morbidity. More than 10% of births in the United States occur prematurely, and the rate is increasing without known effective prevention. Previous premature birth increases the risk 3-fold in subsequent pregnancies. We report here, for the first time to our knowledge, a genome-wide study on susceptibility to spontaneous preterm birth in singleton pregnancies. To detect novel regions of the genome associated with preterm birth, we performed linkage analysis on seven carefully selected large families with recurrent spontaneous premature births. When we studied the fetuses, evidence was found for linkage of a region on chromosome 15 with spontaneous preterm birth, with the highest linkage signals occurring within a single gene, IGF1R. Evidence of the involvement of this gene in the etiology of preterm birth was further strengthened by subsequent haplotype segregation analysis and case-control analysis of an independent patient population. The IGF1R gene encodes insulin-like growth factor receptor 1 (IGF-1R), an important protein that potentially regulates signaling cascades involved in the onset of labor. Our analyses are unique in providing evidence that fetal IGF1R influences the risk of spontaneous preterm labor, leading to preterm birth.
Preterm birth, defined as birth before 37 wk of gestation, accounts for an estimated 2 million annual deaths worldwide and is the major cause of serious morbidity in infants born preterm. Currently, approximately 12% of all births in the United States are premature. The serious acute diseases of prematurely born infants are principally caused by functional immaturity. Common life-long diseases that result in deteriorating quality of life among individuals born preterm include a chronic respiratory disease called bronchopulmonary dysplasia; retinopathy of prematurity, which is the most common cause of blindness in infants; cerebral palsy; and cognitive disorders. The majority of preterm births (approximately 70%) occur after spontaneous onset of labor; nearly 50% of these cases are preceded by rupture of fetal membranes. Apart from excessive uterine distension in multiple pregnancies or certain fetal malformations, and severe maternal diseases such as sepsis and abdominal trauma, no obvious environmental risk factors can be identified in most preterm births. Activation of spontaneous preterm labor and preterm birth is thought to result from the action of multiple pathways and mechanisms, including endocrine dysfunction or ascending intrauterine infection and inflammation that can lead to the induction of labor-producing mediators. Despite ongoing research efforts, there is no effective medication for the prevention of spontaneous preterm birth (SPTB).
A history of SPTB of a single fetus is a strong predictor of its recurrence in families. Approximately 20% of mothers with a preterm delivery have another baby born preterm, suggesting that factors that are stable over time, such as genetics, affect birth timing. Mothers and daughters and sisters share the risk of delivering preterm. Twin studies suggest a heritability estimate of about 30%. Both the fetal and maternal genome, as well as gene–gene and gene–environment interactions, are likely to influence predisposition to SPTB. Several studies using fetal or maternal DNA have reported associations of individual gene polymorphisms. These studies have focused on genes involved in infection, inflammation, and innate immunity; e.g., those encoding the cytokines tumor necrosis factor alpha and interleukins 4, 6, and 10, and mannose-binding lectin. However, most of these associations were not replicated in subsequent studies and across populations. So far, only case-control candidate gene studies have been conducted for SPTB. Genome-wide methods of identifying genes a priori may reveal genes not considered to be obvious candidates, which are potentially important unexplored sources of variability in preterm birth. These studies may contribute to defining the risk of SPTB and developing potential preventive interventions.
The overall aim of the present approach was to define major genes that influence the susceptibility to SPTB. In this first report, we describe a SNP-based genome-wide linkage and haplotype segregation analysis of recurrent familial SPTB using a strictly defined phenotype and carefully selected families, followed by case-control association analysis of a study population independent of the subjects used for the linkage scan. The linkage scan was performed with seven large families originating from northern Finland, where the population is characterized by genetic homogeneity, making it advantageous for gene-mapping studies. With the phenotype defined as being born preterm, significant linkage signals (HLODmax=4.68) were obtained for chromosome locus 15q26.3 in a region harboring IGF1R (MIM *147370), the gene encoding insulin-like growth factor receptor 1 (IGF-1R). Haplotype segregation analysis performed for markers encompassing the IGF1R gene revealed prominent identical-by-descent (IBD) within-family and identical-by-state (IBS) between-family haplotype sharing among affected relatives. Evidence of the involvement of IGF1R in the etiology of SPTB was further strengthened by case-control analysis of an independent cohort located in northern and southern Finland, with a 6-SNP IGF1R haplotype overrepresented in SPTB infants. In summary, evidence from our linkage, haplotype sharing, and association analyses implicated IGF1R as a candidate gene for susceptibility to SPTB.
The overall study design is illustrated in Figure 1.
We selected the families for the linkage analysis from a total of 120,000 births that took place in a single regional hospital in northern Finland; this region is characterized by a homogeneous and stable population with a low prevalence of prematurity (approximately 5.5–6.5% of all births). We chose mothers with recurrent SPTB using very stringent criteria, with known risk factors for SPTB and elective preterm births without labor among the exclusion criteria. We identified 120 mothers with at least two spontaneous singleton preterm deliveries. Family interviews revealed 20 large families with multiple relatives affected by SPTB. According to a genealogical study, these families were non-consanguineous. Finally, families with apparent maternal inheritance of SPTB were chosen for the analysis.
We conducted parametric linkage analysis using seven large northern Finnish families with recurrent SPTB. Because genetic factors acting either on the fetus or the mother may influence SPTB, we used two settings in this study: affected fetus or infant phenotype (being spontaneously born preterm as the phenotype, n=41) and affected mother phenotype (giving spontaneous preterm birth as the phenotype, n=21). The pedigrees of the families are shown in Figure S1.
Before the analysis, markers were linkage-disequilibrium (LD) pruned to exclude high-LD SNPs, leaving 6377 markers with an average distance of 0.43 Mb between consecutive markers. We considered a heterogeneity logarithm of odds (HLOD) score of >2 as an initial signal of linkage. When we studied the affected fetus phenotype, analysis of the pruned marker set revealed HLOD scores of >2 on six autosomes, as depicted in Figure 2. The maximum HLOD score (HLODmax) of 2.59 was detected for SNP rs2715416 on chromosome locus 15q26.3 (θ=0.04, α=1.00). Figure 2 also shows the linkage signals for mother-based analysis, with an HLODmax of 1.53 at rs11167102 on chromosome locus 8q24.3 (θ=0.00, α=1.00).
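To make the reported quantities (HLOD, θ, α) concrete, the following is a minimal sketch of the standard admixture heterogeneity LOD, in which each family's LOD at recombination fraction θ is mixed with the proportion α of linked families. It rests on a strong simplifying assumption: each family contributes fully informative, phase-known meioses, so the family LOD has a closed form. The actual analysis used pedigree likelihoods from a dense SNP array; the function names and the per-family counts below are hypothetical and are not taken from the study.

```python
import math


def family_lod(recombinants, meioses, theta):
    """Two-point LOD for phase-known meioses: log10 of L(theta) / L(0.5)."""
    likelihood = theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)
    if likelihood == 0.0:
        return float("-inf")
    return math.log10(likelihood) - meioses * math.log10(0.5)


def hlod(families, theta, alpha):
    """Admixture HLOD: sum over families of log10(alpha * 10**LOD_i + 1 - alpha)."""
    total = 0.0
    for recombinants, meioses in families:
        lod = family_lod(recombinants, meioses, theta)
        mixture = alpha * 10.0 ** lod + (1.0 - alpha)
        if mixture <= 0.0:
            return float("-inf")
        total += math.log10(mixture)
    return total


def maximise_hlod(families, steps=50):
    """Grid search over theta in [0, 0.5] and alpha in [0, 1]."""
    best = (float("-inf"), None, None)
    for i in range(steps + 1):
        theta = 0.5 * i / steps
        for j in range(steps + 1):
            alpha = j / steps
            score = hlod(families, theta, alpha)
            if score > best[0]:
                best = (score, theta, alpha)
    return best


# Hypothetical (recombinants, informative meioses) per family; illustration only.
example_families = [(0, 6), (1, 8), (0, 5)]
best_score, best_theta, best_alpha = maximise_hlod(example_families)
print(best_score, best_theta, best_alpha)
```

When the maximizing α is 1.00, as for the two signals reported above, the HLOD reduces to the ordinary sum of the family LOD scores at the maximizing θ.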
For the affected fetus phenotype, we performed fine mapping using the unpruned marker set on the regions flanking (approximately 5 Mb) the six initial linkage signals. We considered an HLOD of >3 as a further sign of linkage. Table 1 reveals that the three markers with an HLOD of >3 were all on the same chromosome locus 15q26.3 and shows the HLOD scores in the fine-scale analysis for each of the seven families with recurrent SPTB. The HLODmax of 4.68 was obtained at rs2684811. Because one preterm infant not fulfilling the stringent criteria of SPTB (preterm infant from a twin pregnancy) was included among the premature births, we repeated the linkage analyses while excluding this infant from the affected individuals, with little effect on the results. All three markers with the highest HLOD scores are located within a single gene IGF1R. Interestingly, the markers that yielded the second- and third-highest HLOD scores in the mother-based analysis (SNPs rs329292 and rs11247268 with HLOD scores of 1.51 and 1.48 respectively; Figure 2) were located on the same chromosomal region (15q26.3) as the marker with HLODmax in the infants, with a distance of approximately 2.2 Mb between the markers. Because of the colocalized linkage signals in both the fetal- and mother-based analysis in the region including IGF1R, we chose to explore this gene in greater detail.
We performed haplotype segregation analysis using the SNPs flanking the highest linkage peak within IGF1R in the six linked families 24, 70, 126, 150, 185 and 253. Family 210 did not undergo haplotype segregation analysis due to an absence of linkage in this family (Table 1). The analysis was performed for 30 SNPs covering a 330 kb region encompassing the entire IGF1R gene. We aimed to resolve whether one or more distinguishable haplotype segments inferred from these SNPs cosegregated with SPTB. While not suitable as such for statistical evaluation, segregation analysis is a useful tool for an empirical hypothesis-generating approach. The model used for the segregation analysis was initially designed to best fit the mode of inheritance (MOI) used in the linkage analysis (dominant MOI, allowing for the presence of healthy carriers as transmitters of a disease-cosegregating haplotype). However, haplotype-sharing analysis offers the advantage over parametric linkage analysis of dissecting the genetic data regardless of true MOI.
IGF1R haplotypes for family 70 with the highest linkage signal are represented in Figure 3, and Figure S2 shows these haplotypes for all of the linked families, including family 70. In all six families, and considering the haplotypes comprising 30 SNPs spanning the entire IGF1R gene, we observed prominent IBD within-family haplotype sharing among the affected relatives (Figure 3 and Figure S2). A single disease-cosegregating haplotype was shared IBD within family in families 70, 126, 253 and 185. In the remaining two families, 24 and 150, an IBD-shared haplotype was identified in a subset of members of each family, and a second haplotype derived from another carrier was identified in a different subset of members. On the whole, the segregation analysis supported the assumed model very precisely, because it predicted the carriership of a disease-cosegregating haplotype in 34 out of all 38 affected individuals (absent from individuals 70-2, 150-15, 150-19 and 150-24), whereas only six nonaffected family members were deemed to be carriers of a disease-cosegregating haplotype (present in individuals 24-1, 24-5, 24-11, 24-9, 150-8, and 150-26) (Figure 3 and Figure S2). The recombinations that we observed, particularly that in the unaffected male 253-5, also fit the segregation model well. Three of the families (126, 253, and 185) showed a complete haplotype-disease cosegregation (Figure S2). However, unexpected male carriers were identified in four families (spouses of females 24-2 and 70-1, and males 253-1 and 150-3). Families 24 and 150 were consistent with a mixed maternal-paternal bilineal transmission pattern, and family 70 exhibited nearly complete penetrance with mixed unilineal maternal–paternal transmission. The segregation patterns were consistent with complete penetrance in families 126, 253, and 185; with 100% maternal transmission in families 126 and 185; and unexpected 100% paternal transmission in family 253.
An attempt to identify a similar IBD- or IBS-shared chromosomal segment among families led us to discover two core haplotypes with overlapping locations (illustrated in Figure S2). A 55-kb 12-SNP haplotype, CAGACGATACTC (core I, comprising the interval between SNPs rs1879612–rs2715416), was shared IBS among five families: all of families 24, 70, 126 and 253 and part of 150. Another 79-kb 11-SNP core haplotype, ATGTGTAATGT (core II, SNPs rs2684761–rs3743259), was shared IBS between part of family 150 and the whole of family 185. The disease-segregating chromosomes with the core II haplotype were maternal in 100% of preterm-born individuals carrying these chromosomes, while haplotypes with core I were maternal in only about half (47%) of the cases.
All of the linked families shared one of the core haplotype segments, I or II. To demonstrate further the relevance of these haplotypes, we compared the haplotype frequencies in the affected and unaffected members of the linked families with those of an independent Finnish reference population from the Nordic Database Lund-Malmö dataset. These Finnish reference samples (referred to as Nordic–Finn, n=955) are derived from a region in Bothnia in which there is evidence of western late settlement. The people in this region are genetically close to those in the region of western early settlement representing the origin of the linkage families. Thus, these samples represented a good control and were also better matched with our study population than HapMap CEU individuals (CEPH; Utah residents with ancestry from northern and western Europe). Frequencies of haplotype core segments I and II were 0.30 and 0.33, respectively, in the affected members of the linkage families. Frequencies of core I and II haplotypes were 0.17 and 0.12, respectively, in the unaffected members of the linkage families, and 0.18 and 0.06 in the Nordic–Finn population. Thus, frequencies of the core haplotypes in the reference population were close to those of the unaffected members of the linkage families, while these frequencies were increased in the affected members of the linkage families. The higher frequency of haplotype core II in the unaffected members of the linkage families compared to Nordic–Finn individuals is explained by the high overall incidence of this haplotype in the largest linked family (Family 150; Figure S2). In terms of the LD patterns (Figure S3) in the unaffected individuals, the genetic profile of our study population was representative of the Nordic–Finn reference population.
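The frequency comparison above can be summarized as simple fold differences. The sketch below only tabulates ratios of the frequencies quoted in the text; it is descriptive, not a statistical test, because members of the linkage families are related and the frequencies are rounded. The dictionary layout and function name are illustrative assumptions.

```python
# Haplotype frequencies as quoted in the text (rounded to two decimals).
frequencies = {
    "core I": {"affected": 0.30, "unaffected": 0.17, "nordic_finn": 0.18},
    "core II": {"affected": 0.33, "unaffected": 0.12, "nordic_finn": 0.06},
}


def fold_difference(freqs, reference):
    """Ratio of the frequency in affected members to a reference group."""
    return freqs["affected"] / freqs[reference]


for name, freqs in frequencies.items():
    versus_unaffected = fold_difference(freqs, "unaffected")
    versus_reference = fold_difference(freqs, "nordic_finn")
    print(f"{name}: {versus_unaffected:.2f}x vs unaffected, "
          f"{versus_reference:.2f}x vs Nordic-Finn")
```

On these rounded values, core II shows the larger relative increase in affected members, although much of its excess over the reference population is attributed in the text to family 150.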
Furthermore, there was a short two-SNP major-allele haplotype AT at rs11630259–rs1357112 shared IBS between core segments I and II in 100% of the affected individuals (n=38) and predicted carriers (n=18) (Figure S2), whereas the same haplotype was present in only 62% of the unaffected family members (n=37) and 72% of the Nordic–Finn individuals. Whether this particular segment sharing reflected the critical location of disease predisposition or arose by chance could not be statistically evaluated in the current setting. Unfortunately, these two SNPs, which are not in LD with the surrounding region SNPs (Figure S3) and thus could not be imputed from other SNPs, failed to settle in the Sequenom iPLEX association platform and therefore could not be included in the following case-control association study.
We additionally performed a post hoc linkage analysis in which we used the transmission information obtained from the segregation analysis model, with both unaffected carriers and individuals born preterm defined as affected (n=55 affected individuals). An overall increase in the HLOD scores was observed within IGF1R, with an HLODmax of 3.81 at rs2715416 after pruning and an HLODmax of 5.15 at rs2684811 without pruning, further supporting the view that this genomic region may harbor a true susceptibility gene.
To validate the potential linkage and association between IGF1R and SPTB, we enrolled a new Finnish population comprising cases originating from the northern (Oulu) and southern (Helsinki) regions of the country; these cases were independent of the families used for the linkage analysis. Tagging SNPs covering the entire IGF1R gene were determined using HapMap data from the CEU population. Among the 20 SNPs studied, two (rs7165181 and rs4966038) showed a weak association (P<0.05) in the infants but not in the mothers (Table 2). Even so, a 55-kb region of six SNPs in LD (Figure 4) spanning these two SNPs and extending to rs2715416 (which yielded the original linkage signal) showed statistically significant haplotype association with SPTB in the infants exclusively (Table 3). Similar associations were evident in both recurrent and sporadic SPTB. The signals obtained from independent sets of study populations using linkage, segregation and association approaches were colocalized within the same region of IGF1R (Figure 5 and Table S1) and were consistently observed in the infants but not the mothers; i.e., when the affected phenotype was being born preterm instead of giving birth to preterm infant(s). The birthweights and gestational ages of preterm infants carrying the associating haplotype (n=71; 2,025±626 g; 32.6±3.0 wk; mean ± standard deviation) did not differ significantly from those of preterm infants without this haplotype (n=263; 2,105±616 g; 32.3±2.9 wk; P values of 0.64 and 0.35, respectively). The predicted frequency of the associating haplotype in the HapMap CEU population was 0.052, similar to that in our control population. In HapMap CHB (Han Chinese in Beijing, China) and JPT (Japanese in Tokyo, Japan) populations, allele and haplotype distributions were completely different from our controls, as well as from the HapMap CEU population. The CHB and JPT populations completely lacked the associating haplotype.
We were not able to estimate the frequency of this haplotype in the rest of the HapMap populations, including YRI (Yoruba in Ibadan, Nigeria), due to unavailable genotype data for part of the SNPs in these populations.
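As a worked check of the effect size for the associating haplotype, the odds ratio can be approximated from the rounded haplotype frequencies given in the abstract (0.11 in cases, 0.05 in controls). This is a minimal sketch under that assumption; the published estimate was presumably derived from the underlying haplotype counts in Table 3, which are not reproduced here, and the function name is illustrative.

```python
def odds_ratio(freq_cases, freq_controls):
    """Odds ratio for carrying a haplotype, computed from two frequencies."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Rounded frequencies from the abstract: 0.11 in cases, 0.05 in controls.
haplotype_or = odds_ratio(0.11, 0.05)
print(f"approximate odds ratio: {haplotype_or:.2f}")
```

The result lies between 2.3 and 2.4, in line with the reported odds ratio of 2.3; a small discrepancy is expected because the input frequencies are rounded to two decimals.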
According to epidemiological studies, there is evidence for a heritable predisposition to preterm birth with both maternal and fetal contribution. To our knowledge, the present report describes the first genome-wide investigation and the first linkage study to identify genomic regions associated with SPTB. We detected significant parametric HLOD scores in the infants for three intronic markers (rs1521480, HLOD=3.63; rs4966936, HLOD=3.63; and rs2684811, HLOD=4.68) on chromosome 15q26.3 within a single candidate gene, IGF1R, encoding the type 1 insulin-like growth factor receptor IGF-1R (Table 1).
We identified one major disease-cosegregating haplotype (core I) in all but one of the six linked families (Figure 3 and Figure S2). Rather than IBD, this sharing is likely to be IBS, because of the high frequency of this haplotype in the population, a view consistent with the genealogical survey, which suggested no common ancestry among the linked families. The occurrence of an alternative shared haplotype segment (core II) in all of family 185 and part of family 150 may reflect allelic heterogeneity or absence of true linkage in this subgroup, which would be typical of complex phenotypes even in an isolated population. Taken together, our data on linkage and disease segregation of IGF1R SNPs are consistent with a role for this genomic region in SPTB under dominant MOI with incomplete penetrance allowing for healthy carriers or disease transmitters under etiological heterogeneity (the existence of phenocopies). However, because the segregation pattern supports the model of disease-cosegregating IGF1R haplotype transmission via both healthy male and female carriers, our initial MOI based on pure unilineal maternal transmission is likely to be oversimplified.
We examined the gene encoding IGF-1R in a separate investigation with a Finnish population originating from two regions of the country, which revealed an association of a fetal IGF1R haplotype with SPTB (Table 3 and Figure 5). Both the pattern of maximal haplotype sharing in the linked families and the region of association observed in the case-control study independently placed SPTB susceptibility on the same segment within IGF1R. As a whole, these analyses provide evidence that the fetal IGF1R influences the risk of spontaneous preterm labor leading to preterm birth.
IGF-1R is a heterotetramer composed of two extracellular alpha subunits containing a ligand-binding site for IGFs and two membrane-spanning beta subunits harboring intracellular tyrosine kinase activity involved in a variety of cellular functions. Upon activation by IGF-1 or IGF-2, IGF-1R participates in regulation of the cell cycle. Accordingly, certain IGF1R mutations result in intrauterine growth restriction, whereas polymorphisms of this ubiquitously expressed gene may not influence fetal growth. The roles of IGF-1R in normal and pathological growth and differentiation and aging involve interactions with the ubiquitous growth factors, hormones, and proinflammatory cytokines that are considered to be mediators of the labor process. Several IGF-binding proteins that regulate IGF-1R–dependent signaling cascades have been studied in the context of SPTB, and some of them have been implicated in preterm labor. However, these studies did not involve IGF1R and thus the extension of studies involving IGF-1R (particularly the ligand-binding alpha subunit encoded by exons 1–10) to the process of labor, with special consideration of fetal involvement in the endocrine and paracrine control of the preterm labor process, is clearly indicated. The range of regulatory roles performed by the IGF system is consistent with our view that IGF1R influences susceptibility to SPTB. Furthermore, a recent study revealed a parent-of-origin-specific methylated site within intron 2 of IGF1R, which localizes to the same region that was identified in our haplotype segregation and case-control association analyses. This site was predominantly methylated on the maternally derived chromosome and may be involved in the imprinting process. This finding suggests a possible epigenetic mechanism through which IGF1R may be involved in the preterm labor process.
In our linkage families, the core II IGF1R haplotype was maternally transmitted to the individuals born preterm, whereas the core I haplotype had a mixed transmission. This is an important aspect to consider in future studies involving IGF1R and preterm birth.
In the present study, we obtained no clear linkage or association signals when the outcome phenotype was giving preterm birth (affected mother phenotype). Because the number of affected mothers was small (n=21), the power to detect regions of linkage was limited. Although the maternal HLODmax was <2, which we considered to be below an initial threshold suggesting linkage in this study, we identified the regions yielding the highest maternal linkage signals (HLODs of 1.53 and 1.51 for markers rs11167102 and rs329292 on chromosomes 8q24.3 and 15q26.3, respectively). An interesting finding was that the SNP with the maternal HLOD score of 1.51 was located in the same region as the one with the HLODmax obtained from the infants, separated by 2 Mb (Figure 5). To conclude, the lack of association in the mothers and the observation of both maternal and paternal transmission in the segregation analysis unexpectedly did not support a major maternal contribution in IGF1R-mediated genetic susceptibility to SPTB. Our discovery of a fetal gene was unforeseen, but does not exclude the role of maternal effects via other susceptibility genes. Interestingly, a recent study suggested that the fetal genome plays a more significant role than the maternal genome in individuals of European ancestry. However, other studies have pointed out the importance of the maternal genome in genetic predisposition to SPTB.
Our study has several strengths compared to previous genetic studies of preterm birth. These candidate gene studies have been limited in several respects, such as small or mixed populations that may represent only the mother or the fetus, lack of replication, inconsistencies in phenotype definitions and incomplete coverage of variation within a gene. In contrast, our study first focused on detecting novel genes involved in SPTB using a nonhypothesis-driven genome-wide approach. We evaluated both the maternal and fetal contribution. Additionally, careful attention was paid to selection of the families for the linkage analysis to ensure a precise phenotype definition. Lastly, we replicated the finding of a linked and disease-cosegregating region in an independent case-control cohort covering a large part of the entire country. The effect size we observed in the infants fell within the range of power calculations described in Materials and Methods. Although the Finnish population represents a higher genetic similarity than most populations, a clear substructure caused by population history is known to exist within the country. However, the linkage and segregation analyses were performed using a relatively homogeneous study population, which is amenable to a search for shared genomic segments. Our case-control cohort comprised two regions (northern Finland and southern Finland/Helsinki) known to represent slightly distinct patterns of genetic variation; the Helsinki region is known to be representative of a genetically more diverse population. The fact that the association was observed across these two subpopulations gives further credence to our findings.
Our results constitute proof of principle for using a genome-wide SNP linkage scan to elucidate a complex heterogenic trait via stringent initial selection of a limited set of individuals. Despite the important strengths mentioned above, our study has some limitations. The result remains to be confirmed and extended by independent case-control analyses in populations with larger sample sizes than in the current study, and by studying additional SNPs covering the large gene more extensively. It is possible that while the ethnically homogeneous nature of our Finnish linkage families can facilitate detection of genetic factors, the relatively high degree of genetic similarity among the Finnish population compared to others may mean that this finding cannot be generalized to more outbred populations. The frequency of the associating IGF1R haplotype in the HapMap CEU population was similar (0.052) to the frequency in the controls of our case-control analysis (Table 3), while this haplotype was completely lacking from HapMap CHB and JPT populations, reflecting population-specific variation within this genomic segment. In the current setting, it cannot be evaluated whether the observed association was due to a haplotypic effect or another linked polymorphic site that was not analyzed in the current study. Therefore, this region needs to be more thoroughly investigated to find the causative sequence variant(s) and also to determine whether other populations may have an IGF1R-mediated risk of SPTB. Furthermore, although we discussed here only regions with an HLOD of >3, other regions with initial linkage signals on chromosomes 2, 4, 10, 12, and 13 should not be ignored as candidates for the risk of spontaneous preterm labor leading to preterm birth.
In conclusion, our genetic linkage and haplotype segregation analysis mapped the novel fetal SPTB susceptibility gene IGF1R on chromosome 15q26.3; this was further confirmed by association in an independent case-control population replicate. This result was unexpected, because the studies on the etiology of SPTB have been focused on cytokine-mediated inflammatory signaling as a possible route of predisposition by the maternal genome. Future clarification of the molecular mechanism of a growth factor pathway with considerable influence on the heritable proportion of the risk of spontaneous premature labor and birth is likely to open up a new potential avenue for preventive therapies.
Written informed consent was obtained from the participants and the study was approved separately by the Ethics Committee of Oulu University Hospital and that of Helsinki University Central Hospital.
We selected families for the linkage analysis from among the population born in the Oulu University Hospital region. We selected mothers with recurrent SPTB from approximately 120,000 birth diaries covering 1975–2005 (prospectively from 2002). To avoid misclassification bias at borderline gestational ages, we defined SPTB as spontaneous onset of labor and birth before 36 (rather than ≤37) completed weeks of gestation. Both labors initiated with intact membranes and those following premature rupture of fetal membranes (PROM, defined as leakage of amniotic fluid as the presenting symptom before the onset of contractions) were included among the SPTBs. Additional exclusion criteria were known risk factors for SPTB (multiple gestation, polyhydramnios, septic infection or chronic disease of the mother, narcotic or alcohol abuse, accidents, and fetuses with congenital anomalies) and all elective preterm births without labor (preeclampsia, intrauterine growth restriction, and placental abruption). Only families of Finnish origin were included. We identified 120 families altogether, with at least two spontaneous singleton preterm births. The family interview revealed that 20 of them were actually large families with multiple relatives affected by SPTB. To search for potential consanguinity, we performed a genealogical study in accordance with the published criteria, tracing ancestors from the Finnish Population Registries and scrutinizing microfiche copies available in the provincial and national archives of Finland. The genealogical survey showed that most of the large families originated from Northern Ostrobothnia. However, neither close consanguinity nor the common residence of ancestors dating from the 17th century was apparent.
Because prior evidence suggests that a maternal family history of preterm birth may have a greater influence on this disorder, we selected families with apparent maternal inheritance of predisposition to preterm birth, excluding families with seemingly paternal or mixed transmission of the preterm birth phenotype. Using these criteria, seven families were selected. Whole-blood DNA samples (n=89) were taken from both affected and unaffected family members. The pedigrees of the families appear in Figure S1. Six of the mothers who gave preterm birth were themselves born preterm.
We selected mothers with a history of at least one SPTB and their preterm infants from singleton pregnancies from the regions of Oulu (northern Finland) and Helsinki (southern Finland) University Hospitals for an association study using the same phenotypic criteria as for the linkage analysis, resulting in a case population of 348 mothers (244 from Oulu and 104 from Helsinki) and 334 infants (238 from Oulu and 96 from Helsinki). One infant per family was included using the criteria described. One hundred and forty-six (42.6%) of the mothers gave birth following PROM and 197 (57.4%) had labor initiated with intact membranes, while 128 (39.5%) of the infants were born following PROM and 196 (60.5%) without PROM. For five mothers and ten infants, there was no information concerning the occurrence of PROM. The control population consisted of 143 mothers and 197 newborn singleton infants who were prospectively recruited from Oulu University Hospital in 2004–2005 among mothers who had at least three exclusively term deliveries with no pregnancy- or labor-associated complications. It is of note that this part of the study was not restricted to familial SPTB, because it included mothers with both recurrent (n=79) and sporadic (n=265) SPTB and their preterm infants (recurrent n=76 and sporadic n=258). All of the individuals from the families included in the linkage analysis and their relatives were excluded from the association study.
DNA was prepared from whole blood specimens (n=89) by standard methods. DNA samples were genotyped in the Broad Institute Center for Genotyping and Analysis (CGA) using the Affymetrix Genome-Wide Human SNP Array 5.0, consisting of 500,568 SNP markers.
Whole blood (n=797) and buccal cell samples (n=305) were used for DNA extraction. Buccal DNA was extracted using Chelex 100 (Bio-Rad, Hercules, CA, USA), after which it was whole-genome amplified (WGA) in duplicate reactions using the Illustra GenomiPhi V2 DNA Amplification kit (GE Healthcare Sciences, Cardiff, UK), followed by pooling of the duplicates and purification with Illustra Microspin G-50 columns (GE Healthcare Sciences). The quality and quantity of the WGA samples were confirmed with ethidium bromide–stained agarose gels and UV absorbance measurements, and by including nontemplate reactions to control for DNA contamination. In the set-up stage, we genotyped the WGA products in parallel with the corresponding unamplified DNA for a set of test SNPs, yielding >99% consistent genotypes. IGF1R SNP genotyping was performed at the Institute for Molecular Medicine Finland FIMM Technology Centre using the Sequenom iPLEX Gold assay on the MassARRAY Platform for user-defined SNP sets.
We processed the Affymetrix array data files using PLINK, v. 1.02. We used a 1 Mb-to-1 cM converted map, as justified elsewhere. Before linkage analysis, the markers were pruned using a linkage disequilibrium (LD) r² threshold of 0.35 with PLINK. Markers with minor allele frequency (MAF) <0.08, genotyping failure >0.1, or Mendelian errors were excluded, as well as those violating the Hardy–Weinberg equilibrium (HWE, P<0.001). Data processing after removal of the high-LD SNPs yielded a selection of 6,377 autosomal SNPs for the genome-wide linkage analysis. We rechecked the genotype data for any Mendelian inconsistencies using PedCheck before proceeding to linkage analysis.
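The marker filters listed above can be illustrated with a short Python sketch. It is not the PLINK implementation: the `MarkerStats` record and the function names are hypothetical, and the HWE check is a simple 1-df chi-square approximation that may differ from the test PLINK applies. Only the thresholds (MAF 0.08, genotyping failure 0.1, HWE P 0.001) are taken from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class MarkerStats:
    """Hypothetical per-marker summary (not a PLINK data structure)."""
    name: str
    n_aa: int           # homozygotes for allele A
    n_ab: int           # heterozygotes
    n_bb: int           # homozygotes for allele B
    n_missing: int      # individuals with no genotype call
    mendel_errors: int  # Mendelian inconsistencies in the pedigrees


def minor_allele_frequency(m: MarkerStats) -> float:
    typed = m.n_aa + m.n_ab + m.n_bb
    p = (2 * m.n_aa + m.n_ab) / (2 * typed)
    return min(p, 1.0 - p)


def hwe_p_value(m: MarkerStats) -> float:
    """1-df chi-square approximation to the HWE test (an assumption)."""
    typed = m.n_aa + m.n_ab + m.n_bb
    p = (2 * m.n_aa + m.n_ab) / (2 * typed)
    q = 1.0 - p
    expected = (typed * p * p, 2 * typed * p * q, typed * q * q)
    observed = (m.n_aa, m.n_ab, m.n_bb)
    if any(e == 0 for e in expected):
        return 1.0  # monomorphic marker: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(m: MarkerStats, min_maf=0.08, max_missing=0.1,
              min_hwe_p=0.001) -> bool:
    typed = m.n_aa + m.n_ab + m.n_bb
    if typed == 0:
        return False
    missing_rate = m.n_missing / (typed + m.n_missing)
    return (m.mendel_errors == 0
            and missing_rate <= max_missing
            and minor_allele_frequency(m) >= min_maf
            and hwe_p_value(m) >= min_hwe_p)
```

A marker is kept only if it clears every filter, so the order in which the filters are applied does not change the retained set in this sketch.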
We studied two main outcome phenotypes for SPTB: (1) being spontaneously born preterm (affected fetus/infant, n_affected=41), and (2) giving spontaneous preterm birth (affected mother, n_affected=21) (pedigrees in Figure S1). A parametric two-point linkage analysis of SPTB as a dichotomous trait was performed with ANALYZE, v. 1.9.3 BETA, which is a linkage and LD analysis package that uses FASTLINK 4.1P to calculate the pedigree likelihoods and is capable of managing both extended pedigrees and large numbers of markers. We used a dominant low-penetrance model, assuming a disease allele frequency of 0.001 and penetrances of 0.001, 0.001, and 0 for the homozygotes, heterozygotes and wild-type homozygotes, respectively. This kind of model is nearly convergent with a model-free analysis, because it minimizes the effect of misspecifying the true mode of inheritance (MOI) while retaining the highest power of a parametric analysis to detect linkage. Parametric linkage in the presence of heterogeneity was assessed using heterogeneity LOD (HLOD) scores and their accompanying estimates of the proportion of linked families (α) and recombination fraction (θ).
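To make the HLOD statistic concrete, the sketch below uses the standard admixture form, in which each family contributes log10(α·10^LOD + 1 − α) at a given θ and the sum is maximized over α and θ. This is a minimal illustration under stated assumptions, not the ANALYZE or FASTLINK code: the function names are hypothetical, the per-family LOD scores are assumed to have been computed already, and α is searched on a simple grid.

```python
import math


def hlod_at_theta(family_lods, alpha_steps=100):
    """Maximize the admixture LOD over alpha for one value of theta.

    family_lods: per-family two-point LOD scores at this theta.
    Returns (hlod, alpha); alpha = 0 gives a score of exactly 0.
    """
    best_score, best_alpha = 0.0, 0.0
    for k in range(alpha_steps + 1):
        alpha = k / alpha_steps
        score = sum(math.log10(alpha * 10 ** z + (1.0 - alpha))
                    for z in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


def hlod(lods_by_theta, alpha_steps=100):
    """Maximize over theta as well.

    lods_by_theta: dict mapping theta -> list of per-family LOD scores.
    Returns (hlod, alpha, theta).
    """
    best = (0.0, 0.0, 0.5)  # no evidence of linkage
    for theta, family_lods in lods_by_theta.items():
        score, alpha = hlod_at_theta(family_lods, alpha_steps)
        if score > best[0]:
            best = (score, alpha, theta)
    return best
```

When all families support linkage, the maximum is reached at α = 1 and the HLOD equals the summed LOD; when some families give negative LOD scores, an intermediate α can yield a higher score than the plain sum, which is the situation the HLOD is meant to handle.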
We applied two-point linkage analysis to avoid bias that could result in multipoint analyses due to missing parental genotypes and markers residing in LD. Our linkage analysis strategy was to perform an analysis with an LD-pruned marker set and then to screen regions with initial linkage signals (HLOD>2) with a denser marker map including all markers with MAF>0.08 in the region. Using high-density SNPs in linkage analysis has the advantages of a low error rate, high per-marker call rates, and higher information content when compared to microsatellites. In two-point analysis, LD does not increase type 1 error.
When a marker showed HLOD>2 (six positions), we selected the genomic region for further analysis. This fine-scale linkage analysis was performed using the unpruned marker set on the flanking region (~5 Mb) of each of the initial linkage signals with the same parametric model as the original scan. We considered an HLOD score of >3 as a signal of linkage.
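The selection of regions for fine-scale analysis can be sketched as follows. The function name, the tuple layout of the scan results, and the example values are hypothetical. The half-width `flank_bp` is also an assumption: the text gives a flanking region of about 5 Mb without stating whether that is the total width or the distance on each side, so it is left as a parameter.

```python
def candidate_regions(scan, hlod_threshold=2.0, flank_bp=2_500_000):
    """Return merged (chrom, start, end) windows around initial signals.

    scan: iterable of (chrom, position_bp, hlod) from the pruned-marker scan.
    Windows on the same chromosome that overlap are merged into one.
    """
    hits = sorted((chrom, pos) for chrom, pos, score in scan
                  if score > hlod_threshold)
    regions = []
    for chrom, pos in hits:
        start, end = max(0, pos - flank_bp), pos + flank_bp
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the previous window on this chromosome: extend it.
            prev = regions[-1]
            regions[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            regions.append((chrom, start, end))
    return regions


# Hypothetical scan results, for illustration only.
example_scan = [
    (15, 97_000_000, 2.4),
    (15, 98_000_000, 2.1),
    (3, 10_000_000, 1.9),
    (7, 1_000_000, 2.6),
]
regions = candidate_regions(example_scan)
```

Each returned window would then be rerun with the unpruned marker set and the same parametric model, with HLOD>3 taken as a signal of linkage.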
We used PedPhase version 2.0 utilizing integer linear programming (ILP) to find the minimum-recombinant haplotype configuration in the linked families 24, 70, 126, 150, 185 and 253 for the SNPs flanking the best linkage peak. Deceased or untyped individuals were also included if their haplotypes could be reconstructed from their relatives. The potential presence of disease-cosegregating haplotypes (“affected haplotypes”, affected fetus phenotype) was evaluated using a model compatible with the assumption of maternal unilineal inheritance with incomplete penetrance, utilizing the scheme of the most likely segregation pattern to search for a minimum number of affected haplotypes. The model assumed dominant MOI, allowing for the presence of healthy carriers as transmitters of an affected haplotype.
We used an independent Finnish reference population of the Nordic Database Lund-Malmö dataset. These samples (referred to as Nordic–Finn, n=955) are derived from a region in Bothnia in which there is evidence of western late settlement. We used Beagle 3.1 to infer the haplotype phases for unrelated individuals in the Nordic–Finn samples.
IGF1R is a large, >300 kb gene with nearly 2,000 SNPs (NCBI B36 assembly dbSNP b126), but little frequent exonic variation; it has relatively weak intragenic LD and is not readily divided into discrete blocks of limited haplotype diversity. Using HapMap data (release 23a/phaseII) from the CEU population to determine tagging SNPs (tSNPs) covering the entire IGF1R gene (chr15: 97,004,502–97,329,396), we obtained a list of 100 tSNPs (pairwise tagging, r² cutoff 0.8, MAF>0.1). Given the number of individuals available for the study and the power to detect associations, it was reasonable to set up a single iPLEX set, which can include up to approximately 30 compatible SNPs. To select SNPs, we visualized the gene region's LD pattern in the HapMap CEU population in the Haploview program v. 4.1 and selected one SNP with the highest MAF from each haplotype block designated by this program. We also included the coding SNP rs2229765 (Glu1043Glu, also referred to as 3174 G>A) because individuals carrying at least one A allele are reported to have lower levels of free plasma IGF-1 than GG homozygotes, suggesting that this polymorphism may have a functional effect. After testing and validation, a selection of 20 SNPs (listed in Table 2 and details in Table S1) remained for the case-control association study.
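The selection rule described above (one SNP with the highest MAF per haplotype block, plus explicitly included SNPs such as rs2229765) can be sketched in Python. The function name, the input layout, and the placeholder SNP names and MAF values are hypothetical; the block assignments are assumed to have been read off from Haploview beforehand.

```python
def select_snps(snps, forced=()):
    """Pick the highest-MAF SNP in each block, then append forced SNPs.

    snps: iterable of (snp_id, block_id, maf).
    forced: SNP ids included regardless of block membership.
    Returns SNP ids in order of first appearance of each block.
    """
    best = {}  # block_id -> (snp_id, maf)
    for snp_id, block_id, maf in snps:
        if block_id not in best or maf > best[block_id][1]:
            best[block_id] = (snp_id, maf)
    chosen = [snp_id for snp_id, _ in best.values()]
    for snp_id in forced:
        if snp_id not in chosen:
            chosen.append(snp_id)
    return chosen


# Placeholder SNP names and MAF values, for illustration only.
example = [
    ("snp_a", 1, 0.12),
    ("snp_b", 1, 0.31),
    ("snp_c", 2, 0.45),
    ("snp_d", 2, 0.20),
]
panel = select_snps(example, forced=("rs2229765",))
```

In practice the resulting panel would still need assay design and validation, which is why the final set in the study (20 SNPs) is smaller than the capacity of a single iPLEX set.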
We performed statistical tests using Haploview v. 4.1. We took multiple testing into account by performing permutations with 10,000 replicates and considered a corrected P value of <0.05 to be statistically significant. For analysis of haplotypes with birthweight and gestational age, we inferred phased IGF1R haplotypes from unphased data using Beagle 3.2. To analyze potential associations between haplotypes and birthweight or gestational age, the nonparametric Mann-Whitney U-test was used with Predictive Analytics SoftWare (PASW) statistics, version 17.0.3 (IBM SPSS, Inc.). Power consideration of the case-control study: the sample size provides 80% power (alpha=0.0025, allowing for multiple testing of 20 SNPs) to detect risk allele carrier relative risks of approximately 2.1–2.7 in the infants and 2.4–3.1 in the mothers for allele frequencies ranging from 0.2 down to 0.05, considering SPTB as a discrete trait, assuming a causal SNP, a prevalence of 0.05, and using the allelic 1 df test.
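The two multiple-testing devices mentioned above can be illustrated in Python. The alpha of 0.0025 used in the power calculation matches a Bonferroni division of 0.05 by 20 SNPs. The permutation part is a generic max-statistic scheme with an allelic 1-df chi-square test; it is offered as an assumption about the general approach and is not a reimplementation of Haploview. All function names are hypothetical.

```python
import random

# Bonferroni-style threshold used in the power consideration: 0.05 / 20 SNPs.
bonferroni_alpha = 0.05 / 20


def allelic_chi2(genotypes, labels):
    """Allelic 1-df chi-square from minor-allele counts (0, 1 or 2).

    labels: 1 for case, 0 for control, aligned with genotypes.
    """
    n_case = sum(labels)
    n_ctrl = len(labels) - n_case
    a = sum(g for g, y in zip(genotypes, labels) if y == 1)  # case minor
    c = sum(g for g, y in zip(genotypes, labels) if y == 0)  # control minor
    b, d = 2 * n_case - a, 2 * n_ctrl - c                    # major alleles
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def permutation_corrected_p(snp_genotypes, labels, n_perm=10_000, seed=0):
    """Corrected P per SNP: share of label permutations whose best
    chi-square across all SNPs reaches that SNP's observed chi-square."""
    rng = random.Random(seed)
    observed = [allelic_chi2(g, labels) for g in snp_genotypes]
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(allelic_chi2(g, shuffled) for g in snp_genotypes)
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add-one correction keeps the estimate strictly above zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Comparing each observed statistic with the best statistic in every permuted data set accounts for the correlation between SNPs in LD, which a plain Bonferroni division does not.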
All of the chromosomal positions refer to NCBI Build 36 of the human genome.
Pedigrees of the seven families with recurrent SPTB included in the linkage analysis.
(0.46 MB PDF)
IGF1R haplotype segregation in the linked families.
(2.21 MB PDF)
Pairwise marker linkage disequilibrium (LD) in the unaffected members of the linked families and the reference population.
(1.92 MB PDF)
Combined results of IGF1R SNPs in fine-scale linkage analysis and association analysis with the affected fetus phenotype.
(0.08 MB PDF)
We thank Maarit Haarala, Outi Kajula, Mirkka Ovaska, and Riitta Vikeväinen for sample collection and technical assistance and Tero Hiekkalinna and Joseph Terwilliger for helpful scientific advice in linkage analysis.
The authors have declared that no competing interests exist.
This research was supported by the Academy of Finland (grant number 105734 to RH, 126662 and 125911 to MH), the Sigrid Juselius Foundation (MH), the Foundation of Pediatric Research in Finland (RH) and University of Oulu Faculty of Medicine (RH). Oulu University Hospital was supported by the Ministry of Social Affairs and Health (MH), the Wellcome Trust (grant number WTO89062 to AP), the Academy of Finland Center of Excellence for Complex Disease Genetics 200923 (AP), and by grants from the March of Dimes, the Children's Discovery Institute at Washington University School of Medicine, and St. Louis Children's Hospital awarded to LJM. The Broad Institute Center for Genotyping and Analysis was supported by grant U54 RR020278 from the National Center for Research Resources. The Malmo-Lund study was supported by Novartis Pharmaceuticals, the Sigrid Juselius Foundation, the Folkhalsan Research Foundation, and a Swedish Research Council Linne grant; and the data source was funded by the Nordic Center of Excellence in Disease Genetics based on samples regionally selected from Finland and Sweden. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.