The International Journal of Biostatistics
Int J Biostat. 2011 January 1; 7(1): 32.
Published online 2011 August 23. doi:  10.2202/1557-4679.1305
PMCID: PMC3173606

Modeling Fetal Weight for Gestational Age: A Comparison of a Flexible Multi-level Spline-based Model with Other Approaches


We present a model for longitudinal measures of fetal weight as a function of gestational age. We use a linear mixed model, with a Box-Cox transformation of fetal weight values, and restricted cubic splines, in order to flexibly but parsimoniously model median fetal weight. We systematically compare our model to other proposed approaches. All proposed methods are shown to yield similar median estimates, as evidenced by overlapping pointwise confidence bands, except after 40 completed weeks, where our method seems to produce estimates more consistent with observed data. Sex-based stratification affects the estimates of the random effects variance-covariance structure, without significantly changing sex-specific fitted median values. We illustrate the benefits of including sex-gestational age interaction terms in the model over stratification. The comparison leads to the conclusion that the selection of a model for fetal weight for gestational age can be based on the specific goals and configuration of a given study without affecting the precision or value of median estimates for most gestational ages of interest.

Keywords: multi-level models, fetal growth, small for gestational age

1. Introduction

Poor fetal growth is strongly associated with adverse perinatal outcomes such as neurological damage, seizures, organ failure and perinatal mortality (Gabbe et al., 2007). Conventionally, a fetus or infant whose weight is in the smallest 10 percent of the population at a given gestational age is classified as small for gestational age (SGA) (Zhang et al., 2010b), and considered to be at increased risk of perinatal morbidity and mortality (Gabbe et al., 2007). Accurate weight-for-gestational-age percentiles are therefore needed to determine which fetuses may require additional investigations and closer monitoring. Conventional weight-for-gestational-age reference charts (Alexander et al., 1996, Kramer et al., 2001, Källén, 1995, Skjaerven et al., 2000), however, have an important shortcoming. Their trajectories of fetal “growth” are derived from the cross-sectional weights of infants born at different gestational ages, not from serial measurements of individual fetuses throughout pregnancy. Since preterm livebirths (infants born prior to 37 weeks of gestation) have been shown to be, on average, smaller than their in-utero peers (Fry et al., 2002, Hediger et al., 1995, Marsál et al., 1996, Ott, 1993, Weiner et al., 1985), models of fetal growth based on cross-sectional birth weight measurements are biased at younger gestational ages.

The introduction of obstetrical ultrasound has made it possible to estimate fetal weights prior to birth, and several models of fetal growth based on ultrasound estimates of fetal weight have been proposed. Hadlock et al. (1991) fitted a series of polynomial models to estimated fetal weight as a function of gestational age. The best model used a quadratic polynomial and log-transformed weight values, and was selected on the basis of the largest coefficient of determination (R²), the smallest standard deviation, and inspection of the residuals for uniformity of variance. However, since their dataset contained only a single estimated weight per pregnancy, the model was not designed to describe fetal growth per se.

Royston (1995) fitted a multi-level model to longitudinal estimated ultrasound fetal weight data. A Box-Cox transformation and fractional polynomials were used in order to stabilize variance and linearize the relationship between gestational age and transformed weight. A linear mixed model, which allowed for a random slope and a random intercept, was fitted to the transformed data.

Hooper et al. (2002) fitted a quadratic polynomial to the natural logarithm of the weight data. This polynomial regression fit produced residuals, which were then normalized by fitting a linear spline to the normal probability plot of the residuals and fitting a model for the standard deviation of the residuals. Combining those two estimates, they transformed the residuals into z-scores, which were split into their measurement error and latent score components, the latter being defined simply as the raw score stripped of its measurement error component.

Pan and Goldstein (1997) fitted a multi-level model for pediatric growth on weight z-scores computed by applying Cole’s Lambda, Mu, Sigma (LMS) method (Cole and Green, 1992). The skewness, mean and variance functions, at the core of the LMS method, were estimated using cubic splines. They were combined to create a transformation function that uses weight values as input and outputs the required z-scores. The model was then used to derive unconditional and conditional norms for growth (conditional on past growth). Conditional norms could be estimated by fitting a linear regression model that has z-scores at a number of previous time points as covariates.
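As a brief illustration of the LMS transformation mentioned above, the following sketch converts a weight into a z-score and back, using the standard form of the Cole and Green (1992) transformation. The function names and the L, M and S values in the usage note are hypothetical; in practice the three quantities are smooth curves estimated as functions of age.

```python
import math


def lms_z(y, L, M, S):
    """z-score of measurement y given skewness L, median M and coefficient of variation S."""
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def lms_centile(z, L, M, S):
    """Inverse of lms_z: the measurement that corresponds to z-score z."""
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)
```

For example, with hypothetical values L = 0.5, M = 3000 g and S = 0.12 at a given age, `lms_z(3000.0, 0.5, 3000.0, 0.12)` is 0, and `lms_centile` applied to any z-score returns the weight at the corresponding centile.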

The lack of population-level, longitudinal fetal weight measurements has limited widespread adoption of longitudinal fetal growth references. However, there is now renewed interest in the collection of data to update fetal growth references, including ongoing work by the US National Institutes of Health to develop a new National Standard for Normal Fetal Growth (Zhang et al., 2010a). Such work will need to select a longitudinal model for fetal weight for gestational age. Although different approaches to modeling longitudinal fetal weight measurements have been proposed, it is unknown to what extent the choice of model affects the estimates of fetal weight for gestational age, and which model, if any, is superior. A systematic comparison of the models is needed. Existing models may also benefit from improvements. Increasing model flexibility by including a large number of random effects parameters can now be more easily implemented thanks to advances in mixed model statistical software. Mixed models additionally offer convenient ways to deal with heteroscedasticity. Finally, adequately accounting for the influence of fetal sex, an important physiological determinant of fetal weight, may also improve fetal weight-for-gestational-age models.

2. Objectives

In this light, we sought to:

  • Present a new, parsimonious, and more flexible modeling strategy for fetal weight for gestational age based on a sex-specific multi-level model,
  • Systematically compare medians from the proposed modeling strategy with those of Royston (1995), Hooper et al. (2002) and Pan and Goldstein (1997), a description of which can be found in the appendices.

3. Data

The dataset consists of longitudinal ultrasound data and birth data for singletons collected in Scandinavia from 1986 to 1988 for the Successive SGA Birth study (Bakketeig et al., 1993). Ultrasound biometric measurements were obtained at antenatal study visits at 17, 25, 33 and 37 weeks. Fetal abdominal circumference, femur length and biparietal diameter were combined using Hadlock’s formula (Hadlock et al., 1984) to derive estimates of fetal weight. The original study included 1945 pregnancies, which included a 10% sample of the general obstetrical population (n=561) and an over-sampling of high-risk pregnancies (n=1384). In the 10% random sample, there were 454 pregnancies with both ultrasound and birth data; these were retained for the present analysis. Of these 454 pregnancies, 3 were excluded because they had no complete birth weight - gestational age or estimated fetal weight - gestational age pairs, leaving 451 pregnancies for analysis. Estimates of gestational age were established using the date of the last normal menstrual period (LNMP), confirmed by an estimate of gestational age from early ultrasound through the following algorithm: if the 17 week ultrasound estimate differed from that based on LNMP by less than 14 days, the LNMP estimate was chosen. If the difference was 14 days or more, or if the date of LNMP was unknown, the ultrasound estimate was chosen. The dataset is the largest of which we are aware to have 5 or more serial ultrasound measurements from an unselected obstetrical population. The representativeness of the study population, frequency of ultrasound measurements, and high study quality make it an ideal input for comparing models.
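The dating rule described above can be written as a short function. This is a minimal sketch; the function name, the argument names and the use of `None` for an unknown LNMP date are assumptions made for illustration.

```python
from typing import Optional


def choose_gestational_age(lnmp_days: Optional[float],
                           ultrasound_days: float,
                           threshold_days: float = 14.0) -> float:
    """Return the gestational age estimate (in days) selected by the dating rule.

    lnmp_days is the estimate based on the last normal menstrual period,
    or None when that date is unknown; ultrasound_days is the estimate
    from the 17 week ultrasound.
    """
    if lnmp_days is None:
        return ultrasound_days
    if abs(ultrasound_days - lnmp_days) < threshold_days:
        return lnmp_days
    return ultrasound_days
```

With this rule, a pregnancy dated at 120 days by LNMP and 125 days by ultrasound keeps the LNMP estimate, whereas a discrepancy of exactly 14 days switches to the ultrasound estimate.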

4. Methods

We fitted a linear mixed model to the fetal weight measurements using restricted cubic splines based on the truncated power basis (Harrell, 2001) to flexibly model the association between gestational age and weight. We included random effects on both the intercept and gestational age terms in order to account for between-fetus variability. Finally, since the variance of weight measurements increases with gestational age (Hadlock et al., 1991) we considered two approaches to manage heteroscedasticity:

  1. The weight variable was transformed by means of a variance-stabilizing technique,
  2. A variance structure that allows for heteroscedasticity was imposed on the residuals or the random effects.

A Box-Cox transformation was a natural choice in approach 1. The choice of the scaling parameter was driven by the need to make the output data as normally distributed and homoscedastic as possible. In order to ensure this, we adopted a REML-based approach devised by Gurka et al. (2006). Because applying the inverse transformation to the fitted model yields biased estimates of the mean, but not of the median, on the original scale (Duan, 1983), we present medians in subsequent analyses.
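The reason medians survive back-transformation while means do not can be seen in a small numerical sketch. The weights and the value of the scaling parameter below are hypothetical, and the sample quantities only illustrate the population-level argument: because the Box-Cox transformation is monotone, the back-transformed median equals the median on the original scale, whereas the back-transformed mean does not equal the original mean.

```python
import math
from statistics import median


def box_cox(w, lam):
    """Box-Cox transformation of a positive weight w with scaling parameter lam."""
    if w <= 0:
        raise ValueError("weights must be positive")
    if lam == 0:
        return math.log(w)
    return (w ** lam - 1.0) / lam


def inv_box_cox(y, lam):
    """Inverse Box-Cox transformation."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


weights = [180.0, 210.0, 250.0, 320.0, 900.0]  # hypothetical, right-skewed
lam = 0.3                                      # hypothetical scaling parameter

transformed = [box_cox(w, lam) for w in weights]
median_back = inv_box_cox(median(transformed), lam)
mean_back = inv_box_cox(sum(transformed) / len(transformed), lam)
```

Here `median_back` recovers the original median of 250 g, while `mean_back` falls below the arithmetic mean of the weights (372 g), which is the retransformation bias that motivates reporting medians.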

4.1. Sex-based differences in fetal weight

Evidence suggests that weight differences between male and female fetuses could be important as early as 14 completed weeks of pregnancy (Schwärzler et al., 2004). Therefore, in order to produce sex-specific weight-for-gestational-age curves in such a way that their shapes remain flexible and that the number of variance parameters to be estimated is reduced, we included sex-gestational age interaction terms in the model.

4.2. Model formulation

The model was formally specified as equation 1, where

  • Wi is a vector of weight values for individual i,
  • g(.) is the Box-Cox or the identity transformation,
  • βm, m = 1, 2,..., 2k – 1, is the vector of fixed effect coefficients,
  • bi,j, j = 1, 2,..., k – 1, is the vector of random effects,
  • Si is the code for sex,
  • Tij is gestational age expressed in the truncated power series basis,
  • k is the number of knots,
  • εi ~ MVN(0, Σε),
  • bi ~ MVN(0, Σb).
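For orientation, one form of the model that is consistent with the definitions above can be sketched as follows. This is an assumption rather than the authors' displayed equation: the indexing of the coefficients, the notation T_{ij,m} for the m-th spline basis term, and the placement of the k – 1 random effects on the spline terms are all guesses. The text also mentions a random intercept, so the original parametrization may differ.

```latex
g(W_{ij}) = \beta_1
  + \sum_{m=1}^{k-1} \beta_{m+1}\, T_{ij,m}
  + \sum_{m=1}^{k-1} \beta_{k+m}\, S_i\, T_{ij,m}
  + \sum_{m=1}^{k-1} b_{i,m}\, T_{ij,m}
  + \varepsilon_{ij}
```

Under this sketch there are 2k – 1 fixed-effect coefficients (an intercept, k – 1 spline terms and k – 1 sex-by-spline interactions) and k – 1 random effects per fetus, which matches the counts given in the list.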

4.3. Comparison with other proposed approaches

Since the models proposed by Royston (1995), Hooper et al. (2002) and Pan and Goldstein (1997) ignore sex, they were fitted separately on data for male and female fetuses. Although our own model makes the assumption of a common variance-covariance structure for the weights of males and females, this assumption was relaxed in order to make the comparisons meaningful. This is equivalent to using stratification. We then fitted the model under the initial assumption of shared variance-covariance parameters to establish any potential advantages of not stratifying.

The comparisons were based on the location and the precision of the median estimates, as evidenced by their 95% pointwise bootstrap confidence bands, found after fitting the different models to the Scandinavian dataset. In addition, at term ages, we compared the median estimates to those found in a national birthweight-for-gestational-age chart (Skjaerven et al., 2000) based on births in Norway between 1987 and 1998. Comparisons with this chart were not made at preterm ages, since charts based on the cross-sectional weights of preterm newborns are known to be biased relative to fetal weight charts (Fry et al., 2002, Hediger et al., 1995, Marsál et al., 1996, Ott, 1993, Weiner et al., 1985).
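A percentile bootstrap of the kind used for pointwise confidence bands can be sketched as below. This is a simplified illustration under stated assumptions: the resampling unit is taken to be the pregnancy, the `estimator` argument stands in for refitting a model and reading off the fitted median at one gestational age, and the example data are hypothetical.

```python
import random
from statistics import median


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list, with p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_band(subjects, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(subjects) at one gestational age.

    subjects is a list with one entry per pregnancy; each bootstrap sample
    draws len(subjects) pregnancies with replacement.
    """
    rng = random.Random(seed)
    n = len(subjects)
    estimates = sorted(
        estimator([subjects[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    return percentile(estimates, alpha / 2), percentile(estimates, 1 - alpha / 2)


# Hypothetical weights (grams) at a single gestational age, one per pregnancy.
weights_at_one_age = [2650.0, 2810.0, 2900.0, 2980.0, 3050.0, 3120.0, 3300.0, 3420.0]
lower, upper = bootstrap_band(weights_at_one_age, median, n_boot=1000)
```

Repeating the calculation over a grid of gestational ages gives a pointwise band. In a longitudinal setting each entry of `subjects` would hold all measurements from one pregnancy, so that the within-fetus correlation is preserved in every resample.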

4.4. Software

We carried out the analyses in R 2.9.1.

5. Results

5.1. Data characteristics

The observed weights for gestational age in the Scandinavian sample are shown in Figure 1. The strong and continuous increase in the variance of weight over time highlights the need to use modeling methods capable of dealing with heteroscedasticity. In Figures 2 and 3, randomly selected individual weight trajectories are shown. The occasional stabilization or decrease of weight at term (37 to 41 completed weeks of gestation) observed in certain trajectories may be the result of measurement error in estimated fetal weight at the 37 week antenatal visit. Work by Bertino et al. (1996) as well as Hooper et al. (2002) indicates that the observed flattening of the weight-for-gestational-age trajectory may also be due to a progressive decrease in growth velocity after 35 completed weeks, although it seems unlikely that this would explain the weight losses observed. Missing data were rare, with 98% of births (440/451) having at most one missing weight measurement. Among the 451 pregnancies, 19 resulted in a preterm birth, i.e. before 37 weeks (259 days), leading to a preterm birth rate of approximately 4%.

Figure 1:
Pregnancies from the Scandinavian dataset: Circles represent weight/gestational age pairs at birth and triangles represent estimated weight/gestational age pairs in utero.
Figure 2:
Individual weight-for-gestational-age paths for males sampled from the Scandinavian dataset. Circles represent weight/gestational age pairs at birth and triangles represent estimated weight/gestational age pairs in utero.
Figure 3:
Individual weight-for-gestational-age paths for females sampled from the Scandinavian dataset. Circles represent weight/gestational age pairs at birth and triangles represent estimated weight/gestational age pairs in utero.

5.2. Mixed model characteristics

Basic spline regression models require the prior specification of knot positions. As recommended in the absence of substantive knowledge (Harrell, 2001), we placed knots at the 5th (119 days), 27.5th (175 days), 50th (232 days), 72.5th (262 days) and 95th (287 days) percentiles. Varying the number of knots did not result in noticeable improvements in model fit.
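The restricted cubic spline basis in its truncated power form can be computed directly from the knots. The sketch below follows the usual unnormalized construction described by Harrell (2001); the function name is hypothetical and the knot values are the ones quoted above, in days.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated power form, unnormalized) at x.

    Returns k - 1 values for k knots: x itself, followed by k - 2 nonlinear
    terms built so that the fitted curve is linear beyond the boundary knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, 0 otherwise.
        return u ** 3 if u > 0 else 0.0

    denom = t[-1] - t[-2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (cube_plus(x - t[j])
                - cube_plus(x - t[-2]) * (t[-1] - t[j]) / denom
                + cube_plus(x - t[-1]) * (t[-2] - t[j]) / denom)
        basis.append(term)
    return basis


KNOTS = [119, 175, 232, 262, 287]  # knot positions in days, as reported above
row = rcs_basis(200, KNOTS)        # one row of the design matrix at 200 days
```

With five knots each gestational age maps to four basis values. Beyond the last knot the cubic and quadratic parts of each nonlinear term cancel, so every basis function, and hence the fitted curve, is linear in gestational age there.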

A summary of the fitted models is found in tables 1 and 2. A visual comparison of fitted curves for individual fetuses revealed that imposing non-zero off-diagonal elements in the random effects’ variance/covariance matrix did not strongly affect the fitted values, even if a certain amount of covariance did exist between the random effects. Imposing a highly-parameterized structure on these estimators also led to convergence problems. Therefore, we selected a diagonal variance structure. With untransformed weights, a random intercept was unwarranted, because the weights of all fetuses were essentially identical prior to 10 weeks.

Table 1:
Linear mixed model parameter estimates
Table 2:
Linear mixed model parameter estimates

On the transformed scale, specifying a residual correlation structure with non-zero off-diagonal elements did not sizably change the parameter estimates or their standard deviations. Therefore, we imposed the assumption of uncorrelated residuals, which is reasonable considering that measurements are taken many weeks apart. On the untransformed scale though, the assumption that residuals have an AR(1) covariance structure led to slightly different median estimates, but with noticeably lower standard deviations than when they were computed under the assumption of no correlation between the residuals.

Overall, transforming the data seemed to be beneficial. Although the median weight estimates from both fits, i.e. the one obtained by using the original weight values and the one obtained by using transformed weight values, were similar, the bootstrapping procedure revealed that those from the model fitted to transformed weight values had lower variance starting from approximately 272 days. It also revealed that transforming the data makes convergence of the linear mixed model components estimation procedure more likely. Indeed, the fitting procedure always converged when transformed weight values were used, whereas 28 runs out of 1028 (2.7%) led to a convergence error with untransformed data. For all these reasons, we retained only the model fitted to the dataset containing transformed weights for further comparisons.

5.3. Model comparisons

Results of sex-specific models were essentially similar; the results obtained using male fetuses are presented (results for females available upon request). Figure 4 and table 3 show that median estimates from the different methods remain very close. The Skjaerven et al. (2000) curve follows the anticipated trajectory: since its mean estimates are based on livebirths only, they fall below the other models’ median estimates at preterm. Due to the positive skewness of the livebirth weight-for-gestational-age distributions at preterm, differences between medians would be even larger at that time. From conception to late term, confidence bands (based on 1000 bootstrap iterations) for all models overlapped, although differences in median estimates were larger beyond 41 weeks, as illustrated by figure 5. Starting at 40 weeks, medians obtained after fitting our model were the closest to the mean values reported by Skjaerven et al. (2000). Before 40 weeks, other estimates were occasionally closer, but the range between the maximum and minimum median estimates always remained small. Further, all 95% pointwise confidence bands between 32 and 40 completed weeks contained the means reported by Skjaerven et al. (2000).

Figure 4:
Weight-for-gestational-age medians. The shaded region represents the proposed model’s 95 % pointwise confidence band.
Figure 5:
Weight-for-gestational-age median trajectories. The shaded region represents the proposed model’s 95% pointwise confidence band.
Table 3:
A comparison between the fitted medians (in grams) and their 95% confidence bands, and the means reported by Skjaerven et al. (2000)

Figure 5 and table 3 further show that with our method, as well as with that of Pan and Goldstein (1997), progression in weight for gestational age seems to become linear after 40 weeks. This would be at odds with the results of Skjaerven et al. (2000) though, which show a steady decline of the growth rate after 38 completed weeks. This decrease may have been due to errors in the estimation of gestational ages or selective delivery of fetuses at later gestational ages. Alternatively, since a restricted cubic spline imposes linearity after the last knot, it should be verified whether the assumption of linearity is in contradiction with the data, although sparseness limits inference. We originally placed the last knot at the 95th quantile, 287 days. Moving it to 294 days did not strongly affect the shape of the curve (data available upon request), indicating that the observed pattern was not solely an artefact due to the use of a restricted cubic spline.

Our model included an interaction term for the effect of sex on fetal weight for gestational age. Since the models against which ours was compared did not include such a term, they were fitted separately to male and female fetuses. Although the other models could be modified so as to take sex into account, we thought it better to select the method, namely stratification, that required the simplest assumptions, even though splitting the data in such a way can be seen as inefficient, since features of the distribution of weights are likely shared between the two groups (Johnsen et al., 2006). In this light, assuming at least a shared variance/covariance structure would not seem unreasonable. This is equivalent to our model’s inclusion of sex-gestational age interaction. Indeed, it can be understood from model equation 1 that, on the one hand, weight values obtained for individual male and female fetuses are assumed to share the same covariance structure and, on the other hand, the contribution of the fixed effect component to observed or estimated fetal weight varies differently through time depending on fetal sex. Although the inclusion of additional data does not strongly affect the value or the precision of the median estimates from our model, it does affect the random effects variance structure (see tables 1 and 4, and figures 6 and 7 in the appendix). If this structure were of specific interest, then we would benefit from not stratifying.

Table 4:
Linear mixed model parameter estimates

6. Discussion

In this article, we presented a new, flexible, sex-specific model for fetal weight for gestational age, and systematically compared the model to other proposed approaches. We established that at most gestational ages, the choice of model does not have a meaningful impact on the median estimates of fetal weight for gestational age, insomuch as the models’ 95% pointwise confidence bands have a large degree of overlap. However, the greater accuracy of estimates of weight-for-gestational-age medians at late term of the proposed model is important, because information on fetal growth at post-term ages may inform clinical decision-making on induction of labour. Overestimation of the median is likely to lead to an overestimation of the SGA threshold as well, which would produce an overestimation of the number of pregnancies with abnormal fetal growth. This being said, due to the quickly diminishing number of data points after 40 completed weeks, obtaining low-variance estimates at very late ages becomes difficult. While focusing on the estimation of the 10th percentile would be valuable, larger sample sizes than are currently available would be needed to obtain reasonable accuracy. Estimation of the median weight for gestational age is nevertheless useful, since an understanding of normal fetal growth is needed to be able to define and identify abnormal growth.

The differences between median estimates from the different methods before late term remain small and are therefore unlikely to be of clinical significance. Therefore, the selection process can safely be guided by the model-specific features that make a given model best suited to the underlying characteristics or objectives of a study. For instance, Hooper et al. (2002) proposed a way to isolate the measurement error from the latent weight component of estimated fetal weight, so as to allow for the derivation of latent weight percentiles. The proposed model also presents several advantages. Its flexibility, in comparison to polynomial regression models for instance, makes it an obvious candidate for modelling growth. Its parsimony also makes it very appealing. Further, it readily offers a concise and straightforward parametrization of variance and covariance. The mixed spline regression model is commonly taught in statistics and epidemiology programs, and scholars are very familiar with its formulation, implications and limitations. It has become a very popular approach to handle non-linear relationships, such as the one between gestational age and fetal weight. For the first time, this standard approach has been systematically compared to methods especially tailored to the problem of fetal growth. Spline regression methods are readily available in most software packages (as compared to some of the specialized approaches used previously). The practical implication of this work for epidemiologists and statisticians interested in modeling fetal growth is that these models mostly provide similar estimates of median weight-for-gestational-age, so that modeling strategies may be selected based on other criteria such as ease of use.

In general, fitting a model to a dataset in which the response variable has been transformed may produce results that are hard to interpret. For instance, fixed-effect coefficient values will not be expressible on the original scale and the retransformed mean will be biased (Duan, 1983). However, percentile estimators, which are generally of greater relevance in the study of fetal growth and identification of intrauterine growth restriction (McIntire et al., 1999), still remain unbiased. In this context, we are more interested in the estimation of median weight for gestational age than in interpreting the coefficients themselves.

Estimates of individual weight-for-gestational-age trajectories are strongly affected by measurement error in estimated fetal weight, which results from the use of a formula to estimate fetal weight from ultrasound biometric measurements, as well as operator-error at the time of ultrasound. Fortunately, this source of error has been shown to be mostly non-systematic (Dudley, 2005), so that while individual-level estimation of fetal weight may be error-prone, population medians should not be greatly affected. Uncertainty in gestational age is a second source of measurement error in longitudinal modeling of fetal weight. However, in our data, the gestational age estimates were validated by the use of both the ultrasound and the LNMP estimates, providing confidence that the measurement error on gestational age remains small on average.

Two different kinds of weight data are being used in the model-fitting process, namely ultrasound-based estimates and precise measurements made at birth. We would expect the coefficient of variation for ultrasound-based estimates to be higher, due to measurement error. Model variance estimates may also be affected. However, the Box-Cox transformation dampens changes in variance, irrespective of their patterns. Therefore, since we fitted models on the transformed scale, it seems unlikely that the median fits were significantly biased by the reduced variance of birth weight data.

Most models proposed to date were only adjusted for the influence of gestational age on fetal weight. It is worth considering if other adjustment terms in addition to sex should be considered. Parity, ethnicity and maternal BMI have all been shown to be significant predictors of fetal weight (Gardosi et al., 1995b,a, Mongelli and Gardosi, 1995). However, the clinical relevance of customization for maternal characteristics remains controversial (Hutcheon et al., 2008b,a). Another way to further personalize a weight-for-gestational-age curve would be to make it conditional on attained weight. Such an approach has been advocated by Royston (1995). A conditional estimate of weight becomes in essence an estimate of growth. However, temporal distance between successive weight measurements tends to dampen correlation and conceal the effects conditioning might have on the variance of individual predicted weights on the short term. Since successive measurements in our dataset are often more than four weeks apart, a weight estimate conditioned on a measurement taken less than a month earlier would essentially rely on an arbitrarily imposed covariance structure. Measurement error, too, would negatively impact the predictive power of an individual conditional weight estimate. Results by Hutcheon et al. (2010) indicate that the potential theoretical gains from conditioning are practically negligible when evaluated in the clinical setting.
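The idea of conditioning on attained weight can be made concrete for z-scores. The sketch below assumes that two z-scores from the same fetus are bivariate standard normal with correlation rho, which is a simplifying assumption and not a description of any of the fitted models; the function name is hypothetical.

```python
import math


def conditional_z(z_now, z_prev, rho):
    """z-score of the current measurement conditional on an earlier one.

    Under bivariate standard normality with correlation rho, the current
    z-score given z_prev has mean rho * z_prev and variance 1 - rho^2.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return (z_now - rho * z_prev) / math.sqrt(1.0 - rho * rho)
```

When rho is close to zero, as may happen when measurements are far apart in time, the conditional z-score is nearly identical to the unconditional one, which is the dampening effect described above.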

While the goal of this study was to compare different methods for characterizing normal fetal growth, the methodologies examined in this paper could also be applied to the study of pediatric and adolescent growth. Further, the models could easily be modified to take other predictors into account. For instance, maternal serum folate concentration in the second trimester and third trimester has been shown to be associated with birth weight (Goldenberg, Tamura, Cliver, Cutter, Hoffman, and Copper, 1992, Scholl, Hediger, Schall, Khoo, and Fischer, 1996) and therefore, measuring this predictor routinely during pregnancy might be warranted.

The dataset used in this study has several advantages, notably its relatively large sample size, the four longitudinal measurements resulting from unselected ultrasound scans and the quality of its weight and gestational age estimates. However, it would be of interest to be able to generate estimates applicable to different obstetrical populations, e.g. a North American population. Future work should determine whether the model can be adapted or rescaled to reflect fetal growth in different obstetrical populations.

Efforts are currently being made to update weight-for-gestational-age reference charts through the creation of a US national ultrasound standard for fetal growth (Zhang et al., 2010a). On the other hand, a new method could also be proposed to improve reference charts by combining data from different sources that are already available such as data from routine clinical ultrasounds, serial ultrasound research studies, and population birthweight data. As both approaches will require selecting a model for median weight for gestational age, we believe that our work will be especially useful in this context.

Appendix A. Royston’s method

Royston (1995) proposed a multi-level model to predict weight for gestational age, with either a random intercept alone or with both a random intercept and a random slope, as well as a flexible fractional polynomial functional form, and a Box-Cox transformation to correct for heteroscedasticity. The model is formulated as

$$W_{ij}^{(\lambda)} = \mu_i + \beta_i \, g(T_{ij}) + \varepsilon_{ij},$$

where Tij is the time of the jth measurement on the ith individual, Wij(λ) is the weight value after a potential Box-Cox transformation with scale parameter λ, μi is the intercept for individual i, βi is the slope for individual i, g(.) is the fractional polynomial transformation and εij ~ N(0, σε²) is the residual term. The random-effect vector (μi, βi) is assumed to follow distribution MVN((μ, β), Σ), with Σ being an arbitrary variance-covariance matrix. If the scale parameter is set to 1 then the transformation function reduces to identity.

Royston based model selection on a fixed-effect alternative to the previously specified linear mixed model, that is, one in which random effects are replaced by individual-specific non-random intercept and, if required, slope terms. The choice of the optimal λ is based on its maximum likelihood estimate in model


where μi is assumed fixed.
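The selection of λ by maximum likelihood can be sketched with a profile log-likelihood evaluated over a grid. The sketch assumes a fixed intercept per subject and a common slope on g(T); this is an assumption made for illustration, since the exact model used at this step is not shown here. The data, the grid and all names are hypothetical.

```python
import math


def box_cox(w, lam):
    """Box-Cox transformation of a positive weight w."""
    return math.log(w) if lam == 0 else (w ** lam - 1.0) / lam


def profile_loglik(data, lam):
    """Profile log-likelihood of lam, up to an additive constant.

    data maps a subject id to a list of (x, w) pairs, where x = g(T) is the
    transformed gestational age and w > 0 is the weight. Centring x and the
    transformed weight within each subject absorbs the subject-specific
    fixed intercepts; a single common slope is then fitted by least squares.
    """
    sxx = sxy = syy = sum_log_w = 0.0
    n = 0
    for pairs in data.values():
        xs = [x for x, _ in pairs]
        ys = [box_cox(w, lam) for _, w in pairs]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        for x, y in zip(xs, ys):
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        sum_log_w += sum(math.log(w) for _, w in pairs)
        n += len(pairs)
    rss = syy - sxy * sxy / sxx
    # The second term is the log-Jacobian of the Box-Cox transformation.
    return -0.5 * n * math.log(rss / n) + (lam - 1.0) * sum_log_w


# Hypothetical data: log-weight is linear in age with small deterministic noise.
ages = [17.0, 25.0, 33.0, 37.0]
data = {}
for subject, intercept in enumerate([2.9, 3.0, 3.1]):
    pairs = []
    for visit, t in enumerate(ages):
        noise = 0.02 if visit % 2 == 0 else -0.02
        pairs.append((t, math.exp(intercept + 0.1 * t + noise)))
    data[subject] = pairs

grid = [-0.5, 0.0, 0.5, 1.0]
best_lambda = max(grid, key=lambda lam: profile_loglik(data, lam))
```

Because the hypothetical weights were generated to be linear on the log scale, the grid search selects λ = 0, the logarithmic member of the Box-Cox family. A finer grid or a numerical optimizer would be used in practice.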

Allowing individual-specific slope terms might render the transformation of the response variable unneeded. However, there may be some lingering residual non-normality or heteroscedasticity. Because of this, four models are compared (all models are fixed-effects models and have at least subject-specific intercepts):

  1. W linear in g(T) with a common slope,
  2. W linear in g(T) with separate slopes for each subject,
  3. W(λ̂) linear in g(T) with a common slope,
  4. W(λ̂) linear in g(T) with separate slopes for each subject.

Since model 1 is nested in model 2 and model 3 is nested in model 4, they can be readily compared. In order to do this, an F-ratio is computed, as a test of non-parallelism. A significant value for this ratio means that subject-specific curves should not be assumed parallel and thus, that the separate-slope model should be adopted. Non-nested models cannot be readily compared, hence Royston’s suggestion to use a pseudo-F-statistic to compare models 2 and 4. This test will verify whether the transformation is successful in reducing residual non-normality and/or heteroscedasticity. Based on the results of the previous steps, an analogous linear mixed model is fitted to replace the fixed-effects model that had previously been derived for selection purposes. In other words, individual-specific intercepts and, if needed, slopes will now be assumed to be correlated normal random variables (within-individual).
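For reference, the F-ratio for comparing a common-slope model with the separate-slope model in which it is nested takes the usual form for nested least-squares fits. The notation is ours and is an assumption: RSS_c and RSS_s denote the residual sums of squares of the common-slope and separate-slope fits, and ν_c and ν_s their residual degrees of freedom.

```latex
F = \frac{(\mathrm{RSS}_{c} - \mathrm{RSS}_{s}) / (\nu_{c} - \nu_{s})}
         {\mathrm{RSS}_{s} / \nu_{s}}
```

Under the hypothesis of parallel subject-specific curves, and with normal homoscedastic errors, this statistic is referred to an F distribution with (ν_c – ν_s, ν_s) degrees of freedom.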

Appendix B. Hooper et al.’s method

In the model proposed by Hooper et al. (2002), weight for individual i obtained from the jth ultrasound examination, denoted Wij, is expressed in grams and the corresponding gestational age, Tij, is expressed in weeks. Estimated fetal weight measurements taken between 14 and 42 completed weeks are transformed into z-scores. The transformation is expressed as

$$z = f(w, t) = \frac{g(\log(w) - q(t))}{h(t)},$$

q(t) being an estimated quadratic curve obtained by fitting a weighted-least-squares regression model of log(Wij) against Tij, with statistical weights inversely proportional to the square root of the number of examinations per subject. The difference between log(w) and q(t) is the residual function. These residuals are transformed to approximate normal scores. This is done by fitting a linear spline with 6 knots to the normal probability plot of the residuals. In other words, theoretical quantiles from a standard normal distribution are regressed against residual quantiles. The spline function is called g(.).

The function h(t) is an estimate of the standard deviation of g(r) at time t, where r corresponds to log(w) – q(t). It is estimated by fitting a quadratic spline to the {g(rij)², tij} pairs. The standard deviation corresponds to the square root of the fitted spline.

Weight is recovered by applying the inverse transformation,

$$w = \exp\left\{ q(t) + g^{-1}\big(z\,h(t)\big) \right\}.$$

With this method, weight-for-gestational-age percentiles can be derived easily by setting z, assumed to follow a standard normal distribution, to the level corresponding to the required quantile.
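As a hedged illustration of the forward and inverse transformations, the Python sketch below assumes the z-score has the form z = g(log(w) – q(t)) / h(t), which is how the definitions of q(t), g(.) and h(t) fit together. The quadratic coefficients, spline knots and standard-deviation curve are made-up placeholders, not the estimates of Hooper et al. (2002), and all function names are hypothetical.

```python
import math
from statistics import NormalDist


# Hypothetical placeholder curves (NOT fitted values from Hooper et al.).
def q(t):
    """Quadratic curve for log-weight at gestational age t (weeks)."""
    return -1.0 + 0.40 * t - 0.0045 * t ** 2


def h(t):
    """Assumed standard deviation of g(r) at gestational age t."""
    return 0.8 + 0.005 * t


# Linear spline g(.) mapping residuals to approximate normal scores, stored
# as strictly increasing (residual, score) knots so that it can be inverted.
KNOTS = [(-0.6, -3.0), (-0.2, -1.2), (0.0, 0.0), (0.2, 1.1), (0.6, 3.0)]


def _interp(x, points):
    """Piecewise-linear interpolation, extrapolating along the end segments."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            break
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def g(r):
    return _interp(r, KNOTS)


def g_inverse(score):
    return _interp(score, [(s, r) for r, s in KNOTS])


def z_score(w, t):
    """Forward transformation: weight w (grams) at age t to a z-score."""
    return g(math.log(w) - q(t)) / h(t)


def weight_at_percentile(p, t):
    """Inverse transformation evaluated at the standard normal quantile of p."""
    z = NormalDist().inv_cdf(p)
    return math.exp(q(t) + g_inverse(z * h(t)))


w10 = weight_at_percentile(0.10, 30)
print(f"10th percentile at 30 weeks (placeholder curves): {w10:.0f} g")
print(f"z-score recovered from that weight: {z_score(w10, 30):.3f}")
```

Setting p to 0.10 gives the conventional small-for-gestational-age cut-off under these placeholder curves; the round trip through `z_score` checks that the two transformations are mutual inverses.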

A latent score is defined as a z-score stripped of its measurement error component. Let Wj denote weight obtained at gestational age tj on an individual randomly sampled from the population. Let Zj = f(Wj, tj) be the corresponding z-score. The joint distribution for (Z1,..., Zn), conditional on (T1,..., Tn), is assumed to be multivariate normal, with each Zj having mean 0 and variance 1. To get latent score estimates, the z-score is re-expressed as

$$Z_j = L_j + U_j,$$

with Lj being the latent score component of Zj and Uj being measurement error. It follows that


A model called ALB is recommended to estimate the covariance structure, now denoted by cL(t1, t2), with parameters in cL(t1, t2) estimated by minimizing the objective function


dijk being a weight value and tij < tik. After all parameters contained in cL(t1, t2) have been estimated, variance values can be derived by setting t1 equal to t2.

Prediction intervals at gestational age t with 1 – α coverage probability are bounded by ±zα√cL(t, t), the square root being needed because cL(t, t) is a variance, with zα denoting the (1 – α/2)th quantile of the standard normal distribution. A fetus is then considered small for gestational age if its latent score falls under the 10th percentile, i.e. under the lower bound of the 80% prediction interval.
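The classification rule can be sketched as follows. The half-width is taken to be a two-sided standard normal quantile multiplied by the square root of cL(t, t), which is an assumption about how the bounds are meant to be computed; the function names and the variance value are hypothetical.

```python
from statistics import NormalDist


def latent_score_bounds(latent_variance, coverage=0.80):
    """Symmetric prediction bounds for a latent score with the given variance."""
    alpha = 1.0 - coverage
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal quantile
    half_width = z * latent_variance ** 0.5      # quantile times standard deviation
    return -half_width, half_width


def is_sga(latent_score, latent_variance):
    """Flag a latent score below the lower bound of the 80% interval."""
    lower, _ = latent_score_bounds(latent_variance, coverage=0.80)
    return latent_score < lower


# Hypothetical latent variance c_L(t, t) = 0.81 at some gestational age t.
lower, upper = latent_score_bounds(0.81)
print(f"80% bounds: ({lower:.3f}, {upper:.3f})")
print("latent score -1.2 flagged as SGA:", is_sga(-1.2, 0.81))
```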

Appendix C. Pan and Goldstein’s method

Pan and Goldstein (1997) developed a method to quantify growth in children, which involves modeling separately the mean, M(t), the coefficient of variation, S(t), and the Box-Cox power curve, L(t). The method is called LMS, for λ (L(t)), μ (M(t)) and σ (S(t)). Maximum penalized likelihood estimation is used to estimate these three functions, which belong to the family of cubic splines with knots at each distinct value of t. Percentiles are estimated with the formula

$$C_{\alpha}(t) = M(t)\,\big[1 + L(t)\,S(t)\,d_{\alpha}\big]^{1/L(t)},$$

with dα being the (1 – α)th percentile of the standard normal distribution. An equivalent z-score can thus be obtained,

$$Z(t) = \frac{\big[W / M(t)\big]^{L(t)} - 1}{L(t)\,S(t)},$$

W being a weight value. Z(t) is referred to as an empirical LMS (or ELMS) score.
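The LMS z-score and its inverse can be illustrated with a short Python sketch. It uses the standard LMS expressions Z = [(W/M)^L – 1] / (L S) and W = M (1 + L S Z)^(1/L), with the logarithmic limit when L is zero. The function names and the values of L, M and S below are hypothetical placeholders for the curves evaluated at a single gestational age, not estimates from any fitted model.

```python
import math


def lms_zscore(w, L, M, S):
    """Z-score of weight w given the LMS curve values L, M and S at one age."""
    if abs(L) < 1e-12:
        # Limiting case of the Box-Cox power as L tends to zero.
        return math.log(w / M) / S
    return ((w / M) ** L - 1.0) / (L * S)


def lms_centile(z, L, M, S):
    """Weight corresponding to the standard normal deviate z (inverse of lms_zscore)."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical curve values at one gestational age.
L_t, M_t, S_t = 0.5, 1500.0, 0.12
w10 = lms_centile(-1.2816, L_t, M_t, S_t)  # deviate for roughly the 10th percentile
print(f"weight at z = -1.2816: {w10:.0f} g")
print(f"z-score recovered from that weight: {lms_zscore(w10, L_t, M_t, S_t):.4f}")
```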

Under the assumption that the LMS procedure provides normally distributed scores, a two-level model is constructed:




i being an index for measurements within individual j, nj being the number of measurements for individual j and m being the number of individuals. This is a typical example of polynomial regression with random effects. The polynomial is set to be of degree p and the number of random effects is q. Residuals are assumed to be uncorrelated and homoscedastic. The authors note that this may not be reasonable if measurements are taken close in time.

If serial measurements are available, a norm for a new measurement conditional on the previous two, expressed as


with ɛj ~ N(0, σɛ²), can be obtained. Z3j can be standardized by subtracting from it the fixed part of the right-hand side of (12) and dividing the difference by the residual standard deviation. In other words, Z3j*, i.e. Z3j after standardization, can be expressed as


where σɛ² is the residual variance in (12). The standardized estimates are found by substituting the Zij’s in (13). Corresponding percentiles can then be derived.

Appendix D. Extra figures

Figure 6:

Simulated individual weight-for-gestational-age trajectories. The curves have been simulated based on a model stratified on the sex variable.

Figure 7:

Simulated individual weight-for-gestational-age trajectories. The curves have been simulated based on a model with sex-gestational age interaction.


Author Notes: The authors would like to thank Michael S. Kramer for his helpful comments and suggestions. J.A.H. is the recipient of a post-doctoral Fellowship Award from the Canadian Institutes of Health Research (CIHR); R.W.P. is a Chercheur-Boursier of the Fonds de la recherche en santé du Québec. R.W.P., L.V. and M.S.K. are members of the Montreal Children’s Hospital Research Institute, which receives operating funds from the Fonds de la recherche en santé du Québec. This work was also supported by grant MOP-84379 from the CIHR.

Contributor Information

Luc Villandré, McGill University Health Centre.

Jennifer A Hutcheon, University of British Columbia.

Maria Esther Perez Trejo, McGill University.

Haim Abenhaim, McGill University.

Geir Jacobsen, Norwegian University of Science and Technology.

Robert W Platt, McGill University.


  • Alexander GR, Himes JH, Kaufman RB, Mor J, Kogan M. “A United States national reference for fetal growth” Obstet Gynecol. 1996;87:163–168. doi: 10.1016/0029-7844(95)00386-X.
  • Bakketeig LS, Jacobsen G, Hoffman HJ, Lindmark G, Bergsjø P, Molne K, Rødsten J. “Pre-pregnancy risk factors of small-for-gestational age births among parous women in Scandinavia” Acta Obstet Gynecol Scand. 1993;72:273–279. doi: 10.3109/00016349309068037.
  • Bertino E, Battista ED, Bossi A, Pagliano M, Fabris C, Aicardi G, Milani S. “Fetal growth velocity: kinetic, clinical, and biological aspects” Arch Dis Child Fetal Neonatal Ed. 1996;74:F10–F15. doi: 10.1136/fn.74.1.F10.
  • Cole TJ, Green PJ. “Smoothing reference centile curves: the LMS method and penalized likelihood” Stat Med. 1992;11:1305–1319. doi: 10.1002/sim.4780111005.
  • Duan N. “Smearing estimate: a nonparametric retransformation method” J Amer Statist Assoc. 1983;78:605–610. doi: 10.2307/2288126.
  • Dudley NJ. “A systematic review of the ultrasound estimation of fetal weight” Ultrasound Obstet Gynecol. 2005;25:80–89.
  • Fry AG, Bernstein IM, Badger GJ. “Comparison of fetal growth estimates based on birth weight and ultrasound references” J Matern Fetal Neonatal Med. 2002;12:247–252. doi: 10.1080/jmf.
  • Gabbe SG, Niebyl JR, Simpson JL. Obstetrics: Normal and Problem Pregnancies. 5th Edition. Vol. 1. New York, NY: Churchill Livingstone; 2007.
  • Gardosi J, Mongelli M, Wilcox M, Chang A. “An adjustable fetal weight standard” Ultrasound Obstet Gynecol. 1995a;6:168–174. doi: 10.1046/j.1469-0705.1995.06030168.x.
  • Gardosi JO, Mongelli JM, Mul T. “Intrauterine growth retardation” Baillieres Clin Obstet Gynaecol. 1995b;9:445–463. doi: 10.1016/S0950-3552(05)80374-8.
  • Goldenberg RL, Tamura T, Cliver SP, Cutter GR, Hoffman HJ, Copper RL. “Serum folate and fetal growth retardation: a matter of compliance?” Obstet Gynecol. 1992;79:719–722.
  • Gurka MJ, Edwards L, Muller KE, Kupper LL. “Extending the Box-Cox transformation to the linear mixed model” J Roy Statist Soc Ser A. 2006;169:273–288. doi: 10.1111/j.1467-985X.2005.00391.x.
  • Hadlock FP, Harrist RB, Carpenter RJ, Deter RL, Park SK. “Sonographic estimation of fetal weight. The value of femur length in addition to head and abdomen measurements” Radiology. 1984;150:535–540.
  • Hadlock FP, Harrist RB, Martinez-Poyer J. “In utero analysis of fetal growth: a sonographic weight standard” Radiology. 1991;181:129–133.
  • Harrell F. Regression Modeling Strategies, with Applications to Linear Models, Survival Analysis and Logistic Regression. New York: Springer; 2001.
  • Hediger ML, Scholl TO, Schall JI, Miller LW, Fischer RL. “Fetal growth and the etiology of preterm delivery” Obstet Gynecol. 1995;85:175–182.
  • Hooper PM, Mayes DC, Demianczuk NN. “A model for foetal growth and diagnosis of intrauterine growth restriction” Stat Med. 2002;21:95–112. doi: 10.1002/sim.969.
  • Hutcheon JA, Egeland GM, Morin L, Meltzer SJ, Platt RW. “The predictive ability of conditional fetal growth percentiles” Paediatric and Perinatal Epidemiology. 2010;24:134–139.
  • Hutcheon JA, Zhang X, Cnattingius S, Kramer MS, Platt RW. “Customised birthweight percentiles: does adjusting for maternal characteristics matter?” BJOG. 2008a;115:1397–1404. doi: 10.1111/j.1471-0528.2008.01870.x.
  • Hutcheon JA, Zhang X, Platt RW. “The benefits of customizing for maternal factors or the benefits of using an intrauterine standard at preterm ages?” Am J Obstet Gynecol. 2008b;199:e18–9; author reply e19–20. doi: 10.1016/j.ajog.2008.02.034.
  • Johnsen SL, Rasmussen S, Wilsgaard T, Sollien R, Kiserud T. “Longitudinal reference ranges for estimated fetal weight” Acta Obstet Gynecol Scand. 2006;85:286–297. doi: 10.1080/00016340600569133.
  • Källén B. “A birth weight for gestational age standard based on data in the Swedish Medical Birth Registry, 1985–1989” Eur J Epidemiol. 1995;11:601–606. doi: 10.1007/BF01719316.
  • Kramer MS, Platt RW, Wen SW, Joseph KS, Allen A, Abrahamowicz M, Blondel B, Bréart G. “A new and improved population-based Canadian reference for birth weight for gestational age” Pediatrics. 2001;108:E35. doi: 10.1542/peds.108.2.e35.
  • Marsál K, Persson PH, Larsen T, Lilja H, Selbing A, Sultan B. “Intrauterine growth curves based on ultrasonically estimated foetal weights” Acta Paediatr. 1996;85:843–848. doi: 10.1111/j.1651-2227.1996.tb14164.x.
  • McIntire DD, Bloom SL, Casey BM, Leveno KJ. “Birth weight in relation to morbidity and mortality among newborn infants” N Engl J Med. 1999;340:1234–1238. doi: 10.1056/NEJM199904223401603.
  • Mongelli M, Gardosi J. “Longitudinal study of fetal growth in subgroups of a low-risk population” Ultrasound Obstet Gynecol. 1995;6:340–344. doi: 10.1046/j.1469-0705.1995.06050340.x.
  • Ott WJ. “Intrauterine growth retardation and preterm delivery” Am J Obstet Gynecol. 1993;168:1710–5; discussion 1715–7.
  • Pan H, Goldstein H. “Multi-level models for longitudinal growth norms” Stat Med. 1997;16:2665–2678. doi: 10.1002/(SICI)1097-0258(19971215)16:23<2665::AID-SIM711>3.0.CO;2-V.
  • Royston P. “Calculation of unconditional and conditional reference intervals for foetal size and growth from longitudinal measurements” Stat Med. 1995;14:1417–1436. doi: 10.1002/sim.4780141303.
  • Scholl TO, Hediger ML, Schall JI, Khoo CS, Fischer RL. “Dietary and serum folate: their influence on the outcome of pregnancy” Am J Clin Nutr. 1996;63:520–525.
  • Schwärzler P, Bland JM, Holden D, Campbell S, Ville Y. “Sex-specific antenatal reference growth charts for uncomplicated singleton pregnancies at 15–40 weeks of gestation” Ultrasound Obstet Gynecol. 2004;23:23–29.
  • Skjaerven R, Gjessing HK, Bakketeig LS. “Birthweight by gestational age in Norway” Acta Obstet Gynecol Scand. 2000;79:440–449. doi: 10.1080/j.1600-0412.2000.079006440.x.
  • Weiner CP, Sabbagha RE, Vaisrub N, Depp R. “A hypothetical model suggesting suboptimal intrauterine growth in infants delivered preterm” Obstet Gynecol. 1985;65:323–326.
  • Zhang J, Grewal U, Hediger ML, Troendle JF, Zhang C. “The national standard for normal fetal growth” 2010a.
  • Zhang J, Merialdi M, Platt LD, Kramer MS. “Defining normal and abnormal fetal growth: promises and challenges” Am J Obstet Gynecol. 2010b;202:522–528.

Articles from The International Journal of Biostatistics are provided here courtesy of Berkeley Electronic Press