Hum Reprod. 2009 September; 24(9): 2104–2113.
Published online 2009 June 2. doi:  10.1093/humrep/dep198
PMCID: PMC2727402

Is there an advantage in scoring early embryos on more than one day?



This study was undertaken to determine what characteristics should be recorded on which days to build a predictive model for selection of Day 3 embryos.


Embryos failing to form a clinical sac or that formed a viable fetus (to ≥12 weeks), and transferred singly (n = 269) or in pairs (n = 1326) were scored for early cleavage and pronuclear status on Day 1, and cell number, fragmentation, and symmetry on Days 2 and 3, with number of nuclei per blastomere also recorded on Day 2. Seven candidate models were identified using a priori clinical knowledge and univariate analyses. Each model was fit on a training-set and evaluated on a test-set with resampling, with discrimination assessed using the area under the ROC curve (AUC) and calibration assessed using the Hosmer–Lemeshow statistics.


Models built using Day 1, 2 or 3 scores independently on the 30 resampled data sets showed that Day 1 evaluations provided the poorest predictive value (median AUC = 0.683 versus 0.729 and 0.725, for Day 2 and 3). Combining information from Day 1, 2 and 3 marginally improved discrimination (median AUC = 0.737). Using the final Day 3 model fitted on the whole dataset, the median AUC was 0.732 (95% CI, 0.700–0.764), and 68.6% of embryos would be correctly classified with a cutoff probability equal to 0.3.


Day 2 or Day 3 evaluations alone are sufficient for morphological selection of cleavage stage embryos. The derived regression coefficients can be used prospectively in an algorithm to rank embryos for selection.

Keywords: embryo morphology, embryo score, evaluation day, implantation potential, prediction


A great challenge in clinical IVF is to refine embryo selection techniques so that the single most developmentally competent embryo can be reliably identified in every cohort of available embryos. Once this challenge has been met, we may be positioned to perform single embryo transfer to all patients. As such, IVF pregnancy rates may be maximized, and IVF-related multiple pregnancies may be eliminated except for those that very rarely occur due to embryo splitting.

Historically, morphological evaluation has been the primary method used for embryo assessment (Edwards et al., 1981; Cummins et al., 1986) and, despite its recognized limitations (Guerif et al., 2007), the method still remains the most commonly used approach for selection. It has been recognized for many years that a large proportion of human preimplantation embryos undergo deviant development in vitro, failing to follow the expected normal developmental timeline (Puissant et al., 1987; Staessen et al., 1992; Steer et al., 1992; Van Royen et al., 1999), and frequently exhibiting high levels of fragmentation and blastomere asymmetry. These facts have provided the rationale for investigating whether multiple evaluations through early preimplantation development may improve selection compared with a single evaluation, performed shortly before transfer. A large number of studies have been performed in which various combinations of days for scoring have been assessed (Skiadas and Racowsky, 2007 for review) and systems for evaluation have been proposed (e.g. Cummins et al., 1986; Puissant et al., 1987; Steer et al., 1992; Giorgetti et al., 1995; Van Royen et al., 1999; Desai et al., 2000).

Regardless of the days on which embryos have been assessed, several approaches have been made to establish various numerical scoring systems to predict the likelihood of an embryo giving rise to a viable fetus. The methods can be classified as follows: morphological observations and the allotment of scores (Giorgetti et al., 1995; Desai et al., 2000; Van Royen et al., 1999, 2001; Fisch et al., 2003; Sjöblom et al., 2006); the application of logistic regression analysis (Terriou et al., 2001, 2007; Guerif et al., 2007; Holte et al., 2007; Vergouw et al., 2008); class probability tree analysis (Saith et al., 1998); a case based reasoning system (Jurisica et al., 1998); decision tree data mining (Trimarchi et al., 2003) and automated pattern analysis (Patrizi et al., 2004). Overall, conflicting results exist and there is currently no consensus on (1) the optimum day(s) for evaluation; (2) the optimum set of variables that should be used in a predictive model; and (3) model performance benchmarks for future comparison. Answers to these questions are necessary so that quantitative comparisons can be made about the relative efficacies of the different evaluation protocols.

The present study was undertaken to identify the utility of embryo scoring on Day 1 for early cleavage on the afternoon of the fertilization check, and on Day 2 and Day 3. The over-arching goal of the research was to determine what days the embryos should be scored and what characteristics should be recorded, so as to achieve a ranking of available embryos for purposes of selection for transfer.

Materials and Methods

The study was approved by the Partners' Institutional Review Board for medical record review of our electronic IVF database.

Ovarian stimulation and luteal support protocols

Patients having IVF, with or without ICSI, underwent controlled ovarian stimulation (COS) using protocols previously described in Skiadas et al. (2006). Briefly, COS was most commonly performed using luteal leuprolide acetate (Lupron; TAP Pharmaceuticals, Deerfield, IL, USA) down-regulation in conjunction with either highly purified FSH (Fertinex; Serono Laboratories, Norwell, MA, USA) or recombinant FSH (Follistim: Organon, West Orange, NJ, USA; Gonal-F: Serono Laboratories). The standard daily gonadotrophin dosage was typically three to four ampules (225–300 IU) administered as either a single or split dose. However, patients >40 years or those with a history of low gonadotrophin response were given up to a maximum of eight ampules daily (administered in divided doses), with or without hMG (Repronex: Ferring, Tarrytown, NY, USA; Pergonal: Serono Laboratories), either using an antagonist regimen or a microflare protocol. When at least two follicles reached a mean diameter of 16.5 mm and the estradiol-17β level was >500 pg/ml, 10 000 IU hCG (Profasi; Serono) was administered intramuscularly followed 36 h later by transvaginal oocyte retrieval.

Luteal progesterone supplementation was initiated the day after oocyte retrieval and continued until 10 weeks in patients who became pregnant. Such luteal support was achieved by one of three regimens: (1) daily i.m. progesterone (50 mg); (2) daily vaginal gel [8% progesterone (Crinone; Wyeth-Ayerst, Madison, NJ, USA)]; or (3) three times daily vaginal progesterone suppositories (200 mg). Embryo transfer generally was performed with a Wallace catheter (Marlow/Cooper Surgical, Shelton, CT, USA). For difficult transfers, a Marrs No. 4 or Marrs No. 5 embryo transfer catheter (Cook Ob/Gyn, Spencer, IN, USA) was occasionally used.

Insemination and culture protocols

Oocytes were either inseminated 4–6 h after retrieval in groups [3–5 oocytes in 1.0 ml HF-10 (Sigma Aldrich, St. Louis, MO, USA) supplemented with 5% human serum albumin (InVitroCare Inc., Frederick, MD, USA)] overlaid with 1.0 ml oil in Falcon 3037 dishes (Becton Dickinson Labware, Franklin Lakes, NJ, USA) using 300 000 motile spermatozoa, or they underwent ICSI 3–5 h after retrieval. At the fertilization check 16–18 h later, zygotes with 2 pronuclei (PNs) were cultured individually in 25 µl microdrops of growth medium overlaid with 8 ml oil in Falcon 1007 culture dishes (Becton Dickinson Labware, Franklin Lakes, NJ, USA). From Days 1 to 3, one of three culture media was used [G1.3 (Scandinavian IVF Science/Vitrolife, Gothenburg, Sweden/Denver, CO, USA); Life-Global (IVF Online, Toronto, Canada); ECM (Sage/Cooper Surgical, Trumbull, CT, USA)]. All cultures were maintained at 37°C in a humidified atmosphere of 5% CO2 in air.

Embryo evaluations

All embryos were evaluated on Days 1, 2 and 3, with target evaluation times post-insemination or ICSI of 25 h for assessment of early cleavage in the afternoon of Day 1, 44 h for Day 2 evaluations, and 68 h for Day 3. Evaluation times were recorded for all embryos, with morphological characteristics scored and nominal data coded as shown in Table I. On Day 1, embryos were graded for cell number and, when 1-cell, for the presence or absence of PNs. On Day 2, embryos were assessed for cell number, the extent of fragmentation (the proportion of the embryo's volume occupied by fragments), the extent to which the blastomeres were asymmetric (i.e. dissimilar in size and shape), and the number of nuclei per blastomere. To assess the contribution of scoring the blastomere nuclei, the embryos were stratified into two groups: those with a single nucleus in every blastomere and those without. On Day 3, embryos were similarly assessed for cell number, fragmentation and asymmetry, as previously described (Racowsky et al., 2003). Examples of various degrees of asymmetry and fragmentation are shown for 8-cell embryos in Fig. 1.

Figure 1
8-Cell human embryos on Day 3 of culture with varying degrees of fragmentation and asymmetry, as reflected by the numerical scores on the images corresponding to the cell number, fragmentation score and asymmetry scores, respectively, as defined in Table  ...
Table I
List of embryonic features included in the study

Study dataset

Our database was screened to identify all cycles performed at Brigham & Women's Hospital from 1 November 2003 through 31 March 2007 resulting in an embryo transfer on Day 3 without use of a gestational carrier, and in which all embryos were graded in the afternoon of the fertilization check (Day 1), and again on Day 2 and Day 3 (n = 1997). From this initial set of screened cycles, those having only one or two embryos transferred were further identified (n = 1257). This subset was then used to distinguish those cycles in which the developmental fate of each transferred embryo was definitively known (n = 972, with transfer of 1661 embryos). Developmental fate was classified as an embryo either failing to implant and give rise to a clinical sac, or implanting and developing into a viable fetus, detectable to at least 12 weeks of gestation. Accordingly, the single embryo transfers gave rise either to no detectable uterine pregnancy or to a viable singleton, and the double embryo transfers resulted in either no detectable uterine pregnancy or a pregnancy comprising viable dizygotic twins at 12 weeks.

Of the 1661 embryos identified with known developmental fate, 66 were subsequently excluded because of missing morphological data, because they exhibited the rare condition of a single pronucleus (1PN) on the afternoon of Day 1 (despite having 2PNs at the fertilization check), or because they were transferred along with one of these unusually developing embryos. The final dataset comprised 1595 embryos, 564 of which resulted from ICSI and 1031 of which underwent assisted hatching with or without ICSI. In all, 269 embryos were transferred alone, and 1326 were transferred in double embryo transfers. A listing of the characteristics associated with the cycles from which the embryos were derived is given in Table II.

Table II
Characteristics of the cycles included in the study

Biometrical considerations

Logistic regression models were fitted to the data since the response (presence or absence of a fetus) was a binary variable (Hosmer and Lemeshow, 2000). The following subsets of variables, which we assumed were likely to have a major impact on the development of a fetus at 12 weeks, were selected for initial model building. These were denoted:

  • D1: {eggDonor, cell counts, day1stage} on Day 1.
  • D2: {eggDonor, cell counts, fragmentation, symmetry scores and nuclear status} on Day 2.
  • D3: {eggDonor, cell counts, fragmentation and symmetry scores} on Day 3.

The day1stage incorporated cell number and the number of pronuclei on Day 1. The fragmentation and symmetry scores were classified as factor variables, and therefore transformed to dummy variables using the lowest score as the reference score.
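The dummy coding described above can be sketched as follows. This is a minimal illustration only: the function name `dummy_code` and the example scores are hypothetical, and the 0 to 4 range for fragmentation is taken from the univariate results reported later in the paper.

```python
def dummy_code(scores, levels):
    """Treatment-code a factor, using the lowest level as the reference.

    Returns the reference level, the non-reference levels (one column each),
    and one row of 0/1 indicators per input score.
    """
    reference, *others = sorted(levels)
    rows = []
    for s in scores:
        if s not in levels:
            raise ValueError(f"unexpected score: {s}")
        # An embryo at the reference level gets all zeros.
        rows.append([1 if s == lvl else 0 for lvl in others])
    return reference, others, rows


# Invented fragmentation scores for four embryos (not study data).
ref, cols, X = dummy_code([0, 2, 4, 1], levels=[0, 1, 2, 3, 4])
```

With this coding each fitted coefficient is a log-odds contrast against the lowest (reference) score.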

All models also included the age of the oocyte (i.e. the age of the patient in autologous cycles, and the age of the oocyte donor in cases of oocyte donation), and the source of the oocyte (autologous or donated), both of which we assumed to have a major impact on the outcome. In all, seven models were evaluated (Table III).

Table III
Candidate models

Modeling techniques

Standard techniques for the analysis of clustered data are the generalized estimating equation (GEE) algorithm and the generalized linear mixture algorithms. However, GEE models, as opposed to logistic regression, did not fit our data well (AUC < 0.6 in all considered models). Seven candidate models were assessed for discrimination and calibration using methodologies recommended by Cook (2007). For discrimination, the area under the receiver operating characteristic curve (AUC) was used as a measure of the ability to rank embryos according to probability of success, a high value indicating good ranking ability. For model calibration, the Hosmer–Lemeshow goodness-of-fit test, which measures agreement between predicted and observed risks, was used. When the P-value was >0.05, the model was considered to be well calibrated.
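As a rough illustration of the calibration check, the Hosmer–Lemeshow statistic can be computed as below. The sketch assumes the common choice of ten equal-sized risk groups, which the paper does not state, and the function name `hosmer_lemeshow` is hypothetical. The chi-square tail probability uses a closed form that holds only for an even number of degrees of freedom.

```python
import math


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, P-value) for the Hosmer-Lemeshow test.

    Observations are sorted by predicted risk and cut into `groups` bins of
    near-equal size; the statistic is referred to a chi-square distribution
    with groups - 2 degrees of freedom.
    """
    if groups < 4 or groups % 2:
        raise ValueError("this sketch needs an even number of groups >= 4")
    n = len(y_true)
    if n < groups:
        raise ValueError("need at least one observation per group")
    # Sort on predicted risk only, so ties are not ordered by outcome.
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    # Chi-square survival function for even dof = 2m:
    # exp(-x/2) * sum_{k<m} (x/2)^k / k!
    m = (groups - 2) // 2
    half = stat / 2.0
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(m))
    return stat, p_value
```

A P-value above 0.05 from such a function corresponds to the "well calibrated" criterion used in the text.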

The data were randomly split into a training set (80%) and a test set (20%). Each model was fit on the training set and then evaluated on the test set. To reduce sampling bias, the data were split and the models fitted 30 times. Acceptable models were defined as those (i) whose median AUCs were significantly higher than those of any other candidates, and (ii) whose median Hosmer–Lemeshow P-values for the training set stayed significantly above 0.05. The degree of significance was evaluated by a Wilcoxon test under two-sample (AUC) and one-sample (Hosmer–Lemeshow P-value) settings with P < 0.05 considered statistically significant. All possible pair-wise combinations of the AUCs for the seven models were tested using the Bonferroni adjustment (P-value cutoff of 0.05/C(7,2) = 0.05/21 ≈ 0.002; Shaffer et al., 1995).
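The resampling scheme and the Bonferroni cutoff can be sketched as follows. The paper does not say whether the split was stratified by outcome or made per cycle rather than per embryo; this sketch splits individual embryos uniformly at random, which is an assumption, and the function names and seed are hypothetical.

```python
import math
import random


def repeated_splits(n_items, n_repeats=30, train_fraction=0.8, seed=0):
    """Yield (train_idx, test_idx) for repeated random train/test splits."""
    rng = random.Random(seed)
    n_train = round(n_items * train_fraction)
    for _ in range(n_repeats):
        idx = list(range(n_items))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]


def bonferroni_cutoff(n_models, alpha=0.05):
    """Per-comparison threshold when every pair of models is compared."""
    return alpha / math.comb(n_models, 2)


splits = list(repeated_splits(1595))   # 30 splits of the 1595 embryos
cutoff = bonferroni_cutoff(7)          # 0.05 / 21 pairwise comparisons
```

Each candidate model would be fitted on `train_idx` and scored on `test_idx`, giving 30 AUCs per model whose medians are then compared.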

Since several of the models were not significantly different from one another, clinical considerations were also taken into account for the selection of the final model. Finally, the chosen model for embryo selection was constructed using the full dataset from 1595 embryos.

Fisher's Exact test, and the Kruskal–Wallis Exact test were used where appropriate to assess differences between proportions, with P < 0.05 considered statistically significant.

Results


Justification for pooling data from patients treated with IVF and ICSI

Distributions of the ages of patients treated by IVF and ICSI

The ages of the patients ranged from 21.6 to 44.8 years. The distributions of the ages of the patients whose ova were fertilized either by IVF or ICSI indicate that there was a tendency for patients treated with IVF to be slightly older (ICSI: 34.18 ± 4.37 years, IVF: 35.22 ± 4.36 years).

Allocation of the numbers of embryos transferred in patients treated with IVF and ICSI

There was a borderline significant difference between the proportion of IVF and ICSI cycles having one embryo transferred (26.7 versus 32.6%, respectively; P = 0.042). However, there was no significant difference between the percentages of fetuses produced by IVF and ICSI when embryos were either transferred singly (30.2 versus 24.5%, IVF versus ICSI; P = 0.334), or as a pair (22.4 versus 21.3%, IVF versus ICSI; P = 0.673). The data obtained using IVF and ICSI have therefore been pooled in subsequent analyses since no major differences were observed between the distributions of the maternal ages, the allocation of numbers of embryos transferred and the yield of 12-week-old fetuses.

Preliminary univariate analyses

Maternal age

The effect of maternal age was examined by fitting a linear logistic regression between the presence or absence of a fetus and the age of the woman. The estimated equation poorly fitted the data (Hosmer–Lemeshow test: P < 0.001). Adding the square of the age to the equation improved calibration (P = 0.037). The estimated equation is:

equation image

where P is the probability of a fetus being present at 12 weeks. A plot of this probability against maternal age is shown in Fig. 2. The probability of a pregnancy was highest when the patient's age was about 28 years, whether the embryos were transferred singly or in pairs (data not shown). To allow for this non-linearity, the square of patient age was included in all models.
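For orientation, a quadratic logistic model in age has the general form below. The symbols β0, β1 and β2 are placeholders rather than the fitted values, which are not reproduced here. Setting the derivative of the linear predictor to zero gives the age of maximum probability, so a peak near 28 years corresponds to −β1/(2β2) ≈ 28 with β2 < 0.

```latex
\operatorname{logit}(P) = \ln\frac{P}{1-P}
  = \beta_0 + \beta_1\,\mathrm{age} + \beta_2\,\mathrm{age}^2,
\qquad
\frac{d\,\operatorname{logit}(P)}{d\,\mathrm{age}}
  = \beta_1 + 2\beta_2\,\mathrm{age} = 0
\;\Longrightarrow\;
\mathrm{age}^{*} = -\frac{\beta_1}{2\beta_2}
\quad (\beta_2 < 0).
```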

Figure 2
The probability of achieving a viable pregnancy to at least 12 weeks of gestation according to patient age.

Autologous versus donated oocytes

The proportion of viable fetuses resulting from donated oocytes was significantly greater than that from use of autologous oocytes (44/126 [34.9%] versus 319/1469 [21.7%]; P = 0.001). Therefore, oocyte donation was included as a variable in model building.
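The comparison above can be reproduced approximately with a two-sided Fisher's exact test on the 2×2 table of counts. The sketch below is one standard formulation, summing the probabilities of all tables no more likely than the observed one. The paper does not state which two-sided convention was used, so the result may differ slightly from the reported value, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Fetus / no fetus: donated oocytes 44 of 126, autologous oocytes 319 of 1469.
p = fisher_exact_two_sided(44, 126 - 44, 319, 1469 - 319)
```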

Times when the embryos were observed

The mean times of observation on Day 1, Day 2 and Day 3 were 24.8 h (range 18.0–29.0 h), 44.2 h (range 37.6–50.7 h) and 68.6 h (range 60.2–77.7 h), respectively. The regressions of the logit of the proportion of fetuses on time for the three days were not significantly different from zero (Day 1: b = −0.007, sb = 0.063; Day 2: b = −0.00004, sb = 0.053; Day 3: b = −0.039, sb = 0.024). Thus, the times when the embryos were scored had no significant effects on the outcomes and so were not included in the models.
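One way to check that these slopes are indistinguishable from zero is a Wald test, dividing each slope b by its standard error. The sketch assumes that sb denotes the standard error of b and that a normal approximation is adequate; the paper does not state which test was used.

```python
import math


def wald_test(b, se):
    """Return (z, two-sided P) for a coefficient b with standard error se."""
    z = b / se
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Slopes and standard errors as reported in the text.
slopes = {
    "Day 1": (-0.007, 0.063),
    "Day 2": (-0.00004, 0.053),
    "Day 3": (-0.039, 0.024),
}
results = {day: wald_test(b, se) for day, (b, se) in slopes.items()}
```

The largest ratio in absolute value is for Day 3 (−0.039/0.024 = −1.625), which is still inside the conventional ±1.96 band.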

Number of cells in embryos on Days 1, 2 and 3

On Day 1, those embryos at the 2-cell stage were most likely to form a fetus (30.7%), while those either still at the 1-cell stage (with or without visible PNs), or that had cleaved beyond the 2-cell stage showed lower viability (1-cell with 2PN: 12.1%, 1-cell with 0PN: 22.4%, >2-cell: 9.5%). On Day 2, maximum fetal development was associated with embryos having 4 cells (30.4%), while on Day 3, in those groups with robust numbers of embryos (i.e. >50), maximum fetal development was associated with 8-cell embryos (30.2%). Of note was development of fetuses from 12-cell embryos on Day 3, which was higher than that for 8-cell embryos (34.1 versus 30.2%). This observation may reflect sampling error arising from the small number of 12-cell embryos.

To allow for the non-linear relationship between day of culture and the number of cells associated with fetal viability (Day 1: 2 cells, Day 2: 4 cells and Day 3: 8 cells), a derived variable was included in all models, defined by:

equation image

Fragmentation on Days 1, 2, 3

An increased level of fragmentation on either Day 2 or Day 3 was associated with a significant reduction in the number of fetuses that developed to 12 weeks [Day 2, Score 0 (28.0% fetuses) to Score 4 (0% fetuses); Day 3, Score 0 (29.4% fetuses) to Score 4 (0% fetuses)].

Asymmetry on Days 1, 2, 3

An increased level of asymmetry on either Day 2 or Day 3 was associated with a significant reduction in the number of fetuses that developed to 12 weeks [Day 2, Score 1 (28.1% fetuses) to Score 3 (9.1% fetuses); Day 3, Score 1 (31.1% fetuses) to Score 3 (5.0% fetuses)].

Multinucleate blastomeres on Day 2

The percentage of fetuses that developed from embryos was highest when all the blastomeres had a single nucleus (32.7%). The second largest percentage occurred when single nuclei could be seen in some, but not all, of the blastomeres (17.9%). Smaller percentages of fetuses occurred when no nuclei were visible (9.0%), and when some blastomeres had more than one nucleus (9.9%). In the following multivariate logistic regression analysis these data were reclassified in binary form, where 0 was allotted to an embryo in which all blastomeres had a single nucleus and 1 was allotted to the rest.

Model selection

Figure 3 shows the AUC and goodness-of-fit P-values for both the training and test sets for each of the seven models investigated. There is close agreement between the results for the training (Fig. 3A, C) and test (Fig. 3B, D) sets. The median AUCs of the seven models from the test sets were: D1 = 0.683; D2 = 0.729; D3 = 0.725; D1,2 = 0.725; D1,3 = 0.723; D2,3 = 0.739; and D1,2,3 = 0.737 (Fig. 3B). All seven candidate models showed adequate calibration, with the median goodness-of-fit P-value being significantly above the 0.05 threshold (Fig. 3D). The AUC for the Day 1 model was significantly lower than that of each of the other six models, indicating that the Day 1 model has the lowest discrimination (P < 0.002 for all six comparisons using the Bonferroni adjustment). In contrast, the values of the AUCs for the remaining six models were not significantly different, indicating they had approximately equal discrimination (P > 0.002 for all 15 combinations).

Figure 3
The AUCs and the goodness-of-fit P-values for the seven putative models.

From the six models with comparable discrimination, Day 2 and Day 3 models were selected for further study based on practical application in the clinic (see Discussion). Logistic regressions were then computed using the whole dataset for these two models (Table IV). A plot of observed versus expected outcomes grouped by fixed probability thresholds for the Day 3 model is shown in Fig. 4. The figure confirms that the calibration of the model is acceptable, although it slightly underestimates the observed values. Similarly, calibration was acceptable for the Day 2 model (data not shown). The AUCs for the Day 2 and Day 3 models were 0.733 (95% CI, 0.701–0.765) and 0.732 (95% CI, 0.700–0.764), respectively.

Figure 4
Calibration plot of the observed proportion of embryos developing into fetuses at 12 weeks versus the prediction probability calculated using the whole dataset and the final Day 3 model.
Table IV
Regression coefficients for Day 2 and Day 3 models using the whole dataset

Discussion


The purpose of an embryo scoring system in an IVF clinic is to aid in selection of the ‘best’ embryos for transfer. An acceptable system is one that orders the embryos reliably on a quantitative scale. We have derived such a system using logistic regression analysis on a robust dataset of embryos of known developmental fate. Resampling-based statistical modeling has guided the derivation of a model that shows that single day morphology scoring on Day 3, just before embryo transfer, provides similar predictive value to multi-day scoring. Furthermore, our results suggest that it is sufficient to score embryos for number of cells, fragmentation and symmetry on Day 3, and adjust for maternal age and oocyte donor status to select the best embryos for transfer. The variables and their regression coefficients used in the model are shown in Table IV (Day 3).

The analysis also led to the derivation of a second model, which shows that embryos could be scored on Day 2 only. The variables and their coefficients for this model are also shown in Table IV (Day 2).

Calculation of AUC coupled with the logistic regression method to build prognostic models allows one to determine the discriminatory ability of each model. Using such an approach, several computational procedures could have been used for selection of variables, e.g. forward, backward and stepwise fitting. Since there are pitfalls associated with these procedures when they are used alone (Good and Hardin, 2003), it is generally recommended that relevant clinical knowledge be used to assist in the selection of the independent variables. We adopted this approach in the present analyses, in which all the models we examined were based mainly on a priori clinical knowledge. Similar model building using logistic regression has been successfully used to predict the probability of blastocyst formation from cultured cleavage stage pig embryos (Booth et al., 2007).

Predictive value of the models

The probabilities that each embryo in our dataset will develop into a fetus at 12 weeks have been computed using the Day 3 model. The distributions of the individual probabilities associated with either the development of a fetus or failure to develop are shown in Fig. 5. Although the two distributions overlap, it is clear that higher probabilities are associated with the development of a fetus compared with those associated with the failure of a fetus to develop.

Figure 5
Distributions of probabilities associated with developing a fetus or not developing a fetus.

The predictive value of a model can be examined by the calculation of a classification table for a hypothetical situation determined by the selection of a cutoff probability. Such a table has been computed for the Day 3 model assuming a cutoff probability equal to 0.3 (Table V). Four predicted outcomes can be computed: correctly predicted occurrence of a fetus (true positive), correctly predicted absence of a fetus (true negative), false prediction of a fetus (false positive), and false prediction of a fetus failing to develop (false negative). The results, in this example, show that 68.6% of embryos are classified correctly.
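A classification table of this kind can be built as in the sketch below. Whether an embryo whose probability falls exactly at the cutoff counts as a predicted fetus is not stated in the paper; the sketch assumes that it does (`>=`), and the function name is hypothetical.

```python
def classification_table(y_true, y_prob, cutoff=0.3):
    """Count TP, TN, FP, FN at a probability cutoff and return the accuracy."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for y, p in zip(y_true, y_prob):
        predicted = p >= cutoff  # assumed convention at the boundary
        if predicted and y:
            counts["TP"] += 1
        elif predicted and not y:
            counts["FP"] += 1
        elif not predicted and y:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    accuracy = (counts["TP"] + counts["TN"]) / sum(counts.values())
    return counts, accuracy


# Invented outcomes and probabilities for five embryos (not study data).
table, acc = classification_table([1, 0, 1, 0, 0], [0.45, 0.10, 0.20, 0.35, 0.25])
```

The proportion classified correctly is (TP + TN) divided by the total, which is the quantity reported as 68.6% for the Day 3 model.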

Table V
Predictions for Day 3 model with a cutoff of ranking score at 0.3

The discrimination of a model is often assessed by the AUC. In the present context, the AUC is the probability that a randomly-chosen transfer yielding a fetus at 12 weeks is ranked higher than a randomly-chosen transfer that did not yield a fetus at 12 weeks (Pepe, 2003). The estimates of the AUC show that there is little to choose among all models examined except for Day 1. All values fall in the range 0.7 ≤ AUC < 0.8, which Hosmer and Lemeshow (2000) consider acceptable discrimination. The values for the Hosmer and Lemeshow statistics are also within acceptable limits for both training and test sets, indicating reasonable calibration. However, although the calibration of the models is numerically acceptable, we do not advocate using the probability estimate derived from the models as a proxy for the probability of the success of the transfer, as further work is needed to refine the assessment of calibration for these models. This means that the models presented here should be used to rank order the embryos for probability of success, but should not be used to provide patients with an estimated probability of success. In related work in other clinical domains, we show that discrimination of models is usually conserved across different settings, although calibration is not (Matheny et al., 2005; Ohno-Machado et al., 2006). This is particularly relevant in the assisted reproductive technology domain, given the baseline differences in success rates across IVF clinics and different characteristics of the population served by these programs. Therefore, it is necessary to assess the calibration of this model in other clinics. Until this is done, communicating the probability estimates from this model to patients or using its results as a basis for deciding how many embryos to transfer is not warranted.
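The pairwise interpretation of the AUC given above translates directly into code. The sketch below compares every embryo that yielded a fetus with every embryo that did not, counting ties as one half; the function name is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(y_true, y_prob):
    """AUC as the probability that a positive outranks a negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative outcome")
    # Score each positive/negative pair: 1 for a win, 0.5 for a tie, 0 otherwise.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented example: three of the four positive/negative pairs are ordered correctly.
example = auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

Because only the ordering of the probabilities enters the calculation, the AUC measures ranking ability and is unaffected by miscalibration that preserves that ordering.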

The marginal increase in discrimination achieved using the model of all 3 days combined (Day 1, 2, 3: median AUC = 0.737) as compared with those using only Day 2 (median AUC = 0.733) or Day 3 (median AUC = 0.732) might suggest that multi-day scoring is desirable. However, any such slight gain to embryo selection must be weighed against possible detrimental effects of repetitive environmental perturbations to the embryos (light exposure, temperature and pH shifts etc.), caused by serial removal from the incubators. Moreover, the practicality of performing repeat observations within specific time-windows in a busy clinical IVF laboratory must also be taken into consideration. On balance, we propose that evaluations should be performed either on Day 2 or Day 3 alone, since such a paradigm essentially maximizes morphological selection of cleavage stage embryos, and is the simplest and potentially the safest approach for embryo health. Although available evidence suggests comparable pregnancy rates are achieved with either day of transfer in good prognosis patients, Day 2 transfer may improve the probability of pregnancy in poor responders (reviewed by Kyrou et al., 2008).

The considerable overlap in the distributions of probabilities associated with developing or not developing into a fetus (Fig. 5) gives rise to false positive and false negative predictions depending on the assumed cutoff probability (Table V). This overlap is likely due to a multitude of variables that may impact on whether an embryo will implant. Aside from the overall health of the gametes (as affected by parental age, genetic backgrounds and life-time environmental exposures), other factors including the ovarian stimulation, the culture conditions, the day of transfer, the degree of difficulty of the transfer procedure itself, and uterine receptivity issues all are likely to play a role. Moreover, although there is a correlation between aberrant morphology and increased aneuploidy (Munne et al., 1995; Magli et al., 2007), normal cleavage stage morphology does not guarantee euploidy (Gianaroli et al., 1997; Munne, 2006).

Evaluation of new methods

Considerable research is now being done on alternative non-invasive methods to morphological scores, such as amino acid analyses (Brison et al., 2007) and metabolomic (Seli et al., 2007) and proteomic (Katz-Jaffe et al., 2006) profiling that screen spent culture media. In the evaluations of these newly developing systems, it is important that the most accurate morphological systems are used. In view of the demonstrated efficacy of models based on morphology with an AUC > 0.7, the putative superiority of new methods should require that they result in an AUC > 0.7. Recently, Vergouw et al. (2008) have used classification tables to compare systems based on morphological scoring and metabolic profiling on Day 2 and Day 3. Their claim that metabolic profiling is more accurate than morphological scoring is not convincing since the accuracy of their morphological scoring systems (27.6 and 38.5%, Day 2 and Day 3, respectively) compared poorly to the accuracy of our Day 3 model (68.6%), and the accuracy achieved with use of their metabolomic profiling was not superior (69.0 and 53.6%, Day 2 and Day 3, respectively).

A widespread belief exists that all morphological scoring systems have low predictive value for embryo selection. Our work shows that this belief is not true. Since these methods are based on simple direct observations, it is important that they be used as standards for evaluating all new methods. When the assessment is in binary form (i.e. the presence or absence of a fetus or newborn), the control system should have an AUC > 0.7. Only by setting such objective benchmarks can new methods be compared.


In conclusion, the findings reported in this paper indicate that single day morphological evaluation on either Day 2 or Day 3 provides similar predictive value to multi-day scoring. The regression coefficients derived from our statistical analyses (Table IV) can be applied prospectively to enable available embryos to be ranked for each patient prior to transfer. As such, we expect this algorithm to be applied in the clinic for the superior selection for transfer of the most developmentally competent embryo in each cohort, thereby resulting in an overall improvement in our viable pregnancy rates. We are currently planning a prospective evaluation of the efficacy of the Day 3 model in our clinic and eventually its validation in other clinics.
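To illustrate how such coefficients might be used to rank a cohort, consider the sketch below. Every number in `COEF` is hypothetical and chosen only for illustration; the fitted values are those in Table IV. The cell-number term is an assumed absolute deviation from 8 cells, not the derived variable defined in the paper. Age and oocyte-donor terms are omitted because they are identical for all embryos in one cohort and so do not change the ordering.

```python
# HYPOTHETICAL coefficients on the log-odds scale, for illustration only.
COEF = {
    "intercept": -1.0,
    "cell_dev": -0.4,  # per cell away from 8 on Day 3 (assumed form)
    "frag": {0: 0.0, 1: -0.2, 2: -0.6, 3: -1.2, 4: -2.5},  # score 0 is the reference
    "asym": {1: 0.0, 2: -0.3, 3: -1.0},                    # score 1 is the reference
}


def linear_predictor(embryo):
    """Log-odds score for one embryo under the hypothetical coefficients."""
    return (COEF["intercept"]
            + COEF["cell_dev"] * abs(embryo["cells"] - 8)
            + COEF["frag"][embryo["frag"]]
            + COEF["asym"][embryo["asym"]])


def rank_cohort(embryos):
    """Order embryos from highest to lowest score."""
    return sorted(embryos, key=linear_predictor, reverse=True)


# Invented cohort of three Day 3 embryos.
cohort = [
    {"id": "A", "cells": 8, "frag": 0, "asym": 1},
    {"id": "B", "cells": 6, "frag": 2, "asym": 2},
    {"id": "C", "cells": 8, "frag": 1, "asym": 1},
]
ranking = [e["id"] for e in rank_cohort(cohort)]
```

Ranking on the linear predictor gives the same order as ranking on the predicted probability, which is consistent with using the model for ordering embryos rather than for quoting a probability of success.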


This work was funded in part by grant R01LM009520 from the National Library of Medicine, NIH (L.O.).


We thank the embryology team at Brigham & Women's Hospital for their expertise and commitment to excellence in the care of the gametes and embryos of our IVF patients. We extend particular appreciation to Gena Ratiu, M.D. for taking the embryo photographs.


  • Booth PJ, Watson TJ, Leese HJ. Prediction of porcine blastocyst formation using morphological, kinetic, and amino acid depletion and appearance criteria determined during the early cleavage of in vitro-produced embryos. Biol Reprod. 2007;77:765–779. [PubMed]
  • Brison DR, Hollywood K, Arnesen R, Goodacre R. Predicting human embryo viability: the road to non-invasive analysis of the secretome using metabolic footprinting. Reprod BioMed Online. 2007;15:296–302. [PubMed]
  • Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115:928–935. [PubMed]
  • Cummins JM, Breen TM, Harrison KL, Shaw JM, Wilson LM, Hennessey JF. A formula for scoring human embryo growth rates in in vitro fertilization: its value in predicting pregnancy and in comparison with visual estimates of embryo quality. J In Vitro Fert Embryo Transf. 1986;3:284–295. [PubMed]
  • Desai N, Goldstein J, Rowland DY, Goldfarb MJ. Morphological evaluation of human embryos and derivation of an embryo quality system specific for day 3 embryos: a preliminary study. Hum Reprod. 2000;15:2190–2196. [PubMed]
  • Edwards RG, Purdy JM, Steptoe PC, Walters DE. The growth of human preimplantation embryos in vitro. Am J Obstet Gynecol. 1981;141:408–416. [PubMed]
  • Fisch J, Sher G, Adamowicz M, Keskintepe L. The graduated embryo score predicts the outcome of assisted reproductive technologies better than a single day 3 evaluation and achieves results associated with blastocyst transfer from day 3 embryo transfer. Fertil Steril. 2003;80:1352–1358. [PubMed]
  • Gianaroli L, Magli M, Ferraretti A, Fiorentino A, Garrisi J, Munne S. Preimplantation genetic diagnosis increases the implantation rate in human in vitro fertilization by avoiding the transfer of chromosomally abnormal embryos. Fertil Steril. 1997;68:1128–1131. [PubMed]
  • Giorgetti C, Terriou P, Auquier P, Hans E, Spach J-L, Salzmann J, Roulier R. Embryo score to predict implantation after in-vitro fertilization: based on 957 single embryo transfers. Hum Reprod. 1995;10:2427–2431. [PubMed]
  • Good PI, Hardin JW. Common Errors in Statistics. Hoboken, NJ: Wiley; 2003. pp. 127–162.
  • Guerif F, Le Gouge A, Giraudeau B, Poindron J, Bidault R, Gasnier O, Royere D. Limited value of morphological assessment at days 1 and 2 to predict blastocyst developmental potential: A prospective study based on 4042 embryos. Hum Reprod. 2007;22:1973–1981. [PubMed]
  • Holte J, Berglund I, Milton K, Garello C, Gennarelli G, Revelli A, Bergh T. Construction of an evidence-based integrated morphology cleavage embryo score for implantation potential of embryos scored and transferred on Day 2 after oocyte retrieval. Hum Reprod. 2007;22:548–557. [PubMed]
  • Hosmer DW, Lemeshow S. Applied Logistic Regression. 2nd edn. New York, NY: Wiley; 2000.
  • Jurisica I, Mylopoulos J, Glasgow J, Shapiro H, Casper R. Case-based reasoning in IVF: prediction and knowledge mining. Artif Intell Med. 1998;12:1–24. [PubMed]
  • Katz-Jaffe MG, Gardner DK, Schoolcraft WB. Proteomic analysis of individual human embryos to identify novel biomarkers of development and viability. Fertil Steril. 2006;85:101–107. [PubMed]
  • Kyrou D, Kolibianakis EM, Venetis CA, Papanikolaou EG, Bontis J, Tarlatzis BC. How to improve the probability of pregnancy in poor responders undergoing in vitro fertilization: a systematic review and meta-analysis. Fertil Steril. 2008 PMID:18639875. [PubMed]
  • Magli MC, Gianaroli L, Ferraretti AP, Lappi M, Ruberti A, Farfalli V. Embryo morphology and development are dependent on the chromosomal complement. Fertil Steril. 2007;87:534–541. [PubMed]
  • Matheny ME, Ohno-Machado L, Resnic FS. Discrimination and calibration of mortality risk prediction models in interventional cardiology. J Biomed Inform. 2005;38:367–375. [PubMed]
  • Munne S. Chromosome abnormalities and their relationship to morphology and development of human embryos. Reprod Biomed Online. 2006;12:234–253. [PubMed]
  • Munne S, Alikani M, Tomkin G, Grifo J, Cohen J. Embryo morphology, developmental rates, and maternal age are correlated with chromosome abnormalities. Fertil Steril. 1995;64:382–391. [PubMed]
  • Ohno-Machado L, Resnic FS, Matheny ME. Prognosis in critical care. Annu Rev Biomed Eng. 2006;8:567–599. [PubMed]
  • Patrizi G, Manna C, Moscatelli C, Nieddu L. Pattern recognition methods in human-assisted reproduction. Int Trans Oper Res. 2004;11:365–379.
  • Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford, UK: Oxford University Press; 2003.
  • Puissant F, Van Rysselberge M, Barlow P, Deweze J, Leroy F. Embryo scoring as a prognostic tool in IVF treatment. Hum Reprod. 1987;2:705–708. [PubMed]
  • Racowsky C, Combelles CM, Nureddin A, Pan Y, Finn A, Miles L, Gale S, O'Leary T, Jackson KV. Day 3 and day 5 morphological predictors of embryo viability. Reprod Biomed Online. 2003;6:323–331. [PubMed]
  • Saith RR, Srinivasan A, Michie D, Sargent IL. Relationships between the developmental potential of human in-vitro fertilization embryos and features describing the embryo, oocyte and follicle. Hum Reprod Update. 1998;4:121–134. [PubMed]
  • Seli E, Sakkas D, Scott R, Kwok SC, Rosendahl SM, Burns DH. Noninvasive metabolomic profiling of embryo culture media using Raman and near-infrared spectroscopy correlates with reproductive potential of embryos in women undergoing in vitro fertilization. Fertil Steril. 2007;88:1350–1357. [PubMed]
  • Shaffer J. Multiple hypothesis testing. Annu Rev Psychol. 1995;46:561–584.
  • Sjöblom P, Menezes J, Cummins L, Mathiyalagan B, Costello MF. Prediction of embryo developmental potential and pregnancy based on early stage morphological characteristics. Fertil Steril. 2006;86:848–861. [PubMed]
  • Skiadas CC, Racowsky C. Developmental rate, cumulative scoring, and embryo viability. In: Elder K, Cohen J, editors. Human Preimplantation Embryo Selection. London: Informa Healthcare, UK; 2007. pp. 101–121.
  • Skiadas CC, Jackson KV, Racowsky C. Early compaction on day 3 may be associated with increased implantation potential. Fertil Steril. 2006;86:1386–1391. [PubMed]
  • Staessen C, Camus M, Bollen N, Devroey P, Van Steirteghem AC. The relationship between embryo quality and the occurrence of multiple pregnancies. Fertil Steril. 1992;57:626–630. [PubMed]
  • Steer CV, Mills CL, Tan SL, Campbell S, Edwards RG. The cumulative embryo score: a predictive embryo scoring technique to select the optimal number of embryos to transfer in an in-vitro fertilization and embryo transfer program. Hum Reprod. 1992;7:117–119. [PubMed]
  • Terriou P, Sapin C, Giorgetti C, Hans E, Spach J-L, Roulier R. Embryo score is a better predictor of pregnancy than the number of transferred embryos or female age. Fertil Steril. 2001;75:525–531. [PubMed]
  • Terriou P, Giorgetti C, Hans E, Salzmann J, Charles O, Cignetti L, Avon C, Roulier R. Relationship between even early cleavage and day 2 embryo score and assessment of their predictive value for pregnancy. Reprod Biomed Online. 2007;14:294–299. [PubMed]
  • Trimarchi JR, Goodside J, Passmore L, Silberstein T, Hamel L, Gonzalez L. Comparing data mining and logistic regression for predicting IVF outcome. Fertil Steril. 2003;80:S100.
  • Van Royen E, Mangelschots K, De Neubourg D, Valkenburg M, Van de Meerssche M, Ryckaert G, Eestermans W, Gerris J. Characterization of a top quality embryo, a step towards single-embryo transfer. Hum Reprod. 1999;14:2345–2349. [PubMed]
  • Van Royen E, Mangelschots K, De Neubourg D, Laureys I, Ryckaert G, Gerris J. Calculating the implantation potential of day 3 embryos in women younger than 38 years of age: a new model. Hum Reprod. 2001;16:326–332. [PubMed]
  • Vergouw CG, Botros LL, Roos P, Lens JW, Schats R, Hompes PGA, Burns DH, Lambalk CB. Metabolomic profiling by near-infrared spectroscopy as a tool to assess embryo viability: a novel, non-invasive method for embryo selection. Hum Reprod. 2008;23:1499–1504. [PubMed]
