To determine the structure of the relationships of the histology scores for acute intraamniotic infection collected in the Collaborative Perinatal Project (CPP).
44,427 subjects of the CPP had complete histology scores available for the 9 measures that related to acute intraamniotic infection (i.e., neutrophil infiltrates in umbilical cord, amnion of extraplacental membranes and chorionic plate, decidua, chorionic plate and fetal chorionic vessels). Confirmatory factor analysis was used to determine the relationships among the different markers of maternal inflammatory responses (in amnion, chorion and decidua) and fetal inflammatory responses (in umbilical cord and fetal chorionic vessels).
A single CFA model could not be developed across all CPP sites. A well-fitting model was developed from the Boston site (N=10,803) and the factor loadings applied to the histology scores from the other CPP sites. The resultant scores for the latent variables (maternal and fetal inflammatory responses) were compared across sites. Not only was there considerable variability in factor loadings, but the signs of factor loadings were also inconsistent across sites.
Histopathology scores of neutrophil infiltrates performed by different observers do not have the same interrelationships and, by extension, the latent variables they are supposed to reflect may not be equivalent. The lack of measurement invariance renders their use as indicators of the underlying processes of maternal and fetal inflammatory responses problematic in analysis with any clinical outcome.
Chorioamnionitis, the presence of intraamniotic microbial organisms triggering maternal and/or fetal inflammatory responses, plays a significant role in reproductive and childhood pathology. Numerous investigators have identified ascending infection as a key pathway in the etiology of preterm birth, particularly early preterm births (less than 35 weeks gestation, as reviewed in 1). Ascending infection has also been proposed as a potential explanatory factor for the substantial racial disparity in risk of preterm birth. 2 Neonates born with funisitis, a prime histologic marker of fetal inflammatory response, are at increased risk for neurologic handicap and cerebral palsy. 3 However, it is the minority of infants born from such environments that develop any neurodevelopmental disorder and causal inference remains problematic. 4 Evidence has begun to accumulate that gene-environment interactions determine the likelihood of preterm labor and delivery and, probably, the risk of fetal injury.5
Holzman et al 6 recently and elegantly summarized the conflicting literature regarding the role of infection diagnosed histologically in preterm birth. In their own analysis, the choice of inflammatory cell threshold (the number of infiltrating neutrophils required to make a diagnosis of infection) dramatically influenced disease prevalence; the rates of histologic chorioamnionitis ranged “from 85 percent … to 7 percent [in term] and in PTD from 63 percent … to 4 percent” at different inflammatory cell thresholds. 6 They also documented variability in the specific tissue components included, the number of tissue samples reviewed, and the specific features detailed (location, density, and degeneration), additional factors that would affect the prevalence of diagnosis of histologic chorioamnionitis and by extension complicate our understanding of its gestational effects.
Controversy remains in pathology circles regarding whether a multi-category (0–4 stage and 0–4 grading) histologic chorioamnionitis scoring system or a more simplified system (with fewer categories or a “present/absent” categorization) is optimal. Inter-rater reliability is optimized with a “present/absent” system 7 but such a system must blur the subtleties of the complex mix of genes, cytokines, specific bacterial and other environmental stressors that is inflammation. How best to analyze the individual histology scores derived from the different tissues is also controversial. Should the scores of inflammation in amnion, chorion, decidua and chorionic plate be summed to reflect an overall “maternal inflammatory response”, or should a “threshold” level of “normal neutrophil infiltration” in sites such as subchorionic fibrin be used to determine “intraamniotic infection greater than would be common in normal term births”? 8
Summing scores does not allow finer distinctions among the relative “value” of the different histology indicators. For example, neutrophil infiltrates in the amnion may be a stronger indicator of histologic chorioamnionitis than decidual neutrophil infiltrates (e.g., 8). One can empirically assign weights to different indicator scores, and thus tinker with the sum. Alternatively, factor analysis can be used to derive weights (or factor loadings) that reflect the actual intercorrelations among the indicator variables. Exploratory factor analysis is employed when little is known of the underlying structure. Confirmatory factor analysis can be applied when we have biologically based and theoretically derived concepts regarding the underlying structure that we want to test.
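The contrast between a simple sum and a weighted composite can be sketched in a few lines of Python. This is only an illustration: the scores and the weights below are hypothetical values chosen for the example, not loadings estimated from the CPP data, and a weighted sum is a crude stand-in for a properly estimated factor score.

```python
# Hypothetical 0-3 neutrophil scores for two placentas (illustrative values only).
case_a = {"amnion": 3, "chorion": 0, "decidua": 0}
case_b = {"amnion": 0, "chorion": 0, "decidua": 3}

# Hypothetical weights; in practice these would come from a fitted factor model.
weights = {"amnion": 0.9, "chorion": 0.7, "decidua": 0.4}


def unit_sum(scores):
    """Simple sum: every indicator counts equally."""
    return sum(scores.values())


def weighted_sum(scores, w):
    """Weighted composite: each indicator is scaled by its weight."""
    return sum(w[name] * value for name, value in scores.items())


if __name__ == "__main__":
    # The simple sums are identical, but the weighted composites differ.
    print(unit_sum(case_a), unit_sum(case_b))
    print(weighted_sum(case_a, weights), weighted_sum(case_b, weights))
```

Under these assumed weights, the two placentas are indistinguishable by the simple sum but are separated by the weighted composite, which is the distinction the summed score cannot make.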
Given the richness of the histology data in the National Collaborative Perinatal Project (NCPP) data, and the clinical importance of reliable and reproducible diagnoses of histologic chorioamnionitis estimated from histologic slides, we determined to apply confirmatory factor analysis to the histology scores related to histologic chorioamnionitis in the NCPP. Our goal was to explore the structure of the relationships of histologic measures of the maternal and fetal inflammatory responses, respectively, within and among institutions and observers.
Subjects were a subset of the National Collaborative Perinatal Project. Details of the study have been described elsewhere. 9, 10 Briefly, from 1959 to 1965, women who attended prenatal care at 12 hospitals were invited to participate in the observational, prospective study. At entry, detailed demographic, socioeconomic and behavioral information was collected by in-person interview. A medical history, physical examination and blood sample were also obtained. In subsequent prenatal visits, women were repeatedly interviewed and physical findings were recorded. During labor and delivery, placental gross morphology was examined and samples were collected for histologic examination. The children were followed up to seven years of age.
The analytic sample for the present analysis was derived from all delivered infants, live or stillborn, regardless of gestational age, and included both singletons and multifetal pregnancies; these clinical data should not prejudice or bias the scoring of neutrophil infiltrates by pathologists blinded to other clinical data. The sample was restricted to those with complete data on the nine measures of neutrophil infiltrates that were specified by the protocol.11 Expert pathologists at each of 12 institutions were provided a scoring sheet with a written description of the grading scale for neutrophil infiltrates of amnion, chorion and decidua of the membranes, amnion and chorion of the chorionic plate, umbilical artery, vein and Wharton’s jelly, and fetal chorionic vessels (9 separate scores).
Our a priori understanding led us to formulate a confirmatory factor analysis with 2 latent variables, one reflecting the maternal inflammatory response (indicated by the scores of amnion, chorion and decidua of the membranes, and amnion and chorion of the chorionic plate) and one reflecting the fetal inflammatory response (indicated by the umbilical cord and fetal chorionic vascular scores). We fitted confirmatory factor analysis models to the data using Mplus 4.2. 12 The data were treated as ordered categorical (that is, we modeled the probability of each response, rather than the mean of the responses). Parameters were estimated using the mean- and variance-corrected weighted least squares algorithm; this approach has been shown to work well with categorical data.13 We followed the methods described by Joreskog, 14 first attempting a strictly confirmatory approach and then using a model generation approach to modify the model.
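To make "modeling the probability of each response" concrete, the following Python sketch computes category probabilities for a single ordered categorical indicator given a value of the latent variable. It assumes a probit link with unit residual variance, which is a common formulation for this kind of model; the loading and threshold values are hypothetical and are not estimates from this study.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(eta, loading, thresholds):
    """Return P(score = k | eta) for k = 0 .. len(thresholds).

    eta        : value of the latent variable (e.g., maternal inflammation)
    loading    : factor loading of the indicator (hypothetical here)
    thresholds : increasing cut points separating adjacent score categories
    """
    # Cumulative probabilities P(score <= k | eta) at each threshold.
    cumulative = [normal_cdf(t - loading * eta) for t in thresholds]
    bounds = [0.0] + cumulative + [1.0]
    # Each category probability is the difference of adjacent cumulative values.
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


if __name__ == "__main__":
    cuts = [0.5, 1.2, 2.0]  # hypothetical thresholds for a 0-3 scale
    for eta in (-1.0, 0.0, 2.0):
        probs = category_probabilities(eta, loading=0.8, thresholds=cuts)
        print(eta, [round(p, 3) for p in probs])
```

As the latent value rises, probability mass shifts from the lowest score toward the highest, which is the sense in which the ordinal scores are treated as coarse readings of an underlying continuous process.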
To assess model fit, we used the Χ2 statistic in conjunction with its associated p-value. The Χ2 statistic assesses the difference between the model and the data. Larger, and more statistically significant, values of Χ2 indicate worse model fit, that is, a greater mismatch between the model and the data. However, Χ2 suffers from well-known problems when fitting models based on large samples: specifically, it has a large amount of power to detect models that differ from the data by only trivial and inconsequential amounts. Because of this, a wide range of other indices have been developed alongside Χ2 to aid in determining when good model fit has been found. For this analysis, we also used the Root Mean Square Error of Approximation (RMSEA 15), the Comparative Fit Index (CFI 16) and the Tucker Lewis Index (TLI, also referred to as the non-normed fit index, NNFI). The RMSEA can be thought of as a correction to Χ2 that accounts for the sample size and model complexity; values below 0.05 are often seen as indicative of adequate fit. The CFI and TLI both compare the Χ2 of the fitted model to that of the null model, the null model being the worst model that it would be possible to have, with no relations between any of the variables; 17 values above 0.95 are usually considered to show good fit. 18
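The fit indices described above can be written out using their commonly cited textbook definitions, as in the Python sketch below. This is an illustration under stated assumptions: software using mean- and variance-corrected estimators reports adjusted Χ2 values, and some programs use N rather than N − 1 in the RMSEA, so these formulas may not reproduce the values reported here. The numeric inputs in the example are hypothetical.

```python
import math


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (textbook form, N - 1 version)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative Fit Index: misfit of the model relative to the null model."""
    d_model = max(chi2_model - df_model, 0.0)
    # Denominator is max(chi2_null - df_null, chi2_model - df_model, 0).
    d_null = max(chi2_null - df_null, d_model)
    return 1.0 if d_null == 0.0 else 1.0 - d_model / d_null


def tli(chi2_model, df_model, chi2_null, df_null):
    """Tucker Lewis Index (non-normed fit index)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


if __name__ == "__main__":
    # Hypothetical chi-square values for a fitted model and its null model.
    print(rmsea(120.0, 8, 10803))
    print(cfi(120.0, 8, 50000.0, 15))
    print(tli(120.0, 8, 50000.0, 15))
```

Note that with a large N a model can have a highly significant Χ2 and still fall within the conventional cutoffs on all three indices, which is why the indices are read together with Χ2 rather than in place of it.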
Confirmatory factor analysis models can be fitted to single groups, or to multiple groups. In a multiple group model, parameters are estimated for each group, and these parameters can then be tested across groups using Wald tests or Χ2 difference tests.
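A cross-group comparison of a single parameter can be sketched as a one-degree-of-freedom Wald test, shown below in Python. The sketch assumes the two groups are independent, so that the variance of the difference is the sum of the squared standard errors; the estimates and standard errors in the example are hypothetical. It does not cover Χ2 difference testing, which requires special handling under corrected estimators.

```python
import math


def wald_test_equal(est_1, se_1, est_2, se_2):
    """Wald test of H0: a parameter is equal in two independent groups.

    Returns (wald_statistic, p_value). The statistic is referred to a
    chi-square distribution with 1 degree of freedom.
    """
    z = (est_1 - est_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    wald = z * z
    # For 1 df, P(chi-square > z^2) equals the two-sided normal tail area.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return wald, p_value


if __name__ == "__main__":
    # Hypothetical loadings (and standard errors) for one indicator at two sites.
    wald, p = wald_test_equal(0.90, 0.02, 0.60, 0.03)
    print(wald, p)
```

A small p-value would indicate that the indicator relates to the latent variable differently in the two groups, which is the pattern examined site by site in the results.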
We first examined the percentage frequency with which each score was used for each of the 9 measures at each CPP site (Table 1). Of note, certain sites in effect used a 0–2 scale rather than the 0–3 scale specified by the protocol 11, and the highest severity score, where it was used at all, was used infrequently.
Next, using Mplus 4.2 12, and considering the histology scores as ordered categorical variables representing underlying continuous processes, we attempted to fit a multiple group model, with hospital sites defining the groups. This model had convergence problems, which we identified as being related to particular sites where measures either varied inconsistently or were perfectly correlated. As the Boston cohort (N=10,803) was the largest of the 12, we elected to develop a model in this cohort and then test its generalizability to the other cohorts. 12 The close correlation between scores of neutrophil infiltrates in membrane chorion and membrane decidua forced removal of the membrane chorion score from the model; of the two variables, the membrane decidua score provided slightly better fit. Figure 1 shows the final model, which had excellent fit according to established criteria (e.g., CFI and TLI each 0.999, RMSEA 0.033). Of interest, model fit was significantly improved by removing the fetal chorionic vessel score as an indicator of fetal inflammatory response; the covariance of this fetal indicator with maternal inflammation was stronger than with fetal inflammation (0.848 vs. 0.693, Table 1).
We then applied this model to each of the other 11 cohorts, and achieved generally as good a fit as for the Boston cohort. However, the loadings for the different histology scores differed significantly from the Boston cohort (Table 2, Wald tests). In addition, comparing the loadings for the group of indicators of maternal and fetal inflammatory responses showed that there were multivariate significant differences from the Boston cohort. In other words, the latent variables of maternal and fetal inflammation are not indicated by the histology scores uniformly across the cohorts. Further inspection of the data revealed other disturbing patterns. In general, maternal and fetal inflammatory responses tend to coincide; there may be variability in the relative strengths of each response, but they tend to be present together. The extent of covariance of maternal and fetal inflammatory responses was widely different among cohorts, ranging from 0.435 (Providence) to 2.094 (Pennsylvania). Moreover the means of the latent variables not only differed from that of the Boston cohort (indicating different prevalences of the histology scores, which would not be unexpected), but they differed in opposite directions (e.g., Buffalo, New Orleans, NY/Columbia, Virginia, Minnesota, NY/Medical, Oregon, Pennsylvania, Providence and Tennessee). These comparisons are, however, difficult to interpret because the measurements are not directly comparable across the cohorts.
These data demonstrate that, in the CPP, individual histology scorings of neutrophil infiltrates, markers of intraamniotic infection, differ significantly in their contributions to more general constructs of maternal inflammation and fetal inflammation. While it is possible that demographic and genetic factors may account for part of these differences, at least some of the variability must be due to inter-observer factors. The lack of measurement invariance means that these scorings cannot be used to represent the same construct (or underlying biological process) in different cohorts. In the psychometric literature this is termed “differential item functioning”, or “DIF” 19, and threatens the validity of the measurement instrument. In psychometrics, items showing DIF are rewritten or removed from the instrument in order to generate measures of the latent constructs that can be generalized across groups.
Despite the lack of measurement invariance we have identified in the graded scores of the CPP, we strongly reject one alternative model, namely, collapsing the multiple category scoring system, as has been suggested, because “these distinctions are of no documented clinical significance”. 7 Generally, information is expensive and difficult to collect, and should not be discarded lightly. Certainly if the diagnostic categories are discarded, there will be no chance to document clinical significance moving forward. Our goals should instead be to explore methods that allow improved reliability, including image segmentation from digitized slides. 20
A more immediate and concrete criticism of collapsing the scoring system is that the neutrophil infiltrates in amnion, chorion and umbilical cord (for example) are of interest to us only insofar as they reflect aspects of the process of intraamniotic infection, a process we cannot otherwise directly access. Neutrophil infiltrates are indicators of the underlying latent (and not directly observable or measurable) variable in which we are truly interested. In modern pathology practice, we are forced to employ categorical scores as representations of one (or more) underlying continuous variables. However, as the categorical scale is progressively reduced from 0–4 to 0–1 (as “absent/present”), the correlation of those scores with the underlying latent variable is also reduced.22 The simpler scale may be more “reliable” but it is less representative of the latent/unobservable process(es) in which we are truly interested. We may trade an appearance of reliability for a long-term limitation on the explanatory value of histology scorings and, ultimately, their utility in both research and clinical contexts.
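The loss of information from coarsening a scale can be illustrated with a small simulation, sketched below in Python. It is not the analysis reported in the cited work: it assumes a standard normal latent variable, noise-free scoring, and hypothetical cut points, and simply compares how well a graded 0–4 score and an absent/present score track the same latent value.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, seed=0):
    """Return (r_graded, r_binary): correlations of each score with the latent value."""
    rng = random.Random(seed)
    thresholds = [-0.5, 0.5, 1.2, 2.0]  # hypothetical cut points for a 0-4 scale
    latent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Graded score: number of thresholds the latent value exceeds (0 to 4).
    graded = [sum(value > t for t in thresholds) for value in latent]
    # Collapsed score: absent (0) versus present (any grade of 1 or more).
    binary = [1 if g >= 1 else 0 for g in graded]
    return pearson(latent, graded), pearson(latent, binary)


if __name__ == "__main__":
    r_graded, r_binary = simulate()
    print(r_graded, r_binary)
```

Under these assumptions the graded score correlates more strongly with the latent variable than the collapsed score does. Real scoring also involves observer error, which this sketch deliberately leaves out.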
The HUGE project data underscore the potential disadvantages of “lumping” vs. “splitting” with regard to such information. The understanding that genetic polymorphisms modify aspects of the maternal and fetal inflammatory responses to a commonly perceived intraamniotic infectious stimulus is relatively recent. 23 All histologic scores may not be created equal; some neutrophilic infiltrates may represent an “uphill battle” with gene polymorphisms that would down-regulate inflammatory responses. Other inflammatory responses may have been facilitated by the genetic environments of the mother, the fetus or both. Collapsing a continuous process (recruitment of neutrophils and diapedesis from their site of origin) into, at the extreme, present vs. absent 7 precludes ever disentangling the complex interplay between maternal and fetal genetic capacities and the infectious stimulus.
As noted above, chorioamnionitis, defined as the presence of intraamniotic microbial organisms triggering maternal and/or fetal inflammatory responses, plays a significant role in reproductive and childhood pathology. While risk of morbidity rises with severity of inflammation, most infants will not experience adverse outcomes. This suggests that the key exposure is heterogeneous and that the heterogeneity is not reflected in commonly used summary measures of infection. The underlying process of acute intraamniotic infection is physiologically complex, involving cytokines, chemokines, prostanoids, proteases, matrix metalloproteinases, and almost innumerable other biologically active compounds. Is the categorical quantification of neutrophils the only facet of inflammation that is physiologically relevant to the outcomes that have been associated with acute intraamniotic infection? It is not unreasonable to suggest that the answer to this question may be “No”. Rather than being collapsed into fewer categories, histology scoring may need to be expanded to cover other features (such as connective tissue characteristics, fibroblast proliferation, neutrophil karyorrhexis) that may mark other facets of the complex pathophysiology of intraamniotic infection.
We propose that perinatal researchers should, to use a worn but appropriate cliché, step outside the box and consider alternative approaches to the measurement of histology slides that would yield adequate reliability to allow cross-institutional analysis of the latent construct(s) involved in intraamniotic infection and, ultimately, to achieve a fuller understanding of the infection-preterm birth pathway.