In developing countries, newborn omphalitis contributes significantly to morbidity and mortality. Community based identification and management of omphalitis will require standardised clinical sign based definitions.
To identify optimal sign based algorithms to define omphalitis in the community and to evaluate the reliability and validity of cord assessments by non‐specialist health workers for clinical signs of omphalitis.
Within a trial of the impact of topical antiseptics on umbilical cord infection in rural Nepal, digital images of the umbilical cord were collected. Workers responsible for in‐home examinations of the umbilical cord evaluated the images for signs of infection (pus, redness, swelling). Intraworker and interworker agreement was evaluated, and sensitivity and specificity compared with a physician generated gold standard ranking were estimated.
Sensitivity and specificity of worker evaluations were high for pus (90% and 96% respectively) and moderate for redness (57% and 95% respectively). Swelling was the least reliably identified sign. Measures of observer agreement were similar to that previously recorded between experts evaluating subjective skin conditions. A composite definition for omphalitis that combined pus and redness without regard to swelling was the most sensitive and specific.
Two sign based algorithms for defining omphalitis are recommended for use in the community. Focusing on redness extending to the skin around the base of the stump will identify cases of moderate and high severity. Requiring both the presence of pus and redness will result in a definition with very high specificity and moderate to high sensitivity.
Omphalitis contributes to neonatal morbidity and mortality in developing countries.1 However, community based data on timing, case fatality, and incidence of non‐tetanus umbilical cord infection await identification of the best set of clinical signs to define infection. Evaluation of the performance of community health workers in recognising signs of omphalitis is a crucial step in translating clinical based diagnostic approaches to the community setting.
Umbilical cord infections present with variable signs, including pus, erythema, swelling, warmth, tenderness, and/or foul odour. In both developed2,3,4 and developing countries,5,6,7,8 clinical definitions have varied considerably, and in some cases have required a positive umbilical culture. Diagnosis in the community, however, must be based solely on clinical signs of infection. An evaluation of the relative reliability and validity of potential signs is essential to the development of useful operational sign based definitions of omphalitis.
In visually dependent areas of medicine, formulating an accurate differential diagnosis from photographic slides is well integrated into training programmes.9,10,11,12,13 Classification of signs of skin lesions, however, is subjective and leads to substantial within‐observer variation, even among experts.14,15,16,17,18 The reliability of community health workers in identifying signs of omphalitis has not yet been assessed, and comparing worker assessments with those of a medical expert would provide credibility to use of field based diagnostic algorithms.
Given the potential importance of topical cord antisepsis,19,20 we designed a community based trial of the impact of chlorhexidine skin and cord cleansing on omphalitis and neonatal mortality in Sarlahi district, Nepal. Within this trial, we assessed the reliability and validity of sign based definitions for cord infection in the community through use of digital images and repeated measures of intraworker and interworker variation.
After giving informed consent, pregnant women were enrolled and followed until delivery. During home visits, the umbilical cord of newborns was examined for pus, redness, and swelling on days 1–4, 6, 8, 10, 12, 14, 21, and 28 after birth. For redness or swelling, workers indicated severity by recording “mild” (limited to the cord stump only), “moderate” (affecting abdominal skin at the base of the stump, <2 cm), or “severe” (redness spreading outward, >2 cm) (fig 1). Workers (n = 61) learned to recognise potential signs of infection using images of the cord illustrating both the normal healing process and omphalitis of varying severity. Practical training under the guidance of supervisory staff members included examination of the cord of newborns in the community. Eleven more senior area coordinators were responsible for cord examinations during the first seven days, and subsequent examinations were conducted by 50 team leader interviewers.
Between February 2003 and January 2004, workers used digital cameras (Olympus D‐380; Olympus America Inc, Melville, New York, USA) during regular home visits to record a sample of umbilical cord images across the neonatal period. Among over 4500 images, 50 were selected to create a standard set for testing reliability and validity of cord assessments within a one hour testing period. To avoid overestimation of agreement through guessing, and to allow comparison of multiple potential definitions of infection, the set was overpopulated with positive images. In three training sessions, conducted about three months apart, all workers assessed this standard set for signs of infection.
Individual signs and a priori determined combinations of signs (algorithms) were assessed for reliability and validity (table 1) using kappa (κ) and percentage agreement, the overall proportion of matching observations. Multiple‐observer κ and percentage agreement were estimated according to extensions described previously.21,22 Sensitivity, specificity, and positive/negative predictive values were estimated by comparison with gold standard rankings by a board certified paediatric dermatologist (GLD). The internal consistency of the gold standard rankings was estimated by a second assessment of the rankings by GLD, and the validity of the gold standard was estimated by obtaining an assessment by an independent paediatric dermatologist. Analyses were conducted using Stata 8.0 (Stata Corp, College Station, Texas, USA).
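The analyses were run in Stata 8.0; purely for illustration, a minimal Python sketch of the two‐rater versions of these measures (percentage agreement, Cohen's κ, and sensitivity/specificity against a gold standard) is given below. The function names and toy inputs are ours, not from the study, and the multiple‐observer extensions cited in the text are not reproduced here.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Overall proportion of matching observations between two raters."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement (kappa) for two raters, any categories."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    # Expected agreement under independence, from each rater's marginals
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

def sensitivity_specificity(observed, gold):
    """Validity of binary assessments against gold standard rankings."""
    tp = sum(o and g for o, g in zip(observed, gold))
    tn = sum((not o) and (not g) for o, g in zip(observed, gold))
    fn = sum((not o) and g for o, g in zip(observed, gold))
    fp = sum(o and (not g) for o, g in zip(observed, gold))
    return tp / (tp + fn), tn / (tn + fp)
```

On a toy example, `cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])` gives 0.5 from an observed agreement of 0.75 and a chance agreement of 0.5.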
The Nepal Health Research Council (Kathmandu, Nepal) and the Committee on Human Research of the Johns Hopkins Bloomberg School of Public Health (Baltimore, USA) approved the protocol.
Table 1 shows the number and proportion of photographs in the standard set that met the defined criteria for each sign or algorithm, according to gold standard rankings.
After calculation of the intraobserver agreement for each worker, the proportion of workers with κ >0.4 and the median level of percentage agreement across all workers were estimated (table 2).
Pus was most consistently recognised by workers, and redness showed significantly higher levels of agreement than swelling. Algorithms with broad definitions (Alg‐04, Alg‐08), and those not requiring swelling (Alg‐06, Alg‐07, Alg‐10) were scored more consistently than those requiring a distinction between swelling severity grades (Alg‐05, Alg‐09). Median percentage agreement was moderate to high for all signs (>60%) and algorithms (>75%).
Table 3 shows interworker agreement by training session. Interobserver agreement trended higher across later assessment sessions. Agreement in pus evaluations during the third training session (percentage agreement, 88.7; κ statistic, 0.77) was substantial. As with intraobserver agreement, redness was more reliable across workers than swelling. Algorithms 05 and 09 were the least reliably assessed algorithms, largely a result of requiring observers to distinguish between grades of swelling.
For the final training session, sensitivity, specificity, and predictive values for pus, dichotomised rankings of redness and swelling, and each of the infection algorithms compared with the gold standard rankings are shown in table 4.
When workers were required to distinguish between moderate/severe and none/mild levels of swelling, sensitivity was reduced. Specificity was high (>94%) for all algorithms. More experienced workers (area coordinators) had higher specificity and significant increases in positive predictive value (table 5).
Repeat rankings by the gold standard observer were highly reliable. Exact classification of swelling was the least consistent of all individual signs and algorithms (κ = 0.77), but still in the moderate to excellent range (data not shown). Table 6 shows variation between the two expert observers.
As with intraobserver and interobserver reliability, agreement between the expert observers was high for pus and redness, whereas swelling was generally classified with poor consistency (κ range 0.09–0.25). For composite algorithms, the range of agreement was considerable, from excellent (Alg‐06, Alg‐07) or substantial (Alg‐04, Alg‐08, Alg‐10) to poor for those requiring a distinction between severe and non‐severe swelling (Alg‐05, Alg‐09).
Workers consistently evaluated the presence or absence of pus, and intraobserver κ statistics for redness were moderate or greater for more than half the workers. Swelling was inconsistently recognised, yet there was high median percentage agreement. As workers seldom graded swelling in the moderate/severe category, the marginal distribution was highly skewed, and each discordant assessment was heavily penalised when κ was calculated.
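The κ penalty from skewed marginals can be illustrated numerically. Using hypothetical 2×2 counts (not study data), two pairs of raters with identical 90% percentage agreement can yield very different κ values depending on how rare the positive grade is:

```python
def kappa_from_counts(both_pos, a_only, b_only, both_neg):
    """Cohen's kappa from a 2x2 table of paired binary assessments."""
    n = both_pos + a_only + b_only + both_neg
    p_o = (both_pos + both_neg) / n          # observed agreement
    p_a = (both_pos + a_only) / n            # rater A's positive marginal
    p_b = (both_pos + b_only) / n            # rater B's positive marginal
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Balanced marginals: 90% agreement gives substantial kappa
balanced = kappa_from_counts(45, 5, 5, 45)   # kappa = 0.80
# Skewed marginals (positive grade rarely used, as with swelling):
# the same 90% agreement gives a much lower kappa
skewed = kappa_from_counts(2, 5, 5, 88)      # kappa ~ 0.23
```

This is the well known "kappa paradox": when nearly all assessments fall in one category, chance agreement is already high, so each discordant pair is heavily penalised, consistent with the swelling results above.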
Levels of agreement were similar to previously documented estimates of intraspecialist variation in assessments of digital images for skin conditions.14,15,16 Intraobserver variation among highly trained specialists in other fields has also been considerable when the diagnosis was subjective23,24; less variation has been seen for more objective outcomes such as respiratory/heart rate or body temperature.25,26,27
The improvement across training sessions is unlikely to be biased by recall of previous assessments as the number of images was large (n = 50), the period between assessments long (three months), and images were reviewed in random order. As observed elsewhere,15,24 interobserver agreement was consistently less than intraobserver agreement, and comparable to those noted previously for classification of skin conditions.14,15,28,29
Worker assessments were highly sensitive and specific for pus and severe redness, but swelling was rarely identified. Whereas specificity remained high for all individual signs (>0.95), sensitivity varied considerably across the proposed algorithms, and was lowest when the more subjective distinction between grades of swelling was required. Similarly, more easily identified signs (tachypnoea) used in integrated management of childhood illness were more sensitive than subjective signs (chest indrawing, palmar pallor).30,31,32,33,34,35
The tedious assessment exercises (about 45 minutes) may have led to decreased concentration and underestimates of reliability, as suggested elsewhere.36,37 Previous investigators have stressed the importance of experience in observers.9,26,30 In our study the large number of workers, range of ability, and varied levels of previous experience probably increased discordance, as evidenced by the reduced validity among the less experienced workers (team leader interviewers). The two dimensional images limited the ability of both workers and expert readers to evaluate the inherently three dimensional character of swelling. Thus our agreement indicators for swelling may underestimate the value of this sign in defining omphalitis.
We recommend two specific algorithms. The first (Alg‐02, binary) requires redness at the moderate or severe level, whereas a second recommended algorithm (Alg‐10) requires severe redness, or pus with moderate redness. Both definitions are highly specific; the former may be more useful in settings or programmes where a higher number of false positives can be tolerated, whereas the latter will be more useful in situations where the focus is on severe cases. Research is required to further develop and validate these algorithms in other populations, such as in Africa, where assessment of omphalitis prevalence and impact of treatment will depend on sign based diagnosis.
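Operationally, the two recommended algorithms reduce to simple decision rules over the graded signs. A sketch is given below; the grade labels follow the worker grading scheme described in Methods, while the function names and encoding are our own illustrative assumptions:

```python
# Redness grades as recorded by workers (see Methods):
# none < mild (stump only) < moderate (<2 cm) < severe (>2 cm)
REDNESS = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}

def alg_02(redness: str) -> bool:
    """Alg-02: redness extending to skin around the base of the stump
    (moderate or severe level)."""
    return REDNESS[redness] >= REDNESS["moderate"]

def alg_10(pus: bool, redness: str) -> bool:
    """Alg-10: severe redness, or pus together with moderate redness."""
    return redness == "severe" or (pus and redness == "moderate")
```

Every case flagged by Alg‐10 is also flagged by Alg‐02, which makes Alg‐02 the more sensitive rule and Alg‐10 the more specific one, matching the trade‐off described above.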
This study was supported by grants from the National Institutes of Health, National Institute of Child Health and Human Development (HD44004 and HD38753), and The Bill & Melinda Gates Foundation (810‐2054) and cooperative agreements between the Johns Hopkins Bloomberg School of Public Health and the Office of Health and Nutrition, United States Agency for International Development (HRN‐A‐00‐97‐00015‐00, GHS‐A‐00‐03‐000019‐00). The funding sources played no role in the study design, collection, data analysis, writing of the report, or decision to submit the paper for publication. Dr Buddy Cohen, Department of Dermatology, Johns Hopkins University, provided the alternative rankings of the 50 photographs in the standard set. The corresponding author (LCM) had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Competing interests: none declared
Parental consent was obtained for publication of figure 1