Arch Dis Child Fetal Neonatal Ed. Mar 2006; 91(2): F99–F104.
PMCID: PMC1379664
NIHMSID: NIHMS5546
Development of clinical sign based algorithms for community based assessment of omphalitis
L C Mullany, G L Darmstadt, J Katz, S K Khatry, S C LeClerq, R K Adhikari, and J M Tielsch
L C Mullany, G L Darmstadt, J Katz, S C LeClerq, J M Tielsch, Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
S K Khatry, S C LeClerq, Nepal Nutrition Intervention Project, Sarlahi (NNIPS), Kathmandu, Nepal
R K Adhikari, Institute of Medicine, Tribhuvan University, Kathmandu
Correspondence to: Dr Mullany
Department of International Health, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe Street, Suite W5021, Baltimore, MD 21211, USA; lmullany@jhsph.edu
Accepted September 19, 2005.
Background
In developing countries, newborn omphalitis contributes significantly to morbidity and mortality. Community based identification and management of omphalitis will require standardised clinical sign based definitions.
Objective
To identify optimal sign based algorithms to define omphalitis in the community and to evaluate the reliability and validity of cord assessments by non‐specialist health workers for clinical signs of omphalitis.
Design
Within a trial of the impact of topical antiseptics on umbilical cord infection in rural Nepal, digital images of the umbilical cord were collected. Workers responsible for in‐home examinations of the umbilical cord evaluated the images for signs of infection (pus, redness, swelling). Intraworker and interworker agreement was evaluated, and sensitivity and specificity compared with a physician generated gold standard ranking were estimated.
Results
Sensitivity and specificity of worker evaluations were high for pus (90% and 96% respectively) and moderate for redness (57% and 95% respectively). Swelling was the least reliably identified sign. Measures of observer agreement were similar to those previously recorded between experts evaluating subjective skin conditions. A composite definition for omphalitis that combined pus and redness without regard to swelling was the most sensitive and specific.
Conclusions
Two sign based algorithms for defining omphalitis are recommended for use in the community. Focusing on redness extending to the skin around the base of the stump will identify cases of moderate and high severity. Requiring both the presence of pus and redness will result in a definition with very high specificity and moderate to high sensitivity.
Keywords: omphalitis, infection, umbilical cord infection, validation, Nepal
Omphalitis contributes to neonatal morbidity and mortality in developing countries.1 However, community based data on timing, case fatality, and incidence of non‐tetanus umbilical cord infection await identification of the best set of clinical signs to define infection. Evaluation of the performance of community health workers in recognising signs of omphalitis is a crucial step in translating clinical based diagnostic approaches to the community setting.
Umbilical cord infections present with variable signs, including pus, erythema, swelling, warmth, tenderness, and/or foul odour. In both developed2,3,4 and developing countries,5,6,7,8 clinical definitions have varied considerably, and in some cases have required a positive umbilical culture. Diagnosis in the community, however, must be based solely on clinical signs of infection. An evaluation of the relative reliability and validity of potential signs is essential to the development of useful operational sign based definitions of omphalitis.
In visually dependent areas of medicine, formulating an accurate differential diagnosis from photographic slides is well integrated into training programmes.9,10,11,12,13 Classification of signs of skin lesions, however, is subjective and leads to substantial within‐observer variation, even among experts.14,15,16,17,18 The reliability of community health workers in identifying signs of omphalitis has not yet been assessed, and comparing worker assessments with those of a medical expert would provide credibility to use of field based diagnostic algorithms.
Given the potential importance of topical cord antisepsis,19,20 we designed a community based trial of the impact of chlorhexidine skin and cord cleansing on omphalitis and neonatal mortality in Sarlahi district, Nepal. Within this trial, we assessed the reliability and validity of sign based definitions for cord infection in the community through use of digital images and repeated measures of intraworker and interworker variation.
Study design
After giving informed consent, pregnant women were enrolled and followed until delivery. During home visits, the umbilical cord of newborns was examined for pus, redness, and swelling on days 1–4, 6, 8, 10, 12, 14, 21, and 28 after birth. For redness or swelling, workers indicated severity by recording “mild” (limited to the cord stump only), “moderate” (affecting abdominal skin at the base of the stump, <2 cm), or “severe” (redness spreading outward, >2 cm) (fig 1). Workers (n = 61) learned to recognise potential signs of infection using images of the cord illustrating both the normal healing process and omphalitis of varying severity. Practical training under the guidance of supervisory staff members included examination of the cord of newborns in the community. Eleven more senior area coordinators were responsible for cord examinations during the first seven days, and subsequent examinations were conducted by 50 team leader interviewers.
Figure 1 Images of umbilical cord of infants in Sarlahi, Nepal: (A) mild redness, four days after birth; (B) pus, moderate redness, six days after birth; (C) moderate swelling, four days after birth; (D) severe redness, three days after birth; (more ...)
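The grading rule described in the study design can be written as a small function. This is only a sketch: the function names and the numeric `skin_spread_cm` input are hypothetical (workers graded by eye), and the source does not say how a spread of exactly 2 cm was graded, so the choice to call it "severe" is an assumption.

```python
def grade_sign(present: bool, skin_spread_cm: float) -> str:
    """Grade redness or swelling using the extent rule from the study design.

    skin_spread_cm is a hypothetical input: how far the sign extends onto
    abdominal skin from the base of the stump (0 if limited to the stump).
    """
    if skin_spread_cm < 0:
        raise ValueError("skin_spread_cm must be non-negative")
    if not present:
        return "none"
    if skin_spread_cm == 0:
        return "mild"      # limited to the cord stump only
    if skin_spread_cm < 2:
        return "moderate"  # abdominal skin at the base of the stump, <2 cm
    return "severe"        # assumption: exactly 2 cm is treated as severe


def is_moderate_or_severe(grade: str) -> bool:
    """Dichotomise a grade into none/mild (False) versus moderate/severe (True)."""
    return grade in ("moderate", "severe")
```

The dichotomised form corresponds to the none/mild versus moderate/severe split used in the validity analysis.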
Between February 2003 and January 2004, workers used digital cameras (Olympus D‐380; Olympus America Inc, Melville, New York, USA) during regular home visits to record a sample of umbilical cord images across the neonatal period. Among over 4500 images, 50 were selected to create a standard set for testing reliability and validity of cord assessments within a one hour testing period. To avoid overestimation of agreement through guessing, and to allow comparison of multiple potential definitions of infection, the set was overpopulated with positive images. In three training sessions, conducted about three months apart, all workers assessed this standard set for signs of infection.
Statistical analysis
Individual signs and a priori determined combinations of signs (algorithms) were assessed for reliability and validity (table 1) using kappa (κ) and percentage agreement, the overall proportion of matching observations. Multiple‐observer κ and percentage agreement were estimated according to extensions described previously.21,22 Sensitivity, specificity, and positive/negative predictive values were estimated by comparison with gold standard rankings by a board certified paediatric dermatologist (GLD). The internal consistency of the gold standard rankings was estimated by a second assessment of the rankings by GLD, and the validity of the gold standard was estimated by obtaining an assessment by an independent paediatric dermatologist. Analyses were conducted using Stata 8.0 (Stata Corp, College Station, Texas, USA).
Table 1 Composition of the standard set of photographs (n  = 50) by clinical signs and algorithms
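As an illustration of the two agreement measures named in the statistical analysis, the sketch below computes percentage agreement and a two-rating κ (Cohen's form) for one worker who assessed the same images twice. The analyses in the study were run in Stata; this Python version, its function names, and the example ratings are assumptions made for illustration only.

```python
from collections import Counter


def percentage_agreement(first, second):
    """Overall proportion of matching observations, as a percentage."""
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def cohen_kappa(first, second):
    """Chance-corrected agreement between two sets of ratings of the same images."""
    if len(first) != len(second) or not first:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(first)
    observed = sum(a == b for a, b in zip(first, second)) / n
    first_counts = Counter(first)
    second_counts = Counter(second)
    # Agreement expected by chance, from the marginal totals of each rating pass
    expected = sum(first_counts[c] * second_counts[c] for c in first_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one category was ever used
    return (observed - expected) / (1.0 - expected)


# Hypothetical pus ratings (1 = present, 0 = absent) by one worker in two sessions
session_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
session_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
agreement = percentage_agreement(session_a, session_b)
kappa = cohen_kappa(session_a, session_b)
```

Here the two sessions match on 8 of 10 images, and κ is lower than the raw agreement because part of that agreement would be expected by chance.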
Ethical approval
The Nepal Health Research Council (Kathmandu, Nepal) and the Committee on Human Research of the Johns Hopkins Bloomberg School of Public Health (Baltimore, USA) approved the protocol.
Table 1 shows the number and proportion of photographs in the standard set that met the defined criteria for each sign or algorithm, according to gold standard rankings.
After calculation of the intraobserver agreement for each worker, the proportion of workers with κ >0.4 and the median level of percentage agreement across all workers were estimated (table 2).
Table 2 Intraobserver reliability: proportion of workers (n  = 61) with κ >0.4 by sign or algorithm
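The per-worker summary reported in table 2 can be sketched as follows, assuming a κ value and a percentage agreement have already been computed for each worker. The function name and the example values are hypothetical and are not study data.

```python
from statistics import median


def summarise_workers(kappas, percent_agreements, threshold=0.4):
    """Summarise intraobserver reliability across workers for one sign or algorithm."""
    if not kappas or not percent_agreements:
        raise ValueError("need at least one worker")
    above = sum(k > threshold for k in kappas)  # strict inequality, as in kappa > 0.4
    return {
        "proportion_above_threshold": above / len(kappas),
        "median_percent_agreement": median(percent_agreements),
    }


# Hypothetical values for five workers
summary = summarise_workers(
    kappas=[0.2, 0.45, 0.6, 0.4, 0.8],
    percent_agreements=[70, 82, 90, 76, 88],
)
```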
Pus was most consistently recognised by workers, and redness showed significantly higher levels of agreement than swelling. Algorithms with broad definitions (Alg‐04, Alg‐08), and those not requiring swelling (Alg‐06, Alg‐07, Alg‐10) were scored more consistently than those requiring a distinction between swelling severity grades (Alg‐05, Alg‐09). Median percentage agreement was moderate to high for all signs (>60%) and algorithms (>75%).
Table 3 shows interworker agreement by training session. Interobserver agreement trended higher across later assessment sessions. Agreement in pus evaluations during the third training session (percentage agreement, 88.7; κ statistic, 0.77) was substantial. As with intraobserver agreement, redness was more reliable across workers than swelling. Algorithms 05 and 09 were the least reliably assessed algorithms, largely a result of requiring observers to distinguish between grades of swelling.
Table 3 Interobserver reliability: κ and percentage agreement for signs and algorithms, by training session
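For agreement among many workers rating the same images, one commonly used statistic is the multi-rater κ of Fleiss (reference 21). The sketch below assumes that form; the source does not spell out the exact formulas it used, so the function name, the input layout, and the use of mean pairwise agreement as "percentage agreement" are all assumptions.

```python
def fleiss_kappa(counts):
    """Multi-rater kappa in the form described by Fleiss.

    counts[i][j] is the number of raters who placed image i in category j.
    Every image must be rated by the same number of raters (at least two).
    Returns (kappa, mean pairwise agreement).
    """
    if not counts:
        raise ValueError("need at least one image")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each image needs the same number (>= 2) of raters")
    n_categories = len(counts[0])
    # Proportion of agreeing rater pairs for each image
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items
    # Overall share of ratings falling in each category
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in shares)
    if expected == 1.0:
        return float("nan"), observed  # undefined when one category is used throughout
    return (observed - expected) / (1.0 - expected), observed


# Hypothetical example: 4 images, 3 raters, categories [absent, present]
kappa, pairwise = fleiss_kappa([[2, 1], [1, 2], [3, 0], [0, 3]])
```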
For the final training session, sensitivity, specificity, and predictive values for pus, dichotomised rankings of redness and swelling, and each of the infection algorithms compared with the gold standard rankings are shown in table 4.
Table 4 Sensitivity/specificity analysis by sign or algorithm for third training session (compared with the gold standard rankings)
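The validity measures in table 4 follow from a 2 × 2 comparison of each worker's dichotomised rating with the gold standard. A minimal sketch, with hypothetical names and example data:

```python
def validity_measures(worker, gold):
    """Sensitivity, specificity, PPV, and NPV of worker ratings against a gold standard.

    Both inputs are equal-length sequences of booleans or 0/1 values.
    """
    if len(worker) != len(gold) or not gold:
        raise ValueError("rating lists must be non-empty and of equal length")
    tp = sum(1 for w, g in zip(worker, gold) if w and g)
    fp = sum(1 for w, g in zip(worker, gold) if w and not g)
    fn = sum(1 for w, g in zip(worker, gold) if not w and g)
    tn = sum(1 for w, g in zip(worker, gold) if not w and not g)

    def ratio(numerator, denominator):
        # Undefined (NaN) when the relevant row or column of the table is empty
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical ratings of eight images (1 = sign present)
worker_ratings = [1, 1, 0, 0, 1, 0, 0, 1]
gold_ratings = [1, 1, 1, 0, 0, 0, 0, 1]
measures = validity_measures(worker_ratings, gold_ratings)
```

Predictive values depend on the share of positive images in the set being rated, whereas sensitivity and specificity do not.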
When workers were required to distinguish between moderate/severe and none/mild levels of swelling, sensitivity was reduced. Specificity was high (>94%) for all algorithms. More experienced workers (area coordinators) had higher specificity and significant increases in positive predictive value (table 5).
Table 5 Comparison of validity measures by worker level (area coordinators versus team leader interviewers)
Repeat rankings by the gold standard observer were highly reliable. Exact classification of swelling was the least consistent of all individual signs and algorithms (κ = 0.77), but still in the moderate to excellent range (data not shown). Table 6 shows variation between the two expert observers.
Table 6 Percentage agreement and κ statistics for expert rankings by sign or algorithm
As with intraobserver and interobserver reliability, agreement between the expert observers was high for pus and redness, whereas swelling was generally classified with poor consistency (κ range 0.09–0.25). For composite algorithms, the range of agreement was considerable, from excellent (Alg‐06, Alg‐07) or substantial (Alg‐04, Alg‐08, Alg‐10) to poor for those requiring a distinction between severe and non‐severe swelling (Alg‐05, Alg‐09).
Reliability
Workers consistently evaluated the presence or absence of pus, and intraobserver κ statistics for redness were moderate or greater for more than half the workers. Swelling was inconsistently recognised, yet there was high median percentage agreement. As workers seldom graded swelling in the moderate/severe category, the marginal distribution was highly skewed, and each discordant assessment was heavily penalised when κ was calculated.
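The effect of a skewed marginal distribution on κ can be shown with a worked example. The counts below are hypothetical, chosen only so that both tables have the same 92% raw agreement; they are not study data.

```python
def kappa_from_table(both_pos, first_only, second_only, both_neg):
    """Kappa for a 2 x 2 table of paired ratings (two passes over the same images)."""
    n = both_pos + first_only + second_only + both_neg
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_pos + both_neg) / n
    first_pos = both_pos + first_only    # positives in the first pass
    second_pos = both_pos + second_only  # positives in the second pass
    expected = (first_pos * second_pos + (n - first_pos) * (n - second_pos)) / (n * n)
    if expected == 1.0:
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# 50 images, 4 discordant pairs in each table (92% agreement in both)
skewed = kappa_from_table(both_pos=0, first_only=2, second_only=2, both_neg=46)
balanced = kappa_from_table(both_pos=23, first_only=2, second_only=2, both_neg=23)
```

With the same four discordant pairs, κ is about 0.84 when positives and negatives are balanced but slightly below zero when positive ratings are rare, because chance agreement on "negative" is already about 92%.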
What is already known on this topic
  • Umbilical cord infection contributes to neonatal morbidity and mortality in developing countries
  • As experienced medical professionals are rarely available in resource‐poor settings, community based identification and management of omphalitis will require standardised sign based definitions
What this study adds
  • This study describes the use of digital images of the umbilical cord to systematically evaluate the ability of health workers to recognise signs of omphalitis (pus, redness, swelling)
  • This methodological approach and the resulting definitions may be used in future investigations to enable rigorous evaluation of interventions designed to decrease neonatal omphalitis
Levels of agreement were similar to previously documented estimates of intraspecialist variation in assessments of digital images for skin conditions.14,15,16 Intraobserver variation among highly trained specialists in other fields has also been considerable when the diagnosis was subjective23,24; less variation has been seen for more objective outcomes such as respiratory/heart rate or body temperature.25,26,27
The improvement across training sessions is unlikely to be biased by recall of previous assessments as the number of images was large (n = 50), the period between assessments long (three months), and images were reviewed in random order. As observed elsewhere,15,24 interobserver agreement was consistently less than intraobserver agreement, and comparable to that noted previously for classification of skin conditions.14,15,28,29
Validity
Worker assessments were highly sensitive and specific for pus and severe redness, but swelling was rarely identified. Whereas specificity remained high for all individual signs (>0.95), sensitivity varied considerably across the proposed algorithms, and was lowest when the more subjective distinction between grades of swelling was required. Similarly, more easily identified signs (tachypnoea) used in integrated management of childhood illness were more sensitive than subjective signs (chest indrawing, palmar pallor).30,31,32,33,34,35
Limitations
The tedious assessment exercises (about 45 minutes) may have led to decreased concentration and underestimates of reliability, as suggested elsewhere.36,37 Previous investigators have stressed the importance of experience in observers.9,26,30 In our study the large number of workers, range of ability, and varied levels of previous experience probably increased discordance, as evidenced by the reduced validity among the less experienced workers (team leader interviewers). The two dimensional images limited the ability of both workers and expert readers to evaluate the inherently three dimensional character of swelling. Thus our agreement indicators for swelling may underestimate the value of this sign in defining omphalitis.
Conclusion
We recommend two specific algorithms. The first (Alg‐02, binary) requires redness at the moderate or severe level, whereas a second recommended algorithm (Alg‐10) requires severe redness, or pus with moderate redness. Both definitions are highly specific; the former may be more useful in settings or programmes where a higher number of false positives can be tolerated, whereas the latter will be more useful in situations where the focus is on severe cases. Research is required to further develop and validate these algorithms in other populations, such as in Africa, where assessment of omphalitis prevalence and impact of treatment will depend on sign based diagnosis.
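The two recommended definitions, as worded in this conclusion, can be expressed as simple rules. This is a sketch only: the `Redness` enum and function names are hypothetical, and the full algorithm specifications are in table 1, which is not reproduced here.

```python
from enum import IntEnum


class Redness(IntEnum):
    """Redness grades as recorded by workers (ordering is by severity)."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


def alg_02(redness: Redness) -> bool:
    """Alg-02 as described above: redness at the moderate or severe level."""
    return redness >= Redness.MODERATE


def alg_10(pus: bool, redness: Redness) -> bool:
    """Alg-10 as described above: severe redness, or pus with moderate redness."""
    return redness == Redness.SEVERE or (bool(pus) and redness == Redness.MODERATE)
```

Under these rules every case meeting Alg‐10 also meets Alg‐02, which is consistent with the latter being the broader definition.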
Acknowledgements
This study was supported by grants from the National Institutes of Health, National Institute of Child Health and Human Development (HD44004 and HD38753), and The Bill & Melinda Gates Foundation (810‐2054) and cooperative agreements between the Johns Hopkins Bloomberg School of Public Health and the Office of Health and Nutrition, United States Agency for International Development (HRN‐A‐00‐97‐00015‐00, GHS‐A‐00‐03‐000019‐00). The funding sources played no role in the study design, collection, data analysis, writing of the report, or decision to submit the paper for publication. Dr Buddy Cohen, Department of Dermatology, Johns Hopkins University, provided the alternative rankings of the 50 photographs in the standard set. The corresponding author (LCM) had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Footnotes
Competing interests: none declared
Parental consent was obtained for publication of figure 1
1. World Health Organization. Care of the umbilical cord. WHO/FHE/MSM‐cord care. Geneva: WHO, 1998.
2. Pezzati M, Biagioli E C, Martelli E, et al. Umbilical cord care: the effect of eight different cord‐care regimens on cord separation time and other outcomes. Biol Neonate 2002;81:38–44.
3. Janssen P A, Selwood B L, Dobson S R, et al. To dye or not to dye: a randomized, clinical trial of a triple dye/alcohol regime versus dry cord care. Pediatrics 2003;111:15–20.
4. Ford L A, Ritchie J A. Maternal perceptions of newborn umbilical cord treatments and healing. J Obstet Gynecol Neonatal Nurs 1999;28:501–506.
5. Güvenç H, Güvenç M, Yenioglu H, et al. Neonatal omphalitis is still common in eastern Turkey. Scand J Infect Dis 1991;23:613–616.
6. Airede A. Pathogens in neonatal omphalitis. J Trop Pediatr 1992;38:129–131.
7. Faridi M M A, Rattan A, Ahmad S H. Omphalitis Neonatorum. J Ind Med Assoc 1993;91:283–285.
8. Sawardekar K P. Changing spectrum of neonatal omphalitis. Pediatr Infect Dis J 2004;23:22–26.
9. Oliveira M R, Wen C L, Neto C F, et al. Web site for training nonmedical health‐care workers to identify potentially malignant skin lesions and for teledermatology. Telemed J E Health 2002;8:323–332.
10. Papier A, Peres M R, Bobrow M, et al. The digital imaging system and dermatology. Int J Dermatol 2000;39:561–575.
11. Mann T, Colven R. A picture is worth more than a thousand words: enhancement of a pre‐exam telephone consultation in dermatology with digital images. Acad Med 2002;77:742–743.
12. Cyr P R. Family practice center‐based training in skin disorders: a photographic approach. Fam Med 1995;27:109–111.
13. Fawcett R S, Widmaier E J, Cavanaugh S H. Digital technology enhances dermatology teaching in a family medicine residency. Fam Med 2004;36:89–91.
14. Griffiths C E M, Wang T S, Hamilton T A, et al. A photonumeric scale for the assessment of cutaneous photodamage. Arch Dermatol 1992;128:347–351.
15. Lund C H, Osborne J W. Validity and reliability of the neonatal skin condition score. J Obstet Gynecol Neonatal Nurs 2004;33:320–327.
16. Perednia D A, Gaines J A, Rossum A C. Variability in physician assessment of lesions in cutaneous images and its implications for skin screening and computer‐assisted diagnosis. Arch Dermatol 1992;128:357–364.
17. Whited J D, Hall R P, Simel D L, et al. Primary care clinicians' performance for detecting actinic keratoses and skin cancer. Arch Intern Med 1997;157:985–990.
18. Whited J D, Hall R P, Simel D L, et al. Reliability and accuracy of dermatologists' clinic‐based and digital image consultations. J Am Acad Dermatol 1999;41:693–702.
19. Zupan J, Garner P, Omari A A A. Topical umbilical cord care at birth (Cochrane Review). Cochrane Library. Issue 3. Chichester: John Wiley & Sons, Ltd, 2004.
20. Mullany L C, Darmstadt G L, Tielsch J M. Role of antimicrobial applications to the umbilical cord in neonates to prevent bacterial colonization and infection: a review of the evidence. Pediatr Infect Dis J 2003;22:996–1002.
21. Fleiss J L. Measuring nominal scale agreement among many raters. Psychol Bull 1971;76:378–382.
22. Landis J R, Koch G G. A one‐way components of variance model for categorical data. Biometrics 1977;33:671–679.
23. Nicholson A G, Addis B J, Bharucha H, et al. Inter‐observer variation between pathologists in diffuse parenchymal lung disease. Thorax 2004;59:500–505.
24. Fine P E, Job C K, Lucas S B, et al. Extent, origin, and implications of observer variation in the histopathological diagnosis of suspected leprosy. Int J Lepr Other Mycobact Dis 1993;61:270–282.
25. Lim W S, Carty S M, Macfarlane J T, et al. Respiratory rate measurement in adults: how reliable is it? Respir Med 2002;96:31–33.
26. Edmonds Z V, Mower W R, Lovato L M, et al. The reliability of vital sign measurements. Ann Emerg Med 2002;39:233–237.
27. Singhi S, Bhalla A K, Bhandari A, et al. Counting respiratory rate in infants under 2 months: comparison between observation and auscultation. Ann Trop Paediatr 2003;23:135–138.
28. Taylor P. An assessment of the potential effect of a teledermatology system. J Telemed Telecare 2000;6(suppl 1):74–76.
29. Whited J D, Horner R D, Hall R P, et al. The influence of history on interobserver agreement for diagnosing actinic keratoses and malignant skin lesions. J Am Acad Dermatol 1995;33:603–607.
30. Kahigwa E, Schellenberg D, Schellenberg J A, et al. Inter‐observer variation in the assessment of clinical signs in sick Tanzanian children. Trans R Soc Trop Med Hyg 2002;96:162–166.
31. Perkins B A, Zucker J R, Otieno J, et al. Evaluation of an algorithm for integrated management of childhood illness in an area of Kenya with high malaria transmission. Bull World Health Organ 1997;75(suppl 1):33–42.
32. Weber M W, Mulholland E K, Jaffar S, et al. Evaluation of an algorithm for the integrated management of childhood illness in an area with seasonal malaria in the Gambia. Bull World Health Organ 1997;75(suppl 1):25–32.
33. Kolstad P R, Burnham G, Kalter H D, et al. The integrated management of childhood illness in western Uganda. Bull World Health Organ 1997;75(suppl 1):77–85.
34. Horwood C, Liebeschuetz S, Blaauw D, et al. Diagnosis of paediatric HIV infection in a primary health care setting with a clinical algorithm. Bull World Health Organ 2003;81:858–866.
35. Simoes E A, Desta T, Tessema T, et al. Performance of health workers after training in integrated management of childhood illness in Gondar, Ethiopia. Bull World Health Organ 1997;75(suppl 1):43–53.
36. Taylor P, Goldsmith P, Murray K, et al. Evaluating a telemedicine system to assist in the management of dermatology referrals. Br J Dermatol 2001;144:328–333.
37. Eedy D J, Wootton R. Teledermatology: a review. Br J Dermatol 2001;144:696–707.