Programme site characteristics
The 21 sites surveyed provided ART to a total of 50 060 patients. The median number of patients on ART per site was 1000 (interquartile range, IQR: 320 to 2398). The table below shows selected site characteristics. All programmes except one were in urban facilities; 11 (52%) were in public facilities; 9 (43%) were run by NGOs and one was a private, for-profit clinic. Eighteen sites (86%) received funding from a donor agency (mostly the Global Fund to fight AIDS, Tuberculosis and Malaria, the President’s Emergency Plan for AIDS Relief and the Bill & Melinda Gates Foundation); 14 (67%) obtained funding from two sources and 5 (24%) from three. One site reported funding from seven different sources.
General characteristics and data entry staff of 21 ART programmes in low- and middle-income countries participating in a study correlating record-keeping practices with losses to follow-up
Overall, 18 (86%) sites routinely used an electronic database. The median number of weekly hours spent on the database per 100 patients on ART was 3.6 (IQR: 1.6–5.1) for clerks and 1.5 (IQR: 0.3–5.7) for medical personnel (physicians or nurses). Four of the 18 sites (22%) exclusively employed clerks for data entry and two (11%) used medical staff only. Thirteen of the 21 sites (62%) had personnel trained in data management or data quality control. Sixteen of the 18 sites (89%) captured patient data by means of written charts during consultations. Three of the 21 sites (14%) entered the data electronically during consultations.
Database characteristics and data quality measures
Among the 18 sites that had an electronic database, 11 (61%) used the same software for data collection and data management, six (33%) used two different packages and one site used as many as five because of various research and reporting requirements. Ten sites (56%) used generic software such as Microsoft Access (Microsoft Corporation, Seattle, WA, USA) or FileMaker Pro (FileMaker, Inc., Santa Clara, CA, USA); only 5 sites (28%) used systems developed for this purpose, such as FUCHIA (Follow-Up of Clinical HIV Infection and AIDS, from Doctors without Borders and EpiCentre) or ESOPE (from Ensemble pour une solidarité thérapeutique hospitalière en réseau, ESTHER). A relational database with files for demographic data, clinical events, drugs and follow-up examinations was in place in 12 (67%) of the 18 sites using electronic databases. Only three such sites (17%) used solutions based on a Structured Query Language (SQL)13,14 server, which allows management of very large numbers of patients. Nine sites (50%) reported that the database could be linked to other data, including laboratory (8 sites) and pharmacy (7 sites) databases, the nutritional support unit (2 sites) or the socioeconomic support unit (3 sites). Standardized export formats were available at 6 sites (33%); five used Extensible Markup Language (XML) and one used the Health Level Seven (HL7) standard.
The table below summarizes the main purpose of each database. Fourteen sites (78%) stated that both patient management and reporting requirements were important reasons for having the database. Also shown are the measures in place to ensure the quality of the data for selected key variables: CD4 counts, drugs and important dates (date of birth and dates of laboratory measurements, follow-up visits and death). Box 1 defines commonly used measures to improve and ensure data quality, including bounds checking, check digits, fixed taxonomies, numerical alerts and Write Once, Read Many (WORM) computer data storage systems. For the variable CD4 cell counts, at least one of the measures listed in Box 1 was in place at 12 (67%) sites. Four (22%) sites reported two or more measures. Check digits and bounds checking were more frequently used (n = 6) than WORM systems (n = 4) and numerical alerts (n = 3). Fixed taxonomies of drugs were in use at 9 (50%) sites, and 3 (17%) sites additionally reported a WORM strategy. Regarding the four key dates, 7 (39%) sites reported WORM systems and 5 (28%) reported bounds checking for all four dates. Three (17%) reported both WORM systems and bounds checking for all four dates. Use of controlled medical vocabularies was reported by 3 sites; all used the International statistical classification of diseases and related health problems, 10th revision (ICD-10). Twelve sites (67%) performed a daily backup, 4 sites (22%) did a weekly or monthly backup and 2 sites (11%) had no backup strategy in place.
Data quality control and means for patient follow-up observed in a 15-country study correlating medical record-keeping practices with losses to follow-up in ART programmes
Tracing of patients and missing data
Fifteen (71%) of the 21 sites indicated that they traced patients lost to follow-up. This included outreach teams at 11 (52%) sites, collaboration with community-based organizations at 5 sites (24%) and checking death registry data at 7 (33%) sites. The majority of the registries consulted were local death registries, such as hospital registries. Fourteen sites (67%) indicated they recorded when patients moved to another clinic and transferred their records on these occasions. Among the 18 sites with electronic databases, 5 (28%) had automatic alerts for missed visits.
Analyses of missing data were based on 41 936 patients from 19 sites and analyses of loss to follow-up on 36 149 patients from 18 sites participating in the ART-LINC collaboration. The table below shows the proportion of missing values for the key variables that we used to create the missing data index, as well as the proportion of patients lost to follow-up. There was considerable variation across sites, as indicated by wide interquartile ranges. Missing data were more frequent for variables relating to laboratory measures than for demographic or clinical information. The median missing data index was 10.9% and the median proportion of patients lost to follow-up at 1 year was 8.5%.
Percentage of missing data for key variables, missing data index and rate of loss to follow-up in a 15-country study correlating medical record-keeping practices with losses to follow-up in ART programmes
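The missing data index is described in this section as the median of the percentage of data missing in six key variables. The following Python sketch illustrates that calculation for a single site. It is a minimal illustration under stated assumptions, not the analysis code used in the study: the variable names in `KEY_VARIABLES` and the four example records are hypothetical, since the text here does not list the six variables.

```python
from statistics import median

# Hypothetical names for six key variables (not taken from the study).
KEY_VARIABLES = [
    "birth_date", "sex", "art_start_date",
    "cd4_count", "haemoglobin", "clinical_stage",
]


def percent_missing(records, variable):
    """Percentage of records in which `variable` is absent or None."""
    if not records:
        raise ValueError("at least one record is required")
    missing = sum(1 for record in records if record.get(variable) is None)
    return 100.0 * missing / len(records)


def missing_data_index(records, key_variables):
    """Median, across the key variables, of the percentage of missing values."""
    return median(percent_missing(records, v) for v in key_variables)


# Hypothetical patient records for one site; None marks a missing value.
records = [
    {"birth_date": "1980-01-01", "sex": "F", "art_start_date": "2005-03-01",
     "cd4_count": 120, "haemoglobin": None, "clinical_stage": 3},
    {"birth_date": "1975-06-15", "sex": "M", "art_start_date": "2005-04-10",
     "cd4_count": None, "haemoglobin": None, "clinical_stage": 2},
    {"birth_date": None, "sex": "F", "art_start_date": "2006-01-20",
     "cd4_count": 85, "haemoglobin": 10.2, "clinical_stage": None},
    {"birth_date": "1990-09-09", "sex": "M", "art_start_date": "2006-02-02",
     "cd4_count": None, "haemoglobin": None, "clinical_stage": 4},
]

site_index = missing_data_index(records, KEY_VARIABLES)
print(f"Missing data index: {site_index:.1f}%")
```

Using the median rather than the mean keeps a single poorly recorded variable from dominating the index for a site.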
As shown in the figure and table below, training of staff and clerk-hours spent per week per 100 patients on ART were associated with a decreased likelihood of there being missing data. The figure shows that the variance decreases as the amount of time spent on the database increases. About 10 hours per week per 100 patients on ART were required to lower the proportion of missing data for key variables to below 10%. Interestingly, the four programmes with lower clerk-hours spent on data and lower levels of missing data tended to have a strong research component. The amount of time spent by medical staff was only weakly associated with the missing data index. The proportion of the population living on less than US$ 1 per day was also positively associated with missing data. Loss to follow-up was negatively associated with the number of active tracing strategies in place. The effect of the individual measures was similar: in the univariate logit models, the OR for loss to follow-up was 0.58 (95% CI: 0.25–1.34) for the presence of an outreach team; 0.54 (95% CI: 0.24–1.18) for collaboration with community-based organizations; and 0.41 (95% CI: 0.18–0.94) for collaboration with death registries. Finally, there was a positive correlation between the proportion of patients lost to follow-up and the proportion of data missing for key variables: the Spearman rank correlation coefficient (ρ) was 0.51 (P = 0.031).
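For readers unfamiliar with the Spearman rank correlation coefficient, the sketch below shows one common way to compute it: each variable is converted to ranks (tied values share the average rank) and the Pearson correlation of the ranks is taken. This is an illustration only; the site-level values are hypothetical and do not reproduce the study data or the reported ρ of 0.51.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical site-level percentages, for illustration only.
missing_index = [2.0, 5.5, 10.9, 20.0, 35.0]
lost_to_follow_up = [3.0, 8.5, 6.0, 15.0, 12.0]

rho = spearman_rho(missing_index, lost_to_follow_up)
print(f"Spearman rho = {rho:.2f}")
```

Because the coefficient depends only on ranks, it is not unduly influenced by a few sites with extreme percentages.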
Missing data index (median of percentage of data missing in six key variables) and hours spent by data clerks on the database each week
Probability of missing data for key variables and loss to follow-up according to characteristics of ART programmes in a 15-country study correlating medical record-keeping practices with losses to follow-up
Box 1. Measures to improve the quality of the data collected in clinical databases
Bounds checking: Automated checking of whether or not a number lies within a pre-defined numeric range of possible or likely values.
Check digit: Additional number added to a unique identifier to check for errors when entering identification numbers. The check digit is calculated from the other digits in the identification number and is designed so that it will not match if any of the other digits is incorrect.
Fixed taxonomy: Predefined names assigned to a variable that prevent free text-related problems, including spelling mistakes and inconsistent terminology. Examples include the ATC code for drugs.
Numerical alert: System alerting the operator when a number is not expected (i.e. value out of range, or of a critical nature) which will prompt the operator to verify the number or take other appropriate actions.
WORM (Write Once, Read Many): Any type of data storage to which data can be written only a single time, but from which data can be read any number of times. This prevents the user from accidentally or intentionally altering or erasing the data.
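Three of the measures in Box 1 can be illustrated with a short Python sketch: bounds checking, a check digit and a fixed taxonomy. This is a minimal illustration under stated assumptions, not a description of the software used at any site. The CD4 range in `CD4_BOUNDS` is a hypothetical choice, the Luhn algorithm is used as one well-known check digit scheme (Box 1 does not name a specific one), and `ALLOWED_DRUGS` is a small illustrative subset of ATC codes.

```python
# Hypothetical plausible range for a CD4 count (cells/µL); an assumption.
CD4_BOUNDS = (0, 3000)

# Fixed taxonomy: a small illustrative subset of ATC codes for antiretrovirals.
ALLOWED_DRUGS = {
    "J05AF01": "zidovudine",
    "J05AF05": "lamivudine",
    "J05AG01": "nevirapine",
}


def in_bounds(value, bounds):
    """Bounds check: True if value lies within the inclusive range."""
    low, high = bounds
    return low <= value <= high


def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:  # double every second digit, from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def has_valid_check_digit(identifier):
    """True if the last digit matches the Luhn check digit of the rest."""
    if not identifier.isdigit() or len(identifier) < 2:
        return False
    return luhn_check_digit(identifier[:-1]) == int(identifier[-1])


def is_known_drug(atc_code):
    """Fixed taxonomy check: accept only codes from the predefined list."""
    return atc_code in ALLOWED_DRUGS


# Usage with hypothetical entries.
print(in_bounds(350, CD4_BOUNDS))            # value within the range
print(in_bounds(35000, CD4_BOUNDS))          # likely keying error
print(has_valid_check_digit("79927398713"))  # identifier with check digit
print(is_known_drug("J05AF05"))              # code in the taxonomy
```

In a data entry form, a failed bounds check would typically prompt the operator to verify the value, as described for numerical alerts, rather than silently reject it.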