Our survey shows that there are major defects in both the completeness and the accuracy of the collection and reporting of data that track PMTCT service delivery. Yet national health systems rely on this type of data for national reports and to plan future resources, including financial allocations. Such data might also be used to track system processes and outcomes to improve performance. This large-scale systematic review highlights the problems with the PMTCT data that are routinely collected and reported to the DHIS in South Africa, and demonstrates the need for interventions for its systematic improvement. Until these issues are addressed, routine data cannot be used to reliably inform efforts to improve PMTCT care.
The strength of this study lies in the very large sample size (all 316 clinical sites were surveyed for data completeness, 99 sites were sampled for data accuracy), the sampling technique (randomization) and the use of an objective, quality-assured “gold standard” report generated by on-site audit of the original data in clinic registers to evaluate the accuracy of PMTCT elements reported in the DHIS.
The analysis indicates that data collation at clinics, before submission to the DHIS, is the point of major breakdown that compromises data integrity. There was reasonable concordance between what the clinics actually submitted to the District Information Offices for capture (i.e., the Monthly Summary Sheets) and what subsequently appeared in the DHIS. However, for the six PMTCT indicators that were evaluated, the initial transfer of data from the individual clinic registers to the Monthly Summary Sheets was highly inaccurate. This may have been because numbers were incorrectly tallied, because not all registers were included in the process, or because register data were summed before the end of the month, so that not all data were included. By contrast, the survey team ensured that all registers were available and included, and that each full month was covered in their reconstruction.
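The stage-by-stage comparison described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each indicator has three monthly counts (an audited register tally, the Monthly Summary Sheet value and the DHIS value) and that a reported value within 10% of its reference counts as accurate. The tolerance, the field names and the figures are illustrative assumptions and are not taken from the survey.

```python
from dataclasses import dataclass


@dataclass
class IndicatorCounts:
    """Monthly counts for one indicator at one clinic (hypothetical fields)."""
    register_audit: int  # re-tallied from the clinic registers by an audit team
    summary_sheet: int   # value the clinic wrote on its Monthly Summary Sheet
    dhis: int            # value captured in the DHIS


def within_tolerance(reported: int, reference: int, tolerance: float = 0.10) -> bool:
    """Return True if `reported` lies within +/- `tolerance` of `reference`."""
    if reference == 0:
        # Avoid division by zero: only an exact zero matches a zero reference.
        return reported == 0
    return abs(reported - reference) / reference <= tolerance


def locate_breakdown(counts: IndicatorCounts, tolerance: float = 0.10) -> str:
    """Name the first transfer step at which the value stops matching."""
    if not within_tolerance(counts.summary_sheet, counts.register_audit, tolerance):
        return "register -> summary sheet"
    if not within_tolerance(counts.dhis, counts.summary_sheet, tolerance):
        return "summary sheet -> DHIS"
    return "no breakdown"


# Illustrative values only: the sheet under-counts the registers,
# while the DHIS reproduces the sheet exactly.
example = IndicatorCounts(register_audit=120, summary_sheet=85, dhis=85)
print(locate_breakdown(example))
```

Comparing each stage only with the stage immediately before it separates collation errors at the clinic from capture errors at the district office, which is the distinction drawn in the paragraph above.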
The second major finding was that data were frequently not submitted by clinics to the District Information Offices for capture into the DHIS. In any health informatics system (or research study), the absence of data fundamentally weakens the accuracy and reliability of that system. Missing data, especially if substantial in quantity (e.g., more than 10%), can substantially skew results, so that the practices or outcomes of a system are either over-reported or under-reported. These results were presented to the district information officers responsible for the data submissions to the DHIS and to many of the clinic managers responsible for the initial data collection. Both groups initially doubted the findings presented to them. However, after investigating the indicator reports themselves, they were able to verify the findings.
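To make the effect of missing submissions concrete, the following sketch computes a pooled proportion twice: once from all clinics and once from only those that submitted. All figures are invented for illustration, and the function and variable names are hypothetical rather than part of the DHIS.

```python
from typing import List, Optional, Tuple

# Each report is (numerator, denominator), or None if the clinic did not submit.
Report = Optional[Tuple[int, int]]


def pooled_proportion(reports: List[Report]) -> float:
    """Pooled numerator / denominator over the clinics that submitted."""
    submitted = [r for r in reports if r is not None]
    numerator = sum(n for n, _ in submitted)
    denominator = sum(d for _, d in submitted)
    return numerator / denominator


def missing_fraction(reports: List[Report]) -> float:
    """Share of clinics with no submission for the period."""
    return sum(r is None for r in reports) / len(reports)


# Invented example: five clinics, two of which perform poorly.
complete = [(90, 100), (80, 100), (85, 100), (30, 100), (35, 100)]
# Same district, but the two poorly performing clinics did not submit.
with_gaps = [(90, 100), (80, 100), (85, 100), None, None]

print(f"all clinics:     {pooled_proportion(complete):.2f}")
print(f"submitters only: {pooled_proportion(with_gaps):.2f}")
print(f"missing:         {missing_fraction(with_gaps):.0%}")
```

In this invented example the pooled estimate rises from 0.64 to 0.85 when the two low-performing clinics fail to submit, which is over-reporting. If the high-performing clinics were the ones missing, the same mechanism would produce under-reporting.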
Other investigators in South Africa and elsewhere have found similar defects in data systems used to track disease or public health programs. In a review of data received by the DHIS from 12 rural clinics in KwaZulu-Natal (KZN) province, the data collection process was found to be inefficient (duplication of data collection) and was perceived to be a major burden on health workers, and the data were not being used in the clinics to improve patient care. That study found a significantly lower rate of incompleteness in the DHIS records (2.5% of data missing) than our survey did. This may reflect significant differences in sample size (12 clinics from a single sub-district vs 316 clinics and hospitals from 3 districts) or a specific problem related to data collection in the PMTCT programme. An evaluation of the accuracy of death notification forms in the Cape Town metropole supports the concern that defects in the public health reporting system are pervasive. Errors were found in nearly all death notification forms (91%), with a major error detected in 43% of instances, resulting in documentation of an illogical sequence or cause of death.
Other reports indicate that data collected at clinic level and reported by national health programmes in developing countries cannot be relied on for disease or programme surveillance. A survey of a purpose-built data system that tracks the public sector antiretroviral treatment program in Malawi found that most clinics had complete case registration and clinical outcomes data but that case registration data were accurate in only 40% of sites. The authors identified several clinic characteristics (longer experience with the ARV programme, visits by ARV supervisors, high volume of patients, dedicated clerks for recordkeeping) that were positively associated with higher data quality. A study of the VCT program in Kenya found major discrepancies between onsite records and those in the national office, concluding that there was significant underreporting of the data. By contrast, a study that evaluated the quality of vaccination monitoring programs in 27 countries concluded that immunization rates were routinely over-reported. A similar study in Mozambique found consistent over-reporting at the facility level, with 44% over-reporting of BCG vaccinations and 95% over-reporting of DPT+HepB vaccinations. Our study demonstrated large variation and deficiency in the completeness and accuracy of individual data elements. This finding suggests that staff at clinics may not assign significant value to the quality of data collection and that the current data system is not used to improve the quality of PMTCT care processes at a local clinic level. Similarly, district information offices did not routinely review data completeness in order to improve the data system.
Data from multiple reports indicate that information systems in developing countries do not generally provide sufficiently useful information for effective public health management. A five-country evaluation of data structures supporting health care systems in developing countries across 4 continents identified a number of structural impediments (relating to timeliness, accuracy, simplicity, flexibility, acceptability and usefulness) to an effective health information system. The authors proposed that, while multiple problems exist, the common deficiencies concerned the design of the system, the ongoing training of personnel and the dissemination of data from the system. These authors recommended that, after a thorough evaluation, the system should be improved through training and support.
Our survey concurs with the findings of others that the difficulty of accurate data collection is compounded by duplication and unnecessary complexity caused by a multiplicity of registers. This analysis supports the recommendations of the WHO, which argues for simplified data collection tools, a minimal common set of key indicators, reduced numbers of registers, and allocation of dedicated, trained personnel at the local level to maintain patient records and reports.
Whether current approaches to improving data systems, including further training, simplified data collection systems, or the use of sophisticated electronic data validation systems, will be sufficient to provide more reliable data remains to be determined. While the solution will undoubtedly be multifaceted, at least two additional principles need to be considered in any effort to improve data systems: 1) Data need to be perceived by front-line clinic staff as intrinsically valuable in the management of their patients and in their delivery of health care. This can be partially achieved through simplification of the collection and reporting process, but data will only attain significant value if they are used by clinic staff in an ongoing process to manage patients and populations. 2) Clinic staff need to be supported and supervised in the execution of data management tasks. While more accurate data may result from more rationalized data collection and reporting processes, a crucial element is ongoing supervision and support of the process. Clinic managers or supervisors need to work with their local clinic staff to promote improvement of clinical practice through analysis of performance and outcomes data. Local data need to be owned and valued by local staff, rather than relegated to the orphan status that they currently occupy.