In the September issue of BJGP, Hankins et al published a brief report1 criticising the reliability and validity of the two survey instruments (the Improving Practice Questionnaire [IPQ] and the General Practice Assessment Questionnaire [GPAQ]) used nationally in general practices to measure patient satisfaction, monitor the quality of services, and earn QOF points. I have been involved only in the development of the IPQ, so I will restrict my comments to the reliability and validity of that instrument and not the GPAQ.
Hankins et al conducted a literature search based on the names of the questionnaires IPQ, GPAQ, and GPAS (the forerunner of GPAQ). They found only one paper about the IPQ,2 none for the GPAQ, and three for the GPAS. They also hand-searched journals.
If they had been more thorough, they would have discovered that the IPQ had been previously validated in Australia3 where it was known as PAIS (Practice Accreditation and Improvement Survey) and that its core consists of the 12 items that comprise another instrument called DISQ (Doctor's Interpersonal Skills Questionnaire)4–6 which, in spite of its name, can be used to assess any clinician's capability. There are several papers demonstrating that the DISQ has validity against external criteria, for example against the assessments of College examiners.7–9
The IPQ had also been assessed for validity and reliability in 2002 by two independent experts in this field, Professor Mike Pringle and Dr Brian McKinstry, prior to its recommendation as one of the tools to use in the Quality and Outcomes Framework (QOF) of the GP contract.
Criticisms of the IPQ raised by Hankins et al were as follows:
I still possess the first draft of our paper on the reliability and validity of the IPQ that was submitted to Education for Primary Care in 2003, the final version of which1 Hankins et al reviewed. In that draft, I described the results of a principal components analysis of the instrument, which yielded two major components that we called ‘capacity’ (the processes within the surgery) and ‘capability’ (the clinician's ability). The capability component, incidentally, comprised all of the items that make up the DISQ. I had also calculated Cronbach's α for each of these components. The statistical reviewer of our first draft made cryptic comments about the use of Cronbach's α that we unfortunately misinterpreted, so we deleted the values from the final draft of the paper. The deleted values were 0.9655 for the capability component and 0.9082 for the capacity component. Values for Cronbach's α over 0.9 are regarded as demonstrating good internal consistency,10 and hence reliability. I think this answers the first two criticisms about reliability of the IPQ.
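For readers unfamiliar with the statistic, the α values quoted above can be reproduced from item-level response data with a short calculation. The sketch below uses the standard Cronbach's α formula on made-up ratings; the data are illustrative only and are not drawn from the IPQ dataset.

```python
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).
# The ratings below are synthetic, not actual IPQ responses.

def cronbach_alpha(items):
    """items: list of equal-length lists, one list per questionnaire item."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = sum(var(it) for it in items)
    # Each respondent's total score across all items.
    totals = [sum(items[i][j] for i in range(k)) for j in range(n)]
    return (k / (k - 1)) * (1 - item_vars / var(totals))

# Three hypothetical items answered by five respondents on a 1-5 scale.
items = [
    [4, 5, 3, 4, 5],
    [4, 4, 3, 5, 5],
    [3, 5, 3, 4, 4],
]
alpha = cronbach_alpha(items)
```

With real data, an α above 0.9 for a component (as reported for the IPQ's capability and capacity components) indicates that its items behave consistently as a scale.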
Construct validity is an ongoing process of learning more about a construct (in this case, satisfaction with general practice), making new predictions and testing them. It is a process in which the theory and the measure are assessed at the same time.11 Satisfaction is a construct because it cannot be measured directly in the way that objective quantities such as height or weight can. However, people can tell you how satisfied they feel about something and will be able to give a satisfaction rating. Regarding the validity of the IPQ, we felt that an assessment of construct validity was possible by correlating item 27, which asked patients to rate their overall satisfaction with this visit to their general practice, with the summed values of all the other items (1 to 26). We obtained a correlation coefficient of 0.78, P<0.001, which suggests that the instrument is measuring the construct ‘patient satisfaction’. In their report, Hankins et al neglected to say what item 27 was, and thus obscured its usefulness in assessing this construct.
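The check described above is a simple Pearson correlation between each patient's overall-satisfaction rating (item 27) and their summed score on items 1–26. A minimal sketch follows; the per-patient figures are invented for illustration and do not come from the IPQ dataset.

```python
# Pearson correlation between summed items 1-26 and the item 27
# overall-satisfaction rating. All figures below are hypothetical.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Six hypothetical patients: summed score on items 1-26, then the
# item 27 overall-satisfaction rating for the same patient.
summed_items = [104, 92, 118, 85, 110, 97]
item_27 = [4, 3, 5, 3, 4, 4]
r = pearson_r(summed_items, item_27)
```

A strong positive r on real data, such as the 0.78 reported for the IPQ, is what supports the claim that the individual items and the global satisfaction rating are tracking the same underlying construct.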
Our other approach to evaluating IPQ construct validity was based on comparisons of mean scores across categories such as sociodemographic groups known or hypothesised to score differently. In spite of what Hankins et al say, it has generally been found in the literature (for example, concerning the DISQ) that older patients give higher satisfaction scores than younger ones.10,12 As this was expected, we felt that confirming it to be the case with the IPQ demonstrated construct validity.
I first heard about these criticisms after they were presented at a conference in July 2006, and I immediately emailed the lead author to make myself known to him. In retrospect, however, he should really have approached us for relevant information before giving his conference presentation. He replied to my email saying that he had reviewed our paper1 and asked me if I knew of any further papers. I took that to mean any more recent papers concerning reliability and validity of the IPQ, of which there were none: I didn't realise at the time that he might not have been aware of the papers describing the validation of the DISQ. He also asked me to shed more light on the correlation we had done between IPQ items 1–26 and item 27, but I didn't feel there was any more to say about it. He regarded it as an estimate of reliability (and yet in his BJGP brief report he said that we hadn't assessed reliability), whereas I viewed it as a measure of validity. This only serves to underline the fact that validity and reliability are closely related. An unreliable instrument, after all, is unlikely to measure the ‘truth’.
Hankins et al say in their report that ‘access to IPQ and GPAQ datasets may have helped to evaluate reliability’, but neither I nor my colleagues received any requests for our IPQ data. We would gladly have given them all the raw data on 55 687 patient responses in 361 UK practices had they asked for it. It is a shame that they didn't approach us earlier, and that the lead author was not more specific in his eventual email reply to me, because I could have sent him the Cronbach's α values for the IPQ and the validation references for the DISQ too. In my view, this is not the way to conduct a balanced review. It is regrettable that these matters were not clarified, because the Hankins et al article remains in the public domain and will potentially be more influential than this rebuttal.