The study was designed as a prospective documentation that followed the methodology outlined in a predefined and published protocol [15]. All patients insured with the insurance company "Innungskrankenkasse" (IKK) who had visited one of the participating doctors between 1995 and 1998 were eligible for treatment and hence for inclusion in this study. The IKK offered these treatments in the three German Länder of Baden-Württemberg, Saxony, and Saxony-Anhalt, and funded this study.
Because the study was designed as an assessment of real routine care, the only inclusion criteria were written informed consent, the ability to read and write German, and being insured with the funding insurance company.
The eligibility criteria for doctors were set to ensure the quality of treatment. All of them had to be medical doctors. For inclusion in the trial phase, they had to send proof of their qualification to a quality control committee set up by representatives of the insurance company and the doctors' associations. Homoeopaths had to possess the comprehensive additional qualification "homoeopathic doctor" according to the standards of the German Homoeopathic Doctors' Association (Zentralverein homöopathischer Ärzte); acupuncturists had to have at least 140 hours of training and hold a so-called A-diploma (certified by an acknowledged acupuncture association).
At the beginning of the study, patients were asked to give written consent to allow documentation. Consenting patients were given baseline questionnaires while awaiting treatment. Post-treatment, acupuncture patients were asked to fill in a set of questionnaires after the final session of a treatment cycle as well as follow-up questionnaires once a year. For homoeopathy – where treatment is more likely to last several months or years – patients were asked to fill in questionnaires prior to the treatment and every 6 months thereafter. The questionnaires after the treatment had to be answered at home and mailed to the study centre. This ensured that doctors remained unaware of the answers given by the patients.
Doctors were asked to document the diagnosis, the application of treatment and improvement or deterioration of the main symptom every time they saw the patient.
At the end of the trial phase, the insurance company supplied information on absence from work.
The following data were recorded:
Pre-treatment questionnaire on
- socio-demographic variables
- complaints (free text)
- number and type of therapeutic approaches used
- current treatment
- current medication
- reasons for seeking alternative therapy
- health-related quality of life (MOS SF-36)
The post-treatment questionnaire contained the following items:
- subjective perception of effectiveness
- satisfaction with treatment
- in case of failure or disruption of therapy: reasons
- side effects
- concurrent illnesses during treatment
- health-related quality of life (MOS SF-36)
Follow-up questionnaires repeated the main questions of the post-treatment questionnaire.
Initially, a practice profile questionnaire documented details about training, experience, and techniques normally applied.
After each session, doctors filled in one sheet concerning
- the type of consultation (first, follow-up, telephone)
- time spent with patient
- up to 3 diagnoses, indicating the main diagnosis
- with each diagnosis: duration and severity of disease, acute or chronic
- prescriptions (allopathic, homoeopathic, other)
- referrals (specialist, clinic, cure, physio- or psychotherapy)
- side effects, aggravation, indication of antidotation (only homoeopathy)
- in the case of acupuncture: acupuncture points and meridians
- in the case of homoeopathy: key symptoms
Health insurance data
The insurance company provided information for each patient on:
- work days lost, reasons (diagnoses) for absenteeism
- time insured over the past 8 years
- days in hospital, reasons (diagnoses) for hospital stay
Main outcome criteria
As target criteria we measured
- health-related quality of life (SF-36)
- doctors' rating of improvement of the main complaint on a 7-point symmetrical improvement-deterioration scale with anchors "-3 (very much worse)" to "+3 (very much improved, healed)"
- work absenteeism.
Procedures, data handling and statistics
At the beginning of the study, doctors were informed by the insurance company and their respective associations. Discussions between the study group, the insurance company and the doctors' associations ensured cooperation. Conferences were held to inform doctors about the procedures and aims of the study and to ensure their compliance.
Data handling followed the requirements for patient anonymity, and patients' data were shown neither to their GPs nor to the insurance company. All data were sent directly to the study centre and were checked for completeness. Follow-up questionnaires were sent out to patients with pre-stamped envelopes by automated routines, and up to two reminders were sent if a questionnaire was not returned.
The data were checked for completeness immediately upon arrival, and reminders or individual letters were sent if any material was missing. The data were entered by hand or using scanning software (Cardiff Teleform), and plausibility checks were run. If data were missing, the following procedures were employed:
Imputation of missing data was only carried out for the SF-36. We followed the published routines [14] of imputing the personal mean if fewer than half of the items were missing. If whole questionnaires were missing, but subsequent questionnaires were available, we imputed the values of the next questionnaire in order to prevent overestimation of change over time. This guaranteed a conservative handling of missing data, but also allowed us to impute at least half of the missing questionnaires, as the missing data pattern was random rather than increasing over time.
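The two imputation rules described above can be illustrated with a minimal Python sketch. It is not the routine used in the study: the function names, the use of `None` to mark missing values, and the application of the "fewer than half missing" rule to one block of items at a time are assumptions made for illustration only.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_personal_mean(items: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill missing items (None) with the respondent's own mean of the
    answered items, but only if fewer than half of the items are missing.
    Otherwise the items are returned unchanged."""
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    # Nothing to do, or too much missing (half or more): leave as is.
    if n_missing == 0 or n_missing * 2 >= len(items):
        return list(items)
    personal_mean = mean(answered)
    return [personal_mean if x is None else x for x in items]


def backfill_missing_questionnaires(scores: Sequence[Optional[float]]) -> List[Optional[float]]:
    """For a patient's time-ordered scores, replace a wholly missing
    questionnaire (None) with the value of the next available one.
    Trailing gaps stay missing because no later questionnaire exists."""
    filled = list(scores)
    next_value = None
    # Walk backwards so each gap can see the next observed value.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_value
        else:
            next_value = filled[i]
    return filled


print(impute_personal_mean([1, None, 3, 5]))
print(backfill_missing_questionnaires([50.0, None, 60.0, None]))
```

Carrying the next value backwards means a missing intermediate questionnaire is assumed to be as good as the later one, so no additional change is credited to the interval after the gap.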
In a separate study, we checked for the effects of different missing data interpolation routines and found no difference between sophisticated regression-analytic methods, interpolation using propensity scores, and no interpolation [16], which is likely to be an indicator that data are missing at random. In a telephone interview study, which is still ongoing, we are checking whether non-responders differ systematically from responders. So far, our experience favours the hypothesis that non-responders do not systematically differ from responders in terms of outcome (data to be published elsewhere).
A comparison of the data given in questionnaires that were returned too late revealed no significant difference in answer patterns. The descriptive analysis presented here was therefore carried out for all available data. Neither work days lost nor the doctor's rating was imputed when missing, as there was no indication of biased results. By nature, work days lost are not subject to overestimation bias, and by cross-checking the billing forms it was ensured that the doctor's rating was given for nearly every patient included in the study.
Data were analysed using Access and SPSS. Due to the descriptive nature of the study, we relied mainly on descriptive statistics. In order to quantify effects, we calculated effect sizes as standardised mean differences according to Cohen [17], with the standard deviation at the pre-treatment time point as standardisation factor.
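As an illustration of this effect size, the following Python sketch divides the mean change by the standard deviation of the pre-treatment scores. The function name, the direction of the difference (later minus pre-treatment) and the use of the sample standard deviation (n - 1 in the denominator) are assumptions; the text does not specify them.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size_pre_sd(pre: Sequence[float], post: Sequence[float]) -> float:
    """Standardised mean difference: (mean(post) - mean(pre)) / SD(pre).

    The standardisation factor is the standard deviation of the
    pre-treatment scores only, not a pooled standard deviation."""
    if len(pre) < 2:
        raise ValueError("need at least two pre-treatment scores for an SD")
    sd_pre = stdev(pre)  # sample SD (n - 1); an assumption of this sketch
    if sd_pre == 0:
        raise ValueError("pre-treatment scores have zero variance")
    return (mean(post) - mean(pre)) / sd_pre


# Hypothetical scores, for illustration only.
pre_scores = [40.0, 50.0, 60.0]
post_scores = [50.0, 60.0, 70.0]
print(effect_size_pre_sd(pre_scores, post_scores))
```

Standardising on the pre-treatment standard deviation keeps the denominator independent of any change in variability caused by treatment.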
Because this evaluation took place in general practice and was subject to some political necessities, the study had to begin before all data collection and monitoring routines were in place. This resulted in somewhat loose monitoring of data in the first six months of the study, with many questionnaires being filled in too late, for example. In all cases of doubt we ran analyses to estimate whether using the data would distort the results. We discarded data suspected of being invalid, always following the conservative rule of trying to avoid overestimation of effects.
In particular, the doctor's rating of success was treated very conservatively and corrected accordingly to avoid overestimation of success. Since this scale referred to the last session (change in the patient compared to the last time seen), all points were added and standardised on the number of sessions, which yields a theoretical range from -3 to +3 with 0 marking the unchanged status. If a doctor scored "3" more than once in a sequence, the following ratings were automatically recoded as unchanged.
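A minimal Python sketch of this correction follows. It rests on one reading of the rule, which is an assumption: within a patient's sequence of session ratings, only the first "+3" is kept and every later "+3" is recoded to 0 (unchanged) before the ratings are summed and divided by the number of sessions. The function name is hypothetical, and other readings of the recoding rule are possible.

```python
from typing import List, Sequence


def summarise_doctor_ratings(ratings: Sequence[int]) -> float:
    """Sum the per-session ratings (-3..+3) and divide by the number of
    sessions, after recoding repeated '+3' ratings as 0 (unchanged).

    Assumption of this sketch: the first '+3' counts, later ones do not."""
    if not ratings:
        raise ValueError("at least one session rating is required")
    corrected: List[int] = []
    seen_max = False
    for r in ratings:
        if not -3 <= r <= 3:
            raise ValueError(f"rating out of range: {r}")
        if r == 3 and seen_max:
            corrected.append(0)  # repeated '+3' treated as unchanged
        else:
            corrected.append(r)
            seen_max = seen_max or r == 3
    return sum(corrected) / len(corrected)


# Hypothetical sequence: improvement, then '+3' scored three times in a row.
print(summarise_doctor_ratings([1, 3, 3, 3]))
```

Under this reading, a patient rated "+3" at every session no longer reaches the theoretical maximum of +3, which reflects the conservative intent described in the text.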