The Delphi method
The Delphi method facilitates communication among a panel of experts so that the group as a whole can deal effectively with a complex problem [26]. It improves the generation of critical ideas through the structured collection and processing of collective input from a panel of geographically dispersed experts [27]. The methodology originated in the early 1950s, when an Air Force-sponsored RAND project titled "Project Delphi" sought to reach consensus, through a series of questionnaires and controlled feedback, among military experts on possible U.S. industrial targets for attacks from Russia [26]. The Delphi methodology has applications in many fields, including healthcare, education and sociology.
The advantages of the method are numerous and include:
• The ability to conduct a study in geographically dispersed locations without physically bringing the respondents together;
• Time and cost-effectiveness;
• Discussion of broad and complex problems;
• The ability of a group of experts with no prior history of communication with one another to discuss a problem effectively as a group;
• Sufficient time for participants to synthesize their ideas;
• The ability of participants to respond at their convenience;
• A record of the group activity that can be further reviewed;
• Anonymity of participants, which gives them the opportunity to express opinions and positions freely;
• Proven effectiveness in a variety of fields, problems, and situations [28].
Researchers use the Delphi method to translate scientific knowledge and professional experience into informed judgment and to support effective decision-making [22]. For subject matters in which the best available information is the judgment of knowledgeable individuals, the Delphi method has demonstrated decision-making advantages over traditional conferences, group discussions, brainstorming, and other interactive group activities. The focus in a Delphi study is on the stability of the group opinion rather than on individuals' opinions; measuring the group result is therefore considered superior to measuring individual rankings [27].
Healthcare Delphi survey
A Delphi survey with 23 experts from 18 US states was conducted to create a patient safety tool to guide patient safety improvement in US healthcare organizations. The Malcolm Baldrige National Quality Award (MBNQA) framework was used as a general matrix for the tool and was extended to the field of patient safety. The Delphi study was reviewed and approved by the Institutional Review Board – Human Subjects in Research at Texas A&M University (protocol # 2003–0071).
The MBNQA examiners are trained to have in-depth knowledge and extensive experience relevant to the seven Baldrige categories in at least one, and preferably more than one industry or service sector. Consequently, it was important that the Delphi panel members had expertise in the application of the Baldrige process, as well as in patient safety systems.
Study sample size selection
Given that the intent of the Delphi survey was to examine the patient safety systems in the context of a nationally accepted management framework (the Malcolm Baldrige National Quality Award Criteria for Performance Excellence in Healthcare), all study experts were selected using stringent criteria, including knowledge of and/or training in the Malcolm Baldrige Criteria for Performance Excellence in Healthcare, and knowledge and experience in patient safety. The number of experts with such qualifications was fairly limited (n ~ 100) and the sample of Delphi panel participants was small (n = 23).
The sample size for the study was based initially on an empirically selected small sample size (n = 15) and the response rate expected to be necessary to achieve this sample size. It was critical to consider what response rate was usually obtained in surveys in the particular study area (healthcare quality and patient safety). A survey on the quality of healthcare and the problem of medical errors, administered to a large random sample of Colorado physicians, national physicians and Colorado households, revealed response rates of 66% for the Colorado physician sample, 36% for the national physician sample, and 82% for the Colorado household sample [23]. The psychometric validation process for the Safety Attitude Questionnaire, conducted in 160 healthcare sites in the U.S., England and New Zealand, obtained a response rate of 67% [24]. Sumsion (1998), as discussed by Hasson, Keeney and McKenna (2000), argued that in order to maintain the rigor of the Delphi technique, a response rate of 70% must be maintained [22]. Based on the healthcare study response rates found in the literature, it was concluded that a response rate of 70% could be expected for this study. Thus, to obtain at least 15 respondents, the study should begin with 22–23 Delphi panellists (15/0.70 ≈ 21.4); a sample of 15 to 23 respondents was considered small. Responses were obtained from all 23 experts who had made a commitment to serve on the Delphi panel.
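The panel-size arithmetic above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `initial_panel_size` is hypothetical, and the only inputs are the target of 15 respondents and the expected 70% response rate stated in the text.

```python
import math

def initial_panel_size(target_respondents: int, expected_response_rate: float) -> int:
    """Smallest number of invitees expected to yield the target number of respondents.

    Hypothetical helper for illustration; not taken from the study.
    """
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    # Round first to guard against floating-point noise (e.g. 7 / 0.7).
    return math.ceil(round(target_respondents / expected_response_rate, 9))

# Values from the text: at least 15 respondents, 70% expected response rate.
panel = initial_panel_size(15, 0.70)
print(panel)  # 22
```

With these inputs the result is 22, the lower end of the 22–23 panellists reported above.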
Selection of Delphi experts
Delphi participants are not selected randomly; rather, they are purposefully selected to apply their knowledge and experience to a certain issue, based on criteria developed from the nature of the problem under investigation. The following criteria were used to qualify experts in healthcare quality improvement and patient safety for inclusion in the original Delphi panel:
(a) Judges, senior examiners or examiners for the Malcolm Baldrige National Quality Award in healthcare;
(b) Senior administrators in healthcare institutions that have won or have applied for the Malcolm Baldrige National Quality Award in healthcare;
(c) Senior administrators in healthcare institutions that have won a state quality award within the last five calendar years;
(d) Leaders in state or national organizations or programs that emphasize continuous quality improvement and/or patient safety;
(e) Experts possessing more than one of the aforementioned criteria.
Based on these criteria, only about 100 healthcare experts nationwide qualified for participation in the Delphi panel. Barriers to identification and inclusion of experts were the confidentiality of MBNQA applicant names and the scarcity of healthcare quality award winners at a state level. Approximately one quarter of the qualified experts were recruited for participation in the panel.
Since the names of healthcare institutions that have applied for the Malcolm Baldrige Award are kept confidential, obtaining information about an institution's application status depends on individual contacts and on the institution's willingness to share such information. The reviewers for the category of healthcare available through the Malcolm Baldrige list of examiners were reached by phone and asked if they would consider sharing information on the applicant status of their institutions. The examiners were also asked whether their organizations had won state quality awards within the last five years, and whether they were senior administrators in their respective institutions. If the examiners and senior healthcare administrators qualified as experts in healthcare quality improvement and patient safety according to the study criteria described above, they were invited to participate in the study. In general, the study participants were recruited via telephone and/or letter contact and were selected from (1) the list of Malcolm Baldrige examiners, (2) senior administrators from healthcare institutions that had won national and/or state quality awards, and (3) referrals from (1) and (2). The recruitment of participants was discontinued after 23 qualified individuals confirmed their willingness to serve on the Delphi panel.
Importance rating scale
The Delphi panel utilized a four-choice Likert scale for assessing the importance of suggested critical processes for patient safety systems in healthcare institutions. The scale was modelled on the original importance scale developed by Turoff [26]. The participants in the panel were asked to indicate the importance of the Delphi survey items from 1 to 4, where "4" represented processes very important to patient safety systems in healthcare institutions, and "1" represented unimportant (irrelevant) processes. All survey items that were identified by the expert group as "very important" or "important" for patient safety in the third study round, when the experts reached consensus, were included in the final patient safety tool. The Delphi survey concluded in three rounds with the creation of a process-centred tool for evaluating patient safety performance and guiding strategic improvement at the institutional level, extending the MBNQA criteria to the area of patient safety [29].
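One way such an inclusion rule could be expressed is sketched below. The text does not state how the group rating of an item was derived from the individual ratings, so the use of the median, the threshold of 3, the function name `retained_items`, and the example ratings are all assumptions made only for illustration.

```python
from statistics import median

# Assumption: on the 1-4 scale, 3 corresponds to "important" and 4 to
# "very important". Only the endpoints (1 and 4) are defined in the text.
INCLUSION_THRESHOLD = 3

def retained_items(ratings_by_item: dict[str, list[int]]) -> list[str]:
    """Return items whose median panel rating meets the threshold.

    The median is an assumed aggregation rule, not the study's stated method.
    """
    return [
        item
        for item, ratings in ratings_by_item.items()
        if median(ratings) >= INCLUSION_THRESHOLD
    ]

# Made-up ratings for two hypothetical survey items.
example = {
    "item_A": [4, 4, 3, 4, 3],
    "item_B": [2, 1, 2, 3, 2],
}
print(retained_items(example))  # ['item_A']
```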
Bootstrap study design
After the Delphi panel created the patient safety tool, a concern about possible group bias due to the small number of experts remained, because it has been argued that increased group size is beneficial in Delphi surveys [27]. To study possible differences in response characteristics, to explore the possibility of group bias in the study group of experts and, therefore, to assess the possible error in the creation of the patient safety tool, we generated two large samples of expert ratings using a computer program (SPSS 12.5). Since the variation in expert opinions was greatest in the first study round, encompassing the whole spectrum of possible ratings from 1 to 4, the results from the first survey round were used as the basis for computer generation of the expanded samples. The expert responses to the survey items were randomly selected with replacement by the computer program from the first-round raw data for the actual survey experts. This resampling technique is called the bootstrap.
The bootstrap method was developed by Efron in 1979 and has found wide use in the field of applied statistics [30]. Bootstrap is a Monte Carlo-type data augmentation method that resamples with replacement from observed data. While Monte Carlo techniques usually generate fictitious data, the bootstrap resamples with replacement from the original observed values and generates multiple bootstrap samples as a proxy for independent real samples. Each bootstrap sample is a random sample of the same size as the original sample, taken with replacement from the observed values. The original sample is treated as the "virtual population", and resamples are drawn from it multiple times. The procedure can be repeated as many times as desired. Resampling has been reported to be valid for any kind of data, including random and non-random data [31]. During the last three decades, bootstrap resampling has been used widely in applied statistics [32].
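The resampling step described above can be sketched in a few lines. This is an illustration only and not the SPSS procedure used in the study: the function name `bootstrap_samples` is hypothetical and the 23 ratings are made up, since the study's raw data are not given here.

```python
import random
from statistics import mean

def bootstrap_samples(observed, n_resamples, seed=None):
    """Draw n_resamples samples, each the size of `observed`, with replacement."""
    rng = random.Random(seed)
    n = len(observed)
    # random.Random.choices samples with replacement.
    return [rng.choices(observed, k=n) for _ in range(n_resamples)]

# Made-up first-round ratings (scale 1-4) from 23 hypothetical experts
# for a single survey item.
observed = [4, 4, 3, 4, 2, 3, 4, 4, 1, 3, 4, 3,
            4, 2, 4, 3, 4, 4, 3, 4, 2, 4, 3]

samples = bootstrap_samples(observed, n_resamples=1000, seed=0)
boot_means = [mean(s) for s in samples]  # one statistic per bootstrap sample
print(len(samples), len(samples[0]))
```

Each resample has the same size as the original sample (23) and contains only values that were actually observed, which is what distinguishes the bootstrap from Monte Carlo simulation of fictitious data.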
Advantages and limitations of the bootstrap technique
Resampling (bootstrapping) from a random sample of an unknown population is considered to model the distribution of that population; the vaguer the knowledge about the population distribution, the more valuable the bootstrapping technique proves to be [33]. Since classical statistical techniques are primarily designed for parametric statistics with normal distributions, the bootstrap technique has an advantage for distributions with no convenient statistical formulae, overcoming the limitations of the classical approaches in working with small sample sizes and non-normal distributions [34]. Efron and Tibshirani proposed that the technique reduces the assumptions required to validate an analysis and eliminates the theoretical calculations required to assess accuracy; its major application is in the determination of confidence intervals, for which 1,000 or more iterations are necessary [30]. The simplicity of the method allows its application in a wide variety of studies, and it is considered superior to standard statistical tests of significance because it reduces the threat of multiple comparisons bias and provides information on the distribution of scores (rather than relying on parametric distributions); the technique does not depend on a specific nominal size such as 5% and is therefore considered more accurate [34]. The bootstrap technique may have limited accuracy in very small samples (n < 20), in extremely skewed distributions, and if extreme outliers are present [34].
In this study, statistics for each bootstrap resample were saved in memory and later used for estimation of sampling variance, confidence intervals and assessment of bias for the raw data [30]. The characteristics of the generated samples, when analyzed collectively, are used to provide a more representative expression of the underlying population, in this study the population of patient safety experts knowledgeable about the Malcolm Baldrige framework. The hypothesis was that strict expert inclusion criteria, based on training in, and knowledge and understanding of, the MBNQA framework in the original sample of 23 experts, would provide stability of responses even if the number of responses was increased by computer-generated bootstrap samples.
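How saved bootstrap statistics can yield a standard error, a confidence interval and a bias estimate is sketched below. This is a generic illustration, not the study's SPSS computation: the function name `bootstrap_summary`, the choice of the mean as the statistic, the simple percentile interval, and the made-up ratings are all assumptions.

```python
import random
from statistics import mean, stdev

def bootstrap_summary(observed, n_resamples=1000, alpha=0.05, seed=None):
    """Summarize the bootstrap distribution of the sample mean."""
    rng = random.Random(seed)
    n = len(observed)
    # One statistic (here the mean) per resample drawn with replacement.
    stats = sorted(mean(rng.choices(observed, k=n)) for _ in range(n_resamples))
    original = mean(observed)
    # Simple percentile interval: cut alpha/2 from each tail.
    lo_idx = max(0, round((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return {
        "original": original,
        "std_error": stdev(stats),         # spread of the bootstrap statistics
        "ci": (stats[lo_idx], stats[hi_idx]),
        "bias": mean(stats) - original,    # bootstrap estimate of bias
    }

# Made-up ratings (scale 1-4) from 23 hypothetical experts for one item.
observed = [4, 4, 3, 4, 2, 3, 4, 4, 1, 3, 4, 3,
            4, 2, 4, 3, 4, 4, 3, 4, 2, 4, 3]
summary = bootstrap_summary(observed, n_resamples=1000, seed=0)
print(summary)
```

The default of 1,000 resamples follows the iteration count mentioned in the text for confidence-interval estimation.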
The bootstrap samples were generated using SPSS 12.5 software. The characteristics of the SPSS 12.5 model were as follows:
MODEL PROGRAM b0 = 2
COMPUTE PRED_ = b0
/BOOTSTRAP 1000 
/CRITERIA STEPLIMIT 2 ISTEP 1E+20.
More specifically, the regression routine and the subroutine of nonlinear regression were employed. Once these routines were selected, the options feature of nonlinear regression was invoked. The bootstrap option was selected and the "paste" option was used to indicate the number of bootstrap samples to be derived.