The nine-step strategy for developing closed-ended survey items that was developed for this study is presented in . The steps in this table are presented in the order in which they were executed. In the discussion provided, the procedures that were followed in implementing each step are described in detail. In addition, where it is appropriate, substantive findings are presented to illustrate precisely how the procedures were implemented.
Steps for Developing Closed-Ended Survey Questions
Altogether, 8 focus groups were conducted with a total of 63 older adults. Thirty-one were older African Americans and 32 were elderly White adults. The groups were run in four pairs, each comprising one group consisting solely of older White adults and one made up entirely of elderly Black adults. New subjects were recruited for each round of focus groups. Focus group moderators were matched to the race of the study participants. All subjects were paid $25 for participating in this study (subject remuneration was used in all phases of data collection in this study). David Morgan, an expert on focus groups (Morgan & Krueger, 1998), came to Ann Arbor to train the moderators and to help run the first round of focus groups.
The focus groups were conducted in a local hotel. All focus groups were tape recorded and transcribed. In reviewing focus group transcripts, it is often useful to know the identity of the speaker because this information enables researchers to get a better sense of important issues like how widely opinions are shared in the group. Unfortunately, when listening to tapes of a focus group that consists of six or eight people, it is often difficult to know who is speaking. To attribute responses to specific people, a court stenographer was hired to transcribe all focus group sessions. Because court stenographers are trained to produce virtually flawless transcripts, the risk of encountering transcription errors was reduced significantly. Initially, it may appear that study subjects would be bothered or inhibited by the presence of a stenographer in the focus group sessions. There are two reasons why this was not a problem in the present study. First, field notes taken during the course of the focus groups revealed that study participants paid little attention to the stenographer and rarely even looked at this individual during the sessions. Second, at the end of the focus groups, study participants were asked if they were bothered in any way by the presence of the court recorder. Without exception, they indicated this was not the case, and some even made jokes about it.
Two key concepts guided the flow of focus group discussions: the funnel approach (Morgan, 1988) and the saturation point (Glaser & Strauss, 1967). Because the focus groups were conducted sequentially over time, a funnel approach was used to devise the moderator’s guide. For those unfamiliar with focus groups, the moderator’s guide contains a list of topics or questions that are used to stimulate focus group discussions. The first round of focus groups began with very general questions that were designed to throw as broad a net as possible, thereby ensuring that the views of the research team were not imposed on the group discussions (e.g., “What is the most important part of living a religious life?”). Initially, the members of the research team read the transcripts from the first round of focus groups on their own. Then, the team met as a group to reach a consensus about what had been said. This information was subsequently used to draft a series of more specific questions for the next round of focus groups. So, for example, more targeted questions were asked about church-based social support (“Some people say that the help and guidance they get from people at church is important. What do you think? What are some of the ways people in your church may help each other?”).
Two important points must be made about the funnel approach. First, consistent with the basic tenets of qualitative interviewing (Madill, Jordan, & Shirley, 2000), this strategy helps ensure that the substantive content of the focus group questions, and the way they are phrased, are determined by the subjects. Second, because data from earlier rounds of focus groups were used to devise questions for later rounds of focus groups, the content of the moderator’s guide changed several times during this phase of the study (see May, 1991, for a more detailed discussion of this data collection strategy).
Concerns may arise over the use of the funnel approach because it may seem as though the use of more focused questions would prohibit those who participate in later rounds of focus groups from expressing their own views. This potential problem was addressed in the following manner. Even in later rounds of focus groups, the moderator always began with general questions about religion, and only asked more focused questions after study participants had sufficient time to respond to the more general questions. In this way, the general questions provided a way to continually bring new information on religion to the foreground.
In the process of conducting qualitative research, investigators often reach the point where respondents in later rounds of focus groups begin to discuss the same issues that emerged in earlier rounds of focus groups. This is called the saturation point (Glaser & Strauss, 1967). If the goal of a study is to flesh out the content domain of a construct, gathering redundant information is not useful. Therefore, once the saturation point is reached, the moderator’s guide is changed so that time spent with study subjects can be devoted to uncovering new information.
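One way to make the saturation point concrete is to track, round by round, what share of the coded themes had not appeared in any earlier round. The sketch below is only an illustration under stated assumptions: the function name `new_theme_share` and the theme labels are hypothetical, and the study itself does not report using a numerical rule of this kind.

```python
def new_theme_share(rounds):
    """Return, for each round, the share of its themes not seen in earlier rounds."""
    seen = set()
    shares = []
    for themes in rounds:
        current = set(themes)
        fresh = current - seen  # themes raised for the first time in this round
        shares.append(len(fresh) / len(current) if current else 0.0)
        seen |= current
    return shares


# Hypothetical coded themes for four rounds of focus groups (illustrative only).
rounds = [
    {"prayer", "church support", "meaning"},
    {"prayer", "church support", "negative interaction"},
    {"prayer", "meaning", "negative interaction"},
    {"prayer", "church support"},
]

shares = new_theme_share(rounds)
for number, share in enumerate(shares, start=1):
    print(f"Round {number}: {share:.2f} of themes were new")
```

A share that falls to zero across successive rounds would suggest that the current questions are no longer yielding new material, which is the situation in which the moderator’s guide was revised.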
The focus group data were evaluated using a two-level qualitative data inventory. The first level, which is part of the funnel approach discussed previously, consisted of a more general assessment of the findings. Here, the intent was to sort the data into relatively crude large-level categories that could be used to identify themes to pursue in later rounds of focus groups (Brenner, 1985). Consequently, this first-level assessment of the data was an ongoing process that unfolded over the course of all eight focus groups. The second level of analysis began once the focus groups were complete. Here, members of the research team reviewed the transcripts and independently developed a more detailed coding scheme (Mishler, 1986). Following this, team members met as a group, discussed their coding schemes, and arrived at a consensus on how best to present the data.
The focus groups provided a wealth of information about religion in late life. Two papers were written to explore select areas in detail: One examined negative interaction in the church (Krause, Morgan, Chatters, & Meltzer, 2000a), whereas the other dealt with prayer (Krause, Morgan, Chatters, & Meltzer, 2000b). The findings on prayer will be reviewed briefly to highlight the rich insights that emerged from this phase of the item development strategy.
Most research on prayer focuses on how often people pray and the types of prayers they offer (e.g., prayers of thanksgiving or petitionary prayers requesting specific outcomes; Poloma & Gallup, 1991). However, it became apparent over the course of the focus groups that study participants had a good deal to say about whether prayers are answered, and if they are, how answers are provided. So, for example, some felt that prayers are answered right away, whereas others believed that God answers prayers whenever He is ready. Moreover, some felt they got exactly what they asked for in a prayer. In contrast, others indicated they did not always get what they asked for. However, when these individuals took the time to think about it, they found the answer they got was precisely what they needed most.
This information provides a host of new ways to think about the relationship between prayer and health in late life. For example, older people who expect immediate answers to prayers may become disillusioned, and lose hope, if the anticipated answer is not forthcoming. As research indicates, the loss of hope may be an important risk factor for some health problems (Nunn, 1996). To empirically evaluate this issue, closed-ended questions were crafted in later phases of this study to assess beliefs about the timing of answers to prayers.
A major advantage of using focus groups arises from the fact that it is possible to observe and record how older people talk to each other when they discuss important topics like religion. The words and phrases they use provide excellent raw grist for writing closed-ended question stems. There are, however, several disadvantages in using focus groups. First, respondent burden is heavy because subjects must leave their homes and come to a common location, such as a meeting room in a local hotel. Unfortunately, this makes it difficult for elderly people to attend focus groups if they do not have adequate transportation, or if they are physically challenged. Transportation problems may be dealt with by providing cab rides for those who need them, but little can be done to improve attendance rates among older adults with physical health problems.
The second disadvantage in using focus groups is discussed by Knodel (1995). He argues that the presence of others in the focus group may make it difficult to disclose information of a personal nature. Because some older people may consider their religious beliefs and practices to be a private matter, they may feel uncomfortable expressing their views and feelings in a focus group setting. In view of these limitations, and consistent with the principle of triangulation, the next step in the item development strategy involved conducting a series of one-on-one, in-depth interviews.
The individual in-depth interviews were conducted face-to-face with study participants in their homes. All interviewers were race-matched with respondents. A new set of subjects was recruited for the in-depth interviews from the HCFA list. Once again, the names of potential respondents were selected using simple random sampling procedures. A total of 131 in-depth interviews were completed successfully. Approximately 61% of these individuals were older White adults, and 39% were elderly African American adults. All interviews were tape-recorded. The interviews typically lasted between 60 and 90 min.
The number of in-depth interviews conducted for this study is unusually large (N = 131). In fact, some investigators suggest that as few as eight are sufficient to cover a new domain (McCracken, 1988). However, the members of the research team believed it was important to interview a large number of older people because the content domain of religion is so vast. This decision was ultimately supported by the wealth of data that was obtained during this phase of the study.
Throughout, the funnel approach and the saturation point concepts guided the flow of the in-depth interviews. As a result, the content of the questionnaire changed a number of times as the in-depth interviews were being conducted. It is important to emphasize two points about the way in which the funnel approach was implemented. First, following the procedures used in the focus groups, all in-depth interviews began with general, open-ended questions about religion and concluded with questions about more focused aspects of religion. Second, in-depth interview questions about more specific aspects of religion were written with input from the focus group data. This means that the use of the funnel technique was extended in this study by carrying insights across two different qualitative methodologies.
During the course of developing the in-depth interview questionnaires, a procedure was implemented that departs significantly from the traditional qualitative approach. As noted earlier when discussing the funnel approach, focused probe questions are typically developed based solely on the input provided by study subjects. However, in the process of writing the in-depth interview questionnaire, information from other sources was taken into consideration. Shortly before the in-depth interviews began, the John Templeton Foundation issued a request for proposals on forgiveness. This initiative was developed in response to a burgeoning literature, which suggests that forgiveness may be an important factor in promoting health and well-being (McCullough, Pargament, & Thoresen, 2000). However, the focus group participants in the present study had very little to say about forgiveness. Even so, the research team decided that since the goal of the project was to develop a comprehensive set of religion measures, it would not be advisable to overlook this potentially important construct. This decision raises a broader question about how to obtain complete coverage of a conceptual domain that has been the subject of empirical and theoretical investigation for some time. For more than 100 years, scholars have been studying religion in an effort to distill its essential elements (James, 1902/1997). This vast literature contains many valuable theoretical insights and important empirical findings (Koenig et al., 2001). We felt it did not make sense to completely disregard this work. Instead, a more profitable approach to designing in-depth interviews involves finding a way to exploit existing material without compromising the inherent advantages associated with this important qualitative methodology.
With this objective in mind, a series of open-ended probe questions on forgiveness was placed at the end of the in-depth interviews. This ensured that the questions on forgiveness did not unduly influence or bias responses to the earlier, more traditional, qualitative probe questions because respondents were given ample time to express their own views before they were presented with questions based on external material. Viewed more generally, this strategy provides a unique way to more tightly integrate qualitative methods and quantitative research findings in the interests of developing the best closed-ended survey items.
As the in-depth interviews progressed, it quickly became evident that the questions on forgiveness evoked the most emotionally charged response that was encountered during the entire study. This was especially true of questions about self-forgiveness. Although data are not available to explain why forgiveness did not arise spontaneously in the focus groups, perhaps the highly personal nature of this topic initially inhibited discussion.
Because so many in-depth interviews were conducted (N = 131), the author was the only person to listen to all the tapes. Different members of the research team reviewed segments of the taped interviews dealing with specific topics, such as forgiveness. After these individuals independently coded the themes that emerged from the data, they met with the author to discuss their findings and arrive at a consensus on the best way to code the data.
The in-depth interviews provided rich insights into how older adults practice religion in their daily lives. The quality of these data may be illustrated by briefly reviewing some of the results on forgiveness. More detailed findings on forgiveness appear in a paper that was written with these data (Krause & Ingersoll-Dayton, 2001). As the in-depth interviews progressed, it quickly became evident that respondents held quite different views about forgiving other people. Although some felt it was their duty as Christians to forgive others automatically, and not require transgressors to do anything first, other study participants believed that transgressors must earn their forgiveness. In particular, this latter group of subjects felt that transgressors must take one or more of the following steps to be forgiven: (1) be aware of what they have done; (2) explicitly ask for forgiveness; (3) offer an explanation for the hurtful act; (4) make a resolution not to repeat the offense; (5) follow up on this resolution by changing their behavior; and (6) make amends (i.e., provide restitution). These insights set up a series of intriguing questions about the nature of the relationship between forgiveness and health. First, regardless of how it is attained, is forgiveness related to health? Second, does forgiving others without requiring them to do anything first provide the greatest health benefits, or is requiring transgressors to earn forgiveness more beneficial in this respect? Finally, if forgiveness must be earned, do potential health-related benefits increase as transgressors perform progressively more of the steps outlined previously?
In retrospect, it appears as though more valuable data were obtained from the in-depth interviews than from the focus groups. Although it is difficult to support this impression with hard data, study subjects may have provided better information because they felt more comfortable discussing sensitive religious issues in the privacy of the individual in-depth interview (Knodel, 1995). Also, as noted previously, because the in-depth interviews were conducted in the homes of study participants, this methodology made it possible to reach a wider group of older people, including those who were homebound.
Input from Quantitative Studies
The next step in the item development strategy also departs from traditional qualitative procedures. This phase of the study emerged serendipitously. As the focus group and in-depth interviews were being conducted, an opportunity arose to insert closed-ended quantitative questions on religion in a nationwide survey of people affiliated with the Presbyterian Church (USA). Data from the focus groups and in-depth interviews revealed that many older study participants valued church-based social support highly. Although relatively little space was available in the questionnaire for the Presbyterian study, seven closed-ended items were crafted from the qualitative data to assess three aspects of church-based support: emotional support from rank-and-file church members, emotional support from the pastor, and spiritual support from church members. Spiritual support involves things like sharing religious experiences and helping others live according to their religious beliefs. Data from the nationwide Presbyterian survey provided an excellent opportunity to gain a preliminary sense of the quality of the newly developed items. In particular, a series of analyses were performed ranging from an examination of simple frequency distributions to the estimation of a sophisticated second-order confirmatory factor model. The results of these analyses appear in a paper by Krause, Ellison, Shaw, Marcum, and Boardman (2001).
The Presbyterian survey was useful for several reasons. First, the confirmatory factor model revealed that the newly developed measures had good psychometric properties (i.e., good item and scale reliability).
Second, the Presbyterian study provided an opportunity to conduct a preliminary assessment of the construct validity of the newly devised, church-based support items. Construct validity is assessed by seeing whether a new set of items is related to an established outcome measure in a theoretically meaningful way. The Presbyterian survey also contained a set of widely used indicators on religious coping responses that were developed by Pargament (1997). These items assess specific ways that people use religion when they are confronted by stressful events. Based on the premise that the selection of religious coping responses is socially determined, Krause and colleagues (2001) examined whether the three dimensions of church-based support discussed previously are related to religious coping. Data indicate that people are especially likely to use religious coping responses when they receive spiritual support from church members. Even though emotional support from the pastor was also associated with greater use of religious coping methods, the relationship was not as strong. In contrast, emotional support from church members had no effect on religious coping.
Third, findings from the empirical work with the Presbyterian data also provided valuable information on how to implement the funnel approach in the in-depth interviews. As discussed earlier, the funnel technique involves using data from earlier rounds of in-depth interviews to develop more focused questions for later rounds of in-depth interviews. The decision to devote scarce interview time to probe specific areas in detail is based on the nature of the themes that emerge from the initial in-depth interviews. However, if the goal of a study is to ultimately develop closed-ended quantitative survey measures, then relatively unique criteria must be used for identifying themes that should be probed more deeply. In particular, an investigator must determine whether different themes that emerge from the qualitative data will lead to the development of different quantitative scales, or whether the themes assess content areas that are so closely related that they can be more economically assessed with a single scale. It is sometimes difficult to make this decision based on the qualitative data alone. This was true with respect to the church-based social support measures identified previously. In particular, it was hard to imagine how rank-and-file church members could provide spiritual support to respondents without also giving them emotional support at the same time. However, the empirical analysis of the Presbyterian data suggests that even though the two dimensions of church-based support are correlated, spiritual support is related to religious coping, whereas emotional support from church members is not. This differential impact suggests that the two dimensions of church-based support are conceptually distinct and that it would be useful to continue to invest in-depth interview time in exploring them both. 
Viewed from this perspective, the analysis of the Presbyterian data provides a relatively unique way of showing how both qualitative and quantitative methods can be used simultaneously to develop closed-ended survey items.
Developing Preliminary Quantitative Measures
Once the in-depth interviews were complete, a set of closed-ended survey questions was developed. These questions came from three sources. First, some questions were taken from existing scales (e.g., measures of religious commitment, as well as some religious coping items). Second, measures devised by other investigators were modified on the basis of information gleaned from the focus groups and in-depth interviews (e.g., some indicators of emotional support from church members). Finally, because good measures could rarely be found in the literature, the bulk of the items were developed from scratch.
It is important to provide more detail on how new survey items were written when good indicators could not be found elsewhere. Two issues figured prominently in this process. The first involved creating a list of all facets of religion that emerged from the focus groups and in-depth interviews. The two-level qualitative inventory discussed earlier was invaluable in this respect.
The second key issue involved confronting the problem of depth versus breadth. Data obtained in the focus groups and in-depth interviews were incredibly rich, and the number of questions that could be written from this material seemed almost limitless. However, because the final step in the item development strategy consisted of a nationwide survey lasting 70 minutes, crucial decisions had to be made about how many dimensions of religion to cover, and how many items to devise for each dimension. There does not appear to be much guidance in the literature on how to make these decisions. The following approach was used in this study.
First, the members of the research team estimated the total number of questions that could be administered in a 70-min interview. Second, in estimating the likely number of items, team members had to take into account the fact that questions would also be asked about health, psychological well-being, and a number of factual matters, such as age, education, and marital status. Once the amount of questionnaire space that could be used to measure all areas of religion was established, the number of specific dimensions that could be covered was estimated by turning to the work of Andrews (1984). His sophisticated latent variable modeling analyses reveal that the ideal length of a scale should be between two and four items. The upper limit of four items was used in the present study. Although departures from the target of four indicators ultimately arose, this general guideline was, nevertheless, useful. Finally, once an estimate was derived of the number of dimensions of religion that could be covered, difficult decisions had to be made about what to include. This was clearly the most challenging part of this study phase. Because the ultimate goal of this study was to examine the relationship between religion and health, decisions were based, in part, on whether a particular dimension of religion is likely to affect the health of older people. Dimensions of religion that might be related to health were identified by turning to the literature, as well as insights gleaned from the focus groups and in-depth interviews.
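The budgeting logic described above reduces to simple arithmetic: subtract the items reserved for non-religion content from the total the interview can hold, then divide by the target scale length of four items. The sketch below illustrates that arithmetic only. The function name `dimensions_that_fit` and both item counts are hypothetical assumptions, because the study does not report the actual totals.

```python
def dimensions_that_fit(total_items, non_religion_items, items_per_scale=4):
    """Estimate how many dimensions can be covered at a fixed scale length."""
    religion_items = total_items - non_religion_items
    if religion_items < 0:
        raise ValueError("non-religion items exceed the total item budget")
    # Whole scales only: leftover items cannot form a complete scale.
    return religion_items // items_per_scale


# Hypothetical figures (not from the study): 280 items fit in the interview,
# and 100 of them are reserved for health, well-being, and factual questions.
estimate = dimensions_that_fit(total_items=280, non_religion_items=100)
print(f"Dimensions of religion that fit: {estimate}")
```

Because departures from the four-item target did arise in practice, an estimate of this kind is best read as a planning guideline, not a fixed limit.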
Panel of Experts
Seven scholars with outstanding reputations in the area of religion graciously agreed to review the items that were developed for this study. The following researchers were members of this group: Linda Chatters, Christopher G. Ellison, Ellen Idler, Harold G. Koenig, Jeffrey S. Levin, Kenneth I. Pargament, and Robert Taylor. It should be emphasized at the outset that even though these individuals provided invaluable input, the author is solely responsible for any problems with the closed-ended survey items that were developed in this study. Although expert panels have been used to develop items in other studies, a protocol for fully exploiting the expertise of these individuals is difficult to find in the literature. The following procedures were developed especially for this study.
First, a complete draft of all the religion items was sent to each of the experts. They were instructed to rate each indicator on a scale that ranged from 1 to 5, where 5 meant the question stem and the item response categories were of high quality, and a score of 1 denoted items that were most in need of revision. These data were mailed back to the study field office where frequency distributions of the ratings for each question were tabulated.
Following this, the experts were all flown to Ann Arbor for an intensive 2-day review of the study measures. The ratings provided before the meeting were used to structure the discussions. In particular, the meeting began by focusing on items with the least favorable ratings. The problems with each measure were identified, discussed, and potential solutions were proposed. The 2-day session with the experts was tape-recorded and transcribed.
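The tabulation step can be sketched as follows: build a frequency distribution of the 1-to-5 ratings for each item, then order the items so that those with the least favorable ratings come first on the meeting agenda. This is an illustration under stated assumptions. The item labels and ratings are invented, and ordering by the mean rating is an assumption, since the text says only that frequency distributions were tabulated and that discussion began with the least favorably rated items.

```python
from collections import Counter
from statistics import mean


def tabulate_ratings(ratings):
    """Return (item, frequency distribution, mean rating) rows, lowest mean first."""
    rows = []
    for item, scores in ratings.items():
        counts = Counter(scores)
        # Report every scale point from 1 to 5, including those never chosen.
        distribution = {point: counts.get(point, 0) for point in range(1, 6)}
        rows.append((item, distribution, mean(scores)))
    rows.sort(key=lambda row: row[2])
    return rows


# Hypothetical ratings from seven experts for three draft items.
ratings = {
    "item_a": [5, 4, 5, 4, 5, 5, 4],
    "item_b": [2, 1, 3, 2, 2, 1, 3],
    "item_c": [3, 4, 3, 3, 4, 2, 3],
}

agenda = tabulate_ratings(ratings)
for item, distribution, average in agenda:
    print(item, distribution, round(average, 2))
```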
An example may help illustrate how the expertise of the panel members was put to good use. Recent research suggests that, as people grow older, they turn control over specific domains in their lives to trusted others (Schulz & Heckhausen, 1996). For example, some older people turn over the management of their financial affairs to a grown offspring. During the course of the focus groups and in-depth interviews, the subjects in our study indicated they turn control over to God as well. A series of preliminary items were devised to capture this domain. Initially, these items focused on turning things over completely to God, but the group of experts recommended that a second approach should also be considered. In particular, they argued that people may only partially rely on God while continuing to exercise control themselves. Items were, therefore, developed to capture this form of collaborative control with God. Following this two-part strategy makes it possible to see if complete control by God, or collaborative control with God, is most likely to enhance the health and well-being of older people.
In addition to reviewing question stems and response formats, the expertise of the group was also used to help determine the order of the closed-ended questions in the interview schedule. Toward the end of the 2-day meeting, the experts were each given a stack of index cards. The name of one dimension of religion was written on each card. The experts were asked to sort the cards according to where the questions should appear in the final closed-ended questionnaire. A consensus was determined by tabulating the rankings provided by the group.
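One plausible way to tabulate the card sorts is to average the position each expert assigned to each dimension and order the dimensions by that average. The sketch below assumes this mean-rank rule. The text does not specify how the rankings were combined, and the function name `consensus_order`, the dimension labels, and the three orderings are hypothetical.

```python
from statistics import mean


def consensus_order(rankings):
    """Order dimensions by mean position across experts (ties broken by name)."""
    positions = {}
    for ordering in rankings:
        for position, name in enumerate(ordering, start=1):
            positions.setdefault(name, []).append(position)
    return sorted(positions, key=lambda name: (mean(positions[name]), name))


# Hypothetical card sorts from three experts; first card = earliest in the questionnaire.
rankings = [
    ["prayer", "church support", "forgiveness"],
    ["church support", "prayer", "forgiveness"],
    ["prayer", "forgiveness", "church support"],
]

order = consensus_order(rankings)
print(order)
```

Other rules, such as ordering by median position, would be equally defensible. The mean is used here only because it is simple to tabulate.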
Based on the input from the group of experts, a number of changes were made to the newly devised religion questions. The tapes and transcripts of the meeting with the experts proved to be quite valuable in this respect. In the process, problems and revisions suggested by these experts were checked against the focus group and in-depth interview data.
Once a good set of preliminary closed-ended questions was in place, a series of cognitive interviews were conducted. Cognitive interviews involve presenting study subjects with a closed-ended question followed by a series of open-ended probes. The purpose of the open-ended probe questions is to see if study subjects understand the closed-ended question in the intended manner, to see if respondents can provide a better way of phrasing question stems, and to see if subjects feel comfortable with the closed-ended response options.
Eighty-five cognitive interviews were conducted with a new sample of subjects. Once again, these study participants were selected from the HCFA list, using simple random sampling procedures. Up to this point, older adults were excluded from the study if they indicated that religion was not at all important to them. However, this exclusion criterion was relaxed at this point, and all elderly people were included in the remaining steps of this study regardless of how important religion may be to them. Forty-five of the participants in the cognitive interviews were older White adults, and 40 were elderly African American adults. All cognitive interviews were conducted face-to-face in the homes of the study participants.
The closed-ended religion items were broken down into four blocks. Cognitive interviews were conducted using approximately 20 subjects for each of the four blocks of questions. It was not necessary to probe some closed-ended questions because a good deal is already known about them (e.g., “At the present time, what is your religious preference?”). The cognitive interviews lasted between 60 and 90 min. All cognitive interviews were tape-recorded.
Initially, members of the research team were concerned that study participants would quickly tire when they found that each closed-ended question would be followed by a series of open-ended probes. This did not prove to be the case. Instead, the older people in this study worked hard to help the research team improve the closed-ended questions. This high level of motivation was due in part to the way the cognitive interviews were introduced. Subjects were told that a great deal of time had been spent speaking with older people about religion in an unstructured manner. They were also told that a series of questions had been written about religion based on these conversations, but to know if these items are any good, their critical input was essential. Subjects were asked to work hard to help the research team uncover problems and find new ways to improve the questions. This approach represents a modified version of Cannell’s strategy for increasing respondent motivation and commitment (Cannell, Miller, & Oksenberg, 1981).
Two broad approaches may be followed in developing probe questions for cognitive interviews. The first is to ask very general follow-up questions, as in the think-aloud strategy (see Foddy, 1998, for a recent discussion of this technique). An example will help clarify the nature of these general probe questions. Using religion to derive a sense of meaning in life emerged as an important theme in this study. Consequently, a series of closed-ended questions were devised to assess this domain. One item from this battery asked subjects if they felt that, “God has a specific plan for my life.” This indicator was followed with a broad probe: “Can you tell me a little more about what you were thinking about when you answered this question?”
These general probe questions did not appear to be especially useful. Instead, a second approach involving more focused probe questions was more helpful. This strategy can be illustrated by turning to forgiveness. Some subjects in the qualitative studies indicated that when they were hurt by someone, they were usually able to “forgive and forget.” A closed-ended question was written to assess this topic. After administering the closed-ended question, subjects were asked the following focused probe questions: “What does the phrase ‘forgive and forget’ mean to you? If you were to ask a friend about this, how would you do it—what words would you use to see if they can forgive and forget?” The observation that focused probe questions tend to produce better feedback than general probe questions is consistent with recent research on cognitive interviewing by Foddy (1998).
An example may help clarify how the focused cognitive interview probes were used to revise the religion measures. In his outstanding work on the measurement of religious coping, Pargament (1997) devised an item that asks respondents how often they look to God for strength and guidance in a crisis. The respondents in our focus groups and in-depth interviews indicated they turned to God for this purpose, but the members of the research team were concerned that the item devised by Pargament (1997) was double-barreled because “strength” and “guidance” may not mean the same thing to study participants. To evaluate this possibility, participants in the cognitive interviews were presented with the original item developed by Pargament (1997). Focused probe questions were subsequently asked to see if they felt that turning to God for strength was the same as turning to God for guidance. The vast majority of study subjects indicated they were different and recommended that separate questions be asked about these issues. Based on their input, two separate items were developed: “I look to God for strength in a crisis,” and “I look to God for guidance when difficult times arise.”
The cognitive interviews were also used to evaluate the closed-ended responses for the religion items. For example, subjects were asked whether they felt it is easier to answer a question using a standard five-point Likert scale, or whether selecting a response from a Cantril ladder (scored 1–10) made more sense to them. An effort was also made to learn more about how older people calibrate closed-ended responses. For example, subjects were asked whether they used a particular religious coping response a great deal, some, only a little, or not at all. Cognitive interview probe questions were subsequently asked to determine how much difference (if any) there was between a response of “some” and a response of “only a little.”
The newly developed religion items were revised again after members of the research team listened to the tapes of the cognitive interviews. These revisions were made by consulting the data gathered from the focus groups and in-depth interviews, and by turning to the feedback provided by respondents during the cognitive interviews.
The closed-ended religion items were administered to a new sample of 98 older subjects. Half the respondents were older White adults (n = 49), and half were elderly African Americans (n = 49). These subjects were, once again, selected from the HCFA list. All pilot test interviews were conducted face-to-face in the homes of the study participants. Only closed-ended quantitative questions were administered in this phase of the study. As a result, this step in the item development strategy more closely approximates a traditional pilot test.
The goals of the pilot study were to check the length of the survey, to examine frequency distributions to make sure the indicators had sufficient variance, and to perform exploratory factor analyses to examine the structure and psychometric properties of the newly developed scales. This information was supplemented with feedback from the interviewers about any problems they encountered with the questionnaire. Minor revisions were made to some of the questions at this point. In addition, the exploratory factor analyses helped identify a few items that could be eliminated from some scales, thereby holding down the overall length of the interview schedule.
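The frequency-distribution check mentioned above can be sketched in a few lines of Python. This is an illustration only, not the procedure the research team followed: the function name `flag_low_variance`, the 90% cutoff, and the response data are all hypothetical and were introduced here for the example.

```python
from collections import Counter


def flag_low_variance(responses, max_share=0.90):
    """Return (distribution, flagged) for one closed-ended item.

    responses: list of response codes (e.g., 1-4) from pilot respondents.
    max_share: hypothetical cutoff; the item is flagged when a single
               response category holds more than this share of answers.
    """
    total = len(responses)
    if total == 0:
        raise ValueError("no responses supplied")
    counts = Counter(responses)
    # Proportion of respondents choosing each response category.
    distribution = {code: n / total for code, n in sorted(counts.items())}
    flagged = max(distribution.values()) > max_share
    return distribution, flagged


# Made-up pilot data for two hypothetical items (98 respondents each).
item_a = [1] * 20 + [2] * 30 + [3] * 28 + [4] * 20
item_b = [4] * 95 + [3] * 3

for name, data in (("item_a", item_a), ("item_b", item_b)):
    dist, flagged = flag_low_variance(data)
    print(name, dist, "flagged" if flagged else "ok")
```

An item such as the hypothetical `item_b`, where nearly every respondent picks the same category, carries little information and would be a candidate for revision or removal.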
After this phase of the item development strategy was complete, a total of 175 closed-ended questions assessing 14 major dimensions of religion were in hand. Copies of the religion items are available on request from the author.
The next step in the item development strategy involved administering the closed-ended questionnaire to a nationwide sample of 1,500 older people. Data were collected by Harris Interactive (formerly Louis Harris and Associates). Interviewing was completed in August 2001. The sample consisted of 750 elderly White subjects and 750 older African Americans. The sampling frame for the study was again provided by HCFA. Greater detail on the sampling procedures is available from the author.
Before the nationwide survey went into the field, the questionnaire was reviewed closely by staff members at Harris Interactive who have considerable experience in writing closed-ended survey questions. A few very minor revisions were made to the religion items. Then the entire interview schedule was evaluated in two brief pilot tests, consisting of 15 subjects each.
The final step in the item development strategy is currently in progress. A series of substantive papers are being written to explore the relationships between select dimensions of religion, health, and well-being (copies of these papers are available from the author). Detailed psychometric testing is being performed on the religion measures that are used in each paper. Four issues involving these tests are discussed briefly below.
First, all multiple item scales are examined with both exploratory and confirmatory factor analyses. The goal is to see whether items that were designed to capture a specific dimension of religion cluster together, and whether the indicators measure each factor well (i.e., whether the factor loadings are equal to or greater than .400).
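The loading criterion described above lends itself to a simple screening step. The following sketch assumes that standardized loadings have already been estimated elsewhere. The function name `weak_items`, the item labels, the loading values, and the use of absolute values are all assumptions made for illustration.

```python
def weak_items(loadings, cutoff=0.400):
    """Return the names of items whose absolute loading falls below cutoff.

    loadings: dict mapping item name -> standardized factor loading.
    cutoff:   minimum acceptable loading (.400, as stated in the text).
    """
    return [item for item, value in loadings.items() if abs(value) < cutoff]


# Hypothetical standardized loadings for a made-up three-item scale.
example_loadings = {"item_1": 0.82, "item_2": 0.67, "item_3": 0.31}
print(weak_items(example_loadings))  # prints ['item_3']
```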
Second, the findings from the confirmatory factor analyses are being used to compute internal consistency reliability estimates for each scale. More specifically, based on a formula provided by Rock, Werts, Linn, and Jöreskog (1977), the factor loadings and measurement error terms associated with each item are used to derive reliability estimates. So far, the reliability estimates for the newly devised scales are generally in excess of .800.
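The formula itself is not reproduced in the text. As a rough sketch, the code below uses a widely used composite reliability expression for a single-factor scale with the factor variance fixed at 1: the squared sum of the loadings divided by that quantity plus the sum of the error variances. Whether this matches the exact formula applied in the study is an assumption, and the function name and input values are hypothetical.

```python
def composite_reliability(loadings, error_variances):
    """Composite reliability for one scale from CFA estimates.

    Assumes a single factor with variance fixed at 1 and uncorrelated
    measurement errors:
        (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    """
    if len(loadings) != len(error_variances):
        raise ValueError("need one error variance per loading")
    true_score_variance = sum(loadings) ** 2
    return true_score_variance / (true_score_variance + sum(error_variances))


# Hypothetical standardized estimates for a made-up three-item scale.
# With standardized items, each error variance equals 1 - loading^2.
loadings = [0.80, 0.75, 0.70]
errors = [1 - value ** 2 for value in loadings]

print(round(composite_reliability(loadings, errors), 3))
```

Unlike coefficient alpha, this expression does not require the items to have equal loadings, which is one reason it is often paired with confirmatory factor analysis.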
Third, in the process of performing the confirmatory factor analyses, tests are being performed for measurement invariance (Bollen, 1989). As noted earlier, older White subjects make up half the sample obtained in the nationwide survey, whereas the other half consists of older Black subjects. Consequently, it is important to see whether the factor loadings and measurement error terms are equivalent in both racial groups. If these parameter estimates are the same, it is reasonable to conclude that the survey items mean the same thing to older White respondents and older Black respondents. One goal in conducting the focus groups, in-depth interviews, and cognitive interviews for this study was to develop survey items that mean the same thing to older White and older Black people. However, the assessment of measurement equivalence in this context is largely based on subjective evaluations of feedback provided by the focus group, in-depth interview, and cognitive interview participants. The empirical tests of measurement equivalence that are now being performed with the nationwide data are a nice complement to these qualitative strategies, and provide yet another way of more tightly merging qualitative and quantitative methods in studies designed to develop closed-ended survey measures.
Finally, in addition to assessing issues in measurement invariance and scale reliability, it is also important to evaluate the validity of the newly devised measures. Unfortunately, it is difficult to estimate some types of validity (e.g., predictive validity) with data that have been gathered at one point in time only (see Carmines & Zeller, 1979, for an insightful discussion of scale validity). However, it is possible to evaluate construct validity with the data on hand. As noted earlier, construct validity is evaluated by embedding new measures in substantive conceptual models to see whether the new indicators are related to select outcomes in theoretically meaningful ways. Although the establishment of construct validity is an ongoing task that requires replication across a series of studies, the results that have emerged from the analyses that have been done so far are very encouraging. For example, as one might anticipate, older people who receive spiritual support from their fellow parishioners indicate they have a closer relationship with God, and people who feel closer to God report they are more hopeful about the future (further detail on this is available from the author).