In this section, we first present models of participants' understanding of the scenario. Next, we present common information-seeking patterns and relate them to participants' initial hypotheses. We conclude by characterizing the impact of user competencies and resource features during different search stages.
Participants' Demographics and Characteristics
Half of the participants had no formal education beyond high school, while the other half had at least a college education. Table 3 summarizes participants' characteristics by education level, based on an introductory questionnaire. The table suggests that although all participants had significant computer and Internet experience, college-educated participants were more likely to have used MedlinePlus® prior to the study. Females were more likely than males to report frequently using the Internet to obtain health information. In the high-school education subgroup, three of the four participants reporting frequent (“often”) Health Internet use were females. In the college-or-above education subgroup, all four participants reporting frequent Health Internet use were females.
Table 3 Demographics and Characteristics
Participants' Models of the Scenario: Background Knowledge and Hypotheses
This section corresponds to the content of the first component of our theoretical framework, background knowledge, theories and hypotheses. On the surface level, participants' understanding of the scenario could be labeled as incorrect or imprecise. None of the participants were able to identify stable angina (or angina) as the condition described in the scenario; none made a reference to coronary artery disease (CAD) as resulting in the symptoms. Their hypotheses about the situation differed in breadth and certainty. Seven participants proposed the Specific Hypothesis that the symptoms described in the scenario most likely signified a heart attack or a potential heart attack. While the “potential heart attack” hypothesis was not technically incorrect (stable angina increases the likelihood of a heart attack in the future), it is imprecise and not optimal in terms of prospective information-seeking trajectories. Eight believed that the character in the scenario suffered from some “heart problem,” while one suggested it was “old age” (Area Hypothesis). Four listed a number of cardiac and non-cardiac (e.g., stroke) illnesses that could explain the symptoms in the scenario (Assorted Hypothesis).
On a deeper, structural level, participants' understanding of the scenario differed from the reference model in three critical aspects: key concepts, symptoms' grouping and symptoms' characteristics. First, key concepts in the participants' models were different from those in the reference model. Two of the three disease key concepts of the reference model, coronary artery disease (CAD) and angina, were not mentioned by the participants. With respect to the third key reference model concept, atherosclerosis, a distinction needs to be made between participants' use of the term and their reference to the relevant concept. While the lexical form of the term was mentioned by only 3 of 20 participants, 18 made some reference to the blockage of blood vessels.
For all but one participant, cardiac concepts were prominent in their explanation of the disease. Eighteen participants mentioned heart attack either as their primary hypothesis or as a candidate hypothesis (classification of participants' hypotheses is presented in the following section). Blockage of blood vessels was seen as a main mechanism for causing cardiac problems; other mechanisms mentioned involved a tear in the heart muscle, irregular heartbeat and “electrical problem with the heart.” Eight participants suggested that the symptoms could be related to non-cardiac problems, such as stroke, arthritis, asthma and diabetes. These suggestions were not always supported by a physiological explanation (e.g., “This could be diabetes, because this disease can do weird things”).
The second distinction, symptoms' grouping, involves the tendency to connect the symptoms to either single or multiple conditions. The reference model presents all symptoms in the scenario as potential indicators of one condition. In the participants' models, nausea and dizziness were sometimes seen as unrelated to a cardiac problem, and indicative of a co-occurring non-cardiac condition (perhaps less worthy of immediate concern). The third distinction between the reference and participants' models involves the importance ascribed to certain symptoms' characteristics. None of the participants noted the significance of the short duration of pain, its relation to exertion and its response to rest.
The accompanying figure presents a model of Participant 17, a woman with a high-school education, which encompasses the three characteristics of lay models described in this section. The concepts mentioned in her explanation did not include those prominent in the reference model (key concepts). She viewed all of the symptoms described in the scenario as potentially indicative of either stroke or heart attack (symptoms' grouping). In her model, both conditions were related to blockage of blood vessels in the brain (with heart attack being the body's response to the resulting stress). She also felt that shortness of breath could be indicative of asthma (symptoms' grouping). Characteristics of the chest pain were not discussed (symptoms' characteristics).
Semantic analysis of domain knowledge, participant 17.
The language used by participants in their discussions also suggests a potential problem related to the lack of medical vocabulary knowledge. While many participants seemed aware that the scenario did not conform to classic symptoms of a heart attack, and used descriptors such as “potential,” they did not seem to possess the vocabulary that would allow them to “legitimize” the phenomenon by labeling it, and later searching for it.
Information-Seeking Processes: Goal Setting, Search Execution and Information Evaluation
This section corresponds to steps 2–4 of our theoretical framework, analyzing the flow of the information-seeking process from goal setting through search action steps to information evaluation. The theoretical framework for this study suggests that information seekers' search goals are influenced by their prior knowledge and hypotheses. The analysis of the knowledge models in the previous section suggests that, for many participants, initial search moves will not directly involve (stable) angina and coronary artery disease (CAD), but will often lie within the domain of cardiac problems. Participants are also likely to look for information supporting their initial beliefs. Differences in the understanding of the relevant concepts and the relations among them may negatively impact information evaluation and navigation choices. However, the strength of the connection between knowledge/understanding and action is likely to vary, and strategies are likely to emerge that are not predicted by research on theory-evidence coordination. As the focus of the study is on characterizing information-seeking trajectories, we chose to partition the data on the basis of the initial search goals and moves rather than hypotheses, and to relate patterns of knowledge use to these trajectories. In all cases, the initial action was consistent with the explicitly expressed goal. We thus categorized the participants into three clusters: Verification-First, Problem Area Search-First, and Bottom-Up. Subsequent switches to another strategy did not affect cluster assignment. Table 4 summarizes key statistics for each cluster. The figures that follow represent prototypical information-seeking sequences in each cluster, described in depth in the following three subsections.
Table 4 Cluster Data Summary
Information-seeking sequences of verification-first participants with specific hypotheses.*
Information-seeking sequences of problem-area narrowing-first participants with area hypotheses.*
Information-seeking sequences of bottom-up-first participants with area hypotheses. *
Verification-First Cluster
Eight of the participants (P1, 4, 6, 9, 11, 15, 17 and 20; 40%) started by attempting to verify a specific illness. For five of the eight, the highest completed level of education was high school. Compared to the other clusters, participants in this cluster were more likely to start out with a specific hypothesis, namely a heart attack. All other hypotheses in this cluster also referenced heart attack. All participants navigated to a high-quality heart-attack site soon after starting their search. None used any strategies other than verification (e.g., bottom-up or narrowing). Seven participants arrived at the incorrect conclusion that the situation described in the scenario involved heart attack. They did so by focusing on the similarities between the descriptions of the heart attack on the sites and the scenario (squeezing chest, neck and shoulder pain; shortness of breath, nausea). At the same time, they ignored the differences: symptoms in the scenario emerged upon exertion, lasted only 2–3 minutes, and were alleviated by rest. As described in the previous section, the relative importance ascribed to various symptoms' characteristics is different in participants' models and the text-based reference model. Ignoring symptoms' characteristics that are viewed as non-essential can also be seen as exemplary of the selective perception bias [27].
A notable reasoning pattern in this cluster involved the confirmation bias: starting with the heart attack hypothesis, navigating to a heart attack site and concluding that the information confirmed the hypothesis. Three participants in this group also demonstrated what can be interpreted as premature search termination bias by stopping their search after reviewing only one (in all cases, incorrect) content topic and judging the information to be satisfactory.
The search trajectory of Participant 4 (high school education) illustrates the confirmation bias pattern in this cluster (see the Participant 4 search trajectory figure). In responding to the scenario during the semi-structured interview, she expressed a Specific heart attack hypothesis by saying, “I guess, it almost sounds like she's having a heart attack. Especially with all her symptoms. Um, the squeezing of the chest. The pain down her arm. Or I should say her shoulder and having the pain after physical activity and also being nauseous are the ones that probably concern me the most.” Afterwards, she explained her information-seeking goal as wanting “to look up heart attack.” She then typed a “heart attack” query into the MedlinePlus® search window (Verification Strategy), selected the National Library of Medicine Heart Attack Portal from the list of results, and proceeded from its Diagnosis/Symptoms section to the “What Are the Symptoms of a Heart Attack?” page by the Cleveland Clinic Foundation (Search Actions). Once on the page, she scrolled to the section that stated “Symptoms of a heart attack include.” While reading the list of symptoms, she noted that some of the symptoms were present in the scenario (e.g., chest pain and nausea), while others were absent (e.g., sweating and light-headedness). She then concluded that, in her opinion, the character in the scenario was suffering a heart attack. Only one participant in this group did not follow the pattern, noting the difference in the duration of the symptoms. He then followed a link from the MedlinePlus® heart attack encyclopedia page to the angina page, and concluded that the scenario described angina. This participant had a graduate-level education and had used MedlinePlus® on many occasions in the past.
Participant 4 search trajectory (verification-first, confirmation bias).
Problem Area Narrowing-First Cluster
Five participants (P2, 3, 8, 13 and 19; 25%) started with a problem area search. High school was the highest completed level of education for two of the five. Four of these participants had Area hypotheses, and one had an Assorted hypothesis. These participants started either with a general query such as “heart disease” (3 participants) or by browsing the site index tree (2 participants). One participant eventually switched to a bottom-up “chest pain” query; the rest continued with the narrowing strategy throughout the session. All eventually navigated to sites describing specific diseases. However, three of the five spent time on sites that had little potential for answering their questions (e.g., health news about a specific treatment procedure). Unlike the participants in the verification-first cluster, these participants were more likely to leave without a conclusion than with an incorrect conclusion.
The figure of problem-area narrowing-first information-seeking sequences represents a prototypical model of this type of search, covering 4 out of 5 participants in this cluster. One of the two trajectories in this model is exemplified by Participant 3's performance. This participant held a master's degree and was very familiar with MedlinePlus®. When discussing the scenario, she stated, “I'd be concerned about heart, because she is not getting enough oxygen or something” (Area hypothesis). She then typed a “heart diseases” query into the MedlinePlus® search window (Narrowing Strategy), and followed the link to the National Library of Medicine Heart Diseases portal. She then read the links under the Diagnosis and Symptoms subsection and said, “What I am finding here are all tests, nothing about symptoms, so, I am disappointed. But here is one, Heart Attack, Stroke and Cardiac Arrest.” She then followed that link to the American Heart Association site, and read the list and description of heart attack symptoms: chest discomfort, discomfort in the other areas of the upper body, shortness of breath, and others (cold sweat, nausea and lightheadedness). The description of chest discomfort indicated that it “lasts more than a few minutes” or “goes away and comes back.” She then concluded, “These look like they could be symptoms, so I feel like they you know help me feel better about saying this could be something really serious … she could be having very minor heart attacks.” As the pain episode in the scenario lasts only 2–3 minutes, this can be viewed as selective perception bias. However, this reasoning may also be ascribed to the ambiguity in the text and the scenario. It is not clear from the text how long the period of recurrence is in “goes away and comes back,” while the scenario states that the character “has been troubled by periodic squeezing pain in her chest” for the past year.
We can also partially account for some of the problems by considering the configuration of the web resources. The results of many users' queries and browsing displayed links to relevant information about angina. These links, however, were not prominently displayed, and some users did not realize that the choices were available to them. For example, a “heart disease” query produced a list of subtopics that included the relevant Coronary Heart Disease and Heart Diseases. However, these followed a number of specific irrelevant subtopics (e.g., Heart Valve Diseases). Some relevant results appeared so far down the screen that they could not be seen without scrolling. In the topics index tree, Angina was listed under Heart and Circulation, but users did not select it from the alphabetical sequence, perhaps because they did not know the term.
Bottom-Up First Cluster
The figure of bottom-up-first information-seeking sequences represents prototypical sequences in this cluster. Seven participants (P5, 7, 10, 12, 14, 16 and 18; 35%) started their search by attempting a bottom-up search. Five of these participants began without a specific hypothesis. For two of the participants, the bottom-up strategy proved immediately unsuccessful: they attempted to locate a general-purpose diagnostic tool, which was not included in MedlinePlus®. These participants switched to hypothesis-driven strategies and arrived at the incorrect heart attack conclusion. The five remaining participants made progress to varying degrees towards the accurate conclusion. One chose a heart attack site from the list of results and concluded that the scenario in fact described a heart attack. Another navigated to a low-relevance site about the mechanism of pain. Three participants went to a potentially useful familydoctor.org site that contained chest pain diagnostic flowcharts. However, one of them selected the flowchart for diagnosing “acute” rather than “chronic” chest pain, and ended up concluding that it was a heart attack. The other two employed a “chronic chest pain” flowchart available on one of the sites, but were unable to follow it to angina.
Like the participants from the narrowing group, these searchers encountered relevant links, but they were scattered throughout variably relevant subtopics, sometimes below the fold (necessitating scrolling). Consistent with previous research [5], this was partly a function of imprecise queries. For example, entering the terms “recurrent, 2–3 minute episodes, squeezing chest pain” results in the “Angina” subtopic presented at the top of the results list. However, the users' queries were much less specific (“chest pain,” “chest pain and nausea”), most likely reflecting the perceived relevance of various symptoms and symptoms' characteristics in the scenario. This issue is illustrated by the search performance of Participant 12 (who had a high-school education). He started the session by stating an intention to find out “what it means when somebody has pain and nausea when they are being physically active,” and proceeded to type in “chest pain” as a query. The list of resulting topics was divided into the following categories: Pain, Back Pain, Angina and Abdominal Pain. This participant followed the link to Why Do I Have Pain?, a children's site by the Nemours Foundation [34]. The site described the brain's processing of the sensation of pain. The participant spent a significant amount of time on the site, praising the amount of information it contained: “This is what research is. If you have a computer at home, you got a library at home.” He then clicked back to the list of results, stating that he would have to study all of them carefully, and that he would also have to study the results for nausea. The participant finally clicked on several links under the Pain, Back Pain and Angina subheadings, scanning the content rather than reading it, and finished the session with a positive comment about his experience, but did not reach a conclusion.
As in the other clusters, some participants tended to ignore details of textual information to which they did not ascribe significance. For example, Participant 14 read the following description of a heart attack, “… chest pain or discomfort in the center of the chest, a squeezing, heaviness or crushing feeling. Lasts more than a few minutes or goes away and comes back” and concluded, “Wow. That seems like a lot like what she's having.” As in the example in the previous section, this can be interpreted either as selective perception bias or as an appropriate response to the text's ambiguity. Early termination of the search was less common than in the Verification-First cluster, with only two of the participants restricting their review to a single content webpage.
Other Factors: The Role of Web Resources, Education Level and User Competencies
One of our goals was to characterize the role of different kinds of competencies during the task, as well as the role of web resources in mediating the effect of these competencies. The frequency distribution of different kinds of qualitative competency codes suggests that different types of knowledge were instrumental during different search stages. Domain knowledge codes were more likely to appear during goal setting and information evaluation, where domain understanding provided the context that determined the direction of the search as well as the interpretation of the results. Resource knowledge, strategies and metaknowledge codes were more likely to appear during navigational action steps. Table 5 illustrates this point by presenting micro-coded protocol excerpts for Participant 17. In the first statement of coded segment 1, this participant expresses her intention to execute a “heart attack symptoms” query, because in her domain knowledge model chest pain is indicative of a heart attack. This statement concerns goal setting, and domain knowledge defines the intended search trajectory by providing a hypothesis to verify. The second part of coded segment 1 and coded segment 2 concern the navigation to a heart attack site. They involve a strategic choice between conducting a query and following a link, as well as knowledge of the functions available in a browser (e.g., clicking the back button). Despite the failure of the initial query due to a spelling error, the participant's strategic repertoire helped her to bypass the problem and find the site she perceived as relevant. In the final (evaluation) segment, domain knowledge once again becomes central. The participant evaluates the information on the site, draws on her background knowledge, and concludes that the character in the scenario could be suffering from a heart attack.
Table 5 Excerpts of Coded Information Seeking Protocol of Participant 17
Level of education appeared to differentially impact the various information-seeking competencies. Regardless of their level of education, participants demonstrated comparable levels of understanding of the symptoms described in the scenario (domain knowledge), as evidenced in the Participants' Models of the Scenario section. At the same time, participants with higher levels of education were more likely to be familiar with MedlinePlus® (only one college-educated participant had never heard of this portal, as opposed to eight high-school educated participants), to have a repertoire of efficient search strategies (e.g., using keyboard shortcuts to open multiple tabs, exhibited by three college-educated participants and none of the high-school educated participants), and to make meta-level comments (e.g., judging the authoritativeness of a source). However, incomplete and inaccurate domain knowledge often led these participants to apply their efficient strategies to the wrong pages. It also led them to disregard certain key aspects of the information.
Lack of medical vocabulary knowledge presented an additional challenge. While most participants mentioned clogging of the arteries when discussing the possible mechanisms of heart disease, they lacked precise labels for their concepts. As a result, they could not use the relevant terms in queries, and did not recognize them when scanning topics or lists of results. Many participants read the term “angina” while perusing an index tree, yet they did not perceive it as relevant. Participants with a high school education were more likely to comment on vocabulary-related difficulties. For example, one such participant started the session by selecting the Heart and Circulation subtopic of the MedlinePlus® index. Upon scanning the resulting list of topics, he noted, “Now not knowing the doctor's stuff… I don't see [anything] for the average, every day Joe Public here. Maybe I'll go back up here to a search engine.” When asked to clarify what caused this difficulty, the participant explained, “See, they say ‘For Cardiac Disease see Heart Disease.’ Heart disease I know, but cardiac disease, I don't know what that is.”
Two aspects of the MedlinePlus® interface (redesigned since the study was conducted) might have contributed to shaping the search trajectories that emerged in this study. The first was the lack of explicitness in relating lay and professional terms in the index. For example, in the alphabetical list of topics, the Chest Pain title suggested that the reader see Angina. However, the Angina title made no reference to chest pain. The second aspect had to do with the order and organization of query results lists, where specific relevant links were presented after general and less relevant links. The search of Participant 12, presented in a previous section, illustrates this point: the list of responses to the “chest pain” query started with the general Pain category, which was followed by the Back Pain category, and only then by Angina (below the fold on many monitors).