In this article we review recent and potential applications of optical neuroimaging to human factors and usability research. We focus specifically on functional near-infrared spectroscopy (fNIRS) because of its cost-effectiveness and ease of implementation. Researchers have used fNIRS to assess a range of psychological phenomena relevant to human factors, such as cognitive workload, attention, motor activity, and more. It offers the opportunity to measure hemodynamic correlates of mental activity during task completion in human factors and usability studies. We also consider some limitations and future research directions.
Usability testing generally relies on behavioral data such as reaction time, overall task completion time, and error rate, in addition to qualitative self-report scales, which, although valuable, are subjective measures of mental status and usually assessed only periodically. Neuroimaging data, assessed in conjunction with established usability methods, offer another potentially valuable indicator of cognitive workload and other mental states. Functional near-infrared spectroscopy (fNIRS) is an optical neuroimaging technology that has been used to measure brain activity during skill acquisition, task performance, and varying levels of cognitive workload. The low-cost, portable, and noninvasive nature of fNIRS makes it particularly amenable to human factors and usability research.
In this article we review the potential applicability of fNIRS to human factors and usability research. We compare fNIRS with other commonly used imaging systems, highlighting the benefits over more restrictive and costly methods, such as functional magnetic resonance imaging (fMRI). We review the usefulness of applying fNIRS in studying a wide range of cognitive processes and mental states, specifically showcasing its ease of use outside of typical laboratory settings. This article should serve as an introduction and guideline for researchers and practitioners interested in applying fNIRS to their current usability methods.
Among the variety of fNIRS methods, the most common is continuous wave spectroscopy, which introduces light of constant frequency and amplitude into the scalp to measure changes in tissue reflectance as concentrations of oxygenated and deoxygenated hemoglobin change due to neural activity. In a manner comparable to fMRI, fNIRS imaging also measures hemodynamic changes (i.e., the blood oxygen level–dependent, or BOLD, response; Cui, Bray, Bryant, Glover, & Reiss, 2011). However, fNIRS measures changes in light absorption, whereas fMRI measures changes in magnetic properties of hemoglobin. Despite a weaker signal-to-noise ratio than fMRI, fNIRS measurements correlate strongly with fMRI measurements across a large range of cognitive tasks (Cui et al., 2011).
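The conversion from measured light changes to hemoglobin concentration changes is conventionally done with the modified Beer-Lambert law. The sketch below illustrates the idea; the wavelengths, extinction coefficients, source-detector distance, and differential pathlength factor are representative textbook-style assumptions, not values from the studies cited here.

```python
import numpy as np

# Illustrative extinction coefficients (1/(mM*cm)) for [HbO, HbR] at two
# common near-infrared wavelengths -- assumed values for demonstration only.
EXT = np.array([[1.4866, 3.8437],   # ~760 nm
                [2.5264, 1.7986]])  # ~850 nm

def hemoglobin_changes(intensity, baseline, distance_cm=3.0, dpf=6.0):
    """Convert raw light intensities at two wavelengths into changes in
    oxygenated and deoxygenated hemoglobin (modified Beer-Lambert law)."""
    delta_od = -np.log10(intensity / baseline)  # optical density change
    path = distance_cm * dpf                    # effective photon path length
    # delta_od = (EXT * path) @ [dHbO, dHbR], so solve the 2x2 linear system
    return np.linalg.solve(EXT * path, delta_od)

# Example: detected light dims at both wavelengths relative to baseline,
# the typical signature of a local increase in oxygenated hemoglobin.
d_hbo, d_hbr = hemoglobin_changes(np.array([0.95, 0.90]), np.array([1.0, 1.0]))
```

With these inputs the solver recovers the classic activation pattern: oxygenated hemoglobin rises while deoxygenated hemoglobin falls slightly.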
Because fNIRS measures light absorption using LED or laser sources, it permits a far wider range of experiments than fMRI, whose strong magnetic field rules out, for example, studies involving the use of technology, machinery, or equipment.
Because fNIRS is noninvasive and portable, it enables the usability researcher to collect neuroimaging data in operational environments (e.g., air traffic control, user interface testing). Many fNIRS systems use lightweight headgear, such as flexible caps with integrated or reconfigurable light sources and detectors.
Figure 1 (left image) depicts a participant wearing an fNIRS cap (from NIRx Medical Technologies, LLC). The topographic image (on the right side of the figure) shows the specific probe mapping used to measure activity in the prefrontal cortex. Participants liken the feel to a snug swim cap and report very little discomfort. Battery-powered systems allow for completely untethered motion. Mobile systems enable measurement of the BOLD response while a wearer is walking or running, which opens the door to unrestricted monitoring, something previously unavailable in functional neuroimaging. The entire setup process can be completed in 10 to 15 min by trained experimenters.
Because it introduces few physical constraints, fNIRS is useful for research investigating behavior in real-world task environments. Its real-world applicability has been demonstrated across several studies in which fNIRS was used to measure the effects of multitasking on the everyday task of walking (Holtzer et al., 2011; Mirelman et al., 2014), sometimes in conjunction with handheld technology use. In contrast to fMRI, which requires complete immobility of study participants, fNIRS is relatively robust to movement artifacts. It is also relatively unobtrusive, introduces no noise, and is highly tolerable to participants. Furthermore, fNIRS has been validated as an effective tool for studying both normal brain function and a variety of pathologies within a large range of populations, including sensitive populations such as older adults and young children (Ferrari & Quaresima, 2012).
Another physiological measure commonly employed within cognitive and human factors research is electroencephalography (EEG). EEG allows monitoring of the brain’s electrical activity over time via electrodes placed on the scalp. EEG and fNIRS are similar in their small form design, offering portable systems and even using similar caps.
Whereas EEG provides greater temporal resolution than fNIRS, largely because hemodynamic changes are slower than the propagation of electrical currents, fNIRS offers advantages over EEG in that it is more robust to artifacts attributable to head, eye, and face movements, as in the case of speaking aloud. These advantages make fNIRS a promising tool for ecologically valid testing because it allows participants to engage in tasks in normal use conditions.
Despite the advantages of fNIRS, there are limitations. For example, fNIRS has lower spatial resolution than fMRI: around 1.5 cm, compared with roughly 3 mm for fMRI. However, recent progress toward higher spatial resolution has been made through the use of multidistance, high-density probe configurations (Koch et al., 2014). Investigations of neural activity using fNIRS are also limited to the cortex: because near-infrared light penetrates only about 1 cm into cortical tissue, fNIRS cannot be used to measure subcortical activity.
For these reasons, fNIRS can be considered a middle ground in terms of spatial resolution: greater than EEG but limited compared with fMRI, while remaining portable and far more affordable than fMRI. In addition, fNIRS and EEG are compatible; some system manufacturers have created integrated caps offering the opportunity to collect hemodynamic and bioelectric measurements in tandem.
In this article, we limit our attention to technologies with potential for user experience (UX) research. We do not consider more invasive techniques that require injection of radioactive material into the bloodstream, such as single-photon emission computed tomography (SPECT), or recently developed techniques involving diffusion tensor imaging (e.g., cortical surface mapping), which require an MRI scanner.
Research in the areas of cognitive and clinical neuroscience has demonstrated that fNIRS is a useful tool for measuring a variety of cognitive and physiological processes, with studies spanning the prefrontal, somatosensory, and sensorimotor cortices. Previous research in these areas supports the potential for extending fNIRS applications into human factors and usability research.
Researchers have used fNIRS to assess a wide range of cognitive functions, including mental workload, executive functioning, attention, memory, somatosensory activity, motor activity, and even emotion. Fishburn and colleagues (Fishburn, Norr, Medvedev, & Vaidya, 2014) demonstrated that fNIRS is sensitive to both cognitive task type and level of cognitive demand. They measured brain activity during a resting state and a working-memory task at three difficulty levels. Results indicated discernable differences between resting and working-memory conditions, with activity scaling up with task difficulty. Thus fNIRS provides an indicator of changes in cognitive workload in complex work environments and tasks.
fNIRS has also been used to observe functional changes in both the somatosensory and sensorimotor cortices. It has been used to noninvasively measure pain levels and shows potential in pain management systems assessment (Azar, 2009). Because it can monitor blood oxygenation changes related to pain, fNIRS is useful for monitoring the effectiveness of pain interventions. It has also been used to measure cortical blood flow changes related to fine motor skills, allowing researchers to discriminate between different motor responses and levels of motor complexity (Holper, Biallas, & Wolf, 2009). This application is particularly relevant to UX practitioners, as interaction with user interfaces (UIs) often requires precise finger dexterity for navigation (e.g., using video game controllers or touch screen tablets).
Researchers have also used fNIRS to investigate ventrolateral prefrontal cortex activity in emotional response (Tupak et al., 2014) while assessing preference, quality of experience, and acceptance. Given that fNIRS imaging has been validated as a means to evaluate cognitive workload, working memory, and attention, along with various other mental states, it can be linked to the goals of usability research, which is concerned with these cognitive faculties.
The International Organization for Standardization (ISO, 2010) defines usability as “the extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.” The purpose of usability testing is to design systems or interfaces that, in addition to being effective, also reduce working-memory demand and overall cognitive workload of the task. This goal is evident from the tenets of Nielsen’s (1994) 10 design heuristics and Shneiderman’s Eight Golden Rules (Shneiderman, Plaisant, & Jacobs, 2009), which are highly cited within human factors and human–computer interaction research.
Usability scales often contain self-report dimensions designed to measure the user’s mental state while completing a task. Subjective indicators, such as frustration, satisfaction, and workload, are difficult for participants to verbalize and also fluctuate throughout a task’s duration. In the typical posttest format, usability scales cannot dynamically assess user experience over time. Methods such as the think-aloud protocol are designed to measure the dynamic user experience but still lack objective measurement of continuous changes in users’ mental state. Additionally, some researchers have found that adding this style of verbalization technique to a UI task can fundamentally change the user experience by increasing the overall workload of the task (Pike, Maior, Porcheron, Sharples, & Wilson, 2014). In combination with traditional usability metrics, fNIRS provides an arguably more objective and less performance-disruptive means of inferring mental-state changes across task duration.
Thus fNIRS is compatible with the type of research that usability practitioners engage in. For example, researchers cannot have participants test out a new touch screen interface while inside an MRI magnet, because no ferrous metal can be allowed into such a system, or pilot an interface that relies heavily on face movements while using EEG because of the extreme level of motion artifacts. These restrictions can be mitigated significantly using fNIRS, opening a wider range of experimentation with neuroimaging.
Solovey and colleagues (2009) championed the usefulness of fNIRS within human–computer interaction research, finding that it posed no significant disruptions to behavior when engaging in an interface task (e.g., necessary head and face movements or motor manipulations of keyboard or mouse). What follows are several examples of application of fNIRS to human factors and usability research.
Many studies have utilized fNIRS to examine workload- and training-related brain activity, and fNIRS measurements have been validated as a means of assessing mental workload via correlation with NASA Task Load Index (NASA-TLX) scores (a commonly used measurement of mental workload; Maior, Pike, Sharples, & Wilson, 2015; Peck, Yuksel, Ottley, Jacob, & Chang, 2013; Pike et al., 2013). For example, Peck and colleagues (Peck et al., 2013) found that fNIRS measurements could be used to compare mental workload across different visual designs. Participants completed an analytical task using separate visual displays, one composed of bar graphs and one composed of pie charts. Analysis of fNIRS data enabled researchers to discern which display created the most difficulty for participants as a result of higher mental workload demands. Cross-validation with EEG and NASA-TLX data indicated that fNIRS is a valid means of workload measurement.
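Validation of this kind amounts to correlating an fNIRS-derived workload index with NASA-TLX scores across participants or conditions. A minimal sketch, using simulated data with a built-in positive relation (all numbers are hypothetical, not from the cited studies):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-participant data: one NASA-TLX score (0-100 scale) and
# one mean prefrontal HbO increase per participant, simulated so that the
# fNIRS index rises with reported workload.
tlx = rng.uniform(20, 80, size=25)
fnirs_index = 0.01 * tlx + rng.normal(0, 0.1, size=25)

# Pearson correlation between subjective and hemodynamic workload measures
r = np.corrcoef(tlx, fnirs_index)[0, 1]
```

A strong positive r would support the claim that the fNIRS index tracks the same construct the questionnaire measures.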
Cognitive workload shares an inverse relationship with expertise. As a person becomes practiced in a task, the associated cognitive demand often decreases (Fairclough, Venables, & Tattersall, 2005). This relationship is visible in fNIRS evidence, with experts generally displaying lower activation levels than novices for the same task (Ayaz et al., 2011). This finding implies that fNIRS could be useful in assessing skill acquisition and development of expertise, which is especially useful if behavioral (e.g., high accuracy) or self-report (e.g., response confidence ratings) measures show ceiling effects. Reduced mental effort is often associated with an increase in automatic information processing (Fairclough et al., 2005). Thus fNIRS could be a useful tool for assessing the level of automatic processing during a task, which may be necessary to know when a person’s capacity to multitask is high.
Increasingly, fNIRS is used to assess affective states or at least emotional valence (positive or negative). It has been used to infer implicit preference judgments through emotional response to particular products or systems (Peck, Afergan, & Jacob, 2007). In an effort to assess a user’s judgment of visual pleasantness or attractiveness, Kreplin and Fairclough (2013) monitored brain activation with fNIRS while participants viewed artistic images. The images were previously rated on a variety of measures, including valence (positive or negative) and attractiveness. Examples of positive/attractive images were of brighter colors, smooth lines, and pleasant items or scenes. Negative/unattractive images were generally darker, with harsh lines and abstract depictions. Results indicated that there was higher activation in the prefrontal cortex when viewing positive and attractive images compared with viewing negative images.
Peck and colleagues (2007) were interested in using fNIRS to predict real-time preference judgments during a task. Participants watched a series of movie clips they previously rated as either favorable or unfavorable while fNIRS recorded cortical hemodynamic changes. This measurement served as a baseline activation level for “preference” and was submitted to a machine-learning classifier. The classifier then predicted viewer preference for a new set of randomly screened movie clips with 72% accuracy. This finding suggests that fNIRS imaging could be valuable for usability testing by providing an additional measure for dimensions, such as preference, that traditionally are subject to self-report limitations.
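The train-on-baseline, predict-on-new-clips idea can be sketched as follows. The published work used a more sophisticated machine-learning classifier; this illustration substitutes a simple nearest-centroid rule, and the channel counts and activation values are simulated, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: mean prefrontal HbO change per clip window,
# labeled by the participant's prior favorable/unfavorable ratings.
favorable   = rng.normal(0.8, 0.3, size=(20, 4))   # 20 windows x 4 channels
unfavorable = rng.normal(0.2, 0.3, size=(20, 4))

X = np.vstack([favorable, unfavorable])
y = np.array([1] * 20 + [0] * 20)                  # 1 = preferred

# Nearest-centroid classifier: label a new window by the closer class mean.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(window):
    """Return 1 (preferred) if the window is closer to the preferred centroid."""
    return int(np.argmin(np.linalg.norm(centroids - window, axis=1)) == 1)

# Classify the fNIRS window recorded during a new, unrated clip
new_window = rng.normal(0.75, 0.3, size=4)
pred = predict(new_window)
```

In practice the features would be richer (slopes, multiple chromophores, many channels) and the classifier cross-validated, but the structure is the same.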
Gupta and colleagues (2013) used emotional valence and preference metrics to assess user acceptance of text-to-speech technologies. Participants listened to a series of text-to-speech recordings and rated them across a series of dimensions, such as quality, comprehension, and pleasantness. The authors used fNIRS to infer emotional valence and preference during the task, underscoring the potential of fNIRS to assess emotion-based preference through monitoring prefrontal and somatosensory cortex.
This research was further expanded upon in an attempt to evaluate user quality of experience (Laghari et al., 2013). Quality of experience is composed of dynamic mental states, such as preference, emotional valence, and expectations throughout the duration of a task, and is difficult to objectively assess. It was demonstrated that fNIRS could provide the means to measure and quantify these subjective insights (Laghari et al., 2013).
Social interaction is a prominent component of many interactive task environments. Hyperscanning, or measuring two brains simultaneously, is thought to hold potential for gaining insight into how information is processed by multiple participants in an environment.
Cui, Bryant, and Reiss (2012) demonstrated that fNIRS presents a cost-effective and noninvasive method of hyperscanning. Participants in groups of two engaged in either a cooperative task, requiring they work together to achieve a goal, or a competitive task, requiring they work against each other to achieve individual goals. The researchers observed higher interbrain coherence during tasks of a cooperative, rather than a competitive, nature. This finding offers a great deal of insight into the subtleties of human interaction by monitoring interbrain coherence, a measure of correlation between brain activity signals from separate individuals. Hyperscanning allows researchers to study dynamic social interactions at a neural level over and above measures of each individual’s experience.
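Cui et al. quantified interbrain coherence with wavelet transform coherence; as a simplified stand-in, the sketch below uses plain Pearson correlation between two simulated participants whose signals share a task-locked component. All signals and parameters are illustrative assumptions.

```python
import numpy as np

def interbrain_correlation(signal_a, signal_b):
    """Pearson correlation between two participants' channel time series --
    a simplified proxy for the wavelet coherence used in hyperscanning."""
    a = (signal_a - signal_a.mean()) / signal_a.std()
    b = (signal_b - signal_b.mean()) / signal_b.std()
    return float(np.mean(a * b))

t = np.linspace(0, 60, 600)               # 60 s of data at 10 Hz
shared = np.sin(2 * np.pi * 0.1 * t)      # slow task-locked hemodynamic drive

rng = np.random.default_rng(1)
brain_a = shared + 0.5 * rng.normal(size=t.size)  # cooperating pair: both
brain_b = shared + 0.5 * rng.normal(size=t.size)  # carry the shared component
brain_c = 0.5 * rng.normal(size=t.size)           # unrelated participant

coop = interbrain_correlation(brain_a, brain_b)
indep = interbrain_correlation(brain_a, brain_c)
```

The cooperating pair shows markedly higher interbrain correlation than the unrelated pair, mirroring the coherence difference reported between cooperative and competitive conditions.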
Gauging mental workload in real time has potential to support adaptive interaction in the event of cognitive overload. The goal is for adaptive automated systems to use neurofeedback information to actively initiate system changes that split information-processing requirements among system users to mitigate overload. This type of system is considered a passive brain–computer interface, whereby real-time signal analysis is combined with machine-learning classifiers to provide online information to an adaptive system.
Girouard, Solovey, and Jacob (2013) developed a system, called online fNIRS analysis and classification (OFAC), that used affective and workload states monitored through fNIRS to perform real-time classification during tasks with 85% accuracy. Preliminary research suggests that fNIRS can convey real-time mental workload in complex cognitive tasks, such as air traffic control.
Ayaz and colleagues (2013) tested neurofeedback-based adaptive automation by having participants complete either an adaptive task, which moderated workload based on performance, or a nonadaptive task, which did not alter load. The fNIRS results, in conjunction with behavioral data, showed higher activation for those in the nonadaptive condition, indicating greater mental effort. These examples suggest that performance can to some degree be monitored, predicted, and moderated through use of fNIRS imaging as a measure of cognitive workload.
Gateau and colleagues (Gateau, Durantin, Lancelot, Scannella, & Dehais, 2015) conducted a study testing the feasibility of fNIRS neurofeedback outside a laboratory setting using a realistic flight simulation task. Results indicated that fNIRS was useful in conducting online state inference for pilots, enabling differentiation between flight simulations at multiple levels of working-memory taxation. The researchers did, however, express safety concerns with real-world applications, especially in the case of aeronautic and automotive application. Despite the relative noninvasiveness of current systems, there is room for improvement in order to render the system completely unobtrusive in high-risk scenarios (e.g., removal of wires).
Hirshfield and colleagues (2011) considered the potential to enhance usability testing with fNIRS. To aid researchers in the application of fNIRS to their own UX research, they created a four-step usability protocol to assess mental workload of UI tasks.
Stage 1 of the protocol requires researchers to find benchmark tasks at both high and low levels of workload that measure a variety of cognitive resources, such as working memory, visual search, and response inhibition. Stage 2 is the design of the desired UI task. Stage 3 requires the user to complete the benchmark tasks to provide a baseline of measurement during the low and high levels of workload. Users would then complete the UI task while their brain activity is monitored with fNIRS. In Stage 4, the benchmark measures are input as training data and the UI measures are input as testing data into a machine-learning classifier. This classifier is then used to determine the user’s workload during the given UI task.
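The benchmark-then-classify logic of Stages 3 and 4 can be sketched as follows. The original protocol used a machine-learning classifier; this illustration substitutes a simple midpoint threshold, and all activation values are simulated, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stages 1 and 3: benchmark tasks completed at known low and high workload
# (hypothetical mean prefrontal activation per trial).
low_bench  = rng.normal(0.2, 0.1, size=30)
high_bench = rng.normal(0.7, 0.1, size=30)

# Stage 4: "train" on the benchmark data -- here, a midpoint threshold
# between the two benchmark means stands in for the trained classifier.
threshold = (low_bench.mean() + high_bench.mean()) / 2

def workload_label(ui_trial_activation):
    """Classify a UI-task trial as high (1) or low (0) workload."""
    return int(ui_trial_activation > threshold)

# Apply the trained rule to activation measured during two UI versions
labels_v1 = [workload_label(x) for x in rng.normal(0.65, 0.1, size=10)]
labels_v2 = [workload_label(x) for x in rng.normal(0.25, 0.1, size=10)]
```

Comparing the proportion of high-workload labels across UI versions then indicates which design is more demanding, as in the driving-simulator and Web-browsing tests described below.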
This protocol enabled the researchers to use brain activity to interpret task demands during the higher-workload-level UI tasks. This protocol was tested with two UI tasks: a driving simulator and a Web-browsing task. The selected benchmarks included a visual search, response inhibition, and a working-memory task. Results indicated that it was possible to use brain activation in the cognitive benchmarks to interpret which UI versions were more demanding. The fNIRS results were corroborated by accuracy and survey data. The researchers used these empirical results to make practical suggestions for the design of Web browsers and driving simulations used for traffic schools.
Care must be taken when making so-called reverse inferences from neural data. Reverse inference occurs when engagement of a particular cognitive process (e.g., high workload) is inferred from the activation of a specific brain region. Poldrack (2006) argued for the importance of converging evidence from behavioral and neural data to corroborate a reverse inference.
Confidence in reverse inferences can be improved by increasing regional response specificity using well-understood tasks, limiting the size of the measurement region of interest, and estimating engagement of the cognitive processes in similar or related tasks through converging behavioral measures. Reverse inference can be flawed but is potentially useful if approached with caution.
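Poldrack's argument can be made explicit with Bayes' rule: the probability that a cognitive process M is engaged given observed activation A depends not only on how often M produces A but also on how often A occurs without M and on the prior plausibility of M:

```latex
P(M \mid A) = \frac{P(A \mid M)\,P(M)}
                   {P(A \mid M)\,P(M) + P(A \mid \neg M)\,P(\neg M)}
```

Increasing regional response specificity amounts to shrinking the term P(A | not-M), which is what well-understood benchmark tasks and tightly defined regions of interest help accomplish.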
The consequences of an uncertain reverse inference are somewhat mitigated in human factors/usability work, as the goal is effective engineering rather than a detailed understanding of underlying brain responses. The examples given earlier suggest it is possible to relate brain activity to behavioral and environmental changes in ways that improve usability.
The link between behavior and neural data must be further validated in usability and ergonomics research before there will be widespread adoption of fNIRS imaging within this domain. The correlation between usability metrics and underlying neural activity is complex and sometimes inconsistent. There is a need to understand how these measures covary across time and study participants, because usability, and neuroimaging in general, is highly subject to individual differences.
One goal is for fNIRS imaging to eventually support real-time assessment of a variety of cognitive processes, mitigation of cognitive and physical overload, and real-time decision aiding (with the use of judgment/preference cues). A more immediate opportunity for advancement lies in the widespread availability of mobile measurement systems. Mobility opens the door to a vast range of neuroimaging experimentation that was previously impossible, allowing for application to real-world environments and scenarios that are particularly relevant to human factors research.
Audrey P. Hill is a 4th-year graduate student in the Applied Experimental and Human Factors Psychology Program at the University of Central Florida. Her work focuses on cognitive neuroscience related to decision making and learning, and assessment of cortical response in clinical populations (e.g., posttraumatic stress disorder patients, children with attention-deficit/hyperactivity disorder).
Corey J. Bohil is an assistant professor in the Applied Experimental and Human Factors Program in the University of Central Florida Department of Psychology. He received his PhD in cognitive psychology from the University of Texas at Austin. His research focuses on the cognitive processes underlying categorization and decision making.