The current study evaluates the efficacy of a P300-based Brain-Computer Interface (BCI) communication device for individuals with advanced ALS.
Participants attended to one cell of an N×N matrix while the N rows and N columns flashed randomly. Each cell of the matrix contained one character. Every flash of an attended character served as a rare event in an oddball sequence and elicited a P300 response. Classification coefficients derived using a stepwise linear discriminant function were applied to the data after each set of flashes. The character receiving the highest discriminant score was presented as feedback.
In Phase I, six participants used a 6×6 matrix on 12 separate days with a mean rate of 1.2 selections/min and mean online and offline accuracies of 62% and 82%, respectively. In Phase II, four participants used either a 6×6 or a 7×7 matrix to produce novel and spontaneous statements with a mean online rate of 2.1 selections/min and online accuracy of 79%. The amplitude and latency of the P300 remained stable over 40 weeks.
Participants could communicate with the P300-based BCI and performance was stable over many months.
BCIs could provide an alternative communication and control technology in the daily lives of people severely disabled by ALS.
Brain-computer interfaces (BCIs) circumvent motor output and convey messages directly from the brain to a computer (Kübler et al., 2001; Wolpaw et al., 2002). Thus, BCIs may be able to provide a new communication channel to individuals with severe neurological or muscular diseases. This includes patients with locked-in syndrome (LIS). LIS is characterized by complete motor paralysis, except for eye movements, with intact cognition and sensation (Laureys et al., 2004). Amyotrophic lateral sclerosis (ALS) is a progressive neurological disease that often leads to LIS (Karitzky and Ludolph, 2001). A communication tool that is independent of muscle control would allow individuals with LIS to regain a level of autonomy, and to be less dependent upon others for communication, particularly after they have lost reliable control of eye movements.
The P300 event-related potential (ERP) is one possible BCI control signal. The P300 is a positive deflection in the electroencephalogram (EEG) that occurs 200 to 700 ms after stimulus onset and is typically recorded over central-parietal scalp locations (Fabiani et al., 1987). The response is evoked by attention to rare stimuli in a random series of stimulus events (i.e., the oddball paradigm) (Fabiani et al., 1987).
Farwell and Donchin showed that the P300 can be used to select items displayed on a computer monitor (Farwell and Donchin, 1988; Donchin et al., 2000). The authors presented study participants with a 6×6 matrix where each of the 36 cells contained one character (Figure 1). The participants were instructed to attend to one of the 36 cells (the target) while the matrix rows and columns flashed in a random order. This design represents an oddball paradigm. In one trial of 12 flashes (6 rows and 6 columns), the target cell flashes only twice: once in a column and once in a row. These two rare events in the context of the other 10 flashes typically elicit a P300 response.
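The row/column logic can be sketched in a few lines of Python. This is a minimal illustration, not the BCI2000 implementation: the stimulus codes (0–5 for rows, 6–11 for columns), the helper names, and the simulated classifier scores are all hypothetical. It shows that the target cell is intensified in exactly 2 of the 12 flashes of every sequence (a target probability of 1/6), and that summing per-flash scores over sequences and taking the best-scoring row and column recovers the attended cell.

```python
import random

ROWS = COLS = 6  # 6x6 matrix; hypothetical codes: 0-5 = rows, 6-11 = columns


def flash_sequence(rng):
    """One sequence: every row and every column flashes once, in random order."""
    codes = list(range(ROWS + COLS))
    rng.shuffle(codes)
    return codes


def contains(code, row, col):
    """True if stimulus `code` intensifies the cell at (row, col)."""
    return code == row if code < ROWS else code - ROWS == col


def select_cell(scores):
    """Pick the cell at the intersection of the highest-scoring row and column.

    `scores[code]` is the classifier output for that stimulus, summed over sequences.
    """
    best_row = max(range(ROWS), key=lambda r: scores[r])
    best_col = max(range(COLS), key=lambda c: scores[ROWS + c])
    return best_row, best_col


rng = random.Random(1)
target = (2, 3)
seq = flash_sequence(rng)
print(sum(contains(code, *target) for code in seq), "of", len(seq))  # 2 of 12

# Simulated scores (hypothetical): flashes containing the target score higher on average.
scores = [0.0] * (ROWS + COLS)
for _ in range(15):
    for code in flash_sequence(rng):
        scores[code] += rng.gauss(3.0 if contains(code, *target) else 0.0, 1.0)
print(select_cell(scores))
```

Choosing the best row and best column separately is equivalent to choosing the cell whose row score plus column score is largest, which is one way to read "the character receiving the highest discriminant score."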
While recent studies have successfully demonstrated several alternatives for P300-based communication and control, none have explored the use of a matrix speller for spontaneous communication. Sellers and Donchin (2006) tested a four-choice paradigm with three healthy volunteers and three volunteers with ALS. The words ‘yes’, ‘no’, ‘pass’, and ‘end’ were presented one at a time as auditory, visual, or combined auditory and visual stimuli. The participant’s task was to count the number of times the designated target (either ‘yes’ or ‘no’) was presented in a random sequence of the four choices. The authors showed that a target probability of 25% dependably elicited a P300 response, and that the response remained stable over a period of 10 sessions in people with and without ALS.
Piccione et al. (2006) also tested a four-choice P300 paradigm with seven healthy volunteers and five individuals with tetraplegia, including one individual with ALS. In this paradigm, the participants attended to one of four flashing arrows to indicate which direction they wished to move a cursor: up, down, left, or right. Hoffman et al. (2007) tested a six-choice visual paradigm that allowed four healthy volunteers and five people severely disabled by ALS to select among icons representing four devices, a door, and a window.
Recently, Sellers et al. (2006b) presented preliminary data from participants A and B of the present study to illustrate their P300 responses during successful use of the 6×6 matrix speller. The present study was designed to extend those encouraging initial findings by studying a larger group of individuals with ALS, by evaluating the stability of their BCI performance in repeated sessions over a prolonged period of time, and by determining whether an ALS population could use a P300-based matrix speller to communicate spontaneous words and phrases.
Eight individuals with amyotrophic lateral sclerosis (ALS), who were severely paralyzed (ALS-FRS score of 12±8), provided informed consent for the study. The study was approved by the Ethical Review Board of the Medical Faculty of the University of Tübingen in Germany. All participants were tested in their home environment. Six individuals completed Phase I of the study in which they copy-spelled 51 characters in each of 12 experimental sessions (described below in section 2.5). A seventh individual was excluded because our algorithm failed to detect a reliable difference in his EEG between the desired character and all other characters while he performed the copy-spelling task. An eighth individual withdrew after eight copy-spelling sessions when persistent electrical noise in the home environment prolonged set-up and data-collection time. Four individuals who completed Phase I also completed Phase II of the study. In Phase II, participants performed a brief copy-spelling session followed by free-spelling that allowed them to produce original and spontaneous messages. All demographics are included in Table 1.
We conducted the study in two phases. In Phase I, each participant completed 2 calibration sessions followed by 10 copy-spelling sessions over 6 – 14 weeks. In copy-spelling sessions, the computer prompts the user with text to copy character by character (Kübler et al., 2001). We sought to demonstrate three important principles. First, that the P300 response to a desired character, compared with the responses to all other characters in an N × N matrix, could be detected reliably in individuals with ALS, with a minimum accuracy of 70% as a predictor of satisfactory communication (Choularton and Dale, 2004; for further discussion of why a minimum accuracy of 70% is required for satisfactory BCI control, see Sellers et al., 2006a). Second, that accuracy does not change significantly over time; in this study, we examined performance over the course of 10 sessions using a matrix-spelling task. Third, that over time P300 amplitude does not decline and P300 latency does not substantially change. A sustained decline in amplitude would ultimately cause the system to fail, whereas moderate changes in amplitude and/or latency would require periodic updates of the system but would not necessarily compromise BCI performance.
In Phase II, each participant completed at least 10 free-spelling sessions over 17 – 40 weeks. In a free-spelling session, participants choose characters at will (Kübler et al., 2001). Therefore, Phase II provided an initial test of BCI usefulness: can a BCI be used for independent communication by individuals who are severely disabled?
All aspects of data collection and experimental design were controlled by the BCI2000 software system (Schalk et al., 2004). The EEG was recorded with Ag/AgCl electrodes in a 16-channel cap (Electro-Cap International) (Fp1, Fp2, F3, Fz, F4, T7, T8, C3, Cz, C4, CP3, CP4, P3, Pz, P4, and Oz) according to the modified international 10–20 system (Jasper, 1958). In Phase II, electrodes PO7 and PO8 were substituted for Fp1 and Fp2. Each channel was referenced to the right earlobe and grounded to the right mastoid. Impedances were kept below 5 kΩ. The EEG was amplified (×20,000) with a g.tec USB amplifier, band-pass filtered between 0.01 and 70 Hz, and sampled at 240 Hz. Data processing, storage, and online display of the EEG were implemented on an IBM ThinkPad (Pentium 4 M 1.6 GHz, 512 MB RAM, Windows XP SP2). The matrix was displayed to the participant on a separate 43-cm video screen with a 60-Hz refresh rate.
For each channel in the selected channel set, an 800-ms segment of data following each intensification was extracted. Each segment was smoothed with a 12-point moving average and then decimated by a factor of 12. The resulting segments were concatenated across channels for each intensification, creating a single feature vector for training the classifier.
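A minimal sketch of this feature extraction in plain Python, assuming that sample 0 of each segment is the intensification onset and that decimation keeps the first output of every 12-point block, so that each feature is the mean of a non-overlapping 12-sample (50-ms) block. At 240 Hz an 800-ms segment has 192 samples, which yields 16 features per channel. The function names are illustrative, not BCI2000's.

```python
FS = 240          # sampling rate (Hz)
WINDOW_MS = 800   # post-intensification segment length
DECIM = 12        # moving-average length and decimation factor


def smooth_and_decimate(segment, k=DECIM):
    """k-point moving average ('valid' part only), then keep every k-th value."""
    averaged = [sum(segment[i:i + k]) / k for i in range(len(segment) - k + 1)]
    return averaged[::k]


def feature_vector(eeg, onset):
    """Concatenate the smoothed, decimated post-stimulus segment of every channel.

    eeg: list of channels, each a list of samples; onset: sample index of the flash.
    """
    n = FS * WINDOW_MS // 1000  # 192 samples
    features = []
    for channel in eeg:
        segment = channel[onset:onset + n]
        if len(segment) < n:
            raise ValueError("recording ends before the 800-ms window")
        features.extend(smooth_and_decimate(segment))
    return features


# Three hypothetical channels (e.g., Fz, Cz, Pz) of synthetic data.
eeg = [[float(i) for i in range(300)], [1.0] * 300, [0.0] * 300]
fv = feature_vector(eeg, onset=0)
print(len(fv))  # 48 = 3 channels x 16 features
```

With the three Phase I online channels this gives 48 candidate features per intensification, from which the classifier then selects a subset.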
Each participant sat in a comfortable chair or in his or her own wheelchair 0.5 – 1.5 meters away from a computer screen that displayed a 6 × 6 matrix beneath two horizontal lines (Figure 1A). The 36 cells of the matrix contained the English alphabet, the numerals 1–9, and an underscore. Characters were arranged from left to right and top to bottom. The text-to-copy (e.g., the word Franz) appeared on the first horizontal line (i.e., the text-to-copy line). The character-to-select (e.g., the letter n in Figure 1) appeared in parentheses at the end of the presented text. The task was to attend to the character-to-select in the flashing matrix and count how many times it flashed. The flashes were presented in random order.
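The layout can be reproduced in a few lines. The sketch assumes the order implied by the text (A–Z, then 1–9, then the underscore, filled left to right and top to bottom); Figure 1A remains the authority on the exact arrangement. The helper returns the row and column that contain a character-to-select, i.e., the two stimuli the participant should count.

```python
import string


def build_matrix(size=6):
    """6x6 copy-spelling matrix in the assumed order: A-Z, 1-9, '_'."""
    chars = list(string.ascii_uppercase) + [str(d) for d in range(1, 10)] + ["_"]
    if len(chars) != size * size:
        raise ValueError("character set does not fill the matrix")
    return [chars[r * size:(r + 1) * size] for r in range(size)]


def locate(matrix, ch):
    """Return (row, column) of a character, case-insensitively."""
    for r, row in enumerate(matrix):
        if ch.upper() in row:
            return r, row.index(ch.upper())
    raise ValueError(f"{ch!r} is not in the matrix")


matrix = build_matrix()
print(locate(matrix, "n"))  # (2, 1): count flashes of row 2 and column 1
```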
For each character-to-select, 20 sequences of flashes were presented, each sequence containing 12 stimuli (one for each column and one for each row). Each stimulus flashed for 100 ms, after which the screen was static for 75 ms; thus, flashes occurred every 175 ms. Accordingly, each sequence lasted 2.1 s and each character selection totaled 42 s. The selection was displayed on the second horizontal line (the feedback line; see Figure 1A) 1.5 s after the matrix stopped flashing. Simultaneously, the next character-to-select appeared in parentheses at the end of the text-to-copy. The participant was then given an additional 3.5 s to view the feedback and identify the matrix location of the next character-to-select. Thus, the interval between the end of one character selection and the beginning of the next totaled 5 s.
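The timing arithmetic can be checked with a small helper. The parameter names are illustrative; the defaults are the Phase I copy-spelling values stated above, with the inter-selection gap being the 1.5 s + 3.5 s pause. Working in integer milliseconds keeps the arithmetic exact.

```python
def selection_timing(n_sequences=20, stimuli_per_sequence=12,
                     flash_ms=100, pause_ms=75, gap_ms=1500 + 3500):
    """Return (sequence_ms, flashing_ms, total_ms) for one character selection."""
    soa_ms = flash_ms + pause_ms                 # one flash every 175 ms
    sequence_ms = stimuli_per_sequence * soa_ms  # 12 x 175 = 2100 ms
    flashing_ms = n_sequences * sequence_ms      # 20 x 2.1 s = 42 s
    return sequence_ms, flashing_ms, flashing_ms + gap_ms


seq_ms, flash_ms, total_ms = selection_timing()
print(seq_ms, flash_ms, total_ms)                     # 2100 42000 47000
print(round(60_000 / total_ms, 2), "selections/min")  # 1.28 selections/min
```

Calling `selection_timing(n_sequences=10)` halves the flashing period to 21 s while the 5-s gap stays fixed, which shows why reducing the number of sequences raises the selection rate by somewhat less than a factor of two.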
In each session, the nine words of the sentence “Franz jagt im komplett verwahrlosten Taxi quer durch Bayern” (translation: “Franz chases in a completely shabby taxi across Bavaria”) were each presented as separate text-to-copy runs, for a total of 51 selections. A sentence was chosen for copy spelling because spelling words is a more natural task than selecting random characters. Runs were separated by 60-s intervals. Thus, although run lengths varied, each session lasted approximately one hour. During the two initial sessions, data were gathered for system calibration (see below) and participants received no feedback. In the subsequent 10 copy-spelling sessions, online feedback was presented as shown in Figure 1A. Online accuracy was recorded for each participant in each of the 10 copy-spelling sessions.
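As a quick consistency check, the nine German words contain exactly 51 letters, matching the 51 selections per session. Combining this with the 42-s flashing period plus 5-s inter-selection interval given above, and the 60-s pauses between the nine runs, gives a rough estimate of session length. It is an estimate only: it ignores setup and any other pauses, which account for the remainder of the approximately one-hour session.

```python
SENTENCE = "Franz jagt im komplett verwahrlosten Taxi quer durch Bayern"

words = SENTENCE.split()
n_selections = sum(len(w) for w in words)   # one selection per letter, one run per word
print(len(words), n_selections)             # 9 51

SELECTION_S = 42 + 5                        # flashing period + inter-selection interval
RUN_PAUSE_S = 60                            # pause between consecutive runs
session_s = n_selections * SELECTION_S + (len(words) - 1) * RUN_PAUSE_S
print(session_s, "s ~", round(session_s / 60), "min")  # 2877 s ~ 48 min
```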
The task and procedures in Phase II were identical to those described above except for the following modifications. After 10–20 characters of copy spelling (to ensure proper system performance and provide additional calibration data), participants were allowed to select characters of their own choice, referred to here as free spelling (Kübler et al., 2001). In the free-spelling matrix, ‘Bksp’, ‘Sp’, and ‘End’ functions replaced the numerals 1, 5, and the underscore, respectively. Choosing ‘Sp’ ended a word; ‘Bksp’ removed the last character selected; and ‘End’ terminated the run. At start-up, the text-to-copy and feedback lines were blank. As selections were made, they appeared in the feedback line. When the ‘Sp’ function was selected, the completed text was removed from the feedback line and subsequently appeared on the first line (previously the text-to-copy line).
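The editing rules of the free-spelling matrix can be expressed as a small replay function. This is a sketch under stated assumptions: the function name is hypothetical, ‘Bksp’ is assumed to act only on the word currently on the feedback line, and completed words are assumed to be joined with single spaces on the first line.

```python
def free_spell(selections):
    """Replay matrix selections; return (first_line, feedback_line, ended)."""
    first_line = ""  # completed words (formerly the text-to-copy line)
    feedback = ""    # word currently being spelled
    for sel in selections:
        if sel == "End":
            return first_line, feedback, True      # 'End' terminates the run
        if sel == "Bksp":
            feedback = feedback[:-1]               # remove the last selected character
        elif sel == "Sp":
            # 'Sp' ends the word: move it from the feedback line to the first line
            first_line = f"{first_line} {feedback}".strip()
            feedback = ""
        else:
            feedback += sel                        # ordinary character
    return first_line, feedback, False


print(free_spell(["H", "A", "L", "X", "Bksp", "L", "O", "Sp", "D", "U", "End"]))
# ('HALLO', 'DU', True)
```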
Participant A continued with the standard 6 × 6 matrix, while Participants B, D, and F used a 7 × 7 matrix that provided 13 additional selections, including punctuation marks and the German letters ä, ö, and ü (Figure 1B). As a consequence, the sequence length for these participants increased to 2.6 s. The interval between characters was also increased from 5 s in the copy-spelling sessions to 8.75 s in the free-spelling sessions to give participants ample time to select the appropriate next character. Finally, the number of sequences, and thus the time per selection, was reduced during free spelling. Table 2 shows the number of sequences used by each participant for free spelling.
Stepwise linear discriminant analysis (SWLDA) was used for online and offline classification in Phase I and Phase II of the study (Draper and Smith, 1981). In this application, SWLDA derives a discriminant function by adding spatiotemporal features (i.e., the amplitude value at a particular channel and time sample) to a linear equation, selecting at each step the feature that accounts for the greatest unique variance. Thus, signal amplitudes at particular times and locations were considered for analysis without explicit consideration of spatial location. The discriminant functions for the Phase I online sessions were derived using a total of 10 spatiotemporal features from signals recorded at locations Fz, Cz, and Pz. Details of the method are described in Sellers and Donchin (2006) and Sellers et al. (2006a). The discriminant functions for the Phase I offline analyses expanded the feature space to up to 60 spatiotemporal features, including signals from nine additional electrodes: F3, F4, C3, C4, CP3, CP4, P3, P4, and Oz. Details of this method are described by Krusienski et al. (2006). The discriminant functions used for online classification in Phase II were derived in the same way as those of the Phase I offline analyses, using signals from locations Fz, Cz, P3, Pz, P4, PO7, PO8, and Oz, as suggested by Krusienski et al. (2006).
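The core of SWLDA can be sketched as stepwise least-squares regression of the class label on the spatiotemporal features. The sketch below is a simplified, forward-only variant in plain Python: it uses a fixed partial-F threshold (`f_enter`, an assumed value) instead of the p-value entry and removal criteria of the full method, and it never removes a feature once entered. For two classes, least-squares weights fitted to ±1 labels are proportional to Fisher's linear discriminant, which is why regression can serve as the discriminant. `max_features` plays the role of the 10- or 60-feature limits described above; all names are hypothetical.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear features)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(X, y, cols):
    """Least-squares fit of y on an intercept plus columns `cols`; returns (weights, RSS)."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    w = solve(xtx, xty)
    rss = sum((t - sum(wi * ri for wi, ri in zip(w, r))) ** 2 for r, t in zip(rows, y))
    return w, rss


def forward_stepwise(X, y, max_features=60, f_enter=4.0):
    """Add the feature that most reduces the RSS while its partial F >= f_enter."""
    n, d = len(X), len(X[0])
    selected = []
    _, rss = fit(X, y, selected)                 # intercept-only model
    while len(selected) < max_features:
        best = None
        for c in range(d):
            if c in selected:
                continue
            try:
                _, cand = fit(X, y, selected + [c])
            except ValueError:
                continue                         # skip collinear candidates
            if best is None or cand < best[1]:
                best = (c, cand)
        dof = n - len(selected) - 2              # residual dof after adding one feature
        if best is None or dof <= 0:
            break
        c, new_rss = best
        f_stat = float("inf") if new_rss == 0 else (rss - new_rss) / (new_rss / dof)
        if f_stat < f_enter:
            break
        selected.append(c)
        rss = new_rss
    weights, _ = fit(X, y, selected)
    return selected, weights


def score(weights, selected, x):
    """Discriminant score of one feature vector (higher = more target-like)."""
    return weights[0] + sum(w * x[c] for w, c in zip(weights[1:], selected))


# Synthetic data: 1 in 6 epochs is a target (+1); only feature 0 carries signal.
rng = random.Random(0)
y = [1.0 if i % 6 == 0 else -1.0 for i in range(60)]
X = [[t + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)] for t in y]
selected, weights = forward_stepwise(X, y, max_features=10)
print(selected)
```

The per-flash `score` values are what a row/column speller would sum over sequences before picking the highest-scoring character.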
In Phase I of the study, classification accuracy was defined as the percentage of characters correctly classified by the SWLDA classifier, in both the online and offline modes. Each session consisted of 51 copy-spelling characters. Mean classification accuracy per participant and session was entered into the statistical analyses of Phase I. Classification accuracy was defined the same way in Phase II of the study; however, the number of characters that each participant copied was reduced to a minimum of 10. This change was made to give the participants more time to produce unique spontaneous messages.
Our first hypothesis was that the ERPs elicited by the desired matrix item can be discriminated from the responses to the other matrix items accurately enough to be used for communication. Thus, we examined classification accuracy both online and offline. Our second hypothesis was that classification accuracy does not change over time. To test both hypotheses, we conducted a two-way analysis of variance (ANOVA) with the factors analysis mode (online vs. offline) and session (sessions 1 – 10). The online condition reflects actual performance. The offline condition reflects the performance expected if classification coefficients had been derived using the expanded feature space suggested by Krusienski and colleagues (Krusienski et al., 2006).
We found a main effect of analysis mode, but no main effect of session and no interaction. Mean classification rates for the online and offline conditions were 61.98% and 81.51%, respectively (F(1,9) = 22.68, P < 0.01). As shown in Figure 2, all six participants performed better in the offline analysis that used the expanded feature space. This result indicates that accuracy could have been expected to increase by approximately 20 percentage points had the more suitable coefficients been used online from the start.
In addition, during online performance, the classification accuracy of only two out of six participants was adequate to use a BCI (i.e., greater than 70%). In contrast, the offline analysis suggested that five of the six participants could perform well enough to use a BCI. Figure 3 depicts accuracy across the 10 feedback sessions as a function of the online and offline classification coefficients. The figure illustrates the stability of on- and off-line classification accuracy across time.
Our third hypothesis was that neither the amplitude nor the latency of the P300 ERP would change over time. To test this hypothesis, we performed a one-way ANOVA on the amplitude and latency of each participant’s P300 response at electrode Cz for copy-spelling sessions 1 – 10. Amplitude was defined as the most positive voltage between 200 and 600 ms minus the voltage at 0 ms. Latency was defined as the time point at which the peak amplitude occurred. The mean amplitude of the P300 response averaged across participants and sessions was 4.06 µV (range 3.66 – 4.26 µV). The amplitude ANOVA did not detect a significant difference across sessions (F(5,9) < 1.0, P = 0.93). Figure 4A shows amplitude values averaged across participants for each of the 10 sessions as a proportion of the amplitude for copy-spelling session 1. The mean latency of the P300 response averaged across participants and sessions was 359.43 ms (range 305.83 to 401.5 ms). The latency ANOVA did not detect a significant difference across sessions (F(5,9) < 1.0, P = 0.62). Figure 4B shows latency values averaged across participants for each of the 10 sessions. The amplitude and latency data illustrate that the response is quite stable in both amplitude and latency for up to 40 weeks. Similar findings have been reported in healthy adults (e.g., Krusienski et al., 2008).
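The amplitude and latency definitions translate directly into code. A minimal sketch, assuming an averaged Cz waveform sampled at 240 Hz whose sample 0 is stimulus onset; the function name and the synthetic waveform are illustrative.

```python
FS = 240  # sampling rate (Hz)


def p300_peak(erp, fs=FS, window_ms=(200, 600)):
    """Return (amplitude, latency_ms) of the most positive point in the window.

    Amplitude is the peak voltage minus the voltage at 0 ms (sample 0);
    latency is the time of that peak.
    """
    start = round(window_ms[0] * fs / 1000)  # sample 48
    stop = round(window_ms[1] * fs / 1000)   # sample 144
    if stop >= len(erp):
        raise ValueError("waveform shorter than the search window")
    peak = max(range(start, stop + 1), key=lambda i: erp[i])
    return erp[peak] - erp[0], peak * 1000 / fs


# Synthetic 800-ms average: flat except a 5-uV point at sample 86 (~358 ms).
erp = [0.0] * 192
erp[0] = 1.0
erp[86] = 5.0
amp, lat = p300_peak(erp)
print(amp, round(lat, 1))  # 4.0 358.3
```

Because latency is quantized to the sampling interval, it can only be resolved in steps of 1000/240 ≈ 4.17 ms.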
The results of Phase I suggested that classification accuracy could be improved by approximately 20% by choosing classification coefficients derived using the method suggested by Krusienski et al. (2006). Thus, Phase II had three objectives: first, to examine if the expected 20% increase in performance could be realized; second, to increase the speed of the system (if possible); third, to allow the participants to produce messages of their own choice. Spelling speed was increased by reducing the number of sequences, i.e., the number of times each letter flashed. Table 2 gives, for each Phase II participant, a comparison of Phase I and Phase II online performance regarding the number of sequences used, characters/min, bits/min, and mean copy-spelling accuracy for the copy-spelling runs.
Participant A unfortunately passed away during Phase II of the experiment before the expanded feature space was used for the classifier; thus, participant A did not benefit from the classification improvements made between Phase I and Phase II of the study. However, the number of sequences was reduced based on offline analyses to decrease character selection time. The selection time was reduced by 12.6 s per selection for participant A, which resulted in decreased accuracy.
Participants B, D, and F used the expanded feature space suggested by Krusienski et al. (2008), and their online accuracy increased by an average of 22%. This increase in online classification accuracy confirms what the offline analysis of Phase I suggested. In addition, the number of sequences was reduced by approximately 50%, allowing nearly twice as many selections per minute (after accounting for the time between successive selections). Moreover, all participants were able to produce unique spontaneous messages. Table 3 shows examples of communicated messages and the number of errors.
Previously, Sellers and Donchin (2006) found that both healthy volunteers and volunteers with advanced ALS could use a P300-based 4-choice matrix with auditory and/or visual stimuli; and Sellers et al. (2006b) reported that several ALS patients could use the P300-based BCI and that two patients could use a 6×6 matrix to copy text. The present study substantially extends this initial work by demonstrating that individuals with severe paralysis caused by ALS can use a P300-based BCI that employs a variable-sized matrix for cued and spontaneous text production and that performance does not degrade over weeks and months. Furthermore, variability in P300 latency and amplitude over time was modest and did not markedly affect BCI performance.
Results from the offline analyses conducted during Phase I indicated that an increase in the number of EEG channels together with an increase in the total number of features entered into the SWLDA solution would likely result in a 20% increase in classification accuracy. Using this information to update parameters in Phase II resulted in a 22% increase in classification accuracy. These offline analyses were prompted by the findings of two recent studies (Krusienski et al., 2006; 2008). Krusienski et al. (2006) compared classification methods based on Pearson’s correlation method (PCM), Fisher’s linear discriminant (FLD), SWLDA, linear support vector machines (LSVM), and Gaussian support vector machines (GSVM). Their results showed that FLD and SWLDA performed significantly better than the other methods. The SWLDA method was used for Phase II of the current study.
The other features used in Phase II were suggested by Krusienski et al. (2008). Using a SWLDA classifier, the authors compared channel set, reference, decimation, and number of model features to determine optimal settings for P300 speller data. The results indicated that the only variable that produced a statistically significant improvement in classification accuracy was channel set. Increasing the number of channels from six to 19 did not significantly improve performance (Krusienski et al., 2008). Moreover, pilot studies have shown that increasing the number of channels from 19 to 64 does not result in a significant improvement in classification accuracy. The current results show that expanding the channel set from Fz, Cz, and Pz to Fz, Cz, Pz, Oz, P3, P4, PO7, and PO8 yields significant improvements in classification accuracy. This result is consistent with previous work suggesting that six to eight channels are optimal for classification accuracy (Krusienski et al., 2006; 2008). In addition, fewer channels should be used whenever possible with severely disabled populations because of the difficulties encountered with setup and cleanup.
In Phase I of the study, online bit rate was a function of accuracy alone because the number of sequences was 20 for all participants in all sessions. In Phase II of the study, bit rate averaged 11.3 bits/min when the 5-s interval between characters is taken into account and 15.8 bits/min when the time between characters is removed (participants B, D, and F only). These bit rates are comparable with those of previous online studies conducted with healthy participants (13.3 bits/min; Serby et al., 2005) or with healthy participants and wheelchair-bound but otherwise healthy adults (5.5 bits/min; Donchin et al., 2000).¹ Several previous P300-based BCI studies have presented bit rate with the time between character selections removed (e.g., Donchin et al., 2000; Kaper et al., 2004; Meinicke et al., 2002; Serby et al., 2005); therefore, we have presented it both ways. However, the appropriate way to report bit rate would seem to be to include the necessary and actual time between character selections. In addition, some previous studies have reported bit rate without regard for accuracy (Kaper et al., 2004; Meinicke et al., 2002). Disregarding accuracy is inappropriate because an accuracy of at least 70% is needed for effective communication (Sellers et al., 2006a).
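The text does not state which bit-rate formula was used; the sketch below assumes the standard definition of Wolpaw et al. (2002), in which a selection among N equally likely choices made with accuracy P carries B = log2 N + P log2 P + (1 − P) log2[(1 − P)/(N − 1)] bits. Bits per minute then depend on whether the time per selection includes the interval between characters, which is exactly the reporting choice discussed above. Values at or below chance are set to zero, a common convention assumed here; the example inputs are illustrative, not the study's data.

```python
from math import log2


def bits_per_selection(n, p):
    """Wolpaw bit rate per selection for n choices at accuracy p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    if p == 1:
        return log2(n)
    if p <= 1 / n:
        return 0.0  # at or below chance: treated as carrying no information
    return log2(n) + p * log2(p) + (1 - p) * log2((1 - p) / (n - 1))


def bits_per_minute(n, p, flashing_s, gap_s=0.0):
    """Bits/min; pass gap_s > 0 to include the time between selections."""
    return bits_per_selection(n, p) * 60 / (flashing_s + gap_s)


# Illustrative only: a 36-choice matrix at 79% accuracy with a 10-s flashing period.
print(round(bits_per_minute(36, 0.79, 10.0), 1), "bits/min without the gap")
print(round(bits_per_minute(36, 0.79, 10.0, gap_s=5.0), 1), "bits/min with a 5-s gap")
```

Because the formula already penalizes errors, reporting bits/min in this form keeps accuracy in the figure; dividing by the flashing time alone, without the gap, inflates the rate.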
Recent data presented by Piccione et al. (2006) using a 4-choice directional task suggested that, although individuals with severe paralysis could use a P300-based BCI, there may be a relationship between degree of impairment and performance. In contrast, in this study the correlation between the degree of impairment, as measured by the ALS Functional Rating Scale (Cedarbaum and Stambler, 1997), and online BCI performance (r = 0.314, P = 0.544) did not reach statistical significance. However, additional study is needed to clarify the impact of individual differences, disease progression, and diagnosis on BCI use.
This study is the first to demonstrate that BCI technology can be moved out of the laboratory and into a home environment. The participants in this study received a great deal of support from highly trained lab members; however, one participant chose to withdraw from the study because the process of experimental setup and cleanup was too onerous. For BCI to become a practical technology embraced by many, the amount of expert supervision and the time required for setup and cleanup must be reduced; improvements in the comfort and robustness of the equipment are also required (for further discussion, see Vaughan et al., 2006). Moreover, online BCI systems are currently slow, and operating them effectively requires patience from the user. In Phase II of the current study, the mean number of online selections per minute ranged from 1.5 to 4.1. This rate reflects the 5-s period used between trials to give the participants time to attend to the next selection. It may be possible to reduce this period substantially and increase the number of selections per minute without a reduction in classification accuracy.
The data presented in this study support the hypothesis that individuals severely disabled by ALS can use a P300-based BCI for writing text and that performance was stable for many months in terms of the ERP response, and in terms of classification accuracy. We expect that BCI devices are poised to make significant and meaningful contributions in the daily lives of those who are severely disabled by ALS or other devastating neuromuscular disorders.
We thank Tilman Gaber, Slavica von Hartlieb, Jeroen Lakerveld, Boris Kleber, Seung-Soo Lee and Barbara Wilhelm from the Institute of Medical Psychology and Behavioral Neurobiology, University of Tübingen, Germany for their support in training participants and Gerwin Schalk and Dennis McFarland from the Wadsworth Center, Albany, USA for providing software support. We acknowledge the support and the patience of our participants. We particularly acknowledge the contribution of our dear and esteemed late participant A. We will never forget his tremendous kindness, courage and humor.
The study was supported by the Deutsche Forschungsgemeinschaft (DFG) (SFB 550 / TB5) and the National Institutes of Health (NIH) (Grants HD30146 and EB00856).
¹ These bit rates are estimates derived from information contained in the respective papers.