One goal of low vision rehabilitation is to improve a patient’s performance of everyday tasks. Performance can be assessed by self-report or by direct evaluation. In the past decade a great deal of energy has been devoted to the development of visual functioning questionnaires and the analysis of their results.1-7 The NEI-VFQ-25 is the most commonly used instrument for the evaluation of vision-specific health-related quality of life.1 It consists of a set of questions asked of patients regarding their vision and how much difficulty they have performing various activities.
Less attention has been paid to the direct assessment of performance. A number of studies have assessed functional visual performance in normally-sighted subjects, low vision patients, or both; a selection is listed in Table 1.7-18 Some studies have examined one task, but many have assessed a range of activities of daily living including identifying the pertinent information on a bill or food packet, telling the time, face recognition, and mobility.
A standardized battery of functional visual tests could serve two roles. First, performance-based visual function could be used as an independent outcome measure to assess the influence of low vision rehabilitation on a patient’s ability to perform everyday tasks. Second, tests of performance-based visual function could serve to validate self-reported measures of visual function or vision-related quality of life. Some of the tasks that have been used in performance-based assessments are covered in specific NEI-VFQ-25 items. The literature suggests that there is a moderate correlation between quality of life scores and certain objective measures of visual status.19-23 Nonetheless, a better understanding of the relationship between standardized tests of visual performance and quality of life measures is needed (National Advisory Eye Council, 1993). Patients may not accurately assess their own ability to perform everyday tasks,24-26 and their estimates may be confounded by cognitive ability and depression.27-29
It has been previously demonstrated that patients’ performance on performance-based assessments is indicative of their ability to complete the tested tasks in real-world situations.15 Performance-based measures would provide clinicians and researchers with a means of evaluating a patient’s functional abilities accurately. The goals of this preliminary study were:
While previous studies have developed similar batteries of tests in larger samples, this study evaluates the repeatability of time taken to perform activities of daily living and the effect of low vision rehabilitation on ability to perform the tasks.
Early in the development of these tests, the authors consulted with several authorities in the field regarding visual function tests, including those used in the NEI’s Salisbury Eye Evaluation (SEE)15, 30 and the NIA’s Women’s Health and Aging Study.31
Tasks were selected based on the following criteria:
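A battery of eight tests of functional visual performance of everyday tasks was developed. The test items are: reading rate, finding a telephone number, reading a medicine bottle label, reading a utility bill, reading cooking instructions on a food packet, coin selection, playing card identification, and facial expression recognition.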
Although a number of reading tests were available,34-37 none suited our needs. Some tests had only a limited amount of text at each print size and others had multiple print sizes on the same page. The Minnesota Low-Vision Reading Test37 would have been appropriate, but we wanted a block of text of a single size on a separate page, so a new test was developed containing a limited range of print sizes: 8, 4, 2, and 1 M. Words were printed on charts arranged into 20-word paragraphs of five lines with four words per line. All words were chosen randomly, without replacement, from a list of 510 common English words. Each line had one each of 3, 4, 5, and 6 letter words. There was no capitalization or punctuation and words were printed in Times Roman font, black-on-white, on 8.5 × 11 inch paper, with landscape orientation. The different letter sizes were arranged into a set, beginning with 8M and ending with 1M. A total of ten sets were created.
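The chart construction described above can be sketched as follows. This is only an illustration under stated assumptions: the study's 510-word list is not reproduced, so the word pools below are placeholders; the order of word lengths within a line and the scope of sampling without replacement (here, within one paragraph) are not specified in the text and are assumed.

```python
import random

def make_paragraph(words_by_length, rng, lines=5):
    """Build one 20-word paragraph: five lines, each holding one
    3-, 4-, 5-, and 6-letter word."""
    # Sample without replacement within the paragraph (assumed scope).
    pools = {n: rng.sample(words_by_length[n], lines) for n in (3, 4, 5, 6)}
    paragraph = []
    for i in range(lines):
        line = [pools[n][i] for n in (3, 4, 5, 6)]
        rng.shuffle(line)  # word order within a line is an assumption
        paragraph.append(line)
    return paragraph

# Placeholder pools; NOT the study's 510-word list.
WORDS = {
    3: ["the", "and", "was", "for", "you", "not"],
    4: ["that", "with", "have", "this", "from", "they"],
    5: ["which", "their", "would", "there", "about", "other"],
    6: ["should", "people", "before", "little", "around", "number"],
}

paragraph = make_paragraph(WORDS, random.Random(0))
for line in paragraph:
    print(" ".join(line))  # lower case, no punctuation
```

Each printed line would then be typeset at a single print size (8, 4, 2, or 1 M) on its own page.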
Reading rate was measured using two randomly chosen sets of charts. Testing commenced with the largest print size (8 M) in a given set. Immediately following the completion of the first set, the second set was presented to the subject, for a total of two sets. If a subject could not read a given print size, the smaller sizes were not attempted. Words correct were scored on a separate sheet. If the subject read a word incorrectly and immediately corrected it, he or she was given credit for that word. The subject was timed for each print size and reading rate was later calculated in words per minute (WPM).
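As a worked illustration of the reading rate calculation, the sketch below assumes that the rate is based on the number of words read correctly and the time in seconds for one print size. The function name and the example values are hypothetical.

```python
def reading_rate_wpm(words_correct, seconds):
    """Reading rate in words per minute for one print size.

    Assumes the rate is computed from words read correctly;
    the text does not state this explicitly.
    """
    if seconds <= 0:
        raise ValueError("time must be positive")
    return words_correct * 60.0 / seconds

# Hypothetical trial: 18 of 20 words correct in 12 seconds.
rate = reading_rate_wpm(18, 12.0)
print(rate)
```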
The subject was required to find and read aloud a telephone number. Previous studies have required the subject to then dial the number,11, 15, 16, 38, 39 but it was felt that finding the number was the most visually-demanding task and the test was limited to this aspect of the task.7, 17 This also avoided the selection of a specific type of telephone with which the subject may or may not have been familiar.
Two pages from a local telephone directory were reproduced (Ameritech, Columbus area codes 614 & 740, 1996-97 White Pages, pages 710 and 926). The print size of the names and numbers was 10 point. For each trial the subject was required to find and read a telephone number for a name given by the examiner. The name was printed in 80 point Times Roman, read to the subject, and placed in front of the subject for their reference. Four names were available and for each trial the subject had to find two names. The subject was timed for each name. The number read was transcribed and later checked for accuracy.
The subject was required to identify information on a medicine bottle label. This task has been adopted by other authors,7, 17, 29 but excluded by others.16 Dummy prescription medicine bottles were prepared to contain the following information: a generic patient name (Amy or John Smith), a topical agent (hydrocortisone 1% cream), and a dosing schedule (apply to affected area once, twice, three times, or four times daily). Other information, including a doctor’s name, was also printed in order to simulate a typical label. These bottles were assembled by The Ohio State University College of Pharmacy, which informed us that there is no standard way of printing labels. The chosen labels were matte finished white with laser printed instructions in 12 point serif print. Eight different labels were created, and the subject was tested for four randomly chosen bottles. The subject was asked to read aloud the patient’s name and how often the drug is supposed to be used or taken. The subject was timed for each bottle. The information they read was transcribed and later checked for accuracy.
The subject was required to identify information on a utility bill. This task is similar to that used by Turco et al13 and Haymes et al.7 Utility bills sent by local companies (American Electric Power, Columbia Gas, Ameritech, and AT&T Calling Card) were reproduced. These bills were modified to mask the customer name and address. Four different bills were created, and the subject was tested for two chosen at random. The subject was asked to read aloud the amount due and due date. The print size of the relevant information was 16 point. The subject was timed for each bill. The date and amount that they read were transcribed and later checked for accuracy.
The subject was required to identify the cooking time on a packet of food. Similar tasks have been developed by other researchers.7, 16, 17, 38 Ferrucci et al. required that the subject also place the food in a microwave oven,38 but it was felt that this would be confounded by the level of familiarity with the microwave used. Six packets (Rice-A-Roni, Near East Rice Pilaf, Duncan Hines Angel Food Cake, Jiffy Biscuit Mix, Food Club Noodles and Sauce, and Food Club Onion Soup Mix) were chosen from a local supermarket and the subject was tested for two chosen at random. The print size of the relevant information was between 8 and 16 point. The subject was presented a packet and asked to read aloud the conventional (or stovetop) cooking time, as opposed to the microwave cooking time or other recipes that may have been present on the box. The subject was timed for each packet. The cooking time that they read was transcribed and later checked for accuracy.
The subject was required to select coins to equal a given amount. A number of studies have utilized a money sorting task.7, 8, 13, 16, 17, 38, 39 Some have tested with only paper currency,8, 13 while others have used only coins7, 17, 39 and one group used both.16 It was decided not to use paper currency because of the confounding issue of new and old bills.
Five coins of each denomination (penny, nickel, dime, and quarter) were randomly placed in front of the subject. The subject was then instructed to separate or pick out the coins that equaled a given amount. Ten amounts were used (18, 33, 38, 47, 57, 62, 71, 81, 86, and 90 cents) all of which required a minimum total of 5 coins of 3 different denominations. Two amounts were chosen at random. The subject was timed for each amount. The amount that they selected was transcribed and later checked for accuracy.
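The property stated for the ten amounts can be checked with a short script. The sketch below uses greedy change-making, which gives the fewest coins for US denominations; the function name is illustrative and not part of the study protocol.

```python
COINS = (25, 10, 5, 1)  # quarter, dime, nickel, penny (cents)
AMOUNTS = (18, 33, 38, 47, 57, 62, 71, 81, 86, 90)

def fewest_coins(amount):
    """Greedy change-making; optimal for US coin denominations."""
    picked = []
    for coin in COINS:
        count, amount = divmod(amount, coin)
        picked.extend([coin] * count)
    return picked

for cents in AMOUNTS:
    coins = fewest_coins(cents)
    # amount, coins used, number of coins, number of denominations
    print(cents, coins, len(coins), len(set(coins)))
```

For every amount the fewest-coin solution uses five coins drawn from three denominations, and never more than three of any one coin, so the five coins of each denomination on the table are always sufficient.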
The subject was required to identify a playing card in a task similar to that used by Ross et al.16 A standard-sized deck was used with the Jokers, Jacks, Queens, Kings, and Aces removed. After thoroughly shuffling the deck, the subject was sequentially presented with five cards, and asked to identify the number and suit. The subject was not allowed to pick up the card, but could shift his/her position. This test was not timed and was scored only for accuracy.
The subject was required to identify facial expressions. Face recognition has been assessed in a number of studies.7, 10, 12, 14, 40-42 Photos were selected from the “Pictures of Facial Affect” CD-ROM (Consulting Psychologists Press, Palo Alto, CA) used in previous studies12, 14, 41, 42 and printed as life-size on 8.5 × 11 inch white paper. Eight sets of four faces were printed. Each set contained two male and two female faces in four different facial expressions: happy, sad, angry (or disgusted), and frightened (or surprised).
Subjects were presented with a randomly chosen set of four faces at 1 m followed by a different set at 3 m. These test distances were adopted based on our experience12 that normally-sighted subjects could perform this task at both distances. Only expression recognition was tested in order to simplify the test and to avoid the need for a training session or the use of a key with the names of the characters to be tested.
The subject was provided with a key on which were printed the following words in 80 point Times Roman font: happy, sad, disgusted, angry, confused, frightened, and surprised. This sheet was intended to provide suggestions for the subject should they have difficulty describing the expression. In responding, synonyms were acceptable, e.g. smiling for happy, dejected for sad. Based on our prior experience, disgusted and angry were considered equivalent, as were surprised and frightened.12 This test was not timed and was scored only on accuracy.
In order to evaluate the battery of tests, 14 normally-sighted subjects and 24 low vision subjects participated in a single session. The protocol was approved by The Ohio State University Biomedical Sciences Institutional Review Board and informed consent was obtained from all participants prior to testing. Inclusion criteria for the low vision subjects were as follows: at least 60 years of age with previous adult reading ability; age-related central vision loss with best corrected visual acuity between 20/40 and 20/400, and the ability to use at least one upper extremity. Normally-sighted subjects had to be at least 60 years of age, have best-corrected visual acuity better than 20/40, adult reading ability, and the ability to use at least one upper extremity. All normally-sighted subjects were recruited by word-of-mouth. One low vision subject was a word-of-mouth referral, and the rest were recruited by phone from the Low Vision Service of The Ohio State University College of Optometry. All subjects were reimbursed $20 for their participation. If a subject was unable to travel but willing to participate, all of the testing equipment was taken to the subject’s home, with their permission, and he or she was tested there.
Subjects were first assessed using the following tests: visual acuity, contrast sensitivity, and the NEI-VFQ-25. The subject was then assessed using the battery of functional tests. After a five to ten minute break, the battery of functional tests was repeated with different versions of the test material.
High contrast visual acuity was measured monocularly and binocularly, with habitual correction, using Bailey-Lovie charts at a distance of 4 m.43 When necessary the test distance was halved. Subjects were required to attempt all five letters before continuing to the next line. Visual acuity was scored on a letter by letter basis, and converted to logMAR.44 Only binocular visual acuity was used in analyses. Contrast sensitivity was measured binocularly using the Pelli-Robson Contrast Sensitivity chart45 at 1 m with habitual correction. Contrast sensitivity was scored on a letter by letter basis in log CS.46 All subjects completed the NEI-VFQ-25,1 which was administered verbally. The standard 25 questions were supplemented with items A3 and A4 from the Appendix as these items corresponded to some of the functional tasks (Table 2).
The subject was seated comfortably at a large table and a gray paperboard surface was used. Full room illumination was used and a 50 watt halogen adjustable table lamp was available if the subject used a lamp on a regular basis or the lighting conditions in the testing room warranted increased illumination. This lamp provided an average illumination of 2919 lux on the gray paperboard work surface which had an average luminance of 68.8 cd/m².
Subjects performed each task using their habitual spectacle correction. Low vision subjects were also encouraged to use any low vision device(s) that they might find suitable for a given task. All tests except face recognition and playing card recognition were timed using a digital stopwatch. The examiner could not coach the subject nor give any indication of his/her accuracy for each trial; however, the subject was given periodic encouragement. A written protocol which included specific instructions for the subject and instructions on timing, recording, and scoring of tests was developed to promote consistency. Neither testing distance nor subject posture was fixed for any of the testing.
A Wilcoxon rank sum test was performed on the time taken for the first administration of each timed task in order to determine if there was a difference in overall performance between the low vision and normally-sighted groups. Ordinal data such as the face recognition and playing card recognition tasks were analyzed using Wilcoxon signed-rank tests to assess differences among trials, and Wilcoxon rank sum tests to assess differences between groups. Signed-rank testing was completed for the Medicine Label task in order to determine whether there was a learning effect for the task. Spearman correlation coefficients were determined between each task and visual acuity and contrast sensitivity. All of the above tests were performed using SAS (Statistical Analysis System) or STATA version 10.0 (StataCorp). Repeatability for each of the timed tasks was assessed by determining the 95% limits of agreement, a method more appropriate for clinical tests than the correlation coefficient.47 The difference between the time taken (or reading rate) for the first and second administration was calculated for each subject. The 95% limits of agreement were then calculated as the mean ± 1.96 × SD of these differences. The breadth of the limits of agreement indicates the repeatability of the test; the narrower the limits, the more repeatable the test.
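The limits of agreement calculation can be sketched as below. This is an illustration only, not the authors' SAS or STATA code: the sign convention (first minus second administration) and the use of the sample standard deviation are assumptions, and the times shown are invented for the example.

```python
from statistics import mean, stdev

def limits_of_agreement(first, second):
    """95% limits of agreement for paired test-retest measurements.

    Differences are taken as first minus second (assumed convention);
    stdev() is the sample standard deviation (n - 1 denominator).
    """
    diffs = [a - b for a, b in zip(first, second)]
    centre = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return centre - spread, centre + spread

# Hypothetical task times in seconds for five subjects.
first = [30.0, 45.0, 60.0, 25.0, 50.0]
second = [28.0, 40.0, 66.0, 27.0, 44.0]

lower, upper = limits_of_agreement(first, second)
print(round(lower, 1), round(upper, 1))
```

With these made-up data the mean difference is 1 s and the SD of the differences is 5 s, giving limits of 1 ± 9.8 s. An interval that is clearly asymmetric about zero would point to a systematic change between administrations, such as a practice effect.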
The responsiveness of the test battery was evaluated subsequently in a pilot randomized clinical trial of the effects of low vision rehabilitation conducted at the University of Alabama Birmingham, The University of Waterloo, University of California at Berkeley, and The Ohio State University. Twenty-six patients with age-related vision loss and best-corrected visual acuity worse than 20/40 were recruited. At an initial visit, informed consent was obtained and baseline data collected including performance on the functional tests. At the conclusion of this visit, the subject was randomized to either an immediate or a delayed intervention. Patients randomized to immediate intervention were scheduled for an intervention visit, to occur within two weeks. The intervention consisted of a single office visit of up to two hours where optical aids were prescribed and training given. Three months after randomization, all patients attended for an outcome visit. Consequently, one group had received the intervention three months prior, and one had received no intervention. After outcome measures were performed, those in the delayed intervention group received the intervention.
The characteristics of the participants are summarized in Table 3. The mean age (± SD) of the normally-sighted subjects was 73.2 ± 8.6 years and the mean age of the low vision subjects was 76.8 ± 7.5 years. One low vision subject was found to be 57 years old after testing had been completed. Women accounted for 71.4% of the normally-sighted subjects and 62.5% of the low vision subjects. The median logMAR visual acuity was +0.13 (20/27; range = 20/15 to 20/35) for the normally-sighted subjects and +0.89 (20/155; range = 20/50 to 20/500) for the low vision subjects.
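The Snellen equivalents quoted above follow from the logMAR values through the standard relation denominator = 20 × 10^logMAR. A minimal sketch, with illustrative function names:

```python
import math

def snellen_denominator(logmar, numerator=20):
    """Snellen denominator (20/x notation) for a logMAR acuity."""
    return round(numerator * 10 ** logmar)

def logmar_from_snellen(denominator, numerator=20):
    """logMAR acuity for a Snellen fraction numerator/denominator."""
    return math.log10(denominator / numerator)

print(snellen_denominator(0.13))  # median, normally-sighted group
print(snellen_denominator(0.89))  # median, low vision group
print(round(logmar_from_snellen(40), 2))  # the 20/40 criterion
```

The first two calls reproduce the reported medians of 20/27 and 20/155, and the 20/40 criterion corresponds to a logMAR of about 0.30.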
The proportion of trials performed correctly by each group for each task is shown in Table 4. Normally-sighted subjects could perform all tasks, but made occasional errors. The instructions were easily understood and the tasks performed with little difficulty. The proportion of trials performed correctly by the low vision subjects ranged from 35% for facial expression recognition at 3 m, to 95% for the playing card identification.
Medians, interquartile ranges, and 95% limits of agreement for the reading task are reported in Table 5. Medians, interquartile ranges, and 95% limits of agreement for the other timed tasks are reported in Table 6. In general, low vision patients took around three times longer to complete all tasks and demonstrated more variability in time taken. Wilcoxon rank sum testing confirmed significant differences in time taken between the low vision and normally-sighted groups for all timed tasks (p < 0.007) and for reading rates at all four print sizes (p < 0.001). The asymmetry of the 95% limits of agreement interval for the medicine bottle task indicates that subjects were significantly quicker on the second trial and this was confirmed by a Wilcoxon signed-rank test. Performance on some of the functional tasks was associated with clinical measures of vision in the low vision subjects. Visual acuity, but not contrast sensitivity, was significantly poorer in those subjects unable to perform the telephone number, medicine label, utility bill, and cooking instruction tasks (p < 0.05, Wilcoxon rank sum test). Both visual acuity and contrast sensitivity were significantly poorer in those subjects unable to identify more than half of the facial expressions at 3 m (p < 0.05, Wilcoxon rank sum test).
Twenty-three subjects completed the pilot randomized clinical trial: 11 in the intervention group and 12 in the control group. The results for the functional tests are shown in Table 7. These are pilot data and thus have not been subjected to statistical analysis; nonetheless, some trends are worthy of note. In the treatment group, performance improved most dramatically (from 64% to 95%) for the medicine bottle task. In contrast, face recognition performance changed little, as devices were not prescribed for distance tasks. Most other tasks showed small but meaningful improvement, although only 64% of subjects could successfully locate telephone numbers. Performance in the control group changed little following the three-month delay period, suggesting that this outcome measure may be repeatable in terms of completion rate. Two control subjects showed increased ability to read 1 M print at the outcome visit, most likely due to an updated spectacle prescription.
The battery of tests described here is similar to those developed independently by other authors.7, 17 All functional tasks in our battery could be performed by normally-sighted subjects, although occasional errors were made. Performance varied across tasks in the low vision group. The most difficult task was face recognition at 3 m, but performance approached that of normally-sighted subjects for other tasks, and was not significantly different for playing card recognition. Our low vision subjects, like those of Haymes et al., found the telephone number task among the most challenging. The cooking instructions task was challenging for our low vision group and a similar task was the most difficult for Owsley et al.’s subjects.17 The battery of tests can be administered by individuals with a range of experience. All data were collected by an optometrist or a second-year optometry student. Approaches that require grading of performance may require an occupational therapist or low vision specialist.7, 26, 39
Even when the low vision subjects were able to perform the task, they did so significantly more slowly than the normally-sighted group, taking around three times as long. The time for completion was significantly different between the two groups for all timed tasks. The time taken to perform many of the tasks was highly variable, probably as a result of the visual search requirements of the task. Sometimes the subject would begin on the correct side of a food packet or at a location on the telephone directory page close to the required name. On other occasions, the subject had to rotate a box or scan a large part of the page to find the required information. This resulted in poor repeatability for time taken, as indicated by the rather broad limits of agreement. In contrast, the reading task, which required no visual search, gave acceptable limits of agreement. The limits of agreement were calculated using data from both groups of subjects, resulting in a somewhat skewed distribution and limits of agreement that were sometimes larger than the mean. Thus, the absolute values should be treated with some caution, but the poor repeatability is evident.
Owsley and colleagues have developed two separate groups of tests: a 17-item version intended for use in studies of visual impairment17 and a shorter five-item version for the assessment of cognitive abilities.18 Both batteries were evaluated thoroughly and Owsley et al. advocate the use of time to complete the task as the outcome measure. Haymes et al. developed an ordinal scale that incorporates both task completion and accuracy.7 We found time to be highly variable between test sessions, but used subjects with a broader range of visual ability than Owsley et al., who do not report the repeatability of their timed tests.17 Thus we are hesitant to use time as an indicator of an individual’s worsening due to disease progression or improvement due to visual rehabilitation. Nonetheless, there is little overlap between the time taken by the two groups (Table 6), suggesting that these tests are able to distinguish between groups of subjects with different levels of visual ability.
Some of the tasks are ready to be implemented without modification, including the telephone book, utility bill, and coin sorting. The complete battery can be administered to a normally-sighted patient in less than 15 minutes, with low vision patients requiring up to 30 minutes. The medicine bottle label task was the only task that showed a practice effect, with the low vision subjects’ time improving on the second administration. The data suggest that this effect could be minimized by allowing a single untimed trial to familiarize the subject with the task. Accuracy did not improve significantly (<6%) on the second administration, so allowing the subject to practice would not be necessary if time were not an outcome variable. While the data are not reported here, one of the food packets had cooking instructions that were read more quickly, but not more accurately, than the others. This suggests that further standardization may be warranted if time were to be used as an outcome variable. Owsley et al. used three different cans of food and the mean time varied by a factor of two.17 The playing card task failed to discriminate between normally-sighted and low vision subjects, so it was eliminated from the battery of tests used in the pilot randomized clinical trial. Low vision subjects also performed well on the coin sorting task but were considerably slower than the normally-sighted subjects. Performance was correlated with visual acuity so the task should probably be retained, as it is the easiest to standardize. It also has a cognitive component that might make it useful in some circumstances. Given that the Telephone Book, Medicine Bottle Label, and Cooking Instructions tasks yield the poorest performance among low vision patients and that performance appears to improve substantially in response to low vision rehabilitation, these three tasks might be the most appropriate to incorporate into future studies.
There are limitations to our study. Compared to similar studies, only a modest number of subjects were tested: 38 in the main study and 23 in the pilot randomized clinical trial. This may limit the generalizability of the results. Cognitive factors may have contributed to task performance and variability, and in the repeatability study we did not make any assessment of cognitive status.29 The visual status of any given subject may have been confounded by this and other factors that impact task performance, including dexterity. In the pilot randomized clinical trial, all patients had to achieve a minimum score of 21 on the mini-mental state exam in order to be eligible. Our analyses did not include multivariate models of task performance taking into account age, visual acuity, etc., but unlike previous studies,17, 18 it was not our goal to make a comprehensive evaluation of the factors associated with task performance. As stated above, we tested patients with their habitual low vision aids and this undoubtedly confounded the correlations with clinical tests. We did not measure inter-rater repeatability. This battery of tests may be less susceptible to inter-rater variability because, unlike other test batteries,7, 16 there is no observer-graded component. Our observers needed only to record the time taken and whether or not a task was successfully completed. This cannot be confirmed, however, without further testing. Finally, we did not control for low vision aid use by our subjects. All low vision subjects had received a thorough examination in the Low Vision Service of The Ohio State University College of Optometry. Although we recorded whether or not the subject elected to use their habitual aid, we did not incorporate this into our data analysis.
In summary, we have developed a battery of tests of functional vision performance intended for the evaluation of low vision rehabilitation in research or in individual patients. We have used the tests in a small, multi-center pilot study of low vision rehabilitation and found performance on some of the tests to improve in comparison to a control group. Further studies are needed to validate the instrument for use in large clinical trials.
Supported by grants R21-EY11502 and T35-EY07151 from the National Eye Institute, National Institutes of Health and the Ohio Lions Eye Research Foundation. The authors thank Dawn DeCarlo, Robert Dister, Robert Kleinstein, and Susan Leat for contributing some of the data for the pilot randomized clinical trial.