The visual accessibility of a space refers to the effectiveness with which vision can be used to travel safely through the space. For people with low vision, the detection of steps and ramps is an important component of visual accessibility. We used ramps and steps as visual targets to examine the interacting effects of lighting, object geometry, contrast, viewing distance and spatial resolution. Wooden staging was used to construct a sidewalk with transitions to ramps or steps. Forty-eight normally sighted subjects viewed the sidewalk monocularly through acuity-reducing goggles, and made recognition judgments about the presence of the ramps or steps. The effects of variation in lighting were milder than expected. Performance declined for the largest viewing distance, but exhibited a surprising reversal for nearer viewing. Of relevance to pedestrian safety, the step up was more visible than the step down. We developed a probabilistic cue model to explain the pattern of target confusions. Cues determined by discontinuities in the edge contours of the sidewalk at the transition to the targets were vulnerable to changes in viewing conditions. Cues associated with the height in the picture plane of the targets were more robust.
The visual accessibility of a space refers to the effectiveness with which vision can be used to travel safely through the space and to pursue the intended activities in the space. Our long-term goal is to provide tools to enable the design of safe environments for the mobility of low-vision individuals and to enhance safety for others, including older people with normal vision, who may need to operate under low luminance, glare and other visually challenging conditions. A specific aim of our research is the development of a computer-based design tool in which complex, real-world environments (such as a hotel lobby, large classroom, or hospital reception area) could be simulated with sufficient accuracy to predict the visibility of key landmarks or obstacles under a variety of natural and artificial lighting conditions.
This paper reports on our study of the detection and recognition of single steps (up or down) and ramps in a simple, indoor environment. Subjects had normal vision, but made judgments under conditions of blur simulating reduced acuity. Our goal was to explore the interacting effects of lighting direction, target/background contrast, viewing distance, and blur. We conducted our psychophysical measurements in a real space, rather than simulating stimuli on a computer screen or in a virtual environment, to ensure that we captured the complexity of the real world. We reasoned that it is important to understand the visual cues and other factors determining visibility of ground-plane irregularities in a simple real-world space before attempting to generalize the analysis to a wider range of realistic environments and the performance of visually impaired subjects.
It is often difficult for a normally sighted person to judge when features, such as steps, are hard to see because of the complex interactions between lighting, the geometry of the feature and its surface material. A feature that is easy to see from one viewpoint under diffuse lighting might “disappear” in directional lighting, or one that is easy to see under directional lighting might not be seen under diffuse lighting. Brabyn, Schneck, Haegerstrom-Portnoy & Lott (2004) presented some compelling photos illustrating the effects of mild contrast reduction and glare on face images and everyday sidewalk and driving scenes. Their goal was to simulate the milder visual impairments of the normal aging eye, rather than the more severe loss of spatial resolution typical of low vision. They pointed out that it is difficult to imagine or predict the nature of the substantial functional deficits associated with these forms of mild visual impairment. Arditi and Brabyn (2000) have identified some practical measures for enhancing visual accessibility, such as placing high-contrast strips at the top of stairs.
With the exception of Ludt and Goodrich (2002) and Goodrich and Ludt (2003), most of the low-vision research on hazards and obstacles has focused on avoiding contact with obstacles while moving through a cluttered space. The past work on obstacle avoidance has concentrated on the influence of three key measures of visual function: acuity, contrast sensitivity, and visual field. The results have usually shown that acuity level is not very important, contrast sensitivity is somewhat important, and the total extent of the visual field is of major importance (Marron & Bailey, 1982; Long, Rieser, & Hill, 1990; Lovie-Kitchin, Mainstone, Robinson, & Brown, 1990; Haymes, Guest, Heyes, & Johnston, 1996; Kuyk, Elliot, & Fuhr, 1998). As demonstrated by Ludt and Goodrich, safety depends critically on the ability to reliably identify potential hazards from a distance. Recognizing obstacles at a distance is likely to place greater demands on acuity than avoiding contact with nearby objects and surfaces. The varied and complex lighting present in real architectural spaces is also likely to impact low vision performance in ways not apparent in empirical studies done in more controlled settings.
The importance of the visual accessibility of environments, particularly ramps and steps, is further emphasized by the large literature showing associations between vision and falls or other accidents in the elderly. For instance, there are associations between reductions in binocularity, contrast sensitivity, acuity and visual field size and the occurrence of falls and hip fractures in the elderly (Lord & Dayhew, 2001; Ivers, Cumming, Mitchell, & Attebo, 1998; Klein, Klein, Lee, & Cruickshanks, 1998). Poor vision is implicated in falls in specific environments including nursing homes (Rubenstein, Josephson, & Osterweil, 1996), and on stairs (Archea, 1985). Visual impairment is also associated with reduced postural stability, which increases the likelihood of falls on uneven surfaces (Ray, Horvat, Croce, Mason, & Wolf, 2008).
Our test bed was a sidewalk, built in an indoor classroom (Fig. 1).
The sidewalk was interrupted at a known transition point by a Step Up, Step Down, Ramp Up, or Ramp Down, or was not interrupted but remained Flat (Fig. 2). Subjects viewed the transition point from distances of 5, 10 or 20 ft. They wore blurring goggles that reduced effective acuity to Snellen equivalents of about 20/135 (Single-Blur) or 20/900 (Double-Blur). The subject’s task was to identify the target (5-alternative forced choice).
Through introspection, we identified a set of cues useful for distinguishing among the five targets. These cues are illustrated in Fig. 3. Panel A shows two cues for Step Up—the luminance contrast marking the transition from sidewalk to riser, and the Kink in the boundary contour of the sidewalk. Panel B shows a cue for Step Down—the L-Junction in the boundary contour of the sidewalk. Panel C shows a cue for Ramp Up—the Bend in the bounding contour associated with the transition from sidewalk to ramp. A bend in the opposite direction is a cue for Ramp Down. Another cue for distinguishing among the targets is the Height in the Picture Plane of the horizontal bounding contour between the far edge of the target and the wall behind it. There are three values for this picture-height cue: high for Step Up and Ramp Up, low for Step Down and Ramp Down, and intermediate for the Flat target.
The visibility of these cues depends on the contrast of the boundary contours of our five targets, and in some cases (such as the L-Junction for Step Down) on the angular subtense of a local geometrical feature. Boundary contrast is affected by lighting direction and the contrast between the targets and their backgrounds. Visibility of the geometrical features is affected by viewing distance and acuity (blur). These considerations motivated our empirical interest in the effects of lighting arrangement, stimulus contrast, viewing distance and extent of blur.
For purposes of theoretical modeling, we define cue visibility as the probability of detecting and using the cues in a recognition judgment. Following presentation of our empirical results, we will describe a Bayesian analysis that interprets the data in terms of the probability of detection for these cues.
All experiments were conducted in a large windowless 33.25 by 18.58 ft (10.13 by 5.66 m) classroom in the basement of Elliott Hall on the campus of the University of Minnesota. Fig. 4 shows a schematic drawing.
Hardboard deck portable stage risers were used to construct a sidewalk, 4 ft wide by 24.5 ft long (1.2 m by 7.5 m), elevated 16 in (0.4 m) above the floor. The sidewalk was painted with Valspar satin light gray porch and floor enamel.
One of five possible targets formed a continuation of the sidewalk at its south end. Fig. 2 shows the five targets: Step Up, Step Down, Ramp Up, Ramp Down, and the Flat continuation of the sidewalk. The five targets were formed by arrangements of a 4 ft by 8 ft (1.2 m by 2.4 m), 2 in thick, rectangular panel of expanded polystyrene (EPS), covered with the same gray paint. An additional small block of painted EPS was glued on the near (viewable) end of the EPS panel to create the front riser seen in the Step Up condition. The polystyrene surface of the target panel and the wooden surface of the sidewalk could be distinguished visually by fine texture differences with normal vision, but were indistinguishable under blur.
The five targets were configurations of the polystyrene panel produced by raising (or lowering) one or both of its ends by 7 in (18 cm) above or below the level of the wooden sidewalk using motorized scissor jacks. To produce a nearly seamless transition between the end of the sidewalk and the target panel, a wedge-shaped block of EPS was fixed to the end of the sidewalk, and a quarter inch sheet of hardboard material was laid over the viewing end of the sidewalk and the extending EPS wedge. Precise alignment of the stimuli was facilitated by two remote controlled laser diodes mounted on the target panel, and laser targets placed several feet away from the lasers. The experimenter changed targets between trials in about 20 sec by operating the jacks with a controller.
The classroom floor, far wall and right-hand wall formed the visual background for the targets. They were paneled with sections of polystyrene, painted gray to match the targets and sidewalk or painted black (Valspar interior satin dark kettle black acrylic latex) to form a high-contrast boundary with the targets.
There were three lighting arrangements: Overhead, Near Window, and Far Window. Fig. 5 shows examples of the three light sources and two contrast conditions for the Step Up target.
Overhead lighting was produced by the room’s four rows of three 2 by 4 ft luminaires (recessed acrylic prismatic 4-lamp SP41 fluorescent). Overhead lighting produced a luminance of approximately 68 cd/m² on the gray sidewalk and target panel.
There were two “window” conditions, in which the room lighting was turned off and an artificial window was placed at the Near or Far location indicated in Figures 1 and 4. Artificial windows were constructed from sheet metal boxes with a 36 by 36 in aperture, containing vertical acrylic diffusers with 12 SP65 fluorescent lamps mounted 10 in behind the diffusers. The insides of the boxes were painted flat white. The center of each window was 5.08 ft (1.55 m) above the floor. The mean luminance of the windows was 785 cd/m².
Of the vast number of possible lighting arrangements, we chose the Overhead condition as representative of ambient room illumination. We chose the window conditions as representative of directional room lighting (e.g., a room with a north-facing window and a daytime view of a featureless gray sky, with the Near and Far locations intended to represent directional lighting in front of, or behind, the target).
Stimulus lighting was documented by high-dynamic-range (HDR) images based on multiple photographs using the method of Debevec and Malik (1997). Photographs were taken with a Nikon D80 digital camera, with an 18–135 mm zoom lens set to 18 mm, tethered to a laptop computer running Nikon’s Camera Control Pro 2 software. A Minolta CS100 Chroma Meter was used for photometric calibration.
Fig. 6 shows the image locations from which luminance values were sampled for contrast calculations. Contrast values across boundaries were computed between nearby luminance samples L1 and L2 using the Michelson formula C = (L1 − L2)/(L1 + L2), and are listed in Table 1. Note, for example, that for Overhead lighting and the Black Background, most of the contrast values on the bounding contours of the targets were quite high (see Table 1), ranging from 0.61 to 0.86. The exception was the contrast across the boundary between the target and the sidewalk, labeled “Front” in Table 1. Except for the Step Up condition, these contrasts were 0.05 or less. Consequently, these low-contrast features were below threshold when viewed through the blur goggles. For Step Up, the corresponding contrast was 0.32 and was usually visible through the blur goggles. For the Gray Background and Overhead lighting, the contrast across the bounding contours was much lower, and did not exceed about 0.3.
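The Michelson formula above is straightforward to compute. The following sketch illustrates the calculation with hypothetical luminance values, not the measured samples from Table 1:

```python
# A minimal sketch of the Michelson contrast computation used for Table 1.
# The luminance values below are illustrative, not the measured data.

def michelson_contrast(l1: float, l2: float) -> float:
    """Michelson contrast C = (L1 - L2) / (L1 + L2) between two luminances."""
    return (l1 - l2) / (l1 + l2)

# E.g., a gray surface at 68 cd/m^2 against a darker region at 12 cd/m^2
print(round(michelson_contrast(68.0, 12.0), 2))  # 0.7
```

Note that the formula is signed: contrast is positive when L1 is the brighter region, and values near zero (such as the sub-0.05 "Front" contrasts) fall below the blur-goggle threshold.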
Forty-eight normally sighted young adults were assigned to one of four groups, with 12 members per group, defined by two background conditions (either Gray or Black Background surrounding the gray targets) and two levels of blur (Single-Blur and Double-Blur). Group characteristics are given in Table 2. Each subject participated in one session lasting from two to three hours. Informed consent was obtained in accordance with procedures approved by the University of Minnesota’s IRB.
Subjects viewed the targets from a seated position on the sidewalk at distances of 5, 10 and 20 ft (1.5, 3.0 and 6.1 m) from the transition point between sidewalk and target.
Viewing was monocular with the dominant eye (determined using an aiming task). The other eye was covered with an opaque lens blank. Two levels of blur were produced with Bangerter Occlusion Foils (Odell, Leske, Hatt, Adams, & Holmes, 2008) attached to one or both sides of a clear acrylic lens, and mounted in a welding goggle frame.
Acuity with and without blur was measured with the Lighthouse Distance Visual Acuity Chart. Table 2 shows the mean values for the four groups. Contrast sensitivity (Pelli-Robson chart) through the blur foils was estimated psychophysically to be 0.8 (Single-Blur) and 0.6 (Double-Blur). Luminance was attenuated by about a factor of two through the blur foils.
A cylindrical, black, acrylic viewing tube was attached to the front of the goggles. The tube served to reduce glare, largely blocking direct illumination from the overhead and artificial window lighting when the subject viewed the targets. The tube reduced the field of view from about 48° to 33°. We verified that this method of glare reduction enhanced performance. In a control condition (10-ft viewing distance), there was a reduction in performance across groups from 76.9% correct target recognition with the tube in place to 64.2% correct without the tube.
Prior to testing, subjects were shown the five targets without blur. For each lighting by distance by background contrast condition, subjects were shown the five targets again.
A trial consisted of the presentation of one of the five targets. Subjects were allowed up to 4 sec to view the target. They were then instructed to identify the target, guessing if necessary (5-alternative forced choice). Subjects also gave a confidence rating on each response, from 1 (no idea, complete guess) to 5 (very certain). These confidence ratings will not be discussed in this paper.
Between trials, the subject was asked to turn his/her head to face the wall on the right to avoid seeing the placement of the next target. Noise-reducing earmuffs and auditory white noise played through headphones were used to mask auditory cues associated with the change of targets.
Within a group, testing was blocked by viewing distance and lighting condition. Each subject completed 90 trials—two trials for each of the five targets, for three lighting conditions (Overhead, Near Window and Far Window) and three distances (5 ft/1.5 m, 10 ft/3.0 m, and 20 ft/6.1 m).
We report accuracy for target identification (% correct) for the various conditions tested. Because there were five targets, chance accuracy was 20% correct. We also present confusion matrices for the five targets.
Accuracy data were arcsine-transformed prior to statistical analysis to achieve normality of the group data. We conducted a repeated-measures analysis of variance (ANOVA) on the transformed accuracy data, with two between-subject factors, background color (Black or Gray) and blur level (Single or Double), and two within-subject factors, lighting (Overhead, Near Window and Far Window) and viewing distance (5, 10 and 20 ft). We list the significant effects:
These differences will be discussed in the following subsections.
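The arcsine transform applied to the accuracy data is a standard variance-stabilizing step for proportions. A minimal sketch, assuming the common form asin(sqrt(p)) and using illustrative proportions rather than the study's group means:

```python
import math

# Variance-stabilizing arcsine transform for proportion-correct data,
# assuming the common form asin(sqrt(p)). Sample values are illustrative.

def arcsine_transform(p: float) -> float:
    """Map a proportion p in [0, 1] to asin(sqrt(p)) radians."""
    return math.asin(math.sqrt(p))

# From chance level (0.20) up through high accuracy
accuracies = [0.20, 0.618, 0.769, 0.899]
transformed = [arcsine_transform(p) for p in accuracies]
```

The transform spreads out proportions near 0 and 1, where binomial variance is compressed, so that the transformed scores better satisfy the ANOVA's normality assumption.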
Fig. 7 shows recognition accuracy for the five targets for each of the four groups. Combined across groups, the best target-recognition performance was for Step Up (89.9%) and the worst was for Ramp Down (63.2%).
Table 3 shows the target/response confusion matrix for data combined across groups and conditions. Target recognition accuracy is shown by the bolded diagonal values in the matrix. The off-diagonal values show the pattern of confusions. Values in parentheses are model fits as described in the Cue Analysis section below.
Each target was presented on 20% of the trials. There was a small but significant deviation from an equal distribution of responses across targets (F(4,176) = 30.66, p < .0005). The Flat target had the highest response rate (25.9%), and the others had nearly equal response rates, averaging 18.5%. The excessive use of the Flat response is explained by the cue analysis described later.
The four groups were defined by two levels of blur, and two background colors. Single-Blur resulted in an equivalent Snellen acuity of about 20/135 (Table 2). This acuity level would be considered moderate low vision, and better than the criterion acuity for legal blindness (20/200). Double-Blur had a Snellen equivalent of about 20/900, and would be considered severe low vision. The Black Background formed a high-contrast bounding contour with the gray targets (contrasts typically in the range 0.5 to 0.85, Table 1), while the Gray Background formed a much lower contrast bounding contour (typically 0.3 or less).
As expected, there were main effects of both blur level and background color (ANOVA statistics given above). The impact of blur exceeded the impact of background color. There was no significant blur by background interaction. Table 4 shows the target/response confusion matrices for each group.
There was a strong effect of blur level. For the high-contrast targets (Black Background), overall performance dropped from 88% correct for Single-Blur to 66% for Double-Blur. Inspection of the corresponding confusion matrices (Table 4: A & C) reveals a major contributor to this difference. Step Up was highly recognizable with Single-Blur (100%), but with Double-Blur it was frequently confused with Ramp Up. In the Double-Blur conditions, Flat was frequently confused with the other targets, implying that all distinguishing cues for the other targets were impaired.
There was a more modest effect of background contrast (see the confusion matrices, Table 4: A & B). For the Single-Blur targets, overall performance dropped from 88% correct for the high-contrast (Black Background) to 81% for the low-contrast conditions. In the low-contrast condition (Gray Background), accuracy for the Flat target dropped because of more confusions with Ramp Up and Ramp Down. Step Down was confused more often with Flat and Ramp Down. These effects were probably due to diminished visibility of the subtle “bend” cues for the ramps.
For the group tested under the most difficult viewing conditions (Double-Blur and the low-contrast Gray Background), overall performance was quite low (51%), but still above the chance level of 20%. From the confusion matrix (Table 4: D), we can see that even the usually visible Step Up had an accuracy of only 77.8%. The other targets were all close to or less than 50%.
Fig. 8 shows performance for the three lighting conditions (Overhead, Near Window, and Far Window) and four groups.
The differences due to lighting, although statistically significant (ANOVA statistics given above), were smaller than expected. The Overhead and Near Window conditions had similar overall mean performance levels of 70.7% and 69.8% respectively. The Far Window had slightly higher performance at 74.3%.
There was an interaction with background color (ANOVA statistics given above). For the two low-contrast groups (Gray Background), performance was better for the Far Window than for the other two lighting conditions. Inspection of the confusion matrices (not shown) revealed that performance for the Step Up target was substantially better for these groups in the Far Window condition. In this condition, the light source was located beyond the step so that the riser was not directly illuminated and appeared dark to the subject. This yielded a high value for the transition-contrast cue. Specifically, the Michelson contrast values for the Step Up in the three lighting conditions were Far Window: 0.72, Overhead: 0.34, and Near Window: 0.20. For the group with the black, high-contrast background and Single-Blur, there was no effect of lighting condition.
Fig. 9 shows that overall performance was similar at 5 ft (76.0%) and 10 ft (76.9%), and dropped at 20 ft (61.8%).
Unsurprisingly, for each of the groups considered separately, there was a significant effect of viewing distance on recognition accuracy. More surprisingly, for the group with the best viewing conditions (Black, Single-Blur), accuracy was higher at 10 ft (93.3%) than at 5 ft (81.7%). Examination of the confusion matrices shows that this group performed better at 10 ft because of reduced confusions among Flat, Ramp Up and Ramp Down. The Gray Single-Blur group also showed better performance at 10 ft (88.0%) than at 5 ft (84.2%; Fig. 9).
Overall, performance differences due to blur level and contrast increased with distance, and were greater for the blur manipulation than for the contrast manipulation. The exception was at 5 ft, where there was a major difference in performance between the Double-Blur Black condition (80.6%) and the Double-Blur Gray condition (57.8%). ANOVA statistics for the viewing distance by background color by blur level interaction are given above.
It is likely that the distance effects, especially those for the groups with Double-Blur, were related to difficulty in detecting localized cues for target recognition (see the discussion of Cue Analysis below).
The subject’s recognition task was to distinguish between the five targets: Step Up, Step Down, Ramp Up, Ramp Down and Flat. In this section, we present a theoretical analysis linking recognition performance to the visibility of the cues described in the Introduction and illustrated in Fig. 3.
We formulated the problem of recognition in two parts: first, the probabilities that the cues would be detected and available to the subject; and second, that the subject would make optimal use of the information provided by the available cues. By “detected and available”, we mean that the cue is not only above the threshold for detection but also that the subject looks at and notices the cue.
Our modeling goal was to estimate the probabilities of the cues, and to relate these values to the empirical values in the cells of a target-recognition confusion matrix such as Table 3.
To simplify the analysis, we assumed that the Bend Out cue for Ramp Up and the Bend In cue for Ramp Down were equally detectable, and that the three distinct picture-height cues—high, low and intermediate—were equally detectable and not confusable with one another.
This leaves us with five unknown cue probabilities: the probabilities of detecting the Transition Contrast and the Kink for Step Up, the L-Junction for Step Down, the Bend shared by the two ramps, and the Picture Height of the target.
The resulting pattern of cue probabilities for the five targets is shown in Table 5.
We assume that the targets are presented with equal probability, and that the subject chooses the most likely target, given the cues available. The computational problem we addressed is to derive the cue probabilities from empirical data taken from a target confusion matrix.
The key is to consider all possible configurations of cues that a subject might observe on a given trial, and the optimal recognition response for each configuration. We begin by compiling a list of all possible cue configurations and their probabilities (Table 6). For example, in the first configuration in the table, the subject detects only the Transition Contrast cue, and not the other two cues for Step Up. There is a nonzero probability of this cue configuration for the Step Up target only. The probability of seeing only this cue is the probability of detecting the Transition Contrast, times the probability of not detecting the Kink in the boundary contour, times the probability of not detecting the high Picture Height cue.
The probability of NOT detecting a cue is just one minus the probability of detecting it. In the following table, we use the notation P′ to designate this complementary value.
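Under the model's independence assumption, the probability of any configuration is the product of P or P′ over the cues. The computation behind Table 6 can be sketched as follows, with hypothetical detection probabilities rather than the fitted values:

```python
from itertools import product

# Sketch of the configuration-probability computation behind Table 6, under
# the independence assumption. The cue-detection probabilities below are
# hypothetical placeholders, not the fitted values from Table 7.
p_detect = {
    "transition_contrast": 0.9,  # Step Up: contrast at the sidewalk/riser transition
    "kink": 0.7,                 # Step Up: kink in the boundary contour
    "picture_height_high": 0.8,  # Step Up: high position in the picture plane
}

# Enumerate every detect/miss configuration of the three Step Up cues; the
# probability of a configuration is the product of P (detected) or
# P' = 1 - P (missed) over the cues.
config_prob = {}
for seen in product([True, False], repeat=len(p_detect)):
    prob = 1.0
    for (cue, p), detected in zip(p_detect.items(), seen):
        prob *= p if detected else (1.0 - p)
    config_prob[seen] = prob

# E.g., detecting only the Transition Contrast: 0.9 * (1 - 0.7) * (1 - 0.8)
print(round(config_prob[(True, False, False)], 3))  # 0.054
# The configurations exhaust all possibilities, so their probabilities sum to 1
print(round(sum(config_prob.values()), 6))  # 1.0
```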
Five steps are used in conjunction with Table 6 to estimate the five unknown cue probabilities, and the conditional probabilities represented by cells in a target confusion matrix such as Table 3. Details are provided in Appendix 1.
These steps were carried out for the confusion matrix in Table 3. This analysis produced the estimates for the five cue probabilities shown in the first row of Table 7. These cue probabilities were then plugged into the expressions for the cells of the confusion matrix (see Appendix 1) to produce the values shown in parentheses in the cells of Table 3. These parenthetical values represent the model fits to the confusion-matrix data.
Comparing the pairs of values in the cells of Table 3, we see that the pattern of model values has most of the same qualitative features as the data, but with some notable discrepancies. The following paragraphs discuss some of the details of the confusion matrix and the discrepancies between empirical and model values.
According to the model, if two targets do not have any cues in common, they should not be confusable, and the corresponding off-diagonal cells should be zero. The degree to which this is not true may be an indicator of unknown cues, guessing, or some other type of noise. For example, we expect zeros for the proportion of Step Down responses to the Step Up target, and of Ramp Down responses to the Step Up target; instead, these cells of the confusion matrix show 1.27% and 0.81% confusions, respectively. These values may imply a non-visual error (“lapse”) rate near 5%. (Assuming “lapse” responses are distributed uniformly across the five response alternatives, an error rate of 1% due to lapses in one cell of the confusion matrix implies a 5% lapse rate overall.)
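The lapse arithmetic is simple but worth making explicit. A one-line sketch:

```python
# Lapse-rate arithmetic from the text: if non-visual "lapse" responses are
# spread uniformly over the five response alternatives, each cell of a
# target's row receives one fifth of the lapses.
n_alternatives = 5
cell_error_pct = 1.0  # ~1% confusions in a cell the model predicts to be zero
implied_lapse_pct = cell_error_pct * n_alternatives
print(implied_lapse_pct)  # 5.0
```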
The Flat target attracts a substantial number of Ramp Up responses (14.2%) and Ramp Down responses (9.72%), implying that the intermediate picture height for the Flat target is sometimes confused with the high and low picture heights. Confusion of the picture height cues indicates a violation of one of the simplifying assumptions of the model.
Surprisingly, according to the model, the Flat stimulus, with only one potential cue (intermediate picture height), is predicted to have perfect accuracy and no confusions. This is because if the subject sees no cues (or just the intermediate picture height), the most likely target is Flat. The bottom row of the confusion matrix, representing data and predictions for the Flat target, shows the largest discrepancies. But most of the confusions are confined to Ramp Up and Ramp Down. Once again, contrary to the assumptions of the model, this result implies that the intermediate picture height is sometimes confused with the high or low picture height.
In brief, much of the discrepancy between the empirical and model values in the confusion matrix can be accounted for by two factors: there is a baseline “lapse” or error rate near 5%, and there are roughly 10% confusions between the intermediate picture height and either the high or low picture height. Appendix 2 describes corrections to the model to account for these two factors. With these exceptions in mind, the derived probabilities for the five cues and the computation of conditional probabilities provide a good quantitative account of the data in the confusion matrix.
In Table 7, we have computed the sets of cue probabilities for the three light sources and the three viewing distances. In these cases, the cue probabilities were estimated from confusion matrices in which data were collapsed across the four groups and other conditions.
In the Results section, we reported that overall target-recognition performance was slightly higher for the Far Window, compared with the Near Window and Overhead lighting conditions. From Table 7, the advantage for the Far Window is due to higher visibility of the Step Up cues (Transition Contrast and Kink), and the Bend cues for the ramps.
Overall performance was similar at 5 ft and 10 ft (near 76% correct) and dropped at 20 ft (61%). From Table 7, the high probabilities for the Step Up and Step Down cues (Transition Contrast, Kink and L-Junction) at 5 and 10 ft decline substantially at 20 ft. There is less impact of viewing distance on the picture-height cues.
There is a surprising reduction in the probabilities of the bend cues (associated with the ramps) at 5 ft compared with 10 ft. This difference in cue access underlies the observation made in the Results section that the group with the best performance (Black Background, Single-Blur) performed better at 10 ft than at 5 ft because there were fewer confusions between Flat, Ramp Up and Ramp Down. We speculate that these subjects sometimes missed the bend cues in the bounding contour of the sidewalk because, at the near viewing distance of 5 ft, the cues lay nearby on the ground plane, and subjects may often have failed to make the large downward shift in gaze (from straight ahead) required to look at them.
The model just described is an example of an independent feature model, similar to a naïve Bayes classifier. The visibilities of the cues are assumed to be determined independently. There are many ways this independence assumption could fail. For instance, a subject might use the height in the picture plane cue to narrow the target possibilities to Step Up or Ramp Up, and then test for the presence of Step Up by looking for one of the two diagnostic cues for Step Up.
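The decision rule of such an independent feature model can be sketched as a small naive Bayes classifier. The targets and cue structure follow the paper; the detection probabilities below are hypothetical placeholders, not the fitted values:

```python
# Sketch of the model's decision rule as a naive Bayes classifier over cue
# detections, assuming equal priors and independent detections. The
# probabilities are hypothetical placeholders, not fitted values.

CUES = ["transition_contrast", "kink", "l_junction", "bend",
        "height_high", "height_low", "height_mid"]

# P(detect cue | target); a cue a target does not produce has probability 0.
P_DETECT = {
    "Step Up":   {"transition_contrast": 0.9, "kink": 0.7, "height_high": 0.8},
    "Ramp Up":   {"bend": 0.5, "height_high": 0.8},
    "Step Down": {"l_junction": 0.7, "height_low": 0.8},
    "Ramp Down": {"bend": 0.5, "height_low": 0.8},
    "Flat":      {"height_mid": 0.8},
}

def classify(observed: set) -> str:
    """Return the most likely target given the set of detected cues."""
    best_target, best_likelihood = None, -1.0
    for target, probs in P_DETECT.items():
        likelihood = 1.0
        for cue in CUES:
            p = probs.get(cue, 0.0)
            # Multiply P for detected cues, P' = 1 - P for missed cues
            likelihood *= p if cue in observed else (1.0 - p)
        if likelihood > best_likelihood:
            best_target, best_likelihood = target, likelihood
    return best_target

print(classify({"transition_contrast", "height_high"}))  # Step Up
print(classify(set()))  # Flat
```

With no cues detected, the classifier defaults to Flat, mirroring the model's prediction that the Flat response absorbs trials on which no cue is seen.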
We have begun exploring visual accessibility by using psychophysical methods to study the visibility of ramps and steps in a simple, indoor, real-world environment. We have measured target recognition under blurry viewing conditions which reduced the effective acuity of normally-sighted subjects to acuities typical of moderate to severe low vision. We used our data, together with a probabilistic model, to estimate how the detectability of target cues varied across the stimulus and viewing conditions.
We learned early on that even in a well-controlled indoor environment, the interactions of lighting, target geometry, surface color, viewing distance and the subject’s vision status (high or low acuity) are exceedingly complex. This lesson reinforced our conclusion that visual accessibility is not reducible to reliable rules of thumb, and is not easily judged “on the fly” by people with normal acuity.
Despite the complications, we believe that some of our detailed results are likely to generalize to other environments. We comment on three examples.
First, a step up is usually more visible than a step down. Although tripping on either is undesirable, failing to see a step down is usually more dangerous than failing to see a step up. This asymmetry in visibility is primarily due to the luminance contrast between the riser of a step up and its contiguous surface planes. This contrast can be enhanced by a directional light source placed beyond the step and diluted by a directional light source in front of the step, as in the Far Window and Near Window conditions respectively.
Fig. 10 illustrates the asymmetry between the visibility of stairs going up and stairs going down. The figure shows the original photos alongside two versions with digital low-pass filtering illustrating the effects of moderate and severe blur. In this example, even with severe blur, the steps up are visible because the sunlit risers appear as high-contrast horizontal bands. The same stairs, seen from above, are invisible in severe blur.
Second, subtle changes in the edge contours of a walkway provide local cues for steps and ramps. Our results show that these cues are quite fragile, depending on viewing distance and on the contrast between the walkway and its background. Unsurprisingly, these cues become less visible at a long viewing distance (presumably because of acuity limitations). More surprisingly, the bends in the profiles associated with ramps sometimes become less visible at a very near viewing distance.
Third, a more robust set of cues may be the “height in the picture plane” cues associated with the gaze elevation of the walkway beyond the step or ramp transition. These cues are likely to be closer to the straight-ahead viewing direction and less dependent on acuity than the local geometric cues discussed in the previous paragraph. The nature of picture-height cues will vary, depending on the length of the walkway beyond the step/ramp transition.
In this study, we have relied on normally sighted subjects blurred to two levels of acuity. There are two obvious future steps. One is to extend the research to people with low vision. This will inevitably include consideration of visual field size, a factor not addressed in the present study. A second step is to forge a quantitative link between the acuity and/or contrast sensitivity of subjects and the visibility of localized geometric or contrast cues. We can illustrate this linkage with a simple example.
Consider the only unique cue for the Step Down target in our study, the L-shaped discontinuity in the edge profile of the sidewalk (Fig. 3B). The visibility of the L-Junction almost certainly depends on the observer’s acuity and viewing distance. Fig. 11 shows a plot of the angular subtense of the L-Junction as a function of viewing distance. The curve is based on simple trigonometry. The intersections of the horizontal dashed lines with the curve indicate the viewing distances at which people with different acuity levels could resolve the L-Junction. Somebody with 20/200 acuity might see the cue at 4 m, but somebody with 20/500 acuity would not see the cue until the viewing distance shrank nearly to zero, as they stepped over the edge. People with even poorer acuity might not see the step at all.
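The kind of trigonometric linkage plotted in Fig. 11 can be sketched as follows. The projected size assumed for the L-Junction (1 cm) and the visibility criterion (subtense exceeding the observer's minimum angle of resolution) are illustrative assumptions, not the values behind the figure.

```python
import math

def subtense_arcmin(size_m, dist_m):
    """Angular subtense (arcmin) of a feature with projected size size_m
    viewed at distance dist_m."""
    return 60 * math.degrees(2 * math.atan(size_m / (2 * dist_m)))

def mar_arcmin(snellen_denom):
    """Minimum angle of resolution for 20/X acuity (20/20 = 1 arcmin)."""
    return snellen_denom / 20

def max_visible_distance(size_m, snellen_denom):
    """Farthest distance at which the feature's subtense still exceeds
    the observer's MAR (the visibility criterion assumed here)."""
    mar_rad = math.radians(mar_arcmin(snellen_denom) / 60)
    return size_m / (2 * math.tan(mar_rad / 2))

# Hypothetical projected size of the L-Junction discontinuity: 1 cm.
for snellen in (200, 500):
    print(snellen, round(max_visible_distance(0.01, snellen), 2))
```

Under these assumptions the maximum visible distance shrinks rapidly as acuity worsens, which is the qualitative point of Fig. 11.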
We thank Dan Kersten, Rob Shakespeare and Bill Thompson for their advice and suggestions. This research was supported by NIH grant EY017835.
There are five unknown cue probabilities to be estimated—P1, P2, P3, P4 and P5. The cues with probabilities P1 and P2 occur only for Step Up and always occur together. Although we conceive of them as distinct cues, our measurement procedure did not permit us to distinguish their separate effects. For that reason, our estimates of their probabilities are always equal. In principle, stimuli could be designed to estimate separate probabilities for these two cues.
For simplicity of notation, we designate the five targets with the letters A–E as follows: A = Step Up, B = Step Down, C = Ramp Up, D = Ramp Down, and E = Flat.
First, we show how the conditional probabilities representing the cells of the stimulus/response confusion matrix are written in terms of the unknown cue probabilities.
We use lowercase letters to represent a response and uppercase letters to represent the targets. For instance, the conditional probability of responding “Ramp Up” when the target is “Step Up” is written P(c|A).
What is the probability of the correct response “a” given the target A, denoted P(a|A)? The target A could generate any of the 8 cue configurations (1, 2, 3, 4, 5, 6, 13 and 16) shown in the Step Up column of Table 6 and the corresponding probabilities add to 1.0. The six configurations involving P1 and/or P2 all include diagnostic cues and would generate an “a” response. Cue configuration 13, in which only high Picture Height is detected, could result from either target A or target C. An ideal observer would choose the maximum of these two probabilities, i.e.:
When the cue configuration is “high Picture Height”, choose ‘a’ if P′1*P′2*P5 is Max, and choose ‘c’ if P′4*P5 is Max.
Since both contain P5, this term can be factored out, leaving the rule:
When the cue configuration is “high Picture Height”, choose ‘a’ if P′1*P′2 > P′4; otherwise choose ‘c’.
If failing to detect the single diagnostic cue for Ramp Up is more likely than failing to detect both of the diagnostic cues for Step Up, the optimal response would be ‘c’. We assume this to be the case, to be confirmed when the cue probabilities are derived.
Finally, configuration 16, “no cues visible”, could also result from target A. In fact, all five targets could yield configuration 16. But, from the probabilities in the table, it is evident that target E (Flat) will always have the highest probability in this case. So the optimal decision rule will be:
When the cue configuration is “None”, choose Target E.
Given these considerations, the probability of responding ‘a’, given target A, is the sum of the six probabilities associated with the diagnostic cues:
Next, what is the probability of responding ‘b’ given target A? The only cue configuration that is common to B and A is the “None” configuration, and we have already established that the optimal decision is to respond ‘e’ for this configuration. Therefore:
What is the probability of responding ‘c’ given target A? Configuration 13, when only high Picture Height is detected, is the only case in which C can be confused with A. In the above discussion, we assumed that if only high Picture Height is seen, the probability generally favors C. So, the probability of response ‘c’, given target A, is the probability that high Picture Height occurs when A is the target:
What is the probability of response ‘d’ given target A? Since there is no cue configuration common to these two targets, except for the “None” condition:
Finally, the probability that response ‘e’ is given when A is the target is equal to the probability that none of the cues is detected when A is presented (configuration 16):
To summarize, all of the conditional probabilities for A:
The sum of these 5 probabilities is equal to 1.0, yielding a constraint equation:
What is the probability of responding ‘b’ given stimulus B, denoted P(b|B)?
Configurations 7 and 8 have the L-Junction diagnostic cue. Configuration 15 is low Picture Height, which could be produced by targets B or D. B would be more likely if P3 is less than P4 and D would be more likely if P3 is greater than P4. For simplicity, we adopt the case that P3 > P4; that is, that the L-Junction is more detectable than the Bend cues; this seems to be consistent with our observations. Finally, configuration 16 (the “None” case) can also result from target B, but, as discussed above, the optimal choice in this case would be target E. From these considerations, we get the following set of conditional probabilities:
Finally, there is the constraint that summing over all of the probabilities of outcomes, given Target B, must add to 1.0:
Target C can generate four cue configurations. Configurations 9 and 10 have the Bend diagnostic cues. Configuration 13 is high Picture Height, which could also be produced by A; but above, we provisionally decided that Target C will be most probable when configuration high Picture Height occurs. Finally, the null configuration could result from C, but would yield a response E. The resulting conditional probabilities are:
Using similar arguments:
Target E can generate configuration 14, with the cue for intermediate picture height only, or the “None” configuration. In both of these configurations, the optimal response is ‘e’. So, the resulting set of conditional probabilities is simple:
This implies a very high accuracy for “Flat”. Whenever Flat occurs, the diagnostic cue intermediate Picture Height is visible or no cue is visible. And when no cue is visible, the optimal choice is Flat.
To illustrate, we solve for the unknown cue probabilities using the confusion matrix in Table 3. This matrix combines across all groups and conditions, and represents the overall pattern of recognition and confusion for the five targets.
We designate the empirical values in the confusion matrix as mij where i is the response given stimulus j. For instance mad is the value from the confusion matrix for response ‘a’ given target D.
From the equations for target A:
Let k = P′1*P′2; then,
From Eq. 3: kP5 = mca
From Eq. 5: k[1 − P5] = mea
Solving for P5 by taking the ratio of these two equations: P5 = mca/(mca + mea). From the empirical confusion matrix, mca = 0.054 and mea = 0.026, so P5 = 0.68.
The value of k constrains P1 and P2: adding Eqs. 3 and 5 gives k = P′1*P′2 = mca + mea = 0.080.
Eq. 1 also produces a constraint on P1 and P2 which reduces to:
These two constraint equations are symmetric in P1 and P2 and cross approximately at P1 = P2 = 0.70. We will take these values as the estimates for the probabilities of these cues.
First, we assume the case that P3 > P4, that is, that the L-Junction is more detectable than the outward or inward Bend cues. (This assumption will be confirmed by the derived values.)
By the same ratio, P5 = mdb/(mdb + meb); with mdb = 0.14 and meb = 0.13, P5 = 0.52.
This is a second estimate for P5. We average the two estimates for P5 of 0.68 and 0.52, to yield an overall derived value of P5 = 0.60.
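The arithmetic behind the two P5 estimates can be sketched in a few lines (cell values as quoted from Table 3; for target B the common factor is P′3 rather than k, but it cancels in the ratio the same way):

```python
def estimate_p5(m_height_seen, m_nothing_seen):
    """Solve the pair k*P5 = m1 and k*(1 - P5) = m2 for P5."""
    return m_height_seen / (m_height_seen + m_nothing_seen)

p5_a = estimate_p5(0.054, 0.026)    # from target A cells m_ca, m_ea
p5_b = estimate_p5(0.14, 0.13)      # from target B cells m_db, m_eb
p5 = (p5_a + p5_b) / 2              # overall derived value
```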
And we solve for:
From Eq. 8, we would expect P(b | B) to be equal to P3 = 0.72 which is close to the empirical value mbb of 0.68.
From Eq. 15, P4 = 0.22
From Eq. 17, P4 = 0.43
Once again assuming P3 > P4, and taking P5 = 0.60, we obtain two more estimates for P4:
From Eq. 22, P4 = 0.08.
From Eq. 23, P4 = 0.43.
If we average the four estimates of P4, we obtain an overall mean value of P4 = 0.29.
Summarizing the estimated values of the five unknown probabilities:
P1 = 0.70, Transition Contrast for Step Up.
P2 = 0.70, Kink for Step Up.
P3 = 0.72, L-Junction for Step Down.
P4 = 0.29, Bend cues for Ramp Up and Ramp Down.
P5 = 0.60, Height in the Picture Plane.
In the Cue Analysis section, we identified two aspects of the empirical confusion matrices providing evidence for departures from our simple cue model.
First, there is evidence for a “lapse” rate, that is, a small proportion of trials in which subjects guess the target identity without paying attention to the cues. The lapse rate can be estimated from the response rate in cells predicted to have rates of 0 by the cue model. For instance, for the confusion matrix in Table 3, the model predicts 0 Ramp Down responses for a Step Up target, but the empirical value is 0.81%. Because there are five possible targets, and given the simplest assumption that guessing responses are randomly and uniformly distributed across targets, the value of 0.81% in one cell implies an overall guessing rate five times larger, i.e., 4.05%. This is the most conservative estimate of the lapse rate for the confusion matrix in Table 3.
We can “correct” for the lapse rate by deriving a modified confusion matrix without the lapse trials. Let the proportion of lapse trials be G (i.e., 0.0405 in the above example). Since lapse trials contribute G/5 to every cell, the “corrected” value Pc for each proportion P in the empirical confusion matrix is Pc = (P − G/5)/(1 − G).
For example, for a cell with P = 0.02 (a 2% response rate) and a lapse rate of G = 0.0405, the corrected proportion Pc is 0.0124.
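As a sketch, the correction can be implemented as below, assuming lapse guesses are distributed uniformly over the five responses (so each cell receives G/5 from lapse trials and the remainder is renormalized):

```python
def lapse_corrected(p, g, n_responses=5):
    """Remove a uniform guessing component: a fraction g of trials are
    lapses contributing g/n_responses to every cell; the remaining
    (1 - g) of trials follow the cue model."""
    return (p - g / n_responses) / (1 - g)

G = 0.0405                      # lapse rate inferred from Table 3
pc = lapse_corrected(0.02, G)   # approximately 0.0124
```

Note that the correction raises proportions above 1/5 and lowers those below it, matching the direction of the changes described for Table 8.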
We applied the correction formula to all cells in the empirical confusion matrix in Table 3, producing a lapse-corrected confusion matrix. The effect of the lapse correction is to slightly increase high proportions in the confusion matrix and slightly decrease low values.
There are three Picture Height cues: high for Step Up and Ramp Up, intermediate for Flat, and low for Ramp Down and Step Down. In our modeling, we assumed that subjects never confused these three values, and that the three picture height cues functioned independently. But our data suggest some confusions among the three cues. For example, for the Flat stimulus, the only cue is the intermediate Picture Height. But, the confusion matrix in Table 3 indicates that on 14.2% of Flat trials, subjects respond Ramp Up, and on 9.72% of trials they respond Ramp Down.
A simple approach for taking these Picture Height confusions into account is to discount trials in which they occur. Taking 9.72% as a lower bound on the proportion of trials with a confusion, we would subtract 9.72% from the Ramp Down and Ramp Up confusions for Flat, and increase the proportion correct for Flat by 19.44%. This is equivalent to throwing out Flat trials in which the Picture Height confusion occurs.
We further assume that the same 9.72% rate of picture-height confusions occurs in the reverse direction, with subjects responding Flat to the Ramp Up and Ramp Down stimuli. We correct the confusion matrix by reducing the corresponding rates of confusion with Flat and increasing the hit rates for Ramp Up and Ramp Down by 9.72%.
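The adjustment for the Flat row amounts to the following arithmetic (cell values as quoted from Table 3; the analogous subtraction applies to the Flat-response cells of the Ramp rows):

```python
ph = 0.0972                                  # lower-bound confusion rate
flat_row = {"RampUp": 0.142, "RampDown": 0.0972}

# Discount the Ramp responses attributable to picture-height confusions
# and credit those trials back to the correct Flat response.
adjusted = {resp: rate - ph for resp, rate in flat_row.items()}
flat_gain = len(flat_row) * ph               # 0.1944 added to Flat's hit rate
```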
We modified the confusion matrix in Table 3 by implementing the foregoing corrections for lapse rate and picture height confusions. Table 8 shows the corrected values (upper values in each cell). We then applied the cue analysis to the corrected confusion matrix. The values in parentheses are the model fits. In general, the model fits the data in Table 8 better than in Table 3. This is to be expected because the corrected values in Table 8 have, to some degree, discounted the empirical factors deliberately excluded from the model.