Existing observer models developed for studies with the external noise paradigm are strictly only applicable to target detection or identification/discrimination of orthogonal target(s). We elaborated the perceptual template model (PTM) to account for contrast thresholds in identifying non-orthogonal targets. Full contrast psychometric functions were measured in an orientation identification task with four orientation differences across a wide range of external noise levels. We showed that observer performance can be modeled by the elaborated PTM with two templates that correspond to the two stimulus categories. Sampling efficiencies of the human observers were also estimated. The elaborated PTM provides a theoretical framework to characterize joint feature and contrast sensitivity of human observers.
Human performance in detection, discrimination or identification tasks depends on many factors, including stimulus factors such as signal contrast, magnitude and distribution of external noise, and discrimination precision, task factors such as workload, and decision structure, as well as the state of the observer (e.g., attention, fatigue, cognitive factors, etc.). However, these factors have mostly been studied in isolation (but see ). Perceptual sensitivity, with and without manipulations of task factors, is typically measured in two different ways: (1) as contrast threshold at a particular level of stimulus difference or (2) as feature threshold at a constant (usually high) contrast. Using contrast threshold as the dependent measure, contrast sensitivity studies are usually based on detection of a single stimulus or discrimination of a pair of stimuli with a large and fixed feature difference (e.g., discrimination of two Gabors of ±45°; Figure 1a). On the other hand, studies using feature threshold as the dependent variable usually keep stimulus contrast at a relatively high level and often involve small feature differences (e.g., discrimination of two 100%-contrast Gabors of ± θ; Figure 1b). Examples include the hyperacuity and vernier acuity studies[2–4]. At the theoretical level, virtually all the existing observer models for external noise studies have been developed and tested in the contrast domain, for target detection or identification of orthogonal (or nearly orthogonal) targets[2–6]. In this study, we developed and tested a new form of the Perceptual Template Model (the “elaborated PTM”) to consider identification or discrimination of non-orthogonal targets in high precision discriminations. This new development provides a quantitative account of feature difference manipulations and an integrated model for both forms of measurements.
Measures of perceptual performance, such as spatial and/or temporal contrast sensitivity functions[7–15] or feature thresholds[16–21], provide the basic building blocks of our understanding of normal and clinical vision. To explain perceptual performance at a more fundamental level, psychophysicists have constructed observer models based on external noise studies that focus on intrinsic limitations of the observer. By separating intrinsic perceptual limitations of the observer from the characteristics of the input stimuli, an observer model provides a framework to generalize results from a particular experiment to predict observer performance in other tasks using different input stimuli but the same observer characteristics. Indeed, it has been shown that many observer characteristics are invariant across different perceptual tasks. This makes the method very useful because it attributes a wide range of limitations in perceptual sensitivity to a few observer limitations.
Behavioral approaches, including many psychological paradigms, have been developed to reveal internal limitations of the perceptual processes and constrain observer models. One major internal limitation, the variability in perceptual processing, is illustrated by inconsistent performance of the observers when they are given the same stimuli multiple times[2, 23]. Due to information loss during neural transmission, sampling errors from receptors, random bursts of neuronal spikes, and/or internal variation of stimulus representation, the variability can be collectively modeled by equivalent internal noises that produce the degree of inefficiency exhibited by the perceptual system[2, 5, 24–26]. Many behavioral paradigms have been developed to “externalize” the variability of the internal responses by adding external noise to the input stimulus against which to measure the perceptual variabilities. These include various procedures related to critical band masking, the equivalent input noise method[2, 4, 28–30], the double-pass consistency test[2, 23], and the classification image method. We focus on the equivalent input noise method in this article.
The equivalent input noise method was originally developed by engineers to measure the intrinsic noise of electronic amplifiers[32–34] and later adopted by sensory psychologists to measure the internal noise of the perceptual system[4, 29, 30, 35] (see  for a recent review). The basic idea is that the perceptual system functions like a noisy amplifier and the internal noise can be estimated by systematically manipulating the magnitude of the external noise superimposed on signal stimuli and measuring threshold versus external noise (TvC) functions – signal stimulus energy required for an observer to maintain a certain level of performance as a function of the contrast of the external noise.
The equivalent input noise method has been used to reveal internal noise in perceptual processes in a wide range of auditory[31, 36–42] and visual tasks[4, 5, 24, 26, 28–30, 35, 43–52]. The paradigm has also been further developed to investigate mechanisms underlying effects of various cognitive, developmental, and disease states on the perceptual system[5, 6, 53–55].
Observer models developed for external noise studies, for either target detection or for discrimination or identification of orthogonal (or nearly orthogonal) targets, include the linear amplifier model, the induced noise model, the linear amplifier model with decision uncertainty, the induced noise and uncertainty model, and the perceptual template model.
In a linear amplifier model (LAM), perceptual thresholds are determined by two factors: internal additive noise and the observer’s sampling efficiency[4, 22]. The LAM models threshold versus external noise contrast (TvC) functions at a single performance level, but generally fails to account for TvCs at different performance levels. Moreover, estimates of internal additive noise and sampling efficiency from the LAM depend on the particular performance criterion level for which contrast thresholds are defined and measured. Various fixes to the LAM have been proposed, including the addition of decision uncertainty, induced noise, both decision uncertainty and induced noise, or non-linear transducer and multiplicative noise. Lu and Dosher conducted a systematic and comprehensive review of the external noise paradigms and all the existing observer models developed to account for performance in external noise studies. They concluded that the five-component perceptual template model, with a perceptual template, a non-linear transducer function, internal additive noise, internal multiplicative noise, and a decision structure, provides the best account of all the existing data on target detection or discrimination of orthogonal targets in the visual domain.
There has been a significant parallel development of observer models in pattern vision[57–69]. For example, in pattern masking studies, instead of external noise, pattern masks (e.g., sine-waves of the same or different frequencies, orientations, etc) are used to probe the properties of the visual system. Observer models developed and tested in pattern masking studies usually consist of multiple low-level channels and a pooling stage that computes a weighted sum of the outputs of the low-level channels[66–68]. In contrast, observer models developed in external noise studies use a simplified notion, the perceptual template, to represent the overall sensitivity of the perceptual system, corresponding to the weighted contributions of low-level channels in multi-channel models, without referring explicitly to the low-level visual channels. The benefits of this formulation are that (1) it relies on very few assumptions about the low-level visual channels, and (2) it uses fewer model parameters to describe a large range of data in external noise studies. The downside is that the formulation makes it difficult to model interactions of low-level visual channels. On the other hand, although they focus on the different properties of the visual system (non-linearity versus internal noise, interactions of low-level channels versus a global template), the functional forms of some of the observer models in external noise studies and pattern masking studies are very similar (see ). In this article, we describe our attempt to elaborate an observer model developed for external noise studies.
In this study, data were collected in a Gabor orientation identification task over a wide range of conditions (4 [orientation differences] × 6 [external noise levels] × 5 [signal contrasts]) for three observers. This large parametric data set gave us an opportunity to use an ideal observer analysis to estimate human sampling efficiencies over a wide range of experimental conditions. This was done in three different ways. First, we simulated an ideal observer in all the experimental conditions and estimated sampling efficiencies based on the simulations. These estimated sampling efficiencies depended on the performance level at which contrast threshold was defined because the simulated ideal observer is a linear model that cannot adequately capture the non-linear properties of the human perceptual processes. To illustrate this point, we performed another ideal observer analysis based on the linear amplifier model, which is essentially an ideal observer model. In the LAM, human inefficiencies are attributed to internal additive noise and sampling efficiency relative to the ideal observer. Although traditional ideal observer analysis focuses only on experimental conditions in which external noise is so high that effects of internal noise can be ignored, the LAM-based analysis includes a wide range of external noise conditions. The method explicitly considers and discounts effects of internal noise on human performance in the estimation of calculation efficiency. The LAM-based analysis is consistent with the simulation-based analysis in that both yield sampling efficiency estimates that depend on performance level. Finally, we explored the relationship between the ePTM and ideal observer analysis. By treating sampling efficiency as a model parameter in the more complex ePTM, we obtained performance and task independent estimates of sampling efficiency for each observer.
The goal of this study is to elaborate the perceptual template model to incorporate tasks involving identification or discrimination of non-orthogonal targets. We first describe an experiment that jointly manipulated the magnitude of external noise and the degree of target feature difference. We then present an elaborated PTM (ePTM) that explicitly considers target feature difference and document its ability to account for the experimental data.
Using the method of constant stimuli, we measured full contrast psychometric functions of three observers in an orientation identification task in varying amounts of external noise at the fovea. Four orientation differences (±3°, ±6°, ±15°, and ±45° from vertical) were examined, separated into mini-blocks in each session, across six levels of external noise. Three threshold versus external noise contrast (TvC) functions, at criterion performance levels of 65, 75, and 85% correct, were estimated in each orientation difference condition.
Two naive observers (CB, JS) and the first author (SJ) participated in the experiment. All observers had corrected-to-normal vision and were experienced in psychophysical studies.
The experiment was conducted on a Macintosh Power G4 computer, running MATLAB with Psychtoolbox extensions. All displays were shown on a 17 inch Apple Studio Display monitor with a refresh rate of 120 frames/sec. The screen resolution was set to 640×480. A special circuit was used to produce a monochromatic signal of high grayscale resolution (>12.5 bits). Gray levels were linearized using a psychophysical procedure. The available display contrast ranged from −1.0 to 1.0. All displays were viewed binocularly with natural pupils at a distance of approximately 72 cm. A chinrest was used for observers to maintain head position throughout the experiment.
The signal stimuli were Gaussian-windowed sinusoidal gratings, oriented ±3°, ±6°, ±15°, and ±45° from vertical. The luminance profile of the Gabor stimulus is described by:
where c is the contrast of the Gabor, L0 is the background luminance, set in the middle of the dynamic range of the display (Lmin = 1 cd/m²; Lmax = 55 cd/m²), f = 1.92 cycles/deg is the center spatial frequency of the Gabor, and s = 0.52 deg is the standard deviation of the Gaussian window. The Gabors were rendered on a 64 × 64 pixel grid, extending 2.78° × 2.78° of visual angle.
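A minimal Python sketch of how such a stimulus could be rendered from the parameters above. The sine-phase carrier, the sign convention for orientation, and the function name `gabor_luminance` are illustrative assumptions rather than details given in the text; L0 is taken as the midpoint of Lmin and Lmax.

```python
import math

L_MIN, L_MAX = 1.0, 55.0          # cd/m^2, from the text
L0 = (L_MIN + L_MAX) / 2.0        # mid-range background luminance (28 cd/m^2)
F_CPD = 1.92                      # center spatial frequency, cycles/deg
SIGMA_DEG = 0.52                  # SD of the Gaussian window, deg
N_PIX = 64                        # grid size in pixels
SIZE_DEG = 2.78                   # grid extent, deg


def gabor_luminance(contrast, theta_deg):
    """Return a 64 x 64 nested list of luminances (cd/m^2) for one Gabor.

    Assumed form (not copied from the article):
      L(x, y) = L0 * (1 + c * sin(2*pi*f*(x*cos(t) + y*sin(t)))
                          * exp(-(x^2 + y^2) / (2*s^2)))
    with t measured from vertical, so t = 0 gives vertical bars.
    """
    theta = math.radians(theta_deg)
    deg_per_pix = SIZE_DEG / N_PIX
    half = (N_PIX - 1) / 2.0      # center the grid on fixation
    image = []
    for row in range(N_PIX):
        y = (row - half) * deg_per_pix
        line = []
        for col in range(N_PIX):
            x = (col - half) * deg_per_pix
            phase = 2 * math.pi * F_CPD * (x * math.cos(theta) + y * math.sin(theta))
            window = math.exp(-(x * x + y * y) / (2 * SIGMA_DEG ** 2))
            line.append(L0 * (1.0 + contrast * math.sin(phase) * window))
        image.append(line)
    return image
```

Calling `gabor_luminance(0.5, 3.0)` and `gabor_luminance(0.5, -3.0)` would give the two members of the ±3° pair under these assumptions.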
External noise images were constructed using 2 × 2 pixel elements (0.087° × 0.087°). In every trial, the contrasts of all the noise elements were drawn randomly and independently from the same Gaussian distribution with mean 0 and one of six standard deviations: 0, 0.05, 0.07, 0.12, 0.20 and 0.33. Because the display contrast ranges from −1.0 to 1.0, a sample with standard deviation of 0.33 conforms reasonably well to a Gaussian distribution. Both signal and noise images were centered at fixation.
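The noise construction described above can be sketched as follows. The clipping of out-of-range samples to [−1, 1] and the function name `noise_image` are assumptions for illustration; the text does not say how the rare samples beyond the display range were handled.

```python
import random

NOISE_SDS = (0.0, 0.05, 0.07, 0.12, 0.20, 0.33)   # from the text


def noise_image(sd, n_pix=64, element=2, rng=random):
    """Return an n_pix x n_pix contrast image made of Gaussian noise elements.

    Each element x element block shares one contrast drawn from N(0, sd^2).
    Clipping to [-1, 1] is an assumption, not a detail from the article.
    """
    n_el = n_pix // element
    checks = [[max(-1.0, min(1.0, rng.gauss(0.0, sd))) for _ in range(n_el)]
              for _ in range(n_el)]
    # Expand each noise element to cover element x element pixels.
    return [[checks[r // element][c // element] for c in range(n_pix)]
            for r in range(n_pix)]
```

At the largest standard deviation (0.33), the display limits of ±1.0 sit at about ±3 standard deviations, so only a fraction of a percent of samples would be affected by the clipping.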
The method of constant stimuli was used to measure threshold versus external noise contrast (TvC) functions in four Gabor orientation difference conditions (±3°, ±6°, ±15°, and ±45° from vertical). In each external noise condition, the psychometric function for the two-alternative forced-choice identification task was sampled at five signal stimulus contrasts, determined from pilot tests to span the full range of performance levels. There were therefore a total of 4 [orientation differences] × 6 [external noise levels] × 5 [signal contrasts] conditions.
To reduce decision uncertainty, the four orientation difference conditions were run in separate mini blocks of 30 trials each. Within each block, there was one trial from each of the 30 [external noise × signal contrast] conditions. Each experimental session consisted of 40 mini-blocks, 10 for each orientation difference condition. The order of trials in each mini-block and the order of mini-blocks were both randomized in each session. All observers ran 10 sessions of 1200 trials, for a total of 12,000 trials or 100 trials per experimental condition.
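The trial counts in this design can be checked with a few lines of arithmetic; the variable names are illustrative.

```python
n_orient, n_noise, n_contrast = 4, 6, 5

conditions = n_orient * n_noise * n_contrast          # 120 conditions
trials_per_block = n_noise * n_contrast               # one trial per noise x contrast cell
blocks_per_session = 10 * n_orient                    # 10 mini-blocks per orientation difference
trials_per_session = blocks_per_session * trials_per_block
total_trials = 10 * trials_per_session                # 10 sessions
trials_per_condition = total_trials // conditions
```

These values reproduce the 1200 trials per session, 12,000 trials in total, and 100 trials per condition reported above.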
At the beginning of each mini-block and each trial, observers were reminded of the orientation difference condition by a text string (e.g., “3 deg”) in the center of the display. Each trial began after the observer read the string and pressed the space bar on the computer keyboard. This was followed by a display sequence, consisting of a 500 ms fixation cross, an 8.3 ms external noise image, an 8.3 ms signal image, another 8.3 ms independent external noise image, and a blank screen until the response, all presented in the center of the monitor. Observers identified the orientation of the Gabor stimulus, using keys “s”, “d”, or “f” for counterclockwise orientations and “j”, “k”, or “l” for clockwise orientations. The six keys were used to reduce finger errors; observers generally used “f” and “j”. A system beep followed each incorrect response.
A total of 24 psychometric functions were obtained from each observer, one for each orientation difference and external noise condition (Figure 2), sampled at five pre-defined signal contrast levels selected for each external noise condition. Observers exhibited very small overall bias in their choice of the Left/Right responses: 47.5% vs. 52.5%, 51.2% vs. 48.4%, and 50.8% vs. 49.2% for CB, JS, and SJ, respectively.
The psychometric functions were first fit with the Weibull:
where c is the signal contrast, τ is the threshold, η is the slope of the psychometric function, ξ = 0.5 represents the chance performance level, and λ represents observer’s lapse rate. A maximum likelihood procedure was used. The likelihood is defined as a function of the total number of trials Ni, the number of correct trials Ki, and the percent correct predicted by Eq. 2 in each experimental condition, i:
where Π runs across all the experimental conditions for an observer.
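A minimal Python sketch of the fitting ingredients just described. The Weibull parameterization shown, P(c) = ξ + (1 − ξ − λ)(1 − exp(−(c/τ)^η)), is an assumed common form that uses the symbols defined above; it may differ in detail from the article's Eq. 2. The likelihood is the usual binomial product over conditions, computed in log form. Function names are illustrative.

```python
import math


def weibull_pc(c, tau, eta, lapse, chance=0.5):
    """Assumed form: chance + (1 - chance - lapse) * (1 - exp(-(c / tau) ** eta))."""
    return chance + (1.0 - chance - lapse) * (1.0 - math.exp(-((c / tau) ** eta)))


def log_likelihood(n_trials, n_correct, p_pred):
    """Binomial log-likelihood summed over conditions i (log of the product over i)."""
    total = 0.0
    for n, k, p in zip(n_trials, n_correct, p_pred):
        p = min(max(p, 1e-12), 1.0 - 1e-12)            # guard against log(0)
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += log_binom + k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total
```

A maximum likelihood fit would search over τ, η, and λ for the values that maximize `log_likelihood` given the observed counts.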
Nested-model tests based on χ2 statistics were used to compare constrained (reduced model) and unconstrained (full model) fits to the psychometric functions:
where df = kfull − kreduced.
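The nested-model comparison can be sketched as below, assuming the usual likelihood-ratio statistic, χ²(df) = 2[ln L(full) − ln L(reduced)]. The tail probability uses a closed form that holds only for even df, which covers the df values of 16 and 20 reported in this section; function names are illustrative.

```python
import math


def chi2_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic (assumed form): 2 * (ln L_full - ln L_reduced)."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid only for even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total
```

A large p-value from `chi2_sf_even_df` indicates that the extra parameters of the full model do not improve the fit reliably.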
The following constraints were used in fitting the psychometric functions: (1) each observer has a single lapse rate (λ) across all the orientation difference and external noise conditions, and (2) in each orientation difference condition, the psychometric functions in all the external noise conditions have the same slope (η). The Weibull accounted for 80.1%, 87.7%, and 90.2% of the variance for observers CB, JS, and SJ, respectively. The constrained psychometric function fits are statistically equivalent to the unconstrained models in which an independent slope value was assumed for each external noise level in each orientation difference condition (χ²(20) = 0.3416 and 0.3336 for CB and SJ, respectively, and χ²(16) = 0.3484 for JS; p > 0.90 for all observers).
The best fitting Weibull functions are shown as smooth curves in Figure 2. The parameters of the best fitting model are listed in Table 1. The lapse rate was very low (<1.4%) across the board. The two most prominent features of the family of psychometric functions are: (1) In each orientation difference condition, the psychometric functions shifted to the right as the external noise increased, and (2) the slope of the psychometric functions increased as the orientation difference increased from ± 3° to ± 45°.
Contrast thresholds at performance levels of 65%, 75% and 85% correct, corresponding to d′’s of 0.5449, 0.9539, and 1.4657, were computed from the best fitting Weibull functions. The thresholds are plotted as TvC functions in each orientation difference condition in Figure 3. The standard deviation of each threshold was calculated using a re-sampling method[5, 75].
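The correspondence between percent correct and d′ quoted above follows from the standard relation for an unbiased two-alternative task, d′ = √2 · z(Pc). The sketch below computes it and also inverts a Weibull function for the threshold at a target performance level; the Weibull form in the docstring is an assumed parameterization, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def dprime_2afc(p_correct):
    """d' = sqrt(2) * z(Pc) for an unbiased two-alternative task."""
    return math.sqrt(2.0) * NormalDist().inv_cdf(p_correct)


def weibull_threshold(p_target, tau, eta, lapse, chance=0.5):
    """Contrast at which an assumed Weibull function reaches p_target.

    Assumed form: P(c) = chance + (1 - chance - lapse) * (1 - exp(-(c / tau) ** eta)).
    """
    frac = (p_target - chance) / (1.0 - chance - lapse)
    return tau * (-math.log(1.0 - frac)) ** (1.0 / eta)


dprimes = [dprime_2afc(p) for p in (0.65, 0.75, 0.85)]
```

Evaluating `dprimes` gives values that agree with 0.5449, 0.9539, and 1.4657 to four decimal places.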
The manipulation of external noise contrast was highly effective. Averaged across discrimination precision conditions and performance levels, thresholds increased 236%, 177%, and 340% from the zero external noise condition to the highest external noise condition for CB, JS, and SJ, respectively (note: the highest external noise condition for JS is lower than that for CB and SJ). Decreasing orientation difference (and thus increasing the discrimination precision) had two effects on the TvC functions. It increased thresholds in all the external noise conditions. Averaged across external noise conditions and observers, thresholds increased 32%, 69%, and 156% from the lowest discrimination precision condition (± 45°) to the highest discrimination precision condition (± 3°). It also increased the threshold ratio between different performance levels for each external noise condition. Averaged across external noise levels and observers, the threshold ratio between 85% correct and 75% correct performance levels increased from 1.2 to 1.4; the threshold ratio between 75% correct and 65% correct performance levels increased from 1.3 to 1.5. In the following section, we develop an elaborated PTM to account for all these effects.
Observer models developed for discrimination or identification in external noise assume the existence of a template tuned to each to-be-identified stimulus. The original PTM as well as all the other observer models in external noise studies were constructed for cases where any single stimulus plausibly activates only one perceptual template (e.g., Gabors of orientation ±45°). Here, we extended the original PTM to include variation in the feature dimension in addition to external noise and signal contrast.
We elaborated the PTM based on the results of an extensive analysis of all the major existing observer models developed in external noise studies, including the linear amplifier model, the induced noise model, and the induced noise plus uncertainty model. The PTM accommodates all the known “standard” properties of data in external noise experiments. It also provides the best qualitative and quantitative account of a full range of representative data sets.
In the PTM, perceptual inefficiencies are attributed to three limitations: internal additive noise that is associated with absolute thresholds in perceptual tasks, internal multiplicative noise that is associated with Weber’s Law behavior of the perceptual system, and perceptual templates that are tuned to the target stimuli but may be broad enough to allow external noise or distracters to affect performance. In the original PTM (Figure 4a), the observer is characterized by four parameters: a gain to the signal stimulus (β), exponent of the non-linear transducer function (γ), internal additive noise (Na), and coefficient of the multiplicative internal noise (Nm). To model two-alternative identification or discrimination of non-orthogonal targets, in the elaborated PTM (ePTM; Figure 4b), we introduce two perceptual templates, one for each of the two stimulus categories. The two templates are assumed to be identical except in the feature dimension under study. In a given trial of a two-alternative forced-choice identification task, a single stimulus is presented and must be identified. The stimulus is better matched to one template TB (x, y, t) (with gain βB), and less well matched to the other template TW (x, y, t) (with gain βW less than βB). For example, if two orientations are to be discriminated, two templates are used and a given target stimulus matches one of the templates relatively closely; if the two orientations are sufficiently similar, that target stimulus also matches the other template to some degree because the two templates are also relatively similar. We next describe the components of the ePTM.
The model considers input stimuli that include a signal stimulus (i.e., a Gabor of a certain orientation) embedded in white Gaussian external noise. For a signal stimulus with contrast c superimposed with white Gaussian noise images (images made of pixels whose contrasts are samples of jointly independent, identically distributed Gaussian random variables with mean zero and standard deviation N_ext), the input stimulus can be expressed as:

$$S(x, y, t) = c\,S_0(x, y, t) + N_{ext}\,g(x, y, t)$$
where S0 (x, y, t) represents the spatio-temporal pattern of the signal stimulus, and g(x, y, t) represents the various contrasts of an external noise image whose value at a particular point (x, y, t) is drawn from a Gaussian distribution with mean 0 and standard deviation 1.0.
The input stimulus S(x, y, t) is matched to both templates, TB (x, y, t) and TW (x, y, t):
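For a given pair of templates and signal stimuli, the matches between each template and the signal pattern (MB and MW, respectively) are constant, whereas the matches between each template and the external noise are Gaussian random variables with mean 0 and a fixed standard deviation σTN. The outputs from template matching can be re-written as: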
where the two noise terms, subscripted 1 and 2, are samples from the standard normal distribution. The two samples may be partially correlated if TB (x, y, t) and TW (x, y, t) overlap with each other.
The outputs of the two perceptual templates are then processed by an expansive nonlinear transducer function (Output = sign(Input)|Input|^γ1), chosen based on similar choices in pattern vision[76, 77]. If a stochastic model were fully implemented, nonlinearities with exponents other than 1.0 would require the inclusion of cross products and consideration of the stochastic properties prior to the nonlinearity. This formulation is complex, and in general stochastic models based on Monte-Carlo simulations are necessary to model the non-linear transducer.
In developing the PTM, and in order to simplify the task of model estimation and fitting, we introduced analytical simplifications of the stochastic model by using the expectations of the random variables in place of the random variables, and ignoring all the cross products. The approach of using analytic simplifications of the full stochastic model in the (analytic) PTM has been validated in various ways. First, we have carried out simulations of the stochastic PTM to show that key properties of the analytic PTM and mechanisms of state change in the analytic PTM are consistent asymptotically with the stochastic model . Second, the distributional assumptions of the signal detection applications were shown to be approximately true of the stochastic PTM. Third, the analytic PTM has proven quite robust in accounting for a wide range of now dozens of studies that have evaluated not just single conditions, but full TvC functions at multiple (usually 2 or 3) criterion threshold levels (proxies for the full psychometric functions)[48, 54] (see  for a review).
We follow the same development in the elaborated PTM and approximate the outputs of the two detectors after the non-linear transducer as:
where the two noise terms, subscripted 1 and 2, are again samples from the standard normal distribution. F(γ1) is a constant that corrects for the effect of the nonlinearity on the standard deviation; it is generally absorbed in later normalization.
Because in behavioral studies the values of MB, MW, and N_ext σTN F(γ1) can be known only up to a constant, we normalized everything relative to σTN F(γ1) without loss of generality. This essentially sets σTN F(γ1) = 1, that is, it sets the total gain of the perceptual templates (integrated over space and time) to 1.0. We define:
In this formulation, the definition of βB and βW depends on F(γ1), which is a function of γ1. In situations in which a single γ1 is involved, F(γ1) is just a correction factor on the absolute value of βB and βW. In those few situations in which multiple γ1’s are involved, F(γ1) for the different γ1’s must be explicitly considered in the modeling process. Most situations in which the PTM has been evaluated have constant γ1[53–55, 81, 82].
For two templates with gains βB and βW, the variations in YB2 and YW2 are partially correlated. When the two templates cease being well approximated as orthogonal, and have more overlap, i.e., when βW is significantly greater than 0, the response to external noise will become more similar as well. We have simulated a stochastic version of the ePTM and examined the covariance between the outputs of the two templates after the non-linear transducer. We found that, over a large range of signal contrast levels (0 to 100%), template overlaps (±1 to ±45 deg), and γ1’s (1.0 to 3.0), the effective variance of (YB2 − YW2) can be corrected by a factor : the correction factor accounted for 95.3% of the variance in the simulation study. Therefore, if the perceptual system can utilize the partial correlation of the templates’ response to the external noise in decision-making, then the effective variance of the external noise should be corrected by a factor of .
The model posits that each detector has independent internal additive and multiplicative noise. In both detectors, the additive noise has mean 0 and standard deviation Na. The variance of the multiplicative noise is a function of the total contrast energy going through each detector. In computing multiplicative noise, the outputs of the two templates are rectified and passed through another non-linear transducer function (Output = |Input|^γ2); stimulus energy over a broad range of space, time, and features may be integrated in computing multiplicative noise. The variance of multiplicative noise is proportional to the total stimulus energy in each detector:
After adding the internal additive and multiplicative noises, the outputs of the two detectors are:
where the four noise terms, subscripted 3 through 6, are independent samples from the standard normal distribution.
We assume a difference rule is used at the decision stage. The outputs of the two detectors, YB3 and YW3 are compared:
In this comparison, the total variance is determined by the variance of all the random variables:
The average signal-to-noise ratio (d′) for the comparison is:
In the special case where γ = γ1 = γ2, corresponding to the situation where the rising portion of the TvC function has a slope of 1.0, we can solve Eq. 14 to obtain the threshold signal contrast cτ as a function of external noise contrast N_ext at a given performance criterion (i.e., d′):
In all the applications of the PTM approach so far, we have found that the PTM with γ = γ1 = γ2 has provided adequate descriptions of the empirical data. In the rest of this article, we will restrict our discussion to this simplified set of PTM’s. The same logic could be followed to understand the properties of PTM’s with γ1 ≠ γ2.
It follows directly from Eq. 15 that, for any given external noise contrast N_ext, the threshold signal contrast ratio between two performance criterion levels (corresponding to two d′ values) is:
Thus, the ePTM predicts that the threshold signal contrast ratio between two performance criterion levels is a non-linear function of the corresponding d′’s and is independent of the external noise contrast. This invariance is a testable model property and forms one competitive basis for favoring the PTM over alternative observer models. A full specification of all the parameters of an ePTM requires measurement of TvC functions at three (or more) separate levels of feature differences at each of three (or more) performance levels.
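The ratio invariance can be illustrated numerically. The sketch below uses the widely published single-template PTM threshold expression, cτ = (1/β)[((1 + Nm²)N_ext^(2γ) + Na²)/(1/d′² − Nm²)]^(1/(2γ)), as a stand-in for Eq. 15; it omits the second template of the ePTM, and the parameter values are hypothetical, chosen only for illustration.

```python
def ptm_threshold(n_ext, dprime, beta, gamma, n_add, n_mul):
    """Single-template PTM threshold (stand-in for the article's Eq. 15)."""
    num = (1.0 + n_mul ** 2) * n_ext ** (2 * gamma) + n_add ** 2
    den = 1.0 / dprime ** 2 - n_mul ** 2
    if den <= 0:
        raise ValueError("d' is not attainable with this much multiplicative noise")
    return (num / den) ** (1.0 / (2 * gamma)) / beta


# Hypothetical observer parameters, not estimates from the article.
params = dict(beta=1.5, gamma=2.0, n_add=0.005, n_mul=0.2)
noise_levels = (0.0, 0.05, 0.07, 0.12, 0.20, 0.33)

# Threshold ratio between the 85% and 75% correct criteria at each noise level.
ratios = [ptm_threshold(n, 1.4657, **params) / ptm_threshold(n, 0.9539, **params)
          for n in noise_levels]
```

Because the external noise term appears only in the numerator, it cancels in the ratio, and every entry of `ratios` is the same up to floating-point error. In the LAM (γ = 1, Nm = 0) the ratio reduces to the ratio of the two d′ values, whereas with γ > 1 or Nm > 0 it departs from that value.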
The ePTM is elaborated from the LAM by incorporating additional processing of the stimulus and noise, including the non-linear transducer, and multiplicative noise. If we set γ = 1, Nm = 0, and βW = 0, the ePTM is “reduced” to the LAM, and Eq. 15 becomes:
The LAM was developed as a form of an ideal observer model. If we square both sides of equation 17, we have:
Because βB reflects signal gain of the human observer, we can re-formulate it in terms of the gain βIB of the ideal observer and sampling efficiency υ:
where . If the slope of the TvC function is a, then the efficiency is:
Eq. 19 allows one to estimate LAM sampling efficiency from the slope of the threshold versus external noise functions. Although traditional ideal observer analysis focuses only on experimental conditions in which external noise is so high that the contributions of internal noise can be ignored, the LAM analysis includes a wide range of external noise conditions. The method explicitly considers and discounts effects of internal noise on human performance in the computation of efficiency.
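The linear relation that underlies this slope-based estimate can be sketched under the assumption of the standard LAM form, cτ = d′·sqrt(N_ext² + Na²)/β, so that cτ² is linear in N_ext² with slope (d′/β)² and intercept (d′/β)²·Na². The observer parameters are hypothetical, and the mapping from slope to efficiency in Eq. 19 is not reproduced here.

```python
import math


def lam_threshold(n_ext, dprime, beta, n_add):
    """Assumed standard LAM threshold: d' * sqrt(N_ext^2 + N_a^2) / beta."""
    return dprime * math.sqrt(n_ext ** 2 + n_add ** 2) / beta


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


noise = (0.0, 0.05, 0.07, 0.12, 0.20, 0.33)
dprime, beta, n_add = 0.9539, 2.0, 0.04          # hypothetical observer
x = [n ** 2 for n in noise]                       # external noise power
y = [lam_threshold(n, dprime, beta, n_add) ** 2 for n in noise]
slope, intercept = fit_line(x, y)
beta_hat = dprime / math.sqrt(slope)              # recovers the gain
n_add_hat = math.sqrt(intercept / slope)          # recovers the additive noise
```

With noiseless synthetic thresholds the fit recovers the generating gain and additive noise; with real data the same regression gives the slope from which an efficiency estimate would be derived.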
This re-formulation of the LAM also illustrates the relationship of the ePTM to ideal observer analysis and indicates in a parallel development how to estimate sampling efficiency through the ePTM. Essentially, we can reformulate βB in terms of the gain βIB of the ideal observer and sampling efficiency of the human observer υ:
In simple detection tasks, the template of the ideal observer is matched to the signal stimulus. In identification or discrimination tasks, the template of the ideal observer is matched to the signal stimulus – which then yields an ideal computation if the decision rule is ideal[83, 84]. We can apply Eq. 9 to the actual stimuli used in the experiments to compute the gain of the ideal observer.
Although other components of the ePTM, the non-linear transducer, multiplicative noise and the gain of the less-well matched template, also affect human performance, our approach here is to model them explicitly in the ePTM and discount their contributions in estimating human sampling efficiency, just as additive noise is explicitly considered and discounted in the LAM-based ideal observer analysis.
To evaluate the ePTM using the current parametric data set, we tested whether a single model in which only βW varies as a function of orientation difference can fit the TvC functions in all the experimental conditions (with βW = 0 in the ±45 deg condition). The model includes seven parameters: Na, Nm, γ, and βB, shared across the orientation difference conditions, and three βW values for the ±3, ±6, and ±15 deg conditions of the experiment. Fits of this most reduced seven-parameter model to the data were compared with three more saturated models: (1) two models with ten parameters that allowed either Na or Nm, in addition to βW, to vary across the four orientation difference conditions, and (2) one model with thirteen parameters that allowed both Na and Nm to vary across the four orientation difference conditions. In fitting the ePTM, the standard deviation of external noise was multiplied by to reflect the use of two independent external noise frames in each trial.
A least-squares procedure with the following cost function:
where is computed using Eq. 15, and Σ represents summation across three performance levels of all the external noise and orientation difference conditions for an observer, was used to search for the best fitting parameters of each model. The goodness of model fits was gauged by:
where Σ and mean() run across all the experimental conditions for an observer. An F-test for nested models was used to statistically compare the models. For two nested models with kfull and kreduced parameters, the F statistic is defined as:
where df1 = kfull − kreduced, and df2 = N − kfull; N is the number of predicted data points.
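The model comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that goodness of fit is computed on log contrast thresholds and that the nested-model statistic takes the common form F = [(r²full − r²reduced)/df1] / [(1 − r²full)/df2]. The function names and the numbers in the example are hypothetical.

```python
import math


def r_squared(predicted, observed):
    """Proportion of variance accounted for, computed on log10 thresholds.

    Assumption: the fit is evaluated in log-contrast units.
    """
    log_pred = [math.log10(p) for p in predicted]
    log_obs = [math.log10(o) for o in observed]
    mean_obs = sum(log_obs) / len(log_obs)
    ss_res = sum((p - o) ** 2 for p, o in zip(log_pred, log_obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in log_obs)
    return 1.0 - ss_res / ss_tot


def nested_f(r2_full, r2_reduced, k_full, k_reduced, n_points):
    """F statistic for two nested models, with df1 and df2 as defined in the text."""
    df1 = k_full - k_reduced
    df2 = n_points - k_full
    f_stat = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f_stat, df1, df2


# Hypothetical example: a 10-parameter model against the 7-parameter model,
# with an illustrative count of 96 predicted data points.
f_stat, df1, df2 = nested_f(0.980, 0.974, k_full=10, k_reduced=7, n_points=96)
```

The p value then follows from the upper tail of the F distribution with (df1, df2) degrees of freedom, for instance with `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.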
The most reduced model, which only allows βW free to vary across discrimination precision conditions, accounted for 97.4%, 93.3%, and 98.9% of the variance for CB, JS, and SJ, respectively. For all three observers, allowing Na and/or Nm free to vary across the four orientation difference conditions did not significantly improve the fits (all p>0.25). We conclude that the most reduced model in which the gain of the less-well matched template varies as a function of orientation difference provides the best account of the TvC functions. The parameters of the best fitting model are listed in Table 2.
In Figure 5, we plotted the average βW/βB of the best fitting model of the three observers as a function of the orientation difference. If we assume that, across the four orientation difference conditions, only the overlap between the better matched and less-well matched templates changes while the shape of the perceptual template remains the same and can be modeled as a Gaussian, we can estimate the bandwidth of the perceptual template by fitting a Gaussian to the data in Figure 5. The resulting half-width at half height is 39.5°.
The ePTM without the correction factor of the covariance of the outputs of the two perceptual templates in each orientation difference condition was also evaluated. Although estimates of the model parameters are slightly different, the general qualitative results did not change.
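One way to carry out a bandwidth estimate of this kind is sketched below. It is an illustration under stated assumptions, not the fitting procedure used in the study: the Gaussian is assumed to peak at 1 for zero orientation difference, the fit is done by least squares on the logarithm of the gain ratio (which weights the points differently from a direct fit), and conditions with a zero ratio, such as ±45 deg, must be left out because their logarithm is undefined. The function names and the example values are hypothetical.

```python
import math


def fit_gaussian_sigma(delta_deg, gain_ratio):
    """Fit ratio = exp(-delta^2 / (2 sigma^2)) by least squares in the log domain.

    ln(ratio) is regressed on delta^2 through the origin; the slope
    estimates -1 / (2 sigma^2). All ratios must lie strictly between 0 and 1.
    """
    num = sum(d ** 2 * math.log(r) for d, r in zip(delta_deg, gain_ratio))
    den = sum(d ** 4 for d in delta_deg)
    slope = num / den
    return math.sqrt(-1.0 / (2.0 * slope))


def half_width_at_half_height(sigma):
    """Convert a Gaussian standard deviation to its half-width at half height."""
    return sigma * math.sqrt(2.0 * math.log(2.0))


# Hypothetical data generated from a Gaussian with sigma = 20 deg.
deltas = [6.0, 12.0, 30.0]
ratios = [math.exp(-d ** 2 / (2.0 * 20.0 ** 2)) for d in deltas]
sigma_hat = fit_gaussian_sigma(deltas, ratios)
bandwidth = half_width_at_half_height(sigma_hat)
```

Fixing the peak at 1 leaves a single free parameter, which suits a fit to only a few orientation difference conditions.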
The performance of an ideal observer was simulated using the stimuli and tasks in the experiment, with the assumption that the ideal observer has an integration window that is at least 25 ms, which is the duration of the stimulus in each trial of the experiment. The TvC functions at three performance levels (65%, 75%, and 85% correct) for the ideal observer are plotted in Figure 6 as squared contrast threshold versus the variance of the external noise functions. A linear function:
provided an excellent account of these TvC functions (r2= 0.9999). The slopes of the TvC functions μ(Pc | task) for the four orientation difference conditions at 65%, 75%, and 85% correct performance levels are listed in Table 3.
We also re-plotted the TvC functions of the human observers in terms of squared contrast thresholds versus external noise variance in Figure 6. Again, the variance of the external noise was corrected by a factor of 2 to reflect the use of two independent external noise frames in each trial. A linear regression analysis was used to extract the slopes and intercepts of the human TvC functions:
The linear equation (Eq. 25) provided an excellent account of the human data, accounting for 99.8%, 99.3%, and 99.8% of the variance for CB, JS, and SJ, respectively. The slopes and intercepts are listed in Table 3.
We then calculated the sampling efficiencies of the human observers using the following definition:
The results are listed in Table 4.
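The slope-based analysis can be sketched as follows. The sketch assumes that sampling efficiency is taken as the ratio of the ideal observer's TvC slope to the human observer's TvC slope, both measured in squared-contrast versus noise-variance coordinates at the same performance level. That ratio is an assumption made for illustration, and the function names and data values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sampling_efficiency(ideal_slope, human_slope):
    """Assumed definition: ideal TvC slope divided by human TvC slope."""
    return ideal_slope / human_slope


# Hypothetical external noise variances and squared contrast thresholds.
noise_var = [0.0, 0.01, 0.04, 0.09]
human_c2 = [0.002 + 0.5 * v for v in noise_var]   # slope 0.5, intercept 0.002
ideal_c2 = [0.025 * v for v in noise_var]         # slope 0.025, no internal noise

human_slope, human_intercept = fit_line(noise_var, human_c2)
ideal_slope, _ = fit_line(noise_var, ideal_c2)
efficiency = sampling_efficiency(ideal_slope, human_slope)
```

Because the intercept absorbs the contribution of internal noise, the slope ratio compares the two observers only on how external noise limits their thresholds.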
For the three observers in this study, sampling efficiencies ranged from 0.018 to 0.098. In a given orientation difference condition, the estimated sampling efficiency increased with performance level. For example, for observer CB, sampling efficiency = 0.035, 0.050, and 0.066 at 65%, 75% and 85% correct performance levels, respectively, in the ±45 deg condition. The dependence of the estimated sampling efficiency on the performance level reflects a major shortcoming of the conventional ideal observer analysis, which uses a linear model to estimate properties of the often non-linear perceptual processes. We further discuss this point in the next section.
The LAM predicts a linear relationship between squared contrast threshold and the variance of external noise. For non-overlapping stimulus categories (±45 deg), ideal observer analysis based on the efficiency-based formulation of the LAM (Eq. 17c) is identical to simulation-based ideal observer analysis. We calculated βIB (Eq. 9) by using two ideal templates that are completely matched to the ±45 deg Gabor stimuli and the exact signal and external noise images used in the study. Because very brief (8.3 ms) external noise and Gabor image frames were used, perfect summation was assumed in the calculation. The result is βIB = 8.05. The d′ values corresponding to 65%, 75%, and 85% correct performance are 0.5449, 0.9539, and 1.4657. The sampling efficiencies were calculated from the slopes of the TvC functions using Eq. 19. For CB, sampling efficiency is 0.032, 0.051, and 0.069 at the 65%, 75%, and 85% correct performance levels, respectively. For JS, sampling efficiency is 0.022, 0.044, and 0.072 at the three performance levels. For SJ, sampling efficiency is 0.022, 0.036, and 0.050 at the three performance levels. These values are very similar to those obtained from the simulation-based ideal observer analysis and comparable to estimated sampling efficiencies in the literature.
Like the estimated sampling efficiencies from the simulation-based ideal observer analysis, the estimated sampling efficiencies from the LAM-based ideal observer analysis varied with performance level. This suggests that neither the simulation-based nor the LAM-based efficiency estimates are internally consistent. According to the LAM, the ratio between the slopes, and likewise between the intercepts, at two different performance levels is equal to the corresponding d′² ratio. The d′² ratios between 75% and 65% correct, and between 85% and 75% correct, are 3.06 and 2.36, respectively. For the human observers, however, the relationship between the slopes and intercepts at different performance levels is inconsistent with the predictions of the LAM (p < 0.005). For our observers, the average ratios of TvC slopes between the 75% and 65% correct, and between the 85% and 75% correct performance levels are 1.73 and 1.59 in the ±45 deg condition. Very similar ratios are also obtained for the intercepts. That the observed slope and intercept ratios are much lower than the corresponding d′² ratios confirms our earlier findings that the LAM is not consistent with the observed threshold ratios between different performance levels (see  for a review). This is a parallel observation to that previously made on threshold ratios.[1, 8, 86] Increased sampling efficiency with performance level is, however, consistent with predictions of observer models that incorporate decision uncertainty[3, 56], template learning, or transducer non-linearity.
In contrast to the LAM, the ePTM provided an excellent account of observer performance over a wide range of performance levels in this study. Formulating the ePTM in parallel with the LAM, so that sampling efficiency is understood within the context of other perceptual inefficiencies such as the non-linear transducer and multiplicative noise, provides a coherent framework for comparing human performance to ideal observer performance.
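The d′ values and d′² ratios quoted above can be checked numerically. The sketch assumes the standard two-alternative relation d′ = √2 · z(Pc), where z is the inverse of the standard normal distribution function. This relation is an assumption here, although it reproduces the quoted values. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def dprime_two_alternative(p_correct):
    """d' for a two-alternative task, assuming Pc = Phi(d' / sqrt(2))."""
    return sqrt(2.0) * NormalDist().inv_cdf(p_correct)


d65 = dprime_two_alternative(0.65)
d75 = dprime_two_alternative(0.75)
d85 = dprime_two_alternative(0.85)

# Under the LAM, squared thresholds scale with d'^2, so TvC slopes and
# intercepts at two performance levels should differ by these ratios.
ratio_75_65 = (d75 / d65) ** 2
ratio_85_75 = (d85 / d75) ** 2

print(round(d65, 4), round(d75, 4), round(d85, 4))
print(round(ratio_75_65, 2), round(ratio_85_75, 2))
```

The observed human slope ratios of 1.73 and 1.59 fall well below the ratios computed this way, which is the discrepancy the text attributes to the linearity of the LAM.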
It is assumed that the optimal template for each stimulus is a matched filter, and that the decision rule (here, difference rule, which is equivalent to a max rule in this case) is also optimal. Using the actual signal and external noise stimuli used in the study, we computed βIM =6.10 (Eq. 9). From the values of βB ’s of the best fitting ePTM, the estimated sampling efficiency, which accounts for performance in all three criteria, is 0.030, 0.019, and 0.031 for CB, JS, and SJ, respectively.
In the ePTM-based ideal observer analysis, we treated sampling efficiency as a model parameter in the more complex ePTM and estimated it in the context of the model. This yields a single, consistent sampling efficiency across all the performance levels and experimental conditions for each observer.
All the existing observer models for external noise studies have been developed in the context of target detection or discrimination or identification of orthogonal (or nearly orthogonal) targets. In this study, we elaborated and tested a new form of the Perceptual Template Model (the “ePTM”) to consider identification or discrimination of non-orthogonal targets required in high precision discriminations, and the treatment of feature difference thresholds. Using the method of constant stimuli, we collected full contrast psychometric functions from three observers in an orientation identification task at fovea in four orientation difference conditions (±3°, ±6°, ±15°, and ±45° from vertical) and across a wide range of external noise levels. We showed that the families of TvC functions in the four orientation difference conditions exhibited some very regular properties. The simplest elaboration of the PTM, with the same template gain to the better matched signal stimulus (βB), non-linearity (γ), internal additive noise (Na), and coefficient for multiplicative noise (Nm) but varying gains of the less well matched template (βW) across the orientation difference conditions, provided the best fit to all the data, accounting for 93.3–98.9% of the variance. Sampling efficiency of human observers was also estimated from the best fitting ePTM.
From the gains of the perceptual templates in different orientation difference conditions, we found that the perceptual templates are broadly tuned in orientation – the orientation bandwidth of the perceptual template is about 39.5 deg and there is considerable overlap between the templates in relatively high discrimination precision conditions. Based on Fourier analysis, the half-height half width of the Gabor signal used in this study is 39.5 deg. The estimated orientation bandwidth of the perceptual template matches very well with that of the Gabor stimuli. The close match of the orientation bandwidth of the perceptual template with that of the Gabor stimuli suggests that observers used near optimal weights of the visual information in the stimulus, supporting the notion of matched filters in visual recognition[88, 89]. However, their sampling efficiency is very low. Similar results have been obtained by others.
The estimated bandwidth of the perceptual template in the current psychophysical study reflects the orientation bandwidth at the overall observer level. It is much broader than that of single neurons in early visual cortical areas[91, 92]. For example, the average tuning width for orientation was about 14° in a single cell study of cat cortex. Another study by Campbell and Kulikowski also found that the masking effect of one grating on another differed in orientation by approximately 12° ~ 15°. On the other hand, a good deal of psychophysics research[76, 77, 83, 95–97] has demonstrated that the human visual system is exquisitely sensitive to the orientation of lines or gratings. For example, in a line orientation identification task, Westheimer found that the best thresholds are around 0.2° ~ 0.8°, 0.4° ~ 0.8°, and 0.17°.
Several approaches have been proposed to resolve the apparent discrepancy between broad orientation tuning of cortical neurons (10° ~ 20°) and acute human orientation discrimination threshold (0.2° ~ 0.8°)[18, 77, 86, 100, 101]. For example, Geisler proposed an ideal detector model based on the retinal signal for a hyperacuity task and the cone sampling mosaic of the retina. Westheimer et al. assumed that, while detection is determined by the most excited orientation-tuned neural element, the sharpness of supra-threshold orientation discrimination is determined by the relative activities of two or more broadly-tuned orientation-sensitive neural elements signaling the difference among these activities. This idea has been framed in both the opponent-process and line-element[102, 103] formulations. These two formulations share the same idea that orientation discrimination is not limited by the bandwidth of the broadly-selective neural elements, but by a combination of their noise levels and the shape of their sensitivity curves (specifically, by the maximum slope difference). Regan and Beverley made a clear demonstration that a detector that is most sensitive for detecting faint stimuli near its preferred orientation contributes either almost nothing or mere noise to the discrimination of subtle orientation differences around its preferred orientation (since the width of the orientation tuning curve is broad). They proposed that one possible way to detect these orientations is to compare relative responses from neighboring detectors. The idea was supported by Waugh et al., who found a bimodal curve with distinct peaks at about 10° on either side of the center line orientation in a vernier task masked by one-dimensional visual noise. The idea has also found support in physiological research[93, 104]. For example, Bradley et al. measured the minimum difference in stimulus orientation and spatial frequency that can produce reliable changes in the response of individual neurons in cat visual cortex. They compared these values with those obtained from behavioral thresholds reported in other experiments. Although the average minimum orientation difference that could be signaled reliably by most cells from their sample was 6.4°, which was well above the behaviorally determined thresholds, they reported that the most selective cells signaled orientation differences as small as 1.84°, which are comparable in magnitude to the behaviorally observed thresholds. Most notably, the slope was reduced, and the variability was maximal near the peak of the tuning function. Therefore, Bradley and colleagues concluded that neurons that respond most sensitively to a particular stimulus provide little information about orientation changes in the vicinity of the stimulus. All these results imply that the mechanisms most sensitive to a minute offset or difference of features are processors (templates, cells, or filters) at neighboring orientations to the mechanisms that detect the target.
The ePTM belongs to the general class of psychophysical models that use rather broadly tuned perceptual processors to achieve high discrimination precisions. In the ePTM, visual stimuli are first processed by perceptual templates that are tuned to the stimuli in the dimension of variation. The overlap between the better-matched and the less-well-matched perceptual templates determines the discrimination precision. The ePTM extends the earlier models by considering non-linearities and internal noise sources of the observer and is capable of modeling full psychometric functions over a wide range of external noise levels and orientation differences.
The ePTM also provides an alternative framework to estimate sampling efficiencies of human observers. Traditionally, ideal observer analysis is only based on the statistical properties of the input stimulus without any consideration of the perceptual process[50, 85]. In this study, conventional simulation-based ideal observer analysis resulted in performance-dependent estimates of sampling efficiencies. This is because the conventional ideal observer analysis is based on linear models that cannot adequately capture non-linear properties of the perceptual processes. By taking into account the internal additive noise, the LAM-based ideal observer analysis allows us to separate the contributions of internal additive noise from sampling efficiency. The ePTM-based ideal observer analysis follows this important direction. By incorporating additional observer inefficiencies other than sampling efficiency, the ePTM-based ideal observer analysis provides an excellent account of human performance as well as coherent estimates of sampling efficiency.
The elaborated PTM provides an integrated framework within which to understand the performance limitations of the observer in the two fundamental measurement regimes of contrast thresholds and feature thresholds. Within the new elaborated observer framework, we can characterize human performance in the “perceptual space” – human performance as a joint function of external noise and feature difference. This in turn would allow us to address the question of mechanisms associated with observer state changes (e.g., attention, perceptual learning) in a wide range of tasks involving different manipulations of task difficulty (achievable accuracy), including both the contrast threshold and feature threshold regimes.
The research was supported by the National Eye Institute and the National Institute of Mental Health. We thank Bosco Tjan for discussing the ideal observer analysis with us.