We first established whether we could improve performance for identifying crowded letters in observers with amblyopia using the flanked letter training task, which was effective in reducing the spacing limit in the normal periphery. Five observers (four with strabismus and one without) participated in this training. The performance measurement during training was the proportion correct for identifying the middle letter of trigrams (see Materials and Methods
for details). The stimulus array was, by design, very crowded. Initially, on average, observers identified the middle letter correctly only 24% of the time. In contrast, they identified an unflanked letter of the same size ≈95% of the time, indicating a substantial effect of the flankers. Despite substantial individual differences, which are typical for perceptual learning, all observers demonstrated improved identification accuracy over the course of training, from an average of 0.24 (proportion correct) in the first training block to 0.38 in the last training block (an average improvement of approximately 60%). Yet these identification accuracies are still relatively low and clearly reflect that the crowding task remained challenging, even at the end of training. Training data for individual observers are presented in the top row of the training-data figure. We quantified the improvements during training in three ways. First, we fit each observer's training data with a linear function and examined whether its slope was significantly different from zero by calculating the t-statistic of the slope (t = slope/standard error of the slope). The t-statistic and the degrees of freedom (number of data points − 2) were then used to determine the p-value. This method allows us to include all the data during training to determine whether there was a significant improvement. Using this method, we determined that the slope for four of the five observers in the flanked letter training group was statistically different from zero. The one observer in five (20%) who did not show a significant improvement is consistent with the percentage of “non-learners” reported in previous studies
. Second, based on the fitted linear function, we calculated the expected performance for the first and the last block of trials and quantified the improvement as the ratio of these two calculated values. This ratio, averaged across observers, was 0.69±0.15 (95% CI). Third, we calculated the ratio of the empirical performance between the first and the last block of trials, akin to comparing performance “before” and “after” training. While this method does not take into account all the training data, it is a standard way to compare improvements due to training, especially when comparisons with untrained tasks are to be made (for a review of studies that used this method, refer to ). Averaged across observers, the ratio between the first and the last block of trials was 0.60±0.19 (95% CI). Regardless of whether the ratio was based on the values calculated from the fitted linear function or on the empirical data, a ratio of 1, which would mean no change in performance between the first and the last block of trials, did not fall within the 95% confidence intervals. Therefore, we infer that the improvement was significant at α = 0.05.
Visual characteristics of the 11 observers.
Training data for individual observers.
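The slope test and the two first-to-last ratios described for the training data can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the block-by-block proportions correct are hypothetical, the number of blocks is assumed, and the p-value is obtained by numerically integrating the Student's t density, which is one of several equivalent ways to get it.

```python
import math

def fit_line(y):
    """Least-squares line through (block number, performance).
    Returns slope, intercept and the standard error of the slope."""
    n = len(y)
    x = range(1, n + 1)                      # block numbers 1..n
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; the residual variance uses n - 2 degrees of freedom
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope

def t_two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 - 2 * integral of the Student's t density over
    [0, |t|], evaluated with Simpson's rule (steps must be even)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * total * h / 3)

# Hypothetical proportion correct for one observer across training blocks
blocks = [0.22, 0.25, 0.24, 0.28, 0.27, 0.31, 0.30, 0.34, 0.36, 0.38]

slope, intercept, se_slope = fit_line(blocks)
t_stat = slope / se_slope                    # t = slope / standard error of the slope
df = len(blocks) - 2                         # number of data points - 2
p_value = t_two_sided_p(t_stat, df)

# Method 2: ratio of the fitted (expected) values, first block over last block
expected_first = intercept + slope * 1
expected_last = intercept + slope * len(blocks)
fitted_ratio = expected_first / expected_last

# Method 3: ratio of the empirical values, first block over last block
empirical_ratio = blocks[0] / blocks[-1]

print(f"slope={slope:.4f}, t({df})={t_stat:.2f}, p={p_value:.4g}")
print(f"fitted ratio={fitted_ratio:.2f}, empirical ratio={empirical_ratio:.2f}")
```

With proportion correct as the measure, a first-to-last ratio below 1 indicates improvement, which matches the direction of the ratios reported in the text.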
To determine whether the improvement following the flanked letter training transferred to other untrained visual tasks, we compared four measurements related to various aspects of identifying letters before and after training. These four measurements were: (1) the size limit (visual acuity), the smallest letter size required for observers to identify single letters at 52% correct; (2) the spacing limit, the separation between adjacent letters at which performance for identifying the middle letter of trigrams was 52% correct, representing a measure of the distance over which crowding occurs; (3) the contrast threshold for identifying single letters; and (4) the size of the visual span profile, the amount of information about the letter stimuli transmitted in a fixation. These four performance measures utilize similar, highly familiar stimuli (letters) and responses (letter identification), thus minimizing procedural learning. The comparisons are summarized in the figure of post- versus pre-test performance. In each panel, each symbol represents data from an individual observer (red: strabismic amblyopes; green: non-strabismic amblyopes; bowtie symbols: flanked letter training group; circular symbols: isolated letter training group, see later). For panels a–c, data points plotted below the diagonal 1:1 line and in the shaded region represent improvement (values being smaller for post-test than for pre-test), whereas for panel d (size of the visual span), data points plotted above the diagonal 1:1 line and in the shaded region represent improvement. In general, observers in the flanked letter training group showed improvement on all these measurements (all the bowtie symbols are in the shaded regions), even though these measurements were not used for training. Paired t-tests (t-statistics are given in File S1) confirmed that these improvements were significant, at the following p-values: (a) size limit, p = 0.035; (b) spacing limit, p = 0.019; (c) contrast threshold for single letters, p = 0.019; (d) size of visual span, p
Proportion of correct responses in identifying flanked letters as a function of center-to-center letter separation in trigrams, for the task of measuring the spacing limit, is plotted for each individual observer.
Proportion of correct responses in all three letters in trigrams, presented at different letter position left and right of fixation, for the task of assessing the visual span.
Comparisons of the post- and pre-test performance for four untrained visual tasks.
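The pre- versus post-test comparison relies on a paired t-test, which can be sketched as below. The thresholds are hypothetical values for five observers in arbitrary units, and the test is run on raw differences; whether the study tested raw values, log values or ratios is not stated here, so that choice is an assumption. The critical value is the standard two-sided 5% value of Student's t for 4 degrees of freedom.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
T_CRIT_95_DF4 = 2.776

def paired_t(pre, post):
    """Paired t-test on post - pre differences; returns (t, degrees of freedom)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)   # sample SD uses n - 1
    return mean_diff / se_diff, n - 1

# Hypothetical thresholds (smaller is better) for five observers
pre_test = [0.60, 0.48, 0.70, 0.54, 0.40]
post_test = [0.50, 0.44, 0.58, 0.50, 0.36]

t_stat, df = paired_t(pre_test, post_test)
significant = abs(t_stat) > T_CRIT_95_DF4

print(f"t({df}) = {t_stat:.2f}, significant at 0.05: {significant}")
```

A negative t here means the post-test thresholds are lower than the pre-test thresholds, which is the direction of improvement for panels a–c.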
We next examined whether the improvements observed as described above were specific to the training task, which consisted of visual stimuli with flankers, as learning to focus solely on the target letter in the presence of flankers could be a fundamentally different task from learning to identify a single letter presented on its own (see 
for a review). To do so, we trained another group of six observers with amblyopia (four with strabismus and two without) using a letter training task that did not have flankers. This task, the “isolated letter training”, was aimed at improving an aspect of functional vision that is different from the spacing limit. Specifically, the isolated letter training task was designed to improve the contrast sensitivity for near-acuity letters, with an associated improvement in high-contrast visual acuity, i.e., the size limit. Because age may be an important determinant of the magnitude of improvement, we ensured that the average age of observers in this isolated letter training group was similar to that of the flanked letter training group (t-test: p = 0.60). The number of sessions and trials of training were identical to those of the flanked letter training group. The performance measurement tracked during training was the contrast sensitivity (the reciprocal of the contrast threshold, i.e., of the minimum contrast required) for identifying single near-acuity letters (see Materials and Methods for details). Training data for individual observers of this group are presented in the bottom row of the training-data figure. As with the flanked letter training, we quantified the improvements during the isolated letter training in three ways: (1) fitting a linear function to the training data of each observer and examining whether the slope differs significantly from zero; (2) comparing the expected performance (based on the fitted linear function) between the first and the last block of trials; and (3) comparing the empirical performance between the first and the last block of trials. Using a linear function fit to the training data, we found that the slope for four of the six observers was statistically different from zero. The proportion of observers who did not show improvements was, again, similar to that reported in previous studies. When comparing the expected performance between the first and the last block of trials, the ratio between the two blocks averaged 0.80±0.23 (95% CI). The 95% confidence interval from this method just marginally included a ratio of 1, implying that the improvement did not reach statistical significance at the 0.05 significance level. When we computed the ratio in performance between the first and the last block of trials based on the empirical data, the ratio averaged 0.69±0.16 (95% CI) and the 95% confidence interval did not include the value of 1, meaning that the improvement for the group was significant at α = 0.05.
We also examined whether the improvement following training on the isolated letter task transferred to other visual tasks by comparing the same four measurements before and after training, as we did for the flanked letter training. As shown by the circular symbols in the figure of post- versus pre-test performance, except for one observer in panels b and d, the data for all observers in this training group fall within the shaded regions. Paired t-tests comparing the group-averaged data with the null effect confirmed that all these improvements were significant, at the following p-values: (a) size limit, p = 0.004; (b) spacing limit, p = 0.041; (c) contrast threshold for single letters, p<0.0001; (d) size of visual span, p = 0.038. Together with the results from the flanked letter training group, our results show that both training tasks were effective in inducing improvements in the letter size limit, letter spacing limit, letter contrast sensitivity and the size of the visual span, regardless of whether the task was a trained or an untrained one.
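The group-level criterion used for the training ratios, namely whether a ratio of 1 falls inside the 95% confidence interval of the mean, can be illustrated with a short sketch. The six ratios below are hypothetical, and the interval is computed as mean ± t × standard error, which is an assumption about how the reported intervals were obtained.

```python
import math
import statistics

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom
T_CRIT_95 = {4: 2.776, 5: 2.571}

def mean_with_ci(values):
    """Mean and half-width of the t-based 95% confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width

# Hypothetical first-to-last block ratios for six observers
ratios = [0.55, 0.62, 0.70, 0.66, 0.95, 0.68]

mean_ratio, half_width = mean_with_ci(ratios)
ci_low, ci_high = mean_ratio - half_width, mean_ratio + half_width
# A ratio of 1 means no change; improvement is inferred if 1 lies above the interval
improved = ci_high < 1.0

print(f"{mean_ratio:.2f} ± {half_width:.2f} (95% CI), excludes 1: {improved}")
```

Applied to the reported values, 0.80±0.23 spans 0.57 to 1.03 and so includes 1, whereas 0.69±0.16 spans 0.53 to 0.85 and does not.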
Generalization of the learning effect: dependency on the training task?
Our two training tasks were chosen on the basis that they targeted different limiting factors in amblyopic visual function. Specifically, our hypothesis was that the flanked letter training task would improve observers' ability to identify targets in clutter by reducing the effect of spatial crowding. Thus, we expected that the spacing limit would benefit more from perceptual learning for the flanked letter training group than for the isolated letter training group. In contrast, based on the findings of Zhou et al. and Astle et al. showing that training on a contrast sensitivity measurement task improved letter acuity, we anticipated that the isolated letter training group might benefit more than the flanked letter training group on the size limit (visual acuity) and contrast threshold measurements for identifying single letters. To compare the effectiveness of the two training tasks in improving the various measurements, we computed the post-pre ratios for the letter size limit, the spacing limit and the contrast threshold for identifying single letters, for each observer. For the size of the visual span, instead of computing the post-pre ratio, we computed the difference in bits of information transmitted (see Materials and Methods
). Note that because the magnitude of the training effect depends on the pre-test value, we first confirmed that the pre-test values on these four measurements were not different between the two groups (t-test: p = 0.68 for size limit; p = 0.22 for spacing limit; p = 0.46 for contrast threshold for identifying single letters; and p = 0.38 for size of the visual span). The post-pre ratios or differences for individual observers (small green or red symbols), as well as the group-averaged values (black filled symbols with ±95% confidence intervals), are plotted in (flanked letter training) and 5b (isolated letter training). If the confidence intervals include a post-pre ratio of 1 for the size, spacing and contrast threshold measurements, or a post-pre difference of 0 for the visual span measurement, then we conclude that there was no statistically significant improvement in performance on the given task following training, at α = 0.05. For comparison, the improvements in performance for the trained task are also plotted in each panel (dark blue dotted line: ratio calculated from the expected values derived from the linear function fitted to the training data; light blue dashed line: ratio calculated from the empirical data). In general, the improvements were statistically significant for all four pre- and post-test measurements for the two training groups. For both training groups, the 95% confidence intervals for the size limit, spacing limit and contrast threshold for identifying single letters overlap with those of the training task (light or dark blue lines), implying a more or less complete transfer of learning to these untrained tasks. We are not able to draw the same conclusion for the visual span measurement simply because we compared the difference, instead of the ratio, between the pre- and post-test measurements for visual span. Further, for each of the four measurements, the 95% confidence intervals of the two training groups overlap with each other, implying that the magnitude of improvement was similar between the two groups, consistent with the results of two-sample t-tests (size: p = 0.18; spacing: p = 0.93; contrast: p = 0.15; visual span: p = 0.88). In other words, the transfer of improvements to an untrained task did not depend on the training task.
Post-pre ratios and difference comparisons for the four untrained visual tasks between the two training groups.
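The between-group comparison of post-pre ratios can be sketched as a two-sample t-test. The text does not say which variant was used, so the pooled-variance (Student's) form below is an assumption, and the ratios for the five and six observers are hypothetical.

```python
import math
import statistics

# Two-sided 5% critical value of Student's t with 9 degrees of freedom
T_CRIT_95_DF9 = 2.262

def pooled_t(a, b):
    """Two-sample t-test with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

# Hypothetical post-pre ratios on one untrained measurement
flanked_group = [0.70, 0.85, 0.62, 0.90, 0.78]          # five observers
isolated_group = [0.75, 0.80, 0.66, 0.95, 0.72, 0.88]   # six observers

t_stat, df = pooled_t(flanked_group, isolated_group)
groups_differ = abs(t_stat) > T_CRIT_95_DF9

print(f"t({df}) = {t_stat:.2f}, groups differ at 0.05: {groups_differ}")
```

With these illustrative numbers the two groups do not differ, which mirrors the pattern reported for all four measurements.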
Our initial expectation was that the flanked letter training would be more effective in reducing the spacing limit than the isolated letter training. However, the two groups seem to have benefited from a similar reduction in the spacing limit following their respective training. Presumably, learning to identify flanked letters leads to a reduction in the spacing limit while improving letter acuity at the same time. To ask whether there was a specific reduction in crowding per se, we calculated a crowding index, defined as the ratio between the letter spacing limit and the letter size limit, for each observer. The post-pre ratio of this crowding index averaged 0.62±0.36 (95% CI) for the flanked letter training group, and 1.01±0.77 for the isolated letter training group. Although there were substantial individual differences, these values indicate that the flanked letter training, but not the isolated letter training, led to a significant reduction in the crowding index, implying that the flanked letter training might be more effective in reducing crowding per se.
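The crowding index and its post-pre ratio follow directly from the definition given above. The sketch below uses hypothetical spacing and size limits for a single observer, expressed in the same (arbitrary) unit so that the index is dimensionless.

```python
def crowding_index(spacing_limit, size_limit):
    """Crowding index: letter spacing limit divided by letter size limit."""
    return spacing_limit / size_limit

# Hypothetical limits for one observer, in the same arbitrary unit
pre_spacing, pre_size = 30.0, 10.0
post_spacing, post_size = 16.0, 8.0

index_pre = crowding_index(pre_spacing, pre_size)      # 3.0
index_post = crowding_index(post_spacing, post_size)   # 2.0
index_ratio = index_post / index_pre                   # post-pre ratio of the index

# Equivalent view: spacing post-pre ratio divided by size post-pre ratio
spacing_ratio = post_spacing / pre_spacing
size_ratio = post_size / pre_size

print(f"crowding index: pre={index_pre:.2f}, post={index_post:.2f}, "
      f"post-pre ratio={index_ratio:.2f}")
```

Because the index ratio equals the spacing ratio divided by the size ratio, it falls below 1 only when the spacing limit shrinks proportionally more than the size limit. A value near 1, as for the isolated letter training group, means the two limits improved in step.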