The study of reward expectations helps us to understand the rules controlling goal-directed behavior as well as decision making and planning. I shall review a series of recent studies focusing on how the food-gathering behavior of honeybees depends upon reward expectations. These studies document that free-flying honeybees develop long-term expectations of reward and use them to regulate their investment of energy/time during foraging. They also present a laboratory procedure suitable for analysis of the neural substrates of reward expectations in the honeybee brain. I discuss these findings in the context of individual and collective foraging, on the one hand, and the neurobiology of learning and memory of reward, on the other.
An expectation is said to be “a strong hope or belief that something that you want will happen”, or the action “to anticipate or look forward to the occurrence of an event.”1 Clearly, the notions of expectation and anticipation are linked to each other and it is taken for granted that these two words are interchangeable. However, although one needs to expect in order to anticipate, expecting per se does not imply anticipation. In psychology, the term ‘incentive’ designates internal correlates of specific rewards guiding the behavior of subjects pursuing such rewards.2 Incentive denotes what is referred to as a subject’s expectation of reward.2 Animals develop memories of specific properties of reward,2–4 alongside memories arising from the contingency between salient stimuli (as conditioned stimuli, or CSs) and reward (as an unconditioned stimulus, or US). Behaviorally, an expectation of reward is seen as an adjustment of a response which depends upon the formation and subsequent activation of memories about specific properties of reward, whereas the recollection of such memories is triggered by the cues and events predicting reward. In this scheme, expectations of reward are determined by past reward experience and can guide reward-induced behavior.2
Studying reward expectations helps us to understand the rules controlling goal-directed behavior as well as decision making and planning. Early studies showed that non-human primates5 and rodents2,6 learn to expect specific outcomes and that these expectations are linked to specific magnitudes or kinds of rewards. Monkeys trained in a simple choice task show “disappointment, hesitation, and searching behavior” when they find a non-preferred food item where a preferred food item used to be,5 and the running time of rats in a runway changes dramatically when they experience a sudden shift in reward magnitude.2,6 Since these initial findings, reward expectations have been addressed extensively in pigeons,7 rodents,8,9 non-human primates10 and humans.11 However, very little is known about reward expectations in invertebrate species. Here, I shall review a series of four recent studies focusing on how the food-gathering behavior of honeybees depends upon expectations of reward. The first two studies concern the behavior of free-flying bees foraging under conditions closely mimicking a natural situation.12,13 Using an approach common in behavioral ecology, they helped to assess the role of reward expectations in the ecology of foraging. The remaining two studies concern a laboratory procedure suitable for analysis of the neural substrates of reward expectations in the honeybee brain.14,15 They made use of an approach common in the study of reward expectations, which relies heavily on experiments with restrained subjects under highly controlled conditions.
Using this approach, for example, studies in mammals have shown that interaction between the basolateral complex of the amygdala and the orbitofrontal cortex is necessary for the development and subsequent use of reward expectations involved in goal-directed behaviours.4,11,16,17 I shall discuss these studies in the context of individual and collective foraging, on the one hand, and the neurobiology of learning and memory of reward, on the other.
Honeybees live in large colonies whose primary source of energy is the nectar found within flowers. Nectar availability varies continuously, both in space and time, depending on species-specific flowering patterns, weather conditions, and the activity of other pollinators.18–23 In spite of such variability, honeybees gather energy efficiently using their learning and memory skills.24,25 They learn, for example, the location and time of day when flowers are productive as well as their odors, colors and shapes.26–30 But can they also learn that reward level increases or decreases over time? In a recent study,12 we trained bees to forage individually on an artificial flower patch offering increasing (small-medium-large), decreasing (large-medium-small) or constant (small, medium or large) reward levels (Fig. 1A). Next, after a long foraging pause, we recorded how persistently they searched for food at the patch in the absence of reward. We found that the bees that had previously experienced increasing reward levels searched for food more persistently than the bees that had experienced decreasing reward levels (Fig. 1B). This difference could not be explained by the bees’ most recent reward experience or by the total amount of reward that they had previously collected. This conclusion was drawn from the fact that the bees that had collected small, medium and large constant rewards searched for food with equal persistence during testing (even though their last reward experience and the total amount of reward that they had previously collected were clearly different, Fig. 1B). We also found that the difference in persistence between the increasing and decreasing groups was independent of classical and/or operant associations between the offered reward and its predicting signals (see Fig. 1 in Gil et al.12).
Taken together, the results of this study showed that bees can learn that the level of reward either increases or decreases over time and, subsequently, adjust their persistence during food searches accordingly. Can they also learn the magnitude of reward variations? We addressed this question in a second study13 in which we trained bees to forage individually on an artificial flower patch offering either a large (small-large) or a small (either small-medium or medium-large) increase in reward level (Fig. 1C). After a long foraging pause following training, we recorded how persistently they searched for food at the patch in the absence of reward. We found that the bees that had previously experienced a large increase in reward level searched for food more persistently than the bees that had experienced a small increase in reward level (Fig. 1D). As before, this difference could not be explained by the bees’ most recent reward experience or by the total amount of reward that they had previously collected. Honeybees, therefore, adjust their persistence to search for food in relation to both the sign and magnitude of past variations in reward level.12,13
The outcome of the above experiments is shown schematically in Figure 2. When honeybees forage on a flower patch offering variable reward levels, two parallel learning processes take place. On the one hand, bees learn the sign and magnitude of reward variations across successive foraging visits. They may do this using a built-in change detector that computes the difference in reward magnitude across foraging events. This computation leads to an estimate of an expected reward; we refer to such an estimate as a reward memory. On the other hand, bees associate the offered reward (as the US) with signals and cues present at the feeding site (as CSs) and an associative memory is formed. When a bee visits the feeding site after a long foraging pause, these memories are retrieved by reward-predicting stimuli. Associative memories are revealed through the bee’s choice behaviour,12 whereas reward memories are revealed through the bee’s persistence in searching for food in the absence of reward.12,13 Hence, foraging honeybees adjust their investment of time/energy during food searches in relation to both the sign and magnitude of past variations in the level of reward. An increase in reward level leads to the formation of expectations of reward that enhance a forager’s reliance on a food source, and the strength of such reliance increases together with the magnitude of the past increase in reward level. This ability can make it more likely for foragers to compete successfully with other flower pollinators for limited resources as well as to maximize their individual rates of food collection by increasing their chances of finding food when forage is scarce. Because honeybees are social insects, one might ask how the colony as a whole benefits from a honeybee’s ability to develop expectations of reward. Such an ability might enhance a colony’s selectivity among variable nectar sources.
It would be interesting to examine the within-hive individual behavior as well as the pattern of collective behavior of honeybees foraging on multiple feeders offering increasing, decreasing and constant reward levels. One might also compare the ability of different species of bees to develop and use reward memories, and study how such ability relates to the specificities of their particular environments.
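The change detector proposed above can be illustrated with a minimal sketch. It is not the model used in the studies; the function names and the reward values (arbitrary units) are assumptions chosen only to show how a sign and a magnitude can be derived from differences in reward level across successive visits.

```python
# Hypothetical reward levels in arbitrary units (not values from the studies).
SMALL, MEDIUM, LARGE = 1.0, 2.0, 3.0


def reward_changes(rewards):
    """Differences in reward level between successive foraging visits."""
    return [later - earlier for earlier, later in zip(rewards, rewards[1:])]


def summarize_variation(rewards):
    """Return (sign, magnitude) of the net change in reward level.

    sign is +1 for an increase, -1 for a decrease and 0 for no change.
    """
    net = sum(reward_changes(rewards))
    sign = (net > 0) - (net < 0)
    return sign, abs(net)


# Schedules loosely mirroring the designs described in the text.
increasing = summarize_variation([SMALL, MEDIUM, LARGE])
decreasing = summarize_variation([LARGE, MEDIUM, SMALL])
constant_large = summarize_variation([LARGE, LARGE, LARGE])
large_increase = summarize_variation([SMALL, LARGE])
small_increase = summarize_variation([SMALL, MEDIUM])
```

In this sketch the three constant schedules all yield a sign of 0 regardless of how much reward was collected, whereas the increasing and decreasing schedules differ in sign, and the large and small increases differ in magnitude.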
Optimal foraging theory attempts to predict foraging behavior in situations where sources of variable quality are heterogeneously distributed in space (reviewed in ref. 31). According to theory, foragers assess feeding site quality using an optimization rule that tends to maximize the rate of net energy intake.32 Thus, a forager’s investment of time/energy at a given patch is positively correlated with food availability.32–34 This rule does not capture a forager’s investment of time/energy in the absence of reward.31 When a honeybee searches for food at an empty source, searching carries a cost, and memories of past reward experiences at the site help the forager to determine how much time/energy ought to be invested in the ongoing task. The results presented above show that a honeybee’s persistence in searching for food on a negative energy budget relies on its previously developed expectations of reward.12,13 Effort has been made to build models incorporating learning and memory phenomena into the context of foraging. One such model incorporates reward level variability into the forager’s evaluation of food patch quality.35 It predicts that the foraging behavior of animals that have previously experienced variable rewards at a given patch will depend upon their memories of either the most recent reward level or the average reward level experienced, depending on the time elapsed since the last encounter with reward.35 Our results do not match predictions from this model.12,13 Therefore, an alternative model is needed to explain how honeybees use reward memories during foraging.
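The mismatch with that model can be made concrete with a small sketch. It rests on strong simplifying assumptions that are not taken from the studies: reward levels are arbitrary units, each schedule has one visit per level, and persistence is assumed to rise monotonically with whichever memory cue guides behavior. Under those assumptions, the code checks which cue reproduces the two qualitative observations reported above, namely that the increasing group persisted more than the decreasing group and that the three constant groups did not differ.

```python
from statistics import mean

# Hypothetical reward levels in arbitrary units (not values from the studies).
SMALL, MEDIUM, LARGE = 1.0, 2.0, 3.0

SCHEDULES = {
    "increasing": [SMALL, MEDIUM, LARGE],
    "decreasing": [LARGE, MEDIUM, SMALL],
    "constant_small": [SMALL, SMALL, SMALL],
    "constant_medium": [MEDIUM, MEDIUM, MEDIUM],
    "constant_large": [LARGE, LARGE, LARGE],
}


def most_recent(rewards):
    """Cue 1: the last reward level experienced."""
    return rewards[-1]


def average(rewards):
    """Cue 2: the average reward level experienced."""
    return mean(rewards)


def mean_change(rewards):
    """Cue 3: the mean change in reward level between successive visits."""
    return mean(b - a for a, b in zip(rewards, rewards[1:]))


def matches_observations(cue):
    """True if the cue ranks increasing above decreasing AND gives
    the same value for all three constant schedules."""
    increasing_wins = cue(SCHEDULES["increasing"]) > cue(SCHEDULES["decreasing"])
    constant_values = {
        cue(levels) for name, levels in SCHEDULES.items()
        if name.startswith("constant")
    }
    return increasing_wins and len(constant_values) == 1


results = {
    cue.__name__: matches_observations(cue)
    for cue in (most_recent, average, mean_change)
}
```

In this toy setting, the most recent reward separates the constant groups, which were not observed to differ, and the average reward fails to separate the increasing and decreasing groups. Only a cue based on changes in reward level is consistent with both observations, which is the intuition behind the call for an alternative model.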
In addressing neural correlates of reward expectations in honeybees, one has to find a behavioral correlate of reward memories suitable for laboratory studies. One such behavioral correlate is the honeybee proboscis extension response (PER).36–38 This response allows bees to gather sugar solution and is triggered when the gustatory receptors of the antennae, proboscis and tarsi are stimulated with sucrose.37 In a recent study,14 we asked whether harnessed bees can learn the sign of reward variations so as to subsequently adjust their PERs. We used an experimental design similar to that of our initial experiment with free-flying bees.12 We first trained bees by coupling the stimulation of one antenna with either increasing, decreasing or constant reward levels offered to their proboscis throughout consecutive learning trials (Fig. 3A). We then recorded the bees’ proboscis extension (PE) reaction-time to sucrose stimulation of the antenna in the absence of reward. We found that the bees that had experienced increasing reward levels subsequently extended their probosces earlier than the bees that had experienced decreasing or constant reward levels (Fig. 3B). These results could not be accounted for by the bees’ most recent experience or the total amount of reward that they received during training. The bees that had experienced small, medium or large constant rewards showed similar reaction-times, although their last reward experience and the total amount of received reward were different (Fig. 3B). Therefore, one can conclude that harnessed bees can learn that reward level increases or decreases over time and adjust their PERs accordingly.14 However, further studies addressing neural correlates of reward memories in harnessed bees require within-animal controls. This is because recordings of neural activity are variable and, therefore, a reference from the same experimental subject is necessary for analysis of responses to any given stimulus.
In a new series of experiments, we aimed to incorporate within-animal controls into our laboratory procedure. To this end, we asked whether bees can learn side-specifically that reward level increases or decreases over time.15 Side-specific learning is well documented in honeybees.39–42 We developed a side-specific training procedure in which stimulation of one antenna was coupled with increasing reward levels and stimulation of the other antenna with decreasing reward levels throughout consecutive learning trials (Fig. 3C). Next, at different times following training, we recorded the bees’ PE reaction-time to sucrose stimulation of each antenna in the absence of reward. We found that the bees extended their probosces earlier after stimulation of the antenna that had been linked to increasing reward levels than after stimulation of the antenna that had been linked to decreasing reward levels (Fig. 3D). Therefore, bees can also learn side-specifically that reward increases or decreases over time. They develop both short- and long-term side-specific reward memories, and the long-term memories are extinguished by repetitive stimulation of their antennae (Fig. 3D). We also found that these side-specific adjustments of the PE response involve an interplay between gustatory and mechanosensory input, and correlate well with the activity of muscles responsible for controlling the movements of the proboscis.15 Taken together, these findings constitute a basis on which further analyses of reward memories can be built. Such analyses will include within-animal controls and physiological correlates of a robust behavioral measure.
The events involved in this side-specific learning are schematically shown in Figure 4. Bees learn to associate gustatory and mechanical stimulation of each antenna with either increasing or decreasing rewards offered to their probosces throughout consecutive learning trials. They do this using a built-in change detector that computes differences in reward level (linked to each antenna) across feeding events. This computation leads to the formation of an internal estimate of an expected reward associated with each input side, and then, to the formation of side-specific reward memories. After training, gustatory and mechanosensory input can activate both short- and long-term side-specific reward memories. Activation of such memories leads to side differences in a honeybee’s PE reaction-time, also evinced by activity of the muscles (M17s) involved in movement of the proboscis. It would be interesting to address how the magnitude and frequency of reward variations relate to the adjustment of a honeybee’s PER as well as the relative involvement of mechanical and gustatory inputs in side-specific learning. In addition, the fact that the above procedure allows analysis of within-animal behavioral correlates of reward memories makes it suitable for pharmacological, electrophysiological and optophysiological study of neural substrates underlying these memories. These studies can be combined with neuro-anatomical studies identifying brain areas where projections of gustatory receptors from the antenna and proboscis, on the one hand, and mechanosensory receptors from the antenna, on the other, converge. 
Previous studies showed that gustatory receptors of the antenna project into the ipsilateral antennal lobe (AL), dorsal lobe (DL) and suboesophageal ganglion (SOG).43 The gustatory receptors of the proboscis project into the SOG and ascend to the DLs.44 The mechanoreceptors of the antenna project into the ipsilateral DL and SOG.43,45 Hence, the DL and the SOG seem to be the first-order neuropils for processing mechanosensory and gustatory input from the antennae and the proboscis. Electrophysiological and optophysiological experiments will address how neural activity in these neuropils relates to the adjustment of a honeybee’s PER occurring after activation of reward memories. Pharmacological approaches would also prove fruitful in this context. For example, it would be interesting to evaluate the role of octopamine (OA), a biogenic amine involved in associative learning, memory retrieval, and food arousal in honeybees,46–49 during the formation and retrieval of reward memories.
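The side-specific change detector outlined in Figure 4 can be sketched as a pair of independent memories, one per antenna, each updated trial by trial. This is an illustrative sketch only: the class, the trial sequence and the reward values are hypothetical, and the assumption that a more positive trend corresponds to an earlier proboscis extension is a qualitative reading of the reported result, not a fitted model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SideMemory:
    """Running record of reward changes linked to one antenna (hypothetical)."""
    last_reward: Optional[float] = None
    changes: List[float] = field(default_factory=list)

    def update(self, reward: float) -> None:
        """Store the change relative to the previous reward on this side."""
        if self.last_reward is not None:
            self.changes.append(reward - self.last_reward)
        self.last_reward = reward

    @property
    def trend(self) -> float:
        """Mean change in reward level across trials on this side."""
        return sum(self.changes) / len(self.changes) if self.changes else 0.0


def train(trials):
    """Feed (side, reward) trials to two independent side memories."""
    memories = {"left": SideMemory(), "right": SideMemory()}
    for side, reward in trials:
        memories[side].update(reward)
    return memories


# Hypothetical interleaved training: left antenna paired with increasing
# rewards, right antenna with decreasing rewards (arbitrary units).
trials = [
    ("left", 1.0), ("right", 3.0),
    ("left", 2.0), ("right", 2.0),
    ("left", 3.0), ("right", 1.0),
]
memories = train(trials)

# Assumption: the side with the more positive trend responds earlier.
earlier_side = max(memories, key=lambda side: memories[side].trend)
```

Because both memories belong to the same animal, the difference between the two trends provides the kind of within-animal reference that the procedure was designed to supply.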
The results of the above studies show that honeybees learn the sign and magnitude of reward variations and develop long-term reward expectations allowing adjustment of time/energy investment during foraging. The results also show that honeybees adjust their PE reaction-time in relation to the sign of reward variations; this form of learning involves the joint action of gustatory and mechanosensory input of the antennae, and can be side-specific. These studies constitute a basis on which three lines of investigation can be built. The first line concerns the role of reward expectations in honeybee foraging. In particular, further experiments are needed to address how the colony as a whole benefits from a honeybee’s ability to develop expectations of reward. The second line of investigation concerns the question of how reward expectations can be incorporated into theoretical accounts of individual and collective foraging. The third line concerns identifying neural substrates involved in the development of reward memories in honeybees. Progress in these three lines of investigation will bring together behavioral, theoretical and physiological data to better understand the role and underlying mechanisms of reward memories in honeybees.
I am indebted to R.J. De Marco for his permanent encouragement, fruitful discussions and valuable comments. I thank R. Menzel for his helpful comments and support. This work was supported by the Deutsche Forschungsgemeinschaft (DFG).
Previously published online: www.landesbioscience.com/journals/cib/article/10621