Efficiently regulating internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we define primary rewards as outcomes that fulfill physiological needs, and use this definition to build a normative theory showing how the learning of motivated behaviors may be modulated by internal states. Within this framework, we mathematically prove that seeking rewards is equivalent to the fundamental objective of physiological stability, defining the notion of physiological rationality of behavior. We further suggest a formal basis for temporal discounting of rewards by showing that discounting motivates animals to follow the shortest path in the space of physiological variables toward the desired setpoint. We also explain how animals learn to act predictively to preclude prospective homeostatic challenges, and account for several other behavioral patterns. Finally, we suggest a computational role for the interaction between the hypothalamus and the brain reward system.
Our survival depends on our ability to maintain internal states, such as body temperature and blood sugar levels, within narrowly defined ranges, despite being subject to constantly changing external forces. This process, which is known as homeostasis, requires humans and other animals to carry out specific behaviors, such as seeking out warmth or food, to compensate for changes in their environment. Animals must also learn to prevent the potential impact of changes that can be anticipated.
A network that includes different regions of the brain allows animals to perform the behaviors that are needed to maintain homeostasis. However, this network is distinct from the network that supports the learning of new behaviors in general. These two systems must, therefore, interact so that animals can learn novel strategies to support their physiological stability, but it is not clear how animals do this.
Keramati and Gutkin have now devised a mathematical model that explains the nature of this interaction, and that can account for many behaviors seen among animals, even those that might otherwise appear irrational. The model distinguishes a behavior from its outcome: for a human, for example, the behavior might be going to the kitchen, and the outcome might be eating chocolate. There are two assumptions at the heart of the model. First, it is assumed that animals are capable of guessing the impact of the outcome of their behaviors on their internal state. Second, it is assumed that animals find a behavior rewarding if they believe that the predicted impact of its outcome will reduce the difference between a particular internal state and its ideal value.
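The second assumption can be made concrete with a minimal sketch. Everything specific in it is an illustrative assumption rather than something stated in the text above: the Euclidean distance used as the measure of "difference" from the ideal value, the function names `drive` and `reward`, and the two made-up internal variables and their numbers.

```python
import math

def drive(state, setpoint):
    """Distance of the internal state from its ideal value.

    Euclidean distance is an assumption made for this sketch.
    """
    return math.sqrt(sum((ideal - h) ** 2 for ideal, h in zip(setpoint, state)))

def reward(state, predicted_impact, setpoint):
    """Reward of an outcome: how much it is predicted to reduce the drive."""
    predicted_state = [h + k for h, k in zip(state, predicted_impact)]
    return drive(state, setpoint) - drive(predicted_state, setpoint)

# Hypothetical internal variables: (body temperature, blood sugar), arbitrary units.
setpoint = [37.0, 5.0]
eat_chocolate = [0.0, 1.0]  # guessed impact of the outcome on the internal state

hungry = [37.0, 3.0]  # blood sugar below its ideal value
sated = [37.0, 5.0]   # already at the ideal value

print(reward(hungry, eat_chocolate, setpoint))  # positive: moves toward the setpoint
print(reward(sated, eat_chocolate, setpoint))   # negative: overshoots the setpoint
```

Under these assumptions the same outcome is rewarding when the animal is in deficit and punishing when it is already at its ideal value, which is how the internal state modulates what counts as a reward.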
Based on these two assumptions, the model shows that animals stabilize their internal state around its ideal value by simply learning to perform behaviors that lead to rewarding outcomes (such as going to the kitchen and eating chocolate). The theory also explains the physiological importance of a type of behavior known as ‘delay discounting’. Animals displaying this form of behavior regard a positive outcome as less rewarding the longer they have to wait for it. The model proves mathematically that delay discounting is a logical way to optimize homeostasis.
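A one-dimensional toy example illustrates why discounting matters here. It is a sketch under stated assumptions, not the general proof: the single internal variable, the absolute-value distance, the two hand-picked trajectories, the discount factor of 0.9 and the helper names are all hypothetical. If reward is the step-by-step reduction in distance from the ideal value, then the undiscounted rewards along any route to the ideal value add up to the same total, so only discounting separates a direct route from a detour.

```python
def rewards_along(path, setpoint=0.0):
    """Per-step rewards: reduction in distance from the setpoint at each step."""
    drives = [abs(setpoint - h) for h in path]
    return [before - after for before, after in zip(drives, drives[1:])]

def discounted_return(rewards, gamma):
    """Sum of rewards, each weighted by gamma raised to its delay in steps."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

# Two hypothetical trajectories of one internal variable, both ending at the setpoint 0.
direct = [-3.0, -2.0, -1.0, 0.0]               # moves straight toward the setpoint
detour = [-3.0, -4.0, -3.0, -2.0, -1.0, 0.0]   # first drifts away, then recovers

for name, path in (("direct", direct), ("detour", detour)):
    r = rewards_along(path)
    print(name, "undiscounted:", discounted_return(r, 1.0),
          "discounted:", discounted_return(r, 0.9))
```

Without discounting (a factor of 1.0) both trajectories earn the same total, equal to the initial deficit. With a factor below 1, the direct trajectory earns more, so an animal that discounts delayed rewards is pushed toward the shorter route back to its ideal internal state.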
In addition to making a number of predictions that could be tested in experiments, Keramati and Gutkin argue that their model can account for the failure of homeostasis to limit food consumption whenever foods loaded with salt, sugar or fat are freely available.