Simple learning rules to cope with changing environments
R. Groß, A. I. Houston, E. J. Collins, J. M. McNamara, F.-X. Dechaume-Moncharmont and N. R. Franks
We consider an agent that must choose repeatedly among several actions. Each action has a
certain probability of giving the agent an energy reward, and costs may be associated with
switching between actions. The agent does not know which action has the highest reward
probability, and the probabilities change randomly over time. We study two learning rules
that have been widely used to model decision-making processes in animals—one
deterministic and the other stochastic. In particular, we examine the influence of the rules’
‘learning rate’ on the agent’s energy gain. We compare the performance of each rule with the
best performance attainable when the agent has either full knowledge or no knowledge of the
environment. Over relatively short periods of time, both rules are successful in enabling
agents to exploit their environment. Moreover, under a range of effective learning rates, both
rules are equivalent, and can be expressed by a third rule that requires the agent to select the
action for which the current run of unsuccessful trials is shortest. However, the performance
of both rules is relatively poor over longer periods of time, and under most circumstances no
better than the performance an agent could achieve without knowledge of the environment.
We propose a simple extension to the original rules that enables agents to learn about and
effectively exploit a changing environment for an unlimited period of time.
Keywords: decision making; learning rules; dynamic environments; multi-armed bandit.
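
The "third rule" described in the abstract, selecting the action whose current run of unsuccessful trials is shortest, can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the random tie-breaking, the uniform redrawing of reward probabilities, and the parameter names (`change_prob`, `n_trials`) are assumptions made here for concreteness.

```python
import random


def choose_action(fail_runs, rng):
    """Select the action with the shortest current run of unsuccessful
    trials; ties are broken at random (tie-breaking is an assumption)."""
    shortest = min(fail_runs)
    return rng.choice([i for i, r in enumerate(fail_runs) if r == shortest])


def simulate(initial_probs, n_trials, change_prob=0.01, seed=1):
    """Run the rule on a multi-armed bandit whose reward probabilities
    are redrawn uniformly with probability `change_prob` each trial
    (an illustrative model of a randomly changing environment)."""
    rng = random.Random(seed)
    probs = list(initial_probs)
    fail_runs = [0] * len(probs)  # current run of failures per action
    rewards = 0
    for _ in range(n_trials):
        if rng.random() < change_prob:
            probs = [rng.random() for _ in probs]  # environment changes
        a = choose_action(fail_runs, rng)
        if rng.random() < probs[a]:
            rewards += 1
            fail_runs[a] = 0  # success resets the run for this action
        else:
            fail_runs[a] += 1
    return rewards
```

Because a single success resets an action's failure run to zero, the rule keeps revisiting actions whose recent record is good, while long failure runs push the agent to sample alternatives, which is what lets it track a changing environment over short horizons.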