The Reinforcement Learning model of pain motivation and decision-making.
Reinforcement Learning (RL) describes a general algorithmic (computational) method for learning from experience: predicting the occurrence of inherently salient events, and learning actions to exert control over them (maximising rewards, minimising penalties). In RL, an agent learns state or action value functions, or direct action policies, through interacting with the world. These functions can be learned by computing the error between predicted and actual outcomes, and using the error to improve future predictions and actions.
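As an illustration, the error-driven update described above can be sketched in a few lines of Python. This is a minimal delta-rule (Rescorla-Wagner style) sketch, not the specific model used in the studies described here; the function name `learn_values`, the learning rate `alpha`, and the coding of a painful outcome as -1 are all illustrative assumptions.

```python
def learn_values(outcomes, alpha=0.2, v0=0.0):
    """Track a cue's predicted value across trials with a delta rule.

    outcomes: sequence of experienced outcomes (here, pain is coded as -1.0).
    alpha:    learning rate (hypothetical value, chosen for illustration).
    Returns a list of (updated_value, prediction_error) pairs, one per trial.
    """
    value = v0
    history = []
    for outcome in outcomes:
        # Prediction error: actual outcome minus predicted outcome.
        prediction_error = outcome - value
        # Move the prediction a fraction alpha of the way towards the outcome.
        value += alpha * prediction_error
        history.append((value, prediction_error))
    return history


# A cue that is followed by pain on every one of 20 trials.
history = learn_values([-1.0] * 20)
final_value, final_error = history[-1]
print(f"final value: {final_value:.3f}, final error: {final_error:.3f}")
```

In this sketch the prediction error is large on the first trial and shrinks as the cue's value approaches the value of the outcome it predicts.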
In the early 1990s, evidence emerged that RL provided a biologically plausible model for animal reward learning in Pavlovian (prediction) and instrumental (control) conditioning. Our work has shown that RL provides an equally good model for aversive learning - in describing both behaviour and neural responses in aversive conditioning and instrumental avoidance.
By directly comparing reward and aversive learning, we have been able to show how each involves dissociable systems, which act in opposition to each other in Pavlovian conditioning. This allows dual (opponent) processing of things like relief and disappointment. During instrumental conditioning, these systems are integrated to provide a common currency for decision-making spanning reward acquisition and punishment avoidance. Functional MRI indicates that this common currency for decision value is encoded in the ventromedial prefrontal cortex.
Computational models such as this are intended not merely as descriptive accounts of behaviour, but as true mechanistic descriptions of how the brain actually works. Importantly, these models 'work' - they can be simulated or implemented, and will reproduce basic pain-related behaviour. Currently, we are building more complete models that capture the diversity of pain-related responses, and probe how different component systems interact.
A central role of the striatum in aversive learning.
A key finding to emerge from Reinforcement Learning models of aversive learning has been that the ventral putamen encodes an aversive prediction error. This was notable at the time because the striatum had often been considered relatively reward-specific. Our work has shown a functional and anatomical dissociation of reward and punishment systems during Pavlovian conditioning in the striatum, with anterior regions (nucleus accumbens) being more reward specific, and more dorsal-posterior regions being more punishment specific.
During avoidance, our work suggests that the dorsal striatum (dorsal putamen and caudate) adopts a reward-signed representation, consistent with successful avoidance being coded as a 'reward' that guides behavioural reinforcement.
Novel computational insights into dopamine and serotonin.
Dopamine and serotonin are the two neuromodulators most associated with motivation and decision-making. Early work, including our own human fMRI studies, indicates a key role for dopamine in conveying the striatal error term that guides value learning for rewards. However, our more recent work suggests that dopamine plays a broader role than pure learning, by modulating temporal impulsivity for rewards (increasing preference for smaller-sooner rewards over larger-later alternatives), and controlling performance-specific actions (a greater tendency to choose higher-valued options).
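The two effects described above can be made concrete with a small sketch. It assumes two standard modelling choices that are not specified in the text: hyperbolic discounting, with a discount rate `k` standing in for temporal impulsivity, and a softmax choice rule, with an inverse temperature `beta` standing in for the tendency to choose higher-valued options. The function names, reward amounts and delays are hypothetical, and the sketch does not claim that dopamine maps onto either parameter in exactly this way.

```python
import math


def discounted_value(amount, delay, k):
    """Hyperbolic discounting (assumed form): a larger k means steeper discounting."""
    return amount / (1.0 + k * delay)


def p_choose_first(v_first, v_second, beta):
    """Two-option softmax: a larger beta makes choice follow value more closely."""
    return 1.0 / (1.0 + math.exp(-beta * (v_first - v_second)))


def p_smaller_sooner(k, beta, ss=(10.0, 1.0), ll=(20.0, 10.0)):
    """Probability of picking the smaller-sooner option.

    ss and ll are (amount, delay) pairs; the defaults are illustrative only.
    """
    v_ss = discounted_value(ss[0], ss[1], k)
    v_ll = discounted_value(ll[0], ll[1], k)
    return p_choose_first(v_ss, v_ll, beta)


# Raising the discount rate shifts preference towards the smaller-sooner reward.
patient = p_smaller_sooner(k=0.05, beta=0.5)
impulsive = p_smaller_sooner(k=0.5, beta=0.5)
print(f"P(smaller-sooner): patient={patient:.2f}, impulsive={impulsive:.2f}")

# Raising beta makes the higher-valued option more likely to be chosen.
print(p_choose_first(2.0, 1.0, beta=0.5), p_choose_first(2.0, 1.0, beta=2.0))
```

Separating the two parameters in this way is what allows a model to distinguish a change in how delayed rewards are valued from a change in how reliably the more valuable option is selected.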
There is good evidence that serotonin can act as an opponent to dopamine, with a role in punishment and inhibition. However, our recent work (using tryptophan depletion) suggests that its role may be more complex than originally thought: we show that during flexible decision-making, serotonin selectively modulates reward values (and their representation in the medial prefrontal cortex), with no effect on aversive (avoidance) values.
The neuroeconomics of pain.
Recent years have seen substantial interest in the neurophysiological applicability of economic models of individual and social decision-making. A key part of economic models of social behaviour is other-regarding motivation, and our early work showed how observation of aversive outcomes (pain) in others is flexibly represented in the brain - in terms of empathy towards cooperators, and 'schadenfreude' towards competitors. More recently, we have studied how the informational content of social observation modulates our perception of outcomes in ourselves.
Economic models also offer insight into how we explicitly value aversive outcomes such as pain. In particular, we've been able to show how relative value models (in which value is constructed by comparison with recent experience) offer a better account of valuation than absolute models (which assume an intrinsic absolute scale of value in the brain).
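The distinction between the two classes of model can be illustrated with a toy sketch. It assumes, purely for illustration, that an absolute model maps pain intensity directly onto (negative) value, and that a relative model values pain against a reference point given by the mean of recently experienced intensities. These functional forms, the function names and the numbers are hypothetical and are not the fitted models from the work described above.

```python
def absolute_value(intensity):
    """Absolute model (assumed form): value depends only on the stimulus itself."""
    return -intensity


def relative_value(intensity, recent_intensities):
    """Relative model (assumed form): value is judged against recent experience.

    The reference point is the mean of recently experienced intensities;
    with no history, the reference defaults to zero.
    """
    if recent_intensities:
        reference = sum(recent_intensities) / len(recent_intensities)
    else:
        reference = 0.0
    return -(intensity - reference)


# The same moderate pain (intensity 5) following mild versus severe pain.
after_mild = relative_value(5.0, [2.0, 2.0, 2.0])
after_severe = relative_value(5.0, [8.0, 8.0, 8.0])
print(f"absolute: {absolute_value(5.0)}")
print(f"relative after mild: {after_mild}, after severe: {after_severe}")
```

Under the absolute sketch the stimulus has the same value in both contexts, whereas under the relative sketch the same pain is valued as worse after mild pain and as better after severe pain.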
- Wellcome Trust
- National Institute of Information and Communications Technology
- Japan Society for the Promotion of Science (JSPS)
- MEXT (Japan)