How to cite this WIREs title:
WIREs Cogn Sci
Impact Factor: 2.824

Model‐based approaches to neuroimaging: combining reinforcement learning theory with fMRI data


Abstract The combination of functional magnetic resonance imaging (fMRI) with computational models of a given cognitive process provides a powerful framework for testing hypotheses about the neural computations underlying that process in the brain. Here, we outline the steps involved in implementing this approach, with reference to reinforcement learning (RL) models that can account for human choice behavior during value‐based decision making. The model generates internal variables, which are used to construct fMRI predictor variables that are then regressed against individual subjects' fMRI data. The resulting regression coefficients reflect the strength of the correlation between blood oxygenation level dependent (BOLD) activity and the relevant internal variables of the model. In the second part of this review, we describe human neuroimaging studies that have employed this analysis strategy to identify brain regions involved in the computations mediating reward‐related decision making. Copyright © 2010 John Wiley & Sons, Ltd. This article is categorized under: Neuroscience > Cognition

An example of a computational model that can be used in combination with functional magnetic resonance imaging data: reinforcement learning (RL). The goal of this model is to learn the expected reward attributable to each of a set of actions in the world (e.g., A and B), and to guide action selection so that the action associated with the highest expected reward is favored. This particular RL model instantiation uses a temporal difference learning rule to learn the value predictions and a softmax rule for action selection. The index variable t denotes within‐trial time. The model has five internal variables: the prediction error (PE) δ, the estimated value predictions for the two actions VA and VB, and the softmax‐transformed action probabilities PA and PB. These variables are plotted at trial‐by‐trial resolution but are modeled at different time points within a trial when converted to predictors in a general linear model (Figure 2). The PE δ, weighted by the learning rate α, determines the size of the value update on each trial. Softmax action selection is realized by passing the value difference through a sigmoid function whose slope is controlled by the inverse temperature τ. This operation converts the values to action probabilities. The parameter τ represents the stochasticity of the choices or, conversely, the reward sensitivity: if τ is small, even large value differences result in very similar action probabilities and the model's choices are virtually random. In contrast, if τ is large, even small value differences in the medium value range are exaggerated into clearly different action probabilities, thus leading to different choices. The model likelihood is used as a cost function in an optimization procedure that determines the model parameters α and τ so that the model's fit to the individual choice history is maximal.
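The update and choice rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it collapses the temporal difference rule to a single outcome-time update per trial (equivalent to a Rescorla-Wagner update), starts both values at 0, and replaces a proper optimizer with a coarse grid search. The names `softmax_prob_a`, `run_model`, and `fit_grid` are hypothetical.

```python
import math


def softmax_prob_a(v_a, v_b, tau):
    """P(choose A): sigmoid of the value difference, slope set by inverse temperature tau."""
    return 1.0 / (1.0 + math.exp(-tau * (v_a - v_b)))


def run_model(choices, rewards, alpha, tau, v0=0.0):
    """Replay a choice history ('A'/'B') and return per-trial variables plus the log-likelihood.

    Assumption: one value update per trial at outcome time (simplified TD rule).
    """
    v = {"A": v0, "B": v0}
    trials, log_lik = [], 0.0
    for choice, reward in zip(choices, rewards):
        p_a = softmax_prob_a(v["A"], v["B"], tau)
        p_choice = p_a if choice == "A" else 1.0 - p_a
        log_lik += math.log(max(p_choice, 1e-12))  # guard against log(0)
        delta = reward - v[choice]                 # prediction error for the chosen action
        trials.append({"V_A": v["A"], "V_B": v["B"],
                       "P_A": p_a, "P_B": 1.0 - p_a, "delta": delta})
        v[choice] += alpha * delta                 # update scaled by the learning rate
    return trials, log_lik


def fit_grid(choices, rewards, alphas, taus):
    """Return (alpha, tau, log_lik) maximizing the likelihood over a parameter grid."""
    best = None
    for alpha in alphas:
        for tau in taus:
            _, ll = run_model(choices, rewards, alpha, tau)
            if best is None or ll > best[2]:
                best = (alpha, tau, ll)
    return best
```

Calling `run_model(choices, rewards, alpha, tau)` with the fitted parameters yields the trial-by-trial δ, VA, VB, PA, and PB series that later serve as fMRI predictors. In practice a gradient-based or simplex optimizer would usually replace the grid search.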
As an initial visual quality check, the model's binned action probabilities for one particular action (e.g., A) can be plotted against the actual choice probabilities (determined, e.g., as the percentage of choices for option A within each bin), and the increase across these bins can be examined (lower right panel). Deviations of this increase from the y = x line indicate whether the model is severely over‐ or underpredicting the actual choices of a subject.
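The binning step behind this check might look as follows. This is a sketch only: the helper name `calibration_bins`, the equal-width bins, and the default of five bins are assumptions made for illustration.

```python
def calibration_bins(p_model, chose_a, n_bins=5):
    """Bin trials by the model's P(A) and compare with the observed fraction of A choices.

    p_model: per-trial model probabilities of choosing A (0..1)
    chose_a: per-trial indicators (1/True if A was actually chosen)
    Returns one (mean predicted P(A), observed fraction of A, trial count)
    tuple per non-empty bin, in order of increasing predicted probability.
    """
    sums = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of p, number of A choices, trials]
    for p, chosen in zip(p_model, chose_a):
        i = min(int(p * n_bins), n_bins - 1)     # p == 1.0 falls into the last bin
        sums[i][0] += p
        sums[i][1] += int(chosen)
        sums[i][2] += 1
    return [(s / n, k / n, n) for s, k, n in sums if n > 0]
```

Plotting the first element of each tuple against the second gives the curve to compare with the y = x line; points below the line mean the model overpredicts choices of A in that bin, points above mean it underpredicts them.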


Application of the computational model to functional magnetic resonance imaging (fMRI) data. Internal variables derived from the model [e.g., prediction errors (PEs), modeled at the time of the outcome presentation of each trial] are converted into a time series and convolved with a hemodynamic response function, yielding a regressor in a single‐subject fMRI design matrix. This general linear model is fitted at each voxel in the brain. Subsequent statistical contrasts on the parameter estimates of the newly created regressor yield a statistical map describing the degree of correlation between the BOLD time series of a particular voxel and the internal variable of interest (in this case the PE). Finally, the goodness of fit of the model‐based variable to the time series in a particular brain region can be visualized by plotting the event‐related averaged time series for a given trial or event, separated into bins that capture different levels of the internal variable (here: low, medium, and high PEs).
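The regressor construction and the single-voxel fit can be sketched as below. Several assumptions are made for illustration: the hemodynamic response is a simplified double-gamma shape rather than the canonical function shipped with any particular analysis package, outcome onsets are rounded to the nearest scan, the PE values are mean-centered before convolution, and the design contains only this one regressor plus an intercept. The names `hrf`, `pe_regressor`, and `ols_fit` are hypothetical.

```python
import math


def hrf(t):
    """Simplified double-gamma HRF at time t (seconds): early peak minus a later, smaller undershoot."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def pe_regressor(onsets, pes, n_scans, tr, hrf_len=32.0):
    """Place mean-centered PEs at outcome onsets and convolve with the HRF, one value per scan."""
    mean_pe = sum(pes) / len(pes)
    sticks = [0.0] * n_scans
    for onset, pe in zip(onsets, pes):
        idx = int(round(onset / tr))          # assumption: onsets snapped to the scan grid
        if 0 <= idx < n_scans:
            sticks[idx] += pe - mean_pe
    kernel = [hrf(k * tr) for k in range(int(hrf_len / tr) + 1)]
    return [sum(sticks[i - k] * kernel[k] for k in range(min(i + 1, len(kernel))))
            for i in range(n_scans)]


def ols_fit(x, y):
    """Least-squares fit of y = intercept + beta * x for one voxel time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x
```

Repeating `ols_fit` over every voxel's time series gives a map of parameter estimates for the PE regressor; a real analysis would add further regressors (e.g., for other trial events and nuisance signals) and test the estimates statistically.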

