Abstract
Decision making is a core competence for animals and humans acting and surviving in environments they only partially comprehend, gaining rewards and punishments for their troubles. Decision-theoretic concepts permeate experiments and computational models in ethology, psychology, and neuroscience. Here, we review a well-known, coherent Bayesian approach to decision making, showing how it unifies issues in Markovian decision problems, signal detection psychophysics, sequential sampling, and optimal exploration, and discuss paradigmatic psychological and neural examples of each problem. We discuss computational issues concerning what subjects know about their task and how ambitious they are in seeking optimal solutions; we address algorithmic topics concerning model-based and model-free methods for making choices; and we highlight key aspects of the neural implementation of decision making.
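To give a concrete flavor of the sequential-sampling ideas the review unifies, the sketch below implements Wald's sequential probability ratio test (SPRT), the classic optimal stopping rule for choosing between two hypotheses from noisy samples. This is our illustrative code, not the authors'; the Gaussian means, noise level, and error rates are arbitrary assumptions chosen for the example.

```python
import numpy as np

# Minimal sketch of Wald's sequential probability ratio test (SPRT).
# All parameter values are illustrative, not taken from the article.
rng = np.random.default_rng(0)

def sprt(mu_true, mu0=-0.5, mu1=0.5, sigma=1.0, alpha=0.05, beta=0.05):
    """Accumulate noisy samples until the log likelihood ratio crosses
    one of Wald's thresholds; return (chosen hypothesis, sample count)."""
    upper = np.log((1 - beta) / alpha)   # cross this -> choose H1
    lower = np.log(beta / (1 - alpha))   # cross this -> choose H0
    llr, n = 0.0, 0
    while lower < llr < upper:
        x = rng.normal(mu_true, sigma)   # draw one noisy observation
        # log likelihood ratio of this sample under H1 vs. H0
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        n += 1
    return (1 if llr >= upper else 0), n

choice, n = sprt(mu_true=0.5)
print(f"decision: H{choice} after {n} samples")
```

Raising the thresholds (smaller alpha, beta) trades longer sampling for fewer errors, the speed-accuracy trade-off at the heart of drift-diffusion accounts of perceptual choice.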
Additional information
Funding came from the Gatsby Charitable Foundation (to P.D.).
About this article
Cite this article
Dayan, P., Daw, N.D. Decision theory, reinforcement learning, and the brain. Cognitive, Affective, & Behavioral Neuroscience 8, 429–453 (2008). https://doi.org/10.3758/CABN.8.4.429