David Leslie's research page
Context-dependent decision-making: a simple Bayesian model. Kevin Lloyd and David S. Leslie. Journal of the Royal Society Interface, 10 (2013), 20130069.
Download from publisher's website: http://rsif.royalsocietypublishing.org/content/10/82/20130069.short
A model of individual learning in which the learner allocates observations to clusters in an online manner. Combines Dirichlet process clustering with online inference using single sample particle filters and Thompson sampling for action selection. The model exhibits plausibly realistic behaviour in serial reversal learning tasks, as well as spontaneous recovery, over-training reversal effects and partial reinforcement extinction effects.
Asynchronous stochastic approximation with differential inclusions. Steven Perkins and David S. Leslie. Stochastic Systems, 2 (2012), 409-446.
Uses the differential inclusions framework of stochastic approximation to consider the asynchronous updates problem, along with two-timescales techniques. The technique is demonstrated with an actor-critic method of learning in Markov decision processes.
Respondent driven sampling and community structure in a population of injecting drug users, Bristol, UK. H.L. Mills, C. Colijn, P. Vickerman, D. Leslie, V. Hope and M. Hickman. Drug and Alcohol Dependence, 126 (2012), 324-332.
An investigation of statistical properties of estimators when data are gathered using respondent-driven sampling.
Optimistic Bayesian sampling in contextual-bandit problems. Benedict C. May, Nathan Korda, Anthony Lee and David S. Leslie. Journal of Machine Learning Research, 13 (2012), 2069-2106.
Proves consistency of Thompson Sampling in contextual-bandit problems with consistent regression, and introduces a modification (optimistic Bayesian sampling, OBS) which is also provably consistent but outperforms Thompson sampling in empirical experiments.
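The baseline method of the paper, Thompson sampling, is easy to sketch for Bernoulli arms with Beta posteriors. A minimal sketch follows; the two-armed bandit, the arm probabilities and the round count are illustrative assumptions, not taken from the paper.

```python
import random

def thompson_step(successes, failures):
    """One round of Thompson sampling with independent Beta(1, 1) priors:
    sample a success probability for each arm from its posterior and
    play the arm whose sample is largest."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Illustrative two-armed bandit: arm 1 pays off far more often.
random.seed(0)
true_p = [0.2, 0.8]
succ, fail = [0, 0], [0, 0]
for _ in range(500):
    arm = thompson_step(succ, fail)
    if random.random() < true_p[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1
# Play concentrates on the better arm as its posterior sharpens.
```

The OBS variant studied in the paper changes only the per-arm score used in the argmax, not the posterior updates, so it would drop into the same loop; see the paper for the exact modification.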
Information theory and observational limitations in decision making. David Wolpert and David S. Leslie. The B.E. Journal of Theoretical Economics, 12(1) (2012), Article 5.
Considers the information available to a decision-maker to have been passed through an information channel, and considers the effect of this on observable decision-making. doi:10.1515/1935-1704.1749
A unifying framework for iterative approximate best response algorithms for distributed constraint optimisation problems. Archie C. Chapman, Alex C. Rogers, Nicholas R. Jennings and David S. Leslie. The Knowledge Engineering Review, 26 (2011), 411-444.
Article on publisher website. Local copy (copyright Cambridge University Press, 2011)
Casts DCOPs as potential games, and therefore considers a large number of DCOP algorithms under one framework to allow useful comparison.
Dynamic opponent modelling in fictitious play. Michalis Smyrnakis and David S. Leslie. The Computer Journal, 53 (2010), 1344-1359.
Download from publisher's website.
Uses particle filters to track and predict opponent strategy in fictitious play. doi:10.1093/comjnl/bxq006
Posterior weighted reinforcement learning with state uncertainty. Tobias Larsen, David S. Leslie, E. J. Collins and Rafal Bogacz. Neural Computation, 22 (2010), 1149-1179.
Download from publisher's website.
Reinforcement learning of state values when ambiguity exists over which state a reward relates to.
Nonparametric estimation of the distribution function in contingent valuation models. David S. Leslie, Robert Kohn, and Denzil G. Fiebig. Bayesian Analysis 4 (2009), 573-598.
Download preprint, published version
Places a Dirichlet process prior on the latent variable distribution of a binary regression model, so that the data determine the noise structure.
Generalised linear mixed model analysis via sequential Monte Carlo sampling. Yanan Fan, David S. Leslie and Matt P. Wand. Electronic Journal of Statistics 2 (2008), 916-938.
Download preprint, published version
Uses a sequential Monte Carlo sampler to analyse generalised linear mixed models.
On similarities between inference in game theory and machine learning. Iead Rezek, David S. Leslie, Steven Reece, Stephen J. Roberts, Alex C. Rogers, Rajdeep K. Dash and Nicholas R. Jennings. Journal of Artificial Intelligence Research 33 (2008), 259-283.
Download preprint. Official version on external website.
Introduces Bayesian decision making to fictitious play, giving "moderated fictitious play", and uses derivative action fictitious play to inform a variational learning procedure. Furthermore, discusses these two areas (learning in games and variational learning) in a common language.
A general approach to heteroscedastic linear regression. David S. Leslie, Robert Kohn and David J. Nott. Statistics and Computing 17 (2007), 131-146.
Download preprint. The original publication is available at www.springerlink.com.
Uses a Dirichlet process prior on the noise in a heteroscedastic linear regression, resulting in a very general regression model.
Generalised weakened fictitious play. David S. Leslie and E. J. Collins. Games and Economic Behavior 56 (2006), 285-298.
Download preprint. Official version on external website.
Studies a large class of learning algorithms based on fictitious play, using a unified convergence proof based on stochastic approximation.
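The simplest member of this class, classical fictitious play, fits in a few lines: each player best responds to the empirical frequencies of the opponent's past actions. The symmetric coordination game and the uniform initial counts below are illustrative assumptions, not taken from the paper.

```python
def best_response(payoff, opp_freq):
    """Pure best response: the action maximising expected payoff
    against the opponent's empirical action frequencies."""
    values = [sum(payoff[a][b] * opp_freq[b] for b in range(len(opp_freq)))
              for a in range(len(payoff))]
    return max(range(len(values)), key=values.__getitem__)

# Symmetric 2x2 coordination game: both players prefer to match,
# and matching on action 0 pays more than matching on action 1.
payoff = [[2.0, 0.0], [0.0, 1.0]]
# counts[i][b]: player i's (fictitious) count of opponent action b,
# initialised uniformly so first-round beliefs are (0.5, 0.5).
counts = [[1.0, 1.0], [1.0, 1.0]]
for _ in range(200):
    freqs = [[c / sum(row) for c in row] for row in counts]
    a0 = best_response(payoff, freqs[0])  # player 0 responds to player 1's history
    a1 = best_response(payoff, freqs[1])  # and vice versa
    counts[0][a1] += 1
    counts[1][a0] += 1
# Play locks on to the payoff-dominant equilibrium (action 0, action 0).
```

The generalised weakened versions studied in the paper allow perturbed, approximate best responses and weighted histories; the point of the unified stochastic approximation proof is that the same limiting behaviour covers all of them.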
Individual Q-learning in normal form games. David S. Leslie and E. J. Collins. SIAM Journal on Control and Optimization 44 (2005), 495-514.
Download the published version.
This paper studies a simple temporal difference learning algorithm in repeated normal form games, using results on multiple-timescales stochastic approximation obtained in the previous paper.
Convergent multiple-timescales reinforcement learning algorithms in normal form games. David S. Leslie and E. J. Collins. Annals of Applied Probability 13 (2003), 1231-1251.
Download the published version.
This paper proves a result on multiple-timescales stochastic approximation, and uses it to investigate multi-agent actor-critic reinforcement learning.
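The multiple-timescales idea can be illustrated on a toy problem: a fast iterate averages a noisy gradient while a slow iterate descends using that average, so the slow process sees an effectively converged fast one. The quadratic objective and the particular step-size schedules below are assumptions for illustration, not the setting of the paper.

```python
import random

random.seed(1)
x = 3.0  # slow iterate: decision variable, minimising f(x) = x**2
y = 0.0  # fast iterate: running estimate of the noisy gradient at x
for n in range(1, 20001):
    fast = n ** -0.6  # fast step size decays more slowly, so y sees a near-frozen x
    slow = 1.0 / n    # slow step size decays faster, so x moves quasi-statically
    noisy_grad = 2.0 * x + random.gauss(0.0, 1.0)  # grad f(x) = 2x, plus noise
    y += fast * (noisy_grad - y)  # fast timescale: average the noise away
    x -= slow * y                 # slow timescale: gradient step using the average
# x ends up near the minimiser 0 despite never seeing a clean gradient.
```

In the actor-critic application of the paper the same separation appears with the critic on the fast timescale and the actor on the slow one.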
Reinforcement learning in games. Ph.D. Thesis. Supervisor: E. J. Collins.
I was supported by CASE research studentship 00317214 from the UK Engineering and Physical Sciences Research Council in cooperation with BAE SYSTEMS.
As well as containing the foundations of the previous three papers, the thesis considers the stochastic approximation of an automata-based learning procedure, and some contraction properties of smooth best response operators.
Population-level reinforcement learning resulting in smooth best response dynamics. David S. Leslie and E. J. Collins. Technical report 02:13, Department of Statistics, University of Bristol.
This paper studies an evolutionary procedure that is closely related to the smooth best response dynamics.
