Title :
Factorized decision forecasting via combining value-based and reward-based estimation
Author :
Ziebart, Brian D.
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
A powerful recent perspective on predicting sequential decisions learns the parameters of decision problems for which the observed behavior is a (near-)optimal solution. Under this perspective, behavior is explained in terms of utilities, which can often be defined as functions of state and action features to enable generalization across decision tasks. Two approaches have been proposed from this perspective: estimate a feature-based reward function and recursively compute values from it, or directly estimate a feature-based value function. In this work, we investigate combining these two approaches into a single learning task using directed information theory and the principle of maximum entropy. This makes it possible to uncover which type of estimate is most appropriate (in terms of predictive accuracy, computational benefit, or both) for different portions of the decision space.
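The two estimation routes contrasted in the abstract can be illustrated on a tiny tabular problem. The sketch below is a generic illustration under a maximum-entropy (softmax) choice model, not the paper's combined, factorized method. The reward-based route maps reward weights to a policy through a finite-horizon soft (log-sum-exp) value recursion, while the value-based route scores state-action features directly with no recursion. The transition table `P`, feature arrays `phi` and `psi`, and all function names are hypothetical, and neither function fits parameters from data; each only maps given parameters to a stochastic policy.

```python
import math
from typing import List, Sequence

# Hypothetical tabular decision problem (an assumption for illustration):
#   P[s][a][s2] = probability of moving from state s to s2 under action a
#   phi[s][a]   = feature vector of the state-action pair (s, a)


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def reward_based_policy(P, phi, theta, horizon) -> List[List[List[float]]]:
    """Reward-based estimate: r(s,a) = theta . phi(s,a); values are computed
    recursively backward in time with a soft (log-sum-exp) Bellman backup."""
    n_s, n_a = len(phi), len(phi[0])
    v_next = [0.0] * n_s  # soft value after the final step
    policies = []
    for _ in range(horizon):
        q = [[dot(theta, phi[s][a])
              + sum(P[s][a][s2] * v_next[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        v = [logsumexp(q[s]) for s in range(n_s)]
        # Softmax policy pi_t(a|s) = exp(Q_t(s,a) - V_t(s))
        policies.append([[math.exp(q[s][a] - v[s]) for a in range(n_a)]
                         for s in range(n_s)])
        v_next = v
    policies.reverse()  # policies[t] is the policy at time step t
    return policies


def value_based_policy(psi, w) -> List[List[float]]:
    """Value-based estimate: Q(s,a) = w . psi(s,a) is scored directly,
    with no recursion through the dynamics."""
    policy = []
    for s in range(len(psi)):
        q = [dot(w, f) for f in psi[s]]
        z = logsumexp(q)
        policy.append([math.exp(x - z) for x in q])
    return policy


# Usage: 2 states; action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
phi = [[[0.0], [0.0]], [[1.0], [1.0]]]  # reward feature: "currently in state 1"
psi = [[[0.0], [1.0]], [[1.0], [0.0]]]  # value feature: "next state is 1"
reward_pols = reward_based_policy(P, phi, [1.0], horizon=3)
value_pol = value_based_policy(psi, [1.0])
```

In this toy setup the reward-based route discovers through the recursion that switching out of state 0 pays off later, whereas the value-based route can only express that preference because `psi` already encodes the successor state. That trade-off (cheap direct scoring versus recursive computation from more general features) is the kind of choice the abstract proposes to make per region of the decision space.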
Keywords :
decision theory; forecasting theory; learning (artificial intelligence); maximum entropy methods; action features; decision problems; directed information theory; factorized decision forecasting; feature-based reward function; feature-based value function; learning task; maximum entropy principle; reward-based estimation; sequential decision prediction; state features; value-based estimation; Entropy; Equations; Estimation; Mathematical model; Optimal control; Optimization
Conference_Title :
2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Conference_Location :
Monticello, IL
Print_ISBN :
978-1-4577-1817-5
DOI :
10.1109/Allerton.2011.6120271