DocumentCode
610101
Title
Compression of Optimal Value Functions for Markov Decision Processes
Author
Kochenderfer, Mykel J. ; Monath, N.
Author_Institution
Lincoln Lab., Massachusetts Inst. of Technol., Cambridge, MA, USA
fYear
2013
fDate
20-22 March 2013
Firstpage
501
Lastpage
501
Abstract
Summary form only given. A Markov decision process (MDP) is defined by a state space, action space, transition model, and reward model. The objective is to maximize the accumulation of reward over time. Solutions can be found through dynamic programming, which generally involves discretization, resulting in significant memory and computational requirements. Although computer clusters can be used to solve large problems, many applications require that solutions be executed on less capable hardware. We explored a general method for compressing solutions in a way that preserves fast random-access lookups. The method was applied to an MDP for an aircraft collision avoidance system. In our problem, the state space S consists of aircraft positions and velocities and the action space A consists of resolution advisories provided by the collision avoidance system, with |S| > 1.5 x 10^6 and |A| = 10. The solution to an MDP can be represented by an |S| x |A| matrix specifying Q*(s,a), the expected return of the optimal strategy from s after executing action a. Since, on average, only 6.6 actions are available from each state in our problem, it is more efficient to use a sparse representation consisting of an array of the permissible values of Q*, organized into variable-length blocks with one block per state. An index provides offsets into this Q* array corresponding to the block boundaries, and an action array lists the actions available from each state. The values of Q* are stored using a 32-bit floating point representation, resulting in 534 MB for the three arrays associated with the sparse representation. Our method first converts to a 16-bit half-precision representation, sorts the state-action values within each block, adjusts the action array appropriately, and then removes redundant blocks. Although LZMA achieves a better compression ratio, it does not support real-time random-access decompression.
The behavior of the proposed method was demonstrated in simulation with negligible impact on safety and operational performance metrics. Although this compression methodology was demonstrated on related MDPs with similar compression ratios, further work will apply the technique to other domains.
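The sparse layout described above (an offset index into a variable-length block of Q* values per state, paired with an action array) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the function and variable names are assumptions, and the within-block sort by value mirrors the preprocessing step the abstract describes before redundant-block removal.

```python
def build_sparse_table(q_values):
    """Build the three arrays of the sparse representation.

    q_values: one dict per state mapping each available action to Q*(s, a).
    Returns (offsets, actions, values), where offsets[s]:offsets[s+1]
    delimits state s's block in the actions and values arrays.
    """
    offsets, actions, values = [0], [], []
    for state_q in q_values:
        # Sort the state-action values within the block and reorder the
        # action array to match, as the method does before deduplication.
        for a, q in sorted(state_q.items(), key=lambda kv: kv[1]):
            actions.append(a)
            values.append(q)
        offsets.append(len(values))
    return offsets, actions, values


def lookup(offsets, actions, values, s, a):
    """Random-access lookup of Q*(s, a); None if a is unavailable in s."""
    for i in range(offsets[s], offsets[s + 1]):
        if actions[i] == a:
            return values[i]
    return None
```

Because each block is delimited by two consecutive offsets, a lookup touches only that state's few entries, which is what preserves fast random access after compression.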
Keywords
Markov processes; aircraft control; collision avoidance; dynamic programming; real-time random access decompression; LZMA; MDP; Markov decision processes; Q* array; action array; action space model; aircraft collision avoidance system; aircraft positions; aircraft velocities; block boundaries; computational requirements; floating point representation; memory requirements; operational performance metrics; optimal strategy; optimal value function compression; random-access lookups; reward model; sparse representation; state space model; transition model; variable-length blocks; word length 16 bit; word length 32 bit; Aircraft; Arrays; Collision avoidance; Computational modeling; Data compression; Dynamic programming; Markov decision process
fLanguage
English
Publisher
ieee
Conference_Titel
Data Compression Conference (DCC), 2013
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
978-1-4673-6037-1
Type
conf
DOI
10.1109/DCC.2013.81
Filename
6543111
Link To Document