Decision-theoretic Planning

In recent years, agent and multi-agent planning have become active research areas in Artificial Intelligence with broad prospects. Our laboratory's research is based on the (partially observable) Markov decision process (POMDP) and its related theories. We study the issues that arise when these theories meet real-world applications, and have presented theoretical analyses of, and improvements to, a class of basic (partially observable) Markov decision-making algorithms.

Recent Projects
    Covering Number as a Complexity Measure for POMDP Planning and Learning
        Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space (the set of belief points that are reachable from the initial belief point) has interesting theoretical links with the complexity of POMDP planning. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset, given knowledge of the covering number.
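The idea behind the covering number can be illustrated with a small sketch: count how many balls of radius delta (in L1 distance over beliefs) are needed to cover a set of belief points. This is a hypothetical greedy illustration of the concept, not the paper's exact procedure, and it gives an upper bound on the true covering number.

```python
import numpy as np

def covering_number(beliefs, delta):
    """Greedy estimate of the delta-covering number of a set of belief
    points: how many L1-balls of radius delta are needed to cover them.
    Illustrative sketch only; greedy covering gives an upper bound."""
    centers = []
    for b in beliefs:
        if not any(np.abs(b - c).sum() <= delta for c in centers):
            centers.append(b)  # b is not covered yet; open a new ball
    return len(centers)

# Belief points over a 3-state POMDP (points on the probability simplex).
pts = [np.array([1.0, 0.0, 0.0]),
       np.array([0.9, 0.1, 0.0]),
       np.array([0.0, 0.0, 1.0])]
print(covering_number(pts, 0.5))  # the first two points share one ball: 2
```

Intuitively, a small covering number means the reachable beliefs cluster into few regions, regardless of how large the underlying state space is.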
    Bounded Incremental Real-Time Dynamic Programming
        A real-time multi-step planning problem is characterized by alternating decision-making and execution processes, with the total online decision-making time divided into slices between executions, and a pressing need for a policy that concerns only the current step. We propose a new criterion for judging the optimality of a policy based on upper- and lower-bound theory. This criterion guarantees that the agent can act earlier in a real-time decision process while still retaining an optimal policy of sufficient precision. We prove that, under certain conditions, such an incremental method yields an optimal policy with arbitrary precision. We present a bounded incremental real-time dynamic programming algorithm (BIRTDP). In experiments on two typical real-time simulation systems, BIRTDP outperforms the other state-of-the-art RTDP algorithms tested.
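The bound-based stopping idea can be sketched as follows: maintain lower and upper bounds on the optimal cost-to-go, tighten both with Bellman backups, and stop as soon as the gap falls below epsilon, at which point a greedy policy is epsilon-optimal. This is a minimal synchronous sketch on a tiny deterministic MDP; the names and setup are illustrative assumptions, not the BIRTDP algorithm itself.

```python
def bounded_backups(transitions, costs, goal, epsilon=1e-3, gamma=1.0):
    """Maintain lower/upper bounds on the optimal cost-to-go and stop
    once they agree to within epsilon at every state (illustrative
    sketch of the bound-gap stopping criterion, not BIRTDP itself)."""
    states = list(transitions)
    lower = {s: 0.0 for s in states}                     # admissible lower bound
    upper = {s: 0.0 if s == goal else 1e6 for s in states}
    while max(upper[s] - lower[s] for s in states) > epsilon:
        for s in states:
            if s == goal:
                continue
            lower[s] = min(costs[s][a] + gamma * lower[s2]
                           for a, s2 in transitions[s].items())
            upper[s] = min(costs[s][a] + gamma * upper[s2]
                           for a, s2 in transitions[s].items())
    return lower, upper

# Toy chain A -> B -> G with unit action costs.
transitions = {'A': {'go': 'B'}, 'B': {'go': 'G'}, 'G': {}}
costs = {'A': {'go': 1.0}, 'B': {'go': 1.0}}
lower, upper = bounded_backups(transitions, costs, 'G')
print(upper['A'])  # converges to 2.0 (two unit-cost steps to the goal)
```

In a real-time setting, the same gap test can be applied only at the current state, which is what lets the agent commit to an action before the bounds have converged everywhere.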
    Accelerating Point-Based POMDP Algorithms via Greedy Strategies
        Many planning tasks of autonomous robots can be modeled as partially observable Markov decision process (POMDP) problems. Point-based algorithms are well-known methods for solving large-scale POMDP problems. Several leading point-based algorithms eschew some flawed but useful heuristics in order to find an epsilon-optimal policy. This paper aims to exploit these avoided heuristics through a simple framework. The main idea of the framework is to construct a greedy strategy and combine it with the leading algorithms. We present an implementation to verify the framework's validity. The greedy strategy in this implementation stems from heuristics commonly ignored by three leading algorithms, and can therefore be combined well with them. Experimental results show that the combined algorithms are more efficient than the originals; on some benchmark problems, they achieve about an order of magnitude improvement in runtime. These results provide empirical evidence for the proposed framework's efficiency.
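The core operation that all point-based POMDP algorithms share is the Bellman backup at a single belief point, which produces one alpha-vector. The sketch below shows this standard backup; the model arrays `T`, `Z`, `R` and their shapes are hypothetical assumptions for illustration, not part of the paper's framework.

```python
import numpy as np

def point_backup(b, T, Z, R, gamma, alphas):
    """Standard point-based Bellman backup at belief b.
    T: (A, S, S) transition probs, Z: (A, S', O) observation probs,
    R: (A, S) rewards, alphas: current alpha-vector set.
    Returns the alpha-vector that is maximal at b after one backup."""
    A, S = R.shape
    O = Z.shape[2]
    best_vec, best_val = None, -np.inf
    for a in range(A):
        vec = R[a].astype(float)
        for o in range(O):
            # Back-project each alpha-vector through action a, observation o,
            # then keep the one with the highest value at belief b.
            g = [T[a] @ (Z[a, :, o] * alpha) for alpha in alphas]
            vec = vec + gamma * max(g, key=lambda x: b @ x)
        if b @ vec > best_val:
            best_vec, best_val = vec, b @ vec
    return best_vec
```

A point-based algorithm repeatedly applies this backup at a chosen set of belief points; the greedy strategies discussed above concern how those belief points are selected and ordered, not the backup itself.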
Publications
    Zongzhang Zhang, Michael Littman, and Xiaoping Chen. Covering Number as a Complexity Measure for POMDP Planning and Learning. Proc. of the 26th AAAI Conference on Artificial Intelligence (AAAI 2012), Toronto, Canada, July 22-26, 2012. (To appear)
    Zongzhang Zhang and Xiaoping Chen. Accelerating Point-Based POMDP Algorithms via Greedy Strategies. Proc. of the 2nd International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR 2010), Darmstadt, Germany, November 15-18, 2010.
    Changjie Fan and Xiaoping Chen. Optimal Action Criterion and Algorithm Improvement of Real-Time Dynamic Programming. Journal of Software, Vol. 19, No. 11, November 2008, pages 2869-2878.
    Changjie Fan and Xiaoping Chen. Bounded Incremental Real-Time Dynamic Programming. IEEE Proceedings of FBIT 2007, Jeju Island, Korea, 2007.