Decision-theoretic Planning

In recent years, agent and multi-agent planning have become active research topics in Artificial Intelligence, with broad application prospects. Our laboratory's research in this area builds on the (partially observable) Markov decision process (POMDP) and related theories. We study the issues that arise when these models meet real-world applications, and have presented theoretical analyses of, and improvements to, a class of basic (partially observable) Markov decision-making algorithms.

Recent Projects
    Covering Number as a Complexity Measure for POMDP Planning and Learning
        Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number of the reachable belief space (the set of belief points reachable from the initial belief point) has interesting theoretical links with the complexity of POMDP planning. In this paper, we present empirical evidence that the covering number of the reachable belief space (or just "covering number" for brevity) is a far better complexity measure than state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset, given knowledge of the covering number.
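        As a rough illustration of the covering-number idea, the Python sketch below estimates the delta-covering number of a set of sampled belief points by greedy set cover. Everything here is an illustrative assumption rather than the paper's procedure: the L1 metric, the function name greedy_cover_size, and the Dirichlet sampling standing in for beliefs gathered by forward simulation from the initial belief. Greedy covering only upper-bounds the true covering number.

        import numpy as np

        def greedy_cover_size(beliefs, delta):
            # Greedy set cover: every sampled belief must lie within L1
            # distance delta of some chosen center; the number of centers
            # upper-bounds the true delta-covering number of the sample.
            centers = []
            for b in beliefs:
                if all(np.abs(b - c).sum() > delta for c in centers):
                    centers.append(b)
            return len(centers)

        # Toy usage: 500 random beliefs on the simplex of a 4-state POMDP.
        rng = np.random.default_rng(0)
        beliefs = rng.dirichlet(np.ones(4), size=500)
        print(greedy_cover_size(beliefs, delta=0.2))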
    Bounded Incremental Real-Time Dynamic Programming
        A real-time multi-step planning problem is characterized by alternating decision-making and execution phases, an overall online decision-making time budget divided into slices between executions, and the pressing need for a policy that covers only the current step. We propose a new criterion for judging the optimality of a policy based on upper- and lower-bound theory. This criterion guarantees that the agent can act earlier in a real-time decision process while still retaining an optimal policy of sufficient precision. We prove that, under certain conditions, such an incremental method can obtain an optimal policy with arbitrary precision. We present a bounded incremental real-time dynamic programming algorithm (BIRTDP). In experiments on two typical real-time simulation systems, BIRTDP outperforms the other state-of-the-art RTDP algorithms tested.
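        The sketch below illustrates the general bound-based criterion in a simplified, fully observable setting; it is not the paper's BIRTDP. The agent maintains admissible upper and lower bounds on the optimal value, backs both up along sampled trials, and commits to an action as soon as the bound gap at the current state falls below a precision threshold epsilon. The mdp interface (states, gamma, v_min, v_max, actions, transitions, terminal) is an assumed placeholder, not an existing API.

        import random

        def bounded_rtdp_action(mdp, s0, epsilon, max_trials=500, max_depth=50):
            # Admissible bounds on the optimal value function V*.
            U = {s: mdp.v_max for s in mdp.states}   # upper bound
            L = {s: mdp.v_min for s in mdp.states}   # lower bound

            def q(V, s, a):
                # One-step lookahead value of action a under value table V.
                return sum(p * (r + mdp.gamma * V[t])
                           for p, t, r in mdp.transitions(s, a))

            for _ in range(max_trials):
                if U[s0] - L[s0] < epsilon:
                    break                    # bounds tight enough: safe to act now
                s, depth = s0, 0
                while not mdp.terminal(s) and depth < max_depth:
                    # Back up both bounds at the visited state.
                    U[s] = max(q(U, s, a) for a in mdp.actions(s))
                    L[s] = max(q(L, s, a) for a in mdp.actions(s))
                    # Explore with the action that looks best under the upper bound.
                    a = max(mdp.actions(s), key=lambda x: q(U, s, x))
                    succ = mdp.transitions(s, a)
                    s = random.choices([t for _, t, _ in succ],
                                       weights=[p for p, _, _ in succ])[0]
                    depth += 1
            # Act greedily w.r.t. the lower bound, whose value is guaranteed.
            return max(mdp.actions(s0), key=lambda x: q(L, s0, x))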
    Accelerating Point-Based POMDP Algorithms via Greedy Strategies
        Many planning tasks for autonomous robots can be modeled as partially observable Markov decision process (POMDP) problems. Point-based algorithms are well-known methods for solving large-scale POMDPs. Several leading point-based algorithms eschew certain flawed but still useful heuristics in order to find an epsilon-optimal policy. This paper aims to exploit these avoided heuristics through a simple framework whose main idea is to construct a greedy strategy and combine it with the leading algorithms. We present an implementation to verify the framework's validity. The greedy strategy in this implementation stems from heuristics commonly ignored by three leading algorithms, and can therefore be combined well with them. Experimental results show that the combined algorithms are more efficient than the originals; on some benchmark problems, they achieve about an order of magnitude improvement in runtime. These results provide empirical evidence for the proposed framework's efficiency.
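        For context, the sketch below shows the point-based alpha-vector backup that point-based solvers share; the greedy strategies studied here are combined with solvers built around this operation. The array layout (T, Z, R) and the function name are assumptions for illustration, not code from any of the algorithms above.

        import numpy as np

        def point_based_backup(b, alphas, T, Z, R, gamma):
            # b:      belief over states, shape (S,)
            # alphas: current alpha-vectors, each shape (S,)
            # T[a]:   transition matrix P(s'|s,a), shape (S, S)
            # Z[a]:   observation matrix P(o|a,s'), shape (S, O)
            # R[a]:   reward vector over states, shape (S,)
            n_actions = T.shape[0]
            n_obs = Z.shape[2]
            best_val, best = -np.inf, None
            for a in range(n_actions):
                alpha_a = R[a].astype(float)
                for o in range(n_obs):
                    # Back-project every alpha-vector through (a, o), then
                    # greedily keep the one with the highest value at b.
                    g = [gamma * T[a] @ (Z[a, :, o] * alpha) for alpha in alphas]
                    alpha_a = alpha_a + g[int(np.argmax([b @ gi for gi in g]))]
                if b @ alpha_a > best_val:
                    best_val, best = b @ alpha_a, (alpha_a, a)
            return best   # (new alpha-vector, maximizing action) at belief b

        A simple valid seed for the alpha set is the single vector np.full(S, R.min() / (1 - gamma)), which keeps every backup a lower bound on the optimal value function.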
Publications
    Zongzhang Zhang, Michael Littman, and Xiaoping Chen. Covering Number as a Complexity Measure for POMDP Planning and Learning. In Proc. of the 26th AAAI Conference on Artificial Intelligence (AAAI 2012), Toronto, Canada, July 22-26, 2012. (To appear)
    Zongzhang Zhang and Xiaoping Chen. Accelerating Point-Based POMDP Algorithms via Greedy Strategies. In Proc. of the 2nd International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR 2010), Darmstadt, Germany, November 15-18, 2010.
    Changjie Fan and Xiaoping Chen. Optimal Action Criterion and Algorithm Improvement of Real-Time Dynamic Programming. Journal of Software, Vol. 19, No. 11, November 2008, pages 2869-2878.
    Changjie Fan and Xiaoping Chen. Bounded Incremental Real-Time Dynamic Programming. In IEEE Proceedings of FBIT 2007, Jeju Island, Korea, 2007.