Principled Solutions to Large-Scale Decision Making

Markov decision processes (MDPs) have proven to be a useful model for planning under uncertainty. Most existing approaches, such as linear programming, value iteration, and policy iteration, solve MDPs offline. When applied to large domains, exactly solving an MDP offline is often intractable due to the curse of dimensionality: the size of the state space grows exponentially with the number of state variables. Although the computational complexity of these methods is typically polynomial in the size of the state space, the state space itself can be so large that they are unacceptable in practice.
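
As a concrete reference point, here is a minimal value-iteration sketch for a tabular MDP. Everything in it (the two-state transition table, rewards, discount factor, and threshold) is an illustrative assumption, not a domain from this work.

    # Minimal value iteration for a tabular MDP (illustrative sketch).
    # The states, actions, transitions, and rewards below are placeholders.
    GAMMA = 0.95       # discount factor (assumed)
    EPSILON = 1e-6     # convergence threshold (assumed)

    # P[s][a] is a list of (next_state, probability); R[s][a] is the reward.
    P = {
        's0': {'a0': [('s0', 0.5), ('s1', 0.5)], 'a1': [('s1', 1.0)]},
        's1': {'a0': [('s0', 1.0)],              'a1': [('s1', 1.0)]},
    }
    R = {
        's0': {'a0': 0.0, 'a1': 1.0},
        's1': {'a0': 5.0, 'a1': 0.0},
    }

    def value_iteration(P, R, gamma=GAMMA, epsilon=EPSILON):
        V = {s: 0.0 for s in P}
        while True:
            delta = 0.0
            for s in P:  # a full sweep over every state
                q = {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                     for a in P[s]}
                best = max(q.values())
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < epsilon:
                return V

    print(value_iteration(P, R))

Each sweep of the loop touches every state, which is exactly the cost that becomes prohibitive when the state space grows exponentially with the number of state variables.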

In contrast, online approaches alleviate this difficulty by computing a policy only for the current state. The key observation is that an agent typically encounters only a fraction of the overall state space during execution. While interacting with the environment, an online algorithm evaluates all possible actions for the current state and selects the "best" one. It recursively performs forward search over the reachable states, evaluating and updating policy values in real time. As shown by RTDP, LAO*, and UCT, heuristic techniques can be exploited in the search process to reduce time and memory usage.
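
The sketch below gives the flavor of such an online method: a depth-bounded expectimax search from the current state, with a heuristic capping the lookahead. It is a simplified stand-in, not the actual RTDP, LAO*, or UCT procedure, and the domain hooks (actions, successors, reward, heuristic) are assumed placeholders.

    # Depth-bounded forward search from the current state (illustrative sketch).
    GAMMA = 0.95  # discount factor (assumed)

    def search(s, depth, actions, successors, reward, heuristic):
        """Return (value, best_action) for state s via expectimax lookahead."""
        if depth == 0:
            return heuristic(s), None  # a heuristic caps the search depth
        best_value, best_action = float('-inf'), None
        for a in actions(s):
            # Expected value over the stochastic outcomes of taking a in s.
            q = reward(s, a) + GAMMA * sum(
                p * search(s2, depth - 1, actions, successors, reward, heuristic)[0]
                for s2, p in successors(s, a))
            if q > best_value:
                best_value, best_action = q, a
        return best_value, best_action

    # Toy two-state usage: state 1 pays off for 'stay'; 'move' toggles the state.
    actions = lambda s: ['stay', 'move']
    successors = lambda s, a: [(1 - s, 1.0)] if a == 'move' else [(s, 1.0)]
    reward = lambda s, a: 1.0 if (s, a) == (1, 'stay') else 0.0
    print(search(0, 3, actions, successors, reward, lambda s: 0.0))

Unlike an offline sweep, only the states reachable from the current state within the search depth are ever evaluated.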

Hierarchical decomposition is another well-known method for scaling MDP algorithms to very large problems. Given a hierarchical structure of the domain, it decomposes the original model into a set of subproblems (or subtasks) that can be solved more easily. This method benefits from several useful techniques, including temporal abstraction, state abstraction, and subtask sharing.
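
As a toy illustration, the sketch below encodes a small task hierarchy as plain data; the subtask names are hypothetical, loosely modeled on the Taxi domain mentioned below.

    # A toy task hierarchy (illustrative; subtask names are hypothetical,
    # loosely modeled on the classic Taxi domain). Composite subtasks list
    # their children; primitive actions are leaves with no children.
    HIERARCHY = {
        'Root':     ['Get', 'Put'],
        'Get':      ['Navigate', 'Pickup'],
        'Put':      ['Navigate', 'Putdown'],
        'Navigate': ['North', 'South', 'East', 'West'],
        'Pickup': [], 'Putdown': [],
        'North': [], 'South': [], 'East': [], 'West': [],
    }

    def primitives(task, h=HIERARCHY):
        """Collect the primitive actions reachable from a subtask."""
        if not h[task]:
            return {task}
        return set().union(*(primitives(c, h) for c in h[task]))

    print(primitives('Get'))  # {'North', 'South', 'East', 'West', 'Pickup'}

Note how Navigate is shared by both Get and Put (subtask sharing), and how each composite subtask spans a temporally extended course of action rather than a single step (temporal abstraction).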

Recent Projects
    Online Planning for Large MDPs with MAXQ Decomposition
        Markov decision processes (MDPs) provide an expressive framework for planning in stochastic domains. However, exactly solving a large MDP is often intractable due to the curse of dimensionality. Online algorithms address the high computational complexity by avoiding the computation of a policy for every possible state. Hierarchical decomposition is another promising way to scale MDP algorithms up to large domains by exploiting their underlying structure. In this paper, we combine the benefits of a general hierarchical structure based on MAXQ value function decomposition with the power of heuristic and approximate techniques, yielding an online planning framework called MAXQ-OP. The proposed framework provides a principled approach to programming autonomous agents in large stochastic domains. We empirically evaluated our algorithm on the Taxi problem, a common benchmark for MAXQ, to show the efficiency of MAXQ-OP. We have also been conducting a long-term case study in the RoboCup soccer simulation 2D domain, which is substantially larger than the domains usually studied in the literature, as the major benchmark for this research. The case study shows that agents developed with this framework and the related techniques achieve outstanding performance, demonstrating the framework's scalability to very large domains.
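
        For reference, the MAXQ decomposition that MAXQ-OP builds on expresses the value of invoking a subtask a within a parent task i as the value of a itself plus a completion term (roughly following Dietterich's formulation; see the paper for the precise definitions):

            Q^\pi(i, s, a) = V^\pi(a, s) + C^\pi(i, s, a)

        where C^\pi(i, s, a) is the expected cumulative reward of completing task i after subtask a terminates, so the value of the root task decomposes recursively over the whole hierarchy.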

    RoboCup Soccer Simulation 2D
        The RoboCup Simulation 2D Competition is an important case study of principled solutions to large-scale decision making. Please refer HERE for detailed information.
Publications
    Aijun Bai, Feng Wu, and Xiaoping Chen. Online Planning for Large MDPs with MAXQ Decomposition. In AAMAS 2012 Workshop on Autonomous Robots and Multirobot Systems (ARMS 2012), Valencia, Spain, June 2012.
    Aijun Bai, Feng Wu, and Xiaoping Chen. Online Planning for Large MDPs with MAXQ Decomposition (Extended Abstract). In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2012), Valencia, Spain, June 2012.
    Aijun Bai, Xiaoping Chen, Patrick MacAlpine, Daniel Urieli, Samuel Barrett, and Peter Stone. WrightEagle and UT Austin Villa: RoboCup 2011 Simulation League Champions. In RoboCup 2011: Robot Soccer World Cup XV, Lecture Notes in Artificial Intelligence, Springer, Berlin, 2012.
    Mao Chen and Xiaoping Chen. Sampling-Based Approximate Algorithm for POMDP. Computer Simulation, Vol. 23, No. 5, pp. 64-67, 2006.
    Xiang Li and Xiaoping Chen. A Continual Planning System in Dynamic Nondeterministic Environments. Chinese Journal of Computers, Vol. 28, No. 7, July 2005.