CSAIL Research Abstract

Introduction

Architecture, Systems
& Networks

Language, Learning,
Vision & Graphics

Physical, Biological
& Social Systems

Theory

horizontal line

Planning Under Time and Representation Constraints

Natalia H. Gardiol & Leslie Pack Kaelbling

Introduction

In many logical planning domains, the crux of finding a solution often lies in overcoming an overwhelmingly large action space. Consider, just to illustrate, the classic blocks world domain: the number of ways to make a stack of a certain height grows exponentially with the number of blocks on the table; and if the outcomes of actions are uncertain, this apparently simple task becomes even more daunting. We want planning techniques that can deal with large state spaces and large, stochastic action sets; most compelling, realistic domains have these characteristics.

In order to describe large stochastic domains compactly, we need relational structures that can represent uncertainty in the dynamics. Relational representations allow the structure of the domain to be expressed in terms of object properties rather than object identities and thus yield a much more compact representation of a domain than the equivalent propositional version can.

Even planning techniques that use relational representations, however, often end up operating in a fully-ground state and action space when it comes time to find a solution. However, it is often true that many action instances have similar kinds of effects: for example, in a blocks world it often does not matter which block is picked up first as long as a stack of blocks is produced in the end. If it were possible to identify under what conditions actions produce equivalent kinds of effects, the planning problem could be simplified by considering a representative action (from each equivalence class) rather than the whole action space.

This work is about taking advantage of structured, relational action representations. First, we want to identify logically similar effects in order to reduce the effective size of the action space; second, we want to limit the state space under consideration to an informative, reachable subset.

Using an Envelope to Plan Under Time-pressure

Decision-making agents are often faced with complicated problems and not much time in which to find a solution. In such situations, the agent is better off acting quickly -- finding some reasonable solution fast -- than acting perfectly. Relational Envelope-based Planning (REBP) [1] is a planning approach designed for time-pressured decision-making problems. The figure below shows a very high-level system diagram of the main parts of the REBP system.

A high-level schematic of the REBP system

REBP proceeds in two phases. First, given a planning problem, an initial plan of action must be quickly found. Knowledge about the structure of action effects is used to eliminate potentially redundant information and focus the search onto high-probablility sequences of actions, known as an envelope of states [2]. Second, if the agent is given additional time, it can elaborate the original plan by considering lower-probability consequences of its action choices.

REBP lets us reason with ground states and actions; but, it is structured so as to limit how much of the ground state and action space is considered at a time. The REBP method provides a way to restrict attention to a small, useful subset of a large MDP space. It produces an initial plan quickly by taking advantage of generalization among action effects, and as a result behaves smarter in a large space much sooner than it could by waiting for a full solution. The figure below is a illustration of how REBP works given a tiny task of making a two-block stack in a domain with two blocks. The initial plan (a) is produced; if more time is available, alternative outcomes are considered (e.g., the hand may break) (b), and the envelope may be expanded to handle the new outcomes (c).

Using the relational envelope method, we can take real advantage of relational generalization to produce good initial plans efficiently, and use envelope-growing techniques to improve the robustness of our plans incrementally as time permits. REBP is a planning system that tries to dynamically reformulate an apparently intractable problem into a small, easily handled problem at run time.

Defining Equivalent Actions

In this work, we investigate a more compact way of representing states and transitions (i.e., actions): we want to represent the world in terms of logical relations. That is, rather than a state being composed of a set of propositional features, we think of it as being composed instead of a set of logical predicates. These predicates can not only specify values of state features, they can also specify relationships between domain objects. Since these predicates can make assertions about logical variables, a single predicate may in fact represent a large number of ground propositions.

In a relational setting, when the underlying MDP space implied by the full instantiation of the representation is potentially huge, a good initial envelope is crucial. It determines the quality of the early envelope policies and sets the stage for more elaborate policies later on.

To cope with this problem, we have identified a technique called equivalence-class sampling. We partiiton into equivalence classes the actions that produce similar effects on the properties of the variables in their scope. Then, the initial can be constructed chaining forward only a sampled action from each class. The sampled action is representative of the effects of any action from that class. Sampling reduces the branching factor at each step in the plangraph, so significantly larger domains can be handled.

The following figure shows a blocks-world arrangment (a), and the graph showing the relations that blocks 1 and 7 participate in (b). Blocks 1 and 7 are equivalent since their graphs are identical. The nodes in the graph different sizes to represent the labels for different object types: small circles for blocks, medium circles for colors, and a big circle for the table.

A relation graph
showing two equivalent blocks

Formalizing Approximate Equivalence

If collapsing equivalent states together can help to speed up planning, then condensing ``approximately'' equivalent states should help even more. In this section, we take a look at some possibilities for defining approximate equivalence of states.

Another possibility to pursue is the use of a kernel to help decide when two graphs are approximately the same [3][4]. Consider the case in which two states have different numbers of objects, but they agree on an important, local substructure. It would be useful to consider them equivalent. A kernel-style approach would provide a vocabulary of components for graph similarity different than that provided by the relations in the basis set.

Extension to Variable Levels of Abstraction

Our model of time is extremely simple. Actions take a unit of time, and they are executed in sequence. A more realistic approach would consider uncertainty in action duration and action concurrency. How this would be formalized in the REBP formalism has yet to be considered.

Another current shortcoming is that we consider all of our actions at a fixed level of resolution. Reducing the size of the action space can only help so much; if the plan is too long, it can still be difficult to find a solution, even if the actions are few. Forward-chaining search will never scale to long plans with many steps because of the inherent branching factors involved. We would be more flexible if we could consider hierarchies of plans. If a plan is becoming too long at our current level of resolution, we need to bump up to a level that lets us solve an abstracted plan with fewer steps. This seems like another place in which to use our idea of basis set of relations to guide a Sacerdoti-style hierarchy of abstractions [5].

A fundamental problem in AI research is identifying the proper level of abstraction for decision-making. Sometimes, decision-making at the level of motor movements is appropriate; other times, we want to plan at the level of moving batallions from one place to the next. The proper choice of abstraction level greatly influences our chances to compute a solution effectively. It is obvious that learning and planning need to happen at all of these levels. The question is how much the abstract levels need to know about details at the lower levels, and, how best to combine information across levels.

To extend envelope-based planning to hierarchical planning in very large domains, we hope to use the same techniques (that is, those that let us choose whether to explore additional parts of the state space) to also estimate the utility of seeking more detail from an abstract action. Usually, we want to plan at the most abstract level, because this lets us take bigger steps in time; sometimes, however, we need to move to a greater level of detail in order to resolve uncertainty. These techniques are fundamental for flexible planning and acting in large domains.

References

[1] Natalia H. Gardiol and Leslie Pack Kaelbling. Envelope-based Planning in Relational MDPs. In Neural Information Processing Systems 16 (NIPS*2003), Vancouver, Canada, 2004.

[1] Gardiol, Natalia H. and Kaelbling, Leslie P.. Envelope-based Planning in Relational MDPs. In Advances in Neural Information Processing 16 (NIPS 2003), 2004.

[2] Thomas Dean and Leslie Pack Kaelbling and Jak Kirman and Ann Nicholson. Planning Under Time Constraints in Stochastic Domains. 1995.

[3] David Haussler. Convolution Kernels on Discrete Structures. July 1999.

[4] Thomas Gartner and Kurt Driessens and Jan Ramon. Graph Kernels and Gaussian Processes for Relational Reinforcement Learning. In Thirteenth International Conference on Inductive Logic Programming (ILP-2003), 2003.

[5] Earl D. Sacerdoti. Planning in a Hierarchy of Abstraction Spaces. pp. 115-135, 1974.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)