Research Abstracts - 2007

Learning Probabilistic Relational Dynamics for Multiple Tasks

Ashwin Deshpande, Brian Milch, Luke S. Zettlemoyer & Leslie Pack Kaelbling

The Problem

A good model of world dynamics is essential for an agent to act intelligently in an unknown environment. By observing the effects of actions in the world, the agent can learn to model these dynamics as a set of probabilistic planning rules. However, when task-specific data is limited, existing learning methods often produce incomplete or imprecise rule sets. We employ transfer learning to address this issue. In the transfer learning framework, a set of examples is given for the target task as before; in addition, however, data sets from related source tasks are also provided. By generalizing both the structural and parametric knowledge from the source tasks, a more complete and precise model of the target task can be recovered. Preliminary experiments have shown that transfer learning can significantly improve the efficiency of learning probabilistic planning rule sets.

Previous Work

Significant work has been done recently on formulating and learning probabilistic planning rules for a single task [2]. In these frameworks, the world dynamics are encapsulated by uniquely applicable rules, each with a distribution over possible outcomes. Recent work at MIT has shown that greedily balancing training error against complexity through various heuristics can be effective in learning a reasonable rule set for the dynamics of a single task [6,8]. We combine probabilistic relational rule learning with a hierarchical Bayesian model [3]. Following the popular hierarchical Bayesian framework, data from a set of source tasks and a single target task are given as input; from these data, a generalized prior is constructed to maximize the probability of the task-specific data, which ultimately allows data from the source tasks to supplement data from the target task and yield a more complete model [1,4,7,9].
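To make this rule representation concrete, here is a minimal sketch in Python of a rule in the style of [6,8]; the class and function names are illustrative assumptions, not the authors' actual code.

from dataclasses import dataclass

@dataclass(frozen=True)
class Outcome:
    effects: frozenset  # literals made true or false when this outcome occurs
    prob: float         # probability of this outcome, given that the rule fires

@dataclass(frozen=True)
class Rule:
    action: str         # the action being modeled, e.g. "pickup(X)"
    context: frozenset  # literals that must hold in the state for the rule to apply
    outcomes: tuple     # Outcome objects whose probabilities sum to 1

def applicable(rule, state, action):
    # "Uniquely applicable" means that, in a task-specific rule set, at most
    # one rule's context holds for any given state-action pair.
    return rule.action == action and rule.context <= state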

Approach

Our approach can be divided into two parts: the formulation of a generative model and the optimization of the general prior and the task-specific rule sets. The generative model describes the probability of deriving a task-specific rule set from the general prior. We assume that the general prior is structured like a task-specific rule set, with one key difference: multiple rules are allowed to apply to a single world state. Given this rule set prior, various modifications are made, each with an associated decrease in probability, such as adding or deleting rules, changing the contexts (applicability) of rules, adding, deleting, or modifying outcomes, and altering outcome distributions via the Polya distribution [5]. Thus, the probability of modifying the general prior into a task-specific rule set can be computed efficiently. To compute the probability of the general prior itself, we take the probability of creating it from an empty hyper-prior using the same generative process. Note that the generative process automatically penalizes complexity, since each rule or outcome has an associated creation cost. Thus, for any assignment of the general prior and task-specific rule sets, we can compute the probability of the entire model given the training data.
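Because the Polya (Dirichlet-compound-multinomial) distribution [5] does the work of scoring how closely a task-specific outcome distribution follows the prior, a short worked sketch may be useful. The pseudo-count vector alpha below stands in for the prior's outcome distribution; the names are illustrative assumptions rather than the paper's implementation.

from math import lgamma

def polya_log_likelihood(alpha, counts):
    # log P(counts | alpha) under the Polya distribution:
    #   log G(A) - log G(A + N) + sum_k [ log G(alpha_k + n_k) - log G(alpha_k) ]
    # where A = sum(alpha), N = sum(counts), and G is the gamma function.
    # The multinomial coefficient is omitted; it does not depend on alpha.
    A, N = sum(alpha), sum(counts)
    score = lgamma(A) - lgamma(A + N)
    for a_k, n_k in zip(alpha, counts):
        score += lgamma(a_k + n_k) - lgamma(a_k)
    return score

# Outcome counts that agree with the prior's pseudo-counts score higher:
print(polya_log_likelihood([9.0, 1.0], [8, 2]))  # approx. -5.78
print(polya_log_likelihood([5.0, 5.0], [8, 2]))  # approx. -6.33

This is what lets the prior reward task-specific outcome distributions that stay close to it, while still assigning nonzero probability to parametric deviations.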

The second part of our approach involves optimizing the probability of the general prior and the task-specific rule sets through greedy search. As the search space for each task is enormous, we use coordinate ascent, alternately maximizing each source task's rule set and the general prior while holding all other parts of the model fixed. The greedy search for both the general prior and the task-specific rule sets proposes adding or deleting rules and modifying the contexts of existing rules. A secondary greedy search is initiated for each proposed change to find the best corresponding outcome distribution; this secondary search proposes adding, deleting, and modifying outcomes. For each proposed outcome distribution, the Polya distribution likelihood is calculated to compute the new overall probability of the model given the proposed rule change. At each stage, the greedy search is repeated until no further change leads to a more probable model. Finally, given a stable general prior, a last greedy search is performed to find the most probable task-specific rule set for the target task.
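The overall search can be summarized with the following schematic Python sketch. The interfaces (proposals, model_score) are assumptions made for illustration; the actual proposal moves and scoring are those described above.

def greedy_search(initial, proposals, score):
    # Hill-climb: repeatedly take the best-scoring proposed modification
    # (add/delete a rule, change a context, ...) until none improves the score.
    current = initial
    while True:
        candidates = list(proposals(current))
        if not candidates:
            return current
        best = max(candidates, key=score)
        if score(best) <= score(current):
            return current
        current = best

def fit_transfer_model(prior, rule_sets, proposals, model_score):
    # Coordinate ascent: re-optimize each source task's rule set with the
    # prior held fixed, then the prior with the rule sets held fixed, and
    # sweep until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for i in range(len(rule_sets)):
            better = greedy_search(
                rule_sets[i], proposals,
                lambda rs: model_score(prior, rule_sets[:i] + [rs] + rule_sets[i+1:]))
            if better != rule_sets[i]:
                rule_sets[i], changed = better, True
        better = greedy_search(prior, proposals,
                               lambda p: model_score(p, rule_sets))
        if better != prior:
            prior, changed = better, True
    return prior, rule_sets

A final call to greedy_search with the learned prior held fixed then produces the target task's rule set.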

Preliminary Results

Preliminary tests of our approach have yielded encouraging results. The following two graphs show the performance of transfer learning with varying numbers of source tasks versus learning from the target task alone. The x-axis represents the number of examples in the target task, while the y-axis represents the inverse of the error rate.

[Graph: results in the "block size" domain.]

[Graph: results in the "slippery gripper" domain.]

The first domain, "block size," is a simple domain with structural modifications between related tasks. The second, "slippery gripper," is a more complex domain with parametric modifications across multiple rules between related tasks. In both cases, transfer learning shows great promise in transferring knowledge from source tasks to the target task, requiring significantly fewer examples than the non-transfer method to achieve comparable error rates.

Future Work

We plan to allow for the possibility of overlapping outcomes and/or rules in task-specific rule sets. This may allow for more compact rule sets and a better transfer of knowledge between tasks. Finally, we hope to test this framework on real or simulated data to demonstrate the robustness of our generative model and learning framework.

References:

[1] J. Baxter. A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling. Machine Learning, 28:7-39, 1997.

[2] A. L. Blum and J. C. Langford. Probabilistic Planning in the Graphplan Framework. In Proceedings of the 5th European Conference on Planning, 1999.

[3] D. V. Lindley. The Estimation of Many Parameters. Foundations of Statistical Inference, Holt, Rinehart and Winston, Toronto, 1971.

[4] Z. Marx, M. T. Rosenstein, L. P. Kaelbling and T. G. Dietterich. Transfer Learning with an Ensemble of Background Tasks. In NIPS Workshop on Inductive Transfer, 2005.

[5] T. P. Minka. Estimating a Dirichlet Distribution. Available at http://research.microsoft.com/minka/papers/dirichlet, 2003.

[6] H. M. Pasula, L. S. Zettlemoyer and L. P. Kaelbling. Learning Probabilistic Relational Planning Rules. In Proceedings of the 14th ICAPS, 2004.

[7] K. Yu, V. Tresp and A. Schwaighofer. Learning Gaussian Processes from Multiple Tasks. In Proceedings of the 22nd ICML, 2005.

[8] L. S. Zettlemoyer, H. M. Pasula and L. P. Kaelbling. Learning Planning Rules in Noisy Stochastic Worlds. In Proceedings of the 20th AAAI, 2005.

[9] J. Zhang, Z. Ghahramani and Y. Yang. Learning Multiple Related Tasks using Latent Independent Component Analysis. In Proceedings of the 18th NIPS, 2006.

 
