
Research Abstracts 2007

Learning Probabilistic Relational Dynamics for Multiple Tasks

Ashwin Deshpande, Brian Milch, Luke S. Zettlemoyer & Leslie Pack Kaelbling

The Problem

A good model of world dynamics is essential for an agent to act intelligently in an unknown environment. By observing the effects of actions in the world, the agent can learn to model these dynamics as a set of probabilistic planning rules. However, when task-specific data is limited, existing learning methods often produce incomplete or imprecise rule sets. We employ transfer learning to address this issue. In the transfer learning framework, a set of examples is given for the target task as before; in addition, however, sets of data from related source tasks are also provided. By generalizing both the structural and parametric knowledge from the source tasks, a more complete and precise model of the target task can be recovered. Preliminary experiments have shown that transfer learning can significantly improve the efficiency of learning probabilistic planning rule sets.

Previous Work

Significant work has been done recently in the formulation and learning
of probabilistic planning rules for a single task [2].
In these frameworks, the world dynamics can be encapsulated by uniquely
applicable rules, each with a distribution of possible outcomes. Recent
work at MIT has shown that greedily balancing training error and complexity
through various heuristics can be effective in learning a reasonable rule
set of the world dynamics for a single task [6,8].
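Concretely, a rule of this kind pairs a context (the preconditions under which it uniquely applies) with a distribution over outcomes. The following is a minimal sketch of such a rule in Python; the class and field names are illustrative assumptions, not the representation used in the cited work:

```python
import random
from dataclasses import dataclass

@dataclass
class Rule:
    """Hypothetical probabilistic planning rule: if `context` holds in the
    current state and `action` is taken, one outcome is sampled from the
    outcome distribution. States are sets of ground literals."""
    action: str
    context: frozenset   # literals that must hold for the rule to apply
    outcomes: list       # (probability, (add_set, delete_set)) pairs

    def applies(self, state, action):
        return action == self.action and self.context <= state

    def sample_next_state(self, state, rng=random):
        r, cum = rng.random(), 0.0
        for prob, (add, delete) in self.outcomes:
            cum += prob
            if r < cum:
                return (state - delete) | add
        return state  # fallback for floating-point rounding

# Example: pickup(b) succeeds 90% of the time and does nothing otherwise.
pickup = Rule(
    action="pickup(b)",
    context=frozenset({"clear(b)", "handempty"}),
    outcomes=[
        (0.9, (frozenset({"holding(b)"}), frozenset({"clear(b)", "handempty"}))),
        (0.1, (frozenset(), frozenset())),
    ],
)
```

Because at most one rule applies to a given state and action, predicting the next state reduces to finding the applicable rule and sampling from its outcome distribution.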
We combine probabilistic relational rule learning with a hierarchical Bayesian model [3]. Following the popular hierarchical Bayesian framework, data from a set of related source tasks is used to estimate a shared prior, which in turn guides learning on the target task.

Approach

Our approach can be divided into two parts: the formulation of a generative model, and the optimization of the general prior and the task-specific rule sets.

The generative model describes the probability of deriving a task-specific rule set from the general prior. We assume that the general prior is structured like a task-specific rule set, with one key difference: multiple rules are allowed to apply to a single world state. Given this rule-set prior, various modifications, each with an associated decrease in probability, are made, such as adding/deleting rules, changing the context (applicability) of rules, adding/deleting/modifying outcomes, and altering outcome distributions via the Polya distribution [5]. Thus, the probability of modifying a general prior into a task-specific rule set can be computed efficiently. To compute the probability of the general prior itself, we take the probability of creating it from an empty hyperprior using the same generative process. Note that the generative process automatically penalizes complexity, since each rule or outcome has an associated cost of creation. Thus, for any assignment of the general prior and task-specific rule sets, we can compute the probability of the entire model given the training data.

The second part of our approach optimizes the probability of the general prior and the task-specific rule sets through greedy search. As the search space for each task is enormous, we use coordinate ascent to alternately maximize each task-specific rule set for the source tasks and the general prior, holding all other parts of the model fixed. The greedy search for both the general prior and the task-specific rule sets proposes the addition/deletion of rules and the modification of existing rules' contexts.
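The Polya scoring of outcome distributions mentioned above is the standard Dirichlet-multinomial marginal likelihood [5], which integrates out the outcome probabilities under a Dirichlet prior. A generic sketch (the paper's exact parameterization is not specified here):

```python
from math import lgamma

def polya_log_likelihood(counts, alpha):
    """Log marginal likelihood of observed outcome counts under a Dirichlet
    prior with concentration parameters `alpha` (the Polya, or
    Dirichlet-multinomial, distribution for one observed sequence):
      log Gamma(A) - log Gamma(A + N)
        + sum_k [log Gamma(alpha_k + c_k) - log Gamma(alpha_k)]
    where A = sum(alpha) and N = sum(counts)."""
    n, a = sum(counts), sum(alpha)
    ll = lgamma(a) - lgamma(a + n)
    for c, ak in zip(counts, alpha):
        ll += lgamma(ak + c) - lgamma(ak)
    return ll
```

Scoring candidate outcome sets with this quantity lets the search compare outcome structures of different sizes without committing to point estimates of their probabilities.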
A secondary greedy search is initiated for each proposed change to find the best corresponding outcome distribution; it proposes the addition/deletion of outcomes and the modification of existing outcomes. For each proposed outcome distribution, the Polya likelihood is calculated to complete the new overall probability of the model given the proposed rule change. At each stage, the greedy search is repeated until no further changes lead to a more probable model. Finally, given a stable general prior, a last greedy search is performed to find the most probable task-specific rule set for the target task.

Preliminary Results

Preliminary tests of our approach have yielded encouraging results. The following two graphs show the performance of transfer learning with a varying number of source tasks versus learning from the target task alone. The x-axis represents the number of examples in the target task, and the y-axis represents the inverse of the error rate. The first domain, "block size," is a simple domain with structural modifications between related tasks. The second domain, "slippery gripper," is a more complex domain with parametric modifications across multiple rules between related tasks. In both cases, transfer learning shows great promise in transferring knowledge from source tasks to the target task, requiring significantly fewer examples than the non-transfer method to achieve comparable error rates.

Future Work

We plan to allow for the possibility of overlapping outcomes and/or rules in task-specific rule sets. This may allow for more compact rule sets and better transfer of knowledge between tasks. Finally, we hope to test this framework on real or simulated data to demonstrate the robustness of our generative model and learning framework.

References

[1] J. Baxter. A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling. Machine Learning, 28:7-39, 1997.
[2] A. L. Blum and J. C. Langford. Probabilistic Planning in the Graphplan Framework. In Proceedings of the 5th European Conference on Planning, 1999.
[3] D. V. Lindley. The Estimation of Many Parameters. In Foundations of Statistical Inference, Holt, Rinehart and Winston, Toronto, 1971.
[4] Z. Marx, M. T. Rosenstein, L. P. Kaelbling and T. G. Dietterich. Transfer Learning with an Ensemble of Background Tasks. In NIPS Workshop on Inductive Transfer, 2005.
[5] T. P. Minka. Estimating a Dirichlet Distribution. Available at http://research.microsoft.com/minka/papers/dirichlet, 2003.
[6] H. M. Pasula, L. S. Zettlemoyer and L. P. Kaelbling. Learning Probabilistic Relational Planning Rules. In Proceedings of the 14th ICAPS, 2004.
[7] K. Yu, V. Tresp and A. Schwaighofer. Learning Gaussian Processes from Multiple Tasks. In Proceedings of the 22nd ICML, 2005.
[8] L. S. Zettlemoyer, H. M. Pasula and L. P. Kaelbling. Learning Planning Rules in Noisy Stochastic Worlds. In Proceedings of the 20th AAAI, 2005.
[9] J. Zhang, Z. Ghahramani and Y. Yang. Learning Multiple Related Tasks using Latent Independent Component Analysis. In Proceedings of the 18th NIPS, 2006.

