Abstracts - 2007
Learning Probabilistic Relational Dynamics for Multiple Tasks
Ashwin Deshpande, Brian Milch, Luke S. Zettlemoyer & Leslie Pack Kaelbling
A good model of world dynamics is essential for an agent to act intelligently in an unknown environment. By observing the effects of actions in the world, the agent can learn to model these dynamics as a set of probabilistic planning rules. However, when task-specific data is limited, existing learning methods often produce incomplete or imprecise rule sets. We employ transfer learning to address this issue. In the transfer learning framework, a set of examples is given for the target task as before; in addition, however, sets of data from related source tasks are also provided. By generalizing both the structural and parametric knowledge from the source tasks, a more complete and precise model of the target task can be recovered. Preliminary experiments have shown that transfer learning can significantly improve the efficiency of learning probabilistic planning rule sets.
Significant work has been done recently in the formulation and learning
of probabilistic planning rules for a single task.
In these frameworks, the world dynamics can be encapsulated by uniquely
applicable rules, each with a distribution of possible outcomes. Recent
work at MIT has shown that greedily balancing training error and complexity
through various heuristics can be effective in learning a reasonable rule
set of the world dynamics for a single task [6,8].
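To make the rule representation concrete, a probabilistic planning rule pairs a context that gates its applicability with a distribution over outcomes. The class below is our own illustrative sketch (the literal encoding, where a leading "-" marks a delete effect, is an assumption, not the authors' implementation):

```python
import random
from dataclasses import dataclass, field

@dataclass
class Rule:
    """One probabilistic planning rule: when `action` is taken in a state
    satisfying `context`, exactly one of `outcomes` occurs.  The literal
    encoding ("-x" deletes x) is an illustrative convention."""
    action: str
    context: frozenset                            # literals required to apply
    outcomes: list = field(default_factory=list)  # [(probability, effect literals)]

    def applies(self, state, action):
        """Uniquely applicable: at most one rule in a set should match."""
        return action == self.action and self.context <= state

    def sample_next_state(self, state):
        """Draw an outcome at random and apply its add/delete effects."""
        r, acc = random.random(), 0.0
        for prob, effects in self.outcomes:
            acc += prob
            if r < acc:
                dels = {lit[1:] for lit in effects if lit.startswith("-")}
                adds = {lit for lit in effects if not lit.startswith("-")}
                return (state - dels) | adds
        return state  # leftover probability mass: no change
```

A rule set for a task is then simply a collection of such rules, at most one of which applies in any given state.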
We combine probabilistic relational rule learning with a hierarchical
Bayesian model. Following the popular hierarchical Bayesian framework,
the data from a set of related source tasks are assumed to be generated
by task-specific rule sets, which are in turn drawn from a shared
general prior.
Our approach can be divided into two parts: the formulation of a generative model and the optimization of the general prior and task-specific rule sets. The generative model describes the probability of deriving a task-specific rule set from the general prior. We assume that the general prior is structured similarly to a task-specific rule set with one key difference: multiple rules are allowed to apply to a single world state. Given this rule set prior, various modifications, each with an associated decrease in probability, are made, such as adding/deleting rules, changing the context (applicability) of rules, adding/deleting/modifying outcomes, and altering outcome distributions via the Polya distribution. Thus, the probability of modifying a general prior into a task-specific rule set can be computed efficiently. To compute the probability of the general prior, we take the modification probability of creating the general prior from an empty hyper-prior using the same generative process. Note that the generative process automatically penalizes complexity, as each rule or outcome has an associated cost of creation. Thus, for any assignment of general prior and task-specific rule sets, we can compute the probability of the entire model given the training data.
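The creation-cost idea can be sketched as follows. The particular penalty values and the dict-based rule encoding are assumptions of ours, chosen only to show how the generative process trades prior probability against complexity:

```python
import math

# Hypothetical per-modification log-penalties; the actual costs in the
# generative model are not specified here, so these are illustrative.
LOG_P_ADD_RULE    = math.log(0.1)
LOG_P_ADD_LITERAL = math.log(0.3)
LOG_P_ADD_OUTCOME = math.log(0.2)

def log_prior(rule_set):
    """Log-probability of generating `rule_set` from an empty hyper-prior:
    every rule, context literal, and outcome pays a creation cost, so more
    complex rule sets receive lower prior probability.  Each rule is a dict
    with 'context' (a set of literals) and 'outcomes' (a list)."""
    lp = 0.0
    for rule in rule_set:
        lp += LOG_P_ADD_RULE
        lp += LOG_P_ADD_LITERAL * len(rule["context"])
        lp += LOG_P_ADD_OUTCOME * len(rule["outcomes"])
    return lp
```

Under this scoring, a rule set with fewer rules, shorter contexts, and fewer outcomes always receives a higher prior, which is the complexity pressure described above.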
The second part of our approach involves optimizing the probability of the general prior and task-specific rule sets through various greedy search methods. As the search space for each task is enormous, we use coordinate ascent to alternately maximize each task-specific rule set for the source tasks and the general prior, holding all other parts of the model fixed. The greedy search for both the general prior and the task-specific rule sets involves proposing the addition/deletion of rules and the modification of existing rules' contexts. A secondary greedy search is initiated for each proposed change to find the best corresponding outcome distribution. This secondary search involves proposing the addition/deletion of outcomes and the modification of existing outcomes. For each proposed outcome distribution, the Polya distribution likelihood is calculated to compute the new overall probability of the model given the proposed rule change. At each stage, the greedy search is repeated until no further changes lead to more probable models. Finally, given a stable general prior, a last greedy search is performed to calculate the most probable task-specific rule set for the target task.
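Two ingredients of this search can be sketched generically: the Polya (Dirichlet-multinomial) marginal likelihood used to score a proposed outcome distribution against observed outcome counts, and the greedy hill-climbing loop that accepts proposals until none improves the score. The function names and interfaces below are our own, not the authors':

```python
import math

def polya_log_likelihood(counts, alphas):
    """Dirichlet-multinomial (Polya) log-likelihood of observed outcome
    counts under pseudo-counts `alphas`; this scores an outcome set
    without committing to point estimates of the outcome probabilities."""
    A, N = sum(alphas), sum(counts)
    lp = math.lgamma(A) - math.lgamma(A + N)
    for n, a in zip(counts, alphas):
        lp += math.lgamma(a + n) - math.lgamma(a)
    return lp

def greedy_search(initial, neighbors, score):
    """Generic greedy hill climbing: repeatedly move to a better-scoring
    neighbor until no proposed change improves the score."""
    current, best = initial, score(initial)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(current):
            s = score(cand)
            if s > best:
                current, best, improved = cand, s, True
    return current
```

In the approach described above, `neighbors` would enumerate rule additions/deletions and context or outcome modifications, and `score` would combine the generative-model prior with the Polya likelihood of the training data.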
Preliminary tests of our approach have yielded encouraging results. The following two graphs show the performance of employing transfer learning with a varying number of source tasks versus learning from the target task alone. The x-axis represents the number of examples in the target task while the y-axis represents the inverse of the error rate.
The first "block size" domain is a simple domain with structural modifications between related tasks. The second "slippery gripper" domain is a more complex domain with parametric modifications along multiple rules between related tasks. In both cases, transfer learning shows great promise in transferring knowledge from source tasks to the target task, requiring significantly fewer examples than the non-transfer method to achieve comparable error rates.
We plan to allow for the possibility of overlapping outcomes and/or rules in task-specific rule sets. This may allow for more compact rule sets and a better transfer of knowledge between tasks. Finally, we hope to test this framework on real or simulated data to demonstrate the robustness of our generative model and learning framework.