LambdaLearner: Learning Skills in Deterministic DomainsHarold FoxIntroductionLambdaLearner is a new system being developed at the AIRE group of the MIT CSAIL. Our vision is a software agent that can learn the functions of a discrete set of available actions, and that can synthesize algorithms to accomplish any of a broad set of goal possibilities. We will describe the project's motivation, the plan of research, and an early output of the first stage, the AireProof proof language and proof checker. MotivationThe acquisition of skills is one of the most interesting cognitive capabilities that humans have. When people first learn to play a new sport or musical instrument, they start at a high level of reasoning. Each of their actions is carefully reasoned out, and they can give detailed explanations of their thought processes to an external observer. With this transparency, their early efforts are sluggish and clumsy. As humans become more proficient, their actions become smoother and faster, to the point where the activity is uncomprehensible and magical to an unskilled observer. Top performers themselves, moreover, are mysteriously at a loss when asked to explain how they are thinking. Often, they are unable to teach their skills to a novice. We contend that there are three steps that a software agent must go through to learn a skill: isolating the domain, learning the rules, and acquiring the skills. Isolating the domain consists of perceiving the world and discovering a set of concepts whose behavior relative to each other is independent of the context in which they are embedded. For example, a violin and a bow are concepts whose interaction is independent of whether they are played in a concert hall or on a mountain top. Once a domain is identified, the agent must learn the precise rules of how its own actions modify the state within the domain. In the violin example, this consists of learning how particular fingerings and bowing styles combine to make sounds. Finally, knowing how the domain works, the agent must discover useful algorithms and strategies, the skills, that can help the agent combine actions to achieve goals. To study this problem computationally, we have chosen to focus on the rule learning/skill acquisition side of the problem. Therefore, we are examining the problem in closed domains. As a result, we have chosen not to work with real world perceptual data, relying instead on simulated environments to test and validate our algorithms. To further simplify the problem, we also assume that the world state is fully observable and that it behaves deterministically: the same state/action pair always yields the same result state. The prototypical task to deploy such an agent is to learn to interact with another computer program such as a graphical user interface or a video game. What makes this problem challenging is that modeling another software program requires a full lambda calculus. We cannot represent the consequences of actions by a set of preconditions and postconditions as in standard STRIPS planning. Nor can we represent the action's consequence as a parametrized function to be optimized as in reinforcement learning. Instead, we model our agents' actions as lambda functions in a Scheme-like language that we summarize below. Research Program To demonstrate the feasability of this research, we are working to develop
algorithms to learn a very simple world: The Bit Flipper. In this world,
there are two rows of bits. The top row is the cursor bit. Its value is
1 in one location and 0 elsewhere. The bottom row is a row of bits that
can be toggled independently of one another. The agent has three actions:
This problem breaks down naturally into two comparably sized subproblems. The first one is to learn the logic corresponding to each action. The second one is to analyze this logic to automatically synthesize an algorithm that takes a start state and goal state as input and produces a sequence of actions as output. To make sure our representations are right, we are studying the second problem first. To solve the algorithm synthesis problem, we have turned to the classical field of proof checking and theorem proving. To generate an algorithm that works, we need to be able to verify the correctness of algorithms, and we need to be able to propose and validate conjectures that look likely to lead to a complete algorithm. As a first step, we have developed AireProof. In AireProof, we can specify the behavior of the BitFlipper game, define a function that can solve this game, and prove the correctness of the function. After the full problem has been solved in the BitFlipper world, we will slowly expand to more complex environments with more objects and more complex strategies. Eventually, we would like to demonstrate the system working on a real program, such as an early Atari video game with its simple logic, yet challenging game play. Early Work: AireProof, A Proof Language and CheckerAireProof is a Scheme-like language for specifying proofs. The objective is to be able to write proofs for complex theorems through a combination of smaller lemmas the same way we, in programming, specify complex functions as a combination of smaller ones. The building blocks for AireProof expressions are primitives, conses, and lambda functions. AireProof has two types of statements: theorems and definitions. A definition
assigns a particular symbol to an expression. A theorem states that a
particular expression is equal to There are several notable differences between AireProof and Scheme.
For exmaple, two built-in AireProof functions cannot be evaluated directly:
There are two further distinctions between AireProof and Scheme. First, AireProof does not allow side effects. Once a definition is made, it is permanent. Second, AireProof does not allow recursive definitions. To create a recursive function, we must use the suchthat primitive. For example, to define the append function, we would write: append := (suchthat (lambda (f) (= (f
x y) (if (primitive x) y (cons (car x) (f (cdr x) y))))))
To use this definition, we would first have to prove that the inner lambda function was well defined. In practice, we have a set of structural induction lemmas that make this easy. When the theorem-writer writes down a step in the proof, the checker must be able to unify the expression to true. It does this through equality substitution and modus ponens implication following. With each claim, the theorem writer is provides a set of hints to the proof checker: established theorems and axioms that combine to prove the claim. First the proof checker uses the established implication rules to form new facts. It then attempts to match the unproven claim with the facts it is given and the new facts it has created. To match two expressions together, the checker uses the established equality rules as well as several built-in primitive equality tests. |
||
|