Interlingua-based Translation for Language Learning Systems
John Lee & Stephanie Seneff
For the past several years, we have been developing systems that will enable a student of a foreign language to practice conversation in a non-threatening environment [1,2]. This research has built upon our previous research in developing multilingual dialogue systems. We now have in place a system intended to help a native English speaker learn Mandarin, by engaging them in conversation in the weather domain. The student converses over the telephone, and can speak in either English or Mandarin at any time. Mandarin queries about the weather are answered in Mandarin, whereas English queries are automatically translated into Mandarin. The student can then attempt to imitate the provided Mandarin query to push the conversation forward.
Our interest here is in creating a reversed scenario in which a native Mandarin speaker is learning English. Ideally, reversing the language roles would simply require reassigning L1 and L2 in a top-level system control file. However, a critical component of the reversed system is missing: a high-quality translation capability from Mandarin to English. While we already have the capability to understand Mandarin queries, the grammar for understanding only needed to capture the semantic content of the utterance and hence did not preserve sufficient detail for accurate translation.
Interlingua-based Translation Framework
For second-language learning applications, the machine translation system must be able to provide grammatical, near-perfect translation. Wide coverage is not required, since the typical student is concerned with particular domains of interest. In fact, if a sentence is out-of-domain or ill-formed, it is better for the system to tell the student so than to attempt translation at the risk of teaching the student an incoherent sentence.
The interlingual approach performs syntactic and semantic analysis on an input sentence and maps it to a language-neutral meaning representation ("semantic frame"). Rich linguistic knowledge, encoding long-distance dependencies, is easily incorporated into the semantic frame, and is taken into account during generation. A further advantage for language learning applications is that the frame, as a canonical meaning representation, is useful for detecting inappropriate or missing grammatical constructions, thereby giving feedback to the student. The design of such a meaning representation is challenging for a wide domain, but has been shown to be feasible for restricted ones.
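To make the idea of a semantic frame concrete, the sketch below shows one plausible shape such a frame might take for a weather-domain question. The field names and structure are hypothetical illustrations, not the actual frame format used in our system:

```python
# Hypothetical semantic frame for "Will it rain in Boston tomorrow?"
# (illustrative structure only; not the system's actual frame format)
frame = {
    "clause": "verify",                        # yes/no question
    "topic": "precipitation",
    "predicate": {"weather_event": "rain"},
    "location": {"city": "Boston"},
    "temporal": {"relative_day": "tomorrow"},  # available to generation
}

# Because the frame is language-neutral, generation rules for any
# target language can consume the same representation, and checks
# for missing or inappropriate constructions can inspect it directly.
print(frame["clause"], frame["temporal"]["relative_day"])
```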
A natural language generation system then maps the semantic frame to a surface string using formal generation rules. These rules specify the order in which components of the frame are to be processed into substrings, and the system consults a generation lexicon to obtain word-sense surface-form mappings and appropriate inflectional endings.
Our system (called "formal" in the rest of this abstract) consists of three components for L1-to-L2 translation: a natural language understanding (NLU) system, which maps a sentence in language L1 to a semantic frame encoding syntax and semantics; a transfer phase, which modifies the semantic frame to account for linguistic properties unique to language L2; and a natural language generation (NLG) system, which produces a well-formed surface string in L2.
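The three-stage pipeline can be sketched as follows. All function bodies here are toy stubs with hypothetical names and a hard-wired example; the real NLU and NLG components are far richer:

```python
# Toy sketch of the NLU -> transfer -> NLG pipeline (hypothetical stubs).

def understand(sentence_l1):
    """NLU: map an L1 (Mandarin) sentence to a semantic frame (stub)."""
    # A real parser performs full syntactic and semantic analysis;
    # here we return a fixed frame for one example query.
    return {"clause": "verify", "event": "rain",
            "temporal": "tomorrow", "location": "Midwest"}

def transfer(frame):
    """Adapt the frame to L2 (English) properties, e.g. make tense
    explicit, since Mandarin does not systematically mark it."""
    if frame.get("temporal") == "tomorrow":
        frame["tense"] = "future"
    return frame

def generate(frame):
    """NLG: realize the frame as an L2 surface string (stub)."""
    aux = "will it" if frame.get("tense") == "future" else "is it going to"
    return f"{aux} {frame['event']} in the {frame['location']} {frame['temporal']}"

print(generate(transfer(understand("ming2_tian1 zhong1_xi1_bu4 hui4 xia4_yu3 ma5"))))
```

The point of the transfer stage is visible even in this stub: the decision to use "will" is made on the frame, where the temporal expression is directly accessible, rather than on the surface string.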
We utilized a set of about 900 Mandarin utterances spoken by native Mandarin speakers conversing with our MUXING weather information system. We examined the English output sentences manually, and rated the quality, given the original Mandarin sentence as a reference, as Perfect, Acceptable, or Wrong.
We were interested in comparing our formal translation system with a statistical approach, for which we utilized PHARAOH (called "statistical" in the rest of this abstract), a state-of-the-art phrase-based translation system. This was easily trained on the same set of sentence pairs that were used to develop the formal system.
Table: Translation evaluation results. Note: "Speech" = speech recognizer output; "Transcript" = manual transcriptions.
The ratings of the English output sentences are summarized in the table above. As expected, translation quality degrades with speech recognizer output. In terms of the proportion of Perfect sentences, the formal method outperformed the statistical one by over 16% on both the transcribed and the automatically recognized test sets.
Some of the errors produced by the statistical method were significant for language-learning applications. In sentences with future temporal expressions, it consistently preferred "is" instead of the correct "will", a long-distance dependency that it was unable to learn. In a sentence like "ming2_tian1 zhong1_xi1_bu4 you3 mei2 you3 xue3" (literally, "tomorrow Midwest have not have snow"), both "is" and "will" are possible translations of "you3", whose tense is not systematically marked. Its proper translation thus depends on temporal expressions elsewhere in the sentence.
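A frame-based system can capture this dependency with a simple rule: the temporal expression, wherever it occurs in the sentence, determines the auxiliary chosen at generation time. The rule below is a hypothetical illustration, not our system's actual grammar:

```python
# Sketch of a long-distance tense rule: the temporal expression in the
# frame selects the English auxiliary, regardless of where that
# expression appeared in the Mandarin surface string.
# (Hypothetical rule for illustration only.)

FUTURE_TEMPORALS = {"tomorrow", "tonight", "next week"}

def auxiliary_for(frame):
    """Pick 'will' for future temporal contexts, 'is' otherwise."""
    return "will" if frame.get("temporal") in FUTURE_TEMPORALS else "is"

# "ming2_tian1 zhong1_xi1_bu4 you3 mei2 you3 xue3": the frame records
# temporal = "tomorrow", forcing "will" even though "you3" itself
# carries no tense marking.
frame = {"event": "snow", "temporal": "tomorrow", "location": "Midwest"}
print(f"{auxiliary_for(frame)} there be {frame['event']} "
      f"in the {frame['location']} {frame['temporal']}")
```

A phrase-based statistical system, by contrast, must learn this correlation from co-occurrence within a limited phrase window, which is why it consistently fell back on the more frequent "is".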
The statistical method was unable to handle some other long-distance grammar constraints. Consider the question "ni3 zhi1 bu4 zhi1_dao4 ming2_tian1 hui4 xia4_yu3 ma5" (literally, "you know not know tomorrow going-to rain"), which was translated as "do you know about will it rain tomorrow". While the two halves of the sentence are perfectly good phrases on their own, as a whole they form an ungrammatical sentence.