CSAIL Publications and Digital Archive

Research Abstracts - 2007

Syntactic Decomposition for Complex Question Answering

Boris Katz, Sue Felshin & Gary Borchardt

The Problem

Current question answering systems have demonstrated an ability to answer simple factoid and list questions such as “Who is the current prime minister of Italy?” and “In which countries is Spanish spoken?” However, many questions are more complex. For example, the question “What books did the author of War and Peace write?” requires a question answering system to determine not only who the author of War and Peace is, but also what (other) books that person has written. We are developing technology that will make it possible to decompose a complex question into simpler questions, each of which can be answered individually; the individual responses will then be combined into an answer to the original complex question.


By using syntactic means to decompose complex questions into simpler factoid and list questions, we can leverage our existing question answering machinery and answer complex queries at relatively low cost. See Figure 1 for an example of syntactic decomposition as implemented in our START question answering system.

[Screenshot: START answering “What is the population of the capital of Kenya?”]

Figure 1: Our START system answering a complex question with the answer including contextual information regarding how the question was answered.
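The decomposition in Figure 1 can be illustrated with a minimal sketch. This is not START's implementation: the `FACTS` table, the function names, and the placeholder population value are assumptions made for illustration. The nested question is answered innermost subquestion first, and the intermediate steps are kept so the final answer can report how it was obtained.

```python
# Hypothetical toy knowledge base. The population entry is a labeled
# placeholder, not a real figure and not data taken from START.
FACTS = {
    ("Kenya", "capital"): "Nairobi",
    ("Nairobi", "population"): "<population of Nairobi>",
}


def answer_simple(entity, relation):
    """Answer one factoid question: 'What is the <relation> of <entity>?'"""
    return FACTS[(entity, relation)]


def answer_nested(relations, entity):
    """Answer 'What is the r1 of the r2 of <entity>?' innermost first.

    `relations` lists the relations outermost first, e.g.
    ["population", "capital"] for "the population of the capital of Kenya".
    Returns the final answer plus a trace of the intermediate steps.
    """
    trace = []
    current = entity
    for relation in reversed(relations):
        result = answer_simple(current, relation)
        trace.append((current, relation, result))
        current = result
    return current, trace


answer, steps = answer_nested(["population", "capital"], "Kenya")
for entity, relation, result in steps:
    print(f"The {relation} of {entity} is {result}.")
```

Each subquestion is an ordinary factoid lookup, so the existing simple-question machinery is reused unchanged; only the sequencing is new.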

Related Work

START's ternary expressions (subject–relation–object triples) can richly and efficiently represent the user's questions, as well as natural language annotations used to describe content, and therefore lend themselves well to syntactic decomposition of complex questions (see [1] [2] and related abstracts). For further information on syntactic decomposition, see [3].
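To make the idea of subject–relation–object triples concrete, here is a small sketch under stated assumptions. The `Ternary` class, the `?name` variable convention, and the `match`/`query` helpers are hypothetical and are not START's actual representation or API. The sketch shows how a question with an unknown can be matched against stored triples, and how the earlier example question about the author of War and Peace splits into two such matches.

```python
from typing import NamedTuple


class Ternary(NamedTuple):
    """A subject-relation-object triple (illustrative, not START's format)."""
    subject: str
    relation: str
    obj: str


# Hypothetical knowledge base of stored triples.
KB = [
    Ternary("Leo Tolstoy", "write", "War and Peace"),
    Ternary("Leo Tolstoy", "write", "Anna Karenina"),
    Ternary("Fyodor Dostoevsky", "write", "Crime and Punishment"),
]


def match(pattern, fact):
    """Return variable bindings if `fact` fits `pattern`, else None.

    Terms starting with '?' are unknowns; all other terms must be equal.
    """
    bindings = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.get(p, f) != f:
                return None
            bindings[p] = f
        elif p != f:
            return None
    return bindings


def query(pattern, kb):
    """Collect the bindings of every stored triple that matches `pattern`."""
    results = []
    for fact in kb:
        bindings = match(pattern, fact)
        if bindings is not None:
            results.append(bindings)
    return results


# Subquestion 1: who wrote War and Peace?
authors = [b["?who"] for b in query(Ternary("?who", "write", "War and Peace"), KB)]
# Subquestion 2: what (other) books did that person write?
books = [
    b["?book"]
    for author in authors
    for b in query(Ternary(author, "write", "?book"), KB)
    if b["?book"] != "War and Peace"
]
print(authors, books)
```

Because the answer to the first pattern is substituted directly into the subject slot of the second, the triple form makes the seam between subquestions explicit.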


We may approach syntactic decomposition from two directions: first, syntax, and second, meta-knowledge about the contents of available information resources. Syntax informs us of which decompositions of a question are legal, and meta-knowledge about information resources can guide the system to answerable subquestions when many legal decompositions exist.

Legal syntactic decompositions are proper branches of the parse tree. Figure 2 shows two proper branches and one improper branch. The structure of a sentence, of course, is related to the meaning conveyed by the sentence. For example, if asked "Who was the third Republican president?", we must first find the Republican presidents (Lincoln, Grant, Hayes, Garfield, ...) and then find the third one, rather than first finding the third president (Thomas Jefferson) and then checking whether he was a Republican. The relation which is lower in the parse tree must be resolved first, with two exceptions:

  • A system's knowledge base may let it resolve several relations in a single step; when it does, resolving the relations of a large (but proper) branch in a single retrieval operation is more efficient. For example, to find the largest country, we might list all countries, find their sizes, find the largest of these sizes, and select the country with that size; if the knowledge base lets the system perform that selection in a single operation, doing so is far more efficient. Annotations can describe the "shape" of the knowledge base(s) and indicate which sets of relations can be resolved in a single step.
  • If semantic knowledge can be found in the knowledge base which indicates that two relations are independent, the relations can be resolved in either order, and in particular, in whichever order the knowledge base indicates is more efficient.
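The default ordering rule can be checked on the "third Republican president" example with a short sketch. The data layout and the helper names `restrict_party` and `nth` are assumptions for illustration only. The table is an excerpt of the list of U.S. presidents, which is sufficient here because the omitted presidents (4th through 15th) are neither among the first three nor Republicans.

```python
# Excerpt of U.S. presidents as (order in office, name, party).
# Presidents 4-15 are omitted; none of them was a Republican.
PRESIDENTS = [
    (1, "George Washington", "Unaffiliated"),
    (2, "John Adams", "Federalist"),
    (3, "Thomas Jefferson", "Democratic-Republican"),
    (16, "Abraham Lincoln", "Republican"),
    (17, "Andrew Johnson", "Democratic"),
    (18, "Ulysses S. Grant", "Republican"),
    (19, "Rutherford B. Hayes", "Republican"),
    (20, "James A. Garfield", "Republican"),
]


def restrict_party(people, party):
    """Resolve the modifier 'Republican': keep only members of `party`."""
    return [p for p in people if p[2] == party]


def nth(people, n):
    """Resolve the ordinal 'third': the n-th entry by order in office, if any."""
    ordered = sorted(people)
    return ordered[n - 1:n]


# Lower relation first ('Republican'), then the higher one ('third').
lower_first = nth(restrict_party(PRESIDENTS, "Republican"), 3)

# Higher relation first: take the third president, then test his party.
higher_first = restrict_party(nth(PRESIDENTS, 3), "Republican")

print(lower_first)   # the intended reading
print(higher_first)  # empty: the third president was not a Republican
```

The two compositions use the same two operations and differ only in order, which is why the parse tree, rather than the set of relations alone, must determine the sequence.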

[Diagram: proper and improper branches and ordering]

Figure 2: Proper branches are outlined in green, improper in yellow. For efficient knowledge matching, the larger branch (labeled #1) should be tried first and the smaller (#2) second.
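The "larger branch first" strategy can be sketched as follows, using the largest-country example. Everything here is a simplifying assumption: the relations form a linear chain, so the proper branches are its suffixes; the `resources` dictionaries stand in for annotations describing what a knowledge base can retrieve in one step; and the areas are rounded values in millions of square kilometres, included only for illustration.

```python
# Rounded land areas in millions of km^2 (illustrative values).
AREA = {"Russia": 17.1, "Canada": 10.0, "China": 9.6}


def largest(countries):
    """Apply the relation 'largest' to an already retrieved list."""
    return [max(countries, key=AREA.get)]


# Relations that can be applied as a separate step after retrieval.
OPERATORS = {"largest": largest}


def resolve(relations, resources):
    """Resolve a chain of relations, trying the largest branch first.

    `relations` is outermost first, e.g. ("largest", "country").
    `resources` maps a branch (a suffix of the chain) to a function that
    retrieves it in one step. Returns (answer, number of steps used).
    """
    for start in range(len(relations)):
        branch = relations[start:]
        if branch in resources:
            result = resources[branch]()
            steps = 1
            # Apply the remaining, higher relations one at a time.
            for relation in reversed(relations[:start]):
                result = OPERATORS[relation](result)
                steps += 1
            return result, steps
    raise LookupError(f"no resource covers any branch of {relations}")


question = ("largest", "country")

# A knowledge base that can only list countries (branch #2).
basic = {("country",): lambda: list(AREA)}
# A knowledge base that also answers the whole branch directly (branch #1).
rich = dict(basic)
rich[("largest", "country")] = lambda: ["Russia"]

print(resolve(question, rich))
print(resolve(question, basic))
```

Both knowledge bases yield the same answer, but the richer one does so in a single retrieval, which is the efficiency gain that trying the larger branch first is meant to capture.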

Future Work

We plan to examine more closely the relation between syntax and legal decompositions, with particular attention to the validity of resolving a relation before another relation which is lower in the parse tree: Is it strictly the case that doing so makes it impossible to guarantee a correct interpretation of the sentence, or may higher relations be resolved first provided they are within the same projection, island, clause, or other syntactic element?

We also plan to further explore the relation between the "shape" of the knowledge base (as indicated by available annotations) and the path of resolving a complex query, in order to further increase the efficiency of our search.

Research Support

This work is supported in part by the Disruptive Technology Office as part of the AQUAINT Phase 3 research program.


[1] Boris Katz. Using English for Indexing and Retrieving. In Artificial Intelligence at MIT: Expanding Frontiers, v. 1; Cambridge, MA, 1990.

[2] Boris Katz. Annotating the World Wide Web Using Natural Language. In Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching on the Internet (RIAO '97), Montreal, Canada, 1997.

[3] Boris Katz, Gary Borchardt, and Sue Felshin. Syntactic and Semantic Decomposition Strategies for Question Answering from Multiple Resources. In Proceedings of the AAAI 2005 Workshop on Inference for Textual Question Answering, pp. 35–41, 2005.


Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu