Research Abstracts - 2006
Discourse and Dialog in the START Question Answering System

Boris Katz & Sue Felshin

The Problem

Question answering systems based on language enable expressive and concise communication: the user can pose natural questions and receive natural and relevant responses. However, language can be ambiguous and vague. We should not force the user to adapt to the computer and formulate precise and unambiguous queries. Instead, the computer should adapt itself to ambiguity and missing data as a human does: by engaging in conversation, by inferring information missing from the question, and by giving intelligent related answers when the exact answer is not available (“near-miss” answers [2]). The START system (see [1] and related abstracts) provides users with convenient access to information through its ability to retain conversational state, recognize ellipsis, give appropriate near-miss answers, and report intelligently on ambiguity and failure to find information.

Motivation

Conversational and interactive abilities allow START to make assumptions about information missing from the question or to query the user about it, to choose among multiple answers to a question, and to provide intelligent near-miss answers. These capabilities let the user interact with the system with convenient, natural brevity.

Approach

START operates by parsing user questions into structural representations, matching these representations against its knowledge base, and retrieving information in order to return high-precision answers to questions. START's use of linguistic processing gives it several opportunities to incorporate discourse and dialog techniques to improve its operation:
Ellipsis

Using the structural representation of the preceding question, START identifies which material in that question the new, elliptical question phrase should replace. When there are multiple potential antecedents, it chooses among them by examining their lexical features to find the closest semantic match.
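The antecedent selection described above can be sketched as follows. This is a minimal illustration, not START's implementation: the `Phrase` type, the feature sets, and the overlap-count scoring are all assumptions introduced here, and a flat list of phrases stands in for START's structural representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    text: str
    features: frozenset  # hypothetical lexical features, e.g. {"country", "place"}


def resolve_ellipsis(previous, fragment):
    """Replace the phrase in `previous` that best matches `fragment`.

    Returns a new list of phrases, or None when no antecedent shares
    any lexical feature with the fragment.
    """
    if not previous:
        return None
    # Score each candidate antecedent by the number of shared features.
    scores = [len(p.features & fragment.features) for p in previous]
    best = max(range(len(previous)), key=lambda i: scores[i])
    if scores[best] == 0:
        return None  # no plausible antecedent; a dialog system might ask the user
    return previous[:best] + [fragment] + previous[best + 1:]


# "What is the population of France?" followed by "What about Germany?"
previous = [
    Phrase("population", frozenset({"property", "quantity"})),
    Phrase("France", frozenset({"country", "place"})),
]
fragment = Phrase("Germany", frozenset({"country", "place"}))
resolved = resolve_ellipsis(previous, fragment)
print([p.text for p in resolved])  # ['population', 'Germany']
```

Counting shared features is only one possible notion of "closest semantic match"; it is used here because it keeps the sketch short.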
Selecting Among Multiple Results

The more understanding a system has of the structure and intent of the question, the better it is able to select among multiple results. Because START performs a linguistic analysis of questions, it can distinguish types of ambiguity and multiplicity: it can tell whether multiple replies are different answers to the same interpretation of the question or answers to different interpretations of the question. For example, when a word or words in the question can match more than one entity in a class, START chooses either to respond about all of the entities or to query the user for clarification. Some entities are marked as important (manually assigned or heuristically calculated) and are preferred over others in the same class. Thus START presents the information that is most likely wanted, yet remains fully informative.
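One way to picture the choice among several matching entities is the sketch below. It is illustrative only: the `Entity` type, the `max_to_answer` threshold, and the action labels are assumptions made for this example and are not taken from START.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    cls: str
    important: bool = False  # manually assigned or heuristically calculated


def select_entities(matches, max_to_answer=3):
    """Decide how to respond when a question phrase matches several entities.

    Returns an (action, entities) pair. `max_to_answer` is a hypothetical
    cutoff above which the user is asked to clarify.
    """
    if not matches:
        return ("fail", [])
    important = [e for e in matches if e.important]
    if len(important) == 1:
        # A single important entity is preferred over the others in its class.
        return ("answer", important)
    candidates = important or matches
    if len(candidates) <= max_to_answer:
        return ("answer_all", candidates)
    return ("clarify", candidates)


# Made-up matches for the phrase "Paris" within the class "city".
matches = [
    Entity("Paris, France", "city", important=True),
    Entity("Paris, Texas", "city"),
]
action, chosen = select_entities(matches)
print(action, [e.name for e in chosen])  # answer ['Paris, France']
```

To remain fully informative, a fuller version could also mention the entities that were not chosen.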
Near Misses, Partial Answers, and Recognizable Failures

For structured and semi-structured databases indexed by START [3] (see related abstract), START can be confident that if no answer is found in the source, it is because the source does not contain the answer. START uses knowledge of real-world properties of entities in order to provide near-miss and partial answers. This requires ontological knowledge of how properties and entities relate within and across types of properties and entities, and therefore can only be implemented in the general case by building a complete ontology. In practice, however, the bulk of actual user questions address a relatively small number of types and properties, so that a small amount of ontology building can improve a comparatively large proportion of questions.
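The idea that a small ontology can support near-miss answers and recognizable failures can be illustrated with the sketch below. Everything in it is hypothetical: the `PARENT` table, the result labels, and the fictional record are assumptions for this example and do not reflect START's ontology or data.

```python
# Hypothetical mini-ontology: each property points to a shared parent concept.
PARENT = {
    "king": "head_of_state",
    "queen": "head_of_state",
    "president": "head_of_state",
}


def answer(record, prop):
    """Look up `prop` in a structured record, falling back to a near miss.

    Returns (kind, property, value), where kind is "exact", "near_miss",
    or "not_in_source".
    """
    if prop in record:
        return ("exact", prop, record[prop])
    parent = PARENT.get(prop)
    if parent is not None:
        # Try sibling properties that share the same parent concept.
        for sibling, sibling_parent in PARENT.items():
            if sibling_parent == parent and sibling in record:
                return ("near_miss", sibling, record[sibling])
    # The source is structured, so a missing value means it is not there.
    return ("not_in_source", prop, None)


# Fictional record; the country and its president are made up.
freedonia = {"president": "R. Firefly", "capital": "Freedonia City"}
print(answer(freedonia, "king"))  # ('near_miss', 'president', 'R. Firefly')
```

Only three properties are related here, which mirrors the point above that a small amount of ontology building can already cover common question types.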
Research Support

This work is supported in part by the Advanced Research and Development Activity as part of the AQUAINT Phase II research program.

References:

[1] Boris Katz. Annotating the World Wide Web Using Natural Language. In Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching on the Internet (RIAO '97), Montreal, Canada, 1997.

[2] Boris Katz and Sue Felshin. Discourse and Dialog in the START Question Answering System. In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue (SIGdial '04), Demos during the Workshop, Cambridge, Massachusetts, 2004.

[3] Boris Katz, Sue Felshin, Deniz Yuret, Ali Ibrahim, Jimmy Lin, Gregory Marton, Alton Jerome McFarland, and Baris Temelkuran. Omnibase: Uniform Access to Heterogeneous Data for Question Answering. In Proc. of the 7th Int. Workshop on Applications of Natural Language to Information Systems (NLDB '02), Stockholm, Sweden, June 2002.