For structured and semi-structured databases indexed by START [3] (see related abstract in this collection), START can be confident that if no answer is found in the source, it is because the source does not contain the answer.
START uses knowledge of real-world properties of entities in order to provide near-miss and partial answers. This requires ontological knowledge of how properties and entities relate within and across types of properties and entities, and therefore can only be implemented in the general case by building a complete ontology. In practice, however, the bulk of actual user questions address a relatively small number of types and properties, so a small amount of ontology building can improve the answers to a comparatively large proportion of questions.
⇒ Show weather for Mt. Etna.
I don't have this information about Mt Etna. Instead, I can show you the weather for Caltanissetta, Italy, which is 4.0 miles from Mt Etna.
Source: START KB and The Weather Channel
⇒ How far is Mt. Everest from Massachusetts?
The distance between Kathmandu, the capital of Nepal (where Mount Everest is located) and Boston, the capital of Massachusetts, is 7,376 miles (11,897 kilometers).
Source: START KB
⇒ Show time zone for McKeesport.
I don't know the time zone of McKeesport, Pennsylvania, but Harrisburg, the capital of Pennsylvania, is located in Timezone 5 (5 hours west of Greenwich, England).
⇒ How far is Toronto from Canada?
As far as I know, the city of Toronto, Ontario is located in Canada.
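The near-miss behavior in the examples above can be illustrated with a minimal sketch. It is not START's implementation: the `Entity` class, the `FALLBACK_RELATIONS` table, and the `lookup` function are hypothetical names introduced here, and the sketch assumes the ontology has already been reduced to a few named relations between entities. When a property is missing for the entity asked about, the sketch follows a relation to a substitute entity and reports the result as a near miss rather than as an exact answer.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    properties: dict = field(default_factory=dict)  # property name -> value
    related: dict = field(default_factory=dict)     # relation name -> Entity


# Hypothetical fallback table (an assumption, not from START): which
# ontological relations to follow, in order, when a property is missing.
FALLBACK_RELATIONS = {
    "time_zone": ["state_capital"],
    "weather": ["nearest_city"],
}


def lookup(entity, prop):
    """Return (status, name of the entity that supplied the value, value)."""
    if prop in entity.properties:
        return ("exact", entity.name, entity.properties[prop])
    # Property is missing: try substitute entities reachable through the ontology.
    for relation in FALLBACK_RELATIONS.get(prop, []):
        substitute = entity.related.get(relation)
        if substitute is not None and prop in substitute.properties:
            return ("near_miss", substitute.name, substitute.properties[prop])
    # Nothing usable was found: a recognizable failure.
    return ("failure", entity.name, None)


# Data taken from the McKeesport example above.
harrisburg = Entity("Harrisburg", properties={"time_zone": "Timezone 5"})
mckeesport = Entity("McKeesport", related={"state_capital": harrisburg})

print(lookup(mckeesport, "time_zone"))   # ('near_miss', 'Harrisburg', 'Timezone 5')
print(lookup(mckeesport, "population"))  # ('failure', 'McKeesport', None)
```

Returning an explicit status alongside the value lets the caller phrase the response differently for exact answers, near misses, and failures, as the sample dialog does.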
Ellipsis: One area of research is distinguishing when elliptical material should be considered an addition to previous material vs. a replacement: Given "What is the largest country in Europe?", does a followup of "In NATO?" mean "What is the largest country in Europe in NATO?" (France) or "What is the largest country in NATO?" (Canada)?
Selecting among multiple results: Our importance labels are sometimes derived manually, but we have also experimented with deriving them automatically. For example, a limited source can be used to derive importance labels for a broader source. The principal difficulty in acquiring importance labels in this way is determining equivalence between elements in two sets which may look different and be the same, as "Bill Clinton" vs. "William Jefferson Clinton", or look the same and be different, as the many "John Smith"s. Our research is ongoing in this area.
Near misses, partial answers, and recognizable failures: Success in this area relies largely on ontology building, which is an ongoing effort.
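The ellipsis ambiguity described above (addition vs. replacement) can be made concrete with a small sketch. This is only an illustration under simplifying assumptions: the previous question is assumed to be already split into a head and a sequence of restricting phrases, and the `Question` class and `ellipsis_candidates` function are hypothetical names, not part of START. The sketch enumerates both readings and does not attempt to choose between them, which is the open research problem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    head: str            # e.g. "What is the largest country"
    restrictions: tuple  # e.g. ("in Europe",)

    def text(self):
        return " ".join((self.head,) + self.restrictions) + "?"


def ellipsis_candidates(previous, fragment):
    """Build both readings of an elliptical follow-up such as "In NATO?"."""
    # Addition: the fragment further restricts the previous question.
    addition = Question(previous.head, previous.restrictions + (fragment,))
    # Replacement: the fragment takes the place of the last restriction.
    replacement = Question(previous.head, previous.restrictions[:-1] + (fragment,))
    return {"addition": addition, "replacement": replacement}


previous = Question("What is the largest country", ("in Europe",))
candidates = ellipsis_candidates(previous, "in NATO")
print(candidates["addition"].text())     # What is the largest country in Europe in NATO?
print(candidates["replacement"].text())  # What is the largest country in NATO?
```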
This work is supported in part by the Disruptive Technology Office as part of the AQUAINT Phase 3 research program.
[1] Boris Katz. Annotating the World Wide Web Using Natural Language. In Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching on the Internet (RIAO '97), Montreal, Canada, 1997.
[2] Boris Katz and Sue Felshin. Discourse and Dialog in the START Question Answering System. In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue (SIGdial '04), Demos during the Workshop, Cambridge, Massachusetts, 2004.
[3] Boris Katz, Sue Felshin, Deniz Yuret, Ali Ibrahim, Jimmy Lin, Gregory Marton, Alton Jerome McFarland, and Baris Temelkuran. Omnibase: Uniform Access to Heterogeneous Data for Question Answering. In Proceedings of the 7th International Workshop on Applications of Natural Language to Information Systems (NLDB '02), Stockholm, Sweden, June 2002.