Figure 1: START's answer to the question
“Who was president of the United States in 1881?” with
knowledge from the World Wide Web.
Despite the effectiveness of START and Omnibase in meeting users' information
needs, several major challenges remain unsolved.
- The Scaling Problem: The sheer amount of information available
in the world today places a practical limit on the amount of knowledge
that can be incorporated into a system by a single research group. Although
natural language annotations are easy and intuitive to write, there is simply
too much content to annotate by hand.
- The Knowledge Engineering Bottleneck: Manual knowledge engineering
is required to expand our system's knowledge coverage; integrating Web
sources under Omnibase requires site-specific wrapper scripts. Consequently,
only trained individuals can add knowledge to START and Omnibase.
- The Implicit Knowledge Problem: Many components of knowledge
do not appear explicitly within resources, but require the application
of domain knowledge or rules of inference to extract them. In other
cases, it is necessary to combine fragments of knowledge from multiple
resources in order to derive sought-after components of knowledge.
- The Fickle Web Problem: An undesirable side-effect of the Web's
dynamic nature is the instability of site layout and page content. This
poses a serious problem for wrapper scripts custom-tailored to specific
formats. Often, major changes to page content or layout structure will
require significant modification of associated scripts.
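
The fragility described under the Fickle Web Problem can be illustrated with a minimal sketch. It is not Omnibase's actual wrapper code: the page snippets, the pattern, and the function name `extract_capital` are all hypothetical, chosen only to show how a pattern tailored to one layout silently stops matching once the site is redesigned.

```python
import re
from typing import Optional

# Hypothetical page snippets (not taken from any real site).
# The same fact is published before and after a site redesign.
OLD_LAYOUT = "<tr><td>Capital</td><td>Exampleville</td></tr>"
NEW_LAYOUT = (
    '<div class="label">Capital</div>'
    '<div class="value">Exampleville</div>'
)

# A site-specific pattern, custom-tailored to the old table layout.
ROW_PATTERN = re.compile(r"<tr><td>Capital</td><td>([^<]+)</td></tr>")


def extract_capital(html: str) -> Optional[str]:
    """Return the value in the 'Capital' table row, or None if no such row exists."""
    match = ROW_PATTERN.search(html)
    return match.group(1) if match else None


old_result = extract_capital(OLD_LAYOUT)  # the pattern matches the old layout
new_result = extract_capital(NEW_LAYOUT)  # the redesign defeats the pattern

print(old_result)
print(new_result)
```

The content of the page is unchanged between the two snippets, yet the wrapper returns nothing for the redesigned page. In this sketch, only rewriting the pattern restores access to the data.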
To address these challenges, we have pursued many different solutions.
The following abstracts in this volume describe each of these technologies
in greater detail:
Research Support
This work is supported in part by the Advanced Research and Development
Activity as part of the AQUAINT Phase II research program.
References:
[1] Boris Katz. Annotating the World Wide Web Using Natural Language.
In Proceedings of the 5th RIAO Conference on Computer Assisted Information
Searching on the Internet (RIAO '97), Montreal, Canada, 1997.
[2] Boris Katz, Sue Felshin, Deniz Yuret, Ali Ibrahim, Jimmy Lin,
Gregory Marton, Alton Jerome McFarland, and Baris Temelkuran. Omnibase:
Uniform Access to Heterogeneous Data for Question Answering. In Proceedings
of the 7th International Workshop on Applications of Natural Language to
Information Systems (NLDB '02), Stockholm, Sweden, June 2002.
Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu