We have integrated Omnibase into the START natural language system (see [2] and related abstracts in this collection). Omnibase lets us augment START's knowledge base with Web data sources quickly and conveniently, and it is no longer necessary to compromise START's modularity with large amounts of resource-specific code. Students can learn to write Omnibase scripts within hours, which substantially reduces the time it takes to integrate a new data source. Omnibase has significantly increased the quantity and diversity of the data that START can access and of the queries that it can answer.
Property scripts are easy enough to write that novice programmers can learn to produce simple ones very quickly. Unfortunately, Web sites often change their formatting, and rewriting scripts to keep up with those changes is time-consuming. Our Hap-Shu system (see related abstract) makes it possible to write generalized scripts that work at a conceptual level rather than directly at the HTML level.
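The contrast between an HTML-level script and a more conceptual one can be illustrated with a minimal Python sketch. This is not Omnibase's or Hap-Shu's actual scripting interface: the two page snippets, the placeholder value, and the function names `director_html_level` and `property_by_label` are all hypothetical, chosen only to show why a script tied to exact markup breaks after a redesign while one that keys on the visible label does not.

```python
import re
from html.parser import HTMLParser

# Two made-up renderings of the same fact, before and after a site redesign.
# The markup and the value "Jane Doe" are placeholders, not real data.
OLD_PAGE = "<table><tr><td><b>Director:</b></td><td>Jane Doe</td></tr></table>"
NEW_PAGE = '<div class="credit"><span>Director</span>: <a href="/name/1">Jane Doe</a></div>'


def director_html_level(page):
    """Brittle script: the pattern is tied to the exact table markup."""
    match = re.search(r"<b>Director:</b></td><td>([^<]+)</td>", page)
    return match.group(1).strip() if match else None


class _TextExtractor(HTMLParser):
    """Collects only the visible text of a page and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(page):
    """Return the page's text content with whitespace normalized."""
    parser = _TextExtractor()
    parser.feed(page)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def property_by_label(page, label):
    """Format-tolerant script: find the label in the visible text and
    return whatever follows it (assumes the value runs to the end of the text)."""
    match = re.search(re.escape(label) + r"\s*:?\s*(.+)", visible_text(page))
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    for name, page in (("old", OLD_PAGE), ("new", NEW_PAGE)):
        print(name, director_html_level(page), property_by_label(page, "Director"))
```

On the old layout both functions find the value; on the new layout only the label-based one does. A real page would carry more than one fact, so a practical version would also need a rule for where the value ends.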
This work is supported in part by the Advanced Research and Development Activity as part of the AQUAINT Phase II research program.
[1] J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, and A. Crespo. Extracting Semistructured Information from the Web. In Workshop on Management of Semistructured Data at PODS/SIGMOD '97, Tucson, Arizona, 1997.
[2] Boris Katz. Annotating the World Wide Web Using Natural Language. In Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching on the Internet (RIAO '97), Montreal, Canada, 1997.
[3] Boris Katz, Sue Felshin, Deniz Yuret, Ali Ibrahim, Jimmy Lin, Gregory Marton, Alton Jerome McFarland, and Baris Temelkuran. Omnibase: Uniform Access to Heterogeneous Data for Question Answering. In Proc. of the 7th Int. Workshop on Applications of Natural Language to Information Systems (NLDB '02), Stockholm, Sweden, June 2002.
[4] C. Knoblock, S. Minton, J. Ambite, N. Ashish, I. Muslea, A. Philpot, and S. Tejada. The Ariadne Approach to Web-based Information Integration. International Journal on Cooperative Information Systems, vol. 10, no. 1/2, pp. 145–169, 1999.
[5] Jimmy Lin. The Web as a Resource for Question Answering: Perspectives and Challenges. In Proceedings of LREC-2002, 2002.
Computer Science and Artificial Intelligence Laboratory (CSAIL) - The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA - tel: +1-617-253-0073 - publications@csail.mit.edu