MIT CSAIL Research Abstracts

CSAIL Publications and Digital Archive header

Research Abstracts Home

CSAIL Digital Archive

Research Activities

CSAIL Home

horizontal line

Research Abstracts - 2007
horizontal line

horizontal line

Web-scale Environments for Deduction Systems

Chris Hanson

Introduction

We are building a distributed persistent storage system for the support of long-lived automated deduction systems that operate on global data sets. This system stores its data as directed graph structures, which provide exceptionally flexible representation and access. It uses distributed hash tables for indexing, so that access times scale logarithmically with the number of storage nodes. Unlike existing storage systems, our system implements dependency-based world views of the stored data.

A world view is a subset of a stored graph that is supported by a given set of explicit assumptions. By adding dependency information to all elements of a stored graph, the system can use a Truth Maintenance System to determine which elements have support. An application provides a set of assumptions with each query, and the system returns only those results supported by the assumptions. This allows the application to work with consistent subsets of the store, which is essential for making sense of globally inconsistent data.

Some other uses for world views are:

Hypothetical reasoning, in which a hypothetical assumption is introduced to a world view to see its effect.
Building overlays or slices, in which complex systems are described using multiple overlapping models. For example, electronic circuits are often modelled using one model for low frequencies and another for high frequencies. By keying these models to assumptions about the frequency domain, a program can work with one or the other by appropriate use of assumptions.

Design and technology

The storage system is implemented with the Web Consortium's Resource Description Framework (RDF) in order to leverage the work being done on the Semantic Web. In particular, the SPARQL Query Language provides an associative access mechanism for graphs to complement the navigational access of graph walking by link traversal. This design choice also makes the system immediately usable for Semantic Web applications, broadening its user base. The RDF model is layered on top of traditional SQL databases, although later we will replace that substrate with a more efficient one.

The system is monotonic: once written, nothing is ever deleted. Information that today seems useless or invalid may later be useful. Meanwhile, the world-view mechanism allows it to be ignored. Note that monotonicity doesn't preclude state, since additional markup can be used to indicate changes and deletions, much like the retraction of a statement "deletes" the statement. (This is actually better than simple side effects, since it maintains the history of what happened.) Monotonicity has useful engineering properties, as well: it simplifies caching, distribution, and replication.

The system uses distributed hash tables for indexing, so that it can be scaled to large data sets. Performance is enhanced by aggressive caching on the access side, and by exploiting the graph structure for prefetching. Replication and distribution of all stored data provides robustness. Data integrity is ensured by requiring all graph segments to be cryptographically signed by their publisher. A signer's identity can be provided as an assumption to the world-view mechanism, providing a simple way to ignore information other than that published by trusted sources.

Other uses

Building this storage system is only the beginning. Once available, it will facilitate the building of interesting applications. In addition to those described above, we imagine using this to enhance our Lisp-based programming systems with persistent storage that is better matched to Lisp than relational databases or file systems. We would like to build virtual-machine implementations that execute programs encoded as graph structure, providing an infrastructure for building autonomous agents. No doubt many other uses will be imagined.

References:

[1] J. Doyle. A truth maintenance system. In Artificial Intelligence 12, pp. 231-272, 1979.

[2] K. D. Forbus, J. de Kleer. Building Problem Solvers. MIT Press, Cambridge MA, 1993.

[3] T. Berners-Lee, J. Hendler, and O. Lassila. The semantic web. Scientific American, May 2001.

[4] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In Proceedings of the 2001 ACM SIGCOMM Conference, pp. 149-160, 2001.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu