Sifter works on web pages that are consistently structured, such as those generated from databases through server-side templates. Such sites typically belong to large institutions and companies that can afford the engineering costs of three-tier web applications. While these institutions and companies are important citizens of the Web, they do not represent the whole Web, nor does their data represent all information on the Web. Outside Sifter's scope are the web pages created by individuals and small groups who lack the resources for enterprise web solutions. These information producers must often settle for static web pages that offer few of the features users have come to expect from enterprise web sites. Moreover, being hand-coded, those static web pages are far more difficult to scrape automatically.
To these information producers we propose Exhibit [4], a very lightweight structured data publishing framework that produces rich visualizations (maps, timelines, scatter plots, pivot tables, etc.) and offers sophisticated browsing features (sorting, filtering) while requiring only knowledge of HTML (see screenshot below). By offering these information producers rich user interfaces at very low cost, we get in return structured data published on many topics that have never appealed to the large information producers. It is in the Long Tail [5] that we will find the diversity so representative of the Web, and it is by enabling and cherishing this diversity that we will make the Semantic Web the universal medium where anyone can publish structured data, not just the large information producers.
Millions of individuals and small groups publishing through Exhibit do not immediately contribute to a usable Semantic Web. Data from different sources might fit into the same data model, but the schemas from those sources will not readily align. We are in the process of designing user interfaces that allow naive users, not programmers or data modeling experts, to recombine data from disparate sources and get value out of the aggregate.
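The alignment problem can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the two sources, their property names, and the hand-written mapping are assumptions, not Exhibit's actual data format or any interface under design. Both sources fit the same simple model (items with named properties), yet a naive merge loses information until someone supplies a mapping.

```python
# Hypothetical records from two independent publishers. Both fit the same
# simple data model (a list of items with named properties), but each
# publisher chose its own property names and value types.
source_a = [
    {"name": "Alice Smith", "birth-year": 1970},
    {"name": "Bob Jones", "birth-year": 1985},
]
source_b = [
    {"full_name": "Carol Wu", "born": "1992"},
]

# A hand-written mapping from source B's property names to source A's.
# Producing this mapping is the alignment step that does not happen
# automatically.
b_to_a = {"full_name": "name", "born": "birth-year"}


def align(items, mapping, converters=None):
    """Rename properties according to `mapping` and convert values
    for any target property listed in `converters`."""
    converters = converters or {}
    aligned = []
    for item in items:
        new_item = {}
        for key, value in item.items():
            target = mapping.get(key, key)  # unmapped keys keep their name
            convert = converters.get(target, lambda v: v)
            new_item[target] = convert(value)
        aligned.append(new_item)
    return aligned


def values_of(items, prop):
    """Collect the values of one property across all items that have it."""
    return [item[prop] for item in items if prop in item]


# Naive merge: the items sit in one list, but source B's people are
# invisible to any query phrased in source A's vocabulary.
merged_naive = source_a + source_b

# Aligned merge: after renaming and type conversion, one query covers both.
merged_aligned = source_a + align(source_b, b_to_a, {"birth-year": int})
born_before_1990 = [
    item["name"] for item in merged_aligned if item["birth-year"] < 1990
]
```

In this sketch the mapping is written by hand in code, which is exactly the kind of task that non-programmers cannot be expected to perform.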
This work is being done in the Haystack group [6], the User Interface Design group [7], and the Simile project [8]. It has been supported by the National Science Foundation, the Biomedical Informatics Research Network, and Nokia.
[1] W3C Semantic Web Activity. http://www.w3.org/2001/sw/.
[2] David F. Huynh, Stefano Mazzocchi, David R. Karger. Piggy Bank: Experience the Semantic Web Inside Your Web Browser. In Proceedings of the International Semantic Web Conference, Galway, Ireland, November 2005.
[3] David F. Huynh, Robert C. Miller, David R. Karger. Enabling Web Browsers to Augment Web Sites' Filtering and Sorting Functionality. In Proceedings of the User Interface Software Technology Conference, Montreux, Switzerland, October 2006.
[4] David F. Huynh, David R. Karger, Robert C. Miller. Exhibit: Lightweight Structured Publishing Framework. In Proceedings of the World Wide Web Conference, Banff, Canada, May 2007.
[5] The Long Tail. http://en.wikipedia.org/wiki/Long_tail.
[6] Haystack project. http://haystack.csail.mit.edu/.
[7] User Interface Design group. http://groups.csail.mit.edu/uid/.
[8] Simile project. http://simile.mit.edu/.