The extraction process automatically discovers informational fields within the items’ HTML, which are then used to offer augmented browsing and sorting features. In the screenshot below, 48 items have been extracted from an Amazon.com search and then refined down to just the 6 items that are “Paperback” and were published in 2004 or 2005. These two filtering operations are not provided by the Web site itself; they are added by our Web browser extension.
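To make the idea concrete, here is a minimal sketch (not the extension’s actual code) of faceted filtering over extracted items, assuming each item is a plain record with hypothetical `binding` and `year` fields:

```python
# Hypothetical records standing in for items extracted from a search page.
items = [
    {"title": "Book A", "binding": "Paperback", "year": 2004},
    {"title": "Book B", "binding": "Hardcover", "year": 2004},
    {"title": "Book C", "binding": "Paperback", "year": 2006},
]

def filter_items(items, binding=None, years=None):
    """Keep only items whose fields match the requested facet values."""
    result = []
    for item in items:
        if binding is not None and item.get("binding") != binding:
            continue
        if years is not None and item.get("year") not in years:
            continue
        result.append(item)
    return result

# Filter down to paperbacks published in 2004 or 2005.
paperbacks = filter_items(items, binding="Paperback", years={2004, 2005})
print([i["title"] for i in paperbacks])  # ['Book A']
```

Once the fields are extracted into records like these, any such filter or sort becomes a simple in-memory operation, independent of what the originating site offers.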
Our Web browser extensions can also re-display extracted information in different views and merge information from different Web sites. The following screenshot shows movie showings and restaurants extracted from two different sites but plotted together on a single map.
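The merging step can be sketched as follows, assuming (hypothetically) that each extracted item carries latitude/longitude fields that a map view can plot:

```python
# Hypothetical items extracted from two different sites.
movies = [{"name": "Cinema X", "lat": 42.36, "lon": -71.09}]
restaurants = [{"name": "Diner Y", "lat": 42.37, "lon": -71.10}]

def merge_sources(**sources):
    """Flatten several per-site item lists into one, recording provenance."""
    merged = []
    for source, items in sources.items():
        for item in items:
            # Tag each item with the site it came from so the map
            # view can distinguish the two sources.
            merged.append({**item, "source": source})
    return merged

pins = merge_sources(movies=movies, restaurants=restaurants)
print(len(pins))  # 2
```

Because the items from both sites share a common record structure after extraction, a single view (here, a map keyed on `lat`/`lon`) can render them together.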
We will explore many more ways for users to manipulate information extracted from the Web: sharing it with other people, performing calculations on it, persisting it for long-term use, and building personal value on top of publicly available information that they can currently only look at but not “touch.”
The Stata Center, Building 32, 32 Vassar Street, Cambridge, MA 02139, USA. tel: +1-617-253-0073, publications@csail.mit.edu