CSAIL Research Abstracts - 2005

The Re:Search Engine -- Helping People Return to Information on the Web

Jaime Teevan

Abstract

Re-finding information is consistently cited as a problem on the Web [3]. One reason is that people rely on a considerable amount of context to return to information (e.g., the original path taken to it), yet the Web makes no guarantee that this context will remain static. Online news sites change when new stories are written, Web sites change as their hosts edit them, and search results change as search engines update their indices to reflect the current state of the Web. This paper describes the Re:Search Engine, a system built to help people return to information in the dynamic environment of the Web. The Re:Search Engine makes it easy to find previously located search results, even when the result list for a query has changed. Because people tend not to remember much of what is presented in a search result list, when a person repeats a query the Re:Search Engine can preserve what the user remembers about the original results while still presenting new information.

People Rely on Consistency When Searching...

People rely on consistency in their information environment when searching for information, and are particularly likely to expect consistency when searching for previously encountered information. For example, if Connie, while looking to purchase a GPS system, found several she liked through a search for “GPS”, she would expect to be able to use the same query to locate the exact same systems again.

The importance of consistency is revealed by two studies conducted to understand how people return to information. The first, a modified diary study of fifteen computer science graduate students performing personally motivated searches in their email, in their files, and on the Web, found that even among this technically savvy population, participants preferred navigating to their target along known paths over jumping directly to it [7]. The second, a naturalistic study of re-finding instances mined from Web pages containing the phrase “Where’d it go?”, found that when people expressed difficulty re-finding, they commonly described the original access path to their target and were relatively unlikely to describe the temporal aspects of their original encounter with the information [9].

...But the Web Changes

Despite the importance of consistency in re-finding, information on the Web changes. Search results, often an important step in the original access path to a piece of information, change regularly as search engines update their indices to reflect the current state of the Web. This can be illustrated by looking at the top ten results for ten queries issued to Google and tracked over the course of a year. On average, only 2.7 of the results remained in the top ten after a year; three results disappeared from the list within the first month. Thus, if Connie wanted to revisit two of the top ten GPS systems she found in her original search for “GPS”, she would have a 51% chance of being unable to locate at least one of them after a month. Attempts to improve retrieval quality through personalization or collaborative filtering will increase the frequency of search result changes.
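
The 51% figure is consistent with treating each of the ten results as independently having a 70% chance of surviving the first month, since three of ten disappear on average. The text does not state this assumption, so the sketch below is a reconstruction, not the author's stated method. The function names are hypothetical. For comparison, it also computes the chance when exactly three of the ten results are lost and the two targets are drawn without replacement.

```python
from math import comb


def p_miss_independent(p_survive: float, k: int) -> float:
    """Chance that at least one of k targets is gone, assuming each target
    survives independently with probability p_survive (an assumption)."""
    return 1 - p_survive ** k


def p_miss_fixed_losses(n: int, lost: int, k: int) -> float:
    """Chance that at least one of k targets is gone when exactly `lost`
    of the n results disappear and the k targets are chosen at random."""
    return 1 - comb(n - lost, k) / comb(n, k)


# Two targets, 70% monthly survival per result.
print(round(p_miss_independent(0.7, 2), 2))      # 0.51
# Two targets among ten results, exactly three of which are lost.
print(round(p_miss_fixed_losses(10, 3, 2), 2))   # 0.53
```

Under either reading, Connie is more likely than not to lose at least one of her two systems within a month.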

Some search systems (e.g., a9.com) have attempted to assist with re-finding by allowing users to mark pages of interest to return to later. However, people often do not employ keeping strategies that require active involvement [3]. Hayashi et al. [2] describe a system that lets the user choose to interact with either a cached version of the Web or the live Web. While employing such methods to keep the results for repeat queries static would make re-finding simpler, it would deny users the opportunity to discover new information. For example, if Connie re-issues her “GPS” search, in addition to re-finding the systems she liked before, it is possible she would also be interested in learning about newly available systems. Even though changes hinder returning to previously viewed information, they benefit users by providing new information.

(a) Original results (b) New results (c) Re:Search Engine's results
Figure 1: An example of the Re:Search Engine in action. Figure 1a shows the search results when a user first searched for "GPS" (visited links are italicized). Figure 1b shows the results when the query was next performed, and Figure 1c shows how the Re:Search Engine combined what a person is likely to remember from Figure 1a with what is new in Figure 1b.

Solution -- The Re:Search Engine

The Re:Search Engine addresses the problem of search result consistency by seamlessly integrating old relevant information with new. The engine consists of a Web browser toolbar that interfaces with any search engine, including Google. When a person issues a search that they issued before, the Re:Search Engine fetches the current results from the underlying search engine, and merges the newly available information with what the user remembers about the previously returned search results. Because people tend to remember very little of the original result list, it is possible to preserve the salient features of the old results while still presenting new information.

Implicit measures of attention, like those discussed by Kelly and Teevan [4], as well as psychological aspects of memory [e.g., 6], suggest the following aspects of the original result list are likely to be remembered:

  • Search results that were clicked on.
  • Anomalous search results.
  • The first and last results in the result list.

The Re:Search Engine keeps these cues constant, while updating all else with the most recent information available.
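
One way to picture this merge is the minimal sketch below. It is an illustration under stated assumptions, not the Re:Search Engine's actual implementation: results are represented by their URLs, each list contains no duplicate URLs, the sets of clicked and anomalous results are supplied by the caller, and the function name `merge_results` is hypothetical. Memorable results stay at their original ranks, and the remaining slots are filled with current results in rank order.

```python
def merge_results(old, new, clicked=(), anomalous=()):
    """Merge an old and a new result list (lists of URLs).

    Results likely to be remembered (clicked, anomalous, first, last)
    keep their original positions; other slots take the newest results.
    """
    n = len(old)
    remembered = set(clicked) | set(anomalous)
    if n:
        remembered |= {old[0], old[-1]}

    # Pin memorable results to the ranks they held in the old list.
    pinned = {i: url for i, url in enumerate(old) if url in remembered}
    pinned_urls = set(pinned.values())

    # Fill open slots with new results first; fall back to unpinned old
    # results only if the new list is too short to fill every slot.
    fresh = [url for url in new if url not in pinned_urls]
    fresh_set = set(fresh)
    leftovers = [u for u in old if u not in pinned_urls and u not in fresh_set]
    fill = iter(fresh + leftovers)

    return [pinned[i] if i in pinned else next(fill) for i in range(n)]


old = ["a", "b", "c", "d", "e"]
new = ["x", "a", "y", "c", "z"]
print(merge_results(old, new, clicked={"c"}, anomalous={"d"}))
# ['a', 'x', 'c', 'd', 'e']
```

In this example the clicked result "c", the anomalous result "d", and the first and last results keep their ranks, while the unremembered slot at rank two is replaced by the top new result.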

Consider as an example the search in Figure 1. Figure 1a shows the results returned when Connie first searched for “GPS”. Later, when she re-performed the same query, the results had changed to include several new GPS systems (Figure 1b). Instead of directly returning the new results, which could be disorienting, or the original results, which might omit items Connie would want to see, the Re:Search Engine merged the two (Figure 1c), preserving memorable aspects of the original results, such as followed links (italicized), anomalous results (“Geological and Planetary Sciences”), and the ordering of the first and last results, while including new results and an updated result summary.

A preliminary paper prototype study of people interacting with lists of document summaries suggests that changes to wording and ordering go unnoticed, even when the changes occur as the person interacts with the information [8]. Further studies are under way to better understand what makes information memorable, and to test the efficacy of the Re:Search Engine for re-finding.

What a person remembers appears to depend on many things, such as task and elapsed time, and the engine will be extended to take advantage of these factors. For example, like a person, the Re:Search Engine will “forget” about previous searches over time. This will enable it to present more new information when appropriate, and relieve the burden of storing all of the results the user sees.
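
The text does not specify how this forgetting would work. As one hypothetical possibility, the sketch below models memory with an exponential decay and discards stored searches once their estimated retention falls below a threshold. The half-life, the threshold, and the function names are all assumptions made for illustration.

```python
def retention(elapsed_days: float, half_life_days: float = 30.0) -> float:
    """Estimated memory strength: 1.0 at time zero, halving every
    half_life_days (an assumed model, not taken from the source)."""
    return 0.5 ** (elapsed_days / half_life_days)


def prune_history(history, now_day, threshold=0.1, half_life_days=30.0):
    """Keep only searches the user plausibly still remembers.

    `history` maps each query to the day it was last issued.
    """
    return {
        query: day
        for query, day in history.items()
        if retention(now_day - day, half_life_days) >= threshold
    }


history = {"gps": 0, "maps": 90}
print(prune_history(history, now_day=120))
# {'maps': 90}
```

Here the "gps" search, issued 120 days earlier, has an estimated retention of 0.0625 and is dropped, so a repeat of that query would show only current results and its stored result list could be deleted.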

Generalizing the Solution

Although the Re:Search Engine addresses changes to search results, other types of information also change. The growing ease of electronic communication and collaboration, the rising availability of time-dependent information, and even the introduction of automated agents suggest that information is becoming ever more dynamic. As stated by Levy, “[P]art of the social and technical work in the decades ahead will be to figure out how to provide the appropriate measure of fixity in the digital domain” [5]. Understanding how people use search results to re-find will shed light on how people return to information in dynamic environments in general, and I look forward to applying what I learn from the Re:Search Engine to other problems in the domain.

Acknowledgements

This work is supported by the NTT-MIT Alliance, the Oxygen Project, and the HP-MIT Alliance. I am grateful to my advisor, David R. Karger, and to my thesis committee, Mark S. Ackerman, Susan T. Dumais and Robert C. Miller, for their thoughtful advice and encouragement.

References:

[1] Graphic, Visualization, and Usability Center. GVU's tenth WWW user survey, October 1998.

[2] Koichi Hayashi, Takahiko Normura, Tan Hazama, Makoto Takeoka, Sunao Hashimoto, and Stephen Grudmundson. Temporally threaded workspace: A model for providing activity-based perspectives on document spaces. In Proceedings of HyperText ’98, pp. 87--96, Pittsburgh, PA, USA, June 1998.

[3] William Jones, Harry Bruce and Susan T. Dumais. How do people get back to information on the Web? How can they do it better? In Proceedings of INTERACT ’03, pp. 793--796, Zürich, Switzerland, September 2003.

[4] Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum, 37(2), 2003.

[5] David Levy. Fixed or fluid? Document stability and new media. In Proceedings of European Conference on Hypertext, pp. 24--31, Edinburgh, Scotland, September 1994.

[6] Bennet B. Murdock, Jr. The serial position effect of free recall. Journal of Experimental Psychology, 64:482--488, 1962.

[7] Jaime Teevan, Christine Alvarado, Mark S. Ackerman, and David R. Karger. The perfect search engine is not enough: A study of orienteering behavior in directed search. In Proceedings of CHI ’04, pp. 415--422, Vienna, Austria, April 2004.

[8] Jaime Teevan. Displaying dynamic information. In Proceedings of CHI ’01 (Extended Abstract), pp. 417--418, Seattle, WA, USA, Month 2001.

[9] Jaime Teevan. How people re-find information when the Web changes. MIT AI Memo AIM-2004-012, 2004.

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)