CSAIL Publications and Digital Archive header
bullet Research Abstracts Home bullet CSAIL Digital Archive bullet Research Activities bullet CSAIL Home bullet

link to publications.csail.mit.edu link to www.csail.mit.edu horizontal line

 

Research Abstracts - 2007
horizontal line

horizontal line

vertical line
vertical line

ICEDB: Continuous Query Processing in an Intermittently Connected World

Yang Zhang, Bret Hull, Vladimir Bychkovsky, Hari Balakrishnan & Samuel Madden

Project Overview

Over the past few years, the "first generation" of wireless sensor computing systems have taken root [13, 14], and the idea of thinking of a sensor network as a streaming data repository over which one can run continuous queries [7, 9], with optimizations such as "in-network" aggregation [6], is now well-established. This approach works well for a class of applications that are characterized by static sensor nodes with relatively low data rates, where the primary function of the sensor network ("sensornet") is to periodically monitor a remote environment or to track some event of interest.

We believe that the next generation of sensornets will display much higher degrees of mobility and significantly higher data rates. For example, media-rich sensors, such as cameras to capture images and video, chemical sensors to monitor pollution, vibration (acceleration) sensors to monitor car and road conditions, and cellular and 802.11 (Wi-Fi) radio sensors to map wireless network conditions, connected to the millions of automobiles in urban and suburban areas of the world can dramatically improve the scale and fidelity of spatio-temporal sensing of a wide range of important phenomena.

There are many issues in the successful design and implementation of such mobile, high-data-rate sensor systems, of which this project focuses on one: query processing. Motivated by the success of first-generation systems that have viewed the "sensornet as a streaming database," we adopt a similar programming model. The goal is to enable users to connect to a central server (which we call the portal), declaratively specify (primarily via continuous queries) what data they are most interested in collecting, and deliver responses to the portal. The portal takes care of distributing queries to the mobile nodes, each of which has a local query processor.

The combination of mobility and high data rates, however, introduces two crucial differences from previous systems [2, 3, 7, 8] that adopt a similar philosophy:

  1. Intermittent and variable network connectivity: Delivering data from mobile sensornets to the portal is a non-trivial problem. One approach would be to use the wide-area cellular wireless infrastructure. While this approach might be feasible (depending on the rates at which sensors collect data), it is expensive, particularly for consumers who are unlikely to pay extra for such connectivity. To combat this potentially debilitating "access bottleneck," our approach is to take advantage of the rise of free and open Wi-Fi networks in many parts of the world. Cities are starting to deploy such networks [10, 12], and companies like Fon [4] are starting to take advantage of open Wi-Fi to provide Internet access. Although these community Wi-Fi networks have high bandwidth, their coverage is usually, and perhaps fundamentally, spotty: as cars move, their network connectivity will be intermittent, alternating between periods of relatively high bandwidth and periods of no connectivity. Cellular networks share this spottiness as well -- cellular "holes" are common, and in smaller rural markets many cellular providers are simply unavailable. Fortunately, for non-interactive applications such as the ones envisioned above, such intermittent networks are quite adequate.
  2. Large quantities of data relative to network bandwidth: Media-rich sensors generate data at high rates, which means that whenever network connectivity is available, it might not be possible to send all the data collected since the last upload. Without some care, information that is more important may be unduly delayed, while less important data takes its place. For example, in a traffic management scenario, the portal would be especially interested in receiving reports about locations users are about to visit, or about areas of the road that have been congested in the past at the same time, or about locations for which little is currently known, rather than simply receiving data in FIFO order.
Project Goals

The combination of these two properties -- unaddressed in previous work on query processing -- motivates a new framework for specifying and processing continuous queries. In ICEDB we focus on two main ideas:

  1. Delay-tolerant continuous query processing: Intermittent connectivity changes the "always on" network assumption that all existing distributed query processing systems make. In current systems, the absence of network connectivity is an example of a failure to be masked [1, 5, 11], whereas in ICEDB it is part of normal operation. In ICEDB, the local query processor takes advantage of periods of disconnectivity by gathering, storing, and processing collected data such that when connectivity resumes, the "right" data, as expressed by the query, is sent in order of perceived importance. Thus, queries are continuous, yet intermediate results are stored in a local database, with the results (of the continuous queries) being streamed from this stored data. We propose a simple buffering mechanism and and protocol for managing the staging and delivery of query results from mobile nodes to the portal, and of queries from the portal to the mobile nodes.
  2. Inter- and intra-stream prioritization of query results: Because bandwidth is limited and variable, it is essential that mobile nodes make the best possible use of connectivity when it arrives. Hence, some form of data prioritization is needed to allow nodes to decide what data to transmit first. We propose a set of SQL extensions that allow users to to express inter- and intra-stream prioritization of data so that, given the constrained, variable, and intermittent nature of connectivity in CarTel, the most important data can be sent back first. Our extensions allow for both local (within a mobile node) as well as global (portal-driven) prioritization of results within a stream.
Initial Progress

We have implemented ICEDB under Linux in the context of the CarTel project (http://cartel.csail.mit.edu/). A small number of cars equipped with CarTel boxes are currently in daily use, collecting data from GPS receivers, Wi-Fi interfaces, cameras, and the cars' standard on-board diagnostics (OBD) interfaces. We use the data collected from this real (albeit lightly used) system to conduct a series of trace-driven simulations of different prioritization policies expressed using our query language extensions. Our results demonstrate the usefulness of our language features as well as the clear need for data prioritization in bandwidth constrained settings.

Acknowledgments

This work is funded in part by the National Science Foundation under Award Number CNS-0205445 and by Quanta Corporation.

References:

[1] M. Balazinska, H. Balakrishnan, S. Madden, and M. Stonebraker. Fault-tolerance in the borealis distributed stream processing system. In SIGMOD, 2005.

[2] D. Carney, U. Centiemel, M. Cherniak, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams - a new class of data management applications. In VLDB, 2002.

[3] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, V. Raman, F. Reiss, and M. A. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, Jan. 2003.

[4] FON home page. http://en.fon.com/.

[5] J. Hwang, M. Balazinska, A. Rasin, U. Cetintemel, M. Stonebraker, and S. Zdonik. High-availability algorithms for distributed stream processing. In ICDE, 2005.

[6] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. Tag: A tiny aggregation service for ad-hoc sensor networks. In OSDI, 2002.

[7] S. Madden, M. Franklin, J. Hellerstein, and W. Hong. The design of an acquisitional query processor for sensor networks. In SIGMOD, 2003.

[8] R. Motwani, J. Widom, A. Arasu, B. Babcock, S.Babu, M. Data, C. Olston, J. Rosenstein, and R. Varma. Query processing, approximation and resource management in a data stream management system. In CIDR, 2003.

[9] P. Bonnet, J. Gehrke, and P. Seshadri. Towards sensor database systems. In Conf. on Mobile Data Management, 2001.

[10] M. Richtel. San Francisco gets proposals for free citywide wi-fi net. New York Times, Feb. 2006.

[11] M. Shah, J. Hellerstein, and E. Brewer. Highly-available, fault-tolerant parallel dataflows. In SIGMOD, 2004.

[12] B. Tedeschi. Big wi-fi project for philadelphia. New York Times, 2004.

[13] G. Tolle, et al. A macroscope in the redwoods. In SenSys, 2005.

[14] N. Xu, et al. A wireless sensor network for structural monitoring. In SenSys, 2004.

 

vertical line
vertical line
 
horizontal line

MIT logo Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu