CSAIL Research Abstracts - 2005

Scoop: Minimizing Overhead in Store-and-Query Sensor Networks

Thomer M. Gil & Samuel Madden

Introduction

The availability of low-cost wireless networking technologies like 802.11 and the emerging IEEE 802.15.4 standard means that the vast array of embedded computers in the world around us will soon be interconnected, promising tremendous advances in a variety of industries. For applications built on such networks to be widely deployed, users require a reusable infrastructure that allows them to monitor information from sensors without concern for the low-level details of networking, power management, or the difficulties associated with writing bug-free code for embedded microprocessors. Existing efforts within the sensor networking community to deploy reusable data collection technology for wireless sensor networks (WSNs) [1, 2, 3, 4, 5, 6] have had some success in making this a reality: several groups have proposed and built declarative languages and/or efficient query execution substrates that allow users to focus on the data they want to collect rather than the implementation details of collecting it.

Challenges

Existing database-style systems like Cougar [1] and TinyDB [7] generally assume that the system is organized into a connected network topology with sufficient bandwidth to deliver query answers to the user. Unfortunately, industrial monitoring deployments typically sample at hundreds to thousands of Hertz, a data rate too high for current low-power radio technologies to stream continuously from thousands of nodes. Even if sufficient bandwidth were available, streaming all of the data would quickly drain the batteries of these devices, which suggests that some kind of in-network storage and processing of data is needed. Although these systems do allow users to summarize data via aggregates, the raw readings that make up those aggregates are not stored, so users have no way to revisit data collected by the system or to query different subsets of the data as needed.
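
The bandwidth argument above can be checked with a rough calculation. The sketch below is illustrative only: the node count, sampling rate, and sample size are assumptions chosen to match "hundreds of Hertz" and "thousands of nodes", and the radio figure is the nominal 250 kbit/s data rate of IEEE 802.15.4 in the 2.4 GHz band. Packet headers, retransmissions, and multi-hop forwarding are ignored, all of which would make the picture worse.

```python
# Back-of-the-envelope check: can every node stream raw samples to one sink?
# All workload figures below are illustrative assumptions, not measurements.
NODES = 1000            # assumed deployment size
SAMPLE_RATE_HZ = 100    # assumed per-node sampling rate
BITS_PER_SAMPLE = 16    # assumed size of one raw reading, headers excluded
RADIO_BPS = 250_000     # nominal IEEE 802.15.4 data rate (2.4 GHz band)


def offered_load_bps(nodes, rate_hz, bits_per_sample):
    """Aggregate payload bits per second arriving at the sink."""
    return nodes * rate_hz * bits_per_sample


load = offered_load_bps(NODES, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
print(f"offered load:     {load / 1e6:.2f} Mbit/s")
print(f"radio capacity:   {RADIO_BPS / 1e6:.2f} Mbit/s")
print(f"oversubscription: {load / RADIO_BPS:.1f}x")
```

Under these assumptions the payload alone exceeds the nominal radio rate several times over, before any protocol overhead is counted.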

Scoop: What

We are building a system, Scoop, that is designed to efficiently store and query relational data collected by machines in a bandwidth-constrained sensor network. Scoop monitors changes in the distribution of sensor readings, queried values, and network connectivity to determine the best location to store data. We formulate this as an optimization problem and present a practical algorithm that solves this problem in Scoop. Users can then pose queries over this stored data; these queries can be over different subsets of nodes, ranges of data values, and time periods, allowing users to focus on the data in which they are particularly interested. This approach yields significant advantages over existing sensor network query systems:

  • Scoop minimizes network bandwidth usage of queries over stored data by optimizing the placement of data in the network based on the rate at which data is being acquired, the expected values of acquired sensor readings, and the expected type and frequency of users' queries.
  • Scoop adapts to changes in the rates of query and data arrival as well as in the distributions of queried values and sensor readings.
  • Scoop provides support for disconnected operation. Users are not required to be connected to the sensor network to monitor results or choose which data to query; they may issue queries over recent historical data whenever they connect.
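
The placement trade-off in the first two points can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not Scoop's actual optimization: cost is counted in hop-transmissions per unit time, queries are assumed to enter the network at a single node called `sink` here, and the names `expected_cost` and `best_owner` as well as the three-node line topology are hypothetical.

```python
def expected_cost(owner, producers, hops, query_rate, sink):
    """Expected hop-transmissions per unit time if `owner` stores a value range.

    producers:  {node: readings per unit time that fall in the range}
    hops:       {(a, b): hop count from a to b}
    query_rate: queries per unit time that touch the range
    sink:       node where user queries enter the network (assumption)
    """
    store = sum(rate * hops[(node, owner)] for node, rate in producers.items())
    # Each query travels sink -> owner and the reply travels back.
    query = query_rate * 2 * hops[(sink, owner)]
    return store + query


def best_owner(candidates, producers, hops, query_rate, sink):
    """Pick the candidate with the lowest expected cost (first wins on ties)."""
    return min(candidates,
               key=lambda o: expected_cost(o, producers, hops, query_rate, sink))


# Hypothetical line topology: sink - a - b, one hop per link.
nodes = ["sink", "a", "b"]
position = {"sink": 0, "a": 1, "b": 2}
hops = {(x, y): abs(position[x] - position[y]) for x in nodes for y in nodes}

producers = {"b": 10.0}  # node b emits 10 matching readings per unit time

rare = best_owner(nodes, producers, hops, query_rate=0.1, sink="sink")
frequent = best_owner(nodes, producers, hops, query_rate=50.0, sink="sink")
print(rare, frequent)
```

In this toy setting, rarely queried data is cheapest to keep at the producing node, while heavily queried data is cheapest to keep where queries arrive. A change in either rate can therefore move the best storage location, which is why the placement has to adapt.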

Scoop: How

A special node, the basestation, collects statistics from sensors to generate a storage assignment, which tells each node where in the network the data items it produces should be stored in the future. (The insight here is that if a node has recently produced a particular sensor reading, it is likely to produce readings around that value in the near future.) The storage assignment takes into account not only the probability of values as reported by the nodes, but also the topology of their network neighborhood and, if known, the probability that certain readings will be queried. To satisfy a query, the basestation need only contact the nodes that could possibly store the relevant data according to the storage assignment.
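
The two roles of the storage assignment described above, placing data and pruning queries, can be sketched as follows. This is a deliberately simplified illustration and not Scoop's algorithm: it assigns each value bin to the single node most likely to produce it and ignores topology and query probabilities, which the real assignment also weighs. The names `build_assignment` and `nodes_to_contact`, the bin boundaries, and the histogram numbers are all hypothetical.

```python
def build_assignment(histograms):
    """Map each value bin to the node most likely to produce it.

    histograms: {node: {(lo, hi): probability a reading falls in [lo, hi]}}
    Simplification (assumption): topology and query probabilities are ignored.
    """
    bins = sorted({b for hist in histograms.values() for b in hist})
    assignment = {}
    for b in bins:
        # Iterating nodes in sorted order makes tie-breaking deterministic.
        assignment[b] = max(sorted(histograms),
                            key=lambda n: histograms[n].get(b, 0.0))
    return assignment


def nodes_to_contact(assignment, q_lo, q_hi):
    """Nodes that could store a reading in the inclusive range [q_lo, q_hi]."""
    return {owner for (lo, hi), owner in assignment.items()
            if lo <= q_hi and q_lo <= hi}  # the two ranges overlap


# Hypothetical statistics reported by three nodes.
histograms = {
    "n1": {(0, 19): 0.9, (20, 39): 0.1},
    "n2": {(20, 39): 0.7, (40, 59): 0.3},
    "n3": {(20, 39): 0.2, (40, 59): 0.8},
}

assignment = build_assignment(histograms)
print(assignment)
print(sorted(nodes_to_contact(assignment, 25, 45)))
print(sorted(nodes_to_contact(assignment, 0, 10)))
```

A query over the range 25 to 45 touches two bins and so reaches only their two owners; a query over 0 to 10 reaches a single node. No other node has to be contacted.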

When a node produces data, it uses the storage assignment to determine the set of nodes responsible for storing that data item. If the node itself is in the set, it stores the item locally; otherwise, it sends the item to the nearest node in the set. By storing data nearby, the system avoids energy-costly radio transmissions.
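
The node-side decision can be sketched in a few lines. The representation below is an assumption made for illustration: the assignment is a list of inclusive value ranges paired with owner sets, "nearest" is measured in hop counts known to the node, and the function name `choose_storage_node` is hypothetical.

```python
def choose_storage_node(me, reading, assignment, hops_from_me):
    """Return the node that should store `reading`, as seen from node `me`.

    assignment:   list of ((lo, hi), owners) pairs with inclusive value ranges
    hops_from_me: {node: hop count from `me` to that node}
    """
    for (lo, hi), owners in assignment:
        if lo <= reading <= hi:
            if me in owners:
                return me  # store locally, no radio transmission needed
            # Otherwise pick the closest owner (sorted for stable tie-breaking).
            return min(sorted(owners), key=lambda n: hops_from_me[n])
    raise ValueError(f"no owner assigned for reading {reading}")


# Hypothetical assignment and hop counts.
assignment = [
    ((0, 19), {"n1"}),
    ((20, 39), {"n2", "n5"}),
]

remote = choose_storage_node("n4", 25, assignment, {"n1": 3, "n2": 2, "n5": 1})
local = choose_storage_node("n5", 25, assignment, {"n1": 2, "n2": 1})
print(remote, local)
```

Node n4 forwards the reading to n5, its closest owner, while n5 itself keeps the same reading locally.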

Preliminary Results

Our results indicate that Scoop not only provides substantial performance benefits over alternative approaches on a range of data sets, but is also able to efficiently adapt to changes in the distribution and rates of data and queries.

References

[1] Yong Yao and Johannes Gehrke. Query Processing in Sensor Networks, in Conference on Innovative Data Systems Research (CIDR), 2003.

[2] Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong. The Design of an Acquisitional Query Processor For Sensor Networks, in Proceedings of SIGMOD, 2003.

[3] X. Li, Y. J. Kim, R. Govindan, and W. Hong. Multi-dimensional Range Queries in Sensor Networks, in Proceedings of the First ACM Conference on Sensor Systems (SenSys), 2003.

[4] Amol Deshpande, Carlos Guestrin, Samuel Madden, Joe Hellerstein, and Wei Hong. Model-Driven Data Acquisition in Sensor Networks, in Proceedings of VLDB, 2004.

[5] Ankur Jain, Edward Chang, and Yuan-Fang Wang. Approximate Join Processing Over Data Streams, in Proceedings of SIGMOD, 2003.

[6] C. Olston and J. Widom. Best Effort Cache Synchronization with Source Cooperation, in Proceedings of SIGMOD, 2002.

[7] Samuel Madden, Wei Hong, Joseph M. Hellerstein, and Michael Franklin. TinyDB

Computer Science and Artificial Intelligence Laboratory (CSAIL)
The Stata Center, Building 32 - 32 Vassar Street - Cambridge, MA 02139 - USA
tel:+1-617-253-0073 - publications@csail.mit.edu
(Note: On July 1, 2003, the AI Lab and LCS merged to form CSAIL.)