Abstracts - 2006
SAF: A Similarity-based Adaptable Framework for Approximate Querying Answering Based on Time Series Forecasting
Daniela Tulone & Samuel Madden
There has been a great deal of interest in recent years in developing systems to collect data from so-called wireless sensor networks -- collections of battery-powered, wireless radio-equipped devices that are embedded into some remote environment for sensing and data collection. Applications include environmental monitoring, agriculture, industrial monitoring, and process control. Existing systems for declarative querying in these environments typically collect data from all of the nodes at a regular rate to some sink node, where it is combined and processed just as in a standard streaming database system. Clearly this approach involves a large number of transmissions and it is costly from the energy viewpoint.
We have studied the problem of efficiently providing approximate answers to user queries at the sink. We develop a suite of novel techniques for predicting values sensed at the nodes, and for grouping together sensor nodes that produce similar data and using representative members of each group to answer queries for the other constituents. Our main focus is on designing an adaptable query framework that is energy-efficient and that provides user-specifiable approximation bounds. As with most work on sensor networks, we focus on minimizing the amount of communication, as message transmission tends to be the dominant energy cost in most sensor systems.
Overview of our approach
Our approach is probabilistic: the sink does not forward the query requests submitted by the end user to each node in the network, but answers queries using a global probabilistic model built at the sink to predict sensor readings. The global model is derived from local models built at each node and transmitted to the sink. Each node monitors the quality of its data model and adapts it when needed by notifying the sink.
Our approach, called SAF (for Similarity-based Adaptive Framework), uses time series forecasting to dramatically reduce the number of transmissions relative to previous approximate approaches. The time series models we use are simple linear models that are cheap to learn and easy to adapt, in the sense that the model can be frequently re-learned as the data distribution changes. Specifically, we use an autoregressive model (AR) with a short prediction window. Our experimental results show that despite its simplicity our adaptable model is capable of predicting data produced by real-world sensors measuring physical phenomena like ambient light, temperature, and humidity that evolve slowly over time.
In order to answer queries satisfying a certain user confidence, nodes have to periodically transmit their readings to the sink. SAF additionally reduces the amount of communication by organizing the network into clusters based on data similarity between nodes. Our novel definition of data similarity uses coefficients of local AR models stored at the sink rather the actual readings, which reduces energy consumption over techniques that directly compare data values. We derive an efficient clustering algorithm that is provably optimal in the number of clusters formed by the network. This is achieved by exploiting the linearity of the AR models, which allows us to transform the complex problem of computing and maintaing clusters into the simple problem of grouping intervals of same width into larger intervals. Finally, SAF provides provably correct error bounds and allows the user to dynamically tune answer quality to answer queries in an energy and resource efficient manner. This includes the possibility of tuning the data stream rate at the sensor according to the user requirements. The high flexibility of SAF can lead to further extensions for additionally reducing transmissions, as discussed in . We validate the accuracy and the stability of our models and clusters both analytically and through experimental results which show their suitability.
In conclusion, all these features and the positive simulation results obtained using traces of real data motivate us to further validate SAF by building a prototype system.
 D. Tulone, S. Madden. PAQ: Time series forecasting for approximate query answering in sensor networks. In Proc. of the 3rd European Workshop on Wireless Sensor Networks, pp. 21--37, Feb 2006.
 D. Tulone, S. Madden. A similarity cluster-based framework for approximate querying in sensor networks. Under submission.
 D.Tulone, S.Madden. SAF: a similarity-based adaptable framework for approximate querying in sensor networks based on time series forecasting. In preparation for journal submission.
 D.Tulone. Mechanisms for energy conservation in wireless sensor networks. Ph.D. thesis, Dept. Computer Science, University of Pisa, Dec 2005.