Abstracts - 2006
Detection of Web Service Substitutability and Composability
Michael D. Ernst, Raimondas Lencevicius & Jeff H. Perkins
Web services are used in software applications as a standard and convenient way of accessing remote applications over the Internet. Web services can be thought of as remote procedure calls. We have developed a dynamic technique to determine web service substitutability and composability. Substitutability is important because the stability of web services is not guaranteed. Finding substitutable web services would allow application developers to increase their application uptime. Composability is important because with the proliferation of web services, interesting applications can be programmed by composing several web services. We have implemented our technique and applied it to 14 freely available web services producing 92 outputs. Our approach correctly detects all composable and substitutable web services from this set.
Web services are software building blocks that can be accessed over the Internet in a standard programmatic way using SOAP messaging. They allow programmers to access data provided by remote providers without extracting it from HTML web pages or using proprietary protocols. Web services are used in various areas: customized stock tracking and trading applications, product search and ordering, address validation, and so on.
Essentially, web services are remote procedure calls. A Web Service Definition Language (WSDL) file describes the parameters and format of each operation provided by a service. The remote procedure calls can be implemented in a variety of different formats including SOAP, HTTP GET/POST, and MIME.
Integrating Web Services
In theory, semantic information in WSDL files should provide enough information to use a service. In practice, it is often insufficient. In most web services we considered, the parameter types are indicated simply as strings, floats and integers. It is impossible to determine from a WSDL file if the input string is a stock ticker or a town name; or what the units of the output quantity are. Some services are even worse, returning a single untyped XML object instead of a typed set of outputs.
Our goal is to enable creation of applications that deliver new functionality by integrating various web services. In the long run, the integration can indicate which services are compatible with one another; substitute one service for another; and transform inputs or outputs in order to make them compatible. We wish to support the integration performed by programmers writing software, by users who compose services, or even by applications as they discover new services. We discuss each of these scenarios in turn.
Information about web service composability or substitutability can be very useful for programmers. Suppose that a new information source or sink becomes available. As noted above, existing documentation is not necessarily adequate for the programmer's purpose. However, given the information source or sink, and an example application that uses it, tools based on our techniques will enable the programmer to explore the semantics of the feed in order to more quickly build applications that properly use it.
End users can discover web services and may wish to compose them. Suppose that a user discovers two services created without knowledge of one another, and they do not adhere to a common standard. Tools should enable the user to create a new application on the fly by connecting them. A composition wizard would permit the user to make sensible connections between them, rejecting nonsensical ones, and converting the representation of those with compatible semantics but incompatible formats. For example, a motion detector's output might not be a sensible input to a shopping application, but could be provided as spatial control for aiming a video camera. As another example, today a difference in data format renders an application completely unable to use a data source. Based on observations of use, a future system could infer transformations that retain the meaning. Trivial examples are centigrade to Fahrenheit and polar to Cartesian coordinates, but future work should address more ambitious ones as well.
Currently, the stability of web services is not guaranteed. Finding substitutable web services allows application developers to increase their application uptime. For example, suppose that a blogger posts a local weather report from a home meteorological station. A weather application could notice this new information source and determine that it is (imperfectly) correlated with other weather data, perhaps after transformations. If the primary weather service becomes unavailable, the application automatically converts the blogger?s information into a form compatible with the application and uses it to approximate the missing information. As another example, the system could determine when multiple services provide interchangeable functionality and choose the one that is cheapest, fastest, or most accurate, based on user preferences. Such substitutability improves system reliability.
Our work does not solve all of the above problems. However, it takes an important step toward their solution by proposing and evaluating techniques for automatically computing web service substitutability and composability.
Consider two web services Stock1 and Stock2, and suppose they have the partial input/output behavior shown in the table below. Those values could have been created by testing, by random invocations, by observing their actual user over time, or in any other way.
Our technique works in two phases. The first phase detects the overlap of Stock Ticker and Stock inputs (ADBE, MSFT, and QCOM are the same inputs) as well as overlap with a margin of error of the Price and Latest Price outputs. The Volume output of the service Stock2 does not overlap with any other column of inputs or outputs. Since there are no inputs of one service that overlap outputs of another service, there is no potential for composability. However, since there are inputs that overlap inputs and outputs that overlap outputs, the services are potentially substitutable.
The second phase aligns the invocations of the two services for the overlapping inputs and outputs, and finds the best match of different overlapping outputs. In our example, ADBE 36.9 invocation aligns with ADBE 37 invocation of the second service, MSFT 27.40 with MSFT 27.39, and so on. If thus aligned invocations match well (as they do in our example), services are considered substitutable.
This section describes our algorithm for computing web service composability and substitutability. Our approach assumes that tracing data is available from executions of a set of web services. Then it finds similarities between the inputs and outputs of different web services. A single web service contains one or more operations. Each operation takes zero or more inputs and produces one or more outputs. Our algorithm treats each operation independently. We use the term "param" to mean an input or output.
Our algorithm is a dynamic one; that is, it infers composability and substitutability by observing actual executions of the web service. Simply put, it searches for outputs that match inputs to determine composability, and it searches for inputs and outputs that match one another to determine substitutability. The notion of "matching" is parameterizable; different matching sub-algorithms can be plugged into our framework.
Suppose that there are two operations op1 and op2, in different web services. The algorithm observes, at run time, many invocations of op1 and many invocations of op2; for each invocation, the observed trace data indicates each param value (input and output value). There are two challenges. First, the algorithm must determine which invocations of op1 are related to which invocations of op2. Second, the algorithm must determine which params of op1 are related to which params of op2.
As an example, suppose op1 takes a movie as input, and it outputs the names and ZIP codes of theaters showing the movie. Suppose that op2 takes a ZIP code and a location as input, and it outputs the approximate distance between them. The first challenge is to determine which invocations of op1 are related to invocations of op2. An invocation of op1 matches an invocation of op2 if op1 outputs the same ZIP code as the one used as input by op2. Many invocations of op1 will not match any invocation of op2, and vice versa. Once the invocations have been aligned, the second challenge is to determine which params are related. In the example, only the ZIP codes are related, and no other params are.
The algorithm works in two phases. The first phase aligns invocations of different services. The second phase builds on those results and performs a second, potentially looser matching operation to match params. Approximately, the first phase indicates composability (though composability could be refined by the second phase), and the second phase computes substitutability.
We tried our technique on 16 different operations from 14 different web services with 92 output parameters. There were 13 possible direct substitutions and 6 possible direct compositions. Our tool found each of these without any false positives.
There are some additional substitutions and compositions that would be possible if the parameters were mapped in some fashion. For example, we examined two currency conversions services. One takes country names as input and the other takes currency names. We do not find these to be directly substitutable, but in theory in might be possible to substitute one for the other if a proper mapping existed of country name to currency name.
Our promising preliminary results suggest that automatic detection of web service composability and substitutability is a promising direction for future research. However, additional work is required to make the technique practical. Here we note some directions that we plan to pursue.
We would like to experiment with additional web services, including commercial ones. We would also like to apply our techniques to Internet information services that are not packaged as web services, for instance by "screen-scraping" the results of web forms. We expect that our technique could also be applied to other software services, and we plan to experiment with components of the Nokia mobile phone architecture.
Our framework is parameterized by matching algorithms. We plan to experiment with more sophisticated matching algorithms. For instance, when provided with sufficient data, an approximate matching algorithm could determine that "$5" and "5" stand for the same quantity, or that the string "5" and the number 5 are the same. Other machine learning technique, such as those of Daikon 9, could indicate properties of parameters (for example, ZIP codes are 5-digit strings that are composed solely of digits).
Once our algorithm has aligned invocations and matched parameters, correlations could be inferred among un-matched parameters. For example, if all other parameters match, it could be inferred that "US" in one web service must mean the same thing as "United States" in another. This is just one way to deal with constants and with multiple inputs; we plan other approaches to those problems as well.
We would like to apply our technique to real data collected from the field; our current approach relies on inputs that we made up. Real data will reveal how much repetition of values occurs and will aid us in tuning our algorithms.
This research is supported by a grant from Nokia.