About
The goal of Service-Detective is to overcome the shortcomings of current Web Service search engines by:
- Employing automated methods to gather Web Services and related resources.
- Leveraging automatic means to create semantic service descriptions from information available on the Web.
- Describing the aggregated information in semantic models and allowing reasoning over it.
- Building meaningful clusters of the collected services and of search results.
The Service-Detective project will deliver a search engine that enables users to find up-to-date information on available Web Services. It will employ automated crawling, information retrieval methods and analysis techniques. Because this approach does not rely on a central editorial team, which would necessarily become a bottleneck once the number of deployed services reaches Web scale, it can scale with the increasing number of services. Consequently, the approaches developed by Service-Detective can also adapt quickly to changes in the set of available services. The search engine will leverage information exposed by current technologies and extend it with semantic annotations to allow for more accurate retrieval. It will use the service information to enable efficient clustering and matchmaking of services, with the goal of providing efficient discovery and clustered search for Web Services.
The concept
Service-Detective approaches the discovery problem in two orthogonal dimensions:
- First, it aims at exploiting the technologies developed for semantic discovery in previous Semantic Web Services projects (e.g., RW², SWWS, DIP).
- Second, Service-Detective develops novel means to obtain the underlying semantic models for discovery by analyzing available Web content.
We illustrate the high-level architecture of Service-Detective in Figure 1. We start from a set of services crawled from the Web, together with related information. From the interface descriptions we can identify invokable endpoints, and around those endpoints we gather further information. For instance, given the underlying transport protocol, we can assess the liveness of endpoints, as well as their geographic location. Aggregating this with other information sources, we can obtain information about the hosting conditions (shared, dedicated, dynamic DNS, etc.) and infer properties about service reliability.
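As an illustration of how invokable endpoints might be identified from an interface description, the following is a minimal sketch that assumes a WSDL 1.1 document with SOAP bindings. The function name `extract_endpoints` and the sample document are hypothetical and are not part of Service-Detective itself.

```python
import xml.etree.ElementTree as ET

# Standard WSDL 1.1 and SOAP binding namespaces.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
SOAP_NS = "http://schemas.xmlsoap.org/wsdl/soap/"


def extract_endpoints(wsdl_text):
    """Return (service name, port name, address) triples found in a WSDL 1.1 document.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(wsdl_text)
    endpoints = []
    for service in root.findall(f"{{{WSDL_NS}}}service"):
        for port in service.findall(f"{{{WSDL_NS}}}port"):
            address = port.find(f"{{{SOAP_NS}}}address")
            # Ports without a SOAP address (e.g. other bindings) are skipped.
            if address is not None and address.get("location"):
                endpoints.append(
                    (service.get("name"), port.get("name"), address.get("location"))
                )
    return endpoints


# Made-up sample document; example.org is a placeholder host.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             name="Weather">
  <service name="WeatherService">
    <port name="WeatherPort" binding="tns:WeatherBinding">
      <soap:address location="http://example.org/weather"/>
    </port>
  </service>
</definitions>"""

for service_name, port_name, location in extract_endpoints(SAMPLE_WSDL):
    print(service_name, port_name, location)
```

Each extracted address could then serve as the starting point for further checks, such as probing whether the endpoint responds or resolving its host to estimate a geographic location.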
The sources of this information are diverse. In particular, they include the provider’s service definition, as well as documents pointing to that definition and vice versa. Other resources related via multiple Web links or through keyword similarity might also be considered. In order to provide a general infrastructure, it is necessary to design methods for the identification and retrieval of such heterogeneous information and to investigate how to exploit it. This first task is performed by our smart Service Crawler, which is able to gather both services and information related to services.
With the data gathered by the crawler, e.g. information on the service terms (Web Service API) and documentation, we are able to create semantic descriptions of service functionalities. To this end, we analyze the data and enrich it based on the results of the analysis.

Figure 1: Service-Detective structure
Semantic descriptions of Web Services can be added in two different ways: using generic Web Service ontologies, e.g. the Web Service Modeling Ontology (WSMO), or using domain-specific ontologies. The innovative approach in Service-Detective is to facilitate service discovery by automatically adding semantic service descriptions for different forms of Web Services, using both information from the service provider and from data sources independent of the service provider. Different forms of Web Service descriptions include, for example, descriptions in WSDL (Web Services Description Language), a W3C-recommended XML-based language that provides a model to describe Web Services. Another form is REST (Representational State Transfer), an architectural style that, unlike WSDL, has no standardized interface description, which makes REST service descriptions harder to detect on the Web than WSDL descriptions. REST input parameters are usually submitted in the query part of a URL, and the response data is often described by a mixture of XML Schema and textual information.
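To make the remark about REST input parameters concrete, here is a small sketch of how the parameter names of a REST-style call could be recovered from an observed URL. The function `describe_rest_call` and the example URL are assumptions made for illustration; the source does not describe how Service-Detective actually analyzes such URLs.

```python
from urllib.parse import urlsplit, parse_qs


def describe_rest_call(url):
    """Split a URL into its endpoint and the names of its query parameters.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # keep_blank_values=True retains parameters that appear without a value.
    params = parse_qs(parts.query, keep_blank_values=True)
    return {
        "endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "parameters": sorted(params),
    }


# example.org is a placeholder host; the parameter names are made up.
example = describe_rest_call("http://example.org/api/search?q=weather&format=xml&page=")
print(example["endpoint"])
print(example["parameters"])
```

Because no standardized interface description exists, parameter names obtained this way are only hints and would still need to be combined with the surrounding documentation.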
The information from these various sources will then be used to build an initial version of one coherent semantic model (conceptual index) that allows for effective retrieval. The semantic model will then be passed to a clustering and matchmaking component that uses term similarity approaches to build clusters out of the results.
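The source does not specify which term similarity measure or clustering method is used. As one possible reading, the sketch below uses Jaccard similarity over the word sets of service descriptions together with a simple greedy clustering pass. All names, the threshold value and the sample descriptions are assumptions chosen for illustration.

```python
import re


def terms(text):
    """Return the set of lowercase alphabetic terms in a description."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two term sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(descriptions, threshold=0.3):
    """Greedy single-pass clustering (illustrative, not the project's method).

    Each service joins the first cluster whose seed description is similar
    enough; otherwise it starts a new cluster and becomes its seed.
    """
    clusters = []  # list of (seed term set, [service names])
    for name, text in descriptions.items():
        service_terms = terms(text)
        for seed_terms, members in clusters:
            if jaccard(seed_terms, service_terms) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((service_terms, [name]))
    return [members for _, members in clusters]


# Made-up service names and descriptions.
services = {
    "WeatherNow": "get current weather forecast for city",
    "ForecastPro": "weather forecast for city and region",
    "StockQuote": "get stock price quote for ticker symbol",
}
print(cluster(services))
```

A greedy pass is order-dependent and is used here only because it is short; a production component would more likely rely on an established clustering algorithm and on weighted terms rather than plain word sets.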
In summary, the objectives of Service-Detective are to:
- Create an architecture for a Web Service search engine that automatically aggregates information from heterogeneous sources to facilitate discovery.
- Create methods to identify relevant service-related information from the Web in order to automatically build a semantic service crawler.
- Generate semantic service descriptions from various input sources, such as Web sites, Wikis, FAQs, blogs, etc.
- Take into account dynamic aspects of service requesters, as well as user profiles.
- Build index structures to allow efficient access to services within the repository and develop a matching engine that can reason about service requests.
- Build meaningful clusters of the services.