SWI-Prolog Semantic Web Server
Jan Wielemaker
Human Computer Studies (HCS),
University of Amsterdam
The Netherlands
E-mail: wielemak@science.uva.nl
Abstract
SWI-Prolog offers an extensive library for loading, saving and querying Semantic Web documents. Internally, the query language is Prolog itself, building on an efficient implementation of the predicate rdf/3, which expresses the content of the triple store.

Emerging dedicated Semantic Web query languages change this view. Supporting such languages provides a comfortable infrastructure for distributed Semantic Web processing systems. This document describes the SWI-Prolog Semantic Web Server. The server provides access to the Prolog triple store using either SeRQL or SPARQL. At the same time it is an extensible platform for realising Semantic Web based applications.

Table of Contents

1 Introduction
2 Query Languages
2.1 SPARQL Support
2.2 SeRQL Support
3 Installation and Administration
3.1 Getting started
3.2 Persistent store
4 Roadmap
4.1 Query processing and entailment
4.2 Query optimisation
4.3 Webserver
5 The Sesame client
6 Sesame interoperability
7 The SPARQL client
8 Security issues
9 Downloading

1 Introduction

The SWI-Prolog Semantic Web Server unifies the SWI-Prolog general Web support and Semantic Web support, providing both a starting point for dedicated applications and a platform for exchange of RDF-based data using a standardised language and protocol. An overview of the SWI-Prolog Web support libraries can be found in SWI-Prolog and the Web (submitted to Theory and Practice of Logic Programming).

2 Query Languages

The current server supports two query languages: SeRQL and SPARQL. For each language we provide four facilities: an interactive service that presents the results as a human-readable HTML table; a service that returns its result as RDF/XML or XML, following the HTTP protocol definition for the query language; the possibility to query the local database from Prolog using the query language; and a Prolog client for querying remote services that support the query language and its HTTP service.

For both query languages, queries are translated to a complex Prolog goal calling rdf/3 to resolve edges in the graph and calls to predicates from rdfql_runtime.pl that realise constraints imposed by the SeRQL WHERE clause and SPARQL FILTER clauses.

2.1 SPARQL Support

SPARQL support is based on the version of the SPARQL specification dated April 6, 2006.

2.2 SeRQL Support

SeRQL support and compatibility is based on development version 20040820, with additional support for the new 1.2 syntax and some of the built-in functions. Both SeRQL and the HTTP API are fully defined in the Sesame documentation.

3 Installation and Administration

3.1 Getting started

The file parms.pl contains a number of settings relevant to the server, notably the port to connect to and where to store user information. Persistent data kept by the server consists of a list of users and their access rights (default users.db) and a file-based backup of the in-memory store (default in the directory SeRQL-store). Please check the content of parms.pl and follow the directions in the comments. On Unix-like systems, edit run.pl to adjust the location of SWI-Prolog on the #! line. Next, start run.pl and launch the server using the command below.

?- serql_server.

Now direct your browser to the server; with the default setup this is http://localhost:3020. If no users are defined, the browser prompts for the administrative password, after which the admin and anonymous users are created. Users with administrative rights can create and modify accounts through the List users ... link on the sidebar.

To restart from scratch, stop the server, delete the users database file and/or the triple backup file and restart the server as described above.

3.2 Persistent store

The parms.pl setting persistent_store(Directory, Options) can be used to specify a file-based persistent backup for the in-memory triple store. The store is a combination of quick-load triple databases and journal files that hold the modifications made to the triple store. Details of the persistent store are documented with the SWI-Prolog Semantic Web package.

4 Roadmap

4.1 Query processing and entailment

The kernel of the system is formed by serql.pl and sparql, which implement the DCG parsers for the respective query languages as well as a compiler that translates the parsed query into a Prolog goal that executes the query on top of the SWI-Prolog SemWeb package. The file rdfql_runtime.pl contains predicates that implement the constraints (SeRQL WHERE or SPARQL FILTER) and other constructs generated by the query compiler.

Entailment reasoning is defined by rdf_entailment.pl. Specific entailments are in separate files:

no_entailment.pl
Defines entailment none. Queries explicitly stored triples only.
rdf_entailment.pl
Defines entailment rdf. Any resource appearing in a predicate position is of type rdf:Property. Any subject is an instance of rdf:Resource.
rdfs_entailment.pl
Defines entailment rdfs. Adds class- and property-hierarchy reasoning to RDF reasoning, as well as reasoning on the basis of property domain and range.
rdfslite_entailment.pl
Defines entailment rdfslite. Only considers the class- and property-hierarchy. Using a backward chaining solver this is much faster, while normally keeping the intended meaning.

The query compiler and execution system can be called directly from Prolog.

serql_compile(+Query, -Compiled, +Options)
Compile Query, which is either an atom or a list of character codes and unify Compiled with an opaque term representing the query and suitable for passing to serql_run/2. Defined Options are:
entailment(Entailment)
Entailment to use. Default is rdfs. See section 4.1.
type(-Type)
Extract the type of query compiled and generally useful information on it. SeRQL defines the types construct and select(VarNames), where VarNames is a list of variables appearing in the projection.
optimise(Bool)
Whether or not to optimise the query. Default is defined by the setting optimise_query.
sparql_compile(+Query, -Compiled, +Options)
Similar to serql_compile/3. Defined types are extended with describe and ask. Additional options are:
base_uri(-URI)
Base URI used to compile the query if not specified as part of the query.
ordered(-Bool)
Unify Bool with true if query contains an ORDER BY clause.
distinct(-Bool)
Unify Bool with true if query contains a DISTINCT modifier.
serql_run(+Compiled, -Answer)
Run a query compiled by serql_compile/3, returning terms row(Arg ...) for select queries and terms rdf(Subject, Predicate, Object) for construct queries. Subsequent results are returned on backtracking.
sparql_run(+Compiled, -Answer)
Similar to serql_run/2. Queries of type describe return rdf-terms like construct. Queries of type ask return either true or false.
serql_query(+Query, -Answer, +Options)
Utility combining serql_compile/3 and serql_run/2. Note that this gives no access to the column names.
sparql_query(+Query, -Answer, +Options)
Similar to serql_query/3.
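
As a minimal sketch of how these predicates fit together, the fragment below compiles a SeRQL query once and enumerates its answers on backtracking. It assumes the server's Prolog files are already loaded (for example through run.pl) and that the triple store holds some data. The helper name print_types/0 and the query text are illustrative only and are not part of the server.

```prolog
% Sketch only: assumes the server sources are loaded, so that
% serql_compile/3 and serql_run/2 are available.
% print_types/0 is a hypothetical helper, not part of the server.
print_types :-
    Query = 'select S, C from {S} <rdf:type> {C}',
    serql_compile(Query, Compiled,
                  [ entailment(rdfs),   % the documented default
                    type(Type)          % expected: select(VarNames)
                  ]),
    format('Query type: ~p~n', [Type]),
    forall(serql_run(Compiled, Row),    % one row(S, C) term per answer
           format('~p~n', [Row])).
```

Compiling separately from running gives access to the type(Type) option, and thus to the column names, which serql_query/3 does not expose.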

4.2 Query optimisation

By default, the query compiler optimises the initial goal obtained from naive translation of the query text. This behaviour is controlled by the setting/1 option optimise_query(Bool) and the compile option optimise(Bool). The optimiser is defined in rdf_optimise.pl and is described in detail in An optimised Semantic Web query language implementation in Prolog. It reorders goals in the generated conjunction and prepares for independent execution of independent parts of the generated goal. With the optimiser enabled (default), the order of path expressions in the query text is completely ignored and constraints are inserted at the earliest possible point.

The SeRQL LIKE operator applies to both resources and literals, while the SWI-Prolog RDF-DB module can only handle LIKE efficiently on literals. The optimiser can be made aware of this using WHERE label(X) LIKE "joe*". Taking the label informs the optimiser that it only needs to consider literals. Likewise, equivalence tests where one of the arguments is used as subject or predicate, or has the isResource(X) constraint, tell the system that it can do a straight identifier comparison rather than the much more expensive general comparison.
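
The label hint can be sketched as follows. This is an illustration under assumptions: it presumes serql_query/3 is available, that the store contains rdfs:label triples, and that the predicate is written as `<rdfs:label>` in the same style as the `<rdf:type>` example in section 5. The helper name find_joe/1 is hypothetical.

```prolog
% Sketch only: find_joe/1 is a hypothetical helper.
% Wrapping L in label/1 tells the optimiser that only literals
% need to be considered when evaluating LIKE.
find_joe(Row) :-
    serql_query('select X from {X} <rdfs:label> {L} where label(L) like "joe*"',
                Row,
                []).        % empty option list: documented defaults apply
```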

Query optimisation is not yet supported for SPARQL.

4.3 Webserver

The webserver is realised by server.pl, which merely loads two components: http_data.pl, providing the Sesame HTTP API using the same paths and parameters, and http_user.pl, providing a browser-friendly frontend. Error messages are still very crude: almost all errors return a 500 server error page with a transcription of the Prolog exception.

The Sesame HTTP API deals with a large number of data formats, only part of which are realised by the current system. This realisation is achieved through rdf_result, providing an extensible API for reading and writing in different formats. rdf_html, rdf_write and xml_result provide some implementations thereof.

5 The Sesame client

The file sesame_client.pl, created by Maarten Menken, provides an API to remote Sesame servers. Below is brief documentation of the available primitives. All predicates take an option list. To simplify applications that communicate with a single server, defaults for the server and repository locations can be specified using set_sesame_default/1.

set_sesame_default(+DefaultOrList)
This predicate can be used to specify defaults for the options available to the other Sesame interface predicates. A default is a term Option(Value). If a list of such options is provided all options are set in the order of appearance in the list. This implies options later in the list may overrule already set options. Defined options are:
host(Host)
Hostname running the Sesame server.
port(Port)
Port the Sesame server listens on.
path(Path)
Path from the root to the Sesame server. For the SWI-Prolog Sesame client, this is normally the empty atom (''). For the Java-based Sesame this is normally '/sesame'.
repository(Repository)
Name of the repository to connect to. See also sesame_current_repository/3.

Below is a typical call to connect to a Sesame server:

...,
set_sesame_default([ host(localhost),
                     port(8080),
                     path('/sesame'),
                     repository('mem-rdfs-db')
                   ]).
sesame_current_repository(-Id, -Properties, +Options)
Enumerate the currently available Sesame repositories. Id is unified to the name of the repository. Properties is a list of Name(Value) terms providing title and access details. Options specifies the host, port and path of the server.
sesame_clear_repository(+Options)
Remove all content from the repository. Options specifies the host, port and path of the server as well as the target repository.
sesame_login(+User, +Password, +Options)
Login to a Sesame server. On success the returned cookie is stored and transmitted with each query on the same server. Options specifies the host, port and path of the server.
sesame_logout(+Options)
Log out from a Sesame server. Options specifies the host, port and path of the server.
sesame_graph_query(+Query, -Triple, +Options)
Execute Query on the given server and return the resulting triples on backtracking. Options specifies the host, port and path of the server as well as the target repository. The example below extracts all type relations from the default server.
...,
sesame_graph_query('construct * from {s} <rdf:type> {o}',
                   rdf(S,P,O),
                   []),
sesame_table_query(+Query, -Row, +Options)
Execute Query on the given server and return the resulting rows on backtracking. Each Row is a term of the format row(Col1, Col2, ... ColN). Options specifies the host, port and path of the server as well as the target repository.
sesame_extract_rdf(-Triple, +Options)
Extract all content from an RDF repository. In addition to the server and repository options the following options are defined:
schema(OnOff)
Extract the schema information.
data(OnOff)
Extract the plain data.
explicit_only(OnOff)
Determine whether or not entailed triples are returned. Default is off, returning both explicit and inferred triples.
sesame_upload_file(+File, +Options)
Add the content of File to the repository. In addition to the server and repository options the following options are defined:
data_format(+Format)
Format of the input file. Default is rdfxml.
base_uri(+BaseURI)
URI for resolving local names. Default is foo:bar.
verify_data(OnOff)
Do/do not verify the input. Default is off.
sesame_assert(+TripleOrList, +Options)
Assert a single rdf(Subject, Predicate, Object) or a list of such terms. In addition to the server and repository options the following options are defined:
base_uri(+BaseURI)
URI for resolving local names. Default is foo:bar.
sesame_retract(+Triple, +Options)
Remove a triple from the repository. Variables in Triple match all values for that field.
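
The predicates above might be combined into a session along the following lines. This is a sketch rather than a tested program: it assumes sesame_client.pl is loaded and that a Sesame server is reachable at the location shown. The helper name list_labels/0, the credentials and the query text are placeholders.

```prolog
% Sketch only: list_labels/0 is a hypothetical helper; the server
% location and the credentials are placeholders.
list_labels :-
    set_sesame_default([ host(localhost),
                         port(8080),
                         path('/sesame'),
                         repository('mem-rdfs-db')
                       ]),
    sesame_login(admin, secret, []),
    % Each solution of sesame_table_query/3 is one row(Col1, Col2) term.
    forall(sesame_table_query('select X, L from {X} <rdfs:label> {L}',
                              row(X, L),
                              []),
           format('~p: ~p~n', [X, L])),
    sesame_logout([]).
```

Setting the defaults once keeps the option lists of the later calls empty, which is the use case set_sesame_default/1 is described for.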

6 Sesame interoperability

The SWI-Prolog SeRQL engine provides a (still incomplete) drop-in replacement for the Sesame HTTP access protocol. Sesame's remote server class can be used to access the SWI-Prolog SeRQL engine through the Sesame Java API. Likewise the Prolog client realised by sesame_client.pl provides a Prolog API that can be used to access both Sesame and the SWI-Prolog SeRQL engine.

7 The SPARQL client

The file sparql_client.pl provides a client to the SPARQL HTTP protocol. The protocol defines how a SPARQL query is asked over HTTP and how the results are presented. It is possible to use the SeRQL protocol on the same server to perform tasks such as modifying the triple store.

The structure of the SPARQL client API is closely based on the SeRQL client.

sparql_query(+Query, -Row, +Options)
Run a SPARQL query on a remote server, retrieving the results one-by-one on backtracking. Options provide the host, port and path of the server. sparql_set_server/1 can be used to define default locations.
sparql_set_server(+Options)
List of options that act as defaults for sparql_query/3. Commonly set to specify the server location. For example:
?- sparql_set_server([ host(localhost),
                       port(3020),
                       path('/sparql/')
                     ]).
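
With the defaults in place, a remote query could be run as sketched below. The fragment assumes sparql_client.pl is loaded and that a SPARQL service is listening at the configured location. The helper name show_triples/0 is hypothetical, and because the exact shape of the returned rows is not specified here they are simply printed as-is.

```prolog
% Sketch only: show_triples/0 is a hypothetical helper.
show_triples :-
    sparql_set_server([ host(localhost),
                        port(3020),
                        path('/sparql/')
                      ]),
    % Results arrive one by one on backtracking; forall/2 drives
    % the enumeration and prints each result.
    forall(sparql_query('SELECT ?s ?p ?o WHERE { ?s ?p ?o }',
                        Row,
                        []),
           format('~p~n', [Row])).
```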

8 Security issues

HTTP communication with the server, including usernames and passwords, is in cleartext and therefore sensitive to sniffing. The overall security of the server is unknown. It is advised to run the server as a user with minimal access rights, only providing write access to the user database file.

9 Downloading

The SWI-Prolog SeRQL engine is available from CVS using the following commands:

% cvs -d :pserver:pl@gollem.science.uva.nl:/usr/local/cvspl login
Password: prolog
% cvs -d :pserver:pl@gollem.science.uva.nl:/usr/local/cvspl co SeRQL

Infrequent announcements and snapshots are provided through the Prolog Wiki.

Acknowledgements

The SeRQL server has been realised as part of the HOPS project and could not have been done without Sesame and feedback from Jeen Broekstra and Maarten Menken from the Free University of Amsterdam (VU). SPARQL support has been added as part of the E-culture sub-project of the Dutch MultiMedia project.
