BeSEPPI: Semantic-Based Benchmarking of Property Path Implementations

In 2013, property paths were introduced with the release of SPARQL 1.1. These property paths allow for describing complex queries in a more concise and comprehensive way. The W3C introduced a formal specification of the semantics of property paths, to which implementations should adhere. Most commonly used RDF stores claim to support property paths. In order to give insight into how well current implementations of property paths work, we have developed BeSEPPI, a benchmark for the semantic-based evaluation of property path implementations. BeSEPPI measures execution times of queries containing property paths and checks whether RDF stores follow the W3C's semantics by testing the correctness and completeness of query result sets. The results of our benchmark show that only one out of 5 benchmarked RDF stores returns complete and correct result sets for all benchmark queries.


Introduction
The SPARQL Protocol and RDF Query Language (SPARQL) is the standard query language for RDF stores. In 2013, property paths were introduced with SPARQL 1.1. Property paths allow for describing paths of arbitrary length in graphs, which cannot be described with a single SPARQL 1.0 query. For instance, all friends of a friend of a friend, and so on, cannot be retrieved from a social network with a single SPARQL 1.0 query. With property paths, the construct foaf:knows* could be used to obtain all desired results with a single query. Furthermore, property paths provide a more concise way to formulate queries. A query that should return all friends of a friend in a social network could use the construct foaf:knows/foaf:knows.
In [1] it is shown that more and more queries containing property paths are run against the Wikipedia Knowledge Graph. For instance, of all queries scheduled in January 2018, over 20% contained property paths. In order to ensure that queries containing property paths return the same result sets independently of the used RDF store, the W3C released the official semantics of property paths in [2].
The comparison of query execution times is only meaningful if the result sets are complete and correct. Therefore, we have developed a benchmark for semantic-based evaluation of property path implementations (BeSEPPI). BeSEPPI not only measures the execution times of property path queries, but also provides unit tests to check whether the result sets are complete and correct based on the W3C's semantics (see section 3). Our benchmark comes with 236 queries and respective reference result sets, testing various semantic aspects of property paths. Thus, BeSEPPI may also be used by RDF store developers as a unit test to analyze their own implementation of property paths.
We used BeSEPPI to evaluate Blazegraph, AllegroGraph, Virtuoso, RDF4J and Apache Jena Fuseki (see section 4). Our evaluation indicates that most RDF stores do not adhere to the W3C's semantics completely. The original contributions of this paper are:
1. BeSEPPI: A benchmark testing the execution times as well as the result set correctness and completeness of property path queries (see section 3).
2. An extensive evaluation of 5 common RDF stores (see section 4).

Preliminaries
In the following, common definitions for RDF, SPARQL and property paths based on [4], [5] and [6] are given in order to define the terminology used in this work.

Graph
The Resource Description Framework (RDF) [7] is a general-purpose language for representing information in the web. It uses triples to represent the information as directed, labeled graphs. A graphical representation of an RDF dataset is shown in figure 1. For better legibility, prefixes can be used to abbreviate IRIs. An example for such a prefix is given by PREFIX ppb: <http://ppbenchmark.com/>. This prefix defines that, for instance, ppb:B1 actually means <http://ppbenchmark.com/B1>.

Definition 2 (RDF graph).
An RDF graph G is a finite set of RDF triples. Furthermore, the subjects and objects occurring in G are vertices and occurring predicates are edges in G. $V_G$ is the set of all vertex labels in G and $E_G$ is the set of all edge labels in G.

Definition 3 (RDF term).
An RDF term t is an element of I ∪ L ∪ B. The set of all RDF terms in a graph G is denoted by $T_G$.

Definition 4 (Path and Cycle).
A path $P = v_0, e_1, v_1, e_2, v_2, \ldots, e_n, v_n$ in an RDF graph G connects two vertices $v_0$ and $v_n$ with each other. In a path $v_i$ are vertices, $e_i$ are edges, $\forall i, j \in [0, n-1]$ : Furthermore, the path length is defined by the number of edges between $v_0$ and $v_n$.
Example 1: An example for a path between the vertices ppb:A1 and ppb:A3 in figure 1 is: P = ppb:A1, ppb:e1, ppb:CenterA, ppb:e3, ppb:A3. The length of this path is 2. Moreover, a self loop is a cycle of length one. In case of figure 1 the path P = ppb:A1, ppb:eSelf, ppb:A1 is a self loop.
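Example 1 can be checked mechanically. The following minimal sketch is not part of BeSEPPI; it assumes that a path is given as an alternating list of vertices and edges and that each consecutive (vertex, edge, vertex) step must be a triple of the graph. The helper names `is_path` and `path_length` are hypothetical, only the three triples named in Example 1 are included, and any further condition of Definition 4 is not checked.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Only the triples mentioned in Example 1 (an excerpt, not the full dataset).
GRAPH: Set[Triple] = {
    ("ppb:A1", "ppb:e1", "ppb:CenterA"),
    ("ppb:CenterA", "ppb:e3", "ppb:A3"),
    ("ppb:A1", "ppb:eSelf", "ppb:A1"),
}


def is_path(graph: Set[Triple], sequence: List[str]) -> bool:
    """Check that v0, e1, v1, ..., en, vn follows triples of the graph.

    Assumption: a path has at least one edge, so the sequence has an odd
    length of at least three.
    """
    if len(sequence) < 3 or len(sequence) % 2 == 0:
        return False
    return all(
        (sequence[i], sequence[i + 1], sequence[i + 2]) in graph
        for i in range(0, len(sequence) - 2, 2)
    )


def path_length(sequence: List[str]) -> int:
    """Number of edges in an alternating vertex/edge sequence."""
    return (len(sequence) - 1) // 2


example = ["ppb:A1", "ppb:e1", "ppb:CenterA", "ppb:e3", "ppb:A3"]
self_loop = ["ppb:A1", "ppb:eSelf", "ppb:A1"]
print(is_path(GRAPH, example), path_length(example))
print(is_path(GRAPH, self_loop), path_length(self_loop))
```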

SPARQL 1.1 Property Paths
The SPARQL Protocol and RDF Query Language (SPARQL) 1.1 is used to query RDF graphs. In the following section the syntax and semantics of the subset of SPARQL 1.1 that is needed for this paper are introduced. The syntax and semantics of property paths are defined following the semantic specification of the W3C in [2].

Syntax

Definition 5 (Property path expression).
A property path expression can be an atomic or a combined property path expression.

Atomic property path expressions:
1) iri ∈ I is a simple property path expression.
2) !(iri_1 | ... | iri_n | ^iri_{n+1} | ... | ^iri_m) with $iri_1, \ldots, iri_m \in I$ is the negated and inverse negated property set.

Combined property path expressions:
3) ^E, with property path expression E, is the inverse property path expression.
4) E_1/E_2, with property path expressions E_1 and E_2, is the sequence property path expression.
5) E_1|E_2, with property path expressions E_1 and E_2, is the alternative property path expression.
6) E?, with property path expression E, is the existential property path expression.
7) E*, with property path expression E, is the transitive reflexive closure property path expression.
8) E+, with property path expression E, is the transitive closure property path expression.
9) (E) groups the expression E.

The second set denotes all tuples of vertex labels in G and the third part denotes the tuple of the element that was included in Γ additionally to $V_G$.
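The expression forms listed above can be modelled as a small tree data type, with grouping by parentheses being implicit in the tree. The following sketch also includes a set-based evaluator that follows our reading of the W3C semantics; it is an illustration under assumptions, not the benchmark's code. All class and function names are hypothetical, the parameter `nodes` plays the role of Γ, and duplicate counts (bag semantics) are ignored because Python sets are used.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Pair = Tuple[str, str]


@dataclass(frozen=True)
class Iri:                      # 1) simple property path expression
    iri: str


@dataclass(frozen=True)
class Negated:                  # 2) negated and inverse negated property set
    forward: FrozenSet[str] = frozenset()   # iri_1 ... iri_n
    inverse: FrozenSet[str] = frozenset()   # ^iri_{n+1} ... ^iri_m


@dataclass(frozen=True)
class Inverse:                  # 3) ^E
    expr: object


@dataclass(frozen=True)
class Sequence:                 # 4) E1/E2
    left: object
    right: object


@dataclass(frozen=True)
class Alternative:              # 5) E1|E2
    left: object
    right: object


@dataclass(frozen=True)
class ZeroOrOne:                # 6) E?
    expr: object


@dataclass(frozen=True)
class ZeroOrMore:               # 7) E*
    expr: object


@dataclass(frozen=True)
class OneOrMore:                # 8) E+
    expr: object


def _closure(step: Set[Pair]) -> Set[Pair]:
    """Transitive closure of a binary relation by fixpoint iteration."""
    result = set(step)
    while True:
        new = {(a, d) for a, b in result for c, d in step if b == c} - result
        if not new:
            return result
        result |= new


def evaluate(expr, graph: Set[Triple], nodes: Set[str]) -> Set[Pair]:
    """Return all (start, end) pairs that are connected by expr."""
    if isinstance(expr, Iri):
        return {(s, o) for s, p, o in graph if p == expr.iri}
    if isinstance(expr, Negated):
        pairs: Set[Pair] = set()
        if expr.forward or not expr.inverse:
            pairs |= {(s, o) for s, p, o in graph if p not in expr.forward}
        if expr.inverse:
            pairs |= {(o, s) for s, p, o in graph if p not in expr.inverse}
        return pairs
    if isinstance(expr, Inverse):
        return {(o, s) for s, o in evaluate(expr.expr, graph, nodes)}
    if isinstance(expr, Sequence):
        left = evaluate(expr.left, graph, nodes)
        right = evaluate(expr.right, graph, nodes)
        return {(a, d) for a, b in left for c, d in right if b == c}
    if isinstance(expr, Alternative):
        return evaluate(expr.left, graph, nodes) | evaluate(expr.right, graph, nodes)
    identity = {(n, n) for n in nodes}  # zero-length paths
    if isinstance(expr, ZeroOrOne):
        return identity | evaluate(expr.expr, graph, nodes)
    if isinstance(expr, OneOrMore):
        return _closure(evaluate(expr.expr, graph, nodes))
    if isinstance(expr, ZeroOrMore):
        return identity | _closure(evaluate(expr.expr, graph, nodes))
    raise TypeError(f"unsupported expression: {expr!r}")
```

For a query with a constant subject or object, the constant would be added to `nodes` before the call and the resulting pairs filtered afterwards.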
In order to obtain information from an RDF store, elements of Γ are bound to variables. These bindings are called variable bindings.

Definition 9 (Variable bindings).
The partial function µ : V → T with variables V and RDF terms T is called a variable binding. The domain dom(µ) of a variable binding µ is the set of variables on which µ is defined.

Definition 10 (Evaluation of property paths).
For constants s ∈ I ∪ B ∪ L, o ∈ I ∪ B ∪ L and variables v, $v_1$, $v_2$ ∈ V the evaluation of property paths is defined as:

Definition 12 (Semantics of ASK query). [11]
The evaluation $[\![Q]\!]_G$ of a query Q of the form ASK WHERE {P} over an RDF graph G is defined as:

Property Path Benchmark BeSEPPI
In order to benchmark the performance of RDF stores with regard to property path queries we introduce our novel benchmark for semantic-based evaluation of property path implementations (BeSEPPI). BeSEPPI measures the execution times of 236 property path queries. These queries are executed on a small dataset that was created for evaluating various aspects of property paths. Furthermore, BeSEPPI comes with reference result sets for each query, which allow for evaluating correctness and completeness of result sets.

Dataset
The benchmark dataset is a graph consisting of 28 triples. It allows for testing various semantic aspects of each property path expression. The dataset is kept small so that humans can easily create reference result sets for property path queries and evaluate the correctness and completeness of query result sets. The graph is depicted in figure 1.

Queries
The query set of BeSEPPI consists of 236 queries, of which 73 are ASK queries and 163 are SELECT queries. In our benchmark we want to evaluate the performance of each property path expression individually with regard to various semantic aspects. Therefore, we test each expression separately and omit combinations of property path expressions. The queries are organized according to the following 3 dimensions.

Dimension 1: The property path expression.
The first dimension is the property path expression that is tested.

Dimension 2: The number and positions of variables and terms in the query.
According to definition 10 there are 4 possibilities for the number and positions of variables and terms in a query containing a single property path: sEo, sEv, vEo and $v_1 E v_2$, where s and o are terms, v, $v_1$ and $v_2$ are variables and E is a property path expression. Queries of the form sEo test for the existence of the path in the dataset and do not return any variable bindings. During our evaluation we have observed that some stores do not support SELECT * queries that do not contain any variables, even though these queries are syntactically correct. Since such a query simply returns an empty set if the path in the query does not exist and an empty variable binding otherwise, we have transformed such queries to ASK queries, which return false or true. We expect ASK queries to be supported in all cases, whereas SELECT * queries without variables have been shown to be unsupported in some cases.

Dimension 3: Semantic aspects.
Semantic aspects are certain characteristics a query fulfills, for instance that a query returns an empty result set or that the traversed path in the graph has a length of at least 4. Each property path expression has different semantics and therefore not all semantic aspects can be considered for all property path expressions. Due to the high number of queries in BeSEPPI, describing all queries and the respective semantic aspects is beyond the scope of this paper. In order to still give insight into the query structure we give an overview of queries for each expression and variable-constant combination in table 1. Additionally, we explain two benchmark queries for the existential property path expression in the following section.
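The four variable/term combinations of Dimension 2 and the rewriting of variable-free queries into ASK queries can be sketched as a small query builder. This is an illustrative assumption about how such queries might be generated, not BeSEPPI's actual generator; the function name `build_query` and the variable names `?v`, `?v1` and `?v2` are hypothetical, and the queries shown are not taken from the benchmark's query set.

```python
from typing import Optional

PREFIX = "PREFIX ppb: <http://ppbenchmark.com/>"


def build_query(path: str, subject: Optional[str] = None, obj: Optional[str] = None) -> str:
    """Build one of the forms sEo, sEv, vEo or v1Ev2.

    A value of None for subject or obj means that a variable is used there.
    """
    if subject is not None and obj is not None:
        # sEo: no variables, so the query is phrased as ASK instead of SELECT *.
        return f"{PREFIX}\nASK WHERE {{ {subject} {path} {obj} }}"
    if subject is None and obj is None:
        s, o = "?v1", "?v2"          # v1Ev2
    elif subject is None:
        s, o = "?v", obj             # vEo
    else:
        s, o = subject, "?v"         # sEv
    return f"{PREFIX}\nSELECT * WHERE {{ {s} {path} {o} }}"


# The four forms for a hypothetical existential expression ppb:e1?
for args in [("ppb:A1", "ppb:CenterA"), ("ppb:A1", None), (None, "ppb:CenterA"), (None, None)]:
    print(build_query("ppb:e1?", *args), end="\n\n")
```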

Existential Property Path Expression Queries
In order to evaluate the performance of RDF stores for property path queries with the existential property path expression, we use 24 queries. Two exemplary queries and their semantic aspects are presented below. For all queries reference result sets were created to evaluate the correctness and completeness of the result sets returned by the RDF stores.

Metrics
In order to allow for comparing benchmark results of different stores with each other and to make the results comprehensible, meaningful metrics need to be used. For BeSEPPI we focus on the following metrics.
1. Query correctness
The percentage of correct query results that are returned for each query.
For SELECT queries: If $R_q$ is the set of all correct results for a query q and $R^S_q$ is the set of returned results of query q executed on RDF store S, then the query correctness is defined as:
For ASK queries: If $r_q$ is the correct boolean result for the ASK query and $r^S_q$ is the returned boolean result for an RDF store S, then the correctness is defined as:

2. Query completeness
The percentage of all possible query results of the query.
For SELECT queries: If $R_q$ is the set of all correct results for a query q and $R^S_q$ is the set of returned results of query q executed on RDF store S, then the query completeness is defined as:
For ASK queries: If $r_q$ is the correct boolean result for the ASK query and $r^S_q$ is the returned boolean result for an RDF store S, then the completeness is defined as:

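3. Average execution time per query
The arithmetic mean avexec(q) of the execution time t(q) of each query q is defined as:
$$avexec(q) = \frac{1}{n} \sum_{i=1}^{n} t_i(q)$$
where n is the number of times a query was executed and $t_i(q)$ is the execution time of the i-th execution of q.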
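One plausible reading of these three metrics is sketched below: correctness as the share of returned results that are correct, and completeness as the share of correct results that are returned. The function names and the handling of empty sets are our assumptions and may differ from the benchmark's definitions. Results are assumed to be hashable, for example tuples of bound terms.

```python
from typing import Hashable, Sequence, Set


def select_correctness(reference: Set[Hashable], returned: Set[Hashable]) -> float:
    """Share of the returned results that are also in the reference set."""
    if not returned:
        return 1.0  # assumption: an empty answer contains no wrong result
    return len(reference & returned) / len(returned)


def select_completeness(reference: Set[Hashable], returned: Set[Hashable]) -> float:
    """Share of the reference results that were actually returned."""
    if not reference:
        return 1.0  # assumption: nothing can be missing from an empty reference
    return len(reference & returned) / len(reference)


def ask_score(reference: bool, returned: bool) -> float:
    """Assumed to serve as both correctness and completeness of an ASK query."""
    return 1.0 if reference == returned else 0.0


def avexec(times: Sequence[float]) -> float:
    """Arithmetic mean of the measured execution times of one query."""
    return sum(times) / len(times)


reference = {("ppb:A1",), ("ppb:A3",)}
returned = {("ppb:A1",), ("ppb:CenterA",)}
print(select_correctness(reference, returned), select_completeness(reference, returned))
```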

Execution Strategy
In the first step of the benchmark execution, the complete dataset is loaded into the RDF store that should be benchmarked. Afterwards, all 236 queries are executed once without measuring any metrics in order to warm up the store. After that, the 236 queries are executed 10 times and the metrics are measured. The queries are executed one after another and not in parallel. To reduce the influence of outliers, the highest and lowest execution times are discarded. Finally, the average execution time, the correctness and the completeness are stored in a human-readable CSV file.
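The execution strategy can be sketched as a small harness. This is a simplified illustration under assumptions, not the BeSEPPI implementation: `run_query` is a hypothetical callable that sends a query string to the store, the highest and lowest time are assumed to be dropped per query, only the timing column is written to the CSV file, and the loop order (all queries per round) is a guess.

```python
import csv
import time
from typing import Callable, Dict, List


def trimmed_mean(times: List[float]) -> float:
    """Mean after dropping the single highest and lowest measurement."""
    if len(times) < 3:
        raise ValueError("need at least three measurements")
    kept = sorted(times)[1:-1]
    return sum(kept) / len(kept)


def run_benchmark(
    queries: Dict[str, str],
    run_query: Callable[[str], object],
    runs: int = 10,
) -> List[dict]:
    # Warm-up pass: every query is executed once and nothing is measured.
    for text in queries.values():
        run_query(text)

    # Measured passes: queries run one after another, never in parallel.
    times: Dict[str, List[float]] = {name: [] for name in queries}
    for _ in range(runs):
        for name, text in queries.items():
            start = time.perf_counter()
            run_query(text)
            times[name].append((time.perf_counter() - start) * 1000.0)

    return [{"query": name, "avexec_ms": trimmed_mean(t)} for name, t in times.items()]


def write_csv(rows: List[dict], path: str) -> None:
    """Store the per-query averages in a human-readable CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["query", "avexec_ms"])
        writer.writeheader()
        writer.writerows(rows)
```

A caller would pass a function that issues the query against the store's SPARQL endpoint and fully consumes the result, so that result transfer is included in the measured time.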

Benchmark Results
In order to evaluate the performance of RDF stores in regard to queries containing property paths we use the property path benchmark BeSEPPI described in section 3.

Experimental Setting
We benchmarked the property path implementations of 5 common RDF stores, namely Blazegraph 2.

Completeness and Correctness
In this section the correctness corr(q) and completeness comp(q) of result sets are presented for each store, and possible causes for the differences between the returned results and the reference result sets are discussed. Table 2 gives an overview of the numbers of queries which returned only incomplete, only incorrect, or incomplete and incorrect result sets, or which caused an error during execution. Furthermore, the rightmost column shows the total number of queries for the respective property path expression.
One observation is that all stores return complete, correct and error-free result sets for the inverse, sequence and alternative expressions. A reason for this might be the clarity of their semantics, since their definition is the same in different sources, such as the official SPARQL 1.1 definition, [2] and [6]. Furthermore, the transformation of these property path expressions into SPARQL 1.0 queries is straightforward, such that already implemented SPARQL 1.0 query operators could be reused.
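The transformation mentioned above can be illustrated with a short rewriting sketch: an inverse expression swaps subject and object, a sequence introduces a fresh intermediate variable, and an alternative becomes a UNION. The tuple-based expression encoding and the function name `to_sparql10` are assumptions made for this example. In a real rewriting the fresh variables would additionally have to be projected away.

```python
from itertools import count


def to_sparql10(s: str, expr: tuple, o: str, fresh=None) -> str:
    """Rewrite `s expr o` into a SPARQL 1.0 graph pattern.

    expr is ("iri", name), ("inv", e), ("seq", e1, e2) or ("alt", e1, e2).
    """
    fresh = fresh if fresh is not None else count(1)
    kind = expr[0]
    if kind == "iri":
        return f"{s} {expr[1]} {o} ."
    if kind == "inv":
        # ^E: evaluate E with subject and object swapped.
        return to_sparql10(o, expr[1], s, fresh)
    if kind == "seq":
        # E1/E2: join both parts on a fresh intermediate variable.
        mid = f"?_x{next(fresh)}"
        left = to_sparql10(s, expr[1], mid, fresh)
        right = to_sparql10(mid, expr[2], o, fresh)
        return f"{left} {right}"
    if kind == "alt":
        # E1|E2: union of both patterns.
        left = to_sparql10(s, expr[1], o, fresh)
        right = to_sparql10(s, expr[2], o, fresh)
        return f"{{ {left} }} UNION {{ {right} }}"
    raise ValueError(f"no SPARQL 1.0 equivalent for {kind!r}")


friend_of_friend = ("seq", ("iri", "foaf:knows"), ("iri", "foaf:knows"))
print(to_sparql10("?s", friend_of_friend, "?o"))
```

The closure expressions E* and E+ are deliberately rejected, since paths of arbitrary length have no finite SPARQL 1.0 pattern.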
In the rest of this section the cases in which queries did not return correct, complete and error-free result sets for each store are discussed.
Blazegraph: Blazegraph returns complete and correct result sets for most queries, but there are 13 result sets which are not complete or correct. The first three queries that did not return complete and correct result sets are ASK queries. These three queries incorrectly returned true and have in common that the combination of subject and predicate can be found in the graph whereas the object does not occur. For queries in which also the object occurred in the graph, true was correctly returned.
All other queries with incomplete result sets return correct results. The tested property paths of these queries are of the form variable, property path expression, variable. Furthermore, they all involve either the existential or the transitive reflexive closure expression. After examining the missing results, we noticed that only results produced by the term {(a, a)
AllegroGraph also returns empty result sets if the negated property set contains at least one non-existing property. Furthermore, if the property path contains two variables, AllegroGraph interprets the inverse negated property set as negated property set, leading to result sets in which the assignments of the two variables to terms are swapped. The same applies to the inverse part of the negated and inverse negated property set.
Virtuoso: Virtuoso does not execute queries with two variables combined with the existential, the transitive closure or the transitive reflexive closure property path expression. For such queries the store returns an error, which says "Transitive start not given". This behavior seems to be a deliberate choice in the design of the RDF store and might have to do with the fact that Virtuoso is built on relational databases. In relational databases very large joins might be necessary in order to answer these queries and therefore this feature may not have been implemented.
For queries with one or no variable and the existential or transitive reflexive closure property path expression Virtuoso returns complete and correct result sets. For the transitive closure property path expression there are 10 queries which do not return complete result sets for Virtuoso. These queries all have a cycle like $v_1, e, v_2, \ldots, v_n, e, v_1$ as tested semantic aspect and the missing result is always the start vertex $v_1$ of the cycle. This indicates that the transitive closure property path expression might be implemented in such a way that $[\![P^*]\!]^{\Gamma}_G$ is evaluated and the reflexive start is removed from the result set. In such a case, queries with cycles would return correct results except for the start and the end vertex respectively.
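The suspected cause can be reproduced on a three-vertex cycle. The sketch below is only a model of the hypothesis stated above and makes no claim about Virtuoso's actual implementation; the function names are hypothetical. It compares a direct evaluation of E+ with the shortcut "evaluate E*, then remove the start vertex".

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def reachable_one_or_more(edges: Set[Edge], start: str) -> Set[str]:
    """Vertices reachable from start via at least one edge (E+)."""
    seen: Set[str] = set()
    frontier = [b for a, b in edges if a == start]
    while frontier:
        vertex = frontier.pop()
        if vertex not in seen:
            seen.add(vertex)
            frontier.extend(b for a, b in edges if a == vertex)
    return seen


def suspected_shortcut(edges: Set[Edge], start: str) -> Set[str]:
    """Evaluate E* from start and then drop the reflexive start vertex."""
    zero_or_more = {start} | reachable_one_or_more(edges, start)
    return zero_or_more - {start}


CYCLE = {("v1", "v2"), ("v2", "v3"), ("v3", "v1")}
CHAIN = {("v1", "v2"), ("v2", "v3")}

print(sorted(reachable_one_or_more(CYCLE, "v1")))  # includes v1, reached via the cycle
print(sorted(suspected_shortcut(CYCLE, "v1")))     # v1 is missing
```

On the acyclic chain both functions agree, which would explain why only queries with a cycle as tested semantic aspect are affected.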
For the negated property set Virtuoso returns correct and complete result sets for each query. For 11 queries with the inverse negated property set Virtuoso returns errors. The queries that return errors are distributed over all query forms and semantic aspects such that we could not identify the underlying cause. Finally, the combination of the negated and inverse negated property set returns complete and correct result sets.
RDF4J: RDF4J returns false for three ASK queries with the existential property path expression where the correct result is true. In each of the three queries, the subject and object are equal. This indicates that RDF4J ignores the results included in {(a, a) | a ∈ Γ} in ASK queries with the existential property path expression. Furthermore, RDF4J incorrectly returns false as result for ASK queries with a transitive closure property path expression, if they have a cycle as tested aspect.
For queries with the negated property set, the inverse negated property set and the combination of both sets, RDF4J does not execute queries of the form subject, property path expression, object or variable, property path expression, object. Every time such a query is executed, the store returns an error.
Apache Jena Fuseki: Fuseki was the only store that executed every benchmark query without errors and returned complete and correct result sets. It seems that the store follows the W3C's definition of property path semantics.
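Two of the deviations discussed in this section can be written as small unit checks. The following sketch states the expected behavior as we read the W3C semantics; the function names are hypothetical, the IRI ppb:notInDataset is made up for illustration, and only the triples from Example 1 are used as data.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Excerpt of the dataset: only the triples named in Example 1.
GRAPH: Set[Triple] = {
    ("ppb:A1", "ppb:e1", "ppb:CenterA"),
    ("ppb:CenterA", "ppb:e3", "ppb:A3"),
    ("ppb:A1", "ppb:eSelf", "ppb:A1"),
}


def ask_zero_or_one(graph: Set[Triple], s: str, p: str, o: str) -> bool:
    """ASK { s p? o }: true for the zero-length match s == o or one matching triple."""
    return s == o or (s, p, o) in graph


def negated_property_set(graph: Set[Triple], excluded: Set[str]) -> Set[Tuple[str, str]]:
    """Pairs connected by any predicate that is not in the excluded set."""
    return {(s, o) for s, p, o in graph if p not in excluded}


# Equal subject and object: the zero-length path makes the answer true.
print(ask_zero_or_one(GRAPH, "ppb:A3", "ppb:e1", "ppb:A3"))

# An IRI that does not occur in the data must not change the result.
with_unknown = negated_property_set(GRAPH, {"ppb:e1", "ppb:notInDataset"})
without_unknown = negated_property_set(GRAPH, {"ppb:e1"})
print(with_unknown == without_unknown)
```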

Execution Times
In order to compare the current performance of property path implementations we present and discuss the execution times of the benchmark queries in this section. For this discussion we only take the execution times of queries into consideration for which complete and correct result sets were returned. Out of all 236 queries, all stores returned complete and correct result sets for only 134 queries. In figure 2 the sums of avexec(q) of these 134 queries are presented for the individual RDF stores.
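Restricting the comparison to queries that every store answered completely and correctly amounts to a set intersection followed by a sum per store. A minimal sketch with made-up toy numbers follows; the data layout and the function names are assumptions, and the values are not benchmark results.

```python
from typing import Dict, Set, Tuple

Scores = Dict[str, Dict[str, Tuple[float, float]]]  # store -> query -> (corr, comp)


def common_correct_queries(scores: Scores) -> Set[str]:
    """Queries with correctness and completeness of 1.0 on every store.

    Queries that raised an error are assumed to be absent from a store's
    entry and therefore drop out of the intersection.
    """
    per_store = [
        {q for q, (corr, comp) in results.items() if corr == 1.0 and comp == 1.0}
        for results in scores.values()
    ]
    return set.intersection(*per_store) if per_store else set()


def total_time(avexec_ms: Dict[str, float], queries: Set[str]) -> float:
    """Sum of avexec(q) over the selected queries for one store."""
    return sum(avexec_ms[q] for q in queries)


# Toy data for two hypothetical stores.
toy_scores: Scores = {
    "storeA": {"q1": (1.0, 1.0), "q2": (1.0, 0.5), "q3": (1.0, 1.0)},
    "storeB": {"q1": (1.0, 1.0), "q2": (1.0, 1.0)},  # q3 raised an error
}
toy_times = {"storeA": {"q1": 20.0, "q2": 25.0, "q3": 30.0}}

common = common_correct_queries(toy_scores)
print(common, total_time(toy_times["storeA"], common))
```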

When considering all benchmark queries for which complete and correct result sets were returned by all stores, RDF4J and Virtuoso execute queries the fastest. Fuseki was approximately 400ms slower and AllegroGraph took more than 5 times as long as RDF4J or Virtuoso. Blazegraph required 4443ms to execute the queries on average. This means Blazegraph needs more than 7 times as long as Virtuoso or RDF4J.
In table 3 the sums of avexec(q) are shown for queries containing the different property path expressions. For these execution times, only the 134 queries were considered for which complete and correct result sets were returned by all stores. With regard to the property path expression dimension, table 3 shows that Blazegraph executes queries containing the inverse, sequence or alternative property path expression or any form of negated property set faster than AllegroGraph, whereas AllegroGraph is faster for queries with the existential, transitive closure or transitive reflexive closure property path expression. Regardless of this, both stores are slower than the other 3 stores for each property path expression. Virtuoso and RDF4J are the fastest stores for each property path expression and the differences between the execution times of these two stores are very small. Virtuoso is faster than RDF4J in 4 cases and RDF4J is faster in 5 cases. Therefore, both stores have shown a very similar performance in our benchmark. On average, Fuseki needs 1.6 times as long as Virtuoso to execute queries. RDF4J needs at most 54% of the time Blazegraph or AllegroGraph need.
In order to compare how the second dimension of queries, which is the number and position of variables in a query, affects the execution times, we compared the average execution times of all RDF stores for each combination of property path expression and variable position in a query. For instance, in figure 3 the average execution times of queries containing the transitive reflexive closure property path expression are shown for the different numbers and positions of variables in a query. Since Virtuoso does not execute queries containing two variables and the transitive reflexive closure property path expression, the respective bar is missing in figure 3.
The plots in figure 3 show that Blazegraph processes queries with one variable faster than queries with no or two variables. Contrary to this, we expected that queries with two variables would need the most time for execution, since these queries may consider every vertex in the graph as potential start vertex. AllegroGraph executes queries with no variables slightly faster than queries with one variable and queries with one variable faster than queries with two variables. There is nearly no difference in execution times for Virtuoso and RDF4J. Both stores execute the queries faster than the other three stores and the execution time does not increase with the number of variables. Note that Virtuoso does not execute queries with two variables and the transitive reflexive closure property path expression and therefore nothing can be said about the execution times for the respective queries. Finally, Fuseki executes queries with one variable slightly faster than queries with no or two variables. Overall, queries with two variables had the highest execution times only for AllegroGraph and Fuseki. Since the increase of execution time might depend on the size of the dataset, we will test larger datasets in the future.
When investigating the influence of the semantic aspects dimension on the query execution time, most results were not surprising. For instance, transitive closure property path expressions that match longer paths take longer to execute. In [12] it is stated that query logs of some public SPARQL endpoints contain a lot of queries which return empty result sets. Therefore, it might be beneficial for RDF stores if they can identify these queries quickly to reduce the workload of the store. When focusing on the benchmark queries that return an empty result set, we found that their execution times were similar to the execution times of comparable queries that returned non-empty result sets. Virtuoso and RDF4J have the fastest query execution times (24ms and 23ms on average) for queries with empty result sets and average execution times of 23ms and 24ms respectively for comparable queries with non-empty result sets. Blazegraph and AllegroGraph have the longest execution times (146ms and 114ms on average) for queries returning empty result sets and take 143ms and 121ms on average for comparable queries with non-empty result sets. Fuseki required 44ms and 45ms on average for queries with empty and non-empty result sets respectively. This outcome indicates that there is potential to improve the performance of these stores for queries with empty result sets.

Summary of Results
In summary, all stores returned complete and correct result sets for queries with an inverse, sequence or alternative property path expression. For queries containing an existential property path expression, Blazegraph, AllegroGraph and RDF4J all handle the term {(a, a) | a ∈ Γ} differently and do not follow the W3C's semantics. In case of transitive closure property path expressions, Virtuoso and RDF4J ignore results from cyclic paths. AllegroGraph returns empty result sets for queries with the negated property set if one of the IRIs in the negated property set does not exist in the dataset. Furthermore, AllegroGraph seems to interpret the inverse negated property set as negated property set in queries with two variables. Virtuoso throws errors for several queries with the inverse negated property set and RDF4J does not execute queries with the negated property set, the inverse negated property set or the combination of both sets where the object of the property path is an RDF term.
Furthermore, Virtuoso does not allow queries with variable path length without a fixed starting or ending point. This means that whenever a query with 2 variables containing an existential, a transitive closure or a transitive reflexive closure property path expression is executed, Virtuoso returns an error. Of the 5 tested RDF stores, only Apache Jena Fuseki returned complete and correct result sets for all queries.
When comparing the execution times of queries for which all stores returned complete and correct result sets, RDF4J and Virtuoso are the two fastest stores in our evaluation. On average, Fuseki needs 60% more time to execute queries than RDF4J and Virtuoso. Blazegraph and AllegroGraph need 260% more time than Fuseki on average. Furthermore, we expected that queries with two variables would have the highest execution time for each store, but this was only the case for AllegroGraph and Fuseki.

Related Work
Common benchmarks for RDF stores like the Lehigh University Benchmark [13], the DBPedia SPARQL Benchmark [14] or the Berlin SPARQL Benchmark [15] are designed to test the performance of RDF stores in different application scenarios. Since they were created before the release of SPARQL 1.1 they do not test property paths. Furthermore, the Lehigh University Benchmark is the only benchmark that also evaluates completeness and correctness of result sets.
In [16] Gubichev et al. propose an indexing approach called FERRARI to efficiently evaluate property paths. In order to show the efficiency of their approach they also propose a small benchmark with 6 queries over the YAGO2 [17] RDF dataset. Although this approach tests queries with property paths, it only measures execution times and does not evaluate correctness or completeness of result sets. The benchmark proposed in [18] is not a benchmark for property paths in particular but is primarily designed for streaming RDF/SPARQL engines; nevertheless, it tests property paths among various other SPARQL 1.1 features. Even though the completeness and correctness of result sets are not calculated, the results of the benchmark show that most of the benchmarked stream processing systems do not support property path queries.
In [19] a system is presented that generates small datasets based on given queries, their query features (e.g., the OPTIONAL or FILTER construct) and a dataset. In addition to the small datasets, the system returns the reference result sets for the given queries. They allow for checking the completeness and correctness of the query result sets returned from the evaluated RDF stores. This system is not a benchmark in itself but could be used to create datasets for benchmarks which evaluate the completeness and correctness of result sets.
In [3] a benchmark for the evaluation of property path support is introduced. This benchmark can use an arbitrary RDF dataset as benchmark dataset and creates queries based on 8 query templates. Due to the small number of queries and the fact that these queries do not test all property path expressions, this benchmark cannot be used for the semantic evaluation of property path implementations. Nevertheless, the results of this benchmark indicate that many RDF stores return incomplete or incorrect result sets for queries containing property paths.
To the best of our knowledge, no RDF benchmark exists that tests whether the result sets of property path queries are complete and correct based on the W3C's semantics.

Conclusion
Property paths were introduced with SPARQL 1.1 in 2013. They allow for describing complex queries in a more concise and comprehensive way. In order to evaluate the performance of RDF stores when executing property path queries, we have developed a benchmark for semantic-based evaluation of property path implementations called BeSEPPI. BeSEPPI comes with a small RDF dataset especially created for the evaluation of property path queries and 236 queries, which test each property path expression. Our benchmark measures execution times of queries and allows for comparing different RDF stores with each other. Furthermore, BeSEPPI tests whether the result sets of the benchmark queries adhere to the W3C's semantics of property paths and calculates completeness and correctness of result sets.
With BeSEPPI we have benchmarked 5 common stores, namely Blazegraph, AllegroGraph, Virtuoso, RDF4J and Apache Jena Fuseki. The results of BeSEPPI show that only Apache Jena Fuseki could return complete and correct result sets for all 236 queries. Each of the other 4 stores returned incomplete or incorrect result sets for some queries and Virtuoso and RDF4J do not support all types of queries. Furthermore, we have compared the execution times of queries for which all stores returned complete and correct result sets. This comparison shows that Virtuoso and RDF4J have the lowest execution times. Fuseki is slightly slower. Blazegraph and AllegroGraph needed the most time for the execution of queries containing property path expressions.
With our evaluation we could observe that many RDF stores do not completely adhere to the W3C's semantics of property paths. Therefore, BeSEPPI seems to be useful for RDF store developers to evaluate or improve their property path implementations. The results in [3] have shown that the correctness and completeness of property path query result sets may depend on the size of the loaded dataset. Therefore, we will perform a semantic evaluation of property path implementations on a large dataset in the future. Furthermore, we will evaluate the correct associativity (i.e. [