Semantic Cache Reasoners

The current book is a blend of a number of great ideas, theories, mathematical models, and practical systems in the domain of Semantics. The book has been divided into two volumes. The current one is the first volume, which highlights the advances in theories and mathematical models in the domain of Semantics. This volume has been divided into four sections and ten chapters. The sections include: 1) Background, 2) Queries, Predicates


Introduction
Semantic caching (Ren, Q et al., 2003), (Dar et al., 1996) is a technique for storing data together with their corresponding semantic descriptions. The concept of a semantic cache is quite simple, but the reasoning required to evaluate a query over a semantic cache can be very complex (Godfrey P. and Gryz J., 1997). Reasoning over stored semantics is the process of determining how the query and cache formulas are related semantically. This reasoning is termed semantic cache query processing (Ren, Q et al., 2003), (Dar et al., 1996). In this chapter we demonstrate several semantic cache query processing techniques for relational queries, web queries, XML queries, answering queries from materialized views, and logic-based subsumption analysis queries.
There are two main types of semantic query processing approaches: structured-semantics and unstructured-semantics. In structured-semantics approaches, the original problem or query is represented in a structure that can carry semantics along with its structure. Examples of structured-semantics representations are ontologies, the Resource Description Framework (RDF), and the Extensible Markup Language (XML). Unstructured-semantics approaches perform reasoning to extract semantics from structures that do not possess semantics in their representations. Semantic cache query processing is an example of unstructured-semantics reasoning, since Structured Query Language (SQL) is structured but does not contain the semantics of the query itself or of the data to be returned for it.
In this chapter we demonstrate several semantic cache reasoners for unstructured-semantics. All of these semantic cache reasoning techniques translate the query language into an intermediate structured-semantics representation for semantic extraction.

Semantic cache query processing
In general, research on semantic cache systems can be grouped into two parts: i) cache management and ii) query processing. Strategies for data management, replacement, coalescing, and indexing of the results of previously evaluated queries mainly belong to semantic cache management. Query processing involves techniques that compute the available and unavailable data in a semantic cache by performing some form of reasoning over the semantic descriptions. A query processing technique also handles local query execution, retrieval of unavailable data from a remote server, and formulation of the end results. In this chapter we focus on semantic cache query processing.
At a finer granularity, a semantic cache is a collection of semantic regions or semantic segments. The semantics associated with a cached query, i.e. its query specification (Lee et al., 1999), stored in the semantic cache along with the resultant data, are called a semantic region (Dar et al., 1996) or semantic segment (Ren et al., 2003). A formal definition of a semantic segment can be found in (Ren et al., 2003). A query processing technique can reason over semantic segments to determine whether the cached data contributes fully, partially, or not at all to an incoming query. If the incoming query is fully answerable from the semantic cache, then no communication with the server is required.
Similarly, a partial answer to a query will reduce the amount of data retrieved from the server.
In the case of a partial answer, the user query is trimmed into two disjoint sub-queries (Keller A.M. and Basu J., 1996): the query executed locally, called the Probe Query (ProbQ), and the query sent to the server, named the Remainder Query (RemQ) (Dar et al., 1996). The previous literature (Ren, Q et al., 2003), (Dar et al., 1996), (Lee et al., 1999), (Godfrey P. and Gryz J., 1997), (Keller A.M. and Basu J., 1996) shows that this trimming is performed on the basis of the relationship between the content of a semantic segment and the result required by an incoming query. The possible cases of the relationship between the incoming query and the semantics stored in the cache (as reported in the literature) are shown in Figure 1. White boxes represent previously stored query results and gray boxes show incoming user queries. In Figure 1, rows (tuples) are represented horizontally and columns (attributes) vertically, and only select-project queries are considered. In each case a user query overlaps a semantic cache region in a certain way. Case 2 depicts a horizontal partition, in which some of the tuples required by the incoming query are satisfied by the cache semantics. In case 3, a projection of the query is available in the cache and some attributes are missing; this situation is called a vertical partition. The figure shows that a partial answer is possible in cases 2, 3 and 4, whereas a user query can be fully answered from the cache in case 1. The figure is also used to evaluate a semantic cache query processing scheme, i.e. whether or not the scheme incorporates all the cases. We argue that, because of this misleading diagram, the missing implicit semantics have not been considered in previous query processing techniques. Therefore, in the coming sections we adopt a new way of comparing the semantics of a user query with the cache semantics.
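The trimming of a user query into a probe and a remainder query can be illustrated with a minimal Python sketch. It assumes a single numeric attribute and half-open ranges [low, high); the function name `trim` and the `Range` alias are hypothetical and are not taken from the cited literature.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]  # half-open interval [low, high) on one attribute


def trim(query: Range, cached: Range) -> Tuple[Optional[Range], List[Range]]:
    """Split a range query into a probe part and remainder parts.

    probe     = query AND cached      (answered locally)
    remainder = query AND NOT cached  (sent to the server)
    """
    q_lo, q_hi = query
    c_lo, c_hi = cached
    lo, hi = max(q_lo, c_lo), min(q_hi, c_hi)
    if lo >= hi:
        # No overlap: the cache does not contribute to the query.
        return None, [query]
    remainder = []
    if q_lo < lo:
        remainder.append((q_lo, lo))  # part of the query below the cached range
    if hi < q_hi:
        remainder.append((hi, q_hi))  # part of the query above the cached range
    return (lo, hi), remainder
```

An empty remainder list corresponds to a fully answerable query, and a probe of `None` to a query to which the cache does not contribute.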

Semantic cache query processing criteria
Previous surveys (Bashir M. F. and Qadir M. A., 2006a), (Ahmad, M and Qadir, M.A., 2008), (Jónsson B. Þór et al., 2006), (Hao X et al., 2005), (Halevy, A.Y., 2001), (Makki K. S and Andrei S, 2009) of semantic cache query processing identified two main parameters for evaluation, i.e. Maximum Data Retrieval (MDR) and fast query processing. Those surveys did not quantify MDR. Here we quantify it with the following test: the intersection of the data retrieved from the server ($D_s$) and the data retrieved from the cache ($D_c$) should be the empty set, i.e. $D_s \cap D_c = \emptyset$. In general, any technique that retrieves the maximum possible or complete results from the local cache in tractable time under this quantification is said to be an efficient semantic cache query processing technique.
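The MDR test can be written down directly. The following is a minimal sketch, assuming that rows are identified by hashable values such as primary keys; the function name `satisfies_mdr` is hypothetical.

```python
from typing import Hashable, Iterable


def satisfies_mdr(server_rows: Iterable[Hashable], cache_rows: Iterable[Hashable]) -> bool:
    """Return True when D_s and D_c share no row, i.e. D_s ∩ D_c = ∅."""
    return set(server_rows).isdisjoint(cache_rows)
```

If the check fails, some rows were fetched from the server although they were already available in the cache.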

Query
}, c is a constant in a specific domain (Ren et al., 2003), and $Q_{UD}$ is the resultant data of this query. A query can be represented as $\pi_{Q_{UA}} \sigma_{Q_{UP}}(Q_{UR})$ in relational algebra.

Amending query
A query that requests only a key attribute of a relation from a remote server, in order to extract data known to be available in the cache, is called an amending query. It is used when we know that some data is available in the semantic cache but cannot extract it precisely. In that case we request the key attribute for the user query from the server and then extract the cached attributes (data) for those keys from the cache. In general, requesting only keys requires less computation on the database server and less network bandwidth.
Consider the employee database information provided in example 1 below, which is used for evaluation throughout this chapter. The semantic cache model we follow is similar to the relational database model. The basic building blocks of the relational model are attributes (columns), rows (tuples), tables (relations) and the relation schema. The schema defines the relations and, for each relation, its attributes with their data types. A row or tuple is a set of attribute instances.
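The idea of an amending query can be sketched as follows. This is only an illustration under stated assumptions: the cached segment is indexed by a key attribute, the keys returned by the server are given as a plain list, and the names `extract_with_keys`, `cache` and `server_keys` as well as the employee rows are hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def extract_with_keys(keys: Iterable[int], cache: Dict[int, Row], attrs: List[str]) -> List[Row]:
    """Project the requested attributes for each key returned by the amending query.

    Keys that are missing from the cache are skipped here; a real system
    would have to fetch those rows from the server instead.
    """
    return [{a: cache[k][a] for a in attrs} for k in keys if k in cache]


# Hypothetical cached segment, indexed by the key attribute.
cache = {
    1: {"ename": "Ali", "sal": 12000},
    2: {"ename": "Sara", "sal": 25000},
    3: {"ename": "Omar", "sal": 18000},
}
# Keys the server would return for the user query (assumed, not computed here).
server_keys = [2, 3]
rows = extract_with_keys(server_keys, cache, ["ename"])
```

Only the key values travel over the network; the remaining attributes are read from the cache.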

Query intersection
Query processing for five scenarios similar to those of Figure 1, plus one additional scenario in which the cache is a subset of the incoming query (the reverse of case 1 of Figure 1), was presented by Lee (1999, pp.28-36). For each scenario, the probe and remainder queries were computed from the intersection or difference of the cache and the query. The intersection and difference of cache semantics and a posed query were described at a very abstract level. The definition of intersection (Lee et al., 1999) between the semantics of a cache region $Q_C$ and a user query $Q_U$ on relation R is shown in statement (i) of Figure 4. This intersection consists of two parts: the common projected attributes, and the combined condition of the user and cached query predicates (shown in statement (ii) of Figure 4). A query or cache semantics is represented as a triple $\langle \pi_Q, \sigma_Q, operand_Q \rangle$, and a condition is satisfiable if none of its value domains is null. We elaborate this concept with an example.
Consider the database schema information provided in example 1 above. A user query $Q_U$ and the cached query $Q_{C1}$ of Figure 3 are represented as triples $\langle \pi_Q, \sigma_Q, operand_Q \rangle$ in statements (iv) and (iii) of Figure 4, respectively. The query $Q_U$ is satisfiable (or completely answerable) from $Q_{C1}$ because the intersection of the projected attributes is not empty and there is no null value domain in the predicate condition. According to Lee (1999, pp.28-36), two queries are disjoint if either the intersection of their projected attributes is empty or there is no combined condition between the user and cached query predicates.
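The intersection test described above can be sketched in Python. This is a simplified illustration, not the definition of Figure 4: it assumes that every predicate is a closed numeric range on one attribute, and the class `Query`, the function `intersect` and the attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [low, high]


@dataclass
class Query:
    attrs: FrozenSet[str]       # projected attributes (pi)
    cond: Dict[str, Interval]   # range condition per attribute (sigma)
    relation: str               # operand


def intersect(q_u: Query, q_c: Query) -> Optional[Query]:
    """Return the intersection of two queries, or None if they are disjoint."""
    if q_u.relation != q_c.relation:
        return None
    common = q_u.attrs & q_c.attrs
    if not common:
        return None  # no common projected attributes
    combined = dict(q_c.cond)
    for attr, (lo, hi) in q_u.cond.items():
        c_lo, c_hi = combined.get(attr, (float("-inf"), float("inf")))
        lo, hi = max(lo, c_lo), min(hi, c_hi)
        if lo > hi:
            return None  # null value domain: the combined condition is unsatisfiable
        combined[attr] = (lo, hi)
    return Query(common, combined, q_u.relation)
```

A result of `None` corresponds to the disjoint case; otherwise the returned query describes the part of the user query that overlaps the cached query.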

Query trimming
The concept of query trimming was introduced by (Keller A.M. and Basu J., 1996) and formally given by Ren (2003, pp.192-210), who presented a comprehensive algorithm for query processing. At the start, the algorithm checks whether the user query attributes are a subset of the cached semantics attributes; if so, it performs query trimming based on the implication or satisfiability of the predicates. If the user query attributes are not a subset of the cached semantics attributes, there may still be some common attributes. In this case, if the query predicate implies the cache predicate, or there are common predicates between the query and the cache semantics, the probe and remainder queries are formed according to the logic of the algorithm. In other words, the logic is based on checking the implication and satisfiability of the user and cached query predicates (based on already published material, as explained in the next section) and on finding the common part of the user and cached query attributes.
Much work has been devoted to finding implication and satisfiability between user and cached query predicates (Guo S et al., 1996), (Härder T. and Bühmann A., 2008). The simplified concept of implication and satisfiability is as follows. Given a user query predicate $Q_{UP}$ and a semantic segment predicate $Q_{CP}$, there are three scenarios. In the first, the user predicate implies the segment predicate, implying that the whole answer is contained in the segment. In the second, the conjunction of the two predicates is satisfiable but the implication does not hold, so only part of the answer is contained in the segment. In the third, the conjunction is unsatisfiable, so the segment does not contribute to the answer. Remainder queries are trimmed again after being compared with other semantic cache segments using the same algorithm. This continues until it is decided that the cache cannot contribute further to answering the query. This approach leads to iterative behavior, which is handled by a proposed query plan tree structure. This plan tree expresses the relationship between cache items and query subparts.
Query trimming techniques have some shortcomings, such as time and space inefficiency and the complexity of the trimming process (Makki K. S and Andrei S, 2009), (Makki K. S and Rockey M., 2010). When a query is trimmed into a probe part ($Q_{UP} \wedge Q_{CP}$) and a remainder part ($Q_{UP} \wedge \neg Q_{CP}$), the negation of the cached query predicate in the remainder part makes it a much more expanded term if the predicate contains disjunctions. This expansion created by the negation of a term was shown with an example in (Makki K. S and Andrei S, 2009).
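A small worked case may make this expansion concrete. The atomic comparisons $a, b, c, d$ below are hypothetical and are not the example of the cited work. Suppose the cached predicate is a disjunction of two conjunctions; negating it and distributing gives:

```latex
\begin{align*}
Q_{CP} &= (a \wedge b) \vee (c \wedge d) \\
\neg Q_{CP} &= (\neg a \vee \neg b) \wedge (\neg c \vee \neg d) \\
 &= (\neg a \wedge \neg c) \vee (\neg a \wedge \neg d) \vee (\neg b \wedge \neg c) \vee (\neg b \wedge \neg d)
\end{align*}
```

The two disjuncts of the cached predicate become four disjuncts in the negation, each of which must still be conjoined with $Q_{UP}$ to form the remainder part.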
A relational query can be visualized as a rectangle whose boundaries are set by the query predicate values. According to (Makki K. S and Andrei S, 2009), (Makki K. S and Rockey M., 2010), semantic cache query processing based on query trimming is therefore the problem of finding the intersection between two finite rectangles. Six cases that extend those of Figure 1 are given in (Makki K. S and Andrei S, 2009), (Makki K. S and Rockey M., 2010) to show the relationship between the rectangles of user and cached queries. These rectangular representations do not depict the implicit knowledge present in the semantics of user and cached queries. A technique named Flattening Bi-dimensional Interval Constraints (FBIC) was proposed in (Makki K. S and Andrei S, 2009). Based on FBIC, an algorithm for handling disjunctive and conjunctive queries was given by Makki (Makki K. S and Rockey M., 2010). The algorithm works only for the single disjunctive case, while the conjunctive cases are the same as those provided by (Makki K. S and Andrei S, 2009).
The intersection between the rectangles of user and cached queries was found by comparing the bounds (lower or upper) of both rectangles. However, how to compute comparable bounds was not given in (Makki K. S and Andrei S, 2009), (Makki K. S and Rockey M., 2010).

Satisfiability and implication
Finding whether there exists a satisfiable part between two formulas, or whether one implies the other, is central to many database problems such as query containment, query equivalence, answering queries using views, and database caching. According to Guo (Guo S et al., 1996), implication is defined as "S implies T, denoted as $S \rightarrow T$, if and only if every assignment that satisfies S also satisfies T". Similarly, satisfiability is defined as "S is satisfiable if and only if there exists at least one assignment for S that satisfies T." (Guo S et al., 1996) gave algorithms to compute implication, satisfiability and equivalence for conjunctive formulas in the integer and real domains. For example, the formula (Salary < 20K AND Salary > 8K AND Department = 'CS') is satisfiable, because the assignment {12K/Salary, CS/Department} satisfies it. Similarly, the formula (Salary > 10K OR Salary < 12K) is a tautology, because every assignment satisfies it.
Satisfiability and implication results in databases (Guo et al., 1996), (J.D. Ullman, 1989), (A.Klug, 1988), (Rosenkrantz and Hunt, 1980), (Sun et al., 1989) are relevant to the computation of probe and remainder queries in semantic cache query processing for a class of queries that involve inequalities over the integer and real domains. Previous work models the problem as a graph structure.
Rosenkrantz and Hunt (Rosenkrantz and Hunt, 1980) provided an algorithm of complexity $O(|Q|^3)$ for solving the satisfiability problem; the expression S to be tested for satisfiability is a conjunction of terms of the form X op C, X op Y, and X op Y + C. Guo et al. (Guo et al., 1996) provided an algorithm (GSW) for computing satisfiability with complexity $O(|Q|^3)$, covering the complete operator set and the predicate types X op C, X op Y and X op Y + C. Here we demonstrate the GSW algorithm (Guo et al., 1996) for finding implication and satisfiability between two queries.
The GSW algorithm starts by transforming all inequalities into a normalized form through given rules. It was proved by Ullman (J.D. Ullman, 1989) that these transformations preserve equivalence. After these transformations the remaining operator set becomes {≤, ≠}. The satisfiability of a conjunctive query Q is computed by constructing a connected weighted directed graph $G_Q = (V_Q, E_Q)$ of Q after the above transformation, where $V_Q$ is the set of nodes representing the predicate attributes of the inequalities and $E_Q$ is the set of edges between nodes. An inequality of the form X op Y + C has nodes X and Y and an edge between them with weight C. The inequality X op C is transformed to X op $V_0$ + C by introducing a dummy node $V_0$.
According to the GSW algorithm (Guo et al., 1996), for any query Q, if a negative-weighted cycle (a cycle whose edge weights sum to a negative value) is found in $G_Q$ then Q is unsatisfiable; otherwise Q is satisfiable. Testing satisfiability between a user query $Q_U$ and a cached segment $Q_S$ requires us to construct the graph $G_{Q_U \wedge Q_S}$ of $Q_U \wedge Q_S$ and check it for any negative-weighted cycle. A negative-weighted cycle is found through the Floyd-Warshall algorithm (R. W. Floyd, 1962). The complexity of the Floyd-Warshall algorithm is $O(|V|^3)$, so finding satisfiability becomes $O(|Q_U \wedge Q_S|^3)$.
An algorithm with $O(|S|^3 + K)$ complexity for solving the implication problem between two conjunctive inequalities S and T was presented by Ullman (J.D. Ullman, 1989) and Sun (Sun et al., 1989). Conjunctive queries of the form X op Y were studied by (A.Klug, 1988) and (Sun et al., 1989). Implication between conjunctive queries of the form X op Y + C was addressed by the GSW algorithm (Guo et al., 1996) with complexity $O(|Q_U|^2 + |Q_C|)$. GSW implication (Guo et al., 1996) requires that $Q_U$ is satisfiable. At first the implication algorithm constructs the closure of $Q_U$, i.e., a universal set that contains all those inequalities that are implied by $Q_U$. In Figure 5(a), $Q_{U1}$ is satisfiable with respect to $Q_{C7}$, as there is no negative-weighted cycle in $G_{Q_U \wedge Q_{C7}}$.
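The negative-cycle test can be sketched in Python. This is a minimal illustration in the spirit of the graph-based check, not the published GSW algorithm: it assumes that all predicates have already been normalized to the form X ≤ Y + C over the integers, it ignores the ≠ operator, and the names `is_satisfiable` and `Constraint` are hypothetical. The edge direction chosen here (from Y to X) is the usual convention for difference constraints.

```python
from itertools import product
from typing import Dict, List, Tuple

Constraint = Tuple[str, str, int]  # (x, y, c) meaning x <= y + c


def is_satisfiable(constraints: List[Constraint]) -> bool:
    """Return False if the constraint graph contains a negative-weighted cycle."""
    nodes = sorted({v for x, y, _ in constraints for v in (x, y)})
    inf = float("inf")
    dist: Dict[Tuple[str, str], float] = {
        (u, v): (0 if u == v else inf) for u, v in product(nodes, repeat=2)
    }
    for x, y, c in constraints:
        # x <= y + c becomes an edge y -> x with weight c.
        dist[(y, x)] = min(dist[(y, x)], c)
    # Floyd-Warshall: product() varies k slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[(i, k)] + dist[(k, j)] < dist[(i, j)]:
            dist[(i, j)] = dist[(i, k)] + dist[(k, j)]
    # A node that can reach itself with negative cost lies on a negative cycle.
    return all(dist[(v, v)] >= 0 for v in nodes)


# Salary < 20000 AND Salary > 8000, rewritten over integers with dummy node V0:
#   Salary <= V0 + 19999   and   V0 <= Salary - 8001
user_query = [("Salary", "V0", 19999), ("V0", "Salary", -8001)]
# A hypothetical cached segment with Salary <= 5000.
cached_segment = [("Salary", "V0", 5000)]
```

Testing the user query against the cached segment amounts to calling `is_satisfiable(user_query + cached_segment)`, which checks the conjunction of both predicate sets.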

Bucket algorithm
As discussed earlier, a user of a data integration system poses queries in terms of a mediated schema, because the root sources are transparent in such systems. A module of the data integration system translates/reformulates a user query into a query that refers directly to the root sources. Several well-known algorithms exist for such query reformulation/rewriting (Levy A.Y et al., 1996), (Duschka O.M. and Genesereth M.R. 1997), (Pottinger R. and Levy A. 2000). In the context of semantic caching, the root sources are the cache segments and the mediated schema is the cache description. The goal of the bucket algorithm (Levy A.Y et al., 1996) is to reformulate a user query that is posed on the mediated (virtual) schema into a query that refers directly to the available (local/cached) data sources. This reformulation is known as query rewriting. Both the query and the sources are described by select-project-join queries that may include atoms of arithmetic comparison predicates. The bucket algorithm returns the maximally-contained rewriting of the query using the views. This rewriting is maximally contained but not necessarily equivalent.
We demonstrate the working of the bucket algorithm (in the context of semantic cache query processing) with an example. Suppose $Q_{C1}$, $Q_{C2}$ and $Q_{C3}$ (shown in Figure 6) are in the cache, and a user query $Q_U$ (shown in Figure 6) is posed over them. As shown in Table 1 below, according to the bucket algorithm both cached queries $Q_{C1}$ and $Q_{C2}$ are candidates for its bucket, since there is no inconsistency between the user query predicate and the cached queries (i.e. Age ≥ 55 is consistent with Age < 70) when compared in isolation (atomically). $Q_{C3}$ is excluded due to predicate inconsistency (i.e. Exp < 15 is inconsistent with Exp > 20). In the second step of the bucket algorithm, the elements of the buckets are combined to form a rewriting of the user query. The rewritten query (Q') in this case is shown in Figure 7 below.
Table 1. Contents of the bucket. An attribute not required by the user query is shown as a primed attribute.
However, searches performed over web resources through Boolean queries (keyword conjunctions with AND and NOT operators) do not work in a plain page caching system, because the user query in this case is not a URL, and extracting qualified tuples for an individual keyword or for the whole query from page headers is not possible (Chidlovskii B and Borghoff U. M., 2000), (Qiong L and Jaffrey F. N., 2001). Semantic caching was introduced as an alternative to plain page caching, in which the cache is managed as semantic regions.
Web queries over web resources differ from queries posed over databases: web queries have no attribute or predicate part, nor do they contain a join operator. The problem of answering web queries can be reduced to the set containment problem.
There is a lot of research work on semantic caching for web queries. For example, (Chidlovskii B and Borghoff U. M., 2000) addressed both semantic cache management and query processing of web queries for meta-searcher systems. Their technique is based on a signature file method, in which a signature is given to every semantic region for processing all cases (similar to Figure 1) of containment and intersection.
A cache model was proposed for database applications using web techniques (Anton J. et al., 2002). Cache elements were stored as web pages/sub-pages, called fragments and sub-fragments, with their header information, called a template. Fragments can be indexed or shared among different templates. Fragments, sub-fragments and templates were updated or expired based on their own policies, which included expiration, validation and invalidation information. In this case data retrieval is performed by matching the template information with the requested query, and the corresponding fragments or sub-fragments are returned. Partial answer retrieval is possible in this technique, as sub-fragments alone can be returned for a user query. However, this technique is still close to page caching, since each fragment is itself a page.
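The consistency check behind bucket creation can be sketched as follows. This is a simplified illustration, not the full bucket algorithm: it keeps a single bucket, compares only integer range predicates, and omits the second step that combines bucket elements into a rewriting. Since Figure 6 is not reproduced here, the concrete predicates of the user query and of the three cached queries are assumptions chosen to match the comparisons quoted in the text, and the names `consistent` and `build_bucket` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]    # closed integer range [low, high]
Predicates = Dict[str, Interval]  # attribute -> allowed range

INF = float("inf")


def consistent(a: Predicates, b: Predicates) -> bool:
    """Two predicate sets are consistent if no shared attribute has an empty overlap."""
    for attr in a.keys() & b.keys():
        lo = max(a[attr][0], b[attr][0])
        hi = min(a[attr][1], b[attr][1])
        if lo > hi:
            return False
    return True


def build_bucket(user: Predicates, cached: Dict[str, Predicates]) -> List[str]:
    """Keep every cached query whose predicates do not contradict the user query."""
    return [name for name, preds in cached.items() if consistent(user, preds)]


# Assumed predicates: Age >= 55 AND Exp > 20 for the user query.
user_query = {"Age": (55, INF), "Exp": (21, INF)}
cached_queries = {
    "QC1": {"Age": (-INF, 69)},  # Age < 70
    "QC2": {"Age": (-INF, 69)},  # assumed to carry a compatible Age predicate
    "QC3": {"Exp": (-INF, 14)},  # Exp < 15
}
bucket = build_bucket(user_query, cached_queries)
```

With these assumed predicates the bucket holds the first two cached queries, and the third is excluded because its Exp range cannot overlap that of the user query.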
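The reduction to set containment can be illustrated with a short sketch. It assumes that a web query is a conjunction of required keywords (AND) and excluded keywords (NOT); the class `KeywordQuery` and the function `is_contained` are hypothetical names, and the check shown is a sufficient condition for containment rather than a complete procedure.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class KeywordQuery:
    required: FrozenSet[str]  # keywords joined by AND
    excluded: FrozenSet[str]  # keywords preceded by NOT


def is_contained(query: KeywordQuery, region: KeywordQuery) -> bool:
    """True if every document matching `query` also matches `region`.

    This holds when the region demands no keyword, and forbids no keyword,
    beyond what the query itself demands or forbids.
    """
    return region.required <= query.required and region.excluded <= query.excluded
```

For example, a query for documents containing both "semantic" and "cache" is contained in a cached region for "semantic" alone, so its answer can be filtered from that region.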

Pattern Prime Product (PPT) reasoning for XML queries
The information available on the web is unstructured; the Extensible Markup Language (XML) is used to give structure to web information/data. As described above, the querying mechanism of the current web is keyword-based search. Keyword-based search is considered non-semantic (Mandhani B. and Suciu D., 2005), (Sanaullah, M., 2008).
A novel method of checking containment was proposed by Gang Wu and Juanzi Li (Gang Wu and Juanzi Li, 2010). Each node in the query is assigned a unique prime number, and the product of these prime numbers is then calculated by a specific method. This product is called the Pattern's Prime producT (PPT). The query is stored in the cache along with this PPT.
For each subsequently issued query the same procedure is followed to assign unique prime numbers to its nodes; if a node of the query matches a node of an existing stored view, the new node is assigned the same prime number that was allotted to the previously stored node. The PPT of the new query is calculated and then divided by the PPT of each stored view. If the PPT of a stored view completely divides the PPT of the query, that view is selected and the rest are rejected. The selected view is processed further to check whether the nodes occur in the same positions in the query and the view, i.e. Qk = Vk, where k is the position of the kth axis node. The PPT of each infix is also checked.

Example
An XML document is shown in Figure 9. A user issues the query /lib/book; as a result, the technique loads all the results of the "lib" and "book" nodes into the cache and assigns a prime number to each node, i.e. "lib" = 2 and "book" = 3. After assigning the prime numbers, a prime product is calculated as follows.
2 × 3 = 6; here 6 is the Tree Pattern Prime Product of the view. Now if the user issues the query /lib/book/author, each node in the query is assigned the same prime number as was previously assigned to the matching node in the view. Here 2 is assigned to "lib" and 3 is assigned to "book". The "author" node appears for the first time, so a new prime number, i.e. 5, is assigned to it. Dividing the prime product of the query (90) by the prime product of the view (6) yields 15, which means that the prime product of the view completely divides that of the query. If the prime product of a view completely divides the prime product of a query, the technique then checks whether the order of appearance of each axis node in the view and the query is the same; if so, the query is contained in the view.
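The divisibility test of this example can be reproduced with a short Python sketch. The exact product rule of the original method is not spelled out here, so the sketch makes an explicit assumption: the product is taken over parent-child edges of a simple path, multiplying the primes of both end nodes of each edge. This rule is chosen only because it reproduces the values 6 and 90 used in the example; the class `PrimeLabeller` and its methods are hypothetical names, and predicates and branching patterns are not handled.

```python
from typing import Dict, Iterator, List


def first_primes() -> Iterator[int]:
    """Yield primes 2, 3, 5, ... by trial division against earlier primes."""
    n = 2
    found: List[int] = []
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


class PrimeLabeller:
    """Assign one prime per node name and reuse it across views and queries."""

    def __init__(self) -> None:
        self._primes = first_primes()
        self._label: Dict[str, int] = {}

    def prime_of(self, node: str) -> int:
        if node not in self._label:
            self._label[node] = next(self._primes)
        return self._label[node]

    def ppt(self, path: List[str]) -> int:
        # Assumed rule: multiply parent prime * child prime over every edge.
        labels = [self.prime_of(n) for n in path]
        result = 1
        for parent, child in zip(labels, labels[1:]):
            result *= parent * child
        return result


labeller = PrimeLabeller()
view_ppt = labeller.ppt(["lib", "book"])             # /lib/book
query_ppt = labeller.ppt(["lib", "book", "author"])  # /lib/book/author
view_is_candidate = query_ppt % view_ppt == 0
```

Divisibility only selects a candidate view; the check on the order of the axis nodes described above is still needed before containment can be concluded.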

Example
If a query contains predicates, for example A[b[b[a]]]/c/d, the tree of this query is shown in figure 9. The prime product is calculated as shown below. This algorithm retrieves the results of all axis nodes given in the query. For example, if we issue the query "/lib/book[price>30]" against the document shown in figure 1, then despite the presence of a predicate it retrieves all the results of the book node and stores them in the cache. This requires more cache space.

Subsumption analysis reasoning
Description logic (DL), a family of logic-based languages, is claimed to be able to express the conceptual domain model/ontology of a data source and to provide evaluation techniques. Since Structured Query Language (SQL) has a structured format, its queries can be classified under a subsumption relationship. A well-known technique named tableaux provides structural subsumption of concepts. DL is assumed to be useful for semantic cache query processing and management (Ali et al. 2010). Relational queries can be modelled/translated in DL, and DL inference algorithms can be used to find query containment. The translation of a relational query to DL may not be in the same spirit as the query languages of DL systems, but it is sufficient for finding the containment of relational queries (Ali et al. 2010). Subsumption reasoning (containment) over the semantics of the data to be cached is very useful for eliminating redundant semantics and minimizing the size of the semantic cache for the same amount of data.
The tableaux algorithm (Baader et al., 1991a) (Hollunder et al., 1990) is instrumental in devising a reasoning service for a knowledge base represented in description logic. All the facts of the knowledge base are represented in a tree of branches, with logical AND between the facts within a branch and logical OR between branches, organized according to the rules of the tableaux algorithm (Baader et al., 2003). A clash in a branch represents an inconsistency in that branch, and the model in that branch can be discarded. The proof of subsumption or unsatisfiability is obtained if all the models (all the branches) are discarded in this way (Baader et al., 2003).
The proposed solution (Ali et al. 2010) consists of two basic steps. First, the user query (relational) is translated into DL. The translated query is then evaluated for a subsumption relationship with previously stored queries in the cache by using the sound and complete subsumption algorithm given in (Baader et al., 1991b) (Lutz et al., 2005).

Example
Consider another scenario, shown in Figure 10, whose predicate conditions contain a disjunctive operator. All three branches lead to a clash when checking Q3 ⊑ Q4; therefore, Q4 contains Q3. In the first branch (Line 8 in Figure 10), after applying the Or rule, Emp and ¬Emp lead to a clash. In the second branch (Line 9 in Figure 10), ename and ¬ename lead to a clash, and in the third branch (Line 10 in Figure 10), ≥30k(sal) and ≤19k(sal) lead to a clash. All three branches (Lines 8, 9 and 10 of Figure 10) lead to a clash when the tableaux algorithm is expanded; therefore, Q3 ⊑ Q4.

Conclusion
In this chapter we demonstrated several reasoning techniques for query processing in a semantic cache. The chapter provides an overview of semantic cache applications in different domains, such as relational databases, web queries, answering queries from views, XML-based queries and description-logic-based queries.
Semantic cache query processing techniques are unstructured-semantics approaches, in which semantics are extracted from structured representations that carry no semantics within their representations.
where $Q_{UA}$ is the Select clause of the query, which contains the projected attributes. $Q_{UR}$ is the From clause, which contains a relation of a database D from which data is to be retrieved. $Q_{UP}$ is the Where clause, which contains conjunctive or disjunctive compare predicates; a compare predicate is of the form P = a op c, where $a \in A$ {Attributes Set}, $op \in$ {

Example 2: Let us have a user query $Q_{U1}$