Truth Space Method for Caching Database Queries

We propose a new method of client-side data caching for relational databases with a central server and distant clients. Data are loaded into the client cache based on queries executed on the server. Every query has the corresponding DB table – the result of the query execution. These queries have a special form called "universal relational query" based on three fundamental Relational Algebra operations: selection, projection and natural join. We have to mention that such a form is the closest one to the natural language and the majority of database search queries can be expressed in this way. Besides, this form allows us to analyze query correctness by checking lossless join property. A subsequent query may be executed in a client’s local cache if we can determine that the query result is entirely contained in the cache. For this we compare truth spaces of the logical restrictions in a new user’s query and the results of the queries execution in the cache. Such a comparison can be performed analytically , without need in additional Database queries. This method may be used to define lacking data in the cache and execute the query on the server only for these data. To do this the analytical approach is also used, what distinguishes our paper from the existing technologies. We propose four theorems for testing the required conditions. The first and the third theorems conditions allow us to define the existence of required data in cache. The second and the fourth theorems state conditions to execute queries with cache only. The problem of cache data actualizations is not discussed in this paper. However, it can be solved by cataloging queries on the server and their serving by triggers in background mode. The article is published in the author’s wording.


Introduction
In this paper we discuss the use of data views that are cached on the client's computer.We assume client-server environment with server based Relational Database (RDB).The cache stores query results in order to provide maximum usage of saved data in subsequent queries.
The purpose of this paper is to study the problem of building and using data views on a client's computer.This problem is similar to query optimization because it aims Truth space method for caching database queries 249 to decrease the data transfer from a Database server.Cached data is actively used in Database Management Systems (DBMS), but mostly it is only repeating use of data written in cache, without any prior data analysis aimed to define any partial or combined use.The DBMS can only avoid requesting blocks of data from external devices, while serving the query, if they are present in the cache.So, only block numbers are analyzed, not their information.
This paper is based on the results, obtained in [1].We have removed a constraint that limits the choice of attributes for Intermediate Data View (IDV) and also made some generalizations on multiple IDVs case.The second section provides an overview of publications on the subject of this paper.We have considered only those closest to the problem at hand.In the end of this section we present an example to explain our approach for solving the problem.
In the third section we provide a formalization of the problem.We also present the approach for removing the uncertainties in database queries and present auxiliary properties and definitions that are used later.
The main results are presented in Section 4. We research the possibility of using cached results of previous queries when performing a new one.The results are presented in the form of theorems.
One of the directions of further researches is combining this technology with efficient data manipulations approaches.Those can be performed with a help of Graphical Processors.GPUs gain serious attention for big data processing [2,3,4,5].Such processor architectures have big potential in multitasking and can reach enormous speeds compared to CPUs, especialy if the operations are performed similarly on different parts of data.

Existing solutions overview and comparison
One of the main research problems of queries optimization to database is the construction of an optimal query plan.Queries are transformed without content analysis of the database content and cache.In other cases, this information is taken into account for calculation statistical estimations to improve physical access to data.These problems are the subject of many studies, but they are outside of our approach.
A lot of publications are devoted to the problem of cache content management.For example, the heuristic algorithms update the cache with regards to user data [6], storage of important user queries in a cache [7], storage of the cached data on multiple servers [8].This paper solves the problem of the best use of a cache for execution of the user's queries.Cache content management and the best use of the cache are two complementary problem.
The most similar to our work is paper [9].The authors analyze conjunctive queries on data domains with predicates in the form of arithmetical comparisons, and present query computation algorithms using IDVs.In our paper, the special case of universal relational query is considered.It is a query on the Database relations, not particular domains.Although we had similar aims, the results obtained differ because of stated factors.In particular, there is no need to create any algorithms of data selection in our work, as they are replaced by Relational Algebra.
Another set of works describes processing scheme for event-driven continuous queries [10,11].In the proposed approach, query result caching is introduced to achieve a flexible Моделирование и анализ информационных систем Т. 22, № 2 (2015) way to share common operators among queries activated by unpredictable events.When a query is activated, an intermediate result generated for the query is stored into the cache area if it is expected to be reused by other queries.When other queries including the same operator are activated, they reuse the cached result if the cache includes reusable data.
The work [12] proposes methodology for functionally decomposing complex queries in terms of primitives so that multiple reuse sites are exposed to the query optimizer, to increase the amount of reuse.
In [13,14] queries are separated into subqueries that can yield results from cached nodes.Then query result can be combined from several subqueries.
Using cloud storages one can afford a lot of computational resources for a short time to execute complicated queries effectively on large data with a help of virtual machine clusters [15,16,17,18].A set of reliable heuristic algorithms is used.Finally the authors carry out a series of experiments that show that their optimizations speed up the federated query evaluation process.
In papers [19], [20] similar to our problem is discussed.In this paper we make correspondence between cache contents and predicates.Cached data usage problem is resolved in terms of truth spaces.We compute the truth spaces of query results in the cache.It allows us to define records in IDV that can be used to form a new view and new SQL queries that will let us load missing data from Database server.The following example demonstrates suggested approach.
Example 1. Let's assume the following database schema fragment, which represents the University Study plan: R 1 = Students (Stud_ID, Stud_Name, Group) R 2 = Schedule (Group, Room_ID, Course) R 3 = Progress (Stud_ID, Course, Score), relation names are italic, Primary Key attributes are in bold.Assume that on he user's computer the following queries are cached: Query 1: List of students studying physics, whose ID is bigger than 210: where π X 1 -projection operation over the set of attributes Query 2: Examination sheets of the group M10: where Let us assume that the user has requested information formalized with the following query: Using the calculation and comparison of the truth domains P 1 , P 2 and P * , we obtain that the query can be executed in the cache: P * = π X * (σ F 3 (P 1 P 2 )), where F 3 -logical condition: (Stud_ID > 300).Requesting the server in this case is not required.

Logical constraint definition and properties
To simplify domain computations we will consider logical formulas in Disjunctive Normal Form (DNF).In general case formula F has the following form: here T j , j = 1, . . ., n -predicates, where expanded attribute names are specified.R i .A j means attribute A j in relation R i .Those predicates can be: • operation Expr θ ALL/ANY S.
Note 1.We assume logical formulas have no trivial conditions on attributes, for example, Expr 1 = Expr 1 and those reduced to such form.In general, we assume that R i .A l domain is not fully contained in T j (. . ., R i .A l , . . . ) predicate truth space.Such conditions can be removed from the formula without a change in the truth space (we will define it later).
Definition 1. Space of attributes contained in a formula show the dimension of the formula and is denoted as F .
Listed variants of operations don't use every single SQL capability.For example, we don't use the EXISTS predicate, because it has no expanded attribute names in it.The NULL predicate is used for another purpose in our paper.
During logical formula domain calculation, if we have some attribute having N U LL value on a tuple t, we then get UNKNOWN value for the whole formula, because SQLquery results follow the Three-valued logic.It leads us to ambiguous interpretations of results of both usual users and experienced programmers.To solve this problem we suggest the following constraint: every attribute in F * is supplied with a property "Use of undefined value" with two mutually exclusive values: "Yes" or "No".The reasoning behind this property is the following: if it is assigned "Yes", then we leave tuples with NULL value for further consideration.Otherwise, having "No" in that property guarantees us removing all such tuples.
Let's write expression (1) for F in the following form: F (. . ., T j , . . .), here T jpredicates of expression (2).After the modification it will be the following: Моделирование и анализ информационных систем Т. 22, № 2 (2015) F (. . ., T j , . . . ) ∧ i,j (R i .A j = NULL), here ∧ i,j (R i .A j = NULL) -conjunction of all F attributes,for which NULL is not allowed, and T j = (T j ∨ i,j (R i .A j = NULL)), here ∨ i,j (R i .A j = NULL) -disjunction of all F attributes, for which NULL is allowed.Outer brackets for T j predicate define operation priority.We can see that this logical formula can only have TRUE and FALSE values when considering it in Three-valued logic.We can also note that if tuples don't have undefined values, the initial formula F will be equivalent to the transformed one, so semantics of view P almost undistorted.For the disclosure of the term "almost"we consider example.Assume F = R 1 .A 2 > 3∨R 3 .A 4 < 4. Let tuple t for R 1 .A 2 has value NULL, with a property "Use of undefined value" equal to "Yes", and R 3 .A 4 value equal to 5, so value of R 3 .A 4 < 4 is FALSE .Then transform of formula F on tuple t will be equal TRUE , which is not obvious.
Hereafter we will assume all the F formulas to be transformed.We consider set of all values for all Database attributes.Data constraints bound this space to some set of points that represents a set of available Database states.
In our example with University Study plan, the A can be the following: Obviously, without constraints this set represents more states than DB can have.For example, each student has his/her own ID, whereas there are 3 corresponding IDs for each student in A: {205, 315, 461} × {Rachel Davis, Noam Angrist, Cameron McCord}.Note 2. Dimension of formula F can be smaller than dimension of A. In this case, we consider the equivalent form of the formula that has all other attributes taking any values in their domains.
Looking back to our Example 1, we see that the formula in Query 1 contains only two attributes: The equivalent representation that we will use is the following: The motivation behind this transformation is to allow us comparing truth spaces of formulas defined on different attributes.Definition 2. Truth space of logical formula F , defined by ( 1), ( 2), (3), is a set So, the truth space of our formula F 1 will be the following subset of A: M (F ) = {(315, Rachel Davis, M10, 100, Physics, 70), (315, Noam Angrist, M10, 100, Physics, 70), . . .(461, Cameron McCord, M10, 100, Physics, 90)} M (F ) for a given formula F , written in DNF, is nothing but a union of Truth spaces of distinct conjunctive clauses.Truth space of a single conjunctive clause is an intersection of Truth spaces of it's predicates.Note 3. The complexity of Expr , S and Str predicates is defined by the software's ability to compute Truth spaces of formulas, as it is shown in example 1.
Definition 3. Given a logical formula F , defined by ( 1), ( 2), (3), the projection of F on the X attribute set is a logical formula F [X], F [X] = X, that has all its predicates with R F i .A F i / ∈ X replaced with trivial predicate TRUE .
So, the projection of F 1 on the attribute set X = {Stud_ID} will be transformation is made by replacing T j predicates that contain A j ∈ F attribute, such that A j / ∈ X with TRUE value.We get F ([X]) = TRUE according to (1) and ( 2).Therefore, a ∈ M (F [X]).This property of logical formulas is used while building new data view from given IDVs.

Intermediate Data View properties research
We denote saved IDVs as P ={P 1 , P 2 , . . ., P m }, here P v is an IDV, ), s(v) -amount of relations in Database that were used while building P v , π Xv -projection on X v attributes, σ Fv -selection with F v logical formula.Eventually we need to get the following view: Let's study the problem of building data view P * using existing P v IDVs.
Proof.Let t * be any tuple in P * .We need to show that t * ∈ π X * (P v ).From condition t * ∈ P * , there is the tuple t There may be several such tuples.We will choose the one that satisfies F * (t ) = T RU E. This tuple exists indeed, because otherwise t * would not be in P * .Thus, there are tuples t * i ∈ R * i : Моделирование и анализ информационных систем Т. 22, № 2 (2015) and for any pair i and j, such that R * i ∩ R * j = ∅, equality is held: Whereas conditions ( 4) and ( 5) are held for the whole set of relations {R * 1 , . . ., R * l }, they are true for any its subset also, including {R v 1 , . . ., R v s(v) }.Hence, after joining tuples t * i from relations R v j we get a tuple t , such that t = t [ Condition F * (t ) = TRUE and statement 1 imply the truth of the projection F * [X v ] on a tuple t , and consequently on t , as the formula is defined at common attributes of these tuples.Furthermore, according to condition c), we have The theorem has been proven.
The conditions given in the aforementioned theorem guarantee the data to build P * to be contained in P v IDV.However, there can be excess tuples that make F * take TRUE value.They appear because there are some relations in and will be deleted if we join those missing relations.Using Truth spaces of logical formulas we can query DBMS to get a minimal required data set to define excess tuples.
In the next theorem we describe the case of coinciding relation sets.
Proof.Conditions of the theorem are a special case of theorem 1, so inclusion P * ⊆ π X * (σ F * (P v )) is considered to be proven.It is necessary to show that π X * (σ Hence, t v ∈ P * .The theorem has been proven.
It is possible to use several IDVs to build a resulting representation.First, let's consider the following simple property of natural join.
Proof.It is so indeed, because joining additional relations to 1 can only remove some tuples that were presented there.After the projection, we will get the same relation 1 if no tuples were deleted or some subset otherwise.The order of joins doesn't matter because of commutativity of this operation.
Proof.Again, we will choose an arbitrary tuple t * ∈ P * and show that t * ∈ π X * (σ F * [X] ( n v=1 P v )).By analogy to theorem 1, the tuples t , t = t [ s i=1 R i ] are built.Note that The further reasonings also apply to all IDVs.We get F * [X](t ) = TRUE and F v (t ) = TRUE , v = 1, . . ., n. Considering statement 2, we have t [ The theorem has been proven.
Just like in the previous case, there are some additional conditions that allow us to get precise data presentation from IDVs.
Proof.Conditions of the theorem are a special case of theorem 3, so inclusion P * ⊆ π X * ( n v=1 P v ) is considered to be proved.It is necessary to show, that π X * (σ F * ( n v=1 P v )) ⊆ P * .Assume tuple t ∈ π X * (σ F * ( n v=1 P v )).Let's show, that t ∈ P * .Denote Using the property of the projection operation and condition a), we have Thus, according to the definition of tuple t, there is a tuple Therefore t ∈ P * .The theorem has been proven.The conditions listed above aren't exceptional in any case.Many applications require repeat of data entry with slight modifications.For example, applications that work with multidimensional data.Hypercube dimensions do not change often, so that we can create new hypercubes from IDVs without frequent interaction with the database.