FORMAL SPECIFICATION OF THE NOSQL DOCUMENT-ORIENTED DATA MODEL

We build a formal definition of the NoSQL document-oriented data model. Two formal data models are constructed: the first is based on sets and the second on multisets (bags). Special relations called subdocument and subrecord are introduced, and it is proven that these relations are preorders. General results about the cofinality relation on sets are also given.


INTRODUCTION
Existing NoSQL DBMSs are based on a few data models. One can speak of a new ideology of database development, alternative to the relational one, rather than of a common platform underlying NoSQL DBMSs [1]. One of the most common types of NoSQL DBMS is the document-oriented system, such as MongoDB [2] and CouchDB, which is based on JSON, an open standard for data representation and interchange. Below we consider formal models that describe the data structures used in document-oriented NoSQL DBMSs [3], [4], [5], [6] and investigate some formal properties of these models.
The construction is based on the composition approach to programming [7]. Earlier, this approach was successfully used to describe the semantics of relational databases and of the SQL language [8], [9], [10]. The constructed models use sets, multisets, and nominate sets as their basis [11], [12], [13].

Denote by $2^X$ the set of all finite subsets of a set $X$, i.e. $2^X = \{X' \mid X' \subseteq X \ \&\ X' \text{ is finite}\}$. Let $D$ be the set of atomic data and $V$ be the set of names. Then the set of nominate sets, denoted by $D^V$, is the set of finite mappings from $V$ to $D$, i.e. $D^V = \{A \mid A: V' \to D,\ V' \in 2^V\}$.
Definition 1. The set of records $RC$ (ReCord) and the set of documents $DC$ (DoCument) are constructed inductively by range. The set of records of range 0 coincides with $D^V$; denote it by $RC_0$. The set of documents of range 0, denoted by $DC_0$, is the set of all finite subsets of $RC_0$, i.e. $DC_0 = 2^{RC_0}$. Suppose records and documents of range $0, 1, \dots, i$ are defined. Then records of range $i+1$ are defined as $RC_{i+1} = (D \cup \bigcup_{j=0}^{i} DC_j)^V$. That is, the value of a name can be either atomic data or a document of one of the previous ranges. A document of range $i+1$ is defined as a finite set of records of range $i+1$, i.e. $DC_{i+1} = 2^{RC_{i+1}}$.
Since $DC_0 \subset DC_1 \subset DC_2 \subset \cdots$ is by construction a monotonically increasing sequence, it has the limit $DC = \bigcup_{i \ge 0} DC_i$. By analogy, for records we have $RC = \bigcup_{i \ge 0} RC_i$. ■ Taking monotonicity into account, the definition of records of range $i+1$ can be rewritten as $RC_{i+1} = (D \cup DC_i)^V$.
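The inductive construction of Definition 1 can be tried out on a very small universe. The following Python sketch is an illustration only: the function names `finite_subsets`, `nominate_sets` and `levels`, and the encoding of a record as a frozenset of (name, value) pairs and of a document as a frozenset of records, are assumptions made here and are not part of the model itself.

```python
from itertools import chain, combinations, product

def finite_subsets(items):
    """All subsets of a finite collection, as frozensets (2^X for a finite X)."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]

def nominate_sets(names, values):
    """All mappings from subsets of `names` to `values`.

    Each mapping is stored as a frozenset of (name, value) pairs,
    which keeps it hashable so that it can be nested in documents."""
    result = set()
    for dom in finite_subsets(names):
        dom = sorted(dom)
        for image in product(values, repeat=len(dom)):
            result.add(frozenset(zip(dom, image)))
    return result

def levels(names, atoms, max_range):
    """Return ([RC_0, ..., RC_n], [DC_0, ..., DC_n]) for n = max_range."""
    rc = [nominate_sets(names, atoms)]
    dc = [set(finite_subsets(rc[0]))]
    for _ in range(max_range):
        # Values of names: atomic data or documents of any previous range.
        values = set(atoms).union(*dc)
        rc.append(nominate_sets(names, values))
        dc.append(set(finite_subsets(rc[-1])))
    return rc, dc

# Hypothetical tiny universe: one name "a" and one atomic value 1.
rc, dc = levels(names={"a"}, atoms={1}, max_range=1)
print(len(rc[0]), len(dc[0]), len(rc[1]), len(dc[1]))  # prints: 2 4 6 64
print(rc[0] < rc[1], dc[0] < dc[1])                    # prints: True True
```

Even for this universe the sets grow quickly, so the sketch is useful only for checking the strict inclusions between consecutive ranges, not for enumerating realistic data.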
Let us refine the definition of the range of a document and of a record. Since $DC_0 \subset DC_1 \subset \cdots$ and $RC_0 \subset RC_1 \subset \cdots$, the range of a document (record) in the sense of the previous definition is ambiguous: a document (record) of range $i+1$ may also have a smaller range.
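The ambiguity can be made concrete by computing the smallest index $i$ for which a document belongs to $DC_i$; by monotonicity the same document also belongs to every later set of the sequence. The sketch below is a minimal illustration under stated assumptions: a record is encoded as a Python `dict` from names to values, a document as a `list` of records, and the function names `record_range` and `document_range` are hypothetical.

```python
def record_range(record):
    """Smallest i such that the record belongs to RC_i.

    A record is a dict: name -> atomic value or document (a list)."""
    nested = [document_range(v) for v in record.values() if isinstance(v, list)]
    # A record holding only atomic data has range 0; otherwise it sits
    # one level above the deepest document it contains.
    return 1 + max(nested) if nested else 0

def document_range(document):
    """Smallest i such that the document belongs to DC_i.

    A document is a list of records; the empty document has range 0."""
    return max((record_range(r) for r in document), default=0)

flat = [{"a": 1}, {"a": 1, "b": 2}]
nested = [{"a": 1, "b": [{"c": 2}]}]
deeper = [{"x": nested}]

print(document_range(flat), document_range(nested), document_range(deeper))
# prints: 0 1 2
```

Here `flat` has smallest range 0 but, as an element of $DC_0 \subset DC_1 \subset \cdots$, it is also a document of range 1, 2 and so on.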
Definition 2. We say that the index $i$ of the set $DC_i$ is the range of a document $d$ if $d \in DC_i$, where $i$ is taken to be the smallest such index so that the range is unique. The same holds for records. ■
By analogy with the inclusion relation $\subseteq$ on abstract sets, we introduce the relation "to be a subdocument" for documents and the relation "to be a subrecord" for records, both of which take into account the inner structure of documents and records. Intuitively, the relation "to be a subdocument" ("to be a subrecord") means that all information contained in the subdocument (subrecord) is also contained in the document (record). Denote the subdocument relation by sdoc and the subrecord relation by srec. The relations are introduced inductively by range.
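The intuition that a subdocument carries no information absent from the enclosing document can be sketched as a pair of mutually recursive functions that follow the inductive scheme of Definition 3. This is an illustration under assumptions, not a definitive implementation: records are encoded as Python `dict` objects, documents as `list` objects, and the sample data (`order`, `bigger`, and the choice of `d2`) are hypothetical.

```python
def is_document(value):
    """In this encoding a document is a list of records."""
    return isinstance(value, list)

def srec(r1, r2):
    """True if record r1 is a subrecord of record r2."""
    for name, v1 in r1.items():
        if name not in r2:            # names of r1 must occur in r2
            return False
        v2 = r2[name]
        if is_document(v1) and is_document(v2):
            if not sdoc(v1, v2):      # nested documents: recurse
                return False
        elif is_document(v1) or is_document(v2):
            return False              # one atomic value, one document
        elif v1 != v2:                # atomic values must coincide
            return False
    return True

def sdoc(d1, d2):
    """True if every record of d1 is a subrecord of some record of d2."""
    return all(any(srec(r1, r2) for r2 in d2) for r1 in d1)

# Nested data: the second document extends the first one.
order = [{"id": 7, "items": [{"sku": "x"}]}]
bigger = [{"id": 7, "items": [{"sku": "x", "qty": 2}], "note": "n"}]
print(sdoc(order, bigger), sdoc(bigger, order))   # prints: True False

# Two different documents, each a subdocument of the other.
d1 = [{"a": 1, "b": 2}, {"a": 1}]
d2 = [{"a": 1, "b": 2}]               # assumed example value
print(sdoc(d1, d2), sdoc(d2, d1), d1 != d2)       # prints: True True True
```

The last line shows why such a relation can be reflexive and transitive without being antisymmetric: `d1` and `d2` contain the same information but are different documents.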
Definition 3. Suppose $r_1, r_2 \in RC_0$. Then $r_1 \ \mathrm{srec}\ r_2$ if and only if $r_1 \subseteq r_2$. Similarly, for documents of range zero we have
$$d_1 \ \mathrm{sdoc}\ d_2 \iff \forall r_1 \in d_1 \ \exists r_2 \in d_2 \ (r_1 \subseteq r_2).$$
Suppose these relations are defined for documents and records of range $j$, $j = 0, 1, \dots, i$. Then for records of range $i+1$ the relation $r_1^{i+1} \ \mathrm{srec}\ r_2^{i+1}$ means that the set of names of the record $r_1^{i+1}$ is included in the set of names of the record $r_2^{i+1}$, and that the values assigned to equal names are either both atomic data and equal, or both documents. In the second case their range is less than $i+1$, and the document from the record $r_1^{i+1}$ must be a subdocument of the corresponding document from the record $r_2^{i+1}$. Below, $\pi_1^2 r$ is the set of names of the record $r$ (i.e. the set of first components of all pairs forming the binary relation $r$). Formally, the relation srec is written as follows:
$$r_1^{i+1} \ \mathrm{srec}\ r_2^{i+1} \iff \pi_1^2 r_1^{i+1} \subseteq \pi_1^2 r_2^{i+1} \ \wedge\ \forall v \in \pi_1^2 r_1^{i+1} \ \Big( \big(r_1^{i+1}(v) \in D \wedge r_2^{i+1}(v) \in D \wedge r_1^{i+1}(v) = r_2^{i+1}(v)\big) \vee \big(r_1^{i+1}(v) \in DC_i \wedge r_2^{i+1}(v) \in DC_i \wedge r_1^{i+1}(v) \ \mathrm{sdoc}\ r_2^{i+1}(v)\big) \Big).$$
For documents of range $i+1$, $d_1^{i+1} \ \mathrm{sdoc}\ d_2^{i+1}$ if and only if for any record $r_1 \in d_1^{i+1}$ there exists a record $r_2 \in d_2^{i+1}$ such that $r_1 \ \mathrm{srec}\ r_2$. Notice that for a record $r_1$, generally speaking, several corresponding records in the document $d_2^{i+1}$ may exist. ■

It is obvious that the formal definition of sdoc and srec entirely corresponds to the informal idea of information inclusion given above.

Proposition 1. The relations srec and sdoc are preorders (i.e. they are reflexive and transitive but, generally speaking, not antisymmetric). ■

First we show reflexivity. The proof is by induction on the range of records and documents.

The induction basis. For records and documents of range zero there is nothing to prove. ▪

The step of induction. Suppose the proposition holds for records and documents of range $i$. We prove that reflexivity holds for records of range $i+1$ as well, i.e. $r^{i+1} \ \mathrm{srec}\ r^{i+1}$. Indeed:
• the set of names is included in itself;
• if the value of a name is atomic data, then it is equal to itself;
• if the value of a name is a document, then by construction its range is less than $i+1$; therefore, by the inductive assumption, it is a subdocument of itself.
For documents of range $i+1$ reflexivity holds as well, because each record of range $i+1$, as shown above, is a subrecord of itself. ▪

Secondly, we show transitivity. In other words, we prove the implications
$$r_1 \ \mathrm{srec}\ r_2 \wedge r_2 \ \mathrm{srec}\ r_3 \Rightarrow r_1 \ \mathrm{srec}\ r_3, \qquad d_1 \ \mathrm{sdoc}\ d_2 \wedge d_2 \ \mathrm{sdoc}\ d_3 \Rightarrow d_1 \ \mathrm{sdoc}\ d_3.$$

The basis of induction. For records of range zero transitivity holds because the subrecord relation coincides with set-theoretic inclusion in this case. The transitivity of the relation sdoc for documents of range zero is verified immediately. ▪

The step of induction. Suppose transitivity holds for documents and records of range $j$, $j = 0, 1, \dots, i$. We show that it then holds for documents and records of range $i+1$.

Consider records first. Suppose that $r_1^{i+1} \ \mathrm{srec}\ r_2^{i+1}$ and $r_2^{i+1} \ \mathrm{srec}\ r_3^{i+1}$. From the definition of srec it follows that the set of names of the record $r_1^{i+1}$ is included in the set of names of the record $r_2^{i+1}$, and the set of names of the record $r_2^{i+1}$ is included in the set of names of the record $r_3^{i+1}$. Then the set of names of the record $r_1^{i+1}$ is included in the set of names of the record $r_3^{i+1}$. Two cases are to be considered. First, let atomic data be the value of a name $v$ in the record $r_1^{i+1}$. From the definition it follows that the name $v$ belongs to the set of names of the record $r_2^{i+1}$ with the same atomic value, and that it belongs to the set of names of the record $r_3^{i+1}$ with the same value again. Therefore, if a name in the record $r_1^{i+1}$ has an atomic value, it has the same atomic value in the record $r_3^{i+1}$. Second, let a document $d_1$ be the value of the name $v$ in the record $r_1^{i+1}$. Then the value of the name $v$ in the record $r_2^{i+1}$ is, generally speaking, another document $d_2$, the value of the name $v$ in the record $r_3^{i+1}$ is a document $d_3$, and the following relations hold: $d_1 \ \mathrm{sdoc}\ d_2$ and $d_2 \ \mathrm{sdoc}\ d_3$. By construction, the ranges of the documents $d_1, d_2, d_3$ are strictly less than $i+1$. Therefore, by the inductive assumption, $d_1 \ \mathrm{sdoc}\ d_3$. ▪

Now consider documents of range $i+1$. Suppose $d_1^{i+1} \ \mathrm{sdoc}\ d_2^{i+1}$ and $d_2^{i+1} \ \mathrm{sdoc}\ d_3^{i+1}$. For any record $r_1 \in d_1^{i+1}$ there is a record $r_2 \in d_2^{i+1}$ with $r_1 \ \mathrm{srec}\ r_2$, and for $r_2$ there is a record $r_3 \in d_3^{i+1}$ with $r_2 \ \mathrm{srec}\ r_3$. By the transitivity of srec for records of range $i+1$ just proven, $r_1 \ \mathrm{srec}\ r_3$, hence $d_1^{i+1} \ \mathrm{sdoc}\ d_3^{i+1}$. ▪

At the same time, antisymmetry does not hold for these relations. Indeed, consider the following example. Suppose $d_1 = \{\{(a, 1), (b, 2)\}, \{(a, 1)\}\}$ and let $d_2$ be, for instance, $d_2 = \{\{(a, 1), (b, 2)\}\}$. Then $d_1 \ \mathrm{sdoc}\ d_2$ and $d_2 \ \mathrm{sdoc}\ d_1$, although $d_1 \neq d_2$.

Note that the relation sdoc is constructed from the relation srec by the logical scheme of the cofinality relation [14]. Consider the general case.
Let $\langle D, \leq \rangle$ be a set with a binary relation (generally speaking, the relation $\leq$ is not required to be a partial order).
Definition 4. The relation $\leq$ induces the following cofinality relation $\trianglelefteq$ on the Boolean (power set) $P(D)$ of the set $D$:
$$L_1 \trianglelefteq L_2 \iff \forall x \in L_1 \ \exists y \in L_2 \ (x \leq y).$$ ■

The following statements hold:
1. $\emptyset \trianglelefteq L$ for all $L \in P(D)$;
2. If the relation $\leq$ is reflexive, then the cofinality relation $\trianglelefteq$ is reflexive too;
3. If the relation $\leq$ is transitive, then the cofinality relation $\trianglelefteq$ is transitive;
4. If the relation $\leq$ is a partial order, then the cofinality relation partially orders the family of discrete subsets of the set $D$ (a subset $L$ is discrete if $\langle L, \leq \rangle$ is a trivial partially ordered set). ■

Proof. Clause 1 is verified directly: the implication in the definition of the cofinality relation is trivially true. ▪ Clauses 2-3 are verified directly as well. ▪ Let us prove clause 4. Given $L_1 \trianglelefteq L_2$ and $L_2 \trianglelefteq L_1$, we show that $L_1 = L_2$. Suppose $x$ is an arbitrary element such that $x \in L_1$. Then there is an element $y \in L_2$ such that $x \leq y$. From $L_2 \trianglelefteq L_1$ it follows that for the element $y$ there exists an element $z \in L_1$ such that $y \leq z$. So we have $x \leq y \leq z$. Therefore $x \leq z$, because the relation $\leq$ is transitive. Since $x, z \in L_1$ and $\langle L_1, \leq \rangle$ is a trivial partially ordered set, we have $x = z$. Therefore $x \leq y$ and $y \leq x$. Since the relation $\leq$ is antisymmetric, we obtain $x = y \in L_2$. Hence, because the element $x$ is arbitrary, $L_1 \subseteq L_2$. The inclusion $L_2 \subseteq L_1$ is proven in the same way. ▪ ■

Conclusion 1. If the initial relation $\leq$ is a preorder, then the cofinality relation $\trianglelefteq$ is a preorder as well. ■

Thus the preorder property of the relation sdoc is a logical consequence of the same property of the relation srec.
On the subject of the cofinality relation see also [15], [16].

DOCUMENTS INCLUSION ON MULTISETS
In real document-oriented DBMSs records can have duplicates. The situation is similar to tables in relational DBMSs, where rows are allowed to have duplicates. Therefore the constructed data model has to be refined using multisets, so that records may be repeated within documents.
Let us introduce some definitions of multiset theory which are needed to construct our model; see [9], [12], [17], [18].

Definition 5. A multiset $\alpha$ with base $U$ is a function $\alpha: U \to N^+$, where $U$ is a set and $N^+ = \{1, 2, \dots\}$ is the set of natural numbers without zero. ■
Here $\alpha(u)$, $u \in U$, is the number of copies (duplicates) of the base element $u$ (the multiplicity of the element $u$). Denote by $M_U$ the set of all multisets with base $U$. Now we define the set of records $RC^M$ and the set of documents $DC^M$. The definition is given inductively.
Definition 6. The set of records of range 0 coincides with the family of nominate sets, i.e. $RC^M_0 = D^V$. The set of documents of range 0 is the set of all finite multisets whose bases are finite sets of records of range 0, i.e. $DC^M_0 = \bigcup_{U \in 2^{RC^M_0}} M_U$. Suppose records and documents of range $j = 0, 1, \dots, i$ are defined. Then records of range $i+1$ are defined similarly to the previous case, i.e. $RC^M_{i+1} = (D \cup \bigcup_{j=0}^{i} DC^M_j)^V$. The value of a name can be either atomic data or a document of one of the previous ranges. Correspondingly, documents of range $i+1$ are introduced as finite multisets whose bases are finite sets of records of range $i+1$, i.e. $DC^M_{i+1} = \bigcup_{U \in 2^{RC^M_{i+1}}} M_U$.
Let us redefine the relations "to be a subdocument" (denoted by $\mathrm{sdoc}^M$) and "to be a subrecord" (denoted by $\mathrm{srec}^M$). The relations are introduced inductively by range.
Definition 7. For records of range zero the definition is the same, i.e. $r_1 \ \mathrm{srec}^M\ r_2$ if and only if $r_1 \subseteq r_2$. For documents of range zero, $d_1 \ \mathrm{sdoc}^M\ d_2$ if and only if $\forall r_1 \in U_{d_1} \ \exists r_2 \in U_{d_2} \ (r_1 \subseteq r_2)$, where $U_{d_1}$ and $U_{d_2}$ are the bases of the documents $d_1$ and $d_2$ respectively. Let these relations be defined for documents and records of range $j$, $j = 0, 1, \dots, i$. Then for records of range $i+1$ the relation $r_1^{i+1} \ \mathrm{srec}^M\ r_2^{i+1}$ is defined in the same way as above, except that if the values of equal names are documents, they must be in the relation $\mathrm{sdoc}^M$ rather than sdoc. For documents of range $i+1$, $d_1^{i+1}$ is a subdocument of $d_2^{i+1}$ if and only if for any record $r_1 \in U_{d_1^{i+1}}$ there exists a record $r_2 \in U_{d_2^{i+1}}$ for which $r_1 \ \mathrm{srec}^M\ r_2$. ■

ISSN 1335-8243 (print) © 2013 FEI TUKE ISSN 1338-3957 (online), www.aei.tuke.sk

Theory and Technology of Programming of
the Faculty of Cybernetics at Taras Shevchenko National University of Kyiv. She defended her PhD in the field of theory of programming in 2011; her thesis title was "Multisets Theory and its Applications". Since 2012 she has been working as a lecturer at the Department of Information Systems. Her scientific research focuses on the theory of programming, databases and multisets.