Foundations of ontology-based data access under bag semantics

Ontology-based data access (OBDA) is a popular approach for integrating and querying multiple data sources by means of a shared ontology. The ontology is linked to the sources using mappings, which assign to ontology predicates views over the data. The conventional semantics of OBDA is set-based, that is, the extension of the views defined by the mappings does not contain duplicate tuples. This treatment is, however, in disagreement with the standard semantics of database views and database management systems in general, which is based on bags and where duplicate tuples are retained by default. The distinction between set and bag semantics in databases is very significant in practice, and it influences the evaluation of aggregate queries. In this article, we propose and study a bag semantics for OBDA which provides a solid foundation for the future study of aggregate and analytic queries. Our semantics is compatible with both the bag semantics of database views and the set-based conventional semantics of OBDA. Furthermore, it is compatible with existing bag-based semantics for data exchange recently proposed in the literature. We show that adopting a bag semantics makes conjunctive query answering in OBDA coNP-hard in data complexity. To regain tractability of query answering, we consider suitable restrictions along three dimensions, namely, the query language, the ontology language, and the adoption of the unique name assumption. Our investigation shows a complete picture of the computational properties of query answering under bag semantics.


Introduction
Ontology-based data access (OBDA) is an increasingly popular approach to enable uniform access to multiple data sources with diverging schemas [2][3][4][5][6][7]. An ontology in OBDA is represented using a fragment of first-order logic and provides a unifying conceptual model for the data sources. The ontology is linked to the schema of each data source by global-as-view (GAV) mappings [8], which declaratively assign views over the data to predicates in the vocabulary of the ontology. Users of OBDA systems are typically unaware of the details of the source schemas or the mappings, and access the data by means of (conjunctive) queries formulated using only the vocabulary of the ontology. The answers to a query are those that logically follow from the union of the ontology and the materialisation over the sources of the views defined by the mappings.
The formalism of choice for representing ontologies in OBDA is the description logic DL-Lite R [9], which underpins the Web ontology language OWL 2 QL [10]. DL-Lite R is a language designed to ensure that all input queries to the OBDA system are first-order rewritable, that is, the answers to every input query can be obtained by first reformulating the query as a set of relational queries over the source schemas, and then evaluating the reformulated queries over the source data [2]. In practice, such reformulation involves two steps known as rewriting and unfolding [8,11]. In the rewriting step, the original query is transformed into a first-order query that captures the relevant information in the ontology; in the unfolding step, the query computed in the rewriting step is further rewritten as a set of relational queries over the schemas of the sources using the relevant mappings.
OBDA has received a great deal of attention in recent years. Researchers have studied the limits of first-order rewritability in ontology languages [9,12], established bounds on the size of rewritings [13,14], developed optimisation techniques [15][16][17], and implemented systems well-suited for real-world applications [3,5].
Example 1. Consider a music encyclopedia that collects metadata about music records and makes it available to the public. The encyclopedia gathers data from record labels, which are maintained in separate relational tables. For instance, it contains the table Columbia(art_nm, r_title, r_year, r_date, r_loc) that provides the names of the records (r_title) artists (art_nm) have cut on the record label Columbia together with their release years (r_year), recording dates (r_date), and locations (r_loc). In addition, we consider the table Verve_Wind(name, title, year) that provides information about the names of the records (title) artists (name) playing wind instruments have cut on Verve (for the sake of illustration, we assume that records by Verve are organised in different tables according to the type of instrument played by the lead performer; in our running example, we restrict ourselves to wind instruments). The following is a relational database instance D ex providing information about the records that trumpeter Miles Davis and pianist Keith Jarrett have cut on these two labels.
To integrate this data, the music encyclopedia relies on a DL-Lite R ontology with TBox T ex , which defines unary predicates, called concepts, such as Musician, WindPlayer, and Record, and binary predicates, called roles, such as hasMusician.
TBox T ex describes the meaning of these predicates using the following axioms, called inclusions: WindPlayer ⊑ Musician, saying that every player of a wind instrument is a musician, and Record ⊑ ∃hasMusician, saying that every record is associated with some musician.
The extension of the concepts and roles in the ontology based on the data in Columbia and Verve_Wind tables is determined by a set of GAV mapping assertions of the form Φ(x) ⇝ A(x) or Ψ(x, y) ⇝ P(x, y).
The distinction between set and bag semantics in databases is significant in practice; in particular, it influences the evaluation of aggregate queries that combine various aggregation functions such as MIN, MAX, SUM, COUNT, and AVG with the grouping functionality provided in SQL by the GROUP BY construct. The mismatch between the set semantics of OBDA and the bag semantics of database views manifests itself already in our running example.
Example 2. Consider TBox T ex , mappings M ex , and database instance D ex specified in Example 1. Consider also the query q ex (x) = Musician(x), which asks for all musicians. Under the conventional OBDA semantics, the virtual ABox A ex corresponding to D ex and M ex comprises the following assertions:
To compute these answers in practice, however, an OBDA system would exploit the first-order rewritability property of OBDA. In particular, it would first rewrite q ex (x) into the union of queries Musician(x) and WindPlayer(x) using inclusion WindPlayer ⊑ Musician in T ex . Then, in a second step, the system would unfold each disjunct of the rewritten query into a query over the database D ex using the mappings in M ex . The result is an SQL query comprising the union of the SQL queries on the left-hand side of mappings σ 1 and σ 2 , respectively. This query is finally evaluated directly over D ex .
According to the semantics of OBDA, the answers to q ex (x) over the ontology ⟨T ex , A ex ⟩ should coincide with the evaluation of the unfolded SQL query over D ex . This is, however, not the case in our example. In particular, evaluating the unfolded query over D ex yields a bag containing two occurrences of M. Davis and one occurrence of K. Jarrett. This is because duplicates in the answers to SQL queries are kept by default unless duplicate elimination is explicitly requested by using the DISTINCT operator in the SELECT clause of a query.¹ This discrepancy between OBDA semantics and the semantics of database views may occur even if the TBox of the ontology is empty. In particular, in such a case the evaluation of q ex (x) over ABox A ex does not coincide with the evaluation of the rewritten query (the SQL query on the left-hand side of σ 1 in this case) over D ex .
Example 2 suggests that the conventional approach to OBDA can faithfully represent only a subset of GAV mapping assertions: those whose SQL query contains the DISTINCT operator in the top-level SELECT clause.
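The mismatch described in Example 2 can be reproduced with a minimal Python sketch. The rows below are hypothetical stand-ins (the actual instance D ex is not reproduced here), chosen only so that the multiplicities agree with those reported in the example. We also assume, for illustration, that one mapping populates Musician from Columbia and another populates WindPlayer from Verve_Wind. A `collections.Counter` plays the role of a bag, and the two unfolded disjuncts are combined with maximal union.

```python
from collections import Counter

# Hypothetical rows (artist name, record title); only the name column matters here.
columbia = [("M. Davis", "record A"), ("M. Davis", "record B"), ("K. Jarrett", "record C")]
verve_wind = [("M. Davis", "record D")]

# Bags returned by the two unfolded disjuncts: duplicates are kept,
# as in SQL when DISTINCT is not requested.
musician_bag = Counter(name for name, _ in columbia)
wind_player_bag = Counter(name for name, _ in verve_wind)

# Maximal union of the two disjuncts: pointwise maximum of multiplicities.
bag_answers = musician_bag | wind_player_bag

# Set semantics forgets the multiplicities altogether.
set_answers = set(bag_answers)
```

Under these assumptions the bag evaluation keeps two occurrences of M. Davis, whereas the set-based answers contain each musician once; this is the gap that a top-level DISTINCT would hide.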
In this paper, we propose and study a bag semantics for OBDA, which provides a solid foundation for future research on aggregate and analytic queries. Our semantics is compatible with (i) the bag semantics of database views; (ii) the set-based conventional semantics of OBDA; and (iii) the bag semantics recently proposed by Hernich and Kolaitis [20] in the context of data exchange.

Contributions and organisation
The contributions and organisation of this paper are as follows. In Section 3 we introduce the bag semantics of an OBDA setting ⟨T, M, D⟩ consisting of a DL-Lite R TBox T, a set of GAV mappings M, and a (bag) database instance D. We also define the notion of certain answers to conjunctive queries as well as the associated query answering problem.
In Section 4 we define the ontology language DL-Lite b R and two of its natural fragments. A distinctive feature of DL-Lite b R is that ABoxes are bags of assertions rather than sets. Syntactically, this language allows for the same TBoxes as DL-Lite R , but their semantics also takes multiplicities into account. We show that, as in the case of conventional OBDA, the certain answers to a query over an OBDA setting ⟨T, M, D⟩ can be characterised as those that logically follow from the union of the TBox T and a virtual bag ABox A M,D representing the materialisation of the views defined by the mappings over the database D. As a result, the data complexity of OBDA query answering under bag semantics coincides with that of query answering over DL-Lite b R .
In Section 5 we then establish the relationship between bag and set semantics of DL-Lite b R and DL-Lite R , respectively. In particular, we show that, on the one hand, satisfiability checking in DL-Lite b R reduces to satisfiability checking in DL-Lite R and, on the other hand, query answers under bag and set semantics coincide if multiplicities are ignored. There are, however, key properties of the conventional semantics of DL-Lite R that are no longer satisfied by the bag semantics: unlike the set case, DL-Lite b R ontologies may not have a universal model for conjunctive queries, that is, a single model the answers over which are precisely the certain answers to all such queries; moreover, query answers may be sensitive to the adoption of the unique name assumption (UNA).
¹ Bag semantics offers two types of union, called maximal and arithmetic; the first computes the maximum number of occurrences for every tuple in the provided operands whereas the second adds these numbers up. The answer we provide here is based on maximal union for reasons that will be made clear later on in this article. The use of arithmetic union does not affect our motivation.
In Section 6 we show that conjunctive query answering under bag semantics is computationally more challenging than in the set case. In particular, we establish three incomparable coNP lower bounds for the data complexity of the problem. We first show that it is coNP-hard even if we restrict the ontology language to DL-Lite b core (that is, the language where role inclusions are disallowed), regardless of whether the UNA is adopted or not. Second, we show that without the UNA the problem is hard even if the ontologies do not have existential quantification on the right-hand side of concept inclusions (i.e., in the DL-Lite b rdfs ontology language) and the queries are restricted to the so-called rooted conjunctive queries [21], that is, conjunctive queries with all their connected components containing at least one constant or answer variable; this class of queries comprises most practical OBDA queries. The third hardness result is established for the same settings as in the second case except that the UNA is adopted, but there are no restrictions on inclusion axioms.
In Section 7 we make a first step on the way to regain tractability of conjunctive query answering. In particular, we show that rooted conjunctive queries admit a universal model over DL-Lite b core ontologies, regardless of the adoption of the UNA. In Section 8 we employ this result to show that rooted conjunctive queries are rewritable over DL-Lite b core to queries in a bag analogue of relational calculus, BCALC, which can be evaluated directly on the ABox of the ontology. Using known results on bag databases, we conclude that the corresponding query answering problem is tractable, in particular, in LogSpace.
In Section 9 we establish similar results for arbitrary conjunctive queries and DL-Lite b rdfs , that is, the language that allows for role inclusions but does not allow for existential quantification on the right-hand side of concept inclusions. These positive results hold, however, only when the UNA is adopted; we already know that without the UNA the problem is coNP-hard and hence non-rewritable to BCALC.
In Section 10 we combine the results of the previous two sections and establish rewritability and tractability of query answering for the ontology language DL-Lite b R − , which captures both DL-Lite b core and DL-Lite b rdfs . This language allows for both role inclusions and existential quantification on the right-hand side of concept inclusions. To establish rewritability, however, the language essentially forbids interaction between these two features. Also, the setting inherits all of the restrictions imposed on the previous cases, that is, it adopts the UNA and considers only rooted conjunctive queries.
Finally, in Section 11 we provide a comprehensive discussion of related work, and in Section 12 we discuss possible extensions of our OBDA framework.

Preliminaries
In this section we recapitulate the basic definitions that we use in the remainder of the paper. In Section 2.1 we introduce the syntax and (set-based) model-theoretic semantics of the standard ontology and query languages for OBDA. Then, in Section 2.2, we introduce conjunctive queries and define the associated query answering problem. In Section 2.3 we review the common operations on bags. Finally, in Section 2.4 we specify a bag relational calculus that we will exploit later on to express query rewritings over ontologies; our calculus is embeddable into the bag algebra for relational databases by Grumbach and Milo [22], which we discuss in the accompanying Appendix.²

Syntax and semantics of DL-Lite R ontologies
We fix a vocabulary consisting of countably infinite and pairwise disjoint sets of individuals I (or constants), variables X, atomic concepts C (unary predicates) and atomic roles R (binary predicates). A role is an atomic role P in R or its inverse P − . A concept is an atomic concept in C or an existentially quantified concept ∃R, where R is a role. An inclusion axiom (or just inclusion) is an expression of the form S1 ⊑ S2 with S1 and S2 either both concepts, in which case we speak of a concept inclusion, or both roles, in which case we speak of a role inclusion. A disjointness axiom is an expression of the form Disj(S1, S2) with S1 and S2 either both concepts or both roles. A DL-Lite R TBox is a finite set of inclusions and disjointness axioms. A concept assertion is of the form A(a) with a ∈ I and A ∈ C. A role assertion is of the form P(a, b) with a, b ∈ I and P ∈ R. A DL-Lite R ABox is a finite set of concept and role assertions. A DL-Lite R ontology is a pair ⟨T, A⟩ with T a DL-Lite R TBox and A a DL-Lite R ABox.
An interpretation I (or set interpretation, when the context matters) is a pair ⟨Δ^I, ·^I⟩, where the domain Δ^I is a nonempty set, and the interpretation function ·^I maps each individual a ∈ I to an element a^I ∈ Δ^I, each atomic concept A ∈ C to a subset A^I of Δ^I, and each atomic role P ∈ R to a subset P^I of Δ^I × Δ^I. The interpretation function extends to other concepts and roles as follows:
(P −)^I = {(u, v) | (v, u) ∈ P^I},   (∃R)^I = {u ∈ Δ^I | (u, v) ∈ R^I for some v ∈ Δ^I}.
² There are alternative algebraic query languages for bags, such as that by Libkin and Wong [23]; however, their expressive power is equivalent to that of Grumbach and Milo's algebra and hence the choice of an underpinning algebraic query language is immaterial to the results in this article.
Interpretation I is finite if so is Δ^I. An interpretation I = ⟨Δ^I, ·^I⟩ satisfies the unique name assumption (or UNA) whenever I interprets distinct individuals from I with distinct elements from Δ^I, that is, the inequality a^I ≠ b^I holds when a, b ∈ I with a ≠ b. An interpretation I satisfies an inclusion S1 ⊑ S2 if S1^I ⊆ S2^I, and it satisfies a disjointness axiom Disj(S1, S2) if S1^I ∩ S2^I = ∅.
It is well known that a DL-Lite R ontology is satisfiable if and only if it is satisfiable under the UNA, and this can be tested in NLogSpace in general and in AC 0 if the TBox is fixed; similarly, a DL-Lite R TBox entails an axiom if and only if it entails the axiom under the UNA [24].
In this paper, we will also consider the following three sublanguages of DL-Lite R :
- DL-Lite core restricts DL-Lite R by disallowing in TBoxes role inclusions and role disjointness axioms;
- DL-Lite rdfs restricts DL-Lite R by disallowing in TBoxes concept and role disjointness axioms as well as existentially quantified concepts on the right-hand side of concept inclusions;
- DL-Lite R − restricts DL-Lite R by disallowing in TBoxes T inclusions of the form C ⊑ ∃R whenever T contains inclusion R ⊑ S for some role S different from R.
Note that DL-Lite core and DL-Lite rdfs are incomparable fragments of DL-Lite R , whereas DL-Lite R − extends both DL-Lite core and DL-Lite rdfs . The restrictions imposed in DL-Lite R − limit the interaction between concept and role inclusions in TBoxes.

Queries over ontologies
A conjunctive query (or CQ) q(x) with answer variables x is a formula ∃y. φ(x, y) in first-order logic with equality, where x and y are (possibly empty) repetition-free tuples of variables from X and φ(x, y) is a conjunction of atoms of the form A(t), P(t1, t2), or (z = t), where A ∈ C, P ∈ R, z ∈ x ∪ y, and t, t1, t2 ∈ x ∪ y ∪ I. If x is clear from the context, then we may write q instead of q(x). The equality atoms of the form (z = t) in a CQ q(x) = ∃y. φ(x, y) yield an equivalence relation ∼ on terms x ∪ y ∪ I, and we write [t] for the equivalence class of a term t. The Gaifman graph of q(x) has a node [t] for each t ∈ x ∪ y ∪ I in φ, and an edge {[t1], [t2]} for each atom P(t1, t2) in φ. In what follows, we (silently) assume that all CQs q(x) = ∃y. φ(x, y) are safe, that is, such that for each z ∈ x ∪ y, the class [z] contains either an individual from I or a variable mentioned in an atom of φ(x, y) that is not an equality. A CQ is Boolean if its answer variables x are the empty tuple ⟨⟩. Furthermore, following Bienvenu et al. [21], a CQ q(x) is rooted if each connected component of its Gaifman graph has a node with a term in x ∪ I.
A union of CQs (UCQ) is a disjunction of CQs with the same answer variables. A UCQ is Boolean (or rooted, or both) if so are all of its component CQs.
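The rootedness condition can be checked mechanically. The following Python sketch is only an illustration under assumed conventions that are not part of the formal development: a CQ is a list of `(predicate, terms)` pairs, equality atoms use the hypothetical predicate name `"="`, and every term that is not a listed variable is treated as a constant. Terms joined by an equality or by a role atom are merged with a union-find structure, so the resulting classes are exactly the connected components of the Gaifman graph.

```python
def is_rooted(atoms, answer_vars, exist_vars):
    """Return True if every connected component of the Gaifman graph
    contains an answer variable or a constant."""
    variables = set(answer_vars) | set(exist_vars)
    parent = {}

    def find(t):
        parent.setdefault(t, t)
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(s, t):
        parent[find(s)] = find(t)

    for _pred, terms in atoms:
        for t in terms:
            find(t)                        # register every term as a node
        if len(terms) == 2:
            union(terms[0], terms[1])      # role atom or equality atom

    terms_seen = list(parent)
    # Components that contain an answer variable or a constant.
    rooted = {find(t) for t in terms_seen
              if t in answer_vars or t not in variables}
    return all(find(t) in rooted for t in terms_seen)
```

For instance, the query ∃y. hasMusician(x, y) with answer variable x is rooted, whereas the Boolean query ∃y, z. hasMusician(y, z) is not.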
The answers q^I to a (U)CQ q(x) over an interpretation I are the set of all tuples a of individuals from I with |a| = |x| such that the formula q(a) holds in I (where |a| and |x| are the sizes of a and x, respectively). The certain answers to a (U)CQ q(x) over a DL-Lite R ontology K are the intersection of the answers to q(x) over all models of K. The certain answers q^K to q(x) over K under the UNA are the intersection of the answers to q(x) over all models of K satisfying the UNA. In fact, for DL-Lite R , the (usual) certain answers always coincide with the certain answers under the UNA, and checking whether a tuple of individuals is in the certain answers to a (U)CQ q over a DL-Lite R ontology ⟨T, A⟩ is an NP-complete problem with AC 0 data complexity (i.e., when T and q are fixed) [9,24]. The latter follows from the rewritability of the class of UCQs to itself over DL-Lite R [9]. Informally, the key ideas behind rewritability are as follows. First, every DL-Lite R ontology possesses a so-called canonical interpretation, which is a model if the ontology is satisfiable, and which is homomorphically embeddable into every other model. Moreover, this model is universal for UCQs in the sense that the answers to every UCQ on the canonical interpretation coincide with the certain answers over the ontology. Finally, for every UCQ and every TBox it is always possible to construct another UCQ such that, for every ABox, the answers to the original UCQ over the universal model of the resulting ontology are the same as the answers to the new UCQ over just the ABox. We will formally define the notions of rewritability, canonical interpretation, and universal model later in the article.
To conclude, we note that our definition of CQs is slightly non-standard in that it allows for equality atoms, which are usually regarded as inessential. Making equalities explicit in the query will be convenient later on when computing query rewritings, where we will sometimes need to force an answer variable to become equal to another answer variable or to an individual. This is, however, just a technicality; in particular, none of our complexity lower bounds to query answering or negative rewritability results depend on the presence of equalities in the query.

Bags
A bag over a set M is a function Ω : M → N ∞ 0 , which assigns to each element of M its multiplicity (a non-negative integer or ∞).
Note that bag difference is well-defined only when Ω2 does not assign ∞ to any element in M. Also, the unary duplicate elimination operator ε is defined for a bag Ω over a set M and for every c ∈ M as follows: (ε(Ω))(c) = 1 if Ω(c) > 0, and (ε(Ω))(c) = 0 otherwise.
Note that, for every two finite bags Ω1 and Ω2 over the same set M, the following identities hold [26]: However, these identities may not hold if the bags are infinite.
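For finite bags, the usual operations can be tried out directly with Python's `collections.Counter`. The sketch below assumes the standard pointwise readings of the operations (minimum for intersection, maximum for maximal union, sum for arithmetic union, truncated subtraction for difference); infinite multiplicities are not modelled, and the helper name `dedup` is ours.

```python
from collections import Counter

b1 = Counter({"a": 2, "b": 1})
b2 = Counter({"a": 1, "c": 3})

intersection = b1 & b2   # pointwise minimum
max_union = b1 | b2      # maximal union: pointwise maximum
arith_union = b1 + b2    # arithmetic union: pointwise sum
difference = b1 - b2     # truncated subtraction: max(0, m1 - m2)

def dedup(bag):
    """Duplicate elimination: multiplicity 1 for every element that occurs."""
    return Counter({c: 1 for c, m in bag.items() if m > 0})
```

Note that `Counter` silently drops elements whose multiplicity falls to zero, which matches the convention that elements outside a bag have multiplicity 0.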

A calculus for querying bag databases
A database schema is a non-empty finite set S of predicates with non-negative arities that are disjoint from C and R. Given a database schema S and a set of constants (i.e., individuals) I, a database fact is an expression of the form S(a), where S ∈ S and a is a tuple of constants from I of size equal to the arity of S. Then, a bag database instance D is a finite bag over all the facts over S and I.
Grumbach and Milo [22] proposed an algebraic query language for bag databases called BALG, which is sufficiently powerful to capture relational algebra over bags [19]. BALG allows for nesting of bags and can be seen as the union of the sublanguages BALG k , k ≥ 1, each of which allows for up to k − 1 levels of nesting. In this article we restrict ourselves to BALG 1 , which does not allow for bag nesting. Grumbach and Milo [22] studied the data complexity of answering BALG 1 queries under a unary encoding of numbers in the input and showed that it is strictly "sandwiched" between complexity classes AC 0 and LogSpace; it is therefore tractable, but strictly harder than the data complexity of relational queries over set databases. We defer a full treatment of Grumbach and Milo's algebra and its associated decision problem to Appendix A.
We next introduce BCALC, a calculus for querying bag databases based on Grumbach and Milo's algebra. Using a calculus formulation instead of an algebraic one will significantly simplify the presentation of our query rewriting algorithms later on. In Appendix B we show that our calculus can be easily embedded into BALG 1 and thus inherits its LogSpace upper bound for data complexity of query answering.
The syntax of BCALC for a database schema, formally presented in the following inductive definition, extends the syntax of (U)CQs given in Section 2.2 with several new operations. Domain-dependent queries, inexpressible in algebraic query languages, are precluded by introducing restrictions on the use of variables (intuitively, a query is domain-dependent if its answers over a fixed database instance may change when the underlying set of constants is modified; see [28] for details).
Definition 3. Given a database schema S and a set of constants I, a BCALC query Φ(x) with answer variables x is any of the following, where Ψ, Ψ1, and Ψ2 are BCALC queries:
- S(t), where S ∈ S and t is a tuple over x ∪ I of size equal to the arity of S mentioning all variables in x;
- ∃y. Ψ(x, y), where y is a tuple of distinct variables from X that are not in x;
A BCALC query is positive if it does not mention the difference operator \. A positive BCALC query is a BCALC conjunctive query (CQ) if it additionally does not mention operators ∨, ∨·, and δ. A BCALC maximal (or arithmetic) union of CQs is a BCALC query of the form Φ1(x) ∨ ··· ∨ Φn(x) (or of the form Φ1(x) ∨· ··· ∨· Φn(x), respectively), where each Φi is a BCALC CQ.
Next we formally define the semantics of BCALC queries, which are bags of tuples of constants.
Definition 4. The bag answers Φ^D to a BCALC query Φ(x) over a bag database instance D is the finite bag over I^|x| defined inductively by the following equations for every tuple a over I with |a| = |x|, where ν : x ∪ I → I is the function such that ν(x) = a and ν(a) = a for all a ∈ I:
The decision problem of query answering for BCALC is defined as follows, where all numbers in the input are assumed to be represented in unary and the bag (i.e., the database instance) is explicitly defined only for a finite number of facts while the multiplicities of all other facts are assumed to be 0.

QueryAnswering[BCALC]
Input: BCALC query Φ(x), bag database instance D, tuple a of constants over I, and number k ∈ N 0 .
Question: Is Φ^D(a) ≥ k?
The data complexity of this problem is the complexity when the query Φ is considered to be fixed and only D, a, and k form the input.
The LogSpace upper bound for the data complexity of QueryAnswering[BCALC] is obtained by showing that for each BCALC query Φ one can construct a BALG 1 algebra expression E such that the bag answers to Φ over every bag database D coincide with the bag answers to E over D; this suffices because, as we already mentioned, BALG 1 can be evaluated in LogSpace. The proof of this claim, which is rather technical but conceptually straightforward, is deferred to Appendix B.

Proposition 5. QueryAnswering[BCALC] is in LogSpace in data complexity.
We conclude by observing that in the literature on query optimisation under bag semantics [29][30][31] it is common to encounter the notion of bag-set semantics for databases, where input database instances are sets (that is, do not allow for multiplicities greater than 1) while query answers and views are permitted to be bags. The bag semantics we consider in this article generalises the bag-set semantics, and restricting ourselves to bag-set semantics would not change any of our results.

Ontology-based data access under bag semantics
In this section, we introduce our OBDA framework as a natural generalisation of that by Poggi et al. for OBDA under set semantics [2]. We start by defining the syntax of OBDA settings.
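Definition 6. A bag OBDA setting is a triple ⟨T, M, D⟩, where
- T is a DL-Lite R TBox;
- M is a set of global-as-view (GAV) mapping assertions (or mappings) of the form Φ(x) ⇝ A(x) or Ψ(x, y) ⇝ P(x, y), where Φ and Ψ are BCALC queries, while A and P are an atomic concept and atomic role, respectively; and
- D is a bag database instance.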
A couple of observations about Definition 6 are in order. First, recall that in all our motivating examples so far we have written mappings using SQL queries, which reflects the way in which mappings are defined in practice. To formally study OBDA, however, the use of a bag query language close to first-order logic, such as BCALC, is more appropriate. Second, in contrast to the definitions by Poggi et al. [2], we do not allow for function symbols on the right-hand side of mappings. This restriction does not affect the computational properties of query answering in OBDA under set semantics [2], and it is adopted in most theoretical papers on data integration [8,32]; it is also immaterial to our technical results, and yet allows us to simplify the presentation.
Example 7. The mappings M ex in Example 1 are equivalently expressed using BCALC as follows:
hasMusician(x, y),
The semantics of bag OBDA settings is based on bag interpretations I, which are defined as set interpretations (see Section 2.1) with the exception that concepts and roles are now interpreted as bags rather than sets. The extension of the interpretation function to non-atomic concepts and roles is defined in a natural way: for example, the concept ∃P for an atomic role P is interpreted as the bag projection of the interpretation P^I of P to its first component, where each occurrence of a pair (u, v) in P^I contributes separately to the multiplicity of a domain element u in (∃P)^I.

Definition 8. A bag interpretation I is a pair ⟨Δ^I, ·^I⟩ where the domain Δ^I is a non-empty set, and the interpretation function ·^I maps each individual a ∈ I to an element a^I ∈ Δ^I, each atomic concept A ∈ C to a bag A^I over Δ^I, and each atomic role P ∈ R to a bag P^I over Δ^I × Δ^I. Interpretation function ·^I extends to inverse roles (P −)^I and existentially quantified concepts (∃R)^I, for P ∈ R and R a role, as follows, for all u, v ∈ Δ^I:
(P −)^I(u, v) = P^I(v, u),   (∃R)^I(u) = Σ_{v ∈ Δ^I} R^I(u, v).
A bag interpretation I is finite if Δ^I is a finite set and I assigns a finite bag to each A ∈ C and each P ∈ R.
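The bag projection described above, in which every occurrence of a pair contributes to the multiplicity of its first component, can be sketched in Python for finite bags. The representation (a `Counter` over pairs) and the function names `inverse` and `exists` are illustrative assumptions, as are the individuals in the usage note.

```python
from collections import Counter

def inverse(role):
    """Bag interpretation of the inverse role: swap the two components."""
    return Counter({(v, u): m for (u, v), m in role.items()})

def exists(role):
    """Bag interpretation of the existential concept: sum the multiplicities
    of all pairs that share the same first component."""
    out = Counter()
    for (u, _v), m in role.items():
        out[u] += m
    return out
```

With a hypothetical role bag `Counter({("r1", "davis"): 2, ("r1", "jarrett"): 1, ("r2", "davis"): 1})`, the element `r1` receives multiplicity 3 under `exists`, and `davis` receives multiplicity 3 under `exists(inverse(...))`.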
We are now ready to specify the semantics of bag OBDA settings in terms of bag interpretations (note that satisfaction of axioms is defined in the same way as in the set case, but the symbols ⊆, ∩, and ∅ denote the subbag relation, bag intersection, and the empty bag, respectively).

Φ(x) ⇝ A(x) and Ψ(x, y) ⇝ P(x, y) in M, and all individuals a, a1, a2 ∈ I:
To show that I ex |= b ⟨T ex , M ex , D ex ⟩, we argue that I ex satisfies M ex with respect to D ex and that I ex |= b T ex . For the former, we first compute the bag answers to the BCALC queries Φ1, ..., Φ6 appearing on the left-hand side of the mappings σ 1 , ..., σ 6 over database instance D ex . These are specified as follows:
We next discuss an important aspect of Definition 9 concerning the presence of different mappings defining the same view. In such cases, the extension of the view intuitively corresponds to the union of the answers to the queries specified on the left-hand side of the contributing mappings. However, bag query languages, such as BCALC, come with two versions of the union operation: maximal and arithmetic union. Moreover, in different settings one of these unions can be more intuitive and preferable than the other. On the one hand, Definition 9 tacitly commits to the maximal union by requiring that a model for M and D satisfies each contributing mapping independently. On the other hand, this is not a limitation of our OBDA framework since GAV mapping assertions can always be rewritten to reflect the alternative choice based on arithmetic union.
Example 11. Consider the bag OBDA setting ⟨T ex , M' ex , D ex ⟩ obtained from our running example by augmenting M ex to the set M' ex that additionally contains mapping σ 7 . Note that both mappings σ 1 and σ 7 define the extension of concept Musician. By Definition 9, interpretations such as I ex in Example 10, which interpret Musician as the maximal union of the bags corresponding to the musicians mentioned in the Columbia and Verve_Wind tables, are valid models of ⟨T ex , M' ex , D ex ⟩. Let us now define M'' ex as the mappings obtained from M' ex by replacing σ 1 by a mapping with right-hand side Musician(x).
In this case, every model of ⟨T ex , M'' ex , D ex ⟩ interprets Musician as the arithmetic union of the bags corresponding to musicians in the relevant tables. In particular, interpretation I ex in Example 10 is not a model, as required.
We are now ready to define CQ answering under bag semantics. We first define the answers q I to a CQ q(x) over a bag interpretation I ; this is a natural extension to (possibly infinite) interpretations of the notion of bag answers to a CQ over a bag database (see Section 2.4). Specifically, q I is a bag of tuples of individuals such that each valid embedding λ of the atoms in q into I contributes separately to the multiplicity of the tuple λ(x) in q I , and where the contribution of each specific λ is the product of the multiplicities of the images of the query atoms under λ in I .

Definition 12. Let q(x) = ∃y. φ(x, y) be a CQ and I = ⟨Δ^I, ·^I⟩ be a bag interpretation. The bag answers q^I to q over I are the bag over tuples of individuals from I of size |x| such that, for every such tuple a,
q^I(a) = Σ_{λ ∈ Λ} Π_{S(t) in φ(x, y)} S^I(λ(t)),
where Λ is the set of all valuations λ : x ∪ y ∪ I → Δ^I such that λ(x) = a^I, λ(a) = a^I for each a ∈ I, and λ(z) = λ(t) for each z = t in φ(x, y).
If q is Boolean then the bag answers q^I are defined only for the empty tuple ⟨⟩. Also, conjunction φ(x, y) may contain repeated atoms, and hence can be seen as a bag of atoms; while repeated atoms are redundant in the set case, they are essential in the bag setting [29,33], and thus the definition of q^I(a) should be read as treating each copy of a query atom S(t) separately in the product.
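The sum-of-products reading of bag answers can be sketched in Python for finite interpretations. This is only an illustration under simplifying assumptions that are not part of the definition: the domain is finite, constants are interpreted by themselves, equality atoms are omitted, and the names `bag_answers`, `interp`, and the sample predicates are hypothetical.

```python
from collections import Counter
from itertools import product


def bag_answers(answer_vars, atoms, interp, domain):
    """Bag answers to a CQ over a finite bag interpretation.

    answer_vars: list of variable names (strings starting with "?").
    atoms: list of (predicate, terms); repeated atoms stay repeated in the list.
    interp: dict mapping each predicate to a Counter over tuples of elements.
    domain: finite list of domain elements (constants denote themselves).
    """
    atom_vars = {t for _, terms in atoms for t in terms if t.startswith("?")}
    variables = sorted(set(answer_vars) | atom_vars)
    answers = Counter()
    for values in product(domain, repeat=len(variables)):
        valuation = dict(zip(variables, values))
        contribution = 1
        for predicate, terms in atoms:
            image = tuple(valuation.get(t, t) for t in terms)
            # Each atom copy multiplies in the multiplicity of its image.
            contribution *= interp.get(predicate, Counter())[image]
        if contribution:
            # Every valuation contributes separately to its answer tuple.
            answers[tuple(valuation[v] for v in answer_vars)] += contribution
    return answers


interp = {
    "P": Counter({("a", "b"): 2, ("a", "c"): 1}),
    "B": Counter({("b",): 3}),
}
domain = ["a", "b", "c"]

# q1(x) = ∃y. P(x, y) ∧ B(y)
q1 = bag_answers(["?x"], [("P", ("?x", "?y")), ("B", ("?y",))], interp, domain)
# q2(x) = ∃y. P(x, y) ∧ P(x, y), with the atom repeated
q2 = bag_answers(["?x"], [("P", ("?x", "?y")), ("P", ("?x", "?y"))], interp, domain)
print(q1, q2)
```

In `q2` the repeated atom squares each edge multiplicity before summing, which illustrates why repeated atoms are not redundant under bag semantics.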
The following definition of certain answers, which captures open-world query answering, is a natural extension of the set notion to bags: a query answer is certain with multiplicity k if it is an answer with multiplicity at least k over every model of the OBDA setting.

Definition 13. The bag certain answers q^{T,M,D} to a CQ q over a bag OBDA setting ⟨T, M, D⟩ are the bag

$$q^{\mathcal{T},\mathcal{M},\mathcal{D}} \;=\; \bigcap_{\mathcal{I} \,\models^{b}\, \langle \mathcal{T},\mathcal{M},\mathcal{D} \rangle} q^{\mathcal{I}}.$$

The bag certain answers under the UNA are defined in the same way except that the intersection ranges only over the models satisfying the UNA.

The decision problem BagCertObda[Q, O] corresponding to computing the bag certain answers to a CQ from a class Q over an OBDA setting with a TBox from an ontology language O (i.e., DL-Lite R or one of its sublanguages) is defined as follows, where we again assume that all numbers in the input are represented in unary and the bag is explicitly defined only for a finite number of facts.

BagCertObda[Q, O]
Input: CQ q from Q, bag OBDA setting ⟨T, M, D⟩ with T from O, tuple a of individuals from I, and number k ∈ ℕ_0^∞.
Question: Is q^{T,M,D}(a) ≥ k?

The UNA version BagCertObda UNA [Q, O] of this problem is defined in the same way as BagCertObda[Q, O] except that the certain answers are considered under the UNA. The data complexity of these problems is the complexity when the query q, TBox T, and mappings M are considered to be fixed, and only D, a, and k form the input.
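Bag intersection takes, for each tuple, the minimum multiplicity over the bags involved. The sketch below applies it to a finite, hypothetical family of models only; since certain answers range over all models, the result for such a family is merely an upper bound on the certain multiplicities. The function names and sample bags are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def bag_intersection(answer_bags):
    """Pointwise minimum of multiplicities over a non-empty list of bags."""
    return reduce(lambda left, right: left & right, answer_bags)


def meets_threshold(bag, a, k):
    """The shape of the decision question: is the multiplicity of a at least k?"""
    return bag[a] >= k


# Hypothetical bag answers to one Boolean CQ over three models; () is the empty tuple.
answers_per_model = [Counter({(): 3}), Counter({(): 2}), Counter({(): 5})]
upper_bound = bag_intersection(answers_per_model)
print(upper_bound[()], meets_threshold(upper_bound, (), 3))
```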

R
In Section 4.1 we define the ontology language DL-Lite b R and its natural fragments, where the distinctive feature of DL-Lite b R is that ABoxes consist of bags of facts rather than sets. We then show in Section 4.2 that, analogously to the case of conventional OBDA, the materialisation of the mappings over the sources can be represented by a virtual bag ABox; as a result, the data complexity of OBDA query answering under bag semantics coincides with that of query answering over DL-Lite b R .

R
We start by introducing the notion of a bag ABox and describing its semantics in terms of bag interpretations.

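Definition 15. A bag ABox is a finite bag over the set of concept and role assertions. A bag interpretation I = ⟨Δ^I, ·^I⟩ satisfies a bag ABox A, written I ⊨^b A, if, for each concept assertion A(a) and role assertion P(a_1, a_2), the following holds:

$$\sum_{b \in \mathbf{I},\; b^{\mathcal{I}} = a^{\mathcal{I}}} \mathcal{A}(A(b)) \;\le\; A^{\mathcal{I}}(a^{\mathcal{I}}) \qquad\text{and}\qquad \sum_{b_1, b_2 \in \mathbf{I},\; b_1^{\mathcal{I}} = a_1^{\mathcal{I}},\; b_2^{\mathcal{I}} = a_2^{\mathcal{I}}} \mathcal{A}(P(b_1, b_2)) \;\le\; P^{\mathcal{I}}(a_1^{\mathcal{I}}, a_2^{\mathcal{I}}).$$

If a bag interpretation I satisfies the UNA, then ABox satisfaction amounts to checking whether the inequalities A(A(a)) ≤ A^I(a^I) and A(P(a_1, a_2)) ≤ P^I(a_1^I, a_2^I) hold for each concept and role assertion A(a) and P(a_1, a_2), respectively.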
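A satisfaction check for finite interpretations can be sketched as follows. Under the UNA it is exactly the pair of inequalities stated above. For interpretations that identify individuals, the sketch assumes that the multiplicities of assertions whose individuals share an image are added up before the comparison; this reading is an assumption of the sketch, suggested by the later remark that lower bounds on answers rely only on the cardinality of the ABox. The function name and the sample data are hypothetical.

```python
from collections import Counter


def satisfies_abox(abox, interp, image):
    """Check a bag ABox against a finite bag interpretation.

    abox: Counter over (predicate, tuple of individuals).
    interp: dict mapping each predicate to a Counter over tuples of elements.
    image: dict mapping each individual to its domain element.
    """
    required = Counter()
    for (predicate, individuals), multiplicity in abox.items():
        # Assumption: assertions mapped to the same tuple accumulate.
        required[(predicate, tuple(image[a] for a in individuals))] += multiplicity
    return all(
        multiplicity <= interp.get(predicate, Counter())[elements]
        for (predicate, elements), multiplicity in required.items()
    )


abox = Counter({("P", ("a", "b1")): 1, ("P", ("a", "b2")): 1})

# Under the UNA every individual denotes itself.
una_image = {"a": "a", "b1": "b1", "b2": "b2"}
una_interp = {"P": Counter({("a", "b1"): 1, ("a", "b2"): 1})}

# Without the UNA, b1 and b2 may denote the same element b.
merged_image = {"a": "a", "b1": "b", "b2": "b"}
too_small = {"P": Counter({("a", "b"): 1})}
large_enough = {"P": Counter({("a", "b"): 2})}

print(satisfies_abox(abox, una_interp, una_image))
```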
We can now introduce the notion of a bag ontology and define the ontology language DL-Lite b R and its fragments.

Definition 16. A DL-Lite b R ontology is a pair ⟨T, A⟩ of a DL-Lite R TBox T and a bag ABox A. The sublanguages DL-Lite b core, DL-Lite b rdfs, and DL-Lite b R− of DL-Lite b R are defined in the same way except that only DL-Lite core, DL-Lite rdfs, and DL-Lite R− TBoxes are allowed, respectively.
A bag interpretation I is a model of a DL-Lite b R ontology ⟨T, A⟩ if it satisfies both T and A. A DL-Lite b R ontology is satisfiable if it has a model; it is satisfiable under the UNA if it has a model satisfying the UNA.
The following definition of certain answers, which captures open-world query answering, is a natural extension of the set notion for DL-Lite R to bags: a query answer is certain for a given multiplicity if it occurs with at least that multiplicity in the bag answers to the query over every model of the ontology.

Definition 17. The bag certain answers q^K to a CQ q over a DL-Lite b R ontology K are the bag

$$q^{\mathcal{K}} \;=\; \bigcap_{\mathcal{I} \,\models^{b}\, \mathcal{K}} q^{\mathcal{I}}.$$

The bag certain answers under the UNA are defined in the same way except that the intersection ranges only over the models satisfying the UNA.

Similarly to the OBDA case, the decision problem corresponding to computing the bag certain answers to a CQ from a class Q over an ontology in a bag ontology language O (e.g., DL-Lite b R or one of its sublanguages) is defined as follows, where we again assume that all numbers in the input are represented in unary and the bag is explicitly defined only for a finite number of facts.

BagCert[Q, O]
Input: CQ q from Q, ontology K from O, tuple a of individuals from I, and number k ∈ ℕ_0^∞.
Question: Is q^K(a) ≥ k?

We conclude this section by introducing the notion of rewritability for our bag semantics. Our definition is analogous to that of Calvanese et al. [9] for the set case. Note that a bag ABox A can be seen as a database instance, so we can write Φ^A for the bag answers over A to a BCALC query Φ over the unary and binary predicates for atomic concepts and roles, respectively.

Definition 18. A BCALC query Φ is a rewriting of a CQ q with respect to a TBox T if all the individuals, atomic concepts and atomic roles of Φ appear in q or T, and q^{T,A} = Φ^A for every bag ABox A with satisfiable ⟨T, A⟩. A class of CQs Q is rewritable to a class of BCALC queries Q′ over an ontology language O if, for every query in Q and every TBox in O, there exists in Q′ a rewriting of the query with respect to the TBox. Rewritings and rewritability under the UNA are defined in the same way except that only ontologies satisfiable under the UNA and certain answers under the UNA are considered.
Note that by restricting the signature of Φ to that of q and T in this definition we are considering the problem of finding a pure rewriting of q with respect to T. In the set case, it was shown that pure rewritings may have to be of larger size than their impure counterparts [13]. However, our negative results on rewritability (i.e., Propositions 45 and 63) do not depend on this restriction, and may be shown for the more general case; we impose this restriction to facilitate exposition of some proofs.
Note also that in the set case the target query language for rewriting is typically a class of UCQs, since Calvanese et al. [9] showed that arbitrary CQs are rewritable to UCQs over DL-Lite R . In the case of bags, however, we will see that the situation is markedly different, and we will focus on BCALC as the target language for rewriting.

Relationship to query answering in OBDA
In conventional OBDA, the certain answers to a query over ⟨T, M, D⟩ can be characterised as those that logically follow from the union of the TBox T and the virtual ABox A_{M,D}, which represents the materialisation of the views defined by the mappings over the database instance D. As a result, query answering in OBDA amounts to query answering over DL-Lite R, and the rewritability and data complexity properties of both problems coincide [2].
In what follows, we show that an analogous correspondence holds under bag semantics. We start by introducing the notion of a virtual bag ABox, which captures the materialisation of the views specified in a bag OBDA setting.

Definition 19. The virtual DL-Lite b R ABox of the bag OBDA setting ⟨T, M, D⟩ is the bag ABox A_{M,D} defined as follows, for all A ∈ C, P ∈ R, and a, a_1, a_2 ∈ I:

The following example illustrates the notion of virtual ABoxes.

The following lemma shows that the models of a bag OBDA setting ⟨T, M, D⟩ coincide with those models satisfying the TBox T and the virtual bag ABox A_{M,D}.

Lemma 21. For every bag OBDA setting ⟨T, M, D⟩ and every bag interpretation I, we have that I ⊨^b ⟨T, M, D⟩ if and only if I ⊨^b ⟨T, A_{M,D}⟩.
Proof. It suffices to show that I is a model of A ,D if and only if I satisfies with respect to D as in Definition 9. For this, let I be a model of A ,D and, for all S ∈ C ∪ R and tuples a of individuals, let S,a (x) ∈ where x has the same arity as a. By the definition of I |= b A ,D and Definition 19, the following inequality holds for every S ∈ C ∪ R, tuples a of individuals, and mapping (x) By Definition 9, this is equivalent to requiring that I satisfies with respect to D, as desired.
Having Lemma 21 at our disposal, we can relate the problems of satisfiability checking and query answering for bag OBDA settings to the corresponding problems for DL-Lite b R .
All three statements also hold when the problems are considered under the UNA.
Proof. We concentrate on the general case; the case of the UNA is analogous. The first two statements are direct consequences of Lemma 21 and the definitions of satisfiability and certain answers. This theorem allows us to talk only about DL-Lite b R ontologies in the rest of the paper, silently assuming that all the results apply to OBDA settings as well.

Relationship of bag and set semantics in the context of DL-Lite R
In this section we discuss the main similarities and differences between our bag semantics of DL-Lite b R and the conventional set semantics of DL-Lite R. First, in Section 5.1, we argue that our bag semantics can be seen as a generalisation of the set semantics, in the sense that, on the one hand, satisfiability checking in DL-Lite b R reduces to satisfiability checking in DL-Lite R and, on the other hand, query answers under bag and set semantics coincide if multiplicities are ignored. There are, however, key properties of the conventional semantics for DL-Lite R that are no longer satisfied by the bag semantics. In particular, in Section 5.2, we discuss the influence of the UNA on query answering and show fundamental differences with the set case. Furthermore, in Section 5.3, we show that a universal model (a representative model of a satisfiable ontology over which each CQ can be correctly evaluated) may not exist under bag semantics; this is in contrast to the set case, where the fact that a universal model always exists is key to ensuring favourable computational properties of query answering.

Satisfiability, entailment of axioms, and query answering
The following theorem shows that our bag semantics is compatible with the conventional set semantics of DL-Lite R. The first statement in the theorem shows that satisfiability under bag semantics reduces to the set case: to check whether a DL-Lite b R ontology K is satisfiable, it suffices to check satisfiability of the DL-Lite R ontology K′ obtained from K by setting all non-zero multiplicities in the ABox to 1. The second statement establishes that entailment of axioms under set and bag semantics coincide; this means that the adoption of bag semantics does not affect the standard TBox reasoning services implemented in ontology development tools. Finally, the third statement shows that certain answers under bag and set semantics coincide if multiplicities are ignored, that is, a tuple is a set certain answer to a query with respect to an ontology if and only if it is also a bag certain answer with multiplicity at least one. All three statements in the theorem hold regardless of whether the UNA is adopted.

Theorem 23. Let K = ⟨T, A⟩ be a DL-Lite b R ontology and let K′ be the DL-Lite R ontology obtained from K by setting all non-zero multiplicities in A to 1. Then, the following statements hold:
1. K is satisfiable if and only if K′ is satisfiable;
2. T ⊨^b α if and only if T ⊨ α, for each axiom α;
3. a ∈ q^{K′} if and only if q^K(a) ≥ 1, for each CQ q and each tuple a over I.
All three statements also hold when the problems are considered under the UNA.
Proof. We concentrate on the general case; the case of the UNA is analogous.
(Statement 1) Assume that K has a model I = I , · I . Let I = I , · I be the bag interpretation defined as follows, for each a ∈ I, A ∈ C, P ∈ R, and u, v ∈ I : Bag interpretation I satisfies A and all axioms in T . Thus, I is a model of K and, therefore, K is satisfiable, as required. Conversely, suppose that K has a model I = I , · I . We construct an interpretation I = I , · I as follows, for each a ∈ I, A ∈ C, P ∈ R, and u, v ∈ I : Interpretation I is a model of K by construction, which completes the proof of Statement 1.
(Statement 2) We show the claim by considering a case for each kind of axiom in T .
Let first α be S_1 ⊑ S_2 where S_1 and S_2 are either both concepts or both roles. To show that T ⊨^b S_1 ⊑ S_2 implies T ⊨ S_1 ⊑ S_2, assume that T ⊨^b S_1 ⊑ S_2 but T ⊭ S_1 ⊑ S_2. Then, a DL-Lite R ontology witnessing this non-entailment must be satisfiable [9], which then implies T ⊭^b S_1 ⊑ S_2, contradicting our assumption.
We now show that T ⊨ S_1 ⊑ S_2 implies T ⊨^b S_1 ⊑ S_2. Let T′ be the TBox extending T with the inclusions implied by each role inclusion R_1 ⊑ R_2 in T, stated in terms of the inverses R_1⁻ and R_2⁻ of R_1 and R_2, respectively. Following [34], T ⊨ S_1 ⊑ S_2 implies that either T ⊨ Disj(S_1, S_1), or there exists a chain of inclusions T_0 ⊑ T_1, ..., T_{n−1} ⊑ T_n in T′ such that T_0 = S_1 and T_n = S_2. In the first case, we have that T ⊨^b Disj(S_1, S_1) by definition.
In the second case, the chain of inclusions in T′ implies that T′ ⊨^b S_1 ⊑ S_2 since T_0^I ⊆ T_1^I ⊆ · · · ⊆ T_n^I should hold for every bag interpretation I satisfying T′. Then, T ⊨^b S_1 ⊑ S_2 follows from the fact that every bag interpretation satisfying T satisfies also the additional inclusions in T′ by construction.
Let now α be Disj(S_1, S_2) where S_1, S_2 are either both concepts or both roles. If T ⊭^b Disj(S_1, S_2), then there exists a bag interpretation I such that I ⊨^b T and S_1^I ∩ S_2^I ≠ ∅. Let I′ be the set interpretation constructed in the proof of Statement 1 on the basis of I. By construction we have that I′ ⊨ T and S_1^{I′} ∩ S_2^{I′} ≠ ∅; thus T ⊭ Disj(S_1, S_2), as required. The other direction can be shown in exactly the same way.
(Statement 3) First, let a be a set certain answer to q over the corresponding set ontology K′, and assume for the sake of contradiction that q^K(a) = 0. The latter means that there exists a model I of K such that q^I(a) = 0. Consider the set interpretation I′ constructed on the basis of I as in the proof of Statement 1. Interpretation I′ is a model of K′ such that I′ ⊭ q(a), which yields a contradiction. Thus, q^K(a) ≥ 1, as required. The other direction can be shown in exactly the same way.

Unique name assumption
As we mentioned before, in the set case general satisfiability and satisfiability under the UNA coincide for DL-Lite R , and the same holds for axiom entailment. The following corollary, which states similar claims for bag semantics, is an immediate consequence of Statements 1 and 2 of Theorem 23 and this property.

Corollary 24. A DL-Lite b R ontology is satisfiable if and only if it is satisfiable under the UNA. A DL-Lite R TBox entails an axiom under bag semantics if and only if it entails this axiom under bag semantics and the UNA.
Artale et al. [24] showed that query answering over satisfiable DL-Lite R ontologies is also independent of whether the UNA is adopted or not; indeed the UNA may influence query answers under set semantics only if the ontology language allows for some form of equality (e.g., functionality constraints). We next argue that the situation is markedly different under bag semantics. The following proposition shows that the UNA can influence query answering under bag semantics as soon as role inclusions are allowed in the ontology language.

Proposition 25. There exists a satisfiable DL-Lite b R ontology K and a rooted CQ q such that the (general) certain answers to q over K differ from the certain answers under the UNA.
Let also q = ∃y. P(a, y). Under the UNA, we have that q^K(⟨⟩) = 2. Indeed, in all models I satisfying the UNA, individuals b_1 and b_2 are interpreted as different elements; as a result, in each such I, the element a^I is associated with at least two elements in P^I. In contrast, if the UNA is not adopted, then the certain answer is 1, which is witnessed by an interpretation that maps b_1 and b_2 to the same element of the domain.
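The effect can be replayed on a small instance. The ontology used in the proof is not reproduced in the text above, so the sketch below assumes a hypothetical one that is consistent with the description: TBox {P1 ⊑ P, P2 ⊑ P} and ABox {P1(a, b1), P2(a, b2)}, each assertion with multiplicity 1. The role names P1 and P2 and both function names are assumptions of the sketch. It builds the least extension of P forced by the ABox and the two role inclusions for a given interpretation of the individuals, and evaluates q = ∃y. P(a, y) over it.

```python
from collections import Counter


def least_p(image):
    """Least P^I forced by the assumed ABox and role inclusions.

    Each role inclusion only requires P^I to dominate the included role,
    so the pointwise maximum of the two extensions suffices.
    """
    p1 = Counter({(image["a"], image["b1"]): 1})
    p2 = Counter({(image["a"], image["b2"]): 1})
    return p1 | p2


def q_multiplicity(p, image):
    """Multiplicity of the empty tuple in the answers to q = ∃y. P(a, y)."""
    return sum(m for (source, _), m in p.items() if source == image["a"])


una_image = {"a": "a", "b1": "b1", "b2": "b2"}
merged_image = {"a": "a", "b1": "b", "b2": "b"}

under_una = q_multiplicity(least_p(una_image), una_image)
without_una = q_multiplicity(least_p(merged_image), merged_image)
print(under_una, without_una)
```

When b1 and b2 are merged, the two assertions concern different roles, so neither inclusion forces the single P-edge to have multiplicity above 1; this is where role inclusions matter.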
As we will see later on (in Corollary 43), the presence of role inclusions is crucial for this mismatch, and the certain answers to rooted CQs over DL-Lite b core ontologies do not depend on whether the UNA is adopted.

Universal models
An important property of each satisfiable DL-Lite R ontology K is the existence of so-called universal models for CQs, that is, models I such that the certain answers to every CQ q over K can be obtained by evaluating q over I [9]. Existence of such universal models is critical to the favourable computational properties of DL-Lite R. The notion of a universal model for bags is the same as for sets.

Definition 26.
A model I of an ontology K is universal for a class of queries Q if q K = q I for all q ∈ Q. It is universal under the UNA if the certain answers under the UNA are considered.
In the set case, it is well-known that universal models for the class of CQs always exist for satisfiable ontologies K. In fact, they are canonical interpretations-that is, interpretations that can be obtained by a restricted chase procedure applied to K [35]. It is also well-known that a model of K is universal for the class of CQs if and only if it can be homomorphically embedded into every other model of K [9]. Unfortunately, in contrast to the set case, even DL-Lite b core ontologies may not admit universal models for all CQs.
Similarly, consider the bag interpretation I 2 with domain {a, b, u} that interprets the individuals by themselves and interprets concepts and roles as follows: It is immediate to verify that both I 1 and I 2 are models of K. Moreover, for the Boolean CQs and q nr = ∃y. B( y), thus, neither model is universal for {q r , q nr }. Suppose now there is a universal model I for {q r , q nr }. Then, since q I r ( ) must be 0, we have that P I (a I , b I ) = 0. Since assertion A(a) occurs in A with multiplicity 1 and inclusion A ∃P belongs to T , we have that Finally, note that the proof also works under the UNA.

Lower bounds for the data complexity of query answering under bag semantics
The lack of universal models illustrated in Section 5.3 suggests that CQ answering under bag semantics is computationally more challenging than in the set case. In this section, we show that this is indeed the case and establish three incomparable coNP lower bounds in data complexity. These are in stark contrast to the well-known AC 0 upper bound in the set case for CQ answering over DL-Lite R .
The first lower bound is given in Theorem 28, where we show that CQ answering is coNP-hard even if we restrict the ontology language to DL-Lite b core regardless of the adoption of the UNA. The second and third lower bounds are established in Theorem 29, where we show similar coNP-hardness results for the cases where the query language is restricted to the class of rooted CQs and the ontology language is allowed to contain role inclusions.

Proof. We prove that there exists a DL-Lite b core TBox T and a Boolean CQ q such that checking whether q^{T,A}(⟨⟩) ≥ k for an input bag ABox A and k ∈ ℕ_0^∞ is coNP-hard regardless of whether the UNA is adopted or not. Following ideas of Kostylev and Reutter [36], we provide a reduction of the complement of the 3-colourability problem for directed graphs, a well-known coNP-complete problem, to query answering.
We first address the case of BagCert UNA. Let G = ⟨V, E⟩ be a directed graph with vertices V and edges E. We construct a DL-Lite core TBox T and a Boolean CQ q, neither of which depends on G, as well as a bag ABox A_G.

First, let T consist of the inclusions Vertex ⊑ ∃hasColour and ∃hasColour⁻ ⊑ Colour, where Vertex and Colour are atomic concepts, and hasColour is an atomic role, and let q be the Boolean CQ

$$q \;=\; \exists x\, \exists y\, \exists z\, \exists w.\; \mathit{Edge}(x, y) \wedge \mathit{hasColour}(x, z) \wedge \mathit{hasColour}(y, z) \wedge \mathit{Colour}(w).$$

Then, let A_G be the bag ABox defined as given next, where we use an individual a_v for each vertex v ∈ V, an individual a representing an auxiliary "vertex", individuals r, g, and b representing three colours, atomic role Edge, as well as the concepts and roles introduced before:
- Vertex(a_v) has multiplicity 1 for each vertex v ∈ V;
- Edge(a_u, a_v) has multiplicity 1 for each edge (u, v) ∈ E;
- Colour(r) has multiplicity |V| + 1 for colour r;
- Colour(g) and Colour(b) each has multiplicity |V| for colours g and b;
- Vertex(a), Edge(a, a), and hasColour(a, r) each has multiplicity 1 for the auxiliary "vertex" a; and
- all other assertions have multiplicity 0.

Concept Vertex and role Edge are used to encode G. The role hasColour represents a colour assignment to the vertices of G, where inclusions Vertex ⊑ ∃hasColour and ∃hasColour⁻ ⊑ Colour necessitate the association of each vertex to a colour.
Concept Colour provides a sufficient number of pre-defined copies of the three colours; every proper colour assignment of G shall use at most |V| times each of these colours. We next exploit these properties for showing that G is not 3-colourable if and only if q^{T,A_G}(⟨⟩) ≥ 3 × |V| + 2.

First, let G be not 3-colourable. Consider an arbitrary model I of ⟨T, A_G⟩. We next show that q^I(⟨⟩) ≥ 3 × |V| + 2. Since I is a model of bag ABox A_G, the valuation λ defined as λ(x) = λ(y) = a^I and λ(z) = λ(w) = r^I contributes to q^I(⟨⟩) a multiplicity of at least |V| + 1; this is because A_G contains assertion Colour(r) with multiplicity |V| + 1 and assertions Vertex(a), Edge(a, a), and hasColour(a, r) with multiplicity 1. Similarly, each valuation that differs from λ by sending w to either g^I or b^I contributes to q^I(⟨⟩) a multiplicity of at least |V|. We have two possibilities for I: either there exists an element u ∈ Δ^I different from r^I, g^I and b^I such that Colour^I(u) ≥ 1 or not.

In the first case the valuation that differs from λ by sending w to u instead of r^I contributes to q^I(⟨⟩) a multiplicity of at least 1, so overall we have q^I(⟨⟩) ≥ 3 × |V| + 2, as required.

In the second case, the inclusion ∃hasColour⁻ ⊑ Colour ensures that hasColour^I relates the interpretation of each vertex individual only to r^I, g^I, or b^I, so we can consider a colour assignment γ to V such that, for every v ∈ V, hasColour^I relates a_v^I to the element interpreting γ(v). Since G is not 3-colourable, there is an edge (v_1, v_2) ∈ E whose endpoints receive the same colour under γ. Consider the valuation λ′ with λ′(x) = a_{v_1}^I, λ′(y) = a_{v_2}^I, λ′(z) corresponding to the colour of v_1 and v_2 under γ, and λ′(w) = r^I. By construction, λ′ contributes to q^I(⟨⟩) a multiplicity of at least |V| + 1. Therefore, overall we have that q^I(⟨⟩) ≥ 3 × |V| + 2, as required.
Assume now that G is 3-colourable. It suffices to show that there exists a model I of ⟨T, A_G⟩ for which q^I(⟨⟩) = 3 × |V| + 1. Consider a bag interpretation I with the domain consisting of all the individuals, that interprets all the individuals by themselves, and such that Vertex^I, Edge^I and Colour^I are defined precisely according to A_G (e.g., Colour^I(r) = |V| + 1), while hasColour^I relates a to r and each a_v to the colour of v under a fixed 3-colouring of G. In other words, interpretation I is defined on the basis of the 3-colouring of G. By construction, I is a model of ⟨T, A_G⟩. Next, we show that q^I(⟨⟩) = 3 × |V| + 1. First, we observe that the first three atoms Edge(x, y), hasColour(x, z), and hasColour(y, z) of q match exactly once (i.e., under the valuation sending x and y to a, and z to r). Next, there are precisely three possibilities for variable w, namely r, g, and b, contributing multiplicity 3 × |V| + 1 in total. Consequently, q^I(⟨⟩) = 3 × |V| + 1, as desired.
We now address the case of BagCert by discussing the required modifications in the aforementioned reduction. For this, it is enough to ensure that, first, the auxiliary "vertex" a is not interpreted by the same element as any of the vertices of G; and, second, that the colour individuals r, g, and b are interpreted by pairwise different elements. To ensure this, we use atomic concepts V a , V G , Red, Green, and Blue. We add the following disjointness axioms to TBox T : Disj(V a , V G ), Disj(Red, Green), Disj(Red, Blue), and Disj(Green, Blue). We also modify bag ABox A G by setting the multiplicity of V a (a), Red(r), Green(g), Blue(b), and V G (a v ), for every vertex v ∈ V , to 1 (and the multiplicity of all other assertions over the new concepts to 0). Following the same argumentation as for the case of BagCert UNA , we can show that the above reduction works when the UNA is dropped.
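The UNA case of the reduction can be replayed on a concrete graph. The sketch below is an illustration under assumptions: it encodes vertices and edges as Vertex(a_v) and Edge(a_u, a_v) assertions with multiplicity 1, takes the query to be the conjunction of Edge(x, y), hasColour(x, z), hasColour(y, z), and Colour(w), and builds the interpretation induced by a given colouring. All function names are hypothetical.

```python
from collections import Counter


def reduction_abox(vertices, edges):
    """Bag ABox A_G as a Counter over (predicate, tuple of individuals)."""
    n = len(vertices)
    abox = Counter()
    for v in vertices:
        abox[("Vertex", (f"a_{v}",))] = 1
    for u, v in edges:
        abox[("Edge", (f"a_{u}", f"a_{v}"))] = 1
    abox[("Colour", ("r",))] = n + 1
    abox[("Colour", ("g",))] = n
    abox[("Colour", ("b",))] = n
    # Assertions for the auxiliary "vertex" a.
    for fact in [("Vertex", ("a",)), ("Edge", ("a", "a")), ("hasColour", ("a", "r"))]:
        abox[fact] = 1
    return abox


def interpretation_from_colouring(abox, colouring):
    """Extend the ABox facts with hasColour(a_v, colour of v)."""
    interp = Counter(abox)
    for v, colour in colouring.items():
        interp[("hasColour", (f"a_{v}", colour))] = 1
    return interp


def evaluate_q(interp):
    """Multiplicity of the empty tuple for the Boolean query q."""
    edge = {args: m for (pred, args), m in interp.items() if pred == "Edge"}
    has_colour = {args: m for (pred, args), m in interp.items() if pred == "hasColour"}
    # The atom Colour(w) is disconnected, so it multiplies every match.
    colour_total = sum(m for (pred, _), m in interp.items() if pred == "Colour")
    total = 0
    for (x, y), m_edge in edge.items():
        for (x2, z), m_x in has_colour.items():
            if x2 == x:
                total += m_edge * m_x * has_colour.get((y, z), 0) * colour_total
    return total


vertices = [1, 2, 3]
edges = [(1, 2), (2, 3), (1, 3)]
abox = reduction_abox(vertices, edges)

proper = evaluate_q(interpretation_from_colouring(abox, {1: "r", 2: "g", 3: "b"}))
improper = evaluate_q(interpretation_from_colouring(abox, {1: "r", 2: "r", 3: "b"}))
print(proper, improper)
```

With a proper colouring only the auxiliary loop Edge(a, a) matches the first three atoms, giving 3 × |V| + 1; a monochromatic edge adds further matches and pushes the value to at least 3 × |V| + 2.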
Note that the query constructed in the proof of Theorem 28 is not rooted; furthermore, the use of the disconnected atom Colour(w) in the query is instrumental to the correctness of the reduction. In Section 8 we show that rooted CQs are rewritable to BCALC over DL-Lite b core regardless of the adoption of the UNA-that is, the problems are in LogSpace in data complexity.
Unfortunately, the restriction to rooted CQs alone is not sufficient to ensure tractability of query answering for bag ontology languages allowing for role inclusions. In the first part of Theorem 29 we show that answering rooted CQs is intractable (coNP-hard) even if we restrict ourselves to DL-Lite b rdfs ontologies, which allow for role inclusions while at the same time disallowing existential quantification on the right-hand side of concept inclusions. This lower bound, however, critically depends on the fact that the UNA is not adopted; indeed, in Section 9 we will show that all CQs (and not just rooted ones) are rewritable to BCALC over DL-Lite b rdfs under the UNA. On the other hand, even if adopting the UNA can make rooted CQ answering easier, in the second part of Theorem 29 we show that it remains intractable in general: answering rooted CQs over DL-Lite b R is coNP-hard under the UNA. Proof. We first prove the claim for the case of BagCert, and then show how to adapt the proof to the case of BagCert UNA . The proof is again by reduction of the complement of the 3-colouring problem for directed graphs; however, the reduction is more involved. Let G = V , E be a directed graph with vertices V and edges E. Next, we define a DL-Lite R TBox T and Boolean rooted CQ q, neither of which depends on G, as well as a bag ABox where hasColour and Colour are atomic roles. Let also q be the following Boolean rooted CQ, where a 0 is a "root" individual, and Edge, Beg, End and V ertex are atomic roles: . 
Finally, let the bag ABox A G mention the "root" individual a 0 of q, individuals a v and c v associated to vertices v ∈ V , individuals r, g, and b corresponding to the three colours, individuals a e associated to edges e ∈ E, as well as an auxiliary "vertex" individual a and "edge" individual a * ; let also A G assign 1 to the following assertions (and 0 to all others): Having the reduction complete, next we show that it is correct-that is, that Intuitively, every model of T , A G has 3 × |V | + 1 contributing valuations for q that send the subquery of q over the y variables to the (interpretations of) the assertions over the auxiliary a * , a and r, while the subquery over the x variables to the assertions over one of a v and a, and one of r, g, and b. Then, if some c v is interpreted as neither r, nor g, nor b, we can construct one more valuation sending x c to the interpretation of c v (here we make use of the TBox T ). Otherwise, the identifications of c v can be seen as a colouring of the vertices (represented by a v individuals), and every valid colouring corresponds to the model possessing exactly 3 × |V | + 1 valuations. Next, we make this intuition formal. In fact, in the both directions of the correctness proof we make use of the following fact.

Claim 30. For every bag interpretation I satisfying all assertions in A_G, we have that q^I(⟨⟩) ≥ 3 × |V| + 1.
Proof. Let I be a bag interpretation satisfying ABox A G . Consider all the valuations λ such that λ(y e ) = a I * , for v ∈ V , and λ(x c ) is one of r I , g I , and b I . Since I satisfies A G , each of these valuations contribute at least 1 to q I ( ), and there are overall 3 × |V | of them. Note that we rely only on the cardinality of the ABox here, so even if the interpretations of the individuals are not pairwise distinct-that is, if the UNA is violated-and some of these valuations may coincide, the total contribution of these valuations is still at least 3 × |V | by Definition 15. Consider now the valuation λ that is the same as before on y e , y 1 v , y 2 v and y c , but such that λ(x v ) = a I and λ(x c ) = r I . This valuation also contributes a multiplicity of at least 1. Moreover, for the same reason as before, the contribution of each of the considered valuations is separate-that is, the total contribution is at least 3 × |V | + 1.
Having this claim at hand, we are ready to show correctness of the reduction. Let first G be not 3-colourable. Consider an arbitrary model I of T , A G . Since I satisfies all assertions in A G , by Claim 30 we know that there are valuations that contribute 3 × |V | + 1 to q I ( ). So, it is enough to show that there is a valuation with a non-zero and different contribution. We have two possibilities: either there is a vertex v ∈ V such that c I v is distinct from r I , g I , and b I , or not.
In the first case, consider such a vertex v and the valuation λ that is the same as in Claim 30 on y c , y 1 v , y 2 v and y c ,  1 and v 2 assigned to the same colour. For brevity, consider only the case when this colour is red; the other two cases are symmetric. Let λ be the valuation that agrees with a valuation in Claim 30 on the variables x v and x c , and follows the following assignment for the rest of the variables: On the one hand, the contribution of this valuation to q I ( ) is at least 1 by construction. On the other hand, this valuation is different from the ones considered in Claim 30.
Therefore, in both cases we have that q I ( ) ≥ 3 × |V | + 2, as required.
Let now G be 3-colourable-that is, there is a colour assignment γ to V such that the vertices of each edge are coloured differently. We next show that q T ,A G ( ) < 3 × |V | + 2. To this end, consider the bag interpretation I defined as follows: We are left to show the second part of the theorem-that is, coNP-hardness of BagCert for the case when the UNA is adopted, but arbitrary DL-Lite R TBoxes are allowed. In fact, we can essentially repurpose the same reduction as in the first part. The only modifications in the reduction are that the ABox A G does not have assertions hasColour(a v , c v ), for v ∈ V (i.e., does not use individuals c v at all), while the TBox T additionally has the inclusion ∃V ertex − ∃hasColour. Then, the correctness proof goes along the same lines as in the first case, except that anonymous domain elements u v , which are enforced by the new inclusion for each v ∈ V , are used instead of the c I v .
Since the data complexity of BCALC is strictly contained in LogSpace, the lower bounds in Theorems 28 and 29 imply non-rewritability to BCALC for the relevant query and ontology languages. In the following sections, we investigate how to regain tractability of query answering and rewritability to BCALC by considering suitable restrictions on the query and ontology languages that allow us to circumvent the bounds in Theorems 28 and 29. In Sections 7 and 8 we focus on DL-Lite b core and show that the class of rooted CQs is rewritable to BCALC both in general and under the UNA. In Section 9 we focus on the ontology language DL-Lite b rdfs and show that all CQs (and not just rooted ones) are rewritable to BCALC under the UNA. Finally, in Section 10 we show rewritability of rooted CQs over DL-Lite b R − , which extends both DL-Lite b core and DL-Lite b rdfs , under the UNA. For the convenience of the reader, we summarise in Table 1 all the data complexity results proved in this paper.

Universal models for rooted conjunctive queries over DL-Lite b CORE ontologies
Our next main goal is to show tractability of answering rooted CQs over DL-Lite b core in data complexity and their BCALC rewritability, both regardless of the adoption of the UNA. Towards this goal, in this section we show that every satisfiable DL-Lite b core ontology admits a universal model for rooted CQs, both in general and under the UNA. To this end, we proceed as in the set case: we first define a special bag interpretation for each DL-Lite b core ontology, which we call canonical, and then, after developing dedicated machinery, prove that it is indeed universal for the class of rooted CQs when the ontology is satisfiable. However, in contrast to the set case, the requirement for CQs to be rooted is crucial here: recall Proposition 27, where we constructed a DL-Lite b core ontology that does not have a universal model for all CQs.
To formalise canonical bag interpretations, we need two auxiliary notions. First, the concept closure ccl_T[u, I] of an element u ∈ Δ^I in a bag interpretation I = ⟨Δ^I, ·^I⟩ over a TBox T is the bag of concepts such that, for every concept C,

$$\mathrm{ccl}_{\mathcal{T}}[u, \mathcal{I}](C) \;=\; \max \{\, C_0^{\mathcal{I}}(u) \mid \mathcal{T} \models C_0 \sqsubseteq C \,\}.$$

In other words, ccl_T[u, I](C) is the maximum value of C_0^I(u) amongst all concepts C_0 satisfying T ⊨ C_0 ⊑ C, that is, ccl_T[u, I](C) is the minimal multiplicity of C^J(u) required for an extension J of I to satisfy TBox T locally in u.
Second, the union I ∪ J of two bag interpretations I = ⟨Δ^I, ·^I⟩ and J = ⟨Δ^J, ·^J⟩ interpreting all the individuals in the same way (that is, such that a^I = a^J for all a ∈ I) is the bag interpretation ⟨Δ^I ∪ Δ^J, ·^{I∪J}⟩ with a^{I∪J} = a^I for all individuals a ∈ I and S^{I∪J} = S^I ∪ S^J for all atomic concepts and roles S ∈ C ∪ R (recall that S^I and S^J are bags, so S^I ∪ S^J is the bag maximal union).
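Both auxiliary notions are easy to compute for finite interpretations. The sketch below restricts attention to atomic concepts for brevity and assumes that the entailed inclusions are given explicitly as a table; `entailed_subsumees`, `multiplicity`, and the sample data are hypothetical names introduced for illustration only.

```python
from collections import Counter


def concept_closure(u, concept, entailed_subsumees, multiplicity):
    """Maximum multiplicity at u over all concepts C0 with T |= C0 ⊑ C.

    entailed_subsumees[C] lists every such C0 and must include C itself.
    multiplicity(C0, u) returns the multiplicity of u in the extension of C0.
    """
    return max(multiplicity(c0, u) for c0 in entailed_subsumees[concept])


def union_of_interpretations(left, right):
    """Maximal union, predicate by predicate, of two dicts of Counters."""
    return {
        s: left.get(s, Counter()) | right.get(s, Counter())
        for s in left.keys() | right.keys()
    }


# Hypothetical data: a TBox entailing A ⊑ B, and two small interpretations.
entailed_subsumees = {"A": ["A"], "B": ["B", "A"]}
interp_i = {"A": Counter({"u": 3}), "B": Counter({"u": 1})}
interp_j = {"B": Counter({"u": 2, "v": 1})}

closure_b = concept_closure("u", "B", entailed_subsumees, lambda c, e: interp_i[c][e])
union = union_of_interpretations(interp_i, interp_j)
print(closure_b, union["B"])
```

Here u occurs three times in A, so any extension satisfying A ⊑ B locally at u needs multiplicity at least 3 for B, which is what the closure reports.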

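Definition 32. The canonical bag interpretation $Can(\mathcal{K})$ of a DL-Lite$^b_{core}$ ontology $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$ is the bag interpretation that is the union $\bigcup_{i \geq 0} Can_i(\mathcal{K})$ of the bag interpretations $Can_i(\mathcal{K})$ defined as follows:
- $Can_0(\mathcal{K}) = \langle \Delta^{Can_0(\mathcal{K})}, \cdot^{Can_0(\mathcal{K})} \rangle$ is the bag interpretation corresponding to bag ABox $\mathcal{A}$, that is, such that $\Delta^{Can_0(\mathcal{K})} = \mathbf{I}$, $a^{Can_0(\mathcal{K})} = a$ for each $a \in \mathbf{I}$, and $S^{Can_0(\mathcal{K})}(\mathbf{a}) = \mathcal{A}(S(\mathbf{a}))$ for each $S \in \mathbf{C} \cup \mathbf{R}$ and individuals $\mathbf{a}$;
- for each $i > 0$, $Can_i(\mathcal{K}) = \langle \Delta^{Can_i(\mathcal{K})}, \cdot^{Can_i(\mathcal{K})} \rangle$ extends $Can_{i-1}(\mathcal{K})$ by satisfying all the inclusions that are not satisfied in $Can_{i-1}(\mathcal{K})$:
$$\Delta^{Can_i(\mathcal{K})} = \Delta^{Can_{i-1}(\mathcal{K})} \cup \{\, w^1_{u,R}, \ldots, w^{\delta}_{u,R} \mid u \in \Delta^{Can_{i-1}(\mathcal{K})},\ R \in \{P, P^- \mid P \in \mathbf{R}\},\ \delta = ccl_{\mathcal{T}}[u, Can_{i-1}(\mathcal{K})](\exists R) - (\exists R)^{Can_{i-1}(\mathcal{K})}(u) \,\},$$
where $w^j_{u,R}$ are fresh domain elements, called anonymous, and, for all $a \in \mathbf{I}$, $A \in \mathbf{C}$, $P \in \mathbf{R}$, and domain elements $u$ and $v$, $a^{Can_i(\mathcal{K})} = a$,
$$A^{Can_i(\mathcal{K})}(u) = \begin{cases} ccl_{\mathcal{T}}[u, Can_{i-1}(\mathcal{K})](A), & \text{if } u \in \Delta^{Can_{i-1}(\mathcal{K})}, \\ 0, & \text{otherwise,} \end{cases}$$
$$P^{Can_i(\mathcal{K})}(u, v) = \begin{cases} P^{Can_{i-1}(\mathcal{K})}(u, v), & \text{if } u, v \in \Delta^{Can_{i-1}(\mathcal{K})}, \\ 1, & \text{if } v = w^j_{u,P} \text{ or } u = w^j_{v,P^-}, \\ 0, & \text{otherwise.} \end{cases}$$

We have just defined canonical bag interpretations in a declarative way. Note, however, that they can also be obtained by applying a variant of the restricted chase procedure [35] extended to bags: a procedure where, starting from the ABox, violations of the inclusions in the TBox are successively "repaired" by extending the interpretation of concepts and roles in a minimal way. We now illustrate Definition 32 with an example.

Inspecting Definition 32 and Example 33, we observe that every canonical bag interpretation interprets each concept with a bag of individuals but only with a set of anonymous elements; similarly, multiplicities greater than 1 in the interpretations of roles are possible only for pairs of elements that are both individuals. This is an important property of canonical bag interpretations, which we are going to use in this section.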
Note that the canonical bag interpretation satisfies the UNA. The following is another simple and intuitive observation, which holds regardless of the UNA and can be checked by the construction.

Proposition 34. If a DL-Lite$^b_{core}$ ontology is satisfiable, then its canonical bag interpretation is its model.
In the rest of this section, we show that the canonical bag interpretations of satisfiable DL-Lite b core ontologies are universal models for the class of rooted CQs regardless of the adoption of the UNA. There are two key ideas here, which are similar to the set case, but more subtle.
First, the canonical bag interpretation of a satisfiable DL-Lite b core ontology admits a homomorphism of a special type to every model of the ontology. Such homomorphisms, which we call multiplicity-preserving on the individuals, have a hybrid nature: for the concept interpretations, they preserve multiplicities of the interpretations of individuals, but are not required to do so for anonymous elements; similarly, for role interpretations, they preserve multiplicities of the pairs having at least one element being the interpretation of an individual, but are not required to do so for pairs of anonymous elements.
Second, each valuation of a rooted CQ over the canonical bag interpretation sends at least one term of each connected component to the interpretation of an individual. So, since DL-Lite b core does not allow for role inclusions, if two such valuations are different, then they are different on the non-anonymous part of the canonical interpretation. Moreover, the canonical bag interpretation is essentially set-based on the anonymous elements. Therefore, a valuation contributing to the answers and its multiplicity are determined solely by the non-anonymous part of the image of the valuation.
Putting these two ideas together, we can conclude that the composition of a valuation of a rooted CQ over the canonical bag interpretation and a homomorphism from the canonical interpretation to another model that is multiplicity-preserving on the individuals is also a valuation, and its contribution to the certain answers over the latter model is at least as large as the contribution of the former valuation over the canonical interpretation; moreover, different valuations over the canonical interpretation contribute independently to the certain answers over the latter model. This means that the canonical bag interpretation has the smallest possible certain answers to every rooted CQ-that is, by definition, it is the universal model for the class of such queries.
Even though these ideas may seem quite intuitive, their formalisation requires additional machinery, which we do not have yet. The problem is that with the current terminology we cannot unambiguously refer to each particular occurrence of an element in a bag, which is highly desirable for the formalisation. Therefore, we start by introducing new terminology for bags and other bag-based notions.

There is a straightforward one-to-one correspondence between bags and e-bags, and we call the e-bag corresponding to a bag its enumerated version, denoted by adding the superscript $e$ to the bag. We can extend this notation to bag interpretations and consider the enumerated version $\mathcal{I}^e$ of a bag interpretation $\mathcal{I} = \langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$, defined as the pair $\langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}^e} \rangle$ such that $a^{\mathcal{I}^e} = a^{\mathcal{I}}$ for each individual $a$ and $S^{\mathcal{I}^e} = (S^{\mathcal{I}})^e$ for each concept or role $S$.
The use of enumerated versions of interpretations allows us to refer, in an unambiguous way, to the different occurrences of elements and pairs of elements in the bags corresponding to concepts and roles, respectively. An enumerated homomorphism between two interpretations is then defined as a standard homomorphism that additionally establishes a correspondence for each enumerated tuple of elements in each bag of the relevant bag interpretations.
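The correspondence between bags and their enumerated versions can be made concrete with a short Python sketch. It assumes, as the notation $[u\colon m]$ used in this section suggests, that the occurrences of an element with multiplicity $k$ are indexed by $1, \ldots, k$; the function names are hypothetical and chosen only for this illustration.

```python
from collections import Counter

def enumerate_bag(bag):
    """Enumerated version: one pair (element, m) for each 1 <= m <= multiplicity."""
    return {(u, m) for u, k in bag.items() for m in range(1, k + 1)}

def forget_enumeration(ebag):
    """Recover the bag by counting the enumerated occurrences of each element."""
    return Counter(u for u, _ in ebag)

# A bag with two occurrences of "a" and one occurrence of "b".
musicians = Counter({"a": 2, "b": 1})
e_musicians = enumerate_bag(musicians)
```

Since each occurrence now has its own identity, a function between enumerated bags can state precisely which occurrence is sent to which, and injectivity on occurrences becomes a meaningful requirement.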

To handle some cases uniformly, we sometimes write $h_{P^-}([(v, u)\colon m])$ instead of $h_P([(u, v)\colon m])$, for $P \in \mathbf{R}$.
E-homomorphisms do not differ essentially from usual homomorphisms, because they can send several enumerated tuples to just one without any restrictions. In contrast, the next definition formalises the aforementioned idea of multiplicity preservation: e-homomorphisms that are multiplicity-preserving on the individuals preserve multiplicities on the non-anonymous part of the source interpretation.

Definition 37. An e-homomorphism $(h, h_S, \ldots)$ from $\mathcal{I}^e = \langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}^e} \rangle$ to $\mathcal{J}^e = \langle \Delta^{\mathcal{J}}, \cdot^{\mathcal{J}^e} \rangle$ is multiplicity-preserving on individuals $\mathbf{I}$ if, for each $a \in \mathbf{I}$, the following holds, where $u = a^{\mathcal{I}^e}$:

The following lemma then formalises the first key idea about the canonical bag interpretation in terms of e-homomorphisms that are multiplicity-preserving on the individuals.

To begin, recall that $a = a^{Can_0(\mathcal{K})}$ for every $a \in \mathbf{I}$ and let $h \colon \Delta^{Can_0(\mathcal{K})} \to \Delta^{\mathcal{I}}$ be such that $h(a^{Can_0(\mathcal{K})}) = a^{\mathcal{I}}$ for every $a \in \Delta^{Can_0(\mathcal{K})}$, that is, for every $a \in \mathbf{I}$. We now define function $h_S \colon S^{Can^e_0(\mathcal{K})} \to S^{\mathcal{I}^e}$ for every $S \in \mathbf{C} \cup \mathbf{R}$. For this, consider a tuple of individuals $\mathbf{a}$ such that $S^{Can_0(\mathcal{K})}(\mathbf{a}) = k$, for $k \in \mathbb{N}$, that is, such that $[\mathbf{a}\colon m] \in S^{Can^e_0(\mathcal{K})}$ for all $m \in \mathbb{N}$ with $m \leq k$. By the definition of $Can_0(\mathcal{K})$, we have that $\mathcal{A}(S(\mathbf{a})) = k$. Since $\mathcal{I}$ is a model of $\mathcal{K}$, it satisfies $\mathcal{A}$, and, in particular, , and to define $h_R$ on the role links between these anonymous elements and corresponding individuals.

For the extension for atomic concepts, consider an arbitrary concept name $A \in \mathbf{C}$ and an element $u \in \Delta^{Can_0(\mathcal{K})} = \mathbf{I}$. By definition, we have $A^{Can_1(\mathcal{K})}(u) = ccl_{\mathcal{T}}[u, Can_0(\mathcal{K})](A)$. Therefore, to show that it is possible to extend $h_A$ in such a way that different enumerated elements are sent to different enumerated elements, it suffices to prove that $A^{\mathcal{I}}(u^{\mathcal{I}}) \geq ccl_{\mathcal{T}}[u, Can_0(\mathcal{K})](A)$. By the definition of concept closure, there must exist a concept $C$ such that $\mathcal{T} \models C \sqsubseteq A$ and $C^{Can_0(\mathcal{K})}(u) = ccl_{\mathcal{T}}[u, Can_0(\mathcal{K})](A)$.
If $C$ is an atomic concept, then $C^{Can_0(\mathcal{K})}(u) = \mathcal{A}(C(u))$ by construction, and $A^{\mathcal{I}}(u^{\mathcal{I}}) \geq \mathcal{A}(C(u))$ follows from the fact that $\mathcal{I}$ is a model of both $\mathcal{A}$ and $\mathcal{T}$, and the fact that $\mathcal{T} \models C \sqsubseteq A$ is equivalent to $\mathcal{T} \models_b C \sqsubseteq A$ (see Statement 2 of Theorem 23). If $C$ is $\exists P$ or $\exists P^-$, then $C^{Can_0(\mathcal{K})}(u) = \sum_{a \in \mathbf{I}} \mathcal{A}(P(u, a))$ or $C^{Can_0(\mathcal{K})}(u) = \sum_{a \in \mathbf{I}} \mathcal{A}(P(a, u))$, respectively, and the argument is analogous.
For the extension for the introduced anonymous elements and corresponding role links, it is enough to consider arbitrary $P \in \mathbf{R}$ and $u \in \Delta^{Can_0(\mathcal{K})} = \mathbf{I}$. By definition, $Can_1(\mathcal{K})$ extends $Can_0(\mathcal{K})$ with fresh anonymous elements w

To complete the proof we argue that, given an e-homomorphism $(h, h_S, \ldots)$ from $Can^e_i(\mathcal{K})$ to $\mathcal{I}^e$ that is multiplicity-preserving on $\mathbf{I}$, for $i \geq 1$, we can extend it to $Can^e_{i+1}(\mathcal{K})$. This can be shown analogously to the case of $Can^e_1(\mathcal{K})$. The only additional observation is that for every $i \geq 1$ and every $S \in \mathbf{C} \cup \mathbf{R}$, bag $S^{Can_{i+1}(\mathcal{K})}$ extends $S^{Can_i(\mathcal{K})}$ with tuples $\mathbf{u}$ of elements that are all anonymous, for which $S^{Can_i(\mathcal{K})}(\mathbf{u}) = 1$ by definition.
We next move to the formalisation of the second intuitive idea. Recall that a Boolean CQ can be seen as a bag of atoms. Therefore, in the following definition we can consider the enumerated version q e of a Boolean CQ q, which is the e-bag of its atoms; this formulation allows us to distinguish different occurrences of atoms in q. Then, an enumerated valuation from a CQ to an interpretation is essentially an e-homomorphism where the CQ is seen as a bag interpretation.

Definition 39. An enumerated valuation (e-valuation) of a Boolean CQ $q$ over a bag interpretation $\mathcal{I} = \langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$ is a family $(\nu, \nu_S, \ldots)$, for $S \in \mathbf{C} \cup \mathbf{R}$, of the following functions, where $q_S$ is the subquery of $q$ consisting of all its atoms over $S$: for all $S \in \mathbf{C} \cup \mathbf{R}$, such that
- $\nu(a) = a^{\mathcal{I}}$ for each $a \in \mathbf{I}$,
- $\nu(y) = \nu(t)$ for all equality atoms $y = t$ in $q$, where is a number in $\mathbb{N}$, and where is a number in $\mathbb{N}$.
It is straightforward to check that the number of e-valuations of a Boolean CQ q over a bag interpretation I is precisely the multiplicity of the empty tuple in the certain answers to q over I .

Proposition 40. The number of e-valuations of a Boolean CQ $q$ over a bag interpretation $\mathcal{I}$ is $q^{\mathcal{I}}(\langle\rangle)$.
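The counting argument behind Proposition 40 can be checked on small instances with the following Python sketch. It rests on two assumptions that are not restated in this section: the bag semantics of CQs is taken to be the usual one (a sum over valuations of the product of the multiplicities of the atom images), and the query has no equality atoms. All function names and the sample interpretation are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

def valuations(variables, domain):
    """All assignments of domain elements to the variables of the query."""
    names = sorted(variables)
    for values in product(sorted(domain), repeat=len(names)):
        yield dict(zip(names, values))

def atom_multiplicities(atoms, val, interp):
    """Multiplicity of the image of each atom occurrence under valuation val."""
    return [interp.get(p, Counter())[tuple(val.get(t, t) for t in terms)]
            for p, terms in atoms]

def bag_multiplicity(atoms, variables, domain, interp):
    """Sum over valuations of the product of the atom multiplicities."""
    return sum(prod(atom_multiplicities(atoms, val, interp))
               for val in valuations(variables, domain))

def count_e_valuations(atoms, variables, domain, interp):
    """Enumerate explicitly a valuation plus one occurrence index per atom."""
    count = 0
    for val in valuations(variables, domain):
        index_choices = [range(1, k + 1)
                         for k in atom_multiplicities(atoms, val, interp)]
        count += sum(1 for _ in product(*index_choices))
    return count

# Boolean CQ: exists y. P(a, y) and A(y); individuals are mapped to themselves.
domain = {"a", "b", "c"}
interp = {"P": Counter({("a", "b"): 2, ("a", "c"): 1}),
          "A": Counter({("b",): 3, ("c",): 1})}
atoms = [("P", ("a", "y")), ("A", ("y",))]
by_semantics = bag_multiplicity(atoms, {"y"}, domain, interp)
by_counting = count_e_valuations(atoms, {"y"}, domain, interp)
```

Here the valuation sending $y$ to $b$ accounts for $2 \times 3$ choices of occurrences and the one sending $y$ to $c$ for a single choice, so both computations agree.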
The following lemma formalises the second idea: if two e-valuations over the canonical bag interpretation coincide on all the (enumerated occurrences of the) atoms of a rooted CQ that involve terms evaluating to (the interpretations of) individuals, then they are the same e-valuation.

Lemma 41. Let q be a rooted Boolean CQ and K be a DL-Lite

. Moreover, by assumption $\mathbf{t}$ consists of only variables. We consider only the case when $S(\mathbf{t})$ is $P(x_1, x_2)$, where $P \in \mathbf{R}$ (the case when $S(\mathbf{t})$ is $A(x)$ for $A \in \mathbf{C}$ can be handled in the same way).
Boolean CQ q is rooted, so there exists a sequence , and, for each j = 1, . . . , k, t j ∼ t j and either j for an atomic role P j . We claim that for all $j = 1, \ldots, k$ (which, in particular, contradicts our assumption on $[P(x_1, x_2)\colon m]$). To prove this claim, suppose for the sake of contradiction that it is not the case, and let $j \in \{1, \ldots, k\}$ be the smallest number such that (2) does not hold.
By assumption, we know that $\nu_i(t_{j-1}) \neq \nu_i(a)$ for both $i = 1, 2$ and every $a \in \mathbf{I}$ (therefore, $j \neq 1$, because $t_0 \in \mathbf{I}$). However, since $j$ is the smallest number, $\nu_1(t_{j-1}) = \nu_2(t_{j-1})$. So, the element $u = \nu_1(t_{j-1})$ in the canonical bag interpretation , which, by assumption, are different. However, we also know that $\nu_1(t_{j-1}) = \nu_2(t_{j-1})$, so the only possibility for both ν 1 . Therefore, our assumption on the existence of $j$ was wrong and (2) indeed holds for all $j$. In particular, it holds for $j = k$, which contradicts the fact that ν 1

Lemma 41 relies both on the fact that there are no role inclusions in the TBox and on the fact that the CQ is rooted. It is easy to construct counter-examples to this lemma if either of these requirements is violated.
Having Lemmas 38 and 41 at hand, we are ready to prove that, for satisfiable DL-Lite b core ontologies, the canonical bag interpretation is the universal model for the class of rooted CQs. The idea is that the composition of an e-valuation of a rooted CQ and an e-homomorphism that is multiplicity-preserving on the individuals is also an e-valuation; moreover, the mapping between e-valuations defined by the e-homomorphism in this way is injective and therefore preserving the size of the domain of the mapping. Given that Can(K) satisfies the UNA, this result implies also that Can(K) is universal under the UNA for the class of rooted CQs and satisfiable DL-Lite b core ontologies K.
The following is an important corollary of Theorem 42 and Corollary 24, which allows us to forget the UNA in the rest of the paper when talking about rooted CQ answering over DL-Lite b core .

Corollary 43. The certain answers to rooted CQs over DL-Lite b core ontologies do not depend on the adoption of the UNA.
Another important corollary of Theorem 42, the structural properties of rooted CQs, and the definition of the canonical interpretation is that, similarly to the set case, the bag certain answers q K to a rooted CQ q over a satisfiable DL-Lite b core ontology K can be computed over the sub-interpretation Can n (K) of Can(K) with n depending only on q.

Corollary 44. If $\mathcal{K}$ is a satisfiable DL-Lite$^b_{core}$ ontology with $Can(\mathcal{K}) = \bigcup_{i \geq 0} Can_i(\mathcal{K})$ and $q$ is a rooted CQ having $n$ atoms, then $q^{\mathcal{K}} = q^{Can_n(\mathcal{K})}$.

Rewritability of rooted conjunctive queries over DL-Lite$^b_{core}$
First-order rewritability of CQs is a key property of DL-Lite query answering under set semantics. In this section, we show rewritability of rooted CQs over DL-Lite b core to BCALC (recall that by Corollary 43 this result is agnostic to the adoption of the UNA).

Non-rewritability to BCALC unions of conjunctive queries
In the case of set semantics, the target language for rewritings is that of unions of conjunctive queries (UCQs). There are two natural counterparts to UCQs in the bag setting: BCALC maximal union of CQs and BCALC arithmetic union of CQs. Our first result is negative and in stark contrast to the set case: in general, rewriting to either of these classes of BCALC queries is not possible, even over DL-Lite$^b_{core}$.

Proposition 45. Rooted CQs are rewritable neither to BCALC maximal nor to BCALC arithmetic unions of CQs over DL-Lite$^b_{core}$.

Evaluating $q$ over $Can(\mathcal{K})$, we get $q^{Can(\mathcal{K})}(a) = 7$ for individual $a$. Assume for the sake of contradiction that there exists a rewriting of $q$ to a BCALC maximal union of CQs with respect to $\mathcal{T}$. By the semantics of the BCALC maximal union, there exists a BCALC CQ $q_0$ in this union with $q_0^{\mathcal{A}}(a) = q^{Can(\mathcal{K})}(a)$. Observe that $\mathcal{A}$ contains three distinct assertions with multiplicities 3, 2, and 3. Hence, whenever there is a valuation of the terms of $q_0$ that maps an atom of $q_0$ to one of these assertions, the multiplicity is either 2 or 3. Because $q_0$ is a CQ, every valuation of $q_0$ contributes to $q_0^{\mathcal{A}}$ a multiplicity that is a multiple of 2 or 3. Since 7 is prime, there can be no valuation contributing multiplicity 7. Moreover, there are only two ways to get 7 as an instance of a polynomial with coefficients 2 and 3, namely, $2 + 2 + 3$ and $2 \times 2 + 3$. For the former sum, this means that there exist three distinct valuations contributing to $q_0^{\mathcal{A}}(a)$ with multiplicities 2, 2, and 3, respectively, which is impossible given the fact that, to get 2, query $q_0$ must be equal to $\exists y.\, P(x, y)$, which excludes the possibility of getting the multiplicity 3. For the latter sum, there must exist two distinct valuations contributing to $q_0^{\mathcal{A}}(a)$ with multiplicities 4 and 3, respectively, which is again impossible given the fact that, to get 4, query $q_0$ must be equal to $\exists y, z.\, P(x, y) \wedge P(x, z)$ or to $\exists y.\, P(x, y) \wedge P(x, y)$, which in either case excludes the possibility of getting the multiplicity 3.
So, none of these cases satisfies the equality with $q^{Can(\mathcal{K})}$ that is required for a rewriting of $q$ with respect to $\mathcal{T}$.

General ideas for rewritability to BCALC queries
Next, we show that rooted CQs are rewritable to a richer fragment of BCALC over DL-Lite$^b_{core}$, which also features the operation of difference. This ensures LogSpace membership in data complexity of query answering. Our rewriting algorithm is inspired by that of Kikot et al. [37] for DL-Lite$_R$. The key observation behind our approach is that, for a DL-Lite$^b_{core}$ ontology $\mathcal{K}$ and a rooted CQ $q(\mathbf{x}) = \exists \mathbf{y}.\, \varphi(\mathbf{x}, \mathbf{y})$, the bag answers to $q$ over the canonical bag interpretation $Can(\mathcal{K})$, which, by Theorem 42, coincide with the bag certain answers to $q$ over $\mathcal{K}$, can be partitioned as
$$q^{Can(\mathcal{K})} = \biguplus_{\mathbf{z} \subseteq \mathbf{y}} [q, \mathbf{z}]^{Can(\mathcal{K})}, \qquad (3)$$
where each $[q, \mathbf{z}]^{Can(\mathcal{K})}$ is the bag of answers to $q$ over $Can(\mathcal{K})$ supported by valuations of $q(\mathbf{x})$ over $Can(\mathcal{K})$ that send all the variables in a subset $\mathbf{z}$ of the variables $\mathbf{y}$ to anonymous elements and all other variables to individuals. Next, we define such partitions formally.

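Definition 46. Let $q(\mathbf{x}) = \exists \mathbf{y}.\, \varphi(\mathbf{x}, \mathbf{y})$ be a rooted CQ and let $\mathcal{K}$ be a DL-Lite$^b_{core}$ ontology. Given a subset $\mathbf{z}$ of variables $\mathbf{y}$, let $[q, \mathbf{z}]^{Can(\mathcal{K})}$ be the bag of tuples over $\mathbf{I}$ such that, for each tuple $\mathbf{a}$ of individuals,
$$[q, \mathbf{z}]^{Can(\mathcal{K})}(\mathbf{a}) = \sum_{\lambda \in \Lambda_{\mathbf{z}}} \ \prod_{S(\mathbf{t}) \text{ in } \varphi(\mathbf{x}, \mathbf{y})} S^{Can(\mathcal{K})}(\lambda(\mathbf{t})),$$
where $\Lambda_{\mathbf{z}}$ is the set of valuations $\lambda \colon \mathbf{x} \cup \mathbf{y} \cup \mathbf{I} \to \Delta^{Can(\mathcal{K})}$ such that $\lambda(\mathbf{x}) = \mathbf{a}$, $\lambda(a) = a$ for each $a \in \mathbf{I}$, $\lambda(x) = \lambda(t)$ for each $x = t$ in $\varphi(\mathbf{x}, \mathbf{y})$, $\lambda(z)$ is an anonymous element for each $z \in \mathbf{z}$, and $\lambda(y) \in \mathbf{I}$ for each $y \in \mathbf{y} \setminus \mathbf{z}$.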
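To make the partition concrete, the following Python sketch evaluates a small CQ over a hand-built finite interpretation and splits its answers according to which existential variables are sent to anonymous elements. It is only an illustration under stated assumptions: the interpretation is a made-up finite structure rather than an actual canonical bag interpretation, the query has no equality atoms, and the helper names are hypothetical.

```python
from collections import Counter
from itertools import chain, combinations, product
from math import prod

def evaluate(atoms, answer_vars, var_ranges, individuals, interp):
    """Bag of answers where each existential variable y ranges over var_ranges[y]."""
    result = Counter()
    exist_vars = sorted(var_ranges)
    for answer in product(sorted(individuals), repeat=len(answer_vars)):
        for values in product(*(sorted(var_ranges[y]) for y in exist_vars)):
            val = dict(zip(answer_vars, answer))
            val.update(zip(exist_vars, values))
            m = prod(interp[p][tuple(val.get(t, t) for t in terms)]
                     for p, terms in atoms)
            if m:
                result[answer] += m
    return result

def subsets(items):
    """All subsets of a list, as tuples."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

# Hypothetical interpretation with individuals a, b and anonymous w1, w2.
individuals, anonymous = {"a", "b"}, {"w1", "w2"}
interp = {"P": Counter({("a", "b"): 2, ("a", "w1"): 1}),
          "R": Counter({("b", "a"): 3, ("w1", "w2"): 1})}
# q(x) = exists y1, y2. P(x, y1) and R(y1, y2)
atoms = [("P", ("x", "y1")), ("R", ("y1", "y2"))]
exist_vars = ["y1", "y2"]

full = evaluate(atoms, ["x"], {y: individuals | anonymous for y in exist_vars},
                individuals, interp)
parts = {z: evaluate(atoms, ["x"],
                     {y: anonymous if y in z else individuals for y in exist_vars},
                     individuals, interp)
         for z in subsets(exist_vars)}
recombined = sum(parts.values(), Counter())  # arithmetic union of all parts
```

In this instance only the parts for the empty subset and for the subset containing both existential variables are non-empty, and adding all parts up with the arithmetic union gives back the full bag of answers.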
Following this key observation, given a rooted CQ, the rewriting algorithm first constructs a set of BCALC queries, each one accounting for the bag $[q, \mathbf{z}]^{Can(\mathcal{K})}$ for a subset $\mathbf{z}$, and then adjoins them using the arithmetic union to produce the actual rewriting, which is still a BCALC query. In particular, every subset $\mathbf{z}$ is processed along the following three steps:
1. $\mathbf{z}$ is checked for $\mathcal{T}$-realisability (that is, whether the corresponding subquery can be folded to the anonymous part of a canonical bag interpretation) and disregarded from consideration in (3) if the check fails;
2. each connected component of the subquery corresponding to $\mathbf{z}$ is replaced in the query of $[q, \mathbf{z}]^{Can(\mathcal{K})}$ by a single representative role atom; and
3. each concept atom and each representative atom is rewritten to a BCALC query that takes into account the TBox and the fact that $\mathbf{z}$ should be sent to anonymous elements.
In the following three sections we formalise each of these steps and prove their correctness.

Step 1: checking for realisability
In the first step, every subset z of existentially quantified variables y in a rooted CQ q(x) is checked for T -realisability. Intuitively, z is T -realisable if the subquery of q(x) induced by z can be "folded" into the anonymous forest-shaped part of Can(K) for some ontology K having TBox T . Therefore, non-realisable z cannot contribute to partitioning (3) and their associated subqueries can be disregarded for the purpose of query rewriting. To provide the formal definition of T -realisability, we need to introduce some preliminary definitions and notations.
Let $q(\mathbf{x}) = \exists \mathbf{y}.\, \varphi(\mathbf{x}, \mathbf{y})$ be a rooted CQ and $\mathcal{T}$ be a DL-Lite$_{core}$ TBox. First, recall that the Gaifman graph $G$ of $q$ has the equivalence class $\tilde{t}$ for each term $t \in \mathbf{x} \cup \mathbf{y} \cup \mathbf{I}$ in $\varphi$ as a node, and an edge $\{\tilde{t}_1, \tilde{t}_2\}$ for each atom $P(t_1, t_2)$ in $\varphi$. A subset $\mathbf{z}$ of $\mathbf{y}$ is equality-consistent if $\tilde{z} \subseteq \mathbf{z}$ for every $z \in \mathbf{z}$.
Second, each equality-consistent subset $\mathbf{z}$ of $\mathbf{y}$ has a corresponding subgraph $G|_{\mathbf{z}}$ of Gaifman graph $G$, that is, the subgraph on the set of nodes $\{\tilde{z} \mid z \in \mathbf{z}\}$. This subgraph may have several connected components; a subset $\mathbf{v}$ of $\mathbf{z}$ is maximally connected if it is also equality-consistent and the subgraph of $G|_{\mathbf{z}}$ corresponding to $\mathbf{v}$ is a connected component of $G|_{\mathbf{z}}$. Therefore, $\mathbf{z}$ can be partitioned into its maximally connected subsets.
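The two notions just introduced are purely graph-theoretic and can be prototyped directly. The Python sketch below checks equality-consistency and computes maximally connected subsets; the query used to exercise it is a hypothetical one with five existential variables, an equality between $y_1$ and an individual $c$, and an equality between $y_3$ and $y_4$, and the function names are assumptions made for this illustration.

```python
def equivalence_classes(terms, equalities):
    """Map each term to its class under the equality atoms (naive union-find)."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for s, t in equalities:
        parent[find(s)] = find(t)
    classes = {}
    for t in terms:
        classes.setdefault(find(t), set()).add(t)
    return {t: frozenset(classes[find(t)]) for t in terms}

def is_equality_consistent(z, cls):
    """Every variable of z has its whole equivalence class inside z."""
    return all(cls[v] <= set(z) for v in z)

def maximally_connected_subsets(z, cls, role_atoms):
    """Connected components of the Gaifman graph restricted to the classes of z."""
    nodes = {cls[v] for v in z}
    neighbours = {n: set() for n in nodes}
    for _, s, t in role_atoms:
        if cls[s] in nodes and cls[t] in nodes:
            neighbours[cls[s]].add(cls[t])
            neighbours[cls[t]].add(cls[s])
    components, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                component |= node          # add all variables of the class
                stack.extend(neighbours[node])
        components.append(frozenset(component))
    return components

# Hypothetical query: P(x,y1), y1 = c, P(x,y2), P(d,y3), y3 = y4, R(y4,y5).
terms = {"x", "y1", "y2", "y3", "y4", "y5", "c", "d"}
equalities = [("y1", "c"), ("y3", "y4")]
role_atoms = [("P", "x", "y1"), ("P", "x", "y2"), ("P", "d", "y3"), ("R", "y4", "y5")]
cls = equivalence_classes(terms, equalities)
z = {"y2", "y3", "y4", "y5"}
components = maximally_connected_subsets(z, cls, role_atoms)
```

For this query, any subset containing $y_1$ fails the consistency check because the class of $y_1$ contains the individual $c$, whereas the subset $\{y_2, \ldots, y_5\}$ passes it and splits into two maximally connected subsets.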
Third, for every maximally connected subset $\mathbf{v}$ of an equality-consistent $\mathbf{z} \subseteq \mathbf{y}$ and for each individual $a$, we define the query $q^a_{\mathbf{v}}$, where $\varphi_{\mathbf{v}}$ is the conjunction of all atoms in $\varphi$ mentioning at least one variable in $\mathbf{v}$, $\mathbf{t}_{\mathbf{v}}$ is the set of all terms appearing in $\varphi_{\mathbf{v}}$ but not in $\mathbf{v}$, and $\mathbf{v}' = (\mathbf{x} \cup \mathbf{y}) \cap \mathbf{t}_{\mathbf{v}}$. This query is a Boolean CQ, except that it may have equalities of two individuals and inequalities of terms. The semantics of CQs in Definition 12 can be extended to such queries in a straightforward way: the additional requirement on each valuation $\lambda$ contributing to the sum is that $\lambda$ should satisfy $\lambda(x) \neq \lambda(t)$ for each inequality atom $(x \neq t)$ in the query (and the requirement for equalities of individuals is the same as for usual equalities).

Example 47. Consider the rooted CQ
with y = y 1 , . . . , y 5 over atomic roles P and R, and its Gaifman graph, depicted in Fig. 2a. Observe that no subset of y containing y 1 is equality-consistent because q contains equality y 1 = c and c is not in y. Furthermore, every subset of y containing y 3 or y 4 but not both is also not equality-consistent. However, z = {y 2 , . . . , y 5 } is equality-consistent. The corresponding subgraph G| z is depicted in Fig. 2b. It has two connected components, and therefore z partitions into two maximally connected subsets, v 1 = {y 2 } and v 2 = {y 3 , y 4 , y 5 }. For the first, we have φ v 1 = P (x, y 2 ), t v 1 = {x}, and, for an For the second, we have We are now ready to define the notion of realisability, which is inspired by the notion of tree witnesses proposed by Kikot et al. [37] in the context of query rewriting over DL-Lite R .
Definition 48. Let $q(\mathbf{x}) = \exists \mathbf{y}.\, \varphi(\mathbf{x}, \mathbf{y})$ be a rooted CQ and $\mathcal{T}$ be a DL-Lite$_{core}$ TBox. A subset $\mathbf{z}$ of variables $\mathbf{y}$ is $\mathcal{T}$-realisable if it is equality-consistent and every maximally connected subset $\mathbf{v}$ of $\mathbf{z}$ satisfies the following conditions, where, as before, $\varphi_{\mathbf{v}}$ is the conjunction of all atoms in $\varphi$ mentioning at least one variable in $\mathbf{v}$ and $\mathbf{t}_{\mathbf{v}}$ is the set of all terms appearing in $\varphi_{\mathbf{v}}$ but not in $\mathbf{v}$:
1. there is at most one individual in $\mathbf{t}_{\mathbf{v}}$;
2. all the atoms in $\varphi_{\mathbf{v}}$ mentioning terms in $\mathbf{t}_{\mathbf{v}}$ are over the same atomic role $P_{\mathbf{v}}$ and have these terms at the same position $p_{\mathbf{v}}$

Note that realisability checking is clearly decidable. In particular, by Corollary 44, Condition 3 can be checked over a bounded fragment of the relevant canonical bag interpretation.
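Conditions 1 and 2 are syntactic and can be tested without constructing any canonical interpretation. The Python sketch below is a minimal illustration under simplifying assumptions: only role atoms are inspected (concept and equality atoms are ignored), positions are encoded as 1 and 2, and both the function name and the sample queries are hypothetical. Condition 3 is deliberately not covered.

```python
def boundary_check(v, role_atoms, individuals):
    """Check Conditions 1 and 2 for a maximally connected subset v.

    Returns (condition1, condition2, t_v), where t_v is the set of terms
    occurring in atoms that mention v but not belonging to v themselves.
    """
    phi_v = [(p, s, t) for p, s, t in role_atoms if s in v or t in v]
    t_v = {x for _, s, t in phi_v for x in (s, t) if x not in v}
    # Condition 1: at most one individual among the boundary terms.
    condition1 = len(t_v & individuals) <= 1
    # Condition 2: boundary terms always occur under one role, at one position.
    anchors = {(p, pos) for p, s, t in phi_v
               for pos, x in ((1, s), (2, t)) if x in t_v}
    condition2 = len(anchors) <= 1
    return condition1, condition2, t_v

individuals = {"c", "d"}
# Hypothetical atoms: P(x,y2), P(d,y3), R(y4,y5), with y3 and y4 identified.
role_atoms = [("P", "x", "y2"), ("P", "d", "y3"), ("R", "y4", "y5")]
ok = boundary_check({"y3", "y4", "y5"}, role_atoms, individuals)
# A subset attached to two individuals through two different roles fails both.
bad = boundary_check({"y"}, [("P", "c", "y"), ("R", "d", "y")], individuals)
```

The failing case mirrors the intuition that a connected group of anonymous elements hangs off a single individual through a single role link.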
Example 49. Consider the DL-Lite$_{core}$ TBox $\mathcal{T} = \{A \sqsubseteq \exists P,\ \exists P^- \sqsubseteq \exists R\}$ over atomic concept $A$ and atomic roles $P$ and $R$, the rooted CQ $q(\mathbf{x})$ specified in Example 47, and the equality-consistent subset $\mathbf{z} = \{y_2, \ldots, y_5\}$ with two maximally connected subsets, $\mathbf{v}_1 = \{y_2\}$ and $\mathbf{v}_2 = \{y_3, y_4, y_5\}$. Subset $\mathbf{z}$ is $\mathcal{T}$-realisable. To see this, note first that Conditions 1 and 2 are immediately satisfied by both $\mathbf{v}_1$ and $\mathbf{v}_2$, with no individuals in t v 1 a, b) : 1 | } for fresh individuals $a$ and $b$, and therefore (4), evaluates to 1 for over $Can(\mathcal{K}_{\mathbf{v}_1})$. Condition 3 for $\mathbf{v}_2$ can be verified in exactly the same way, except that $\mathcal{K}_{\mathbf{v}_2}$ and $Can(\mathcal{K}_{\mathbf{v}_2})$ have $d$ instead of $a$, and $q^d_{\mathbf{v}_2}$ is defined in (5).
The next lemma establishes the key property of realisability: for any ABox, a non-realisable z cannot contribute to the partitioning (3) of the bag query answers over the canonical bag interpretation.
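Lemma 50. Let $q(\mathbf{x}) = \exists \mathbf{y}.\, \varphi(\mathbf{x}, \mathbf{y})$ be a rooted CQ and let $\mathcal{T}$ be a DL-Lite$_{core}$ TBox. If a subset $\mathbf{z}$ of $\mathbf{y}$ is not $\mathcal{T}$-realisable, then $[q, \mathbf{z}]^{Can(\langle \mathcal{T}, \mathcal{A} \rangle)} = \emptyset$ for every bag ABox $\mathcal{A}$.
Proof. Let $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$, where $\mathcal{A}$ is an arbitrary bag ABox, and let $\mathbf{z}$ be a subset of $\mathbf{y}$ that is not $\mathcal{T}$-realisable. By Definition 48, either $\mathbf{z}$ is not equality-consistent, or there exists a maximally connected subset of $\mathbf{z}$ for which one of the three conditions does not hold.
It is straightforward to check that $[q, \mathbf{z}]^{Can(\mathcal{K})} = \emptyset$ if $\mathbf{z}$ is not equality-consistent: indeed, in this case the CQ contains an equality atom $(y = t)$ with $y \in \mathbf{z}$ and $t \notin \mathbf{z}$, which cannot be satisfied over the canonical bag interpretation $Can(\mathcal{K})$ by any valuation contributing to $[q, \mathbf{z}]^{Can(\langle \mathcal{T}, \mathcal{A} \rangle)}$, because $y$ is mapped to anonymous elements of $Can(\mathcal{K})$ by such valuations, whereas $t$ is mapped to individuals.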
Similarly, if there is a maximally connected v ⊆ z for which Condition 1 does not hold, then [q, z] Can(K) = ∅: indeed, if t v contains two different individuals, then no contributing valuation can send v to anonymous elements, because v is connected, while Can(K) has independent tree-shaped anonymous parts connected to these two individuals.
Next, if there is a maximally connected v ⊆ z for which Condition 2 does not hold, then again [q, z] Can(K) = ∅: indeed, if φ v contains two atoms over different atomic roles with terms in t v or two atoms over the same atomic role but with the terms in t v in different positions, then no contributing valuation can send v to anonymous elements by the same reasons as in the case of Condition 1 and the fact that no anonymous element is connected to another element in two different ways in Can(K) by construction.
Finally, consider the case when there is a maximally connected v ⊆ z for which Conditions 1 and 2 hold but Condition 3 does not. Reasoning as in the previous two cases, we conclude that all variables in v are sent by every contributing valuation to anonymous elements generated by the same individual in the canonical bag interpretation, and all terms in t v are sent to this individual. By Condition 2, every atom in φ v that has a term in t v is over atomic role P v and has the term in position p v . Therefore, (q a v ) Can(K v ) ( ) is the factor corresponding to φ v in the multiplicity of every valuation satisfying the conditions of [q, z] Can(K) , and (q a v ) Can(K v ) ( ) = 0 means that there are no valuations with non-zero contribution.

Step 2: replacing subqueries with representatives
Consider a $\mathcal{T}$-realisable subset $\mathbf{z}$ of the variables $\mathbf{y}$ in a rooted CQ $q(\mathbf{x}) = \exists \mathbf{y}.\, \varphi(\mathbf{x}, \mathbf{y})$ for a DL-Lite$_{core}$ TBox $\mathcal{T}$. As established in the previous section, maximally connected subsets $\mathbf{v}$ of $\mathbf{z}$ are always disjoint, so the corresponding conjunctions of atoms $\varphi_{\mathbf{v}}$ from $\varphi$ that mention at least one variable in $\mathbf{v}$ do not share atoms either. So, $q(\mathbf{x})$ can be put into the following form for the given $\mathbf{z}$ in a unique way, where $\psi_{\mathbf{z}}$ consists of all atoms in $\varphi(\mathbf{x}, \mathbf{y})$ not contained in any $\varphi_{\mathbf{v}}$:
$$q(\mathbf{x}) = \exists \mathbf{y}.\ \psi_{\mathbf{z}} \wedge \bigwedge_{\text{maximally connected } \mathbf{v} \subseteq \mathbf{z}} \varphi_{\mathbf{v}}. \qquad (6)$$

The rewriting step introduced in this section can be informally explained as follows. By definition, every valuation contributing to bag $[q, \mathbf{z}]^{Can(\langle \mathcal{T}, \mathcal{A} \rangle)}$ with an arbitrary bag ABox $\mathcal{A}$ sends $\mathbf{z}$ to anonymous elements of the canonical bag interpretation. So, since each $\varphi_{\mathbf{v}}$ is connected and mentions a variable from $\mathbf{z}$ in each of its atoms as well as a term outside $\mathbf{z}$, conjunction $\varphi_{\mathbf{v}}$ contributes to every valuation for $[q, \mathbf{z}]^{Can(\langle \mathcal{T}, \mathcal{A} \rangle)}$ a multiplicity of at most 1 (recall that the canonical bag interpretation involves only multiplicities 0 and 1 in the anonymous part). Moreover, whether $\varphi_{\mathbf{v}}$ can be appropriately embedded into $Can(\langle \mathcal{T}, \mathcal{A} \rangle)$ depends solely on whether $\varphi_{\mathbf{v}}$ can be embedded into the canonical bag interpretation of the ontology consisting of $\mathcal{T}$ and a prototypical bag ABox that comprises a single assertion with multiplicity 1 in such a way that all terms of $\varphi_{\mathbf{v}}$ that are outside $\mathbf{z}$ are sent to one of the individuals of the assertion. Therefore, when computing $[q, \mathbf{z}]^{Can(\langle \mathcal{T}, \mathcal{A} \rangle)}$, the whole $\varphi_{\mathbf{v}}$ can be replaced in $q$ by just a single role atom mentioning a representative variable and a term not in $\mathbf{v}$, as well as several equalities identifying all the terms not in $\mathbf{v}$.
Next, we formalise this idea and prove its correctness.
Definition 51. Let q(x) = ∃y. φ(x, y) be a rooted CQ and let T be a DL-Lite core TBox. For every T -realisable subset z of y, let where ψ z is defined as in (6), y z is the set of all variables in y appearing in ψ z , atom α v is defined as follows, for every maximally connected v ⊆ z with terms t v , role P v , and position p v defined as in Definition 48, as well as for a term t ∈ t v and a fresh variable y v : and z is the set of all variables y v introduced for the atoms α v .
Formally speaking, this definition is non-deterministic, because the terms t in the atoms α v are chosen arbitrarily from t v .
However, this choice does not influence the semantics of q z (x) because of the equalities introduced in (7). Therefore, we assume that q z (x) is well-defined.
Example 52. Consider the rooted CQ q(x), DL-Lite core TBox T , and T -realisable subset z with maximally connected subsets v 1 and v 2 introduced in Examples 47 and 49. We know that In contrast to q, CQ q z does not contain any R atoms. Indeed, such atoms are not needed: sending y 3 and y 4 to the same anonymous element w and given that w must be a P -successor of d in the canonical bag interpretation, it follows that w must have an R-successor due to inclusion ∃P − ∃R in T . Thus, we only need a representative y v 1 for y 3 and y 4 .
In the second step, our algorithm generates the query q z (x) for each T -realisable subset z of y. The next lemma justifies this step by showing that we can consider in (3) only T -realisable z and replace q with q z for each such z.
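Lemma 53. Let $q(\mathbf{x}) = \exists \mathbf{y}.\, \varphi(\mathbf{x}, \mathbf{y})$ be a rooted CQ and let $\mathcal{T}$ be a DL-Lite$_{core}$ TBox. If a subset $\mathbf{z}$ of $\mathbf{y}$ is $\mathcal{T}$-realisable, then $[q, \mathbf{z}]^{Can(\langle \mathcal{T}, \mathcal{A} \rangle)} = [q_{\mathbf{z}}, \mathbf{z}']^{Can(\langle \mathcal{T}, \mathcal{A} \rangle)}$ for every bag ABox $\mathcal{A}$, where $\mathbf{z}'$ is as in Definition 51.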
Proof. Let K = T , A where A is an arbitrary bag ABox, and let z be a T -realisable subset of y. First, for every valuation λ ∈ z as in Definition 46 of [q, z] Can(K) , let λ z be the valuation of x ∪ y z ∪ z ∪ I to Can(K) that is the same as λ on I and all the terms of ψ z , and, for each maximally connected subset v of z, λ z (y v ) = λ(y), where y is a variable in v such that φ v contains an atom P v (t, y) or an atom P v (y, t), for some t ∈ t v ; in other words, λ z is the same as λ except that each v is replaced by its representative with the corresponding value. It suffices to show that every valuation λ ∈ z contributes to [q, z] Can(K) the same multiplicity as λ z contributes to [q z , z ] Can(K) .
Consider first a valuation λ ∈ z contributing to [q, z] Can(K) a non-zero multiplicity. By construction, Can(K) interprets all concepts with multiplicity at most 1 on the anonymous elements and all roles with multiplicity at most 1 on all pairs with at least one anonymous element. Thus, the contribution of λ is equal to the contribution of (the relevant part of) λ to the evaluation of ψ z . So, it is enough to show that, for each maximally connected v ⊆ z, λ z (α v ) has multiplicity 1 in Can(K), and all the equalities on t v introduced in (7) to q z hold for λ z (i.e., for λ, because they coincide on t v ). The former holds immediately by construction of λ z , while the latter follows from the fact that v are connected in q and sent to anonymous elements by λ, while Can(K) is tree-shaped on the anonymous elements. Consider now a valuation λ ∈ z such that λ z contributes to [q z , z ] Can(K) a non-zero multiplicity. Similarly to the previous case, the contribution of λ z is equal to the contribution of (the relevant part of) λ z to the evaluation of ψ z . So, it is enough to show that, for each maximally connected v ⊆ z, λ(φ v ) has multiplicity 1 in Can(K). However, this follows from the fact that z is T -realisable (in particular, Condition 3) and the fact that λ z (α v ) has multiplicity 1 in Can(K).

Step 3: rewriting atoms to BCALC queries
In the last step, our algorithm first transforms each CQ q z (x) computed in Step 2 for a T -realisable z to a BCALC query z (x) satisfying equality [q z , z ] Can( T ,A ) = A z and then constructs the final rewriting of the input CQ q(x) = ∃y. φ(x, y) with respect to the input DL-Lite core TBox T to the BCALC query The intuition behind the construction of z (x) from q z (x) and T hinges on the observation that, for every bag ABox A, the multiplicities of individuals in the interpretations of concepts and roles in Can( T , A ) are determined by the multiplicities of the assertions in A as follows: -for a concept C , the multiplicity of an individual a in C Can( T ,A ) is the maximum multiplicity of a in the interpretation of the concepts subsumed by C with respect to T in the bag interpretation corresponding to A; -for a role P , the multiplicity of a pair of individuals (a, b) in P Can( T ,A ) coincides with the multiplicity of assertion P (a, b) in A (which is justified by the fact that DL-Lite core TBoxes do not allow for role inclusions).
Therefore, for a role R, the number of anonymous R-successors of an individual a in the canonical bag interpretation is precisely the multiplicity of a in the interpretation of the concept ∃R minus the number of individual R-successors of a in the ABox. We can then exploit these observations to construct a BCALC query z (x) such that the bag answers to z (x) over every ABox A coincide with the bag [q z , z ] Can( T ,A ) (where all valuations contributing to the latter bag map z to anonymous elements and the rest to individuals). For this, it suffices to apply to q z , which has the form (7), the following replacements: -each atom over an atomic concept A in the conjunction ψ z of q z with a BCALC query retrieving the maximum multiplicity over all concepts subsumed by A in T ; , for a maximally connected v ⊆ z, with a BCALC query that subtracts the number of P vsuccessors in the ABox from the maximum multiplicity over all concepts subsumed by ∃P v in T ; and , for a maximally connected v ⊆ z, with a BCALC query that subtracts the number of P vpredecessors in the ABox from the maximum multiplicity over all concepts subsumed by ∃P − v in T .
Note that atoms over atomic roles in ψ z are left intact because T does not allow for role inclusions. Next, we formalise this intuition and define the BCALC query z (x) in terms of q z (x) and T .
Definition 54. Let q(x) = ∃y. φ(x, y) be a rooted CQ, T be a DL-Lite core TBox, and let z be a T -realisable subset of y. The BCALC query z (x) is obtained from q z (x) by replacing
- each occurrence of an atom A(t) in ψ z with the BCALC maximal union (9) of the queries ζ C (t) for all concepts C subsumed by A in T ; and
- for each maximally connected v ⊆ z, the atom P v (t, y v ) and the atom P v (y v , t) with the BCALC query (10) that takes the maximal union of the queries ζ C (t) for all concepts C subsumed by ∃R v in T and subtracts from it the query ∃y. ξ R v (t, y), where R v is P v and P − v , respectively;
where, for every concept C and term t, and for a fresh variable y, ζ C (t) is C(t) if C is an atomic concept and ∃y. ξ R (t, y) if C is of the form ∃R, and, for every atomic role P and terms t 1 and t 2 , ξ P (t 1 , t 2 ) = P (t 1 , t 2 ) and ξ P − (t 1 , t 2 ) = P (t 2 , t 1 ).
Before proving correctness of the last step of the rewriting, we illustrate the definitions on our running example.
Example 55. Consider the DL-Lite core TBox T = {Record ⊑ ∃hasMusician, ∃hasMusician − ⊑ Musician} and the rooted CQ q(x) = ∃y. hasMusician(x, y) ∧ Musician( y). There are two subsets of y, namely ∅ and y, and it is immediate to check that both of them are T -realisable. Moreover, q ∅ (x) = q(x) and q y (x) = ∃y . hasMusician(x, y ).
For the first of these CQs, all valuations contributing to the bag [q ∅ , ∅] Can(K) map all variables of q ∅ to individuals, for every K with TBox T . Therefore, the atom hasMusician(x, y) remains intact in ∅ whereas the atom Musician( y) is rewritten to a BCALC query retrieving the multiplicity of an individual a in Musician Can(K) , which is equal to the maximum multiplicity of a amongst the concepts Musician and ∃hasMusician − over the ABox of K. As a result,
For the second CQ, q y , all valuations contributing to the bag [q y , y ] Can(K) map y to an anonymous element and x to an individual, for every K with T . Hence, each such valuation contributes to [q y , y ] Can(K) multiplicity 1, while all valuations λ agreeing on x contribute to [q y , y ] Can(K) an overall multiplicity equal to the number of anonymous hasMusician-successors of λ(x). This number is the multiplicity of λ(x) in the interpretation of ∃hasMusician under Can(K) minus the number of individual hasMusician-successors of λ(x). Inspecting T , we finally derive ∃y . hasMusician(x, y ). Evaluating the two rewritings on this ABox, we get
The following lemma establishes correctness of Step 3 of our rewriting approach as formalised in Definition 54.
Lemma 56. Let q(x) = ∃y. φ(x, y) be a rooted CQ and T a DL-Lite core TBox. Then [q z , z ] Can( T ,A ) = A z for every bag ABox A and every T -realisable subset z of y with z as in Definition 51.
Proof. We first claim that it is enough to show that, for every T -realisable subset z of y and every maximally connected v ⊆ z, the bag answers to the rewritings (9) and (10) of atoms A(t) and ξ R v (t, y v ), respectively, in q z over every bag ABox A are equal to the bag answers [A(t), ∅] Can(K) and [∃y v . ξ R v (t, y v ), y v ] Can(K) , respectively, for K = T , A . Indeed, on the one hand, z is obtained from q z by applying only these replacements; on the other hand, the atoms that are not rewritten in z are of the form P (t 1 , t 2 ) or (t 1 = t 2 ), for terms t 1 and t 2 that are mapped to individuals by each valuation λ ∈ z . Such atoms do not need to be rewritten, because atoms P (t 1 , t 2 ) satisfy P Can(K) (λ(t 1 ), λ(t 2 )) = A(P (λ(t 1 ), λ(t 2 ))) for every λ ∈ z , whereas equalities do not contribute to multiplicities.
We now argue the correctness of replacements (9) and (10). By the definitions of canonical bag interpretation and concept closure, for all individuals a and concepts C , First, by substituting A for C in (11) and by the semantics of BCALC queries, we immediately derive the following for every a ∈ I and A ∈ C: which proves the claim for concept atoms. For R v atoms, note that, for each individual a, [∃y Therefore, by substituting ∃R v for C in (11) and by the semantics of BCALC queries, we get the following for every a ∈ I and A ∈ C: which proves the claim for R v atoms.

Rewriting and complexity
Putting the results of the previous three sections together, we obtain the following theorem, which establishes the correctness of our rewriting approach.

Theorem 57. For every rooted CQ q and every DL-Lite
Proof. Consider a rooted CQ q(x) = ∃y. φ(x, y) and a DL-Lite b core ontology K = T , A . By (3), by Lemmas 50, 53, and 56, and by (8) we have the sequence of equalities which proves the statement of the theorem.
Theorems 42 and 57, together with the fact that q does not depend on A, imply our main rewritability result.
Corollary 58. The class of rooted CQs is rewritable to BCALC over DL-Lite b core .
Recall that by Corollary 43 the rewritability result applies regardless of the adoption of the UNA. Note also that we need to manipulate only finite bags when evaluating the rewriting q . Since BCALC maximal union is expressible via BCALC arithmetic union and difference for such bags according to equation (1), we can strengthen Corollary 58 and claim that there always exists a rewriting that uses only ∧, ∨ · , \, equalities, and existential quantification.
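For finite bags, the expressibility of maximal union via arithmetic union and difference can be checked with a short Python sketch. It uses one standard identity, max(m, n) = (m ∸ n) + n, which may be stated differently from equation (1) of the article; the function names and sample bags are illustrative assumptions. Python's `collections.Counter` provides the three operations directly: `|` takes the maximum of multiplicities, `+` adds them, and `-` subtracts them while dropping non-positive results.

```python
from collections import Counter


def maximal_union(a, b):
    """Multiplicity of each element is the maximum of its multiplicities."""
    return a | b


def arithmetic_union(a, b):
    """Multiplicity of each element is the sum of its multiplicities."""
    return a + b


def difference(a, b):
    """Bag difference: multiplicities are subtracted and truncated at zero."""
    return a - b


def maximal_union_via_arithmetic(a, b):
    """Maximal union expressed as (a \\ b) arithmetic-union b."""
    return arithmetic_union(difference(a, b), b)


# Illustrative finite bags (assumptions, not from the article).
A = Counter({"x": 3, "y": 1})
B = Counter({"x": 1, "y": 2, "z": 4})

print(maximal_union(A, B) == maximal_union_via_arithmetic(A, B))  # True
```

The identity relies on the difference being truncated at zero, which is why it is stated for bags with finite, non-negative multiplicities only.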
We conclude with the LogSpace upper bound on the data complexity of query answering for rooted CQs over DL-Lite b core . We can decide this problem by first checking the ontology for unsatisfiability as in the usual set setting and then, if the ontology turns out to be satisfiable, evaluating the BCALC rewriting of the input query on the ABox. The algorithm is correct by Statement 1 of Theorem 23 on the equivalence of satisfiability under bag and set semantics, and by Corollaries 43 and 58. Both steps can be done in LogSpace by Proposition 5 on the LogSpace membership of the query answering problem for BCALC and the results obtained by Calvanese et al. [9].

Rewritability of conjunctive queries over DL-Lite b RDFS under UNA
In this section we show BCALC rewritability of CQs over DL-Lite b rdfs under the UNA as well as tractability of the corresponding query answering problem in data complexity. This result holds only under the UNA since, as shown in Theorem 29, even rooted CQ answering over DL-Lite b rdfs is coNP-hard in data complexity if the UNA is dropped. Note, however, that the results of this section hold for arbitrary CQs and not just rooted ones.
We proceed analogously to the case of DL-Lite b core described in the previous two sections; however, the absence of existential quantification on the right-hand side of concept inclusions considerably simplifies the exposition.
First, to formalise canonical bag interpretations for DL-Lite b rdfs , we introduce the notion of role closure, which is analogous to that of concept closure in Section 7. The role closure rcl T [(u, v), I] of a pair (u, v) of elements u, v ∈ I in a bag interpretation I = I , · I over a TBox T is the bag of roles such that, for every role R, We are now ready to define canonical bag interpretations for DL-Lite b rdfs ontologies.
Definition 60. Let K = T , A be a DL-Lite b rdfs ontology and Can 0 (K) be the bag interpretation corresponding to A, that is, such that Can 0 (K) = I, a Can 0 (K) = a for each a ∈ I, and S Can 0 (K) (a) = A(S(a)) for each S ∈ C ∪ R and tuple of individuals a. Let also Can R (K) be the bag interpretation with domain I that interprets individuals and atomic concepts as Can 0 (K) and, for each atomic role P and individuals a, b, The canonical bag interpretation Can(K) of K is the bag interpretation with domain I that interprets individuals and atomic roles as Can R (K), and, for each atomic concept A and individual a, satisfies
There are two main differences between the canonical bag interpretations for DL-Lite b rdfs and DL-Lite b core ontologies. On the one hand, canonical bag interpretations for DL-Lite b rdfs do not involve anonymous domain elements; this is because the logic does not support existentially quantified concepts on the right-hand side of inclusions. On the other hand, canonical bag interpretations for DL-Lite b rdfs need to satisfy role inclusions, which is ensured using the notion of role closure. As expected, the aforementioned definitions of a canonical bag interpretation coincide for ontologies that are both in DL-Lite b rdfs and DL-Lite b core .
We next argue that the canonical bag interpretation in Definition 60 is a universal model for CQs under the assumptions in this section.
Proof. First, note that every DL-Lite b rdfs ontology is satisfiable under the UNA, because it does not have any disjointness axioms (or any other axioms that may cause an inconsistency). Moreover, the canonical bag interpretation Can(K) is a model of a DL-Lite b rdfs ontology K by construction.
Second, every bag interpretation I satisfying the UNA has the corresponding bag interpretation I s satisfying the standard name assumption, that is, I s is the same as I except that, for every a ∈ I, the individual a itself is used in I s as an element instead of a I ; moreover, q I = q I s for every query q. Therefore, it is enough to show that, for every model I of K, I s contains Can(K), in the sense that Can(K) ⊆ I s and S Can(K) ⊆ S I s for every S ∈ C ∪ R. However, this follows from the definitions of a bag interpretation and the canonical bag interpretation, because concept and role closures essentially encode satisfaction of the inclusions.
As a result, we have q K ( ) = 2. Assume now for the sake of contradiction that there exists a BCALC maximal union of CQs such that A ( ) = 2. By the semantics of maximal union, this means that contains a CQ q 0 = ∃y. φ(y) satisfying There are only two ways in which this is possible:
- there is only one valuation λ : y ∪ I → I for q 0 contributing multiplicity 2 to q A 0 ( ), in which case φ(y) must contain an atom S(t) such that S A (λ(t)) = 2; however, this is not possible because A does not have any assertions with multiplicity 2;
- there exist two distinct valuations for q 0 , each contributing multiplicity 1; however, this is not possible because A does not have two assertions with multiplicity 1 over the same atomic concept or role.
As a result, we conclude that cannot contain any such CQ q 0 , and hence cannot be a rewriting of q with respect to T .
Finally, non-rewritability to BCALC arithmetic unions of CQs follows already from the proof of the second part of Proposition 45, where the relevant TBox consists only of an inclusion between two atomic concepts.
In what follows, we show how to construct a BCALC rewriting of an arbitrary CQ with respect to a DL-Lite rdfs TBox. The construction is much simpler than that for DL-Lite b core in Section 8 since the canonical bag interpretation of a DL-Lite b rdfs ontology does not contain anonymous elements. In particular, there is no need to identify T -realisable subsets of existentially quantified variables and to transform the input CQ accordingly, which implies that Steps 1 and 2 in Section 8 are no longer required. As a result, a BCALC rewriting of the CQ can be directly constructed following an analogous approach to that of Step 3 in Section 8, and yet the operations of arithmetic union and difference are no longer required. Note, however, that the BCALC query resulting from the rewriting is not a BCALC maximal union of CQs since it interleaves maximal unions with existential quantification.
Definition 64. Let q(x) = ∃y. φ(x, y) be a CQ and T be a DL-Lite rdfs TBox. The BCALC query q (x) is obtained from q(x) by replacing each occurrence of an atom A(t) and an atom P (t 1 , t 2 ) with where, for every concept C and term t, and for a fresh variable y, if C is an atomic concept, ∃y.
and, for every atomic role P and terms t 1 and t 2 , ξ P (t 1 ,
So, analogously to Definition 54 in the DL-Lite b core case, we replace each concept atom A(t) with a BCALC maximal union of all atoms over t corresponding to concepts subsumed by A in T ; in contrast to Definition 54, however, we also need to take into account role inclusions, which results in the inclusion of further disjuncts in the second part of the definition of ζ C (t). Then, each role atom P (t 1 , t 2 ) is replaced by the BCALC maximal union of all binary atoms over t 1 and t 2 corresponding to roles subsumed by P in T .
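The effect of these replacements can be mimicked by computing multiplicities directly, as in the following Python sketch. It is a minimal illustration under stated assumptions rather than the construction of Definition 64 itself: the role hasGuitarist, the inclusions, the bag ABox, and all helper names are hypothetical. A role atom receives the maximum multiplicity over its subroles, and a concept atom the maximum over its subconcepts, where a concept ∃R contributes the sum over all individuals of the (already maximised) multiplicities of R.

```python
from collections import Counter


def inv(role):
    """Inverse of a role encoded as (name, is_inverse)."""
    name, inverse = role
    return (name, not inverse)


# Hypothetical TBox: hasGuitarist ⊑ hasMusician and ∃hasMusician⁻ ⊑ Musician.
ROLE_INCLUSIONS = [(("hasGuitarist", False), ("hasMusician", False))]
CONCEPT_INCLUSIONS = [(("exists", ("hasMusician", True)), "Musician")]

# Hypothetical bag ABox: assertion -> multiplicity.
ROLE_ASSERTIONS = Counter({
    ("hasGuitarist", "r1", "m1"): 2,
    ("hasMusician", "r1", "m1"): 1,
})
CONCEPT_ASSERTIONS = Counter({("Musician", "m1"): 1})
INDIVIDUALS = {"r1", "m1"}


def closure(start, inclusions):
    """Everything below `start` in the reflexive-transitive closure."""
    found, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for sub, sup in inclusions:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def subroles(role):
    # An inclusion between two roles also holds between their inverses.
    both = ROLE_INCLUSIONS + [(inv(s), inv(t)) for s, t in ROLE_INCLUSIONS]
    return closure(role, both)


def role_multiplicity(role, a, b):
    """Maximum ABox multiplicity of (a, b) over all subroles of `role`."""
    def asserted(r):
        name, inverse = r
        return ROLE_ASSERTIONS[(name, b, a) if inverse else (name, a, b)]
    return max(asserted(r) for r in subroles(role))


def concept_multiplicity(atomic_concept, a):
    """Maximum over subconcepts; ∃R counts successors after role maximisation."""
    best = 0
    for c in closure(atomic_concept, CONCEPT_INCLUSIONS):
        if isinstance(c, str):
            value = CONCEPT_ASSERTIONS[(c, a)]
        else:
            value = sum(role_multiplicity(c[1], a, b) for b in INDIVIDUALS)
        best = max(best, value)
    return best


print(role_multiplicity(("hasMusician", False), "r1", "m1"))  # 2
print(concept_multiplicity("Musician", "m1"))                 # 2
```

The nesting of a maximum inside a sum inside a maximum mirrors the interleaving of maximal unions with existential quantification in the rewriting.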
We now provide an example illustrating the construction of a rewriting.

Consider now ABox
Canonical interpretation Can(K) assigns to every atomic concept A and every atomic role P the bag defined as [(u, v), Can 0 (K)](P ) for all elements u, v ∈ Can(K) , respectively. By unfolding the definition of closures, we get the following, for every atom A(t) and P (t 1 , t 2 ) in φ(x, y), and every valuation λ: Consider first the right-hand side of (13). Observe that the first subexpression involving the max function can be equivalently written as A (λ(t)). As for the second subexpression, it can be equivalently written as can be further written as ∃y .
A . Thus, by substituting for ∪ in the aforementioned expression and for the outer max function in (13), and by the definition of ζ C (t) in Definition 64, we derive as required. To complete the proof of the theorem, observe that from (14) and the semantics of BCALC queries we immedi-

Rewritability of rooted conjunctive queries over DL-Lite b R − under UNA
In this section we consider BCALC rewritings of rooted CQs over the ontology language DL-Lite b R − , which extends both DL-Lite b core and DL-Lite b rdfs . As defined in Section 2.1, this language provides all the constructs available in DL-Lite R , but allows concept inclusions of the form C ⊑ ∃R only for roles R that do not have more general roles in the ontology.
Similarly to the case of DL-Lite b rdfs considered in the previous section, we focus on the semantics that adopts the UNA; this is justified by Theorem 29, where we showed that (rooted) CQ answering over ontology languages allowing for role inclusions is coNP-hard in data complexity if the UNA is not adopted. However, in contrast to the previous section, we focus on rooted CQs rather than general CQs; this choice is justified by Theorem
Note that the canonical bag interpretation of a satisfiable DL-Lite b R − ontology K = T , A is always a model of K by construction and the fact that T does not have concept inclusions of the form C ⊑ ∃R whenever R has a more general role in T . This fact guarantees that the subinterpretation of Can(K) corresponding to the set of concept inclusions in T , namely ⋃ i≥1 Can i (K), does not violate the role inclusions in T that have been already satisfied in Can 0 (K). Given this property, the following is a generalisation of Proposition 34 for the canonical bag interpretations of DL-Lite b core ontologies, which holds again regardless of whether the UNA is adopted or not (recall that this was automatic for the case of DL-Lite b rdfs because this language does not allow for any inconsistencies).

Proposition 71. If a DL-Lite b R − ontology is satisfiable then its canonical bag interpretation is its model.
We next show that the canonical bag interpretation is universal for the class of rooted CQs. For this, we establish the counterpart of Lemma 38 for the case of DL-Lite b R − ontologies.
We next move to the BCALC rewritability of rooted CQs over DL-Lite b R − . We start by pointing out that this class of queries is rewritable neither to BCALC maximal nor to BCALC arithmetic unions of CQs over DL-Lite b R − under the UNA, which is a direct consequence of Propositions 45 and 63, as well as the fact that DL-Lite b R − extends both DL-Lite b core and DL-Lite b rdfs . In fact, the rewriting algorithm can be seen as a combination of the corresponding algorithms for DL-Lite b core and DL-Lite b rdfs , described in Sections 8 and 9, respectively. When given as input a DL-Lite R − TBox T and a rooted CQ q(x) = ∃y. φ(x, y), the algorithm considers each subset z of y independently and then takes the BCALC arithmetic union of the resulting rewritings. In particular, for each such z, we proceed according to the following three steps:
1. as specified in Definition 48, z is checked for T C -realisability, where T C is the set of concept inclusions in T , and disregarded from consideration if the check fails;
2. as specified in Definition 51, each maximally connected component of the subquery corresponding to z is replaced by a single representative role atom, resulting in a CQ; and
3. all atoms are rewritten to a BCALC query that takes into account the TBox and the fact that z should be mapped to anonymous elements.
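The outer loop of this procedure, together with the grouping of variables used in step 2, can be sketched in Python as follows. This is only an outline under assumptions: the query atoms are hypothetical, two variables of z are taken to be connected when they co-occur in a role atom (Definition 51 is not reproduced here and may be more specific), and the realisability check of step 1 and the rewriting of step 3 are deliberately omitted.

```python
from itertools import combinations

# Hypothetical CQ body with answer variable x and existential variables y1, y2, y3.
# Atoms are triples (role_name, term1, term2).
ATOMS = [("P", "x", "y1"), ("R", "y1", "y2"), ("S", "x", "y3")]
EXISTENTIAL_VARS = ["y1", "y2", "y3"]


def subsets(variables):
    """Enumerate every subset z of the existentially quantified variables."""
    for size in range(len(variables) + 1):
        for combo in combinations(variables, size):
            yield set(combo)


def connected_components(z, atoms):
    """Split z into maximal sets of variables linked by atoms over z (assumed notion)."""
    remaining, components = set(z), []
    while remaining:
        start = remaining.pop()
        component, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for _, t1, t2 in atoms:
                if current in (t1, t2):
                    for other in (t1, t2):
                        if other in remaining:
                            remaining.discard(other)
                            component.add(other)
                            frontier.append(other)
        components.append(component)
    return components


for z in subsets(EXISTENTIAL_VARS):
    # Step 1 (realisability) and step 3 (rewriting) would be applied here.
    groups = sorted(sorted(c) for c in connected_components(z, ATOMS))
    print(sorted(z), groups)
```

Since every subset of y is considered, the number of candidate sets z is exponential in the size of the query, but it does not depend on the ABox.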
The only essential difference to the algorithm for DL-Lite b core is in the last step, where we now take into account also the presence of role inclusions, as in the case of DL-Lite b rdfs . The construction of the rewriting is formalised by the following definition.
Definition 74. Let q(x) = ∃y. φ(x, y) be a rooted CQ and T a DL-Lite R − TBox. For a T C -realisable subset z of y, let z (x) be the query obtained from the CQ q z (x), given in Definition 51, by replacing -each occurrence of an atom A(t) and an atom P (t 1 , t 2 ) with -the atom P v (t, y v ) and the atom P v (y v , t) for each maximally connected v ⊆ z with the following BCALC query, where R v is P v and P − v , respectively: where, for every concept C and term t, and for a fresh variable y, if C is an atomic concept, ∃y.
and, for every atomic role P and terms t 1 , Finally, let
The structure of the rewriting in Definition 74 is similar to that for DL-Lite b core in Definition 54. The main differences are as follows:
- when rewriting atoms not having variables in z, we take into account the role inclusions as in the DL-Lite b rdfs case; and
- when rewriting role atoms P v (t, y v ) and P v (y v , t) for maximally connected subsets v of variables, we distinguish between different types of subconcepts of ∃P v and ∃P − v , respectively: for atomic concepts we follow the rewriting for DL-Lite b core , while for existentially quantified subconcepts we take into account the role inclusions as in the DL-Lite b rdfs case.
Example 75. Consider TBox T from Example 70 and the rooted CQ q(x) = ∃y. hasMusician(x, y) ∧ Musician( y), same as in Example 55. As before, there are two T C -realisable subsets of y, namely ∅ and y, and q ∅ (x) = q(x) and q y (x) = ∃y . hasMusician(x, y ).
We show the correctness of the approach by means of the following generalisation of Lemma 56.
Proof. The proof is similar to the proofs of Lemma 56 and Theorem 66. Consider an arbitrary bag ABox A and let K = T , A be the resulting DL-Lite b R − ontology. We need to show that the bag answers over each bag ABox A to the rewritings of atoms A(t), P (t 1 , t 2 ) and ξ P v (t, y v ) in q z as these appear on the left- and right-hand side of (15), as well as in (16), respectively, are equal to the bag answers [A(t), ∅] Can(K) , [P (t 1 , t 2 ), ∅] Can(K) , and [∃y v . ξ P v (t, y v ), y v ] Can(K) , respectively.
We now argue the correctness of replacements (15) and (16) in Definition 74. To begin, let T R be the set of role inclusions in T , let I A = I A , · I A be the bag interpretation corresponding to ABox A-that is, the interpretation defined as I A = I, a I A = a for all a ∈ I, and S I A (a) = A(S(a)) for all S ∈ C ∪ R and tuples of individuals a, and let Can R ( T R , A ) be the bag interpretation for the DL-Lite b rdfs ontology T R , A defined on the basis of I A according to Definition 60 (note that we use I A instead of Can 0 (K) since the definition of the latter in DL-Lite b R − differs from the one in DL-Lite b rdfs ). By the definitions of canonical bag interpretations for DL-Lite b R − ontologies as well as concept and role closures, for all individuals a and b, concepts C , and atomic roles P , we have the following equalities: From (17), (18), and the semantics of BCALC queries, we immediately derive the following, for every a, b ∈ I, A ∈ C, and P ∈ R (see also the derivations for (13) and (14) in the proof of Theorem 66): which proves the claim for atoms of the form A(t) and P (t 1 , t 2 ). The claim for P v atoms is proved using (17) and similarly to the proof of Lemma 56.
Using this lemma, the following theorem can be proved in exactly the same way as Theorem 57.
Theorem 77. For every rooted CQ q(x) and every DL-Lite b
Combining Theorem 73 and Theorem 77, we derive the following corollary.

Related work
In this section, we establish bridges between our bag semantics for OBDA and existing work in the literature on data exchange and query answering over DL-Lite.

Bag semantics in data exchange
Hernich and Kolaitis [20] have recently studied CQ answering under bag semantics in the context of data exchange. As is customary in conventional treatments of data exchange [32], their setting considers disjoint source and target database schemas, which are related via source-to-target GLAV mappings. In the data exchange literature it is common to also consider dependencies over the target schema (which can equivalently be seen as ontological axioms); however, the semantics by Hernich and Kolaitis is defined only under the assumption that no such dependencies exist.
Definition 80. A bag data exchange setting is a tuple S, T, , D where -S is a source database schema; -T is a target database schema disjoint from S; -is a finite set of global-and-local-as-view (GLAV) mapping assertions (or mappings) of the form where q(x) is a CQ over S and p(x) is a CQ over T; 3 and -D is a bag database instance over S.
In the conventional set-based case, a solution of a data exchange setting is a finite database over the target schema that satisfies all the mappings together with the source database. Hernich and Kolaitis [20] defined two possible generalisations of this notion to bag semantics.
In other words, the incognizant semantics adopts the maximal union approach for interpreting mappings defining the same view; that is, when applied to GAV mappings q 1 (x) → S(x) and q 2 (x) → S(x) for a predicate S, and a database D, an incognizant solution B must satisfy q D 1 ∪ q D 2 ⊆ S B ; in contrast, the cognizant semantics adopts the arithmetic union approach and requires that a solution B satisfies q D 1 ⊎ q D 2 ⊆ S B .
Query answering in bag data exchange settings is defined as the problem of computing the bag certain answers to a query over the target schema with respect to the set of solutions.
We are now ready to establish the connection between our OBDA framework and that of Hernich and Kolaitis for data exchange. Note that there are several differences between our OBDA settings, introduced in Definition 6, and data exchange settings. First, target schemas in OBDA are restricted to predicates of arity one and two, whereas in data exchange the arity of target predicates is unbounded. Second, data exchange mappings have CQs on both sides, whereas OBDA mappings have arbitrary BCALC queries on the source side, but only atomic queries of the form A(x) and P (x, y) on the target side.
Third, an OBDA setting comes with a TBox, whereas the data exchange setting in [20] does not allow for any dependencies over the target schema. Finally, from a semantics point of view, certain answers in OBDA are defined in terms of (possibly infinite) models, whereas certain answers in data exchange are defined in terms of (finite) database solutions.
We next show that, despite these mismatches, our semantics is compatible with that of Hernich and Kolaitis. For this we note that OBDA and data exchange settings are syntactically compatible if we assume target predicates of arity only one and two, absence of an ontology, and restrict ourselves to GAV mappings with CQs on the source side and atoms A(x) and P (x, y) on the target side. Under these assumptions, we argue that our semantics coincides with the incognizant semantics of Hernich and Kolaitis; furthermore, their cognizant semantics can be simulated in our setting by means of a suitable rewriting of their mappings. Note also that the following proposition is stated for the OBDA semantics without the UNA; however, adopting the UNA would not affect this result, because in case of an empty ontology the two semantics coincide. Proof. Given a bag database B over T, let I B be the bag interpretation corresponding to B-that is, the interpretation having the set of individuals (i.e., constants) appearing in B as the domain and satisfying S I B = S B for every predicate S ∈ T. We claim that, for every bag database B over T, The first claim follows from Definitions 81 and 9 of the semantics of data exchange and OBDA, respectively.
Consider now the second claim. The fact that B is a c-solution for S, T, , D means that there are bag database instances B σ , for σ ∈ , such that q D ⊆ (S(x)) B σ for each σ = q(x) S(x) and σ ∈ B σ ⊆ B. This is equivalent to the fact that, for each atom S(x), which means that I B is a model of ∅, , D by the definition of . To conclude the proof, note that, since an ontology with an empty TBox has a unique finite model that is minimal with respect to bag inclusion, by Lemma 21, it is enough to consider only finite interpretations when computing the bag certain answers to q.
Note that the second statement in this proposition is essentially illustrated in Example 11. We conclude this section by observing that, at least in principle, we could have defined a "cognizant" bag semantics for DL-Lite b R , which is based on arithmetic union rather than maximal union. Under such a semantics, inclusions A ⊑ C and B ⊑ C would be satisfied by a bag interpretation I only if A I ⊎ B I ⊆ C I ; this is in contrast to our current "incognizant" semantics where we require that A I ∪ B I ⊆ C I . Such a semantics would, however, come with clear disadvantages.
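The difference between the two readings of a pair of inclusions with the same right-hand side can be made concrete with a small Python sketch. The bags below are illustrative assumptions: they stand for the interpretations of two concepts A and B, and the code computes the least bag that C must contain under each reading.

```python
from collections import Counter

# Hypothetical bag interpretations of concepts A and B.
A_EXT = Counter({"a": 2, "b": 1})
B_EXT = Counter({"a": 3})

# Incognizant reading: C must contain the maximal union of A and B.
INCOGNIZANT_MINIMUM = A_EXT | B_EXT

# Cognizant reading: C must contain the arithmetic union of A and B.
COGNIZANT_MINIMUM = A_EXT + B_EXT


def is_contained(smaller, larger):
    """Bag inclusion: every multiplicity in `smaller` is matched in `larger`."""
    return all(larger[element] >= m for element, m in smaller.items())


print(INCOGNIZANT_MINIMUM["a"], COGNIZANT_MINIMUM["a"])       # 3 5
print(is_contained(INCOGNIZANT_MINIMUM, COGNIZANT_MINIMUM))   # True
```

Every bag satisfying the cognizant requirement also satisfies the incognizant one, but not conversely, so the cognizant reading forces larger multiplicities in general.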

Count aggregate queries over ontologies
Our approach to OBDA query answering under bag semantics is closely related to existing approaches for answering conjunctive counting aggregate queries over DL-Lite R [36,38]. Indeed, under bag semantics, CQs are intrinsically equipped with counting power: the result of evaluating a CQ over a (set or bag) interpretation under bag semantics is a bag of answer tuples, where each tuple comes with a multiplicity (i.e., a "count").
Calvanese et al. [38] proposed an epistemic semantics, where query answers are obtained by evaluating the query over the (finite) set of all ABox facts entailed by the ontology. Although this approach is well-suited for practical implementations, it can easily lead to counter-intuitive answers. To remedy this, Kostylev and Reutter [36] proposed a certain answers semantics that requires the query to be evaluated over all models of the ontology, which yields more intuitive answers at the expense of increased computational cost.
In what follows we take a closer look at Kostylev and Reutter's framework, which is formalised in the following definition. The main difference between their setting and ours is that they consider set ABoxes and conventional set-based semantics of ontologies.
Definition 84. A count aggregate query is the expression q c (x, Count()) = ∃y. φ(x, y), where φ(x, y) is a conjunction of atoms over atomic concepts and roles. A (set) interpretation I satisfies q c (a, m), for a tuple a over I with |a| = |x| and a number m ∈ N ∞ 0 , if there are exactly m valuations λ : x ∪ y ∪ I → I with λ(x) = a I and λ(a) = a I for each a ∈ I that make φ(x, y) true in I . A number n ∈ N ∞ 0 is in the count aggregate certain answers Cert(q c , a, K) to q c for a tuple of individuals a and DL-Lite R ontology K if n ≤ min{m ∈ N ∞ 0 | there is I |= K satisfying q c (a, m)}. Count aggregate certain answers Cert UNA (q c , a, K) under the UNA are defined in the same way except that only interpretations satisfying the UNA are considered.
Intuitively, the count associated to a in the count aggregate certain answers to q c is the minimum number of matching valuations over all models of the ontology. The following proposition establishes a correspondence between our setting and theirs: under suitable restrictions on the TBox, assuming only set ABoxes, and adopting the UNA, the certain answers to CQs under our bag semantics coincide with the answers to the corresponding count query as given in Definition 84.
Proposition 85. For every DL-Lite R ontology K s = T , A s such that T does not contain any inclusions of the form ∃R C , every count aggregate query q c (x, Count()) = ∃y. φ(x, y), every tuple of individuals a, and every n ∈ N ∞ 0 , n ∈ Cert UNA (q c , a, K s ) if and only if n ≤ q K b ,UNA (a), where q is the CQ ∃y. φ(x, y), K b is the DL-Lite b R ontology obtained from K s by considering A s as a bag ABox (i.e., as the bag ABox assigning 1 to all assertions in A s and 0 to all others), and q K b ,UNA is the certain answers to q over K b under the UNA.
Proof. First, we claim that for every set interpretation I s that is a model of K s there exists a bag interpretation I b that is a model of K b such that, for every tuple of individuals a, q I b (a) = m where m is the number with I s satisfying q c (a, m). Indeed, given such an I s we can take as I b the bag version of I s , that is, the bag interpretation such that S I b (b) = 1 for an atomic concept or role S and individuals b if b ∈ S I s and S I b (b) = 0 otherwise. Bag interpretation I b is a model of K b because the restriction on the TBox rules out any forced increase of multiplicities, while q I b (a) = m holds for every a because each valuation contributes to q I b (a) with multiplicity 0 or 1, and the ones contributing 1 are precisely those that make φ true.
Second, we claim that for every bag interpretation I b that is a model of K b there exists a set interpretation I s that is a model of K s such that I s satisfies q c (a, m) for every tuple of individuals a and for a number m ≤ q I b (a). Indeed, given such an I b we can take as I s the "characteristic" version of I b -that is, the set interpretation such that b ∈ S I s for an atomic concept or role S and individuals b if and only if S I b (b) ≥ 1. Set interpretation I s is a model of K s just by definition, while I s additionally satisfies q c (a, m) for a tuple a and a number m ≤ q I b (a) because, by construction, a valuation contributes to q I b (a) a multiplicity greater than 0 if and only if it turns φ to true.
These two claims immediately imply the statement of the proposition.
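The first claim of the proof can be illustrated with a small Python sketch of CQ evaluation under bag semantics, where each valuation contributes the product of the multiplicities of its atoms. The evaluator, its brute-force enumeration of valuations, and the two interpretations are illustrative assumptions and not part of the article.

```python
from collections import Counter
from itertools import product


def evaluate_cq(answer_vars, exist_vars, atoms, interpretation):
    """Bag answers to a CQ: sum over valuations of the product of atom multiplicities."""
    domain = sorted({term for (_, *args) in interpretation for term in args})
    variables = list(answer_vars) + list(exist_vars)
    answers = Counter()
    for values in product(domain, repeat=len(variables)):
        valuation = dict(zip(variables, values))
        multiplicity = 1
        for predicate, *terms in atoms:
            # Terms that are not variables are treated as constants.
            key = (predicate, *(valuation.get(t, t) for t in terms))
            multiplicity *= interpretation[key]
        if multiplicity:
            answers[tuple(valuation[v] for v in answer_vars)] += multiplicity
    return answers


# q(x) = ∃y. hasMusician(x, y) ∧ Musician(y)
QUERY_ATOMS = [("hasMusician", "x", "y"), ("Musician", "y")]

# A set interpretation viewed as a bag: every fact has multiplicity 1.
SET_AS_BAG = Counter({
    ("hasMusician", "r1", "m1"): 1,
    ("hasMusician", "r1", "m2"): 1,
    ("Musician", "m1"): 1,
    ("Musician", "m2"): 1,
})

# The same facts, with one role assertion duplicated.
PROPER_BAG = SET_AS_BAG + Counter({("hasMusician", "r1", "m1"): 1})

print(evaluate_cq(["x"], ["y"], QUERY_ATOMS, SET_AS_BAG)[("r1",)])   # 2
print(evaluate_cq(["x"], ["y"], QUERY_ATOMS, PROPER_BAG)[("r1",)])   # 3
```

Over the 0/1 interpretation the multiplicity of r1 equals the number of valuations satisfying the query, which is exactly the count of Definition 84; once multiplicities exceed 1, the two notions diverge.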
Note, however, that both the UNA and the restriction on TBoxes are necessary for Proposition 85 to hold, and dropping either of them makes the two frameworks incompatible. Indeed, if the UNA is not adopted, then, for the simple ontology
We conclude by pointing out that the work by Kostylev and Reutter is also strongly related to existing approaches in the database literature for answering counting aggregate queries in the presence of incomplete information [39][40][41]. As observed by Kostylev and Reutter, however, these approaches are not directly applicable to answering counting queries in the presence of an ontology, and we refer to [36] for a detailed discussion.

Other related work
Jiang [42] proposed a bag semantics for the description logic ALC. The author focuses on satisfiability checking and provides a tableaux-based decision procedure. This semantics is, however, incomparable to ours. For example, concepts of the form ∃R.⊤ are not interpreted as the bag projection on the first argument of role R, which makes the semantics incompatible with SQL.
Query answering and optimisation under bag semantics have received significant attention in the database literature [22,23,27,[29][30][31]33,43]. These works study the relative expressive power of bag algebra primitives, the relationship with set-based algebras, and establish the data complexity of query answering, query containment, and query equivalence. More recently, Console et al. [44] studied query answering under bag semantics in incomplete relational databases. Query answering and its data complexity under bag semantics have been recently studied as well in the setting of the Semantic Web query languages [45,46].

Conclusion and future work
In this article, we have proposed a novel bag semantics for OBDA and studied the computational properties of its associated query answering problems. The key advantage of our semantics is that it allows us to faithfully represent arbitrary GAV mappings (and not just those whose source query involves duplicate elimination) in a way that is compatible with SQL. Furthermore, our semantics is compatible with existing bag-based semantics for databases with incomplete information and data exchange.
We see many interesting directions for future work. First, we are planning to extend the query language to allow for database-style aggregate functions and to study suitable restrictions ensuring rewritability of such queries. Second, it would be interesting to try to push the rewritability boundaries for CQs to include some constant-free Boolean queries. For this, an interesting starting point could be the notion of local concepts and queries proposed by Gutiérrez-Basulto et al. [47] as a means of regaining decidability of query answering over DL-Lite$_{\mathcal{R}}$ for the class of CQs with inequalities. Finally, our rewriting algorithms are not designed with an efficient implementation in mind. We plan to develop a practically applicable rewriting algorithm for our bag semantics.
Proof. Let $E$ be a fixed BALG$^1$ algebra expression, $D$ be a bag database instance, $\mathbf{a}$ be a tuple of constants over $\mathbf{I}$, and $k$ be a number in $\mathbb{N}_0$. Consider the bag database instance $D'$ over $\mathbf{I}$ whose schema is extended with a fresh predicate $T$ of arity $|\mathbf{a}|$, in which $T$ assigns $k$ to $\mathbf{a}$ and 0 to all other tuples of size $|\mathbf{a}|$, while every other predicate is as in $D$.
Let $E' = T - E$. We claim that $E^{D}(\mathbf{a}) \geq k$ if and only if $(E')^{D'}(\mathbf{a}) = 0$. Indeed, bag difference never yields a negative multiplicity, so $(E')^{D'}(\mathbf{a}) = \max(0, k - E^{D}(\mathbf{a}))$. Hence, assuming $E^{D}(\mathbf{a}) \geq k$, we derive $(E')^{D'}(\mathbf{a}) = 0$, and, assuming $E^{D}(\mathbf{a}) < k$, we derive $(E')^{D'}(\mathbf{a}) > 0$. The above many-one reduction is computable, for each $D$, $\mathbf{a}$, and $k$, by a Boolean circuit with arbitrary fan-in AND and OR gates whose depth depends only on $E$. We conclude that the language $\{(D, \mathbf{a}, k) \mid E^{D}(\mathbf{a}) \geq k\}$ is reducible to $\{(D, \mathbf{a}, k) \mid E^{D}(\mathbf{a}) = k\}$ under LogSpace-uniform AC$^0$ reductions, as required.
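The core of this reduction, replacing a threshold test by a zero test on a bag difference, can be sketched in a few lines of Python. The sketch assumes that the result of evaluating $E$ over $D$ is already available as a `collections.Counter` mapping tuples to multiplicities; the function name is hypothetical. It relies on the fact that `Counter` subtraction keeps only positive counts, which matches the truncating behaviour of bag difference.

```python
from collections import Counter

def at_least_k_via_zero_test(e_result, a, k):
    """Decide E^D(a) >= k by checking that (T - E)(a) == 0.

    e_result: Counter holding the (assumed precomputed) bag E^D.
    a:        tuple of constants.
    k:        non-negative integer threshold.
    """
    t = Counter({a: k})        # fresh predicate T: multiplicity k on a, 0 elsewhere
    diff = t - e_result        # Counter subtraction drops non-positive counts
    return diff[a] == 0        # missing keys read as multiplicity 0

# Hypothetical evaluation result: the tuple ("a",) occurs three times.
e_result = Counter({("a",): 3})
print(at_least_k_via_zero_test(e_result, ("a",), 2))
print(at_least_k_via_zero_test(e_result, ("a",), 4))
```

The first call succeeds because the difference is empty on the tuple, and the second fails because one copy of the tuple survives the subtraction.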
Grumbach and Milo [22] studied the expressive power of QueryAnswering$_{=}$[BALG$^1$] and showed that it lies strictly between AC$^0$ and LogSpace. Therefore, we can conclude the following fact.

Appendix B. Relationship between BCALC and BALG$^1$
Theorem 88. For each BCALC query $\Phi$ there is a BALG$^1$ algebra expression $E_\Phi$ such that $\Phi^{D} = E_\Phi^{D}$ for every bag database instance $D$.
Proof. In this proof we assume that all the variables $X$ are globally ordered, and that each repetition-free tuple of variables lists its variables according to this order; moreover, for a BCALC query $\exists \mathbf{y}.\, \Phi(\mathbf{x}, \mathbf{y})$ we assume that all of $\mathbf{y}$ come after all of $\mathbf{x}$ in the order. This is without loss of generality, because BCALC is agnostic to renaming of variables.
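We define the BALG$^1$ algebra expression $E_\Phi$ for each BCALC query $\Phi$ by induction on the structure of $\Phi$ as follows:
- if $\Phi(x_1, \dots, x_n) = S(t_1, \dots, t_k)$ for a predicate $S$ and a tuple of terms $t_1, \dots, t_k$ with the first occurrence of each $x_i$ in a position $p_i$, then, for $c_1, \dots, c_k \in \mathbf{I}$ such that $c_j$ is fresh if $t_j$ is a variable and $c_j$ is $t_j$ otherwise for each $j$,
$$E_\Phi = \pi_{p_1, \dots, p_n}\bigl(\sigma_{\alpha_1(X) = \alpha_{m_1}(X)}\bigl(\cdots \sigma_{\alpha_k(X) = \alpha_{m_k}(X)}\bigl(S \times \beta(\tau(c_1, \dots, c_k))\bigr) \cdots\bigr)\bigr),$$
where, for each $j$, $m_j = p_i$ if $t_j$ is $x_i$ and $m_j = j + k$ otherwise;
- if $\Phi(x_1, \dots, x_k) = \Phi_1(x_{i_1}, \dots, x_{i_m}) \wedge \Phi_2(x_{j_1}, \dots, x_{j_n})$, then
$$E_\Phi = \pi_{p_1, \dots, p_k}\bigl(\sigma_{\alpha_{m+1}(X) = \alpha_{s_1}(X)}\bigl(\cdots \sigma_{\alpha_{m+n}(X) = \alpha_{s_n}(X)}\bigl(E_{\Phi_1} \times E_{\Phi_2}\bigr) \cdots\bigr)\bigr),$$
where, for each $\ell \in [1, k]$, $p_\ell$ is the position of the first occurrence of $\ell$ in $i_1, \dots, i_m, j_1, \dots, j_n$ and, for each $\ell \in [1, n]$, $s_\ell$ is the position of the first occurrence of $j_\ell$ in $i_1, \dots, i_m, j_1, \dots, j_n$;
- if $\Phi(\mathbf{x}) = \exists \mathbf{y}.\, \Phi'(\mathbf{x}, \mathbf{y})$, then $E_\Phi = \pi_{1, \dots, |\mathbf{x}|}(E_{\Phi'})$.
It is now straightforward to check that $\Phi^{D} = E_\Phi^{D}$ for every bag database $D$.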
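To make the shape of the translation concrete, the following Python sketch evaluates the product, selection, and projection steps over bags represented as `collections.Counter` objects. It is only an illustration under simplifying assumptions: the helper names are hypothetical, positions are 1-based as in the proof, and the atoms of the example query $\exists y.\,(A(x) \wedge R(x, y))$ have distinct variables and no constants, so their expressions are taken to be the relations themselves.

```python
from collections import Counter
from itertools import product as cartesian

def bag_product(e1, e2):
    """Bag Cartesian product: tuples are concatenated, multiplicities multiply."""
    out = Counter()
    for (t1, m1), (t2, m2) in cartesian(e1.items(), e2.items()):
        out[t1 + t2] += m1 * m2
    return out

def select_eq(e, i, j):
    """Keep tuples whose positions i and j agree; multiplicities are unchanged."""
    return Counter({t: m for t, m in e.items() if t[i - 1] == t[j - 1]})

def project(e, positions):
    """Bag projection: multiplicities of tuples that collapse are summed."""
    out = Counter()
    for t, m in e.items():
        out[tuple(t[p - 1] for p in positions)] += m
    return out

# Hypothetical bag database.
A = Counter({("a",): 2})
R = Counter({("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 5})

# Conjunction A(x) and R(x, y): variable indices are i = (1) and j = (1, 2),
# so m = 1, n = 2, s = (1, 3) and p = (1, 3).
joined = bag_product(A, R)
joined = select_eq(joined, 2, 1)   # alpha_{m+1} = alpha_{s_1}
joined = select_eq(joined, 3, 3)   # alpha_{m+2} = alpha_{s_2}, a trivial selection
conj = project(joined, (1, 3))

# Existential quantification over y: project on the first |x| = 1 positions.
answer = project(conj, (1,))
print(answer)
```

The tuple `("a",)` ends up with multiplicity 8, since the two copies of `A(a)` combine with the three copies of `R(a, b)` and the single copy of `R(a, c)`.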
The following fact is a direct consequence of Corollary 87 and Theorem 88.