Modelling the Compatibility of Licenses



Introduction
Web applications facilitate combining resources (linked data, web services, source code, documents, etc.) to create new ones. To facilitate reuse, resource producers should systematically associate licenses with resources before sharing or publishing them [1]. Licenses specify precisely the conditions of reuse of resources, i.e., what actions are permitted, obliged and prohibited when using the resource.
For a resource producer, choosing the appropriate license for a combined resource, or choosing appropriately licensed resources for a combination, is a difficult process. It involves choosing a license compliant with all the licenses of the combined resources, as well as analysing the reusability of the resulting resource through the compatibility of its license. The risk is either to choose a license that is too restrictive, making the resource difficult to reuse, or to choose a license that is not restrictive enough and will not sufficiently protect the resource.
Relations of compatibility, compliance and restrictiveness on licenses could be very useful in a wide range of applications. Imagine license-based search engines for services such as GitHub, APISearch, LODAtlas, DataHub, Google Dataset Search or OpenDataSoft that could find resources licensed under licenses compatible or compliant with a specific license. Answers could be partially ordered from the least to the most restrictive license. We argue that a model for license orderings would allow the development of such applications.
We consider simplified definitions of compliance and compatibility inspired by works like [2][3][4][5]: a license $l_j$ is compliant with a license $l_i$ if a resource licensed under $l_i$ can be licensed under $l_j$ without violating $l_i$. If a license $l_j$ is compliant with $l_i$, then we consider that $l_i$ is compatible with $l_j$ and that resources licensed under $l_i$ are reusable with resources licensed under $l_j$. In general, if $l_i$ is compatible with $l_j$ then $l_j$ is more (or equally) restrictive than $l_i$. We also consider that a license $l_j$ is more (or equally) restrictive than a license $l_i$ if $l_j$ allows at most the same permissions and has at least the same prohibitions/obligations as $l_i$. Usually, but not always, when $l_i$ is less restrictive than $l_j$ then $l_i$ is compatible with $l_j$.
For instance, Fig. 1 shows an excerpt of three Creative Commons (CC) licenses described in RDF using the ODRL vocabulary. Notice that there exists a restrictiveness order among these licenses: (a) is less restrictive than (b), and (b) is less restrictive than (c). By transitivity, (a) is less restrictive than (c). Notice also that (a) is compatible with (b) and (c), but (b) is not compatible with (c). This is due to the semantics of the prohibited action DerivativeWorks, which forbids the distribution of a derivation (remix, transform or build upon) of the protected resource under a different license. Thus, depending on the semantics of their actions, a restrictiveness relation between two licenses does not imply a compatibility relation.
Our research question is: given a license $l_i$, how can $l_i$ be automatically positioned over a set of licenses in terms of compatibility and compliance? The challenge we face is how to generalise the automatic definition of the ordering relations among licenses while taking into account the influence of the semantics of actions.
Inspired by lattice-based access control models [6,7], we propose CaLi (ClAssification of LIcenses), a model for license orderings that uses restrictiveness relations and constraints among licenses to define compatibility and compliance. We experimentally validate CaLi with a quadratic algorithm and show its usability through a prototype of a license-based search engine. Our work is a step towards facilitating and encouraging the publication and reuse of licensed resources in the Web of Data. However, it is not intended to provide legal advice.
This paper is organised as follows. Section 2 discusses related work, Section 3 introduces the CaLi model, Section 4 illustrates the usability of our model, Section 5 presents experiments with the implemented algorithm as well as the prototype of a license-based search engine, and Section 6 concludes.

Related work
Automatic license classification requires machine-readable licenses. License expression languages such as CC REL, ODRL, or L4LOD enable fine-grained RDF descriptions of licenses. Works like [8] and [9] use natural language processing to automatically generate RDF licenses from licenses described in natural language. Other works such as [10][11][12] propose a set of well-known licenses in RDF described in CC REL and ODRL. Therefore, in this work, we suppose that there exist consistent licenses described in RDF.
There exist some tools to facilitate the creation of license-compliant resources. TLDRLegal, CC Choose and ChooseALicense help users to choose actions to form a license for their resources. CC search allows users to find images licensed under Creative Commons licenses that can be commercialized, modified, adapted, or built upon. Web2rights proposes a tool to check compatibility among Creative Commons licenses. DALICC [12] allows users to compose arbitrary licenses and provides information about equivalence, similarity and compatibility of licenses. Finally, Licentia, based on deontic logic to reason over the licenses, proposes a web service to find licenses compatible with a set of permissions, obligations and prohibitions chosen by the user. Among these tools, only Licentia and DALICC use machine-readable licenses in RDF. Unfortunately, these works do not order licenses in terms of compatibility or compliance.
The easiest way to choose a license for a combined resource is to create a new one by combining the licenses of all the resources to be combined. Several works address the problem of license compatibility and license combination. In web services, [2] proposes a framework that analyses compatibility of licenses to verify if two services are compatible and then generates the composite service license. [13] addresses the problem of license preservation during the combination of digital resources (music, data, picture, etc.) in a collaborative environment. Licenses of combined resources are combined into a new one. In the Web of Data, [3] proposes a framework to check compatibility among CC REL licenses. If licenses are compatible, a new license compliant with the combined ones is generated. [4] formally defines the combination of licenses using deontic logic. [14] proposes PrODUCE, an approach to combine usage policies taking into account the usage context. These works focus on combining operators for automatic license combination but do not propose to position a license over a set of licenses.
Concerning the problem of license classification to facilitate the selection of a license, [15] uses Formal Concept Analysis (FCA) to generate a lattice of actions. Once pruned and annotated, this lattice can be used to classify licenses in terms of features. This classification reduces the selection of a license to an average of three to five questions. However, this work does not address the problem of license compatibility. Moreover, FCA is not suitable to generate compatibility or restrictiveness relations among licenses. FCA defines a derivation operator on objects that returns a set of attributes shared by the objects. We consider that the set of actions that two licenses have in common is not enough to infer these relations. If applied to our introductory example, FCA can only work with permissions but not with obligations and prohibitions. That is because $l_i$ is less restrictive than $l_j$ if the permissions of $l_i$ are a superset of the permissions of $l_j$, whereas, regarding obligations and prohibitions, $l_i$ is less restrictive than $l_j$ if those of $l_i$ are a subset of those of $l_j$. In the context of Free Open Source Software (FOSS), [5] proposes an approach, based on a directed acyclic graph, to detect license violations in existing software packages. It considers that license $l_i$ is compatible with $l_j$ if the graph contains a path from $l_i$ to $l_j$. However, as such a graph is built from a manual interpretation of each license, its generalisation and automation are not possible.
In the domain of access control, [6] proposes a lattice model of secure information flow. This model classifies security classes with associated resources. As in the compatibility graph of [5], security class $sc_i$ is compatible with $sc_j$ if the lattice contains a path from $sc_i$ to $sc_j$. Thus, this path represents the authorized flow of resources (e.g., a resource $r_i$ protected with $sc_i$ can flow to a resource protected by $sc_j$ without violating $sc_i$). The lattice can be generated automatically through a pairwise combination of all security classes if $sc_i$ combined with $sc_k$ gives $sc_j$, where $sc_i$ and $sc_k$ are both compatible with $sc_j$. [7] describes several models based on this approach but none focuses on classifying licenses.
None of these works answers our research question. They do not make it possible to automatically position a license over a set of licenses in terms of compatibility or compliance. In our work we propose a lattice-based model inspired by [6]. This model is independent of any license description language, application context and licensed resource, so that it can be used in a wide variety of domains.

CaLi: a lattice-based license model
The approach we propose to partially order licenses in terms of compatibility and compliance relies on a restrictiveness relation. In a license, actions can be distributed in what we call status, e.g., permissions, obligations and prohibitions. To decide if a license $l_i$ is less restrictive than $l_j$, it is necessary to know if an action in a status is considered as less restrictive than the same action in another status. In the introductory example (Fig. 1), we consider that permissions are less restrictive than obligations, which are less restrictive than prohibitions, i.e., $Permission \leq_S Duty \leq_S Prohibition$. This relation can be seen in Fig. 2b. We remark that if two licenses have a restrictiveness relation then it is possible that they have a compatibility relation too. The restrictiveness relation between the licenses can be automatically obtained according to the status of actions without taking into account the semantics of the actions. Thus, based on lattice-ordered sets [16], we define a restrictiveness relation among licenses.
To identify the compatibility among licenses, we refine the restrictiveness relation with constraints. The goal is to take into account the semantics of actions. Constraints also distinguish valid licenses from non-valid ones. We consider a license $l_i$ as non-valid if a resource cannot be licensed under $l_i$, e.g., a license that simultaneously permits the Derive action and prohibits DerivativeWorks.
This approach is based on: 1. a set of actions (e.g., read, modify, distribute, etc.); 2. a restrictiveness lattice of status that defines (i) all possible status of an action in a license (i.e., permission, obligation, prohibition, recommendation, undefined, etc.) and (ii) the restrictiveness relation among status; a restrictiveness lattice of licenses is obtained from a combination of 1 and 2; 3. a set of compatibility constraints to identify if a restrictiveness relation between two licenses is also a compatibility relation; and 4. a set of license constraints to identify non-valid licenses.
The next section formally introduces the CaLi model and Section 3.2 introduces a simple example of a CaLi ordering.

Formal model description
We first define a restrictiveness lattice of status. We use a lattice structure because it is necessary, for every pair of status, to know which status is less (or more) restrictive than both.

Definition 1 (Restrictiveness lattice of status LS).
A restrictiveness lattice of status is a lattice $LS = (S, \leq_S)$ that defines all possible status $S$ for a license and the relation $\leq_S$ as the restrictiveness relation over $S$. For two status $s_i$, $s_j$, if $s_i \leq_S s_j$ then $s_i$ is less restrictive than $s_j$.
Different LSs can be defined according to the application domain. Fig. 2a shows the diagram of an LS inspired by file systems where actions can be either prohibited or permitted. With this lattice, prohibiting to read a file is more restrictive than permitting to read it. Fig. 2b illustrates an LS for CC licenses where actions are either permitted, required (Duty) or prohibited. Fig. 2c shows an LS inspired by the ODRL vocabulary. In ODRL, actions can be either permitted, obliged, prohibited or not specified (i.e., undefined). In this lattice, the undefined status is the least restrictive and the prohibited one the most restrictive. Fig. 2d shows an LS where a recommended or permitted action is less restrictive than the same action when it is permitted and recommended.
Now we formally define a license based on the status of its actions.
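As an illustration of Definition 1, the following minimal sketch encodes the LS described for CC licenses (Permission less restrictive than Duty, itself less restrictive than Prohibition) as a ranked chain. The names `RANK`, `leq_s`, `join_s` and `meet_s` are hypothetical and not part of the model; the rank-based encoding only works because this particular LS is totally ordered.

```python
# Hypothetical encoding of the chain Permission <=_S Duty <=_S Prohibition.
RANK = {"Permission": 0, "Duty": 1, "Prohibition": 2}


def leq_s(s_i, s_j):
    """Return True if status s_i is less (or equally) restrictive than s_j."""
    return RANK[s_i] <= RANK[s_j]


def join_s(s_i, s_j):
    """Least restrictive status that is at least as restrictive as both."""
    return max(s_i, s_j, key=RANK.__getitem__)


def meet_s(s_i, s_j):
    """Most restrictive status that is at most as restrictive as both."""
    return min(s_i, s_j, key=RANK.__getitem__)
```

An LS that is not a chain, such as one with incomparable status, would need an explicit order relation or join table instead of a single rank.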

Definition 2 (License).
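Let $A$ be a set of actions and $LS = (S, \leq_S)$ be a restrictiveness lattice of status. A license is a function $l : A \rightarrow S$. We denote by $L_{A,LS}$ the set of all licenses.
For example, consider $A = \{read, modify, distribute\}$, LS the lattice of Fig. 2c and two licenses: $l_i$, which permits read and distribute but where modify is undefined, and $l_j$, where modify is also undefined but which permits read and prohibits distribute. We define $l_i$ and $l_j$ as follows, $\forall a \in A$:

$$l_i(a) = \begin{cases} Undefined & \text{if } a = modify \\ Permission & \text{if } a \in \{read, distribute\} \end{cases} \qquad l_j(a) = \begin{cases} Undefined & \text{if } a = modify \\ Permission & \text{if } a = read \\ Prohibition & \text{if } a = distribute \end{cases}$$

A restrictiveness lattice of status and a set of licenses make it possible to partially order licenses in a restrictiveness lattice of licenses.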

Definition 3 (Restrictiveness relation over licenses).
Let $A$ be a set of actions, $LS = (S, \leq_S)$ be a restrictiveness lattice of status associated to the join and meet operators $\vee_S$ and $\wedge_S$, and $l_i, l_j \in L_{A,LS}$ be two licenses. We say that $l_i$ is less restrictive than $l_j$, denoted $l_i \leq_R l_j$, if for all actions $a \in A$, the status of $a$ in $l_i$ is less restrictive than the status of $a$ in $l_j$. That is,

$$l_i \leq_R l_j \iff \forall a \in A,\ l_i(a) \leq_S l_j(a).$$

Moreover, we define the two operators $\vee$ and $\wedge$ as follows. For all actions $a \in A$, the status of $a$ in $l_i \vee l_j$ (resp. $l_i \wedge l_j$) is the join (resp. meet) of the status of $a$ in $l_i$ and the status of $a$ in $l_j$. That is,

$$(l_i \vee l_j)(a) = l_i(a) \vee_S l_j(a), \qquad (l_i \wedge l_j)(a) = l_i(a) \wedge_S l_j(a).$$

For example, consider LS the lattice of Fig. 2c, and the licenses $l_i$ and $l_j$ defined previously; $l_i \leq_R l_j$ because $l_i(read) \leq_S l_j(read)$, $l_i(modify) \leq_S l_j(modify)$ and $l_i(distribute) \leq_S l_j(distribute)$. In this example, $l_i \vee l_j = l_j$ because $\forall a \in A$, $(l_i \vee l_j)(a) = l_j(a)$, e.g., $(l_i \vee l_j)(distribute) = l_j(distribute) = Prohibition$. If, for some action, it is not possible to say which license is the most restrictive, then the compared licenses are not comparable by the restrictiveness relation.
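The relation and operators of Definition 3 can be sketched in a few lines. This is only an illustration under a stated assumption: the LS of Fig. 2c is treated here as the chain Undefined, Permission, Duty, Prohibition (the text only states that undefined is the least and prohibited the most restrictive status). Licenses are plain dictionaries from actions to status, and the names `RANK`, `leq_r`, `join` and `meet` are hypothetical.

```python
# ASSUMPTION: Fig. 2c is modelled as a chain; the figure is not reproduced here.
RANK = {"Undefined": 0, "Permission": 1, "Duty": 2, "Prohibition": 3}


def leq_r(l_i, l_j):
    """l_i <=_R l_j: every action is at most as restrictive in l_i as in l_j."""
    return all(RANK[l_i[a]] <= RANK[l_j[a]] for a in l_i)


def join(l_i, l_j):
    """Action-wise join: the least restrictive license above both."""
    return {a: max(l_i[a], l_j[a], key=RANK.__getitem__) for a in l_i}


def meet(l_i, l_j):
    """Action-wise meet: the most restrictive license below both."""
    return {a: min(l_i[a], l_j[a], key=RANK.__getitem__) for a in l_i}


# The two licenses of the running example.
l_i = {"read": "Permission", "modify": "Undefined", "distribute": "Permission"}
l_j = {"read": "Permission", "modify": "Undefined", "distribute": "Prohibition"}

# A third, hypothetical license that is not comparable with l_j.
l_k = {"read": "Prohibition", "modify": "Undefined", "distribute": "Permission"}
```

With these definitions, `leq_r(l_i, l_j)` holds and `join(l_i, l_j)` equals `l_j`, as in the example above, while `l_k` and `l_j` are incomparable because each is more restrictive than the other on one action.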

Remark 1
The pair $(L_{A,LS}, \leq_R)$ is a restrictiveness lattice of licenses, whose $\vee$ and $\wedge$ are respectively the join and meet operators.
In other words, for two licenses $l_i$ and $l_j$, $l_i \vee l_j$ (resp. $l_i \wedge l_j$) is the least (resp. most) restrictive license that is more (resp. less) restrictive than both $l_i$ and $l_j$.

Remark 2
For an action $a \in A$, we call $(L_{\{a\},LS}, \leq_R)$ the action lattice of $a$. Remark that $(L_{A,LS}, \leq_R)$ and $\prod_{a \in A} (L_{\{a\},LS}, \leq_R)$ are isomorphic. That is, a restrictiveness lattice of licenses can be generated through the coordinatewise product [16] of all its action lattices. The total number of licenses in this lattice is $|LS|^{|A|}$.
For example, consider $A = \{read, modify\}$ and LS the lattice of Fig. 2a. Fig. 3a, b and c illustrate the product of these action lattices and the resulting restrictiveness lattice of licenses.
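The coordinatewise product of Remark 2 can be illustrated by enumerating every assignment of a status to each action. The sketch below uses the two-status LS (Permission, Prohibition) and the two actions of the example; the variable names are hypothetical.

```python
from itertools import product

STATUSES = ("Permission", "Prohibition")  # the two status of the file-system LS
ACTIONS = ("read", "modify")

# One license per element of the product of the action lattices.
licenses = [
    dict(zip(ACTIONS, combo))
    for combo in product(STATUSES, repeat=len(ACTIONS))
]

# The size matches |LS|^|A| = 2^2.
assert len(licenses) == len(STATUSES) ** len(ACTIONS)
```

This produces the four licenses of the restrictiveness lattice, from the one that permits both actions to the one that prohibits both.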
To identify the compatibility relation among licenses and to distinguish valid licenses from non-valid ones it is necessary to take into account the semantics of actions. Thus, we apply two types of constraints to the restrictiveness lattice of licenses: license constraints and compatibility constraints.

Definition 4 (License constraint).
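Let $L_{A,LS}$ be a set of licenses. A license constraint is a function $\omega_L : L_{A,LS} \rightarrow Boolean$ which identifies if a license is valid or not.
For example, the license constraint $\omega_{L1}$ considers a license $l_i \in L_{A,LS}$ non-valid if read is prohibited but modify is permitted (i.e., a modify action implies a read action):

$$\omega_{L1}(l_i) = \begin{cases} False & \text{if } l_i(read) = Prohibition \text{ and } l_i(modify) = Permission \\ True & \text{otherwise} \end{cases}$$

Definition 5 (Compatibility constraint).
Let $L_{A,LS}$ be a set of licenses. A compatibility constraint is a function $\omega_\rightarrow : L_{A,LS} \times L_{A,LS} \rightarrow Boolean$ which identifies if a restrictiveness relation between two licenses is also a compatibility relation.
For example, consider that a license prohibits the action modify. In the spirit of DerivativeWorks, we consider that the distribution of the modified resource under a different license is prohibited. Thus, the compatibility constraint $\omega_{\rightarrow 1}$ considers that a restrictiveness relation $l_i \leq_R l_j$ can also be a compatibility relation if $l_i$ does not prohibit modify. This constraint is described as follows. For $l_i, l_j \in L_{A,LS}$,

$$\omega_{\rightarrow 1}(l_i, l_j) = \begin{cases} False & \text{if } l_i(modify) = Prohibition \\ True & \text{otherwise} \end{cases}$$

Now we are able to define a CaLi ordering from a restrictiveness lattice of licenses and the constraints defined before.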

Definition 6 (CaLi ordering).
A CaLi ordering is a tuple $\langle A, LS, C_L, C_\rightarrow \rangle$ such that $A$ and $LS$ form a restrictiveness lattice of licenses $(L_{A,LS}, \leq_R)$, $C_L$ is a set of license constraints and $C_\rightarrow$ is a set of compatibility constraints. For two licenses $l_i, l_j \in L_{A,LS}$ such that $l_i \leq_R l_j$, we say that $l_i$ is compatible with $l_j$, denoted by $l_i \rightarrow l_j$, if $\forall \omega_L \in C_L$, $\omega_L(l_i) = \omega_L(l_j) = True$ and $\forall \omega_\rightarrow \in C_\rightarrow$, $\omega_\rightarrow(l_i, l_j) = True$.

Remark 3
We define the compliance relation as the opposite of the compatibility relation. For two licenses $l_i$, $l_j$, if $l_i \rightarrow l_j$ then $l_j$ is compliant with $l_i$.
A CaLi ordering is able to answer our research question: given a license $l_i$, how can $l_i$ be automatically positioned over a set of licenses in terms of compatibility and compliance? It makes it possible to evaluate the potential reuse of a resource depending on its license. Knowing the compatibility of a license tells us to what extent the protected resource is reusable. On the other hand, knowing the compliance of a license tells us to what extent other licensed resources can be reused. The next section shows an example of a CaLi ordering.
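To make Definition 6 and Remark 3 concrete, here is a minimal sketch of the compatibility test, assuming the two-status LS (Permission less restrictive than Prohibition), the actions read and modify, and the two example constraints described for Definitions 4 and 5. All function and variable names are hypothetical, and the sample licenses are named descriptively rather than with the identifiers used in the figures.

```python
RANK = {"Permission": 0, "Prohibition": 1}
ACTIONS = ("read", "modify")


def leq_r(l_i, l_j):
    """Restrictiveness relation: l_i is at most as restrictive as l_j."""
    return all(RANK[l_i[a]] <= RANK[l_j[a]] for a in ACTIONS)


def omega_l1(lic):
    """License constraint: non-valid if read is prohibited but modify permitted."""
    return not (lic["read"] == "Prohibition" and lic["modify"] == "Permission")


def omega_c1(l_i, l_j):
    """Compatibility constraint: l_i must not prohibit modify."""
    return l_i["modify"] != "Prohibition"


C_L = [omega_l1]   # set of license constraints
C_TO = [omega_c1]  # set of compatibility constraints


def compatible(l_i, l_j):
    """l_i -> l_j: restrictiveness holds and every constraint is satisfied."""
    return (
        leq_r(l_i, l_j)
        and all(w(l_i) and w(l_j) for w in C_L)
        and all(w(l_i, l_j) for w in C_TO)
    )


def compliant(l_j, l_i):
    """l_j is compliant with l_i exactly when l_i is compatible with l_j."""
    return compatible(l_i, l_j)


all_permitted = {"read": "Permission", "modify": "Permission"}
no_modify = {"read": "Permission", "modify": "Prohibition"}
all_prohibited = {"read": "Prohibition", "modify": "Prohibition"}
non_valid = {"read": "Prohibition", "modify": "Permission"}
```

Under these assumptions, `all_permitted` is compatible with the other two valid licenses, whereas `no_modify` is less restrictive than `all_prohibited` without being compatible with it, because it prohibits modify.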

Example 1
Consider a CaLi ordering $\langle A, LS, \{\omega_{L1}\}, \{\omega_{\rightarrow 1}\} \rangle$ such that:
- $A$ is the set of actions $\{read, modify\}$,
- $LS$ is a restrictiveness lattice of status where an action can be either permitted or prohibited, and $Permission \leq_S Prohibition$ (cf. Fig. 2a),
- $\omega_{L1}$ is the license constraint introduced in the example of Def. 4, and
- $\omega_{\rightarrow 1}$ is the compatibility constraint introduced in the example of Def. 5.

Consider a set of resources $R = \{r_1, r_2, r_3, r_4, r_5\}$ and the has-license relation such that $r_1$ and $r_2$ have license $l_1$; $r_3$ has license $l_3$; $r_4$ and $r_5$ have license $l_4$. Thanks to our CaLi ordering, the following questions can be answered.
- Which licensed resources can be reused in a resource that has $l_3$ as its license? Those resources whose licenses are compatible with $l_3$: $r_1$ and $r_2$, which have license $l_1$ that precedes $l_3$, as well as $r_3$, which has the license $l_3$ itself.
- Which licensed resources can reuse a resource that has $l_1$ as its license? Those resources whose licenses are compliant with $l_1$: $r_3$, $r_4$ and $r_5$, which have licenses $l_3$ and $l_4$ that follow $l_1$, as well as $r_1$ and $r_2$, which have the license $l_1$ itself.

Resulting licenses can be returned ordered in a graph of compatibility. We illustrated CaLi with a simple restrictiveness lattice of status; the next section introduces a more realistic CaLi ordering inspired by the Creative Commons licenses.

A CaLi ordering for Creative Commons
Creative Commons proposes 7 licenses that are legally verified, free of charge, easy to understand and widely used when publishing resources on the Web. These licenses use 7 actions that can be permitted, required or prohibited. In this CaLi example, we seek to model a complete compatibility ordering of all possible valid licenses using these 7 actions.

Description of a CC ordering based on CaLi
Consider CC_CaLi, a CaLi ordering $\langle A, LS, C_L, C_\rightarrow \rangle$ such that:

Other constraints could be defined to be closer to the CC schema (see footnote 24), but for the purposes of this compatibility ordering these constraints are enough.

Analysis of CC_CaLi
The size of the restrictiveness lattice of licenses is $3^7$, but the number of valid licenses of CC_CaLi is 972 due to $C_L$. That is, 5 actions in any status and 2 actions (cc:CommercialUse and cc:ShareAlike) in only 2 status: $3^5 \times 2^2$.
The following CC_CaLi licenses are like the official CC licenses. The following CC_CaLi licenses are not part of the official CC licenses. License CC $l_1$ is like CC BY-NC but without the obligation to give credit to the copyright holder/author of the resource. CC $l_2$ is like CC BY but with the prohibition of making multiple copies of the resource. License CC $l_3$ allows only exact copies of the original resource to be distributed. CC $l_4$ is like CC $l_3$ with the prohibition of commercial use.
In CC_CaLi, the minimum is the license where all actions are permitted (i.e., CC Zero) and the maximum is the license where all actions are prohibited. Fig. 4 shows two subgraphs of CC_CaLi with only the compatibility relations. Fig. 4a shows only the 7 official CC licenses and Fig. 4b also includes CC $l_1$ to CC $l_4$. These graphs can be generated using the CaLi implementation (cf. Section 5). Thanks to $\omega_{\rightarrow 2}$, the restrictiveness relation between CC BY-SA and CC BY-NC-SA is not identified as a compatibility relation, and thanks to $\omega_{\rightarrow 3}$, the restrictiveness relation between CC BY-ND and CC BY-NC-ND is not identified as a compatibility relation. We recall that a license that prohibits cc:DerivativeWorks is not compatible even with itself.
The compatibility relations of Fig. 4a conform to the ones obtained from the Web2rights tool. This example shows the usability of CaLi with a real set of licenses.

23. To simplify, we consider that a requirement is a duty.
24. https://creativecommons.org/ns
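The two counts above can be checked by brute-force enumeration. The sketch below is illustrative only: apart from cc:CommercialUse and cc:ShareAlike, the action names are placeholders, and the choice of which status is excluded for each constrained action is an assumption made purely for counting (the result depends only on one status being excluded per constrained action).

```python
from itertools import product

STATUSES = ("Permission", "Duty", "Prohibition")

# Placeholders a1..a5 stand for the five unconstrained actions.
ACTIONS = ("a1", "a2", "a3", "a4", "a5", "cc:CommercialUse", "cc:ShareAlike")

# ASSUMPTION: the excluded status per constrained action is hypothetical.
EXCLUDED = {"cc:CommercialUse": "Duty", "cc:ShareAlike": "Prohibition"}


def is_valid(lic):
    """A license is kept if no constrained action has its excluded status."""
    return all(lic[action] != status for action, status in EXCLUDED.items())


all_licenses = [
    dict(zip(ACTIONS, combo))
    for combo in product(STATUSES, repeat=len(ACTIONS))
]
valid_licenses = [lic for lic in all_licenses if is_valid(lic)]

total = len(all_licenses)    # 3^7
valid = len(valid_licenses)  # 3^5 * 2^2
```

Enumerating 2187 candidates is cheap here, which is why an explicit listing is practical for the CC vocabulary even though it would not be for a much larger set of actions.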

Implementation of CaLi orderings
The goal of this section is twofold: to analyse the algorithm we implemented to produce CaLi orderings and to illustrate the usability of CaLi through a prototype of a license-based search engine.

Experimental validation
The size growth of CaLi orderings is exponential, i.e., $|LS|^{|A|}$. Nevertheless, it is not necessary to explicitly build a CaLi ordering to use it. Sorting algorithms like insertion sort can be used to produce subgraphs of a CaLi ordering.
We implemented an algorithm that can sort any set of licenses using the LS of [...]. We use a heuristic, based on the restrictiveness of the new license, to choose between two strategies: 1) to insert a license traversing the graph from the minimum or 2) from the maximum. To do this, our algorithm calculates the relative position of the new license (node) from the number of actions that it obliges and prohibits. The median depth (number of levels) of the existing graph is calculated from the median of the number of prohibited and obliged actions of existing licenses. Depending on these numbers, a strategy is chosen to find the place of the new license in the graph.
Results shown in Fig. 5 demonstrate that our algorithm sorts a set of licenses with at most $n^2/2$ comparisons. We used 20 subsets of licenses of different sizes from the CC_CaLi ordering. The subset size was incremented by 100, up to 2187 licenses. Each subset was randomly created and sorted 3 times. The curve was produced with the average number of comparisons needed to sort each subset.
A comparison of restrictiveness takes 6 milliseconds on average; thus, inserting a license into a graph of 2000 licenses takes 12 seconds on average. Building [...]
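The $n^2/2$ bound can be illustrated with a deliberately naive insertion procedure that compares each new license with every license already placed. This is not the heuristic algorithm described above, only a baseline sketch: licenses are encoded as tuples of status ranks (a hypothetical encoding), and the procedure records every restrictiveness relation rather than a reduced graph.

```python
from itertools import product


def compare(l_i, l_j):
    """One restrictiveness comparison: '<=', '>=' or None if incomparable."""
    if all(x <= y for x, y in zip(l_i, l_j)):
        return "<="
    if all(x >= y for x, y in zip(l_i, l_j)):
        return ">="
    return None


def insert_all(licenses):
    """Insert licenses one by one, comparing each with all placed licenses."""
    placed, relations, comparisons = [], [], 0
    for new in licenses:
        for old in placed:
            comparisons += 1
            rel = compare(old, new)
            if rel == "<=":
                relations.append((old, new))  # old is less restrictive
            elif rel == ">=":
                relations.append((new, old))  # new is less restrictive
        placed.append(new)
    return relations, comparisons


# 27 distinct licenses: 3 actions, each with one of 3 ranked status.
licenses = list(product(range(3), repeat=3))
relations, comparisons = insert_all(licenses)
n = len(licenses)
```

Inserting the $k$-th license costs $k - 1$ comparisons, so the total is $n(n-1)/2$, which stays below $n^2/2$; a heuristic that starts from the minimum or the maximum can only reduce this number.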

A search engine based on an ODRL CaLi ordering
We implemented a prototype of a search engine that finds linked data and source code repositories based on the compatibility or the compliance of their licenses. We use licenses described with the ODRL vocabulary. ODRL proposes properties to define semantic dependencies among actions, which we translate into CaLi constraints. Included In is defined as "An Action transitively asserts that another Action encompasses its operational semantics". Implies is defined as "An Action asserts that another Action is not prohibited to enable its operational semantics". We therefore consider that if an action $a_i$ is included in another action $a_j$ then $a_i$ implies $a_j$. For example, CommercialUse is included in use, so we consider that CommercialUse implies use. That means that if CommercialUse is permitted then use should be permitted too. To preserve this dependency we implemented the constraint $\omega_{L4}$.
We use ODRL_CaLi, a CaLi ordering $\langle A, LS, C_L, C_\rightarrow \rangle$ such that:
- $A$ is the set of 72 actions of ODRL,
- $LS$ is the restrictiveness lattice of status of Fig. 2c, [...]

The size of this ordering is $4^{72}$, so it is not feasible to build it explicitly. This search engine illustrates the usability of ODRL_CaLi through two subgraphs. On the one hand, there is a subgraph with the most used licenses in DataHub and OpenDataSoft. Licenses in this graph are linked to some RDF datasets, so that it is possible to find datasets whose licenses are compatible (or compliant) with a particular license. On the other hand, there is a subgraph with the most used licenses in GitHub. Here, licenses are linked to some GitHub repositories and it is possible to find repositories whose licenses are compatible (or compliant) with a particular license.

Discussion

The model we propose uses restrictiveness as the basis to define compatibility and compliance among licenses. This strategy works most of the time, as we have shown in this paper, but it has certain limitations. In particular, CaLi is not designed to define the compatibility of two licenses if it is not coherent with their restrictiveness relation. As an example, consider two versions of the MPL license. Version 2.0 relaxes some obligations compared to version 1.1. Thus, MPL-2.0 is less restrictive than MPL-1.1. With CaLi constraints, it is only possible to say that MPL-2.0 is compatible with MPL-1.1. But the legal texts state the opposite, i.e., MPL-1.1 is compatible with MPL-2.0.
Thus, particularities in the usage of compatibility of licenses, the granularity of the semantisation of licenses and the understanding of some actions (like ShareAlike) are the main reasons for the differences between CaLi orderings and other classifications. This is the case, for instance, for our compatibility graph devoted to GitHub licenses and the graph presented in [5].
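The dependency between CommercialUse and use described above suggests what a license constraint such as $\omega_{L4}$ might look like. The sketch below is a hypothetical reading, not the implemented constraint: it follows the sentence "if CommercialUse is permitted then use should be permitted too" literally, and the table `INCLUDED_IN` contains only the single pair cited in the text.

```python
# Hypothetical dependency table; only this pair is mentioned in the text.
INCLUDED_IN = {"CommercialUse": "use"}


def omega_l4(lic):
    """Valid only if every permitted action has its including action permitted.

    ASSUMPTION: 'should be permitted too' is read strictly, so an Undefined
    or missing status for the including action makes the license non-valid.
    """
    return all(
        lic.get(parent) == "Permission"
        for child, parent in INCLUDED_IN.items()
        if lic.get(child) == "Permission"
    )


ok = {"CommercialUse": "Permission", "use": "Permission"}
broken = {"CommercialUse": "Permission", "use": "Prohibition"}
no_commercial = {"CommercialUse": "Prohibition", "use": "Prohibition"}
```

A looser reading, closer to the quoted ODRL definition of Implies, would only require that the including action is not prohibited; which reading applies is a modelling choice.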

Conclusions and perspectives
We proposed a lattice-based model to define compatibility and compliance relations among licenses. Our approach is based on a restrictiveness relation that is refined with constraints to take into account the semantics of actions existing in licenses. We have shown the feasibility of our approach through two CaLi orderings, one using the Creative Commons vocabulary and the second using ODRL. We experimented with the production of CaLi orderings through the implementation of an insertion sort algorithm whose cost is at most $n^2/2$ comparisons. We implemented a prototype of a license-based search engine that highlights the feasibility and usefulness of our approach. Our compatibility model is not intended to provide legal advice, but it makes it possible to exclude those licenses that would contravene a particular license.
A perspective of this work is to take into account other aspects of licenses related to usage contexts like jurisdiction, dates of reuse, etc. Another perspective is to analyse how two compatibility orderings can be compared. That is, given two CaLi orderings, if there is an alignment between their vocabularies and their restrictiveness lattices of status are homomorphic then find a function to pass from a CaLi ordering to another.