An argumentation reasoning approach for data processing

Data-intensive environments enable us to capture information and knowledge about our physical surroundings, to optimise our resources, to enjoy personalised services and to gain unprecedented insights into our lives. However, to realise these benefits, the data must first be generated and collected, and the resulting insights must then be exploited. Following an argumentation reasoning approach for data processing and building on the theoretical background of data management, we highlight the importance of data sharing agreements (DSAs) and quality attributes for the proposed data processing mechanism. The proposed approach takes into account the DSAs and usage policies, as well as the quality attributes of the data, which existing methods in the data processing and management field have previously neglected. Previous research provided techniques in this direction; however, more intensive research on processing techniques is needed to enhance the value created from data, and new strategies should be formed around the data generated daily by various devices and sources.


Introduction
The big data paradigm, as a shifting phenomenon [14,33,7,40], provides access to a large pool of data resources, coupled with new storage, management and analytical techniques. The unprecedented opportunities arising from 'Big Data' are pushing firms to position themselves within this highly competitive, data-intensive landscape, altering the way they generate, collect and transform data into actionable knowledge [10]. Gaining a competitive position requires more than analytical techniques [36]; making sense [68] of this data is the challenge organisations face, forming a new way of data-based decision-making and disrupting the business landscape as it moves from the world of "making things" to a "world of outcomes" [44]. The industrial world changed disruptively during the last century, as the manufacturing process went through multiple revolutions (agricultural, industrial, digital, etc.) and, accordingly, multiple changes in the way products and services are formed. A critical view of the literature associated with the manufacturing context [41,62] of recent decades reveals that this evolution around data has also radically influenced production, distribution and the supply chain world.
The data evolution (often referred to as the "Big Data" evolution) has also accelerated the popularity of service industries built around data (for storing, analysing, processing, etc.). In the context of these data services, vast amounts of data are shared, stored, used and transformed, not only by users but also by companies, organisations and governments. Security concerns regarding shared data, e.g., privacy, confidentiality and integrity, and the need to protect them are becoming serious issues. In response, the regulatory and legislative environment is continuously updated to uphold citizens' rights and to protect their data privacy, anonymity and security. The regulatory and legislative implications of security issues become more severe when the scope extends beyond organisational, governmental or even national borders, where multiple legislative domains apply and frequently lead to conflicting behaviours and interests. Exchanging data implies that all parties agree on the rules to be enforced; this is often referred to as a data sharing agreement (DSA) [59]. A DSA can be seen as a contract between two or more parties, in which the rules are the terms of the contract. The terms express who is permitted or denied to access, delete, use and share the data, and how, along with the constraints that must be respected.
Enforcement takes place when the data are used: each request for, and usage of, the data is evaluated against the agreements in place.
Constructing and representing DSAs is not trivial, as they should incorporate data access and usage rules for the security and privacy of the data, as well as users' preferences and the business and legislative rules applicable to each case. All these rules apply to the same set of data; therefore, rules can conflict with each other, and differing legislative and regulatory rules and domain contexts introduce inconsistencies into the DSA rules. Different techniques [42,63,70] have been introduced for data services to provide various security solutions, with a major focus on data access permissions and efficient usage control [17,69], suitable for data stored or integrated from multiple parties and sources. Access and usage control is a well-studied research area [39,53]; however, existing solutions do not permit a fine-grained representation of the different types of rules and their constraints, nor the associated conflict detection and resolution.
The aim of this study is to propose a novel data processing technique that uses a policy analysis language to represent the sharing agreements and the quality attributes of the data. We extend our research scope to an industrial context, where data quality control should also be studied, as it is often neglected in the context of data sharing agreements. We first provide the theoretical background of this study, coupling two streams of literature. Previous research in data management is examined to explain the data manufacturing analogy and the data quality problem highlighted in past decades, together with the data access and usage control background on which the concepts associated with DSAs are built. The following sections develop the proposed methodological approach, based on a policy language for argumentation and abductive reasoning in the data processing context, and illustrate it through a use case scenario with the relevant analysis and representation. The study concludes with a research agenda on the future implications of this approach and on how it can be extended and applied in industrial data-intensive environments.

Theoretical background
The analogy between data processing and the manufacturing of physical materials is prevalent in the data management literature. There are many similarities between the mechanisms of data processing and manufacturing processing, but also some significant differences. In a manufacturing process, physical materials are input into a process, the materials are transformed, and the resulting output is a manufactured product. In data processing, the data are the input to the processing mechanism, and a transformed data product is the output. Data of poor quality passed through the process remain of poor quality until the problem is actively cleaned up or removed. Data sharing controls were not well studied in previous decades, as data were mostly shared within the boundaries of a company or between single databases, where trust and security issues were resolved by the individual sharing entities and the associated agreements between the interested parties.
The data manufacturing framing presented in the data management literature of previous decades should be extended to contexts that cross the boundaries between individuals, firms and countries, and its target should be tailored data products, as these are mostly the result of the data evolution. The data manufacturing analogy took the data artefact as its unit of analysis; the data era, however, requires novel techniques that focus not solely on the processing mechanism but also on the quality and sharing attributes associated with the data. This section explores previous research in the data management and data sharing literature, with the aim of providing a background for the problem area and showing how it expands in a data-intensive industrial context.

Data management and quality attributes
Plenty of examples nowadays show that companies from multiple industries invest in data management solutions to improve and expand their production through analytical skills, or by viewing and optimising the supply chains of their core business [62,61]. Although data can be used alongside the core business focus in different industries, the recent data evolution has expanded the business scope and disrupted operating models, providing opportunities to process data, to create new data and information products and services, and to resell and exchange data. Literature reviews of data management in an industrial context mostly focus on "Supply Chain Analytics (SCA)" [64] as a way of "developing supply chain strategies and efficiently managing supply chain operations at tactical and operational levels" [64]. Supply Chain Management (SCM) research focuses mostly on how analytics can be applied to strategic decisions related to SCM [64,62], how the efficiency and effectiveness of supply chains can be improved through the use of data [61], and the data strategies and servitization around supply chains [46]. This shows that research has focused mostly on the use of data within an industrial context; our research focus, by contrast, is on data processing techniques, taking as the main theoretical background the data manufacturing analogy from the data management literature, analogous to product/service manufacturing processes. Literature reviews and frameworks referring to data/information processing are summarised in Table 1, in Appendix A. This summary reveals that data, like physical materials, are moved through a manufacturing process that reshapes and reconfigures them into information/data products.
Data processing was initially framed by Brodie [6] through the analogy between product manufacturing and data manufacturing, at a time when data quality was a primary concern in transforming data into valid information and knowledge [1]. Some of the most indicative studies in these areas developed the data manufacturing analogy to find a path towards better data quality [38,1,18] and provided frameworks to describe and track the data manufacturing process [65,2,66,56]. A simple input-process-output framework describing the similarities between the two manufacturing processes was proposed in [66]; it calls for continuously defining, measuring, analysing and improving data quality. The data manufacturing analogy has mostly focused on data quality and on ways to ensure that the data used in manufacturing processes can be trusted. More recent studies following the data manufacturing path include [19,16], which introduce the need for continuous improvement in the SCM data production process and suggest a framework for establishing a data quality control mechanism. These two studies investigate the data manufacturing process through a data quality lens and extend the research focus to new topics related to the processing of data and to improving its quality with various techniques and methods.

Data access and usage control
Research on data sharing has provided data-centric solutions for protecting the data being used and shared, rather than focusing only on protecting the databases where they are stored [15], the networks used for their transfer [28], or on coordination techniques for data reuse [25]. The focus on data-centric solutions is driven by increasing connectivity and the growing diversity of attacks. Protecting and ensuring the security of all the environments where data are collected, processed and shared is a major challenge for all the interested parties. Therefore, data-centric security solutions hold an important position in the literature [4,31,42,63,70], with a focus on protecting data transfers and transactions. Data-centric security solutions present two main challenges: controlling access to data and controlling its usage. Both have been widely studied, and various solutions have been developed [34,12].
Previous approaches in role-based access control [12] base data access control on different user roles. Following this direction, we extend these aspects with a data representation technique for different user roles, each with specific access and usage policies for the data. Usage control (UCON) [48] is a widely studied concept for controlling the access to and usage of digital information, approached in different ways: with emphasis on the problem of rights delegation [47], or through a two-level policy language that represents the notions of prohibition and obligation, together with a generic server-side architecture for UCON [52]. Applications of UCON have also been presented in the context of distributed systems [29], where data flow between different connected systems, and in the context of multiple distributed systems, as a fully decentralised infrastructure for enforcing global usage control [30]. Our work builds on previous UCON research while expanding the scope of the policies to represent permissions, denials, obligations and delegations of rights.
Another interesting approach to sharing and accessing data is the use of sticky policies [42,43]. Sticky policies are machine-readable policies, attached to the data, that contain conditions and constraints describing how the data should be treated while shared among multiple parties. The sticky policy paradigm [27] and technologies for enterprise privacy enforcement and the exchange of customer data are represented through a privacy control language [26] for specified privacy rights and obligations. The privacy control language provides authorisation management and access control for user consent, obligations and distributed administration, and the sticky policy paradigm has also been extended to the cloud environment [49,60]. Our research builds on a policy language for policy representation and expands the application of this technique to the industrial use of data.
Data usage concerns are usually expressed by the different entities using the data. Before creating, sharing and using the data, these entities should agree on the rules that describe how the data should be treated, called data sharing agreements (DSAs) [59]. DSAs describe not only the agreements between the involved parties but also compliance with the business, legislative and regulatory rules of the various data sharing contexts. The language for representing DSA rules presented in [39] lacks expressivity: it cannot represent complex DSAs or support their analysis, and it leaves unsolved the problem of deciding which rules apply to the DSAs.
All the approaches above, from data access and usage control to sticky policies and DSA representation, fall short of providing a decision background for the rules that should apply to shared data. Motivated by this gap, we propose a combinatory analysis of the rules together with a conflict resolution technique. The proposed analysis enhances the one introduced in [24,23] by enriching it with a data quality analysis that can be easily extended to big data applications. Our solution is based on abductive [22] and argumentation-based reasoning [5,11], as these techniques can support decision-making mechanisms [21,3] under conflicting knowledge.

Introduction to the use case scenario
In this work, we demonstrate our DSA-based data processing technique on a realistic use case. Our scenario is taken from an e-health example of a European project (the Coco Cloud project). In this use case, the data need to be collected, processed and shared between different actors, e.g., the data subject, data controller, recipients and data processor. The actors need to stipulate agreements with each other that describe how the data should be treated; these data sharing agreements are composed of security, legislative and business rules. Deciding which rules apply to particular cases is not trivial, as various conflicts might arise, and these conflicts need to be captured and resolved.
One of the main actors is the data subject, in our use case the patient, whose rights include accessing and deleting his medical data and knowing who is processing them. The data controller is an entity (public authority, agency, legal person) that determines the purpose and means of processing the data subject's data. In our use case, the hospital is the data controller of the patient's data and determines the purpose for which the data are processed (e.g., an administrative purpose or a treatment purpose). The doctors of the hospital are the data recipients and need to comply with the data controller's rules. The data recipients are considered part of the data controller: the employees within the hospital are not separate entities from the hospital itself. The hospital, as the data controller, has various rules on how doctors can access the patient's data, e.g., a doctor needs to be inside the hospital to access the patient's data (geographical constraint), needs to be within his office hours (temporal constraint), and needs to be the patient's treating doctor (role-based constraint).
The data processor is an entity (public authority, agency, legal person) that processes the data on behalf of the controller. In our use case, the cloud provider is considered the data processor as long as it respects the instructions of the controller. The controller's rules that the processor must respect can also be of a legal nature, e.g., if the controller is in an EU country, the cloud provider should also be in an EU country and cannot send the data to countries outside the EU and EEA.
A third party is an entity (public authority, agency, legal person) other than the data subject, the data controller, the processor, and the persons who, under the direct authority of the controller or processor, are authorised to process the data. In our use case, a doctor outside the controller hospital is considered a third party. Once access is obtained, the third party becomes a data controller and has to comply with the data protection principles. The patient can be in different situations, e.g., intensive treatment, unconscious or emergency, that affect how the data are shared and used.

Methodology
This section presents the proposed approach, in which sets of data are processed by different entities. These entities establish agreements with each other, called data sharing agreements, which are composed of various constraints and rules. We use an expressive policy analysis language to represent the DSAs. Data quality is an important aspect of the data processing mechanism, so we enrich the policy language to capture data quality properties such as accessibility, timeliness and accuracy. The policy language permits the analysis of the policies and the detection of conflicts, redundancies and missing cases. The proposed method implements an analysis and representation based on argumentation and abductive reasoning to capture and resolve conflicts between context-dependent rules. This analysis permits the construction of correct and efficient DSAs that apply in different contexts during data processing.

Policy language for DSA and data quality representation
The proposed model is based on the policy analysis language of [8], which is constructed using the Event Calculus [32]. This language represents the rules and constraints required during the access, usage and sharing of data. The policy regulation rules are composed of regulation predicates and domain description predicates; they represent authorisation and obligation rules and include subjects, targets and actions in their structure. Some of the predicates of the policy language are introduced below.

$$
\begin{aligned}
&\mathit{req}(Sub, Tar, Act, T)\\
&\mathit{obl}(Sub, Tar, Act, T_s, T_e, T)\\
&\mathit{permitted}(Sub, Tar, Act, T)\\
&\mathit{denied}(Sub, Tar, Act, T)
\end{aligned}
$$

These predicates represent, respectively: a request made by the subject Sub at time instant T to perform an action Act on the target Tar; the obligation for a given subject to perform an action during the period from $T_s$ to $T_e$; and that a given subject is permitted or denied, at time T, to perform an action on the target. A domain description predicate is holdsAt, which states that a given property is true at a given time instant. The policy language can thus represent the permission, denial and obligation concepts of the DSAs. In the following examples, we introduce the representation of some DSA rules from our use case.
Example 1. Bob (B) is the family doctor (fDoc) of the patient Alice. The family doctor has permission to access Alice's prescriptions.

The policy language permits the representation of different predicates, authorisation and obligation policies, together with domain description policies. The sharing and usage of data also require describing properties related to the quality of the collected data. When working with data quality, the entities that use, share and store the data are called data consumers.
In our case, the data consumers are the data collectors and the data processors. Data quality is a major factor when working with data consumers, where data quality is defined as data that are fit for use by data consumers [67]. Through our DSA representation and the applied policies, we guarantee an important characteristic of data quality, namely data accessibility: the data are accessible in compliance with the constraints for accessing them.
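The following Python sketch illustrates how the predicates, Example 1 and data accessibility fit together. It is a minimal sketch, not the syntax of the policy language in [8]: it assumes a simplified, discrete-time reading of the Event Calculus for holdsAt (a fluent holds at T if an event initiated it strictly before T and no event terminated it in between), and the event names assign_fdoc and revoke_fdoc, the tables INITIATES and TERMINATES, and the functions holds_at, permitted and evaluate are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Happens:
    """An event occurrence, read as happens(event(args), time)."""
    event: str
    args: tuple
    time: int

# Hypothetical domain description: which events initiate or terminate a fluent.
INITIATES = {"assign_fdoc": "fDoc"}
TERMINATES = {"revoke_fdoc": "fDoc"}

def holds_at(fluent: str, args: tuple, t: int, narrative: list) -> bool:
    """holdsAt(fluent(args), t): initiated at some t1 < t and not terminated in (t1, t)."""
    starts = [h.time for h in narrative
              if h.args == args and INITIATES.get(h.event) == fluent and h.time < t]
    if not starts:
        return False
    t1 = max(starts)  # if any initiation survives unclipped, the latest one does too
    return not any(h.args == args and TERMINATES.get(h.event) == fluent
                   and t1 < h.time < t for h in narrative)

def permitted(sub: str, tar: tuple, act: str, t: int, narrative: list) -> bool:
    """Hypothetical Example 1 rule: a family doctor may access the owner's prescriptions."""
    owner, kind = tar
    return act == "access" and kind == "presc" and holds_at("fDoc", (sub, owner), t, narrative)

def evaluate(req: tuple, narrative: list) -> str:
    """Decide req(Sub, Tar, Act, T); this sketch refuses anything not permitted."""
    sub, tar, act, t = req
    return "granted" if permitted(sub, tar, act, t, narrative) else "refused"

# Bob becomes Alice's family doctor at time 1; the role is revoked at time 10.
narrative = [Happens("assign_fdoc", ("Bob", "Alice"), 1),
             Happens("revoke_fdoc", ("Bob", "Alice"), 10)]
print(evaluate(("Bob", ("Alice", "presc"), "access", 5), narrative))   # granted
print(evaluate(("Bob", ("Alice", "presc"), "access", 12), narrative))  # refused
```

Refusing every request that is not explicitly permitted is a design shortcut of this sketch; the policy language itself distinguishes an explicit denial from the absence of any applicable rule, which is what the coverage gap analysis relies on.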

Timeliness/freshness
Another important data quality characteristic is data timeliness, that is, the degree to which data represent reality at the required point in time when the event occurred. In the context we are working in, the data generally have a specific subject and a data controller, who, together with the various legislative and business rules, determine the policies for accessing and using the data. To represent data timeliness, we use the concept of data freshness. Data freshness is expressed by a predicate recording the last time a given piece of data was updated: freshness(Tar, T), where Tar is the targeted data and T is the last time instant at which the data were updated. Through the freshness predicate, we can represent data timeliness in different contexts. Let us see how this predicate applies to our use case.
Example 4. Suppose that a patient is hospitalised for a particular illness, or for symptoms related to an illness that is still being studied. He is invited to take part in a pilot project and gives consent to sharing his (anonymised/sanitised) medical data with a team of researchers. The patient is therefore monitored or visited at specific instants or intervals of time, e.g., the doctor should visit (visit) the patient every four hours. To guarantee data timeliness, the patient's medical data are updated (upd) every time the patient is visited or monitored. If there are no changes to the patient's data (e.g., the same vital functions), the old data are confirmed for that time instant.
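The timeliness mechanism of Example 4 can be checked operationally. The Python sketch below is a simplified reading under stated assumptions, not a translation of the rules of the policy language: time is measured in integer hours, each visit at T' opens a period that ends at the next scheduled visit $T_e = T' + \text{period}$ (four hours in Example 4), and the data count as fresh at T, with $T' < T < T_e$, only if an update was recorded in $[T', T]$. The function fresh_at and its parameters are hypothetical.

```python
def fresh_at(t: int, visits: list, updates: list, period: int = 4) -> bool:
    """Return True if the patient's data are fresh at time t (simplified reading).

    visits:  times at which the patient was visited (visit)
    updates: times at which the doctor updated the data (upd)
    period:  hypothetical length of the window opened by each visit
    """
    earlier = [v for v in visits if v < t]
    if not earlier:
        return False                 # no visit before t, so nothing can be fresh
    last_visit = max(earlier)        # T'
    deadline = last_visit + period   # T_e
    if not t < deadline:
        return False                 # the next visit is overdue: freshness has lapsed
    # The obligation opened at T' counts as fulfilled if an update happened in [T', t].
    return any(last_visit <= u <= t for u in updates)

visits = [0, 4, 8]   # the doctor visits every four hours
updates = [0, 4]     # the update after the visit at hour 8 is missing
print(fresh_at(2, visits, updates))  # True
print(fresh_at(9, visits, updates))  # False
```

The sketch collapses the recursive part of the freshness rule (no intervening visit since the previous fresh instant) into "look only at the latest visit", which is enough to show how a missed update makes the data stale.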
Formally, the freshness of the data is expressed by the following rule:

$$
\begin{aligned}
\mathit{freshness}(Tar, T) \leftarrow\ & \mathit{fulfilled}(Sub_1, Tar, upd, T', T_e, T'),\ \mathit{holdsAt}(\mathit{visit}(Sub_1, P), T'),\\
& \mathit{holdsAt}(\mathit{owner}(P, Tar), T'),\ \mathit{not}\ \mathit{holdsAt}(\mathit{visit}(Sub_2, P), T''),\\
& \mathit{freshness}(Tar, T'''),\ T_e > T > T',\ T' > T'' > T'''.
\end{aligned}
$$

The rule states that the given data (Tar) are fresh at time T, as the patient's data are updated every time the patient is visited. A deontic obligation holds for the doctor to update the patient's medical data every time he visits the patient.
$$
\begin{aligned}
\mathit{fulfilled}(Sub, Tar, upd, T, T_e, T) &\leftarrow \mathit{obl}(Sub, Tar, upd, T, T_e, T),\ \mathit{holdsAt}(\mathit{upd}(Sub, Tar), T).\\
\mathit{obl}(Sub, Tar, upd, T, T_e, T) &\leftarrow \mathit{holdsAt}(\mathit{visit}(Sub, P), T),\ \mathit{holdsAt}(\mathit{owner}(P, Tar), T),\ T_e > T.
\end{aligned}
$$

These rules ensure that every time a patient is visited, the doctor updates the patient's data. Thus, data freshness is aligned with the visiting time, and data timeliness is respected. The timeliness of the patient's data described above is linked to the actions of actors such as doctors and nurses. The patient can also be monitored by medical IoT devices that measure (meas) different vital functions. In this case, to ensure data timeliness, we expect the patient's medical data to be collected and saved simultaneously. The latter can be ensured by the predicates below.

An important problem related to data sharing is the linked data problem. One of its main facets is deciding which policies should apply to new data produced by processing existing data. The new data are linked to the existing ones, and deciding their policies is not trivial, as the processed data can carry different policies, possibly in conflict with each other, and the nature of the new data is not known in advance. The solution we propose is to use the type of the data. In our use case, the data are divided into different types, e.g., personal information, prescriptions and private prescriptions, and each type has its corresponding set of policies. When new data are produced by processing existing data, the data subject and recipients (e.g., patient, doctor) decide the type of the new data and, consequently, their corresponding set of policies. In this case, the responsibility for deciding the policies of the new data lies with the data subject and recipient, which makes the approach prone to human error. Despite this, we believe it is a good solution, especially for the e-health scenario, where access cannot be divided according to the security type of the data, e.g., the doctor can process private prescriptions of the patient (e.g., a depression treatment) to produce routine prescriptions (prescriptions with a low privacy level).
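The type-based answer to the linked data problem can be sketched as a lookup: each data type maps to a set of policies, and derived data receive the policies of the type that the data subject or recipient assigns to them, regardless of the types of the inputs. This is a minimal Python sketch; the policy sets in POLICIES, the role names and the derive function are hypothetical and only loosely follow the use case.

```python
from dataclasses import dataclass
from enum import Enum

class DataType(Enum):
    PRESC = "prescription"
    PDATA = "private prescription"
    PINFO = "personal information"

# Hypothetical policy sets: roles allowed to access each data type.
POLICIES = {
    DataType.PRESC: frozenset({"patient", "family_doctor", "treating_doctor"}),
    DataType.PDATA: frozenset({"patient", "family_doctor"}),
    DataType.PINFO: frozenset({"patient", "family_doctor"}),
}

@dataclass(frozen=True)
class Data:
    data_id: str
    dtype: DataType
    derived_from: tuple = ()  # ids of the data this item was produced from (the link)

def derive(new_id: str, inputs: list, assigned_type: DataType) -> Data:
    """Produce new data from existing data.

    The type, and hence the policy set, is chosen by the data subject or
    recipient (assigned_type); it is not inherited from the inputs' types.
    """
    return Data(new_id, assigned_type, tuple(d.data_id for d in inputs))

def allowed_roles(d: Data) -> frozenset:
    return POLICIES[d.dtype]

# A depression treatment (private prescription) is used to produce a routine prescription.
treatment = Data("d1", DataType.PDATA)
routine = derive("d2", [treatment], DataType.PRESC)
print(sorted(allowed_roles(routine)))  # ['family_doctor', 'patient', 'treating_doctor']
```

Keeping derived_from preserves the provenance of the new data even though its policies come only from the assigned type, so a wrong type assignment (the human-error risk noted above) remains traceable.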
Another promising future solution to the linked data problem would be to extend an audit process [50] that verifies that data are used for their intended purpose. Tags identifying the purposes would need to be added to the audit system. Thus, if the audit process confirms the purpose of the data, the policies associated with that purpose are applied to the data.

Data accuracy
Another important property of data quality is the accuracy of the collected data. When the data are collected by a human actor, we can impose a deontic obligation on the actor to collect and save the data with a given accuracy; this case is prone to human error. When a device, e.g., an IoT device, collects the data, we can be more specific and ensure data accuracy by checking the parameters of the device.

$$
\mathit{accuracy}(Tar, T) \leftarrow \mathit{holdsAt}(\mathit{meas}(Sub, P), T),\ \mathit{holdsAt}(\mathit{acceptP}(Sub), T),\ \mathit{holdsAt}(\mathit{owner}(P, Tar), T).
$$

This rule states that the data collected for a particular patient are accurate, as the device (Sub) that collected them measures with acceptable parameters (acceptP). The same can be stated for a doctor visiting the patient and updating the data with acceptable accuracy.
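A minimal Python sketch of the accuracy rule for IoT devices follows. What counts as "acceptable parameters" is not specified in the rule, so this sketch assumes, hypothetically, that acceptP holds when the device is calibrated and its declared measurement error does not exceed a per-quantity limit stored in MAX_ERROR; the classes Device and Measurement and the functions accept_p and accurate are illustrative names.

```python
from dataclasses import dataclass

# Hypothetical acceptance thresholds (acceptP): maximum declared error per quantity.
MAX_ERROR = {"temperature": 0.2, "heart_rate": 2.0}

@dataclass(frozen=True)
class Device:
    name: str
    declared_error: float  # hypothetical: error bound stated for the device
    calibrated: bool

@dataclass(frozen=True)
class Measurement:
    """meas(Sub, P): device Sub measured a quantity of patient P, stored as target."""
    device: Device
    patient: str
    quantity: str
    target: str

def accept_p(device: Device, quantity: str) -> bool:
    """acceptP(Sub): the device is calibrated and within the accepted error."""
    limit = MAX_ERROR.get(quantity)
    return device.calibrated and limit is not None and device.declared_error <= limit

def accurate(m: Measurement, owner: dict) -> bool:
    """accuracy(Tar, T) <- meas(Sub, P), acceptP(Sub), owner(P, Tar)."""
    return accept_p(m.device, m.quantity) and owner.get(m.target) == m.patient

owner = {"temp_log": "Alice"}  # owner(P, Tar): target -> patient
good = Device("thermo-1", 0.1, True)
coarse = Device("thermo-2", 0.5, True)
print(accurate(Measurement(good, "Alice", "temperature", "temp_log"), owner))    # True
print(accurate(Measurement(coarse, "Alice", "temperature", "temp_log"), owner))  # False
```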

Data processing with argumentation reasoning
The data are collected, used and shared between different entities. The rules to apply while processing the data are determined through a decision process that, given the data, the entities involved with their preferences, and the applicable legal, security and business rules, decides which rules apply in particular contexts. As introduced above, these rules are called data sharing agreements and are represented as policies. Because the rules are heterogeneous and context dependent, the policies that represent them can be conflicting, redundant or incomplete. We introduce a policy analysis that captures these issues, resolves the conflicts and performs a decision process. The introduced analysis enables efficient data processing.
The policy language enables an analysis based on an abductive constraint logic programming system, the A-system [45], which is performed on the rules and ensures their efficiency and soundness. In particular, the modality conflict analysis finds conflicts between policy regulation rules and yields sound DSAs. It can capture the case in which an action is both permitted/obliged and denied at the same time instant, as well as more complex conflicts. The coverage gap analysis finds the cases that are not covered by the DSA rules and permits the construction of a complete list of rules for the DSAs, e.g., when there is an explicit request from a subject to perform an action and no authorisation policy rule either permits or denies it. The policy comparison analysis checks whether a policy is included in, equivalent to, or implied by another one. This analysis improves the efficiency of the DSAs by identifying redundant rules that can easily be removed.
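The three analyses can be illustrated on a small finite domain. The Python sketch below does not use the A-system or abduction: it substitutes plain brute-force enumeration of every request in a hypothetical domain of subject roles, targets and actions, which is only feasible for tiny examples but makes the three definitions concrete. The rules r1 to r5 loosely follow the use case; r4 is deliberately redundant and r5 deliberately conflicts with r3.

```python
from itertools import product

# Hypothetical finite domain: subject roles, targets (data types) and actions.
SUBJECTS = ["fdoc", "tdoc", "patient", "other"]
TARGETS = ["presc", "pdata", "pinfo"]
ACTIONS = ["access"]

# Each rule: (name, effect, condition over a request (sub, tar, act)).
RULES = [
    ("r1", "permitted", lambda s, t, a: s == "fdoc"),
    ("r2", "permitted", lambda s, t, a: s == "patient"),
    ("r3", "permitted", lambda s, t, a: s == "tdoc" and t == "presc"),
    ("r4", "permitted", lambda s, t, a: s == "fdoc" and t == "presc"),  # subsumed by r1
    ("r5", "denied", lambda s, t, a: s == "tdoc"),                      # clashes with r3
]

def requests():
    """Every possible request in the finite domain."""
    return list(product(SUBJECTS, TARGETS, ACTIONS))

def modality_conflicts(rules):
    """Requests that some rule permits and some other rule denies."""
    return [req for req in requests()
            if {"permitted", "denied"} <= {eff for _, eff, cond in rules if cond(*req)}]

def coverage_gaps(rules):
    """Requests that no rule either permits or denies."""
    return [req for req in requests() if not any(cond(*req) for _, _, cond in rules)]

def redundant(rules):
    """Pairs (a, b): rule a has the same effect as b and applies only where b applies."""
    return [(na, nb)
            for (na, ea, ca), (nb, eb, cb) in product(rules, rules)
            if na != nb and ea == eb and all(cb(*r) for r in requests() if ca(*r))]

print(modality_conflicts(RULES))  # [('tdoc', 'presc', 'access')]
print(coverage_gaps(RULES))       # the three requests of 'other'
print(redundant(RULES))           # [('r4', 'r1')]
```

The gaps found for the subject "other" are exactly the cases that rule 6 of the use case ("nobody else can access the data") is meant to close.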
The modality, coverage and comparison analyses cannot capture conceptual conflicts, as these are not direct conflicts between predicates (e.g., between the permitted/obliged and denied predicates) and are context dependent. Rules can conflict with each other because they hold for general domain description predicates but not for specific ones, or vice versa. Conceptual conflicts are resolved by introducing priorities between rules [3], which makes their resolution a decision-making problem. Techniques based on argumentation reasoning [21,5], a form of non-monotonic reasoning [11], are combined with abductive reasoning to find and resolve conceptual conflicts. Argumentation reasoning implements decision-making mechanisms for conflicting rules that have priorities or preferences between them and that are strongly context dependent. It permits the representation of the conflicting rules, the contexts in which they are valid and the preferences between them; the priorities between rules allow us to work with and analyse conflicting policies. This analysis is implemented using Gorgias-B [57], a tool for preference-based argumentation with a graphical user interface.
Our decision-making technique takes as input the rules together with the domain description predicates, which can be facts or defeasible knowledge, finds the conflicts between rules, if any, and resolves them. Conflicts are resolved by introducing priorities between rules, called priority rules, which explicitly specify when one rule is to be considered stronger than another. A preference/priority relation, denoted by >, indicates preferences between rules: given two conflicting rules $r_1$ and $r_2$ where, for the context and information at hand, $r_1$ should be applied instead of $r_2$, we write $r_1 > r_2$. The introduced priority rules are checked together with the existing rules, and if any conflict remains, further priority rules are introduced.
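A simplified Python sketch of priority-based conflict resolution follows. It is not Gorgias-B and does not implement full argumentation semantics: an applicable rule is discarded when a conflicting applicable rule is preferred to it under the transitive closure of >, and if both conclusions survive, the conflict is reported as unresolved, signalling that further priority rules are needed. The rule names, including the emergency rule r_emerg, are hypothetical.

```python
from itertools import product

def transitive_closure(priority: set) -> set:
    """Close the preference relation (stronger, weaker) under transitivity."""
    closure = set(priority)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), list(closure)):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def resolve(applicable: dict, priority: set) -> str:
    """applicable: rule name -> conclusion ('permitted' or 'denied') in this context.
    priority: set of (stronger, weaker) pairs, i.e. stronger > weaker."""
    prefer = transitive_closure(priority)
    surviving = {
        concl for rule, concl in applicable.items()
        if not any(other_concl != concl and (other, rule) in prefer
                   for other, other_concl in applicable.items())
    }
    if len(surviving) == 1:
        return surviving.pop()
    return "unresolved" if surviving else "no decision"

# Treating doctor asks for a private prescription: r3 permits, r5 denies.
print(resolve({"r3": "permitted", "r5": "denied"}, set()))           # unresolved
print(resolve({"r3": "permitted", "r5": "denied"}, {("r5", "r3")}))  # denied
# In an emergency context, a hypothetical rule r_emerg is preferred to r5.
print(resolve({"r3": "permitted", "r5": "denied", "r_emerg": "permitted"},
              {("r5", "r3"), ("r_emerg", "r5")}))                     # permitted
```

The first call shows the loop described above: the conflict stays open until a priority rule such as r5 > r3 is introduced, and a context-specific priority can then override it again.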
For every given context and set of information, this analysis and conflict resolution yield the DSAs that apply to the data. The DSAs act as regulatory rules between entities while processing the data and ensure compliance with all the requirements and constraints: security, business and legal.

Case representation and analysis
In this section, we continue with the use case introduced in the previous sections. We deal with an EU e-health scenario in which we want to model the data sharing agreements between different entities for sharing and using the data. The entities are the patients $P = \{P_1, P_2, \ldots\}$, the service providers, i.e., the hospitals $H = \{H_1, H_2, \ldots\}$, and the doctors $D = \{D_1, D_2, \ldots\}$ who work in the hospitals. Every patient has his associated data, which can be of three types: prescriptions, Presc(data), e.g., blood pressure, analyses, X-rays; private prescriptions, PData(data), e.g., anti-depressive treatments; and personal information, PInfo(data), e.g., contact and emergency contacts.
Every patient P has a family doctor: FDoc(D, P). When the patient is treated or hospitalised in a hospital, he also has treating doctors: TDoc(D, P). For D to be a treating doctor of P, D should work in the same hospital where P is treated or hospitalised:
$$\mathit{TDoc}(D, P) \leftarrow \mathit{Hosp}(P, H) \wedge \mathit{Work}(D, H).$$
The patient's data are used, accessed and shared between different entities in accordance with their DSAs. The first step is to agree on the terms of the DSAs, some of which, usually the legal ones, are irrefutable. Some of the DSA terms for our scenario are as follows.
1. The family doctor can access all the data of the patient.
2. The patient can access all his data.
3. The treating doctor can access the prescription data of the patient.
4. The hospital regulation says that the treating doctor can access the patient's data during his working time and while he is in the hospital.
5. The treating doctor cannot access the other types of the patient's data (private prescriptions and personal information), and cannot access any type of data when he is not in the hospital or not on his shift.
6. Nobody else can access the data.
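DSA terms 1-6 can be approximated by a plain decision procedure. The Python sketch below is a hypothetical encoding, not the authors' implementation. It derives TDoc(D, P) from Hosp and Work facts as in the rule above and reduces the shift and location checks to two booleans. The order of the `if` statements silently fixes which term wins; for instance, a family doctor who is also a treating doctor is always permitted. That is exactly the kind of implicit priority the argumentation approach makes explicit.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def treating_doctors(hosp: Set[Pair], work: Set[Pair]) -> Set[Pair]:
    """Derive TDoc(D, P) <- Hosp(P, H) and Work(D, H) from (P, H) and (D, H) facts."""
    return {(d, p) for (p, h) in hosp for (d, h2) in work if h == h2}


def access(requester: str, patient: str, data_type: str,
           fdoc: Set[Pair], tdoc: Set[Pair],
           on_shift: bool, in_hospital: bool) -> str:
    """Decide access to one item of the patient's data (Presc, PData or PInfo)."""
    # Term 2: the patient can access all of his data.
    if requester == patient:
        return "permitted"
    # Term 1: the family doctor can access all the data of the patient.
    if (requester, patient) in fdoc:
        return "permitted"
    if (requester, patient) in tdoc:
        # Terms 3 and 4: prescriptions only, during the shift, inside the hospital.
        if data_type == "Presc" and on_shift and in_hospital:
            return "permitted"
        # Term 5: any other access by the treating doctor is denied.
        return "denied"
    # Term 6: nobody else can access the data.
    return "denied"


tdoc = treating_doctors({("P1", "H1")}, {("D1", "H1"), ("D2", "H2")})
fdoc = {("D9", "P1")}
print(sorted(tdoc))                                           # [('D1', 'P1')]
print(access("D1", "P1", "Presc", fdoc, tdoc, True, True))    # permitted
print(access("D1", "P1", "PInfo", fdoc, tdoc, True, True))    # denied
print(access("D9", "P1", "PData", fdoc, tdoc, False, False))  # permitted
```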
Rule 1 states that the family doctor can access all the patient's data:
$$\mathit{Access}(\mathit{data}, D, \mathit{permitted}) \leftarrow \mathit{FDoc}(D, P) \wedge \mathit{Owner}(P, \mathit{data}).$$
Rule 2 states that the patient can access all of his data:
$$\mathit{Access}(\mathit{data}, P, \mathit{permitted}) \leftarrow \mathit{Owner}(P, \mathit{data}).$$
Rule 3 states that the treating doctor is permitted to access the patient's prescriptions. Rule 4 states that the treating doctor can access the patient's data during his shift, shift(Doctor), and while he is in the hospital. To check the latter, we use hospP(Hospital, Location) and pos(D, Location). All other accesses are not allowed (rule 5), e.g., when the doctor is not in the hospital or not on his working hours. Below, we represent rules 3 and 4 together, followed by rule 5.
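$$\mathit{Access}(\mathit{data}, D, \mathit{permitted}) \leftarrow \mathit{TDoc}(D, P) \wedge \mathit{Owner}(P, \mathit{data})$$
$$\mathit{Access}(\mathit{data}, D, \mathit{denied}) \leftarrow \mathit{TDoc}(D, P) \wedge \mathit{Owner}(P, \mathit{data}) \wedge (\mathit{PData}(\mathit{data}) \vee \mathit{PInfo}(\mathit{data}) \vee \mathit{not}\ \mathit{shift}(D) \vee (\mathit{pos}(D, \ldots)))$$
A patient might be in an emergency situation, meaning a high level of risk to his life: Emerg(P, H). When the patient is in an emergency situation, the legislative and business policies are as follows.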
7. The treating doctor can access the prescriptions of the patient when the patient is in an emergency situation and the doctor is in the hospital and on his shift.
8. The treating doctor can access the personal information of the patient, e.g., to notify the family members, when the patient is in an emergency situation and the doctor is in the hospital and on his shift.
9. The treating doctor cannot access the patient's private prescriptions when the patient is in an emergency situation.
$$\mathit{Access}(\mathit{data}, D, \mathit{permitted}) \leftarrow \mathit{Emerg}(P, H) \wedge \mathit{TDoc}(D, P) \wedge \mathit{Presc}(\mathit{data}) \wedge \mathit{Owner}(P, \mathit{data}) \wedge \mathit{pos}(D, \ldots)$$
$$\mathit{Access}(\mathit{data}, D, \mathit{permitted}) \leftarrow \mathit{Emerg}(P, H) \wedge \mathit{TDoc}(D, P) \wedge \mathit{PInfo}(\mathit{data}) \wedge \mathit{Owner}(P, \mathit{data})$$
$$\mathit{Access}(\mathit{data}, D, \mathit{denied}) \leftarrow \mathit{Emerg}(P, H) \wedge \mathit{TDoc}(D, P) \wedge \mathit{PData}(\mathit{data}) \wedge \mathit{Owner}(P, \mathit{data})$$
Our analysis finds that rule 7 is a sub-case of rule 3, since it grants the same access under more specific conditions. Rule 7 can therefore be removed. Rule 8 conflicts with rule 5, a type of conflict that the argumentation reasoning analysis captures. To resolve it, we introduce a preference relation stating that the emergency situation has higher priority, so that the doctor can access the personal information of the patient. We represent this as rule 8 > rule 5. The redundancy analysis finds that rule 9 expresses the same policy as rule 5, which is the more generic of the two. To improve the efficiency of our DSAs, rule 9 can be removed.
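The following minimal Python sketch shows how the priority rule 8 > rule 5 changes the outcome. It assumes the context is a set of ground facts written as plain strings, for example "shift" for shift(D) and "in_hospital" for the location check. The rule names, fact strings and the `decide` function are hypothetical. The defeat check is a simplified stand-in for preference-based argumentation rather than the semantics implemented by Gorgias-B.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Facts = FrozenSet[str]

# Hypothetical encoding of the remaining rules: name -> (decision, condition).
# Rules 7 and 9 are omitted because the analysis removes them as redundant.
RULES: Dict[str, Tuple[str, Callable[[Facts], bool]]] = {
    # Rules 3 and 4: prescriptions, during the shift, inside the hospital.
    "rule3_4": ("permitted",
                lambda f: {"TDoc", "Presc", "shift", "in_hospital"} <= f),
    # Rule 5: private data, personal information, off shift or outside the hospital.
    "rule5": ("denied",
              lambda f: "TDoc" in f and ("PData" in f or "PInfo" in f
                                         or "shift" not in f
                                         or "in_hospital" not in f)),
    # Rule 8: personal information in an emergency, on shift, inside the hospital.
    "rule8": ("permitted",
              lambda f: {"Emerg", "TDoc", "PInfo", "shift", "in_hospital"} <= f),
}
PRIORITY: Set[Tuple[str, str]] = {("rule8", "rule5")}  # rule 8 > rule 5


def decide(facts: Facts, priority: Set[Tuple[str, str]] = PRIORITY) -> str:
    applicable = [n for n, (_, cond) in RULES.items() if cond(facts)]
    # A rule is defeated by a conflicting applicable rule preferred over it.
    undefeated = [n for n in applicable
                  if not any(RULES[o][0] != RULES[n][0] and (o, n) in priority
                             for o in applicable)]
    decisions = {RULES[n][0] for n in undefeated}
    if not decisions:
        return "denied"  # no undefeated rule applies: fall back to rule 6
    return decisions.pop() if len(decisions) == 1 else "conflict"


emergency = frozenset({"Emerg", "TDoc", "PInfo", "shift", "in_hospital"})
print(decide(emergency))               # permitted
print(decide(emergency, set()))        # conflict
print(decide(emergency - {"Emerg"}))   # denied
```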
As described in the previous sections, the patient P may decide to take part in a research study on a particular disease, Study(P). In this case, his data are shared with third parties, e.g., research institutes, insurance companies and government entities. These entities can use the data, for example, to study particular diseases, to create adequate health campaigns or to design new insurance plans. How and with whom the data are shared depends on their freshness (Fresh), their accuracy (Accurate) and the entity receiving them. The entities are divided into basic (e.g., insurance companies), silver (e.g., government entities) and gold (e.g., research institutes) members, depending on the type of agreement they have with the data subject and controller. The following rules apply to the patient's data:
10. The gold members (Gold) can access the patients' normal and private prescriptions that are fresh and accurate.
11. The silver members (Silver) can access the patients' normal and private prescriptions that are accurate or fresh.
12. The basic members (Basic) can access the patients' normal prescriptions that are not accurate, but can be fresh.
In the last case, if the data are accurate, an impoverishment process (Cast) takes place that removes part of the data accuracy.

$$\mathit{Access}(\mathit{data}, \ldots)$$
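Rules 10-12 and the impoverishment step can be read as the decision function below. This is a hedged Python sketch with the following simplifications:
- member levels and data types are plain strings;
- the function name `study_access` is hypothetical;
- the function only reports whether the Cast step must be applied, without modelling how accuracy is actually reduced.

The sketch follows the wording of the rules literally. For instance, a gold member is denied data that are fresh but not accurate.

```python
from typing import Tuple


def study_access(in_study: bool, member: str, data_type: str,
                 fresh: bool, accurate: bool) -> Tuple[str, bool]:
    """Return (decision, cast); cast=True means the data must first undergo
    the impoverishment process (Cast) that removes part of their accuracy."""
    if not in_study:
        # Rules 10-12 only apply when Study(P) holds.
        return ("denied", False)
    normal_or_private = data_type in ("Presc", "PData")
    if member == "Gold":
        # Rule 10: normal and private prescriptions that are fresh and accurate.
        ok = normal_or_private and fresh and accurate
        return ("permitted" if ok else "denied", False)
    if member == "Silver":
        # Rule 11: normal and private prescriptions that are accurate or fresh.
        ok = normal_or_private and (fresh or accurate)
        return ("permitted" if ok else "denied", False)
    if member == "Basic" and data_type == "Presc":
        # Rule 12: normal prescriptions only; accurate data are impoverished.
        return ("permitted", accurate)
    return ("denied", False)


print(study_access(True, "Basic", "Presc", True, True))   # ('permitted', True)
print(study_access(True, "Silver", "PData", False, True)) # ('permitted', False)
```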
The above rules conflict with rule 6, which states that nobody other than the doctors and the patient can access the data. Because the patient takes part in the study, the above rules are stronger than rule 6, represented as rule 10 > rule 6, rule 11 > rule 6 and rule 12 > rule 6.
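The decision-making technique checks that every conflict is covered by a priority. The Python sketch below illustrates that check for rules 6 and 10-12. The rule names and the `unresolved` helper are hypothetical. Which pairs of rules can fire in the same context is given by hand here, whereas an argumentation tool would compute it from the rule conditions.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]

# Conclusion of each rule; rule 6 denies access to everyone else.
DECISIONS: Dict[str, str] = {
    "rule6": "denied", "rule10": "permitted",
    "rule11": "permitted", "rule12": "permitted",
}
# Assumption: each study rule can fire in a context where rule 6 also applies.
OVERLAPS: Set[Pair] = {("rule10", "rule6"), ("rule11", "rule6"),
                       ("rule12", "rule6")}


def unresolved(decisions: Dict[str, str], overlaps: Set[Pair],
               priority: Set[Pair]) -> List[Pair]:
    """List overlapping rule pairs with opposite conclusions and no priority."""
    return sorted((a, b) for (a, b) in overlaps
                  if decisions[a] != decisions[b]
                  and (a, b) not in priority and (b, a) not in priority)


PRIORITY: Set[Pair] = {("rule10", "rule6"), ("rule11", "rule6"),
                       ("rule12", "rule6")}
print(unresolved(DECISIONS, OVERLAPS, set()))
# [('rule10', 'rule6'), ('rule11', 'rule6'), ('rule12', 'rule6')]
print(unresolved(DECISIONS, OVERLAPS, PRIORITY))  # []
```

An empty result means the priority rules 10 > 6, 11 > 6 and 12 > 6 cover every conflict between these rules.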
The legislation states that the medical data of the patient can be shared only between entities inside the EU or EEA. Thus, for the above case we have:
13. If a member that takes part in the study is not part of the EU/EEA, then it cannot access the patient's data.
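Rule 13 acts as a filter over the outcome of rules 10-12. The Python sketch below rests on two assumptions: each study member is identified by an ISO 3166-1 alpha-2 country code, and rule 13 overrides rules 10-12. Both the helper name `with_rule_13` and the country set are illustrative. A real deployment would need the complete, current EU/EEA membership list.

```python
from typing import FrozenSet

# Illustrative subset of EU/EEA country codes (not the complete list):
# DE, FR and IT are EU members; NO, IS and LI are EEA members outside the EU.
EU_EEA: FrozenSet[str] = frozenset({"DE", "FR", "IT", "NO", "IS", "LI"})


def with_rule_13(study_decision: str, member_country: str) -> str:
    """Apply rule 13 on top of the decision obtained from rules 10-12.

    Assumption: rule 13 is treated as stronger than rules 10-12 because it
    encodes a legal constraint; the text does not state this priority.
    """
    if member_country not in EU_EEA:
        return "denied"
    return study_decision


print(with_rule_13("permitted", "US"))  # denied
print(with_rule_13("permitted", "NO"))  # permitted
```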

Conclusion and future research directions
The purpose of this research was to extend the 'data manufacturing' concept of previous decades to a data-intensive environment that spans organisational, individual and country boundaries. In this environment, 'data products' are accessible to the different entities that hold approved rights over them. Using the data processing approach, we unfold the potential of data marketplaces, where various entities can create, share, sell, use or access tailored "data products". We also identify the emerging areas of interest arising within the context of "Big Data". Within this context, we argue that data can be processed and can create value through tailoring techniques across organisational boundaries, supported by DSAs and usage control rules. This holds both when data products/services are developed and when existing business models are disrupted to facilitate such a change.
Future research should consider that data nowadays are multiform and multi-source. New approaches to value extraction are therefore required: appropriate for each of these forms, creative enough for innovative industrial usage and extensive enough to process masses of heterogeneous data. Data processing and manufacturing approaches should be investigated further as a path to better data quality, along with new frameworks to describe and track data processing in different industrial applications. Besides data quality, data privacy is an emerging concern, as serious threats often arise when datasets are shared with third parties. Preventing such issues therefore opens a new research agenda around trust and shared responsibility among the actors and entities involved. Furthermore, techniques and methods for data collection, processing and storage form a progressively expanding research area. There is wide interest in how data can be generated and exploited, and therefore in the right tools and methods for doing so. Data generation and exploitation strategies can also focus on organisational aspects, as well as on the capabilities and skills firms should acquire to build innovation in data-intensive contexts. Moreover, future work should propose strategic ways of coupling multi-source data to create data products/services and new value streams for companies. It should also explore the capabilities, skills and innovative methods needed to handle and process the data.
Table 1. Studies with a focus on the data manufacturing process.

Study | Description | Key points
[66] | Analysis of data quality literature following the "data manufacturing" analogy as a framework for literature synthesis | Data quality issues; data aspects; data manufacturing/processing; data products
[1] | Review of the information manufacturing stages through a discussion raising issues of data quality | Extracting information from databases; information manufacturing/processing; data quality
[2] | Introduces the problem of "information product" quality, providing an assessment model of information quality (tracking the information attributes timeliness, accuracy and cost) | Information manufacturing/processing; information products; information manufacturing systems; information quality; data quality
[6] | Discusses the role of data quality in the effective use of information systems, examining the tools, concepts and techniques around data quality | Data reliability; data quality in programming and databases; data quality assessment
[13] | Literature review on the definition of "data" (5 approaches), expanding the discussion to data quality issues and introducing 4 dimensions (accuracy, completeness, consistency, currentness) | Data definition; data quality dimensions
[56] | Information-as-inventory approach involving a 3-stage process (raw materials, process, finished goods) | Information processing/manufacturing; input-process-output for information manufacturing; data as a key resource
[65] | Positions the "data quality" problem in a Total Data Quality Management (TDQM) perspective, proposing a methodology around TDQM | Information product; information product characteristics; information manufacturing; information quality assessment; information manufacturing systems
[66] | An attribute-based model for data quality assessment and requirements analysis, serving quality indicator identification | Data quality; data manufacturing; data management; data quality requirements
[67] | A framework for data quality from a data consumer perspective, organising data quality dimensions (intrinsic, contextual, representational, accessible); a survey approach is followed to test and refine the framework | Data manufacturing systems; data raw material; data products; data consumers
[58] | Conceptualises data quality in the context of the organisational processes, procedures and roles employed in collecting, processing, distributing and using data | Data quality; data from multiple sources; databases; information for decision making
[35] | Information quality assessment tool developed through surveys in 5 organisations | Information quality; information product; information quality assessment
[51] | Data quality assessment metrics, highlighting the distinction between "data" and "information" (information is presented as processed data) | Data and information; processed data (information as an outcome); data quality dimensions
[20] | Information quality benchmarks for product and service performance, introducing a measurement instrument for information quality dimensions | Information quality benchmarks; dimensions of information quality
[54] | Discussion of poor data quality and its impacts (operational, strategic, tactical) on enterprises and their customers in the "Information Age" | Information ecology; data quality issues; data accuracy
[9] | "Information Ecology" approach as a holistic view of the information environment (endogenous and exogenous), distinguishing "data", "information" and "knowledge" as distinct aspects | Data storage; databases mastering information complexity; technology as enabler of information improvement; data, information, knowledge
[55] | Methodologies for data quality programs in Information Age enterprises | Aspects of data management; management roles around data; data quality design
[37] | Overview of the data and information quality landscape, providing a framework for the categorisation of topics and methods | Data and information quality; data quality methods; data quality topics
[18] | Data quality in the development and delivery of products/services as a crucial quality factor; suggestions for a 4-part quality program | Dimensions of data quality; data tracking and quality control
[19] | Proposes control charting methods for data quality monitoring, stressing the data quality problem as crucial in data analytics in the 2010s | Data production process; data quality; data analytics
[38] | Decision-making based on integrated, high-quality information (systems theory in decision-making), highlighting themes for data warehousing decision-making (integration, implementation, intelligence, innovation) | Data warehousing challenges; input-process-output; data-based decision-making; aspects of data-based decision-making (integration, implementation, intelligence, innovation)
[16] | Ways of producing, organising and analysing data, presenting the data quality problem in terms of supply chain management (SCM) problems | Data quality; monitoring and controlling data quality

E. Karafili et al. / Computers in Industry 94 (2018) 52-61

(A_presc), at the instant of time T. Example 3. Bob can access Alice's prescriptions when he is inside the hospital. To ensure that Bob is inside the hospital, we use pos(Bob, Location), which gives Bob's location, and hospP(Hospital, Location), which gives the hospital's geographical location. We then use the function same($L_1$, $L_2$), which checks whether the two given locations $L_1$ and $L_2$ are the same or close enough to each other to be considered the same location.
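The same(L1, L2) check of Example 3 can be sketched in Python as a distance threshold. The sketch assumes that pos and hospP return latitude/longitude pairs and that "geographically nearby" means within a fixed radius. The 200 m default, the haversine distance and the helper `distance_m` are illustrative choices, not taken from the text.

```python
import math
from typing import Tuple

Location = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def distance_m(a: Location, b: Location) -> float:
    """Great-circle distance in metres between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def same(l1: Location, l2: Location, radius_m: float = 200.0) -> bool:
    """True if l1 and l2 are close enough to be considered the same location."""
    return distance_m(l1, l2) <= radius_m


hospital = (45.0, 9.0)
print(same(hospital, (45.001, 9.0)))  # True  (about 111 m away)
print(same(hospital, (45.01, 9.0)))   # False (about 1.1 km away)
```

The radius is a policy choice. A larger value tolerates GPS error but may count a doctor standing just outside the building as inside the hospital.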