Correct Audit Logging in Concurrent Systems

Audit logging provides post-facto analysis of runtime behavior for different purposes, including error detection, amelioration of system operations, and the establishment of security in depth. This necessitates some level of assurance on the quality of the generated audit logs.


Introduction
Reliable audit logging is essential to provide secure computation through the after-the-fact analysis of the audit log. Audit logging is used along with preventive security mechanisms to enable in-depth enforcement of security. In-depth enforcement of security refers to multiple layers of pre-execution, runtime, and post-execution techniques to ensure the legitimacy of the computation. Examples abound, e.g., a medical records system that enforces preventive measures including user authentication, static/dynamic information flow analysis to prevent leakage or corruption of data [1], and access authorization, e.g., to deny illegitimate access of certain users to certain medical data. Moreover, the system engages in the collection and a posteriori analysis of audit logs for different purposes, including the satisfaction of accountability goals, e.g., established by Health Insurance Portability and Accountability Act (HIPAA) [2,3], reinforcement of access control [4,5], etc.
Using audit logging along with preventive mechanisms has two major applications: 1) Post-execution analysis of audit logs provides a platform to detect security violations based on the logged evidence [6,7]. This class of logging policies relies on the notions of user accountability and deterrence. 2) Audit logging is used to detect existing vulnerabilities in the preventive security mechanisms and to ameliorate those mechanisms [8,9].
In both of the aforementioned applications, the effectiveness of in-depth security relies on the correctness and efficiency of the generated audit log and its after-the-fact analysis. Both requirements pose challenges for the generation of audit logs. A correct audit log must record factual information about the runtime behavior, which may be ensured by verifying auditing policies and their runtime enforcement mechanisms. Moreover, a correct audit log must include sufficient information according to what the auditing policy specifies. In addition, the efficiency of audit logging must be emphasized in in-depth security in order to improve system performance regarding the collection and analysis of audit logs. Efficient audit logging entails recording only the necessary information about the computation, rather than naively collecting all events in the log. These issues have been challenging in the wild, for instance resulting in failure to safeguard against data breaches, and are considered one of the top ten most critical security risks by the Open Web Application Security Project [10].
To establish a formal foundation for audit logging, a general semantics of audit logs has been defined [7,11] using the theory of information algebra [12]. This line of work helps to study whether the mechanism of enforcing an audit logging policy is correct and efficient on the basis of the proposed information-algebraic framework. Both program execution traces and audit logs are interpreted as information elements in this framework. In essence, the relation between the information in execution traces and the information in audit logs is formulated according to the established notion of information containment. An audit log is defined to be correct if it satisfies this relation. This formulation facilitates the separation of the specification of auditing requirements from programs, which is of great value in practice. This way, rather than manually inlining audit logging in the code, algorithms can be proposed that automatically instrument the code with audit logging capabilities. The semantic framework enables algorithms for implementing general classes of specifications for auditing and establishes conditions that guarantee the enforcement of those specifications by such algorithms.
The aforementioned line of work relies on a semantic framework for audit log generation whose implementation model is constrained to linear process executions. In practice, this limitation restricts the application of the framework to systems where a single thread of execution is involved in the generation of audit logs. For instance, in the case study of a medical records system [11], audit logging capability is considered as an extension to the web server program, and all preconditions for logging depend on the events that transpire in the same program execution thread. As an example, consider the breaking the glass event [13]. Breaking the glass is used in critical situations to bypass access control. By breaking the glass, system users increase their authority in the system in order to gain access to certain data, but simultaneously accept accountability for their actions. The breaking the glass event is a precondition for logging accesses to particular patient information. Instrumentation of the medical records web server guarantees correct audit logging as long as such events occur in the execution trace of the single-threaded web server. This eliminates the possibility of distributing authentication and authorization tasks to other concurrent components of the system. Such a restriction encourages us to study the semantics of audit logging in concurrent environments, which underlies correct instrumentation of multithreaded and multiprocess applications for auditing purposes. Indeed, real-world examples of inadequate logging and monitoring in concurrent and distributed systems, e.g., a recent security incident in a retailer's network of POS systems [14], demonstrate how crucial it is to ensure the correctness of audit logging mechanisms.
The proposed semantic framework needs to provide a mechanism to specify auditing requirements based on concurrent execution traces. Our framework needs to be general enough to encompass different audit log generation and representation approaches as its instances. The generality of information-theoretic models has already been shown in this realm [11]. We demonstrate that such a model can be used for concurrent systems. We use the model to interpret audit logs, specify audit logging requirements, and define correct enforcement of such requirements in concurrent systems. Similar to the previous work on linear process executions, correctness of a log in concurrent environments is conditioned on the specifications of auditing requirements, through the comparison between the information contained in the log and the information advertised by those specifications.
We instantiate our general model with a sufficiently expressive language in order to specify and enforce auditing requirements in concurrent environments. Horn clause logic is a proper language for this purpose due to straightforward modeling of execution traces as sets of facts, sufficient expressivity to specify auditing requirements and available logic programming implementations, e.g., [15,16].
A formal language model is used to specify and establish correct enforcement of audit logging policies in concurrent systems according to the developed framework. We use a variant of π-calculus [17] with unlabeled reduction semantics for this purpose. This formalism provides a model for developing tools with correctness guarantees. This model enjoys the following features.
• Concurrency: In order to specify multithreaded programs and multiprocessor systems, the language model supports concurrent process executions with interprocess communication (IPC) through message passing.
• Generality: While different process calculi are potential choices to describe our implementation model, we use a variant of π-calculus due to its sufficiently concise and high-level syntax and semantics to describe interactions among processes. This facilitates formulation of a wide range of concurrent systems.
• Timing: We need to be able to specify the ordering of interesting events for the sake of specifying auditing requirements. For example, in break the glass policies, access to particular patient information is logged as long as the glass is already broken. In order to implement such specifications, we need to apply a timing mechanism that is shared among all processes of the system. Each step of concurrent execution of processes updates this universal time.
• Named functions: To specify auditing requirements, a fundamental unit of secure operations is required. Functions can be considered as abstractions of these fundamental units in different languages and systems. Our language model supports named functions, in terms of sub-agents of each agent.
Using the formalism with aforementioned features enables us to model concurrent environments that guarantee the correct generation of audit logs according to the developed semantic framework. In this paper, we propose an instrumentation algorithm that receives a concurrent system as input and modifies the system according to a precise specification of audit logging requirements. We show that this algorithm is correct (based on the semantic framework), and hence the instrumented concurrent system generates correct audit logs. As implied earlier, enforcement of audit logging policies through code instrumentation separates policy from code, provides a foundation to study the effectiveness of enforcement mechanism using formal methods, and can be applied automatically to legacy code to enhance system accountability.
Since our model is based on process calculi, IPC is handled by message-passing. Modeling alternative IPC approaches for correct audit logging, e.g., shared memory and/or files, is considered as potential future work.
Case studies that benefit from the result of this work include deployment of correct logging capabilities in multiprocess and multithreaded client-server and peer-to-peer applications, microservices, etc. While this paper provides a prototype instrumentation algorithm in abstract settings, as future work we aim to deploy our existing instrumentation algorithm in Spring Boot [18], a Java microservices framework that facilitates code instrumentation through aspect-oriented programming [19].

Paper Outline
The rest of the paper is organized as follows. In Section 1.2, we discuss an illustrative example for audit logging in concurrent systems and in Section 1.3 we discuss the threat model. Section 2 reviews the information-algebraic semantics of audit logging and instantiation of the model with first-order predicate logic. Section 3 discusses the implementation model in detail. In particular, Section 3.1 introduces the syntax and semantics of the source system, a variant of π-calculus. In Section 3.2, we study a class of logging specifications that assert temporal relations among function invocations, potentially in different concurrent components of a system. Section 3.3 studies the syntax and semantics of the target system. In Section 3.4, we propose our instrumentation algorithm, along with the properties of interest that the algorithm satisfies. The proofs of these properties are given in our accompanying Technical Report [20]. In Section 4, related work is discussed. Finally, Section 5 concludes the paper and specifies future work.

An Example: Microservices-based Medical Records Systems
In this section, an oversimplified example is given that illustrates the application of audit logging in concurrent environments. We will revisit this example later in the paper (through Examples 3.1 and 3.2) to explain sample instantiations of our formal framework.
Many applications have been shifting their architecture from a traditional monolithic structure toward service-oriented architecture (SOA) in order to boost maintainability, continuous deployment and testing, adaptation to new technologies, system security, fault tolerance, etc. One popular deployment approach to SOA is where an application is decomposed into a set of highly collaborative processes, called microservices. A microservice must be minimal, independent, and fine-grained. Minimality constrains a microservice to access and manipulate certain data types within an application, ideally a single database per service. Microservice instances run independently in their own containers, virtual machines, or hosts. To accomplish its own goals, a microservice communicates with other microservices of the application through message passing or remote procedure calls (RPCs). Jolie [21] is a programming language for developing applications with a microservices architecture. Its formal semantics [22,23] is defined as a process calculus, inspired by π-calculus.
The need to better streamline healthcare services is pushing medical record systems toward microservices [24]. In fact, a recent study anticipates that microservices-based healthcare will experience a fivefold increase in market value within the next few years [25].
In what follows, we describe a simple example of a microservices-based medical records system, where audit logging for certain events is necessary, as dictated by the accountability requirements. Figure 1 depicts a medical records system with a microservices architecture that includes Authorization and Patient services (among others). The application front-end includes an API gateway that multiplexes user requests (from different clients, e.g., web and mobile applications) to certain microservices. The Patient microservice manages the information about patients, e.g., their medical history. The Authorization microservice handles different operations to authorize access to system data, including, e.g., breaking the glass.
As mentioned earlier, by breaking the glass, the user agrees to comply with accountability regulations. The common way to follow accountability regulations is to generate a trail of audit logs at runtime. One such audit logging requirement may be as follows: "Any attempt by a healthcare provider to read patient medical data must be stored in the log, if that provider has already broken the glass." This example demonstrates the core ideas that we are pursuing in this paper:
• The system consists of multiple concurrent components, e.g., a medical records system with different microservices including the ones described above.
• Audit logging requirements necessitate logging certain events in a concurrent component, provided that a set of other events have previously occurred in potentially other concurrent components, e.g., logging the event of reading medical history in the Patient microservice if the glass is already broken in the Authorization microservice.
• We investigate an algorithmic approach to establish correct audit logging for concurrent environments according to the already-established audit logging requirements. We expect correctness of audit logging in our medical records system example, for instance, to imply logging only the reading attempts by a user who has broken the glass. This avoids missing any logging event, as well as logging unnecessary events. We accomplish this by instrumenting the Authorization and Patient microservices, and in particular the operations of interest, i.e., breaking the glass and reading patient medical history.
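The quoted requirement can be illustrated with a minimal sketch. It assumes that every event carries a timestamp drawn from one shared ordering, and all names in it (`Event`, `required_log`, the service and action strings) are hypothetical and chosen only for illustration; it is not the instrumentation algorithm of this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: int          # position in a shared, global ordering of events (assumed)
    service: str       # hypothetical service name, e.g. "Authorization" or "Patient"
    action: str        # hypothetical action name: "break_glass" or "read_history"
    user: str
    patient: str = ""


def required_log(trace):
    """Return the read events that the quoted requirement asks to be logged."""
    glass_broken_by = set()
    log = []
    # Process events in the order given by the shared timestamps.
    for event in sorted(trace, key=lambda e: e.time):
        if event.action == "break_glass":
            glass_broken_by.add(event.user)
        elif event.action == "read_history" and event.user in glass_broken_by:
            log.append(event)
    return log


trace = [
    Event(1, "Patient", "read_history", "alice", "p1"),   # before breaking the glass
    Event(2, "Authorization", "break_glass", "alice"),
    Event(3, "Patient", "read_history", "alice", "p1"),   # must be logged
    Event(4, "Patient", "read_history", "bob", "p1"),     # bob never broke the glass
]
```

In this toy trace only the read at time 3 is required in the log: the read at time 1 precedes the breaking of the glass, and the read at time 4 is by a user who never broke it.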

Threat Model
We assume that the concurrent system subjected to instrumentation is not supporting audit logging in the first place, or is suffering from either insufficient or overzealous audit log generation. However, we assume that the concurrent system that is deployed according to our implementation model, passes both static and dynamic checks, e.g., syntactic checks, type checks, compilation, interpretation, etc. We trust the compiler/interpreter, and the runtime environment in which that system is being executed. Moreover, we trust the implementation of our instrumentation algorithm, and its compilation and/or interpretation, along with the runtime environment in which the instrumentation algorithm is executed. We also trust the integrity of logging specifications that assert audit logging requirements. Finally, we trust the compilation/interpretation process for the instrumented concurrent system that is deployed based on our implementation model, as well as the system's runtime environment. Security of the messages transmitted between concurrent components and the generated audit logs is considered to be out-of-scope. These assumptions help us to purposefully focus on the essence of logging, i.e., whether logs are generated correctly in the first place and independent of external concerns including reliability of the underlying execution and communication system, latency, etc. which are explored in related work (Section 4).

Semantics of Audit Logging
In order to provide a standalone formal presentation, in this section we review the information-algebraic semantics of audit logging and the instantiation of the semantic framework with first-order logic, originally proposed by Amir-Mohammadian et al. [11]. We have applied minor modifications to the model to better suit concurrency and the nondeterministic runtime behavior inherent to concurrent systems.

Information-Algebraic Semantic Framework
In order to specify how audit logs are generated at runtime, we need to abstract system states and their evolution through the computation. A system configuration $\kappa$ abstracts the state of the system at a given point during the execution. Let $\mathcal{K}$ denote the set of system configurations. We posit a binary reduction relation among configurations, i.e., $(\longrightarrow) \subseteq \mathcal{K} \times \mathcal{K}$, which denotes the computational steps and is used in the standard infix form. A system trace $\tau$ is a potentially infinite sequence of system configurations, i.e., $\tau = \kappa_0 \kappa_1 \cdots$, where $\kappa_i$ is the $i$th configuration in the sequence, and $\kappa_i \longrightarrow \kappa_{i+1}$. We denote the set of all traces by $\mathcal{T}$, and define $\mathit{prefix}(\tau)$ as the set of all prefixes of $\tau$.
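As a small illustration of these notions, the sketch below represents a finite trace as a tuple of configurations and the reduction relation as a Boolean function on pairs of configurations. The names `is_trace`, `prefixes`, and `step` are hypothetical, and treating prefixes as non-empty and including the trace itself is an assumption of the sketch.

```python
def is_trace(configs, step):
    """Check that consecutive configurations are related by the reduction relation."""
    return all(step(a, b) for a, b in zip(configs, configs[1:]))


def prefixes(trace):
    """All non-empty prefixes of a finite trace, including the trace itself."""
    return [tuple(trace[:n]) for n in range(1, len(trace) + 1)]


# Toy system: a configuration is a counter, and one step increments it by one.
def step(a, b):
    return b == a + 1


tau = (0, 1, 2)
```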
Information algebra is used to define the notion of correctness for audit logs. In Section 2.2, we instantiate this abstract algebraic structure to model a specific class of audit logging requirements. We define an information algebra in the following.
Definition 2.1 (Information algebra) An information algebra $(\Phi, \Psi)$ is a two-sorted algebra consisting of an Abelian semigroup of information elements, $\Phi$, as well as a lattice of querying domains, $\Psi$. Two fundamental operators are presumed in this algebra: a combination operator, $(\otimes) : \Phi \times \Phi \to \Phi$, and a focusing operator, $(\Rightarrow) : \Phi \times \Psi \to \Phi$. An information algebra $(\Phi, \Psi)$ satisfies a set of properties in connection with the combination and focusing operators. We let $X, Y, Z, \cdots$ range over elements of $\Phi$, and $E$ range over $\Psi$.
$X, Y \in \Phi$ are information elements that can be combined to make a more inclusive information element $X \otimes Y$. $E \in \Psi$ is a querying domain with a certain level of granularity that is used by the focusing operator to extract information from an information element $X$, denoted by $X^{\Rightarrow E}$. For example, relational algebra is an instance of information algebra, in which relations instantiate information elements, sets of attributes instantiate querying domains, natural join of two relations defines the combination operator, and projection of a relation on a set of attributes defines the focusing operator [12].
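The relational instance can be sketched in a few lines. The representation below (a relation as a frozen set of rows, each row a frozen set of attribute-value pairs) and the names `relation`, `natural_join`, and `project` are assumptions made for illustration only.

```python
def relation(rows):
    """Build a relation from a list of dicts mapping attribute names to values."""
    return frozenset(frozenset(row.items()) for row in rows)


def natural_join(r, s):
    """Combination: join rows that agree on all shared attributes."""
    joined = set()
    for a in r:
        for b in s:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                joined.add(frozenset({**da, **db}.items()))
    return frozenset(joined)


def project(r, attrs):
    """Focusing: keep only the attributes in the given querying domain."""
    return frozenset(frozenset((k, v) for k, v in row if k in attrs) for row in r)


visits = relation([{"patient": "p1", "doctor": "alice"},
                   {"patient": "p2", "doctor": "bob"}])
wards = relation([{"patient": "p1", "ward": "ICU"}])
joined = natural_join(visits, wards)
```

Here `joined` holds the single row for patient p1, and `project(joined, {"doctor"})` focuses it on the doctor attribute. The join is commutative, as the combination of an Abelian semigroup must be.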
Combination of information elements induces a partial order relation $(\preceq) \subseteq \Phi \times \Phi$ among information elements, defined as follows: $X \preceq Y$ iff $X \otimes Y = Y$. As part of the semantics of audit logging, we treat execution traces as information elements, i.e., the information content of the execution trace. To this end, we posit $\lfloor \cdot \rfloor : \mathcal{T} \to \Phi$ as a mapping in which, intuitively, $\lfloor \tau \rfloor$ refers to the information content of the trace $\tau$. We also impose the condition that $\lfloor \cdot \rfloor$ be injective and monotonically increasing, i.e., if $\tau' \in \mathit{prefix}(\tau)$ then $\lfloor \tau' \rfloor \preceq \lfloor \tau \rfloor$. This ensures that as the execution trace grows in length, it contains more information.
In the following definition, we define audit logging requirements in an abstract form. We call this abstraction a logging specification. This definition is abstract enough to encompass different execution models, as well as different representations of information. In Sections 2.2 and 3.2, we instantiate this definition with a more concrete structure that guides us on how to implement audit logging requirements.

Definition 2.2 (Logging specifications)
A logging specification $LS$ is defined as a mapping from system traces to information elements, i.e., $LS : \mathcal{T} \to \Phi$. Intuitively, $LS(\tau)$ declares what information must be logged, if the system follows the execution trace $\tau$.
Note that even though $\lfloor \cdot \rfloor$ and $LS$ have the same signature, i.e., both map traces to information elements, they are conceptually different. $\lfloor \tau \rfloor$ is the whole information contained in $\tau$, whereas $LS(\tau)$ is the information that is supposed to be recorded in the log, if the system follows the execution trace $\tau$.
We denote an audit log by $\mathbb{L}$, which represents a set of data gathered at runtime. Let $\mathcal{L}$ denote the set of audit logs. In order to judge the correctness of an audit log, the information content of the audit log needs to be studied in comparison to the information content of the trace that generates that audit log. To this end, we define a mapping that returns the information content of an audit log. We abuse the notation and consider $\lfloor \cdot \rfloor : \mathcal{L} \to \Phi$ as such a mapping. Therefore, $\lfloor \mathbb{L} \rfloor$ refers to the information content of the audit log $\mathbb{L}$. We assume that $\lfloor \cdot \rfloor$ on audit logs is injective and monotonically increasing, i.e., if $\mathbb{L} \subseteq \mathbb{L}'$ then $\lfloor \mathbb{L} \rfloor \preceq \lfloor \mathbb{L}' \rfloor$. Therefore, the more inclusive the audit log is, the more information it contains.
The notion of correct audit logging can be defined based on an execution trace and a logging specification. To this end, the information content of the audit log is compared to the information that the logging specification dictates to be recorded in the log, given the execution trace. The following definition captures this relation.

Definition 2.3 (Correctness of audit logs)
An audit log $\mathbb{L}$ is correct wrt a logging specification $LS$ and a system trace $\tau$ iff both $\lfloor \mathbb{L} \rfloor \preceq LS(\tau)$ and $LS(\tau) \preceq \lfloor \mathbb{L} \rfloor$ hold. The former refers to the necessity of the information in the audit log, and the latter refers to the sufficiency of that information.
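To make the two conditions concrete, the following sketch uses a deliberately simple instantiation that is an assumption of the example and not the FOL-based one used later: information elements are finite sets, combination is set union, and the order is the usual one induced by combination, $X \preceq Y$ iff $X \otimes Y = Y$. The function names are hypothetical.

```python
def combine(x, y):
    """Combination in the toy instantiation: set union."""
    return x | y


def leq(x, y):
    """Order induced by combination: x is below y iff combining them yields y."""
    return combine(x, y) == y


def check_log(log_info, spec_info):
    """Report necessity, sufficiency, and overall correctness of a log."""
    necessary = leq(log_info, spec_info)    # the log holds nothing beyond the specification
    sufficient = leq(spec_info, log_info)   # the log holds everything the specification asks for
    return {"necessary": necessary,
            "sufficient": sufficient,
            "correct": necessary and sufficient}


spec_info = frozenset({"read(alice, p1)"})
exact_log = frozenset({"read(alice, p1)"})
overzealous_log = frozenset({"read(alice, p1)", "read(bob, p1)"})
empty_log = frozenset()
```

In this instantiation, an overzealous log fails necessity, an empty log fails sufficiency, and only a log carrying exactly the specified information is correct.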
A system that generates audit logs at runtime includes the stored logs as part of its configuration. Let the mapping $\mathit{logof} : \mathcal{K} \to \mathcal{L}$ denote the residual log of a given system configuration, i.e., $\mathit{logof}(\kappa)$ is the set of all recorded audit logs in configuration $\kappa$. It is natural to assume that the residual log within configurations grows larger as the execution proceeds. The residual log of a trace is then defined using $\mathit{logof}$.

Definition 2.4 (Residual log of a system trace)
The residual log of a finite system trace $\tau$ is $\mathbb{L}$, denoted by $\tau \leadsto \mathbb{L}$, iff $\tau = \kappa_0 \kappa_1 \cdots \kappa_n$ and $\mathit{logof}(\kappa_n) = \mathbb{L}$.
Note that if $\tau \leadsto \mathbb{L}$, then $\mathbb{L}$ is not necessarily correct wrt a given $LS$ and the trace $\tau$. If the residual log of a trace is correct throughout the execution, then that trace is called ideally instrumented. A system trace $\tau$ is ideally instrumented for a logging specification $LS$ iff for any trace $\tau'$ and audit log $\mathbb{L}$, if $\tau' \in \mathit{prefix}(\tau)$ and $\tau' \leadsto \mathbb{L}$ then $\mathbb{L}$ is correct wrt $\tau'$ and $LS$. Indeed, audit logging is an enforceable security property on a trace of execution [11]. Given a logging specification, ideally instrumented traces induce a safety property [26], and hence are implementable by inlined reference monitors [27] and edit automata [28].
Let $s$ be a concurrent system with an operational semantics. We write $s \Downarrow \tau$ iff $s$ can produce a trace $\tau'$, either deterministically or non-deterministically, and $\tau \in \mathit{prefix}(\tau')$. We abuse the notation and use $\kappa \Downarrow \tau$ to denote the same concept for configuration $\kappa$. We follow program instrumentation techniques in order to enforce a logging specification on a system. An instrumentation algorithm receives the concurrent system as input along with the logging specification, and instruments the system with audit logging capabilities so that the instrumented system generates the required "appropriate" log. An instrumentation algorithm is a partial function $\mathcal{I} : (s, LS) \mapsto s'$ that instruments $s$ according to $LS$, aiming to generate audit logs appropriate for $LS$. We call $s$ the source system, and the instrumented system, i.e., $\mathcal{I}(s, LS) = s'$, the target system. Source and target traces refer to the traces of the source and target systems, resp.
It is natural to expect that the instrumentation algorithm would not modify the semantics of the original system drastically. The target system must behave roughly similarly to the source system, except for the operations related to audit logging. We call this attribute of an instrumentation algorithm semantics preservation, and define it in the following. This definition is abstract enough to encompass different source and target systems (with different runtime semantics), and instrumentation techniques. The abstraction relies on a binary relation $:\approx$, called the correspondence relation, that relates the source and target traces. Based on different implementations of the source and target systems, and the instrumentation algorithm, the correspondence relation can be defined accordingly.
Another related property of the instrumentation algorithm is deadlock-freeness, meaning that instrumenting a system does not introduce new stuck states. One approach to define an independent notion of deadlock-freeness is to consider bisimilar source and target traces. Indeed, additional formal constructs must be introduced to translate target traces to source traces for this purpose. Our definition of deadlock-freeness, however, heavily relies on the notion of semantics preservation, and is not a required component in the definition of instrumentation correctness (Definition 2.7). Let source system $s$ generate trace $\tau$, and $\mathcal{I}(s, LS)$ generate trace $\tau'$ such that $\tau :\approx \tau'$. Then, we say that $\mathcal{I}(s, LS)$ is stuck if $s$ can continue execution following $\tau$ (at least for one extra step), while $\mathcal{I}(s, LS)$ cannot continue execution following $\tau'$.

Definition 2.6 (Deadlock-freeness of the instrumentation algorithm)
Instrumentation algorithm $\mathcal{I}$ is deadlock-free iff for any source system $s$, logging specification $LS$, traces $\tau$ and $\tau'$, and configuration $\kappa$, if $s \Downarrow \tau$, $\mathcal{I}(s, LS) \Downarrow \tau'$, $\tau :\approx \tau'$, and $s \Downarrow \tau\kappa$, then there exists some configuration $\kappa'$ such that $\mathcal{I}(s, LS) \Downarrow \tau'\kappa'$.
Besides these properties, another important feature of an instrumentation algorithm is the quality of audit logs generated by the instrumented system. The information-algebraic semantic framework provides a platform to define correct instrumentation algorithms for audit logging purposes. Let $s$ be a target system, and $\tau$ be a source trace. The simulated logs of $\tau$ by $s$ form the set $\mathit{simlogs}(s, \tau) = \{\mathbb{L} \mid \exists \tau'.\ s \Downarrow \tau' \wedge \tau :\approx \tau' \wedge \tau' \leadsto \mathbb{L}\}$. Using this set, we can define correctness of instrumentation algorithms in a straightforward manner. Intuitively, the instrumentation algorithm $\mathcal{I}$ is correct if the instrumented system generates audit logs that are correct wrt the logging specification and the source trace. This must hold for any source system, any logging specification, and any possible log generated by the instrumented system.

Definition 2.7 (Correctness of the instrumentation algorithm)
Instrumentation algorithm $\mathcal{I}$ is correct iff for all source systems $s$, traces $\tau$, and logging specifications $LS$, $s \Downarrow \tau$ implies that for any $\mathbb{L} \in \mathit{simlogs}(\mathcal{I}(s, LS), \tau)$, $\mathbb{L}$ is correct wrt $LS$ and $\tau$.
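The following toy sketch shows how simulated logs and the correctness condition fit together on a finite example. Every modelling choice is an assumption made for illustration: a target configuration is a pair of an event and the log stored so far, the correspondence relation holds when erasing the logs from a target trace yields the source trace, information elements are plain sets, and the logging specification asks for every event whose name starts with "read". The sketch checks only an explicitly given set of target traces, whereas Definition 2.7 quantifies over all source systems, traces, and specifications.

```python
def residual_log(target_trace):
    """Log stored in the last configuration of a finite target trace."""
    return target_trace[-1][1]


def corresponds(source_trace, target_trace):
    """Assumed correspondence: erasing the logs yields the source trace."""
    return tuple(event for event, _ in target_trace) == tuple(source_trace)


def simlogs(target_traces, source_trace):
    """Residual logs of all given target traces that correspond to the source trace."""
    return [residual_log(t) for t in target_traces if corresponds(source_trace, t)]


def spec(source_trace):
    """Toy logging specification: every read event must be logged."""
    return frozenset(e for e in source_trace if e.startswith("read"))


def correct_on(source_trace, target_traces):
    """Check that every simulated log equals the specified information."""
    required = spec(source_trace)
    return all(log == required for log in simlogs(target_traces, source_trace))


source = ("break", "read:p1")
good = (("break", frozenset()), ("read:p1", frozenset({"read:p1"})))
overzealous = (("break", frozenset({"break"})),
               ("read:p1", frozenset({"break", "read:p1"})))
```

With sets as information elements, the two-sided order condition of Definition 2.3 collapses to equality, which is why `correct_on` compares each simulated log with the specified set directly.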

Instantiation of Logging Specification
In Definition 2.2, a logging specification is defined abstractly as a mapping from system traces to information elements. For a more concrete setting, this definition needs to be instantiated with appropriate structures in a way that is useful in the deployment of audit logging. In essence, we need to instantiate information algebra (Definition 2.1). We are interested in logical specification of audit logging requirements due to its ease of use, expressive power, well-understood semantics, and off-the-shelf logic programming engines for subsets of first-order logic (FOL), e.g., Horn clause logic. To this end, in this section, we instantiate information algebra with FOL, which is expressive enough to specify computational events and the temporal relation among them.
Indeed, other variants of logic may also be considered for this purpose.
In order to instantiate information algebra, it is required to specify the contents of the set of information elements Φ and the lattice of querying domains Ψ, along with the definitions of combination and focusing operators. Definitions 2.8, 2.9, and 2.10 accomplish these instantiations. Definition 2.8 instantiates an FOL-based set of information elements. An information element in our instantiation is a closed set of FOL formulas, under a proof-theoretic deductive system.  Lastly, Definition 2.10 instantiates the combination and focusing operators for the FOL-based information algebra. Combination is the closure of the union of two sets of formulas. Focusing is the closure of the intersection of an information element and a query domain.
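The two operators described above can be written compactly as follows. This is only a transcription of the prose under an assumed notation: $C(\cdot)$ stands for closure under the deductive system, $X, Y$ are information elements, and $E$ is a querying domain.

```latex
\[
  X \otimes Y \;=\; C(X \cup Y),
  \qquad
  X^{\Rightarrow E} \;=\; C(X \cap E).
\]
```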

Definition 2.10 (Combination and focusing in $(\Phi_{FOL}, \Psi_{FOL})$)
$(\Phi_{FOL}, \Psi_{FOL})$ is an information algebra, given Definitions 2.8, 2.9, and 2.10. In order to use $(\Phi_{FOL}, \Psi_{FOL})$ as a framework for audit logging, we also need to instantiate the mapping $\lfloor \cdot \rfloor$, introduced in Section 2.1, to interpret both execution traces and audit logs as information elements. Now we can instantiate the logging specification $LS$ in the information algebra $(\Phi_{FOL}, \Psi_{FOL})$. To this end, a set of audit logging rules and definitions is assumed to be given in FOL. Let $\Gamma$ be this set. Moreover, a set of predicate symbols is assumed that reflects the predicates whose derivation needs to be logged at runtime. This set is denoted by $S$. A logging specification in this setting receives a trace $\tau$, combines the information content of $\tau$ with the closure of $\Gamma$, and then focuses on the predicates specified in $S$. Intuitively, given $\Gamma$ and $S$, a logging specification maps a trace $\tau$ to the set of all predicates whose symbols are in $S$ and that are derivable given the rules in $\Gamma$ and the events in $\tau$.
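The intuition in the last sentence can be sketched with a naive forward-chaining evaluator for Horn clauses. This is a simplification under stated assumptions: it derives only ground atoms rather than a full deductive closure, facts and atoms are tuples whose first component is the predicate symbol, variables are strings starting with an uppercase letter, and each rule carries a Python guard for arithmetic side conditions such as timestamp comparison. The predicate names (`read`, `break_glass`, `logged_read`) and function names are hypothetical.

```python
def is_var(term):
    """Variables are strings that start with an uppercase letter (assumed convention)."""
    return isinstance(term, str) and term[:1].isupper()


def match(pattern, fact, env):
    """Extend env so that pattern equals fact; return None if that is impossible."""
    if pattern[0] != fact[0] or len(pattern) != len(fact):
        return None
    env = dict(env)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env


def closure(facts, rules):
    """Naive forward chaining to a fixpoint; rules are (head, body, guard) triples."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body, guard in rules:
            envs = [{}]
            for atom in body:
                next_envs = []
                for env in envs:
                    for fact in known:
                        extended = match(atom, fact, env)
                        if extended is not None:
                            next_envs.append(extended)
                envs = next_envs
            for env in envs:
                if guard(env):
                    derived = (head[0],) + tuple(env.get(t, t) for t in head[1:])
                    if derived not in known:
                        known.add(derived)
                        changed = True
    return known


def logging_spec(trace_facts, rules, logged_predicates):
    """Derive everything from the trace and the rules, then keep the logged predicates."""
    return {f for f in closure(trace_facts, rules) if f[0] in logged_predicates}


# Rule: logged_read(T, U, P) holds if read(T, U, P) and break_glass(T0, U) with T0 < T.
rules = [(("logged_read", "T", "U", "P"),
          [("read", "T", "U", "P"), ("break_glass", "T0", "U")],
          lambda env: env["T0"] < env["T"])]
facts = {("break_glass", 2, "alice"), ("read", 1, "alice", "p1"),
         ("read", 3, "alice", "p1"), ("read", 4, "bob", "p1")}
```

In this sketch `rules` plays the role of $\Gamma$, `facts` the information content of the trace, and the set `{"logged_read"}` the role of $S$.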

Implementation Model on Concurrent Systems
In this section, we propose an implementation model for correct audit logging in concurrent systems. To this end, we use a variant of π-calculus to specify the concurrent system, and propose an instrumentation algorithm that retrofits the system according to a given logging specification. We then specify and prove the properties of interest, including the correctness of the instrumentation algorithm (Definition 2.7). In Section 3.1, the syntax and semantics of the source system model is introduced. Section 3.2 proposes a class of logging specifications that can specify temporal relations among computational events in concurrent systems. Section 3.3 describes the syntax and semantics of the systems enhanced with audit logging capabilities. Lastly, in Section 3.4, we discuss the instrumentation algorithm and the properties it satisfies.

Source System Model
We consider a core π-calculus as our source concurrent system model, denoted by Π. One major distinguishing feature of π-calculus is modeling mobile processes using the same category of names for both links and transferable objects, along with scope extrusion. However, mobility is not used in our implementation model. Therefore other seminal process calculi e.g., CSP [29] and CCS [30] can also be considered for this purpose. We employ π-calculus due to its concise syntax and simple semantics that provides a clean and sufficiently abstract specification of the required interactions among concurrent components of the system. The syntax and semantics of the source system are defined in the following. It is based on the representation of the calculus given in [31] which deviates from standard π-calculus by dropping silent prefixes, unguarded summations and labeled reduction system, for the sake of simplicity and conciseness.

Syntax
Let $\mathcal{N}$ be the infinite denumerable set of names, and $a, b, c, \cdots$ and $x, y, z, \cdots$ range over them.
Prefixes Prefixes $\alpha$ are defined as $\alpha ::= a(x) \mid \bar{a}x$. Prefix $a(x)$ is the input prefix, used to receive some name with placeholder $x$ on link $a$. Prefix $\bar{a}x$ is the output prefix, used to output name $x$ on link $a$.
Agent names and processes Let $A, B, C, D$ range over agent names, and $\mathcal{A}$ be the finite set of such names. Processes $P$ are defined as: $P ::= 0 \mid \alpha.P \mid (P \mid P) \mid (\nu x)P \mid C(y_1, \cdots, y_n)$. $0$ refers to the nil process. $\alpha.P$ provides a sequence of operations in the process; first the input/output prefix $\alpha$ takes place, and then $P$ executes. $P \mid P$ provides parallelism in the system. $(\nu x)P$ restricts (binds) name $x$ within $P$. $C(y_1, \cdots, y_n)$ refers to the invocation of an agent $C$ with parameters $y_1, \cdots, y_n$. Let $P, Q, R$ range over processes.
Free and bound names Name restriction and input prefix bind names in a process. We denote the set of free names in process P with fn(P ). α-conversion for bound names is defined in the standard way.
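The grammar of processes and the notion of free names can be sketched as a small abstract syntax tree. This is only an illustration under assumptions: the class names (`Nil`, `In`, `Out`, `Par`, `New`, `Call`) and the function `fn` are hypothetical encodings chosen here, and α-conversion is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Nil:                 # 0
    pass


@dataclass(frozen=True)
class In:                  # a(x).P  (binds x in P)
    link: str
    var: str
    cont: "Proc"


@dataclass(frozen=True)
class Out:                 # output of name x on link a, then P
    link: str
    name: str
    cont: "Proc"


@dataclass(frozen=True)
class Par:                 # P | Q
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class New:                 # (nu x)P  (binds x in P)
    name: str
    body: "Proc"


@dataclass(frozen=True)
class Call:                # C(y1, ..., yn)
    agent: str
    args: Tuple[str, ...]


Proc = Union[Nil, In, Out, Par, New, Call]


def fn(p: Proc) -> FrozenSet[str]:
    """Free names: input prefixes and restrictions bind, everything else is free."""
    if isinstance(p, Nil):
        return frozenset()
    if isinstance(p, In):
        return frozenset({p.link}) | (fn(p.cont) - {p.var})
    if isinstance(p, Out):
        return frozenset({p.link, p.name}) | fn(p.cont)
    if isinstance(p, Par):
        return fn(p.left) | fn(p.right)
    if isinstance(p, New):
        return fn(p.body) - {p.name}
    return frozenset(p.args)   # agent invocation: its parameters are free


# a(x).(nu b)(output b on x).0  |  (output b on a).0
example = Par(In("a", "x", New("b", Out("x", "b", Nil()))), Out("a", "b", Nil()))
```

In `example`, the name `b` is bound in the left component but free in the right one, so it is free in the parallel composition.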
Notational conventions A sequence of names is denoted by $\tilde{a}$, i.e., $a_1, \cdots, a_l$ for some $l$. A sequence of name restrictions in a process $(\nu a_1)(\nu a_2) \cdots (\nu a_l)P$ is shown by $(\nu a_1 a_2 \cdots a_l)P$, or in short $(\nu\tilde{a})P$. We skip specifying the input name if it is not free in the following process, i.e., $a.P$ refers to $a(x).P$ where $x \notin fn(P)$. $\bar{a}.P$ refers to outputting a value on link $a$ that can be elided, e.g., due to lack of relevance in the discussion.
Codebases Agent definitions are of the form $A(x_1, \cdots, x_n) \triangleq P$. We denote the set of agent and sub-agent definitions by $\mathcal{D}$. We assume the existence of a universal codebase $C_U$ consisting of agent definitions of this form. This codebase is used to define top-level agents. A top-level agent corresponds to a concurrent component of the system. Top-level agents are supposed to execute in parallel and occasionally communicate with each other to accomplish their own tasks and, in aggregate, the task of the concurrent system. Let $\mathcal{A}_U$ be the set of top-level agent names such that $\mathcal{A}_U \subset \mathcal{A}$. Throughout the paper we let $m$ be the size of $\mathcal{A}_U$, which consists of $A_1, \cdots, A_m$. $C_U$ is defined as a function from top-level agent names to their definitions, i.e., $C_U : \mathcal{A}_U \to \mathcal{D}$.
Moreover, we assume the existence of a local codebase for each top-level agent, denoted by $C_L(A)$ for top-level agent $A$. A local codebase consists of sub-agent (subprocess) definitions of the form $B_A(x_1, \cdots, x_n) \triangleq P$, where $B$ is a sub-agent identifier, and $A$ is a top-level agent identifier annotated in the definition of $B$. We treat sub-agents as internal modules or functions of a top-level process. Annotation of the top-level agent identifier is used for this purpose, i.e., $B_A$ specifies that $B$ is a module of top-level agent $A$. The set of sub-agent names is denoted by $\mathcal{A}_L$, defined as $\mathcal{A} - \mathcal{A}_U$. $C_L$ is defined as the function with signature $C_L : \mathcal{A}_U \to \mathcal{A}_L \to \mathcal{D}$.
Note that any process and subprocess definition can be recursive, e.g., if $C_U(A) = [A(x_1, \cdots, x_n) \triangleq P]$ then $A(y_1, \cdots, y_n)$ may appear in $P$. In the following, we use $A$ and $B$ to range over top-level agent identifiers and sub-agent identifiers, resp. We use $C$ to range over both top-level agents, $A$, and sub-agents, $B_A$. In the rest of the paper, we refer to top-level agents simply as "agents". We assume that in any (sub)process definition $C(x_1, \cdots, x_n) \triangleq P$, we have $fn(P) \subseteq \{x_1, \cdots, x_n\}$. This ensures that (sub)processes are closed.
Initial system Let $\mathcal{A}_U = \{A_1, \cdots, A_m\}$. We posit a sequence of links $\tilde{c}$ that connect these agents in the system. Then the initial concurrent system $s$ is defined as

$$s = \langle P_s, C_U, C_L \rangle, \qquad (1)$$

where $P_s = (\nu\tilde{c})(A_1(\tilde{x}_1) \mid A_2(\tilde{x}_2) \mid \cdots \mid A_m(\tilde{x}_m))$, assuming $fn(P_s) = \emptyset$, i.e., $\bigcup_i \tilde{x}_i \subseteq \tilde{c}$.
Configurations We define system configurations as $\kappa ::= P$, where $P$ is the process associated with the whole system. The initial configuration is then defined as $\kappa_0 = P_s$.
Substitutions A substitution is a function $\sigma : \mathcal{N} \to \mathcal{N}$. The notation $\{y/x\}$ is used to refer to a substitution that maps $x$ to $y$, and acts as the identity function otherwise. $\{\tilde{y}/\tilde{x}\}$ is used to denote multiple explicit mappings in a substitution, where $\tilde{x}$ and $\tilde{y}$ are equal in length. $P\sigma$ refers to replacing free names in $P$ according to $\sigma$. This is accompanied by renaming of bound names in $P$ to avoid name clashes.

Semantics
In the following, we define evaluation contexts and the structural congruence between processes. These definitions facilitate the specification of unlabeled operational semantics in a concise manner.
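Evaluation contexts A context is a process with a hole. An evaluation context $E$ is a context whose hole is not under an input/output prefix, i.e., $E ::= [\,] \mid (E \mid P) \mid (P \mid E) \mid (\nu x)E$.
Structural congruence Two processes $P$ and $Q$ are structurally congruent under the universal codebase $C_U$, denoted by $C_U \vdash P \equiv Q$, according to the following rules.
(i) Structural congruence is an equivalence relation.
(ii) Structural congruence is closed under the application of $E$, i.e., if $C_U \vdash P \equiv Q$, then $C_U \vdash E[P] \equiv E[Q]$.
(iii) If $P$ and $Q$ are α-convertible, then $C_U \vdash P \equiv Q$.
(iv) The set of processes is an Abelian monoid under the $\mid$ operator with unit element $0$, i.e., for any $C_U$, $P$, $Q$, and $R$, we have $C_U \vdash P \mid 0 \equiv P$, $C_U \vdash P \mid Q \equiv Q \mid P$, and $C_U \vdash P \mid (Q \mid R) \equiv (P \mid Q) \mid R$.
(v) An agent invocation is structurally congruent to its definition, i.e., if $C_U(A) = [A(\tilde{x}) \triangleq P]$, then $C_U \vdash A(\tilde{y}) \equiv P\{\tilde{y}/\tilde{x}\}$.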
We may elide C U in the specification of the structural congruence, if it is clear from the context.
Operational semantics We define the unlabeled reduction system in Figure 2, using the judgment $C_U, C_L \vdash \kappa \longrightarrow \kappa'$. We may elide $C_U$ and $C_L$ in the specification of reduction steps, since they are static and may be clear from the context, i.e., we write $\kappa \longrightarrow \kappa'$.
Note that according to structural congruence rules an agent invocation is structurally congruent to its definition (part v), and thus considered as an "implicit" step of execution according to rule STRUCT. Contrarily, rule CALL defines an "explicit" reduction step for sub-agent invocations. This is due to some technicality in our modeling: invocation of sub-agents could be logging preconditions and/or logging events (introduced in Section 3.2), and hence need special semantic treatment at the time of call (discussed later in Sections 3.3 and 3.4), e.g., deciding whether a record must be stored in the log.
For a (potentially infinite) system trace $\tau = \kappa_0 \kappa_1 \cdots$, we use the notation $C_U, C_L \vdash \tau$ to specify the generation of trace $\tau$ under the universal and local codebases $C_U$ and $C_L$, and according to the aforementioned unlabeled reduction system, i.e., $C_U, C_L \vdash \kappa_i \longrightarrow \kappa_{i+1}$ for all $i \in \{0, 1, \cdots\}$.
For a system trace $\tau = \kappa_0 \kappa_1 \cdots$, system $s$ generates $\tau$, denoted by $s \Downarrow \tau$, iff $s$ is defined as in (1), $\kappa_0$ is defined as $P_s$, and $C_U, C_L \vdash \kappa_i \longrightarrow \kappa_{i+1}$ for all $i \in \{0, 1, \cdots\}$.
toFOL(·) instantiation for traces In order to specify a trace logically, we need to instantiate toFOL(·) according to Definition 2.11. We consider the following predicates to logically specify a trace: Comm/3, Call/4, Context/2, UniversalCB/3, and LocalCB/4. Let $C_U, C_L \vdash \tau$, and $\tau = \kappa_0 \cdots \kappa \kappa' \cdots$. Moreover, let $t$ denote a timing counter. We define a function that logically specifies a configuration within a trace. To this end, let the helper function toFOL($\kappa$, $t$) return the logical specification of $\kappa$ at time $t$. Essentially, toFOL($\kappa$, $t$) specifies what the evaluation context and the redex are within $\kappa$ at time $t$. In Call($t$, $A$, $B$, $\tilde{y}$), we treat $\tilde{y}$ as a single list of elements, rather than a sequence of elements passed as parameters to Call, i.e., Call is always a quaternary predicate.
As an example, consider α-converted structurally equivalent processes. Let $\kappa = a(x).(\nu b)\bar{x}b.0 \mid \bar{a}b.0$. Since $\kappa \equiv \kappa' = a(x).(\nu d)\bar{x}d.0 \mid \bar{a}b.0$, and $\kappa' \longrightarrow (\nu d)\bar{b}d.0 \mid 0$, we have $\kappa \longrightarrow (\nu d)\bar{b}d.0 \mid 0$ according to the rule STRUCT in Figure 2.
The logical specifications of the universal and local codebases $C_U$ and $C_L$ are given by the predicates UniversalCB and LocalCB, resp. Note that in UniversalCB($A$, $\tilde{x}$, $P$) and LocalCB($A$, $B$, $\tilde{x}$, $P$), $\tilde{x}$ is a single list of elements, rather than a sequence of elements passed as parameters to the predicates, and thus these predicates have fixed arities. We define the logical specification of traces both for finite and infinite cases according to the logical specification of configurations and of the universal and local codebases, i.e., using toFOL($\kappa$, $t$) and the logical specifications of $C_U$ and $C_L$. Let $C_U, C_L \vdash \tau$. If $\tau$ is finite, i.e., $\tau = \kappa_0 \kappa_1 \cdots \kappa_n$ for some $n$, then its logical specification toFOL($\tau$) is obtained by combining the logical specifications of the configurations $\kappa_0, \cdots, \kappa_n$ with those of the codebases. It is straightforward to show that toFOL($\tau$) is injective and monotonically increasing.

A Class of Logging Specifications
We define the class of logging specifications $\mathcal{LS}_{call}$ that specify temporal relations among module invocations in concurrent systems. $\mathcal{LS}_{call}$ is the set of all logging specifications $LS$ defined as $spec(\Gamma_G, \{LoggedCall\})$, where $\Gamma_G$ is a set of Horn clauses, called guidelines, including clauses of the form

$$\forall t_0, \cdots, t_n, xs_0, \cdots, xs_n.\ \bigwedge_{j=0}^{n} Call(t_j, A_j, B_j, xs_j) \wedge \varphi(t_0, \cdots, t_n) \implies LoggedCall(A_0, B_0, xs_0), \qquad (2)$$

in which for all $j \in \{0, \cdots, n\}$, $A_j \in \mathcal{A}_U$, $B_j \in \mathcal{A}_L$, $xs_j$ is a placeholder for a list of parameters passed to $B_j$, and $Call(t_j, A_j, B_j, xs_j)$ specifies the event of invoking module (subprocess) $B_j$ by the top-level process $A_j$ at time $t_j$ with parameters $xs_j$. In (2), $\varphi(t_0, \cdots, t_n)$ is assumed to be a possibly empty conjunctive sequence of literals of the form $t_i < t_j$. Moreover, we define triggers and logging events as $Triggers(LS) = \{(A_i, B_i) \mid i \in \{1, \cdots, n\}\}$ and $Logevent(LS) = (A_0, B_0)$. As an additional condition, we assume that $Logevent(LS) \notin Triggers(LS)$. Example 3.1 describes the logging specification for the break-the-glass policy specified in Section 1.2 for a medical records system, using (2).
Example 3.1 We revisit the example described in Section 1.2, a microservices-based medical records system, where breaking the glass entails logging the attempts to read patient medical history. Each microservice is treated as an agent, i.e., agents Patient and Auth correspond to the Patient and Authorization microservices, resp. Let us assume that a user can break the glass by invoking the brkGlass function from the Auth agent. Moreover, reading patient medical history is accomplished by calling the function getMedHist deployed by the Patient agent. These functions are treated as sub-agents in our calculus, i.e., we assume the existence of definitions:
• $C_L(Auth)(brkGlass) = [brkGlass_{Auth}(u) \triangleq P]$ for some process $P$, and
• $C_L(Patient)(getMedHist) = [getMedHist_{Patient}(p, u) \triangleq Q]$ for some process $Q$.
Then, the logging specification for the break-the-glass auditing policy is $LS = spec(\Gamma_G, \{LoggedCall\})$, where $\Gamma_G$ includes the clause

$$\forall t_0, t_1, p, u.\ Call(t_0, Patient, getMedHist, [p, u]) \wedge Call(t_1, Auth, brkGlass, [u]) \wedge t_1 < t_0 \implies LoggedCall(Patient, getMedHist, [p, u]).$$

In the clause above, $t_0$ and $t_1$ are timestamps where $t_1$ precedes $t_0$. $p$ refers to the identifier of the patient whose medical history is requested. $u$ is the identifier of the healthcare provider (user) who breaks the glass and then attempts to read the medical history of patient $p$. Note that $u$ is passed as an additional parameter to both brkGlass and getMedHist. In practice, this is accomplished in microservices using access tokens. An API gateway uses access tokens to communicate the identity of the service requester. One common approach to implementing access tokens is the JSON Web Tokens standard [32].
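To make the break-the-glass guideline of Example 3.1 concrete, the following sketch checks it directly over a set of Call facts. It is a hand-written evaluation of this one clause rather than a general Horn-clause engine, and the names `CallFact`, `logged_calls`, and the sample identifiers (`alice`, `bob`, `p7`) are hypothetical.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class CallFact(NamedTuple):
    t: int                    # timestamp of the invocation
    agent: str                # top-level agent, e.g. "Auth" or "Patient"
    sub: str                  # invoked sub-agent, e.g. "brkGlass"
    args: Tuple[str, ...]     # parameter list, treated as a single value


def logged_calls(calls: Iterable[CallFact]) -> Set[Tuple[str, str, Tuple[str, ...]]]:
    """Return LoggedCall(Patient, getMedHist, [p, u]) facts derivable from the calls."""
    calls = list(calls)
    logged = set()
    for ev in calls:
        if (ev.agent, ev.sub) != ("Patient", "getMedHist") or len(ev.args) != 2:
            continue
        _, u = ev.args
        for trig in calls:
            # Trigger: the same user u broke the glass strictly earlier (t1 < t0).
            if (trig.agent, trig.sub, trig.args) == ("Auth", "brkGlass", (u,)) and trig.t < ev.t:
                logged.add((ev.agent, ev.sub, ev.args))
                break
    return logged


calls = [
    CallFact(1, "Auth", "brkGlass", ("alice",)),
    CallFact(3, "Patient", "getMedHist", ("p7", "alice")),
    CallFact(4, "Patient", "getMedHist", ("p7", "bob")),   # bob never broke the glass
]
```

Only the read by `alice` is logged here, since it is the only getMedHist call preceded by a brkGlass call of the same user.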

Target System Model
We define the target system model, denoted by Π log , as an extension to Π with the following syntax and semantics. The instrumentation algorithm's job is to map a system specified in Π to a system in Π log .
$\tilde{x}$ is considered as a single list of names in callEvent and emit, so that they have fixed arities.

Figure 3: Unlabeled reduction semantics of $\Pi_{log}$. The conclusions of the rules for the extended prefixes are:
CALL EV: $(t, callEvent(A, B, \tilde{x}).P, \Delta, \Sigma, \Lambda) \longrightarrow (t + 1, P, \Delta', \Sigma, \Lambda)$
ADD PRECOND: $(t, addPrecond(x, A).P, \Delta, \Sigma, \Lambda) \longrightarrow (t + 1, P, \Delta, \Sigma', \Lambda)$
SEND PRECOND: $(t, sendPrecond(x, A).P, \Delta, \Sigma, \Lambda) \longrightarrow (t + 1, \bar{x}y.P, \Delta, \Sigma, \Lambda)$
LOG: $(t, emit(A, B, \tilde{x}).P, \Delta, \Sigma, \Lambda) \longrightarrow (t + 1, P, \Delta, \Sigma, \Lambda')$
NO LOG: $(t, emit(A, B, \tilde{x}).P, \Delta, \Sigma, \Lambda) \longrightarrow (t + 1, P, \Delta, \Sigma, \Lambda)$

A configuration $\kappa$ in $\Pi_{log}$ is defined as the quintuple $\kappa ::= (t, P, \Delta, \Sigma, \Lambda)$, with the following details. $t$ is a timing counter. $P$ is the process associated with the whole concurrent system. Processes in $\Pi_{log}$ are defined similarly to $\Pi$, without any extensions. $\Delta(\cdot)$ is a mapping that receives an agent identifier $A$ and returns the set of logical preconditions (to log) that denote the events that transpired locally in that agent. That is, $\Delta(A)$ is a set of predicates of the form $Call(t, A, B, \tilde{x})$. $\Sigma(\cdot)$ is a mapping that receives an agent identifier $A$ and returns the set of all logical preconditions that have taken place in the triggers, i.e., in all agents $A' \in \mathcal{A}_U$, where $(A', B) \in Triggers$ for some $B \in \mathcal{A}_L$. That is, $\Sigma(A)$ is a set of predicates of the form $Call(t, A', B, \tilde{x})$, where $(A', B) \in Triggers$. These preconditions are supposed to be gathered by $A$ from other agents $A'$, in order to decide whether to log an event. $\Lambda(\cdot)$ is a mapping that receives an agent identifier $A$ and returns the audit log recorded by that agent. $\Lambda(A)$ is a set of predicates of the form $LoggedCall(A, B, \tilde{x})$. The initial configuration

Semantics
We use the judgment $C_U, C_L, \Gamma_G \vdash \kappa \longrightarrow \kappa'$ to specify a step of reduction in $\Pi_{log}$. Figure 3 depicts the unlabeled reduction semantics of $\Pi_{log}$. $C_U$, $C_L$, and $\Gamma_G$ may be elided in the specification of reduction steps since they are static and may be clear from the context. $\Pi_{log}$ inherits the reduction semantics of $\Pi$, according to rule PI. Rule CALL EV gives the reduction with prefix $callEvent(A, B, \tilde{x})$. In this case, $\Delta$ gets updated for agent $A$ with information about the invocation of subprocess $B_A$. Rule ADD PRECOND specifies the reduction with the prefix $addPrecond(x, A)$. In this case, $x$ is added to $\Sigma(A)$. Rule SEND PRECOND is about the reduction with prefix $sendPrecond(x, A)$. In this case, the set of logging preconditions that are collected by $A$, i.e., $\Delta(A)$, is converted to a transferable object (aka object serialization), e.g., a string of characters describing the content of $\Delta(A)$, and sent through link $x$. Let serialize() be the semantic function that handles this conversion. With prefix $emit(A, B, \tilde{x})$, agent $A$ checks whether the predicate $LoggedCall(A, B, \tilde{x})$ is logically derivable from the local set of preconditions, i.e., $\Delta(A)$, the set of preconditions that are collected by other agents involved in the enforcement of the logging specification, i.e., $\Sigma(A)$, and the set of guidelines $\Gamma_G$. If the predicate is derivable, then it is added to the audit log of $A$, i.e., $\Lambda(A)$. Otherwise, the log does not change. Rule LOG specifies the former case, whereas rule NO LOG specifies the latter.
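The effect of the extended prefixes on the components t, ∆, Σ, and Λ can be sketched as a small state machine. This is an illustrative approximation under assumptions: processes and links are not modeled, serialization is replaced by passing a frozen set, derivability from the guidelines is abstracted as a caller-supplied function, and all names (`Config`, `break_glass`, the method names) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Set, Tuple

Call = Tuple[int, str, str, Tuple[str, ...]]       # Call(t, A, B, xs)
Logged = Tuple[str, str, Tuple[str, ...]]          # LoggedCall(A, B, xs)
Derivable = Callable[[Set[Call], Logged], bool]    # stands in for derivability from the guidelines


@dataclass
class Config:
    t: int = 0
    delta: Dict[str, Set[Call]] = field(default_factory=dict)   # local preconditions
    sigma: Dict[str, Set[Call]] = field(default_factory=dict)   # gathered preconditions
    log: Dict[str, Set[Logged]] = field(default_factory=dict)   # audit log per agent

    def call_event(self, a: str, b: str, xs) -> None:           # rule CALL EV
        self.delta.setdefault(a, set()).add((self.t, a, b, tuple(xs)))
        self.t += 1

    def send_precond(self, a: str) -> FrozenSet[Call]:          # rule SEND PRECOND
        self.t += 1
        return frozenset(self.delta.get(a, set()))               # "serialized" copy of delta(a)

    def add_precond(self, received: FrozenSet[Call], a: str) -> None:   # rule ADD PRECOND
        self.sigma.setdefault(a, set()).update(received)
        self.t += 1

    def emit(self, a: str, b: str, xs, derivable: Derivable) -> None:   # rules LOG / NO LOG
        goal = (a, b, tuple(xs))
        known = self.delta.get(a, set()) | self.sigma.get(a, set())
        if derivable(known, goal):
            self.log.setdefault(a, set()).add(goal)
        self.t += 1


def break_glass(known: Set[Call], goal: Logged) -> bool:
    """Hypothetical guideline: getMedHist is logged if the same user broke the glass earlier."""
    a, b, xs = goal
    if (a, b) != ("Patient", "getMedHist") or len(xs) != 2:
        return False
    for (t0, a0, b0, xs0) in known:
        if (a0, b0, xs0) != (a, b, xs):
            continue
        for (t1, a1, b1, xs1) in known:
            if (a1, b1, xs1) == ("Auth", "brkGlass", (xs[1],)) and t1 < t0:
                return True
    return False


cfg = Config()
cfg.call_event("Auth", "brkGlass", ["alice"])
cfg.call_event("Patient", "getMedHist", ["p7", "alice"])
cfg.add_precond(cfg.send_precond("Auth"), "Patient")
cfg.emit("Patient", "getMedHist", ["p7", "alice"], break_glass)
```

Every method advances the counter by one, mirroring the t + 1 in each rule; the final `emit` follows rule LOG because the gathered brkGlass precondition precedes the local getMedHist call.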
For a (potentially infinite) system trace $\tau = \kappa_0 \kappa_1 \cdots$, we use the notation $C_U, C_L, \Gamma_G \vdash \tau$ to specify the generation of trace $\tau$ under the universal codebase $C_U$, local codebase $C_L$, and set of guidelines $\Gamma_G$, according to the reduction system, i.e., $C_U, C_L, \Gamma_G \vdash \kappa_i \longrightarrow \kappa_{i+1}$ for all $i \in \{0, 1, \cdots\}$.
The generated trace in $\Pi_{log}$ out of a target system $s$, i.e., $s \Downarrow \tau$, can be defined in the same style as in $\Pi$, i.e., by some valid initial system in $\Pi_{log}$, the initial configuration $\kappa_0$ in $\Pi_{log}$, and the aforementioned reduction system for $\Pi_{log}$.
The residual log of a configuration is defined as $logof(\kappa) = L = \bigcup_{A \in \mathcal{A}_U} \Lambda(A)$, where $\kappa = (\_, \_, \_, \_, \Lambda)$. This instantiates $\tau \rightsquigarrow L$ for $\Pi_{log}$ (Definition 2.4). Since $L$ is a set of logical literals, it suffices to define toFOL(·) for audit logs as toFOL($L$) = $L$, which completes the instantiation of $\lfloor L \rfloor$ (Definition 2.11).
Note that arbitrary systems in Π log do not guarantee any correctness of audit logging. However, there is a subset of systems in Π log that provably satisfy this property. These systems use the extended prefixes (introduced as part of Π log syntax) in a particular way for this purpose. In the following section, we introduce an instrumentation algorithm to map any system in Π to a system in Π log , and later prove that any instrumented system satisfies correctness results for audit logging.

Instrumentation Algorithm
Instrumentation algorithm I takes a $\Pi$ system $s$, defined in (1), and a logging specification $LS \in \mathcal{LS}_{call}$, defined in Section 3.2, and produces a system $s'$ in $\Pi_{log}$ defined as $s' = \langle P'_s, C'_U, C'_L \rangle$, where $P'_s = (\nu\tilde{c})(\nu\tilde{c}')(A_1(\tilde{x}_1) \mid A_2(\tilde{x}_2) \mid \cdots \mid A_m(\tilde{x}_m))$. $\tilde{c}'$ is the sequence of names of the form $c_{ij}$, which are all fresh, i.e., they are not used already in (1). Moreover, it is assumed that sub-agent identifiers $D_{ij}$ are also fresh, i.e., they are undefined in the $C_L$ component of (1).
Intuitively, I works as follows.
(i) I adds new links $c_{ij}$ between agents $A_i$ and $A_j$, where $A_i$ is the agent that includes a sub-agent whose invocation is considered a logging event, and $A_j$ is some agent that includes a sub-agent whose invocation is a trigger for that logging event. $c_{ij}$ is used as a link between $A_i$ and $A_j$ to communicate logging preconditions (by sendPrecond and addPrecond prefixes).
(ii) Regarding the invocation of a sub-agent $B_A$:
    (a) If the invocation of $B_A$ is a trigger, then the execution of $B_A$ must be preceded by the callEvent prefix. This way, the invocation of $B_A$ is stored in $A$'s local set of logging preconditions ($\Delta(A)$), according to the rule CALL EV.
    (b) If the invocation of $B_A$ is a logging event, then the execution of $B_A$ must be preceded by callEvent, similar to the case above. Next, it must communicate on the appropriate links $c_{ij}$ with all other agents that are involved as triggers according to the logging specification. To this end, $B_A$ is supposed to notify each of those agents to send their collected preconditions. After receiving all those preconditions from the involved agents on the dedicated links, it adds them to $\Sigma(A)$. This is done using addPrecond prefixes, according to the rule ADD PRECOND. Then, it checks whether the invocation must be logged, before following normal execution. This is facilitated by the emit prefix (rules LOG and NO LOG).
    (c) If the invocation of $B_A$ is neither a trigger nor a logging event, then that sub-agent executes without any change in behavior.
(iii) Regarding the invocation of an agent $A$:
    (a) If $A$ includes a sub-agent $B_A$ whose invocation is considered a trigger, then $A$ must be able to receive and handle incoming requests for collected preconditions. This is done by adding a subprocess to $A$ that always listens for requests on the dedicated link ($c_{ij}$) between itself and the agent that may send such requests. Upon receiving such a request, it sends back the preconditions, handled by prefix sendPrecond according to the rule SEND PRECOND, and then continues to listen on the link.
    (b) If $A$ does not include any trigger invocation of a sub-agent, then $A$ executes without any changes.
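Steps (i) to (iii) can be sketched as a rewriting of a local codebase. The sketch rests on strong simplifying assumptions: a sub-agent body is a flat list of prefix strings, parameters are omitted, link names such as `c_Patient_Auth` and listener names such as `D_Patient_Auth` are generated here and merely assumed fresh, and the function name `instrument` is hypothetical. It illustrates the shape of the transformation, not the formal definition of I.

```python
from typing import Dict, List, Set, Tuple

Codebase = Dict[str, Dict[str, List[str]]]   # agent -> sub-agent -> body as a list of prefixes


def instrument(cl: Codebase, triggers: Set[Tuple[str, str]], logevent: Tuple[str, str]):
    """Return an instrumented copy of the codebase and the links that were added."""
    log_agent, log_sub = logevent
    trigger_agents = sorted({a for a, _ in triggers})
    # Step (i): one (assumed fresh) link per trigger agent.
    links = {a: f"c_{log_agent}_{a}" for a in trigger_agents}
    out: Codebase = {a: {b: list(body) for b, body in subs.items()} for a, subs in cl.items()}

    # Step (ii)(a): trigger invocations are preceded by callEvent.
    for a, b in triggers:
        out[a][b] = [f"callEvent({a},{b})"] + out[a][b]

    # Step (ii)(b): the logging event records itself, gathers preconditions, then emits.
    pre = [f"callEvent({log_agent},{log_sub})"]
    for a in trigger_agents:
        c = links[a]
        pre += [f"out {c}", f"in {c}(p)", f"addPrecond(p,{log_agent})"]
    pre.append(f"emit({log_agent},{log_sub})")
    out[log_agent][log_sub] = pre + out[log_agent][log_sub]

    # Step (iii)(a): each trigger agent gets a recursive listener on its link.
    for a in trigger_agents:
        c = links[a]
        listener = f"D_{log_agent}_{a}"
        out[a][listener] = [f"in {c}", f"sendPrecond({c},{a})", listener]
    return out, links


cl = {"Auth": {"brkGlass": ["P"]}, "Patient": {"getMedHist": ["Q"]}}
new_cl, links = instrument(cl, {("Auth", "brkGlass")}, ("Patient", "getMedHist"))
```

Sub-agents that are neither triggers nor logging events (steps (ii)(c) and (iii)(b)) are copied unchanged, and the input codebase is not mutated.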
Formally, the details of the returned system $s'$ are as follows:
Note that $D_{ij}$ is defined recursively to facilitate listening on $c_{ij}$ indefinitely for incoming requests about logging preconditions. In addition, since $c_{ij}$ is fresh, $c_{ij} \notin fn(P)$. Therefore, $P$ cannot communicate on this link, e.g., to compromise logging attempts. Figure 4 illustrates the established links between sub-agents of different agents according to the guideline defined in (2). These links are used to communicate logging preconditions between the logging event and the triggers.
Example 3.2 depicts how the medical records system described in Section 1.2 is instrumented according to the specified instrumentation algorithm and the logging specification given in Example 3.1. In addition, a subprocess $D_{PA}$ is added to agent Auth that indefinitely responds to the requests from Patient on link $c_{PA}$. Figure 5 illustrates the established link between the sub-agents of the two agents.
Instantiation of :≈
According to Definition 2.5, semantics preservation relies on an abstraction of the correspondence relation :≈ between source and target traces. In this section, we instantiate this relation for I. We define the source and target trace correspondence relation as follows: $\tau_1\kappa_1 :\approx \tau_2\kappa_2$ iff $\kappa_1 = P_1$, $\kappa_2 = (t_2, P_2, \Delta_2, \Sigma_2, \Lambda_2)$, and $trim(P_2) = P_1$. Function trim is formally defined in Figure 6. Intuitively, it removes all prefixes, sub-agents, and link names that I may add to a process.
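The intuition behind trim can be sketched on a simplified representation. This is not the definition from Figure 6: it assumes a process body is a flat list of prefix strings, that logging prefixes are recognizable by name, and that communications on added links are written as `in c`, `in c(p)`, or `out c`. The names `trim`, `LOGGING_PREFIXES`, and `c_PA` as a string are hypothetical.

```python
from typing import Iterable, List

# Prefixes that only exist in the target calculus (assumed textual encoding).
LOGGING_PREFIXES = ("callEvent(", "addPrecond(", "sendPrecond(", "emit(")


def trim(body: Iterable[str], fresh_links: Iterable[str]) -> List[str]:
    """Drop logging prefixes and communications on links added by instrumentation."""
    fresh = list(fresh_links)
    kept = []
    for prefix in body:
        if prefix.startswith(LOGGING_PREFIXES):
            continue
        on_fresh_link = any(
            prefix in (f"out {c}", f"in {c}") or prefix.startswith(f"in {c}(")
            for c in fresh
        )
        if on_fresh_link:
            continue
        kept.append(prefix)
    return kept


instrumented = [
    "callEvent(Patient,getMedHist)",
    "out c_PA",
    "in c_PA(p)",
    "addPrecond(p,Patient)",
    "emit(Patient,getMedHist)",
    "in a(x)",
    "Q",
]
```

Trimming `instrumented` with the added link `c_PA` leaves only the original communication on `a` and the continuation `Q`, which is the shape of correspondence that the relation :≈ requires between target and source configurations.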

Main Results
The main properties comprise three results: the instrumentation algorithm I is semantics preserving, deadlock-free, and correct. These are specified in Theorems 3.3, 3.4, and 3.5, resp. Proofs of the theorems are given in our accompanying Technical Report [20].

Related Work
The majority of previous work on audit logging in concurrent environments focuses on audit log analysis (e.g., [33]) and on security concerns regarding log information in transit and/or at rest (e.g., [34,35,36]). Studies regarding the collection of logs from multiple monitors in distributed intrusion detection systems are such instances (e.g., [37,38]). However, previous work does not reflect on the generation of the log, and assumes that the audit log is given. This line of work includes studies on the security of audit logs in terms of their secrecy and integrity within concurrent environments. For example, Yavuz et al. [35] propose a logging scheme that guarantees forward security, employing cryptographic techniques. In the same line of work, Böck et al. [39] propose a system that ensures that logs are trustworthy. Our work is, however, orthogonal to the notion of audit log security. Using cryptographic techniques to ensure verifiable confidentiality and integrity of the audit log does not guarantee that the log is correct. We employ a semantic framework by which the content of the audit log can be judged against the execution trace of the concurrent system given in π-calculus (Definition 2.3).
Cederquist et al. [40] and Corin et al. [41] propose predicate logic frameworks to specify and enforce accountability requirements in distributed systems. The former proposes a framework that ensures user accountability in discretionary access control. The latter studies user accountability in access to personal data that are associated with usage policies defined by the owner of the data, and can be distributed among users. Jagadeesan et al. [42] use turn-based games to analyze distributed accountability systems. Guts et al. [43] use static type enforcement to ensure that a distributed system generates sufficient audit logs. In contrast, our approach is dynamic and relies on instrumentation techniques that can be applied to legacy systems, which may inherently lack correct audit logging mechanisms. With respect to system instrumentation for auditing purposes, our work is related to the language proposed by Martin et al. [44] that facilitates querying the runtime behavior of a program.
Another line of work employs logs to record proof of legitimate access to system resources. Vaughn et al. [45] propose an architecture based on trusted kernels that rely on such logged proofs. Another related work is the a posteriori compliance control system [46] that verifies legitimacy of access after the fact, using a trust-based logical framework that focuses on a limited set of operations. However, our logical framework is used to specify invocation of any arbitrary operation as a precondition to log, or the logging event.
Audit logs can be considered a form of provenance [47]. CamFlow [48] is an auditing and provenance capture utility in Linux that can easily integrate with distributed systems. Pasquier et al. [49] make a strong case for accountability, data provenance, and audit in the IoT. AccessProv [5] is proposed as an instrumentation tool that rewrites legacy Java applications for provenance and finds bugs in authorization systems. Kacianka et al. [50] propose a formal model of accountability for cyber-physical systems.
Amir-Mohammadian et al. [11] propose a semantic framework for audit logging based on the theory of information algebra [51,12]. Their implementation model is restricted to sequential computation. This model is therefore insufficient to apply on concurrent systems where logging preconditions and logging events may transpire in different execution threads. Moreover, their implementation model is restricted to deterministic system behavior. Our work generalizes the application of information-algebraic semantics of audit logging to concurrent environments, which naturally behave non-deterministically at runtime. We show that the semantic framework is inclusive enough for this purpose. Similar to [11], we propose a provably correct instrumentation algorithm. However, our algorithm retrofits a concurrent system (rather than a simple sequential program) according to a formal description of audit logging requirements. Information-algebraic semantic framework for audit logging has also been used to enhance dynamic integrity taint analysis through after-the-fact study of audit logs [8,9]. This line of work introduces maybe-tainted tags for data objects and proposes an implementation model on a core functional object-oriented calculus that provably ensures correctness of generated audit logs. However, it does not address the problem of deploying audit logging in concurrent environments.
Recently, Justification Logic [52] has been used to formally characterize auditing of computational units [53,54,55], resulting in programming languages that enable applications to study their own audit trails and decide accordingly. This is a separate theoretical problem from the one we address in this work.
The microservices-based approach [56,57] to software deployment is an application of our implementation model that we aim to study in greater detail in the future. Accountability plays a significant role as part of the access control framework in microservices-based systems [58], including platform-specific monitoring techniques, e.g., in Azure Kubernetes Service [59]. Smith et al. [60] have proposed a provenance management system, including a provenance logger, for microservices-based applications. Camilli et al. [61] have proposed a semantics for microservices based on Petri nets. Our approach is, however, language-based and relies on process calculi. Jolie [21] is the major programming language for the deployment of microservices, whose semantics [22,23] is defined as a process calculus, heavily influenced by π-calculus.

Future Work and Conclusion
In this paper, we have proposed an implementation model to enforce correct audit logging in concurrent environments. In essence, we have proposed an algorithm that instruments legacy concurrent systems according to a formal specification of audit logging requirements. We use Horn clause logic to specify these logging requirements, which assert temporal relations among the events that transpire in different concurrent components of the system. We have proven that our algorithm is semantics preserving, i.e., the instrumented system behaves similarly to the original system, modulo operations that correspond to audit logging. Moreover, we have proven that our algorithm guarantees correct audit logs. This ensures that the instrumented system neither misses any logging event nor logs unnecessary events. Correctness of audit logs is defined according to an information-algebraic semantic framework. In this semantic framework, information containment is used to compare the runtime behavior with the generated audit log.
We have argued that our instrumentation algorithm provides a model to implement audit logging in real concurrent systems, e.g., in microservices-based medical records systems (Examples 3.1 and 3.2). In this paper, we have aimed at the formal specification of the model on an abstract core calculus to demonstrate the main ideas for future deployment. In future work, we intend to consider real-world language settings, relying on the fundamental results established in this work. In particular, we are aiming to deploy our existing instrumentation algorithm in Spring Boot [18], a Java microservices framework. To provide formal guarantees of audit logging correctness in such real-world settings that are implemented using our model, translation of systems to $\Pi$ systems is required.
Another area of interest is to extend the class of logging specifications, and hence implementation models. Our current class focuses on function invocations within each agent of the system, and it is limited to Horn clauses. While a great percentage of system events can be specified in this class, we need other classes of logging specifications for certain purposes. For example, consider the effect of revoking break-the-glass status for a user on the specification of audit logging requirements. Moreover, auditing usually includes the log of message transmissions between specific agents, which is not supported by what we have introduced in this paper.
Since our model of concurrency is based on a process calculus, message passing is used for IPC. This necessitates the exploration of models that study the specification and enforcement of correct audit logging in concurrent environments that handle IPC through alternative approaches, e.g., shared memory and/or files.