Detecting violations of access control and information flow policies in data flow diagrams

The security of software-intensive systems is frequently attacked. High fines or loss of reputation are potential consequences of failing to maintain confidentiality, which is an important security objective. Detecting confidentiality issues in early software designs enables cost-efficient fixes. A Data Flow Diagram (DFD) is a modeling notation that focuses on the essential, functional aspects of such early software designs. Existing confidentiality analyses on DFDs support either information flow control or access control, which are the most common confidentiality mechanisms. Combining both mechanisms can be beneficial, but existing DFD analyses do not support this. This lack of expressiveness requires designers to switch modeling languages to consider both mechanisms, which can lead to inconsistencies. In this article, we present an extended DFD syntax that supports modeling both information flow and access control in the same language. This improves expressiveness compared to related work and avoids inconsistencies. We define the semantics of extended DFDs by clauses in first-order logic. A logic program made of these clauses enables the automated detection of confidentiality violations by querying it. We evaluate the expressiveness of the syntax in a case study. We attempt to model nine information flow cases and six access control cases. We successfully modeled fourteen out of these fifteen cases, which indicates good expressiveness. We evaluate the reusability of models when switching confidentiality mechanisms by comparing the cases that share the same system design, which are three pairs of cases. We show improved reusability compared to the state of the art. We evaluate the accuracy of the confidentiality analyses by executing them for the fourteen cases that we could model. The results indicate good accuracy. © 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
In software-intensive systems, software exerts an essential influence on the design, construction, deployment, and evolution of the system as a whole (Institute of Electrical and Electronics Engineers, 2000). Consequently, software-intensive systems certainly cover all software systems but also cover, for example, modern production systems, cyber-physical systems, or the internet of things. Many attacks target software-intensive systems (Deogirikar and Vidhate, 2017; Sadeghi et al., 2015). Thus, establishing and maintaining the security of software-intensive systems is necessary. There are various security objectives that must be established. Confidentiality, which is one of these security objectives, ensures that "information is not made available or disclosed to unauthorized individuals, entities, or processes" (International Organization for Standardization, 2018). Confidentiality is hard to achieve in software-intensive systems (Alguliyev et al., 2018), but it is important to consider in order to avoid high penalties and loss of reputation. Strong data protection regulations such as the General Data Protection Regulation (GDPR) (European Union, 2016) of the European Union carry high financial penalties for failing to protect the data of users. For instance, British Airways is facing a penalty of £20m (Denham, 2020a) and Marriott International is facing an £18.4m penalty (Denham, 2020b) because of confidentiality breaches. Another threat to companies is loss of reputation after information disclosure. For instance, Facebook users lost trust (Weisbaum, 2018), which also affected the market value, after the Cambridge Analytica scandal (Isaak and Hanna, 2018).
Considering confidentiality is not a small polishing step in the development process but has to be done right from the beginning. Big software vendors like Microsoft already consider confidentiality in all development phases (Microsoft Corporation, 2020). Considering confidentiality in the software design is especially crucial to avoid a significant increase in the overall development effort: Boehm et al. (1975) reported that fixing an issue becomes more expensive the later it is fixed. Therefore, issues should be fixed as early as possible in the development process. The same holds for security issues (Microsoft Corporation and iSEC Partners, Inc., 2009; Hoo et al., 2001; McGraw, 2006). This is critical because design issues cause about 50% of all security issues (McGraw, 2006). Ensuring proper software designs does not free developers from considering confidentiality in the remaining phases but builds a solid foundation for further phases by identifying and fixing fundamental issues that can barely be fixed later, even when spending considerable effort.
Model-based confidentiality analyses are appropriate for identifying confidentiality violations caused by a confidentiality issue in software design, as Jürjens (2005b) demonstrated as part of a case study. A confidentiality violation is a detectable violation of a confidentiality requirement, such as a system receiving data to which it should not have access. A confidentiality issue is the reason why a confidentiality violation occurs. For instance, a system might acquire wrong data because of a wrong service call. Manual inspections of system designs can detect confidentiality violations, but this task is complex and labor-intensive, which impedes fast and early detection of violations. A modeling language that cannot represent the aspects important for detecting confidentiality violations makes the detection process even harder. Automated model-based confidentiality analyses operating on appropriate models have the potential to speed up finding violations (Tuma et al., 2020). In particular, model-based confidentiality analyses operating on DFDs are promising because security problems tend to follow the data flow (Shostack, 2014), i.e., to identify the cause of a violation, it is often necessary to follow the path that the data took. In previous work, we already demonstrated that model-based confidentiality analyses based on software designs given as data flows can yield valuable results in Industry 4.0 settings (Al-Ali et al., 2019). DFDs are part of, among others, the curriculum of requirements engineering certifications, such as the IREB certification (Pohl and Rupp, 2015), and textbooks on requirements engineering (Dick et al., 2017; Wiegers, 2005), which is why designers are usually familiar with DFDs and do not face a steep learning curve.
Confidentiality analyses must support access control and information flow control because both are important confidentiality mechanisms: Access control is the standard for protecting confidential data (Sabelfeld and Myers, 2003). Therefore, it is commonly used in practice. For instance, a system might violate an access control requirement by providing a user with information of a certain type, which should be kept secret from that particular user. Information flow control can detect information leaks through data propagation, which allow drawing conclusions without direct data flows (Hedin et al., 2017). For instance, a system might violate an information flow requirement by providing a user with information that has been derived from other information, which in turn should be kept secret from that particular user. Simple information flow control approaches such as taint analysis (Arzt et al., 2014) are applied in practice, but more powerful information flow control approaches such as fine-grained noninterference enforcements are not (Staicu et al., 2019). Access control and information flow control are valid options to use depending on the system and the development context. Even combinations of simple information flow control and access control are possible at the implementation level (Xu et al., 2006; Wang et al., 2009), which can improve the protection of information. If modeling and analysis approaches are not capable of representing information flow and access control, the chances are high that they are not applicable in a significant number of cases in practice.
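The difference between the two mechanisms can be made concrete with a small Python sketch. The names and data are our own hypothetical illustration, not part of the approach: an access control check compares the requester against an allow-list per data item, whereas even a simple taint-style information flow check must track how data is derived.

```python
# Hypothetical illustration of the two confidentiality mechanisms.

# Access control: a direct check against an allow-list per data item.
ACCESS = {"salary": {"hr", "payroll"}}          # data item -> authorized principals

def may_access(principal: str, item: str) -> bool:
    return principal in ACCESS.get(item, set())

# Information flow control (taint style): data derived from a secret
# item carries the secret's taint, even without a direct flow.
def derive(taints_a: set, taints_b: set) -> set:
    # The result of combining two values inherits the taints of both.
    return taints_a | taints_b

salary_taint = {"salary"}
bonus_taint = derive(salary_taint, set())        # bonus computed from salary

# Access control alone says nothing about the derived 'bonus' value,
# but information flow control sees the inherited 'salary' taint.
assert not may_access("manager", "salary")
assert "salary" in bonus_taint
```

The sketch shows why the second example in the paragraph above needs information flow control: the derived value has no entry in the access control list, yet it still reveals the secret.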
This article addresses the automatic detection of confidentiality violations in data-oriented software designs. Related work such as Tuma et al. (2019), van den Berghe et al. (2018) and Alghathbar and Wijesekera (2003) (discussed in detail in Section 4) as well as our previous work already suggested modeling languages and analysis semantics in order to realize automated confidentiality analyses of software designs. Nevertheless, we still see the need for further research because of the following challenges that neither related work nor our previous work has addressed comprehensively so far: (Ch1) A systematic consideration of all possible paths, which data can take in a system design, is necessary to find violations systematically. (Ch2) Modeling and analyzing information flow and access control within separate artifacts introduces consistency issues, so a consistent modeling and analysis approach that supports both confidentiality mechanisms is necessary. (Ch3) User-defined analyses are necessary to cope with specific analysis needs, which are hard or tedious to define in terms of established confidentiality mechanisms. We describe these challenges in more detail in Section 2. The following two contributions address these challenges: (C1) Extended DFD Syntax. We specify an extended DFD syntax by a metamodel that addresses the previously described challenges via syntactical extensions for representing confidentiality mechanisms. The metamodel introduces the concept of alternative data flows via pins to represent multiple data sources and destinations (Ch1). The metamodel distinguishes between system parts that depend on particular confidentiality mechanisms and system parts that do not. Everything related to specific confidentiality mechanisms is encapsulated in extensions that can be defined by users (Ch3). An extension consists of confidentiality properties and behavior descriptions, i.e., descriptions of how the system changes these properties during its execution. The metamodel can represent information flow and access control (Ch2) by such extensions.
(C2) DFD Semantics for Confidentiality Analyses. We introduce analysis semantics based on label propagation that support various types of confidentiality analyses. Confidentiality properties are mapped to labels. Behavior descriptions are mapped to label propagation functions. An analysis is defined by a comparison of labels resulting from the label propagation with expected labels stemming from requirements. The comparison can cover information flow and access control analyses (Ch2) as well as user-defined analyses (Ch3). The semantics explicitly consider all possible data flows as well as their combinations, i.e. all data flow paths (Ch1).
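The label propagation idea behind C2 can be sketched in a few lines of Python. The data structures and names below are our own illustration, not the first-order-logic formalization used in the article; the nodes and labels loosely follow the running example.

```python
# Minimal label-propagation sketch (hypothetical encoding, not the
# article's first-order-logic semantics).

# DFD edges: (source node, target node).
FLOWS = [("user", "declassify"), ("declassify", "airline")]

# Behavior descriptions as label propagation functions:
# incoming labels -> outgoing labels.
PROPAGATE = {
    "user": lambda labels: labels | {"User"},                      # emits level-User data
    "declassify": lambda labels: (labels - {"User"}) | {"UserAirline"},
    "airline": lambda labels: labels,
}

def propagate(start: str) -> dict:
    """Push labels along all flows; return the labels arriving at each node."""
    arriving = {start: set()}
    worklist = [start]
    while worklist:
        node = worklist.pop()
        out = PROPAGATE[node](arriving[node])
        for src, dst in FLOWS:
            if src == node and not out <= arriving.setdefault(dst, set()):
                arriving[dst] |= out
                worklist.append(dst)
    return arriving

# An analysis compares propagated labels with expected labels from
# the requirements: the airline may only receive UserAirline data.
arriving = propagate("user")
violations = arriving["airline"] - {"UserAirline"}
assert violations == set()   # the declassification removed the User label
```

Removing the declassification entry (i.e., propagating labels unchanged at that node) would leave the "User" label on the data arriving at the airline, which the comparison would report as a violation.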
We evaluate the presented modeling and analysis approach in a case study including fifteen cases. A case consists of a system, confidentiality requirements given in terms of a particular confidentiality mechanism as well as the properties and behaviors required to reason about confidentiality. We evaluate three aspects of the approach: the expressiveness in specifying systems and analyses, the reusability when replacing confidentiality mechanisms as well as the accuracy of analyses. We evaluate information flow analyses on nine cases and access control analyses on six cases. All cases used to evaluate information flow analyses and half of the cases used to evaluate access control analyses stem from related work. The results indicate good expressiveness and accuracy as well as improved reusability compared to the state of the art.
The remainder of this article is structured as follows. Section 2 describes the three challenges that we address. Section 3 introduces the running example that illustrates our approach throughout the article. Section 4 discusses the state of the art in DFD semantics as well as design-time confidentiality analyses. Section 5 gives an overview of how the approach works. The core contributions are the syntax and the semantics, which we describe in Sections 6 and 7, respectively. We show how to detect confidentiality violations using both contributions in Section 8. We briefly report on our tooling in Section 9. Section 10 presents the evaluation of the expressiveness and reusability of the syntax as well as the accuracy of the defined analyses. Section 11 concludes the article.

Challenges
In this section, we describe the challenges in using the DFD syntax of DeMarco (1979) for detecting violations of confidentiality requirements. DFDs as introduced by DeMarco (1979) are graphs presenting a functional viewpoint on systems based on data processing. There are only four fundamental elements: Data flows are unidirectional edges that connect nodes to describe a data transmission between them. Source and sink nodes (also called actors) start or terminate a sequence of data flows. Process nodes transform incoming data to outgoing data. File nodes (also called stores) persist and emit data. DeMarco describes the semantics of DFDs in an intuitive but incomplete way, so there is no standard semantics.
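The four fundamental elements can be captured by a very small data model. The following Python sketch uses our own naming (it is not part of DeMarco's notation or of the approach presented here) and adds a basic well-formedness check that data flows only connect declared nodes.

```python
from dataclasses import dataclass, field

# The four fundamental DFD elements; actors cover sources and sinks.
NODE_KINDS = {"actor", "process", "store"}

@dataclass
class DataFlow:
    source: str          # node name
    target: str          # node name
    data: str            # transmitted data item

@dataclass
class DFD:
    nodes: dict = field(default_factory=dict)    # name -> kind
    flows: list = field(default_factory=list)

    def add_node(self, name: str, kind: str) -> None:
        assert kind in NODE_KINDS, f"unknown node kind: {kind}"
        self.nodes[name] = kind

    def add_flow(self, source: str, target: str, data: str) -> None:
        # A data flow is a unidirectional edge between declared nodes.
        assert source in self.nodes and target in self.nodes
        self.flows.append(DataFlow(source, target, data))

dfd = DFD()
dfd.add_node("user", "actor")
dfd.add_node("book flight", "process")
dfd.add_flow("user", "book flight", "ccd")
assert [f.data for f in dfd.flows] == ["ccd"]
```

Even this tiny model makes the point of the paragraph above: nothing in the structure itself says what a process does to its data or how multiple inputs relate, which is exactly where the semantics remain underspecified.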
The lack of full-fledged semantics and shortcomings of the simple syntax make automated analyses of DFDs challenging. Especially, we see the following three open challenges that have not been addressed sufficiently yet.
(Ch1) Exploration of multiple data flow paths. A data flow path is a sequence of nodes that a data item takes to reach a particular node. Multiple paths providing the same type of data to the same node commonly occur in realistic applications. For instance, branches can change call destinations and thereby also the destination of sent data. Multiple calls arriving at a certain location imply multiple sources of data for the callee. Modeling approaches have to provide means for describing these multiple paths to represent realistic system designs. The corresponding analysis approaches have to consider all of these paths in a systematic way to detect possible violations. Often, not all combinations of data flows form a valid data flow path from a logical point of view. Therefore, modeling approaches should provide means to specify valid combinations. A common approach to handling multiple data flows is to require an explicit selection of one particular path before the analysis, but this is problematic because it does not scale well: in theory, the cross product of all possible choices at every node in a DFD has to be considered if no specification of valid paths is available.
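The scaling problem can be made concrete with a short Python sketch using hypothetical numbers: if each node on a path offers several alternative input flows and no specification of valid combinations exists, the number of candidate data flow paths is the cross product of all choices.

```python
from itertools import product
from math import prod

# Hypothetical DFD path: each entry is the number of alternative
# input flows one node on the path offers.
alternatives_per_node = [2, 3, 2, 4]

# Without a specification of valid combinations, an analysis must
# consider every element of the cross product of all choices.
candidate_paths = list(product(*[range(n) for n in alternatives_per_node]))
assert len(candidate_paths) == prod(alternatives_per_node)  # 2 * 3 * 2 * 4 = 48
```

Four nodes with a handful of alternatives each already yield 48 candidate paths; the count grows multiplicatively with every additional node, which is why requiring a manual selection of one path before the analysis does not scale.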
(Ch2) Coverage of multiple confidentiality mechanisms. Usually, DFDs require extensions to capture the information required to conduct confidentiality analyses. Single-purpose models and analyses cover their phenomena well and provide accurate analyses. However, the downside of single-purpose approaches is their lack of flexibility, i.e., designers have to choose a particular confidentiality mechanism, e.g. information flow or access control, before they start modeling. Switching to another confidentiality mechanism implies remodeling large parts of the system in the new modeling language, even if fundamental parts, such as the system structure, could be reused. Remodeling large parts may imply consistency problems: software designers have to ensure that the shared part of both models actually represents the same design. Creating (automated) mappings between two single-purpose models is possible in general, but this kind of consistency management is challenging if the languages diverge too much (Torres et al., 2020). A feasible approach for addressing this consistency problem when switching confidentiality mechanisms is necessary.
(Ch3) User-defined confidentiality analyses. Requirements to keep information confidential can be formulated in various ways.
However, when designers are forced to use predefined confidentiality mechanisms, even simple requirements, such as that a certain piece of information must not flow to one specific node, can become complex: in Role-based Access Control (RBAC), a designer has to specify roles and assign these roles to data and nodes in a way that the simple policy can be checked by comparing roles. In information flow, a designer has to carry out roughly the same steps but for labels instead of roles. Defining custom analyses can be easier. To do so, designers need means for specifying custom analyses and corresponding modeling concepts. As a side effect, this would also allow integrating new confidentiality mechanisms. An underlying formalism supporting analyses of various confidentiality mechanisms as well as an appropriate modeling language is needed to provide such means.
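The overhead described above can be illustrated in Python. The model is entirely hypothetical: the same requirement, "the data item ccd must not flow to the travel agency node", is once encoded via RBAC machinery and once stated directly as a user-defined rule.

```python
# Hypothetical illustration: one simple requirement expressed via
# RBAC machinery versus as a direct user-defined rule.

# RBAC encoding: roles must be invented and assigned to data and
# nodes just to forbid a single flow.
DATA_ROLES = {"ccd": {"cardholder"}}          # roles allowed to receive ccd
NODE_ROLES = {"travel agency": {"broker"}}    # roles held by the node

def rbac_allows(node: str, data: str) -> bool:
    return bool(NODE_ROLES.get(node, set()) & DATA_ROLES.get(data, set()))

# Custom analysis: the requirement stated directly, no roles needed.
FORBIDDEN = {("travel agency", "ccd")}

def custom_allows(node: str, data: str) -> bool:
    return (node, data) not in FORBIDDEN

# Both encodings reject the flow, but the custom rule needed no
# role vocabulary or assignments.
assert not rbac_allows("travel agency", "ccd")
assert not custom_allows("travel agency", "ccd")
```

The two checks agree on this policy, but the RBAC version required inventing and maintaining two role assignments whose only purpose is to make the comparison come out right, which is the kind of indirection user-defined analyses avoid.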

Running example
To illustrate the concepts described in this article as well as the limitations of the state of the art, we use the TravelPlanner case study (Katkalov et al., 2013) of iFlow as a running example. The case study consists of the four systems shown in Fig. 1: The travel planner app queries flights and books them on behalf of the user. The credit card center app manages the credit card information of a user. An airline service provides flight information and allows booking flights. A travel agency service mediates between the travel planner and the airline. The scenario is that users query flights, load their credit card data (CCD), book the flight with the airline, and the airline pays a commission to the travel agency for mediating.
With respect to confidentiality, there are three totally ordered security levels: The first level, User,Airline,Agency, contains information accessible to all parties. The travel agency, airline and user have clearance for this level. The travel planner and credit card center apps belong to the user; both apps and the user always have the same clearance. The second level, User,Airline, dominates the first level, i.e., it is greater than or equal to it (⩾), and contains information regarding the flight booking. The airline and user have clearance for this level. The third level, User, dominates the previous levels and contains information meant only for the user. Only the user has clearance for this level. The critical part of the system is that credit card information from the third level must not be disclosed to entities with a lower clearance level. However, the airline needs the credit card information to process the booking. Therefore, a declassification of the credit card data explicitly lowers its security level to the second level. If this declassification is missing, there is a violation of the information flow requirements.
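The three totally ordered levels and the declassification step can be sketched in Python. The level names follow the running example, but the numeric encoding and function names are our own illustration.

```python
# Totally ordered security lattice of the running example:
# a higher rank dominates all lower ranks.
LEVELS = {"User,Airline,Agency": 1, "User,Airline": 2, "User": 3}

def dominates(a: str, b: str) -> bool:
    """a ⩾ b in the lattice."""
    return LEVELS[a] >= LEVELS[b]

def may_read(clearance: str, classification: str) -> bool:
    # An entity may read data whose classification its clearance dominates.
    return dominates(clearance, classification)

# Credit card data is classified at the top level "User".
ccd = "User"
airline_clearance = "User,Airline"

# Without declassification: a violation, the airline must not read the CCD.
assert not may_read(airline_clearance, ccd)

# Declassification explicitly lowers the CCD to the second level.
declassified_ccd = "User,Airline"
assert may_read(airline_clearance, declassified_ccd)
```

Omitting the declassification step leaves the data at level User, and the dominance check fails exactly as the running example describes.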
The corresponding DFD is shown in Fig. 2. The level, behavior and user annotations are part of our extended DFD syntax. The remaining elements follow the notation of DeMarco (1979). Informally speaking, nodes annotated with level 1 belong to the travel agency, nodes annotated with level 2 belong to the airline and the remaining nodes belong to the user. A process with the user annotation (small actor symbol on the left side) is a step executed by the user instead of the system.

State of the art
This article is about detecting violations of confidentiality requirements in software designs by analyzing DFDs. In order to analyze DFDs, we have to define the meaning of every element of a DFD. Various attempts (see Section 4.1), which do not focus on confidentiality, have been made to specify formal semantics of DFDs. Although these semantics are not usable to analyze confidentiality, they reveal shortcomings in the DFD semantics by DeMarco (1979) that have to be addressed. Often, such shortcomings stem from ambiguities caused by imprecise or missing information in DFDs. Such ambiguities are best addressed by providing additional information in an extended syntax. We derive features that have to be considered by DFD semantics and the corresponding syntax based on the identified shortcomings and ambiguities. We show the significance of these features for modeling and analyzing confidentiality by discussing where these features are used in the running example. In general, the features are an enabler for addressing the challenges (Ch1, Ch2, Ch3) described in Section 2. However, the syntax and semantics cannot address these challenges completely on their own but require support from other parts of the approach, such as the analyses.
Approaches for identifying violations of confidentiality requirements do not have to use DFDs but can operate on various artifacts. Because approaches operating on data flows are most closely related to our proposed approach, we separate the discussion of approaches for identifying violations of confidentiality requirements in Section 4.2 by the paradigm of the analyzed artifacts: Section 4.2.1 discusses approaches operating on control flow descriptions and Section 4.2.2 discusses approaches operating on data flow descriptions. We also discuss how the approaches realize the previously mentioned features and whether the approaches sufficiently address the challenges described in Section 2.

DFD semantics
The publications about semantics of DFDs, which we describe in the following, frequently report on four shortcomings of the informal semantics introduced for DFDs by DeMarco (1979). They address these shortcomings by extensions. Consequently, we see these extensions as required features for DFD semantics as well as for the syntax if a syntax extension supports a semantic extension. The features are (F1) properties of nodes, (F2) defined meaning of multiple inputs, (F3) behaviors of processes and (F4) behaviors of actors. In the following, we explain the significance of these features for detecting confidentiality violations and how well solutions proposed by related semantics address these features.
F1 Node Properties. The properties of nodes are barely covered in related semantics, but representing them is important: in our running example, it would otherwise not be possible to represent the clearance level of nodes, which is essential for comparing it with the classification level of data to identify violations. France (1992) and Petersohn et al. (1994) define execution semantics for DFDs and cover node properties as part of the global execution state. This means that properties can change dynamically. While this is an interesting approach, dynamic annotations are more complex to specify than static annotations. Therefore, we are interested in exploring whether static annotations are sufficient to represent and analyze common confidentiality mechanisms.
F2 Multiple Inputs. The handling of multiple inputs is a commonly addressed feature. If a process has multiple inputs, they usually relate to each other, but it is not clear how. In our running example, it would be unclear that the two credit card inputs of book flight are alternatives rather than two mandatory inputs. However, the choice of a particular input can change the analysis results. The simplest solution is to always require all inputs (Fensel et al., 1993; Larsen et al., 1994; Petersohn et al., 1994; Xiong et al., 2017), but this is often too restrictive: requiring all inputs would not allow modeling the alternative input flows ccd direct and declassifiedCCD in our running example. Expecting all inputs, which roughly corresponds to expecting all possible incoming calls to be mandatory, is not a realistic assumption. Building alternative groups of particular data flows is possible by defining preconditions to select flows (France, 1992; Liu and Tang, 1991; Plat et al., 1991; Wahls et al., 1993; Leavens et al., 1996, 1999) or by building sets of data flows. However, an additional alternative flow implies changes in potentially multiple preconditions and sets, which can lead to inconsistent specifications in the case of many data flows and preconditions or sets. In our running example, adding another input providing credit card data to book flight would require adjusting the precondition or the sets. We see potential to further simplify adding an alternative data flow.
F3 Behavior of Nodes. A formal framework to specify the behavior of processes with respect to their effect on data is necessary. In our running example, it is important to specify that declassify CCD lowers the classification level of the yielded data. This is not possible without means for specifying behavior. However, finding a reasonable level of abstraction for the specification of process behaviors is a challenging topic. Semantics focused on execution (Kavi et al., 1986; Brunza and van der Weide, 1989; Petersohn et al., 1994; Xiong et al., 2017) do not consider process behavior at all. Semantics using behaviors to specify trigger conditions (Plat et al., 1991; France, 1992; Fensel et al., 1993; Larsen et al., 1994), i.e. conditions for when a process can run, do not describe an effect on yielded data. Neither kind of approach would allow us to derive data properties from the data processing of the system. This means that manual and error-prone data classifications are necessary to still support powerful analyses. Specifications of algorithms to calculate outputs (Liu and Tang, 1991; Wahls et al., 1993; Leavens et al., 1996, 1999) can represent wide ranges of effects by specifications given in general-purpose languages. However, a generic specification language is potentially more complex to use than a tailored specification language.
F4 Behavior of Actors. The behavior of actors, i.e. the data processing done by actors instead of systems, is often neglected but can be important to consider. In our running example, it is crucial to know that the user does not pass the credit card information received from the simple getter call back into the system but rather the credit card information received from the declassification operation. Actor behaviors allow specifying that the data is received and passed back into the system. Without such descriptions, we could only guess the origin of data, which could lead to incorrect analysis results. About half of the identified semantics (Kavi et al., 1986; Liu and Tang, 1991; Brunza and van der Weide, 1989; Fensel et al., 1993) ignore actor behavior, but the other half (Plat et al., 1991; France, 1992; Larsen et al., 1994; Leavens et al., 1996, 1999) uses the same means as for specifying process behaviors. Representing actor behavior by the same means as node behaviors is beneficial because it provides a uniform way of specifying behavior, which lowers the learning effort.

Confidentiality modeling and analysis approaches
To cope with the large number of confidentiality modeling and analysis approaches focusing on the design and development phases, we discuss categories of approaches and provide examples from these categories. The examples illustrate limitations with respect to the required features identified before as well as limitations in sufficiently addressing the challenges described in Section 2. The limitations apply to the whole category. In the following, we distinguish between approaches analyzing control flows (Section 4.2.1) and approaches analyzing data flows (Section 4.2.2). The latter are closely related to the approach we present in this article.

Modeling and analysis of control flows
Control flow modeling and analysis approaches describe actions to be executed and the order, in which these actions are executed. We distinguish between approaches working on abstractions of the system (Gerking et al., 2018;Katkalov et al., 2013;Jürjens, 2005a;Hoisl et al., 2014;Almorsy et al., 2013;Abdellatif et al., 2011), such as models specified in the Unified Modeling Language (UML), and approaches working with source code (Arzt et al., 2014;Snelting et al., 2014;Runge et al., 2020;Ahrendt et al., 2016). Creating an abstraction of a system usually requires an upfront effort for modeling. However, once the model is created, it can be changed and analyzed for different design alternatives (cf. what-if-analyses) much easier compared to source code. This is because abstracting the system usually reduces dependencies that need to be considered.
Model-based Approaches. One of the most fundamental decisions when creating a model-based approach is the level of abstraction of the model to be used. Therefore, we distinguish approaches by the level of detail required to model the behavior of nodes (F3). As illustrated in the overview of related model-based approaches operating on control flows in Table 1, we see three groups of approaches: (i) approaches requiring detailed specifications (s) in the top section, (ii) approaches using coarse-grained specifications such as predefined behaviors based on node types (nt) in the middle section and (iii) approaches not describing the behavior at all (-) in the bottom section. In the following, we do not discuss the features F1, F2 and the challenge Ch1 individually because all approaches handle them in the same way: properties of nodes (F1) are covered by annotations. The meaning of multiple incoming data flows (F2) is simple: because data flows only happen via calls, every individual call is an alternative data flow consisting of potentially many data items. Consequently, all approaches address the challenge of discovering all data flow paths (Ch1) but restrict themselves to data flows via calls, which cannot represent the more complex data flow patterns of DFDs.
The three approaches (Gerking et al., 2018; Katkalov et al., 2013; Jürjens, 2005a) requiring detailed specifications (i) use the specifications to prove information flow properties of the system model. None of the three approaches can represent data processing by the behavior of actors (F4); they limit behaviors to individual calls to the system. Therefore, they cannot provide full traceability of data that is processed by a user. Besides information flow, UMLSec (Jürjens, 2005a) can analyze access control. However, UMLSec can only control access to actions, not access to data. The support for custom analyses (Ch3) is limited to simple well-formedness constraints. This means that custom analyses can compare annotations and report violations on a structural level. A custom data propagation analysis is not possible without intrusive changes to the UMLSec source code.
There are approaches operating on more abstract behavior descriptions: Hoisl et al. (2014) use predefined behavior descriptions for processes (F3) and actors (F4), which they assign based on the type of various nodes. This is often simple to use for designers but also implies restrictions with respect to possible analyses and extensibility: The approach only supports taint analyses, which is a simple information flow mechanism (Ch2), and does not support custom analyses (Ch3).
Approaches not providing behavior specifications (Almorsy et al., 2013; Abdellatif et al., 2011) for processes (F3) or actors (F4) usually have only limited analysis capabilities. The approach of Almorsy et al. (2013) supports a simple form of access control (Ch2) and means to define simple well-formedness analyses (Ch3). The approach of Abdellatif et al. (2011) only supports information flow and no custom analyses. Neither approach analyzes data propagation, so classifying data or other system elements is a manual task and the analyses are limited to pattern matching.
Source Code-based Approaches. There are three types of related approaches operating on source code: taint analyses such as FlowDroid (Arzt et al., 2014), full-fledged information flow analyses such as JOANA (Snelting et al., 2014) or IFcB (Runge et al., 2020), and verification approaches such as KeY (Ahrendt et al., 2016). The approaches either associate properties of nodes (F1) by the node type (e.g. a sensor of a certain type can always be manipulated by an attacker) or by the value of attributes (e.g. a class has an attribute holding its clearance level). The handling of multiple inputs (F2 and Ch1) is the same as for the model-based analyses operating on control flows. The behavior of nodes (F3) is given by the source code and the behavior of actors (F4) is usually not covered. Because approaches based on source code are often highly specific to certain application domains or scenarios, they only support one particular confidentiality mechanism (Ch2) and are barely extensible (Ch3). All approaches except for KeY only support information flow analyses. KeY does not prescribe a particular confidentiality mechanism but supports custom analyses (Ch3) via preconditions and postconditions. However, approaches based on source code are not applicable at design time as already motivated.

Modeling and analysis of data flows
Design time approaches exploiting data flows are closely related to our work. Table 2 gives an overview of the approaches discussed in the following. The upper part of the table covers threat modeling approaches. The lower part covers data propagation analyses.
Threat modeling (Abi-Antoun et al., 2007; Deng et al., 2011; Yampolskiy et al., 2012; Berger et al., 2016; Sion et al., 2018) is frequently researched. Because of the flexible nature of threat modeling, multiple confidentiality mechanisms (Ch2) and custom analyses (Ch3) are usually supported. All approaches support node properties (F1) by static annotations and do not consider actor behaviors (F4). All approaches allow multiple inputs but only Yampolskiy et al. (2012) distinguish mandatory and optional data flows (F2). However, the selection process for the optional flows they introduce is not specified in their publication, so systematically considering multiple flow paths is not possible (Ch1). The behavior of processes (F3) is often not represented: only Abi-Antoun et al. (2007) and Sion et al. (2018) describe behaviors by annotations, which are later compared to patterns. All analyses are limited to purely structural analyses that perform pattern matching and do not derive properties of exchanged data based on its processing. Therefore, reasoning about information flow either requires manually classifying all exchanged data, which can be a complex task, or only yields results with the same granularity as simple taint analyses. Reasoning about multiple classification levels, as we do in the running example, is not possible.

Data propagation analyses reduce the complexity of the labeling task by not requiring all data to be labeled manually. Manual labeling is repetitive and sometimes challenging, so it is error-prone. Instead, data propagation analyses require a limited set of initial labels that are propagated through the system. As a consequence, only few labels have to be assigned manually, which reduces the complexity compared to the category of approaches discussed before.
FlowUML (Alghathbar et al., 2006) derives DFDs from UML sequence diagrams, models them in a logic program and describes how to detect violations of information flow requirements as well as DAC and MAC requirements. Thus, FlowUML supports information flow and access control (Ch2). FlowUML uses specific node types to represent properties of nodes (F1), which is comparable to static annotations, and specifies behaviors of processes (F3) and actors (F4) by tables that relate data flows. The handling of multiple flows (F2) and of multiple data flow paths (Ch1) is not described in the FlowUML paper (Alghathbar et al., 2006), so it is unclear how well realistic systems can be modeled and analyzed by the approach. Formulating custom analyses (Ch3) is not described either. We could not find any publications reporting on an evaluation of FlowUML. Therefore, it is unclear whether the approach is applicable to realistic systems and whether it provides accurate results.

Tuma et al. (2019) as well as our previous work have been evaluated for realistic systems. Both approaches describe the system behavior (F3) as a sequence of label propagation functions and initial labels on data. Both approaches represent properties of nodes (F1) as static annotations. Tuma et al. only support information flow and do not consider the behavior of actors (F4). Our previous work only supports access control and considers the behavior of actors (F4) by label propagation functions. Considering actor behaviors allows specifying, for instance, which particular credit card information is passed to the system in our running example, which in turn affects the analysis results. Both approaches only support exactly one type of confidentiality analysis (Ch2). Our previous work additionally provides means for specifying custom analyses (Ch3) via queries.
Neither approach provides means for systematically considering all possible data flow paths (Ch1) in the presence of multiple valid selections of inputs; instead, both prescribe one particular input selection (F2). Prescribing one selection allows analyses in the presence of ambiguities but does not guarantee finding violations produced by other possible selections.
van den Berghe et al. (2017b) describe systems by data flows between predefined processing operators to prove security properties, including a simple form of information flow control but no access control (Ch2). They describe system behavior in the proof assistant Coq by stateful modeling in linear-time temporal logic. These behavior descriptions can be used to describe the behavior of processes (F3) and actors (F4). Properties of nodes (F1) can be defined freely and can change dynamically. This also enables formulating custom analyses (Ch3). The behavior description of nodes includes the logic for selecting inputs (F2). However, the paper does not report on systematically considering all possible data flow paths (Ch1). Additionally, including the selection logic of inputs in behavior descriptions hinders reusability because the same behavior cannot be used for two nodes with different numbers of inputs. In our running example, we would have to specify a dedicated behavior for the book flight process instead of just reusing the Forward behavior, because the behavior would have to be extended by the selection logic of the alternative incoming flows of credit card information.

Overview of the approach
Before describing the contributions in detail, we give a high-level overview of how our modeling and analysis approach is applied and how it works. The goal of the approach is to detect violations of confidentiality requirements. This especially covers requirements given in terms of information flow or access control. To apply the approach, the three activities illustrated in Fig. 3 are necessary: creating an analysis definition, modeling the system and running the analysis. The analysis definition introduces confidentiality-related model elements that are used while modeling the system. Often, it is sufficient to create the analysis definition once and use it for various systems. We explain each of these activities in the following.
Creating an Analysis Definition. An analysis definition is a collection of the following model elements: (1) properties of nodes (F1), (2) properties of data, (3) behavior descriptions of nodes (F3 and F4) and (4) a comparison function. In our running example, the properties of nodes (1) are the clearance levels and the properties of data (2) are the classification levels. The behavior descriptions (3) define how nodes process data, i.e. what properties outgoing data will have based on the properties of incoming data. In our running example, the behavior descriptions are Forward, Join and Declassify. The Forward behavior copies incoming data properties to outgoing data properties unchanged. The Join behavior looks for the highest classification level on all incoming data and applies that level to outgoing data. The Declassify behavior explicitly sets the classification to the second level. The comparison function (4) defines a pattern that indicates a violation by comparing data and node properties. In our running example, the comparison function looks for a node with a clearance level lower than the classification level of any data received by that node. A dedicated security expert creates the analysis definition because mapping a confidentiality analysis to the four model elements described above requires security expertise. Alternatively, a software designer with security expertise can carry out these activities. Analysis definitions (or at least parts of them) are often reusable. Therefore, defining an analysis is only required if it has not been defined before. Consequently, security experts do not have to take part in the design process of every system but only in processes that require new analysis definitions. Decoupling the analysis-specific model elements, i.e. the analysis definition, from the remaining DFD elements is not only beneficial for assigning clear responsibilities: in previous work, we demonstrated that this separation also improves maintainability. In addition, the separation is beneficial for reusing models, as we show in the evaluation in Section 10. In our running example, the whole analysis definition can be reused for other systems if the particular levels are renamed. Because the analysis definition is sufficient to represent the core elements of a confidentiality mechanism and creating it does not require intrusive extensions of the overall approach via source code, the analysis definition addresses the challenge of defining custom analyses (Ch3). As we will show in the evaluation in Section 10, the analysis definition is expressive enough to represent information flow and access control mechanisms, so it also addresses the challenge of representing both confidentiality mechanisms (Ch2).
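To make the comparison function of the running example more tangible, it could be phrased as a Prolog rule along the following lines. This is purely an illustrative sketch: the predicates nodeClearanceIndex/2 and receivedClassificationIndex/2 are hypothetical placeholders, not clauses of our actual implementation.

```prolog
% Illustrative sketch of the comparison function of the running example.
% nodeClearanceIndex/2 and receivedClassificationIndex/2 are assumed
% placeholders for the clauses generated by the mapping.
violation(Node) :-
    nodeClearanceIndex(Node, Clearance),
    receivedClassificationIndex(Node, Classification),
    Classification > Clearance.  % received data dominates the node's clearance
```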
Modeling the System. First, a software designer models the structure of the system with the DFD elements that DeMarco (1979) introduced. Next, the designer integrates the confidentiality mechanism into the system by applying elements from the previously defined analysis definition to DFD elements. In our running example, the designer assigns each node a clearance level and a behavior description. Assigning data properties explicitly is not necessary: behavior descriptions can provide initial data properties of newly created data, and the analysis will determine the remaining data properties later. In our running example, the behavior description of the FlightPlanner specifies that outgoing data is always classified with the first level. In contrast, the Forward behavior of the dispatch request process does not provide an initial data classification but derives the classification during the analysis.
Running the Analysis. The software designer starts the fully automated analysis. The result is a list of detected violations according to the comparison function. The fundamental idea of the analysis is to map the DFD, the properties and the behavior descriptions to a label propagation network. Properties become labels, nodes become label propagation functions according to their behavior descriptions, and data flows define the connections between the label propagation functions. The analysis propagates all labels through the network. After that propagation, the labels of all data at all nodes are known. In the last step, the comparison function compares the labels to identify violations. To find information flow violations in our running example, we look for an edge with a classification label higher than the clearance level of the receiving node. The dashed edge in Fig. 2 causes such a flow: the dashed flow circumvents the declassification process, so the credit card data arrives at the book flight process with level 3 instead of level 2. The book flight process receives this level 3 data but its clearance is only valid up to level 2, which means that we found a violation. Both the dashed and the solid data flow transport credit card data to the book flight process. As we will explain in Section 6, we introduce a notation to clearly state that both flows are alternatives (F2), which means that exactly one of these flows has to be chosen. As we will explain in Section 7, the analysis systematically explores all possible combinations of data flows transporting labels, which addresses the corresponding challenge Ch1.
In the presented running example, the violation is easy to spot, but in more complex systems, finding all possible sources and properties of incoming data is challenging. Using the sketched analysis can help designers identify issues in software designs and correct them before implementation starts. In the running example, a designer has to ensure that data that has not been declassified never arrives at the book flight process, which is achieved by removing the faulty data flow. Programmers later have to adhere to this specification and ensure that data always passes through a declassification operation.

Syntax of extended data flow diagram
In order to realize the identified missing features of DFD semantics described in Section 4.1, we have to extend both the syntax and the semantics of DFDs. Extending only the semantics is not sufficient because we need additional information to resolve ambiguities such as the handling of multiple inputs and outputs. In this section, we introduce the syntactical DFD extensions that support the definition of semantics discussed in Section 7. An overview of the syntax is given by the metamodel in Fig. 4. Gray elements are DFD elements as introduced by DeMarco (1979). Non-filled elements are the extensions introduced in this article. As part of the following descriptions, we relate the syntax to the metamodel used in our previous work as well as to closely related approaches (Tuma et al., 2019; van den Berghe et al., 2018).
Node Characteristics (F1). To cover relevant properties of nodes, we introduce typed characteristics. Strong types are beneficial because they make identifying and matching properties possible. Sets of discrete values can represent relevant properties such as roles or classification levels. We call such a discrete value a Label. An Enumeration builds an ordered set of corresponding labels.
Analyses can make use of the order, e.g. to determine dominance between labels. In our running example, the security levels are an enumeration of ordered labels. Labels with a higher index in such a list dominate labels with a lower index. A CharacteristicType is the type of a property with a value range given by an enumeration. In our running example, the clearance and classification are characteristic types referring to the enumeration of security levels. A Characteristic is an instance of a characteristic type selecting a subset of the available labels, which means that these labels apply. Every node can hold multiple characteristics, which means that the selected labels apply to the node. In our running example, we use a number inside the node to visualize a node characteristic. The number indicates a particular label, i.e. clearance level, that has been selected from the characteristic type for clearance levels. In contrast to related work (Tuma et al., 2019; van den Berghe et al., 2018) and to our previous work, labels can be ordered and that order can be used in analyses, as we describe later.
Pins (F2). We introduce the concept of a Pin to clearly specify required data. A pin describes either required input data or output data. The set of all input and output pins describes the interface of the node. The pins are similar to pins in the UML (Object Management Group (OMG), 2017, pp. 444), which also distinguish inputs and outputs. In contrast to the UML, we use one fixed meaning of how data is transferred through pins to simplify their usage. We will see this in the following and in the definition of the semantics for pins in Section 7. Multiple DataFlow edges to an input or output pin represent multiple sources or destinations for the same data, respectively. This concept lowers the complexity while modeling because connecting a new data flow has one clear meaning: an additional flow to an input pin is an alternative flow; an additional flow from an output pin is another forked flow. A new mandatory input or output requires a new dedicated pin, to which the new data flow connects. In Fig. 2, we visualize multiple flows for the same pin by overlapping edges. For instance, the book flight process receives credit card information from two sources when considering the dashed data flow. These two flows are alternatives, so they connect to the same pin. To foster this clear meaning of data flows, all data flows have to go through pins. Compared to related work (Tuma et al., 2019; van den Berghe et al., 2018) and our previous work, pins simplify adding additional, alternative flows because the flow just has to be added instead of being integrated into existing behavior specifications of the node. Without pins, it was necessary to duplicate the specification of processing effects for these additional flows and to define the order in which these flows shall be considered.
Data Processing Behavior (F3). We describe the data processing behavior of nodes by BehaviorDefinitions. A behavior definition is meant to be reusable to reduce the specification effort required from a security expert. In our running example, the behaviors Declassify, Forward and Join are behavior definitions shared between the various processes. In Fig. 2, the letters in the processes indicate the reused behavior definition. Such a definition consists of input and output pins as well as Assignments of labels to output pins. A Term specifies whether a label shall be assigned. It can refer to labels of input pins or nodes as well as to constants. The set of assignments specifies the label propagation function. Our previous work provides neither means to specify types of behavior specifications nor means to reuse them. Related work provides fixed types of behavior definitions (Tuma et al., 2019) or means to specify types (van den Berghe et al., 2018). Not considering types complicates the interaction between designers and security experts because security experts have to inspect every node in the DFD instead of only providing a few behavior types.
Actor and Store Behavior (F4). To cover the behavior of actors and stores, we apply BehaviorDefinitions to these node types as well. Stores act like forwarding processes, i.e. they redirect all labels from the input to the output. Because we do not represent time or state in the model and systematically consider all possible incoming flows into the store, the forwarding behavior fits the semantics of a store that saves data and emits it unchanged. In our running example, the Flight Storage emits the same flights as the flights entered by the FlightPlanner. Actors usually use behaviors specific to them that cannot be reused. Additionally, we add the ActorProcess to describe complex data processing done by actors. These processes act like regular processes and can reuse behavior definitions but act on behalf of the actor. Consequently, the node properties, i.e. characteristics, of the actor also apply to these processes. In our running example, the select flight process is an actor process because the user manually selects a flight from a list. In contrast to related work (Tuma et al., 2019; van den Berghe et al., 2018), we represent actors and stores with dedicated elements, as we already did in previous work. Additionally, we clearly distinguish the behavior of the system from the behavior of actors. This is beneficial because developers can distinguish parts to develop from parts that only describe usage.

Semantics of extended data flow diagram
In the previous section, we introduced extensions to the DFD syntax to ease defining unambiguous semantics, which we introduce in this section. We define the semantics of the extended DFD by mapping it to clauses in first-order logic. We chose to formalize the semantics in first-order logic using Prolog because Prolog provides comprehensive capabilities of exploring all possible data flow paths, which we will discuss later. We explain the semantics in three steps: In Section 7.1, we recap foundational knowledge about Prolog. Section 7.2 explains how to map DFD elements to clauses in first-order logic. Afterwards, Section 7.3 discusses the resulting semantics of the logic program.

Foundations on Prolog
Analyses presented in this article rely on the semantics given by a transformation from the DFD into a logic program written in Prolog (Bramer, 2013). Prolog is an established logic programming language that requires a programmer to specify the knowledge to solve a problem rather than the procedure. A Prolog program consists of clauses (Bramer, 2013, pp. 13), which can be facts or rules. Facts such as the ones shown in lines 1 and 2 of Listing 1 are always true. A rule is only true if all of the terms of its body are true. In Listing 1, line 3 is the head of the rule and lines 4 and 5 are the body terms. Terms are constants, variables, lists and compound terms. Compound terms consist of a name and arguments, which are also terms. Facts and the head of a rule are compound terms. By convention, variable names always start with an upper-case letter while constants are lower case. Quoted strings and numbers are constants as well. Lists are denoted by square brackets. Empty brackets denote the empty list as shown in line 6. In Prolog, rules are given as Horn clauses, i.e. the conjunction of terms in the body implies the term in the head. From a procedural point of view, the rule in lines 3 to 5 can be read as follows: in order to prove chases(X,Y), prove cat(X) first and bird(Y) second. Queries ask the program to find answers to a question. A query is a list of goals that Prolog interpreters try to solve. Goals and terms in the body of rules can be connected with a logical conjunction (,) or a logical disjunction (;). Negation (\+) is also possible but does not have exactly the same meaning as negation in boolean logic (Kifer and Liu, 2018, pp. 17). Terms in queries can contain variables, for which the interpreter finds values that make all goals true. Informally speaking, Prolog interpreters find all instantiations of variables that make all goals true, which means that they can be deduced from the facts and rules. Selection Rule Driven Linear Resolution for Definite Clauses (SLD) (Nugues, 2006, pp. 447) is the most commonly used resolution process for finding variable bindings in Prolog, but detailed knowledge about that process is not required for the remainder of this article.
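Listing 1 itself is not reproduced in this excerpt. Based on the description above, it might look as follows; the constants tom and tweety and the fact in line 6 are illustrative assumptions, only the predicates cat, bird and chases are given by the text:

```prolog
cat(tom).                 % line 1: a fact, always true
bird(tweety).             % line 2: a fact, always true
chases(X, Y) :-           % line 3: head of the rule
    cat(X),               % line 4: first body term
    bird(Y).              % line 5: second body term
caught(tom, []).          % line 6: a fact containing the empty list []
```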

Mapping to logic program
In this section, we describe the mapping from the extended DFD syntax to clauses in first-order logic formulated in Prolog. To keep things simple, we focus on the fundamental principles and omit implementation details such as the helper clauses that are always added as a preamble to the mapping result. Additionally, we use simple identifiers instead of the unique identifiers that would be hard to read in our examples. The full specification of the transformation is given by a model-to-model transformation in our data set (Seifermann et al., 2021a).
DFD Nodes. First, we map the DFD nodes Actor, Store and Process, which DeMarco (1979) introduced, to clauses. Fig. 5 illustrates the mapping logic for these nodes and others that we describe later. The clauses only state that an element of the specified type exists with the given unique identifier. For instance, a store becomes a store clause with its identifier given as argument. Defining that elements exist is necessary to establish relations and specify further details such as behaviors as we will see later. For every node, we create one clause.
Actor Behavior (F4). An ActorProcess represents one activity that an actor performs. A set of such processes represents all activities done by an actor. The mapping of actor processes consists of two steps: first, we treat an actor process like a regular process, which means we generate a process clause as described before.
By doing so, we can reuse all the logic for describing behaviors of nodes. Additionally, we do not need special logic for handling actor processes during label propagation. Second, we introduce an additional clause actorProcess stating that a process with given identifier belongs to an actor with a given identifier. This is necessary to find all activities of an actor. The clauses are visualized in Fig. 5.
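For a part of the running example, the generated node clauses could look as follows. The identifiers are illustrative and stand in for the unique identifiers generated by the transformation:

```prolog
% Node clauses for (a part of) the running example.
actor(user).
store(flightStorage).
process(bookFlight).
% selectFlight is an actor process: a regular process clause plus an
% actorProcess clause relating it to the actor it acts on behalf of.
process(selectFlight).
actorProcess(selectFlight, user).
```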
Multiple Inputs (F2). Our extended syntax supports multiple (alternative) inputs via Pins and DataFlows that refer to these pins. For every node, we create one clause for every pin specified in the BehaviorDefinition assigned to the node. We do not represent the BehaviorDefinition itself because its sole purpose is to make assignments and pins reusable. In Fig. 5, input pins are visualized by squares containing the letter i at the border of the node. Output pins are visualized by squares containing the letter o at the border of the node. The pin clauses describe that there is an input or output pin with a given identifier on a node with a given identifier. For every data flow, we create one clause dataflow with a unique identifier as the first argument. The next two arguments describe the source node and the corresponding output pin. The last two arguments describe the destination node and the corresponding input pin.

Node Characteristics (F1). Before we can map node characteristics, we first have to map the available types of characteristics. Characteristic types are also mapped to clauses stating their existence. As shown in Fig. 6, we create one clause characteristicType for every characteristic type stating that there is a characteristic type with a given identifier. We do not represent enumerations because they only provide means for reusing labels while modeling. Instead, we create one clause characteristicTypeValue for every label transitively referenced by a characteristic type. The first argument specifies the characteristic type, the second argument specifies the label and the last argument specifies the index of the label in the enumeration. Naming the characteristic type and the label is necessary to establish a relation, i.e. to state that a certain label is a valid label for a certain characteristic type.
A label is only unambiguous if it is used together with a characteristic type because a label can be reused in various characteristic types and can, therefore, have different meanings: in our running example, the meaning of the User level is different when used as classification or as clearance. Representing the index is beneficial because label comparison functions can refer to the order of the labels via that index. Characteristics applied to a node are also represented by one clause for every label within a characteristic. In the example in Fig. 6, the clearance level User is applied to the actor User. To this end, we create one clause nodeCharacteristic, which states that the node User (given as first argument) has the label User (given as third argument) of the characteristic type Clearance (given as second argument) applied.
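Continuing the illustrative facts for the running example (again with made-up identifiers rather than the generated unique ones), pins, data flows and characteristics could be represented like this:

```prolog
% Pins of the book flight process.
inputPin(bookFlight, ccdIn).
outputPin(bookFlight, bookingOut).
% Two alternative flows of credit card data ending at the same input pin:
% the declassified flow and the dashed flow circumventing declassification.
dataflow(f1, declassifyCcd, ccdOut, bookFlight, ccdIn).
dataflow(f2, user, ccdOut, bookFlight, ccdIn).
% A characteristic type with its ordered labels (the index gives the order).
characteristicType(clearance).
characteristicTypeValue(clearance, user, 0).
characteristicTypeValue(clearance, airline, 1).
% The clearance level User applies to the actor User (cf. Fig. 6).
nodeCharacteristic(user, clearance, user).
```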
Node Behavior (F3). In the syntax, the node behavior is given by a sequence of assignments of truth values to boolean variables. The boolean variable on the left-hand side defines whether one particular label, i.e. the tuple of characteristic type and label, is available at one particular output pin. The truth value on the right-hand side can refer to labels on input pins, logic operations and constants. If no assignment specifies a truth value for a label, the default is that it is not available (false). The sequence of assignments represents the label propagation function. Representing labels as boolean variables is beneficial because first-order logic supports boolean variables and boolean expressions very well. In the following, we explain how we represent boolean variables and how we map assignments.
We create one characteristic clause holding six arguments that represents a truth value for every label of a characteristic type on an output pin. Particular examples of these clauses are shown in Fig. 7. The first two arguments identify the node and the output pin. The next two arguments identify the characteristic type and the label. The following two arguments are a flow tree S and a set of already visited flows VISITED.
Roughly speaking, the flow tree contains the data flows connecting all transitive predecessors of a certain node. The leaves of the tree are always data flows from nodes without incoming data flows.
There can be multiple trees for one node. If the characteristic clause evaluates to true, the label is available at the output pin for a particular flow tree and a particular set of visited flows. The flow tree is necessary to identify the data flows and nodes that lead to a violation. Without this information, identifying the issue that led to a violation would be hard. The set of visited flows prevents evaluation cycles in DFDs containing cycles. We explain both concepts (flow tree and visited flows) in more detail in Section 7.3.
Assignments describe when a label shall be available. The list of assignments contained in a BehaviorDefinition is ordered because an assignment that is defined later can override the effect of an assignment defined previously. Terms, which specify the right-hand side of an assignment, cannot refer to labels on the output pins, which means they cannot refer to the boolean variables that the assignments change. Therefore, the final truth value for a label on an output pin is always determined by exactly one assignment, which does not depend on any previous assignment in the list. To simplify the mapping, we only consider that particular single assignment for building the body of the characteristic rule for the particular characteristic type and label. There is no point in representing assignments other than this so-called last applicable one because they do not affect the result of the label propagation. The mapping transforms the Term on the right-hand side of the assignment to clauses in the rule body of the characteristic clause. Constants such as the ones shown in Fig. 8 can be mapped to truth values. References to input labels are mapped to a characteristic clause referring to a label on an input pin, as the mapping of the forwarding behavior in Fig. 7 demonstrates: the label User shall be applied to the output pin if it is available on the input pin, which can be checked by the characteristic clause for the input pin (third line). To ensure traceability of the results, it is necessary to keep track of the data flows that have been considered while calculating the label, i.e. the flow tree. The inputFlow clause selects a data flow F0 that shall be considered when determining the label for the input pin. F0 will become the first flow in the flow tree.
For the sake of brevity, we omit the implementation details of the inputFlow clause and the characteristic clause for input pins but refer to our data set (Seifermann et al., 2021a) for the programs containing the full implementation.
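Putting the pieces together, a generated characteristic rule for a node with Forward behavior might look roughly as follows. This is a simplified sketch: the exact argument order follows the description of the six-argument characteristic clause, but the name of the input-pin variant (here characteristicIn) and the signature of inputFlow are assumptions, and the helper clauses remain omitted as stated above.

```prolog
% Sketch of a generated rule forwarding the label User of the
% characteristic type Clearance from input pin ccdIn to output pin ccdOut.
% inputFlow/4 and characteristicIn/7 stand in for the omitted helper clauses.
characteristic(bookFlight, ccdOut, clearance, user, [F0|S], VISITED) :-
    inputFlow(bookFlight, ccdIn, F0, VISITED),
    characteristicIn(bookFlight, ccdIn, clearance, user, F0, S, VISITED).
```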

Semantics of logic program
The goal of the clauses resulting from the previously defined mapping is to formalize data transmission and data processing by means of label propagation. Queries comparing propagated labels with expected labels prescribed by confidentiality requirements can identify violations, as we show in Section 8. In the following, we explain the meaning of the previously introduced clauses in an informal way because a full formal discussion would require explaining all used helper clauses in detail, which is not possible within a reasonable amount of space. Instead, we just explain the effect of these helper clauses. The full specification is available in the logic programs in the data set (Seifermann et al., 2021a). The underlying semantics for interpreting the logic programs is given by the SLD resolution algorithm (Nugues, 2006, pp. 447) for first-order logic programs. Later, queries will also use the Prolog-specific all-solution predicates findall and setof (Nugues, 2006, pp. 470). The algorithm and the all-solution predicates have well-known and established semantics for first-order logic programs.
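As an illustration, a violation query over the propagated labels could be phrased as follows. The predicate characteristicIn and its argument list are hypothetical simplifications of the clauses in our data set; only findall, nodeCharacteristic and characteristicTypeValue are used as introduced before:

```prolog
% Hypothetical query: collect nodes N that receive data whose
% classification index exceeds the node's clearance index.
?- findall(N,
       ( nodeCharacteristic(N, clearance, Clr),
         characteristicTypeValue(clearance, Clr, ClrIdx),
         characteristicIn(N, _Pin, classification, Lvl, _F, _S, []),
         characteristicTypeValue(classification, Lvl, LvlIdx),
         LvlIdx > ClrIdx
       ),
       Violations).
```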
The majority of clauses have quite simple semantics: they state that an element of a certain type exists with a certain identifier. Additionally, some of the clauses described in the following establish relations between elements. The only clauses having complex semantics are the clauses covering node behaviors. We describe all clauses in the following.
DFD Nodes. The semantics of the clauses representing nodes is straightforward: the clauses for various node types simply mean that an element of the named type exists with a given identifier.
For instance, the meaning of process(N) is that there exists a process with identifier N.
Actor Behavior (F4). The clauses representing actor processes only describe existence: The meaning of actorProcess(N, A) is that there exists an actor process with an identifier N that belongs to an actor with identifier A.
Multiple Inputs (F2). The clauses representing and involving pins only describe existence: The meaning of inputPin(N, PIN) is that an input pin PIN exists at the node N. The meaning of outputPin(N, PIN) is that an output pin PIN exists at the node N. The data flow clause states that a data flow from a source to a destination exists. dataflow(F, N_S, PIN_S, N_D, PIN_D) means that there exists a data flow F originating from pin PIN_S of node N_S and going to pin PIN_D of node N_D.
Node Characteristics (F1). The clauses covering characteristic types describe the existence of these types: The meaning of characteristicType(CT) is that a characteristic type CT exists. The meaning of characteristicTypeValue(CT, V, I) is that the characteristic type CT contains a label V at index I.
The clauses representing node characteristics introduce a relation between a node and a label. nodeCharacteristic(N, CT, V) means that the label V belonging to a characteristic type CT applies to node N.
Node Behavior (F3). Node behaviors describe the label propagation functions of nodes. The previously described clauses define the structure of a DFD as a directed graph of nodes and edges. Together, they build a label propagation network. The semantics of the label propagation are given by the characteristic clauses for input and output pins. We decided to realize the label propagation as a label lookup to reduce the effort of considering multiple combinations of data flows. If we are only interested in the labels of one particular node, it is more efficient to follow data flows in reverse order. We can stop following data flows as soon as the label cannot be changed anymore. This is the case if an assignment only involves a constant because previous labels would be overridden by that constant assignment anyway. Therefore, we only consider nodes that actually change the labels. In contrast, a forward propagation would require us to evaluate all nodes and combinations of alternative data flows because we do not know yet whether the labels propagated by a node will eventually influence the labels of interest. This is costly in the presence of alternative data flows. Besides the label lookup, we already identified further means for improving the performance of the analysis in a student's thesis (Kunz, 2018). These optimizations, however, increase the complexity of the mapping to the logic program as well as of the logic program itself and are, therefore, subject to future research. Because we did not experience performance issues in non-synthetic systems and, especially, not in the realistic systems of our evaluation, we did not include these optimizations for the sake of comprehensibility.
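The backward lookup can be illustrated with a small sketch (not the authors' Prolog implementation; node names and the behavior encoding are hypothetical): the traversal follows data flows in reverse and stops as soon as a constant assignment fixes the label.

```python
# Illustrative sketch of backward label lookup: a constant assignment
# overrides anything upstream, so the reverse traversal can stop there;
# a forwarding behavior requires looking at the predecessor nodes.

def label_at(node, label, behaviors, flows_into):
    """behaviors[node][label] is ('const', bool) or ('forward',);
    flows_into[node] lists the predecessor nodes of node."""
    term = behaviors[node].get(label, ("const", False))
    if term[0] == "const":
        return term[1]  # constant: no need to look further upstream
    # forwarding: the label is present iff some incoming flow provides it
    return any(label_at(pred, label, behaviors, flows_into)
               for pred in flows_into.get(node, []))

# Hypothetical three-node chain: user -> process -> store
behaviors = {
    "user": {"Confidential": ("const", True)},   # user emits the label
    "process": {"Confidential": ("forward",)},   # process forwards it
    "store": {"Confidential": ("forward",)},     # store forwards it
}
flows_into = {"process": ["user"], "store": ["process"]}
```
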
The characteristic clause for input pins shown in Listing 2 is part of various helper clauses added as a preamble to the mapping result of the previous section. The labels available at an input pin solely depend on the labels available at the output pin that a data flow connects to the input pin. Lines 2 and 3 find a data flow F that connects an output pin PIN_S to the input pin PIN. To avoid evaluation cycles, only data flows not already visited are considered in the next line. In the last line, the truth value of the label V of the output pin PIN_S is just copied. In the same step, the set of visited flows VISITED is extended by the used data flow.
The major benefit of the SLD resolution algorithm used in Prolog is that it can find all possible variable bindings, i.e. all possible labels available via all possible data flow trees, by reevaluating the clause. This is important if there are multiple data flows connected to the same input pin and it also addresses the corresponding challenge Ch1 of systematically considering all possible data flow paths. A label is only available for a certain node and a certain data flow tree. In the running example, the book flight process has two alternative data flows providing credit card details. The reevaluation by Prolog automatically considers both data flows but only the direct flow of credit card details leads to a violation. A data flow tree, as introduced in the section before, can be seen as an acyclic subgraph of the DFD only representing nodes and data flows that potentially affect the labels available at a certain node. There are no alternative data flows contained in such a data flow tree but always exactly one choice for every alternative flow. Therefore, all data flow trees for the book flight process contain either the direct flow or the declassified flow but never both. This is important for identifying the underlying issue of a reported violation.
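The enumeration of alternative data flows that Prolog performs via backtracking can be mimicked in a small sketch (hypothetical pin and flow names taken from the running example; the real analysis relies on SLD resolution over the dataflow/5 facts):

```python
from itertools import product

def flow_trees(incoming):
    """incoming maps each input pin to its list of alternative data flows;
    every yielded tree picks exactly one flow per pin, mirroring the
    backtracking over dataflow/5 facts."""
    pins = list(incoming)
    for choice in product(*(incoming[p] for p in pins)):
        yield dict(zip(pins, choice))

# The book flight process receives credit card details either directly
# or via the declassifying process; each flow tree contains exactly one
# of the two alternatives, never both.
trees = list(flow_trees({"ccd_in": ["ccd_direct", "ccd_declassified"]}))
```
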
Listing 2: Prolog rule for finding labels on input pins.

The characteristic clause for output pins has the same arguments as the clause for input pins, but the body depends on the particular assignments as motivated in Section 7.2. The meaning of constant assignments is that the label is always available (true) or never available (false), independent of the particular data flow tree or visited flows. The meaning of logical operators is equivalent to their intuitive meaning, e.g. the And operator, which is translated to a conjunction, means that both operands have to evaluate to true in order for the result to become true. The meaning of references to node characteristics is that the particular label has to be available at the node, i.e. there has to be a nodeCharacteristic clause for the particular node and label. The meaning of references to characteristics of incoming data is that the particular label has to be available at the referenced input pin, i.e. the characteristic clause for the particular input pin, label and data flow tree has to evaluate to true. Again, Prolog considers all possible data flow trees when looking for labels. The data flow tree S initially consists of one particular data flow for every input pin. The resolution of further clauses extends this data flow tree until it contains all relevant data flows.

Definition and execution of label comparison function
Extended DFDs as described before can be analyzed for violations of access control and information flow requirements. To do that, the automated model transformation described in the section before translates the DFD into a logic program given in Prolog. The label comparison function is a query to the Prolog program. Queries compare labels of received data with labels of other data or nodes. Prolog automatically considers all data paths via backtracking (Ch1), which means that all possible sets of labels that can be found via all possible data flow paths are considered in the comparison. The label comparison function is part of the analysis definition introduced in the approach overview in Section 5. We focus on the label comparison function in this section because we already motivated and explained the other elements of the analysis definition, namely the node properties, the property types used for data and the behavior descriptions. In the following, we recapitulate the Prolog clauses that security experts can use to define label comparisons. Afterwards, we define the query for our running example. For the complete logic program of the running example, please refer to our data set (Seifermann et al., 2021a). We create and discuss further queries as part of our evaluation in Section 10.3.
Security experts define queries using the clauses in Listing 3.
Listing 3: Prolog API to specify comparison functions.

Line 1 gives clauses to find identifier N representing actors, stores or (actor) processes. Line 2 gives clauses for finding identifier PIN of input or output pins. Line 3 gives a clause to find identifier CT of a characteristic type. Line 4 gives a clause to find label identifier CV of characteristic type CT with order number I. Line 5 gives a clause to find a flow tree S consisting of all data flows that potentially contributed labels to pin PIN of node N. The tree chooses exactly one data flow at any pin having multiple, alternative data flows. Line 6 gives a clause to check whether node N is visited when following flow tree S. Line 7 gives a clause to find label CV of characteristic type CT that is active on node N. Line 8 gives a clause to find label CV of characteristic type CT that is present on pin PIN of node N when choosing flow tree S. Please note that the clause given in line 8 is a shorthand for the characteristic clause introduced in the previous section that uses an initially empty list of already visited data flows. This is reasonable because no flows have been visited yet when a label lookup starts at one particular node. Queries are tailored to policy types such as RBAC policies or non-interference policies. A policy is a set of confidentiality requirements. A policy type prescribes the structure of confidentiality requirements that may be used in a policy. Therefore, the first step is to define a violation in the context of the policy type. An information flow policy with totally ordered levels is violated if someone with clearance l_clear accesses data with a classification l_class higher than the clearance, i.e. l_clear < l_class. In terms of our semantics, we have to find and compare the clearance label of a node with the classification label of its input pins. In the second step, we encode this detection rule by the query shown in Listing 4.
In line 1, we determine the clearance level V_CLEAR of a node P. Line 2 determines the position N_CLEAR of the clearance level V_CLEAR in the enumeration. We defined the levels in ascending order, so a level with a lower index is semantically lower than a level with a higher index. Line 3 finds an input pin PIN for node P. The classification level V_LEVEL is determined in line 4. The order number N_LEVEL of the classification level V_LEVEL is determined in line 5. Line 6 tests for l_clear < l_class.
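The detection rule can be illustrated in Python (a simplified sketch with hypothetical level and node names; the actual check is the Prolog query in Listing 4): a violation exists if the index of a node's clearance level is below the index of the classification of data arriving at one of its inputs.

```python
LEVELS = ["Unclassified", "Classified", "Secret"]  # ascending total order

def violations(clearance, incoming_classification):
    """clearance: node -> clearance level;
    incoming_classification: node -> classification of data on its input pin."""
    found = []
    for node, clear in clearance.items():
        classification = incoming_classification.get(node)
        # a violation exists iff l_clear < l_class w.r.t. the level order
        if classification is not None and \
                LEVELS.index(clear) < LEVELS.index(classification):
            found.append(node)
    return found
```
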
The DFD as modeled in Fig. 2 does not contain an information flow violation when not considering the dashed edge. Considering the dashed edge, we detect the two violations shown in Listing 5. The first result in line 1 detects that the Booking Storage receives data on its input that is classified higher than its clearance. The same violation is detected for the process booking in line 2. In both cases, we can find the cause of the violation in the data flow tree S, which contains ccd direct. This data flow directly transfers the credit card data without declassification, which causes the violation. Therefore, we can trace back both violations to the introduced issue given by the ccd direct data flow.
Specifying queries requires security expertise. However, designers do not need this competence. They can reuse defined queries from an existing analysis definition that also contains characteristic types, characteristics of nodes and behavior definitions. Security experts can create these reusable elements and put them in a catalog structured by the particular policy types. After that, designers can select the elements and make use of them without the need for security expertise. For instance, the query presented in this section does not depend on particular levels. Therefore, it is applicable to information flow policies consisting of arbitrary totally ordered levels. How many elements of analysis definitions for such policy types are reusable depends on how tailored they are to the use case. For instance, the clearance and classification levels defined for our running example are tailored to the example, so they are reusable but require renaming to fit another system.

Tool support
We realized the previously presented concepts to show that an implementation is feasible (Böhme and Reussner, 2008 call this a level 0 validation) and to support the evaluation described in Section 10. This article is not meant to be a technical report, so we only briefly report on our tooling. Our data set (Seifermann et al., 2021a) gives more details about the tooling. The full implementation is available in various projects on GitHub, which we describe in the following.
First of all, we realized all metamodels 1 described in this article in the Eclipse Modeling Framework (EMF) (Steinberg, 2009) and defined appropriate invariants to specify the well-formedness of DFDs in more detail. For instance, invariants ensure that a data flow always originates from an output pin and leads to an input pin, which both must not belong to the same node. Using EMF automatically provides us with ready-to-use editors. The metamodel projects on GitHub also contain an enhanced graphical editor that adopts the classic DFD syntax. Designers can reuse elements defined for particular policy types by referencing catalog models.
To automate the detection of violations, we realized the mapping to the Prolog program as model-to-model transformation in Xtend (Bettini, 2016) and implemented an adaptor 2 to run Prolog interpreters. The transformation has about 480 LLOC in total, which includes about 130 LLOC for adding the static preamble to the logic program. LLOC covers all lines containing a statement. We also created a metamodel 3 for Prolog programs as well as a model printer to serialize the program and a model parser to parse results based on Xtext (Bettini, 2016). The query is executed in the commonly used SWI Prolog interpreter (Wielemaker, 2017) that we connected to our prototype via the implemented adaptor. Therefore, users do not have to interact with the interpreter directly.
We decided to specify the analysis directly in Prolog because the resulting specification is self-contained and executable. It is easy to find the concepts introduced as part of the semantics definition in Section 7 within the analysis program, so there is no gap in abstraction. We could also have used existing model checking approaches (González and Cabot, 2014) but this would not free us from a model transformation into a particular formalism or at least a special encoding of the logic to discover multiple data flow paths (Ch1).
To ease writing Prolog queries, we developed a Domain-specific Language (DSL) (Hahner et al., 2021) that is capable of formulating common queries without the need to adhere to the Prolog syntax or even be aware of Prolog. When formulating the query with the DSL, it is also possible to process the interpreter result directly and report the detected violations in terms of the DFD, which is known to the designer. The prototype of the DSL is still under development and not ready to use yet, so we did not use or evaluate it as part of this article.

Evaluation
In this section, we evaluate our aforementioned contributions. We present our evaluation goals and metrics in Section 10.1. The evaluation design is described in Section 10.2. In Sections 10.3, 10.4 and 10.5, we discuss results. We discuss threats to validity in Section 10.6 and limitations in Section 10.7. We report on the availability of evaluation data in Section Data Availability.

Evaluation goals and metrics
We structure our evaluation according to the Goal-Question-Metric methodology (Basili and Weiss, 1984; Basili et al., 1994). We formulate three evaluation goals.
(G1) Evaluate the expressiveness of our syntax and semantics to represent and analyze systems using information flow and access control.
(G2) Evaluate the reusability of DFDs when switching confidentiality mechanisms.
(G3) Evaluate the accuracy of confidentiality analyses realized with our semantics.
We evaluate expressiveness, reusability and accuracy. Expressiveness describes what confidentiality mechanisms our approach can express. We want to evaluate expressiveness to see whether the approach supports information flow and access control (Ch2).
The evaluation of expressiveness also shows that we addressed the challenge of enabling custom analyses (Ch3) because we do not limit ourselves to predefined confidentiality mechanisms and analyses but use extensions to cover confidentiality mechanisms. We evaluate the reusability of DFD parts when switching confidentiality mechanisms to show that our approach reduces the number of elements that have to be recreated. This was one motivation for developing the extended DFD syntax covering information flow and access control within one modeling language (Ch2). We evaluate accuracy because expressiveness and reusability are only useful if the resulting analyses have satisfying accuracy, which means designers can identify violations. To provide accurate analyses, it is necessary to systematically consider all possible data flow paths, i.e. combinations of these data flows. Otherwise, violations might not be discovered. Because the evaluated system designs contain multiple data flow paths, evaluating the accuracy of the analyses is appropriate to show that we addressed the challenge of considering all data flow paths (Ch1).
To evaluate G1, we ask the following evaluation questions: To answer the questions for G1, we use the syntactic quality metric s = |R ∩ E| / |R| as defined by Boyd et al. (2005) for rating the quality of constrained natural languages. The metric is also usable for rating a DSL (Munnelly and Clarke, 2008), which fits the DFD metamodel presented in this article. In our context, a language requirement r ∈ R is a DFD or analysis query that we would like to express. The set of expressions E contains every possible DFD or analysis query that can possibly be constructed using our artifacts. The metric value ranges from zero (no DFD or analysis query could be expressed) to one (all DFDs or analysis queries could be expressed).
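The metric can be computed as follows (a sketch with illustrative case labels; in the evaluation, R is the set of cases we attempt to model and R ∩ E the subset we could fully model):

```python
def syntactic_quality(required, expressible):
    """Syntactic quality s = |R ∩ E| / |R| over sets of requirements."""
    return len(required & expressible) / len(required)

# 15 attempted cases of which 14 could be modeled (illustrative labels):
required = {f"case{i}" for i in range(1, 16)}
modeled = required - {"case15"}  # one case not expressible
# syntactic_quality(required, modeled) -> 14/15
```
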
To evaluate G2, we ask the following evaluation question: (Q2.1) How many DFD elements can be reused when switching between confidentiality mechanisms?
To answer Q2.1, we calculate the Jaccard similarity coefficient j = |M ∩ N| / |M ∪ N| (Levandowsky and Winter, 1971) between the models (M and N) of the cases that represent the same system but use different confidentiality mechanisms. A model is defined as a set of model elements, i.e. instances of metaclasses. A model element m ∈ M is equal to a model element n ∈ N if the type and all properties of the model elements are equal. We determine this equality of model elements by applying EMFCompare (Brun and Pierantonio, 2008): First, we match model elements by their identifiers. Afterwards, we compare their properties. A reference to another model element is such a property. References are considered equal if they refer to equal model elements. The coefficient is simple but is a good measure of the number of unchanged model elements and, consequently, also of the number of elements that have to be changed when switching the used confidentiality mechanism in a system design. The metric value ranges from zero (every element is different and has to be recreated) to one (the models are equal and nothing has to be recreated). The Jaccard coefficient is appropriate for rating the similarity of software design models, as we have shown in previous work (Heinrich, 2020; Monschein et al., 2021).
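The coefficient on sets of model elements can be computed as follows (an illustrative sketch; in the evaluation, the element sets result from EMFCompare matches, not from string fingerprints):

```python
def jaccard(m, n):
    """Jaccard similarity j = |M ∩ N| / |M ∪ N| of two models,
    each given as a set of (hashable) model-element fingerprints."""
    return len(m & n) / len(m | n)

# Two hypothetical models sharing two of four distinct elements:
# jaccard({"actor:User", "process:book", "store:Booking"},
#         {"actor:User", "process:book", "store:Audit"}) -> 2/4 = 0.5
```
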
To evaluate G3, we ask the following evaluation question: (Q3.1) What is the accuracy of the analyses?
To answer Q3.1, we apply the commonly used metrics precision p = tp / (tp + fp) and recall r = tp / (tp + fn) with the number of true positives tp, false positives fp and false negatives fn. We describe the classification of results as tp, fp or fn in the evaluation design. We intentionally do not evaluate usability or correctness of the modeling and analysis approach. Usability is usually evaluated in user studies that evaluate the tool support and the concrete syntax used for modeling. We neither aim at evaluating our implementation nor a particular concrete syntax because neither is a contribution of this article. We do not verify correctness because this would not provide insights into the application of our approach and how well the approach addresses the challenges (Ch1, Ch2, Ch3). Instead, a case study provides such insights in the context of realistic systems, which is the objective of this article.
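The two metrics are straightforward to compute (the counts below are hypothetical and not the evaluation results):

```python
def precision(tp, fp):
    """Fraction of reported cases that are valid: p = tp / (tp + fp)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual issues that are reported: r = tp / (tp + fn)."""
    return tp / (tp + fn)

# Hypothetical counts: 9 true positives, 1 false positive, 3 false negatives
# precision(9, 1) -> 0.9, recall(9, 3) -> 0.75
```
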

Evaluation design
Evaluations based on case studies are the second most common evaluation approach for security notations, after merely illustrating how to use notations and analyses, as van den Berghe et al. (2017a) point out. Especially with respect to expressiveness and reusability, a detailed discussion of established cases provides more insight than a generic discussion about hypothetical systems. Therefore, we evaluate the proposed syntax and semantics based on a case study. We select cases from related work (Tuma et al., 2019; Katkalov, 2017) and from one of our previous publications, or define new cases if there are no appropriate cases available. A case is a pair of a system design and confidentiality requirements. In the following, we discuss the evaluation design per evaluation goal before we discuss cases and their selection.
Expressiveness. For evaluating expressiveness, we model the system design as DFD and the corresponding analysis query using our semantics for each case. The procedure described in the following is the same for both access control and information flow control. (1) We identify relevant data and node properties, i.e. the labels and the corresponding characteristic types. (2) We identify relevant behavior descriptions including the label propagation rules.
(3) We model the system design as DFD by using the behaviors defined before. (4) We define the analysis query for identifying violations. After step 3, we have finished modeling the system design, so we can calculate the syntactic quality metric and answer Q1.1 and Q1.2. A requirement as specified by the metric is a thing that shall be expressed by a modeling language. In our evaluation, one case is one requirement, i.e. a thing to be expressed according to the definition of syntactic quality by Boyd et al. (2005). This means the DFD metamodel has to be capable of representing the whole case; otherwise, the case is counted as not expressible. After step 4, we have finished the analysis definition, so we can calculate the syntactic quality metric and answer Q1.3 and Q1.4.
We build weighted sums while calculating the syntactic quality metric. The weighted sums normalize the influence of cases that use different system designs but share the same analysis type. Without such a normalization, a single case using an unsupported analysis type can be hidden by a group of cases sharing the same, well-supported analysis type. For instance, the information flow cases TravelPlanner, DistanceTracker and ContactSMS from related work (Katkalov, 2017) share the same analysis definition representing noninterference with declassification using totally ordered security levels. If our approach supports this analysis type well but does not support another analysis type that is only used by one case, the value of the syntactic quality metric would be 3/4. However, we are especially interested in the support of confidentiality mechanisms. Therefore, the metric value using a weighted sum, ((1+1+1)/3) / ((1+1+1)/3 + 1) = 0.5, would be more appropriate. As illustrated, we group cases by their type of analysis definition. We sum up the number of fully modeled cases and divide this sum by the number of cases in the corresponding group. Eventually, we sum up all of these weighted sums and divide the result by the number of different types of analysis definitions.
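The weighted calculation from the example can be sketched as follows (illustrative group sizes: one well-supported analysis type with three cases and one unsupported type with a single case):

```python
def weighted_quality(groups):
    """groups: list of (modeled_count, total_count) per analysis type;
    each group contributes its fraction of modeled cases, and the group
    scores are averaged over all analysis types."""
    return sum(m / t for m, t in groups) / len(groups)

# Three supported cases of one type and one unsupported case of another:
# ((1+1+1)/3 + 0/1) / 2 = 0.5, instead of the unweighted 3/4.
```
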
Reusability. To answer Q2.1, we identify cases that are about the same systems but that use different confidentiality mechanisms. This applies to the cases using the previously mentioned systems TravelPlanner, DistanceTracker and ContactSMS: For each system, there exists one case using RBAC and another case using noninterference with totally ordered levels. For every such pair of cases, we calculate the Jaccard coefficient by comparing the model elements. We use EMFCompare (Brun and Pierantonio, 2008) to compare the model elements in order to identify equal and unequal model elements. The comparison approach of EMFCompare provides the necessary steps to decide whether two model elements are equal: In a first step, matching elements are identified by comparing their identifiers. This is reasonable because we copied and adjusted the models to switch the confidentiality mechanism. This is also the approach designers would most likely take. In a second step, differences are calculated, which covers all properties of the model elements. As a result, we receive a list of differences. We walk through that list and add all model elements that have been matched and that have no changed property to the set of equal elements M ∩ N. The metric indicates a benefit compared to the state of the art if the value is above 0.
Accuracy. The accuracy evaluation reuses the previously created DFDs and analysis queries. The procedure described in the following is the same for access control and information flow. (1) We identify a way to introduce an issue into the DFD that leads to violations with respect to the defined analysis. We derive the issue from related work or by defining a new issue if no issue is reported in related work. We describe how we did that for every case in the description of the case selection below. (2) We inject the issue into the DFD of the case. The issue is usually introduced by an additional data flow. Therefore, the analysis has to consider multiple data flow paths. (3) We execute the analysis and classify the results.
To calculate the accuracy metrics, we classify the violations that our analysis reports. A reported violation is valid if it traces back to the injected issue. A reported violation is invalid if it does not trace back to the injected issue. Because the DFD does not contain an issue before we inject one, it is reasonable to trace back violations to exactly the one known issue. A violation traces back to an issue if the injected data flow is in the flow tree of the violation. We classify the set of reported violations per case to avoid that large cases with many reported violations for one analysis type hide the violations of smaller cases for another analysis type in the metric. If all reported violations are valid, the case is counted as a true positive tp. If at least one violation is invalid, the result is a false positive fp. Not reporting any violations is a false negative fn.
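The per-case classification rule can be summarized as follows (a sketch; flow trees are represented as plain sets of flow names, and the flow name is illustrative):

```python
def classify_case(reported, injected_flow):
    """reported: flow trees (sets of data flow ids) of all reported violations;
    injected_flow: the data flow that introduced the issue."""
    if not reported:
        return "fn"  # the injected issue was not detected at all
    if all(injected_flow in tree for tree in reported):
        return "tp"  # every violation traces back to the injected issue
    return "fp"      # some violation does not trace back to the issue
```
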
The reason for classifying all violations together is that analyses do not only report one but multiple violations. This is no flaw in our analysis but the logical consequence of propagating data through the system: if data must not be used in one node, the chances are high that it must not be used in following nodes as well. In our running example, the analysis reports two violations: one violation at process booking and one violation at the store, into which the process writes the data. Related work (Katkalov et al., 2013;Tuma et al., 2019) often only discusses why a violation occurs, i.e. the root cause of a violation, but does not discuss individual occurring violations. In contrast, our approach reports violations but no root cause. Again, this is no limitation of our approach because a root cause is a design decision that has to be changed in order to meet confidentiality requirements. Neither our nor other approaches can free software designers from choosing a solution because this is a creative process. Doing this automatically is barely possible. Therefore, we have to bridge the gap between the set of individual violations that our approach yields and the root causes that related approaches report in their publications. We do this by ensuring that every reported violation traces back to the issue we introduced. We already demonstrated how to trace back issues in Section 8.
Case Selection for Information Flow. There are various security models based on information flow, but noninterference is one of the most commonly used models (Sabelfeld and Sands, 2009), which can be extended by declassification to increase its applicability. Related approaches (Tuma et al., 2019; Katkalov, 2017) also use this security model and provide cases including points to insert issues. These cases support our evaluation because they provide data-oriented system descriptions, define information flow requirements based on data, and provide reference results, issues or critical points to inject issues for rating the accuracy of analysis results. All cases consider declassification and are based on real systems. We select all cases presented in the mentioned publications. Katkalov (2017) provides five cases with flow requirements: TravelPlanner, DistanceTracker and ContactSMSManager cover noninterference with declassification using totally ordered security levels (OL). The information flow analysis ensures that no data arrives at a node that has a clearance level lower than the data classification. PrivateTaxi covers fine-grained noninterference rules between nodes and selected data types (LG). BankingApp covers noninterference between tenants of a banking system. The aforementioned cases of Katkalov do not provide reference results in the form of a set of violations or cases containing issues. However, they describe the critical point, i.e. a declassification function, in the design that prevents violations. Therefore, we introduce an issue by circumventing these declassifications. Tuma et al. (2019) provide the four cases FriendMap, Hospital, JPmail and WebRTC that cover noninterference analyses with two security levels (2L). The information flow analysis ensures that no data classified high arrives at a node observable by an attacker. Tuma et al. provide a variant with and a variant without an issue for the cases FriendMap and Hospital. We use both variants, so we do not have to introduce an issue ourselves. For the remaining cases, the critical points in the design, i.e. the declassifications, are available. We introduce an issue into every case by circumventing the declassification.

Table 3: Characteristics of information flow cases (top) and access control cases (bottom) realized in our DFD syntax.

The upper part of Table 3 gives an overview of the size of the cases after realizing them with our syntax. The publications describe the cases in more detail than we can provide in this article, so we refer to the respective publications and our data set (Seifermann et al., 2021a) for detailed descriptions.
Case Selection for Access Control. Access control and corresponding analyses are a wide field. Unfortunately, finding cases that are neither about correctly implementing access control systems nor about designing appropriate requirements is challenging. The related approach FlowUML (Alghathbar et al., 2006) does not provide an evaluation and therefore no cases. In previous work, we provide three RBAC cases derived from the already known cases TravelPlanner, DistanceTracker and ContactSMS by mapping the security levels to roles. The RBAC analysis ensures that every node holds at least one role that a data item requires to grant access. The cases support our evaluation by covering various system designs and providing an analysis covering the core of RBAC. We introduce the same issues in the access control cases that we already introduced in the information flow cases, for the same reasons. There are three further common access control models (Furnell, 2008, pp. 61) for which we could not identify appropriate cases in the literature: DAC, MAC, and ABAC. Therefore, we create one case for each access control model on our own. We use a textbook (Furnell, 2008) that describes the foundational concepts of these models. The cases support our evaluation because they are designed to cover the remaining, most common access control models, which we have to consider to reason about expressiveness. The lower part of Table 3 gives an overview of the size of the cases after realizing them with our syntax. In the following, we describe the cases created by us. Our data set (Seifermann et al., 2021a) contains additional details about the cases.
DAC Case. Discretionary Access Control (DAC) (Furnell, 2008, pp. 61) directly assigns access privileges on objects to the accessing subjects. The case covers these aspects: The DFD describes a system consisting of a storage of family pictures and a system function to read the pictures as illustrated by Fig. A.1. The DFD reflects common usage scenarios of DAC in operating systems or file sharing systems. There are four users: Mother, Dad, Aunt and Indexing Bot. The mother is the owner of the pictures. She grants read access to all but the bot. Consequently, the indexing bot must not access the storage. The introduced issue is that the indexing bot accesses the pictures.
MAC Case. Mandatory Access Control (MAC) (Furnell, 2008, pp. 64) defines mandatory, global rules that aim to avoid unwanted explicit information flows. The military access control model is one of the most prominent examples of MAC. Therefore, we assume that this particular model is a representative example of MAC. Military information systems often use MAC requirements prohibiting access to information classified higher than the user's clearance. The case is about such a system: The DFD describes a system for monitoring the airspace using the military access control model (Furnell, 2008, pp. 65) as illustrated by Fig. A.2. There are three user types: Clerks have the clearance Unclassified. They create and store weather reports. Flight Controllers have the clearance Classified. They register civil planes, look them up in a database and determine new routes for them by considering weather reports. Military Flight Controllers have the clearance Secret. They do the same as the civil flight controllers but for military planes by also considering positions of civil planes. Information about weather is Unclassified, information about civil planes is Classified and information about military planes is Secret. The levels have the total order Unclassified, Classified and Secret. The introduced issue is that a civil flight controller reads military plane information.
ABAC Case. Attribute-based Access Control (ABAC) (Furnell, 2008, pp. 74) describes subjects and objects by attribute descriptors rather than roles or identity. Access control permissions are defined between subject descriptors and object descriptors. The case covers these aspects: The DFD describes a system design for managing customers of a bank with branches in the USA and Asia as illustrated by Fig. A.3. There are Clerks that register customers, look them up and determine credit lines for them. A clerk has the attributes Role and Location. Managers have the same abilities and properties as a clerk but can also register celebrity customers and move customers between branches. Processed information has the attributes Customer Status and Customer Location. The access permissions are defined as follows: Users with a certain location can access information about customers that are in the same location and that are not celebrities. Users that have the role manager can access all information. Any other access is forbidden. The introduced issue is that a manager registers a celebrity as a regular customer.

Evaluation results and discussion of expressiveness
We could successfully model the system designs including properties and behaviors relevant for confidentiality for all but one of the cases mentioned in Table 3. In the following, we discuss the modeling results and examine the reason for reduced syntactical quality. We do not present the resulting DFDs but focus on the used characteristic types, behavior descriptions and the analysis queries because they are the crucial parts that have potential to limit expressiveness. As introduced in Section 5, we refer to the combination of these three things as analysis definition. The full DFDs are available in our data set (Seifermann et al., 2021a). The cases TravelPlanner, DistanceTracker and ContactSMS share the same analysis: non-interference using a totally ordered lattice (OL). The characteristic types are the classification of information and the clearance of nodes. Both use totally ordered security levels. The behavior descriptions are as follows: A Forwarder copies the classifications from input to output. A Store acts like the forwarding behavior. A Joiner merges two inputs into one and classifies the output by the highest of all incoming levels. A Syncer acts like the forwarding behavior but waits for an additional input without considering its classification. A Declassifier explicitly sets the classification of the output. The analysis query is the same as already presented in Listing 4. We could successfully represent all three cases, which includes the system designs and analyses. The presented analysis definition is applicable to all non-interference analyses including declassification that have totally ordered security levels.
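To make the propagation rules behind this analysis definition concrete, the following Python sketch illustrates how a Joiner and the clearance check operate on a totally ordered lattice. This is an illustration only: our semantics are realized as Prolog clauses, and the level names and function names here are placeholders, not part of our approach.

```python
# Illustrative sketch of non-interference checking on a totally ordered
# lattice (OL). Level names are placeholders, ordered from low to high.
LEVELS = ["Low", "Medium", "High"]

def join(a, b):
    """A Joiner classifies its output by the highest incoming level."""
    return max(a, b, key=LEVELS.index)

def is_violation(clearance, classification):
    """A node violates non-interference if it processes data classified
    above its own clearance."""
    return LEVELS.index(classification) > LEVELS.index(clearance)

print(join("Low", "High"))             # -> High
print(is_violation("Medium", "High"))  # -> True
print(is_violation("High", "Medium"))  # -> False
```

A Declassifier corresponds to explicitly overwriting the classification label instead of joining it, which is why circumventing a Declassifier reintroduces the higher level and triggers the check above.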
The cases FriendMap, Hospital, JPMail and WebRTC share the same analysis: non-interference using high/low levels (2L). The characteristic types are the classification of information, the classification of encrypted content and the zone of nodes. The classification characteristic type uses the values high and low. The zone characteristic type uses the values attack and trusted. The behavior descriptions are as follows: Store, Forwarder and Joiner share the semantics already described for the previous cases. The Encryptor always sets the classification of the output to low but attaches the old classification in the classification of encrypted content characteristic. The Decryptor sets the classification of the output to the classification stored in the classification of encrypted content characteristic. The analysis query shown in Listing 6 searches for data with a high classification that arrives on a node P in the attack zone. We could successfully represent all four cases, which includes the system designs and analyses. The presented analysis definition is applicable to all non-interference analyses including declassification by encryption that use two classification levels and only distinguish regular and attacking system nodes or users. The PrivateTaxi case is complex and covers non-interference using lattice groups (LG). It requires a considerable number of characteristic types and behaviors. The characteristic types PublicKeyOf and PrivateKeyOf describe that the information is a public key or a private key of an entity. DecryptableBy describes the entities that can decrypt the encrypted information. Entity describes that a node belongs to an entity. All of these characteristic types use a list of entities as values. The characteristic type CriticalData describes that a data type requiring protection is contained in the information. EncryptedContent describes the content of encrypted information. Both characteristic types use a list of data types as values.
The Store, Forwarder and Syncer behaviors are as explained previously. The Joiner determines the output characteristics by building the union of received labels for each characteristic type except for the decryptable characteristic type, which requires the intersection of labels. The Encryptor stores the critical data type in the characteristic for encrypted content, removes the critical data type characteristic, and sets the decryptable characteristic to the owner of a received public key. The Decryptor inverts the effect of the Encryptor if the decryptable characteristic matches the owner of a received private key. There are two behaviors that declassify data: The Proximity behavior acts like the forwarding behavior but removes the critical data type label for routes because the route cannot be reconstructed from a single-valued metric. The RouteCreator behavior creates routes from a location and a destination. It acts like the joining behavior but explicitly sets the critical data type characteristic to route. The analysis query shown in Listing 7 tests whether either the service for calculating distances has access to contact information or the private taxi service has access to the route. We could successfully represent the case, i.e. the system design and the corresponding analysis. The behaviors to handle encryption are reusable but the characteristic types and the analysis goal are tailored to the case. The reason for this is the explicit reference to nodes in the analysis goal as defined by Katkalov (2017, p. 211).
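The interplay of the Encryptor and Decryptor labels can be sketched as follows. This is a hypothetical Python illustration rather than our actual Prolog clauses: labels are modeled as a dictionary mapping characteristic types to sets of values, and the identifier and entity names are assumptions for the example.

```python
def encrypt(labels, public_key_owner):
    """Encryptor: moves CriticalData into EncryptedContent and marks the
    output as decryptable only by the owner of the received public key."""
    out = {k: set(v) for k, v in labels.items()}
    out["EncryptedContent"] = out.pop("CriticalData", set())
    out["DecryptableBy"] = {public_key_owner}
    return out

def decrypt(labels, private_key_owner):
    """Decryptor: inverts the Encryptor's effect if the private key owner
    matches the DecryptableBy characteristic; otherwise labels stay as-is."""
    out = {k: set(v) for k, v in labels.items()}
    if private_key_owner in out.get("DecryptableBy", set()):
        out["CriticalData"] = out.pop("EncryptedContent", set())
        out.pop("DecryptableBy", None)
    return out

enc = encrypt({"CriticalData": {"Route"}}, "TaxiDriver")
print("CriticalData" in enc)                       # -> False
print(decrypt(enc, "TaxiDriver")["CriticalData"])  # -> {'Route'}
```

The intersection semantics of the Joiner for the decryptable characteristic follow the same intuition: joined data may only be decrypted by entities that can decrypt every input.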
As mentioned before, we could not fully express the BankingApp case. The information flow requirements to be considered in this case are about ensuring that tenants/users of a banking app including the banking backend system do not interfere with each other. For instance, a user must not have access to the balance of another user. While we could represent the system structure consisting of processes, the actor and stores, we could not represent the remaining system aspects such as multiple users of the same type. Consequently, we could also not represent the analysis query. We cannot represent multiple users because the DFD model and the semantics operate on a type-level. However, representing multiple users of the same type requires models and semantics operating on instance-level. We discuss this aspect as part of the limitations in Section 10.7. The access control versions of the cases TravelPlanner, DistanceTracker and ContactSMS share the same analysis: Core RBAC. The characteristic types are AccessRights of data and Roles of nodes. Both use three available roles as values. The behavior types are the same as described for the corresponding information flow cases. The Joiner applies the intersection of access rights of incoming data to the output. The Declassifier copies the access rights including a defined additional access right to the output. The remaining behaviors remain the same. The analysis query illustrated in Listing 8 collects all access rights REQ of a data item, collects all roles ROLES of a processing node, and reports a violation if the intersection between access rights and roles is empty. We could successfully represent all cases, i.e. the system design and the corresponding analysis. The analysis definition can be reused to represent access control scenarios covering static Core RBAC (Furnell, 2008, pp. 71).
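The core of this RBAC check can be illustrated with a small Python sketch. The actual query is the Prolog clause in Listing 8; the role names and function names below are placeholders chosen for the example.

```python
def join_rights(*incoming):
    """The Joiner applies the intersection of access rights of all
    incoming data items to the output."""
    out = set(incoming[0])
    for rights in incoming[1:]:
        out &= set(rights)
    return out

def rbac_violation(access_rights, roles):
    """A violation is reported if the intersection between the required
    access rights REQ and the node's roles ROLES is empty."""
    return not (set(access_rights) & set(roles))

print(join_rights({"User", "Airline"}, {"User"}))   # -> {'User'}
print(rbac_violation({"User"}, {"Airline"}))        # -> True
print(rbac_violation({"User"}, {"User", "Admin"}))  # -> False
```

The Declassifier corresponds to copying the incoming rights and adding one defined right, which widens the set of nodes that may process the data.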
The DAC case covers DAC without delegation of rights. The used characteristic types are the Identity of actors as well as the ReadAccess and Owner of stores. All characteristic types use a set of identities as values. We reuse the Store and Forwarder behavior descriptions that we described previously. The analysis query in Listing 9 detects data received by actors, which comes from a store that has not granted read access to that actor. It uses the flow tree S, as well as the helper clause traversedNode that tests whether the given store STORE is in the flow tree S. We could successfully represent the system design and the corresponding analysis. The involved characteristic types and behavior descriptions are reusable for other DAC cases.
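The essence of the DAC query can be sketched as follows. The actual check is the Prolog query in Listing 9 using the flow tree S and the traversedNode clause; the Python names and the store name below are hypothetical.

```python
def dac_violation(actor, traversed_stores, read_access):
    """A violation occurs if data received by the actor traversed a store
    that has not granted read access to that actor (cf. traversedNode)."""
    return any(actor not in read_access.get(store, set())
               for store in traversed_stores)

# ReadAccess as granted by the owner (the mother) in the DAC case.
read_access = {"PictureStore": {"Mother", "Dad", "Aunt"}}
print(dac_violation("IndexingBot", ["PictureStore"], read_access))  # -> True
print(dac_violation("Aunt", ["PictureStore"], read_access))         # -> False
```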
The MAC case covers MAC with the military access control model. We use the characteristic types Classification of data and the Clearance of nodes. Both characteristic types use an ordered set of security levels. We reuse the previously described behavior descriptions Store and Forwarder. A Joiner propagates the highest classification value of all incoming data items. The analysis query is the same query as already presented in Listing 4 but we restrict the nodes to be checked to nodes directly associated with an actor as shown in Listing 10. We could successfully represent the MAC case, i.e. the system design and the analysis.
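The restriction to actor nodes, as done in Listing 10, can be sketched like this. Again, this is an illustrative Python fragment with hypothetical names, not our Prolog implementation; the levels follow the total order given above.

```python
LEVELS = ["Unclassified", "Classified", "Secret"]  # total order, low to high

def mac_violations(nodes):
    """Reports only actor nodes that receive data classified above their
    clearance; process and store nodes are not checked (Listing 10)."""
    return [n["name"] for n in nodes
            if n["is_actor"]
            and LEVELS.index(n["classification"]) > LEVELS.index(n["clearance"])]

nodes = [
    {"name": "FlightController", "is_actor": True,
     "clearance": "Classified", "classification": "Secret"},
    {"name": "PlaneDB", "is_actor": False,
     "clearance": "Classified", "classification": "Secret"},
]
print(mac_violations(nodes))  # -> ['FlightController']
```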
In the ABAC case, we use the characteristic types CustomerLocation and CustomerStatus to describe attributes of data as well as EmployeeLocation and EmployeeRole to describe attributes of actors. We reuse the previously defined behavior descriptions Store and Forwarder. The Joiner applies the union of all incoming data characteristics to the outgoing data. The LocationChanger acts like the forwarding behavior but sets the location to Asia. The analysis query in Listing 11 encodes the specific requirements of the case. A violation is detected if (i) the location of the actor and the data is not the same and the actor is not a manager or (ii) the data is about a celebrity and the actor is not a manager. We could successfully represent the case, i.e. the system design and the analysis. All behaviors except the location changing behavior are reusable. The analysis query is specific to the ABAC rules and not reusable. However, the flexibility of Prolog allows representing even complex attribute descriptors and relations.
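The two ABAC rules can be encoded compactly. The following hypothetical Python sketch mirrors the logic of the Prolog query in Listing 11; attribute keys and values are placeholders for the example.

```python
def abac_violation(actor, data):
    """Managers may access everything; other users may only access
    non-celebrity customers at their own location."""
    if actor["role"] == "Manager":
        return False
    return (actor["location"] != data["customer_location"]
            or data["customer_status"] == "Celebrity")

clerk = {"role": "Clerk", "location": "USA"}
print(abac_violation(clerk, {"customer_location": "Asia",
                             "customer_status": "Regular"}))    # -> True
print(abac_violation(clerk, {"customer_location": "USA",
                             "customer_status": "Celebrity"}))  # -> True
print(abac_violation(clerk, {"customer_location": "USA",
                             "customer_status": "Regular"}))    # -> False
```

The introduced issue, a manager registering a celebrity as a regular customer, manifests as data whose CustomerStatus label no longer reflects the celebrity attribute, so the second rule fires on downstream clerk accesses.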
As the values of the syntactic quality metric and the corresponding discussion demonstrated, we can represent multiple types of information flow and access control mechanisms (Ch2) in system designs. We integrated the confidentiality mechanisms via extensions rather than predefined behavior descriptions or characteristic types. Because further, custom analyses would be integrated via the same extensions, the evaluation also demonstrated that custom analysis definitions (Ch3) can be integrated without invasive source code extensions.

Evaluation results and discussion of reusability
We calculated the Jaccard Coefficient for the cases covering TravelPlanner, DistanceTracker and ContactSMS to answer Q2.1. For the TravelPlanner system design, the coefficient is j = 89/219 = 0.41. For the DistanceTracker system design, the coefficient is j = 47/98 = 0.48. For the ContactSMS system design, the coefficient is j = 66/123 = 0.54.
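These values follow directly from the definition of the Jaccard Coefficient, j = |A ∩ B| / |A ∪ B|, where A and B are the sets of model elements of the two case variants. The reported coefficients can be reproduced with a few lines of Python using the element counts above:

```python
def jaccard(shared, union_size):
    """Jaccard coefficient: number of shared elements divided by the
    size of the union of both element sets."""
    return shared / union_size

print(round(jaccard(89, 219), 2))  # -> 0.41 (TravelPlanner)
print(round(jaccard(47, 98), 2))   # -> 0.48 (DistanceTracker)
print(round(jaccard(66, 123), 2))  # -> 0.54 (ContactSMS)
```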
The Jaccard Coefficients that we calculated for the three cases TravelPlanner, DistanceTracker and ContactSMS range between 0.41 and 0.54. The larger the value is, the more similar the DFDs are. A value of 0.5 means that the number of shared model elements equals the combined number of model elements unique to each of the two involved models. This is a significant improvement compared to a value of 0, which would be the result of using two dedicated modeling languages of the state of the art for representing two versions of a system. An in-depth look at the individual model elements, i.e. the model elements that are different when using different confidentiality mechanisms, confirms that the structural elements, i.e. the nodes and data flows, are not affected by the switch to another confidentiality mechanism. This means the DFD structure is equal, which is the expected effect of separating the system structure from the confidentiality mechanism in the metamodel. The good metric values show that the chosen modeling approach supports considerable reuse of existing models when switching confidentiality mechanisms.
To give an idea of what these results mean, we would like to explain how we switched the mechanism in the case study. Fig. 9 presents the DistanceTracker case. The upper part shows the DFD extended by properties and behavior descriptions. The lower part shows the particular properties and behaviors. In order to switch the confidentiality mechanism from information flow to RBAC, we adjusted the properties and behaviors of information flow (shown in dark gray) so that they match the RBAC properties and behaviors shown in light gray. This means we had to adjust neither the DFD structure nor the annotations of the DFD (shown as upper case letters). It is also possible to first strip all annotated information, i.e. properties and behaviors, from the DFD, import an existing analysis definition and add new annotations to the DFD, but this implies additional effort for recreating the annotations. Either way, the DFD structure will always remain the same, which means designers can save effort by not recreating the model from scratch.
As the results and the previous discussions show, the proposed extended DFD is capable of representing access control and information flow control mechanisms. Because we did not have to change the modeling language to represent both mechanisms, we successfully addressed challenge Ch2. As the Jaccard Coefficient illustrated, we achieved this not by merging two distinct modeling languages but by using a commonly shared modeling core (the DFD core elements) and extending it by analysis-specific modeling constructs. We represented all confidentiality mechanisms by extensions, which means that these extensions are the foundation of confidentiality analyses that users can define. Therefore, we addressed the modeling aspect of Ch3.

Evaluation results and discussion of accuracy
We executed the previously defined analyses for every case that we could express and classified the results. We found violations in 14 cases and all reported violations trace back to the specific issue. This means all results are classified as true positives. Thus, our analyses achieved perfect accuracy. We could reproduce the analysis results of the related publications that initially defined the cases. We represented and analyzed information flow and access control cases, while related approaches can only represent subsets as discussed in Section 4.2.
As part of the result classification, we checked every reported violation. Reporting on every violation as part of this article would require a considerable amount of space and also knowledge about the particular DFD. Therefore, we do not report on the details of this classification in this article but refer to our data set (Seifermann et al., 2021a), in which we give enough details on the DFD to understand the classification for each violation that is also part of the data set.
The values of the precision and recall metrics demonstrated that we can not only represent systems and confidentiality mechanisms as well as analyses but also derive accurate results via the defined analyses. We always introduced errors by adding an additional, alternative data flow to a DFD without an issue. Had we not systematically explored all possible data flow paths, we could not have achieved such accurate results. Therefore, we addressed the challenge about considering multiple data flow paths (Ch1). The results also support our claim to support information flow and access control analyses (Ch2) as well as custom analysis definitions (Ch3) because we can not only model them (see Section 10.3) but also execute them.

Threats to validity
We structure the discussion of threats to validity by the four categories of Runeson and Höst (2009) for evaluations based on case studies.
Internal validity assures that no unknown factor influences the investigated factor in order to draw valid causal relations. The investigated factors in this evaluation are the expressiveness, reusability and accuracy of our syntax and semantics. The expected influencing factors are our syntax and semantics. However, further factors can influence the expressiveness: Limited experience with the modeling language can influence the expressiveness negatively. We can exclude this factor because the authors of this article are the designers of the modeling language. Overly simple scenarios can make the expressiveness look more positive than it actually is because they omit relevant aspects. We selected all information flow cases and half of the access control cases based on related publications (Tuma et al., 2019; Katkalov, 2017; Seifermann et al., 2019), so we do not expect them to be tailored or too simple in this field of research. In addition, we selected all cases from the mentioned, related publications to avoid a tailored or biased selection. We used weighted sums to avoid an increased influence of cases sharing the same analysis definition. Without this, it would be possible to hide a lack of expressiveness regarding one type of analysis definition by adding many cases using a well supported analysis definition. We created three access control cases on our own but included fundamental concepts mentioned in a corresponding textbook (Furnell, 2008). We report on aspects of the particular access control mechanisms that we did not cover in the limitations in Section 10.7 to not claim more expressiveness than the case study could show. Overly simplified analyses can positively influence the expressiveness by hiding important details. We stick as closely as possible to the analyses presented in related work (Tuma et al., 2019; Katkalov, 2017; Seifermann et al., 2019) or the corresponding textbook (Furnell, 2008) to mitigate simplification.
We report on aspects of the particular information flow control mechanisms that we did not cover in the limitations in Section 10.7. There are also factors that can influence the accuracy: Even if we did not insert issues in initially created DFDs, there still might be issues that lead to a violation. We cannot rule this out but the evaluation showed that we can successfully detect all injected violations and at least trace them back. Therefore, we can only claim that the analyses provide results at least as good as the results of related approaches. Incorrect analysis queries or DFDs can always yield the same result, which might be a detected violation or not. We addressed this issue by always tracing back violations, which is unlikely to be successful if the analysis query does not properly describe the violation to be expected. Overfitting analysis queries, such as by encoding the violation to be reported directly in the query, can make the accuracy look more positive than it actually is. We evaluated three analysis types (OL, 2L, RBAC according to Table 3) with more than one DFD and achieved accurate results. This is unlikely to succeed for analysis types that use queries overfitted to a particular issue or DFD. In PrivateTaxi (LG), the query is system-specific as requested by the original case description. For the remaining access control queries (DAC, MAC, RBAC), we discussed their generalizability, which would also reveal overfitted queries.
External validity assures that researchers only generalize findings if it is valid to do so. According to Runeson and Höst (2009), case study research does not focus on representativeness but on specific aspects of the case under study to get a better understanding of the phenomena. Therefore, insights cannot be generalized to arbitrary other cases unreservedly. However, generalizing insights to cases with comparable characteristics is possible. Therefore, we discussed the characteristics of the case and how it can be generalized for each analysis type in the discussion of expressiveness in Section 10.3. We consider the cases derived from related work representative for the application area. In addition, we evaluated 15 cases, which we consider a reasonable number, especially when comparing it to related work (Tuma et al., 2019; Katkalov, 2017; Seifermann et al., 2019), which usually considers at most 5 cases with similar analysis definitions. The remaining cases at least comply with common definitions.
Construct validity assures that the used metrics are capable of answering the evaluation question. We chose the syntactic quality metric to rate expressiveness. It is hardly possible to summarize expressiveness by metrics alone because variations and limitations of the studied cases have to be discussed, so we extensively discussed the results and provided the metric values for the sake of a quick overview. Syntactic quality is an appropriate metric for this as it has already been used to rate the expressiveness of a DSL (Munnelly and Clarke, 2008). We use the Jaccard Coefficient to reason about reusability when switching confidentiality mechanisms. The Jaccard Coefficient is an established metric for rating similarity of sets in various fields (Levandowsky and Winter, 1971). The coefficient requires a definition of an element and a definition of equality between two elements in order to rate similarity. We defined both in the evaluation design in Section 10.2. The definitions cover model elements and their properties. Because the whole model only consists of model elements and properties, the definition covers the whole model. Therefore, the coefficient is applicable to rate the similarity of our models. In addition, we demonstrated the applicability of the Jaccard Coefficient for comparing models in previous work (Heinrich, 2020; Monschein et al., 2021). Using the comparison approach of EMF Compare (Brun and Pierantonio, 2008) to determine equal model elements is reasonable as we explained in the evaluation design. We described the steps that EMF Compare takes in the evaluation design in Section 10.2. The steps are intuitive, established and could also be carried out manually. The precision and recall metrics used to rate the accuracy of the analyses are commonly applied metrics for rating the accuracy of various information flow analyses (Arzt et al., 2014; Wei et al., 2014).
The selection of cases is appropriate for answering the evaluation questions as discussed before.
Reliability assures that the conducted study, i.e. the data collection and data analysis, does not depend on the particular researcher but that other researchers come to the same results. As discussed before, the model quality depends on the experience of the modeler with the syntax and semantics. We cannot completely mitigate this issue. However, we provide all material required to replicate the evaluation starting from the models as stated in Section Data Availability. Additionally, all metric values can be calculated in an objective way: we provide clear instructions on how to collect input data for calculating the metrics without the need for subjective interpretations. Therefore, the process and results are traceable and other researchers can decide whether the study has been carried out correctly.

Limitations
We distinguish between limitations of the proposed syntax and semantics on the one hand, as well as limitations of the evaluation on the other hand.
One limitation of the syntax and semantics has been demonstrated in the evaluation: there are no means to represent individual data or users but only classes of data or users. A class of data describes a group of data that is treated the same. A class of users describes a group of users acting the same. This limitation implies limited support for some specific aspects of confidentiality mechanisms: The RBAC extension that provides means for specifying constraints on individual subjects holding roles cannot be represented. Therefore, we cannot represent, for instance, that two clerks have to approve something but must not be the same person. The delegation of rights in DAC cannot be represented, so we cannot distinguish between valid and invalid access to data that involves delegated access rights, for instance. Also, it is not possible to ensure non-interference between users of the same type, so we cannot ensure, for example, that a bank customer cannot access the balance of another customer. All of these aspects would require detailed information about individual users and data as well as a mechanism to express time and dependencies between system states at different times. We intentionally excluded this because such detailed information required to model individuals might not be available at design time. Additionally, the number of elements to specify would certainly increase if more detailed models, and even the consideration of time, were necessary. We demonstrated that the proposed syntax and semantics can provide valuable results and insights and suggest covering the remaining aspects in later development phases when more detailed information or even source code is available. This lowers the overhead for analyzing these aspects significantly. Other approaches building on DFDs such as SecDFD (Tuma et al., 2019) or FlowUML (Alghathbar et al., 2006) share the same restrictions.
Our evaluation focused on the expressiveness and accuracy of our syntax and semantics as well as on reusability. We did not evaluate usability. As stated before, we intentionally did not evaluate usability because this would only evaluate our implementation of tool support rather than our concepts. We do not see an open research question in whether usable tooling for modeling and analyzing DFDs can be created because the users in a recent study of Tuma et al. (2020) could successfully use their DFD modeling and analysis approach. We also did not verify the correctness of the mapping and the resulting logic program. As already motivated in the evaluation goals, verifying correctness does not allow us to answer whether we sufficiently addressed the challenges mentioned in the introduction. However, we plan to report on the correctness in future publications on different aspects of our approach.

Conclusions
In this article, we proposed an extended DFD syntax and analysis semantics that allow expressing analyses to detect violations of access control and information flow requirements with good accuracy. The DFD syntax is based on DFD elements as introduced by DeMarco (1979) but extends these elements with means for representing behavior relevant for confidentiality analyses. The semantics describe this behavior in terms of label propagation rules formulated in a logic program. An automated mapping translates the extended DFD into an executable logic program that yields detected violations. Thereby, we address three open challenges of software design modeling and analysis approaches aiming to find confidentiality violations. In our evaluation, we demonstrated the expressiveness with respect to information flow and access control, demonstrated effective reuse of existing models when switching between information flow and access control as well as evaluated the accuracy in a case study considering fifteen cases.
Practitioners as well as researchers can benefit from our contributions. Our syntax and semantics provide means for systematically considering confidentiality properties in an early design stage. This allows identifying fundamental design issues early and fixing them in a cost-efficient way. Because our syntax is close to the commonly known concepts and syntax of DFDs, we assume a gentle learning curve for designers. Researchers can use our provided analyses as a foundation for defining their own confidentiality analyses based on data property propagation. This allows focusing on the application area and analysis concepts rather than on generic issues like data propagation or data dependency resolution. Additionally, the cases published as part of this article (Seifermann et al., 2021a) can serve as a benchmark for existing analyses.
We see five major points as part of future work. First, we plan to investigate how our DFD-based modeling language and analyses can be integrated with existing early design modeling and analysis approaches. Many modeling languages have counterparts for the modeling elements we presented in this paper. It might be possible to cover all aspects of our extended DFD modeling language by some lightweight modifications and a mapping to our modeling language. We already created a preliminary concept for Architectural Description Language (ADL) integration (Seifermann et al., 2021b) that needs to be refined and evaluated in the future. Second, we plan to investigate whether the presented syntax and semantics are capable of supporting further security objectives such as integrity. Evaluating the support for integrity is reasonable because information flow requirements often ensure confidentiality and integrity. Third, we plan to investigate how we can build catalogs of reusable model elements. Reusing model elements has the potential to lower the modeling effort. An important question to answer is how designers could use these catalogs and how to make analysis definitions as reusable as possible. Fourth, we would like to know whether parts of our analyses could be executed in real-time while modeling to guide designers and provide fast feedback. Challenges in doing so include the handling of incomplete models and incomplete analysis results as well as how to identify and present useful analysis results while editing. Fifth, we plan a publication on the verification of correctness with respect to the mapping and the logic program. We plan to do the verification of the mapping based on the properties to verify for correctness collected by Rahim and Whittle (2015). The verification of the correctness of the logic program will consider completeness and correctness as suggested by Drabent (2016).

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
We provide all data used in the evaluation in our data set Seifermann et al. (2021a). This includes metamodels, source code, data flow diagrams, logic programs, analysis queries and results. In addition, we provide a manual to replicate all steps of our evaluation as part of the data set.

Dominik Werle has been a researcher at Karlsruhe Institute of Technology (KIT) since 2016. His research interests include model-based quality prediction for software systems with a focus on the performance of data-intensive applications. Dominik's research is also concerned with using data flow models to better understand design decisions and architectures for these applications.
Ralf Reussner has been a full professor for software engineering at Karlsruhe Institute of Technology (KIT) since 2006. He holds the chair for Dependability of Software-intensive Systems, heads the Institute for Information Security and Dependability and is Director at the FZI Research Center for Information Technology. His research group works in the interplay of software architecture and predictable software quality as well as on view-based design methods for software-intensive technical systems.