A Decentralized Compositional Framework for Dependable Decision Process in Self-Managed Cyber Physical Systems

Cyber Physical Systems (CPSs) need to interact with the changeable environment under various interferences. To provide continuous and high quality services, a self-managed CPS should automatically reconstruct itself to adapt to these changes and recover from failures. Such dynamic adaptation behavior introduces systemic challenges for CPS design, advice evaluation and decision process arrangement. In this paper, a formal compositional framework is proposed to systematically improve the dependability of the decision process. To guarantee the consistent observation of event orders for causal reasoning, this work first proposes a relative time-based method to improve the composability and compositionality of the timing property of events. Based on the relative time solution, a formal reference framework is introduced for self-managed CPSs, which includes a compositional FSM-based actor model (subsystems of CPS), actor-based advice and runtime decomposable decisions. To simplify self-management, a self-similar recursive actor interface is proposed for decision (actor) composition. We provide constraints and seven patterns for the composition of reliability and process time requirements. Further, two decentralized decision process strategies are proposed based on our framework, and we compare the reliability with the static strategy and the centralized processing strategy. The simulation results show that the one-order feedback strategy has high reliability, scalability and stability against the complexity of decision and random failure. This paper also shows a way to simplify the evaluation for dynamic system by improving the composability and compositionality of the subsystem.


Introduction
Since the concept of the Cyber Physical System (CPS) was first proposed by the American National Science Foundation (NSF) in 2006, it has become so popular that CPS is even regarded as the next technological revolution, one that can rival the contribution of the Internet [1]. CPS applications are being explored in various areas, e.g., smart transportation, smart cities, precision agriculture and entertainment. A CPS is a (large) geographically distributed, closed-loop system. It closely interacts with the physical world by sensing and actuating. Roughly speaking, a CPS consists of wireless/wired sensor networks (WSNs), decision support systems (DSSs), networked control systems (NCSs) and physical systems/elements. To integrate these four kinds of subsystems, the framework/model of a CPS should support both discrete models (i.e., WSN, DSS and some NCS) and continuous models (i.e., some NCS and physical systems), and integrate them seamlessly.
In general autonomic computing (AC) systems, it is in fact us, human beings, who are trained to adapt to the system. Compared to general AC systems, a self-managed CPS has to interact automatically with the physical world. To behave properly, it should take the right actions in the right place, at the right time, with reasonable processing speed. In other words, it should not only continuously improve the system itself, but also adapt to the variable environment. Hence, a self-managed CPS should form two types of closed loops [13,18,19]. One is a self-healing loop [19], which is similar to the schema in Figure 1. The other is the interaction loop between the cyber world and the physical world, which is illustrated in Figure 2. The interaction loop includes long-term loops for causal reasoning (the big-data-driven MAPE-K loop) and short-term loops for the dependable decision process (the feedback control loop). The self-healing loop and the interaction loop may influence each other, e.g., a temperature rise will trigger the cooling control loop (environment-in-loop adaptation) and also affect the reliability of hardware (system-centric self-management).
In this paper, we focus on improving the consistency of event observation (the long-term loop) and improving the dependability of the decision process (the short-term loop). Centralized decision arrangement is the most common solution for AC systems. The processing flow of a decision is controlled by a (local) central system, such as a DSS. The processing flow fails if the decision manager fails (a.k.a. a single point of failure), so the dependability of such processing solutions is limited by the central system. To overcome this issue, one generic solution is to deploy redundant decision arrangement systems. However, this may generate conflicting decisions, because two redundant decision control systems may obtain inconsistent observation results and derive different event orders.
Even for a single centralized decision manager, the observed order of events may be wrong. Physical events occur in parallel, and the sensors of a CPS are distributed. Due to various issues (e.g., errors, failures, delays, etc.) [20,21], the clocks of the sensors may not be precisely synchronized. As a consequence, different sensors may generate inconsistent timestamps for the same event, which confuses the DSS and misguides the fault diagnosis methods. Taking precision agriculture as an example, the DSS analyzes the soil moisture together with the current temperature and the status of the leaves, and then makes a final decision that the plants could and should be watered. The nozzle then starts to spray water at timestamp $t_1$, and the event of starting to spray is denoted as $e_1$. The soil moisture sensor detects the increase of humidity at timestamp $t_2$, and this event is denoted as $e_2$. Under the assumption of a global reference time, $t_1$ and $t_2$ are comparable. When the DSS receives $e_2$ before $e_1$ and finds that $t_1 > t_2$, it raises an alarm that something is wrong with the nozzle or pipe (e.g., leaking). In a real multi-agent CPS, however, the timestamps $t_1$ and $t_2$ are not comparable because of the time synchronization deviation between the node with the soil moisture sensor and the actuator with the nozzle. Hence, the timing order of $e_1$ and $e_2$ is indistinguishable. Consequently, the information about causality between physical events is lost, and further analysis becomes impossible. This timing issue challenges the correctness of self-managing decisions, especially in real-time CPS.
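The watering example can be made concrete with a minimal sketch. All names and numbers below (the `Event` class, the clock offsets, the event times) are hypothetical and chosen only for illustration; the sketch assumes each node stamps events with its own unsynchronized clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    true_time: float     # absolute occurrence time in seconds (unknown to the nodes)
    clock_offset: float  # hypothetical offset of the reporting node's clock, in seconds

    @property
    def timestamp(self) -> float:
        # The timestamp a node actually reports: true time distorted by its clock offset.
        return self.true_time + self.clock_offset


def order_by(events, key):
    """Return event names sorted by the given key function."""
    return [e.name for e in sorted(events, key=key)]


# e1: the nozzle starts spraying; the actuator clock runs 0.8 s ahead.
e1 = Event("e1_spray_start", true_time=10.0, clock_offset=+0.8)
# e2: the soil moisture sensor detects rising humidity; its clock runs 0.1 s behind.
e2 = Event("e2_moisture_rise", true_time=10.5, clock_offset=-0.1)

true_order = order_by([e1, e2], key=lambda e: e.true_time)
apparent_order = order_by([e1, e2], key=lambda e: e.timestamp)

print("true order:    ", true_order)
print("apparent order:", apparent_order)
```

Under these assumed offsets the DSS would see $t_1 > t_2$ and wrongly suspect a leak, although the spray really started first.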
Another challenge is guaranteeing the consistency of the dynamic behavior with simple and dependable (Model@run.time) solutions. As a kind of system of systems (SoS), a CPS is composed of numerous heterogeneous subsystems. These subsystems may also recursively consist of various other subsystems. To describe this feature, we should abstract the subsystems with a model that is closed under composition. To model the dynamic structure and dynamic behavior for self-management, the reference framework should be flexible enough to describe runtime composition. To guarantee the quality of decisions and quantitatively analyze the dynamic behavior, the properties of subsystems should be composable and the requirements of a decision should be decomposable at runtime. With systematic consideration of these requirements, we propose a framework based on compositional actors.
The contributions of the paper are manifold. We introduce a relative time solution to solve the inconsistent event observation in CPS, which forms a foundation for a decentralized decision process. Moreover, we design a formal compositional framework of the decentralized decision process. A self-similar recursive actor interface is proposed to simplify self-management. We analyze the composability and compositionality of our design and provide seven composition patterns. A one-order dynamic feedback strategy is introduced to improve the reliability, scalability and stability of decision process.

Structure of Paper
The remainder of the paper is organized as follows: Section 2 reviews related work on self-managed CPS and formalization. Section 3 introduces the relative time model to guarantee the consistency of event observation, together with a qualitative comparison with the absolute time model. Section 4 details the actor-based formal model and the interface design. We analyze the composability and compositionality of the reference framework in Section 5. In Section 6, we introduce a simple decentralized decision process strategy and a one-order feedback decentralized dynamic decision process strategy, and compare their reliability with that of two other strategies. The relationship of Sections 3-6 is shown in Figure 3. Section 7 is a case study of the dependability of the decentralized decision process. Section 8 draws the conclusions.
Notations: (1) unless otherwise noted, we use $t$ or $t_b$ to represent the absolute timestamp, $\tau$ to represent the duration, and $t_l$ and $t_l + \tau$ to represent the relative timestamp in the remainder of the paper. We use τ to represent the static duration for WCET, BCET, and the static requirement of advice, and τ to represent the real duration or the dynamic requirement of a decision; (2) the term "subsystem" denotes an agent with several actors; a "subsystem" is to a decentralized CPS what a "component" is to a centralized system; (3) the "decision" in this paper is a dynamic concept, which is similar to the concepts of "application" and "program". An example is introduced in Appendix A.1 for further understanding of the decentralized decision process.

Related Works on Self-Management Framework for CPS
Roughly speaking, a CPS has to face two kinds of uncertainty. One is the changeable environment; the other is the unpredictable process flow caused by resource competition and random failures. To behave properly under uncertainty, a CPS should make and process decisions according to the context. For each self-adapting decision, the CPS should select the right subsystems from various heterogeneous candidates, organize them in the right way, and coordinate the decision process across subsystems. For self-healing, each prearranged subsystem may be replaced by others at runtime, and heterogeneous redundant subsystems should cooperate to improve reliability. Whether self-adapting or self-healing, a CPS has to dynamically reconstruct its services, structure and topology at runtime. However, dynamic reconstruction decreases the controllability and predictability of CPS behavior. It is a big challenge to achieve a consistent quality of the decision process, such as consistent timing, predictable reliability and safety. To overcome these issues, systemic solutions are needed to evaluate the correctness of reconstruction and to guarantee the consistency of the dynamic behavior of decisions.
A good framework is the foundation of a self-managed CPS. Many aspect-oriented formal frameworks have been published to improve the functional performance of CPS [22,23], and various frameworks have been proposed for self-adapting CPS. As classified in our survey [3], these frameworks fall into three types: Service Oriented Architecture (SOA)-based frameworks, Multi-Agent System (MAS)-based frameworks, and other aspect-oriented frameworks. Compared to SOA-based frameworks, MAS-based frameworks are more lightweight and more scalable. As a kind of SoS, a CPS shows high flexibility but low predictability. More and more researchers are paying attention to verification and validation (V&V) of the dynamic structure and behavior of CPS with Model@run.time methods to improve predictability. A formal framework is an alternative solution to improve predictability and dependability without introducing too much complexity.
Unfortunately, there are relatively few studies on formal frameworks (architectures) and dependability evaluation [3]. SCA-ASM is a formal SOA-based framework for modeling and validating distributed self-adaptive applications. SCA-ASM can model the behavior of monitoring and reacting to environmental and internal changes, as well as the related operators for expressing and coordinating self-adaptive behaviors [24,25]. A MAS-based framework built on the logic-based modeling language SALMA was introduced; this model focuses on information transfer processing [26]. In the domain of cyber-physical transportation, Mashkoor et al. built a formal model with higher-order logic [27]. These frameworks are based on centralized decision process control solutions. The centralized decision controller is slow to react because of the long transmission delay, which increases the safety risk. Moreover, the centralized decision controller is a single point of failure. Decentralized control can overcome these drawbacks, and more and more studies are being published in this field. A formal framework was proposed for decentralized partially observable Markov decision process problems. Based on this framework, a policy iteration algorithm is presented to improve the coordination of distributed subsystems [28]. A decentralized control solution based on Markov decision processes is proposed for automatically constructed macro-actions in multi-robot applications [29]. However, these solutions mainly focus on performance and convergence speed; dependability and timing issues are rarely discussed. Moreover, these studies assume ideal subsystems, where all subsystems are dependable and behave consistently.
In real-world CPS, numerous heterogeneous subsystems are applied. These subsystems have different properties, e.g., different performance and different precision, which complicates the control of the decision process. Various kinds of solutions have been invented to hide the differences between subsystems, such as middleware and virtualization [30,31], interface technology [32] and specification [33]. These technologies simplify self-adaptation by providing consistent interfaces. Nevertheless, this is still not enough for a self-managed CPS. For safety, the risk of all decisions should be evaluable, and all actions should be predictable, which implies that all services and actions have consistent, stable behavior at runtime. Specifically, a CPS should be stable in its timing behavior, the reliability of its services, the accuracy of its data, etc. Otherwise, inconsistent and uncontrollable behaviors will make the CPS unpredictable, increase the safety risk, and mislead the DSS into making wrong decisions, e.g., the reasoning failure caused by inconsistent timestamps, which was introduced earlier.
To hide the differences and guarantee the quality of services, one promising way is to improve the composability and compositionality (C&C) of services. Composability is the property whereby component properties do not change by virtue of interactions with other components [9]. It comes from the philosophy of reductionism, which highlights the consistent behavior of a component when it cooperates with other components to build a whole system. In contrast, compositionality originates from holism. Compositionality means that system-level properties can be computed from and decomposed into component properties [9]. It is more about the capacity to decompose the system-level properties. It focuses on the consistency between the system-level properties and their divided properties (component properties), where the system-level properties can be calculated from the properties of the components/subsystems. For more detailed discussions about composability and compositionality, readers may refer to [9]. Note that the two terms are used interchangeably in some studies.
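As a rough illustration of compositionality, the sketch below computes system-level reliability and process time from component values and checks them against system-level requirements. It uses the textbook series and redundant (parallel) reliability formulas under the assumption of independent failures; these are not the seven composition patterns of this paper, and all function names and numbers are hypothetical.

```python
from math import prod


def series_reliability(reliabilities):
    """All components must succeed (assumes independent failures)."""
    return prod(reliabilities)


def redundant_reliability(reliabilities):
    """At least one redundant component must succeed (assumes independent failures)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def series_process_time(durations):
    """Components run one after another, so their durations add up."""
    return sum(durations)


# Hypothetical sensor -> DSS -> actuator chain.
plain_chain = series_reliability([0.99, 0.98, 0.97])

# Same chain, but with two redundant DSS instances of reliability 0.98 each.
dss_pair = redundant_reliability([0.98, 0.98])
hardened_chain = series_reliability([0.99, dss_pair, 0.97])

# Decomposition check: do the component values satisfy the system-level requirements?
required_reliability = 0.95
required_time = 0.100  # seconds
total_time = series_process_time([0.020, 0.050, 0.015])

print(f"plain chain:    {plain_chain:.6f}")
print(f"hardened chain: {hardened_chain:.6f}")
print("reliability met:", hardened_chain >= required_reliability)
print("deadline met:   ", total_time <= required_time)
```

The point of the sketch is only that, when component properties are composable (they do not change under interaction), a system-level requirement can be checked from component-level figures alone.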
Designing subsystems with high C&C can reduce the complexity of CPS and systematically improve the quality of services. A theory of composition for heterogeneous systems was proposed to improve stability, which decouples stability from timing uncertainties caused by networking and computation [34]. Nuzzo introduced a platform-based design methodology based on contracts to refine the design flow. This methodology uses contracts to specify and abstract the components, then validates the contracts according to the structure of the CPS at design time [35]. An I/O automata-based compositional specification theory is proposed to abstract and refine the temporal ordering of behavior, and to improve reasoning about the behavior of components [36], which is useful for dynamic decision evaluation and fault diagnosis. To guarantee the timeliness of real-time operations, a formal definition of timing compositionality is introduced [37]. However, how to guarantee in general the quality of dynamic characteristics such as timing [37], safety [34] and dependability is still an open issue. Both new architectures and new evaluation methods are needed to guarantee timing and dependability at runtime. A contract-based requirement composition and decomposition strategy was introduced for component-based development of distributed systems [38]. This work is a valuable reference for designing Model@run.time-based decision evaluations.
To achieve dependable CPS, systematic solutions are necessary. Both traditional means and self-healing methods are useful for maintaining the dependability of CPS. These methods should be applied organically at different levels to achieve dependability without introducing too much complexity. A satellite-oriented formal correctness, safety, dependability, and performance analysis method is introduced in [39]; it comprehensively applies traditional methods to improve a static architecture. However, the traditional means are limited to static architectures, and they become less and less efficient for CPS [3]. Self-healing methods are the trend for managing the dependability of a dynamic structure; they generally adjust the architecture to prevent or recover from failures with flexible strategies. A simplex reference model was proposed to limit fault propagation in CPS that are built with unreliable components [40]. A methodology is introduced to formalize the requirements, the specification and the descriptive statements with domain knowledge; it shows a systematic solution for verifying the dependability of CPS with formal models [41].
A physical action cannot be undone, so the negative effects of a wrong physical operation can hardly be eliminated, which places high demands on the dependability of CPS. Without service maintenance and self-healing solutions, self-adapting CPS are still inapplicable. Considering the complex influence between self-healing actions and self-adapting actions, a good formal framework is needed to simplify decision evaluation at runtime and to apply self-management without introducing too much complexity.

Improving the C&C of Timing Behavior with a Relative Time-Based Model
Time is important to computing [42], especially for feedback control and causal reasoning. As a necessary condition for causal reasoning, it is important to achieve consensus on the timing behavior of both physical and cyber events. Moreover, the precise time can improve the control and cooperation between subsystems. Both context aware-based self-adaptation and fault prevention-based self-healing can benefit from the accurate causal reasoning and precise decision control. Hence, it is necessary to eliminate the temporal difference among subsystems and improve the C&C of timing for the self-management CPS.
To improve the C&C of timing and make all event observers achieve consensus on timing behavior (the same order of observed events), one intuitive solution is to establish a global reference time with a precisely timed infrastructure and a time synchronization protocol. A time-centric model has been introduced for CPS [43]. It is a global reference time-based solution where every subsystem shares one absolute reference time. It is relatively easy to meet the assumption of a global reference time for a wired, small-scale CPS. For a large-scale, wirelessly connected CPS, such as a smart transportation CPS or a precision agriculture CPS, maintaining a consistent reference time (absolute time) is a big challenge [20,21].
Furthermore, even a well-synchronized system still cannot achieve consistent absolute time and reproduce the causal relationship of events in cyberspace, because the timestamps themselves are imprecise. In fact, the timestamp of an observed event is rough. The accuracy of a timestamp depends on the sensitivity of the sensor, the processing speed, the sampling period, and even the distance between the target object and the sensor. Imagine that a physical event occurs at timestamp $t_p$ and the sensor detects the event at timestamp $t_s$, where $t_p$ and $t_s$ are absolute times and $t_p < t_s$ (because sensing takes time). For sensors (especially smart sensors that integrate complex data analysis), $t_s - t_p$ differs between subsystems because of differences in sensitivity and processing speed. Even for the same sensor, $t_s - t_p$ is subject to stochastic volatility. Consequently, it is impossible to get a consistent absolute time of events in a distributed CPS.
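A small numeric sketch shows how unequal sensing delays alone can reverse the observed order of two events, even when all clocks are perfectly synchronized. The delays and event times are hypothetical values picked for illustration.

```python
def observed_timestamp(t_p: float, sensing_delay: float) -> float:
    """Absolute detection time t_s for a physical event at t_p (t_s - t_p = sensing_delay)."""
    return t_p + sensing_delay


# Two physical events 5 ms apart, seen by two different sensors (times in seconds).
t_p_a, t_p_b = 1.000, 1.005

# Hypothetical delays: a smart sensor with heavy on-board analysis vs. a simple fast sensor.
t_s_a = observed_timestamp(t_p_a, sensing_delay=0.020)
t_s_b = observed_timestamp(t_p_b, sensing_delay=0.002)

physical_a_first = t_p_a < t_p_b
observed_a_first = t_s_a < t_s_b

print("physical order: a before b ->", physical_a_first)
print("observed order: a before b ->", observed_a_first)
```

Here event a physically precedes event b, yet its detection timestamp is later, so ordering by $t_s$ alone would invert the causal direction.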
As current causal analysis methods just need the order of events, absolute time is an overly restrictive condition. For example, logical reasoning $e_1 \wedge e_2 \rightarrow r$ states that if two events $e_1$ and $e_2$ occur then the result (event) $r$ must follow, and quantum causal analysis with the probability $P(r|e_1)$ gives the probability of observing event $r$ given that event $e_1$ is true/observed. Few technologies can deduce further conclusions from the accurate time difference $\Delta t_{r \to e}$ between the result event $r$ and the event $e$. There are two main reasons: (1) as $\Delta t_{r \to e}$ is affected by many factors, its acceptable range may be too large, and it is difficult to quantitatively analyze its stochastic volatility. For example, it takes several weeks to observe the effect of fertilization, and in the meantime various factors may change the efficiency of the fertilizer; (2) most events are irrelevant to each other, so guaranteeing the absolute time of these events wastes resources.
In general, there are two kinds of timekeeping methods. One is absolute time, where all subsystems share the same reference time (i.e., UTC) and the timestamp of an event is $t_b$. The other is based on local time, where every subsystem has its own local reference time $t_l$. For one same event, these subsystems have different observation timestamps $t_l + \tau$. Analyzing the sequence of events is the first step for mining the relationship between events. The common method to get the order is calculating the timestamp difference $\Delta t$ between two events. With absolute time, we can directly get the difference $\Delta t_{ab} = t_{b2} - t_{b1}$. With local time, it is relatively complex. As the base reference times $t_l$ are different, local timestamps are not directly comparable. The difference of the two reference times $\Delta(t_{l2} - t_{l1})$ is also needed to obtain the final timestamp difference $\Delta t_{rf}$ of two events.
As observation is relative to each observer and each case, we propose a relative time model.
Every subsystem just needs to record the duration that it takes to observe the event. The relationship between absolute time and the timestamps of different observers is depicted in Figure 4. The timestamp tuple is (absolute time, timestamp according to the sensor's view, timestamp according to the actuator's view). For example, a physical event occurs at the absolute time $t_b$. It takes sensor1 $\tau_{1.0}$ to observe the physical event, so the absolute timestamp is $t_b + \tau_{1.0}$. The actuator observes the event from sensor1 at $t_b + \tau_{1.0} + \tau_{1.1}$. Here, let us assume that sensor1, sensor2 and the actuator are not well synchronized, so they have to record the observation based on their own local times. The timestamp in local time when sensor1 observes the event is $t_{l1}$, where $t_{l1}$ and $t_b + \tau_{1.0}$ denote the same instant expressed in different reference times. Obviously, sensor1 can infer that the physical event occurs at $t_{l1} - \tau_{1.0}$. Likewise, the actuator observes the event from sensor1 at absolute timestamp $t_b + \tau_{1.0} + \tau_{1.1}$, at $t_{l1} + \tau_{1.1}$ from sensor1's view, and at $t_{l3.1}$ from the actuator's view. The event from sensor2 is observed at $t_b + \tau_{2.0} + \tau_{2.1}$, $t_{l2} + \tau_{2.1}$ and $t_{l3.2}$, respectively, where $t_{l3.1}$ may not be equal to $t_{l3.2}$. As mentioned earlier, we cannot figure out the order of the two observations based on the timestamps $t_{l1}$ and $t_{l2}$. To simplify the calculation of $\Delta t_{rf}$, one intuitive solution is to select a good observer such that $\Delta(t_{l2} - t_{l1}) = 0$. The actuator is such an observer: it can infer that the event occurs at $t_{l3.1} - \tau_{1.0} - \tau_{1.1}$ and at $t_{l3.2} - \tau_{2.0} - \tau_{2.1}$. As $t_{l3.1}$ and $t_{l3.2}$ share the same local reference time, we have the difference of the timestamps

$$\Delta t_{rf} = (t_{l3.2} - \tau_{2.0} - \tau_{2.1}) - (t_{l3.1} - \tau_{1.0} - \tau_{1.1})$$

where $\tau_{1.0} + \tau_{1.1}$ and $\tau_{2.0} + \tau_{2.1}$ are the sums of the process time and the transmission time. Currently, the best observers are the sensors. To achieve this, we propose a dynamic decision process framework, which will be introduced in the last part of this paper.
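The observer-side calculation described above can be sketched as follows. The `Observation` class, its field names and all numeric values are hypothetical; the sketch assumes the actuator knows, for each message, its own local arrival time and the accumulated sensing and transmission durations, and that all clocks tick at the same rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    source: str
    local_arrival: float  # arrival time on the actuator's own clock (t_l3.k), seconds
    sensing: float        # duration the sensor needed to observe the event (tau_k.0)
    transmission: float   # duration to deliver the message to the actuator (tau_k.1)

    def inferred_occurrence(self) -> float:
        # Occurrence time in the actuator's local reference: arrival minus accumulated delay.
        return self.local_arrival - self.sensing - self.transmission


def delta_t_rf(first: Observation, second: Observation) -> float:
    """Difference of the inferred occurrence times; no clock synchronization is involved."""
    return second.inferred_occurrence() - first.inferred_occurrence()


# Both sensors report the same physical event over paths with different delays.
obs1 = Observation("sensor1", local_arrival=100.040, sensing=0.030, transmission=0.010)
obs2 = Observation("sensor2", local_arrival=100.055, sensing=0.005, transmission=0.050)

naive_gap = obs2.local_arrival - obs1.local_arrival  # comparing raw arrival times
relative_gap = delta_t_rf(obs1, obs2)                # comparing inferred occurrence times

print(f"gap from raw arrival times: {naive_gap * 1000:.1f} ms")
print(f"gap from relative time:     {relative_gap * 1000:.1f} ms")
```

Because both arrival times are read from the same clock, the reference-time difference drops out and only the recorded durations matter.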
Theoretically speaking, two observations of one and the same event should be identical. For timing, the two observations should have the same timestamp, i.e., $\Delta t_{rf} = 0$. However, this may not be true in a real-world system, because the clocks of different subsystems run at different speeds due to the frequency deviation of their oscillators. The revised relative time model is shown in Figure 5, where $f$ is the system clock frequency of the respective subsystem. The timestamp tuple is (accumulated time since the physical event was generated, local time). For example, when the actuator observes the event from sensor1 at local time $t_{l3}$, it records the accumulated time together with $t_{l3}$.
So far, the remaining problem is how to automatically get the frequency scale $f_{l2}/f_{l1}$. Based on the time synchronization solutions for symmetric networks [21] or asymmetric networks [44], every observer can get the duration of the transmission time by exchanging messages, measured according to its own clock. Compared to the absolute time model, however, the relative time model does not need to synchronize the clocks. Instead, neighboring subsystems just need to check the frequency scale $f_{l2}/f_{l1}$. We design an appointment-and-execution method to calculate $f_{l2}/f_{l1}$, which is shown in Figure 6. For ease of understanding, all timestamps in Figure 6 are absolute times, and all durations are relative. With the message exchange method, every subsystem has already obtained $\tau_{1.1}$ and $\tau_{1.5}$ (in practice, $\tau_{1.1}$ and $\tau_{1.5}$ can be inaccurate). At the beginning, subsystem1 makes an appointment with subsystem2 to execute the same benchmark at the same time (the execution speed of the benchmark should be independent of the hardware architecture, e.g., the size of the cache). Subsystem1 takes $\tau_{1.2}$ to finish the benchmark and $\tau_{1.4}$ to get the finished signal from subsystem2. From the view of subsystem1, the time that subsystem2 needs for the benchmark follows from $\tau_{1.4}$ and the transmission durations, and comparing it with $\tau_{1.2}$ yields the frequency scale $f_{l2}/f_{l1}$. We can simplify the relative time model by calibrating the clocks (oscillators) of all subsystems against a base clock (oscillator) before deployment, setting $f_b/f_l$ for every subsystem to obtain an absolute duration.
To simplify the formulation, we use the term absolute duration in the remainder of this paper. An absolute duration $\tau_b$ can easily be converted to a relative duration $\tau_r$ with the calibrated frequency ratio. Considering that related physical events are in geographical proximity, the observer should be as close as possible to the source of the events. Thus, the accumulated error of the durations of two events will not become so large that the CPS cannot reproduce cause-effect relationships. As soon as the events are observed and serialized, their order (relationship) is confirmed. Then, the CPS can apply various technologies for further analysis of the relationship between these events.
The relative time model records the duration of events instead of the absolute timestamp at which events occur. Ideally, $f_{l2}/f_{l1}$ only needs to be set once. In a real-world system, it may still need to be recalibrated several times during the system lifetime, because oscillators are affected by temperature and aging. Nevertheless, this process can significantly decrease the frequency of synchronization and yields a more stable error. Furthermore, the relative time model does not need a global reference time, which can significantly improve the scalability of the CPS, whether the subsystems are heterogeneous or not. A detailed comparison between the relative time model and the absolute time model is beyond the scope of this paper; the qualitative conclusion is shown in Table 1.

The Formal Reference Framework for Decision Process
As mentioned earlier, improving the composability and compositionality (C&C) of subsystems brings considerable benefits. Our solution mainly focuses on the composition of subsystems and the decomposition of requirements at run-time. In this section, we introduce the formal reference framework for the dynamic decision process.

Overview of the Actor Based Framework
A self-managed CPS should automatically sense the environment and diagnose itself, then make both self-adapting and self-healing decisions, and execute these decisions. Decision making and decision executing are the two key parts that form the closed loop. Improving the C&C of subsystems can decrease the complexity of both. Composability simplifies decision making by simplifying the evaluation of whether a decision is reasonable; otherwise, the DSS has to enumerate all available combinations. Compositionality simplifies decision executing by simplifying the decomposition of the requirements at runtime, which helps to guarantee the dependability of decision execution. Composability and compositionality are two sides of the same coin, and both are necessary qualities of a good CPS framework.
A self-managed CPS includes two parts: (1) the agent platform, which includes the hardware and the corresponding actors; (2) the dynamic behavior management subsystem (decision subsystem). An overview of the actor-based framework for a self-managed CPS is shown in Figure 7. The actor is the atomic abstraction of subsystems in our reference model. Agents are the platform for decision execution. The behavior of a decision depends on both the properties of the hardware and the properties of the actors. To simplify, we integrate the hardware properties into the properties of the actors. We assume that the decision has been made by the DSS. Here, we focus on the evaluation of advice and on guaranteeing dependability at run-time.

Actor and Decision Formalization
Definition 1 (Actors). An actor is a time-bounded Mealy finite state machine (FSM) $Actor = (Tid, \Sigma, S, s_0, \Theta, \Psi, T)$, where $\Sigma$ is a finite set of events that does not contain the empty event, $\varepsilon \notin \Sigma$, and $e_{timer} \in \Sigma$ is the timer interrupt event based on local time; $S$ is a finite set of states, where every $s \in S$ has a time bound $\tau_s$ that represents the duration the actor stays in state $s$, and the maximal time bound of state $s$ is denoted $s|\tau_s$; $s_0 \in S$ is the initial state, with maximal time bound $s_0|\tau_{s_0}$; $\Theta$ is the set of transitions $\theta$, each with a time bound $\tau_\theta$; $\Psi$ is the set of actions $\psi$, where $\hat{\tau}^k_\psi$ is the time bound of action $\psi_k$. An actor must contain at least one non-empty action $\psi$; otherwise, it cannot interact with other actors. $T$ is the union of the time bounds of the states, transitions and actions, $T = \{\tau_s\} \cup \{\tau_\theta\} \cup \{\tau_\psi\}$. We distinguish $\{\tau_\theta\}$ and $\{\tau_\psi\}$ because the transition in cyber space and the action on the physical world are always asynchronous. $Tid$ is the identifier of the type of the actor.
In the rest of this paper, we simplify the notation by writing $s_i \cdots s_n$ instead of the trajectory (the transition sequence) $\langle \varepsilon_i, s_i, \theta_i, \psi_i \rangle, \cdots, \langle \varepsilon_n, s_n, \theta_n, \psi_n \rangle$ whenever the context allows doing so without introducing ambiguity. We define $Actor_i = Actor_j$ if and only if $Actor_i$ and $Actor_j$ produce the identical output sequence $\psi_1 \cdots \psi_n$ for all valid input sequences $\varepsilon_1 \cdots \varepsilon_m$, where $n \le m$; that is, $(Actor_i = Actor_j) \Leftrightarrow (Actor_i.Tid = Actor_j.Tid)$. $Actor_i = Actor_j$ only states that the two actors have the same trajectory; the two actors may not be isomorphic, and their properties (e.g., performance and reliability) can differ. If $Actor_i = Actor_j$ and all the properties of $Actor_i$ and $Actor_j$ are also the same, we use the notation $Actor_i \equiv Actor_j$.
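Every actor has a set of properties, which we denote as $(Actor, P)$. In our dependable framework, $P = \langle \hat{\tau}_b, \hat{\tau}_w, p(\tau) \rangle$, where $\hat{\tau}_w$ is the worst-case execution time (WCET) of processing a decision, $\hat{\tau}_b$ is the best-case execution time (BCET), and $p(\tau)$ is the failure rate, where $\tau$ is the online time or the time elapsed since the last recovery. Both bounds follow from the formula

$$\hat{\tau} = \sum_{s \in \{s_{k+1}, \cdots, s_{k+h}\}} (\hat{\tau}_s + \hat{\tau}_\theta) + \hat{\tau}^k_\psi ,$$

and we obtain $\hat{\tau}_b$ by substituting the BCET values of $\hat{\tau}_s$ and $\hat{\tau}_\theta$. Notice that $\hat{\tau}_b$ and $\hat{\tau}_w$ are not the time from $s_0$ to $s_{end}$. For the case presented in Figure 8, $Actor_i$ receives an input in state $s_k$ and generates a new output in state $s_{k+h}$. Thus $S_p = \{s_k, s_{k+1}, \cdots, s_{k+h}\}$ is the set of states on the processing path, and the sum in the formula runs over $s_{k+1}, \cdots, s_{k+h}$. $\hat{\tau}_b$ and $\hat{\tau}_w$ can also be obtained with the Monte Carlo method.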
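The summation over the processing path can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Step` record, the function `path_bounds` and the numeric values are hypothetical, and each step is assumed to carry a best-case and a worst-case bound for both the state and the transition leading into it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Best/worst time bounds of one state and of the transition into it (illustrative)."""
    state_bcet: float
    state_wcet: float
    transition_bcet: float
    transition_wcet: float


def path_bounds(steps: list[Step], action_bcet: float, action_wcet: float) -> tuple[float, float]:
    """Return (BCET, WCET) for the path s_{k+1} .. s_{k+h} plus the final action."""
    best = sum(s.state_bcet + s.transition_bcet for s in steps) + action_bcet
    worst = sum(s.state_wcet + s.transition_wcet for s in steps) + action_wcet
    return best, worst


# Two states on the path, followed by one action.
steps = [Step(1.0, 2.0, 0.5, 1.0), Step(2.0, 4.0, 0.5, 0.5)]
bcet, wcet = path_bounds(steps, action_bcet=1.0, action_wcet=3.0)
print(bcet, wcet)  # 5.0 10.5
```

The same function serves both bounds, which mirrors the remark that the BCET is obtained by substituting the best-case values into the same formula.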
We use the notation $Actor_i(s_k, \psi_k) \xrightarrow{msg} Actor_j(s_m, \psi_m)$ to represent point-to-point communication in which $Actor_i$ sends a message $msg$ in state $s_k$ with action $\psi_k$, and $Actor_j$ receives $msg$ in state $s_m$ with action $\psi_m$; an analogous notation represents many-to-one communication. The three types of communication are illustrated in Table 2. Using message-based composition, we can decouple the actors and reduce the constraints on operation interfaces.
We use $msg(i, j)$ to represent the communication in short, and $msg_{i,j}$ to identify the message itself. Unless explicitly mentioned otherwise, $msg(i, j)$ also implies that $Actor_i$ and $Actor_j$ share the same definition of the message structure; if not, $Actor_j$ will ignore the message. In addition, $msg_{k,i} = msg_{k,j}$ means that $msg_{k,i}$ shares the same structure description with $msg_{k,j}$, while the content/value of the messages may differ. $msg_{k,i} \equiv msg_{k,j}$ means that $msg_{k,i}$ and $msg_{k,j}$ have the same structure and the same content. Two messages are identical when $msg_{k,i} \equiv msg_{k,j}$ and the properties of the messages (e.g., the time bound) are also the same.
Notice that, according to this model, one actor can send a message to itself, whereas in a real system only a composited actor can send a message to itself (i.e., to a subsystem of the composited actor). It is meaningless for an atomic actor to do so. CP is the agent/subsystem-level view of interactions. These interactions are not limited to application communication; they include the interactions between agents that maintain the infrastructure, e.g., topology and QoS. For advice evaluation, we ignore the communication for maintenance.

Definition 3 (Advice). Let $Ad = (X_s, \langle Tid_{act}, X_f \rangle, \otimes, X)$ be an advice, where $X$ is a set of observation events $\chi_o = \langle Ob, Tid_t, \chi_t \rangle$, each of which represents that the preorder actor observes whether the target $Actor_{tid=Tid_t}$ has generated the event $\chi_t$ or not. $Ob$ is a composition that includes an operation instruction $op$ on event $\chi_t$, $Ob = msg(Tid_s, Tid_t).\langle op, Tid_t, \chi_t \rangle \in CP$. $Tid_{act}$ is the identifier of the actuator actor that takes the action, $X_s$ is the set of action triggering conditions, and $X_f$ is the set of action finishing conditions, with $X_s \subseteq X$ and $X_f \subseteq X$; $\otimes$ denotes the Boolean operations $\{or, and, not\}$. A generic form of advice is defined as: if $\otimes_{\chi \in X_s} \chi$, then execute $Actor_{Tid=Tid_{act}}$, until $\otimes_{\chi \in X_f} \chi$.
Every advice has a set of constraints $R_s = \langle \hat{\tau}_d, \hat{\tau}_v, r^s_{dep} \rangle$, where $\hat{\tau}_d$ is the maximal process time of the decision generated from $Ad$, $\hat{\tau}_v$ is the term of validity of $Ad$, and $r^s_{dep}$ is the minimum reliability requirement of the decision.
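To make the generic form "if ..., then execute ..., until ..." concrete, the following Python sketch encodes an advice and evaluates its conditions over a set of observation results. The nested-tuple encoding of Boolean conditions, the field names and the function `next_step` are assumptions chosen for illustration and are not part of the formal definition.

```python
from dataclasses import dataclass
from typing import Union

# A condition is either the name of an observation event or a nested
# ("and" | "or" | "not", operand, ...) tuple. This encoding is an assumption.
Condition = Union[str, tuple]


def holds(cond: Condition, observed: dict[str, bool]) -> bool:
    """Evaluate a Boolean condition over the observation results."""
    if isinstance(cond, str):
        return observed.get(cond, False)   # unobserved events count as "not occurred"
    op, *args = cond
    if op == "not":
        return not holds(args[0], observed)
    if op == "and":
        return all(holds(a, observed) for a in args)
    if op == "or":
        return any(holds(a, observed) for a in args)
    raise ValueError(f"unknown Boolean operation: {op!r}")


@dataclass(frozen=True)
class Advice:
    trigger: Condition          # X_s combined with the Boolean operations
    actuator_tid: str           # Tid_act
    finish: Condition           # X_f combined with the Boolean operations
    max_process_time: float     # tau_d
    validity: float             # tau_v
    min_reliability: float      # r_dep^s


def next_step(advice: Advice, observed: dict[str, bool], running: bool) -> str:
    """Return "start", "run", "stop" or "wait" for the actuator named in the advice."""
    if running:
        return "stop" if holds(advice.finish, observed) else "run"
    return "start" if holds(advice.trigger, observed) else "wait"


advice = Advice(("and", "e1", "e2"), "Actor_act", "e3", 5.0, 60.0, 0.99)
print(next_step(advice, {"e1": True, "e2": True}, running=False))  # start
```

Because conditions nest, any Boolean combination of observation events can be expressed with the same three operations.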
Notice that the operation instruction $op$ in the composition message for $Ob$ can be taken from the set $\{<, \le, =, \ge, >, \ne\} \cup \{not\ occur, occur\}$. For safety, one decision contains one final actuator and can only take one action, because Mealy FSMs are not closed under parallel composition [45]; hence, the final action should be processed in serial order. However, this does not mean that actuators cannot be the target actor of $\chi$. $\chi$ is an observation event, so the trigger of a decision can depend on whether a target actuator has taken/finished an action.
Definition 4 (Decision). Let $DC = (uuid, Ad, ACT, CP, R_d)$ be the decision instance of an advice $Ad$, where every $Actor \in ACT$ either has a $Tid$ that is defined in $Ad$ or is a network actor; $uuid$ is the universally unique identifier, and every $msg \in CP.M$ carries the same $uuid$ as the decision; $R_d = \langle \tau_r, \tau_{sv}, r^d_{dep} \rangle$ is the run-time decomposed requirement of $DC$. $\tau_r = \hat{\tau}_d - \sum \tau_i$ is the remaining processing time of the decision, where $\tau_i$ is the actual processing time of $Actor_i$; $\tau_{sv} = \sum \tau_{w,i} - \sum \tau_i$ is the saved time; $r^d_{dep}$ is the current reliability of the decision, which will be introduced in Sections 5 and 6.1. Every $msg \in CP.M$ belongs to a composition pattern, which will be introduced in Section 5.2. In addition, every $msg \in CP.M$ has a time bound $\langle \tau_w, \tau_{rs} \rangle \in T$: $Actor_{i+1}$ first waits for the time $\tau_w$ and then starts to process the decision. $\tau_w$ is the time reserved for parallel composition to synchronize the processing; $\tau_{rs}$ is the time reserved for the decision process, $\tau_{rs} = \hat{\tau}_d - \sum \tau_{w,i+1}$. In summary, $Actor_{i+1}$ should wait $\tau_w$ and then finish the $(i+1)$th step of the decision before the deadline controlled by $\tau_{rs}$. The $uuid$ avoids repeatedly processing one decision on the same actors, which is an important constraint to prevent duplication and maintain safety. $R_d$ transmits the dynamic requirement to successor actors, $\tau_w$ is used for synchronization, and $\tau_{rs}$ is used to control the deadline of the process.
The example of the formal process flow is introduced in Appendix A.1.
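The run-time requirement $R_d$ of Definition 4 can be illustrated with a short Python sketch. It reads $\tau_{w,i}$ in the saved-time formula as the worst-case execution time of $Actor_i$, which is an interpretation made for this example, and the admission check `can_continue` is a hypothetical helper rather than a rule stated in the definition.

```python
def remaining_time(deadline: float, actual_times: list[float]) -> float:
    """tau_r = tau_d - sum(tau_i): budget left for the successor actors."""
    return deadline - sum(actual_times)


def saved_time(worst_case_times: list[float], actual_times: list[float]) -> float:
    """tau_sv = sum(tau_w_i) - sum(tau_i), reading tau_w_i as the WCET of Actor_i."""
    return sum(worst_case_times) - sum(actual_times)


def can_continue(deadline: float, actual_times: list[float], next_wcet: float) -> bool:
    """Hypothetical admission check: the next actor must fit into the remaining budget."""
    return remaining_time(deadline, actual_times) >= next_wcet


# Two actors have already processed the decision.
actual = [2.0, 3.5]
print(remaining_time(10.0, actual))      # 4.5
print(saved_time([3.0, 4.0], actual))    # 1.5
print(can_continue(10.0, actual, 5.0))   # False
```

Each actor only needs the values carried in $R_d$ to decide locally whether the decision can still meet its deadline, which is what allows the requirement to be passed along to successor actors.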

Centralized and Decentralized Decision Process
According to the way decisions are managed, there are two forms of decision process. One is the centralized decision process; the other is our proposal, the decentralized decision process. Without loss of generality, assume the local DSS generates an advice with two trigger conditions $\chi_s$: if $e_1 \,\&\, e_2$, then execute $Actor_{act}$, until $e_3$.
The centralized decision process flow is illustrated in Figure 9. The local DSS sends an advice to a decision manager, and the decision manager controls the flow of the decision process. At every step (1.1 to 1.4, 2.1 to 2.4 and 3.1 to 3.5), the sensors and actuators acknowledge to the manager, and the manager then sends the command for the next operation. Note that $Actor_{act}$ is also a decision manager for the process of $e_3$.

To overcome the single point of failure and to minimize the duration (time) error of event observations, we design a decision-as-a-program solution. The decentralized decision process flow is illustrated in Figure 10. A decision is processed along the flow of transmission and has no explicit decision manager. In some sense, every actor can be regarded as a decision manager for the next composition step. The successor waits for all messages from its preorders according to the composition pattern (step 3.1). Based on the decentralized solution, the CPS can observe firsthand events (both physical events and cyber events).
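The waiting behavior of a successor in the decentralized flow can be sketched as follows. The class `SuccessorActor` and its fields are hypothetical; the sketch covers only the case where all preorders must report, ignores timeouts, and uses the decision uuid to reject duplicates, in line with the role of the uuid described in Definition 4.

```python
from dataclasses import dataclass, field


@dataclass
class SuccessorActor:
    """Collects messages from its preorder actors for each decision (illustrative)."""
    preorders: frozenset[str]
    pending: dict[str, set[str]] = field(default_factory=dict)   # uuid -> senders seen
    processed: set[str] = field(default_factory=set)             # uuids already handled

    def receive(self, uuid: str, sender: str) -> bool:
        """Return True exactly once per decision, when all preorders have reported."""
        if uuid in self.processed or sender not in self.preorders:
            return False
        seen = self.pending.setdefault(uuid, set())
        seen.add(sender)
        if seen == self.preorders:
            self.processed.add(uuid)
            del self.pending[uuid]
            return True
        return False


actuator = SuccessorActor(frozenset({"sensor1", "sensor2"}))
print(actuator.receive("d-1", "sensor1"))  # False: still waiting for sensor2
print(actuator.receive("d-1", "sensor2"))  # True: all preorders reported
print(actuator.receive("d-1", "sensor2"))  # False: decision d-1 was already processed
```

No central manager is involved: the successor itself decides when the next step of the decision may start.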

Simplify Self-Management Strategies with Self-Similar Actor
CPSs have massive numbers of subsystems, and some of them are heterogeneous. It is impossible to specify strategies for every subsystem. In general, most subsystems have limited resources, so it is too complex to apply strategies powerful enough to adapt to all situations. Moreover, it is also impossible to enumerate all situations. A systematic solution is needed to decrease the complexity of runtime decision management.
The key idea for achieving self-management without decreasing dependability is to use simplicity to control complexity [5] and to simplify the management (control) based on self-similarity [46]. To achieve this, we need to take full advantage of the characteristics of SoS and design a systematic framework and self-similar subsystems that enable recursive composition for the CPS. Our framework includes four levels of abstraction: CPS, Agent, CompositedActor and CommonActor. The BNF (Backus Normal Form) of the composition relation is shown in Equation (1). To achieve self-similarity, we propose a well-designed actor interface that simplifies self-management. These actors share a set of similar operations; the self-similar interface is shown in Figure 11. By applying the FSM-based actor design, we simplify the constraints for runtime decision decomposition and actor composition. The detailed composition patterns will be discussed in Section 5.
Based on the idea of everything as an actor, we can abstract the decision as a compositedactor, which can be recursively decomposed at runtime. Based on the self-similar interface design, the ActorManager on different agents can manage every sub-part of a decision with the same rule. Every actor also supports the same set of actions, self-healing() and property_detecting(). property_detecting() is dedicated to checking the requirements against the actor's properties, which include process time and reliability. A compositedactor is generated by the adviceparser according to the advice. The compositedactor just fills the Tiggerconditions if there is no $Actor_{act}$ on the same agent. Otherwise, the compositedactor takes the action if the Boolean expression of the Tiggerconditions evaluates to true.
By using message-based composition, actors share the same communication pattern. Combined with the self-similar interface, actors can have self-similar behavior, which is depicted in Figure 12.
For example, based on the observation event $\chi = \langle Ob, Tid, \chi \rangle$ and CP, the observation is recursive (the Boolean operations are closed); logically, a subsystem at any level can be an observer. The recursive decomposition of an event stops when the event is an atomic event, i.e., $\chi \in \Sigma$. Based on the recursive design, a complex strategy/decision can be decomposed and processed by basic actors. Based on the self-similar behavior, simple (self-healing) rules can be applied at all levels of the CPS, which is shown in Figure 13. The threshold for timeout detection is the time bound $T$ defined in Section 4.2.
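The recursive structure of observation events can be illustrated with a minimal Python sketch. The class `ObservationEvent`, the function `unfold` and the example identifiers are hypothetical; the sketch only shows that an observation may wrap another observation and that the recursion ends at an atomic event.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ObservationEvent:
    """chi = <Ob, Tid, chi>: an observer watching either an atomic or a nested event."""
    op: str                                        # operation instruction carried by Ob
    target_tid: str                                # Tid of the actor being observed
    event: Union[str, "ObservationEvent"]          # atomic event name, or another observation


def unfold(chi: Union[str, ObservationEvent]) -> list[str]:
    """List the chain of observed actors, ending with the atomic event."""
    if isinstance(chi, str):                       # atomic event: chi is in Sigma
        return [chi]
    return [chi.target_tid] + unfold(chi.event)


nested = ObservationEvent("occur", "agent7", ObservationEvent(">", "sensor1", "temperature_high"))
print(unfold(nested))  # ['agent7', 'sensor1', 'temperature_high']
```

The same function handles every level of nesting, which reflects the idea that one simple rule can be applied at all levels of the system.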

Composition Rules for Reference Framework
As a CPS may dynamically reconstruct itself at any time, different subsystems may be selected to process a decision. Hence, the ACT of one and the same decision may be entirely different in different executions. Even the physical communication topology and the hardware structure may be dynamic, e.g., the communication topology of a smart fertilization CPS that consists of unmanned aerial vehicles (UAVs) and a WSN, or the hardware structure of a Network-on-Chip (NoC) system. The behavior of a decision changes with the actors involved. In this section, we formally analyze the consistency between decision (advice) requirements and subsystem properties based on the reference framework, and we give the rules for run-time composition that guarantee correctness and dependability.

Composability and Compositionality of Actors
Improving the C&C of actors is an effective solution that does not introduce too much complexity. The theory of composability and compositionality for actors is detailed in [47,48]. One main issue that limits the C&C of actors is that the composed actor may have potential deadlocks due to data flow loops. As every decision and each transition of an actor has a deadline (time property), this issue is not so serious. Therefore, in this paper we focus only on the rules for the composition of properties and the runtime decomposition of requirements.

The Pattern of Composition
The three basic formats of composition are illustrated in Figure 14. In each format, P is composed of i and j. In Figure 14b, i and j have different functional logic and run in parallel. For the redundant composition in Figure 14c, i and j also run in parallel but have the same functional logic.

Composition Rules for Reference Framework
As CPS may dynamically reconstruct at any time, different subsystems may be selected to process the decision. Hence, the ACT of one same decision may be entirely different in different executions. Even the physical communication topology and the hardware structure may be dynamic, i.e., the communication topology of the smart fertilization CPS that consists of the unmanned aerial vehicle (UAV) and WSN, or the hardware structure of a Network-on-Chip (NoC) system. The behavior of a decision changes with the actors involved. In this section, we formally analyze the consistency between decision (advice) requirements and subsystem properties based on the reference framework and give the rules for run-time composition to guarantee the correctness and dependability.

Composability and Compositionality of Actors
Improve the C&C of actors is an effective solution without introducing too much complexity. The theory on composability and compositionality for actors are detailed in [47,48]. One main issue that limits the C&C of actor is that the composed actor may have potential deadlocks due to the data flow loop. As every decision and each transition of actor has a deadline (time property), this issue is not so serious. Also, as we analyze in this paper, we just focus on the rules for the composition of properties and runtime requirements decomposition.

The Pattern of Composition
The three basic formats of composition are illustrated in Figure 14. In each format, P is composed of i and j. In Figure 14b, i and j have different functional logic and run in parallel. In the redundant composition of Figure 14c, i and j also run in parallel, but have the same functional logic.

According to automata theory, FSMs are closed under the operations union, intersection, concatenation, substitution, homomorphism, etc. Under the same input (advice), the composite FSM generates the same trajectory as the sub-FSMs, so FSM-based actors are compositional for the structures in Figure 14. A composited actor is still an actor: it inherits the logical function and the interface of its sub-actors (the design of the composited actor is shown in Figure 11).
Therefore, we can dynamically transform advice into a decision and reconstruct the decision using the closure under union and intersection (composition for parallel observation, Figure 14b) and concatenation (i.e., hierarchical structure composition, Figure 14a). We can also simplify QoS-based self-optimization and replacement-based self-healing using homomorphism (replacing an actor with actors on heterogeneous agents) and substitution (replacing Actor tid2 with the two actors Actor net and Actor tid2′, or building a redundant composition of Actor tid2 and Actor tid2′, Figure 15). We can use the closure under reversal to simplify reasoning.
Notice that Mealy FSMs are not closed under parallel composition [45], because the components i and j may depend on each other (a cyclic dependency, also called an algebraic loop). To break the cyclic dependency, we limit the number of final actuators to one (see Section 4.2). Closure here only means that the functional logic of the FSM (the trajectory of input and output) is closed under these operations. It does not mean that the properties of the subsystems are also closed; e.g., the worst-case response time is not closed under homomorphism and substitution.
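The serial (hierarchical) structure can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the actor design of Figure 11: the class names `Actor`, `MealyActor` and `SerialActor`, and the toy sensor/actuator transition tables, are hypothetical. The point it shows is that a composite built from two Mealy-style actors exposes the same `reset`/`step` interface as its parts and reproduces their trajectory.

```python
class Actor:
    """Hypothetical minimal actor interface: reset() and step(symbol) -> output."""

    def reset(self):
        raise NotImplementedError

    def step(self, symbol):
        raise NotImplementedError

    def run(self, inputs):
        """Reset the actor, feed the whole input sequence, collect the outputs."""
        self.reset()
        return [self.step(symbol) for symbol in inputs]


class MealyActor(Actor):
    """Mealy FSM given as a table {(state, input): (next_state, output)}."""

    def __init__(self, start, transitions):
        self.start = start
        self.transitions = transitions
        self.state = start

    def reset(self):
        self.state = self.start

    def step(self, symbol):
        self.state, output = self.transitions[(self.state, symbol)]
        return output


class SerialActor(Actor):
    """Serial composition: each output of `first` is the input of `second`.

    The composite is itself an Actor, so it can be nested again.
    """

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def reset(self):
        self.first.reset()
        self.second.reset()

    def step(self, symbol):
        return self.second.step(self.first.step(symbol))


def make_sensor():
    # Toy soil-moisture sensor actor (illustrative only): one state, two readings.
    return MealyActor("idle", {
        ("idle", "dry"): ("idle", "need_water"),
        ("idle", "wet"): ("idle", "ok"),
    })


def make_actuator():
    # Toy humidifier actor (illustrative only): remembers whether it is on.
    return MealyActor("off", {
        ("off", "need_water"): ("on", "start"),
        ("on", "need_water"): ("on", "keep"),
        ("on", "ok"): ("off", "stop"),
        ("off", "ok"): ("off", "idle"),
    })


if __name__ == "__main__":
    composite = SerialActor(make_sensor(), make_actuator())
    print(composite.run(["dry", "dry", "wet", "wet"]))
```

Because `SerialActor` only relies on the shared interface, replacing one sub-actor with another that has the same input/output alphabet does not change the code of the composite, which is the property the substitution argument above relies on.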

Constraints and Solution for Composability
(1) Interface composition (compatibility): According to interface theory, two interfaces are not compositional if one cannot accept the error output generated by the other [32]. A compositional interface can be achieved by designing the self-healing operation to be uninterruptible and by notifying the other actors before self-healing starts, because notification and timeout events are acceptable to all actors. The other actors can then reconstruct the decision, so no error output is generated or sent to other actors. After recovery, the actor restarts from state $s_0$.
(2) Consistent transition (limits the effects of failures): An actor is consistent in transition iff it produces an identical output sequence $\psi_1 \cdots \psi_n$ for all valid input sequences $\varepsilon_1 \cdots \varepsilon_m$, and every output is produced within its time bound, $t_{\psi} \le \hat{t}_{\psi}$.
For normal transitions without error states, this constraint is easily satisfied according to automata theory. If an actor fails, no such conclusion can be drawn, because the actor may generate an erroneous output that is unacceptable to other actors, and it cannot keep a consistent timing behavior in that situation. The only solution is to apply the redundancy methods introduced in Sections 5.2 and 6 to minimize the risk of failure. Meanwhile, we use the methods introduced in (1) to stop all actions immediately until the actor has recovered.
The cyclic dependency of actors confuses observation and makes decision tracing difficult. The behavior is not inferable if the composition is commutative, because the trajectory of Actor i composed with Actor j is then the same as that of Actor j composed with Actor i, and the observer cannot infer the behavior from the trajectory. Hence, if composing Actor i with Actor j is equivalent to composing Actor j with Actor i, such a composition should be forbidden, or Actor i and Actor j should be designed as one large actor.

Composition Rules of Reliability and Time (Duration) Properties
The compositionality of dynamic requirements and the composability of properties in dynamic behavior are two sides of the same coin. For the DSS, checking the rationality of a piece of advice means estimating the holistic properties of a decision from the properties of the actors, taking into account the structures available for processing. For the actors that process the decision, evaluating the practicability of a dynamic arrangement (decision decomposition) means checking the fit between the run-time requirements and the properties of the (next-step) actors.
Most requirements/properties of decisions, including both system-level and subsystem-level requirements, change over time. Most properties of subsystems depend only on the duration of processing and can be specified as a function of duration/time, e.g., the reliability R(τ) and the energy consumption E(τ) = P × τ. In this paper, we focus on dependability and process time, but the method can serve as a reference for other requirements.

Calculation Rules for Reliability Composition for Relative Time Based Framework
The reliability function is written as

$$R(\tau) = e^{-\int_0^{\tau} \lambda(x)\,dx} = 1 - \int_0^{\tau} p(x)\,dx,$$

where $\lambda(\tau)$ is the failure rate function, $p(\tau)$ is the failure density function, $\tau$ is the duration, and $R(\tau) \in (0, 1)$. To simplify the equations, we use the absolute duration in this section, because the statistics of $p(\tau)$ are collected over absolute duration. It can be transformed into the relative time model, where $\tau_0$ is the duration of the observer actor, $f_b$ is the absolute frequency and $f_0$ is the frequency of the MCU on which the observer runs.
We summarize seven types of composition solutions, which are shown in Table 3. $t_i$ is the timestamp of the last recovery of Actor i, $\tau^{r}_{i}$ is the time elapsed since the last recovery, and $\tau^{p}_{i}$ is the process time, with $\tau^{b}_{i} \le \tau^{p}_{i} \le \tau^{w}_{i}$. $t_i + \tau^{r}_{i}$ is the timestamp at which Actor i starts to process the current event, and $t_i + \tau^{r}_{i} + \tau^{p}_{i}$ is the timestamp at which it finishes processing the current event. To simplify, let . Notice that the equations in Table 3 can also be applied as the rules for decomposing the reliability requirement at run-time.
Patterns 1 and 2 are the two basic functional composition patterns, and patterns 3 to 5 are the basic redundant processing patterns. All three redundant patterns start m actors simultaneously to process the same decision. Pattern 3 accepts the first returned output without waiting for the others (e.g., reliable message transmission). Pattern 4 does not start the next action until all actors have finished (e.g., to guarantee the reliability of sensing data). Pattern 5 starts the next action after receiving k identical outputs. Patterns 6 and 7 are composite patterns that trade off the time requirement against reliability. A CPS can apply a different strategy (patterns 3-5) for accepting the outputs in each step.
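The redundant patterns can be made concrete with a short sketch. It does not reproduce the equations of Table 3; it only applies textbook redundancy formulas under two assumptions that are not stated in the text: actor failures are independent, and a surviving actor always returns the correct output. The function names are hypothetical. First wins (pattern 3) succeeds if at least one actor succeeds, pattern 5 succeeds if at least k of the m actors succeed, and the step finishes at the k-th earliest finish time (k = 1 for first wins, k = m for check all).

```python
def first_wins_reliability(reliabilities):
    """Pattern 3 sketch: the step succeeds if at least one actor succeeds."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1.0 - r
    return 1.0 - all_fail


def k_of_m_reliability(reliabilities, k):
    """Pattern 5 sketch: the step succeeds if at least k actors succeed.

    dp[j] holds the probability that exactly j of the actors seen so far
    succeeded (independent, possibly different reliabilities).
    """
    m = len(reliabilities)
    dp = [1.0] + [0.0] * m
    for r in reliabilities:
        for j in range(m, 0, -1):
            dp[j] = dp[j] * (1.0 - r) + dp[j - 1] * r
        dp[0] *= 1.0 - r
    return sum(dp[k:])


def completion_time(finish_times, k):
    """Timestamp at which the k-th output arrives, assuming every actor returns.

    k = 1 corresponds to first wins, k = len(finish_times) to check all.
    """
    return sorted(finish_times)[k - 1]


if __name__ == "__main__":
    rels = [0.9, 0.8, 0.7]
    finish = [130.0, 110.0, 150.0]
    print("first wins:", first_wins_reliability(rels), completion_time(finish, 1))
    print("2 of 3    :", k_of_m_reliability(rels, 2), completion_time(finish, 2))
    print("check all time:", completion_time(finish, len(finish)))
```

The sketch makes the trade-off visible: lowering k shortens the completion time and raises the chance that enough outputs arrive, while raising k delays the next action in exchange for more cross-checked outputs.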
Both reliability and process time are very important to a safety-critical CPS. We can apply different composition patterns to arrange the decision process so as to balance dependability, efficiency (minimizing the number of redundant actors) and correctness (meeting the requirement $\tau_r$, in other words, finishing the decision in time). For example, if the reserved time satisfies $\mathrm{Min}(\tau_{DC}) < \tau_r - \tau_{rs} < \mathrm{Max}(\tau_{DC})$ and the run-time reliability requirement for the next step satisfies $\mathrm{Min}(R_{DC}) < r^{d}_{dep} < \mathrm{Max}(R_{DC})$, we can apply pattern 6.2 together with pattern 5 to meet the time and reliability constraints at the same time.
Table 3. The composition rules for reliability and duration.

[Table 3 body. Columns: Patterns; The Structure of the Composition; $R_{DC}(\tau)$, $\tau_{DC}$ and $t$ (see note 1). Recoverable rows: (1) Basic pattern; (2) Parallel pattern (time critical); (3) First wins (time critical); (4) Check all (reliability critical); (6) Hybrid pattern. Recoverable conditions: $\max\{t_{1,n} + \tau_{1,n}, \cdots, t_{m,n} + \tau_{m,n}\} = t$; $j$ is the first actor (apply pattern 3 in all steps).]
Note 1: $R_{DC}(\tau)$ is the reliability of the decision, $\tau_{DC}$ is the duration of the decision, and $t$ is the absolute timestamp at which the decision is finished. $\wedge$ is the AND operator of Boolean logic.

(3) Constraints for advices and decisions
For an operable advice, the time requirements should satisfy $\sum_{Actor_i \in Ad} \tau^{b}_{i} \le \tau_d$ ($\tau^{b}_{i}$ is the BCET defined in Section 4.2), and the dependability requirements should satisfy $r_{dep} < \mathrm{Max}(R_{DC})$ if $\tau_{rs} < \mathrm{Max}(\tau_{DC})$. If $\tau_{rs} > \mathrm{Max}(\tau_{DC})$ and $\tau_d > \tau^{w}_{i+1}$, we can try to redo Actor i+1 to improve the reliability.
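The constraints above can be read as a simple admission check. The following is a minimal sketch under assumptions: the function name, the argument names and the returned dictionary are hypothetical, and the text does not say what the reliability rule is when $\tau_{rs} \ge \mathrm{Max}(\tau_{DC})$, so the sketch reports `None` (undecided) in that case rather than guessing.

```python
def check_advice(bcets, tau_d, r_dep, max_r_dc, tau_rs, max_tau_dc, wcet_next):
    """Evaluate the operability constraints for one advice.

    bcets      : best-case execution times of the actors named in the advice
    tau_d      : time requirement of the decision
    r_dep      : dependability (reliability) requirement
    max_r_dc   : largest reliability reachable by the available compositions
    tau_rs     : the tau_rs quantity of the text (interpretation left to the paper)
    max_tau_dc : largest duration among the available compositions
    wcet_next  : WCET of the next actor, Actor i+1
    """
    # Time constraint: the sum of the BCETs must fit into tau_d.
    time_ok = sum(bcets) <= tau_d

    # Reliability constraint, stated in the text only for tau_rs < Max(tau_DC).
    if tau_rs < max_tau_dc:
        reliability_ok = r_dep < max_r_dc
    else:
        reliability_ok = None  # not decided by the rule quoted above

    # Redo option: allowed when tau_rs > Max(tau_DC) and tau_d > WCET of Actor i+1.
    can_redo = tau_rs > max_tau_dc and tau_d > wcet_next

    return {"time_ok": time_ok, "reliability_ok": reliability_ok, "can_redo": can_redo}


if __name__ == "__main__":
    print(check_advice(bcets=[100, 120, 90], tau_d=400, r_dep=0.95,
                       max_r_dc=0.99, tau_rs=200, max_tau_dc=330, wcet_next=132))
```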

Decision Process Patterns and Reliability
We summarize four kinds of decision arrangement solutions: two are traditional solutions, and the other two are designed for our framework. In this subsection, all involved actors are composited actors, and the final composition structure of the decision is the same as structure 1 of Table 3. The availability of n compositional actors follows a classical Markov repairable system [49], which is briefly introduced in Appendix A.1.

Static Decision Process Strategy (Static for Short)
This is the traditional solution for a static architecture, where both hardware and software are centralized. Applications are built as a macro-system. All actors are tightly implemented as a union, and the connections between actors cannot be modified dynamically. The structure is shown in Figure 16.
As all subsystems (components) are built on the agent, every subsystem is online. The online duration of each subsystem is the sum of the decision process durations $\tau_j$ on all Actor j. The reliability of a decision under this solution is given by Equation (2).

Centralized Decision Process Strategy (Centralized for Short)
This is the typical decision process flow of current solutions: the hardware structure is decentralized, but the control is centralized. The centralized decision manager selects the next actors from the distributed agents and controls the flow of the decision process. The structure is shown in Figure 9. In this structure, the centralized decision manager must be online during the whole process time. The reliability function of a decision under this solution is given by Equation (3). Notice that, to focus on the decision process, we ignore the processing time of the decision manager in each step; as a result, $R_{cnt}$ is larger than the real value. In the simulation, we let the failure density function of the manager be $p_{mg} = p_1$.
In the simple dynamic strategy, the decision is processed dynamically without feedback control, and both structure and control are decentralized. The decision process flow is shown in Figure 10. As actors can heal themselves dynamically, they have different online durations ($t_0$ differs between actors). The decision process fails if and only if an actor fails while it is processing the decision. The reliability of a decision under this solution is given by Equation (4).
The flow of the one-order feedback dynamic decision process is illustrated in Figure 17. Suppose that Actor i and Actor i+1 are two connected composited actors, and Actor i+1 is processing the decision.
(1) If Actor i+1 fails while it is processing a decision and Actor i does not receive the ACK message from Actor i+1 within $\tau^{w}_{i+1} + \tau_{msg(i+1,i)}$, Actor i can resend msg(i, i+1) to another Actor i+1 to re-process the decision if the time requirement permits. The decision continues to be processed correctly.
(2) If both Actor i and Actor i+1 succeed but Actor i does not receive the ACK message because the network failed, Actor i+1 can still find a successor Actor i+2 to process the decision, while Actor i may resend a query to Actor i+1. As the decision has a UUID, the final actuator processes the decision only once and ignores the repeated decision sent by Actor i+1. This wastes resources, but the decision is processed correctly.
(3) Similar to (2), if Actor i fails and Actor i+1 succeeds, Actor i+1 can simply ignore Actor i and pass the decision to the next successor. The decision is again processed correctly.
(4) If Actor i+1 fails and Actor i also fails, no actor holds the status of the decision, so no actor can rearrange the processing of this decision. Consequently, the decision fails.
Therefore, under the one-order feedback solution, a decision fails only when Actor i and Actor i+1 both fail. Actor i needs $\hat{\tau}_{dl} + \tau^{w}_{i+1} + \tau_{msg(i+1,i)}$ to become aware of the failure and $\tau_{msg(i,i+1)}$ to resend the message to another Actor i+1, after which the decision continues to be processed. If Actor i also fails during $\hat{\tau}_{dl} + \tau^{w}_{i+1} + \tau_{msg(i+1,i)} + \tau_{msg(i,i+1)}$, this instance of the decision fails. The failure rate is given by Equation (5), and the reliability of the dynamic processing strategy is given by Equation (6). Notice that if $\tau^{w}_{i+1} + \tau_{msg(i+1,i)} < \hat{\tau}_{dl}$ cannot be satisfied, Actor i has no time to find another available actor and the decision fails; however, we can decrease the failure probability with a redundant solution, i.e., arranging more successors to process the decision, as introduced in Section 5.2.1. We can also develop a higher-order system to improve the reliability: the ACK message is sent from Actor i+1 to Actor i and on to Actor i−k in recursive form, so that Actor i−k also knows the status of the decision.
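The four cases can be summarized as a small decision table. This is a sketch of the case analysis only, not of the protocol implementation; `decision_survives` and `FinalActuator` are hypothetical names, and the UUID-based duplicate suppression is modelled in the simplest possible way (a set of already processed identifiers).

```python
def decision_survives(actor_i_ok, actor_next_ok, time_permits=True):
    """Outcome of one handover step under one-order feedback.

    actor_i_ok    : Actor i stays alive while waiting for the ACK
    actor_next_ok : Actor i+1 processes the decision successfully
    time_permits  : Actor i still has time to resend to another Actor i+1
    """
    if actor_next_ok:
        # Normal case, case (2) (ACK lost) and case (3) (Actor i failed):
        # Actor i+1 passes the decision on to its successor.
        return True
    if actor_i_ok:
        # Case (1): Actor i times out and resends, if the deadline allows it.
        return time_permits
    # Case (4): nobody holds the status of the decision any more.
    return False


class FinalActuator:
    """Processes each decision once; repeated copies with the same UUID are ignored."""

    def __init__(self):
        self.seen = set()
        self.executed = []

    def process(self, decision_uuid, action):
        if decision_uuid in self.seen:
            return False  # duplicate caused by a resend, ignored
        self.seen.add(decision_uuid)
        self.executed.append(action)
        return True


if __name__ == "__main__":
    for i_ok in (True, False):
        for next_ok in (True, False):
            print(i_ok, next_ok, decision_survives(i_ok, next_ok))
    actuator = FinalActuator()
    actuator.process("decision-1", "humidify")
    actuator.process("decision-1", "humidify")  # resent copy
    print(actuator.executed)
```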

Simulation and Result Analysis
From Equations (2)-(4) and (6), we conclude that the reliability decreases with the process time and the failure-awareness time (WCET), because the reliability $R(t) = 1 - F(t)$ is a decreasing function of time. Here, we conduct a set of simulations in MATLAB to test the reliability of the four decision process strategies against the complexity of the decision (the number of composited actors).
In this simulation, we assume that the failure functions of all actors follow the exponential distribution ($F(t) = 1 - e^{-\lambda t}$), which is a common assumption in reliability evaluation. The failure rate λ of each actor follows a uniform distribution over [0, 0.0002]. The number of actors increases from 2 to 40 in steps of 2. The process time τ of each actor follows a uniform distribution over [100, 300]. The range of the process time affects the minimum and maximum reliability of the process; results for other ranges are shown in Appendix A.3. The WCET is $\tau^{w} = 1.1 \times \tau$; $\hat{\tau}_{dl} = 20$; and $t^{s}_{i} = 0$, which means that all actors are renewed for the decision process. The URL for the scripts and simulation data is given in the supplementary materials.
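The setup above can be sketched in a few lines. This is not the MATLAB script of the supplementary materials, and Equations (2)-(6) are not reproduced in this text, so the four reliability functions below are assumptions derived from the prose: in the static strategy every actor is online for the whole decision; in the centralized strategy the manager (with the failure rate of the first actor, following $p_{mg} = p_1$) is online for the whole decision while each actor is online only for its own step; in the simple dynamic strategy only the per-step exposure counts; and under one-order feedback a step fails only if Actor i+1 fails during processing and Actor i fails within the awareness window $\hat{\tau}_{dl} + \tau^{w}_{i+1}$ (message delays are taken as zero, and the first actor is assumed to have no predecessor). All function names are hypothetical.

```python
import math
import random


def rel(lam, duration):
    """Exponential reliability R(t) = exp(-lambda * t)."""
    return math.exp(-lam * duration)


def static_strategy(lams, taus):
    total = sum(taus)  # every actor is online for the whole decision
    return math.prod(rel(lam, total) for lam in lams)


def simple_dynamic_strategy(lams, taus):
    return math.prod(rel(lam, tau) for lam, tau in zip(lams, taus))


def centralized_strategy(lams, taus):
    manager = rel(lams[0], sum(taus))  # assumption: manager fails like actor 1
    return manager * simple_dynamic_strategy(lams, taus)


def one_order_feedback_strategy(lams, taus, tau_dl=20.0, wcet_factor=1.1):
    result = rel(lams[0], taus[0])  # assumption: first actor has no predecessor
    for i in range(1, len(lams)):
        fail_next = 1.0 - rel(lams[i], taus[i])
        window = tau_dl + wcet_factor * taus[i]  # message delays taken as 0
        fail_prev = 1.0 - rel(lams[i - 1], window)
        result *= 1.0 - fail_next * fail_prev
    return result


STRATEGIES = {
    "static": static_strategy,
    "centralized": centralized_strategy,
    "simple_dynamic": simple_dynamic_strategy,
    "one_order_feedback": one_order_feedback_strategy,
}


def simulate(sizes=range(2, 41, 2), runs=100, seed=0):
    """Mean reliability per strategy for each number of actors."""
    rng = random.Random(seed)
    means = {}
    for n in sizes:
        totals = dict.fromkeys(STRATEGIES, 0.0)
        for _ in range(runs):
            # The same lambda and tau are used for all four strategies in a run.
            lams = [rng.uniform(0.0, 0.0002) for _ in range(n)]
            taus = [rng.uniform(100.0, 300.0) for _ in range(n)]
            for name, strategy in STRATEGIES.items():
                totals[name] += strategy(lams, taus)
        means[n] = {name: total / runs for name, total in totals.items()}
    return means


if __name__ == "__main__":
    for n, row in simulate().items():
        print(n, {name: round(value, 4) for name, value in row.items()})
```

Under these assumptions the simple dynamic strategy is never less reliable than the static or centralized one, and one-order feedback is never less reliable than the simple dynamic one, for any draw of λ and τ; the absolute numbers depend on the assumed formulas and should not be compared with Figure 18.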
For each number of actors, we run 100 simulations, in which λ and τ are the same for the four strategies; λ and τ are redrawn for every run. The simulation results of the four strategies against the complexity of the decision (the number of actors) are illustrated in Figure 18. Part of the statistical results is listed in Table 4, and the stability analysis is shown in Figure 19.
The simulation results show that the dynamic, decentralized decision process achieves not only higher reliability (static process < centralized process < simple dynamic process < one-order feedback) but also higher stability and scalability. The static process is centralized control with a centralized structure; the centralized process is centralized control with a decentralized structure; the simple dynamic process and one-order feedback are decentralized control with a decentralized structure. The curves in Figure 18 show that decentralization can increase the reliability of the system. The one-order feedback solution has the highest reliability. For each number of actors, it also has smaller values of Max−Min and (Max−Min)/Mean, which shows that its behavior is more stable against variable process time.
Thus, the reliability of one-order feedback is more predictable, which is important for decision arrangement. As the complexity increases, the values of Max−Min and (Max−Min)/Mean show that the one-order feedback solution remains more stable. The reliability of the one-order feedback solution decreases slowly with the number of actors, which shows higher scalability: it can be applied to more complex decisions that involve more actors. In addition, the one-order feedback strategy achieves higher reliability and stability than the simple dynamic decision process strategy without introducing time overhead (it only increases the memory overhead, because the preceding actor must keep the message until the successor actor returns the ACK).

Proactive Self-Healing for Fault Prevention (Risk Management)
There are two types of self-healing strategy: (1) self-healing actions occur only after an actor has failed; such a system is a classical Markov repairable system (see Appendix A.1); (2) the actors take proactive actions to heal themselves before any fault occurs (e.g., an actor can periodically check and restart itself to prevent faults). A CPS is safety-critical. Applying the first strategy increases the risk of missing the deadline, because repair takes time. Therefore, we can apply the second strategy to prevent failures and improve safety.
According to the hazard function $h_{i+1}(t) = p(t)/R(t)$, we can substitute $R_i(\tau_i)$ and $R_{i+1}(\tau_{i+1})$ from Equations (2)-(4) and (6) and obtain the hazard function of failure for each strategy when Actor i hands the decision process over to Actor j (the detailed equations are given in Appendix A.2). Obviously, the hazard increases with the online time of Actor j (centralized processing also depends on the online time of the decision manager, and the one-order feedback strategy also depends on the online time of Actor i).
We can define a risk threshold $H_{threshold}$ and require $h_{i+1}(\tau) \le H_{threshold}$; solving this inequality yields $\tau_p$, the period of self-healing. In this way, we can control the failure risk and improve the safety of the decision process. In addition, assume that an actor takes $\tau_h$ to heal itself. The availability of every actor is then $A = \tau_p / (\tau_p + \tau_h)$.
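As a worked illustration of choosing the self-healing period, the sketch below uses a single actor with a Weibull hazard. The Weibull form is an assumption made only for this example (the hazard functions of the four strategies are in Appendix A.2 and are not reproduced here); it is chosen because its hazard $h(t) = (k/\eta)(t/\eta)^{k-1}$ increases with online time for $k > 1$ and can be inverted in closed form. The function names and parameter values are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Hazard h(t) = (k / eta) * (t / eta) ** (k - 1) of a Weibull lifetime."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def healing_period(h_threshold, shape, scale):
    """Largest online time tau_p with h(tau_p) <= h_threshold (requires shape > 1)."""
    if shape <= 1.0:
        raise ValueError("the hazard must increase with time (shape > 1)")
    return scale * (h_threshold * scale / shape) ** (1.0 / (shape - 1.0))


def availability(tau_p, tau_h):
    """A = tau_p / (tau_p + tau_h), with tau_h the time needed to self-heal."""
    return tau_p / (tau_p + tau_h)


if __name__ == "__main__":
    tau_p = healing_period(h_threshold=1e-4, shape=2.0, scale=1000.0)
    print("self-healing period:", tau_p)
    print("hazard at tau_p    :", weibull_hazard(tau_p, 2.0, 1000.0))
    print("availability       :", availability(tau_p, tau_h=2.0))
```

A lower threshold shortens the period and therefore lowers the availability, so the threshold trades failure risk against the time an actor spends restarting.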

Case Study
To test, validate and evaluate the proposed concepts, we implemented a test-bed platform. We used a PC as a local DSS, which connects to the other subsystems through a USB-to-ZigBee adapter. There are three types of Arduino boards (Mega2560) and a humidifier. Type 1 (top): the board has a light sensor (Keyes K853518). Type 2 (middle): the board has a soil moisture sensor (FC−28) and controls the humidifier with a relay module. Type 3 (bottom): the board has a temperature and humidity sensor (DH11). The three types of Arduino boards cooperate to process the decision. The platform of case 1 is shown in Figure 20.
In all cases, the actors for the same sensors and actuators are implemented with the same code. We automatically inject faults on every Arduino board while actors are active, at a frequency of one error every 4 s. The subsystems can recover from the failures with a container-based multilevel self-healing solution, which was introduced in our previous paper [50].

Case 1: All actors in one board
This is a macrosystem solution: all sensors and actuators are integrated into one board. It is also a centralized control solution: all process flows are controlled by one agent. Notice that, to improve reliability, the actuator notifies the local DSS of the humidifying progress every minute.

Case 2: T1_one + T2_one + T3_one
In this case, the CPS has no redundant subsystems. It has just one type 1 board, one type 2 board and one type 3 board.

Case 3: T1_two + T2_one + T3_two
In this case, CPS has two redundant type1 boards and two redundant type3 boards. It has one type2 board. To avoid over modifying the environment, the process of the final step of decision (humidifying) is mutually exclusive, and only one actuator is in charge of the final step of action.
Two types of failures are injected: (1) actor level failure, which is WCET violation [50]; the target actor will start the self-healing action. Other actors on the same board work normally; (2) Board level failure, which is Random PC Error [50]; the board will be restarted and all actors on this board are failed.
The tests take 12 days (it takes about two days to test every case and each failure, from 15 August to 27 August). The deadline of decision process is 15 min. It takes the board 1-4 s to recover from board level failure, and it takes about 80-110 milliseconds to recover from an actor level failure. The results are shown in Tables 5 and 6. The failure rates in Table 4 show that the decentralized framework (cases 2 and 3) can tolerate the higher frequency failures, especially board level failure. One-order feedback strategy on decentralized framework (cases 2 and 3 in Table 4) can successfully process all decisions in time. The mean process time (MPT_N) in Table 5 shows that case1 has highest, performance in normal model, but it is not much better than the distributed framework (the overhead of the distributed framework in normal mode is 239.2 − 235.8 = 3.4 s). The mean process time under fault injection (MPT_F) shows that the distributed framework has significantly higher performance than centralized solution. One-order feedback strategy on the redundant decentralized framework (case 3 in Table 5) can shorten the redoing time, cover the time cost of self-healing and improve the dependability of decision process. In summary, One-order feedback strategy on the redundant decentralized framework can tolerate failures, leave more time to actuator to take the final action, which can improve the safety of decision process (the actuator can cautiously process the final action with more frequent checking).
Note that, to improve the dependability of the real-world CPS, we have applied multi-level measures, including an actor-level self-healing solution, a node-level self-healing solution and a decentralized fault detection solution. The results in Table 4 reflect the combined effect of these measures. Moreover, the process time is affected by the weather (humidity and temperature). Normally, it takes the humidifier about 4 min to increase the moisture from 30 to 50 (that is, until the moisture of the surface soil has reached 50). In addition, it takes about 15 min for the soil moisture to drop from 50 to 30 in Harbin in August.

Discussion
In this paper, we mainly focus on the introduction of a compositional framework and the evaluation of the decentralized decision process. A CPS is an autonomic computing system that should adapt to the changeable environment and prevent and recover from various failures automatically. To achieve this goal, a CPS has to adjust its structure and behavior dynamically. We introduce a systemic solution to improve the consistency of event observation (the long-term loop) and the dependability of the decision process (the short-term loop). To resolve inconsistent event timestamps, an observer-based relative time solution is proposed to guarantee consistent event observation for causal reasoning and processing duration management. The relative time solution infers the timestamp at which an event occurred from the process duration and the timestamp at which the event was observed. Using the locality of events, we can select the nearest local observer to bound the observation error. Because this solution needs neither a global reference time nor periodic clock synchronization, it can increase the scalability of the CPS.
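The inference step of the relative time solution can be sketched as follows. This is a minimal illustration under stated assumptions: the `Observation` type, the function names and the numeric values are hypothetical, and the process duration is assumed to be known to the observer. It is not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    event: str          # event label (hypothetical)
    received_at: float  # observer's local timestamp when the message arrived
    duration: float     # process duration between occurrence and arrival


def occurrence_time(obs: Observation) -> float:
    """Infer when the event occurred, on the observer's local clock."""
    return obs.received_at - obs.duration


def causal_order(observations: list[Observation]) -> list[str]:
    """Order events by inferred occurrence time, earliest first."""
    return [o.event for o in sorted(observations, key=occurrence_time)]


# Two messages arrive at the same local time t but with different durations.
t = 100.0
obs = [Observation("e1", t, 10.0), Observation("e2", t, 20.0)]
order = causal_order(obs)
print(order)  # ['e2', 'e1']
```

Only the observer's local clock is used, so no global reference time is required; the error of the inferred order depends on how accurately the durations are known.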
To minimize observation errors and to overcome the single point of failure of a centralized decision process, we design a formal reference framework based on compositional actors for self-managed CPSs. Based on the idea of a decision as a program, actor-based decisions (advice) can be decomposed and composed at runtime. Moreover, a self-similar recursive actor interface is proposed to simplify self-management. We provide patterns, evaluation rules and constraints for the composition and decomposition of reliability and process time.
Based on this framework, we propose a simple dynamic decision process strategy and a one-order dynamic feedback decision process strategy, and compare their reliability with that of the traditional static strategy and the centralized decision process strategy. The simulation results show that the one-order dynamic feedback strategy has high reliability, scalability and stability against the complexity of decisions and random failures.
The test results of the real-world system show the overall improvement in dependability achieved with our framework. Our compositional framework improves scalability through three main solutions: (1) the relative time model removes the central reference time node; (2) the compositional framework supports a decentralized decision process; (3) the one-order dynamic feedback strategy improves scalability. A CPS can apply different composition patterns to balance the requirements of safety, reliability and process time.
In this paper, we show a way to simplify the dependability evaluation for dynamic systems. By improving the composability and compositionality of actors, we can evaluate the system requirements with the properties of compositional actors, and deduce the system behavior from the behavior of subsystems, which can accelerate the evaluation significantly.
Flow1 and Flow2 have a parallel composition. Thus the reliability of Actor tid1 and Actor tid2 should be larger than 0.98. To synchronize observation, Actor tid1 first waits $\tau_w = 17$ and then starts its observation. Let us assume that Actor act receives the message from Actor tid1 at local timestamp t. According to the relative time model, Actor act can deduce that the event from Actor tid1 occurred at t − 10 and that the event from Actor tid2 occurred at t − 20. Actor act thus obtains the correct order of events. Such a decentralized solution reduces the error caused by timestamp differences.
Without loss of generality, let us assume that Actor tid2 is a composited actor, which means that no single Actor can meet the reliability requirement. We can improve the reliability with the redundant composition introduced in Section 5.1. If we apply composition pattern 4, the decomposition rule is $1 - \prod_{i=1}^{m} (1 - R_i(\tau_i)) \geq 0.98$, and the maximum $\tau_{DC} < 100 - 55 = 45$. Thus, the local DSS will select m Actors whose tid = tid2 to observe the event $e_2$ together. Let us suppose that two Actors with tid = tid2 are selected. As events and actors support recursive composition, every composited actor can decompose the requirement as the local DSS does in the example. Suppose $e_2 = \langle ob, tid_4, e_4 \rangle \,\&\, \langle ob, tid_5, e_5 \rangle$ is a composited event and $e_4$ is an atomic event, $e_4 \in \Sigma$. The decomposed requirement of Actor tid2.1 is $R_s = \langle \tau_d, \tau_v, r_s^{dep} \rangle = \langle 45, 0, 0.96 \rangle$. Actor tid2.1 acts as a local DSS; the process flow is shown in Figure A3, with $e_2$ replaced by $e_4$.
Figure A3. The example flow of decomposition.
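The selection of m redundant actors can be illustrated with the standard parallel-redundancy formula, $1 - \prod_{i}(1 - R_i)$. The sketch below assumes that redundant actors fail independently and that this formula is what the redundant composition pattern evaluates; both are assumptions for illustration. The function names and the single-actor reliability of 0.90 are hypothetical; only the 0.98 target comes from the example.

```python
import math


def parallel_reliability(reliabilities: list[float]) -> float:
    """Reliability of redundant actors: 1 - prod(1 - R_i).

    Assumes independent failures (an assumption of this sketch).
    """
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def min_redundancy(r_single: float, target: float, max_m: int = 16):
    """Smallest m such that m identical redundant actors meet the target.

    Returns None if no m up to max_m is sufficient.
    """
    for m in range(1, max_m + 1):
        if parallel_reliability([r_single] * m) >= target:
            return m
    return None


# Hypothetical actor reliability of 0.90 against the target of 0.98.
m = min_redundancy(0.90, 0.98)
print(m)  # 2
```

With identical actors, each additional replica multiplies the residual failure probability by (1 − R), so a small m is usually enough when single-actor reliability is already high.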
A decision consists of several actors. The reliability of each actor lies in (0, 1). The four Equations (2)-(4) and (6) can be simplified as $R_{dc} = c \times \prod_{i=1}^{n} R_i$, where $R_i \in (0, 1)$. $R_{dc}$ decreases if $R_i$ decreases ($R_1 \times R_2 < \min(R_1, R_2)$, where $0 < R_1 < 1$, $0 < R_2 < 1$). As $R_i$ decreases with the process time, $R_{dc}$ also decreases with the process time of the decision, which is $\sum \tau_i$. All the parameters used are the same as in the simulation in Section 6.2. The number of actors is varied with a step of 2. From Figure A4a-d we conclude that the range of process time affects the minimum and maximum reliability of the process. The one-order feedback strategy has the best reliability and stability. An actor with a small process time range is more stable (Figure A4a vs. Figure A4d).
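The effect of the product form on decision reliability can be checked numerically. This is a minimal sketch of $R_{dc} = c \times \prod R_i$; the function name, the default c = 1 and the per-actor reliability of 0.99 are assumptions chosen only for illustration, not simulation parameters from Section 6.2.

```python
import math


def decision_reliability(actor_reliabilities: list[float], c: float = 1.0) -> float:
    """Reliability of a decision composed of actors: c * prod(R_i)."""
    return c * math.prod(actor_reliabilities)


# The product of two reliabilities in (0, 1) is below the smaller one.
r1, r2 = 0.9, 0.8
assert decision_reliability([r1, r2]) < min(r1, r2)

# Reliability falls as more actors (each with hypothetical R_i = 0.99)
# are composed; the actor count grows in steps of 2.
counts = list(range(2, 11, 2))
values = [decision_reliability([0.99] * n) for n in counts]
for n, r in zip(counts, values):
    print(f"n = {n:2d}  R_dc = {r:.4f}")
```

Because every factor is below 1, adding actors or lowering any single $R_i$ can only reduce $R_{dc}$, which is why the complexity of a decision directly limits its reliability.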

Appendix A.4 Hazard Function of Four Strategies
Hazard function of the static decision process strategy:
Hazard function of the centralized decision process strategy:
Hazard function of the simple decentralized dynamic decision process strategy:
Hazard function of the one-order feedback decentralized dynamic decision process strategy: