Modelling of Failures Effect of Open Transmission System for Safety Critical Applications with the Intention of Safety

Abstract: The paper deals with the problem of modelling the safety features of an open transmission system used within safety-related applications. The basic principles of modelling the effect of failures on the safety of an open transmission system, and the standards used in the process of safety evaluation, are summarised in the paper. The practical part describes a realized Markov model for determining the effects of random failures on the safety of a safety-related wireless communication system with safety and cryptographic codes. The model reflects the safety analysis of the effects of failures caused by electromagnetic interference in the wireless communication channel and of random HW failures of the transmission system. The paper presents the results of simulating the parameters of the transmission system and the impact of the block length of the cryptographic code on the resulting probability of an undetected corrupted message.


I. INTRODUCTION
Safety-related systems are characterized by high tolerance against the dangerous effects of failures. The consequences of system failures can be measured directly on the system, obtained by simulating the system on a model, or derived by theoretical consideration and calculation. It should be noted that the high safety requirements of safety-related systems cannot be demonstrated only by test results or by results from practice (the frequency of occurrence of a dangerous state is very small and the mean time between failures far exceeds the lifetime of the safety-related system). Safety analysis of the system helps to provide evidence that the safety requirements are met and that the resulting risk is acceptable.

Manuscript received February 12, 2013; accepted November 20, 2013. This work has been supported by the Educational Grant Agency of the Slovak Republic (KEGA) Number: 024ŽU-4/2012: Modernization of technology and education methods orientated to area of cryptography for safety critical applications.
In technical practice, safety is seen as one of the comprehensive indicators of the reliability attribute. This attribute refers to the degree to which a user can rely on the system operating the way it should, being available at the given time and under the given circumstances, and being safe. This combination of the attributes Reliability, Availability, Maintainability and Safety is known under the acronym RAMS [1].
A communication system is an essential part of the whole safety-related system. It is therefore necessary to pay attention to the method of safety analysis and to the system synthesis. If the communication system is divided into subsystems, the calculation of the total failure rate must include the failure rate of the end device, including its interface, and the failure rate of the transmission system, which consists of a transmitter, a communication channel, a receiver and other network elements [2]. The failure rate of end devices is in most cases stated by the manufacturers, so attention needs to be paid only to the safety of the safety-related transmission system [3]. Nowadays, open transmission systems are being adopted even for applications with high safety requirements, for example GSM-R technology (the communication medium for train control in the development of the European Train Control System [4], [5]) or other wireless media (Wi-Fi, Bluetooth, ZigBee, WiMAX) within safety-related control systems in industrial automation [6]-[8]. The development of wireless safety-related systems (W-SRS) is based on the use of COTS (Commercial Off The Shelf) technologies and on additional safety layers, as recommended by standards for railway applications and industrial applications [9]-[12]. The additional layers (safety profile) focus mainly on protection against transmission errors (usually eliminated by a safety code) and against unauthorized access to the system (usually eliminated by a cryptographic code). In the development phase of the system, quantitative evidence shall be given that the safety mechanisms used in the safety profile meet the requirements of the safety integrity level (SIL) for both of these protections. SIL is defined in [10] at four levels, from the lowest, SIL 1, up to the highest, SIL 4.
Within the qualitative analysis of the wireless safety-related system, the authors focused on hazard analysis of safety-related message transmission, on determining the error probability of the cryptographic code decoder, and on determining the dangerous failure rate of the wireless safety-related communication system at the level of a point-to-point connection.

II. PROCEDURE FOR SOLUTION BASED ON QUANTITATIVE ANALYSIS
Let us consider a point-to-point communication system (Fig. 1) which consists of two items of wireless safety-related equipment, W-SRE1 and W-SRE2, and a wireless transmission system. The trusted wireless transmission system provides safety-related transmission (physically implemented through an encoder/decoder pair of the safety code, ESC/DSC) and access (physically implemented through an encoder/decoder pair of the cryptographic code, ECC/DCC), which extend the encoder/decoder of the transmission code, ETC/DTC, of the untrusted transmission system. One part of the untrusted wireless transmission system is the wireless communication channel, which is affected by EMI (electromagnetic interference caused by noise, reflections or fading) and by attacks from unauthorized persons, which must also be considered in the case of an open transmission system.

[Fig. 1, labels recoverable from the figure: Communication channel; Untrusted wireless transmission system (SIL 0); Hacker.]

The dangerous failure rate of the communication system λD(CS) for continuous operation is the sum of the dangerous failure rate of the end device λD(ED) and the dangerous failure rate of the transmission system λD(TS):

$$\lambda_{D(CS)} = \lambda_{D(ED)} + \lambda_{D(TS)}$$

Based on how dangerous failures arise in an open transmission system, and on the basic fault model according to [9], a dangerous state of the transmission system can be caused by:
- Hardware failures of the untrusted transmission system, including the technical equipment for message transmission, for example a wrong position of the antennas or the sensitivity of the receiver;
- Random failures caused by EMI which are not detected by the transmission or safety code;
- Failures of the transmission code decoder;
- Failures of the cryptographic code decoder.
If we denote the dangerous failure rates of the particular parts which can cause a dangerous state as λD(1), λD(2), λD(3), λD(4), and assume that the impacts of failures from the particular parts are independent, then the dangerous failure rate of the whole transmission system is given by the sum of those partial failure rates:

$$\lambda_{D(TS)} = \lambda_{D(1)} + \lambda_{D(2)} + \lambda_{D(3)} + \lambda_{D(4)}$$

If the untrusted transmission system does not contain a transmission code, the influence of λD(3) shall not be considered.
Protocols of wireless technologies in most cases use a safety code in the form of a CRC (Cyclic Redundancy Check) code. The procedure for the quantitative expression of the dangerous failure rates λD(1), λD(2), λD(3) is given in an annex of standard [9], but only for the case of a closed transmission system. If the transmission system uses a wireless communication channel (it becomes an open transmission system), it is also necessary to quantify the dangerous failure rate λD(4). In this paper the authors deal only with a mathematical procedure to quantify this particular failure rate. The calculations of the dangerous failure rates λD(1), λD(2), λD(3) are taken from [9] and from the results in [13], [14], which stem from the authors' experience gained during years of practice in this field. Hardware errors of the untrusted transmission system can lead to undetected errors during message transmission if the detection properties of the safety code fail at the same time. Then for λD(1) the following applies:

$$\lambda_{D(1)} = \lambda_{D(HW)} \cdot p_{US} \cdot k_1$$

where λD(HW) is the hardware failure rate of the transmission system, pUS is the undetected error probability of the safety code, and k1 is the hardware failure coefficient.
The mathematical apparatus for calculating pUS for (n, k) channel block codes can be found, for example, in [15]-[17].
The values of λD(HW) and k1 depend on the failure analysis of the particular device or system. In most cases they result from the experience of the devices' operators and are estimated for the worst case. In the analysis, the worst-case approach is used for the probability of undetected error (value $2^{-r}$), where r is the number of redundant bits of the safety or transmission code, respectively. Undetected errors caused by loss of data integrity due to the influence of EMI during transmission occur when both channel codes fail: the transmission code (in the untrusted transmission system) and the safety code (in the safety layer). The dangerous failure rate λD(2) is then

$$\lambda_{D(2)} = p_{UT} \cdot p_{US} \cdot f_{EMI}$$

where pUT is the undetected error probability of the transmission code, pUS is the undetected error probability of the safety code, and fEMI is the frequency of erroneous messages per hour caused by EMI.
If the transmission system does not contain a channel encoder/decoder of the transmission code, then pUT = 1.
The frequency of corrupted messages can be easily determined, for example, in the case of cyclic transmission of messages. In other cases this value is estimated, or the worst case is considered, meaning that all messages generated by the source are corrupted.
Undetected transmission errors caused by a hardware error of the transmission code decoder (the controlling device) can cause all messages entering the safety-related layer to be considered correct. Falsification of a received message can then be detected only by the safety code. The dangerous failure rate λD(3) can then be expressed as

$$\lambda_{D(3)} = \lambda_{D(decTC)} \cdot p_{US} \cdot k_2$$

where λD(decTC) is the dangerous failure rate of the transmission code decoder, pUS is the undetected error probability of the safety code, and k2 is the hardware failure coefficient. The values of λD(decTC) and k2 depend on the analysis of the particular situation for the given application. If the bit error rate pb of the communication channel cannot be measured, the worst case must be taken into account when determining pUS; for binary transmission this is pb = $2^{-1}$, where pUS approaches the limiting value $2^{-r}$, with r the number of redundant bits of the safety code.
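The three partial rates discussed above can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' tool: it assumes the product forms suggested by the symbol definitions (λD(1) = λD(HW)·pUS·k1, λD(2) = pUT·pUS·fEMI, λD(3) = λD(decTC)·pUS·k2) and the worst-case value 2^-r for every undetected error probability. The class, function and field names are hypothetical, and the numeric inputs in the usage lines are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def worst_case_undetected(r: int) -> float:
    """Worst-case undetected error probability 2**-r for r redundant bits."""
    return 2.0 ** -r


@dataclass
class UntrustedLink:
    """Hypothetical container for the inputs; all rates are per hour."""
    lambda_hw: float          # hardware failure rate of the transmission system
    k1: float                 # hardware failure coefficient used in lambda_D(1)
    lambda_dec_tc: float      # dangerous failure rate of the transmission code decoder
    k2: float                 # hardware failure coefficient used in lambda_D(3)
    f_emi: float              # frequency of messages corrupted by EMI
    r_safety: int             # redundant bits of the safety code
    r_transmission: Optional[int] = None  # None means no transmission code


def partial_rates(link: UntrustedLink) -> Tuple[float, float, float]:
    """Return (lambda_D1, lambda_D2, lambda_D3) under the assumed product forms."""
    p_us = worst_case_undetected(link.r_safety)
    has_tc = link.r_transmission is not None
    # Without a transmission code the text sets p_UT = 1 and drops lambda_D(3).
    p_ut = worst_case_undetected(link.r_transmission) if has_tc else 1.0
    lam1 = link.lambda_hw * p_us * link.k1
    lam2 = p_ut * p_us * link.f_emi
    lam3 = link.lambda_dec_tc * p_us * link.k2 if has_tc else 0.0
    return lam1, lam2, lam3


# Illustrative inputs only (hardware rates and coefficients are not from the paper).
link = UntrustedLink(lambda_hw=1e-5, k1=1.0, lambda_dec_tc=1e-6, k2=1.0,
                     f_emi=72000.0, r_safety=32, r_transmission=16)
lam1, lam2, lam3 = partial_rates(link)
print(f"lambda_D1={lam1:.3e}/h  lambda_D2={lam2:.3e}/h  lambda_D3={lam3:.3e}/h")
```

The contribution λD(4) of the cryptographic code decoder is left out of the sketch, since its quantification is the subject of the Markov model.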
When an open transmission system is used, it is necessary to consider that a dangerous state (hazard) can also occur due to unauthorized access to the system (for example by a hacker). In that case a cryptographic code must be included in the transmission chain, which transforms the message into a form unintelligible to an unauthorized user. The failure of the cryptographic code on the receiver side should also be quantified. It is recommended to use only computationally secure cryptographic block codes for safety-related applications.

III. MODEL DEVELOPMENT AND DESCRIPTION
The model was realized using continuous-time Markov processes. During model development the authors considered the impact of various factors on the safety of the wireless transmission system. The aim of the analysis of failure effects on the safety of the wireless system was to create a model which makes it possible to identify the transition process from the safe state to the dangerous state and to calculate the probability of occurrence of the dangerous state as a result of failures during system operation. The adverse effect considered is corruption of transferred data which is not detected by the transmission system, so that the data are handled as correct.
The following types of random failures are considered in the model: random failures of the hardware part of the transmission system and failures caused by electromagnetic interference. Model development was based on Markov models implemented for a closed transmission system (fieldbus) and published in [18]. These models are extended for the needs of an open transmission system. The transition from the functional safe state 1 to the dangerous (failure) state 7 is shown in Fig. 2. A Markov diagram corresponding to safety-related message transfer through the wireless transmission system is shown in Fig. 1.

[Table I, entries recoverable from the source (state: description; tabulated value where it survives):
- (state number lost): The state of the wireless transmission system when the transmitting part of the transmission system or any part of the communication channel is in failure.
- 3: The state of the wireless transmission system when the transmission code decoder is in failure; 0.
- 4: The state of the wireless transmission system when the safety code decoder is in failure; 0.
- 5: The state of the wireless transmission system when the transmitting part of the transmission system or any part of the communication channel is in failure and the transmission code decoder and the safety code decoder are also in failure.]

The authors simplified the diagram by assuming that, in the case of a transmission code decoder failure or a cryptographic code decoder failure, it is no longer relevant to consider the impact of other parts of the untrusted transmission system on the frequency of corrupted data (Fig. 3). The Markov diagram in Fig. 3 can be described mathematically by a system of differential equations and by the vector of initial probabilities. The system of differential equations is defined by

$$\frac{dP(t)}{dt} = P(t) \cdot A$$

where P(t) = {p1(t), p2(t), ..., pn(t)} is the vector of absolute probabilities and A is the transition intensity matrix.
Vector of initial probabilities: P(t = 0) = {1, 0, ..., 0}.

[Table II, entries recoverable from the source in their original order; "?" marks a transition label or source state that was lost:
- ?: Transition will take place as a result of operation of the control mechanism for the number of received corrupted messages by the transmission code decoder, by the safety code decoder or by the cryptographic code decoder.
- ?: Transition will take place as a result of insufficient detection capability of the transmission, safety and cryptographic codes.
- ? → 5: Transition will take place as a result of hardware failure of the transmission code decoder.
- ?: Transition will take place as a result of operation of the control mechanism for the number of received corrupted messages by the transmission code decoder, by the safety code decoder or by the cryptographic code decoder.
- ?: Transition will take place as a result of insufficient detection capability of the transmission, safety and cryptographic codes.
- ?: Transition will take place as a result of operation of the control mechanism for the number of received corrupted messages by the safety code decoder or by the cryptographic code decoder.
- ? → 7: Transition will take place as a result of insufficient detection capability of the transmission, safety and cryptographic codes.
- ? → 5: Transition will take place as a result of hardware failure of the transmitting part of the transmission system or of any part of the communication channel, or as a result of hardware failure of the transmission code decoder.
- ? → 6: Transition will take place as a result of operation of the control mechanism for the number of received corrupted messages by the cryptographic code decoder. Intensity: δC.
- 4 → 7: Transition will take place as a result of insufficient detection capability of the cryptographic code.
- ? → 6: Transition will take place as a result of intervention of the control mechanism for the number of received corrupted messages by the safety code decoder or by the cryptographic code decoder.
- ? → 7: Transition will take place as a result of insufficient detection capability of the safety and cryptographic codes. Intensity: fW·pUS·pUC.]

Based on the formulas in [18], it is possible to determine the transition intensity matrix for the simplified Markov diagram of the open transmission system in Fig. 3, which implies the following system of differential equations:

[system of differential equations (7)-(12); not recoverable from this text]

where:
- λHTP: hardware failure rate of the transmitting part of the transmission system and of the communication channel;
- λHTD: hardware failure rate of the transmission code decoder;
- λHSD: hardware failure rate of the safety code decoder;
- λEMI: failure rate of EMI disturbance on transmitted messages;
- pUT: undetected error probability of the transmission code;
- pUS: undetected error probability of the safety code;
- pUC: undetected error probability of the cryptographic code;
- f: frequency of messages generated by the transmitter;
- fEMI: frequency of corrupted messages due to EMI;
- fHTP: frequency of corrupted messages due to hardware failures of the transmitting part of the transmission system and of the communication channel;
- fW: frequency of corrupted messages without reason distinction;
- TT: reception tolerance time of corrupted messages of the untrusted part of the transmission system (detected by the transmission code decoder);
- TS: reception tolerance time of corrupted messages of the trustworthy part of the transmission system (detected by the safety code decoder);
- TC: reception tolerance time of corrupted messages of the trustworthy part of the transmission system (detected by the cryptographic code decoder);
- δT: transition intensity to the permanent safe state because of the control mechanism for the number of received corrupted messages by the transmission code decoder;
- δS: transition intensity to the permanent safe state because of the control mechanism for the number of received corrupted messages by the safety code decoder;
- δC: transition intensity to the permanent safe state because of the control mechanism for the number of received corrupted messages by the cryptographic code decoder.
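A system of the form dP(t)/dt = P(t)·A with constant intensities has the solution P(t) = P(0)·exp(A·t), which can be evaluated numerically. The sketch below illustrates this on a hypothetical three-state chain (one transient state, one permanent safe state, one dangerous state); it is not the six-state model of Fig. 3, and the function names, the chain and the intensity towards the dangerous state are assumptions made for illustration. The matrix exponential is computed in plain Python by scaling and squaring with a truncated Taylor series.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_generator(n: int, rates: Dict[Tuple[int, int], float]) -> Matrix:
    """Build the intensity matrix A from {(i, j): rate}; each row sums to zero."""
    a = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        a[i][j] += rate
        a[i][i] -= rate
    return a


def mat_mul(x: Matrix, y: Matrix) -> Matrix:
    """Plain square-matrix product."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(a: Matrix, t: float, terms: int = 20) -> Matrix:
    """exp(A*t) by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a) * t
    # Halve A*t until its norm is at most 0.5 so the series converges quickly.
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = t / (2 ** squarings)
    b = [[v * scale for v in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, b)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def state_probabilities(p0: List[float], a: Matrix, t: float) -> List[float]:
    """P(t) = P(0) * exp(A*t), with P treated as a row vector."""
    e = expm(a, t)
    n = len(p0)
    return [sum(p0[i] * e[i][j] for i in range(n)) for j in range(n)]


# Hypothetical chain: 0 = transient state, 1 = permanent safe state, 2 = dangerous state.
delta = 24000.0   # intensity towards the safe state [1/h]
lam = 1e-3        # intensity towards the dangerous state [1/h], illustrative only
A = build_generator(3, {(0, 1): delta, (0, 2): lam})
p = state_probabilities([1.0, 0.0, 0.0], A, t=1e-4)  # t in hours
print("P(dangerous) at t = 1e-4 h:", p[2])
```

For this small chain the result can be compared with the closed form λ/(λ+δ)·(1 − exp(−(λ+δ)t)). A full model mixes intensities that differ by many orders of magnitude, so a dedicated solver remains the safer choice there.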

IV. MODEL VERIFICATION AND OBTAINED RESULTS
The accuracy of the calculation depends on the suitability of the chosen calculation method and on the numerical accuracy of the computing technique. Several software tools exist which support the solution of Markov diagrams. The authors used the software tool Windchill Quality Solutions (formerly Relex 2011) from the company PTC, and the results were verified in the software tool Wolfram Mathematica 8 from Wolfram Research. In practice, the use of the model in Fig. 3 is problematic because of the high degree of uncertainty in determining the model parameters. Therefore, further simplification is often used in practical calculations, for example a worst-case approach when determining the model parameters. During the quantitative evaluation of transitions in the model in Fig. 3 the authors assumed the following:
- The hardware failure rate of the transmitting part of the transmission system and of the communication channel is taken according to [18];
- The hardware failure rates of the transmission code decoder and of the safety code decoder are taken according to [18];
- A cyclic mode of safety-related message transmission from the source is assumed, with a cycle time of 50 ms;
- The frequency of safety-related messages generated by the transmitter and the frequency of messages corrupted due to EMI follow from this cycle (worst-case assumption that all messages are corrupted due to EMI);
- The frequency of corrupted messages without reason distinction likewise follows from this cycle (assuming all messages are corrupted);
- The reception tolerance time of corrupted messages of the trustworthy part of the transmission system is set to TC = 150 ms (the transmission system is set up so that the connection is terminated if three successive messages are corrupted); the behaviour of the system after restart is also verified for other values of TC (100 ms and 200 ms);
- The transition intensity to the permanent safe state because of the control mechanism for the number of received corrupted messages by the cryptographic code decoder is δC = 36000 $h^{-1}$, 24000 $h^{-1}$ and 18000 $h^{-1}$ (corresponding to TC = 100 ms, 150 ms and 200 ms, respectively);
- A transmission code of type CRC-16 is assumed; according to standard [9] the undetected error probability of the transmission code is PUT = $2^{-16}$ (worst-case assumption);
- A safety code of type CRC-32 is assumed; according to standard [9] the undetected error probability of the safety code is PUS = $2^{-32}$ (worst-case assumption);
- A block cryptographic code with block size k = 64, 128, 256 bits is assumed;
- The lengths of transmitted messages are n = $10^{1}$, $10^{3}$, $10^{4}$, $10^{5}$, $10^{8}$ bits.

The undetected error probability of the cryptographic code PUC is calculated according to [19] using formula (8), by which PUC can be approximated as PUC ≈ PCW. Readers interested in relation (8) are referred, for example, to [20].

Numerical and graphical results for the probability of entering the dangerous (hazardous) state 6 at time t, P6(t), for the Markov diagram in Fig. 3 [21] were determined by applying the above parameters to the system of differential equations (7)-(12).

The undetected error probability of the cryptographic code PUC for a message of length n = $10^{5}$ bits is shown in the graphs in Fig. 4 and Fig. 5. The authors monitored the impact of the block length k of the block cipher on the resulting undetected error probability of the cryptographic code PUC, as well as on the time course of the probability of reaching the dangerous state P6(t). The block sizes were chosen in accordance with lengths used in practice: k = 64 bits (for example the block cipher DES), k = 128 bits (for example the block cipher AES) and k = 256 bits (for example the block cipher AES/Rijndael).

Fig. 2. Markov diagram. Characteristics of the individual states and of the diagram transitions from Fig. 2 are given in Table I and Table II. The meaning of the symbols used in the diagram in Fig. 2 is given in Table III.

[Table I, truncated entry: "... corrupted message was not detected."; 0.]
[Table II, truncated entries: "... place as a result of hardware failure of transmitting part of transmission system or of any part of communication channel.", intensity λHTP; transition label 3 → 6.]
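The worst-case parameters of this section can be cross-checked with a short calculation. The sketch assumes, which the text does not state explicitly, that the transition intensity to the permanent safe state is the reciprocal of the reception tolerance time expressed per hour (δC = 1/TC); this reproduces the quoted values 36000, 24000 and 18000. It also assumes one generated message per 50 ms cycle. All variable and function names are illustrative.

```python
MS_PER_HOUR = 3_600_000.0


def per_hour(period_ms: float) -> float:
    """Convert a period in milliseconds into a rate per hour."""
    return MS_PER_HOUR / period_ms


cycle_ms = 50.0
f_messages = per_hour(cycle_ms)  # messages per hour for one message per cycle

# Assumed relation delta_C = 1 / T_C for the three tolerance times in the text.
delta_c = {tc_ms: per_hour(tc_ms) for tc_ms in (100, 150, 200)}

# Worst-case undetected error probabilities 2**-r for CRC-16 and CRC-32.
p_ut = 2.0 ** -16
p_us = 2.0 ** -32

print(f"f = {f_messages:.0f} 1/h")
for tc_ms, rate in delta_c.items():
    print(f"T_C = {tc_ms} ms -> delta_C = {rate:.0f} 1/h")
print(f"p_UT = {p_ut:.3e}, p_US = {p_us:.3e}")
```

With these inputs the worst-case message frequency is one message per cycle, i.e. the same value is used for all corrupted-message frequencies under the assumption that every message is corrupted.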

Fig. 4. Probability of undetected corrupted message for block size 64 bits.

The dependence of the dangerous state probability on time for block size k = 64 bits, for TC = 100 ms, 150 ms and 200 ms, is shown in Fig. 4.

Fig. 5. Probability of undetected corrupted message for block size 256 bits.

The dependence of the dangerous state probability on time for block size k = 256 bits, for TC = 100 ms, 150 ms and 200 ms, is shown in Fig. 5.

TABLE I. STATE DIAGRAM.

TABLE II. DIAGRAM TRANSITIONS.