Diagnosis of Intermittent Connection Faults for CAN Networks With Complex Topology

As the topologies of Controller Area Networks (CAN) in industrial production systems become increasingly complex, robust and accurate diagnosis of network faults is essential for system maintenance and reliability assurance. In CAN networks, the intermittent connection (IC) fault is a hidden problem that deteriorates the performance and safety of the system. Although IC fault diagnosis methods have been studied, the diagnosis of IC faults in CAN networks with complex topology has not been systematically addressed. In this paper, a novel IC fault diagnosis methodology for CAN networks with complex topologies is proposed, which uses distributed measurements of data link layer information for error pattern analysis. First, the diagnosability of IC faults is defined and an analysis method is presented. Then, a localization algorithm for local and trunk IC faults is developed by integrating error patterns with the network topology. Finally, a sequential fault diagnosis strategy is proposed to achieve full diagnosis of IC faults by gradually compressing the non-diagnosable subnetworks. A testbed is constructed, and case studies under various topologies and IC fault scenarios are conducted to demonstrate the effectiveness of the proposed methodology. Experimental results show that the IC fault locations diagnosed by the proposed methodology agree well with the experimental setups.


I. INTRODUCTION
Controller Area Network (CAN), a low-cost, real-time, and highly flexible fieldbus, is widely used in networked industrial systems, such as automotive communication systems, avionics systems, and process monitoring and control, which usually require excellent real-time performance and reliability [1], [2]. However, in practice, many factors, such as environmental interference, vibration, inappropriate human intervention, and loose connections, can cause faults in a CAN network. Among the various network faults, the intermittent connection (IC) fault, in which network cables disconnect intermittently and randomly for short time intervals, is a very common wiring problem. It has been reported in the automotive, aerospace, telecommunications, computer, and consumer industries [3]. An IC fault randomly interrupts bus communication, causing transmission delays or message loss and thereby deteriorating the network's real-time performance. In severe cases, IC faults can drive a node, sometimes one with no physical connection to the IC fault location, into the bus-off state so that it disappears from the network; this can trigger system-level shutdowns in reliability-critical systems, increase network maintenance costs, and even cause safety-related issues. Numerous incidents have demonstrated the dangers of handling IC faults inadequately. For example, a case study on Ford's ignition module illustrates that intermittent No Fault Found (NFF) problems bring an unfavorable reputation and high costs to automakers [4], and IC faults are a major root cause of NFF problems [5]; a multitude of incidents in systems operated by NASA have indicated that repetitive IC faults can compromise shuttle safety [6]. Therefore, online diagnosis of IC faults before they cause system-level damage is in high demand for networked industrial systems.

The associate editor coordinating the review of this manuscript and approving it for publication was Yiqi Liu.
However, the IC fault is a hidden and hard-to-diagnose problem. On the one hand, because CAN nodes share the bus medium, the cable disconnections caused by an IC fault affect the bus globally, so there is no clear mapping between the fault location and the bus-off nodes. On the other hand, IC faults occur unpredictably and cannot be reproduced. In addition, the topology of a CAN network can become very complex (e.g., a tree topology) due to the use of multi-port circuit connectors in industrial environments, which further complicates the mapping between the collected information and the IC fault locations. Hence, timely detection and complete localization of IC faults are tremendous challenges for networked systems, especially those with complex topologies.
Complex topologies challenge system diagnosability, especially because some faults cannot be isolated from each other. In the literature on system diagnosability, Boussif et al. analyzed the necessary and sufficient condition for the intermittent fault diagnosability of discrete event systems and proposed a systematic procedure for checking diagnosability [7]. Carvalho and Su et al. analyzed the robust diagnosability of discrete event systems against uncertain observations and observation loss [8], [9]. Fu et al. quantified the effect of disturbances on fault diagnosability performance for a stochastic system based on a new sliding window model [10]. Ibrahim et al. solved the diagnosability planning problem to ensure the diagnosability of a partially controllable system [11]. Massuyès et al. discussed diagnosability based on the analytical redundancy relations of the system and determined the minimal set of additional sensors needed to satisfy a specified diagnosability degree [12]. Cabasino et al. presented a new design procedure for labeling functions to solve the problem of optimal sensor selection for ensuring diagnosability in labeled bounded and unbounded Petri nets [13]. However, the diagnosability analyses in [7], [8], [9], [10], [11], [12], and [13] rely on automaton or behavioral models of the system, which are inapplicable to the stochastic IC fault diagnosis problem in CAN networks.
There is a significant body of literature on the identification and diagnosis of intermittent faults (IFs). Yan et al. proposed a method to detect all appearance and disappearance times of IFs and to isolate them using constructed residuals in linear stochastic systems with measurement noise [14]. Yang et al. proposed a sparse-representation-based IF detection method for dynamic systems, which can update the fault detection threshold with online measurements [15]. Yu et al. constructed a fault discriminator to distinguish multiple faults of different types, including abrupt faults, incipient faults, and IFs [16]. Mahapatro et al. formulated IF detection and diagnosis in wireless sensor networks as an optimization problem [17]. Syed et al. proposed an IF detection algorithm that sent a sine wave and decoded the received signal for intermittent information [18]. Pralet et al. modeled discrete event systems with IFs based on Past Time Linear Temporal Logic and computed the preferred diagnosis incrementally at each time step [19]. Li et al. proposed a feasible simulation mechanism for IFs in gyroscopes and developed a data-driven fault diagnosis method based on the dynamic principal component analysis algorithm [20]. Cai et al. proposed a dynamic Bayesian network based fault diagnosis method for electronic systems to identify faulty components and distinguish fault types, including transient faults, IFs, and permanent faults [21]. He et al. proposed a robust model-based IF detection filter for a class of discrete-time uncertain networked systems with delays and missing data [22]. Chen et al. proposed a method to detect scalar IFs in continuous linear stochastic dynamic systems using a sliding-window-based analytic residual [23]. However, the methods proposed in [14], [15], [16], [17], [18], [19], [20], [21], [22], and [23] do not address the diagnosis of IC faults in CAN networks.
Recently, studies have also been conducted to diagnose IC faults in CAN networks. Lei et al. applied generalized zero-inflated Poisson models to describe the errors caused by IC faults [24], [25]; however, the localization of IC faults was not addressed. Lei et al. defined two error events corresponding to different IC fault scenarios in CAN networks and estimated confidence intervals of the event parameters to diagnose IC faults [26], [27]. Zou et al. developed a two-level identification method to identify multiple IC faults in a CAN network [28]. However, the methods proposed in [26], [27], and [28] are based on the analysis of physical layer information, which is neither robust nor extensible to distributed networks. Several diagnosis methods have been developed at the data link layer. Zhang et al. proposed a context-free grammar (CFG) based IC fault localization method, which applies a concurrent localization algorithm to diagnose local and trunk IC faults separately [29]; however, this method is not straightforward to implement. Zhang et al. proposed a unified diagnostic framework to localize local and trunk IC faults [30]. However, the methods proposed in [26], [27], [28], [29], and [30] have poor diagnosability and cannot handle general complex topologies.
As the literature review shows, existing studies not only lack fault diagnosability analysis for CAN networks but, owing to this lack, their IC fault diagnosis methods also have the following drawbacks: first, they cannot handle multiple trunk IC faults; second, they have poor localization accuracy, i.e., the health condition of some cables cannot be determined; third, they are limited to simple topologies and are not applicable to the complex topologies, such as tree topologies, that are more common in industrial scenarios. These drawbacks cause existing methods to produce erroneous results and to fail on complex topologies, so they do not meet the fault diagnosis needs of networked systems. Therefore, there is an urgent need to develop an effective IC fault diagnosis method for CAN networks with complex topologies.
In this paper, a novel methodology for IC fault diagnosis in CAN networks with complex topology is proposed. Its advantages are as follows:
• An IC fault diagnosability analysis method for CAN networks is developed by integrating the network topology with network error patterns; it can be extended to other shared-media networks.
• A high-precision fault diagnosis method is developed using a "divide-and-conquer" strategy to ensure complete diagnosability of IC faults, which cannot be guaranteed by existing methods.
• The methodology is general and applies to various network topologies.
The results of this work will lead to effective IC fault diagnosis for ensuring the reliability and safety of CAN networks and provide a data acquisition deployment strategy for system monitoring and maintenance. The rest of this paper is organized as follows. Section II introduces IC faults in CAN networks with complex topology and presents error event tuples. Section III defines the problem. Section IV develops the methodology of this work, including the discussion of diagnosability, the filtered search algorithm (FSA), and the deployment scheme for error sensing and recovering devices (ESRDs). Section V sets up the testbed and conducts case studies to demonstrate the effectiveness of the proposed methodology. Section VI concludes the paper and outlines future work.

II. IC FAULTS IN COMPLEX TOPOLOGY CAN NETWORK
In this section, the characteristics of the IC faults in CAN networks with complex topologies are introduced. Then, an approach for mathematically representing the information collected synchronously at multiple points in the network is presented.

A. CAN NETWORK WITH COMPLEX TOPOLOGY AND IC FAULTS
A CAN network with complex topology contains multiple branches, some of which connect more than one node; the intersection point of branches is called a coupling point. The lines directly connecting the nodes are drop lines, and the others are trunk lines. For example, the network in Fig. 1 has three branches, $B_1$, $B_2$, and $B_3$, and the coupling point is $b_1$.
Local and trunk IC faults are IC faults that occur on drop lines and trunk lines, respectively. An IC fault is a transient disconnection of a cable, which turns dominant bits into recessive bits and thus violates the CAN bus specification. According to the CAN protocol, any node that detects such a logic error can signal it by transmitting an error frame, interrupting the transmission of the data frame. The transmission resumes when the bus becomes idle again.
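The error mechanism described above can be illustrated with CAN bit stuffing. The following Python sketch is a simplification, not the ESRD implementation: it assumes the usual convention that dominant is logic 0 and recessive is logic 1, models a receiver on the far side of an IC fault as seeing only the recessive level while the cable is open, and checks only the stuff rule (six identical consecutive bits), ignoring the CRC, form, and acknowledgment checks that would catch shorter disconnections. The function names are hypothetical.

```python
DOMINANT, RECESSIVE = 0, 1


def stuff(bits):
    """Insert a complementary bit after every run of five identical bits."""
    out, run, prev = [], 0, None
    for b in bits:
        out.append(b)
        run = run + 1 if b == prev else 1
        prev = b
        if run == 5:
            stuff_bit = 1 - b
            out.append(stuff_bit)
            prev, run = stuff_bit, 1  # the stuff bit starts a new run
    return out


def seen_across_break(bits, start, length):
    """Bits seen by a receiver cut off by an IC fault: while the cable is open
    (positions start .. start+length-1) it only observes the recessive level."""
    return [RECESSIVE if start <= k < start + length else b
            for k, b in enumerate(bits)]


def first_stuff_error(bits):
    """Index of the sixth identical consecutive bit (a stuff error), or None."""
    run, prev = 0, None
    for k, b in enumerate(bits):
        run = run + 1 if b == prev else 1
        prev = b
        if run == 6:
            return k
    return None


frame = [DOMINANT] * 7 + [RECESSIVE, DOMINANT, RECESSIVE]
sent = stuff(frame)                       # [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1]
print(first_stuff_error(sent))            # None: the stuffed frame is legal
print(first_stuff_error(seen_across_break(sent, 2, 6)))  # 7: stuff error
```

Here a six-bit disconnection forces six recessive bits in a row, so the far-side receiver flags a stuff error and can answer with an error frame; a two-bit disconnection at the same position passes the stuff check and would have to be caught by the other CAN checks.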

B. PATTERNS OF ERROR EVENT TUPLE
When an IC fault occurs, the error sensing and recovering device (ESRD) developed in this work collects, at the data link layer, the interrupted data frames that CAN nodes discard according to the CAN protocol. The collected interrupted data frames are then compared with reference data from the same source address, collected while the bus is normal, to produce the error event tuples.
As illustrated in Fig. 1, suppose the data frame sent from $N_2$ is interrupted by IC faults. The data collected by ESRD $s_3$ differ from the reference data, so the error event $e^{[2]}_{s_3} = N$ is obtained. This implies that IC faults occur on the communication path between $N_2$ and $s_3$ (denoted $P_{N_2-s_3}$), and all elements of the set of lines forming $P_{N_2-s_3}$ (denoted $L_{N_2-s_3}$, i.e., $\{o_2p_2, p_2b_1, b_1p_3, p_3p_4\}$) are suspected of suffering IC faults. Conversely, the data collected by $s_1$ and $s_2$ are the same as the reference data, so the error events $e^{[2]}_{s_1} = e^{[2]}_{s_2} = P$ are obtained. This implies that $P_{N_2-s_1}$ and $P_{N_2-s_2}$ are error free, i.e., all elements of $L_{N_2-s_1}$ and $L_{N_2-s_2}$ ($\{o_2p_2, p_2p_1\}$ and $\{o_2p_2, p_2b_1, b_1p_5, p_5p_6\}$, respectively) are error free. Combining the two observations, the suspected lines reduce to $\{b_1p_3, p_3p_4\}$.
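The inference in this example amounts to set operations on the line sets of the paths. A minimal Python sketch, assuming that each interruption is caused by a single faulty line, so the culprit must lie on every path that reported N and on no path that reported P; the function name is hypothetical and the line sets are copied from the example above.

```python
# Line sets of the paths from N2 to each ESRD in Fig. 1, as listed in the text.
PATHS_N2 = {
    "s1": {"o2p2", "p2p1"},
    "s2": {"o2p2", "p2b1", "b1p5", "p5p6"},
    "s3": {"o2p2", "p2b1", "b1p3", "p3p4"},
}


def suspect_lines(paths, events):
    """Lines on every path reporting N and on no path reporting P.

    paths:  {esrd: set of lines on the path from the transmitting node}
    events: {esrd: 'P' or 'N'} for one interrupted frame
    """
    n_paths = [paths[s] for s, e in events.items() if e == "N"]
    p_paths = [paths[s] for s, e in events.items() if e == "P"]
    if not n_paths:  # all P: the fault lies off every path of this node
        return set()
    suspects = set.intersection(*n_paths)  # a new set; inputs stay untouched
    for p in p_paths:
        suspects -= p
    return suspects


events = {"s1": "P", "s2": "P", "s3": "N"}   # e_s1 = e_s2 = P, e_s3 = N
print(sorted(suspect_lines(PATHS_N2, events)))  # ['b1p3', 'p3p4']
```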
Suppose there are $h$ ESRDs in the network, denoted $s_1, s_2, \cdots, s_h$. If an IC fault causes an error interruption while node $N_i$, $i \in \{1, 2, \cdots, n\}$, is transmitting, every ESRD produces an error event for node $N_i$. An error event tuple is obtained by sequentially combining these $h$ error events. The pattern of the error event tuple varies with the types and locations of IC faults. Assume that there are $m_i$ patterns of error event tuple for node $N_i$ corresponding to all IC faults in the network; then they can be represented as

$$E^{<t>}_{N_i} = \left(e^{[i]<t>}_{s_1}, e^{[i]<t>}_{s_2}, \cdots, e^{[i]<t>}_{s_h}\right), \quad t \in \{1, 2, \cdots, m_i\}, \qquad (1)$$

where $E^{<t>}_{N_i}$ is the $t$-th pattern of error event tuple for node $N_i$, and $e^{[i]<t>}_{s_k}$ is the $k$-th error event, produced by sensor $s_k$, in $E^{<t>}_{N_i}$, $\forall k \in \{1, 2, \cdots, h\}$.

To avoid major modifications to the hardware system, the ESRD placement positions are prioritized by the accessibility of the physical cables in industrial environments, i.e., the ease of hooking up ESRDs to the network. In descending order of priority, they are: terminal resistor points, open connectors of nodes, interfaces between drop lines and trunk lines, and coupling points. For example, the priority order of ESRD placement positions for the network shown in Fig. 1 is

III. PROBLEM DEFINITION
Error event tuples generated by the aforementioned procedure are used for IC fault localization analysis. To completely localize all IC faults, the following issues should be addressed:
(1) How to evaluate the diagnosability of a CAN network for IC faults, and how to deploy ESRDs?
(2) How to efficiently diagnose IC faults based on the initially deployed ESRDs?
(3) How to determine the non-diagnosable subnets, and how to determine the sufficient number of ESRDs to ensure the complete diagnosability of these subnets?
Two assumptions are made: (1) The IC faults are independent and persistent over time; (2) The communication cables between the data acquisition system and network are reliable.

IV. METHODOLOGY
The basic idea of this method is to use different error event tuple patterns to diagnose trunk and local IC faults separately; when non-diagnosable subnets exist, a "divide-and-conquer" strategy compresses these subnets by adding ESRDs until all network faults are located.
The framework of the method is shown in Fig. 2. First, based on the diagnosability analysis of IC faults, ESRDs are deployed at the ends of the network branches. Second, a filtered search algorithm (FSA) is developed to diagnose trunk and local IC faults separately: trunk IC faults are diagnosed from error event tuples with heterogeneous elements (e.g., $E^{<1>}_{N_2} = (P, P, N)$ in Fig. 1), and local IC faults from those with homogeneous elements (all P or all N). Third, the non-diagnosable subnets are determined from the preliminary trunk IC fault diagnosis result. Fourth, for each non-diagnosable subnet, if any, the minimum adding ESRD sets (MAES) generation algorithm determines where ESRDs should be added, and FSA is applied to diagnose the IC faults using the data from the added ESRDs. This process is repeated until all subnets are fully diagnosable, at which point the complete set of IC fault locations is obtained. The details of the proposed method are introduced below.
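The iteration in Fig. 2 can be summarized as a loop. The Python sketch below is only a skeleton under stated assumptions: `diagnose`, `find_subnets`, and `add_esrds` are hypothetical callables standing in for FSA, the non-diagnosable subnet check, and MAES generation, and `max_rounds` is an arbitrary safety bound that is not part of the method.

```python
def sequential_diagnosis(diagnose, find_subnets, add_esrds, esrds, max_rounds=10):
    """Diagnose, then keep adding ESRDs to non-diagnosable subnets until none remain.

    diagnose(esrds)      -> set of diagnosed IC fault locations (stands in for FSA)
    find_subnets(faults) -> list of non-diagnosable subnets (empty when done)
    add_esrds(subnet)    -> ESRD positions to add for that subnet (stands in for MAES)
    """
    esrds = frozenset(esrds)
    for _ in range(max_rounds):
        faults = diagnose(esrds)
        subnets = find_subnets(faults)
        if not subnets:
            return faults, esrds
        for subnet in subnets:
            esrds = esrds | frozenset(add_esrds(subnet))
    raise RuntimeError("non-diagnosable subnets remain after max_rounds")


# Toy stand-ins: an extra ESRD "s4" reveals the fault hidden between f1 and f3.
faults, esrds = sequential_diagnosis(
    diagnose=lambda e: {"f1", "f2", "f3"} if "s4" in e else {"f1", "f3"},
    find_subnets=lambda f: [] if "f2" in f else [("f1", "f3")],
    add_esrds=lambda subnet: {"s4"},
    esrds={"s1", "s2", "s3"},
)
print(sorted(faults), sorted(esrds))
```

The toy run needs two rounds: the first diagnosis leaves the subnet between `f1` and `f3` unresolved, and the second, with `s4` added, returns all three faults.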

A. DIAGNOSABILITY AND INITIAL ESRD CONFIGURATION
In this section, the IC fault diagnosability of the CAN network with complex topology is analyzed, and the necessity of deploying ESRDs at the ends of the branches is illustrated.
Diagnosability analysis determines whether the CAN system is diagnosable; diagnosability comprises detectability and discriminability. Detectability means that the occurrence of a specific type of IC fault can be detected from the error event tuples of the nodes, while discriminability refers to the capability of identifying the locations of IC faults.
In a fully diagnosable CAN network, the error event tuple patterns of the nodes differ according to the presence and type of IC faults; furthermore, for IC faults of the same type, there is at least one node whose error event tuple patterns differ depending on the fault locations.
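Under this definition, checking diagnosability for a finite list of fault scenarios reduces to checking that no two scenarios leave the same pattern sets at every node. A Python sketch, assuming the per-node pattern sets of each scenario are already known (for example, from a topology model); the scenario labels and patterns below are hypothetical.

```python
from itertools import combinations


def freeze(signature):
    """Hashable, order-independent form of {node: iterable of tuple patterns}."""
    return frozenset((node, frozenset(pats)) for node, pats in signature.items())


def indistinguishable(signatures):
    """Pairs of scenarios that yield identical pattern sets at every node.

    signatures: {scenario label: {node: set of error event tuple patterns}}.
    The listed scenarios are fully diagnosable exactly when the result is empty.
    """
    frozen = {label: freeze(sig) for label, sig in signatures.items()}
    return [(a, b) for a, b in combinations(frozen, 2) if frozen[a] == frozen[b]]


signatures = {
    "fault-free": {"N1": set(), "N2": set()},
    "trunk A": {"N1": {("P", "N")}, "N2": {("N", "P")}},
    "trunk B": {"N1": {("P", "N")}, "N2": {("N", "P")}},
    "local N1": {"N1": {("N", "N")}, "N2": {("P", "P")}},
}
print(indistinguishable(signatures))  # [('trunk A', 'trunk B')]
```

Comparing every faulty scenario with the fault-free one covers detectability, and comparing scenarios of the same type covers discriminability.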

Proposition 1: A CAN network is fully diagnosable for local IC faults when ESRDs are deployed at the end of each branch.
Proof: 1. First, consider a CAN network with simple topology. As shown in Fig. 3a, the network contains only one branch connecting $n$ nodes, so deploying ESRDs at the two terminal resistors is topologically equivalent to deploying them at the branch ends $p_1$ and $p_n$.
(1) If only one ESRD $s_1$ is deployed at any point in the set $\{p_1, \cdots, p_n, o_1, \cdots, o_n\}$, then there exists a node $N_i$, $i \in \{1, 2, \cdots, n\}$, such that $P_{N_i-s_1}$ contains both drop and trunk lines, and an IC fault on any line in $L_{N_i-s_1}$ causes $s_1$ to produce the error event tuple $(N)$ for node $N_i$. In this case, different types of IC faults lead to the same error event tuple for the node, which contradicts the requirement of complete diagnosability. Therefore, the network is not fully diagnosable for local IC faults.
(2) If two ESRDs $s_1$ and $s_2$ are deployed in the network and $s_k \notin \{p_1, p_n\}$ for some $k \in \{1, 2\}$, then there exists a node $N_i$, $i \in \{1, 2, \cdots, n\}$, such that the set $L_{N_i-s_1} \cap L_{N_i-s_2}$ contains both drop and trunk lines, and an IC fault on any line in this set causes the ESRDs to produce the same error event tuple $(N, N)$ for node $N_i$. The rest of the argument is the same as in Part 1(1). Hence, the network is not fully diagnosable for local IC faults.
(3) Suppose two ESRDs $s_1$ and $s_2$ are deployed at the ends of the network branch, i.e., at points $p_1$ and $p_n$. For every $N_i$, $i \in \{1, 2, \cdots, n\}$: if the IC fault occurs on the drop line of $N_i$, the error event tuple for $N_i$ is $(N, N)$, since both paths $P_{N_i-s_1}$ and $P_{N_i-s_2}$ contain this drop line; similarly, if the IC fault occurs on the drop line of another node, the error event tuple for $N_i$ is $(P, P)$; if the IC fault occurs on any trunk line, the error event tuple for $N_i$ is $(P, N)$ or $(N, P)$, since exactly one of the paths $P_{N_i-s_1}$ and $P_{N_i-s_2}$ contains this trunk line. In this case, the error event tuple pattern of each node varies not only with the occurrence of an IC fault but also with its type. Therefore, with ESRDs deployed at the branch ends of a simple topology network, the network is fully diagnosable for local IC faults.
2. Next, consider the complex topology network shown in Fig. 3b, which is equivalent to adding a branch $B_3$ with $(g_2 - g_1 + 1)$ nodes to the simple topology network. According to Part 1, ESRDs $s_1$ and $s_2$ should be deployed at the branch ends of the simple topology to ensure full diagnosability for local IC faults. On this premise, the following cases are analyzed.
(1) No ESRD is added. In this case, there exists a node $N_i$ such that the set $L_{N_i-s_1} \cap L_{N_i-s_2}$ contains both drop and trunk lines, and an IC fault on any line in this set causes $s_1$ and $s_2$ to produce the same error event tuple $(N, N)$ for node $N_i$. The rest of the argument is the same as in Part 1(1), indicating that the network is not fully diagnosable for local IC faults.
(2) An ESRD $s_3$ is added at any candidate position other than $p_{g_2}$. In this case, there exists a node $N_i$, $i \in \{g_1, g_1+1, \cdots, g_2\}$, such that the set $L_{N_i-s_1} \cap L_{N_i-s_2} \cap L_{N_i-s_3}$ contains both drop and trunk lines, and an IC fault on any line in this set causes $s_1$, $s_2$, and $s_3$ to produce the same error event tuple $(N, N, N)$ for node $N_i$. The rest of the argument is the same as in Part 1(1), indicating that the network is not fully diagnosable for local IC faults.
(3) An ESRD $s_3$ is added at the end of branch $B_3$, i.e., at point $p_{g_2}$. For every $N_i$: if an IC fault occurs on the drop line of $N_i$, the paths from $N_i$ to all three ESRDs contain this drop line, so the error event tuple for $N_i$ is $(N, N, N)$; similarly, if an IC fault occurs on the drop line of another node, the error event tuple for $N_i$ is $(P, P, P)$; if an IC fault occurs on any trunk line, not all paths from $N_i$ to the ESRDs contain this trunk line, so the error event tuple for $N_i$ contains both P and N. Hence, as in Part 1(3), the network is fully diagnosable for local IC faults when ESRDs $s_1$, $s_2$, and $s_3$ are deployed at the ends of all network branches.
3. Analyzing complex topology networks with more branches in the same way leads to the conclusion that a CAN network is fully diagnosable for local IC faults when ESRDs are deployed at the end of each network branch. □

Proposition 2: A CAN network is not fully diagnosable for trunk IC faults when ESRDs are deployed at the end of each branch.

Proof: Assume that the network is fully diagnosable for trunk IC faults when ESRDs are deployed at the end of each branch. By the definition of diagnosability, for different sets of trunk IC faults, there is at least one node whose error event tuple patterns differ. Consider, for example, the simple topology network illustrated in Fig. 4, where ESRDs $s_1$ and $s_2$ are deployed at the branch ends, i.e., at points $\{p_1, p_n\}$.
If IC faults occur on the trunk lines $p_2p_3$ and $p_6p_7$, the error event tuples generated by $s_1$ and $s_2$ for each node are:

If IC faults occur on the trunk lines $\{p_2p_3, p_6p_7\} \cup u$ for any nonempty $u \in 2^{\{p_3p_4, p_4p_5, p_5p_6\}}$, where $2^{\{p_3p_4, p_4p_5, p_5p_6\}}$ denotes the power set of $\{p_3p_4, p_4p_5, p_5p_6\}$, the error event tuples for all nodes are the same as those shown in (2). Thus, different sets of trunk IC faults produce the same error event tuple patterns at all nodes, which contradicts the assumption that the network is fully diagnosable for trunk IC faults. Therefore, when ESRDs are deployed at the end of each branch, the CAN network is not fully diagnosable for trunk IC faults. □

Proposition 3: When ESRDs are deployed at the end of each branch of a CAN network with complex topology, the subnets between any two trunk IC faults on the same branch are not diagnosable for trunk IC faults, whereas the subnets between any two trunk IC faults on different branches are diagnosable for trunk IC faults.
Proof: 1. For a simple topology network with ESRDs deployed at both branch ends, the proof of Proposition 2 shows that the subnetworks between any two trunk IC faults on this branch are not diagnosable for trunk IC faults.
2. Consider the complex topology network with three branches shown in Fig. 3b, with ESRDs deployed at all branch ends, i.e., at points $p_1$, $p_n$, and $p_{g_2}$. The three branches are equivalent in terms of network topology and ESRD location, so the layout of any two trunk IC faults $f_1$ and $f_2$ falls into one of the following two scenarios.
(1) Scenario 1: The two trunk IC faults occur on the same branch. Without loss of generality, branch $B_1$ is selected for further analysis. As shown in Fig. 5a, $f_1$ and $f_2$ occur on the trunk lines $p_2p_3$ and $p_{g_1-2}p_{g_1-1}$, respectively. In this case, if an IC fault occurs on any trunk line in $\{p_3p_4, p_4p_5, \cdots, p_{g_1-3}p_{g_1-2}\}$, the error event tuple patterns of all nodes remain unchanged. Therefore, the subnetwork between $f_1$ and $f_2$ on the same branch is not diagnosable for trunk IC faults.
(2) Scenario 2: The two trunk IC faults occur on different branches. Branches $B_1$ and $B_3$ are chosen for further analysis. As shown in Fig. 5b, $f_1$ and $f_2$ occur on the trunk lines $p_2p_3$ and $p_{g_1+1}p_{g_1+2}$, respectively. In this case, if an IC fault occurs on any trunk line in $\{p_3p_4, p_4p_5, \cdots, p_{g_1}p_{g_1+1}\}$, the error event tuple pattern of at least one node changes. Therefore, the subnetwork between $f_1$ and $f_2$ on different branches is diagnosable for trunk IC faults.
3. A complex topology network with more branches can be decomposed into topologies of the form shown in Fig. 3b, and the diagnosability analysis of the subnetwork between any two trunk IC faults is the same as in Part 2. This proves Proposition 3.
□ Based on these propositions, ESRDs are initially deployed at the end of each branch of the CAN network, and the error event tuple patterns they produce for all nodes are used to diagnose local and trunk IC faults.
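Propositions 1 to 3 can be checked numerically on small examples. The Python sketch below assumes a tree of trunk points, one node per attachment point, and that each IC event involves a single faulty line (our reading of the independence assumption in Section III); under that reading, $e^{[i]}_{s_k} = N$ exactly when the faulty line lies on $P_{N_i-s_k}$. The 8-node bus stands in for Fig. 4 and the three-branch star for Fig. 3b, but both layouts, the point names, and the helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def line(a, b):
    """A trunk line between two trunk points (undirected)."""
    return frozenset((a, b))


def tree_paths(trunk, points, esrds):
    """Lines on the path from each node (drop line + trunk lines) to each ESRD point."""
    adj = {}
    for a, b in trunk:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    paths = {}
    for node, src in points.items():
        parent, queue = {src: None}, deque([src])
        while queue:  # BFS from the node's attachment point
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        for s in esrds:
            lines, x = {("drop", node)}, s
            while parent[x] is not None:  # walk back from the ESRD to the node
                lines.add(line(x, parent[x]))
                x = parent[x]
            paths[node, s] = lines
    return paths


def patterns(paths, nodes, esrds, faults):
    """Distinct error event tuples per node, one faulty line per IC event."""
    return {n: {tuple("N" if f in paths[n, s] else "P" for s in esrds) for f in faults}
            for n in nodes}


# Simple topology: nodes N1..N8 on points p1..p8, ESRDs at p1 and p8.
bus = [(f"p{i}", f"p{i + 1}") for i in range(1, 8)]
bus_nodes = {f"N{i}": f"p{i}" for i in range(1, 9)}
bus_esrds = ["p1", "p8"]
bus_paths = tree_paths(bus, bus_nodes, bus_esrds)

# Proposition 1 (simple case): own drop -> (N, N), other drop -> (P, P).
print(patterns(bus_paths, ["N3"], bus_esrds, [("drop", "N3")]))  # {'N3': {('N', 'N')}}
print(patterns(bus_paths, ["N3"], bus_esrds, [("drop", "N5")]))  # {'N3': {('P', 'P')}}

# Proposition 2: extra faults between p2p3 and p6p7 leave every pattern set unchanged.
base_faults = [line("p2", "p3"), line("p6", "p7")]
base = patterns(bus_paths, bus_nodes, bus_esrds, base_faults)
middle = [line("p3", "p4"), line("p4", "p5"), line("p5", "p6")]
print(all(patterns(bus_paths, bus_nodes, bus_esrds, base_faults + list(u)) == base
          for r in range(1, 4) for u in combinations(middle, r)))  # True

# Proposition 3: branches a, c, d meet at coupling point b; ESRDs at branch ends.
star = [("a1", "a2"), ("a2", "a3"), ("a3", "b"), ("b", "c1"), ("c1", "c2"),
        ("c2", "c3"), ("b", "d1"), ("d1", "d2"), ("d2", "d3")]
star_points = ["a1", "a2", "a3", "c1", "c2", "c3", "d1", "d2", "d3"]
star_nodes = {f"M{k}": pt for k, pt in enumerate(star_points, 1)}
star_esrds = ["a1", "c3", "d3"]
star_paths = tree_paths(star, star_nodes, star_esrds)


def changes(faults, extra):
    """True if adding `extra` changes some node's pattern set."""
    before = patterns(star_paths, star_nodes, star_esrds, faults)
    return patterns(star_paths, star_nodes, star_esrds, faults + [extra]) != before


same = [line("a1", "a2"), line("a3", "b")]    # both on branch a
print(changes(same, line("a2", "a3")))        # False: not diagnosable
diff = [line("a1", "a2"), line("c1", "c2")]   # branches a and c
print([changes(diff, x) for x in (line("a2", "a3"), line("a3", "b"), line("b", "c1"))])
# [True, True, True]: diagnosable
```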

B. IC FAULTS DIAGNOSIS METHOD
The error event tuple patterns of the nodes differ according to the type of IC fault. Specifically, the patterns corresponding to local IC faults have homogeneous elements, i.e., all P or all N, whereas the patterns corresponding to trunk IC faults have heterogeneous elements, i.e., a combination of P and N. Therefore, the two types of patterns are used to localize trunk and local IC faults separately.
In this part, the trunk IC fault diagnosis method is introduced in detail first, and then the differences for local IC fault diagnosis are highlighted. For further analysis, assume that $h$ ESRDs are deployed and $m_i$ patterns of error event tuple are produced for $N_i$, $i \in \{1, 2, \cdots, n\}$.
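Collecting the distinct patterns $E^{<t>}_{N_i}$ of each node and splitting them by element type can be sketched in Python as follows. The record format (one `(node, {esrd: 'P' or 'N'})` pair per interrupted frame) and the function names are assumptions for illustration.

```python
def collect_patterns(records, esrds):
    """Distinct error event tuples per node, in order of first appearance (m_i = len)."""
    patterns = {}
    for node, events in records:
        tup = tuple(events[s] for s in esrds)   # combine events in ESRD order s_1..s_h
        seen = patterns.setdefault(node, [])
        if tup not in seen:
            seen.append(tup)
    return patterns


def split_patterns(node_patterns):
    """Homogeneous patterns (all P or all N, local IC faults) vs heterogeneous (trunk)."""
    hom = [E for E in node_patterns if len(set(E)) == 1]
    het = [E for E in node_patterns if len(set(E)) > 1]
    return hom, het


esrds = ("s1", "s2", "s3")
records = [
    ("N2", {"s1": "P", "s2": "P", "s3": "N"}),
    ("N2", {"s1": "P", "s2": "N", "s3": "P"}),
    ("N2", {"s1": "P", "s2": "P", "s3": "N"}),   # repeat of the first pattern
    ("N2", {"s1": "N", "s2": "N", "s3": "N"}),
]
pats = collect_patterns(records, esrds)
hom, het = split_patterns(pats["N2"])
print(len(pats["N2"]), hom, het)
```

Here $m_2 = 3$, split into one homogeneous and two heterogeneous patterns.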

1) TRUNK IC FAULTS DIAGNOSIS
The error event tuple patterns with heterogeneous elements are used to derive trunk IC faults; let $m_{i,\mathrm{het}}$ denote the number of such patterns for $N_i$.

a: DOMAIN OF ERROR EVENT TUPLE PATTERNS WITH HETEROGENEOUS ELEMENTS
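For node $N_i$, suppose there exists $v \in \{1, 2, \cdots, h\}$ such that

$$e^{[i]<t>}_{s_v} = P, \quad \forall t \in \{1, 2, \cdots, m_{i,\mathrm{het}}\}, \qquad (3)$$

where $E^{<t>}_{N_i} = \left(e^{[i]<t>}_{s_1}, e^{[i]<t>}_{s_2}, \cdots, e^{[i]<t>}_{s_h}\right)$ is the $t$-th pattern of error event tuple with heterogeneous elements for node $N_i$. This indicates that $P_{N_i-s_v}$ remains error free while the $m_{i,\mathrm{het}}$ patterns are produced. Thus $L^{\mathrm{normal}}_{\mathrm{Trunk}}(i, v)$, the set of trunk lines contained in $P_{N_i-s_v}$, is error free during the measurement process. After the same analysis is conducted for all nodes, the set of error-free trunk lines in the network is obtained as

$$L^{\mathrm{normal}}_{\mathrm{Trunk}} = \bigcup_{(i, v) \text{ satisfying } (3)} L^{\mathrm{normal}}_{\mathrm{Trunk}}(i, v). \qquad (4)$$

The domain $D^{<t>}_{N_i}$ of each pattern $E^{<t>}_{N_i}$ can then be derived by (5), where $f_j$ denotes that IC faults occur on line $j$, $L_{\mathrm{Trunk}}$ is the set of all trunk lines in the network, and $L^{(k)}_{\mathrm{Trunk}}$ is the set of trunk lines contained in $L_{N_i-s_k}$.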
The domain set D can be obtained after deriving the domains of error event tuple patterns with heterogeneous elements for all nodes.
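Equation (5) itself is not given above, so the following Python sketch uses an assumed reading that is consistent with the surrounding definitions and with the single-faulty-line reasoning of Section II-B: the domain of a heterogeneous pattern is the set of trunk lines lying on every path that reported N, on no path that reported P, and outside $L^{\mathrm{normal}}_{\mathrm{Trunk}}$. The trunk-line sets are those of $N_2$ in Fig. 1 with the drop line $o_2p_2$ removed; only $N_2$'s patterns feed $L^{\mathrm{normal}}_{\mathrm{Trunk}}$ here, whereas (4) takes the union over all nodes.

```python
# Trunk lines on the paths from N2 to each ESRD in Fig. 1 (drop line o2p2 removed).
TRUNK_N2 = {
    "s1": {"p2p1"},
    "s2": {"p2b1", "b1p5", "p5p6"},
    "s3": {"p2b1", "b1p3", "p3p4"},
}
ESRDS = ("s1", "s2", "s3")


def error_free_trunk(het_patterns, trunk_paths, esrds):
    """Trunk lines of every path N_i -> s_v that reported P in all
    heterogeneous patterns of the node, i.e. where condition (3) holds."""
    normal = set()
    for v, s in enumerate(esrds):
        if all(E[v] == "P" for E in het_patterns):
            normal |= trunk_paths[s]
    return normal


def trunk_domain(E, trunk_paths, esrds, normal):
    """Assumed form of (5): on all N paths, on no P path, not known error free."""
    n_sets = [trunk_paths[s] for e, s in zip(E, esrds) if e == "N"]
    dom = set.intersection(*n_sets)  # heterogeneous, so at least one N
    for e, s in zip(E, esrds):
        if e == "P":
            dom -= trunk_paths[s]
    return dom - normal


het = [("P", "P", "N"), ("P", "N", "P")]          # patterns of N2 in Example 1
normal = error_free_trunk(het, TRUNK_N2, ESRDS)    # {'p2p1'}
print(sorted(trunk_domain(het[0], TRUNK_N2, ESRDS, normal)))  # ['b1p3', 'p3p4']
print(sorted(trunk_domain(het[1], TRUNK_N2, ESRDS, normal)))  # ['b1p5', 'p5p6']
```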
Example 1 (deriving the domain of an error event tuple pattern with heterogeneous elements): For ease of understanding, the network topology shown in Fig. 1 is used as an example. Assume that IC faults occur on $b_1p_3$ and $b_1p_5$; then the error event tuple patterns with heterogeneous elements for node 2 are $E^{<1>}_{N_2} = (P, P, N)$ and $E^{<2>}_{N_2} = (P, N, P)$. The domain of each error event tuple pattern can then be derived by (5). Take $E^{<1>}_{N_2} = (P, P, N)$ as an example:

For further analysis, let $X = (x^{<1>}, x^{<2>}, \cdots, x^{<T_x>})$ denote the sequence of vectors whose elements correspond sequentially to the elements of D, where $x^{<1>}$ is the vector corresponding to $D^{<1>}_{N_1}$ and $T_x = \sum_{i=1}^{n} m_{i,\mathrm{het}}$. Then a score vector $A \in \mathbb{R}^{|L_{\mathrm{Trunk}}|}$, representing the likelihood of occurrence of each IC fault in $F_{\mathrm{Trunk}}$, is calculated as follows:

where $A(u)$ is the $u$-th element of vector $A$ and denotes the score of $F_{\mathrm{Trunk}}(u)$. The domains of the error event tuples obtained by using (5) are:

In this topology, the trunk IC fault vector obtained by using (7) is:

Hence, $Z = [0, 0, 6, 0, 6, 0]^T$. The score vector for trunk IC faults in this example is then calculated as:

d: FILTERED SEARCH ALGORITHM (FSA)
In FSA, four main steps are performed at each level: deletion, selection, calculation, and pruning. As illustrated in Fig. 6, the deletion step removes from the domain set the domains that contain the current IC fault sequence and recalculates the score vector from the updated domain set. The selection step chooses the IC faults with the maximum score as candidate vertices. The calculation step applies the evaluation function to all candidate vertices. The pruning step ranks all candidate vertices by their evaluation function values, retains the top-ranked vertices, and prunes the rest. The details are as follows.
… ${}_{(r_1, r_2)}(u)$. If multiple trunk IC faults are obtained, the candidate faults in level 3 are denoted $y^{<3>}(1), y^{<3>}(2), \cdots, y^{<3>}(k_3)$.
• Calculation. $E^{<3>}_{(r_1, r_2)} = E^{<2>}_{(r_1)} + \log A^{<2>}_{(r_1, r_2)}(v)$.
• Pruning. Execute the above deletion-selection-calculation steps for all IC fault sequences preserved in level 2 and obtain the corresponding evaluation function values $E^{<3>}_{(r_1, r_2)}, \cdots$. Then preserve the vertices with the maximum evaluation value for further exploration and prune the others.
(4) The above deletion-selection-calculation-pruning procedure is repeated level by level until the final level is reached, i.e., $D^{<T_y>}_{(r_1, r_2, \cdots)} = \emptyset$. The obtained sequence $(y^{<1>}, y^{<2>}, \cdots, y^{<T_y>})$ gives the trunk IC fault locations in the network.

Example 3 (diagnosing trunk IC faults using FSA): This example builds on the analysis in Example 2. The process of applying FSA to derive the trunk IC faults is shown in Fig. 7, and the details are as follows.
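As a rough illustration of the deletion-selection-calculation-pruning loop (not a reproduction of Example 3 or Fig. 7), the Python sketch below runs a beam search over fault sequences. The score function, in which each remaining domain spreads one unit of evidence evenly over its candidate faults, is an assumption standing in for the score vector of Section IV-B1.b; the beam width and the toy domains are likewise hypothetical. Deleting the domains that contain each newly chosen fault is equivalent to deleting those that contain any fault of the current sequence.

```python
import math


def scores(domains):
    """Assumed score: each domain spreads one unit evenly over its candidate faults."""
    A = {}
    for D in domains:
        for f in D:
            A[f] = A.get(f, 0.0) + 1.0 / len(D)
    return A


def fsa(domains, beam=2):
    """Beam-style filtered search; each vertex is (evaluation, fault sequence, domains left)."""
    frontier = [(0.0, (), [frozenset(D) for D in domains])]
    while True:
        finished = [v for v in frontier if not v[2]]   # final level: no domains left
        if finished:
            return max(finished, key=lambda v: v[0])[1]
        children = []
        for ev, seq, remaining in frontier:
            A = scores(remaining)                      # score vector of this vertex
            best = max(A.values())
            for f in sorted(f for f, a in A.items() if a == best):       # selection
                left = [D for D in remaining if f not in D]               # deletion
                children.append((ev + math.log(A[f]), seq + (f,), left))  # calculation
        children.sort(key=lambda v: v[0], reverse=True)                  # pruning
        frontier = children[:beam]


domains = [{"b1p3", "p3p4"}, {"b1p5", "p5p6"}, {"b1p3"}, {"b1p5"}]
print(fsa(domains))  # ('b1p3', 'b1p5')
```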

2) LOCAL IC FAULTS DIAGNOSIS
The error event tuple patterns with homogeneous elements are used to diagnose local IC faults. Let $m_{i,\mathrm{hom}}$ denote the number of error event tuple patterns with homogeneous elements for $N_i$, so that $m_{i,\mathrm{het}} + m_{i,\mathrm{hom}} = m_i$.

a: DOMAIN OF ERROR EVENT TUPLE PATTERNS WITH HOMOGENEOUS ELEMENTS
If all elements of $E^{<t>}_{N_i}$, $t \in \{1, 2, \cdots, m_{i,\mathrm{hom}}\}$, are P, then its domain is:

52206 VOLUME 11, 2023

Otherwise, the domain is:

where $k \in \{1, 2, \cdots, h\}$, $L_{\mathrm{Local}}$ is the set of all drop lines in the network, and $L^{(k)}_{\mathrm{Local}}$ is the set of drop lines contained in $L_{N_i-s_k}$. The domain set D can be obtained after deriving the domains of the error event tuple patterns with homogeneous elements for all nodes, and is calculated as follows:

where $F_{\mathrm{Local}}(u)$ denotes the IC fault occurring on drop line $L_{\mathrm{Local}}(u)$. The process of calculating the score vector for local IC faults is the same as in Section IV-B1.b, but note that

c: LOCALIZING LOCAL IC FAULTS USING FSA
The objective and the evaluation function for localizing local IC faults are the same as shown in Section IV-B1.c, except that $y^{<g>} \in F_{Local}$ for $g \in \{1, 2, \cdots, T_y\}$. The FSA process is the same as shown in Section IV-B1.d, except that IC faults are selected from $F_{Local}$ instead of $F_{Trunk}$ at each level.

C. MAES GENERATION ALGORITHM FOR NON-DIAGNOSABLE SUBNETS
The process of determining the non-diagnosable subnets is shown in Fig. 8. Based on the initial IC fault diagnosis, the health statuses of the drop lines and the trunk IC fault locations can be obtained, and any area of the network between two trunk IC faults located on the same branch is a non-diagnosable subnet.

1) STRUCTURAL CHARACTERISTICS OF THE SUBNET
For ease of analysis, the MAES generation algorithm first assumes that all possible locations for adding ESRDs are the interfaces between drop and trunk lines. Let S denote the set of ESRDs deployed at all interfaces between drop and trunk lines in the subnet, and let W denote the set of all trunk lines in the subnet. The structural model of the subnet can then be represented by a matrix A, whose rows correspond to the paths between ESRDs and nodes that contain trunk lines, and whose columns correspond to trunk lines: $A = [a_{u,v}]$, where $a_{u,v} = 1$ if path u contains trunk line v, and $a_{u,v} = 0$ otherwise. Let $R = \{r_1, r_2, \cdots, r_{g_1}\}$ denote any set of paths and $C = \{c_1, c_2, \cdots, c_{g_2}\}$ denote any set of trunk lines. A finite set of paths $R'$ is a structurally determined set with respect to C if $R'$ satisfies the following constraints:
where $\forall c_q \in C$, $Var_R(c_q) = \{r \in R \mid c_q \text{ is contained in } r\}$, and $\forall r_t \in R$, $Var_C(r_t) = \{c \in C \mid r_t \text{ contains } c\}$.
Definition 1: $R'_m$ is the maximum structurally determined set if the number of elements in $R'_m$ is the largest among all structurally determined sets.
Let R be the set of row elements of matrix A and let C = W; then $R'_m$ with respect to W can be obtained, which forms the basis for constructing the structurally determined model $A_o$ of the subnet. $A_o$ is a submatrix of A whose rows correspond to the elements of $R'_m$ and whose columns are the same as those of A; the values of its elements are taken from A.
Since one ESRD may be associated with multiple paths, the paths associated with the same ESRD should be analyzed together, which leads to the fault signature matrix G. G is obtained by replacing each trunk line (column) of $A_o$ with the corresponding IC fault and combining the rows of $A_o$ associated with the same ESRD by a logical OR operation.
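A minimal Python sketch of how A and G relate, assuming the 0/1 encoding in which $a_{u,v} = 1$ when path u contains trunk line v. The helper names `structural_matrix` and `fault_signature_matrix` and the small subnet in the usage note are hypothetical. Selecting the rows of $A_o$ (the maximum structurally determined set) is not reproduced here, so the sketch treats the given paths as the rows of $A_o$.

```python
from typing import Dict, List, Sequence, Tuple


def structural_matrix(paths: Sequence[Sequence[str]],
                      trunks: Sequence[str]) -> List[List[int]]:
    """a[u][v] = 1 if path u contains trunk line v, else 0 (assumed encoding)."""
    return [[1 if t in path else 0 for t in trunks] for path in paths]


def fault_signature_matrix(a_o: Sequence[Sequence[int]],
                           row_esrd: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Merge the rows of A_o that belong to the same ESRD with a logical OR.

    Column order is kept; each trunk-line column is read as the
    corresponding trunk IC fault. Returns (ESRD row labels, G).
    """
    order: List[str] = []
    merged: Dict[str, List[int]] = {}
    for row, esrd in zip(a_o, row_esrd):
        if esrd not in merged:
            order.append(esrd)
            merged[esrd] = [0] * len(row)
        merged[esrd] = [x | y for x, y in zip(merged[esrd], row)]
    return order, [merged[s] for s in order]
```

For a hypothetical subnet with trunk lines l3 and l4, where ESRD s_a observes paths covering {l3} and {l3, l4} and ESRD s_b observes a path covering {l4}, A is [[1, 0], [1, 1], [0, 1]] and the OR-combination gives G with rows s_a = [1, 1] and s_b = [0, 1].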

2) MINIMUM W-DOMINATING SET GENERATION ALGORITHM
The fault signature matrix G can be equivalently represented by a bipartite graph G = (S, W, E), where S and W are the vertex sets and E is the edge set. Each $s_i \in S$ denotes the ESRD corresponding to a row of G, each $w_j \in W$ denotes the trunk IC fault corresponding to a column of G, and $E = \{(s_i, w_j) \mid G(i, j) = 1\}$. The following definitions are used in the remainder of this section.
Definition 2: For each $s_i \in S$, the neighborhood of $s_i$ is defined as $N(s_i) = \{w_j \in W \mid (s_i, w_j) \in E\}$, and the degree of $s_i$ is defined as the number of its neighbors, i.e., $d(s_i) = |N(s_i)|$.

Definition 3: Let G = (S, W, E) denote the bipartite graph corresponding to G. A subset Q of S is a W-dominating set if every $w_j \in W$ is adjacent to at least one vertex of Q, i.e., $Q \subseteq S$ and $\forall w_j \in W, \exists s_i \in Q: (s_i, w_j) \in E$ [31].
Definition 4: A subset Q is called a minimal W-dominating set if no proper subset of Q is a W-dominating set. A minimum W-dominating set is a minimal W-dominating set of minimum size among all minimal W-dominating sets of G.
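Definitions 2 to 4 translate directly into set operations. The sketch below is illustrative only: it assumes G is given as a 0/1 list of rows indexed by hypothetical ESRD and fault labels, and the function names are made up for this example.

```python
from itertools import combinations
from typing import Dict, Sequence, Set


def to_bipartite(esrds: Sequence[str], faults: Sequence[str],
                 g_matrix: Sequence[Sequence[int]]) -> Dict[str, Set[str]]:
    """Edge (s_i, w_j) exists iff G(i, j) = 1; maps each s_i to N(s_i)."""
    return {s: {w for w, bit in zip(faults, row) if bit == 1}
            for s, row in zip(esrds, g_matrix)}


def degree(graph: Dict[str, Set[str]], s: str) -> int:
    """Definition 2: d(s_i) = |N(s_i)|."""
    return len(graph[s])


def is_w_dominating(graph: Dict[str, Set[str]], w: Set[str],
                    q: Sequence[str]) -> bool:
    """Definition 3: every w_j in W has at least one neighbor in Q."""
    covered: Set[str] = set()
    for s in q:
        covered |= graph[s]
    return w <= covered


def is_minimal_w_dominating(graph: Dict[str, Set[str]], w: Set[str],
                            q: Sequence[str]) -> bool:
    """Definition 4: Q dominates W, but no proper subset of Q does."""
    q = list(q)
    if not is_w_dominating(graph, w, q):
        return False
    if not q:
        return True  # the empty set has no proper subsets
    # Domination is monotone, so checking subsets of size |Q| - 1 suffices.
    return not any(is_w_dominating(graph, w, list(sub))
                   for sub in combinations(q, len(q) - 1))
```

With rows s_a = [1, 1, 0], s_b = [0, 1, 1], s_c = [0, 0, 1] over faults f1 to f3, {s_a, s_c} is a minimal W-dominating set, whereas {s_a, s_b, s_c} dominates W but is not minimal.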
Generating the MAES based on G can be converted into solving the minimum W-dominating set problem on G, as shown in Algorithm 1. The input of the algorithm is the bipartite graph G = (S, W, E), and the output is the set M of minimum W-dominating sets, i.e., the MAES. The algorithm operates as follows. Initially, the set M of minimum W-dominating sets, the collection $\mathcal{Q}$ of minimal W-dominating sets, and the W-dominating set Q under construction are all empty. The algorithm then calls the procedure MWDS(S, W, Q, $\mathcal{Q}$). This procedure first finds the vertices in S with the maximum degree. Next, for each $s_i$ with the maximum degree, $s_i$ is added to Q, $s_i$ is deleted from S, and every $w_j \in N(s_i)$ is deleted from W. Then, if W is empty, the algorithm stores Q in $\mathcal{Q}$ and reinitializes Q as empty; otherwise, it calls MWDS(S, W, Q, $\mathcal{Q}$) again, until W is empty. Finally, all elements of $\mathcal{Q}$ are minimal W-dominating sets, and the elements with minimum cardinality are the minimum W-dominating sets, which are stored in M.
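The recursive procedure MWDS can be sketched in Python as follows. This is an illustrative reconstruction from the textual description, not the authors' code: it assumes the bipartite graph is given as a dictionary from each ESRD to its set of adjacent trunk IC faults, the function names are hypothetical, and vertex degrees are recomputed over the trunk IC faults still remaining in W after each deletion, which is one reading of the description.

```python
from typing import Dict, FrozenSet, List, Set


def _mwds(graph: Dict[str, Set[str]], s_left: FrozenSet[str], w_left: FrozenSet[str],
          q: FrozenSet[str], found: Set[FrozenSet[str]]) -> None:
    """One call of MWDS(S, W, Q, collection of minimal W-dominating sets)."""
    if not w_left:                 # W is empty: Q dominates the original W
        found.add(q)
        return
    # Degrees are counted only over the faults still left in W.
    deg = {s: len(graph[s] & w_left) for s in s_left}
    best = max(deg.values(), default=0)
    if best == 0:                  # no remaining ESRD can cover what is left
        return
    for s in s_left:
        if deg[s] == best:         # branch on every maximum-degree vertex
            _mwds(graph, s_left - {s}, w_left - graph[s], q | {s}, found)


def minimum_w_dominating_sets(graph: Dict[str, Set[str]],
                              w: Set[str]) -> List[FrozenSet[str]]:
    """Return M: the dominating sets found with minimum cardinality."""
    found: Set[FrozenSet[str]] = set()
    _mwds(graph, frozenset(graph), frozenset(w), frozenset(), found)
    if not found:
        return []
    k = min(len(q) for q in found)
    return sorted((q for q in found if len(q) == k), key=sorted)
```

For example, with s1 adjacent to {f1, f2}, s2 to {f2, f3}, and s3 to {f3}, the sketch returns the two minimum sets {s1, s2} and {s1, s3}. Because every tied maximum-degree vertex opens a branch, the sketch makes no attempt to bound the number of recursive calls.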

Algorithm 1 Minimum W-Dominating Set Generation
According to the accessibility priority order of the ESRD placement positions (Section II-B), the additional ESRD locations are determined as follows: for any location contained in M (e.g., p• in Fig. 1), if the drop line connected to this location is error free, the ESRD is deployed at the corresponding node's open connector; otherwise, the ESRD is deployed at this location. The full diagnosis of trunk IC faults in the subnet can then be achieved by iteratively performing the "divide-and-conquer" procedure shown in Fig. 2.
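The placement rule can be expressed as a small function. The sketch assumes hypothetical lookup tables that map each candidate location to its drop line and to the corresponding node's open connector, plus the drop-line health status from the local IC fault diagnosis; all names are illustrative, not from the paper.

```python
from typing import Dict, Iterable, List


def place_additional_esrds(locations: Iterable[str],
                           drop_line_of: Dict[str, str],
                           connector_of: Dict[str, str],
                           drop_line_faulty: Dict[str, bool]) -> List[str]:
    """Return the physical attachment point of each additional ESRD."""
    placements: List[str] = []
    for p in locations:
        line = drop_line_of[p]
        if not drop_line_faulty[line]:
            # Error-free drop line: use the node's open connector (more accessible).
            placements.append(connector_of[p])
        else:
            # Faulty drop line: attach directly at the drop/trunk interface.
            placements.append(p)
    return placements
```

Mirroring Case Study 1, where M = {{s_p16}} and drop line l11 is error free, `place_additional_esrds(["p16"], {"p16": "l11"}, {"p16": "o16"}, {"l11": False})` places the ESRD at o16.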

V. TESTBED SETUP AND CASE STUDIES
In order to demonstrate the effectiveness and advantages of the proposed methodology, a testbed is constructed and three case studies are conducted. In case study 1, the analysis procedure is demonstrated in detail using a simple bus topology. In case study 2, the effectiveness of the method in diagnosing a network with complex topology is illustrated. In case study 3, the universality of the proposed method with respect to network topology is verified.
The constructed testbed, shown in Fig. 9, consists of three modules: a CAN protocol based networked system, the IC fault injection system, and the ESRDs developed in this work. The IC fault injection system can inject IC faults at different points independently by controlling high-speed on-off switches placed in the cable. The arrivals of IC faults follow Poisson processes with arrival rate $\lambda_{IC}$, and the duration of each IC fault is set to one bit length. The ESRDs are implemented with the NI CompactRIO FPGA data acquisition framework and a CAN transceiver.
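The injection model (Poisson arrivals, each fault lasting one bit) can be simulated by drawing exponential inter-arrival times. This is a sketch for illustration only: the 500 kbit/s bus bit rate, the function name, and the seed are assumptions not stated in this section; the rate of 164 faults per second is the one used for $l_2$ in Case Study 1.

```python
import random
from typing import List, Tuple


def ic_fault_schedule(rate_per_s: float, horizon_s: float, bitrate_bps: float,
                      seed: int = 0) -> List[Tuple[float, float]]:
    """Open-circuit intervals (start, end) in seconds for one injection point."""
    rng = random.Random(seed)
    bit_time = 1.0 / bitrate_bps          # each IC fault lasts one bit time
    t = 0.0
    faults: List[Tuple[float, float]] = []
    while True:
        # Poisson process: inter-arrival times are exponential with mean 1/rate.
        t += rng.expovariate(rate_per_s)
        if t >= horizon_s:
            return faults
        faults.append((t, t + bit_time))
```

Over a 100 s run at 164 faults per second, about 16,400 injections are expected, each 2 µs long at the assumed 500 kbit/s.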

A. CASE STUDY 1: IC DIAGNOSIS FOR SIMPLE TOPOLOGY
The network topology employed in this case study is shown in Fig. 10. Two ESRDs, $s_1$ and $s_2$, are deployed at the two ends of the bus. Four lines are chosen to inject IC faults independently: $l_{12}$ has local IC faults at an injection rate of $\lambda^{l_{12}}_{IC} = 270$ faults per second, and $l_2$, $l_3$, and $l_5$ have trunk IC faults at injection rates of $\lambda^{l_2}_{IC} = 164$, $\lambda^{l_3}_{IC} = 95$, and $\lambda^{l_5}_{IC} = 169$ faults per second, respectively. In total, 10,714 error records are received in this case study. The patterns of error event tuples for each node are shown in TABLE 1, and their domains can be obtained by applying (5), (11), and (12).

1) DIAGNOSE LOCAL IC FAULTS
The domain set employed for diagnosing local IC faults is D = ⋯, and the score vector $A = [\cdots, 0.07, 0.07]^T$ can be obtained by (8). FSA is then applied to diagnose local IC faults, as shown in Fig. 11a. Level 1: $y^{<1>}_{(1)} = f_{l_{12}}$, and $E^{<1>} = \log 0.51 = -0.29$. Level 2: for $y^{<1>}_{(1)}$, deleting the elements that contain $y^{<1>}$ from D gives $D^{<1>}_{(1)} = \emptyset$, so FSA terminates for the sequence $(f_{l_{12}})$, which is the final sequence. Therefore, the set of local IC faults is $\{f_{l_{12}}\}$.
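The reported value of $E^{<1>}$ implies base-10 logarithms in the evaluation function, since the natural logarithm gives a different number:

$$E^{<1>} = \log_{10} 0.51 \approx -0.292, \qquad \ln 0.51 \approx -0.673.$$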

2) DERIVE TRUNK IC FAULTS
The domain set employed for deriving trunk IC faults is D = ⋯. The trunk IC fault vector is $F_{Trunk} = \{f_j \mid j \in \{l_1, l_2, \cdots, l_7\}\}$, and the score vector is $A = [0, 0.25, 0.25, 0.25, 0.25, 0, 0]^T$. The process of applying FSA to derive trunk IC faults is shown in Fig. 11b, and the details are presented as follows.

3) GENERATE MAES FOR THE NON-DIAGNOSABLE SUBNET
Since $f_{l_2}$ and $f_{l_5}$ occur on the same branch, the subnet between $f_{l_2}$ and $f_{l_5}$ is not diagnosable according to Proposition 3; it is shown as area $SN_1$ in Fig. 10. The structurally determined model $A_o$ and the fault signature matrix G of subnet $SN_1$ are: ⋯, where $s_{p_{13}}$ denotes that an ESRD is deployed at point $p_{13}$. Using Algorithm 1, the set of minimum W-dominating sets $M = \{\{s_{p_{16}}\}\}$ is obtained. Since the drop line $l_{11}$ is error free according to the results in Section V-A1, the new ESRD is added at point $o_{16}$.
As discussed in Section IV-C, the patterns of error event tuples with heterogeneous elements produced by $s_1$, $s_2$, and the added ESRD $s_{o_{16}}$ should be used to diagnose the trunk IC faults in $SN_1$. In this case study, a total of 3,842 error records are received for the nodes in $SN_1$ by these three ESRDs, and the patterns of error event tuples with heterogeneous elements for each node are: ⋯. The sequence obtained by applying FSA is $(f_{l_3})$. According to the analysis above, the set of IC fault locations is $\{f_{l_2}, f_{l_3}, f_{l_5}, f_{l_{12}}\}$, which agrees with the experiment setup.

4) DISCUSSION
In this part, the diagnostic precision of the proposed method is compared with that of the diagnosis methods in [29] and [30].
The diagnostic errors include (1) false diagnosis, where a fault-free line is diagnosed as faulty, and (2) miss diagnosis, where a fault is present but not diagnosed. The false diagnosis rate (FDR) and the miss diagnosis rate (MDR) are used to quantitatively describe the two types of errors, respectively, and are calculated by ⋯, where #(•) denotes the cardinality of •, and the two sets compared are the set of diagnosed fault locations and the set of actual fault locations (*). In this case study, the set of actual IC fault locations is $\{f_{l_2}, f_{l_3}, f_{l_5}, f_{l_{12}}\}$. The diagnosis results and errors of the different methods are shown in TABLE 2. It can be seen that (1) the locations diagnosed by [29] and [30] are not consistent with the experiment setup, and (2) the MDR of the proposed method is lower than those of the methods in [29] and [30]. Therefore, we can conclude that the proposed method leads to a more precise diagnostic result.
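Because the FDR and MDR equations are not reproduced here, the sketch below uses one common set-based definition as an assumption: FDR is the fraction of diagnosed locations that are fault free, and MDR is the fraction of actual fault locations that are not diagnosed. The paper's exact normalization may differ; the function name and the hypothetical diagnosis result in the usage note are illustrative only.

```python
from typing import Set, Tuple


def diagnosis_error_rates(diagnosed: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Return (FDR, MDR) under the assumed set-based definitions."""
    false_diag = diagnosed - actual   # fault-free lines reported as faulty
    missed = actual - diagnosed       # actual faults that were not reported
    fdr = len(false_diag) / len(diagnosed) if diagnosed else 0.0
    mdr = len(missed) / len(actual) if actual else 0.0
    return fdr, mdr
```

With the actual set {f_l2, f_l3, f_l5, f_l12} from this case study, a diagnosis equal to that set gives (0.0, 0.0), while a hypothetical result {f_l2, f_l4, f_l12} gives FDR = 1/3 and MDR = 0.5.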

B. CASE STUDY 2: IC DIAGNOSIS FOR THREE-BRANCH COMPLEX TOPOLOGY
In this case study, the network topology shown in Fig. 12 contains 3 branches connecting 10 nodes. Three ESRDs, $s_1$, $s_2$, and $s_3$, are deployed at the ends of the branches. Three lines are chosen to inject IC faults independently: $l_{15}$ has local IC faults at an injection rate of λ

3) GENERATE MAES FOR THE NON-DIAGNOSABLE SUBNET
Since $f_{l_7}$ and $f_{l_9}$ occur on the same branch, the subnet between $f_{l_7}$ and $f_{l_9}$ is not diagnosable according to Proposition 3; it is shown as area $SN_2$ in Fig. 12. The structurally determined model $A_o$ and the fault signature matrix G of subnet $SN_2$ are: ⋯. As discussed in Section IV-C, the patterns of error event tuples with heterogeneous elements produced by $s_1$, $s_2$, $s_3$, and the added ESRD $s_{o_7}$ should be used to diagnose the trunk IC faults in $SN_2$. In this case study, a total of 4,464 error records are received for the nodes in $SN_2$ by these four ESRDs, and the patterns of error event tuples with heterogeneous elements for each node are: ⋯. According to the analysis above, the set of IC fault locations is $\{f_{l_7}, f_{l_9}, f_{l_{15}}\}$, which agrees with the experiment setup. Thus, the effectiveness of the proposed method in fully diagnosing all IC faults in a CAN network with complex topology is verified.

C. CASE STUDY 3: IC DIAGNOSIS FOR FIVE-BRANCH COMPLEX TOPOLOGY
In this case study, the network topology is shown in Fig. 14
Since these trunk IC faults occur on different branches, there is no non-diagnosable subnet according to Proposition 3. Based on the analysis above, the set of IC fault locations is $\{f_{l_5}, f_{l_9}, f_{l_{13}}, f_{l_{17}}, f_{l_{25}}\}$, which agrees with the experiment setup. Thus, the universality of the proposed method across multiple network topologies is verified.

VI. CONCLUSION
In this paper, a novel methodology that uses data link layer information to achieve the full diagnosis of IC faults in CAN networks with complex topology is proposed. The diagnosability of the CAN network is defined, and the initial ESRD deployment scheme is presented based on the diagnosability analysis. On the basis of the patterns of error event tuples produced by ESRDs, the FSA algorithm is developed to localize local and trunk IC faults, respectively. The MAES generation algorithm is proposed to obtain the sequential ESRD deployment strategy for non-diagnosable subnets. A testbed is constructed and case studies are conducted to demonstrate and verify the proposed methodology. Experimental results show that the IC fault locations identified by the proposed method agree well with the experiment setup in various scenarios, and that the CAN network becomes fully diagnosable for IC faults by deploying a minimum number of ESRDs.
Future work includes improving the current method to achieve complete localization of IC faults without reconfiguring the ESRDs, so that a simpler and more practical implementation is available for industrial applications, and extending the current method to short-circuit IC faults as well as general hybrid IC fault scenarios with both open- and short-circuit IC faults.