A New Fault Diagnosis Method of Flexible OCS Clamp

The capacity and safety of train transportation depend largely on the working state of the flexible overhead contact system (OCS) clamp in the traction power supply system. The notable feature of clamp loosening is a change in temperature. Because of long measurement distances and environmental factors, traditional signal processing methods for temperature measurement largely fail. A feature analysis method that combines empirical clustering with time series analysis can effectively determine the thermal health status of bare conductors in the overhead contact line and thus diagnose their thermal state. To eliminate the considerable interference in the measurement process, the data are pre-processed based on the point density criterion, and a big data clustering method is then used to calculate the relative proximity of the sampled data to the normal data and to the historical fault data. A time-series similarity analysis method is then used to predict the fault development time of equipment with hidden dangers. Experiments show that the method can accurately identify the thermal health of bare conductors; the concept is clear and the algorithm is simple.


Introduction
The OCS clamp in a railway power supply system easily becomes slightly loose, which is the early fault behavior of the clamp connection, so early identification of clamp loosening is considered an important measure for improving the reliability of the traction power supply system [1]. With the continuous advancement of smart grid construction, many scholars have applied various intelligent algorithms to fault diagnosis and achieved good results [2]. Accurate analysis of the thermal state of the bare conductor is of great significance for the safe operation of the railway system and for improving economic efficiency.
At present, the detection of bare conductors mainly relies on infrared non-contact temperature measurement and on contact temperature measurement with PT100 temperature sensors and similar modules, and it only determines the operating range of the bare conductor thermal state by analyzing the temperature. With this approach it is extremely difficult to eliminate early failures and to analyze hidden dangers. Existing research has proposed monitoring the change in resistance to detect the health of the bare conductor, and some scholars have proposed using thermal imaging analysis [3]. However, neither approach has algorithm support: the former involves complex measurement variables, and the latter has too high a detection cost, so both are difficult to implement under the actual working conditions of the railway OCS.
In this paper, the idea of big data clustering is applied to bare conductor detection and combined with time series analysis to establish a thermal state evaluation system for OCS bare conductors. Based on the measured temperature change of the bare conductor, the system gives a comprehensive evaluation of its current performance and hidden dangers. The model first preprocesses the data based on the point density criterion, marking and eliminating noise points and completing the data; second, sampled data points are selected and their relative proximity to the historical fault data clusters is calculated; finally, according to the relative proximity results, the state of the equipment is marked and classified.

Device status based on big data clustering
Bare conductor faults are mainly caused by an increase in resistance due to looseness, which leads to heating. The historical fault data samples therefore contain the bare conductor thermal state data recorded under bare conductor faults. All historical data points recorded under a given bare conductor fault are called a cluster C. Following the idea of the thermal state evaluation system based on big data clustering and time series analysis, the mean reciprocal distance between the data point x and its k nearest neighbors in the cluster is calculated, which gives the proximity l(x, k) of the data point x to the data cluster C.
Here Cj(x, k) is the set of the k nearest neighbors of the data point x in cluster Cj; y is a nearest neighbor of x in cluster Cj; and d(x, y) is the Euclidean distance between x and y, which can be calculated by equation (2).
Here m is the data dimension; xi is the coordinate of the data point x in dimension i; yi is the coordinate of the data point y in dimension i; and λi is the normalization coefficient in dimension i, obtained in the data preprocessing stage.
Each internal data point of C is taken in turn as the data point to be tested and substituted into equation (1), which gives the proximity of that data point to its own cluster, called the intra-cluster proximity. The intra-cluster proximities of all internal data points are averaged to obtain the base proximity lref of C.
For a data point x to be evaluated, the greater its proximity to the cluster, the closer it is to the cluster and the more likely the corresponding fault. The degree of proximity between data point x and cluster C is represented by the relative proximity lc(x, k).
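The proximity calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the normalization coefficient λi is assumed to multiply each coordinate difference, a point is assumed to be excluded from its own neighbor set when the intra-cluster proximity is computed, and the relative proximity is assumed to be the ratio of the proximity to the base proximity lref.

```python
import math

def distance(x, y, lam):
    """Euclidean distance with per-dimension scaling lam[i] (assumed form of equation (2))."""
    return math.sqrt(sum((l * (a - b)) ** 2 for a, b, l in zip(x, y, lam)))

def proximity(x, cluster, k, lam):
    """Mean reciprocal distance from x to its k nearest neighbors in the cluster."""
    nearest = sorted(distance(x, y, lam) for y in cluster)[:k]
    eps = 1e-12  # guards against division by zero for coincident points
    return sum(1.0 / max(d, eps) for d in nearest) / len(nearest)

def base_proximity(cluster, k, lam):
    """Average intra-cluster proximity l_ref; each point is left out of its own neighbor set (assumption)."""
    values = []
    for i, x in enumerate(cluster):
        others = cluster[:i] + cluster[i + 1:]
        values.append(proximity(x, others, k, lam))
    return sum(values) / len(values)

def relative_proximity(x, cluster, k, lam):
    """Proximity of x divided by the base proximity of the cluster (assumed form of equation (3))."""
    return proximity(x, cluster, k, lam) / base_proximity(cluster, k, lam)

# Hypothetical two-dimensional fault cluster with unit normalization coefficients
fault_cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
lam = (1.0, 1.0)
near = relative_proximity((0.5, 0.5), fault_cluster, 2, lam)
far = relative_proximity((5.0, 5.0), fault_cluster, 2, lam)
```

With this assumed ratio form, a point inside the cluster scores higher than a distant one, which matches the interpretation that greater proximity means a more likely fault.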

Trend prediction based on time series similarity analysis
The fault temperature of the bare conductor changes with time-series characteristics. For a bare conductor with latent fault risk, the speed and trend of the transition from the current state to the fault state can be judged by time series similarity analysis.

Determining the time series of data to be tested
When the relative proximity between the data point x of the device under evaluation and the bare conductor fault data cluster C is in the interval [0.6, 0.8], x and a number of data points acquired before it are taken as the time series X of the data to be tested. The steps are as follows:
• Set the length N of the sequence X.
• Determine the last data point time stamp txN: set the acquisition time of the data point x to txN.
• Calculate the relative proximity: set the maximum search time txmax of the sequence X, and calculate the relative proximity lc(x(t),k) between the data point x(t) and the fault data cluster C for t∈[txN-txmax, txN] according to equation (3).
• Determine the initial data point time stamp tx1: search backward from txN for the first data point x(t) with relative proximity lc(x(t),k)= 0.5±εx, where εx is the allowable search error. If such a data point exists, set its acquisition time to tx1. If not, set tx1 to txN-txmax.
• Get the time series: with the sampling interval dtx=∆tN/(N-2), extract data points in chronological order starting from tx1 to obtain the final time series X={x(tx1), x(tx2),……, x(txN)}.
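The steps for selecting the time stamps of X can be sketched as follows. This is an illustrative sketch only: the function name and arguments are hypothetical, the relative proximities are assumed to be precomputed for every recorded time stamp, and the N samples are assumed to be evenly spaced between tx1 and txN (a step of (txN - tx1)/(N - 1)), which may differ from the sampling interval used in the paper.

```python
def select_sequence_times(times, lc_values, n, t_max, target=0.5, eps=0.02):
    """Return n time stamps for the sequence X.

    times      ascending acquisition times; the last one is t_xN
    lc_values  relative proximity of each recorded point to the fault cluster
    t_max      maximum search time before t_xN
    target/eps proximity level and allowable error that mark the start point
    """
    t_end = times[-1]
    t_start = t_end - t_max  # fallback when no point matches the target level
    # Search backward from t_xN for the first point within target +/- eps
    for t, lc in zip(reversed(times), reversed(lc_values)):
        if t < t_end - t_max:
            break
        if abs(lc - target) <= eps:
            t_start = t
            break
    step = (t_end - t_start) / (n - 1)  # even spacing (assumption)
    selected = []
    for i in range(n):
        wanted = t_start + i * step
        # Use the recorded time stamp closest to the wanted sampling instant
        j = min(range(len(times)), key=lambda idx: abs(times[idx] - wanted))
        selected.append(times[j])
    return selected

# Hypothetical record: one sample every 10 time units, proximity rising toward the fault cluster
times = list(range(0, 101, 10))
lc_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.72, 0.75]
print(select_sequence_times(times, lc_values, n=4, t_max=80))
```

The data points x(t) at the returned time stamps would then form the sequence X.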

Determining the fault history data time series
Taking a data point y in the bare conductor fault cluster C that is adjacent to the data point x as an example, the time series Y preceding the data point y is selected as the comparison sequence. The procedure is basically the same as that for obtaining the time series X of the data to be tested; only the time stamps of the first and last data points are determined differently, as follows.
• Determine the initial data point time stamp ty1: the acquisition time of the fault data point y is tyb, so the data point y is also written y(tyb). Set the maximum search time tymax for sequence Y, and calculate the relative proximity between the data y(t) and the fault data cluster C for t∈[tyb-tymax, tyb]. Starting from the time tyb, search for the data point satisfying lc(y(t),k)= lc(y(tx1),k) ±εy1, where εy1 is the initial search allowable error. If a data point that meets the requirement is found before the search reaches the time tyb-tymax, set its acquisition time to ty1. Otherwise, end all steps and consider the search to have failed.
• Determine the end data point time stamp tyN: perform a second search in the interval from ty1 to tyb, searching backward from ty0 for the first data point such that lc(y(t),k)= lc(y(tx1),k) ±εy2, where εy2 is the secondary search allowable error. If a data point that meets the requirement is found before time tyb, set its acquisition time to tyN. Otherwise, end all steps and consider the search to have failed.

Similarity calculation of time series
The distance between two time series reflects their similarity: the smaller the distance, the higher the similarity. Commonly used time series distances include the Euclidean distance [4], the edit distance [5], the dynamic time warping (DTW) distance, and so on [6]. The edit distance counts the editing steps needed to convert one string sequence into the other. The Euclidean distance sums the distances between the data points with the same subscript in the two time series. The DTW distance allows points to be matched out of step; in essence, it finds the minimal path between the two time series and uses it to calculate the distance between them. Since the time series elements in this paper are the temperature information of the bare conductor, and the distance evaluation result must not become too large because some data are shifted in time, the DTW distance is adopted to measure the similarity of the two time series. For the time series X={x(tx1),x(tx2),……,x(txN)} and the time series Y={y(ty1),y(ty2),……,y(tyN)}, the DTW distance can be obtained by the recursive calculation in equation (4). To conveniently measure the distance between sequences of different lengths, the DTW result is normalized, and the normalized distance is finally used to evaluate the sequence similarity.
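The recursive calculation of the dynamic time warping distance can be illustrated with the standard dynamic-programming form below. It is a sketch under assumptions rather than the paper's exact formula: each sequence element is taken to be a scalar temperature, the local cost is the absolute difference, and the normalization divides by the sum of the two sequence lengths, which is only one common choice.

```python
def dtw_distance(xs, ys):
    """Classic DTW: D[i][j] = cost(i, j) + min of the three neighboring cells."""
    n, m = len(xs), len(ys)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(xs[i - 1] - ys[j - 1])  # local distance between two samples
            D[i][j] = cost + min(D[i - 1][j],      # xs advances alone
                                 D[i][j - 1],      # ys advances alone
                                 D[i - 1][j - 1])  # both advance together
    return D[n][m]

def normalized_dtw(xs, ys):
    """DTW distance divided by the total sequence length (assumed normalization)."""
    return dtw_distance(xs, ys) / (len(xs) + len(ys))

# Hypothetical temperature sequences in degrees Celsius
x_seq = [30.0, 32.0, 35.0, 41.0]
y_seq = [30.0, 30.5, 32.0, 35.0, 41.0]
print(normalized_dtw(x_seq, y_seq))
```

Because DTW lets one sample match several samples of the other sequence, a sequence that is merely shifted or stretched in time still yields a small distance, which is the property the text relies on.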

State evaluation method flow
The comprehensive evaluation method of the bare conductor state is divided into three stages, the flow of which is shown in Figure 1.
Figure 1. Method flow chart
• First, data preprocessing is performed based on the point density criterion: the noise points are marked and eliminated, and the data are completed.
• Next, the sampled data point x is selected and its relative proximity to the historical fault data cluster is calculated.
• Finally, based on the relative proximity results, the device status is marked and classified into three cases:
  • When the relative proximity of the data point to the fault data cluster is greater than 0.8, the equipment is at a high risk of failure, and the bare conductor is recorded as a fault.
  • When the relative proximity is in the interval [0.6, 0.8], the bare conductor is at an elevated risk, and its state is marked as a latent fault. The time series analysis method is used to further judge the speed and trend of the device's transition to the fault state.
  • When the relative proximity of the data point to all data clusters is less than 0.6, the device status is marked as healthy and the health score of the device is calculated.
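The three-way classification in the last stage reduces to a pair of threshold comparisons. The sketch below uses the thresholds 0.6 and 0.8 given in the text; the function name and the returned labels are hypothetical.

```python
def classify_state(lc, latent_threshold=0.6, fault_threshold=0.8):
    """Map a relative proximity value to one of the three device states."""
    if lc > fault_threshold:
        return "fault"          # high risk of failure
    if lc >= latent_threshold:
        return "latent fault"   # time series analysis is applied next
    return "healthy"            # health score is calculated next

for value in (0.45, 0.7, 0.93):
    print(value, classify_state(value))
```

When several fault clusters exist, the function would be applied to the relative proximity for each cluster, and a device would be healthy only if every result is "healthy".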

Computing device health score
When the relative proximity is less than 0.6, the health score of the device is calculated as a comprehensive evaluation index of the device state. For a given bare conductor fault j, the data point x and its relative proximity lc(x,k) can be converted into a health score of the bare conductor with respect to that fault. With 100 as the full score, the health score F(t) of the device is computed from this relative proximity.
The schematic diagram of the experimental set-up is shown in Figure 2. The surface temperature of the wire clamp group was measured periodically with an infrared thermometer. See Figure 3 for the experimental site.

Experimental results
The implementation software was written in C# using Visual Studio 2017. It provides temperature measurement, automatic recording of the measurement time, data recording, calculation according to the above algorithm, and output of the calculation results and related parameters, as well as fault alarms, hidden danger alarms, and other related functions. At the engineering site, it can be switched to a timed temperature measurement mode and an automatic calculation mode, and the score of a bare conductor with a hidden danger is used to estimate when it needs to be replaced. In the short-circuit experiment, the loosely connected bare conductor wire clamp was detected after only 10 minutes of energization, and the program displayed an alarm prompt; the measured temperatures of some of the clamps were recorded. Based on the above algorithm, a fault time prediction function was also written, using the relationship between the normalized distance value and a large number of sample fault records. Because of time constraints, not enough data samples have been obtained yet, and this module is still being updated.

Conclusion
This paper proposes a method for evaluating the bare conductor state based on clustering and time series analysis. The following conclusions can be drawn:
• Based on the point density criterion, the method preprocesses the data and eliminates the influence of noise data. Based on the clustering method, the state of the bare conductor is divided into three categories: healthy, latent fault, and fault. Compared with traditional fault diagnosis algorithms, the judgment accuracy for latent faults and faults in the bare conductor is higher.
• The equipment health score evaluation index is set based on fault weighting; it directly reflects the equipment health status and provides a reference for arranging equipment operation and maintenance.