Monitoring of Fibre Optic Links With a Machine Learning-Assisted Low-Cost Polarimeter

The optical fibres widely used in telecommunication can be simultaneously used for (distributed) sensing or fibre network self-monitoring. In our work, we monitor changes in the fibre environment via monitoring changes in the state of light polarization without the utilization of methods based on back-scattered light. These changes can generate a vast amount of data, but it is generally not straightforward to extract useful information from them, e.g., future fibre break predictions or earthquake monitoring. We suggest using machine learning to solve this problem. However, since the measured data events are not labelled (i.e., we do not know in advance what fingerprint in the measured data corresponds to a future fibre break), unsupervised machine learning methods must be used. Here, we report a proof-of-concept approach in which we use a simple polarimetric technique and installed optical fibre, which we disturb with controlled vibrations, knocking on the fibre, and rack door closing near the fibre. Using a machine learning K-means algorithm, we distinguish between data generated with these controlled disturbances and data generated by noise due to common traffic. These results are the first step along the way to automated data labelling, which can be used for the classification of events.


I. INTRODUCTION
Intelligent fibre optic systems need to provide (among other benefits) self-monitoring, which includes, for example, early detection of future fibre breaks or detection of an intrusion. To offer these functions across as large a portion of the fibre network as possible, they should be implemented with hardware of minimal cost.
The above effects can be monitored by measuring changes in the properties of the light (phase, amplitude, or polarization) that propagates through the fibres, either at the receiver side [1] or in the back-reflection (e.g., [2], [3]). A system operating in transmission has the advantage of simplicity (often not requiring any additional components [1]), but generally cannot provide spatially-resolved detection (with some exceptions, e.g., the relatively complex dual-wavelength method presented in [4]). Detection systems operating via reflection (e.g., optical time-domain reflectometry, OTDR [5], or phase-resolved optical time-domain reflectometry, φ-OTDR [6]) can achieve spatially-resolved detection, but generally require additional optical hardware and must cope with the detection of extremely low levels of the back-scattered signals. The additional hardware may significantly increase the cost of the system, e.g., φ-OTDR requires a dedicated, relatively high-power continuous-wave (CW) light source with significant demands on its coherence length, which must be at least twice the fibre optical length. Such a CW light source is not only expensive: its high power can also easily cause undesirable cross-talk when the fibre is simultaneously used for data signals.
The associate editor coordinating the review of this manuscript and approving it for publication was Inês Domingues.
For practical use, any measurement technique sensitive to fibre manipulation needs to be complemented with data analysis that identifies events of interest, e.g., 'tapping on the fibre' or 'slow degradation of the fibre'. Such a technique must function effectively even in the presence of 'noise', i.e., events that are not of interest (a person walking down the corridor, playing music or talking, a truck passing on the road, etc.). Methods of machine learning have recently emerged to address the classification of measured data; however, they have been applied almost exclusively to back-scatter-based systems (based on OTDR or φ-OTDR) [7]-[9]. Tejedor et al. [7] provide an overview of the application of Distributed Acoustic Sensing (DAS) and pattern recognition to detect potentially dangerous events on gas pipelines. Makarenko [8] and Bublin [9] both utilize deep learning techniques for data acquired by DAS during perimeter monitoring on pipelines. Machine learning applied to simple systems operating in transmission has not, to the best of our knowledge, been reported yet. For classification, the model must be learned from data which are already labelled. The most challenging question is how to obtain such labels for measured data.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Here, we suggest performing automatic recognition of the fibre changes of interest (events) from data acquired with our cost-effective transmission-based polarization monitoring method [3] using unsupervised machine learning. To achieve this, we demonstrate the first necessary step experimentally, which is grouping (clustering) data that are deemed to correspond to the same/similar event. We suggest and use the unsupervised machine learning algorithm K-means [10] for this task.
Although our measurement method [3] requires additional hardware on the receiver side (a low-cost polarization analyzer, which we describe later), it can be used in conjunction with the extremely inexpensive transmission of amplitude-modulated data. Thus, the overall system cost can be lower than for techniques requiring a coherent receiver [1], where the cost of optical hardware to transmit and detect the data on its own is significantly higher than for simple amplitude-modulated transmission.
Our results represent the first step towards the future automated identification of various events from the measured data. Achieving this final goal will require training a classification model for categorization of events (i.e., to evaluate which footprint data correspond to 'tapping on the fibre' and which group of data corresponds to 'slow fibre degradation', etc.). This will allow for the final data classification of 'tapping on the fibre' or 'slow fibre degradation' directly from the newly measured yet unseen data.
The structure of this paper is as follows: section II describes the experimental setup and provides definitions of the experiments. The following section III aims to provide an accurate explanation of the data processing pipeline and the utilized algorithms. Section IV summarizes the results and explains the estimations of the parameters for all testing scenarios. The performance and evaluation of results are discussed in section V, where possible future work is also mentioned. All is concluded in the final section VI.

II. EXPERIMENTAL SETUP
We used an optical fibre polarimeter, which consists of four polarization-sensitive fibre gratings inscribed along a short piece of optical fibre. The light passing through the gratings is scattered out of the fibre with a polarization-dependent intensity. The four projections of the input State Of Polarization (SOP) onto four linearly-polarized signals, with the projection axes defined by the polarization axes of the four gratings [11], are then detected by four InP photodiodes. Subsequently, the signals from the photodiodes are amplified with transimpedance amplifiers and digitized using analogue-to-digital converters; for technical details, see [11], [12].
Our experimental testbed, Fig. 1, consists of two laboratories ('A' and 'B') in different buildings which are approximately 1 km apart from each other. The installed G.652 single-mode fibre that connects these two laboratories is installed underground. Within the two buildings, it passes through several rooms with switches in which the fibre is connected. This setup thus represents a real-world scenario, especially when compared to other reports that address pure in-laboratory conditions, e.g., [13], [14]. In laboratory A, a data signal is sent down the Fibre Under Test (FUT) using a Small Form-factor Pluggable (SFP) transceiver operating at 1542.94 nm (ITU channel 43). It is detected in laboratory B with the polarimeter that samples the signal at a fixed frequency of 20 kHz, and data are recorded continuously with a computer.
During the measurement, we introduce controlled disturbances as follows:
1) A loudspeaker creating pulses of discrete frequencies (PDF): we glued a small section of the FUT in a 2 mm diameter protective buffer directly onto a diaphragm of a loudspeaker. We introduced pulses of discrete frequencies of 10, 20, 30, 40, 50, 75, 100, 200, and 500 Hz interleaved by 3-s long silences. After that, we introduced 5-s long 'heartbeat' signals that consisted of repeated pairs of two signals with 100% and 20% of loudspeaker amplitude, respectively. The scenario PDF contains ten types of different events in total. The overall length of the PDF sequence was 95 s.
2) Periodic closing and opening of a rack door (RACK): this test represents events that occur in actual operation. It consisted of tapping on the FUT two times, followed by closing and opening the door of the rack in which the fibre was installed. The open-close event was performed five times during the acquisition interval, which was approximately 44 s long.
The experimental data were then visualized by time-frequency 2-D maps (known as spectrograms). For each time point (x-axis), the frequency response (y-axis) was calculated from the time domain data acquired over a short time interval using Fast Fourier Transform (FFT) [15].
The FUT, which is partially buried underground, primarily detects frequencies below 1 kHz. This is because higher frequencies are strongly attenuated by most materials, including soil [16]. We have found that most FUT detection occurs for frequencies below 200 Hz.
Due to the fixed polarimeter sampling frequency, the resolution of the time-frequency map is limited. Better frequency resolution (larger FFT window) reduces time resolution, and vice-versa. Time resolution that is required for data acquisition depends on the duration of the disturbance events we are monitoring. In our experiment, we chose the FFT window size of 12,288 samples, which led to a relatively low time resolution of ∼0.625 s (1 column of the spectrogram equals 0.625 s), which, however, was sufficient to distinguish between two consecutive events. This choice provides excellent frequency resolution, which should allow for effective distinction between different FUT disturbances, which is the key interest in our study.
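The trade-off described above can be illustrated with a minimal NumPy sketch. Only the sampling frequency (20 kHz) and the FFT window size (12,288 samples) come from the text; the non-overlapping windows, the Hann window, the function name `spectrogram`, and the synthetic 50 Hz test signal are assumptions made for illustration and are not the authors' processing code. With these two numbers, one window spans 12,288 / 20,000 ≈ 0.61 s, close to the ∼0.625 s column width quoted above, and one frequency bin is about 1.63 Hz wide.

```python
import numpy as np

FS = 20_000        # polarimeter sampling frequency in Hz (from the text)
N_FFT = 12_288     # FFT window size in samples (from the text)


def spectrogram(signal, n_fft=N_FFT, fs=FS):
    """Magnitude spectrogram from non-overlapping FFT windows.

    Returns (freqs, times, S), where S has one column per time slot.
    """
    signal = np.asarray(signal, dtype=float)
    n_cols = len(signal) // n_fft                  # drop the incomplete tail
    frames = signal[: n_cols * n_fft].reshape(n_cols, n_fft)
    frames = frames * np.hanning(n_fft)            # assumed window function
    S = np.abs(np.fft.rfft(frames, axis=1)).T      # shape: (n_freqs, n_cols)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    times = (np.arange(n_cols) + 0.5) * n_fft / fs  # centre of each window
    return freqs, times, S


# Synthetic 10 s record: a 50 Hz tone with weak additive noise.
rng = np.random.default_rng(0)
t = np.arange(10 * FS) / FS
x = np.sin(2 * np.pi * 50.0 * t) + 0.01 * rng.standard_normal(t.size)

freqs, times, S = spectrogram(x)
time_resolution = N_FFT / FS      # seconds per spectrogram column
freq_resolution = FS / N_FFT      # Hz per spectrogram row
peak_hz = freqs[np.argmax(S[:, 0])]
```

Doubling `N_FFT` in this sketch halves `freq_resolution` and doubles `time_resolution`, which is the compromise the paragraph above describes.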

III. DATA PROCESSING
In this section, we start with a brief description of the critical components of the K-means algorithm. Next, the data processing pipeline will be explained.

A. K-MEANS ALGORITHM
The K-means method allows division of measured data (events) into K clusters, with each cluster containing experimental observations that are deemed to correspond to the same event (in our case, fibre disturbance). Finding parameter K is not straightforward and is generally performed experimentally or with the help of some metric measuring the K-means performance. We discuss our approach for K estimation later.
The K-means algorithm works iteratively: see Fig. 2. In the initialization step, we place one measured data sample into each cluster. In the first iterative step, K-means sorts the rest of the measured data events into the K clusters based on their similarity, which is evaluated as the squared Euclidean distance between the measured data event and the cluster's centroid (mean); each event is assigned to the cluster with the nearest centroid. In the second iterative step, K-means calculates new centroids of each cluster. Consequently, the centroid represents all measured data events that are in the cluster. The iterative process continues until there are no more re-assignments of measured data events to the clusters between the iterative steps. This process assigns events of a similar nature to the same cluster. In practice, we utilized the K-means implementation from Python's scikit-learn package.
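The iteration described above can be written down compactly. The following NumPy sketch is for illustration only and is not the scikit-learn implementation used in this work; the function name `kmeans`, the random choice of the initial samples, the handling of empty clusters, and the toy vectors (standing in for spectrogram columns) are all assumptions.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: one randomly chosen sample seeds each cluster.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                      # no re-assignments: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):           # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Toy input vectors: two well-separated groups of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centroids = kmeans(X, k=2)
```

The cluster numbers returned by such a procedure are arbitrary: which group is called cluster 0 depends on the initialization, so only the grouping itself carries meaning.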

B. DATA PROCESSING PIPELINE
The data processing pipeline schematic is depicted in Fig. 3. It starts with an acquired signal, which is subsequently transformed into a spectrogram using FFT. Firstly, we normalize the values of the spectral content to be within [0, 1] throughout the entire spectrogram. We then describe the normalized spectrogram by a matrix S, in which the spectral content of the i-th measurement sample is in the i-th column of S, denoted as S_i. Subsequently, we take all columns S_i as input vectors for the K-means algorithm. The initial K-means clusterization result showed that most of the measured events (in the RACK scenario) occurred within a single time slot of 0.625 s (corresponding to a single column in S). However, a few measured events were longer (in particular, from the PDF test), spreading over several columns in the spectrogram. To account for them as a single measurement event when processing them with K-means, we pre-processed them with the unwrap method.
FIGURE 3. The overview of the data processing pipeline. The acquired signal is first transformed using FFT from time to frequency domain. Next, the whole spectrogram is normalized, and the columns are used as input vectors for the K-means algorithm.
This method takes u consecutive data lines and unwraps them side by side into a single line. Applying unwrap to the transposed spectrogram S^T yields a new matrix, which we transpose back. Subsequently, the input vector is defined as V_i = (unwrap(S^T))^T_i. For example, V_0 = [S_0, ..., S_4] for u = 5. We used unwrap for data events during the PDF test. From the experimental data, we estimated that each event lasted for approximately five samples (5 × 0.625 s), which is the value of u we used for unwrap.
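The normalization and the unwrap step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the text gives only V_0 = [S_0, ..., S_4], so whether consecutive input vectors overlap is not specified, and the `step` parameter below (1 for a sliding window, u for disjoint blocks) is a hypothetical addition. The function names `normalize` and `unwrap` and the tiny example matrix are likewise illustrative.

```python
import numpy as np


def normalize(S):
    """Scale the whole spectrogram to [0, 1] using its global min and max."""
    S = np.asarray(S, dtype=float)
    span = S.max() - S.min()
    return (S - S.min()) / span if span > 0 else np.zeros_like(S)


def unwrap(S, u, step=1):
    """Concatenate u consecutive columns of S into one input vector.

    Column i of the result is [S_i, S_{i+1}, ..., S_{i+u-1}] laid end to end.
    step=1 gives a sliding window (an assumption); step=u gives disjoint blocks.
    """
    n_freqs, n_cols = S.shape
    starts = range(0, n_cols - u + 1, step)
    # Transposing first makes each time slot a row, so flattening a block of
    # u rows places the columns S_i ... S_{i+u-1} one after another.
    return np.stack([S.T[i:i + u].reshape(-1) for i in starts], axis=1)


# Tiny example: 2 frequency bins, 6 time slots.
S = normalize(np.arange(12, dtype=float).reshape(2, 6))
V = unwrap(S, u=5)     # each column of V is 5 spectrogram columns long
```

With u = 5 each input vector is five times longer than a single spectrogram column, so the clustering compares whole events of about 3 s rather than individual time slots.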
As we have mentioned earlier, establishing how many clusters K should be used for the K-means method is nontrivial. Sometimes, this is known, e.g., when we know how many types of events we expect. However, when not known, we can, for example, use learning algorithms applied to the measured data or data extracted from them ('meta-data'). This approach is commonly referred to as 'meta-learning'. Meta-learning requires a way to evaluate the K-means clustering performance, i.e., to evaluate how similar events within the same cluster are, and how different they are in different clusters. This is done either by a human or by using a metric (or an index) that quantifies the clustering performance. There are many indices to perform this task, e.g., Silhouette index [17], Davies-Bouldin [18], Calinski-Harabasz [19], Dunn index [20], and R-squared index [21]. Establishing which one is the most applicable to our data is beyond the scope of this paper. Later, we will discuss how to assess their suitability using an example of the Silhouette index.

IV. RESULTS
Table 1 summarizes the results for both use-cases, RACK and PDF. In the RACK scenario with K = 5, the success rate of event identification is 100%. Still, homogeneity (each type of event has only one assigned cluster, and the events are assigned to the correct cluster) is achieved for only 6 of 7 events. Specifically, the 2nd knock is misplaced into cluster number 2 instead of the same cluster as the 1st knock (cluster number 1). In the PDF use-case (for K = 8), the best event identification achieved 50% success, and cluster homogeneity was achieved for 4 of 10 events.

Figures 4 and 5 show normalized spectrograms of RACK and PDF tests, respectively. They also show the K-means clusterization results when setting K to selected values of K = 2, 3, ..., 8 for the RACK test and K = 2, ..., 12 for the PDF test, respectively.
In the RACK test, Fig. 4, we expect two knocks on the patchcord followed by five hard closings of the rack doors. Thus, we expect K = 3: (1) background, (2) knocking, and (3) rack door closing. However, for K = 3, K-means assigned two clusters (1 and 2) to the two knocks. The rack door closing was then mistakenly identified as background (cluster 0). It is worth noting that this discrepancy between the algorithm output and the expected result is not due to an error in the K-means algorithm. It only means that the signals of the 1st and the 2nd knocks exhibit greater difference than background versus rack closing. The lowest K value that allowed the identification of all expected events (background, knocking, and rack closing) is K = 5: five door closing events (cluster 0), knocking (clusters 1 and 2), and background (cluster 3). However, cluster 4, which follows the 2nd knock, is erroneously not evaluated as background (cluster 3).
In the PDF test, Fig. 5, we can see that the polarimeter is capable of capturing the frequency of the disturbance (as we increased it from 10 through 20, 30, 40, 50, 75, 100, 200, and 500 Hz), including its harmonics. Thus, we expect that K-means should recognize all of these events. Since we expect nine different frequencies, heartbeat knocking, and background, K should be 11.
For all K values from 2 to 10, K-means did not distinguish between background and footprints of 10, 20, and 30 Hz frequencies. For K = 11 and K = 12, there is evidence of excessive granularity in 'silence' segments, and only 40, 50, and 75 Hz frequencies were identified in their full length. The results for K = 9 and K = 10 are very similar. The results for K = 10 identified one more cluster (number 9 for 500 Hz frequency) than the results for K = 9, where there are also traces of excessive granularity (short intervals with cluster number 3).
Here, we decided to select K = 8 as a trade-off between the number (5) of correctly identified clusters for frequencies and clear and homogeneous identification of background. The footprints of 40, 50, 75, and 200 Hz (cluster numbers 6, 7, 5, and 4) were correctly identified. The result for K = 7 is similar to the results for K = 8 (perhaps slightly better in the case of the 200 Hz footprint), but it failed in the identification of 'heartbeat'. In our opinion, it is better to assign 'heartbeat' into the same cluster as 40 Hz rather than into the cluster of background (which is what happened for K = 7). For K = 8, in Fig. 5, we see separated clusters of blocks corresponding to frequencies (40, 50, 75, and 200 Hz), as well as 'heartbeat' knocking. Unfortunately, lower frequencies of 10, 20, and 30 Hz were not identified as separate clusters. However, we obtained good consistency in background noise identification for one type of footprint (cluster 0).

V. DISCUSSION
In Section IV, we have found that K = 5 captured all of the events in the RACK test, although we expected to need only K = 3. As we have mentioned earlier, the task of finding the optimum K, which we performed manually, should be performed automatically in the future with the help of metrics or an index. We mentioned that there is a large number of such metrics or indices. However, as we show below with the example of the Silhouette index, a careful approach must be adopted when using these metrics or indices.

A. SILHOUETTE INDEX
The Silhouette index ranges from −1 to +1, where a high value indicates that events are well matched to their cluster and poorly matched to neighbouring clusters. Table 2 shows Silhouette index values for both experiments: RACK and PDF. For RACK, the highest Silhouette index is achieved for K = 2. As shown in Fig. 4, with K = 2, K-means recognized only one of the two events: it recognized all knocks, but no closings of the rack. Thus, a metric or index other than Silhouette must be used to recognize both events of interest.
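For reference, the Silhouette value of a sample is s = (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the index is the mean of s over all samples. The NumPy sketch below follows this standard definition for illustration. The function name `silhouette_index` and the toy data are assumptions, at least two clusters are required, and scikit-learn offers an equivalent `silhouette_score` function in `sklearn.metrics`.

```python
import numpy as np


def silhouette_index(X, labels):
    """Mean silhouette value over all samples (rows of X)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all samples.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue                     # singleton cluster: s = 0 by convention
        # Mean distance to the other members of the same cluster
        # (dist[i, i] is zero, so dividing by n_own - 1 excludes the sample itself).
        a = dist[i, own].sum() / (n_own - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()


# Toy data: two compact groups, scored with a sensible and a poor labelling.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
good = silhouette_index(X, [0, 0, 0, 0, 1, 1, 1, 1])
poor = silhouette_index(X, [0, 1, 0, 1, 0, 1, 0, 1])
```

To choose K with such an index, one would evaluate it on the K-means labelling obtained for each candidate K and inspect the maxima, global and local, of the resulting curve.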
For the PDF test, we expected 11 clusters (9 frequencies, background, and 'heartbeat'), but similarly to the RACK test, the Silhouette index is the highest for K = 2. As shown in Table 2, there is a local maximum of the Silhouette index at K = 10, which is slightly lower than the expected K = 11. We speculate that the footprint of the 10 Hz signal is too similar to the 'background'. Analyzing local maxima of the Silhouette index gives a K value close to what we would expect, showing the usefulness of this index for the PDF test.

B. FUTURE WORK
Once the properties of clusters are established, we would assign labels to them (e.g., knowing that cluster number 3 corresponds to events describing opening a rack with fibres, we could label this cluster as 'Rack opening'). The output pairs (input vector-label) will next be used as input data for training of machine learning classification algorithms like Support Vector Machines, Artificial Neural Networks or K-Nearest Neighbours (which would later place unseen data events into different classes).
Following that, we could provide automated identification of measured events with a classification algorithm. For practical application of classifier training, we will need at least hundreds to thousands of labelled data samples. It can be expected that the footprints of a given event will not be entirely identical because of differences in the background noise, environment and conditions at the specific time of occurrence. Thus, K-means requires numerous input vectors (repeated measurements of the same event) for a cluster mean to converge to a stable and reliable value. In practice, the selected optical link will be measured for a reasonably long period, and K-means will process the acquired data.
Our initial experiments shown here provided us with data corresponding to limited types of events. The main advantage of the machine learning approach is that a trained classifier is capable of rapidly evaluating new data and assigning them into categories learned from training data, enabling real-world events to be accommodated even with the in-lab developed system.
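The labelling and classification steps outlined in this subsection could look as follows. This is a speculative sketch of future work, not a reported result: it uses K-Nearest Neighbours (one of the classifiers named above) written in plain NumPy, and the function name `knn_predict`, the cluster-to-name mapping, and all data values are hypothetical.

```python
import numpy as np


def knn_predict(train_X, train_y, query_X, k=3):
    """Label each query vector by majority vote among its k nearest training vectors."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    query_X = np.asarray(query_X, dtype=float)
    # Squared Euclidean distance from every query to every training vector.
    d2 = ((query_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]          # indices of the k closest
    predictions = []
    for row in train_y[nearest]:
        values, counts = np.unique(row, return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)


# Hypothetical mapping from cluster number to a human-assigned event name.
cluster_names = {0: "background", 1: "rack opening"}

# Toy training set: input vectors with the cluster numbers K-means gave them.
train_X = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [0.9, 1.0], [1.0, 0.9], [1.0, 1.0]]
train_clusters = [0, 0, 0, 1, 1, 1]
train_y = [cluster_names[c] for c in train_clusters]

# Newly measured, previously unseen input vectors.
new_events = [[0.05, 0.05], [0.95, 0.9]]
predicted = knn_predict(train_X, train_y, new_events)
```

In this arrangement the clustering is run once on a long recording to produce labels, while the classifier handles each new measurement on its own, which is what makes rapid evaluation of new data possible.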

VI. CONCLUSION
Our approach demonstrates the suitability of unsupervised machine learning methods like K-means to sort measured data with event footprints into a defined number of categories (clusters). Subsequently, these categories can be used as target labels of data instances for training a classification model.
The key task is to find the optimum number of categories, especially when adopting methods of machine learning. We demonstrate this using the Silhouette index, which requires careful analysis to allow for estimation of this number.
To make this practical, it will be necessary to carry out further steps, including the evaluation of events with respect to the knowledge of the original event occurrences or with learned classifiers.