Optical High-Speed Rolling Mark Detection Using Object Detection and Levenshtein Distance

Krammer, Manuel; Pröll, Markus; Bürger, Martin; Zauner, Gerald

doi:10.3390/app13158678


¹ Plasser & Theurer, Export von Bahnbaumaschinen, Gesellschaft m.b.H., Emerging Technologies, Pummererstraße 8, 4021 Linz, Austria

² School of Engineering, University of Applied Sciences Upper Austria, Stelzhamerstraße 23, 4600 Wels, Austria

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(15), 8678; https://doi.org/10.3390/app13158678

Submission received: 5 June 2023 / Revised: 7 July 2023 / Accepted: 19 July 2023 / Published: 27 July 2023

(This article belongs to the Special Issue Sustainable Railway Infrastructures: Health Monitoring, Assessment and Maintenance)


Featured Application

Railroad Infrastructure Detection.

Abstract

This paper presents an automated high-speed rolling mark recognition system for railroad rails utilizing image processing techniques. Rolling marks, which consist of numbers, letters, and special characters, were engraved into the rail web as 3D information. These rolling marks provide crucial details regarding the rail manufacturer, steel quality, year of production, and rail profile. As a result, they empower rail infrastructure managers to gain valuable insights into their infrastructure. The rolling marks were captured using a standard color camera under dark field illumination. The recognition of individual numbers, letters, and special characters was achieved through state-of-the-art deep neural network object detection, specifically employing the YOLO architecture. By leveraging reference rolling marks, the detected characters can then be accurately interpreted and corrected. This correction process involves calculating a weighted Levenshtein distance, ensuring that the system can identify and rectify partially misidentified rolling marks. Through the proposed system, the accurate and reliable identification of rolling marks was achieved, even in cases in which there were partial errors in the detection process. This novel system thus has the potential to substantially improve the management and maintenance of railroad infrastructure.

Keywords:

railway infrastructure; rolling marks; computer vision; object detection; YOLO—you only look once; Levenshtein distance

1. Introduction

In the European Union, the extensive railway network spans approximately 216,000 km [1]. To ensure proper availability, ongoing maintenance and servicing are essential. Currently, manual track inspections form a significant part of these maintenance activities. For instance, in the “Deutsche Bahn” network (approximately 38,400 km), track inspections are scheduled at least every two years. In addition to manual inspections, there are also advanced maintenance and servicing systems equipped with specialized sensors. These systems employ sensors mounted on traction units to detect and identify railway-related objects within the track network.

These comprehensive systems provide technicians and managers with a holistic overview of track conditions and are instrumental in planning the necessary actions and interventions. They combine various sensory inputs to offer valuable insights into track conditions, aiding decision-making. The majority of railway tracks installed worldwide consist of Vignole rails, whose cross-section can be divided into three parts (the rail head, rail web, and rail foot), as depicted in Figure 1.

Typically, the rolling mark is positioned on the rail web at consistent intervals along the rail line, as illustrated in Figure 2. This mark serves as an identifier and contains essential information pertaining to the rail. It comprises a combination of letters, numbers, and special characters, often including horizontal lines as well. These characters collectively convey specific details about the rail, allowing for identification and data interpretation.

In this paper, we introduce a novel camera-based system designed for automatically extracting information from rolling marks. The paper is organized as follows.

Section 2 provides an in-depth look into rolling marks. Section 3 describes the method for detecting rolling marks, including the physical principle of measurement and the basic principles of the evaluation software. In Section 4, we demonstrate an exemplary evaluation of the presented method. We use test data to highlight the strengths and weaknesses of the method. Section 5 summarizes the paper, and Section 6 provides an outlook on potential further applications of the system.

2. Rolling Marks

For Vignole rails of contemporary design weighing 46 kg/m or above, the DIN EN 13674-1 standard provides comprehensive guidelines for both conventional and high-speed railway lines. Within this standard, specific requirements are outlined for the textual composition of rolling marks, as depicted in Figure 2. These specifications encompass the following aspects:

Rolling mill;
Steel grade;
Rolling year;
Rail profile.

Each rolling mark is rolled into the rail web at intervals no greater than 4 m, and the marks are raised by 0.6 mm to 1.3 mm above the surrounding surface. The characters of the rolling mark have a consistent height between 20 mm and 25 mm. These specifications have been in effect since November 2002, serving as a standardized framework for rolling marks. Figure 2 shows a representative rolling mark that follows this structure. In this specific example, the rolling mark encodes the following information:

The rail was rolled in the Donawitz rolling mill (Donawitz Steelworks is a steel mill located in Donawitz near Leoben in Styria/Austria; it is particularly well known for the first application of the Linz-Donawitz process (LD) for steel production).
The quality of the steel corresponds to grade R 350 HT.
The rolling year was 2012.
It was a rail of profile 54E2.

Rails manufactured prior to 2002 often deviate from the standard convention described earlier, particularly with regard to the placement of the steel grade indication at the right end of the rolling mark. Additionally, these rolling marks typically include the specification of the month of production and utilize an older profile designation, such as the usage of “UIC 54E” instead of the “54E2” designation.

The accurate and automated localization of these specific markings holds significant importance for rail infrastructure operators, as it allows for targeted insights into the rail infrastructure. This capability facilitates the effective planning of maintenance actions according to the age of the rails. Furthermore, it contributes to enhanced safety by enabling the identification of older rails that require replacement in a precise and targeted manner.

3. Camera-Based Rolling Marks Detection

A rolling mark can be considered a “surface anomaly” due to its distinctive elevation on the rail. To enhance the visibility of these marks, dark field illumination is employed in image processing techniques. In this work, the illumination setup was similar to that depicted in Figure 3. All objects were located within the structure gauge specified in the railroad area. The emitted light was directed onto the rail surface but not directly back into the camera. Only in the vicinity of a raised surface, such as a rolling mark, does some of the light reflect back to the camera, resulting in brighter contours of the mark in the captured image (as shown in Figure 4).

Furthermore, the illumination system differentiated between red and green colors, exploiting the photometric stereo effect. This technique takes advantage of the varying angles of light incidence to create shading variations, which can help to accentuate surface features. A similar principle was utilized in [2] to detect defects on the rail surface. Building upon this principle, the present work employed it to further enhance the visibility of rolling marks. In addition, an optical bandpass filter, tuned to the wavelength of the illumination, was used to reduce ambient influences, such as sunlight.

To extract information from a rolling mark, the process involved two main steps: detection (localization) of the mark in the image and subsequent interpretation. Conventional computer vision approaches often combine a neural network for object detection, such as You Only Look Once (YOLO) [3], with additional text recognition software (OCR, Optical Character Recognition, e.g., [4,5]). YOLO is a widely used object detector that has been intensively characterized in numerous computer vision benchmarks [6], offering high frame rates and a relatively user-friendly software framework. This work adopted a different approach.

In this work, each individual character of the rolling mark string was detected separately using the deep learning-based YOLO object detector. This approach allows for more precise localization of each character within the mark. Subsequently, the identified characters were passed to a parser that utilized the Levenshtein distance metric [7] for the interpretation and analysis of the rolling mark. By treating each character as an individual object of interest, the proposed approach provides more flexibility and accuracy in capturing the intricate details of the rolling mark. This method avoids the need for additional preprocessing steps, such as converting the region of interest to a binary image, which are commonly employed in traditional object detection and OCR techniques. The parser, leveraging the Levenshtein distance metric (explained in detail in Section 3.2), compared the detected characters with reference rolling marks and performed corrective actions according to the calculated distance. This enabled the system to handle partially incorrectly recognized marks and ensure the accurate interpretation of the rolling mark content. This novel combination of an object detection technique and the power of the Levenshtein distance-based parser provides a robust and efficient solution for rolling mark detection and interpretation.
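The hand-over from detector to parser can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Detection` fields, the left-to-right ordering by box centre, and the `min_confidence` threshold are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # predicted character class
    x_center: float    # horizontal centre of the bounding box in pixels
    confidence: float  # detector confidence in [0, 1]


def detections_to_string(detections, min_confidence=0.25):
    """Order single-character detections left to right and join them.

    The threshold is a hypothetical choice; the text instead feeds the
    confidence into the correction costs.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


# Made-up detections for the profile part of a mark, in arbitrary order.
dets = [
    Detection("4", 410.0, 0.88),
    Detection("D", 35.0, 0.97),
    Detection("5", 380.0, 0.91),
    Detection("O", 62.0, 0.95),
    Detection("E", 440.0, 0.90),
    Detection("2", 470.0, 0.42),
    Detection("1", 300.0, 0.10),  # low-confidence detection, dropped
]
print(detections_to_string(dets))
```

The resulting string is what a Levenshtein-based parser would compare against the reference rolling marks.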

3.1. Parser Based on a Weighted Levenshtein Distance

The basic idea is to create reference character strings that are generally possible in a rolling mark. Then, the corresponding Levenshtein distance (a distance metric between two strings) is calculated between a reference and a detected rolling mark in the camera image. Finally, a range search algorithm [8] is used to determine the correct sequence of rolling mark characters. For each calculation of the Levenshtein distance, the backtrace between the two strings under consideration is also determined. This backtrace then yields the “path” between two words. The associated path corresponds to the sequence of necessary change steps to move from the detected word to the reference. From this, the characters that are incorrectly recognized can be deduced. Thus, the corrected rolling mark is compiled from a sequence of instructions consisting of documented insertion, deletion, and replacement processes of characters, which will be described in detail in the next section.

3.2. Levenshtein Distance

The simple Levenshtein distance (edit distance) between two strings $s$ and $t$ is defined as the cost of necessary operations to transfer $s$ to $t$ [9]. It can be calculated as $\delta_{lev}(s, t) = L_{|s|, |t|}$, where the following recursion equation holds for $L$:

$$
L_{0,0} = 0, \qquad
L_{i,j} = \min
\begin{cases}
L_{i-1,j-1} + 0, & \text{keep} \\
L_{i-1,j-1} + 1, & \text{exchange} \\
L_{i-1,j} + 1, & \text{insert} \\
L_{i,j-1} + 1, & \text{delete}
\end{cases}
\tag{1}
$$
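As a minimal sketch, the recursion can be filled in row by row. Two details are assumptions not spelled out in the equation: the border values $L_{i,0} = i$ and $L_{0,j} = j$ (the usual initialisation), and that the zero-cost "keep" step applies only when the two characters are equal.

```python
def levenshtein(s: str, t: str) -> int:
    """Unit-cost edit distance between s and t via dynamic programming."""
    rows, cols = len(s) + 1, len(t) + 1
    L = [[0] * cols for _ in range(rows)]
    # Border: turning a prefix into the empty string (or vice versa).
    for i in range(1, rows):
        L[i][0] = i
    for j in range(1, cols):
        L[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            # Diagonal step: keep (cost 0) if equal, exchange (cost 1) otherwise.
            diagonal = L[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else 1)
            L[i][j] = min(diagonal, L[i - 1][j] + 1, L[i][j - 1] + 1)
    return L[rows - 1][cols - 1]


print(levenshtein("ITALIEN", "ALLEIN"))
```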

3.3. Weighted Levenshtein Distance

The Levenshtein distance, a useful metric for measuring the similarity between two strings, assumes equal costs for insertion, deletion, and substitution operations. However, in the case of correcting rolling marks, it is often necessary to assign different costs to these operations on the basis of their specific impact on the accuracy of the interpretation.

To address this, the Levenshtein distance can be extended with variable costs by defining functions to calculate the costs for exchanging, inserting, and deleting characters. For example, the function $d_{exchange}(s[i], t[j])$ can be used to determine the costs incurred when swapping the characters $s[i]$ and $t[j]$. Similarly, the functions $d_{insert}(t[j])$ and $d_{delete}(s[j])$ can be defined to calculate the costs for inserting character $t[j]$ or deleting character $s[j]$, respectively.

These cost functions can be implemented in various ways. One approach is to use a static table that assigns predetermined costs for each operation. Alternatively, the costs can be adjusted dynamically during runtime on the basis of the properties of the rolling mark data, such as the significance of certain characters or the likelihood of specific errors.

$$
L_{0,0} = 0, \qquad
L_{i,j} = \min
\begin{cases}
L_{i-1,j-1} + 0, & \text{keep} \\
L_{i-1,j-1} + d_{exchange}(s[i], t[j]), & \text{exchange} \\
L_{i-1,j} + d_{insert}(t[j]), & \text{insert} \\
L_{i,j-1} + d_{delete}(s[j]), & \text{delete}
\end{cases}
\tag{2}
$$
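A static cost table, one of the two options mentioned above, might look like the following sketch. The table values are invented for illustration. The code also uses the common convention in which consuming a character of `s` counts as a deletion and consuming a character of `t` as an insertion, which may differ from the index labelling in Equation (2).

```python
# Hypothetical static cost table; values are illustrative, not from the paper.
EXCHANGE_COSTS = {("0", "O"): 0.3, ("O", "0"): 0.3}
DEFAULT_EXCHANGE = 1.5
INSERT_COST = 1.0
DELETE_COST = 1.0


def d_exchange(a: str, b: str) -> float:
    return EXCHANGE_COSTS.get((a, b), DEFAULT_EXCHANGE)


def d_insert(c: str) -> float:
    return INSERT_COST


def d_delete(c: str) -> float:
    return DELETE_COST


def weighted_levenshtein(s: str, t: str) -> float:
    """Edit distance from s to t with per-character operation costs."""
    n, m = len(s), len(t)
    L = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        L[i][0] = L[i - 1][0] + d_delete(s[i - 1])
    for j in range(1, m + 1):
        L[0][j] = L[0][j - 1] + d_insert(t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = s[i - 1], t[j - 1]
            diagonal = L[i - 1][j - 1] + (0.0 if a == b else d_exchange(a, b))
            L[i][j] = min(diagonal,
                          L[i - 1][j] + d_delete(a),
                          L[i][j - 1] + d_insert(b))
    return L[n][m]


# A zero misread as the letter O is cheap to correct under this table.
print(weighted_levenshtein("D0", "DO"))
```

Replacing the three cost functions with ones that consult run-time information, such as detector confidence, would give the dynamic variant.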

Table 1 shows the results obtained for the two German words $s$ = “ITALIEN” and $t$ = “ALLEIN” at costs:

$$
d_{insert}(t[j]) = 1, \qquad d_{delete}(s[j]) = 0.9, \qquad d_{exchange}(s[i], t[j]) = 0.9 + \alpha
\tag{3}
$$

with

$$
\alpha =
\begin{cases}
0.5, & s[i] \in \{\text{“I”}, \text{“T”}, \text{“L”}\},\ t[j] \in \{\text{“I”}, \text{“T”}, \text{“L”}\},\ s[i] \neq t[j] \\
1, & \text{other.}
\end{cases}
$$

Starting with the Levenshtein algorithm, the cost matrix $L$ is initialized with $L_{0,0} = 0$, and the matrix is gradually filled in for each position $(i, j)$ according to the comparison of characters from the input strings $s$ and $t$. The cost values increase as the algorithm progresses, ultimately resulting in the value at $L_{|s|, |t|}$, which represents the total cost or distance between the two strings.

In the case of the weighted Levenshtein distance $\delta_{lev}^{i}(s, t) = L_{|s|, |t|}$, the value at $L_{|s|, |t|}$ is 4.2 for the costs in Equation (3): two deletions (0.9 each), one exchange between “I” and “L” (0.9 + 0.5 = 1.4), and one insertion (1). The weights associated with each operation (insertion, deletion, and exchange) contribute to this calculated distance. By considering these weights, the algorithm assigns different costs to each operation, reflecting the specific requirements of the problem at hand.

Additionally, the path or backtrace between the two strings can be determined on the basis of the entries in the calculated Levenshtein matrix. This backtrace represents the sequence of correction steps needed to transform one string into the other, typically referred to as the edit operations. If we view the entries in the cost matrix as an “elevation profile”, with higher values representing higher elevations, the path along the valley in this profile corresponds to the optimal sequence of correction steps. Traversing this path in the matrix provides the corrected order of operations between the two strings. Read from the end of the strings toward the beginning, the backtrace is:

Keep “N”;
Insert “I”;
Keep “E”;
Exchange “I” with “L”;
Keep “L”;
Keep “A”;
Delete “T”;
Delete “I”.
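The example can be reproduced with a short sketch that fills the cost matrix with the costs from Equation (3) and then walks back from the last cell. The tie-breaking order (diagonal step first, then deletion, then insertion) and the convention that consuming a character of `s` is a deletion are assumptions of this sketch.

```python
ITL = {"I", "T", "L"}


def d_insert(c):
    return 1.0


def d_delete(c):
    return 0.9


def d_exchange(a, b):
    alpha = 0.5 if (a in ITL and b in ITL and a != b) else 1.0
    return 0.9 + alpha


def align(s, t, eps=1e-9):
    """Return (distance, operations); operations run from string end to start."""
    n, m = len(s), len(t)
    L = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        L[i][0] = L[i - 1][0] + d_delete(s[i - 1])
    for j in range(1, m + 1):
        L[0][j] = L[0][j - 1] + d_insert(t[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = s[i - 1], t[j - 1]
            diagonal = L[i - 1][j - 1] + (0.0 if a == b else d_exchange(a, b))
            L[i][j] = min(diagonal, L[i - 1][j] + d_delete(a), L[i][j - 1] + d_insert(b))

    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            a, b = s[i - 1], t[j - 1]
            step = 0.0 if a == b else d_exchange(a, b)
            if abs(L[i][j] - (L[i - 1][j - 1] + step)) < eps:
                ops.append(f"keep {a}" if a == b else f"exchange {a} with {b}")
                i, j = i - 1, j - 1
                continue
        if i > 0 and abs(L[i][j] - (L[i - 1][j] + d_delete(s[i - 1]))) < eps:
            ops.append(f"delete {s[i - 1]}")
            i -= 1
            continue
        ops.append(f"insert {t[j - 1]}")
        j -= 1
    return L[n][m], ops


distance, operations = align("ITALIEN", "ALLEIN")
print(round(distance, 2))
for op in operations:
    print(op)
```

Because the walk starts in the last matrix cell, the operations come out in the same end-to-start order as the list above.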

3.4. Creating and Parsing with Reference Rolling Marks

By combining all known manufacturer designations, special characters for steel quality and manufacturing process, profile designations, year numbers, and months, it would be possible to generate an extensive list of reference rolling marks. However, due to the sheer number of elements involved, this list would be exceedingly large. For instance, a rolling mark conforming to the current standard would appear 100 times in the list, each with a different year. Hence, it becomes logical to generate a reference rolling mark that incorporates a year placeholder. This approach is justified by the fact that the manufacturer designation remains constant across rails from different years, while the manufacturing date varies. Additionally, the type designation itself is limited to a finite number of variants. For example, a rolling mark of the current standard would be “DO [steel grade symbol] 12 54E2” (see Figure 2). The reference rolling mark is formed using the structure “Producer _ Quality _ Y-T Y-O _ Type”, where Y-T is a placeholder for the tens digit of the year and Y-O is a placeholder for the ones digit.
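One way to picture such a reference is as a list of tokens in which the two year positions accept any digit. The sketch below is illustrative only: the token names, the use of a single `"R350HT"` token for the steel grade symbol, and the strict position-by-position comparison are assumptions, not the parser described in the paper.

```python
import string

YEAR_TENS, YEAR_ONES, GAP = "Y-T", "Y-O", "_"


def build_reference(producer, quality, profile):
    """Return the token list 'Producer _ Quality _ Y-T Y-O _ Type'."""
    return [*producer, GAP, *quality, GAP, YEAR_TENS, YEAR_ONES, GAP, *profile]


def token_matches(ref_token, detected):
    """Year placeholders match any digit; all other tokens must be equal."""
    if ref_token in (YEAR_TENS, YEAR_ONES):
        return detected in string.digits and len(detected) == 1
    return ref_token == detected


def read_year(reference, detected):
    """Return the two year digits if the detection fits the reference exactly."""
    if len(reference) != len(detected):
        return None
    if not all(token_matches(r, d) for r, d in zip(reference, detected)):
        return None
    return detected[reference.index(YEAR_TENS)] + detected[reference.index(YEAR_ONES)]


# "R350HT" stands in for the special steel grade symbol of Figure 2.
reference = build_reference("DO", ["R350HT"], "54E2")
detected = ["D", "O", "_", "R350HT", "_", "1", "2", "_", "5", "4", "E", "2"]
print(read_year(reference, detected))
```

A single template of this kind replaces the 100 year-specific entries mentioned above.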

3.5. Parsing with the Aid of Reference Rolling Marks

For each known textual structure, a comprehensive set of reference rolling marks is generated by considering all conceivable combinations of manufacturers, steel qualities, months, and type designations. Placeholder variables are included specifically for the years, as well as for the spaces between each distinct specification. Employing an efficient calculation method, such as tries [8], the weighted Levenshtein distance between the identified characters and the references is computed.
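To illustrate why a trie helps, the sketch below stores the references in a trie and computes one row of the distance matrix per trie node, so that references sharing a prefix share the work and hopeless branches are pruned. It is a simplification: unit costs are used instead of the weighted costs, and the reference strings are invented for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set on nodes that end a reference string


def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.word = word


def range_search(root, query, max_dist):
    """Return all (reference, distance) pairs with distance <= max_dist."""
    first_row = list(range(len(query) + 1))
    results = []
    for ch, child in root.children.items():
        _walk(child, ch, query, first_row, results, max_dist)
    return sorted(results, key=lambda r: r[1])


def _walk(node, ch, query, prev_row, results, max_dist):
    # One matrix row per trie character, computed from the parent's row.
    row = [prev_row[0] + 1]
    for col in range(1, len(query) + 1):
        row.append(min(row[col - 1] + 1,
                       prev_row[col] + 1,
                       prev_row[col - 1] + (query[col - 1] != ch)))
    if node.word is not None and row[-1] <= max_dist:
        results.append((node.word, row[-1]))
    # Prune: no extension of this prefix can get below min(row).
    if min(row) <= max_dist:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, query, row, results, max_dist)


root = TrieNode()
for ref in ["DO 54E2", "DO 60E1", "VA 49E1"]:  # made-up reference strings
    trie_insert(root, ref)

print(range_search(root, "D0 54E2", max_dist=2))
```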

The YOLO neural network employed for detecting the individual characters of a rolling mark can be tested in advance. For this purpose, data that the network has not yet seen are used. The object detection results are compared with the actual values and documented in a table known as the confusion matrix. This matrix shows how often the network identifies characters correctly (True Positive, TP), how often characters are confused with one another, how often characters remain undetected (False Negative, FN), and how often characters are detected although none is present (False Positive, FP). On the basis of the confusion matrix, costs tailored to the network can be assigned. Figure 5 depicts a confusion matrix consisting of 65 classes, encompassing 10 digits (0–9), 26 letters (A–Z), 26 special characters, a hyphen, a dot, a comma, and the background. Each entry in the matrix is normalized relative to the columns. The bluer a field appears, the closer its numerical value is to 1, while a white field corresponds to a numerical value of 0.

The costs for the weighted Levenshtein distance are chosen such that characters that are often confused, falsely detected, or not found at all according to the confusion matrix are cheaper to correct in the subsequent distance calculation. Therefore, the following considerations are made:

If a class is consistently detected in nearly every instance within the image (FN ≈ 0), the cost of an insert operation is consequently assigned a higher weight. Conversely, for a class that is infrequently detected (FN ≈ 1), the weight assigned to the insert operation is lower.

Likewise, the cost of deleting a character can be ascertained by examining the false-positive (FP) values in the confusion matrix. The occurrence of predicting a character class that does not actually exist is reflected in the FP values. If certain classes are more prone to such occurrences, the cost of deleting those characters is set to a lower value. Additionally, the confidence value of the predicted class c serves as a measure of “reliability”. For instance, if a character is partially obscured or unclear in the image, a lower confidence value is typically assigned.

In reality, there is a higher likelihood of confusion between the number “0” and the letter “O” compared with, for example, the letter “E”. The confusion matrix provides insights into which classes are more prone to such confusion (TP ≈ 0), as well as classes that are rarely confused (TP ≈ 1, FP ≈ 0). When there is a high probability of confusion, the cost associated with swapping two characters should be relatively low. It is worth noting that swapping the two characters $s[i]$ with $t[j]$ can also be interpreted as deleting $t[j]$ and inserting $s[i]$ in place of $t[j]$. Therefore, the cost must be quantified as higher than a single operation to account for this combined action.
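These considerations can be turned into cost functions in many ways. The sketch below shows one hypothetical mapping: the rates, the linear formulas, and the base value 1.1 for exchanges are all invented for illustration and are not the values used by the authors.

```python
# Hypothetical per-class rates read off a confusion matrix.
FN_RATE = {"0": 0.02, "O": 0.05, "E": 0.01, "-": 0.40}    # share of missed instances
FP_RATE = {"0": 0.01, "O": 0.03, "E": 0.01, "-": 0.25}    # share of spurious detections
CONFUSION = {("0", "O"): 0.30, ("O", "0"): 0.20}          # P(detected as a | truly b)


def d_insert(ref_char: str) -> float:
    """Cheap to insert characters the detector often misses (FN close to 1)."""
    return 1.0 - FN_RATE.get(ref_char, 0.0)


def d_delete(det_char: str, confidence: float = 1.0) -> float:
    """Cheap to delete classes prone to false positives or low-confidence hits."""
    return (1.0 - FP_RATE.get(det_char, 0.0)) * confidence


def d_exchange(det_char: str, ref_char: str) -> float:
    """Cheaper for frequently confused pairs, but above any single operation."""
    p_confuse = CONFUSION.get((det_char, ref_char), 0.0)
    return 1.1 + (1.0 - p_confuse)


print(d_insert("-"), d_insert("E"))
print(d_exchange("0", "O"), d_exchange("0", "E"))
```

With these definitions the insert and delete costs never exceed 1, so an exchange always costs more than either single operation, in line with the requirement stated above.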

To address the potential bias toward shorter strings, the calculated weighted Levenshtein distance is further normalized by dividing it by the sum of the deletion costs $L_{0, |t|}$. This normalization ensures that shorter strings do not dominate the distance calculation, thus enabling the inclusion of longer rolling marks in range searches. Consequently, a specific weighted Levenshtein distance measure $\hat{\delta}_{lev}^{i}(s, t) = \frac{L_{|s|, |t|}}{L_{0, |t|}}$ is employed to account for this normalization.
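The effect of the normalization can be seen in a small sketch. It assumes that $L_{0,|t|}$ equals the sum of the per-character border costs over $t$, which follows from filling the first matrix row; the constant cost of 0.9 and the example strings are illustrative.

```python
def normalized_distance(raw_distance, reference, border_cost):
    """Divide a raw weighted distance by the summed border costs of the reference."""
    denominator = sum(border_cost(c) for c in reference)
    if denominator == 0:
        raise ValueError("reference must have a positive total border cost")
    return raw_distance / denominator


def constant_cost(c):
    return 0.9  # illustrative per-character cost


# The same raw distance weighs less against a longer reference.
short_ref, long_ref = "54E2", "DO 12 54E2"
print(normalized_distance(2.0, short_ref, constant_cost))
print(normalized_distance(2.0, long_ref, constant_cost))
```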

4. Example Evaluation and Results

To conduct an illustrative evaluation, a series of test recordings were conducted on the Wels–Passau railroad line, specifically on the section between Wels (route kilometer 0) and Riedau (route kilometer 42.2). This section corresponds to route number ÖBB 205 01 and route number DB 5831. The evaluation employed a dataset comprising 11,800 images capturing rolling marks. These images were captured using a measuring vehicle provided by the company Plasser & Theurer. A new image was captured every 250 mm, depicting a track section of approximately 340 mm.

4.1. Object Detection Results (Single Rolling Mark Characters)

Table 2 provides a comprehensive overview of the distribution of individual classes, while Table 3 visually presents the special characters. The dataset was divided into three subsets: 80% for training data and 10% each for validation and test data. Special attention was given to ensuring that the test data included images from successive exposures, which are crucial for evaluating the subsequent analysis with the parser.

The training process involved an initial training phase, which consisted of 300 epochs with a batch size of 32 and an image size of 640 × 640 pixels. Subsequently, fine-tuning was conducted with half the batch size and a reduced learning rate until no further improvement in training performance could be observed. The training process made use of the hardware acceleration provided by 8 NVIDIA RTX A6000 GPUs.

Figure 6 shows the obtained results based on the common evaluation metrics used in the field of image classification and object recognition, namely the mAP (mean average precision), F1-score, precision, and recall [10]. The first column shows the precision averaged over all classes at a confidence value of 0.5. The second column shows the highest F1 score. Column three shows the maximum precision. This occurs at the confidence value given in the fourth column, and the fifth column indicates the recall at a confidence value of 0.
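For reference, precision, recall, and the F1 score follow directly from the TP, FP, and FN counts at a given confidence threshold. The counts in this sketch are made up and unrelated to the results in Figure 6.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard detection metrics from true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))
```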

4.2. Measurement Speed Considerations

The image acquisition should take place at a driving speed of at least 80 km/h so that no railroad line closure is required. The camera and lighting must be designed accordingly. With a selected distance of 25 cm between two image captures, a frame rate of

$$
\frac{80\ \mathrm{km/h}}{0.25\ \mathrm{m}} = \frac{22.2\ \mathrm{m/s}}{0.25\ \mathrm{m}} \approx 88.9\ \mathrm{s^{-1}}
$$

is required. This is possible with commercially available cameras and lighting. The bottleneck is the data stream, which has to be processed at a correspondingly high speed. In the example evaluation, about 30 GB/km were recorded. If the images are to be evaluated directly and only the detected rolling marks stored (without the original image data), the detection network needs to be correspondingly fast. With an image size of 640 × 640 pixels, the average processing time of the YOLO detector in our case was 8 ms per image (7.4 ms for inference + 0.6 ms for non-maximum suppression). This implies a maximum measurement speed of

$$
\frac{0.25\ \mathrm{m}}{0.008\ \mathrm{s}} = 31.25\ \mathrm{m/s} = 112.5\ \mathrm{km/h},
$$

which corresponds to roughly 8 rolling marks per second at the maximum mark spacing of 4 m.
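The arithmetic can be checked with a few lines. The function names are introduced here for illustration, and the 4 m mark spacing is the maximum interval quoted in Section 2.

```python
def required_frame_rate(speed_kmh: float, spacing_m: float) -> float:
    """Frames per second needed to capture one image every spacing_m metres."""
    return speed_kmh / 3.6 / spacing_m


def max_speed_kmh(spacing_m: float, processing_time_s: float) -> float:
    """Highest speed at which each image is processed before the next arrives."""
    return spacing_m / processing_time_s * 3.6


def marks_per_second(speed_kmh: float, mark_spacing_m: float = 4.0) -> float:
    """Rolling marks passed per second at the given mark spacing."""
    return speed_kmh / 3.6 / mark_spacing_m


print(round(required_frame_rate(80, 0.25), 1))   # 88.9
print(round(max_speed_kmh(0.25, 0.008), 1))      # 112.5
print(round(marks_per_second(112.5), 1))         # 7.8
```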

4.3. Results of the Parser and Rolling Mark Correction and Application Scenarios

Defining a metric to determine whether the input data (detections from the neural network) have been correctly corrected or parsed is not a trivial task. Various factors can complicate the process of “correct correction” or even render it impossible. For instance, if a significant portion of a rolling mark is not visible, as illustrated in Figure 9, achieving a precise correction becomes challenging. Furthermore, attempting to establish a specific criterion, such as “the maximum allowable number of incorrectly detected characters”, in order to deduce the correct rolling mark is impractical due to the diverse and similar nature of rolling marks. A universal answer cannot be provided.

Moreover, it is important to note that even without any correction, the input data already exhibit high quality with minimal errors. The need for correction primarily arises in exceptional cases. Consequently, the role of the parser is often pure parsing rather than extensive correction. In general, it has been observed that the necessity for correction diminishes as the input data quality improves.

In light of these considerations, this section aims to provide insights into the expected quality of the results by showcasing example images and highlighting scenarios in which erroneous outcomes may arise.

It is feasible to generate different representations of rolling marks automatically. In Figure 7, each distinct rolling mark is assigned a defined color. In Figure 8, the color selection is based on the steel grade. However, it is also possible to filter for other information contained within the rolling marks. For instance, one can filter for rails from a specific manufacturer with a designated year of manufacture.

By examining Figure 7 and Figure 8, certain assumptions can be made: Initially, there were only two rails in the depicted area, represented by the light blue and green lines. Subsequently, a track with a higher steel grade (yellow line) was inserted. However, this track required multiple corrections, as indicated by the red, blue, and purple lines. It is possible that this track belongs to a batch with quality issues and may need to be replaced.

The light blue and green lines correspond to relatively recent tracks manufactured in 2011 and 2015, respectively, while the yellow line represents a much older track produced in 1998. The rails used for track correction (red, blue, and purple lines) are also younger, originating from the years 2010, 2014, and 2019. It is advisable for a track infrastructure operator to pay close attention to the track from 1998 (yellow line) during the next maintenance cycle, as further repairs may be necessary.

In Figure 9, the indication of the production month (originally September, represented by the Roman numeral IX) is obscured by a ground wire. Consequently, after parsing, the production month was incorrectly identified as January (represented by the Roman numeral I). Such “simple errors” can be automatically corrected by a multi-step correction procedure. In this process, all rolling marks that are located between two rail joints (welds, connecting lugs) are combined, and an additional correction process is performed. It is important to note that this analysis is based on test data only. None of the images were previously processed by the neural network.
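The text does not spell out how the marks between two rail joints are combined. One plausible reading, shown here purely as an assumption, is a per-field majority vote over all parsed marks of the same rail; the field names and values are invented.

```python
from collections import Counter


def consolidate_segment(parsed_marks):
    """Majority vote per field over all marks parsed between two rail joints."""
    if not parsed_marks:
        return {}
    fields = parsed_marks[0].keys()
    return {
        field: Counter(mark[field] for mark in parsed_marks).most_common(1)[0][0]
        for field in fields
    }


# Four marks of the same rail; one month was misread because it was obscured.
segment = [
    {"producer": "DO", "month": "IX", "year": "98"},
    {"producer": "DO", "month": "IX", "year": "98"},
    {"producer": "DO", "month": "I", "year": "98"},
    {"producer": "DO", "month": "IX", "year": "98"},
]
print(consolidate_segment(segment))
```

Since all marks on one rail carry the same information, a single obscured mark is outvoted by its neighbours.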

5. Summary and Conclusions

This study focused on the recognition of rolling marks on railroad rails at high process speeds. Rolling marks are 3D lettering rolled into the rail web, comprising a distinctive sequence of characters. They convey crucial information about the rail manufacturer, production year, steel grade, and rail profile.

Images were captured using a color camera with dark field illumination. The YOLO neural network was employed to detect the characters present in the rolling marks. Leveraging the inherent structure of rolling marks, it was possible to infer the provided information. A weighted Levenshtein distance was computed between the detected characters and predefined reference rolling marks. This approach enables corrective measures to be applied to inaccurately detected characters. Since this method allows for the interpretation of all rail information, diverse analyses can be conducted.

For instance, it is feasible to automatically search for specific manufacturers or rolling marks from particular manufacturing periods. Additionally, transitions between different steel grades or rail profiles can be visualized. As shown in Section 4.3, potentially safety-critical areas can thus be found. Frequent repair of a track indicates a track with low quality. This provides rail infrastructure managers with a valuable tool for maintenance and the planning of maintenance intervals for their infrastructure.

Clear practical limitations of the proposed approach are heavy surface abrasion and overlap by vegetation or engineered rail fasteners (see Figure 9), i.e., situations in which the rolling mark structure is simply no longer present in the image. In general, however, the proposed approach is surprisingly robust, due to:

A special camera illumination technique (dark field illumination);
The use of optical bandpass filters for improved and thus more reliable image acquisition;
A specially trained object detection algorithm (YOLO) for finding single rolling characters, which is not based on classical OCR (“Optical Character Recognition”) techniques;
A subsequent text correction, which improves reliable recognition in the case of partially incorrect detections.

6. Outlook

6.1. Semi-Supervised Learning

The presented Levenshtein-based text correction technique introduces additional opportunities for automation. By leveraging the known correction steps for faulty detections, automatic correction of labels becomes feasible. These corrected labels can then be reintroduced into the training cycle of the neural network. As the system learns from this process, the combination of the neural network and correction algorithm can be considered a form of semi-supervised learning.

However, the Levenshtein-based correction method may not be effective for characters that have multiple valid variants, such as the steel grade and manufacturing time. In such cases, corrections can be performed on the basis of neighboring rolling marks. On the other hand, the manufacturer’s designation and profile designation fall into the category of “easy to correct” characters since they are unambiguous in each instance. Correcting these characters using the Levenshtein distance approach proves somewhat simpler when it involves swapping or deleting characters rather than inserting non-detected characters.

When following the correction instructions, if a character is to be exchanged for another character, only the character’s name needs to be modified. If a character that was detected initially is deleted during the correction, the label associated with it simply needs to be removed. When inserting a character, its position must be determined by referencing the uncorrected characters and considering the textual structure of the existing rolling mark to ensure accurate placement of the missing character.
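A minimal sketch of this label update is given below. The operation tuples and the label dictionaries are hypothetical formats chosen for the example. The position of an inserted character is left open (`None`), since estimating it from the neighbouring characters is the step the text describes as requiring extra care.

```python
def apply_corrections(labels, ops):
    """Apply edit operations (in reading order) to a list of detection labels."""
    corrected, k = [], 0
    for op in ops:
        kind = op[0]
        if kind == "keep":
            corrected.append(dict(labels[k]))
            k += 1
        elif kind == "exchange":
            # Only the class name changes; the bounding box stays.
            corrected.append({**labels[k], "name": op[1]})
            k += 1
        elif kind == "delete":
            k += 1  # drop the spurious label
        elif kind == "insert":
            corrected.append({"name": op[1], "x": None})  # position still to be estimated
        else:
            raise ValueError(f"unknown operation: {kind}")
    return corrected


labels = [{"name": c, "x": 30.0 * i} for i, c in enumerate("D0X54E")]
ops = [("keep",), ("exchange", "O"), ("delete",),
       ("keep",), ("keep",), ("keep",), ("insert", "2")]
fixed = apply_corrections(labels, ops)
print("".join(label["name"] for label in fixed))
```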

However, it is important to handle cases like the one depicted in Figure 9 differently for all correction processes. In this scenario, the detection itself is accurate, but according to the correction instructions, the grounding cable would be incorrectly labeled as part of the month of manufacture.

6.2. Anomaly Detection

The recorded field of view captured by the camera provides additional valuable information beyond the rolling marks. It enables the detection of anomalies, such as missing irons, track holes, and other defects. By analyzing the captured images, it is possible to identify and address these irregularities, contributing to the overall maintenance and safety of the railway infrastructure (Figure 10).

7. Patents

In the course of this work, a patent (WO2022253660A1) was filed.

Author Contributions

Conceptualization, M.K. and M.P.; methodology, M.K.; software, M.K.; investigation, M.K. and M.P.; resources, G.Z. and M.B.; data curation, M.K.; writing—original draft preparation, M.K. and G.Z.; writing—review and editing, M.K., G.Z. and M.B.; visualization, M.K.; supervision, G.Z. and M.B.; project administration, M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Plasser & Theurer, Export von Bahnbaumaschinen, Gesellschaft m.b.H.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Research data are not shared due to confidentiality reasons.

Conflicts of Interest

The authors declare no conflict of interest.

References

Knapcikova, L.; Konings, R. European Railway Infrastructure: A Review. Acta Logist. 2018, 5, 71–77.
Soukup, D.; Huber-Mörk, R. Convolutional Neural Networks for Steel Surface Defect Detection from Photometric Stereo Images. In Advances in Visual Computing; Bebis, G., Boyle, R., Parvin, B., Koracin, D., McMahan, R., Jerald, J., Zhang, H., Drucker, S.M., et al., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2014; Volume 8887, pp. 668–677. ISBN 978-3-319-14248-7.
Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors; Institute of Information Science: Marina del Rey, CA, USA, 2022.
Sinha, H.; Soumya, G.V.; Undavalli, S.; Jeyanthi, R. An Effective Real-Time Approach to Automatic Number Plate Recognition (ANPR) Using YOLOv3 and OCR. In Intelligent Systems, Technologies and Applications: Proceedings of Sixth ISTA 2020, India; Springer: Berlin/Heidelberg, Germany, 2021; pp. 299–314.
Prajwal, M.J.; Tejas, K.B.; Varshad, V.; Murgod, M.M.; Shashidhar, R. Detection of Non-Helmet Riders and Extraction of License Plate Number Using Yolo v2 and OCR Method. Int. J. Innov. Technol. Explor. Eng. 2019, 9, 5167–5172.
Diwan, T.; Anirudh, G.; Tembhurne, J.V. Object Detection Using YOLO: Challenges, Architectural Successors, Datasets and Applications. Multimed. Tools Appl. 2023, 82, 9243–9275.
Levenshtein, V.I. Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Sov. Phys. Doklady 1966, 10, 707–710.
Shang, H.; Merrett, T.H. Tries for Approximate String Matching. IEEE Trans. Knowl. Data Eng. 1996, 8, 540–547.
Chen, H. String Metrics and Word Similarity Applied to Information Retrieval. Master’s Thesis, University of Eastern Finland, Joensuu, Finland, 2012.
Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In Proceedings of the AI 2006: Advances in Artificial Intelligence, Hobart, Australia, 4–8 December 2006; Sattar, A., Kang, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1015–1021.

Figure 1. Profile of a Vignole rail.

Figure 2. Illustration of a rolling mark highlighting information about (1) the rolling mill, (2) the steel grade, (3) the rolling year and (4) the rail profile.

Figure 3. Concept drawing of the applied dark field illumination.

Figure 4. A comparison between daylight exposure (left) and dark field illumination (right) for rolling marks reveals significant differences in visibility and highlighting of these marks.

Figure 5. Confusion matrix with 65 classes. Classes without entries indicate characters that do not occur in the data used for training or evaluation but are still considered in the model. These classes may represent potential characters that could occur in future data or may be included for completeness.

Figure 6. Detection results using only test data (i.e., images that were not used in the training process) showing high recognition accuracy.

Figure 7. Representation of various rolling marks. Each color represents a different rolling mark.

Figure 8. Representation of various rolling marks. Each color represents a different steel quality. Blue Line: quality R260; Yellow Line: quality R350 HT.

Figure 9. Correct result despite an incorrect detection due to a mounted grounding cable partially obscuring the rolling mark.

Figure 10. In addition to the previously mentioned anomalies, another valuable use case is the detection of missing small irons, as illustrated by the blue rectangle in the image on the far right. By leveraging image analysis techniques, it is possible to identify these specific components and flag instances where they are missing. This information can then be used for maintenance planning and prompt remedial actions, ensuring the integrity and safety of the railway system.

Table 1. Results table including backtrace for a weighted Levenshtein distance between the German words s = “ITALIEN” and t = “ALLEIN”. The orange coloured fields indicate the path between the two words.

Step types: diagonal = exchange or keep; horizontal = delete; vertical = insert.

	„“	I	T	A	L	I	E	N
„“	0.0	0.9	1.8	2.7	3.6	4.5	5.4	6.3
A	1.0	1.8	2.7	1.8	2.8	3.8	4.8	5.8
L	2.0	2.4	3.2	2.7	1.8	2.8	3.8	4.8
L	3.0	3.3	3.8	3.6	2.7	3.2	4.2	5.2
E	4.0	4.2	4.7	4.5	3.6	4.1	3.2	4.2
I	5.0	4.0	5.0	5.4	4.5	3.6	4.1	5.1
N	6.0	4.9	5.9	6.3	5.4	4.5	5.0	4.1
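
As a rough illustration of how a cost table like Table 1 and its backtrace can be computed, the following sketch fills the matrix row by row and then walks back from the bottom-right cell. The function name `weighted_levenshtein` and the constant weights are assumptions made for this example, so the sketch is not expected to reproduce the values in Table 1.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str, str]  # (kind, character in s, character in t)


def weighted_levenshtein(
    s: str,
    t: str,
    delete_cost: float = 0.9,  # assumed placeholder weight
    insert_cost: float = 1.0,  # assumed placeholder weight
    exchange_cost: Callable[[str, str], float] = lambda a, b: 1.5,  # assumed
) -> Tuple[float, List[Step]]:
    """Return the weighted distance from s to t and one optimal list of edit steps."""
    rows, cols = len(t) + 1, len(s) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for j in range(1, cols):  # first row: delete every character of s
        d[0][j] = d[0][j - 1] + delete_cost
    for i in range(1, rows):  # first column: insert every character of t
        d[i][0] = d[i - 1][0] + insert_cost

    def diagonal(i: int, j: int) -> float:
        # Cost of reaching cell (i, j) by keeping or exchanging a character.
        a, b = s[j - 1], t[i - 1]
        return d[i - 1][j - 1] + (0.0 if a == b else exchange_cost(a, b))

    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                diagonal(i, j),
                d[i][j - 1] + delete_cost,
                d[i - 1][j] + insert_cost,
            )

    # Backtrace: at each cell, follow a predecessor that explains its value.
    steps: List[Step] = []
    i, j = len(t), len(s)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and abs(d[i][j] - diagonal(i, j)) < 1e-9:
            kind = "keep" if s[j - 1] == t[i - 1] else "exchange"
            steps.append((kind, s[j - 1], t[i - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and abs(d[i][j] - (d[i][j - 1] + delete_cost)) < 1e-9:
            steps.append(("delete", s[j - 1], ""))
            j -= 1
        else:
            steps.append(("insert", "", t[i - 1]))
            i -= 1
    steps.reverse()
    return d[len(t)][len(s)], steps


distance, path = weighted_levenshtein("ITALIEN", "ALLEIN")
for step in path:
    print(step)
```

Passing a character-dependent `exchange_cost` function (for example, a lower cost for characters that are easily confused) is one way such a sketch could be adapted to weights of the kind used in the paper.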

Table 2. Number of individual classes in the data set.

Label	Count	Label	Count	Label	Count
0	6682 (10.4%)	L	35 (0.1%)	<06>	3 (<0.1%)
1	5011 (7.8%)	M	157 (0.2%)	<07>	5344 (8.3%)
2	601 (0.9%)	N	219 (0.3%)	<08>	19 (<0.1%)
3	173 (0.3%)	O	5579 (8.7%)	<09>	-
4	1233 (1.9%)	P	15 (<0.1%)	<10>	19 (<0.1%)
5	1503 (2.3%)	Q	-	<11>	-
6	5373 (8.4%)	R	273 (0.4%)	<12>	-
7	101 (0.2%)	S	491 (0.8%)	<13>	2 (<0.1%)
8	974 (1.5%)	T	246 (0.4%)	<14>	-
9	3092 (4.8%)	U	2929 (4.6%)	<15>	-
A	312 (0.5%)	V	2211 (3.4%)	<16>	-
B	83 (0.1%)	W	226 (0.4%)	<17>	-
C	2985 (4.7%)	X	841 (1.3%)	<18>	-
D	5235 (8.2%)	Y	284 (0.4%)	<19>	-
E	2589 (4.0%)	Z	221 (0.3%)	<20>	-
F	-	<00>	-	<21>	-
G	19 (<0.1%)	<01>	184 (0.3%)	<22>	-
H	283 (0.4%)	<02>	4 (<0.1%)	<23>	-
I	7374 (11.5%)	<03>	-	<24>	-
J	-	<04>	934 (1.5%)	<25>	-
K	17 (<0.1%)	<05>	241 (0.4%)	.	4 (<0.1%)
				–	69 (0.1%)

Table 3. Graphical representation of the occurring special characters.

Labels of the occurring special characters: <01>, <02>, <04>, <05>, <06>, <07>, <08>, <10>, <13> (the corresponding glyphs are shown as images in the original table).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
