This method comprises five key steps: sensor selection, data collection, image processing and labeling, model training, and testing of the air-based models. Together, these steps produce a final set of model performance metrics capable of answering the research question under investigation.
A. SENSOR SELECTION
The LWIR camera selected for this research is the FLIR (Forward-Looking Infrared) Vue Pro R, a radiometric-capable camera designed specifically for drones that costs $2,914 USD. The camera has a field of view (FOV) of 45° and a lens diameter of 6.8 mm [36]. The 30 Hz variant of the FLIR Vue Pro R is used; although it provides a higher frame rate (30 FPS) than the 9 Hz variant (9 FPS), it is export controlled and cannot be purchased outside of the United States. Both variants produce the same LWIR resolution. The camera resolution is 336 x 256, with a spectral band of 7.5–13.5 µm and an operating temperature range of -20 °C (-4 °F) to 50 °C (122 °F) [36].
The RGB camera selected for this research is the RunCam 5 Orange, which is designed for drone applications and costs $110 USD. The RunCam 5 uses a Sony IMX377 12-megapixel image sensor with a 145° FOV and adjustable resolution, ranging from 1080P at 60 FPS to 4K at 30 FPS [37]. For this research, 1080P (1920 x 1080) at 60 FPS (60 Hz) is used. Shutter speed, ISO, color style, saturation, exposure, contrast, sharpness, and white balance are all set to their default settings.
B. DATA COLLECTION
Overhead imagery for the air-based ML models is collected from the DJI Inspire 2 (Fig. 4). The RGB and LWIR cameras on the multirotor are co-aligned to maintain the same field of view, ensuring that similar images are collected by the two sensors [38]. Data are collected at various times of day and at different temperatures to ensure data diversity. Footage is recorded to each camera's micro-SD card and extracted afterward. Frames of interest are then extracted from the footage and converted into images to train the ML models. Images are also collected from various altitudes to increase image diversity and to help reduce model performance loss at higher altitudes.
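As an illustration of the frame-extraction step, the sketch below pulls frames from recorded footage at a fixed interval using OpenCV. The file names, output folder, and sampling interval are assumptions chosen for illustration; this is not the exact script used in the study.

```python
# Minimal frame-extraction sketch (assumed file names and sampling rate; not the study's exact script).
import cv2
import os

VIDEO_PATH = "flight_rgb_1080p.mp4"   # assumed footage copied from the camera's micro-SD card
OUTPUT_DIR = "extracted_frames"        # assumed output folder for candidate training images
FRAME_STEP = 30                        # keep one frame in every 30 (about one per second at 30 FPS)

os.makedirs(OUTPUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO_PATH)

frame_idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % FRAME_STEP == 0:
        cv2.imwrite(os.path.join(OUTPUT_DIR, f"frame_{saved:05d}.jpg"), frame)
        saved += 1
    frame_idx += 1

cap.release()
print(f"Saved {saved} frames to {OUTPUT_DIR}")
```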
A 3D-printed mounting bracket was designed and printed to mount the RGB camera directly to the LWIR camera. The bracket reduces parallax and ensures both cameras share the same FOV. This fixed FOV makes fusing the LWIR and RGB footage in Adobe Premiere Pro easier. The file used to print the mounting bracket can be found in the data availability section.
C. IMAGE PROCESSING AND LABELING
IP techniques are then applied to the original images to increase the quantity of images available in the training dataset while simultaneously generating edge-enhanced images to improve model performance [40]. Six image processing and mathematical morphology techniques are applied: flipping, blurring, blurring & flipping, Gaussian Thresholding (GT), Difference of Gaussians (DoG), and Sobel-XY. See Fig. 6 for a visual example of each technique for RGB, LWIR, and fused RGB-LWIR imagery. The blurred and blurred & flipped techniques are especially useful because of video vibration caused by the oscillatory motion of the airframe's propellers [41]. Training on blurred images helps ensure that the model continues to work when frames are blurred by camera movement, target object movement, or both. Although counterintuitive, training ML models with blurred images tends to increase detection rates and confidence levels [42]. All code for generating and exporting the augmented images can be found in the image processing link in the data availability section [43].
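The sketch below illustrates how each of the six augmentation techniques could be produced from a single source frame with OpenCV. The kernel sizes, adaptive-threshold block size, and sigma values are assumptions made for illustration only; the released image-processing code linked in the data availability section is the authoritative implementation.

```python
# Illustrative augmentation sketch (assumed kernel/threshold parameters; not the released pipeline).
import cv2

img = cv2.imread("frame_00001.jpg")                    # an original RGB, LWIR, or fused frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

flipped = cv2.flip(img, 1)                             # 1) horizontal flip
blurred = cv2.GaussianBlur(img, (7, 7), 0)             # 2) Gaussian blur
blur_flip = cv2.flip(blurred, 1)                       # 3) blur & flip

# 4) Gaussian (adaptive) thresholding on the grayscale frame
gt = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                           cv2.THRESH_BINARY, 11, 2)

# 5) Difference of Gaussians: subtract a wide blur from a narrow blur to enhance edges
g1 = cv2.GaussianBlur(gray, (3, 3), 0)
g2 = cv2.GaussianBlur(gray, (9, 9), 0)
dog = cv2.subtract(g1, g2)

# 6) Sobel-XY: combine horizontal and vertical gradients
sx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
sy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
sobel_xy = cv2.convertScaleAbs(cv2.magnitude(sx, sy))

for name, out in [("flip", flipped), ("blur", blurred), ("blur_flip", blur_flip),
                  ("gt", gt), ("dog", dog), ("sobel_xy", sobel_xy)]:
    cv2.imwrite(f"frame_00001_{name}.jpg", out)
```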
After image processing, a total of 5,400 new training images are generated, bringing the dataset to 6,300 images. Of these, 90% (5,670 images) are used for training, 5% (315 images) for validation, and the remaining 5% (315 images) for testing. None of the newly generated images are used for testing, ensuring that testing results are comparable across all ML models. Lastly, all images are labeled using LabelImg, an open-source Python-based image labeling tool [44].
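A minimal sketch of one way to realize this split is shown below. It assumes augmented files can be identified by a suffix in their file names and that the validation set, like the test set, is drawn only from original frames; both are illustrative assumptions rather than details stated in this study.

```python
# Minimal train/val/test split sketch (assumed file-naming convention for augmented images).
import random
from pathlib import Path

AUG_SUFFIXES = ("_flip", "_blur", "_blur_flip", "_gt", "_dog", "_sobel_xy")  # assumed suffixes

all_images = sorted(Path("dataset/images").glob("*.jpg"))
originals = [p for p in all_images if not p.stem.endswith(AUG_SUFFIXES)]
augmented = [p for p in all_images if p.stem.endswith(AUG_SUFFIXES)]

random.seed(0)
random.shuffle(originals)

# 5% of the full dataset for validation and 5% for testing, drawn only from original frames
# (that validation excludes augmented images is an assumption for this sketch).
n_hold = round(0.05 * len(all_images))
val_set = originals[:n_hold]
test_set = originals[n_hold:2 * n_hold]
train_set = originals[2 * n_hold:] + augmented   # remaining originals plus all augmented images

print(len(train_set), len(val_set), len(test_set))
```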
D. MODEL TRAINING
This research utilizes YOLOv7 as the Convolutional Neural Network (CNN) to perform object detection [45]. YOLOv7 was selected because, at the time of writing, it surpasses all existing object detectors in terms of speed and accuracy [46] and is considered one of the fastest open-source object-detection models currently available [47], [48], [46]. A primary shortfall of this family of object detection models is that YOLO approaches can struggle to detect smaller objects within an image, primarily due to spatial constraints in the algorithm [49], [50]. The standard YOLOv7 variant is used for this research study [51].
The model is trained for 55 epochs. This number was selected to prevent overtraining: there is an imbalance between the number of car and truck labels in the dataset (cars have the most labels while trucks have the fewest), making overfitting possible if the models are trained for too many epochs [52]. Training beyond the selected 55 epochs may increase false positives, thereby decreasing the mAP of the model. After training is complete, the three models (RGB, LWIR, and RGB-LWIR) are ready for evaluation on drone-based imagery collected at different periods of the day and at various altitudes.
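For reference, training with the open-source YOLOv7 repository is typically launched through its train.py script. The sketch below shows what such a run could look like; the dataset YAML, batch size, image size, and run name are assumptions, and only the 55-epoch setting and the use of the standard YOLOv7 variant come from this study.

```python
# Sketch of launching YOLOv7 training (WongKinYiu/yolov7 train.py); dataset YAML, batch size,
# image size, and run name are assumptions -- only the 55-epoch setting comes from this study.
import subprocess

subprocess.run([
    "python", "train.py",
    "--weights", "yolov7.pt",                 # standard YOLOv7 variant, pretrained weights
    "--cfg", "cfg/training/yolov7.yaml",
    "--data", "data/vehicles.yaml",           # assumed dataset definition (car/truck classes)
    "--epochs", "55",                         # selected to limit overfitting on the imbalanced labels
    "--batch-size", "16",                     # assumed
    "--img-size", "640", "640",               # assumed
    "--device", "0",
    "--name", "rgb_lwir_fusion",              # assumed run name
], check=True)
```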
E. TESTING AIR-BASED MODELS
A multirotor drone will be utilized to fly at fixed elevations to determine inference performance, via mAP, for both sensors and all three model types. As indicated in Fig. 7, to assess the models and sensors, new test images, separate from those used for training, will be extracted from video footage collected at 15 m (50 ft), 30 m (100 ft), 45 m (150 ft), 61 m (200 ft), 76 m (250 ft), 91 m (300 ft), 106 m (350 ft), and 121 m (400 ft). Footage cannot be collected above 121 m (400 ft) due to Federal Aviation Administration (FAA) regulations that prohibit drones from flying above that altitude. Additionally, data will be collected at five different periods of the day: Pre-Sunrise (low thermal cross-over, low illumination), Post-Sunrise (low thermal cross-over, medium illumination), Solar-Noon (high thermal cross-over, high illumination), Pre-Sunset (high thermal cross-over, medium illumination), and Post-Sunset (high thermal cross-over, low illumination). Atmospheric and location-related metadata will also be recorded prior to each flight to support both this study and the reusability of the images in future research. This metadata includes temperature (°C), wind speed (meters per second), illumination (lux), time, date, and location.
Five test images will be extracted at every elevation for each image type, resulting in 120 images per flight (5 each of RGB, LWIR, and RGB-LWIR across the 8 elevations) and 600 labeled images per daily period (5 flights). Following ten full flights, a total of 1,200 test images will be collected to evaluate model and sensor performance. When calculating mAP for the test images, variables will be constrained to a confidence threshold of 10% and an intersection over union (IoU) threshold of 65%. After executing the test code, the notebook will export critical metrics such as precision, recall, the precision-recall curve, [email protected], and [email protected]:95. For this research, only [email protected] will be used to measure sensor and model performance at fixed elevations. The labeled test image dataset and test script can be found in the Test Data link and the YOLOv7 Training Code notebook link in the data availability section.
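As an illustration of how these evaluation constraints could be applied with the open-source YOLOv7 repository's test.py script, the sketch below passes the 10% confidence and 65% IoU thresholds stated above; the weights path, dataset YAML, image size, and run name are assumptions rather than details of the study's test script.

```python
# Sketch of evaluating a trained model with the YOLOv7 repository's test.py; the weights path,
# dataset YAML, image size, and run name are assumptions -- the 10% confidence and 65% IoU
# thresholds are the constraints stated above.
import subprocess

subprocess.run([
    "python", "test.py",
    "--weights", "runs/train/rgb_lwir_fusion/weights/best.pt",  # assumed path to trained weights
    "--data", "data/vehicles_test.yaml",                        # assumed test-set definition
    "--img-size", "640",
    "--conf-thres", "0.10",                                     # 10% confidence threshold
    "--iou-thres", "0.65",                                      # 65% IoU threshold
    "--task", "test",                                           # evaluate on the held-out test split
    "--name", "altitude_eval",                                  # assumed run name
], check=True)
```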