Elsevier

Pattern Recognition

Volume 116, August 2021, 107929

STDnet-ST: Spatio-temporal ConvNet for small object detection

https://doi.org/10.1016/j.patcog.2021.107929
Under a Creative Commons license
open access

Highlights

  • STDnet-ST is a novel spatio-temporal ConvNet for small object detection.

  • STDnet-ST exploits the correlation of promising regions between frames.

  • Efficient tubelet linking is performed to connect small objects across video frames.

  • A novel tubelet suppression algorithm is proposed to avoid unprofitable tubelets.

  • STDnet-ST outperforms its state-of-the-art counterparts for small target detection.

Abstract

Object detection with convolutional neural networks is reaching unprecedented levels of precision. However, a detailed analysis of the results shows that accuracy in the detection of small objects is still far from satisfactory. A recent trend that will likely improve overall object detection performance is to combine spatial information with the temporal information available in video. This paper introduces STDnet-ST, an end-to-end spatio-temporal convolutional neural network for small object detection in video. We define small objects as those under 16×16 px, where the features become less distinctive. STDnet-ST detects small objects over time and correlates pairs of the top-ranked regions with the highest likelihood of containing those small objects. This makes it possible to link the small objects across time as tubelets. Furthermore, we propose a procedure to dismiss unprofitable object links in order to provide high-quality tubelets, which increases accuracy. STDnet-ST is evaluated on the publicly accessible USC-GRAD-STDdb, UAVDT and VisDrone2019-VID video datasets, where it achieves state-of-the-art results for small objects.
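
To make the notions of "small object" and "tubelet" concrete, the following minimal sketch filters boxes by the 16×16 px threshold and links per-frame detections into tubelets. It is not the STDnet-ST method: the paper links objects by correlating top-ranked regions inside the network, whereas this sketch swaps in a plain greedy overlap (IoU) rule and drops short tubelets as a crude stand-in for tubelet suppression. The names `is_small`, `iou`, `link_tubelets`, and the parameters `min_iou` and `min_len` are hypothetical, and reading "under 16×16 px" as "both sides shorter than 16 px" is an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Tubelet = List[Tuple[int, Box]]          # list of (frame index, box)

SMALL_SIDE_PX = 16  # threshold from the abstract; per-side reading is an assumption


def is_small(box: Box, side: float = SMALL_SIDE_PX) -> bool:
    """Return True when both width and height are below `side` pixels."""
    x1, y1, x2, y2 = box
    return (x2 - x1) < side and (y2 - y1) < side


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubelets(frames: List[List[Box]],
                  min_iou: float = 0.3,
                  min_len: int = 2) -> List[Tubelet]:
    """Greedy IoU linking (illustrative stand-in, not the paper's algorithm).

    Each open tubelet is extended with the best-overlapping unused box of the
    next frame; tubelets shorter than `min_len` are discarded at the end.
    """
    finished: List[Tubelet] = []
    open_tubes: List[Tubelet] = []
    for t, boxes in enumerate(frames):
        unused = list(boxes)
        still_open: List[Tubelet] = []
        for tube in open_tubes:
            last_box = tube[-1][1]
            best = max(unused, key=lambda b: iou(last_box, b), default=None)
            if best is not None and iou(last_box, best) >= min_iou:
                tube.append((t, best))
                unused.remove(best)
                still_open.append(tube)
            else:
                finished.append(tube)  # no match in this frame: close it
        # Every unmatched detection starts a new tubelet.
        still_open.extend([[(t, b)] for b in unused])
        open_tubes = still_open
    finished.extend(open_tubes)
    return [tube for tube in finished if len(tube) >= min_len]


frames = [[(0, 0, 10, 10)], [(1, 1, 11, 11)], [(50, 50, 60, 60)]]
tubelets = link_tubelets(frames)
```

In this toy input, the first two boxes overlap strongly and form one tubelet, while the isolated box in the last frame yields a single-frame tubelet that the length filter discards.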

Keywords

Small object detection
Spatio-temporal convolutional network
Object linking

Brais Bosquet received the B.Sc. degree and the M.Sc. degree from the Universidade de Santiago de Compostela, Spain, in 2014 and 2015, respectively. He is currently a Ph.D. candidate at the Universidade de Santiago de Compostela. His research interests include object detection, neural networks and image processing.

Manuel Mucientes is an associate professor in computer science and artificial intelligence within the CiTIUS of the Universidade de Santiago de Compostela. He has authored or coauthored more than 100 papers in international journals, book chapters, and conferences. His current research interests are computer vision, on the topics of object detection and tracking based on deep learning; robotics, focused on UAVs (Unmanned Aerial Vehicles); process mining; and machine learning.

Victor M. Brea received the Ph.D. degree in Physics from the Universidade de Santiago de Compostela, Spain, in 2003. He is currently an associate professor in the Centro Singular de Investigación en Tecnoloxías da Información (CiTIUS), Universidade de Santiago de Compostela, Spain. His main research interests include object detection and tracking in the field of computer vision, as well as CMOS vision sensors and the design of energy-efficient sensors.