Interactive object annotation based on one-click guidance

The Journal of Supercomputing

Abstract

Manual annotation of datasets involves a heavy workload, uneven data quality, and a high expertise threshold. Based on the idea of semi-automatic annotation, this article investigates the use of interactive methods to obtain accurate object annotations. We propose a human–machine interactive object annotation method based on one-click guidance. Specifically, the annotator clicks a point close to the center of the object, and the prior information of this point guides the model. The advantages of our method are fourfold: (1) the simulated click strategy is transferable and can be used to label across datasets; (2) clicks help eliminate irrelevant regions within the bounding box; (3) the operation is more convenient, requiring no manually drawn boxes, only the relevant location information; (4) our method supports additional click annotations for further correction. To verify the effectiveness of the proposed method, we conducted extensive experiments on the KITTI and PASCAL VOC2012 datasets; the results show that our method improves average IoU by 18.1% and 14.6% over Anno-Mage and CVAT, respectively. Our method focuses on improving the accuracy and efficiency of annotation and provides a new idea for the field of semi-automatic annotation.
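
As a concrete illustration of the one-click guidance described above, the sketch below shows one way a center click could be simulated from a ground-truth box during training and encoded as a Gaussian guidance map for the model. The function names, the jitter radius, and the Gaussian encoding are illustrative assumptions, not the exact implementation reported in the paper.

```python
import numpy as np

def simulate_center_click(box, jitter_ratio=0.1, rng=None):
    # Simulate a one-click annotation near the center of a ground-truth box.
    # box = (x_min, y_min, x_max, y_max); the jitter_ratio and the uniform
    # perturbation are assumptions used to mimic an imprecise human click.
    rng = rng if rng is not None else np.random.default_rng()
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    w, h = x_max - x_min, y_max - y_min
    dx = rng.uniform(-jitter_ratio, jitter_ratio) * w
    dy = rng.uniform(-jitter_ratio, jitter_ratio) * h
    return cx + dx, cy + dy

def click_to_guidance_map(click, image_shape, sigma=10.0):
    # Encode the click as a 2D Gaussian heatmap; one assumed way to feed the
    # click's prior information to the model as an additional input channel.
    height, width = image_shape
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = click
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

# Example: simulate a click for one object and build its guidance channel.
click = simulate_center_click((50, 40, 150, 120))
guidance = click_to_guidance_map(click, image_shape=(375, 1242))  # KITTI-sized image
# `guidance` can be concatenated with the RGB image before it enters the network;
# extra correction clicks can be merged via an element-wise maximum of their maps.
```

In this sketch the guidance map both localizes the object and down-weights irrelevant regions inside the bounding box, which mirrors the roles the abstract attributes to the click prior.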

References

  1. Real E, Shlens J, Mazzocchi S, Pan X, Vanhoucke V (2017) Youtube-boundingboxes: a large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5296–5305

  2. Nandhini P, Kuppuswami S, Malliga S, DeviPriya R (2022) Enhanced rank attack detection algorithm (e-rad) for securing rpl-based iot networks by early detection and isolation of rank attackers. J Supercomput 1–24

  3. Suseendran G, Akila D, Vijaykumar H, Jabeen TN, Nirmala R, Nayyar A (2022) Multi-sensor information fusion for efficient smart transport vehicle tracking and positioning based on deep learning technique. J Supercomput 1–26

  4. Varga V, Lőrincz A (2020) Reducing human efforts in video segmentation annotation with reinforcement learning. Neurocomputing 405:247–258

  5. Kishorekumar R, Deepa P (2020) A framework for semantic image annotation using legion algorithm. J Supercomput 76(6):4169–4183

  6. Pham T-N, Nguyen V-H, Huh J-H (2023) Integration of improved yolov5 for face mask detector and auto-labeling to generate dataset for fighting against covid-19. J Supercomput 1–27

  7. Boukthir K, Qahtani AM, Almutiry O, Dhahri H, Alimi AM (2022) Reduced annotation based on deep active learning for Arabic text detection in natural scene images. Pattern Recogn Lett 157:42–48

  8. Russell BC, Torralba A, Murphy KP, Freeman WT (2008) Labelme: a database and web-based tool for image annotation. Int J Comput Vis 77(1–3):157–173

  9. Su H, Deng J, Fei-Fei L (2012) Crowdsourcing annotations for visual object detection. In: Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

  10. Acuna D, Ling H, Kar A, Fidler S (2018) Efficient interactive annotation of segmentation datasets with polygon-rnn++. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 859–868

  11. Vondrick C, Patterson D, Ramanan D (2013) Efficiently scaling up crowdsourced video annotation. Int J Comput Vis 101(1):184–204

  12. Mottaghi R, Chen X, Liu X, Cho N-G, Lee S-W, Fidler S, Urtasun R, Yuille A (2014) The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 891–898

  13. Zhang S, Liew JH, Wei Y, Wei S, Zhao Y (2020) Interactive object segmentation with inside–outside guidance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12234–12244

  14. Pacha S, Murugan SR, Sethukarasi R (2020) Semantic annotation of summarized sensor data stream for effective query processing. J Supercomput 76(6):4017–4039

  15. Schembera B (2021) Like a rainbow in the dark: metadata annotation for hpc applications in the age of dark data. J Supercomput 77(8):8946–8966

  16. Ling H, Gao J, Kar A, Chen W, Fidler S (2019) Fast interactive object annotation with curve-gcn. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5257–5266

  17. Gao X, Zhang G, Xiong Y (2022) Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel. Measurement 194:111001

  18. Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The Pascal visual object classes (voc) challenge. Int J Comput Vis 88(2):303–338

  19. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The kitti vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 3354–3361

  20. Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the kitti dataset. Int J Robot Res 32(11):1231–1237

  21. Tzutalin (2015) LabelImg. https://github.com/tzutalin/labelImg

  22. Dutta A, Zisserman A (2019) The via annotation software for images, audio and video. In: Proceedings of the 27th ACM International Conference on Multimedia, pp 2276–2279

  23. Yu F, Xian W, Chen Y, Liu F, Liao M, Madhavan V, Darrell T (2018) Bdd100k: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687

  24. christopher5106 (2016) FastAnnotationTool. https://github.com/christopher5106/FastAnnotationTool

  25. virajmavani (2018) Anno-Mage. https://github.com/virajmavani/semi-auto-image-annotation-tool

  26. OpenVINO (2020) CVAT. https://github.com/openvinotoolkit/cvat

  27. Wang B, Wu V, Wu B, Keutzer K (2019) Latte: accelerating lidar point cloud annotation via sensor fusion, one-click annotation, and tracking. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, pp 265–272

  28. Piewak F, Pinggera P, Schafer M, Peter D, Schwarz B, Schneider N, Enzweiler M, Pfeiffer D, Zollner M (2018) Boosting lidar-based semantic labeling by cross-modal training data generation. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops

  29. Yue X, Wu B, Seshia SA, Keutzer K, Sangiovanni-Vincentelli AL (2018) A lidar point cloud generator: from a virtual world to autonomous driving. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pp 458–464

  30. Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) Carla: an open urban driving simulator. In: Conference on Robot Learning. PMLR, pp 1–16

  31. Maninis K-K, Caelles S, Pont-Tuset J, Van Gool L (2018) Deep extreme cut: from extreme points to object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 616–625

  32. Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2017) Extreme clicking for efficient object annotation. In: Proceedings of the IEEE International Conference on Computer Vision, pp 4930–4939

  33. Fails JA, Olsen Jr DR (2003) Interactive machine learning. In: Proceedings of the 8th International Conference on Intelligent User Interfaces, pp 39–45

  34. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91–99

  35. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778

  36. Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2117–2125

  37. Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 510–519

  38. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141

Author information

Contributions

GZ and XG wrote the main manuscript text; YX prepared Figs. 1, 2 and 3; GZ prepared Figs. 4 and 5; and XG prepared Figs. 6 and 7. All authors reviewed the manuscript.

Corresponding author

Correspondence to Guoying Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, Y., Gao, X. & Zhang, G. Interactive object annotation based on one-click guidance. J Supercomput 79, 16098–16117 (2023). https://doi.org/10.1007/s11227-023-05279-z
