PANDORA: A Panoramic Detection Dataset for Object with Orientation

Xu, Hang; Zhao, Qiang; Ma, Yike; Li, Xiaodong; Yuan, Peng; Feng, Bailan; Yan, Chenggang; Dai, Feng

doi:10.1007/978-3-031-20074-8_14

Conference paper
First Online: 12 November 2022

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13668)

Abstract

Panoramic images have become increasingly popular as omnidirectional panoramic technology has advanced. Many datasets and methods rely on object detection to better understand the content of panoramic images, and they use a Bounding Field of View (BFoV) as the bounding box. However, we observe that object instances in panoramic images often appear with arbitrary orientations. This makes BFoV an inappropriate bounding box and limits the performance of detectors. This paper proposes a new bounding box representation, Rotated Bounding Field of View (RBFoV), for the panoramic image object detection task. Based on RBFoV, we then present a PANoramic Detection dataset for Object with oRientAtion (PANDORA). Finally, we evaluate the current state-of-the-art panoramic image object detection methods on PANDORA and design an anchor-free object detector for panoramic images called R-CenterNet. Compared with these baselines, R-CenterNet achieves better detection performance. The PANDORA dataset and source code are available at https://github.com/tdsuper/SphericalObjectDetection.

H. Xu and Q. Zhao—This work was done when Hang Xu and Qiang Zhao were at ICT.
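
To make the difference between BFoV and RBFoV concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a parameterization that the abstract does not spell out: a BFoV is taken to be a center longitude and latitude (`theta`, `phi`) plus horizontal and vertical fields of view (`alpha`, `beta`), and an RBFoV adds a rotation angle `gamma` about the viewing direction. The class names, field names, axis conventions, and the helper `corner_directions` are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class BFoV:
    """Axis-aligned box on the sphere (assumed layout, angles in radians)."""
    theta: float  # longitude of the box center
    phi: float    # latitude of the box center
    alpha: float  # horizontal field of view, assumed < pi
    beta: float   # vertical field of view, assumed < pi


@dataclass(frozen=True)
class RBFoV(BFoV):
    """BFoV plus an assumed rotation about the viewing direction."""
    gamma: float = 0.0  # gamma == 0 reduces to the plain BFoV


def center_direction(box: BFoV) -> Vec3:
    """Unit vector pointing at the box center (assumed axis convention)."""
    return (math.cos(box.phi) * math.cos(box.theta),
            math.cos(box.phi) * math.sin(box.theta),
            math.sin(box.phi))


def corner_directions(box: RBFoV) -> List[Vec3]:
    """Unit vectors toward the four corners of the box.

    Assumption: the box is a rectangle on the plane tangent to the sphere
    at the box center, with half-extents tan(alpha/2) and tan(beta/2),
    rotated in that plane by gamma.
    """
    c = center_direction(box)
    # Orthonormal basis of the tangent plane at the center.
    right = (-math.sin(box.theta), math.cos(box.theta), 0.0)
    up = (-math.sin(box.phi) * math.cos(box.theta),
          -math.sin(box.phi) * math.sin(box.theta),
          math.cos(box.phi))
    hx, hy = math.tan(box.alpha / 2), math.tan(box.beta / 2)
    cg, sg = math.cos(box.gamma), math.sin(box.gamma)
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        x, y = sx * hx, sy * hy
        # Rotate the corner within the tangent plane by gamma.
        xr, yr = cg * x - sg * y, sg * x + cg * y
        v = tuple(c[i] + xr * right[i] + yr * up[i] for i in range(3))
        n = math.sqrt(sum(t * t for t in v))
        corners.append(tuple(t / n for t in v))
    return corners
```

For example, `corner_directions(RBFoV(theta=0.0, phi=0.0, alpha=math.pi / 2, beta=math.pi / 2, gamma=math.pi / 6))` returns the corners of a box tilted by 30 degrees around its own center. Under these assumptions the rotation changes where the corners fall on the sphere but not their angular distance from the center, which is why a single extra angle is enough to describe an object that appears tilted in the panorama.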

Acknowledgements

This work is supported by the National Key Research and Development Program of China (2020YFB1406604) and the National Natural Science Foundation of China (62072438, U1936110, 61931008, U21B2024).

Author information

Authors and Affiliations

Hangzhou Dianzi University, Hangzhou, China
Hang Xu & Chenggang Yan
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Hang Xu, Qiang Zhao, Yike Ma & Feng Dai
Huawei Noah’s Ark Lab, Beijing, China
Xiaodong Li, Peng Yuan & Bailan Feng
State Key Laboratory of Media Convergence Production Technology and Systems, Beijing, China
Chenggang Yan

Corresponding authors

Correspondence to Qiang Zhao or Feng Dai.

Editor information

Editors and Affiliations

Tel Aviv University, Tel Aviv, Israel
Shai Avidan
University College London, London, UK
Gabriel Brostow
Google AI, Accra, Ghana
Moustapha Cissé
University of Catania, Catania, Italy
Giovanni Maria Farinella
Facebook (United States), Menlo Park, CA, USA
Tal Hassner

Cite this paper

Xu, H. et al. (2022). PANDORA: A Panoramic Detection Dataset for Object with Orientation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13668. Springer, Cham. https://doi.org/10.1007/978-3-031-20074-8_14

DOI: https://doi.org/10.1007/978-3-031-20074-8_14
Published: 12 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20073-1
Online ISBN: 978-3-031-20074-8
eBook Packages: Computer Science, Computer Science (R0)
