You Only Look for a Symbol Once: An Object Detector for Symbols and Regions in Documents

  • Conference paper
  • Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

We present YOLSO, a single stage object detector specialised for the detection of fixed size, non-uniform (e.g. hand-drawn or stamped) symbols in maps and other historical documents. Like YOLO, a single convolutional neural network predicts class probabilities and bounding boxes over a grid that exploits context surrounding an object of interest. However, our specialised approach differs from YOLO in several ways. We can assume symbols of a fixed scale and so need only predict bounding box centres, not dimensions. We can design the grid size and receptive field of a grid cell to be appropriate to the known scale of the symbols. Since maps have no meaningful boundary, we use a fully convolutional architecture applicable to any resolution and avoid introducing unwanted boundary dependency by using no padding. We extend the method to also perform coarse segmentation of regions indicated by symbols using the same single architecture. We evaluate our approach on the task of detecting symbols denoting free-standing trees and wooded regions in first edition Ordnance Survey maps and make the corresponding dataset as well as our implementation publicly available.
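The fixed-scale detection scheme described above — each grid cell predicting class probabilities and only a bounding-box centre, with box dimensions fixed in advance — can be sketched as a simple decoding step. The function name, channel layout, and the cell/symbol sizes below are illustrative assumptions for this sketch, not the authors' implementation:

```python
import numpy as np

def decode_yolso_grid(pred, cell_size=16, symbol_size=24, obj_thresh=0.5):
    """Decode a YOLSO-style prediction grid into fixed-size detections.

    pred: (H, W, 3 + C) array, one vector per grid cell laid out as
      [objectness, dx, dy, class scores...], where (dx, dy) in [0, 1]
      is the symbol centre's offset within the cell.
    Returns a list of (cx, cy, x0, y0, x1, y1, class_id) tuples; box
    extents follow directly from the known, fixed symbol size.
    """
    H, W = pred.shape[:2]
    half = symbol_size / 2
    detections = []
    for gy in range(H):
        for gx in range(W):
            obj, dx, dy = pred[gy, gx, :3]
            if obj < obj_thresh:
                continue  # no symbol centred in this cell
            # Absolute centre: cell origin plus predicted in-cell offset.
            cx = (gx + dx) * cell_size
            cy = (gy + dy) * cell_size
            cls = int(np.argmax(pred[gy, gx, 3:]))
            detections.append((cx, cy, cx - half, cy - half,
                               cx + half, cy + half, cls))
    return detections
```

Because the symbol scale is known, no width/height regression (and no anchor boxes) is needed; only the per-cell offset and class scores are decoded.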


Notes

  1. https://picterra.ch/.

  2. https://usc-isi-i2.github.io/linked-maps/.

  3. https://livingwithmachines.ac.uk/.

  4. 1:2500 County Series 1st Edition [TIFF geospatial data], Scale 1:2500, Updated: 30 November 2010, Historic, Using: EDINA Historic Digimap Service, https://digimap.edina.ac.uk. Downloaded: 2015–2022. © Crown Copyright and Landmark Information Group Limited 2023. All rights reserved. 1890–1893.


Acknowledgments

This research was conducted as part of the Future Of UK Treescapes project ’Branching Out: New routes to valuing urban treescapes’, funded by UK Research and Innovation [Grant Number: NE/V020846/1].

Author information

Correspondence to William A. P. Smith.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Smith, W.A.P., Pillatt, T. (2023). You Only Look for a Symbol Once: An Object Detector for Symbols and Regions in Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_14

  • DOI: https://doi.org/10.1007/978-3-031-41734-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41733-7

  • Online ISBN: 978-3-031-41734-4

  • eBook Packages: Computer Science, Computer Science (R0)
