Abstract
We present YOLSO, a single-stage object detector specialised for detecting fixed-size, non-uniform (e.g. hand-drawn or stamped) symbols in maps and other historical documents. Like YOLO, a single convolutional neural network predicts class probabilities and bounding boxes over a grid, exploiting the context surrounding an object of interest. However, our specialised approach differs from YOLO in several ways. Because we can assume that symbols have a fixed scale, we need only predict bounding box centres, not dimensions. We can design the grid size and the receptive field of a grid cell to suit the known scale of the symbols. Since maps have no meaningful boundary, we use a fully convolutional architecture that is applicable to any resolution, and we avoid introducing unwanted boundary dependency by using no padding. We extend the method to also perform coarse segmentation of regions indicated by symbols, using the same single architecture. We evaluate our approach on the task of detecting symbols denoting free-standing trees and wooded regions in first-edition Ordnance Survey maps, and we make the corresponding dataset and our implementation publicly available.
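To make the relationship between an unpadded, fully convolutional network and its output grid concrete, the following is a minimal sketch of the standard receptive-field arithmetic. The layer list is a hypothetical example chosen for illustration and is not the YOLSO architecture; the function names are likewise assumptions.

```python
from typing import List, Tuple

# Each layer is (kernel_size, stride); no layer uses padding.
Layer = Tuple[int, int]


def grid_cell_geometry(layers: List[Layer]) -> Tuple[int, int]:
    """Return (receptive_field, cell_spacing) in input pixels for one output cell."""
    receptive_field, spacing = 1, 1
    for kernel, stride in layers:
        # Each layer widens the field by (kernel - 1) steps of the current spacing.
        receptive_field += (kernel - 1) * spacing
        spacing *= stride
    return receptive_field, spacing


def output_grid_size(input_size: int, layers: List[Layer]) -> int:
    """Number of output cells along one axis for an unpadded layer stack."""
    size = input_size
    for kernel, stride in layers:
        if size < kernel:
            raise ValueError("input is smaller than the receptive field")
        size = (size - kernel) // stride + 1
    return size


def cell_centre(index: int, receptive_field: int, spacing: int) -> float:
    """Input-pixel coordinate of the centre of the region seen by output cell `index`."""
    return index * spacing + (receptive_field - 1) / 2


# Hypothetical stack: 3x3 convolutions interleaved with 2x2, stride-2 pooling.
EXAMPLE_LAYERS: List[Layer] = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]

if __name__ == "__main__":
    rf, spacing = grid_cell_geometry(EXAMPLE_LAYERS)
    print("receptive field:", rf, "cell spacing:", spacing)
    for size in (rf, 100, 1000):
        print(size, "->", output_grid_size(size, EXAMPLE_LAYERS), "cells per axis")
```

Because no layer pads its input, every output cell sees only real image content, and the same stack can be applied to a tile of any size at least as large as the receptive field. Under these assumptions, one would choose the layers so that the receptive field comfortably covers a symbol of the known scale plus some surrounding context.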
Notes
1:2500 County Series 1st Edition [TIFF geospatial data], Scale 1:2500, Updated: 30 November 2010, Historic, Using: EDINA Historic Digimap Service, https://digimap.edina.ac.uk. Downloaded: 2015–2022. © Crown Copyright and Landmark Information Group Limited 2023. All rights reserved. 1890–1893.
Acknowledgments
This research was conducted as part of the Future of UK Treescapes project ‘Branching Out: New routes to valuing urban treescapes’, funded by UK Research and Innovation [Grant Number: NE/V020846/1].
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Smith, W.A.P., Pillatt, T. (2023). You Only Look for a Symbol Once: An Object Detector for Symbols and Regions in Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_14
DOI: https://doi.org/10.1007/978-3-031-41734-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science, Computer Science (R0)