Hierarchical and Multimodal Classification of Images from Soil Remediation Reports

Rysbayeva, Korlan; Giot, Romain; Journet, Nicholas

doi:10.1007/978-3-030-86549-8_11

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12821)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

When soil remediation specialists clean up a new site, they spend a long time manually reviewing digital reports previously written by other experts, looking for information relevant to polluted fields with similar characteristics. Important information lies in tables, graphs, maps, drawings and their associated captions. Experts therefore need to access these content-rich elements quickly, instead of manually scrolling through every page of entire reports. Since this information is multimodal (image and text) and follows a semantically hierarchical structure, we propose a classification algorithm that takes these two constraints into account. In contrast to existing works that use either a multimodal system or a hierarchical classification model, we explore the combination of state-of-the-art methods from multimodal systems (image and text modalities) and hierarchical classification systems. With this combination, we tackle the constraints of our classification process: a small dataset, missing modalities, noisy data, and a non-English corpus. Our evaluation shows that the multimodal hierarchical system outperforms the unimodal one, and that a multimodal system jointly combining hierarchical and flat classification on different modalities provides promising results.
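
The general idea of combining two modalities with a class hierarchy can be illustrated with a minimal sketch. It is not the authors' system: the two-level hierarchy, the class names, the probability values, and the choice of a simple mean rule for fusion are all assumptions made for illustration. The sketch averages leaf-class probabilities over whichever modalities are present (so a missing caption or image is tolerated), scores each parent class by summing its leaves, and then picks the best leaf under the best parent.

```python
from typing import Dict, List, Optional, Tuple

Probs = Dict[str, float]

# Hypothetical two-level hierarchy (parent -> leaf classes); not from the paper.
HIERARCHY: Dict[str, List[str]] = {
    "table": ["results_table", "summary_table"],
    "map": ["site_map", "geological_map"],
}


def fuse(image_probs: Optional[Probs], text_probs: Optional[Probs]) -> Probs:
    """Average leaf probabilities over the modalities that are present."""
    available = [p for p in (image_probs, text_probs) if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    leaves = [leaf for children in HIERARCHY.values() for leaf in children]
    return {
        leaf: sum(p.get(leaf, 0.0) for p in available) / len(available)
        for leaf in leaves
    }


def predict(image_probs: Optional[Probs], text_probs: Optional[Probs]) -> Tuple[str, str]:
    """Return (parent, leaf): best parent first, then best leaf under it."""
    fused = fuse(image_probs, text_probs)
    # A parent's score is the sum of the fused probabilities of its leaves.
    parent_scores = {
        parent: sum(fused[leaf] for leaf in children)
        for parent, children in HIERARCHY.items()
    }
    parent = max(parent_scores, key=parent_scores.get)
    leaf = max(HIERARCHY[parent], key=fused.get)
    return parent, leaf


# Made-up classifier outputs for one extracted figure and its caption.
image = {"results_table": 0.5, "summary_table": 0.1, "site_map": 0.25, "geological_map": 0.15}
text = {"results_table": 0.1, "summary_table": 0.1, "site_map": 0.6, "geological_map": 0.2}

print(predict(image, text))  # both modalities available
print(predict(image, None))  # caption missing: image-only fallback
```

In this toy example the caption shifts the decision from the "table" branch to the "map" branch, which is the kind of effect a multimodal system is meant to capture. A real system would replace the hand-written probabilities with the outputs of trained image and text classifiers.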

This work is supported by the Abai-Verne scholarship and the Innovasol Consortium.

Author information

Authors and Affiliations

LaBRI, UMR5800, Univ. Bordeaux, Bordeaux INP, CNRS, 33400, Talence, France
Korlan Rysbayeva, Romain Giot & Nicholas Journet

Corresponding author

Correspondence to Korlan Rysbayeva.

Editor information

Editors and Affiliations

Universitat Autònoma de Barcelona, Barcelona, Spain
Josep Lladós
Lehigh University, Bethlehem, PA, USA
Daniel Lopresti
Kyushu University, Fukuoka-shi, Japan
Seiichi Uchida

About this paper

Cite this paper

Rysbayeva, K., Giot, R., Journet, N. (2021). Hierarchical and Multimodal Classification of Images from Soil Remediation Reports. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_11

DOI: https://doi.org/10.1007/978-3-030-86549-8_11
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86548-1
Online ISBN: 978-3-030-86549-8
eBook Packages: Computer Science, Computer Science (R0)