Abstract
When soil remediation specialists take on a new site, they spend a long time manually reviewing digital reports previously written by other experts, searching for relevant information on polluted fields with similar characteristics. Important information lies in tables, graphs, maps, drawings and their associated captions. Experts therefore need quick access to these content-rich elements, instead of manually scrolling through every page of entire reports. Since this information is multimodal (image and text) and follows a semantically hierarchical structure, we propose a classification algorithm that takes both of these properties into account. In contrast to existing works that use either a multimodal system or a hierarchical classification model, we explore the combination of state-of-the-art methods from multimodal systems (image and text modalities) and hierarchical classification systems. With this combination, we tackle the constraints of our classification process: a small dataset, missing modalities, noisy data, and a non-English corpus. Our evaluation shows that the multimodal hierarchical system outperforms the unimodal one, and that a multimodal system that jointly combines hierarchical and flat classification on different modalities provides promising results.
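To make the idea of combining modalities with a label hierarchy concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that each modality (image, caption text) already yields class probabilities, that a missing modality is represented by `None`, and that fusion is a simple average over the available modalities. The two-level hierarchy, the label names, and the functions `fuse` and `predict_hierarchical` are hypothetical and introduced only for illustration.

```python
from typing import Dict, Optional, Tuple

Probs = Dict[str, float]

# Hypothetical two-level hierarchy (coarse label -> fine labels); not from the paper.
HIERARCHY = {
    "table": ["data_table", "legend_table"],
    "map": ["site_map", "contamination_map"],
}


def fuse(image_probs: Optional[Probs], text_probs: Optional[Probs]) -> Probs:
    """Average class probabilities over the modalities that are present.

    Averaging is an assumption made for this sketch; a missing modality
    (None) is simply left out of the average.
    """
    available = [p for p in (image_probs, text_probs) if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    labels = available[0].keys()
    return {
        label: sum(p[label] for p in available) / len(available)
        for label in labels
    }


def predict_hierarchical(coarse: Probs, fine: Probs) -> Tuple[str, str]:
    """Pick the best coarse label, then the best fine label among its children."""
    parent = max(coarse, key=lambda label: coarse[label])
    child = max(HIERARCHY[parent], key=lambda label: fine[label])
    return parent, child


# Coarse level: the caption is missing, so only the image modality is used.
coarse = fuse({"table": 0.7, "map": 0.3}, None)

# Fine level: both modalities are available and are averaged.
fine = fuse(
    {"data_table": 0.3, "legend_table": 0.1, "site_map": 0.1, "contamination_map": 0.5},
    {"data_table": 0.3, "legend_table": 0.1, "site_map": 0.2, "contamination_map": 0.4},
)

print(predict_hierarchical(coarse, fine))
```

In this toy example a flat decision over the fine labels alone would select `contamination_map`, whereas the hierarchical decision first commits to the coarse label `table` and then chooses among its children only. This is the kind of interaction between hierarchy and modality fusion that the abstract refers to.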
This work is supported by the Abai-Verne scholarship and the Innovasol Consortium.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Rysbayeva, K., Giot, R., Journet, N. (2021). Hierarchical and Multimodal Classification of Images from Soil Remediation Reports. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_11
DOI: https://doi.org/10.1007/978-3-030-86549-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86548-1
Online ISBN: 978-3-030-86549-8
eBook Packages: Computer Science, Computer Science (R0)