Hierarchical and Multimodal Classification of Images from Soil Remediation Reports

  • Conference paper
Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12821)

Abstract

When soil remediation specialists begin cleaning up a new site, they spend a long time manually reviewing digital reports previously written by other experts, searching for relevant information about polluted fields with similar characteristics. Important information lies in tables, graphs, maps, drawings, and their associated captions. Experts therefore need quick access to these content-rich elements, rather than scrolling manually through every page of each report. Since this information is multimodal (image and text) and follows a semantically hierarchical structure, we propose a classification algorithm that takes both constraints into account. In contrast to existing works, which use either a multimodal system or a hierarchical classification model, we explore the combination of state-of-the-art methods from multimodal systems (image and text modalities) and hierarchical classification systems. With this combination, we address the constraints of our classification process: a small dataset, missing modalities, noisy data, and a non-English corpus. Our evaluation shows that the multimodal hierarchical system outperforms the unimodal one, and that a multimodal system jointly combining hierarchical classification and flat classification on different modalities provides promising results.

This work is supported by the Abai-Verne scholarship and the Innovasol Consortium.
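
To make the idea in the abstract more concrete, the following is a minimal sketch of how per-modality class probabilities might be fused when one modality is missing, and how a parent-level decision can constrain the child-level decision. It is not the authors' implementation: the label hierarchy, the function names `fuse` and `predict_hierarchical`, and the simple averaging rule are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

Probs = Dict[str, float]

# Hypothetical two-level label hierarchy (illustrative names, not from the paper).
HIERARCHY: Dict[str, List[str]] = {
    "table": ["table_analysis", "table_other"],
    "figure": ["map", "graph", "drawing"],
}


def fuse(image_probs: Optional[Probs], text_probs: Optional[Probs]) -> Probs:
    """Average class probabilities over whichever modalities are present.

    Assumption: plain averaging (late fusion). A missing modality is passed
    as None and is simply left out of the average.
    """
    available = [p for p in (image_probs, text_probs) if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    labels = available[0].keys()
    return {
        label: sum(p[label] for p in available) / len(available)
        for label in labels
    }


def predict_hierarchical(parent_probs: Probs, child_probs: Probs) -> Tuple[str, str]:
    """Pick the best parent, then the best child among that parent's children."""
    parent = max(parent_probs, key=lambda label: parent_probs[label])
    child = max(HIERARCHY[parent], key=lambda label: child_probs[label])
    return parent, child


if __name__ == "__main__":
    # Parent level: both modalities are available.
    parent = fuse({"table": 0.4, "figure": 0.6}, {"table": 0.2, "figure": 0.8})
    # Child level: the caption (text modality) is missing, so only the image is used.
    child = fuse(
        {"table_analysis": 0.05, "table_other": 0.5,
         "map": 0.3, "graph": 0.1, "drawing": 0.05},
        None,
    )
    print(predict_hierarchical(parent, child))
```

In this example a flat classifier over the child labels would pick `table_other`, while the top-down rule first commits to `figure` and therefore returns `map`. That is the kind of difference between flat and hierarchical decisions the abstract refers to.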

Author information

Corresponding author

Correspondence to Korlan Rysbayeva.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Rysbayeva, K., Giot, R., Journet, N. (2021). Hierarchical and Multimodal Classification of Images from Soil Remediation Reports. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_11

  • DOI: https://doi.org/10.1007/978-3-030-86549-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86548-1

  • Online ISBN: 978-3-030-86549-8

  • eBook Packages: Computer Science, Computer Science (R0)
