
Multimodal deep learning based on multiple correspondence analysis for disaster management

Published in: World Wide Web

Abstract

The fast and explosive growth of digital data on social media and the World Wide Web has created numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted considerable attention in recent years because of their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to effectively analyze the audio-visual modalities in video clips. Thereafter, the results of both models are integrated by the proposed fusion model, which is based on the Multiple Correspondence Analysis (MCA) algorithm and considers the correlations between data modalities and the final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single-modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model over the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
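The decision-level fusion described above can be illustrated with a minimal sketch. This is not the paper's actual MCA implementation: the function `mca_style_fusion` and the per-class weights are hypothetical stand-ins, where the weights play the role of the MCA-derived correlations between each modality and the final classes, and the score matrices stand in for the softmax outputs of the audio and visual networks.

```python
import numpy as np

def mca_style_fusion(audio_scores, visual_scores, audio_w, visual_w):
    """Weighted late fusion of two modalities' class scores.

    audio_scores, visual_scores: (n_samples, n_classes) score matrices.
    audio_w, visual_w: (n_classes,) per-class reliability weights,
    e.g. derived from modality-class correlations on validation data.
    """
    # Weight each modality's scores class by class, then combine.
    fused = audio_scores * audio_w + visual_scores * visual_w
    # Renormalize rows so the fused scores sum to 1 per sample.
    return fused / fused.sum(axis=1, keepdims=True)

# Toy example with two video clips and three disaster classes.
audio = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])
visual = np.array([[0.2, 0.2, 0.6],
                   [0.1, 0.8, 0.1]])
# Assume (hypothetically) the visual model is more reliable
# for classes 1 and 2, the audio model for class 0.
w_audio = np.array([0.7, 0.4, 0.4])
w_visual = np.array([0.3, 0.6, 0.6])

fused = mca_style_fusion(audio, visual, w_audio, w_visual)
print(fused.argmax(axis=1))  # → [0 1]
```

The design point is that the weights let a modality dominate only for the classes it correlates with, rather than applying one global weight per modality as plain averaging would.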


Notes

  1. Available at https://github.com/Breakthrough/PySceneDetect



Acknowledgments

This research is partially supported by NSF CNS-1461926.

Author information

Corresponding author

Correspondence to Samira Pouyanfar.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Big Data for Effective Disaster Management

Guest Editors: Xuan Song, Song Guo, and Haizhong Wang


About this article


Cite this article

Pouyanfar, S., Tao, Y., Tian, H. et al. Multimodal deep learning based on multiple correspondence analysis for disaster management. World Wide Web 22, 1893–1911 (2019). https://doi.org/10.1007/s11280-018-0636-4


