
Multimodal deep learning based on multiple correspondence analysis for disaster management

Published in: World Wide Web

Abstract

The fast and explosive growth of digital data on social media and the World Wide Web has created numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted considerable attention in recent years because of their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to effectively analyze the audio-visual modalities in video clips. Thereafter, the results of both models are integrated by the proposed fusion model, which is based on the Multiple Correspondence Analysis (MCA) algorithm and considers the correlations between data modalities and the final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single-modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model over the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
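The decision-level fusion described above can be illustrated with a minimal sketch. This is not the paper's actual MCA implementation: the function `mca_style_fusion` and the per-class weights are hypothetical stand-ins, where the weights play the role of the MCA-derived correlations between each modality and the final classes, and the score matrices stand in for the softmax outputs of the audio and visual networks.

```python
import numpy as np

def mca_style_fusion(audio_scores, visual_scores, audio_w, visual_w):
    """Weighted late fusion of two modalities' class scores.

    audio_scores, visual_scores: (n_samples, n_classes) score matrices.
    audio_w, visual_w: (n_classes,) per-class reliability weights,
    e.g. derived from modality-class correlations on validation data.
    """
    # Weight each modality's scores class by class, then combine.
    fused = audio_scores * audio_w + visual_scores * visual_w
    # Renormalize rows so the fused scores sum to 1 per sample.
    return fused / fused.sum(axis=1, keepdims=True)

# Toy example with two video clips and three disaster classes.
audio = np.array([[0.6, 0.3, 0.1],
                  [0.2, 0.5, 0.3]])
visual = np.array([[0.2, 0.2, 0.6],
                   [0.1, 0.8, 0.1]])
# Assume (hypothetically) the visual model is more reliable
# for classes 1 and 2, the audio model for class 0.
w_audio = np.array([0.7, 0.4, 0.4])
w_visual = np.array([0.3, 0.6, 0.6])

fused = mca_style_fusion(audio, visual, w_audio, w_visual)
print(fused.argmax(axis=1))  # → [0 1]
```

The design point is that the weights let a modality dominate only for the classes it correlates with, rather than applying one global weight per modality as plain averaging would.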


Notes

  1. Available at https://github.com/Breakthrough/PySceneDetect



Acknowledgments

This research is partially supported by NSF CNS-1461926.

Author information

Corresponding author

Correspondence to Samira Pouyanfar.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Big Data for Effective Disaster Management

Guest Editors: Xuan Song, Song Guo, and Haizhong Wang


About this article


Cite this article

Pouyanfar, S., Tao, Y., Tian, H. et al. Multimodal deep learning based on multiple correspondence analysis for disaster management. World Wide Web 22, 1893–1911 (2019). https://doi.org/10.1007/s11280-018-0636-4


