Context-driven Multi-stream LSTM (M-LSTM) for Recognizing Fine-Grained Activity of Drivers

  • Conference paper
  • First Online:
Pattern Recognition (GCPR 2018)

Abstract

Automatic recognition of in-vehicle activities has a significant impact on next-generation intelligent vehicles. In this paper, we present a novel Multi-stream Long Short-Term Memory (M-LSTM) network for recognizing driver activities. We bring together ideas from recent work on LSTMs and transfer learning for object detection and body pose, exploring the use of deep convolutional neural networks (CNNs). Recent work has also shown that representations such as hand-object interactions are important cues for characterizing human activities. The proposed M-LSTM integrates these ideas under one framework, in which two streams focus on appearance information at two different levels of abstraction, while the other two streams analyze contextual information involving the configuration of body parts and body-object interactions. The proposed contextual descriptor is built to be semantically rich and meaningful, and when coupled with appearance features it turns out to be highly discriminative. We validate this on two challenging datasets consisting of driver activities.
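The abstract outlines a four-stream design: two appearance streams at different levels of abstraction and two contextual streams (body-part configuration and body-object interactions), fused within the M-LSTM. As a rough illustration of how per-stream LSTMs with late fusion could be wired, the sketch below uses PyTorch; the stream dimensionalities, hidden size, class count and concatenation-based fusion are assumptions for illustration only and do not reflect the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MultiStreamLSTM(nn.Module):
    """Late-fusion multi-stream LSTM sketch (illustrative, not the paper's code)."""

    def __init__(self, stream_dims=(4096, 4096, 36, 128),
                 hidden_size=256, num_classes=10):
        super().__init__()
        # One LSTM per stream: two appearance streams (CNN features at two
        # abstraction levels) and two contextual streams (body-part
        # configuration and body-object interaction descriptors).
        self.lstms = nn.ModuleList(
            [nn.LSTM(d, hidden_size, batch_first=True) for d in stream_dims]
        )
        self.classifier = nn.Linear(hidden_size * len(stream_dims), num_classes)

    def forward(self, streams):
        # streams: list of tensors, each shaped (batch, time, stream_dim)
        finals = []
        for x, lstm in zip(streams, self.lstms):
            _, (h_n, _) = lstm(x)         # h_n: (num_layers, batch, hidden)
            finals.append(h_n[-1])        # final hidden state of each stream
        fused = torch.cat(finals, dim=1)  # late fusion by concatenation
        return self.classifier(fused)


# Example: a batch of 2 clips, 16 frames each, with the assumed feature sizes.
dims = (4096, 4096, 36, 128)
model = MultiStreamLSTM(stream_dims=dims)
clips = [torch.randn(2, 16, d) for d in dims]
logits = model(clips)                     # -> shape (2, 10)
```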



Acknowledgments

This research was supported by Edge Hill University's Research Investment Fund (RIF). We would like to thank Taylor Smith at State Farm Corporation for providing information about their dataset. The GPU used in this research was generously donated by NVIDIA Corporation.

Author information


Corresponding author

Correspondence to Ardhendu Behera.


Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (mp4 16870 KB)

Supplementary material 2 (pdf 1242 KB)


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Behera, A., Keidel, A., Debnath, B. (2019). Context-driven Multi-stream LSTM (M-LSTM) for Recognizing Fine-Grained Activity of Drivers. In: Brox, T., Bruhn, A., Fritz, M. (eds.) Pattern Recognition. GCPR 2018. Lecture Notes in Computer Science, vol. 11269. Springer, Cham. https://doi.org/10.1007/978-3-030-12939-2_21

  • DOI: https://doi.org/10.1007/978-3-030-12939-2_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-12938-5

  • Online ISBN: 978-3-030-12939-2

  • eBook Packages: Computer Science (R0)
