The joint detection and classification model for spatiotemporal action localization of primates in a group

  • Original Article
  • Neural Computing and Applications

Abstract

Analysis of primate behavior is crucial for neuroscience research and drug evaluation. Although many methods for automatically recording animal behavior have been proposed, none meets the requirements of both speed and accuracy. To achieve real-time, high-precision automatic recording of primate behavior, we propose a novel Joint Detection and Classification (JDC) model that predicts the location, identity, and actions of monkeys simultaneously. Unlike the existing complex non-end-to-end models, ours is the first end-to-end method in this field. To explore how to fuse spatiotemporal information efficiently, we constructed JDC variants that use either a single frame or one of several fusion approaches. In addition, we collected a new dataset, Spatiotemporal Action Localization of Monkeys in a Group (SALMG), the first to contain the location, identity, and actions of monkeys in a group. The JDC model with the middle fusion approach (JDC-MF) outperforms all compared methods on the SALMG dataset: its F1 score is 81.4%, which is 15.3% and 10.6% higher than that of the Separate Detection and Classification model and the two-stage model, respectively. Moreover, JDC-MF achieves the highest accuracy, 99.1%, on the public Pig Novelty Preference Behavior dataset, 4.1% higher than the second-best model. JDC-MF takes only 0.027 s per clip on an Nvidia GeForce RTX 2080 Ti. Therefore, JDC-MF can realize real-time, high-precision spatiotemporal action localization of monkeys and provides an effective reference scheme for the automatic recording and analysis of animal behavior. Code has been made available at: https://github.com/Kewei-Liang/JDC-MF.
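To make the abstract's "middle fusion" concrete, below is a minimal, hypothetical PyTorch sketch of a joint detection-and-classification network: per-frame spatial features are extracted first, the temporal axis of the clip is fused in the middle of the network, and a single shared head then predicts bounding boxes, identities, and actions in one forward pass. Every module name, channel width, and class count here is an illustrative assumption, not the authors' architecture; the released code linked above is authoritative.

```python
# Hypothetical sketch of joint detection + classification with middle fusion.
# Sizes and names are illustrative assumptions, not the authors' JDC-MF model.
import torch
import torch.nn as nn

class JointDetClsSketch(nn.Module):
    def __init__(self, n_ids=4, n_actions=8, n_anchors=3, clip_len=8):
        super().__init__()
        # Shared 2D encoder applied to every frame of the clip.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Middle fusion: a 3D conv collapses the temporal axis after
        # spatial features exist but before the prediction head.
        self.temporal_fuse = nn.Conv3d(64, 64, kernel_size=(clip_len, 1, 1))
        # One head predicts, per anchor cell: 4 box coords + 1 objectness
        # score + identity logits + action logits (joint, end-to-end).
        self.head = nn.Conv2d(64, n_anchors * (5 + n_ids + n_actions), 1)

    def forward(self, clip):  # clip: (B, T, 3, H, W)
        b, t, c, h, w = clip.shape
        feats = self.frame_encoder(clip.reshape(b * t, c, h, w))
        _, ch, fh, fw = feats.shape
        feats = feats.reshape(b, t, ch, fh, fw).permute(0, 2, 1, 3, 4)
        fused = self.temporal_fuse(feats).squeeze(2)   # (B, C, H', W')
        return self.head(fused)  # joint box/identity/action prediction map

clip = torch.randn(1, 8, 3, 224, 224)   # one 8-frame clip
out = JointDetClsSketch()(clip)
print(out.shape)                        # torch.Size([1, 51, 56, 56])
```

Fusing here, after spatial encoding but before the head, is what distinguishes middle fusion from early fusion (stacking frames at the input) and late fusion (merging per-frame predictions), and it keeps the whole pipeline trainable end to end with a single loss.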

Data availability

The SALMG dataset has been made available at: https://drive.google.com/drive/folders/16ZCkv0j1r26eRNydXYFAtr_fKp-ywIJd?usp=share_link.

Acknowledgments

This work was supported in part by the Chinese National Natural Science Foundation Projects (#82090051, #81871442) and in part by the Youth Innovation Promotion Association CAS (#Y201930). We thank JOINN Laboratories Co., Ltd for providing the experimental site and equipment, and Chenlu Jie, Wenxuan Fan, Jiawei Huang, and others for their contributions to data annotation.

Author information

Corresponding author

Correspondence to Xibo Ma.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethical approval

Our experiment did not interfere with the monkeys' normal lives and was reviewed and approved by the Animal Management and Use Committee of JOINN Laboratories Co., Ltd.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below are the links to the electronic supplementary material.

Supplementary file 1 (pdf 721 KB)

Supplementary file 2 (mp4 45370 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liang, K., Chen, Z., Yang, S. et al. The joint detection and classification model for spatiotemporal action localization of primates in a group. Neural Comput & Applic 35, 18471–18486 (2023). https://doi.org/10.1007/s00521-023-08670-2

