Stagemix video generation using face and body keypoints detection

Multimedia Tools and Applications

Abstract

A Stagemix video plays multiple stage performances of a particular singer as if they were a single video. The consumption of video media has increased recently, and with it the demand for video editing. Stagemix videos have gained popularity in various communities, and a growing number of YouTubers upload videos with such cross-cuts. In this work, we introduce a novel task, Stagemix video generation. Producing a Stagemix video by hand requires considerable time and editing skill. To address this, we propose methods that generate a Stagemix video automatically. Our methods improve performance by using face or body keypoints, which are extracted by a CNN-based extractor. Quantitative measurements of the differences between frames and of the creation time show that our methods effectively produce a natural video.
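The abstract does not spell out how keypoints are used, so the following is only a minimal sketch of one plausible reading: cut from one stage video to another at the frame where the singer's keypoints in the two videos are most similar. It assumes the keypoints have already been extracted (by any face or body keypoint detector), that both videos are time-aligned to the same song, and that coordinates are normalized to [0, 1]. The function names `keypoint_distance`, `best_transition`, and `frame_plan` are hypothetical and are not taken from the paper.

```python
import math


def keypoint_distance(a, b):
    """Mean Euclidean distance between corresponding (x, y) keypoints."""
    if not a or len(a) != len(b):
        raise ValueError("keypoint lists must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def best_transition(track_a, track_b):
    """Return (frame_index, distance) where the two videos agree best.

    track_a and track_b hold one keypoint list per frame and are assumed
    to be time-aligned, so frame i in both shows the same moment of the song.
    """
    if not track_a or len(track_a) != len(track_b):
        raise ValueError("tracks must be non-empty and of equal length")
    distances = [keypoint_distance(a, b) for a, b in zip(track_a, track_b)]
    index = min(range(len(distances)), key=distances.__getitem__)
    return index, distances[index]


def frame_plan(track_a, track_b):
    """Label each output frame with its source: video A before the cut, B after."""
    cut, _ = best_transition(track_a, track_b)
    return ["A"] * cut + ["B"] * (len(track_a) - cut)


if __name__ == "__main__":
    # Toy data (assumed, not from the paper): three frames, two keypoints each.
    video_a = [[(0.2, 0.2), (0.4, 0.2)], [(0.5, 0.5), (0.6, 0.5)], [(0.8, 0.8), (0.9, 0.8)]]
    video_b = [[(0.7, 0.7), (0.9, 0.7)], [(0.5, 0.5), (0.6, 0.5)], [(0.1, 0.1), (0.2, 0.1)]]
    print(best_transition(video_a, video_b))
    print(frame_plan(video_a, video_b))
```

Choosing the minimum-distance frame keeps the singer's pose and position as consistent as possible across the cut, which is one way a transition could be made to look natural; the published method may differ in how it scores or selects transitions.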

Author information

Corresponding author

Correspondence to Junseok Kwon.

Cite this article

Jung, M., Lee, S., Sim, E.S. et al. Stagemix video generation using face and body keypoints detection. Multimed Tools Appl 81, 38531–38542 (2022). https://doi.org/10.1007/s11042-022-13103-8

  • DOI: https://doi.org/10.1007/s11042-022-13103-8
