Abstract
Intelligent camera management systems have been developed to automatically record meetings for videoconferencing. These systems provide many benefits, such as reducing production cost and conveniently documenting events. However, automatically recorded videos are generally not visually engaging. This paper presents a novel approach that intelligently controls camera shots and angles to improve visual interest. We use 3D infrared images captured by a Kinect sensor to recognize active speakers and their positions in a meeting. A movable camera, constructed by placing a wireless PTZ (pan-tilt-zoom) camera on top of a motorized rail, can automatically reposition itself to frame an active speaker in the center of the screen. Without interrupting the meeting, a speaker can seamlessly switch video sources through gesture-based commands. We have summarized and implemented a set of heuristic rules to simulate a human director. These rules can be visually edited through a graphical user interface. This customization of the virtual director makes our system applicable in various scenarios. We conducted a user study, and the evaluation results confirmed the quality of the automatically produced videos.
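The abstract does not detail the heuristic director rules, but priority-ordered rules of this kind are commonly encoded as an ordered sequence of condition-action checks. The following is a minimal sketch of that idea only; the rule set, shot names, speaker-duration threshold, and `MeetingState` fields are all hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MeetingState:
    """Snapshot of cues a sensor pipeline (e.g., a Kinect) might report."""
    active_speaker: Optional[str]  # who is talking, if anyone
    speaker_duration: float        # seconds the current speaker has held the floor
    gesture: Optional[str]         # recognized gesture command, if any

def select_shot(state: MeetingState) -> str:
    """Apply hypothetical director heuristics in priority order; return a shot name."""
    # Rule 1: an explicit gesture command overrides everything else,
    # e.g., a speaker switching the video source to a presentation.
    if state.gesture == "switch_source":
        return "presentation"
    # Rule 2: a speaker who has held the floor for a while gets a close-up,
    # framed by the rail-mounted PTZ camera.
    if state.active_speaker and state.speaker_duration > 5.0:
        return f"close_up:{state.active_speaker}"
    # Rule 3: a newly active speaker gets a medium shot while the camera moves.
    if state.active_speaker:
        return f"medium:{state.active_speaker}"
    # Rule 4: nobody is speaking, so fall back to a wide shot of the room.
    return "wide_shot"
```

A GUI-editable rule set, as the paper describes, could be realized by storing such condition-action pairs as data rather than hard-coded branches; the fixed function above is only the simplest illustration of the evaluation order.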
Acknowledgments
We thank the volunteer participants in this study, as well as the anonymous reviewers for their insightful and constructive comments, which significantly improved the presentation. This work is supported in part by NSF under grant CNS-1126570.
Cite this article
Roudaki, A., Kong, J. & Reetz, S. SmartCamera: a low-cost and intelligent camera management system. Multimed Tools Appl 75, 7831–7854 (2016). https://doi.org/10.1007/s11042-015-2700-8