Multi-camera active surveillance of an articulated human form – An implementation strategy

https://doi.org/10.1016/j.cviu.2011.06.006

Abstract

This paper presents an effective framework for real-time reconfiguration of a multi-camera active-vision system performing surveillance of articulated objects with time-varying geometries. The proposed strategy tangibly improves surveillance performance by selecting near-optimal viewpoints along a prediction time horizon to maximize visibility of a human form in the presence of multiple, dynamic obstacles. Controlled experiments have demonstrated a positive relationship between system reconfigurability and surveillance performance.

Highlights

► Multi-camera active-vision system used to improve human action-sensing performance.
► Environment may be cluttered with multiple, dynamic obstacles.
► Selects near-optimal camera poses to maximize visibility of a human subject.
► Generalized framework can be customized to other time-varying-geometry objects.
► Shown to tangibly improve performance in real-world, dynamic environments.

Introduction

Early research on the tracking and recognition of human form and actions used single-camera computer-vision systems and assumed ideal conditions, with un-occluded views of the subject readily available (e.g., [1]). Recent works, however, have proposed the use of multiple (static) cameras to improve upon the sensing objective (e.g., [2], [3]). The prevalent focus of these investigations has been the development of robust recognition algorithms.

An alternative approach to improving recognition has been the use of reconfigurable multi-camera active-vision systems, which can select performance-optimal viewpoints (e.g., [1], [4], [5], [6]). Although active-vision systems have been shown to improve sensing performance in a variety of situations, no framework exists for sensing generic time-varying-geometry (TVG) objects (e.g., the human form) with such systems. Thus, we present herein a novel, agent-based sensing-system reconfiguration strategy for the recognition of a human form moving under real-world conditions. The strategy is designed to be adaptable to a wide variety of real-world TVG object action-sensing tasks, while addressing real-time operation, the presence of multiple static or dynamic (maneuvering) obstacles, differentiated importance of viewpoints, and object self-occlusion.

The proposed on-line strategy also utilizes (i) a new robust, time-constrained, distributed form-recovery method, and (ii) a new method of complete system calibration, designed specifically for reconfigurable active-vision systems, to significantly reduce the error introduced through real-world camera-pose (position and orientation) reproduction.
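
As a minimal illustration of (ii), a pose-reproduction correction can be treated as a rigid transform estimated off-line and composed with each nominal camera pose before that pose is used for reconstruction. The Python sketch below shows only this composition step; the 4x4 homogeneous transforms and all numeric values are hypothetical stand-ins, not the paper's actual calibration procedure.

    import numpy as np

    def apply_correction(nominal_pose, correction):
        """Compose a nominal world-from-camera pose (4x4 homogeneous matrix)
        with a rigid correction recovered during off-line calibration."""
        return correction @ nominal_pose

    # Nominal pose reported by the camera-positioning stage (hypothetical values).
    nominal = np.eye(4)
    nominal[:3, 3] = [2.0, 0.0, 1.5]

    # Small rotation/translation error recovered during calibration (illustrative).
    theta = np.deg2rad(1.2)                       # ~1 degree of orientation error
    correction = np.eye(4)
    correction[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]]
    correction[:3, 3] = [0.01, -0.02, 0.0]        # centimetre-level position error

    print(apply_correction(nominal, correction))  # corrected pose used downstream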

A brief literature review is presented below, prior to a detailed description of the proposed implementation strategy. The review first examines past work on sensing-system reconfiguration for fixed-geometry objects, as they form the basis for most current algorithms for TVG objects. This is followed by a brief summary of several recent reconfiguration methods for the tracking and recognition of articulated objects/subjects.

In static environments, sensing-system reconfiguration methods have commonly been classified as either generate-and-test or synthesis [1]. Generate-and-test methods discretize the sensor-pose domain, for example by using a tessellated sphere or geodesic dome, allowing an efficient search of a small problem space using well-established techniques (e.g., [7]); possible configurations are evaluated with respect to task constraints. Synthesis methods, on the other hand, determine sensor parameters by finding an analytical solution to the task constraints in a high-dimensional space (e.g., [8]). These methods can find an optimal solution by optimizing one or more criteria, such as error or robustness, but often require insight into the causal relations between the parameters. A generate-and-test sketch is given below.
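
The following Python sketch illustrates the generate-and-test pattern: candidate camera positions are sampled on a sphere around the target and scored against simple task constraints, with the best-scoring pose selected. A Fibonacci lattice stands in here for the tessellated sphere or geodesic dome named above, and the constraint terms (mounting height, line-of-sight clearance) are illustrative assumptions, not the criteria of any cited method.

    import numpy as np

    def sphere_candidates(center, radius, n=200):
        """Sample n candidate camera positions on a sphere around `center`."""
        i = np.arange(n)
        phi = np.arccos(1.0 - 2.0 * (i + 0.5) / n)   # polar angle
        theta = np.pi * (1.0 + 5.0 ** 0.5) * i       # golden-angle azimuth
        dirs = np.stack([np.sin(phi) * np.cos(theta),
                         np.sin(phi) * np.sin(theta),
                         np.cos(phi)], axis=1)
        return center + radius * dirs

    def score(cam_pos, target, obstacles, min_height=0.5):
        """Illustrative task-constraint score: reject poses below a mounting
        height; otherwise prefer views whose sight line clears all obstacles."""
        if cam_pos[2] < min_height:
            return -np.inf
        view = target - cam_pos
        view /= np.linalg.norm(view)
        clearance = min(                              # obstacle distance to sight line
            (np.linalg.norm(np.cross(obs - cam_pos, view)) for obs in obstacles),
            default=np.inf,
        )
        return clearance + 0.1 * cam_pos[2]

    target = np.array([0.0, 0.0, 1.0])
    obstacles = [np.array([1.0, 0.5, 1.0])]
    cands = sphere_candidates(target, radius=3.0)
    best = max(cands, key=lambda c: score(c, target, obstacles))
    print("best viewpoint:", best)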

For dynamic environments, most proposed system-reconfiguration algorithms have been extensions of those for static environments. For example, [9] presents a system that discretizes the workspace into sectors and assigns sensors once an object enters a given sector. In [10], attention-based behavior is used to dynamically select a single target for all sensors to focus on at any given time. In [11], [12], multiple mobile sensors are positioned on-line for the surveillance of maneuvering targets in the presence of static obstacles. Multiple dynamic obstacles were considered in [13], where a multi-camera system is used to sense a moving Object of Interest (OoI). Recent approaches have also led to agent-based strategies (e.g., [11], [14], [15], [16]).

The recognition of the actions of even a simple articulated form has proven a difficult task for automated systems, with input-data quality often being the limiting factor in performance [17]. The following review, therefore, focuses on active vision, which may be used to improve the input data. We first examine past work on form recognition with static sensing systems, prior to a discussion of current active methods for TVG objects.

Early efforts in form recognition focused on the identification of a single, static form. Since this requires a database of template poses, most algorithms dealt with reconstructing models for a priori unknown objects (e.g., [18]). Human-gait recognition has also emerged as a central research topic: it is possible to uniquely identify an individual based on their gait (e.g., [19]).

TVG objects typically exhibit specific, repeatable sequences of form (actions) that one may wish to recognize [20]. Pertinent recognition approaches have commonly been categorized into three general classes: template matching, semantic, and statistical. Template-matching approaches compare input images directly to stored templates, with multiple pose matches over time being used to match an action (e.g., [21]). Semantic approaches are model-based: a high-level representation of the target is constructed (e.g., [22]), reducing the dimensionality of the feature-vector matching problem. Statistical approaches combine elements of both, further reducing dimensionality through statistical operations on the template database (e.g., [23]). A template-matching sketch is given below.
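
The following Python sketch illustrates the template-matching class only: each stored action is a sequence of pose-feature vectors, and an observed sequence is matched by dynamic time warping. Feature extraction (joint angles, silhouette moments, etc.) is assumed to happen upstream, and the random "templates" are placeholders, not data from any cited work.

    import numpy as np

    def dtw_distance(a, b):
        """Dynamic-time-warping distance between two feature sequences
        (arrays of shape [frames, features])."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def classify(observed, templates):
        """Return the label of the stored action template closest to the
        observed pose-feature sequence."""
        return min(templates, key=lambda label: dtw_distance(observed, templates[label]))

    rng = np.random.default_rng(0)
    templates = {"walk": rng.normal(size=(30, 8)), "wave": rng.normal(size=(25, 8))}
    observed = templates["walk"] + 0.05 * rng.normal(size=(30, 8))
    print(classify(observed, templates))   # -> "walk"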

Most past form-recognition methods did not consider sensor-pose selection as part of the methodology – input data was taken as fixed, with no opportunity for improvement. Many recent methods continue this trend [2], [3], [24], [25], [26].

Although input data may be improved through active sensing, system reconfiguration is a complex endeavor for TVG objects – for example, viewpoints exhibit differentiated importance by both pose and time when recognizing the actions of articulated objects [27]. Namely, different views of a human subject contain different sub-parts of the overall model (form), which in turn depends on the action (time). If an algorithm already has past input data containing some parts of the overall model, views of the missing parts become relatively more important if they express action-differentiating information, as sketched below.
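
A minimal Python sketch of this importance weighting follows: body parts already captured for the current action contribute little, while views exposing still-missing parts are weighted up. The part names, visibility sets, and weights are all illustrative assumptions, not the paper's actual metric.

    # Hypothetical per-part weights; action-differentiating limbs weighted higher.
    PART_WEIGHT = {"head": 0.5, "torso": 0.5, "left_arm": 1.0,
                   "right_arm": 1.0, "left_leg": 1.0, "right_leg": 1.0}

    def viewpoint_importance(visible_parts, already_observed):
        """Score a candidate view by the weight of parts it would reveal
        that the system has not yet captured for the current action."""
        missing = set(visible_parts) - set(already_observed)
        return sum(PART_WEIGHT.get(p, 0.0) for p in missing)

    observed_so_far = {"head", "torso", "left_arm"}
    front_view = {"head", "torso", "left_arm", "right_arm"}
    side_view = {"torso", "right_arm", "right_leg"}
    print(viewpoint_importance(front_view, observed_so_far))  # 1.0 (right_arm)
    print(viewpoint_importance(side_view, observed_so_far))   # 2.0 (arm + leg)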

Self-occlusion also increases viewpoint differentiation due to sensor pose. An articulated object may self-occlude, and since its geometry varies, these occlusions also change over time. Viewpoint differentiation by time, in turn, results in a need for continuous surveillance: the data being sensed, i.e., an action, is a continuously changing function of time. This invalidates the use of ‘sense-and-forget’ or attention-based behavior.

Most recent reconfiguration methodologies for TVG objects have simply adapted algorithms developed for fixed-geometry objects (e.g., [28]). While such approaches have merit, the problems noted above cannot be directly addressed in this manner. Thus, in this paper, a novel multi-camera active-vision system is proposed to improve form-recognition performance by selecting near-optimal viewpoints. The method seeks to maximize the visibility of a TVG OoI moving in a cluttered, dynamic environment.

Section snippets

Problem formulation

The problem at hand is the recognition of an a priori unknown action of an articulated Object of Interest (OoI) moving on an a priori unknown 6-dof trajectory in a real-world environment. The workspace may be cluttered with multiple, dynamic obstacles, also moving along a priori unknown trajectories, which necessitates trajectory prediction for both the OoI and the obstacles (a minimal predictor is sketched below). As noted above, articulated OoIs may also self-occlude and are subject to differentiated viewpoint importance.
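
The Python sketch below illustrates the prediction step this formulation calls for: both the OoI and each obstacle are extrapolated over the planning horizon so candidate viewpoints can be evaluated against future positions. A constant-velocity model is assumed here purely for illustration; the snippet does not specify the paper's actual predictor.

    import numpy as np

    def predict_horizon(position, velocity, dt, steps):
        """Constant-velocity extrapolation of a track over `steps` future frames."""
        t = dt * np.arange(1, steps + 1)[:, None]
        return position + t * velocity

    # Hypothetical current states for the OoI and one dynamic obstacle.
    ooi_pos, ooi_vel = np.array([0.0, 0.0, 1.0]), np.array([0.3, 0.1, 0.0])
    obs_pos, obs_vel = np.array([2.0, 1.0, 1.0]), np.array([-0.2, 0.0, 0.0])

    horizon_ooi = predict_horizon(ooi_pos, ooi_vel, dt=0.1, steps=5)
    horizon_obs = predict_horizon(obs_pos, obs_vel, dt=0.1, steps=5)
    print(horizon_ooi)   # predicted OoI positions used to rank candidate viewpoints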

Proposed methodology

This section outlines the three primary elements of the proposed multi-camera active-vision implementation strategy: sensing-system reconfiguration, distributed form recovery, and system calibration. The use of an agent-based sensing-system reconfiguration methodology is proposed as part of the implementation strategy presented in this paper. This methodology, as shown in Fig. 2, was first outlined in our previous paper [6]. Thus, this section presents only those pertinent details necessary for completeness; the agent interaction is sketched below.
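
The Python sketch below shows the agent interaction at its coarsest: each sensor agent ranks its reachable poses by a visibility metric, a central-planning agent selects one pose per agent, and a referee agent vetoes assignments that leave aggregate coverage too low. The class layout, metric values, and threshold are illustrative assumptions, not the paper's actual agent design.

    from dataclasses import dataclass

    @dataclass
    class PoseOption:
        pose_id: str
        visibility: float   # sensor agent's visibility-metric evaluation in [0, 1]

    def plan(agents):
        """Central-planning agent: pick each sensor agent's highest-ranked pose."""
        return {name: max(opts, key=lambda o: o.visibility)
                for name, opts in agents.items()}

    def referee_ok(assignment, min_total_visibility=1.0):
        """Referee agent: accept only if aggregate visibility clears a threshold."""
        return sum(o.visibility for o in assignment.values()) >= min_total_visibility

    agents = {
        "cam_1": [PoseOption("p1", 0.8), PoseOption("p2", 0.4)],
        "cam_2": [PoseOption("p3", 0.6), PoseOption("p4", 0.7)],
    }
    assignment = plan(agents)
    if referee_ok(assignment):
        print({k: v.pose_id for k, v in assignment.items()})  # dispatch to cameras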

Experiments

Experiments were performed to demonstrate that the proposed strategy can tangibly improve recognition performance for surveillance of a single time-varying-geometry OoI moving in a dynamic environment. A large number of experiments were conducted, surveilling a walking humanoid analogue in an environment with (i) no obstacles, (ii) static obstacles, and (iii) dynamic obstacles, respectively. The capability of the active-vision system was also varied for all three cases, beginning with (a) no on-line reconfiguration as the baseline.

Conclusions

In this paper, an agent-based sensing-system-reconfiguration methodology is proposed for the surveillance of time-varying-geometry objects. A central-planning agent, using ranked visibility-metric evaluations from multiple sensor agents, selects assignments and poses for a system of sensors tracking a single, dynamic, articulated subject. A referee agent ensures correct aggregate system behavior. Overall, the proposed methodology is capable of tangibly improving form-recognition performance.

References (35)

  • M.D. Naish et al., Coordinated dispatching of proximity sensors for the surveillance of maneuvering targets, J. Robot. Comput. Integr. Manuf. (2003).
  • D. Cunado et al., Automatic extraction and description of human gait models for recognition purposes, Comput. Vision Image Understanding (2003).
  • C. Lee et al., Dynamic shape outlier detection for human locomotion, Comput. Vision Image Understanding (2009).
  • K.A. Tarabanis et al., A survey of sensor planning in computer vision, IEEE Trans. Robot. Autom. (1995).
  • X. Wang et al., Distributed visual-target-surveillance system in wireless sensor networks, IEEE Trans. Syst., Man, Cyber. – Part B: Cyber. (2009).
  • A. Roy, S. Sural, A fuzzy inferencing system for gait recognition, in: Proc. of the 28th North American Fuzzy...
  • M. Mackay, B. Benhabib, A multi-camera active-vision system for dynamic form recognition, in: Intl. Conf. on Computer,...
  • M. Mackay, B. Benhabib, Active-vision system reconfiguration for form recognition in the presence of dynamic obstacles,...
  • M. Mackay et al., Time-varying-geometry object surveillance using a multi-camera active-vision system, Int. J. Smart Sens. Intell. Syst. (2008).
  • S. Sakane et al., Occlusion avoidance of visual sensors based on a hand eye action simulator system: HEAVEN, Adv. Robot. (1987).
  • C.K. Cowan et al., Automated sensor placement from vision task requirements, IEEE Trans. Pattern Anal. Mach. Intell. (1988).
  • B. Horling, R. Vincent, J. Shen, R. Becker, K. Rawlings, V. Lesser, SPT Distributed Sensor Network for Real Time...
  • T. Darrell, I. Essa, A. Pentland, Attention-Driven Expression and Gesture Analysis in an Interactive Environment, Intl....
  • R. Murrieta-Cid et al., A sampling-based motion planning approach to maintain visibility of unpredictable targets, J. Autonomous Robots (2005).
  • A. Bakhtari et al., Active-vision for the autonomous surveillance of dynamic, multi-object environments, J. Intell. Robot. Syst. (2008).
  • J.R. Spletzer et al., Dynamic sensor planning and control for optimally tracking targets, Intl. J. Robot. Res. (2003).
  • M. Kamel, L. Hodge, A coordination mechanism for model-based multi-sensor planning, in: IEEE Intl. Symp. on Intelligent...