research-article

Virtual worlds and active learning for human detection

Authors:
David Vázquez, Computer Vision Center and Computer Science Dpt. UAB, Bellaterra, Spain
Antonio M. López, Computer Vision Center and Computer Science Dpt. UAB, Bellaterra, Spain
Daniel Ponsa, Computer Vision Center and Computer Science Dpt. UAB, Bellaterra, Spain
Javier Marín, Computer Vision Center, Bellaterra, Spain

ICMI '11: Proceedings of the 13th international conference on multimodal interfaces, November 2011, Pages 393–400
https://doi.org/10.1145/2070481.2070556

Published: 14 November 2011

ABSTRACT

Image-based human detection is of paramount interest due to its potential applications in fields such as advanced driver assistance, surveillance and media analysis. However, even detecting non-occluded standing humans remains a subject of intensive research. The most promising human detectors rely on classifiers developed in the discriminative paradigm, i.e. trained with labelled samples. Labelling, however, is a labour-intensive manual step, especially in cases like human detection, where training requires at least bounding boxes framing the humans. To overcome this problem, some authors have proposed using a virtual world in which the labels of the different objects are obtained automatically. The human models (classifiers) are then learnt from the appearance of rendered images, i.e. from realistic computer graphics, and later used for human detection in real-world images. The results of this technique are surprisingly good, but they are not always as good as those of the classical approach of training and testing with data coming from the same camera, or from similar ones. Accordingly, in this paper we address the challenge of using a virtual world to gather (while playing a videogame) a large number of automatically labelled samples (virtual humans and background) and then training a classifier that performs as well on real-world images as one trained in the same way on manually labelled real-world samples. To do so, we cast the problem as one of domain adaptation, which assumes that a small number of manually labelled samples from real-world images is required. To collect these labelled samples we propose a non-standard active learning technique. Ultimately, therefore, our human model is learnt from a combination of virtual-world and real-world labelled samples, which has not been done before.
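The workflow described in the abstract (train on automatically labelled virtual samples, query a small number of real-world labels, retrain on the combination) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the 2-D toy features stand in for real image descriptors, logistic regression stands in for the detector's classifier, the domain gap is modelled as a simple shift, and plain uncertainty sampling stands in for the paper's non-standard active learning technique, whose details the abstract does not give. All function names are hypothetical.

```python
import math
import random


def make_domain(n, shift, rng):
    """Toy 2-D samples; label 1 stands for 'human', 0 for 'background'."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = (1.0 if y == 1 else -1.0) + shift  # shift models the domain gap
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), y))
    return data


def score(w, x):
    """Signed decision value of a linear classifier w = [w0, w1, bias]."""
    return w[0] * x[0] + w[1] * x[1] + w[2]


def train_logreg(samples, epochs=200, lr=0.1):
    """Batch gradient descent on the logistic loss (stand-in classifier)."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in samples:
            z = max(-30.0, min(30.0, score(w, x)))  # clamp to avoid overflow
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad[0] += err * x[0]
            grad[1] += err * x[1]
            grad[2] += err
        w = [wi - lr * gi / len(samples) for wi, gi in zip(w, grad)]
    return w


def accuracy(w, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum((score(w, x) > 0) == (y == 1) for x, y in samples)
    return hits / len(samples)


def select_uncertain(w, pool, budget):
    """Assumed query rule: pick the samples closest to the decision boundary."""
    return sorted(pool, key=lambda s: abs(score(w, s[0])))[:budget]


def adapt(virtual, real_pool, budget):
    """Train on virtual data, query a few real labels, retrain on both."""
    w_virtual = train_logreg(virtual)
    # In practice the pool is unlabelled and a human annotates only the
    # queried samples; here the toy labels are simply revealed for them.
    queried = select_uncertain(w_virtual, real_pool, budget)
    w_combined = train_logreg(virtual + queried)
    return w_virtual, w_combined, queried


if __name__ == "__main__":
    rng = random.Random(0)
    virtual = make_domain(200, 0.0, rng)    # automatically labelled
    real_pool = make_domain(100, 0.8, rng)  # candidates for manual labelling
    real_test = make_domain(200, 0.8, rng)  # held-out real-world data
    w_v, w_c, _ = adapt(virtual, real_pool, budget=10)
    print("virtual only   :", accuracy(w_v, real_test))
    print("virtual + real :", accuracy(w_c, real_test))
```

The point of the sketch is the data flow rather than the numbers: only `budget` real samples ever need a manual label, while the bulk of the training set comes from the virtual domain at no labelling cost.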

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems → Object recognition

Published in
ICMI '11: Proceedings of the 13th international conference on multimodal interfaces
November 2011
432 pages
ISBN:9781450306416
DOI:10.1145/2070481
General Chairs:
Hervé Bourlard, Idiap Research Institute, Switzerland
Thomas S. Huang, University of Illinois, USA
Enrique Vidal, Universitat Politécnica Valéncia, Spain
Program Chairs:
Daniel Gatica-Perez, Idiap Research Institute, Switzerland
Louis-Philippe Morency, University of Southern California, USA
Nicu Sebe, University of Trento, Italy
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
active learning
human detection
virtual world
Acceptance Rates
Overall Acceptance Rate: 453 of 1,080 submissions, 42%