ABSTRACT
Image-based human detection is of paramount interest due to its potential applications in fields such as advanced driving assistance, surveillance and media analysis. However, even detecting non-occluded standing humans remains a subject of intensive research. The most promising human detectors rely on classifiers developed in the discriminative paradigm, i.e. trained with labelled samples. Labelling, however, is a labour-intensive manual step, especially in cases like human detection, where at least the bounding boxes framing the humans must be provided for training. To overcome this problem, some authors have proposed using a virtual world in which the labels of the different objects are obtained automatically. The human models (classifiers) are then learnt from the appearance of rendered images, i.e. from realistic computer graphics, and later used for human detection in real-world images. The results of this technique are surprisingly good, but they are not always as good as those of the classical approach of training and testing with data coming from the same camera, or from similar ones. Accordingly, in this paper we address the challenge of using a virtual world to gather (while playing a videogame) a large number of automatically labelled samples (virtual humans and background) and then training a classifier that performs as well on real-world images as one trained in the same way from manually labelled real-world samples. To do so, we cast the problem as one of domain adaptation, and we assume that a small number of manually labelled samples from real-world images is required. To collect these labelled samples we propose a non-standard active learning technique. Ultimately, our human model is therefore learnt from a combination of virtual-world and real-world labelled samples, which has not been done before.
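The pipeline outlined in the abstract (train on virtual samples, ask for labels on a few real samples, retrain on the combination) can be illustrated with a toy sketch. This is not the authors' method: the abstract does not specify their non-standard active learning technique, so plain uncertainty sampling is swapped in here, 2-D Gaussian points stand in for image features, and a logistic regression stands in for the human classifier. All function names, the domain shift, and the budget of 20 labels are assumptions made for illustration.

```python
import numpy as np


def make_domain(rng, n, shift):
    """Two Gaussian classes in 2-D; `shift` moves the whole domain (assumed toy data)."""
    y = rng.integers(0, 2, size=n)
    centers = np.where(y[:, None] == 1, 1.5, -1.5)  # class means on the diagonal
    X = centers + rng.normal(size=(n, 2)) + shift
    return X, y


def train_logreg(X, y, epochs=500, lr=0.1):
    """Batch gradient descent on the logistic loss (stand-in classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def select_uncertain(probs, k):
    """Indices of the k samples whose score is closest to 0.5 (uncertainty sampling)."""
    return np.argsort(np.abs(probs - 0.5))[:k]


def accuracy(w, X, y):
    return float(np.mean((predict_proba(w, X) >= 0.5) == y))


rng = np.random.default_rng(0)
# "Virtual" domain: labels come for free. "Real" domain: same classes, shifted features.
X_virtual, y_virtual = make_domain(rng, 400, shift=np.array([0.0, 0.0]))
X_pool, y_pool = make_domain(rng, 400, shift=np.array([2.0, -1.0]))
X_test, y_test = make_domain(rng, 400, shift=np.array([2.0, -1.0]))

# Step 1: train on virtual samples only.
w_virtual = train_logreg(X_virtual, y_virtual)

# Step 2: pick a small budget of real samples to label (labels simulated by y_pool).
picked = select_uncertain(predict_proba(w_virtual, X_pool), k=20)

# Step 3: retrain on the combined virtual + real training set.
X_mixed = np.vstack([X_virtual, X_pool[picked]])
y_mixed = np.concatenate([y_virtual, y_pool[picked]])
w_mixed = train_logreg(X_mixed, y_mixed)

print("virtual only:     ", accuracy(w_virtual, X_test, y_test))
print("virtual + 20 real:", accuracy(w_mixed, X_test, y_test))
```

The two printed accuracies compare the virtual-only model with the mixed model on held-out samples from the shifted domain. How much the added samples help depends on the shift and the label budget, so the sketch makes no claim about the size of the gain.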
Index Terms
- Virtual worlds and active learning for human detection