Incorporating Side Information by Adaptive Convolution

Kang, Di; Dhar, Debarun; Chan, Antoni B.

doi:10.1007/s11263-020-01345-8

Published: 02 July 2020

Volume 128, pages 2897–2918, (2020)

International Journal of Computer Vision

Abstract

Computer vision tasks often have side information available that is helpful for solving the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in deep-learning-based counting systems. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolution filter weights adapt to the current scene context via the side information. In particular, we model the filter weights as a low-dimensional manifold within the high-dimensional space of filter weights. The filter weights are generated using a learned “filter manifold” sub-network, whose input is the side information. With the help of side information and adaptive weights, the ACNN can disentangle the variations related to the side information, and extract discriminative features related to the current context (e.g., camera perspective, noise level, blur kernel parameters). We demonstrate the effectiveness of ACNN incorporating side information on three tasks: crowd counting, corrupted digit recognition, and image deblurring. Our experiments show that ACNN improves performance compared to a plain CNN with a similar number of parameters, and achieves performance similar to or better than the state of the art on the crowd counting task. Since existing crowd counting datasets do not contain ground-truth side information, we collect a new dataset with the ground-truth camera angle and height as the side information. We also perform ablation experiments, mainly for crowd counting, to study the helpfulness of the side information and the effect of the placement of the adaptive convolutional layers, in order to gain insight into ACNNs.
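
The core idea of the abstract, a convolution whose filter weights are produced by a "filter manifold" sub-network from the side information, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`make_filter_manifold`, `generate_filter`, `conv2d_valid`, `adaptive_conv`), the two-layer tanh network, the single-channel input, and the random weights standing in for learned ones. As in the paper's notes, the bias term is omitted.

```python
import math
import random


def make_filter_manifold(side_dim, hidden_dim, kernel_size, seed=0):
    """Build a tiny two-layer MLP with random weights (a stand-in for learned ones)."""
    rng = random.Random(seed)
    n_out = kernel_size * kernel_size
    w1 = [[rng.gauss(0.0, 0.5) for _ in range(side_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden_dim)] for _ in range(n_out)]
    return {"w1": w1, "w2": w2, "k": kernel_size}


def generate_filter(manifold, side_info):
    """Map a side-information vector to a k x k convolution kernel."""
    hidden = [math.tanh(sum(w * s for w, s in zip(row, side_info)))
              for row in manifold["w1"]]
    flat = [sum(w * h for w, h in zip(row, hidden)) for row in manifold["w2"]]
    k = manifold["k"]
    # Reshape the flat output vector into k rows of k weights each.
    return [flat[i * k:(i + 1) * k] for i in range(k)]


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation of a single-channel image (no padding, no bias)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


def adaptive_conv(manifold, image, side_info):
    """Adaptive convolution: the kernel is a function of the side information."""
    return conv2d_valid(image, generate_filter(manifold, side_info))


# Usage: the same image filtered under two hypothetical side-information vectors
# (for example, normalized camera angle and height).
manifold = make_filter_manifold(side_dim=2, hidden_dim=4, kernel_size=3)
image = [[float((i + j) % 5) for j in range(6)] for i in range(6)]
out_a = adaptive_conv(manifold, image, side_info=[0.1, 0.2])
out_b = adaptive_conv(manifold, image, side_info=[0.9, 0.8])
```

In this sketch the only trainable parameters would be those of the sub-network (`w1`, `w2`); the kernel itself is never stored, so the set of kernels reachable by varying a 2-dimensional side-information vector traces out a low-dimensional surface in the 9-dimensional kernel space.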

Notes

The perspective value at a pixel location is proportional to the size that an object would have if it appeared at that location.
To reduce clutter, here we do not show the bias term for the convolution.
The mean absolute difference (MAD) between the density maps generated using the original perspective maps and our perspective maps is 0.475 on average, and [0.029, 0.818, 0.800, 0.597, 0.131] respectively on the five test scenes.
The MAD between the original density maps and those using single Gaussian kernels is 2.893 on average, and [0.582, 4.491, 1.946, 7.078, 0.368] respectively on the five test scenes (using our perspective map). This is because the ROI boundary cuts through the most crowded regions on scenes 2 and 4.
CSRNet refers to the first ten convolution layers of VGG as the front-end; elsewhere these layers are more commonly referred to as the back-end.
On the clean MNIST dataset, the 2-conv and 4-conv CNN architectures achieve 0.81% and 0.69% error, while the current state of the art is approximately 0.23% error (Ciresan et al. 2012).

References

Arteta, C., Lempitsky, V., Noble, J. A., & Zisserman, A. (2014). Interactive object counting. In ECCV
Burger, H. C., Schuler, C. J., & Harmeling, S. (2012). Image denoising: Can plain neural networks compete with BM3D? In CVPR
Chan, A. B., & Vasconcelos, N. (2009). Bayesian Poisson regression for crowd counting. In ICCV
Chan, A. B., Liang, Z. S. J., & Vasconcelos, N. (2008). Privacy preserving crowd monitoring: Counting people without people models or tracking. In CVPR. IEEE.
Chan, A. B., & Vasconcelos, N. (2012). Counting people with low-level features and Bayesian regression. IEEE Transactions on Image Processing, 21, 2160–2177.
Ciresan, D., Meier, U., & Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In CVPR
Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H., & Wei, Y. (2017). Deformable convolutional networks. In CVPR
De Brabandere, B., Jia, X., Tuytelaars, T., & Van Gool, L. (2016). Dynamic filter networks. In NIPS
Dozat, T. (2015). Incorporating Nesterov momentum into Adam. Technical report, Stanford University. http://cs229.stanford.edu/proj2015/054report.pdf
Eigen, D., Krishnan, D., & Fergus, R. (2013). Restoring an image taken through a window covered with dirt or rain. In ICCV
Fiaschi, L., Nair, R., Koethe, U., & Hamprecht, F. (2012). Learning to count with regression forest and structured labels. In ICPR
Gharbi, M., Chaurasia, G., Paris, S., & Durand, F. (2016). Deep joint demosaicking and denoising. ACM Transactions on Graphics (TOG).
Ha, D., Dai, A., & Le, Q. V. (2017). HyperNetworks. In ICLR
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In CVPR
Hodosh, M., Young, P., & Hockenmaier, J. (2013). Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, 47, 853–899.
Idrees, H., Saleemi, I., Seibert, C., & Shah, M. (2013). Multi-source multi-scale counting in extremely dense crowd images. In CVPR
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML
Jaderberg, M., Simonyan, K., Zisserman, A., & Kavukcuoglu, K. (2015). Spatial transformer networks. In NIPS
Kang, D., & Chan, A. (2018). Crowd counting by adaptively fusing predictions from an image pyramid. In BMVC
Kang, D., Dhar, D., & Chan, A. (2017). Incorporating side information by adaptive convolution. In NIPS
Kang, D., Ma, Z., & Chan, A. B. (2018). Beyond counting: Comparisons of density maps for crowd analysis tasks–Counting, detection, and tracking. IEEE Transactions on Circuits and Systems for Video Technology, 29, 1408–1422.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980
Klein, B., Wolf, L., & Afek, Y. (2015). A dynamic convolutional layer for short range weather prediction. In CVPR
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In NIPS
Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. In NIPS
Li, S., Liu, Z. Q., & Chan, A. B. (2015). Heterogeneous multi-task learning for human pose estimation with deep convolutional neural network. In IJCV
Li, Y., Zhang, X., & Chen, D. (2018). CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes. In CVPR
Liu, R., Li, Z., & Jia, J. (2008). Image partial blur detection and classification. In CVPR
Ma, Z., Yu, L., & Chan, A. B. (2015). Small instance detection by integer programming on object density maps. In CVPR
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In ICML
Niu, Z., Zhou, M., Wang, L., Gao, X., & Hua, G. (2016). Ordinal regression with multiple output CNN for age estimation. In CVPR
Onoro-Rubio, D., & López-Sastre, R. J. (2016). Towards perspective-free object counting with deep learning. In ECCV
Pech-Pacheco, J. L., Cristóbal, G., Chamorro-Martinez, J., & Fernández-Valdivia, J. (2000). Diatom autofocusing in brightfield microscopy: A comparative study. In ICPR
Ren, W., Kang, D., Tang, Y., & Chan, A. (2017). Fusing crowd density maps and visual object trackers for people tracking in crowd scenes. In CVPR
Rodriguez, M., Laptev, I., Sivic, J., & Audibert, J. Y. (2011). Density-aware person detection and tracking in crowds. In ICCV
Rothe, R., Timofte, R., & Van Gool, L. (2015). DEX: Deep expectation of apparent age from a single image. In ICCVW
Sam, D. B., Surya, S., & Babu, R. V. (2017). Switching convolutional neural network for crowd counting. In CVPR
Shi, J., Xu, L., & Jia, J. (2014). Discriminative blur detection features. In CVPR
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In ICLR
Sindagi, V. A., & Patel, V. M. (2017). Generating high-quality crowd density maps using contextual pyramid CNNs. In ICCV
Sun, Y., Wang, X., & Tang, X. (2014). Deep learning face representation by joint identification-verification. In NIPS
Xu, L., Ren, J. S., Liu, C., & Jia, J. (2014). Deep convolutional neural network for image deconvolution. In NIPS
Zhang, C., Li, H., Wang, X., & Yang, X. (2015). Cross-scene crowd counting via deep convolutional neural networks. In CVPR
Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2014). Facial landmark detection by deep multi-task learning. In ECCV
Zhang, L., Shi, M., & Chen, Q. (2018). Crowd counting via scale-adaptive convolutional neural network. In WACV
Zhang, Y., Zhou, D., Chen, S., Gao, S., & Ma, Y. (2016). Single-image crowd counting via multi-column convolutional neural network. In CVPR

Acknowledgements

The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. [T32-101/15-R] and CityU 11212518). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K40 GPU used for this research.

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China
Di Kang, Debarun Dhar & Antoni B. Chan
Tencent AI Lab, Shenzhen, China
Di Kang

Corresponding author

Correspondence to Di Kang.

Additional information

Communicated by S. Soatto.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kang, D., Dhar, D. & Chan, A.B. Incorporating Side Information by Adaptive Convolution. Int J Comput Vis 128, 2897–2918 (2020). https://doi.org/10.1007/s11263-020-01345-8

Received: 08 January 2019
Accepted: 30 May 2020
Published: 02 July 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11263-020-01345-8
