Abstract
In this work, we investigate neural networks as visual relationship classifiers for precision-constrained applications on partially annotated datasets. The classifier is a convolutional neural network, which we benchmark on three visual relationship datasets. We discuss how partial annotation affects precision and why precision-based metrics are inadequate under partial annotation, a topic that has not yet been explored in the context of visual relationship classification. We introduce a threshold tuning method that imposes a soft constraint on precision while being less sensitive to the degree of annotation than a standard precision-recall trade-off method. Performance can then be measured as the recall of predictions computed with thresholds tuned by the proposed method. Our previously introduced negative-sample mining method is extended to partially annotated datasets (namely Visual Relationship Detection, VRD, and Visual Genome, VG) by sampling from unlabeled pairs instead of unrelated pairs. With thresholds tuned by our method, negative-sample mining improves recall from \(24.1\%\) to \(30.6\%\) on VRD and from \(36.7\%\) to \(41.3\%\) on VG. The neural networks also retain the ability to discriminate correctly among predicates: when only ground-truth relationships are considered for threshold tuning, recall decreases only slightly (from \(45.1\%\) to \(43.8\%\) on VRD and from \(60.5\%\) to \(58.7\%\) on VG) compared to neural networks trained only on ground-truth samples.
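To make the evaluation protocol concrete, the following is a minimal sketch of the general idea of tuning a per-predicate score threshold against a precision target and then reporting recall at that threshold. All names, the candidate scores, and the precision target are illustrative assumptions; this is a plain precision-recall trade-off on a validation split, not the paper's soft-constraint method, which is specifically designed to be less sensitive to the degree of annotation.

```python
import numpy as np

def tune_threshold(scores, labels, precision_target=0.7):
    """Pick the lowest score threshold whose precision on the given
    validation scores/labels meets precision_target (illustrative only)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = np.argsort(scores)[::-1]              # candidates by descending score
    sorted_labels = labels[order]
    sorted_scores = scores[order]
    tp = np.cumsum(sorted_labels)                 # true positives if top-k are accepted
    precision = tp / np.arange(1, len(tp) + 1)    # precision of each top-k cut
    ok = np.nonzero(precision >= precision_target)[0]
    if len(ok) == 0:
        return sorted_scores[0]                   # only the top candidate is accepted
    return sorted_scores[ok[-1]]                  # deepest cut still meeting the target

def recall_at(scores, labels, threshold):
    """Recall of the predictions that score at or above the threshold."""
    labels = np.asarray(labels, dtype=int)
    preds = np.asarray(scores, dtype=float) >= threshold
    return (preds & (labels == 1)).sum() / max(labels.sum(), 1)
```

Under partial annotation, the catch discussed in the paper is that unlabeled true relationships count as false positives here, which deflates the measured precision and drives such a tuner toward overly conservative thresholds.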
Data Availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
More information on the challenge is available at https://storage.googleapis.com/openimages/web/challenge2019.html.
Instead of using the conv_5 layers on deeper backbones, we use a randomly initialized convolutional stack with the same architecture as the conv_5 layers from the ResNet-18 backbone.
Funding
This work has been supported in part by Microsoft ATL in Rio de Janeiro, and in part by Conselho Nacional de Desenvolvimento Científico e Tecnológico - CNPq - Brazil.
Ethics declarations
Conflicts of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
de Moura Estevão Filho, R., Rodríguez Carneiro Gomes, J.G. & Oliveira Nunes, L. Evaluation of visual relationship classifiers with partially annotated datasets. Multimed Tools Appl 83, 18333–18352 (2024). https://doi.org/10.1007/s11042-023-15967-w