Computer Science ›› 2023, Vol. 50 ›› Issue (10): 104-111. doi: 10.11896/jsjkx.221000084

• Computer Graphics & Multimedia •

  • Corresponding author: CAO Fuyuan (cfy@sxu.edu.cn)
  • Author contact: Striverlhc@163.com

Unbiased Scene Graph Generation Based on Adaptive Regularization Algorithm

LI Haochen¹, CAO Fuyuan¹,², QIAO Shichang¹

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
  • Received: 2022-10-11  Revised: 2023-03-07  Online: 2023-10-10  Published: 2023-10-10
  • About author: LI Haochen, born in 1995, postgraduate. His main research interests include computer vision and data mining. CAO Fuyuan, born in 1974, professor, Ph.D. supervisor, is a senior member of China Computer Federation. His main research interests include data mining and machine learning.
  • Supported by:
    National Natural Science Foundation of China (61976128) and Applied Basic Research Program of Shanxi Province (201901D111035).

Abstract: Scene graph generation takes an image as input, uses an object detection module to obtain entities and the relationships between them as visual triplets (subject, relationship, object), and builds a structured semantic representation from those triplets. Scene graphs can be applied to downstream tasks such as image retrieval and visual question answering. However, because the relationships between entities in the dataset follow a long-tail distribution, existing models tend to predict coarse-grained head relationships. Such scene graphs cannot play an auxiliary role for downstream tasks. Previous work generally adopts rebalancing strategies, such as resampling and reweighting, to address the long-tail problem. However, because these models repeatedly learn from the tail relationship samples, they are prone to overfitting. To address these problems, this paper proposes an adaptive regularization method for unbiased scene graph generation. Specifically, the method designs a regularization term based on the prior relationship frequency and uses it to adaptively adjust the weights of the model's fully connected classifier, so that the model makes balanced predictions. The proposed method is evaluated on the Visual Genome (VG) dataset. The experimental results show that it not only prevents the model from overfitting but also alleviates the negative impact of the long-tail relationship distribution on scene graph generation, and that state-of-the-art scene graph generation methods improve their unbiased scene graph generation performance more effectively when combined with the proposed method.
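
To make the idea of a frequency-based regularizer on classifier weights concrete, the following is a minimal sketch of one possible form. It is not the formula from the paper, which the abstract does not state. The penalty form (a per-relation L2 penalty whose strength grows with the relation's prior frequency), the function names `frequency_weights`, `adaptive_l2_penalty`, and `penalty_gradient`, the `base_lambda` parameter, and the toy counts are all assumptions made for illustration.

```python
def frequency_weights(counts, base_lambda=0.01):
    """Return one penalty strength per relation class.

    Assumption: the strength is proportional to the class's relative
    prior frequency, so the most frequent class gets base_lambda.
    """
    total = sum(counts)
    freqs = [c / total for c in counts]
    max_freq = max(freqs)
    return [base_lambda * f / max_freq for f in freqs]


def adaptive_l2_penalty(weights, lambdas):
    """Sum over classes of lambda_c * ||w_c||^2.

    `weights` holds one row of classifier weights per relation class.
    """
    return sum(
        lam * sum(w * w for w in row)
        for lam, row in zip(lambdas, weights)
    )


def penalty_gradient(weights, lambdas):
    """Gradient of the penalty with respect to each weight: 2 * lambda_c * w."""
    return [[2.0 * lam * w for w in row] for lam, row in zip(lambdas, weights)]


# Hypothetical long-tailed training counts for three relation classes
# (one head class, one middle class, one tail class).
counts = [9000, 900, 100]

# Toy classifier weights: identical rows, so any difference in the
# gradient comes only from the frequency-based strengths.
W = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]

lambdas = frequency_weights(counts)
penalty = adaptive_l2_penalty(W, lambdas)
grad = penalty_gradient(W, lambdas)

print("lambdas:", lambdas)
print("penalty:", penalty)
print("gradient:", grad)
```

In this sketch the head class receives the full `base_lambda` while the tail class is penalized far less, so the regularizer shrinks head-class weights more strongly than tail-class weights. During training, such a term would be added to the classification loss.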

Key words: Scene graph, Long-tail distribution, Re-sampling, Re-weighting, Adaptive regularization

CLC Number:

  • TP391