Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark

Authors:
Shuyu Yang, Xi'an Jiaotong University, Xi'an, China (ORCID: 0000-0003-0409-1467)
Yinan Zhou, Xi'an Jiaotong University, Xi'an, China (ORCID: 0009-0003-2211-6688)
Zhedong Zheng, National University of Singapore, Singapore (ORCID: 0000-0002-2434-9050)
Yaxiong Wang, Hefei University of Technology, Hefei, China (ORCID: 0000-0001-6596-8117)
Li Zhu, Xi'an Jiaotong University, Xi'an, China (ORCID: 0000-0003-2136-3196)
Yujiao Wu, Peng Cheng Laboratory, Shenzhen, China (ORCID: 0000-0001-6366-9834)

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, October 2023, Pages 4492–4501
https://doi.org/10.1145/3581783.3611709

Published: 27 October 2023

ABSTRACT

In this paper, we introduce MALS, a large Multi-Attribute and Language Search dataset for text-based person retrieval, and explore the feasibility of pre-training on attribute recognition and image-text matching jointly. MALS contains 1,510,330 image-text pairs, which is about 37.5× larger than the prevailing CUHK-PEDES, and every image is annotated with 27 attributes. Given privacy concerns and annotation costs, we use off-the-shelf diffusion models to generate the dataset. To verify the feasibility of learning from the generated data, we develop a new joint Attribute Prompt Learning and Text Matching Learning (APTM) framework that exploits the knowledge shared between attributes and text. As the name implies, APTM contains an attribute prompt learning stream and a text matching learning stream. (1) Attribute prompt learning uses attribute prompts for image-attribute alignment, which enhances text matching learning. (2) Text matching learning facilitates representation learning on fine-grained details and, in turn, boosts attribute prompt learning. Extensive experiments validate the effectiveness of pre-training on MALS: APTM achieves state-of-the-art retrieval performance on three challenging real-world benchmarks, improving Recall@1 accuracy by +6.96%, +7.68%, and +16.95% on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets, respectively. The dataset, model, and code are available at https://github.com/Shuyu-XJTU/APTM.
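
Recall@1, the metric quoted in the abstract, is commonly computed as the share of text queries whose top-ranked gallery image shows the correct person. The following is a minimal sketch of that computation, assuming a precomputed query-by-gallery similarity matrix. The function name, its arguments, and the toy values are illustrative assumptions and are not taken from the APTM code base.

```python
import numpy as np


def recall_at_k(similarity, query_ids, gallery_ids, k=1):
    """Fraction of text queries whose top-k retrieved images include the right person.

    similarity  : (num_queries, num_gallery) text-to-image scores, higher is better
    query_ids   : person identity described by each text query
    gallery_ids : person identity shown in each gallery image
    """
    similarity = np.asarray(similarity, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)
    # Indices of the k most similar gallery images per query, best first.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A query counts as a hit if any of its top-k images shares its identity.
    hits = (gallery_ids[top_k] == query_ids[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data (made up for illustration): 3 text queries, 4 gallery images.
sim = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.4, 0.8, 0.1],
    [0.7, 0.6, 0.1, 0.2],
])
query_ids = [0, 1, 2]
gallery_ids = [0, 1, 1, 2]

r1 = recall_at_k(sim, query_ids, gallery_ids, k=1)
r3 = recall_at_k(sim, query_ids, gallery_ids, k=3)
print(f"Recall@1 = {r1:.4f}, Recall@3 = {r3:.4f}")
```

In this toy case the third query ranks an image of a different person first, so it misses at k=1 but is recovered at k=3. Benchmark protocols may differ in details such as tie handling, so this sketch should not be read as the exact evaluation script behind the reported numbers.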

Supplemental Material

mmfp0088-video.mp4 (MP4, 40.6 MB)

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Multi-task learning
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
  2. Information systems applications
    1. Multimedia information systems
      1. Multimedia databases

Published in
MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
General Chairs:
Abdulmotaleb El Saddik, University of Ottawa, Canada & MBZUAI, UAE
Tao Mei, HiDream.ai, China
Rita Cucchiara, University of Modena and Reggio Emilia, Italy
Program Chairs:
Marco Bertini, University of Florence, Italy
Diana Patricia Tobon Vallejo, Universidad de Medellin, Colombia
Pradeep K. Atrey, University at Albany, State University of New York, USA
M. Shamim Hossain, King Saud University, KSA
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
attribute prompt learning
image-text alignment
multi-attribute recognition
synthetic data
text-based person retrieval
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 995 of 4,171 submissions, 24%