research-article

Fashion-GPT: Integrating LLMs with Fashion Retrieval System

Authors:
Qianqian Chen, Huawei Singapore Research Center, Singapore, Singapore (ORCID: 0009-0008-3863-8882)
Tianyi Zhang, Huawei Singapore Research Center, Singapore, Singapore (ORCID: 0009-0000-8140-2199)
Maowen Nie, Huawei Singapore Research Center, Singapore, Singapore (ORCID: 0009-0009-5306-0297)
Zheng Wang, Huawei Singapore Research Center, Singapore, Singapore (ORCID: 0000-0002-7064-6267)
Shihao Xu, Huawei Singapore Research Center, Singapore, Singapore (ORCID: 0000-0001-8295-5692)
Wei Shi, Huawei Singapore Research Center, Singapore, Singapore (ORCID: 0009-0006-2717-4192)
Zhao Cao, Huawei Technologies Co., Ltd., Beijing, China (ORCID: 0000-0002-4214-7858)

LGM3A '23: Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications, November 2023, Pages 69–78
https://doi.org/10.1145/3607827.3616844

Published: 29 October 2023

ABSTRACT

Although customers on fashion e-commerce platforms express their clothing preferences through a combination of images and text, they are limited to single-round retrieval with fixed inputs. Meanwhile, large language models (LLMs) have been gaining attention across many fields. ChatGPT is a prominent example of an LLM, known for its user-friendly natural-language interface, impressive conversational proficiency, and reasoning abilities. To this end, we propose Fashion-GPT, a system paradigm that integrates ChatGPT with a pool of AI models from the fashion domain to support multi-round, multi-modal search. Specifically, the system uses the LLM to understand user queries, selects retrieval models based on their function descriptions, executes each subtask with the selected fashion models, and uses the LLM again to summarize the execution results into a response.
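
The four stages described above (query understanding, model selection by function description, subtask execution, and response summarization) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Expert` registry, the " then " sub-task separator, and the keyword-overlap selector are hypothetical stand-ins for the ChatGPT prompts the system uses, chosen only so that the control flow runs without an LLM. Multi-round conversation state is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Expert:
    """A fashion model exposed to the planner through a text description."""
    name: str
    description: str
    run: Callable[[str], List[str]]  # maps a sub-task to retrieved item IDs


def parse_query(query: str) -> List[str]:
    # Hypothetical stand-in for LLM query understanding: split the request
    # into sub-tasks on the word "then". A real system would prompt the LLM.
    return [part.strip() for part in query.split(" then ") if part.strip()]


def select_expert(subtask: str, experts: List[Expert]) -> Expert:
    # Hypothetical stand-in for LLM model selection: choose the expert whose
    # function description shares the most words with the sub-task.
    words = set(subtask.lower().split())
    return max(experts,
               key=lambda e: len(words & set(e.description.lower().split())))


def summarize(results: Dict[str, List[str]]) -> str:
    # Hypothetical stand-in for LLM response generation over the results.
    return "; ".join(f"{task}: {', '.join(items)}"
                     for task, items in results.items())


def handle(query: str, experts: List[Expert]) -> str:
    """Understand -> select -> execute -> summarize, for one user turn."""
    results: Dict[str, List[str]] = {}
    for subtask in parse_query(query):
        expert = select_expert(subtask, experts)
        results[subtask] = expert.run(subtask)
    return summarize(results)


if __name__ == "__main__":
    # Two hypothetical experts returning fixed, made-up item IDs.
    experts = [
        Expert("text2image", "retrieve product images from a text description",
               lambda q: ["dress_017", "dress_042"]),
        Expert("composed", "modify a reference image with text feedback",
               lambda q: ["dress_108"]),
    ]
    print(handle("retrieve a red dress from a text description "
                 "then modify the image with text feedback", experts))
```

Keeping each expert behind a plain-text description is what lets the selector (here a word-overlap heuristic, in the paper an LLM) route sub-tasks without knowing each model's internals.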

Because the overall performance is largely determined by the AI expert models, we also introduce a novel pre-training framework called 3M (short for Multi-view Multi-modal Matching). Unlike prior studies that rely solely on one-to-one matching of image-text pairs, 3M incorporates multiple texts describing the same image to achieve one-to-many alignment. Maximizing the mutual information between features extracted from these views helps capture high-level factors that affect multiple views, such as the presence of specific objects. In addition, fashion data naturally provides multi-view images of the same product, such as front and side views, which are well suited to intra-modal self-alignment. 3M therefore also introduces an intra-modal contrastive objective that brings additional benefits to representation learning from the image perspective. To the best of our knowledge, our framework is the first to consider one-to-many mapping for multi-modal representation learning. Experimental evaluations show that our fashion experts are competitive and achieve state-of-the-art performance, improving R@10 by +3.47% on Fashion-200K and by +1.98% on the Fashion-IQ dress dataset over the previous SOTA results.
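
One plausible way to write the two 3M objectives is as InfoNCE-style contrastive losses. The sketch below is an assumption, not the paper's formulation: the one-to-many term is a multi-positive InfoNCE in the image-to-text direction only, where every caption of an image counts as a positive; the intra-modal term contrasts the front and side views of the same product against other products' views; cosine similarity and the temperature `tau = 0.07` are hypothetical choices. A small `recall_at_k` helper shows how an R@K metric such as R@10 is commonly computed.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def log_sum_exp(logits: List[float]) -> float:
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def one_to_many_loss(images: List[Vector], texts: List[Vector],
                     owner: List[int], tau: float = 0.07) -> float:
    """Multi-positive image-to-text InfoNCE (assumed form of one-to-many alignment).

    owner[j] is the index of the image that texts[j] describes, so several
    texts may point at the same image. Each (image, own text) pair is a
    positive; all other texts in the batch act as negatives.
    """
    total, count = 0.0, 0
    for i, img in enumerate(images):
        logits = [cosine(img, t) / tau for t in texts]
        denom = log_sum_exp(logits)
        for j, logit in enumerate(logits):
            if owner[j] == i:
                total += denom - logit  # -log softmax probability of a positive
                count += 1
    return total / count


def intra_modal_loss(front: List[Vector], side: List[Vector],
                     tau: float = 0.07) -> float:
    """Image-image InfoNCE: front[i] and side[i] show the same product."""
    total = 0.0
    for i, f in enumerate(front):
        logits = [cosine(f, s) / tau for s in side]
        total += log_sum_exp(logits) - logits[i]
    return total / len(front)


def recall_at_k(scores: List[List[float]], targets: List[int],
                k: int = 10) -> float:
    """Fraction of queries whose ground-truth item ranks in the top k."""
    hits = 0
    for row, target in zip(scores, targets):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += target in top_k
    return hits / len(scores)
```

In this formulation, adding more captions per image simply adds more positive terms for that image, which is one way to read "one-to-many alignment"; a symmetric text-to-image term would typically be added in practice.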

Index Terms

Computing methodologies → Artificial intelligence

Published in
LGM3A '23: Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications, November 2023, 84 pages
ISBN: 9798400702839
DOI: 10.1145/3607827
General Chairs: Zheng Wang (Huawei Singapore Research Center), Cheng Long (Nanyang Technological University), Shihao Xu (Huawei Singapore Research Center), Bingzheng Gan (Huawei Singapore Research Center), Wei Shi (Huawei Singapore Research Center), Zhao Cao (Huawei Technologies Co., Ltd), Tat-Seng Chua (National University of Singapore)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
chatgpt-based system with retrieval function
multi-round multi-modal search
multimodal pre-training network