DOI: 10.1145/3607827.3616844
research-article

Fashion-GPT: Integrating LLMs with Fashion Retrieval System

Published: 29 October 2023

ABSTRACT

Although customers on a fashion e-commerce platform express their clothing preferences through a combination of images and text, they are limited to single-round retrieval with fixed inputs. At the same time, large language models (LLMs) have been gaining attention across various fields. ChatGPT is a prominent example of an LLM, known for its user-friendly language interface, conversational proficiency, and reasoning abilities. To this end, we propose Fashion-GPT, a system paradigm that integrates ChatGPT with a pool of AI models in the fashion domain to achieve multi-round, multi-modal search. Specifically, the system uses the LLM to understand user queries, selects retrieval models based on their function descriptions, executes each subtask with the selected fashion models, and uses the LLM again to summarize a response from the execution results.
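
The four stages described above (query understanding, model selection from function descriptions, execution, and response summarization) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the `Expert` record, the two placeholder retrieval functions, and the `stub_llm_*` helpers, which stand in for real LLM calls with fixed rules so the example runs offline. Reusing the most recent image from the dialogue history is also an assumption about how multi-round context might be carried.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Expert:
    """A retrieval model exposed to the planner through a text description."""
    name: str
    description: str
    run: Callable[[str, Optional[str]], List[str]]


def text_search(text: str, image: Optional[str]) -> List[str]:
    # Placeholder for a text-to-image retrieval model.
    return [f"item matching '{text}'"]


def composed_search(text: str, image: Optional[str]) -> List[str]:
    # Placeholder for retrieval from a reference image plus modifying text.
    return [f"item like {image} but '{text}'"]


EXPERTS: Dict[str, Expert] = {
    "text_search": Expert(
        "text_search", "retrieve products from a text query", text_search
    ),
    "composed_search": Expert(
        "composed_search",
        "retrieve products from a reference image and a text modification",
        composed_search,
    ),
}


def stub_llm_select(query: str, has_image: bool, experts: Dict[str, Expert]) -> str:
    """Stand-in for the LLM call that reads the function descriptions.

    A real system would send the query and the descriptions to the LLM;
    a fixed rule is used here (assumption) so the sketch runs offline.
    """
    return "composed_search" if has_image else "text_search"


def stub_llm_summarize(query: str, results: List[str]) -> str:
    """Stand-in for the LLM call that turns execution results into a reply."""
    return f"For '{query}' I found: " + "; ".join(results)


def answer_turn(query: str, history: List[dict], image: Optional[str] = None) -> str:
    # Multi-round (assumption): reuse the most recent image in the dialogue.
    if image is None:
        image = next((t["image"] for t in reversed(history) if t["image"]), None)
    name = stub_llm_select(query, image is not None, EXPERTS)   # select a model
    results = EXPERTS[name].run(query, image)                   # execute subtask
    history.append({"query": query, "image": image, "expert": name})
    return stub_llm_summarize(query, results)                   # summarize


history: List[dict] = []
print(answer_turn("red summer dress", history))
print(answer_turn("make it longer", history, image="dress_01.jpg"))
print(answer_turn("now in blue", history))
```

In this sketch the third turn supplies no image, yet it is still routed to the composed-retrieval placeholder because the image from the second turn is carried over in `history`.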

Because the system's performance is dominated by the AI experts, we also introduce a novel pre-training framework called 3M (short for Multi-view Multi-modal Matching). Unlike prior studies that rely solely on one-to-one matching of image-text pairs, 3M incorporates multiple texts describing the same image to achieve one-to-many alignment. Maximizing mutual information between features extracted from these views helps capture high-level factors that influence multiple views, such as the occurrence of specific objects. In addition, fashion data naturally provides multi-view images of the same product, such as front and side views, which are well suited to intra-modal self-alignment. 3M therefore also introduces an intra-modal contrastive objective to further improve representation learning on the image side. To the best of our knowledge, our framework is the first to consider one-to-many mapping for multi-modality representation learning. Experimental evaluations demonstrate that our fashion experts are competitive and achieve state-of-the-art performance, with a +3.47% R@10 improvement on Fashion-200K and a +1.98% R@10 improvement on the Fashion-IQ dress dataset over the previous SOTA results.
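
To make the two objectives concrete, the following is a minimal sketch of one-to-many cross-modal alignment combined with an intra-modal image term. The abstract does not give the loss formulation, so the InfoNCE-style form, the temperature of 0.1, the unweighted sum of the two terms, the function name `multi_positive_info_nce`, and the toy two-dimensional embeddings are all assumptions chosen for illustration, not the 3M implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def multi_positive_info_nce(
    anchors: List[Vector],
    candidates: List[Vector],
    owner: List[int],
    temperature: float = 0.1,
) -> float:
    """InfoNCE-style loss averaged over every (anchor, positive) pair.

    owner[j] is the index of the anchor that candidate j belongs to, so one
    anchor may have several positives (one-to-many alignment).
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        for j, logit in enumerate(logits):
            if owner[j] == i:
                losses.append(log_denom - logit)
    return sum(losses) / len(losses)


# Toy data (assumption): two products, each with a front-view image,
# a side-view image, and two captions describing it.
front = [[1.0, 0.0], [0.0, 1.0]]
side = [[0.9, 0.1], [0.1, 0.9]]
texts = [[1.0, 0.1], [0.8, 0.2], [0.1, 1.0], [0.2, 0.8]]
text_owner = [0, 0, 1, 1]

# Cross-modal term: each image is aligned with all of its captions.
cross_modal = multi_positive_info_nce(front, texts, text_owner)
# Intra-modal term: front view aligned with the side view of the same product.
intra_modal = multi_positive_info_nce(front, side, [0, 1])
total = cross_modal + intra_modal

# Assigning captions to the wrong product should give a larger loss.
shuffled = multi_positive_info_nce(front, texts, [1, 1, 0, 0])
print(total, shuffled)
```

The same function serves both terms because only the candidate set and the ownership mapping change: captions for the cross-modal term, and alternative views of the same product for the intra-modal term.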

    Published in

      LGM3A '23: Proceedings of the 1st Workshop on Large Generative Models Meet Multimodal Applications
      November 2023
      84 pages
      ISBN: 9798400702839
      DOI: 10.1145/3607827

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

