FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback
Top 10% of 2022 papers
Abstract
Fashion image retrieval based on a query pair of a reference image and natural language feedback is a challenging task that requires models to assess fashion-related information from the visual and textual modalities simultaneously. We propose a new vision-language transformer-based model, FashionVLP, that brings the prior knowledge contained in large image-text corpora to the domain of fashion image retrieval, and combines visual information from multiple levels of context to effectively capture fashion-related information. While queries are encoded through the transformer layers, our asymmetric design adopts a novel attention-based approach for fusing target image features without involving text or transformer layers in the process. Extensive results show that FashionVLP achieves state-of-the-art performance on benchmark datasets, with a large 23% relative improvement on the challenging FashionIQ dataset, which contains complex natural language feedback.
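The asymmetric design described in the abstract can be sketched at a high level: the query side would pass reference-image and feedback-text tokens through transformer layers, while the target side would fuse multi-level image features with lightweight attention only. The minimal numpy sketch below illustrates just that attention-based target fusion and a cosine-similarity retrieval score; all names here (`attention_fuse`, the learned context vector `ctx`) are hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(features, ctx):
    """Fuse multi-level image features via attention (no transformer, no text).

    features: (n_levels, d) array, e.g. whole-image, cropped, and region-level
              embeddings of the target image (assumed inputs).
    ctx:      (d,) stand-in for a learned context vector that scores each level.
    Returns a single (d,) fused target embedding.
    """
    scores = features @ ctx / np.sqrt(features.shape[1])  # scaled dot-product scores
    weights = softmax(scores)                             # one weight per context level
    return weights @ features                             # weighted sum over levels

def cosine(a, b):
    """Retrieval score between query and target embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example with random features (d = 8, three context levels).
rng = np.random.default_rng(0)
d = 8
target_feats = rng.standard_normal((3, d))  # multi-level target image features
ctx = rng.standard_normal(d)
target_emb = attention_fuse(target_feats, ctx)

query_emb = rng.standard_normal(d)  # stand-in for the transformer-encoded query
sim = cosine(query_emb, target_emb)
```

The point of the asymmetry is efficiency at retrieval time: target embeddings like `target_emb` can be precomputed for the whole catalog without running text or transformer layers, so only the query passes through the expensive encoder.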