LiT: Zero-Shot Transfer with Locked-image text Tuning

2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 18102–18112

Xiaohua Zhai, Xiao Wang, Basil Mustafa, Andreas Steiner, Daniel Keysers, Alexander Kolesnikov, Lucas Beyer

Abstract

This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning “Locked-image Tuning” (LiT), which just teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT model achieves 84.5% zero-shot transfer accuracy on the ImageNet test set, and 81.1% on the challenging out-of-distribution ObjectNet test set.
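
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact loss, so the code assumes a common symmetric softmax contrastive loss, and every function name and the `temperature` parameter are hypothetical. The locked image model is represented by treating image embeddings as fixed inputs, so in a real training loop only the text model producing `text_embs` would be updated. Embeddings are assumed to be non-zero vectors.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(image_embs, text_embs, temperature=1.0):
    """Cosine similarities between images (rows) and texts (columns), divided by temperature."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
            for i in imgs]

def _cross_entropy_diagonal(logits):
    """Mean softmax cross-entropy where the correct column for row k is k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[k]
    return total / len(logits)

def contrastive_loss(image_embs, text_embs, temperature=1.0):
    """Assumed symmetric loss: average of image-to-text and text-to-image terms.

    image_embs[k] and text_embs[k] are taken to be a matched pair. In a
    locked-image setup, image_embs come from a frozen model and act as constants.
    """
    logits = similarity_matrix(image_embs, text_embs, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits)
                  + _cross_entropy_diagonal(transposed))

def zero_shot_classify(image_emb, class_text_embs):
    """Return the index of the class text embedding most similar to the image."""
    scores = similarity_matrix([image_emb], class_text_embs)[0]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, zero-shot classification needs no new training: each class name is embedded by the text model, and an image is assigned to the class whose text embedding has the highest cosine similarity with the image embedding.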
