LiT: Zero-Shot Transfer with Locked-image text Tuning
Abstract
This paper presents contrastive-tuning, a simple method employing contrastive training to align image and text models while still taking advantage of their pre-training. In our empirical study we find that locked pre-trained image models with unlocked text models work best. We call this instance of contrastive-tuning “Locked-image Tuning” (LiT), which just teaches a text model to read out good representations from a pre-trained image model for new tasks. A LiT model gains the capability of zero-shot transfer to new vision tasks, such as image classification or retrieval. The proposed LiT is widely applicable; it works reliably with multiple pre-training methods (supervised and unsupervised) and across diverse architectures (ResNet, Vision Transformers and MLP-Mixer) using three different image-text datasets. With the transformer-based pre-trained ViT-g/14 model, the LiT model achieves 84.5% zero-shot transfer accuracy on the ImageNet test set, and 81.1% on the challenging out-of-distribution ObjectNet test set.
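To make the idea concrete, the following is a minimal sketch of a symmetric image-text contrastive loss and of zero-shot classification by nearest text embedding. It is an illustration under stated assumptions, not the authors' implementation: the function names, the fixed `temperature` value, and the toy two-dimensional embeddings are all hypothetical, and the embeddings are assumed to be precomputed. In a LiT-style setup the image embeddings would come from the locked (frozen) image model, so only the text model would be updated to reduce this loss.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def contrastive_loss(image_embs, text_embs, temperature=0.1):
    # Symmetric contrastive loss over a batch where image k matches text k.
    # Assumption: image_embs come from a frozen image tower, so in training
    # only the text tower would receive gradients from this loss.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    sim = [[dot(i, t) / temperature for t in txts] for i in imgs]
    img_to_txt = sum(softmax_cross_entropy(sim[k], k) for k in range(n)) / n
    txt_to_img = sum(
        softmax_cross_entropy([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (img_to_txt + txt_to_img)

def zero_shot_classify(image_emb, class_text_embs):
    # Return the index of the class whose text embedding is most similar
    # (by cosine similarity) to the image embedding.
    img = l2_normalize(image_emb)
    scores = [dot(img, l2_normalize(t)) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)
```

For zero-shot classification, each class name would be turned into a text prompt, embedded by the text model, and passed as one row of `class_text_embs`; no labeled images of the new task are needed.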