Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

2023, pp. 2214–2224

AJ Piergiovanni, Weicheng Kuo, Anelia Angelova

Abstract

We present a simple approach which can turn a ViT encoder into an efficient video model, which can seamlessly work with both image and video inputs. By sparsely sampling the inputs, the model is able to do training and inference from both input modalities. The model is easily scalable and can be adapted to large-scale pre-trained ViTs without requiring full finetuning. The model achieves SOTA results (project page: https://sites.google.com/view/tubevit).
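To make the idea of sparsely sampling the inputs concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a clip is stored as a `(T, H, W, C)` array and that an image is simply a clip with `T = 1`. The function name `sample_sparse_tubes`, the tube shapes, and the strides are all hypothetical choices for illustration; the abstract does not specify them.

```python
import numpy as np

def sample_sparse_tubes(video, tube_shape, stride, offset=(0, 0, 0)):
    """Cut tubes of size tube_shape (t, h, w) out of a (T, H, W, C) array.

    Tubes start every `stride` steps along each axis. When the stride is
    larger than the tube, most of the input is never read, which is what
    makes the sampling sparse. Each tube is flattened into one token.
    Hypothetical helper for illustration only.
    """
    T, H, W, C = video.shape
    t, h, w = tube_shape
    st, sh, sw = stride
    ot, oh, ow = offset
    tokens = []
    for t0 in range(ot, T - t + 1, st):
        for y0 in range(oh, H - h + 1, sh):
            for x0 in range(ow, W - w + 1, sw):
                tube = video[t0:t0 + t, y0:y0 + h, x0:x0 + w, :]
                tokens.append(tube.reshape(-1))
    if not tokens:
        # No tube fits (e.g. a temporal tube applied to a single image).
        return np.empty((0, t * h * w * C), dtype=video.dtype)
    return np.stack(tokens)

# A 32-frame clip sampled with an 8x8x8 tube at stride (16, 32, 32).
video = np.zeros((32, 224, 224, 3), dtype=np.float32)
video_tokens = sample_sparse_tubes(video, (8, 8, 8), (16, 32, 32))
print(video_tokens.shape)  # (98, 1536)

# A single image goes through the same function as a one-frame clip,
# here with a 1x16x16 tube, i.e. ordinary ViT-style patches.
image = np.zeros((1, 224, 224, 3), dtype=np.float32)
image_tokens = sample_sparse_tubes(image, (1, 16, 16), (1, 16, 16))
print(image_tokens.shape)  # (196, 768)
```

In this sketch, tubes of different shapes produce tokens of different lengths, so a real model would presumably need a separate learned projection per tube shape to map them to a common embedding width before the shared ViT encoder. That projection step is an assumption and is omitted here.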
