OSCAR and ActivityNet: an Image Captioning model can effectively learn a Video Captioning dataset
2021
Citations Over Time
Abstract
Activity Recognition and Classification in video sequences is an area of research that has received attention recently. However, video processing is computationally expensive, and its advances have not been as extraordinary compared to those of Image Captioning. This work, created by Latinx individuals from Mexico, uses a computationally limited environment and transforms the Video Captioning dataset of ActivityNet into an Image Captioning. Generating features with Bottom-Up attention and training an OSCAR Image Captioning model, and using different NLP Data Augmentation techniques, we show a viable and promising approach to simplify the Video Captioning task.