ThaiTC: Thai Transformer-based Image Captioning
Abstract
Image captioning is a long-studied task. Earlier approaches combined a convolutional neural network (CNN) for feature extraction with a recurrent neural network (RNN) for text generation; for Thai in particular, these approaches need to be revisited now that transformers are in widespread use. This paper proposes ThaiTC, an end-to-end image captioning model that pairs a pretrained vision transformer (ViT) with a pretrained Thai text transformer, leveraging the transformer architecture on both sides. We experiment to find the combination of pretrained vision transformer and Thai text transformer that works best for Thai image captioning, and evaluate on three Thai image captioning datasets with different challenges: 1) Travel, 2) Food, and 3) Flickr30k (translated). We also freeze the vision transformer's weights when training on captioning datasets with few images. From the experiments, we found that ThaiTC performs much better on the Food and Flickr30k datasets than on the Travel dataset, allowing us to automatically generate captions about food and travel.
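The architecture described above, a pretrained ViT encoder coupled to a pretrained Thai text transformer decoder with the encoder optionally frozen for small datasets, can be sketched with the Hugging Face transformers library. This is a minimal illustration under stated assumptions, not the authors' released code: the checkpoint names `google/vit-base-patch16-224-in21k` and `airesearch/wangchanberta-base-att-spm-uncased` are stand-ins for whichever pretrained vision and Thai text models the paper actually pairs.

```python
# Minimal sketch of a ViT encoder + Thai text-transformer decoder captioner.
# Checkpoint names below are assumptions, not the paper's exact choices.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

ENCODER = "google/vit-base-patch16-224-in21k"                 # pretrained vision transformer
DECODER = "airesearch/wangchanberta-base-att-spm-uncased"     # pretrained Thai text transformer

# Couple the two pretrained models; cross-attention from decoder to ViT
# features is added automatically by VisionEncoderDecoderModel.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(ENCODER, DECODER)
tokenizer = AutoTokenizer.from_pretrained(DECODER)
image_processor = ViTImageProcessor.from_pretrained(ENCODER)

# Tell the model how to start and pad generated Thai captions.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Freeze the ViT weights so that, on small captioning datasets,
# only the text decoder (and cross-attention) is trained.
for param in model.encoder.parameters():
    param.requires_grad = False

# Caption a single image (after fine-tuning on a Thai captioning dataset).
pixel_values = image_processor(
    Image.open("example.jpg").convert("RGB"), return_tensors="pt"
).pixel_values
caption_ids = model.generate(pixel_values, max_length=32)
print(tokenizer.decode(caption_ids[0], skip_special_tokens=True))
```

Freezing the encoder is the standard trade-off for low-data regimes: the ViT's general visual features are kept intact while the smaller number of trainable decoder parameters reduces overfitting on datasets like Travel or Food.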