Stack LSTM for Chinese Image Captioning
Abstract
Image captioning has attracted considerable attention in recent years. However, little work has been done on Chinese image captioning, which has unique cultural characteristics and wording requirements. This paper studies how to generate more accurate Chinese image captions. We propose a novel Chinese image captioning model that uses a pre-trained ResNet50 to extract visual information from the image and a double-layer (stacked) LSTM to predict each Chinese word. Compared with other image captioning algorithms on the AIC-ICC dataset, the proposed method substantially improves evaluation performance, achieving BLEU-4 and CIDEr scores of 39.9 and 121.7, respectively. Qualitative results also show that the model can generate accurate, diverse, and vivid Chinese captions of images.
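The architecture described in the abstract (ResNet50 visual features feeding a two-layer LSTM word predictor) could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes PyTorch, precomputed 2048-dim ResNet50 features in place of the full CNN, and hypothetical dimensions (`embed_dim`, `hidden_dim`, vocabulary size) that the abstract does not specify.

```python
import torch
import torch.nn as nn

class StackLSTMCaptioner(nn.Module):
    """Hypothetical sketch: a stacked (two-layer) LSTM caption decoder
    conditioned on an image feature, predicting one Chinese word id per step."""

    def __init__(self, vocab_size, feat_dim=2048, embed_dim=512, hidden_dim=512):
        super().__init__()
        # Project the (assumed precomputed) ResNet50 feature into the embedding space.
        self.feat_proj = nn.Linear(feat_dim, embed_dim)
        # Embeddings for Chinese word tokens.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Double-layer ("stacked") LSTM, as described in the abstract.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2, batch_first=True)
        # Per-step scores over the Chinese vocabulary.
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):
        # feats: (B, feat_dim) image features; captions: (B, T) token ids.
        img = self.feat_proj(feats).unsqueeze(1)   # (B, 1, E) image as first step
        words = self.embed(captions)               # (B, T, E)
        x = torch.cat([img, words], dim=1)         # (B, T+1, E)
        h, _ = self.lstm(x)                        # (B, T+1, H)
        return self.out(h)                         # (B, T+1, vocab_size)

# Shape check with dummy data.
model = StackLSTMCaptioner(vocab_size=1000)
feats = torch.randn(4, 2048)
caps = torch.randint(0, 1000, (4, 12))
logits = model(feats, caps)
print(tuple(logits.shape))  # (4, 13, 1000)
```

In this sketch the image feature is fed as the first input step of the LSTM, a common captioning design; the paper may condition the decoder differently.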