Attention-based Persian Language Modeling
Abstract
Attention-based models have proved their superiority on many NLP tasks, especially for English. Despite the great potential and importance of language models, little attention has been paid to attention-based language modeling for Persian. In this paper, we fine-tuned two language models, namely BERT and Persian GPT-2, on the Persica corpus. We then evaluated these models by computing their perplexity on a 5-million-word dataset. Both models outperform previous state-of-the-art results in terms of perplexity. Our results indicate that GPT-2 performs slightly better, with an improvement in perplexity of approximately 10 percent, and appears to be a better fit for language modeling. We propose a modified version of perplexity, bi-perplexity, which can serve as a measure for comparing language models trained with the Masked Language Modeling objective. We also introduce a new way of using BERT as a language model by devising a novel sampling strategy.
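The evaluation metric referred to above is standard perplexity, i.e. the exponential of the average per-token cross-entropy under the model. The sketch below shows how such a score can be computed for a causal language model with the Hugging Face transformers API; the checkpoint name, the `perplexity` helper, and the truncation length are illustrative assumptions, not the paper's exact setup, and the bi-perplexity variant proposed for masked models is not reproduced here.

```python
# Minimal sketch of corpus-level perplexity for a causal LM (e.g. a Persian
# GPT-2 checkpoint). Names and hyperparameters are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(model, tokenizer, texts, device="cpu", max_length=512):
    """Perplexity = exp(mean next-token cross-entropy over the corpus)."""
    model.eval().to(device)
    total_nll, total_tokens = 0.0, 0
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt",
                            truncation=True, max_length=max_length).to(device)
            # With labels == input_ids, the model returns the mean
            # next-token cross-entropy for this sequence.
            out = model(**enc, labels=enc["input_ids"])
            n_predicted = enc["input_ids"].size(1) - 1  # shifted targets
            total_nll += out.loss.item() * n_predicted
            total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)

# Hypothetical usage with a placeholder checkpoint:
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# print(perplexity(lm, tok, ["این یک جمله نمونه است."]))
```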