Interpretability analysis for Turkish word embeddings
Abstract
Because of the performance improvements they provide in natural language processing (NLP) applications, word embeddings are widely studied and used. The algorithms that generate word embeddings learn low-dimensional, dense vector spaces that encode semantic relations among words, in an unsupervised manner, from large unannotated corpora. However, the dimensions of these vector spaces are usually not interpretable, which makes their semantic structure harder for researchers to comprehend. Learning new, interpretable word embeddings, both to understand their inner structure better and to further improve their utility, is therefore an active research area. In this study, a semantic category dataset (ANKAT) containing more than 4000 unique Turkish words grouped under 62 categories is compiled to quantitatively evaluate the interpretability of word embeddings. An interpretability analysis method based on this dataset is proposed and tested on five different embedding spaces.
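The abstract does not give the scoring rule, so the following is only a minimal sketch of how a category dataset can be turned into an interpretability number, assuming a category-overlap measure. For each embedding dimension, words are ranked by their coordinate. The score records what share of a category's words fall among the top-ranked words at either extreme, and each dimension keeps its best category. The function names, the `slack` window factor, and the toy vectors are hypothetical illustrations, not ANKAT data or the authors' exact method.

```python
from typing import Dict, Sequence, Set


def dimension_category_score(
    values: Dict[str, float], category: Set[str], slack: float = 1.0
) -> float:
    """Percentage of a category's words found at one extreme of a single dimension.

    `values` maps each vocabulary word to its coordinate on one embedding dimension.
    `slack` is a hypothetical factor widening the window to slack * n words.
    """
    members = {w for w in category if w in values}  # ignore out-of-vocabulary words
    n = len(members)
    if n == 0:
        return 0.0
    k = max(1, int(round(slack * n)))
    ranked = sorted(values, key=values.get)  # ascending by coordinate
    top_negative = set(ranked[:k])
    top_positive = set(ranked[-k:])
    # A dimension may encode a category in either direction, so keep the better one.
    best = max(len(members & top_positive), len(members & top_negative))
    return 100.0 * best / n


def interpretability_score(
    embeddings: Dict[str, Sequence[float]],
    categories: Dict[str, Set[str]],
    slack: float = 1.0,
) -> float:
    """Average over dimensions of the best category score each dimension achieves."""
    dims = len(next(iter(embeddings.values())))
    per_dim = []
    for d in range(dims):
        column = {word: vec[d] for word, vec in embeddings.items()}
        per_dim.append(
            max(dimension_category_score(column, cat, slack) for cat in categories.values())
        )
    return sum(per_dim) / dims


# Hypothetical toy data: two categories, two dimensions.
toy = {
    "elma": [0.9, 0.5],
    "armut": [0.8, -0.2],
    "kedi": [-0.7, 0.9],
    "inek": [-0.6, -0.4],
}
toy_categories = {"meyve": {"elma", "armut"}, "hayvan": {"kedi", "inek"}}
print(interpretability_score(toy, toy_categories))  # 75.0
```

In this toy case the first dimension cleanly separates fruits from animals (score 100), while the second mixes them (score 50). The average is therefore 75. Averaging per-dimension maxima is one simple design choice; a real evaluation might instead report the full dimension-by-category score matrix.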