Topic Modeling on News Articles using Latent Dirichlet Allocation
Abstract
Topic modeling is widely used to extract the most visible topics from a given text corpus. In this work, we demonstrate topic modeling that identifies the most discussed topics in articles from the Reuters news website. These articles are collected and subsequently processed with Latent Dirichlet Allocation (LDA), an unsupervised learning algorithm. The main goal is to build the best model(s) for accurately identifying the most discussed topics. Such models can be used in practice to get immediate information about current news, to classify documents in a given dataset, and to extract dominant topics together with their keywords. This helps, for example, to correlate content with user preferences and to recommend interesting content. Existing works use different models to evaluate texts and obtain statistics about them, such as the most popular opinions on a given question, or the popular and dominant subtopics within a topic-specific dataset (e.g., medical articles). As a result of this work, we created a generic LDA model trained on Wikipedia articles. The model successfully analyzes Reuters articles and extracts their topics as keyword sets. These keyword sets can then be used to recommend content that is interesting to the target user, for example, based on the recommended content tags.
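To make the idea of "topics as keyword sets" concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the abstract does not name a library or hyperparameters, so the function names (`train_lda`, `top_keywords`, `dominant_topic`), the values of `alpha`, `beta`, and `iterations`, and the toy corpus are all assumptions chosen for illustration. It also trains and inspects topics on the same small corpus, whereas the paper trains on Wikipedia and applies the model to Reuters articles.

```python
import random
from collections import Counter


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts. All hyperparameter defaults are illustrative.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                       # total tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_keywords(topic_word, n=3):
    """Return up to n most frequent words for each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


def dominant_topic(topic_counts):
    """Return the index of the topic with the most tokens in a document."""
    return max(range(len(topic_counts)), key=topic_counts.__getitem__)


# Hypothetical, already-tokenized toy corpus (not Reuters or Wikipedia data).
docs = [
    ["market", "stocks", "bank", "shares", "market"],
    ["bank", "rates", "stocks", "market", "shares"],
    ["match", "team", "goal", "league", "team"],
    ["league", "goal", "match", "team", "coach"],
]

doc_topic, topic_word = train_lda(docs, num_topics=2)
keywords = top_keywords(topic_word, n=3)
for k, words in enumerate(keywords):
    print(f"topic {k}: {words}")
for d, counts in enumerate(doc_topic):
    print(f"document {d}: dominant topic {dominant_topic(counts)}")
```

In a recommendation setting like the one described above, the keyword list of a document's dominant topic could serve as its content tags; how the tags are matched to user preferences is outside the scope of this sketch.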