Towards Data Science | Medium

Topic Modeling Open-Source Research with the OpenAlex API

Open-source intelligence (OSINT) can strengthen organizational analysis by drawing on data from social media, websites, and published research. Topic modeling, an unsupervised machine learning technique, identifies recurring topics within large text datasets, and OpenAlex provides access to millions of research articles to apply it to. After importing data from OpenAlex and applying NLP preprocessing, a topic model can be built with Latent Dirichlet Allocation (LDA). The number of topics and the decay rate are the key parameters. Coherence scores measure topic quality; for the measure reported here they range from 0 to 1, and a score of around 0.48 indicates room for improvement. pyLDAvis provides interactive visualizations for exploring the topic distribution and the most relevant terms in each topic. The final step is to optimize the model by testing combinations of parameter values and keeping the one with the highest coherence score, which makes the resulting topics more distinct and better defined.
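The import and preprocessing step can be sketched as follows. This is a minimal illustration and not the article's own code: it assumes the public OpenAlex `/works` endpoint with its `search` and `per-page` parameters, and relies on OpenAlex returning abstracts as an `abstract_inverted_index` (a mapping from each word to its positions). The function names, the tiny stopword list, and the token-length cutoff are all hypothetical choices made for the example.

```python
import json
import re
import urllib.parse
import urllib.request

# Tiny illustrative stopword list (an assumption; real pipelines use a fuller one).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "for", "is", "on", "with"}


def fetch_works(query, per_page=25):
    """Fetch one page of works matching `query` from the OpenAlex API."""
    params = urllib.parse.urlencode({"search": query, "per-page": per_page})
    url = f"https://api.openalex.org/works?{params}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["results"]


def rebuild_abstract(inverted_index):
    """Turn OpenAlex's {word: [positions]} mapping back into plain text."""
    if not inverted_index:
        return ""  # some works have no abstract
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


# Local sample shaped like an `abstract_inverted_index` value (no network call).
sample_index = {
    "Topic": [0],
    "modeling": [1],
    "finds": [2],
    "themes": [3, 6],
    "and": [4],
    "more": [5],
}
sample_tokens = tokenize(rebuild_abstract(sample_index))
```

In practice, each item returned by `fetch_works` would be passed through `rebuild_abstract(work.get("abstract_inverted_index"))` and `tokenize`, and the resulting token lists would form the corpus handed to the LDA model.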
towardsdatascience.com