Research Article

Detecting Main Topics using Dictionary-based Topic Analysis

Authors

  • Luca Pavan, Institute of Foreign Languages, Vilnius University, Vilnius, Lithuania; Language Studies Centre, Faculty of Creative Industries, Vilnius Tech, Vilnius, Lithuania

Abstract

This paper describes a dictionary-based topic-analysis program written by the author, which uses a manually compiled dictionary. Many studies have shown the advantages of using dictionaries to analyze texts. The software works with English and Italian texts and does not use probabilistic methods. In natural language processing, using a lexicon to reveal the topics of a text is often avoided. Topics depend heavily on context, and assigning each word to a single topic makes it difficult to track topics across different contexts. However, the dictionary described in the paper, which contains about 5,500 topic words, in many cases allows the same word to belong to different topics. This approach makes it possible to find the main topics of a text, which correspond to the most frequent topic words the software detects. The paper discusses the advantages and disadvantages of the approach, with examples. The software was extensively tested on large texts, such as Internet news corpora and classics of English and American literature, and showed very high reliability in detecting the main topics. Its analysis of topics in literary works reaches almost the same conclusions as those of literary critics.
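The core idea of the abstract, counting dictionary topic words in a text and treating the most frequent topics as the main ones, with a word allowed to belong to several topics, can be illustrated with a minimal sketch. This is not the author's software: the tiny dictionary, the tokenizer, and the function name `main_topics` are hypothetical, and the rule that every topic a word belongs to receives one hit is an assumption made for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature topic dictionary (assumption: the paper's real
# dictionary has about 5,500 topic words; these entries are illustrative).
# A single word may belong to more than one topic.
TOPIC_DICTIONARY: dict[str, set[str]] = {
    "bank": {"finance", "nature"},
    "loan": {"finance"},
    "interest": {"finance"},
    "river": {"nature"},
    "tree": {"nature"},
    "election": {"politics"},
    "vote": {"politics"},
}


def main_topics(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Count dictionary hits per topic and return the most frequent topics."""
    # Simple lowercase word tokenizer (an assumption, not the paper's method).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts: Counter[str] = Counter()
    for token in tokens:
        # Every topic the word belongs to receives one hit (assumed rule).
        for topic in TOPIC_DICTIONARY.get(token, ()):
            counts[topic] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "The bank raised interest on every loan near the river."
    print(main_topics(sample))  # prints [('finance', 3), ('nature', 2)]
```

Here "bank" counts toward both finance and nature, but the surrounding words tip the balance toward finance, which loosely mirrors how a multi-topic dictionary lets context decide the main topic without any probabilistic model. A real system might weight ambiguous words differently.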

Article information

Journal

International Journal of Linguistics, Literature and Translation

Volume (Issue)

5 (12)

Pages

48-52

Published

2022-12-04

How to Cite

Pavan, L. (2022). Detecting Main Topics using Dictionary-based Topic Analysis. International Journal of Linguistics, Literature and Translation, 5(12), 48–52. https://doi.org/10.32996/ijllt.2022.5.12.6

Keywords:

Computational linguistics, Topic analysis, English literature, American literature, Italian literature.