Finding associations between natural and computer languages: A case-study of bilingual LDA applied to the bleeping computer forum posts

Kundi Yao, Gustavo A. Oliva, Ahmed E. Hassan, Muhammad Asaduzzaman, Andrew J. Malton, Andrew Walenstein

SSRN Electronic Journal (2023)

Abstract

In the context of technical support, trails of technical discussions often contain a mixture of natural language (e.g., English) and software log excerpts. Uncovering latent links between certain problems and log excerpts that are often requested during the discussions of those problems enables the construction of a valuable knowledge base. Nevertheless, uncovering such latent links is challenging because English and software logs are two fundamentally different languages. In this paper, we investigate the suitability of multilingual LDA models to address the problem at hand. We study three models, namely: enriched LDA (M+), two-layer LDA (M2L), and off-the-shelf bilingual LDA (Mbi). We use approximately 8K discussion threads from a Bleeping Computer forum as our dataset. We observe that M2L performs the best overall, although it yields a substantially coarser-grained view of the discussed themes in the threads (20 topics, 0.3% of the documents). We also note that M+ outperforms Mbi, achieving higher coherence, lower perplexity, and higher cross-lingual coverage ratio. We invite future studies to qualitatively assess the quality of the topics produced by the LDA models, such that the feasibility of employing such models in practice can be better determined. (c) 2023 Elsevier Inc. All rights reserved.

Keywords

Technical support, Logs, LDA, Multilingual LDA, Topic models
