Aiding Software Root Cause Analysis with Large Language Models: Evaluation of the Effectiveness of Fine-tuned T5, GPT, and RAG in Handling Customer Fault Reports
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits
Student thesis
Abstract [en]
Software systems generate a substantial number of fault reports during pre-deployment customer testing, making manual root cause analysis (RCA) both time-consuming and error-prone. This study explores the use of large language models (LLMs)—specifically T5, GPT-2, and a retrieval-augmented generation (RAG) model—to automate and enhance the RCA process in a domain-specific software engineering setting. Using a curated dataset of real-world fault descriptions and resolutions, the models were fine-tuned and evaluated using BLEU-4, ROUGE, and BERT-based semantic similarity metrics. Results indicate that T5 outperforms GPT-2 in lexical and structural fidelity (e.g., BLEU-4: 0.1810 vs. 0.1210), while RAG achieves the highest semantic similarity (BERT score: 0.7715). These findings suggest that combining T5’s precision in technical phrasing with RAG’s contextual understanding may offer a promising direction for developing intelligent RCA assistance tools that improve both accuracy and relevance in software fault diagnosis. Future work will focus on hybrid model optimization and user-centered system integration for real-world engineering workflows.
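The abstract reports sentence-level BLEU-4 scores (0.1810 for T5 vs. 0.1210 for GPT-2) but does not show the evaluation pipeline, so the following is a minimal illustrative sketch of how a BLEU-4 score can be computed for a generated root-cause text against a reference resolution. The add-one smoothing and whitespace tokenization are assumptions for illustration; the thesis may use a standard toolkit such as NLTK with different smoothing.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-4: geometric mean of modified 1..4-gram
    precisions, multiplied by a brevity penalty for short candidates.
    Uses simple whitespace tokenization and add-one smoothing
    (assumptions; not necessarily the thesis's exact configuration)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = []
    for n in range(1, 5):
        c_counts = ngram_counts(cand, n)
        r_counts = ngram_counts(ref, n)
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty: penalize candidates shorter than the reference
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A perfect match scores 1.0, while an unrelated candidate scores near 0, which is why the reported scores around 0.12-0.18 indicate only partial lexical overlap with reference resolutions and motivate the complementary BERT-based semantic similarity metric.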
Place, publisher, year, edition, pages
2025, p. 49
Keywords [en]
software development, fault reports, root cause analysis (RCA), Large-Language Model (LLM), hybrid dataset, supervised data, unsupervised data, prototype, decision support, scalability
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-82407
OAI: oai:DiVA.org:mau-82407
DiVA, id: diva2:2034314
Educational program
TS Computer Science: Applied Data Science
Supervisors
Examiners
Available from: 2026-02-02 Created: 2026-02-01 Last updated: 2026-02-02 Bibliographically approved