Malmö University Publications
Aiding Software Root Cause Analysis with Large Language Models: Evaluation of the Effectiveness of Fine-tuned T5, GPT, and RAG in Handling Customer Fault Reports
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT).
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits. Student thesis.
Abstract [en]

Software systems generate a substantial number of fault reports during pre-deployment customer testing, making manual root cause analysis (RCA) both time-consuming and error-prone. This study explores the use of large language models (LLMs)—specifically T5, GPT-2, and a retrieval-augmented generation (RAG) model—to automate and enhance the RCA process in a domain-specific software engineering setting. Using a curated dataset of real-world fault descriptions and resolutions, the models were fine-tuned and evaluated using BLEU-4, ROUGE, and BERT-based semantic similarity metrics. Results indicate that T5 outperforms GPT-2 in lexical and structural fidelity (e.g., BLEU-4: 0.1810 vs. 0.1210), while RAG achieves the highest semantic similarity (BERT score: 0.7715). These findings suggest that combining T5’s precision in technical phrasing with RAG’s contextual understanding may offer a promising direction for developing intelligent RCA assistance tools that improve both accuracy and relevance in software fault diagnosis. Future work will focus on hybrid model optimization and user-centered system integration for real-world engineering workflows.
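The abstract reports lexical fidelity with BLEU-4 (e.g., 0.1810 for T5 vs. 0.1210 for GPT-2). As a minimal illustration of what that metric measures, the sketch below computes sentence-level BLEU-4 from scratch, assuming whitespace tokenization and no smoothing; the thesis's actual evaluation pipeline is not specified here, and production setups typically use libraries such as nltk or sacrebleu instead.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference: str, candidate: str) -> float:
    """Sentence-level BLEU-4: geometric mean of clipped 1- to 4-gram
    precisions, multiplied by a brevity penalty. Unsmoothed, so any
    zero n-gram precision collapses the score to 0."""
    ref, cand = reference.split(), candidate.split()
    precisions = []
    for n in range(1, 5):
        ref_counts = Counter(ngrams(ref, n))
        cand_counts = Counter(ngrams(cand, n))
        # Clipped overlap: a candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(len(cand) - n + 1, 0)
        if total == 0 or overlap == 0:
            return 0.0
        precisions.append(overlap / total)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

An exact match scores 1.0, a candidate sharing no 4-grams with the reference scores 0.0, and a correct-but-truncated candidate is discounted by the brevity penalty rather than by precision.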

Place, publisher, year, edition, pages
2025, p. 49
Keywords [en]
software development, fault reports, root cause analysis (RCA), Large-Language Model (LLM), hybrid dataset, supervised data, unsupervised data, prototype, decision support, scalability
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-82407
OAI: oai:DiVA.org:mau-82407
DiVA, id: diva2:2034314
Educational program
TS Computer Science: Applied Data Science
Available from: 2026-02-02. Created: 2026-02-01. Last updated: 2026-02-02. Bibliographically approved.

Open Access in DiVA

fulltext (1199 kB), 4 downloads
File information
File name: FULLTEXT02.pdf
File size: 1199 kB
Checksum (SHA-512): c039ea64f743c2732468994fabd599f707a65db435cdfa6a15e3522f1e8c997618b44ef3134794e307ed8596bc56e401271538242636de460610cecca512dfe1
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
FENG, SHIJUN
By organisation
Department of Computer Science and Media Technology (DVMT)
Computer Sciences

Total: 4 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
