Publications from Malmö University
Aiding Software Root Cause Analysis with Large Language Models: Evaluation of the Effectiveness of Fine-tuned T5, GPT, and RAG in Handling Customer Fault Reports
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT).
2025 (English). Independent thesis, advanced level (degree of Master, Two Years), 80 credits / 120 HE credits. Student thesis
Abstract [en]

Software systems generate a substantial number of fault reports during pre-deployment customer testing, making manual root cause analysis (RCA) both time-consuming and error-prone. This study explores the use of large language models (LLMs)—specifically T5, GPT-2, and a retrieval-augmented generation (RAG) model—to automate and enhance the RCA process in a domain-specific software engineering setting. Using a curated dataset of real-world fault descriptions and resolutions, the models were fine-tuned and evaluated using BLEU-4, ROUGE, and BERT-based semantic similarity metrics. Results indicate that T5 outperforms GPT-2 in lexical and structural fidelity (e.g., BLEU-4: 0.1810 vs. 0.1210), while RAG achieves the highest semantic similarity (BERT score: 0.7715). These findings suggest that combining T5’s precision in technical phrasing with RAG’s contextual understanding may offer a promising direction for developing intelligent RCA assistance tools that improve both accuracy and relevance in software fault diagnosis. Future work will focus on hybrid model optimization and user-centered system integration for real-world engineering workflows.
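The evaluation metrics named in the abstract are standard generation metrics. As an illustration only (this is not the thesis's evaluation code, and its tokenization and smoothing choices are assumptions), a sentence-level BLEU-4 score between a model-generated fault resolution and a reference resolution can be sketched in plain Python:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Sentence-level BLEU-4: geometric mean of add-one-smoothed,
    clipped 1- to 4-gram precisions, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, 5):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_counts & r_counts).values())   # clipped matches
        total = sum(c_counts.values())
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    # Brevity penalty discourages overly short candidate outputs.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

BLEU-4 rewards exact lexical and phrasal overlap, which is why the abstract pairs it with a BERT-based semantic similarity score: two resolutions can describe the same root cause with little word overlap, and only an embedding-based metric captures that.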

Place, publisher, year, edition, pages
2025, p. 49
Keywords [en]
software development, fault reports, root cause analysis (RCA), Large-Language Model (LLM), hybrid dataset, supervised data, unsupervised data, prototype, decision support, scalability
HSV category
Identifiers
URN: urn:nbn:se:mau:diva-82407
OAI: oai:DiVA.org:mau-82407
DiVA, id: diva2:2034314
Educational programme
TS Computer Science: Applied Data Science
Supervisor
Examiner
Available from: 2026-02-02 Created: 2026-02-01 Last updated: 2026-02-02 Bibliographically approved

Open Access in DiVA

fulltext (1199 kB), 44 downloads
File information
File: FULLTEXT02.pdf
File size: 1199 kB
Checksum (SHA-512):
c039ea64f743c2732468994fabd599f707a65db435cdfa6a15e3522f1e8c997618b44ef3134794e307ed8596bc56e401271538242636de460610cecca512dfe1
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
FENG, SHIJUN
By the organisation

Search outside of DiVA

Google
Google Scholar
Total: 44 downloads
The number of downloads is the sum of all downloads of all full texts. It may, for example, include earlier versions that are no longer available.
