Malmö University Publications
An empirical evaluation of deep semi-supervised learning
Chalmers University of Technology, Department of Computer Science and Engineering, Hörselgången 5, SE-412 96 Gothenburg, Västra Götaland, Sweden.
Chalmers University of Technology, Department of Computer Science and Engineering, Hörselgången 5, SE-412 96 Gothenburg, Västra Götaland, Sweden.
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT).ORCID iD: 0000-0002-7700-1816
2025 (English). In: International Journal of Data Science and Analytics, ISSN 2364-415X, Vol. 20, no. 4, p. 4127-4148. Article in journal (Refereed). Published.
Abstract [en]

Obtaining labels for supervised learning is time-consuming, and practitioners seek to minimize manual labeling. Semi-supervised learning allows practitioners to reduce manual labeling by including unlabeled data in the training process. With many deep semi-supervised algorithms and applications available, practitioners need guidelines for selecting the best algorithm for their problem, yet the performance of new algorithms is rarely compared against existing algorithms on real-world data. This study fills that gap by empirically evaluating 16 deep semi-supervised learning algorithms. To investigate whether the algorithms perform differently in different scenarios, they are run on 15 commonly known datasets spanning three data types (image, text, and sound). Since manual data labeling is expensive, practitioners must know how many manually labeled instances are needed to achieve the lowest error rates; this study therefore varies the number of available labels to study the manual effort required for an optimal error rate. Additionally, to study how the algorithms perform on real-world datasets, the researchers add noise to the datasets to mirror real-world conditions. The study uses the Bradley-Terry model to rank the algorithms by error rate and a Binomial model to investigate the probability of achieving an error rate below 10%. The results demonstrate that utilizing unlabeled data with semi-supervised learning may improve classification accuracy over supervised learning. Based on the results, the authors recommend FreeMatch, SimMatch, and SoftMatch, since they provide the lowest error rates and have a high probability of achieving an error rate below 10% on noisy datasets.
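The abstract states that the 16 algorithms are ranked with the Bradley-Terry model. The paper's actual inference setup is not reproduced here; as a rough, illustrative sketch only, the snippet below shows how Bradley-Terry strengths can be fit from pairwise "achieved a lower error rate" outcomes using the classic minorization-maximization (MM) update. The function name and the win-count matrix are hypothetical, not taken from the study.

```python
import numpy as np

def bradley_terry(wins, n_iter=1000, tol=1e-8):
    """Fit Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i, j] = number of comparisons in which algorithm i achieved a
    lower error rate than algorithm j. Uses the classic MM
    (minorization-maximization) update; returns strengths summing to 1,
    so a higher strength means a better-ranked algorithm.
    """
    n = wins.shape[0]
    p = np.ones(n) / n
    for _ in range(n_iter):
        p_new = np.empty(n)
        for i in range(n):
            # Denominator: pairwise comparison counts weighted by current strengths.
            denom = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            p_new[i] = wins[i].sum() / denom
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p

# Hypothetical win counts among three algorithms over repeated runs.
wins = np.array([[0, 8, 9],
                 [2, 0, 6],
                 [1, 4, 0]], dtype=float)
strengths = bradley_terry(wins)
ranking = np.argsort(-strengths)  # indices sorted best-first
```

Note that the record's keywords mention Bayesian data analysis; the maximum-likelihood MM fit above is a simpler stand-in for whatever inference the paper actually performs, intended only to convey the pairwise-ranking idea.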

Place, publisher, year, edition, pages
Springer, 2025. Vol. 20, no 4, p. 4127-4148
Keywords [en]
Data labeling, Software engineering, Semi-supervised learning, Bayesian data analysis
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-73330
DOI: 10.1007/s41060-024-00713-8
ISI: 001401152000001
Scopus ID: 2-s2.0-85217256716
OAI: oai:DiVA.org:mau-73330
DiVA id: diva2:1931700
Available from: 2025-01-27. Created: 2025-01-27. Last updated: 2025-10-03. Bibliographically approved.

Open Access in DiVA

fulltext (390 kB) — 39 downloads
File information
File name: FULLTEXT01.pdf
File size: 390 kB
Checksum (SHA-512): ee812b14dffa80a6d3a9a981bef0302a0e781a281515c7563e69e0faef5d608ea6d044f66fba7f84da69317b985c1a782b0096d75366f27c5ac7fbf87fc2b929
Type: fulltext. Mimetype: application/pdf

Other links

Publisher's full text
Scopus

Authority records

Olsson, Helena H.

Total: 39 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 141 hits