Malmö University Publications
Assessing the Suitability of Semi-Supervised Learning Datasets using Item Response Theory
Chalmers University of Technology.
Chalmers University of Technology.
Chalmers University of Technology.
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT). ORCID iD: 0000-0002-7700-1816
2021 (English)
In: Proceedings - 2021 47th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2021, IEEE, 2021, p. 326-333
Conference paper, Published paper (Refereed)
Abstract [en]

In practice, supervised learning algorithms require fully labeled datasets to achieve the high accuracy demanded by modern applications. However, in industrial settings supervised learning algorithms can perform poorly because few labeled instances are available. Semi-supervised learning (SSL) is an automatic labeling approach that utilizes complete labels to infer missing labels in partially complete datasets. The high number of available SSL algorithms and the lack of systematic comparison between them leave practitioners without guidelines for selecting the appropriate one for their application. Moreover, each SSL algorithm is often validated and evaluated on a small number of common datasets. However, there is no research that examines which datasets are suitable for comparing different SSL algorithms. The purpose of this paper is to empirically evaluate the suitability of the datasets commonly used to evaluate and compare different SSL algorithms. We performed a simulation study using twelve datasets of three different data types (numerical, text, image) on thirteen different SSL algorithms. The contributions of this paper are two-fold. First, we propose the use of a Bayesian congeneric item response theory model to assess the suitability of commonly used datasets. Second, we compare the different SSL algorithms using these datasets. The results show that, with the exception of three datasets, the datasets have very low discrimination factors and are easily solved by the current algorithms. Additionally, the SSL algorithms have overlapping 90% credible intervals, indicating uncertainty in the difference between the accuracy of these SSL models. The paper concludes by suggesting that researchers and practitioners should more carefully consider the choice of datasets used for comparing SSL algorithms.

Place, publisher, year, edition, pages
IEEE, 2021. p. 326-333
Keywords [en]
Congeneric model, Data Labeling, Item Response Theory, Semi-Supervised learning, Supervised learning, 'current, Data labelling, High-accuracy, Industrial settings, Labeled dataset, Learning dataset, Modern applications, Semi-supervised learning, Learning algorithms
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-48503
DOI: 10.1109/SEAA53835.2021.00049
ISI: 000766051900041
Scopus ID: 2-s2.0-85119188616
ISBN: 9781665427050 (electronic)
OAI: oai:DiVA.org:mau-48503
DiVA, id: diva2:1623463
Conference
2021 47th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2021, 1-3 Sept. 2021, Palermo, Italy
Available from: 2021-12-29 Created: 2021-12-29 Last updated: 2022-07-20 Bibliographically approved

Authority records

Olsson, Helena Holmström
