Malmö University Publications
An Empirical Evaluation of Algorithms for Data Labeling
Chalmers University of Technology, Department of Computer Science and Engineering, Gothenburg, Sweden.
Chalmers University of Technology, Department of Computer Science and Engineering, Gothenburg, Sweden.
Chalmers University of Technology, Department of Computer Science and Engineering, Gothenburg, Sweden.
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT). ORCID iD: 0000-0002-7700-1816
2021 (English). In: 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC 2021) / [ed] Chan, WK; Claycomb, B; Takakura, H; Yang, JJ; Teranishi, Y; Towey, D; Segura, S; Shahriar, H; Reisman, S; Ahamed, SI. IEEE, 2021, p. 201-209. Conference paper, Published paper (Refereed)
Abstract [en]

The lack of labeled data is a major problem in both research and industrial settings, since obtaining labels is often an expensive and time-consuming activity. In recent years, several machine learning algorithms have been developed to assist with and automate labeling in partially labeled datasets. While many of these algorithms are available in open-source packages, there is a lack of research investigating how these algorithms compare to each other across different types of datasets and different percentages of available labels. To address this problem, this paper empirically evaluates and compares seven algorithms for automated labeling in terms of their accuracy. We investigate how these algorithms perform on twelve well-known datasets covering three types of data: images, text, and numerical values. We evaluate these algorithms under two experimental conditions, with 10% and 50% of the labels available in the dataset. Each algorithm is evaluated independently ten times, with different random seeds, on each dataset under each experimental condition. The results are analyzed, and the algorithms are compared, using a Bayesian Bradley-Terry model. The results indicate that the active learning algorithms using the query strategies uncertainty sampling, query-by-committee (QBC), and random sampling consistently perform best. However, this comes at the expense of increased manual labeling effort. These results help machine learning practitioners choose suitable machine learning algorithms for labeling their data.
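To illustrate the kind of query strategy the abstract refers to, the following is a minimal, generic sketch of uncertainty sampling (least-confidence variant) for active learning. It is not the implementation evaluated in the paper; the function name and example data are illustrative assumptions.

```python
# Hypothetical sketch of uncertainty sampling (least-confidence), one of the
# active learning query strategies compared in the paper. The model asks a
# human to label the unlabeled points it is least confident about.

def least_confidence_query(probabilities, k):
    """Return indices of the k unlabeled points whose highest predicted
    class probability is lowest (i.e., the model is least confident)."""
    # Pair each point's maximum class probability with its index.
    confidences = [(max(p), i) for i, p in enumerate(probabilities)]
    confidences.sort()  # lowest max-probability (most uncertain) first
    return [i for _, i in confidences[:k]]

# Example: a classifier's predicted class probabilities for four
# unlabeled points in a two-class problem (made-up numbers).
probs = [
    [0.95, 0.05],  # confident prediction
    [0.55, 0.45],  # uncertain
    [0.80, 0.20],
    [0.51, 0.49],  # most uncertain
]
print(least_confidence_query(probs, 2))  # -> [3, 1]
```

In a full active learning loop, the returned indices would be sent to a human annotator, the new labels added to the training set, and the model retrained; query-by-committee (QBC) differs in that it selects points where an ensemble of models disagrees most.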

Place, publisher, year, edition, pages
IEEE, 2021. p. 201-209
Series
Proceedings International Computer Software and Applications Conference, ISSN 0730-3157
Keywords [en]
Data Labeling, Automatic Labeling, Active Learning, Semi-Supervised learning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-47248
DOI: 10.1109/COMPSAC51774.2021.00038
ISI: 000706529000027
Scopus ID: 2-s2.0-85115878254
ISBN: 978-1-6654-2463-9 (electronic)
ISBN: 978-1-6654-2464-6 (print)
OAI: oai:DiVA.org:mau-47248
DiVA, id: diva2:1617555
Conference
45th Annual International IEEE Computer Society Computers, Software, and Applications Conference (COMPSAC), July 12-16, 2021, held online
Available from: 2021-12-07. Created: 2021-12-07. Last updated: 2024-02-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Olsson, Helena Holmström
