Publications from Malmö University
A Taxonomy of Interactive Online Machine Learning Strategies
Malmö universitet, Internet of Things and People (IOTAP). Malmö universitet, Fakulteten för teknik och samhälle (TS), Institutionen för datavetenskap och medieteknik (DVMT). Malmö University. ORCID iD: 0000-0002-3155-8408
Malmö universitet, Fakulteten för teknik och samhälle (TS), Institutionen för datavetenskap och medieteknik (DVMT). Malmö universitet, Internet of Things and People (IOTAP). ORCID iD: 0000-0003-0998-6585
Malmö universitet, Internet of Things and People (IOTAP). Malmö universitet, Fakulteten för teknik och samhälle (TS), Institutionen för datavetenskap och medieteknik (DVMT). ORCID iD: 0000-0002-9471-8405
2020 (English). In: ECML PKDD 2020: Machine Learning and Knowledge Discovery in Databases / [ed] Hutter F.; Kersting K.; Lijffijt J.; Valera I., Springer, 2020, p. 1-17. Conference paper, published paper (refereed)
Abstract [en]

In interactive machine learning, human users and learning algorithms work together to solve challenging learning problems, e.g. problems with limited or no annotated data or with trust issues. As annotating data can be costly, it is important to minimize the amount of annotated data needed for training while still achieving high classification accuracy. This is done by attempting to select the most informative data instances for training, where the number of instances is limited by a labelling budget. In an online learning setting, the decision of whether or not to select an instance for labelling has to be made on the fly, as the data arrives sequentially and is only valid for a limited time period. We present a taxonomy of interactive online machine learning strategies. An interactive learning strategy determines which instances to label in an unlabelled dataset. In the taxonomy we differentiate between interactive learning strategies where the computer controls the learning process (active learning) and those where human users control the learning process (machine teaching). We then make a distinction between what triggers the learning: active learning could be triggered by uncertainty, time, or randomly, whereas machine teaching could be triggered by errors, state changes, time, or factors related to the user. We also illustrate the taxonomy by implementing versions of the different strategies and performing experiments on a benchmark dataset as well as on a synthetically generated dataset. The results show that the choice of interactive learning strategy affects performance, especially in the beginning of the online learning process, when there is a limited amount of labelled data.
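The distinction between triggers can be made concrete with a small sketch. The code below is not the authors' implementation. It is a minimal illustration, assuming a toy nearest-centroid learner, a synthetic two-class stream, and hypothetical names (`OnlineCentroidClassifier`, `uncertainty_trigger`, `random_trigger`, `error_trigger`, `run_stream`, `make_stream`). It shows two active learning triggers (uncertainty, random) and one machine teaching trigger (errors) operating under a labelling budget, with accuracy measured test-then-train on each arriving instance.

```python
import random


class OnlineCentroidClassifier:
    """Tiny incremental nearest-centroid learner (a stand-in for any online classifier)."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sums
        self.counts = {}  # label -> number of labelled instances seen

    def update(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict(self, x):
        """Return (label, confidence); (None, 0.0) before any label has been seen."""
        if not self.counts:
            return None, 0.0
        dist = {}
        for y, s in self.sums.items():
            centroid = [v / self.counts[y] for v in s]
            dist[y] = sum((a - b) ** 2 for a, b in zip(x, centroid)) ** 0.5
        ranked = sorted(dist, key=dist.get)
        if len(ranked) == 1:
            return ranked[0], 0.5  # only one class known: treat as unsure
        d1, d2 = dist[ranked[0]], dist[ranked[1]]
        conf = 1.0 - d1 / (d1 + d2) if d1 + d2 > 0 else 0.5
        return ranked[0], conf


def uncertainty_trigger(threshold=0.7):
    """Active learning: the learner asks for a label when its confidence is low."""
    return lambda pred, conf, truth: conf < threshold


def random_trigger(rate=0.1, seed=0):
    """Active learning: the learner asks for a label at random."""
    rng = random.Random(seed)
    return lambda pred, conf, truth: rng.random() < rate


def error_trigger():
    """Machine teaching: the user volunteers a label when the prediction is wrong."""
    return lambda pred, conf, truth: pred != truth


def run_stream(stream, trigger, budget):
    """Process instances one at a time; return (accuracy, labels_used)."""
    model, correct, used = OnlineCentroidClassifier(), 0, 0
    for x, truth in stream:
        pred, conf = model.predict(x)      # test first ...
        correct += pred == truth
        if used < budget and trigger(pred, conf, truth):
            model.update(x, truth)         # ... then train, if a label is obtained now
            used += 1
    return correct / len(stream), used


def make_stream(n, seed=0):
    """Synthetic two-class stream: Gaussian blobs around (0, 0) and (5, 5)."""
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        y = rng.randint(0, 1)
        stream.append(([rng.gauss(5 * y, 1), rng.gauss(5 * y, 1)], y))
    return stream


if __name__ == "__main__":
    data = make_stream(200)
    for name, trig in [("uncertainty", uncertainty_trigger()),
                       ("random", random_trigger()),
                       ("error", error_trigger())]:
        acc, used = run_stream(data, trig, budget=20)
        print(f"{name:12s} accuracy={acc:.2f} labels_used={used}")
```

Only the error trigger consults the true label when deciding whether to label, which mirrors the idea that in machine teaching the human, who can see that the estimate is wrong, initiates the interaction. The threshold, rate, and budget values are arbitrary choices for the illustration.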

Place, publisher, year, edition, pages
Springer, 2020. p. 1-17
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12458
Keywords [en]
interactive machine learning, active learning, machine teaching, online learning, streaming data
National Category
Identifiers
URN: urn:nbn:se:mau:diva-17435
DOI: 10.1007/978-3-030-67661-2_9
ISI: 000717542900009
Scopus ID: 2-s2.0-85103280211
ISBN: 978-3-030-67660-5 (print)
ISBN: 978-3-030-67661-2 (digital)
OAI: oai:DiVA.org:mau-17435
DiVA, id: diva2:1436766
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Available from: 2020-06-08. Created: 2020-06-08. Last updated: 2024-02-05. Bibliographically approved
In thesis
1. Approaches to Interactive Online Machine Learning
2020 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

With the Internet of Things paradigm, the data generated by the rapidly increasing number of connected devices lead to new possibilities, such as using machine learning for activity recognition in smart environments. However, it also introduces several challenges. The sensors of different devices might be of different types, making the fusion of data non-trivial. Moreover, the devices are often mobile, so data from a particular sensor is not always available, i.e. there is a need to handle data from a dynamic set of sensors. From a machine learning perspective, the data from the sensors arrives in a streaming fashion, which calls for online learning, in contrast to the many learning problems where a static dataset is assumed. Machine learning is in many cases a good approach for classification problems, but the performance is often linked to the quality of the data. Having a good dataset to train a model on can be an issue in general, due to the often costly process of annotating the data. With dynamic and heterogeneous data, annotation can be even more problematic, because of the ever-changing environment. This means that there might be little or no annotated data to train the model on at the start of learning, often referred to as the cold start problem.

To handle these issues, adaptive systems are needed. By adaptive we mean that the model is not static over time, but is updated if, for instance, the environment changes. By including a human in the loop during the learning process, which we refer to as interactive machine learning, the input from users can be utilized to build the model. The type of input used is typically annotations of the data, i.e. user input in the form of correctly labelled data points. Generally, it is assumed that the user always provides correct labels in accordance with the chosen interactive learning strategy. In many real-world applications, however, these assumptions are not realistic, as users might provide incorrect labels or fail to provide labels in line with the chosen strategy.

In this thesis we explore which interactive learning strategies are possible in the given scenario and how they affect performance, as well as the effect of machine learning algorithms on performance. We also study how a user who is not always reliable, i.e. who does not always provide a correct label when expected to, can affect performance. We propose a taxonomy of interactive online machine learning strategies and test how the different strategies affect performance through experiments on multiple datasets. The findings show that the overall best performing interactive learning strategy is one where the user provides labels when previous estimations have been incorrect, but that the best performing machine learning algorithm depends on the problem scenario. The experiments also show that a decreased reliability of the user leads to decreased performance, especially when there is a limited amount of labelled data.
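One way to picture the unreliable user described above is as a wrapper around a perfect labeller. The following is a minimal sketch under stated assumptions, not the simulation used in the thesis: the function name `make_unreliable_user` and the parameters `reliability` and `p_no_answer` are hypothetical, and the label set is assumed to contain at least two classes.

```python
import random


def make_unreliable_user(labels, reliability, p_no_answer=0.0, seed=0):
    """Simulate a user who answers correctly with probability `reliability`.

    With probability `p_no_answer` the user gives no label at all (returns None).
    Otherwise the user returns the true label with probability `reliability`
    and a randomly chosen wrong label the rest of the time.
    """
    rng = random.Random(seed)

    def user(true_label):
        if rng.random() < p_no_answer:
            return None                      # user does not respond
        if rng.random() < reliability:
            return true_label                # correct annotation
        wrong = [l for l in labels if l != true_label]
        # Assumption: with a single known class there is no wrong label to give.
        return rng.choice(wrong) if wrong else true_label

    return user


if __name__ == "__main__":
    user = make_unreliable_user(["walk", "sit", "stand"], reliability=0.8)
    answers = [user("walk") for _ in range(1000)]
    share_correct = sum(a == "walk" for a in answers) / len(answers)
    print(f"share of correct labels: {share_correct:.2f}")
```

An online learner would then be updated with `user(true_label)` instead of the true label, skipping the update when the result is `None`.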

Place, publisher, year, edition, pages
Malmö: Malmö universitet, 2020. p. 129
Series
Studies in Computer Science ; 10
Keywords
Machine Learning, Interactive Machine Learning, Online Learning, Active Learning, Machine Teaching
National Category
Identifiers
URN: urn:nbn:se:mau:diva-17433
DOI: 10.24834/isbn.9789178770854
ISBN: 978-91-7877-084-7
ISBN: 978-91-7877-085-4
Presentation
2020-06-18, 10:15 (English)
Opponent
Supervisors
Funder
Knowledge Foundation, 20140035
Available from: 2020-06-09. Created: 2020-06-09. Last updated: 2024-03-05. Bibliographically approved
2. Interactive Online Machine Learning
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the Internet of Things paradigm, the data generated by the rapidly increasing number of connected devices lead to new possibilities, such as using machine learning for activity recognition in smart environments. However, it also introduces several challenges. The sensors of different devices might be mobile and of different types, i.e. there is a need to handle streaming data from a dynamic and heterogeneous set of sensors. In machine learning, the performance is often linked to the availability and quality of annotated data. Annotating data is in general costly, but it can be even more challenging if there is little or no annotated data to train the model on at the start of learning. To handle these issues, we implement interactive and adaptive systems. By including a human in the loop, which we refer to as interactive machine learning, the input from users can be utilized to build the model. The type of input used in interactive machine learning is typically annotations of the data, i.e. correctly labelled data points. Generally, it is assumed that the user always provides correct labels in accordance with the chosen interactive learning strategy. In many real-world applications, however, these assumptions are not realistic, as users might provide incorrect labels or fail to provide labels in line with the chosen strategy.

In this thesis we explore which interactive learning strategy types are possible in the given scenario and how they affect performance, as well as the effect of machine learning algorithms on the performance. We also study how a user who is not always reliable, i.e. who does not always provide a correct label when expected to, can affect performance. We propose a taxonomy of interactive online machine learning strategies and test how the different strategies affect performance through experiments on multiple datasets. Simulated experiments are compared to experiments with human participants, to verify the results. The findings show that the overall best performing interactive learning strategy is one where the user provides labels when current estimations are incorrect, but that the best performing machine learning algorithm depends on the problem scenario. The experiments also show that a decreased reliability of the user leads to decreased performance, especially when there is a limited amount of labelled data. The robustness of the machine learning algorithms differs; for example, the Naïve Bayes classifier is better at handling a lower reliability of the user. We also present a systematic literature review on machine teaching, a subfield of interactive machine learning where the human is proactive in the interaction. The study shows that the area of machine teaching is rapidly evolving, with an increased number of publications in recent years. However, as it is still maturing, there exist several open challenges that would benefit from further exploration, e.g. how human factors can affect performance.

Place, publisher, year, edition, pages
Malmö: Malmö universitet, 2022. p. 209
Series
Studies in Computer Science ; 18
Keywords
Interactive Machine Learning, Active Learning, Machine Teaching, Online Learning
National Category
Identifiers
URN: urn:nbn:se:mau:diva-51987
DOI: 10.24834/isbn.9789178772810
ISBN: 978-91-7877-280-3
ISBN: 978-91-7877-281-0
Public defence
2022-06-23, HS aula and live-streamed, Jan Waldenströms gata 25, Malmö, 10:00 (English)
Opponent
Supervisors
Note

In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Malmö University's products or services. Internal or personal use of this material is permitted.

Papers VI and VII appear in the dissertation as manuscripts.

Available from: 2022-06-03. Created: 2022-06-02. Last updated: 2023-09-05. Bibliographically approved

Authors

Tegen, Agnes; Davidsson, Paul; Persson, Jan A.