Malmö University Publications
Multimodal Deep Learning for Group Activity Recognition in Smart Office Environments
Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT).
Malmö University, Internet of Things and People (IOTAP). Malmö University, Faculty of Technology and Society (TS), Department of Computer Science and Media Technology (DVMT).
2020 (English). In: Future Internet, E-ISSN 1999-5903, Vol. 12, no. 8, article id 133. Article in journal (Refereed). Published.
Abstract [en]

Deep learning (DL) models have emerged in recent years as the state-of-the-art technique across numerous machine learning application domains. In particular, image processing tasks have seen significant performance improvements owing to the increased availability of large datasets and the growth of computing power. In this paper, we investigate the problem of group activity recognition in office environments using a multimodal deep learning approach that fuses audio and visual data from video. Group activity recognition is a complex classification task because it extends beyond identifying the activities of individuals to the combinations of activities and the interactions between them. The proposed fusion network was trained on the audio-visual stream of the AMI Corpus dataset. The procedure consists of two steps: first, we extract a joint audio-visual feature representation for activity recognition, and second, we account for the temporal dependencies in the video to complete the classification task. We provide a comprehensive set of experimental results showing that our multimodal deep network architecture outperforms previous approaches designed for unimodal analysis on the AMI dataset.
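
As a rough illustration of the two-step procedure described in the abstract, the sketch below shows how such an audio-visual fusion network could be assembled in PyTorch: per-segment visual and audio features are fused into a joint representation, and a recurrent layer then models the temporal dependencies before classification. The backbone sizes, input shapes, and the choice of a GRU are illustrative assumptions and do not reflect the architecture reported in the paper.

```python
# Hypothetical sketch of an audio-visual fusion network for group activity
# recognition: (1) learn a joint audio-visual feature per video segment,
# (2) model temporal dependencies across segments to classify the activity.
# Layer sizes and the CNN/GRU backbones are illustrative assumptions only.
import torch
import torch.nn as nn


class AudioVisualFusionNet(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        # Visual branch: a small CNN over per-segment RGB frames (assumed 3x224x224).
        self.visual = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Audio branch: a small CNN over log-mel spectrogram patches (assumed 1x64x64).
        self.audio = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Step 1: fuse the two modalities into a joint representation per segment.
        self.fusion = nn.Linear(2 * feat_dim, feat_dim)
        # Step 2: model temporal dependencies across the sequence of segments.
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, frames: torch.Tensor, spectrograms: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W); spectrograms: (batch, time, 1, H, W)
        b, t = frames.shape[:2]
        v = self.visual(frames.flatten(0, 1)).view(b, t, -1)
        a = self.audio(spectrograms.flatten(0, 1)).view(b, t, -1)
        joint = torch.relu(self.fusion(torch.cat([v, a], dim=-1)))
        _, h = self.temporal(joint)           # final hidden state summarises the sequence
        return self.classifier(h.squeeze(0))  # logits over group-activity classes


# Example usage with random tensors (2 videos, 8 segments each):
model = AudioVisualFusionNet(num_classes=5)
logits = model(torch.randn(2, 8, 3, 224, 224), torch.randn(2, 8, 1, 64, 64))
```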

Place, publisher, year, edition, pages
MDPI, 2020. Vol. 12, no. 8, article id 133
Keywords [en]
multimodal learning, deep learning, activity recognition
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-18645
DOI: 10.3390/fi12080133
ISI: 000564821200001
Scopus ID: 2-s2.0-85090084766
OAI: oai:DiVA.org:mau-18645
DiVA, id: diva2:1476626
Available from: 2020-10-15. Created: 2020-10-15. Last updated: 2024-02-05. Bibliographically approved.

Open Access in DiVA

fulltext (3134 kB), 236 downloads
File information
File name: FULLTEXT01.pdf
File size: 3134 kB
Checksum (SHA-512): 24580a17aabd6ce1e70d8fd14d9c6e00b379e0302ca4e5a977ce478860f03a2289704ac0be0b807cc45c2296716a5d823b8d924a0aff02d043cc2e3f4f6adb79
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text · Scopus

Authority records

Florea, George Albert; Mihailescu, Radu-Casian
