Malmö University Publications
Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit
Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden. ORCID iD: 0000-0003-1206-1428
Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden. ORCID iD: 0000-0003-0302-6276
Department of Pharmaceutical Biosciences, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden; Science for Life Laboratory, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden. ORCID iD: 0000-0001-5447-9465
Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden. ORCID iD: 0000-0002-6289-7285
2021 (English) In: GigaScience, E-ISSN 2047-217X, Vol. 10, no 3
Article in journal (Refereed) Published
Abstract [en]

Background:

Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered “data hierarchy”. We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources.

Findings:

In our pipeline model, an “interestingness function” assigns an interestingness score to data objects in the stream, inducing a data hierarchy. From this score, a “policy” guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools to adopt this approach. We evaluate with 2 microscopy imaging case studies. The first is a high content screening experiment, where images are analyzed in an on-premise container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope.
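The relationship between the interestingness function and the policy can be illustrated with a minimal Python sketch. It is not the HASTE Toolkit API: the names `DataObject`, `interestingness`, `tier_policy`, and `prioritize`, the byte-based scoring rule, and the 0.75 and 0.25 thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass
class DataObject:
    """Hypothetical stand-in for one object in the stream (e.g. an image)."""
    name: str
    payload: bytes


def interestingness(obj: DataObject) -> float:
    """Hypothetical interestingness function: fraction of non-zero bytes.

    Returns a score in [0, 1]. A real function would be domain-specific.
    """
    if not obj.payload:
        return 0.0
    return sum(1 for b in obj.payload if b != 0) / len(obj.payload)


def tier_policy(score: float) -> str:
    """Hypothetical policy: map a score to a resource decision (a tier)."""
    if score >= 0.75:
        return "process-now"
    if score >= 0.25:
        return "store"
    return "discard"


def prioritize(
    stream: Iterable[DataObject],
    score_fn: Callable[[DataObject], float],
    policy: Callable[[float], str],
) -> Iterator[Tuple[str, float, str]]:
    """Score each object as it arrives, then let the policy decide its tier."""
    for obj in stream:
        score = score_fn(obj)
        yield obj.name, score, policy(score)


# Usage: three toy objects fall into three different tiers.
stream = [
    DataObject("a", b"\x01\x02\x03\x04"),
    DataObject("b", b"\x00\x00\x01\x01"),
    DataObject("c", b"\x00\x00\x00\x00"),
]
results = list(prioritize(stream, interestingness, tier_policy))
for name, score, decision in results:
    print(name, score, decision)
```

Keeping the scoring function and the policy as separate callables mirrors the separation the abstract describes: the scientific judgement of what is interesting can change without touching the code that decides how resources are spent.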

Conclusions:

Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between scientific concerns of data priority, and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be “bolted on” to new and existing systems, and is intended for use with a range of technologies in different deployment scenarios.

Place, publisher, year, edition, pages
Oxford University Press, 2021. Vol. 10, no 3
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mau:diva-66098
DOI: 10.1093/gigascience/giab018
ISI: 000637267600009
PubMedID: 33739401
Scopus ID: 2-s2.0-85103229403
OAI: oai:DiVA.org:mau-66098
DiVA, id: diva2:1840607
Available from: 2024-02-26 Created: 2024-02-26 Last updated: 2024-03-11 Bibliographically approved

Authors: Blamey, Ben; Toor, Salman; Dahlö, Martin; Wieslander, Håkan; Harrison, Philip J; Sintorn, Ida-Maria; Sabirsh, Alan; Wählby, Carolina; Spjuth, Ola; Hellander, Andreas