Malmö University Publications
Theoretical Aspects on Performance Bounds and Fault Tolerance in Parallel Computing
Blekinge Institute of Technology.
2007 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis consists of two parts: performance bounds for scheduling algorithms for parallel programs in multiprocessor systems, and recovery schemes for fault-tolerant distributed systems when one or more computers go down. In the first part we derive tight bounds on the ratio of minimal completion times of a parallel program executed in a parallel system, in two scenarios. In the first scenario, we compare the minimal completion time when processes can be reallocated to other processors during execution with the minimal completion time when they cannot. In the second scenario, the schedule is preemptive, and we compare the minimal completion times obtained with two different numbers of preemptions. The second part discusses the problem of redistributing the load among the running computers in a parallel system. The goal is to find a redistribution scheme that maintains high performance even when one or more computers go down. Here we present four different redistribution algorithms. In both parts we use theoretical techniques that lead to explicit worst-case programs and scenarios. The correctness is based on mathematical proofs.

Place, publisher, year, edition, pages
Blekinge Institute of Technology, 2007.
Series
Blekinge Institute of Technology Doctoral Dissertation Series, ISSN 1653-2090
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:mau:diva-7778
Local ID: 8614
ISBN: 978-91-7295-126-6 (print)
OAI: oai:DiVA.org:mau-7778
DiVA, id: diva2:1404719
Available from: 2020-02-28. Created: 2020-02-28. Last updated: 2021-01-08. Bibliographically approved.
List of papers
1. Comparing the optimal performance of parallel architectures
2004 (English). In: Computer journal, ISSN 0010-4620, E-ISSN 1460-2067, Vol. 47, no 5, p. 527-544. Article in journal (Refereed). Published.
Abstract [en]

Consider a parallel program with n processes and a synchronization granularity z. Consider also two parallel architectures: an SMP with q processors and run-time reallocation of processes to processors, and a distributed system (or cluster) with k processors and no run-time reallocation. There is an inter-processor communication delay of t time units for the system with no run-time reallocation. In this paper we define a function H(n,k,q,t,z) such that the minimum completion time for all programs with n processes and a granularity z is at most H(n,k,q,t,z) times longer using the system with no reallocation and k processors compared to using the system with q processors and run-time reallocation. We assume optimal allocation and scheduling of processes to processors. The function H(n,k,q,t,z) is optimal in the sense that there is at least one program, with n processes and a granularity z, such that the ratio is exactly H(n,k,q,t,z). We also validate our results using measurements on distributed and multiprocessor Sun/Solaris environments. The function H(n,k,q,t,z) provides important insights into the performance implications of the fundamental design decision of whether or not to allow run-time reallocation of processes. These insights can be used when making cost/benefit trade-offs in the design of parallel execution platforms.

Abstract [sv] (translated to English)

We consider a parallel program with n processes and synchronization granularity z, together with two parallel architectures. The first has q processors, and full allocation of the processes is allowed; the second has k processors and no reallocation during execution. Each reallocation takes t seconds. We define a function H(n,k,q,t,z) such that the execution time of a program with n processes and granularity z is at most a factor H(n,k,q,t,z) longer on the system without reallocation than on the system with reallocation. We assume optimal allocation of processes in both systems. The function is optimal: there are programs for which the execution time is exactly H(n,k,q,t,z) times longer. The results are validated with measurements on multiprocessors in a Sun/Solaris environment.

Place, publisher, year, edition, pages
Oxford: Oxford University Press, 2004
Keywords
multiprocessor, parallel computing, allocation, performance, granularity, synchronization
National Category
Mathematical Analysis Computer Sciences
Identifiers
urn:nbn:se:mau:diva-39004 (URN)
10.1093/comjnl/47.5.527 (DOI)
000223426300002 ()
oai:bth.se:forskinfoA1C92F3B0B509BC9C12573CA00309B30 (Local ID)
oai:bth.se:forskinfoA1C92F3B0B509BC9C12573CA00309B30 (Archive number)
oai:bth.se:forskinfoA1C92F3B0B509BC9C12573CA00309B30 (OAI)
Note

Computer Journal, 47(5): 527-544 (2004), http://www.informatik.uni-trier.de/~ley/db/journals/cj/cj47.html#KlonowskaLB04

Available from: 2012-09-18. Created: 2021-01-08. Last updated: 2021-01-08. Bibliographically approved.
2. The maximum gain of increasing the number of preemptions in multiprocessor scheduling
2009 (English). In: Acta Informatica, ISSN 0001-5903, E-ISSN 1432-0525, Vol. 46, no 4, p. 285-295. Article in journal (Refereed). Published.
Abstract [en]

We consider the optimal makespan C(P, m, i) of an arbitrary set P of independent jobs scheduled with i preemptions on a multiprocessor with m identical processors. We study the ratio of such makespans for i and j preemptions, respectively, where i < j. This ratio depends on P, but we are interested in the P that maximizes it, i.e. we calculate a formula for the worst-case ratio G(m, i, j), defined as G(m, i, j) = max C(P, m, i)/C(P, m, j), where the maximum is taken over all sets P of independent jobs.
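
To make the ratio C(P, m, i)/C(P, m, j) concrete, the following Python sketch evaluates it for one small job set in the two extreme settings: no preemptions (found by brute force over all assignments) and unrestricted preemptions (computed with McNaughton's classical bound, max of the longest job and the average load). The function names and the job set are illustrative assumptions and do not come from the paper; the sketch does not reproduce the paper's formula for G(m, i, j).

```python
from fractions import Fraction
from itertools import product


def makespan_no_preemption(jobs, m):
    """Optimal makespan with 0 preemptions, by trying every job-to-processor assignment."""
    best = None
    for assignment in product(range(m), repeat=len(jobs)):
        loads = [0] * m
        for job, proc in zip(jobs, assignment):
            loads[proc] += job
        worst = max(loads)
        if best is None or worst < best:
            best = worst
    return best


def makespan_unlimited_preemption(jobs, m):
    """Optimal makespan with unrestricted preemptions (McNaughton's bound)."""
    return max(Fraction(max(jobs)), Fraction(sum(jobs), m))


jobs = [2, 2, 2]  # hypothetical job lengths, chosen only for illustration
m = 2             # number of identical processors

c_none = makespan_no_preemption(jobs, m)
c_free = makespan_unlimited_preemption(jobs, m)
ratio = Fraction(c_none) / c_free
print(c_none, c_free, ratio)
```

A single job set only gives a lower bound on the worst case for these two settings, since the worst-case ratio is a maximum over all job sets P.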

Place, publisher, year, edition, pages
Springer, 2009
National Category
Software Engineering
Identifiers
urn:nbn:se:mau:diva-39005 (URN)
10.1007/s00236-009-0096-5 (DOI)
000267214400002 ()
oai:bth.se:forskinfo746E4895019EE67EC12576AC003C33B0 (Local ID)
oai:bth.se:forskinfo746E4895019EE67EC12576AC003C33B0 (Archive number)
oai:bth.se:forskinfo746E4895019EE67EC12576AC003C33B0 (OAI)
Available from: 2012-09-18. Created: 2021-01-08. Last updated: 2021-01-08. Bibliographically approved.
3. Using Golomb Rulers for Optimal Recovery Schemes in Fault Tolerant Distributed Computing
2003 (English). In: Proceedings International Parallel and Distributed Processing Symposium, IEEE, 2003. Conference paper, Published paper (Refereed).
Abstract [en]

Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e. we want to optimize the worst-case behavior. In this paper we define recovery schemes that are optimal for a number of important cases. We also show that the problem of finding optimal recovery schemes corresponds to the mathematical problem of finding Golomb rulers. These provide optimal recovery schemes for up to 373 computers in the cluster.
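
A Golomb ruler is a set of integer marks in which every pairwise difference is distinct. The Python sketch below checks only that defining property; the function name and the example marks are illustrative assumptions, and the mapping from a ruler to a recovery scheme is not given in the abstract, so it is not modelled here.

```python
from itertools import combinations


def is_golomb_ruler(marks):
    """Return True if the marks are distinct and all pairwise differences are distinct."""
    if len(set(marks)) != len(marks):
        return False
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))


print(is_golomb_ruler([0, 1, 4, 6]))  # differences 1, 2, 3, 4, 5, 6 are all distinct
print(is_golomb_ruler([0, 1, 2, 4]))  # the difference 1 occurs twice
```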

Place, publisher, year, edition, pages
IEEE, 2003
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:mau:diva-39009 (URN)
10.1109/IPDPS.2003.1213390 (DOI)
0-7695-1926-1 (ISBN)
Conference
International Parallel and Distributed Processing Symposium; 22-26 April 2003; Nice, France
Available from: 2021-01-08. Created: 2021-01-08. Last updated: 2021-04-27. Bibliographically approved.
4. Using Modulo Rulers for Optimal Recovery Schemes in Distributed Computing
2004 (English). Conference paper, Published paper (Refereed).
Abstract [en]

Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e. we want to optimize the worst-case behavior. We define recovery schemes that are optimal for a larger number of computers down than in previous results. We also show that the problem of finding optimal recovery schemes for a cluster with n computers corresponds to the mathematical problem of finding the longest sequence of positive integers for which the sum of the sequence and the sums of all subsequences modulo n are unique.
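
The uniqueness condition on sums modulo n can be sketched in Python as follows. Two points are assumptions made for illustration rather than statements from the paper: "subsequences" is read as contiguous runs of the sequence, and entries are restricted to 1..n-1. All function names are hypothetical, and the exhaustive search is practical only for small n.

```python
def contiguous_sums_mod_n(seq, n):
    """Sums of every contiguous run of seq (including the whole sequence), modulo n."""
    sums = []
    for start in range(len(seq)):
        total = 0
        for value in seq[start:]:
            total += value
            sums.append(total % n)
    return sums


def has_unique_sums_mod_n(seq, n):
    """True if no two contiguous runs of seq have the same sum modulo n."""
    sums = contiguous_sums_mod_n(seq, n)
    return len(sums) == len(set(sums))


def longest_sequence(n):
    """Depth-first search for a longest valid sequence with entries in 1..n-1 (assumed range)."""
    best = []

    def extend(seq):
        nonlocal best
        if len(seq) > len(best):
            best = list(seq)
        for value in range(1, n):
            seq.append(value)
            if has_unique_sums_mod_n(seq, n):
                extend(seq)
            seq.pop()

    extend([])
    return best


print(has_unique_sums_mod_n([1, 2, 4], 7))  # sums mod 7 are 1, 3, 0, 2, 6, 4
print(has_unique_sums_mod_n([1, 2, 3], 7))  # 1 + 2 and 3 collide
print(longest_sequence(7))
```

The search terminates because a sequence of length k has k(k+1)/2 contiguous sums, which cannot all be distinct modulo n once that count exceeds n.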

Place, publisher, year, edition, pages
Papeete, Tahiti, French Polynesia: Institute of Electrical and Electronics Engineers (IEEE), 2004
Keywords
distributed processing, resource allocation, software fault tolerance, system recovery, workstation clusters
National Category
Computer Sciences
Identifiers
urn:nbn:se:mau:diva-39006 (URN)
10.1109/PRDC.2004.1276564 (DOI)
000189450600015 ()
2-s2.0-2642523651 (Scopus ID)
oai:bth.se:forskinfo2A7A12F683CD869DC1256E280044FF56 (Local ID)
0-7695-2076-6 (ISBN)
oai:bth.se:forskinfo2A7A12F683CD869DC1256E280044FF56 (Archive number)
oai:bth.se:forskinfo2A7A12F683CD869DC1256E280044FF56 (OAI)
Conference
10th International Symposium, Pacific Rim Dependable Computing (PRDC 2004)
Available from: 2012-09-18. Created: 2021-01-08. Last updated: 2023-12-07. Bibliographically approved.
5. Extended Golomb Rulers as the New Recovery Schemes in Distributed Dependable Computing
2005 (English). Conference paper, Published paper (Refereed).
Abstract [en]

Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the cluster. The redistribution is determined by the recovery scheme. The recovery scheme should keep the load as evenly distributed as possible even when the most unfavorable combinations of computers break down, i.e. we want to optimize the worst-case behavior. We have previously defined recovery schemes that are optimal for some limited cases. In this paper we find new recovery schemes that are based on so-called Golomb rulers. They are optimal for a much larger number of cases than the previous results.

Place, publisher, year, edition, pages
Denver, USA: IEEE Computer Society, 2005
National Category
Mathematical Analysis Computer Sciences
Identifiers
urn:nbn:se:mau:diva-39007 (URN)
10.1109/IPDPS.2005.215 (DOI)
2-s2.0-33746318098 (Scopus ID)
oai:bth.se:forskinfo5515054A3F74173CC12573CA004E116E (Local ID)
0-7695-2312-9 (ISBN)
oai:bth.se:forskinfo5515054A3F74173CC12573CA004E116E (Archive number)
oai:bth.se:forskinfo5515054A3F74173CC12573CA004E116E (OAI)
Conference
IPDPS - 19th International Parallel and Distributed Processing Symposium
Available from: 2012-09-18. Created: 2021-01-08. Last updated: 2024-04-29. Bibliographically approved.
6. Optimal recovery schemes in fault tolerant distributed computing
2005 (English). In: Acta Informatica, ISSN 0001-5903, E-ISSN 1432-0525, Vol. 41, no 6, p. 341-365. Article in journal (Refereed). Published.
Abstract [en]

Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all n computers are up and running, we would like the load to be evenly distributed among the computers. When one or more computers break down, the load on these computers must be redistributed to other computers in the system. The redistribution is determined by the recovery scheme. The recovery scheme is governed by a sequence of integers modulo n. Each sequence guarantees minimal load on the computer that has maximal load, even when the most unfavorable combinations of computers go down. We calculate the best possible such recovery schemes for any number of crashed computers by an exhaustive search, where brute-force testing is avoided by a mathematical reformulation of the problem and a branch-and-bound algorithm. The search nevertheless has a high complexity. Optimal sequences, and thus a corresponding optimal bound, are presented for a maximum of twenty-one computers in the distributed system or cluster.

Place, publisher, year, edition, pages
Springer, 2005
National Category
Mathematical Analysis Computer Sciences
Identifiers
urn:nbn:se:mau:diva-39008 (URN)
10.1007/s00236-005-0161-7 (DOI)
000228546000002 ()
oai:bth.se:forskinfo08F5EB6C8C8D190EC12573CA004CCF81 (Local ID)
oai:bth.se:forskinfo08F5EB6C8C8D190EC12573CA004CCF81 (Archive number)
oai:bth.se:forskinfo08F5EB6C8C8D190EC12573CA004CCF81 (OAI)
Available from: 2015-02-17. Created: 2021-01-08. Last updated: 2021-01-08. Bibliographically approved.

Open Access in DiVA

fulltext (1599 kB), 93 downloads
File information
File name: FULLTEXT01.pdf
File size: 1599 kB
Checksum (SHA-512): 2b6c5740b24bfc63ae28ddd26ea9d1160795017e6b6a29d816dd44742f2d6d0a7a3f4f9dc1dca06080077c775c5dc1ff58c168fb972767ae6e4261b08b9a683d
Type: fulltext
Mimetype: application/pdf

Other links

http://urn.kb.se/resolve?urn=urn:nbn:se:bth-00385
Computer and Information Sciences
