  • DocumentCode
    1165881
  • Title
    A progressive approach to handling message-dependent deadlock in parallel computer systems
  • Author
    Song, Yong Ho; Pinkston, Timothy Mark
  • Author_Institution
    SMART Interconnects Group, Univ. of Southern California, Los Angeles, CA, USA
  • Volume
    14
  • Issue
    3
  • fYear
    2003
  • fDate
    3/1/2003 12:00:00 AM
  • Firstpage
    259
  • Lastpage
    275
  • Abstract
    Handling deadlocks is essential for providing reliable communication paths between processing nodes in parallel computer systems. The existence of multiple message types and associated inter-message dependencies may cause message-dependent deadlocks in networks that are designed to be free of routing deadlock. Most methods currently used for dealing with message-dependent deadlocks require more system resources than are necessary and/or do not use system resources efficiently. This may have an adverse effect on system performance if resources are scarce. In this paper, we characterize the frequency of message-dependent deadlocks in multiprocessor/multicomputer systems. We also propose a handling technique for message-dependent deadlocks based on progressive deadlock recovery and evaluate its performance against other approaches. Results show that message-dependent deadlocks occur very infrequently under typical circumstances, thus rendering approaches based on avoiding them overly restrictive in the common case. The proposed technique relaxes restrictions considerably, allowing the routing of packets and the handling of message-dependent deadlocks to be much more efficient, particularly when network resources are scarce.
  • Keywords
    message passing; parallel machines; parallel processing; resource allocation; system recovery; deadlock recovery; deadlock-free routing; interconnection network; message dependency; message dependent deadlock; parallel computer; Computer network reliability; Computer networks; Concurrent computing; Frequency; Hardware; Parallel processing; Routing; System performance; System recovery; Telecommunication network reliability
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2003.1189584
  • Filename
    1189584