Opened 10 years ago

Last modified 9 years ago

#737 new enhancement

IZ3166: Prolog/Epilog for a parallel job which runs on all nodes

Reported by: reuti Owned by:
Priority: normal Milestone:
Component: sge Version: 6.2u3
Severity: Keywords: qmaster
Cc:

Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3166]

        Issue #:      3166             Platform:     All           Reporter: reuti (reuti)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.2u3            CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    ENHANCEMENT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     Prolog/Epilog for a parallel job which runs on all nodes
   Status whiteboard:
      Attachments:

     Issue 3166 blocks:
   Votes for issue 3166:


   Opened: Wed Oct 21 08:37:00 -0700 2009 
------------------------


Up to now, the global or queue prolog/epilog for a parallel job is executed only on the master node of the job. Under certain circumstances it is necessary to do some housekeeping
on all of the involved nodes. While the queue prolog could certainly be used to implement this by issuing a qrsh to each node, the same can't be done in the epilog, as
qrsh -inherit might no longer be available there; the epilog would instead need some passwordless (or passphraseless) login to be set up.
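
The prolog workaround described above could be sketched roughly as follows. This is only an illustration, not an existing SGE feature: it assumes that $PE_HOSTFILE is set in the prolog's environment and uses the usual "host slots queue processor-range" line format, and the function names and the script path /path/to/node_housekeeping.sh are hypothetical.

```python
import os
import subprocess


def parse_pe_hostfile(text):
    """Return the unique host names from PE host file contents, in order.

    Assumes each non-empty line starts with the host name, followed by
    further whitespace-separated fields (slots, queue, processor range).
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] not in hosts:
            hosts.append(fields[0])
    return hosts


def housekeeping_commands(hosts, script, master=None):
    """Build one `qrsh -inherit <host> <script>` argv per slave host.

    The master node is skipped because the prolog already runs there.
    Note: `master` must be spelled exactly as in the host file
    (short name vs. fully qualified name).
    """
    return [["qrsh", "-inherit", host, script]
            for host in hosts if host != master]


def run_on_all_nodes(script, master=None):
    """Hypothetical prolog entry point: run `script` on every slave node."""
    with open(os.environ["PE_HOSTFILE"]) as fh:
        hosts = parse_pe_hostfile(fh.read())
    for argv in housekeeping_commands(hosts, script, master):
        subprocess.run(argv, check=True)


# Dry run on made-up host file contents; nothing is executed here.
sample = (
    "node01 4 all.q@node01 UNDEFINED\n"
    "node02 4 all.q@node02 UNDEFINED\n"
)
for argv in housekeeping_commands(parse_pe_hostfile(sample),
                                  "/path/to/node_housekeeping.sh",
                                  master="node01"):
    print(" ".join(argv))
```

A prolog would call run_on_all_nodes() with its own script path. The same call is not a safe bet in the epilog, for the reason given above: qrsh -inherit may no longer be usable by then.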

- The suggestion discussed on the list was to add a flag to the queue definition that specifies whether the queue prolog/epilog should be executed only locally or on all nodes.

- Another approach would be to include prolog/epilog in the PE definition, which has the advantage of not tampering with any existing queue setup.

One application for this would be the program Molcas, which needs persistent scratch directories on all of the involved nodes across the mpiruns it issues (more than one). By default,
the scratch directories that SGE creates on the nodes are removed once the qrsh -inherit exits.

Change History (0)
