[GE users] problem when parallel environment startup script fails

cjf001 john.foley at motorola.com
Tue Sep 1 16:15:40 BST 2009


SGEers:

I'm implementing a new parallel environment - all its startup script has
to do is make sure that only one host has been allocated by SGE (*) - so
that's easy. However, if the script finds otherwise, I want to signal
an error to SGE.
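
For illustration, the check described above could be sketched as follows. This is a minimal sketch and not the actual script: it assumes the PE's start_proc_args passes the path of the SGE host file ($pe_hostfile) as the only argument, and that each line of that file begins with a host name. The function names and the exit code of 1 are hypothetical choices.

```python
#!/usr/bin/env python3
"""Sketch of a PE startup check: succeed only if one host was allocated."""
import sys


def distinct_hosts(hostfile_lines):
    """Return the set of host names found in pe_hostfile-style lines.

    Assumption: the host name is the first whitespace-separated field.
    """
    hosts = set()
    for line in hostfile_lines:
        fields = line.split()
        if fields:  # skip blank lines
            hosts.add(fields[0])
    return hosts


def check_single_host(hostfile_lines):
    """Return 0 if exactly one host is listed, 1 otherwise."""
    return 0 if len(distinct_hosts(hostfile_lines)) == 1 else 1


def main(argv):
    """Read the host file named in argv[1] and return an exit status."""
    if len(argv) != 2:
        sys.stderr.write("usage: check_single_host.py <pe_hostfile>\n")
        return 2
    with open(argv[1]) as handle:
        status = check_single_host(handle)
    if status != 0:
        sys.stderr.write("this PE expects exactly one host\n")
    return status


# When installed as the real startup script, end the file with:
#     sys.exit(main(sys.argv))
```

The non-zero return value is what triggers the behaviour described in the rest of this message, so the sketch shows only the check itself, not a fix.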

The manual says that exiting the parallel environment startup script
with a code other than 0 will cause SGE to "report the error and
not start the parallel job". Sounds good, so I tried it - and it *does*
report an error, by putting the host queue that would have gotten the
job into ERROR state. Kind of drastic, but I guess I can live with that.

However, it leaves the job in the pending list. So, on the next scheduler
run, SGE assigns the job to *another* host queue, where it fails again and
leaves that host queue in ERROR state too. And on and on, until all the
host queues I have permission for are in ERROR state. That I cannot
live with!

So, my question is: has anyone successfully signalled an error to SGE when
a parallel environment startup script fails? If so, how did you do it? Also,
is this a bug, or is it working as designed? Am I missing something?

I thought about "qdel'ing" the job from within the startup script, but I
don't think that will work: the execute hosts (which I think the script is
running on at that point) are not submit hosts, so such commands are not
allowed there. Any other thoughts? I'm using SGE v6.2u2.

     Thanks!

       John



(*) - why, you ask? Because the application (Momentum) takes over all
the cores on the assigned host, but doesn't run across hosts. So it's not
a *real* parallel job, but I need it to be assigned all the cores on
the host.


-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# Antenna & Mechanical Simulation Grp #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=215323
