No subject


Wed Jan 12 20:38:46 GMT 2011


This works fine for me when my high-priority jobs suspend my own low-priority VCS jobs.
But I've been seeing some (intermittent?) failures when trying to suspend other people's jobs, and vice versa.
I'm still debugging it, but I was wondering whether I'm hitting a problem with file or PID permissions, i.e., which user does the suspend_method actually run as?

Rgds,
David



-----Original Message-----
From: Lloyd Cha [mailto:lccha+sgeusers at immerbox.com]
Sent: Sunday, August 10, 2008 3:22 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Can't get Synopsys VCS job to release it's license when suspended...

Hi David -

Once upon a time (like on Aug 05, 2008), Reuti wrote:
> >So somehow when SGE sends TSTP, the jobscript (and also the job?
> >there appear to be 2 things in the process group?) suspends but the
> >job itself doesn't actually see the TSTP so doesn't release the
> >license?
> >Is the actual job suspending because its parent jobscript suspends?

If you're using the default SGE method to send the TSTP signal
(i.e. your suspend_method only has the signal name and not an
executable path), it sends that signal to the process group of the
jobscript.  Unless the jobscript changes the process group, both the
jobscript and the actual VCS job should see the TSTP and the license
should get released.
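For reference, that default case corresponds to a queue configuration along these lines, edited with "qconf -mq <queue>" (the queue name is a placeholder). Without an override, SGE suspends jobs with SIGSTOP, which a program cannot catch, so VCS would never get the chance to give its license back; naming a catchable signal such as SIGTSTP is what changes that. This is a minimal sketch; the SIG-prefixed signal name follows the queue_conf(5) convention.

```
suspend_method    SIGTSTP
```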

If the jobscript or some other wrapper moves the child processes into a
different process group, they won't see the TSTP.

You can check the process groups with the following:
   ps -e -o pid,pgid,ppid,args

(if your system doesn't support -o, then using -j might work)
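To narrow that listing to a single job, the same ps keywords can be filtered on one process group. This is a small POSIX sh sketch; the function name show_pgroup is hypothetical (not part of SGE), and it assumes a ps that accepts -o with the pid, pgid, ppid and args keywords (procps on Linux does).

```shell
#!/bin/sh
# show_pgroup PID
# Print pid, pgid, ppid and command line for every process in the same
# process group as PID (hypothetical helper, not part of SGE).
show_pgroup() {
    # "pgid=" suppresses the header; tr strips ps's column padding.
    pgid=$(ps -o pgid= -p "$1" | tr -d ' ')
    if [ -z "$pgid" ]; then
        echo "show_pgroup: no such process: $1" >&2
        return 1
    fi
    # Keep only the rows whose second column (the pgid) matches.
    ps -e -o pid=,pgid=,ppid=,args= | awk -v g="$pgid" '$2 == g'
}
```

Run it as "show_pgroup <jobscript pid>" on the execution host. If simv does not appear in the output, something has moved it into a process group of its own.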

Assuming you're in the simulation phase of VCS, I don't think you'd be
calling any vendor wrappers.  You'd just be calling the VCS simv
executable directly, right?  In that case, unless your wrapper
(jobscript) has some sort of signal handling stuff that sends signals
to its children (i.e. if the jobscript caught the TSTP and then ran
kill -STOP <child>), I would think that the simv would see SGE's TSTP
signal.

You can also simulate what SGE is doing by using "kill -s TSTP -pgid",
where pgid is the process group ID of the top-level jobscript.
The leading "-" tells kill to send the signal to the whole process group
instead of just that one process.
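That manual test can be wrapped in a few lines of POSIX sh. The helper name signal_pgroup is hypothetical; it looks up the process group of a given PID (for example the jobscript) and signals the whole group. The "--" is there because kill may otherwise read a negative PID as an option.

```shell
#!/bin/sh
# signal_pgroup SIGNAL PID
# Send SIGNAL (a name such as TSTP or CONT) to every process in the
# process group that PID belongs to (hypothetical helper, not part of SGE).
signal_pgroup() {
    sig=$1
    pgid=$(ps -o pgid= -p "$2" | tr -d ' ')
    if [ -z "$pgid" ]; then
        echo "signal_pgroup: no such process: $2" >&2
        return 1
    fi
    # A negative PID addresses a process group; "--" ends option parsing
    # so "-$pgid" is not mistaken for a signal option.
    kill -s "$sig" -- "-$pgid"
}
```

For example, "signal_pgroup TSTP <jobscript pid>", check whether the license comes back, then "signal_pgroup CONT <jobscript pid>" to resume. Point it at the jobscript, not at your own shell's PID, or your shell's process group gets suspended too.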

On the other hand, if you have written a custom suspend method, you'd
want to make sure that it sends the TSTP to the VCS executable itself.
If it sends it to the jobscript ($job_pid), the jobscript would have
to catch the TSTP and then send a TSTP to the actual VCS process.  I'm
guessing a bit here, because I don't know what your my_suspend.sh
looks like.
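If you do use a custom method, one way to get the TSTP to the VCS process itself is to forward it to the jobscript's children. The sketch below is hypothetical (it is not David's my_suspend.sh, which isn't shown): forward_to_children sends a signal to each direct child of a PID. It only reaches direct children, so if simv is started through another wrapper it would be a grandchild and would be missed.

```shell
#!/bin/sh
# forward_to_children PID SIGNAL
# Send SIGNAL to each direct child of PID (e.g. simv started by the
# jobscript). Hypothetical helper; grandchildren are not reached.
forward_to_children() {
    parent=$1
    sig=$2
    children=$(ps -e -o pid=,ppid= | awk -v p="$parent" '$2 == p { print $1 }')
    status=0
    for child in $children; do
        # Record a failure (e.g. the child already exited) but keep going.
        kill -s "$sig" "$child" || status=1
    done
    return $status
}
```

A my_suspend.sh built on it could end with 'forward_to_children "$1" TSTP', with the queue's suspend_method set to something like "/path/to/my_suspend.sh $job_pid" (the path is a placeholder), and a matching resume script would forward CONT. Adding a line such as "id -un >> /tmp/suspend.log" to the script (log path hypothetical) is one way to see which user it actually runs as.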

I hope what I wrote above is clear.  Let me know if it isn't.

-L

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net






More information about the gridengine-users mailing list