[GE users] custom suspend method TSTP doesnt work for batch queues

Reuti reuti at staff.uni-marburg.de
Thu Sep 16 23:11:19 BST 2004


Hi Andy (II ;-) ),

> instead of the default suspend signal (SIGSTOP) i need to use the SIGTSTP
> for suspend.  i have changed the queues to use the SIGTSTP as the suspend
> method.
> 
> as a test, i have set up the interactive queues to use this suspend method
> when the queue is suspended and everything works fine.  that is, when the
> queue is suspended, the job receives the TSTP signal and suspends.  the job
> resumes when the queue is unsuspended.
> 
> but, for the batch queue, using the identical TSTP command and running the
> identical job, the job does not pause on suspend.  it keeps going as if it
> never heard the TSTP signal.
> 
> i manually sent the TSTP signal to the batch executed job using the kill
> command and the pid of the running job, but it ignores this.
> 
> if i manually send TSTP with kill to the same job executed through the
> interactive queue, it pauses as expected until i send it CONT to resume.
> 
> it seems that the TSTP signal is being ignored by the job launched by the
> batch queue but not by the job launched by the interactive queue.

Maybe the default handling of SIGTSTP is different for an interactive shell
(one with a terminal connected). To check whether the signal is delivered at
all, you could try a small program that installs its own handler:

****************** TOP OF DATA *******************
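#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

/* Set by the handler, checked in the main loop. */
static volatile sig_atomic_t got_tstp = 0;

void sighandler(int signum);

int main(void)
{
    volatile double x;  /* volatile keeps the busy loop from being optimized away */
    long  i;

    if (signal(SIGTSTP, sighandler) == SIG_ERR)
      printf("Couldn't register signal handler.\n");

    for (;;)
    {
        /* Busy work; the double factor comes first so i*i cannot overflow a long. */
        for (i=0;i<=100000;i++)
            x=3.141592654*i+i+2.718281864*i*i;

        /* system() is not async-signal-safe, so it is called here, not in the handler. */
        if (got_tstp)
        {
            got_tstp = 0;
            system("date > i_got_the_signal");
        }
    }

    return 0;
}

void sighandler(int signum) {
  (void)signum;
  got_tstp = 1;
}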
***************** BOTTOM OF DATA *****************

If the file i_got_the_signal gets created, the signal was sent (and received).
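
A complementary check, only as a sketch: a signal that is set to "ignore" stays ignored across fork and exec, so it can also help to look at the disposition the job inherits before it installs anything itself. The helper names sigtstp_state() and report_sigtstp() are made up for this example, and I am assuming a POSIX system with sigaction(). Whether your batch environment really starts jobs with SIGTSTP ignored is exactly what this would tell you; I have not verified it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/*
 * Hypothetical helper: report the current disposition of SIGTSTP.
 * Returns 0 = default action, 1 = ignored, 2 = a handler is installed,
 * -1 = sigaction() failed.
 */
int sigtstp_state(void)
{
    struct sigaction sa;

    /* Passing NULL as the new action only queries, it changes nothing. */
    if (sigaction(SIGTSTP, NULL, &sa) == -1)
        return -1;
    if (sa.sa_handler == SIG_IGN)
        return 1;
    if (sa.sa_handler == SIG_DFL)
        return 0;
    return 2;
}

/* Hypothetical helper: print the result in readable form. */
void report_sigtstp(void)
{
    static const char *names[] = {
        "left at the default action",
        "ignored",
        "caught by a handler"
    };
    int state = sigtstp_state();

    if (state < 0)
        perror("sigaction");
    else
        printf("SIGTSTP is %s\n", names[state]);
}
```

Call report_sigtstp() as the first statement of main() and submit the program once to the interactive queue and once to the batch queue. If the batch run reports "ignored", the signal is being discarded before your program ever sees it.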

Why do you have to use SIGTSTP? You can catch SIGTSTP and raise a SIGSTOP:

****************** TOP OF DATA *******************
#include <stdio.h>
#include <signal.h>

void sighandler(int signum);

int main(void)
{
    volatile double x;  /* volatile keeps the busy loop from being optimized away */
    long  i;

    if (signal(SIGTSTP, sighandler) == SIG_ERR)
      printf("Couldn't register signal handler.\n");

    for (;;)
    {
        /* Busy work; the double factor comes first so i*i cannot overflow a long. */
        for (i=0;i<=100000;i++)
            x=3.141592654*i+i+2.718281864*i*i;
    }

    return 0;
}

void sighandler(int signum) {
  (void)signum;
  /* SIGSTOP cannot be caught or ignored; the process stops here until SIGCONT. */
  raise(SIGSTOP);
}
***************** BOTTOM OF DATA *****************
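
The same idea can be written with sigaction() instead of signal(). This is only a sketch under the assumption of a POSIX system: on some systems signal() resets the handler to the default after the first delivery, so a second suspend would no longer be translated, while a handler installed with sigaction() stays in place. The function names tstp_to_stop() and install_tstp_to_stop() are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Handler: turn a caught SIGTSTP into a SIGSTOP for this process. */
static void tstp_to_stop(int signum)
{
    (void)signum;
    /* raise() is async-signal-safe; execution continues here after SIGCONT. */
    raise(SIGSTOP);
}

/*
 * Hypothetical helper: install the handler so that it persists
 * across repeated suspend/resume cycles.
 * Returns 0 on success, -1 on failure (errno is set by sigaction()).
 */
int install_tstp_to_stop(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = tstp_to_stop;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  /* restart interrupted system calls after resume */

    return sigaction(SIGTSTP, &sa, NULL);
}
```

Call install_tstp_to_stop() once at the start of main(), in place of the signal() call above.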

Can you start your program 'by hand' via rsh/ssh on one node? In that case
there is also no terminal connected, and I would expect SIGTSTP to be ignored
there as well (on the node).


Cheers - Reuti
