[GE users] qdel

Reuti reuti at staff.uni-marburg.de
Fri Nov 16 15:32:53 GMT 2007


On 12.11.2007 at 02:53, John_Tai wrote:

> The NFS is fine; that error appears only when I interrupt the process.
>
> The "clsbd" processes are left over from a previous job, but that's
> actually normal. They just linger around.
>
> And yes, this problem is constant.

It's okay that CTRL-C doesn't work: the slave nodes don't get this
signal, so it would be up to the parallel program to shut down all its
slave tasks before it quits on its own. But a qdel should definitely
work.
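
To illustrate the difference, here is a minimal Python sketch. It is not SGE code, and the helper names start_in_new_group and kill_group are made up for this example. A CTRL-C only reaches the foreground process group of the terminal it is typed in, whereas a kill aimed at a process group reaches every member of that group on the same host. In the process listing quoted below, the job's processes from csh downward share PGRP 9578, which is the kind of group such a kill would target; that the shepherd proceeds exactly this way is an assumption here.

```python
import os
import signal
import subprocess


def start_in_new_group(cmd):
    """Start cmd as leader of its own session and process group.

    Hypothetical helper: stands in for a daemon launching a job.
    """
    return subprocess.Popen(cmd, start_new_session=True)


def kill_group(proc, sig=signal.SIGTERM, timeout=5.0):
    """Send sig to every process in proc's group, then reap the leader."""
    pgid = os.getpgid(proc.pid)  # equals proc.pid for a group leader
    os.killpg(pgid, sig)         # reaches children that stayed in the group
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    # The shell and its background sleep share one process group.
    job = start_in_new_group(["sh", "-c", "sleep 60 & wait"])
    print("exit status:", kill_group(job))
```

A negative value returned by wait() means the group leader was terminated by that signal. Processes that have left the group, or that run on another host, are not reached this way, which is why the slave tasks of a parallel job need separate handling.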

-- Reuti


> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Saturday, November 10, 2007 9:43 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qdel
>
>
> Hi,
>
> On 06.11.2007 at 02:22, John_Tai wrote:
>
>> Here are the messages regarding an interrupted job:
>>
>> 11/06/2007 09:19:00|qmaster|dsls11|W|job 220729.1 failed on host
>> dsl13 general before job because: 11/06/2007 09:19:00 [999:15426]:
>> can't open file /tmp/220729.1.pc.q/pid: Permission denied
>
> NFS problem?
>
>> 11/06/2007 09:19:00|execd|dsl13|E|shepherd of job 220729.1 exited
>> with exit status = 11
>
> 11 means: Resource temporarily unavailable
>
>> Here is the output for a running job. I can see the sge_shepherd now.
>>
>> Thanks again for the help.
>>
>>   PID  PPID  PGRP COMMAND
>>  5316     1  5316 /home/sge/sge6.1/bin/lx24-x86/sge_execd
>>  9543  5316  9543  \_ sge_shepherd-191394 -bg
>>  9544  9543  9544      \_ /home/sge/sge6.1/utilbin/lx24-x86/rshd -l
>>  9545  9544  9545          \_ /home/sge/sge6.1/utilbin/lx24-x86/qrsh_starter /data/sge/spool/dsl51/active_jobs/191394.1
>>  9578  9545  9578              \_ csh -c eldo S013PLLFNB_NC9.sp -compat
>>  9612  9578  9578                  \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1/bin/eldo S013PLLFNB_NC9.sp -compat
>>  9820  9612  9578                      \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1/com/eldo S013PLLFNB_NC9.sp -compat
>>  9831  9820  9578                          \_ /home/edamgr/linux/mentor/ams_2007.1/ixl/bin/eldo.exe -i S013PLLFNB_NC9.sp -compat
>>  9839  9831  9578                              \_ /home/edamgr/linux/mentor/ams_2007.1/ixl/lib/mgls_asynch  -f7,10
>>  8350     1  8350 /home/cadence/linux/IC500/tools/bin/clsbd
>>  8351  8350  8350  \_ /home/cadence/linux/IC500/tools/bin/clsbd
>>  8352  8351  8350      \_ /home/cadence/linux/IC500/tools/bin/clsbd
>> 22685     1 22685 cupsd
>
> This looks perfect. One question, though: where are the three "clsbd"
> processes coming from? Maybe with the last job the shepherd had already
> quit before you issued the qdel, hence it wasn't able to kill the job.
> Is the problem still persistent?
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

