[GE users] qdel

John_Tai John_Tai at smics.com
Tue Nov 6 01:22:18 GMT 2007



Here are the messages regarding an interrupted job:

11/06/2007 09:19:00|qmaster|dsls11|W|job 220729.1 failed on host dsl13 general before job because: 11/06/2007 09:19:00 [999:15426]: can't open file /tmp/220729.1.pc.q/pid: Permission denied

11/06/2007 09:19:00|execd|dsl13|E|shepherd of job 220729.1 exited with exit status = 11


Here is the output for a running job; the sge_shepherd is visible now.

Thanks again for the help. 

  PID  PPID  PGRP COMMAND
    1     0     0 init
    2     1     1 [keventd]
    3     1     1 [kapmd]
    4     1     1 [ksoftirqd/0]
    7     1     1 [bdflush]
    5     1     1 [kswapd]
    6     1     1 [kscand]
    8     1     1 [kupdated]
    9     1     1 [mdrecoveryd]
   13     1     1 [kjournald]
   72     1     1 [khubd]
 3737     1     1 [kjournald]
 3738     1     1 [kjournald]
 4046     1     1 [eth0]
 4091     1  4091 syslogd -m 0
 4095     1  4095 klogd -x
 4143     1  4143 rpc.statd
 4155     1  4155 mdadm --monitor --scan -f
 4172     1  4172 /sbin/auditd
 4227     1  4227 /usr/sbin/apmd -p 10 -w 5 -W -P /etc/sysconfig/apm-scripts/apmscript
 4322     1  4322 /usr/sbin/sshd
19672  4322 19672  \_ sshd: root@pts/0
19675 19672 19675      \_ -bash
19726 19675 19726          \_ ps -e f -o pid,ppid,pgrp,command
 4338     1  4338 xinetd -stayalive -pidfile /var/run/xinetd.pid
 4387     1  4387 gpm -t imps2 -m /dev/mouse
 4397     1  4397 crond
 4421     1  4421 xfs -droppriv -daemon
 4431     1  4431 /usr/sbin/atd
 4490     1  4490 /sbin/mingetty tty1
 4491     1  4491 /sbin/mingetty tty2
 4492     1  4492 /sbin/mingetty tty3
 4493     1  4493 /sbin/mingetty tty4
 4494     1  4494 /sbin/mingetty tty5
 4495     1  4495 /sbin/mingetty tty6
 4496     1  4496 /usr/bin/gdm-binary -nodaemon
 4550  4496  4550  \_ /usr/bin/gdm-binary -nodaemon
 4551  4550  4551      \_ /usr/X11R6/bin/X :0 -auth /var/gdm/:0.Xauth vt7
 4559  4550  4559      \_ /usr/bin/gdmgreeter
 4641     1  4641 portmap
 4658     1  4657 ypbind
 4852     1  4852 /usr/sbin/automount --timeout=60 /home yp auto.home -nobrowse
 4862     1     1 [rpciod]
 4863     1     1 [lockd]
 4947     1  4947 ntpd -U ntp -p /var/run/ntpd.pid -g
 5316     1  5316 /home/sge/sge6.1/bin/lx24-x86/sge_execd
 9543  5316  9543  \_ sge_shepherd-191394 -bg
 9544  9543  9544      \_ /home/sge/sge6.1/utilbin/lx24-x86/rshd -l
 9545  9544  9545          \_ /home/sge/sge6.1/utilbin/lx24-x86/qrsh_starter /data/sge/spool/dsl51/active_jobs/191394.1
 9578  9545  9578              \_ csh -c eldo S013PLLFNB_NC9.sp -compat
 9612  9578  9578                  \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1/bin/eldo S013PLLFNB_NC9.sp -compat
 9820  9612  9578                      \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1/com/eldo S013PLLFNB_NC9.sp -compat
 9831  9820  9578                          \_ /home/edamgr/linux/mentor/ams_2007.1/ixl/bin/eldo.exe -i S013PLLFNB_NC9.sp -compat
 9839  9831  9578                              \_ /home/edamgr/linux/mentor/ams_2007.1/ixl/lib/mgls_asynch  -f7,10
 8350     1  8350 /home/cadence/linux/IC500/tools/bin/clsbd
 8351  8350  8350  \_ /home/cadence/linux/IC500/tools/bin/clsbd
 8352  8351  8350      \_ /home/cadence/linux/IC500/tools/bin/clsbd
22685     1 22685 cupsd
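Most of the listing above is unrelated to the job. The sketch below reduces a saved `ps -e f -o pid,ppid,pgrp,command` listing to the sge_shepherd processes and all their descendants. It marks the shepherd lines, repeatedly marks any line whose PPID is already marked until nothing changes, and prints the marked lines in their original order. `shepherd_tree` is a hypothetical helper name, and any line whose text contains `sge_shepherd` (a `grep`, for instance) is treated as a shepherd.

```sh
# shepherd_tree
# Reads saved "ps -e f -o pid,ppid,pgrp,command" output on stdin and
# prints only sge_shepherd lines and their descendants, in input order.
# Hypothetical helper for illustration only.
shepherd_tree() {
    awk '
        # Remember every data line (first field is a PID); mark shepherds.
        $1 ~ /^[0-9]+$/ {
            n++
            line[n] = $0; pid[n] = $1; ppid[n] = $2
            if ($0 ~ /sge_shepherd/) keep[$1] = 1
        }
        END {
            # Mark children of marked PIDs until nothing changes.
            do {
                changed = 0
                for (i = 1; i <= n; i++)
                    if (!(pid[i] in keep) && (ppid[i] in keep)) {
                        keep[pid[i]] = 1
                        changed = 1
                    }
            } while (changed)
            for (i = 1; i <= n; i++)
                if (pid[i] in keep) print line[i]
        }
    '
}
```

Applied to the listing above, this would keep only the eight lines from `sge_shepherd-191394` down to `mgls_asynch`.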


-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de]
Sent: Monday, November 05, 2007 6:04 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] qdel


On 05.11.2007, at 06:34, John_Tai wrote:

> Here is the output. I started the job as "eldo mult.cir" and tried to stop
> it with Ctrl-C, but the process is still running.
> Thanks for helping.
>
>   PID  PPID  PGRP COMMAND
>  2538     1  2538 /home/sge/sge6.1/bin/lx24-amd64/sge_execd
> 11526     1 11526 /home/sge/sge6.1/utilbin/lx24-amd64/rshd -l
> 11527 11526 11527  \_ /home/sge/sge6.1/utilbin/lx24-amd64/qrsh_starter /data1/sge/spool/dsls1/active_jobs/219731.1
> 11560 11527 11560      \_ tcsh -c eldo mult.cir
> 11593 11560 11560          \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1-64bit/bin/eldo mult.cir
> 11864 11593 11560              \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1-64bit/com/eldo mult.cir
> 11877 11864 11560                  \_ /home/edamgr/linux/mentor/ams_2007.1-64bit/aol/bin/eldo_64.exe -i mult.cir
> 11886 11877 11560                      \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1-64bit/bin/run_wdb_server
> 12157 11886 11560                          \_ /bin/sh /home/edamgr/linux/mentor/ams_2007.1-64bit/aol/bin/run_wdb_server
> 12158 12157 11560                              \_ /home/edamgr/linux/mentor/ams_2007.1-64bit/jre/aol/bin/java -DMGC_TMPDIR=/tmp -DAMS_WDBSERVER_INFO=/home/1

I wonder where the sge_shepherd went. Can you please post the same
output for a job that is still running, i.e. still listed in qstat? Is
there anything in /tmp, such as an error message from the job?
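In the listing quoted above, `rshd` has PPID 1 and no `sge_shepherd` appears anywhere above the job, which is why qdel has nothing to signal. The sketch below makes that check mechanical on a saved `ps -e f -o pid,ppid,pgrp,command` listing. It follows the PPID chain upward from a given PID and succeeds only if some ancestor's line contains `sge_shepherd`. `under_shepherd` is a hypothetical helper. It reads only the saved text and never queries the live process table.

```sh
# under_shepherd PID
# Reads saved "ps -e f -o pid,ppid,pgrp,command" output on stdin and
# exits 0 if PID or one of its ancestors is an sge_shepherd, 1 otherwise.
# Hypothetical helper; any line containing "sge_shepherd" counts.
under_shepherd() {
    awk -v target="$1" '
        # Keep only data lines (first field is a PID), not the header.
        $1 ~ /^[0-9]+$/ { parent[$1] = $2; cmd[$1] = $0 }
        END {
            p = target
            # Walk up the PPID chain until PID 1 or a PID missing from the
            # listing; the step limit guards against a malformed listing.
            while ((p in parent) && p > 1 && steps++ < 100000) {
                if (cmd[p] ~ /sge_shepherd/) exit 0
                p = parent[p]
            }
            exit 1
        }
    '
}
```

With the quoted listing, `under_shepherd 11877` fails, because the chain ends at `rshd` (PPID 1). With the running job at the top of the thread, `under_shepherd 9831` succeeds via `sge_shepherd-191394`.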

You should be able to kill these processes with:

kill -9 -- -11560

i.e. by killing the whole process group.
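The negative argument works because every process in the job tree quoted above (tcsh, both eldo wrapper scripts, eldo_64.exe, and the run_wdb_server/java children) has PGRP 11560. Before sending SIGKILL to a whole group, it can be worth listing the group members to confirm nothing unrelated is caught. The sketch below does this on saved `ps -e f -o pid,ppid,pgrp,command` output. `pgrp_members` is a hypothetical helper name.

```sh
# pgrp_members PGRP
# Reads saved "ps -e f -o pid,ppid,pgrp,command" output on stdin and
# prints every data line whose PGRP column (third field) equals PGRP.
# Hypothetical helper, meant as a check before "kill -9 -- -PGRP".
pgrp_members() {
    awk -v g="$1" '$1 ~ /^[0-9]+$/ && $3 == g { print }'
}
```

For example, run `ps -e f -o pid,ppid,pgrp,command | pgrp_members 11560`, and only then `kill -9 -- -11560`. In the quoted listing, rshd (11526) and qrsh_starter (11527) have their own process groups and would not be hit by that kill.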

-- Reuti

>
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Friday, November 02, 2007 5:45 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] qdel
>
>
> On 02.11.2007, at 06:36, John_Tai wrote:
>
>> Sorry for the late reply. I tried that command, but it doesn't work. Is
>> this what you were looking for?
>>
>>  ps -ef |grep eldo
>
> Yes, nearly. The blank between -e and f is important. It would also be
> useful to get the process group:
>
> ps -e f -o pid,ppid,pgrp,command
>
>> johnt    18024 17996  0 13:31 ?        00:00:00 tcsh -c eldo mult.cir
>> johnt    18054 18024  0 13:31 ?        00:00:00 /bin/sh /home/edamgr/linux/mentor/ams_2007.1-64bit/bin/eldo mult.cir
>> johnt    18325 18054  0 13:31 ?        00:00:00 /bin/sh /home/edamgr/linux/mentor/ams_2007.1-64bit/com/eldo mult.cir
>> johnt    18337 18325 84 13:31 ?        00:02:05 /home/edamgr/linux/mentor/ams_2007.1-64bit/aol/bin/eldo_64.exe -i mult.cir
>
> -- Reuti
>
>>
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Tuesday, October 23, 2007 5:51 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] qdel
>>
>>
>> Hi,
>>
>> On 23.10.2007, at 08:39, John_Tai wrote:
>>
>>> After upgrading to 6.1, qdel no longer kills the process on the
>>> execution host. The job is no longer listed in qstat, but the process
>>> itself is still running.
>>>
>>> The job is submitted using:
>>>
>>> qrsh -v eda=$cmd -cwd -now n <command>
>>>
>>> We used to be able to delete the job with qdel or even with ctrl-c,
>>> but now it doesn't work anymore.
>>>
>>> Any ideas why or how to debug?
>>
>> What does
>>
>> qrsh -v eda=$cmd -cwd ps -e f -o pid,ppid,pgrp,command
>>
>> show? It should list all processes bound to the sge_shepherd.
>>
>> -- Reuti
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>





More information about the gridengine-users mailing list