[GE users] Fwd: SGE 6.0: Job 9 failed

Francesco Siano fsiano at thphy.uni-duesseldorf.de
Wed Jul 14 19:01:10 BST 2004



This is $SGE_ROOT/default/spool/pico5/messages:

07/13/2004 21:44:36|execd|pico5|I|starting up 6.0
07/14/2004 10:15:09|execd|pico5|I|starting up 6.0
07/14/2004 17:38:32|execd|pico5|I|starting up 6.0
07/14/2004 17:38:32|execd|pico5|E|abnormal termination of shepherd for job 9.1: "exit_status" file is empty
07/14/2004 17:38:32|execd|pico5|E|can't open usage file "active_jobs/9.1/usage" for job 9.1: No such file or directory
07/14/2004 17:38:32|execd|pico5|E|"can't read usage file for job 9.1

This is $SGE_ROOT/default/spool/qmaster/messages (only the last line; the
rest is unrelated):

07/14/2004 17:38:31|qmaster|micro|W|job 9.1 failed on host pico5 before writing exit_status because: can't read usage file for job 9.1
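
Judging only from the excerpts above, each line in these messages files seems to carry five "|"-separated fields: timestamp, daemon, host, a one-letter level (I, W, E), and the message text. Under that assumption, a minimal Python sketch for pulling the warnings and errors out of a messages file could look like this. The names LogEntry, parse_line and entries_at_level are hypothetical helpers, not part of SGE, and the field layout is inferred from the lines quoted here rather than from SGE documentation.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: datetime
    daemon: str   # e.g. "execd" or "qmaster"
    host: str     # e.g. "pico5"
    level: str    # one letter as seen in the excerpt: I, W or E
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one messages line; return None if it does not fit the assumed layout."""
    # Split at most 4 times so a "|" inside the message text is preserved.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    stamp, daemon, host, level, message = parts
    try:
        timestamp = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return LogEntry(timestamp, daemon, host, level, message)


def entries_at_level(lines: Iterable[str], levels: str = "EW") -> List[LogEntry]:
    """Return the parsed entries whose level letter is contained in `levels`."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry.level in levels]
```

Passing an open messages file to entries_at_level would return only the W and E entries; lines that do not match the layout (for example wrapped continuation lines) are skipped.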

# ls -la $SGE_ROOT/default/spool

drwxr-xr-x  42 sgeadmin users    4096 Jul 13 22:37 .
drwxr-xr-x   4 sgeadmin users    4096 Jul 13 21:11 ..
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:14 micro
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:21 micro10
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:21 micro11
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:21 micro12
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:22 micro13
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:22 micro14
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:22 micro15
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:23 micro16
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:23 micro17
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:23 micro18
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:23 micro19
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:18 micro2
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:24 micro20
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:24 micro21
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:24 micro22
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:24 micro23
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:25 micro24
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:25 micro25
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:25 micro26
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:25 micro27
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:19 micro3
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:19 micro4
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:19 micro5
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:19 micro6
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:20 micro7
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:20 micro8
drwxr-xr-x   5 sgeadmin users    4096 Jul 13 21:20 micro9
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:35 pico10
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:36 pico11
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:37 pico12
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:37 pico13
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:25 pico3
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:31 pico4
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:32 pico5
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:32 pico6
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:33 pico7
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:34 pico8
drwxr-xr-x   5 sgeadmin sgeadmin 4096 Jul 13 22:34 pico9
drwxr-xr-x   4 sgeadmin users    4096 Jul 13 21:32 qmaster
drwxr-xr-x   2 sgeadmin users    4096 Jul 13 21:12 spooldb
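
With this many host directories, the group column is easier to compare when the listing is summarized per group. The following is a rough Python sketch of that idea for a Unix host; dirs_by_group and group_name are hypothetical helper names, and whether the group difference has anything to do with the failed job is not established here.

```python
import grp
import os
from collections import defaultdict
from typing import Dict, List


def group_name(gid: int) -> str:
    """Map a numeric gid to its group name, falling back to the number itself."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def dirs_by_group(spool_root: str) -> Dict[str, List[str]]:
    """Group the immediate subdirectories of spool_root by owning group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    with os.scandir(spool_root) as entries:
        for entry in entries:
            # Only real directories are considered; symlinks are not followed.
            if entry.is_dir(follow_symlinks=False):
                gid = entry.stat(follow_symlinks=False).st_gid
                groups[group_name(gid)].append(entry.name)
    return {group: sorted(names) for group, names in groups.items()}
```

Called on the spool directory shown above, this would be expected to return one list of directory names per group, which makes any outliers easy to spot.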

Now I see that for the PCs (the pico* directories) the group is sgeadmin
rather than users. Should I change it to users?
Thanks for your help.
-Francesco


On Wed, 14 Jul 2004 19:37:48 +0200, Reuti <reuti at staff.uni-marburg.de>  
wrote:

>> $SGE_ROOT=/.netmount/software_micro/gridware/sge6.0, which is dynamically
>> mounted and accessible to all nodes and PCs.
>
> Can you write to this location? You could also try creating a local spool
> directory /var/spool/sge on the nodes (with the proper access rights) and
> changing the entry in the cluster configuration:
>
> execd_spool_dir           /var/spool/sge
>
> (you have to stop SGE to apply this change by hand in
> /usr/sge/default/common/configuration). What is in the messages files on
> the qmaster and the node?
>
> Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
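
The question "Can you write to this location?" can be answered directly on the execution host by trying to create a file in the spool directory. A minimal Python sketch, assuming it is run under the account that owns the spool directories; can_write is a hypothetical helper name:

```python
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a temporary file can be created in `directory`."""
    try:
        # The file is removed again automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers a missing directory, missing permissions and read-only mounts.
        return False
```

Checking both the network-mounted spool directory of the node and a candidate local directory such as /var/spool/sge would show which of the two is actually writable from that host.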


