[GE users] hosed my sge on altix

Wal walid.shaari at gmail.com
Tue Mar 22 20:08:54 GMT 2005



I have reinstalled SGE, this time 6.0u3 built from source. For some
reason the jobs do not go into the run state. I did not upgrade from
SGE 5.3, as I am still experimenting and did not need the upgrade.
I installed sge_cpuset after the installation as before, but something
tells me that the prolog/epilog is causing the problem. An error email
is below. Sorry if I am not thinking straight, but it is already
11:08 pm here.
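One way to test the suspicion about the prolog is to run the script by hand on the execution host and look at its exit status and stderr. The following is only a minimal sketch, assuming Python 3 is available there. `run_hook` is a hypothetical helper name, not part of SGE, and the default path is simply the one that appears in the shepherd trace. A run by hand does not reproduce the environment that sge_execd sets up for a job, so a different result outside the queue is possible.

```python
import os
import subprocess
import sys


def run_hook(path):
    """Run a prolog/epilog script by hand; return (exit_code, stderr_text).

    exit_code is None when the script could not be started at all.
    """
    if not os.path.isfile(path):
        return None, "not found: " + path
    if not os.access(path, os.X_OK):
        return None, "not executable: " + path
    # No shell and no arguments, similar to the execvp() call in the trace.
    proc = subprocess.run(
        [path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Default path copied from the shepherd trace; override with argv[1].
    default = "/opt/sge6u3/local/bin/lx24-ia64/cpuset_prolog"
    hook = sys.argv[1] if len(sys.argv) > 1 else default
    code, err = run_hook(hook)
    print("exit status:", code)
    print("stderr:", err.strip())
```

If the script is missing or not executable, the helper reports that instead of an exit status, which is worth ruling out after a fresh install from source.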

TIA

Walid

Error email

Job 14 caused action: Queue "all.q@prda" set to ERROR
 User        = root
 Queue       = all.q@prda
 Host        = prda
 Start Time  = <unknown>
 End Time    = <unknown>
failed in prolog:03/22/2005 22:58:00 [0:673]: exit_status of prolog = 1
Shepherd trace:
03/22/2005 22:58:00 [0:673]: shepherd called with uid = 0, euid = 0
03/22/2005 22:58:00 [0:673]: starting up 6.0u3
03/22/2005 22:58:00 [0:673]: setpgid(673, 673) returned 0
03/22/2005 22:58:00 [0:674]: pid=674 pgrp=674 sid=674 old pgrp=673 getlogin()=<no login set>
03/22/2005 22:58:00 [0:673]: forked "prolog" with pid 674
03/22/2005 22:58:00 [0:673]: using signal delivery delay of 120 seconds
03/22/2005 22:58:00 [0:673]: child: prolog - pid: 674
03/22/2005 22:58:00 [0:674]: closing all filedescriptors
03/22/2005 22:58:00 [0:674]: further messages are in "error" and "trace"
03/22/2005 22:58:00 [0:674]: using "/bin/bash" as shell of user "root"
03/22/2005 22:58:00 [0:674]: execvp(/opt/sge6u3/local/bin/lx24-ia64/cpuset_prolog, "/opt/sge6u3/local/bin/lx24-ia64/cpuset_prolog")
03/22/2005 22:58:00 [0:673]: wait3 returned 674 (status: 256; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 1)
03/22/2005 22:58:00 [0:673]: prolog exited with exit status 1
03/22/2005 22:58:00 [0:673]: reaped "prolog" with pid 674
03/22/2005 22:58:00 [0:673]: prolog exited not due to signal
03/22/2005 22:58:00 [0:673]: prolog exited with status 1
03/22/2005 22:58:00 [0:873]: pid=873 pgrp=873 sid=873 old pgrp=673 getlogin()=<no login set>
03/22/2005 22:58:00 [0:673]: forked "epilog" with pid 873
03/22/2005 22:58:00 [0:673]: using signal delivery delay of 120 seconds
03/22/2005 22:58:00 [0:673]: child: epilog - pid: 873
03/22/2005 22:58:00 [0:873]: closing all filedescriptors
03/22/2005 22:58:00 [0:873]: further messages are in "error" and "trace"
03/22/2005 22:58:00 [0:873]: using "/bin/bash" as shell of user "root"
03/22/2005 22:58:00 [0:873]: execvp(/opt/sge6u3/local/bin/lx24-ia64/cpuset_epilog, "/opt/sge6u3/local/bin/lx24-ia64/cpuset_epilog")
03/22/2005 22:58:01 [0:673]: wait3 returned 873 (status: 256; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 1)
03/22/2005 22:58:01 [0:673]: epilog exited with exit status 1
03/22/2005 22:58:01 [0:673]: reaped "epilog" with pid 873
03/22/2005 22:58:01 [0:673]: epilog exited not due to signal
03/22/2005 22:58:01 [0:673]: epilog exited with status 1
03/22/2005 22:58:01 [0:673]: no tasker to notify
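The wait3 lines in the trace report a raw status of 256 together with WIFEXITED: 1 and WEXITSTATUS: 1. As a small illustration of how those numbers relate, the sketch below decodes a raw wait status with the standard POSIX macros that Python exposes in its `os` module. `describe_wait_status` is a hypothetical helper name used only for this example, and the code assumes a Unix host.

```python
import os


def describe_wait_status(status):
    """Turn a raw wait()/wait3() status word into a readable description."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with status %d" % os.WEXITSTATUS(status)
    return "stopped or continued"


# 256 is the raw status shown for both prolog and epilog in the trace.
print(describe_wait_status(256))
```

So a status of 256 means the script ran to completion and returned exit code 1 on its own; it was not killed by a signal.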

Shepherd error:
03/22/2005 22:58:00 [0:673]: exit_status of prolog = 1
03/22/2005 22:58:01 [0:673]: exit_status of epilog = 1
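When several of these error mails pile up, the "exit_status of ... = N" lines can be pulled out mechanically. This is a rough sketch only: `failed_hooks` is a hypothetical helper, and the pattern is based solely on the two lines shown above, so other SGE versions may word them differently.

```python
import re

# Matches the tail of lines such as
# "03/22/2005 22:58:00 [0:673]: exit_status of prolog = 1"
EXIT_RE = re.compile(r"exit_status of (\w+) = (\d+)\s*$")


def failed_hooks(text):
    """Return {hook_name: exit_status} for every hook with a non-zero status."""
    failed = {}
    for line in text.splitlines():
        match = EXIT_RE.search(line)
        if match and int(match.group(2)) != 0:
            failed[match.group(1)] = int(match.group(2))
    return failed


sample = """03/22/2005 22:58:00 [0:673]: exit_status of prolog = 1
03/22/2005 22:58:01 [0:673]: exit_status of epilog = 1"""
print(failed_hooks(sample))
```

For the two lines in this mail the result lists both prolog and epilog with status 1.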

Shepherd pe_hostfile:
prda 1 all.q@prda UNDEFINED
