[GE users] Fwd: SGE 6.0: Job 9 failed

Francesco Siano fsiano at thphy.uni-duesseldorf.de
Wed Jul 14 18:16:51 BST 2004



I received the email below. The cluster consists of:
27 nodes (micro, micro2, ..., micro27), each with 2 Xeon processors, running
RedHat 7.2;
11 Intel PCs (pico3, ..., pico13) running Fedora Core 2.
$SGE_ROOT=/.netmount/software_micro/gridware/sge6.0, which is dynamically
mounted and accessible to all nodes and PCs.
I compiled GE 6.0 from source on the cluster (kernel 2.4), then I put a
symlink from lx24-x86 to lx26-x86 in /bin, /lib and /utilbin.
The job started and apparently ran with no problems. Since we have many
other jobs already running, the most important question is: will this
problem prevent the jobs from writing the output files with the
results of their calculations?
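
For reference, the symlink step could be scripted roughly as follows. This is only a sketch under an assumption: that the goal is for the lx26-x86 name to resolve to the lx24-x86 build inside the bin, lib and utilbin directories under $SGE_ROOT. The function name link_arch_dirs and its parameters are made up for illustration and are not part of Grid Engine.

```python
import os

def link_arch_dirs(sge_root, existing="lx24-x86", alias="lx26-x86",
                   subdirs=("bin", "lib", "utilbin")):
    """Create <sge_root>/<subdir>/<alias> -> <existing> for each subdir.

    Assumption: hosts reporting the lx26-x86 architecture should pick up
    the lx24-x86 build. Returns the list of links that were created.
    """
    created = []
    for sub in subdirs:
        target_dir = os.path.join(sge_root, sub, existing)
        link_path = os.path.join(sge_root, sub, alias)
        if not os.path.isdir(target_dir):
            continue  # nothing was built for this subdir, skip it
        if os.path.lexists(link_path):
            continue  # never overwrite an existing file or link
        os.symlink(existing, link_path)  # relative link inside the subdir
        created.append(link_path)
    return created
```

Calling link_arch_dirs("/.netmount/software_micro/gridware/sge6.0") a second time creates nothing, since existing entries are left alone.
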
Thanks.
-Francesco



Job 9 caused action: none
  User        = stephan
  Queue       = all.q@pico5
  Host        = pico5
  Start Time  = <unknown>
  End Time    = <unknown>
failed before writing exit_status:can't read usage file for job 9.1
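
The report above names a "usage" file and an "exit_status" file, and the trace mentions "error" and "trace". A quick way to see which of these exist for a task is sketched here. It assumes that the execution host keeps them in an active_jobs/<job>.<task> directory under its spool directory; that layout and the helper name shepherd_files are assumptions, not taken from the report.

```python
import os

def shepherd_files(spool_dir, job_id, task_id=1,
                   names=("trace", "error", "exit_status", "usage")):
    """Report which per-task files exist under the host spool directory.

    Assumption: the files live in <spool_dir>/active_jobs/<job>.<task>/.
    Returns a dict mapping each file name to True or False.
    """
    job_dir = os.path.join(spool_dir, "active_jobs", f"{job_id}.{task_id}")
    return {name: os.path.isfile(os.path.join(job_dir, name))
            for name in names}
```

For the job in this report the call would be shepherd_files("/.netmount/software_micro/gridware/sge6.0/default/spool/pico5", 9), run while the task directory still exists.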

Shepherd trace:
07/14/2004 13:34:24 [501:2620]: shepherd called with uid = 0, euid = 501
07/14/2004 13:34:24 [501:2620]: starting up 6.0
07/14/2004 13:34:24 [501:2620]: setpgid(2620, 2620) returned 0
07/14/2004 13:34:24 [501:2620]: no prolog script to start
07/14/2004 13:34:24 [501:2621]: pid=2621 pgrp=2621 sid=2621 old pgrp=2620 getlogin()=<no login set>
07/14/2004 13:34:24 [501:2620]: forked "job" with pid 2621
07/14/2004 13:34:24 [501:2621]: setosjobid: uid = 0, euid = 501
07/14/2004 13:34:24 [501:2620]: child: job - pid: 2621
07/14/2004 13:34:24 [501:2621]: RLIMIT_CPU setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
07/14/2004 13:34:24 [501:2621]: RLIMIT_FSIZE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
07/14/2004 13:34:24 [501:2621]: RLIMIT_DATA setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
07/14/2004 13:34:24 [501:2621]: RLIMIT_STACK setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
07/14/2004 13:34:24 [501:2621]: RLIMIT_CORE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
07/14/2004 13:34:24 [501:2621]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
07/14/2004 13:34:24 [501:2621]: RLIMIT_RSS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
07/14/2004 13:34:24 [419:2621]: closing all filedescriptors
07/14/2004 13:34:24 [419:2621]: further messages are in "error" and "trace"
07/14/2004 13:34:24 [419:2621]: execvp(/bin/bash, "bash" "/.netmount/software_micro/gridware/sge6.0/default/spool/pico5/job_scripts/9")

Shepherd pe_hostfile:
pico5 1 all.q@pico5 UNDEFINED
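
The shepherd trace above is easier to scan once it is split into fields. The sketch below assumes each entry has the form "date time [id:pid]: message", with the mail client wrapping long entries onto a following line; the meaning of the first bracketed number (it changes from 501 to 419 for pid 2621) is not stated in the trace, so the code only calls it "ident". The names parse_trace and ident_changes are illustrative.

```python
import re
from collections import namedtuple

TraceLine = namedtuple("TraceLine", "timestamp ident pid message")

# Matches e.g. "07/14/2004 13:34:24 [501:2620]: starting up 6.0"
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \[(\d+):(\d+)\]: (.*)$")

def parse_trace(text):
    """Split trace text into entries, re-joining wrapped lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        match = LINE_RE.match(line)
        if match:
            entries.append(TraceLine(match.group(1), int(match.group(2)),
                                     int(match.group(3)), match.group(4)))
        elif entries and line:
            # No timestamp: treat it as the wrapped tail of the last entry.
            last = entries[-1]
            entries[-1] = last._replace(message=last.message + " " + line)
    return entries

def ident_changes(entries):
    """Return (pid, old_ident, new_ident) each time a pid's ident changes."""
    seen = {}
    changes = []
    for entry in entries:
        previous = seen.get(entry.pid)
        if previous is not None and previous != entry.ident:
            changes.append((entry.pid, previous, entry.ident))
        seen[entry.pid] = entry.ident
    return changes
```

Run on the trace in this message, ident_changes(parse_trace(trace_text)) reports a single change, for pid 2621, just before the job script is executed.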


