[GE users] can't open output file "/some/path/TEST.1763": Permission denied

Dmitry Zhukovski DZH at maerskoil.com
Tue Aug 21 14:08:11 BST 2007


Hi all,

 

  I am fighting with a (probably) very simple case but can't solve it.

 

  On the master host I submit a job (qsub -cwd -S /bin/ksh -N TEST
-q all.q -p 0 -j y -l hostname=exec01 /home/cluster/job/test.job) which
is a simple 'openssl speed'. Then:

-         If I submit from my home directory on the master host, the job
runs OK.

-         If I submit from the directory /some/path, I get an error email
from the shepherd daemon:

 

Job 1763 caused action: Job 1763 set to ERROR
 User        = xxx
 Queue       = all.q at exec01.cph.maerskoil.com
 Host        = exec01.cph.maerskoil.com
 Start Time  = <unknown>
 End Time    = <unknown>
failed opening input/output file:08/21/2007 14:42:57 [1021277:8259]: error: can't open output file "/some/path/TEST.1763": Permissio

Shepherd trace:

08/21/2007 14:42:57 [1021208:8258]: shepherd called with uid = 0, euid = 1021208
08/21/2007 14:42:57 [1021208:8258]: setpgid(8258, 8258) returned 0
08/21/2007 14:42:57 [1021208:8258]: no prolog script to start
08/21/2007 14:42:57 [1021208:8258]: forked "job" with pid 8259
08/21/2007 14:42:57 [1021208:8259]: pid=8259 pgrp=8259 sid=8259 old pgrp=8258 getlogin()=<no login set>
08/21/2007 14:42:57 [1021208:8259]: reading passwd information for user 'xxx'
08/21/2007 14:42:57 [1021208:8258]: child: job - pid: 8259
08/21/2007 14:42:57 [1021208:8259]: setosjobid: uid = 0, euid = 1021208
08/21/2007 14:42:57 [1021208:8259]: setting limits
08/21/2007 14:42:57 [1021208:8259]: RLIMIT_CPU setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
08/21/2007 14:42:57 [1021208:8259]: RLIMIT_FSIZE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
08/21/2007 14:42:57 [1021208:8259]: RLIMIT_DATA setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
08/21/2007 14:42:57 [1021208:8259]: RLIMIT_STACK setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
08/21/2007 14:42:57 [1021208:8259]: RLIMIT_CORE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
08/21/2007 14:42:57 [1021208:8259]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
08/21/2007 14:42:57 [1021208:8259]: RLIMIT_RSS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
08/21/2007 14:42:57 [1021208:8259]: setting environment
08/21/2007 14:42:57 [1021208:8259]: Initializing error file
08/21/2007 14:42:57 [1021208:8259]: switching to intermediate/target user
08/21/2007 14:42:57 [1021277:8259]: closing all filedescriptors
08/21/2007 14:42:57 [1021277:8259]: further messages are in "error" and "trace"
08/21/2007 14:42:57 [1021277:8259]: error: can't open output file "/some/path/TEST.1763": Permission denied
08/21/2007 14:42:57 [1021208:8258]: wait3 returned 8259 (status: 6656; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 26)
08/21/2007 14:42:57 [1021208:8258]: job exited with exit status 26
08/21/2007 14:42:57 [1021208:8258]: reaped "job" with pid 8259
08/21/2007 14:42:57 [1021208:8258]: job exited not due to signal
08/21/2007 14:42:57 [1021208:8258]: job exited with status 26
08/21/2007 14:42:57 [1021208:8258]: now sending signal KILL to pid -8259
08/21/2007 14:42:57 [1021208:8258]: no tasker to notify
08/21/2007 14:42:57 [1021208:8258]: failed starting job
08/21/2007 14:42:57 [1021208:8258]: no epilog script to start
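
As a sanity check on the numbers in the trace: the raw status 6656 that
wait3 returned is the same thing as "exited normally with code 26",
since 6656 = 26 * 256. A minimal Python sketch of that decoding follows.
The helper name decode_wait_status is only an illustration (it is not
part of Grid Engine), and the sketch assumes the usual Linux/Unix layout
of the wait status word.

```python
import os


def decode_wait_status(status: int) -> dict:
    """Decode a raw wait()-style status word using the standard macros.

    Hypothetical helper for illustration; it mirrors the three fields
    that the shepherd trace prints (WIFSIGNALED, WIFEXITED, WEXITSTATUS).
    """
    exited = os.WIFEXITED(status)
    signaled = os.WIFSIGNALED(status)
    return {
        "exited": exited,
        "signaled": signaled,
        # The exit code is only meaningful when the child exited normally.
        "exit_code": os.WEXITSTATUS(status) if exited else None,
    }


if __name__ == "__main__":
    # 6656 is the status value copied from the wait3 line in the trace.
    print(decode_wait_status(6656))
```

So the child (pid 8259) was not killed by a signal; it exited on its own
with code 26 right after the failed open of the output file.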

 

Shepherd error:

08/21/2007 14:42:57 [1021277:8259]: error: can't open output file "/some/path/TEST.1763": Permission denied

 

Shepherd pe_hostfile:

exec01.cph.maerskoil.com 1 all.q at exec01.cph.maerskoil.com <NULL>

 

The strange point here is that user xxx does have read/write access to
the directory /some/path, but sgeadmin doesn't. My guess is that execd
tries to write the output to "/some/path/TEST.1763", where it is not
allowed to write, and therefore it fails. Am I right?
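
One thing that might help narrow this down: the failing line in the
trace is logged under 1021277 rather than 1021208, i.e. after the
"switching to intermediate/target user" step. If that first number is
the effective uid (an assumption on my side), it may be worth checking
access to the directory on exec01 itself, as the job user, and not only
on the master host. Below is a small Python sketch for that. The
function name first_blocking_component is made up for illustration, and
os.access only gives a hint (it checks against the real uid and may not
reflect every NFS or ACL setup).

```python
import os


def first_blocking_component(directory: str):
    """Walk the path from / down to `directory` and return the first
    component the current user cannot enter, or the directory itself if
    it exists but is not writable. Returns None when nothing blocks.

    Hypothetical diagnostic helper; os.access is advisory only.
    """
    directory = os.path.abspath(directory)
    current = os.sep
    if not os.access(current, os.X_OK):
        return current
    for part in [p for p in directory.split(os.sep) if p]:
        current = os.path.join(current, part)
        # A missing component (or a non-directory) blocks the open as well.
        if not os.path.isdir(current):
            return current
        # Search (x) permission is needed on every directory in the path.
        if not os.access(current, os.X_OK):
            return current
    # Creating TEST.<jobid> needs write permission on the final directory.
    if not os.access(directory, os.W_OK):
        return directory
    return None


if __name__ == "__main__":
    # Run this on the execution host as the job user.
    print(first_blocking_component("/some/path"))
```

If this prints None on exec01 for user xxx, then plain directory
permissions are probably not the problem and something else (for
example how the directory is mounted on that host) would be the next
thing to look at.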

 

Dmitry Zhukovski

System Engineer

Information Services, Server Operation

50, Esplanaden, DK-1263 Copenhagen K

Phone: + 4533634022

Telefax: + 4533634034

E-mail: dzh at maerskoil.com

 

