[GE users] "Output file too large" errors back?

Bevan C. Bennett bevan at fulcrummicro.com
Sat Sep 15 00:35:28 BST 2007



The servers are x86 Linux systems running CentOS 4.
The desktops are running Fedora Core 6.
The output file was indeed larger than 2GB.

Here's an easy way to reproduce it:

[bevan at alexander ~]$ ls -al test*
-rw------- 1 bevan bevan 2307973120 Sep 14 16:16 test-output
[bevan at alexander ~]$ ls -lh test-output
-rw------- 1 bevan bevan 2.2G Sep 14 16:16 test-output
[bevan at alexander ~]$ qrsh -o /home/user/bevan/test-output
error: 1: can't stat() "/home/user/bevan/test-output" as stdout_path: Value too 
large for defined data type KRB5CCNAME=none uid=0 gid=0 1004 1008 1009 1030 5126 
9072 9076 9085
[bevan at alexander ~]$ which qrsh
/usr/local/grid/bin/lx24-x86/qrsh
[bevan at alexander ~]$ qrsh -help
GE 6.1u2
...
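
For context, "Value too large for defined data type" is the usual glibc message for EOVERFLOW, which a stat() call built without large-file support returns when the file size does not fit in a signed 32-bit off_t. Here is a minimal sketch of that size check, using the byte count from the ls output above. The helper name overflows_32bit_off_t and the constant names are made up for illustration and are not part of Grid Engine; the sketch only shows the arithmetic, not what the shepherd actually does.

```python
import errno

# Largest size a signed 32-bit off_t can represent (2 GiB minus 1 byte).
MAX_32BIT_OFF_T = 2**31 - 1


def overflows_32bit_off_t(size_bytes: int) -> bool:
    """Return True if size_bytes cannot be stored in a signed 32-bit off_t."""
    return size_bytes > MAX_32BIT_OFF_T


# Size of test-output as reported by `ls -al` above (hypothetical constant name).
TEST_OUTPUT_SIZE = 2307973120

if __name__ == "__main__":
    # Does the file exceed what a 32-bit stat() can describe?
    print(overflows_32bit_off_t(TEST_OUTPUT_SIZE))
    # Symbolic name of the errno a non-LFS stat() is expected to set here.
    print(errno.errorcode[errno.EOVERFLOW])
```

If this reasoning applies, any output file at or above 2^31 bytes would trigger the same failure on a binary whose stat() is not the 64-bit variant.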

Joachim Gabler wrote:
> Hi Bevan,
> 
> On which OS are you experiencing this problem?
> I briefly checked the code; it doesn't differ between V60s2_BRANCH 
> (6.0u??) and V61_BRANCH (6.1u?).
> It uses the SGE_STAT macro, which resolves to stat64 on Solaris, Linux, 
> and Irix.
> 
>   Joachim
> 
> Bevan C. Bennett wrote:
>> I've started seeing errors in SGE 6.1u2 that appear to be a re-emergence 
>> of this very old bug. Is anyone else experiencing anything similar?
>>
>> Original bug from 2005:
>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=1628
>>
>> An error from this afternoon:
>> Shepherd trace:
>> 09/10/2007 15:27:25 [5143:23799]: shepherd called with uid = 0, euid = 
>> 5143
>> 09/10/2007 15:27:25 [5143:23799]: starting up 6.1u2
>> 09/10/2007 15:27:25 [5143:23799]: setpgid(23799, 23799) returned 0
>> 09/10/2007 15:27:25 [5143:23799]: no prolog script to start
>> 09/10/2007 15:27:25 [5143:23800]: pid=23800 pgrp=23800 sid=23800 old 
>> pgrp=23799 getlogin()=<no login set>
>> 09/10/2007 15:27:25 [5143:23800]: reading passwd information for user 
>> 'xsu'
>> 09/10/2007 15:27:25 [5143:23800]: setosjobid: uid = 0, euid = 5143
>> 09/10/2007 15:27:25 [5143:23799]: forked "job" with pid 23800
>> 09/10/2007 15:27:25 [5143:23800]: setting limits
>> 09/10/2007 15:27:25 [5143:23800]: RLIMIT_CPU setting: (soft 4294967295 
>> hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
>> 09/10/2007 15:27:25 [5143:23800]: RLIMIT_FSIZE setting: (soft 
>> 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
>> 09/10/2007 15:27:25 [5143:23800]: RLIMIT_DATA setting: (soft 
>> 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
>> 09/10/2007 15:27:25 [5143:23800]: RLIMIT_STACK setting: (soft 
>> 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
>> 09/10/2007 15:27:25 [5143:23800]: RLIMIT_CORE setting: (soft 
>> 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
>> 09/10/2007 15:27:25 [5143:23800]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 
>> 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
>> 09/10/2007 15:27:25 [5143:23800]: RLIMIT_RSS setting: (soft 4294967295 
>> hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
>> 09/10/2007 15:27:25 [5143:23800]: setting environment
>> 09/10/2007 15:27:25 [5143:23799]: child: job - pid: 23800
>> 09/10/2007 15:27:25 [5143:23800]: Initializing error file
>> 09/10/2007 15:27:25 [5143:23800]: switching to intermediate/target user
>> 09/10/2007 15:27:25 [9153:23800]: closing all filedescriptors
>> 09/10/2007 15:27:25 [9153:23800]: further messages are in "error" and 
>> "trace"
>> 09/10/2007 15:27:25 [9153:23800]: can't stat() 
>> "/home/user/xsu/sim.output" as stdout_path: Value too large for 
>> defined data type KRB5CCNAME=none uid=9153 gid=9153 1004 1030 9153 20086
>> 09/10/2007 15:27:25 [5143:23799]: wait3 returned 23800 (status: 6656; 
>> WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 26)
>> 09/10/2007 15:27:25 [5143:23799]: job exited with exit status 26
>> 09/10/2007 15:27:25 [5143:23799]: reaped "job" with pid 23800
>> 09/10/2007 15:27:25 [5143:23799]: job exited not due to signal
>> 09/10/2007 15:27:25 [5143:23799]: job exited with status 26
>> 09/10/2007 15:27:25 [5143:23799]: now sending signal KILL to pid -23800
>> 09/10/2007 15:27:25 [5143:23799]: no tasker to notify
>> 09/10/2007 15:27:25 [5143:23799]: failed starting job
>> 09/10/2007 15:27:25 [5143:23799]: no epilog script to start
>>
>> Shepherd error:
>> 09/10/2007 15:27:25 [9153:23800]: can't stat() 
>> "/home/user/xsu/sim.output" as stdout_path: Value too large for 
>> defined data type KRB5CCNAME=none uid=9153 gid=9153 1004 1030 9153 20086
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
> 
> 
