[GE users] "can't open usage file" Spool Error

John Coldrick jc at axyzfx.com
Mon Feb 28 21:53:47 GMT 2005




	Answering my own question..:P

	It seems that the standard output (which we trap) generated by the task was 
indeed running into file permission problems: it appears SGE writes those 
files with a particular umask, 022 I believe.  Because the user IDs aren't 
mapped identically on all systems, the write is sometimes blocked, and this 
is the error you get.
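
	For anyone following along, here is a minimal sketch of what a umask of 
022 does to a newly created file.  It is plain Python and has nothing to do 
with SGE internals; the file name job.o123512 is made up for illustration, 
and I'm assuming the file is created with the usual requested mode of 0666.

```python
import os
import stat
import tempfile


def mode_after_umask(umask_value, requested_mode=0o666):
    """Create a file under the given umask and return its permission bits."""
    old = os.umask(umask_value)  # set the new umask, remember the old one
    try:
        with tempfile.TemporaryDirectory() as d:
            # Hypothetical output file name, for illustration only.
            path = os.path.join(d, "job.o123512")
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, requested_mode)
            os.close(fd)
            return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's umask


mode = mode_after_umask(0o022)
print("mode:", oct(mode))
print("group can write:", bool(mode & stat.S_IWGRP))
print("others can write:", bool(mode & stat.S_IWOTH))
```

	With 022 the group and other write bits are masked off, so only the 
numeric owner of the file can write to it.  If that number belongs to a 
different account on another host, the real user is locked out.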

	Until I get this resolved, is there a way to set a different umask for 
redirected output from tasks?  I'm guessing that even though the end user's 
umask is set to a particular value, SGE ignores it and writes the output with 
a default root umask, although the file itself is owned by the user.
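
	In the meantime, a rough way to find which accounts have different 
numeric IDs on two hosts is to compare their passwd entries.  This is only 
a sketch: the two sample listings and the account names in them are invented 
for illustration, and in practice you would paste in the output of 
"getent passwd" from each machine.

```python
def parse_passwd(text):
    """Map user name -> numeric UID from passwd-format text."""
    uids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # not a valid passwd entry
        uids[fields[0]] = int(fields[2])
    return uids


def uid_mismatches(host_a, host_b):
    """Return {name: (uid_on_a, uid_on_b)} for accounts whose UIDs differ."""
    shared = host_a.keys() & host_b.keys()
    return {n: (host_a[n], host_b[n]) for n in shared if host_a[n] != host_b[n]}


# Invented sample data; replace with real passwd output from each host.
OLD_HOST = """\
root:x:0:0:root:/root:/bin/bash
jc:x:500:100:Example user:/home/jc:/bin/bash
render:x:501:100:Example user:/home/render:/bin/bash
"""
NEW_HOST = """\
root:x:0:0:root:/root:/bin/bash
jc:x:1000:100:Example user:/home/jc:/bin/bash
render:x:501:100:Example user:/home/render:/bin/bash
"""

mismatches = uid_mismatches(parse_passwd(OLD_HOST), parse_passwd(NEW_HOST))
for name, (a, b) in sorted(mismatches.items()):
    print(f"{name}: uid {a} on old host, uid {b} on new host")
```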

	Cheers,

	J.C.

On Monday 28 February 2005 14:37, John Coldrick wrote:
> 	Running SGE 6.0u1...
>
> 	I've recently upgraded (clean installs) the OS on some of our SGE exec
> Linux machines - going from Redhat 7.3 to SUSE 9.1.  I've encountered a
> sporadic problem with certain users submitting jobs from certain machines -
> jobs end up in error states, with errors in
> SGE_ROOT/default/spool/machine/messages such as:
>
> 02/28/2005 13:45:45|execd|frodo|E|shepherd of job 123512.1 exited with exit
> status = 26
> 02/28/2005 13:45:45|execd|frodo|E|can't open usage file
> "active_jobs/123512.1/usage" for job 123512.1: No such file or directory
> 02/28/2005 13:45:45|execd|frodo|E|"can't read usage file for job 123512.1
>
> 	Having done some research I found this thread:
>
>
> http://gridengine.sunsource.net/servlets/ReadMsg?msgId=21228&listName=users
>
> 	which matches our symptoms exactly and would seem to point me in the
> right direction - user account IDs not being maintained across the systems.
>  This has been the case, since we still have some older RH7.3 systems
> around (most notably, the SGE qmaster), and SUSE has introduced the notion
> of higher numbers for standard users.
>
> 	What's throwing me is that I'm running SGE as root everywhere - and NFS is
> mounted in such a way that I have complete read/write access across
> SGE_ROOT for root.  Also, the particular machines that have exhibited this
> problem are actually original machines that I haven't changed at all (for
> submission, execution, and the qmaster).
>
> 	Any thoughts?  Is having different user IDs for the same accounts across a
> grid just plain bad and I should address this before going any further,
> despite the fact that machines I haven't altered are acting up?
>
> 	Also, is running without an sgeadmin account OK?
>
> 	Many thanks,
>
> 	J.C.

-- 
John Coldrick                  www.axyzfx.com        Axyz Animation
Houdini/Renderman/Discreet                           425 Adelaide St W
416-504-0425                                         Toronto, ON Canada
jc at axyzfx.com                                        M5V 1S4
-----------------------------------------------------------------------
Real Time, adj.:
	Here and now, as opposed to fake time, which only occurs there
and then.
