[GE users] Plz help with strange shepherd message

Reuti reuti at staff.uni-marburg.de
Tue May 27 20:50:26 BST 2008


Hi,

On 27.05.2008 at 21:16, Viktor Oudovenko wrote:

> Daniel,
>
> Root can write in any place. This is for sure.

If it's NFS-mounted, there might be a root_squash in place in
/etc/exports on the file server.
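
A minimal sketch of that check, assuming a Linux NFS server where root_squash is the default unless no_root_squash is given. The function name and the sample export lines in the usage note are made up for illustration, and continuation lines or quoted paths in the exports file are not handled:

```python
def squash_status(exports_text):
    """Map each (export path, client) pair to True if root is squashed."""
    status = {}
    for raw in exports_text.splitlines():
        # Drop comments and skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        path, *clients = line.split()
        for client in clients:
            # A client entry looks like host(opt1,opt2,...).
            host, _, opts = client.partition("(")
            options = opts.rstrip(")").split(",") if opts else []
            # Assumption: root is squashed unless no_root_squash is set.
            status[(path, host)] = "no_root_squash" not in options
    return status
```

On the file server one could call it as squash_status(open("/etc/exports").read()); an entry such as "/opt/SGE 192.168.0.0/24(rw,sync)" (a hypothetical line) would then be reported as squashing root.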

> The problem is that in the directory
> /opt/SGE/spool/sub04n157/active_jobs/186117.1
> there is a trace file which belongs to the user, but in the
> subdirectory 1.sub04n157 (so the full path is
> /opt/SGE/spool/sub04n157/active_jobs/186117.1/1.sub04n157/trace)
> the trace file belongs to root.

Having the spool directory local would be more convenient and would
lower the network traffic. Is it NFS-mounted in your installation right now?
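
One way to answer that on an execution host is to look the spool path up in /proc/mounts. The following is only a sketch under the assumption of a Linux node; the function name and the sample mount table in the usage note are illustrative, and mount points containing escaped spaces are not handled:

```python
import os


def filesystem_for(path, mounts_text):
    """Return (mount_point, fs_type) of the deepest mount containing path."""
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on a tie the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best
```

For example, filesystem_for("/opt/SGE/spool/sub04n157", open("/proc/mounts").read()) returning a file system type of nfs would mean the spool directory is not local.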

> And shepherd.XXXX belongs to a user, so it is natural that the user
> cannot write to a file which belongs to root.
> The problem is: why does the system try to do it?
>
> OK. To be clearer, here is an example from another job, where the
> permissions can be seen clearly:
>
> [15:14:39]udo at sub04n178:/opt/SGE/spool/sub04n178/active_jobs/186328.1>ls -al
> total 32
> drwxr-xr-x 3 sgeadmin sge  320 2008-05-27 08:58 .
> drwxr-xr-x 3 sgeadmin sge   72 2008-05-27 08:58 ..
> drwxr-xr-x 2 sgeadmin sge  256 2008-05-27 08:58 1.sub04n178
> -rw-r--r-- 1 sgeadmin sge    6 2008-05-27 08:58 addgrpid
> -rw-r--r-- 1 sgeadmin sge 1793 2008-05-27 08:58 config
> -rw-r--r-- 1 sgeadmin sge 1577 2008-05-27 08:58 environment
> -rw-r--r-- 1 camjayi  sge    0 2008-05-27 08:58 error
> -rw-r--r-- 1 camjayi  sge    0 2008-05-27 08:58 exit_status
> -rw-r--r-- 1 sgeadmin sge    5 2008-05-27 08:58 job_pid
> -rw-r--r-- 1 sgeadmin sge 1240 2008-05-27 08:58 pe_hostfile
> -rw-r--r-- 1 sgeadmin sge    4 2008-05-27 08:58 pid
> -rw-r--r-- 1 camjayi  sge 4116 2008-05-27 08:58 trace
>
> [15:14:43]udo at sub04n178:/opt/SGE/spool/sub04n178/active_jobs/186328.1>ls -l 1.sub04n178/
> total 24
> -rw-r--r-- 1 sgeadmin sge    6 2008-05-27 08:58 addgrpid
> -rw-r--r-- 1 sgeadmin sge 1891 2008-05-27 08:58 config
> -rw-r--r-- 1 sgeadmin sge 1845 2008-05-27 08:58 environment
> -rw-r--r-- 1 root     sge    0 2008-05-27 08:58 error
> -rw-r--r-- 1 root     sge    0 2008-05-27 08:58 exit_status
> -rw-r--r-- 1 sgeadmin sge    5 2008-05-27 08:58 job_pid
> -rw-r--r-- 1 sgeadmin sge    5 2008-05-27 08:58 pid
> -rw-r--r-- 1 root     sge 2665 2008-05-27 08:58 trace
> [15:14:51]udo at sub04n178:/opt/SGE/spool/sub04n178/active_jobs/186328.1>

So it's a parallel job, and the local qrsh ends up running as root. Is
qrsh using the default configuration, or is it configured to use ssh?
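
The remote startup settings live in the cluster configuration shown by "qconf -sconf". As a rough sketch, assuming the usual "name value" layout of that output (the helper names are made up, and long values wrapped over several lines are not handled), they could be pulled out like this:

```python
import subprocess

# Configuration entries that control how qrsh/qlogin start remote tasks.
REMOTE_KEYS = (
    "rsh_command", "rsh_daemon",
    "rlogin_command", "rlogin_daemon",
    "qlogin_command", "qlogin_daemon",
)


def read_global_conf():
    """Return the text printed by 'qconf -sconf' (needs a working SGE setup)."""
    result = subprocess.run(["qconf", "-sconf"], capture_output=True,
                            text=True, check=True)
    return result.stdout


def remote_startup_settings(conf_text):
    """Extract the remote startup entries from 'qconf -sconf' style text."""
    settings = {}
    for line in conf_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in REMOTE_KEYS:
            settings[parts[0]] = parts[1].strip()
    return settings


def uses_ssh(settings):
    """True if any remote startup entry mentions ssh."""
    return any("ssh" in value for value in settings.values())
```

Calling uses_ssh(remote_startup_settings(read_global_conf())) on a submit or admin host gives a quick yes/no; a host-local configuration, if one exists, may override the global one and would need the same check.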

-- Reuti

>
>
> So, as you see, in the active_jobs directory trace belongs to the user.
> That is fine. But in the subdirectory (in this example 1.sub04n178)
> trace is owned by root.
>
> And it is general behavior in the system.
>
> Regards,
> v
>
>
>> -----Original Message-----
>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
>> Sent: Tuesday, May 27, 2008 14:47
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Plz help with strange shepherd message
>>
>> Check that the host where the file is generated has
>> permission to write to the
>> /opt/SGE/spool/sub04n157/active_jobs directory as root.
>>
>> Daniel
>>
>> Viktor Oudovenko wrote:
>>> Hi,
>>>
>>> Recently I was playing with job suspension and wrote
>>> suspend/resume scripts, and from time to time (very often it is OK)
>>> for parallel jobs I see that a file shepherd.XXXX, where XXXX is a
>>> number, is generated in the /tmp directory every minute. Please see
>>> below the usual content of one of those files.
>>> Please let me know what might cause this kind of behavior.
>>>
>>> shepherd.30448
>>> ::::::::::::::
>>> 05/27/2008 02:48:11 [37394:37394 30448]: PANIC:
>>> open(/opt/SGE/spool/sub04n157/active_jobs/186117.1/1.sub04n157/trace)
>>> failed: Permission denied
>>> 05/27/2008 02:48:11 [37394:37394 30448]: PANIC:
>>> open(/opt/SGE/spool/sub04n157/active_jobs/186117.1/1.sub04n157/trace)
>>> failed: Permission denied
>>>
>>> Thank you very much for your help,
>>> Vic
>>> P.S. shepherd.XXXX is owned by the user who runs the job.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>