[GE users] Plz help with strange shepherd message

Daniel Templeton Dan.Templeton at Sun.COM
Tue May 27 21:02:48 BST 2008


I just had a peek at the source code, and the trace file creation works 
like this:  If the file doesn't exist yet, create it as root, and then 
if the job owner isn't root, chown the file to the job owner and seteuid 
to the job owner; if the file does exist, just open it.  The error 
message you're seeing comes from the code segment that opens an existing 
file.  The odd thing is that the shepherd should be running as root at 
that point, so it shouldn't be having a problem opening the file.
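
The sequence described above can be sketched roughly as follows. This is a Python illustration of the logic as I read it, not the shepherd's actual C code; the function name open_trace and its parameters are made up for the example.

```python
import os


def open_trace(path, job_uid, job_gid):
    """Open a trace file for appending, mimicking the logic described above.

    Hypothetical helper: the name and signature are illustrative and do not
    come from the Grid Engine source.
    """
    if not os.path.exists(path):
        # File does not exist yet: create it as the current user
        # (root, in the shepherd's case).
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        if job_uid != 0:
            # Hand the file to the job owner, then drop the effective uid.
            os.chown(path, job_uid, job_gid)
            os.seteuid(job_uid)
    else:
        # File already exists: just open it. This is the branch that can
        # fail with "Permission denied" when the caller is not root and
        # the file is not writable by the caller.
        fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    return fd
```

If the second branch is reached after the effective uid has already been switched to the job owner, and the existing file is owned by root with mode 0644, the open would fail in the way the PANIC message shows.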

Do you have the option to compile your own shepherd with debugging 
information added?
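
Short of recompiling, the ownership shown in the quoted listing can be checked against the usual POSIX owner/group/other rule. The sketch below is only an illustration: the numeric ids are invented (only the names root, camjayi and sge appear in the listing), I am assuming the job owner is a member of the sge group, and the check ignores ACLs and cases such as NFS root squashing.

```python
import stat


def can_write(file_uid, file_gid, mode, uid, gids):
    """Return True if a process with `uid` and group set `gids` may open
    the file for writing, following the POSIX owner/group/other check."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Numeric ids are hypothetical; only the names appear in the listing.
ROOT, CAMJAYI, SGE_GID = 0, 1001, 200
MODE = 0o644  # -rw-r--r--

# 1.sub04n178/trace is owned by root:sge with mode 0644.
print(can_write(ROOT, SGE_GID, MODE, CAMJAYI, {SGE_GID}))  # job owner: False
print(can_write(ROOT, SGE_GID, MODE, ROOT, {0}))           # root: True
```

So a shepherd that is really running as root should get through, while one running as the job owner would not, which matches the symptom.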

Daniel


Viktor Oudovenko wrote:
> Yes!
> Everything is fine with the users.
> Moreover, in the example I gave below everything runs fine.
> I noticed the problematic behavior even under my own account when I was
> logged in to the machine and looked at the case.
> v
>
>> -----Original Message-----
>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM] 
>> Sent: Tuesday, May 27, 2008 15:40
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Plz help with strange shepherd message
>>
>> Does the given user exist on that machine?
>>
>> Daniel
>>
>> Viktor Oudovenko wrote:
>>     
>>> Daniel,
>>>
>>> Root can write in any place. This is for sure.
>>> The problem is that in the directory
>>> /opt/SGE/spool/sub04n157/active_jobs/186117.1
>>> there is a trace file which belongs to the user, but in the subdirectory
>>> 1.sub04n157 (so the full path is
>>> /opt/SGE/spool/sub04n157/active_jobs/186117.1/1.sub04n157/trace) the
>>> trace file belongs to root.
>>> And shepherd.XXXX belongs to a user, so it is natural that the user
>>> cannot write to a file which belongs to root.
>>> The problem is: why does the system try to do it?
>>>
>>> OK. To be clearer, here is an example from another job, where the
>>> permissions can be clearly seen:
>>>
>>> [15:14:39]udo at sub04n178:/opt/SGE/spool/sub04n178/active_jobs/186328.1> ls -al
>>> total 32
>>> drwxr-xr-x 3 sgeadmin sge  320 2008-05-27 08:58 .
>>> drwxr-xr-x 3 sgeadmin sge   72 2008-05-27 08:58 ..
>>> drwxr-xr-x 2 sgeadmin sge  256 2008-05-27 08:58 1.sub04n178
>>> -rw-r--r-- 1 sgeadmin sge    6 2008-05-27 08:58 addgrpid
>>> -rw-r--r-- 1 sgeadmin sge 1793 2008-05-27 08:58 config
>>> -rw-r--r-- 1 sgeadmin sge 1577 2008-05-27 08:58 environment
>>> -rw-r--r-- 1 camjayi  sge    0 2008-05-27 08:58 error
>>> -rw-r--r-- 1 camjayi  sge    0 2008-05-27 08:58 exit_status
>>> -rw-r--r-- 1 sgeadmin sge    5 2008-05-27 08:58 job_pid
>>> -rw-r--r-- 1 sgeadmin sge 1240 2008-05-27 08:58 pe_hostfile
>>> -rw-r--r-- 1 sgeadmin sge    4 2008-05-27 08:58 pid
>>> -rw-r--r-- 1 camjayi  sge 4116 2008-05-27 08:58 trace
>>>
>>> [15:14:43]udo at sub04n178:/opt/SGE/spool/sub04n178/active_jobs/186328.1> ls -l 1.sub04n178/
>>> total 24
>>> -rw-r--r-- 1 sgeadmin sge    6 2008-05-27 08:58 addgrpid
>>> -rw-r--r-- 1 sgeadmin sge 1891 2008-05-27 08:58 config
>>> -rw-r--r-- 1 sgeadmin sge 1845 2008-05-27 08:58 environment
>>> -rw-r--r-- 1 root     sge    0 2008-05-27 08:58 error
>>> -rw-r--r-- 1 root     sge    0 2008-05-27 08:58 exit_status
>>> -rw-r--r-- 1 sgeadmin sge    5 2008-05-27 08:58 job_pid
>>> -rw-r--r-- 1 sgeadmin sge    5 2008-05-27 08:58 pid
>>> -rw-r--r-- 1 root     sge 2665 2008-05-27 08:58 trace
>>>
>>> [15:14:51]udo at sub04n178:/opt/SGE/spool/sub04n178/active_jobs/186328.1>
>>> So, as you see, in the active_jobs directory trace belongs to the user.
>>> That is fine. But in the subdirectory (1.sub04n178 in this example)
>>> trace is owned by root.
>>>
>>> And this is the general behavior on the system.
>>>
>>> Regards,
>>> v
>>>
>>>
>>>   
>>>       
>>>> -----Original Message-----
>>>> From: Dan.Templeton at Sun.COM [mailto:Dan.Templeton at Sun.COM]
>>>> Sent: Tuesday, May 27, 2008 14:47
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] Plz help with strange shepherd message
>>>>
>>>> Check that the host where the file is generated has permission to
>>>> write to the /opt/SGE/spool/sub04n157/active_jobs directory as
>>>> root.
>>>>
>>>> Daniel
>>>>
>>>> Viktor Oudovenko wrote:
>>>>> Hi,
>>>>>
>>>>> Recently I was playing with job suspension and wrote
>>>>> suspension/resume scripts, and from time to time (very often it is
>>>>> OK) for parallel jobs I see that every minute one file shepherd.XXXX,
>>>>> where XXXX is a number, is generated in the /tmp directory. Please see
>>>>> below the usual content of one of those files.
>>>>> Please let me know what might cause this kind of behavior.
>>>>>
>>>>> shepherd.30448
>>>>> ::::::::::::::
>>>>> 05/27/2008 02:48:11 [37394:37394 30448]: PANIC: open(/opt/SGE/spool/sub04n157/active_jobs/186117.1/1.sub04n157/trace) failed: Permission denied
>>>>> 05/27/2008 02:48:11 [37394:37394 30448]: PANIC: open(/opt/SGE/spool/sub04n157/active_jobs/186117.1/1.sub04n157/trace) failed: Permission denied
>>>>>
>>>>> Thank you very much for your help,
>>>>> Vic
>>>>> P.S. shepherd.XXXX has user permissions, those of the user who runs
>>>>> the job.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net



