Opened 17 years ago

Closed 11 years ago

#179 closed defect (fixed)

IZ1061: Trace file does not get new data after chown over NFS

Reported by: uddeborg
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.0beta2
Severity: minor
Keywords: kernel


[Imported from gridengine issuezilla]

        Issue #:      1061             Platform:     All        Reporter: uddeborg (uddeborg)
       Component:     gridengine          OS:        All
     Subcomponent:    kernel           Version:      6.0beta2      CC:    None defined
        Status:       VERIFIED         Priority:     P3
      Resolution:     DUPLICATE       Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
       * Summary:     Trace file does not get new data after chown over NFS
   Status whiteboard:

     Issue 1061 blocks:
   Votes for issue 1061:

   Opened: Fri May 21 09:22:00 -0700 2004 

In main() in shepherd.c, the job's trace file is
first created, and a first line is written in
which the shepherd reports that it was called,
along with its uid and euid.  Then the file's
owner is changed to the user running the job.
After that, several more lines are written.  If
the spool directory is mounted over NFS, all of
these later writes fail.

I assume the reason is that, contrary to what the
comment in shepherd_trace_chown_intern says,
stateless NFS doesn't care whether you have a
file descriptor open.  Instead, each write is
checked for permission.  And after having changed
back to the euid of the SGE administrator, one no
longer has permission to write to this file.
(Whether the write system call actually reports
this seems to depend somewhat on the OS version
and NFS flags.)
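
The sequence described above (create, write, chown, write again) can be
sketched in C as follows.  This is only an illustration and not the code
in shepherd.c: trace_write_checked and trace_sequence are hypothetical
names, and the sketch leaves out the euid switching that the real
shepherd performs.  It checks every write and also calls fsync, because
over NFS a rejected write may only be reported when the data is flushed.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from shepherd.c): write a whole line, flush it,
 * and report failure instead of silently dropping the data.
 * Returns 0 on success, -1 on error. */
int trace_write_checked(int fd, const char *line)
{
    const char *p = line;
    size_t left = strlen(line);

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal, retry */
            return -1;           /* e.g. permission denied by the server */
        }
        p += n;
        left -= (size_t)n;
    }
    /* A deferred write error may only show up once the data is flushed. */
    return fsync(fd) < 0 ? -1 : 0;
}

/* Illustrative sequence: create the file, write, change owner, write again.
 * Returns 0 if every step succeeded, otherwise the number of the first
 * step that failed (1 = open, 2 = first write, 3 = chown,
 * 4 = second write, 5 = close). */
int trace_sequence(const char *path, uid_t job_owner)
{
    int step = 0;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return 1;
    if (trace_write_checked(fd, "shepherd called\n") < 0)
        step = 2;
    else if (fchown(fd, job_owner, (gid_t)-1) < 0)  /* gid left unchanged */
        step = 3;
    else if (trace_write_checked(fd, "after chown\n") < 0)
        step = 4;
    if (close(fd) < 0 && step == 0)
        step = 5;
    return step;
}
```

Calling trace_sequence() on a path in the NFS-mounted spool directory
and looking at the returned step number would show whether the write
after the chown is the one being rejected on a given OS and mount
configuration.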

   ------- Additional comments from pollinger Wed May 26 01:53:06 -0700 2004 -------
The reason is, in fact, that NFS doesn't provide a proper way to
append to a file from two processes (even on the same host)
concurrently. So whenever a file handle is closed, whatever has been
written to the file from the other process since the file handle was
opened is overwritten.

This is documented in some creat(2) man pages (e.g. on Linux) and
applies to most NFS Server implementations - an exception seems to be
the Irix 6.5 NFS Server which seems to handle appending correctly.

In our case this means that the output of the parent shepherd
overwrites the output of all child shepherds (which are forked to
execute prolog, pe_start, job, pe_stop and epilog).
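
The parent/child situation described above can be sketched as a small C
program fragment.  It is an illustration only, not the fix that was
applied: append_line and parent_child_append are hypothetical names.
Each process opens the file with O_APPEND, writes one line and closes
the file again.  On a local file system both lines are expected to
survive; over NFS, as described above, one process's output may be lost.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: append one line through a descriptor opened with
 * O_APPEND.  Returns 0 on success, -1 on failure. */
int append_line(const char *path, const char *line)
{
    size_t len = strlen(line);
    int rc;
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    rc = (write(fd, line, len) == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}

/* A parent and a forked child both append to the same file, loosely
 * mirroring a parent shepherd and a child shepherd sharing a trace file.
 * Returns 0 if both appends reported success, -1 otherwise. */
int parent_child_append(const char *path)
{
    int rc, status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)   /* child: append one line and exit */
        _exit(append_line(path, "child line\n") == 0 ? 0 : 1);

    rc = append_line(path, "parent line\n");
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return rc;
}
```

Running parent_child_append() once against a local path and once
against a path on the NFS mount, then comparing the resulting files,
is one way to see whether a given server loses appended data.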

This bug has already been reported (and fixed) as issue 1021.

*** This issue has been marked as a duplicate of 1021 ***

   ------- Additional comments from pollinger Wed May 26 01:55:37 -0700 2004 -------
Edit: This is not a duplicate of Issue 1021, it's a duplicate of Issue

   ------- Additional comments from uddeborg Wed May 26 08:23:43 -0700 2004 -------
Yes, that seems to be the same thing.  And your fix does indeed seem
to solve my problems.  (Including some consequential problems I had.)

Change History (1)

comment:1 Changed 11 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed

IZ1012 is fixed.
