[GE users] sge 6.2u4 issue on Windows execution host
harald.pollinger at sun.com
Tue Dec 29 08:44:24 GMT 2009
This error happens while the job files are being written, shortly after the
directory has been created. The error is EBUSY (16), which is documented as:
An attempt to use a system resource which was in use at the time in a
manner which would have conflicted with the request.
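If this EBUSY is only transient, for example a short window right after the directory was created, a brief retry with backoff might get past it. Below is a minimal Python sketch under that assumption. The helper write_with_ebusy_retry, its parameters, and the backoff values are hypothetical and do not correspond to anything in sge_execd. It retries only on EBUSY and re-raises every other error unchanged.

```python
import errno
import time


def write_with_ebusy_retry(path, data, attempts=5, delay=0.1):
    """Write bytes ``data`` to ``path``, retrying while the OS reports EBUSY.

    Hypothetical helper (not part of Grid Engine). Any error other than
    EBUSY, or EBUSY on the last attempt, is re-raised unchanged.
    Returns the number of attempts that were needed.
    """
    for attempt in range(attempts):
        try:
            with open(path, "wb") as f:
                f.write(data)
            return attempt + 1
        except OSError as exc:
            if exc.errno != errno.EBUSY or attempt == attempts - 1:
                raise
            # Back off exponentially: delay, 2*delay, 4*delay, ...
            time.sleep(delay * (2 ** attempt))
    # Only reached when attempts < 1, because the loop body never ran.
    raise ValueError("attempts must be at least 1")
```

A retry like this would only hide a timing problem. If the EBUSY turns out to be persistent, it does not help, and the underlying cause (Interix, VMware, or another process holding the file) still has to be found.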
I can only guess that there is a bug in Interix, in conjunction with VMware,
that causes mkdir() to return before the directory actually exists, or
something similar. Are there some dangerous caching options for the
> [Sorry for any incorrect reply formatting.
> I am answering via the web interface.]
>> kostikbel wrote:
>>> We have SGE 6.2u4 installation that includes a group of Windows XP
>>> SP2/SFU 3.5 hosts. Quite often, queues go into the "E" state. The sge_execd
>>> log contains the following:
>>> 12/14/2009 16:33:09| main|avexec4|E|ERROR: unlinking "jobs/00/0002/9728.1": Device busy
>>> 12/14/2009 16:33:09| main|avexec4|E|can not remove file job spool file: jobs/00/0002/9728.1
>>> 12/14/2009 16:33:09| main|avexec4|E|can't remove directory "active_jobs/29728.1": opendir(active_jobs/29728.1) failed: No such file or directory
>>> The admin gets notification by email (for another job id,
>>> just for illustration):
>>> failed assumedly before job:can't open jobs/00/0002/.2808.1 for writing of job: Device busy
>>> Note the "Device busy" part. I searched for "Device busy" with SFU and
>>> found http://www.suacommunity.com/forum/tm.aspx?m=5580&mpage=3#16800,
>>> but it does not seem to help.
>>> Any idea what is going on there? What information should I gather to
>>> diagnose the problem?
>> "Device busy" normally means that you want to remove a removable disk
>> while some process still has files on that disk open.
>> I guess here it means that a file in this directory is still in use
>> while the directory is being deleted. Unlike UNIX, Windows doesn't allow
>> this. However, I don't quite understand why writing to .../0002/.2808.1
>> isn't possible.
>> I'm not sure what the error reason is, but perhaps answering these
>> questions can point us to the right direction:
>> * Do the exec daemons spool on a normal HD, on an SSD, or on something else
>> that is somehow dynamically 'mounted'?
> "Normal" HDD as in VMware ESXi 4.0, I believe.
>> * Which file system does the spooling directory use? NTFS, FAT32, ...?
>> * Is there a Virus scanner that could still read the spooling files
>> while the directory is to be deleted?
> No. Also, please note that the error seems to happen while transferring the job from the master node to the execution host.
>> * Is there an indexing service or something similar running that
>> automatically accesses new directories?
> I checked with our Windows admins;
> indexing is turned off.
>> You could also use the tool "Handle"
>> (http://technet.microsoft.com/en-us/sysinternals/bb896655.aspx) to see
>> whether some other process has an open file in this directory, or
>> Process Monitor (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx)
>> to log all file operations on the host during the job run.
>> Sun Microsystems GmbH Harald Pollinger
>> Dr.-Leo-Ritter-Str. 7 Sun Grid Engine Engineering
>> D-93049 Regensburg Phone: +49 (0)941 3075-209 (x60209)
>> Germany Fax: +49 (0)941 3075-222 (x60222)
>> mailto:harald.pollinger at sun.com
>> Sitz der Gesellschaft:
>> Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
>> Amtsgericht Muenchen: HRB 161028
>> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
>> Vorsitzender des Aufsichtsrates: Martin Haering
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
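The point in the quoted reply that UNIX, unlike Windows, lets a file (and then its now-empty directory) be removed while the file is still open can be checked with a small Python sketch. It assumes a POSIX host; the function name and the file name job.1 are made up for illustration. On POSIX the data remains readable through the still-open descriptor after both removals. On native Windows, os.remove() on the open file typically fails with PermissionError instead, which matches the Windows restriction described in the reply.

```python
import os
import tempfile


def unlink_while_open_demo():
    """Return data read back from a file after it and its directory were removed.

    POSIX semantics: unlink() only drops the directory entry; the open
    descriptor keeps the file alive, so the read still succeeds. On native
    Windows the os.remove() call is expected to raise PermissionError.
    """
    d = tempfile.mkdtemp()
    path = os.path.join(d, "job.1")  # hypothetical spool file name
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.write(fd, b"spool")
        os.remove(path)  # succeeds on POSIX even though fd is still open
        os.rmdir(d)      # the directory is now empty, so this succeeds too
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 5)
    finally:
        os.close(fd)
```

On a POSIX host, unlink_while_open_demo() returns b"spool".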