[GE users] sge 6.2u4 issue on Windows execution host

pollinger harald.pollinger at sun.com
Tue Dec 29 08:44:24 GMT 2009


This error happens while the job files are being written, shortly after 
the directory was created. The error is EBUSY (16), which is documented as:

[EBUSY]
   Resource busy.
   An attempt to use a system resource which was in use at the time in a
   manner which would have conflicted with the request.
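
For reference, the numeric code can be mapped back to its symbolic name and
message with a few lines of Python. This is only an illustration, and the
helper name describe_errno is made up here. The message text comes from the C
library of whatever host runs the script, so it may read "Device busy" on one
system and "Device or resource busy" on another.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value.

    Hypothetical helper for illustration only.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_errno(errno.EBUSY)
# Prints the numeric value, the symbolic name and the platform's message text.
print(errno.EBUSY, name, message)
```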

I can only guess that there is a bug in Interix, in conjunction with VMware, 
that causes mkdir() to return before the directory actually exists, or 
something similar. Are any risky caching options enabled for the 
virtual disk?
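
One way to check this guess outside of Grid Engine might be a small test along
the following lines, run on the affected host (assuming a Python interpreter is
available there, which may not be the case under Interix). It is only a sketch:
the function name create_and_write, the file name "job_file", the retry count
and the delay are all made up for illustration and are not taken from
sge_execd. It creates a directory, immediately writes a file into it, and
counts how often EBUSY shows up before the write succeeds.

```python
import errno
import os
import tempfile
import time


def create_and_write(base_dir, name, retries=5, delay=0.1):
    """Create base_dir/name, then write a file into it right away.

    Returns the number of EBUSY errors seen before the write succeeded.
    Hypothetical test helper; not part of Grid Engine.
    """
    job_dir = os.path.join(base_dir, name)
    os.mkdir(job_dir)
    target = os.path.join(job_dir, "job_file")
    busy_count = 0
    for attempt in range(retries + 1):
        try:
            with open(target, "w") as fh:
                fh.write("test\n")
            return busy_count
        except OSError as exc:
            # Only EBUSY is retried; anything else is a different problem.
            if exc.errno != errno.EBUSY or attempt == retries:
                raise
            busy_count += 1
            time.sleep(delay)


if __name__ == "__main__":
    # Repeat the create-then-write sequence many times in a scratch directory.
    with tempfile.TemporaryDirectory() as spool:
        total = sum(create_and_write(spool, "job%d" % i) for i in range(100))
        print("EBUSY errors seen:", total)
```

If the count is non-zero on the virtual machine but stays at zero on a
physical host, that would support the mkdir() timing theory; a zero count on
both would point elsewhere.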

Regards,
Harald


kostikbel wrote:
> [Sorry if the reply formatting is wrong;
> I am answering via the web interface.]
> 
>> kostikbel wrote:
>>> We have SGE 6.2u4 installation that includes a group of Windows XP
>>> SP2/SFU 3.5 hosts. Quite often, queues get into the "E" state. sge_execd
>>> log contains the following
>>>
>>> 12/14/2009 16:33:09|  main|avexec4|E|ERROR: unlinking "jobs/00/0002/9728.1": Device busy
>>> 12/14/2009 16:33:09|  main|avexec4|E|can not remove file job spool file: jobs/00/0002/9728.1
>>> 12/14/2009 16:33:09|  main|avexec4|E|can't remove directory "active_jobs/29728.1": opendir(active_jobs/29728.1) failed: No such file or directory
>>>
>>> The admin gets notification by email (for another job id,
>>> just for illustration):
>>> failed assumedly before job:can't open jobs/00/0002/.2808.1 for writing of job: Device busy
>>>
>>> Note the "Device busy" part. I searched for "Device busy" in connection
>>> with SFU and found
>>> http://www.suacommunity.com/forum/tm.aspx?m=5580&mpage=3#16800
>>> but this does not seem to help.
>>>
>>> Any idea what is going on there? What information should I gather to
>>> diagnose the problem?
>> "Device busy" normally means that you are trying to remove a removable 
>> disk while some process still has files on that disk open.
>>
>> I guess here it means that a file in this directory is still in use 
>> while the directory is being deleted. Unlike UNIX, Windows doesn't allow 
>> this. However, I don't quite understand why writing to .../0002/.2808.1 
>> isn't possible.
>>
>> I'm not sure what the reason for the error is, but perhaps answering 
>> these questions can point us in the right direction:
>>
>> * Do the exec daemons spool on a normal HD, on an SSD, or on something 
>> else that is somehow dynamically 'mounted'?
> "Normal" HDD as in VMWare ESXi 4.0, I believe.
> 
>> * Which file system does the spooling directory use? NTFS, FAT32, ...?
> NTFS.
> 
>> * Is there a Virus scanner that could still read the spooling files 
>> while the directory is to be deleted?
> No. And please note that the error seems to happen while the job is
> being transferred from the master node to the execution host.
> 
>> * Is there an indexing service or something similar running that 
>> automatically accesses new directories?
> I checked with the Windows guys; indexing
> was turned off.
> 
>> You could also use the tool "Handle" 
>> (http://technet.microsoft.com/en-us/sysinternals/bb896655.aspx) to see 
>> if some other process has an open file in this directory, or 
>> "ProcessMonitor" 
>> (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to log 
>> all file operations on the host during the job run.
>>
>> Regards,
>> Harald
>>
>> -- 
>> Sun Microsystems GmbH         Harald Pollinger
>> Dr.-Leo-Ritter-Str. 7         Sun Grid Engine Engineering
>> D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
>> Germany                       Fax: +49 (0)941 3075-222  (x60222)
>> http://www.sun.com/gridware
>> mailto:harald.pollinger at sun.com
>> Sitz der Gesellschaft:
>> Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
>> Amtsgericht Muenchen: HRB 161028
>> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
>> Vorsitzender des Aufsichtsrates: Martin Haering
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=234715
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
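
The execd log lines quoted above appear to use a '|'-separated layout
(timestamp, thread, host, level, message). Assuming that layout holds for the
whole messages file, which is inferred only from the three quoted lines, a
short script along these lines could count how often the "Device busy" error
occurs per host. The function names are made up for illustration, and the
sample list simply reuses lines from the quoted log.

```python
from collections import Counter


def parse_execd_line(line):
    """Split one log line into its five '|'-separated fields.

    Returns None for lines that do not have the assumed layout.
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, thread, host, level, message = parts
    return {
        "timestamp": timestamp.strip(),
        "thread": thread.strip(),
        "host": host.strip(),
        "level": level.strip(),
        "message": message.strip(),
    }


def count_busy_errors(lines):
    """Count error-level lines mentioning 'Device busy', per host."""
    counts = Counter()
    for line in lines:
        entry = parse_execd_line(line)
        if entry and entry["level"] == "E" and "Device busy" in entry["message"]:
            counts[entry["host"]] += 1
    return counts


# Sample input taken from the log excerpt quoted in this thread.
sample = [
    '12/14/2009 16:33:09|  main|avexec4|E|ERROR: unlinking "jobs/00/0002/9728.1": Device busy',
    "12/14/2009 16:33:09|  main|avexec4|E|can not remove file job spool file: jobs/00/0002/9728.1",
]
print(count_busy_errors(sample))
```

Feeding it the lines of the real messages file instead of the sample list
would show whether the errors cluster on particular hosts or at particular
times, which could then be compared against the Process Monitor trace.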



------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=235376



