[GE users] sge 6.2u4 issue on Windows execution host

pollinger harald.pollinger at sun.com
Tue Dec 22 11:11:58 GMT 2009


kostikbel wrote:
> We have an SGE 6.2u4 installation that includes a group of Windows XP
> SP2/SFU 3.5 hosts. Quite often, queues go into the "E" (error) state. The
> sge_execd log contains the following:
> 
> 12/14/2009 16:33:09|  main|avexec4|E|ERROR: unlinking "jobs/00/0002/9728.1": Device busy
> 12/14/2009 16:33:09|  main|avexec4|E|can not remove file job spool file: jobs/00/0002/9728.1
> 12/14/2009 16:33:09|  main|avexec4|E|can't remove directory "active_jobs/29728.1": opendir(active_jobs/29728.1) failed: No such file or directory
> 
> The admin gets a notification by email (for another job ID,
> just for illustration):
> failed assumedly before job:can't open jobs/00/0002/.2808.1 for writing of job: Device busy
> 
> Note the "Device busy" part. I did search for "Device busy" in connection
> with SFU and found http://www.suacommunity.com/forum/tm.aspx?m=5580&mpage=3#16800
> but it does not seem to help.
> 
> Any idea what is going on there? What information should I gather to
> diagnose the problem?

"Device busy" usually means that you are trying to remove a removable disc
while some process still has files on that disc open.

I guess here it means that a file in this directory is still in use
while the directory is being deleted. Unlike UNIX, Windows doesn't allow
this. However, I don't quite understand why writing to .../0002/.2808.1
fails.
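To illustrate the UNIX/Windows difference described above, here is a minimal Python sketch (an illustration added here, not part of the original thread and not SGE's actual spool code). It creates a directory standing in for a job spool directory, keeps a file inside it open, and tries to delete the file and then the directory. On Linux and other POSIX systems both deletions normally succeed while the handle is still open. On native Windows, Python opens files without delete sharing, so the removal is expected to fail until the handle is closed. It does not model SFU/Interix semantics, and the names `job_dir` and `spool_file` are hypothetical.

```python
import os
import shutil
import tempfile


def try_remove_dir_with_open_file(parent: str) -> bool:
    """Try to delete a directory while a file inside it is still open.

    Returns True if both the file and the directory could be removed while
    the file handle was still open (typical POSIX behaviour), and False if
    the OS refused (the behaviour expected on native Windows).
    """
    # Hypothetical names standing in for an SGE job spool directory and file.
    job_dir = os.path.join(parent, "job_dir")
    spool_file = os.path.join(job_dir, "spool_file")
    os.mkdir(job_dir)

    with open(spool_file, "w") as handle:
        handle.write("still in use")
        handle.flush()
        try:
            os.remove(spool_file)  # expected to fail on native Windows while open
            os.rmdir(job_dir)      # only reached if the unlink succeeded
            removed = True
        except OSError:
            removed = False

    # Once the handle is closed, cleanup works on every platform.
    shutil.rmtree(job_dir, ignore_errors=True)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ok = try_remove_dir_with_open_file(tmp)
        print("removed while file was open:", ok)
```

On a Linux host the script is expected to report True; on a native Windows host, False.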

I'm not sure what causes the error, but perhaps answers to these
questions can point us in the right direction:

* Do the exec daemons spool to a normal hard disk, an SSD, or something
else that is dynamically 'mounted'?
* Which file system does the spool directory use? NTFS, FAT32, ...?
* Is there a virus scanner that could still be reading the spool files
while the directory is being deleted?
* Is there an indexing service or something similar running that
automatically accesses new directories?

You could also use the tool "Handle"
(http://technet.microsoft.com/en-us/sysinternals/bb896655.aspx) to check
whether some other process has a file open in this directory, or
"Process Monitor"
(http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to log
all file operations on the host while the job runs.

Regards,
Harald

-- 
Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         Sun Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
http://www.sun.com/gridware
mailto:harald.pollinger at sun.com

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=234590