[GE users] sge 6.2u4 issue on Windows execution host

kostikbel kostikbel at ukr.net
Wed Dec 23 11:36:59 GMT 2009


[Sorry if the reply formatting is broken;
I am answering via the web interface.]

> kostikbel wrote:
> > We have SGE 6.2u4 installation that includes a group of Windows XP
> > SP2/SFU 3.5 hosts. Quite often, queues get into the "E" state. sge_execd
> > log contains the following
> > 
> > 12/14/2009 16:33:09|  main|avexec4|E|ERROR: unlinking "jobs/00/0002/9728.1": Device busy
> > 12/14/2009 16:33:09|  main|avexec4|E|can not remove file job spool file: jobs/00/0002/9728.1
> > 12/14/2009 16:33:09|  main|avexec4|E|can't remove directory "active_jobs/29728.1": opendir(active_jobs/29728.1) failed: No such file or directory
> > 
> > The admin gets notification by email (for another job id,
> > just for illustration):
> > failed assumedly before job:can't open jobs/00/0002/.2808.1 for writing of job: Device busy
> > 
> > Note the "Device busy" part. I searched for "Device busy" with SFU and
> > found http://www.suacommunity.com/forum/tm.aspx?m=5580&mpage=3#16800
> > but this does not seem to help.
> > 
> > Any idea what is going on there? What information should I gather to
> > diagnose the problem?
> 
> "Device busy" normally means that you are trying to remove a removable
> disc while some process still has files on this disc open.
> 
> I guess here it means a file in this directory is still in use while
> the directory is being deleted. Unlike UNIX, Windows doesn't allow
> this. However, I don't quite understand why writing to .../0002/.2808.1
> isn't possible.
> 
> I'm not sure what the cause of the error is, but perhaps answering these
> questions can point us in the right direction:
> 
> * Do the exec daemons spool on a normal HD or on a SSD or something else 
> which is somehow dynamically 'mounted'?
A "normal" HDD, as presented by VMware ESXi 4.0, I believe.

> * Which file system does the spooling directory use? NTFS, FAT32, ...?
NTFS.

> * Is there a Virus scanner that could still read the spooling files 
> while the directory is to be deleted?
No. Also, please note that the error seems to occur while the job is being transferred from the master node to the execution host.

> * Is there an indexing service or something similar running that 
> automatically accesses new directories?
I checked with our Windows admins; indexing
is turned off.
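
To gather more data, the "Device busy" entries could be pulled out of the
sge_execd messages file and counted per host. Below is a minimal sketch in
Python. It assumes the five-field, "|"-separated layout seen in the log lines
quoted above (timestamp, component, host, level, message); the function and
class names are made up for illustration and are not part of SGE.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, Optional


class ExecdLogEntry(NamedTuple):
    """One line of the execd messages file (layout assumed from the excerpt)."""
    timestamp: str
    component: str
    host: str
    level: str
    message: str


def parse_execd_line(line: str) -> Optional[ExecdLogEntry]:
    """Split a log line into its five fields; return None if it does not fit."""
    # Split at most 4 times so any '|' inside the message text is preserved.
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    return ExecdLogEntry(*(p.strip() for p in parts))


def device_busy_entries(lines: Iterable[str]) -> Iterator[ExecdLogEntry]:
    """Yield only the error-level entries that mention "Device busy"."""
    for line in lines:
        entry = parse_execd_line(line)
        if entry and entry.level == "E" and "Device busy" in entry.message:
            yield entry


def busy_count_per_host(lines: Iterable[str]) -> Counter:
    """Count "Device busy" errors per execution host."""
    return Counter(entry.host for entry in device_busy_entries(lines))
```

Feeding it the open messages file, e.g. busy_count_per_host(open(path)), would
show whether the errors cluster on particular hosts. The path to the messages
file depends on the local spool configuration.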

> 
> You could also use the tool "Handle" 
> (http://technet.microsoft.com/en-us/sysinternals/bb896655.aspx) to see 
> if some other process has an open file in this directory, or 
> "ProcessMonitor" 
> (http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx) to log 
> all file operations on the host during the job run.
> 
> Regards,
> Harald
> 
> -- 
> Sun Microsystems GmbH         Harald Pollinger
> Dr.-Leo-Ritter-Str. 7         Sun Grid Engine Engineering
> D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
> Germany                       Fax: +49 (0)941 3075-222  (x60222)
> http://www.sun.com/gridware
> mailto:harald.pollinger at sun.com
> Sitz der Gesellschaft:
> Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
> Amtsgericht Muenchen: HRB 161028
> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
> Vorsitzender des Aufsichtsrates: Martin Haering
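
Regarding the point above that Windows refuses to remove a file that is still
in use: one way to tell a short-lived lock from a persistent one would be a
small retry probe run against a test file in the spool directory. The sketch
below is only an illustration in Python, assuming Python is available on the
host; it is not what sge_execd does, and the retryable error codes (EBUSY,
EACCES) are an assumption about how the "busy" condition is reported.

```python
import errno
import os
import time

# Error codes treated as "file is busy, try again" (an assumption).
RETRYABLE = {errno.EBUSY, errno.EACCES}


def unlink_with_retry(path, attempts=5, delay=0.2):
    """Remove `path`, retrying while the OS reports it as busy.

    `attempts` must be at least 1. Returns the number of attempts that
    were needed. Re-raises the OSError if the error is not retryable or
    if the last attempt also fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            os.unlink(path)
            return attempt
        except OSError as exc:
            if exc.errno not in RETRYABLE or attempt == attempts:
                raise
            print("attempt %d: %s is busy (%s), retrying"
                  % (attempt, path, exc.strerror))
            time.sleep(delay)
```

If the removal succeeds after a few retries, something is holding the file only
briefly; if it never succeeds, Handle or Process Monitor should be able to show
the process that keeps it open.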

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=234715
