[GE users] To free up complexes and/or slots while suspended

fansn fansn at hotmail.com
Tue Nov 17 19:21:34 GMT 2009


Hi Reuti,

Thank you very much for the explanation. It's quite clear. Yes, it won't
always work; I was wrong.

My scenario is license management for HSPICE jobs. When an HSPICE job is
suspended, it takes 2 minutes to release the token, but it picks up the
license almost immediately when resumed. So I think suspending a job and
increasing the license count should cause no problem, but resuming a job
has a very high chance of failing.

I borrowed some code from sgeinspect and wrote a server that listens for
events from (the built-in Java VM in) sgeadmin and multicasts queue
information to the clients. The server also exposes an RMI interface to the
clients and handles operations such as qdel, qalter etc. via an internal SSH
client connecting to the server. I plan to add a check when a user presses
the resume button: if there is a free license left (no job queuing), call the
q_XXX command and decrease the license count by 1; otherwise just cancel the
operation.

Probably not a good solution, but I'll try it this week. Many thanks for
your suggestion.

Yours sincerely,

Sinong Fan

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: 17 November 2009 18:38
To: users at gridengine.sunsource.net
Subject: Re: [GE users] To free up complexes and/or slots while suspended

On 17.11.2009 at 16:03, fansn wrote:

> I'm also concerned about this. I plan to use a suspend/resume script to set
> the correct number, i.e. when a job is suspended, get the current complex

I would suggest increasing/decreasing only the defined maximum in the
global configuration (or an RQS). Otherwise a starting or ending job
can change the value while you are changing it as well.
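
A minimal sketch of adjusting only the defined maximum, in Python. It
assumes the license is a consumable listed under complex_values of the
global host, reads it from the output of "qconf -se global", and builds a
"qconf -mattr" command with the new count. The complex name "hspice" and
both helper names are hypothetical and do not come from this thread.

```python
import re


def parse_complex_values(qconf_output):
    """Return the complex_values of a `qconf -se global` style listing as a dict."""
    # qconf may wrap long lines with a trailing backslash; join them first.
    text = re.sub(r"\\\n\s*", "", qconf_output)
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "complex_values":
            if parts[1].strip() == "NONE":
                return {}
            values = {}
            for item in parts[1].split(","):
                name, _, value = item.partition("=")
                values[name.strip()] = value.strip()
            return values
    return {}


def adjust_command(values, complex_name, delta):
    """Build the qconf call that sets the maximum to its current value plus delta."""
    new_count = int(values[complex_name]) + delta
    if new_count < 0:
        raise ValueError("license maximum would become negative")
    return ["qconf", "-mattr", "exechost", "complex_values",
            "%s=%d" % (complex_name, new_count), "global"]


if __name__ == "__main__":
    # Hypothetical listing; on a real cluster this text would come from
    # running `qconf -se global`.
    sample = "hostname global\ncomplex_values hspice=10\nload_values NONE\n"
    print(adjust_command(parse_complex_values(sample), "hspice", +1))
```

The returned list can be handed to subprocess.run() on a host that is
allowed to change the configuration. Reading and then writing the maximum
are still two separate steps, so two such calls running at once can
overwrite each other.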

> count, minus 1, and set this new count, and vice versa. Haven't tried it
> yet, but in theory it should work?

Yes and no.

The problem is that the complex is increased after the suspension is
issued. So you can suspend by hand and it works*, but it can't be
used for subordination.

The way back is more complex. When you resume the job, you first have
to check in a "qmod -usj ..." wrapper whether enough resources are
still free, and maybe decrease the count already at that time. When the
resume script runs it's too late: another job may have just slipped in,
and you get out of sync and end up with a negative total count.
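
The check-then-decrease order described above could be sketched in Python
roughly as follows. Only "qmod -usj" is taken from this thread; the
callables get_free, reserve and release are hypothetical hooks for however
the license count is actually read and changed, and the lock file only
serializes wrappers running on the same host.

```python
import fcntl
import os
import subprocess
import tempfile
from contextlib import contextmanager

DEFAULT_LOCK = os.path.join(tempfile.gettempdir(), "license_resume.lock")


@contextmanager
def exclusive_lock(path):
    """Hold an advisory file lock so only one wrapper on this host runs the check."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def _run(cmd):
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def resume_if_free(job_id, get_free, reserve, release,
                   run=_run, lock_path=DEFAULT_LOCK):
    """Resume job_id only if a license is free; return True when resumed."""
    with exclusive_lock(lock_path):
        if get_free() < 1:
            return False      # nothing free: cancel the resume
        reserve()             # decrease the count before the job is resumed
        if run(["qmod", "-usj", str(job_id)]) != 0:
            release()         # the resume failed, so give the license back
            return False
        return True
```

Because the count is decreased before "qmod -usj" is called, a job that
slips in between the check and the resume already sees the lower count.
Wrappers on different machines would need a shared lock instead of a local
file.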

-- Reuti

*) I fear it's not failsafe: if you do it on two machines at the same
time, you might get a race condition.
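
To illustrate the race condition mentioned in the footnote, here is a small
Python sketch that replays one unlucky interleaving by hand. The dictionary
stands in for the stored license count; the names are hypothetical and no
Grid Engine command is involved.

```python
def interleaved_increment(store):
    """Replay two machines doing read-modify-write on the same count."""
    seen_by_a = store["count"]        # machine A reads the current count
    seen_by_b = store["count"]        # machine B reads it before A has written
    store["count"] = seen_by_a + 1    # A writes its result
    store["count"] = seen_by_b + 1    # B overwrites A's update
    return store["count"]


if __name__ == "__main__":
    # Two suspensions should raise the count by 2, but one update is lost.
    print(interleaved_increment({"count": 5}))
```

Starting from 5, both machines write 6, so one of the two increments is
lost.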


>
> -----Original Message-----
> From: biscisking [mailto:dsmith03 at its.jnj.com]
> Sent: 17 November 2009 13:58
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] To free up complexes and/or slots while suspended
>
> I started a similar thread where the complex in question was for use with
> flex-grid (qlicserver).  Answer is not coming quickly.  Did you ever resolve
> this?
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=227445
>
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].

 

__________ Information from ESET NOD32 Antivirus, version of virus signature
database 4615 (20091117) __________

The message was checked by ESET NOD32 Antivirus.

http://www.eset.com
 
 


------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=227505

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


