[GE users] SGE-6.0u3 Only 99 event clients are allowed in the system

Joachim Gabler Joachim.Gabler at Sun.COM
Thu Mar 3 15:51:03 GMT 2005



Hi Ron,

Ron Chen wrote:

>Even 200 clients is a problem, it can still be
>troublesome if there are several thousand sync jobs.
>
>Since users usually want to wait for all their
>submitted jobs to finish, we can further extend qevent
>(how about calling it "qwait"?). To refresh memory, this
>is what I did in 2003:
>
>http://gridengine.sunsource.net/servlets/ReadMsg?msgId=10204&listName=dev
>
>So users submit their jobs with qsub as usual, but
>they collect all the job IDs. Then the extended
>qevent waits for all of those job IDs.
>
>Example:
>% qsub sleep
>Your job 11 ("sleep") has been submitted.
>
>% qsub sleep
>Your job 12 ("sleep") has been submitted.
>
>% qevent -wait 11 12
>Job 11 finished
>Job 12 finished
>
>And it can be more powerful if it supports a timeout
>and, if it times out, exits with an error.
>
>(The patch I posted has a race condition -- if a job
>already finished before we invoke "qevent -wait", then
>qevent will wait forever. So we need to subscribe to
>qmaster events, and then get a list of jobs from the
>qmaster, and only wait for the running/pending jobs.)
>  
>
This race condition is easy to fix:
In addition to subscribing to the JOB_FINISH or JOB_DEL events, you have to
subscribe to the JOB_LIST.
When the event client connects to qmaster, it will receive the complete
job list. This could be a huge amount of data, but if you implement
this qevent for 6.0, you can define filters that reduce both the
attributes sent per job and the set of jobs sent to just
what you need.

If a job ID passed to -wait doesn't exist in the job list you get when
you connect to qmaster, the job has already finished.
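
To make the logic concrete, here is a rough sketch in Python. It is only a
simulation of the idea, not Grid Engine code: the function name
wait_for_jobs, its parameters, and the use of a queue to stand in for the
stream of finish/delete events are all assumptions made for illustration.
The snapshot is assumed to be the job list delivered on connect, after the
event subscription is already in place. It also covers the timeout Ron
asked for.

```python
import queue
import time


def wait_for_jobs(wanted_ids, snapshot_ids, events, timeout=None):
    """Return the set of wanted job IDs still unfinished when waiting stops.

    wanted_ids   -- job IDs given to the (hypothetical) "-wait" option
    snapshot_ids -- job IDs in the job list received on connect
    events       -- queue.Queue of job IDs from finish/delete events
    timeout      -- overall limit in seconds, or None to wait forever
    """
    # A wanted job that is missing from the snapshot ended before we
    # connected, so only jobs present in the snapshot need waiting for.
    pending = set(wanted_ids) & set(snapshot_ids)
    deadline = None if timeout is None else time.monotonic() + timeout

    while pending:
        remaining = None if deadline is None else deadline - time.monotonic()
        if remaining is not None and remaining <= 0:
            break  # timed out
        try:
            finished_id = events.get(timeout=remaining)
        except queue.Empty:
            break  # timed out while blocked on the event queue
        # Events for jobs we are not waiting on are simply ignored.
        pending.discard(finished_id)

    return pending
```

An empty result means every job is done; a non-empty result means the
timeout expired, and the caller can exit with an error. Because the
subscription is set up before the snapshot is taken, a job that finishes in
between shows up as an event instead of being lost.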

   Joachim

>Anyone interested in using this "qevent -wait"
>feature?
>
>:)
>
> -Ron
>
>--- Stephan Grell - Sun Germany - SSG - Software
>Engineer <stephan.grell at sun.com> wrote: 
>  
>
>>We introduced the limit for two reasons (mainly the
>>first one):
>>1) the file descriptor limit
>>2) the possibility of dragging down the qmaster
>>
>>I have not seen a huge impact on qmaster performance
>>from a large number of event clients. But then, I have
>>never done performance testing with more than 200 clients.
>>
>>Cheers,
>>Stephan
>>
>>Rayson Ho wrote:
>>
>>>>If you are using -sync or -now, qsub is registered with the qmaster as
>>>>an event client.  Because, as Charu said, event clients can drag down
>>>>the qmaster, there is a configurable limit to the number of them allowed
>>>>at a time, which defaults to 99.
>>>>
>>>Hi Daniel,
>>>
>>>Will there be a "slave" event master??
>>>
>>>Rayson
>>>
>>>>Daniel
>>>>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>




