[GE users] Re: Matlab integration

Chris Dagdigian dag at sonsorol.org
Wed Mar 15 15:51:06 GMT 2006


Using "qstat | grep <jobID>" can produce overhead. Since you seem to  
know the jobID ahead of time can you take advantage of that along  
with SGE's XML output option for qstat?

Parsing "qstat -xml -j <jobID>" will be a heck of a lot easier.

Also, rather than invoke some sort of large XML parser, take a look  
at the Perl-based XML::Smart approach that Joe Landman posted about:

http://gridengine.info/articles/2005/11/11/easy-gridengine-xml-handling-via-perl-xml-smart

That method seems to be a pretty lightweight way to zero in on  
specific bits of SGE qstat data you may require.
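
To give a flavour of it, here is a rough sketch that pulls the state  
code ('qw', 't', 'r', ...) for one job out of a single "qstat -xml"  
call. I'm using XML::Simple below simply because I know its interface  
off-hand; the XML::Smart approach from the post above would look much  
the same. The element names (queue_info, job_info, job_list,  
JB_job_number, state) are what I remember from 6.x output, so verify  
them against your own XML dump before relying on this:

    #!/usr/bin/perl
    # Sketch, not production code: one qstat call for the whole
    # cluster, parsed once, then pick out the state of the job we
    # care about.
    use strict;
    use warnings;
    use XML::Simple;

    my $jobid = shift or die "usage: $0 <jobID>\n";

    my $xml = `qstat -xml`;
    my $ref = XMLin($xml, ForceArray => ['job_list'], KeyAttr => []);

    # Running/transferring jobs sit under <queue_info>, pending ones
    # under the inner <job_info> element (assumed 6.x layout).
    for my $section (qw(queue_info job_info)) {
        next unless ref $ref->{$section} eq 'HASH';
        for my $job (@{ $ref->{$section}{job_list} || [] }) {
            if ($job->{JB_job_number} == $jobid) {
                print "$job->{state}\n";    # e.g. qw, t, r
                exit 0;
            }
        }
    }
    print "job $jobid not found in qstat output\n";
    exit 1;

One parsed qstat call per poll should end up cheaper than piping the  
full listing through grep and post-processing it on the Matlab side.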

I also run gridengine.info -- the Google Analytics stuff is just for  
my own curiosity and I can disable it if it becomes a performance  
issue. I just like poking at the reports every once in a while to see  
where people hit the site from, what search terms they use, and how  
many hops through the site they have to make to get at the info they  
want.

Regards,
Chris





On Mar 15, 2006, at 10:43 AM, Bernd Dammann wrote:

> Dear Andreas,
>
>> I hope there is no reason not to talk about this in
>> public. Please find my replies below.
>>
> That's ok with me.
>
>>> Right now we are using something like 'qstat | grep jobID', which
>>> gives a substantial overhead, because this is called frequently from
>>> Matlab to check the status of the tasks.
>>
>> Generally there are means to lessen the overhead. I'm curious,
>> however, what kind of synchronization that is. Possibly it is
>> related to the "status files" you mention below?
>>
> Maybe I wasn't clear enough:  What we need is the status of a
> job/task, like 'qw', 't', 'r', etc., and this is only available if
> one does a qstat for the whole system.  So the overhead comes from
> the piping into grep and the post-processing.
>
>> Qsub as-is does not support multiple ranges. That limitation is
>> only due to client-side qsub parsing code, but nevertheless it
>> exists. Possibly one can work around it by submitting an array job
>> for each range?
>>
> No, one can't, because Matlab DCT considers a job as consisting of
> several tasks - splitting it up would mean creating more than one
> job ID, which is not feasible.
>
>> I understand this makes it somewhat uncomfortable. Nevertheless,
>> having it work at first with only the shared-filesystem constraint
>> may already be usable for others.
>>
> I agree, and I also see this as a minor problem.  As far as I
> remember, a shared filesystem is also a prerequisite for the
> built-in JobManager in DCT, so it has the same limitation.
>
>> Never had problems with GridWiki.
>
> Works again for me (at least where I am now) - it's the loading of the
> .js file that caused problems, because the server's hostname didn't
> resolve in our DNS. :-(
>
> Regards,
> Bernd
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



