[GE users] 6.0 qstat - possible RFE

Pacey, Mike m.pacey at lancaster.ac.uk
Fri Nov 5 17:10:51 GMT 2004

Hi folks,

I've been trying out 6.0 with a view to upgrading our system, and
there are a couple of features that I think might be useful to add.
Thought I'd post here to see if there's any general call for these
before I make any request for enhancement.

1) Qtop / qstat -ext

I'm running the old 5.2.2 here, and after a while I started to think
it'd be really useful for users to be able to monitor how their jobs are
doing in terms of memory and % cpu usage. Users can then see if their
jobs are, e.g. consuming too much memory, have stalled (no cpu usage),
or bottlenecking on i/o (low cpu usage). My solution was a fairly crude
script "qtop" which simply skims the list of hosts which are running a
user's jobs from qstat, and then iteratively invokes a remote call to
top to show the user's processes. It's a little messy from a user's
perspective, but it's still fairly readable, though it can be a little
slow if an execution node is heavily loaded.
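
A rough sketch of the loop described above, not the actual qtop script.
It assumes the (owner, host) pairs have already been pulled out of qstat
(the parsing is left out, since the column layout differs between
versions), that ssh to the execution nodes works without a password, and
that the nodes run a procps-style top accepting -b, -n and -u. All
function names here are made up for illustration.

```python
import subprocess

def hosts_for_user(jobs, user):
    """Return the distinct hosts running jobs for `user`, in first-seen order.

    `jobs` is an iterable of (owner, host) pairs already extracted from qstat.
    """
    seen = []
    for owner, host in jobs:
        if owner == user and host not in seen:
            seen.append(host)
    return seen

def top_command(host, user):
    """Build the remote top invocation (assumes ssh and a procps-style top)."""
    return ["ssh", host, "top", "-b", "-n", "1", "-u", user]

def qtop(jobs, user, run=subprocess.run):
    """Visit each host in turn and show the user's processes there."""
    for host in hosts_for_user(jobs, user):
        print(f"==> {host}")
        run(top_command(host, user), check=False)
```

Visiting the hosts one after another is what makes this slow when a node
is heavily loaded: each ssh call has to finish before the next starts.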

Looking at the docs for 6.0 the "qstat -ext" caught my eye - having the
execds keep track of this kind of info is an excellent idea. But looking
at it, I see that these are cumulative stats (ie cpu seconds consumed,
and gigabyte-seconds). Maybe this has great use in some sort of billing,
but I'd think it much more useful for users to see % cpu usage and
memory usage in the vein of top and prstat. I think such figures can be
easily computed from the existing -ext stats; I guess some more
digging in pdc.c will confirm that. Would anyone else find that useful?
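
As a sketch of the arithmetic, assuming two samples of the cumulative
figures taken a known number of seconds apart: the difference in cpu
seconds divided by the elapsed time gives a top-style percentage, and the
difference in gigabyte-seconds divided by the time it was integrated over
gives an average memory figure. The function names are hypothetical, and
which time base the execd integrates memory over is an assumption to
check against pdc.c.

```python
def cpu_percent(cpu_prev, cpu_now, seconds_between):
    """Percent of one CPU used between two samples of cumulative cpu seconds."""
    if seconds_between <= 0:
        raise ValueError("samples must be taken at increasing times")
    return 100.0 * (cpu_now - cpu_prev) / seconds_between

def mean_memory_gb(mem_prev, mem_now, seconds_integrated):
    """Average memory in GB over an interval, from cumulative gigabyte-seconds.

    `seconds_integrated` must be measured in whatever time base the
    integral uses (an assumption, not confirmed from the source code).
    """
    if seconds_integrated <= 0:
        raise ValueError("interval must be positive")
    return (mem_now - mem_prev) / seconds_integrated
```

For example, a job that accumulated 30 cpu seconds over a 60 second
sampling window would be reported as running at 50% cpu.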

2) qstat output

More a minor niggle this, but qstat output is something my users look at
a lot, so it needs to be easily readable. Looking at my test qstat
output, qstat now runs to 112 columns - not good for users with standard
80-column displays. Coming from the old 5.2.2 output, this is a bit of a
leap, and there seem to be a couple of ways to rationalise this:

   - the FQDN of an execution host isn't really necessary, even on
systems with execution hosts in different domains, and it takes up a
good few chars

   - the year part of the submission/start date could be removed, as I
can't imagine many users having year-long jobs (though of course it needs
to be preserved for qacct output)

   - the task id field takes up a lot of space simply because of the field
name, the rather lengthy "ja-task-ID"
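
The first two trims, plus fixed-width fields, could be sketched as below.
This works on individual field values rather than on real qstat output,
and it assumes hosts may appear as "queue@host.domain" and that dates
look like "MM/DD/YYYY HH:MM:SS". Both formats and all function names are
assumptions for illustration.

```python
def short_host(name):
    """Drop the domain part of a host, keeping any 'queue@' prefix."""
    queue, at, host = name.rpartition("@")
    return queue + at + host.split(".", 1)[0]

def drop_year(stamp):
    """Remove the trailing year from the date part of a timestamp."""
    date, _, time = stamp.partition(" ")
    month_day = date.rsplit("/", 1)[0]
    return f"{month_day} {time}".strip()

def fit(text, width):
    """Truncate or pad `text` to exactly `width` characters."""
    return text[:width].ljust(width)

print(short_host("all.q@node01.example.org"))
print(drop_year("11/05/2004 17:10:51"))
print(fit("averylongusername", 8))
```

Between them, dropping the domain and the year saves a good dozen or more
columns per line, depending on how long the domain name is.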

I could write a screen scraper to modify this stuff more to my taste - or
modify the 'status' script which appears to be an attempt to do
something similar by someone who also thought the current output was a
bit unwieldy. I already have a few scripts to summarise my cluster
output based on qstat output, but I'm wondering if the current qstat
output needs rationalising a bit more, e.g. hiding some of the extra
info like FQDNs away in extra command line options, or giving site admin
higher level control over field selection and length (e.g. I'd be quite
happy to have just 8 chars for username, and I've never had much use for
the 'prior' field), or maybe even switch to multi-line output?

Comments welcome!



Dr Mike Pacey,                         Email: M.Pacey at lancaster.ac.uk
High Performance Systems Support,      Phone: 01524 593543
Information Systems Services,            Fax: 01524 594459
Lancaster University,
Lancaster LA1 4YW
