[GE users] Reservation trouble

skylar2 skylar2 at u.washington.edu
Thu May 28 16:04:01 BST 2009


I'm having trouble getting slot reservations working with SGE 6.1u6. Our
cluster mostly services long-running single-slot jobs, but occasionally
we get multi-slot jobs that must run on a single node. We handle those
with a special parallel environment whose allocation rule is set to
$pe_slots. We're trying to use resource reservations so that SGE lets a
node drain until the multi-slot single-node job can start.
Unfortunately, SGE appears to be backfilling the smaller jobs ahead
of the multi-slot jobs. Here's what I'm seeing in the scheduler log:

693995:1:RESERVING:1243522207:59:P:serial:slots:7.000000
693995:1:RESERVING:1243522207:59:Q:all.q@sage023.grid.gs.washington.edu:slots:7.000000
693119:1:STARTING:1243522192:59:Q:all.q@sage023.grid.gs.washington.edu:slots:1.000000

This was at a point when sage023 already had three of its eight slots
in use, so SGE shouldn't have scheduled any more jobs on sage023. We
have the default h_rt set to INFINITY in the hope that this would disable
backfilling, but it doesn't appear to be working. Is there anything else
we can try?
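To check a schedule file for this pattern without eyeballing it, a minimal Python sketch like the one below may help. It assumes the colon-separated record layout job_id:task_id:state:start_time:duration:level:object:resource:utilization, with the start time as a Unix timestamp and the duration in seconds. The names `ScheduleRecord`, `parse_record` and `backfill_overlaps` are hypothetical. It only flags STARTING jobs whose expected run time overlaps a reservation on the same queue instance; it does not check slot counts.

```python
from dataclasses import dataclass

# Assumed record layout (one record per line, colon-separated):
# job_id:task_id:state:start_time:duration:level:object:resource:utilization
@dataclass
class ScheduleRecord:
    job_id: int
    task_id: int
    state: str      # e.g. "STARTING" or "RESERVING"
    start: int      # assumed: Unix timestamp
    duration: int   # assumed: seconds
    level: str      # "P" = parallel environment, "Q" = queue instance
    obj: str        # PE name or queue instance (queue@host)
    resource: str
    amount: float

    @property
    def end(self) -> int:
        # Expected end time under the assumptions above.
        return self.start + self.duration


def parse_record(line: str) -> ScheduleRecord:
    """Parse one record; queue@host names contain no ':' so split() is safe."""
    f = line.strip().split(":")
    if len(f) != 9:
        raise ValueError(f"unexpected record: {line!r}")
    return ScheduleRecord(int(f[0]), int(f[1]), f[2], int(f[3]), int(f[4]),
                          f[5], f[6], f[7], float(f[8]))


def backfill_overlaps(lines):
    """Hypothetical helper: return (started, reserved) record pairs where a
    STARTING job on a queue instance is expected to still be running while a
    reservation on the same queue instance is active. Slot capacity is NOT
    checked, so every hit is only a candidate to inspect."""
    recs = [parse_record(l) for l in lines if l.strip()]
    reserved = [r for r in recs if r.state == "RESERVING" and r.level == "Q"]
    started = [r for r in recs if r.state == "STARTING" and r.level == "Q"]
    return [
        (s, r)
        for s in started
        for r in reserved
        # Half-open interval overlap on the same queue instance.
        if s.obj == r.obj and s.start < r.end and r.start < s.end
    ]


# The three records from the log above, written with '@' in queue names.
LOG = """\
693995:1:RESERVING:1243522207:59:P:serial:slots:7.000000
693995:1:RESERVING:1243522207:59:Q:all.q@sage023.grid.gs.washington.edu:slots:7.000000
693119:1:STARTING:1243522192:59:Q:all.q@sage023.grid.gs.washington.edu:slots:1.000000
"""

if __name__ == "__main__":
    for s, r in backfill_overlaps(LOG.splitlines()):
        print(f"job {s.job_id} runs {s.start}-{s.end} on {s.obj}, "
              f"overlapping job {r.job_id}'s reservation from {r.start}")
```

Run against the three records above, it reports job 693119 (start 1243522192, assumed end 1243522251) as overlapping the reservation for job 693995 that begins at 1243522207 on all.q@sage023, provided the duration field means what the sketch assumes.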

Thanks,


-- 
-- Skylar Thompson (skylar2 at u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S048, (206)-685-7354
-- University of Washington School of Medicine

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=199455
