[GE users] Scheduling problems

Reuti reuti at staff.uni-marburg.de
Sat Oct 18 13:31:34 BST 2008


On 17.10.2008 at 17:51, Fedele STABILE wrote:

> I have this problem:
> if I suspend a parallel job, it doesn't release its resources to
> other parallel jobs queued on the batch system

This is normal and intended. A suspended job might still hold
resources, such as space on a scratch disk, so what it occupies is
not made available to other jobs.

What you can do:

- define twice the amount of resources (e.g. slots), so that the
  suspended job and the new job both fit
- don't use subordination; instead let the node be overloaded
- define a suspend threshold on the queue which should be stopped
- set up a checkpointing environment which uses the suspension to
  migrate the job
- result: the node gets rid of the old job

This means for your setup:

- jobs must be restartable/checkpointable
- jobs in the low priority queue must be submitted using the  
appropriate checkpointing interface
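
As a rough sketch only, a checkpointing environment for this could
look like the following, in the format printed by `qconf -sckpt`. The
name "migrate" and the ckpt_dir are assumptions for illustration, not
taken from your setup. Per checkpoint(5), the "x" in "when" requests
checkpointing/migration when the job gets suspended.

```
# illustrative values only; "migrate" and /tmp are assumed
ckpt_name          migrate
interface          APPLICATION-LEVEL
ckpt_command       NONE
migr_command       NONE
restart_command    NONE
clean_command      NONE
ckpt_dir           /tmp
signal             NONE
when               xsr
```

The environment would then be added to the `ckpt_list` of the low
priority queue, where `suspend_thresholds` is also set, and jobs
would request it with `qsub -ckpt migrate ...`.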

-- Reuti



> I have SGE 6.0 and I have a PE called mpi
>> qconf -sp mpi
> pe_name           mpi
> slots             12
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   NONE
> stop_proc_args    NONE
> allocation_rule   $fill_up
> control_slaves    FALSE
> job_is_first_task FALSE
> urgency_slots     min
>
> Can you help me?
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

