[GE users] about the qmod command

gabradshaw g.bradshaw at open.ac.uk
Thu Dec 10 09:30:51 GMT 2009



Hi Bill

I have had some small success putting the jobs into the hold state before shutting down the system,
but I have SGE running in DB mode with the DB running on a second machine, which did not get shut down.

We have also told all our users in our policy document that if they wish to run on the system, their code must write out
some form of current status while the job is running, so that a long-running job can, if needed, restart from where it left off.
For the most part this has worked fine.
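
As a rough illustration of the kind of status output described above, here is a minimal Python sketch. It is an assumption and not our actual policy text or any user's code: the file name job_state.json, the function names and the stand-in workload are all hypothetical. The job saves its progress after every step and, when started again, picks up from the last saved step.

```python
import json
import os
import tempfile

CHECKPOINT = "job_state.json"  # hypothetical checkpoint file name


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"next_step": 0, "total": 0}


def save_state(path, state):
    """Write to a temporary file first, then rename it over the checkpoint,
    so a job killed mid-write does not leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def run(path, n_steps, stop_after=None):
    """Run steps starting from the last checkpoint.

    stop_after is only here to simulate the job being killed part-way.
    """
    state = load_state(path)
    done = 0
    while state["next_step"] < n_steps:
        if stop_after is not None and done >= stop_after:
            return state  # simulated interruption
        state["total"] += state["next_step"]  # stand-in for the real work
        state["next_step"] += 1
        save_state(path, state)
        done += 1
    return state


if __name__ == "__main__":
    print(run(CHECKPOINT, n_steps=10))
```

A real job would save less often than every step if each save is expensive, and would store whatever it needs to rebuild its working data, not just a counter.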


Geoff


On 8 Dec 2009, at 20:05, wagoodman wrote:

> I have users that run jobs for weeks and sometimes months on our grid.
> We experienced some "GDI errors" and traced them to slow SATA disks.
> We need to move our SGE installation to Fibre Channel disks, so before
> we move the storage, we're informing users that all jobs will be removed
> with qdel at a specific time. One user asked about a job that he's been running
> for one month: would he be able to restart the job where it was suspended?
>
> Sun tech wrote:
>
> It will not work.
> The jobs need to be able to be checkpointed. Even with checkpointing,
> it may not work unless the app is inherently checkpointable.
>
> I was wondering if anyone else had a scenario like this, and what would
> be a solution.
>
> Bill
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, December 08, 2009 2:16 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] about the qmod command
>
> Am 08.12.2009 um 17:56 schrieb wagoodman:
>
>> I'm familiar with the qmod command; my real question is: if I issue,
>> let's say, a qhold or qmod -sj to suspend the job,
>> then shut down the daemons on the execution hosts and submit hosts
>> and shut down the qmaster and the shadow,
>> and when I finish the work on the servers (moving the storage) issue
>> qmod -rj to reschedule the job, would that
>> work once the daemons on the execution hosts, submit hosts and the
>> qmaster and the shadow are restarted?
>
> Depends on what you try to achieve with "will it work".
>
> -- Reuti
>
>
>> Bill
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?
>> dsForumId=38&dsMessageId=232271
>>
>> To unsubscribe from this discussion, e-mail: [users-
>> unsubscribe at gridengine.sunsource.net].
>>
>




G.Bradshaw
Research Computer Manager
Physics & Astronomy
The Open University
Milton Keynes
Bucks MK76AA

For all my contact info see:- http://gabradshaw.tel

The Open University is incorporated by Royal Charter (RC 000391), an exempt
charity in England & Wales and a charity registered in Scotland (SC 038302).







------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=232588

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


