[GE users] Re: deleting large numbers of jobs

tmac tmacmd at gmail.com
Thu May 8 21:53:04 BST 2008



Re-posting.

The delete is of an array job with many components.

During the delete, the master seems to choke: any new qsub fails with
the infamous GDI message.
Again, it is only deleting 100-300 jobs, which is really not a whole lot.
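One workaround to try while the root cause is unknown is to throttle the deletes: pass the job IDs to `qdel` in small batches with a pause between them, instead of sending hundreds at once. This is a minimal sketch under stated assumptions. The batch size (20) and pause (5 s) are arbitrary guesses, not tuned values, and the helper names `chunked` and `throttled_qdel` are hypothetical. It has not been tested against 6.0u7, so it may not avoid the GDI timeout. It relies only on `qdel` accepting several job IDs as separate arguments, and it defaults to a dry run that just prints the commands.

```python
import subprocess
import time
from typing import Iterator, List


def chunked(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` job IDs."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def throttled_qdel(job_ids: List[str], batch_size: int = 20,
                   pause_s: float = 5.0, dry_run: bool = True) -> List[List[str]]:
    """Delete jobs in batches, sleeping between batches.

    Hypothetical helper: batch_size and pause_s are assumptions, not
    recommended SGE settings. With dry_run=True (the default) nothing is
    executed; the qdel command lines are only printed.
    """
    batches: List[List[str]] = []
    for batch in chunked(job_ids, batch_size):
        batches.append(batch)
        if dry_run:
            print("would run: qdel " + " ".join(batch))
            continue
        # check=False: keep going if qdel reports an error for one batch
        subprocess.run(["qdel", *batch], check=False)
        # give the qmaster time to process this batch before the next one
        time.sleep(pause_s)
    return batches
```

For example, `throttled_qdel(ids, batch_size=25)` on a list of 330 job IDs prints 14 `qdel` lines. Call it again with `dry_run=False` to actually delete them.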

Is there any way to find out further what is happening and why?

Is there any way to increase the timeout before the GDI message appears?

thanks

On Thu, Apr 24, 2008 at 10:49 AM, tmac <tmacmd at gmail.com> wrote:
> SGE 6.0u7 all around
> Master/shadows RHEL4u2
> BDB via RPC on Solaris 10
>
> When we try to delete a large number of jobs (with large being more
> than *just* a couple hundred)
> the master stops responding. Sometimes it comes back, sometimes not.
>
> This morning, we deleted 330+ array jobs. The master hung. We waited 4
> minutes and qstat/qmon were still not responding.
> The master itself seemed OK.
>
> The service was restarted on the master/slaves.
>
> Anyone have any idea as to what might be going on?
>
> --
> --tmac
>
> RedHat Certified Engineer #804006984323821 (RHEL4)
> RedHat Certified Engineer #805007643429572 (RHEL5)
>
> Principal Consultant
>



