[GE users] exit status = 10 pe_start = 134

henk h.a.slim at durham.ac.uk
Fri May 7 17:25:41 BST 2010


These are the settings in the PE:

start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE

I believe this indicates the absence of start or stop procedures? I also use
these settings on another cluster. I'll change the posix_compliant setting.
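
To check the start/stop question mechanically, here is a minimal Python sketch that parses "key value" lines in the style of the PE settings shown above and reports whether a real start or stop procedure is defined. The names parse_pe, has_real_procedure and NOOP_VALUES are made up for illustration, and treating /bin/true and NONE as "nothing to run" is my assumption, not something confirmed in this thread.

```python
# Sketch only: PE settings pasted from this message, parsed as "key value" lines.
PE_TEXT = """\
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
"""

# Assumption: these values mean "no real procedure is run".
NOOP_VALUES = {"/bin/true", "NONE"}


def parse_pe(text):
    """Turn 'key   value' lines into a dict; blank or malformed lines are skipped."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def has_real_procedure(settings, key):
    """True if the setting names something other than an assumed no-op."""
    return settings.get(key, "NONE") not in NOOP_VALUES


pe = parse_pe(PE_TEXT)
for key in ("start_proc_args", "stop_proc_args"):
    state = "real procedure" if has_real_procedure(pe, key) else "no-op"
    print(key, "->", state)
```

The same functions could be fed the saved output of qconf -sp for any other PE.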

Thanks

> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 07 May 2010 17:18
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] exit status = 10 pe_start = 134
> 
> On 07.05.2010 at 18:12, henk wrote:
> 
> > Hi Reuti
> >
> >> When you use Open MPI, there is no need for any start procedure of the
> >> PE. Did you define one anyway as you use the same PE also for other
> >> types of jobs?
> >
> > No, there is no start procedure. The queue is basically the default
> > all.q with adjustment for the pe and the number of slots:
> >
> > shell_start_mode      posix_compliant
> 
> Often unix_behavior is more appropriate as it honors the first line of
> the script.
> 
> 
> > starter_method        NONE
> > suspend_method        NONE
> > resume_method         NONE
> > terminate_method      NONE
> 
> No, in the PE:
> 
> $ qconf -sp orte
> 
> (or whatever you call it)
> 
> -- Reuti
> 
> 
> > I also tried the debug options with the dl utility but I don't think
> > this gives more information for this kind of problem?
> >
> > Thanks
> >
> >> -----Original Message-----
> >> From: reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: 07 May 2010 17:07
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] exit status = 10 pe_start = 134
> >>
> >> On 07.05.2010 at 17:00, henk wrote:
> >>
> >>> I installed gridengine 6.2u5 and almost all nodes work fine except a
> >>> few where a job generates the following error message:
> >>>
> >>> failed in pestart:05/07/2010 15:14:29 [43532:15077]: exit_status of pe_start = 134
> >>>
> >>> (It's also in the qmaster message file)
> >>>
> >>> and the node message file has this entry
> >>>
> >>> 05/07/2010 15:14:30|  main|cn031|E|shepherd of job 556.1 exited with exit status = 10
> >>
> >> This just says that the PE start procedure failed.
> >>
> >>
> >>> indicating the problem.
> >>>
> >>> I use openmpi-1.4.1. The job is put in the queue again and the queue is
> >>> in the error state. Clearing the error repeats the problem.
> >>>
> >>> Does anyone know what the code 134 means?
> >>
> >> Codes greater than 128 are the sum of 128 plus the number of the received
> >> signal. That means SIGABRT (signal 6) in your case. Now the question: where
> >> does this signal come from?
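
The 128-plus-signal rule can be checked with a short Python sketch. It uses only the standard signal module; the function name decode_exit_status is made up for illustration, and signal names depend on the platform where the script runs.

```python
import signal


def decode_exit_status(status):
    """Describe an exit status; values above 128 are read as 128 + signal number."""
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


print(decode_exit_status(134))  # the pe_start status from the error message
print(decode_exit_status(10))   # the shepherd status from the node message file
```

Note that the shepherd's exit status of 10 is below 128, so under this rule it is an ordinary exit code and not a signal.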
> >>
> >>
> >>
> >> When you use Open MPI, there is no need for any start procedure of the
> >> PE. Did you define one anyway as you use the same PE also for other
> >> types of jobs?
> >>
> >> -- Reuti
> >>
> >>
> >>>
> >>> Thanks
> >>>
> >>> Henk
> >>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=256548

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


