[GE users] SGE-6.2u3: error reason 1: exit_status of pe_start = 134

soyez E.Soyez at science-computing.de
Mon Jul 12 07:35:40 BST 2010


Thanks Andy for your quick reply, sorry for my late reply,

it seemed that there was something going terribly wrong (apart from
gridengine?) but as it was very urgent we could not do any detailed
analysis.  We downgraded back to 6.1 and everything was fine again.
Other sites did not have these problems with 6.2u5 though.

Erik Soyez.

P.S.:	"NONE" or "/bin/true" did not make any difference, although
 	I don't know when and why we started using "/bin/true".  That
 	must have been an example in the Gridengine documentation some
 	decades ago....


On Wed, 19 May 2010, andy wrote:

> Erik,
>
> on Linux and Solaris the signal causing "exit status" 134 is "ABRT" (signal
> 6, 128 + 6 = 134).
>
> There is a known issue in SGE that prolog/epilog/pe* are started with the
> job limits - could an accidentally small job limit, e.g. "1k" instead of
> "1g" have caused the exec() of the shell (I think /bin/true is at least
> sometimes a shell script) or binary to die quickly?
>
> Why not simply set the pe* methods to "NONE"?
>
> There are certainly other reasons why /bin/true can fail.
>
>
> On Wed, 19 May 2010, soyez wrote:
>
>> Good day,
>>
>> does anybody know why PEs can put queues into state "Error" even if
>> the pe_start-file is "/bin/true"?  The usual suspects automounter,
>> NFS, home directories, etc. all work very well.  Is it a known bug
>> or only a misleading error message?
>>
>> ------------------------------------------------------------------------
>>  				:
>>  				:
>> parallel environment:  abaqus range: 4
>> error reason    1:          05/18/2010 15:01:52 [0:22789]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:02:07 [0:21269]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:02:24 [0:5181]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:02:38 [0:6302]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:09:52 [0:22909]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:10:09 [0:5235]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:14:11 [0:22945]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:14:28 [0:5267]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:32:35 [0:21558]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:32:51 [0:6603]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:46:49 [0:21699]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:47:05 [0:6733]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:47:34 [0:21703]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:47:51 [0:5644]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:48:49 [0:21709]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:49:05 [0:6744]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:50:34 [0:21720]: exit_status of pe_start = 134
>>                  1:          05/18/2010 15:50:51 [0:5661]: exit_status of pe_start = 134
>>  				:
>>  				:
>> ------------------------------------------------------------------------
>>
>>
>> ------------------------------------------------------------------------
>> pe_name            abaqus
>> slots              999
>> user_lists         NONE
>> xuser_lists        NONE
>> start_proc_args    /bin/true
>> stop_proc_args     /bin/true
>> allocation_rule    $fill_up
>> control_slaves     FALSE
>> job_is_first_task  TRUE
>> urgency_slots      min
>> accounting_summary FALSE
>> ------------------------------------------------------------------------
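
The arithmetic in Andy's reply (128 + 6 = 134) follows the usual shell
convention that a status above 128 means "terminated by signal
(status - 128)".  A minimal Python sketch of that decoding is below.  The
helper name describe_exit_status is made up for illustration, and the
signal names are those of the machine the sketch runs on.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Statuses above 128 are read as "terminated by signal (status - 128)",
    the convention Andy applies to 134 in this thread.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The number does not map to a signal known on this platform.
            name = "an unknown signal"
        return f"terminated by {name} (signal {signum})"
    return f"exited normally with code {status}"


# The status reported for pe_start in the queue error messages.
print(describe_exit_status(134))
```

On Linux this names SIGABRT for 134, which matches Andy's reading.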

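One way to check Andy's hypothesis that pe_start runs under the job's
limits would be to point start_proc_args at a small diagnostic script for
a while and look at what it reports.  The sketch below is only an
assumption of how such a script might look; it has not been tested with
Grid Engine, and where its stderr output ends up depends on the
installation.

```python
#!/usr/bin/env python3
import resource
import sys

# Limits that could plausibly kill a process shortly after exec() when set
# far too low (e.g. "1k" instead of "1g").  This selection is an assumption.
LIMIT_NAMES = [
    "RLIMIT_AS",      # address space, bytes
    "RLIMIT_DATA",    # data segment, bytes
    "RLIMIT_STACK",   # stack size, bytes
    "RLIMIT_CPU",     # CPU time, seconds
    "RLIMIT_FSIZE",   # maximum file size, bytes
    "RLIMIT_NOFILE",  # number of open file descriptors
]


def fmt(value: int) -> str:
    """Render a limit value, spelling out the 'no limit' marker."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits() -> dict:
    """Return {name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        res = getattr(resource, name, None)
        if res is None:
            continue  # not every platform defines every limit
        soft, hard = resource.getrlimit(res)
        limits[name] = (fmt(soft), fmt(hard))
    return limits


# Report the limits this process inherited from whatever started it.
for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}", file=sys.stderr)
```

A suspiciously small soft limit in that output would support the
job-limit explanation; ordinary values would point elsewhere.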










