[GE users] scheduler goes down

Andy Schwierskott andy.schwierskott at sun.com
Fri May 13 12:58:37 BST 2005


Hi,

> this is independent of the job numbers. Looks like you
> found a bug. Can you post a stack trace?

If there is no core dump (which you need for the stack trace), then the
reason is most likely (unless the core limit was set to 0) that the scheduler
was started by user root and is running as the admin user. For security
reasons, Unix does not create core dumps for suid-root-like processes, i.e.
processes that have changed their user ID since they were started.

Solution:

   - use coreadm (on Solaris only) to allow core dumps for such processes;
     see coreadm(1M) for how it works
   - or start the scheduler directly under the admin user account.
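
To rule out the "core limit was set to 0" case first, the limit can be checked
from the environment that starts the scheduler. The following is only a rough
sketch in Python, not part of Grid Engine; the helper name describe_core_limit
is made up for illustration. It reports the limit of the process that runs it,
so it has to be run from the same shell or startup script that launches
sge_schedd. It cannot detect the suid-root-like case described above.

```python
import resource


def describe_core_limit(soft):
    """Return a short verdict for a soft RLIMIT_CORE value (hypothetical helper)."""
    if soft == 0:
        return "core dumps disabled (soft limit is 0)"
    if soft == resource.RLIM_INFINITY:
        return "core dumps allowed (unlimited size)"
    return f"core dumps allowed up to {soft} bytes"


if __name__ == "__main__":
    # getrlimit returns a (soft, hard) pair; the soft limit is the one enforced.
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    print(describe_core_limit(soft))
```

If this reports that core dumps are allowed and there is still no core file
after the crash, the user-ID change is the more likely cause.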

Andy

> Thanks,
> Stephan
>
> Philipp Drum wrote:
>
>> Hi,
>>
>> * I wrote:
>>
>>
>>> [debug output]
>>>
>>>
>>
>> btw, how many jobs can be considered 'many'?
>>
>> I submitted some 'sleep 10' jobs; sge_schedd output:
>>
>> 10995265  17542    Added job 635989
>> 10995266  17542    37599. EVENT ADD JOB 635990.0
>> 10995267  17542    Added job 635990
>> 10995268  17542    37600. EVENT ADD JOB 635991.0
>> 10995269  17542    Added job 635991
>> 10995270  17542    37601. EVENT ADD JOB 635992.0
>> 10995271  17542    Added job 635992
>> 10995272  17542    37602. EVENT ADD JOB 635993.0
>> 10995273  17542    Added job 635993
>> 10995274  17542    37603. EVENT ADD JOB 635994.0
>> 10995275  17542    Added job 635994
>> 10995276  17542    37604. EVENT ADD JOB 635995.0
>> 10995277  17542    Added job 635995
>> 10995278  17542    37605. EVENT ADD JOB 635996.0
>> 10995279  17542    Added job 635996
>> 10995280  17542    37606. EVENT ADD JOB 635997.0
>> 10995281  17542    Added job 635997
>> 10995282  17542    37607. EVENT ADD JOB 635998.0
>> 10995283  17542    Added job 635998
>> 10995284  17542    37608. EVENT ADD JOB 635999.0
>> 10995285  17542    Added job 635999
>> 10995286  17542    37609. EVENT ADD JOB 636000.0
>> 10995287  17542    Added job 636000
>> 10995288  17542    37610. EVENT MOD QUEUE succinat_l.q
>> 10995289  17542    37611. EVENT MOD JATASK 603288.1
>> 10995290  17542    JATASK 603288.1: IDLE -> RUNNING
>> 10995291  17542    MOD job counter 24 knopf 1
>> 10995292  17542    37612. EVENT MOD QUEUE tobaco_l.q
>> 10995293  17542    37613. EVENT MOD JATASK 603289.1
>> 10995294  17542    JATASK 603289.1: IDLE -> RUNNING
>> 10995295  17542    MOD job counter 24 knopf 1
>> 10995296  17542    37614. EVENT MOD QUEUE tobaco_l.q
>> 10995297  17542    37615. EVENT MOD JATASK 603290.1
>> 10995298  17542    JATASK 603290.1: IDLE -> RUNNING
>> 10995299  17542    MOD job counter 24 knopf 1
>> Segmentation fault
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
