[GE users] sge_qmaster crashing with segmentation fault

eimamagi eimamagi at srce.hr
Wed Nov 4 00:26:56 GMT 2009


Hello to all,

we have a problem with our SGE installation. Our environment is the following:
- the frontend is a VMware virtual machine on an ESX server (Infrastructure 3.5)
- kernel:
   # uname -a
   Linux sge.srce.hr 2.6.18-128.1.14.el5 #1 SMP Wed Jun 17 06:38:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
- SGE version:
   # qstat --version
   SGE 6.2u3
- we have a Beowulf cluster whose nodes are on a private network with no 
firewall.
- we installed the courtesy binaries and everything has been working fine 
since the end of August.

On August 29th sge_qmaster simply stopped working, with nothing in the 
message logs. Afterwards we restarted it several times, and each time it 
died a few minutes after the restart.

Then we tried running it in debug mode:
   SGE_DEBUG_LEVEL="2 2 0 0 0 0 2 0"; export SGE_DEBUG_LEVEL; SGE_ND="true"; export SGE_ND;
The messages seemed reasonable and sge_qmaster worked fine, but after a 
random number of hours it died with the following messages:
1.
1522805  16032 scheduler000     ================[SCHEDULING-EPOCH 200911021352.48]==================
1522806  16032 scheduler000     RAW CQ:2, J:57, H:9, C:49, A:103, D:1, P:101, CKPT:0, US:211, PR:101, RQS:0, AR:0, S:nd:384/lf:282
/etc/init.d/sgemaster.isabella: line 652: 16032 Segmentation fault $bin_dir/sge_qmaster
2.
18941855  31366 scheduler000     ================[SCHEDULING-EPOCH 200911031641.16]==================
18941856  31366 scheduler000     RAW CQ:2, J:67, H:9, C:49, A:103, D:1, P:101, CKPT:0, US:211, PR:101, RQS:0, AR:0, S:nd:384/lf:282
18941857  31366 event_master     processing event master request: 
/etc/init.d/sgemaster.isabella: line 652: 31366 Segmentation fault $bin_dir/sge_qmaster
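
To compare what precedes each crash, the lines leading up to every 
"Segmentation fault" message can be pulled out of a saved copy of the debug 
output. The following is only a minimal sketch: the function name 
last_before_segfault and the file name qmaster_debug.log are hypothetical, 
and it assumes the output of the init script was redirected to that file.

```shell
# Hypothetical helper (not part of SGE): print each "Segmentation fault"
# line from a saved debug log together with the lines just before it.
#   $1 = path to the saved log (assumed to exist)
#   $2 = number of preceding lines to show (defaults to 3)
last_before_segfault() {
    log_file="$1"
    context="${2:-3}"
    # grep -B N prints N lines of leading context before every match
    grep -B "$context" 'Segmentation fault' "$log_file"
}
```

For example, last_before_segfault qmaster_debug.log 3 would show the last 
three debug lines written before each crash, which makes it easier to see 
whether the same thread was active every time.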


The segfault doesn't seem to follow any pattern, but this information 
might be useful for others. Could this be a problem with the VMware 
virtual machine?

Thanks a lot in advance,
emir

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=224904
