[GE users] Two networks cards in qmaster with SGE 6.0 problem

christian reissmann christian.reissmann at sun.com
Mon Jun 28 10:17:57 BST 2004


qping does not use the act_qmaster file or any host_aliases entries; it is
meant to work at the TCP/IP level. For this reason you have to give all
parameters (host name and port of qmaster) on the command line:

usage: qping [-i <interval>] [-info] [-f] <host> <port> <name> <id>
    -i    : set ping interval time
    -info : show full status information and exit
    -f    : show full status information on each ping interval
    host  : host name of running component
    port  : port number of running component
    name  : name of running component (e.g.: "qmaster" or "execd")
    id    : id of running component (e.g.: 1 for daemons)

qping -info clustermaster 5000 qmaster 1

You can try to use the IP address of your qmaster:

qping x.x.x.x [qmaster port nr.] qmaster 1
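
If you want to try both forms (host name and IP address) without retyping, a
small helper like the following might do. This is only a sketch: qping_cmd is a
made-up function name, and "clustermaster" and 5000 are just the example values
from above, so replace them with your own qmaster host and port. It only prints
the command line, so you can check it before running it.

```shell
# Sketch only: print the qping call for a given qmaster host (name or IP)
# and port. qping_cmd is a hypothetical helper, not part of SGE.
qping_cmd() {
    # $1 = host name or IP address of the qmaster, $2 = qmaster port
    printf 'qping -info %s %s qmaster 1\n' "$1" "$2"
}

# Example values taken from the text above; use your own host and port.
qping_cmd clustermaster 5000
```

Piping the printed line to sh would then run the actual check.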

Does the command:

$SGE_ROOT/utilbin/gethostbyname -all [YOUR QMASTER HOST]

return the same "SGE name" on all of your master/execd/submit hosts?

The output should look like:

 >gethostbyname -all [YOUR MASTER HOST NAME]
SGE name: [USED SGE NAME (from host alias file)]
Host Address(es): [IP ADDRESS OF HOST]
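
To compare the "SGE name" line across hosts, something like the sketch below
could help. It assumes the output format shown above; sge_name is a made-up
helper name, and the sample input (name "master", address 192.0.2.10) is
invented for illustration, not real output. The commented loop additionally
assumes passwordless ssh, the same SGE_ROOT on every host, and placeholder
variables HOSTS and QMASTER.

```shell
# Sketch only: read `gethostbyname -all` output on stdin and print the
# value of the "SGE name:" line. sge_name is a hypothetical helper.
sge_name() {
    sed -n 's/^SGE name:[[:space:]]*//p'
}

# Sample input in the format shown above (invented values, not real output):
printf 'SGE name: master\nHost Address(es): 192.0.2.10\n' | sge_name

# On a real cluster (assumptions: passwordless ssh, same SGE_ROOT everywhere,
# HOSTS and QMASTER set by you):
# for h in $HOSTS; do
#     printf '%s: ' "$h"
#     ssh "$h" "$SGE_ROOT/utilbin/gethostbyname -all $QMASTER" | sge_name
# done
```

Every host should print the same name; a host that prints a different one is
the place to look first.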

Best Regards,


Reuti wrote:
> Andi,
>>>>Looks like this is a bug. The aliased hostname (the leftmost name in the
>>>>file) should be written to the "act_qmaster" name file.
>>>Why is anything written at all to act_qmaster while starting up?
>>Because the qmaster machine could be different than before.
> but why must I then edit this file by hand before? I mean, if I put some 
> nonsense in act_qmaster before starting, for instance "blablabla", sgemaster 
> will tell me:
> sge_qmaster didn't start!
> This is not a qmaster host!
> Please, check your act_qmaster file!
> So, it must be correct already before the startup.
>>>>I think there's a suitable workaround. (A bad workaround might be that after
>>>>starting qmaster the startup script changes the act_qmaster file content
>>>>before the scheduler starts. This might work.)
>>>This does not seem to work. I put a sleep in the sgemaster script to wait
>>>until the last thread had changed act_qmaster (it was not written immediately
>>>after starting the binary) and changed it back before starting the scheduler.
>>>But then you can't contact the qmaster at all with qping.
>>I understand. But can the scheduler connect to qmaster?
> Ahh, I have to wait a little bit longer and skip the test. The whole "if-case" 
> in sgemaster:
>       echo "   starting sge_qmaster"
>       $bin_dir/sge_qmaster
> #     CheckRunningQmaster
> echo Sleeping 120...
> sleep 120
> echo master > act_qmaster
>       echo "   starting sge_schedd"
>       $bin_dir/sge_schedd
> This seems to work; I just submitted some jobs with this configuration. Still, 
> qping does not work on the qmaster node itself. From the nodes to the qmaster 
> it's working as before. I'm confused...
> But when you start this up the first time, the configuration in local_conf is 
> renamed to the internal name, and it will not be renamed back to the external 
> name when you want to change the configuration back (without using host_aliases). 
> You have to do it by hand. Then you can use the other configuration again.
> Reuti
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

Christian Reissmann    Tel: +49 (0)941 3075 112  mailto:crei at sun.com
Software Engineer      Fax: +49 (0)941 3075 222 
Sun Microsystems GmbH, Dr.-Leo-Ritter-Str. 7,
D-93049 Regensburg,    Tel: +49 (0)941 3075 0
