[GE users] yet another commlib error

Christian Reissmann Christian.Reissmann at Sun.COM
Wed Dec 7 11:22:38 GMT 2005


Hi Michael,

> 12/06/2005 15:42:34|qmaster|gene1|E|cannot recreate queue all.q from
> disk because of unknown host g2.biocl.weizmann.ac.il

It looks like you lost your all.q at startup because the host
"g2.biocl.weizmann.ac.il" is no longer resolvable. The host might still
be referenced in the @allhosts hostgroup, which you can inspect like this:

> qconf -shgrpl
@allhosts

> qconf -shgrp @allhosts
group_name @allhosts
hostlist angbor arwen baumbart bilbur boromir carc denethor
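
To find out which members of a hostgroup no longer resolve, something
like the following sketch might help. It assumes the "qconf -shgrp"
output looks like the listing above (possibly with backslash line
continuations), that nested groups start with "@", and that
"getent hosts" reflects the resolver qmaster uses on your system. The
function names and the RESOLVE_CMD variable are made up for illustration.

```sh
#!/bin/sh
# Assumption: "getent hosts <name>" fails for names that cannot be resolved.
# RESOLVE_CMD is a hypothetical hook so another lookup command can be used.
RESOLVE_CMD=${RESOLVE_CMD:-"getent hosts"}

# Read "qconf -shgrp <group>" output on stdin and print one member per line.
# Entries starting with "@" (nested groups) and "NONE" are skipped.
hostgroup_members() {
    awk '
        { sub(/\\[ \t]*$/, ""); first = 1 }      # drop a trailing continuation backslash
        $1 == "group_name" { inlist = 0; next }
        $1 == "hostlist"   { inlist = 1; first = 2 }
        inlist {
            for (i = first; i <= NF; i++)
                if ($i != "NONE" && $i !~ /^@/) print $i
        }
    '
}

# Read host names on stdin and print those that RESOLVE_CMD cannot look up.
unresolvable_hosts() {
    while read -r host; do
        $RESOLVE_CMD "$host" </dev/null >/dev/null 2>&1 || printf '%s\n' "$host"
    done
}
```

A possible invocation, once qmaster is running again:
qconf -shgrp @allhosts | hostgroup_members | unresolvable_hosts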

Can you try the following:

1) shut down qmaster
2) add a temporary entry for the offending host "g2.biocl.weizmann.ac.il" to /etc/hosts on the qmaster machine
3) restart qmaster
4) remove the host from the @allhosts hostgroup (qconf -mhgrp @allhosts)
5) shut down qmaster
6) remove the temporary hostname entry from /etc/hosts
7) restart qmaster
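
The steps above could be scripted roughly as follows. Treat it as a
sketch under assumptions, not a tested recovery tool: the address
192.0.2.1 is only a placeholder from the documentation range, the marker
comment and the function names are invented here, the init script is
assumed to accept "stop" and "start", and it assumes that qmaster
resolves host names through /etc/hosts. Step 4 still opens an editor,
where you delete the host from the hostlist line.

```sh
#!/bin/sh
# All defaults below are illustrative assumptions; override them as needed.
BAD_HOST=${BAD_HOST:-g2.biocl.weizmann.ac.il}
HOSTS_FILE=${HOSTS_FILE:-/etc/hosts}
TMP_IP=${TMP_IP:-192.0.2.1}            # placeholder address, not a real host
MARK='# tmp-sge-recovery'              # tag used to find the line again later

# Step 2: append a tagged temporary entry for the unresolvable host.
add_tmp_host() {
    printf '%s\t%s\t%s\n' "$TMP_IP" "$BAD_HOST" "$MARK" >> "$HOSTS_FILE"
}

# Step 6: delete every line carrying the tag; the file is rewritten in
# place with cat so its owner and permissions stay unchanged.
remove_tmp_host() {
    sed "/$MARK\$/d" "$HOSTS_FILE" > "$HOSTS_FILE.new" &&
        cat "$HOSTS_FILE.new" > "$HOSTS_FILE" &&
        rm -f "$HOSTS_FILE.new"
}

# Steps 1 to 7 in order; stops at the first command that fails.
recover_all_q() {
    /etc/init.d/sgemaster stop &&      # 1) shut down qmaster
    add_tmp_host &&                    # 2) temporary /etc/hosts entry
    /etc/init.d/sgemaster start &&     # 3) restart qmaster
    qconf -mhgrp @allhosts &&          # 4) remove the host in the editor
    /etc/init.d/sgemaster stop &&      # 5) shut down qmaster
    remove_tmp_host &&                 # 6) remove the temporary entry
    /etc/init.d/sgemaster start        # 7) restart qmaster
}
```

Nothing runs until recover_all_q is called (as root on the qmaster
machine), so the two helper functions can be tried on a copy of
/etc/hosts first by setting HOSTS_FILE.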


Best Regards,

Christian


Michael Green wrote on 12/06/05 14:56:
> On 12/6/05, Olesen, Mark <Mark.Olesen at arvinmeritor.com> wrote:
> 
> 
>>The desperate and drastic solution was to remove all entries from the
>>spool/qmaster/jobs/.. directory.
> 
> 
> Eventually I did the same and it helped to eliminate one error, but
> now I'm stuck at another:
> <log>
> 12/06/2005 15:42:34|qmaster|gene1|E|missing configuration attribute "hostname"
> 12/06/2005 15:42:34|qmaster|gene1|E|cannot recreate queue all.q from
> disk because of unknown host g2.biocl.weizmann.ac.il
> 12/06/2005 15:42:34|qmaster|gene1|I|read job database with 1 entries
> in 0 seconds
> 12/06/2005 15:42:34|qmaster|gene1|I|qmaster hard descriptor limit is set to 8192
> 12/06/2005 15:42:34|qmaster|gene1|I|qmaster soft descriptor limit is set to 8192
> 12/06/2005 15:42:34|qmaster|gene1|I|qmaster will use max. 8172 file
> descriptors for communication
> 12/06/2005 15:42:34|qmaster|gene1|I|qmaster will accept max. 99
> dynamic event clients
> 12/06/2005 15:42:34|qmaster|gene1|I|starting up 6.0u6
> 12/06/2005 15:42:45|qmaster|gene1|I|controlled shutdown 6.0u6
> </log>
> 
> I proceeded as Christian suggested and I think the following is
> related to the error:
> <code>
>  1842  23256 1075267008 --> set_conf_string() {
>   1843  23256 1075267008 <-- set_conf_string() ../libs/sgeobj/config.c 294 }
>   1844  23256 1075267008 <-- read_host_work()
> ../libs/spool/classic/read_write_host.c 86 }
>   1845  23256 1075267008 --> sge_log() {
>   1846  23256 1075267008     ../libs/spool/classic/read_object.c 151
> missing configuration attribute "hostname"
>   1847  23256 1075267008 <-- sge_log() ../libs/uti/sge_log.c 516 }
>   1848  23256 1075267008 <-- read_object()
> ../libs/spool/classic/read_object.c 155 }
>   1849  23256 1075267008 <-- cull_read_in_host()
> ../libs/spool/classic/read_write_host.c 266 }
>   1850  23256 1075267008 <-- sge_read_exechost_list_from_disk()
> ../libs/spool/classic/read_list.c 241 }
> </code>
> 
> I went ahead and downloaded the source from CVS to have a peek at the
> relevant source files, but my programming skills are pretty lame and
> after 30 minutes of reading the code I couldn't figure it out and I
> gave up.
> 
> Now if I start the /etc/init.d/sgemaster script I get both qmaster and
> scheduler running but qstat -f  shows nothing.
> --
> Warm regards,
> Michael Green
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 

-- 
Christian Reissmann    Tel: +49 (0)941 3075 112  mailto:crei at sun.com
Software Engineer      Fax: +49 (0)941 3075 222  http://www.sun.com/gridengine
Sun Microsystems GmbH, Dr.-Leo-Ritter-Str. 7,
D-93049 Regensburg,    Tel: +49 (0)941 3075 0

