[GE users] grid engine problem

Harald Pollinger Harald.Pollinger at Sun.COM
Tue Nov 13 09:20:10 GMT 2007



Sandeep, Patel(IE10) wrote:
> Hi,
>     I checked the messages file and found entries like these:
>                                       
> 11/13/2007 12:38:16|execd|ie10dtdc3zl1s|E|commlib error: endpoint is not
> unique error (endpoint "ie10dtdc3zl1s.global.ds.honeywell.com/execd/1"
> is already connected)

Is more than one "sge_execd" instance running on that host?
If so, please kill all of them and start only one again.
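For a quick check on a Linux execution host, here is a minimal Python sketch (not part of Grid Engine) that lists the PIDs whose /proc/<pid>/comm is exactly "sge_execd". It assumes a Linux-style /proc filesystem; the function name find_processes is hypothetical. More than one PID would be consistent with the "endpoint is not unique" error quoted above.

```python
from pathlib import Path
from typing import List


def find_processes(name: str, proc_root: str = "/proc") -> List[int]:
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name` (Linux only)."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # process exited meanwhile, or comm is unreadable
        if comm == name:
            pids.append(int(entry.name))
    return sorted(pids)
```

On the execution host, find_processes("sge_execd") returning two or more PIDs means duplicate daemons are running; kill them all and start a single execd again.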


> 11/13/2007 12:38:16|execd|ie10dtdc3zl1s|E|getting configuration: unable
> to contact qmaster using port 536 on host
> "gridserver.sunnonegrid-bangalore.com"

Is there a firewall running somewhere on or between the execution host
and the master host?
Is it possible to connect from the execution host to the qmaster port
(536) using telnet?
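Where telnet is not installed, a plain TCP connect tests the same thing. This is a minimal Python sketch, not a Grid Engine tool; can_connect is a hypothetical name, and the host and port 536 are simply taken from the log line quoted above. A True result only shows that something accepts TCP connections on that port, not that it is a healthy qmaster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt a TCP connection, roughly what `telnet host port` checks."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and name-resolution failures (socket.gaierror).
        return False
```

Running can_connect("gridserver.sunnonegrid-bangalore.com", 536) on the execution host and getting False would point to a firewall, a wrong port, or a qmaster that is not listening.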


Regards,
Harald


> 11/13/2007 12:38:19|execd|ie10dtdc3zl1s|E|can't get configuration from
> qmaster -- backgrounding
> 
> How can I solve this problem?
> 
> Thanks 
> sandeep
> 
> 
> -----Original Message-----
> From: Ravichandra.Nallan at Sun.COM [mailto:Ravichandra.Nallan at Sun.COM] 
> Sent: Tuesday, November 13, 2007 12:25 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] grid engine problem
> 
> Hi Sandeep,
> From the qstat output it is evident (queue states "au") that the execd
> on host ie10dtdc3z11s.<something....> is not up. Check why the execd is
> not coming up (see $SGE_ROOT/$SGE_CELL/spool/<hostname>/messages).
> This is why jobs are not being scheduled to this host.
> 
> (For information on queue states, see the qstat(1) man page. You can
> also see in the qstat output that load_avg/arch is -NA-.)
> 
> Hope this helps.
> regards,
> ~Ravi
> 
> Sandeep, Patel(IE10) wrote:
>> Hi
>>
>> 1. My *master* host runs RHEL.
>>
>> 2. I have two *execution* hosts:
>>
>> A. one runs *Windows*
>>
>> B. the other runs *RHEL*
>>
>> 3. When I submit the job *simple.sh* four times and run *qstat -f*,
>> the jobs always go to the RHEL execution host: used/total is *2/2*
>> for RHEL but *0/2* for Windows. The jobs stay *pending* for a while
>> and are *later taken by* the RHEL execution host.
>>
>> 4. So the jobs are not being distributed among the hosts.
>>
>> 5. How can I solve this?
>>
>> 6. I have *attached* some *screenshots* related to this. Could you
>> please take a look?
>>
>> Thanks
>>
>> sandeep
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>   
> 
> 
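Ravi's suggestion to check $SGE_ROOT/$SGE_CELL/spool/<hostname>/messages can be scripted. This is a minimal Python sketch that assumes each entry is one line of the form time|daemon|host|level|message, as in the lines Sandeep quoted (the mail client wrapped them; the file itself is assumed to have one entry per line). The field names, the "E" level filter, and the helpers parse_messages and errors are illustrative assumptions, not a documented Grid Engine format or API.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    time: datetime
    daemon: str
    host: str
    level: str
    message: str


def parse_messages(lines: Iterable[str]) -> List[LogEntry]:
    """Parse lines shaped like 'MM/DD/YYYY HH:MM:SS|daemon|host|level|message'."""
    entries = []
    for line in lines:
        # maxsplit=4 keeps any '|' inside the message text intact
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # not in the expected five-field shape
        stamp, daemon, host, level, message = parts
        try:
            time = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # first field is not a timestamp
        entries.append(LogEntry(time, daemon, host, level, message))
    return entries


def errors(entries: Iterable[LogEntry]) -> List[LogEntry]:
    """Keep only the entries whose level field is 'E'."""
    return [e for e in entries if e.level == "E"]
```

Usage would look like opening the messages file for the affected host and printing e.time and e.message for every entry in errors(parse_messages(f)).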


-- 
Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         N1 Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
http://www.sun.com/gridware
mailto:harald.pollinger at sun.com
Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering

