[GE users] grid engine problem

Harald Pollinger Harald.Pollinger at Sun.COM
Tue Nov 13 11:58:25 GMT 2007


Can somebody who is more experienced with GDI problems help?

Thanks!
Harald

Sandeep, Patel(IE10) wrote:
> Hi 
> Both the Windows and the Linux hosts are showing SGE 6.1u2.
> Then what may be the issue?
> Thanks 
> sandeep
> 
> -----Original Message-----
> From: Harald.Pollinger at Sun.COM [mailto:Harald.Pollinger at Sun.COM] 
> Sent: Tuesday, November 13, 2007 4:58 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] grid engine problem
> 
> Sandeep, Patel(IE10) wrote:
>> Hi
>>   I'm getting this output:
>>
>> $ telnet gridserver.sunnonegrid-bangalore.com 536
>> Trying 199.63.61.100...
>> Connected to gridserver.sunnonegrid-bangalore.com.
>> Escape character is '^]'.
>> Connection closed by foreign host.
>> $
>>
>> Then what is the issue?
>>
>> And in the /tmp folder on Windows I'm getting messages like:
>>
>> 11/13/2007 15:08:01|execd|ie10dtdc3zl1s|E|can't unpack gdi request
>> 11/13/2007 15:08:01|execd|ie10dtdc3zl1s|E|error unpacking gdi request: bad argument
>> 11/13/2007 15:08:01|execd|ie10dtdc3zl1s|E|getting configuration: failed receiving gdi request
>> 11/13/2007 15:08:05|execd|ie10dtdc3zl1s|E|can't unpack gdi request
>>
>> What are these? How can I resolve them?
> 
> I think this means that your Windows execution daemon and your master 
> daemon are of different versions.
> 
> Start on the Windows host:
> # $SGE_ROOT/bin/win32-x86/sge_execd -help
> 
> and on the QMaster host:
> # $SGE_ROOT/bin/lx24-x86/sge_qmaster -help
> (I'm not sure about the lx24; it could also be lx26, and the x86 could be
> amd64 or ia64, depending on your architecture and RHEL version)
> 
> to get their version numbers.
> 
> Regards,
> Harald
> 
> 
> 
>>
>>
>> Thanks
>> sandeep
>>
>> -----Original Message-----
>> From: Harald.Pollinger at Sun.COM [mailto:Harald.Pollinger at Sun.COM] 
>> Sent: Tuesday, November 13, 2007 4:20 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] grid engine problem
>>
>> Sandeep, Patel(IE10) wrote:
>>> Hi
>>>      Actually, I have one Windows system, and inside it I have installed
>>> VMware. In VMware I'm running two RHEL virtual machines. One of the
>>> virtual RHEL machines is my master host; the other is an execution host.
>>> The host Windows OS is also an execution host. Is that the problem?
>> This should not be a problem. Maybe you will have to change some
>> settings.
>>
>>
>>> Using PuTTY I'm able to connect from the Windows execution host to the
>>> RHEL master host through SSH, but telnet shows a network error.
>>> How can I fix this?
>> I think you got me wrong. Try to connect to the qmaster itself, not to
>> the telnetd of the qmaster host. Use
>> # telnet gridserver.sunnonegrid-bangalore.com 536
>>
>> It should print
>>
>> Trying [IP address of gridserver.sunnonegrid-bangalore.com]...
>> Connected to gridserver.sunnonegrid-bangalore.com.
>> Escape character is '^]'.
>>
>>
>> Regards,
>> Harald
>>
>>
>>> Thanks 
>>> Sandeep
>>>
>>> -----Original Message-----
>>> From: Harald.Pollinger at Sun.COM [mailto:Harald.Pollinger at Sun.COM] 
>>> Sent: Tuesday, November 13, 2007 2:50 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] grid engine problem
>>>
>>> Sandeep, Patel(IE10) wrote:
>>>> Hi
>>>>     I checked the messages file and found something like this:
>>>>
>>>> 11/13/2007 12:38:16|execd|ie10dtdc3zl1s|E|commlib error: endpoint is not unique error (endpoint "ie10dtdc3zl1s.global.ds.honeywell.com/execd/1" is already connected)
>>> Is more than one "sge_execd" instance running on that host?
>>> If so, please kill them all and start only one of them again.
>>>
>>>
>>>> 11/13/2007 12:38:16|execd|ie10dtdc3zl1s|E|getting configuration: unable to contact qmaster using port 536 on host "gridserver.sunnonegrid-bangalore.com"
>>> Is there a firewall running somewhere on or between the execution host
>>> and the master host?
>>> Is it possible to connect from the execution host to the qmaster using
>>> telnet?
>>>
>>>
>>> Regards,
>>> Harald
>>>
>>>
>>>> 11/13/2007 12:38:19|execd|ie10dtdc3zl1s|E|can't get configuration from qmaster -- backgrounding
>>>>
>>>> How can I solve this problem?
>>>>
>>>> Thanks 
>>>> sandeep
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Ravichandra.Nallan at Sun.COM [mailto:Ravichandra.Nallan at Sun.COM]
>>>> Sent: Tuesday, November 13, 2007 12:25 PM
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] grid engine problem
>>>>
>>>> Hi Sandeep,
>>>>  From the qstat output it is evident (states "au") that the execd on host
>>>> ie10dtdc3zl1s.<something....> is not up. Check whether there are any
>>>> problems preventing the execd from coming up (check
>>>> $SGE_ROOT/$SGE_CELL/spool/<hostname>/messages).
>>>> This is the reason why the jobs are not scheduled to this host.
>>>>
>>>> (For info on queue states, check the qstat(1) man page; you can also see
>>>> in qstat that the load_avg/arch is -NA- !!)
>>>>
>>>> Hope this helps.
>>>> regards,
>>>> ~Ravi
>>>>
>>>> Sandeep, Patel(IE10) wrote:
>>>>> Hi
>>>>>
>>>>> 1. I have my *master* host on RHEL.
>>>>>
>>>>> 2. I have two *execution* hosts:
>>>>>
>>>>> A. one is on *Windows*
>>>>>
>>>>> B. the other one is on *RHEL*
>>>>>
>>>>> 3. When I submit the job *simple.sh* (4 times) and run *qstat -f*,
>>>>> the jobs always go to the RHEL execution host, because the
>>>>> used/total slots are *2/2* for RHEL but *0/2* for Windows.
>>>>>
>>>>> The jobs stay *pending* for some time and are *later taken by* the
>>>>> RHEL execution host.
>>>>>
>>>>> 4. This means the jobs are not distributed among the hosts *!!!!*
>>>>>
>>>>> 5. How can I solve this?
>>>>>
>>>>> 6. I have *attached* some *screenshots* related to this. Can you
>>>>> please check them?
>>>>>
>>>>> Thanks
>>>>>
>>>>> sandeep
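
On a Windows execution host without a telnet client, the qmaster port test discussed in this thread can be approximated with a short script. This is a minimal sketch and not part of Grid Engine: it assumes Python 3 is installed on the host, and `can_connect` is a hypothetical helper name. It only checks that a TCP connection to the port can be opened (the "Connected to ..." stage of the telnet test); it does not verify that sge_qmaster is the process listening, nor that execd and qmaster use compatible GDI versions.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper, roughly equivalent to the "Connected to ..." stage
    of `telnet host port`; it does not check which service is listening.
    """
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, unreachable hosts,
        # DNS lookup failures (socket.gaierror) and timeouts.
        return False
```

For example, if `can_connect("gridserver.sunnonegrid-bangalore.com", 536)` returns False on the execution host, that would point to name resolution, routing, a firewall, or a qmaster that is not listening, rather than to a version mismatch. In this thread the telnet connection did succeed, so a True result would be consistent with what Sandeep saw.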


-- 
Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         N1 Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
http://www.sun.com/gridware
mailto:harald.pollinger at sun.com
Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



