[GE users] Reg: windows domain users access to grid

Harald Pollinger Harald.Pollinger at Sun.COM
Fri Nov 2 11:18:57 GMT 2007



Hi Manju,

thank you very much for this information!

So you just applied the hotfixes as described on 
http://www.duh.org/interix/hotfixes.php?

Btw:
On my Windows 2003 Server host I have patch level:
Interix 3.5 SP-8.0.1969.54

on my Windows XP SP 2 host it is:
Interix 3.5 SP-8.0.1969.51

AFAIK these are the highest available levels, but it's easy to miss some 
hotfixes on the confusing Microsoft hotfix pages.
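
To compare hosts quickly, the patch level can be pulled out of the "uname -a" output with a few lines of shell. This is only a sketch: the function names are made up, it looks only at the last component of the SP string (assuming the "8.0.1969" part is the same everywhere), and the default minimum of 50 is simply the level reported as working in the mail quoted below, not an official requirement.

```shell
#!/bin/sh
# Hypothetical helper: print the last numeric component of an Interix
# service-pack string such as "SP-8.0.1969.54" found in a "uname -a" line.
interix_patch_build() {
    # $1: a full "uname -a" line
    printf '%s\n' "$1" |
        sed -n 's/.*SP-[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed if the build number is at least the minimum.
# $1: "uname -a" line, $2: minimum build (assumed default: 50)
interix_patch_ok() {
    build=$(interix_patch_build "$1")
    [ -n "$build" ] && [ "$build" -ge "${2:-50}" ]
}

# Usage on an Interix host:
#   interix_patch_ok "$(uname -a)" && echo "patch level looks sufficient"
```

If the SP string cannot be found at all, interix_patch_ok fails, which seems the safer default.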

Regards,
Harald


manju a wrote:
> Hi Harald,
>
> The problem is finally fixed, thanks for your replies. It turned out
> that the hotfixes I had applied were not actually updating Interix,
> which is why this unusual problem kept occurring. I reapplied the
> patches and it works fine now. One more thing I would like to mention
> concerns the Interix subsystem patch level.
>
> If you run uname -a in the Interix shell:
>
> Interix execd_hostname  3.5 SP-8.0.1969.40 (or below) x86
> Intel_x86_Family15_Model2_Stepping5 (it will not work properly)
>
> The patch level on Interix should be 3.5 SP-8.0.1969.50 (or above) x86
> Intel_x86_Family15_Model2_Stepping5 (it will work fine as an execution
> host) if the qmaster is 6.1u2.
> 
> 
> thanks
> Manjunath A.
> 
> 
> 
> 
> 
> 
> 
> On Nov 1, 2007 9:52 PM, manju a <manju.kudu at gmail.com> wrote:
>> Hi Harald,
>>
>> Do you have any more input on this? Please advise.
>>
>> thanks
>> Manju
>>
>>
>> On Nov 1, 2007 6:36 PM, manju a <manju.kudu at gmail.com> wrote:
>>> Hi Harald,
>>>
>>> Yes, I still get the same messages: "load sensor died through
>>> signal = 11"
>>>
>>> thanks
>>> manjunath A.
>>>
>>>
>>>
>>>
>>> On Nov 1, 2007 4:07 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
>>>> Hi Manju,
>>>>
>>>> with the static_load.sh, do you still get the "load sensor died through
>>>> signal = 11" messages in the execd messages file?
>>>>
>>>>
>>>> Regards,
>>>> Harald
>>>>
>>>> manju a wrote:
>>>>> Hi Harald,
>>>>>
>>>>> I changed the load sensor to static_load using the command "qconf
>>>>> -mconf win_host_name" and restarted the execution host. I still cannot
>>>>> see the sge_execd process in the process list. Another thing I
>>>>> observed is that qloadsensor.exe is running on the working machine
>>>>> (where sge_execd is running), but I could not find it on the
>>>>> non-working machine. I reinstalled SFU and applied the patch, and
>>>>> yes, of course I disabled DEP. No luck yet. I am facing this
>>>>> problem on about 5 to 6 machines, all of which have two drives with
>>>>> the user profiles pointing to the D drive. I checked the OS patch
>>>>> level and it looks the same on all the machines.
>>>>>
>>>>> thanks
>>>>> Manjunath
>>>>>
>>>>> On Oct 31, 2007 10:40 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
>>>>>> Manju,
>>>>>>
>>>>>> signal 11 means "segmentation fault", i.e. the load sensor tries to
>>>>>> access memory that is not allocated to it. As the load sensor works on
>>>>>> all other systems, I still assume that something is wrong with your
>>>>>> Windows version...
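
As a side note on the signal number: when a script such as the load sensor is run by hand, common shells (bash and dash, for example) report a child that was killed by a signal as exit status 128 plus the signal number, so a segmentation fault typically shows up as status 139. Whether the Interix shell follows the same convention is an assumption here. A minimal sketch, with a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: map a shell exit status to the signal number that
# presumably killed the process. Prints nothing for statuses up to 128.
# Note: a program can also call exit with a value above 128 on its own,
# so the result is a hint, not proof.
signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo $((status - 128))
    fi
}

# Example: decode status 139 and ask the shell for the signal name.
sig=$(signal_from_status 139)
[ -n "$sig" ] && echo "killed by signal $sig ($(kill -l "$sig"))"
```

After running the load sensor script manually, the status would be checked with something like: signal_from_status $?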
>>>>>>
>>>>>> As a workaround, you could configure a different load sensor, at first
>>>>>> only a dummy load sensor. "qconf -sconf <windows_host_name>" shows you
>>>>>> what load sensor is currently configured, it should be
>>>>>> "$SGE_ROOT/util/resources/loadsensors/interix-loadsensor.sh".
>>>>>>
>>>>>> Just configure a different load sensor script, e.g. one that only
>>>>>> returns static data like the static_load.sh I've attached to this E-Mail.
>>>>>>
>>>>>> If the execd works with this load sensor, we should dig deeper into the
>>>>>> load sensor problem.
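
The static_load.sh attachment is not preserved in this archive. For readers who want to try the same workaround, here is a rough sketch of what a static load sensor can look like. It follows the Grid Engine load sensor protocol (the execd writes a line to the sensor's stdin for each load interval and expects a block framed by "begin" and "end"; the line "quit" ends the sensor). The function names and the reported values are placeholders, and this is not the script Harald attached.

```shell
#!/bin/sh
# Sketch of a static load sensor. The reported values are fixed
# placeholders chosen for illustration only.

# Print one load report block in "host:name:value" form.
report_static_load() {
    host=$1
    echo "begin"
    echo "$host:load_avg:0.10"
    echo "$host:num_proc:1"
    echo "end"
}

# Answer every line received on stdin with a report; stop on "quit".
load_sensor_loop() {
    host=$1
    while read -r line; do
        [ "$line" = "quit" ] && break
        report_static_load "$host"
    done
}

# When saved as a load sensor script, the last line would be something like:
#   load_sensor_loop "$(hostname)"
```

The script path would then go into the load_sensor entry of the host configuration that "qconf -mconf <windows_host_name>" opens in an editor. The host name printed must be the name Grid Engine knows the host by, which may differ from what hostname returns.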
>>>>>>
>>>>>> Regards,
>>>>>> Harald
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> manju a wrote:
>>>>>>> Hi Harald,
>>>>>>>
>>>>>>> Please find below the logs from the Windows execution host:
>>>>>>>
>>>>>>> 10/31/2007 06:23:29|execd|execd_host|I|starting up SGE 6.1u2 (win32-x86)
>>>>>>> 10/31/2007 06:23:31|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:23:33|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:23:35|execd|execd_host|I|controlled shutdown 6.1u2
>>>>>>> 10/31/2007 06:24:04|execd|execd_host|I|starting up SGE 6.1u2 (win32-x86)
>>>>>>> 10/31/2007 06:24:05|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:24:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:24:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:25:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:26:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:26:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:27:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:28:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:28:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:29:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:30:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:30:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:31:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:32:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:32:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:33:27|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:34:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:34:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:35:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:36:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:36:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:37:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:38:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:38:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:39:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:40:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:40:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:41:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:42:06|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:42:46|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:43:26|execd|execd_host|E|load sensor died through signal = 11
>>>>>>> 10/31/2007 06:43:29|execd|execd_host|I|controlled shutdown 6.1u2
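
A long messages file like the one above is easier to read as a summary. The sketch below counts the "load sensor died" errors per signal number; it assumes the pipe-separated layout visible in the log (timestamp, daemon, host, level, message), and the function name and the spool path in the usage comment are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read an execd messages file on stdin and print how
# often the load sensor died, grouped by signal number.
count_load_sensor_deaths() {
    awk -F'|' '
        # Field 4 is the level (E = error), field 5 is the message text.
        $4 == "E" && $5 ~ /^load sensor died through signal = / {
            n = split($5, words, " ")
            seen[words[n]]++          # last word is the signal number
        }
        END {
            for (sig in seen) printf "signal %s: %d\n", sig, seen[sig]
        }
    '
}

# Usage (adjust the path to your cell and host; it is only an example):
#   count_load_sensor_deaths < "$SGE_ROOT/default/spool/<hostname>/messages"
```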
>>>>>>>
>>>>>>> If I start $SGE_ROOT/bin/win32-x86/sge_execd directly after enabling
>>>>>>> dl 5 mode, it stays stuck at "starting sge_execd". During that time
>>>>>>> I can see the sge_execd process in the process list, but it never
>>>>>>> finishes starting.
>>>>>>>
>>>>>>> Please let me know if these logs tell you anything.
>>>>>>>
>>>>>>> thanks
>>>>>>> Manjunath A.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 31, 2007 7:00 PM, manju a <manju.kudu at gmail.com> wrote:
>>>>>>>> Yes, I did that as well, but no luck. If I run
>>>>>>>> /etc/init.d/sgeexecd start, the process appears in the process list
>>>>>>>> and disappears again within a second.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 31, 2007 6:52 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
>>>>>>>>> manju a wrote:
>>>>>>>>>> Hi Harald,
>>>>>>>>>>
>>>>>>>>>> I had already applied these hotfixes, but not in the order suggested
>>>>>>>>>> on the page. Does the order make any difference?
>>>>>>>>> The page says: Yes, it makes a difference, apply them only in the
>>>>>>>>> suggested order to make sure all of them work.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Harald
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> thanks
>>>>>>>>>> Manju
>>>>>>>>>>
>>>>>>>>>> On Oct 31, 2007 4:56 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
>>>>>>>>>>> manju a wrote:
>>>>>>>>>>>> Hi Harald,
>>>>>>>>>>>>
>>>>>>>>>>>> I can see the messages below on the master, in
>>>>>>>>>>>> $SGE_ROOT/farm/spool/qmaster/messages
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> 10/30/2007 23:00:07|qmaster|testqmaster|E|commlib error: got read
>>>>>>>>>>>> error (closing "execd_host.abc.com/qstat/85")
>>>>>>>>>>>> 10/30/2007 23:28:44|qmaster|testqmaster|E|commlib error: got read
>>>>>>>>>>>> error (closing "execd_host.abc.com/qconf/121")
>>>>>>>>>>>> 10/30/2007 23:32:03|qmaster|testqmaster|E|commlib error: got read
>>>>>>>>>>>> error (closing "execd_host.abc.com/qconf/125")
>>>>>>>>>>>> 10/30/2007 23:32:44|qmaster|testqmaster|E|commlib error: got read
>>>>>>>>>>>> error (closing "execd_host.abc.com/execd/1")
>>>>>>>>>>>> 10/30/2007 23:33:28|qmaster|testqmaster|E|commlib error: got read
>>>>>>>>>>>> error (closing "execd_host.abc.com/execd/1")
>>>>>>>>>>>>
>>>>>>>>>>>> What does this commlib error mean?
>>>>>>>>>>> It means that the qmaster is about to read, or is already reading,
>>>>>>>>>>> some data from the execd, but then the execd suddenly 'vanishes'.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> testqmaster ----> qmaster server name
>>>>>>>>>>>>
>>>>>>>>>>>> execd_host-----> execution host name (the machine where the
>>>>>>>>>>>> sge_execd process is not running)
>>>>>>>>>>>>
>>>>>>>>>>>> Please let me know if these logs tell you anything. I am still
>>>>>>>>>>>> trying to get the sge_execd process to run, but no luck so far.
>>>>>>>>>>>> Please help me with this.
>>>>>>>>>>> I guess there is simply something wrong with your Windows/SFU host. You
>>>>>>>>>>> could try to apply all Hotfixes suggested on this page:
>>>>>>>>>>> http://www.duh.org/interix/hotfixes.php
>>>>>>>>>>>
>>>>>>>>>>> This is just a collection of all official Microsoft Hotfixes for SFU.
>>>>>>>>>>> It's very difficult to find the Hotfixes and their dependencies on the
>>>>>>>>>>> Microsoft pages, so just apply them in the order suggested here for your
>>>>>>>>>>> Windows version.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Harald
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> thanks
>>>>>>>>>>>> Manjunath A,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 30, 2007 4:45 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
>>>>>>>>>>>>> manju a wrote:
>>>>>>>>>>>>>> Hi Harald,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like to give some more information.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On all the working machines, the Windows profiles point to
>>>>>>>>>>>>>> "C:\Documents and Settings". Do you think this matters?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the non-working machines (sge_execd not running), the Windows
>>>>>>>>>>>>>> profiles point to "D:\Users".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is the only difference I found between working and non-working
>>>>>>>>>>>>>> machines.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> No errors appeared while installing the machine as an execution host.
>>>>>>>>>>>>>> I carried out the installation the same way as on the working machines.
>>>>>>>>>>>>> This might matter for job execution, but it doesn't matter for the Execd
>>>>>>>>>>>>> startup.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I found this:
>>>>>>>>>>>>> The "TERMINATE dispatcher because j == 1" line gets printed when the
>>>>>>>>>>>>> Execd either receives a SIGTERM or a SIGINT signal or it receives a
>>>>>>>>>>>>> shutdown request from the QMaster.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As I can't see any request from the QMaster in the trace file snippet
>>>>>>>>>>>>> you sent me, I assume the Execd received a SIGTERM or a SIGINT signal.
>>>>>>>>>>>>> I can't tell you where this signal came from.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Harald
>>>>
>>>> --
>>>> Sun Microsystems GmbH         Harald Pollinger
>>>> Dr.-Leo-Ritter-Str. 7         N1 Grid Engine Engineering
>>>> D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
>>>> Germany                       Fax: +49 (0)941 3075-222  (x60222)
>>>> http://www.sun.com/gridware
>>>> mailto:harald.pollinger at sun.com
>>>> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
>>>> D-85551 Kirchheim-Heimstetten
>>>> Amtsgericht Muenchen: HRB 161028
>>>> Geschaeftsfuehrer: Wolfgang Engels, Dr. Roland Boemer
>>>> Vorsitzender des Aufsichtsrates: Martin Haering
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>
>>>>





