[GE users] Reg: windows domain users access to grid

manju a manju.kudu at gmail.com
Wed Nov 14 12:55:01 GMT 2007


Hi Harald,

I have one question:

How can we configure e-mail notifications in the grid, so that once a job
has executed on the Windows exec hosts, a notification (with the output and
errors) goes to the user who submitted the job?

thanks
manjunath A.




On Nov 2, 2007 6:44 PM, manju a <manju.kudu at gmail.com> wrote:
> Hey Harald,
>
> I missed some other patches too; please find them below:
>
> 913030
> 886655
> 887531
>
> thanks
> manjunath A.
>
>
>
>
> On Nov 2, 2007 6:37 PM, manju a <manju.kudu at gmail.com> wrote:
> > Hi Harald,
> >
> > Yes, I think we got those hotfixes from that website in the past.
> > Please find below the details of the hotfixes I applied:
> >
> > 1) SFU35-KB939778-X86-ENU
> > 2) 934322-2003
> > 3) 932143
> >
> > Earlier my patch level was Interix 3.5 SP-8.0.1969.40; after applying
> > these it became Interix 3.5 SP-8.0.1969.58.
> >
> > thanks
> > Manjunath A.
> >
> >
> >
> >
> >
> >
> > On Nov 2, 2007 4:48 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
> > > Hi Manju,
> > >
> > > thank you very much for this information!
> > >
> > > So you just applied the hotfixes as described on
> > > http://www.duh.org/interix/hotfixes.php?
> > >
> > > Btw:
> > > On my Windows 2003 Server host I have patch level:
> > > Interix 3.5 SP-8.0.1969.54
> > >
> > > on my Windows XP SP 2 host it is:
> > > Interix 3.5 SP-8.0.1969.51
> > >
> > > AFAIK these are the highest available levels, but it's easy to miss some
> > > hotfixes on the confusing Microsoft hotfix pages.
> > >
> > >
> > > Regards,
> > > Harald
> > >
> > >
> > > manju a wrote:
> > > > Hi Harald,
> > > >
> > > > > The problem is finally fixed, thanks for your replies. I found out
> > > > > that the hotfixes I had applied were not actually updating Interix,
> > > > > which is why this unusual problem kept occurring. I reapplied the
> > > > > patches and it works fine now. One more thing I would like to
> > > > > mention, about the Interix subsystem patch level:
> > > > >
> > > > > If you run uname -a in the Interix shell and it shows
> > > > >
> > > > > Interix execd_hostname  3.5 SP-8.0.1969.40 (or below) x86
> > > > > Intel_x86_Family15_Model2_Stepping5
> > > > >
> > > > > then it will not work properly. The patch level on Interix should be
> > > > >
> > > > > 3.5 SP-8.0.1969.50 (or above) x86 Intel_x86_Family15_Model2_Stepping5
> > > > >
> > > > > for the host to work fine as an execution host if the qmaster is 6.1u2.
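
The patch-level rule of thumb above could be checked with a small shell sketch like the following. It assumes the uname -a line always carries a token of the form SP-8.0.1969.NN; the function names are hypothetical, and the threshold of 50 is taken only from the observation in this thread, not from any official documentation.

```shell
# Minimum Interix build reported in this thread as working with a 6.1u2 qmaster.
# (An observation from the thread, not an official requirement.)
MIN_BUILD=50

# Print the last numeric field of the "SP-a.b.c.NN" token found in a
# `uname -a` line, e.g. "40" for "... 3.5 SP-8.0.1969.40 x86 ...".
# Prints nothing if no such token is present.
interix_build() {
    printf '%s\n' "$1" |
        sed -n 's/.*SP-[0-9]*\.[0-9]*\.[0-9]*\.\([0-9][0-9]*\).*/\1/p'
}

# Succeed (exit status 0) only when a build number was found and it is
# at least MIN_BUILD.
patch_level_ok() {
    build=$(interix_build "$1")
    [ -n "$build" ] && [ "$build" -ge "$MIN_BUILD" ]
}

# Possible use on the execution host:
#   patch_level_ok "$(uname -a)" || echo "Interix patch level looks too old"
```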
> > > >
> > > >
> > > > thanks
> > > > Manjunath A.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Nov 1, 2007 9:52 PM, manju a <manju.kudu at gmail.com> wrote:
> > > >> Hi Harald,
> > > >>
> > > >> Do you have any more input on this? Please advise.
> > > >>
> > > >> thanks
> > > >> Manju
> > > >>
> > > >>
> > > >> On Nov 1, 2007 6:36 PM, manju a <manju.kudu at gmail.com> wrote:
> > > >>> Hi Harald,
> > > >>>
> > > >>> Yes, I still get the same messages: "load sensor died through
> > > >>>  signal = 11"
> > > >>>
> > > >>> thanks
> > > >>> manjunath A.
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Nov 1, 2007 4:07 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
> > > >>>> Hi Manju,
> > > >>>>
> > > >>>> with the static_load.sh, do you still get the "load sensor died through
> > > >>>> signal = 11" messages in the execd messages file?
> > > >>>>
> > > >>>>
> > > >>>> Regards,
> > > >>>> Harald
> > > >>>>
> > > >>>> manju a wrote:
> > > >>>>> Hi Harald,
> > > >>>>>
> > > >>>>> I changed the load sensor to static_load using the command qconf
> > > >>>>> -mconf win_host_name and restarted the execution host. I still cannot
> > > >>>>> see the sge_execd process in the process list. Another thing I
> > > >>>>> observed is that qloadersensor.exe is running on the working
> > > >>>>> machine (where sge_execd is running), but I couldn't find it on
> > > >>>>> the non-working machine. I reinstalled SFU and applied the patch,
> > > >>>>> and yes, of course I disabled DEP. No luck yet. I am facing this
> > > >>>>> type of problem on nearly 5 to 6 machines, each of which has two
> > > >>>>> drives with the user profile pointing to the D drive. I checked the
> > > >>>>> OS patch level and it looks the same on all the machines.
> > > >>>>>
> > > >>>>> thanks
> > > >>>>> Manjunath
> > > >>>>>
> > > >>>>> On Oct 31, 2007 10:40 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
> > > >>>>>> Manju,
> > > >>>>>>
> > > >>>>>> signal 11 means "segmentation fault", i.e. the load sensor tries to
> > > >>>>>> access memory that is not allocated. As the load sensor works on all other
> > > >>>>>> systems, I still assume that something is wrong with your Windows version...
> > > >>>>>>
> > > >>>>>> As a workaround, you could configure a different load sensor, at first
> > > >>>>>> only a dummy load sensor. "qconf -sconf <windows_host_name>" shows you
> > > >>>>>> what load sensor is currently configured, it should be
> > > >>>>>> "$SGE_ROOT/util/resources/loadsensors/interix-loadsensor.sh".
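
To pull just the configured load sensor out of that output, a sketch like the one below might do. It assumes the qconf -sconf output consists of "name value" lines and that the load sensor path contains no spaces; the function name is hypothetical.

```shell
# Read `qconf -sconf <windows_host_name>` output on stdin and print the
# value of the load_sensor line, if there is one.
# Assumes whitespace-separated "name value" lines and a path without spaces.
current_load_sensor() {
    awk '$1 == "load_sensor" { print $2 }'
}

# Possible use (the host name is a placeholder):
#   qconf -sconf <windows_host_name> | current_load_sensor
```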
> > > >>>>>>
> > > >>>>>> Just configure a different load sensor script, e.g. one that only
> > > >>>>>> returns static data like the static_load.sh I've attached to this E-Mail.
> > > >>>>>>
> > > >>>>>> If the execd works with this load sensor, we should dig deeper into the
> > > >>>>>> load sensor problem.
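
The attached static_load.sh is not reproduced in this thread. As a rough idea of what a static load sensor can look like, here is a sketch that follows the usual Grid Engine load sensor protocol: for each line read from stdin it prints a begin/end block of host:name:value lines, and it stops on "quit". The function name, the SENSOR_HOST override, and the reported load_avg value of 0.10 are placeholders, not taken from the attachment.

```shell
# Hypothetical static load sensor: answers every request with fixed values.
static_load_sensor() {
    # SENSOR_HOST is an illustrative override; by default use the local host name.
    host=${SENSOR_HOST:-$(hostname)}

    # Each line the execd writes to stdin is a request for one load report.
    while read -r request; do
        # The execd asks the sensor to stop by sending "quit".
        [ "$request" = "quit" ] && return 0

        echo "begin"
        echo "$host:load_avg:0.10"   # placeholder value, never changes
        echo "end"
    done

    # stdin was closed without a "quit" request.
    return 1
}
```

To use this as a script, the file would end with a call to static_load_sensor, and its path would be set as the load_sensor value via qconf -mconf. The host name printed should be the one under which Grid Engine knows the execution host.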
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>> Harald
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> manju a wrote:
> > > >>>>>>> Hi Harald,
> > > >>>>>>>
> > > >>>>>>> Please find below the logs from the Windows execution host:
> > > >>>>>>>
> > > >>>>>>> 10/31/2007 06:23:29|execd|execd_host|I|starting up SGE 6.1u2 (win32-x86)
> > > >>>>>>> 10/31/2007 06:23:31|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:23:33|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:23:35|execd|execd_host|I|controlled shutdown 6.1u2
> > > >>>>>>> 10/31/2007 06:24:04|execd|execd_host|I|starting up SGE 6.1u2 (win32-x86)
> > > >>>>>>> 10/31/2007 06:24:05|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:24:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:24:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:25:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:26:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:26:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:27:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:28:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:28:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:29:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:30:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:30:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:31:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:32:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:32:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:33:27|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:34:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:34:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:35:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:36:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:36:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:37:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:38:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:38:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:39:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:40:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:40:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:41:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:42:06|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:42:46|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:43:26|execd|execd_host|E|load sensor died through signal = 11
> > > >>>>>>> 10/31/2007 06:43:29|execd|execd_host|I|controlled shutdown 6.1u2
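
To see quickly whether a messages file like the one above still contains these errors, a sketch along these lines could be used. It assumes the pipe-separated layout visible in the log (timestamp|daemon|host|level|message); the function name is hypothetical and the path to the messages file has to be supplied by the caller.

```shell
# Count error-level (E) lines that report a load sensor killed by a signal.
#   $1 = path to the execd messages file
#   $2 = signal number to look for (defaults to 11)
count_sensor_deaths() {
    awk -F'|' -v sig="${2:-11}" '
        # Field 4 is the level, field 5 the message text.
        $4 == "E" && $5 == ("load sensor died through signal = " sig) { n++ }
        END { print n + 0 }   # print 0 when nothing matched
    ' "$1"
}

# Possible use:
#   count_sensor_deaths /path/to/execd/messages
```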
> > > >>>>>>>
> > > >>>>>>> If I start $SGE_ROOT/bin/win32-x86/sge_execd after switching to
> > > >>>>>>> dl 5 mode, it stays stuck at starting sge_execd. At that time I can
> > > >>>>>>> see the sge_execd process in the process list, but it does not
> > > >>>>>>> start.
> > > >>>>>>>
> > > >>>>>>> Please let me know if you can see anything from these logs.
> > > >>>>>>>
> > > >>>>>>> thanks
> > > >>>>>>> Manjunath A.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Oct 31, 2007 7:00 PM, manju a <manju.kudu at gmail.com> wrote:
> > > >>>>>>>> Yes, I did that as well, but no luck. If I run
> > > >>>>>>>> /etc/init.d/sgeexecd start, the process appears and then disappears
> > > >>>>>>>> from the process list within a second.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Oct 31, 2007 6:52 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
> > > >>>>>>>>> manju a wrote:
> > > >>>>>>>>>> Hi Harald,
> > > >>>>>>>>>>
> > > >>>>>>>>>> I already applied these hotfixes, but not in the order suggested
> > > >>>>>>>>>> on the page. Does the order make any difference?
> > > >>>>>>>>> The page says: Yes, it makes a difference, apply them only in the
> > > >>>>>>>>> suggested order to make sure all of them work.
> > > >>>>>>>>>
> > > >>>>>>>>> Regards,
> > > >>>>>>>>> Harald
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>> thanks
> > > >>>>>>>>>> Manju
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Oct 31, 2007 4:56 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
> > > >>>>>>>>>>> manju a wrote:
> > > >>>>>>>>>>>> Hi Harald,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I can see the messages below from the qmaster at this location:
> > > >>>>>>>>>>>> $SGE_ROOT/farm/spool/qmaster/messages
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 10/30/2007 23:00:07|qmaster|testqmaster|E|commlib error: got read
> > > >>>>>>>>>>>> error (closing "execd_host.abc.com/qstat/85")
> > > >>>>>>>>>>>> 10/30/2007 23:28:44|qmaster|testqmaster|E|commlib error: got read
> > > >>>>>>>>>>>> error (closing "execd_host.abc.com/qconf/121")
> > > >>>>>>>>>>>> 10/30/2007 23:32:03|qmaster|testqmaster|E|commlib error: got read
> > > >>>>>>>>>>>> error (closing "execd_host.abc.com/qconf/125")
> > > >>>>>>>>>>>> 10/30/2007 23:32:44|qmaster|testqmaster|E|commlib error: got read
> > > >>>>>>>>>>>> error (closing "execd_host.abc.com/execd/1")
> > > >>>>>>>>>>>> 10/30/2007 23:33:28|qmaster|testqmaster|E|commlib error: got read
> > > >>>>>>>>>>>> error (closing "execd_host.abc.com/execd/1")
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> What does this commlib error mean?
> > > >>>>>>>>>>> It means that the qmaster is about to read, or is already reading,
> > > >>>>>>>>>>> some data from the execd, but then the execd suddenly 'vanishes'.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> testqmaster ----> qmaster server name
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> execd_host -----> execution host name (the machine where the
> > > >>>>>>>>>>>> sge_execd process is not running)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Please let me know if you can see anything from these logs. I am still
> > > >>>>>>>>>>>> trying to get this sge_execd process running, but no luck. Please help
> > > >>>>>>>>>>>> me out with this.
> > > >>>>>>>>>>> I guess there is simply something wrong with your Windows/SFU host. You
> > > >>>>>>>>>>> could try to apply all Hotfixes suggested on this page:
> > > >>>>>>>>>>> http://www.duh.org/interix/hotfixes.php
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> This is just a collection of all official Microsoft Hotfixes for SFU.
> > > >>>>>>>>>>> It's very difficult to find the Hotfixes and their dependencies on the
> > > >>>>>>>>>>> Microsoft pages, so just apply them in the order suggested here for your
> > > >>>>>>>>>>> Windows version.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regards,
> > > >>>>>>>>>>> Harald
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> thanks
> > > >>>>>>>>>>>> Manjunath A,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Oct 30, 2007 4:45 PM, Harald Pollinger <Harald.Pollinger at sun.com> wrote:
> > > >>>>>>>>>>>>> manju a wrote:
> > > >>>>>>>>>>>>>> Hi Harald,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I would like to give some more information.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On all the working machines, the Windows profiles point to
> > > >>>>>>>>>>>>>> "C:\Document and Settings". Do you think this matters?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On the non-working machines (sge_execd not running), the Windows
> > > >>>>>>>>>>>>>> profiles point to "D:\Users".
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> This is the only difference I found between working and non-working machines.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> No errors occurred while installing the machine as an execution host; I
> > > >>>>>>>>>>>>>> carried out the installation the same way as on the working machines.
> > > >>>>>>>>>>>>> This might matter for job execution, but it doesn't matter for the Execd
> > > >>>>>>>>>>>>> startup.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I found this:
> > > >>>>>>>>>>>>> The "TERMINATE dispatcher because j == 1" line gets printed when the
> > > >>>>>>>>>>>>> Execd either receives a SIGTERM or a SIGINT signal or it receives a
> > > >>>>>>>>>>>>> shutdown request from the QMaster.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> As I can't see any request from the QMaster in the trace file snippet
> > > >>>>>>>>>>>>> you sent me, I assume the Execd received a SIGTERM or a SIGINT signal.
> > > >>>>>>>>>>>>> I can't tell you where this signal came from.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>>> Harald
> > > >>>>
> > > >>>> --
> > > >>>> Sun Microsystems GmbH         Harald Pollinger
> > > >>>> Dr.-Leo-Ritter-Str. 7         N1 Grid Engine Engineering
> > > >>>> D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
> > > >>>> Germany                       Fax: +49 (0)941 3075-222  (x60222)
> > > >>>> http://www.sun.com/gridware
> > > >>>> mailto:harald.pollinger at sun.com
> > > >>>> Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
> > > >>>> D-85551 Kirchheim-Heimstetten
> > > >>>> Amtsgericht Muenchen: HRB 161028
> > > >>>> Geschaeftsfuehrer: Wolfgang Engels, Dr. Roland Boemer
> > > >>>> Vorsitzender des Aufsichtsrates: Martin Haering
> > > >>>>
> > > >>>> ---------------------------------------------------------------------
> > > >>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > >>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> > > >>>>
> > > >>>>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > > >
> > >
> > >
> > >
> > >
> >
>
