[GE users] [kpatton at captain.transmeta.com: [kpatton at captain.transmeta.com: Re: [GE users] sge 6.0u7 qrsh taking a long time to dispatch]]

Kirk Patton kpatton at transmeta.com
Tue Jan 3 15:49:17 GMT 2006


For giggles, I turned off afs support (bootstrap) in the test cluster and tried submitting a qrsh job.
The job dispatched quickly.  I then enabled afs support again and changed set_token_cmd
to /bin/true, and the qrsh jobs still dispatched quickly.  Finally, I set set_token_cmd back to my auth
program, and jobs are still dispatching quickly.

For a moment, I thought I had the problem narrowed down to my set_token_cmd.  Unless the problem reappears,
I will have to attribute this weirdness to gremlins...
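
To put numbers on "quickly" between configuration changes, a small timing helper can be used. This is only a sketch: time_command is a hypothetical helper name, and it measures the whole round trip of the command, not just the dispatch phase.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv, discard its output, and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=False: a non-zero exit status is still a valid timing sample.
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


# Harmless self-check that does not need a cluster: time a no-op Python process.
noop_seconds = time_command([sys.executable, "-c", "pass"])
print(f"no-op took {noop_seconds:.2f}s")
```

On a submit host, calling time_command(["qrsh", "hostname"]) a few times after each change to set_token_cmd would give comparable figures, assuming qrsh is in the PATH.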

I also noticed that the debug output for the qrsh command is different: the client connection is now accepted on the first wait, with no repeated polling.

   387  19203 182894037984     R E A D I N G    J O B ! ! ! ! ! ! ! ! ! ! !
   388  19203 182894037984     ============================================
   389  19203 182894037984     random polling set to 4
   390  19203 182894037984 --> wait_for_qrsh_socket() {
   391  19203 182894037984     accepted client connection, fd = 3
   392  19203 182894037984 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 375 }
   393  19203 182894037984 --> get_client_server_context() {
   394  19203 182894037984 --> read_from_qrsh_socket() {
   395  19203 182894037984     qlogin_starter sent: 0:44372:/transmeta/sge/n1ge6-u7/utilbin/lx24-amd64:/transmeta/sge/n1ge6-u7/default/spool/captain/active_jobs/89.1:captain.transmeta.com
   396  19203 182894037984 --> start_client_program() {
   397  19237 182894037984 --> quote_argument() {
   398  19237 182894037984 <-- quote_argument() ../clients/qsh/qsh.c 545 }
   399  19237 182894037984 --> quote_argument() {
   400  19237 182894037984 <-- quote_argument() ../clients/qsh/qsh.c 545 }
Freeze                RCS     date2sec.py   date2sec.py~      date_str2sec.py~    python.linuxRH73.tar.bz2  src  token.aix    token.py
Python-2.4.2.tar.bz2  README  date2sec.pyc  date_str2sec.pyc  python.aix.tar.bz2  python.solaris.tar.bz2    tmp  token.linux  token.solaris
Connection to captain.transmeta.com closed.
   397  19203 182894037984 --> get_remote_exit_code() {
   398  19203 182894037984 --> wait_for_qrsh_socket() {
   399  19203 182894037984     accepted client connection, fd = 3
   400  19203 182894037984 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 375 }
   401  19203 182894037984 --> read_from_qrsh_socket() {
   402  19203 182894037984 <-- get_remote_exit_code() ../clients/qsh/qsh.c 485 }
   403  19203 182894037984 <-- start_client_program() ../clients/qsh/qsh.c 717 }
   404  19203 182894037984 --> sge_exit() {
   405  19203 182894037984 --> sge_security_exit() {
   406  19203 182894037984 <-- sge_security_exit() ../libs/gdi/sge_security.c 610 }
   407  19203 182894037984 <-- sge_exit() ../libs/uti/sge_unistd.c 312 }

Thanks,
Kirk


----- Forwarded message from Kirk Patton <kpatton at captain.transmeta.com> -----

To: Sun Grid Engine List <users at gridengine.sunsource.net>
From: Kirk Patton <kpatton at captain.transmeta.com>
Subject: [kpatton at captain.transmeta.com: Re: [GE users] sge 6.0u7 qrsh taking a long time to dispatch]
Date: Tue, 3 Jan 2006 07:29:00 -0800
Mail-Followup-To: Sun Grid Engine List <users at gridengine.sunsource.net>
User-Agent: Mutt/1.4.2.1i

I compared qrsh job submissions on my current 6.0u6 cluster against the 6.0u7 test cluster.
I set the debug level to 2, and the 6.0u7 output contains additional messages that are not in the 6.0u6
output.

In particular, I am seeing these calls, which do not appear in the sge 6.0u6 output:
   390  17703 182894037984 --> sge_set_auth_info() {
   391  17703 182894037984 --> sge_encrypt() {
   392  17703 182894037984 --> change_encoding() {
   393  17703 182894037984 <-- change_encoding() ../libs/gdi/sge_security.c 1848 }
   394  17703 182894037984 <-- sge_encrypt() ../libs/gdi/sge_security.c 1666 }
   395  17703 182894037984 <-- sge_set_auth_info() ../libs/gdi/sge_security.c 1613 }

I have enabled afs support in the 6.0u7 cluster as part of my testing.  The script
that set_token_cmd points to is working properly.
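
Extra calls like these can be found mechanically by comparing the function-entry lines ("-->") of two traces. The following is a minimal sketch with short excerpts pasted in as sample data; entered_functions and only_in_second are hypothetical helper names, not part of Grid Engine. A difference may also come from a different debug level rather than a different version, so both traces should be taken at the same level.

```python
import re

# Matches trace lines such as "... --> sge_encrypt() {"
ENTER = re.compile(r"-->\s+(\w+)\(\)")


def entered_functions(trace):
    """Return the function names entered in a trace, in order of first appearance."""
    seen = []
    for line in trace.splitlines():
        match = ENTER.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def only_in_second(first, second):
    """Return functions entered in the second trace but never in the first."""
    known = set(entered_functions(first))
    return [name for name in entered_functions(second) if name not in known]


TRACE_U6 = """\
   480   9534 16384 --> wait_for_qrsh_socket() {
   482   9534 16384 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 374 }
   483   9534 16384 --> get_client_server_context() {
"""

TRACE_U7 = """\
   388  17703 182894037984 --> wait_for_qrsh_socket() {
   389  17703 182894037984 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 375 }
   390  17703 182894037984 --> sge_set_auth_info() {
   391  17703 182894037984 --> sge_encrypt() {
   392  17703 182894037984 --> change_encoding() {
"""

print(only_in_second(TRACE_U6, TRACE_U7))
```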


sge 6.0u6
   477   9534 16384     R E A D I N G    J O B ! ! ! ! ! ! ! ! ! ! !
   478   9534 16384     ============================================
   479   9534 16384     random polling set to 56
   480   9534 16384 --> wait_for_qrsh_socket() {
   481   9534 16384     accepted client connection, fd = 3
   482   9534 16384 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 374 }
   483   9534 16384 --> get_client_server_context() {
   484   9534 16384 --> read_from_qrsh_socket() {
   485   9534 16384     qlogin_starter sent: 0:32996:/transmeta/sge/n1ge6-u6/utilbin/lx24-amd64:/var/gridware/spool/transmeta/op240-025/active_jobs/96039.1:op240-025.transmeta.com
   486   9534 16384 --> start_client_program() {
   487   9535 16384 --> quote_argument() {
   488   9535 16384 <-- quote_argument() ../clients/qsh/qsh.c 544 }
   489   9535 16384 --> quote_argument() {
   490   9535 16384 <-- quote_argument() ../clients/qsh/qsh.c 544 }
Job 96039 is submitted to queue <common>
<<Starting on op240-025>>

sge 6.0u7
   385  17703 182894037984     R E A D I N G    J O B ! ! ! ! ! ! ! ! ! ! !
   386  17703 182894037984     ============================================
   387  17703 182894037984     random polling set to 3
   388  17703 182894037984 --> wait_for_qrsh_socket() {
   389  17703 182894037984 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 375 }
   390  17703 182894037984 --> sge_set_auth_info() {
   391  17703 182894037984 --> sge_encrypt() {
   392  17703 182894037984 --> change_encoding() {
   393  17703 182894037984 <-- change_encoding() ../libs/gdi/sge_security.c 1848 }
   394  17703 182894037984 <-- sge_encrypt() ../libs/gdi/sge_security.c 1666 }
   395  17703 182894037984 <-- sge_set_auth_info() ../libs/gdi/sge_security.c 1613 }
   396  17703 182894037984 --> gdi_send_sec_message() {
   397  17703 182894037984 --> dump_snd_info() {
   398  17703 182894037984 <-- dump_snd_info() ../libs/gdi/sge_security.c 145 }
   399  17703 182894037984 <-- gdi_send_sec_message() ../libs/gdi/sge_security.c 649 }
   400  17703 182894037984 --> gdi_receive_sec_message() {
   401  17703 182894037984 --> dump_rcv_info() {
   402  17703 182894037984 <-- dump_rcv_info() ../libs/gdi/sge_security.c 117 }
   403  17703 182894037984 <-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 628 }
   404  17703 182894037984 --> parse_result_list() {
   405  17703 182894037984 <-- parse_result_list() ../clients/qsh/qsh.c 607 }
   406  17703 182894037984     Job Status is: 0 (unenrolled)
   407  17703 182894037984     polling_interval set to 6
   408  17703 182894037984     random polling set to 6
   409  17703 182894037984 --> wait_for_qrsh_socket() {
   410  17703 182894037984 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 375 }
   411  17703 182894037984 --> sge_set_auth_info() {
   412  17703 182894037984 --> sge_encrypt() {
   413  17703 182894037984 --> change_encoding() {
   414  17703 182894037984 <-- change_encoding() ../libs/gdi/sge_security.c 1848 }
   415  17703 182894037984 <-- sge_encrypt() ../libs/gdi/sge_security.c 1666 }
   416  17703 182894037984 <-- sge_set_auth_info() ../libs/gdi/sge_security.c 1613 }
   417  17703 182894037984 --> gdi_send_sec_message() {
   418  17703 182894037984 --> dump_snd_info() {
   419  17703 182894037984 <-- dump_snd_info() ../libs/gdi/sge_security.c 145 }
   420  17703 182894037984 <-- gdi_send_sec_message() ../libs/gdi/sge_security.c 649 }
   421  17703 182894037984 --> gdi_receive_sec_message() {
   422  17703 182894037984 --> dump_rcv_info() {
   423  17703 182894037984 <-- dump_rcv_info() ../libs/gdi/sge_security.c 117 }
   424  17703 182894037984 <-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 628 }
   425  17703 182894037984 --> parse_result_list() {
   426  17703 182894037984 <-- parse_result_list() ../clients/qsh/qsh.c 607 }
   427  17703 182894037984     Job Status is: 0 (unenrolled)
   428  17703 182894037984     polling_interval set to 12
   429  17703 182894037984     random polling set to 20
   430  17703 182894037984 --> wait_for_qrsh_socket() {
   431  17703 182894037984 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 375 }
   432  17703 182894037984 --> sge_set_auth_info() {
   433  17703 182894037984 --> sge_encrypt() {
   434  17703 182894037984 --> change_encoding() {
   435  17703 182894037984 <-- change_encoding() ../libs/gdi/sge_security.c 1848 }
   436  17703 182894037984 <-- sge_encrypt() ../libs/gdi/sge_security.c 1666 }
   437  17703 182894037984 <-- sge_set_auth_info() ../libs/gdi/sge_security.c 1613 }
   438  17703 182894037984 --> gdi_send_sec_message() {
   439  17703 182894037984 --> dump_snd_info() {
   440  17703 182894037984 <-- dump_snd_info() ../libs/gdi/sge_security.c 145 }
   441  17703 182894037984 <-- gdi_send_sec_message() ../libs/gdi/sge_security.c 649 }
   442  17703 182894037984 --> gdi_receive_sec_message() {
   443  17703 182894037984 --> dump_rcv_info() {
   444  17703 182894037984 <-- dump_rcv_info() ../libs/gdi/sge_security.c 117 }
   445  17703 182894037984 <-- gdi_receive_sec_message() ../libs/gdi/sge_security.c 628 }
   446  17703 182894037984 --> parse_result_list() {
   447  17703 182894037984 <-- parse_result_list() ../clients/qsh/qsh.c 607 }
   448  17703 182894037984     Job Status is: 0 (unenrolled)
   449  17703 182894037984     polling_interval set to 24
   450  17703 182894037984     random polling set to 41
   451  17703 182894037984 --> wait_for_qrsh_socket() {
   452  17703 182894037984     accepted client connection, fd = 3
   453  17703 182894037984 <-- wait_for_qrsh_socket() ../clients/qsh/qsh.c 375 }
   454  17703 182894037984 --> get_client_server_context() {
   455  17703 182894037984 --> read_from_qrsh_socket() {
   456  17703 182894037984     qlogin_starter sent: 0:44186:/transmeta/sge/n1ge6-u7/utilbin/lx24-amd64:/transmeta/sge/n1ge6-u7/default/spool/captain/active_jobs/82.1:captain.transmeta.com
   457  17703 182894037984 --> start_client_program() {
   458  17809 182894037984 --> quote_argument() {
   459  17809 182894037984 <-- quote_argument() ../clients/qsh/qsh.c 545 }
   460  17809 182894037984 --> quote_argument() {
   461  17809 182894037984 <-- quote_argument() ../clients/qsh/qsh.c 545 }
captain.transmeta.com
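
The 6.0u7 trace above shows polling_interval doubling (6, 12, 24) while the "random polling" value grows with it (3, 6, 20, 41). The sketch below estimates how long the client spent waiting. It rests on two assumptions that the trace itself does not confirm: each "random polling set to N" is the timeout in seconds of the wait_for_qrsh_socket() call that follows, and only the last wait was cut short by the incoming connection. The helper names are hypothetical.

```python
import re

POLL = re.compile(r"random polling set to (\d+)")


def polling_waits(trace):
    """Return every 'random polling set to N' value found in the trace, in order."""
    return [int(value) for value in POLL.findall(trace)]


def wait_bounds(waits):
    """Return (lower, upper) bounds in seconds on the total time spent waiting.

    Lower bound: every wait except the last ran to its full timeout.
    Upper bound: the last wait also ran to its full timeout.
    """
    if not waits:
        return (0, 0)
    return (sum(waits[:-1]), sum(waits))


# The four polling lines from the 6.0u7 trace above.
TRACE = """\
   387  17703 182894037984     random polling set to 3
   408  17703 182894037984     random polling set to 6
   429  17703 182894037984     random polling set to 20
   450  17703 182894037984     random polling set to 41
"""

print(wait_bounds(polling_waits(TRACE)))
```

Under those assumptions the dispatch delay in this trace lies between 29 and 70 seconds, which is consistent with the 45 seconds to a minute reported originally.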




----- Forwarded message from Kirk Patton <kpatton at captain.transmeta.com> -----

To: users at gridengine.sunsource.net
From: Kirk Patton <kpatton at captain.transmeta.com>
Subject: Re: [GE users] sge 6.0u7 qrsh taking a long time to dispatch
Date: Tue, 3 Jan 2006 06:39:11 -0800
Mail-Followup-To: users at gridengine.sunsource.net
In-Reply-To: <43BA86B5.5000300 at sun.com>
User-Agent: Mutt/1.4.1i

Is this the correct format of the command?

[kpatton at captain ~]$ sudo qping -dump captain 537 execd 1
open connection to "captain.transmeta.com/execd/1" ... no error happened
           time|local                        |d.|remote                              |format|ack type|               msg tag|msg id|msg rid|msg len|       msg time|   msg ltime|con count|
---------------|-----------------------------|--|------------------------------------|------|--------|----------------------|------|-------|-------|---------------|------------|---------|
06:33:47.392582|captain.transmeta.com/execd/1|->|captain.transmeta.com/debug_client/1|   crm|     nak|                     0|     0|      0|    256|06:33:47.392581|00:00.000000|        2|
06:34:14.435393|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31007|      0|   1862|06:34:14.435308|00:00.000084|        2|
06:34:54.429330|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31008|      0|   1862|06:34:54.429250|00:00.000079|        2|
06:35:10.960714|captain.transmeta.com/execd/1|<-|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|     TAG_JOB_EXECUTION|  5321|      0|   8455|06:35:10.960708|00:00.000005|        2|
06:35:11.025222|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31009|      0|    387|06:35:11.025117|00:00.000104|        2|
06:35:14.059826|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31010|      0|   1352|06:35:14.059730|00:00.000095|        2|
06:35:14.078146|captain.transmeta.com/execd/1|<-|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|       TAG_ACK_REQUEST|  5322|      0|     21|06:35:14.078139|00:00.000006|        2|
06:35:34.217319|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   sim|     nak|                     0| 31011|      0|     25|06:35:34.217230|00:00.000089|        2|
06:35:34.217848|captain.transmeta.com/execd/1|<-|sge-master1.transmeta.com/qmaster/1 |  sirm|     nak|                     0|  5323|      0|    383|06:35:34.217843|00:00.000003|        2|
06:35:34.219626|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31012|      0|   1862|06:35:34.219541|00:00.000085|        2|
06:36:14.215264|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31013|      0|   1862|06:36:14.215185|00:00.000079|        2|
06:36:54.211659|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31014|      0|   1862|06:36:54.211582|00:00.000076|        2|
06:37:14.524870|captain.transmeta.com/execd/1|<-|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|     TAG_JOB_EXECUTION|  5324|      0|   8455|06:37:14.524864|00:00.000006|        2|
06:37:14.977675|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31015|      0|    387|06:37:14.977580|00:00.000094|        2|
06:37:19.289048|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31016|      0|   1352|06:37:19.288946|00:00.000101|        2|
06:37:19.290632|captain.transmeta.com/execd/1|<-|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|       TAG_ACK_REQUEST|  5325|      0|     21|06:37:19.290628|00:00.000003|        2|

I submitted two jobs, the second one after the first had finished.
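
To get timing numbers out of a dump like the one above, the rows can be split on "|" and the timestamps subtracted. The following is a minimal sketch, not a Grid Engine tool: the helper names are hypothetical, the column positions are taken from the header line of this dump, and reading the next TAG_ACK_REQUEST as the end of the job's handling on the execd side is an assumption. It also ignores dumps that run past midnight.

```python
def to_seconds(stamp):
    """Convert 'HH:MM:SS.ffffff' to seconds since midnight."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_dump(text):
    """Return (seconds, direction, msg_tag) for every message row in the dump."""
    rows = []
    for line in text.splitlines():
        cols = [col.strip() for col in line.split("|")]
        # Message rows have '->' or '<-' in the third column; this skips the
        # header, the separator line and the "open connection" status line.
        if len(cols) < 7 or cols[2] not in ("->", "<-"):
            continue
        rows.append((to_seconds(cols[0]), cols[2], cols[6]))
    return rows


def job_to_ack(rows):
    """Seconds from each TAG_JOB_EXECUTION to the next TAG_ACK_REQUEST."""
    gaps, start = [], None
    for stamp, _direction, tag in rows:
        if tag == "TAG_JOB_EXECUTION":
            start = stamp
        elif tag == "TAG_ACK_REQUEST" and start is not None:
            gaps.append(round(stamp - start, 3))
            start = None
    return gaps


# Three rows of the first job, copied from the dump above.
SAMPLE = """\
06:35:10.960714|captain.transmeta.com/execd/1|<-|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|     TAG_JOB_EXECUTION|  5321|      0|   8455|06:35:10.960708|00:00.000005|        2|
06:35:11.025222|captain.transmeta.com/execd/1|->|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|    TAG_REPORT_REQUEST| 31009|      0|    387|06:35:11.025117|00:00.000104|        2|
06:35:14.078146|captain.transmeta.com/execd/1|<-|sge-master1.transmeta.com/qmaster/1 |   bin|     nak|       TAG_ACK_REQUEST|  5322|      0|     21|06:35:14.078139|00:00.000006|        2|
"""

print(job_to_ack(parse_dump(SAMPLE)))
```

Applied to the full dump, the two jobs come out at roughly 3.1 and 4.8 seconds between TAG_JOB_EXECUTION and the following TAG_ACK_REQUEST. If the qrsh client waited much longer than that, the extra time was presumably spent outside this window.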

Kirk

On Tue, Jan 03, 2006 at 03:14:13PM +0100, Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> Hi Kirk,
> 
> could you do the same monitoring with qping -dump and start a qrsh job?
> 
> That would give us important time information.
> 
> Thanks,
> Stephan
> 
> Kirk Patton wrote On 12/30/05 19:45,:
> 
> >Hello all,
> >
> >I have a test cluster set up that is running sge 6.0u7. I am noticing that qrsh jobs are hanging
> >for a period of time before they get started on a host.  It is unusually long, somewhere between
> >45 seconds and a minute.
> >
> >I ran the submission under debug level 3, and the client polls the job status several times before the
> >job starts.
> >
> >Has anyone seen anything like this?
> >
> >I am attaching the debug output.
> >
> >Thanks,
> >Kirk
> >
> >
> >     0   2703 182894037984     ****** starting localization procedure ... **********
> >     1   2703 182894037984     could not get environment variable "GRIDPACKAGE"
> >     2   2703 182894037984     could not get environment variable "GRIDLOCALEDIR"
> >     3   2703 182894037984     setlocale() returns "C"
> >     4   2703 182894037984     locale directory: >/transmeta/sge/n1ge6-u7/locale<
> >     5   2703 182894037984     package file:     >lx24-amd64/gridengine.mo<
> >     6   2703 182894037984     language (LANG):  >C<
> >     7   2703 182894037984     loading message file: /transmeta/sge/n1ge6-u7/locale/C/LC_MESSAGES/lx24-amd64/gridengine.mo
> >     8   2703 182894037984     could not open message file - error
> >     9   2703 182894037984     setlocale() returns "C"
> >    10   2703 182894037984     bindtextdomain() returns "/transmeta/sge/n1ge6-u7/locale"
> >    11   2703 182894037984     textdomain() returns "lx24-amd64/gridengine"
> >    12   2703 182894037984     error id output     : disabled
> >    13   2703 182894037984     ****** starting localization procedure ... failed  **
> >    14   2703 182894037984     Getting host by name - Linux
> >    15   2703 182894037984     1 names in h_addr_list
> >    16   2703 182894037984     0 names in h_aliases
> >    17   2703 182894037984     me.who                      >14<
> >    18   2703 182894037984     me.sge_formal_prog_name     >qrsh<
> >    19   2703 182894037984     me.qualified_hostname       >captain.transmeta.com<
> >    20   2703 182894037984     me.unqualified_hostname     >captain<
> >    21   2703 182894037984     me.uid                      >1660<
> >    22   2703 182894037984     me.gid                      >1660<
> >    23   2703 182894037984     me.daemonized               >0<
> >    24   2703 182894037984     me.user_name                >kpatton<
> >    25   2703 182894037984     me.default_cell             >default<
> >    26   2703 182894037984     sge_root            >/transmeta/sge/n1ge6-u7<
> >    27   2703 182894037984     cell_root           >/transmeta/sge/n1ge6-u7/default<
> >    28   2703 182894037984     conf_file           >/transmeta/sge/n1ge6-u7/default/common/bootstrap<
> >    29   2703 182894037984     bootstrap_file      >/transmeta/sge/n1ge6-u7/default/common/configuration<
> >    30   2703 182894037984     act_qmaster_file    >/transmeta/sge/n1ge6-u7/default/common/act_qmaster<
> >    31   2703 182894037984     acct_file           >/transmeta/sge/n1ge6-u7/default/common/accounting<
> >    32   2703 182894037984     reporting_file      >/transmeta/sge/n1ge6-u7/default/common/reporting<
> >    33   2703 182894037984     local_conf_dir      >/transmeta/sge/n1ge6-u7/default/common/local_conf<
> >    34   2703 182894037984     shadow_masters_file >/transmeta/sge/n1ge6-u7/default/common/shadow_masters<
> >    35   2703 182894037984     admin_user          >none<
> >    36   2703 182894037984     default_domain      >none<
> >    37   2703 182894037984     ignore_fqdn         >true<
> >    38   2703 182894037984     spooling_method     >classic<
> >    39   2703 182894037984     spooling_lib        >libspoolc<
> >    40   2703 182894037984     spooling_params     >/transmeta/sge/n1ge6-u7/default/common;/var/gridware/spool/transmeta<
> >    41   2703 182894037984     binary_path         >/transmeta/sge/n1ge6-u7/bin<
> >    42   2703 182894037984     qmaster_spool_dir   >/var/gridware/spool/transmeta<
> >    43   2703 182894037984     security_mode        >afs<
> >    44   2703 182894037984     (re-)reading act_qmaster file. Got master host "sge-master1.transmeta.com"
> >    45   2703 182894037984     ../libs/gdi/sge_any_request.c 515 starting up communication without threads
> >    46   2703 182894037984     Getting host by name - Linux
> >    47   2703 182894037984     1 names in h_addr_list
> >    48   2703 182894037984     0 names in h_aliases
> >    49   2703 182894037984     me.qualified_hostname: captain.transmeta.com
> >    50   2703 182894037984     secure dummy string: AIMK_SECURE_OPTION_ENABLED
> >    51   2703 182894037984     creating GDI handle
> >    52   2703 182894037984     returning port value: 536
> >    53   2703 182894037984     -- defaults file: /transmeta/sge/n1ge6-u7/default/common/sge_request
> >    54   2703 182894037984     directive prefix = ""
> >    55   2703 182894037984     -- defaults file /home/kpatton/.sge_request does not exist
> >    56   2703 182894037984     -- defaults file /var/gridware/spool/transmeta/captain/.sge_request does not exist
> >    57   2703 182894037984     "-q all.q at captain"
> >    58   2703 182894037984     ===hostname===
> >    59   2703 182894037984     Path Alias: ># (c) 2004 Sun Microsystems, Inc. Use is subject to license terms.  <
> >    60   2703 182894037984     Path Alias: >#<
> >    61   2703 182894037984     Path Alias: ># Template Grid Engine path aliasing configuration file<
> >    62   2703 182894037984     Path Alias: >#<
> >    63   2703 182894037984     Path Alias: ># The following entry aliases physical address as generated by automounter<
> >    64   2703 182894037984     Path Alias: ># (with a leading /tmp_mnt) to the logical path (w/o leading /tmp_mnt).<
> >    65   2703 182894037984     Path Alias: >#<
> >    66   2703 182894037984     Path Alias: ># subm_dir	subm_host	exec_host	path_replacement<
> >    67   2703 182894037984     Path Alias: >/tmp_mnt/	*		      *		      /<
> >    68   2703 182894037984     get_configuration: unique for captain.transmeta.com: captain.transmeta.com
> >    69   2703 182894037984     requesting global and captain.transmeta.com
> >    70   2703 182894037984     packing SGE_GDI_GET request
> >    71   2703 182894037984     packing SGE_GDI_GET request
> >    72   2703 182894037984     reresolve port timeout in 600
> >    73   2703 182894037984     returning cached port value: 536
> >    74   2703 182894037984     Getting host by name - Linux
> >    75   2703 182894037984     1 names in h_addr_list
> >    76   2703 182894037984     0 names in h_aliases
> >    77   2703 182894037984     send request with id 1
> >    78   2703 182894037984     unpacking SGE_GDI_GET request
> >    79   2703 182894037984     in: request_id=1, sequence_id=1, target=10, op=1
> >    80   2703 182894037984     out: request_id=1, sequence_id=1, target=10, op=1
> >    81   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "log_warning" for loglevel
> >    82   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/transmeta/sge/n1ge6-u7/default/spool" for execd_spool_dir
> >    83   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/bin/mail" for mailer
> >    84   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/usr/bin/X11/xterm" for xterm
> >    85   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for load_sensor
> >    86   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for prolog
> >    87   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for epilog
> >    88   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "posix_compliant" for shell_start_mode
> >    89   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "sh,ksh,csh,tcsh" for login_shells
> >    90   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "0" for min_uid
> >    91   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "0" for min_gid
> >    92   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "20000-20100" for gid_range
> >    93   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "00:00:40" for load_report_time
> >    94   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "false" for enforce_project
> >    95   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "auto" for enforce_user
> >    96   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "00:05:00" for max_unheard
> >    97   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "log_warning" for loglevel
> >    98   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "lsfadmin at transmeta.com" for administrator_mail
> >    99   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/transmeta/sge/n1ge6-u7/transmeta/scripts/token" for set_token_cmd
> >   100   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/usr/afsws/bin/pagsh" for pag_cmd
> >   101   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "24:0:0" for token_extend_time
> >   102   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for shepherd_cmd
> >   103   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for qmaster_params
> >   104   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for execd_params
> >   105   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "accounting=true reporting=false flush_time=00:00:15 joblog=false sharelog=00:00:00" for reporting_params
> >   106   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "100" for finished_jobs
> >   107   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for qlogin_daemon
> >   108   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for qlogin_command
> >   109   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/usr/sbin/sshd -i" for rsh_daemon
> >   110   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/usr/bin/ssh -t" for rsh_command
> >   111   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/usr/sbin/sshd -i" for rlogin_daemon
> >   112   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "/usr/bin/ssh -t" for rlogin_command
> >   113   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "00:00:00" for reschedule_unknown
> >   114   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "2000" for max_aj_instances
> >   115   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "75000" for max_aj_tasks
> >   116   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "0" for max_u_jobs
> >   117   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "0" for max_jobs
> >   118   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "0" for reprioritize
> >   119   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "0" for auto_user_oticket
> >   120   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "0" for auto_user_fshare
> >   121   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "none" for auto_user_default_project
> >   122   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "86400" for auto_user_delete_time
> >   123   2703 182894037984     ../libs/sgeobj/sge_conf.c 367 using "false" for delegated_file_staging
> >   124   2703 182894037984     Everything ok
> >   125   2703 182894037984     qrsh will listen on port 38587
> >   126   2703 182894037984     B E F O R E     S E N D I N G! ! ! ! ! ! ! ! ! ! ! ! ! !
> >   127   2703 182894037984     =====================================================
> >   128   2703 182894037984     packing SGE_GDI_ADD request
> >   129   2703 182894037984     packing SGE_GDI_ADD request
> >   130   2703 182894037984     reresolve port timeout in 599
> >   131   2703 182894037984     returning cached port value: 536
> >   132   2703 182894037984     send request with id 2
> >   133   2703 182894037984     unpacking SGE_GDI_ADD request
> >   134   2703 182894037984     in: request_id=2, sequence_id=1, target=5, op=258
> >   135   2703 182894037984     out: request_id=2, sequence_id=1, target=5, op=258
> >   136   2703 182894037984     ../clients/qsh/qsh.c 1705 your job 73 ("hostname") has been submitted
> >   137   2703 182894037984     job id is: 73
> >   138   2703 182894037984     R E A D I N G    J O B ! ! ! ! ! ! ! ! ! ! !
> >   139   2703 182894037984     ============================================
> >   140   2703 182894037984     random polling set to 3
> >   141   2703 182894037984     packing SGE_GDI_GET request
> >   142   2703 182894037984     packing SGE_GDI_GET request
> >   143   2703 182894037984     reresolve port timeout in 596
> >   144   2703 182894037984     returning cached port value: 536
> >   145   2703 182894037984     send request with id 1
> >   146   2703 182894037984     unpacking SGE_GDI_GET request
> >   147   2703 182894037984     in: request_id=3, sequence_id=1, target=5, op=1
> >   148   2703 182894037984     out: request_id=3, sequence_id=1, target=5, op=1
> >   149   2703 182894037984     Job Status is: 0 (unenrolled)
> >   150   2703 182894037984     polling_interval set to 6
> >   151   2703 182894037984     random polling set to 8
> >   152   2703 182894037984     packing SGE_GDI_GET request
> >   153   2703 182894037984     packing SGE_GDI_GET request
> >   154   2703 182894037984     reresolve port timeout in 588
> >   155   2703 182894037984     returning cached port value: 536
> >   156   2703 182894037984     send request with id 1
> >   157   2703 182894037984     unpacking SGE_GDI_GET request
> >   158   2703 182894037984     in: request_id=4, sequence_id=1, target=5, op=1
> >   159   2703 182894037984     out: request_id=4, sequence_id=1, target=5, op=1
> >   160   2703 182894037984     Job Status is: 0 (unenrolled)
> >   161   2703 182894037984     polling_interval set to 12
> >   162   2703 182894037984     random polling set to 17
> >   163   2703 182894037984     packing SGE_GDI_GET request
> >   164   2703 182894037984     packing SGE_GDI_GET request
> >   165   2703 182894037984     reresolve port timeout in 571
> >   166   2703 182894037984     returning cached port value: 536
> >   167   2703 182894037984     send request with id 1
> >   168   2703 182894037984     unpacking SGE_GDI_GET request
> >   169   2703 182894037984     in: request_id=5, sequence_id=1, target=5, op=1
> >   170   2703 182894037984     out: request_id=5, sequence_id=1, target=5, op=1
> >   171   2703 182894037984     Job Status is: 0 (unenrolled)
> >   172   2703 182894037984     polling_interval set to 24
> >   173   2703 182894037984     random polling set to 27
> >   174   2703 182894037984     packing SGE_GDI_GET request
> >   175   2703 182894037984     packing SGE_GDI_GET request
> >   176   2703 182894037984     reresolve port timeout in 543
> >   177   2703 182894037984     returning cached port value: 536
> >   178   2703 182894037984     send request with id 1
> >   179   2703 182894037984     unpacking SGE_GDI_GET request
> >   180   2703 182894037984     in: request_id=6, sequence_id=1, target=5, op=1
> >   181   2703 182894037984     out: request_id=6, sequence_id=1, target=5, op=1
> >   182   2703 182894037984     Job Status is: 0 (unenrolled)
> >   183   2703 182894037984     polling_interval set to 48
> >   184   2703 182894037984     random polling set to 87
> >   185   2703 182894037984     accepted client connection, fd = 3
> >   186   2703 182894037984     qlogin_starter sent: 0:38605:/transmeta/sge/n1ge6-u7/utilbin/lx24-amd64:/transmeta/sge/n1ge6-u7/default/spool/captain/active_jobs/73.1:captain.transmeta.com
> >captain.transmeta.com
> >Connection to captain.transmeta.com closed.
> >   187   2703 182894037984     accepted client connection, fd = 3
> >-------------------------------
> >JB_job_number        (Ulong)     = 0
> >JB_job_name          (String)  * = hostname
> >JB_version           (Ulong)     = 0
> >JB_jid_request_list  (List)      = empty
> >JB_jid_predecessor_l (List)      = empty
> >JB_jid_sucessor_list (List)      = empty
> >JB_session           (String)    = (null)
> >JB_project           (String)    = (null)
> >JB_department        (String)    = (null)
> >JB_directive_prefix  (String)    = (null)
> >JB_exec_file         (String)    = (null)
> >JB_script_file       (String)  * = hostname
> >JB_script_size       (Ulong)     = 0
> >JB_script_ptr        (String)    = (null)
> >JB_submission_time   (Ulong)   * = 1135962760
> >JB_execution_time    (Ulong)     = 0
> >JB_deadline          (Ulong)     = 0
> >JB_owner             (String)  * = kpatton
> >JB_uid               (Ulong)   * = 1660
> >JB_group             (String)    = (null)
> >JB_gid               (Ulong)     = 0
> >JB_account           (String)    = (null)
> >JB_cwd               (String)    = (null)
> >JB_notify            (Bool)      = false
> >JB_type              (Ulong)   * = 73
> >JB_reserve           (Bool)      = false
> >JB_priority          (Ulong)   * = 1024
> >JB_jobshare          (Ulong)     = 0
> >JB_shell_list        (List)      = empty
> >JB_verify            (Ulong)     = 0
> >JB_env_list          (List)    * = full {
> >
> >   List: <job_sublist> * #Elements: 7
> >   -------------------------------
> >   VA_variable          (String)  * = __SGE_PREFIX__O_HOME
> >   VA_value             (String)  * = /home/kpatton
> >   -------------------------------
> >   VA_variable          (String)  * = __SGE_PREFIX__O_LOGNAME
> >   VA_value             (String)  * = kpatton
> >   -------------------------------
> >   VA_variable          (String)  * = __SGE_PREFIX__O_PATH
> >   VA_value             (String)  * = /transmeta/sge/n1ge6-u7/bin/lx24-amd64:/transmeta/sge/n1ge6-u7/bin/lx24-amd64:/transmeta/sge/n1ge6-u7/bin/lx24-amd64:/transmeta/sge/n1ge6-u6/transmeta/scripts:/transmeta/sge/n1ge6-u6/bin/lx24-amd64:/opt/modules/3.1.6/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/sbin:/sbin:/usr/local/lsf/bin:/home/kpatton/scripts
> >   -------------------------------
> >   VA_variable          (String)  * = __SGE_PREFIX__O_SHELL
> >   VA_value             (String)  * = /bin/tcsh
> >   -------------------------------
> >   VA_variable          (String)  * = __SGE_PREFIX__O_MAIL
> >   VA_value             (String)  * = /var/mail/kpatton
> >   -------------------------------
> >   VA_variable          (String)  * = __SGE_PREFIX__O_HOST
> >   VA_value             (String)  * = captain.transmeta.com
> >   -------------------------------
> >   VA_variable          (String)  * = __SGE_PREFIX__O_WORKDIR
> >   VA_value             (String)  * = /var/gridware/spool/transmeta/captain
> >}
> >JB_context           (List)      = empty
> >JB_job_args          (List)      = empty
> >JB_checkpoint_attr   (Ulong)     = 0
> >JB_checkpoint_name   (String)    = (null)
> >JB_checkpoint_object (Object)    = none
> >JB_checkpoint_interv (Ulong)     = 0
> >JB_restart           (Ulong)   * = 2
> >JB_stdout_path_list  (List)      = empty
> >JB_stderr_path_list  (List)      = empty
> >JB_stdin_path_list   (List)      = empty
> >JB_merge_stderr      (Bool)      = false
> >JB_hard_resource_lis (List)      = empty
> >JB_soft_resource_lis (List)      = empty
> >JB_hard_queue_list   (List)    * = full {
> >
> >   List: <destin_ident_list> * #Elements: 1
> >   -------------------------------
> >   QR_name              (String)  * = all.q@captain
> >}
> >JB_soft_queue_list   (List)      = empty
> >JB_mail_options      (Ulong)     = 0
> >JB_mail_list         (List)    * = full {
> >
> >   List: <> * #Elements: 1
> >   -------------------------------
> >   MR_user              (String)  * = kpatton
> >   MR_host              (Host)    * = captain.transmeta.com
> >}
> >JB_pe                (String)    = (null)
> >JB_pe_range          (List)      = empty
> >JB_master_hard_queue (List)      = empty
> >JB_tgt               (String)    = (null)
> >JB_cred              (String)    = (null)
> >JB_ja_structure      (List)    * = full {
> >
> >   List: <task_id_range> * #Elements: 1
> >   -------------------------------
> >   RN_min               (Ulong)   * = 1
> >   RN_max               (Ulong)   * = 1
> >   RN_step              (Ulong)   * = 1
> >}
> >JB_ja_n_h_ids        (List)    * = full {
> >
> >   List: <task_id_range> * #Elements: 1
> >   -------------------------------
> >   RN_min               (Ulong)   * = 1
> >   RN_max               (Ulong)   * = 1
> >   RN_step              (Ulong)   * = 1
> >}
> >JB_ja_u_h_ids        (List)      = empty
> >JB_ja_s_h_ids        (List)      = empty
> >JB_ja_o_h_ids        (List)      = empty
> >JB_ja_z_ids          (List)      = empty
> >JB_ja_template       (List)      = empty
> >JB_ja_tasks          (List)      = empty
> >JB_host              (Host)      = (null)
> >JB_category          (Ref)       = (nil)
> >JB_user_list         (List)      = empty
> >JB_job_identifier_li (List)      = empty
> >JB_job_source        (String)    = (null)
> >JB_verify_suitable_q (Ulong)     = 0
> >JB_nrunning          (Ulong)     = 0
> >JB_soft_wallclock_gm (Ulong)     = 0
> >JB_hard_wallclock_gm (Ulong)     = 0
> >JB_override_tickets  (Ulong)     = 0
> >JB_qs_args           (List)      = empty
> >JB_path_aliases      (List)      = empty
> >JB_urg               (Double)    = 0.000000
> >JB_nurg              (Double)    = 0.000000
> >JB_nppri             (Double)    = 0.000000
> >JB_rrcontr           (Double)    = 0.000000
> >JB_dlcontr           (Double)    = 0.000000
> >JB_wtcontr           (Double)    = 0.000000
> >
> >  
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
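
Since the debug output differed between runs, a small script can make two job dumps easier to compare. This is only a minimal sketch: the helper names (FIELD_RE, job_fields, set_job_fields, diff_dumps) are hypothetical, and reading the "*" column as "field was explicitly set" is an assumption drawn from the dump above, not from SGE documentation.

```python
import re

# Matches dump lines such as:
#   > >JB_owner             (String)  * = kpatton
# Leading "> " quote markers and indentation are skipped.
# Assumption: "*" marks a field that was explicitly set.
FIELD_RE = re.compile(
    r"^[>\s]*(?P<name>[A-Z]{2}_\w+)\s+\((?P<type>\w+)\)\s+(?P<set>\*?)\s*=\s*(?P<value>.*)$"
)


def job_fields(text):
    """Return {name: (is_set, value)} for the top-level JB_* lines of a dump."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        # Sublist entries (VA_*, QR_*, RN_*, ...) repeat, so only JB_* is kept.
        if m and m.group("name").startswith("JB_"):
            fields[m.group("name")] = (m.group("set") == "*", m.group("value").strip())
    return fields


def set_job_fields(text):
    """Return {name: value} for the JB_* fields flagged with '*'."""
    return {name: value for name, (is_set, value) in job_fields(text).items() if is_set}


def diff_dumps(first, second):
    """Return {name: (value_in_first, value_in_second)} for JB_* fields that differ."""
    a, b = job_fields(first), job_fields(second)
    changed = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            changed[name] = (a.get(name, (False, None))[1], b.get(name, (False, None))[1])
    return changed
```

Saving the dump from a slow dispatch and from a fast one to two files and passing their contents to diff_dumps() would list only the JB_* fields whose values or "*" flags differ.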

-- 
Kirk Patton
Unix Administrator
Transmeta Inc.
Tel. 408 919-3055

----- End forwarded message -----

----- End forwarded message -----
