[GE issues] [Issue 3040] New - qalter -w p output fails to match qstat -j output

ravinallan ravichandra.nallan at sun.com
Thu May 28 17:03:51 BST 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3040
                 Issue #|3040
                 Summary| qalter -w p output fails to match qstat -j output
               Component|gridengine
                 Version|6.2
                Platform|All
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P2
            Subcomponent|qmaster
             Assigned to|ravinallan
             Reported by|ravinallan






------- Additional comments from ravinallan at sunsource.net Thu May 28 09:03:49 -0700 2009 -------
qalter -w p <job_id> is supposed to replace qstat -j <job_id> in the future, because schedd_job_info will eventually disappear. However, the output of
qalter -w p <job_id> does not match the qstat -j output when the job is in an error state.

> I am running SGE 6.2u2_1
>
> I performed the following internal test and please let me know what you think:
>
>
> % qstat -u \*
>  job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> -----------------------------------------------------------------------------------------------------------------
>     92 0.55500 test       ot32901      qw    04/21/2009 23:49:17                      1
>     93 0.55500 test       ot32901      qw    04/21/2009 23:55:40                      1
>     94 0.55500 Sleeper    root         qw    04/22/2009 00:03:10                      1
> %  qalter -w p 94
> verification: found suitable queue(s)
>
> The above qalter output is similar whether schedd_job_info is true/false.
>
> % qstat -j 94
> ==============================================================
> job_number:                 94
>
> snip...
>
> script_file:                sleeper.sh
> scheduling info:            queue instance "all.q@ruffe" dropped because it is temporarily not available
>                             queue instance "all.q@v4u-2000sb" dropped because it is temporarily not available
>                             cannot run in queue "v4u-2000sc" because it is not contained in its hard queue list (-q)
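
To compare the two commands across several jobs, the reasons in the "scheduling info:" block can be pulled out of the qstat -j output with a small filter. The following is a minimal sketch, assuming the block layout shown in the transcript above (a "scheduling info:" label followed by indented continuation lines). sched_reasons is a hypothetical helper name, not part of Grid Engine.

```shell
#!/bin/sh
# sched_reasons: hypothetical helper, not part of Grid Engine.
# Reads `qstat -j <job_id>` output on stdin and prints one scheduling
# reason per line, assuming the layout shown in the transcript above.
sched_reasons() {
    awk '
        /^scheduling info:/ {
            in_block = 1
            sub(/^scheduling info:[ \t]*/, "")   # drop the label
            if ($0 != "") print
            next
        }
        in_block && /^[ \t]+[^ \t]/ {            # indented continuation line
            sub(/^[ \t]+/, "")
            print
            next
        }
        { in_block = 0 }                         # any other line ends the block
    '
}
```

For example, "qstat -j 94 | sched_reasons" would list the reasons one per line, which can then be read next to the single line printed by "qalter -w p 94".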

Moreover, when a job fails because of an epilog, pe-start, pe-stop, or prolog failure, the job remains in the pending state, yet qalter reports
it as already running:

sgetest@vx86-v65xk-blr03 $ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@vx86-v65xa-blr03         BIP   0/0/1          -NA-     -NA-          au
---------------------------------------------------------------------------------
all.q@vx86-v65xf-blr03         BIP   0/0/4          0.00     sol-x86       s
---------------------------------------------------------------------------------
all.q@vx86-v65xk-blr03         BIP   0/0/4          0.01     sol-x86       s
---------------------------------------------------------------------------------
new.q@vx86-v65xa-blr03         BIP   0/0/1          -NA-     -NA-          au
---------------------------------------------------------------------------------
new.q@vx86-v65xf-blr03         BIP   0/0/1          0.00     sol-x86       E
---------------------------------------------------------------------------------
new.q@vx86-v65xk-blr03         BIP   0/0/1          0.01     sol-x86

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     21 0.55500 Sleeper    sgetest      qw    05/14/2009 01:15:23     1
     22 0.55500 Sleeper    sgetest      qw    05/14/2009 01:42:10     1
sgetest@vx86-v65xk-blr03 $ qstat -j 22
==============================================================
job_number:                 22
exec_file:                  job_scripts/22
submission_time:            Thu May 14 01:42:10 2009
owner:                      sgetest
uid:                        209480
group:                      sgegrp
gid:                        20000002
sge_o_home:                 /export/home/sgetest
sge_o_log_name:             sgetest
sge_o_path:                 /export/home/sgetest/test1/maintrunk/bin/sol-x86:/usr/bin:/bin:/sbin:/usr/sbin:/usr/ccs/bin:/usr/sfw/bin:/opt/csw/bin:/opt/SUNWspro/bin
sge_o_shell:                /usr/bin/csh
sge_o_workdir:              /export/home/sgetest/test1/maintrunk
sge_o_host:                 vx86-v65xk-blr03
account:                    sge
mail_list:                  sgetest@vx86-v65xk-blr03
notify:                     FALSE
job_name:                   Sleeper
jobshare:                   0
shell_list:                 NONE:/bin/sh
env_list:
job_args:                   1000000
script_file:                examples/jobs/sleeper.sh
parallel environment:  test range: 1
error reason    1:          05/14/2009 03:37:03 [209480:19665]: unable to find pe_start file "/bin/true1"
scheduling info:            queue instance "all.q@vx86-v65xf-blr03" dropped because it is temporarily not available
                            queue instance "all.q@vx86-v65xk-blr03" dropped because it is temporarily not available
                            queue instance "all.q@vx86-v65xa-blr03" dropped because it is temporarily not available
                            queue instance "new.q@vx86-v65xa-blr03" dropped because it is temporarily not available
                            queue instance "new.q@vx86-v65xf-blr03" dropped because it is full

sgetest@vx86-v65xk-blr03 $ qalter -w p 22
verification: job is already running
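
The discrepancy in this second case can be checked mechanically. The following is a minimal sketch, assuming the state column of the qstat job list and the "qalter -w p" message text shown in this report, and assuming that a running job always carries an "r" in its state. check_running_claim is a hypothetical helper name, not part of Grid Engine.

```shell
#!/bin/sh
# check_running_claim: hypothetical helper, not part of Grid Engine.
# $1 = job state as printed in the qstat state column (e.g. "qw")
# $2 = text printed by `qalter -w p <job_id>`
# Prints "mismatch" and returns 1 when qalter claims the job is running
# although its state contains no "r"; otherwise prints "ok".
check_running_claim() {
    state=$1
    qalter_msg=$2
    case $qalter_msg in
        *"job is already running"*)
            case $state in
                *r*) echo ok ;;
                *)   echo mismatch; return 1 ;;
            esac ;;
        *)
            echo ok ;;
    esac
}
```

For job 22 above, check_running_claim qw "$(qalter -w p 22)" would print "mismatch", because the job list shows the job in state qw while qalter reports it as already running.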

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=199471
