[GE users] Client JSV - bottleneck/ jsv-qrsh combination not working

joga Joachim.Gabler at sun.com
Fri Jul 24 10:24:14 BST 2009



Hi Shruti,

I think the problem is in the JSV script logic: it is missing a jsv_accept call for the case where the -b param is present with the value "n":

        if [ "`jsv_is_param b`" = "true" ]; then
                if [ "`jsv_get_param b`" = "y" ]; then
                        jsv_reject "Binary job is rejected."
                        return
                # ==> add this branch
                else
                        jsv_accept "Job with -b n option is accepted"
                # <==
                fi
        else
                jsv_accept "Job is accepted"
        fi
        return
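
The intended behaviour (reject "-b y", accept everything else) can be checked without a running cluster. The following is only a minimal sketch: it assumes stand-in versions of jsv_is_param, jsv_get_param, jsv_accept and jsv_reject (in a real script these come from the JSV include file under $SGE_ROOT), and the JSV_B_VALUE variable and the verify_binary_flag name are hypothetical, made up for this illustration.

```shell
#!/bin/sh
# Stand-ins for the helpers a real JSV script gets from the include file.
# They only model the "b" parameter; JSV_B_VALUE is a hypothetical knob:
# empty means "-b was not passed", otherwise it holds the value (y or n).
JSV_B_VALUE=""

jsv_is_param() {
    if [ -n "$JSV_B_VALUE" ]; then echo true; else echo false; fi
}
jsv_get_param() { echo "$JSV_B_VALUE"; }
jsv_accept()    { echo "ACCEPT $1"; }
jsv_reject()    { echo "REJECT $1"; }

# Same three-way decision as in the snippet above: every path ends in
# exactly one accept or reject, so the client always gets a result.
verify_binary_flag() {
    if [ "`jsv_is_param b`" = "true" ]; then
        if [ "`jsv_get_param b`" = "y" ]; then
            jsv_reject "Binary job is rejected."
        else
            jsv_accept "Job with -b n option is accepted"
        fi
    else
        jsv_accept "Job is accepted"
    fi
}
```

Setting JSV_B_VALUE to y, n or the empty string and calling verify_binary_flag exercises the three paths. The point of the exercise is that no path falls through without a result being reported.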

Best regards,

   Joachim

On 07/24/09 00:44, shruti_m wrote:
Hi Joachim,

I agree about "-b n".  I will verify that.

I don't think NFS is an issue, as the qrsh commands work fine in the absence of JSV.  Moreover, qrsh, the JSV script and the JSV include script are under the same NFS location.

In our case, this error is reproducible through a makefile which generates around 300 qrsh commands. Some of them pass through the JSV logic; the rest get no response.

Can you try with a similar setup at your end?
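
A similar setup could be approximated without the makefile by launching many copies of one command at once. This is a minimal sketch only: run_parallel is a hypothetical helper, and it assumes that a failed submission shows up as a non-zero exit status, which has not been verified for qrsh in this thread.

```shell
#!/bin/sh
# Launch COUNT copies of a command in the background, wait for all of
# them, and report how many returned a non-zero exit status.
run_parallel() {
    count=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of the given background job
        wait "$pid" || failed=$((failed + 1))
    done
    echo "launched=$count failed=$failed"
}
```

For example, "run_parallel 300" followed by one of the qrsh command lines from the makefile would start 300 submissions concurrently and print a single summary line.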

Thanks,
Shruti

From: Joachim.Gabler at sun.com
Sent: Thursday, July 23, 2009 8:24 AM
To: users at gridengine.sunsource.net
Cc: Omar.Hassaine at Sun.COM; Joe Fu; Mukund Ambarge; Andy Schwierskott
Subject: Re: [GE users] Client JSV - bottleneck/ jsv-qrsh combination not working

Hi Shruti,

qrsh jobs being rejected by your JSV script is the expected behaviour: qrsh jobs are binary jobs by default.
Submitting them with the -b n option should get them accepted.

I cannot reproduce these protocol errors (got no response ...).

On which architecture are you running qsub?
Which SGE version is this?
I assume both the JSV script and SGE_ROOT are on NFS.
The JSV script sources an include file from $SGE_ROOT (via NFS).
Is it possible that NFS is really slow on these filesystems,
and that the JSV therefore runs into some (maybe too short) timeout?
Maybe running
time qsub -jsv ....
gives some insight?
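
Where the shell's time keyword is awkward to use from a makefile, elapsed time can also be measured with a small wrapper. This is a minimal sketch; time_command is a hypothetical helper name, and it only has whole-second resolution because it relies on "date +%s".

```shell
#!/bin/sh
# Run a command, discard its output, and print how many whole seconds
# it took together with its exit status.
time_command() {
    start=$(date +%s)
    if "$@" >/dev/null 2>&1; then
        status=0
    else
        status=$?
    fi
    end=$(date +%s)
    echo "elapsed=$((end - start))s status=$status"
}
```

Wrapping the submit command shows the total client-side time, and wrapping a plain read of the JSV script and its include file (for example with cat) gives a rough idea of how long NFS alone takes. The location of the include file under $SGE_ROOT depends on the installation.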

Best regards,

  Joachim

shruti_m wrote:

Hi,

I want some help in understanding an issue related to JSV.

It looks like there is a bottleneck issue related to JSV. Also, JSV shows unreliable results when used with the qrsh command.


The same JSV script, for similar SGE commands, produces different results with the qrsh command. With qsub it works fine, as shown at the end of this email.


For some, it goes through the logic and accepts/rejects the job.

For some, it does not even get started: it just shows the START line in the JSV log file and then reports the error "got no response from JSV script".

For some, it gets started, but does not reach the last stage of accept/reject.

This is a client level JSV.

The makefile uses commands similar to the ones below. Note that there is no "-b" param in them, yet the jobs are rejected and the log files show the "b" param set to yes.
Same JSV logic works fine with qsub.


qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 0\"" | tee log/ptsin.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 0\"" | tee log/ptsin.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 1\"" | tee log/ptsit.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 0; set tm 0\"" | tee log/ptn.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 1\"" | tee log/ptsit.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 0; set tm 0\"" | tee log/ptn.log

Scenario 1 - JSV accepting/rejecting job - FINE
/remote/idshome/shruti/scripts/jsv_b.sh started on Wed Jul 22 15:24:01 PDT 2009

This file contains logging output from a GE JSV script. Lines beginning
with >>> contain the data which was send by a command line client or
sge_qmaster to the JSV script. Lines beginning with <<< contain data
which is send for this JSV script to the client or sge_qmaster

>>> START
<<< STARTED
>>> PARAM VERSION 1.0
>>> PARAM CONTEXT client
>>> PARAM CLIENT qrsh
>>> PARAM USER subbuy
>>> PARAM GROUP synopsys
>>> PARAM CMDNAME module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x "set si 1; set tm 1"
>>> PARAM CMDARGS 0
>>> PARAM b y
>>> PARAM cwd /remote/us01dwp009/up3_porting/csm65lpe_virage/express/dev/top/qor/signoff/up3_8_top/6_02_00_00/v1p08_t125_rcb_t125
>>> PARAM e /dev/null
>>> PARAM l_hard health=1,os_version=WS4.0,64=1
>>> PARAM M subbuy at ecsadmin
>>> PARAM N module
>>> PARAM o /dev/null
>>> PARAM P bnormal
>>> BEGIN
<<< RESULT STATE REJECT Binary job is rejected.
/remote/idshome/shruti/scripts/jsv_b.sh is terminating on Wed Jul 22 15:24:23 PDT 2009

Scenario 2 - JSV getting started, but not getting completed
This file contains logging output from a GE JSV script. Lines beginning
with >>> contain the data which was send by a command line client or
sge_qmaster to the JSV script. Lines beginning with <<< contain data
which is send for this JSV script to the client or sge_qmaster

>>> START
<<< STARTED
>>> PARAM VERSION 1.0
>>> PARAM CONTEXT client
>>> PARAM CLIENT qrsh
>>> PARAM USER subbuy
>>> PARAM GROUP synopsys
>>> PARAM CMDNAME module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x "set si 1; set tm 0"
>>> PARAM CMDARGS 0
>>> PARAM b y
>>> PARAM cwd /remote/us01dwp009/up3_porting/csm65lpe_virage/express/dev/top/qor/signoff/up3_8_top/6_02_00_00/v1p32_t125_rcb_t125
>>> PARAM e /dev/null
>>> PARAM l_hard health=1,os_version=WS4.0,64=1
>>> PARAM M subbuy at ecsadmin
>>> PARAM N module
>>> PARAM o /dev/null
>>> PARAM P bnormal
>>> BEGIN

Scenario 3 - JSV not getting started
/remote/idshome/shruti/scripts/jsv_b.sh started on Wed Jul 22 15:23:45 PDT 2009

This file contains logging output from a GE JSV script. Lines beginning
with >>> contain the data which was send by a command line client or
sge_qmaster to the JSV script. Lines beginning with <<< contain data
which is send for this JSV script to the client or sge_qmaster

>>> START
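
With around 300 log files, sorting them into these three scenarios by hand is tedious. The following is a minimal sketch of how that could be automated, based only on the ">>>" and "<<<" line prefixes visible in the logs above; classify_jsv_log and its category names are hypothetical.

```shell
#!/bin/sh
# Report how far the JSV protocol exchange got in one log file,
# checking for the latest stage first.
classify_jsv_log() {
    log=$1
    if grep -q '^<<< RESULT STATE' "$log"; then
        echo "completed"                 # Scenario 1
    elif grep -q '^>>> BEGIN' "$log"; then
        echo "started, no result"        # Scenario 2
    elif grep -q '^<<< STARTED' "$log"; then
        echo "handshake only"
    elif grep -q '^>>> START$' "$log"; then
        echo "no response to START"      # Scenario 3
    else
        echo "no protocol traffic"
    fi
}
```

Running it over every log file in a loop and counting the results per category would show whether the failures cluster at one particular stage.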

Error message extract
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
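
The ratio of the two messages is easier to see as a count. This is a minimal sketch that tallies them from captured output on standard input; tally_messages is a hypothetical helper and it matches only the two message texts shown in the extract above, which contains 30 "got no response" lines and 22 "Binary job is rejected." lines.

```shell
#!/bin/sh
# Count the two kinds of client messages seen in the extract.
# Reads captured output on standard input and prints one summary line.
tally_messages() {
    awk '
        /^got no response from JSV script/ { noresp++ }
        /^Binary job is rejected\./        { rejected++ }
        END { printf "no_response=%d rejected=%d\n", noresp, rejected }
    '
}
```

It can be fed from a saved file, for example "tally_messages < make.log", where make.log is a placeholder name for wherever the makefile output was captured.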


I did a test run to compare qsub vs qrsh. It looks like JSV works fine with qsub but is unreliable with qrsh.

ecsadmin{shruti}194: qsub -jsv ~shruti/scripts/jsv_b.sh -P bnormal /remote/idshome/shruti/tmp/try.sh
Your job 1406141 ("try.sh") has been submitted
ecsadmin{shruti}195: qsub -jsv ~shruti/scripts/jsv_b.sh -b y -P bnormal /remote/idshome/shruti/tmp/try.sh
Unable to run job: Binary job is rejected..
Exiting.
ecsadmin{shruti}196: qrsh -jsv ~shruti/scripts/jsv_b.sh -P bnormal /remote/idshome/shruti/tmp/try.sh
Binary job is rejected.
ecsadmin{shruti}197: qrsh -jsv ~shruti/scripts/jsv_b.sh -b y -P bnormal /remote/idshome/shruti/tmp/try.sh
Binary job is rejected.







