[GE users] Client JSV - bottleneck/ jsv-qrsh combination not working

shruti_m shruti at synopsys.com
Thu Jul 23 23:44:39 BST 2009



Hi Joachim,

I agree about "-b n". I will verify that.

I don't think NFS is the issue, as the qrsh commands work fine in the absence of JSV. Moreover, qrsh, the JSV script and the JSV include script are all under the same NFS location.

In our case, this error is reproducible through a makefile which generates around 300 qrsh commands. Some of them pass through the JSV logic; the rest get no response.

Can you try with a similar setup at your end?

Thanks,
Shruti

From: Joachim.Gabler at sun.com [mailto:Joachim.Gabler at sun.com]
Sent: Thursday, July 23, 2009 8:24 AM
To: users at gridengine.sunsource.net
Cc: Omar.Hassaine at Sun.COM; Joe Fu; Mukund Ambarge; Andy Schwierskott
Subject: Re: [GE users] Client JSV - bottleneck/ jsv-qrsh combination not working

Hi Shruti,

qrsh jobs being rejected by your jsv script is the expected behaviour: qrsh jobs are binary jobs by default.
Submitting them with -b n option should make them get accepted.
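
A minimal sketch of the check under discussion, assuming the exchange looks like the JSV log quoted further down in this thread (START/STARTED, PARAM lines, BEGIN, RESULT STATE ...). The function name mini_jsv and the default of "n" when no "PARAM b" line arrives are assumptions for illustration. This is not the jsv_b.sh script from the thread, and a real script would normally rely on the helper functions from the JSV include file rather than parse stdin by hand.

```shell
#!/bin/sh
# mini_jsv: hypothetical stand-in for a JSV script. It reads the protocol
# lines on stdin and writes its replies to stdout.
mini_jsv() {
    binary="n"    # assumed default if the client never sends "PARAM b"
    while read -r cmd name value; do
        case "$cmd" in
            START)
                binary="n"
                echo "STARTED"
                ;;
            PARAM)
                # Remember the value of the "b" (binary) parameter only.
                if [ "$name" = "b" ]; then
                    binary="$value"
                fi
                ;;
            BEGIN)
                if [ "$binary" = "y" ]; then
                    echo "RESULT STATE REJECT Binary job is rejected."
                else
                    echo "RESULT STATE ACCEPT Job is accepted."
                fi
                ;;
        esac
    done
}

# qrsh-like request: "PARAM b y" arrives although -b was not given.
printf 'START\nPARAM CLIENT qrsh\nPARAM b y\nBEGIN\n' | mini_jsv
# The same request as it is assumed to look when submitted with "-b n".
printf 'START\nPARAM CLIENT qrsh\nPARAM b n\nBEGIN\n' | mini_jsv
```

With this logic the first request is rejected and the second is accepted, which matches the behaviour described above for qrsh with and without "-b n".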

I cannot reproduce these protocol errors (got no response ...).

On which architecture are you running qsub?
Which SGE version is this?
I assume both the jsv script, as well as the SGE_ROOT are on nfs.
The jsv script sources an include from $SGE_ROOT (via nfs).
Is it possible that you experience really slow NFS on these filesystems?
And therefore the jsv runs into some (maybe too short) timeout?
Maybe running
time qsub -jsv ....
gives some insight?
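
One possible way to collect such timings over several runs is sketched below. The helper name time_runs is hypothetical, the resolution is whole seconds only (it relies on `date +%s`, which is widely available but assumed here), and the paths in the comment are placeholders rather than the ones from this thread.

```shell
#!/bin/sh
# time_runs COUNT CMD [ARGS...]
# Run CMD COUNT times and print the exit status and elapsed whole seconds
# of each run. The output of CMD itself is discarded.
time_runs() {
    count=$1
    shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        status=$?
        end=$(date +%s)
        echo "run $i: exit=$status elapsed=$((end - start))s"
        i=$((i + 1))
    done
}

# Demonstration with a placeholder command. On a submit host this could
# instead be something like:
#   time_runs 5 qsub -jsv /path/to/jsv_script.sh /path/to/job.sh
time_runs 3 true
```

Runs that take much longer than the rest, or that fail while others succeed, would point towards a timeout rather than a logic problem in the script.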

Best regards,

  Joachim

shruti_m wrote:

Hi,

I want some help in understanding an issue related to JSV.

It looks like there is a bottleneck issue related to JSV. Also, JSV used with the qrsh command shows unreliable results.


The same JSV script, for similar SGE commands, produces different results with the qrsh command. With qsub it works fine, as shown at the end of this email.


For some, it goes through the logic and accepts/rejects the job.

For some, it does not even get started: it just shows the START line in the JSV log file and then reports the error "got no response from JSV script".

For some, it gets started, but does not reach the final accept/reject stage.

This is a client level JSV.

The makefile uses commands similar to those below. Note that there is no "-b" parameter in them, yet the jobs are rejected and the log files show the "b" parameter set to yes.
Same JSV logic works fine with qsub.


qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 0\"" | tee log/ptsin.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 0\"" | tee log/ptsin.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 1\"" | tee log/ptsit.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 0; set tm 0\"" | tee log/ptn.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 1; set tm 1\"" | tee log/ptsit.log
qrsh -jsv /remote/idshome/shruti/scripts/jsv_b.sh -now n -cwd -V -P bnormal -l os_version=WS4.0,64=1 "module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x \"set si 0; set tm 0\"" | tee log/ptn.log
Scenario 1 - JSV accepting/rejecting job - FINE
/remote/idshome/shruti/scripts/jsv_b.sh started on Wed Jul 22 15:24:01 PDT 2009

This file contains logging output from a GE JSV script. Lines beginning
with >>> contain the data which was send by a command line client or
sge_qmaster to the JSV script. Lines beginning with <<< contain data
which is send for this JSV script to the client or sge_qmaster

>>> START
<<< STARTED
>>> PARAM VERSION 1.0
>>> PARAM CONTEXT client
>>> PARAM CLIENT qrsh
>>> PARAM USER subbuy
>>> PARAM GROUP synopsys
>>> PARAM CMDNAME module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x "set si 1; set tm 1"
>>> PARAM CMDARGS 0
>>> PARAM b y
>>> PARAM cwd /remote/us01dwp009/up3_porting/csm65lpe_virage/express/dev/top/qor/signoff/up3_8_top/6_02_00_00/v1p08_t125_rcb_t125
>>> PARAM e /dev/null
>>> PARAM l_hard health=1,os_version=WS4.0,64=1
>>> PARAM M subbuy at ecsadmin
>>> PARAM N module
>>> PARAM o /dev/null
>>> PARAM P bnormal
>>> BEGIN
<<< RESULT STATE REJECT Binary job is rejected.
/remote/idshome/shruti/scripts/jsv_b.sh is terminating on Wed Jul 22 15:24:23 PDT 2009
Scenario 2 - JSV getting started, but not getting completed
This file contains logging output from a GE JSV script. Lines beginning
with >>> contain the data which was send by a command line client or
sge_qmaster to the JSV script. Lines beginning with <<< contain data
which is send for this JSV script to the client or sge_qmaster

>>> START
<<< STARTED
>>> PARAM VERSION 1.0
>>> PARAM CONTEXT client
>>> PARAM CLIENT qrsh
>>> PARAM USER subbuy
>>> PARAM GROUP synopsys
>>> PARAM CMDNAME module purge all; module load pt ; pt_shell -f ../../../../script/signoff.tcl -x "set si 1; set tm 0"
>>> PARAM CMDARGS 0
>>> PARAM b y
>>> PARAM cwd /remote/us01dwp009/up3_porting/csm65lpe_virage/express/dev/top/qor/signoff/up3_8_top/6_02_00_00/v1p32_t125_rcb_t125
>>> PARAM e /dev/null
>>> PARAM l_hard health=1,os_version=WS4.0,64=1
>>> PARAM M subbuy at ecsadmin
>>> PARAM N module
>>> PARAM o /dev/null
>>> PARAM P bnormal
>>> BEGIN
Scenario 3 - JSV not getting started
/remote/idshome/shruti/scripts/jsv_b.sh started on Wed Jul 22 15:23:45 PDT 2009

This file contains logging output from a GE JSV script. Lines beginning
with >>> contain the data which was send by a command line client or
sge_qmaster to the JSV script. Lines beginning with <<< contain data
which is send for this JSV script to the client or sge_qmaster

>>> START
Error message extract
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
Binary job is rejected.
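
To quantify how often each outcome occurs in an extract like the one above, the two messages can be counted. This is a small sketch; the function name tally_jsv_errors is hypothetical, and it assumes the messages appear one per line exactly as shown.

```shell
#!/bin/sh
# tally_jsv_errors: read error output on stdin and count the two messages
# seen in the extract, one per line.
tally_jsv_errors() {
    awk '
        /got no response from JSV script/ { noresp++ }
        /Binary job is rejected\./        { rejected++ }
        END { printf "no_response=%d rejected=%d\n", noresp, rejected }
    '
}

# Demonstration on three sample lines taken from the extract.
printf '%s\n' \
    'got no response from JSV script "/remote/idshome/shruti/scripts/jsv_b.sh"' \
    'Binary job is rejected.' \
    'Binary job is rejected.' | tally_jsv_errors
```

For the three sample lines this prints `no_response=1 rejected=2`. Feeding it the collected stderr of a full makefile run would give the ratio of protocol failures to regular rejections.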


I did a test run to compare qsub and qrsh. It looks like JSV works fine with qsub but is unreliable with qrsh.

ecsadmin{shruti}194: qsub -jsv ~shruti/scripts/jsv_b.sh -P bnormal /remote/idshome/shruti/tmp/try.sh
Your job 1406141 ("try.sh") has been submitted
ecsadmin{shruti}195: qsub -jsv ~shruti/scripts/jsv_b.sh -b y -P bnormal /remote/idshome/shruti/tmp/try.sh
Unable to run job: Binary job is rejected..
Exiting.
ecsadmin{shruti}196: qrsh -jsv ~shruti/scripts/jsv_b.sh -P bnormal /remote/idshome/shruti/tmp/try.sh
Binary job is rejected.
ecsadmin{shruti}197: qrsh -jsv ~shruti/scripts/jsv_b.sh -b y -P bnormal /remote/idshome/shruti/tmp/try.sh
Binary job is rejected.






