[GE globus] [GE users] Problem sending jobs with globusrun-ws: Current job state: Unsubmitted

Esteban Freire Garcia esfreire at cesga.es
Wed Dec 5 17:38:29 GMT 2007



Hi Jeff,

OK, thanks for your help. Looking at the file
$GLOBUS_LOCATION/etc/globus-sge.conf, it looks fine.
I don't know what else to check.

[globus at svgd GRAM]$ cat $GLOBUS_LOCATION/etc/globus-sge.conf
log_path=/opt/cesga/sge60/default/common/reporting
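In case it's useful, here is a minimal, self-contained sketch of the check I'm doing, using a sample conf file (the real one is $GLOBUS_LOCATION/etc/globus-sge.conf). It accepts both key spellings seen in this thread (log_path= and logfile=), since which one the adapter reads may depend on the version; that is an assumption, not something I've verified:

```shell
#!/bin/sh
# Sketch: extract the reporting-file path from globus-sge.conf and report it.
# Writes a sample conf to /tmp so the script is runnable anywhere; on a real
# host, point CONF at $GLOBUS_LOCATION/etc/globus-sge.conf instead.

CONF=/tmp/globus-sge.conf.sample
printf 'log_path=/opt/cesga/sge60/default/common/reporting\n' > "$CONF"

# Accept either "log_path=" or "logfile=" (both variants appear in this thread).
REPORTING=$(sed -n -e 's/^log_path=//p' -e 's/^logfile=//p' "$CONF" | head -n1)
echo "configured reporting file: $REPORTING"

# On the real machine, also confirm the globus user can read the file:
# [ -r "$REPORTING" ] || echo "globus cannot read $REPORTING"
```

On the real host you would run the last, commented-out check as the globus user, since an unreadable reporting file would leave jobs stuck in Unsubmitted.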

Thanks,
Esteban
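P.S. For anyone who ends up reading the reporting file by hand: it is colon-separated, with the record type in the second field. The field positions below are inferred from the dump quoted later in this thread (for job_log records, field 4 looks like the state and field 5 the job id), so treat the layout as an assumption rather than a documented format:

```shell
#!/bin/sh
# Sketch: pull "jobid state" events out of an SGE reporting file.
# Two job_log records copied from this thread stand in for the real file.

cat > /tmp/reporting.sample <<'EOF'
1196873437:job_log:1196873437:sent:1417619:0:NONE:t:master:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:sent to execd
1196873448:job_log:1196873448:deleted:1417619:0:NONE:T:scheduler:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job deleted by schedd
EOF

# Field 2 = record type; for job_log records, field 5 = job id, field 4 = state.
awk -F: '$2 == "job_log" { print $5, $4 }' /tmp/reporting.sample
```

Run against $SGE_ROOT/default/common/reporting, this prints one "jobid state" line per event, which is essentially the state change the gt4 code needs to observe for a job to move past Unsubmitted.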

Jeff Porter wrote:
> Hi Esteban,
>
> By eye these two reporting file dumps look fine.  I don't know the details about sge variations - I'm using 6.0u10 - but the gt4 submission is clearly working.   
>
> The gt4 code looks for the reporting file by checking a globus config file. Specifically,
>
> $GLOBUS_LOCATION/etc/globus-sge.conf
>
> It should have the line:  
>
> logfile=/actual-path-to-sge/default/common/reporting
>
> Having that config file correct may depend on whether you have $SGE_ROOT and $SGE_CELL defined in your shell during your globus install.
>
> Jeff
>
>
>   
>> Hi Jeff,
>>
>> Thanks for your answer. OK, I have the file
>> $SGE_ROOT/default/common/reporting.
>> No, we are not using ARCO. The only thing that I think may be
>> happening is that globus cannot read this file, but I tested reading
>> it as user "globus" and as the user who sent the job, and I could
>> read it without any problem. Is there any place where I can tell
>> globus to read this file?
>>
>> Below is the output of the 'reporting' file after sending a job
>> through globus and sending a job with qsub.
>>
>> tail -f $SGE_ROOT/default/common/reporting
>> ----------------------------------------------------------------------
>> 1196873426:new_job:1196873426:1417619:-1:NONE:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:1024
>> 1196873426:job_log:1196873426:pending:1417619:-1:NONE::cyteduser:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:new job
>> 1196873437:job_log:1196873437:sent:1417619:0:NONE:t:master:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:sent to execd
>> 1196873437:host_consumable:compute-1-12.local:1196873437:X:num_proc=1.000000=1.000000,s_vmem=524288000.000000=1.300G
>> 1196873437:queue_consumable:pro_cytedgrid:compute-1-12.local:1196873437::num_proc=1.000000=1.000000,s_vmem=524288000.000000=1.000G,slots=1.000000=1.000000
>> 1196873437:job_log:1196873437:delivered:1417619:0:NONE:r:master:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job received by execd
>> 1196873437:acct:pro_cytedgrid:compute-1-12.local:cesga:cyteduser:sge_job_script.28406:1417619:sge:0:1196873426:1196873390:1196873391:0:0:1:0:0:0.000000:0:0:0:0:5330:0:0:0.000000:0:0:0:0:258:45:NONE:defaultdepartment:NONE:1:0:0.000000:0.000000:0.000000:-U pro_cytedgrid -l arch=i386,h_fsize=1G,h_stack=16M,num_proc=1,s_rt=3600,s_vmem=500M:0.000000:NONE:0.000000
>> 1196873437:job_log:1196873437:finished:1417619:0:NONE:r:execution daemon:compute-1-12.local:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job exited
>> 1196873437:job_log:1196873437:finished:1417619:0:NONE:r:master:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job waits for schedds deletion
>> 1196873437:host_consumable:compute-1-12.local:1196873437:X:num_proc=0.000000=1.000000,s_vmem=0.000000=1.300G
>> 1196873437:queue_consumable:pro_cytedgrid:compute-1-12.local:1196873437::num_proc=0.000000=1.000000,s_vmem=0.000000=1.000G,slots=0.000000=1.000000
>> 1196873448:job_log:1196873448:deleted:1417619:0:NONE:T:scheduler:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job deleted by schedd
>>
>> 1196873742:new_job:1196873742:1417621:-1:NONE:test.sh:esfreire:cesga::defaultdepartment:sge:1024
>> 1196873742:job_log:1196873742:pending:1417621:-1:NONE::esfreire:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:new job
>> 1196873753:job_log:1196873753:sent:1417621:0:NONE:t:master:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:sent to execd
>> 1196873753:host_consumable:compute-1-14.local:1196873753:X:num_proc=1.000000=1.000000,s_vmem=1073741824.000000=1.300G
>> 1196873753:queue_consumable:GRID:compute-1-14.local:1196873753::num_proc=1.000000=1.000000,s_vmem=1073741824.000000=2.000G,slots=1.000000=1.000000
>> 1196873753:job_log:1196873753:delivered:1417621:0:NONE:r:master:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job received by execd
>> 1196873754:acct:GRID:compute-1-14.local:cesga:esfreire:test.sh:1417621:sge:0:1196873742:1196873658:1196873658:0:0:0:0:0:0.000000:0:0:0:0:689:0:0:0.000000:0:0:0:0:202:2:NONE:defaultdepartment:NONE:1:0:0.000000:0.000000:0.000000:-U paralelo-gigabit,jmourino,esfreire,blades_dell -l arch=i386,h_fsize=1G,h_stack=16M,network=gigabit,num_proc=1,s_rt=3600,s_vmem=1G:0.000000:NONE:0.000000
>> 1196873754:job_log:1196873754:finished:1417621:0:NONE:r:execution daemon:compute-1-14.local:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job exited
>> 1196873754:job_log:1196873754:finished:1417621:0:NONE:r:master:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job waits for schedds deletion
>> 1196873754:host_consumable:compute-1-14.local:1196873754:X:num_proc=0.000000=1.000000,s_vmem=0.000000=1.300G
>> 1196873754:queue_consumable:GRID:compute-1-14.local:1196873754::num_proc=0.000000=1.000000,s_vmem=0.000000=2.000G,slots=0.000000=1.000000
>> 1196873764:job_log:1196873764:deleted:1417621:0:NONE:T:scheduler:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job deleted by schedd
>>
>> ----------------------------------------------------------------------
>>
>> On the other hand, we are using SGE 6.0u6
>>
>> Thanks,
>> Esteban
>>
>> Jeff Porter wrote:
>>> Hi Esteban,
>>>
>>> The logfile noted in the docs is the 'reporting' file:
>>> $SGE_ROOT/default/common/reporting. The gt4 c-code reads that file
>>> for job state information instead of calling qsub from sge.pm as is
>>> done for gt2. I wouldn't spend much time on the sge.pm file, as its
>>> use in gt4 is essentially just for submission. And the patch you say
>>> you applied before is directed at fixing gt2-specific details that
>>> break gt4 submissions.
>>>
>>> One other issue: if you are running ARCO, you may have this problem.
>>> I understand the dbwriter code deletes the reporting file with each
>>> read as its mechanism for checkpointing. Thus gt4 will never see the
>>> change in state through this file.
>>>
>>> Thanks, Jeff
>>>
>>>> Hi Melvin,
>>>>
>>>> Thanks for your answer. I had "reporting=true" but "joblog=false";
>>>> I have now changed this so that "joblog=true". After that, I
>>>> reinstalled the "London e-Science Centre" packages and ran
>>>> gpt-postinstall again, but unfortunately the job still does not
>>>> move past the "Unsubmitted" state:
>>>> ----------------------------------------------------------------------
>>>> [esfreire at svgd ~]$ globusrun-ws -submit -pft -T 10000 -s -S -factory svgd.cesga.es -Ft SGE -c /bin/hostname
>>>> Delegating user credentials...Done.
>>>> Submitting job...Done.
>>>> Job ID: uuid:1fe5c0d2-a31d-11dc-a78b-000423ac0723
>>>> Termination time: 12/06/2007 10:30 GMT
>>>> Current job state: Unsubmitted
>>>> ----------------------------------------------------------------------
>>>> One thing that I don't understand: the "London e-Science Centre"
>>>> page says, "Your SGE installation must also be configured with
>>>> support for the reporting logfile enabled, and that logfile must be
>>>> accessible from the server on which you are installing GT4". I don't
>>>> know which file this "logfile" is; I suppose it is
>>>> "$SGE_ROOT/default/spool/qmaster/messages".
>>>>
>>>> Another thing that indicates something is going wrong, I think, is
>>>> that the job only runs for about 1 second.
>>>>
>>>> ----------------------------------------------------------------------
>>>> [globus at svgd JobManager]$ qacct -j 1417415
>>>> ==============================================================
>>>> qname        pro_cytedgrid      
>>>> hostname     compute-1-12.local 
>>>> group        cesga              
>>>> owner        cyteduser          
>>>> project      NONE               
>>>> department   defaultdepartment  
>>>> jobname      sge_job_script.1784
>>>> jobnumber    1417415            
>>>> taskid       undefined
>>>> account      sge                
>>>> priority     0                  
>>>> qsub_time    Wed Dec  5 11:18:41 2007
>>>> start_time   Wed Dec  5 11:18:05 2007
>>>> end_time     Wed Dec  5 11:18:06 2007
>>>> granted_pe   NONE               
>>>> slots        1                  
>>>> failed       0   
>>>> exit_status  0                  
>>>> ru_wallclock 1           
>>>> ru_utime     0           
>>>> ru_stime     0           
>>>> ru_maxrss    0                  
>>>> ru_ixrss     0                  
>>>> ru_ismrss    0                  
>>>> ru_idrss     0                  
>>>> ru_isrss     0                  
>>>> ru_minflt    5328               
>>>> ru_majflt    0                  
>>>> ru_nswap     0                  
>>>> ru_inblock   0                  
>>>> ru_oublock   0                  
>>>> ru_msgsnd    0                  
>>>> ru_msgrcv    0                  
>>>> ru_nsignals  0                  
>>>> ru_nvcsw     262                
>>>> ru_nivcsw    44                 
>>>> cpu          0           
>>>> mem          0.000            
>>>> io           0.000            
>>>> iow          0.000            
>>>> maxvmem      0.000
>>>> ----------------------------------------------------------------------
>>>> I don't know what else to change.
>>>>
>>>>
>>>> Thank you very much,
>>>> Esteban
>>>>
>>>> Melvin Koh wrote:
>>>>> Have you enabled "reporting=true" and "joblog=true" in "qconf -mconf"?
>>>>>
>>>>> On Fri, 23 Nov 2007, Esteban Freire Garcia wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> First of all, thanks for answering me. We installed the patch
>>>>>> yesterday; unfortunately, we still have the same problem. We will
>>>>>> try looking at the jobmanager, because I think that for some
>>>>>> reason the jobmanager (sge.pm) is not seeing the job status
>>>>>> correctly, and it doesn't know when the job has finished.
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> [esfreire at svgd ~]$ globusrun-ws -submit -pft -s -S -F https://svgd.cesga.es:8443/wsrf/services/ManagedJobFactoryService -Ft SGE -c /bin/hostname
>>>>>> Delegating user credentials...Done.
>>>>>> Submitting job...Done.
>>>>>> Job ID: uuid:580a49d2-9923-11dc-9646-000423ac0723
>>>>>> Termination time: 11/23/2007 17:49 GMT
>>>>>> Current job state: Unsubmitted
>>>>>>
>>>>>> globusrun-ws: Error querying job state
>>>>>> ----------------------------------------------------------------------
>>>>>> Thank you very much,
>>>>>> Esteban
>>>>>>
>>>>>> Otheus (aka Timothy J. Shelling) wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Nov 20, 2007 9:13 AM, Esteban Freire Garcia
>>>>>>> <esfreire at cesga.es> wrote:
>>>>>>>
>>>>>>>     Hi,
>>>>>>>
>>>>>>>     We have installed 'gt4.0.5-x86_64_rhas_4-installer' on "Red Hat
>>>>>>>     Enterprise Linux ES release 4 (Nahant)".  ...
>>>>>>>
>>>>>>>     Now, we are trying to integrate Globus with SGE 6.0u6,
>>>>>>>
>>>>>>> I don't know if this will help or not. I had to patch gt4.0.2 to
>>>>>>> work with SGE 6.0u4 as follows:
>>>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: globus-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: globus-help at gridengine.sunsource.net





More information about the gridengine-users mailing list