[GE globus] [GE users] Problem sending jobs with globusrun-ws: Current job state: Unsubmitted

Esteban Freire Garcia esfreire at cesga.es
Wed Dec 19 16:48:31 GMT 2007



Hi Jeff,

Thanks for your help. Of course, I'll let you know whether the problem 
turns out to be the strptime behaviour, and when I get it fixed.

On the other hand, I tried with -T 120000 but unfortunately it didn't 
work and I had to cancel the job:

[esfreire at svgd ~]$ globusrun-ws -submit -pft -T 120000 -s -S -F  
https://svgd.cesga.es:8443/wsrf/services/ManagedJobFactoryService -Ft 
SGE -c /bin/hostname
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:ffa8ef36-ae4a-11dc-852b-000423ac0723
Termination time: 12/20/2007 15:56 GMT
Current job state: Unsubmitted
Canceling...Canceled.
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Operation was canceled
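As a sanity check along the lines Jeff suggested earlier in the thread, the local job ID that GRAM records in container.log can be extracted and compared with what qstat and the SGE reporting file show. A minimal sketch (the helper below is hypothetical, not part of GT4; the sample line is copied from the container.log excerpt quoted later in this message):

```shell
# Hypothetical helper, not part of GT4: pull the local job ID out of a
# GRAM "submitted with local job ID" line so it can be compared with
# the job-ID column of qstat and with the SGE reporting file.
# The sample line is copied from the container.log excerpt in this thread.
line="2007-12-10 14:31:41,469 INFO  exec.StateMachine [RunQueueThread_9,logJobSubmitted:3525] Job 3d7aee00-a724-11dc-9125-9fede8433907 submitted with local job ID '1418156'"

# Extract whatever sits between the quotes after "local job ID".
jobid=$(printf '%s\n' "$line" | sed -n "s/.*local job ID '\([^']*\)'.*/\1/p")
echo "$jobid"    # prints: 1418156

# With the ID in hand, one would then check the reporting file, e.g.:
#   grep ":${jobid}:" "$SGE_ROOT/default/common/reporting"
```

If the ID printed here never appears in the reporting file, that would point at the same job-id mismatch Jeff reproduced on his testbed.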



Cheers,
Esteban

Jeff Porter wrote:
> Hi Esteban,
>
> I apologize for not responding earlier - I was traveling last week.  I just now see that you may have a fix via the gt-users email list based on the behavior of strptime.  You'll let us know if that was the problem?
>
> Also, I see you are using "-T 10000" which, according to the docs, will cause a timeout if no response occurs for 10 seconds. I'm curious why you would choose to reduce the value from the default of 120000 (2 minutes).
>
> Thanks, Jeff
>
>   
>> Hi Jeff,
>>
>> I tried what you suggested, and it seems that the file
>> $GLOBUS_LOCATION/var/container-real.log (in my case,
>> $GLOBUS_LOCATION/var/container.log) is getting the correct job ID. I
>> sent two jobs, the first with Fork and the second with SGE, while
>> running tail -f on container.log. The full output is below:
>>
>> --------------------------------------------------------------------
>> [esfreire at svgd ~]$ globusrun-ws -submit -pft -T 10000 -s -S -F
>> https://svgd.cesga.es:8443/wsrf/services/ManagedJobFactoryService -Ft
>> Fork -c /bin/hostname
>> Delegating user credentials...Done.
>> Submitting job...Done.
>> Job ID: uuid:2bb348fc-a724-11dc-be3c-000423ac0723
>> Termination time: 12/11/2007 13:31 GMT
>> Current job state: Active
>> Current job state: CleanUp-Hold
>> svgd.cesga.es
>> Current job state: CleanUp
>> Current job state: Done
>> Destroying job...Done.
>> Cleaning up any delegated credentials...Done.
>> --------------------------------------------------------------------
>> [esfreire at svgd ~]$ globusrun-ws -submit -pft -T 10000 -s -S -F
>> https://svgd.cesga.es:8443/wsrf/services/ManagedJobFactoryService -Ft
>> SGE -c /bin/hostname
>> Delegating user credentials...Done.
>> Submitting job...Done.
>> Job ID: uuid:3d26462a-a724-11dc-9b6f-000423ac0723
>> Termination time: 12/11/2007 13:31 GMT
>> Current job state: Unsubmitted
>> --------------------------------------------------------------------
>> [globus at svgd ~]$ tail -f /usr/local/globus-4.0.5/var/container.log
>> 2007-12-10 14:09:19,342 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:11:19,540 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:13:19,624 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:15:19,668 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:17:19,788 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:19:19,867 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:21:19,910 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:23:20,020 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:25:20,096 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:27:20,149 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:29:20,257 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:31:11,582 INFO  exec.StateMachine
>> [RunQueueThread_9,logJobAccepted:3513] Job 2c122480-a724-11dc-9125-9fede8433907 accepted for local user 'cyteduser'
>> 2007-12-10 14:31:12,650 INFO  exec.StateMachine
>> [RunQueueThread_7,logJobSubmitted:3525] Job 2c122480-a724-11dc-9125-9fede8433907 submitted with local job ID '2cb28e70-a724-11dc-a101-000423ac0723:29749'
>> 2007-12-10 14:31:19,204 INFO  exec.StateMachine
>> [RunQueueThread_14,logJobSucceeded:3535] Job 2c122480-a724-11dc-9125-9fede8433907 finished successfully
>> 2007-12-10 14:31:20,298 ERROR monitoring.SchedulerEventGenerator
>> [Thread-9,run:198] SEG Terminated with Fault: globus_xio: Operation was canceled
>> 2007-12-10 14:31:40,757 INFO  exec.StateMachine
>> [RunQueueThread_17,logJobAccepted:3513] Job 3d7aee00-a724-11dc-9125-9fede8433907 accepted for local user 'cyteduser'
>> 2007-12-10 14:31:41,469 INFO  exec.StateMachine
>> [RunQueueThread_9,logJobSubmitted:3525] Job 3d7aee00-a724-11dc-9125-9fede8433907 submitted with local job ID '1418156'
>> --------------------------------------------------------------------
>>
>> [globus at svgd ~]$ qstat -s z -u cyteduser
>> job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
>> -------------------------------------------------------------------------------------------------------------
>> 1418156 0.00000 sge_job_sc cyteduser    qw    12/10/2007 14:31:41                                    1
>> --------------------------------------------------------------------
>>
>> Thank you very much,
>> Esteban
>>
>> R. Jeff Porter wrote:
>>> Hi Esteban,
>>>
>>> I checked the reporting file syntax and that was fine as expected.
>>> One thing to look at is whether the SGE job-id matches what globus
>>> thinks it is.  You should check for the job-id in the globus log
>>> file (our setup has it in $GLOBUS_LOCATION/var/container-real.log)
>>> via a message like:
>>>
>>> 2007-12-05 15:30:20,491 INFO  exec.StateMachine
>>> [RunQueueThread_9,logJobSubmitted:3525] Job 0aafcce0-a38a-11dc-aa2c-a94fc0e89ad8 submitted with local job ID '12345'
>>>
>>> In the above example, 12345 is the SGE job-id.
>>>
>>> On my testbed I actually hard-coded a bogus job-id into my sge.pm
>>> file and reproduced your symptoms as seen from my submit client:
>>>
>>> Delegating user credentials...Done.
>>> Submitting job...Done.
>>> Job ID: uuid:0a1700f0-a38a-11dc-8a02-00304889ddce
>>> Termination time: 12/06/2007 23:30 GMT
>>> Current job state: Unsubmitted
>>>
>>> So having a job-id mismatch (globus log vs the reporting file) is
>>> one way to experience what you observe.
>>>
>>> Jeff
>>>
>>>
>>> On Wed, 2007-12-05 at 18:38 +0100, Esteban Freire Garcia wrote:
>>>
>>>> Hi Jeff,
>>>>
>>>> Ok. Thanks for your help. The file
>>>> $GLOBUS_LOCATION/etc/globus-sge.conf looks fine to me.
>>>> I don't know where else to look.
>>>>
>>>> [globus at svgd GRAM]$ cat $GLOBUS_LOCATION/etc/globus-sge.conf
>>>> log_path=/opt/cesga/sge60/default/common/reporting
>>>>
>>>> Thanks,
>>>> Esteban
>>>>
>>>> Jeff Porter wrote:
>>>>> Hi Esteban,
>>>>>
>>>>> By eye these two reporting file dumps look fine.  I don't know
>>>>> the details about sge variations - I'm using 6.0u10 - but the gt4
>>>>> submission is clearly working.
>>>>>
>>>>> The gt4 code looks for the reporting file by checking a globus
>>>>> config file. Specifically,
>>>>>
>>>>> $GLOBUS_LOCATION/etc/globus-sge.conf
>>>>>
>>>>> It should have the line:
>>>>>
>>>>> logfile=/actual-path-to-sge/default/common/reporting
>>>>>
>>>>> Having that config file correct may depend on whether you have
>>>>> $SGE_ROOT and $SGE_CELL defined in your shell during your globus
>>>>> install.
>>>>>
>>>>> Jeff
>>>>>
>>>>>> Hi Jeff,
>>>>>>
>>>>>> Thanks for your answer. Ok, I have the file
>>>>>> $SGE_ROOT/default/common/reporting.
>>>>>> No, we are not using ARCO. The only thing that I think may be
>>>>>> happening is that globus cannot read this file, but I tested
>>>>>> reading this file as user "globus" and as the user who sent the
>>>>>> job, and I could read it without any problem. Is there anywhere I
>>>>>> can tell globus to read this file?
>>>>>>
>>>>>> I put below the contents of the 'reporting' file after sending a
>>>>>> job through globus and sending a job with qsub.
>>>>>>
>>>>>> tail -f $SGE_ROOT/default/common/reporting
>>>>>> --------------------------------------------------------------------
>>>>>> 1196873426:new_job:1196873426:1417619:-1:NONE:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:1024
>>>>>> 1196873426:job_log:1196873426:pending:1417619:-1:NONE::cyteduser:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:new job
>>>>>> 1196873437:job_log:1196873437:sent:1417619:0:NONE:t:master:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:sent to execd
>>>>>> 1196873437:host_consumable:compute-1-12.local:1196873437:X:num_proc=1.000000=1.000000,s_vmem=524288000.000000=1.300G
>>>>>> 1196873437:queue_consumable:pro_cytedgrid:compute-1-12.local:1196873437::num_proc=1.000000=1.000000,s_vmem=524288000.000000=1.000G,slots=1.000000=1.000000
>>>>>> 1196873437:job_log:1196873437:delivered:1417619:0:NONE:r:master:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job received by execd
>>>>>> 1196873437:acct:pro_cytedgrid:compute-1-12.local:cesga:cyteduser:sge_job_script.28406:1417619:sge:0:1196873426:1196873390:1196873391:0:0:1:0:0:0.000000:0:0:0:0:5330:0:0:0.000000:0:0:0:0:258:45:NONE:defaultdepartment:NONE:1:0:0.000000:0.000000:0.000000:-U pro_cytedgrid -l arch=i386,h_fsize=1G,h_stack=16M,num_proc=1,s_rt=3600,s_vmem=500M:0.000000:NONE:0.000000
>>>>>> 1196873437:job_log:1196873437:finished:1417619:0:NONE:r:execution daemon:compute-1-12.local:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job exited
>>>>>> 1196873437:job_log:1196873437:finished:1417619:0:NONE:r:master:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job waits for schedds deletion
>>>>>> 1196873437:host_consumable:compute-1-12.local:1196873437:X:num_proc=0.000000=1.000000,s_vmem=0.000000=1.300G
>>>>>> 1196873437:queue_consumable:pro_cytedgrid:compute-1-12.local:1196873437::num_proc=0.000000=1.000000,s_vmem=0.000000=1.000G,slots=0.000000=1.000000
>>>>>> 1196873448:job_log:1196873448:deleted:1417619:0:NONE:T:scheduler:svgd.cesga.es:0:1024:1196873426:sge_job_script.28406:cyteduser:cesga::defaultdepartment:sge:job deleted by schedd
>>>>>>
>>>>>> 1196873742:new_job:1196873742:1417621:-1:NONE:test.sh:esfreire:cesga::defaultdepartment:sge:1024
>>>>>> 1196873742:job_log:1196873742:pending:1417621:-1:NONE::esfreire:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:new job
>>>>>> 1196873753:job_log:1196873753:sent:1417621:0:NONE:t:master:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:sent to execd
>>>>>> 1196873753:host_consumable:compute-1-14.local:1196873753:X:num_proc=1.000000=1.000000,s_vmem=1073741824.000000=1.300G
>>>>>> 1196873753:queue_consumable:GRID:compute-1-14.local:1196873753::num_proc=1.000000=1.000000,s_vmem=1073741824.000000=2.000G,slots=1.000000=1.000000
>>>>>> 1196873753:job_log:1196873753:delivered:1417621:0:NONE:r:master:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job received by execd
>>>>>> 1196873754:acct:GRID:compute-1-14.local:cesga:esfreire:test.sh:1417621:sge:0:1196873742:1196873658:1196873658:0:0:0:0:0:0.000000:0:0:0:0:689:0:0:0.000000:0:0:0:0:202:2:NONE:defaultdepartment:NONE:1:0:0.000000:0.000000:0.000000:-U paralelo-gigabit,jmourino,esfreire,blades_dell -l arch=i386,h_fsize=1G,h_stack=16M,network=gigabit,num_proc=1,s_rt=3600,s_vmem=1G:0.000000:NONE:0.000000
>>>>>> 1196873754:job_log:1196873754:finished:1417621:0:NONE:r:execution daemon:compute-1-14.local:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job exited
>>>>>> 1196873754:job_log:1196873754:finished:1417621:0:NONE:r:master:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job waits for schedds deletion
>>>>>> 1196873754:host_consumable:compute-1-14.local:1196873754:X:num_proc=0.000000=1.000000,s_vmem=0.000000=1.300G
>>>>>> 1196873754:queue_consumable:GRID:compute-1-14.local:1196873754::num_proc=0.000000=1.000000,s_vmem=0.000000=2.000G,slots=0.000000=1.000000
>>>>>> 1196873764:job_log:1196873764:deleted:1417621:0:NONE:T:scheduler:svgd.cesga.es:0:1024:1196873742:test.sh:esfreire:cesga::defaultdepartment:sge:job deleted by schedd
>>>>>> --------------------------------------------------------------------
>>>>>>
>>>>>> On the other hand, we are using SGE 6.0u6
>>>>>>
>>>>>> Thanks,
>>>>>> Esteban
>>>>>>
>>>>>> Jeff Porter wrote:
>>>>>>> Hi Esteban,
>>>>>>>
>>>>>>> the logfile noted in the docs is the 'reporting' file:
>>>>>>> $SGE_ROOT/default/common/reporting.  The gt4 c-code reads that
>>>>>>> file for job state information instead of calling qsub from
>>>>>>> sge.pm as is done for gt2.  I wouldn't spend much time on the
>>>>>>> sge.pm file as its use in gt4 is essentially just for
>>>>>>> submission.  And the patch you say you applied before is
>>>>>>> directed at fixing gt2-specific details that break gt4
>>>>>>> submissions.
>>>>>>>
>>>>>>> One other issue is if you are running ARCO you may have this
>>>>>>> problem. I understand the dbwriter code deletes the reporting
>>>>>>> file with each read as its mechanism for checkpointing. Thus
>>>>>>> gt4 will never see the change in state through this file.
>>>>>>>
>>>>>>> Thanks, Jeff
>>>>>>>
>>>>>>>> Hi Melvin,
>>>>>>>>
>>>>>>>> Thanks for your answer. I had "reporting=true" but
>>>>>>>> "joblog=false"; I have now changed this so that "joblog=true".
>>>>>>>> After that, I reinstalled the "London e-Science Centre"
>>>>>>>> packages and ran gpt-postinstall again, but unfortunately the
>>>>>>>> job still does not get past the "Unsubmitted" state:
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> [esfreire at svgd ~]$ globusrun-ws -submit -pft -T 10000 -s -S
>>>>>>>> -factory svgd.cesga.es -Ft SGE -c /bin/hostname
>>>>>>>> Delegating user credentials...Done.
>>>>>>>> Submitting job...Done.
>>>>>>>> Job ID: uuid:1fe5c0d2-a31d-11dc-a78b-000423ac0723
>>>>>>>> Termination time: 12/06/2007 10:30 GMT
>>>>>>>> Current job state: Unsubmitted
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> One thing that I don't understand: the "London e-Science
>>>>>>>> Centre" page says, "Your SGE installation must also be
>>>>>>>> configured with support for the reporting logfile enabled, and
>>>>>>>> that logfile must be accessible from the server on which you
>>>>>>>> are installing GT4". I don't know which "logfile" this is; I
>>>>>>>> suppose it is "$SGE_ROOT/default/spool/qmaster/messages".
>>>>>>>>
>>>>>>>> Another thing that indicates something is going wrong, I
>>>>>>>> think, is that the job only runs for about 1 second.
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> [globus at svgd JobManager]$ qacct -j 1417415
>>>>>>>> ==============================================================
>>>>>>>> qname        pro_cytedgrid      
>>>>>>>> hostname     compute-1-12.local 
>>>>>>>> group        cesga              
>>>>>>>> owner        cyteduser          
>>>>>>>> project      NONE               
>>>>>>>> department   defaultdepartment  
>>>>>>>> jobname      sge_job_script.1784
>>>>>>>> jobnumber    1417415            
>>>>>>>> taskid       undefined
>>>>>>>> account      sge                
>>>>>>>> priority     0                  
>>>>>>>> qsub_time    Wed Dec  5 11:18:41 2007
>>>>>>>> start_time   Wed Dec  5 11:18:05 2007
>>>>>>>> end_time     Wed Dec  5 11:18:06 2007
>>>>>>>> granted_pe   NONE               
>>>>>>>> slots        1                  
>>>>>>>> failed       0   
>>>>>>>> exit_status  0                  
>>>>>>>> ru_wallclock 1           
>>>>>>>> ru_utime     0           
>>>>>>>> ru_stime     0           
>>>>>>>> ru_maxrss    0                  
>>>>>>>> ru_ixrss     0                  
>>>>>>>> ru_ismrss    0                  
>>>>>>>> ru_idrss     0                  
>>>>>>>> ru_isrss     0                  
>>>>>>>> ru_minflt    5328               
>>>>>>>> ru_majflt    0                  
>>>>>>>> ru_nswap     0                  
>>>>>>>> ru_inblock   0                  
>>>>>>>> ru_oublock   0                  
>>>>>>>> ru_msgsnd    0                  
>>>>>>>> ru_msgrcv    0                  
>>>>>>>> ru_nsignals  0                  
>>>>>>>> ru_nvcsw     262                
>>>>>>>> ru_nivcsw    44                 
>>>>>>>> cpu          0           
>>>>>>>> mem          0.000            
>>>>>>>> io           0.000            
>>>>>>>> iow          0.000            
>>>>>>>> maxvmem      0.000
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> I don't know what else to change.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you very much,
>>>>>>>> Esteban
>>>>>>>>
>>>>>>>> Melvin Koh wrote:
>>>>>>>>> Have you enabled "reporting=true" and "joblog=true" in
>>>>>>>>> "qconf -mconf"?
>>>>>>>>> On Fri, 23 Nov 2007, Esteban Freire Garcia wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> First of all, thanks for answering me. We installed the
>>>>>>>>>> patch yesterday; unfortunately, we continue with the same
>>>>>>>>>> problem. We will try looking at the jobmanager, because I
>>>>>>>>>> think for some reason the jobmanager (sge.pm) is not seeing
>>>>>>>>>> the status of the job correctly, and it doesn't know when
>>>>>>>>>> the job has finished.
>>>>>>>>>>
>>>>>>>>>> --------------------------------------------------------------------
>>>>>>>>>> [esfreire at svgd ~]$ globusrun-ws -submit -pft -s -S -F
>>>>>>>>>> https://svgd.cesga.es:8443/wsrf/services/ManagedJobFactoryService -Ft
>>>>>>>>>> SGE -c /bin/hostname
>>>>>>>>>> Delegating user credentials...Done.
>>>>>>>>>> Submitting job...Done.
>>>>>>>>>> Job ID: uuid:580a49d2-9923-11dc-9646-000423ac0723
>>>>>>>>>> Termination time: 11/23/2007 17:49 GMT
>>>>>>>>>> Current job state: Unsubmitted
>>>>>>>>>>
>>>>>>>>>> globusrun-ws: Error querying job state
>>>>>>>>>> --------------------------------------------------------------------
>>>>>>>>>> Thank you very much,
>>>>>>>>>> Esteban
>>>>>>>>>>
>>>>>>>>>> Otheus (aka Timothy J. Shelling) wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On Nov 20, 2007 9:13 AM, Esteban Freire Garcia
>>>>>>>>>>> <esfreire at cesga.es <mailto:esfreire at cesga.es>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>     Hi,
>>>>>>>>>>>
>>>>>>>>>>>     We have installed 'gt4.0.5-x86_64_rhas_4-installer' on "Red
>>>>>>>>>>>     Hat Enterprise Linux ES release 4 (Nahant)".  ...
>>>>>>>>>>>     Now, we are trying to integrate Globus with SGE 6.0u6,
>>>>>>>>>>>
>>>>>>>>>>> I don't know if this will help or not. I had to patch
>>>>>>>>>>> gt4.0.2 to work with SGE 6.0u4 as follows:
>>>>>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: globus-unsubscribe at gridengine.sunsource.net
>>>>>>> For additional commands, e-mail: globus-help at gridengine.sunsource.net





More information about the gridengine-users mailing list