[GE users] User Time + System Time != Wall Clock Time

Azhar Ali Shah aas_lakyari at yahoo.com
Sun Apr 13 17:36:50 BST 2008



Hi, 
I am using rsh with the daemon-based smpd startup method (mpich2-1.0.7rc2). "ps -e f" gives:

5769     1  5768 /usr/SGE6/bin/lx24-x86/sge_qmaster
  5789     1  5789 /usr/SGE6/bin/lx24-x86/sge_schedd
  6337     1  6337 /usr/SGE6/bin/lx24-x86/sge_execd
 25736  6337 25736  \_ sge_shepherd-18 -bg
 25837 25736 25837  |   \_ -sh /usr/SGE6/default/spool/taramel/job_scripts/18
 25915 25837 25837  |       \_ mpiexec -n 4 -machinefile /tmp/18.1.all.q/machines
 25806  6337 25806  \_ sge_shepherd-18 -bg
 25807 25806 25807      \_ /usr/SGE6/utilbin/lx24-x86/rshd -l
 25813 25807 25813          \_ /usr/SGE6/utilbin/lx24-x86/qrsh_starter /usr/SGE6/
 25815 25813 25815              \_ /home/aas/local/mpich2_smpd/bin/smpd -port 200
 25916 25815 25815                  \_ /home/aas/local/mpich2_smpd/bin/smpd -port
 25917 25916 25815                      \_ /home/aas/par_procksi_Alone
 26641 25917 25815                      |   \_ ./fast /home/aas/workspace/AzharPe
 25918 25916 25815                      \_ /home/aas/par_procksi_Alone
 26640 25918 25815                          \_ ./fast /home/aas/workspace/AzharPe
...
 25772     1 25737 /usr/SGE6/bin/lx24-x86/qrsh -inherit taramel /home/aas/local/m
 25808 25772 25737  \_ /usr/SGE6/utilbin/lx24-x86/rsh -p 57419 taramel.cs.nott.ac
 25814 25808 25737      \_ [rsh] <defunct>
 25774     1 25737 /usr/SGE6/bin/lx24-x86/qrsh -inherit smeg /home/aas/local/mpic
 25817 25774 25737  \_ /usr/SGE6/utilbin/lx24-x86/rsh -p 33059 smeg.cs.nott.ac.uk
 25818 25817 25737      \_ [rsh] <defunct>
 25777     1 25737 /usr/SGE6/bin/lx24-x86/qrsh -inherit eomer /home/aas/local/mpi
 25819 25777 25737  \_ /usr/SGE6/utilbin/lx24-x86/rsh -p 33207 eomer.cs.nott.ac.u
 25820 25819 25737      \_ [rsh] <defunct>

but I still get no values for User Time and System Time; both are reported as zero:

Job 19 (mpich2.sh) Complete
 User             = aas
 Queue            = all.q at xxx
 Host             = taramel.cs.nott.ac.uk
 Start Time       = 04/13/2008 16:19:00
 End Time         = 04/13/2008 17:22:06
 User Time        = 00:00:00
 System Time      = 00:00:00
 Wallclock Time   = 01:03:06
 CPU              = 00:00:00
 Max vmem         = 10.074M
 Exit Status      = 0

Any ideas on how to change this behavior?

Thanks,
Azhar



Reuti <reuti at staff.uni-marburg.de> wrote:

Hi,

On 03.04.2008 at 12:24, Azhar Ali Shah wrote:
> Running a parallel job with MPICH2-1.0.7 + SGE demanding 4  
> processors on my cluster gives following statistics:
>
> Job 152 (DS1001-4P) Complete
> User = aas
> Queue = all.q at xxxx
> Host = smeg.cs.nott.ac.uk
> Start Time = 04/02/2008 20:07:37
> End Time = 04/03/2008 00:09:55
> User Time = 00:00:18
> System Time = 00:00:04
> Wallclock Time = 04:02:18
> CPU = 00:00:22
> Max vmem = 8.551M
> Exit Status = 0
>
> I wonder why the user time and system time are so small compared
> to the wall clock time. Before this, I ran the same task with the same
> data as a sequential job on a single machine, which gave the following statistics:
>
> Job 35 (batchjob.sh) Complete
> User = aas
> Queue = all.q at xxxx
> Host = smeg.cs.nott.ac.uk
> Start Time = 03/06/2008 17:01:34
> End Time = 03/08/2008 04:50:20
> User Time = 1:01:18:28
> System Time = 06:07:43
> Wallclock Time = 1:11:48:46
> CPU = 1:07:26:11
> Max vmem = 398.684M
> Exit Status = 0
>
> With 4 processors in the parallel job, I can accept the wall clock
> time as plausible, but I can't understand the values of User
> and System time in the parallel version above. Any thoughts?
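One thing the two reports do show is that their figures are internally consistent: in each, CPU equals User Time plus System Time. The short Python sketch below checks this; the helper name sge_seconds is hypothetical, and reading the fields as d:hh:mm:ss (or hh:mm:ss without a day field) is an assumption based on the values shown.

```python
def sge_seconds(text: str) -> int:
    """Convert an SGE report time ('d:hh:mm:ss' or 'hh:mm:ss') to seconds."""
    parts = [int(p) for p in text.split(":")]
    # hh:mm:ss has no day field, so pad it with a leading zero.
    while len(parts) < 4:
        parts.insert(0, 0)
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# (User Time, System Time, CPU) copied from the two quoted reports.
REPORTS = {
    "Job 152 (parallel)": ("00:00:18", "00:00:04", "00:00:22"),
    "Job 35 (sequential)": ("1:01:18:28", "06:07:43", "1:07:26:11"),
}

for name, (user, system, cpu) in REPORTS.items():
    user_plus_system = sge_seconds(user) + sge_seconds(system)
    print(f"{name}: User + System = {user_plus_system} s, CPU = {sge_seconds(cpu)} s")
```

Both lines print matching totals (22 s for job 152, 113171 s for job 35), so the parallel job's 22-second CPU value is simply the sum of the very small user and system times SGE recorded for it.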

These are the typical symptoms when your application is not tightly
integrated into SGE. Can you check with "ps -e f" that you are a)
using SGE's rsh command and b) all child processes are bound to
sge_execd (i.e. appear in its process tree)? Using the plain system
/usr/bin/rsh or ssh will otherwise lead to such behavior. If you need
ssh, you have to recompile SGE yourself to get a custom-built ssh that
includes the tight integration facility.
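Point b) can be checked mechanically by walking the PPID chain of a saved process listing. The Python sketch below assumes each row has PID, PPID and PGID columns followed by the command, as in Azhar's excerpt above (plain "ps -e f" prints different columns, so the exact ps options used there are an assumption). The names parse_listing and under_execd are hypothetical, and the rows are copied from the excerpt, truncated as posted.

```python
import re

# Rows taken from the listing above: PID, PPID, PGID, command (truncated as posted).
LISTING = r"""
 6337     1  6337 /usr/SGE6/bin/lx24-x86/sge_execd
25806  6337 25806  \_ sge_shepherd-18 -bg
25807 25806 25807      \_ /usr/SGE6/utilbin/lx24-x86/rshd -l
25813 25807 25813          \_ /usr/SGE6/utilbin/lx24-x86/qrsh_starter /usr/SGE6/
25815 25813 25815              \_ /home/aas/local/mpich2_smpd/bin/smpd -port 200
25916 25815 25815                  \_ /home/aas/local/mpich2_smpd/bin/smpd -port
25917 25916 25815                      \_ /home/aas/par_procksi_Alone
26641 25917 25815                      |   \_ ./fast /home/aas/workspace/AzharPe
25772     1 25737 /usr/SGE6/bin/lx24-x86/qrsh -inherit taramel /home/aas/local/m
"""

ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+\d+\s+(.*)$")


def parse_listing(text):
    """Map PID -> (PPID, command) for every row that starts with three numbers."""
    procs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # blank lines, "..." elisions
        pid, ppid, rest = match.groups()
        # Strip the tree-drawing prefix ("|", "\_") that the "f" option adds.
        command = re.sub(r"^[\s|\\_]+", "", rest)
        procs[int(pid)] = (int(ppid), command)
    return procs


def under_execd(pid, procs):
    """True if pid or one of its ancestors in the listing is sge_execd."""
    seen = set()
    while pid in procs and pid not in seen:
        seen.add(pid)
        ppid, command = procs[pid]
        if "sge_execd" in command:
            return True
        pid = ppid
    return False


procs = parse_listing(LISTING)
for pid in (26641, 25772):
    print(pid, procs[pid][1].split()[0], under_execd(pid, procs))
```

For these rows, the ./fast process (26641) traces back through par_procksi_Alone, smpd, qrsh_starter, rshd and sge_shepherd to sge_execd, while the qrsh -inherit client (25772) has PPID 1 and therefore no sge_execd ancestor in the listing.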

(BTW: the wallclock time looks more like you used 8 cores IMO)
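The 8-core estimate follows from the two wallclock times. Assuming both jobs did the same work (same task, same data, as stated in the question), the ratio of the sequential to the parallel wallclock time gives the observed speedup; under ideal linear scaling, 4 processes on 4 cores would give at most about 4x. A quick Python check, with a hypothetical helper sge_seconds that reads the time fields as d:hh:mm:ss or hh:mm:ss:

```python
def sge_seconds(text: str) -> int:
    """Convert an SGE report time ('d:hh:mm:ss' or 'hh:mm:ss') to seconds."""
    parts = [int(p) for p in text.split(":")]
    # hh:mm:ss has no day field, so pad it with a leading zero.
    while len(parts) < 4:
        parts.insert(0, 0)
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


sequential = sge_seconds("1:11:48:46")  # Job 35 wallclock, single process
parallel = sge_seconds("04:02:18")      # Job 152 wallclock, 4 processors requested
print(f"{sequential} s / {parallel} s = {sequential / parallel:.2f}x")
```

This prints a speedup of about 8.87x, roughly double what 4 cores could deliver under the ideal-scaling assumption, which is why 8 cores looks more plausible.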

-- Reuti
