[GE users] accounting and parallel jobs

reuti reuti at staff.uni-marburg.de
Thu Nov 12 12:42:39 GMT 2009


Hi,

On 12.11.2009, at 13:31, mlmersel wrote:

> I am using the mpich parallel libs. I followed the directions in
> your how-to on tight integration.
>
>
> My configuration looks like this:
>
> pe_name           mpich
> slots             999
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /storage/SGE-6.1u4/mpi/startmpi.sh -unique -catch_rsh \

why are you using -unique here? With an uneven distribution of slots,
this can result in a call to the wrong node, and the qrsh -inherit
command may then be blocked by SGE (see the sketch after the quoted
configuration below).

>                   $pe_hostfile
> stop_proc_args    /storage/SGE-6.1u4/mpi/stopmpi.sh
> allocation_rule   $round_robin
> control_slaves    TRUE
> job_is_first_task TRUE
> urgency_slots     min
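
If you drop -unique, the start_proc_args entry would read (a sketch
based on the paths in your configuration above):

  start_proc_args   /storage/SGE-6.1u4/mpi/startmpi.sh -catch_rsh \
                    $pe_hostfile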

ps -e f

(note: f without a leading -) will give you a nice tree view of the
running job's processes on each node.
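
For a tightly integrated job, the slave tasks will hang below
sge_execd/sge_shepherd instead of below a plain rshd or sshd. Roughly
like this (illustrative output only; PIDs, paths and the program name
are made up):

  3201 ?   Sl   0:02 /usr/sge/bin/lx24-amd64/sge_execd
  4711 ?   S    0:00  \_ sge_shepherd-42 -bg
  4712 ?   Ss   0:00      \_ -sh /var/spool/sge/node01/job_scripts/42
  4713 ?   S    0:01          \_ mpirun -np 4 ./a.out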

In case something like ssh is compiled into your application, you will
need:

export P4_RSHCOMMAND=rsh

in your jobscript.
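
A complete job script might then look like this (a minimal sketch;
the PE name is taken from your configuration, the program name is a
placeholder):

  #!/bin/sh
  #$ -pe mpich 4
  #$ -cwd

  # Make the MPICH ch_p4 device call rsh, so that the rsh wrapper
  # created by startmpi.sh -catch_rsh is picked up and the slaves
  # are started via qrsh -inherit, i.e. under SGE's control.
  export P4_RSHCOMMAND=rsh

  mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./a.out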

-- Reuti


> How can I check whether tight integration is really in effect?
>
>
>          Thank you,
>             Jerry
>
>
> <quote who="reuti">
>> Hi,
>>
>> On 10.11.2009, at 09:50, mlmersel wrote:
>>
>>> Hi Reuti:
>>>
>>>  I am using 6.1U4, tight integration.
>>
>> can you be more specific? What parallel lib are you using with which
>> startup method and what did you do to achieve a tight integration?
>> Did you monitor the running job on the nodes, so that the processes
>> all got the additional group id attached? Did you also check a
>> single job with "qacct -j <id>"?
>>
>> -- Reuti
>>
>>
>>>                          Best,
>>>                            Jerry
>>>
>>> <quote who="reuti">
>>>> On 09.11.2009, at 12:59, mlmersel wrote:
>>>>
>>>>> and the cpu time?
>>>>
>>>> For tightly integrated jobs you will get several entries in
>>>> `qacct`, unless you specify "accounting_summary TRUE" in the PE
>>>> configuration.
>>>>
>>>> This is the recorded time of actual CPU usage. It can be changed
>>>> to record reserved time instead (in `qconf -mconf`).
>>>>
>>>> There was a bug in 6.2, fixed in 6.2u1, where the builtin startup
>>>> method killed the slaves too early and their entries were
>>>> completely missing. Which version are you using, and which method
>>>> do you use to invoke the slaves?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>>
>>>>> <quote who="reuti">
>>>>>> On 09.11.2009, at 09:20, mlmersel wrote:
>>>>>>
>>>>>>> It is tightly integrated.
>>>>>>>
>>>>>>> <quote who="fy">
>>>>>>>> Jerry
>>>>>>>>
>>>>>>>> Is your parallel environment tightly integrated?
>>>>>>>> Loose integration is one reason for low CPU usage in
>>>>>>>> accounting. See:
>>>>>>>> http://gridengine.sunsource.net/howto/howto.html#Tight%20Integration%20of%20Parallel%20Libraries
>>>>>>
>>>>>> Wallclock is just the wallclock without multiplication by slots.
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>> cheers
>>>>>>>> Fred Youhanaie
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08/11/09 13:09, mlmersel wrote:
>>>>>>>>> Hi:
>>>>>>>>>
>>>>>>>>>   We have a group of users who have their own queue and run
>>>>>>>>> almost exclusively parallel jobs. The problem is that when I
>>>>>>>>> calculate the utilization per month as
>>>>>>>>> wall clock time / (secs in month * cores), I get ridiculously
>>>>>>>>> small numbers: 1%, 2%, 3%. I know this can't be correct.
>>>>>>>>>
>>>>>>>>> Is there a problem with the accounting when running parallel
>>>>>>>>> jobs? I am using gridengine 6.1u4.
>>>>>>>>>
>>>>>>>>>                         Thanks,
>>>>>>>>>                           Jerry
>>>>>>>>>