[GE users] accounting and parallel jobs

mlmersel Jerry.Mersel at weizmann.ac.il
Thu Nov 12 12:31:35 GMT 2009


Hi Reuti:

I am using the MPICH parallel libraries. I followed the directions in
your how-to on tight integration.


My configuration looks like this:

pe_name           mpich
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /storage/SGE-6.1u4/mpi/startmpi.sh  -unique -catch_rsh \
                  $pe_hostfile
stop_proc_args    /storage/SGE-6.1u4/mpi/stopmpi.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task TRUE
urgency_slots     min

How can I check whether tight integration is really in effect?
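
As a first check I was going to try something like this (only a
sketch; job id 4711 is just an example, not a real job):

# after the job has finished: one accounting entry per task,
# with plausible cpu values, should show up
qacct -j 4711

# on an execution node while the job runs: the MPI processes
# should sit below sge_shepherd, not below a stray rshd/sshd
ps -e --forest -o pid,ppid,command

Would that be a valid test, or is there a better one?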


         Thank you,
            Jerry


<quote who="reuti">
> Hi,
>
> On 10.11.2009 at 09:50, mlmersel wrote:
>
>> Hi Reuti:
>>
>>  I am using 6.1U4, tight integration.
>
> Can you be more specific? Which parallel library are you using, with
> which startup method, and what did you do to achieve tight
> integration? Did you monitor the running job on the nodes, so that
> all of its processes got the additional group id attached? Did you
> also check a single job with "qacct -j <id>"?
>
> -- Reuti
>
>
>>                          Best,
>>                            Jerry
>>
>> <quote who="reuti">
>>> On 09.11.2009 at 12:59, mlmersel wrote:
>>>
>>>> and the cpu time?
>>>
>>> For tightly integrated jobs you will get several entries in `qacct`,
>>> unless you specify "accounting_summary TRUE" in the PE configuration.
>>>
>>> The reported CPU time is the actually measured CPU usage; this can
>>> be changed to reserved usage instead (in `qconf -mconf`).
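>>>
>>> A sketch of the two places (example PE name "mpich"; check your
>>> actual configuration, and note that accounting_summary may not be
>>> available in every version):
>>>
>>> # qconf -mp mpich   -> one summarized accounting entry per job
>>> accounting_summary TRUE
>>>
>>> # qconf -mconf      -> book reserved instead of measured usage
>>> execd_params        ACCT_RESERVED_USAGE=true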
>>>
>>> There was a bug in 6.2, fixed in 6.2u1, where the builtin startup
>>> method killed the slaves too early and their accounting entries were
>>> completely missing. Which version are you using, and which method do
>>> you use to invoke the slaves?
>>>
>>> -- Reuti
>>>
>>>
>>>>
>>>> <quote who="reuti">
>>>>> On 09.11.2009 at 09:20, mlmersel wrote:
>>>>>
>>>>>> It is tightly integrated.
>>>>>>
>>>>>> <quote who="fy">
>>>>>>> Jerry
>>>>>>>
>>>>>>> Is your parallel environment tightly integrated?
>>>>>>> Loose integration is one reason for low CPU usage in the
>>>>>>> accounting. See:
>>>>>>> http://gridengine.sunsource.net/howto/howto.html#Tight%20Integration%20of%20Parallel%20Libraries
>>>>>
>>>>> Wallclock is just the wallclock, without multiplication by slots.
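>>>>>
>>>>> A made-up example to illustrate: a 16-slot job that runs 10 hours
>>>>> is recorded with a wallclock of 10 hours, not 160 CPU-hours. So a
>>>>> utilization figure computed from plain wallclock will
>>>>> underestimate parallel usage, unless you multiply each job's
>>>>> wallclock by its granted slots first.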
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>>>
>>>>>>> cheers
>>>>>>> Fred Youhanaie
>>>>>>>
>>>>>>>
>>>>>>> On 08/11/09 13:09, mlmersel wrote:
>>>>>>>> Hi:
>>>>>>>>
>>>>>>>>   We have a group of users who have their own queue and run
>>>>>>>> almost exclusively parallel jobs. The problem is that when I
>>>>>>>> calculate the utilization per month (wall clock time /
>>>>>>>> (secs in month * cores)) I get ridiculously small numbers:
>>>>>>>> 1%, 2%, 3%. I know this can't be correct.
>>>>>>>>
>>>>>>>> Is there a problem with the accounting when running parallel
>>>>>>>> jobs? I am using gridengine 6.1u4.
>>>>>>>>
>>>>>>>>                         Thanks,
>>>>>>>>                           Jerry
>>>>>>>>
