[GE users] Question about RQS(resource quota sets)

Esteban Freire esfreire at cesga.es
Fri Jun 13 13:34:26 BST 2008



Hi Reuti,

Indeed, sometimes the solution is too easy to see. My first mistake was 
not looking at the qmaster messages in the first place; I'm afraid I 
have lost points as a GE administrator :)

I created the resource_quotas directory, tried again, and it worked!!

# ls -l /usr/local/sge/pro/default/spool/qmaster | grep resource_quotas
drwxr-xr-x   2 root root   4096 Jun 13 14:22 resource_quotas


#  qconf -Arqs /usr/local/sge/pro/default/common/maxujobs
root added "maxujobs" to resource quota set list
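For the record, the manual fix above can also be scripted. A minimal sketch (the helper name and the parameterized spool path are my own; classic spooling and the qmaster spool directory from this thread are assumed):

```python
import os

def ensure_rqs_spool_dir(qmaster_spool):
    """Create the resource_quotas subdirectory under the qmaster
    spool directory if it is missing (hypothetical helper, not part
    of SGE itself)."""
    rqs_dir = os.path.join(qmaster_spool, "resource_quotas")
    os.makedirs(rqs_dir, exist_ok=True)  # no-op if it already exists
    return rqs_dir

# With the spool path from this thread (run as the SGE admin user):
# ensure_rqs_spool_dir("/usr/local/sge/pro/default/spool/qmaster")
```

Once the directory exists, "qconf -Arqs" can spool the RQS as shown above.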

Just one more question: should this directory be created by default by 
the installation, or do you need to add an extra option during 
installation to create it?

Thanks,
Esteban

Reuti wrote:
> On 13.06.2008 at 13:44, Esteban Freire wrote:
>
>> Hi Reuti,
>>
>> As far as I understand, I have the right permissions. On the cluster 
>> I'm using for the tests (a pre-production cluster), I installed GE as 
>> root and the manager user is root, so I understand I have the right 
>> permissions; besides, I'm executing the commands as root.
>>
>> # ls -l pro/default/spool/
>> total 4
>> drwxr-xr-x  20 root root 4096 Jun 13 13:26 qmaster
>>
>> # ls -l /usr/local/sge/pro/default/spool/qmaster
>> total 364
>> drwxr-xr-x   2 root root   4096 May  6 10:54 admin_hosts
>> drwxr-xr-x   2 root root   4096 May  6 10:54 calendars
>> drwxr-xr-x   2 root root   4096 Jun 12 13:21 centry
>> drwxr-xr-x   2 root root   4096 May  6 10:54 ckpt
>> drwxr-xr-x   2 root root   4096 Jun 11 14:05 cqueues
>> drwxr-xr-x   2 root root   4096 Jun 10 12:03 exec_hosts
>> -rw-r--r--   1 root root      6 Jun 13 13:28 heartbeat
>> drwxr-xr-x   2 root root   4096 May  6 10:54 hostgroups
>> drwxr-xr-x   2 root root   4096 Jun 12 13:23 jobs
>> drwxr-xr-x   2 root root   4096 Jun 12 13:23 job_scripts
>> -rw-r--r--   1 root root      4 Jun 13 13:26 jobseqnum
>> -rw-r--r--   1 root root     72 Jun 13 13:23 managers
>> -rw-r--r--   1 root root 272117 Jun 13 13:26 messages
>> -rw-r--r--   1 root root     72 Jun 13 13:23 operators
>> drwxr-xr-x   2 root root   4096 May  6 10:54 pe
>> drwxr-xr-x   2 root root   4096 May  6 10:54 projects
>> drwxr-xr-x  10 root root   4096 May  6 10:56 qinstances
>> -rw-r--r--   1 root root      6 Jun 13 13:26 qmaster.pid
>
> okay, please create a directory here:
>
> mkdir resource_quotas
>
> maybe it helps. - Reuti
>
>
>> drwxr-xr-x   2 root root   4096 May  6 10:54 schedd
>> drwxr-xr-x   2 root root   4096 May  6 10:56 submit_hosts
>> drwxr-xr-x   2 root root   4096 May  6 10:54 usermapping
>> drwxr-xr-x   2 root root   4096 Jun 13 13:24 users
>> drwxr-xr-x   2 root root   4096 Jun 13 13:23 usersets
>> drwxr-xr-x   2 root root   4096 May  6 10:54 zombies
>>
>> # touch /usr/local/sge/pro/default/spool/qmaster/test
>> # ls -l /usr/local/sge/pro/default/spool/qmaster/test
>> -rw-r--r--  1 root root 0 Jun 13 13:41 
>> /usr/local/sge/pro/default/spool/qmaster/test
>>
>>
>> # qconf -sm
>> root
>> # qconf -so
>> root
>>
>> I don't understand it.
>>
>> Thanks,
>> Esteban
>>
>> PS: I'm sorry for the previous empty e-mail; I hit the send button 
>> by mistake.
>>
>> Reuti wrote:
>>> On 13.06.2008 at 12:49, Esteban Freire wrote:
>>>
>>>> Hi,
>>>>
>>>> In the messages of the qmaster, I can see:
>>>>
>>>> 06/13/2008 12:41:31|qmaster|sa3-ce|C|error writing to file 
>>>> "resource_quotas/.maxujobs": No such file or directory
>>>> 06/13/2008 12:41:31|qmaster|sa3-ce|W|rule "default rule (spool 
>>>> dir)" in spooling context "classic spooling" failed writing an object
>>>
>>> Maybe a permission problem? Can the sgeadmin write to the 
>>> $SGE_ROOT/spool/qmaster directory?
>>>
>>> -- Reuti
>>>
>>>
>>>> I get the same error when editing an RQS by hand, and I also get 
>>>> it when I do it from qmon.
>>>>
>>>> Thanks,
>>>> Esteban
>>>>
>>>> Reuti wrote:
>>>>> Hi,
>>>>>
>>>>> Dunno - I can copy and paste your configuration and it works. 
>>>>> Do you see anything in the qmaster messages? Can you edit an 
>>>>> RQS by hand, i.e. "qconf -arqs blabla"?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>> On 13.06.2008 at 12:20, Esteban Freire wrote:
>>>>>
>>>>>> Hi Reuti, Chansup,
>>>>>>
>>>>>> I have the same GE version installed on three clusters (GE 6.1u3), 
>>>>>> and it doesn't work on any of them; I can't see where the problem 
>>>>>> is. I edited $SGE_ROOT/$SGE_CELL/common/bootstrap, changed 
>>>>>> admin_user to egeeadm, and restarted sgemaster; I also tried 
>>>>>> putting the same name in the file, but unfortunately it didn't 
>>>>>> work.
>>>>>>
>>>>>> # cat bootstrap
>>>>>> admin_user        egeeadm
>>>>>> default_domain    egee.cesga.es
>>>>>> ignore_fqdn       true
>>>>>> spooling_method   classic
>>>>>> spooling_lib      libspoolc
>>>>>> spooling_params   
>>>>>> /usr/local/sge/pro/default/common;/usr/local/sge/pro/default/spool/qmaster 
>>>>>>
>>>>>> binary_path       /usr/local/sge/pro/bin
>>>>>> qmaster_spool_dir /usr/local/sge/pro/default/spool/qmaster
>>>>>> security_mode     none
>>>>>>
>>>>>> # qconf -sm
>>>>>> egeeadm
>>>>>> root
>>>>>>
>>>>>>
>>>>>> # cat maxujobs
>>>>>> {
>>>>>>   name         maxujobs
>>>>>>   description  NONE
>>>>>>   enabled      true
>>>>>>   limit users ops to slots=10
>>>>>>   limit users opssgm to slots=10
>>>>>>   limit users dteam to slots=10
>>>>>>   limit users swetest to slots=2
>>>>>>   limit users cesga to slots=100
>>>>>>   limit users imath to slots=100
>>>>>>   limit users lhcb to slots=50
>>>>>>   limit users lhcbprd to slots=50
>>>>>>   limit users lhcbsgm to slots=50
>>>>>>   limit users compchem to slots=30
>>>>>>   limit users fusion to slots=60
>>>>>>   limit users biomed to slots=30
>>>>>>   limit users biomedsgm to slots=14
>>>>>>   limit users alice to slots=50
>>>>>>   limit users alicesgm to slots=4
>>>>>>   limit users atlas to slots=15
>>>>>>   limit users atlassgm to slots=3
>>>>>>   limit users cms to slots=10
>>>>>> }
>>>>>>
>>>>>> $ qconf -Arqs /usr/local/sge/pro/default/common/maxujobs
>>>>>> error writing object "maxujobs" to spooling database
>>>>>>
>>>>>>
>>>>>> I would appreciate any help.
>>>>>>
>>>>>> Thanks,
>>>>>> Esteban
>>>>>>
>>>>>> Reuti wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 12.06.2008 at 14:56, Esteban Freire wrote:
>>>>>>>
>>>>>>>> Hi Reuti,
>>>>>>>>
>>>>>>>> Maybe it's because I don't have any DB defined in the 
>>>>>>>> bootstrap file.
>>>>>>>>
>>>>>>>> [root@sa3-ce common]#  cat $SGE_ROOT/$SGE_CELL/common/bootstrap
>>>>>>>> admin_user        none
>>>>>>>> default_domain    egee.cesga.es
>>>>>>>> ignore_fqdn       true
>>>>>>>> spooling_method   classic
>>>>>>>> spooling_lib      libspoolc
>>>>>>>> spooling_params   
>>>>>>>> /usr/local/sge/pro/default/common;/usr/local/sge/pro/default/spool/qmaster 
>>>>>>>>
>>>>>>>> binary_path       /usr/local/sge/pro/bin
>>>>>>>> qmaster_spool_dir /usr/local/sge/pro/default/spool/qmaster
>>>>>>>> security_mode     none
>>>>>>>
>>>>>>> No, it's working for me with classic spooling too. Maybe it's 
>>>>>>> a permission problem? Or do you have "admin_user none"?
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Esteban
>>>>>>>>
>>>>>>>> Esteban Freire wrote:
>>>>>>>>> Reuti wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 12.06.2008 at 13:36, Esteban Freire wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Reuti,
>>>>>>>>>>>
>>>>>>>>>>> I must be doing something wrong, because I get an error 
>>>>>>>>>>> when I try to add a resource quota set from a file.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [esfreire@sa3-ce common]# cat maxjobs_rule
>>>>>>>>>>> {
>>>>>>>>>>> name maxujobs
>>>>>>>>>>> enabled true
>>>>>>>>>>> limit users ops to num_proc=10
>>>>>>>>>>> limit users opssgm to num_proc=10
>>>>>>>>>>> limit users dteam to num_proc=10
>>>>>>>>>>> limit users swetest to num_proc=2
>>>>>>>>>>> limit users cesga to num_proc=100
>>>>>>>>>>> limit users imath to num_proc=100
>>>>>>>>>>> limit users lhcb to num_proc=50
>>>>>>>>>>> limit users lhcbprd to num_proc=50
>>>>>>>>>>> limit users lhcbsgm to num_proc=50
>>>>>>>>>>> limit users compchem to num_proc=30
>>>>>>>>>>> limit users fusion to num_proc=60
>>>>>>>>>>> limit users biomed to num_proc=30
>>>>>>>>>>> limit users biomedsgm to num_proc=14
>>>>>>>>>>> limit users alice to num_proc=50
>>>>>>>>>>> limit users alicesgm to num_proc=4
>>>>>>>>>>> limit users atlas to num_proc=15
>>>>>>>>>>> limit users atlassgm to num_proc=3
>>>>>>>>>>> limit users cms to num_proc=10
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> This is exactly what I wouldn't do (as num_proc is a fixed 
>>>>>>>>>> feature of a particular machine); use slots instead. Users 
>>>>>>>>>> with the same slot limit can also be put on one line:
>>>>>>>>>>
>>>>>>>>>> limit users {ops, opssgm, dteam} to slots=10
>>>>>>>>>>
>>>>>>>>>> (watch out for the curly braces).
>>>>>>>>>>
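Since several of the users in the file above share the same limit, the per-user lines can be collapsed into this grouped syntax mechanically. A small illustrative sketch (the helper is hypothetical, not an SGE tool):

```python
def condensed_limits(user_slots):
    """Collapse (user, slots) pairs into RQS 'limit' lines, grouping
    users that share a slot limit into one {user,user,...} list.
    Groups appear in order of first appearance of each limit value."""
    groups = {}  # slots -> list of users (dict keeps insertion order)
    for user, slots in user_slots:
        groups.setdefault(slots, []).append(user)
    return ["limit users {%s} to slots=%d" % (",".join(users), slots)
            for slots, users in groups.items()]

# condensed_limits([("ops", 10), ("opssgm", 10), ("dteam", 10)])
# -> ['limit users {ops,opssgm,dteam} to slots=10']
```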
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have changed num_proc to slots, but I keep having the same 
>>>>>>>>> problem. I'm using GE 6.1u3. I tried this on another cluster 
>>>>>>>>> running GE 6.1u3 too, and got the same error. Maybe I'm 
>>>>>>>>> missing some step.
>>>>>>>>>
>>>>>>>>> [root@sa3-ce common]# cat maxjobs_rule
>>>>>>>>> {
>>>>>>>>> name maxujobs
>>>>>>>>> enabled true
>>>>>>>>> limit users ops to slots=10
>>>>>>>>> limit users opssgm to slots=10
>>>>>>>>> limit users dteam to slots=10
>>>>>>>>> limit users swetest to slots=2
>>>>>>>>> limit users cesga to slots=100
>>>>>>>>> limit users imath to slots=100
>>>>>>>>> limit users lhcb to slots=50
>>>>>>>>> limit users lhcbprd to slots=50
>>>>>>>>> limit users lhcbsgm to slots=50
>>>>>>>>> limit users compchem to slots=30
>>>>>>>>> limit users fusion to slots=60
>>>>>>>>> limit users biomed to slots=30
>>>>>>>>> limit users biomedsgm to slots=14
>>>>>>>>> limit users alice to slots=50
>>>>>>>>> limit users alicesgm to slots=4
>>>>>>>>> limit users atlas to slots=15
>>>>>>>>> limit users atlassgm to slots=3
>>>>>>>>> limit users cms to slots=10
>>>>>>>>> }
>>>>>>>>> [root@sa3-ce common]# qconf -Arqs maxjobs_rule
>>>>>>>>> error writing object "maxujobs" to spooling database
>>>>>>>>> [root@sa3-ce common]# qconf -help | head -n 1
>>>>>>>>> GE 6.1u3
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Esteban
>>>>>>>>>>
>>>>>>>>>>> [esfreire@sa3-ce common]# qconf -Arqs maxjobs_rule
>>>>>>>>>>> error writing object "maxujobs" to spooling database
>>>>>>>>>>
>>>>>>>>>> Which SGE version? For me it's working.
>>>>>>>>>>
>>>>>>>>>> -- Reuti
>>>>>>>>>>
>>>>>>>>>>> Surely I need to create this object, but how can I create 
>>>>>>>>>>> it? Could you help me, please?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Esteban
>>>>>>>>>>>
>>>>>>>>>>> Reuti wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 12.06.2008 at 12:25, Esteban Freire wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for answering so quickly. Yes, you're right: if I 
>>>>>>>>>>>>> used the num_proc rule, I would be making num_proc 
>>>>>>>>>>>>> consumable.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Indeed, as a GE administrator, I think these could be two 
>>>>>>>>>>>>> good RFEs to consider for a new GE release, i.e. being 
>>>>>>>>>>>>> able to configure the maximum number of jobs which can be 
>>>>>>>>>>>>> running/queued for each group. But it's only my opinion :)
>>>>>>>>>>>>
>>>>>>>>>>>> feel free to enter an issue: user/userlists for max_u_jobs.
>>>>>>>>>>>>
>>>>>>>>>>>> max_u_jobs        20,reuti=30,@bme=15
>>>>>>>>>>>>
>>>>>>>>>>>> could be the syntax.
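To make the suggestion concrete, a parser for this proposed value might look like the sketch below. Everything here is hypothetical: the syntax (a bare default, then comma-separated user=n or @group=n overrides) was only suggested in this thread and does not exist in SGE.

```python
def parse_max_u_jobs(spec):
    """Parse the proposed 'default,user=n,@group=n' value into a
    default limit plus per-user/per-group overrides (hypothetical
    syntax from this thread, not implemented in SGE)."""
    parts = [p.strip() for p in spec.split(",")]
    default = int(parts[0])           # bare number: the global default
    overrides = {}
    for part in parts[1:]:
        name, _, value = part.partition("=")
        overrides[name] = int(value)  # '@'-prefixed names are groups
    return default, overrides

# parse_max_u_jobs("20,reuti=30,@bme=15")
# -> (20, {'reuti': 30, '@bme': 15})
```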
>>>>>>>>>>>>
>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Esteban
>>>>>>>>>>>>>
>>>>>>>>>>>>> Reuti escribió:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am 12.06.2008 um 11:52 schrieb Esteban Freire:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm playing with RQS in order to limit the maximum 
>>>>>>>>>>>>>>> number of jobs which can be running/queued per group 
>>>>>>>>>>>>>>> instead of per user. Assuming I have created a 
>>>>>>>>>>>>>>> user_list for each group of users, I have eight 
>>>>>>>>>>>>>>> user_lists in total. I would like to configure a rule 
>>>>>>>>>>>>>>> establishing the maximum number of slots each group can 
>>>>>>>>>>>>>>> use. For example, I have the two rules below: one 
>>>>>>>>>>>>>>> limits by slots and the other by number of processors, 
>>>>>>>>>>>>>>> but in the end I would choose only one of them, because 
>>>>>>>>>>>>>>> in our case it's the same thing - users can only use 
>>>>>>>>>>>>>>> one processor per slot. Example:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> name maxujobs
>>>>>>>>>>>>>>> enabled true
>>>>>>>>>>>>>>> limit users userlist1 to slots=10
>>>>>>>>>>>>>>> limit users userlist2 to slots=20
>>>>>>>>>>>>>>> limit users userlist3 to slots=25
>>>>>>>>>>>>>>> limit users userlist4 to slots=50
>>>>>>>>>>>>>>> limit users userlist5 to slots=60
>>>>>>>>>>>>>>> limit users userlist6 to slots=70
>>>>>>>>>>>>>>> limit users userlist7 to slots=100
>>>>>>>>>>>>>>> limit users userlist8 to slots=200
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> correct.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> name maxujobs
>>>>>>>>>>>>>>> enabled true
>>>>>>>>>>>>>>> limit users userlist1 to num_proc=10
>>>>>>>>>>>>>>> limit users userlist2 to num_proc=20
>>>>>>>>>>>>>>> limit users userlist3 to num_proc=25
>>>>>>>>>>>>>>> limit users userlist4 to num_proc=50
>>>>>>>>>>>>>>> limit users userlist5 to num_proc=60
>>>>>>>>>>>>>>> limit users userlist6 to num_proc=70
>>>>>>>>>>>>>>> limit users userlist7 to num_proc=100
>>>>>>>>>>>>>>> limit users userlist8 to num_proc=200
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I wouldn't make num_proc consumable at all, as it's just 
>>>>>>>>>>>>>> a fixed feature of a node - number of available cores, 
>>>>>>>>>>>>>> set by SGE automatically.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With these rules, am I defining the total number of 
>>>>>>>>>>>>>>> slots/processors each group can use? In other words, 
>>>>>>>>>>>>>>> the maximum number of jobs which can be running for 
>>>>>>>>>>>>>>> each group in total.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For now you can only limit the number of slots this 
>>>>>>>>>>>>>> way. There is also an RFE to limit the number of jobs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Other things I would like to configure for the rules 
>>>>>>>>>>>>>>> above: establishing hard and soft limits on the slots 
>>>>>>>>>>>>>>> each group can use, and doing the same for queued 
>>>>>>>>>>>>>>> jobs, i.e. configuring a rule with the maximum number 
>>>>>>>>>>>>>>> of jobs which can be queued per group, so that once a 
>>>>>>>>>>>>>>> group reaches this limit it can't do a qsub. Does 
>>>>>>>>>>>>>>> anyone know a way to do this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For this feature there are only two entries in SGE's 
>>>>>>>>>>>>>> configuration, and they can't be set per group - only 
>>>>>>>>>>>>>> in total and per user (max_jobs, max_u_jobs).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Reuti
>>>>>>
>>

