[GE users] Question about RQS (resource quota sets)

Reuti reuti@staff.uni-marburg.de
Fri Jun 13 12:51:41 BST 2008



On 13.06.2008, at 13:44, Esteban Freire wrote:

> Hi Reuti,
>
> As far as I understand, I have the right permissions. On the cluster  
> which I'm using for the tests (a pre-production cluster) I installed  
> GE as root and the manager user is root, so I understand I have the  
> right permissions; besides, I'm executing the commands as root.
>
> # ls -l pro/default/spool/
> total 4
> drwxr-xr-x  20 root root 4096 Jun 13 13:26 qmaster
>
> # ls -l /usr/local/sge/pro/default/spool/qmaster
> total 364
> drwxr-xr-x   2 root root   4096 May  6 10:54 admin_hosts
> drwxr-xr-x   2 root root   4096 May  6 10:54 calendars
> drwxr-xr-x   2 root root   4096 Jun 12 13:21 centry
> drwxr-xr-x   2 root root   4096 May  6 10:54 ckpt
> drwxr-xr-x   2 root root   4096 Jun 11 14:05 cqueues
> drwxr-xr-x   2 root root   4096 Jun 10 12:03 exec_hosts
> -rw-r--r--   1 root root      6 Jun 13 13:28 heartbeat
> drwxr-xr-x   2 root root   4096 May  6 10:54 hostgroups
> drwxr-xr-x   2 root root   4096 Jun 12 13:23 jobs
> drwxr-xr-x   2 root root   4096 Jun 12 13:23 job_scripts
> -rw-r--r--   1 root root      4 Jun 13 13:26 jobseqnum
> -rw-r--r--   1 root root     72 Jun 13 13:23 managers
> -rw-r--r--   1 root root 272117 Jun 13 13:26 messages
> -rw-r--r--   1 root root     72 Jun 13 13:23 operators
> drwxr-xr-x   2 root root   4096 May  6 10:54 pe
> drwxr-xr-x   2 root root   4096 May  6 10:54 projects
> drwxr-xr-x  10 root root   4096 May  6 10:56 qinstances
> -rw-r--r--   1 root root      6 Jun 13 13:26 qmaster.pid

okay, please create a directory here:

mkdir resource_quotas
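
Classic spooling writes each RQS as a plain file below this directory
(the error above refers to "resource_quotas/.maxujobs"), so the new
directory must also be writable by the admin_user. A sketch, assuming
root as the admin_user, as in your setup:

chown root:root resource_quotas
chmod 755 resource_quotas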

maybe it helps. - Reuti


> drwxr-xr-x   2 root root   4096 May  6 10:54 schedd
> drwxr-xr-x   2 root root   4096 May  6 10:56 submit_hosts
> drwxr-xr-x   2 root root   4096 May  6 10:54 usermapping
> drwxr-xr-x   2 root root   4096 Jun 13 13:24 users
> drwxr-xr-x   2 root root   4096 Jun 13 13:23 usersets
> drwxr-xr-x   2 root root   4096 May  6 10:54 zombies
>
> # touch /usr/local/sge/pro/default/spool/qmaster/test
> # ls -l /usr/local/sge/pro/default/spool/qmaster/test
> -rw-r--r--  1 root root 0 Jun 13 13:41 /usr/local/sge/pro/default/spool/qmaster/test
>
>
> # qconf -sm
> root
> # qconf -so
> root
>
> I don't understand it.
>
> Thanks,
> Esteban
>
> PS: I'm sorry about the previous empty e-mail; I hit the send button  
> by mistake.
>
> Reuti wrote:
>> On 13.06.2008, at 12:49, Esteban Freire wrote:
>>
>>> Hi,
>>>
>>> In the messages of the qmaster, I can see:
>>>
>>> 06/13/2008 12:41:31|qmaster|sa3-ce|C|error writing to file "resource_quotas/.maxujobs": No such file or directory
>>> 06/13/2008 12:41:31|qmaster|sa3-ce|W|rule "default rule (spool dir)" in spooling context "classic spooling" failed writing an object
>>
>> maybe a permission problem? Can the sgeadmin user write to the  
>> $SGE_ROOT/$SGE_CELL/spool/qmaster directory?
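>>
>> A quick check could be (a sketch, assuming sgeadmin is the admin_user  
>> and the SGE environment is sourced; "write_test" is just a scratch name):
>>
>> su sgeadmin -c "touch $SGE_ROOT/$SGE_CELL/spool/qmaster/write_test"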
>>
>> -- Reuti
>>
>>
>>> I get the same error when I edit an RQS by hand, and I also get it  
>>> when I do it from qmon.
>>>
>>> Thanks,
>>> Esteban
>>>
>>> Reuti wrote:
>>>> Hi,
>>>>
>>>> dunno - I can copy and paste your configuration and it works. Do  
>>>> you see anything in the qmaster's messages file? Can you edit an  
>>>> RQS by hand, I mean "qconf -arqs blabla"?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>> On 13.06.2008, at 12:20, Esteban Freire wrote:
>>>>
>>>>> Hi Reuti, Chansup
>>>>>
>>>>> I have the same GE version installed on three clusters (GE  
>>>>> 6.1u3), and it doesn't work on any of them; I can't figure out  
>>>>> where the problem is.
>>>>> I edited $SGE_ROOT/$SGE_CELL/common/bootstrap, changed admin_user  
>>>>> to egeeadm, and restarted the sgemaster; I also tried putting the  
>>>>> same name in the file, but unfortunately it didn't work.
>>>>>
>>>>> # cat bootstrap
>>>>> admin_user        egeeadm
>>>>> default_domain    egee.cesga.es
>>>>> ignore_fqdn       true
>>>>> spooling_method   classic
>>>>> spooling_lib      libspoolc
>>>>> spooling_params   /usr/local/sge/pro/default/common;/usr/local/sge/pro/default/spool/qmaster
>>>>> binary_path       /usr/local/sge/pro/bin
>>>>> qmaster_spool_dir /usr/local/sge/pro/default/spool/qmaster
>>>>> security_mode     none
>>>>>
>>>>> # qconf -sm
>>>>> egeeadm
>>>>> root
>>>>>
>>>>>
>>>>> # cat maxujobs
>>>>> {
>>>>>   name         maxujobs
>>>>>   description  NONE
>>>>>   enabled      true
>>>>>   limit users ops to slots=10
>>>>>   limit users opssgm to slots=10
>>>>>   limit users dteam to slots=10
>>>>>   limit users swetest to slots=2
>>>>>   limit users cesga to slots=100
>>>>>   limit users imath to slots=100
>>>>>   limit users lhcb to slots=50
>>>>>   limit users lhcbprd to slots=50
>>>>>   limit users lhcbsgm to slots=50
>>>>>   limit users compchem to slots=30
>>>>>   limit users fusion to slots=60
>>>>>   limit users biomed to slots=30
>>>>>   limit users biomedsgm to slots=14
>>>>>   limit users alice to slots=50
>>>>>   limit users alicesgm to slots=4
>>>>>   limit users atlas to slots=15
>>>>>   limit users atlassgm to slots=3
>>>>>   limit users cms to slots=10
>>>>> }
>>>>>
>>>>> $ qconf -Arqs /usr/local/sge/pro/default/common/maxujobs
>>>>> error writing object "maxujobs" to spooling database
>>>>>
>>>>>
>>>>> I would appreciate any help.
>>>>>
>>>>> Thanks,
>>>>> Esteban
>>>>>
>>>>> Reuti wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 12.06.2008, at 14:56, Esteban Freire wrote:
>>>>>>
>>>>>>> Hi Reuti,
>>>>>>>
>>>>>>> Maybe it's because I don't have any DB defined in the bootstrap  
>>>>>>> file.
>>>>>>>
>>>>>>> [root@sa3-ce common]# cat $SGE_ROOT/$SGE_CELL/common/bootstrap
>>>>>>> admin_user        none
>>>>>>> default_domain    egee.cesga.es
>>>>>>> ignore_fqdn       true
>>>>>>> spooling_method   classic
>>>>>>> spooling_lib      libspoolc
>>>>>>> spooling_params   /usr/local/sge/pro/default/common;/usr/local/sge/pro/default/spool/qmaster
>>>>>>> binary_path       /usr/local/sge/pro/bin
>>>>>>> qmaster_spool_dir /usr/local/sge/pro/default/spool/qmaster
>>>>>>> security_mode     none
>>>>>>
>>>>>> no, it also works for me with classic spooling. Maybe it's a  
>>>>>> permission problem? Or do you have "admin_user none" set?
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Esteban
>>>>>>>
>>>>>>> Esteban Freire wrote:
>>>>>>>> Reuti wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 12.06.2008, at 13:36, Esteban Freire wrote:
>>>>>>>>>
>>>>>>>>>> Hi Reuti,
>>>>>>>>>>
>>>>>>>>>> I must be doing something wrong, because I get an error when  
>>>>>>>>>> I try to add a resource quota set from a file.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [esfreire@sa3-ce common]# cat maxjobs_rule
>>>>>>>>>> {
>>>>>>>>>> name maxujobs
>>>>>>>>>> enabled true
>>>>>>>>>> limit users ops to num_proc=10
>>>>>>>>>> limit users opssgm to num_proc=10
>>>>>>>>>> limit users dteam to num_proc=10
>>>>>>>>>> limit users swetest to num_proc=2
>>>>>>>>>> limit users cesga to num_proc=100
>>>>>>>>>> limit users imath to num_proc=100
>>>>>>>>>> limit users lhcb to num_proc=50
>>>>>>>>>> limit users lhcbprd to num_proc=50
>>>>>>>>>> limit users lhcbsgm to num_proc=50
>>>>>>>>>> limit users compchem to num_proc=30
>>>>>>>>>> limit users fusion to num_proc=60
>>>>>>>>>> limit users biomed to num_proc=30
>>>>>>>>>> limit users biomedsgm to num_proc=14
>>>>>>>>>> limit users alice to num_proc=50
>>>>>>>>>> limit users alicesgm to num_proc=4
>>>>>>>>>> limit users atlas to num_proc=15
>>>>>>>>>> limit users atlassgm to num_proc=3
>>>>>>>>>> limit users cms to num_proc=10
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> this is exactly what I wouldn't do (as num_proc is a fixed  
>>>>>>>>> feature of a particular machine) - use slots instead. Users  
>>>>>>>>> with the same number of slots can also be put on one line:
>>>>>>>>>
>>>>>>>>> limit users {ops, opssgm, dteam} to slots=10
>>>>>>>>>
>>>>>>>>> (watch out for the curly braces).
>>>>>>>>>
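>>>>>>>>> Applied to your file, e.g. the three lhcb entries would  
>>>>>>>>> collapse to:
>>>>>>>>>
>>>>>>>>> limit users {lhcb,lhcbprd,lhcbsgm} to slots=50
>>>>>>>>>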
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have changed num_proc to slots, but I still have the same  
>>>>>>>> problem. I'm using GE 6.1u3. I tried this on another cluster  
>>>>>>>> running GE 6.1u3 too, and I got the same error. Maybe I'm  
>>>>>>>> missing some step.
>>>>>>>>
>>>>>>>> [root@sa3-ce common]# cat maxjobs_rule
>>>>>>>> {
>>>>>>>> name maxujobs
>>>>>>>> enabled true
>>>>>>>> limit users ops to slots=10
>>>>>>>> limit users opssgm to slots=10
>>>>>>>> limit users dteam to slots=10
>>>>>>>> limit users swetest to slots=2
>>>>>>>> limit users cesga to slots=100
>>>>>>>> limit users imath to slots=100
>>>>>>>> limit users lhcb to slots=50
>>>>>>>> limit users lhcbprd to slots=50
>>>>>>>> limit users lhcbsgm to slots=50
>>>>>>>> limit users compchem to slots=30
>>>>>>>> limit users fusion to slots=60
>>>>>>>> limit users biomed to slots=30
>>>>>>>> limit users biomedsgm to slots=14
>>>>>>>> limit users alice to slots=50
>>>>>>>> limit users alicesgm to slots=4
>>>>>>>> limit users atlas to slots=15
>>>>>>>> limit users atlassgm to slots=3
>>>>>>>> limit users cms to slots=10
>>>>>>>> }
>>>>>>>> [root@sa3-ce common]# qconf -Arqs maxjobs_rule
>>>>>>>> error writing object "maxujobs" to spooling database
>>>>>>>> [root@sa3-ce common]# qconf -help | head -n 1
>>>>>>>> GE 6.1u3
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Esteban
>>>>>>>>>
>>>>>>>>>> [esfreire@sa3-ce common]# qconf -Arqs maxjobs_rule
>>>>>>>>>> error writing object "maxujobs" to spooling database
>>>>>>>>>
>>>>>>>>> Which SGE version? For me it's working.
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>> Surely I need to create this object, but how can I create it?
>>>>>>>>>> Could you help me, please?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Esteban
>>>>>>>>>>
>>>>>>>>>> Reuti wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On 12.06.2008, at 12:25, Esteban Freire wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for answering so quickly. Yes, you're right; if I  
>>>>>>>>>>>> use the num_proc rule, I would be making num_proc  
>>>>>>>>>>>> consumable.
>>>>>>>>>>>>
>>>>>>>>>>>> Indeed, as a GE administrator, I think these could be two  
>>>>>>>>>>>> good RFEs to consider for a new GE release, I mean, being  
>>>>>>>>>>>> able to configure the maximum number of jobs which can be  
>>>>>>>>>>>> running/queued for each group. But it's only my opinion :)
>>>>>>>>>>>
>>>>>>>>>>> feel free to enter an issue: user/userlists for max_u_jobs.
>>>>>>>>>>>
>>>>>>>>>>> max_u_jobs        20,reuti=30,@bme=15
>>>>>>>>>>>
>>>>>>>>>>> could be the syntax (20 as the default, 30 for user reuti,  
>>>>>>>>>>> 15 for group @bme).
>>>>>>>>>>>
>>>>>>>>>>> -- Reuti
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Esteban
>>>>>>>>>>>>
>>>>>>>>>>>> Reuti wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12.06.2008, at 11:52, Esteban Freire wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm playing with RQS in order to limit the maximum  
>>>>>>>>>>>>>> number of jobs which can be running/queued per group  
>>>>>>>>>>>>>> instead of per user. Assuming that I have created a  
>>>>>>>>>>>>>> user_list for each group of users, I have eight  
>>>>>>>>>>>>>> user_lists in total. I would like to configure a rule  
>>>>>>>>>>>>>> which establishes the maximum number of slots each group  
>>>>>>>>>>>>>> can use. For example, I have the two rules below; one  
>>>>>>>>>>>>>> limits by slots and the other by number of processors,  
>>>>>>>>>>>>>> but in the end I would choose only one of them, because  
>>>>>>>>>>>>>> in our case it's the same thing - users can only use one  
>>>>>>>>>>>>>> processor per slot. Example:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> name maxujobs
>>>>>>>>>>>>>> enabled true
>>>>>>>>>>>>>> limit users userlist1 to slots=10
>>>>>>>>>>>>>> limit users userlist2 to slots=20
>>>>>>>>>>>>>> limit users userlist3 to slots=25
>>>>>>>>>>>>>> limit users userlist4 to slots=50
>>>>>>>>>>>>>> limit users userlist5 to slots=60
>>>>>>>>>>>>>> limit users userlist6 to slots=70
>>>>>>>>>>>>>> limit users userlist7 to slots=100
>>>>>>>>>>>>>> limit users userlist8 to slots=200
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> correct.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>> name maxujobs
>>>>>>>>>>>>>> enabled true
>>>>>>>>>>>>>> limit users userlist1 to num_proc=10
>>>>>>>>>>>>>> limit users userlist2 to num_proc=20
>>>>>>>>>>>>>> limit users userlist3 to num_proc=25
>>>>>>>>>>>>>> limit users userlist4 to num_proc=50
>>>>>>>>>>>>>> limit users userlist5 to num_proc=60
>>>>>>>>>>>>>> limit users userlist6 to num_proc=70
>>>>>>>>>>>>>> limit users userlist7 to num_proc=100
>>>>>>>>>>>>>> limit users userlist8 to num_proc=200
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> I wouldn't make num_proc consumable at all, as it's  
>>>>>>>>>>>>> just a fixed feature of a node - number of available  
>>>>>>>>>>>>> cores, set by SGE automatically.
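>>>>>>>>>>>>> (You can see the value per host, e.g. in the NCPU column  
>>>>>>>>>>>>> of "qhost".)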
>>>>>>>>>>>>>
>>>>>>>>>>>>>> With these rules, am I defining the total number of  
>>>>>>>>>>>>>> slots/processors which each group could use? In other  
>>>>>>>>>>>>>> words, the maximum number of jobs which could be running  
>>>>>>>>>>>>>> for each group in total.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For now you can only limit the number of slots this way.  
>>>>>>>>>>>>> There is also an RFE to limit the number of jobs.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Other things I would like to configure for the rules  
>>>>>>>>>>>>>> above: establishing the hard and soft limits of slots  
>>>>>>>>>>>>>> which each group could use, and doing the same thing for  
>>>>>>>>>>>>>> queued jobs, I mean, configuring a rule with the maximum  
>>>>>>>>>>>>>> number of jobs which can be queued per group, so that  
>>>>>>>>>>>>>> when a group reaches this limit its users can't do a  
>>>>>>>>>>>>>> qsub. Does anyone know a way to do this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> For this feature there are only two entries in SGE's  
>>>>>>>>>>>>> configuration, but it can't be set by group. Just in  
>>>>>>>>>>>>> total and per user (max_jobs, max_u_jobs).
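>>>>>>>>>>>>>
>>>>>>>>>>>>> Both are set in the global configuration, e.g. (a sketch;  
>>>>>>>>>>>>> 0 means unlimited):
>>>>>>>>>>>>>
>>>>>>>>>>>>> # qconf -mconf
>>>>>>>>>>>>> max_jobs          0
>>>>>>>>>>>>> max_u_jobs        20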
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Reuti
>>>>>
>
>

