[GE users] Re: RE : [GE users] Re: RE : [GE users] making consumable not related to PE

Ravi Chandra Nallan Ravichandra.Nallan at Sun.COM
Thu Jul 24 07:16:13 BST 2008



GARDAIS Ionel wrote:
>> wouldn't this defeat the purpose? you might end up running more than one process per HD!
>>     
> I forgot to explain that the complex name literally refers to the mount point of the HD.
> But you are right: someone could request "data-1-1" knowing it is available but then use files from "data-1-2", which may be in use by another job.
>   
To add: when you scale the complex parameter to hds/$NSLOTS, you are
willing to compromise the restriction that an SGE job (process) on a
host should not use more HDs than 'hds' reports for that host.
>> This could be tricky. You might need to elaborate on what
>> "running one job per disk" would mean.
>> Is it that the job script is going to create/use files on a filesystem from that disk?
>>     
> The jobs we run are very I/O intensive and we do not have a clustered filesystem: just FC HDs in direct connection.
> That is why I do not want more than one job per disk, to optimize I/O (and job speed).
>
>   
>> There should be some info that would indicate that this job is using (or going to use) this disk.
>>     
> Sure! The base directory for data crunching.
>
>   
>> If this is there, it could trigger the load script (a custom load sensor) that reports the value to the host complex 'hds'.
>>     
> Huummmm ...
> Could you explain this to me in a bit more depth? (my brain is somewhat ... disconnected :)
> Does it mean that load scripts must not be launched by sge_execd?
>   
They are; load sensor scripts are launched and managed by sge_execd.
> How could one complex handle the logic of two HDs?
If you can somehow find out when a job starts using a disk, you decrement
hds in the custom load sensor script. These values are reported to the
qmaster at regular intervals, which keeps the value more or less up to
date (depending on the load report interval; see man sge_conf).
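For illustration, a load sensor along these lines could report how many
disks are currently free. This is only a sketch: the mount points
(data-1-1, data-1-2), the lock directory and the idea that a prolog drops
a lock file per claimed disk are assumptions for the example, not anything
SGE defines; the begin / host:name:value / end protocol is the one
described in the loadsensor howto linked below.

    #!/usr/bin/env python
    # Sketch of a load sensor reporting the number of free disks as 'hds'.
    # Assumption: a prolog creates a lock file per claimed disk in LOCKDIR.
    import os
    import socket
    import sys

    DISKS = ["/data-1-1", "/data-1-2"]      # assumed mount points
    LOCKDIR = "/var/lock/sge-disks"         # assumed lock directory

    def free_disks():
        used = set(os.listdir(LOCKDIR)) if os.path.isdir(LOCKDIR) else set()
        return sum(1 for d in DISKS if os.path.basename(d) not in used)

    host = socket.gethostname()
    while True:
        line = sys.stdin.readline()          # sge_execd polls once per load interval
        if not line or line.strip() == "quit":
            break
        sys.stdout.write("begin\n")
        sys.stdout.write("%s:hds:%d\n" % (host, free_disks()))
        sys.stdout.write("end\n")
        sys.stdout.flush()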
Now, in the case of 2 disks, the actual task of making sure that once one
job is using a disk, the other job uses the other disk has to be
implemented in a prolog/epilog, configured for a queue or globally.
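As a rough sketch of that prolog (again, the lock-file scheme and the use
of $TMPDIR to hand the chosen mount point to the job script are my own
assumptions, not something SGE prescribes):

    #!/usr/bin/env python
    # Prolog sketch: atomically claim the first free disk for this job and
    # leave a note in $TMPDIR so the job script knows which disk it got.
    import os
    import sys

    DISKS = ["/data-1-1", "/data-1-2"]       # assumed mount points
    LOCKDIR = "/var/lock/sge-disks"          # assumed lock directory

    job_id = os.environ.get("JOB_ID", "unknown")   # set by SGE for prolog/epilog
    for disk in DISKS:
        lock = os.path.join(LOCKDIR, os.path.basename(disk))
        try:
            with open(lock, "x") as f:       # "x": exclusive create, only one prolog wins
                f.write(job_id)
        except FileExistsError:
            continue                         # disk already claimed, try the next one
        # note for the job script; it reads this file and works under that disk
        with open(os.path.join(os.environ["TMPDIR"], "assigned_disk"), "w") as f:
            f.write(disk)
        sys.exit(0)

    sys.exit(1)   # no free disk; should not happen if the hds complex is accurate

A matching epilog would simply remove the lock file whose contents equal
$JOB_ID, so the disk is reported as free again at the next load interval
even if the job died unexpectedly.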
> How can job B know that complex 'hds' is decremented because job A requested one HD or the other?
>   
Because any job that requests hds consumes an HD (qsub -l hds=1). If you
are talking about preventing both jobs from using the same disk,
see above.
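For reference, the consumable itself might be set up roughly like this
(the numbers and host name here are only an example):

    # qconf -mc  --  one row of the complex attribute table
    #name  shortcut  type  relop  requestable  consumable  default  urgency
    hds    hds       INT   <=     YES          YES         0        0

    # qconf -me <hostname>  --  capacity of that host, e.g. two local disks
    complex_values   hds=2

    # a job then asks for one disk at submission time
    qsub -l hds=1 crunch.sh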
> What happens if a job quits "unexpectedly"? The complex is not updated back to its initial value.
>   
The load sensor should take care of it!
In summary you need:
 - the complex hds, to make sure a job is accepted or rejected
before it goes onto the host,
 - a load sensor to keep the usage details updated (see
http://gridengine.sunsource.net/howto/loadsensor.html ),
 - a prolog/epilog to make sure the correct disk is assigned to a job
once it lands on a host (a job-side sketch of how these pieces tie
together follows below).
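To close the loop, the bit of logic the job runs at startup can be as
simple as reading the note the prolog left behind (the assigned_disk file
name is just the convention assumed in the prolog sketch above):

    #!/usr/bin/env python
    # Job-side sketch: work only on the disk the prolog assigned to this job.
    import os

    with open(os.path.join(os.environ["TMPDIR"], "assigned_disk")) as f:
        workdir = f.read().strip()
    os.chdir(workdir)
    # ... do the I/O-heavy crunching under workdir ...

Submitted with qsub -l hds=1 as above, the scheduler keeps jobs off hosts
with no free disk, the load sensor keeps the count honest, and the
prolog/epilog pin each job to one particular disk.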

hope that helps,
regards,
~Ravi
