[GE users] complex use of complexes

gragghia gragghia at utk.edu
Wed May 12 22:36:52 BST 2010


One idea I had was to modify the terminate_method for this job's queue 
so that the job isn't killed when the rank-zero MPI process uses too 
much RAM.  The side effect would be that I couldn't stop the job with 
qdel for any reason (unless I made the terminate script more 
sophisticated).  Surely there is a cleaner way to exempt one job from 
the h_vmem limits?
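
A rough sketch of what a "more sophisticated" terminate script might look like, assuming (as above) that the h_vmem kill actually goes through terminate_method. The script name, MARKER_DIR, and the allow_kill.<job_id> marker file are all hypothetical; the idea is only that the script ignores termination requests unless the owner has opted in for that job.

```shell
#!/bin/sh
# Sketch of a terminate_method script. Hypothetical queue configuration line:
#   terminate_method /path/to/selective_term.sh $job_pid $job_id
# MARKER_DIR and the allow_kill.<job_id> naming are assumptions for illustration.

MARKER_DIR=${MARKER_DIR:-/tmp/sge_allow_kill}

# Succeeds only if a marker file exists for this job id.
kill_allowed() {
    [ -e "$MARKER_DIR/allow_kill.$1" ]
}

# Handle a termination request for job id $2 whose PID is $1.
handle_terminate() {
    job_pid=$1
    job_id=$2
    if kill_allowed "$job_id"; then
        # Assumes the job PID leads its own process group.
        kill -s KILL -- "-$job_pid"
    else
        echo "job $job_id: termination request ignored" >&2
    fi
}

# Act only when called with both arguments, as the queue would call it.
if [ "$#" -ge 2 ]; then
    handle_terminate "$1" "$2"
fi
```

With something like this, creating the marker file for a job before running qdel on it would let the kill through, while any other termination request would be ignored. It still cannot tell why the job is being terminated, so it is a workaround rather than a real exemption from h_vmem.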

- Gerald

On 5/11/2010 4:37 PM, reuti wrote:
> On 11.05.2010 at 21:52, gragghia wrote:
>
>    
>> Are you suggesting breaking the job up into two jobs with different
>> resource requests?  They would have to run at the same time
>> (something that I don't think you can guarantee), and MPI wouldn't
>> know how to communicate with the processes of a different job.
>>      
> In principle it's possible to hijack slots from another parallel job.
> So you could submit one job with a request for 128 GB, and one
> parallel job with e.g. 7 slots, requesting 2 GB for each slot as
> usual. The parallel job would contain only a `sleep` or similar and
> run in a PE with "job_is_first_task FALSE" (it could also wait for a
> file "+DONE" written by the master job, so that it quits
> automatically). The master job can then start processes with
> `qrsh -inherit` on the slots of the other job if you change $JOB_ID to
> the ID of the 7-slot job. Depending on the MPI version used, it might
> be tricky anyway.
>
> The bigger problem, as you mentioned, is how to force SGE to run both
> jobs at the same time or not at all.
>
> -- Reuti
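
The two-job idea described above could be sketched roughly as follows. This is only an illustration under assumptions: the PE name "mpi", the script names, and the helper function names are hypothetical, and a real setup may need more than $JOB_ID to be adjusted before `qrsh -inherit` works.

```shell
#!/bin/sh
# Hypothetical submissions (PE name and script names are placeholders):
#   qsub -l h_vmem=128G master.sh
#   qsub -pe mpi 7 -l h_vmem=2G holder.sh

# Holder side: body of the 7-slot parallel job. It only occupies the slots
# until the master job writes the marker file ("+DONE" in the thread).
wait_for_done() {
    done_file=$1
    interval=${2:-30}   # polling interval in seconds (assumed value)
    while [ ! -e "$done_file" ]; do
        sleep "$interval"
    done
}

# Master side: start a command on a host held by the holder job by
# presenting the holder's job id to qrsh -inherit.
run_on_held_slot() {
    holder_job_id=$1
    host=$2
    shift 2
    # Subshell so the master job's own JOB_ID is left untouched.
    (
        JOB_ID=$holder_job_id
        export JOB_ID
        qrsh -inherit "$host" "$@"
    )
}
```

The holder script would call something like `wait_for_done "$HOME/+DONE"`, and the master would call `run_on_held_slot <holder job id> <host> <command>` for each slot it wants to use. Nothing here addresses the scheduling problem of getting both jobs to run at the same time.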
>
>
>    
>>> Would it be possible to restructure the job so that the first process
>>> is a "master" that requests 128G for itself, and that single process
>>> then fires off the remaining parts, each requesting 2G?
>>>
>>>        

-- 
Gerald Ragghianti

Newton HPC Program http://newton.utk.edu/
Office of Information Technology
   Research Computing Support
   Professional Technical Services

The University of Tennessee
2309 Kingston Pike
Knoxville, TN 37996
Phone: 865-974-2448

/-------------------------------------\
| One Contact       OIT: 865-974-9900 |
| Many Solutions         help.utk.edu |
\-------------------------------------/

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257116

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


