[GE users] Lockfiles and not doing the process on the same file.

reuti reuti at staff.uni-marburg.de
Wed May 26 17:32:36 BST 2010


Hi,

On 26.05.2010 at 16:32, mnuzaihan wrote:

> Thanks for the input. I have already solved the problem of locking the file, so that the other machines in the cluster won't try to encode the same file, by using a mutex in bash.
> 
> However, I am curious about your suggestion to copy the data over to the local machine. It sounds like an interesting idea, since you mentioned the $TMPDIR in SGE. Is there a document on this that I can look into?

To copy the data, you could use something like this:

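#!/bin/sh
# Run the job from the directory it was submitted from.
#$ -cwd
# First argument is the filename.
MY_FILE=$1
# Copy the file to the node-local $TMPDIR provided by SGE.
cp "$MY_FILE" "$TMPDIR" || exit 1
# Compute in $TMPDIR; the copy there has no directory prefix.
cd "$TMPDIR" || exit 1
my_application "$(basename "$MY_FILE")"
# Go back to the submit directory and copy the result back.
cd - >/dev/null || exit 1
cp "$TMPDIR/output" "$MY_FILE.output"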

The idea of using two scripts was that you have one script (the one you run on the command line) which loops over a directory of your choice and submits a second script x times to convert the movies. The second script does only the actual conversion and is submitted by the `qsub` call inside the loop of the first script.
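
A rough sketch of what the first script could look like. This is an illustration only and has not been run on a real cluster: the function name `submit_new_files`, the job script name `convert.sh`, the `.submitted` marker files and the directory in the usage comment are all made up, and the job script is assumed to leave its result in `<file>.output` as in the example above.

```shell
#!/bin/sh
# Sketch of the first (submitting) script. It runs on the submit host only,
# so it alone decides which file goes to which job and no lockfile is needed.
# Assumptions: convert.sh is the job script shown earlier, and it leaves its
# result in "<file>.output" next to the input file.

submit_new_files() {
    watch_dir=$1
    for f in "$watch_dir"/*; do
        # Skip anything that is not a regular file (also an unmatched glob).
        [ -f "$f" ] || continue
        # Skip results and our own marker files.
        case $f in
            *.output|*.submitted) continue ;;
        esac
        # Skip files that are already converted or already submitted.
        [ -e "$f.output" ] && continue
        [ -e "$f.submitted" ] && continue
        # Submit one job for exactly this file; remember it on success.
        if qsub convert.sh "$f"; then
            : > "$f.submitted"
        fi
    done
}

# Typical use, polling once a minute (directory name is hypothetical):
#   while :; do submit_new_files /nfs/movies/incoming; sleep 60; done
```

Because only this one process ever writes the marker files, there is no competition between nodes; each job receives exactly one filename as its argument.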

How are you doing it right now? Do you submit the same script x times, with each instance checking a directory for unconverted files and choosing one of them at random?

-- Reuti


> Another thing: what do you mean by having two scripts to run the task?
> 
> Thanks!,
> Muhammad Nuzaihan Kamal
> Network Consultant
> Mobile: +65 97473874
> 
> Asfa Systems Pte Ltd
> 91, Alps Avenue. #03-10. Singapore 498787
> 
> Tel:  +65 62538211
> Fax: +65 62504814
> www.asfasystems.com.sg
> 
> pub   4096R/D4E4DE45 2010-05-19
>       Key fingerprint = F201 D405 C959 0651 39AC  4A48 86B4 CE95 D4E4 DE45
> uid                  Muhammad Nuzaihan Kamalluddin (Asfa Systems Pte. Ltd.) <muhammad at asfasystems.com>
> sub   4096R/80883075 2010-05-19
> 
> 
> 
> On 18-May-2010, at 2:26 AM, reuti wrote:
> 
>> Hi,
>> 
>> On 17.05.2010 at 16:13, mnuzaihan wrote:
>> 
>>> Thanks for the reply. I now realise there is a race condition
>>> when implementing a lock file.
>>> 
>>> The script I modified creates a lockfile on the NFS share used by the
>>> cluster.
>>> 
>>> I did something like this: if the output file (the resulting encoded
>>> file) exists or the lockfile exists, it skips the file and loops on
>>> to the other files.
>>> 
>>> So the setup involves configuring the directory path where it
>>> searches for the raw files to encode, and then it executes the
>>> process.
>>> 
>>> In fact, the original script was intended to run on a single
>>> local machine, but I added the lockfile check: "if ( ! -e
>>> encoded_file && ! -e lockfile ) then encode, else skip". But
>>> executed the way you described, the machines in the cluster race
>>> to check the lock file, so my idea doesn't work well.
>> 
>> Then I would suggest making two scripts out of the one you have:
>> 
>> The first part is a loop checking for new files (which is an endless
>> loop, I assume). When it finds a new file, it won't convert it, but
>> will submit a job which does the actual conversion (this second script
>> is a sub-part of the original one).
>> 
>> As movies are large files (which will put a heavy load on the NFS
>> server), you may be able to improve performance by first copying the
>> file to the local node (into the $TMPDIR which is maintained by SGE),
>> and then copying the result back.
>> 
>> -- Reuti
>> 
>> 
>>> I'm sure I have heard that some movie houses use gridengine, but
>>> how they did it, I'm not really sure. If someone on this list
>>> has done it, it would be nice if they shared their experiences on
>>> this topic.
>>> 
>>> But I know this might not be limited to just encoding files.
>>> 
>>> Thanks!,
>>> Muhammad Nuzaihan
>>> 
>>> On 17-May-2010, at 5:36 PM, reuti wrote:
>>> 
>>>> Hi,
>>>> 
>>>> On 15.05.2010 at 21:13, mnuzaihan wrote:
>>>> 
>>>>> I am having a problem. We encode many large videos
>>>>> on the gridengine cluster. However, no matter how
>>>>> I try to create a lockfile in the script, so that the other
>>>>> machines know there's a lockfile (once the encoding has started
>>>>> on one machine) and move on to encode the next file, it doesn't
>>>>> seem to work.
>>>> 
>>>> How do you create the lockfile, and where?
>>>> 
>>>> But: there is nothing inside SGE which would prevent a race
>>>> condition where two nodes start with the same movie. The
>>>> lockfile creation will never be atomic when you do it
>>>> inside the script.
>>>> 
>>>> Can't you just give the filename to the script, so that each submitted
>>>> job handles exactly this movie? Then there wouldn't be a need for
>>>> a lockfile.
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> Is there someone who has done this before? Are there any
>>>>> workarounds for this problem?
>>>>> 
>>>>> Thanks in advance!
>>>>> 
>>>>> Muhammad Nuzaihan
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=258712

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


