[GE users] Lockfiles and not doing the process on the same file.

mnuzaihan muhammad at asfasystems.com
Wed May 26 18:47:11 BST 2010


Hi Reuti,

Nice one, and thank you for all your hard work and for the time you spend responding to each and every question on this mailing list. :-)

Actually, the solution I ended up with is a mutex in bash based on directories: creating a directory with mkdir is atomic, so the race condition is eliminated.

http://wiki.bash-hackers.org/howto/mutex

I made it like this:

----
LOCKDIR="/Volumes/Drobo/tmp/${dvdName}.m4vlock"

if mkdir "${LOCKDIR}" &>/dev/null && [[ ! -e "$outputFilePath" || skipDuplicates -eq 0 ]];

and I have this code at the end, before the "else" statement:

                        # On exit (signal 0), keep the exit code and remove the lock.
                        trap 'ECODE=$?;
                        echo "[statsgen] Removing lock." >&2
                        rm -rf "${LOCKDIR}"' 0
-----
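
For anyone adapting the fragment above, here is a minimal self-contained sketch of the same directory-based lock pattern. It is not the actual encoding script: the function names acquire_lock and release_lock and the example lock path are made up for illustration, and it assumes that mkdir is atomic on the filesystem that holds the lock directory.

```shell
#!/bin/sh
# Sketch of a directory-based mutex. The lock path and the function
# names are illustrative assumptions, not taken from the original script.
LOCKDIR="${TMPDIR:-/tmp}/example.$$.m4vlock"

acquire_lock() {
    # mkdir either creates the directory or fails; there is no gap
    # between "check" and "create" as with a test-then-touch lockfile.
    mkdir "$1" 2>/dev/null
}

release_lock() {
    rmdir "$1" 2>/dev/null
}

if acquire_lock "$LOCKDIR"; then
    # Remove the lock when the script exits, whatever the exit status.
    trap 'release_lock "$LOCKDIR"' EXIT
    first_attempt="acquired"
    # A second attempt while the lock is still held must fail.
    if acquire_lock "$LOCKDIR"; then
        second_attempt="acquired"
    else
        second_attempt="busy"
    fi
    echo "first=$first_attempt second=$second_attempt"
else
    echo "lock busy, skipping" >&2
fi
```

In a real job the encoding step would go where the second attempt is made here; the second attempt is only there to show that a held lock cannot be taken again.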

It solves the race condition and I no longer see the problem. The encoding is churning along well, apart from some NFS issues which are unrelated to the race condition.

I hope the above solution will help others who run into the same race condition.

And thanks for the example script. I understood it fully and I might implement it in my script, since I have NFS problems (notes below):

The problem I have now (which is weird) is that the NFS clients get disconnected from the NFS server, one by one, for no apparent reason. All the machines run Mac OS X: the NFS server runs OS X Server and the NFS clients run OS X desktop. I would seriously prefer Linux or FreeBSD for the NFS server, but I don't have much choice in the OS. It may be

There is one question left. Someone suggested a sort-of "parallel" (but not truly parallel) job: cut each file into parts and send the parts to the cluster for encoding. When they are all done, we would only need to use "cat" to combine the three or more encoded parts. I feel it's a bad idea that serves no purpose while adding more work to the script.

I don't know what others think of it, and I hope to hear your view.

Thanks!,
Muhammad Nuzaihan Kamal
Network Consultant
Mobile: +65 97473874

Asfa Systems Pte Ltd
91, Alps Avenue. #03-10. Singapore 498787

Tel:  +65 62538211
Fax: +65 62504814
www.asfasystems.com.sg

pub   4096R/D4E4DE45 2010-05-19
      Key fingerprint = F201 D405 C959 0651 39AC  4A48 86B4 CE95 D4E4 DE45
uid                  Muhammad Nuzaihan Kamalluddin (Asfa Systems Pte. Ltd.) <muhammad at asfasystems.com>
sub   4096R/80883075 2010-05-19



On 27-May-2010, at 12:32 AM, reuti wrote:

Hi,

Am 26.05.2010 um 16:32 schrieb mnuzaihan:

Thanks for your input. I have already solved the problem of locking the file, so that the other machines in the cluster won't try to encode the same file, by using a mutex in bash.

However, I am curious about your suggestion of copying the data over to the local machine. It sounds like an interesting idea, since you mentioned the $TMPDIR in SGE. Is there a document on this that I can look into?

To copy the files, you could use something like this:

#!/bin/sh
# Be sure to be in the submitting directory.
#$ -cwd
# First argument is filename
MY_FILE="$1"
# Now copy the file to $TMPDIR
cp "$MY_FILE" "$TMPDIR" || exit 1
# Compute in $TMPDIR (use the bare name, as the copy has no directory part)
cd "$TMPDIR" || exit 1
my_application "$(basename "$MY_FILE")"
# Now copy the result back
cd - >/dev/null
cp "$TMPDIR/output" "$MY_FILE.output"

The idea of using two scripts was this: the first script (the one you run on the command line) is a loop which checks a directory of your choice and submits the second script once for each movie to convert. The second script only does the actual conversion and is submitted by the `qsub` call inside the loop of the first script.
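
A rough sketch of what such a first script could look like is below. Everything specific in it is an assumption made for illustration: the directory in WATCH_DIR, the job script name convert.sh, the .vob and .m4v suffixes, and the .submitted marker files. SUBMIT defaults to `qsub` and can be set to `echo` for a dry run. Because only this one loop decides which file goes to which job, the jobs themselves need no lockfile.

```shell
#!/bin/sh
# Sketch of the "first script": submit one job per unconverted file.
# WATCH_DIR, CONVERT_SCRIPT, SRC_SUFFIX, the .m4v output naming and the
# .submitted markers are illustrative assumptions.
WATCH_DIR="${WATCH_DIR:-.}"
CONVERT_SCRIPT="${CONVERT_SCRIPT:-convert.sh}"
SRC_SUFFIX="${SRC_SUFFIX:-.vob}"
SUBMIT="${SUBMIT:-qsub}"      # set SUBMIT=echo for a dry run

submit_pending() {
    for src in "$WATCH_DIR"/*"$SRC_SUFFIX"; do
        [ -e "$src" ] || continue                        # glob matched nothing
        [ -e "${src%"$SRC_SUFFIX"}.m4v" ] && continue    # already converted
        [ -e "$src.submitted" ] && continue              # job already queued
        # The file name reaches the job script as its first argument ($1).
        $SUBMIT "$CONVERT_SCRIPT" "$src" && : > "$src.submitted"
    done
}

# One pass over the directory; wrap this call in a loop with a sleep
# to keep watching for new files.
submit_pending
```

The second script would then be a job script that takes the file name as $1 and converts exactly that file.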

How are you doing it right now? Do you submit the same script x times, and each instance checks a directory for unconverted files and chooses one of them randomly?

-- Reuti


Another thing: what do you mean by having two scripts to run the task?

Thanks!,
Muhammad Nuzaihan Kamal



On 18-May-2010, at 2:26 AM, reuti wrote:

Hi,

Am 17.05.2010 um 16:13 schrieb mnuzaihan:

Thanks for the reply. I now realise there is a race condition in the
way I implemented the lock file.

The script I modified creates a lockfile on the NFS share used by the
cluster.

I did something like this: if the output file (the resulting encoded
file) exists or the lockfile exists, the script skips the file and
loops on to the other files.

So my setup consists of giving the script the directory path where it
searches for raw files to encode, and it then runs the process.

In fact, the original script was intended to run on a single local
machine, but I added the lockfile check as "if ( ! -e encoded_file ||
! -e lockfile ) then encode, else skip". But running it the way you
mentioned causes a race between the machines in the cluster when they
check the lock file, so my idea doesn't work well.

Then I would suggest making two scripts out of the one you have:

The first part is a loop checking for new files (an endless loop, I
assume). When it finds a new file, it won't convert it itself, but
will submit a job which does the actual conversion (this second
script is a sub-part of the original one).

As movies are large files (which will put heavy load on the NFS
server), maybe you can improve performance by first copying the file
to the local node (into the $TMPDIR which is maintained by SGE), and
then copying the result back.

-- Reuti


I'm sure I had heard that some movie houses have used gridengine, but
I'm not really sure how they did it. If someone on this list has done
it, it would be nice if they shared their experiences on this topic.

I know this might not be limited to just encoding files, though.

Thanks!,
Muhammad Nuzaihan

On 17-May-2010, at 5:36 PM, reuti wrote:

Hi,

Am 15.05.2010 um 21:13 schrieb mnuzaihan:

I am having a problem. We encode many large videos on a gridengine
cluster. However, no matter how I try to create a lockfile in the
script, so that the other machines know a file is already being
encoded on one machine and move on to the next file, it doesn't seem
to work.

How do you create the lockfile, and where?

But: there is nothing inside SGE which would prevent a race condition
where two nodes start with the same movie. The lockfile creation will
never be atomic when you do it inside the script.

Can't you just give the filename to the script, and each submitted
job will handle exactly this movie? So there wouldn't be a need for
a lockfile.

-- Reuti


Has anyone done this before? Are there any workarounds for this
problem?

Thanks in advance!

Muhammad Nuzaihan

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=257423

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


Best Regards,
Muhammad Nuzaihan Kamal

