[GE users] writing to a single log file
chris.duke at novas.com
Mon May 9 22:45:41 BST 2005
Thanks for all the feedback.
In answer to some of the questions:
* Yes, we are looking at a predefined and common log file - one
log file that is written to by all queues/slots/jobs. This is separate
from the standard .e and .o files.
* Our log file contains about 8 distinct pieces of information
(none related to grid metrics). This information is later parsed and
interpreted by another process to provide some reporting and metrics
(thus the need for the single log file)
* We have considered writing a cron job to consolidate all the .e, .o,
or other individual output files. We have not explored this yet, because
we sometimes like to use "tail -f" to get a feel for the processes in
real time. If the metrics are way off, we'll often cancel the remaining
jobs in the cycle.
* We tried using "lockfile" which is part of the procmail package.
We found that there still seemed to be contention (overwriting). We
might have to write our own lockfile process into the job scripts as Dan
mentioned and fine tune it for our needs. I don't know if this problem
has to do with systems being patched at different levels, running
slightly different versions of nfs/lockd and/or slightly different
versions of the procmail package. All I know is, I can't figure out why
there is still contention.
I know there has got to be some method of writing contention-free
simultaneous records. I've also considered writing to mysql, and pulling
the information out when it's done, but this seems even more complicated
than the cron "collecting" information and consolidating it in one log
file.
Any other ideas out there?
From: Dan Gruhn [mailto:Dan.Gruhn at Group-W-Inc.com]
Sent: Monday, May 09, 2005 1:01 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] writing to a single log file
If you are keeping your own log file (not the one created by SGE), you
can "lock" a file using standard commands in a script. The basic tool
of this is linking a filename to another filename, which NFS guarantees
to be atomic. Here is a simple script:
# Do some work
# Wait until we get the lock (ln fails while $lockFile already exists)
until ln "$outputFile" "$lockFile" >/dev/null 2>&1
do
    sleep 1
done
# Append to the log file
echo "<status information or whatever>" >>"$outputFile"
# Remove our lock
rm -f "$lockFile"
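Since contention was still seen with "lockfile", here is a more defensive variant of the link-based lock, offered as a sketch rather than a tested fix. It follows the technique described in the Linux open(2) man page for NFS: create a uniquely named file, link it to the lock name, and trust the link count of the unique file rather than the exit status of ln, because over NFS a link can succeed on the server while the client sees an error. The function name append_record, its arguments, and the ".lock" and ".claim" suffixes are made up for illustration.

```shell
#!/bin/sh
# Sketch only: append_record and the file suffixes are illustrative names.
append_record() {
    log_file=$1
    record=$2
    max_tries=${3:-300}
    lock_file="${log_file}.lock"
    # Unique per host and process, on the same filesystem as the lock.
    claim_file="${log_file}.claim.$(hostname).$$"

    : >"$claim_file" || return 1
    tries=0
    while :; do
        # Ignore ln's exit status; the link count below is the real answer.
        ln "$claim_file" "$lock_file" 2>/dev/null || :
        links=$(ls -ld "$claim_file" | awk '{print $2}')
        # Two links means our claim file is also reachable as the lock.
        [ "$links" -eq 2 ] && break
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            rm -f "$claim_file"
            echo "append_record: could not lock $log_file" >&2
            return 1
        fi
        sleep 1
    done

    # We hold the lock: append one record, then release.
    printf '%s\n' "$record" >>"$log_file"
    rm -f "$lock_file" "$claim_file"
}
```

A job script could then call something like append_record /shared/results.log "job $JOB_ID done" (the path is hypothetical). A job killed while holding the lock leaves a stale lock file behind, so some cleanup policy would still be needed.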
If you are trying to add output to the SGE output file, use the separate
file based on job ID method that Reuti, Dale and Rayson have suggested.
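The separate-file approach can be sketched roughly as follows, assuming each job writes only to its own file and a single consolidator (for example, cron on one host) builds the combined log. JOB_ID and SGE_TASK_ID are environment variables SGE sets inside a job; the function names, the directory argument, and the "job.*.log" naming are illustrative assumptions, not something from the earlier replies.

```shell
#!/bin/sh
# Sketch only: write_job_record, consolidate_logs, and the file naming
# are illustrative. JOB_ID and SGE_TASK_ID come from the SGE job environment.
write_job_record() {
    log_dir=$1
    record=$2
    # One file per job (and array task), so each file has a single writer.
    job_file="$log_dir/job.${JOB_ID:-nojob}.${SGE_TASK_ID:-0}.log"
    printf '%s\n' "$record" >>"$job_file"
}

consolidate_logs() {
    log_dir=$1
    combined=$2
    # Run this from one place only. Build the result in a temporary file
    # and rename it, so readers never see a half-written combined log.
    cat "$log_dir"/job.*.log >"$combined.tmp" && mv "$combined.tmp" "$combined"
}
```

Records come out in file-name order, not time order, so each record would need its own timestamp if ordering matters. For a live view, tail -f "$log_dir"/job.*.log follows the files that exist when it starts, but does not pick up files created later.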
On Mon, 2005-05-09 at 15:02, Chris Duke wrote:
We would like to write the results from grid jobs into a custom file (we
generate more output than just complete status). We have found that when
we simply try and write to a log file (on a shared NFS mount), the jobs
tend to write all over each other, and the output is unusable. We plan
on scaling this to several hundred slots, so the potential conflicts are
huge. Does anyone have any suggestions on how to accomplish this?
PS. Thanks to everybody who participates in this group. It's fantastic
to see such an active community.