[GE users] Poor Man's Portal - job submission using signal files

Tim Cera tcera at sjrwmd.com
Tue Jan 11 20:48:00 GMT 2005

I have developed a portal of sorts.  I wanted a way to interact with our cluster from the Windows machines, with special focus on ArcGIS from ESRI ( http://www.esri.com ), though this system is available to nearly any software or user that can manipulate the file system.  I am sending this to the mailing list to seek input and as my small contribution to the GridEngine community.
First off, this portal is dirt simple, though it relies on a couple things.
1) That the cluster and the Windows machines have read/write access to a common directory.
Well, that is all I can think of right now...   :-)
The process is very simple:
The User
1) Creates a unique directory in the common directory.
2) Copies all input data files to the new directory.
3) Creates a file called 'job_control' which contains the complete path to the command to run assigned to the 'RUN_COMMAND' variable.
4) Creates a file called 'param.rdy'.  The 'param.rdy' file can be empty - the name is the only thing important.
5) Waits for the 'pslave.rdy' file to indicate that GridEngine has begun the job.
6) Waits for the 'observ.rdy' file to indicate that the job is done.
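
The user-side steps above could be sketched in Python roughly as follows.  This is only an illustration, not the attached 'tiff2jp2_converter.py': the function names, the job directory naming scheme, the polling intervals and the quoting used in 'job_control' are all assumptions.

```python
import shutil
import time
import uuid
from pathlib import Path


def stage_job(common_dir, input_files, run_command):
    """Create the job directory, copy inputs, write the control and signal files."""
    # Hypothetical naming scheme; any unique directory name would do.
    job_dir = Path(common_dir) / f"job_{uuid.uuid4().hex}"
    job_dir.mkdir(parents=True)
    for src in input_files:
        shutil.copy(src, job_dir)
    # Assumed shell-style assignment, since the cluster side sources this file.
    (job_dir / "job_control").write_text(f'RUN_COMMAND="{run_command}"\n')
    # Create the signal file last so the cluster never sees a half-staged job.
    (job_dir / "param.rdy").touch()
    return job_dir


def wait_for_signal(job_dir, name, timeout=3600.0, poll=5.0):
    """Poll until the named signal file appears; return False on timeout."""
    target = Path(job_dir) / name
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if target.exists():
            return True
        time.sleep(poll)
    return target.exists()
```

A caller would run stage_job(...), then wait_for_signal(job_dir, "pslave.rdy") and wait_for_signal(job_dir, "observ.rdy") in turn.  Writing 'param.rdy' last matters because the cron job treats it as the signal that everything else is in place.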
The Cluster
1) Via a cron job, checks all sub-directories of the common directory for the file 'param.rdy'.
2) Makes sure that a 'job_control' file exists.  If it doesn't, the script exits.
3) Converts the 'job_control' file to Unix/Linux line endings.
4) Sources the file 'job_control'.
5) Converts all DOS text files to Unix/Linux line endings.  This is the default; it is skipped only if the 'CONVERT_FILES' option is set to 'No' in the 'job_control' file.
6) Removes the 'param.rdy' file.
7) Submits the job specified in the 'job_control' file to the proper queue.
8) Creates a 'pslave.rdy' file.
9) The queue assigned to handle these jobs has an epilog command that moves the 'pslave.rdy' file to 'observ.rdy' when the job finishes.
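
The cluster-side pass could be sketched in Python as below.  This is a rough illustration and not the attached 'spool_manager.scr': the function names, the queue name 'portal.q', the qsub options, the NUL-byte test for binary files and the simple NAME=value parser (standing in for sourcing the file in a shell) are all assumptions.  Unlike the description above, it skips a directory without 'job_control' rather than exiting.

```python
import subprocess
from pathlib import Path


def to_unix(path):
    """Rewrite a file in place with LF line endings; skip likely binaries."""
    data = Path(path).read_bytes()
    if b"\0" in data:  # crude binary check, an assumption of this sketch
        return
    Path(path).write_bytes(data.replace(b"\r\n", b"\n"))


def read_job_control(path):
    """Parse NAME=value lines; a stand-in for sourcing the file in a shell."""
    settings = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'\"")
    return settings


def qsub_submit(job_dir, run_command, queue):
    """Assumed submission call; adjust the qsub options to the local setup."""
    subprocess.run(["qsub", "-cwd", "-q", queue, run_command],
                   cwd=job_dir, check=True)


def scan_spool(common_dir, submit=qsub_submit, queue="portal.q"):
    """One cron pass over the common directory; returns submitted job dirs."""
    submitted = []
    for ready in sorted(Path(common_dir).glob("*/param.rdy")):
        job_dir = ready.parent
        control = job_dir / "job_control"
        if not control.exists():
            continue  # nothing to run without a job_control file
        to_unix(control)
        settings = read_job_control(control)
        if "RUN_COMMAND" not in settings:
            continue
        if settings.get("CONVERT_FILES", "Yes").lower() != "no":
            for entry in job_dir.iterdir():
                if entry.is_file():
                    to_unix(entry)
        ready.unlink()  # remove param.rdy so the job is not picked up twice
        submit(job_dir, settings["RUN_COMMAND"], queue)
        (job_dir / "pslave.rdy").touch()  # tells the user the job was submitted
        submitted.append(job_dir)
    return submitted


def epilog(job_dir):
    """Queue epilog: rename pslave.rdy to observ.rdy to signal completion."""
    (Path(job_dir) / "pslave.rdy").rename(Path(job_dir) / "observ.rdy")
```

Passing the submit function as a parameter lets the scan be tried against a scratch directory without a live GridEngine installation.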
Obviously this is not appropriate if any reasonable level of security is desired... caveat emptor.
Currently I have the cron job running as a regular user and have the common directory set group ID (setgid).  This seems to work, at least from ArcGIS.  It would be nicer to run the cron job as root in order to have a complete solution for the myriad permissions issues.
I have attached three scripts: 'spool_manager.scr' is run from cron every minute; 'tiff2jp2_converter.py' is a Python script that runs from Windows ArcGIS and converts a GeoTIFF to a JPEG2000 on the cluster using this process; and 'rj' is our run job script, which has some nice features (for example, testing the binary to see if it is checkpointable and then deciding which queue should be used).  If the attached files don't get through, I will include them as follow-ups to this document.
P.S. Why the strange names for the signal files?  I wanted to keep open the option of using this system seamlessly with Parallel PEST, and these are the names that it uses ( http://www.sspa.com/pest/ppest.html ).
kindest regards,
Tim Cera, P.E.
Engineer Scientist
St. Johns River Water Management District
