[GE users] [ANN] rq-1.0.0

Ara.T.Howard Ara.T.Howard at noaa.gov
Wed Nov 10 22:29:38 GMT 2004


all-

forgive me if this announcement is too off topic for this list.  ruby queue
is a project aimed at an entirely different niche than sge but, i think, of
potential interest to some of the readers of this list.  please correct me if
announcements of this sort are not appropriate for this list.

kind regards.

-a
--
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself.  --Shunryu Suzuki
===============================================================================



URLS
   http://raa.ruby-lang.org/project/rq/
   http://www.codeforpeople.com/lib/ruby/rq/
   (http://rubyforge.org/projects/rqueue/ - under construction)

NAME
   rq v1.0.0

SYNOPSIS
   rq (queue | export RQ_Q=q) mode [mode_args]* [options]*


DESCRIPTION
   ruby queue (rq) is a tool used to create instant linux clusters by managing
   sqlite databases as nfs mounted priority work queues.  multiple instances of
   rq running from multiple hosts can work from these queues to distribute
   processing load to n nodes - bringing many dozens of otherwise powerful cpus
   to their knees with a single blow.  clearly this software should be kept out
   of the hands of free radicals, seti enthusiasts, and j. safran.

   the central concept of rq is that n nodes work in isolation to pull jobs
   from a central nfs mounted priority work queue in a synchronized fashion.
   the nodes have absolutely no knowledge of each other and all communication
   is done via the queue meaning that, so long as the queue is available via
   nfs and a single node is running jobs from it, the system will continue to
   process jobs.  there is no centralized process whatsoever - all nodes work
   to take jobs from the queue and run them as fast as possible. this creates
   a system which load balances automatically and is robust in the face of node
   failures.

   the first argument to any rq command is the name of the queue.  this name
   may be omitted if, and only if, the environment variable RQ_Q has been set
   to contain the absolute path of the target queue.

   rq operates in one of the modes create, submit, list, status, delete,
   update, query, execute, configure, snapshot, lock, backup, help, or feed.
   depending on the mode of operation and the options used the meaning of
   'mode_args' may change.

MODES

   the following mode abbreviations exist

     c  => create
     s  => submit
     l  => list
     ls => list
     t  => status
     d  => delete
     rm => delete
     u  => update
     q  => query
     e  => execute
     C  => configure
     S  => snapshot
     L  => lock
     b  => backup
     h  => help
     f  => feed

   create, c :

     create a queue.  the queue must be located on an nfs mounted file system
     visible from all nodes intended to run jobs from it.

     examples :

       0) to create a queue
           ~ > rq /path/to/nfs/mounted/q create
         or simply
           ~ > rq /path/to/nfs/mounted/q c


   submit, s :

     submit jobs to a queue to be processed by a feeding node.  any 'mode_args'
     are taken as the command to run.  note that 'mode_args' are subject to
     shell expansion - if you don't understand what this means do not use this
     feature and pass jobs on stdin.

     when running in submit mode a file may be specified as a list of commands
     to run using the '--infile, -i' option.  this file is taken to be a
     newline separated list of commands to submit, blank lines and comments (#)
     are allowed.  if submitting a large number of jobs the input file method
     is MUCH more efficient.  if no commands are specified on the command line
     rq automatically reads them from STDIN.  yaml formatted files are also
     allowed as input (http://www.yaml.org/) - note that the output of nearly
     all rq commands is valid yaml and may, therefore, be piped as input into
     the submit command.
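
     as a rough illustration of the command file format described above, the
     following ruby sketch reads a newline separated list and drops blank
     lines and comment lines.  it is not rq's own parser: the name
     parse_commands is hypothetical, and treating only whole lines that begin
     with '#' as comments is an assumption.

```ruby
# sketch only: turn a newline separated command list into an array of
# commands, skipping blank lines and whole-line '#' comments.
def parse_commands(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
end

cmdfile = <<~TXT
  # nightly jobs
  /full/path/to/job_a --input /data/a

  /full/path/to/job_b --input /data/b
TXT

commands = parse_commands(cmdfile)
puts commands
```

     a list prepared this way could then be passed to submit mode on stdin,
     one command per line.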

     when submitting the '--priority, -p' option can be used here to determine
     the priority of jobs.  priorities may be any whole number - zero is the
     default.  note that submission of a high priority job will NOT supplant
     currently running low priority jobs, but higher priority jobs WILL always
     migrate above lower priority jobs in the queue in order that they be run
     as soon as possible.  constant submission of high priority jobs may create
     a starvation situation whereby low priority jobs are never allowed to run.
     avoiding this situation is the responsibility of the user.  the only
     guarantee rq makes regarding job execution is that jobs are executed in an
     'oldest highest priority' order and that running jobs are never
     supplanted.
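
     the 'oldest highest priority' rule can be pictured with a small ruby
     sketch.  the QueuedJob struct and its fields (jid, priority, submitted)
     are hypothetical and are not taken from rq's schema; the sketch only
     shows one ordering consistent with the description above, namely highest
     priority first and, within a priority, oldest submission first.

```ruby
# sketch only: hypothetical job records, not rq's schema
QueuedJob = Struct.new(:jid, :priority, :submitted)

def run_order(jobs)
  # highest priority first; among equal priorities, oldest submission first;
  # jid is used only as a final tie breaker
  jobs.sort_by { |job| [-job.priority, job.submitted, job.jid] }
end

jobs = [
  QueuedJob.new(1, 0, Time.utc(2004, 11, 10, 9, 0, 0)),
  QueuedJob.new(2, 9, Time.utc(2004, 11, 10, 9, 5, 0)),
  QueuedJob.new(3, 9, Time.utc(2004, 11, 10, 9, 1, 0)),
  QueuedJob.new(4, 0, Time.utc(2004, 11, 10, 8, 0, 0)),
]

puts run_order(jobs).map(&:jid).inspect
```

     both priority 9 jobs sort ahead of the priority 0 jobs even though job 4
     is the oldest, which is exactly the starvation risk noted above.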

     examples :

       0) submit the job ls to run on some feeding host

         ~ > rq q s ls

       1) submit the job ls to run on some feeding host, at priority 9

         ~ > rq -p9 q s ls

       2) submit 42000 jobs (quietly) from a command file.

         ~ > wc -l cmdfile
         42000
         ~ > rq q s -q < cmdfile

       3) submit 42 priority 9 jobs from a command file.

         ~ > wc -l cmdfile
         42
         ~ > rq -p9 q s < cmdfile

       4) submit 42 priority 9 jobs from a command file, marking them as
          'important' using the '--tag, -t' option.

         ~ > wc -l cmdfile
         42
         ~ > rq -p9 -timportant q s < cmdfile

       5) re-submit all the 'important' jobs (see 'query' section below)

         ~ > rq q query tag=important | rq q s

       6) re-submit all jobs which are already finished (see 'list' section
          below)

         ~ > rq q l f | rq q s


   list, l, ls :

     list mode lists jobs of a certain state or job id.  state may be one of
     pending, running, finished, dead, or all.  any 'mode_args' that are
     numbers are taken to be job id's to list.

     states may be abbreviated to uniqueness, therefore the following shortcuts
     apply :

       p => pending
       r => running
       f => finished
       d => dead
       a => all
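
     'abbreviated to uniqueness' means any prefix that matches exactly one
     state name is accepted.  a minimal ruby sketch of that rule follows; the
     function name expand_state is hypothetical and this is not rq's own
     implementation.

```ruby
STATES = %w[pending running finished dead all].freeze

# sketch only: return the full state name, or nil when the abbreviation
# matches no state or more than one state
def expand_state(abbrev)
  return nil if abbrev.empty?
  matches = STATES.select { |state| state.start_with?(abbrev) }
  matches.size == 1 ? matches.first : nil
end

%w[p r f d a fin x].each do |abbrev|
  puts "#{abbrev} => #{expand_state(abbrev).inspect}"
end
```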

     examples :

       0) show everything in q
           ~ > rq q list all
         or
           ~ > rq q l all
         or
           ~ > export RQ_Q=q
           ~ > rq l

       1) show q's pending jobs
           ~ > rq q list pending

       2) show q's running jobs
           ~ > rq q list running

       3) show q's finished jobs
           ~ > rq q list finished

       4) show job id 42
           ~ > rq q l 42


   status, t :

     status mode shows the global state of the queue.  there are no 'mode_args'.
     the meaning of each state is as follows:

       pending  => no feeder has yet taken this job
       running  => a feeder has taken this job
       finished => a feeder has finished this job
       dead     => rq died while running a job, has restarted, and moved
                   this job to the dead state

     note that rq cannot move jobs into the dead state unless it has been
     restarted.  this is because no node has any knowledge of other nodes and
     cannot possibly know if a job was started on a node that died, or is
     simply taking a very long time.  only the node that dies, upon restart,
     can determine that it has jobs that 'were started before it started' and
     move these jobs into the dead state.  normally only a machine crash would
     cause a job to be placed into the dead state.  dead jobs are never
     automatically restarted; this is the responsibility of an operator.
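
     the 'were started before it started' test can be sketched in ruby as
     below.  the JobRecord struct, its fields (state, runner, started), and
     the function mark_dead are all hypothetical; the sketch only illustrates
     the reasoning above and assumes a single feeder per host.

```ruby
# sketch only: hypothetical job records, not rq's schema
JobRecord = Struct.new(:jid, :state, :runner, :started)

def mark_dead(jobs, this_host, feeder_started_at)
  jobs.each do |job|
    # a node can only judge its own jobs; it knows nothing about other nodes
    next unless job.state == "running" && job.runner == this_host
    # a 'running' job begun before this feeder process started is a leftover
    job.state = "dead" if job.started < feeder_started_at
  end
end

boot = Time.utc(2004, 11, 10, 12, 0, 0)
jobs = [
  JobRecord.new(1, "running", "node1", boot - 3600), # left over from a crash
  JobRecord.new(2, "running", "node2", boot - 3600), # another node's job
  JobRecord.new(3, "pending", nil, nil),
]

mark_dead(jobs, "node1", boot)
puts jobs.map { |job| "#{job.jid}: #{job.state}" }
```

     job 2 stays in the running state because node1 cannot tell whether node2
     has crashed or is simply busy.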

     examples :

       0) show q's status

         ~ > rq q t


   delete, d :

     delete combinations of pending, running, finished, dead, or jobs specified
     by jid.  the delete mode is capable of parsing the output of list and
     query modes, making it possible to create custom filters to delete jobs
     meeting very specific conditions.

     'mode_args' are the same as for list.  note that while it is possible to
     delete a running job, there is no way to actually STOP it mid
     execution since the node doing the deleting has no way to communicate
     this information to the (probably) remote execution node.  therefore you
     should use the 'delete running' feature with care and only for
     housekeeping purposes or to prevent future jobs from being scheduled.

     examples :

       0) delete all jobs (pending, running, finished, and dead) from a queue

         ~ > rq q d all

       1) delete all pending jobs from a queue

         ~ > rq q d p

       2) delete all finished jobs from a queue

         ~ > rq q d f

       3) delete jobs via hand crafted filter program

         ~ > rq q list | yaml_filter_prog | rq q d


   update, u :

     update assumes all leading arguments are jids to update with subsequent
     key=value pairs.  currently only the 'command', 'priority', and 'tag'
     fields of pending jobs can be updated.

     examples:

       0) update the priority of job 42

         ~ > rq q update 42 priority=7

       1) update the priority of all pending jobs

         ~ > rq q update pending priority=7

       2) query jobs with a command matching 'foobar' and update their command
          to be 'barfoo'

         ~ > rq q q "command like '%foobar%'" |\
             rq q u command=barfoo


   query, q :

     query exposes the database more directly to the user, evaluating the where
     clause specified on the command line (or from STDIN).  this feature can be
     used to make a fine grained selection of jobs for reporting or as input
     into the delete command.  you must have a basic understanding of SQL
     syntax to use this feature, but it is fairly intuitive in this limited
     capacity.

     examples:

       0) show all jobs started within a specific 10 second range

         ~ > rq q query "started >= '2004-06-29 22:51:00' and started < '2004-06-29 22:51:10'"

       1) shell quoting can be tricky here so input on STDIN is also allowed to
          avoid shell expansion

         ~ > cat constraints.txt
         started >= '2004-06-29 22:51:00' and
         started < '2004-06-29 22:51:10'

         ~ > rq q query < constraints.txt
           or (same thing)

         ~ > cat constraints.txt | rq q query

         ** in general all but numbers will need to be surrounded by single quotes **

       2) this query output might then be used to delete those jobs

         ~ > cat constraints.txt | rq q q | rq q d

       3) show all jobs which are either finished or dead

         ~ > rq q q "state='finished' or state='dead'"

       4) show all jobs which have non-zero exit status

         ~ > rq q query exit_status!=0

       5) if you plan to query groups of jobs with some common feature consider
          using the '--tag, -t' feature of the submit mode which allows a user to
          tag a job with a user defined string which can then be used to easily
          query that job group

         ~ > rq q submit --tag=my_jobs < joblist
         ~ > rq q query tag=my_jobs


   execute, e :

     execute mode is to be used by expert users with a knowledge of sql syntax
     only.  it follows the locking protocol used by rq and then allows the user
     to execute arbitrary sql on the queue.  unlike query mode a write lock on
     the queue is obtained allowing a user to definitively shoot themselves in
     the foot.  for details on a queue's schema the file 'db.schema' in the
     queue directory should be examined.

     examples :

       0) list all jobs

         ~ > rq q execute 'select * from jobs'


   configure, C :

     this mode is not supported yet.


   snapshot, S :

     snapshot provides a means of taking a snapshot of the q.  use this feature
     when many queries are going to be run, for example when attempting to
     figure out a complex pipeline command, so that your test queries will not
     compete with the feeders for the queue's lock.  you should use this
     feature whenever possible to avoid lock competition.

     examples:

       0) take a snapshot using default snapshot naming, which is made via the
          basename of the q plus '.snapshot'

         ~ > rq /path/to/nfs/q snapshot

       1) use this snapshot to check status

         ~ > rq ./q.snapshot status

       2) use the snapshot to see what's running on which host

         ~ > rq ./q.snapshot list running | grep `hostname`

     note that there is also a snapshot option - this option is not the same as
     the snapshot command.  the option can be applied to ANY command. if in
     effect then that command will be run on a snapshot of the database and the
     snapshot then immediately deleted.  this is really only useful if one were
     to need to run a command against a very heavily loaded queue and did not
     wish to wait to obtain the lock.  eg.

       0) get the status of a heavily loaded queue

         ~ > rq q t --snapshot

       1) same as above

         ~ > rq q t -s


   lock, L :

     lock the queue and then execute an arbitrary shell command.  lock mode
     uses the queue's locking protocol to safely obtain a lock of the specified
     type and execute a command on the user's behalf.  lock type must be one of

       (r)ead | (sh)ared | (w)rite | (ex)clusive

     examples :

       0) get a read lock on the queue and make a backup

         ~ > rq q L read -- cp -r q q.bak

         (the '--' is needed to tell rq to stop parsing command line
          options which allows the '-r' to be passed to the 'cp' command)


   backup, b :

     backup mode is exactly the same as getting a read lock on the queue and
     making a copy of it.  this mode is provided as a convenience.

       0) make a backup of the queue using default naming ( qname + timestamp + .bak )

         ~ > rq q b

       1) make a backup of the queue as 'q.bak'

         ~ > rq q b q.bak

   help, h :

     this message

     examples :

       0) get this message

         ~ > rq q help
         or
         ~ > rq help


   feed, f :

     take jobs from the queue and run them on behalf of the submitter as
     quickly as possible.  jobs are taken from the queue in an 'oldest highest
     priority' first order.

     feeders can be run from any number of nodes allowing you to harness the
     CPU power of many nodes simultaneously in order to more effectively
     clobber your network, annoy your sysads, and set output raids on fire.

     the most useful method of feeding from a queue is to do so in daemon mode
     so that if the process loses its controlling terminal it will not exit
     when you exit your terminal session.  use the '--daemon, -d' option to
     accomplish this.  by default only one feeding process per host per queue
     is allowed to run at any given moment.  because of this it is acceptable
     to start a feeder at some regular interval from a cron entry since, if a
     feeder is already running, the process will simply exit and otherwise a new
     feeder will be started.  in this way you may keep feeder processes
     running even across machine reboots without requiring sysad intervention
     to add an entry to the machine's startup tasks.
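
     one common way to get 'start me from cron, exit if already running'
     behaviour is a non-blocking exclusive lock on a lockfile.  the ruby
     sketch below shows that general technique only.  this help text does not
     say how rq enforces the one feeder per host per queue rule, so the use of
     flock, the lockfile path, and the name try_become_feeder are all
     assumptions.

```ruby
require "tmpdir"

# sketch only: returns an open, locked file handle when this process may
# feed, or nil when another feeder already holds the lock
def try_become_feeder(lockfile)
  file = File.open(lockfile, File::RDWR | File::CREAT, 0o644)
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file # keep this handle open for as long as the feeder runs
  else
    file.close
    nil  # somebody else is feeding: the caller should exit quietly
  end
end

# hypothetical host-local lockfile, named after this process for the demo
lockfile = File.join(Dir.tmpdir, "rq-feeder-sketch-#{Process.pid}.lock")
first  = try_become_feeder(lockfile)
second = try_become_feeder(lockfile)

puts first  ? "first attempt: feeding"  : "first attempt: exiting"
puts second ? "second attempt: feeding" : "second attempt: exiting"

first.close if first
File.delete(lockfile) if File.exist?(lockfile)
```

     because the lock is released when the holding process exits or crashes,
     the next cron run would acquire it and start a fresh feeder.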


     examples :

       0) feed from a queue verbosely for debugging purposes, using a minimum
          and maximum polling time of 2 and 4 respectively.  you would NEVER
          specify polling times this brief except for debugging purposes!!!

         ~ > rq q feed -v4 -m2 -M4

       1) same as above, but viewing the executed sql as it is sent to the
          database

         ~ > RQ_SQL_DEBUG=1 rq q f -v4 -m2 -M4

       2) feed from a queue in daemon mode - logging to /home/ahoward/rq.log

         ~ > rq q f -d -l/home/ahoward/rq.log

          log rolling in daemon mode is automatic so your logs should never need
          to be deleted to prevent disk overflow.

       3) use something like this sample crontab entry to keep a feeder running
          forever - it attempts to (re)start every fifteen minutes but exits if
          another process is already feeding.

         #
         # your crontab file - sample only
         #

         */15 * * * * /full/path/to/bin/rq /full/path/to/nfs/mounted/q f -d -l/home/username/cfq.log -q

         the '--quiet, -q' here tells rq to exit quietly (no STDERR) when
         another process is found to already be feeding so that no cron message
         would be sent under these conditions.


NOTES
   - realize that your job is going to be running on a remote host and this has
     implications.  paths, for example, should be absolute, not relative.
     specifically the submitted job script must be visible from all hosts
     currently feeding from a queue as must be the input and output
     files/directories.

   - jobs are currently run under the bash shell using the --login option.
     therefore any settings in your .bashrc will apply - specifically your PATH
     setting.  you should not, however, rely on jobs running with any given
     environment.

   - you need to consider __CAREFULLY__ what the ramifications of having
     multiple instances of your program all potentially running at the same
     time will be.  for instance, it is beyond the scope of rq to ensure
     multiple instances of a given program will not overwrite each other's
     output files.  coordination of programs is left entirely to the user.

   - the list of finished jobs will grow without bound unless you sometimes
     delete some (all) of them.  the reason for this is that rq cannot know
     when the user has collected the exit_status of a given job, and so keeps
     this information in the queue forever until instructed to delete it.  if
     you have collected the exit_status of your job(s) it is not an error to
     then delete that job from the finished list - the information is kept for
     your informational purposes only.  in a production system it would be
     normal to periodically save, and then delete, all finished jobs.

ENVIRONMENT
   RQ_Q: set to the full path of nfs mounted queue

     the queue argument to all commands may be omitted if, and only if, the
     environment variable 'RQ_Q' contains the full path to the q.  eg.

       ~ > export RQ_Q=/full/path/to/my/q

     this feature can save a considerable amount of typing for those weak of
     wrist.

DIAGNOSTICS
  success : $? == 0
  failure : $? != 0

AUTHOR
   ara.t.howard at noaa.gov

BUGS
  0 < bugno && bugno <= 42

  reports to ara.t.howard at noaa.gov

OPTIONS
   --priority=priority, -p
         modes <submit> : set the job(s) priority - lowest(0) .. highest(n) -
         (default 0)
   --tag=tag, -t
         modes <submit> : set the job(s) user data tag
   --infile=infile, -i
         modes <submit> : infile
   --quiet, -q
         modes <submit, feed> : do not echo submitted jobs, fail silently if
         another process is already feeding
   --daemon, -d
         modes <feed> : spawn a daemon
   --max_feed=max_feed, -f
         modes <feed> : the maximum number of concurrent jobs run
   --retries=retries, -r
         modes <feed> : specify transaction retries
   --min_sleep=min_sleep, -m
         modes <feed> : specify min sleep
   --max_sleep=max_sleep, -M
         modes <feed> : specify max sleep
   --snapshot, -s
         operate on snapshot of queue
   --verbosity=verbosity, -v
         0|fatal < 1|error < 2|warn < 3|info < 4|debug - (default info)
   --log=path, -l
         set log file - (default stderr)
   --log_age=log_age
         daily | weekly | monthly - what age will cause log rolling (default
         nil)
   --log_size=log_size
         size in bytes - what size will cause log rolling (default nil)
   --help, -h
         this message
   --version
         show version number
