Custom Query (431 matches)

Results (73 - 75 of 431)

#799 [resolution: invalid] IZ3261: Job submission fails with "no suitable queues" when requesting SGE complexes (reporter: benmwebb)
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3261]

        Issue #:           3261
        Platform:          PC
        Reporter:          benmwebb (benmwebb)
        Component:         gridengine
        OS:                Linux
        Subcomponent:      drmaa
        Version:           6.1u3
        CC:                None defined
        Status:            NEW
        Priority:          P3
        Resolution:
        Issue type:        DEFECT
        Target milestone:  ---
        Assigned to:       dagru (dagru)
        QA Contact:        templedf
        URL:
        Summary:           Job submission fails with "no suitable queues" when requesting SGE complexes
        Status whiteboard:
        Attachments:       drmaa_test.c (text/plain), submitted by benmwebb on Tue Apr 13 22:50:00 -0700 2010
        Issue 3261 blocks:
        Votes for issue 3261:


   Opened: Tue Apr 13 22:49:00 -0700 2010 
------------------------


drmaa_run_job fails, reporting "no suitable queues", whenever we try to run a job that requests an SGE complex, i.e. if any of the
following holds:

1. We set drmaa_native_specification to '-b no' and add '#$ -l mem_free=1G' to the remote_command shell script (a sketch of such a
   script follows below).
2. We set drmaa_native_specification to '-l mem_free=1G'.
3. We add 'testcomplex -l mem_free=1G' to ~/.qtask and set drmaa_job_category to 'testcomplex'.

The same scripts work without any problems under qsub, and DRMAA submission works fine once we remove any '-l' options.
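
For illustration, a remote_command script for case (1) might look like the following; this is a hedged sketch with a hypothetical job
body, not the attached test program:

#!/bin/sh
#$ -l mem_free=1G
# With drmaa_native_specification set to '-b no', SGE transfers and parses
# the script, so this embedded directive requests the mem_free complex.
hostname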

I'm attaching a test program for case (2) above. It's basically the simple job submission example from the DRMAA tutorial, with a small
addition to set the native specification. When built and run, it yields the following on our systems:

% gcc -Wall drmaa_test.c  -I /home/sge61/include -ldrmaa -L /home/sge61/lib/lx24-amd64/
% LD_LIBRARY_PATH=/home/sge61/lib/lx24-amd64/ ./a.out
Could not submit job: error: no suitable queues

   ------- Additional comments from benmwebb Tue Apr 13 22:50:04 -0700 2010 -------
Created an attachment (id=202)
drmaa_test.c

   ------- Additional comments from benmwebb Wed Apr 14 13:02:45 -0700 2010 -------
On digging around in the code, I see that this particular error code originates from deep within qmaster, so it should not be
DRMAA-specific. And indeed, if I submit an equivalent script (option 1 in my original report) with qsub and add the '-w v' option, it
also fails with the same error. So I guess job verification is turned on by default for DRMAA for some reason. Adding '-w n' to my
DRMAA native specification makes things work for me. Is this a known bug in the 6.1u3 qmaster?
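
In transcript form, the reproduction and workaround just described look roughly like this (script name hypothetical; '-l' on the
command line stands in for the embedded directive):

% qsub -w v -l mem_free=1G test.sh    # verification reports "no suitable queues"
% qsub -w n -l mem_free=1G test.sh    # submits fine; '-w n' disables verification

Passing '-w n -l mem_free=1G' as the drmaa_native_specification has the same effect for DRMAA submission.
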
#1265 [resolution: invalid] IZ3281: consumable JOB handled as YES during scheduling, but correctly charged at execution time (reporter: reuti)
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3281]

        Issue #:           3281
        Platform:          All
        Reporter:          reuti (reuti)
        Component:         gridengine
        OS:                All
        Subcomponent:      scheduling
        Version:           6.2u5
        CC:                None defined
        Status:            NEW
        Priority:          P3
        Resolution:
        Issue type:        DEFECT
        Target milestone:  ---
        Assigned to:       andreas (andreas)
        QA Contact:        andreas
        URL:
        Summary:           consumable JOB handled as YES during scheduling, but correctly charged at execution time
        Status whiteboard:
        Attachments:
        Issue 3281 blocks:
        Votes for issue 3281:


   Opened: Sun Aug 29 06:37:00 -0700 2010 
------------------------


Having a complex:

#name               shortcut   type        relop   requestable consumable default  urgency
master              mst        BOOL        EXCL    YES         JOB        0        1000

will "subtract" the consumable only once. When it's not a global consumable but a host one, it will be honored only on the master node of a
parallel job. Submitting such a request:

$ qsub -pe mpich 7 -l master test.sh

in an empty cluster works fine, and the "master" complex gives exclusive access to the elected master node. Of course, the number of
remaining slots on the master node must be adjusted to honor this cut-off, i.e. slots = (needed) - 1 + (slots per host) for a PE with
$fill_up. Once the job is running, serial jobs can be submitted to fill the gaps on the slave nodes (this matches the output of
`qhost -F master`, which shows the value changed only on the master node of the parallel job).

But when some serial jobs are already running in the cluster, the above job is less likely to start, as during scheduling the EXCL
complex seems to be checked for all slave nodes too. The output of `qstat -j <jobid>` shows an error like:

scheduling info:            cannot run in PE "mpich" because it only offers 4 slots

But this reflects only one completely free node, which would be enough for the master; there are more free slots scattered around the cluster.

In addition, `qalter -w v/p <jobid>` outputs "no suitable queues" for such a waiting job. For "-w v" (which assumes an empty cluster)
this is wrong: the job will start once the earlier serial jobs are gone. For "-w p" it corresponds to the output of `qstat -j <jobid>`,
but it is also wrong, as the job could run even with the other jobs in place.
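
As a sketch (job id is a placeholder), the two verification modes behave like this:

$ qalter -w v <jobid>    # reports "no suitable queues", though the job would
                         # start once the earlier serial jobs are gone
$ qalter -w p <jobid>    # same message, though the job could run even with
                         # the other jobs in place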

   ------- Additional comments from reuti Sun Aug 29 08:52:03 -0700 2010 -------
The same also applies to normal JOB consumables when a load_threshold is used:

$ qconf -sc
#name               shortcut   type        relop   requestable consumable default  urgency
master              mst        INT         <=      YES         JOB        1        1000

One queue with slots=4, spanning two nodes, each with:

$ qconf -se pc15370
...
complex_values        master=2

Running job: qsub -pe mpich 4 test.sh

$ qstat -F master
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@pc15370.Chemie.Uni-Marbu BIP   0/2/4          0.06     lx24-x86
        hc:master=1
   2312 1.75000 test.sh    reuti        r     08/29/2010 17:32:13     2
---------------------------------------------------------------------------------
all.q@pc15381.Chemie.Uni-Marbu BIP   0/2/4          0.02     lx24-x86
        hc:master=2
   2312 1.75000 test.sh    reuti        r     08/29/2010 17:32:13     2

This is correct. But now set a load threshold: load_thresholds       master=1

$ qstat -j 2313
scheduling info:            cannot run in PE "mpich" because it only offers 2 slots

The `qalter` output is misleading, as in the former case, complaining about "no suitable queues". Removing the load_threshold lets the job start.

(In the real case I want to block other queues; this example is a shrunk-down version. In contrast to issue 464, load_thresholds are
now already fulfilled for "<=", not only "<"; this must have been changed at some point. But that is a different matter.)

   ------- Additional comments from reuti Mon Aug 30 03:10:00 -0700 2010 -------
In the above examples the complex was attached to the exechosts. The same behavior occurs when the complex is instead attached to the queue.

To emphasize: AFAICS the problem exists only if a load_threshold for this JOB complex or an exclusive boolean is used. Normal
scheduling otherwise honors the JOB consumable correctly at scheduling time as well.
#1266 [resolution: invalid] IZ3282: Changed h_rt not effective for `qmod -rj` or checkpoint migration (reporter: reuti)
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3282]

        Issue #:           3282
        Platform:          All
        Reporter:          reuti (reuti)
        Component:         gridengine
        OS:                All
        Subcomponent:      qmaster
        Version:           6.2u5
        CC:                None defined
        Status:            NEW
        Priority:          P3
        Resolution:
        Issue type:        DEFECT
        Target milestone:  ---
        Assigned to:       ernst (ernst)
        QA Contact:        ernst
        URL:
        Summary:           Changed h_rt not effective for `qmod -rj` or checkpoint migration
        Status whiteboard:
        Attachments:
        Issue 3282 blocks:
        Votes for issue 3282:


   Opened: Thu Sep 2 03:53:00 -0700 2010 
------------------------


When the limit for e.g. h_rt is changed for a running job by `qalter`, it should be honored the next time the job is restarted or
migrated (according to `man qsub`, under the "-l" option). This does not happen for h_rt (i.e. the SGE-enforced limit), although
`qstat -j` shows the changed limit. Changing something like h_vmem, by contrast, works as expected, at least for setting the ulimits.
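
As a hedged transcript of the above (job id and the new limit value are placeholders):

$ qalter -l h_rt=2:00:00 <jobid>    # `qstat -j <jobid>` now shows the new limit
$ qmod -rj <jobid>                  # but after the restart the job still runs
                                    # under the old h_rt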

   ------- Additional comments from reuti Fri Sep 3 10:29:13 -0700 2010 -------
Submitting a copy of the job with `qresub` will use the new settings though.