Custom Query (431 matches)

Results (70 - 72 of 431)

Ticket Resolution Summary Owner Reporter
#799 invalid IZ3261: Job submission fails with "no suitable queues" when requesting SGE complexes benmwebb
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3261]

        Issue #:          3261               Platform:     PC
        Component:        gridengine         OS:           Linux
        Subcomponent:     drmaa              Version:      6.1u3
        Status:           NEW                Priority:     P3
        Resolution:                          Issue type:   DEFECT
        Reporter:         benmwebb (benmwebb)
        CC:               None defined
        Target milestone: ---
        Assigned to:      dagru (dagru)
        QA Contact:       templedf
        URL:
      * Summary:          Job submission fails with "no suitable queues" when requesting SGE complexes
        Status whiteboard:
        Attachments:
                          Tue Apr 13 22:50:00 -0700 2010: drmaa_test.c (text/plain), submitted by benmwebb

     Issue 3261 blocks:
   Votes for issue 3261:


   Opened: Tue Apr 13 22:49:00 -0700 2010 
------------------------


drmaa_run_job fails reporting "no suitable queues" whenever we try to run a job that requests an SGE complex, i.e. if

1. We set drmaa_native_specification to '-b no' and add '#$ -l mem_free=1G' to the remote_command shell script.
or
2. We set drmaa_native_specification to '-l mem_free=1G'.
or
3. We add 'testcomplex -l mem_free=1G' to ~/.qtask and set drmaa_job_category to 'testcomplex'.

The same scripts work without any problems with qsub, and DRMAA submission works fine once we remove any '-l' options.

I'm attaching a test program for (2) above. It's the simple job submission example from the DRMAA tutorial, with one addition
to set the native specification. When built and run, it yields the following on our systems:

% gcc -Wall drmaa_test.c  -I /home/sge61/include -ldrmaa -L /home/sge61/lib/lx24-amd64/
% LD_LIBRARY_PATH=/home/sge61/lib/lx24-amd64/ ./a.out
Could not submit job: error: no suitable queues
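
The attachment itself is not reproduced in this ticket view. The following is a minimal sketch of what such a test program looks like, assuming the DRMAA tutorial's submission example as the base; the /bin/sleep remote command and its argument are illustrative placeholders, not taken from the actual attachment:

/*
 * Sketch of a test program in the spirit of the attached drmaa_test.c:
 * the DRMAA tutorial's simple submission example plus one
 * drmaa_set_attribute() call for the native specification.  The
 * "-l mem_free=1G" value mirrors case (2) above.
 */
#include <stdio.h>
#include "drmaa.h"

int main(void)
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    char jobid[DRMAA_JOBNAME_BUFFER];
    drmaa_job_template_t *jt = NULL;
    const char *args[] = { "5", NULL };

    if (drmaa_init(NULL, error, sizeof(error)) != DRMAA_ERRNO_SUCCESS) {
        fprintf(stderr, "Could not initialize DRMAA: %s\n", error);
        return 1;
    }

    drmaa_allocate_job_template(&jt, error, sizeof(error));
    drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "/bin/sleep",
                        error, sizeof(error));
    drmaa_set_vector_attribute(jt, DRMAA_V_ARGV, args, error, sizeof(error));

    /* The one addition: request an SGE complex via the native spec. */
    drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-l mem_free=1G",
                        error, sizeof(error));

    /* This is the call that fails with "no suitable queues". */
    if (drmaa_run_job(jobid, sizeof(jobid), jt, error, sizeof(error))
            != DRMAA_ERRNO_SUCCESS)
        fprintf(stderr, "Could not submit job: %s\n", error);
    else
        printf("Submitted job %s\n", jobid);

    drmaa_delete_job_template(jt, error, sizeof(error));
    drmaa_exit(error, sizeof(error));
    return 0;
}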

   ------- Additional comments from benmwebb Tue Apr 13 22:50:04 -0700 2010 -------
Created an attachment (id=202)
drmaa_test.c

   ------- Additional comments from benmwebb Wed Apr 14 13:02:45 -0700 2010 -------
On digging around in the code, I see this particular error originates from deep within qmaster, so it should not be DRMAA-specific.
And indeed, if I submit an equivalent script (option 1 in my original report) with qsub and add the '-w v' option, it fails with the
same error. So I guess job verification is turned on by default for DRMAA for some reason. Adding '-w n' to my DRMAA native specification
makes things work for me. Is this a known bug in the 6.1u3 qmaster?
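
For reference, the workaround described above amounts to a one-line change in a sketch like the one shown earlier, prepending '-w n' to the native specification (the mem_free value is still the hypothetical one from that sketch):

/* "-w n" disables submit-time verification, which otherwise rejects
 * the complex request with "no suitable queues". */
drmaa_set_attribute(jt, DRMAA_NATIVE_SPECIFICATION, "-w n -l mem_free=1G",
                    error, sizeof(error));
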
#294 fixed IZ1882: mutually subordinating queues suspend each other simultaneously bjfcoomer
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1882]

        Issue #:          1882               Platform:     All
        Component:        gridengine         OS:           All
        Subcomponent:     scheduling         Version:      6.2u5
        Status:           REOPENED           Priority:     P3
        Resolution:                          Issue type:   DEFECT
        Reporter:         bjfcoomer (bjfcoomer)
        CC:               None defined
        Target milestone: 6.0u7
        Assigned to:      sgrell (sgrell)
        QA Contact:       andreas
        URL:
      * Summary:          mutually subordinating queues suspend each other simultaneously
        Status whiteboard:
        Attachments:

     Issue 1882 blocks:
   Votes for issue 1882:


   Opened: Fri Nov 11 04:04:00 -0700 2005 
------------------------


The full issue is reproduced by the output below. The basic problem is that jobs
get scheduled simultaneously to queues that are mutually subordinate to each
other, so the queues suspend each other.


>>> (1) A parallel job is running, another is queued, and serial jobs are queued:
>>>
>>> sccomp@test:~/EXAMPLE/serial> qstat -f
>>> queuename                      qtype used/tot. load_avg arch          states
>>> ----------------------------------------------------------------------------
>>> master.q@test.grid.cluster     P     1/8       0.00     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp00.grid.cluster P     1/1       0.03     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp01.grid.cluster P     1/1       0.03     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp02.grid.cluster P     1/1       0.07     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp03.grid.cluster P     1/1       0.03     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp00.grid.cluster   BI    0/2       0.03     lx24-amd64    S
>>> ----------------------------------------------------------------------------
>>> serial.q@comp01.grid.cluster   BI    0/2       0.03     lx24-amd64    S
>>> ----------------------------------------------------------------------------
>>> serial.q@comp02.grid.cluster   BI    0/2       0.07     lx24-amd64    S
>>> ----------------------------------------------------------------------------
>>> serial.q@comp03.grid.cluster   BI    0/2       0.03     lx24-amd64    S
>>>
>>> ############################################################################
>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>> ############################################################################
>>>     526 1000.51000 PMB-MPI1.s sccomp       qw    11/03/2005 18:44:28     5
>>>     527 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:45     1
>>>     528 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:45     1
>>>     529 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:46     1
>>>     530 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:46     1
>>>     531 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:47     1
>>>     532 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:47     1
>>>     533 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:48     1
>>>     534 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:48     1
>>>
>>>
>>> (2) I qdel the running parallel job and then do qstat -f:
>>>
>>> sccomp@test:~/EXAMPLE/serial> qstat -f
>>> queuename                      qtype used/tot. load_avg arch          states
>>> ----------------------------------------------------------------------------
>>> master.q@test.grid.cluster     P     1/8       0.00     lx24-amd64
>>>     526 1000.51000 PMB-MPI1.s sccomp       t     11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp00.grid.cluster P     1/1       0.28     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp01.grid.cluster P     1/1       0.28     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp02.grid.cluster P     1/1       0.31     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp03.grid.cluster P     1/1       0.28     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp00.grid.cluster   BI    2/2       0.28     lx24-amd64    S
>>>     527 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     533 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp01.grid.cluster   BI    2/2       0.28     lx24-amd64    S
>>>     529 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     531 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp02.grid.cluster   BI    2/2       0.31     lx24-amd64    S
>>>     530 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     534 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp03.grid.cluster   BI    2/2       0.28     lx24-amd64    S
>>>     528 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     532 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>
>>>
>>>
>>> And here is the log from the scheduler monitor:
>>> ::::::::
>>> 525:1:RUNNING:1131043467:600:P:score:slots:5.000000
>>> 525:1:RUNNING:1131043467:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:STARTING:1131043527:600:P:score:slots:5.000000
>>> 526:1:STARTING:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> 527:1:STARTING:1131043527:600:Q:serial.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 528:1:STARTING:1131043527:600:Q:serial.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 529:1:STARTING:1131043527:600:Q:serial.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> 530:1:STARTING:1131043527:600:Q:serial.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 531:1:STARTING:1131043527:600:Q:serial.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> 532:1:STARTING:1131043527:600:Q:serial.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 533:1:STARTING:1131043527:600:Q:serial.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 534:1:STARTING:1131043527:600:Q:serial.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:SUSPENDED:1131043527:600:P:score:slots:5.000000
>>> 526:1:SUSPENDED:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:SUSPENDED:1131043527:600:P:score:slots:5.000000
>>> 526:1:SUSPENDED:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:RUNNING:1131043527:600:P:score:slots:5.000000
>>> 526:1:RUNNING:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>>
>>>
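
For context, a mutually subordinating configuration of the kind reproduced above would look roughly as follows; the queue names come from the qstat output, but the actual cluster configuration is not part of the report:

% qconf -sq parallel.q | grep subordinate_list
subordinate_list      serial.q
% qconf -sq serial.q | grep subordinate_list
subordinate_list      parallel.q

With such a setup the scheduler must never start jobs in both queues on the same host within one scheduling run, which is exactly what the monitor log above shows happening for jobs 526-534.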

   ------- Additional comments from sgrell Wed Nov 23 01:33:18 -0700 2005 -------
Started working on this issue.

Stephan

   ------- Additional comments from sgrell Wed Nov 23 09:03:03 -0700 2005 -------
Fixed in maintrunk and for u7.

Stephan

   ------- Additional comments from reuti Mon Aug 9 16:41:41 -0700 2010 -------
A parallel job can suspend itself when it gets slots in both the subordinate and the superordinate queue at the same time:

reuti@pc15370:~> qsub -pe openmpi 8 -l h=pc15370 test_mpich.sh
Your job 1868 ("test_mpich.sh") has been submitted
reuti@pc15370:~> qstat -g t
job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID
------------------------------------------------------------------------------------------------------------------
   1868 0.75500 test_mpich reuti        S     08/10/2010 01:31:11 all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
   1868 0.75500 test_mpich reuti        S     08/10/2010 01:31:11 extra.q@pc15370 MASTER
                                                                  extra.q@pc15370 SLAVE
                                                                  extra.q@pc15370 SLAVE
                                                                  extra.q@pc15370 SLAVE

extra.q is entered as a subordinate queue in all.q (classic subordination). There are some other similar issues, so I'm not sure
whether this is the most appropriate one, or 437 / 2397.
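
The classic subordination referred to here would be configured on all.q along these lines (hypothetical excerpt, not taken from the report):

% qconf -sq all.q | grep subordinate_list
subordinate_list      extra.q

Because the scheduler placed the MASTER task in extra.q and slave tasks in all.q on the same host, filling all.q suspends extra.q, and with it the job's own master task.
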
#743 worksforme IZ3180: qsub/qlogin segfaults on ~/.sge_request bmcnally
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3180]

        Issue #:          3180               Platform:     All
        Component:        gridengine         OS:           All
        Subcomponent:     clients            Version:      6.1u6
        Status:           NEW                Priority:     P3
        Resolution:                          Issue type:   DEFECT
        Reporter:         bmcnally (bmcnally)
        CC:               None defined
        Target milestone: ---
        Assigned to:      roland (roland)
        QA Contact:       roland
        URL:
      * Summary:          qsub/qlogin segfaults on ~/.sge_request
        Status whiteboard:
        Attachments:

     Issue 3180 blocks:
   Votes for issue 3180:


   Opened: Fri Nov 13 11:49:00 -0700 2009 
------------------------


With the ~/.sge_request file below, attempts to qlogin or qsub result in a segmentation fault. In my case a global sge_request
file is defined as well. Removing this file (or even half of it) allows qsub/qlogin to succeed.

===
# sge_request file
#
# Set e-mail address
#-M test@test.com

# If you use qlogin, it will also notify you
# when those jobs are done, so you may want to put
# this option in your job scripts instead.
#-m e

# Put job standard output into the sgeoutput directory in
# your home directory. The filename will be named
# [jobname].o[jobid] (ex. testjob2.sh.o10979).
# If this option is not specified, the output file
# will be created in your home directory.
#-o $HOME/out

# Put job standard error into the sgeoutput directory in
# your home directory. The filename will be named
# [jobname].e[jobid] (ex. testjob2.sh.e10979).
# If this option is not specified, the output file
# will be created in your home directory.
#-e $HOME/err

# Uncomment to direct standard error into the standard output file
#-j y
===