Custom Query (431 matches)

Results (76 - 78 of 431)

Ticket Resolution Summary Owner Reporter
#294 fixed IZ1882: mutually subordinating queues suspend each other simultaneously bjfcoomer
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1882]

        Issue #:      1882             Platform:     All      Reporter: bjfcoomer (bjfcoomer)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      6.2u5       CC:    None defined
        Status:       REOPENED         Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: 6.0u7
      Assigned to:    sgrell (sgrell)
      QA Contact:     andreas
          URL:
       * Summary:     mutually subordinating queues suspend each other simultaneously
   Status whiteboard:
      Attachments:

     Issue 1882 blocks:
   Votes for issue 1882:


   Opened: Fri Nov 11 04:04:00 -0700 2005 
------------------------


The full issue is reproduced by the output below. The basic problem is that
jobs get scheduled simultaneously to queues which are mutually subordinate
to each other, so they suspend each other.
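
For reference, classic subordination is configured through the subordinate_list
attribute of each cluster queue. A minimal sketch of the mutual setup described
here (queue names taken from the report; the exact qconf output is assumed):

    sccomp@test:~> qconf -sq parallel.q | grep subordinate_list
    subordinate_list      serial.q
    sccomp@test:~> qconf -sq serial.q | grep subordinate_list
    subordinate_list      parallel.q

With this configuration, whichever queue instance fills up on a host suspends
the other queue instance on the same host; if the scheduler dispatches jobs to
both in the same cycle, each suspends the other.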


>>> (1) A parallel job is running, and one is queued, and serial jobs are queued
>>>
>>> sccomp@test:~/EXAMPLE/serial> qstat -f
>>> queuename                      qtype used/tot. load_avg arch          states
>>> ----------------------------------------------------------------------------
>>> master.q@test.grid.cluster     P     1/8       0.00     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp00.grid.cluster P     1/1       0.03     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp01.grid.cluster P     1/1       0.03     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp02.grid.cluster P     1/1       0.07     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp03.grid.cluster P     1/1       0.03     lx24-amd64
>>>     525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp00.grid.cluster   BI    0/2       0.03     lx24-amd64    S
>>> ----------------------------------------------------------------------------
>>> serial.q@comp01.grid.cluster   BI    0/2       0.03     lx24-amd64    S
>>> ----------------------------------------------------------------------------
>>> serial.q@comp02.grid.cluster   BI    0/2       0.07     lx24-amd64    S
>>> ----------------------------------------------------------------------------
>>> serial.q@comp03.grid.cluster   BI    0/2       0.03     lx24-amd64    S
>>>
>>> ############################################################################
>>>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
>>> ############################################################################
>>>     526 1000.51000 PMB-MPI1.s sccomp       qw    11/03/2005 18:44:28     5
>>>     527 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:45     1
>>>     528 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:45     1
>>>     529 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:46     1
>>>     530 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:46     1
>>>     531 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:47     1
>>>     532 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:47     1
>>>     533 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:48     1
>>>     534 0.51000 hello.sh   sccomp       qw    11/03/2005 18:44:48     1
>>>
>>>
>>> (2) I qdel the running parallel job and then do qstat -f
>>>
>>> sccomp@test:~/EXAMPLE/serial> qstat -f
>>> queuename                      qtype used/tot. load_avg arch          states
>>> ----------------------------------------------------------------------------
>>> master.q@test.grid.cluster     P     1/8       0.00     lx24-amd64
>>>     526 1000.51000 PMB-MPI1.s sccomp       t     11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp00.grid.cluster P     1/1       0.28     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp01.grid.cluster P     1/1       0.28     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp02.grid.cluster P     1/1       0.31     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> parallel.q@comp03.grid.cluster P     1/1       0.28     lx24-amd64    S
>>>     526 1000.51000 PMB-MPI1.s sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp00.grid.cluster   BI    2/2       0.28     lx24-amd64    S
>>>     527 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     533 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp01.grid.cluster   BI    2/2       0.28     lx24-amd64    S
>>>     529 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     531 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp02.grid.cluster   BI    2/2       0.31     lx24-amd64    S
>>>     530 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     534 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>> ----------------------------------------------------------------------------
>>> serial.q@comp03.grid.cluster   BI    2/2       0.28     lx24-amd64    S
>>>     528 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>     532 0.51000 hello.sh   sccomp       St    11/03/2005 18:45:27     1
>>>
>>>
>>> And here is the log from the scheduler monitor:
>>> ::::::::
>>> 525:1:RUNNING:1131043467:600:P:score:slots:5.000000
>>> 525:1:RUNNING:1131043467:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 525:1:RUNNING:1131043467:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:STARTING:1131043527:600:P:score:slots:5.000000
>>> 526:1:STARTING:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:STARTING:1131043527:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> 527:1:STARTING:1131043527:600:Q:serial.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 528:1:STARTING:1131043527:600:Q:serial.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 529:1:STARTING:1131043527:600:Q:serial.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> 530:1:STARTING:1131043527:600:Q:serial.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 531:1:STARTING:1131043527:600:Q:serial.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> 532:1:STARTING:1131043527:600:Q:serial.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 533:1:STARTING:1131043527:600:Q:serial.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 534:1:STARTING:1131043527:600:Q:serial.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:SUSPENDED:1131043527:600:P:score:slots:5.000000
>>> 526:1:SUSPENDED:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:SUSPENDED:1131043527:600:P:score:slots:5.000000
>>> 526:1:SUSPENDED:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>> 526:1:RUNNING:1131043527:600:P:score:slots:5.000000
>>> 526:1:RUNNING:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
>>> 526:1:RUNNING:1131043527:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
>>> ::::::::
>>>

   ------- Additional comments from sgrell Wed Nov 23 01:33:18 -0700 2005 -------
Started working on this issue.

Stephan

   ------- Additional comments from sgrell Wed Nov 23 09:03:03 -0700 2005 -------
Fixed in maintrunk and for u7.

Stephan

   ------- Additional comments from reuti Mon Aug 9 16:41:41 -0700 2010 -------
A parallel job can suspend itself, when it gets slots in the sub- and superordinated queue at the same time:

reuti@pc15370:~> qsub -pe openmpi 8 -l h=pc15370 test_mpich.sh
Your job 1868 ("test_mpich.sh") has been submitted
reuti@pc15370:~> qstat -g t
job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID
------------------------------------------------------------------------------------------------------------------
   1868 0.75500 test_mpich reuti        S     08/10/2010 01:31:11 all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
   1868 0.75500 test_mpich reuti        S     08/10/2010 01:31:11 extra.q@pc15370 MASTER
                                                                  extra.q@pc15370 SLAVE
                                                                  extra.q@pc15370 SLAVE
                                                                  extra.q@pc15370 SLAVE

extra.q is entered as a subordinated queue in all.q (classic subordination). There are some other issues which are similar, so I'm not sure
whether this is the most appropriate one, or rather: 437 / 2397
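
Here the subordination is one-way; a sketch of the relevant attribute (queue
names taken from the comment; the exact qconf output is assumed):

    reuti@pc15370:~> qconf -sq all.q | grep subordinate_list
    subordinate_list      extra.q

Presumably, once the job's slave tasks fill all.q@pc15370, the subordinated
extra.q on the same host is suspended, and the job's own master task is
suspended with it.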
#295 fixed IZ1887: complex man page describes regex incorrectly ovid
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1887]

        Issue #:      1887             Platform:     PC       Reporter: ovid (ovid)
       Component:     gridengine          OS:        All
     Subcomponent:    man              Version:      6.0u6       CC:    None defined
        Status:       NEW              Priority:     P4
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     complex man page describes regex incorrectly
   Status whiteboard:
      Attachments:

     Issue 1887 blocks:
   Votes for issue 1887:


   Opened: Fri Nov 11 16:09:00 -0700 2005 
------------------------


The man page for complex(5) says under RESTRING:



           - "[xx]": specifies an array or a range of allowed
                     characters for one character at a specific
                     position

That is incorrect. It should say "[x-y]". The behaviour specified
in the man page is not supported.
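
For illustration, a range of the form the man page should document would appear
in a resource request like this (a hypothetical sketch; the complex value is
made up):

    % qsub -l arch='lx2[4-6]-amd64' job.sh

That is, "[x-y]" matches one character in the range x..y at that position,
whereas the "[xx]" array form described in the man page is not supported.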

   ------- Additional comments from pollinger Mon Mar 23 04:02:04 -0700 2009 -------
Changed Subcomponent to man

   ------- Additional comments from pollinger Mon Mar 23 04:02:27 -0700 2009 -------
Changed Subcomponent to man

   ------- Additional comments from pollinger Mon Mar 23 04:03:10 -0700 2009 -------
Changing Subcomponent didn't work, 3rd try...
#298 fixed IZ1894: qstat incorrectly reports job scheduling failure templedf
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1894]

        Issue #:      1894             Platform:     All       Reporter: templedf (templedf)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      current      CC:    None defined
        Status:       NEW              Priority:     P4
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    sgrell (sgrell)
      QA Contact:     andreas
          URL:
       * Summary:     qstat incorrectly reports job scheduling failure
   Status whiteboard:
      Attachments:

     Issue 1894 blocks:
   Votes for issue 1894:


   Opened: Tue Nov 15 14:23:00 -0700 2005 
------------------------


If I submit a job with an unfulfillable resource request, I get the following:

% qsub -l arch=sol-sparc6 /tmp/dant/examples/jobs/sleeper.sh
Your job 6 ("Sleeper") has been submitted.
% qstat -j 6
==============================================================
job_number:                 6
...
scheduling info:            (-l arch=sol-sparc6) cannot run globally because
                            (-l arch=sol-sparc6) cannot run at host "balrog.germany.sun.com" because it offers only hl:arch=sol-sparc64
                            (-l arch=sol-sparc6) cannot run at host "balin" because it offers only hl:arch=sol-sparc64


First of all, should there even be a message that the job cannot be run
globally?  Secondly, the message is incomplete: cannot be run because why?

I am seeing this problem in the Maintrunk.  I have not tested it in any of
the release branches.

   ------- Additional comments from roland Wed Nov 16 01:21:30 -0700 2005 -------
For correct subcomponent tracking I've moved this bug to "scheduling" because
the scheduler is responsible for the scheduling info. The qstat command only
prints out the messages reported by the scheduler.

I assume that with "should there even be a message that the job cannot be run
globally" you mean that qsub should deny the job at submission time. This is
wrong. By default qsub accepts all jobs, but, as in this case, they will never
be scheduled. You can force the consumable verification at submission time with
the "-w" switch.


   ------- Additional comments from templedf Wed Nov 16 12:46:15 -0700 2005 -------
I meant that jobs don't run "globally."  They run on hosts.  Of course the job
cannot be run "globally."  It's not possible!
Of course, I could be wrong about what the message is supposed to mean...

   ------- Additional comments from sgrell Tue Nov 22 02:22:10 -0700 2005 -------
I will look into it.

Stephan

   ------- Additional comments from sgrell Mon Dec 5 02:01:43 -0700 2005 -------
*** Issue 1817 has been marked as a duplicate of this issue. ***