Custom Query (431 matches)

Results (97 - 99 of 431)

Ticket Resolution Summary Owner Reporter
#819 fixed IZ3283: -builtin- job startup method inherits $TERM from the execd and wrong owner of tty file reuti
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3283]

        Issue #:      3283             Platform:     All      Reporter: reuti (reuti)
       Component:     gridengine          OS:        All
     Subcomponent:    clients          Version:      6.2u5       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    roland (roland)
      QA Contact:     roland
          URL:
       * Summary:     -builtin- job startup method inherits $TERM from the execd and wrong owner of tty file
   Status whiteboard:
      Attachments:

     Issue 3283 blocks:
   Votes for issue 3283:


   Opened: Wed Sep 29 03:11:00 -0700 2010 
------------------------


Using the -builtin- job startup method for an interactive login, the $TERM set inside the session is not the one from the connected
terminal, but is inherited from the sgeexecd and reflects the terminal that was used to start the sgeexecd. This can result in "dumb" on
RedHat systems, or "linux" on openSUSE, when the daemons are simply started on a cluster node without any monitor connected. Some of these
values don't allow `vi` or `less` to work as expected (with "linux" it seems to work, though). $TERM should instead reflect the type of
terminal which the user is actually using. The workaround is to hardcode a value for it in /etc/profile or a similar file, as sketched below.
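
A minimal sketch of that workaround, assuming a Bourne-style /etc/profile (the fallback value "xterm" is only an example, not from the
report):

# /etc/profile (or a file sourced from it): replace a terminal type
# inherited from the execd with a usable fallback.
if [ -z "$TERM" ] || [ "$TERM" = "dumb" ]; then
    TERM=xterm    # hypothetical fallback; pick whatever fits the site
    export TERM
fi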


Furthermore, the permissions of the generated /dev/pts/1 (or similar) file show that it is owned by root, whereas it should be owned by
the user who started the session:

http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=284334
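
One way to observe this, assuming an interactive session started with the -builtin- method (the device path varies per session):

$ qlogin
...
$ ls -l $(tty)    # owner shows up as root, but should be the logged-in user
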
#1265 invalid IZ3281: consumable JOB handled as YES during scheduling, but correctly charged at execution time reuti
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3281]

        Issue #:      3281             Platform:     All      Reporter: reuti (reuti)
       Component:     gridengine          OS:        All
     Subcomponent:    scheduling       Version:      6.2u5       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andreas (andreas)
      QA Contact:     andreas
          URL:
       * Summary:     consumable JOB handled as YES during scheduling, but correctly charged at execution time
   Status whiteboard:
      Attachments:

     Issue 3281 blocks:
   Votes for issue 3281:


   Opened: Sun Aug 29 06:37:00 -0700 2010 
------------------------


Having a complex:

#name               shortcut   type        relop   requestable consumable default  urgency
master              mst        BOOL        EXCL    YES         JOB        0        1000

will "subtract" the consumable only once. When it's not a global consumable but a host one, it will be honored only on the master node of a
parallel job. Submitting such a request:

$ qsub -pe mpich 7 -l master test.sh

in an empty cluster works fine, and the "master" complex will give complete access to the elected master node. Of course, the number of
remaining slots on the master node must be adjusted to honor this cut-off, i.e. slots=(needed)-1+(slots per host) for a PE with $fill_up.
Once the job is running, serial jobs can be submitted to fill the gaps on the slave nodes (this conforms to the output of `qhost -F
master`, which shows the value changed only on the master node of the parallel job).
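
For reference, a hypothetical way to attach the host-level consumable described above (`qconf -mattr` is the standard mechanism; the host
name node01 is an assumption, not from the report):

$ qconf -mattr exechost complex_values master=TRUE node01    # host-level, not global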

But when some serial jobs are already running in the cluster, the above job is less likely to start, as it seems that during scheduling
the EXCL complex is checked for all slaves too. The output of `qstat -j <jobid>` shows an error like:

scheduling info:            cannot run in PE "mpich" because it only offers 4 slots

But this reflects only one completely free node, which would be suitable for the master. There are more free slots scattered around the
cluster.

In addition, `qalter -w v/p <jobid>` outputs "no suitable queues" for a waiting job like this. For "-w v" (which assumes an empty cluster)
this is wrong - the job will start once the former serial jobs are gone. For "-w p" it corresponds with the output of `qstat -j <jobid>`,
but it is nevertheless also wrong, as the job could run even with other jobs in place.
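
For illustration, the two verification modes in question (the job id is a placeholder):

$ qalter -w v <jobid>    # verify against an otherwise empty cluster
$ qalter -w p <jobid>    # validate against the current cluster state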

   ------- Additional comments from reuti Sun Aug 29 08:52:03 -0700 2010 -------
The same also applies to normal JOB consumables when a load_threshold is used:

$ qconf -sc
#name               shortcut   type        relop   requestable consumable default  urgency
master              mst        INT         <=      YES         JOB        1        1000

One queue with slots=4 across two nodes, each with:

$ qconf -se pc15370
...
complex_values        master=2

Running job: qsub -pe mpich 4 test.sh

$ qstat -F master
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@pc15370.Chemie.Uni-Marbu BIP   0/2/4          0.06     lx24-x86
        hc:master=1
   2312 1.75000 test.sh    reuti        r     08/29/2010 17:32:13     2
---------------------------------------------------------------------------------
all.q@pc15381.Chemie.Uni-Marbu BIP   0/2/4          0.02     lx24-x86
        hc:master=2
   2312 1.75000 test.sh    reuti        r     08/29/2010 17:32:13     2

This is correct. But now add a load threshold to the queue: load_thresholds       master=1

$ qstat -j 2313
scheduling info:            cannot run in PE "mpich" because it only offers 2 slots

The `qalter` output is misleading as in the former case, complaining about "no suitable queues". Removing the load_threshold will start
the job.
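
A sketch of how that threshold can be toggled for testing, assuming the queue name all.q from the output above:

$ qconf -mattr queue load_thresholds master=1 all.q    # reproduce the failure
$ qconf -mattr queue load_thresholds NONE all.q        # remove it; the job starts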

(In the real case I want to block other queues, but this example is a shrunk-down version. In contrast to issue 464, load_thresholds are
now already fulfilled for "<=", not only "<" - this must have been changed at some point. But this is a different thing.)

   ------- Additional comments from reuti Mon Aug 30 03:10:00 -0700 2010 -------
In the above examples the complex was attached to the exechosts. The same behavior occurs when the complex is instead attached to the queue.

To emphasize it: AFAICS the problem exists only if a load_threshold for this JOB complex or an exclusive boolean is used. Normal
scheduling honors the JOB consumable correctly at scheduling time as well.
#1266 invalid IZ3282: Changed h_rt not effective for `qmod -rj` or checkpoint migration reuti
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3282]

        Issue #:      3282             Platform:     All      Reporter: reuti (reuti)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.2u5       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     Changed h_rt not effective for `qmod -rj` or checkpoint migration
   Status whiteboard:
      Attachments:

     Issue 3282 blocks:
   Votes for issue 3282:


   Opened: Thu Sep 2 03:53:00 -0700 2010 
------------------------


When the limit for e.g. h_rt is changed for a running job by `qalter`, it should be honored the next time the job is restarted or migrated
(according to `man qsub` for the "-l" option). This does not happen for h_rt (i.e. the SGE-enforced limit), although `qstat -j` shows the
changed limit. Changing something like h_vmem works as expected, at least for setting the ulimits.
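
A hypothetical reproduction under these assumptions (job id, script name, and limit values are examples, not from the report):

$ qsub -r y -l h_rt=0:10:0 test.sh    # rerunnable job, say id 2314
$ qalter -l h_rt=1:0:0 2314           # raise the limit while the job runs
$ qstat -j 2314                       # shows the new h_rt ...
$ qmod -rj 2314                       # ... but the restarted job still gets the old limit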

   ------- Additional comments from reuti Fri Sep 3 10:29:13 -0700 2010 -------
Submitting a copy of the job with `qresub` will use the new settings though.