Custom Query (431 matches)

Results (4 - 6 of 431)

Ticket Resolution Summary Owner Reporter
#743 worksforme IZ3180: qsub/qlogin segfaults on ~/.sge_request bmcnally
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3180]

        Issue #:      3180             Platform:     All      Reporter: bmcnally (bmcnally)
       Component:     gridengine          OS:        All
     Subcomponent:    clients          Version:      6.1u6       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    roland (roland)
      QA Contact:     roland
          URL:
       * Summary:     qsub/qlogin segfaults on ~/.sge_request
   Status whiteboard:
      Attachments:

     Issue 3180 blocks:
   Votes for issue 3180:


   Opened: Fri Nov 13 11:49:00 -0700 2009 
------------------------


With the ~/.sge_request file below, attempts to run qlogin or qsub result in a segmentation fault. In my case I also have a global sge_request
file defined. Removing this file (or even half of it) allows qsub/qlogin to succeed.

===
# sge_request file
#
# Set e-mail address
#-M test@test.com

# If you use qlogin, it will also notify you
# when those jobs are done, so you may want to put
# this option in your job scripts instead.
#-m e

# Put job standard output into the sgeoutput directory in
# your home directory. The filename will be named
# [jobname].o[jobid] (ex. testjob2.sh.o10979).
# If this option is not specified, the output file
# will be created in your home directory.
#-o $HOME/out

# Put job standard error into the sgeoutput directory in
# your home directory. The filename will be named
# [jobname].e[jobid] (ex. testjob2.sh.e10979).
# If this option is not specified, the output file
# will be created in your home directory.
#-e $HOME/err

# Uncomment to direct standard error into the standard output file
#-j y
===
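The file above can be checked mechanically. Every option line in it starts with `#`, so under sge_request(5)'s rule that lines beginning with `#` are comments, a conforming parser should find no active options at all. The Python sketch below assumes that comment rule (applied to the first column only) and plain whitespace tokenization with no quoting. It extracts the active options and also splits a file into halves, which is one way to bisect for the part that triggers the crash, following the reporter's "even half of it" observation. `active_options` and `split_halves` are hypothetical helpers, not part of Grid Engine.

```python
"""Hypothetical triage helpers for an sge_request file (not part of Grid Engine)."""
from __future__ import annotations


def active_options(text: str) -> list[str]:
    """Return option tokens, assuming lines starting with '#' are comments.

    Assumption: a comment must start in the first column; tokens are split
    on whitespace and quoting is ignored.
    """
    tokens: list[str] = []
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        tokens.extend(line.split())
    return tokens


def split_halves(text: str) -> tuple[str, str]:
    """Split a file into two halves by line count, for bisecting a crash."""
    lines = text.splitlines(keepends=True)
    mid = len(lines) // 2
    return "".join(lines[:mid]), "".join(lines[mid:])


if __name__ == "__main__":
    sample = "# Set e-mail address\n#-M test@test.com\n\n#-j y\n"
    print(active_options(sample))  # [] since every option line is commented out
    print(active_options("-j y\n-m e\n"))  # ['-j', 'y', '-m', 'e']
```

Each half can then be written to ~/.sge_request in turn to see which one still crashes the client.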
#802 worksforme IZ3265: array jobs with PE and dependencies killing qmaster kisielk
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=3265]

        Issue #:      3265             Platform:     All      Reporter: kisielk (kisielk)
       Component:     gridengine          OS:        All
     Subcomponent:    qmaster          Version:      6.2u5       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    ernst (ernst)
      QA Contact:     ernst
          URL:
       * Summary:     array jobs with PE and dependencies killing qmaster
   Status whiteboard:
      Attachments:

     Issue 3265 blocks:
   Votes for issue 3265:


   Opened: Mon Apr 26 09:41:00 -0700 2010 
------------------------


I'm able to reproduce this rather consistently in my 6.2u5 install.

If an array job that uses a PE is submitted and other jobs depend on it, the qmaster process crashes when the tasks in the array job are completing.

The messages log shows:

04/26/2010 09:27:57|worker|master|C|!!!!!!!!!! JB_ja_tasks not found in element !!!!!!!!!!
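The qmaster `messages` entries quoted in these tickets are pipe-separated: a timestamp, a thread name, a host, a one-letter severity, and the message text. Assuming that five-field layout (inferred from the lines quoted here, not from a format specification) and taking `C` to mean critical, the Python sketch below pulls critical entries such as the one above out of a messages file when looking for crash evidence. `parse_line`, `critical_entries`, and `LogEntry` are hypothetical names.

```python
from __future__ import annotations

from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    when: datetime
    thread: str
    host: str
    severity: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one messages-file line; return None if it does not match.

    Assumed layout (inferred from quoted lines): date time|thread|host|severity|text
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    stamp, thread, host, severity, message = parts
    try:
        when = datetime.strptime(stamp.strip(), "%m/%d/%Y %H:%M:%S")
    except ValueError:
        return None
    return LogEntry(when, thread.strip(), host.strip(), severity.strip(), message)


def critical_entries(lines: Iterable[str]) -> list[LogEntry]:
    """Keep only entries whose severity field is 'C' (assumed: critical)."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.severity == "C"]
```

Typical use would be `critical_entries(open(path))` on the qmaster messages file, where `path` is wherever that file lives on the installation.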

Restarting the qmaster just causes it to crash again. Sometimes there is enough time for me to fire off a qdel, but other times I have to manually delete the job directory in the
qmaster spool.

I have a copy of the spool directory of a job that exhibits this behaviour if that would help in diagnosing the problem.

   ------- Additional comments from kisielk Mon Apr 26 10:45:58 -0700 2010 -------
We did some further experiments. It seems this only happens if the dependent job is also an array job that uses -hold_jid_ad to depend on the job using a PE. If the dependent job uses
just -hold_jid, there is no problem.
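The reproduction described above can be scripted. The Python sketch below only builds the qsub command lines: a PE-using array job, plus a dependent array job linked with `-hold_jid_ad` (the crashing case) or plain `-hold_jid` (the control case reported not to crash). It relies on standard qsub options (`-N`, `-t`, `-pe`, `-b y`, `-hold_jid`, `-hold_jid_ad`). The PE name `mpi`, the slot count, the task range, the job names, and the `sleep` payloads are hypothetical placeholders. Given the reported qmaster crash, submitting the commands (for example with `subprocess.run`) belongs on a test cluster only.

```python
from __future__ import annotations

import shlex

# Hypothetical placeholders: use a PE and task range that exist on the test cluster.
PE_NAME = "mpi"
SLOTS = 2
TASKS = "1-4"


def parent_cmd(name: str = "pe_array") -> list[str]:
    """Array job that requests a parallel environment."""
    return ["qsub", "-N", name, "-t", TASKS, "-pe", PE_NAME, str(SLOTS),
            "-b", "y", "sleep", "30"]


def dependent_cmd(parent: str = "pe_array", array_dep: bool = True) -> list[str]:
    """Array job that depends on the parent, by job name.

    array_dep=True uses -hold_jid_ad (task-by-task dependency, the crashing
    case per the report); False uses plain -hold_jid (reported not to crash).
    """
    hold = "-hold_jid_ad" if array_dep else "-hold_jid"
    return ["qsub", "-N", "dep_array", "-t", TASKS, hold, parent,
            "-b", "y", "sleep", "5"]


if __name__ == "__main__":
    for cmd in (parent_cmd(), dependent_cmd(), dependent_cmd(array_dep=False)):
        print(shlex.join(cmd))
```

Both array jobs use the same task range so that the task-by-task dependency lines up one-to-one.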
#1339 worksforme job spool files getting lost dlove
Description

With a job running, shutting down and restarting qmaster produces messages like

06/20/2011 20:35:25|  main|lv3fn|C|can't get file stat for job file "jobs/00/0000/0018/common"

and loses the job(s).

Occurs at least with version [3938], but not with prerelease 8.0.0a.
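The path in the message above appears to follow a job spool layout in which the job ID is zero-padded to ten digits and split into 2/4/4-digit directories, so `jobs/00/0000/0018/common` would belong to job 18. That reading is inferred from this single path and should be treated as an assumption. The Python sketch below maps a job ID to the directory to inspect, which could also help locate the job directory that IZ3265 describes deleting by hand from the qmaster spool. `job_spool_dir` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def job_spool_dir(job_id: int, root: str = "jobs") -> PurePosixPath:
    """Return the assumed spool directory for a job ID.

    Assumption (inferred from "jobs/00/0000/0018/common"): the ID is
    zero-padded to 10 digits and split into 2/4/4-digit path components.
    """
    if job_id < 0 or job_id >= 10**10:
        raise ValueError(f"job id out of range: {job_id}")
    digits = f"{job_id:010d}"
    return PurePosixPath(root, digits[:2], digits[2:6], digits[6:])


if __name__ == "__main__":
    print(job_spool_dir(18) / "common")  # jobs/00/0000/0018/common
```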
