[GE users] NULL element for JAT_prio

Jesse Becker beckerjes at mail.nih.gov
Thu Dec 18 14:44:38 GMT 2008


andreas wrote:

>> 	for i in `ls $SGE_ROOT/$SGE_CELL/spool/qmaster/jobs/00/0132/`; do
>> 		qdel 132$i
>> 	done
>>
>> This doesn't work, and writes these lines to the qmaster messages file:
>>
>> 12/11/2008 16:24:11|worker|saturn|I|beckerjes has deleted job 1328049
>> 12/11/2008 16:28:18|worker|saturn|E|unable to retrieve template task
> 
> This happens in job_get_ja_task_template_pending(). For some reason this
> job had an empty JB_ja_template field. According to the code this can't
> happen, since all jobs get their JB_ja_template at submission time in qmaster.

I wonder if something happened at submission time that caused a job to be 
corrupted?
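
For anyone repeating the cleanup loop quoted above, here is a rough sketch of the same idea that avoids parsing ls output. It assumes what the loop itself implies: each entry under jobs/00/0132/ is the last four digits of a job id, so the id is the directory name plus the entry name with leading zeros dropped. The function name list_job_ids is made up for this example, and the leading "00" level of the path is ignored, so treat it as an illustration rather than a description of how qmaster spools jobs.

```shell
#!/bin/sh
# Sketch only: list_job_ids is a hypothetical helper, not part of Grid Engine.
# Assumption: a spool entry such as .../jobs/00/0132/8049 belongs to job 1328049.

list_job_ids() {
    dir=$1                                # e.g. .../spool/qmaster/jobs/00/0132
    prefix=$(basename "$dir")             # "0132"
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue       # skip the unmatched glob when dir is empty
        id="$prefix$(basename "$entry")"  # "0132" + "8049" -> "01328049"
        echo "$id" | sed 's/^0*//'        # drop leading zeros -> "1328049"
    done
}

# Dry run: print the qdel commands instead of running them.
#   for id in $(list_job_ids "$SGE_ROOT/$SGE_CELL/spool/qmaster/jobs/00/0132"); do
#       echo qdel "$id"
#   done
```

Printing the commands first makes it possible to check the ids against qstat before deleting anything.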

> I think we must find the root cause of this phenomenon.

That's good to hear. :)

> Was there anything suspicious before qstat started failing with pending
> jobs? E.g. were these pending jobs running before and then rescheduled?

I'm not aware of anything like that happening, although it should be in the 
logs if it did.  We typically do not reschedule jobs, nor are jobs changed 
with qalter after submission.

> Could you send me qmaster accounting + messages files for investigation?

Yes.  I'll send those directly to you in a separate email so as to not clog 
the listserv.


-- 
Jesse Becker
NHGRI Linux support (Digicon Contractor)

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=93207
