[GE users] Understanding the jobs/ directories under the spool (common vs. .common)

fy fly at anydata.co.uk
Wed Sep 30 20:34:00 BST 2009


Chris,

I thought I'd have a go at digging into this.

The "not enough memory" error seems to be coming from a failed calloc() in
cull_pack.c/cull_unpack_elem_partial(). Is it possible that you are in
fact running out of memory?

The "common" file is created in
read_write_job.c/job_write_common_part(), first as ".common" and then
renamed to "common", as Rayson has already pointed out.
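
The write-then-rename pattern can be sketched roughly as follows. This is only an illustration in Python, not the Grid Engine code (which is C); the function name write_common_atomically and the payload are made up for the example. With this pattern, a ".common" left sitting next to a "common" would suggest that some write was interrupted before its rename step.

```python
import os
import tempfile


def write_common_atomically(task_dir: str, payload: bytes) -> str:
    """Write payload to '<task_dir>/.common', then rename it to 'common'.

    Hypothetical helper for illustration only; it mimics the
    write-to-hidden-file-then-rename idea, not the real spooling code.
    """
    tmp_path = os.path.join(task_dir, ".common")
    final_path = os.path.join(task_dir, "common")

    # Step 1: write the complete contents under the temporary name.
    with open(tmp_path, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # push the data to disk before renaming

    # Step 2: rename over the final name. If the process dies before this
    # line, the directory is left with a stray ".common".
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        write_common_atomically(d, b"packed job data")
        print(sorted(os.listdir(d)))
```

Readers of "common" never see a half-written file this way, because the name only ever points at a complete one.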

As for understanding the directory structure, there is a nice ASCII-art
diagram in the source; search for "job_write_spool_file" here:
http://gridengine.sunsource.net/source/browse/gridengine/source/libs/spool/classic/read_write_job.c?revision=1.26&view=markup
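
Going by the path in Chris's error message (jobs/00/0025/5285/1-4096/1/.common), the layout appears to split a zero-padded job number into 2/4/4-digit directories and to group array tasks into ranges. A rough Python sketch of that mapping is below. The function name, the ten-digit padding, the 4096-tasks-per-directory figure and the job number 255285 are all my assumptions, inferred from that one path rather than taken from the source.

```python
import posixpath

# Assumption: inferred from the "1-4096" directory name in the error message.
TASKS_PER_DIR = 4096


def task_spool_dir(job_id: int, task_id: int) -> str:
    """Guess the spool directory for one array task (hypothetical helper)."""
    # Assumption: the job number is zero-padded to ten digits, split 2/4/4.
    digits = f"{job_id:010d}"
    job_parts = (digits[0:2], digits[2:6], digits[6:10])

    # Assumption: tasks are grouped into fixed-size ranges starting at 1.
    start = ((task_id - 1) // TASKS_PER_DIR) * TASKS_PER_DIR + 1
    end = start + TASKS_PER_DIR - 1

    return posixpath.join("jobs", *job_parts, f"{start}-{end}", str(task_id))


print(task_spool_dir(255285, 1))  # jobs/00/0025/5285/1-4096/1
```

The ASCII art in read_write_job.c is the authority here; the sketch only reproduces the one path seen in this thread.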

Sorry, that's as far as I can go!

Cheers
Fred Youhanaie


rayson wrote:
> How about this one?
> 
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2801
> 
> Rayson
> 
> 
> 
> On 9/30/09, templedf <dan.templeton at sun.com> wrote:
>> A quick grep through the source code doesn't show anywhere that .common
>> is used as a file name.  Doesn't mean it's not hidden in there, though.
>> Are you sure that's not some kind of NFS artifact?
>>
>> Daniel
>>
>> craffi wrote:
>>> I'm trying to figure out why I've got a qmaster that crashes just
>>> after spitting out this error:
>>>
>>> main|55p02g|E|not enough memory for unpacking pe_task "jobs/00/0025/5285/1-4096/1/.common"
>>>
>>> The ".common" file is interesting, there is only one job directory
>>> that has that:
>>>
>>>
>>>> -bash-3.2# find . -name ".common" -print
>>>> ./5285/1-4096/1/.common
>>>>
>>>>
>>>
>>> All of the other 200 and 400-CPU parallel jobs don't have a ".common"
>>> file.
>>>
>>> And looking at that directory we see:
>>>
>>>
>>>> -rw-r--r-- 1 root root 24533 Sep 30 09:26 common
>>>> -rw-r--r-- 1 root root 24552 Sep 30 09:26 .common
>>>> -rw-r--r-- 1 root root  1326 Sep 30 09:26 past_usage
>>>>
>>>>
>>>
>>> Is there anything odd about this? Would it be normal to have a common
>>> and a .common file in a job directory? I'm deep into directory
>>> structures that I don't really understand all that well here.
>>>
>>> -Chris
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219818
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219823
>>
>>
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219825
> 
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219838
