Opened 9 years ago

Last modified 9 years ago

#1299 new enhancement

Additional architecture em64t

Reported by: Reuti
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 6.2u5
Severity: minor
Keywords:
Cc:

Description

Although such an architecture is identical to amd64, it would make it possible to address the correct binaries in a mixed cluster containing both architectures, by using $ARC in the path to the binaries in the job script. In the past, this was mostly useful when a cluster already had both 64-bit and 32-bit nodes.

Side effect: qhost would then also show the actual architecture.
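
As a sketch of the idea, a job could build the path to its binaries from $ARC, which SGE sets in the job environment. The installation root /opt/myapp, the bin/ subdirectory, the program name solver and the function binary_for_arch are all hypothetical, chosen only for illustration.

```python
import os

# Hypothetical installation root with one subdirectory per SGE architecture string.
APP_ROOT = "/opt/myapp"


def binary_for_arch(program, arc=None, root=APP_ROOT):
    """Return the path of `program` built for the given architecture.

    If `arc` is not given, fall back to the ARC variable SGE sets for the job.
    """
    if arc is None:
        arc = os.environ.get("ARC")
    if not arc:
        raise RuntimeError("ARC is not set; is this running inside an SGE job?")
    return os.path.join(root, arc, "bin", program)


if __name__ == "__main__":
    print(binary_for_arch("solver", arc="lx26-em64t"))  # → /opt/myapp/lx26-em64t/bin/solver on Linux
```

With a separate em64t architecture, the same job script would resolve to .../lx26-em64t/bin/solver on Intel nodes and .../lx26-amd64/bin/solver on AMD nodes, without having to inspect the CPU itself.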

Workaround for now: install the amd64 binaries, use sed to replace all occurrences of amd64 inside the binaries with em64t (the directories must also be renamed), install the amd64 binaries again, and then install the common tarball. As a final step, the arch script must be adjusted (the change is somewhere in the mail archive).

Of course, the workaround will fail if the byte sequence amd64 happens to appear as part of the machine code rather than as a string.
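
The in-place string replacement only works because amd64 and em64t are both five bytes long, so no offsets inside the binary shift. A minimal Python equivalent of the sed step might look like the following; it also reports how many occurrences it replaced, which at least makes unexpected matches (such as the machine-code case above) visible. The function names are hypothetical, and it overwrites files in place, so it should only be run on copies.

```python
import sys
from typing import Tuple

OLD = b"amd64"
NEW = b"em64t"
# Equal length keeps every offset in the file unchanged after replacement.
assert len(OLD) == len(NEW)


def patch_arch_string(data: bytes) -> Tuple[bytes, int]:
    """Replace every OLD byte sequence with NEW; return patched data and count."""
    count = data.count(OLD)
    return data.replace(OLD, NEW), count


def patch_file(path: str) -> int:
    """Patch one file in place (work on a copy!) and return the replacement count."""
    with open(path, "rb") as f:
        data = f.read()
    patched, count = patch_arch_string(data)
    with open(path, "wb") as f:
        f.write(patched)
    return count


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {patch_file(path)} occurrence(s) replaced")
```

A count that differs from the number of arch strings you expect in a given binary is a hint that some matches are not strings at all.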

Change History (2)

comment:1 follow-up: Changed 9 years ago by dlove

I'm afraid I don't understand this, even after finding what I assume
is the right list thread:
http://gridengine.markmail.org/message/toebsap5krdxuvop?q=em64t#query:em64t+page:1+mid:i46obkrnmvbaoqpy+state:results
I don't actually know what em64t means -- it's not an autotools
architecture, for instance.

Is this an incompatibility problem with the SGE binaries, in which
case we should just get the compiler flags right? The OP in the
thread above seems to want just a complex, possibly set by a load
sensor (like Olesen's example, if I recall correctly). For running
binaries with different optimizations, you need quite fine control
anyhow, e.g. we have three types of Opteron nodes (soon to be four)
and some Westmere. Also you may want to take into account, say, cache
size as well as the ISA. While they benefit from different compute
binaries, I don't want them running different SGE binaries.

Would it be useful if we distributed an example load sensor for
fine-grained architecture, for instance, or is there something else
I'm missing here?
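
For reference, an SGE load sensor is a program started by the execd that answers each request line on stdin with `begin`, one `host:name:value` line per reported value, and `end`, and exits when it reads `quit`. A minimal Python sketch reporting the CPU vendor from /proc/cpuinfo follows; the complex name cpu_vendor and the vendor short names are hypothetical, and the complex would first have to be defined (e.g. with `qconf -mc`) before jobs could request it.

```python
import socket
import sys
from typing import TextIO

# Hypothetical complex name; it must be defined in the complex configuration first.
COMPLEX = "cpu_vendor"
VENDORS = {"GenuineIntel": "intel", "AuthenticAMD": "amd"}


def cpu_vendor(cpuinfo_text: str) -> str:
    """Map the vendor_id field of /proc/cpuinfo to a short name."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("vendor_id"):
            vendor = line.split(":", 1)[1].strip()
            return VENDORS.get(vendor, vendor)
    return "unknown"


def report(host: str, vendor: str) -> str:
    """Format one load report as begin / host:name:value / end."""
    return f"begin\n{host}:{COMPLEX}:{vendor}\nend\n"


def serve(stdin: TextIO, stdout: TextIO, host: str, vendor: str) -> None:
    """Answer each request from the execd until it sends 'quit'."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(report(host, vendor))
        stdout.flush()  # the execd waits for the complete report


def main() -> None:
    # The host name must match the name SGE uses for this execution host.
    host = socket.gethostname()
    try:
        with open("/proc/cpuinfo") as f:
            vendor = cpu_vendor(f.read())  # read once: the CPU won't change at runtime
    except OSError:
        vendor = "unknown"
    serve(sys.stdin, sys.stdout, host, vendor)


if __name__ == "__main__":
    main()
```

A job could then ask for Intel nodes with `-l cpu_vendor=intel`.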

comment:2 in reply to: ↑ 1 Changed 9 years ago by Reuti

(Yep, the markmail entry is the right one.)

Is amd64 an autotools architecture? I often see a generic x86_64. I thought the arch script in SGE had its own rules.

The reason for having a new architecture was to be able to request Intel machines with -l arch=lx26-em64t, and also to have qhost output that correctly reflects the vendor of the installed CPU. Sure, the former can be done with a load sensor if you want the job to land on a machine with a particular CPU type. But inside the job script you would then have to determine the vendor again, I think, unless you run qstat on the job itself and parse the requested machine type. Having $ARC already set up correctly would make this easier.

The SGE binaries themselves are the same whether the CPU is AMD or Intel (in my workaround only the arch string is replaced, and the binaries are placed in new subdirectories named like lx26-em64t, because the path is hardcoded), so I understand the reason for having only one set. My brute-force approach of replacing every occurrence of amd64 with em64t was more of a proof of concept.

Since the architecture is hardcoded in some binaries, perhaps this could instead be solved by some kind of "machine type aliasing" rather than by duplicating the binaries.

By default the architecture would stay lx26-amd64, but if a file sge_architectures exists in $SGE_ROOT/$SGE_CELL/common, the execd would use it to assign an arbitrary string instead of lx26-amd64; this string would then appear in the qhost output and be used for setting $ARC. For path resolution, however, the default lx26-amd64 would still be used, so only one set of binaries is needed. The sge_architectures file would be parsed only once, when the execd starts, since the architecture doesn't change while the machine is running.

Alternatively, a special load sensor could be queried once when the execution host starts, to create the proper entry on the fly.
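
The proposed aliasing could be sketched as follows. The ticket does not define a format for sge_architectures, so the one used here (one `<default arch> <vendor_id> <alias>` entry per line, `#` for comments) is purely hypothetical, as are the function names; the point is only that the reported architecture (qhost, $ARC) and the architecture used for path resolution are kept separate.

```python
from typing import Dict, Tuple

# Hypothetical format for $SGE_ROOT/$SGE_CELL/common/sge_architectures:
#   <default arch> <vendor_id> <alias>
# Blank lines and lines starting with '#' are ignored.
EXAMPLE = """\
# default      vendor_id     alias
lx26-amd64     GenuineIntel  lx26-em64t
"""

Table = Dict[Tuple[str, str], str]


def parse_architectures(text: str) -> Table:
    """Parse the alias table into {(default_arch, vendor_id): alias}."""
    table: Table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        default_arch, vendor, alias = line.split()  # exactly three fields expected
        table[(default_arch, vendor)] = alias
    return table


def resolve(default_arch: str, vendor: str, table: Table) -> Tuple[str, str]:
    """Return (reported_arch, path_arch).

    reported_arch would be shown by qhost and exported as $ARC;
    path_arch stays the default, so only one set of binaries is needed.
    """
    reported = table.get((default_arch, vendor), default_arch)
    return reported, default_arch


if __name__ == "__main__":
    table = parse_architectures(EXAMPLE)  # parsed once, as the execd would at startup
    print(resolve("lx26-amd64", "GenuineIntel", table))  # → ('lx26-em64t', 'lx26-amd64')
    print(resolve("lx26-amd64", "AuthenticAMD", table))  # → ('lx26-amd64', 'lx26-amd64')
```

Hosts without a matching entry simply keep the default, so an empty or missing file behaves exactly like today.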
