Discussion:
gcc target_clones
Solar Designer
2015-12-07 21:41:20 UTC
Hi,

This is a new/upcoming gcc feature that we might find useful:

https://developerblog.redhat.com/2015/12/07/octobernovember-2015-gnu-toolchain-update/

"GCC's function attribute feature has been extended to support another
attribute: target_clones (<options>). This is used to specify that a
function is to be cloned into multiple versions compiled with different
target options than specified on the command line. The supported
options and restrictions are the same as for target attribute.
For instance on an x86, you could compile a function with
target_clones("sse4.1,avx"). It will create 2 function clones, one
compiled with -msse4.1 and another with -mavx. At the function call it
will create resolver ifunc, that will dynamically call a clone suitable
for current architecture."
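
For reference, a minimal sketch of what using the attribute might look like. The function sum_u32 and the MULTIVERSION macro are hypothetical names made up for illustration, and the preprocessor guard is only an assumption about where the feature is usable (GCC-compatible compiler, x86-64, Linux with ifunc support). The sketch lists an explicit "default" clone, on the assumption that the compiler wants a fallback version to be named.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: only request clones where the attribute is likely to work;
 * everywhere else MULTIVERSION expands to nothing and we get one plain copy. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
#define MULTIVERSION __attribute__((target_clones("avx2", "sse4.1", "default")))
#else
#define MULTIVERSION
#endif

/* Hypothetical hot loop. The compiler emits one clone per listed target,
 * plus a resolver that picks a clone at load time based on the running CPU. */
MULTIVERSION
uint32_t sum_u32(const uint32_t *v, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}
```

Callers just call sum_u32() as usual; the dispatch is invisible at the call site.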

I found that this was discussed on the gcc-patches list (along with
actual gcc patches implementing the feature) in October. It is unclear
to me whether this made it, or will make it soon, into an official gcc
release.

Alexander
magnum
2015-12-07 23:10:47 UTC
Post by Solar Designer
https://developerblog.redhat.com/2015/12/07/octobernovember-2015-gnu-toolchain-update/
"GCC's function attribute feature has been extended to support another
attribute: target_clones (<options>). This is used to specify that a
function is to be cloned into multiple versions compiled with different
target options than specified on the command line. The supported
options and restrictions are the same as for target attribute.
For instance on an x86, you could compile a function with
target_clones("sse4.1,avx"). It will create 2 function clones, one
compiled with -msse4.1 and another with -mavx. At the function call it
will create resolver ifunc, that will dynamically call a clone suitable
for current architecture."
Cool. I'm not sure it helps a lot, since this is basically trivial to do
without it. But it should help a little, provided we can live with the
portability issues.
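
By "trivial without it" I mean something along these lines: a rough sketch of manual dispatch using GCC's __builtin_cpu_supports() and a function pointer. The names xor_fold, xor_fold_generic, xor_fold_avx2 and pick_xor_fold are hypothetical and not from our tree, and the HAVE_X86_DISPATCH guard is an assumption about where the builtins are available.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the CPU-detection builtins exist on GCC-compatible x86 compilers. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

/* Baseline copy, built with whatever options are on the command line. */
static uint32_t xor_fold_generic(const uint32_t *v, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc ^= v[i];
    return acc;
}

#if HAVE_X86_DISPATCH
/* Same source, but this copy may be compiled using AVX2 instructions. */
__attribute__((target("avx2")))
static uint32_t xor_fold_avx2(const uint32_t *v, size_t n)
{
    uint32_t acc = 0;

    for (size_t i = 0; i < n; i++)
        acc ^= v[i];
    return acc;
}
#endif

typedef uint32_t (*xor_fold_fn)(const uint32_t *, size_t);

/* Choose an implementation based on the CPU we are actually running on. */
static xor_fold_fn pick_xor_fold(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return xor_fold_avx2;
#endif
    return xor_fold_generic;
}

/* Public entry point: resolves lazily on first call (not thread-safe as written). */
uint32_t xor_fold(const uint32_t *v, size_t n)
{
    static xor_fold_fn impl;

    if (!impl)
        impl = pick_xor_fold();
    return impl(v, n);
}
```

This is roughly what the resolver would do for us, minus the ifunc dependency.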

The reason I did not already implement this manually is that our main
obstacle is not just making multiple copies of e.g. SIMDmd4body. Between
e.g. AVX and AVX2 we have a bigger problem: the format itself, and all of
its functions that depend on SIMD width and/or interleaving factor. Still
doable of course, but trickier.

So we'd normally have to clone at least set_key, get_key, crypt_all,
cmp_*, get_hash_* and, for salted formats, often set_salt too and
perhaps even get_salt. I wonder if the compiler would "optimize away"
redundant clones that actually ended up identical (e.g. for SSE2..AVX
inclusive).

Also, we'd need to change a few things. For example, using this we would
no longer be able to have consistent SIMD_COEF_32, SIMD_PARA_MD4 or
GETPOS macros. And how would we use pseudo-intrinsics.h? Perhaps we can
source it *within* the functions that have target clones?
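
To illustrate the macro problem, here is one possible shape, purely as a sketch: instead of a single global GETPOS tied to one SIMD_COEF_32, stamp out one position helper per interleaving factor and keep them in a per-variant table. Everything here (DEFINE_GETPOS, getpos_sse2, getpos_avx2, struct simd_variant) is hypothetical, and the buffer layout is only an example of an interleaved layout, not necessarily the one any of our formats use.

```c
#include <stddef.h>

/*
 * Generate a helper that maps (byte offset i, candidate index) to a position
 * in an interleaved key buffer. `coef` is the number of 32-bit lanes per
 * vector and must be a power of two. The four terms are: lane within the
 * vector, 32-bit word row, byte within the word, and 64-byte block group.
 */
#define DEFINE_GETPOS(name, coef)                      \
    static size_t name(size_t i, size_t index)         \
    {                                                  \
        return (index & ((coef) - 1)) * 4              \
             + (i & ~(size_t)3) * (coef)               \
             + (i & 3)                                 \
             + index / (coef) * 64 * (coef);           \
    }

DEFINE_GETPOS(getpos_sse2, 4) /* 128-bit vectors: 4 lanes */
DEFINE_GETPOS(getpos_avx2, 8) /* 256-bit vectors: 8 lanes */

/* What used to be compile-time constants becomes per-variant data. */
struct simd_variant {
    const char *name;
    size_t coef;
    size_t (*getpos)(size_t i, size_t index);
};

static const struct simd_variant variants[] = {
    { "sse2", 4, getpos_sse2 },
    { "avx2", 8, getpos_avx2 },
};
```

The format functions would then take the width from the selected variant at run time, which is exactly the part that makes this trickier than just cloning the hash body.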

magnum