Discussion:
loader and cracker (prefetch) optimizations from September 2015
Solar Designer
2016-01-21 11:08:11 UTC
Hi magnum -

FYI, I've just committed the loader and cracker optimizations from
September 2015 to the core tree. There are a few differences from what
went into jumbo back then, but almost all of those are on purpose, and
should remain as differences. For example, show_uid_in_cracks is
jumbo-specific, which affected these changes in a few places. So when
merging these, you should mostly keep the code currently in jumbo as-is,
even if the new core code does those things differently.

One exception to that is SSE2 vs. SSE checks for the prefetching.
It turns out those prefetch instructions and intrinsics are available
with plain SSE (Pentium 3) rather than require SSE2 (Pentium 4), so
let's in fact be checking __SSE__ and #include'ing <xmmintrin.h>, rather
than checking __SSE2__ and #include'ing <emmintrin.h>.
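As a minimal sketch of the guard described above (not the actual cracker.c code): the prefetch intrinsic and its hint constants come from <xmmintrin.h>, so testing __SSE__ is sufficient. CRK_PREFETCH is defined locally here only as a stand-in for the real build-time switch, and PREFETCH_NTA, AHEAD and sum_lookups() are hypothetical names used for illustration.

```c
#include <stddef.h>

/* Assumption: stand-in for the real build-time switch, which is
 * defined elsewhere in the tree. */
#ifndef CRK_PREFETCH
#define CRK_PREFETCH 1
#endif

#if CRK_PREFETCH && defined(__SSE__)
#include <xmmintrin.h> /* _mm_prefetch() and _MM_HINT_* need only SSE */
#define PREFETCH_NTA(addr) _mm_prefetch((const char *)(addr), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(addr) ((void)(addr)) /* no-op fallback */
#endif

#define AHEAD 8 /* hypothetical lookahead distance, not tuned */

/* Hypothetical helper: sums table entries selected by idx[], issuing a
 * prefetch for the entry that will be needed AHEAD iterations later. */
unsigned long sum_lookups(const unsigned int *table, const size_t *idx,
    size_t n)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + AHEAD < n)
            PREFETCH_NTA(&table[idx[i + AHEAD]]);
        sum += table[idx[i]];
    }
    return sum;
}
```

On builds without SSE the macro expands to a no-op, so the loop still works, just without the prefetch.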

More importantly, our use of the NTA hint probably results in
performance regressions for some hash counts (neither very small nor
very large), as it reduces use of L2+ caches. I've actually seen at
least one such regression, where replacing the hint with T0 helped.
Unfortunately, NTA is in fact better than T0 for huge password hash
files, like the 29M test case. Maybe we need to move this portion of
code (the whole prefetching cracker) into an inline-able function, and
have the compiler specialize it in two different ways. And we'd need
some threshold parameter to choose one or the other per-salt.
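One way the idea above might look, purely as a sketch: a single inline worker takes the hint choice as a flag, two thin wrappers pass it as a constant so the compiler can emit one specialized copy per hint, and a threshold selects between them. All names here (lookup_all, lookup_for_salt, NTA_THRESHOLD, AHEAD) and the threshold value itself are hypothetical and untuned; the real per-salt data structures are not shown.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#define PREFETCH(addr, hint) _mm_prefetch((const char *)(addr), (hint))
#else
#define PREFETCH(addr, hint) ((void)(addr)) /* no-op fallback */
#define _MM_HINT_NTA 0
#define _MM_HINT_T0 1
#endif

#define AHEAD 8               /* hypothetical lookahead distance */
#define NTA_THRESHOLD 1000000 /* hypothetical hash count cutoff, untuned */

/* Shared worker. The hint passed to the intrinsic must be a compile-time
 * constant, so we branch on use_nta; once inlined with a constant
 * argument, the compiler can drop the unused branch. */
static inline unsigned long lookup_all(const unsigned int *table,
    const size_t *idx, size_t n, int use_nta)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + AHEAD < n) {
            if (use_nta)
                PREFETCH(&table[idx[i + AHEAD]], _MM_HINT_NTA);
            else
                PREFETCH(&table[idx[i + AHEAD]], _MM_HINT_T0);
        }
        sum += table[idx[i]];
    }
    return sum;
}

/* Two specializations of the same code, one per hint. */
static unsigned long lookup_all_nta(const unsigned int *table,
    const size_t *idx, size_t n)
{
    return lookup_all(table, idx, n, 1);
}

static unsigned long lookup_all_t0(const unsigned int *table,
    const size_t *idx, size_t n)
{
    return lookup_all(table, idx, n, 0);
}

/* Per-salt dispatch: NTA for huge hash counts, T0 otherwise. */
unsigned long lookup_for_salt(const unsigned int *table, const size_t *idx,
    size_t n, size_t hash_count)
{
    if (hash_count >= NTA_THRESHOLD)
        return lookup_all_nta(table, idx, n);
    return lookup_all_t0(table, idx, n);
}
```

Both variants compute the same result; only the cache hint differs, so the right threshold would have to be found by benchmarking.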

Thanks,

Alexander
magnum
2016-01-22 17:24:08 UTC
Post by Solar Designer
> FYI, I've just committed the loader and cracker optimizations from
> September 2015 to the core tree. There are a few differences from what
> went into jumbo back then, but almost all of those are on purpose, and
> should remain as differences. For example, show_uid_in_cracks is
> jumbo-specific, which affected these changes in a few places. So when
> merging these, you should mostly keep the code currently in jumbo as-is,
> even if the new core code does those things differently.
>
> One exception to that is SSE2 vs. SSE checks for the prefetching.
> It turns out those prefetch instructions and intrinsics are available
> with plain SSE (Pentium 3) rather than require SSE2 (Pentium 4), so
> let's in fact be checking __SSE__ and #include'ing <xmmintrin.h>, rather
> than checking __SSE2__ and #include'ing <emmintrin.h>.

Thanks for pointing out how to merge this, it really made it easy. I
think I got it right: Attached is the net changes to Jumbo.

I did not "reduce CRACKED_HASH_LOG from jumbo's 25 to 21"... should we
do so in Jumbo?

magnum
Solar Designer
2016-01-23 14:42:00 UTC
Hi magnum,

> I think I got it right: Attached is the net changes to Jumbo.

These look good to me, except that in cracker.c there's this redundant
piece left:

#if CRK_PREFETCH && defined(__SSE2__)
#include <emmintrin.h>
#endif

it becomes redundant with the addition of:

#if CRK_PREFETCH && defined(__SSE__)
#include <xmmintrin.h>
#endif

higher in the same file. Please remove the __SSE2__ / emmintrin.h piece.

> I did not "reduce CRACKED_HASH_LOG from jumbo's 25 to 21"... should we
> do so in Jumbo?

I think not. Let's keep things as they are.

Thanks!

Alexander
Solar Designer
2016-01-23 14:48:47 UTC
magnum -

There's an unneeded difference in variable naming in
ldr_init_password_hash(). I think you should take this entire function
from core as-is.

Thanks,

Alexander
magnum
2016-01-24 15:36:07 UTC
Post by Solar Designer
> These look good to me, except that in cracker.c there's this redundant
> #if CRK_PREFETCH && defined(__SSE2__)
> #include <emmintrin.h>
> #endif
>
> There's an unneeded difference in variable naming in
> ldr_init_password_hash(). I think you should take this entire function
> from core as-is.

Both fixed, thanks.

magnum
