Discussion:
GPU-side mask mode
magnum
2016-08-26 17:55:53 UTC
The good news:
We now have gpu-side mask mode (and compare) for NTLMv2-opencl. Speed
seems to be roughly on par with Hashcat even though we fully support
Unicode:

[***@super src]$ ../run/john -form:ntlmv2-opencl -test -mask
Device 6: GeForce GTX TITAN X
Benchmarking: ntlmv2-opencl, NTLMv2 C/R [MD4 HMAC-MD5 OpenCL]... DONE, GPU util:99%
Raw: 1097M c/s real, 1097M c/s virtual

[***@super src]$ ../run/john -form:ntlmv2-opencl -test -mask -enc:utf8
Device 6: GeForce GTX TITAN X
Benchmarking: ntlmv2-opencl, NTLMv2 C/R [MD4 HMAC-MD5 OpenCL] in UTF-8 mode... DONE, GPU util:99%
Raw: 1094M c/s real, 1094M c/s virtual

The bad news:
I did this basically by copying Sayantan's mscash-opencl host code and
part of the kernel and then changing it for my needs. The bottom line is I
still don't understand all of it. Lots of code is duplicated across
formats. I think Sayantan started to clean that up (the NT format seems
to use more shared code) but I have yet to digest that.

Anyway I think I'll implement GPU-side mask mode in a few more formats
before trying to do any clean-up. Perhaps I'll get to understand some
more details while working with that.

https://github.com/magnumripper/JohnTheRipper/issues/1845

magnum
jfoug
2016-08-26 18:18:59 UTC
Post by magnum
We now have gpu-side mask mode (and compare) for NTLMv2-opencl. Speed
seems to be roughly on par with Hashcat even though we fully support
I did this basically by copying Sayantan's mscash-opencl host code and
part of the kernel and then changing it for my needs. The bottom line is I
still don't understand all of it. Lots of code is duplicated across
formats. I think Sayantan started to clean that up (the NT format
seems to use more shared code) but I have yet to digest that.
(pardon my ignorance, I do not know this code at all).

Is there a way to make this into a module (like we have done with SIMD
code), so that it does not have to be hooked so deeply into the format's
kernel code? There would still be some interface code (like we have
with SIMD), but the main functionality is external.

I am sending this offlist, as I do not want to pollute your thread IF
this is total PEBCAK.
magnum
2016-08-26 18:28:30 UTC
Post by jfoug
Post by magnum
We now have gpu-side mask mode (and compare) for NTLMv2-opencl. Speed
seems to be roughly on par with Hashcat even though we fully support
I did this basically by copying Sayantan's mscash-opencl host code and
part of the kernel and then changing it for my needs. The bottom line is I
still don't understand all of it. Lots of code is duplicated across
formats. I think Sayantan started to clean that up (the NT format
seems to use more shared code) but I have yet to digest that.
Is there a way to make this into a module (like we have done with SIMD
code), so that it does not have to be hooked so deeply into the format's
kernel code? There would still be some interface code (like we have
with SIMD), but the main functionality is external.
Yes, but it's currently an enormous amount of code spread all over the
CPU-side format. It needs a LOT of work. Ideally we should just call some
shared function from reset(), and some other function from crypt_all()
(and so on) and be done with it.

On kernel side, things are much better. It's the host code that knocks
me cold.
Post by jfoug
I am sending this offlist, as I do not want to pollute your thread IF
this is total PEBCAK
No, you didn't :-) But it's a good discussion to have here.

magnum
jfoug
2016-08-26 18:45:44 UTC
Post by magnum
Post by jfoug
Is there a way to make this into a module (like we have done with SIMD
code), so that it does not have to be hooked so deeply into the format's
kernel code? There would still be some interface code (like we have
with SIMD), but the main functionality is external.
Yes, but it's currently an enormous amount of code all over the
CPU-side format. It needs a LOT of work. Ideally we should just call
some shared function from reset(), and some other function from
crypt_all() (and so on) and be done with it.
On kernel side, things are much better. It's the host code that knocks
me cold.
It's good that the 'kernel' side is 'cleaner' and somewhat modular. I
agree with you on the location of the CPU hooking code. So if the code
could be made more 'generic' (with no loss of speed), then by simply
changing a few lines in those two CPU-side format functions and adding
whatever kernel hooks are needed, any format would get the benefits
of GPU masking. That was my question. It would be a HELL of a lot
easier to do this now than to hammer out a dozen more formats and then
have to decouple all of them. Just think of having to decouple 50
formats that have SIMD code intertwined (like some of the -ng formats).
It would really slow down enhancements to, or use of, the SIMD code. Yes,
the callable SIMD code is not always 100% the fastest, but it is pretty
darn close, AND it is pretty trivial to hook into a format.
Post by magnum
Post by jfoug
I am sending this offlist, as I do not want to pollute your thread IF
this is total PEBCAK
No you didn't :-) But this is a good discussion here.
I saw the moment I clicked send that I had not changed the 'to:'
value, lol. Oh well, I would have brought it to the list anyway once I
saw it was not 100% PEBCAK.

Jim.
magnum
2016-08-27 08:24:47 UTC
Post by jfoug
Post by magnum
Post by jfoug
Is there a way to make this into a module (like we have done with SIMD
code), so that it does not have to be hooked so deeply into the format's
kernel code? There would still be some interface code (like we have
with SIMD), but the main functionality is external.
Yes, but it's currently an enormous amount of code all over the
CPU-side format. It needs a LOT of work. Ideally we should just call
some shared function from reset(), and some other function from
crypt_all() (and so on) and be done with it.
On kernel side, things are much better. It's the host code that knocks
me cold.
It's good that the 'kernel' side is 'cleaner' and somewhat modular. I
agree with you on the location of the CPU hooking code. So if the code
could be made more 'generic' (with no loss of speed), then by simply
changing a few lines in those two CPU-side format functions and adding
whatever kernel hooks are needed, any format would get the benefits
of GPU masking. That was my question. It would be a HELL of a lot
easier to do this now than to hammer out a dozen more formats and then
have to decouple all of them. Just think of having to decouple 50
formats that have SIMD code intertwined (like some of the -ng formats).
Yes, but this only affects 3-4 more "fast" formats, so it might be the
quickest way ahead. With this in place we're not far away from a Jumbo-2.

Actually all our OpenCL formats are an awful mess already *without*
gpu-side mask and compare. With some careful designing they could be a
lot cleaner and use more shared code, e.g. for setting up (and tearing
down) buffers.

Oh and BTW, the worst part here is not really the GPU-side *mask*, it's
the host side of GPU-side *compare*. The format source file grew by
447 lines, or 71%.

magnum