db built from test vectors
magnum
2015-10-18 22:57:10 UTC
Solar,

https://github.com/magnumripper/JohnTheRipper/issues/1835

A problem with many GPU-mask formats is that they need a lot of special
handling for the test/benchmark case, where there is no db. I have added
an experimental function that creates a supposedly complete db from a
format's test vectors:

struct db_main *ldr_init_fake_db(struct fmt_main *format);

So we could have this in a format, or perhaps in benchmark_format():

if (!db) {
    fake_db = 1;
    db = ldr_init_fake_db(format);
}

(...)

if (fake_db)
    ldr_free_fake_db(db);
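
To make the idea concrete, here is a minimal, self-contained sketch of how a db could be derived from test vectors. It is not the actual ldr_init_fake_db() code: struct test_vector, struct toy_salt, struct toy_db, toy_init_fake_db() and toy_free_fake_db() are hypothetical stand-ins, and the "salt$hash" ciphertext layout is assumed purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the real structs. */
struct test_vector {
    const char *ciphertext;   /* assumed layout: "salt$hash" */
    const char *plaintext;
};

struct toy_salt {
    const char *salt;         /* points into a test vector; not owned */
    size_t salt_len;
    int hash_count;           /* number of test hashes using this salt */
    struct toy_salt *next;
};

struct toy_db {
    struct toy_salt *salts;   /* linked list of distinct salts */
    int salt_count;
    int password_count;
};

/* Free a db created by toy_init_fake_db(); accepts NULL. */
static void toy_free_fake_db(struct toy_db *db)
{
    if (!db)
        return;
    struct toy_salt *s = db->salts;
    while (s) {
        struct toy_salt *next = s->next;
        free(s);
        s = next;
    }
    free(db);
}

/* Build a db from a test vector array terminated by a NULL ciphertext. */
static struct toy_db *toy_init_fake_db(const struct test_vector *tests)
{
    struct toy_db *db = calloc(1, sizeof(*db));
    if (!db)
        return NULL;

    for (; tests->ciphertext; tests++) {
        const char *sep = strchr(tests->ciphertext, '$');
        size_t len = sep ? (size_t)(sep - tests->ciphertext) : 0;
        struct toy_salt *s;

        /* Reuse an existing salt entry if this salt was seen before. */
        for (s = db->salts; s; s = s->next)
            if (s->salt_len == len && !memcmp(s->salt, tests->ciphertext, len))
                break;

        if (!s) {
            s = calloc(1, sizeof(*s));
            if (!s) {
                toy_free_fake_db(db);
                return NULL;
            }
            s->salt = tests->ciphertext;
            s->salt_len = len;
            s->next = db->salts;
            db->salts = s;
            db->salt_count++;
        }
        s->hash_count++;
        db->password_count++;
    }
    return db;
}
```

A caller would pair the two functions the same way as in the snippet above: build the db only when none was supplied, remember that it is a fake one, and free it when done.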

I'm not yet sure about all details on how we should use this or what
more changes need be made to core code but I reckoned I should give you
a heads-up.

Since July, we always pass the real db to reset() if we have one. This
was added for Agnieszka. With the above we could pass the fake db when
we don't have any db. I guess we also need to pass db->salts in all
crypt_all() calls - or perhaps only in benchmark. If we do so in
self-test before a real crack run, we probably need to reset(fake db)
before self-tests and reset(real db) after it. But that may collide with
Agnieszka's needs, I'm not sure.

Perhaps we should not change test/benchmark at all but instead use this
feature in format code as needed. Sayantan's formats could set a
format-global pointer to fake_db in reset(), and when crypt_all is
called with a NULL salt, we can use fake_db->salts instead.
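
A rough sketch of that format-local fallback might look like the following. Everything here is a toy model, not the real formats interface: struct toy_salt, struct toy_db, toy_reset(), toy_crypt_all() and toy_last_salt_id are hypothetical names, and the static test_db merely stands in for whatever ldr_init_fake_db() would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real salt and db structs. */
struct toy_salt { int id; };
struct toy_db { struct toy_salt *salts; };

/* Stands in for a db built from the format's test vectors. */
static struct toy_salt test_salt = { 42 };
static struct toy_db test_db = { &test_salt };

/* Format-global pointer, set in reset() when no real db is given. */
static struct toy_db *fake_db;

/* Records which salt the last crypt_all() call used (-1 if none). */
static int toy_last_salt_id = -1;

static void toy_reset(struct toy_db *db)
{
    /* With a real db there is nothing to fake; otherwise fall back. */
    fake_db = db ? NULL : &test_db;
}

static int toy_crypt_all(int count, struct toy_salt *salt)
{
    /* NULL salt means self-test or benchmark: use the fake db's salt. */
    if (!salt && fake_db)
        salt = fake_db->salts;

    toy_last_salt_id = salt ? salt->id : -1;
    return count;
}
```

The appeal of this variant is that the core test/benchmark code keeps passing NULL as before, so only the formats that need a salt list have to change.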

We'll see where things lead. Perhaps you have some thoughts or ideas?

magnum
Solar Designer
2015-10-19 20:40:42 UTC
Hi magnum,
Post by magnum
I'm not yet sure about all details on how we should use this or what
more changes need be made to core code but I reckoned I should give you
a heads-up.
I appreciate this.
Post by magnum
Since July, we always pass the real db to reset() if we have one. This
was added for Agnieszka. With the above we could pass the fake db when
we don't have any db. I guess we also need to pass db->salts in all
crypt_all() calls - or perhaps only in benchmark. If we do so in
self-test before a real crack run, we probably need to reset(fake db)
before self-tests and reset(real db) after it. But that may collide with
Agnieszka's needs, I'm not sure.
This would be a change to how the formats interface is defined, as IIRC
it would go against what the comments in formats.h currently say.
Post by magnum
Perhaps we should not change test/benchmark at all but instead use this
feature in format code as needed. Sayantan's formats could set a
format-global pointer to fake_db in reset(), and when crypt_all is
called with a NULL salt, we can use fake_db->salts instead.
This might be a hack, but it does appear to avoid changing the formats
interface definition.
Post by magnum
We'll see where things lead. Perhaps you have some thoughts or ideas?
I am not sure either. Unfortunately, my head is full with other
projects now, so I'll let you try to figure this out. But like I said,
I appreciate the heads-up.

Alexander
magnum
2015-10-19 21:40:12 UTC
Post by Solar Designer
Post by magnum
Since July, we always pass the real db to reset() if we have one. This
was added for Agnieszka. With the above we could pass the fake db when
we don't have any db. I guess we also need to pass db->salts in all
crypt_all() calls - or perhaps only in benchmark. If we do so in
self-test before a real crack run, we probably need to reset(fake db)
before self-tests and reset(real db) after it. But that may collide with
Agnieszka's needs, I'm not sure.
This would be a change to how the formats interface is defined, as IIRC
it would go against what the comments in formats.h currently say.
I think it wouldn't: Both for reset() and crypt_all(), it says "may" be
NULL before self-test or benchmark. Though if we end up always passing a
"fake db" before self-test, we should definitely say so for clarity.
Post by Solar Designer
Post by magnum
Perhaps we should not change test/benchmark at all but instead use this
feature in format code as needed. Sayantan's formats could set a
format-global pointer to fake_db in reset(), and when crypt_all is
called with a NULL salt, we can use fake_db->salts instead.
This might be a hack, but it does appear to avoid changing the formats
interface definition.
We'll probably start with format-local hacks first anyway, and see how
ugly it gets. I'll report or revisit this issue later as needed.


On a related note, we now support "-test -mask", which runs a mask-mode
benchmark. It doesn't yet work for descrypt-opencl or lm-opencl, but
it works fine for most (maybe all) other formats. If you specify a mask
(including a hybrid one), the benchmark shows how that particular mask
(e.g. a short -mask:?d) will perform.
This "fake db" thing will make it easier to proceed, and eventually we
should end up also self-testing with GPU-side mask active where needed.
Right now, "-test -mask" implies "-skip-self-test" and real-crack runs
with GPU-side mask will self-test without it (possibly letting bugs slip
through).

Thanks,
magnum
magnum
2015-10-25 21:50:23 UTC
Post by magnum
Post by Solar Designer
Post by magnum
Since July, we always pass the real db to reset() if we have one. This
was added for Agnieszka. With the above we could pass the fake db when
we don't have any db. I guess we also need to pass db->salts in all
crypt_all() calls - or perhaps only in benchmark. If we do so in
self-test before a real crack run, we probably need to reset(fake db)
before self-tests and reset(real db) after it. But that may collide with
Agnieszka's needs, I'm not sure.
After some experimenting with code alternatives, I think we should always
pass a "test vector db" for self-tests and benchmarks, and always pass
db->salts to crypt_all(). The alternatives end up confusing and hacky.

If someone (e.g. Agnieszka) needs access to the real db at the first call to
reset() (before self-test), we could simply add a pointer to the db
struct, e.g. db->real:

If db->real == NULL, we don't have a real db (this is a --test run).
If db->real == db, this db *is* the real one.

I believe this would satisfy any needs.
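
As a sketch of how format code could interpret such a pointer: struct toy_db, enum toy_db_kind and toy_classify() below are hypothetical, and the third case (a test vector db whose real pointer refers to a separate real db) is only my reading of the proposal, not something spelled out above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the db struct with the proposed field. */
struct toy_db {
    struct toy_db *real;   /* NULL, this db itself, or the real db */
};

enum toy_db_kind {
    TOY_DB_TEST_ONLY,      /* no real db exists: a --test run */
    TOY_DB_IS_REAL,        /* this db is the real one */
    TOY_DB_FAKE_WITH_REAL  /* test vector db; real db reachable via ->real */
};

/* Classify a non-NULL db according to the db->real convention. */
static enum toy_db_kind toy_classify(const struct toy_db *db)
{
    if (db->real == NULL)
        return TOY_DB_TEST_ONLY;
    if (db->real == db)
        return TOY_DB_IS_REAL;
    return TOY_DB_FAKE_WITH_REAL;
}
```

With this convention a format's reset() never needs a NULL check on db itself; it only inspects db->real to decide whether real-run data is available.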
Post by magnum
Post by Solar Designer
This would be a change to how the formats interface is defined, as IIRC
it would go against what the comments in formats.h currently say.
I think it wouldn't: Both for reset() and crypt_all(), it says "may" be
NULL before self-test or benchmark. Though if we end up always passing a
"fake db" before self-test, we should definitely say so for clarity.
If we go with my current ideas, we should rephrase. Neither reset() nor
crypt_all() will ever get a NULL pointer.

magnum
