Discussion:
[john-dev] bt.c initial table size
Solar Designer
2015-09-12 20:25:41 UTC
Permalink
Sayantan -

When testing with lots of hashes, I was getting many "Progress is too
slow!! trying next table size." and this hurt the speeds badly (2 to 3.5
minutes total running time for my test). To have bt.c use a larger
table size right away, I hacked this into create_perfect_hash_table(),
right before the "do { ... } while(1);" loop:

for (i = 0; i < 9; i++) {
unsigned int temp;
temp = next_prime(approx_offset_table_sz % 10);
approx_offset_table_sz /= 10;
approx_offset_table_sz *= 10;
approx_offset_table_sz += temp;

i++;

if (!(i % 5)) {
multiplier_ot += inc_ot;
multiplier_ht += inc_ht;
approx_offset_table_sz = (((long double)num_loaded_hashes / 4.0) * multiplier_ot + 10.00);
approx_hash_table_sz = ((long double)num_loaded_hashes * multiplier_ht);
}
}

The loop contents is some lines I copied from your larger loop. Oh, I
just noticed I got an extra "i++" in there. Well, I am writing this to
ask you to make something like this a standard feature, where we could
request a larger initial table size without hacking the source code -
and maybe the default needs to be bumped up too?

Alexander
Sayantan Datta
2015-09-14 02:17:37 UTC
Permalink
Post by Solar Designer
Sayantan -
When testing with lots of hashes, I was getting many "Progress is too
slow!! trying next table size." and this hurt the speeds badly (2 to 3.5
minutes total running time for my test). To have bt.c use a larger
table size right away, I hacked this into create_perfect_hash_table(),
for (i = 0; i < 9; i++) {
unsigned int temp;
temp = next_prime(approx_offset_table_sz % 10);
approx_offset_table_sz /= 10;
approx_offset_table_sz *= 10;
approx_offset_table_sz += temp;
i++;
if (!(i % 5)) {
multiplier_ot += inc_ot;
multiplier_ht += inc_ht;
approx_offset_table_sz = (((long
double)num_loaded_hashes / 4.0) * multiplier_ot + 10.00);
approx_hash_table_sz = ((long
double)num_loaded_hashes * multiplier_ht);
}
}
The loop contents is some lines I copied from your larger loop. Oh, I
just noticed I got an extra "i++" in there. Well, I am writing this to
ask you to make something like this a standard feature, where we could
request a larger initial table size without hacking the source code -
and maybe the default needs to be bumped up too?
Alexander
Hi Solar,

How would you like to request the table size - the exact table size or a
multiplier(as in table size = num_loaded_hashes * multiplier + prime) .
Also, if you request a larger table size, you are not guaranteed that it
will 'build the tables on first try', however, the probability of that
happening is definitely higher.

Regards,
Sayantan
Solar Designer
2015-09-14 12:32:59 UTC
Permalink
Sayantan,
Post by Sayantan Datta
How would you like to request the table size - the exact table size or a
multiplier(as in table size = num_loaded_hashes * multiplier + prime) .
Also, if you request a larger table size, you are not guaranteed that it
will 'build the tables on first try', however, the probability of that
happening is definitely higher.
I think request it via the multiplier, and it should be possible to
reuse a table size multiplier that a previous invocation of john tuned
itself to (in other words, those "trying next table size" lines should
include the multiplier value being tried next - and yes, they should
only use sizes that can later be requested explicitly).

Also, you already have some heuristics to determine initial
multiplier_ot and dupe_remove_ht_sz based on num_ld_hashes, but I got so
many "trying next table size" messages that I think those hardcoded
defaults need to be re-tuned.

Thanks,

Alexander

Loading...