Is this magic? how can something so simple has such low collision rate? (even better for the uin64 variant)

is this magic? how can something so simple has such low collision rate? (even better for the uin64 variant)

Attached: cdb_hash.png (479x518, 40K)

Other urls found in this thread:

en.wikipedia.org/wiki/Linear_congruential_generator
sanmayce.com/Fastest_Hash/
pastebin.com/gcXdwUdQ
github.com/dwyl/english-words
pastebin.com/tuVSh8mU
twitter.com/NSFWRedditVideo

to compare, this is hash function from TempleOS
and I have that file with 10 million top passwords

collisions:
# cdb 32bit
1: 998424
2: 783
3: 3
# templeos
1: 72354
2: 13896
3: 7067
4: 4847
...
11: 1126
12: 811
...
303: 2
304: 1
...
406: 1

it's so mindblowingly good

Attached: templeos_hash.png (303x538, 26K)

>low collision rate
is this what webshitters consider low collision rate?
FNV-1a is the best trivial hash function in existence.

Explain this to a tech-illiterate, please?

none of them are keyed though, so not practical for many usecases
siphash ftw (coincidentally also by djb)

en.wikipedia.org/wiki/Linear_congruential_generator
and properly picked parameters
the 33 prime is good for arithmetics as (33*hash) == (hash + hash

100,000 collisions isn't fantastic. How many collisions are in the temple alg if you integrate?
32 bit hashes are worthless for almost everything anyway, especially considering every CPU can do 64 bit math with no penalty. The increase in hash space is almost always worth it unless you're seriously crunching for space.

type, it's 1 million passwords, not 10 millions
the 1: XXX line means no collision

So you're doing 1 million passwords, how large is your table then?

>1 million passwords
>100,000 no collisions
>800 collisions
Where did the other 899,200 go?

I think 998424 is a bit closer to 1 million than to 100 thousand.
Anyway this is not useful without knowing how large the hash table he's trying to fit his 1 million passwords in.

djb is slower than fnv for hash tables, see sanmayce.com/Fastest_Hash/

I don't get what's mind blowingly good about this.

I also don't get the output of your collision detection - what's up with having more than one result?

Number on the left is the size of a group. Inside the group, all words has the same hash value. On the right is how many of those groups are.

998424 + 783*2 + 3*3 = 999999

It should be a million as the guy claims in one of his posts above so he must have forgotten one word somewhere.

It's too late for this shit. I read that as 98,424 like ten times.

Also, why this is good, it's because it makes it so that when you fill up your hash table, you have little to no entries for which after calculating the hash and looking it up, you also have to go through linked list and figure out which one of words with that hash you're looking for.

what should I be seeing? no mention of djb's hash functions

>i can't compile a benchmark

Good old XorShift.

I ran OP's hash on my list of 466551 English words with a hashtable of the size of 1000000 and I'm getting a lot more collisions than him.

Total words: 466551

1: 292213
2: 68178
3: 10752
4: 1258
5: 134
6: 4

pastebin.com/gcXdwUdQ
Words from here: github.com/dwyl/english-words

Can someone give me the tmpleos hash function in human readable format for testing? I don't want to read asm.

Total words: 466551
Brackets: 1000003

cdb_hash:
1: 292687
2: 68125
3: 10716
4: 1177
5: 138
6: 9
7: 2

fnv_hash_1a_32:
1: 293030
2: 68264
3: 10477
4: 1233
5: 120
6: 5

djb2:
1: 292727
2: 68076
3: 10694
4: 1229
5: 124
6: 9

djb2a:
1: 292687
2: 68125
3: 10716
4: 1177
5: 138
6: 9
7: 2

murmur3_32:
1: 292199
2: 68637
3: 10480
4: 1244
5: 112
6: 17


Did the test for a bunch more hash functions and all produce pretty much same result.
pastebin.com/tuVSh8mU

Oh hey I found one hash that's actually bad.

hash_loselose:
1: 161
2: 85
3: 52
4: 49
5: 32

...

978: 2
989: 1
994: 1
995: 1
999: 1


Process finished with exit code 0


uint32_t hash_loselose(const unsigned char *p, int len)
{
uint32_t hash = 0;

for (int i = 0; i < len; i++)
hash += *p++;

return hash;
}

I made a bit of sense of results I'm getting.

hash_random:
1: 292086
2: 68592
3: 10560
4: 1240
5: 121
6: 6

unsigned seed1 = std::chrono::system_clock::now().time_since_epoch().count();
std::mt19937 generator(seed1);
uint32_t hash_random(const unsigned char *p, int len){
return generator();
}


So this is a "hash" function that just returns a random number. So basically, seems like this about as good as any hashing function can get and all hashing functions I've looked at so far are pretty much perfect (except for the lose-lose one from K&R).

Really want someone to spoonfeed me the templeos function. It has shr in it which I assume is right bit shift which doesn't seem to make sense to me.

Is this it?
uint32_t hash_templeos(const unsigned char *p, int len)
{
uint32_t hash = 0;

for (int i = 0; i < len; i++)
hash = (hash

Holy shit imagine being this much of a codelet. You don't deserve such a thing.

Hey I think I got it in !

well, Jow Forums, come on, this seems like an interesting topic
don't tell me you don't want to have fun together exploring hash functions
is op dead? OP? Is my C version of templeos hash correct?

>tfw phoneposting

>mindblowingly good
C++'s std::hash has per standard collision probability of 1/size_t, so 1 / (2^64 - 1) for 64-bit ALUs.