Is this magic? how can something so simple has such low collision rate? (even better for the uin64 variant)

Question

Is this magic? how can something so simple has such low collision rate? (even better for the uin64 variant)

Christopher Thomas

is this magic? how can something so simple has such low collision rate? (even better for the uin64 variant)

Attached: cdb_hash.png (479x518, 40K)

August 24, 2019 - 06:19

Other urls found in this thread:

en.wikipedia.org/wiki/Linear_congruential_generator
sanmayce.com/Fastest_Hash/
pastebin.com/gcXdwUdQ
github.com/dwyl/english-words
pastebin.com/tuVSh8mU
twitter.com/NSFWRedditVideo

Easton Cooper

to compare, this is hash function from TempleOS
and I have that file with 10 million top passwords

collisions:
# cdb 32bit
1: 998424
2: 783
3: 3
# templeos
1: 72354
2: 13896
3: 7067
4: 4847
...
11: 1126
12: 811
...
303: 2
304: 1
...
406: 1

it's so mindblowingly good

Attached: templeos_hash.png (303x538, 26K)

August 24, 2019 - 06:41

William Watson

>low collision rate
is this what webshitters consider low collision rate?
FNV-1a is the best trivial hash function in existence.

August 24, 2019 - 06:54

Landon Bailey

Explain this to a tech-illiterate, please?

August 24, 2019 - 06:59

Jack Turner

none of them are keyed though, so not practical for many usecases
siphash ftw (coincidentally also by djb)

August 24, 2019 - 07:10

Sebastian Cook

en.wikipedia.org/wiki/Linear_congruential_generator
and properly picked parameters
the 33 prime is good for arithmetics as (33*hash) == (hash + hash

August 24, 2019 - 07:17

Bentley Myers

100,000 collisions isn't fantastic. How many collisions are in the temple alg if you integrate?
32 bit hashes are worthless for almost everything anyway, especially considering every CPU can do 64 bit math with no penalty. The increase in hash space is almost always worth it unless you're seriously crunching for space.

August 24, 2019 - 07:23

Gavin Myers

type, it's 1 million passwords, not 10 millions
the 1: XXX line means no collision

August 24, 2019 - 07:25

Lucas Davis

So you're doing 1 million passwords, how large is your table then?

August 24, 2019 - 07:27

Leo Robinson

>1 million passwords
>100,000 no collisions
>800 collisions
Where did the other 899,200 go?

August 24, 2019 - 07:30

Juan Gutierrez

I think 998424 is a bit closer to 1 million than to 100 thousand.
Anyway this is not useful without knowing how large the hash table he's trying to fit his 1 million passwords in.

August 24, 2019 - 07:32

Noah Roberts

djb is slower than fnv for hash tables, see sanmayce.com/Fastest_Hash/

August 24, 2019 - 07:33

Charles Richardson

I don't get what's mind blowingly good about this.

I also don't get the output of your collision detection - what's up with having more than one result?

August 24, 2019 - 07:34

Nathaniel Collins

Number on the left is the size of a group. Inside the group, all words has the same hash value. On the right is how many of those groups are.

998424 + 783*2 + 3*3 = 999999

It should be a million as the guy claims in one of his posts above so he must have forgotten one word somewhere.

August 24, 2019 - 07:38

Easton Green

It's too late for this shit. I read that as 98,424 like ten times.

August 24, 2019 - 07:38

Jaxon Flores

Also, why this is good, it's because it makes it so that when you fill up your hash table, you have little to no entries for which after calculating the hash and looking it up, you also have to go through linked list and figure out which one of words with that hash you're looking for.

August 24, 2019 - 07:40

Charles Nguyen

what should I be seeing? no mention of djb's hash functions

August 24, 2019 - 07:41

Jaxon Ortiz

>i can't compile a benchmark

August 24, 2019 - 07:42

Ian Jackson

Good old XorShift.

August 24, 2019 - 08:10

Aaron Clark

I ran OP's hash on my list of 466551 English words with a hashtable of the size of 1000000 and I'm getting a lot more collisions than him.

Total words: 466551

1: 292213
2: 68178
3: 10752
4: 1258
5: 134
6: 4

pastebin.com/gcXdwUdQ
Words from here: github.com/dwyl/english-words

August 24, 2019 - 08:25

Ryder Ward

Can someone give me the tmpleos hash function in human readable format for testing? I don't want to read asm.

August 24, 2019 - 08:29

Jordan Wright

Total words: 466551
Brackets: 1000003

cdb_hash:
1: 292687
2: 68125
3: 10716
4: 1177
5: 138
6: 9
7: 2

fnv_hash_1a_32:
1: 293030
2: 68264
3: 10477
4: 1233
5: 120
6: 5

djb2:
1: 292727
2: 68076
3: 10694
4: 1229
5: 124
6: 9

djb2a:
1: 292687
2: 68125
3: 10716
4: 1177
5: 138
6: 9
7: 2

murmur3_32:
1: 292199
2: 68637
3: 10480
4: 1244
5: 112
6: 17

Did the test for a bunch more hash functions and all produce pretty much same result.
pastebin.com/tuVSh8mU

August 24, 2019 - 08:48

Robert Perez

Oh hey I found one hash that's actually bad.

hash_loselose:
1: 161
2: 85
3: 52
4: 49
5: 32

...

978: 2
989: 1
994: 1
995: 1
999: 1

Process finished with exit code 0

uint32_t hash_loselose(const unsigned char *p, int len)
{
uint32_t hash = 0;

for (int i = 0; i < len; i++)
hash += *p++;

return hash;
}

August 24, 2019 - 08:52

Sebastian Hall

I made a bit of sense of results I'm getting.

hash_random:
1: 292086
2: 68592
3: 10560
4: 1240
5: 121
6: 6

unsigned seed1 = std::chrono::system_clock::now().time_since_epoch().count();
std::mt19937 generator(seed1);
uint32_t hash_random(const unsigned char *p, int len){
return generator();
}

So this is a "hash" function that just returns a random number. So basically, seems like this about as good as any hashing function can get and all hashing functions I've looked at so far are pretty much perfect (except for the lose-lose one from K&R).

August 24, 2019 - 09:07

Jason Brooks

Really want someone to spoonfeed me the templeos function. It has shr in it which I assume is right bit shift which doesn't seem to make sense to me.

Is this it?
uint32_t hash_templeos(const unsigned char *p, int len)
{
uint32_t hash = 0;

for (int i = 0; i < len; i++)
hash = (hash

August 24, 2019 - 09:13

William Mitchell

Holy shit imagine being this much of a codelet. You don't deserve such a thing.

August 24, 2019 - 09:17

Colton Ward

Hey I think I got it in !

August 24, 2019 - 09:19

Luis Cruz

well, Jow Forums, come on, this seems like an interesting topic
don't tell me you don't want to have fun together exploring hash functions
is op dead? OP? Is my C version of templeos hash correct?

August 24, 2019 - 10:51

Aiden Ramirez

>tfw phoneposting

August 24, 2019 - 12:17

Henry Evans

>mindblowingly good
C++'s std::hash has per standard collision probability of 1/size_t, so 1 / (2^64 - 1) for 64-bit ALUs.

August 24, 2019 - 12:20

1 2 3 Next

Is this magic? how can something so simple has such low collision rate? (even better for the uin64 variant)

Last threads