Friday, March 21, 2008

Dang, this thing is creepy

I swear this thing looks (and sounds) like an overgrown housefly.

Thursday, March 13, 2008

MySQL 64 bit hash

What do you do to create a fast lookup for about 40 million email addresses for a client? I was inspired by this article in the MySQL performance blog

And by fast I mean around 10 ms. Well the first thing was to normalize, since there was not a table with unique email addresses, but that was about 1GB of data, and another 1GB of index.

So I hashed it, but 32 bits is only enough of a hash (without collision handling) for about 10,000 to 100,000 items. crc32 is actually quite a bit worse than a random hash, it collides almost 1% of the time; which frankly seems remarkably bad, almost malicious really. And MySQL doesn't have a 64 bit hash, so I did a 64 bit hash using crc32. My first (failed) attempt was this:

crc32(invitee_email)*1234567890 + crc32(concat('x', invitee_email))

However, after the first inevitable 32 bit hash collision, it turns out that

if crc(x) == crc(y), then crc( salt + x ) == crc( salt + y )

as well. Which I guess makes almost enough sense I should have noticed that going in. Anyway, anagram collisions aside, the following overcomes that, and did not generate any collisions across all 40 million addresses:

crc32(invitee_email)*1234567890 + crc32(reverse(invitee_email))


So this is about 28% of the size of the original both data and index-wise, and produces a three times speed up, from about 40 ms to about 12 ms, and is easier on the cache.

Saturday, March 8, 2008

More than meets the eye?

So while looking through YouTube for some way to kill time, I ran across this video called "無15", which just means #15 in Japanese.



It's this young teenage girl just staring at the camera, and kind of smirking. Okay, that's weird. More weird is that I'm watching. Even more weird is that it has 500,000 pageviews, in addition to the views from 無14, 無13, ... Plus they have spawned a whole family of response videos from



to



So I'm thinking there's more to this than meets the eye. I know what you're thinking -- there's "less" to this than meets the eye -- but isn't it kind of reminiscent of this:

Mona Lisa

except Japanese instead of Italian, and predating YouTube? And 500 years older? They're both teenage girls, kind of awkwardly posing, yet happy and somewhat self satisfied. Yes, I'm suggesting that in the year 2508, people may be watching this video in a museum someplace. Or people will move on next week. :P