Friday, April 18, 2008

...Imagine what it would feel like to lose 37 years of emotional baggage...

Brain researcher Jill Bolte Taylor studied her own stroke as it happened -- and has become a powerful voice for brain recovery.


This is a _must_see_. She explains how we see the world, almost to the point of telling us what we really are.



Scroll about halfway down the page to see the video. It's her TED talk from the 2008 conference in Monterey, CA.

Friday, March 21, 2008

Dang, this thing is creepy

I swear this thing looks (and sounds) like an overgrown housefly.

Thursday, March 13, 2008

MySQL 64 bit hash

What do you do to create a fast lookup for about 40 million email addresses for a client? I was inspired by this article in the MySQL Performance Blog.

And by fast I mean around 10 ms. The first step was to normalize, since there was no table of unique email addresses, but the normalized table came to about 1GB of data plus another 1GB of index.

So I hashed it, but 32 bits is only enough of a hash (without collision handling) for about 10,000 to 100,000 items. crc32 is actually quite a bit worse than a random hash: it collides almost 1% of the time, which frankly seems remarkably bad, almost malicious really. And MySQL doesn't have a 64 bit hash, so I built a 64 bit hash out of crc32. My first (failed) attempt was this:

crc32(invitee_email)*1234567890 + crc32(concat('x', invitee_email))

However, after the first inevitable 32 bit hash collision, it turns out that

if crc(x) == crc(y), then crc( salt + x ) == crc( salt + y )

as well, which makes enough sense that I should have noticed it going in. Anyway, anagram collisions aside, the following overcomes that, and did not generate any collisions across all 40 million addresses:

crc32(invitee_email)*1234567890 + crc32(reverse(invitee_email))
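
Here is a small Python sketch (not the original SQL) of why the salted version fails and the reversed one holds up. It assumes Python's zlib.crc32 computes the same CRC-32 as MySQL's CRC32(). The names DELTA, collide_with, hash_salted and hash_reversed, and the sample address, are made up for illustration. It relies on a known CRC property: XORing a multiple of the CRC polynomial into a string gives a different string of the same length with the same crc32. The argument only covers strings of equal length.

```python
import zlib

# 0x80 followed by the reflected CRC-32 polynomial 0xEDB88320 in little-endian
# byte order. XORing these five bytes into a string leaves zlib.crc32 unchanged.
DELTA = bytes([0x80, 0x20, 0x83, 0xB8, 0xED])


def collide_with(s: bytes) -> bytes:
    """Hypothetical helper: a different, same-length string with the same crc32."""
    if len(s) < len(DELTA):
        raise ValueError("need at least 5 bytes")
    head = bytes(a ^ b for a, b in zip(s[:5], DELTA))
    return head + s[5:]


def hash_salted(s: bytes) -> int:
    # First attempt: crc32(s)*1234567890 + crc32(concat('x', s))
    return zlib.crc32(s) * 1234567890 + zlib.crc32(b"x" + s)


def hash_reversed(s: bytes) -> int:
    # Second attempt: crc32(s)*1234567890 + crc32(reverse(s))
    return zlib.crc32(s) * 1234567890 + zlib.crc32(s[::-1])


x = b"someone@example.com"  # made-up sample address
y = collide_with(x)         # not a plausible address, just a CRC twin

same_crc = zlib.crc32(x) == zlib.crc32(y)
salted_collides = hash_salted(x) == hash_salted(y)
reversed_collides = hash_reversed(x) == hash_reversed(y)
print(same_crc, salted_collides, reversed_collides)
```

This should print True True False. The salt is the same prefix on both strings, so it cannot separate a pair that already collides. Reversing moves the differing bytes to the other end and reorders them, and for this pair that is enough to break the tie.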


The hashed table is about 28% of the size of the original, in both data and index, and gives roughly a threefold speedup, from about 40 ms to about 12 ms. It is also easier on the cache.
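
For a rough feel for the sizing argument, here is a back-of-the-envelope birthday estimate in Python. It assumes an ideal, uniformly random hash, which crc32 and the combined value above are not, so treat the numbers as a sanity check only. The function name expected_colliding_pairs is made up for this sketch.

```python
def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Expected number of colliding pairs for an ideal uniform hash."""
    pairs = n_items * (n_items - 1) / 2
    return pairs / 2 ** hash_bits


for n in (10_000, 100_000, 40_000_000):
    print(n,
          expected_colliding_pairs(n, 32),
          expected_colliding_pairs(n, 64))
```

Under that assumption, a 32 bit hash expects about 0.01 colliding pairs at 10,000 items, a bit over one at 100,000, and on the order of 186,000 at 40 million. An ideal 64 bit hash expects far less than one colliding pair at 40 million.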

Saturday, March 8, 2008

More than meets the eye?

So while looking through YouTube for some way to kill time, I ran across this video called "無15", which just means #15 in Japanese.



It's this young teenage girl just staring at the camera, and kind of smirking. Okay, that's weird. More weird is that I'm watching. Even more weird is that it has 500,000 pageviews, in addition to the views from 無14, 無13, ... Plus they have spawned a whole family of response videos from



to



So I'm thinking there's more to this than meets the eye. I know what you're thinking -- there's "less" to this than meets the eye -- but isn't it kind of reminiscent of this:

Mona Lisa

except Japanese instead of Italian, and predating YouTube? And 500 years older? They're both teenage girls, kind of awkwardly posing, yet happy and somewhat self-satisfied. Yes, I'm suggesting that in the year 2508, people may be watching this video in a museum someplace. Or people will move on next week. :P

Tuesday, February 26, 2008

Art _and_ Science

I wasted (time well wasted, I might add) many hours on this website http://moma.org/. They have put all of their exhibits (and more) onto their webpage. Some of the more notable ones include the Water Sign, the Newton Virus, the Pac Man, the Inner Cell life, and finally the Molecubes. These are really remarkable.




From the video "Inner Cell life", a three minute animated walk through inner cell functions at the molecular level

Monday, February 18, 2008

Silicon Valley a great place to work?

In this article on TechCrunch, Michael Arrington's thesis is that Silicon Valley is a great place to do business, since it has support networks for companies, many VC firms, and a critical mass of talent, but also crucial is an entrepreneurial drive. It seems like the unspoken aspect of this drive is that entrepreneurs and their staff have completely lost their work/life balance.

I think this is borne out in what startups expect of their staff. The expectation is well over 40 hours of work per week. I think management strives to get 50 or 60 hours per week out of their staff, and sometimes themselves.

Here's an example [attributed to] the CEO of AdBrite, Ignacio "Iggy" Fanlo / iggy@adbrite.com:

I hesitated sending this email for quite some time and had hoped that through your direct managers I would see some improvement. Having said that, I continue to see too few folks here at 9 AM; and too few folks here at 6 PM. I don't care if you are a morning person or a night person; if you want to work 10-8 pm or 8-6 pm, but I fully expect each one of you to put in 9-10 hours per working day. This is still a startup and we need more passion, time and energy from each of our employees than a large company would require. If we succeed, the rewards, both psychic and financial, will be great. But for that, we ask you to give more than the typical 9-5 job.


I think this is completely counterproductive. Even if people are already working 10 hours per day, an email like this will make them stop putting in those hours for a tool like this, and go find a job at a good company. Notch up one more Silicon Valley company that I will never work at.

But the sad thing is that this is fairly pervasive thinking, though most would have the good sense not to say it out loud, let alone in a company-wide email.

It's great to have so many employers in the area, but I guess the lesson is that you have to choose an employer wisely.

Saturday, February 16, 2008

MySQL database monitor

Here is a tool published a few years ago that I've recently tweaked. I don't know how people could live without something like this.


#!/usr/bin/perl -w

#
# Copyright 2003-2008 Dale Johnson
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
use strict;
use DBI;
use Time::HiRes qw/gettimeofday usleep/;
use POSIX qw/setsid/;
use Digest::MD5 qw/md5_hex/;

my $signalled = 0;

$SIG{ INT } = $SIG{ TERM } = sub {
    print "signalled...\n";
    $signalled++;
};

my $start_time = time();
my $total_seconds_to_run = shift || 0;
my $stop_time = $start_time + $total_seconds_to_run;

my $dbuser = "****";
my $dbpass = "****";
my $hostname = "localhost";

my $interval = 10 * 1000; # polling interval in microseconds (10 ms)
my $daemon = 0;

my $dbh;
my $sth;

sub db_connect {
    eval {
        $dbh = DBI->connect("dbi:mysql:hostname=$hostname", $dbuser, $dbpass,
            {RaiseError => 1});
        print STDERR $dbh->errstr . "\n" if defined $dbh->errstr;
    };
    if( $@ ) {
        die "error $@";
    }
}

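# Collapse literals so that queries differing only in their values
# are counted as the same query.
sub normalize {
    my $query = shift;
    $query =~ s/'[^']*'/?/g;   # quoted strings
    $query =~ s/NULL/?/g;
    $query =~ s/\d+/?/g;       # numbers (also digits inside identifiers)
    $query =~ s/\t/ /g;
    $query =~ s/\n/ /g;
    $query =~ s/\s+/ /g;       # collapse whitespace
    $query =~ s/-\?/\?/g;      # negative numbers
    return $query;
}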

my $normalized_time;
my $normalized_count;
#chomp;@a=split(/\t/);$a[0]=~s/\d+/?/g; $a[0]=~s/'\''[^'\'']*'\''/?/g; $a[0]=~s/NULL/?/g; print join("
#\t",@a)."\n";' | sort | pmrollup 1 2 ':0,[,@0,]sum,[,@1,]sum' | pmscan :1,:2,:0 | sort -n -r -k2 | less -S
#}

my %queries;
my %states;
my %users;
my %hosts;
my %ids;
my $queryid = 1;
$|=1;
my $prevtm = 0;
my $dumpat = time() + 3600;
db_connect();
while( 1 ) {
    eval {
        my @this_dbid = ();
        $sth = $dbh->prepare("show full processlist");
        $sth->execute();
        while (my @row = $sth->fetchrow_array()) {
            my $tm = scalar localtime time();
            # Columns are Id, User, Host, db, Command, Time, State, Info,
            # so $state holds Command and $action holds State.
            my ($dbid, $user, $host, $dbname, $state, $time, $action, $query) = map { defined $_ ? $_ : "undef" } @row;
            undef @row;
            next if ($state eq "Sleep");
            next if ($query eq "undef");
            next if ($query eq "show full processlist");
            next if ($user eq "system user");
            my $digest = substr(md5_hex($query),0,12);
            $query =~ s/\t/ /g;
            $query =~ s/\n/ /g;
            $query =~ s/\r/ /g;
            push @this_dbid, $query;
            # first time we see this query: remember when it started
            if (not exists $queries{$query}) {
                $queries{$query} = gettimeofday();
                $states{$query} = $state;
                $users{$query} = $user;
                $ids{$query} = $queryid++;
                $hosts{$query} = $host;
            }
            if ($state ne $states{$query}) {
                $states{$query} = $state;
            }
            $prevtm = $tm;
        }
        for my $qid (keys %queries) {
            my $digest = substr(md5_hex($qid),0,12);
            my $tm = time();
            my $user = $users{$qid};
            my $state = $states{$qid};
            my $host = $hosts{$qid};
            if( ! scalar grep ( { $qid eq $_ } @this_dbid)) {
                # this is a done query i hope
                $normalized_time->{ normalize( $qid ) } +=
                    gettimeofday() - $queries{$qid};
                $normalized_count->{ normalize( $qid ) }++;

                delete $states{$qid};
                delete $users{$qid};
                delete $hosts{$qid};
                delete $ids{$qid};
                delete $queries{$qid};
            }
        }
    };
    if( $@ ) {
        print STDERR "reconnecting to $hostname ($@)\n";
        sleep 10;
        db_connect();
    }
    usleep $interval;
    # hourly dump of table sizes, to spot leaks in the tracking hashes
    if( $dumpat < time() ) {
        print STDERR "STATE DUMP\n";
        print STDERR "queries sz: " . (scalar (keys %queries)) . "\n";
        print STDERR "states sz: " . (scalar (keys %states)) . "\n";
        print STDERR "users sz: " . (scalar (keys %users)) . "\n";
        print STDERR "ids sz: " . (scalar (keys %ids)) . "\n";
        $dumpat = time() + 3600;
    }
    last if $total_seconds_to_run > 0 && time() > $stop_time;
    last if $signalled;
}
print "\ncount\telapsed\taverage\tquery\n";
print "=====\t=======\t=======\t=====\n";
for my $q ( sort { $normalized_time->{ $b } <=> $normalized_time->{ $a } }
        keys %$normalized_time ) {
    printf "%d\t%3.1f s\t%2.0f ms\t%s\n",
        $normalized_count->{ $q }, $normalized_time->{ $q },
        ( $normalized_time->{ $q } / $normalized_count->{ $q } ) * 1000,
        $q;
}


It locates the queries that are hogging time on your system. Output looks like this:


signalled...

count elapsed average query
===== ======= ======= =====
1 0.0 s 11 ms insert into vcontact (id_user_ident_table_posessor, id_user_ident_table_actual, first_name, last_name, email, create_datetime, modify_datetime, extra, extra_big, comments, permission) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
1 0.0 s 11 ms INSERT INTO user_friends_count (id_user_ident_table, friends_count) SELECT NEW.id_user_ident_table_posessor,? ON DUPLICATE KEY UPDATE friends_count = friends_count + ?
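
The grouping in that report comes from the normalize step, which replaces literals with ? so that queries differing only in their values land in the same row. Here is a rough Python sketch of the same idea, ported from the Perl regexes above. The rollup function and the sample queries and timings are made up for illustration and are not part of the tool.

```python
import re
from collections import defaultdict


def normalize(query: str) -> str:
    """Replace literals with '?' so similar queries group together."""
    query = re.sub(r"'[^']*'", "?", query)  # quoted strings
    query = query.replace("NULL", "?")
    query = re.sub(r"\d+", "?", query)      # numbers (and digits in names)
    query = re.sub(r"\s+", " ", query)      # tabs, newlines, repeated spaces
    query = query.replace("-?", "?")        # negative numbers
    return query


def rollup(samples):
    """Sum count and elapsed seconds per normalized query, slowest first."""
    totals = defaultdict(lambda: [0, 0.0])
    for query, seconds in samples:
        entry = totals[normalize(query)]
        entry[0] += 1
        entry[1] += seconds
    rows = [(q, count, total) for q, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical (query, elapsed seconds) observations.
samples = [
    ("select * from users where id = 17", 0.012),
    ("select * from users where id = 42", 0.010),
    ("update users set name = 'bob' where id = -3", 0.040),
]

report = rollup(samples)
for query, count, total in report:
    print(f"{count}\t{total:.3f} s\t{1000 * total / count:.0f} ms\t{query}")
```

The two selects collapse into one row with a count of 2, and the update sorts first because it used the most total time. Note that the digit rule also rewrites digits inside table or column names, same as the Perl version.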

Friday, February 15, 2008

Flash games that are playable

The Scrabble clone Scrabulous is a great way to waste hours. You can try to work your rating up in competition with other players in timed games. It's pretty addictive.





Desktop defense is very popular and highly addictive as well. It takes a while to get really good at it.





Onslaught is also interesting, but it pushes a bit beyond what Flash games are capable of. It seems to completely peg my CPU, and still slows down noticeably when there are 400 or 500 little monsters working through my kill-zones.





Read the instructions before you play too long, trust me.

Thursday, February 14, 2008

Music on YouTube

Every once in a while I'm kind of surprised at amateur looking videos with exceptional music.

Here's one (okay, I admit I'm partial to the fiddle):



Here's another:



Granted, Marie Digby has over 1 million views for this post, and I'm pretty sure she's actually a professional, but you get this feeling like she's your next door neighbor.

Emily on YouTube

Self-described "patron saint" of YouTube.



And her trip to Japan

She's cute and smart, but I just can't seem to watch her _entire_ videos; they're just too damn long. I think she needs to cut them down a bit. My ADD brain can only take about 4 minutes.

Wednesday, February 13, 2008

Blogging

This blogging thing really seems to have legs. I'd like to blog about technology, companies, politics, economics. And when companies behave badly, I think they need to be called out.