Note (12/2015): Hi there! I'm taking some time off here to focus on other projects for a bit. As of October 2016, those other projects include a science book series for kids titled Things That Make You Go Yuck! -- available at Barnes and Noble, Amazon and (hopefully) a bookstore near you!

Co-author Jenn Dlugos and I are also doing some extremely ridiculous things over at Drinkstorm Studios, including our award-winning webseries, Magicland.

There are also a full 100 posts right here in the archives, and feel free to drop me a line at secondhandscience@gmail.com with comments, suggestions or wacky cold fusion ideas. Cheers!

· Categories: Mathematics
What I’ve Learned:

“The birthday problem: because nobody wants (or gets) to celebrate alone.”

Birthdays often have problems. Whether it’s the annual reminder of impending mortality, Grandma getting you totally the wrong Teenage Mutant Ninja Turtle action figure or Marilyn Monroe failing to jump out of your cake to wish you happy birthday.

(Actually, regarding that last one, it would create a whole bunch of additional problems if she did. So stop wishing for that.

There’s always Marilyn Manson, if you just can’t shake the idea. Good luck with that.)

None of these issues is the birthday problem, however. The specifically-named “birthday problem” is actually a mathematical exercise — two words which you probably wouldn’t want to hear, in any combination, on your actual birthday. Assuming it’s not your actual birthday right now, here’s what the mathematical exercise asks:

Among a group of randomly-chosen individuals, what is the probability that at least two of them will share the same birthday?

The “randomly-chosen” bit is key, of course, since including certain individuals would muck up the math. The Olsen twins, for instance, would totally ruin everything.

(Presumably, this is not something you need mathematics to tell you.)

Assuming a full-on rand-o crowd, the question gets a little more interesting — to non-probability theorists, at least — when asked in this way:

How many people do you need to have a better than 50% chance of two people sharing a birthday?

That’s a trickier question than it looks. Most people can narrow down the range of possible answers a bit. If there’s one person in the room — scary Olsen or not — then there’s zero chance of sharing a birthday with the nobody else in attendance. And if (ignoring leap days, because ain’t nobody got time for that) 366 people are mingling, then the chance is 100% that at least two of them share a birthday. There’s only so much calendar to go around.

You might think that the number to get 50%, then, would be right in the middle. You would be mistaken.

Then again, a few minutes ago you wanted Marilyn Manson to hop out of a cake and sing to you. You don’t exactly have a track record for making good decisions.

The real answer (spoiler alert!) to this version of the birthday problem is 23. According to combinatorics and probability theory (and some nifty math), with just 23 people in a room, the odds are slightly better than 50% that two of them share a birthday. If you want to increase the odds to 99.9%, you’ll have to make a few more phone calls, but your meeting hall still only needs a capacity of 70 people.
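For anyone who'd rather check the nifty math than take my word for it, here's a minimal Python sketch. It assumes 365 equally likely birthdays (no leap days, no Olsens), and the function names are made up for illustration. The trick is to compute the chance that everybody's birthday is different, then subtract that from 1.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday.

    Assumes every day of the year is equally likely and birthdays are independent.
    """
    if people > days:
        return 1.0  # more people than days: a match is guaranteed
    p_all_different = 1.0
    for already_here in range(people):
        # Each new arrival has to dodge every birthday already in the room.
        p_all_different *= (days - already_here) / days
    return 1.0 - p_all_different


def smallest_crowd(target: float, days: int = 365) -> int:
    """Smallest head count whose match probability reaches `target`."""
    people = 1
    while shared_birthday_probability(people, days) < target:
        people += 1
    return people


if __name__ == "__main__":
    print(smallest_crowd(0.5))    # crowd needed for better-than-even odds
    print(smallest_crowd(0.999))  # crowd needed for 99.9%
```

Under those assumptions, the two calls land on the 23 and 70 quoted above.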

That’s nothing. They get bigger crowds than that at the Pawnee, Indiana town hall. And Ron Swanson doesn’t share his birthday with anyone.

This surprising result to the birthday problem is important for three reasons. First, maybe it instills in some young minds a wonder and love for math, and they’ll go on to become professional mathematicians. Which is great, because somebody has to take that bullet. And numbers make my head hurt.

Second, there are computer hacking strategies — called “birthday attacks” — that take advantage of the math behind the birthday problem to wreak certain kinds of digital havoc. These brute-force cryptographic manipulations are often aimed at using hash collisions to the hackers’ advantage. In other words, no birthday cake for you.
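To see why the birthday math makes hash collisions cheaper than you'd expect, here's a toy Python sketch. It is not a real attack: it chops SHA-256 down to a hypothetical 16-bit "tiny hash" so a collision turns up quickly, and the function names and message format are invented for the example. With 65,536 possible hash values, the birthday math suggests a collision after a few hundred tries, not tens of thousands.

```python
import hashlib


def tiny_hash(message: bytes, bits: int = 16) -> int:
    """Toy hash: keep only the top `bits` bits of SHA-256 (illustration only)."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def find_collision(bits: int = 16):
    """Hash distinct messages until two of them land on the same tiny hash.

    Returns (first_message, second_message, number_of_messages_tried).
    """
    seen = {}  # tiny hash value -> first message that produced it
    attempt = 0
    while True:
        message = f"message-{attempt}".encode()
        value = tiny_hash(message, bits)
        if value in seen:
            return seen[value], message, attempt + 1
        seen[value] = message
        attempt += 1


if __name__ == "__main__":
    first, second, tries = find_collision()
    print(first, second, tries)
```

Real hashes are much longer than 16 bits, which is the whole point: the work needed grows roughly with the square root of the number of possible hash values, not with the number itself.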

And finally, this demonstrates that there’s a big difference in calculating a 50/50 chance that any two people will share a birthday, versus the same odds that somebody in a crowd shares your birthday. For the latter, you’d need at least 253 people, which reminds us that probability is tricky, the obvious answer is not always the right one, and for crissakes stop making everything about you, ya town hall-cramming Olsen-loving Manson-caker.
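The "somebody shares your birthday" version is a simpler calculation, sketched below in Python under the same assumptions (365 equally likely days, independent birthdays; the function names are mine). Each other person misses your birthday with probability 364/365, so the chance that at least one of them hits it is 1 minus that fraction raised to the size of the crowd.

```python
def shares_my_birthday_probability(others: int, days: int = 365) -> float:
    """Chance that at least one of `others` has the same birthday as you."""
    return 1.0 - ((days - 1) / days) ** others


def smallest_crowd_for_me(target: float, days: int = 365) -> int:
    """Smallest number of other people that reaches `target` odds of a match with you."""
    others = 0
    while shares_my_birthday_probability(others, days) < target:
        others += 1
    return others


if __name__ == "__main__":
    print(smallest_crowd_for_me(0.5))  # other people needed for even odds
```

Run it and the 253 above pops out. The gap between 23 and 253 comes from counting pairs: in the any-two-people version, every pair in the room is a chance for a match, while in this version only the pairs that include you count.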

And oh, yeah: happy birthday. Freak.

Actual Science:
Damn Interesting: The birthday paradox
Better Explained: Understanding the birthday paradox
NPR / Math Guy: The birthday problem
Hack This Site: Hash collisions and the birthday attack
University of Waterloo: Invasive species use landmarking to find love in a hopeless place

Image sources: Math.info (birthday match graph), Photobucket / taintedXarts (birthday Manson), The Gloss (Olsen twin powers, irritate!), FanShare (in Spanish!) (Pawnee powwow)



· Categories: Mathematics
What I’ve Learned:

“Statistical significance: ‘Do you feel 95% confident, punk?’”

People are scared of numbers. Sometimes, the fear is justified. A 330 on your credit score report, for instance, is genuinely horrifying. So is a 410 on your SAT. Or anything greater than “two”, when asked how many cats your mother owns.

But most numbers are harmless. People only fear them because they might wind up in a statistic, and everyone is afraid of statistics. The saying is not “lies, damned lies and sharks with frickin’ laser beams”. It’s statistics. Even scarier than laser-sharks.

The problem is understanding. I can help, though only to a degree, because mathematics is involved, and I swore after memorizing the Pythagorean theorem that I was “full”, and couldn’t learn any more math.

(Which is probably why I’m familiar with the horror of subpar credit scores. And low SATs.

Someday, this will probably drive my mother to adopt a dozen cats. But not yet. Whiskers crossed.)

Happily, you don’t need math to demystify statistics; you only need to know about statistical significance.

(Although you might need a calculator or a fancy-ciphering web page to do some maths for you. Stand on the shoulders of Poindexters, my friend.)

Statistics can be manipulated to say just about anything — like a willing stool pigeon, or a guy trying to get a date with a lingerie model. The question is how confidently those stats say something, and that’s where statistical significance comes in.

Most scientists will run with a conclusion if the data clear a 95% confidence bar. Some tests require 99%, and a few really crucial questions (like, can we clone Neil deGrasse Tyson’s mustache in time for Halloween) need 99.99% or better before they’re accepted.

So how do researchers achieve those levels of confidence? Flip a thousand coins and see what comes up? Ask a Magic 8-Ball which answer is better? Co-author their papers with a pigskin-prognosticating porcupine?

(Based on recent scientific scandals, yes. A few of them apparently do.

But we try to weed these idiots out, based on their SAT scores. Or how many cats their mothers own.)

Real scientists determine statistical significance by performing calculations that take important factors into account, like the number of observations and the likelihood of the results.

Take the “p-value” calculation, for example, which involves math with Greek letters and squiggly brackets and other head-exploding details. But just remember it like this: the “p” in p-value stands for “pssshaw”, as in: “Pssshaw, you’re wrong; I bet your mom owns so many cats.”

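Once calculated, the p-value is (roughly) the probability that dumb luck alone would produce results at least as striking as yours, even if your scientific conclusion is full of smoking cat turds. A value near 1.0 means your data look exactly like random noise and you’re probably talking out your ass, while a value of 0.05 means noise would fake a result this good only 5% of the time. That’s the “95% confident” bar, and clearing it is about as close as statistics gets to saying you’re not vocalizing through your rectum.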

The keys to getting low (meaning good) p-values are making a lot of observations, and having most of those come out one way, and not the other. A million dice rolls where every number comes up just as often don’t tell you anything about what’s coming up next. And, to the chagrin of sportscasters everywhere, a winning (or losing) streak of one, two or eight games isn’t sufficient to make their pre-game blather “significant”. Or coherent, if there’s a liquor cabinet in the press box.

Another example: over the years, I’ve worked with a number of Belgians. From my observations, 100% of Belgians are named Paul, 100% wear fashionable sweaters, and 50% say really inappropriate things in the workplace.

Those are statistics, based on real observations — and some very uncomfortable staff meetings. But do the conclusions have any statistical significance? If the number of observations is ten million, sure. If the number is two (which it is), then no, more observations are needed. You should take these stats, and all others with low (or ambiguous) statistical significance, with a healthy grain of salt.
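Here's a rough Python sketch of why two observations don't cut it. It leans on an assumption that is mine, not the Belgians': each observation is treated like a coin flip that comes up "yes" 50% of the time if nothing interesting is going on. The hypothetical `binomial_tail` function then gives a one-sided p-value, the chance of seeing at least that many "yes" results from coin flips alone.

```python
from math import comb


def binomial_tail(successes: int, trials: int, chance: float = 0.5) -> float:
    """One-sided p-value: P(at least `successes` hits in `trials` tries by luck alone).

    `chance` is the assumed per-try probability when nothing real is going on.
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    print(binomial_tail(2, 2))    # two for two: nowhere near the 0.05 bar
    print(binomial_tail(20, 20))  # twenty for twenty: far below it
```

Both cases are "100%" on paper. Only the bigger sample comes with a p-value small enough to take seriously.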

Also, a huge pile of kitty litter. But preferably not from your mom.

Image sources: ScienceNews (p-value roller coaster), Discovery/TLC (cat-wrangling mama), Daily Caller (“Watch out, guys; we’re dealing with a badass ‘stache over here.”), The Awl (8-ball uncertainty)
