# Proving the ‘Birthday Paradox’ with Python Data Visualization

## Math is confusing, graphs are not

Here’s a fact about me: I am a sucker for interesting math facts. I find big numbers particularly interesting, especially when they are presented as a real-world example such that the only plausible initial reaction is “*There’s no way that can be true…right?*”

The other night, while aimlessly scrolling through Twitter, I came across a video about factorials.

My jaw hit the floor when he said there were **over 6 billion unique ways to order one suit of cards**. Six BILLION!!!

Now, the rest of this article has nothing to do with this video, or factorials, or even large numbers, but I wanted to start here because 1) this video is awesome and 2) it led me down an internet rabbit hole of interesting math facts.

That rabbit hole led me to The Birthday Paradox.

Simply put, the Birthday Paradox states:

> If there are 23 people in a room, the probability that at least two of them share the same birthday is roughly 50%.

This number seems high, right?

I’m not going to get into the math behind it (the article linked above has a great explanation), but the reason this seems so confusing is that we instinctively think about comparing *just our own birthday* against the 22 others. That would leave us with only 22 pairs of birthdays to check.

What we actually need to do is compare *everybody’s* birthday against everyone else’s. Person one compares their birthday against the other 22, person two compares theirs against 21 (skipping person one, who has already been checked), person three against 20, and so on. By the end of all these comparisons, we have checked 22 + 21 + … + 1 = 253 pairs of birthdays.
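For the skeptical, that tally is the same as the number of ways to choose 2 people out of 23. A quick standard-library check (the variable names are illustrative, not from the original article):

```python
import math
from itertools import combinations

PEOPLE = 23

# Person 1 checks 22 others, person 2 checks 21 new ones, ..., person 22 checks 1.
pairs_by_hand = sum(range(PEOPLE))  # 0 + 1 + ... + 22

# The same count as "23 choose 2": every unordered pair of people.
pairs_by_formula = math.comb(PEOPLE, 2)

# Brute force: actually list every (person_a, person_b) pair.
pairs_listed = len(list(combinations(range(PEOPLE), 2)))

print(pairs_by_hand, pairs_by_formula, pairs_listed)  # 253 253 253
```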

Okay, this logic makes sense. But still, as is often the case with these hard-to-believe math facts, it FEELS like it shouldn’t be true, no?

While gathering multiple groups of 23 strangers and demanding they all share their birthdays seemed like the most foolproof option, there were some logistical and ethical concerns with this route. For this reason, I decided to use Python.

I ran 250 trials in which I selected 23 random numbers between 1 and 365 and checked whether any two of them matched. I counted each trial with a match as a “success” and, after every trial, divided the number of successes so far by the number of trials so far.
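The original code isn’t reproduced here, so this is a minimal sketch of the experiment as described, assuming a uniform 365-day year (no leap days) and Python’s built-in `random` module. The helper names `has_shared_birthday` and `running_probability` are hypothetical:

```python
import random

def has_shared_birthday(group_size, days=365, rng=random):
    """Draw group_size random birthdays (1..days) and report whether any repeat."""
    birthdays = [rng.randint(1, days) for _ in range(group_size)]
    # A set drops duplicates, so it is shorter than the list only if two birthdays match.
    return len(set(birthdays)) < len(birthdays)

def running_probability(trials=250, group_size=23, seed=None):
    """Return the share of 'successes' (trials with a match) after each trial."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    successes = 0
    probabilities = []
    for trial in range(1, trials + 1):
        if has_shared_birthday(group_size, rng=rng):
            successes += 1
        probabilities.append(successes / trial)
    return probabilities

probs = running_probability(trials=250, group_size=23, seed=42)
print(f"Estimated probability after {len(probs)} trials: {probs[-1]:.2f}")
```

Checking for duplicates with a set keeps each trial to a single pass over the 23 birthdays, rather than comparing all 253 pairs by hand.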

I dumped the results into a pandas DataFrame and graphed them.
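A sketch of that step, assuming one row per trial with a running probability column. The column names and the `simulate` helper are illustrative. The original chart appears to have been interactive, but the library it used isn’t stated, so pandas’ built-in `DataFrame.plot` is used here instead:

```python
import random

import pandas as pd

def simulate(trials=250, group_size=23, days=365, seed=None):
    """Run the birthday experiment and return one DataFrame row per trial."""
    rng = random.Random(seed)
    matches = []
    for _ in range(trials):
        birthdays = [rng.randint(1, days) for _ in range(group_size)]
        matches.append(len(set(birthdays)) < group_size)  # True if any birthday repeats
    df = pd.DataFrame({"trial": range(1, trials + 1), "match": matches})
    # Running share of trials so far that contained a shared birthday.
    df["probability"] = df["match"].astype(int).cumsum() / df["trial"]
    return df

df = simulate(seed=42)
print(df.tail())

# To draw the line chart (requires matplotlib to be installed):
# df.plot(x="trial", y="probability", ylim=(0, 1))
```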

As you can see, there are some fluctuations at the beginning, but by the 150th trial the probability of two people out of just 23 sharing a birthday has settled in right around 50%. If we let it run for, say, 10,000 more trials, the line would flatten out even further, hugging the exact theoretical value of about 50.7%.
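To see where the line should settle, we can compare a long simulated run with the exact value. The exact value comes from the standard formula 1 − (365/365)(364/365)⋯((365 − n + 1)/365), which assumes every day is equally likely and ignores leap years. A minimal sketch (function names are illustrative):

```python
import math
import random

def exact_probability(people, days=365):
    """Exact chance that at least two of `people` share a birthday (uniform days, no leap years)."""
    if people > days:
        return 1.0  # pigeonhole principle: more people than days forces a match
    # Chance that every birthday is different: (365/365) * (364/365) * ...
    all_different = math.prod((days - k) / days for k in range(people))
    return 1 - all_different

def estimate(trials, people=23, days=365, seed=None):
    """Monte Carlo estimate: fraction of trials containing at least one shared birthday."""
    rng = random.Random(seed)
    hits = sum(
        len({rng.randint(1, days) for _ in range(people)}) < people
        for _ in range(trials)
    )
    return hits / trials

print(f"exact:    {exact_probability(23):.4f}")  # 0.5073
print(f"estimate: {estimate(10_000, seed=1):.4f}")
```

With 10,000 trials, the estimate typically lands within about a percentage point of the exact value.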

The Birthday Paradox also tells us that if 59 people are in one room, the probability that at least two of them share a birthday is about 99%, which is virtually guaranteed.

Let’s see.

With 59 people per trial, the graph essentially becomes a straight line. In fact, using the hover tool you can see that it isn’t until the 80th trial that we get our first trial without a matching birthday.
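With 59 people, a trial with no match should happen in well under 1% of trials, so a single miss somewhere in the first couple of hundred trials is about what we should expect. A sketch for locating the first miss in a seeded run; `first_trial_without_match` is a hypothetical helper, and where the miss lands depends entirely on the random seed:

```python
import math
import random

def first_trial_without_match(trials=250, people=59, days=365, seed=None):
    """Return the 1-based index of the first trial with NO shared birthday, or None."""
    rng = random.Random(seed)
    for trial in range(1, trials + 1):
        birthdays = [rng.randint(1, days) for _ in range(people)]
        if len(set(birthdays)) == people:  # every birthday in this trial is distinct
            return trial
    return None  # every trial had at least one shared birthday

# Exact chance that 59 people have NO shared birthday, for comparison.
p_no_match = math.prod((365 - k) / 365 for k in range(59))
print(f"P(no match among 59) = {p_no_match:.4f}")
print("first miss at trial:", first_trial_without_match(seed=7))
```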

The full probability curve for the Birthday Paradox plots the number of people in the room (x) against the probability that at least two of them share a birthday.

Right at x=23, the line crosses the 0.50 probability threshold. By x=59, the curve has flattened out as it creeps ever closer to 1.0. It stays that way until x=366, where the probability becomes exactly 1.0: with 366 people and only 365 possible birthdays (ignoring leap years), at least two must match.
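The curve itself doesn’t need simulation at all; it can be computed exactly for every group size from 1 to 366. A minimal sketch under the same assumptions (365 equally likely days, no leap years), with illustrative names:

```python
import math

def exact_probability(people, days=365):
    """Exact chance that at least two of `people` share a birthday (no leap years)."""
    all_different = math.prod((days - k) / days for k in range(people))
    return 1 - all_different

# One point per group size, from a single person up to 366 people.
curve = {n: exact_probability(n) for n in range(1, 367)}

# Smallest group where a shared birthday is at least as likely as not.
crossing = min(n for n, p in curve.items() if p >= 0.5)
print(crossing)    # 23
print(curve[59])   # just above 0.99
print(curve[366])  # 1.0

# To chart it with pandas (and matplotlib): pd.Series(curve).plot(ylim=(0, 1))
```

At n = 366 the product picks up a factor of (365 − 365)/365 = 0, which is the pigeonhole argument showing up in the arithmetic.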

Well, there you have it.

Next time you’re in a room with 22 other people, the odds are slightly better than a coin flip that at least two of you are birthday buddies. If another 39 people enter this hypothetical gathering, you can stand up and proclaim with near statistical certainty that at least two people in the room were born on the same date.

Will anybody care? Will this information ever be useful to you again?

Probably not.

But you’re welcome.

Michael Black