Statistics


Math Confusion

You immediately know to be wary of anyone who asks, "What are the odds?" If it's something you do — stop doing that!

It's a silly expression. Instead of asking such a thing, the questioner should calculate it for himself.

If it can't be calculated, then the questioner is relying on someone smarter to do it for him, and he shouldn't be making any claims or arguments on the strength of it. Making a claim or argument is the only occasion when anyone says, "What are the odds?"

Often, there's no way to calculate the odds, making the point moot. When something is uncountable, odds do not come into the picture.

Statistics is a science based on counting. To figure "the odds" on something means you are dealing with something that falls into the category of "countable." (And it has to be countable in a way that's usable and identifiable, not abstractly. "How many stars in the sky?" is a good example of useless abstraction.)

If something doesn't happen too often, we can say that it is quite evidently not likely.

Quick & Dirty Summary

Using the term, "the odds"

The psychology of lotteries

Non-intuitive statistics, throwing the dice

Monty Hall three-door game and the ensuing chaos

Using counting to solve problems

Picking a better slot machine

Misuse of probability in day-to-day life

Conclusion: the peculiarity of statistics

A timely example: A chess "grandmaster" was recently accused of cheating against Magnus Carlsen, the world champion. He denied cheating in this instance. But then you have people going, "What are the odds he could do that???" (Cheat on the board, as opposed to while playing computer chess.)

Well, no matter how many possibilities you come up with, the possibilities are all zero, except for the right answer — where the possibility is 100%!

Now, if any ambitious fool wants to do a study, he could perhaps count all the matches in chess, count all the incidents of cheating, and calculate a percentage from that.

But that wouldn't be very satisfactory, nor would it be very revealing. It would not be tailored to the individual.

Thus, this problem domain is not really an application for "odds"/statistics, which is a science of aggregates.

Lotteries & the Flaw of Imagination

Stats and counting, and therefore lotteries, are confusing, and the lottery people want to keep it that way. But it's not just the induced confusion.

Let's set up an imaginary lottery drawing where one million people participate. Assuming it is an honest lottery (in the real world, we don't take this into consideration, despite evidence), with only one winner, then each participant has a 1 in 1,000,000 chance of winning, which is only 0.0001% (an example where we can figure the odds).

Why don't we go out carrying steel plated umbrellas attached to grounding straps to protect against lightning? The odds are higher of getting struck by lightning, yet many play the lotteries and no one goes out with grounded shock-protecting umbrellas.

Why is it we tend towards doing the stupid rather than the practical?

What error in our nature prevents us from saying, "It almost certainly won't be me that wins that lottery," rather than the true, but misguided, statement, "It could be me that wins that lottery?"

Well, there is, pretty much, 100% chance of someone winning. There's that.

So we find it's intelligence in service of rationalization that is the culprit. That is, it is an egregious zone. This zone might be called a situation of being caught up in the Living the Dream fallacy.

Or it might be called the flaw of imagination. Someone, through promotion, like advertising, or something tricky (like showing it as a great thing happening to someone in a movie), has inculcated the idea of winning a lottery as something desirable.

So, it's a mind hack used against you, to make your mind and imagination conspire in twisted concert, and deal in abstracts, to justify a preferred conclusion.

It's a good rule of thumb to remember that there is no strict rationality in the mind.

We must learn to be rational and logical, something it's easy to fail at.

We become interested, consumed, or obsessed with something, because our whole psyche is geared to our own fulfillment.

Rationality is the annoying stumbling block, easily cast aside, on the way to the simplest path to that fulfillment. If rationality informs us that it's stupid to invest in lotteries, faith and imagination look for an "out," and find it, in the truism that "someone has to win," and "that someone could be me." "I, too, could 'Live the Dream!'"

But since the vast majority of lottery players lose continuously, there's a conclusion to be drawn. In the end, shelling out for lottery tickets is just to get a short-lived "good feeling," which shows that our priorities are not truly sensible, and are different than we've been led to believe. It's the explanation for "self-destructive behavior." What really motivates us, if we're not careful, is "how we feel in the moment." Everyone is tempted, just some are better at resisting it than others.

Now, if it were smart to invest in lotteries, we should put all our money into lotteries. To see for yourself how foolish people can be, go around and tell that to everyone who buys lotto tix. Watch their reaction. It will be along the lines of, "Oh, don't be stupid, that would be silly!"

I once told someone who had just bought a ticket that, next time, she should instead put the purchase money into a paper bag every time she was motivated to buy a lottery ticket. After a year or two, she'd have a certain windfall, saved right there in the bag. She scoffed at the idea: it meant waiting a year or two, which struck her as funny.

One other way to perhaps help short-circuit the stupidity of things like playing lotteries, is to examine and discard the notion that it is even good to win lotteries. It seems like almost every winner has some horror story to tell, whether it is hounding from moochers and parasites, familial stresses, or blowing the whole wad stupidly. Of course, that starts a whole new round of rationalizations. "That wouldn't happen to me, I'm good with money." Well, if that were the case, you'd have enough that you wouldn't need to be investing in lotteries, wouldn't you?

Really, we shouldn't even walk out the door unless we understand statistics at their basic level.

Statistics

Statistics Are Non-intuitive.

Statistics is the science of counting.

Unless you're dealing with something you can actually count, you cannot apply statistics, odds, or probability to random things. (Though people still try, and end up looking foolish.)

The reason for statistics is to provide the non-intuitive answers.

If everything were intuitive, we wouldn't need sciences and mathematics.

Looking at games played with dice, those wacky statisticians will always tell you that "the die has no prior knowledge." That means that the odds are always one in six of a single die turning up, say, six.

We tend to, not forget, but not absorb the fact that there is never anything but a 1/6 chance of throwing a particular number with a die. Even if you threw every number but six, 1000 times, there's still only a 1/6 chance of six being rolled. Non-intuitive.

That goes against common sense, and for a good reason. No one's going to sit there and just keep rolling sixes all afternoon unless they've got a plugged die. Once you've rolled three sixes in a row, a fourth is pretty rare – isn't it?

Well, they say statistics is the science of counting, and we can count. We can look at the odds of rolling four sixes in a row ourselves. What can you roll in four tosses?

1-1-1-1, 1-1-1-2, 1-1-1-3, 1-1-1-4, 1-1-1-5, 1-1-1-6, 1-1-2-1... 6-6-6-5, 6-6-6-6

In four tosses, there are 6 * 6 * 6 * 6 possible combinations = 1296 combos.

Well, that was easy: in our count, which covered all possible combinations, 6-6-6-6 appeared once out of 1296 times, so there is a 1 in 1296 chance of rolling 4 sixes in a row!
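If you'd rather not write out all those combinations by hand, a computer will happily do the counting. Here's a minimal sketch in Python; the names `outcomes` and `four_sixes` are just made up for this illustration.

```python
from itertools import product

# Every ordered outcome of four tosses of a six-sided die:
# (1, 1, 1, 1), (1, 1, 1, 2), ... (6, 6, 6, 6)
outcomes = list(product(range(1, 7), repeat=4))

# Count how many of those outcomes are four sixes in a row.
four_sixes = sum(1 for rolls in outcomes if rolls == (6, 6, 6, 6))

print("possible combinations:", len(outcomes))
print("combinations that are 6-6-6-6:", four_sixes)
```

It's the same count as above, just done by brute force instead of by multiplication.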

It doesn't matter what it "seems," then. Over a long time, you'll have to roll 4 sixes in a row. But you have to remember that, at 1 in 1296, this typically entails hundreds or thousands of throws (but not necessarily, it could happen on the first four rolls, of course). It's unpredictable on the small scale, but meaningful on the large scale.

Note also that, given enough throws, it is inevitable that you will roll any number of sixes in a row. Of course, that doesn't take into account human nature: humans will quit before 20 sixes in a row, saying the dice are fixed or something.

But we also see there is a sort of carryover effect. There is a 1295 in 1296 chance of not rolling that 4th six, so, really, how can we say the die has "no prior knowledge?" You'd be a fool to take 1 in 6 odds on it turning up six – wouldn't you?

But this is the confusion of large numbers with small numbers. And a confusion of what was being bet on, and the conditions of the bet. If you walked into a situation where someone had just thrown 3 sixes, then were asked if you wanted to bet, of course your odds are 1-in-6 of another. If you bet at the start of a sequence of 4 throws, of course your odds are 1-in-1296!

That walk-in to this situation would win 1 out of 6 times, because he is walking in on a situation where the 1-in-216 situation of 6-6-6 had already occurred, just like any possible sequence could have occurred. But prior throws don't affect the subsequent odds of the die.

The newcomer will face the options of:

(6-6-6)-1, (6-6-6)-2, (6-6-6)-3, (6-6-6)-4, (6-6-6)-5, (6-6-6)-6

The (6-6-6) sequence represents the fact that 3 sixes have already been thrown, and therefore we are left with the final possibilities that, 1, 2, 3, 4, 5, or 6 might turn up; that is, 1-in-6 odds for any of those numbers, including 6.
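The two bets can be counted side by side. This is only a sketch, and the names (`all_sequences`, `after_three_sixes`, `p_from_start`, `p_walk_in`) are invented for the example. The bettor who is in from the start counts across all four-toss sequences; the walk-in only counts the sequences that already begin with 6-6-6.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of four tosses of one die.
all_sequences = list(product(range(1, 7), repeat=4))

# Bettor in from the start: 6-6-6-6 out of every possible sequence.
hits_from_start = sum(1 for s in all_sequences if s == (6, 6, 6, 6))
p_from_start = Fraction(hits_from_start, len(all_sequences))

# Walk-in: three sixes are already on the table, so only the
# sequences that begin 6-6-6 are still possible.
after_three_sixes = [s for s in all_sequences if s[:3] == (6, 6, 6)]
hits_walk_in = sum(1 for s in after_three_sixes if s[3] == 6)
p_walk_in = Fraction(hits_walk_in, len(after_three_sixes))

print("in from the start:", p_from_start)
print("walk-in after 6-6-6:", p_walk_in)
```

Same die, same throws, but a different set of things being counted, so different odds.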

To the guy on the ground – in it from the start – the statistician looks wrong because he only sees the combination of 4 sixes turn up once out of 1296 times. Weird but true.

We might say that this shows the "fallacy of lost or missing information."

Note that the probabilities aren't additive. Your chances of rolling a given number are one in six for any roll of a die. However, the chances of rolling a six, say, in six rolls of a die aren't: 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6, that is, one (one hundred percent).

Nor are the chances 1/6 to the sixth power, which would mean, the more chances you had to roll, the fewer your chances of rolling a six.

This is another of those "non-intuitive" cases. No, if your chances were 100% of rolling a number in six rolls, the distribution of throws would never change, and the game of craps would never be the same. Statistics are different than a cursory look would indicate.

Your chances of rolling a particular number at least once in a given number of rolls of the die are calculated by counting. Let's choose the number six (6). For one die, the odds are 1 in 6, or 16.7%. For two rolls (or one roll of two dice), we can roll 1-1, 1-2...1-6, 2-1...2-6, 3-1...3-6, 4-1...4-6, 5-1...5-6, 6-1...6-6. A six shows up in 11 of those 36 combinations, or 30.6%. For three dice, a six shows in 91 of the 216 combinations, or 42.1% of the time (all percentages rounded). For six rolls (or one roll of six dice), it's 66.5%. We approach 100% as we perform more rolls, but never get all the way there.
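Here is that count done by machine, as a rough sketch. The function name `chance_of_at_least_one_six` is made up for this example; it simply lists every combination for a given number of rolls and counts the ones containing a six.

```python
from fractions import Fraction
from itertools import product

def chance_of_at_least_one_six(num_rolls):
    """Count the combinations that contain at least one six."""
    outcomes = list(product(range(1, 7), repeat=num_rolls))
    with_six = sum(1 for rolls in outcomes if 6 in rolls)
    return Fraction(with_six, len(outcomes))

for n in (1, 2, 3, 6):
    p = chance_of_at_least_one_six(n)
    print(f"{n} roll(s): {p} = {float(p):.1%}")
```

Notice the count never reaches every combination: there is always at least one sequence (all non-sixes) left over, which is why we never get all the way to 100%.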

Which is a sort of mathematical reasoning, not real-world reasoning, because in the real world, we don't see hundreds of rolls of a die without seeing a six. We throw out the die long before that, because it would be very likely that the die itself had some flaw. In fact, all dice tested seem to be found to be somewhat out of whack with true random expectations.

Let's look at a hypothetical situation now, an example of how theoretical statistics can be used to find something interesting, and non-intuitive.

The Monty Hall 3-Door Game

This has been popularized in a more recent TV show, so there is already a certain awareness of this hypothetical puzzle. This is the case of the "three door game," inspired by the old TV program, Let's Make a Deal, with host, Monty Hall (Monte Halparin). But here's an easy-to-understand take on this subject.

In the real game, contestants select a random prize by selecting between three "doors," portals set up on stage. Each time a contestant picked one of the doors, Monty would first reveal the contents of a different door, never revealing the good prize, but instead opening the door with something undesirable, like a goat, behind it. The other prizes, behind the other two doors, and still unseen, are respectively, another goat, and a new car.

In the real show, Monty then would offer to "buy" the contestant's door for $500 or something.

In the imaginary scenario to set up this puzzle, Monty instead offers the contestant the opportunity to switch doors, if he so desires.

Should the contestant switch?

At first blush, it seems like switching is a wash, or irrelevant. It also might seem like the odds are 50:50 that the door you picked has the car now. Actually, the odds are 2/3 for the other door, and 1/3 for your door, so switching is the good choice.

Part of math and science is to symbolize or "represent" something so we can work with it. One scientist has said, "No computation is possible without symbolic representation," and that's probably true. In this case, we can symbolize the contents behind the doors by assigning labels to each. The following probabilities apply at the start of the game:

Object    Symbol    Probability
Goat A    GA        P(GA) = 1/3
Goat B    GB        P(GB) = 1/3
Car A     CA        P(CA) = 1/3

Besides needing symbols to work with, we need some rules to work with when we're formalizing a problem symbolically like this. And we need to recognize the "hidden" or implicit rules, to help us move forward.

  • We need to remove things that don't matter from consideration.
  • It is valuable to label things, to make them easier to work with (so, we assigned symbols to the goats, (GA, GB) and car (CA)). Note that by getting rid of the specific mention of "goat" or "car," it makes it easier to focus and deal with the problem in its essence, a trick used in algebra. Most of problem solving is weeding out distractions.
  • P() stands for "probability of..." so, for example, P(GA) stands for "the probability of goat A being selected."
  • Random odds are equal for each door, and are easy to calculate: one choice out of three doors means our initial odds are 1/3. (P(GA) = P(GB) = P(CA) = 1/3)
  • The initial probability of your selecting CA can never change. So before and after Monty opens a door, and regardless of the order the doors are opened in, the odds of your choice being the car are 1/3.
  • We'll assume we aren't farmers and prefer the car to a goat, so getting the car will be considered "winning."
  • Remember that statistics is a science of counting.
  • The odds for all choices, added together, must always add up to one, or 100% ("1" and "100 percent" are the same thing, mathematically).
  • From the previous rule, the odds for each choice, when the player first selects a door, are 1/3, and P(GA)+P(GB)+P(CA) = 1/3+1/3+1/3 = 1.
  • Once the first goat is revealed, the odds of that selection having the car are zero (0), of course.
  • Monty always opens a door with a goat behind it, therefore knows which door has a goat, and which a car.

Now we're ready to analyze this and start to apply those rules. First off, we need to clear out the nonsense and forget about "doors" and positions; they just confuse the issue. What's important is our selection.

Monty always picks a goat for the first reveal, not a random selection, which is important because it provides information to the contestant.

Remember that statistics is the science of counting. But we need something to count, so we count the times we win and times we lose, based on the rules, and based on our formalized representation.

The following chart lists the possibilities. We may have picked GA, GB, or CA, and, if we picked one goat, Monty will reveal the other. Regardless of which goat Monty reveals, we will have the option to switch or stay.

Choice    Option    Result
GA        Switch    Win
GA        Stay      Lose
GB        Switch    Win
GB        Stay      Lose
CA        Switch    Lose
CA        Stay      Win
Now it's — finally — time to do our counting, and all we have to do is refer to the table, and count the wins. We find that switching is a winning proposition in two out of three cases, or to put it another way, staying is a losing proposition in two out of three cases.
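The chart can also be built and counted by a short program. This is a sketch under the rules listed earlier (Monty always reveals a goat the contestant didn't pick); the function `play` and the variable names are hypothetical, chosen just for this illustration.

```python
prizes = ["GA", "GB", "CA"]

def play(first_pick, switch):
    """Return True if the contestant ends up with the car."""
    # Monty reveals a goat that the contestant did not pick.
    # If the contestant picked the car, either goat could be shown;
    # which one it is makes no difference to the result.
    revealed = next(p for p in prizes if p != first_pick and p != "CA")
    if switch:
        # Switching means taking the one prize that is neither
        # the first pick nor the revealed goat.
        final = next(p for p in prizes if p not in (first_pick, revealed))
    else:
        final = first_pick
    return final == "CA"

switch_wins = sum(play(p, switch=True) for p in prizes)
stay_wins = sum(play(p, switch=False) for p in prizes)

print(f"switch wins {switch_wins} of {len(prizes)}")
print(f"stay wins {stay_wins} of {len(prizes)}")
```

Note that the program never mentions door numbers at all, only selections.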

Now note how reformulating our problem has completely changed the apparent "problem space." Thanks to realizing that the problem is one to do with switching, not doors, we have made the problem "countable," and thus solvable.

It isn't that a door has a 2/3 chance of having the car, but the switch has a 2/3 chance of being correct. This may seem a distinction without a difference, but it isn't. Recognizing that fact allowed us to create the chart which made the problem "countable." No matter which option the contestant picked, re-selecting the other is the right play.

You'll notice we could have saved a lot of time, by simply using our statistical rule that the odds must add to 1. Given that our initial choice has odds of 1 in 3 and the total odds are 3 in 3, what remains has odds of 2/3. Switching is merely exchanging 1/3 odds for 2/3 odds, or getting two doors for one. The fact that one (losing) door is opened does not and cannot change the odds that were in play beforehand!

Now, if a door had been randomly selected and revealed, switching would be irrelevant.

It may still seem non-intuitive, but, again, if this were all intuitive, it wouldn't be science, or, rather, there'd be no need for a science of statistics.

To show that both sides can have a valid point in a dispute... and that statistics are a bitch, consider the case of someone walking in to the studio, not knowing what's already happened, and being allowed to pick his own option, one of the two remaining doors, after one goat had been revealed.

Well, this person has a 50:50 (1/2) chance of winning. That cannot somehow change "the odds," for the other player, of course, but it's confusing if you think that somehow the odds have to be the same for each "door," or each person. They do not, and that's what is confusing.

Note the parallel with the similar situation where someone "walking in," on a game of dice, will encounter much different odds from someone who has been playing for some time.

Here's one way to look at it: The first person had one of three doors to choose from, with a prize guaranteed behind one of them. The second had one of two doors to choose from, with a prize guaranteed behind one of them. For the first person to have 50:50 odds, they could reshuffle what's behind the doors randomly, and then he could choose again. Or flip a coin, heads for #1, tails for #2. Whatever.
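The walk-in's odds can be counted too. In this sketch (all names are made up for the example), the newcomer knows nothing and picks either of the two closed doors with equal chance, so every combination of "what the contestant first picked" and "which remaining prize the newcomer points at" is equally likely.

```python
from fractions import Fraction

prizes = ["GA", "GB", "CA"]

cases = []
for first_pick in prizes:
    # Monty has already revealed a goat the contestant did not pick.
    revealed = next(p for p in prizes if p != first_pick and p != "CA")
    remaining = [p for p in prizes if p != revealed]
    # The newcomer chooses one of the two remaining prizes blindly.
    for newcomer_pick in remaining:
        cases.append((first_pick, newcomer_pick, newcomer_pick == "CA"))

newcomer_wins = sum(1 for _, _, won in cases if won)
p_newcomer = Fraction(newcomer_wins, len(cases))

print(f"newcomer wins {newcomer_wins} of {len(cases)} cases = {p_newcomer}")
```

The newcomer lacks the one piece of information the original contestant has (which door was picked first), and that is exactly what makes his count come out differently.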

To see the flaws and quirks in human nature, someone was on the Net explaining the Monty Hall problem, and mentioning how switching yields the better (2/3) odds, and a lot of people went nuts, criticizing him and accusing him of being an ass, etc.

Actually, it's worse than I remembered: A female columnist presented this problem and solution in a 1990 magazine column, and over 10,000 people, including dozens of "PhDs" wrote in, dogmatically and stupidly critical. Note also, the Monty Hall Problem is just a restatement of a much older puzzle, in a different format. So there's really no excuse for their ignorance.

In some cases, they're answering a different question, unawares, as our discussion has revealed. The point wasn't to answer what the odds are for someone new to the scene, but for the contestant in it from the start.

Before we get embroiled in argument, we need to check if we're working from the same "problem space" as the other person. In fact, both parties to an argument may be wrong, and dogmatically cling to misapprehensions that may be based in some truth, but don't give the whole picture. The problem comes when people can't distance themselves from their convictions long enough to figure out what the other person is talking about.

If you don't have reliable tools to attack a problem your mental wheels spin like the wheels of a car on ice. The mind doesn't deal well with apparent paradox. The odds of the selection never change from 1/3, but that unchanging certainty can be colored by a misperception of the problem.

As we've seen, the problem must be tackled in a detailed, accurate way, identifying the principle(s) involved (in this case, "counting," of course), which is why we went through all of the preceding steps. This is a huge problem in schools and teaching, where they simply won't do this, out of ignorance, stubbornness, or their own lack of deep understanding of what they're teaching.

Since statistics is also a science of large numbers, anything can happen in a small data set. You may play this 3-door game yourself several times and see the guy who switches lose each time. The trend is that the switchers, if all added up, will have won, on average, 2/3 of the time.

There's even a web page that allows you to simulate the game to see for yourself. I played it for a while, and in the first 10 or 20 runs, I was only winning 1/3 of the time! But I kept playing and my winnings went up to about 2/3. Odds are only meaningful over a long series of runs.
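If you'd like to try this without the web page, here is a rough simulation sketch. It is not that page's code; the function names `play_once` and `win_rate`, and the fixed `seed`, are assumptions made for the example. It plays the game with random prize placement and random picks, with Monty always opening a goat door.

```python
import random

def play_once(rng, switch):
    """Play one game; return True if the contestant wins the car."""
    prizes = ["GA", "GB", "CA"]
    rng.shuffle(prizes)                 # prizes[d] sits behind door d
    pick = rng.randrange(3)             # contestant picks a door blindly
    # Monty opens a goat door that is not the contestant's pick.
    goat_doors = [d for d in range(3) if d != pick and prizes[d] != "CA"]
    opened = rng.choice(goat_doors)
    if switch:
        pick = next(d for d in range(3) if d not in (pick, opened))
    return prizes[pick] == "CA"

def win_rate(num_games, switch, seed=0):
    """Fraction of games won over num_games plays."""
    rng = random.Random(seed)
    wins = sum(play_once(rng, switch) for _ in range(num_games))
    return wins / num_games

for n in (10, 100, 10_000, 100_000):
    print(f"{n:>7} games, always switching: {win_rate(n, switch=True):.3f}")
```

Change the seed and the short runs will bounce around quite a bit, while the long runs should settle near 2/3.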

This 3-door game is a useful little puzzle: Logic, math, insight into argument and the quirks of statistics.

Now We're Cooking

That can be just the start to the fun: We now have some tools to work with to look at a different situation, and confirm a detail.

Suppose our imaginary Let's Make a Deal host wants to change things up a little, and now Monty will open a door at random, after you've picked. If your door – "selection," rather – isn't revealed by luck of the draw, he'll let you select afterwards. What are your odds now?

At first glance, that seems a little tough to analyze, but we know what to do: Set things up so they are countable. Let's break it down.

The following chart lists the possible ways things can play out. In the cases where your own choice or the car is revealed at random, you immediately win or lose and don't get a chance to switch, so the choice is marked with a hyphen, '-'. In the other cases, such as when you choose GA and GB is revealed behind another door, the chart shows the result if you switch or stay.

You Choose    Randomly Revealed    Choice    Your Result
GA            GA                   -         lose
GA            GB                   switch    win
GA            GB                   stay      lose
GA            CA                   -         lose
GB            GA                   switch    win
GB            GA                   stay      lose
GB            GB                   -         lose
GB            CA                   -         lose
CA            GA                   switch    lose
CA            GA                   stay      win
CA            GB                   switch    lose
CA            GB                   stay      win
CA            CA                   -         win

Now we can do a bit of counting, or statistics. There are nine equally likely combinations of your choice and the random reveal. We count to find that the policy of switching gives us 3 wins out of those 9. But the policy of staying also yields 3 wins! Since we know the odds when you stay are 1/3, so are the odds when you switch. It looks like this game wouldn't be much fun at all: there's no difference, no matter what you do. But it is what we should have expected, since the reveal was random.
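The same count can be run by a program. This sketch follows the rules of the chart as written (your own goat revealed means you lose, the car revealed means you win only if it was your choice); the function `outcome` and the other names are invented for this illustration.

```python
from itertools import product

prizes = ["GA", "GB", "CA"]

def outcome(choice, revealed, switch):
    """Return True for a win under the random-reveal rules."""
    if revealed == "CA":
        # The car is shown: you win only if it was your own choice.
        return choice == "CA"
    if revealed == choice:
        # Your own goat is shown: immediate loss.
        return False
    # Another goat is shown, so you may switch or stay.
    if switch:
        final = next(p for p in prizes if p not in (choice, revealed))
    else:
        final = choice
    return final == "CA"

# Nine equally likely (your choice, random reveal) combinations.
cases = list(product(prizes, repeat=2))
switch_wins = sum(outcome(c, r, switch=True) for c, r in cases)
stay_wins = sum(outcome(c, r, switch=False) for c, r in cases)

print(f"always switch: {switch_wins} wins in {len(cases)} cases")
print(f"always stay:   {stay_wins} wins in {len(cases)} cases")
```

The only thing that changed from the earlier game is that Monty no longer uses what he knows, and that alone is enough to wipe out the advantage of switching.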

What we've just dabbled with is a good example of real science: digging around, wondering about things and conducting experiments to try them out. It can uncover new information, or confirm or negate a belief, or show the validity of an approach to a problem. Science is really the action of being constructively inquisitive.

We also saw that there are different ways to look at the same problem, and if we have the opportunity, we should try to examine as many as possible, something that would be very helpful for teachers to learn.

And we saw that "counting" isn't as simple as a cursory thought might assume it to be. Whenever statistics is involved, an analysis of the elements of the problem, and the approach to the solution is required. Moreover, our solution should meet the requirements of rationality and be within the realm of possibility — and it's good to run trials if possible, like with a computer program to check the results of many trials.

Lucky Slots

Another common fallacy that seems obvious after it is explained, is the "hot" or "lucky" slot machine fallacy.

You know, over the years, I've heard this "strategy" of "looking for a machine that hasn't been paying, because a big one's on the way," numerous times from people. It's like they're unpaid workers for the casino.

Some people have the idea that you should look to play casino slots that haven't paid off in a long while, because they're "due," and "building up for a big pay-out!"

Good gamblers and people who actually work on the machines explain that it's the exact opposite: You look for machines that are paying off time and again – that have a higher frequency of winning combinations appearing. Those are the "looser" slots – higher percentage paying ones – and where you should "invest" your efforts.

Which makes perfect sense: The stingy machines, as far as you know, may never pay off. And how would you ever know? At least the winning machines have tangibly proven they can win.

Day-to-Day "Probability"

Speaking of probability, we hear that term thrown around a lot, outside of its applicability. "There's zero probability of that guy cheating!"

"There's a one-in-seventy thousand probability of other life in that star system," notes some "expert." How easily we fall for nonsense like this.

As we know, unless you're dealing with something you can actually count, you cannot apply statistics, odds, or probability to these uncountables.

It's not even fine if you're just talking casually. As soon as someone starts to assign percentages as to "the odds" of something, unless it can be quantified, he's talking out his ass.

Look at how tricky it was to analyze the Monty Hall Problem, and its offshoot, and the vicious attacks from bloviating know-it-alls. It takes real work to analyze things, as opposed to just talking out of that dark place.

For example, to assign odds to life on Mars, we'd need to have explored and found life/not found life on a number of Mars-like planets before we could get anywhere close to assigning numbers.

To repeat: If you can't count something, you cannot assign odds to it.

Any sort of exercise about life in the galaxy or such, is a guess, but it doesn't sound so hot if you phrase it that way.

Statistics requires things that are countable. We don't even know how many planets there are in the galaxy (or even if we really are in a "galaxy," and aren't confused by some illusion created by our presuppositions).

It's carte blanche for the cray-cray, when people start flapping their yaps with nonsense talk, clothed in make-believe scientific garb, so, better not to fall into that trap at all.

In Closing

Remember this tripping point from the "Monty Hall Game" discussion, and the discussion of rolling the dice: The odds don't have to be the same for two different individuals for what may seem like the same thing.

And don't forget another peculiarity. We could have called this discussion, "The Peculiarity of Statistics." The very large numbers we need in stats often elude us.

We wouldn't see a six rolled a thousand times in a row, but the demands of the math say it should happen. It's much easier to visualize it this way: Take a silo containing a million apples. Throw one orange in. Mix well. Reach in and grab the first object without looking. Not much chance of it being an orange. Mix well again. Eventually, the orange should show up — or it could have turned up at the first attempt, we just don't know. Just that in millions and millions of attempts, it will probably show up at some point. Better to say, in trillions of attempts, the orange will turn up millions of times, on average.
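To put rough numbers on the silo, here is a small sketch. It assumes exactly one million apples plus one orange, a perfect remix before every grab, and the grabbed object going back in each time; the function names are made up for the example.

```python
import math

apples = 1_000_000
p_orange = 1 / (apples + 1)   # chance of the orange on any single grab

def chance_seen_at_least_once(draws):
    """1 - (1 - p)^draws, computed in a numerically stable way."""
    return -math.expm1(draws * math.log1p(-p_orange))

def expected_oranges(draws):
    """Average number of times the orange turns up in this many grabs."""
    return draws * p_orange

for draws in (1, 1_000_000, 10_000_000, 10**12):
    print(f"{draws:>16,} grabs: "
          f"seen at least once {chance_seen_at_least_once(draws):.4f}, "
          f"expected count {expected_oranges(draws):,.1f}")
```

Even after a million grabs, the orange is far from guaranteed to have shown up; it takes many millions before "probably" turns into "almost certainly."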

Here's an interesting point: If the orange doesn't turn up millions of times in trillions of attempts, it indicates something wrong with the way we mixed up or selected the orange, or with our count of the apples. Or it may be that there were some stray oranges in there somewhere. (That provides a clue as to how stats are used to uncover cheating.)

Because we're dealing with mankind, a man or woman wouldn't tolerate a die that kept rolling sixes, but would change it out. And because we're dealing with mankind, the die would very possibly be faulty or altered, so it would make sense to switch it out.

Hence, the peculiarity of statistics, where we don't have a world tailored to pure random chance and humongous repetitions under unvarying, laboratory conditions, but lots of meddling and exceptions and sticking points.

