Statistical Basics – What is a Random Variable?

The term “random variable” is a bit confusing, and the way it is usually taught doesn’t help! “Random variable” is really shorthand for “the answer to this question is like rolling a die or tossing a coin.” Let’s unpack…

If I toss a normal coin there is a 50% chance it will land on ‘tails’. This underlying probability applies no matter how many times I toss the coin. Tossing coins is boring, but the important part to remember is that for variables like the outcome of a coin toss, what we call “chance” is fully defined mathematically as being 50% heads and 50% tails. When all the possibilities and their probabilities are defined neatly like this, we say we have a Probability Distribution. The distribution for tossing a coin is: heads 50%, tails 50%. Note that the options sum to 100%.

Each time we toss the coin we are really drawing an observation from the above probability distribution. Statisticians like to call this a “trial”. Trial means “consult the distribution and see what we get”.
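To make “consult the distribution and see what we get” concrete, here is a minimal Python sketch. The names `coin` and `toss` are my own for illustration, and the seed is only there so the run is repeatable; treat it as one possible way to picture a trial, not a formal definition.

```python
import random

# The probability distribution for a fair coin:
# every possible outcome, paired with its probability.
coin = {"heads": 0.5, "tails": 0.5}

def toss(rng):
    """One trial: draw a single observation from the coin distribution."""
    outcomes = list(coin)
    probabilities = list(coin.values())
    return rng.choices(outcomes, weights=probabilities, k=1)[0]

rng = random.Random(42)  # seeded only so the example is repeatable
tosses = [toss(rng) for _ in range(10_000)]

share_heads = tosses.count("heads") / len(tosses)
print(f"Share of heads in {len(tosses)} tosses: {share_heads:.3f}")
```

No single toss can be predicted, but over many trials the share of heads should settle close to the 50% that the distribution specifies.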

Variables more interesting than a coin toss

If I randomly select a person from the Australian population, there is a 50.2% chance that the person is female. Just like coin tossing, each time I select a person I am going to the underlying probability distribution (50.2% female / 49.8% male) and asking it to spit out an “observation”. The data that ends up in my imaginary spreadsheet about gender is determined by the underlying probability distribution. Any variable that operates like this is called a “random variable”.
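The same idea can be sketched in Python for the gender example. This assumes the 50.2% / 49.8% split quoted above; `gender_distribution`, `select_person` and the sample size of 20 are hypothetical choices made purely for illustration.

```python
import random

# Assumed distribution, using the percentages quoted in the text above.
gender_distribution = {"female": 0.502, "male": 0.498}

def select_person(rng):
    """Draw one observation (one randomly selected person's gender)."""
    outcomes = list(gender_distribution)
    probabilities = list(gender_distribution.values())
    return rng.choices(outcomes, weights=probabilities, k=1)[0]

rng = random.Random(2024)  # seeded only so the example is repeatable

# The "gender" column of the imaginary spreadsheet: 20 observations.
spreadsheet_column = [select_person(rng) for _ in range(20)]
print(spreadsheet_column)
```

Change the seed and the column changes, but every version of it comes from the same underlying distribution.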

Understanding that what we observe in the world is the result of underlying probability distributions is a big moment in your stats journey, so let this sink in a bit.

What is not a random variable?

When you line up at the bakery to get something yummy like a doughnut (I love doughnuts!) sometimes you have to take a ticket at the counter. If we asked a bunch of customers at the bakery “What is the number on your ticket?” there would be a different response for each customer, so this is a variable (numerical / discrete).

The number on the ticket is not drawn from a probability distribution. It is not random in any way. If we were to take 100 tickets in a row we would not be consulting a probability distribution each time. We know which ticket is next because it is the value of the current ticket + 1. That’s the whole point of the ticket system! The hallmark of a non-random variable is that you know what you are getting before you get it.
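For contrast, here is a small Python sketch of the ticket system. The function name `next_ticket` and the starting number 1 are assumptions for illustration. Notice that nothing random is involved anywhere.

```python
def next_ticket(current_ticket):
    """The whole 'rule' behind the ticket system: current ticket + 1."""
    return current_ticket + 1

# Take 100 tickets in a row, assuming the roll starts at ticket 1.
tickets = [1]
for _ in range(99):
    tickets.append(next_ticket(tickets[-1]))

print(tickets[:5], "...", tickets[-1])
```

Every run gives exactly the same list, and you can work out any ticket before you take it. That predictability is what makes this a non-random variable.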

Almost all variables we use in research can be considered “random” – but it is very important you spot the things that are determined entirely by some formula or equation, because they’re a different kettle of doughnuts.

Good luck out there,

Taya
