Hang onto your hats, statfans! Hopefully you’ve read the rest of the ‘statistical basics’ posts and you know that what we are trying to do is estimate the value of some important statistic (a mean, for example), which we calculate from data organised into variables. We can safely estimate from a sample thanks to our BFF, the Central Limit Theorem. If any of that is news to you, please go back and refresh those concepts.
Statisticians get very worked up about hypothesis testing. Things have to be done in a very particular way and there are a million opportunities for you to stuff it up. Get excited, because here comes my complete guide to hypothesis testing. You’re welcome! This first part is about writing hypotheses.
A ‘hypothesis’ is some statement about how the world works. I might have a hypothesis that “Women weigh less than men, on average”, or “Aliens from space are controlling my brain”. A hypothesis has to be a statement about how things actually are, or might be. “People should not murder each other” is not a hypothesis; it’s a statement about how someone thinks the world should be. My golden rule to keep you out of trouble is that every hypothesis is a statement beginning with the word ‘that’, for example: “That women weigh less than men, on average”.
A hypothesis should only contain one ‘that’ – it might sound obvious, but if you have multiple ‘thats’ you have multiple hypotheses: ‘That my brother is older than me and that he likes to play golf’ is two hypotheses. Your hypothesis needs to be bite sized: able to be tested and digested in a single experiment.
Your hypotheses need to be falsifiable. What does this mean? It means that there has to be some possible way of unearthing evidence which debunks it, or proves it false. With this in mind, my hypothesis about aliens controlling my brain is no good – there is no experiment we can do (with current technology!) which will disprove this hypothesis. If you can’t disprove a statement, it doesn’t count as a scientific hypothesis.
Your hypotheses need to be precise in two ways. First, the language has to be clear. “That pizza tastes good” is simply not going to cut it, because “good” is not a thing we can test with an experiment. Subjective terms like ‘good’ and ‘bad’ are out. Second, you usually need to say who you’re talking about. It is very unusual for your research population to be all of humanity, so include the population in the hypothesis.
A better hypothesis about pizza is: “That pizza is preferred to hot dogs by middle aged, single Scottish men”. This hypothesis suggests the experiment which could test it, which is the hallmark of a very precise hypothesis.
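To see why a precise hypothesis suggests its own experiment, here is a minimal Python sketch. It assumes we have asked a sample of middle aged, single Scottish men which food they prefer. The answers below are invented purely for illustration, and the variable names are my own.

```python
from collections import Counter

# Hypothetical survey answers from a sample of middle aged, single Scottish men.
# These values are made up for illustration; they are not real data.
answers = ["pizza", "hot dog", "pizza", "pizza", "hot dog",
           "pizza", "pizza", "hot dog", "pizza", "pizza"]

# Count how often each food was chosen, then work out the share choosing pizza.
counts = Counter(answers)
share_pizza = counts["pizza"] / len(answers)

print(f"Preferred pizza: {counts['pizza']} of {len(answers)} ({share_pizza:.0%})")
```

Because the hypothesis names both the thing being measured (preference) and the population, the statistic to calculate falls straight out of it. Try doing the same for “That pizza tastes good” and you will quickly get stuck.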
You need to avoid causal overtones. Unless you are running a randomised, experimental study, you are not allowed to suggest that some thing is causing some other thing. Ever. Students are often trying to make their hypotheses sound interesting, or important, and so unintentionally introduce language which suggests they are hunting for causal relationships. I have taken a lot of marks off students for this simple mistake, and it is very easy to make by accident. Here are some examples:
None of this language – or any other variation implying that something is the direct result of something else – is ok in any scientific discipline. Weed it out. Here are the improved versions:
Why do we do this hypothesis hose-down? Science is inherently conservative in its statements about how the world works. We know that things which are correlated aren’t always caused by each other, and so to avoid making errors we have set up a fancy list of things you need to do in order to be allowed to claim that the relationship you’ve found is causal. This skepticism is a big part of thinking like a scientist.
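If you like, the wording rules in this post can be turned into a rough checklist in code. The sketch below is only a heuristic: the function name and both word lists are my own assumptions, not a definitive vocabulary, and a simple word match will miss plenty of sneaky phrasing (and cannot judge falsifiability at all).

```python
import re

# Hypothetical word lists: assumptions for this sketch, not an exhaustive vocabulary.
CAUSAL_WORDS = {"cause", "causes", "caused", "because", "leads", "results"}
SUBJECTIVE_WORDS = {"good", "bad", "best", "worst"}


def check_hypothesis(text):
    """Return a list of warnings for a draft hypothesis (rough heuristics only)."""
    words = re.findall(r"[a-z']+", text.lower())
    warnings = []

    # Golden rule: every hypothesis begins with the word 'that'.
    if not words or words[0] != "that":
        warnings.append("does not begin with 'that'")

    # More than one 'that' hints at several hypotheses bundled together.
    if words.count("that") > 1:
        warnings.append("more than one 'that'")

    # Flag subjective terms such as 'good' and 'bad'.
    subjective = sorted(set(words) & SUBJECTIVE_WORDS)
    if subjective:
        warnings.append("subjective wording: " + ", ".join(subjective))

    # Flag words that suggest a causal claim.
    causal = sorted(set(words) & CAUSAL_WORDS)
    if causal:
        warnings.append("causal language: " + ", ".join(causal))

    return warnings


print(check_hypothesis("That pizza tastes good"))
print(check_hypothesis("That my brother is older than me and that he likes to play golf"))
print(check_hypothesis("That pizza is preferred to hot dogs by middle aged, single Scottish men"))
```

The first two drafts each pick up a warning, while the Scottish pizza hypothesis comes back with an empty list. A clean result does not make a hypothesis good science, it just means it passed the wording checks.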
Ok, so you know what’s what with hypotheses – congratulations! In the next post we’ll meet the enemy of students everywhere – the “Null Hypothesis.”