“There are lies, damned lies, and statistics.” — Mark Twain

In many fields of science, we need statistics to interpret our data. Even if you don’t become a researcher, as an educated layperson you will want to understand conclusions based on simple statistical principles. The most mysterious value is also the most fundamental: the p value. What is the p value?

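The p value is the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one you actually observed. If you reject the null hypothesis whenever the p value falls below a chosen threshold, that threshold is the probability of rejecting a null hypothesis that is actually true.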

What is a null hypothesis?

“Null” can mean either nothing interesting or nothing fishy, depending on the experiment.
For example, if you are comparing the results of two treatments, the “nothing interesting” null hypothesis would be that the treatments do not differ in any discernible way. The “nothing fishy” type of null hypothesis often comes into play in genetics, where you expect progeny arrays to meet a mathematically predicted ratio unless there is some unusual type of inheritance or a genotype-specific environmental effect. The null hypothesis is often written as H0.

Why would a null hypothesis ever look false when it is really true?

Because of natural variability. It may be that your treatment really has no effect, but by chance one group happened to respond differently. This is bound to happen eventually if you repeat the experiment often enough. For example, even with a “fair” coin, if you do a gazillion trials of 100 tosses each, you will eventually get a trial that has 100 heads.
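To put numbers on the coin example, here is a minimal Python sketch that computes exact binomial probabilities for a fair coin. The function name `prob_at_least` and the choice of "60 or more heads" as a second, less extreme case are illustrative assumptions, not part of the original text.

```python
from math import comb

def prob_at_least(k, n, p_heads=0.5):
    """Probability of k or more heads in n tosses of a coin with P(heads) = p_heads."""
    return sum(
        comb(n, i) * p_heads**i * (1 - p_heads) ** (n - i)
        for i in range(k, n + 1)
    )

# A fair coin giving 100 heads in 100 tosses: possible, but astronomically rare.
all_heads = prob_at_least(100, 100)

# A milder surprise (hypothetical cutoff chosen for illustration).
sixty_plus = prob_at_least(60, 100)

print(f"P(100 heads in 100 tosses)        = {all_heads:.3g}")
print(f"P(60 or more heads in 100 tosses) = {sixty_plus:.3g}")
print(f"Average number of trials per all-heads trial = {1 / all_heads:.3g}")
```

The first probability is simply 0.5 raised to the 100th power, which gives a sense of how large a "gazillion" would have to be. The second shows that moderately lopsided results turn up by chance far more often, on the order of a few percent of trials.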

What kind of p value do we want?

It depends on the circumstances. By convention, the scientific community treats p < 0.05 as the threshold for most purposes, on the grounds that it strikes a reasonable balance between the risk of accepting a false H0 and the risk of rejecting a true one. (Contemplate that.) Using 0.05 as your cutoff means that, when the null hypothesis is really true, you have a 5% chance of rejecting it and concluding that your treatment had an effect even though it didn't. The further the p value falls below 0.05, the stronger the evidence against H0.
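The 5% figure can be checked by simulation. The sketch below is an illustration under stated assumptions, not a prescribed method: it draws a "control" and a "treated" group from the same normal distribution, so H0 is true by construction, and counts how often p falls below 0.05. For simplicity it uses a two-sample z-test with a known standard deviation of 1 rather than a t-test; the function names, group size, and number of experiments are all made up for the example.

```python
import random
from statistics import NormalDist

def two_sample_p_value(a, b, sigma=1.0):
    """Two-sided z-test p value for a difference in means, assuming a known common sigma."""
    standard_error = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (sum(a) / len(a) - sum(b) / len(b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_experiments=4000, n_per_group=20, alpha=0.05, seed=1):
    """Fraction of experiments that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: the treatment does nothing.
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_p_value(control, treated) < alpha:
            rejections += 1
    return rejections / n_experiments

rate = false_positive_rate()
print(f"Fraction of true-null experiments with p < 0.05: {rate:.3f}")
```

The printed fraction should land near 0.05, varying a little with the random seed. Lowering `alpha` reduces these false alarms, at the cost of making real effects harder to detect.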

The p value is determined by comparing your empirically determined value of a test statistic with the known distribution of that statistic under the null hypothesis. The appropriate test depends on the structure of your data. Some of the most basic tests include the t-test (for comparing the means of two normally distributed samples), the chi-square test (for evaluating counts of categorical data), and correlation analysis (for measuring the association between two quantitative variables).
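As a concrete case of comparing a test statistic with its known distribution, here is a sketch of a chi-square goodness-of-fit test for a genetics-style "nothing fishy" null hypothesis. The 3:1 expected ratio and the progeny counts are hypothetical, invented for illustration. With two categories the test has one degree of freedom, and for that special case the upper-tail probability of the chi-square distribution can be written with the complementary error function, so only the standard library is needed; the helper names are assumptions of this example.

```python
from math import erfc, sqrt

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    Valid only for 1 degree of freedom (two categories, no estimated parameters).
    """
    return erfc(sqrt(chi2 / 2))

# Hypothetical progeny counts for two phenotypes (made-up data).
observed = [290, 110]
total = sum(observed)

# H0: the phenotypes occur in a 3:1 ratio.
expected = [total * 3 / 4, total * 1 / 4]

chi2 = chi_square_statistic(observed, expected)
p = p_value_one_df(chi2)

print(f"chi-square = {chi2:.3f}, p = {p:.3f}")
if p < 0.05:
    print("Reject H0: the counts deviate from 3:1 more than chance would suggest.")
else:
    print("Fail to reject H0: the counts are consistent with a 3:1 ratio.")
```

For these made-up counts the p value comes out well above 0.05, so the deviation from 3:1 is the kind that chance alone produces fairly often. Tests with more categories need the general chi-square distribution, which a statistics library or a printed table provides.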