A hypothesis is a proposed explanation or prediction that serves as the starting point for an experiment.

It rests on limited evidence, and the results of your experiment either support it or refute it.

For example, I throw a pizza party every Saturday night for my friends and their families, and I use a wood-fired oven to make my pizzas. On average, my pizzas are rated 4 out of 5 stars.

My partner and I then decide to move to an apartment while our house is being remodeled. I keep throwing these parties, but now I bake the pizzas in an electric oven, using the same recipes.

I will still get 4 out of 5 stars when I use an electric oven. This is my hypothesis.

My hypothesis is not yet backed by empirical data; it rests on limited evidence. A hypothesis can be formed from limited evidence or through inductive or deductive reasoning.

Statistics is one way to test a hypothesis against empirical data. A statistical test expresses its result as a probability. Because data are noisy and samples are limited, the test can reach the wrong conclusion, and the two ways it can go wrong are called Type I and Type II errors.

### A Simple Example

Now let us use statistics to test our hypothesis.

The statement, ‘I will get 4 out of 5 stars for my pizzas even if I use an electric oven,’ is called the null hypothesis.

A null hypothesis is the prevailing belief, the default assumption that nothing has changed. Hypothesis testing asks whether the data give us enough evidence to reject (nullify) this prevailing idea.

H0 = I will get 4 out of 5 stars for my pizzas if I use an electric oven.

Therefore, the __alternative hypothesis__ is,

Ha = I will not get 4 out of 5 stars for my pizzas if I use an electric oven (It could be less than or greater than 4).

Following is a subset of the data collected from my friends for 36 weeks. Please download the workbook from the link below to gain access to the entire work.

S is the sample space. (All possible outcomes of the experiment.)

S = {1, 2, 3, 4, 5}

Average stars for wood-fired pizzas = 4

The standard deviation of stars for wood-fired pizzas = 1

For each element in the sample space, there is a probability associated with it.
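As a rough sketch of how these quantities can be computed, the snippet below estimates the probability of each outcome, the average, and the standard deviation from a list of star ratings. The `ratings` list is hypothetical and invented purely for illustration; it is not the data collected at the parties.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical star ratings, invented for illustration only
# (NOT the data collected at the pizza parties).
ratings = [5, 4, 4, 3, 5, 4, 3, 4, 5, 3]

sample_space = [1, 2, 3, 4, 5]
counts = Counter(ratings)

# Empirical probability of each outcome: its share of all ratings.
# Counter returns 0 for outcomes that never occurred.
probabilities = {star: counts[star] / len(ratings) for star in sample_space}

average = mean(ratings)   # arithmetic mean of the ratings
spread = pstdev(ratings)  # population standard deviation

print("P(stars):", probabilities)
print("average:", average, "standard deviation:", round(spread, 2))
```

The probabilities always add up to 1, because every rating falls on exactly one element of the sample space.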

We can apply the same method to the data for pizzas made in the electric oven.

Average stars for electric oven pizzas = 2.1

The standard deviation of stars for electric oven pizzas = 0.57

So, what is the threshold below which we reject the null hypothesis? In other words, what is the region of rejection?

By convention, the probability assigned to the region of rejection is set to 5%. If the probability of observing data at least as extreme as ours, assuming the null hypothesis is true, is less than 5%, then we reject the null hypothesis in favor of the alternative hypothesis. This probability is called the p-value.

Several tests in statistics can help us decide whether to reject the null hypothesis. Some of them are the Z-test, the F-test, and the chi-squared test, which can be chosen according to the type and size of the data at hand.
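To make the decision rule concrete, here is a minimal sketch of a two-sided, one-sample Z-test using the summary numbers quoted above. It rests on two assumptions that are not stated in the data: that the sample size is `n = 36` (one average rating per week), and that the wood-fired standard deviation of 1 can be treated as the known population value.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 4.0         # mean rating under the null hypothesis (wood-fired average)
sigma = 1.0        # wood-fired standard deviation, ASSUMED known
sample_mean = 2.1  # average rating for electric-oven pizzas
n = 36             # ASSUMPTION: one average rating per week for 36 weeks
alpha = 0.05       # probability assigned to the region of rejection

# How many standard errors the sample mean lies from the null mean.
standard_error = sigma / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Two-sided p-value: chance of a result at least this extreme if H0 is true.
p_value = 2 * NormalDist().cdf(-abs(z))

reject_null = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.3g}, reject H0: {reject_null}")
```

With these numbers, z comes out at roughly -11.4, far inside the region of rejection, so under the stated assumptions the null hypothesis would be rejected.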

### Errors in Hypothesis Testing

Consider this new scenario. Due to some error in data collection, we ended up with the wrong data for the electric oven and got the wrong results. All my guests thought the pizzas were as good as before, but their reviews were recorded incorrectly. In this case, we rejected a null hypothesis that was actually true.

Such an occurrence is called a __Type I__ error. Its probability is called alpha, also known as the significance level of the test.

The probability of not rejecting the null hypothesis when it is true is 1 – alpha. This is called the confidence level.
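One way to see what alpha means is to simulate many experiments in which the null hypothesis really is true and count how often the test rejects it anyway. The sketch below assumes normally distributed ratings with mean 4 and standard deviation 1, and 36 ratings per experiment; real star ratings are whole numbers, so this is only an approximation.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so the run is repeatable

# ASSUMED setup: normal ratings, known sigma, 36 ratings per experiment.
mu_0, sigma, n, alpha = 4.0, 1.0, 36, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
trials = 2000

false_rejections = 0
for _ in range(trials):
    # The null hypothesis is TRUE here: ratings really do average mu_0.
    sample = [random.gauss(mu_0, sigma) for _ in range(n)]
    z = (mean(sample) - mu_0) / (sigma / sqrt(n))
    if abs(z) > z_crit:
        false_rejections += 1  # rejected a true null: a Type I error

type_i_rate = false_rejections / trials
print(f"observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land close to alpha, and the remaining share of experiments, close to 1 – alpha, are the ones where the true null hypothesis was correctly retained.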

Now let us go back to the data at hand, which we know is correct. In that case, we know our null hypothesis is false. Suppose, however, that we fail to reject it anyway. This is called a __Type II__ error, and its probability is called beta.

The __power__ of a test is the probability of correctly rejecting the null hypothesis when it is false.

Power = 1 – beta
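As an illustration, the sketch below computes beta and the power of a two-sided Z-test. The helper `z_test_power` is a hypothetical name introduced here, and the true mean of 3.5 stars, the known standard deviation of 1, and the sample size of 36 are all assumptions chosen for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu_0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample Z-test (hypothetical helper)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # When the true mean is mu_true, the Z statistic is centred on `shift`.
    shift = (mu_true - mu_0) / (sigma / sqrt(n))
    # beta: probability that Z still falls between the two critical values,
    # so that we fail to reject a false null hypothesis.
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return 1 - beta


# ASSUMED scenario: the electric oven truly averages 3.5 stars.
power = z_test_power(mu_0=4.0, mu_true=3.5, sigma=1.0, n=36)
print(f"beta = {1 - power:.3f}, power = {power:.3f}")
```

Under these assumptions the power comes out at about 0.85. A larger sample or a larger true difference would increase it.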

A good test keeps the probabilities of both Type I and Type II errors low. For a fixed sample size, lowering one tends to raise the other, so the test must be designed to balance falsely rejecting the existing belief against falsely retaining it. As the power of the test increases for a given alpha, we move towards a more credible test.

