My friend Maria is a great cook. Her pizzas are more than delicious, they are addictive. So, the chef and the fan teamed up and decided to open a small pizzeria in our neighborhood.
I performed a market study, which showed that 65% of the people in my town like pizza, at a 95% confidence level with a margin of error of +/– 5 percentage points. What does this mean?
Since it is not possible to collect the opinion of everyone in my town, we survey a sample of, say, 500 people. The 95% confidence level means that if I repeated the sampling many times, about 95% of the resulting ranges (sample result +/– the margin of error) would contain the population’s true opinion on pizzas. It is important that the sample size be large enough so that the results are as close to the population as possible.
The +/– 5 suggests that the share of people in this town who like pizzas lies between 60% and 70%, with 95% confidence.
Armed with Maria’s awesome pizza recipes and understanding of the market demand, we open the pizzeria.
We had overwhelming sales in the first week, where we served 945 pizzas.
Considering the demand for our pizzas, we decide to invest more in the business. But before taking this big step, we want to make sure this high demand reflects the population. Who knows, people could be coming in just because of the novelty.
When we want to compare our sample with the population to see if our sample fits well with the population, we use the chi-square test.
Therefore, I can use the chi-square test to check if the sample sales fit the population.
Null Hypothesis H0 – The sales reflect the demand.
Alternate Hypothesis Ha – The sales do not reflect the demand.
The following table is a subset of the data. Please download the workbook from the link below to gain access to the entire work.
Types of pizzas = 20
Total number of orders = 945
Calculating the Chi-Square Statistic
Step 1 – Calculate the Expected Value
The expected value is the average value I would get if I performed an experiment a large number of times. Since I have only one week’s worth of sales data, I spread my expected orders equally among all the pizzas on the menu.
Expected Orders = Total Orders / Types of Pizzas
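As a quick check of the arithmetic, here is a minimal Python sketch of Step 1. It uses the two figures quoted above (945 orders, 20 types of pizzas); the variable names are only illustrative.

```python
# Step 1: spread the week's orders equally across the menu.
total_orders = 945   # total number of orders in the first week
pizza_types = 20     # types of pizzas on the menu

expected_orders = total_orders / pizza_types
print(expected_orders)  # 47.25

# One expected value per pizza type, ready to compare with actual orders.
expected = [expected_orders] * pizza_types
```

Every pizza gets the same expected value here only because we assumed equal demand across the menu.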
Steps 2 onwards show how to calculate the p-value manually. If you are using Excel, just type the formula “=CHISQ.TEST(ACTUAL RANGE, EXPECTED RANGE)” to get the p-value.
Go directly to Step 7 to interpret the output.
Step 2 – Calculate Residual for Each Order
Residual = Actual Order – Expected Order
Step 3 – Square Residuals
Step 4 – Calculate (Square of Residual / Expected Orders)
Step 5 – Sum all the values from Step 4 to get the Chi-Square statistic.
Chi-Square Statistic = 11.25
A high Chi-Square statistic indicates a poorer fit with the population, and a low Chi-Square statistic indicates a better fit with the population.
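Steps 2 to 5 can be sketched in a few lines of Python. The order counts below are made up for illustration (four pizzas instead of the 20 in the workbook), and the function name chi_square_statistic is my own, not something from Excel.

```python
def chi_square_statistic(actual, expected):
    """Follow Steps 2 to 5 for paired lists of actual and expected orders."""
    # Step 2: residual = actual order - expected order
    residuals = [a - e for a, e in zip(actual, expected)]
    # Step 3: square each residual
    squared = [r ** 2 for r in residuals]
    # Step 4: divide each squared residual by its expected orders
    scaled = [s / e for s, e in zip(squared, expected)]
    # Step 5: sum everything to get the Chi-Square statistic
    return sum(scaled)


# Hypothetical data: 4 pizza types, 200 orders in total.
actual_orders = [50, 45, 55, 50]
expected_orders = [sum(actual_orders) / len(actual_orders)] * len(actual_orders)

statistic = chi_square_statistic(actual_orders, expected_orders)
print(statistic)  # 1.0
```

In this toy data only two pizzas deviate from the expected 50 orders, each by 5, so the statistic is small: 25/50 + 25/50 = 1.0.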
Step 6 – Calculate Degrees of Freedom
While calculating a metric, we use several data points. Some of those data points are free to vary, while the remaining ones are pinned down once the value of the metric is fixed.
(9+10+11) / 3 = 10
If you want to keep the average at 10, you can freely choose only 2 of the 3 data points; the third one is then fixed by the other two.
Degree of Freedom = n – 1
n is the number of data points.
In our case, Degree of Freedom = 20 – 1 = 19
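The idea behind n – 1 can be seen with a small Python sketch based on the average example above. The helper last_value is a hypothetical name used only for this illustration.

```python
def last_value(target_mean, free_values):
    """Return the one value that is forced once the others are chosen."""
    n = len(free_values) + 1           # total number of data points
    return target_mean * n - sum(free_values)


# Keep the average at 10 with 3 data points: pick any 2, the third is fixed.
print(last_value(10, [9, 10]))   # 11
print(last_value(10, [4, 12]))   # 14

# For the pizzeria: 20 types of pizzas, so 19 values are free to vary.
pizza_types = 20
degrees_of_freedom = pizza_types - 1
print(degrees_of_freedom)        # 19
```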
Step 7 – Using the Chi-Square statistic and the degree of freedom, we can find the p-value from a Chi-Square table.
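If you do not have a Chi-Square table at hand, the p-value can also be computed. The sketch below uses only Python's standard library and a textbook series for the incomplete gamma function; the function name chi_square_p_value is my own, and this is an illustration rather than a replacement for a tested statistics package (SciPy users can call scipy.stats.chi2.sf instead).

```python
import math


def chi_square_p_value(statistic, dof, max_terms=10000):
    """Right-tail probability of a chi-square distribution.

    Uses the series for the regularized lower incomplete gamma function
    P(a, x) with a = dof / 2 and x = statistic / 2, then returns 1 - P.
    """
    if statistic <= 0:
        return 1.0
    a = dof / 2.0
    x = statistic / 2.0
    term = 1.0 / a          # first term of the series
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-15 * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    lower_tail = total * math.exp(log_prefactor)
    return 1.0 - lower_tail


# Values from the pizzeria example: statistic 11.25, 19 degrees of freedom.
p_value = chi_square_p_value(11.25, 19)
print(round(p_value, 3))  # approximately 0.915
```

The series is accurate for moderate statistics like ours; for very large statistics the subtraction from 1 loses precision, which is one reason to prefer a library routine in real work.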
The p-value helps us understand whether the results we got from the experiment occurred by random chance or because of significant causes. It is the probability of seeing a deviation at least as large as the one we observed if the null hypothesis were true. If the p-value is low, it is less likely that the results are due to random chance alone. If it is a large percentage, it is quite plausible that the results are due to random chance.
It also helps us decide whether or not to reject the null hypothesis, by comparison with an alpha level (5% is the usual standard). The alpha level is the probability of rejecting the null hypothesis (the prevailing belief) when it is true.
It can also be considered as 100% – Confidence Level (95% in our case)
Alpha level = 100% – 95% = 5%
If p-value <= alpha level, we reject the null hypothesis.
If p-value > alpha level, we cannot reject the null hypothesis.
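The two rules above translate directly into code. This is a minimal sketch; the function name decide and the sample p-values are made up for illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: compare the p-value with the alpha level."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "cannot reject the null hypothesis"


# Hypothetical p-values, checked against the usual 5% alpha level.
print(decide(0.03))  # reject the null hypothesis
print(decide(0.40))  # cannot reject the null hypothesis
```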
Coming back to our business case, the p-value from the chart is between 0.900 and 0.950.
P Value = 0.91499
Converted into a percentage, p-value = 91.50%
Therefore, we have a very large p-value, well above the 5% alpha level, so we cannot reject our null hypothesis. The sales are consistent with the demand.