The Central Limit Theorem first appeared in 1738 in The Doctrine of Chances, a book by the French mathematician Abraham de Moivre, and has been proved several times since. It has a range of applications, including finance, networks, sports and data analytics.
In this article, I will walk through a business case in which I use the Central Limit Theorem to open a fashion store.
The Central Limit Theorem helps make sense of a large population of data no matter how it is distributed. When we sample the data, the sample mean not only approximates the population mean but also lets us make inferences about the population. As a rule of thumb, if the sample size is more than 50, we can assume that the Central Limit Theorem is at work.
Also, when we do not have data for the entire population (as is often the case), we can draw a large number of samples, and the mean of the sample means will approximate the population mean.
The normal distribution of the sample means is what connects the Central Limit Theorem to probability. If an event has occurred repeatedly in the past with a known mean and standard deviation, we can calculate the probability of the event occurring using z-scores (an example is given later in the article).
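To see this in action, here is a minimal sketch in Python. The skewed population, the helper name mean_of_sample_means, and the sample counts are all assumptions made up for illustration; they are not taken from my store data.

```python
import random
import statistics


def mean_of_sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (with replacement) and average their means."""
    rng = random.Random(seed)
    sample_means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]
    return statistics.fmean(sample_means)


# Hypothetical, clearly non-normal population (exponentially distributed values).
population_rng = random.Random(42)
population = [population_rng.expovariate(1.0) for _ in range(10_000)]

population_mean = statistics.fmean(population)
estimate = mean_of_sample_means(population, sample_size=50, num_samples=2_000)

print(f"Population mean:      {population_mean:.3f}")
print(f"Mean of sample means: {estimate:.3f}")
```

Even though the population is heavily skewed, the two printed values should land very close to each other.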
Welcome to My Store!
I worked hard this year and earned myself a good bonus. Usually, I just go on a trip using this extra cash but it has been a while since I started thinking of a side hustle. I want to start an online clothing store. I first selected some suppliers and got some sample tops, skirts, jackets, dresses, pants and intimates. I want to check if my taste is sellable.
To find out, I send the selection to 100 of my friends who are fashion influencers (and who also happen to be very famous women) and ask them to rate each category on a scale of 1 to 5. I now have 600 ratings in hand and, without going into details, I just want to see how they rated my selection. Where is the central tendency heading?
The following table is a subset of the data. Please download the workbook from the link below to gain access to the entire work.
Central Limit Theorem – Workbook
I take the mean of the ratings each friend has given me.
I then take each unique mean and calculate its frequency.
Now let’s watch the central limit theorem at work with the help of the following table.
Mean of all 600 ratings (population) = 3.0
Mean of the mean ratings given by each friend = 2.9
Inference – The mean of the population is extremely close to the mean of the sample means. Also note that as the sample size increases, the mean of the sample means moves closer to the population mean.
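The steps above (mean per friend, frequency of each unique mean, mean of the means) can be sketched in Python as follows. The ratings here are randomly generated stand-ins, not the values from my workbook, and the variable names are my own assumptions.

```python
import random
import statistics
from collections import Counter

# Hypothetical stand-in data: 100 friends, each rating 6 categories from 1 to 5.
rng = random.Random(7)
ratings = [[rng.randint(1, 5) for _ in range(6)] for _ in range(100)]

# Population: all 600 individual ratings.
all_ratings = [rating for friend in ratings for rating in friend]
population_mean = statistics.fmean(all_ratings)

# Sample means: one mean per friend.
friend_means = [statistics.fmean(friend) for friend in ratings]

# Frequency of each unique mean, rounded to one decimal for a readable table.
mean_frequency = Counter(round(mean, 1) for mean in friend_means)

mean_of_means = statistics.fmean(friend_means)

print(f"Mean of all 600 ratings: {population_mean:.2f}")
print(f"Mean of friend means:    {mean_of_means:.2f}")
for mean, count in sorted(mean_frequency.items()):
    print(f"{mean:.1f}: {count}")
```

When every friend rates the same number of categories, the mean of the friend means matches the overall mean exactly, so any small gap between the two is down to rounding.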
What about the standard deviations?
As you can see, the population is a bit spread out. We can find out how spread out the sample means are using the population standard deviation and the sample size.
Inference – Standard Deviation of the Sample Mean = Standard Deviation of Population / Square Root of Sample Size
When we look at the frequency distribution of the sample means, we can see that it approximates a normal distribution. For the sample means, mean = median = mode, which is characteristic of a normal distribution.
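This relationship can be checked empirically. The sketch below, again on made-up ratings rather than my real data, compares the formula with the spread of many simulated sample means; the sample size of 6 and the number of simulated samples are assumptions chosen for illustration.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 600 ratings between 1 and 5.
population = [rng.randint(1, 5) for _ in range(600)]
population_std = statistics.pstdev(population)

sample_size = 6

# Formula: standard deviation of the sample mean = sigma / sqrt(n).
theoretical_se = population_std / math.sqrt(sample_size)

# Simulation: draw many samples (with replacement) and measure the spread of their means.
sample_means = [
    statistics.fmean(rng.choices(population, k=sample_size))
    for _ in range(20_000)
]
empirical_se = statistics.pstdev(sample_means)

print(f"Population std. deviation: {population_std:.3f}")
print(f"Formula (sigma / sqrt(n)): {theoretical_se:.3f}")
print(f"Simulated spread of means: {empirical_se:.3f}")
```

The last two printed values should agree closely, which is the formula at work.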
The chart below shows the normally distributed sample means.
Using the Central Limit Theorem, we have gained an important insight about the suppliers. Our influencer friends’ ratings average only about 3.0 out of 5.0. So maybe we should try samples from a better supplier to cater to the influencers.
Changing My Supplier Based on Insight
Based on my insight from the trial, I inform the supplier that my potential customers are dissatisfied with the collection. She then informs me that she has some new samples in stock which are within my budget. I decide to give her another chance and get some more samples.
Before sending the samples to my friends, I want to check the probability that they will give the collection a rating of less than 4.0.
From the previous trial, I have the following information:
Mean of Ratings = 3.0
Std. Deviation of Ratings = 1.4
Based on this information we can say:
Mean of Sample Ratings = 3.0
Sample Size = 6
Standard Deviation of Mean Sample Ratings = Std. Deviation of Ratings / Square Root(Sample Size) ≈ 0.6
Using the standard deviation of mean sample ratings, we can calculate the Z-score, (4.0 - Mean of Sample Ratings) / Standard Deviation of Mean Sample Ratings, and then use the Z-table to find the probability that the ratings will be less than 4.0 for this supplier’s stock.
P(Rating from Friends < 4.0) = 96.56%
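For readers who prefer code to a Z-table, here is a minimal Python sketch of the same calculation. The function name prob_mean_below is my own invention, and it plugs in the rounded figures quoted above (mean 3.0, standard deviation 1.4, sample size 6).

```python
import math
from statistics import NormalDist


def prob_mean_below(threshold, mean, std_dev, sample_size):
    """Return the z-score and P(sample mean < threshold) under a normal approximation."""
    standard_error = std_dev / math.sqrt(sample_size)
    z_score = (threshold - mean) / standard_error
    # The standard normal CDF plays the role of the Z-table lookup.
    probability = NormalDist().cdf(z_score)
    return z_score, probability


z_score, probability = prob_mean_below(
    threshold=4.0, mean=3.0, std_dev=1.4, sample_size=6
)

print(f"Z-score: {z_score:.2f}")
print(f"P(Rating from Friends < 4.0): {probability:.2%}")
```

With these rounded inputs the result comes out at roughly 96%, slightly below the figure above. I assume the gap comes from the workbook using unrounded values for the mean and standard deviation.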