I recently came across a really interesting concept in statistics while working through this course on #Statistics as part of the #DataScience specialization.
Introducing the Central Limit Theorem:
First, the boring bit, the theoretical definition: “central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.”
In real life, what this implies is that if you were to take a bunch of sufficiently large random samples and compute the average of some quantity within each sample, those averages would be approximately normally distributed. This holds regardless of how that quantity is distributed within each sample.
For example, suppose we were conducting a census and sent out a bunch of teams to measure, say, the height of the population, which could be, say, uniformly distributed. The mean of the heights measured by each team would be approximately normally distributed.
Further, the larger the sample that each team measured, the smaller the variance of the distribution of the teams' means would be: it shrinks as the population variance divided by the sample size.
If we were to conduct the same census on Mars, whose population's heights are normally distributed, the mean of the heights measured by each team would still be normally distributed.
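The census example above can be tried out as a small simulation. This is only a rough sketch using Python's standard library. The height ranges, the number of teams, the sample sizes, and the helper names (`team_means`, `uniform_height`, `normal_height`, `share_within_one_sd`) are all my own assumptions for illustration and not part of the course material.

```python
import random
import statistics


def team_means(n_teams, sample_size, draw, seed=0):
    """Return one mean height per team; each team measures sample_size people."""
    rng = random.Random(seed)
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_teams)
    ]


def uniform_height(rng):
    # Assumed Earth population: heights uniform between 150 cm and 190 cm.
    return rng.uniform(150.0, 190.0)


def normal_height(rng):
    # Assumed Mars population: heights normal with mean 170 cm, sd 10 cm.
    return rng.gauss(170.0, 10.0)


def share_within_one_sd(values):
    """Fraction of values within one standard deviation of their mean.

    For a normal distribution this is roughly 0.68.
    """
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return sum(abs(v - mu) <= sd for v in values) / len(values)


# Variance of Uniform(150, 190) is (190 - 150)**2 / 12.
uniform_var = (190.0 - 150.0) ** 2 / 12

earth_small = team_means(n_teams=2000, sample_size=5, draw=uniform_height)
earth_large = team_means(n_teams=2000, sample_size=50, draw=uniform_height)
mars_large = team_means(n_teams=2000, sample_size=50, draw=normal_height)

for label, means, theory in (
    ("Earth, n=5", earth_small, uniform_var / 5),
    ("Earth, n=50", earth_large, uniform_var / 50),
    ("Mars, n=50", mars_large, 10.0 ** 2 / 50),
):
    print(
        label,
        "| variance of team means:", round(statistics.variance(means), 2),
        "| population variance / n:", round(theory, 2),
        "| share within 1 sd:", round(share_within_one_sd(means), 3),
    )
```

If the theorem holds up, the variance of the team means should land close to the population variance divided by the sample size, and the share of team means within one standard deviation should sit near 0.68 for both planets, even though individual heights on Earth were drawn from a flat distribution.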