Thanks to Kit Baum, a new package is available through SSC that simulates the central limit theorem: sdist.
A sound understanding of the central limit theorem is crucial for comprehending parametric inferential statistics. Despite this, undergraduate and graduate students alike often struggle to grasp how the theorem actually works and why researchers rely on its properties to draw inferences from a single unbiased random sample. This package, sdist, offers a tool for teaching and learning the central limit theorem via easy-to-generate simulations. Specifically, sdist can be used to simulate the central limit theorem by (1) generating a matrix of randomly generated normal or non-normal variables, (2) plotting the associated empirical sampling distribution of sample means, (3) comparing the standard deviation of that empirical sampling distribution to the standard error from the first randomly generated sample, and (4) automatically producing a side-by-side comparison of the two distributions.
The package can be obtained using the following:
Code:
ssc install sdist
The code is purposefully kept simple to promote student use and experimentation. For example, suppose the student wishes to simulate the central limit theorem by comparing the standard deviation of an empirical sampling distribution, built from 500 random samples drawn from a uniform distribution, with the standard error estimated from one of those samples. The student would type the following:
Code:
sdist, samples(500) obs(500) type(uniform)
Here, obs(500) indicates that the student wants 500 observations in each of the 500 samples.
When executed here, the code produced the following simple output:
Code:
------------------
      sd/se
------------------
sig_Xb       .013
se_Xb        .013
abs(diff)    0
------------------
The difference between sig_Xb and se_Xb is 0. The larger this difference, the poorer the single X variable standard error approximates the standard deviation of the sampling distribution. This may be due to one of two things: a small number of samples and/or a small sample size.
sdist also produced a graph, with the empirically generated sampling distribution and its associated parameters in the top panel and the variable distribution from one of the random samples and its parameters in the bottom panel.
The point, of course, is to illustrate that the standard error estimate from the bottom panel is very close in value to the observed standard deviation of the sampling distribution, despite the fact that the variable is not normally distributed. (In this example, the standard deviation of the sampling distribution and the standard error are reported as being exactly the same; however, as shown in the package help file, the student can use the round() option to report more precise estimates.)
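As a rough sanity check on the .013 figure, the theoretical value can be worked out by hand. This assumes that type(uniform) draws from the unit interval (0, 1), which is an assumption on my part since the interval is not stated above; the symbols below are just conventional notation, not sdist output names.

```latex
% Assumption: X ~ Uniform(0, 1), n = 500 observations per sample.
\begin{align*}
\operatorname{Var}(X) &= \frac{(1-0)^2}{12} = \frac{1}{12},
  & \sigma &= \frac{1}{\sqrt{12}} \approx 0.2887, \\
\sigma_{\bar{X}} &= \frac{\sigma}{\sqrt{n}}
  = \frac{1}{\sqrt{12 \cdot 500}}
  = \frac{1}{\sqrt{6000}} \approx 0.0129 .
\end{align*}
```

Rounded to three decimals this is .013, which is consistent with both sig_Xb and se_Xb in the output shown earlier.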
Uniform, normal, and Poisson distributions are currently available. Any result can be reproduced by running the set seed command before executing the sdist command.
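For anyone who wants to see the mechanics behind the comparison, here is a minimal hand-rolled sketch of the same experiment using Stata's built-in simulate command. It is not sdist's own code: the program name onesample, the seed value, and the choice of a uniform(0,1) draw are all illustrative assumptions.

```stata
* Hand-rolled sketch of the experiment; NOT sdist's internal code.
* Assumptions: uniform(0,1) draws, 500 observations, 500 samples, arbitrary seed.
clear all
set seed 12345

* Draw one sample of 500 uniform values; return its mean and estimated SE.
program define onesample, rclass
    drop _all
    set obs 500
    generate double x = runiform()
    summarize x
    return scalar xbar = r(mean)
    return scalar se   = r(sd) / sqrt(r(N))
end

* Repeat 500 times, keeping each sample's mean and standard error.
simulate xbar = r(xbar) se = r(se), reps(500) nodots: onesample

* SD of the 500 sample means = empirical SD of the sampling distribution.
summarize xbar
display "sd of sample means: " %6.4f r(sd)

* Standard error estimated from the first sample alone.
display "se from sample 1:   " %6.4f se[1]
```

The two displayed numbers should land close to each other, which is the comparison sdist reports as sig_Xb versus se_Xb. Changing the seed, the number of observations, or reps() is an easy way to see how the gap between them behaves.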
Best,
Marshall