The central limit theorem is an important computational shortcut for generating, and making inferences from, the sampling distribution of the mean. This shortcut relies on a number of conditions, specifically:
In this simulation study, I use QQ-plots to compare graphically the sampling distribution of the mean generated by simulation with the sampling distribution implied by the central limit theorem. For the purposes of this simulation, I treat the population mean and variance as known and use the true population values rather than sample estimates.
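Because the population parameters are treated as known, the CLT-implied distribution is fully specified, so one option is to plot the simulated means against its exact normal quantiles (via `qnorm()` at `ppoints()` plotting positions) instead of against a second random normal sample. The sketch below shows that variation. `clt_qq()` is a hypothetical helper, not part of the original analysis, and the exponential population is an assumption chosen only because base R can sample it and its mean and standard deviation are both 1.

```r
# Hypothetical helper (not from the original study): QQ-plot simulated
# sample means against exact quantiles of the CLT normal approximation.
clt_qq <- function(sim_means, pop_mean, pop_sd, N, ...) {
  # Probabilities at the standard plotting positions used by qqnorm()
  p <- ppoints(length(sim_means))
  # Quantiles of N(pop_mean, pop_sd^2 / N), the CLT-implied distribution
  clt_q <- qnorm(p, mean = pop_mean, sd = pop_sd / sqrt(N))
  plot(sort(sim_means), clt_q, asp = 1,
       xlab = "Simulated sample means", ylab = "CLT quantiles", ...)
  abline(0, 1)  # points near this line mean the CLT approximation fits
  invisible(data.frame(sim = sort(sim_means), clt = clt_q))
}

set.seed(1)
N <- 10
R <- 2000
# Assumed example population: Exponential(rate = 1), mean 1 and sd 1
means <- rowMeans(matrix(rexp(R * N, rate = 1), nrow = R))
qq <- clt_qq(means, pop_mean = 1, pop_sd = 1, N = N)
```

Using exact quantiles removes the extra Monte Carlo noise that comes from simulating the CLT side, so any departure from the line reflects only the simulated means.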
Before generating any samples, I initialize the parameters that stay fixed throughout the study: the number of simulated sample means (the size of each simulated sampling distribution) and the location and scale of the skew-normal distribution.
To make the comparison systematic, I run a 4 × 4 factorial experiment and compare the distributions in the QQ-plots. The first factor is the sample size, N = 5, 10, 20, and 40. The second factor is the degree of skewness of the underlying distribution, which is the skew-normal distribution.
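The 16 cells of this design can be enumerated explicitly with base R's `expand.grid()`, using the slant values 0, 2, 10, and 100 that the study uses for the skewness factor. This is only an illustration of the design; the plotting code itself walks the same cells with two nested loops.

```r
# The 4 x 4 factorial design: every combination of sample size and slant.
# expand.grid() varies its first argument (N) fastest.
design <- expand.grid(N = c(5, 10, 20, 40), slant = c(0, 2, 10, 100))
nrow(design)  # 16 settings
head(design, 5)
```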
The skew-normal distribution has three parameters: location, scale, and slant. When the slant is 0, it reduces to the normal distribution; as the slant increases, the distribution becomes increasingly right-skewed. In this simulation, the slant takes the values 0, 2, 10, and 100, while location and scale are fixed at 0 and 1, respectively, for all settings.
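The code computes the population mean and standard deviation from closed-form moments of the skew-normal distribution. With location $\xi$, scale $\omega$, and slant $\alpha$, these standard results, restated here for reference, are

$$
\delta = \frac{\alpha}{\sqrt{1+\alpha^2}}, \qquad
\mu = \xi + \omega\,\delta\sqrt{\frac{2}{\pi}}, \qquad
\sigma^2 = \omega^2\left(1 - \frac{2\delta^2}{\pi}\right),
$$

and the central limit theorem approximates the sampling distribution of the mean of $N$ observations by

$$
\bar{X}_N \approx \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{N}\right).
$$

With $\xi = 0$ and $\omega = 1$, $\alpha = 0$ gives $\mu = 0$ and $\sigma = 1$, while $\alpha = 100$ gives $\mu \approx 0.798$ and $\sigma \approx 0.603$.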
```r
library(sn)  # provides dsn() and rsn() for the skew-normal distribution

# Fixed parameters
R <- 5000       # number of simulated sample means (value assumed; not stated in the text)
location <- 0   # skew-normal location (xi)
scale <- 1      # skew-normal scale (omega)

# 4 rows (one per slant) x 5 columns (density + one QQ-plot per sample size)
par(mfrow = c(4, 5))

for (slant in c(0, 2, 10, 100)) {
  ## Column 1: density of the underlying skew-normal distribution
  curve(dsn(x, xi = location, omega = scale, alpha = slant), -3, 3,
        main = if (slant == 0) "Distribution" else "",
        xlab = "", ylab = paste0("Slant = ", slant),
        xaxt = "n", yaxt = "n", cex.main = 1.5, cex.lab = 1.5)

  # Population mean and SD of the skew-normal distribution
  delta <- slant / sqrt(1 + slant^2)
  pop_mean <- location + scale * delta * sqrt(2 / pi)
  pop_sd <- scale * sqrt(1 - 2 * delta^2 / pi)

  for (N in c(5, 10, 20, 40)) {
    # Sampling distribution implied by the CLT: N(pop_mean, pop_sd^2 / N)
    sample_dist_clt <- rnorm(R, mean = pop_mean, sd = pop_sd / sqrt(N))

    # Simulated sampling distribution: R samples of size N, one per row
    random_skew <- matrix(rsn(R * N, xi = location, omega = scale, alpha = slant),
                          nrow = R, ncol = N)
    sample_dist_sim <- rowMeans(random_skew)

    # QQ-plot of simulated means (x) against CLT means (y); points on the
    # 45-degree line indicate agreement. Same axis limits in every panel.
    qqplot(sample_dist_sim, sample_dist_clt, asp = 1,
           main = if (slant == 0) paste0("N = ", N) else "",
           xlab = "", ylab = "", xaxt = "n", yaxt = "n",
           xlim = c(-1.6, 2.2), ylim = c(-1.6, 2.2),
           cex.main = 1.5, cex.lab = 1.5)
    abline(0, 1)
  }
}
```
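The QQ-plots give a visual comparison. A possible numeric companion, which is not part of the original study, is the Kolmogorov-Smirnov distance in each cell: the largest vertical gap between the empirical CDF of the simulated means and the CDF of the CLT normal approximation (smaller means closer). The sketch below assumes base R only; `rskewnorm()` and `ks_grid()` are hypothetical helpers, and `rskewnorm()` replaces `sn::rsn()` with the standard stochastic representation of the skew-normal distribution.

```r
# Hypothetical helper: skew-normal draws via the representation
#   X = xi + omega * (delta * |U0| + sqrt(1 - delta^2) * U1),  U0, U1 iid N(0, 1)
rskewnorm <- function(n, xi = 0, omega = 1, alpha = 0) {
  delta <- alpha / sqrt(1 + alpha^2)
  u0 <- abs(rnorm(n))
  u1 <- rnorm(n)
  xi + omega * (delta * u0 + sqrt(1 - delta^2) * u1)
}

# Hypothetical helper: KS distance between simulated sample means and the
# CLT normal approximation for every (slant, N) cell of the design
ks_grid <- function(R = 5000, xi = 0, omega = 1,
                    slants = c(0, 2, 10, 100), sizes = c(5, 10, 20, 40)) {
  out <- matrix(NA_real_, nrow = length(slants), ncol = length(sizes),
                dimnames = list(paste0("slant=", slants), paste0("N=", sizes)))
  for (i in seq_along(slants)) {
    # Closed-form population mean and SD of the skew-normal distribution
    delta <- slants[i] / sqrt(1 + slants[i]^2)
    pop_mean <- xi + omega * delta * sqrt(2 / pi)
    pop_sd <- omega * sqrt(1 - 2 * delta^2 / pi)
    for (j in seq_along(sizes)) {
      N <- sizes[j]
      # R samples of size N, one per row, reduced to their means
      means <- rowMeans(matrix(rskewnorm(R * N, xi, omega, slants[i]), nrow = R))
      out[i, j] <- ks.test(means, "pnorm", mean = pop_mean,
                           sd = pop_sd / sqrt(N))$statistic
    }
  }
  out
}

set.seed(2024)
round(ks_grid(), 3)
```

The resulting 4 × 4 table lines up with the rows and columns of the QQ-plot grid, so each number can be read next to its panel.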