Posted by Alfred Prah on March 17, 2020 · 7 mins read
Introduction
Coverage probability is an important operating characteristic of methods for constructing interval estimates, particularly confidence intervals. It is defined as the long-run proportion of intervals that capture the population parameter of interest. We care about it because it tells us how often an interval built by a given method contains the true value of the parameter. Conceptually, one can calculate the coverage probability with the following steps:
1. Generate a sample of size N from a known distribution.
2. Construct a confidence interval.
3. Determine whether the confidence interval captures the population parameter.
4. Repeat steps (1) - (3) many times. Estimate the coverage probability as the proportion of samples for which the confidence interval captured the population parameter.
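The steps above can be sketched in a few lines of R. This is only an illustration: it uses a 90% t-interval for the mean of a standard normal, which is an assumption chosen for simplicity and is not one of the models compared in this post. The names `n_sims`, `true_mean`, and `captured` are likewise made up for the example.

```r
# Illustrative sketch only: coverage of a 90% t-interval for the mean.
# The t-interval is an assumed stand-in, not one of the post's four models.
set.seed(1)
N <- 201          # sample size
n_sims <- 2000    # how many times steps 1-3 are repeated
true_mean <- 0    # known population parameter

captured <- replicate(n_sims, {
  x <- rnorm(N)                                 # step 1: generate a sample
  ci <- t.test(x, conf.level = 0.90)$conf.int   # step 2: construct the interval
  ci[1] < true_mean && true_mean < ci[2]        # step 3: did it capture?
})

coverage <- mean(captured)   # step 4: proportion of captures
coverage
```

For a well-behaved method like this one, the estimate should land close to the nominal 0.90, up to simulation noise.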
Ideally, a 95% confidence interval will capture the population parameter of interest in 95% of samples. One can also calculate 80% or 90% confidence intervals. In general, an X% confidence interval should capture the population parameter of interest in X% of samples. In this blog post, I will perform a 2 × 4 × 2 factorial simulation study to compare the coverage probability of various methods of calculating 90% confidence intervals. The three factors in the experiment are:
1. True, underlying distribution
   standard normal
   gamma(shape = 1.4, scale = 3)
2. Model
   method of moments with normal
   method of moments with gamma
   kernel density estimation
   bootstrap
3. Parameter of interest
   sample min (1st order statistic)
   median
Other settings in the experiment that will not change are:
Sample size, N = 201
Outside the loop estimation
Generating Data
The true, underlying distribution is either the standard normal distribution, with mean = 0 and standard deviation = 1, or a gamma distribution with shape = 1.4 and scale = 3.
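A minimal sketch of the data-generating step might look like the following. The name `generate_data` appears in the pipeline later in this post, but the signature and body shown here are assumptions made for illustration.

```r
# Hypothetical helper: the argument names and the "norm"/"gamma" labels
# are assumptions, not the post's actual implementation.
generate_data <- function(N, dist = c("norm", "gamma")) {
  dist <- match.arg(dist)
  if (dist == "norm") {
    rnorm(N, mean = 0, sd = 1)           # standard normal
  } else {
    rgamma(N, shape = 1.4, scale = 3)    # gamma(shape = 1.4, scale = 3)
  }
}

set.seed(1)
x_norm  <- generate_data(201, "norm")
x_gamma <- generate_data(201, "gamma")
```

Keeping the distribution as an argument makes it easy to loop over both levels of the first factor.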
As mentioned earlier, there are 4 models we will be investigating in this experiment: method of moments with normal, method of moments with gamma, kernel density estimation, and bootstrap.
To estimate the sampling distribution of the parameter of interest under each of these models, we generate samples from the fitted model that have the same sample size as the data from the previous step, and then calculate the parameter of interest (min or median) on each sample. We can repeat this step as many times as we like, but for the purposes of this blog post, I'll limit the replicates to 5000. Now let's define the 90% confidence interval of the parameter of interest as the middle 90% of this sampling distribution. The lower confidence limit is the 0.05 quantile, and the upper confidence limit is the 0.95 quantile.
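As a rough sketch of this step, the function below builds the interval for two of the four models (method of moments with normal, and bootstrap). The name `estimate.ci` comes from the pipeline later in this post; the arguments `model`, `par.int`, and `n_rep` and the function body are assumptions for illustration.

```r
# Hypothetical sketch: only two of the four models are shown, and the
# argument names are assumptions rather than the post's actual code.
estimate.ci <- function(data, model = c("mom_norm", "boot"),
                        par.int = median, n_rep = 5000) {
  model <- match.arg(model)
  N <- length(data)
  # One function per model that draws a new sample of size N.
  # sd() uses the n - 1 denominator; at N = 201 the difference from the
  # strict method-of-moments estimate is negligible.
  draw <- switch(model,
    mom_norm = function() rnorm(N, mean = mean(data), sd = sd(data)),
    boot     = function() sample(data, N, replace = TRUE)
  )
  stats <- replicate(n_rep, par.int(draw()))          # sampling distribution
  quantile(stats, c(0.05, 0.95), names = FALSE)       # middle 90%
}

set.seed(1)
x  <- rnorm(201)
ci <- estimate.ci(x, "mom_norm", par.int = median, n_rep = 1000)
ci
```

Passing `par.int = min` instead of `median` would give the interval for the sample minimum.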
The confidence interval captures the true parameter if the lower confidence limit is less than the true parameter and the upper confidence limit is greater than the true parameter. To execute this "parameter-capturing" check, let's create a function that tests whether the confidence interval captured the true parameter. The function returns 1 if the confidence interval captured the true parameter and 0 otherwise.
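capture_par <- function(ci, true.par) {
  # 1 if the true parameter lies strictly between the two limits, 0 otherwise
  1 * (ci[1] < true.par & true.par < ci[2])
}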
Coverage Probability
It is now time to calculate the coverage probability, the long-run proportion of intervals that capture the population parameter of interest. To calculate the coverage probability for each of our models, we compute the mean of the "captures" obtained by repeating the above steps: generate_data %>% estimate.ci %>% capture_par
For the purposes of this blog post, I repeat this 1000 times. The mean of the captures for each combination is our estimate of its coverage probability.
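Putting the pieces together, one cell of the experiment (standard normal data, bootstrap model, median) could be sketched as follows. The helper bodies are simplified assumptions, nested calls in base R stand in for the `%>%` pipe, and the replicate counts are deliberately smaller than the 5000 and 1000 used in this post so that the example runs quickly.

```r
# Hypothetical end-to-end sketch for one combination of factors.
# n_rep and n_sims are reduced from the post's 5000 and 1000 for speed.
set.seed(1)
N <- 201
n_rep <- 500
n_sims <- 200

generate_data <- function(N) rnorm(N)   # standard normal, true median = 0

estimate.ci <- function(data, n_rep) {
  # Bootstrap sampling distribution of the median, then its middle 90%
  stats <- replicate(n_rep, median(sample(data, length(data), replace = TRUE)))
  quantile(stats, c(0.05, 0.95), names = FALSE)
}

capture_par <- function(ci, true.par) {
  1 * (ci[1] < true.par & true.par < ci[2])
}

captures <- replicate(
  n_sims,
  capture_par(estimate.ci(generate_data(N), n_rep), true.par = 0)
)
coverage <- mean(captures)   # estimated coverage probability
coverage
```

With so few repetitions the estimate is noisy; increasing `n_sims` tightens it at the cost of run time.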
The coverage probabilities for our various combinations are shown in the table below: