A recent discussion about confidence intervals with users of statistical packages convinced me that there are some misunderstandings underway.

There is the simple form of a confidence interval, which is easy to understand. Assume we know the mean value and the standard distribution of a normal distributed random variable and make 6 independent repetitions, then we can easily give an interval such that 95% of the mean values of 6 repetitions are in that interval. Note, that this interval is not unique, but we will usually select a symmetric interval around the mean value.

But the other form is more difficult to understand, yet it is very frequent. We determine a confidence interval for a parameter of a random variable given an experimental output. For an example, we deduce from the observation of the data in the following EMT code that the expected value of the underlying random variable is in the interval [996.5,1005.79] with 95% confidence.

>data=[1001,997,1006,998,995,1010] [1001, 997, 1006, 998, 995, 1010] >f=invnormaldis(1-2.5%) 1.95996398454 >m=mean(data), d=dev(data), n=cols(data) 1001.16666667 5.77638872191 6 >m+[-f*d/sqrt(n),f*d/sqrt(n)] [996.545, 1005.79]

But what does this really mean? The interval [996.545,1005.79] we computed as a confidence interval for the expected value of our random variable is in fact the interval where the experimental mean value of 6 random experiments is in 95% of all cases under the assumptions that the mean value and the standard deviation are just the values we saw in the data. How can we say that this interval is a confidence interval for the unknown expected value of our random variable given the observed data? This is not easy to interpret.

There is one way to understand such estimates related to test theory. We assume for simplicity that the observed standard deviation is correct. Then, if the mean value of the data is larger than the upper bound 1005.79 of our interval, we see by the statistical properties of the normal distribution, that we have less than 2.5% chance to observe a mean value of 6 experiments which is less than 1001.17. Here is a picture of the situation, where we assume that the true mean value would be 1006.

The curve is the density for the distribution of the mean value of 6 experiments assuming an expected value of 1006 and the computed standard deviation 5.78. The green area has a proportion of less than 2.5%.

Likewise, if the true mean value would be less than the lower bound 996.454 of our interval, we would have less than 2.5% chance to get a mean value of 1001.17 or higher.

We conclude that if the expected mean value is outside of our interval, our observed mean value is rare. In some sense, it has less than 5% chance of happening. This is a somewhat imprecise statement. So we seek a better formulation of the meaning of our interval.

Fixing the estimator is the key to the understanding of the meaning of the estimator. So let us fix our procedure for finding an interval for m, given n observations. In EMT, we can use the following code. It is almost the same as above. But we used the inverse student-t distribution.

>function estimatemean (data, alpha=5%) ... $ n=cols(data); {m,d}=meandev(data); $ f=invtdis(1-alpha/2,n-1); $ return m+[-f*d/sqrt(n),f*d/sqrt(n)]; $endfunction >estimatemean([1001,997,1006,998,995,1010]) [995.105, 1007.23]

Our estimator should have the property that for all normal distributed random experiments with any mean m and any standard deviation d, the estimator must return an interval such that m is in that interval in 95% of the cases.

For the normal distribution, we need to check this only for m=0 and d=1. For this, the probability that 0 is in the interval has to be computed.

\(P \{ 0 \in [m-fd/\sqrt{n},m+fd/\sqrt{n}] \} \ge 1-\alpha\)

That is equivalent to

\(P \{ \dfrac{\sqrt{n} |m|}{d} \le f \} \ge 1-\alpha\)

The random variable in that equation is indeed student-t distributed being the quotient of an experimental mean and and experimental standard deviation. That is why we have replaced the easier to understand normal distribution by the student-t distribution.

There is a similar confidence interval for the parameter p of a binomial distribution. If we repeat a binomial experiment with probability p for success n times, we expect to see a success pn times. Now, if p is not known, we see a number of successes, say k. The estimate for p is clearly k/n. But we can determine a confidence interval for p in the same manner as above. This interval is called a Clopper-Pearson interval. In EMT, this is implemented in a function. Assume we observe 10 successes in 100 tries. Then the estimate for p is 0.1 and the following function computes a confidence interval.

>type clopperpearson function clopperpearson (k: natural, n: natural, alpha) a=invbetai(alpha/2,k,n-k+1); b=invbetai(1-alpha/2,k+1,n-k); return [a,b]; endfunction >clopperpearson(10,100) [0.0490047, 0.176223]

The inverse incomplete beta function is used only because it computes the inverse binomial distribution in a fast way.