Recently, I read some comments about the spread between the minimum and the maximum of n random variables which are uniformly distributed in [0,1]. Usually, these comments refer to some "known facts". But we should not be shy to try to derive the results we cite. In fact, it is not difficult.
First of all, the minimum of n independent random variables is bigger than x exactly if all of them are bigger than x. So this probability is rather easy to compute. We get for the cumulative distribution of the minimum
\(P(X \le x) = 1-(1-x)^n.\)
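Spelled out: if we call the minimum X and the independent uniform values \(X_1,\dots,X_n\), then by independence
\(P(X > x) = P(X_1 > x) \cdots P(X_n > x) = (1-x)^n,\)
and the cumulative distribution above is just the complement of this.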
Now we get for the density function of the minimum
\(g(x) = \dfrac{d}{dx} P(X \le x) = n (1-x)^{n-1}.\)
In case you do not know the reasoning behind this: for small dx, g(x)dx is approximately the probability of finding the minimum in the interval [x,x+dx]. We have
\(P(x \le X \le x+dx) = P(X \le x+dx) - P(X \le x).\)
Divide by dx and let dx go to 0, and you get the density as the derivative of the cumulative distribution.
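As a quick sanity check, this density indeed integrates to 1 over [0,1]:
\(\int\limits_0^1 n(1-x)^{n-1} \, dx = \Big[ -(1-x)^n \Big]_0^1 = 1.\)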
I do not really trust myself. So let us check this in Euler Math Toolbox (EMT). We generate 1 million samples of 10 random numbers each and plot the distribution of the minima of the samples versus our density function.
>m=1000000; n=10; X=random(m,n); v=max(X)'; w=min(X)';
>plot2d(w,distribution=50); plot2d("n*(1-x)^(n-1)",>add,color=red):
Or we can check specific values.
>x=0.05; sum(w<=x)/m, 1-(1-x)^n
 0.401011
 0.401263060762
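The theoretical value is easy to verify by hand,
\(1 - 0.95^{10} \approx 1 - 0.598737 = 0.401263,\)
and the empirical frequency matches it to about three decimal places.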
A similar computation yields the distribution of the maxima of the samples.
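For completeness, here is that computation: the maximum is at most x exactly if all n values are at most x, so
\(P(\text{Max} \le x) = x^n,\)
with density \(n x^{n-1}\). This can be checked against the simulated maxima v in the same way, comparing sum(v<=x)/m with x^n.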
Now it is not exactly correct to compute the distribution of the spread (the difference between maximum and minimum) from these two distributions as if they were independent, because the minimum and the maximum of the same sample are not independent. To a first approximation, the result will be close to correct. But there is another approach.
If we assume that the minimum is at x, the spread will be at most d exactly if all other values are in [x,x+d]. This applies for x less than 1-d. For x greater than 1-d, the spread is automatically less than d, since all values then lie in [x,1]. Thus we get for the cumulative distribution of the spread S
\(P(S \le d) = \int\limits_0^{1-d} n (1-x)^{n-1} \left(\dfrac{d}{1-x}\right)^{n-1} \, dx + P(\text{Min} \ge 1-d).\)
The integrand is in fact constant in x. If we nevertheless enter this into Maxima via EMT, we get
>&assume(n>2); &assume(d>0); &assume(d<1);
>function s(d) &= diff(integrate(n*d^(n-1),x,0,1-d) ...
>   +integrate(n*(1-x)^(n-1),x,1-d,1),d)

                      n - 2
             (1 - d) d      (n - 1) n

>plot2d(v-w,distribution=50);
>plot2d("s",>add,color=red):
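Since the integrand is the constant \(n d^{n-1}\), the computation can also be finished by hand: the integral contributes \(n d^{n-1}(1-d)\), and \(P(\text{Min} \ge 1-d) = d^n\), so
\(P(S \le d) = n d^{n-1} (1-d) + d^n.\)
Differentiating with respect to d gives the density \(n(n-1) d^{n-2}(1-d)\), in agreement with the Maxima result above.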
The plot looks right. Again, we check a value.
>sum(v-w<=0.9)/m
 0.736254
>function cf(d) &= integrate(n*d^(n-1),x,0,1-d) ...
>   +integrate(n*(1-x)^(n-1),x,1-d,1)

                   n - 1    n
        n (1 - d) d      + d

>cf(0.9)
 0.7360989291
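Plugging n=10 and d=0.9 into the closed form confirms this value by hand:
\(10 \cdot 0.1 \cdot 0.9^{9} + 0.9^{10} \approx 0.387420 + 0.348678 = 0.736099.\)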
Could you please explain the interval length d/(1-x) a bit more?
Maybe it becomes clearer now that I have fixed the second formula. Sorry about that. But d/(1-x) is simply the chance to be in [x,x+d] under the condition that you are already >=x.
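In formulas: given that the minimum is at x, each of the other values \(X_i\) is uniformly distributed on [x,1], and for \(x+d \le 1\)
\(P(X_i \le x+d \mid X_i \ge x) = \dfrac{P(x \le X_i \le x+d)}{P(X_i \ge x)} = \dfrac{d}{1-x}.\)
Since the other n-1 values are independent, this gives the factor \(\left(\dfrac{d}{1-x}\right)^{n-1}\) in the integral for \(P(S \le d)\).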