If you want to have some fun, have a look at this site. Don’t miss the video at the bottom. It contains great insights into the difference between correlation and causality.
The page does not explain the results or why they come about. Let us generate 100 rows of 10 random values each. How many pairs of rows will then correlate better than 0.99? The answer is: practically none. We need a small trick: we give the data a trend. We do that by taking the cumulative sum of random numbers that are uniformly distributed in [-0.2, 0.8].
>n=100; X=cumsum(random(n,10)-0.2);
>X=X-mean(X); X=X/sqrt(sum(X^2));
>C=X.X'; C=setdiag(C,0,0);
>(totalsum(C>0.99))/(n^2-n)
0.00767676767677
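For readers without Euler Math Toolbox, here is a rough Python equivalent of the experiment, using only the standard library. It is a sketch under assumptions: Python's random.Random stands in for Euler's random(), the helper names make_rows, pearson and fraction_above are made up for this example, and each unordered pair of rows is counted once (which gives the same fraction as counting ordered pairs and dividing by n^2-n). It also runs the case without a trend, so the two fractions can be compared.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def make_rows(n_rows, length, trend, rng):
    """Rows of uniform random numbers in [-0.2, 0.8).

    With trend=True each row is the running (cumulative) sum of those
    numbers, so it drifts upwards by about 0.3 per step on average.
    """
    rows = []
    for _ in range(n_rows):
        steps = [rng.random() - 0.2 for _ in range(length)]
        rows.append(list(itertools.accumulate(steps)) if trend else steps)
    return rows


def fraction_above(rows, threshold=0.99):
    """Fraction of unordered row pairs whose correlation exceeds threshold."""
    pairs = list(itertools.combinations(rows, 2))
    hits = sum(1 for a, b in pairs if pearson(a, b) > threshold)
    return hits / len(pairs)


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run is repeatable
    print("no trend:  ", fraction_above(make_rows(100, 10, False, rng)))
    print("with trend:", fraction_above(make_rows(100, 10, True, rng)))
```

Without the trend the fraction should be zero or extremely close to it; with the trend it should be of the same small order as the Euler result, with the exact value depending on the seed.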
So almost 1% of all pairs (about 0.77%) correlate better than 0.99. That is no miracle; it is explained by the common trend. Let us take the first row that has more than one such partner and plot it together with its partners.
>i=min(nonzeros(sum(C>0.99)'>1)), j=nonzeros(C[i]>0.99),
2
[13, 14, 81]
>x=scalematrix(X[i])_scalematrix(X[j]); ...
>aspect(2); plot2d(x); plot2d(x,>points,>add):
If we remove the linear trend from the two variables (a and b in the commands below), the correlation is far less impressive, but it still exists and is clearly visible in the plot. It is a mere coincidence, of the kind that turns up in any 100 rows of 10 values with an upward trend.
>n=1:10;
>pa=polyfit(n,a,1); a=a-polyval(pa,n);
>pb=polyfit(n,b,1); b=b-polyval(pb,n);
>correl(a,b)
0.890282247324
>plot2d(scalematrix(a)_scalematrix(b)):
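The detrending step can be sketched in Python as well. It fits a least-squares straight line to each series and correlates the residuals, which is what polyfit(...,1) and polyval do in the Euler commands. The two series here are a hypothetical example, not the rows from the experiment: both grow linearly and carry small +/-1 wiggles that happen to be unrelated, and the names linear_detrend and pearson are made up for this sketch.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def linear_detrend(values):
    """Subtract the least-squares straight line fitted to (k, values[k])."""
    n = len(values)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in zip(xs, values)]


# Hypothetical series: two linear trends plus unrelated +/-1 wiggles.
t = range(10)
a = [2 * k + (1 if k % 2 else -1) for k in t]
b = [3 * k + (1 if k % 3 == 0 else -1) for k in t]

print(round(pearson(a, b), 2))  # 0.98: the shared trend dominates
print(abs(pearson(linear_detrend(a), linear_detrend(b))) < 1e-9)  # True
```

In this constructed case the wiggles are exactly uncorrelated, so the correlation drops from about 0.98 to zero once the trends are removed. In the random rows above, some chance correlation survives, as the value 0.89 shows.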
However, correlation is genuinely hard to interpret. Often, two correlated variables both depend on some third variable. An example is the connection between good child care and later success in life. Both are most likely to occur in privileged families. The same applies to good education and wealth. The latter correlation has often led to the advice to enforce education for a better society. While education is certainly a good thing to have, it does not help against injustice. You will simply get underpaid or unemployed, but well-educated, workers.
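The effect of a third variable can be illustrated with a small simulation. The numbers are entirely synthetic and the names are hypothetical: z plays the role of a privileged background, and a and b each depend on z plus independent noise, with no direct link between them. Correlating a and b directly gives a strong value; correlating what is left of each after a straight-line fit on z (a partial correlation) gives a value near zero.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def residuals(y, x):
    """What is left of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(42)  # fixed seed, synthetic data only
n = 2000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hidden common cause
a = [zi + rng.gauss(0.0, 0.5) for zi in z]   # depends on z only
b = [zi + rng.gauss(0.0, 0.5) for zi in z]   # depends on z only

print("direct correlation:", round(pearson(a, b), 2))
print("partial, z removed:", round(pearson(residuals(a, z), residuals(b, z)), 2))
```

With these parameters the direct correlation should come out close to 0.8 (the variance of z is 1 and each noise term adds 0.25, so 1/1.25), while the partial correlation should be near zero, since a and b have nothing in common except z.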