Rankings in Euler Math Toolbox

Euler contains some functions for statistical evaluations of vectors. For example, the function indexofsorted(v,x) efficiently searches for the elements of x in a sorted vector v. For vectors of length n, the complexity is n*log(n), which is also the complexity of the sort algorithm.

>n=1:1000000; m=shuffle(n);
>tic; indexofsorted(n,m); toc;
 Used 0.33 seconds

In statistics, an element that appears more than once in a vector is supposed to get the mean of the ranks of its occurrences. Instead of computing that in a loop, you should use the matrix language of Euler and existing functions. In fact, the function ranks() is available for a single vector.

>n=intrandom(1,10,10)
 [2,  6,  7,  3,  3,  2,  4,  5,  6,  1]
>ranks(n)
 [2.5,  8.5,  10,  4.5,  4.5,  2.5,  6,  7,  8.5,  1]

This returns 2.5 for 2, since 2 appears twice in the list and its two occurrences take the ranks 2 and 3. How can you implement ranks() efficiently? Here is the definition.

>type ranks
 function ranks (x)
     {x,i}=sort(x);
     r=indexofsorted(x,x)-(multofsorted(x,x)-1)/2;
     {i,i2}=sort(i);
     return r[i2];
 endfunction

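You see that x is sorted first, because indexofsorted() and multofsorted() are much more efficient than the unsorted versions. Subtracting half of the multiplicity minus 1 from the index is just the right way of computing the mean ranks: if a value occupies the m consecutive positions ending at k in the sorted vector, the mean of these positions is k-(m-1)/2.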
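
To see the formula at work, the intermediate vectors can be inspected for the example vector from above. This is only a sketch with the data typed in by hand. The names xs, k and m are chosen here for illustration, and the outputs are omitted.

>xs=sort([2,6,7,3,3,2,4,5,6,1]);
>k=indexofsorted(xs,xs)
>m=multofsorted(xs,xs)
>k-(m-1)/2

Going by the output of ranks(n) above, the last command should yield the ranks 1, 2.5, 2.5, 4.5, 4.5, 6, 7, 8.5, 8.5, 10 of the sorted vector, which ranks() then moves back to the original order.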

Finally, we have to put the results r into the right order, the original order of x. This is done by computing the inverse permutation of the index vector i, which was previously returned by sorting x. In fact, the inverse of a permutation is the index vector returned by sorting that permutation.
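
The statement about the inverse permutation can be checked with a small experiment. The following is a sketch that uses only the functions shown above. The names p, q and ip are chosen here for illustration.

>p=shuffle(1:6);
>{q,ip}=sort(p);
>p[ip]
>ip[p]

Sorting p returns the index vector ip with q=p[ip], and q is the vector 1 to 6. So both of the last two commands should print the vector 1 to 6, which means that ip undoes p and p undoes ip.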

 
