About Vectorization

Me being the author of a matrix language, you will be surprised that I do no longer favor the matrix approach in education. In case you do not know what I am talking about: A matrix language avoids loops by vectorizing functions and operators.

Let me give you an example. In a classical language you would have to write something like the following.

>t=0:0.1:10;
>s=zeros(size(t));
>for k=1 to length(t); s[k]=sin(t[k])*exp(-t[k]); end;

The syntax is from EMT, but it does not matter. You need to declare a vector, then fill it with a loop. In a matrix language, you can simply write

>t=0:0.1:10;
>s=sin(t)*exp(-t);

The function sin() will operate on vectors and create a vector of the sine of all elements. Likewise, exp() and the operator * of the multiplication.

EMT has made that more comfortable than Matlab, allowing to combine column and row vectors to matrices, and introducing functions that map automatically to the elements of their arguments. For more, see the documentation of EMT.

This is very handy. It seems that with this trick you can write short code that is easily understandable even by non-programmers. But is that really so?

The problems start if you want to vectorize more complicated data structures, or want to do more complicated things than just applying a function to a vector. You will soon find that you have to use your pick of matrix language on a very high level of proficiency. Here is one example which is just a little above the basics: Create a vector that contains those elements of a vector where the next element is not smaller.

>x=normal(100);
>x[nonzeros(diff(x)>=0)]

I bet it is easier to write a loop in an ordinary programming language than to find out how that works in your choice of matrix language and in an efficient way.

My key experience was the following task in Matlab: Read in the words in 10000 spam mails and 10000 good mails, and make a statistical function that can detect spam mail by the words in the mail. Yes, I could to it! It took a full day to find the proper Matlab tricks and get it to run under two minutes. The next day I tried in Java. It took me two hours to write the code using Java’s tokenizers and dictionaries. And the program finished the task in under 5 seconds! That typically happens if you use the wrong tool.

So, is this an argument to drop matrix languages altogether? No, certainly not. They are way too practical for simple purposes like handling simple data or a plotting a function. But you should not base the education in universities or colleges on tools like Matlab or EMT. Those tools go into the wrong direction if they are taken too seriously. I would even include R into this collection. But R is more often used for the  specialized statistical functions it contains, so it has merits that no current programming language can provide.

You need to ground the base with a programming language like C, C++, Java or Python. There, the student can learn about vectors, matrices and loops to handle them. These languages do also contain advanced tools like dictionaries and other collections. And you can program your own data structures. They are also way more efficient than any of the all-in-one packages that come with a matrix language. Moreover, it is not at all difficult to create the matrix functionality in Java.

Python is a not the worst of ideas to use for education in programming. It is a basic language, and there are tools for all sorts of purposes, including numerical matrices, statistics or symbolic computations. It is not as basic as C++ or Java, however, and quite different from languages used in the industry. But it is, for sure, closer to those real world programs than Matlab.

09. Januar 2017 von mga010
Kategorien: English | Schreibe einen Kommentar

Schreibe einen Kommentar

Pflichtfelder sind mit * markiert