Syntax Design Errors in Programming Languages

The syntax of a programming language is more important than most people think. It makes the difference between being able to write clear code and not. This is especially true of systems like Matlab, Euler Math Toolbox or the R project, because these systems are used interactively. A syntax can be mysterious and impossible to decipher without tons of comments, or it can be as clear as a morning in spring.

One of the worst examples is LaTeX. The scripting language is already hard to master, but at least it tries to be readable. The programming language underneath, however, is beyond reasonable bounds. Here is an example.

  \setbox\@picbox=\hbox to#1\unitlength\bgroup \let\line=\@line
    \kern-#3\unitlength \lower#4\unitlength\hbox\bgroup\ignorespaces}
  \ht\@picbox=\@picheight \dp\@picbox=\z@

Okay, this is from the source files and only meant to be read by specialists.

Let us go to more common examples from Java or C. There are lots of things that make programming and debugging unnecessarily hard. One well-known example is writing "=" instead of "==". Today most compilers warn about the assignment in the following statement.

if (x=4);
{   printf("x is now %d. Reset x to 0.\n",x);
    x = 0;
}

But almost none detects the misplaced semicolon in the first line. By what I call a design error, a lone ";" is the no-operation statement. That is a bad idea. If you have ever struggled for an hour to find an innocent ";", you will understand my point. Unfortunately, Java has inherited the same problem. It would have been far better to introduce a keyword "noop".

The format string in the printf() above is another matter. It is useful, and readable with some basic knowledge. But it is not safe unless the compiler does some checks. In Euler Math Toolbox, I do such checks and allow only a selection of format strings. Without checks, it is a source of errors and crashes, and it can even be exploited as a security hole.
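What such a check might look like can be sketched in a few lines of Python. This is not the check that Euler Math Toolbox performs; the whitelist below (exactly one numeric conversion such as %d, %8.3f or %-10.2e) and the name is_safe_format are assumptions made only for illustration.

```python
import re

# Assumed whitelist: literal text (with "%%" for a percent sign) around
# exactly one numeric conversion with optional flags, width and precision.
ALLOWED = re.compile(r"^(?:[^%]|%%)*%[-+ 0]*\d*(?:\.\d+)?[dfeg](?:[^%]|%%)*$")

def is_safe_format(fmt):
    # Accept the format string only if it matches the whitelist pattern.
    return ALLOWED.match(fmt) is not None

print(is_safe_format("x = %8.3f"))   # True
print(is_safe_format("%s%s%s%n"))    # False: conversions outside the whitelist
```

Anything that does not match the pattern is rejected before it ever reaches the formatting routine.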

Another topic you will struggle with throughout your career as a programmer is "/" for integer division. Even Python, a well-designed language, had this problem until its developers recognized that it was a bad idea and changed it in Python 3, where "/" always means true division. Additionally, the use of "%", "~", "^" and the like for operators in C is unnecessary and only makes the code less readable.
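A minimal sketch of the Python 3 behaviour, assuming a standard Python 3 interpreter: "/" gives true division even for two integers, and floor division has to be requested explicitly with "//". The function names are made up for illustration.

```python
def mean_true(total, count):
    # "/" is true division in Python 3, even for two integers.
    return total / count

def mean_floor(total, count):
    # "//" must be written explicitly to get floor division.
    return total // count

print(mean_true(7, 2))    # 3.5
print(mean_floor(7, 2))   # 3
print(-7 // 2)            # -4: floor division rounds toward negative infinity
```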

Another frequent error in C is the following.

for (int i=0; i<10; i++)
  for (int j=0; j<i; i++)
     A[i][j] = i*j;

The flexible design of the for-syntax makes this error possible: nothing forces the incremented variable to be the one that is tested. And without a preprocessor, it is not possible to change this to a more fail-safe syntax.
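For comparison, here is a sketch in Python, where the loop header names the loop variable only once, so the mixed-up increment cannot be written at all. The function name fill_products and the list-of-lists layout are assumptions for illustration.

```python
def fill_products(n):
    # Build an n x n table; only entries with j < i are filled, as in the C loop.
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):   # j runs over 0..i-1; there is no increment to get wrong
            table[i][j] = i * j
    return table

table = fill_products(10)
print(table[3][:4])   # [0, 3, 6, 0]
```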

For another example, enter the following into R.

> -3<4
> -3<-4

The first yields TRUE; the second does not work, because R reads "<-" as the assignment operator. It simply is not a good idea to use "<-" for assignments. The language Pascal introduced ":=" for this, which I consider the best idea. Pascal also avoided many of the other problems mentioned here. But that language was very restricted on the library side and is out of fashion now.

In Matlab, Scilab and the like, spaces suddenly matter in vector definitions. Try the following.

> 1 -4
> [1 -4]

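The first yields -3, while the second yields a vector with the two elements 1 and -4. I removed that in EMT, as well as the use of round brackets for vector elements, which makes the element access v(4) look like the function call f(4).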

One could continue like this. But there is a more important topic. The layout of code is just as important as the syntax of individual commands. Usually, it is left to the educated programmer.

This leads to programmers who think it is a good idea to cram as much as possible into one line. At the extreme, there are Perl programmers who, for fun or in earnest, are proud of things like the following.

my $z = sub { grep { $a=$_; !grep { !($a % $_) } (2..$_-1)} (2..$_[0]) }

But even in ordinary programming, the desire to be as clever as possible usually yields mysterious code which is impossible to check for correctness.

The best idea is due to Python. It is line-oriented, and indentation has a meaning in the syntax.

Unfortunately, matrix languages like Euler Math Toolbox or Matlab have another problem: we need to cleverly vectorize computations to avoid loops. The early motivation for this was to get a concise language for interactive math. But that is a bad idea as soon as longer code has to be developed. I remember a project that involved reading words, sorting them into a dictionary and doing some statistics on the words. In Matlab, it took me a day to get a version that ran in 2 minutes, and I had to use a lot of vector tricks to achieve this. In Java, I could do it in one hour with a program that did the job in 6 seconds.
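To give an idea of how little code such a word count needs in a general-purpose language, here is a sketch in Python rather than the Java of the original project. The sample text and the name word_frequencies are made up for illustration; the original program and its data are not reproduced here.

```python
import re
from collections import Counter

def word_frequencies(text):
    # Lower-case the text and split it into runs of letters.
    words = re.findall(r"[a-z]+", text.lower())
    # Counter is a dictionary mapping each word to its number of occurrences.
    return Counter(words)

sample = "the quick fox and the lazy dog and the cat"
freq = word_frequencies(sample)
for word, count in sorted(freq.items()):
    print(word, count)
```

The dictionary does the bookkeeping that had to be simulated with vector tricks in the matrix language.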

For another example, let us compute the frequencies of the integers 1 to 10 in an array of 100 random integers in Euler Math Toolbox.

>seed(1); i=intrandom(1,100,10);
>u=unique(i); m=getmultiplicities(u,i);
       1    2    3    4    5    6    7    8    9   10
      10    7    7    8    9   12   11   11   16    9

This is clever and not too unreadable. But anyone with a basic knowledge of programming could do the same with a loop in C. To demonstrate how to code such a loop in EMT, I do it in EMT.

>m=zeros(10);  ...
>for k=1 to length(i); j=i[k]; m[j]=m[j]+1; end; ...
   [10,  7,  7,  8,  9,  12,  11,  11,  16,  9]

But this does not work for strings, while getmultiplicities() and unique() do. The clever algorithm is to sort the unique elements and the original vector and work with the sorted arrays. For this, we need libraries in our programming language and a bit of solid programming skill.
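The sorting idea can be sketched in Python as follows. This only illustrates the approach and is not the actual implementation of unique() and getmultiplicities() in EMT; the function name multiplicities is hypothetical.

```python
def multiplicities(values):
    # Sort once; equal elements then sit next to each other.
    ordered = sorted(values)
    unique, counts = [], []
    for v in ordered:
        if unique and unique[-1] == v:
            counts[-1] += 1      # same as the previous element: extend the run
        else:
            unique.append(v)     # a new run starts
            counts.append(1)
    return unique, counts

print(multiplicities([3, 1, 3, 2, 1, 3]))   # ([1, 2, 3], [2, 1, 3])
print(multiplicities(["b", "a", "b"]))      # (['a', 'b'], [1, 2])
```

Because it relies only on comparisons, the same code works for numbers and for strings.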

Thinking about this, I find it far more rewarding for a student to learn programming skills in a general programming language with good libraries, like C++, Java or Python, than to struggle with R or Matlab. For a mathematician, basic knowledge of algorithms and their coding belongs in a university education without any doubt. But a commercial program like Matlab does not.
