Archiv der Kategorie: Uncategorized – My Latex files are none of your business!

I have been using the ArXiv only once till now for a paper on Optimization which is too small to be published elsewhere. I thought it might be a good idea to upload my script on complex functions of the class I did 2020/2021. But I can’t do that.

The ArXiv wants my Latex code, my images and all files I have. Uploading the PDF is rejected because the PDF is detected to have been done by Latex. The server wants to produce an own version of the PDF from my files. I don’t agree to that procedure.

The final version of the script is the PDF. I do not want anybody else to produce an own copy of my work. I have no problems sending source code to colleagues to be used in their own classes, but not to the public. Even though this script will be completely free and open, I do not want to give away the control over its content.

Moreover, the script failed to compile on the server anyway. I tried it to see if the idea even works. But it does not as expected. I have my own organization of the files, and even changed that organization for the test. Of course, everything compiles here without problems.

My Latex files are none of your business, ArXiv.

Simulating and solving Penney’s game

I recently stumbled across some YouTube videos about Penney’s game. Here is one of them. The problem was actually new to me, like almost all problems in mathematics. There simply are too many. It is easier to come up with a problem than a a solution.

According to Wikipedia, Walter Penney published the problem in the Journal of Recreational Mathematics in 1969. The article seems to be hard to get, so simply let us start an analysis ourselves. After all, that sounds like a lot of fun, and I have written about a similar problem in this blog before.

Assume we toss coins with outcome 0 and 1 at equal probability. So we get a sequence of tosses like


We are studying the following game between two players:

  • Player A selects a triplet of tosses (like 0,0,1), and player B another triplet (like 1,0,0).
  • The player whose triplet appears first in the sequence of tosses wins.

Given the sequence above and the selected examples of triplets above, player B would win. His triplet 1,0,0 appeared immediately after four tosses.

If both players select their triplet secretly this is obviously a fair game.

There is a counterintuitive fact here: It is an illusion that the choice of the triplet matters. The average waiting time for any triplet is equal, and selecting (1,0,1) is no better than (1,1,1).

To understand this, we need to see the tossing as a sequence of triplets. In the example above the sequence is the following:

\((0,1,0) \to (1,0,0) \to (0,0,1) \to (0,1,1) \to (1,1,1) \to \ldots\)

All triplets have the same chance (namely 1/8) to appear at the start of the sequence. And any triplet T has exactly two other triplets S1, S2 which can lead to this triplet. Since S1 and S2 have the same chance (namely 1/8) to appear in the n-th position of the sequence and T is selected with probability 1/2 from each, the chance for T to appear in the (n+1)-th position is also 1/8.

After this first excursion into the mathematics of random walks, we need to come back to the third rule of Penney’s Game.

  • The player B selects his triplet after seeing the selected triplet of player A.

Now, there is no reason why player B shouldn’t have an advantage. After all he can simply take the same triplet and get a draw. If he was forced to take another triplet he might even be at a disadvantage. But, indeed, he has a big advantage of at least 2:1. A wrong selection of player A can make his chances even worse.

Let us try to understand this. This is a case of non-transitive ordering.

  • Let us write S>T if the triplet S is more likely to appear before triplet T in the toss sequence when the tosses are started with a random triplet.

Then it is not true that

\(T \ge S \,{\rm and}\, S \ge U \Longrightarrow T \ge U\)

If it were true, the ordering would be complete and player A could select the best triplet, or at least a triplet that cannot be beaten by the selection of player B.

It turns out that for each selected triplet T of player A, there is a selection S which is better than T (S>T). This makes the game a win for player B.

This is counterintuitive again. A consequence is that there is a sequence of triplets with the property

\(T_1 < T_2, \quad T_2 < T_3, \ldots \quad , T_{n-1}<T_n, \quad T_n < T_1\)

In this game, the simple rule to find a better triplet is the following:

  • If player A selects T=(X,Y,Z), player B selects S=(YN,X,Y), where YN is the opposite of Y, i.e., YN=0 if Y=1 and YN=1 if Y=0.

E.g., if player A selects (0,1,0) or (0,1,1), player B selects (0,0,1). For another example, if player A selects (1,1,0) or (1,1,1) player B selects (0,1,1).

How do we understand this? We use a general mathematical model.

A way to understand this is by considering random walks on a directed graph. The graph in this game has 8 vertices, the 8 possible triplets. It has 16 edges, i.e., pairs of vertices, with associated probabilities 1/2. Given any triplet, there are two possible tosses which lead to two possible edges. (Note that some edges go from the triplet to itself!)

E.g., there are two edges leading away from (1,1,0),

\((1,1,0) \to (1,0,0), \quad (1,1,0) \to (1,0,1)\)

or, from (1,1,1)

\((1,1,1) \to (1,1,0), \quad (1,1,1) \to (1,1,1)\)

One mathematical model for uses an adjacency matrix M, containing transition probabilities from vertex to vertex.

  • We set M(i,j) to the probability to go from triplet j to triplet i (i.e., to reach triplet i from triplet j) in one toss. So we have a matrix M of transition probabilities.

Now imagine starting a lot of experiments by walking from a start vertex to subsequent vertices with the given probabilities. The experiments select the start vertex according to a probability vector P, i.e., vertex i is selected with probability P(i).

  • If the distribution of vertices in the experiments is P at a step. Then the matrix multiplication MP will compute the distribution of probabilities at the next step.

Let us do a test in Euler Math Toolbox. First, we generate the adjacency matrix M for our game. Then we start at one specific node and compute the distribution after three steps.

>function makeM () ...
$M = zeros(8,8);
$for i1=0 to 1;
$  for i2=0 to 1;
$    for i3=0 to 1;
$      for j1=0 to 1;
$        for j2=0 to 1;
$          for j3=0 to 1;
$            i=4*i1+2*i2+i3;
$            j=4*j1+2*j2+j3;
$            if i2==j1 and i3==j2 then
$              M[j+1,i+1]=1/2;
$            endif;
$          end;
$        end;
$      end;
$    end;
$  end;
$return M;
>shortest M
    0.5      0      0      0    0.5      0      0      0 
    0.5      0      0      0    0.5      0      0      0 
      0    0.5      0      0      0    0.5      0      0 
      0    0.5      0      0      0    0.5      0      0 
      0      0    0.5      0      0      0    0.5      0 
      0      0    0.5      0      0      0    0.5      0 
      0      0      0    0.5      0      0      0    0.5 
      0      0      0    0.5      0      0      0    0.5 
>p=zeros(8)'; p[1]=1; p'
 [1,  0,  0,  0,  0,  0,  0,  0]
>p=M.p; p'
 [0.5,  0.5,  0,  0,  0,  0,  0,  0]
>p=M.p; p'
 [0.25,  0.25,  0.25,  0.25,  0,  0,  0,  0]
>p=M.p; p'
 [0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125]

The function to generate the matrix M is not the most elegant one. It might be better to use a recursion than a loop. After we generate the matrix, we start with a „distribution“ which puts everything into the first triplet (actually that is (0,0,0)). After only three steps, we see that the probability to reach each triplet is equal.

Let us now model Penney’s game.

We have two triplets T and S (vertex numbers i and j) which lead nowhere. I.e., if those states are met we have found a case where T or S are hit.

We modify the adjacency matrix accordingly and delete all edges leading away from T and S and connect T and S to themselves with probability 1. This is a modified adjacency matrix M(i,j). We now have to look at the following limit.

\(P = \lim_{n \to \infty} M_{i,j}^n P_0\)


\(P_0 = (1/8,\ldots,1/8)^T\)

which is our start distribution.

Let us simulate that in Euler Math Toolbox. We select the triplets T=(0,1,0) (third element of p) and S=(0,0,1) (second element of p) which according to our rule should be superior.

>M[,2]=0; M[2,2]=1;
>M[,3]=0; M[3,3]=1;
>shortest M
    0.5      0      0      0    0.5      0      0      0 
    0.5      1      0      0    0.5      0      0      0 
      0      0      1      0      0    0.5      0      0 
      0      0      0      0      0    0.5      0      0 
      0      0      0      0      0      0    0.5      0 
      0      0      0      0      0      0    0.5      0 
      0      0      0    0.5      0      0      0    0.5 
      0      0      0    0.5      0      0      0    0.5 
>p=ones(8)'/8; p'
 [0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125]
>p=M.p; p'
 [0.125,  0.25,  0.1875,  0.0625,  0.0625,  0.0625,  0.125,  0.125]
>p=M.p; p'
 [0.09375,  0.34375,  0.21875,  0.03125,  0.0625,  0.0625,  0.09375,
>loop 1 to 100; p=M.p; end;
 [0,  0.666667,  0.333333,  0,  0,  0,  0,  0]

Indeed, the two selected triplets catch all the distribution, and S=(0,0,1) is indeed twice as good as T=(0,1,0). I.e., player B will win twice as often.

Things get worse for T=(0,0,0) and S=(1,0,0). S is seven times as good as T! There are other combinations which are three times as good.

Due to the structure of the problem, the limit P will have only non-zero entries for T and S. The probability to be in any other triplet will tend to 0. T and S will catch everything.

But can we compute the winning chance for S without a simulation? For that we define

  • w(i) as the chance to reach S before T when starting at triplet i.

We are using the following properties of w.

  • w(i) = 1 for the vertex S with number i.
  • w(j) = 0 for the vertex T with number j.
  • w satisfies

\(w = M_{i,j}^T \, w\)

We are searching for a specific eigenvector. The following function of EMT computes w given and adjacency matrix M and i,j, and returns the probability to reach vertex i before vertex j.

>function getw (M,i,j) ...
$if i==j then return 0; endif;
$Mt[,i]=0; Mt[i,i]=1;
$Mt[,j]=0; Mt[j,j]=1;
$V = eigenspace(Mt',1);
$c = V[[i,j]]\[1;0];
$return sum((V.c)')/cols(Mt);
>M = makeM();
% Let us try our simulated examples.
>fraction getw(M,2,3), fraction getw(M,5,1)

The simulated example from above works, as well as another example which takes T=(0,0,0) and S=(1,0,0).

For the next step, we compute all probabilities that the triplet number i appears before the triplet number j when starting from a random triplet.

>function solvePG () ...
$M = makeM();
$for i=1 to 8;
$  for j=1 to 8;
$    W[i,j]=getw(M,i,j);
$  end;
$return W;
>fracformat(7); W, defformat;
      0    1/2    2/5    2/5    1/8   5/12   3/10    1/2 
    1/2      0    2/3    2/3    1/4    5/8    1/2   7/10 
    3/5    1/3      0    1/2    1/2    1/2    3/8   7/12 
    3/5    1/3    1/2      0    1/2    1/2    3/4    7/8 
    7/8    3/4    1/2    1/2      0    1/2    1/3    3/5 
   7/12    3/8    1/2    1/2    1/2      0    1/3    3/5 
   7/10    1/2    5/8    1/4    2/3    2/3      0    1/2 
    1/2   3/10   5/12    1/8    2/5    2/5    1/2      0 

The non-transitivity of the problem means that in each column there is a value >1/2. We can select this value and print the corresponding triplet in human readable form.

>function printtriplet (i) ...
$s = ")"; i=i-1;
$if mod(i,2) then s=",1"+s; else s=",0"+s; endif;
$i = floor(i/2);
$if mod(i,2) then s=",1"+s; else s=",0"+s; endif;
$i = floor(i/2);
$if mod(i,2) then s="(1"+s; else s="(0"+s; endif;
$return s;
>function printPG (W) ...
$for j=1 to 8;
$  w=W[,j]';
$  e=extrema(w);
$  printtriplet(j) + " -> " + printtriplet(e[4]),
% We get the well known solution of the game.
 (0,0,0) -> (1,0,0)
 (0,0,1) -> (1,0,0)
 (0,1,0) -> (0,0,1)
 (0,1,1) -> (0,0,1)
 (1,0,0) -> (1,1,0)
 (1,0,1) -> (1,1,0)
 (1,1,0) -> (0,1,1)
 (1,1,1) -> (0,1,1)

This is exactly the well-known rule that you find also in the Wikipedia article.

Understand Bayesian arguments!

I just stumbled over another problem of probability theory on the channel MindYourDecisions. The video is rather old already, but mathematical problems never age.

„A nursery has 3 boys and a number of girls at one day, plus an additional birth at night. The next day a child from the day before is picked at random, and it is a boy. What is the probability that the child born at night is a boy?“

I hate the form that this question for a probability is posed. For the casual audience this does not make much sense. They will just answer 50%, and I would not be angry about this answer. It is sort of correct. You need a mathematical education to understand the problem to start with. Let me explain.

Everyone who has not met this kind of problems before will argue that the probability of the night child being a boy is 50%. That is true. The fact that the survey picks a boy the next day does not change this at all. I mean, the randomly picked boy was a boy. So what?

The problem is that the question is asking something more involved. It asks you to consider all possible random experiments. I.e., you assume a random child at night and a random pick. Then you discard those experiments that do not fit to the story.

An extreme case would be that there are no boys at daytime. Then a girl at night would not fit to the story, and all experiments that fit feature a boy at night. The Bayesian mathematics calls this a 100% probability for a boy.

For the actual case of three boys at daytime, let us assume g girls. I now follow the explanation of the video, but with different wordings and a completely different concept of thinking.

There are two cases now.

Case 1: We have a boy at night. We expect this case to happen in half of the experiments. The chance to pick a boy the next day is then 4/(4+g). We throw away all experiments where a girl is picked, i.e., g/(4+g) of them on average.

Case 2: We have a girl at night. Again this happens in half of the experiments on average. The chance to pick a boy next day is 3/(4+g). We throw away all experiments where a girl is picked.

Now imagine both cases together. In the experiments that we did not throw away we have an overhead of 4 to 3 for case 1. Thus, the Bayesian mathematics says that the chance for case one is 4/7.

There are other ways of writing this up. But they are only shortcuts of the above thought experiment. And for me, defining a precise experiment is the clearest way to think about such problems.

The Ethics of AI

I recently have to read a lot about the consequences of AI with respect to ethics and moral. Let me quickly sketch my personal view on the subject.

First, let us talk about science fiction and AI robots that act like humans. In my view, it is rather simple:

Machines are not human beings,
even when they are equipped with AI or look like humans.
Robots are not part of our society.

In fact, it is probably better for us if the robots do not look like humans at all. That helps us keep the distinction. Machines, with AI or not, should be marked as machines.

The reason I am so sure and definitive about that is not a religious one. We humans are programmed genetically to be part of a human community. Our very self and the purpose of our life depends on other humans. Being lonely, an outlaw, or even just being pushed around make us sick. Children, friends, and admiration make us happy. We are proud to take responsibility for others. It is a misconception to think that people are basically selfish.

Now you can argue that the social system we live in could be enhanced and improved by AI robots which are on our level or even above, both in moral acting and in intellectual capabilities. These robots could live among us just like our friends. Wouldn’t that possibly be a better society? Aren’t humans inherently unreliable and the machines are not?

But any AI that acts in predictable and reliable ways is not an AI at all. It would act mechanical and can easily be determined to be a machine. The very characteristic of intelligence is that it explores new paths occasionally. This makes it unreliable and not predictable by definition. Soon, most AI will even act in super-human ways that we are incapable of understanding due to our limited capacities.

Thus, my only conclusion is that we need to distinguish ourselves from the AI we have. We need to treat AI as machines that we use. Robots are not a product of nature that can live with us uncontrolled with its own rights. We need to protect mankind, our society, and ourselves.

After all that science fiction nightmares, let us talk about the current way we use AI to improve technical systems. Even that limited form of AI already poses questions of ethics and moral.

Some see a lot of problems arising with AI or fuzzy logic that are built into cars, airplanes or surveillance systems. It is indeed true that these systems have the potential of a collision between a human decision and an AI decision. But that is already true without AI technology! We all experienced a system that seems to act on its own, a computer or even a mechanical device.

Think of an airplane with an AI as co-pilot. As a first example, assume the AI commands a go-around in an unstable approach. The chances are quite high that the AI is right and the pilot has made an error. I also tend to think that the AI makes much fewer mistakes throughout the complete flight as a co-pilot would. I would definitely prefer an AI to a rookie on the right seat. The same applies to nearly all applications of AI in technology. Usually, the AI is better. It can even be used to train the pilot in a simulator, much better and versatile than a human instructor.

It must be clear, however, that the responsibility for the proper working of the AI is at the developer of the plane, car or technical system that uses it. The AI is not a human that we can make responsible or even punish for mistakes. The developer has to test the system thoroughly to make sure it works as intended. But if you think this is too difficult note that it is much more difficult to test a human, and the verdict is much less reliable.

It must also be clear at all times that the AI that makes these decisions is a machine. When we let cars drive on their own the car does not become a being on our level, however intellectually superior it may be. If it does not work it will be trashed or repaired.

The problem with AI and robots should not be that they will be superior to us in many ways. The problem is that we need to treat them as our enslaved machines, and not as part of our society.

By the way, I am not afraid of super-human AI. Indeed, humanity deserves a lesson in humiliation. But let us use AI to our benefit!

Python or Java – What is better as first language?

Whenever I have the pleasure to teach the beginner course in programming I start with the same question: Python or Java? Since colleagues have good success with Python and since Python is a useful language I am tempted ever again. However, I decide to go with Java every time. Let me explain the reasons.

This year, I even started a discussion on a Python forum to push me towards Python and tell me why I should prefer it. The discussion was an interesting read, but nothing that convinced me.

So I did my own research on Python sites designed for beginners and see how they handle the teaching. As expected, they dive into Python and use its data collections almost from the start, at least shortly after using the Python command line for interactive computations. Some even go quickly into real world usage with Python packages for graphics, artificial intelligence or numerical mathematics.

If Python is that useful and if there are so many libraries available and also quite nice IDEs (my favorite is Spyder) why do I shy back to use it as first language? Here are some reasons.

  • I am teaching mathematicians. Mathematicians are meant to program, to test and to improve basic algorithms. They may also use high level libraries for research, but they should at least be able to start on the roots of the code, and often that is the only way to go. Python is not designed for this use. First, it is a lot slower than Java (which is on the level of C code, by the way). The difference can be a factor of 10 to 50. Then, it hides the basic data types (such byte, short, int, long, float, double) and their problems from the user and encourages to use its infinite arithmetic. I feel that mathematicians should understand the bits and bytes, and get full, speedy control of their code.
  • Python extends its language to compactify the handling of data sets at the cost of extending the language with almost cryptic syntax. E.g., Python can automatically and in one expression create a list of integers satisfying a certain condition or select from another list under a condition. I am convinced that this kind of behind-the-scene operation is not suited to teach beginners how things work and how to write good and clean code. The situation is a bit like Matlab where everything is vectored and loops are costly. You learn how to use the high-level constructs of your language, but not the basics. If you only have a hammer everything looks like a nail.
  • Python is as multi-platform or open-source as Java. Both need an interpreter and cannot be translated to native code to ease implementation of bigger programs on the user systems. Java is even a bit better in this respect. Both languages cannot easily use the native GUI library or other more hardware bound libraries. Python is a bit better here. But this is nothing to worry about as a first language anyway.
  • I also need to answer the question: Is it easier to go from Java to Python or the other way? And, for me, the answer is clearly that Java provides the more basic knowledge on programming. Python is on a higher level. It is more a scripting language than a basic programming language. One may argue that this is good for beginners since they get a mighty system right at the start. After all, you learn to drive a car without knowing how the motor works. But I have seen enough programs in Python to know that it spoils the programmer and drives the thoughts away from a fresh, more efficient solution.

This said, I would very much welcome a second course building on the basics that uses Python and its advanced libraries. Many students are taught R and Matlab in the syllabus. I think Python would be a better choice than Matlab for numerical programming and data visualization. And, in contrast to Matlab, it is open and free.

The Riemann Sphere

Since I am currently teaching complex function theory I am interested in visualizations of Möbius transformations as well as other analytic functions. The Riemann Sphere together with the Stereographic Projection is a good tool for this.

Above you see the basic idea. The points on the sphere are projected along a line through the north pole to the x-y-plane which represents the complex plane. The north pole associates to the point infinity. The formulas for the projection and its inverse are easy to compute with a bit of geometry. I have done so on this page using EMT and Maxima:

It is quite interesting to see what a Möbius transformation does to the Riemann sphere.

In this image, the north and south poles are mapped to the two black points. The other lines are the images of the circles of latitude and longitude. The image can be created by projecting the usual circles (with the poles in place) to the complex plane, applying the Möbius transformation, and projecting back to the sphere.

In fact, the projection and the Möbius transformation map circles or lines to circles or lines. They preserve angles. This is why the Stereographic Projection is often used for maps that should show the correct angles between paths. But, note that great circles that do not path to the pole are not mapped to lines, but to circles. So the shortest path on the surface of the Earth between two points is a circle on the projected map. Locally, this is only approximated by a line segment.

A Simple Combinatorial Approximation Problem

This is certainly a well-known problem. But I pose it here because there is a geometric solution, and geometric solutions are always nice.

Assume you have two sets of n points

\(x_1,\ldots,x_n, \quad y_1,\ldots,y_n\)

The problem is to find a permutation p such that

\(\sum_{k=1}^n (x_k-y_{p(k)})^2\)

is minimized. Of course, we can assume that the x’s are sorted. And it is easy to guess that the minimum is achieved if the permutation sorts the y’s. But how to prove that?

Here is a simple idea. We claim that swapping two of the y’s into the right order makes the sum of squares smaller. I.e.,

\(x_1 < x_2, \, y_1<y_2 \Longleftrightarrow (x_1-y_1)^2+(x_2-y_2)^2 < (x_1-y_2)^2+(x_2-y_1)^2\)

If you try that algebraically the right hand side is equivalent to

\(-x_1y_1-x_2y_2 < -x_1y_2-x_2y_1\)

Putting everything one side and factorizing, you end up with

\((x_2-x_1)(y_2-y_1) > 0\)

and the equivalence is proved.

But there is a geometric solution. We plot the points into the plane. Two points are on the same side of the symmetry line between the x- and y-axis if they have the same order. And the connection to the mirrored point will always be larger.

We can extend this result to more norms. E.g., the sorting of the y’s will also minimize

\(\sum_{k=1}^n |x_k-y_{p(k)}|, \, \max_k |x_k-y_{p(k)}|\)

As long as the underlying norm is symmetric with respect to the variables, i.e.,


The prove for this can be based on the same geometrical argument.

The speed of C, Java, Python versus EMT

I recently tested Python to see if it is good as a first programming language. The answer is a limited „yes“. Python has certainly a lot to offer. It is a mainstream language and thus there is a lot of support in the net, and there are tons of libraries for all sorts of applications. A modern version like Anaconda even installs all these tools effortlessly.

However, I keep asking myself if it isn’t more important to learn the basics before flying high. Or to put it another way: Should we really start to teach the usage of something like a machine learning toolbox before the student has understood data types and simple loops? This question is important because most high-level libraries take away the tedious work. If you know the right commands you can start problem-solving without knowing much of the underlying programming language.

I cannot decide for all teachers. It depends on the type of students and the type of class. But I can give you an idea of how much a high-level language like Python is holding you back if you want to implement some algorithm yourself. You have to rely on a well-implemented solution being available, or you will be lost.

So here is some test code for the following simple problem. We want to count how many pairs (i,j) of numbers exist such that i^2+j^2 is prime. For the count, we restrict i and j to 1000.


int isprime (int n)
	if (n == 1 || n == 2) return 1;
	if (n % 2 == 0) return 0;
	int i = 3;
	while (i * i <= n)
		if (n % i == 0) return 0;
		i = i + 2;
	return 1;

int count(int n)
	int c = 0;
	for (int i = 1; i <= n; i++)
		for (int j = 1; j <= n; j++)
			if (isprime(i * i + j * j)) c = c + 1;
	return c;

int main()
	clock_t t = clock();
	printf("%g", (clock() - t) / 1000.0);

Result was:

So this takes 0.132 seconds. That is okay for one million prime-number checks.

Below is a direct translation in Java. Surprisingly, this is even faster than C, taking 0.127 seconds. The reason is not clear. But I might not have switched on every optimization of C.

public class Test {

	static boolean isprime (int n)
		if (n == 1 || n == 2) return true;
		if (n % 2 == 0) return false;
		int i = 3;
		while (i * i <= n)
			if (n % i == 0) return false;
			i = i + 2;
		return true;

	static int count(int n)
		int c = 0;
		for (int i = 1; i <= n; i++)
			for (int j = 1; j <= n; j++)
				if (isprime(i * i + j * j)) c = c + 1;
		return c;

	static double clock ()
		return System.currentTimeMillis();
	public static void main (String args[])
		double t = clock();
		System.out.println((clock() - t) / 1000.0);

	Result was:

Below is the same code in Python. It is more than 30 times slower. This is unexpected even to me. I have to admit that I have no idea if some compilation trick can speed that up. But in any case, it is the behavior that an innocent student will see. Python should be seen as an interactive interpreter language, not a basic language to implement fast algorithms.

def isprime (n):
    if n==2 or n==1:
        return True
    if n%2==0:
        return False
    while i*i<=n:
        if n%i==0:
            return False
    return True

def count (n):
    for k in range(1,n+1):
        for l in range(1,n+1):
            if isprime(k*k+l*l):
                ## print(k*k+l*l)
    return c

import time

## Result was
## 98023
## 4.791218519210815

I have a version in Euler Math Toolbox too. It uses matrix tricks and is thus very short, using built-in loops in C. But it still cannot compete with the versions above. It is about 3 times slower than Python and 100 times slower than C. The reason is that the isprime() function is not implemented in C but in the EMT language.

>n=1000; k=1:n; l=k'; tic; totalsum(isprime(k^2+l^2)), toc;
 Used 13.352 seconds

Below is a version where I use TinyC to check for primes. The function isprimc() can only handle real numbers, no vectors. Thus I vectorize it with another function isprimecv(). This takes a bit of performance.

>function tinyc isprimec (x) ...
$  int n = (int)x;
$  int res = 1; 
$  if (n > 2)
$  {
$     if (n%2 == 0) res=0;
$     int i=3;
$     while (i*i<=n)
$     {
$         if (n%i==0) { res=0; break; }
$         i = i+2;
$     }
$  }
$  new_real(res);
$  endfunction
>function map isprimecv (x) := isprimec(x);
>n=1000; k=1:n; l=k'; tic; totalsum(isprimecv(k^2+l^2)), toc;
   Used 0.849 seconds