Archiv des Autors: mga010

A Simple Combinatorial Approximation Problem

This is certainly a well-known problem. But I pose it here because there is a geometric solution, and geometric solutions are always nice.

Assume you have two sets of n points

\(x_1,\ldots,x_n, \quad y_1,\ldots,y_n\)

The problem is to find a permutation p such that

\(\sum_{k=1}^n (x_k-y_{p(k)})^2\)

is minimized. Of course, we can assume that the x’s are sorted. And it is easy to guess that the minimum is achieved if the permutation sorts the y’s. But how to prove that?

Here is a simple idea. We claim that swapping two of the y’s into the right order makes the sum of squares smaller. I.e.,

\(x_1 < x_2, \, y_1<y_2 \Longleftrightarrow (x_1-y_1)^2+(x_2-y_2)^2 < (x_1-y_2)^2+(x_2-y_1)^2\)

If you try that algebraically the right hand side is equivalent to

\(-x_1y_1-x_2y_2 < -x_1y_2-x_2y_1\)

Putting everything one side and factorizing, you end up with

\((x_2-x_1)(y_2-y_1) > 0\)

and the equivalence is proved.

But there is a geometric solution. We plot the points into the plane. Two points are on the same side of the symmetry line between the x- and y-axis if they have the same order. And the connection to the mirrored point will always be larger.

We can extend this result to more norms. E.g., the sorting of the y’s will also minimize

\(\sum_{k=1}^n |x_k-y_{p(k)}|, \, \max_k |x_k-y_{p(k)}|\)

As long as the underlying norm is symmetric with respect to the variables, i.e.,


The prove for this can be based on the same geometrical argument.

The speed of C, Java, Python versus EMT

I recently tested Python to see if it is good as a first programming language. The answer is a limited „yes“. Python has certainly a lot to offer. It is a mainstream language and thus there is a lot of support in the net, and there are tons of libraries for all sorts of applications. A modern version like Anaconda even installs all these tools effortlessly.

However, I keep asking myself if it isn’t more important to learn the basics before flying high. Or to put it another way: Should we really start to teach the usage of something like a machine learning toolbox before the student has understood data types and simple loops? This question is important because most high-level libraries take away the tedious work. If you know the right commands you can start problem-solving without knowing much of the underlying programming language.

I cannot decide for all teachers. It depends on the type of students and the type of class. But I can give you an idea of how much a high-level language like Python is holding you back if you want to implement some algorithm yourself. You have to rely on a well-implemented solution being available, or you will be lost.

So here is some test code for the following simple problem. We want to count how many pairs (i,j) of numbers exist such that i^2+j^2 is prime. For the count, we restrict i and j to 1000.


int isprime (int n)
	if (n == 1 || n == 2) return 1;
	if (n % 2 == 0) return 0;
	int i = 3;
	while (i * i <= n)
		if (n % i == 0) return 0;
		i = i + 2;
	return 1;

int count(int n)
	int c = 0;
	for (int i = 1; i <= n; i++)
		for (int j = 1; j <= n; j++)
			if (isprime(i * i + j * j)) c = c + 1;
	return c;

int main()
	clock_t t = clock();
	printf("%g", (clock() - t) / 1000.0);

Result was:

So this takes 0.132 seconds. That is okay for one million prime-number checks.

Below is a direct translation in Java. Surprisingly, this is even faster than C, taking 0.127 seconds. The reason is not clear. But I might not have switched on every optimization of C.

public class Test {

	static boolean isprime (int n)
		if (n == 1 || n == 2) return true;
		if (n % 2 == 0) return false;
		int i = 3;
		while (i * i <= n)
			if (n % i == 0) return false;
			i = i + 2;
		return true;

	static int count(int n)
		int c = 0;
		for (int i = 1; i <= n; i++)
			for (int j = 1; j <= n; j++)
				if (isprime(i * i + j * j)) c = c + 1;
		return c;

	static double clock ()
		return System.currentTimeMillis();
	public static void main (String args[])
		double t = clock();
		System.out.println((clock() - t) / 1000.0);

	Result was:

Below is the same code in Python. It is more than 30 times slower. This is unexpected even to me. I have to admit that I have no idea if some compilation trick can speed that up. But in any case, it is the behavior that an innocent student will see. Python should be seen as an interactive interpreter language, not a basic language to implement fast algorithms.

def isprime (n):
    if n==2 or n==1:
        return True
    if n%2==0:
        return False
    while i*i<=n:
        if n%i==0:
            return False
    return True

def count (n):
    for k in range(1,n+1):
        for l in range(1,n+1):
            if isprime(k*k+l*l):
                ## print(k*k+l*l)
    return c

import time

## Result was
## 98023
## 4.791218519210815

I have a version in Euler Math Toolbox too. It uses matrix tricks and is thus very short, using built-in loops in C. But it still cannot compete with the versions above. It is about 3 times slower than Python and 100 times slower than C. The reason is that the isprime() function is not implemented in C but in the EMT language.

>n=1000; k=1:n; l=k'; tic; totalsum(isprime(k^2+l^2)), toc;
 Used 13.352 seconds

Below is a version where I use TinyC to check for primes. The function isprimc() can only handle real numbers, no vectors. Thus I vectorize it with another function isprimecv(). This takes a bit of performance.

>function tinyc isprimec (x) ...
$  int n = (int)x;
$  int res = 1; 
$  if (n > 2)
$  {
$     if (n%2 == 0) res=0;
$     int i=3;
$     while (i*i<=n)
$     {
$         if (n%i==0) { res=0; break; }
$         i = i+2;
$     }
$  }
$  new_real(res);
$  endfunction
>function map isprimecv (x) := isprimec(x);
>n=1000; k=1:n; l=k'; tic; totalsum(isprimecv(k^2+l^2)), toc;
   Used 0.849 seconds

More Remarks on Corona

It is, of course, interesting how the total mortality behaves in times of Corona. Unfortunately, I was searching in vain for numbers in Germany. What I found was a PDF file from Spain. I took the first statistics in the image above from it.

In Germany, each year about 1.1% of the population will die, which is a rate of one in approximately 31000 per day. So I would expect around 1450 deaths each day in Spain. The rate depends, of course, strongly on the age structure of the population. Maybe, I underestimate the pyramidal shape of the ages in Spain, so that the daily total mortality is indeed a bit lower than in Germany. The image suggests around 1200.

A statistical plot of that kind should always have a y-axis that starts with zero. So this plot is a bit misleading. But in any case, it suggests a 50% increase in total mortality per day in the last few days. That is due to Corona. As I said, the problem will average out over the year if appropriate measures are taken, and we might observe only a slight increase in total mortality.

Nevertheless, the main problem remains: We cannot handle the huge numbers of severely sick patients that would occur if we just let everything run as normal.

More Remarks on Corona

We learned a bit more now.

  • I am glad to hear that the social restrictions lead to a base infection rate of approximately one in Germany, i.e., one new infection per infectious. That is good news, but it must be even less. The number of cases must decrease to a number we can handle. Currently, my estimate is that every 100th German is infected, and probably the same in the US. That sounds manageable with care in social interactions.
  • And, of course, we still need to restrict travel, especially to countries that cannot take the measures we can because they are too poor. Those countries need immediate help. And, for god’s sake, please relieve the sanctions that the West has imposed to punish Eastern countries for not being cooperative.
  • Opening the kindergartens and schools is crucial for society. But it also imposes the biggest thread. Every nurse can tell you that one case of chickenpox means 30 cases of chickenpox where kids are playing together in larger numbers. And young parents will go through an unknown peak of strep infections while their kids are in that age. So, we need antibody tests as soon as possible, plus a new look at hygiene.
  • I always opted for more digital learning. Some don’t agree because they either think I wanted to replace teachers, or they underestimate the benefits of lonely study, trying, failing and thinking. It is obvious that learning is also a social activity and that teachers are needed for guidance. But some stuff must be done alone, and that can be very well stimulated by digital media.

More Remarks on Corona

What have we learned so far? The numbers are not clear, nor are the consequences. Moreover, there is a lot of distorted information, some can even be called fake. Let me tell you what I took from all that.

  • The most important problem that the new virus is causing is the exceeding number of cases of pneumonia and ARDS caused by fluid in the lungs. This seems to be a lot worse than in the ordinary flu or older Coronaviruses, probably because we are not yet immune. The virus is also causing other damages to your body, but lung problems are the most common ones. While the chances to need oxygen or ventilation are not known precisely yet and will depend on your age, the situation in the hospitals of the affected regions speaks a clear language. We need to concentrate on medical care and break the chains of infection to slow down the development of the disease!
  • The mortality seems to be high only in people of old age, especially those that are weakened by other diseases. We might observe that the overall mortality does not rise much at all. The number of deaths by COVID19 published so far all over the world supports this view. The conclusion is to protect the elderly as much as we can to spare them a painful and sudden death requiring intense medical care. We do that, again, by reducing social contact and keeping the number of infected people at any time as low as possible. By the way, hygiene and care should have been observed in contact with elderly people at all times.
  • The current economy is based on the production of consumer articles and marketing. Many countries have reduced or have been forced to reduce state activities that do not lead to direct profit, such as health care and education. This economy is not fit for any crisis that needs social solidarity. Since profit and interest are the driving forces and the main means of wealth distribution each recession throws us into a deep crisis. Politicians see that now and try to stabilize the situation. Essentially, this is a takeover and a regulation of the industry by financial means, which can be summed up as printing money. Even the idea of state control of key industries is revived. We should learn from that crisis that there must be a balance between the private sector of the economy and social needs covered by the state at all times. If that is out of balance, we are not flexible enough to fight a crisis like this one.

Keep healthy and stay at home for a while!

Remarks on the Corona Pandemic

Here are some remarks that I noticed in these difficult times. Some are math-related, some are not.

  • Humans seem to be unable to image big numbers or grasp the concept of exponential growth. In fact, I cannot really do that myself. What I have to do is compute figures and set them in relation to the total. To help us understand the problem here is one example: Letting the virus spread in a small town of 10000 can easily mean that 1/4 of them are infected at the same time, i.e., approximately 2500. About 1/20 of those need intensive medical treatment, and at least 1/100 need ventilation. We end with a number of 25. This is impossible to do for any countryside hospital. Those small numbers can be understood and they should frighten you to the point that you understand how necessary it is to break the chains of infection. Of course, the same computation for the complete US or German population is even more frightening.
  • Most of us do not understand how bad it is that the world population will grow to eleven billion in the near future. We should have stopped that by education, sharing of wealth and women’s rights thirty years ago. Moreover, we have destroyed local support chains by globalism and our excessive capitalistic system. This falls on our feet now. In African countries, the population has to live in slums at close contact and is depending on the global food chain, and also on the medical support of „first world“ countries. This is a recipe for disaster. Even we are affected by the problem because our support with masks and other medical aids is broken since China is no longer delivering. My only hope is that we come to rethink our world after that crisis.
  • We all cannot grasp probabilities properly. I am now over 60 years old. That means that my non-corona chance of dying in the next year exceeds the corona related chance, even if I catch the virus. I am more likely to die of a heart attack, get brain or lung cancer, get involved in a car accident, or get usual pneumonia during that year. We cannot avoid death. It is never a good idea to think about the chance to die too much. Remember the words of Epicurus: „The death is none of your business. When he is here you aren’t. When you are here he isn’t.“
  • The press and TV are now running on full speed to shovel numbers onto the public and present the drama of single cases. Unfortunately, this agenda together with a lack of social contact may cause a lot of harm to some individuals. If only we would take other causes of unnecessary deaths with the same degree of attention! I would prefer a more scientific attitude in the media, and more reticence to dramatize individual fates. This only leads to a distorted view of reality.

Mathematics of an Epidemic Disease

Since we are currently fighting the Corona epidemy I thought it might be a good idea to simulate the spreading of such a disease in Euler Math Toolbox. The numbers here are fictitious and can only deliver qualitative results.

The mathematical model I took is quite simple.

  • The disease has n1=7 days of incubation. During this time the infected person can infect others.
  • The disease has n2=7 days of severe sickness. I have assumed that there are no further infections during that time.
  • We start with the a-immune population plus 10 infected cases.
  • The chance for infection depends on the probability to contact an infected person during the incubation time and the number of contacts each day that could lead to an infection. I assume that a person has nc=0.2 to nc=0.3 contacts each day which could lead to an infection.
  • In the end, the population is either in the non-immune or the immune group.

So the total population is mapped into a vector x of groups as

  • x[1] : non-immune
  • x[2] to x[1+n1] : incubation
  • x[2+n1] to x[1+n1+n2] : sick
  • x[2+n1+n2] : immune

The formula for the number of new infections in x[2] (first day of incubation) is

\(x[2] = x[1] \left(1-(1-\frac{S}{P})^{n_c}\right)\)

where S is the number of incubated people and P is the total population. So 1-S/P is the chance that the contacted person is not infected, and thus the factor computes the chance for infection. Note that this factor is very small at the start since S/P is very small. Yet the exponential growth does its job over time.

Here is a typical plot of the epidemy with nc=0.3. This is equivalent to contacting 9 other people in a month in a way that can spread the disease. We see that almost 90% of the population has to go through the disease in this case.

Development with nc=0.3

Here is another plot with nc=0.25, equivalent to contacting 7.5 other people in a month.

Development with nc=0.25

That reduces the number of cases a lot and limits the peak number of sick persons from 37% to 22%. For the medical system, this is an enormous relief.

The code of Euler Math Toolbox to do these plots is listed below. You can cut and paste it into EMT to try yourself.

>n1=7; // 7 days of incubation time
>n2=7; // 7 days of sickness
>pm=0.01; // mortility rate
>pop=50000000; // population
>nc=0.3; // number of infectious contacts
>function oneday (x,n1,n2,pm,pop,nc) ...
$xn[2]=floor(xn[1]*(1-(1-sum(x[2:1+n1])/pop)^nc)); // new infections
$xn[1]=x[1]-xn[2]; // remaining uninfected
$xn[3:2+n1+n2]=x[2:1+n1+n2]; // progress one day 
$return xn;
>x=zeros(2+n1+n2); x[1]=pop; x[2]=10; // start with 10 cases
>function sim (x,n1,n2,pm,pop,nc) ...
$   xn=oneday(x,n1,n2,pm,pop,nc);
$   X=X_xn;   
$   until sum(xn[2:1+n1+n2])==0;
$   x=xn;
$return X;
>plot2d(Xs'/pop,color=[green,orange,red,blue]); ...
>labelbox(["immune-","infect+","sick","immune+"], ...
>  colors=[green,orange,red,blue],x=0.4,y=0.2,w=0.35):

We can also understand the problem a bit better if we study the number of new infections in x[2], the first day of incubation. It starts with our chosen number of initial infections. But those 10 induce 3 new infections and the 13 new take another 3. Then we have 16 and they infect 4, and so on exponentially.

New Infections

If the chance S/P of an infected contact is small we have

\(\left(1-(1-\frac{S}{P})^{n_c}\right) \sim \frac{n_c S}{P}\)


\(x[2] \sim n_c S\)

exactly as observed.

Another Youtube Problem

Recently, I found the following problem on Youtube:

\(x+y+z=1, \\ x^2+y^2+z^2=2, \\ x^3+y^3+z^3=3, \\ x^n+y^n+z^n= \,?\)

I did not continue to watch the video. So I have no idea what solution is presented. I decided to treat that as a challenge and to observe my steps, including the failed ones.

The problem sounds like a normal set of three equations with three unknowns that we can solve by eliminating. After all, this is the standard technique that we learn in school. But it is actually not that easy. Eliminating x from the second equation and solving for y yields an expression in y which contains a square root. Inserting everything in the third equation results in a mess.

Let us try numerically. We use Euler Math Toolbox here.

>function f2(x,y) &= 2-x^2-y^2-(1-x-y)^2
                        2                2    2
                     - y  - (- y - x + 1)  - x  + 2
>a3 &:= 3
>function f3(x,y) &= a3-x^3-y^3-(1-x-y)^3
                        3                3    3
                     - y  - (- y - x + 1)  - x  + 3

We see, that the equations do not have a real solution. But they have, of course, a complex solution by the main theorem of Algebra.

The second idea is to try some tricks on the equations. E.g., we could multiply the first equation with x, y, and z and add the three results. Because of the first and the second equation, we get


That is a new equation. But is not clear how this is going to help.

After a while, the mathematical instinct kicks in to try something more generic and simple. Pattern matching inside the brain brings a known trick to the mind. Any sequence


has an interesting property. If x is the zero of a polynomial


It satisfies the interesting property


Thus the sequence above satisfies a recursion formula. Now, if x,y,z are the three zeros of the polynomial p(t) all three satisfy the same recursion formula, and so does the sum of the three zeros.

\(s_n = – (as_{n-1}+bs_{n-2}+cs_{n-3})\)


\(s_n = x^n+y^n+z^n\)

So we know from our three equations and the trivial equation

\(s_0 = x^0+y^0+z^0=3\)


\(3 = -(2a+b+3c)\)

But we also know that

\(p(t) = (t-x)(t-y)(t-z) = t^3 – (x+y+z) t^2 + (xy+yz+xz)t -xyz\)

Once we see that, we get a=-1 immediately from our problem. Having played around with tricks, we also know b=-1/2. From that, we now have

\(a=-1, \quad b=-\dfrac{1}{2}, \quad c=-\dfrac{1}{6}\)

We can verify that with Euler Math Toolbox.

For that, we compute the zeros of the polynomial and check the equations numerically. As expected, two of the zeros are complex.

 [ -0.215425+0.264713i,  -0.215425-0.264713i,  1.43085+0i  ]

The expression for the real zero of p(t) is terrible and involves a third root.

\(z=u^{1/3} + \dfrac{5}{u^{1/3}} + \dfrac{1}{3}, \quad u=\dfrac{\sqrt{13}}{9 \cdot 2^{2/3}}+\dfrac{11}{54}\)

Thus, we get a recursion for the sum and thus for the equation sign in our problem.

\(s_n = s_{n-1}+\dfrac{1}{2}s_{n-2}+\dfrac{1}{6}s_{n-3}\)

Check with EMT.

>fraction sequence("x[-1]+x[-2]/2+x[-3]/6",[3,1,2,3],10)
 [3,  1,  2,  3,  25/6,  6,  103/12,  221/18,  1265/72,  905/36]
>fraction real(sum(s^[0:9]')')
 [3,  1,  2,  3,  25/6,  6,  103/12,  221/18,  1265/72,  905/36]

Can we make a problem with an easier solution? Sure! We only have to start with simpler zeros.

\(x+y+z=0, \\ x^2+y^2+z^2=14, \\ x^3+y^3+z^3=-18, \\ x^n+y^n+z^n= \,?\)

The answer is now

\(s_n = 1 + 2^n + (-3)^n\)

This kind of mathematical thinking looks like a touch of genius. But it is not. Not at all. It is just the summary of long experience in mathematics. The tricks involved are well known. They are just combined in the right way. Mathematical problem solving is just pattern matching using the patterns that have been exercised before. Maybe some stubbornness is needed, and a lot of trial and error. But no genius.

It does help very much to have a numerical and symbolic program like EMT available to check the computations.

The Two Envelopes Problem

Recently I came across a Youtube video by „vsauce“ explaining the two envelope paradox. The problem is at least half a century old, and most likely a lot older. It can be found in a 1953 book by Maurice Kraitchik which is the oldest citation in Wikipedia.

Someone places money into one of two identical envelopes and double the money into the second. We select one of the two envelopes without knowing if we have the first or the second one. We open it. Should we keep that money or return it and opt for the other envelope?

(Let us assume that the value in the first envelope is a rational number. So we cannot conclude with certainty which envelope we have opened. If we selected from integer numbers we would know the envelope in the case of an even value.)

The naive analysis goes like this:

Let X be the amount we have.  Since we have chosen the envelopes at random, the other envelope contains X/2 and 2X with the same probability. So our expectation after switching is

\(\dfrac{X/2+2X}{2} = \dfrac{5}{4} X\)

So switching sounds like a winning strategy. It gets more striking if we consider switching before even opening the envelope. Switching back and forth drives the expected value to infinity. This is clearly nonsense.

This paradox is a confirmation of my principle:

To analyze a probabilistic problem we need an experiment that we could, at least hypothetically, implement in a computer. Equivalently, we need a probability space to work with.

The two envelopes problem does not satisfy this principle. The reason is that it is impossible to select a rational number at random from the set of all rational numbers. You cannot even select an integer evenly from all the integers. We need a probability distribution for that. Think of how you would test the switching strategy on a computer. You will have to tell the machine how to select a number. It can select a number with equal probability from all the numbers it can represent. But it cannot do so from all, infinitely many numbers.

Now that we know that the original problem makes no sense, we can start to analyze the problem by assuming an a priori distribution for the value M in the first envelope. By doing so, we get a bunch of very interesting questions. (By the way, it is always easy to create questions in mathematics. Only a few questions have nice answers, however.)

Let us start by selecting M evenly from 1 to n. Clearly, we switch if the opened envelope, containing X, is odd. We will keep the money only if X>n. For n which is a multiple of 4, we have the following cases:

  1. M<=n/2: We will switch no matter if X=M or X=2M. Thus the average outcome is 3/2 M.
  2. M>n/2: We get 2M in every case.

It is an exercise to compute that our average result is

\(E = \dfrac{15n+14}{16}\)

This is a blog about Euler Math Toolbox too. So let us simulate the strategy in EMT. In this case, a loop would be a good idea, maybe written in C or Python. However, it is also possible to do this elegantly with the matrix language

>n=10; N=1000000;
>M=intrandom(N,n); // select a number between 1 and n
>sel=intrandom(N,2); // select 1 or 2 envelope
>X1=M*sel; X2=M*(3-sel); // X1 is in the selected, X2 in the other
>sum((X1>n)*X1+(X1<=n)*X2)/N // X1 for X1>n, X2 else
>(15*n+14)/16 // expected outcome

We could also try a distribution on the positive real numbers for M. A typical candidate is the exponential distribution. We decide ourselves to keep a value greater than some a>0.

In this case, we switch for all M<a/2 and M>a, getting 3/2 M on average. In half of the cases between a/2 and a, we get 2 M. In the other half we still get 3/2 M. Thus

\(E = \dfrac{3}{2}+\dfrac{1}{2} \int\limits_{a/2}^a x e^{-x} \, dx\)

The best choice is a = 4 ln(2) with

\(E \sim 1.68\)

We can simulate that too.

>n=10; N=1000000;
>M=randexponential(N); // select a number between 1 and n
>sel=intrandom(N,2); // select 1 or 2 envelope
>X1=M*sel; X2=M*(3-sel); // X1 is in the selected, X2 in the other
>a=4*ln(2); sum((X1>a)*X1+(X1<=a)*X2)/N // X1 for X1>n, X2 else

Note that switching all the time or never both expect E=3/2.

You might ask if there is a distribution for M which ensures that it does not matter to look inside the first envelope. But that is not the case. Such a distribution does not exist.m Knowing the distribution and looking at the first envelope will always yield an expected overhead for us.

Two Tilted Rectangles

I found this problem from the wonderful Math with bad Drawings site in my news feed. They cite it from Carolina Shearer. Her twitter account contains more such nice problems. It is the kind of problem which seems only adequate for advanced students. Sometimes, you can solve them by looking at it in the right way. Most of the time, you will start a lengthy computation. Often, you will notice in retrospect that the solution was quite easy and you could have guessed it.

For this problem, I did not see the solution. The problem, of course, is to fit two equal rectangles into a square in the shown manner. For a start, try to find a reason why the point 1/2 on the lower side solves the problem. You can use the general fact that a=b+c in the figure below.


I admit that I did the computations using Euler Math Toolbox. That works. However, there is an elementary fact that can be used to prove a+b=c. For a hint, consult the following image. If the green lines are equal the red lines are equal.