## The Bayesian and the Frequentist – II

This is not immediately related to Bayesian statistics. But it is a good argument for the frequentistic approach. I met the problem in a Youtube video, but the core of this problem is well known. The solutions are not always correct, nor is the linked video. Actually, I present two problems

**Problem 1**

You see two frogs and hear a croak. The croak can only come from a male. Then, what is the probability that one of the frogs is a female?

**Problem 2**

You meat a man, and he tells you that he has two kids, at least one of which is a boy. What is the probability of the other being a girl? And does the probability change if you know that the boy is born on a Tuesday?

Both problems are obviously only vaguely formulated. You need to make assumptions. E.g., in the following, let us assume that for each random frog or kid the probability of being male is 1/2. But, as we will see, the answer to Problem 1 depends on more assumptions. The intuitive answers to both problems tend to be completely wrong.

My point is that you need to imagine a Monte-Carlo simulation in both situations. If you cannot come up with an experiment any answer will be useless anyway. That is the heart of the frequentistic approach to statistics.

Let us start with Problem 2. So your simulation would assign genders to both kids by random, so BB, BF, FB and FF have the same probability 1/4. Note that there is BF and FB since we assign the gender to each kid separately. Then the simulation would discard the irrelevant case FF. We are left with BB, BF and FB with equal probability. Thus, 2/3 of the simulated cases contain a female. Here is a code for this in EMT.

>n=100000; Gender=(random(n,2)<0.5); Boys=sum(Gender)'; >sum(Boys>0 && Boys<2)/sum(Boys>0) 0.665760978144

As usual this code is in the style of a Matrix language. If you are not familiar with this you need to write a loop. Hint: „Gender“ is a nx2 matrix with 0=girl and 1=boy. „Boys“ is a vector that contains the number of boys in each of the 100 thousand samples.

Now, what changes if we know that boy is born on Tuesday? In our simulation, we would have to assign birth dates to the boys. We make the assumption that each day has the same probability. We discard every pair that has no Tuesday born boy. Let us do that in EMT first.

>n=1000000; Gender=(random(n,2)<0.5); Boys=sum(Gender)'; >TuesdayBoys=(random(n,2)<1/7 && Gender==1); TBoys=sum(TuesdayBoys)'; >sum(TBoys>0 && Boys<2)/sum(TBoys>0) 0.518327797038

Again, a loop may be more convenient for you if you are not familiar with the Matrix language of EMT.

If we think of the cases and their probabilities, we get the following cases with the probabilities

\(P(M_TM_T)=\dfrac{p^2}{4},\)

\(p(M_TM_O)=P(M_0M_T)=\dfrac{p(1-p)}{4},\)

\(P(M_TF)=P(FM_T)=\dfrac{p}{4}\)

using the obvious abbreviations (MT for a Tuesday boy, MO for any other boy, and F for a girl) and p=1/7. The cases are exclusive to each other. Thus the probability for a girl under these assumptions is

\(\dfrac{2p/4}{p^2/4+2p(1-p)/4+2p/4} = \dfrac{2}{4-p} = \dfrac{14}{27} \approx 0.5185\)

That agrees to our experiment in three digits. Random Monte-Carlo experiments like the one we performed are not very accurate. With a programming language, however, you can make much larger experiments.

Let us turn to Problem 1. Simulating the frogs means we have to decide for a probability of gender and for the probability of croaking. Moreover, we have to decide what to do with two croaking males.

Assume, you cannot distinguish one from two croaks. Then this is the same as in Problem 1 with a general probability p instead of 1/7. The extreme case is p=1 with the result 1/3 for a female. It is just the same case as in Problem 2 without the Tuesday information. The other extreme case is p close to 0. Then, if you hear a croak the probability for a female is close to 1/2.

The situation is quite different if you assume that there was only one croak. For p=1 this means there must be a female for sure. For p->0 you get 1/2 as before. I leave that to work out for you.

What about a Bayesian approach? We want to compute something like P(Female|Croak). But what do we know? We could use the usual Bayesian trick

\(P(F|C) = P(C|F) \dfrac{P(F)}{P(C)}.\)

We have to work out the probabilities on the right hand side, nevertheless. You can make all sorts of non-sense now. Clearly, P(F)=1/4. But the other probabilities are difficult to compute. So your are left with the definition of the conditional probability

\(P(F|C) = \dfrac{P(F \cap C)}{P(C)}\)

And that is exactly what we did above. There is no help from Bayes here.

## A Problem of Logic

### The Problem

Recently, I stumbled across a very interesting problem. I closed the site and started to think about the solution. Therefore, I neither have a link nor the solution given on the page. Let us try our luck with it. (I found a similar problem here. The solution is similar to the one I found. But here, I try to say more about the problem.)

Problem: Two prisoners A and B are given two numbers, each between 0 and 10. A is given the number a=6 and B is given the number b=4. They are told that the numbers add to 9 or 10. As usual, the prisoners cannot communicate in any way. Each day, A is asked if the sum is 9 or 10, and then B is asked the same question. Either prisoner can pass or claim to know the answer. If the answer is correct both regain their freedom, if not both are executed.

Can they find a secure solution without ever communicating between each other? If they both are clever enough they can. Now, you can leave this page for thinking if you wish. But mark it so that you can find back in case you cannot solve the problem.

The logic gets very involved if you start with the given values a=6 and b=4. A knows from the start that either b=4 or b=3. This he knows that B knows that he has a number between 5 and 7 etc. If you think that way, you are in for a problematic approach.

### The Solution

It took me several attempts to change my thinking. Let us call „shared knowledge“ the facts that both prisoners know and that both prisoners know the other prisoner knows. In fact, we think of what an observer would know. We ignore the specific numbers a=6 and b=4 for a moment.

Then, after A passes, everyone knows (including the observer) that A cannot have a=10. So he has any number between 0 and 9. Likewise, B cannot have b=10 when he passes. But notice that B can not have the number b=0. Because A has less than 10 he would immediately know the answer a+b=9. Thus, after the first day, we write the shared knowledge as

\(D_1 : \quad 0 \le a \le 9, \quad 1 \le b \le 9\)

Once, A passes the second day we know that he has neither a=0 or a=9. Otherwise, he would have known the sum as you will easily check. So a is between 1 and 8. If B passes too we know that b=1 and b=9 are impossible. So after the second day everyone knows

\(D_2 : \quad 1 \le a \le 8, \quad 2 \le b \le 8\)

Continuing like that we get

\(D_3 : \quad 2 \le a \le 7, \quad 3 \le b \le 7\)

\(D_4 : \quad 3 \le a \le 6, \quad 4 \le b \le 6\)

Now, A knows that a=6. Thus either b=3 or b=4. But b=3 no longer is possible. So when A is asked at Day 5, A knows for certain that a=6 and b=4.

Note that B has to decide between a=5 and a=6 which he cannot do earlier with this reasoning. But assume the correct numbers are a=6 and b=3. Then B has to decide between a=6 and a=7. We can do so when he is asked at Day 4. He will then already know A:3-6.

This solved the problem in our specific case. The logic is rather simple, but weird.

### Deceptive Ideas

Let us try to shorten the process and put our logic to a test. We ignored the specific knowledge of A and B about the number they have. A knows a=6, and B knows b=4. So A knows that b=3 or b=4. Thus A knows that B knows that A has a number between 5 and 7. Likewise, B knows that A knows that B has a number between 3 and 5. So, you might deduce that the shared knowledge between A and B at Day 0 is

\(D_0 : \quad 5 \le a \le 7, \quad 3 \le b \le 5 \tag{1}\)

With the logic from above, we deduce that A would know that answer with a=7. So, after a pass, B knows a=5 or a=6. With either b=3 or b=5 he could immediately deduce a+b=9 or a+b=10. Thus, when he passes he must have b=4. A could then declare the result after only two passes when asked at Day 2.

This is a false and deceptive logic. It cannot be correct, because if a=6 and b=3 then A will declare a+b=9 with the same logic on Day 2. In fact, the „shared knowledge“ becomes

\(D_0 : \quad 5 \le a \le 7, \quad 2 \le b \le 4 \tag{2}\)

From that B will exclude a=5 after a pass from A, which changes the outcome completely.

What goes wrong here? While both prisoners will agree to the facts in (1) and(2), and while both are true, they cannot come up with these facts out of their own knowledge. So they have no common agreement about the situation. They do not know what the other prisoner knows. Thus they cannot conclude anything form the assumed knowledge of the other prisoner. To be more precise, B cannot derive the answer from b=3, because from his viewpoint A can still have a=6 or a=7.

### Simulation

It is quite an interesting question if we can simulate the procedure. In fact, we have just given an algorithm to deduce a+b from the number of passes and a or b. Extending from our special case N=10 to a general N, our algorithm goes as follows. The shared knowledge after Day d is

\(D_d : \quad d-1 \le a \le N-d, \quad d \le b \le N-d\)

Now we take into account the true values of a and b, known privately to A and B, respectively. E.g., A knows that b=N-1-a or N-a. Thus, after Day d-1, at Day d, he knows the value of b if d-1=N-a or N-(d-1)=N-a-1. Using similar arguments for B, we get

- B knows the sum at Day d if d=N+1-b (sum=N) or d=b+1 (sum=N-1), after the pass from A at Day d.
- A knows the sum at Day d if d=N+1-a (sum=N) or d=a+2 (sum=N-1), after the pass from B at Day d-1.

Thus the result is known at day

\(d = \min \{N+1-a,a+2,N+1-b,b+1\}.\)

If N=10, a=6, b=4, then d=5. If N=10, a=6, b=3, then d=4. We can prove that this yields the correct answer for all a, b, and N. Or we can simulate, e.g. in EMT.

>function map check (a,b,N) ... $ v=[N+1-b,b+1,N+1-a,a+2]; $ d=min(v); $ if d==N+1-a then $ s = N; $ elseif d==a+2 then $ s = N-1; $ elseif d==N+1-b then $ s = N; $ elseif d==b+1 then $ s = N-1; $ endif $ return s; $ endfunction >a=0:10; check(a,10-a,10) [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10] >a=0:9; check(a,9-a,10) [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

But there is a logical problem. Does the algorithm proof that the two prisoners can escape? The problem is that the prisoners are not allowed to communicate and agree on a specific algorithm. Are there alternatives to this algorithm? Is this the fastest algorithm? These questions are not so easy to answer.

But starting from the observation that A cannot declare anything on Day 1 unless a=10, one can work the way down to the solution as written above. It becomes obvious that the given path is the only possible algorithm

### Generalizations

On the page linked at the start the problem is to decide between a+b=18 and a+b=20. Of course, we could set any other set of possible sums. The logic remains the same. Just don’t let yourself trapped into shortcuts taking into account the knowledge of his own number by any prisoner.

## The Bayesian and the Frequentist – I

I think I write a series of postings about Bayesian statistics and the view that I take on it as a frequentist. Do not expect too much depth here. I am not a specialist in statistics. I just want to study the differences in thinking, maybe with the help of some Monte-Carlo simulations in Euler Math Toolbox (EMT). Of course, as I said before in this blog once you simulate statistics you take the frequentist approach. I also said that anything that cannot be simulated is not well understood. Let us put that to the test of real life problems.

For a start, a typical application of Bayesian thinking is the following: Assume you have two bowls A and B with red and gray balls in them.

The numbers of the gray balls are GA and GB respectively, and likewise the numbers of red balls are RA and RB. So in A we have GA+RA balls, and in B we have GB+RB balls. In the image above, we have RA=10, GA=11, RB=5, GB=30.

Someone draws a ball from one of the bowls and it is red. What can you induce about the bowl the ball was drawn from? From the image, we would say that the red ball is more likely to come from bowl A. But how to quantify that? And what does „more likely“ mean, if anything?

You can learn a lot from your mistakes, false tries, complete failures or illegal arguments. So go ahead and try to say something substantial about the bowl once you know that the drawn ball is red!

The probabilistic analysis goes like this: Our events are the individual balls with the probability that each ball has to be drawn. We select a bowl with probabilities pA and pB first, and then a random ball from the selected bowl. The probability to draw a specific ball in bowl A is then the probability to draw from A divided by the number of balls in A, likewise for any ball in B.

E.g., the probability to draw a red ball turns out to be

\(p_R = p_A \dfrac{R_A}{R_A+G_A} + p_B \dfrac{R_B}{R_B+G_B}. \tag{1}\)

A similar formula holds for the probability to draw a gray ball, and you can check that both probabilities will add to 1. For a specific example, we assume that we draw from A and B with the same probability. Then

\(p_R = \dfrac{1}{2} \dfrac{10}{21} + \dfrac{1}{2} \dfrac{1}{7} = \dfrac{13}{42} \approx 0.3095\)

We now express this in the Bayesian way, which is just a shortcut for the same thing.

\(P(\text{Red}) = P(\text{Red}|\text{A}) P(\text{A}) + P(\text{Red}|\text{B}) P(\text{B}). \tag{2}\)

You read P(Red|A) as the probability of a ball being red under the conditions that it has been drawn from from bowl A.

For a sharp definition, we need the set of all red and gray balls (denoted by Red and Gray), the set of all balls in A and B (denoted by A and B). We can easily compute the probability of a ball to be drawn from any of these sets, or the intersection or union of any of these sets, by adding all probabilities for all balls in these sets. If you think about it a while the proper definition of P(Red|A) must clearly be

\(P(\text{Red}|\text{A}) = \dfrac{P(\text{Red} \cap \text{A})}{P(\text{A})}\)

using the intersection of the sets of red balls and sets of balls in A. This is the expected portion of red balls, provided they are drawn from bowl A. If we multiply the probability from any red ball in A by the number of red balls in A we get

\(P(\text{Red} \cap \text{A}) = R_A \dfrac{p_A}{R_A+R_B}. \)

With that, the Bayesian expression (2) becomes the same as (1) as you will easily verify. But we are not yet sure which one is easier to understand, or easier to handle.

There is another way to understand what is going on besides adding the probabilities of the balls. Since the red ball is either from A or from B we have, using the definition of the probability under a condition,

\(P(\text{Red}) = P(\text{Red} \cap \text{A}) + P(\text{Red} \cap \text{B}) = P(\text{Red}|\text{A})P(\text{A}) + P(\text{Red}|\text{B})P(\text{B}).\)

Our goal was to compute the probability that the ball is from A under the condition that it is red. In Bayesian speech that is

\(P(\text{A}|\text{Red}) = \dfrac{P(\text{A} \cap \text{Red})}{P(\text{Red})}.\)

We already computed everything in this formula.

\(P(\text{A}|\text{Red}) = \dfrac{5/21}{13/42} = \dfrac{10}{13} \approx 0.7692\)

But the special charm of the Bayesian way of saying things this lies in the following formula which follows easily from the definition

\(P(\text{Red}|\text{A}) P(\text{A}) = P(\text{Red} \cap \text{A}) = P(\text{A}|\text{Red}) P(\text{Red}).\)

So we get

\(P(\text{A}|\text{Red}) = P(\text{Red}|\text{A}) \dfrac{P(\text{A})}{P(\text{Red})}. \tag{3}\)

This allows us to invert probabilities and conditions. Note that P(A) appears in (2) and (3). So it is clear that it is important to know the probabilities for the selection from bowl A and B. The result will strongly depend on these probabilities.

The extreme cases are P(A)=0 and P(A)=1, i.e., we never or always draw from bowl A. Clearly, this gives a precise information on the probability that the ball is from A under the condition that it is red!

This plot has been done with

>RA=10, GA=11, RB=5, GB=30 10 11 5 30 >pA=1/2, pB=1/2 0.5 0.5 >function pAR(PA) := (PA*RA/(RA+GA))/(PA*RA/(RA+GA)+(1-PA)*RB/(RB+GB)); >plot2d("pAR",0,1,xl="P(A)",yl="P(A|Red)"):

Let us try to simulate this in EMT. We have the choice to use an easy to understand, but slow program or the quick matrix language. For this demonstration, we use the latter.

>n=1000000; >bowl=intrandom(n,2); A=(bowl==1); sum(A)/n 0.500005 >p=[RA/(RA+GA),RB/(RB+GB)]; fraction p [10/21, 1/7] >Red=random(n)<p[bowl]; // 1 for each red ball >ired=nonzeros(Red); nred=length(ired); nred/n // ired = indices of red balls 0.309861 >sum(bowl[ired]==1)/length(ired) // portion of bowl A in red balls 0.76877051323

The numbers are close enough to justify our computations. And since we can simulate the process we are sure that the results make sense.

Finally, let me add some well known application of this trick. We test patients for cancer with a test that has a true positive rate and a false positive rate of detection. Usually, the true positive rate is close to 100%, i.e., if there is cancer it will be detected. But it also claims cancer if there is none with a false positive rate which cannot be neglected. Then there is the rate of patients which have cancer. It is a good idea to think of the population as split in four groups.

- cancer and positive test
- cancer and negative test
- no cancer and positive test
- no cancer and negative test

You can quantify the expected numbers in each category if you know the above mentioned rates of true and false positive tests and the rate of the cancer in the population (or the selected populations wich undergoes the test).

In Bayesian speech we get

\(P(\text{Cancer}|\text{Positive}) = P(\text{Positive}|\text{Cancer}) \dfrac{P(\text{Cancer})}{P(\text{Positive})}\)

So if your test was positive, chances are that you have no cancer even if the positive rate P(Positive|Cancer) is close to 100%. It all depends on the frequency of the cancer and the frequency of a positive test (including the falsely positive tests). Both are known to your doctor by experience.

For an example, we take the cancer rate as 1% , the true positive rate as 100%, and the false positive rate as 5%. We than have to compute

\(P(\text{Pos.}) = P(\text{Pos.}|\text{Canc.})P(\text{Canc.}) + P(\text{Pos.}|\text{No Canc.}) P(\text{No Canc.}) \)

If we assume that P(Positive|Cancer) is very close to 1, and P(No Cancer) is also very close to 1, we just have to add the rate of cancer and the rate of positive tests in the no cancer population and get an estimate of 6% of positive results as a good estimate. Since the rate of cancer is only 1% we conclude that our chances of having cancer is approximately

\(P(\text{Cancer}|\text{Positive}) \approx \dfrac{1}{6}.\)

That looks way better than our initial alarming positive test indicates.

## Units in Euler Math Toolbox

Units have long been an integral part of EMT. I want to show you a few examples. First, let us talk about the non-metric system the US is still using. Quote: „Nobody understands the metric system! We use teaspoons!“.

>1inch 0.0254 >1in -> " cm" 2.54 cm >inch$, in$ 0.0254 0.0254 >1ft -> inch 12 >1yard -> ft 3 >1mile -> yard 1760 >1mile -> km 1.609344

As you see, you simply append the unit to a number and it will be converted into the metric system (in case you do not know that’s meters and kilograms). The units are stored in global variables which end with $. These variables are visible even within functions. So units work in functions too.

There is also a conversion operator -> which can either convert to text or to numbers, depending on the type of the parameter after the ->. See the examples.

But there is more to it. Often we use fractions or powers of units. EMT can handle that too. Let us start with fractions. In the following example, we introduce a new unit „nautical mile“ which is missing in EMT but will be present in future versions.

>sm$ = 1852.216; >160sm/h -> " m/sec" 82.3207111111 m/sec

As you see, we can now easily convert nautical miles per hours (knots) to meter per second. Assume an airplane departing with 90 knots is forced to use a climb of 200 feet per nautical mile. We want to compute the feet per minute climb which we can read from the VSI.

>200ft/sm * 90sm/h -> " ft/min" 300 ft/min

Of course, you can do this (at least approximately) in the head, if you multiply the numbers and divide by 60. E.g., at 120 knots, you need to double the ft/sm requirement. For just another example, let us compute the sink rate which is necessary at 190 knots to obtain a 3° glide path.

>200ft/sm * 90sm/h -> " ft/min" 300 ft/min >190sm/h * tan(3°) -> ft/min; print(%,unit=" ft/min") 1008.50 ft/min

The example shows still another way to print with units. By the way, the degree symbol behaves like a unit in many ways.

For further examples, here are some american kitchen units. You will also notice that EMT can handle powers in units on both sides.

>10cm^3 1e-005 >cup$ = 236.5882365liter/1000 0.0002365882365 >tablespoon$ = cup$/16; >teaspoon$ = tablespoon$/2; >1teaspoon -> " cm^3" 7.39338239062 cm^3 >1liter -> teaspoon 135.256090807

## Solving Partition Problems with Linear Programming

Yesterday I promised to show how to solve partition problems with integer linear programming. Let us take the problem of yesterday: We want to split the numbers from 1 to 30 in 10 triplets (x,y,z) such that x+y+z=0 modulo 31. We will use linear problems of the following type.

\(Ax=b, \quad x_i \in \{0,1\}, \quad c^T x \to \text{max.}\)

There will be one variable in x for each possible triplet. The value of this variable determines if the triplet is in the selection (1) or not (0). So the j-th column of A refers to the triplet with number j. The i-th row of A is a linear constraint which makes sure that the number i is used only in one triplet. A contaoins only 0-1-values, and

\(a_{i,j} = 1 \quad\Leftrightarrow\quad i \in T_j = \{n_{1,j},n_{2,j},n_{3,j}\}.\)

The constraints will be

\(\sum_j a_{i,j} x_j = 1.\)

We just need any feasible point and could use any target function. Alternatively, we could use „less equal“ in the previous line and maximize the number of triplets used.

How to do that in EMT? We first store all possible triplets into one matrix.

>function makeT (n:index=31) ... $T=[]; $for i=1 to n-2; $ for j=i+1 to n-1; $ k=n-mod(i+j,n); $ if k!=i and k!=j then T=T_[i,j,k]; endif; $ end; $end; $return T $endfunction >makeT(7) 1 2 4 1 4 2 1 6 7 2 4 1 2 5 7 3 4 7 3 5 6 3 6 5 5 6 3 >T=makeT(31);

Then we define the matrix A.

>function makeA (T) ... $v=sort(unique(flatten(T))); $A=zeros(cols(v),rows(T)); $for i=1 to cols(v); $ for j=1 to rows(T); $ if any(v[i]==T[j]) then A[i,j]=1; endif; $ end; $end; $return A; $endfunction >makeT(5) 1 4 5 2 3 5 >fraction makeA(makeT(5)) 1 0 0 1 0 1 1 0 1 1 >A=makeA(T); >size(A) [31, 405]

The function makeA() can take any matrix with partitions in its rows. The first command simply selects the numbers that appear in any partition. In our case, A is simply the vector [1,…,30].

Now we need to solve the problem with integer linear programming. We use the LPSOLVE library for this (given to EMT by the developers).

>function solveP (A,T) ... $ x=ilpsolve(A,ones(rows(A))',ones(cols(A)), $ vlb=zeros(cols(A)),vub=ones(cols(A)),>max); $ return T[nonzeros(x')]; $ endfunction >solveP(A,T) 1 5 25 2 7 22 3 11 17 4 6 21 8 9 14 10 23 29 12 24 26 13 19 30 15 20 27 16 18 28

The return value of the function ilpsolve() is a 0-1-vector. We want to print the triplets which are marked by 1 in this vector. The variables vlb and vub are lower and upper bounds for the variables. Interestingly, the solution works without these restrictions. Nevertheless, more restrictions usually mean shorter calculations.

## Modulo Arithmetic with Euler Math Toolbox

Recently, I came across the problem to compute expressions in the modulo arithmetic with integer numbers. Here is an example: We like to compute expressions modulo 31. Since 31 is prime the numbers 0 to 30 with addition and multiplication modulo 31 will be a field. Each number a between 1 and 31 will have a unique multiplicative inverse b, such that ab=1 modulo 31. E.g.

>mod(27*23,31) 1

The inverse of 27 is 23 modulo 31.

How can one find the inverse element? The fastest way uses Euclid’s algorithm. If you compute the greatest common divider of 27 and 31 using Euclid’s algorithm you can derive x and y such that 27x+31y=1. Then x (taken modulo 31) will be the inverse of 27 modulo 31. Here is the computation for these numbers:

\(31=27+4, \quad 27=6 \cdot 4 + 3, \quad 4 = 3 + 1,\)

and by computing backwards

\(1 = 4 – 3 = 7 \cdot 4 – 27 = 7 \cdot 31 – 8 \cdot 27.\)

Note -8=23 modulo 31. We get our result 23.

Of course, we could program that in EMT or Maxima. Maxima contains already a function for the inverse modulo a number.

>&inv_mod(27,31) 23

We also know the small Lemma of Fermat.

\(a^{p-1} = 1 \text{ mod } p.\)

Thus

\(a \cdot a^{p-2} = 1 \text{ mod } p.\)

This can also be used to compute the inverse. To get the result in EMT we need a trick which I am going to explain below. But in Maxima we can compute very large powers, albeit very slowly.

>&mod(27^29,31) 23

For single computations or small numbers, symbolic computation is okay. But assume we need more speed. The computation above does not work in EMT, because the power is too large. Python, by the way, has infinite integers too. So the computation works there too.

>>> print 27**29 323257909929174534292273980721360271853387 >>> print 27**29 % 31 23

One of the reasons of this posting is to show how this can be done in EMT and other languages without an infinite integer arithmetic, and moreover much faster. The trick is the fact that we can as well take the modulo in each step of the multiplication. Here is the first simple code.

>mod(27^29,31) // WRONG RESULT! 12 >function map powmod (a,n,p) ... $ b=1; $ loop 1 to n; b=mod(b*a,p); end; $ return b $ endfunction >powmod(27,29,31) // CORRECT RESULT! 23

However, this is still not the fastest way. For powers, we can use this trick.

\(x^{2y} = \left(x^y\right)^2, \quad x^{2y+1} = \left(x^y\right)^2 \cdot x.\)

This leads to the following code. By the way, the function matrixpower() of EMT uses the same technique. To understand the code, note that it is not efficient to use a mapping function recursively. Thus we call it from another function.

>function powmodrek (a,n,p) ... $ if n==0 then return 1 $ elseif n==1 then return a $ elseif mod(n,2)==0 then $ b=powmodrek(a,n/2,p); return mod(b*b,p); $ else $ b=powmodrek(a,(n-1)/2,p); return mod(mod(b*b,p)*a,p); $ endfunction >function map powmod (a:number,n:nonnegative integer,p:index) ... $ return powmodrek(a,n,p); $ endfunction >powmod(27,29,31) 23 >powmod(27,1:30,31) [27, 16, 29, 8, 30, 4, 15, 2, 23, 1, 27, 16, 29, 8, 30, 4, 15, 2, 23, 1, 27, 16, 29, 8, 30, 4, 15, 2, 23, 1]

The last computation is not effective, of course. If we need to compute all powers we should use a simple loop instead. But I included that example to show that powmod() can map to vector input.

As you see, the powers of 27 do not cycle through all the number 1 to 30. For other base numbers, they do.

>powmod(3,1:30,31) [3, 9, 27, 19, 26, 16, 17, 20, 29, 25, 13, 8, 24, 10, 30, 28, 22, 4, 12, 5, 15, 14, 11, 2, 6, 18, 23, 7, 21, 1]

The second reason for this posting is to demonstrate the solution to another problem. We want to partition the numbers 1 to 30 into triplets with a+b+c=0 modulo 31. This is surprisingly easy. We have for all q different from 1

\(1+q+q^2 = \dfrac{q^3-1}{q-1}\)

Thus by the Lemma of Fermat

\(3^0 + 3^{10} + 3^{20} = \dfrac{3^{30}-1}{3^{10}-1} = 0 \text{ mod }31.\)

Our triples can be found easily from this. Just multiply the equation with powers of 3.

>for k=0 to 9; ... > powmod(3,k,31)+"+"+powmod(3,10+k,31)+"+"+powmod(3,20+k,31)+"=0", ... >end; 1+25+5=0 3+13+15=0 9+8+14=0 27+24+11=0 19+10+2=0 26+30+6=0 16+28+18=0 17+22+23=0 20+4+7=0 29+12+21=0

It is also possible to find such a partition with any desired property using linear programming and LPSOLVE in EMT. This should be the topic of another posting, however.

## Mathematik oder Pädagogik?

Ich lese in ZEIT online das Interview mit einer ehemaligen Schulleiterin in einer Hamburger „Stadtteilschule“. In einem Bildblock herausgestellt wird folgender Satz:

*Die Ausbildung unserer Lehrer ist praxisfern: Schüler brauchen keine Fachgenies, sondern Pädagogen, die ihre Probleme verstehen.*

Wie so oft ist das so zugespitzt formuliert, dass jeder zustimmen kann. „Fachgenies“ sind ganz offensichtlich in der Schule fehl am Platz. Ebenso wenig bestreitet jemand, dass Lehrer und Lehrerinnen die Probleme der Schüler verstehen sollten. Der Satz entlarvt sich damit als Meinungsmache, deren wesentliches Merkmal ist Sachverhalte zu unterstellen, die so nicht der Realität entsprechen. Stimmt man dem Satz zu, dann stimmt man auch der tendenziösen Botschaft zu, die ganz einfach lautet: Weniger Fachausbildung für Lehrer!

Nun versuchen wir es damit in Deutschland seit Jahren. Die Lehramtsausbildung für das Gymnasium entspricht inzwischen nur mit zusätzlichen Modulen einem Bachelor-Abschluss im Fach. Die Zulassungsarbeit kann durch eine Bachelorarbeit ersetzt werden. Im Vergleich dazu war es noch vor wenigen Jahren ohne Problem möglich, mit dem ersten Staatsexamen in die Promotion einzusteigen. In den anderen Schularten sieht es nicht besser aus. Fachinhalte wurden und werden dort bis auf das Schulniveau reduziert. Wurde die Schule dadurch besser? Nach Aussage derselben Kritiker unserer Lehramtsausbildung ist gerade das nicht der Fall. Sie empfehlen allerdings nur mehr von einer Medizin, die bisher auch nicht gewirkt hat.

Fragen Sie doch einmal einen beliebigen Schulleiter, was ihn an der real existierenden Schule am meisten bedrückt. Über zu große Fachkenntnisse des Lehrpersonals werden Sie da nichts hören. Im Gegenteil gibt es Schulleiter, die sich über mangelnde fachliche Professionalität von Referendaren beschweren. Das führt zu Ärgernissen mit Eltern und Probleme im Unterricht bis hin zur Notwendigkeit, die Klausuren dieser angehenden Kollegen nachkorrigieren lassen zu müssen. Die wirklichen Klagen der Schulleiter betreffen eher Schüler, Eltern und die Schulbehörde.

Nun ist gerade die veränderte Zusammensetzung der Klassen ein Argument der Pädagogen für mehr Pädagogik und weniger fachliche Ausbildung. Nutzt das etwas? Ich glaube nicht. Auch ich kann provokante Sätze formulieren und textuell herausstellen:

*Es ist nicht hilfreich, nützliche Fachvorlesungen in Mathematik durch verkopfte Fachvorlesung in Pädagogik zu ersetzen.*

Wir sollten allerdings von einer übermäßigen Konfrontation wieder ein wenig herunterkommen. Wir sind uns schließlich einig darüber, dass der Erfolg der Schule, und zwar nicht nur bei den unteren Schichten, zu wünschen übrig lässt. Die Kompetenzen in den MINT-Fächern ebenso wie die Lesekompetenzen, die ja ständig gemessen werden, sind im Schnitt unzureichend. Wir sollten uns auch einig darüber sein, dass ein Lehrer sowohl pädagogisch-didaktische, also auch fachliche Qualitäten haben muss. Ein guter Lehrer benötigt beides, und dazu noch menschliche Fähigkeiten, die man nicht antrainieren kann.

*Ein guter Lehrer besitzt Sachkenntnis sowie pädagogische und didaktische Kompetenzen in gleichem Maße. Im optimalen Fall hat er auch besondere menschliche Qualitäten. *

Ich schlage erneut eine zweiphasige Ausbildung als Kompromiss vor, bestehend aus einer Fachausbildung in der ersten Phase und Ausbildung zum Lehrer in der zweiten Phase. Die Fachausbildung besteht aus einer Ausbildung zum Bachelor in den zu unterrichtenden Fächern, mit einem Fach als Hauptfach. Das gilt zumindest für die Lehrämter am Gymnasium. In der zweiten Phase werden dann Schulpraktika, pädagogische Inhalte und didaktische Module miteinander zu einem Master of Education verzahnt. Das erste Staatsexamen wird abgeschafft. Dieser Vorschlag wurde schon öfter gemacht, auch von prominenter Seite. Er scheiterte aber immer an universitären Lehramtsexperten, die glauben, einen Lehrer von der Schule direkt wieder in die Schule abholen zu müssen. Im Unterschied dazu glaube ich als Fachdozent nicht, bei der mich nicht betreffenden anderen Phase, der eigentlichen Lehramtsausbildung, hineinreden zu müssen.

Was die Lehrämter an Grund- und Hauptschulen angeht, so bin auch ich der Meinung, dass hier die Fachinhalte auf das Notwendigste zu reduzieren sind. Bei der Realschule allerdings ist die Lage kritischer zu sehen. Dies ist eine Schulart, für der sich zu viele Studentinnen und Studenten entscheiden, ohne allerdings die fachlichen Voraussetzungen für einen erfolgreichen Unterricht mitzubringen oder zu erwerben. Der Überhang an Bewerbern ist hier so groß, dass eine fachliche Hürde nicht schaden kann. Derzeit machen etwas zehnmal mehr Kandidatinnen und Kandidaten einen Abschluss als Stellen zur Verfügung stehen. Ob man hier einen Bachelor vorsehen sollte, mag diskutierbar sein. Aber es muss klar sein, dass selbst an Realschulen ein Mathematiklehrer ein Mathematiker sein muss. Amateure richten zu viel Schaden an.

*Ein Mathematiklehrer muss Mathematiker sein, kein Amateur. Er muss sich im Studium die Grundsätze seinen Faches angeeignet haben. Die lassen sich eben nur in Fachvorlesungen üben.*

Schließlich möchte ich noch präzisieren, was ich unter „Fachinhalten“ verstehe. Keineswegs müssen das fortgeschrittene Forschungsinhalte des jeweiligen Fachs sein. Auch sind es nicht unbedingt die klassischen Inhalte ohne Rücksicht darauf, wie sich Fächer in der modernen Zeit wissenschaftlich wandeln. Und die Curricula der Bachelor in Mathematik sind in der Tat moderner geworden. Wo etwa Funktionentheorie und Algebra noch immer eine Hauptrolle spielen, ist das dem Lehramtsstudium und dem schriftlichen Staatsexamen geschuldet. Diese scharfe, aber unsachgemäße Akzentuierung würde mit dem obigen zweistufigen Modell sofort aufhören. Das ist übrigens ein weiterer Grund, warum sich selbst Fachvertreter gegen eine Reform des Lehramtsstudiums wenden.

Eine bessere Schule tut Not. Sie ist eine gesellschaftliche Herausforderung. Die Lehramtsausbildung ist nur eine Facette des Problems. Vermutlich wird man ohne eine Aufwertung des Lehrerberufs nicht auskommen. Die erfolgreichen Länder vereinen strikte Zugangsprüfungen mit guter Bezahlung und guten Arbeitsbedingungen. Vielleicht sollten wir ein wenig von ihnen lernen.

## Euler Math Toolbox and Matlab

Recently, this site got more traffic than usual. So I thought I welcome all of you in the new year (which is already three weeks old now). And I provide some new content for you to read. For me, the selected topic is important. I welcome very much any discussion about it. I am currently writing a requested paper for an on-line journal about Euler Math Toolbox, so I could benefit from your input.

What is the difference between Euler Math Toolbox and Matlab? Or rather, what is the intended difference? And are there alternatives to both?

First of all the following points are obvious to me.

- Matlab is profiling itself in the
*professional*market. It is called the „industry standard“, and it is advertised as a must-know to students. Matlab tries very much to serve the needs of the industry. The best example is its simulation toolbox which connects Matlab to hardware. Matlab is backed up by a group of commercial programmers. - Euler Math Toolbox is for
*education and research*. It is not intended to be used as a professional tool in industry. EMT is backed up by open software. - Both systems fall short for specialized applications. Most professional or research software is compiled in a general programming language and does things that a one-for-all software cannot do.
- We need to educate our students in
*general programming*, not in software systems, especially not in commercial systems. With a good background in programming any software can be adapted easily. - That said, a tool like Euler Math Toolbox or Matlab can be useful for quick computations or to mathematical demos. And for this, both systems are equally well equipped with an advantage for EMT.
- We should have
*free and open*, reliable numerical*libraries*for our main computer languages.

If you are interested in the primary differences between EMT and Matlab, here is a list.

- EMT is
*open and free*, Matlab is commercial with a considerable pricing. - The user interface of EMT is
*notebook oriented*similar to Maple or Mathematica, Matlab is*command oriented*. Both have scripts, of course, and are similar in this respect. - Both systems can do
*symbolic*computations, EMT with the free Maxima, Matlab with an additional package you need to buy. - EMT can use many
*open*external programs like Povray, Python, C, Scilab, Latex. For Matlab, you can buy powerful external packages to do industry strength computations. - Matlab has an optional
*compiler*. In EMT, you need to write C or Python for more speed. - Matlab is used all around the world, and you will easily find someone with the same problem as you have. EMT has a community, but it is way smaller.

There might be more differences that I forgot to mention. But all in all it is clear to me that EMT deserves to be used in education much more than it is right now. Matlab, on the other hand, should be removed from general education. It should be reserved for *specialized courses* done by engineers who have used the program for their work.

Alternatives? There are many.

- R is a
*statistical package*featuring a matrix language as well. It is free and has a big community. The programming in R is not as easy to learn as they claim, but to get statistics done it is not necessary. The interface is ugly and cumbersome to use. - Maple, Mathematica or other
*algebra software*are commercial packages in the higher price range. They offer more advanced features than Maxima and often yield better results. But they are not designed to be used as numerical software. *Geogebra*tries to be the knife for all purposes. It is a beautiful software for schools and has a large community. Clearly, it falls short in the area of numerical and other programming. Actually, it is a whole different category of software.*Python*is probably the best alternative to big software packages. There are very nice packages for plotting and for numerical computations. The symbolic package falls short in comparison, however. The Sage project relies completely on Python and has a web interface which clearly is the proper way in the future.*General programming langes*are the way to do software education in universities for engineers and math students. Most courses have an obligatory course in Java, and rightly so.

Open for discussion.

## Pure Symbolic Functions in Maxima and EMT

I like to demonstrate a few tricks with Maxima and EMT. Assume you want to generate the Vandermonde matrix, and check the identity

\(\det \begin{pmatrix} 1 & x_0 & \ldots & x_0^n \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \ldots & x_n^n \end{pmatrix} = \prod\limits_{0 \le i<j \le n} (x_j-x_i)\)

You can generate this matrix easily in EMT for specific values.

>n=3; v=0:3; V=v'^(0:3) 1 0 0 0 1 1 1 1 1 2 4 8 1 3 9 27 >det(V) 12

It is a bit more difficult to check the right hand side of the equation. Of course, you can simply write a function

>function vprod (v:vector) ... $p=1; n=length(v); $for i=1 to n; $ for j=i+1 to n; $ p=p*(v[j]-v[i]); $ end; $end; $return p; $endfunction >vprod(v) 12

It is also possible with Matrix tricks. We generate all differences in a matrix, square the matrix, set the diagonal to 1, take the total product of all matrix elements and take the 4-th root.

>(prod(prod(setdiag((v-v')^2,0,1))'))^(1/4) 12

But that is not the general case with variables. To generate this we use purely symbolic functions in EMT. These functions evaluate in Maxima at the time when they are used, not when they are defined.

>function mrow (k,v,n) &&= create_list(v[k]^m,m,0,n) m create_list(v , m, 0, n) k >function Vand (v,n) &&= apply(matrix,create_list(mrow(k,v,n),k,0,n)) apply(matrix, create_list(mrow(k, v, n), k, 0, n)) >&Vand(v,2) [ 2 ] [ 1 v v ] [ 0 0 ] [ ] [ 2 ] [ 1 v v ] [ 1 1 ] [ ] [ 2 ] [ 1 v v ] [ 2 2 ]

This is almost self explaining, once you know that „matrix(a,b,c)“ generates a matrix in Maxima with rows „a,b,c“. The creation takes place for the specific symbolic v when „Vand(v,2)“ is called.

Maxima prints the result with indices. But it is a string of the following form.

>V &= Vand(v,2); >""+V matrix([1,v[0],v[0]^2],[1,v[1],v[1]^2],[1,v[2],v[2]^2])

Since EMT vectors start with index 1 by default, we can only evaluate this expression for zero based vectors (unless we modify the functions above a little bit).

>v=0:2; zerobase v; >V() 1 0 0 1 1 1 1 2 4

Maxima can factor the determinant of any Vandermonde determinant of specific size.

>&factor(determinant(Vand(v,3))) (v - v ) (v - v ) (v - v ) (v - v ) (v - v ) (v - v ) 1 0 2 0 2 1 3 0 3 1 3 2

This is in accordance with the theory. Let me try the following matrix.

\(\begin{pmatrix} (x_0-x_0)^n & \ldots & (x_0-x_n)^n \\ \vdots & & \vdots \\ (x_n-x_0)^n & \ldots & (x_n-x_n)^n \end{pmatrix}\)

>function mrow (k,v,n) &&= create_list((v[k]-v[m])^n,m,0,n) n create_list((v - v ) , m, 0, n) k m >function Vand (v,n) &&= apply(matrix,create_list(mrow(k,v,n),k,0,n)) apply(matrix, create_list(mrow(k, v, n), k, 0, n)) >&Vand(v,2) [ 2 2 ] [ 0 (v - v ) (v - v ) ] [ 0 1 0 2 ] [ ] [ 2 2 ] [ (v - v ) 0 (v - v ) ] [ 1 0 1 2 ] [ ] [ 2 2 ] [ (v - v ) (v - v ) 0 ] [ 2 0 2 1 ]

I am not sure if you are aware of the following formula.

>&factor(determinant(Vand(v,3))) 2 2 2 2 2 9 (v - v ) (v - v ) (v - v ) (v - v ) (v - v ) 1 0 2 0 2 1 3 0 3 1 2 (v - v ) 3 2

The factor 9 is interesting. For n=4, it is 96, and for n=5, it is 2500. Do you guesswork. The factorization of the case n=4 takes some seconds. I did not try higher degrees.

Of course, it is easy to compute this numerically, even for very large matrices.

>function d1(v) := det((v'-v)^(cols(v)-1)) >d1(0:10) 1.43679021499e+072

## December Problem

I found the following problem on the page of the wonderful Science Blogs. It is in German. Let me translate it. Bernd writes two different numbers between 1 and 1000 on two cards of paper. Anna selects one, and after seeing it has to decide if she wants to keep it or change to the other card. The player with the higher card wins.

This is a case of a two person game with the following tactics for each player.

- Bernd: He can write any combination (i,j) with 1 <= i < j <= N.
- Anna: She can decide to change up to a number A with 1 <= A <= N.

Note that we simplified the situation for Anna. It does not make sense to change with 3, but not with a smaller number. Weather or not this simplification makes the outcome worse for Anna remains to be investigated.

Now, Bernd can select any of his tactics (i,j) with probability p(i,j), and Anna can select any of her tactics with probability q(A). The problem to select the optimal strategy (i.e. the distribution of probabilities) for each player can be solved with optimization. I have done similar problems with EMT on this page. I solved the problem above for (N=10) with a bit of code.

However, I want to spoil the solution here. If you want to try on your own, do not read further.

- Strategy for Bernd: Bernd should select the pairs (1,2),…,(N-1,N) at random with equal probability.
- Strategy for Anna: Before each game, Anna should select a number A from 1 to N-1 at random, and switch if her number is less or equal A.

Let us prove that this is optimal.

Assume, Anna selects tactics A. So she will change cards at any number smaller or equal A. Then Bernd will lose 1/(N-1) on average with his strategy. This is so, since the only case that makes a difference for him is the pair (A,A+1), where Anna wins all the time. So, no matter what tactics A Anna selects, the worst case is an average loss of 1/(N-1). Only if Anna would select A=N, he would do better.

Now assume, Bernd selects tactics (i,j). Then Anna, with her strategy, will win in any case if i<=A<j. In all other cases, the average result is 0. We need to find the worst case for Anna based on her strategy. That is indeed the case, if Bernd selects a pair (i,i+1) and it is equal to a win of 1/(N-1).

It is now clear that there cannot be a better strategy for either player. Anna cannot win more than Bernd loses on average.

With a bit more care in the proof, it is also clear that it would not help Anna to add more strategies. Assume, she rejects some numbers, and accepts others, instead of rejecting the numbers 1 to A altogether for some A. Then Bernds tactics will lose less or equal 1/(N-1) on average. For an example, assume Anna rejects 1 and 3. Then (1,2) will be a loss for Bernd, (2,3) will be a win, and (3,4) will be a loss, all other cases are 0 on average.

This is a very nice problem. I could not find a reference in the net, however. But it should be known.

Let me add the code for EMT.

>v1=flatten(dup(1:10,10)); >v2=flatten(dup((1:10)',10)); >i=nonzeros(v1<v2); v1=v1[i]; v2=v2[i]; >A=(1:10)'; >M=-((v1<=A)&&(A<v2)); >function game (R) ... $## Maximizes G such that R'.q>=G, sum q_i = 1, q_i>=0. $## Returns {q,G}. $ m=rows(R); n=cols(R); $ M=R|-1_ones(1,n); b=zeros(m,1)_1; eq=ones(m,1)_0; $ c=zeros(1,n)|1; restr=ones(1,n)|0; $ res=simplex(M,b,c,eq,restr,>max); $ return {res[1:n]',res[-1]}; $endfunction >{pB,res}=game(M); fraction res -1/9 >k=nonzeros(pB); fraction (pB[k]_v1[k]_v2[k])' 1/9 1 2 1/9 2 3 1/9 3 4 1/9 4 5 1/9 5 6 1/9 6 7 1/9 7 8 1/9 8 9 1/9 9 10 >{pA,res}=game(-M'); fraction res 1/9 >fraction pA'|(1:10)' 1/9 1 1/9 2 1/9 3 1/9 4 1/9 5 1/9 6 1/9 7 1/9 8 1/9 9 0 10

This looks a bit difficult. Here are some explanations.

The vectors v1 and v2 simply contain the pairs (i,j) with 1 <= i < j <= N = 10. The trick to get all pairs is to create matrices with 1 to 10 in each row resp. column and to „flatten“ them to vectors. Then we select only the pairs with i < j using the nonzeros() function of EMT.

The matrix M is then constructed in the following way: The element in row A and column (i,j) is -1 (a loss for Bernd) if i <= A < j. This is the matrix of average outcomes if Anna selects A and Bernd selects (i.j).

The game() function calls the simplex algorithm for the following problem.

\(Mp \ge G, \quad \sum_k p_k = 1, \quad G \to \text{Max.}\)

The probabilities p(k) are then maximizing the worst case G, for all rows of M (tactics of Anna). The result is p and G. We print the nonzero elements of p along with the corresponding i and j.

The same is finally done with the problem seen from Anna’s side.

By the way, the fact that both results must be equal (besides the sign) is an application of duality in optimization.

Really a nice problem!