Category Archive: Uncategorized

Cambridge Entry Examination

I recently came across a video posing the problem of finding the sum of the solutions of

\(3^x - \sqrt{3}^{(x+4)} + 20 = 0\)

Anyone familiar with this kind of computation will probably see the trick. You can set

\(y = \sqrt{3}^x,\)

and get the equivalent equation

\(y^2-9y+20=0.\)

Solving this, you get

\(y = \sqrt{3}^x = 3^{x/2} = \dfrac{9 \pm 1}{2}.\)

Thus, since \(x = 2 \log_3 y\),

\(x_1+x_2 = 2 \log_3(5 \cdot 4) = 2 \log_3(20) = \dfrac{2 \ln(20)}{\ln(3)}.\)

That’s not very exciting.

But why are they asking for the sum of the solutions? If you hear that, you might recall Vieta's formulas: the sum and the product of the solutions of a monic quadratic equation can be read off its coefficients. Using this, you get

\(y_1y_2 = \sqrt{3}^{(x_1+x_2)} = 20\)

immediately.
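
For readers who want to check this numerically, here is a quick sketch (in Python, my own addition, not part of the original problem):

import math

# The substitution gives y^2 - 9y + 20 = 0 with roots y = 5 and y = 4,
# and y = sqrt(3)^x means x = 2*log_3(y).
x1 = 2 * math.log(5) / math.log(3)
x2 = 2 * math.log(4) / math.log(3)

# Both values satisfy the original equation (up to rounding errors) ...
for x in (x1, x2):
    print(3**x - math.sqrt(3)**(x + 4) + 20)

# ... and their sum is 2*log_3(20), as claimed.
print(x1 + x2, 2 * math.log(20) / math.log(3))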

That's where I decided to write about this problem here. For example, you might change the original problem to

\(3^x - \sqrt{3}^{(x+4)} + 9 = 0\)

and get the much more pleasing solution

\(x_1 + x_2 = 2 \log_3(9) = 4.\)

Nice trick! But it is dangerous. Applying the trick to

\(3^x - \sqrt{3}^{(x+4)} + 81 = 0\)

yields the false sum 8.

In fact, this equation has no real solution at all! Unless you allow complex numbers and are careful with complex logarithms. In that case, you get multiple solutions.
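
Here is a small sketch of the complex case (again Python, my own addition; the use of the principal branch of the logarithm is my choice):

import cmath

# y^2 - 9y + 81 = 0 has no real roots; the complex roots are 9*exp(+-i*pi/3).
disc = cmath.sqrt(81 - 4 * 81)
y1, y2 = (9 + disc) / 2, (9 - disc) / 2

# With the principal branch of the complex logarithm, x = 2*log(y)/log(3).
x1 = 2 * cmath.log(y1) / cmath.log(3)
x2 = 2 * cmath.log(y2) / cmath.log(3)

# Both values satisfy 3^x - sqrt(3)^(x+4) + 81 = 0 up to rounding errors ...
for x in (x1, x2):
    print(3**x - cmath.sqrt(3)**(x + 4) + 81)

# ... and adding multiples of 4*pi*i/log(3) to x gives further solutions,
# so any "sum of the solutions" depends on the chosen branches.
print(x1 + x2)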

So be careful with tricks!

Integrals and Monte Carlo Methods

I am teaching surface integrals right now. Searching for some interesting applications outside of physics, I came up with mean values along surfaces. After all, mean or "expected" values are a central part of probability theory. It is tempting to try to verify the analytic results by Monte Carlo methods. This can be done nicely in Euler Math Toolbox.

E.g., an integral on a finite interval [a,b] can be simulated this way. We generate random numbers in [a,b] uniformly, and compute the mean of the values of the function. The expected mean value is the integral of the function divided by the length of the interval.

>&integrate(log(x),x,0,2), %()
 
                              2 log(2) - 2
 
 -0.61370563888
>N = 10 000 000;
>x = random(N)*2;
>mean(log(x))*2
 -0.61394326247

Syntax notes for EMT: The blanks in integer numbers can be generated by pressing Ctrl-Space. This produces a hard space which is skipped when the number is parsed. The random() function generates uniform numbers in [0,1], which need to be scaled properly to fall into [0,2].

The accuracy of these Monte Carlo methods is of order 1/√N. So we cannot expect more than about 4 correct digits, even with ten million samples.

As another example, the area of the circle can be estimated by throwing random points into the square [-1,1]² and counting the hits inside the unit circle. We expect a proportion of pi/4 of the points to land there.

>N = 10 000 000;
>x = 2*random(N)-1; y=2*random(N)-1;
>sum(x^2+y^2<=1)/N
 0.7854814
>pi/4
 0.785398163397

Syntax notes for EMT: Applying a boolean comparison like "<=" to a vector generates a vector of zeros and ones. Summing up this vector counts the "true" cases.

This rejection method is also a simple way to create random numbers inside the unit circle. It is not as bad as it sounds, since more than every second pair of numbers hits inside the circle.

For a further test, let us compute the integral of r^2 over the unit disk (which is proportional to the energy stored in a rotating disk). This integral is the mean value of r^2 on the disk times the area of the disk.

>N = 10 000 000;
>x = 2*random(N)-1; y=2*random(N)-1;
>r = sqrt(x^2+y^2);
>i = nonzeros(r<=1);
>mean(r[i]^2)*pi
 1.57088571913
>&2*pi*integrate(r^3,r,0,1), %()
 
                                   pi
                                   --
                                   2
 
 1.57079632679

Syntax notes for EMT: The nonzeros() function generates a vector of indices of nonzero elements. We can use that vector to select the hits from the vector r.

For the exact value, we used the formula for rotationally invariant functions on disks.

As a side remark, there is a direct method to generate random numbers inside the unit circle. For this, we select a random angle between 0 and 2pi, and a radius in proportion to the circumference of the circle with this radius, i.e., with a density proportional to r on 0≤r≤1. The following does the trick. We explain it after the code.

>function circrandom (N) ...
$  t = random(N)*2*pi;
$  r = sqrt(1-random(N));
$  return {r*cos(t),r*sin(t)}
$endfunction
>{x,y} = circrandom(10000);
>plot2d(x,y,>points,style=".",r=1.2):

This will generate the following points, which look quite nicely distributed in the unit circle.

Why does this work? The reason is that any probability distribution can be simulated by inverting its cumulative distribution, in this case the tail probability

\(F(R) = p(r \ge R) = \int\limits_R^1 2r \, dr = 1-R^2.\)

Here we take the density 2r because it makes the probability of a radius proportional to r. Then we create a uniform random number u between 0 and 1 and apply the inverted function, which gives \(r = \sqrt{1-u}\) as in the code above.

Using this trick, we can do another Monte Carlo simulation of the integral.

>{x,y} = circrandom(N);
>r = sqrt(x^2+y^2);
>i = nonzeros(r<=1);
>mean(r[i]^2)*pi
 1.57153749805

Syntax notes on EMT: Note that circrandom() returns two values. It is possible to assign these two values to two variables.

Finally, we try an integral on a surface. We want to know the mean geographical latitude of a random point on the Northern Hemisphere.

We try a simple rejection method. For this, we generate random points in the cube [-1,1]³ and reject those with z<0, r>1, or r<1/2. The remaining points, projected to the surface, are uniformly distributed on the Northern Hemisphere.

>N = 10 000 000;
>x = random(N)*2-1; y = random(N)*2-1; z=random(N)*2-1;
>r = sqrt(x^2+y^2+z^2);
>i = nonzeros(z>0 && r<1 && r>1/2);
>mean(asin(z[i]/r[i]))
 0.570688989539
>&2*pi*integrate(asin(z),z,0,1) / (2*pi) | factor, ...
>   %(), %->"°"
 
                                 pi - 2
                                 ------
                                   2
 
 0.570796326795
 32.7042204869°

The analytic surface integral is done by parameterizing the surface using

\(g(z,t) = [\sqrt{1-z^2}\cos(t),\sqrt{1-z^2} \sin(t),z]\)

We can compute the Gram determinant of this parameterization, which is actually 1 (or use a geometric argument with the same result). Thus, for any function on the hemisphere H which depends on the height z only, say f(x,y,z) = h(z), we get

\(\int_H f(x,y,z) \,dS = 2 \pi \int\limits_0^1 h(z) \,dz\)

The rejection trick works only on symmetric surfaces like this one, where we can easily project orthogonally and know the distance to the surface. Otherwise, we have to distribute the random points in parameter space in accordance with the surface measure. This can be very tricky.
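
For the hemisphere above, however, the parameter approach is easy, because the Gram determinant is 1: uniform parameters (z,t) already give uniformly distributed points on the surface. Here is a sketch of that idea (in Python rather than EMT, purely as my own illustration):

import numpy as np

rng = np.random.default_rng()
N = 1_000_000

# Parameterize the hemisphere by (z,t). The surface element is dS = dz dt
# (Gram determinant 1), so uniform z in [0,1] and t in [0,2*pi) give
# uniformly distributed points on the hemisphere.
z = rng.random(N)
t = 2 * np.pi * rng.random(N)
x = np.sqrt(1 - z**2) * np.cos(t)   # the points (x,y,z) on the surface,
y = np.sqrt(1 - z**2) * np.sin(t)   # not needed for the latitude itself

# Mean geographical latitude; compare with (pi-2)/2 = 0.5708.
print(np.mean(np.arcsin(z)))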

Corona, Vaccination, and Ethics

This blog entry has to be in German. Just like the word "Querdenker" (lateral thinker), which has by now lost its innocent air of critical thinking and is more reminiscent of flat-earthers than of Kant and Schopenhauer. We all appreciate thinking when it actually takes place, and we tolerate other thoughts. That means we endure them. Harder to endure are the consequences of those thoughts. And even harder to endure is when lateral thinking mutates into lateral believing, which merely asserts and can no longer be reached by arguments. We have to see the world as it is before we start thinking about it.

Let us take a look.

  • The coronavirus is a novel virus against which we have almost no natural defenses. This means that almost everyone who comes into contact with the virus falls ill. It is not possible to protect yourself from infection by boosting your immune system, by a special diet, or by a healthy lifestyle.
  • The virus has found an optimal strategy for spreading. It is neither too deadly, like the Ebola virus, nor does it rely on a rarer route of transmission, like HIV. The period in which it shows no symptoms yet is already contagious is also long enough. In addition, in a large share of the infected it takes an almost unnoticed course, but is passed on nevertheless.
  • The virus kills the weak and the old in particular. The risk factors include things that cannot be influenced, such as cancer treatments, diabetes of all types, and age. But it also hits the strong and the young, often with surprising severity and long-term effects.
  • Treating the severe cases is enormously demanding. An intensive care bed is a bed with intensive staffing needs. This is even more true for artificial ventilation. Anyone who wants to learn more about this does not have to search far on the web or on TV.
  • An infection frequently has long-term consequences. We do not know exactly how frequently, but it is far more drastic than with influenza. Even chickenpox cannot compete.
  • The virus mutates. The more slowly it is fought, the more it will mutate. Variants will arrive that spread more successfully, i.e., that are even more infectious. The vaccinations, too, will be circumvented by the virus. The slower and more lax the response, the worse the situation will become.
  • Many people fall for the prevention paradox. After measures have worked, there are always know-it-alls who claim that all the alarmism was unnecessary. It wasn't that bad, was it? With the coronavirus, however, we know exactly how it ends if nothing is done. That has already been tried, and we do not have to try it again.

What do we conclude from this? It is best to ask yourself how you would have managed the crisis if you had had to carry that responsibility. There are several impossible scenarios, all of which have already been thought through.

  • We simply let everything run its course. That means we sacrifice the old and the weak for the benefit of the rest of the population. Although this is completely unthinkable ethically, politically, and practically, a variant of this "solution" is brought up again and again. The death rate would indeed be "only" up to a third higher. But the large number of sick people who could not be cared for and who would have to suffer quietly and alone would be unbearable. Who visits a contagious, terminally ill person? The economy would collapse as well, along with any cohesion in the population. The Amish in the USA seem to be "mastering" the crisis in exactly this way. Exact details about the internals are not known; one dies in the bosom of the community. But I wonder what the Amish do about appendicitis or accidents involving children. Nothing as well? In Germany that would be a criminal offense.
  • We isolate the weak and the old completely and otherwise let everything run. This variant assumes that the virus disappears from the rest of the population after a certain time; otherwise the old would have to be isolated forever. That may or may not be the case. The common cold and influenza viruses do not disappear either; they mutate and come back every year. Moreover, this strategy is based on the unethical notion that risk groups, including young people, withdraw from life until everything is over.
  • We close the borders. We did in fact do that, though not completely and preventively, but depending on the situation. In today's world it is no longer possible to isolate yourself as a country. The isolation would have to be complete and clinical if it is to work, because the virus is transmitted too easily. Goods would have to be transloaded at the borders, and necessary business trips would have to be combined with long quarantine camps. All of that is inconceivable. Even if the political will of a fascistoid government could bring something like that about, it would probably always be too late anyway. Our world has become far too densely populated, and its connections far too global, to withstand such a radical cure.
  • We isolate only the sick. We are already doing that: people who test positive must quarantine now as well. But this only works once the sick know about their illness; before that, they can pass the virus on. This solution therefore simply does not work. A quarantine organized in a civil manner is also not easy to monitor.

You can spin these thoughts further as much as you like. In the end you always arrive at exactly the measures that have now been taken, simply because there is no other way, not out of dictatorial cruelty.

  • Contact restrictions and distancing rules, up to and including a shutdown, together with testing. Of course, these measures only delay the spread of the virus; they merely buy time. That relieves the hospitals and their staff and makes humane care for the severely ill possible. In addition, applying them more strictly in places where it is necessary protects vulnerable groups from the virus. The measures naturally throw sand into the gears of the economy. But the ethical trade-off should nevertheless be clear.
  • Vaccinations. Vaccinations protect the individual from severe courses of the disease and from extreme suffering. They do not do so with certainty, but with a probability that is much, much higher than that of suffering from vaccination side effects. In this respect they are better than most other medicines. Even more important, however, is that vaccinations protect the others. After a vaccination the viral load is very much lower in the vast majority of people, so infections become much rarer. Vaccinations are the means to break pandemic waves. Ethically they are imperative, because some people cannot be vaccinated or the vaccination does not work for them. By the way, will those who refuse vaccination also refuse the intensive care bed when things get serious? Do they refuse their children the vaccination against polio, measles, or tetanus?

I see no real alternative to the measures that have been taken, and in the extreme case I argue for mandatory vaccination, should the virus continue to gain momentum.

Don’t talk about „Chances“!

I came across a YouTube video featuring another well-known paradoxical problem that is completely bogus. The video is done in a very stressful style, with a way of speaking that I cannot stand for very long. I am still not sure, but I think the author just wanted to confuse.

The problem is rather simple. Assume you pick two strangers on the streets of New York and ask them to play a game: They should open their purses and count the money they are carrying. The one with the smaller sum wins the money of the one with the larger sum.

The confusion is now spread by claiming that each of the two strangers must be interested in the deal because the possible loss is always less than the possible win by the very nature of the game.

That’s nonsense. The argument falls apart if you think of a person with an empty purse, who would be overjoyed to play this game, while someone who just picked up a large amount of money from the bank should be very hesitant to make the deal.

The reason for the confusion is that it does not make sense to talk about an "expected" outcome without assuming an a priori distribution. I have written about this before in connection with the two-envelope problem. This is just a variation of it.

The best way to avoid this error is to think of a simulation of the experiment on a computer. For this, you would have to fix a distribution which determines the amount of money that you put into the purses of the two strangers. With that knowledge, you can compute the expected outcome of the experiment. Without that knowledge, it does not make sense to talk about „expectation“.
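
To make this concrete, here is a minimal sketch of such a simulation (Python, my own addition; the exponential prior with mean 50 is an arbitrary assumption, just to make the experiment well defined):

import numpy as np

rng = np.random.default_rng()
N = 1_000_000

# An a priori distribution for the money in each purse (arbitrary choice).
a = rng.exponential(scale=50.0, size=N)
b = rng.exponential(scale=50.0, size=N)

# The stranger with the smaller amount wins the other one's money.
gain_a = np.where(a < b, b, np.where(a > b, -a, 0.0))

# With this symmetric prior the expected gain is 0; without *some* prior,
# the "expected outcome" is not even defined.
print(np.mean(gain_a))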

ArXiv.com – My Latex files are none of your business!

I have used the ArXiv only once so far, for a paper on optimization which is too small to be published elsewhere. I thought it might be a good idea to upload my script on complex functions from the class I taught in 2020/2021. But I can't do that.

The ArXiv wants my Latex code, my images, and all the files I have. Uploading just the PDF is rejected, because the PDF is detected to have been produced by Latex. The server wants to produce its own version of the PDF from my files. I do not agree to that procedure.

The final version of the script is the PDF. I do not want anybody else to produce their own version of my work. I have no problem sending source code to colleagues to be used in their own classes, but not to the public. Even though this script will be completely free and open, I do not want to give away control over its content.

Moreover, the script failed to compile on the server anyway. I tried it to see if the idea works at all, and it did not work as expected. I have my own organization of the files, and I even changed that organization for the test. Of course, everything compiles here without problems.

My Latex files are none of your business, ArXiv.

Simulating and solving Penney’s game

I recently stumbled across some YouTube videos about Penney's game. Here is one of them. The problem was actually new to me, like almost all problems in mathematics. There simply are too many. It is easier to come up with a problem than with a solution.

According to Wikipedia, Walter Penney published the problem in the Journal of Recreational Mathematics in 1969. The article seems to be hard to get hold of, so let us simply start an analysis ourselves. After all, that sounds like a lot of fun, and I have written about a similar problem in this blog before.

Assume we toss coins with outcome 0 and 1 at equal probability. So we get a sequence of tosses like

\(0,1,0,0,1,1,1,0,1,0,\ldots\)

We are studying the following game between two players:

  • Player A selects a triplet of tosses (like 0,0,1), and player B another triplet (like 1,0,0).
  • The player whose triplet appears first in the sequence of tosses wins.

Given the sequence above and the example triplets selected above, player B would win: his triplet 1,0,0 appears already after four tosses.

If both players select their triplet secretly this is obviously a fair game.

There is a counterintuitive fact here: at first sight, the choice of the triplet should not matter at all. Every triplet has the same chance, namely 1/8, of occupying any fixed position in the sequence of tosses, so selecting (1,0,1) seems no better than (1,1,1).

To understand this, we need to see the tossing as a sequence of triplets. In the example above the sequence is the following:

\((0,1,0) \to (1,0,0) \to (0,0,1) \to (0,1,1) \to (1,1,1) \to \ldots\)

All triplets have the same chance (namely 1/8) to appear at the start of the sequence. And any triplet T has exactly two triplets S1, S2 (possibly including T itself) which can lead to it. Since S1 and S2 each have the same chance (namely 1/8) to appear at the n-th position of the sequence, and T follows each of them with probability 1/2, the chance for T to appear at the (n+1)-th position is also 1/8, by induction.

After this first excursion into the mathematics of random walks, we need to come back to the third rule of Penney’s Game.

  • The player B selects his triplet after seeing the selected triplet of player A.

Now, there is no reason why player B shouldn't have an advantage. After all, he can simply take the same triplet and force a draw. If he were forced to take another triplet, he might even be at a disadvantage. But, indeed, he has a big advantage of at least 2:1. A poor selection by player A can make A's chances even worse.

Let us try to understand this. This is a case of non-transitive ordering.

  • Let us write S>T if the triplet S is more likely to appear before triplet T in the toss sequence when the tosses are started with a random triplet.

Then it is not true that

\(T \ge S \,{\rm and}\, S \ge U \Longrightarrow T \ge U\)

If it were true, the ordering would be complete and player A could select the best triplet, or at least a triplet that cannot be beaten by the selection of player B.

It turns out that for each selected triplet T of player A, there is a selection S which is better than T (S>T). This makes the game a win for player B.

This is counterintuitive again. A consequence is that there is a sequence of triplets with the property

\(T_1 < T_2, \quad T_2 < T_3, \ldots \quad , T_{n-1}<T_n, \quad T_n < T_1\)

In this game, the simple rule to find a better triplet is the following:

  • If player A selects T=(X,Y,Z), player B selects S=(YN,X,Y), where YN is the opposite of Y, i.e., YN=0 if Y=1 and YN=1 if Y=0.

E.g., if player A selects (0,1,0) or (0,1,1), player B selects (0,0,1). For another example, if player A selects (1,1,0) or (1,1,1) player B selects (0,1,1).
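
Before building a mathematical model, we can check the rule by brute force. The following sketch (Python, my own addition) plays the game repeatedly for a given pair of triplets:

import random

def first_winner(a, b):
    # Toss coins until triplet a or triplet b shows up; return the winner.
    last = []
    while True:
        last.append(random.randint(0, 1))
        if len(last) > 3:
            last.pop(0)
        if tuple(last) == a:
            return "A"
        if tuple(last) == b:
            return "B"

def penney_response(a):
    # Player B's rule: for A = (X, Y, Z) choose (1-Y, X, Y).
    x, y, _ = a
    return (1 - y, x, y)

A = (0, 1, 0)
B = penney_response(A)                    # gives (0, 0, 1)
N = 100_000
wins = sum(first_winner(A, B) == "B" for _ in range(N))
print(B, wins / N)                        # close to 2/3 for this pair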

How do we understand this? We use a general mathematical model.

A way to understand this is by considering random walks on a directed graph. The graph in this game has 8 vertices, the 8 possible triplets, and 16 edges, i.e., pairs of vertices, each with associated probability 1/2. Given any triplet, there are two possible next tosses, which lead to two possible edges. (Note that some edges go from a triplet to itself!)

E.g., there are two edges leading away from (1,1,0),

\((1,1,0) \to (1,0,0), \quad (1,1,0) \to (1,0,1)\)

or, from (1,1,1)

\((1,1,1) \to (1,1,0), \quad (1,1,1) \to (1,1,1)\)

One mathematical model for this uses an adjacency matrix M containing the transition probabilities from vertex to vertex.

  • We set M(i,j) to the probability to go from triplet j to triplet i (i.e., to reach triplet i from triplet j) in one toss. So we have a matrix M of transition probabilities.

Now imagine starting a lot of experiments by walking from a start vertex to subsequent vertices with the given probabilities. The experiments select the start vertex according to a probability vector P, i.e., vertex i is selected with probability P(i).

  • If the distribution of vertices in the experiments is P at some step, then the matrix product MP gives the distribution of probabilities at the next step.

Let us do a test in Euler Math Toolbox. First, we generate the adjacency matrix M for our game. Then we start at one specific node and compute the distribution after three steps.

>function makeM () ...
$M = zeros(8,8);
$for i1=0 to 1;
$  for i2=0 to 1;
$    for i3=0 to 1;
$      for j1=0 to 1;
$        for j2=0 to 1;
$          for j3=0 to 1;
$            i=4*i1+2*i2+i3;
$            j=4*j1+2*j2+j3;
$            if i2==j1 and i3==j2 then
$              M[j+1,i+1]=1/2;
$            endif;
$          end;
$        end;
$      end;
$    end;
$  end;
$end;
$return M;
$endfunction
>M=makeM;
>shortest M
    0.5      0      0      0    0.5      0      0      0 
    0.5      0      0      0    0.5      0      0      0 
      0    0.5      0      0      0    0.5      0      0 
      0    0.5      0      0      0    0.5      0      0 
      0      0    0.5      0      0      0    0.5      0 
      0      0    0.5      0      0      0    0.5      0 
      0      0      0    0.5      0      0      0    0.5 
      0      0      0    0.5      0      0      0    0.5 
>p=zeros(8)'; p[1]=1; p'
 [1,  0,  0,  0,  0,  0,  0,  0]
>p=M.p; p'
 [0.5,  0.5,  0,  0,  0,  0,  0,  0]
>p=M.p; p'
 [0.25,  0.25,  0.25,  0.25,  0,  0,  0,  0]
>p=M.p; p'
 [0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125]

The function to generate the matrix M is not the most elegant one; it might be better to use recursion instead of loops. After we have generated the matrix, we start with a "distribution" which puts everything into the first triplet (which is (0,0,0)). After only three steps, we see that the probability to reach each triplet is equal.

Let us now model Penney’s game.

We make the two triplets T and S (vertex numbers i and j) absorbing, so that they lead nowhere else. If one of these states is reached, T or S has appeared in the toss sequence and the game is decided.

We modify the adjacency matrix accordingly: we delete all edges leading away from T and S and connect T and S to themselves with probability 1. This gives a modified adjacency matrix M_{i,j}. We now have to look at the following limit.

\(P = \lim_{n \to \infty} M_{i,j}^n P_0\)

with

\(P_0 = (1/8,\ldots,1/8)^T\)

which is our start distribution.

Let us simulate that in Euler Math Toolbox. We select the triplets T=(0,1,0) (third element of p) and S=(0,0,1) (second element of p) which according to our rule should be superior.

>M=makeM;
>M[,2]=0; M[2,2]=1;
>M[,3]=0; M[3,3]=1;
>shortest M
    0.5      0      0      0    0.5      0      0      0 
    0.5      1      0      0    0.5      0      0      0 
      0      0      1      0      0    0.5      0      0 
      0      0      0      0      0    0.5      0      0 
      0      0      0      0      0      0    0.5      0 
      0      0      0      0      0      0    0.5      0 
      0      0      0    0.5      0      0      0    0.5 
      0      0      0    0.5      0      0      0    0.5 
>p=ones(8)'/8; p'
 [0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125,  0.125]
>p=M.p; p'
 [0.125,  0.25,  0.1875,  0.0625,  0.0625,  0.0625,  0.125,  0.125]
>p=M.p; p'
 [0.09375,  0.34375,  0.21875,  0.03125,  0.0625,  0.0625,  0.09375,
 0.09375]
>loop 1 to 100; p=M.p; end;
>p'
 [0,  0.666667,  0.333333,  0,  0,  0,  0,  0]

Indeed, the two selected triplets catch all the distribution, and S=(0,0,1) is indeed twice as good as T=(0,1,0). I.e., player B will win twice as often.

Things get worse for T=(0,0,0) and S=(1,0,0). S is seven times as good as T! There are other combinations which are three times as good.

Due to the structure of the problem, the limit P will have only non-zero entries for T and S. The probability to be in any other triplet will tend to 0. T and S will catch everything.

But can we compute the winning chance for S without a simulation? For that we define

  • w(i) as the chance to reach S before T when starting at triplet i.

We are using the following properties of w.

  • w(i) = 1 for the vertex S with number i.
  • w(j) = 0 for the vertex T with number j.
  • w satisfies

\(w = M_{i,j}^T \, w\)

We are searching for a specific eigenvector. The following EMT function computes w, given an adjacency matrix M and the indices i and j, and returns the probability to reach vertex i before vertex j when starting from a random triplet.

>function getw (M,i,j) ...
$if i==j then return 0; endif;
$Mt=M;
$Mt[,i]=0; Mt[i,i]=1;
$Mt[,j]=0; Mt[j,j]=1;
$V = eigenspace(Mt',1);
$c = V[[i,j]]\[1;0];
$return sum((V.c)')/cols(Mt);
$endfunction
>M = makeM();
% Let us try our simulated examples.
>fraction getw(M,2,3), fraction getw(M,5,1)
 2/3
 7/8

The simulated example from above works, as well as another example which takes T=(0,0,0) and S=(1,0,0).

For the next step, we compute all probabilities that the triplet number i appears before the triplet number j when starting from a random triplet.

>function solvePG () ...
$M = makeM();
$W=zeros(8,8);
$for i=1 to 8;
$  for j=1 to 8;
$    W[i,j]=getw(M,i,j);
$  end;
$end;
$return W;
$endfunction
>W=solvePG();
>fracformat(7); W, defformat;
      0    1/2    2/5    2/5    1/8   5/12   3/10    1/2 
    1/2      0    2/3    2/3    1/4    5/8    1/2   7/10 
    3/5    1/3      0    1/2    1/2    1/2    3/8   7/12 
    3/5    1/3    1/2      0    1/2    1/2    3/4    7/8 
    7/8    3/4    1/2    1/2      0    1/2    1/3    3/5 
   7/12    3/8    1/2    1/2    1/2      0    1/3    3/5 
   7/10    1/2    5/8    1/4    2/3    2/3      0    1/2 
    1/2   3/10   5/12    1/8    2/5    2/5    1/2      0 

The non-transitivity of the game shows in the fact that each column of W contains a value larger than 1/2: every triplet can be beaten. We can select this value and print the corresponding triplet in human-readable form.

>function printtriplet (i) ...
$s = ")"; i=i-1;
$if mod(i,2) then s=",1"+s; else s=",0"+s; endif;
$i = floor(i/2);
$if mod(i,2) then s=",1"+s; else s=",0"+s; endif;
$i = floor(i/2);
$if mod(i,2) then s="(1"+s; else s="(0"+s; endif;
$return s;
$endfunction
>function printPG (W) ...
$for j=1 to 8;
$  w=W[,j]';
$  e=extrema(w);
$  printtriplet(j) + " -> " + printtriplet(e[4]),
$end;
$endfunction
% We get the well known solution of the game.
>printPG(W)
 (0,0,0) -> (1,0,0)
 (0,0,1) -> (1,0,0)
 (0,1,0) -> (0,0,1)
 (0,1,1) -> (0,0,1)
 (1,0,0) -> (1,1,0)
 (1,0,1) -> (1,1,0)
 (1,1,0) -> (0,1,1)
 (1,1,1) -> (0,1,1)

This is exactly the well-known rule that you find also in the Wikipedia article.

Understand Bayesian arguments!

I just stumbled over another problem of probability theory on the channel MindYourDecisions. The video is rather old already, but mathematical problems never age.

"One day a nursery has 3 boys and a number of girls, plus an additional birth at night. The next day one of these children is picked at random, and it turns out to be a boy. What is the probability that the child born at night is a boy?"

I hate the way this question about a probability is posed. For a casual audience it does not make much sense. They will just answer 50%, and I would not be angry about that answer. It is sort of correct. You need a mathematical education even to understand what is being asked. Let me explain.

Everyone who has not met this kind of problem before will argue that the probability of the night child being a boy is 50%. That is true. The fact that the survey picks a boy the next day does not change this at all. I mean, the randomly picked boy was a boy. So what?

The problem is that the question is asking for something more involved. It asks you to consider all possible random experiments, i.e., you assume a random child at night and a random pick. Then you discard those experiments that do not fit the story.

An extreme case would be that there are no boys at daytime. Then a girl at night would not fit to the story, and all experiments that fit feature a boy at night. The Bayesian mathematics calls this a 100% probability for a boy.

For the actual case of three boys at daytime, let us assume g girls. I now follow the explanation of the video, but with different wording and a rather different way of thinking.

There are two cases now.

Case 1: We have a boy at night. We expect this case to happen in half of the experiments. The chance to pick a boy the next day is then 4/(4+g). We throw away all experiments where a girl is picked, i.e., g/(4+g) of them on average.

Case 2: We have a girl at night. Again this happens in half of the experiments on average. The chance to pick a boy next day is 3/(4+g). We throw away all experiments where a girl is picked.

Now imagine both cases together. Among the experiments that we did not throw away, case 1 outnumbers case 2 in the ratio 4 to 3. Thus, Bayesian mathematics says that the probability of case 1 is 4/7.

There are other ways of writing this up. But they are only shortcuts of the above thought experiment. And for me, defining a precise experiment is the clearest way to think about such problems.
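
Here is a minimal sketch of such an experiment (Python, my own addition; the number of girls g is a free parameter and does not change the result):

import random

def simulate(g, trials=1_000_000):
    # 3 boys and g girls at daytime, one random birth at night.
    # Keep only the experiments in which a randomly picked child is a boy,
    # and count how often the night child is a boy among the kept ones.
    kept = night_boys = 0
    for _ in range(trials):
        night_is_boy = random.random() < 0.5
        boys = 3 + night_is_boy
        if random.randrange(4 + g) < boys:   # the picked child is a boy
            kept += 1
            night_boys += night_is_boy
    return night_boys / kept

print(simulate(g=3))   # close to 4/7 = 0.571..., whatever g is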

The Mathematics of DOF and Bokeh of an Ideal Lens

I have never written in this blog about the optical computations that apply to an "ideal lens", although there is a notebook for Euler Math Toolbox and an Euler file ("dof.e") that can be loaded into the program. The topic, however, is quite interesting from a mathematical point of view. It involves modelling the lens in a thought experiment ("Gedankenexperiment") and deriving formulas from the geometry of the model. So it makes a good application of simple algebra and geometry for high school classes.

An ideal lens is able to focus light rays that emerge from a point of the scene (the object) into another point behind the lens (the image of the object). As you see in the figure above, the lens should ideally focus all objects on a plane with distance d (object distance) onto a plane with distance f(d) (focussed image distance) behind the lens. We denote it by f(d) because it depends on d. The line through the object and its image goes through the midpoint of the lens. The figure shows the situation (green and blue) for two objects.

If the object is at infinity, its emitted rays of light are parallel. Then the image is on a plane with distance f from the lens. This distance is called the focal length of the lens. Again, the figure shows two situations (green and blue). Actually, we could write

\(f = f(\infty)\)

In the first figure, the object is not at infinity, but closer. The lens will then not focus the object at distance f, but at the larger distance f(d). In a camera, the lens has to be moved further away from the sensor or film to get a sharp image.

Our first aim is to prove the relation

\(\dfrac{1}{f} = \dfrac{1}{f(d)} + \dfrac{1}{d}\)

For this, we only want to use geometric properties of our „ideal lens“.

Our first observation is that we can find the focal length f in the figure above: select the ray that leaves the object (at distance d) parallel to the axis, i.e., perpendicular to the lens plane, and is then refracted to the image at distance f(d). This refracted ray crosses the central axis of the lens at distance f, because we assumed that rays arriving parallel to the axis are focussed at distance f, as in the second figure above.

At that point, we draw the situation in another way, emphasizing the things we need for our geometric argument.

Looking at the blue and the gray triangles in the figure, we easily see

\(\dfrac{f(d)}{f} = \dfrac{d+f(d)}{d} = 1 + \dfrac{f(d)}{d}\)

Dividing by f(d) we get our simple and easy to remember formula.

There are some interesting special cases. One is the case when the object is at infinity. Then the image distance is just f as in the figure above.

\(d = \infty \quad \to \quad f(d)=f\)

Another one is

\(f(d) = d = 2f.\)

That is a macro lens with 1:1 magnification: if f(d) = d, our formula gives 1/f = 2/d, hence d = 2f. The size of an object is then identical to the size of the image of the object.

We can also see that it is impossible to focus on objects that are closer to the lens than the focal length f.

Now, we can start to proceed towards the concept of „depth of field“ (DOF). The first step is to observe that an object will not be focussed correctly if the lens has the wrong distance from the sensor (or film).

So assume the object distance is d_o, but we have focussed on objects at a distance d. The objects at distance d_o will then not be in focus. Instead, the rays emitted by a point of the object form a small circle on the sensor.

The diameter c of this circle is, by another geometrical observation,

\(c = l \, \dfrac{|f(d)-f(d_o)|}{f(d_o)}\)

Here, l is the diameter of the lens. To see this, consider the stretching factor between the two green triangles behind the lens (i.e., left of the lens in the figure).

For a realistic measure, we use the "aperture" of the lens. It is usually given as an F-stop number, which we denote by A_F to avoid confusion with the focal length f.

\(A_F = \dfrac{f}{l}\)

The aperture measures the amount of light that reaches the sensor per area. For different lenses, this amount is the same at the same aperture: the light gathered grows with the lens area, i.e., with l², while the image area it is spread over grows with f², so the illumination depends only on the ratio l/f. Thus, the aperture can be used to get the correct exposure independently of the lens. Unfortunately, it is inverted in an unintuitive way: a larger A_F yields less light on the sensor.

Now we are going to use EMT and Maxima to compute a formula for c which depends on the object distance, the focussed distance, the focal length and the aperture.

>equ1 &= 1/F = 1/DF+1/D
 
                               1   1    1
                               - = -- + -
                               F   DF   D
 
>function f(F,D) &= DF with solve(equ1,DF)
 
                                   D F
                                - -----
                                  F - D
 
>function c(F,DO,D,AF) &= factor(F / AF * (f(F,D)-f(F,DO))/f(F,DO))
 
                                        2
                              (D - DO) F
                             -------------
                             AF DO (F - D)

Thus

\(c = \dfrac{|d-d_o| \, f^2}{A_F d_o (d-f)}\)

That looks terribly complicated. So let us study special cases and find some simple relations which we can use for photography.

Assume we focus on an object at distance d_o. We are interested in the Bokeh of the photographic image, i.e., the blurriness of objects at infinity or at a far distance. For distances much larger than the focal length, the formula is almost symmetric in d and d_o, so we can simply study the behavior of c for large d and read d_o as the distance of the focussed subject. We get

\(c_\infty = \dfrac{f^2}{A_F \, d_o}\)

That's much easier. But for practical purposes we have to take into account that we need to double our focal length if we want to photograph the same object at twice the distance and at the same size in the picture. Thus, a more practical rule of thumb is

\(c_\infty \sim \dfrac{f}{A_F}\)

It applies to photographing the same object at the same size in the picture. This is a very natural formula if you think of the focal length as a magnification: the circles of confusion are simply magnified with a longer focal length.

We also learn that, for the same object at the same size in the image, the Bokeh gets twice as strong with twice the focal length or with half the F-stop number. In other words, you get the same Bokeh for a 50mm lens at F2 as for a 100mm lens at F4. For the Bokeh effect, it is usually cheaper to use a longer lens than an expensive, fast medium-length lens.
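
As a quick plausibility check, here is a tiny computation (Python, my own sketch; the subject distances of 2m and 4m are arbitrary example values, chosen so that the subject has the same size in both pictures):

def c_infinity(f, af, d_o):
    # Blur circle of the background at infinity when focussed at d_o,
    # c = f^2 / (A_F * d_o); all lengths in mm.
    return f**2 / (af * d_o)

print(c_infinity(f=50, af=2, d_o=2000))    # 50mm at F2, subject at 2m
print(c_infinity(f=100, af=4, d_o=4000))   # 100mm at F4, subject at 4m
# Both give 0.625mm: the same Bokeh, as claimed.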

For depth of field (DOF), we want to know what objects are at a distance where the circle of confusion does not exceed a maximal value. In the old film days, this value was

\(c_\text{max} = 0.03 \text{mm}\)

It is possible to solve this exactly, but we are only interested in rules of thumb.

First, let us assume we take an image of the same object with different focal lengths, such that the object has the same size in the picture. We need to change our distance accordingly, so f/d is almost constant. Moreover, we can assume that d is much larger than f, and thus d-f and d_o are both approximately equal to d. Then, for the object distances d_o which yield the maximal circle of confusion, we simply get

\(|d-d_o| \sim A_F\)

I.e., the DOF does not change for the same subject at the same size with different lenses unless we change the aperture.

There is also the topic of the "hyperfocal" distance. We want to determine the shortest distance d_o to focus at such that infinity is still reasonably sharp. By our formula above for the circle of confusion at infinity, we get

\(d_o = \dfrac{f^2}{c_\text{max} \, A_F}\)

Can we derive a practical rule of thumb from this? Not really. We can provide some examples, like focussing at about 2.4m for F8 and a 24mm lens. The usual advice to "focus 1/3 into the scene" is obviously only a vague guideline. Maybe it yields the most pleasing unsharp regions. But I tend to check the image on the camera.
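
For concreteness, here is a small sketch (Python, my own addition) evaluating the hyperfocal formula with c_max = 0.03mm; the focal lengths and apertures are just example values:

def hyperfocal(f, af, c_max=0.03):
    # Shortest focus distance (in mm) such that infinity stays within c_max.
    return f**2 / (c_max * af)

for f, af in [(24, 8), (35, 8), (50, 4)]:
    print(f, "mm at F", af, ":", round(hyperfocal(f, af) / 1000, 1), "m")
# 24mm at F8 gives about 2.4m, 35mm at F8 about 5.1m, 50mm at F4 about 20.8m.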

It is interesting how a "crop factor" (a.k.a. form factor) changes the mathematics. As an example, let us assume we use a micro four thirds (MFT) camera instead of a "full frame" camera. Then we have a crop factor f_c of 2. That means we need a 25mm lens instead of a 50mm lens to get the same image. The formula yields 1/4 of the circle of confusion. But since our sensor is also half the size, the blur relative to the frame is only halved. Expressed in terms of the full-frame equivalent focal length f, this gives the equivalent

\( c_\text{equiv} = \dfrac{|d-d_o| \, f^2}{f_c A_F d_o (d-f)} \)

Effectively, the DOF is multiplied by f_c, and the blurriness of the Bokeh at infinity is divided by f_c. For the Bokeh, under the condition that we take the same object at the same size in the picture, we now get

\(c_{\infty,\text{equiv}} \sim \dfrac{f}{f_c \, A_F}.\)

To get the same Bokeh on MFT, we need twice the aperture. So you need to replace an 85mm lens at F2.8 on a full frame camera with a 42.5mm lens at F1.4 on MFT. The best affordable lens is an F1.7, which is equivalent to about F3.5 on the full frame camera.

But there is also an interesting effect which puts MFT into a better light. Assume, you need enough DOF to capture a bird in flight and get it safely into focus. In this case, you might use F5 on a 70mm lens (140mm equivalent) on MFT at 1/250 seconds exposure. On the full frame equivalent, you will have to use F10 (F11 actually) and thus crank up ISO two stops to keep 1/250 still. This loses the advantage of the larger sensor in terms of noise. Note that the Bokeh at infinity remains the same if you do so, since you doubled the F-stop in change of a factor f_c=2. So, MFT is working just as well here. You just use more compact and cheaper equipment.