## Recursion Formulas

I do not know if I ever wrote about this. But it is a well known theory, anyways. To find a formula for the Fibonacci numbers

\(F_{n+1} = F_n + F_{n-1}, \quad F_0 = 0, \quad F_1 = 1\)

you can use the „Ansatz“

\(F_n = c^n\)

you will immediately find that c must satisfy

\(c^2 = c + 1\)

The two solutions for this equation can both generate the recursion formulas as well as any linear combination of them. So our Ansatz yields

\(F_n = a c_1^n + b c_2^n = a \left(\dfrac{1+\sqrt{2}}{2}\right)^n + b \left(\dfrac{1-\sqrt{2}}{2}\right)^n\)

Observe the logic that we followed here! All we know by know is that if F(n) is defined by this formula, it will satisfy the recursion formula for the Fibonacci sequence. On the other hand, it is easy to find a and b such that F(0)=0, F(1)=1. Then we know that we have indeed found the correct formula for the sequence. I.e.,

\(F_n = a c_1^n + b c_2^n = \dfrac{1}{\sqrt{5}} \left( \left(\dfrac{1+\sqrt{2}}{2}\right)^n – \left(\dfrac{1-\sqrt{2}}{2}\right)^n\right)\)

We have also shown that any sequence which satisfies the recursion must be of that form with proper constants a and b. This is so, since we can determine a and b for any start values.

Due the fact that c(2) is between -1 and 0, we get

\(F_n = \text{round } \dfrac{1}{\sqrt{5}} \left(\dfrac{1+\sqrt{2}}{2}\right)^n\)

Now, if we take other start values, what happens? E.g., consider the numbers

\(L_n = \left(\dfrac{1+\sqrt{2}}{2}\right)^n + \left(\dfrac{1-\sqrt{2}}{2}\right)^n\)

The start values are then L(0)=2 and L(1)=1. We simply get another integer sequence. And we get the strange fact that

$latex \left(\dfrac{1+\sqrt{2}}{2}\right)^n$

gets closer and closer to integers. This has been observed by Yves Meyer and mentioned in a blog post recently by J. D. Cook. So I came to write this short article. The difference, computed in Euler Math Toolbox, is as follows

>p1 = (1+sqrt(5))/2 1.61803398875 >p2 = (1-sqrt(5))/2 -0.61803398875 >n=0:19; longest (p1^n)' 1 1.618033988749895 2.618033988749895 4.23606797749979 6.854101966249686 11.09016994374948 17.94427190999916 29.03444185374864 46.97871376374781 76.01315561749645 122.9918693812443 199.0050249987407 321.996894379985 521.0019193787257 842.9988137587108 1364.000733137437 2206.999546896147 3571.000280033584 5777.999826929732 9349.000106963316 >p1^n+p2^n [2, 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, 322, 521, 843, 1364, 2207, 3571, 5778, 9349]

The difference to the next integer is of course a power of p2 goes to zero geometrically fast.

## Sensor Size and Image Quality

If you are confused about the effect the sensor size makes to your image quality you are not alone. In this post, I try to explain a good part of these problems. I try to do that without any math. But I assume some basic background in photography.

We wish to learn about the effect of a smaller sensor size to the depth of field, the background blurriness and the noise. Those are important ingredients for the image quality. For portraits, we wish to have a noise free and blurry background. For night shots, we wish to have as little noise as possible.

Let us do a thought experiment to understand the topic. Imagine, we could make the world smaller by a factor of 2 in all dimensions. We would shrink a full frame camera to a 4/3 camera. The shrunken camera is then said to have a crop factor of 2. Note that we shrink everything to half its size: the sensor width and height (e.g. from 36 x 24 to 18 x 12), the focal length (e.g. from 50mm to 25mm) and the open width of the aperture at any given F stop (e.g. from 5mm to 2.5mm).

First of all, the angle of view remains exactly the same for our 25mm lens compared to the 50mm in full frame. So the image shows the same things as before, even if we would not shrink the world.

Moreover, the light that comes through the lens is only 1/4 of the light, since our aperture has only 1/4 of the area than before. But the distance to the sensor plane is only 1/2, and so we get the same amount of light per square mm. If we used film we would get the same exposure. That explains why we compute the F stop by dividing the aperture by the focal length. This stays the same in our shrunken camera. Thus we use F8 no matter what size of camera we have.

But some important things change.

(1) Our light is on the same level for the smaller camera. So each pixel gets only 1/4 of the light. Remember that the pixels are smaller too. But the pixels produce the same amount of noise. The noise does not depend much on the size of the pixel. Thus the noise level in relation to the signal strength for each pixel increases by quite an amount. We can say by a factor 4. The image gets noisier.

(2) The objects do *not* shrink or get closer by a factor of 2. If they would we’d get the same depth of field and blurriness in our image. But a man in 10 meters distance is still the same in the same distance. So we have to change our focus by moving the lens further to the sensor plane. Of course, the camera will do that for us automatically. The effect of this is known to any photographer and is quite obvious: If we focus on something further away the background gets sharper. For practical purposes we can say it gets twice as sharp when the distance is doubled.

*What that means is that you get a less blurry background and more noise at the same settings. *

That’s bad. The good point is that you also get a larger depth of field around the object if you need that.

To compensate for this you can double the aperture size, i.e., increase by two stops (e.g. from F4 to F2), and decrease your ISO by two stops to compensate (e.g. from ISO 400 to ISO 100). Your sensor will then get the same amount of light and your noise will be the same. Moreover, your background blurriness is the same too. It is a bit complicated to compute, but your depth of field around your object will also be approximately the same. You get the same image quality.

*In terms of image quality for images with an object and a background, you can replace a full frame 80mm lens at ISO400 and F4 with a 40mm lens for your 4/3 system at ISO100 and F2. *

But, of course, your full frame lens might allow you to shoot at ISO100 and F2. You have no chance to cope with this on a 4/3 system. For you would need ISO25 and F1. The full frame camera offers more opportunities. It is also more bulky and expensive.

If you do not care about the background or you want more sharpness all across the frame the small camera seems to favor you. But you still have more noise at the same settings.

*In terms of noise for night photography or landscapes, you get the same noise with ISO400 on the full frame and with ISO100 on a 4/3 system. To compensate, you need more light by increasing the aperture or the exposure time.*

By the way, measurements at various camera sites in the net confirm this finding. With the same F stop on the same image, the full frame camera can be shot at ISO 800, the APS-C camera at ISO 400 and the 4/3 camera at ISO 200, and all will produce the same noise. Pictures will still look different due to the different background blurriness.

## About Vectorization

Me being the author of a matrix language, you will be surprised that I do no longer favor the matrix approach in education. In case you do not know what I am talking about: A matrix language avoids loops by vectorizing functions and operators.

Let me give you an example. In a classical language you would have to write something like the following.

>t=0:0.1:10; >s=zeros(size(t)); >for k=1 to length(t); s[k]=sin(t[k])*exp(-t[k]); end;

The syntax is from EMT, but it does not matter. You need to declare a vector, then fill it with a loop. In a matrix language, you can simply write

>t=0:0.1:10; >s=sin(t)*exp(-t);

The function sin() will operate on vectors and create a vector of the sine of all elements. Likewise, exp() and the operator * of the multiplication.

EMT has made that more comfortable than Matlab, allowing to combine column and row vectors to matrices, and introducing functions that map automatically to the elements of their arguments. For more, see the documentation of EMT.

This is very handy. It seems that with this trick you can write short code that is easily understandable even by non-programmers. But is that really so?

The problems start if you want to vectorize more complicated data structures, or want to do more complicated things than just applying a function to a vector. You will soon find that you have to use your pick of matrix language on a very high level of proficiency. Here is one example which is just a little above the basics: Create a vector that contains those elements of a vector where the next element is not smaller.

>x=normal(100); >x[nonzeros(diff(x)>=0)]

I bet it is easier to write a loop in an ordinary programming language than to find out how that works in your choice of matrix language and in an efficient way.

My key experience was the following task in Matlab: Read in the words in 10000 spam mails and 10000 good mails, and make a statistical function that can detect spam mail by the words in the mail. Yes, I could to it! It took a full day to find the proper Matlab tricks and get it to run under two minutes. The next day I tried in Java. It took me two hours to write the code using Java’s tokenizers and dictionaries. And the program finished the task in under 5 seconds! That typically happens if you use the wrong tool.

So, is this an argument to drop matrix languages altogether? No, certainly not. They are way too practical for simple purposes like handling simple data or a plotting a function. But you should not base the education in universities or colleges on tools like Matlab or EMT. Those tools go into the wrong direction if they are taken too seriously. I would even include R into this collection. But R is more often used for the specialized statistical functions it contains, so it has merits that no current programming language can provide.

You need to ground the base with a programming language like C, C++, Java or Python. There, the student can learn about vectors, matrices and loops to handle them. These languages do also contain advanced tools like dictionaries and other collections. And you can program your own data structures. They are also way more efficient than any of the all-in-one packages that come with a matrix language. Moreover, it is not at all difficult to create the matrix functionality in Java.

Python is a not the worst of ideas to use for education in programming. It is a basic language, and there are tools for all sorts of purposes, including numerical matrices, statistics or symbolic computations. It is not as basic as C++ or Java, however, and quite different from languages used in the industry. But it is, for sure, closer to those real world programs than Matlab.

## Making GIF Animations

I fixed „makeanimations.e“ in EMT.

The code for this is the following.

>function f(z,z0) := (z-z0)/(conj(z0)*z-1); >function plotcircle(z,r,z0,color=black) ... $t=linspace(0,2pi,500); w=f(z+r*exp(I*t),z0); $barstyle("#"); barcolor(color); $hold on; polygon(re(w),im(w),0); hold off; $endfunction >function plotall (d,z0=0.5) ... $fullwindow; setplot(1.05); clg; $plotcircle(0,1,0,lightgray); $plotcircle(0,1/3,z0,white); $c=exp(((0:5)/6+d)*2*pi*I)*2/3; $loop 1 to 6; plotcircle(c[#],1/3,z0,red); end; $endfunction >plotall (0) >load makeanimations Functions to make an animated GIF or an MPEG movie. >makegif("plotall",(0:0.01:1)/6,"animation",w=400,h=400)

## Image Processing and Data Evaluation in EMT

There is an example for Euler Math Toolbox (EMT) where I tried to fit curves to a real chain. I then fitted a parabola and a catenary to the points and explained a bit about the catenary.

The points where derived by hanging a real chain in front of the screen. As one user pointed out, this is an outdated method in the ages of digital images. Of course, now you can do that with images in EMT too.

First I downloaded a free image of an egg. If you have an image from a digital camera reduce it in size. We do not want to load a high res image into EMT. You can place the image into the notebook with

>loadimg("egg.jpg");

The image was originally about 944×993 pixels. It will be reduced in size for the notebook. loadimg() accepts a number of lines to be used for the image in height. For the blog, I reduced the image too.

Now the following lines of code will load the image into EMT as a matrix of RGB (red-green-blue) values.

>M=loadrgb("egg.jpg"); >size(M) [944, 993]

We can analyze the image in EMT. E.g., let us plot the average portion of red color in each column of the image.

>aspect(2); plot2d(sum(getred(M'))'/rows(M)):

We could also process the image. Let us extract the RGB values and manipulate them to put a warm tint to the image.

>{R,G,B}=getrgb(M); >function f(x,a) := x + a*x*(1-x) >insrgb(rgb(f(R,0.4),G,B)); >savergb(rgb(f(R,0.4),G,B),"egg2.png");

The insrgb() command inserts the RGB image into the notebook. The savergb() command saves the image. It looks like this.

You can also plot the image as a background image to your plot. The command is plotrgb(). It uses the current plot area as set by window(). In this blog, we make it simple and set a plot are with setplot(). We draw a coordinate system with xplot(). These are older and more basic functions than plot2d(). But plot2d() relies on these functions too.

>savergb(rgb(f(R,0.4),G,B),"egg2.png"); >aspect(1); setplot(0,1,0,1); plotrgb(M); xplot();

You will notice that the plot is not very efficient. That is why you need to reduce your digital image to 1MB or less. This problem might be addressed in further versions of EMT. E.g., we could read the image from a file or a list of loaded images for each plot. Currently, plotrgb(M) for matrices M is the way to go. Of course, you can also use EMT to reduce your image. One way would be to smoothen it with a two matrix fold and then to take every other point. In the example, we reduce the size by a factor of 3.

>E=ones(5,5)/25; >M1 = rgb(fold(R,E),fold(G,E),fold(B,E)); >M1=M1[2:3:rows(M1),2:3:cols(M1)]; size(M1) [313, 330] >savergb(M1,"egg2.png");

Note that the previous images of the egg were reduce for the blog. This is the original result of the reduction in EMT.

Assume we want to get points on the outline of the image. For this, we write a little code in EMT to get mouse clicks from the user.

>function getclicks () ... $global M; $setplot(0,1,0,1); plotrgb(M); xplot(); $hold on; $v=[]; $repeat; $ m=mouse("Click or press Return!"); $ until length(m)<2; $ v=v_m; $ mark(m[1],m[2]); $end; $return v; $endfunction >v=getclicks() 0.401925 0.0930601 0.249599 0.20031 0.131469 0.352635 0.066187 0.560917 0.112817 0.758318 0.28224 0.8951 0.569794 0.901318 0.768749 0.794068 0.88377 0.675938 0.956825 0.51895 0.959933 0.332429 0.900868 0.194092 0.762532 0.0868427 0.599326 0.0510929

## The Bayesian and the Frequentist – III

For another story, I read the following problem on a Google+ site yesterday: You have three cards in your pocket, one with two red sides, one with two blue sides and one with a red and a blue side. You take one out of your pocket, place it on the table, and it shows a blue side. What is the probability that the other side is also blue?

That fits into my Credo that the quickest way to get order into our thinking is to imagine or actually do a simulation. So let us do that first without trying any abstract thoughts about the problem.

The simulation is obviously to draw one of the three cards with equal probability. Then to take one of the sides with equal probability. Then we have to discard the cases where the side is red. This time, we make a small Java program. I try to design one that is easily understandable.

public class MC { static final int red=0,blue=1; public static int random (int n) { return (int)(Math.random()*n); } public static void main (String a[]) { int C[][]={{red,red},{blue,blue},{red,blue}}; int n=100000000; int found=0; int valid=0; for (int i=0; i<n; i++) { int card=random(3),side=random(2); int otherside=1-side; if (C[card][side]==blue) { valid++; if (C[card][otherside]==blue) found++; } } System.out.println("Found : "+(double)found/valid); } }

We can do 100 million simulations in about one seconde. The result is 0.66666164. Of course, we can also do it in EMT with its matrix language or with a TinyC program. With the matrix language, I can only get up to 10 million iterations. Extending the stack would help. But then it is still slower.

>C=[1,0,0;1,0,1] 1 0 0 1 0 1 >n=10000000; >i=intrandom(n,2); j=intrandom(n,3); >Sel=mget(C,i'|j'); Other=mget(C,(3-i)'|j'); >sum(Sel==1 && Other==1)/sum(Sel==1) 0.666787938524

So the result seems to be 2/3 probability for the other side to be blue. It is not difficult to understand. We always reject the card with two red sides. And we reject the card with one red side half of the time, but we always accept the card with two blue sides. Thus it is twice as likely to have blue on the other side.

The Bayesian trick does indeed help in this case. But it makes things more mysterious. Let us call BB the event that the card with two blue sides is drawn, and B the event that we see a blue side. Then

\(P(\text{BB}|\text{B}) = P(\text{B}|\text{BB}) \cdot \dfrac{P(\text{BB})}{P(\text{B})} = 1 \cdot \dfrac{1/3}{1/2} =\dfrac{2}{3}.\)

## The Bayesian and the Frequentist – II

This is not immediately related to Bayesian statistics. But it is a good argument for the frequentistic approach. I met the problem in a Youtube video, but the core of this problem is well known. The solutions are not always correct, nor is the linked video. Actually, I present two problems

**Problem 1**

You see two frogs and hear a croak. The croak can only come from a male. Then, what is the probability that one of the frogs is a female?

**Problem 2**

You meat a man, and he tells you that he has two kids, at least one of which is a boy. What is the probability of the other being a girl? And does the probability change if you know that the boy is born on a Tuesday?

Both problems are obviously only vaguely formulated. You need to make assumptions. E.g., in the following, let us assume that for each random frog or kid the probability of being male is 1/2. But, as we will see, the answer to Problem 1 depends on more assumptions. The intuitive answers to both problems tend to be completely wrong.

My point is that you need to imagine a Monte-Carlo simulation in both situations. If you cannot come up with an experiment any answer will be useless anyway. That is the heart of the frequentistic approach to statistics.

Let us start with Problem 2. So your simulation would assign genders to both kids by random, so BB, BF, FB and FF have the same probability 1/4. Note that there is BF and FB since we assign the gender to each kid separately. Then the simulation would discard the irrelevant case FF. We are left with BB, BF and FB with equal probability. Thus, 2/3 of the simulated cases contain a female. Here is a code for this in EMT.

>n=100000; Gender=(random(n,2)<0.5); Boys=sum(Gender)'; >sum(Boys>0 && Boys<2)/sum(Boys>0) 0.665760978144

As usual this code is in the style of a Matrix language. If you are not familiar with this you need to write a loop. Hint: „Gender“ is a nx2 matrix with 0=girl and 1=boy. „Boys“ is a vector that contains the number of boys in each of the 100 thousand samples.

Now, what changes if we know that boy is born on Tuesday? In our simulation, we would have to assign birth dates to the boys. We make the assumption that each day has the same probability. We discard every pair that has no Tuesday born boy. Let us do that in EMT first.

>n=1000000; Gender=(random(n,2)<0.5); Boys=sum(Gender)'; >TuesdayBoys=(random(n,2)<1/7 && Gender==1); TBoys=sum(TuesdayBoys)'; >sum(TBoys>0 && Boys<2)/sum(TBoys>0) 0.518327797038

Again, a loop may be more convenient for you if you are not familiar with the Matrix language of EMT.

If we think of the cases and their probabilities, we get the following cases with the probabilities

\(P(M_TM_T)=\dfrac{p^2}{4},\)

\(p(M_TM_O)=P(M_0M_T)=\dfrac{p(1-p)}{4},\)

\(P(M_TF)=P(FM_T)=\dfrac{p}{4}\)

using the obvious abbreviations (MT for a Tuesday boy, MO for any other boy, and F for a girl) and p=1/7. The cases are exclusive to each other. Thus the probability for a girl under these assumptions is

\(\dfrac{2p/4}{p^2/4+2p(1-p)/4+2p/4} = \dfrac{2}{4-p} = \dfrac{14}{27} \approx 0.5185\)

That agrees to our experiment in three digits. Random Monte-Carlo experiments like the one we performed are not very accurate. With a programming language, however, you can make much larger experiments.

Let us turn to Problem 1. Simulating the frogs means we have to decide for a probability of gender and for the probability of croaking. Moreover, we have to decide what to do with two croaking males.

Assume, you cannot distinguish one from two croaks. Then this is the same as in Problem 1 with a general probability p instead of 1/7. The extreme case is p=1 with the result 1/3 for a female. It is just the same case as in Problem 2 without the Tuesday information. The other extreme case is p close to 0. Then, if you hear a croak the probability for a female is close to 1/2.

The situation is quite different if you assume that there was only one croak. For p=1 this means there must be a female for sure. For p->0 you get 1/2 as before. I leave that to work out for you.

What about a Bayesian approach? We want to compute something like P(Female|Croak). But what do we know? We could use the usual Bayesian trick

\(P(F|C) = P(C|F) \dfrac{P(F)}{P(C)}.\)

We have to work out the probabilities on the right hand side, nevertheless. You can make all sorts of non-sense now. Clearly, P(F)=1/4. But the other probabilities are difficult to compute. So your are left with the definition of the conditional probability

\(P(F|C) = \dfrac{P(F \cap C)}{P(C)}\)

And that is exactly what we did above. There is no help from Bayes here.

## A Problem of Logic

### The Problem

Recently, I stumbled across a very interesting problem. I closed the site and started to think about the solution. Therefore, I neither have a link nor the solution given on the page. Let us try our luck with it. (I found a similar problem here. The solution is similar to the one I found. But here, I try to say more about the problem.)

Problem: Two prisoners A and B are given two numbers, each between 0 and 10. A is given the number a=6 and B is given the number b=4. They are told that the numbers add to 9 or 10. As usual, the prisoners cannot communicate in any way. Each day, A is asked if the sum is 9 or 10, and then B is asked the same question. Either prisoner can pass or claim to know the answer. If the answer is correct both regain their freedom, if not both are executed.

Can they find a secure solution without ever communicating between each other? If they both are clever enough they can. Now, you can leave this page for thinking if you wish. But mark it so that you can find back in case you cannot solve the problem.

The logic gets very involved if you start with the given values a=6 and b=4. A knows from the start that either b=4 or b=3. This he knows that B knows that he has a number between 5 and 7 etc. If you think that way, you are in for a problematic approach.

### The Solution

It took me several attempts to change my thinking. Let us call „shared knowledge“ the facts that both prisoners know and that both prisoners know the other prisoner knows. In fact, we think of what an observer would know. We ignore the specific numbers a=6 and b=4 for a moment.

Then, after A passes, everyone knows (including the observer) that A cannot have a=10. So he has any number between 0 and 9. Likewise, B cannot have b=10 when he passes. But notice that B can not have the number b=0. Because A has less than 10 he would immediately know the answer a+b=9. Thus, after the first day, we write the shared knowledge as

\(D_1 : \quad 0 \le a \le 9, \quad 1 \le b \le 9\)

Once, A passes the second day we know that he has neither a=0 or a=9. Otherwise, he would have known the sum as you will easily check. So a is between 1 and 8. If B passes too we know that b=1 and b=9 are impossible. So after the second day everyone knows

\(D_2 : \quad 1 \le a \le 8, \quad 2 \le b \le 8\)

Continuing like that we get

\(D_3 : \quad 2 \le a \le 7, \quad 3 \le b \le 7\)

\(D_4 : \quad 3 \le a \le 6, \quad 4 \le b \le 6\)

Now, A knows that a=6. Thus either b=3 or b=4. But b=3 no longer is possible. So when A is asked at Day 5, A knows for certain that a=6 and b=4.

Note that B has to decide between a=5 and a=6 which he cannot do earlier with this reasoning. But assume the correct numbers are a=6 and b=3. Then B has to decide between a=6 and a=7. We can do so when he is asked at Day 4. He will then already know A:3-6.

This solved the problem in our specific case. The logic is rather simple, but weird.

### Deceptive Ideas

Let us try to shorten the process and put our logic to a test. We ignored the specific knowledge of A and B about the number they have. A knows a=6, and B knows b=4. So A knows that b=3 or b=4. Thus A knows that B knows that A has a number between 5 and 7. Likewise, B knows that A knows that B has a number between 3 and 5. So, you might deduce that the shared knowledge between A and B at Day 0 is

\(D_0 : \quad 5 \le a \le 7, \quad 3 \le b \le 5 \tag{1}\)

With the logic from above, we deduce that A would know that answer with a=7. So, after a pass, B knows a=5 or a=6. With either b=3 or b=5 he could immediately deduce a+b=9 or a+b=10. Thus, when he passes he must have b=4. A could then declare the result after only two passes when asked at Day 2.

This is a false and deceptive logic. It cannot be correct, because if a=6 and b=3 then A will declare a+b=9 with the same logic on Day 2. In fact, the „shared knowledge“ becomes

\(D_0 : \quad 5 \le a \le 7, \quad 2 \le b \le 4 \tag{2}\)

From that B will exclude a=5 after a pass from A, which changes the outcome completely.

What goes wrong here? While both prisoners will agree to the facts in (1) and(2), and while both are true, they cannot come up with these facts out of their own knowledge. So they have no common agreement about the situation. They do not know what the other prisoner knows. Thus they cannot conclude anything form the assumed knowledge of the other prisoner. To be more precise, B cannot derive the answer from b=3, because from his viewpoint A can still have a=6 or a=7.

### Simulation

It is quite an interesting question if we can simulate the procedure. In fact, we have just given an algorithm to deduce a+b from the number of passes and a or b. Extending from our special case N=10 to a general N, our algorithm goes as follows. The shared knowledge after Day d is

\(D_d : \quad d-1 \le a \le N-d, \quad d \le b \le N-d\)

Now we take into account the true values of a and b, known privately to A and B, respectively. E.g., A knows that b=N-1-a or N-a. Thus, after Day d-1, at Day d, he knows the value of b if d-1=N-a or N-(d-1)=N-a-1. Using similar arguments for B, we get

- B knows the sum at Day d if d=N+1-b (sum=N) or d=b+1 (sum=N-1), after the pass from A at Day d.
- A knows the sum at Day d if d=N+1-a (sum=N) or d=a+2 (sum=N-1), after the pass from B at Day d-1.

Thus the result is known at day

\(d = \min \{N+1-a,a+2,N+1-b,b+1\}.\)

If N=10, a=6, b=4, then d=5. If N=10, a=6, b=3, then d=4. We can prove that this yields the correct answer for all a, b, and N. Or we can simulate, e.g. in EMT.

>function map check (a,b,N) ... $ v=[N+1-b,b+1,N+1-a,a+2]; $ d=min(v); $ if d==N+1-a then $ s = N; $ elseif d==a+2 then $ s = N-1; $ elseif d==N+1-b then $ s = N; $ elseif d==b+1 then $ s = N-1; $ endif $ return s; $ endfunction >a=0:10; check(a,10-a,10) [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10] >a=0:9; check(a,9-a,10) [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

But there is a logical problem. Does the algorithm proof that the two prisoners can escape? The problem is that the prisoners are not allowed to communicate and agree on a specific algorithm. Are there alternatives to this algorithm? Is this the fastest algorithm? These questions are not so easy to answer.

But starting from the observation that A cannot declare anything on Day 1 unless a=10, one can work the way down to the solution as written above. It becomes obvious that the given path is the only possible algorithm

### Generalizations

On the page linked at the start the problem is to decide between a+b=18 and a+b=20. Of course, we could set any other set of possible sums. The logic remains the same. Just don’t let yourself trapped into shortcuts taking into account the knowledge of his own number by any prisoner.

## The Bayesian and the Frequentist – I

I think I write a series of postings about Bayesian statistics and the view that I take on it as a frequentist. Do not expect too much depth here. I am not a specialist in statistics. I just want to study the differences in thinking, maybe with the help of some Monte-Carlo simulations in Euler Math Toolbox (EMT). Of course, as I said before in this blog once you simulate statistics you take the frequentist approach. I also said that anything that cannot be simulated is not well understood. Let us put that to the test of real life problems.

For a start, a typical application of Bayesian thinking is the following: Assume you have two bowls A and B with red and gray balls in them.

The numbers of the gray balls are GA and GB respectively, and likewise the numbers of red balls are RA and RB. So in A we have GA+RA balls, and in B we have GB+RB balls. In the image above, we have RA=10, GA=11, RB=5, GB=30.

Someone draws a ball from one of the bowls and it is red. What can you induce about the bowl the ball was drawn from? From the image, we would say that the red ball is more likely to come from bowl A. But how to quantify that? And what does „more likely“ mean, if anything?

You can learn a lot from your mistakes, false tries, complete failures or illegal arguments. So go ahead and try to say something substantial about the bowl once you know that the drawn ball is red!

The probabilistic analysis goes like this: Our events are the individual balls with the probability that each ball has to be drawn. We select a bowl with probabilities pA and pB first, and then a random ball from the selected bowl. The probability to draw a specific ball in bowl A is then the probability to draw from A divided by the number of balls in A, likewise for any ball in B.

E.g., the probability to draw a red ball turns out to be

\(p_R = p_A \dfrac{R_A}{R_A+G_A} + p_B \dfrac{R_B}{R_B+G_B}. \tag{1}\)

A similar formula holds for the probability to draw a gray ball, and you can check that both probabilities will add to 1. For a specific example, we assume that we draw from A and B with the same probability. Then

\(p_R = \dfrac{1}{2} \dfrac{10}{21} + \dfrac{1}{2} \dfrac{1}{7} = \dfrac{13}{42} \approx 0.3095\)

We now express this in the Bayesian way, which is just a shortcut for the same thing.

\(P(\text{Red}) = P(\text{Red}|\text{A}) P(\text{A}) + P(\text{Red}|\text{B}) P(\text{B}). \tag{2}\)

You read P(Red|A) as the probability of a ball being red under the conditions that it has been drawn from from bowl A.

For a sharp definition, we need the set of all red and gray balls (denoted by Red and Gray), the set of all balls in A and B (denoted by A and B). We can easily compute the probability of a ball to be drawn from any of these sets, or the intersection or union of any of these sets, by adding all probabilities for all balls in these sets. If you think about it a while the proper definition of P(Red|A) must clearly be

\(P(\text{Red}|\text{A}) = \dfrac{P(\text{Red} \cap \text{A})}{P(\text{A})}\)

using the intersection of the sets of red balls and sets of balls in A. This is the expected portion of red balls, provided they are drawn from bowl A. If we multiply the probability from any red ball in A by the number of red balls in A we get

\(P(\text{Red} \cap \text{A}) = R_A \dfrac{p_A}{R_A+R_B}. \)

With that, the Bayesian expression (2) becomes the same as (1) as you will easily verify. But we are not yet sure which one is easier to understand, or easier to handle.

There is another way to understand what is going on besides adding the probabilities of the balls. Since the red ball is either from A or from B we have, using the definition of the probability under a condition,

\(P(\text{Red}) = P(\text{Red} \cap \text{A}) + P(\text{Red} \cap \text{B}) = P(\text{Red}|\text{A})P(\text{A}) + P(\text{Red}|\text{B})P(\text{B}).\)

Our goal was to compute the probability that the ball is from A under the condition that it is red. In Bayesian speech that is

\(P(\text{A}|\text{Red}) = \dfrac{P(\text{A} \cap \text{Red})}{P(\text{Red})}.\)

We already computed everything in this formula.

\(P(\text{A}|\text{Red}) = \dfrac{5/21}{13/42} = \dfrac{10}{13} \approx 0.7692\)

But the special charm of the Bayesian way of saying things this lies in the following formula which follows easily from the definition

\(P(\text{Red}|\text{A}) P(\text{A}) = P(\text{Red} \cap \text{A}) = P(\text{A}|\text{Red}) P(\text{Red}).\)

So we get

\(P(\text{A}|\text{Red}) = P(\text{Red}|\text{A}) \dfrac{P(\text{A})}{P(\text{Red})}. \tag{3}\)

This allows us to invert probabilities and conditions. Note that P(A) appears in (2) and (3). So it is clear that it is important to know the probabilities for the selection from bowl A and B. The result will strongly depend on these probabilities.

The extreme cases are P(A)=0 and P(A)=1, i.e., we never or always draw from bowl A. Clearly, this gives a precise information on the probability that the ball is from A under the condition that it is red!

This plot has been done with

>RA=10, GA=11, RB=5, GB=30 10 11 5 30 >pA=1/2, pB=1/2 0.5 0.5 >function pAR(PA) := (PA*RA/(RA+GA))/(PA*RA/(RA+GA)+(1-PA)*RB/(RB+GB)); >plot2d("pAR",0,1,xl="P(A)",yl="P(A|Red)"):

Let us try to simulate this in EMT. We have the choice to use an easy to understand, but slow program or the quick matrix language. For this demonstration, we use the latter.

>n=1000000; >bowl=intrandom(n,2); A=(bowl==1); sum(A)/n 0.500005 >p=[RA/(RA+GA),RB/(RB+GB)]; fraction p [10/21, 1/7] >Red=random(n)<p[bowl]; // 1 for each red ball >ired=nonzeros(Red); nred=length(ired); nred/n // ired = indices of red balls 0.309861 >sum(bowl[ired]==1)/length(ired) // portion of bowl A in red balls 0.76877051323

The numbers are close enough to justify our computations. And since we can simulate the process we are sure that the results make sense.

Finally, let me add some well known application of this trick. We test patients for cancer with a test that has a true positive rate and a false positive rate of detection. Usually, the true positive rate is close to 100%, i.e., if there is cancer it will be detected. But it also claims cancer if there is none with a false positive rate which cannot be neglected. Then there is the rate of patients which have cancer. It is a good idea to think of the population as split in four groups.

- cancer and positive test
- cancer and negative test
- no cancer and positive test
- no cancer and negative test

You can quantify the expected numbers in each category if you know the above mentioned rates of true and false positive tests and the rate of the cancer in the population (or the selected populations wich undergoes the test).

In Bayesian speech we get

\(P(\text{Cancer}|\text{Positive}) = P(\text{Positive}|\text{Cancer}) \dfrac{P(\text{Cancer})}{P(\text{Positive})}\)

So if your test was positive, chances are that you have no cancer even if the positive rate P(Positive|Cancer) is close to 100%. It all depends on the frequency of the cancer and the frequency of a positive test (including the falsely positive tests). Both are known to your doctor by experience.

For an example, we take the cancer rate as 1% , the true positive rate as 100%, and the false positive rate as 5%. We than have to compute

\(P(\text{Pos.}) = P(\text{Pos.}|\text{Canc.})P(\text{Canc.}) + P(\text{Pos.}|\text{No Canc.}) P(\text{No Canc.}) \)

If we assume that P(Positive|Cancer) is very close to 1, and P(No Cancer) is also very close to 1, we just have to add the rate of cancer and the rate of positive tests in the no cancer population and get an estimate of 6% of positive results as a good estimate. Since the rate of cancer is only 1% we conclude that our chances of having cancer is approximately

\(P(\text{Cancer}|\text{Positive}) \approx \dfrac{1}{6}.\)

That looks way better than our initial alarming positive test indicates.

## Units in Euler Math Toolbox

Units have long been an integral part of EMT. I want to show you a few examples. First, let us talk about the non-metric system the US is still using. Quote: „Nobody understands the metric system! We use teaspoons!“.

>1inch 0.0254 >1in -> " cm" 2.54 cm >inch$, in$ 0.0254 0.0254 >1ft -> inch 12 >1yard -> ft 3 >1mile -> yard 1760 >1mile -> km 1.609344

As you see, you simply append the unit to a number and it will be converted into the metric system (in case you do not know that’s meters and kilograms). The units are stored in global variables which end with $. These variables are visible even within functions. So units work in functions too.

There is also a conversion operator -> which can either convert to text or to numbers, depending on the type of the parameter after the ->. See the examples.

But there is more to it. Often we use fractions or powers of units. EMT can handle that too. Let us start with fractions. In the following example, we introduce a new unit „nautical mile“ which is missing in EMT but will be present in future versions.

>sm$ = 1852.216; >160sm/h -> " m/sec" 82.3207111111 m/sec

As you see, we can now easily convert nautical miles per hours (knots) to meter per second. Assume an airplane departing with 90 knots is forced to use a climb of 200 feet per nautical mile. We want to compute the feet per minute climb which we can read from the VSI.

>200ft/sm * 90sm/h -> " ft/min" 300 ft/min

Of course, you can do this (at least approximately) in the head, if you multiply the numbers and divide by 60. E.g., at 120 knots, you need to double the ft/sm requirement. For just another example, let us compute the sink rate which is necessary at 190 knots to obtain a 3° glide path.

>200ft/sm * 90sm/h -> " ft/min" 300 ft/min >190sm/h * tan(3°) -> ft/min; print(%,unit=" ft/min") 1008.50 ft/min

The example shows still another way to print with units. By the way, the degree symbol behaves like a unit in many ways.

For further examples, here are some american kitchen units. You will also notice that EMT can handle powers in units on both sides.

>10cm^3 1e-005 >cup$ = 236.5882365liter/1000 0.0002365882365 >tablespoon$ = cup$/16; >teaspoon$ = tablespoon$/2; >1teaspoon -> " cm^3" 7.39338239062 cm^3 >1liter -> teaspoon 135.256090807