## Computing a Special Sum

I recently found the following problem in my Google+ account: What is

\(\dfrac{1}{2 \cdot 4} + \dfrac{1 \cdot 3}{2 \cdot 4 \cdot 6} + \dfrac{1 \cdot 3 \cdot 5}{2 \cdot 4 \cdot 6 \cdot 8} + \ldots\)

This kind of sum of growing products can often be solved with the following trick.

\(1 = a_1 + (1-a_1) = a_1 + (1-a_1) a_2 + (1-a_1)(1-a_2) = \ldots\)

The n-th iteration of this process yields

\(1 = a_1 + \ldots + (1-a_1)\ldots(1-a_n)a_{n+1} + (1-a_1)\ldots(1-a_{n+1}).\)

This is true for any sequence a_n. If we want to let n go to infinity we need to control the last term in this sum. We want to know its limit. E.g.,

\(1 = a_1 + \sum\limits_{k=1}^\infty \left( a_{k+1} \prod\limits_{l=1}^k (1-a_l) \right)\)

will be true only if

\(\lim\limits_{k \to \infty} \prod\limits_{l=1}^k (1-a_l) = 0.\)

We can use

\(\log(1-h) \le -h \qquad\text{for } 0 \le h < 1\)

and get the following estimate

\(\log \prod\limits_{l=1}^k (1-a_l) = \sum\limits_{l=1}^k \log (1-a_l) \le - \sum\limits_{l=1}^k a_l.\)

Thus, for positive a_l not larger than 1,

\(\sum\limits_{l=1}^\infty a_l = \infty \quad\Rightarrow\quad \prod\limits_{l=1}^\infty (1-a_l) = 0. \tag1\)

Note that the conclusion on the right holds in particular if the a_l do not converge to 0 but are bounded by 1, since the sum then diverges.

Now we simply apply all this with

\(a_k = \dfrac{1}{2k}\)

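Since the harmonic series diverges, the product tends to 0. The sum in question is everything after the first term a_1 = 1/2, so it is equal to 1 - 1/2 = 1/2.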
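The telescoping identity can be checked with exact fractions. The following is a minimal sketch in plain Python, not taken from the original post; the function name `telescoping_parts` is made up for illustration. It returns the partial sum of the series in question and the remaining product for a_k = 1/(2k).

```python
from fractions import Fraction


def telescoping_parts(n):
    """Return (partial, remainder) for a_k = 1/(2k).

    partial   = sum_{k=1}^{n} a_{k+1} * prod_{l=1}^{k} (1 - a_l)
    remainder = prod_{l=1}^{n+1} (1 - a_l)
    """
    def a(k):
        return Fraction(1, 2 * k)

    partial = Fraction(0)
    prod = Fraction(1)
    for k in range(1, n + 1):
        prod *= 1 - a(k)            # now prod_{l=1}^{k} (1 - a_l)
        partial += a(k + 1) * prod  # k-th term of the series
    remainder = prod * (1 - a(n + 1))
    return partial, remainder


partial, remainder = telescoping_parts(200)
# The finite identity holds exactly: a_1 + partial + remainder == 1.
print(Fraction(1, 2) + partial + remainder == 1)
print(float(partial), float(remainder))
```

The remainder shrinks only slowly, so the partial sums approach 1/2 slowly as well. That is why the exact identity is more convincing than simply adding up a few hundred terms.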

For the record, we study the conclusion (1) in more detail. First of all, for a_l strictly between 0 and 1, we get

\(\sum\limits_{l=1}^\infty a_l = \infty \quad\Leftrightarrow\quad \prod\limits_{l=1}^\infty (1-a_l) = 0.\)

We can use the estimate

\(-2h \le \log(1-h) \le -h \qquad\text{for } 0 \le h \le \dfrac{1}{2}\)

for this. If infinitely many a_l are larger than 1/2, the result is obvious.

In a similar way we can prove for positive a_l

\(\sum\limits_{l=1}^\infty a_l < \infty \quad\Leftrightarrow\quad \prod\limits_{l=1}^\infty (1+a_l) < \infty.\)

If the a_l take both negative and positive values, things get more involved.

## Python in EMT – Cross Sum of Cubes

When is the cross sum (sum of the digits) of the cube of a number equal to the number?

It is obviously true for n=0 and n=1. You might be able to find n=8 with the cube 512. Then, how do you prove that there are only finitely many such numbers? How long will it take to find all of them?

For a rough estimate, let a_0, ..., a_{m-1} be the m digits of the cube of such a number. Then

\((9m)^3 \ge (a_0+\ldots+a_{m-1})^3 = a_0+a_1 10 + \ldots + a_{m-1} 10^{m-1} \ge 10^{m-1}\)

It is an easy exercise to show that m<7 is necessary. So we only have to check n<100, because the cube of 100 already has 7 digits. This can be done by hand with some effort, but here is a Python program written in Euler Math Toolbox.

    >function python qs (n) ...
    $s=0
    $while n>0:
    $  s+=n%10;
    $  n//=10
    $return s
    $endfunction
    >function python test() ...
    $v=[]
    $for k in range(100):
    $  if k == qs(k**3):
    $    v.append(k)
    $return v
    $endfunction
    >test()
    [0, 1, 8, 17, 18, 26, 27]
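For readers without EMT, here is a small standalone sketch in plain Python 3. It first checks the digit-count estimate (9m)^3 >= 10^(m-1) for small m and then repeats the search. The helper names `bound_holds` and `digit_sum` are chosen for this illustration only.

```python
def bound_holds(m):
    # Necessary condition if the cube has m digits: (9m)^3 >= 10^(m-1).
    return (9 * m) ** 3 >= 10 ** (m - 1)


def digit_sum(n):
    # Cross sum: add up the decimal digits of n.
    return sum(int(ch) for ch in str(n))


possible_m = [m for m in range(1, 30) if bound_holds(m)]
hits = [k for k in range(100) if digit_sum(k ** 3) == k]

print(possible_m)  # [1, 2, 3, 4, 5, 6]
print(hits)        # [0, 1, 8, 17, 18, 26, 27]
```

The check of the bound covers only m below 30. For larger m the right-hand side grows by a factor of 10 per step while the left-hand side grows much more slowly, which is the content of the exercise.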

## Recursion Formulas

I do not know if I ever wrote about this. But it is a well-known theory anyway. To find a formula for the Fibonacci numbers

\(F_{n+1} = F_n + F_{n-1}, \quad F_0 = 0, \quad F_1 = 1\)

you can use the "Ansatz"

\(F_n = c^n\)

You will immediately find that c must satisfy

\(c^2 = c + 1\)

Both solutions c_1 and c_2 of this equation generate sequences that satisfy the recursion formula, and so does any linear combination of them. So our Ansatz yields

\(F_n = a c_1^n + b c_2^n = a \left(\dfrac{1+\sqrt{5}}{2}\right)^n + b \left(\dfrac{1-\sqrt{5}}{2}\right)^n\)

Observe the logic that we followed here! All we know by now is that if F(n) is defined by this formula, it will satisfy the recursion formula for the Fibonacci sequence. On the other hand, it is easy to find a and b such that F(0)=0, F(1)=1. Then we know that we have indeed found the correct formula for the sequence. I.e.,

\(F_n = a c_1^n + b c_2^n = \dfrac{1}{\sqrt{5}} \left( \left(\dfrac{1+\sqrt{5}}{2}\right)^n - \left(\dfrac{1-\sqrt{5}}{2}\right)^n\right)\)

We have also shown that any sequence which satisfies the recursion must be of that form with proper constants a and b. This is so, since we can determine a and b for any start values.

Due to the fact that c_2 lies between -1 and 0, we get

\(F_n = \text{round}\left( \dfrac{1}{\sqrt{5}} \left(\dfrac{1+\sqrt{5}}{2}\right)^n \right)\)
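The rounding formula is easy to test numerically. Here is a small sketch in plain Python that compares it with the recursion for the first 40 Fibonacci numbers. The function names are invented for this example, and the check relies on double precision, so it should not be trusted for very large n.

```python
from math import sqrt


def fib_list(count):
    # Fibonacci numbers F_0, F_1, ... from the recursion.
    fibs = [0, 1]
    while len(fibs) < count:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs[:count]


def fib_rounded(n):
    # Closed form: round c_1^n / sqrt(5) with c_1 = (1 + sqrt(5)) / 2.
    c1 = (1 + sqrt(5)) / 2
    return round(c1 ** n / sqrt(5))


agree = all(fib_rounded(n) == f for n, f in enumerate(fib_list(40)))
print(agree)
```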

Now, if we take other start values, what happens? E.g., consider the numbers

\(L_n = \left(\dfrac{1+\sqrt{5}}{2}\right)^n + \left(\dfrac{1-\sqrt{5}}{2}\right)^n\)

The start values are then L(0)=2 and L(1)=1. We simply get another integer sequence. And we get the strange fact that

\(\left(\dfrac{1+\sqrt{5}}{2}\right)^n\)

gets closer and closer to integers. This has been observed by Yves Meyer and mentioned in a blog post recently by J. D. Cook. So I came to write this short article. The difference, computed in Euler Math Toolbox, is as follows

    >p1 = (1+sqrt(5))/2
    1.61803398875
    >p2 = (1-sqrt(5))/2
    -0.61803398875
    >n=0:19; longest (p1^n)'
    1
    1.618033988749895
    2.618033988749895
    4.23606797749979
    6.854101966249686
    11.09016994374948
    17.94427190999916
    29.03444185374864
    46.97871376374781
    76.01315561749645
    122.9918693812443
    199.0050249987407
    321.996894379985
    521.0019193787257
    842.9988137587108
    1364.000733137437
    2206.999546896147
    3571.000280033584
    5777.999826929732
    9349.000106963316
    >p1^n+p2^n
    [2, 1, 3, 4, 7, 11, 18, 29, 47, 76, 123, 199, 322, 521, 843, 1364, 2207, 3571, 5778, 9349]

The difference to the nearest integer is of course a power of p2, which goes to zero geometrically fast.
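The same observation in plain Python, as a short sketch: the integers L_n are generated by the recursion, and the gap between p1^n and L_n is compared with -p2^n. The names `lucas` and `gaps` are chosen only for this illustration.

```python
from math import sqrt

p1 = (1 + sqrt(5)) / 2
p2 = (1 - sqrt(5)) / 2


def lucas(count):
    # L_0 = 2, L_1 = 1, then the Fibonacci recursion.
    seq = [2, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq[:count]


# Gap between the power of p1 and the integer L_n; it should equal -p2^n.
gaps = [p1 ** n - L for n, L in enumerate(lucas(20))]
for n, gap in enumerate(gaps):
    print(n, gap, -p2 ** n)
```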

## Sensor Size and Image Quality

If you are confused about the effect of the sensor size on your image quality you are not alone. In this post, I try to explain a good part of these problems. I try to do that without any math. But I assume some basic background in photography.

We wish to learn about the effect of a smaller sensor size on the depth of field, the background blurriness and the noise. Those are important ingredients of the image quality. For portraits, we wish to have a noise-free and blurry background. For night shots, we wish to have as little noise as possible.

Let us do a thought experiment to understand the topic. Imagine we could make the world smaller by a factor of 2 in all dimensions. We would shrink a full frame camera to a 4/3 camera. The shrunken camera is then said to have a crop factor of 2. Note that we shrink everything to half its size: the sensor width and height (e.g. from 36 x 24 mm to 18 x 12 mm), the focal length (e.g. from 50mm to 25mm) and the open width of the aperture at any given F stop (e.g. from 5mm to 2.5mm).

First of all, the angle of view remains exactly the same for our 25mm lens compared to the 50mm in full frame. So the image shows the same things as before, even if we do not shrink the world.

Moreover, only 1/4 of the light comes through the lens, since our aperture has only 1/4 of its former area. But the distance to the sensor plane is only 1/2, and so we get the same amount of light per square mm. If we used film we would get the same exposure. That explains why we compute the F stop as the ratio of the focal length to the open width of the aperture. This ratio stays the same in our shrunken camera. Thus we use F8 no matter what size of camera we have.

But some important things change.

(1) The light per square mm is on the same level for the smaller camera. So each pixel gets only 1/4 of the light, because the pixels are smaller too. But the pixels produce the same amount of noise, since the noise does not depend much on the size of the pixel. Thus the noise level in relation to the signal strength of each pixel increases by quite an amount. We can say by a factor of 4. The image gets noisier.

(2) The objects do *not* shrink or get closer by a factor of 2. If they did, we’d get the same depth of field and blurriness in our image. But a man at a distance of 10 meters still has the same size and the same distance. So we have to change our focus by moving the lens closer to the sensor plane. Of course, the camera will do that for us automatically. The effect of this is known to any photographer and is quite obvious: If we focus on something further away the background gets sharper. For practical purposes we can say it gets twice as sharp when the distance is doubled.

*What that means is that you get a less blurry background and more noise at the same settings.*

That’s bad. The good point is that you also get a larger depth of field around the object if you need that.

To compensate for this you can double the aperture width, i.e., open up by two stops (e.g. from F4 to F2), and decrease your ISO by two stops (e.g. from ISO 400 to ISO 100). Your sensor will then get the same amount of light and your noise will be the same. Moreover, your background blurriness is the same too. It is a bit complicated to compute, but your depth of field around your object will also be approximately the same. You get the same image quality.

*In terms of image quality for images with an object and a background, you can replace a full frame 80mm lens at ISO400 and F4 with a 40mm lens for your 4/3 system at ISO100 and F2.*

But, of course, your full frame lens might allow you to shoot at ISO100 and F2. You have no chance to match this on a 4/3 system, because you would need ISO25 and F1. The full frame camera offers more opportunities. It is also more bulky and expensive.
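The rule of thumb used in the last paragraphs can be written as a tiny calculator. This is only a sketch of the arithmetic described in this post, under its simplifying assumptions: focal length and F stop are divided by the crop factor, and the ISO value by its square. The function name `equivalent_settings` is invented here.

```python
def equivalent_settings(focal_mm, f_number, iso, crop):
    """Translate full frame settings to a smaller sensor with the given crop factor.

    Returns (focal length, F stop, ISO) that should give roughly the same
    framing, background blur and noise according to the rule of thumb above.
    """
    return focal_mm / crop, f_number / crop, iso / crop ** 2


# Full frame 80mm at F4 and ISO 400 on a 4/3 camera (crop factor 2).
print(equivalent_settings(80, 4, 400, 2))   # (40.0, 2.0, 100.0)
# Full frame 80mm at F2 and ISO 100: the small camera would need F1 and ISO 25.
print(equivalent_settings(80, 2, 100, 2))   # (40.0, 1.0, 25.0)
```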

If you do not care about the background or you want more sharpness all across the frame the small camera seems to favor you. But you still have more noise at the same settings.

*In terms of noise for night photography or landscapes, you get the same noise with ISO400 on the full frame and with ISO100 on a 4/3 system. To compensate, you need more light by opening the aperture or increasing the exposure time.*

By the way, measurements at various camera sites on the net confirm this finding. With the same F stop on the same image, the full frame camera can be shot at ISO 800, the APS-C camera at ISO 400 and the 4/3 camera at ISO 200, and all will produce the same noise. Pictures will still look different due to the different background blurriness.

## About Vectorization

Since I am the author of a matrix language, you will be surprised that I no longer favor the matrix approach in education. In case you do not know what I am talking about: A matrix language avoids loops by vectorizing functions and operators.

Let me give you an example. In a classical language you would have to write something like the following.

    >t=0:0.1:10;
    >s=zeros(size(t));
    >for k=1 to length(t); s[k]=sin(t[k])*exp(-t[k]); end;

The syntax is from EMT, but it does not matter. You need to declare a vector, then fill it with a loop. In a matrix language, you can simply write

    >t=0:0.1:10;
    >s=sin(t)*exp(-t);

The function sin() will operate on vectors and create a vector of the sines of all elements. The same holds for exp() and for the multiplication operator *.

EMT has made that more comfortable than Matlab, allowing you to combine column and row vectors to matrices, and introducing functions that map automatically to the elements of their arguments. For more, see the documentation of EMT.

This is very handy. It seems that with this trick you can write short code that is easily understandable even by non-programmers. But is that really so?

The problems start if you want to vectorize more complicated data structures, or want to do more complicated things than just applying a function to a vector. You will soon find that you have to use your pick of matrix language on a very high level of proficiency. Here is one example which is just a little above the basics: Create a vector that contains those elements of a vector where the next element is not smaller.

    >x=normal(100);
    >x[nonzeros(diff(x)>=0)]

I bet it is easier to write a loop in an ordinary programming language than to find out how that works in your choice of matrix language and in an efficient way.
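For comparison, here is how the same task might look as an explicit loop in plain Python. This is only an illustration of the point, with a made-up function name, and not code from EMT.

```python
def not_followed_by_smaller(xs):
    # Keep xs[k] whenever the next element xs[k+1] is not smaller.
    result = []
    for k in range(len(xs) - 1):
        if xs[k + 1] >= xs[k]:
            result.append(xs[k])
    return result


print(not_followed_by_smaller([3, 1, 2, 2, 0]))  # [1, 2]
```

The loop is longer than the one-liner, but every step can be read off directly. The last element never qualifies since it has no successor, which matches the behavior of diff().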

My key experience was the following task in Matlab: Read in the words in 10000 spam mails and 10000 good mails, and make a statistical function that can detect spam mail by the words in the mail. Yes, I could do it! It took a full day to find the proper Matlab tricks and get it to run in under two minutes. The next day I tried it in Java. It took me two hours to write the code using Java’s tokenizers and dictionaries. And the program finished the task in under 5 seconds! That typically happens if you use the wrong tool.

So, is this an argument to drop matrix languages altogether? No, certainly not. They are way too practical for simple purposes like handling simple data or plotting a function. But you should not base the education in universities or colleges on tools like Matlab or EMT. Those tools lead in the wrong direction if they are taken too seriously. I would even include R in this collection. But R is more often used for the specialized statistical functions it contains, so it has merits that no current programming language can provide.

You need to lay the foundation with a programming language like C, C++, Java or Python. There, the student can learn about vectors, matrices and the loops that handle them. These languages also contain advanced tools like dictionaries and other collections. And you can program your own data structures. They are also way more efficient than any of the all-in-one packages that come with a matrix language. Moreover, it is not at all difficult to create the matrix functionality in Java.

Python is not the worst of ideas to use for education in programming. It is a basic language, and there are tools for all sorts of purposes, including numerical matrices, statistics or symbolic computations. It is not as basic as C++ or Java, however, and quite different from languages used in the industry. But it is, for sure, closer to those real world programs than Matlab.

## Making GIF Animations

I fixed "makeanimations.e" in EMT.

The code for this is the following.

    >function f(z,z0) := (z-z0)/(conj(z0)*z-1);
    >function plotcircle(z,r,z0,color=black) ...
    $t=linspace(0,2pi,500); w=f(z+r*exp(I*t),z0);
    $barstyle("#"); barcolor(color);
    $hold on; polygon(re(w),im(w),0); hold off;
    $endfunction
    >function plotall (d,z0=0.5) ...
    $fullwindow; setplot(1.05); clg;
    $plotcircle(0,1,0,lightgray);
    $plotcircle(0,1/3,z0,white);
    $c=exp(((0:5)/6+d)*2*pi*I)*2/3;
    $loop 1 to 6; plotcircle(c[#],1/3,z0,red); end;
    $endfunction
    >plotall (0)
    >load makeanimations
    Functions to make an animated GIF or an MPEG movie.
    >makegif("plotall",(0:0.01:1)/6,"animation",w=400,h=400)

## Image Processing and Data Evaluation in EMT

There is an example for Euler Math Toolbox (EMT) where I tried to fit curves to a real chain. I then fitted a parabola and a catenary to the points and explained a bit about the catenary.

The points were derived by hanging a real chain in front of the screen. As one user pointed out, this is an outdated method in the age of digital images. Of course, now you can do that with images in EMT too.

First I downloaded a free image of an egg. If you have an image from a digital camera reduce it in size. We do not want to load a high res image into EMT. You can place the image into the notebook with

>loadimg("egg.jpg");

The image was originally about 944×993 pixels. It will be reduced in size for the notebook. loadimg() accepts a number of lines to be used for the image in height. For the blog, I reduced the image too.

Now the following lines of code will load the image into EMT as a matrix of RGB (red-green-blue) values.

    >M=loadrgb("egg.jpg");
    >size(M)
    [944, 993]

We can analyze the image in EMT. E.g., let us plot the average portion of red color in each column of the image.

>aspect(2); plot2d(sum(getred(M'))'/rows(M)):

We could also process the image. Let us extract the RGB values and manipulate them to give the image a warm tint.

    >{R,G,B}=getrgb(M);
    >function f(x,a) := x + a*x*(1-x)
    >insrgb(rgb(f(R,0.4),G,B));
    >savergb(rgb(f(R,0.4),G,B),"egg2.png");

The insrgb() command inserts the RGB image into the notebook. The savergb() command saves the image. It looks like this.
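To see what the tint function does to a single channel value, here is a small sketch in plain Python. It assumes that the channel values lie between 0 and 1, which is what the form x(1-x) suggests. The name `warm` is chosen only for this example.

```python
def warm(x, a=0.4):
    # Same formula as f(x,a) above: lifts the mid tones, keeps 0 and 1 fixed.
    return x + a * x * (1 - x)


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, warm(x))
```

For a between 0 and 1 the function is increasing on [0,1], so the order of the red values is preserved and nothing is clipped. Only the middle of the range is pushed up.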

You can also plot the image as a background image to your plot. The command is plotrgb(). It uses the current plot area as set by window(). In this blog, we make it simple and set a plot area with setplot(). We draw a coordinate system with xplot(). These are older and more basic functions than plot2d(). But plot2d() relies on these functions too.

    >savergb(rgb(f(R,0.4),G,B),"egg2.png");
    >aspect(1); setplot(0,1,0,1); plotrgb(M); xplot();

You will notice that the plot is not very efficient. That is why you need to reduce your digital image to 1MB or less. This problem might be addressed in further versions of EMT. E.g., we could read the image from a file or a list of loaded images for each plot. Currently, plotrgb(M) for matrices M is the way to go. Of course, you can also use EMT to reduce your image. One way would be to smooth it with a two-dimensional matrix fold and then to keep only some of the points. In the example, we keep every third point and thus reduce the size by a factor of 3.

    >E=ones(5,5)/25;
    >M1 = rgb(fold(R,E),fold(G,E),fold(B,E));
    >M1=M1[2:3:rows(M1),2:3:cols(M1)]; size(M1)
    [313, 330]
    >savergb(M1,"egg2.png");
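The two steps can be sketched in plain Python on nested lists. This is an illustration only and not the EMT implementation. It assumes that the fold keeps only those positions where the 5x5 window fits completely into the image, which is consistent with the size [313, 330] reported above. The function names are invented for this sketch.

```python
def box_smooth(img, size=5):
    # Average over all size x size windows that fit completely into img.
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(rows - size + 1):
        row = []
        for c in range(cols - size + 1):
            window = [img[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(sum(window) / (size * size))
        out.append(row)
    return out


def subsample(img, start=1, step=3):
    # EMT's 2:3:n is 1-based, so it starts at index 1 in Python.
    return [row[start::step] for row in img[start::step]]


def reduced_size(rows, cols, size=5, start=1, step=3):
    # Size of the result without building the image.
    return (len(range(start, rows - size + 1, step)),
            len(range(start, cols - size + 1, step)))


small = [[float(r + c) for c in range(17)] for r in range(14)]
reduced = subsample(box_smooth(small))
print(len(reduced), len(reduced[0]))
print(reduced_size(944, 993))
```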

Note that the previous images of the egg were reduced for the blog. This is the original result of the reduction in EMT.

Assume we want to get points on the outline of the image. For this, we write a little code in EMT to get mouse clicks from the user.

    >function getclicks () ...
    $global M;
    $setplot(0,1,0,1); plotrgb(M); xplot();
    $hold on;
    $v=[];
    $repeat;
    $  m=mouse("Click or press Return!");
    $  until length(m)<2;
    $  v=v_m;
    $  mark(m[1],m[2]);
    $end;
    $return v;
    $endfunction
    >v=getclicks()
         0.401925     0.0930601
         0.249599       0.20031
         0.131469      0.352635
         0.066187      0.560917
         0.112817      0.758318
          0.28224        0.8951
         0.569794      0.901318
         0.768749      0.794068
          0.88377      0.675938
         0.956825       0.51895
         0.959933      0.332429
         0.900868      0.194092
         0.762532     0.0868427
         0.599326     0.0510929

## The Bayesian and the Frequentist – III

For another story, I read the following problem on a Google+ site yesterday: You have three cards in your pocket, one with two red sides, one with two blue sides and one with a red and a blue side. You take one out of your pocket, place it on the table, and it shows a blue side. What is the probability that the other side is also blue?

That fits into my Credo that the quickest way to get order into our thinking is to imagine or actually do a simulation. So let us do that first without trying any abstract thoughts about the problem.

The simulation is obviously to draw one of the three cards with equal probability. Then to take one of the sides with equal probability. Then we have to discard the cases where the side is red. This time, we make a small Java program. I try to design one that is easily understandable.

    public class MC
    {
        static final int red=0,blue=1;

        public static int random (int n)
        {
            return (int)(Math.random()*n);
        }

        public static void main (String a[])
        {
            int C[][]={{red,red},{blue,blue},{red,blue}};
            int n=100000000;
            int found=0;
            int valid=0;
            for (int i=0; i<n; i++)
            {
                int card=random(3),side=random(2);
                int otherside=1-side;
                if (C[card][side]==blue)
                {
                    valid++;
                    if (C[card][otherside]==blue) found++;
                }
            }
            System.out.println("Found : "+(double)found/valid);
        }
    }

We can do 100 million simulations in about one second. The result is 0.66666164. Of course, we can also do it in EMT with its matrix language or with a TinyC program. With the matrix language, I can only get up to 10 million iterations. Extending the stack would help. But then it is still slower.

    >C=[1,0,0;1,0,1]
                1             0             0
                1             0             1
    >n=10000000;
    >i=intrandom(n,2); j=intrandom(n,3);
    >Sel=mget(C,i'|j'); Other=mget(C,(3-i)'|j');
    >sum(Sel==1 && Other==1)/sum(Sel==1)
    0.666787938524

So the result seems to be 2/3 probability for the other side to be blue. It is not difficult to understand. We always reject the card with two red sides. And we reject the card with one red side half of the time, but we always accept the card with two blue sides. Thus it is twice as likely to have blue on the other side.
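Since there are only six equally likely (card, side) pairs, the counting argument can also be done exactly. Here is a short sketch in plain Python that enumerates them. The variable names are chosen for this illustration.

```python
from fractions import Fraction

cards = [("red", "red"), ("blue", "blue"), ("red", "blue")]

shown_blue = 0   # cases where the visible side is blue
both_blue = 0    # among those, cases where the hidden side is blue too
for card in cards:
    for side in (0, 1):
        if card[side] == "blue":
            shown_blue += 1
            if card[1 - side] == "blue":
                both_blue += 1

probability = Fraction(both_blue, shown_blue)
print(probability)  # 2/3
```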

The Bayesian trick does indeed help in this case. But it makes things more mysterious. Let us call BB the event that the card with two blue sides is drawn, and B the event that we see a blue side. Then

\(P(\text{BB}|\text{B}) = P(\text{B}|\text{BB}) \cdot \dfrac{P(\text{BB})}{P(\text{B})} = 1 \cdot \dfrac{1/3}{1/2} =\dfrac{2}{3}.\)

## The Bayesian and the Frequentist – II

This is not immediately related to Bayesian statistics. But it is a good argument for the frequentist approach. I met the problem in a Youtube video, but the core of this problem is well known. The solutions are not always correct, nor is the linked video. Actually, I present two problems.

**Problem 1**

You see two frogs and hear a croak. The croak can only come from a male. Then, what is the probability that one of the frogs is a female?

**Problem 2**

You meet a man, and he tells you that he has two kids, at least one of which is a boy. What is the probability of the other being a girl? And does the probability change if you know that the boy was born on a Tuesday?

Both problems are obviously only vaguely formulated. You need to make assumptions. E.g., in the following, let us assume that for each random frog or kid the probability of being male is 1/2. But, as we will see, the answer to Problem 1 depends on more assumptions. The intuitive answers to both problems tend to be completely wrong.

My point is that you need to imagine a Monte-Carlo simulation in both situations. If you cannot come up with an experiment any answer will be useless anyway. That is the heart of the frequentist approach to statistics.

Let us start with Problem 2. Your simulation would assign genders to both kids at random, so BB, BF, FB and FF (B for a boy, F for a female) have the same probability 1/4. Note that there is BF and FB since we assign the gender to each kid separately. Then the simulation would discard the irrelevant case FF. We are left with BB, BF and FB with equal probability. Thus, 2/3 of the simulated cases contain a female. Here is a code for this in EMT.

    >n=100000; Gender=(random(n,2)<0.5); Boys=sum(Gender)';
    >sum(Boys>0 && Boys<2)/sum(Boys>0)
    0.665760978144

As usual this code is in the style of a matrix language. If you are not familiar with this you need to write a loop. Hint: "Gender" is an n×2 matrix with 0=girl and 1=boy. "Boys" is a vector that contains the number of boys in each of the 100 thousand samples.
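For those who prefer the loop, here is a minimal sketch of the same experiment in plain Python. The function name and the fixed seed are choices made for this illustration. The seed only makes the run reproducible.

```python
import random


def simulate_two_kids(n, seed=1):
    rng = random.Random(seed)
    valid = 0       # families with at least one boy
    with_girl = 0   # among those, families that also have a girl
    for _ in range(n):
        # Each kid is a boy with probability 1/2.
        boys = (rng.random() < 0.5) + (rng.random() < 0.5)
        if boys > 0:
            valid += 1
            if boys < 2:
                with_girl += 1
    return with_girl / valid


print(simulate_two_kids(100000))
```

The result should be close to 2/3, with the usual random fluctuation in the third digit.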

Now, what changes if we know that the boy was born on a Tuesday? In our simulation, we would have to assign weekdays of birth to the boys. We make the assumption that each day has the same probability. We discard every pair that has no Tuesday-born boy. Let us do that in EMT first.

    >n=1000000; Gender=(random(n,2)<0.5); Boys=sum(Gender)';
    >TuesdayBoys=(random(n,2)<1/7 && Gender==1); TBoys=sum(TuesdayBoys)';
    >sum(TBoys>0 && Boys<2)/sum(TBoys>0)
    0.518327797038

Again, a loop may be more convenient for you if you are not familiar with the matrix language of EMT.

If we think of the cases and their probabilities, we get the following cases with the probabilities

\(P(M_TM_T)=\dfrac{p^2}{4},\)

\(P(M_TM_O)=P(M_OM_T)=\dfrac{p(1-p)}{4},\)

\(P(M_TF)=P(FM_T)=\dfrac{p}{4}\)

using the obvious abbreviations (MT for a Tuesday boy, MO for any other boy, and F for a girl) and p=1/7. The cases are exclusive to each other. Thus the probability for a girl under these assumptions is

\(\dfrac{2p/4}{p^2/4+2p(1-p)/4+2p/4} = \dfrac{2}{4-p} = \dfrac{14}{27} \approx 0.5185\)

That agrees with our experiment to three digits. Random Monte-Carlo experiments like the one we performed are not very accurate. With a programming language, however, you can make much larger experiments.
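Instead of a larger random experiment, we can also enumerate all cases exactly. The following sketch in plain Python assumes, as above, that gender and weekday are independent and uniformly distributed, so that all 14 x 14 combinations for the two kids are equally likely. The encoding of Tuesday as day 1 is arbitrary.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1
kids = [(gender, day) for gender in "MF" for day in range(7)]

valid = 0   # pairs with at least one boy born on a Tuesday
girl = 0    # among those, pairs that contain a girl
for first, second in product(kids, repeat=2):
    if ("M", TUESDAY) in (first, second):
        valid += 1
        if "F" in (first[0], second[0]):
            girl += 1

print(Fraction(girl, valid))  # 14/27
```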

Let us turn to Problem 1. Simulating the frogs means we have to decide for a probability of gender and for the probability of croaking. Moreover, we have to decide what to do with two croaking males.

Assume you cannot distinguish one croak from two. Then this is the same as Problem 2 with a general probability p of croaking instead of 1/7. One extreme case is p=1 with the result 2/3 for a female. It is just the same case as in Problem 2 without the Tuesday information. The other extreme case is p close to 0. Then, if you hear a croak the probability for a female is close to 1/2.

The situation is quite different if you assume that there was only one croak. For p=1 this means there must be a female for sure. For p->0 you get 1/2 as before. I leave that to work out for you.
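Both variants can be worked out with exact fractions. The sketch below is one possible way to do it in plain Python, under the assumptions used in this post: each frog is male with probability 1/2, and each male croaks independently with probability p. The function name and its parameter `exactly_one` are invented for this illustration.

```python
from fractions import Fraction


def female_given_croak(p, exactly_one=False):
    """Probability that one of the two frogs is female, given the croaking.

    exactly_one=False: we only know that at least one croak was heard.
    exactly_one=True:  we know that exactly one frog croaked.
    Requires p > 0, otherwise the condition has probability 0.
    """
    p = Fraction(p)
    quarter = Fraction(1, 4)
    if exactly_one:
        two_males = quarter * 2 * p * (1 - p)      # one croaks, one is silent
    else:
        two_males = quarter * (1 - (1 - p) ** 2)   # not both silent
    male_female = 2 * quarter * p                  # MF or FM, the male croaks
    return male_female / (two_males + male_female)


print(female_given_croak(Fraction(1, 7)))                     # 14/27
print(female_given_croak(1, exactly_one=True))                # 1
print(female_given_croak(Fraction(1, 1000), exactly_one=True))
```

With at least one croak the result simplifies to 2/(4-p), and with exactly one croak to 1/(2-p). The second expression equals 1 for p=1 and tends to 1/2 for small p, as claimed.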

What about a Bayesian approach? We want to compute something like P(Female|Croak). But what do we know? We could use the usual Bayesian trick

\(P(F|C) = P(C|F) \dfrac{P(F)}{P(C)}.\)

We have to work out the probabilities on the right hand side, nevertheless. You can produce all sorts of nonsense now. Clearly, P(F)=3/4, since only one of the four equally likely gender combinations contains no female. But the other probabilities are difficult to compute. So you are left with the definition of the conditional probability

\(P(F|C) = \dfrac{P(F \cap C)}{P(C)}\)

And that is exactly what we did above. There is no help from Bayes here.

## A Problem of Logic

### The Problem

Recently, I stumbled across a very interesting problem. I closed the site and started to think about the solution. Therefore, I neither have a link nor the solution given on the page. Let us try our luck with it. (I found a similar problem here. The solution is similar to the one I found. But here, I try to say more about the problem.)

Problem: Two prisoners A and B are given two numbers, each between 0 and 10. A is given the number a=6 and B is given the number b=4. They are told that the numbers add to 9 or 10. As usual, the prisoners cannot communicate in any way. Each day, A is asked if the sum is 9 or 10, and then B is asked the same question. Either prisoner can pass or claim to know the answer. If the answer is correct both regain their freedom, if not both are executed.

Can they find a secure solution without ever communicating with each other? If they both are clever enough they can. Now, you can leave this page for thinking if you wish. But mark it so that you can find your way back in case you cannot solve the problem.

The logic gets very involved if you start with the given values a=6 and b=4. A knows from the start that either b=4 or b=3. Thus he knows that B knows that he has a number between 5 and 7 etc. If you think that way, you are in for a problematic approach.

### The Solution

It took me several attempts to change my thinking. Let us call "shared knowledge" the facts that both prisoners know and that both prisoners know the other prisoner knows. In fact, we think of what an observer would know. We ignore the specific numbers a=6 and b=4 for a moment.

Then, after A passes, everyone knows (including the observer) that A cannot have a=10. So he has any number between 0 and 9. Likewise, B cannot have b=10 when he passes. But notice that B cannot have the number b=0 either. Since A has less than 10, B would immediately know the answer a+b=9 in that case. Thus, after the first day, we write the shared knowledge as

\(D_1 : \quad 0 \le a \le 9, \quad 1 \le b \le 9\)

Once A passes on the second day, we know that he has neither a=0 nor a=9. Otherwise, he would have known the sum, as you will easily check. So a is between 1 and 8. If B passes too we know that b=1 and b=9 are impossible. So after the second day everyone knows

\(D_2 : \quad 1 \le a \le 8, \quad 2 \le b \le 8\)

Continuing like that we get

\(D_3 : \quad 2 \le a \le 7, \quad 3 \le b \le 7\)

\(D_4 : \quad 3 \le a \le 6, \quad 4 \le b \le 6\)

Now, A knows that a=6. Thus either b=3 or b=4. But b=3 is no longer possible. So when A is asked on Day 5, A knows for certain that b=4 and thus a+b=10.

Note that B has to decide between a=5 and a=6, which he cannot do earlier with this reasoning. But assume the correct numbers are a=6 and b=3. Then B has to decide between a=6 and a=7. He can do so when he is asked on Day 4, because he will then already know that a is between 3 and 6.

This solved the problem in our specific case. The logic is rather simple, but weird.

### Deceptive Ideas

Let us try to shorten the process and put our logic to a test. We ignored the specific knowledge of A and B about the number they have. A knows a=6, and B knows b=4. So A knows that b=3 or b=4. Thus A knows that B knows that A has a number between 5 and 7. Likewise, B knows that A knows that B has a number between 3 and 5. So, you might deduce that the shared knowledge between A and B at Day 0 is

\(D_0 : \quad 5 \le a \le 7, \quad 3 \le b \le 5 \tag{1}\)

With the logic from above, we deduce that A would know the answer with a=7. So, after a pass, B knows a=5 or a=6. With either b=3 or b=5 he could immediately deduce a+b=9 or a+b=10. Thus, when he passes he must have b=4. A could then declare the result after only two passes when asked on Day 2.

This is a false and deceptive logic. It cannot be correct, because if a=6 and b=3 then A will declare a+b=9 with the same logic on Day 2. In fact, the "shared knowledge" becomes

\(D_0 : \quad 5 \le a \le 7, \quad 2 \le b \le 4 \tag{2}\)

From that B will exclude a=5 after a pass from A, which changes the outcome completely.

What goes wrong here? While both prisoners will agree to the facts in (1) and (2), and while both are true, they cannot come up with these facts out of their own knowledge. So they have no common agreement about the situation. They do not know what the other prisoner knows. Thus they cannot conclude anything from the assumed knowledge of the other prisoner. To be more precise, B cannot derive the answer from b=3, because from his viewpoint A can still have a=6 or a=7.

### Simulation

It is quite an interesting question if we can simulate the procedure. In fact, we have just given an algorithm to deduce a+b from the number of passes and a or b. Extending from our special case N=10 to a general N, our algorithm goes as follows. The shared knowledge after Day d is

\(D_d : \quad d-1 \le a \le N-d, \quad d \le b \le N-d\)

Now we take into account the true values of a and b, known privately to A and B, respectively. E.g., A knows that b=N-1-a or N-a. Thus, after Day d-1, at Day d, he knows the value of b if d-1=N-a or N-(d-1)=N-a-1. Using similar arguments for B, we get

- B knows the sum at Day d if d=N+1-b (sum=N) or d=b+1 (sum=N-1), after the pass from A at Day d.
- A knows the sum at Day d if d=N+1-a (sum=N) or d=a+2 (sum=N-1), after the pass from B at Day d-1.

Thus the result is known at day

\(d = \min \{N+1-a,a+2,N+1-b,b+1\}.\)

If N=10, a=6, b=4, then d=5. If N=10, a=6, b=3, then d=4. We can prove that this yields the correct answer for all a, b, and N. Or we can simulate, e.g. in EMT.

    >function map check (a,b,N) ...
    $  v=[N+1-b,b+1,N+1-a,a+2];
    $  d=min(v);
    $  if d==N+1-a then
    $    s = N;
    $  elseif d==a+2 then
    $    s = N-1;
    $  elseif d==N+1-b then
    $    s = N;
    $  elseif d==b+1 then
    $    s = N-1;
    $  endif
    $  return s;
    $  endfunction
    >a=0:10; check(a,10-a,10)
    [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
    >a=0:9; check(a,9-a,10)
    [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
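The EMT function only evaluates the closed formula. As a cross-check, here is a sketch in plain Python that simulates the reasoning itself: it tracks the sets of values that an observer still considers possible and lets each prisoner answer as soon as only one sum fits. The function name `simulate` and the set-based bookkeeping are choices made for this illustration.

```python
def simulate(a, b, N=10):
    """Return (day, declared_sum) for private numbers a and b."""
    sums = (N - 1, N)
    A = set(range(N + 1))   # values of a the observer still considers possible
    B = set(range(N + 1))   # same for b
    for day in range(1, N + 2):
        # A is asked first. He combines his own a with the shared knowledge on b.
        options = [s for s in sums if s - a in B]
        if len(options) == 1:
            return day, options[0]
        # A passed, so his number leaves both sums open.
        A = {x for x in A if all(s - x in B for s in sums)}
        # Then B is asked.
        options = [s for s in sums if s - b in A]
        if len(options) == 1:
            return day, options[0]
        B = {y for y in B if all(s - y in A for s in sums)}
    raise ValueError("a+b must be N-1 or N")


print(simulate(6, 4))  # (5, 10)
print(simulate(6, 3))  # (4, 9)
```

For the two examples from the text this reproduces Day 5 with sum 10 and Day 4 with sum 9, and the day agrees with the minimum formula above.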

But there is a logical problem. Does the algorithm prove that the two prisoners can escape? The problem is that the prisoners are not allowed to communicate and agree on a specific algorithm. Are there alternatives to this algorithm? Is this the fastest algorithm? These questions are not so easy to answer.

But starting from the observation that A cannot declare anything on Day 1 unless a=10, one can work the way down to the solution as written above. It becomes obvious that the given path is the only possible algorithm.

### Generalizations

On the page linked at the start the problem is to decide between a+b=18 and a+b=20. Of course, we could choose any other set of possible sums. The logic remains the same. Just don’t let yourself be trapped into shortcuts that take into account what each prisoner knows about his own number.