A Probablistic Look at the Geometric Series

So talking about the St. Petersburg paradox has reminded me of my undergraduate years, including a certain quick (but kind of handwavy) proof of the formula of the geometric series shown to me by Dr. Vitaly Bergelson.

One takes a coin which has the probability 0<p<1 of being heads and probability q=1-p of being tails, and play a familiar game: flip the coin until it lands on tails. The possibilities are exhausted as follows: tails first; heads then tails; two heads then tails; three heads then tails; etc, and always heads forever. The last option has probability zero (not to be confused with impossible) because the probability of that option is less than the probability of getting n heads in n tosses is 1/p^n which converges to 0 as n\to\infty, so since the probability of always getting heads is less than all these it must be 0.

So we have to have one of these mutually exclusive options, and so their probabilities must add up to 1. The probability of n heads followed by 1 tails is p^n\cdot q. So we have that \sum_{n=0}^{\infty} p^nq=1, or if we divide by q, we get: \sum_{n=0}^{\infty} p^n= 1/q=1/(1-p).

So the geometric series comes out of the wash from playing a coin game. The question Dr Bergelson asked next was what happens when instead of stopping at 1 tails, you stop at 2? What new formula do you get? I admit that I never got the answer to that question despite trying for some time, though I haven't thought about it since my undergraduate days.

I Still Don’t Understand the St Petersburg Paradox

Let’s consider the following game: You pay a certain amount to play the game. I have a fair coin in my hand. At the beginning of each turn, I flip the coin. If it turns up tails, the next turn begins. If it turns up heads, I give you 2^n dollars, where n is the turn. So if I get heads on the first turn, you get $2. If I flip the coin and get tails, then another tails, and then heads, then I give you $8. The question is how much would you pay to play this game?

A reasonable measure for this sort of thing is the expected value. When you have a game with the probability of different cash values at the end, the expected value tells you what the average will be if you consider what happens through many of the same game. So it seems like a good choice for how much you should pay, but in this case the expected value is: \sum_{n=1}^{\infty} \mathsf{\text{probability of the game ending on turn } } n \cdot 2^n = \sum_{n=1}^{\infty} 1/2^n \cdot 2^n= \sum_{n=1}^{\infty} 1 =\infty.* So the expected value here is infinite! Okay, so that means we should spend any amount of money on this game, because no matter what it’s a good deal.

So I claim, but no one will actually do such a thing. Hence, the paradox. Now the resolution to this that I’ve heard and heard was due to Euler (or one of the Bernoulli’s?) was the idea of utility and diminishing marginal value. The idea is that your first trillion dollars is much more valuable to you than your next and you have less value of money the more you get. Now my problem with this solution is that I can modify the game based on your utility function so that the first couple of turns still give a small amount, but the higher turns give an absurd amount of money.

I’ve also heard that martingales and stopping times give a solution to the paradox, but I don’t know enough probability to understand it. So if someone does, would you be willing to give an explanation?

*Notice we can compute the probability because we know if the game ended on the nth turn, then the coin tosses had to have gone n-1 tails and then heads. So the probability of getting that specific combination is 1/2^{n-1}\cdot 1/2=1/2^n.

Thoughts on Talks: Polya’s Theorem and Fourier Transforms

Polya’s theorem states, colloquially speaking, that “a drunk man will eventually return home, but a drunk bird might not.” Though an interesting theorem, the novelty of today’s talk by resident probabilist Aaron Montgomery was that by using Fourier transforms (and copious applications of Fubini’s theorem) the theorem is proved with relative ease and without any messy combinatorics. This explains the latter half of his title: “Local Central Limit Theorem for Z^d or: How I Learned to Stop Worrying and Love the Fourier Transform,” though as Aaron himself stated the first part of the title did not actually make an appearance.

During the talk, I began to remember the rich connection one has between integrals and our respective fields. In functional analysis, we associate integrals with algebraic functions: linear functionals and inner products. In probability, the association is with expected values. But more than a surface level association, this gives a depth to the subjects involved and the seemingly straightforward yet stubbornly difficult integral becomes something alluring yet austere.

As a contribution, I wanted to talk about how functional analysis fit into this talk, but first I have to retell the beginning.

The basic setup is that one has a sequence \{X_i\} of independent, identically distributed random vectors in \mathbb{Z}^d. These vectors correspond to the steps that our random walker (or flyer as the case may be) are taking. The example of interest to us is where we take the standard basis vectors and vectors obtained by multiplying them by -1 and give them uniform probability of being chosen. We will call this the simple random walk. So in the case of d=2, we have a 1/4 chance of going up, down, left or right. To keep track of the walker’s position, we just need to add up the vectors: S_n=\sum_{i=1}^{n}X_i.

We consider the characteristic functions \Phi_{X}(t)=\mathbb{E}(e^{it\cdot X}), and by independence we have \Phi_{S_n}=\Phi_{X_1}^n. This follows quickly from the fact that X_i are independent and identically distributed. But we pause here briefly, to mention the fact that rare numerical equality of the integral of a product with the product of integrals is the very condition of independence, about which we usually rely on our intuitive notion. This quickly reminded me about Mark Kac’s monograph about the subject, but maybe more on that later this month.

The latter is computable, at least in the simple case, we have that \Phi_{X_1}(t)=\frac{1}{d}\sum_{k=1}e^{it_k}+e^{-it_k}=\frac{1}{d}\sum_{k=1}^{d}\cos(t_k).

It is at this point in the talk that we see functional analysis taking a peek, bidding us to invite it in. We notice that by taking the proper scaling of our usual Lebesgue measure on [-\pi,\pi]^d, we have that the functions f_x(t)=e^{it\cdot x} form an orthonormal basis for x\in\mathbb{Z}^d, and that by definition of expected value, one has \Phi_{S_n}(t)=\sum_{x\in\mathbb{Z}^d}\mathbb{P}(S_n=x)e^{it\cdot x}. So we have that our characteristic function can be written as a linear combination of our orthonormal basis, but more to the point, we have numerical values for the coefficients of that sum by taking inner products!

So with a generalization of Parseval’s identity, we have that \langle \Phi_{X_1}^n(t), e^{it\cdot x} \rangle = \mathbb{P}(S_n=x), or to put this in integral terms: \int_{[-\pi,\pi]^d}\!\Phi_{X_1}^n(t)e^{-it\cdot x}\,dm=\mathbb{P}(S_n=x).

The characteristic function lines up the information we need and sorts them out, and then the Fourier transform allows us to take them out.

Of course, we still have some computations to do and it is not easy, but with this association between the probabilities of S_n and the characteristic function, we have something that we can estimate and prove the theorem.