Chapter 0 Review

0.1 Definitions

Probability Space and Function. We consider a finite set \(\Omega=\{\omega_i\}_{i=1}^N\) as our space of all possible outcomes, and an additive probability function \(P:2^\Omega\to[0,1]\) that assigns a likelihood to every subset of \(\Omega\); here \(2^\Omega\) denotes the collection of all subsets of \(\Omega\). Namely, we require that for any \(A\subset\Omega\),

\[ P(A)=\sum_{\omega_i\in A}P(\{\omega_i\}), \]

so in particular, we have that if \(A\cap B=\emptyset\), then

\[ P(A\cup B)=\sum_{\omega_i\in A\cup B}P(\{\omega_i\})=\sum_{\omega_i\in A}P(\{\omega_i\})+\sum_{\omega_i\in B}P(\{\omega_i\})=P(A)+P(B). \]

Moreover, we require that the probability that some outcome occurs is exactly 1:

\[ P(\Omega)=\sum_{\omega_i\in \Omega}P(\{\omega_i\})=1. \]

Example. Rolling two independent dice, we have the possible pairs

\[ \Omega=\{(i,j): i,j\in\{1,\dots,6\}\} \]

and the natural probability distribution

\[ P(\{(i,j)\})=\frac{1}{36}\quad \forall i,j\in\{1,\dots,6\}. \]
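
To make the definition concrete, here is a minimal Python sketch (an illustration added for these notes, not part of the original text) that encodes this two-dice probability space and checks additivity and \(P(\Omega)=1\):

    from fractions import Fraction

    # Sample space: all ordered pairs (i, j) from two dice.
    Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

    # The probability function on singletons: P({omega}) = 1/36.
    P = {omega: Fraction(1, 36) for omega in Omega}

    def prob(A):
        """P(A) = sum over omega in A of P({omega})."""
        return sum(P[omega] for omega in A)

    A = [(1, j) for j in range(1, 7)]        # first die shows 1
    B = [(2, j) for j in range(1, 7)]        # first die shows 2 (disjoint from A)
    assert prob(A + B) == prob(A) + prob(B)  # additivity on disjoint events
    assert prob(Omega) == 1                  # P(Omega) = 1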

Random variable. A random variable (or observable) is a function

\[ X:\Omega\to\mathbb{R}. \]

Example. For two dice, the typical example is the sum of the two dice:

\[ X((i,j))=i+j, \]

which has the following probability distribution:

Probability distribution for the sum of two dice.
x        2      3      4      5      6      7      8      9      10     11     12
P(X=x)   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36

\[ P(X=x)=P(\{\omega_i : X(\omega_i)=x\})=\sum_{\{\omega_i : X(\omega_i)=x\}} P(\{\omega_i\}). \]

In the simplest case,

\[ P(X=2)=P(\{(1,1)\})=1/36. \]
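
The table above can be reproduced by brute-force counting; the following short sketch (illustrative, using the same uniform space) tabulates \(P(X=x)\) for the sum:

    from collections import Counter
    from fractions import Fraction

    Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

    # X((i, j)) = i + j: count how many outcomes give each value x.
    counts = Counter(i + j for (i, j) in Omega)
    dist_X = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

    assert dist_X[2] == Fraction(1, 36)   # P(X = 2) = P({(1, 1)}) = 1/36
    assert dist_X[7] == Fraction(6, 36)   # the most likely sum
    assert sum(dist_X.values()) == 1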

Next, we consider some non-linear random variables: let \(Y:\Omega\to \mathbb{R}\) and \(Z:\Omega\to \mathbb{R}\) be given by

\[ Y((i,j))=\min\{i,j\} \quad \text{and} \quad Z((i,j))=\max\{i,j\}, \]

which have probability distributions given by:

Probability distribution of \(Y\).
y        1      2      3      4      5      6
P(Y=y)   11/36  9/36   7/36   5/36   3/36   1/36

Probability distribution of \(Z\).
z        1      2      3      4      5      6
P(Z=z)   1/36   3/36   5/36   7/36   9/36   11/36
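
Both tables follow from the same kind of counting; a brief sketch (illustration only):

    from collections import Counter
    from fractions import Fraction

    Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

    # Y = min and Z = max of the two dice.
    dist_Y = {y: Fraction(c, 36) for y, c in Counter(min(w) for w in Omega).items()}
    dist_Z = {z: Fraction(c, 36) for z, c in Counter(max(w) for w in Omega).items()}

    assert dist_Y[1] == Fraction(11, 36) and dist_Y[6] == Fraction(1, 36)
    assert dist_Z[1] == Fraction(1, 36) and dist_Z[6] == Fraction(11, 36)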

Knowing only the individual probability distributions of \(Y\) and \(Z\), we cannot actually figure out their joint distribution, which is given by:

Joint probability distribution of \(Y\) and \(Z\).
y \ z   1      2      3      4      5      6
1       1/36   2/36   2/36   2/36   2/36   2/36
2       0      1/36   2/36   2/36   2/36   2/36
3       0      0      1/36   2/36   2/36   2/36
4       0      0      0      1/36   2/36   2/36
5       0      0      0      0      1/36   2/36
6       0      0      0      0      0      1/36
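
The joint table is again a counting exercise; the sketch below (illustrative) tabulates \(P(Y=y,\,Z=z)\) and shows that the marginal distributions are recovered by summing rows and columns:

    from collections import Counter
    from fractions import Fraction

    Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

    # Joint distribution of (Y, Z) = (min, max); pairs with probability 0 are omitted.
    joint = {yz: Fraction(c, 36)
             for yz, c in Counter((min(w), max(w)) for w in Omega).items()}

    assert joint[(2, 5)] == Fraction(2, 36)   # y < z happens in two orders
    assert (5, 2) not in joint                # the max cannot be below the min

    # Marginal P(Y = 2) is the sum of the row y = 2 of the joint table.
    assert sum(p for (y, z), p in joint.items() if y == 2) == Fraction(9, 36)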

Note: Knowing the individual distributions does not determine the joint distribution.

Conditional Probability. How is the probability of an event \(B\subset \Omega\) affected by knowing, or supposing, that another event \(A\subset \Omega\) has happened? For \(P(A)>0\), we define the conditional probability of \(B\) given \(A\) by

\[ P(B|A)=\frac{P(B\cap A)}{P(A)}. \]

or equivalently the multiplication rule

\[ P(B\cap A)=P(A)P(B|A). \]

We can use this together with the Law of Total Probability: assume \(A_1,\dots, A_N\subset\Omega\) satisfy \(\bigcup_{i=1}^N A_i=\Omega\) and \(A_i\cap A_j=\emptyset\) for \(i\ne j\); then for any \(B\subset \Omega\),

\[ P(B)=\sum_{i=1}^N P(B\cap A_i)=\sum_{i=1}^N P(B|A_i) P(A_i). \]

Example. Assume we have two urns with balls inside them. Urn 1 has 10 Blue balls and 30 Red balls, and Urn 2 has 40 Blue balls and 20 Red balls. Assume a ball is picked with the following algorithm:

  • Pick an Urn at random.
  • Pick a ball from that Urn at random.

Assume that you win if the ball picked is Blue. What is the probability that the ball is Blue?

\[ \begin{aligned} P(\{\text{Ball is Blue}\}) &= P(\{\text{Ball is Blue}\}|\{\text{Urn 1}\})P(\{\text{Urn 1}\})+P(\{\text{Ball is Blue}\}|\{\text{Urn 2}\})P(\{\text{Urn 2}\}) \\ &= \frac{10}{40}\times\frac{1}{2}+\frac{40}{60}\times\frac{1}{2} \\ &=\frac{1}{8}+\frac{1}{3}=\frac{11}{24}. \end{aligned} \]

In the background, we have a probability space

\[ \Omega=\{(1,B),(1,R),(2,B),(2,R)\} \]

and probability function

\[ P(\{(1,B)\})=\frac{1}{8}, \quad P(\{(1,R)\})=\frac{3}{8}, \quad P(\{(2,B)\})=\frac{1}{3},\quad P(\{(2,R)\})=\frac{1}{6}. \]
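
A quick numerical check of the urn computation (a sketch; the urn compositions are the ones stated above):

    from fractions import Fraction

    # Urn compositions as (blue, red); each urn is picked with probability 1/2.
    urns = {1: (10, 30), 2: (40, 20)}

    p_blue = sum(Fraction(1, 2) * Fraction(blue, blue + red)   # P(Urn k) * P(Blue | Urn k)
                 for blue, red in urns.values())

    assert p_blue == Fraction(1, 8) + Fraction(1, 3) == Fraction(11, 24)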

Question: Assuming that you can arrange the balls between the two urns yourself, how do you maximize your chances of winning?

Conditioning random variables

\[ P(Y=y|X=x)=\frac{P(\{X=x\}\cap\{Y=y\})}{P(X=x)} \]

How does knowing the outcome of one random variable affect the distribution of another? For instance, let's look back at the example of \(Y\) and \(Z\), the min and the max of two independent dice. If \(Y=2\), then we already know that \(Z\ge 2\).

Probability distribution of \(Z\) given \(Y=2\).
z            1     2     3     4     5     6
P(Z=z|Y=2)   0     1/9   2/9   2/9   2/9   2/9

Similarly, we can condition on \(Y=5\):

Probability distribution of \(Z\) given \(Y=5\).
z            1     2     3     4     5     6
P(Z=z|Y=5)   0     0     0     0     1/3   2/3
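
Both conditional tables come straight from the definition \(P(Z=z\mid Y=y)=P(\{Y=y\}\cap\{Z=z\})/P(Y=y)\); a short sketch (illustrative):

    from collections import Counter
    from fractions import Fraction

    Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
    joint = Counter((min(w), max(w)) for w in Omega)   # counts out of 36

    def cond_Z_given_Y(y):
        """P(Z = z | Y = y) as a dict z -> probability (zero entries omitted)."""
        total = sum(c for (yy, z), c in joint.items() if yy == y)   # 36 * P(Y = y)
        return {z: Fraction(c, total) for (yy, z), c in joint.items() if yy == y}

    assert cond_Z_given_Y(2) == {2: Fraction(1, 9), 3: Fraction(2, 9), 4: Fraction(2, 9),
                                 5: Fraction(2, 9), 6: Fraction(2, 9)}
    assert cond_Z_given_Y(5) == {5: Fraction(1, 3), 6: Fraction(2, 3)}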

We say that two random variables are independent if knowing the value of one gives no useful information about the other. For dependence, think of today's temperature versus tomorrow's: given the temperature today, you can make an informed guess about the next couple of days. Total independence of random variables means that the outcome of one does not affect the other. There are several equivalent ways to state this:

\[ P(Y=y|X=x)= P(Y=y) \quad \text{for all } x, y \text{ with } P(X=x)>0, \]
or equivalently,
\[ P(X=x,Y=y)= P(X=x)P(Y=y) \quad \text{for all } x, y. \]
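
As a check on these definitions, the two individual dice are independent of each other, while \(Y\) and \(Z\) are not; a brief sketch (illustrative) tests the product rule:

    from fractions import Fraction

    Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

    def pr(event):
        """Probability of {omega : event(omega) is True} under the uniform measure."""
        return Fraction(sum(1 for w in Omega if event(w)), 36)

    # The two coordinates are independent: P(i=a, j=b) = P(i=a) P(j=b) for all a, b.
    assert all(pr(lambda w: w == (a, b)) == pr(lambda w: w[0] == a) * pr(lambda w: w[1] == b)
               for a in range(1, 7) for b in range(1, 7))

    # Y = min and Z = max are NOT independent: P(Y=6, Z=1) = 0, but P(Y=6) P(Z=1) > 0.
    assert pr(lambda w: min(w) == 6 and max(w) == 1) == 0
    assert pr(lambda w: min(w) == 6) * pr(lambda w: max(w) == 1) > 0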

Note: Overusing the independence assumption is blamed for the collapse of the mortgage bubble in 2008.

Expectation: