Section 7.6 Bayes' Theorem

Let's begin with some probability. One way to represent a probability is with a number between 0 and 1. A quick way to remember how to calculate some probabilities is

\begin{equation*} \frac{\textrm{# of desired outcomes }}{ \textrm{# of possible outcomes }}. \end{equation*}

Consider the probability of rolling a number greater than 4 on a six-sided die. There are 2 numbers greater than 4 (namely, 5 and 6) and 6 possible outcomes, so the probability of rolling greater than 4 is \(\frac{2}{6}\text{,}\) or approximately 0.333.
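
This counting recipe is easy to check by enumeration. Below is a minimal sketch in Python (the variable names are our own):

```python
# Sample space for one roll of a six-sided die
outcomes = [1, 2, 3, 4, 5, 6]

# Desired outcomes: rolls greater than 4
desired = [n for n in outcomes if n > 4]

probability = len(desired) / len(outcomes)
print(desired)      # [5, 6]
print(probability)  # 0.333...
```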

If events \(A\) and \(B\) do not influence the probability of each other happening (they are independent events), you can calculate their joint probability by simply multiplying their individual probabilities. Consider the act of rolling two dice. An example of two independent events is \(A\text{:}\) rolling a 6 with the first die, and \(B\text{:}\) rolling an odd number with the second. These events are independent because rolling a 6 with die 1 does not change the probability of rolling an odd number with die 2. Observe that \(P(A)=\frac{1}{6}\) and that \(P(B)=\frac{3}{6}\text{,}\) since there are three odd numbers on a die. We can then use the formula

\begin{equation*} P(A , B)=P(A) P(B) \end{equation*}

and plug in the values we just determined to get

\begin{equation*} P(A , B) = \frac{1}{6}\cdot\frac{3}{6} = \frac{3}{36}. \end{equation*}
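
We can confirm this product rule by brute force, enumerating all 36 equally likely outcomes of the two rolls (a sketch; the names are our own):

```python
from fractions import Fraction
from itertools import product

rolls = range(1, 7)
outcomes = list(product(rolls, rolls))  # all 36 (die 1, die 2) pairs

# A: die 1 shows a 6; B: die 2 is odd
joint = [(a, b) for a, b in outcomes if a == 6 and b % 2 == 1]

p_joint = Fraction(len(joint), len(outcomes))
print(p_joint)  # 1/12, i.e. 3/36
assert p_joint == Fraction(1, 6) * Fraction(3, 6)
```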

Consider the space of all possible outcomes of rolling two dice, all 36 outcomes. These individual outcome probabilities are recorded in the following table:

Table 7.6.1.
\(A \backslash B\) 1 2 3 4 5 6
1 \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\)
2 \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\)
3 \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\)
4 \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\)
5 \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\)
6 \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\) \(1 / 36\)

The event of rolling a six with the first die is represented by the bottom row. The outcomes where the first die is a six and the second die is a 1, 3, or 5 are the corresponding cells in that row. If you add them up, you get a probability of \(3/36\text{,}\) matching the product-rule calculation above.

Now consider the game of craps. One desirable outcome of the game is to roll two dice that sum to 7 or 11. Consider the two events \(A\text{:}\) the first die shows a number less than 4, and \(B\text{:}\) the two dice sum to 7. Suppose we want to know the probability of rolling two dice that sum to 7, given that the first roll is less than 4. In math notation, we write this probability as \(P(B \vert A)\text{.}\)

You may remember from a high school algebra course that

\begin{equation*} P(B\vert A)=\frac{P(B,A)}{P(A)}, \end{equation*}

and then you can use this formula to determine the probability that the two dice sum to 7 given that the first roll is less than 4.
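
This conditional probability can again be checked by enumeration: count the outcomes where the first roll is less than 4, and then count how many of those sum to 7. A sketch, with our own variable names:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 pairs

a_outcomes = [(x, y) for x, y in outcomes if x < 4]          # event A
ab_outcomes = [(x, y) for x, y in a_outcomes if x + y == 7]  # A and B

# P(B|A) = P(B,A) / P(A) reduces to a ratio of counts
p_b_given_a = Fraction(len(ab_outcomes), len(a_outcomes))
print(p_b_given_a)  # 1/6
```

Only (1, 6), (2, 5), and (3, 4) satisfy both events, out of the 18 outcomes with a first roll less than 4.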

If we take this formula and multiply both sides by \(P(A)\text{,}\) we see that

\begin{equation*} P(B,A)=P(A)P(B \vert A). \end{equation*}

But suppose we swap events \(A\) and \(B\text{.}\) Then notice that

\begin{equation*} P(A\vert B)=\frac{P(A,B)}{P(B)}. \end{equation*}

If we solve this for \(P(A,B)\) we get

\begin{equation*} P(A,B)=P(B)P(A\vert B). \end{equation*}

But notice that the probability of \(A\) and \(B\) is the same as the probability of \(B\) and \(A\text{,}\) so we know \(P(A,B)=P(B,A)\text{.}\) Plugging in the formulas we just determined for each of these, we see that

\begin{equation*} P(B) P(A \vert B)=P(A) P(B \vert A). \end{equation*}

Solving for \(P(A \vert B)\text{,}\) we obtain Bayes' Theorem:

\begin{equation*} P(A \vert B)= \frac{P(A)P(B \vert A)}{P(B)}. \end{equation*}
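
Using the dice events from earlier (\(A\text{:}\) die 1 shows a 6, \(B\text{:}\) die 2 is odd), we can confirm Bayes' theorem numerically in a short sketch:

```python
from fractions import Fraction

p_a = Fraction(1, 6)          # P(A): die 1 shows a 6
p_b = Fraction(3, 6)          # P(B): die 2 is odd
p_b_given_a = Fraction(3, 6)  # independent events, so P(B|A) = P(B)

# Bayes' theorem: P(A|B) = P(A) P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b
print(p_a_given_b)  # 1/6
```

Because \(A\) and \(B\) are independent, the result equals \(P(A)\text{,}\) exactly as it should.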

You might ask, why would you do all of this manipulation? This theorem is now ubiquitous in science, machine learning, and artificial intelligence. As we will see in later sections, Bayes' theorem allows you to make predictions and inferences with what seems like incomplete data.

Subsection 7.6.1 Using Bayes' Theorem to calculate false negatives

One consideration that policy makers and public health experts must take into account when selecting tests for a disease is the false positive and false negative rate: the probability of testing positive when a person does not have the disease (a false positive), and the probability of testing negative when a person actually has it (a false negative).

Let's say we are trying to find the probability that someone who tests negative is actually infected. This is arguably the scarier scenario from a public health perspective, as we saw with asymptomatic carriers of the COVID-19 virus. If someone thinks they are negative but is actually positive, they might spread the disease to many others!

Let's say the Umbrella corporation is evaluating a new rapid COVID test. They start by randomly selecting 100,000 people and administering two tests: their new rapid test and a 100 percent reliable (but slow) Super PCR test. (In reality, PCR tests are not 100% accurate.)

Note: "negative" means tested negative (not actually negative), and "positive" means tested positive (not actually positive).

\begin{equation*} P(infected \vert negative)= \frac{P(negative\vert infected)P(infected)}{P(negative)} \end{equation*}

In the broader population, the prior belief of the Centers for Disease Control and Prevention is that the infection rate for the disease is 0.1 percent. Therefore, \(P(infected) = 0.001\text{.}\) The manufacturers of the test have taken a random sampling of people and administered the test. Of the 100,000 people, 98,500 people tested negative. Therefore \(P(negative) = 98,500 / 100,000 = 0.985\text{.}\)

Because there are many more non-infected people than infected people, the company follows up with the Super PCR test. Since that test is 100 percent accurate, it identifies exactly who is actually infected: 1,500 of the 100,000 people. Of these 1,500 people who are actually infected, 1,000 tested positive and 500 tested negative using the rapid test. Therefore, \(P(negative \vert infected) = 500/1,500 = 1/3\text{.}\)

We now have everything we need to calculate the probability that a person who tests negative is actually infected.

\begin{equation*} P(infected \vert negative)= \frac{P(negative\vert infected)P(infected)}{P(negative)} = \frac{(1/3)(0.001)}{(0.985)}\approx 0.000338. \end{equation*}
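
The calculation above is short enough to sketch directly in Python (the variable names are our own):

```python
p_infected = 0.001                       # CDC prior: 0.1 percent infection rate
p_negative = 98_500 / 100_000            # rapid-test negatives in the sample
p_negative_given_infected = 500 / 1_500  # from the Super PCR follow-up

# Bayes' theorem: P(infected | negative)
p_infected_given_negative = (
    p_negative_given_infected * p_infected / p_negative
)
print(p_infected_given_negative)  # ≈ 0.000338
```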

This might not seem like a high probability, but consider the implications for a city of 1,000,000 people: that's \(0.000338 \times 1,000,000 \approx 338\) people walking around the city, spreading disease!