Discrete Probability – Complete Lesson

Introduction: What is Probability?

My dear student, let me ask you a simple question before we begin. If you flip a fair coin, what is the chance that it lands on heads? You probably said “half” or “50%.” You are absolutely right! Without knowing it, you just applied the concept of probability.

Probability is a way of measuring how likely an event is to happen. We use it every day in real life — weather forecasts say “80% chance of rain,” doctors say “the test is 95% accurate,” and lotteries tell us our “1 in 10 million” chance of winning. All of these are probability statements.

In discrete mathematics, we focus on situations where the possible outcomes are countable — like rolling dice, drawing cards, or flipping coins. This is called discrete probability. Let us learn it step by step, starting from the very beginning.

Part 1: Sample Spaces and Events

What is a Sample Space?

Before we can talk about probability, we need to understand the set of all possible outcomes of an experiment. This set is called the sample space, and it is usually denoted by the letter \(S\).

Definition: The sample space of an experiment is the set of all possible outcomes of that experiment.

Examples of Sample Spaces

Example 1: When we flip a single coin, the possible outcomes are Heads (H) or Tails (T). So:

\[S = \{H, T\}\]

Example 2: When we roll a single die, the possible outcomes are 1, 2, 3, 4, 5, or 6:

\[S = \{1, 2, 3, 4, 5, 6\}\]

Example 3: When we flip two coins, we must list ALL possible pairs:

\[S = \{HH, HT, TH, TT\}\]

Be careful here! Many students write only {HH, HT, TT} and forget TH. But HT means “first coin heads, second coin tails” which is DIFFERENT from TH “first coin tails, second coin heads.” Always be thorough!

Example 4: When we roll two dice, the sample space has 36 ordered pairs:

S = {
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
}
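
Listing large sample spaces by hand is error-prone; as a quick check (Python is not part of the lesson, this is just a sketch), the table above can be generated mechanically:

```python
from itertools import product

# Sample space for two dice: all ordered pairs (first die, second die)
S = list(product(range(1, 7), repeat=2))

print(len(S))        # 36 outcomes, matching the table above
print(S[0], S[-1])   # (1, 1) (6, 6)
```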

What is an Event?

An event is a subset of the sample space. It is a collection of outcomes that we are interested in.

Definition: An event is any subset of the sample space \(S\).

Example 5: If we roll a die and define event A as “rolling an even number,” then:

\[A = \{2, 4, 6\}\]

Example 6: If we roll two dice and event B is “the sum is 7,” then:

\[B = \{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}\]

Question: A coin is flipped three times. List the sample space. Then list the event “exactly two heads.”

Sample space:

\[S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}\]

There are \(2^3 = 8\) outcomes.

Event “exactly two heads”:

\[A = \{HHT, HTH, THH\}\]

Notice: we do NOT include HHH (that is three heads, not two) and we do NOT include HTT, THT, TTH (those have only one head).
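
The same enumerate-and-filter idea works for events. A minimal Python sketch of the three-coin example:

```python
from itertools import product

# Sample space for three coin flips, e.g. ('H', 'H', 'T')
S = list(product("HT", repeat=3))

# Event "exactly two heads": keep only the qualifying outcomes
A = [outcome for outcome in S if outcome.count("H") == 2]

print(len(S))  # 8 outcomes (2^3)
print(A)       # the three outcomes HHT, HTH, THH (as tuples)
```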

Types of Events

There are some special types of events you should know:

  • Impossible event: The empty set \(\emptyset\). For example, “rolling a 7 on a standard die.”
  • Certain event: The entire sample space \(S\). For example, “rolling a number between 1 and 6 on a die.”
  • Complement of event A: Written as \(\overline{A}\), it is the set of all outcomes in \(S\) that are NOT in \(A\).
  • Union of A and B: \(A \cup B\) means “A OR B happens.”
  • Intersection of A and B: \(A \cap B\) means “A AND B both happen.”
  • Mutually exclusive (disjoint) events: \(A\) and \(B\) are disjoint if \(A \cap B = \emptyset\) — they cannot happen at the same time.

Connection to Set Theory: Notice how probability uses all the set operations you learned before — union, intersection, complement, De Morgan’s laws. This is why we studied sets first! Everything from set theory applies here.

Key Points — Sample Spaces and Events:
  • The sample space \(S\) is the set of ALL possible outcomes
  • An event is a subset of \(S\)
  • Be careful to list ALL outcomes — don’t miss any
  • When flipping \(n\) coins, there are \(2^n\) outcomes
  • When rolling \(n\) dice, there are \(6^n\) outcomes
  • Events use set notation: union (OR), intersection (AND), complement (NOT)

Question 1: Two dice are rolled. Let event A = “first die is 3” and event B = “sum is 8.” Find \(A \cap B\) and explain what it means.

\(A = \{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}\)

\(B = \{(2,6), (3,5), (4,4), (5,3), (6,2)\}\)

\(A \cap B = \{(3,5)\}\)

This means there is exactly ONE outcome where both events happen simultaneously: first die shows 3 AND the sum is 8 (which requires the second die to be 5).

Question 2: A card is drawn from a standard 52-card deck. How many outcomes are in the sample space? List the event “drawing a face card.”

The sample space has 52 outcomes (one for each card).

Face cards are Jacks, Queens, and Kings. There are 3 face cards per suit, and 4 suits, so:

Event “face card” = {J♠, Q♠, K♠, J♥, Q♥, K♥, J♦, Q♦, K♦, J♣, Q♣, K♣}

This event has 12 outcomes.

Part 2: Finite Probability Spaces (Laplace’s Definition)

Assigning Probabilities to Outcomes

Now that we have a sample space, how do we assign probabilities? For finite sample spaces with equally likely outcomes, we use Laplace’s definition:

Laplace’s Definition: If a sample space \(S\) has \(n\) equally likely outcomes, and event \(A\) contains \(k\) of those outcomes, then:

\[P(A) = \frac{k}{n} = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S}\]

This is the most intuitive definition. Let me explain with examples.

Example 7: Rolling a Die

When we roll a fair die, each of the 6 faces is equally likely. So:

  • \(P(\text{rolling a 4}) = \frac{1}{6}\) — only 1 favorable outcome out of 6
  • \(P(\text{rolling an even number}) = \frac{3}{6} = \frac{1}{2}\) — three favorable outcomes: {2, 4, 6}
  • \(P(\text{rolling a number greater than 4}) = \frac{2}{6} = \frac{1}{3}\) — two favorable outcomes: {5, 6}
  • \(P(\text{rolling a 7}) = \frac{0}{6} = 0\) — impossible event
  • \(P(\text{rolling a number from 1 to 6}) = \frac{6}{6} = 1\) — certain event

Example 8: Sum of Two Dice

When we roll two dice, there are 36 equally likely outcomes. What is the probability that the sum is 7?

The event “sum = 7” has 6 outcomes: {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}.

\[P(\text{sum} = 7) = \frac{6}{36} = \frac{1}{6}\]

What about sum = 2? Only (1,1), so \(P = \frac{1}{36}\).

What about sum = 12? Only (6,6), so \(P = \frac{1}{36}\).

What about an even sum? The even sums are 2, 4, 6, 8, 10, 12. Counting outcomes:

Sum 2: (1,1) = 1 outcome
Sum 4: (1,3),(2,2),(3,1) = 3 outcomes
Sum 6: (1,5),(2,4),(3,3),(4,2),(5,1) = 5 outcomes
Sum 8: (2,6),(3,5),(4,4),(5,3),(6,2) = 5 outcomes
Sum 10: (4,6),(5,5),(6,4) = 3 outcomes
Sum 12: (6,6) = 1 outcome
Total: 1 + 3 + 5 + 5 + 3 + 1 = 18 outcomes
\[P(\text{even sum}) = \frac{18}{36} = \frac{1}{2}\]
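
The count above is easy to double-check by brute force with Laplace’s rule (a Python sketch, not part of the lesson itself):

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes

# Laplace's rule: P(A) = (outcomes in A) / (outcomes in S)
even_sum = [(a, b) for (a, b) in S if (a + b) % 2 == 0]
p_even = len(even_sum) / len(S)

print(len(even_sum), p_even)  # 18 outcomes, probability 0.5
```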
Important: The phrase “equally likely” is essential in Laplace’s definition. If the outcomes are NOT equally likely, we cannot simply count and divide. For example, a loaded die does not have equally likely faces, so we need a different approach (which we will see with the general definition of probability).

Axioms of Probability

For a more general definition, probability is defined by three axioms that every probability function must satisfy:

Axioms of Probability:

Axiom 1: For every event \(A\), \(0 \leq P(A) \leq 1\)

Axiom 2: \(P(S) = 1\) (the probability of the entire sample space is 1)

Axiom 3: If \(A_1, A_2, A_3, \ldots\) are mutually exclusive events, then:
\[P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots\]

These three axioms are the foundation of ALL probability theory. Every rule and formula we derive comes from these three axioms.

Basic Rules of Probability

From the three axioms, we can prove several important rules:

Rule 1: Complement Rule

\[P(\overline{A}) = 1 - P(A)\]

This says: the probability that A does NOT happen equals 1 minus the probability that A happens. This is very useful when it is easier to count the outcomes we do NOT want.

Rule 2: Addition Rule for Mutually Exclusive Events

\[P(A \cup B) = P(A) + P(B) \quad \text{when } A \cap B = \emptyset\]

Rule 3: General Addition Rule (for any two events)

\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

Why do we subtract \(P(A \cap B)\)? Because when we add \(P(A) + P(B)\), we count the outcomes in \(A \cap B\) twice — once in \(P(A)\) and once in \(P(B)\). Subtracting once corrects this overcounting. This is exactly the Inclusion-Exclusion Principle applied to probability!

Example 9: Using the General Addition Rule

Problem: In a class of 100 students, 45 study Mathematics, 30 study Physics, and 15 study both. If a student is selected at random, find the probability that the student studies Mathematics or Physics.

Solution:

  • \(P(M) = \frac{45}{100} = 0.45\)
  • \(P(P) = \frac{30}{100} = 0.30\)
  • \(P(M \cap P) = \frac{15}{100} = 0.15\)
\[P(M \cup P) = 0.45 + 0.30 - 0.15 = 0.60\]

So there is a 60% chance that a randomly selected student studies Math or Physics.

Example 10: Using the Complement Rule

Problem: What is the probability of getting at least one head when we flip a coin 5 times?

Solution: “At least one head” means 1 or 2 or 3 or 4 or 5 heads. It would be tedious to count all these cases. Instead, use the complement!

The complement of “at least one head” is “NO heads” = “all tails” = TTTTT. There is only 1 such outcome out of \(2^5 = 32\).

\[P(\text{at least one head}) = 1 - P(\text{no heads}) = 1 - \frac{1}{32} = \frac{31}{32}\]

See how much easier that is? The complement rule is one of the most powerful tricks in probability!
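
A quick brute-force check of the complement-rule answer (Python sketch):

```python
from itertools import product
from fractions import Fraction

S = list(product("HT", repeat=5))           # 2^5 = 32 outcomes
no_heads = [o for o in S if "H" not in o]   # only ('T','T','T','T','T')

p_at_least_one = 1 - Fraction(len(no_heads), len(S))
print(p_at_least_one)  # 31/32
```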

Question 3: A card is drawn from a standard deck. Find the probability of drawing a heart or a king.

Let H = “drawing a heart” and K = “drawing a king.”

\(P(H) = \frac{13}{52}\), \(P(K) = \frac{4}{52}\)

\(P(H \cap K) = \frac{1}{52}\) (only the King of Hearts is in both)

\[P(H \cup K) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}\]

If we had just added 13 + 4 = 17, we would have counted the King of Hearts twice. The general addition rule corrects this.

Question 4: What is the probability that in a group of 30 people, at least two share the same birthday? (Assume 365 days, all equally likely, ignore leap years.)

This is the famous Birthday Problem. It is much easier to use the complement rule.

Complement: “All 30 people have DIFFERENT birthdays.”


Total possible birthday assignments: \(365^{30}\)

Favorable (all different): \(365 \times 364 \times 363 \times \cdots \times 336\) (30 terms)

\[P(\text{all different}) = \frac{365 \times 364 \times \cdots \times 336}{365^{30}}\]

This is approximately 0.2937.

\[P(\text{at least two share}) = 1 - 0.2937 \approx 0.7063\]

So there is about a 70.6% chance! Most people find this surprisingly high. With just 30 people, it is more likely than not that two share a birthday. (With 23 people, the probability already exceeds 50%.)
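
The birthday product is easy to reproduce numerically. A minimal Python sketch (the helper name `p_all_different` is ours, purely illustrative):

```python
# Probability that n people all have different birthdays,
# assuming 365 equally likely days and ignoring leap years
def p_all_different(n):
    p = 1.0
    for i in range(n):
        p *= (365 - i) / 365   # person i must avoid the i earlier birthdays
    return p

p_shared_30 = 1 - p_all_different(30)
p_shared_23 = 1 - p_all_different(23)

print(round(p_shared_30, 4))  # about 0.7063
print(round(p_shared_23, 4))  # about 0.5073 — already past 50%
```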

Key Points — Probability Basics:
  • Laplace’s rule: \(P(A) = \frac{|A|}{|S|}\) when outcomes are equally likely
  • Probability is always between 0 and 1
  • Complement rule: \(P(\overline{A}) = 1 - P(A)\) — very useful!
  • Addition rule: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
  • For mutually exclusive events: \(P(A \cup B) = P(A) + P(B)\)
  • “At least one” problems → use complement rule

Part 3: Conditional Probability

What is Conditional Probability?

Sometimes we have extra information that changes the probability. For example, if I tell you that a card drawn from a deck is a heart, what is the probability that it is the King of Hearts? With this extra information, we are no longer looking at the full 52-card deck — we are only looking at the 13 hearts.

Definition: The conditional probability of event \(A\) given that event \(B\) has already occurred is:

\[P(A \mid B) = \frac{P(A \cap B)}{P(B)}\]

This is read as “the probability of A given B.” We require \(P(B) > 0\).

Think of it this way: The notation \(P(A \mid B)\) means “given that we KNOW B has happened, what is the chance that A also happens?” We restrict our attention to only the outcomes in B, and then see what fraction of those are also in A.

Example 11: Conditional Probability with Dice

Problem: Two dice are rolled. Given that the sum is 8, what is the probability that at least one die shows a 6?

Solution: Let \(A\) = “at least one die shows 6” and \(B\) = “sum is 8.”

First, find \(B = \{(2,6), (3,5), (4,4), (5,3), (6,2)\}\). So \(|B| = 5\).

Now, which of these are also in \(A\)? Only \((2,6)\) and \((6,2)\) have a 6. So \(A \cap B = \{(2,6), (6,2)\}\), and \(|A \cap B| = 2\).

\[P(A \mid B) = \frac{|A \cap B|}{|B|} = \frac{2}{5}\]

Notice: We did NOT divide by 36 (the full sample space). We divided by 5 (the number of outcomes in B) because the condition “given that the sum is 8” restricts us to only those 5 outcomes.
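
Restricting attention to the outcomes in the condition is exactly what this Python sketch does:

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))

B = [o for o in S if sum(o) == 8]     # given: the sum is 8
A_and_B = [o for o in B if 6 in o]    # ...and at least one die shows 6

# P(A | B) = |A ∩ B| / |B|: divide by the restricted space, not by 36
p = len(A_and_B) / len(B)
print(len(B), len(A_and_B), p)  # 5 2 0.4
```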

Example 12: Conditional Probability with Cards

Problem: A card is drawn from a standard deck. Given that the card is a face card, what is the probability that it is a King?

Solution: Let \(F\) = “face card” (12 cards) and \(K\) = “King” (4 cards). Then \(F \cap K\) = “King that is a face card” = all 4 Kings (since Kings are face cards).

\[P(K \mid F) = \frac{P(K \cap F)}{P(F)} = \frac{4/52}{12/52} = \frac{4}{12} = \frac{1}{3}\]

This makes sense: among the 12 face cards (J, Q, K of each suit), 4 are Kings, so the chance is 4/12 = 1/3.

Multiplication Rule

From the definition of conditional probability, we can derive the multiplication rule:

Multiplication Rule:

\[P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)\]

This rule is used when we want to find the probability that TWO events both happen.

Example 13: Drawing Without Replacement

Problem: Two cards are drawn from a deck WITHOUT replacement. What is the probability that both are aces?

Solution: Let \(A_1\) = “first card is an ace” and \(A_2\) = “second card is an ace.”

\[P(A_1 \cap A_2) = P(A_1) \cdot P(A_2 \mid A_1) = \frac{4}{52} \times \frac{3}{51} = \frac{12}{2652} = \frac{1}{221}\]

Why is the second probability 3/51? After drawing the first ace, there are only 3 aces left in a deck of 51 remaining cards. This is the key idea of “without replacement.”
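
The multiplication rule for the two-ace draw, checked with exact fractions (a Python sketch):

```python
from fractions import Fraction

# Multiplication rule: P(A1 ∩ A2) = P(A1) * P(A2 | A1), without replacement
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card removed

p_both = p_first_ace * p_second_ace_given_first
print(p_both)  # 1/221
```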

Question 5: A bag contains 5 red balls and 3 blue balls. Two balls are drawn without replacement. Find the probability that both are red.

Let \(R_1\) = “first ball is red” and \(R_2\) = “second ball is red.”

\[P(R_1 \cap R_2) = P(R_1) \cdot P(R_2 \mid R_1) = \frac{5}{8} \times \frac{4}{7} = \frac{20}{56} = \frac{5}{14}\]

After removing one red ball, there are 4 red balls left out of 7 total balls.

Question 6: In a class, 60% of students passed Mathematics and 50% passed English. 30% passed both. Given that a student passed English, what is the probability that they also passed Mathematics?

Let \(M\) = “passed Math” and \(E\) = “passed English.”

\(P(M) = 0.60\), \(P(E) = 0.50\), \(P(M \cap E) = 0.30\)

\[P(M \mid E) = \frac{P(M \cap E)}{P(E)} = \frac{0.30}{0.50} = 0.60\]

So 60% of students who passed English also passed Math. Interestingly, this equals \(P(M)\), which tells us that passing Math is independent of passing English — but we will discuss independence next!

Key Points — Conditional Probability:
  • \(P(A \mid B) = \frac{P(A \cap B)}{P(B)}\) — restrict to the given condition
  • Multiplication rule: \(P(A \cap B) = P(A) \cdot P(B \mid A)\)
  • “Without replacement” → probabilities change after each draw
  • “With replacement” → probabilities stay the same
  • \(P(A \mid B)\) is NOT the same as \(P(B \mid A)\) in general

Part 4: Independent Events

What Does “Independent” Mean?

In everyday language, “independent” means two things don’t affect each other. In probability, it has a precise mathematical definition:

Definition: Two events \(A\) and \(B\) are independent if and only if:

\[P(A \cap B) = P(A) \cdot P(B)\]

This means: knowing that \(B\) happened does NOT change the probability of \(A\). If \(A\) and \(B\) are independent, then \(P(A \mid B) = P(A)\).

Warning — Common Mistake! Many students confuse “independent” with “mutually exclusive.” These are DIFFERENT concepts!

Mutually exclusive means \(A \cap B = \emptyset\) (they cannot both happen)
Independent means \(P(A \cap B) = P(A) \cdot P(B)\) (one doesn’t affect the other)

In fact, if two events are both mutually exclusive AND independent (with positive probability), then \(P(A \cap B) = 0 = P(A) \cdot P(B)\), which forces \(P(A) = 0\) or \(P(B) = 0\). So non-trivial events cannot be both mutually exclusive and independent!

Example 14: Independent Events with Dice

When we roll two dice, the outcome of the first die does NOT affect the outcome of the second die. So events about different dice are independent.

Problem: Roll two dice. What is the probability that the first die shows 4 AND the second die shows an even number?

Solution: Let \(A\) = “first die is 4” (\(P(A) = \frac{1}{6}\)) and \(B\) = “second die is even” (\(P(B) = \frac{3}{6} = \frac{1}{2}\)).

\[P(A \cap B) = P(A) \cdot P(B) = \frac{1}{6} \times \frac{1}{2} = \frac{1}{12}\]

Example 15: Checking Independence

Problem: A coin is flipped and a die is rolled. Let \(A\) = “coin is heads” and \(B\) = “die shows 6.” Are \(A\) and \(B\) independent?

Solution: \(P(A) = \frac{1}{2}\), \(P(B) = \frac{1}{6}\), \(P(A \cap B) = \frac{1}{12}\).

Check: \(P(A) \cdot P(B) = \frac{1}{2} \times \frac{1}{6} = \frac{1}{12} = P(A \cap B)\). ✓

Yes, they are independent. The coin and die have nothing to do with each other.

Independence of More Than Two Events

For three events \(A\), \(B\), and \(C\) to be mutually independent, we need ALL of these conditions:

\[\begin{aligned} P(A \cap B) &= P(A) \cdot P(B) \\ P(A \cap C) &= P(A) \cdot P(C) \\ P(B \cap C) &= P(B) \cdot P(C) \\ P(A \cap B \cap C) &= P(A) \cdot P(B) \cdot P(C) \end{aligned}\]

All four conditions must hold! It is not enough to check them in pairs.

Question 7: A bag has 3 red and 2 blue balls. A ball is drawn, its color is noted, and it is PUT BACK. Then a second ball is drawn. Are the events “first ball is red” and “second ball is red” independent?

Yes, they are independent!

Because the ball is PUT BACK (with replacement), the composition of the bag does not change. So:

\(P(R_1) = \frac{3}{5}\), \(P(R_2) = \frac{3}{5}\)

\(P(R_1 \cap R_2) = \frac{3}{5} \times \frac{3}{5} = \frac{9}{25}\)

And \(P(R_1) \cdot P(R_2) = \frac{3}{5} \times \frac{3}{5} = \frac{9}{25} = P(R_1 \cap R_2)\) ✓

Key insight: “With replacement” always gives independence. “Without replacement” gives dependence.

Question 8: In a certain college, 55% of students are female, 20% are engineering majors, and 10% are female engineering majors. Are the events “student is female” and “student is an engineering major” independent?

Let \(F\) = “female” and \(E\) = “engineering major.”

\(P(F) = 0.55\), \(P(E) = 0.20\), \(P(F \cap E) = 0.10\)

Check: \(P(F) \cdot P(E) = 0.55 \times 0.20 = 0.11\)

But \(P(F \cap E) = 0.10 \neq 0.11\)

Since \(P(F \cap E) \neq P(F) \cdot P(E)\), the events are NOT independent. The fraction of female engineering students (10%) differs slightly from what we would expect if gender and major were independent (11%).

Key Points — Independence:
  • Independent means \(P(A \cap B) = P(A) \cdot P(B)\)
  • If independent: \(P(A \mid B) = P(A)\) — the condition doesn’t matter
  • “With replacement” → independent; “Without replacement” → dependent
  • Independent ≠ Mutually exclusive (don’t confuse them!)
  • For three events: check ALL pair products AND the triple product

Part 5: Bayes’ Theorem

The Problem Bayes’ Theorem Solves

Let me ask you a question to motivate this. Suppose a medical test for a disease is 95% accurate (both for positive and negative results). If 1% of the population has the disease, and you test positive, what is the probability that you actually have the disease?

Most people would say 95%. But the real answer is much lower — only about 16%! This surprising result comes from Bayes’ Theorem, which deals with “reverse” conditional probability.

Partition of the Sample Space

Before stating Bayes’ Theorem, we need the concept of a partition:

Definition: Events \(B_1, B_2, \ldots, B_n\) form a partition of \(S\) if they are pairwise disjoint and their union is \(S\). That is:
(i) \(B_i \cap B_j = \emptyset\) for all \(i \neq j\)
(ii) \(B_1 \cup B_2 \cup \cdots \cup B_n = S\)
(iii) \(P(B_i) > 0\) for all \(i\)

Theorem of Total Probability

Theorem of Total Probability: If \(B_1, B_2, \ldots, B_n\) is a partition of \(S\), then for any event \(A\):

\[P(A) = \sum_{i=1}^{n} P(B_i) \cdot P(A \mid B_i)\]

This formula says: to find \(P(A)\), break it into cases based on which \(B_i\) happens, then add up the contributions.

Bayes’ Theorem

Bayes’ Theorem: If \(B_1, B_2, \ldots, B_n\) is a partition of \(S\) and \(P(A) > 0\), then:

\[P(B_k \mid A) = \frac{P(B_k) \cdot P(A \mid B_k)}{\sum_{i=1}^{n} P(B_i) \cdot P(A \mid B_i)}\]

In words: To find the probability that \(B_k\) caused \(A\), take the probability that BOTH \(B_k\) and \(A\) happen, and divide by the TOTAL probability of \(A\).


For the special case of just two events \(B_1\) and \(B_2\):

\[P(B_1 \mid A) = \frac{P(B_1) \cdot P(A \mid B_1)}{P(B_1) \cdot P(A \mid B_1) + P(B_2) \cdot P(A \mid B_2)}\]

Example 16: The Medical Test Problem

Problem: A disease affects 1% of a population. A test for the disease is 95% accurate: if you have the disease, it is positive 95% of the time; if you don’t have the disease, it is negative 95% of the time (so 5% false positive rate). If a person tests positive, what is the probability they actually have the disease?

Solution: Let us define our events:

  • \(D\) = “person has the disease” → \(P(D) = 0.01\)
  • \(\overline{D}\) = “person does not have the disease” → \(P(\overline{D}) = 0.99\)
  • \(+\) = “test is positive”
  • \(P(+ \mid D) = 0.95\) (sensitivity)
  • \(P(+ \mid \overline{D}) = 0.05\) (false positive rate)

We want \(P(D \mid +)\). By Bayes’ Theorem:

\[P(D \mid +) = \frac{P(D) \cdot P(+ \mid D)}{P(D) \cdot P(+ \mid D) + P(\overline{D}) \cdot P(+ \mid \overline{D})}\]
\[= \frac{0.01 \times 0.95}{0.01 \times 0.95 + 0.99 \times 0.05} = \frac{0.0095}{0.0095 + 0.0495} = \frac{0.0095}{0.059} \approx 0.161\]

So even with a 95% accurate test, a positive result means only about 16.1% chance of actually having the disease!

Why so low? Because the disease is rare (only 1%). The false positives (0.99 × 0.05 = 0.0495) actually OUTNUMBER the true positives (0.01 × 0.95 = 0.0095). This is a very important lesson in medical testing and data analysis.
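
The medical-test arithmetic as a Python sketch (the variable names are ours):

```python
# Bayes' theorem: P(D | +) = P(D) * P(+|D) / P(+)
prior = 0.01        # P(D): disease prevalence
sensitivity = 0.95  # P(+ | D)
false_pos = 0.05    # P(+ | not D)

# Theorem of total probability gives the denominator P(+)
p_positive = prior * sensitivity + (1 - prior) * false_pos

p_disease_given_pos = prior * sensitivity / p_positive
print(round(p_disease_given_pos, 3))  # about 0.161
```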

Example 17: Bayes’ Theorem with Three Groups

Problem: A factory has 3 machines producing items. Machine 1 produces 50% of items with 2% defect rate. Machine 2 produces 30% with 3% defect rate. Machine 3 produces 20% with 5% defect rate. An item is found to be defective. What is the probability it came from Machine 3?

Solution:

  • \(P(M_1) = 0.50\), \(P(D \mid M_1) = 0.02\)
  • \(P(M_2) = 0.30\), \(P(D \mid M_2) = 0.03\)
  • \(P(M_3) = 0.20\), \(P(D \mid M_3) = 0.05\)

Numerator: \(P(M_3) \cdot P(D \mid M_3) = 0.20 \times 0.05 = 0.010\)

Denominator (total probability of defect):

\[0.50 \times 0.02 + 0.30 \times 0.03 + 0.20 \times 0.05 = 0.010 + 0.009 + 0.010 = 0.029\]
\[P(M_3 \mid D) = \frac{0.010}{0.029} \approx 0.345\]

So about 34.5% chance the defective item came from Machine 3, even though Machine 3 produces only 20% of items. This makes sense because Machine 3 has the highest defect rate.
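
The same computation generalizes to any partition. A small Python helper (the function name `bayes_posterior` is ours, not a standard API):

```python
# Bayes' theorem over a partition B_1..B_n:
# P(B_k | A) = P(B_k) P(A|B_k) / sum_i P(B_i) P(A|B_i)
def bayes_posterior(priors, likelihoods, k):
    total = sum(p * l for p, l in zip(priors, likelihoods))  # P(A)
    return priors[k] * likelihoods[k] / total

machine_share = [0.50, 0.30, 0.20]  # P(M1), P(M2), P(M3)
defect_rate = [0.02, 0.03, 0.05]    # P(D | M_i)

print(round(bayes_posterior(machine_share, defect_rate, 2), 3))  # about 0.345
```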

Question 9: In a town, 60% of people favor Team A and 40% favor Team B. 80% of Team A fans watch the match on TV, while 50% of Team B fans watch on TV. If a randomly selected person watches the match on TV, what is the probability they favor Team A?

Let \(A\) = “favors Team A”, \(B\) = “favors Team B”, \(T\) = “watches on TV.”

\(P(A) = 0.60\), \(P(T \mid A) = 0.80\)

\(P(B) = 0.40\), \(P(T \mid B) = 0.50\)

\[P(A \mid T) = \frac{0.60 \times 0.80}{0.60 \times 0.80 + 0.40 \times 0.50} = \frac{0.48}{0.48 + 0.20} = \frac{0.48}{0.68} \approx 0.706\]

About 70.6% chance. Team A fans are more likely to watch on TV, so knowing someone watches shifts the probability in favor of Team A.

Key Points — Bayes’ Theorem:
  • Bayes’ Theorem finds “reverse” conditional probability: \(P(B_k \mid A)\)
  • It uses the Theorem of Total Probability in the denominator
  • Numerator = probability of the specific cause times likelihood of evidence
  • Denominator = total probability of the evidence (sum over all causes)
  • Be very careful with the difference between \(P(A \mid B)\) and \(P(B \mid A)\)
  • Medical test problems are classic Bayes’ Theorem applications

Part 6: Expected Value and Variance

What is Expected Value?

The expected value (also called expectation or mean) of a random variable is the average value we would get if we repeated the experiment many times. It is like a “weighted average” of all possible values, where the weights are the probabilities.

Definition: If \(X\) is a random variable that takes values \(x_1, x_2, \ldots, x_n\) with probabilities \(p_1, p_2, \ldots, p_n\), then the expected value of \(X\) is:

\[E(X) = \sum_{i=1}^{n} x_i \cdot p_i = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n\]

Example 18: Expected Value of a Die Roll

When we roll a fair die, each value 1 through 6 has probability \(\frac{1}{6}\).

\[E(X) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = \frac{21}{6} = 3.5\]

Notice: The expected value is 3.5, even though no single die roll can give 3.5! Expected value is NOT necessarily one of the possible outcomes. It is the long-run average — if you roll a die thousands of times, the average will be very close to 3.5.

Example 19: Expected Value of a Lottery

Problem: A lottery ticket costs 10 Birr. There is a 1% chance to win 500 Birr and a 0.1% chance to win 5000 Birr. Otherwise, you win nothing. Find the expected winnings. Is the lottery fair?

Solution: The random variable \(X\) = amount won.

\[E(X) = 0 \times 0.989 + 500 \times 0.01 + 5000 \times 0.001 = 0 + 5 + 5 = 10 \text{ Birr}\]

The expected winnings equal the ticket price (10 Birr), so the lottery is fair — in the long run, you neither gain nor lose money on average.

Properties of Expected Value

Properties:
(1) \(E(c) = c\) for any constant \(c\)
(2) \(E(cX) = c \cdot E(X)\) for any constant \(c\)
(3) \(E(X + Y) = E(X) + E(Y)\) for any random variables \(X\) and \(Y\)
(4) If \(X\) and \(Y\) are independent, \(E(XY) = E(X) \cdot E(Y)\)

Variance

The variance measures how spread out the values of a random variable are. A small variance means values cluster near the expected value; a large variance means they are more spread out.

Definition: The variance of \(X\) is:

\[\text{Var}(X) = E[(X - E(X))^2] = \sum_{i} (x_i - \mu)^2 \cdot p_i\]

where \(\mu = E(X)\). An equivalent (often easier) formula is:

\[\text{Var}(X) = E(X^2) - [E(X)]^2\]

The standard deviation is the square root of the variance: \(\sigma = \sqrt{\text{Var}(X)}\).

Example 20: Variance of a Die Roll

We found \(E(X) = 3.5\). Now find \(E(X^2)\):

\[E(X^2) = 1^2 \cdot \frac{1}{6} + 2^2 \cdot \frac{1}{6} + 3^2 \cdot \frac{1}{6} + 4^2 \cdot \frac{1}{6} + 5^2 \cdot \frac{1}{6} + 6^2 \cdot \frac{1}{6} = \frac{91}{6}\]
\[\text{Var}(X) = \frac{91}{6} - (3.5)^2 = \frac{91}{6} - \frac{49}{4} = \frac{182 - 147}{12} = \frac{35}{12} \approx 2.917\]
\[\sigma = \sqrt{\frac{35}{12}} \approx 1.708\]
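
Both die-roll moments come out exactly with rational arithmetic (Python sketch):

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: each face equally likely

mean = sum(x * p for x in faces)                     # E(X)
variance = sum(x * x * p for x in faces) - mean**2   # E(X^2) - E(X)^2

print(mean, variance)  # 7/2 35/12
```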

Question 10: A game costs 5 Birr to play. You roll a die: if you get 6, you win 25 Birr; otherwise, you win nothing. Find the expected net gain (winnings minus cost). Should you play this game?

Let \(X\) = net gain = winnings − cost.

If you roll 6: net gain = 25 − 5 = 20 Birr (probability \(\frac{1}{6}\))

If you don’t roll 6: net gain = 0 − 5 = −5 Birr (probability \(\frac{5}{6}\))

\[E(X) = 20 \times \frac{1}{6} + (-5) \times \frac{5}{6} = \frac{20}{6} - \frac{25}{6} = \frac{-5}{6} \approx -0.83 \text{ Birr}\]

The expected net gain is negative (about −0.83 Birr per game). On average, you lose about 83 cents each time you play. You should not play this game — it is unfavorable to the player.

Question 11: A random variable \(X\) takes values 0, 1, 2 with probabilities \(\frac{1}{4}\), \(\frac{1}{2}\), and \(\frac{1}{4}\) respectively. Find \(E(X)\) and \(\text{Var}(X)\).

\[E(X) = 0 \times \frac{1}{4} + 1 \times \frac{1}{2} + 2 \times \frac{1}{4} = 0 + \frac{1}{2} + \frac{1}{2} = 1\]
\[E(X^2) = 0^2 \times \frac{1}{4} + 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{4} = 0 + \frac{1}{2} + 1 = \frac{3}{2}\]
\[\text{Var}(X) = E(X^2) - [E(X)]^2 = \frac{3}{2} - 1 = \frac{1}{2}\]

\(\sigma = \sqrt{1/2} \approx 0.707\)

Key Points — Expected Value and Variance:
  • \(E(X) = \sum x_i p_i\) — weighted average of all values
  • Expected value may not be a possible outcome (e.g., 3.5 for a die)
  • \(\text{Var}(X) = E(X^2) - [E(X)]^2\) — use this formula!
  • Variance is always non-negative: \(\text{Var}(X) \geq 0\)
  • Standard deviation = \(\sqrt{\text{Var}(X)}\) — same units as \(X\)
  • For decision-making: if \(E(\text{gain}) < 0\), don't play!

Part 7: Bernoulli Trials and Binomial Distribution

What are Bernoulli Trials?

A Bernoulli trial is an experiment with exactly two outcomes: “success” (with probability \(p\)) and “failure” (with probability \(q = 1 – p\)).

When we repeat Bernoulli trials \(n\) times independently, we get independent repeated trials. The number of successes in \(n\) trials follows the binomial distribution.

Binomial Distribution: If we perform \(n\) independent Bernoulli trials with success probability \(p\), then the probability of exactly \(k\) successes is:

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]

Example 21: Coin Flips

Problem: A fair coin is flipped 10 times. What is the probability of getting exactly 6 heads?

Solution: Here \(n = 10\), \(k = 6\), \(p = 0.5\).

\[P(X = 6) = \binom{10}{6} (0.5)^6 (0.5)^4 = \binom{10}{6} (0.5)^{10} = 210 \times \frac{1}{1024} = \frac{210}{1024} \approx 0.205\]

Expected Value and Variance of Binomial Distribution

For a binomial distribution \(B(n, p)\):

\[E(X) = np \qquad \text{Var}(X) = np(1-p)\]

Example 22: For 10 flips of a fair coin: \(E(X) = 10 \times 0.5 = 5\) and \(\text{Var}(X) = 10 \times 0.5 \times 0.5 = 2.5\).

Question 12: A biased die has probability 0.3 of landing on 6. If the die is rolled 8 times, find the probability of getting exactly three 6s. Also find the expected number of 6s.

\(n = 8\), \(k = 3\), \(p = 0.3\)

\[P(X = 3) = \binom{8}{3} (0.3)^3 (0.7)^5 = 56 \times 0.027 \times 0.16807 = 56 \times 0.004538 = 0.254\]

Expected number of 6s: \(E(X) = np = 8 \times 0.3 = 2.4\)

So on average, we expect 2.4 sixes in 8 rolls (even though we can only get an integer number of sixes).
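
Both binomial examples can be checked with Python’s `math.comb` (the helper name `binom_pmf` is ours):

```python
from math import comb

# Binomial pmf: P(X = k) in n independent trials with success prob p
def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(10, 6, 0.5), 3))  # about 0.205 (Example 21)
print(round(binom_pmf(8, 3, 0.3), 3))   # about 0.254 (Question 12)
print(8 * 0.3)                          # E(X) = np = 2.4 expected sixes
```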

Revision Summary: Discrete Probability

1. Sample Space and Events

  • Sample space \(S\) = set of all possible outcomes
  • Event = subset of \(S\)
  • \(n\) coin flips → \(2^n\) outcomes
  • \(n\) dice rolls → \(6^n\) outcomes
  • Complement: \(\overline{A}\), Union: \(A \cup B\), Intersection: \(A \cap B\)

2. Probability Rules

\[\begin{aligned} &P(\overline{A}) = 1 - P(A) && \text{(Complement Rule)} \\ &P(A \cup B) = P(A) + P(B) - P(A \cap B) && \text{(Addition Rule)} \\ &P(A \cup B) = P(A) + P(B) && \text{(if mutually exclusive)} \\ &0 \leq P(A) \leq 1 \quad \text{and} \quad P(S) = 1 && \text{(Axioms)} \end{aligned}\]

3. Conditional Probability

\[P(A \mid B) = \frac{P(A \cap B)}{P(B)}\]
\[P(A \cap B) = P(A) \cdot P(B \mid A) \quad \text{(Multiplication Rule)}\]

4. Independence

\[A \text{ and } B \text{ independent} \iff P(A \cap B) = P(A) \cdot P(B)\]
  • Independent ≠ Mutually exclusive
  • With replacement → independent
  • Without replacement → dependent

5. Bayes’ Theorem

\[P(B_k \mid A) = \frac{P(B_k) \cdot P(A \mid B_k)}{\sum_{i=1}^{n} P(B_i) \cdot P(A \mid B_i)}\]

Use when you need to find \(P(\text{cause} \mid \text{effect})\)

6. Expected Value and Variance

\[E(X) = \sum x_i p_i \qquad \text{Var}(X) = E(X^2) - [E(X)]^2\]
\[E(cX) = cE(X) \qquad E(X+Y) = E(X) + E(Y) \qquad \text{Var}(cX) = c^2 \text{Var}(X)\]

7. Binomial Distribution

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \qquad E(X) = np \qquad \text{Var}(X) = np(1-p)\]

Problem-Solving Strategy

  1. Identify the experiment — what is being done?
  2. List the sample space — what are ALL possible outcomes?
  3. Identify the event — which outcomes are we interested in?
  4. Check if outcomes are equally likely — use Laplace’s rule or general axioms
  5. Choose the right formula — addition? complement? conditional? Bayes?
  6. “At least one” → use complement rule
  7. “Given that” → use conditional probability
  8. “Reverse cause” → use Bayes’ Theorem
  9. Repeated trials → use binomial distribution

Mini Exam

Q1: A jar has 4 red, 3 green, and 5 blue marbles. One marble is drawn at random. Find the probability it is (a) red, (b) not blue, (c) red or green.

Total marbles = 12.

(a) \(P(\text{red}) = \frac{4}{12} = \frac{1}{3}\)

(b) \(P(\text{not blue}) = 1 - \frac{5}{12} = \frac{7}{12}\)

(c) \(P(\text{red or green}) = \frac{4+3}{12} = \frac{7}{12}\)

Q2: Two cards are drawn without replacement from a standard deck. Find the probability that both are spades.

\[P = \frac{13}{52} \times \frac{12}{51} = \frac{156}{2652} = \frac{1}{17}\]
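Exact fractions avoid any rounding worries in problems like this. A quick check with Python's `fractions` module:

```python
from fractions import Fraction

# P(both spades) without replacement: 13/52 for the first card, 12/51 for the second
p_both = Fraction(13, 52) * Fraction(12, 51)
print(p_both)  # 1/17
```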

Q3: Events A and B are independent with \(P(A) = 0.4\) and \(P(B) = 0.7\). Find \(P(A \cup B)\).

\[P(A \cap B) = 0.4 \times 0.7 = 0.28\]
\[P(A \cup B) = 0.4 + 0.7 – 0.28 = 0.82\]

Challenge Exam Questions — Mixed Types

These questions are at the level of midterm and final exams. Try each one before revealing the answer!

Multiple Choice Questions

MCQ 1: If \(P(A) = 0.6\), \(P(B) = 0.4\), and \(P(A \cap B) = 0.2\), then \(P(A \mid B)\) equals:
(a) 0.5   (b) 0.33   (c) 0.8   (d) 0.25

Answer: (a) 0.5

\[P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.4} = 0.5\]

MCQ 2: If \(A\) and \(B\) are mutually exclusive events with \(P(A) = 0.3\) and \(P(B) = 0.5\), then \(P(A \cap B)\) equals:
(a) 0.15   (b) 0.8   (c) 0   (d) 0.2

Answer: (c) 0

Mutually exclusive means \(A \cap B = \emptyset\), so \(P(A \cap B) = 0\).

MCQ 3: A random variable \(X\) has \(E(X) = 3\) and \(\text{Var}(X) = 4\). Then \(E(2X + 1)\) and \(\text{Var}(2X + 1)\) are:
(a) 7 and 8   (b) 7 and 16   (c) 6 and 9   (d) 7 and 9

Answer: (b) 7 and 16

\(E(2X + 1) = 2E(X) + 1 = 2(3) + 1 = 7\)

\(\text{Var}(2X + 1) = 2^2 \cdot \text{Var}(X) = 4 \times 4 = 16\) (adding a constant doesn’t change variance)

MCQ 4: In a binomial distribution \(B(n, p)\), if \(n = 20\) and \(p = 0.3\), the expected value is:
(a) 6   (b) 14   (c) 4.2   (d) 10

Answer: (a) 6

\[E(X) = np = 20 \times 0.3 = 6\]

MCQ 5: The probability that a student passes a test is \(\frac{2}{3}\). If 4 students take the test, the probability that exactly 2 pass is:
(a) \(\frac{8}{27}\)   (b) \(\frac{24}{81}\)   (c) \(\frac{8}{81}\)   (d) \(\frac{16}{81}\)

Answer: (a) and (b) are both correct, since \(\frac{24}{81} = \frac{8}{27}\) — the question is flawed as written, because two options name the same value.

\[P(X = 2) = \binom{4}{2} \left(\frac{2}{3}\right)^2 \left(\frac{1}{3}\right)^2 = 6 \times \frac{4}{9} \times \frac{1}{9} = \frac{24}{81} = \frac{8}{27}\]

MCQ 6: If \(P(A) = \frac{1}{3}\), \(P(B) = \frac{1}{4}\), and \(A\) and \(B\) are independent, then \(P(\overline{A} \cap \overline{B})\) equals:
(a) \(\frac{1}{2}\)   (b) \(\frac{7}{12}\)   (c) \(\frac{5}{12}\)   (d) \(\frac{1}{4}\)

Answer: (a) \(\frac{1}{2}\)

By De Morgan’s law: \(\overline{A} \cap \overline{B} = \overline{A \cup B}\). Since \(A\) and \(B\) are independent, \(P(A \cap B) = \frac{1}{3} \times \frac{1}{4} = \frac{1}{12}\).

\[P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{3} + \frac{1}{4} - \frac{1}{12} = \frac{4 + 3 - 1}{12} = \frac{6}{12} = \frac{1}{2}\]
\[P(\overline{A} \cap \overline{B}) = 1 - P(A \cup B) = 1 - \frac{1}{2} = \frac{1}{2}\]

Equivalently, since independence of \(A\) and \(B\) implies independence of \(\overline{A}\) and \(\overline{B}\): \(P(\overline{A}) \cdot P(\overline{B}) = \frac{2}{3} \times \frac{3}{4} = \frac{1}{2}\).
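Exact fractions make it easy to confirm this result (a small sketch; the variable names are ours):

```python
from fractions import Fraction

pA, pB = Fraction(1, 3), Fraction(1, 4)
p_union = pA + pB - pA * pB        # independence gives P(A ∩ B) = P(A)·P(B)
p_neither = 1 - p_union            # De Morgan: complement of the union
print(p_neither)  # 1/2
```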

Short Answer and Proof Questions

Q7: Prove that for any events \(A\) and \(B\), \(P(A \cap B) + P(A \cap \overline{B}) = P(A)\).

Note that \(B\) and \(\overline{B}\) are mutually exclusive and \(B \cup \overline{B} = S\).

Therefore: \(A = A \cap S = A \cap (B \cup \overline{B}) = (A \cap B) \cup (A \cap \overline{B})\)

Since \(A \cap B\) and \(A \cap \overline{B}\) are mutually exclusive (because \(B\) and \(\overline{B}\) are):

\[P(A) = P((A \cap B) \cup (A \cap \overline{B})) = P(A \cap B) + P(A \cap \overline{B})\]

✓ This result is sometimes called the Law of Total Probability for two events.

Q8: A box contains 10 items, of which 3 are defective. Items are drawn one by one without replacement until the first defective is found. What is the probability that the first defective is found on the third draw?

For the first defective to be on the third draw, we need: first draw = good, second draw = good, third draw = defective.

\[P = \frac{7}{10} \times \frac{6}{9} \times \frac{3}{8} = \frac{126}{720} = \frac{7}{40} = 0.175\]

After drawing 2 good items (from 7 good), there are 8 items left, 3 of which are defective.
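The product of conditional probabilities can be checked exactly (a minimal sketch with `fractions`):

```python
from fractions import Fraction

# good, good, then defective — drawn without replacement from 7 good + 3 defective
p = Fraction(7, 10) * Fraction(6, 9) * Fraction(3, 8)
print(p, float(p))  # 7/40 0.175
```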

Q9: Prove that for any events \(A\) and \(B\): \(P(A \cup B) = P(A) + P(\overline{A} \cap B)\).

Write \(B = (A \cap B) \cup (\overline{A} \cap B)\). These two parts are mutually exclusive, so:

\[P(B) = P(A \cap B) + P(\overline{A} \cap B)\]

Therefore: \(P(\overline{A} \cap B) = P(B) - P(A \cap B)\)

Now substitute into \(P(A) + P(\overline{A} \cap B)\):

\[P(A) + P(B) - P(A \cap B) = P(A \cup B)\]

✓ This is just the general addition rule in disguise!

Q10: A student knows 80% of the material. On a 10-question true/false exam, if the student answers correctly when they know the material, and guesses (50% chance) when they don’t, what is the probability they get exactly 8 correct answers?

This requires the Law of Total Probability combined with the Binomial Distribution.

For each question, the probability of a correct answer is:

\[p = P(\text{correct}) = P(\text{knows}) \times 1 + P(\text{doesn’t know}) \times 0.5 = 0.80 \times 1 + 0.20 \times 0.5 = 0.90\]

Now use the binomial distribution with \(n = 10\), \(k = 8\), \(p = 0.9\):

\[P(X = 8) = \binom{10}{8} (0.9)^8 (0.1)^2 = 45 \times 0.43047 \times 0.01 \approx 0.1937\]

About 19.4% chance of getting exactly 8 correct.
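The two-stage reasoning (total probability per question, then binomial over the exam) is a few lines of Python (a sketch; variable names are ours):

```python
from math import comb

# Per-question success probability by the law of total probability
p_correct = 0.80 * 1 + 0.20 * 0.5            # = 0.9
# Binomial over the 10 questions
prob_8 = comb(10, 8) * p_correct**8 * (1 - p_correct)**2
print(round(prob_8, 4))                      # ≈ 0.1937
```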

Q11: Three machines A, B, C produce items with proportions 20%, 30%, and 50% respectively. Their defect rates are 2%, 3%, and 1%. An item is randomly selected and found to be defective. Find the probability it was produced by machine B.

Using Bayes’ Theorem:

\(P(A) = 0.20\), \(P(D \mid A) = 0.02\)

\(P(B) = 0.30\), \(P(D \mid B) = 0.03\)

\(P(C) = 0.50\), \(P(D \mid C) = 0.01\)

Numerator for B: \(0.30 \times 0.03 = 0.009\)

Total probability of defect:

\[0.20 \times 0.02 + 0.30 \times 0.03 + 0.50 \times 0.01 = 0.004 + 0.009 + 0.005 = 0.018\]
\[P(B \mid D) = \frac{0.009}{0.018} = 0.5\]

There is a 50% chance the defective item came from machine B, even though B only produces 30% of items. This is because B has the highest defect rate.
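Bayes’ Theorem over the three machines can be written directly as a dictionary comprehension (a minimal sketch; the dictionary names are ours):

```python
priors = {"A": 0.20, "B": 0.30, "C": 0.50}   # production shares
defect = {"A": 0.02, "B": 0.03, "C": 0.01}   # defect rates

# Denominator: total probability of a defect
p_defect = sum(priors[m] * defect[m] for m in priors)
# Bayes: posterior probability of each machine given a defect
posterior = {m: priors[m] * defect[m] / p_defect for m in priors}
print(round(p_defect, 3), round(posterior["B"], 3))  # ≈ 0.018 and ≈ 0.5
```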

Q12: Prove Chebyshev’s Inequality for a random variable \(X\) with mean \(\mu\) and variance \(\sigma^2\): For any \(k > 0\), \(P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}\).

Let \(Y = (X - \mu)^2\). Then \(E(Y) = \text{Var}(X) = \sigma^2\).

The event \(|X - \mu| \geq k\sigma\) is equivalent to \((X - \mu)^2 \geq k^2\sigma^2\), i.e., \(Y \geq k^2\sigma^2\).

By Markov’s Inequality (for non-negative random variables):

\[P(Y \geq a) \leq \frac{E(Y)}{a}\]

Setting \(a = k^2\sigma^2\):

\[P(Y \geq k^2\sigma^2) \leq \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}\]

Therefore \(P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}\). ✓

Q13: A fair coin is flipped until the first head appears. Find the expected number of flips. Also find the variance.

Let \(X\) = number of flips until first head. This is a geometric distribution with \(p = 0.5\).

\(P(X = k) = (1-p)^{k-1} \cdot p = (0.5)^{k-1} \cdot (0.5) = (0.5)^k\) for \(k = 1, 2, 3, \ldots\)

\[E(X) = \sum_{k=1}^{\infty} k \cdot (0.5)^k\]

Using the identity \(\sum_{k=1}^{\infty} k r^k = \frac{r}{(1-r)^2}\) for \(|r| < 1\) (obtained by differentiating the geometric series \(\sum_{k=0}^{\infty} r^k = \frac{1}{1-r}\)):

\[E(X) = \frac{0.5}{(1-0.5)^2} = \frac{0.5}{0.25} = 2\]

For a geometric distribution: \(E(X) = \frac{1}{p} = \frac{1}{0.5} = 2\) and \(\text{Var}(X) = \frac{1-p}{p^2} = \frac{0.5}{0.25} = 2\).

So we expect 2 flips on average, with variance 2.
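You can see both infinite sums converge by truncating them numerically — the tail beyond \(k = 200\) is astronomically small (a quick sketch; variable names are ours):

```python
# Truncated sums for the geometric distribution with p = 0.5
e_x = sum(k * 0.5**k for k in range(1, 201))        # E(X)
e_x2 = sum(k**2 * 0.5**k for k in range(1, 201))    # E(X^2)
var = e_x2 - e_x**2                                 # Var(X) = E(X^2) - [E(X)]^2
print(round(e_x, 6), round(var, 6))                 # both ≈ 2.0
```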

Q14: Two friends A and B agree to meet at a coffee shop between 12:00 and 13:00. Each arrives independently at a random time within this hour. What is the probability they meet, if each waits at most 15 minutes for the other?

Let \(x\) = arrival time of A (in minutes after 12:00) and \(y\) = arrival time of B. Both \(x\) and \(y\) are uniformly distributed on \([0, 60]\).

They meet if \(|x – y| \leq 15\).

The sample space is a \(60 \times 60\) square with area 3600.

The “meeting region” is the square minus two triangles in the corners where \(|x - y| > 15\). Each triangle has legs of length \(60 - 15 = 45\).

\[\text{Area of each triangle} = \frac{1}{2} \times 45 \times 45 = 1012.5\]
\[\text{Meeting area} = 3600 – 2 \times 1012.5 = 1575\]
\[P(\text{meet}) = \frac{1575}{3600} = \frac{7}{16} = 0.4375\]

About 43.75% chance they meet.
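The area argument can be double-checked with a Monte Carlo simulation (a sketch with a fixed seed so the run is reproducible; variable names are ours):

```python
import random

# Analytic answer via the area argument
exact = 1 - (45 / 60) ** 2          # 0.4375

# Monte Carlo: simulate two independent uniform arrival times
random.seed(42)
trials = 200_000
meets = sum(abs(random.uniform(0, 60) - random.uniform(0, 60)) <= 15
            for _ in range(trials))
print(exact, meets / trials)        # the estimate lands close to 0.4375
```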

Q15: A pair of fair dice is rolled. Let \(X\) be the sum of the two numbers. Find the probability distribution of \(X\), and verify that \(\sum P(X = k) = 1\).

\(k\) (sum) | Ways | \(P(X = k)\)
2 | 1 | 1/36
3 | 2 | 2/36
4 | 3 | 3/36
5 | 4 | 4/36
6 | 5 | 5/36
7 | 6 | 6/36
8 | 5 | 5/36
9 | 4 | 4/36
10 | 3 | 3/36
11 | 2 | 2/36
12 | 1 | 1/36

Verification: \(\frac{1+2+3+4+5+6+5+4+3+2+1}{36} = \frac{36}{36} = 1\) ✓
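The whole distribution can be built by brute force over the 36 ordered pairs (a minimal sketch using exact fractions):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many ordered pairs (a, b) give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {k: Fraction(c, 36) for k, c in sorted(counts.items())}
print(dist[2], dist[7], sum(dist.values()))  # 1/36 1/6 1
```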

Q16: If \(P(A) = 0.6\), \(P(B \mid A) = 0.4\), and \(P(B \mid \overline{A}) = 0.7\), find \(P(B)\) and \(P(A \mid B)\).

By the Theorem of Total Probability:

\[P(B) = P(A) \cdot P(B \mid A) + P(\overline{A}) \cdot P(B \mid \overline{A}) = 0.6 \times 0.4 + 0.4 \times 0.7 = 0.24 + 0.28 = 0.52\]

By Bayes’ Theorem:

\[P(A \mid B) = \frac{P(A) \cdot P(B \mid A)}{P(B)} = \frac{0.6 \times 0.4}{0.52} = \frac{0.24}{0.52} = \frac{6}{13} \approx 0.4615\]
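Both steps of Q16 — total probability, then Bayes — translate directly into code (a sketch; variable names are ours):

```python
pA, pB_given_A, pB_given_notA = 0.6, 0.4, 0.7

pB = pA * pB_given_A + (1 - pA) * pB_given_notA   # theorem of total probability
pA_given_B = pA * pB_given_A / pB                 # Bayes' theorem
print(round(pB, 2), round(pA_given_B, 4))         # ≈ 0.52 and ≈ 0.4615
```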

Q17: A coin is biased so that \(P(H) = \frac{2}{3}\) and \(P(T) = \frac{1}{3}\). The coin is flipped 6 times. Find the probability of getting more heads than tails.

“More heads than tails” means 4, 5, or 6 heads (since with 3 heads and 3 tails, they are equal).

\[P(X = 4) = \binom{6}{4}\left(\frac{2}{3}\right)^4\left(\frac{1}{3}\right)^2 = 15 \times \frac{16}{81} \times \frac{1}{9} = \frac{240}{729}\]
\[P(X = 5) = \binom{6}{5}\left(\frac{2}{3}\right)^5\left(\frac{1}{3}\right)^1 = 6 \times \frac{32}{243} \times \frac{1}{3} = \frac{192}{729}\]
\[P(X = 6) = \binom{6}{6}\left(\frac{2}{3}\right)^6 = 1 \times \frac{64}{729} = \frac{64}{729}\]
\[P(\text{more heads}) = \frac{240 + 192 + 64}{729} = \frac{496}{729} \approx 0.680\]
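Summing the three binomial terms with exact fractions confirms the answer (a minimal sketch using `math.comb` and `fractions`):

```python
from fractions import Fraction
from math import comb

p = Fraction(2, 3)
# P(more heads than tails in 6 flips) = P(X = 4) + P(X = 5) + P(X = 6)
prob = sum(comb(6, k) * p**k * (1 - p)**(6 - k) for k in range(4, 7))
print(prob)  # 496/729
```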

Q18: Prove the Law of Large Numbers intuitively: Explain why, as the number of trials \(n\) increases, the relative frequency of success approaches the theoretical probability \(p\).

Let \(X\) be the number of successes in \(n\) Bernoulli trials with probability \(p\). Then \(E(X) = np\) and \(\text{Var}(X) = np(1-p)\).

The relative frequency is \(\frac{X}{n}\), with:

\[E\left(\frac{X}{n}\right) = \frac{np}{n} = p \qquad \text{Var}\left(\frac{X}{n}\right) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}\]

As \(n \to \infty\), the variance \(\frac{p(1-p)}{n} \to 0\). This means the values of \(\frac{X}{n}\) cluster more and more tightly around the mean \(p\). By Chebyshev’s inequality, for any \(\epsilon > 0\):

\[P\left(\left|\frac{X}{n} - p\right| \geq \epsilon\right) \leq \frac{\text{Var}(X/n)}{\epsilon^2} = \frac{p(1-p)}{n\epsilon^2} \to 0 \text{ as } n \to \infty\]

This proves that the relative frequency converges to the theoretical probability. ✓
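You can watch the Law of Large Numbers in action with a short simulation — as \(n\) grows, the relative frequency settles toward \(p\) (a sketch with a fixed seed for reproducibility):

```python
import random

random.seed(0)
p = 0.3
for n in [100, 10_000, 1_000_000]:
    successes = sum(random.random() < p for _ in range(n))
    print(n, successes / n)   # relative frequency drifts toward 0.3
```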
