Statistics: Detailed Notes & Exam Questions | Grade 12 Mathematics Unit 3

1. What is Statistics?

Dear student, welcome to Unit 3 of your Grade 12 Mathematics! In this unit, we study Statistics — the branch of mathematics that deals with collecting, organizing, summarizing, and analyzing data.

Think about it: when you hear that “the average score of Grade 12 students in mathematics is 65,” someone has collected data from many students, calculated a summary number, and presented it. That is statistics in action!

In Grade 12, we focus on two big areas:

  • Measures of Central Tendency — Mean, Median, Mode (where is the center of the data?)
  • Measures of Dispersion (Spread) — Range, Variance, Standard Deviation, Coefficient of Variation (how spread out is the data?)

We also study grouped data (data organized in frequency tables) and learn how to draw and interpret ogive curves (cumulative frequency curves).

Are you ready? Let’s go step by step!

2. Types of Data

Before we calculate anything, we need to understand our data:

2.1 Ungrouped (Raw) Data

This is data listed individually, like: 12, 15, 18, 20, 15, 22, 19

Each value is shown separately. We can work with it directly.

2.2 Grouped Data

When we have a lot of data, we organize it into a frequency distribution table with class intervals. For example:

Class Interval | Frequency (\(f\))
10 – 19 | 4
20 – 29 | 7
30 – 39 | 10
40 – 49 | 5

2.3 Key Terms for Grouped Data

  • Class interval: The range of values in a group (e.g., 10–19)
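  • Class width (size): Upper boundary − Lower boundary. For 10–19 with boundaries 9.5 and 19.5: \( 19.5 - 9.5 = 10 \). Subtracting the limits instead (\( 19 - 10 = 9 \)) understates the width when classes have gaps.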
  • Class midpoint (mark): \( \frac{\text{Upper} + \text{Lower}}{2} \). For 10–19: \( \frac{10+19}{2} = 14.5 \)
  • Frequency (\(f\)): How many values fall in that class
  • Cumulative frequency: Running total of frequencies
  • Class boundary: Used for continuous data to close gaps. If classes are 10–19, 20–29, boundaries are 9.5, 19.5, 29.5

Why do we use class boundaries? Because if one class ends at 19 and the next starts at 20, where does 19.5 go? Class boundaries remove this confusion.

Important: Class boundary = Lower limit − 0.5 and Upper limit + 0.5 (when data is in whole numbers and classes have gaps of 1). If classes are 10–19, 20–29: lower boundary of first class = 9.5, upper boundary = 19.5.
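The boundary and midpoint rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the `gap` parameter are made up for this example, and it assumes whole-number classes where consecutive limits differ by 1.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary) for one class.

    `gap` is the distance between one class's upper limit and the next
    class's lower limit (1 for classes such as 10-19, 20-29).
    """
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


def class_midpoint(lower_limit, upper_limit):
    """Class mark: the average of the two class limits."""
    return (lower_limit + upper_limit) / 2


low, high = class_boundaries(10, 19)
print(low, high)               # 9.5 19.5
print(high - low)              # 10.0 (class width measured from boundaries)
print(class_midpoint(10, 19))  # 14.5
```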

3. Measures of Central Tendency — Mean

The mean (arithmetic average) is the most common measure of central tendency. Let’s learn it for both ungrouped and grouped data.

3.1 Mean of Ungrouped Data

\[ \bar{x} = \frac{\sum x}{n} \]
Where \( \bar{x} \) = mean, \( \sum x \) = sum of all values, \( n \) = number of values.

Worked Example 1: Five students scored: 72, 85, 90, 68, 80 in a mathematics test. Find the mean score.

Solution:

\[ \bar{x} = \frac{72 + 85 + 90 + 68 + 80}{5} = \frac{395}{5} = 79 \]

The mean score is 79.

3.2 Mean of Grouped Data

For grouped data, we use the class midpoints \( x_i \) and frequencies \( f_i \):

\[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n} \]
Where \( x_i \) = midpoint of class \( i \), \( f_i \) = frequency of class \( i \), \( n = \sum f_i \).

Worked Example 2: Find the mean from the following frequency distribution:

Class | Frequency (\(f\)) | Midpoint (\(x\)) | \(f \cdot x\)
10 – 19 | 4 | 14.5 | 58
20 – 29 | 7 | 24.5 | 171.5
30 – 39 | 10 | 34.5 | 345
40 – 49 | 5 | 44.5 | 222.5
Total | \(n = 26\) | | \(\sum fx = 797\)
\[ \bar{x} = \frac{797}{26} \approx 30.65 \]

The mean is approximately 30.65.
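If you want to check a table like this by computer, a minimal Python sketch follows. The function name `grouped_mean` is hypothetical, and each class is given as a pair of limits so that the midpoints are computed the same way as in the table.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return total_fx / sum(freqs)


# Data from Worked Example 2
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [4, 7, 10, 5]
print(round(grouped_mean(classes, freqs), 2))  # 30.65
```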

3.3 Mean Using Assumed Mean (Shortcut Method)

When midpoints are large numbers, we can use a shortcut to reduce calculation errors:

\[ \bar{x} = A + \frac{\sum f_i d_i}{n} \]
Where \( A \) = assumed mean (usually the midpoint of the class with highest frequency), \( d_i = x_i – A \).

Worked Example 3: Using the same data as Example 2, find the mean using the assumed mean method with \( A = 34.5 \).

Class | \(f\) | \(x\) | \(d = x - 34.5\) | \(f \cdot d\)
10 – 19 | 4 | 14.5 | \(-20\) | \(-80\)
20 – 29 | 7 | 24.5 | \(-10\) | \(-70\)
30 – 39 | 10 | 34.5 | \(0\) | \(0\)
40 – 49 | 5 | 44.5 | \(10\) | \(50\)
Total | 26 | | | \(-100\)
\[ \bar{x} = 34.5 + \frac{-100}{26} = 34.5 – 3.85 = 30.65 \]

Same answer! This method is faster and less prone to arithmetic errors. Can you see why choosing \( A \) from the middle class gives smaller \( d \) values?

Exam Note: Both methods give the same answer. The assumed mean method is recommended when midpoints are large. Always show your working table clearly.
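A quick way to convince yourself that the choice of \( A \) does not change the answer is to try several values. The sketch below uses a hypothetical helper, `assumed_mean_method`, on the data from Example 3.

```python
def assumed_mean_method(midpoints, freqs, assumed):
    """Mean via the shortcut: A + sum(f * d) / n, where d = x - A."""
    n = sum(freqs)
    sum_fd = sum(f * (x - assumed) for f, x in zip(freqs, midpoints))
    return assumed + sum_fd / n


midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [4, 7, 10, 5]

# Every choice of A leads to the same mean; only the size of d changes.
for assumed in (14.5, 24.5, 34.5, 44.5):
    print(assumed, round(assumed_mean_method(midpoints, freqs, assumed), 2))
# each line ends in 30.65
```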
Practice Question 1

The marks of 40 students are given in the frequency table below. Find the mean mark.

Class | 0–9 | 10–19 | 20–29 | 30–39 | 40–49
Frequency | 3 | 8 | 15 | 10 | 4

Solution:

Class | \(f\) | \(x\) | \(fx\)
0 – 9 | 3 | 4.5 | 13.5
10 – 19 | 8 | 14.5 | 116
20 – 29 | 15 | 24.5 | 367.5
30 – 39 | 10 | 34.5 | 345
40 – 49 | 4 | 44.5 | 178
Total | 40 | | 1020
\[ \bar{x} = \frac{1020}{40} = 25.5 \]

The mean mark is 25.5.

4. Measures of Central Tendency — Median

The median is the middle value when data is arranged in order. It divides the data into two equal halves.

4.1 Median of Ungrouped Data

  • If \( n \) is odd: Median = value at position \( \frac{n+1}{2} \)
  • If \( n \) is even: Median = average of values at positions \( \frac{n}{2} \) and \( \frac{n}{2} + 1 \)

Worked Example 4: Find the median of: 13, 7, 22, 15, 9

Solution: First arrange in order: 7, 9, 13, 15, 22

\( n = 5 \) (odd), so position = \( \frac{5+1}{2} = 3 \)

The 3rd value is 13. Median = 13.

Worked Example 5: Find the median of: 20, 35, 18, 42, 28, 31

Solution: Arrange in order: 18, 20, 28, 31, 35, 42

\( n = 6 \) (even), so we average positions 3 and 4:

\[ \text{Median} = \frac{28 + 31}{2} = \frac{59}{2} = 29.5 \]
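The odd and even cases can be written as one short Python function. This is a sketch with a made-up name, `median_ungrouped`. Note that Python lists count from 0, so position 3 in the notes is index 2 in the code.

```python
def median_ungrouped(values):
    """Median of raw data: sort first, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2, which is index n // 2 when counting from 0
        return ordered[mid]
    # Even n: average of positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([13, 7, 22, 15, 9]))        # 13
print(median_ungrouped([20, 35, 18, 42, 28, 31]))  # 29.5
```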

4.2 Median of Grouped Data

For grouped data, we use a formula based on the cumulative frequency:

\[ \text{Median} = L + \left(\frac{\frac{n}{2} – CF}{f}\right) \times h \]
Where:
  • \( L \) = lower class boundary of the median class
  • \( n \) = total frequency
  • \( CF \) = cumulative frequency of the class before the median class
  • \( f \) = frequency of the median class
  • \( h \) = class width (using boundaries)
Median class = the class where the \( \frac{n}{2} \)-th value falls.

Worked Example 6: Find the median from the following distribution:

Class | Frequency | CF
10 – 19 | 4 | 4
20 – 29 | 7 | 11
30 – 39 | 10 | 21
40 – 49 | 5 | 26

Solution:

\( n = 26 \), so \( \frac{n}{2} = 13 \)

The 13th value falls in the class 30 – 39 (since CF reaches 11 before this class, and 21 after).

\( L = 29.5 \) (lower boundary), \( CF = 11 \), \( f = 10 \), \( h = 10 \) (boundaries: 29.5 to 39.5)

\[ \text{Median} = 29.5 + \left(\frac{13 – 11}{10}\right) \times 10 = 29.5 + 2 = 31.5 \]
Key Exam Notes on Median:
  • Always arrange ungrouped data in order first!
  • For grouped data, always construct a cumulative frequency column
  • Use class boundaries (not limits) for \( L \) and \( h \)
  • Common error: Forgetting that \( CF \) is the cumulative frequency BEFORE the median class, not including it
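The grouped median formula can be sketched in Python as follows. The name `grouped_median` and the `gap` parameter are assumptions for illustration. The code assumes whole-number class limits with a gap of 1 between classes, and it keeps a running total so that `cf_before` is always the cumulative frequency before the current class.

```python
def grouped_median(classes, freqs, gap=1):
    """Median = L + ((n/2 - CF) / f) * h, using class boundaries."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = n / 2
    cf_before = 0  # cumulative frequency BEFORE the current class
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= target:
            lower_boundary = lower - gap / 2
            width = (upper + gap / 2) - lower_boundary
            return lower_boundary + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("classes and freqs must have the same length")


# Data from Worked Example 6
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [4, 7, 10, 5]
print(grouped_median(classes, freqs))  # 31.5
```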
Practice Question 2

Find the median of the following data: 45, 32, 67, 52, 41, 38, 59, 73

Solution: Arrange in order: 32, 38, 41, 45, 52, 59, 67, 73

\( n = 8 \) (even). Positions: \( \frac{8}{2} = 4 \) and \( \frac{8}{2} + 1 = 5 \)

\[ \text{Median} = \frac{45 + 52}{2} = \frac{97}{2} = 48.5 \]
Practice Question 3

Find the median for the grouped data below:

Class | 0–9 | 10–19 | 20–29 | 30–39 | 40–49
Frequency | 5 | 12 | 18 | 10 | 5

Solution: Build CF table:

Class | \(f\) | CF
0 – 9 | 5 | 5
10 – 19 | 12 | 17
20 – 29 | 18 | 35
30 – 39 | 10 | 45
40 – 49 | 5 | 50

\( n = 50 \), \( \frac{n}{2} = 25 \). The 25th value falls in class 20 – 29 (CF before = 17).

\( L = 19.5 \), \( CF = 17 \), \( f = 18 \), \( h = 10 \)

\[ \text{Median} = 19.5 + \left(\frac{25 – 17}{18}\right) \times 10 = 19.5 + \frac{80}{18} = 19.5 + 4.44 = 23.94 \]

Median ≈ 23.94

5. Measures of Central Tendency — Mode

The mode is the value that occurs most frequently.

5.1 Mode of Ungrouped Data

Simply identify the value with the highest frequency.

Worked Example 7: Find the mode of: 5, 7, 7, 3, 7, 9, 5, 7, 3

Solution: The value 7 appears 4 times (most frequent). Mode = 7.

What if two values appear equally often? Then the data is bimodal — it has two modes!

5.2 Mode of Grouped Data

\[ \text{Mode} = L + \left(\frac{f_1 – f_0}{2f_1 – f_0 – f_2}\right) \times h \]
Where:
  • \( L \) = lower class boundary of the modal class
  • \( f_1 \) = frequency of the modal class (highest frequency)
  • \( f_0 \) = frequency of the class before the modal class
  • \( f_2 \) = frequency of the class after the modal class
  • \( h \) = class width (using boundaries)

Worked Example 8: Find the mode from the distribution in Example 6.

Solution: The modal class is 30 – 39 (highest frequency = 10).

\( L = 29.5 \), \( f_1 = 10 \), \( f_0 = 7 \), \( f_2 = 5 \), \( h = 10 \)

\[ \text{Mode} = 29.5 + \left(\frac{10 – 7}{2(10) – 7 – 5}\right) \times 10 \] \[ = 29.5 + \frac{3}{20 – 12} \times 10 = 29.5 + \frac{30}{8} = 29.5 + 3.75 = 33.25 \]
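Here is a hedged Python sketch of the grouped mode formula. The name `grouped_mode` is hypothetical. Two choices in it are assumptions of this example rather than rules from the notes: when the modal class is the first or last class, the missing neighbour frequency is taken as 0, and if several classes tie for the highest frequency, the first one is used.

```python
def grouped_mode(classes, freqs, gap=1):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h, using class boundaries."""
    i = freqs.index(max(freqs))                   # first class with the highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0             # assumption: no class before -> 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0  # assumption: no class after -> 0
    lower, upper = classes[i]
    lower_boundary = lower - gap / 2
    width = (upper + gap / 2) - lower_boundary
    return lower_boundary + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Data from Worked Example 8
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [4, 7, 10, 5]
print(grouped_mode(classes, freqs))  # 33.25
```

The formula breaks down when \( 2f_1 - f_0 - f_2 = 0 \) (the modal class and both neighbours have equal frequencies); the sketch does not handle that case.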
Exam-Style Question 1

For the frequency distribution below, find: (a) Mean, (b) Median, (c) Mode

Class | 5–9 | 10–14 | 15–19 | 20–24 | 25–29
Frequency | 6 | 14 | 20 | 12 | 8

(a) Mean:

Class | \(f\) | \(x\) | \(fx\)
5 – 9 | 6 | 7 | 42
10 – 14 | 14 | 12 | 168
15 – 19 | 20 | 17 | 340
20 – 24 | 12 | 22 | 264
25 – 29 | 8 | 27 | 216
Total | 60 | | 1030
\[ \bar{x} = \frac{1030}{60} \approx 17.17 \]

(b) Median: \( \frac{n}{2} = 30 \). The 30th value falls in class 15–19 (CF before = 20).

\( L = 14.5 \), \( CF = 20 \), \( f = 20 \), \( h = 5 \)

\[ \text{Median} = 14.5 + \left(\frac{30 – 20}{20}\right) \times 5 = 14.5 + 2.5 = 17 \]

(c) Mode: Modal class = 15–19. \( L = 14.5 \), \( f_1 = 20 \), \( f_0 = 14 \), \( f_2 = 12 \), \( h = 5 \)

\[ \text{Mode} = 14.5 + \left(\frac{20 - 14}{40 - 14 - 12}\right) \times 5 = 14.5 + \frac{30}{14} \approx 14.5 + 2.14 = 16.64 \]

Note: Here Mode (16.64) < Median (17) < Mean (17.17). The three measures lie close together, and their order tells us the distribution is slightly positively skewed. A common slip in this step is to multiply by \( h \) twice: \( \frac{6}{14} \times 5 \) is already \( \frac{30}{14} \).

6. Measures of Dispersion — Range

Central tendency tells us where the center is, but not how spread out the data is. For that, we need measures of dispersion. The simplest is the range: the largest value minus the smallest value.

Worked Example 9: Data: 15, 23, 8, 31, 19, 27. Find the range.

\[ \text{Range} = 31 – 8 = 23 \]

The range is simple but only uses two values — it ignores all the data in between. Let’s learn better measures!

7. Measures of Dispersion — Variance and Standard Deviation

The variance and standard deviation are the most important measures of spread. They consider every value in the data.

7.1 The Idea Behind Variance

We want to measure how far each value is from the mean. If we just average the deviations \( (x_i – \bar{x}) \), we get zero (they cancel out). So we square each deviation first, then average:

\[ \text{Variance} = \frac{\text{Sum of squared deviations}}{n} \]

7.2 Variance and Standard Deviation of Ungrouped Data

\[ \sigma^2 = \frac{\sum (x_i – \bar{x})^2}{n} \]
\[ \sigma = \sqrt{\frac{\sum (x_i – \bar{x})^2}{n}} \]
Where \( \sigma^2 \) = variance, \( \sigma \) = standard deviation.

Why standard deviation? Because variance is in squared units (e.g., “marks squared”), which doesn’t make physical sense. Standard deviation is in the same units as the data.

Worked Example 10: Find the variance and standard deviation of: 4, 8, 6, 5, 7

Solution:

Step 1: Mean = \( \frac{4+8+6+5+7}{5} = \frac{30}{5} = 6 \)

Step 2: Calculate deviations and squared deviations:

\(x\) | \(x - \bar{x}\) | \((x - \bar{x})^2\)
4 | \(-2\) | 4
8 | \(2\) | 4
6 | \(0\) | 0
5 | \(-1\) | 1
7 | \(1\) | 1
Total | Sum = 0 | \(\sum = 10\)
\[ \sigma^2 = \frac{10}{5} = 2 \] \[ \sigma = \sqrt{2} \approx 1.41 \]

7.3 Shortcut Formula for Variance

Calculating deviations is tedious. Use this equivalent formula:

\[ \sigma^2 = \frac{\sum x_i^2}{n} – \left(\frac{\sum x_i}{n}\right)^2 = \frac{\sum x^2}{n} – \bar{x}^2 \]

Worked Example 11: Using the shortcut for the same data: 4, 8, 6, 5, 7

\[ \sum x = 30, \quad \sum x^2 = 16 + 64 + 36 + 25 + 49 = 190 \] \[ \sigma^2 = \frac{190}{5} – 6^2 = 38 – 36 = 2 \]

Same answer — but much faster! Can you see why this formula works? It expands \( \sum(x – \bar{x})^2 \) algebraically.
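For readers who want to see the expansion, here is one way to write it out. It uses only the facts that \( \sum x_i = n\bar{x} \) and that \( \bar{x} \) is a constant inside the sum:

\[
\begin{aligned}
\frac{1}{n}\sum (x_i - \bar{x})^2
&= \frac{1}{n}\sum \left(x_i^2 - 2\bar{x}\,x_i + \bar{x}^2\right) \\
&= \frac{\sum x_i^2}{n} - 2\bar{x}\cdot\frac{\sum x_i}{n} + \frac{n\bar{x}^2}{n} \\
&= \frac{\sum x_i^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
&= \frac{\sum x_i^2}{n} - \bar{x}^2
\end{aligned}
\]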

7.4 Variance and Standard Deviation of Grouped Data

\[ \sigma^2 = \frac{\sum f_i x_i^2}{n} – \bar{x}^2 \] \[ \sigma = \sqrt{\frac{\sum f_i x_i^2}{n} – \bar{x}^2} \]
Where \( x_i \) = midpoint, \( f_i \) = frequency, \( n = \sum f_i \).

Worked Example 12: Find the variance and standard deviation:

Class | Frequency
10 – 14 | 3
15 – 19 | 5
20 – 24 | 8
25 – 29 | 4

Solution:

Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\)
10 – 14 | 3 | 12 | 36 | 144 | 432
15 – 19 | 5 | 17 | 85 | 289 | 1445
20 – 24 | 8 | 22 | 176 | 484 | 3872
25 – 29 | 4 | 27 | 108 | 729 | 2916
Total | 20 | | 405 | | 8665
\[ \bar{x} = \frac{405}{20} = 20.25 \] \[ \sigma^2 = \frac{8665}{20} – (20.25)^2 = 433.25 – 410.0625 = 23.1875 \] \[ \sigma = \sqrt{23.1875} \approx 4.82 \]
Exam Note: The shortcut formula \( \sigma^2 = \frac{\sum fx^2}{n} – \bar{x}^2 \) is the standard method for grouped data. Always show your table with columns for \( f \), \( x \), \( fx \), \( x^2 \), and \( fx^2 \). Marks are awarded for the table!
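The same table-based calculation can be sketched in Python to check your arithmetic. The function name `grouped_variance` is made up for this example. It computes the population variance with the shortcut formula, exactly as in Worked Example 12.

```python
import math


def grouped_variance(midpoints, freqs):
    """Population variance of grouped data: sum(f*x^2)/n - mean^2."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    mean_of_squares = sum(f * x * x for f, x in zip(freqs, midpoints)) / n
    return mean_of_squares - mean ** 2


# Data from Worked Example 12
midpoints = [12, 17, 22, 27]
freqs = [3, 5, 8, 4]
variance = grouped_variance(midpoints, freqs)
print(variance)                       # 23.1875
print(round(math.sqrt(variance), 2))  # 4.82
```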

8. Coefficient of Variation (CV)

The coefficient of variation compares the spread of two different datasets, even if they have different units or very different means.

\[ CV = \frac{\sigma}{\bar{x}} \times 100\% \]
CV is always expressed as a percentage. A smaller CV means the data is less variable (more consistent).

Worked Example 13: Two factories produce light bulbs. Factory A: mean life = 1200 hrs, SD = 80 hrs. Factory B: mean life = 1500 hrs, SD = 110 hrs. Which factory is more consistent?

\[ CV_A = \frac{80}{1200} \times 100\% = 6.67\% \] \[ CV_B = \frac{110}{1500} \times 100\% = 7.33\% \]

Since \( CV_A < CV_B \), Factory A is more consistent.

You cannot compare standard deviations directly when the means are different — that’s why we use CV!
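The factory comparison takes only a couple of lines in Python. This is a sketch, and `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(std_dev, mean):
    """CV as a percentage: (sigma / mean) * 100."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(80, 1200)   # Factory A
cv_b = coefficient_of_variation(110, 1500)  # Factory B
print(round(cv_a, 2), round(cv_b, 2))       # 6.67 7.33
print("A" if cv_a < cv_b else "B", "is more consistent")  # A is more consistent
```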

Practice Question 4

The heights of 50 students have mean 165 cm and variance 25 cm². Find the coefficient of variation.

\[ \sigma = \sqrt{25} = 5 \text{ cm} \] \[ CV = \frac{5}{165} \times 100\% \approx 3.03\% \]
Exam-Style Question 2

For the data below, calculate: (a) Mean, (b) Variance, (c) Standard Deviation, (d) Coefficient of Variation

Class | 20–24 | 25–29 | 30–34 | 35–39 | 40–44
Frequency | 4 | 10 | 16 | 8 | 2

Solution:

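Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\)
20 – 24 | 4 | 22 | 88 | 484 | 1936
25 – 29 | 10 | 27 | 270 | 729 | 7290
30 – 34 | 16 | 32 | 512 | 1024 | 16384
35 – 39 | 8 | 37 | 296 | 1369 | 10952
40 – 44 | 2 | 42 | 84 | 1764 | 3528
Total | 40 | | 1250 | | 40090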

(a) Mean: \( \bar{x} = \frac{1250}{40} = 31.25 \)

(b) Variance: \( \sigma^2 = \frac{40090}{40} – (31.25)^2 = 1002.25 – 976.5625 = 25.6875 \)

(c) Standard Deviation: \( \sigma = \sqrt{25.6875} \approx 5.07 \)

(d) CV: \( CV = \frac{5.07}{31.25} \times 100\% \approx 16.22\% \)

9. Quartiles and Percentiles

Quartiles and percentiles are measures of position — they tell us where a particular value stands in the dataset.

9.1 Quartiles

Quartiles divide ordered data into four equal parts:

  • \( Q_1 \) (First Quartile): 25% of data is below it
  • \( Q_2 \) (Second Quartile) = Median: 50% of data is below it
  • \( Q_3 \) (Third Quartile): 75% of data is below it

9.2 Quartiles for Ungrouped Data

\[ Q_1 = \text{value at position } \frac{n+1}{4} \]
\[ Q_3 = \text{value at position } \frac{3(n+1)}{4} \]
If the position is not a whole number, interpolate between the two nearest values.

Worked Example 14: Find \( Q_1 \) and \( Q_3 \) for: 11, 15, 18, 22, 25, 30, 35, 40, 42

\( n = 9 \)

\[ Q_1 \text{ position} = \frac{9+1}{4} = 2.5 \] \[ Q_1 = \frac{15 + 18}{2} = 16.5 \]
\[ Q_3 \text{ position} = \frac{3(9+1)}{4} = 7.5 \] \[ Q_3 = \frac{35 + 40}{2} = 37.5 \]
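The position rule with interpolation can be sketched in Python. The name `quartile_ungrouped` is hypothetical, and the code follows the \( \frac{k(n+1)}{4} \) convention used in these notes. Other textbooks and software may use slightly different conventions and give slightly different quartiles.

```python
def quartile_ungrouped(values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k * (n + 1) / 4."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / 4
    whole = int(position)          # whole part of the position
    fraction = position - whole    # how far to move toward the next value
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below = ordered[whole - 1]     # value at the whole-number position
    above = ordered[whole]         # next value up
    return below + fraction * (above - below)


data = [11, 15, 18, 22, 25, 30, 35, 40, 42]  # Worked Example 14
print(quartile_ungrouped(data, 1))  # 16.5
print(quartile_ungrouped(data, 3))  # 37.5
```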

9.3 Quartiles for Grouped Data

\[ Q_1 = L + \left(\frac{\frac{n}{4} – CF}{f}\right) \times h \]
\[ Q_3 = L + \left(\frac{\frac{3n}{4} – CF}{f}\right) \times h \]
Same structure as the median formula, but using \( \frac{n}{4} \) and \( \frac{3n}{4} \) instead of \( \frac{n}{2} \).

Worked Example 15: Find \( Q_1 \) and \( Q_3 \) from:

Class | Frequency | CF
0 – 9 | 5 | 5
10 – 19 | 12 | 17
20 – 29 | 18 | 35
30 – 39 | 10 | 45
40 – 49 | 5 | 50

Solution: \( n = 50 \)

\( Q_1 \): \( \frac{n}{4} = 12.5 \). The 12.5th value falls in class 10–19 (CF before = 5).

\[ Q_1 = 9.5 + \left(\frac{12.5 – 5}{12}\right) \times 10 = 9.5 + \frac{75}{12} = 9.5 + 6.25 = 15.75 \]

\( Q_3 \): \( \frac{3n}{4} = 37.5 \). The 37.5th value falls in class 30–39 (CF before = 35).

\[ Q_3 = 29.5 + \left(\frac{37.5 – 35}{10}\right) \times 10 = 29.5 + 2.5 = 32 \]

9.4 Interquartile Range (IQR)

\[ IQR = Q_3 – Q_1 \]
The IQR tells us the range of the middle 50% of the data. It is less affected by extreme values than the range.

From Example 15: \( IQR = 32 – 15.75 = 16.25 \)

9.5 Percentiles

\[ P_k = L + \left(\frac{\frac{kn}{100} – CF}{f}\right) \times h \]
The \( k \)-th percentile \( P_k \) is the value below which \( k\% \) of the data falls.

For example, \( P_{30} \) means the value below which 30% of the data falls.

Remember: \( Q_1 = P_{25} \), \( Q_2 = P_{50} = \) Median, \( Q_3 = P_{75} \). Quartiles are special percentiles!
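Because quartiles are special percentiles, one function can cover all of them. The sketch below uses a hypothetical name, `grouped_percentile`, and assumes whole-number class limits with a gap of 1. It is checked against the data of Worked Example 15.

```python
def grouped_percentile(classes, freqs, k, gap=1):
    """P_k = L + ((k*n/100 - CF) / f) * h, using class boundaries."""
    n = sum(freqs)
    target = k * n / 100
    cf_before = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if f > 0 and cf_before + f >= target:
            lower_boundary = lower - gap / 2
            width = (upper + gap / 2) - lower_boundary
            return lower_boundary + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("k must be between 0 and 100 and total frequency positive")


classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [5, 12, 18, 10, 5]
q1 = grouped_percentile(classes, freqs, 25)
q3 = grouped_percentile(classes, freqs, 75)
print(q1, q3, q3 - q1)  # 15.75 32.0 16.25
```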
Practice Question 5

Find \( Q_1 \), \( Q_3 \), and IQR for the ungrouped data: 6, 10, 14, 18, 22, 26, 30

\( n = 7 \)

\[ Q_1 \text{ position} = \frac{7+1}{4} = 2 \implies Q_1 = 10 \] \[ Q_3 \text{ position} = \frac{3(7+1)}{4} = 6 \implies Q_3 = 26 \] \[ IQR = 26 – 10 = 16 \]

10. Ogive (Cumulative Frequency Curve)

An ogive is a graph of cumulative frequency against the upper class boundary. It is an S-shaped curve used to estimate quartiles, percentiles, and the median graphically.

10.1 How to Draw an Ogive

  1. Calculate the cumulative frequency for each class
  2. Plot points using upper class boundaries on the x-axis and cumulative frequency on the y-axis
  3. Join the points with a smooth curve (start from the point where CF = 0)
  4. To find the median: draw a horizontal line from \( \frac{n}{2} \) on the y-axis to the curve, then drop vertically to the x-axis

Worked Example 16: Draw an ogive for the following data and estimate the median.

Class | Frequency | Upper Boundary | CF
0 – 9 | 4 | 9.5 | 4
10 – 19 | 10 | 19.5 | 14
20 – 29 | 16 | 29.5 | 30
30 – 39 | 8 | 39.5 | 38
40 – 49 | 2 | 49.5 | 40

Points to plot: (−0.5, 0), (9.5, 4), (19.5, 14), (29.5, 30), (39.5, 38), (49.5, 40). The first point is the lower boundary of the first class, where CF = 0.

Estimating median: \( \frac{n}{2} = 20 \). Draw a horizontal line from y = 20 to the curve, then drop down.

From the curve, this corresponds to approximately \( x \approx 23 \). So the estimated median is about 23. (The median formula gives \( 19.5 + \frac{20 - 14}{16} \times 10 = 23.25 \).)

In exams, you would draw this on graph paper and read the value carefully. The formula gives a more precise answer, but the ogive gives a good visual estimate!

Exam Note: When drawing an ogive:
  • Always start from CF = 0 (the origin of the cumulative frequency)
  • Use upper class BOUNDARIES (not limits)
  • The curve should be smooth (not straight lines between points)
  • Label both axes clearly with a title
Exam-Style Question 3

From the ogive data below, list the points to plot and estimate \( Q_1 \).

Class | 5–14 | 15–24 | 25–34 | 35–44 | 45–54
Frequency | 8 | 15 | 20 | 10 | 7

First find upper boundaries and CF:

Class | \(f\) | Upper Boundary | CF
5 – 14 | 8 | 14.5 | 8
15 – 24 | 15 | 24.5 | 23
25 – 34 | 20 | 34.5 | 43
35 – 44 | 10 | 44.5 | 53
45 – 54 | 7 | 54.5 | 60

Points to plot: (4.5, 0), (14.5, 8), (24.5, 23), (34.5, 43), (44.5, 53), (54.5, 60)

Estimate \( Q_1 \): \( \frac{n}{4} = \frac{60}{4} = 15 \). Draw horizontal line from y = 15 to the curve, drop down. From the data, this falls between x = 14.5 and x = 24.5, closer to 14.5 (since CF jumps from 8 to 23). Estimated \( Q_1 \approx 18 \).

Check by formula: \( Q_1 = 14.5 + \frac{15-8}{15} \times 10 = 14.5 + 4.67 = 19.17 \). The graphical estimate of 18 is reasonably close!
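The plotting points and the formula check above can be reproduced with a short Python sketch. The names `ogive_points` and `read_ogive` are hypothetical. Note one simplification: the code joins the points with straight line segments, so it reproduces the formula value, whereas a hand-drawn smooth curve may give a slightly different reading.

```python
def ogive_points(classes, freqs, gap=1):
    """Return (boundary, cumulative frequency) points, starting at CF = 0."""
    points = [(classes[0][0] - gap / 2, 0)]  # lower boundary of the first class
    running_total = 0
    for (_, upper), f in zip(classes, freqs):
        running_total += f
        points.append((upper + gap / 2, running_total))
    return points


def read_ogive(points, cf_level):
    """Read off x at a given CF level using straight-line segments."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= cf_level <= y1:
            return x0 + (cf_level - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("cf_level is outside the range of the ogive")


classes = [(5, 14), (15, 24), (25, 34), (35, 44), (45, 54)]
freqs = [8, 15, 20, 10, 7]
points = ogive_points(classes, freqs)
print(points)
# [(4.5, 0), (14.5, 8), (24.5, 23), (34.5, 43), (44.5, 53), (54.5, 60)]
print(round(read_ogive(points, 60 / 4), 2))  # 19.17
```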

11. Summary of Key Exam Notes

  • Mean = \( \frac{\sum x}{n} \) (ungrouped) or \( \frac{\sum fx}{n} \) (grouped). Use assumed mean method for large numbers.
  • Median = middle value. Use formula \( L + \frac{\frac{n}{2}-CF}{f} \times h \) for grouped data.
  • Mode = most frequent value. Use formula \( L + \frac{f_1-f_0}{2f_1-f_0-f_2} \times h \) for grouped data.
  • Variance = \( \frac{\sum fx^2}{n} – \bar{x}^2 \). Always use the shortcut formula!
  • Standard Deviation = \( \sqrt{\text{Variance}} \). Same units as data.
  • CV = \( \frac{\sigma}{\bar{x}} \times 100\% \). Used to compare consistency.
  • Quartiles use the same formula structure as median but with \( \frac{n}{4} \) and \( \frac{3n}{4} \).
  • Ogive uses upper class boundaries vs cumulative frequency. Always start from CF = 0.
  • For grouped formulas, always use class boundaries (not limits) for \( L \) and \( h \).
  • Always show your calculation table — examiners award marks for clear working!
Exam-Style Question 4

The monthly salaries (in Birr) of 60 workers are given below:

Salary (Birr) | 2000–2499 | 2500–2999 | 3000–3499 | 3500–3999 | 4000–4499
Number of workers | 8 | 15 | 20 | 12 | 5

Find: (a) Mean salary, (b) Median salary, (c) Standard deviation, (d) What percentage of workers earn above 3500 Birr?

(a) Mean:

Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\)
2000–2499 | 8 | 2249.5 | 17996 | 5060250.25 | 40482002
2500–2999 | 15 | 2749.5 | 41242.5 | 7559750.25 | 113396253.75
3000–3499 | 20 | 3249.5 | 64990 | 10559250.25 | 211185005
3500–3999 | 12 | 3749.5 | 44994 | 14058750.25 | 168705003
4000–4499 | 5 | 4249.5 | 21247.5 | 18058250.25 | 90291251.25
Total | 60 | | 190470 | | 624059515
\[ \bar{x} = \frac{190470}{60} = 3174.5 \text{ Birr} \]

(b) Median: \( \frac{n}{2} = 30 \). CF: 8, 23, 43, … → 30th value in class 3000–3499.

\[ \text{Median} = 2999.5 + \frac{30-23}{20} \times 500 = 2999.5 + 175 = 3174.5 \text{ Birr} \]

(c) Standard Deviation:

\[ \sigma^2 = \frac{624059515}{60} – (3174.5)^2 = 10400991.92 – 10077450.25 = 323541.67 \] \[ \sigma = \sqrt{323541.67} \approx 568.81 \text{ Birr} \]

(d) Workers above 3500 Birr: Those in classes 3500–3999 and 4000–4499 = 12 + 5 = 17 workers.

\[ \frac{17}{60} \times 100\% \approx 28.33\% \]

Quick Revision Notes — Statistics

1. Important Definitions

  • Mean (\( \bar{x} \)) — The sum of all values divided by the number of values. It uses every data point.
  • Median — The middle value when data is arranged in ascending order. It divides data into two equal halves.
  • Mode — The value that occurs most frequently in the data.
  • Variance (\( \sigma^2 \)) — The average of squared deviations from the mean. Measures spread.
  • Standard Deviation (\( \sigma \)) — The square root of variance. Same units as original data.
  • Coefficient of Variation (CV) — Standard deviation as a percentage of the mean. Used for comparison.
  • Quartiles — Values that divide data into four equal parts (\( Q_1, Q_2, Q_3 \)).
  • Ogive — Cumulative frequency curve plotted using upper class boundaries.
  • Class Boundary — Adjusted class limits to remove gaps between consecutive classes.
  • Class Midpoint — The average of the lower and upper limits of a class.

2. All Key Formulas

MEAN:

\[ \bar{x} = \frac{\sum x}{n} \quad \text{(ungrouped)} \] \[ \bar{x} = \frac{\sum f_i x_i}{\sum f_i} \quad \text{(grouped)} \] \[ \bar{x} = A + \frac{\sum f_i d_i}{n} \quad \text{(assumed mean method)} \]

MEDIAN:

\[ \text{Position} = \frac{n+1}{2} \quad \text{(ungrouped)} \] \[ \text{Median} = L + \left(\frac{\frac{n}{2} – CF}{f}\right) \times h \quad \text{(grouped)} \]

MODE:

\[ \text{Mode} = L + \left(\frac{f_1 – f_0}{2f_1 – f_0 – f_2}\right) \times h \quad \text{(grouped)} \]

VARIANCE & STANDARD DEVIATION:

\[ \sigma^2 = \frac{\sum (x_i – \bar{x})^2}{n} = \frac{\sum x_i^2}{n} – \bar{x}^2 \quad \text{(ungrouped)} \] \[ \sigma^2 = \frac{\sum f_i x_i^2}{n} – \bar{x}^2 \quad \text{(grouped)} \] \[ \sigma = \sqrt{\sigma^2} \]

COEFFICIENT OF VARIATION:

\[ CV = \frac{\sigma}{\bar{x}} \times 100\% \]

QUARTILES (Grouped):

\[ Q_1 = L + \left(\frac{\frac{n}{4} – CF}{f}\right) \times h \] \[ Q_3 = L + \left(\frac{\frac{3n}{4} – CF}{f}\right) \times h \] \[ IQR = Q_3 – Q_1 \]

PERCENTILE:

\[ P_k = L + \left(\frac{\frac{kn}{100} – CF}{f}\right) \times h \]

3. Relationship Between Mean, Median, and Mode

For a moderately skewed distribution:
\[ \text{Mode} \approx 3 \times \text{Median} – 2 \times \text{Mean} \]
  • If data is symmetrical: Mean ≈ Median ≈ Mode
  • If data is positively skewed (right-skewed): Mode < Median < Mean
  • If data is negatively skewed (left-skewed): Mean < Median < Mode

4. When to Use Which Measure

Measure | Best Used When
Mean | Data is fairly symmetric, no extreme outliers
Median | Data has outliers or is skewed (median is resistant to extremes)
Mode | Data has a clear peak; useful for categorical data
SD | General measure of spread (most common)
CV | Comparing variability of two different datasets
IQR | Data has outliers (IQR is resistant to extremes)

5. Common Mistakes to Avoid

  1. Not sorting data for median — Always arrange ungrouped data in ascending order before finding the median!
  2. Using class limits instead of boundaries — In grouped data formulas, \( L \) and \( h \) must use class boundaries (e.g., 9.5, 19.5), not limits (e.g., 10, 19).
  3. Wrong CF in median/quartile formula — CF means the cumulative frequency BEFORE the median class, not including it.
  4. Confusing \( n \) positions — For median use \( \frac{n}{2} \), for \( Q_1 \) use \( \frac{n}{4} \), for \( Q_3 \) use \( \frac{3n}{4} \). Don’t mix them up!
  5. Forgetting to square in variance — Variance uses \( (x – \bar{x})^2 \), not \( (x – \bar{x}) \). And standard deviation is the square ROOT of variance.
  6. Wrong class width — Class width should use boundaries: \( h = \text{upper boundary} – \text{lower boundary} \), not just upper limit − lower limit (when classes have gaps).
  7. Ogive: using limits instead of boundaries — Always plot upper class BOUNDARIES on the x-axis.
  8. Not showing the calculation table — In exams, marks are awarded for the table. Don’t skip it!
  9. Arithmetic errors in the table — Double-check your \( fx \) and \( fx^2 \) columns. A small error in the table affects the final answer.
  10. Comparing SD directly instead of CV — When means are different, always use CV to compare spread.

6. Quick Examples

Q: Data: 3, 5, 7, 9, 11. Find mean, median, mode.

A: Mean = \( \frac{35}{5} = 7 \). Median = 3rd value = 7. Mode = no repeated value, so no mode. (Symmetric data!)

Q: If \( \bar{x} = 50 \) and \( \sigma = 10 \), find CV.

A: \( CV = \frac{10}{50} \times 100\% = 20\% \)

Q: If \( \sum x = 480 \), \( \sum x^2 = 24000 \), \( n = 12 \), find variance.

A: \( \bar{x} = 40 \). \( \sigma^2 = \frac{24000}{12} – 40^2 = 2000 – 1600 = 400 \)

Q: Class 20–29 has what lower class boundary?

A: \( 20 – 0.5 = 19.5 \)

Q: Which class is the median class if CFs are 5, 13, 28, 38, 40 and \( n = 40 \)?

A: \( \frac{n}{2} = 20 \). CF reaches 13 then 28, so the 20th value is in the class where CF goes from 13 to 28 (the 3rd class).

Challenge Exam Questions — Statistics

These questions test your deep understanding. Try each one fully before checking the answer!

Section A: Multiple Choice Questions

Question 1
MCQ

For the data: 5, 8, 8, 12, 15, the relationship between mean, median, and mode is:

A) Mean < Median < Mode    B) Mode < Median < Mean    C) Mean = Median = Mode    D) Median < Mode < Mean

Answer: B) Mode < Median < Mean

Mean = \( \frac{48}{5} = 9.6 \). Median = 8 (3rd value). Mode = 8.

So: Mode (8) = Median (8) < Mean (9.6). The data is slightly positively skewed (right-skewed). The closest option showing this pattern is B.

Question 2
MCQ

If the variance of a dataset is 144, the standard deviation is:

A) 12    B) 72    C) 144    D) 20736

Answer: A) 12

\[ \sigma = \sqrt{144} = 12 \]
Question 3
MCQ

The lower class boundary of the class 35–44 is:

A) 34.5    B) 35    C) 35.5    D) 34

Answer: A) 34.5

Lower boundary = lower limit − 0.5 = 35 − 0.5 = 34.5

Question 4
MCQ

If dataset A has CV = 15% and dataset B has CV = 22%, which is more consistent?

A) Dataset A    B) Dataset B    C) Both equally    D) Cannot determine

Answer: A) Dataset A

Lower CV means less variability, i.e., more consistency. Since 15% < 22%, Dataset A is more consistent.

Question 5
MCQ

The ogive curve is plotted using:

A) Frequency vs class midpoint    B) Cumulative frequency vs upper class boundary    C) Frequency vs lower class limit    D) Cumulative frequency vs class midpoint

Answer: B) Cumulative frequency vs upper class boundary

This is the standard definition. The x-axis has upper class boundaries and the y-axis has cumulative frequencies.

Section B: Fill in the Blanks

Question 6
Fill in the Blank

The class midpoint of the interval 100–149 is ________

Answer: 124.5

\[ \text{Midpoint} = \frac{100 + 149}{2} = \frac{249}{2} = 124.5 \]
Question 7
Fill in the Blank

If \( Q_1 = 25 \) and \( Q_3 = 45 \), the interquartile range is ________

Answer: 20

\[ IQR = Q_3 – Q_1 = 45 – 25 = 20 \]
Question 8
Fill in the Blank

The second quartile \( Q_2 \) is the same as the ________

Answer: Median

\( Q_2 = P_{50} = \) Median. All three represent the value below which 50% of the data falls.

Question 9
Fill in the Blank

If each value in a dataset is multiplied by 5, the mean is multiplied by ________ and the standard deviation is multiplied by ________

Answer: 5 and 5

If \( y_i = 5x_i \), then \( \bar{y} = 5\bar{x} \) and \( \sigma_y = 5\sigma_x \). Both mean and standard deviation are multiplied by the same constant. (Variance would be multiplied by 25.)

Question 10
Fill in the Blank

If each value is increased by 10, the standard deviation ________

Answer: Remains unchanged (stays the same)

Adding a constant shifts all values but doesn’t change their spread. So \( \sigma \) is unchanged. (Only the mean increases by 10.)

Section C: Short Answer Questions

Question 11
Short Answer

Find the mean and variance of: 6, 10, 4, 8, 12

\[ \sum x = 40, \quad \bar{x} = \frac{40}{5} = 8 \] \[ \sum x^2 = 36 + 100 + 16 + 64 + 144 = 360 \] \[ \sigma^2 = \frac{360}{5} – 8^2 = 72 – 64 = 8 \]

Mean = 8, Variance = 8

Question 12
Short Answer

Explain the difference between class limit and class boundary with an example.

Class limits are the smallest and largest values that can belong to a class. For example, in the class 20–29, the lower limit is 20 and the upper limit is 29.

Class boundaries are adjusted values that close the gap between consecutive classes. For classes 20–29 and 30–39, the boundaries are 19.5, 29.5, 39.5. Boundary = limit − 0.5 (lower) or limit + 0.5 (upper) when data is in whole numbers.

Why boundaries? If a value is 29.6, does it go in 20–29 or 30–39? With boundaries, 29.5 is the dividing line, so 29.6 clearly goes in the second class.

Question 13
Short Answer

A dataset has mean 40 and standard deviation 5. A second dataset has mean 80 and standard deviation 8. Which dataset is more variable?

\[ CV_1 = \frac{5}{40} \times 100\% = 12.5\% \] \[ CV_2 = \frac{8}{80} \times 100\% = 10\% \]

Since \( CV_1 > CV_2 \), the first dataset is more variable relative to its mean.

Note: Even though the second dataset has a larger standard deviation (8 vs 5), its mean is also much larger, so relative to its mean, it is actually less variable!

Question 14
Short Answer

Find the median of: 3, 7, 2, 9, 1, 5, 8, 4, 6

Arrange in order: 1, 2, 3, 4, 5, 6, 7, 8, 9

\( n = 9 \) (odd). Position = \( \frac{9+1}{2} = 5 \)

The 5th value is 5. Median = 5.

Section D: Step-by-Step Calculation Questions

Question 15
Calculation

Find the mean, median, and mode for the following distribution:

Class | 0–4 | 5–9 | 10–14 | 15–19 | 20–24 | 25–29
Frequency | 2 | 5 | 12 | 18 | 8 | 3

Mean:

Class | \(f\) | \(x\) | \(fx\)
0 – 4 | 2 | 2 | 4
5 – 9 | 5 | 7 | 35
10 – 14 | 12 | 12 | 144
15 – 19 | 18 | 17 | 306
20 – 24 | 8 | 22 | 176
25 – 29 | 3 | 27 | 81
Total | 48 | | 746
\[ \bar{x} = \frac{746}{48} \approx 15.54 \]

Median: \( \frac{n}{2} = 24 \). CF: 2, 7, 19, 37, … → 24th value in class 15–19.

\[ \text{Median} = 14.5 + \frac{24-19}{18} \times 5 = 14.5 + 1.39 = 15.89 \]

Mode: Modal class = 15–19. \( L = 14.5 \), \( f_1 = 18 \), \( f_0 = 12 \), \( f_2 = 8 \), \( h = 5 \)

\[ \text{Mode} = 14.5 + \frac{18-12}{36-12-8} \times 5 = 14.5 + \frac{30}{16} = 14.5 + 1.875 = 16.375 \]

Mean ≈ 15.54, Median ≈ 15.89, Mode ≈ 16.38

Question 16
Calculation

Calculate the standard deviation and coefficient of variation for the data below:

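Class | 50–59 | 60–69 | 70–79 | 80–89 | 90–99
Frequency | 6 | 14 | 22 | 12 | 6

Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\)
50 – 59 | 6 | 54.5 | 327 | 2970.25 | 17821.5
60 – 69 | 14 | 64.5 | 903 | 4160.25 | 58243.5
70 – 79 | 22 | 74.5 | 1639 | 5550.25 | 122105.5
80 – 89 | 12 | 84.5 | 1014 | 7140.25 | 85683
90 – 99 | 6 | 94.5 | 567 | 8930.25 | 53581.5
Total | 60 | | 4450 | | 337435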
\[ \bar{x} = \frac{4450}{60} \approx 74.17 \] \[ \sigma^2 = \frac{337435}{60} - \left(\frac{4450}{60}\right)^2 = 5623.917 - 5500.694 \approx 123.22 \] \[ \sigma = \sqrt{123.22} \approx 11.10 \] \[ CV = \frac{11.10}{74.17} \times 100\% \approx 14.97\% \]
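
A quick Python check of this calculation is sketched below; the variable names are only illustrative. Notice that the code squares the unrounded mean. If you square the rounded value 74.17 instead, you get 5501.19 and the variance drops to about 122.73, so small rounding in the mean can shift the last digits of \( \sigma \) and CV.

```python
from math import sqrt

# Midpoints and frequencies from Question 16.
midpoints = [54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [6, 14, 22, 12, 6]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
variance = sum_fx2 / n - mean ** 2   # population variance, full precision
sd = sqrt(variance)
cv = sd / mean * 100

print(round(mean, 2), round(variance, 2), round(sd, 2), round(cv, 2))
```
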
Question 17
Calculation

Find \( Q_1 \), \( Q_3 \), and IQR for the following data:

| Class | 10–19 | 20–29 | 30–39 | 40–49 | 50–59 | 60–69 |
|---|---|---|---|---|---|---|
| Frequency | 4 | 8 | 15 | 20 | 10 | 3 |

CF table:

| Class | \(f\) | CF |
|---|---|---|
| 10 – 19 | 4 | 4 |
| 20 – 29 | 8 | 12 |
| 30 – 39 | 15 | 27 |
| 40 – 49 | 20 | 47 |
| 50 – 59 | 10 | 57 |
| 60 – 69 | 3 | 60 |

\( n = 60 \)

\( Q_1 \): \( \frac{60}{4} = 15 \). 15th value in class 30–39 (CF before = 12).

\[ Q_1 = 29.5 + \frac{15-12}{15} \times 10 = 29.5 + 2 = 31.5 \]

\( Q_3 \): \( \frac{3 \times 60}{4} = 45 \). 45th value in class 40–49 (CF before = 27).

\[ Q_3 = 39.5 + \frac{45-27}{20} \times 10 = 39.5 + 9 = 48.5 \]
\[ IQR = 48.5 - 31.5 = 17 \]
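
The quartile formula follows one pattern: find the class where the cumulative frequency first reaches the target position, then interpolate inside it. A minimal Python sketch of that pattern is given below. The function name `grouped_quantile` is hypothetical, and it assumes equal class widths and lower class boundaries listed in increasing order.

```python
def grouped_quantile(lower_boundaries, freqs, width, p):
    """Interpolated value below which a fraction p (0 < p <= 1) of the data lies."""
    n = sum(freqs)
    target = p * n
    cf_before = 0
    for lb, f in zip(lower_boundaries, freqs):
        # The quantile class is the first one whose CF reaches the target.
        if f > 0 and cf_before + f >= target:
            return lb + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("p must be between 0 and 1")

# Data from Question 17 (lower class boundaries, not lower limits).
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
freqs = [4, 8, 15, 20, 10, 3]

q1 = grouped_quantile(boundaries, freqs, 10, 0.25)
q3 = grouped_quantile(boundaries, freqs, 10, 0.75)
iqr = q3 - q1
print(round(q1, 2), round(q3, 2), round(iqr, 2))
```

Calling the same function with `p = 0.5` gives the median, and a value such as `p = 0.30` gives a percentile.
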
Question 18
Calculation

The mean and standard deviation of 25 observations are 50 and 4 respectively. If each observation is multiplied by 3 and then 5 is added, find the new mean and new standard deviation.

If \( y = 3x + 5 \), then:

\[ \bar{y} = 3\bar{x} + 5 = 3(50) + 5 = 155 \] \[ \sigma_y = |3| \cdot \sigma_x = 3 \times 4 = 12 \]

Rules: Multiplying by \( k \) multiplies the mean by \( k \) and the SD by \( |k| \). Adding a constant shifts the mean but doesn’t change the SD.

New mean = 155, New standard deviation = 12
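
You can convince yourself of these rules with a small experiment. The sketch below uses a hypothetical five-value dataset (not the 25 observations in the question) chosen so that its mean is 50 and its population SD is 4. It also tries a negative multiplier, where the mean changes sign but the SD stays positive.

```python
from statistics import fmean, pstdev

# Hypothetical data with mean 50 and population SD 4 (illustration only).
x = [44, 48, 50, 52, 56]

def transform(data, k, c):
    """Apply y = k*x + c to every observation."""
    return [k * v + c for v in data]

y = transform(x, 3, 5)    # y = 3x + 5
z = transform(x, -3, 5)   # z = -3x + 5

print(fmean(x), pstdev(x))
print(fmean(y), pstdev(y))   # mean 3*50 + 5, SD 3*4
print(fmean(z), pstdev(z))   # mean -3*50 + 5, SD |-3|*4
```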

Question 19
Calculation

Two sections of Grade 12 took a mathematics exam. The results are:

Section A: \( n = 40 \), \( \bar{x} = 65 \), \( \sigma = 8 \)

Section B: \( n = 50 \), \( \bar{x} = 58 \), \( \sigma = 10 \)

(a) Find the combined mean.

(b) Which section performed more consistently?

(a) Combined mean:

\[ \bar{x}_{combined} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{40(65) + 50(58)}{90} = \frac{2600 + 2900}{90} = \frac{5500}{90} \approx 61.11 \]

(b) Consistency:

\[ CV_A = \frac{8}{65} \times 100\% \approx 12.31\% \] \[ CV_B = \frac{10}{58} \times 100\% \approx 17.24\% \]

Since \( CV_A < CV_B \), Section A performed more consistently.
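
The combined mean is a weighted average, so it is not the same as simply averaging 65 and 58. A short Python sketch (the function name `combined_mean` is just for illustration) makes the difference visible:

```python
def combined_mean(groups):
    """groups is a list of (n, mean) pairs; returns the size-weighted mean."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

sections = [(40, 65), (50, 58)]       # Section A, Section B
weighted = combined_mean(sections)    # weights each mean by its section size
unweighted = (65 + 58) / 2            # ignores section sizes

print(round(weighted, 2), unweighted)
```

The weighted result is pulled toward Section B's mean because Section B has more students.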

Question 20
Calculation

Using the frequency distribution below:

| Class | 0–9 | 10–19 | 20–29 | 30–39 | 40–49 |
|---|---|---|---|---|---|
| Frequency | 5 | 10 | 25 | 8 | 2 |

(a) Calculate the mean using the assumed mean method (take \( A = 24.5 \)).

(b) Calculate the variance and standard deviation.

(c) Verify the relationship: Mode ≈ 3 × Median − 2 × Mean

(a) Mean (assumed mean method):

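| Class | \(f\) | \(x\) | \(d = x - 24.5\) | \(fd\) |
|---|---|---|---|---|
| 0 – 9 | 5 | 4.5 | \(-20\) | \(-100\) |
| 10 – 19 | 10 | 14.5 | \(-10\) | \(-100\) |
| 20 – 29 | 25 | 24.5 | \(0\) | \(0\) |
| 30 – 39 | 8 | 34.5 | \(10\) | \(80\) |
| 40 – 49 | 2 | 44.5 | \(20\) | \(40\) |
| Total | 50 | | | \(-80\) |

\[ \bar{x} = 24.5 + \frac{-80}{50} = 24.5 - 1.6 = 22.9 \]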

(b) Variance and SD: We need \( \sum fx^2 \). From part (a), \( \sum fx = n\bar{x} = 50 \times 22.9 = 1145 \).

| Class | \(f\) | \(x\) | \(x^2\) | \(fx^2\) |
|---|---|---|---|---|
| 0 – 9 | 5 | 4.5 | 20.25 | 101.25 |
| 10 – 19 | 10 | 14.5 | 210.25 | 2102.5 |
| 20 – 29 | 25 | 24.5 | 600.25 | 15006.25 |
| 30 – 39 | 8 | 34.5 | 1190.25 | 9522 |
| 40 – 49 | 2 | 44.5 | 1980.25 | 3960.5 |
| Total | 50 | | | 30692.5 |

\[ \sigma^2 = \frac{30692.5}{50} - (22.9)^2 = 613.85 - 524.41 = 89.44 \] \[ \sigma = \sqrt{89.44} \approx 9.46 \]

(c) Verify Mode ≈ 3 × Median − 2 × Mean:

Median: \( \frac{50}{2} = 25 \). 25th value in class 20–29 (CF before = 15).

\[ \text{Median} = 19.5 + \frac{25-15}{25} \times 10 = 19.5 + 4 = 23.5 \]

Mode: Modal class = 20–29. \( L = 19.5 \), \( f_1 = 25 \), \( f_0 = 10 \), \( f_2 = 8 \), \( h = 10 \)

\[ \text{Mode} = 19.5 + \frac{25-10}{50-10-8} \times 10 = 19.5 + \frac{150}{32} = 19.5 + 4.6875 = 24.19 \]

Check: \( 3 \times 23.5 - 2 \times 22.9 = 70.5 - 45.8 = 24.7 \)

\[ \text{Mode} = 24.19 \approx 24.7 = 3 \times \text{Median} - 2 \times \text{Mean} \]

The approximation is close! (Small differences are normal since this is an empirical relationship, not exact.)

Question 21
Calculation

The following data represents the weights (in kg) of 80 students. Find \( P_{30} \) (the 30th percentile).

| Weight (kg) | 40–44 | 45–49 | 50–54 | 55–59 | 60–64 | 65–69 |
|---|---|---|---|---|---|---|
| Frequency | 6 | 14 | 22 | 20 | 12 | 6 |

CF table:

| Class | \(f\) | CF |
|---|---|---|
| 40 – 44 | 6 | 6 |
| 45 – 49 | 14 | 20 |
| 50 – 54 | 22 | 42 |
| 55 – 59 | 20 | 62 |
| 60 – 64 | 12 | 74 |
| 65 – 69 | 6 | 80 |

\( \frac{30 \times 80}{100} = 24 \). The 24th value falls in class 50–54 (CF before = 20).

\[ P_{30} = 49.5 + \frac{24 - 20}{22} \times 5 = 49.5 + \frac{20}{22} = 49.5 + 0.91 = 50.41 \text{ kg} \]

This means 30% of students weigh less than approximately 50.41 kg.

Question 22
Calculation

The variance of 20 observations is 16. If each observation is increased by 3, find the new variance. If each observation is instead multiplied by 2, find the new variance.

Case 1: Adding 3 to each observation

Adding a constant does not change the spread. New variance = 16 (unchanged).

Case 2: Multiplying each observation by 2

If \( y = 2x \), then \( \sigma_y^2 = 2^2 \times \sigma_x^2 = 4 \times 16 = 64 \).

New variance = 64

Rule: Adding a constant → variance unchanged. Multiplying by \( k \) → variance multiplied by \( k^2 \).
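
Where do these two rules come from? A short derivation, written with the population variance \( \sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 \) and with \( c \) and \( k \) standing for any constants, goes like this:

\[ y_i = x_i + c \;\Rightarrow\; \bar{y} = \bar{x} + c \;\Rightarrow\; y_i - \bar{y} = x_i - \bar{x} \;\Rightarrow\; \sigma_y^2 = \sigma_x^2 \]

\[ y_i = k x_i \;\Rightarrow\; \bar{y} = k\bar{x} \;\Rightarrow\; y_i - \bar{y} = k(x_i - \bar{x}) \;\Rightarrow\; \sigma_y^2 = \frac{1}{n}\sum k^2 (x_i - \bar{x})^2 = k^2 \sigma_x^2 \]

In the first case the constant cancels inside every deviation. In the second case each deviation is scaled by \( k \), so each squared deviation is scaled by \( k^2 \).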

Question 23
Calculation

A teacher recorded the following marks. Use the assumed mean method to find the mean and then calculate the standard deviation:

| Marks | 30–39 | 40–49 | 50–59 | 60–69 | 70–79 | 80–89 | 90–99 |
|---|---|---|---|---|---|---|---|
| Students | 3 | 7 | 12 | 18 | 15 | 8 | 2 |

Take \( A = 64.5 \) (midpoint of the class with highest frequency, 60–69). Class width \( h = 10 \). Use \( u = \frac{x - A}{h} \):

| Class | \(f\) | \(x\) | \(u = \frac{x-64.5}{10}\) | \(fu\) | \(fu^2\) |
|---|---|---|---|---|---|
| 30 – 39 | 3 | 34.5 | \(-3\) | \(-9\) | 27 |
| 40 – 49 | 7 | 44.5 | \(-2\) | \(-14\) | 28 |
| 50 – 59 | 12 | 54.5 | \(-1\) | \(-12\) | 12 |
| 60 – 69 | 18 | 64.5 | \(0\) | \(0\) | 0 |
| 70 – 79 | 15 | 74.5 | \(1\) | \(15\) | 15 |
| 80 – 89 | 8 | 84.5 | \(2\) | \(16\) | 32 |
| 90 – 99 | 2 | 94.5 | \(3\) | \(6\) | 18 |
| Total | 65 | | | \(2\) | \(132\) |

\[ \bar{x} = A + \frac{\sum fu}{n} \times h = 64.5 + \frac{2}{65} \times 10 = 64.5 + 0.31 = 64.81 \]
\[ \sigma = h \times \sqrt{\frac{\sum fu^2}{n} - \left(\frac{\sum fu}{n}\right)^2} = 10 \times \sqrt{\frac{132}{65} - \left(\frac{2}{65}\right)^2} \]
\[ = 10 \times \sqrt{2.0308 - 0.0009} = 10 \times \sqrt{2.0299} = 10 \times 1.425 = 14.25 \]

Mean ≈ 64.81, Standard Deviation ≈ 14.25

The step-deviation method (using \( u \)) makes calculations much easier when class widths are equal. Remember: \( \sigma = h \times \sigma_u \).
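
To see that the shortcut really gives the same answer as the long method, here is a minimal Python sketch that computes the mean and SD both ways for the marks above. The variable names are illustrative only.

```python
from math import sqrt, isclose

# Midpoints and frequencies from Question 23.
midpoints = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [3, 7, 12, 18, 15, 8, 2]
A, h = 64.5, 10   # assumed mean and class width

n = sum(freqs)

# Step-deviation method: work with small coded values u.
u = [(x - A) / h for x in midpoints]
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui ** 2 for f, ui in zip(freqs, u))
mean_step = A + sum_fu / n * h
sd_step = h * sqrt(sum_fu2 / n - (sum_fu / n) ** 2)

# Direct method: work with the midpoints themselves.
mean_direct = sum(f * x for f, x in zip(freqs, midpoints)) / n
sd_direct = sqrt(sum(f * (x - mean_direct) ** 2 for f, x in zip(freqs, midpoints)) / n)

print(round(mean_step, 2), round(sd_step, 2))
print(isclose(mean_step, mean_direct), isclose(sd_step, sd_direct))
```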

Question 24
Calculation

List the points to plot for an ogive from the data below. Then estimate the median graphically and verify with the formula.

| Class | 0–4 | 5–9 | 10–14 | 15–19 | 20–24 |
|---|---|---|---|---|---|
| Frequency | 3 | 7 | 12 | 6 | 2 |

Points to plot (Upper Boundary, CF):

| Class | \(f\) | Upper Boundary | CF |
|---|---|---|---|
| 0 – 4 | 3 | 4.5 | 3 |
| 5 – 9 | 7 | 9.5 | 10 |
| 10 – 14 | 12 | 14.5 | 22 |
| 15 – 19 | 6 | 19.5 | 28 |
| 20 – 24 | 2 | 24.5 | 30 |

Plot points: (−0.5, 0), (4.5, 3), (9.5, 10), (14.5, 22), (19.5, 28), (24.5, 30)

Graphical estimate: \( \frac{n}{2} = 15 \). From y = 15, the curve gives approximately \( x \approx 12 \).

Formula verification: 15th value in class 10–14 (CF before = 10).

\[ \text{Median} = 9.5 + \frac{15-10}{12} \times 5 = 9.5 + 2.08 = 11.58 \]

The graphical estimate (≈12) is close to the formula answer (11.58). The small difference is expected with graphical methods.
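
Reading a value off an ogive is the same as interpolating between two plotted points. The Python sketch below assumes the points are joined by straight lines (the function name `read_ogive` is hypothetical). Under that assumption the reading matches the formula exactly, while a smooth hand-drawn curve will usually differ a little.

```python
# Ogive points from Question 24: (upper boundary, cumulative frequency).
points = [(-0.5, 0), (4.5, 3), (9.5, 10), (14.5, 22), (19.5, 28), (24.5, 30)]

def read_ogive(points, cf_target):
    """Return x where a straight-line ogive reaches the given cumulative frequency."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= cf_target <= y1 and y1 > y0:
            # Linear interpolation between the two neighbouring points.
            return x0 + (cf_target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target outside the plotted range")

n = points[-1][1]
median_estimate = read_ogive(points, n / 2)
print(round(median_estimate, 2))
```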

Question 25
Calculation

Prove that the sum of deviations of all observations from their mean is zero, i.e., \( \sum (x_i - \bar{x}) = 0 \). Then verify with the data: 4, 8, 6, 10, 2.

Proof:

\[ \sum (x_i - \bar{x}) = \sum x_i - \sum \bar{x} = \sum x_i - n\bar{x} \]

Since \( \bar{x} = \frac{\sum x_i}{n} \), we have \( n\bar{x} = \sum x_i \).

\[ \sum (x_i - \bar{x}) = \sum x_i - \sum x_i = 0 \]

Verification: Data: 4, 8, 6, 10, 2. Mean = \( \frac{30}{5} = 6 \).

\[ \sum (x_i - \bar{x}) = (4-6) + (8-6) + (6-6) + (10-6) + (2-6) \] \[ = -2 + 2 + 0 + 4 - 4 = 0 \quad \checkmark \]

This is exactly why we square the deviations when calculating variance — if we didn’t, the sum would always be zero!
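
The same check takes only a few lines of Python. This small sketch (variable names are illustrative) shows the plain deviations cancelling while the squared deviations do not:

```python
data = [4, 8, 6, 10, 2]
mean = sum(data) / len(data)

deviations = [x - mean for x in data]
sum_dev = sum(deviations)                      # positive and negative parts cancel
sum_sq_dev = sum(d ** 2 for d in deviations)   # squares are never negative
variance = sum_sq_dev / len(data)              # population variance

print(sum_dev, sum_sq_dev, variance)   # prints: 0.0 40.0 8.0
```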
