1. What is Statistics?
Dear student, welcome to Unit 3 of your Grade 12 Mathematics! In this unit, we study Statistics — the branch of mathematics that deals with collecting, organizing, summarizing, and analyzing data.
Think about it: when you hear that “the average score of Grade 12 students in mathematics is 65,” someone has collected data from many students, calculated a summary number, and presented it. That is statistics in action!
In Grade 12, we focus on two big areas:
- Measures of Central Tendency — Mean, Median, Mode (where is the center of the data?)
- Measures of Dispersion (Spread) — Range, Variance, Standard Deviation, Coefficient of Variation (how spread out is the data?)
We also study grouped data (data organized in frequency tables) and learn how to draw and interpret ogive curves (cumulative frequency curves).
Are you ready? Let’s go step by step!
2. Types of Data
Before we calculate anything, we need to understand our data:
2.1 Ungrouped (Raw) Data
This is data listed individually, like: 12, 15, 18, 20, 15, 22, 19
Each value is shown separately. We can work with it directly.
2.2 Grouped Data
When we have a lot of data, we organize it into a frequency distribution table with class intervals. For example:
| Class Interval | Frequency (\(f\)) |
|---|---|
| 10 – 19 | 4 |
| 20 – 29 | 7 |
| 30 – 39 | 10 |
| 40 – 49 | 5 |
2.3 Key Terms for Grouped Data
- Class interval: The range of values in a group (e.g., 10–19)
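- Class width (size): Upper limit − Lower limit = \( 19 - 10 = 9 \). For continuous data, use the class boundaries instead: \( 19.5 - 9.5 = 10 \). The grouped-data formulas in this unit use this boundary-based width \( h \).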
- Class midpoint (mark): \( \frac{\text{Upper} + \text{Lower}}{2} \). For 10–19: \( \frac{10+19}{2} = 14.5 \)
- Frequency (\(f\)): How many values fall in that class
- Cumulative frequency: Running total of frequencies
- Class boundary: Used for continuous data to close gaps. If classes are 10–19, 20–29, boundaries are 9.5, 19.5, 29.5
Why do we use class boundaries? Because if one class ends at 19 and the next starts at 20, where does 19.5 go? Class boundaries remove this confusion.
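The key terms above can be checked with a short Python sketch. The function name `class_summary` and the `gap` parameter are illustrative assumptions (they are not part of the unit); `gap` is the distance between one class's upper limit and the next class's lower limit, which is 1 for whole-number classes such as 10–19 and 20–29.

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, width, midpoint) for one class."""
    half_gap = gap / 2
    lower_boundary = lower_limit - half_gap   # e.g. 10 - 0.5 = 9.5
    upper_boundary = upper_limit + half_gap   # e.g. 19 + 0.5 = 19.5
    width = upper_boundary - lower_boundary   # width measured between boundaries
    midpoint = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, width, midpoint


print(class_summary(10, 19))  # (9.5, 19.5, 10.0, 14.5)
```

Note that the midpoint is the same whether it is computed from the limits or from the boundaries.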
3. Measures of Central Tendency — Mean
The mean (arithmetic average) is the most common measure of central tendency. Let’s learn it for both ungrouped and grouped data.
3.1 Mean of Ungrouped Data
Worked Example 1: Five students scored: 72, 85, 90, 68, 80 in a mathematics test. Find the mean score.
Solution: \( \bar{x} = \frac{\sum x}{n} = \frac{72 + 85 + 90 + 68 + 80}{5} = \frac{395}{5} = 79 \)
The mean score is 79.
3.2 Mean of Grouped Data
For grouped data, we use the class midpoints \( x_i \) and frequencies \( f_i \): \( \bar{x} = \frac{\sum f_i x_i}{\sum f_i} \), where \( \sum f_i = n \) is the total frequency.
Worked Example 2: Find the mean from the following frequency distribution:
| Class | Frequency (\(f\)) | Midpoint (\(x\)) | \(f \cdot x\) |
|---|---|---|---|
| 10 – 19 | 4 | 14.5 | 58 |
| 20 – 29 | 7 | 24.5 | 171.5 |
| 30 – 39 | 10 | 34.5 | 345 |
| 40 – 49 | 5 | 44.5 | 222.5 |
| Total | \(n = 26\) | | \(\sum fx = 797\) |
\( \bar{x} = \frac{\sum fx}{n} = \frac{797}{26} \approx 30.65 \)
The mean is approximately 30.65.
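As a cross-check of the grouped-mean calculation (sum of frequency times midpoint, divided by total frequency), here is a minimal Python sketch. The function name `grouped_mean` is an illustrative assumption, and classes are passed as (lower limit, upper limit) pairs.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f * x) / sum(f), with x the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return total_fx / sum(freqs)


classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [4, 7, 10, 5]
print(round(grouped_mean(classes, freqs), 2))  # 30.65
```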
3.3 Mean Using Assumed Mean (Shortcut Method)
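When midpoints are large numbers, we can use a shortcut to reduce calculation errors. Choose an assumed mean \( A \) (usually a midpoint near the center), compute the deviations \( d = x - A \), and use \( \bar{x} = A + \frac{\sum fd}{\sum f} \).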
Worked Example 3: Using the same data as Example 2, find the mean using the assumed mean method with \( A = 34.5 \).
| Class | \(f\) | \(x\) | \(d = x – 34.5\) | \(f \cdot d\) |
|---|---|---|---|---|
| 10 – 19 | 4 | 14.5 | \(-20\) | \(-80\) |
| 20 – 29 | 7 | 24.5 | \(-10\) | \(-70\) |
| 30 – 39 | 10 | 34.5 | \(0\) | \(0\) |
| 40 – 49 | 5 | 44.5 | \(10\) | \(50\) |
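| Total | 26 | | | \(-100\) |
\( \bar{x} = A + \frac{\sum fd}{\sum f} = 34.5 + \frac{-100}{26} \approx 34.5 - 3.85 = 30.65 \)
Same answer! This method is faster and less prone to arithmetic errors because the deviations \( d \) are small, round numbers. Can you see why choosing \( A \) from the middle class gives smaller \( d \) values?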
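The assumed mean method (add the average deviation from \( A \) back onto \( A \)) can be sketched in Python as follows. The function name `assumed_mean` is an illustrative assumption. The sketch also shows that any choice of \( A \) gives the same mean; a central \( A \) only makes the hand arithmetic easier.

```python
def assumed_mean(midpoints, freqs, A):
    """Mean via the assumed mean method: A + sum(f * d) / sum(f), d = x - A."""
    deviations = [x - A for x in midpoints]
    total_fd = sum(f * d for f, d in zip(freqs, deviations))
    return A + total_fd / sum(freqs)


midpoints = [14.5, 24.5, 34.5, 44.5]
freqs = [4, 7, 10, 5]
print(round(assumed_mean(midpoints, freqs, A=34.5), 2))  # 30.65
print(round(assumed_mean(midpoints, freqs, A=24.5), 2))  # 30.65 (same mean)
```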
The marks of 40 students are given in the frequency table below. Find the mean mark.
| Class | 0–9 | 10–19 | 20–29 | 30–39 | 40–49 |
|---|---|---|---|---|---|
| Frequency | 3 | 8 | 15 | 10 | 4 |
Solution:
| Class | \(f\) | \(x\) | \(fx\) |
|---|---|---|---|
| 0 – 9 | 3 | 4.5 | 13.5 |
| 10 – 19 | 8 | 14.5 | 116 |
| 20 – 29 | 15 | 24.5 | 367.5 |
| 30 – 39 | 10 | 34.5 | 345 |
| 40 – 49 | 4 | 44.5 | 178 |
| Total | 40 | | 1020 |
\( \bar{x} = \frac{\sum fx}{n} = \frac{1020}{40} = 25.5 \)
The mean mark is 25.5.
4. Measures of Central Tendency — Median
The median is the middle value when data is arranged in order. It divides the data into two equal halves.
4.1 Median of Ungrouped Data
- If \( n \) is odd: Median = value at position \( \frac{n+1}{2} \)
- If \( n \) is even: Median = average of values at positions \( \frac{n}{2} \) and \( \frac{n}{2} + 1 \)
Worked Example 4: Find the median of: 13, 7, 22, 15, 9
Solution: First arrange in order: 7, 9, 13, 15, 22
\( n = 5 \) (odd), so position = \( \frac{5+1}{2} = 3 \)
The 3rd value is 13. Median = 13.
Worked Example 5: Find the median of: 20, 35, 18, 42, 28, 31
Solution: Arrange in order: 18, 20, 28, 31, 35, 42
\( n = 6 \) (even), so we average the values at positions 3 and 4: Median = \( \frac{28 + 31}{2} = 29.5 \).
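The two position rules for the ungrouped median (odd \( n \) and even \( n \)) can be sketched in Python as below. The function name `median` is an illustrative choice; Python lists are indexed from 0, so position \( k \) in the notes is index \( k - 1 \) in the code.

```python
def median(values):
    """Median of ungrouped data, following the odd/even position rules."""
    ordered = sorted(values)             # always arrange in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]              # position (n + 1) / 2
    # even n: average the values at positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([13, 7, 22, 15, 9]))         # 13
print(median([20, 35, 18, 42, 28, 31]))   # 29.5
```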
4.2 Median of Grouped Data
For grouped data, we use a formula based on the cumulative frequency: \( \text{Median} = L + \frac{\frac{n}{2} - CF}{f} \times h \), where:
- \( L \) = lower class boundary of the median class
- \( n \) = total frequency
- \( CF \) = cumulative frequency of the class before the median class
- \( f \) = frequency of the median class
- \( h \) = class width (using boundaries)
Worked Example 6: Find the median from the following distribution:
| Class | Frequency | CF |
|---|---|---|
| 10 – 19 | 4 | 4 |
| 20 – 29 | 7 | 11 |
| 30 – 39 | 10 | 21 |
| 40 – 49 | 5 | 26 |
Solution:
\( n = 26 \), so \( \frac{n}{2} = 13 \)
The 13th value falls in the class 30 – 39 (since CF reaches 11 before this class, and 21 after).
\( L = 29.5 \) (lower boundary), \( CF = 11 \), \( f = 10 \), \( h = 10 \) (boundaries: 29.5 to 39.5)
Median = \( 29.5 + \frac{13 - 11}{10} \times 10 = 29.5 + 2 = 31.5 \)
- Always arrange ungrouped data in order first!
- For grouped data, always construct a cumulative frequency column
- Use class boundaries (not limits) for \( L \) and \( h \)
- Common error: Forgetting that \( CF \) is the cumulative frequency BEFORE the median class, not including it
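The grouped median procedure (find the class where the cumulative frequency first reaches \( \frac{n}{2} \), then interpolate inside it from the lower boundary) can be sketched in Python. The function name `grouped_median` and the `gap` parameter (distance between adjacent class limits, 1 for whole-number classes) are illustrative assumptions.

```python
def grouped_median(classes, freqs, gap=1):
    """Median = L + (n/2 - CF) / f * h, using class boundaries for L and h."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = n / 2
    cf_before = 0                          # cumulative frequency BEFORE the current class
    for (lo, hi), f in zip(classes, freqs):
        if cf_before + f >= target:        # this is the median class
            L = lo - gap / 2               # lower class boundary
            h = (hi + gap / 2) - L         # width between boundaries
            return L + (target - cf_before) / f * h
        cf_before += f
    raise ValueError("classes and freqs must have the same length")


classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
print(grouped_median(classes, [4, 7, 10, 5]))  # 31.5
```

Keeping `cf_before` separate from the running total mirrors the tip above: the formula needs the cumulative frequency before the median class, not including it.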
Find the median of the following data: 45, 32, 67, 52, 41, 38, 59, 73
Solution: Arrange in order: 32, 38, 41, 45, 52, 59, 67, 73
\( n = 8 \) (even). Positions: \( \frac{8}{2} = 4 \) and \( \frac{8}{2} + 1 = 5 \)
Median = \( \frac{45 + 52}{2} = 48.5 \)
Find the median for the grouped data below:
| Class | 0–9 | 10–19 | 20–29 | 30–39 | 40–49 |
|---|---|---|---|---|---|
| Frequency | 5 | 12 | 18 | 10 | 5 |
Solution: Build CF table:
| Class | \(f\) | CF |
|---|---|---|
| 0 – 9 | 5 | 5 |
| 10 – 19 | 12 | 17 |
| 20 – 29 | 18 | 35 |
| 30 – 39 | 10 | 45 |
| 40 – 49 | 5 | 50 |
\( n = 50 \), \( \frac{n}{2} = 25 \). The 25th value falls in class 20 – 29 (CF before = 17).
\( L = 19.5 \), \( CF = 17 \), \( f = 18 \), \( h = 10 \)
Median = \( 19.5 + \frac{25 - 17}{18} \times 10 \approx 19.5 + 4.44 = 23.94 \)
5. Measures of Central Tendency — Mode
The mode is the value that occurs most frequently.
5.1 Mode of Ungrouped Data
Simply identify the value with the highest frequency.
Worked Example 7: Find the mode of: 5, 7, 7, 3, 7, 9, 5, 7, 3
Solution: The value 7 appears 4 times (most frequent). Mode = 7.
What if two values appear equally often? Then the data is bimodal — it has two modes!
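For ungrouped data, Python's standard library can list every mode, which handles the bimodal case as well. This sketch uses `statistics.multimode` (available from Python 3.8); the second data list is a made-up illustration, not taken from the unit.

```python
from statistics import multimode

# Data from Worked Example 7: the value 7 occurs four times
print(multimode([5, 7, 7, 3, 7, 9, 5, 7, 3]))  # [7]

# Hypothetical bimodal data: 4 and 6 each occur twice
print(multimode([2, 4, 4, 6, 6, 8]))           # [4, 6]
```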
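5.2 Mode of Grouped Data
For grouped data, the mode is estimated from the modal class: \( \text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h \), where: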
- \( L \) = lower class boundary of the modal class
- \( f_1 \) = frequency of the modal class (highest frequency)
- \( f_0 \) = frequency of the class before the modal class
- \( f_2 \) = frequency of the class after the modal class
- \( h \) = class width (using boundaries)
Worked Example 8: Find the mode from the distribution in Example 6.
Solution: The modal class is 30 – 39 (highest frequency = 10).
\( L = 29.5 \), \( f_1 = 10 \), \( f_0 = 7 \), \( f_2 = 5 \), \( h = 10 \)
Mode = \( 29.5 + \frac{10 - 7}{2(10) - 7 - 5} \times 10 = 29.5 + \frac{3}{8} \times 10 = 33.25 \)
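The grouped mode calculation can be sketched in Python as below. The function name `grouped_mode` and the `gap` parameter are illustrative assumptions. Two further assumptions are made in the code: if several classes tie for the highest frequency, the first is used, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0.

```python
def grouped_mode(classes, freqs, gap=1):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * h for the modal class."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])  # first class with highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                   # assumption: 0 if no class before
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0      # assumption: 0 if no class after
    lo, hi = classes[i]
    L = lo - gap / 2                                    # lower class boundary
    h = (hi + gap / 2) - L                              # width between boundaries
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * h


classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
print(grouped_mode(classes, [4, 7, 10, 5]))  # 33.25
```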
For the frequency distribution below, find: (a) Mean, (b) Median, (c) Mode
| Class | 5–9 | 10–14 | 15–19 | 20–24 | 25–29 |
|---|---|---|---|---|---|
| Frequency | 6 | 14 | 20 | 12 | 8 |
(a) Mean:
| Class | \(f\) | \(x\) | \(fx\) |
|---|---|---|---|
| 5 – 9 | 6 | 7 | 42 |
| 10 – 14 | 14 | 12 | 168 |
| 15 – 19 | 20 | 17 | 340 |
| 20 – 24 | 12 | 22 | 264 |
| 25 – 29 | 8 | 27 | 216 |
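| Total | 60 | | 1030 |
\( \bar{x} = \frac{1030}{60} \approx 17.17 \)
(b) Median: \( \frac{n}{2} = 30 \). The 30th value falls in class 15–19 (CF before = 20).
\( L = 14.5 \), \( CF = 20 \), \( f = 20 \), \( h = 5 \)
Median = \( 14.5 + \frac{30 - 20}{20} \times 5 = 14.5 + 2.5 = 17 \)
(c) Mode: Modal class = 15–19. \( L = 14.5 \), \( f_1 = 20 \), \( f_0 = 14 \), \( f_2 = 12 \), \( h = 5 \)
Mode = \( 14.5 + \frac{20 - 14}{2(20) - 14 - 12} \times 5 = 14.5 + \frac{6}{14} \times 5 \approx 16.64 \)
Note: The mode (16.64), median (17), and mean (17.17) are close together here, with Mode < Median < Mean. This pattern indicates a distribution that is nearly symmetric with a slight positive skew. The grouped mode always lies inside the modal class (here between 14.5 and 19.5), so a value outside that class signals an arithmetic error.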
6. Measures of Dispersion — Range
Central tendency tells us where the center is, but not how spread out the data is. For that, we need measures of dispersion. The simplest is the range: Range = Largest value − Smallest value.
Worked Example 9: Data: 15, 23, 8, 31, 19, 27. Find the range.
Solution: Range = \( 31 - 8 = 23 \).
The range is simple but only uses two values, so it ignores all the data in between. Let’s learn better measures!
7. Measures of Dispersion — Variance and Standard Deviation
The variance and standard deviation are the most important measures of spread. They consider every value in the data.
7.1 The Idea Behind Variance
We want to measure how far each value is from the mean. If we just average the deviations \( (x_i - \bar{x}) \), we get zero (they cancel out). So we square each deviation first, then average: \( \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} \).
7.2 Variance and Standard Deviation of Ungrouped Data
Variance: \( \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} \). Standard deviation: \( \sigma = \sqrt{\sigma^2} \).
Why standard deviation? Because variance is in squared units (e.g., “marks squared”), which are hard to interpret. Standard deviation is in the same units as the data.
Worked Example 10: Find the variance and standard deviation of: 4, 8, 6, 5, 7
Solution:
Step 1: Mean = \( \frac{4+8+6+5+7}{5} = \frac{30}{5} = 6 \)
Step 2: Calculate deviations and squared deviations:
| \(x\) | \(x – \bar{x}\) | \((x – \bar{x})^2\) |
|---|---|---|
| 4 | \(-2\) | 4 |
| 8 | \(2\) | 4 |
| 6 | \(0\) | 0 |
| 5 | \(-1\) | 1 |
| 7 | \(1\) | 1 |
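| Total | Sum = 0 | \(\sum = 10\) |
Step 3: Variance \( \sigma^2 = \frac{10}{5} = 2 \), so the standard deviation is \( \sigma = \sqrt{2} \approx 1.41 \).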
7.3 Shortcut Formula for Variance
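Calculating deviations is tedious. Use this equivalent formula: \( \sigma^2 = \frac{\sum x^2}{n} - \bar{x}^2 \)
Worked Example 11: Using the shortcut for the same data: 4, 8, 6, 5, 7
Solution: \( \sum x^2 = 16 + 64 + 36 + 25 + 49 = 190 \), so \( \sigma^2 = \frac{190}{5} - 6^2 = 38 - 36 = 2 \) and \( \sigma = \sqrt{2} \approx 1.41 \).
Same answer, but much faster! Can you see why this formula works? It comes from expanding \( \sum(x - \bar{x})^2 \) algebraically and using \( \sum x = n\bar{x} \).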
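To see that the definition (average of squared deviations) and the shortcut (mean of squares minus square of the mean) agree, here is a small Python sketch. Both function names are illustrative assumptions, and both divide by \( n \), as in this unit.

```python
from math import sqrt


def variance_definition(values):
    """Average of squared deviations from the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2


data = [4, 8, 6, 5, 7]
print(variance_definition(data), variance_shortcut(data))  # 2.0 2.0
print(round(sqrt(variance_shortcut(data)), 2))             # 1.41
```

On a computer the definition form is usually the safer choice for very large values, because the shortcut subtracts two nearly equal numbers; for hand calculation in exams the shortcut is more convenient.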
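7.4 Variance and Standard Deviation of Grouped Data
For grouped data, use the class midpoints \( x \) and frequencies \( f \): \( \sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2 \), where \( \bar{x} = \frac{\sum fx}{n} \) and \( n = \sum f \). Then \( \sigma = \sqrt{\sigma^2} \).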
Worked Example 12: Find the variance and standard deviation:
| Class | Frequency |
|---|---|
| 10 – 14 | 3 |
| 15 – 19 | 5 |
| 20 – 24 | 8 |
| 25 – 29 | 4 |
Solution:
| Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\) |
|---|---|---|---|---|---|
| 10 – 14 | 3 | 12 | 36 | 144 | 432 |
| 15 – 19 | 5 | 17 | 85 | 289 | 1445 |
| 20 – 24 | 8 | 22 | 176 | 484 | 3872 |
| 25 – 29 | 4 | 27 | 108 | 729 | 2916 |
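| Total | 20 | | 405 | | 8665 |
\( \bar{x} = \frac{405}{20} = 20.25 \)
\( \sigma^2 = \frac{8665}{20} - (20.25)^2 = 433.25 - 410.0625 = 23.1875 \)
\( \sigma = \sqrt{23.1875} \approx 4.82 \)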
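The table columns \( fx \) and \( fx^2 \) translate directly into code. The following Python sketch (function name `grouped_variance` is an illustrative assumption) computes the grouped variance as the mean of \( fx^2 \) minus the square of the mean, using class midpoints.

```python
from math import sqrt


def grouped_variance(midpoints, freqs):
    """Variance of grouped data: sum(f * x^2) / n - (sum(f * x) / n)^2."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    mean_of_squares = sum(f * x * x for f, x in zip(freqs, midpoints)) / n
    return mean_of_squares - mean ** 2


midpoints = [12, 17, 22, 27]   # midpoints of 10-14, 15-19, 20-24, 25-29
freqs = [3, 5, 8, 4]
variance = grouped_variance(midpoints, freqs)
print(round(variance, 4), round(sqrt(variance), 2))  # 23.1875 4.82
```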
8. Coefficient of Variation (CV)
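The coefficient of variation compares the spread of two different datasets, even if they have different units or very different means: \( CV = \frac{\sigma}{\bar{x}} \times 100\% \)
Worked Example 13: Two factories produce light bulbs. Factory A: mean life = 1200 hrs, SD = 80 hrs. Factory B: mean life = 1500 hrs, SD = 110 hrs. Which factory is more consistent?
Solution: \( CV_A = \frac{80}{1200} \times 100\% \approx 6.67\% \) and \( CV_B = \frac{110}{1500} \times 100\% \approx 7.33\% \).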
Since \( CV_A < CV_B \), Factory A is more consistent.
You cannot compare standard deviations directly when the means are different — that’s why we use CV!
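The comparison of the two factories can be reproduced with a very small Python sketch. The function name `coefficient_of_variation` is an illustrative assumption; it expresses the standard deviation as a percentage of the mean.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv_a = coefficient_of_variation(80, 1200)    # Factory A
cv_b = coefficient_of_variation(110, 1500)   # Factory B
print(round(cv_a, 2), round(cv_b, 2))        # 6.67 7.33
print("A" if cv_a < cv_b else "B", "is more consistent")
```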
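The heights of 50 students have mean 165 cm and variance 25 cm². Find the coefficient of variation.
Solution: \( \sigma = \sqrt{25} = 5 \) cm, so \( CV = \frac{5}{165} \times 100\% \approx 3.03\% \).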
For the data below, calculate: (a) Mean, (b) Variance, (c) Standard Deviation, (d) Coefficient of Variation
| Class | 20–24 | 25–29 | 30–34 | 35–39 | 40–44 |
|---|---|---|---|---|---|
| Frequency | 4 | 10 | 16 | 8 | 2 |
Solution:
| Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\) |
|---|---|---|---|---|---|
| 20 – 24 | 4 | 22 | 88 | 484 | 1936 |
| 25 – 29 | 10 | 27 | 270 | 729 | 7290 |
| 30 – 34 | 16 | 32 | 512 | 1024 | 16384 |
| 35 – 39 | 8 | 37 | 296 | 1369 | 10952 |
| 40 – 44 | 2 | 42 | 84 | 1764 | 3528 |
| Total | 40 | | 1250 | | 40090 |
(a) Mean: \( \bar{x} = \frac{1250}{40} = 31.25 \)
(b) Variance: \( \sigma^2 = \frac{40090}{40} – (31.25)^2 = 1002.25 – 976.5625 = 25.6875 \)
(c) Standard Deviation: \( \sigma = \sqrt{25.6875} \approx 5.07 \)
(d) CV: \( CV = \frac{5.07}{31.25} \times 100\% \approx 16.22\% \)
9. Quartiles and Percentiles
Quartiles and percentiles are measures of position — they tell us where a particular value stands in the dataset.
9.1 Quartiles
Quartiles divide ordered data into four equal parts:
- \( Q_1 \) (First Quartile): 25% of data is below it
- \( Q_2 \) (Second Quartile) = Median: 50% of data is below it
- \( Q_3 \) (Third Quartile): 75% of data is below it
9.2 Quartiles for Ungrouped Data
Worked Example 14: Find \( Q_1 \) and \( Q_3 \) for: 11, 15, 18, 22, 25, 30, 35, 40, 42
\( n = 9 \)
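Several conventions exist for locating quartiles in ungrouped data. The Python sketch below assumes one common textbook convention, which may differ from the one your teacher uses: \( Q_k \) sits at position \( \frac{k(n+1)}{4} \) in the ordered data, and a fractional position is resolved by interpolating between the two neighbouring values. The function name `quartile` is illustrative.

```python
def quartile(values, k):
    """Q_k under the ASSUMED convention: position k * (n + 1) / 4, interpolated."""
    ordered = sorted(values)
    n = len(ordered)
    pos = k * (n + 1) / 4          # 1-based position, may be fractional
    whole = int(pos)               # whole part of the position
    frac = pos - whole             # fractional part of the position
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + frac * (above - below)


data = [11, 15, 18, 22, 25, 30, 35, 40, 42]
print(quartile(data, 1), quartile(data, 3))  # 16.5 37.5
```

Under this convention, \( n = 9 \) gives positions 2.5 and 7.5, so each quartile is the average of two neighbouring values.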
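9.3 Quartiles for Grouped Data
The formula has the same structure as the grouped median, with \( \frac{n}{4} \) or \( \frac{3n}{4} \) in place of \( \frac{n}{2} \): \( Q_k = L + \frac{\frac{kn}{4} - CF}{f} \times h \), where \( L \), \( CF \), \( f \), and \( h \) refer to the class containing \( Q_k \).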
Worked Example 15: Find \( Q_1 \) and \( Q_3 \) from:
| Class | Frequency | CF |
|---|---|---|
| 0 – 9 | 5 | 5 |
| 10 – 19 | 12 | 17 |
| 20 – 29 | 18 | 35 |
| 30 – 39 | 10 | 45 |
| 40 – 49 | 5 | 50 |
Solution: \( n = 50 \)
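\( Q_1 \): \( \frac{n}{4} = 12.5 \). The 12.5th value falls in class 10–19 (CF before = 5).
\( Q_1 = 9.5 + \frac{12.5 - 5}{12} \times 10 = 9.5 + 6.25 = 15.75 \)
\( Q_3 \): \( \frac{3n}{4} = 37.5 \). The 37.5th value falls in class 30–39 (CF before = 35).
\( Q_3 = 29.5 + \frac{37.5 - 35}{10} \times 10 = 29.5 + 2.5 = 32 \)
9.4 Interquartile Range (IQR)
\( IQR = Q_3 - Q_1 \). It measures the spread of the middle 50% of the data.
From Example 15: \( IQR = 32 - 15.75 = 16.25 \)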
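9.5 Percentiles
Percentiles divide ordered data into 100 equal parts. For grouped data: \( P_k = L + \frac{\frac{kn}{100} - CF}{f} \times h \), where \( L \), \( CF \), \( f \), and \( h \) refer to the class containing \( P_k \).
For example, \( P_{30} \) means the value below which 30% of the data falls.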
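Grouped quartiles and percentiles follow the same interpolation pattern as the grouped median, only with a different target position. The Python sketch below uses one function for all of them; the name `grouped_quantile`, the `fraction` argument (0.25 for \( Q_1 \), 0.75 for \( Q_3 \), 0.30 for \( P_{30} \)), and the `gap` parameter are illustrative assumptions.

```python
def grouped_quantile(classes, freqs, fraction, gap=1):
    """Value below which `fraction` of grouped data falls: L + (target - CF) / f * h."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = fraction * n                  # n/4, 3n/4, kn/100, ...
    cf_before = 0                          # cumulative frequency before the current class
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cf_before + f >= target:
            L = lo - gap / 2               # lower class boundary
            h = (hi + gap / 2) - L         # width between boundaries
            return L + (target - cf_before) / f * h
        cf_before += f
    raise ValueError("fraction must be between 0 and 1")


classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
freqs = [5, 12, 18, 10, 5]
q1 = grouped_quantile(classes, freqs, 0.25)
q3 = grouped_quantile(classes, freqs, 0.75)
print(q1, q3, q3 - q1)  # 15.75 32.0 16.25
```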
Find \( Q_1 \), \( Q_3 \), and IQR for the ungrouped data: 6, 10, 14, 18, 22, 26, 30
\( n = 7 \)
10. Ogive (Cumulative Frequency Curve)
An ogive is a graph of cumulative frequency against the upper class boundary. It is an S-shaped curve used to estimate quartiles, percentiles, and the median graphically.
10.1 How to Draw an Ogive
- Calculate the cumulative frequency for each class
- Plot points using upper class boundaries on the x-axis and cumulative frequency on the y-axis
- Join the points with a smooth curve (start from the lower boundary of the first class, where CF = 0)
- To find the median: draw a horizontal line from \( \frac{n}{2} \) on the y-axis to the curve, then drop vertically to the x-axis
Worked Example 16: Draw an ogive for the following data and estimate the median.
| Class | Frequency | Upper Boundary | CF |
|---|---|---|---|
| 0 – 9 | 4 | 9.5 | 4 |
| 10 – 19 | 10 | 19.5 | 14 |
| 20 – 29 | 16 | 29.5 | 30 |
| 30 – 39 | 8 | 39.5 | 38 |
| 40 – 49 | 2 | 49.5 | 40 |
Points to plot: (−0.5, 0), (9.5, 4), (19.5, 14), (29.5, 30), (39.5, 38), (49.5, 40)
Estimating median: \( \frac{n}{2} = 20 \). Draw a horizontal line from y = 20 to the curve, then drop down.
From the curve, this corresponds to approximately \( x \approx 23 \). So the estimated median is about 23. (The formula gives \( 19.5 + \frac{20 - 14}{16} \times 10 = 23.25 \).)
In exams, you would draw this on graph paper and read the value carefully. The formula gives a more precise answer, but the ogive gives a good visual estimate!
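The ogive steps can also be sketched in Python. This is only an approximation of the graphical method: the code joins the points with straight segments rather than a smooth hand-drawn curve, so its reading matches the interpolation formula exactly. The function names `ogive_points` and `read_ogive` and the `gap` parameter are illustrative assumptions, and the first point is placed at the lower boundary of the first class with CF = 0.

```python
def ogive_points(classes, freqs, gap=1):
    """Return (upper boundary, cumulative frequency) points, starting at CF = 0."""
    points = [(classes[0][0] - gap / 2, 0)]   # lower boundary of the first class
    running_total = 0
    for (lo, hi), f in zip(classes, freqs):
        running_total += f
        points.append((hi + gap / 2, running_total))
    return points


def read_ogive(points, cf_value):
    """Read x for a given cumulative frequency, using straight segments."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 > y0 and y0 <= cf_value <= y1:
            return x0 + (cf_value - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("cf_value is outside the range of the ogive")


classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
points = ogive_points(classes, [4, 10, 16, 8, 2])
print(points)
print(read_ogive(points, 20))  # 23.25, the median estimate at n/2 = 20
```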
- Always start from CF = 0, plotted at the lower boundary of the first class
- Use upper class BOUNDARIES (not limits)
- The curve should be smooth (not straight lines between points)
- Label both axes clearly with a title
From the ogive data below, list the points to plot and estimate \( Q_1 \).
| Class | 5–14 | 15–24 | 25–34 | 35–44 | 45–54 |
|---|---|---|---|---|---|
| Frequency | 8 | 15 | 20 | 10 | 7 |
First find upper boundaries and CF:
| Class | \(f\) | Upper Boundary | CF |
|---|---|---|---|
| 5 – 14 | 8 | 14.5 | 8 |
| 15 – 24 | 15 | 24.5 | 23 |
| 25 – 34 | 20 | 34.5 | 43 |
| 35 – 44 | 10 | 44.5 | 53 |
| 45 – 54 | 7 | 54.5 | 60 |
Points to plot: (4.5, 0), (14.5, 8), (24.5, 23), (34.5, 43), (44.5, 53), (54.5, 60)
Estimate \( Q_1 \): \( \frac{n}{4} = \frac{60}{4} = 15 \). Draw a horizontal line from y = 15 to the curve, then drop down. Since 15 lies just under halfway between the CF values 8 and 23, the reading falls just under halfway between x = 14.5 and x = 24.5. Estimated \( Q_1 \approx 19 \).
Check by formula: \( Q_1 = 14.5 + \frac{15-8}{15} \times 10 = 14.5 + 4.67 = 19.17 \). The graphical estimate of 19 is close!
11. Summary of Key Exam Notes
- Mean = \( \frac{\sum x}{n} \) (ungrouped) or \( \frac{\sum fx}{n} \) (grouped). Use assumed mean method for large numbers.
- Median = middle value. Use formula \( L + \frac{\frac{n}{2}-CF}{f} \times h \) for grouped data.
- Mode = most frequent value. Use formula \( L + \frac{f_1-f_0}{2f_1-f_0-f_2} \times h \) for grouped data.
- Variance = \( \frac{\sum fx^2}{n} – \bar{x}^2 \). Always use the shortcut formula!
- Standard Deviation = \( \sqrt{\text{Variance}} \). Same units as data.
- CV = \( \frac{\sigma}{\bar{x}} \times 100\% \). Used to compare consistency.
- Quartiles use the same formula structure as median but with \( \frac{n}{4} \) and \( \frac{3n}{4} \).
- Ogive uses upper class boundaries vs cumulative frequency. Always start from CF = 0.
- For grouped formulas, always use class boundaries (not limits) for \( L \) and \( h \).
- Always show your calculation table — examiners award marks for clear working!
The monthly salaries (in Birr) of 60 workers are given below:
| Salary (Birr) | 2000–2499 | 2500–2999 | 3000–3499 | 3500–3999 | 4000–4499 |
|---|---|---|---|---|---|
| Number of workers | 8 | 15 | 20 | 12 | 5 |
Find: (a) Mean salary, (b) Median salary, (c) Standard deviation, (d) What percentage of workers earn above 3500 Birr?
(a) Mean:
| Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\) |
|---|---|---|---|---|---|
| 2000–2499 | 8 | 2249.5 | 17996 | 5060250.25 | 40482002 |
| 2500–2999 | 15 | 2749.5 | 41242.5 | 7559750.25 | 113396253.75 |
| 3000–3499 | 20 | 3249.5 | 64990 | 10559250.25 | 211185005 |
| 3500–3999 | 12 | 3749.5 | 44994 | 14058750.25 | 168705003 |
| 4000–4499 | 5 | 4249.5 | 21247.5 | 18058250.25 | 90291251.25 |
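| Total | 60 | | 190470 | | 624059515 |
\( \bar{x} = \frac{190470}{60} = 3174.5 \) Birr
(b) Median: \( \frac{n}{2} = 30 \). CF: 8, 23, 43, … → 30th value in class 3000–3499, so \( L = 2999.5 \), \( CF = 23 \), \( f = 20 \), \( h = 500 \).
Median = \( 2999.5 + \frac{30 - 23}{20} \times 500 = 2999.5 + 175 = 3174.5 \) Birr
(c) Standard Deviation:
\( \sigma^2 = \frac{624059515}{60} - (3174.5)^2 \approx 10400991.92 - 10077450.25 = 323541.67 \), so \( \sigma = \sqrt{323541.67} \approx 568.81 \) Birr
(d) Workers above 3500 Birr: Those in classes 3500–3999 and 4000–4499 = 12 + 5 = 17 workers, which is \( \frac{17}{60} \times 100\% \approx 28.33\% \) of the workers.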
Quick Revision Notes — Statistics
1. Important Definitions
- Mean (\( \bar{x} \)) — The sum of all values divided by the number of values. It uses every data point.
- Median — The middle value when data is arranged in ascending order. It divides data into two equal halves.
- Mode — The value that occurs most frequently in the data.
- Variance (\( \sigma^2 \)) — The average of squared deviations from the mean. Measures spread.
- Standard Deviation (\( \sigma \)) — The square root of variance. Same units as original data.
- Coefficient of Variation (CV) — Standard deviation as a percentage of the mean. Used for comparison.
- Quartiles — Values that divide data into four equal parts (\( Q_1, Q_2, Q_3 \)).
- Ogive — Cumulative frequency curve plotted using upper class boundaries.
- Class Boundary — Adjusted class limits to remove gaps between consecutive classes.
- Class Midpoint — The average of the lower and upper limits of a class.
2. All Key Formulas
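MEAN: \( \bar{x} = \frac{\sum x}{n} \) (ungrouped); \( \bar{x} = \frac{\sum fx}{\sum f} \) (grouped); \( \bar{x} = A + \frac{\sum fd}{\sum f} \) with \( d = x - A \) (assumed mean)
MEDIAN: value at position \( \frac{n+1}{2} \) for odd \( n \), or the average of the values at positions \( \frac{n}{2} \) and \( \frac{n}{2} + 1 \) for even \( n \) (ungrouped); \( L + \frac{\frac{n}{2} - CF}{f} \times h \) (grouped)
MODE: most frequent value (ungrouped); \( L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h \) (grouped)
VARIANCE & STANDARD DEVIATION: \( \sigma^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2 \) (ungrouped); \( \sigma^2 = \frac{\sum fx^2}{n} - \bar{x}^2 \) (grouped); \( \sigma = \sqrt{\sigma^2} \)
COEFFICIENT OF VARIATION: \( CV = \frac{\sigma}{\bar{x}} \times 100\% \)
QUARTILES (Grouped): \( Q_k = L + \frac{\frac{kn}{4} - CF}{f} \times h \), for \( k = 1, 2, 3 \)
PERCENTILE: \( P_k = L + \frac{\frac{kn}{100} - CF}{f} \times h \)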
3. Relationship Between Mean, Median, and Mode
- If data is symmetrical: Mean ≈ Median ≈ Mode
- If data is positively skewed (right-skewed): Mode < Median < Mean
- If data is negatively skewed (left-skewed): Mean < Median < Mode
4. When to Use Which Measure
| Measure | Best Used When |
|---|---|
| Mean | Data is fairly symmetric, no extreme outliers |
| Median | Data has outliers or is skewed (median is resistant to extremes) |
| Mode | Data has a clear peak; useful for categorical data |
| SD | General measure of spread (most common) |
| CV | Comparing variability of two different datasets |
| IQR | Data has outliers (IQR is resistant to extremes) |
5. Common Mistakes to Avoid
- Not sorting data for median — Always arrange ungrouped data in ascending order before finding the median!
- Using class limits instead of boundaries — In grouped data formulas, \( L \) and \( h \) must use class boundaries (e.g., 9.5, 19.5), not limits (e.g., 10, 19).
- Wrong CF in median/quartile formula — CF means the cumulative frequency BEFORE the median class, not including it.
- Confusing \( n \) positions — For median use \( \frac{n}{2} \), for \( Q_1 \) use \( \frac{n}{4} \), for \( Q_3 \) use \( \frac{3n}{4} \). Don’t mix them up!
- Forgetting to square in variance — Variance uses \( (x – \bar{x})^2 \), not \( (x – \bar{x}) \). And standard deviation is the square ROOT of variance.
- Wrong class width — Class width should use boundaries: \( h = \text{upper boundary} – \text{lower boundary} \), not just upper limit − lower limit (when classes have gaps).
- Ogive: using limits instead of boundaries — Always plot upper class BOUNDARIES on the x-axis.
- Not showing the calculation table — In exams, marks are awarded for the table. Don’t skip it!
- Arithmetic errors in the table — Double-check your \( fx \) and \( fx^2 \) columns. A small error in the table affects the final answer.
- Comparing SD directly instead of CV — When means are different, always use CV to compare spread.
6. Quick Examples
Q: Data: 3, 5, 7, 9, 11. Find mean, median, mode.
A: Mean = \( \frac{35}{5} = 7 \). Median = 3rd value = 7. Mode = no repeated value, so no mode. (Symmetric data!)
Q: If \( \bar{x} = 50 \) and \( \sigma = 10 \), find CV.
A: \( CV = \frac{10}{50} \times 100\% = 20\% \)
Q: If \( \sum x = 480 \), \( \sum x^2 = 24000 \), \( n = 12 \), find variance.
A: \( \bar{x} = 40 \). \( \sigma^2 = \frac{24000}{12} – 40^2 = 2000 – 1600 = 400 \)
Q: Class 20–29 has what lower class boundary?
A: \( 20 – 0.5 = 19.5 \)
Q: Which class is the median class if CFs are 5, 13, 28, 38, 40 and \( n = 40 \)?
A: \( \frac{n}{2} = 20 \). CF reaches 13 then 28, so the 20th value is in the class where CF goes from 13 to 28 (the 3rd class).
Challenge Exam Questions — Statistics
These questions test your deep understanding. Try each one fully before checking the answer!
Section A: Multiple Choice Questions
For the data: 5, 8, 8, 12, 15, the relationship between mean, median, and mode is:
A) Mean < Median < Mode B) Mode < Median < Mean C) Mean = Median = Mode D) Median < Mode < Mean
Answer: B) Mode < Median < Mean
Mean = \( \frac{48}{5} = 9.6 \). Median = 8 (3rd value). Mode = 8.
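So: Mode (8) = Median (8) < Mean (9.6). Strictly, the mode and median are equal, so no option matches exactly. B is the closest: the larger values pull the mean above both, which is the pattern of slightly positively skewed (right-skewed) data.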
If the variance of a dataset is 144, the standard deviation is:
A) 12 B) 72 C) 144 D) 20736
Answer: A) 12
\( \sigma = \sqrt{144} = 12 \)
The lower class boundary of the class 35–44 is:
A) 34.5 B) 35 C) 35.5 D) 34
Answer: A) 34.5
Lower boundary = lower limit − 0.5 = 35 − 0.5 = 34.5
If dataset A has CV = 15% and dataset B has CV = 22%, which is more consistent?
A) Dataset A B) Dataset B C) Both equally D) Cannot determine
Answer: A) Dataset A
Lower CV means less variability, i.e., more consistency. Since 15% < 22%, Dataset A is more consistent.
The ogive curve is plotted using:
A) Frequency vs class midpoint B) Cumulative frequency vs upper class boundary C) Frequency vs lower class limit D) Cumulative frequency vs class midpoint
Answer: B) Cumulative frequency vs upper class boundary
This is the standard definition. The x-axis has upper class boundaries and the y-axis has cumulative frequencies.
Section B: Fill in the Blanks
The class midpoint of the interval 100–149 is ________
Answer: 124.5
\( \frac{100 + 149}{2} = 124.5 \)
If \( Q_1 = 25 \) and \( Q_3 = 45 \), the interquartile range is ________
Answer: 20
\( IQR = Q_3 - Q_1 = 45 - 25 = 20 \)
The second quartile \( Q_2 \) is the same as the ________
Answer: Median
\( Q_2 = P_{50} = \) Median. All three represent the value below which 50% of the data falls.
If each value in a dataset is multiplied by 5, the mean is multiplied by ________ and the standard deviation is multiplied by ________
Answer: 5 and 5
If \( y_i = 5x_i \), then \( \bar{y} = 5\bar{x} \) and \( \sigma_y = 5\sigma_x \). Both mean and standard deviation are multiplied by the same constant. (Variance would be multiplied by 25.)
If each value is increased by 10, the standard deviation ________
Answer: Remains unchanged (stays the same)
Adding a constant shifts all values but doesn’t change their spread. So \( \sigma \) is unchanged. (Only the mean increases by 10.)
Section C: Short Answer Questions
Find the mean and variance of: 6, 10, 4, 8, 12
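Mean = \( \frac{6 + 10 + 4 + 8 + 12}{5} = \frac{40}{5} = 8 \). The squared deviations are 4, 4, 16, 0, 16, which sum to 40, so variance = \( \frac{40}{5} = 8 \).
Mean = 8, Variance = 8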
Explain the difference between class limit and class boundary with an example.
Class limits are the smallest and largest values that can belong to a class. For example, in the class 20–29, the lower limit is 20 and the upper limit is 29.
Class boundaries are adjusted values that close the gap between consecutive classes. For classes 20–29 and 30–39, the boundaries are 19.5, 29.5, 39.5. Boundary = limit − 0.5 (lower) or limit + 0.5 (upper) when data is in whole numbers.
Why boundaries? If a value is 29.6, does it go in 20–29 or 30–39? With boundaries, 29.5 is the dividing line, so 29.6 clearly goes in the second class.
A dataset has mean 40 and standard deviation 5. A second dataset has mean 80 and standard deviation 8. Which dataset is more variable?
\( CV_1 = \frac{5}{40} \times 100\% = 12.5\% \) and \( CV_2 = \frac{8}{80} \times 100\% = 10\% \).
Since \( CV_1 > CV_2 \), the first dataset is more variable relative to its mean.
Note: Even though the second dataset has a larger standard deviation (8 vs 5), its mean is also much larger, so relative to its mean, it is actually less variable!
Find the median of: 3, 7, 2, 9, 1, 5, 8, 4, 6
Arrange in order: 1, 2, 3, 4, 5, 6, 7, 8, 9
\( n = 9 \) (odd). Position = \( \frac{9+1}{2} = 5 \)
The 5th value is 5. Median = 5.
Section D: Step-by-Step Calculation Questions
Find the mean, median, and mode for the following distribution:
| Class | 0–4 | 5–9 | 10–14 | 15–19 | 20–24 | 25–29 |
|---|---|---|---|---|---|---|
| Frequency | 2 | 5 | 12 | 18 | 8 | 3 |
Mean:
| Class | \(f\) | \(x\) | \(fx\) |
|---|---|---|---|
| 0 – 4 | 2 | 2 | 4 |
| 5 – 9 | 5 | 7 | 35 |
| 10 – 14 | 12 | 12 | 144 |
| 15 – 19 | 18 | 17 | 306 |
| 20 – 24 | 8 | 22 | 176 |
| 25 – 29 | 3 | 27 | 81 |
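| Total | 48 | | 746 |
\( \bar{x} = \frac{746}{48} \approx 15.54 \)
Median: \( \frac{n}{2} = 24 \). CF: 2, 7, 19, 37, … → 24th value in class 15–19, so \( L = 14.5 \), \( CF = 19 \), \( f = 18 \), \( h = 5 \).
Median = \( 14.5 + \frac{24 - 19}{18} \times 5 \approx 14.5 + 1.39 = 15.89 \)
Mode: Modal class = 15–19. \( L = 14.5 \), \( f_1 = 18 \), \( f_0 = 12 \), \( f_2 = 8 \), \( h = 5 \)
Mode = \( 14.5 + \frac{18 - 12}{2(18) - 12 - 8} \times 5 = 14.5 + \frac{6}{16} \times 5 \approx 16.38 \)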
Mean ≈ 15.54, Median ≈ 15.89, Mode ≈ 16.38
Calculate the standard deviation and coefficient of variation for the data below:
| Class | 50–59 | 60–69 | 70–79 | 80–89 | 90–99 |
|---|---|---|---|---|---|
| Frequency | 6 | 14 | 22 | 12 | 6 |
| Class | \(f\) | \(x\) | \(fx\) | \(x^2\) | \(fx^2\) |
|---|---|---|---|---|---|
| 50 – 59 | 6 | 54.5 | 327 | 2970.25 | 17821.5 |
| 60 – 69 | 14 | 64.5 | 903 | 4160.25 | 58243.5 |
| 70 – 79 | 22 | 74.5 | 1639 | 5550.25 | 122105.5 |
| 80 – 89 | 12 | 84.5 | 1014 | 7140.25 | 85683 |
| 90 – 99 | 6 | 94.5 | 567 | 8930.25 | 53581.5 |
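| Total | 60 | | 4450 | | 337435 |
\( \bar{x} = \frac{4450}{60} \approx 74.17 \)
\( \sigma^2 = \frac{337435}{60} - (74.17)^2 \approx 5623.92 - 5500.69 = 123.22 \) (using the unrounded mean)
\( \sigma = \sqrt{123.22} \approx 11.10 \)
\( CV = \frac{11.10}{74.17} \times 100\% \approx 14.97\% \)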
Find \( Q_1 \), \( Q_3 \), and IQR for the following data:
| Class | 10–19 | 20–29 | 30–39 | 40–49 | 50–59 | 60–69 |
|---|---|---|---|---|---|---|
| Frequency | 4 | 8 | 15 | 20 | 10 | 3 |
CF table:
| Class | \(f\) | CF |
|---|---|---|
| 10 – 19 | 4 | 4 |
| 20 – 29 | 8 | 12 |
| 30 – 39 | 15 | 27 |
| 40 – 49 | 20 | 47 |
| 50 – 59 | 10 | 57 |
| 60 – 69 | 3 | 60 |
\( n = 60 \)
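\( Q_1 \): \( \frac{60}{4} = 15 \). 15th value in class 30–39 (CF before = 12).
\( Q_1 = 29.5 + \frac{15 - 12}{15} \times 10 = 29.5 + 2 = 31.5 \)
\( Q_3 \): \( \frac{3 \times 60}{4} = 45 \). 45th value in class 40–49 (CF before = 27).
\( Q_3 = 39.5 + \frac{45 - 27}{20} \times 10 = 39.5 + 9 = 48.5 \)
\( IQR = Q_3 - Q_1 = 48.5 - 31.5 = 17 \)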
The mean and standard deviation of 25 observations are 50 and 4 respectively. If each observation is multiplied by 3 and then 5 is added, find the new mean and new standard deviation.
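If \( y = 3x + 5 \), then \( \bar{y} = 3\bar{x} + 5 = 3(50) + 5 = 155 \) and \( \sigma_y = |3| \times \sigma_x = 3 \times 4 = 12 \).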
Rules: Multiplying by \( k \) multiplies both mean and SD by \( |k| \). Adding a constant shifts the mean but doesn’t change the SD.
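These two rules can be checked numerically. The Python sketch below uses a small made-up dataset (the question gives only the summary values, not the observations) and the standard library functions `statistics.mean` and `statistics.pstdev`, where `pstdev` divides by \( n \) as in this unit.

```python
from statistics import mean, pstdev

x = [46, 48, 50, 52, 54]                 # hypothetical data, not from the question
y = [3 * value + 5 for value in x]       # apply y = 3x + 5 to every observation

print(mean(x), round(pstdev(x), 4))
print(mean(y), round(pstdev(y), 4))

# The mean follows y = 3x + 5; the SD is only multiplied by 3
print(abs(mean(y) - (3 * mean(x) + 5)) < 1e-9)    # True
print(abs(pstdev(y) - 3 * pstdev(x)) < 1e-9)      # True
```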
New mean = 155, New standard deviation = 12
Two sections of Grade 12 took a mathematics exam. The results are:
Section A: \( n = 40 \), \( \bar{x} = 65 \), \( \sigma = 8 \)
Section B: \( n = 50 \), \( \bar{x} = 58 \), \( \sigma = 10 \)
(a) Find the combined mean.
(b) Which section performed more consistently?
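(a) Combined mean: \( \bar{x}_c = \frac{n_A \bar{x}_A + n_B \bar{x}_B}{n_A + n_B} = \frac{40(65) + 50(58)}{40 + 50} = \frac{5500}{90} \approx 61.11 \)
(b) Consistency: \( CV_A = \frac{8}{65} \times 100\% \approx 12.31\% \) and \( CV_B = \frac{10}{58} \times 100\% \approx 17.24\% \).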
Since \( CV_A < CV_B \), Section A performed more consistently.
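The combined mean is a weighted average: each section's mean is weighted by its number of students. A minimal Python sketch, with the illustrative function name `combined_mean` taking (size, mean) pairs:

```python
def combined_mean(groups):
    """Weighted mean of several groups given as (size, mean) pairs."""
    total = sum(n * group_mean for n, group_mean in groups)
    count = sum(n for n, _ in groups)
    return total / count


sections = [(40, 65), (50, 58)]              # Section A and Section B
print(round(combined_mean(sections), 2))     # 61.11
```

Simply averaging 65 and 58 would give 61.5, which is wrong here because the sections have different sizes.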
Using the frequency distribution below:
| Class | 0–9 | 10–19 | 20–29 | 30–39 | 40–49 |
|---|---|---|---|---|---|
| Frequency | 5 | 10 | 25 | 8 | 2 |
(a) Calculate the mean using the assumed mean method (take \( A = 24.5 \)).
(b) Calculate the variance and standard deviation.
(c) Verify the relationship: Mode ≈ 3 × Median − 2 × Mean
(a) Mean (assumed mean method):
| Class | \(f\) | \(x\) | \(d = x – 24.5\) | \(fd\) |
|---|---|---|---|---|
| 0 – 9 | 5 | 4.5 | \(-20\) | \(-100\) |
| 10 – 19 | 10 | 14.5 | \(-10\) | \(-100\) |
| 20 – 29 | 25 | 24.5 | \(0\) | \(0\) |
| 30 – 39 | 8 | 34.5 | \(10\) | \(80\) |
| 40 – 49 | 2 | 44.5 | \(20\) | \(40\) |
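| Total | 50 | | | \(-80\) |
\( \bar{x} = A + \frac{\sum fd}{\sum f} = 24.5 + \frac{-80}{50} = 24.5 - 1.6 = 22.9 \)
(b) Variance and SD: Need \( \sum fx^2 \). From part (a), \( \sum fx = n\bar{x} = 50 \times 22.9 = 1145 \).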
| Class | \(f\) | \(x\) | \(x^2\) | \(fx^2\) |
|---|---|---|---|---|
| 0 – 9 | 5 | 4.5 | 20.25 | 101.25 |
| 10 – 19 | 10 | 14.5 | 210.25 | 2102.5 |
| 20 – 29 | 25 | 24.5 | 600.25 | 15006.25 |
| 30 – 39 | 8 | 34.5 | 1190.25 | 9522 |
| 40 – 49 | 2 | 44.5 | 1980.25 | 3960.5 |
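| Total | 50 | | | 30692.5 |
\( \sigma^2 = \frac{30692.5}{50} - (22.9)^2 = 613.85 - 524.41 = 89.44 \), so \( \sigma = \sqrt{89.44} \approx 9.46 \)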
(c) Verify Mode ≈ 3 × Median − 2 × Mean:
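Median: \( \frac{50}{2} = 25 \). 25th value in class 20–29 (CF before = 15).
Median = \( 19.5 + \frac{25 - 15}{25} \times 10 = 19.5 + 4 = 23.5 \)
Mode: Modal class = 20–29. \( L = 19.5 \), \( f_1 = 25 \), \( f_0 = 10 \), \( f_2 = 8 \), \( h = 10 \)
Mode = \( 19.5 + \frac{25 - 10}{2(25) - 10 - 8} \times 10 = 19.5 + \frac{15}{32} \times 10 \approx 24.19 \)
Check: \( 3 \times 23.5 - 2 \times 22.9 = 70.5 - 45.8 = 24.7 \)
The approximation is close: 24.7 compared with the calculated mode of 24.19. (Small differences are normal since this is an empirical relationship, not exact.)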
The following data represents the weights (in kg) of 80 students. Find \( P_{30} \) (the 30th percentile).
| Weight (kg) | 40–44 | 45–49 | 50–54 | 55–59 | 60–64 | 65–69 |
|---|---|---|---|---|---|---|
| Frequency | 6 | 14 | 22 | 20 | 12 | 6 |
CF table:
| Class | \(f\) | CF |
|---|---|---|
| 40 – 44 | 6 | 6 |
| 45 – 49 | 14 | 20 |
| 50 – 54 | 22 | 42 |
| 55 – 59 | 20 | 62 |
| 60 – 64 | 12 | 74 |
| 65 – 69 | 6 | 80 |
\( \frac{30 \times 80}{100} = 24 \). The 24th value falls in class 50–54 (CF before = 20).
\( P_{30} = 49.5 + \frac{24 - 20}{22} \times 5 \approx 49.5 + 0.91 = 50.41 \)
This means 30% of students weigh less than approximately 50.41 kg.
The variance of 20 observations is 16. If each observation is increased by 3, find the new variance. If each observation is instead multiplied by 2, find the new variance.
Case 1: Adding 3 to each observation
Adding a constant does not change the spread. New variance = 16 (unchanged).
Case 2: Multiplying each observation by 2
If \( y = 2x \), then \( \sigma_y^2 = 2^2 \times \sigma_x^2 = 4 \times 16 = 64 \).
New variance = 64
Rule: Adding a constant → variance unchanged. Multiplying by \( k \) → variance multiplied by \( k^2 \).
A teacher recorded the following marks. Use the assumed mean method to find the mean and then calculate the standard deviation:
| Marks | 30–39 | 40–49 | 50–59 | 60–69 | 70–79 | 80–89 | 90–99 |
|---|---|---|---|---|---|---|---|
| Students | 3 | 7 | 12 | 18 | 15 | 8 | 2 |
Take \( A = 64.5 \) (midpoint of the class with highest frequency, 60–69). Class width \( h = 10 \). Use \( u = \frac{x – A}{h} \):
| Class | \(f\) | \(x\) | \(u = \frac{x-64.5}{10}\) | \(fu\) | \(fu^2\) |
|---|---|---|---|---|---|
| 30 – 39 | 3 | 34.5 | \(-3\) | \(-9\) | 27 |
| 40 – 49 | 7 | 44.5 | \(-2\) | \(-14\) | 28 |
| 50 – 59 | 12 | 54.5 | \(-1\) | \(-12\) | 12 |
| 60 – 69 | 18 | 64.5 | \(0\) | \(0\) | 0 |
| 70 – 79 | 15 | 74.5 | \(1\) | \(15\) | 15 |
| 80 – 89 | 8 | 84.5 | \(2\) | \(16\) | 32 |
| 90 – 99 | 2 | 94.5 | \(3\) | \(6\) | 18 |
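| Total | 65 | | | \(2\) | \(132\) |
\( \bar{x} = A + h \times \frac{\sum fu}{n} = 64.5 + 10 \times \frac{2}{65} \approx 64.81 \)
\( \sigma = h \times \sqrt{\frac{\sum fu^2}{n} - \left(\frac{\sum fu}{n}\right)^2} = 10 \times \sqrt{\frac{132}{65} - \left(\frac{2}{65}\right)^2} \approx 10 \times 1.4247 \approx 14.25 \)
Mean ≈ 64.81, Standard Deviation ≈ 14.25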
The step-deviation method (using \( u \)) makes calculations much easier when class widths are equal. Remember: \( \sigma = h \times \sigma_u \).
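The step-deviation calculation can be sketched in Python as follows. The function name `step_deviation` is an illustrative assumption; it codes each midpoint as \( u = \frac{x - A}{h} \), works with the small \( u \) values, then scales back to the original units.

```python
from math import sqrt


def step_deviation(midpoints, freqs, A, h):
    """Return (mean, standard deviation) using u = (x - A) / h."""
    n = sum(freqs)
    u = [(x - A) / h for x in midpoints]
    mean_u = sum(f * ui for f, ui in zip(freqs, u)) / n          # sum(fu) / n
    mean_u2 = sum(f * ui * ui for f, ui in zip(freqs, u)) / n    # sum(fu^2) / n
    mean = A + h * mean_u
    sd = h * sqrt(mean_u2 - mean_u ** 2)                         # sigma = h * sigma_u
    return mean, sd


midpoints = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [3, 7, 12, 18, 15, 8, 2]
mean, sd = step_deviation(midpoints, freqs, A=64.5, h=10)
print(round(mean, 2), round(sd, 2))  # 64.81 14.25
```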
List the points to plot for an ogive from the data below. Then estimate the median graphically and verify with the formula.
| Class | 0–4 | 5–9 | 10–14 | 15–19 | 20–24 |
|---|---|---|---|---|---|
| Frequency | 3 | 7 | 12 | 6 | 2 |
Points to plot (Upper Boundary, CF):
| Class | \(f\) | Upper Boundary | CF |
|---|---|---|---|
| 0 – 4 | 3 | 4.5 | 3 |
| 5 – 9 | 7 | 9.5 | 10 |
| 10 – 14 | 12 | 14.5 | 22 |
| 15 – 19 | 6 | 19.5 | 28 |
| 20 – 24 | 2 | 24.5 | 30 |
Plot points: (−0.5, 0), (4.5, 3), (9.5, 10), (14.5, 22), (19.5, 28), (24.5, 30)
Graphical estimate: \( \frac{n}{2} = 15 \). From y = 15, the curve gives approximately \( x \approx 12 \).
Formula verification: 15th value in class 10–14 (CF before = 10).
Median = \( 9.5 + \frac{15 - 10}{12} \times 5 \approx 9.5 + 2.08 = 11.58 \)
The graphical estimate (≈12) is close to the formula answer (11.58). The small difference is expected with graphical methods.
Prove that the sum of deviations of all observations from their mean is zero, i.e., \( \sum (x_i – \bar{x}) = 0 \). Then verify with the data: 4, 8, 6, 10, 2.
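Proof:
\( \sum (x_i - \bar{x}) = \sum x_i - \sum \bar{x} = \sum x_i - n\bar{x} \)
Since \( \bar{x} = \frac{\sum x_i}{n} \), we have \( n\bar{x} = \sum x_i \). Therefore \( \sum (x_i - \bar{x}) = n\bar{x} - n\bar{x} = 0 \).
Verification: Data: 4, 8, 6, 10, 2. Mean = \( \frac{30}{5} = 6 \). The deviations are \( -2, 2, 0, 4, -4 \), and \( -2 + 2 + 0 + 4 - 4 = 0 \).
This is exactly why we square the deviations when calculating variance: if we didn’t, the sum would always be zero!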