Measures of Dispersion (Variation)

Measures of Dispersion – Statistics Guide for Ethiopian Students | Chapter 4

Measures of Dispersion – Let’s Understand Spread! 😊

Hello again, smart learners! 👋

So far, we’ve learned how to find a “typical” value—like the mean, median, or mode. But here’s the thing 😮: two groups can have the same average, yet be totally different in how their values are spread out!

Example: Imagine two classrooms both have an average score of 70. In Classroom A, everyone scored between 65 and 75. In Classroom B, some scored 20 and others 100! 🤯 Which class is more “consistent”? That’s where measures of dispersion come in!

In this chapter, we’ll explore four main ways to measure spread:

  • Range
  • Quartile Deviation
  • Mean Deviation
  • Standard Deviation & Variance

And yes—we’ll include all examples and all exercises with full answers from your textbook, explained in simple English with a warm, friendly tone. Let’s go! 💪

1. Why Do We Need Measures of Dispersion? 🤔

Important Point: Measures of dispersion tell us how much the data values vary around the central value (like the mean). They help us judge how “reliable” or “consistent” our average really is.

Without dispersion, we’d be like a chef who only knows the average temperature of an oven—but not whether it swings wildly between freezing and boiling! 😠

Objectives of studying dispersion:

  1. To judge the reliability of averages (low dispersion = more reliable)
  2. To compare variability between two groups
  3. To control quality (e.g., in factories)
  4. To prepare for advanced topics like probability and inference
Real-life Example:
A farmer measures yields from two plots:
  • Plot A: 40, 42, 41, 39, 43 kg
  • Plot B: 20, 60, 30, 50, 45 kg
Both plots have exactly the same average (41 kg), but Plot A's yields stay between 39 and 43 kg while Plot B's run from 20 to 60 kg. Plot A is far more consistent. Dispersion reveals this!
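If you'd like to check the farmer's numbers yourself, here is a minimal Python sketch. The helper name `data_range` is just an illustrative choice, not something from your textbook.

```python
from statistics import mean

plot_a = [40, 42, 41, 39, 43]  # yields in kg
plot_b = [20, 60, 30, 50, 45]  # yields in kg

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

for name, plot in (("Plot A", plot_a), ("Plot B", plot_b)):
    print(name, "mean =", mean(plot), "kg, range =", data_range(plot), "kg")
```

Both plots report a mean of 41 kg, but the ranges (4 kg versus 40 kg) show how different they really are.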

2. Absolute vs Relative Measures of Dispersion

Some measures (like range or standard deviation) are in the same units as the data (e.g., kg, Birr, cm). These are absolute.

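But what if you want to compare the spread of heights (in cm) with the spread of weights (in kg)? You can't compare absolute measures directly, because the units differ! So we use relative measures (unit-free ratios or percentages) called coefficients.

Example (using the coefficient of variation, CV = SD ÷ mean × 100, which we meet properly in Section 7):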
– Class A: mean = 50 Birr, SD = 10 Birr → CV = (10/50)×100 = 20%
– Class B: mean = 200 Birr, SD = 30 Birr → CV = (30/200)×100 = 15%
→ Class B is more consistent, even though its SD is larger!

3. The Range – Simple but Limited 📏

Important Point: The range is the difference between the largest and smallest values: Range = Largest – Smallest.

It’s quick to calculate, but only uses two values—so it’s very sensitive to outliers!

Example from your notes:
Distribution 1: 32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45 → Range = 45 – 32 = 13
Distribution 2: 32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45 → Range = 45 – 32 = 13
Same range! But clearly, Distribution 2 is much less spread out in the middle: apart from the single value 45, all of its values lie between 32 and 35. Range doesn’t see that! 😮
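Here is a small Python sketch of that idea. It computes the range of both distributions, then drops the single largest value from each and computes the range again. The "drop the largest value" step is only an illustration of how sensitive the range is, not a method from your notes.

```python
dist_1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
dist_2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

range_1 = data_range(dist_1)
range_2 = data_range(dist_2)

# Remove the single largest value from each list and recompute the range.
trimmed_1 = data_range(sorted(dist_1)[:-1])
trimmed_2 = data_range(sorted(dist_2)[:-1])

print("Full ranges:", range_1, range_2)
print("Ranges without the largest value:", trimmed_1, trimmed_2)
```

Removing one value changes Distribution 1's range from 13 to 11, but Distribution 2's range from 13 to 3. One extreme value was responsible for almost all of Distribution 2's range.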

Range for Grouped Data

Use the upper class limit of the last class and lower class limit of the first class:

\[ \text{Range} = \text{UCL}_k - \text{LCL}_1 \]

Relative Range (Coefficient of Range)

\[ \text{RR} = \frac{L - S}{L + S} \], where L = largest, S = smallest.

Exercise from your notes:
If Range = 4 and RR = 0.25, find L and S.
Solution:
\( L - S = 4 \)
\( \frac{L - S}{L + S} = 0.25 \Rightarrow \frac{4}{L + S} = 0.25 \Rightarrow L + S = 16 \)
Add the two equations: \( 2L = 20 \Rightarrow L = 10 \), and then \( S = 16 - 10 = 6 \)
Question: Why is range not good for scientific analysis?
Because it depends only on the two extreme values and ignores all other data. Also, it can only stay the same or grow as more values are added to the sample, even if the true variability doesn’t change!

4. Quartile Deviation – Focus on the Middle 50% 🎯

Important Point: Quartile Deviation (QD) = \(\frac{Q_3 - Q_1}{2}\), which is half the interquartile range (IQR = \(Q_3 - Q_1\)). It ignores extreme values and describes only the middle 50% of the data.

This is great for skewed data (like income) where outliers distort the range.

Example from your notes:
Given frequency table (same as Chapter 3), you previously found:
\(Q_1 = 174.90\), \(Q_3 = 203.83\)
So:
\[ \text{QD} = \frac{203.83 - 174.90}{2} = \frac{28.93}{2} \approx 14.47 \]
Coefficient of QD = \(\frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{28.93}{378.73} \approx 0.076\)
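The same two calculations can be sketched in Python. The function names `quartile_deviation` and `coefficient_of_qd` are illustrative choices, and the quartiles are simply the values quoted above from Chapter 3.

```python
def quartile_deviation(q1, q3):
    """Half the interquartile range."""
    return (q3 - q1) / 2

def coefficient_of_qd(q1, q3):
    """Relative (unit-free) version of the quartile deviation."""
    return (q3 - q1) / (q3 + q1)

q1, q3 = 174.90, 203.83  # quartiles quoted in the example above
qd = quartile_deviation(q1, q3)
cqd = coefficient_of_qd(q1, q3)

print(f"QD = {qd:.3f}, coefficient of QD = {cqd:.3f}")
```

The unrounded QD is 14.465, which the example above reports as 14.47.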
Question: If QD is small, what does that tell you about the data?
It means the middle 50% of the data is tightly packed around the median—so the group is consistent in its central values.

5. Mean Deviation – The “Average Distance” from Center 🧮

Important Point: Mean Deviation (MD) is the average of the absolute deviations from a central value (mean, median, or mode).

We use absolute values because positive and negative deviations would cancel out otherwise.

Formula (about the mean):

\[ \text{MD}(\bar{X}) = \frac{\sum |X_i - \bar{X}|}{n} \]

Example from your notes:
Visits by 10 mothers: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4
Mean = 6, Median = 5.5, Mode = 5
Total absolute deviations = 14 (for all three centers!)
So:
MD(mean) = MD(median) = MD(mode) = 14/10 = 1.4
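(The three agree here only because the mean, 6, and the mode, 5, both happen to lie between the two middle values, 5 and 6. Usually they differ!)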

Coefficient of Mean Deviation:

\[ \text{CMD} = \frac{\text{MD}}{\text{Average used}} \]

For the above:
– CMD about mean = 1.4 / 6 ≈ 0.233
– CMD about median = 1.4 / 5.5 ≈ 0.255
– CMD about mode = 1.4 / 5 = 0.28
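To check the mothers' visits example, here is a minimal Python sketch using the standard `statistics` module. The helper `mean_deviation` and the `results` dictionary are illustrative names, not textbook notation.

```python
from statistics import mean, median, mode

visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

def mean_deviation(values, center):
    """Average absolute distance of the values from `center`."""
    return sum(abs(x - center) for x in values) / len(values)

centers = {"mean": mean(visits), "median": median(visits), "mode": mode(visits)}

results = {}
for name, center in centers.items():
    md = mean_deviation(visits, center)
    results[name] = (md, md / center)  # (MD, coefficient of MD)
    print(f"{name} = {center}: MD = {md:.2f}, CMD = {md / center:.3f}")
```

The MD comes out as 1.40 for all three centers, while the coefficients differ because each one divides by a different average.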

Question: Why is MD always minimum when calculated about the median?
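Because the median is the point that minimizes the sum of absolute deviations. This is a mathematical property! If you move the center away from the median, you move closer to fewer than half of the values and farther from more than half, so the total distance can only grow.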
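You can see this property numerically with a small Python sketch. It tries a grid of candidate centers (3.0, 3.5, ..., 10.0, an arbitrary choice made for illustration) on the mothers' visits data and keeps the ones with the smallest total absolute deviation.

```python
visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]

def total_abs_deviation(values, center):
    """Sum of absolute distances from `center`."""
    return sum(abs(x - center) for x in values)

# Candidate centers 3.0, 3.5, ..., 10.0 (illustrative grid).
candidates = [c / 2 for c in range(6, 21)]
totals = {c: total_abs_deviation(visits, c) for c in candidates}

best = min(totals.values())
minimizers = [c for c, total in totals.items() if total == best]

print("Smallest total:", best)
print("Centers that achieve it:", minimizers)
```

The smallest total is 14, reached at 5.0, 5.5, and 6.0. With an even number of values, every point between the two middle values (here 5 and 6) gives the same minimum, and the median 5.5 sits right in that interval.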

6. Variance and Standard Deviation – The Gold Standard! 🥇

Important Point: Variance is the average of squared deviations from the mean. Standard deviation (SD) is its square root—so it’s back in the original units!

Why square? To make all deviations positive and give more weight to larger deviations (which is useful for detecting outliers).

Sample Variance Formula:

\[ s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \]

Note: We divide by n–1 (not n) to get an unbiased estimate of the population variance.

Example 1 (raw data):
Data: 5, 17, 12, 10 → Mean = 11
Squared deviations: (5–11)²=36, (10–11)²=1, (12–11)²=1, (17–11)²=36 → Total = 74
Variance = 74 / (4–1) = 74/3 ≈ 24.67
SD = √24.67 ≈ 4.97
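Example 1 can be reproduced in Python, once by hand and once with the standard `statistics` module (whose `variance` and `stdev` functions also divide by n – 1). The variable names are illustrative.

```python
from statistics import mean, stdev, variance

data = [5, 17, 12, 10]

x_bar = mean(data)
squared_devs = [(x - x_bar) ** 2 for x in data]

# Manual calculation: divide the total by n - 1.
s2_manual = sum(squared_devs) / (len(data) - 1)

# Library calculation for comparison.
s2_library = variance(data)
sd_library = stdev(data)

print("Sum of squared deviations:", sum(squared_devs))
print(f"Variance = {s2_manual:.2f}, SD = {sd_library:.2f}")
```

Both routes give a variance of about 24.67 and an SD of about 4.97, matching the worked example.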
Example 2 (grouped data from your notes):
Age distribution (same as Chapter 3)
Mean = 55
Sum of f(X – X̄)² = 4400
n = 75 → Variance = 4400 / 74 ≈ 59.46
SD = √59.46 ≈ 7.71
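Here is a sketch of the grouped-data calculation in Python. It assumes the age table is the one printed in Exercise 3 of this chapter (class marks 42 to 72 with frequencies 7, 10, 22, 15, 12, 6, 3); that table does reproduce the mean of 55 and the total of 4400 quoted above.

```python
from math import sqrt

# Assumed table: class marks and frequencies from Exercise 3.
class_marks = [42, 47, 52, 57, 62, 67, 72]
frequencies = [7, 10, 22, 15, 12, 6, 3]

n = sum(frequencies)
grouped_mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Each squared deviation is weighted by its class frequency.
sum_sq = sum(f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, class_marks))

grouped_variance = sum_sq / (n - 1)
grouped_sd = sqrt(grouped_variance)

print(f"n = {n}, mean = {grouped_mean}, sum of f(X - mean)^2 = {sum_sq}")
print(f"Variance = {grouped_variance:.2f}, SD = {grouped_sd:.2f}")
```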

Special Properties of SD

  • Chebyshev’s Theorem: For ANY distribution and any k > 1, at least \((1 - \frac{1}{k^2})\) of the data lies within k SDs of the mean.
  • For normal distributions: ~68% within 1 SD, ~95% within 2 SDs, ~99.7% within 3 SDs.
  • How SD responds to shifting and scaling:
    • Add constant → SD unchanged
    • Multiply by constant k → SD becomes |k| × old SD
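To see Chebyshev's bound in action, here is a minimal Python sketch. It reuses Distribution 2 from the range section as sample data and uses `pstdev` (the SD that divides by n) from the standard `statistics` module; both are choices made for this illustration only.

```python
from statistics import mean, pstdev

scores = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]

def fraction_within(values, k):
    """Share of values lying within k SDs of the mean."""
    center = mean(values)
    spread = pstdev(values)
    inside = [x for x in values if abs(x - center) <= k * spread]
    return len(inside) / len(values)

for k in (2, 3):
    bound = 1 - 1 / k**2
    actual = fraction_within(scores, k)
    print(f"k = {k}: Chebyshev guarantees at least {bound:.3f}, actual share = {actual:.3f}")
```

Even with the outlier 45 in the data, the actual share (11 of the 12 values) stays above the guaranteed minimum for both values of k.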
Exercise from your notes:
Mean = 500, SD = 10.
(a) Add 10 to each value → new SD = 10 (unchanged!)
(b) Multiply each by –5 → new SD = |–5| × 10 = 50
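You can test both rules with a small Python sketch. The three values below are a hypothetical dataset chosen only because its mean is 500 and its sample SD is exactly 10, like the exercise; your notes do not give the raw data.

```python
from statistics import mean, stdev

# Hypothetical data with mean 500 and sample SD 10.
original = [490, 500, 510]

shifted = [x + 10 for x in original]   # (a) add 10 to each value
scaled = [x * -5 for x in original]    # (b) multiply each value by -5

sd_original = stdev(original)
sd_shifted = stdev(shifted)
sd_scaled = stdev(scaled)

print("Original mean and SD:", mean(original), sd_original)
print("SD after adding 10:", sd_shifted)
print("SD after multiplying by -5:", sd_scaled)
```

Adding 10 leaves the SD at 10, and multiplying by –5 gives an SD of 50. The sign of the multiplier does not matter because SD is never negative.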
Question: Why do we divide by (n–1) for sample variance, not n?
Because using the sample mean (X̄) “uses up” one degree of freedom—so only (n–1) values are free to vary. This makes the sample variance an unbiased estimator of the true population variance.

7. Coefficient of Variation (CV) – Compare Spread Fairly! ⚖️

Important Point: CV = \(\frac{\text{SD}}{\text{Mean}} \times 100\%\). It’s a unitless measure of relative variability.

Use it to compare dispersion across different datasets—even if they’re in different units!

Example from your notes:
Firm A: Mean wage = 52.5 Birr, Variance = 100 → SD = 10 → CV = (10/52.5)×100 ≈ 19.05%
Firm B: Mean wage = 47.5 Birr, Variance = 121 → SD = 11 → CV = (11/47.5)×100 ≈ 23.16%
Firm B has greater variability in wages, even though its SD isn’t much larger!
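The two-firm comparison can be sketched in Python as follows. The function name `coefficient_of_variation` is an illustrative choice; it takes the mean and the variance, since those are what the example gives.

```python
from math import sqrt

def coefficient_of_variation(mean_value, variance):
    """CV in percent: SD divided by the mean, times 100."""
    return sqrt(variance) / mean_value * 100

cv_a = coefficient_of_variation(52.5, 100)  # Firm A
cv_b = coefficient_of_variation(47.5, 121)  # Firm B

print(f"CV of Firm A = {cv_a:.2f}%")
print(f"CV of Firm B = {cv_b:.2f}%")
print("More variable firm:", "A" if cv_a > cv_b else "B")
```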

8. Let’s Practice! Full Exercises with Answers ✍️

Exercise 1:
Monthly wages in two firms:
Value | Firm A | Firm B
Mean wage (Birr) | 52.5 | 47.5
Variance | 100 | 121
Which firm has greater wage variability?
Solution:
As above: CV(A) ≈ 19.05%, CV(B) ≈ 23.16% → Firm B is more variable.
Exercise 2:
A student’s average score is 65 (n=10). One score was misread as 40 instead of 80. Find the correct average.
Solution:
Wrong total = 65 × 10 = 650
Correct total = 650 – 40 + 80 = 690
Correct mean = 690 / 10 = 69
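The same correction works for any misread value, so here it is as a tiny Python sketch. The function name `corrected_mean` and its parameter names are illustrative.

```python
def corrected_mean(reported_mean, n, wrong_value, right_value):
    """Rebuild the total, swap the misread value, and average again."""
    wrong_total = reported_mean * n
    correct_total = wrong_total - wrong_value + right_value
    return correct_total / n

new_mean = corrected_mean(reported_mean=65, n=10, wrong_value=40, right_value=80)
print("Correct mean:", new_mean)
```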
Exercise 3:
Find MD about mean, median, and mode for:
Class | 40–44 | 45–49 | 50–54 | 55–59 | 60–64 | 65–69 | 70–74
Frequency | 7 | 10 | 22 | 15 | 12 | 6 | 3
(You already know mean = 55, median ≈ 54.16 from Chapter 3)
Answer:
This requires detailed table work. Steps:
  1. Find class marks (42, 47, 52, 57, 62, 67, 72)
  2. Compute |X – mean|, |X – median|, |X – mode|
  3. Multiply each by frequency and sum
  4. Divide by n = 75
(Full calculation left as practice—but the method is what matters!)
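If you want to check your table work, the four steps can be sketched in Python. This sketch covers the mean (55) and the median (54.16) quoted above; the mode is not given in this chapter, so pass in your own value from Chapter 3 for that case. The function name `grouped_mean_deviation` is an illustrative choice.

```python
class_marks = [42, 47, 52, 57, 62, 67, 72]   # step 1
frequencies = [7, 10, 22, 15, 12, 6, 3]

def grouped_mean_deviation(marks, freqs, center):
    """Frequency-weighted average of |class mark - center|."""
    n = sum(freqs)                                                # n = 75 here
    weighted = sum(f * abs(x - center) for f, x in zip(freqs, marks))  # steps 2 and 3
    return weighted / n                                           # step 4

md_mean = grouped_mean_deviation(class_marks, frequencies, 55)
md_median = grouped_mean_deviation(class_marks, frequencies, 54.16)

print(f"MD about mean = {md_mean:.2f}")
print(f"MD about median = {md_median:.2f}")
```

Use these results only to confirm your own hand calculation; working through the table yourself is still the best practice.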

You Did It! 🌟

Congratulations! You’ve now mastered how to measure not just the “center” but also the “spread” of data. 🎓

Remember:

  • Use range for a quick snapshot (but don’t trust it fully).
  • Use QD when you care about the middle 50% and want to ignore outliers.
  • Use MD for simple, intuitive average deviation.
  • Use SD for the most widely used, mathematically convenient measure, especially when comparing groups via CV.

Keep practicing, and soon these tools will feel like natural extensions of your statistical thinking. You’ve got this! 💪

— Your friendly stats teacher 🙂
