Chapter 3: Measures of Central Tendency
Hello, dear students! 😊 Have you ever wondered how a single number can tell you so much about a whole group of data? Like, “What’s the typical score in your class?” or “How much does an average Ethiopian family earn?” That’s exactly what measures of central tendency help us find! In this chapter, we’ll explore the mean, median, mode, and even quartiles—with real examples, clear explanations, and fun questions along the way. Ready? Let’s go! 🚀
What Is a Measure of Central Tendency?
Imagine you have 50 classmates, and everyone scored differently on a math test. Saying all 50 scores is messy! But if you say, “The average score was 72,” suddenly everyone gets the big picture. 🧠
A measure of central tendency is a single value that represents the center or “typical” value of a data set. Think of it as the “best representative” of the group. It makes huge piles of numbers easy to understand.
For this to work well, a good average should:
- Be clearly defined
- Use all the data
- Not get too messed up by extreme values (like one billionaire in a village!)
- Be easy to calculate
- Allow further mathematical treatment (like combining the averages of two groups)
The Summation Notation (Σ)
Before we dive into averages, let’s learn a super useful shortcut mathematicians use: the summation symbol Σ (sigma).
If you have data values: \(X_1, X_2, X_3, \dots, X_n\), then instead of writing \(X_1 + X_2 + \dots + X_n\), we write:
\[ \sum_{i=1}^{n} X_i \]
This just means: “Add up all the X’s from the first to the last.” Simple, right? 😃
Example: Scores of 5 students: 5, 7, 7, 6, 8.
Then: \(\sum_{i=1}^{5} X_i = 5 + 7 + 7 + 6 + 8 = 33\)
Properties of Summation
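Here are some useful rules, where \(k\), \(a\), and \(b\) are constants:
- \(\sum_{i=1}^{n} k = nk\)
- \(\sum_{i=1}^{n} kX_i = k \sum_{i=1}^{n} X_i\)
- \(\sum_{i=1}^{n} (a + bX_i) = na + b \sum_{i=1}^{n} X_i\)
- \(\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i\)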
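If you like to check things on a computer, here is a small Python sketch (Python is our choice here, not part of the lesson itself). It treats Σ as the built-in `sum()` and tests two standard summation rules on the five scores above: pulling a constant out of the sum, and splitting a sum of \(a + bX_i\). The constants `k`, `a`, and `b` are made up purely for illustration.

```python
# Sigma is just a running total; Python's built-in sum() plays the same role.
scores = [5, 7, 7, 6, 8]   # the five student scores from the example
n = len(scores)
total = sum(scores)        # sum of X_i for i = 1..n

# Illustrative constants (assumptions, not from the lesson).
k, a, b = 3, 2, 4

# Rule: sum(k * X_i) equals k * sum(X_i)
constant_rule = sum(k * x for x in scores) == k * total

# Rule: sum(a + b * X_i) equals n * a + b * sum(X_i)
linear_rule = sum(a + b * x for x in scores) == n * a + b * total

print(total, constant_rule, linear_rule)
```

Because all values are whole numbers, the comparisons are exact. With decimals you would compare with a small tolerance instead.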
Important Point → Why Summation Matters
Summation isn’t just math gymnastics! It’s the backbone of statistics. Every average, every formula uses it. Once you’re comfy with Σ, formulas become easier to read and use.
Real-life Example: A teacher adds all student scores to find the class average. That’s summation in action!
Question: If \(X = \{2, 4, 6\}\) and \(Y = \{1, 3, 5\}\), what is \(\sum (X_i - Y_i)\)?
Types of Measures of Central Tendency
There are four main types:
- Mean (Arithmetic, Geometric, Harmonic)
- Median
- Mode
- Quantiles (Quartiles, Deciles, Percentiles)
Which one you use depends on your data and what you want to learn. Let’s explore each one!
1. The Arithmetic Mean
This is what most people call “the average.” Add all the numbers and divide by how many there are.
For Raw Data
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
Example: Numbers: 2, 7, 8, 2, 7, 3, 7
Sum = 2 + 7 + 8 + 2 + 7 + 3 + 7 = 36
Count = 7
Mean = \(36 \div 7 \approx 5.14\)
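The same calculation can be sketched in Python. This is only an illustration: the variable names are ours, and `statistics.mean` from the standard library is shown as a cross-check.

```python
from statistics import mean

data = [2, 7, 8, 2, 7, 3, 7]

# "Add all the numbers and divide by how many there are."
manual_mean = sum(data) / len(data)

# The standard library gives the same value.
library_mean = mean(data)

print(round(manual_mean, 2))
```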
For Ungrouped Frequency Data
If some numbers repeat, we use frequency: \(\bar{X} = \frac{\sum f_i X_i}{\sum f_i}\)
| \(X_i\) | \(f_i\) | \(f_i X_i\) |
|---|---|---|
| 2 | 2 | 4 |
| 3 | 1 | 3 |
| 7 | 3 | 21 |
| 8 | 1 | 8 |
| Total | 7 | 36 |
Mean = \(36 \div 7 \approx 5.14\)
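As a quick sketch, the frequency table can be stored as a Python dictionary that maps each value to its frequency. The name `freq_table` is just an illustrative choice.

```python
# value -> frequency, copied from the table above
freq_table = {2: 2, 3: 1, 7: 3, 8: 1}

total_fx = sum(x * f for x, f in freq_table.items())  # sum of f_i * X_i
total_f = sum(freq_table.values())                    # sum of f_i

freq_mean = total_fx / total_f
print(total_fx, total_f, round(freq_mean, 2))
```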
For Grouped Data (Class Intervals)
We use the midpoint (class mark) \(X_i\) of each class:
\[ \bar{X} = \frac{\sum f_i X_i}{\sum f_i} \]
Example: Age distribution
| Class | Frequency \(f_i\) | Class Mark \(X_i\) | \(f_i X_i\) |
|---|---|---|---|
| 6–10 | 35 | 8 | 280 |
| 11–15 | 23 | 13 | 299 |
| 16–20 | 15 | 18 | 270 |
| 21–25 | 12 | 23 | 276 |
| 26–30 | 9 | 28 | 252 |
| 31–35 | 6 | 33 | 198 |
| Total | 100 | | 1575 |
Mean = \(1575 \div 100 = 15.75\)
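Here is one way the grouped calculation might look in Python. It is a minimal sketch that assumes each class is given as a `(lower limit, upper limit, frequency)` tuple; the function name `grouped_mean` is hypothetical.

```python
def grouped_mean(classes):
    """Mean of grouped data, using the midpoint of each class as its class mark."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

# Age distribution from the table above
age_classes = [
    (6, 10, 35),
    (11, 15, 23),
    (16, 20, 15),
    (21, 25, 12),
    (26, 30, 9),
    (31, 35, 6),
]

print(grouped_mean(age_classes))
```

Keep in mind that a grouped mean is an approximation: it pretends every value in a class sits at the class mark.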
Important Point → Coding to Simplify Calculations
When numbers are large, we can “shift” them using a guessed mean \(A\), then adjust later.
Let \(d_i = X_i - A\), then: \(\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i}\)
Why?: It reduces big numbers to smaller ones—less chance of calculator errors!
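To see that coding gives the same answer as the direct method, here is a small Python sketch using the class marks and frequencies of the age table from the grouped-data example. The assumed mean `A = 18` is an arbitrary choice; any value works.

```python
class_marks = [8, 13, 18, 23, 28, 33]
freqs = [35, 23, 15, 12, 9, 6]

A = 18  # assumed mean (an arbitrary guess, chosen for illustration)

# Deviations d_i = X_i - A are small numbers: -10, -5, 0, 5, 10, 15
deviations = [x - A for x in class_marks]

sum_fd = sum(f * d for f, d in zip(freqs, deviations))
coded_mean = A + sum_fd / sum(freqs)

# Direct method for comparison
direct_mean = sum(f * x for f, x in zip(freqs, class_marks)) / sum(freqs)

print(sum_fd, coded_mean, direct_mean)
```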
Real-life Example: Calculating average monthly income when values are in the thousands—use coding to make it easier.
Question: Deviations from assumed mean 7 are: 1, -1, -2, -2, 0, -3, -2, 2, 0, -3. Find the true mean.
Special Properties of the Arithmetic Mean
- Sum of deviations from mean = 0: \(\sum (X_i - \bar{X}) = 0\)
- Sum of squared deviations is minimized at the mean.
- Combined Mean: If two groups have means \(\bar{X}_1, \bar{X}_2\) and sizes \(n_1, n_2\), then:
\[ \bar{X}_c = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2} \]
- If you add a constant \(k\) to all values, new mean = old mean + \(k\).
- If you multiply all values by \(k\), new mean = \(k \times\) old mean.
Example: 30 girls average 60, 70 boys average 72. Class mean?
\(\bar{X}_c = \frac{30 \cdot 60 + 70 \cdot 72}{30 + 70} = \frac{6840}{100} = 68.4\)
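The combined-mean formula extends naturally to any number of groups. The sketch below assumes each group is given as a `(size, mean)` pair; `combined_mean` is a hypothetical helper name.

```python
def combined_mean(groups):
    """Combined mean of several groups, each given as a (size, mean) pair."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# 30 girls averaging 60 and 70 boys averaging 72
class_mean = combined_mean([(30, 60), (70, 72)])
print(class_mean)
```

Notice that the result is closer to 72 than to 60, because the boys' group is larger.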
Weighted Mean
When some values are more important, we give them weights \(w_i\): \(\bar{X}_w = \frac{\sum w_i X_i}{\sum w_i}\)
Example: Exam scores with weights:
- English (60), weight 1
- Biology (75), weight 2
- Math (63), weight 1
- Physics (59), weight 3
- Chemistry (55), weight 3
Weighted mean = \(\frac{1\cdot60 + 2\cdot75 + 1\cdot63 + 3\cdot59 + 3\cdot55}{1+2+1+3+3} = \frac{615}{10} = 61.5\)
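A short Python sketch of the same weighted mean follows. The function name `weighted_mean` is our own, and it assumes the weights do not add up to zero.

```python
def weighted_mean(values, weights):
    """Sum of w_i * X_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# English, Biology, Math, Physics, Chemistry
scores = [60, 75, 63, 59, 55]
weights = [1, 2, 1, 3, 3]

print(weighted_mean(scores, weights))
```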
Important Point → When to Use Weighted Mean
Use it when items have different importance, like GPA (courses with more credit hours count more) or CPI (food prices matter more than jewelry).
Real-life Example: Your final grade might be 40% midterm, 60% final. That’s a weighted mean!
Question: A student scores 80 on a quiz (weight 1) and 90 on a final (weight 3). What’s the weighted average?
Merits and Demerits of Arithmetic Mean
| Merits ✅ | Demerits ❌ |
|---|---|
| Rigidly defined | Affected by extreme values |
| Uses all data | Can’t be used with open-ended classes |
| Good for further math | Not for qualitative data (e.g., beauty) |
| Stable across samples | May not reflect “typical” if skewed |
2. The Geometric Mean (G.M.)
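Use this for **averaging ratios or growth rates** (like population growth, interest).
\[ G.M. = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} \]
Or using logs:
\[ \log G.M. = \frac{1}{n} \sum_{i=1}^{n} \log X_i \]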
Example: Find G.M. of 2, 4, 8
\(G.M. = \sqrt[3]{2 \cdot 4 \cdot 8} = \sqrt[3]{64} = 4\)
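Both routes to the geometric mean (the n-th root of the product, and the average of the logs) can be sketched in Python. The function names are illustrative, and the code assumes all values are positive. Small rounding differences are normal with floating-point numbers, so results are compared with a tolerance.

```python
import math

def geometric_mean_root(values):
    """n-th root of the product of the values (values must be positive)."""
    return math.prod(values) ** (1 / len(values))

def geometric_mean_logs(values):
    """Same quantity via logs: exp of the average log."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

data = [2, 4, 8]
gm_root = geometric_mean_root(data)
gm_logs = geometric_mean_logs(data)
print(round(gm_root, 6), round(gm_logs, 6))
```

The log version is usually preferred for long lists, because the raw product can become extremely large or small.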
3. The Harmonic Mean (H.M.)
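Use this for **averaging rates** (like speed, price per kg).
\[ H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} \]
Example: A cyclist goes to college at 10 km/h and back at 15 km/h. What’s average speed?
Use H.M. (since distance is same):
\(H.M. = \frac{2}{\frac{1}{10} + \frac{1}{15}} = \frac{2}{\frac{1}{6}} = 12\) km/h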
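The cyclist problem can be checked two ways in Python: with the harmonic mean, and from first principles as total distance divided by total time. This is only a sketch; the one-way distance of 30 km is a made-up number, and any positive distance gives the same average speed.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals (values must be non-zero)."""
    return len(values) / sum(1 / v for v in values)

speeds = [10, 15]                 # km/h going and returning
avg_speed = harmonic_mean(speeds)

# Cross-check: total distance / total time, for an assumed one-way distance.
distance = 30                     # km, hypothetical
total_time = distance / 10 + distance / 15
check_speed = 2 * distance / total_time

print(round(avg_speed, 2), round(check_speed, 2))
```

The plain arithmetic mean of 10 and 15 would give 12.5 km/h, which overstates the speed because more time is spent on the slower leg.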
Important Point → Relationship Between Means
For positive numbers: **H.M. ≤ G.M. ≤ A.M.**
They are equal only if all values are the same!
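The ordering of the three means is easy to test numerically. The sketch below uses functions from Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or newer); the sample data are chosen for illustration.

```python
import math
from statistics import mean, geometric_mean, harmonic_mean

data = [2, 4, 8]
hm, gm, am = harmonic_mean(data), geometric_mean(data), mean(data)
ordering_holds = hm <= gm <= am

# When every value is the same, the three means coincide (up to rounding).
same = [5, 5, 5]
all_equal = (
    math.isclose(harmonic_mean(same), 5)
    and math.isclose(geometric_mean(same), 5)
    and math.isclose(mean(same), 5)
)

print(round(hm, 3), round(gm, 3), round(am, 3), ordering_holds, all_equal)
```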
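Real-life Example: If a company’s profits grow 10%, then 20%, the average growth rate comes from the G.M. of the growth factors: \(\sqrt{1.10 \times 1.20} \approx 1.149\), or about 14.9% per period, not the A.M. of 15%!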
Question: Two numbers have A.M. = 10 and G.M. = 8. What are the numbers?
4. The Mode
The **most frequent value** in a data set.
- No mode? (all values unique)
- Bimodal? (two modes)
- Multimodal? (many modes)
Data: 5, 3, 5, 8, 9 → Mode = 5
Data: 8, 9, 9, 7, 8, 2, 5 → Bimodal (8 and 9)
Data: 4, 12, 3, 6, 7 → No mode
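Counting frequencies by hand gets tedious for long lists. Here is a minimal Python sketch using `collections.Counter`. The helper name `modes` is ours, it assumes the list is not empty, and it follows this lesson's convention that a data set where every value appears once has no mode.

```python
from collections import Counter

def modes(data):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values unique
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([5, 3, 5, 8, 9]))
print(modes([8, 9, 9, 7, 8, 2, 5]))
print(modes([4, 12, 3, 6, 7]))
```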
Mode for Grouped Data
Use this formula for continuous data:
\[ \text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times w \]
Where:
- \(L\) = lower boundary of modal class (class with highest frequency)
- \(w\) = class width
- \(\Delta_1 = f_{\text{mode}} - f_{\text{previous}}\)
- \(\Delta_2 = f_{\text{mode}} - f_{\text{next}}\)
Example: Farm size distribution
| Size (hectares) | No. of farms |
|---|---|
| 5–15 | 8 |
| 15–25 | 12 |
| 25–35 | 17 |
| 35–45 | 29 |
| 45–55 | 31 |
| 55–65 | 5 |
| 65–75 | 3 |
Modal class = 45–55 (freq = 31)
\(L = 45\), \(w = 10\), \(f_{\text{prev}} = 29\), \(f_{\text{next}} = 5\)
\(\Delta_1 = 31 - 29 = 2\), \(\Delta_2 = 31 - 5 = 26\)
\(\text{Mode} = 45 + \frac{2}{2 + 26} \times 10 \approx 45.71\) hectares
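The grouped-mode steps can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and the sketch makes a few assumptions: classes are listed in order as `(lower boundary, upper boundary, frequency)`, there is a single modal class, and a missing neighbour (when the modal class is first or last) counts as frequency 0.

```python
def grouped_mode(classes):
    """Mode = L + d1 / (d1 + d2) * w for the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))            # position of the modal class
    lower, upper, f_mode = classes[i]
    f_prev = freqs[i - 1] if i > 0 else 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = f_mode - f_prev
    d2 = f_mode - f_next
    return lower + d1 / (d1 + d2) * (upper - lower)

# Farm size distribution from the table above
farms = [
    (5, 15, 8), (15, 25, 12), (25, 35, 17), (35, 45, 29),
    (45, 55, 31), (55, 65, 5), (65, 75, 3),
]

print(round(grouped_mode(farms), 2))
```

The mode lands near the lower end of the 45–55 class because the class just before it (29 farms) is much fuller than the class after it (5 farms).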
Important Point → When Mode Shines
Mode is great for **categorical data** and **business decisions**—like “What’s the most common shoe size?” or “Most popular phone brand?”
It’s not affected by outliers!
Real-life Example: A clothing store stocks more of the modal size, not the average size—because that’s what sells!
Question: In a class, shoe sizes are: 38 (5 students), 39 (12), 40 (8), 41 (3). What’s the mode?
5. The Median
The **middle value** when data is ordered. Half are below, half above.
For Ungrouped Data
- If \(n\) is odd: median = middle value
- If \(n\) is even: median = average of two middle values
Data: 6, 5, 2, 8, 9, 4 → ordered: 2, 4, 5, 6, 8, 9 → \(n=6\)
Median = \(\frac{5 + 6}{2} = 5.5\)
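Here is a small Python sketch of the odd/even rule. The function name `median_of` is our own, and it assumes a non-empty list. The standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_of(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [6, 5, 2, 8, 9, 4]
print(median_of(data), median(data))
```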
For Grouped Data
\[ \text{Median} = L + \frac{\frac{n}{2} - c}{f_{\text{med}}} \times w \]
Where:
- \(L\) = lower boundary of median class
- \(c\) = cumulative frequency before median class
- \(f_{\text{med}}\) = frequency of median class
- \(w\) = class width
- Median class = first class where cumulative freq ≥ \(n/2\)
Example: Marks distribution
| Class | Frequency | Cum. Freq |
|---|---|---|
| 40–44 | 7 | 7 |
| 45–49 | 10 | 17 |
| 50–54 | 22 | 39 |
| 55–59 | 15 | 54 |
| 60–64 | 12 | 66 |
| 65–69 | 6 | 72 |
| 70–74 | 3 | 75 |
\(n = 75\), so \(n/2 = 37.5\). Median class = 50–54 (cum.freq = 39 ≥ 37.5)
\(L = 49.5\), \(c = 17\), \(f_{\text{med}} = 22\), \(w = 5\)
\(\text{Median} = 49.5 + \frac{37.5 - 17}{22} \times 5 \approx 54.16\)
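The search for the median class and the interpolation step can be sketched in Python. This is an illustration under stated assumptions: `grouped_median` is a hypothetical name, and each class is given by its boundaries (so 50–54 becomes 49.5 to 54.5) plus its frequency.

```python
def grouped_median(classes):
    """Median = L + (n/2 - c) / f_med * w, with classes as (lower, upper, freq)."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the table has no observations")
    target = n / 2
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= target:       # first class reaching n/2
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f

# Marks distribution, written with class boundaries
marks = [
    (39.5, 44.5, 7), (44.5, 49.5, 10), (49.5, 54.5, 22), (54.5, 59.5, 15),
    (59.5, 64.5, 12), (64.5, 69.5, 6), (69.5, 74.5, 3),
]

print(round(grouped_median(marks), 2))
```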
Important Point → Why Median Beats Mean in Skewed Data
Median ignores extremes! In Ethiopia, average wealth is misleading because of a few billionaires—but median income tells the real story of “a typical person.”
Real-life Example: Suppose house prices in Addis have a mean of 5 million birr (pulled up by luxury villas) but a median of 1.2 million birr. The median is much closer to what a typical buyer pays.
Question: Incomes (birr): 2000, 2500, 3000, 3500, 100000. What’s better: mean or median?
Merits and Demerits of Median
| Merits ✅ | Demerits ❌ |
|---|---|
| Not affected by outliers | Not based on all values |
| Works with open-ended classes | Not good for further algebra |
| Easy to understand | Less stable in small samples |
6. Quantiles: Quartiles, Deciles, Percentiles
These split data into equal parts:
- Quartiles (Q): 4 parts → Q1 (25%), Q2 (50% = median), Q3 (75%)
- Deciles (D): 10 parts → D1 to D9
- Percentiles (P): 100 parts → P1 to P99
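Formula for Grouped Data
\[ Q_i = L + \frac{\frac{iN}{4} - c}{f_Q} \times w \]
Here \(L\) is the lower boundary of the quartile class, \(c\) is the cumulative frequency before it, \(f_Q\) is its frequency, and \(w\) is the class width.
Same idea for deciles (\(iN/10\)) and percentiles (\(iN/100\)).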
Example: Find Q1, Q2, Q3 from this data:
| Values | Frequency | Cum. Freq |
|---|---|---|
| 140–150 | 17 | 17 |
| 150–160 | 29 | 46 |
| 160–170 | 42 | 88 |
| 170–180 | 72 | 160 |
| 180–190 | 84 | 244 |
| 190–200 | 107 | 351 |
| 200–210 | 49 | 400 |
| 210–220 | 34 | 434 |
| 220–230 | 31 | 465 |
| 230–240 | 16 | 481 |
| 240–250 | 12 | 493 |
Total \(N = 493\)
Q1: \(N/4 = 123.25\) → class = 170–180
\(L = 170\), \(c = 88\), \(f_Q = 72\), \(w = 10\)
\(Q_1 = 170 + \frac{123.25 - 88}{72} \times 10 \approx 174.90\)
Q2 (median): \(246.5\) → class = 190–200 → \(Q_2 = 190.23\)
Q3: \(369.75\) → class = 200–210 → \(Q_3 = 203.83\)
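Since quartiles, deciles, and percentiles all use the same interpolation idea, one function can cover them all. The Python sketch below is a hedged illustration: `grouped_quantile` is a hypothetical name, `fraction` is the share of the data below the quantile (0.25 for Q1, 0.9 for P90, and so on), and classes are `(lower boundary, upper boundary, frequency)` tuples.

```python
def grouped_quantile(classes, fraction):
    """Quantile = L + (fraction * N - c) / f * w for grouped data."""
    total = sum(f for _, _, f in classes)
    if total == 0:
        raise ValueError("the table has no observations")
    target = fraction * total
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= target:       # first class reaching the target
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f

# Frequency table from the example above
table = [
    (140, 150, 17), (150, 160, 29), (160, 170, 42), (170, 180, 72),
    (180, 190, 84), (190, 200, 107), (200, 210, 49), (210, 220, 34),
    (220, 230, 31), (230, 240, 16), (240, 250, 12),
]

q1, q2, q3 = (grouped_quantile(table, i / 4) for i in (1, 2, 3))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Calling `grouped_quantile(table, 0.9)` would give the 90th percentile in the same way.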
Important Point → Why Quantiles Matter in Real Life
They show inequality! In Ethiopia, if P90 income is 10x P10, that tells us about economic gaps.
Schools use percentiles: “You scored in the 85th percentile” means you scored as well as or better than about 85% of students! 🎓
Question: In a test, P25 = 45, P50 = 60, P75 = 80. What does this tell you about difficulty?
Summary & Final Thoughts
So there you have it! 🌟
- Use mean for symmetric, numerical data.
- Use median for skewed data or outliers.
- Use mode for categorical data or “most popular.”
- Use quartiles to understand spread and inequality.
Remember: No single average is “best.” Choose based on your data and goal! 😊
Practice Exercises (With Answers)
Exercise 1: Missing Frequencies
Marks of 75 students:
| Marks | No. of students |
|---|---|
| 40–44 | 7 |
| 45–49 | 10 |
| 50–54 | 22 |
| 55–59 | f4 |
| 60–64 | f5 |
| 65–69 | 6 |
| 70–74 | 3 |
If 20% of students scored between 55–59, find f4, f5, and the mean.
Total students = 75 → 20% of 75 = 15 → f4 = 15
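So f5 = 75 - (7+10+22+15+6+3) = 75 - 63 = 12
Now compute the mean with the grouped formula, using class marks 42, 47, 52, 57, 62, 67, 72: \(\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{4125}{75} = 55\)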
Exercise 2: Correcting a Mistake
Average weight of 10 students = 65 kg. Later, one weight was misread as 40 instead of 80. Find correct average.
Correct mean = Wrong mean + (Correct - Wrong)/n = 65 + (80 - 40)/10 = 65 + 4 = 69 kg
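The correction can be double-checked in Python by going through the totals instead of the shortcut formula. This is just a sketch with illustrative variable names.

```python
n = 10
wrong_mean = 65
wrong_value, correct_value = 40, 80

# Shortcut: adjust the mean directly.
shortcut = wrong_mean + (correct_value - wrong_value) / n

# Long way: rebuild the total, swap the misread value, divide again.
wrong_total = wrong_mean * n
correct_total = wrong_total - wrong_value + correct_value
long_way = correct_total / n

print(shortcut, long_way)
```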
Exercise 3: Transformations
Mean of X is 500. What’s the new mean if:
(a) 10 is added to each number?
(b) Each number is multiplied by -5?
(a) New mean = 500 + 10 = 510
(b) New mean = -5 × 500 = -2500
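You can convince yourself of both rules with a tiny Python experiment. The three numbers below are made up so that their mean is 500; any data set with that mean would behave the same way.

```python
def mean_of(values):
    return sum(values) / len(values)

x = [480, 500, 520]                     # hypothetical data with mean 500

shifted = [value + 10 for value in x]   # (a) add 10 to each number
scaled = [value * -5 for value in x]    # (b) multiply each number by -5

print(mean_of(x), mean_of(shifted), mean_of(scaled))
```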
You’ve made it through Chapter 3! 🎉 Keep practicing, and soon these concepts will feel like second nature. Remember, every great statistician started exactly where you are today. 👏