Measures of Central Tendency
Measures of central tendency are important concepts in statistics that help us find the central or typical value in a set of data. These measures give us an idea about where most of the values fall in a data set. They are widely used to summarize data and are helpful in a variety of fields, including economics, education, and healthcare.
The three main measures of central tendency are the mean, median, and mode. Each measure serves different purposes and may be more appropriate for certain types of data than others.
Mean
The mean is the most commonly used measure of central tendency. It is often referred to simply as "the average." The mean is calculated by adding up all the values in a data set and then dividing by the number of values. It is most representative of the data when there are no extreme values (outliers).
Mean = (sum of all values in the data set) / (number of values in the data set)
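The same formula can be written in standard notation. The symbols are conventional and not taken from the text above: n is the number of values, x_1, x_2, ..., x_n are the values, and \bar{x} ("x-bar") denotes the mean.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```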
Let us consider an example:
Imagine we have the following data set showing students' marks in a math test:
Scores: 78, 85, 90, 95, 100
To calculate the average score, we first add all the scores together:
Total = 78 + 85 + 90 + 95 + 100 = 448
Next, we divide the total by the number of scores. In this case, there are 5 scores:
Mean = 448 / 5 = 89.6
The average score for this group of students is 89.6.
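The calculation above can be checked with a short Python sketch. It assumes Python 3 and uses the standard-library `statistics.mean` only as a cross-check of the hand calculation.

```python
from statistics import mean

# Math test scores from the example above
scores = [78, 85, 90, 95, 100]

total = sum(scores)          # 78 + 85 + 90 + 95 + 100 = 448
count = len(scores)          # 5 scores
manual_mean = total / count  # 448 / 5 = 89.6

# Both the hand calculation and statistics.mean give 89.6
print(manual_mean)
print(mean(scores))
```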
Visual example: each score is drawn as a colored circle, and a dashed line marks the mean, the average position of all the data points.
Median
The median is the middle value of a data set that has been sorted in ascending or descending order. If the number of values is odd, the median is the middle value; if it is even, the median is the average of the two middle values. The median is useful for describing the center of a data set that contains outliers or is skewed.
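The odd/even rule can also be written as a formula. Here x_(1) ≤ x_(2) ≤ ... ≤ x_(n) denote the n values after sorting in ascending order; this notation is a standard convention introduced for this sketch, not part of the text above.

```latex
\text{Median} =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd},\\[6pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even}.
\end{cases}
```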
Consider the same set of scores:
Scores: 78, 85, 90, 95, 100
To find the median, arrange the numbers in order and find the middle score:
In order: 78, 85, 90, 95, 100
Median = 90 (the third value in a set of five)
If we add one more score, say 82, the new data set (in order) will be:
Scores: 78, 82, 85, 90, 95, 100
Since we have six numbers, we take the average of the two middle numbers, 85 and 90:
Median = (85 + 90) / 2 = 87.5
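A minimal Python sketch of the odd/even rule, assuming a non-empty list of numbers. `median_by_hand` is a hypothetical helper written for illustration; the standard-library `statistics.median` is called alongside it only to confirm the results.

```python
from statistics import median

def median_by_hand(values):
    """Hypothetical helper: sort, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)  # arrange the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

five_scores = [78, 85, 90, 95, 100]
six_scores = five_scores + [82]  # add the extra score 82 (left unsorted on purpose)

print(median_by_hand(five_scores), median(five_scores))  # 90 90
print(median_by_hand(six_scores), median(six_scores))    # 87.5 87.5
```

Appending 82 at the end leaves the list unsorted, which shows why sorting is the first step: without it, the "middle" positions would hold the wrong values.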
Visual example: a dashed line marks the median, dividing the ordered data into two equal halves.
Mode
The mode is the value that appears most often in a data set. A data set may have one mode, more than one mode, or no mode. The mode is particularly useful for qualitative (categorical) data, where we count how often each category occurs.
Let's work through an example:
Data: 5, 8, 9, 8, 10, 15, 8, 22
Here, the number 8 appears most frequently. So, the mode of this data set is 8.
Now let's add two more 9s to the data set:
Data: 5, 8, 9, 8, 10, 9, 15, 8, 9, 22
In this new data set, the numbers 8 and 9 both appear most frequently. This means that the data set is bimodal, with two modes: 8 and 9.
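The counting behind the mode can be sketched in Python with `collections.Counter`. The helper `modes` is hypothetical, and its rule of returning an empty list when every distinct value occurs equally often (treated here as "no mode") is an assumed convention; textbooks differ on this case, and the standard-library `statistics.multimode` would instead return all of the tied values. The `colors` list is made-up categorical data for illustration.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: return every most-frequent value, in order of
    first appearance. Returns [] when all distinct values occur equally
    often, which this sketch treats as "no mode"."""
    counts = Counter(values)  # value -> number of occurrences
    if not counts:
        return []
    top = max(counts.values())
    result = [value for value, count in counts.items() if count == top]
    if len(counts) > 1 and len(result) == len(counts):
        return []  # every value ties, so no single value stands out
    return result

data_one_mode = [5, 8, 9, 8, 10, 15, 8, 22]
data_two_modes = [5, 8, 9, 8, 10, 9, 15, 8, 9, 22]
colors = ["red", "blue", "red", "green", "blue", "red"]  # made-up categorical data

print(modes(data_one_mode))   # [8]
print(modes(data_two_modes))  # [8, 9]  (bimodal)
print(modes(colors))          # ['red']
print(modes([3, 1, 2]))       # []  (no mode)
```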
Visual example: the largest circles mark the modes, the values that occur most often in the data set.
In statistical analysis, it is important to understand when to use each measure of central tendency. Each measure reveals different aspects of the data. Choosing the right measure may depend on the nature of the data being analyzed and the specific insights needed.
Comparison of mean, median and mode
Each measure of central tendency has its own strengths and weaknesses:
- Mean: Best for data without outliers, because it takes every value into account. However, a single extreme value can pull it far from the typical value.
- Median: Ideal for skewed distributions or ordinal data, as it is largely unaffected by extreme values.
- Mode: Useful for identifying the most frequently occurring items in categorical data, and can help describe the shape of a distribution (for example, whether it has one peak or two).
Consider the following data set, which contains an outlier:
Data: 2, 4, 4, 4, 5, 7, 9, 70
The numbers mostly range from 2 to 9, but one value, 70, is much larger than the rest. A value like this is called an outlier.
Mean:
Mean = (2 + 4 + 4 + 4 + 5 + 7 + 9 + 70) / 8 = 105 / 8 ≈ 13.1
The mean, about 13.1, is larger than seven of the eight values, so it does not reflect the typical value in the data set. The outlier 70 pulls it upward.
Median:
To find the median, first arrange the data in order:
2, 4, 4, 4, 5, 7, 9, 70
Since there are 8 values (an even number), the median is the average of the 4th and 5th values: (4 + 5) / 2 = 4.5
The median of 4.5 better represents the central value of this data, because it is not pulled upward by the outlier.
Mode:
The most frequently occurring value is 4 (it appears three times).
In this case, the mode shows the most common value in the data set.
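The outlier example can be reproduced with a short Python sketch using the standard-library `statistics` module (Python 3 assumed). As an extra illustration not in the text above, it also removes the outlier and recomputes each measure, to show which ones move.

```python
from statistics import mean, median, mode

# Data set from the outlier example above
data = [2, 4, 4, 4, 5, 7, 9, 70]

print(sum(data))     # total: 105
print(mean(data))    # mean: 105 / 8 = 13.125 (about 13.1)
print(median(data))  # median: (4 + 5) / 2 = 4.5
print(mode(data))    # mode: 4 (appears three times)

# Remove the outlier and recompute: the mean changes a lot,
# while the median and mode barely move
without_outlier = [x for x in data if x != 70]
print(mean(without_outlier))    # mean: 35 / 7 = 5
print(median(without_outlier))  # median: 4 (the 4th of 7 sorted values)
print(mode(without_outlier))    # mode: still 4
```

Dropping a single value shifts the mean from about 13.1 to 5, while the median only moves from 4.5 to 4 and the mode stays at 4, which is the sense in which the median and mode are resistant to outliers.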
Choosing the best measure
Choosing the right measure of central tendency depends on the nature of the data and the specific questions you want to answer:
- If there are no outliers: The mean may be a good choice.
- If the data is skewed: The median is often more representative.
- If the data is categorical, or discrete with many repeated values: The mode can provide important information.
In summary, the mean, median, and mode are powerful tools for summarizing data. Each has its own unique strengths that make it appropriate for different situations. By understanding these differences and practicing with data, you can decide which measure to use for robust data analysis.