Descriptive Statistics
Descriptive statistics is the branch of statistics that summarizes a set of data. Such summaries can be either quantitative, using numerical calculations, or visual, through charts and graphs. Each descriptive statistic reduces a large amount of data to a single, easily understood summary.
Types of descriptive statistics
Descriptive statistics are divided into measures of central tendency and measures of variability or dispersion.
Measures of central tendency
Measures of central tendency describe the center of a data set. There are three main measures:
- Mean
- Median
- Mode
Mean
The mean is the average of the data set. It is calculated by adding up all the numbers and dividing by the count of the numbers.
Mean = (Sum of all data points) / (Number of data points)
Example:
Data: 2, 3, 5, 7, 11
Mean = (2 + 3 + 5 + 7 + 11) / 5 = 5.6
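As a minimal sketch, the same calculation can be written in Python. The variable names are illustrative only; `statistics.mean` is part of the Python standard library.

```python
from statistics import mean

data = [2, 3, 5, 7, 11]

# Mean by the definition: sum of the values divided by how many there are.
manual_mean = sum(data) / len(data)

# The standard library function should agree with the manual calculation.
library_mean = mean(data)

print(manual_mean, library_mean)
```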
Median
The median is the middle value of an ordered data set. If the number of data points is odd, the median is the middle number. If even, it is the average of the two middle numbers.
Example of odd number of data points:
Data: 3, 5, 7, 9, 11
Median = 7
Example of even number of data points:
Data: 3, 5, 7, 9
Median = (5 + 7)/2 = 6
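The odd and even cases can be sketched in Python as follows. The helper `median_by_hand` is a hypothetical name introduced here for illustration; `statistics.median` from the standard library follows the same rule.

```python
from statistics import median

def median_by_hand(values):
    """Return the middle value of the sorted data (illustrative helper)."""
    ordered = sorted(values)          # the median requires ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median_by_hand([3, 5, 7, 9, 11])
even_result = median_by_hand([3, 5, 7, 9])

print(odd_result, even_result)
print(median([3, 5, 7, 9, 11]), median([3, 5, 7, 9]))
```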
Mode
The mode is the number that appears most often in a data set. A set of data may have one mode, more than one mode, or no mode.
Example:
Data: 4, 4, 6, 8, 2, 4, 10
Mode = 4
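A short Python sketch of the same idea, assuming Python 3.8 or later for `statistics.multimode`. Note one difference in convention: when every value occurs exactly once, `multimode` returns all of the values, whereas the text above would describe such a data set as having no mode.

```python
from collections import Counter
from statistics import multimode

data = [4, 4, 6, 8, 2, 4, 10]

# Count how often each value occurs.
counts = Counter(data)

# multimode returns a list, so it also handles data with several modes.
modes = multimode(data)
two_modes = multimode([1, 1, 2, 2])

print(counts[4], modes, two_modes)
```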
Measures of variability
Measures of variability describe the spread of data within a data set. Key measures include:
- Range
- Variance
- Standard deviation
Range
The range is the difference between the highest and lowest values in a data set.
Range = (Maximum value) - (Minimum value)
Example:
Data: 3, 7, 8, 15, 20
Range = 20 - 3 = 17
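In Python, a minimal sketch of this calculation needs only the built-in `max` and `min` functions. The name `data_range` is chosen here to avoid shadowing the built-in `range`.

```python
data = [3, 7, 8, 15, 20]

# Range: difference between the largest and smallest values.
data_range = max(data) - min(data)

print(data_range)
```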
Variance
Variance measures how far the values in a data set are spread out from the mean. It is calculated as the average of the squared deviations from the mean. The formula below divides by N, the number of data points, which gives the population variance.
Variance = (Σ (xi - Mean)^2) / N
Example:
Data: 3, 7, 7, 19
Mean = (3 + 7 + 7 + 19) / 4 = 9
Variance = [(3-9)^2 + (7-9)^2 + (7-9)^2 + (19-9)^2] / 4 = (36 + 4 + 4 + 100) / 4 = 36
Standard deviation
The standard deviation is the square root of the variance. It measures the typical distance of the data points from the mean, in the same units as the data.
Standard Deviation = √Variance
Example:
Data: 3, 7, 7, 19
Variance = 36
Standard deviation = √36 = 6
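The variance and standard deviation formulas can be checked step by step with a short Python sketch. It applies the divide-by-N (population) formula to the data 3, 7, 7, 19 and compares the result with the standard library. The variable names are illustrative. Note that `statistics.pvariance` and `statistics.pstdev` divide by N, while `statistics.variance` and `statistics.stdev` divide by N - 1 and would give different values.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [3, 7, 7, 19]

# Step 1: mean of the data.
data_mean = sum(data) / len(data)

# Step 2: squared deviation of each value from the mean.
squared_deviations = [(x - data_mean) ** 2 for x in data]

# Step 3: population variance is the average of the squared deviations.
variance = sum(squared_deviations) / len(data)

# Step 4: standard deviation is the square root of the variance.
std_dev = sqrt(variance)

print(squared_deviations, variance, std_dev)
print(pvariance(data), pstdev(data))
```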
Visualization of descriptive statistics
Descriptive statistics can be represented using a variety of graphical techniques. These include histograms, bar charts, pie charts, box plots, and scatter plots.
Bar chart
Bar charts are used to display categorical data, with rectangular bars indicating the frequency of each category. The length of the bars is proportional to the number of cases in each category.
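As a rough sketch, the frequencies behind a bar chart can be computed with `collections.Counter` and drawn as text. The survey responses below are made up purely for illustration.

```python
from collections import Counter

# Hypothetical categorical data, invented for this example.
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

# Frequency of each category.
frequencies = Counter(responses)

# Draw one text bar per category; bar length equals the category count.
for category, count in frequencies.most_common():
    print(f"{category:<6}{'#' * count} ({count})")
```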
Histogram
Histograms display the frequency distribution of continuous data. The data range is divided into intervals (bins), and the height of each bar shows how many data points fall into that interval.
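The counting step behind a histogram can be sketched in plain Python. The measurements, the starting point, and the bin width below are all assumptions chosen for illustration; a plotting library would normally handle the binning itself.

```python
# Hypothetical continuous measurements, invented for this example.
values = [1.2, 2.5, 2.7, 3.1, 3.8, 4.0, 4.4, 5.9]

start = 1.0       # left edge of the first bin (assumed)
bin_width = 1.0   # width of every bin (assumed)
num_bins = 5      # bins cover 1.0 up to 6.0

counts = [0] * num_bins
for v in values:
    # Index of the bin containing v; the last bin also takes the right edge.
    index = min(int((v - start) // bin_width), num_bins - 1)
    counts[index] += 1

# Print one text bar per bin.
for i, count in enumerate(counts):
    left = start + i * bin_width
    print(f"{left:.1f}-{left + bin_width:.1f}: {'#' * count}")
```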
Box plot
Box plots display the distribution of data based on a five-number summary: minimum, first quartile, median, third quartile, and maximum.
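A five-number summary can be sketched with `statistics.quantiles` (Python 3.8 or later). The data set is reused from the range example. Quartiles have several competing definitions, so the choice of `method="inclusive"` here is an assumption, and other tools may report slightly different first and third quartiles.

```python
from statistics import quantiles

data = [3, 7, 8, 15, 20]
ordered = sorted(data)

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")

# Five-number summary: minimum, Q1, median, Q3, maximum.
summary = (ordered[0], q1, q2, q3, ordered[-1])

print(summary)
```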
Pie chart
Pie charts display proportional data, with each slice representing a part of the whole. They are particularly effective for showing part-to-whole relationships.
Scatter plot
Scatter plots are used to show the relationship between two variables. Each observation is plotted as a point whose horizontal position is given by the value of one variable and whose vertical position is given by the value of the other.
Importance of descriptive statistics
Descriptive statistics provide a simple summary of samples and measurements, giving a quick overview of a data set. They also form the basis for further statistical analysis, including inferential statistics.
Charts and plots make data understandable at a glance and highlight important characteristics of a data set, such as trends, fluctuations, and relationships between variables.
In practice, these tools are used in fields such as science, finance, business analysis, and economics, where a concise summary of a data set can guide decision-making.
In short, descriptive statistics simplify and communicate complex data in an understandable and actionable way by reducing large collections of numbers to a few digestible summaries.