Inferential Statistics


Inferential statistics is a branch of statistics that deals with drawing conclusions about a population based on a sample. We usually collect data from a sample rather than from the entire population because of practical constraints such as cost, time, and effort. Inferential statistics allows us to make predictions or inferences about the population by analyzing this sample data.

Understanding the basics

To understand inferential statistics, we need to be clear about two basic concepts: population and sample.

  • Population: This is the entire group of people, objects, or events we are interested in. For example, if we are studying the average height of university students, our population consists of all university students.
  • Sample: A subset of the population that is actually observed or collected for study. In our example, this could be a group of 100 university students selected at random.

Inferential statistics goes beyond simply describing the properties of a sample (descriptive statistics). It uses probability theory to estimate population parameters, test hypotheses, and make predictions.

Example of population and sample

In the accompanying figure, each circle represents an individual in the population. The circles highlighted in red form the sample selected from the population.

Key procedures in inferential statistics

There are two main procedures used in inferential statistics:

  • Estimation: This involves estimating population parameters (such as the mean or proportion) from sample statistics. For example, if we want to estimate the average height of all university students, we calculate the average height of our sample and use it as an estimate of the population mean.
  • Hypothesis testing: This involves making a claim or hypothesis about a population parameter and then using sample data to test that claim. For example, we might hypothesize that the average height of university students is 170 cm and test this hypothesis using sample data.

Estimation example

Suppose we select a sample of 100 students and find that their average height is 168 cm. This sample mean (168 cm) is used to estimate the population mean. We represent it as follows:

Estimated Population Mean = Sample Mean = 168 cm
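
The calculation can be sketched in Python. This is only an illustration: the text does not list individual measurements, so the eight heights below are made-up values chosen to average 168 cm.

```python
import statistics

# Hypothetical heights in cm for a small sample (illustrative values only;
# the example in the text uses 100 students whose data are not listed).
sample_heights = [165.0, 172.0, 168.0, 170.0, 163.0, 171.0, 167.0, 168.0]

# The sample mean serves as the point estimate of the population mean.
sample_mean = statistics.mean(sample_heights)

# The standard error indicates how much this estimate would vary
# from one random sample to another.
standard_error = statistics.stdev(sample_heights) / len(sample_heights) ** 0.5

print(f"Estimated population mean: {sample_mean:.1f} cm")
print(f"Standard error of the mean: {standard_error:.2f} cm")
```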

Hypothesis testing example

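Suppose we hypothesize that the average height of university students is 170 cm. We collect a sample and calculate the average height to be 168 cm. Based on this data, inferential statistics helps us decide whether to reject or fail to reject our hypothesis. Note that we never "accept" a hypothesis outright; we only conclude that the data do not provide enough evidence against it.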

Types of estimation

There are two types of estimation in inferential statistics:

  • Point estimation: Provides a single value as an estimate of a population parameter. For example, using a sample mean of 168 cm as an estimate of the population mean.
  • Interval estimation: Provides a range of values, called a confidence interval, within which the population parameter is expected to lie. For example, estimating that the average height is between 165 cm and 171 cm with a 95% confidence level.

Confidence interval example

Based on our sample of 100 students with a mean height of 168 cm, suppose the 95% confidence interval for the population mean runs from 165 cm to 171 cm:

Confidence Interval: (165 cm, 171 cm)

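This means that we are 95% confident that the true average height of all university students falls within this range. More precisely, if we repeated the sampling many times and built an interval from each sample in the same way, about 95% of those intervals would contain the true population mean.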
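
As a rough sketch of where such an interval comes from, the code below uses the large-sample normal approximation. The sample mean and sample size are from the example; the sample standard deviation of 15.3 cm is an assumption (the text does not state it), chosen because it approximately reproduces the interval above.

```python
from statistics import NormalDist

sample_mean = 168.0   # cm, from the example
n = 100               # sample size, from the example
sample_sd = 15.3      # cm, ASSUMED: not given in the text
confidence = 0.95

# Critical value of the standard normal distribution (about 1.96 for 95%).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error = critical value * standard error of the mean.
margin = z_crit * sample_sd / n ** 0.5

lower, upper = sample_mean - margin, sample_mean + margin
print(f"{confidence:.0%} confidence interval: ({lower:.1f} cm, {upper:.1f} cm)")
```

For small samples, the critical value would normally be taken from a t distribution rather than the normal distribution.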

Elements of hypothesis testing

When conducting a hypothesis test, we follow these steps:

  • Formulate the hypotheses: The null hypothesis (H0) represents no effect or no difference, while the alternative hypothesis (H1) represents the effect or difference we want to test. For example:
      H0: The population mean height is 170 cm.
      H1: The population mean height is not 170 cm.
  • Choose a significance level (α): Usually chosen as 5% (0.05), this is the probability of rejecting the null hypothesis when it is actually true.
  • Calculate the test statistic: Which statistic to compute depends on the data collected and the type of test being conducted (e.g., t-test, z-test).
  • Make a decision: Compare the test statistic to the critical value or use the p-value to decide whether to reject or fail to reject the null hypothesis.
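
The four steps above can be sketched as a two-sided z-test using the critical-value approach. The function name `z_test_two_sided` is a hypothetical helper, and the standard deviation of 15.3 cm is an assumption treated as known; the text gives only the sample mean (168 cm) and the hypothesized mean (170 cm).

```python
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu0, sd, n, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (critical-value approach)."""
    # Step 3: test statistic, the distance from mu0 in standard-error units.
    z = (sample_mean - mu0) / (sd / n ** 0.5)
    # Critical value for the chosen significance level (about 1.96 for 0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 4: reject H0 only if the statistic falls in either tail.
    reject = abs(z) > z_crit
    return z, z_crit, reject

# Steps 1 and 2: H0 says the mean is 170 cm; alpha is 0.05.
# sd=15.3 is ASSUMED for illustration and is not given in the text.
z, z_crit, reject = z_test_two_sided(168.0, 170.0, sd=15.3, n=100)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print("Reject H0" if reject else "Fail to reject H0")
```

With this assumed standard deviation the test fails to reject H0, which agrees with the 95% confidence interval (165 cm, 171 cm) containing 170 cm. A different standard deviation could lead to a different decision.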

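Visualizing test decisions with null and alternative hypotheses

[Figure: distribution of the test statistic with three labeled regions. The two outer regions are labeled "Reject H0" and the central region is labeled "Fail to reject H0".]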

P-value in hypothesis testing

The p-value is an important concept in hypothesis testing. It is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true. The lower the p-value, the stronger the evidence against the null hypothesis. If the p-value is less than or equal to the significance level (α), we reject the null hypothesis.

p-value example

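Imagine that a test of our height hypothesis yields a p-value of 0.03 (an illustrative value; it is not derived from the confidence interval example above, whose interval contains 170 cm):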

p-value = 0.03

Since 0.03 < 0.05 (our chosen α), we reject the null hypothesis and conclude that the data provide evidence that the average height is not 170 cm.
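
To show how a p-value relates to a test statistic, here is a minimal sketch for a two-sided z-test. The observed statistic of 2.17 is a hypothetical value chosen for illustration (it gives a p-value close to 0.03), and `two_sided_p_value` is a helper name introduced here.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
z_observed = 2.17  # HYPOTHETICAL test statistic, chosen for illustration

p_value = two_sided_p_value(z_observed)
print(f"p-value = {p_value:.3f}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```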

Common tests in inferential statistics

There are several common tests in inferential statistics to address different data types and research questions:

  • T-test: Used to compare the means of two groups. For example, comparing the average heights of female and male university students.
  • Z-test: Used when the sample size is large (n > 30) and the population variance is known, or when comparing proportions.
  • Chi-square test: Used to compare categorical variables. For example, to see whether university students' preference for a subject is independent of their year of study.
  • ANOVA (Analysis of Variance): Used to compare the means of more than two groups. For example, comparing the heights of students from different fields of study.
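
As one illustration of these tests, the sketch below computes a chi-square test of independence by hand for the subject-preference example. The counts are invented for illustration. The critical value 5.991 is the standard chi-square table value for 2 degrees of freedom at α = 0.05.

```python
# Hypothetical counts: rows are year of study, columns are preferred subject.
observed = [
    [30, 20, 10],  # first-year students
    [20, 20, 20],  # final-year students
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence, expected count = row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991  # chi-square table value for df = 2 at alpha = 0.05

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("Reject H0" if chi_square > critical_value else "Fail to reject H0")
```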

T-test example

Let's perform a t-test to compare the heights of male and female students. Suppose our sample shows the following:

Male students: mean height = 175 cm, sample size = 50
Female students: mean height = 165 cm, sample size = 50

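We use a t-test to determine whether the observed difference of 10 cm is statistically significant. To compute the test statistic we also need the standard deviation of each group, which this summary does not include.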
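
A minimal sketch of the calculation, using Welch's version of the t-test (which does not assume equal variances). The means and sample sizes are from the example; the standard deviations of 7 cm and 6 cm are assumptions made only so the code can run, and `welch_t` is a hypothetical helper name.

```python
from statistics import NormalDist

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary data."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # squared standard errors
    t = (mean1 - mean2) / (v1 + v2) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Means and sample sizes come from the example; the standard deviations
# (7 cm and 6 cm) are ASSUMED for illustration.
t_stat, df = welch_t(175.0, 7.0, 50, 165.0, 6.0, 50)

# With df this large, the t distribution is close to the standard normal,
# so the normal distribution gives an approximate two-sided p-value.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"t = {t_stat:.2f}, df = {df:.1f}")
print("Significant at 0.05" if p_approx < 0.05 else "Not significant at 0.05")
```

With smaller samples, the p-value should be taken from the t distribution with the computed degrees of freedom rather than from the normal approximation used here.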

The role of random sampling

An important part of inferential statistics is ensuring that samples are selected randomly. In simple random sampling, each member of the population has an equal chance of being selected, which reduces selection bias and improves the validity of the results. A random sample tends to be representative of the entire population, which makes estimates based on it more trustworthy.

In the accompanying figure, the individuals shown in red are the randomly selected sample, drawn from the entire group of individuals shown in blue.
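
Simple random sampling can be sketched with Python's standard library. The population here is simulated (10,000 heights drawn from a normal distribution with mean 170 cm and standard deviation 8 cm, both assumed values), so the numbers are purely illustrative.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Simulated population of 10,000 student heights in cm (hypothetical values).
population = [rng.gauss(170, 8) for _ in range(10_000)]

# Simple random sample without replacement: every student is equally likely
# to be chosen.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population mean: {population_mean:.1f} cm")
print(f"Sample mean:     {sample_mean:.1f} cm")
```

In practice the population mean is unknown; it is computed here only to show that a random sample of 100 typically lands close to it.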

Conclusion

Inferential statistics is an essential aspect of data analysis. It allows statisticians to draw data-driven conclusions about large populations based on small, manageable sample data. By carefully estimating population parameters and performing hypothesis testing, we can answer questions about data trends, relationships, and predictions. Additionally, a proper understanding and application of concepts such as sampling distributions, confidence intervals, and p-values are crucial in drawing accurate conclusions through statistical analysis.

Key terms

  • Population: The entire group being studied.
  • Sample: A subset of a population that is used to obtain information about the entire group.
  • Parameter: A numerical characteristic of a population.
  • Statistic: A numerical characteristic of a sample.

Inferential statistics is powerful because it turns sample-based observations into generalizations or predictions about larger populations, and influences everyday decisions in many fields such as science, business, and public policy.

