17 Statistical Hypothesis Tests in Python (Cheat Sheet)

Each statistical test is presented in a consistent way, including:

  • The name of the test.
  • What the test is checking.
  • The key assumptions of the test.
  • How the test result is interpreted.
  • Python API for using the test.

Normality Tests

This section lists statistical tests that you can use to check if your data has a Gaussian distribution.

Shapiro-Wilk Test

Tests whether a data sample has a Gaussian distribution.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).

Interpretation

  • H0: the sample has a Gaussian distribution.
  • H1: the sample does not have a Gaussian distribution.
# Example of the Shapiro-Wilk Normality Test
from scipy.stats import shapiro
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
stat, p = shapiro(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably Gaussian')
else:
	print('Probably not Gaussian')

D’Agostino’s K^2 Test

Tests whether a data sample has a Gaussian distribution.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).

Interpretation

  • H0: the sample has a Gaussian distribution.
  • H1: the sample does not have a Gaussian distribution.
# Example of D'Agostino's K^2 Normality Test
from scipy.stats import normaltest
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
stat, p = normaltest(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably Gaussian')
else:
	print('Probably not Gaussian')

Anderson-Darling Test

Tests whether a data sample has a Gaussian distribution.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).

Interpretation

  • H0: the sample has a Gaussian distribution.
  • H1: the sample does not have a Gaussian distribution.
# Example of the Anderson-Darling Normality Test
from scipy.stats import anderson
data = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
result = anderson(data)
print('stat=%.3f' % (result.statistic))
for i in range(len(result.critical_values)):
	sl, cv = result.significance_level[i], result.critical_values[i]
	if result.statistic < cv:
		print('Probably Gaussian at the %.1f%% level' % (sl))
	else:
		print('Probably not Gaussian at the %.1f%% level' % (sl))

Correlation Tests

This section lists statistical tests that you can use to check if two samples are related.

Pearson’s Correlation Coefficient

Tests whether two samples have a linear relationship.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.

Interpretation

  • H0: the two samples are independent.
  • H1: there is a dependency between the samples.
# Example of the Pearson's Correlation test
from scipy.stats import pearsonr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = pearsonr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')

Spearman’s Rank Correlation

Tests whether two samples have a monotonic relationship.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.

Interpretation

  • H0: the two samples are independent.
  • H1: there is a dependency between the samples.
# Example of the Spearman's Rank Correlation Test
from scipy.stats import spearmanr
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = spearmanr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')

Kendall’s Rank Correlation

Tests whether two samples have a monotonic relationship.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.

Interpretation

  • H0: the two samples are independent.
  • H1: there is a dependency between the samples.
# Example of the Kendall's Rank Correlation Test
from scipy.stats import kendalltau
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [0.353, 3.517, 0.125, -7.545, -0.555, -1.536, 3.350, -1.578, -3.537, -1.579]
stat, p = kendalltau(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')

Chi-Squared Test

Tests whether two categorical variables are related or independent.

Assumptions

  • Observations used in the calculation of the contingency table are independent.
  • 5 or more expected observations in each cell of the contingency table (a common rule of thumb).

Interpretation

  • H0: the two samples are independent.
  • H1: there is a dependency between the samples.
# Example of the Chi-Squared Test
from scipy.stats import chi2_contingency
table = [[10, 20, 30], [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably independent')
else:
	print('Probably dependent')
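
In practice, the contingency table is usually tallied from raw categorical observations rather than typed in by hand. Below is a minimal sketch of that step, assuming pandas is available; the labels and column names are made up for illustration.

# Sketch: tallying a contingency table from raw categorical data (made-up labels)
import pandas as pd
from scipy.stats import chi2_contingency
groups = ['a', 'a', 'b', 'b', 'a', 'b', 'a', 'b', 'b', 'a']
outcomes = ['x', 'y', 'x', 'x', 'y', 'y', 'x', 'x', 'y', 'x']
# cross-tabulate the two categorical variables into a table of counts
table = pd.crosstab(pd.Series(groups, name='group'), pd.Series(outcomes, name='outcome'))
print(table)
# note: a toy sample this small violates the expected-count rule of thumb above;
# it only demonstrates the mechanics of building the table
stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f' % (stat, p))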

Stationarity Tests

This section lists statistical tests that you can use to check if a time series is stationary or not.

Augmented Dickey-Fuller

Tests whether a time series has a unit root, e.g. whether it has a trend or, more generally, is autoregressive.

Assumptions

  • Observations are temporally ordered.

Interpretation

  • H0: a unit root is present (series is non-stationary).
  • H1: a unit root is not present (series is stationary).
# Example of the Augmented Dickey-Fuller unit root test
from statsmodels.tsa.stattools import adfuller
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stat, p, lags, obs, crit, t = adfuller(data)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably not Stationary')
else:
	print('Probably Stationary')

Kwiatkowski-Phillips-Schmidt-Shin

Tests whether a time series is trend-stationary or not.

Assumptions

  • Observations are temporally ordered.

Interpretation

  • H0: the time series is trend-stationary.
  • H1: the time series is not trend-stationary.
# Example of the Kwiatkowski-Phillips-Schmidt-Shin test
from statsmodels.tsa.stattools import kpss
data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
stat, p, lags, crit = kpss(data, regression='ct')  # 'ct' matches the trend-stationary null; the default 'c' tests level stationarity
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably Stationary')
else:
	print('Probably not Stationary')

Parametric Statistical Hypothesis Tests

This section lists statistical tests that you can use to compare data samples.

Student’s T-test

Tests whether the means of two independent samples are significantly different.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.

Interpretation

  • H0: the means of the samples are equal.
  • H1: the means of the samples are unequal.
# Example of the Student's t-test
from scipy.stats import ttest_ind
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_ind(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')

Paired Student’s T-test

Tests whether the means of two paired samples are significantly different.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.
  • Observations across each sample are paired.

Interpretation

  • H0: the means of the samples are equal.
  • H1: the means of the samples are unequal.
# Example of the Paired Student's t-test
from scipy.stats import ttest_rel
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = ttest_rel(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')

Analysis of Variance Test (ANOVA)

Tests whether the means of two or more independent samples are significantly different.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.

Interpretation

  • H0: the means of the samples are equal.
  • H1: one or more of the means of the samples are unequal.
# Example of the Analysis of Variance Test
from scipy.stats import f_oneway
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = f_oneway(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')

Repeated Measures ANOVA Test

Tests whether the means of two or more paired samples are significantly different.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.
  • Observations across each sample are paired.

Interpretation

  • H0: the means of the samples are equal.
  • H1: one or more of the means of the samples are unequal.

Not supported directly in SciPy.
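
A repeated measures ANOVA is, however, available in the statsmodels library via its AnovaRM class. The sketch below reuses the three samples from the ANOVA example above and treats each list index as a subject measured under three conditions; the column names ('subject', 'condition', 'value') are illustrative choices, not requirements of the API.

# Sketch: Repeated Measures ANOVA via statsmodels' AnovaRM (assumes statsmodels is installed)
import pandas as pd
from statsmodels.stats.anova import AnovaRM
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
# long format: one row per (subject, condition) measurement
df = pd.DataFrame({
	'subject': list(range(10)) * 3,
	'condition': ['c1'] * 10 + ['c2'] * 10 + ['c3'] * 10,
	'value': data1 + data2 + data3,
})
result = AnovaRM(df, depvar='value', subject='subject', within=['condition']).fit()
print(result)
p = float(result.anova_table['Pr > F'].iloc[0])
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')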

Nonparametric Statistical Hypothesis Tests

This section lists statistical tests that you can use to compare data samples when the parametric assumptions (e.g. normality) do not hold.

Mann-Whitney U Test

Tests whether the distributions of two independent samples are equal or not.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.

Interpretation

  • H0: the distributions of both samples are equal.
  • H1: the distributions of both samples are not equal.
# Example of the Mann-Whitney U Test
from scipy.stats import mannwhitneyu
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = mannwhitneyu(data1, data2, alternative='two-sided')  # explicit: older SciPy defaults to a one-sided p-value
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')

Wilcoxon Signed-Rank Test

Tests whether the distributions of two paired samples are equal or not.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.
  • Observations across each sample are paired.

Interpretation

  • H0: the distributions of both samples are equal.
  • H1: the distributions of both samples are not equal.
# Example of the Wilcoxon Signed-Rank Test
from scipy.stats import wilcoxon
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = wilcoxon(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')

Kruskal-Wallis H Test

Tests whether the distributions of two or more independent samples are equal or not.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.

Interpretation

  • H0: the distributions of all samples are equal.
  • H1: the distributions of one or more samples are not equal.
# Example of the Kruskal-Wallis H Test
from scipy.stats import kruskal
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = kruskal(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')

Friedman Test

Tests whether the distributions of two or more paired samples are equal or not.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.
  • Observations across each sample are paired.

Interpretation

  • H0: the distributions of all samples are equal.
  • H1: the distributions of one or more samples are not equal.
# Example of the Friedman Test
from scipy.stats import friedmanchisquare
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]
stat, p = friedmanchisquare(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
	print('Probably the same distribution')
else:
	print('Probably different distributions')
