Python: Statistics #2 - Big data with Python

The best way to learn something new is to work through a practical example. The following code uses Python's statistics module to run a permutation test on the difference in means between a drug group and a placebo group:

ƒ python command

from statistics import mean
from random import shuffle

drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]
observed_diff = mean(drug) - mean(placebo)

n = 10000
count = 0
combined = drug + placebo
for i in range(n):
    # Reassign the group labels at random, then recompute the difference in means.
    shuffle(combined)
    new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):])
    # Count the reshufflings that are at least as extreme as the observed difference.
    count += (new_diff >= observed_diff)

print(f'{n} label reshufflings produced only {count} instances with a difference')
print(f'at least as extreme as the observed difference of {observed_diff:.1f}.')
print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null')
print(f'hypothesis that there is no difference between the drug and the placebo.')

√ output

10000 label reshufflings produced only 10 instances with a difference
at least as extreme as the observed difference of 13.0.
The one-sided p-value of 0.0010 leads us to reject the null
hypothesis that there is no difference between the drug and the placebo.
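Because shuffle draws from the global random state, the count above changes slightly from run to run. As a minimal sketch, the same resampling can be wrapped in a function that takes a seed so that results are repeatable. The function name permutation_p_value and its parameters n, seed and two_sided are assumptions made for this illustration, not part of the statistics module, and the two-sided option is an addition that the original code does not have.

```python
from random import Random
from statistics import mean


def permutation_p_value(a, b, n=10_000, seed=0, two_sided=False):
    """Estimate a permutation-test p-value for mean(a) - mean(b).

    Hypothetical helper for illustration; not part of the statistics module.
    """
    rng = Random(seed)            # private generator, so a given seed repeats exactly
    observed = mean(a) - mean(b)
    combined = list(a) + list(b)  # new list, so the inputs are never reordered
    count = 0
    for _ in range(n):
        rng.shuffle(combined)     # reassign the group labels at random
        diff = mean(combined[:len(a)]) - mean(combined[len(a):])
        if two_sided:
            # Extreme in either direction.
            count += abs(diff) >= abs(observed)
        else:
            # Extreme in the observed direction only, as in the code above.
            count += diff >= observed
    return observed, count / n


drug = [54, 73, 53, 70, 73, 68, 52, 65, 65]
placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46]

observed, p_one = permutation_p_value(drug, placebo)
_, p_two = permutation_p_value(drug, placebo, two_sided=True)

print(f'observed difference: {observed:.1f}')
print(f'one-sided p-value:   {p_one:.4f}')
print(f'two-sided p-value:   {p_two:.4f}')
```

Calling the function twice with the same seed gives the same estimate, while a different seed gives a slightly different one. That spread is a rough indication of how much the p-value estimate depends on the random reshufflings.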