Big Data analysts and data scientists need an understanding of statistics for the following four key reasons:
- To know how to properly present and describe information
- To know how to draw conclusions about large populations based only on information obtained from samples
- To know how to improve processes
- To know how to obtain reliable forecasts
Let's focus on some functions from Python's built-in statistics module:
Averages and measures of central location:
mean() Arithmetic mean ("average") of data.
harmonic_mean() Harmonic mean of data.
median() Median (middle value) of data.
median_low() Low median of data.
median_high() High median of data.
median_grouped() Median, or 50th percentile, of grouped data.
mode() Mode (most common value) of discrete data.
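To see how the median variants differ, here is a short sketch using the statistics module. The data values are made up for illustration, and the results in the comments assume a reasonably recent Python 3 (3.8 or later).

```python
import statistics

# Illustrative data set (chosen for this example only).
even_data = [1, 3, 5, 7]

# With an even number of points, median() averages the two middle values,
# while median_low() and median_high() return one of the actual data points.
mid = statistics.median(even_data)        # (3 + 5) / 2
low = statistics.median_low(even_data)    # smaller of the two middle values
high = statistics.median_high(even_data)  # larger of the two middle values

# median_grouped() treats each value as the midpoint of a class interval
# (width 1 by default) and interpolates within the class holding the median.
grouped = statistics.median_grouped([52, 52, 53, 54])

# mode() returns the most common value; it also works for non-numeric data.
most_common = statistics.mode(['red', 'blue', 'blue', 'green'])

print(mid, low, high)  # 4.0 3 5
print(grouped)         # 52.5
print(most_common)     # blue
```

median_low() and median_high() are handy when the result must be a value that actually occurs in the data, for example with categorical or integer-only data.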
Measures of spread:
pstdev() Population standard deviation of data.
pvariance() Population variance of data.
stdev() Sample standard deviation of data.
variance() Sample variance of data.
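The population and sample versions differ only in the divisor: the population functions divide the sum of squared deviations by n, while the sample functions divide by n - 1 (Bessel's correction). A minimal sketch comparing the library results with a hand calculation, on illustrative data:

```python
import math
import statistics

# Illustrative data set; its mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = statistics.mean(data)

# Sum of squared deviations from the mean (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32).
ss = sum((x - mean) ** 2 for x in data)

# Population statistics divide by n ...
pop_var = ss / n
# ... while sample statistics divide by n - 1.
samp_var = ss / (n - 1)

print(statistics.pvariance(data), pop_var)
print(statistics.variance(data), samp_var)
print(statistics.pstdev(data), math.sqrt(pop_var))
print(statistics.stdev(data), math.sqrt(samp_var))
```

Use the population functions when the data covers the whole population, and the sample functions when the data is a sample used to estimate population values.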
The best way to learn something new is to take a practical approach. The following code demonstrates several of these functions on small data sets:
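Python code:
import statistics

print(statistics.mean([1, 2]))               # arithmetic mean
print(statistics.harmonic_mean([2, 3, 12]))  # 3 / (1/2 + 1/3 + 1/12)
print(statistics.median([2, 5, 12]))         # odd count: the middle value
print(statistics.median([2, 5, 12, 99]))     # even count: mean of the two middle values
print(statistics.median([1, 2, 3, 4]))
# Strings are sorted alphabetically; this works because the count is odd.
print(statistics.median(['tesla', 'bmw', 'mercedes', 'ford', 'honda']))
print(statistics.variance([1, 3, 8]))        # sample variance
print(statistics.stdev([1, 3, 8]))           # sample standard deviation
print(statistics.pvariance([1, 3, 8]))       # population variance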
Output:
1.5
3.272727272727273
5
8.5
2.5
honda
13
3.605551275463989
8.666666666666666
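Two of these outputs are easy to check by hand: the harmonic mean is n divided by the sum of the reciprocals, and the sample and population variances divide the sum of squared deviations by n - 1 and n respectively. A small sketch recomputing them (Fraction is used only to keep the harmonic-mean arithmetic exact):

```python
import statistics
from fractions import Fraction

# Harmonic mean of [2, 3, 12]: 3 / (1/2 + 1/3 + 1/12) = 3 / (11/12) = 36/11.
values = [2, 3, 12]
hm = len(values) / sum(Fraction(1, x) for x in values)
print(float(hm), statistics.harmonic_mean(values))

# Variance of [1, 3, 8]: the mean is 4, the squared deviations are 9, 1 and 16 (sum 26).
sample = [1, 3, 8]
m = sum(sample) / len(sample)
ss = sum((x - m) ** 2 for x in sample)
print(ss / (len(sample) - 1), statistics.variance(sample))  # 26 / 2
print(ss / len(sample), statistics.pvariance(sample))       # 26 / 3
```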