Measures of Dispersion

Reading the Data File

If Python is already installed on your computer, start Jupyter. We assume your data file is stored in a directory/folder such as: /Users/mdfazlulkarimpatwary/Documents/Lectures/data Analysis Python/employees.xlsx

Then run the following code:
import pandas as pd

df = pd.read_excel("/Users/mdfazlulkarimpatwary/Documents/Lectures/data Analysis Python/employees.xlsx")

print(df.head())
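If the Excel file is not at hand, the steps in this lesson can still be tried on a small stand-in table. The sketch below builds a hypothetical DataFrame with a single Height column. The name sample_df and the five values are assumptions made for illustration and are not the contents of employees.xlsx.

```python
import pandas as pd

# Hypothetical stand-in data (NOT the real employees.xlsx contents).
sample_df = pd.DataFrame({"Height": [58, 63, 68, 73, 78]})

# The same first-look checks you would run on the real file.
print(sample_df.head())   # first rows of the table
print(sample_df.shape)    # (number of rows, number of columns)
```

Any later snippet that uses df['Height'] can be run against sample_df['Height'] instead, although the results will differ from the outputs shown in this lesson.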


Calculating Range and Interquartile Range:
The range is the difference between the highest and lowest values in a data set. Here we calculate the range of the Height column, a continuous variable measured in inches.
 
range_height = df['Height'].max() - df['Height'].min()

The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1), so we first calculate the two quartiles:

Q1 = df['Height'].quantile(0.25) # 25th percentile
Q3 = df['Height'].quantile(0.75) # 75th percentile
IQR = Q3 - Q1

print("Range:", range_height)
print("Interquartile Range (IQR):", IQR)

Output is:

Range: 20

Interquartile Range (IQR): 10.0
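To see where these numbers come from, the same calculation can be checked by hand on a tiny data set. The sketch below uses five hypothetical heights (not the lesson's data), chosen so that the quartiles fall exactly on observed values. By default, pandas' quantile() interpolates linearly between neighbouring values when a quartile falls between two observations.

```python
import pandas as pd

# Hypothetical heights in inches, already sorted for readability.
small_heights = pd.Series([58, 63, 68, 73, 78])

# Range: highest value minus lowest value.
small_range = small_heights.max() - small_heights.min()

# Quartiles: with 5 values, the 25% and 75% positions land exactly
# on the 2nd and 4th sorted values, so no interpolation is needed.
small_q1 = small_heights.quantile(0.25)
small_q3 = small_heights.quantile(0.75)
small_iqr = small_q3 - small_q1

print("Range:", small_range)
print("Q1:", small_q1, "Q3:", small_q3)
print("IQR:", small_iqr)
```

Here the range is 78 - 58 = 20 and the IQR is 73 - 63 = 10, which can be verified directly from the sorted list.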

 

Calculating Mean Deviation, Standard Deviation and Variance:

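# Mean deviation: the average absolute distance of each value from the mean
mean_height = df['Height'].mean()
mean_deviation = (df['Height'] - mean_height).abs().mean()

# pandas computes the sample standard deviation and variance (ddof=1) by default
std_dev = df['Height'].std()
variance = df['Height'].var()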

print("Mean Deviation:", mean_deviation)
print("Standard Deviation:", std_dev)
print("Variance:", variance)

Output is:
Mean Deviation: 5.29632
Standard Deviation: 6.109765972624502
Variance: 37.32924024024023
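The three measures above can also be computed step by step, which makes the formulas behind the pandas calls visible. The following is a minimal sketch on five hypothetical heights (not the lesson's data). It also shows that std() and var() in pandas divide by n - 1 (sample statistics) unless ddof=0 is passed.

```python
import pandas as pd

# Hypothetical heights in inches (illustrative values only).
demo_heights = pd.Series([58, 63, 68, 73, 78])
n = len(demo_heights)

demo_mean = demo_heights.mean()          # (58+63+68+73+78) / 5
deviations = demo_heights - demo_mean    # distance of each value from the mean

# Mean deviation: average of the absolute deviations.
manual_mean_dev = deviations.abs().sum() / n

# Sample variance divides by n - 1; population variance divides by n.
manual_sample_var = (deviations ** 2).sum() / (n - 1)
manual_population_var = (deviations ** 2).sum() / n

# Standard deviation is the square root of the variance.
manual_sample_std = manual_sample_var ** 0.5

print("Mean deviation:", manual_mean_dev)
print("Sample variance:", manual_sample_var, "pandas:", demo_heights.var())
print("Population variance:", manual_population_var, "pandas:", demo_heights.var(ddof=0))
print("Sample std:", manual_sample_std, "pandas:", demo_heights.std())
```

If the data cover the whole population rather than a sample from it, passing ddof=0 to std() and var() gives the population versions.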
We can also use describe() to get several descriptive statistics at once (limited to the count, mean, standard deviation, minimum, quartiles and maximum):
 
print(df['Height'].describe())
Output is:
count    1000.000000
mean       67.967000
std         6.109766
min        58.000000
25%        63.000000
50%        68.000000
75%        73.000000
max        78.000000
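The result of describe() is itself a pandas Series indexed by labels such as 'min', '25%', '75%' and 'max', so individual statistics can be pulled out of it and reused. The sketch below, again on hypothetical values rather than the lesson's data, recovers the range and IQR from the summary instead of recomputing them.

```python
import pandas as pd

# Hypothetical heights in inches (illustrative values only).
toy_heights = pd.Series([58, 63, 68, 73, 78], name="Height")

# describe() returns a Series; its index holds the statistic names.
summary = toy_heights.describe()

# Reuse the summary values instead of calling max(), min(), quantile() again.
range_from_summary = summary["max"] - summary["min"]
iqr_from_summary = summary["75%"] - summary["25%"]

print(summary)
print("Range:", range_from_summary)
print("IQR:", iqr_from_summary)
```

The same lookups applied to the output shown above give a range of 78 - 58 = 20 and an IQR of 73 - 63 = 10, matching the values calculated earlier.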