Mean, quantiles and other ways to analyze a variable in Python


#1

Hello,

I have a variable which represents the activation energy of each machine.

df_no_missing['P_ACT_KW'].describe()

Out[30]:

count     52.000000
mean     157.166586
std       26.373297
min       89.214953
25%      145.168403
50%      155.868056
75%      173.538194
max      241.400000
Name: P_ACT_KW, dtype: float64

I would like to categorize these machines according to their activation energy value: high activation, medium activation or low activation.
Do you have any idea how I can segment the P_ACT_KW values into these 3 categories?


#2

Hi @Cyrine,
You can use any clustering method to do the job.
That said, the summary statistics suggest that this variable is positively skewed (mean > median, and max - 3rd quartile > 1st quartile - min). Because of that skew, we can't rely on t-table cut-offs (for example the 67% one-tailed value) to split the values into 3 groups, since those assume a symmetric distribution. Cut-offs taken from the data's own quantiles, or a clustering method, do not depend on that assumption.
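
As a rough sketch of a quantile-based split, something like the following could work. It uses pandas' `qcut`, which cuts at the data's own quantiles and so does not assume a symmetric distribution. The sample values, the `ACT_LEVEL` column name and the choice of three equal-sized groups are my own assumptions for illustration, not something taken from your data.

```python
import pandas as pd

# Hypothetical stand-in for df_no_missing; these values are made up for illustration.
df_no_missing = pd.DataFrame(
    {"P_ACT_KW": [89.2, 120.5, 145.2, 150.1, 155.9, 160.3, 173.5, 190.0, 241.4]}
)

# Cut at the 1/3 and 2/3 quantiles so each category holds about a third of the machines.
# ACT_LEVEL is an assumed column name.
df_no_missing["ACT_LEVEL"] = pd.qcut(
    df_no_missing["P_ACT_KW"],
    q=3,
    labels=["low activation", "medium activation", "high activation"],
)

# Number of machines per category.
print(df_no_missing["ACT_LEVEL"].value_counts())
```

If you would rather use fixed thresholds in kW (for instance limits given by the machine specifications), `pd.cut` with explicit bin edges works the same way.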

Regards