How to find the correlation among multiple attributes in a groupby DataFrame object?

dataexploration
pandas
machine_learning
correlation
python

#1

I have a data frame with the following attributes:

CP - Counting point of vehicles

A-Junction - Starting node of a road

B-Junction - Ending node of a road

Road - Road name

Date - Date

Time - Time

Vehicle Count - Number of vehicles

I have found that the combination of CP, A-Junction and B-Junction is unique.

So I can group the dataframe by these three columns (CP, A-Junction, B-Junction):

CP   A-Junction  B-Junction  Road  Date        Time  Vehicle Count
X1   A1          B1          R1    2000-06-09  7     10
X1   A1          B1          R1    2000-06-09  8     15
X1   A1          B1          R1    2000-06-09  9     18
X1   A1          B1          R1    2000-06-09  10    12
X1   A1          B1          R1    2000-06-09  11    25

X2   A1          B1          R1    2000-06-09  7     15
X2   A1          B1          R1    2000-06-09  8     20
X2   A1          B1          R1    2000-06-09  8     20

How can I find the correlation between the vehicle counts of these combinations?

I want to build a correlation matrix like this:

                (X1-A1-B1)      (X2-A1-B1)    ...
   (X1-A1-B1)   1               <some value>
   (X2-A1-B1)   <some value>    1

Is there any way I can do this?

Edit1

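# Collect the vehicle counts of each group into a dictionary keyed by group name.
# group_by_lat_lon is the existing groupby object.
dictAMV = {}
for name, grp in group_by_lat_lon:
    # Iterating over a groupby already yields each group's DataFrame,
    # so get_group() is not needed.
    if name in dictAMV:
        print("This key", name, "already exists")
    else:
        # Take the whole column as a list instead of looping with iterrows().
        dictAMV[name] = grp['Vehicle_count'].tolist()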

I am able to build a dictionary that contains all the vehicle counts for each group.
Now, how can I build the matrix?

Can you please help?
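
One possible route from the grouped data to the matrix, as a minimal sketch rather than a definitive answer: put one column per (CP, A-Junction, B-Junction) combination, align the rows on Date and Time, and call `DataFrame.corr()`. The `segment` column name and the X2 counts for hours 9 to 11 are made up for illustration; only the remaining rows come from the sample above.

```python
import pandas as pd

# Sample rows from the question; the X2 counts for Time 9, 10 and 11 are
# hypothetical, added only so that the two series overlap on five points.
df = pd.DataFrame({
    "CP": ["X1"] * 5 + ["X2"] * 5,
    "A-Junction": ["A1"] * 10,
    "B-Junction": ["B1"] * 10,
    "Date": ["2000-06-09"] * 10,
    "Time": [7, 8, 9, 10, 11] * 2,
    "Vehicle Count": [10, 15, 18, 12, 25, 15, 20, 22, 14, 30],
})

# One string key per (CP, A-Junction, B-Junction) combination.
# "segment" is an illustrative column name, not one from the original data.
df["segment"] = df["CP"] + "-" + df["A-Junction"] + "-" + df["B-Junction"]

# Rows = observation times, columns = segments, values = vehicle counts.
# aggfunc="mean" averages duplicated (Date, Time, segment) rows.
wide = df.pivot_table(
    index=["Date", "Time"],
    columns="segment",
    values="Vehicle Count",
    aggfunc="mean",
)

# Pairwise Pearson correlation between the segment columns.
corr_matrix = wide.corr()
print(corr_matrix)
```

Aligning on Date and Time matters because a correlation is only meaningful between counts taken at the same moments; `corr()` skips time slots where either segment has no value. A dictionary of plain lists, as in the edit above, loses that alignment when the groups have different lengths.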


#2

@shounakrockz47 Hi,

You cannot compute a correlation for discrete variables. You can try chi-square or mutual-information feature selection for discrete variables.

If you want to select important features, you may go for the Wald chi-square test as well.


#3

Hi, this is not discrete data. Total car count is a continuous variable.


#4

The CP, A-Junction, B-Junction and Road columns are basically labels. We can map each unique combination of them to a single label, for example:

CP   A-Junction  B-Junction  Road  Label
X1   A1          B1          R1    1111

This now becomes a multiclass classification problem.
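
The label construction described above could be sketched as follows, assuming pandas and the column names from the question. `GroupBy.ngroup()` numbers the unique combinations 0, 1, 2, ..., so the integer codes are not the `1111` shown in the example row; any scheme that assigns one distinct value per combination would serve.

```python
import pandas as pd

# A few illustrative rows using the column names from the question.
df = pd.DataFrame({
    "CP": ["X1", "X1", "X2"],
    "A-Junction": ["A1", "A1", "A1"],
    "B-Junction": ["B1", "B1", "B1"],
    "Road": ["R1", "R1", "R1"],
})

# Assign one integer label per unique (CP, A-Junction, B-Junction, Road)
# combination; rows sharing a combination receive the same label.
df["Label"] = df.groupby(["CP", "A-Junction", "B-Junction", "Road"]).ngroup()
print(df)
```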