What kind of plot will be helpful for total count data grouped by month and year?

datavisualization
matplotlib
seaborn

#1

I have a grouped data frame like this:

group_by_region_year_month = train_data.groupby(['Region Name (GO)', 'Year', 'Month'], as_index=False)
region_sum = group_by_region_year_month.agg({"AMV": "sum"})

Here, AMV is my target variable in regression (it is the count of total vehicles).

I have data for each day from 2000-3-17 to 2017-10-11.

I want to show the variation of the count for each year and each month for each Region.

How best can I plot it?

I tried using seaborn.FacetGrid in this way:

g = sns.FacetGrid(row="Region Name (GO)", col="AMV", data=region_sum)

But it takes too much time and, at the end, it throws an error saying maximum image size reached or something like that.


#2

Hi @shounakrockz47,

Can you print the head of the original dataset and attach a screenshot of it here? That might help me clarify your doubt in a better way.


#3

Hi @PulkitS, thanks for the reply.

This is my original dataset structure :
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1837196 entries, 0 to 1837195
Data columns (total 37 columns):
Region Name (GO) category
ONS LA Name object
CP category
S Ref E int64
S Ref N int64
S Ref Latitude float64
S Ref Longitude float64
Road object
A-Junction object
A Ref E int64
A Ref N int64
B-Junction object
B Ref E int64
B Ref N int64
RCat object
iDir object
Year int64
dCount datetime64[ns]
Hour int64
PC int64
2WMV int64
CAR float64
BUS int64
LGV int64
HGVR2 float64
HGVR3 int64
HGVR4 float64
HGVA3 float64
HGVA5 int64
HGVA6 float64
HGV float64
AMV float64
Day int64
Week int64
Month int64
Day_of_week object
IsWeekend int64
dtypes: category(2), datetime64[ns](1), float64(9), int64(18), object(7)
memory usage: 496.6+ MB

There are several regions in the country, and each of these regions has several count points (CP). This data ranges from the year 2000 to 2017.

I want to plot the number of count points under each and every region and see whether they are decreasing or increasing.

Can you please help me?


#4

Hi @shounakrockz47,

You can segregate this dataset based on the different Region Name values, i.e.

region_1 = train_data[train_data['Region Name (GO)'] == 'East Midlands']

This will give you all the data points belonging to that region. Now you can plot the count point (CP) values for this region using a line plot, as sketched below. Similarly, you can do this for each region.
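For example, a minimal sketch of this idea (column names are taken from the dataset you shared; counting distinct CP values per year with nunique is my assumption of what "plot the count points" means):

import matplotlib.pyplot as plt

# All rows belonging to one region
region_1 = train_data[train_data['Region Name (GO)'] == 'East Midlands']

# Number of distinct count points (CP) observed in each year
cp_per_year = region_1.groupby('Year')['CP'].nunique()

# Line plot of the yearly CP count for this region
cp_per_year.plot(marker='o')
plt.xlabel('Year')
plt.ylabel('Number of count points (CP)')
plt.title('CP count per year - East Midlands')
plt.show()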

Hope this helps!!


#5

Hi @PulkitS, sorry for the late reply.

I want to see all of the regions and corresponding CP count for each year.

Which plot should I look for?

And one more thing: since this is a traffic dataset, each entry corresponds to hourly data.

So the same CP is repeated more than once.

Thanks a lot for your prompt reply :slight_smile:


#6

Hi @shounakrockz47,

Once you have segregated the data based on regions, you can use a line plot to visualize the CP count; a rough sketch is given at the end of this post.
You can follow the course mentioned below, in which we have explained how to visualize hourly data:

There we have also explained how to plot time series data and how to forecast future values.
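Roughly, and extending this to all regions at once, here is a sketch (assuming train_data is the frame described in post #3; because each CP appears on many hourly rows, I count distinct CPs with nunique, which is my interpretation of "CP count"):

import matplotlib.pyplot as plt

# Distinct CP count per region per year (nunique handles the hourly repetition)
cp_counts = (
    train_data
    .groupby(['Region Name (GO)', 'Year'])['CP']
    .nunique()
    .unstack('Region Name (GO)')  # one column per region, years on the index
)

# One line per region, so rising or falling trends are easy to compare
cp_counts.plot(figsize=(10, 6))
plt.xlabel('Year')
plt.ylabel('Number of distinct count points (CP)')
plt.title('CP count per year by region')
plt.show()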


#7

If I understand your question right, you want the count of CPs for every region. Try using pd.crosstab for this case. Let me explain how it works with an example. Suppose my dataset looks like this:

  Outlet_Location_Type        Outlet_Type  Item_Outlet_Sales
0               Tier 1  Supermarket Type1          3735.1380
1               Tier 3  Supermarket Type2           443.4228
2               Tier 1  Supermarket Type1          2097.2700
3               Tier 3      Grocery Store           732.3800
4               Tier 3  Supermarket Type1           994.7052

and I want the count of each Outlet_Type for Tier 1, Tier 2, and Tier 3. Using the following command, I get the desired result:

temp = pd.crosstab(df['Outlet_Location_Type'], df['Outlet_Type'])
temp

output :

Outlet_Type           Grocery Store  Supermarket Type1  Supermarket Type2  Supermarket Type3
Outlet_Location_Type
Tier 1                          528               1860                  0                  0
Tier 2                            0               2785                  0                  0
Tier 3                          555                932                928                935
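Applied to your traffic data, a hedged sketch of the same crosstab idea (using values/aggfunc to count distinct CPs per region per year is my assumption of what you need, not something stated in your posts):

import pandas as pd
import matplotlib.pyplot as plt

# Distinct CP count for every region and year in a single table
cp_table = pd.crosstab(
    index=train_data['Region Name (GO)'],
    columns=train_data['Year'],
    values=train_data['CP'],
    aggfunc=pd.Series.nunique,
)

# Transpose so years run along the x-axis, giving one line per region
cp_table.T.plot(figsize=(10, 6))
plt.xlabel('Year')
plt.ylabel('Number of distinct count points (CP)')
plt.show()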
