How to find percentage of total with groupby pandas

pandas
python

#1

I have a csv data set with the columns like Sales,Last_region
i want to calculate the percentage of sales for each region, i was able to find the sum of sales with in each region but i am not able to find the percentage with in group by statement.

Groupby statement used

tempsalesregion = customerdata.groupby(["Last_region"])
tempsalesregion = tempsalesregion[["Customer_Value"]].sum().add_prefix("Sum_of_").reset_index()
tempsalesregion

Output is

image

But what i need is the percentage of sales per region,i am not able to figure out how to find that out.


#2

Hello Niranjan,

you can apply a custom function to you result.

tempsalesregion.apply(lambda x: x/x.sum())

Hope this helps


#3

Hi @niranjan_283

As @j.joshi.1979 mentioned, you can use apply(lambda x: x/x.sum()) to get the percentage values. Here is an example you might find helpful. I had two columns, people who got the loan approved and their gender. I used the below code line

pd.crosstab(df['Approved'],df['Gender']).apply(lambda r: r/r.sum(), axis=1)

The output looks like :

Gender Female Male
Approved
0 0.431021 0.568979
1 0.256410 0.743590

So, of all the people who did not get the loan, 43% were female and 56% male. Similarly, for people who got the loan approved, 25% are female and 74% are male.


#4

Thanks @AishwaryaSingh and @j.joshi.1979 for your prompt response

I have tried using the apply(lambda x: x/x.sum()) but i didn’t got the intended result, i got an error

“TypeError: (“unsupported operand type(s) for /: ‘str’ and ‘str’”, ‘occurred at index Last_region’)”

Code used:

tempsalesregion = customerdata.groupby(["Last_region"])
tempsalesregion = tempsalesregion[["Customer_Value"]].sum().add_prefix("Sum_of_").reset_index()
tempsalesregion.apply(lambda x: x/x.sum())

Please check this error, thanks


#5

I think this is because you are resetting the index before applying the lambda function.

can you try using the following code

tempsalesregion = customerdata.groupby([“Last_region”])
tempsalesregion = tempsalesregion[[“Customer_Value”]].sum().add_prefix(“Sum_of_”)
tempsalesregion.apply(lambda x: x/x.sum()).reset_index()


#6

@j.joshi.1979 Thanks.It is working now