I have a CSV data set with columns such as Sales and Last_region.
I want to calculate the percentage of sales for each region. I was able to find the sum of sales within each region, but I am not able to find the percentage within the groupby statement.
Groupby statement used:
tempsalesregion = customerdata.groupby(["Last_region"])
tempsalesregion = tempsalesregion[["Customer_Value"]].sum().add_prefix("Sum_of_").reset_index()
tempsalesregion
Output is

But what I need is the percentage of sales per region, and I am not able to figure out how to calculate that.
Hello Niranjan,
You can apply a custom function to your result:
tempsalesregion.apply(lambda x: x/x.sum())
Hope this helps
Hi @niranjan_283
As @j.joshi.1979 mentioned, you can use apply(lambda x: x/x.sum()) to get the percentage values. Here is an example you might find helpful. I had two columns: whether a person's loan was approved, and their gender. I used the line of code below:
pd.crosstab(df['Approved'],df['Gender']).apply(lambda r: r/r.sum(), axis=1)
The output looks like:
Gender      Female      Male
Approved
0         0.431021  0.568979
1         0.256410  0.743590
So, of all the people who did not get the loan, about 43% were female and 57% were male. Similarly, of the people who got the loan approved, about 26% were female and 74% were male.
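To make the crosstab approach above easier to try, here is a minimal sketch. The data frame is invented for illustration (it is not the loan data from this post), and the variable names are my own. It also shows the normalize="index" argument of pd.crosstab, which should give the same row-wise shares without a lambda.

```python
import pandas as pd

# Hypothetical loan data, made up for this example only.
df = pd.DataFrame({
    "Approved": [0, 0, 0, 0, 1, 1, 1, 1],
    "Gender": ["Female", "Male", "Male", "Female",
               "Male", "Male", "Male", "Female"],
})

# Raw counts: rows are Approved, columns are Gender.
counts = pd.crosstab(df["Approved"], df["Gender"])

# axis=1 passes each row to the lambda, so every row sums to 1.
row_share = counts.apply(lambda r: r / r.sum(), axis=1)

# Same idea using crosstab's own normalisation over each row.
row_share_builtin = pd.crosstab(df["Approved"], df["Gender"], normalize="index")

print(row_share)
```

Multiply the result by 100 if you want percentages rather than fractions.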
Thanks @AishwaryaSingh and @j.joshi.1979 for your prompt response
I have tried using apply(lambda x: x/x.sum()), but I didn't get the intended result. I got an error:
TypeError: ("unsupported operand type(s) for /: 'str' and 'str'", 'occurred at index Last_region')
Code used:
tempsalesregion = customerdata.groupby(["Last_region"])
tempsalesregion = tempsalesregion[["Customer_Value"]].sum().add_prefix("Sum_of_").reset_index()
tempsalesregion.apply(lambda x: x/x.sum())
Please check this error, thanks
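I think this is because you are resetting the index before applying the lambda function. After reset_index(), Last_region becomes an ordinary string column, so the lambda also tries to divide strings.
Can you try using the following code: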
tempsalesregion = customerdata.groupby(["Last_region"])
tempsalesregion = tempsalesregion[["Customer_Value"]].sum().add_prefix("Sum_of_")
tempsalesregion.apply(lambda x: x/x.sum()).reset_index()
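For anyone who wants to run this end to end, here is a small self-contained sketch of the same idea. The sample rows and the Pct_of_total column name are my own assumptions, not from the original data set. It keeps Last_region as the index while dividing, so only the numeric column is involved, and resets the index at the very end.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from this thread.
customerdata = pd.DataFrame({
    "Last_region": ["North", "North", "South", "East"],
    "Customer_Value": [100.0, 300.0, 400.0, 200.0],
})

# Sum per region. Last_region stays in the index, so the frame
# holds only the numeric Sum_of_Customer_Value column.
region_sums = (
    customerdata.groupby("Last_region")[["Customer_Value"]]
    .sum()
    .add_prefix("Sum_of_")
)

# Each region's share of the grand total, as a percentage.
total = region_sums["Sum_of_Customer_Value"].sum()
region_sums["Pct_of_total"] = region_sums["Sum_of_Customer_Value"] * 100 / total

# Only now turn Last_region back into a regular column.
result = region_sums.reset_index()
print(result)
```

Here the division is done on one named column instead of with apply, which avoids touching any string column even if the index has already been reset.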
@j.joshi.1979 Thanks. It is working now.