Create a new variable based on the value frequencies of an existing variable in Python

data_wrangling
python

#1

Hi,
Over the last weekend, I was working on a data science problem. In the data set, CITY is one of the variables, and there are 2000 unique cities across the globe. When I looked at the frequency distribution of these cities, almost 1800 of them have fewer than 10 entries. Now I want to create a new variable in which all of those 1800 cities are grouped into a new category, “other”, while the remaining cities keep their own names. Can you please help me with the code to do this?

dataframe: train
variable: city

Regards,
Imran


#2

@Imran,

Follow the steps below to create a new variable based on the frequency of an existing one.

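# Create a copy of the existing variable
train['city2'] = train['city']

# Create a frequency table (dropna=False also counts missing values)
city_cnt = train['city'].value_counts(dropna=False)

# Collect all cities that occur fewer than 10 times
city_check = list(city_cnt[city_cnt < 10].index)

# In the new variable, replace the value with "Others" if the city is in city_check
# (.loc is used here because .ix has been removed from recent pandas versions)
train.loc[train['city'].isin(city_check), 'city2'] = 'Others'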
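
If you need this for more than one column, the same steps can be wrapped in a small helper. This is only a minimal sketch: the function name group_rare, its parameters, and the toy city values are my own assumptions for illustration, not part of your data. It uses Series.where instead of an assignment through .loc, which avoids making the copy first.

```python
import pandas as pd


def group_rare(values, min_count=10, other="Others"):
    """Return a copy of `values` where categories seen fewer than
    `min_count` times are replaced by `other` (hypothetical helper)."""
    counts = values.value_counts()
    rare = counts[counts < min_count].index
    # where() keeps entries for which the condition is True
    # and substitutes `other` everywhere else.
    return values.where(~values.isin(rare), other)


# Toy stand-in for the real `train` dataframe (made-up values).
train = pd.DataFrame(
    {"city": ["Delhi"] * 12 + ["Pune"] * 10 + ["Oslo"] * 3 + ["Lima"] * 2}
)

# The original column is left untouched; the grouped version goes to city2.
train["city2"] = group_rare(train["city"], min_count=10)
print(train["city2"].value_counts())
```

In this toy data, Delhi and Pune have at least 10 rows and keep their names, while the 5 rows for Oslo and Lima end up under "Others".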

Hope this helps!

Regards,
Pravin