How to work with strsplit? (R)



There is a feature in my data set like the one mentioned below:
over $150,000
$25,001 - $50,000
$75,000 - $100,000
under $25,000

I want to create a new feature called normalised income, of int type. How do I go about it?


Hi @B.Rabbit

Why not use an ordinal variable instead? This will give you the ranking.



Hi @Lesaffrea
Being a beginner, I do not quite understand your suggestion. Can you please elaborate, or provide a suitable link for understanding ordinal variables?


Hi @B.Rabbit

First, you are dealing more or less with categories that have an order: over $150,000 is higher than $75,000 - $100,000. A category cannot be normalised; there will certainly be dependency, and the chi-square statistic can be used to tell you the degree of dependency.
Now, since 150 > 100, if you code 150 as 2 and 100 as 1 you still have an order, 2 > 1, so the order is respected. You lose some information compared with, say, 155 > 99, but you still have the ranking, and with non-parametric methods you can then do some calculations. Check the Wikipedia article on ordinal regression.
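
As a rough illustration of the ranking idea, here is a minimal sketch that turns the brackets into an ordered factor and then into integer ranks. The vector name income and the level list are assumptions for the example; only the four brackets shown in the question are included, so a real data set would need its full list of brackets in the right order.

```r
# Example values copied from the question (assumed to be the whole column)
income <- c("over $150,000", "$25,001 - $50,000",
            "$75,000 - $100,000", "under $25,000")

# Brackets listed from lowest to highest; extend this with any
# brackets in your data that are not shown in the question
income_levels <- c("under $25,000", "$25,001 - $50,000",
                   "$75,000 - $100,000", "over $150,000")

# An ordered factor keeps the ranking between brackets
income_ord <- factor(income, levels = income_levels, ordered = TRUE)

# Integer rank: 1 for the lowest bracket, 4 for the highest one here
income_rank <- as.integer(income_ord)
```

Any value that is missing from income_levels becomes NA, which is a quick way to spot brackets you forgot to list.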

Hope this helps a little.


Hi @B.Rabbit

You can split each string on the space character using strsplit. The format is as follows:
data$Income <- strsplit(as.character(data$Income), "[ ]")

This gives you a list with one character vector of tokens per row.
Next, you can get the string you want by accessing it through its position; for example, data$Income[[1]][2] is the second token of the first row.
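
To go from the split tokens to an int feature, something along these lines might work. It is only a sketch: the names income, to_int, bounds and income_int are made up for the example, and using the midpoint of a range (or the single bound for "over"/"under" brackets) as the representative value is an assumption, not something stated in the question.

```r
# Example values copied from the question
income <- c("over $150,000", "$25,001 - $50,000",
            "$75,000 - $100,000", "under $25,000")

# Split every string on a single space; the result is a list of character vectors
parts <- strsplit(income, " ", fixed = TRUE)

# Hypothetical helper: drop "$" and "," and convert to integer
to_int <- function(x) as.integer(gsub("[$,]", "", x))

# Keep only the tokens that contain a digit, then convert them
bounds <- lapply(parts, function(p) to_int(p[grepl("[0-9]", p)]))

# Assumption: represent each bracket by the midpoint of its bounds,
# truncated to an integer (a single bound is used as-is)
income_int <- vapply(bounds, function(b) as.integer(mean(b)), integer(1))
```

Note that "over" and "under" brackets collapse onto their single bound this way, so you may prefer the rank-based encoding if that distinction matters.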

Hope this helps.


Thank you @Lesaffrea , @shashwat.2014