How to work with strsplit? (R)

#1

There is a feature in my data set like the one mentioned below:
Income
over \$150,000
\$25,001 - \$50,000
\$75,000 - \$100,000
under \$25,000

I want to create a new feature called normalised income, of int type. How do I go about it?

#2

Why not use an ordinal variable instead? This will give you the ranking.

Alain

#3

Hi @Lesaffrea
Being a beginner, I do not quite understand your suggestion. Could you please elaborate, or point me to a suitable link for understanding ordinal variables?
Thanks
Rabbit

#4

First, you are dealing more or less with categories that have an order: over 150,000 is higher than 75,000 to 100,000. You cannot normalise a category. You will certainly have dependency, and the chi-square statistic could be used to tell you the degree of dependency.
Now, since 150 > 100, if you say 150 = 2 and 100 = 1 you still have an order, 2 > 1, so the order is respected. You lose some information compared to, say, 155 > 99, but you still have the ranking, and with non-parametric methods you can then do some calculations. Check the Wikipedia article on ordinal regression.
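
To make the ranking idea concrete, here is a minimal sketch in R. It assumes the only bands are the four shown in the question; the vector names (`income`, `bands`, `income_ord`, `income_rank`) are made up for illustration, and any other bands in the real data would need to be added to the level list in the right place.

```r
# Hypothetical sample using the four bands from the question
income <- c("over $150,000", "$25,001 - $50,000",
            "$75,000 - $100,000", "under $25,000")

# Levels listed from lowest to highest band
# (assumption: these four are the only bands present)
bands <- c("under $25,000", "$25,001 - $50,000",
           "$75,000 - $100,000", "over $150,000")

# An ordered factor keeps the ranking between the bands
income_ord <- factor(income, levels = bands, ordered = TRUE)

# Integer rank of each band: 1 = lowest, 4 = highest
income_rank <- as.integer(income_ord)
income_rank
# [1] 4 2 3 1
```

Comparisons such as `income_ord[1] > income_ord[3]` work on the ordered factor, and `income_rank` gives an int column if that is what you need.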

Hope this helps a little.
Alain

#5

You can split each string on the space character using strsplit. The format is as follows:
`data$Income <- strsplit(as.character(data$Income), "[ ]")`

This will give you a list with one character vector of pieces at each position.
Next, you can get the piece you want by indexing its position, for example `data$Income[[1]][2]` for the second piece of the first entry.
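
As a rough sketch of those two steps, the snippet below splits a sample vector and then picks one piece per entry. The sample values come from the question; the helper `to_int` and the choice to take the last piece of each string as the amount are assumptions for illustration only, not the only way to do it.

```r
# Hypothetical sample using the four bands from the question
income <- c("over $150,000", "$25,001 - $50,000",
            "$75,000 - $100,000", "under $25,000")

# strsplit returns a list: one character vector of pieces per input string
parts <- strsplit(income, " ", fixed = TRUE)
parts[[2]]
# [1] "$25,001" "-"       "$50,000"

# Hypothetical helper: strip "$" and "," so "$25,001" becomes the integer 25001
to_int <- function(x) as.integer(gsub("[$,]", "", x))

# Assumption: use the last piece of each string as the amount for that band
amount <- vapply(parts, function(p) to_int(p[length(p)]), integer(1))
amount
# [1] 150000  50000 100000  25000
```

Note that for "over" and "under" bands the last piece is a bound rather than a typical value, so you may want to treat those entries separately.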

Hope this helps,
Shashwat

#6

Thank you @Lesaffrea, @shashwat.2014