Workshop - not able to detect na in train data set and

r

#1

I am using R on Windows 10 with RStudio Version 1.0.136.
Here is the dataset: train_gbW7HTd.csv (3.1 MB)

When I try to count the NA values I get the results below, and I don't know what is wrong, since it is the same code as in the workshop. I suppose there must be an error in the workshop code, so please help.

The train data set has only 32561 observations in total.

table(is.na(train))

FALSE
390732

colSums(is.na(test))
            ID            Age      Workclass      Education Marital.Status
             0              0              0              0              0
    Occupation   Relationship           Race            Sex Hours.Per.Week
             0              0              0              0              0
Native.Country
             0


#2

Hi @jatin_raina

Where is the problem? It matches perfectly: you have 390732 non-NA values in your table, which is 12 columns times 32561 observations (rows). Is nothing wrong, or am I missing something?
Alain
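
To make the arithmetic above concrete, here is a small sketch on a made-up data frame (the toy data and its column values are assumptions for illustration, not rows from the real file). It shows that table(is.na(...)) counts cells, not rows, so a total of rows times columns with no TRUE entry only means that R found no NA.

```r
# Toy stand-in for the train data: 5 rows, 3 columns, blanks stored as "".
toy <- data.frame(
  Age = c(39, 50, 38, 53, 28),
  Workclass = c("Private", "", "State-gov", "", "Private"),
  Occupation = c("Sales", "", "Tech-support", "Sales", ""),
  stringsAsFactors = FALSE
)

# is.na() returns one TRUE/FALSE per cell, so the total is rows * columns.
counts <- table(is.na(toy))
print(counts)

cells <- nrow(toy) * ncol(toy)
print(cells)

# The figure reported in the thread follows the same rule.
thread_cells <- 32561 * 12
print(thread_cells)
```

Note that the empty strings in the toy data are counted as FALSE, because "" is a valid string and not NA.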


#3

Sir, the problem is that when I look at my data manually using View, it has many missing values, but they are not showing up the way they do in the workshop code, even though the dataset is the same. It reports 390732 non-NA values and no NA at all (the data has 32561 obs).
Here is the train data set: train_gbW7HTd.csv (3.1 MB)


#4

Hi @jatin_raina

The problem is how you read the dataset: it contains empty strings, not NA. I guess you used
read.csv("train_gbW7HTd.csv", stringsAsFactors = FALSE)
to load your data set, so the blank fields were read as empty strings and not as NA. Instead you have to do
tocheck <- read.csv("train_gbW7HTd.csv", stringsAsFactors = FALSE, na.strings = "")
and if you then check for NA with
sum(is.na(tocheck))
you get 4262! Conclusion: be careful with CSV files!
Have fun
Alain
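
For anyone who wants to see the effect without the real file, here is a minimal sketch that reads a tiny inline CSV twice. The CSV text is made up for illustration; only the read.csv arguments come from the post above.

```r
# A tiny inline CSV with blank fields in the text columns (made-up rows).
csv_text <- "ID,Age,Workclass,Occupation
1,39,State-gov,Adm-clerical
2,50,,Exec-managerial
3,38,Private,
4,53,,"

# Default na.strings = "NA": blank text fields are kept as empty strings.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# na.strings = "": blank fields are turned into NA while reading.
fixed <- read.csv(text = csv_text, stringsAsFactors = FALSE, na.strings = "")

na_raw <- sum(is.na(raw))
na_fixed <- sum(is.na(fixed))
print(na_raw)    # 0: the blanks are "" and not NA
print(na_fixed)  # 4: the same blanks are now NA
```

If the data is already loaded, the blanks can also be converted afterwards, for example with raw[raw == ""] <- NA, but handling it in read.csv keeps everything in one step.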


#5

Can you explain your code fully? What do stringsAsFactors and na.strings signify here?
Generally I use read.csv("train_gbW7HTd.csv") to read a CSV. As that did not work, I imported the dataset with the import option and it worked fine. Why did the dataset work fine when it was imported that way?


#6

Hi @jatin_raina
training.data.raw <- read.csv('train.csv', header = TRUE, na.strings = "")
sapply(training.data.raw, function(x) sum(is.na(x)))

Then you'll get:

sapply(training.data.raw,function(x)sum(is.na(x)))
            ID            Age      Workclass      Education Marital.Status
             0              0           1836              0              0
    Occupation   Relationship           Race            Sex Hours.Per.Week
          1843              0              0              0              0
Native.Country   Income.Group
           583              0
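
As a side note, the per-column count from sapply gives the same numbers as the colSums(is.na(...)) call used in the first post. The sketch below checks this on a made-up data frame (the toy values are assumptions, not rows from the real file) and also lists which columns contain NA.

```r
# Toy data with NA already in place (made-up values).
toy <- data.frame(
  ID = 1:4,
  Workclass = c("Private", NA, "State-gov", NA),
  Occupation = c(NA, "Sales", "Sales", "Sales"),
  stringsAsFactors = FALSE
)

# Two ways to count NA per column.
per_column_sapply <- sapply(toy, function(x) sum(is.na(x)))
per_column_colsums <- colSums(is.na(toy))
print(per_column_sapply)

# Both approaches agree column by column.
same_counts <- all(per_column_sapply == per_column_colsums)
print(same_counts)

# Names of the columns that contain at least one NA.
affected <- names(per_column_sapply)[per_column_sapply > 0]
print(affected)
```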


#7

stringsAsFactors: when this is TRUE, strings are read as factors, i.e. numbers such as 1, 2, 3, 4 etc. are assigned to the distinct values. By setting it to FALSE, the default selection (TRUE) is avoided.

na.strings = "": this means empty strings ("") are replaced by NA.
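
Here is a short sketch of what stringsAsFactors changes, using a made-up one-column CSV. One thing to keep in mind: the default was TRUE in the R versions current when this thread was written, but since R 4.0.0 the default is FALSE, so the sketch sets the argument explicitly both times.

```r
# One text column with a repeated value (made-up data).
csv_text <- "Workclass
Private
State-gov
Private"

as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# With TRUE the column is a factor: integer codes plus a table of levels.
print(class(as_factor$Workclass))
print(levels(as_factor$Workclass))
codes <- as.integer(as_factor$Workclass)
print(codes)

# With FALSE the column stays a plain character vector.
print(class(as_text$Workclass))
```

The integer codes are positions in the sorted list of levels, which is the "1, 2, 3, 4" assignment described above.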


#8

@aniketh_1994, @lesaffrea
I have one more error:
prediction_test <- predict(train.tree, newdata = test, type = "class")
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object, :
factor Workclass has new levels Local-gov, Self-emp-inc, State-gov, Without-pay
I have only combined some of the levels into a new level called others, and the levels listed above are among the ones I combined, yet I am still facing this error. Please tell me how to handle this.

I also tried removing the identifier ID from the train and test datasets, but if I remove ID from the test data I get this error:
prediction_test <- predict(train.tree, newdata = test, type = "class")
Error in eval(expr, envir, enclos) : object 'ID' not found

The previous answers and help are much appreciated. Please tell me how to do this.
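
A "has new levels" error from predict usually means the factor in the test data contains values the model never saw in the train data. One plausible cause here (an assumption, since the preparation code is not shown) is that the levels were combined into others in train but not in test. The sketch below applies the same recoding to both data frames; the helper collapse_workclass, the keep vector and the toy rows are all hypothetical names and values made up for illustration.

```r
# Toy train and test sets (made-up rows).
train <- data.frame(
  Workclass = c("Private", "Private", "Self-emp-not-inc", "Local-gov", "State-gov"),
  stringsAsFactors = FALSE
)
test <- data.frame(
  Workclass = c("Private", "Local-gov", "Without-pay"),
  stringsAsFactors = FALSE
)

# Levels to keep as they are (hypothetical choice); everything else becomes "Others".
keep <- c("Private", "Self-emp-not-inc")

# Hypothetical helper: recode a column and fix the set of levels explicitly.
collapse_workclass <- function(x, keep) {
  x <- as.character(x)
  x[!is.na(x) & !(x %in% keep)] <- "Others"
  factor(x, levels = c(keep, "Others"))
}

# Apply the identical recoding to both data frames.
train$Workclass <- collapse_workclass(train$Workclass, keep)
test$Workclass <- collapse_workclass(test$Workclass, keep)

# Levels present in test but unknown to train; should be empty now.
unseen <- setdiff(levels(test$Workclass), levels(train$Workclass))
print(unseen)
```

The 'ID' not found error would be consistent with the tree having been fitted while ID was still a column of train. If that is the case (again an assumption), dropping ID from test alone is not enough: the model would need to be refitted on train without ID, and only then used on test without ID.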