Experiments with Data Workshop doubts


#1

Hi, I have a question about the workshop. Before imputing missing values, I searched for them in the train file using is.na() and colSums(), but neither shows any NAs: I see only FALSE values and no TRUE values. summary() doesn't show any NAs in the variables either, even though the Excel file has many NAs.

Could someone help me, please?


#2

Check this:

This helped me out.


#3

Thanks. Can you help me interpret this?
test <- read.csv("test_2AFBew7.csv", header=T, na.strings=c(""))
What do header=T and na.strings=c("") signify?


#4

Type ?read.csv in the R console for help with the function. header=T tells R that the first row of the file contains the column names. na.strings=c("") specifies that the listed values are to be interpreted as NA. So if you write na.strings=c("", "A", "B", "C"), every blank field and every "A", "B", and "C" in the file is converted to NA when it is read.
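
A small sketch of how na.strings behaves, using a throwaway CSV written to a temporary file. The file contents and column values are made up for illustration and are not the workshop data; stringsAsFactors = FALSE is set only to keep the example the same across R versions.

```r
# Hypothetical toy data: row 2 has an empty Workclass field,
# row 3 has a Workclass field holding a single space.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ID,Workclass,Age",
             "1,Private,39",
             "2,,50",
             "3, ,38"), csv_path)

# Default na.strings = "NA": empty text fields are read as "", not NA,
# so is.na() reports nothing missing.
raw <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
colSums(is.na(raw))

# Treat both the empty string and a single space as missing.
fixed <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE,
                  na.strings = c("", " "))
colSums(is.na(fixed))
```

Which strings to list depends on how the file actually encodes missing cells, so it is worth opening the raw CSV in a text editor to check.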


#5

Can someone help me? When I try imputing missing values by mode using the command described in the workshop:

install.packages("mlr", repos='http://cran.us.r-project.org')
library(mlr)
imputed_data<-impute(train, classes= list(factor=imputeMode()))
train<-imputed_data$data

It gives me this error:
Error in impute(train, classes = list(factor = imputeMode())) :
unused argument (classes = list(factor = imputeMode()))


#6

Hi @Paruloberai

In this case, you need to make some checks.
This error suggests that the mode function is not working. The mode function will not work well if the factor variables have many levels.
Check which variables are of class factor, then see whether you have mistakenly classified a numeric or integer variable as a factor.

P.S. I ran this code on my end and found no problem at all.
Keep me posted on this question by tagging me.

Regards
Manish
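
The checks described above could look roughly like this. The toy `train` data frame is a made-up stand-in for the workshop file; only some column names echo the thread, and the misclassified ID column is invented to illustrate the point.

```r
# Hypothetical stand-in for the workshop's train data.
train <- data.frame(
  Age       = c(39, 50, 38, 53),
  Workclass = factor(c("Private", NA, "Private", "State-gov")),
  Sex       = factor(c("Male", "Female", "Male", "Male")),
  ID        = factor(c("1", "2", "3", "4"))  # numeric ID wrongly stored as factor
)

# Class of every column, to spot which ones are factors.
col_classes <- sapply(train, function(x) class(x)[1])
print(col_classes)

# Number of levels per factor column; an unexpectedly large count
# (like ID here) hints at a numeric column misread as a factor.
factor_cols  <- names(train)[sapply(train, is.factor)]
level_counts <- sapply(train[factor_cols], nlevels)
print(level_counts)

# Convert a misclassified column back to numeric, going via character
# so the original values (not the internal level codes) are kept.
train$ID <- as.numeric(as.character(train$ID))
```

Separately, R reports "unused argument" when the function actually being called has no parameter of that name. So it may also be worth calling mlr::impute(...) explicitly, in case another loaded package defines its own impute(). That is a guess and not something confirmed in this thread.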


#7

Can anyone please help with the original question? I am getting the same problem.
I am using version 3.2.5.
My FALSE count also differs from the one in the workshop example.

Thanks in Advance


#8

Hi @premsheth

Assuming you are using R, you need to specify how missing values are coded when you load the data. Use the following code:

train_data <- read.csv("train.csv", na.strings = c(" "))


#9

Thank you so much for the reply, Manish.
I followed the step you suggested and got the correct results.


#10

I'm using the same code mentioned in the workshop, but it has no effect:
colSums(is.na(tocheck))
            ID            Age      Workclass      Education Marital.Status
             0              0           1836              0              0
    Occupation   Relationship           Race            Sex Hours.Per.Week
          1843              0              0              0              0
Native.Country   Income.Group
           583              0

imputed_data<- impute(tocheck,classes = list(factor = imputeMode()))
test<- imputed_data$data
colSums(is.na(imputed_data))

colSums(is.na(test))
            ID            Age      Workclass      Education Marital.Status
             0              0           1836              0              0
    Occupation   Relationship           Race            Sex Hours.Per.Week
          1843              0              0              0              0
Native.Country   Income.Group
           583              0

Please help with this.
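
One possibility worth checking, though nothing in the thread confirms it, is that these columns were read as character rather than factor, in which case a rule registered for the factor class would not apply to them. Below is a minimal base-R sketch of mode imputation that handles both types. The data frame is made up, and impute_mode() is a hypothetical helper written for this example, not part of mlr.

```r
# Hypothetical helper: replace NAs in a character or factor vector
# with its most frequent non-missing value.
impute_mode <- function(x) {
  if (all(is.na(x))) return(x)             # nothing to impute from
  counts <- table(x)                       # table() ignores NA by default
  mode_value <- names(counts)[which.max(counts)]
  x[is.na(x)] <- mode_value
  x
}

# Made-up stand-in for the data frame called `tocheck` in the post.
tocheck <- data.frame(
  Age        = c(39, 50, 38, 53, 28),
  Workclass  = c("Private", NA, "Private", "State-gov", NA),
  Occupation = factor(c("Sales", "Sales", NA, "Tech-support", "Sales")),
  stringsAsFactors = FALSE
)

sapply(tocheck, class)   # see which columns are character vs factor

# Apply the helper to every character or factor column.
is_categorical <- sapply(tocheck, function(x) is.character(x) || is.factor(x))
tocheck[is_categorical] <- lapply(tocheck[is_categorical], impute_mode)

colSums(is.na(tocheck))
```

If sapply(tocheck, class) shows "character" for the affected columns, converting them with as.factor() before calling the workshop's imputation code is another option to try.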


#11

Thanks Manish, but I am getting a FALSE count of 386470 instead of 419031. How can I get the right number?
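
A sketch of where the FALSE count comes from, assuming it was obtained with something like table(is.na(train)). The FALSE count is the number of cells (rows times columns) minus the number of NAs, so a different FALSE count can mean a different number of rows or columns rather than a problem with missing-value detection. The data frame below is made up for illustration.

```r
# Made-up data frame with 4 rows, 3 columns, and 2 missing cells.
df <- data.frame(
  Age       = c(39, NA, 38, 53),
  Workclass = c("Private", "Private", NA, "State-gov"),
  Sex       = c("Male", "Female", "Male", "Male")
)

counts <- table(is.na(df))
print(counts)

total_cells <- nrow(df) * ncol(df)
n_missing   <- sum(is.na(df))
n_false     <- total_cells - n_missing   # same value as counts[["FALSE"]]

# Dropping a column that has no NAs lowers the FALSE count by nrow(df)
# while the TRUE count stays the same.
fewer_cols <- df[, c("Age", "Workclass")]
table(is.na(fewer_cols))
dim(df)
dim(fewer_cols)
```

The two counts quoted above differ by 32561. If that happens to equal nrow(train), the data frame may have one column fewer than the workshop's; comparing dim(train) with the workshop output would confirm or rule this out.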