# How to count missing values in R

#1

I am working on a data set and want to count the number of missing values in the Ozone column, but I have not been able to.
str(z)
'data.frame': 153 obs. of 6 variables:
Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
Temp : int 67 72 74 62 56 66 65 59 61 69 ...
Month : int 5 5 5 5 5 5 5 5 5 5 ...
Day : int 1 2 3 4 5 6 7 8 9 10 ...

#2

`sum(is.na(z$Ozone))` should work. `is.na` returns a logical vector of the same length as `z$Ozone`, with TRUE at every entry that is NA. `sum` treats TRUE as 1 and FALSE as 0, so summing that vector gives the total number of NAs.
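To make the mechanics concrete, here is a minimal sketch. It assumes `z` is a copy of R's built-in `airquality` data set, which matches the `str(z)` output in the question; the short vector `x` is made up for illustration.

```r
# Hypothetical toy vector, only to show what is.na() returns
x <- c(41, 36, NA, 18, NA)
na_flags <- is.na(x)         # FALSE FALSE TRUE FALSE TRUE
toy_count <- sum(na_flags)   # TRUE counts as 1, so this is 2

# Assumption: z is the built-in airquality data set
z <- airquality
ozone_na <- sum(is.na(z$Ozone))

print(toy_count)
print(ozone_na)
```

If your own `z` was read from a file, the same call applies as long as the missing entries were imported as NA.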

3 Likes
#3

Or `summary(z)`.

4 Likes
#4

`table(z$Ozone, exclude = NULL)` or `table(is.na(z$Ozone))` also work, although the first one is hard to read if the column has many distinct values.

3 Likes
#5

Hi @harry

I hope the replies above answered your question. If not, these may help:

For the NA count of the whole data set: `table(is.na(z))`
For the NA count column-wise: `sapply(z, function(x) sum(is.na(x)))`
For the NA count row-wise: `rowSums(is.na(z))`
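The three levels of counting can be tried side by side on a small table. This is only a sketch: the data frame `df` and its columns are made up for illustration, and `sum(is.na(df))` is used for the overall count in place of `table()`, which would report the FALSE and TRUE counts together.

```r
# Hypothetical data frame, made up for illustration
df <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "w"),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

total_na  <- sum(is.na(df))                         # 3 NAs in the whole table
by_column <- sapply(df, function(x) sum(is.na(x)))  # a = 2, b = 1, c = 0
by_row    <- rowSums(is.na(df))                     # 0, 1, 1, 1

print(total_na)
print(by_column)
print(by_row)
```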

Hope this is useful.

Regards
Raghavendra

17 Likes
#6

Hello Harry,

Just use `summary(z)`. For numeric columns it reports the number of NAs alongside the other summary statistics.

Using `sum(is.na(z$columnname))` can be misleading if the missing values in your data were recorded as something other than NA, for example as empty strings or a placeholder code. `is.na` only counts entries that are actually NA in the data set.
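A short sketch of that pitfall, using a made-up character vector `city` in which some missing entries arrived as empty strings (an assumption for illustration, not something taken from the Ozone data):

```r
# Hypothetical column where some missing entries were read in as empty strings
city <- c("Pune", "", NA, "Delhi", "")

na_only     <- sum(is.na(city))                 # 1: only the real NA is counted
blank_or_na <- sum(is.na(city) | city == "")    # 3: blanks are included too

# Recode blanks as NA so that later is.na() calls see them
city[!is.na(city) & city == ""] <- NA
after_recode <- sum(is.na(city))                # 3

print(na_only)
print(blank_or_na)
print(after_recode)
```

When the data comes from a CSV file, the `na.strings` argument of `read.csv` (for example `na.strings = c("", "NA")`) can do the same recoding at import time.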

1 Like
#7

If you look at the data for each of the months (5 through 9) in this data, how do you find which month had the greatest interquartile range for Ozone readings?

'data.frame': 153 obs. of 6 variables:
Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
Temp : int 67 72 74 62 56 66 65 59 61 69 ...
Month : int 5 5 5 5 5 5 5 5 5 5 ...
Day : int 1 2 3 4 5 6 7 8 9 10 ...
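One way to approach this, sketched under the assumption that the data frame is R's built-in `airquality` data set (it matches the structure shown) and is stored in `z`. The names `iqr_by_month` and `widest_month` are hypothetical. Because Ozone contains NAs, `na.rm = TRUE` is passed through to `IQR`; without it the call would fail on months with missing readings.

```r
# Assumption: z is the built-in airquality data set
z <- airquality

# IQR of Ozone within each month; na.rm = TRUE drops the missing readings
iqr_by_month <- tapply(z$Ozone, z$Month, IQR, na.rm = TRUE)
print(iqr_by_month)

# Label of the month (5 through 9) with the largest IQR
widest_month <- names(which.max(iqr_by_month))
print(widest_month)
```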

#9

To count the rows without and with missing values, use:

```
sum(complete.cases(data))   # rows with no missing values
sum(!complete.cases(data))  # rows with at least one missing value
```
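As a quick check of how the two counts relate, here is a sketch that assumes `data` stands for the built-in `airquality` data set discussed earlier in the thread:

```r
# Assumption: `data` stands for the built-in airquality data set
data <- airquality

complete_rows   <- sum(complete.cases(data))   # rows with no NA in any column
incomplete_rows <- sum(!complete.cases(data))  # rows with at least one NA

# The two counts always add up to the number of rows
stopifnot(complete_rows + incomplete_rows == nrow(data))

# Pull out the first few incomplete rows for inspection
print(head(data[!complete.cases(data), ]))
```

Note that this counts rows, not individual NAs: a row with two missing values is still counted once.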
#10

Try this:

`sum(is.na(<name of your imported data>$<variable you are looking for>))`

For example, if you import your CSV file and assign it the name `mydata`, you can get the number of NAs in the Ozone column with:

`sum(is.na(mydata$Ozone))`

1 Like
#11

`colSums(is.na(z))`

#12

sapply(trainset,function(x)sum(is.na(x)))

2 Likes
#13

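Using a for loop:
tes <- function(x) {
  res <- NULL  # accumulator kept inside the function, one row per column
  for (i in seq_len(ncol(x))) {
    temp <- data.frame(temp = sum(is.na(x[, i])))  # NA count of column i
    temp$var <- colnames(x)[i]                     # label it with the column name
    res <- rbind(res, temp)
  }
  res
}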

#14

Hi
You could also use the function below:
`sapply(data, function(x) sum(is.na(x)))`

#15

This counts the NAs in each column and stores the result as a data frame:

```
na_count <- sapply(data, function(y) sum(is.na(y)))
na_count <- data.frame(na_count)
```
1 Like
#16

There are a few simple commands:

```
sum(is.na(z$Ozone))
```

Output: the number of missing values in the single column Ozone.

```
for (col in seq_len(ncol(z))) {
  print(sum(is.na(z[, col])))
}
```

Output: the number of missing values for each column, printed one per line.

```
lapply(z, function(x) sum(is.na(x)))
```

Output: a list giving each column name along with its number of missing values.

#17

If you want to go through the whole table at once and find the missing values in each variable separately, use:
`sapply(train, function(x) sum(is.na(x)))`
This gives the number of missing values separately for each column.
Apart from this, you can use:
`colMeans(is.na(train))`
This gives the proportion of missing values in each column rather than the count. For a single total across the whole table, use `sum(is.na(train))`.