How to count number of distinct values in a column of a data table in R?

r
data_wrangling

#1

Hello,

I have a table with 2947 rows and 1 column containing only integer values in the range 1 to 30. I want to calculate the number of distinct values in that column. I used a for loop like this:

k <- test[1, 1]
count <- 1
for (i in seq_len(nrow(test))) {
  if (test[i, 1] != k) {
    count <- count + 1
    k <- test[i, 1]
  }
}

This seems to work fine. How can I do it without using a for loop?

Thank you.


#2

Hello @Aditya_Sharma,
For a particular column you could use:

   > df <- c(1,2,3,4,5,6,7,4,5,6)
   > df_uniq <- unique(df)
   > length(df_uniq)
    [1] 7
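
Applied to a single data frame column like the one in the question, the same idea might look like the sketch below. The `test` data frame here is made-up stand-in data, not the original 2947-row table. It also shows one thing to watch for: the loop in the question counts how often the value changes from one row to the next, so it matches the number of distinct values only when the column is sorted.

```r
# Made-up stand-in for the one-column table in the question
test <- data.frame(V1 = c(5, 5, 3, 3, 3, 7, 5, 1))

# Distinct values, regardless of row order
n_unique <- length(unique(test[[1]]))

# Runs of equal consecutive values, which is what the loop in the question counts
n_runs <- length(rle(test[[1]])$lengths)

print(c(n_unique, n_runs))
```

On this unsorted example the two numbers differ (4 distinct values, 5 runs), so `length(unique(...))` is the safer choice unless the column is known to be sorted.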

To get the distinct rows of a data frame, you can use the distinct function from the dplyr package.
So after loading the package:

> library(dplyr)
> df1 <- data.frame(x=c(1,2,3,2),y = c("a","b","c","b"))
> df1
  x y
1 1 a
2 2 b
3 3 c
4 2 b

As you can see here, the row with the value pair (2, b) appears twice.

> distinct(df1)
  x y
1 1 a
2 2 b
3 3 c

This gives only the distinct rows in a dataset.
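
To turn this into a count, one possible sketch is to take the number of rows of the result, or to use dplyr's `n_distinct` on a single column. The variable names `n_rows` and `n_x` are only for illustration.

```r
library(dplyr)

df1 <- data.frame(x = c(1, 2, 3, 2), y = c("a", "b", "c", "b"))

# Number of distinct rows (x and y taken together)
n_rows <- nrow(distinct(df1))

# Number of distinct values in column x alone
n_x <- n_distinct(df1$x)

print(c(n_rows, n_x))
```

Both counts are 3 for this small example.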
Hope this helps!!


#3

Sorry, I forgot to mention that there is another package called sqldf, which can be used like this:

> library(sqldf)
> sqldf("select distinct(x) from df1")
  x
1 1
2 2
3 3

As you can see, there are 3 distinct values, so the count should be 3:

> sqldf("select count(distinct(x)) from df1")
  count(distinct(x))
1                  3

Hope this helps!!


#4

You can also use the table function, which lists each distinct value together with how often it occurs.

Here is an example:

df <- data.frame(val=c(1,2,3,4,5,6,7,4,5,6,2,2,2,2,2,1,1,1,25,28,25,29,3))
table(df$val)
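
Since the result has one entry per distinct value, its length gives the count asked for in the question. A small sketch on the same data (the names `freq` and `n_distinct_vals` are only for illustration):

```r
df <- data.frame(val = c(1,2,3,4,5,6,7,4,5,6,2,2,2,2,2,1,1,1,25,28,25,29,3))

# Frequency of each distinct value
freq <- table(df$val)

# One entry per distinct value, so the length is the distinct count
n_distinct_vals <- length(freq)
print(n_distinct_vals)
```

This gives 10 here. Note that table leaves out NA values by default, whereas unique keeps NA as a value of its own, so the two approaches can differ by one if the column has missing values.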

Hope this helps.