How to extract columns with repeating values in a data frame in R?

r
data_wrangling

#1

Hello,

I want to select all columns in my data frame which have one or more repeating values and the columns have variables with different data types. How can I do this? For example->

If my data frame is->

q=1:5
r=c(“a”,“b”,“c”,“d”,“a”)
s=c(TRUE,FALSE,FALSE,TRUE,FALSE)
df=data.frame(q,r,s)
df
q r s
1 1 a TRUE
2 2 b FALSE
3 3 c FALSE
4 4 d TRUE
5 5 a FALSE

Then I want the output as ->

r s
1 a TRUE
2 b FALSE
3 c FALSE
4 d TRUE
5 a FALSE

How to do this?

Thanks


#2

@Ravi ,

q=1:5
r=c(“a”,“b”,“c”,“d”,“a”)
s=c(TRUE,FALSE,FALSE,TRUE,FALSE)
df=data.table(q,r,s)

df1 = sqldf(“select * from df group by r”)

result = sqldf(“select A.r from df A inner join df1 B on A.r = B.r”)
This gives:

Note the A.r part in the last sqldf code.This will select data from the original data as you want only the columns having duplicate data whereas df1 has only the distinct values.
sqldf is a package in R which allows you to use sql queries within R.
Hope this helps!


#3

@shuvayan that is not the desired output requested by the OP.

I would do this in two steps:

any_dupl <- sapply(df, function(x) any(duplicated(x))) # first check
df2 <- df[any_dupl] #subset based on above condition
uniq_class <- !duplicated(sapply(df2, class)) #check for data types
df3 <- df2[uniq_class] #subset the unique data types
#   r     s
# 1 a  TRUE
# 2 b FALSE
# 3 c FALSE
# 4 d  TRUE
# 5 a FALSE