Finding the percentage of ratings >= 4 for each movie

r
data_wrangling

#1

Hello,

I have the following dataset:
A1Ratings.csv (1.3 KB)
For each movie, I am trying to find the percentage of ratings that are >= 4.
My code:

library(dplyr)  # provides arrange() and desc()

# Per column of movie: number of ratings >= 1 (all ratings) and number of ratings >= 4
tot_count <- data.frame(apply(movie, 2, function(x) sum(x >= 1, na.rm = TRUE)))
req_count <- data.frame(apply(movie, 2, function(x) sum(x >= 4, na.rm = TRUE)))
count <- data.frame(tot_count, req_count, colnames(movie))
# Rename columns:
colnames(count)[1] <- "tot_count"
colnames(count)[2] <- "req_count"
colnames(count)[3] <- "movie"
# Drop the first row, which corresponds to the first column of movie
count <- count[-1, ]
count$percent_rating <- (count$req_count / count$tot_count) * 100
# Arrange in descending order of rating %
arrange(count, desc(percent_rating))

The output that I am getting is:


But apparently this is not the correct order.
Can somebody please tell me what I am doing wrong?


#2

Would you like the data frame sorted by movie percent_rating? Also, your file link doesn’t work.


#3

Hi @Pierre_Lafortune,

Yes, the file has to be sorted on the movie percent_rating.
For the file, can you please try [here](https://www.dropbox.com/s/7mmsqa3v7b3lzb1/conditional.csv?dl=0) and let me know if it works?


#5

Your code is fine. I got the same result:

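df <- read.csv('conditional.csv')
# Turn every column except the first two into TRUE/FALSE: is the rating >= 4?
df[-c(1, 2)] <- sapply(df[-c(1, 2)], function(x) x >= 4)
# The mean of a logical column is the proportion of TRUE values (NAs ignored)
newdf <- as.data.frame(sapply(df[-c(1, 2)], mean, na.rm = TRUE))
colnames(newdf) <- 'Percent_Rating'
# Sort in descending order; drop = FALSE keeps the result a data frame
newdf[order(newdf$Percent_Rating, decreasing = TRUE), , drop = FALSE]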
                                                         Percent_Rating
X318..Shawshank.Redemption..The..1994.                       0.70000000
X260..Star.Wars..Episode.IV...A.New.Hope..1977.              0.53333333
X3578..Gladiator..2000.                                      0.50000000
X541..Blade.Runner..1982.                                    0.44444444
X593..Silence.of.the.Lambs..The..1991.                       0.43750000
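
If it helps, here is a more compact sketch of the same idea using `colMeans()`. The `ratings` data frame and its column names (`user`, `movie_a`, ...) are made up for illustration and are not taken from conditional.csv. It assumes one row per user, one column per movie, and `NA` for movies a user has not rated.

```r
# Hypothetical toy ratings: one row per user, one column per movie (NA = not rated)
ratings <- data.frame(
  user    = 1:5,
  movie_a = c(5, 4, NA, 3, 4),
  movie_b = c(2, NA, 4, 1, NA),
  movie_c = c(NA, 5, 5, 4, 4.5)
)

# Drop the identifier column and flag ratings >= 4 (NA stays NA)
is_high <- ratings[-1] >= 4

# Share of non-missing ratings that are >= 4, expressed as a percentage
percent_high <- colMeans(is_high, na.rm = TRUE) * 100

# Collect into a data frame and sort in descending order of percentage
result <- data.frame(movie = names(percent_high),
                     percent_rating = unname(percent_high))
result <- result[order(result$percent_rating, decreasing = TRUE), ]
print(result)
```

Because the denominator is the number of non-missing ratings in each column, this should agree with the `req_count / tot_count` approach whenever every rating is at least 1. Multiplying by 100 gives percentages rather than the 0 to 1 proportions shown in the output above.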