Web scraping in R with top 100 movies on IMDB post


The original post is old, and its page suggested posting questions here in the discussion forum instead.

First, the IMDB movies page (the one used below) is a little different now than it was at the time of the post, so I can’t do everything as in the original post. At this point, I’m only trying to scrape the ranking, title, and IMDB rating.

Judging by the head of the data pulled, the ranking and title were scraped successfully.
I have two different issues with the IMDB rating. The first can likely be solved by applying the method you used for the metascores to the IMDB ratings as well. Ok, no problem.
The second issue I can’t figure out: the head of the data pulled includes “The Godfather, Part II” and “Jennifer Aniston” among the ratings, and I don’t know how they got there. The CSS selector used was “strong”.

I will keep experimenting with it, but any help you can offer is appreciated.

Thank you.


Hi @herbacidal

Please use “td strong” instead of “strong” as CSS selector to extract ratings from the page.
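
To illustrate why the narrower selector helps, here is a minimal sketch using rvest on a small made-up HTML snippet. The markup below is hypothetical and is not IMDB's actual page structure; it only assumes that the ratings sit in <strong> tags inside table cells while other <strong> tags appear elsewhere on the page. It also assumes rvest 1.0.0 or later (older versions would use html_nodes() in place of html_elements()).

```r
library(rvest)

# Hypothetical, simplified page: NOT IMDB's real markup.
html <- '
<div class="sidebar"><strong>Jennifer Aniston</strong></div>
<p><strong>The Godfather, Part II</strong></p>
<table>
  <tr><td>The Godfather</td><td><strong>9.2</strong></td></tr>
  <tr><td>The Godfather, Part II</td><td><strong>9.0</strong></td></tr>
</table>'

page <- read_html(html)

# "strong" matches every <strong> element anywhere on the page
all_strong <- html_text(html_elements(page, "strong"), trim = TRUE)

# "td strong" matches only <strong> elements nested inside a <td>
td_strong <- html_text(html_elements(page, "td strong"), trim = TRUE)

# Convert the rating strings to numbers
ratings <- as.numeric(td_strong)

print(all_strong)
print(ratings)
```

In this sketch the plain "strong" selector also picks up the sidebar name and the stray title, while "td strong" returns only the values inside the table.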


Well, that fixed that error, but the results still aren’t what I wanted: there are a huge number of results, with large spaces in between. I’m still trying to solve that. Thank you for solving the first issue.
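
If the "large spaces" are whitespace padding and empty strings in the scraped text (an assumption, since the actual output isn't shown here), a base R sketch along these lines might help clean it up. The raw_values vector is made-up sample data standing in for the scraped results.

```r
# Made-up sample of scraped text with padding and empty entries (assumption)
raw_values <- c("\n        9.2\n    ", "", "   \n  ", "9.0  ", "\n 8.9")

# Strip leading and trailing whitespace (spaces, tabs, newlines)
trimmed <- trimws(raw_values)

# Drop entries that are empty after trimming
non_empty <- trimmed[nzchar(trimmed)]

# Convert the remaining strings to numbers
ratings <- as.numeric(non_empty)

print(ratings)
```

If there are still more results than expected after this step, the selector is probably matching extra elements and needs to be narrowed further.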


Hello All,

I tried scraping the IMDb website for the 100 most popular feature films released in 2016.

I’m getting an error when I create the data.frame.
I’m sharing the code and the error below.

R Code:

movies_df <- data.frame(Rank = rank_data, Title = title_data,
                        Description = description_data, Runtime = runtime_data,
                        Genre = genre_data, Rating = rating_data,
                        Metascore = metascore_data, Votes = votes_data,
                        Gross_Earning_in_Mil = gross_data,
                        Director = directors_data, Actor = actors_data)

Result or Error
Error in data.frame(Rank = rank_data, Title = title_data, Description = description_data, :
arguments imply differing number of rows: 100, 102, 106

Please tell me how to fix this.

#Structure of the data frame



Hi @puneet005, going by the code and the error you have provided, I suggest checking the length of every vector you created. The error reports lengths of 100, 102, and 106, and data.frame() cannot combine columns with differing numbers of rows. Every vector must have the same length before the data frame can be built.
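
One way to find the offending vectors is to compare their lengths side by side. This is a minimal sketch with placeholder data: the variable names come from your code, but which vector has which length is an assumption made purely for illustration, since the error message does not say.

```r
# Placeholder vectors standing in for the scraped columns.
# Which column has which length is assumed here for illustration only.
rank_data        <- 1:100
title_data       <- paste("Title", 1:102)
description_data <- paste("Description", 1:106)

# Collect the columns in a named list and measure each one
columns <- list(Rank = rank_data,
                Title = title_data,
                Description = description_data)
column_lengths <- lengths(columns)
print(column_lengths)

# Names of the columns that do not have the expected 100 entries
expected_n <- 100
mismatched <- names(column_lengths)[column_lengths != expected_n]
print(mismatched)
```

Any column listed in mismatched has picked up extra (or too few) elements, which usually means its CSS selector needs adjusting before data.frame() will succeed.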

Hope this helps!