Using the rvest package: How Good are the Movies on Netflix?

When I first explored the idea of scraping Netflix’s movie catalog, I expected the process to be a challenge. I soon realized that Netflix, along with Hulu, goes to great lengths to ensure its data is not scraped. It is still possible, but I decided to look for an easier route. I found a website that lists all of the movies currently available on Netflix, and its data was relatively easy to obtain using R’s rvest package. It took a little cleaning, but I was able to create a data frame containing the title, year, and genre of every movie on Netflix. The accuracy of this analysis therefore depends on the reliability of that source. Still, we can see that the most popular genres are “Action & Adventure”, “Comedies” and “Dramas”. This matches my expectations; however, I was surprised to see that there are more Bollywood movies than horror films.
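As a rough illustration of the scraping step, here is a minimal sketch using rvest. The inline HTML, the CSS classes (`movie`, `title`, `year`, `genre`) and the film titles are hypothetical stand-ins, since the listing site's actual markup is not reproduced in this post; against a live page, `read_html()` with the listing URL would take the place of `minimal_html()`.

```r
library(rvest)

# Hypothetical markup standing in for the listing page.
# The real site's structure and class names will differ.
page <- minimal_html('
  <ul>
    <li class="movie"><span class="title">Example Film A</span>
      <span class="year">(2015)</span><span class="genre">Dramas</span></li>
    <li class="movie"><span class="title">Example Film B</span>
      <span class="year">(2018)</span><span class="genre">Comedies</span></li>
    <li class="movie"><span class="title">Example Film C</span>
      <span class="year">(2011)</span><span class="genre">Dramas</span></li>
  </ul>')

# One node per movie entry
items <- html_elements(page, "li.movie")

# Pull each field out of every entry and do the light cleaning:
# strip everything except digits from the year, then convert to integer
movies <- data.frame(
  title = html_text2(html_element(items, ".title")),
  year  = as.integer(gsub("[^0-9]", "", html_text2(html_element(items, ".year")))),
  genre = html_text2(html_element(items, ".genre")),
  stringsAsFactors = FALSE
)

# Count films per genre, most common first
genre_counts <- sort(table(movies$genre), decreasing = TRUE)
```

Printing `genre_counts` gives the kind of genre ranking described above, here for the toy data only.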

The next thing I was interested in was getting a gauge on the quality of Netflix films. I personally refer to IMDB before watching a film and tend to avoid anything under a 7.0 rating. I had already downloaded the IMDB dataset for a previous project, so it was just a matter of combining the data using an inner join on the title and year variables. Not every Netflix film was matched in the join, but enough were to continue with the analysis, considering the stakes. I created a new boolean variable using the mutate() function to flag whether a film was rated greater than or equal to 7. After that, I summarized the data to count the results.
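The join and summary could be sketched along these lines with dplyr. The two small data frames are made-up stand-ins for the scraped Netflix table and the IMDB dataset, and the column names (`title`, `year`, `rating`, `good`) are assumptions rather than the names used in the original analysis.

```r
library(dplyr)

# Toy stand-ins for the scraped Netflix table and the IMDB dataset
netflix <- data.frame(
  title = c("Example Film A", "Example Film B", "Example Film C", "Example Film D"),
  year  = c(2015L, 2018L, 2011L, 2020L),
  stringsAsFactors = FALSE
)
imdb <- data.frame(
  title  = c("Example Film A", "Example Film B", "Example Film C", "Unrelated Film"),
  year   = c(2015L, 2018L, 2011L, 1999L),
  rating = c(7.4, 6.1, 7.0, 8.2),
  stringsAsFactors = FALSE
)

# Keep only films present in both tables, matched on title and year,
# then flag those rated 7 or higher
rated <- netflix %>%
  inner_join(imdb, by = c("title", "year")) %>%
  mutate(good = rating >= 7)

# Count the matched films and the share that pass the filter
summary_tbl <- rated %>%
  summarise(
    n_films    = n(),
    n_good     = sum(good),
    share_good = mean(good)
  )
```

Because the join matches on exact title and year, a film whose title is spelled differently in the two sources drops out of `rated`, which is one likely reason not every Netflix film survives the join.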

A solid 36% of the current movies on Netflix are rated 7.0 or higher, which seems fair to me. While technically the majority of films do not pass my personal filter, I’m sure many others are less of a movie snob than I am.
