So one of the most awaited blogs is finally here. Today we will be building a Movie Recommendation System that will produce very good results in very less lines of code. So without any further due.
Let’s do it…
Step 1 – Importing packages required for Movie Recommendation System.
import pandas as pd
Step 2 – Reading input data.
df1 = pd.read_csv('u.data',sep='\t') df1.columns = ['user_id','item_id','rating','timestamp'] df1.head()
Step 3 – Reading Movie titles.
df2 = pd.read_csv('Movie_Id_Titles') df2.head()
Step 4 – Merging movie data and movie titles.
df = pd.merge(df1,df2,on='item_id') df.head()
- We are merging both data frames on ‘item_id’ which is present in both data frames.
Step 5 – Grouping same movie entries.
rating_and_no_of_rating = pd.DataFrame(df.groupby('title')['rating'].mean().sort_values(ascending=False)) rating_and_no_of_rating
- We are grouping movies here and taking the mean of all ratings given to them and then we are sorting them by their mean rating.
- You can see that in the result below a crap movie ‘They made a criminal’ is showing up which might have got a rating from only one person and that too 5 stars. That’s why its mean is also 5.
Step 6 – Adding a column of no. of ratings.
rating_and_no_of_rating['no_of_ratings'] = df.groupby('title')['rating'].count() rating_and_no_of_rating
- Adding a column of no. of ratings.
- We are calculating the no. of ratings by using the count method of a data frame.
Step 7 – Sorting on no. of ratings.
rating_and_no_of_rating = rating_and_no_of_rating.sort_values('no_of_ratings',ascending=False) rating_and_no_of_rating.head()
- Simply sort by no. of ratings.
- And now we see some genuine results.
- Star Wars which is a very famous movie has got a mean of 4.35 as a rating from 583 users.
Step 8 – Creating a pivot table.
pt = df.pivot_table(index='user_id',columns='title',values='rating') pt.head()
- Creating a pivot table.
- In this pivot table users go along rows and movies go along columns.
- Nan represents that, that user has not given any rating to that movie.
Step 9 – Checking movie names.
- Simply printing all movie names we have.
Step 10 – Live Prediction.
test_movie = input('Enter movie name --> ') movie_vector = pt[test_movie].dropna() similar_movies = pt.corrwith(movie_vector) corr_df = pd.DataFrame(similar_movies,columns=['Correlation']) corr_df = corr_df.join(rating_and_no_of_rating['no_of_ratings']) corr_df = corr_df[corr_df['no_of_ratings']>100].sort_values('Correlation',ascending=False).dropna() corr_df.head(10)
- While testing I added ‘Star Wars (1977)’ as a test movie.
- Pick its vector from the pivot table above.
- Use that vector and correlate it with other movies using corrwith() and it will give its correlation with other movies based on user ratings and store it in similar_movies.
- After that create a data frame as shown below and again merge no_of_ratings in it to take at least 100 ratings as the threshold.
- Sort the results based on correlations and BOOM here are the results.
- We can see the results, all these movies are somehow related to Star Wars.
- Empire Strikes Back, Return of the Jedi, both of them are related to Star Wars.
Do let me know if there’s any query regarding Movie Recommendation System by contacting me on email or LinkedIn.
So this is all for this blog folks, thanks for reading it and I hope you are taking something with you after reading this and till the next time ?…
Read my previous post: IPL SCORE PREDICTION WITH FLASK APP