IPL Score Prediction With Flask App - With Source Code - 2024

In today’s blog, we will build an IPL Score Prediction model using Ridge Regression, which is Linear Regression with an added L2 penalty on the coefficients to reduce overfitting. We have the IPL data from 2008 to 2017.

We will also build a good-looking GUI using HTML and CSS. So, without further ado, let’s do it…

Create a conda environment and install the required libraries

conda create -n ipl python=3.9
conda activate ipl
pip install joblib numpy pandas seaborn scikit-learn flask

flask run

Step 1 – Importing libraries required for IPL Score Prediction.

import joblib
import numpy as np
import pandas as pd
import seaborn as sns
from datetime import datetime
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error

Step 2 – Reading the data for IPL Score Prediction.

df = pd.read_csv('ipl.csv')
df.head()

Step 3 – Dropping unnecessary columns.

cols_to_drop = ['mid','batsman','bowler','striker','non-striker']
df.drop(cols_to_drop,axis=1,inplace=True)
df.head()

For our use case, columns like mid, batsman, bowler, striker, and non-striker add little, so we drop them.
Individual batsmen certainly influence the score, but so many batsmen have played in the IPL that encoding them would create far too many categories, so it is better to drop these columns.

Step 4 – Preprocessing our data for IPL Score Prediction.

df['date'] = df['date'].apply(lambda x: datetime.strptime(x,'%Y-%m-%d'))


# we have to remove temporary teams or the teams which are not available now
consistent_teams = ['Chennai Super Kings', 'Delhi Daredevils', 
                    'Kings XI Punjab', 'Kolkata Knight Riders', 
                    'Mumbai Indians', 'Rajasthan Royals', 
                    'Royal Challengers Bangalore', 'Sunrisers Hyderabad']

df = df[(df['bat_team'].isin(consistent_teams)) & (df['bowl_team'].isin(consistent_teams))]


# we don't want first five overs data
df = df[df['overs']>=5.0]

df.head()

First, we convert the date column from strings to datetime objects.
Then we remove teams that no longer play in the IPL and keep only the consistent teams, requiring both the batting and the bowling side to be in the list.
Finally, we keep only rows from the 5th over onwards (overs >= 5.0), because the earliest stage of an innings says little about the final score.
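
To make the three filters above concrete, here is a small sketch on a made-up four-row frame. The rows and the shortened team list are illustrative only, and pd.to_datetime is used as an assumed alternative to the strptime lambda; it is not what the code in this post calls.

```python
import pandas as pd

# Toy data (made up for illustration; not rows from ipl.csv)
toy = pd.DataFrame({
    'date': ['2008-04-18', '2008-04-18', '2011-04-09', '2016-05-01'],
    'bat_team': ['Kolkata Knight Riders', 'Kolkata Knight Riders',
                 'Kochi Tuskers Kerala', 'Mumbai Indians'],
    'bowl_team': ['Royal Challengers Bangalore', 'Royal Challengers Bangalore',
                  'Mumbai Indians', 'Rajasthan Royals'],
    'overs': [0.1, 5.2, 7.3, 12.4],
})

# Shortened team list, just for this sketch
keep_teams = ['Kolkata Knight Riders', 'Royal Challengers Bangalore',
              'Mumbai Indians', 'Rajasthan Royals']

# 1. Parse the date strings into datetime values
toy['date'] = pd.to_datetime(toy['date'], format='%Y-%m-%d')

# 2. Keep rows where BOTH teams are in the list
both_known = toy['bat_team'].isin(keep_teams) & toy['bowl_team'].isin(keep_teams)
toy = toy[both_known]

# 3. Keep rows from the 5th over onwards
toy = toy[toy['overs'] >= 5.0]

print(toy)
```

Only the second and fourth rows survive: the first is too early in the innings, and the third involves a team outside the list.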

Step 5 – Checking unique venues.

df['venue'].unique()

The output also includes some foreign grounds; the South African ones are removed in the next step.

Step 6 – Correct the names of the venues.

def f(x):
    if x=='M Chinnaswamy Stadium':
        return 'M Chinnaswamy Stadium, Bangalore'
    elif x=='Feroz Shah Kotla':
        return 'Feroz Shah Kotla, Delhi'
    elif x=='Wankhede Stadium':
        return 'Wankhede Stadium, Mumbai'
    elif x=='Sawai Mansingh Stadium':
        return 'Sawai Mansingh Stadium, Jaipur'
    elif x=='Eden Gardens':
        return 'Eden Gardens, Kolkata'
    elif x=='Dr DY Patil Sports Academy':
        return 'Dr DY Patil Sports Academy, Mumbai'
    elif x=='Himachal Pradesh Cricket Association Stadium':
        return 'Himachal Pradesh Cricket Association Stadium, Dharamshala'
    elif x=='Subrata Roy Sahara Stadium':
        return 'Maharashtra Cricket Association Stadium, Pune'
    elif x=='Shaheed Veer Narayan Singh International Stadium':
        return 'Raipur International Cricket Stadium, Raipur'
    elif x=='JSCA International Stadium Complex':
        return 'JSCA International Stadium Complex, Ranchi'
    elif x=='Maharashtra Cricket Association Stadium':
        return 'Maharashtra Cricket Association Stadium, Pune'
    elif x=='Dr. Y.S. Rajasekhara Reddy ACA-VDCA Cricket Stadium':
        return 'ACA-VDCA Stadium, Visakhapatnam'
    elif x=='Punjab Cricket Association IS Bindra Stadium, Mohali':
        return 'Punjab Cricket Association Stadium, Mohali'
    elif x=='Holkar Cricket Stadium':
        return 'Holkar Cricket Stadium, Indore'
    elif x=='Sheikh Zayed Stadium':
        return 'Sheikh Zayed Stadium, Abu-Dhabi'
    elif x=='Sharjah Cricket Stadium':
        return 'Sharjah Cricket Stadium, Sharjah'
    elif x=='Dubai International Cricket Stadium':
        return 'Dubai International Cricket Stadium, Dubai'
    elif x=='Barabati Stadium':
        return 'Barabati Stadium, Cuttack'
    else:
        return x

ignored_stadiums = ['Newlands', "St George's Park",
                    'Kingsmead', 'SuperSport Park', 'Buffalo Park',
                    'New Wanderers Stadium', 'De Beers Diamond Oval',
                    'OUTsurance Oval', 'Brabourne Stadium']

df = df[True^(df['venue'].isin(ignored_stadiums))]
df['venue'] = df['venue'].apply(f)
df.head()

Here we use the function f to standardise the venue names by appending the city.
Before that, we remove the entries played at the grounds listed in ignored_stadiums: South African grounds such as Newlands and St George’s Park, plus Brabourne Stadium.
The filter in the third-last line uses an XOR operation for this.
isin() returns True for every row whose venue is in ignored_stadiums. Since True XOR True is False and True XOR False is True, XOR-ing the mask with True flips it, so those rows become False and are left out of df. This is equivalent to negating the mask with ~.
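
The XOR trick can be checked on a tiny Series. This is only a sketch with a made-up venue list; it compares the `True ^ mask` form used in this post with the `~mask` spelling that is more common in pandas code.

```python
import pandas as pd

# Made-up venue column and ignore list for illustration
venues = pd.Series(['Newlands', 'Eden Gardens', 'Kingsmead', 'Wankhede Stadium'])
ignored = ['Newlands', 'Kingsmead']

in_ignored = venues.isin(ignored)   # True where the venue should be dropped
keep_xor = True ^ in_ignored        # the form used in this post
keep_not = ~in_ignored              # the more common pandas spelling

print(keep_xor.tolist())
print(venues[keep_not].tolist())
```

Both masks select the same rows, so either form can be used to filter the frame.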

Step 7 – Converting categorical columns to dummy variables.

df_new = pd.get_dummies(data=df,columns=['venue','bat_team','bowl_team'])
df_new.head()

Creating dummy variables out of categorical variables like venue, bat_team, and bowl_team.
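
As a rough illustration of what get_dummies produces, here is a sketch on a made-up three-row frame. Depending on the pandas version, the dummy columns may come out as booleans rather than 0/1 integers; passing dtype=int, as done here, is an assumption of this sketch and not part of the call above.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    'bat_team': ['Mumbai Indians', 'Chennai Super Kings', 'Mumbai Indians'],
    'runs': [45, 60, 82],
})

# One new 0/1 column per distinct team; the original 'bat_team' column is removed
encoded = pd.get_dummies(data=toy, columns=['bat_team'], dtype=int)

print(encoded.columns.tolist())
print(encoded)
```

Each new column is named with the original column name as a prefix, which is where names like 'bat_team_Mumbai Indians' in the later steps come from.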

Step 8 – Checking columns.

df_new.columns

Step 9 – Just change the positions of the columns.

df_new = df_new[['date','venue_ACA-VDCA Stadium, Visakhapatnam',
       'venue_Barabati Stadium, Cuttack', 'venue_Dr DY Patil Sports Academy, Mumbai',
       'venue_Dubai International Cricket Stadium, Dubai',
       'venue_Eden Gardens, Kolkata', 'venue_Feroz Shah Kotla, Delhi',
       'venue_Himachal Pradesh Cricket Association Stadium, Dharamshala',
       'venue_Holkar Cricket Stadium, Indore',
       'venue_JSCA International Stadium Complex, Ranchi',
       'venue_M Chinnaswamy Stadium, Bangalore',
       'venue_MA Chidambaram Stadium, Chepauk',
       'venue_Maharashtra Cricket Association Stadium, Pune',
       'venue_Punjab Cricket Association Stadium, Mohali',
       'venue_Raipur International Cricket Stadium, Raipur',
       'venue_Rajiv Gandhi International Stadium, Uppal',
       'venue_Sardar Patel Stadium, Motera',
       'venue_Sawai Mansingh Stadium, Jaipur',
       'venue_Sharjah Cricket Stadium, Sharjah',
       'venue_Sheikh Zayed Stadium, Abu-Dhabi',
       'venue_Wankhede Stadium, Mumbai','bat_team_Chennai Super Kings',
       'bat_team_Delhi Daredevils', 'bat_team_Kings XI Punjab',
       'bat_team_Kolkata Knight Riders', 'bat_team_Mumbai Indians',
       'bat_team_Rajasthan Royals', 'bat_team_Royal Challengers Bangalore',
       'bat_team_Sunrisers Hyderabad','bowl_team_Chennai Super Kings',
       'bowl_team_Delhi Daredevils', 'bowl_team_Kings XI Punjab',
       'bowl_team_Kolkata Knight Riders', 'bowl_team_Mumbai Indians',
       'bowl_team_Rajasthan Royals', 'bowl_team_Royal Challengers Bangalore',
       'bowl_team_Sunrisers Hyderabad','runs', 'wickets', 'overs', 'runs_last_5', 'wickets_last_5',
       'total']]
df_new.head()

Step 10 – Resetting index.

df_new.reset_index(inplace=True)
df_new.drop('index',inplace=True,axis=1)
df_new

Because we dropped many rows, the row indices are no longer continuous.
So in this step we reset the index and then delete the extra 'index' column that reset_index creates to hold the old index values.
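
The effect is easy to see on a toy frame. The sketch below uses made-up data and the drop=True argument of reset_index, which is an alternative to the two-step version above: it discards the old index instead of storing it in a column.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({'overs': [0.1, 5.1, 5.2, 6.0]})

# Filtering keeps the original row labels, so the index now starts at 1
filtered = toy[toy['overs'] >= 5.0]
print(filtered.index.tolist())

# drop=True renumbers the rows from 0 without adding an 'index' column
clean = filtered.reset_index(drop=True)
print(clean.index.tolist())
print(clean.columns.tolist())
```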

Step 11 – Scaling our numerical data for the IPL Score Prediction model.

scaler = StandardScaler()

scaled_cols = scaler.fit_transform(df_new[['runs', 'wickets', 'overs', 'runs_last_5', 'wickets_last_5']])
scaled_cols = pd.DataFrame(scaled_cols,columns=['runs', 'wickets', 'overs', 'runs_last_5', 'wickets_last_5'])

df_new.drop(['runs', 'wickets', 'overs', 'runs_last_5', 'wickets_last_5'],axis=1,inplace=True)
df_new = pd.concat([df_new,scaled_cols],axis=1)

df_new.head()

We scale the columns ‘runs’, ‘wickets’, ‘overs’, ‘runs_last_5’, and ‘wickets_last_5’ to bring them to the same scale (zero mean and unit variance).
We do not scale the rest of the data because the other columns are just 1s and 0s representing categorical variables, and scaling is only applied to numerical values.
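
For readers who want to see what StandardScaler actually computes, here is a minimal sketch on made-up numbers for two features. It also shows that new inputs are transformed with the statistics learned during fitting, which means any code that later feeds fresh inputs to the model needs the same fitted scaler. The arrays below are assumptions for illustration, not values from ipl.csv.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values for two numeric features: runs, wickets
train = np.array([[40.0, 1.0], [80.0, 3.0], [120.0, 5.0]])
new_row = np.array([[100.0, 2.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)   # learns per-column mean and std

# Same result by hand: (x - mean) / std, using the population std (ddof=0)
by_hand = (train - train.mean(axis=0)) / train.std(axis=0)

# New data is transformed with the statistics learned above, not refitted
new_scaled = scaler.transform(new_row)

print(train_scaled)
print(new_scaled)
```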

Step 12 – Splitting data for training and testing.

X_train = df_new.drop('total',axis=1)[df_new['date'].dt.year<=2016]
X_test = df_new.drop('total',axis=1)[df_new['date'].dt.year>=2017]

X_train.drop('date',inplace=True,axis=1)
X_test.drop('date',inplace=True,axis=1)


y_train = df_new[df_new['date'].dt.year<=2016]['total'].values
y_test = df_new[df_new['date'].dt.year>=2017]['total'].values

We split the data using the date column.
All matches up to and including 2016 (that is, 2008-2016) are used for training.
Matches from 2017 onwards are used for testing.
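
The same time-based split can be sketched with a single boolean mask. The frame below is made up, and the variable names is_train and features are hypothetical; the idea is only to show that rows are assigned by year rather than at random.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame({
    'date': pd.to_datetime(['2015-05-01', '2016-04-20', '2017-04-05', '2017-05-10']),
    'runs': [50, 61, 47, 75],
    'total': [160, 172, 149, 181],
})

# True for rows up to and including 2016, False for 2017 onwards
is_train = toy['date'].dt.year <= 2016

# The target and the date are not used as input features
features = toy.drop(columns=['total', 'date'])

X_train, X_test = features[is_train], features[~is_train]
y_train = toy.loc[is_train, 'total'].values
y_test = toy.loc[~is_train, 'total'].values

print(len(X_train), len(X_test))
```

Splitting by date keeps every test match later than every training match, so the model is evaluated on a season it has not seen.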

Step 13 – Checking our X_train.

X_train

Step 14 – Training our Ridge model for IPL Score Prediction.

ridge = Ridge()
parameters={'alpha':[1e-15,1e-10,1e-8,1e-3,1e-2,1,5,10,20,30,35,40]}
ridge_regressor = RandomizedSearchCV(ridge,parameters,cv=10,scoring='neg_mean_squared_error')
ridge_regressor.fit(X_train,y_train)

print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)

print('\n')

# IPL Score Prediction
prediction_r = ridge_regressor.predict(X_test)
print('MAE:', mean_absolute_error(y_test, prediction_r))
print('MSE:', mean_squared_error(y_test, prediction_r))
print('RMSE:', np.sqrt(mean_squared_error(y_test, prediction_r)))

print('\n')

print(f'r2 score of ridge : {r2_score(y_test,prediction_r)}')

# distplot is deprecated in recent seaborn releases; histplot with kde=True replaces it
sns.histplot(y_test-prediction_r, kde=True)

I also tried plain Linear Regression, but it overfitted badly. That’s why I went with Ridge Regression, whose penalty on large coefficients helps prevent overfitting.
Lasso gave similar results.
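
To give a feel for how the Ridge penalty behaves, here is a small sketch on synthetic data with two nearly identical features. Ridge minimises the squared error plus alpha times the sum of squared coefficients, so its coefficients end up smaller than those of plain least squares. The data, the seed, and alpha=1.0 are assumptions chosen for illustration; this is not the IPL dataset or the tuned model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)

# Synthetic data: two almost identical features make plain least squares unstable
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Compare the overall size of the two coefficient vectors
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print('OLS coefficients:  ', ols.coef_)
print('Ridge coefficients:', ridge.coef_)
print('Norms:', ols_norm, ridge_norm)
```

The one-hot venue and team columns in this project are strongly related to each other in a similar way, which is one plausible reason the penalty helps here.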

Step 15 – Saving our IPL Score Prediction model.

joblib.dump(ridge_regressor,'iplmodel_ridge.sav')
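
A saved model can be read back with joblib.load. The sketch below does a round trip with a tiny stand-in model trained on made-up numbers; the file name demo_ridge.sav and the temporary directory are hypothetical and only used to keep the example self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import Ridge

# Tiny stand-in model trained on made-up numbers
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([10.0, 20.0, 30.0, 40.0])
model = Ridge(alpha=1.0).fit(X, y)

# Hypothetical path inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo_ridge.sav')
joblib.dump(model, path)

# Load the model back and check that it predicts the same values
restored = joblib.load(path)
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

Inputs passed to the restored model must have the same columns, in the same order and with the same scaling, as the data it was trained on.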

If you have any queries regarding the IPL Score Prediction project, let me know by email or LinkedIn.

That’s all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till next time…

8 Comments

Veluru Balaji
April 26, 2022 / 6:48 pm

Can i use this project to submit in my college
- Abhishek Sharma
  April 27, 2022 / 9:45 am

  Sure, Go ahead…

Tejas
May 28, 2022 / 1:32 pm

Can we have documentation link for this project?
- Abhishek Sharma
  June 3, 2022 / 11:56 am

  Sorry, I don’t have anything like that at the moment…

Shaheela
September 26, 2022 / 8:48 pm

Can we please have the documentation link for this project?
- Abhishek Sharma
  September 26, 2022 / 9:44 pm

  Sorry, there is nothing like that as of now, at least to my knowledge…

deepak
February 22, 2024 / 3:15 pm

I will be thankful if you could please share the IPL Score Prediction… dataset. thank you
- Abhishek Sharma
  March 23, 2024 / 8:03 pm

  https://github.com/sharmaji27/IPL-Score-Predictor/blob/master/ipl.csv
