House price prediction is often called the Hello World of machine learning. It is a very easy project that simply uses Linear Regression to predict house prices. This is going to be a very short blog, so without further ado, let’s do it…
Step 1 – Importing required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
%matplotlib inline
Step 2 – Reading our input data for House Price Prediction.
customers = pd.read_csv('USA_Housing.csv')
customers.head()
Step 3 – Describing our data.
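The output for this step is not shown, so here is a minimal sketch that assumes describe() is the call used. The small demo frame is made-up stand-in data so the snippet runs on its own; with the real file you would call customers.describe() instead.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for USA_Housing.csv so the snippet runs without the file.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Avg. Area Income": rng.normal(68000, 10000, 100),
    "Price": rng.normal(1_200_000, 350_000, 100),
})

# Count, mean, std, min, quartiles and max for every numeric column.
summary = demo.describe()
print(summary)
```

The std and min/max rows are a quick way to see how far apart the column scales are before any scaling is applied.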
Step 4 – Analyzing information from our data.
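The output for this step is also missing. A plausible sketch, assuming the step looked at column types and missing values with info() and isnull(); the three-row demo frame and its values are invented for illustration only.

```python
import pandas as pd

# Invented rows standing in for the real dataset; one value is missing on purpose.
demo = pd.DataFrame({
    "Avg. Area Income": [60000.0, 75000.0, None],
    "Price": [1_000_000.0, 1_400_000.0, 1_100_000.0],
    "Address": ["placeholder A", "placeholder B", "placeholder C"],
})

# Column names, non-null counts and dtypes in one summary.
demo.info()

# Number of missing values per column.
missing = demo.isnull().sum()
print(missing)
```

On the real data the same two calls would be customers.info() and customers.isnull().sum().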
Step 5 – Plots to visualize data of House Price Prediction.
- We use sns.pairplot(customers) to plot every pairwise combination of the numerical columns in the dataset.
- From the plots below we can infer that Price is highly correlated with Avg. Area Income.
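The visual impression from the pair plot can be backed up with numbers. This is a rough sketch on made-up data (the slope 21 and the noise level are arbitrary assumptions), showing how to rank the numeric columns by their correlation with Price.

```python
import numpy as np
import pandas as pd

# Made-up data: Price is built to depend on income plus noise.
rng = np.random.default_rng(1)
income = rng.normal(68000, 10000, 300)
demo = pd.DataFrame({
    "Avg. Area Income": income,
    "Price": 21 * income + rng.normal(0, 150000, 300),
    "Address": ["placeholder"] * 300,
})

# Keep numeric columns only (Address is text), then correlate each with Price.
corr_with_price = (
    demo.select_dtypes(include="number")
        .corr()["Price"]
        .drop("Price")
        .sort_values(ascending=False)
)
print(corr_with_price)
```

Applied to the real frame, replacing demo with customers gives one correlation value per feature, with the strongest at the top.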
Step 6 – Scaling our data.
scaler = StandardScaler()
X = customers.drop(['Price', 'Address'], axis=1)
y = customers['Price']
cols = X.columns
X = scaler.fit_transform(X)
- We need to scale our data so that all the features are on a comparable scale.
- We are using StandardScaler here, which rescales each column to zero mean and unit variance.
- Look at the first image of the input data to see how differently the columns are scaled.
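The code in this step fits the scaler on the whole dataset before splitting. A common alternative, sketched here on made-up data, is to split first and fit the scaler on the training rows only, so that no test-set statistics leak into training. The names X_demo, y_demo and scaler_demo are placeholders, not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up features on very different scales.
rng = np.random.default_rng(2)
X_demo = np.column_stack([
    rng.normal(68000, 10000, 100),  # large scale, like an income column
    rng.normal(6, 1, 100),          # small scale, like an age column
])
y_demo = rng.normal(1_200_000, 350_000, 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

# Learn mean and std from the training rows only, then reuse them on the test rows.
scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)
X_te_scaled = scaler_demo.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(3), X_tr_scaled.std(axis=0).round(3))
```

For plain Linear Regression the difference in score is usually small, but the habit matters once regularized models or cross-validation are involved.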
Step 7 – Splitting our data for training and test purposes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
- We use train_test_split() to split our data into a 70% training set and a 30% test set.
Step 8 – Training our Linear Regression model for House Price Prediction.
lr = LinearRegression()
lr.fit(X_train, y_train)
pred = lr.predict(X_test)
r2_score(y_test, pred)
- We are using r2_score here to measure the performance of our regression model.
- Our model gets an r2_score of 0.91 out of a maximum of 1, which is a very decent score.
- I also tried Lasso and Ridge regression, but they performed nearly the same as Linear Regression.
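The Lasso and Ridge comparison is not shown in the post, so this is only a sketch of how such a comparison might look. The data is synthetic, and alpha=1.0 is simply scikit-learn's default, not necessarily the value that was tried.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, already-standardized features with a linear target plus noise.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(500, 3))
y_demo = X_demo @ np.array([230.0, 160.0, 120.0]) + rng.normal(0, 100, 500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

# Fit each model on the same split and score it on the same test rows.
models = [
    ("linear", LinearRegression()),
    ("ridge", Ridge(alpha=1.0)),
    ("lasso", Lasso(alpha=1.0)),
]
scores = {}
for name, model in models:
    model.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, model.predict(X_te))

print(scores)
```

When the relationship is close to linear and there are only a few informative features, mild regularization has little to shrink, which is consistent with the three scores landing close together.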
Step 9 – Let’s visualize our predictions of House Price Prediction.
- For a 100% accurate model, the points in a plot of predicted against actual prices would fall exactly on a straight line.
- Our points follow a roughly straight-line trend, which is not bad.
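The plotting code for this step is missing, so here is one way it could be done, assuming a scatter plot of actual against predicted values. The data and the names y_te and pred_demo are made up for the example.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: fit on the first 140 rows, predict on the remaining 60.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(200, 2))
y_demo = X_demo @ np.array([3.0, 1.5]) + rng.normal(0, 0.5, 200)
model = LinearRegression().fit(X_demo[:140], y_demo[:140])
y_te = y_demo[140:]
pred_demo = model.predict(X_demo[140:])

fig, ax = plt.subplots()
ax.scatter(y_te, pred_demo, alpha=0.5)

# Reference line y = x: where every point would sit for a perfect model.
lims = [y_te.min(), y_te.max()]
ax.plot(lims, lims, color="red")
ax.set_xlabel("Actual")
ax.set_ylabel("Predicted")
# In a notebook with %matplotlib inline the figure renders automatically;
# in a plain script, call plt.show() here.
```

With the real model, the equivalent call would be a scatter of y_test against pred.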
Step 10 – Plotting the residuals of our House Price Prediction model.
- Here we are plotting a histogram of residuals.
- A residual is the error term in a regression, that is, the difference between the real value and our predicted value.
- Most of the residuals are close to 0, which means our predictions are close to the real values, so the model fits the data well.
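The histogram code is not included in the post. A small sketch of the idea on synthetic data, assuming residuals are computed as actual minus predicted on the held-out rows:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: fit on the first 200 rows, evaluate on the last 100.
rng = np.random.default_rng(5)
X_demo = rng.normal(size=(300, 2))
y_demo = X_demo @ np.array([3.0, 1.5]) + rng.normal(0, 0.5, 300)
model = LinearRegression().fit(X_demo[:200], y_demo[:200])

# Residual = actual value minus predicted value.
residuals = y_demo[200:] - model.predict(X_demo[200:])

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(residuals, bins=30)
ax.set_xlabel("Residual (actual - predicted)")
ax.set_ylabel("Count")

print("mean:", residuals.mean(), "std:", residuals.std())
```

A histogram centered near 0 and roughly bell-shaped suggests the model is not systematically over- or under-predicting. On the real model the residuals would be y_test - pred.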
Step 11 – Observe the coefficients.
cdf = pd.DataFrame(lr.coef_, cols, ['coefficients']).sort_values('coefficients', ascending=False)
cdf
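- These are the coefficients learned by the Linear Regression model.
- Because the features were standardized in Step 6, one unit here means one standard deviation. So the intuition is that an increase of one standard deviation in Avg. Area Income leads to an increase of about $230,377.52 in the price of the house, assuming all other factors are kept constant.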
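Since the model was trained on features scaled with StandardScaler, each coefficient is expressed per standard deviation of its feature. To read a coefficient per original unit (for example, per extra dollar of income), it can be divided by the scaler's scale_ attribute. The sketch below uses made-up data; the slopes 21 and 165000 are arbitrary assumptions, not values from the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up raw features on very different scales, with known slopes.
rng = np.random.default_rng(6)
X_raw = np.column_stack([
    rng.normal(68000, 10000, 400),  # income-like column
    rng.normal(6, 1, 400),          # age-like column
])
y_demo = 21.0 * X_raw[:, 0] + 165000.0 * X_raw[:, 1] + rng.normal(0, 100000, 400)

scaler_demo = StandardScaler()
X_scaled = scaler_demo.fit_transform(X_raw)
lr_demo = LinearRegression().fit(X_scaled, y_demo)

# Coefficients as the model reports them: change in y per one standard deviation.
per_std = lr_demo.coef_

# Divide by each feature's standard deviation to get change in y per original unit.
per_unit = lr_demo.coef_ / scaler_demo.scale_

print("per standard deviation:", per_std)
print("per original unit:     ", per_unit)
```

The per-unit values match what a model fitted directly on the unscaled features would give, while the per-standard-deviation values are the ones that are comparable across features.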
Do let me know if you have any questions about this topic by contacting me by email or on LinkedIn.
So that is all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till next time…