House Price Prediction – USA Housing Data – with source code – easy project – 2024

The House Price Prediction project is the Hello World of the machine learning world. It is a very easy project that simply uses Linear Regression to predict house prices. This is going to be a very short blog, so without further ado, let’s do it…


Step 1 – Importing required libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

%matplotlib inline

Step 2 – Reading our input data for House Price Prediction.

customers = pd.read_csv('USA_Housing.csv')
Our input data

Step 3 – Describing our data.

Description of our data
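
The code for this step is not shown in the text above. A minimal sketch, assuming the summary came from pandas' describe(); the small `demo` frame and its random values are made up here as a stand-in for USA_Housing.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data: two columns with random values.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Avg. Area Income": rng.normal(68_000, 10_000, size=200),
    "Price": rng.normal(1_200_000, 350_000, size=200),
})

# describe() reports count, mean, std, min, quartiles and max
# for every numeric column.
summary = demo.describe()
print(summary)
```

On the real data, the equivalent call would be customers.describe().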

Step 4 – Analyzing information from our data.
Info of our data
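
The code for this step is also missing from the text. A minimal sketch, assuming the output came from pandas' info(); the `demo` frame, its "Label" column and the injected missing value are all made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Avg. Area Income": rng.normal(68_000, 10_000, size=5),
    "Price": rng.normal(1_200_000, 350_000, size=5),
    "Label": ["a", "b", "c", "d", "e"],  # hypothetical text column
})
demo.loc[2, "Price"] = np.nan  # inject one missing value on purpose

# info() prints each column's dtype and its non-null count.
demo.info()

# A per-column count of missing values gives the same information as a number.
missing_per_column = demo.isna().sum()
print(missing_per_column)
```

On the real data, customers.info() would show whether any column has missing values or a non-numeric dtype that needs handling before modelling.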

Step 5 – Plots to visualize data of House Price Prediction.

  • We use sns.pairplot(customers) to plot all the possible pairs of numerical columns in the dataset.
  • From the plots below, we can infer that Price is highly correlated with Avg. Area Income.
Pair plots of all numeric columns
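
The visual impression from a pair plot can be backed up with numbers. A rough sketch on synthetic data (the "Rooms" column, the coefficients and the noise level are invented here, not taken from the dataset):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
income = rng.normal(68_000, 10_000, size=n)
rooms = rng.normal(7, 1, size=n)
# Made-up relationship: price is driven mostly by income, plus noise.
price = 20 * income + 50_000 * rooms + rng.normal(0, 100_000, size=n)

demo = pd.DataFrame({"Avg. Area Income": income, "Rooms": rooms, "Price": price})

# Pearson correlation of every other column with Price, strongest first.
corr_with_price = demo.corr()["Price"].drop("Price").sort_values(ascending=False)
print(corr_with_price)
```

On the real data, the same idea would be customers.corr(numeric_only=True)["Price"], assuming a pandas version that supports the numeric_only argument.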

Step 6 – Scaling our data.

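# Assumption: the features are every numeric column except the target, Price
X = customers.select_dtypes(include='number').drop(columns='Price')
y = customers['Price']

scaler = StandardScaler()

# Save the column names now; fit_transform returns a plain NumPy array
cols = X.columns

X = scaler.fit_transform(X)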
  • We need to scale our data to bring everything down to one scale or within one range.
  • We are using StandardScaler here to scale our data.
  • Just check out the 1st image of input data and see how different columns belong to different scales.
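
To make the scaling concrete: StandardScaler subtracts each column's mean and divides by its (population) standard deviation. A small sketch with made-up numbers that checks this by hand:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up values: one column in the tens of thousands, one in single digits.
X_demo = np.array([[60_000.0, 5.0],
                   [70_000.0, 6.0],
                   [80_000.0, 7.0],
                   [90_000.0, 8.0]])

scaled = StandardScaler().fit_transform(X_demo)

# Same transformation by hand; np.std uses ddof=0 by default, as StandardScaler does.
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates just because of its units.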

Step 7 – Splitting our data for training and test purposes.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
  • We use train_test_split() to split our data into 70% training and 30% test sets.
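
In this post the scaler is fitted on the full dataset before the split. A common alternative, sketched here on random data as a suggestion rather than as what the original code does, is to split first and fit the scaler on the training rows only, so that no test-set statistics leak into training:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Random stand-in data: 100 rows, 3 features.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

# Split first (same 70/30 proportions and random_state as in the post).
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)

# Learn the mean and std from the training rows only...
scaler = StandardScaler().fit(X_train)
# ...and apply that same transformation to both parts.
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_test_s.shape)
```

With a dataset of this size the difference in scores is usually small, but the habit matters on smaller or noisier data.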

Step 8 – Training our Linear Regression model for House Price Prediction.

lr = LinearRegression()
lr.fit(X_train, y_train)

pred = lr.predict(X_test)

r2_score(y_test, pred)

  • We are using r2_score here to measure the performance of our regression model.
  • Our model gives an r2_score of 0.91 out of a maximum of 1, which is a very decent score.
  • I also tried Lasso and Ridge regression, but they performed nearly the same as Linear Regression.
r2 score of our model
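
For intuition, the r2 score is 1 minus the ratio of the squared prediction errors to the squared deviations from the mean. A small sketch with made-up numbers that computes it by hand and compares it with sklearn's r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up true values and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.5])

ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction errors
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared deviations from the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean.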

Step 9 – Let’s visualize our predictions of House Price Prediction.

sns.scatterplot(x=y_test, y=pred)
  • For a 100% accurate model, every point would lie on the straight line where pred equals y_test.
  • Our points follow a roughly straight-line trend, which is a good sign.
y_test vs pred

Step 10 – Plotting the residuals of our House Price Prediction model.

  • Here we plot a histogram of the residuals.
  • A residual is the error term in a regression, that is, the difference between the real value and our predicted value.
  • Most of the residuals are around 0, which means our predictions are close to the real values, so the model fits the data well.
House Price Prediction residuals
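
The plotting code for this step is not shown in the text. A rough sketch of how the residuals could be computed and binned, on synthetic data with a made-up linear relationship; np.histogram stands in for the plot here, and something like sns.histplot(residuals) would draw it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: a made-up linear relationship plus noise.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(300, 2))
y_demo = 4.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 0.5, size=300)

# Simple hold-out: first 200 rows for training, last 100 for testing.
model = LinearRegression().fit(X_demo[:200], y_demo[:200])

# Residual = real value minus predicted value, on the held-out rows.
residuals = y_demo[200:] - model.predict(X_demo[200:])

# Bin the residuals the way a histogram plot would.
counts, bin_edges = np.histogram(residuals, bins=20)
print("mean residual:", residuals.mean())
print("counts per bin:", counts)
```

On the post's variables, the residuals would be y_test - pred.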

Step 11 – Observe the coefficients.

cdf=pd.DataFrame(lr.coef_, cols, ['coefficients']).sort_values('coefficients',ascending=False)
  • These are the coefficients calculated by Linear Regression.
  • Because the features were standardized in Step 6, the intuition is that a 1 standard deviation increase in Avg. Area Income leads to an increase of about $230377.522 in the price of the house, assuming all other factors are kept constant.
House Price Prediction coefficients
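
Since the model is trained on standardized features, each coefficient is expressed per standard deviation of its feature. A minimal sketch, on synthetic data with an invented slope of 21, of how such a coefficient could be converted back to original units by dividing by the scaler's scale_ attribute:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: price rises by a made-up 21 dollars per dollar of income.
rng = np.random.default_rng(4)
income = rng.normal(68_000, 10_000, size=400)
price = 21.0 * income + rng.normal(0, 50_000, size=400)

X_raw = income.reshape(-1, 1)
scaler = StandardScaler().fit(X_raw)
lr_demo = LinearRegression().fit(scaler.transform(X_raw), price)

# Coefficient on the standardized feature: change in price per 1 std of income.
coef_per_std = lr_demo.coef_[0]

# Dividing by the feature's std gives the change in price per 1 dollar of income.
coef_per_dollar = coef_per_std / scaler.scale_[0]

print(coef_per_std, coef_per_dollar)
```

The per-dollar value matches what a regression on the unscaled feature would give, which is a quick way to sanity-check the interpretation.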

Download Source Code…

Do let me know if there’s any query regarding this topic by contacting me on email or LinkedIn.

So that is all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till next time…

Read my previous post: WINE QUALITY PREDICTION

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects.

