Best way to Explore Data using Interactive EDA Reports in Python – 2022


Hey folks, I hope you are all doing well. In today’s blog, we will see how to explore data with an interactive EDA report built using the Pandas Profiling library. This library can generate detailed, interactive EDA reports in seconds, saving you a lot of time and effort.

This is going to be a very interesting blog, so without further ado, let’s do it…

Snapshot of our final report…

[Image: EDA report, numerical column]

Step 1 – Importing Required Libraries

  • Import pandas to read our dataset.
  • Import the ProfileReport class from pandas_profiling, which will generate the EDA report.
import pandas as pd
from pandas_profiling import ProfileReport 
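
One caveat worth checking before you run the import above: newer releases of this library are published as ydata-profiling, with the import name ydata_profiling, while older releases use pandas_profiling as shown here. The sketch below is only an illustration, and the helper name installed_profiling_package is made up for this post. It looks up which of the two import names is available on your machine without importing either of them.

```python
import importlib.util


def installed_profiling_package():
    """Return the import name of the first profiling package found, or None."""
    # Newer releases ship as ydata_profiling, older ones as pandas_profiling.
    for name in ("ydata_profiling", "pandas_profiling"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


package = installed_profiling_package()
if package is None:
    print("No profiling package found; try: pip install ydata-profiling")
else:
    print(f"Use: from {package} import ProfileReport")
```

If the helper reports ydata_profiling, only the import line changes; the rest of this blog assumes the ProfileReport class is used the same way.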

Step 2 – Importing Data

  • Download the UCI_Credit_Card data from here, load it into a DataFrame, and display its first 5 rows.
df = pd.read_csv('UCI_Credit_Card.csv')
df.head()

[Image: first 5 rows of the dataset]

Step 3 – Let’s Explore Data using Interactive EDA Report

  • Here we pass the DataFrame we want to analyze to ProfileReport and store the resulting object in a variable called profile.
  • We then save the report as an HTML file using the to_file method.
print('Creating Profile Report...')

profile = ProfileReport(df)
profile.to_file("EDA.html")

print('Profile Report Created Successfully...')

Let’s visualize the generated EDA Report

[Image: the generated EDA report]

Now let’s interpret the EDA Report…

First of all, let’s see the Dataset Statistics

[Image: EDA report, dataset statistics]
  • We have 25 variables (columns) in total.
  • The data has 30000 observations (rows).
  • There are no missing cells and no duplicate rows.
  • Of the 25 columns, the report classifies 22 as numeric and 3 as categorical (SEX, MARRIAGE, and default payment next month).
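
If you want to cross-check these headline numbers outside the report, plain pandas can compute most of them. This is a minimal sketch on a tiny made-up DataFrame (the values are invented for illustration, not taken from the real file); with the real data you would use the df loaded in Step 2 instead.

```python
import pandas as pd

# Tiny made-up stand-in for the real data (values are invented).
df = pd.DataFrame({
    "LIMIT_BAL": [20000.0, 120000.0, 90000.0, 50000.0],
    "SEX": [2, 2, 1, 1],
    "AGE": [24, 26, 34, 37],
})

n_rows, n_cols = df.shape                      # observations, variables
missing_cells = int(df.isna().sum().sum())     # total NaN cells in the frame
duplicate_rows = int(df.duplicated().sum())    # rows identical to an earlier row

print(f"variables: {n_cols}, observations: {n_rows}")
print(f"missing cells: {missing_cells}, duplicate rows: {duplicate_rows}")
```

Note that the numeric/categorical split comes from the report's own type inference, so df.dtypes alone will not necessarily reproduce it. For example, SEX is stored as integers but the report treats it as categorical.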

Let’s look at a numerical column

  • Let’s explore the LIMIT_BAL column.
  • It has 81 distinct values.
  • There are no missing values in the column.
  • The mean of the values in the LIMIT_BAL column is 167484.3227.
  • The minimum value in the column is 10000.
  • The maximum value in the column is 1000000.
  • There are no zeros in the column.
  • There are no negative values in the column either.
  • The column takes up 234.5 KB of memory.
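
The same per-column figures can be reproduced with a few pandas calls, which is handy when you want to use them in code rather than read them off the report. The sketch below uses a short made-up Series standing in for LIMIT_BAL, so the printed numbers will not match the ones above.

```python
import pandas as pd

# Made-up values standing in for the LIMIT_BAL column.
limit_bal = pd.Series(
    [20000.0, 120000.0, 90000.0, 50000.0, 50000.0], name="LIMIT_BAL"
)

summary = {
    "distinct": int(limit_bal.nunique()),
    "missing": int(limit_bal.isna().sum()),
    "mean": float(limit_bal.mean()),
    "minimum": float(limit_bal.min()),
    "maximum": float(limit_bal.max()),
    "zeros": int((limit_bal == 0).sum()),
    "negative": int((limit_bal < 0).sum()),
    # memory_usage() counts the index as well as the values by default
    "memory_bytes": int(limit_bal.memory_usage()),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```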

NOTE – Notice the HIGH_CORRELATION alert shown in red under the column name. When you hover your cursor over it, it lists the columns with which LIMIT_BAL is highly correlated.

[Image: EDA report, numerical column]

Let’s look at a categorical column

  • Let’s explore the SEX column.
  • It has 2 distinct values (1 and 2).
  • There are no missing values in the column.
  • It takes up 234.5 KB of memory.
  • The bar chart on the side shows 11888 instances of 1 and 18112 instances of 2.
[Image: EDA report, categorical column]
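
The counts behind that bar chart are what value_counts gives you in pandas. Here is a small sketch on a made-up SEX column (five invented rows, not the real 30000), showing both the raw counts and the percentage share of each value.

```python
import pandas as pd

# Made-up values standing in for the SEX column.
sex = pd.Series([1, 2, 2, 1, 2], name="SEX")

counts = sex.value_counts().sort_index()      # instances per distinct value
shares = (counts / len(sex) * 100).round(1)   # same thing as a percentage

print(counts.to_dict())
print(shares.to_dict())
```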

How to see more information about any column?

  • Click the Toggle details button at the bottom right of any column’s section to see more information about that column.
[Image: EDA report, more information via Toggle details]

Let’s Visualize the Interactions

  • The Interactions section lets us visualize the interaction between any pair of columns in our dataset.
  • Here we are looking at the interaction between the PAY_0 and PAY_2 columns.
[Image: EDA report, interactions]
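
The report draws this interaction as a plot. As a rough text-only counterpart (this is not what the library does internally, just another way to look at the same pair), a cross-tabulation counts how often each combination of PAY_0 and PAY_2 values occurs. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up values standing in for the two repayment-status columns.
df = pd.DataFrame({
    "PAY_0": [0, 0, 1, 2, 2, -1],
    "PAY_2": [0, 0, 2, 2, 2, -1],
})

# Rows are PAY_0 values, columns are PAY_2 values, cells are co-occurrence counts.
pair_counts = pd.crosstab(df["PAY_0"], df["PAY_2"])
print(pair_counts)
```

Large counts along the diagonal mean the two columns usually take the same value, which is the kind of pattern the interaction plot makes visible.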

Let’s Visualize the Correlations

  • The report also shows the correlations between columns.
  • Several correlation measures are available: Spearman’s, Pearson’s, Kendall’s, Cramér’s V, and Phik.
[Image: EDA report, correlations]
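
To get a feel for how two of these measures differ, here is a minimal pandas sketch on made-up numbers. Pearson's coefficient measures linear association, while Spearman's is Pearson's coefficient computed on the ranks, so it picks up any monotonic relationship. The column names are borrowed from the dataset, but the values are invented so that the second column grows with the first without being linear in it.

```python
import pandas as pd

# Made-up values: BILL_AMT2 increases with BILL_AMT1, but not linearly.
df = pd.DataFrame({
    "BILL_AMT1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "BILL_AMT2": [1.0, 4.0, 9.0, 16.0, 25.0],
})

pearson = df.corr(method="pearson")
# Spearman's coefficient: rank each column, then take Pearson on the ranks.
spearman = df.rank().corr(method="pearson")

pearson_value = float(pearson.loc["BILL_AMT1", "BILL_AMT2"])
spearman_value = float(spearman.loc["BILL_AMT1", "BILL_AMT2"])
print(f"Pearson:  {pearson_value:.3f}")
print(f"Spearman: {spearman_value:.3f}")
```

Because the relationship is perfectly monotonic but curved, the Spearman value reaches 1 while the Pearson value stays slightly below it.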

Let’s Visualize the Missing Values

  • The report also shows whether any values are missing from the data.
  • Here every column has 30000 values, which means there is no missing data.
[Image: EDA report, missing values]
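
The per-column counts in that chart correspond to what count and isna give you in pandas. The sketch below uses a made-up frame with one value deliberately left out, so you can see what a column with missing data would look like (the real dataset has none).

```python
import pandas as pd

# Made-up frame with one deliberately missing LIMIT_BAL value.
df = pd.DataFrame({
    "LIMIT_BAL": [20000.0, None, 90000.0],
    "AGE": [24, 26, 34],
})

non_missing = df.count()       # values present per column
missing = df.isna().sum()      # values missing per column

print(non_missing.to_dict())
print(missing.to_dict())
```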

Do let me know if you have any questions while exploring data with interactive EDA reports in Python.

That’s all for this blog, folks. Try to explore this EDA report as much as you can. Thanks for reading, I hope you are taking something useful away with you, and see you next time…

Read my previous post: How to train your first XGBoost model in Python

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.
