How to train your first XGBoost model in Python – 2024

In this blog, we will see how you can train your first XGBoost model in Python in the simplest way possible.

XGBoost is an implementation of gradient-boosted decision trees designed for performance and speed.

After reading this post you will know:

  • How to install XGBoost on your system for use in Python.
  • How to prepare data and train your first XGBoost model.
  • How to make predictions using your XGBoost model.

Step 0 – Installing XGBoost

Windows

pip install xgboost

Linux

sudo pip install xgboost
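
Once the install finishes, a quick check along these lines can confirm that Python can actually find the package. This is only a minimal sketch: the helper name is_installed is made up for illustration and is not part of XGBoost.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("xgboost"):
    import xgboost
    print("XGBoost version:", xgboost.__version__)
else:
    print("XGBoost not found; try running: pip install xgboost")
```

If the second message shows up, the package most likely went into a different Python environment than the one running the script.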

Step 1 – Importing Required Libraries

  • Importing Pandas for reading the CSV file.
  • Importing XGBClassifier from the xgboost module to build the model.
  • Importing accuracy_score and train_test_split from sklearn to calculate the accuracy and split the data respectively.
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Step 2 – Loading the Data

  • In this tutorial, we are going to use the Pima Indians onset of diabetes dataset.
  • This dataset comprises 8 input variables that describe the medical details of patients and one output variable that indicates whether the patient has an onset of diabetes within 5 years.
  • Download Data from this link.
df = pd.read_csv('pima-indians-diabetes.data.csv',header=None)
df.head()
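
Since the file is read with header=None, the columns are simply numbered 0 to 8. If you prefer readable names, something like the sketch below may help. Note that the column names here are my own assumption (the file itself carries no header), and the two rows are just illustrative stand-ins in the same nine-column layout so that the snippet runs without the download.

```python
import io
import pandas as pd

# Assumed names for the nine columns; the raw file has no header row.
column_names = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "diabetes_pedigree", "age", "outcome",
]

# Two illustrative rows standing in for the real file.
sample_csv = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)

# With the real data, pass 'pima-indians-diabetes.data.csv' instead of sample_csv.
named_df = pd.read_csv(sample_csv, header=None, names=column_names)
print(named_df.shape)
print(named_df.dtypes)
```

Selecting columns by position with iloc works the same way on a frame with named columns, so the rest of the tutorial does not depend on this step.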

Step 3 – Splitting the Data

  • Here we keep the first 8 columns as features and name them X.
  • For X we use df.iloc[:,0:8], which takes all the rows and only the columns at positions 0 to 7 (the end of the slice, 8, is excluded).
  • The last column is the target column and we name it Y.
  • For Y we use df.iloc[:,8], which takes all the rows and just the column at position 8, i.e. the ninth and last column (the target).
  • Let’s split the data into a 67:33 train:test ratio using the train_test_split function of sklearn. It mainly takes two arguments: the features and the targets. Here X represents the features and Y the targets.
# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]

# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)
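
To see what the slicing and the random_state argument are doing, here is a small sketch on a synthetic table (100 random rows, not the diabetes data). All the names in it (toy, toy_X, toy_y, split_a, split_b) are made up for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 8 feature columns plus 1 binary target column.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.random((100, 8)))
toy[8] = np.tile([0, 1], 50)  # the column at position 8 holds the target

toy_X = toy.iloc[:, 0:8]  # columns 0..7 (the end of the slice is exclusive)
toy_y = toy.iloc[:, 8]    # the single column at position 8

# The same seed gives the same split every time the script runs.
split_a = train_test_split(toy_X, toy_y, test_size=0.33, random_state=7)
split_b = train_test_split(toy_X, toy_y, test_size=0.33, random_state=7)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)
print(X_te.index.equals(split_b[1].index))
```

Fixing random_state is what makes the accuracy reported later reproducible from run to run; change the seed and the split (and usually the score) will change too.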

Step 4 – Training the XGBoost Model

  • Create an XGBClassifier object and name it model.
  • Now let’s train this model using the training Data.
model = XGBClassifier()
model.fit(X_train, y_train)

Step 5 – Making predictions on the Test Data

  • Let’s make the predictions now.
  • Use the model.predict method to make predictions on the test data.
  • Let’s see the predictions that our model made.
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
predictions

Step 6 – Testing the XGBoost Model Performance

  • Let’s see the accuracy of our model.
  • Here we have used the accuracy_score function of sklearn to find the accuracy of our model.
  • We can see that our model reaches about 74% accuracy, which is not very impressive 🙂.
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
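
Accuracy is simply the fraction of predictions that match the true labels. To make that concrete, here is a hand-rolled sketch in plain Python. The function manual_accuracy and the two short label lists are made up for illustration; in real code accuracy_score does this for you.

```python
def manual_accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both sequences must have the same length")
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels)


# Made-up labels: 6 of the 8 predictions agree with the truth.
example_true = [1, 0, 1, 1, 0, 0, 1, 0]
example_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print("Accuracy: %.2f%%" % (manual_accuracy(example_true, example_pred) * 100.0))
# prints: Accuracy: 75.00%
```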

Let’s see the whole code in one place…

import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load data
df = pd.read_csv('pima-indians-diabetes.data.csv',header=None)

# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]

# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)

# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)

# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Do let me know if you have any questions while you train your first XGBoost model.

So this is all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till next time …

Read my previous post: 4 Easiest ways to visualize Decision Trees using Scikit-Learn and Python

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.

Abhishek Sharma

I started my Data Science journey in my 2nd year of college and have been at it ever since, drawn by the magical powers of ML. I keep doing projects in almost every domain of AI, including ML, DL, CV, and NLP.
