
How to train your first XGBoost model in Python – 2023

Machine Learning Projects

In this blog, we will see how you can train your first XGBoost model in Python in the simplest way possible.

XGBoost is an implementation of gradient-boosted decision trees designed for performance and speed.

After reading this post, you will know how to install XGBoost, load a dataset into pandas, split it into train and test sets, train an XGBoost classifier, and evaluate its accuracy.

Step 0 – Installing XGBoost

Windows

pip install xgboost

Linux

sudo pip install xgboost

Step 1 – Importing Required Libraries

import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

Step 2 – Loading the Data

We will use the Pima Indians Diabetes dataset, a CSV file with 8 numeric feature columns, a binary outcome column, and no header row.

df = pd.read_csv('pima-indians-diabetes.data.csv', header=None)
df.head()
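Because the file has no header row, pandas labels the columns 0 through 8. If you prefer named columns, you can attach the standard Pima Indians Diabetes attribute names; the names below are the commonly used ones, so treat them as an assumption about your copy of the file:

```python
# Standard attribute names for the Pima Indians Diabetes dataset
# (8 features followed by the binary class label).
names = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
         "insulin", "bmi", "pedigree", "age", "class"]

# To use them, pass names= when loading, e.g.:
# df = pd.read_csv('pima-indians-diabetes.data.csv', header=None, names=names)
print(len(names))  # 9 columns: 8 features + 1 target
```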

Step 3 – Splitting the Data

# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]

# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)
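With `test_size=0.33`, scikit-learn rounds the test set size up, so the 768 rows of the Pima dataset split into 514 training rows and 254 test rows. A quick sketch with synthetic data of the same shape (standing in for the CSV, so this runs without the file) shows the split sizes:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Pima data: 768 rows, 9 columns (values are made up).
rng = np.random.default_rng(7)
df = pd.DataFrame(rng.random((768, 9)))
X, Y = df.iloc[:, 0:8], df.iloc[:, 8]

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.33, random_state=7)

print(len(X_train), len(X_test))  # 514 254
```

Fixing `random_state` makes the split reproducible, so your accuracy number will match across runs.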

Step 4 – Training the XGBoost Model

model = XGBClassifier()
model.fit(X_train, y_train)

Step 5 – Making Predictions on the Test Data

# make predictions for test data
# (XGBClassifier.predict already returns hard class labels,
# so the rounding below is just a safeguard)
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
predictions

Step 6 – Testing the XGBoost Model Performance

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Let’s see the whole code in one place…

import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# load data
df = pd.read_csv('pima-indians-diabetes.data.csv', header=None)

# split data into X and y
X = df.iloc[:,0:8]
Y = df.iloc[:,8]

# split data into train and test sets
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=7)

# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)

# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))

Do let me know if there’s any query while you train your first XGBoost model.

So this is all for this blog, folks. Thanks for reading, and I hope you take something useful away from it. Till the next time…

Read my previous post: 4 Easiest ways to visualize Decision Trees using Scikit-Learn and Python

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.
