Machine Learning Projects

Topic Modeling using Latent Dirichlet Allocation – easiest way – with source code – 2022

So guys, in today’s blog we will see how to perform topic modeling using Latent Dirichlet Allocation (LDA). In topic modeling, we try to group objects (documents, in this case) based on the words they share.

This means that if two documents contain similar words, there is a good chance that they fall under the same category. So without wasting any time, let’s do it…

Check out the video here – https://youtu.be/a9WGoIiWwXg

Step 1 – Importing required libraries.

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

Step 2 – Reading input data.

articles = pd.read_csv('npr.csv')
articles.head()

Step 3 – Checking info of our data.

articles.info()

Step 4 – Creating a Document Term Matrix of our data.

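# Ignore words that appear in more than 95% of the articles or in fewer than 2 articles,
# and drop common English stop words
cv = CountVectorizer(max_df=0.95, min_df=2, stop_words='english')
# Sparse matrix with one row per article and one column per vocabulary word
dtm = cv.fit_transform(articles['Article'])
dtm.shape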

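Step 5 – Initializing and fitting the Latent Dirichlet Allocation model.

# 7 topics; random_state makes the run reproducible
LDA = LatentDirichletAllocation(n_components=7, random_state=42)
# One row per article, one column per topic
topic_results = LDA.fit_transform(dtm)
# Shape is (number of topics, vocabulary size)
LDA.components_.shape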
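If the shapes above feel abstract, the following sketch fits LDA on a tiny hand-written count matrix. The matrix and the names toy_counts, toy_lda and toy_topics are assumptions for illustration; the point is only to show what fit_transform and components_ return.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical document-term counts: 4 documents, 4 vocabulary words
toy_counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
])

toy_lda = LatentDirichletAllocation(n_components=2, random_state=42)
toy_topics = toy_lda.fit_transform(toy_counts)

# One row per document, one column per topic
print(toy_topics.shape)
# One row per topic, one column per vocabulary word
print(toy_lda.components_.shape)
# Each document's topic weights form a probability distribution
print(toy_topics.sum(axis=1))
```

Each row of the fit_transform output sums to 1, so it can be read as how strongly a document belongs to each topic.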

Step 6 – Printing the top 15 words for each topic.

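# Look up the vocabulary once; get_feature_names() was removed in scikit-learn 1.2
feature_names = cv.get_feature_names_out()

for topic_idx, word_weights in enumerate(LDA.components_):
    print(f'TOP 15 WORDS FOR TOPIC #{topic_idx}')
    # argsort is ascending, so the last 15 indices are the highest-weighted words
    print([feature_names[j] for j in word_weights.argsort()[-15:]])
    print('\n\n')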
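The argsort trick is easier to follow on a small example. Below is a rough sketch with a hand-made weight matrix; top_words is a hypothetical helper and the numbers are invented, not taken from a fitted model. Unlike the loop above, this helper reverses the order so the strongest word comes first.

```python
import numpy as np

def top_words(word_weights, feature_names, n=3):
    """Return the n highest-weighted words for one topic, strongest first."""
    # argsort sorts ascending, so take the last n indices and reverse them
    top_idx = np.argsort(word_weights)[-n:][::-1]
    return [feature_names[j] for j in top_idx]

# Hypothetical vocabulary and topic-word weights (2 topics, 5 words)
toy_feature_names = np.array(
    ['economy', 'football', 'match', 'stock', 'team'], dtype=object
)
toy_components = np.array([
    [9.1, 0.2, 0.4, 7.5, 0.3],
    [0.1, 8.0, 6.2, 0.2, 7.1],
])

toy_top = [top_words(row, toy_feature_names, n=2) for row in toy_components]
for topic_idx, words in enumerate(toy_top):
    print(f'Topic #{topic_idx}: {words}')
```

Looking at the top words is how you decide what a topic is "about", since LDA itself only gives topics numbers, not names.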

Step 7 – Final results

# Assign each article the topic with the highest probability
articles['topic'] = topic_results.argmax(axis=1)
articles
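Here is a small sketch of what argmax(axis=1) does, using an invented document-topic matrix. The names toy_results, toy_articles, topic_prob and topic_counts are assumptions for illustration. It also keeps the winning probability, which is one possible way to spot articles that the model is unsure about.

```python
import numpy as np
import pandas as pd

# Hypothetical document-topic probabilities: 4 articles, 3 topics
toy_results = np.array([
    [0.80, 0.15, 0.05],
    [0.10, 0.20, 0.70],
    [0.30, 0.60, 0.10],
    [0.05, 0.05, 0.90],
])

toy_articles = pd.DataFrame({'Article': ['a', 'b', 'c', 'd']})

# Index of the largest value in each row is the assigned topic
toy_articles['topic'] = toy_results.argmax(axis=1)
# The largest value itself shows how confident that assignment is
toy_articles['topic_prob'] = toy_results.max(axis=1)

# How many articles landed in each topic
topic_counts = toy_articles['topic'].value_counts().sort_index()
print(toy_articles)
print(topic_counts)
```

On the real data, the same value_counts call on articles['topic'] gives a quick view of how the articles are spread across the 7 topics.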

Download the Source Code…

NOTE – To download the data, right-click the link below, choose save-as, and save it in your project folder as ‘npr.csv’.

Download Data…

Do let me know if you have any questions about this topic by contacting me via email or LinkedIn. I have tried my best to explain this code.

So this is all for this blog, folks. Thanks for reading, and I hope you are taking something away with you. Till the next time…

Read my previous post: WORDS TO VECTORS USING SPACY – PROVING KING-MAN+WOMAN = QUEEN

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.
