Invisible Man using Mask-RCNN – with source code – fun project – 2025

So guys, in today’s blog we will see how we can perform human segmentation using Mask R-CNN. This is a fairly advanced project with a lot happening under the hood. So without further ado, let’s do it…

Code for Human Segmentation using Mask-RCNN…

from imutils.video import FPS
import numpy as np
import matplotlib.pyplot as plt
import cv2
import os

use_gpu = 1  # set to 0 to run inference on the CPU
expected_confidence = 0.3
threshold = 0.1
show_output = 1
save_output = 1
kernel = np.ones((5,5),np.uint8)
writer = None
fps = FPS().start()

weightsPath = "mask-rcnn-coco/frozen_inference_graph.pb"
configPath = "mask-rcnn-coco/mask_rcnn_inception_v2_coco_2018_01_28.pbtxt"

print("[INFO] loading Mask R-CNN from disk...")
net = cv2.dnn.readNetFromTensorflow(weightsPath, configPath)

if use_gpu:
    # set CUDA as the preferable backend and target
    print("[INFO] setting preferable backend and target to CUDA...")
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

print("[INFO] accessing video stream...")
cap = cv2.VideoCapture(0)

print("[INFO] background recording...")
for _ in range(60):
    _,bg = cap.read()
print("[INFO] background recording done...")

fourcc = cv2.VideoWriter_fourcc(*"MJPG")
writer = cv2.VideoWriter('output.avi', fourcc, 20,(bg.shape[1], bg.shape[0]), True)

while True:
    grabbed, frame = cap.read()
    if not grabbed:
        break
    cv2.imshow('org',frame)

    blob = cv2.dnn.blobFromImage(frame, swapRB=True, crop=False)
    net.setInput(blob)
    (boxes, masks) = net.forward(["detection_out_final","detection_masks"])
    for i in range(0, boxes.shape[2]):
        classID = int(boxes[0, 0, i, 1])
        if classID != 0: continue  # keep only class 0 (person) detections
        confidence = boxes[0, 0, i, 2]

        if confidence > expected_confidence:
            (H, W) = frame.shape[:2]
            box = boxes[0, 0, i, 3:7] * np.array([W, H, W, H])
            (startX, startY, endX, endY) = box.astype("int")
            boxW = endX - startX
            boxH = endY - startY
            mask = masks[i, classID]
            mask = cv2.resize(mask, (boxW, boxH),interpolation=cv2.INTER_CUBIC)
            mask = (mask > threshold)
            bwmask = np.array(mask,dtype=np.uint8) * 255
            bwmask = np.reshape(bwmask,mask.shape)
            bwmask = cv2.dilate(bwmask,kernel,iterations=1)

            frame[startY:endY, startX:endX][np.where(bwmask==255)] = bg[startY:endY, startX:endX][np.where(bwmask==255)]

    if show_output:
        cv2.imshow("Frame", frame)

        if cv2.waitKey(1) ==27:
            break

    if save_output:
        writer.write(frame)

    fps.update()

fps.stop()
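print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# release the camera and the video writer, and close the preview windows
cap.release()
writer.release()
cv2.destroyAllWindows()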
  • Lines 1-5 – Import the required libraries for Mask-RCNN.
  • Lines 7-14 – Declare some constants.
  • Lines 16-20 – Load the Mask R-CNN network from the frozen TensorFlow graph and its config file.
[Image: Mask-RCNN. Picture credit: towardsdatascience.com]
  • Lines 22-26 – If you want to use a GPU, set the backend and target to CUDA.
  • Lines 28-29 – Open the webcam video stream.
  • Lines 31-34 – Record the background: read 60 frames and keep the last one, so stay out of the camera's view during this step.
  • Lines 36-37 – Use cv2.VideoWriter() to save the output in video format.
  • Lines 39-43 – Start the while loop and grab frames from the webcam. If the webcam returns nothing, break.
  • Lines 45-47 – Use cv2.dnn.blobFromImage() to create a blob from the frame and set it as the input to the network. A forward pass then returns the bounding boxes and masks.
  • Lines 48-66 – Loop over the detections, keeping only class 0 (person) with confidence above expected_confidence. Each mask is resized to its bounding box, thresholded, and dilated to clean it up. Wherever the mask is white, the pixels of the original frame are replaced with the background pixels (line 66).
  • Lines 68-72 – Show the output and break when someone hits the ESC key.
  • Lines 74-75 – Save the output in video form.
  • Line 77 – Update the fps.
  • Lines 79-81 – Print the elapsed time and the fps.
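
To make the detection loop easier to follow, here is a small NumPy-only sketch of its two core steps on synthetic data: picking out confident person detections, and copying background pixels into the frame wherever the mask is set. It assumes each detection row is laid out as [batch_id, class_id, confidence, x1, y1, x2, y2] with normalized coordinates, which matches how the code above indexes it. The helper names person_boxes and erase_region are made up for this illustration, the mask is assumed to be already resized to the box, and the clipping of boxes to the frame is an extra guard that the original code does not have.

```python
import numpy as np

PERSON_CLASS_ID = 0  # the post's code keeps only class 0 detections


def person_boxes(detections, frame_w, frame_h, min_conf=0.3):
    """Return clipped pixel boxes (x1, y1, x2, y2) for confident person detections.

    Assumes `detections` has shape (1, 1, N, 7) with rows of
    [batch_id, class_id, confidence, x1, y1, x2, y2] in normalized coordinates.
    """
    scale = np.array([frame_w, frame_h, frame_w, frame_h])
    boxes = []
    for row in detections[0, 0]:
        if int(row[1]) != PERSON_CLASS_ID or row[2] <= min_conf:
            continue
        x1, y1, x2, y2 = (int(v) for v in row[3:7] * scale)
        # Extra guard (not in the original code): clip the box to the frame
        # so the slices taken later never run out of bounds.
        x1, x2 = max(0, x1), min(frame_w, x2)
        y1, y2 = max(0, y1), min(frame_h, y2)
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    return boxes


def erase_region(frame, background, box, mask, threshold=0.1):
    """Copy background pixels into `frame` wherever `mask` exceeds `threshold`.

    `mask` must already have the box's shape (height, width).
    """
    x1, y1, x2, y2 = box
    selected = mask > threshold      # boolean mask with the box's shape
    roi = frame[y1:y2, x1:x2]        # a view, so writing to it edits `frame`
    roi[selected] = background[y1:y2, x1:x2][selected]
    return frame


# Synthetic example: a 6x4 grey "live" frame over a black "background".
frame = np.full((4, 6, 3), 200, dtype=np.uint8)
background = np.zeros((4, 6, 3), dtype=np.uint8)
detections = np.array([[[
    [0, 0, 0.9, 0.0, 0.0, 0.5, 0.5],    # person, confident: kept
    [0, 0, 0.1, 0.5, 0.5, 1.0, 1.0],    # person, low confidence: dropped
    [0, 17, 0.9, 0.0, 0.0, 1.0, 1.0],   # some other class: dropped
]]])
boxes = person_boxes(detections, frame_w=6, frame_h=4)

# One soft mask for the kept box (2 rows x 3 columns).
mask = np.array([[0.9, 0.05, 0.9],
                 [0.0, 0.9, 0.0]])
erase_region(frame, background, boxes[0], mask)
```

In the real pipeline the mask comes out of the network at a fixed low resolution, which is why the code above resizes it to the box with cv2.resize() before thresholding.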

Final Results…

PS – I know the results are not perfect, but they still feel like nothing less than magic.

NOTE – If you don’t have a GPU in your system, please don’t try running this. Mask-RCNN is heavy, so it will take forever to run on a CPU. Even on a GPU, it barely reached 5-8 fps.
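
For reference, the fps figure is simply the number of processed frames divided by the elapsed wall-clock time. Below is a minimal sketch of such a counter, written as a stand-in for the FPS helper from imutils used in the code. The class name SimpleFPS is hypothetical, and the sleep call is only a placeholder for the per-frame work.

```python
import time


class SimpleFPS:
    """Hypothetical frame counter: frames processed between start() and stop()."""

    def __init__(self):
        self._start = None
        self._end = None
        self._frames = 0

    def start(self):
        self._start = time.perf_counter()
        return self  # allows `fps = SimpleFPS().start()`

    def update(self):
        self._frames += 1  # call once per processed frame

    def stop(self):
        self._end = time.perf_counter()

    def elapsed(self):
        return self._end - self._start  # seconds between start() and stop()

    def fps(self):
        return self._frames / self.elapsed()


fps = SimpleFPS().start()
for _ in range(5):
    time.sleep(0.01)  # placeholder for reading and processing one frame
    fps.update()
fps.stop()
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))
```

Because the counter wraps the whole loop, the number it reports includes reading the frame, running the network, and showing and saving the output, not just the Mask-RCNN forward pass.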

Download Source Code…

Do let me know if you have any queries regarding human segmentation using Mask-RCNN by contacting me via email or LinkedIn. You can also leave a comment below.

So this is all for this blog, folks. Thanks for reading, I hope you are taking something with you, and till the next time…

Read my previous post: NEURAL STYLE TRANSFER

Check out my other machine learning projects, deep learning projects, computer vision projects, NLP projects, and Flask projects at machinelearningprojects.net.
