YOLO object detection with OpenCV

In this tutorial, you’ll learn how to use the YOLO object detector to detect objects in both images and video streams using Deep Learning, OpenCV, and Python.
By applying object detection, you’ll not only be able to determine what is in an image, but also where a given object resides!
We’ll start with a brief discussion of the YOLO object detector, including how the object detector works.
From there we’ll use OpenCV, Python, and deep learning to:
- Apply the YOLO object detector to images
- Apply YOLO to video streams
We’ll wrap up the tutorial by discussing some of the limitations and drawbacks of the YOLO object detector, including some of my personal tips and suggestions.
To learn how to use YOLO for object detection with OpenCV, just keep reading!
YOLO Object detection with OpenCV
In the rest of this tutorial we’ll:
- Discuss the YOLO object detector model and architecture
- Utilize YOLO to detect objects in images
- Apply YOLO to detect objects in video streams
- Discuss some of the limitations and drawbacks of the YOLO object detector
Let’s dive in!
What is the YOLO object detector?

Figure 1: A simplified illustration of the YOLO object detector pipeline (source). We’ll use YOLO with OpenCV in this blog post.
When it comes to deep learning-based object detection, there are three primary object detectors you’ll encounter:
- R-CNN and its variants, including the original R-CNN, Fast R-CNN, and Faster R-CNN
- Single Shot Detectors (SSDs)
- YOLO
R-CNNs are one of the first deep learning-based object detectors and are an example of a two-stage detector.
- In the first R-CNN publication, Rich feature hierarchies for accurate object detection and semantic segmentation, (2013) Girshick et al. proposed an object detector that required an algorithm such as Selective Search (or equivalent) to propose candidate bounding boxes that could contain objects.
- These regions were then passed into a CNN for classification, ultimately leading to one of the first deep learning-based object detectors.
The problem with the standard R-CNN method was that it was painfully slow and not a complete end-to-end object detector.
Girshick published a second paper in 2015, entitled Fast R-CNN. The Fast R-CNN algorithm made considerable improvements to the original R-CNN, namely increasing accuracy and reducing the time it took to perform a forward pass; however, the model still relied on an external region proposal algorithm.
It wasn’t until the follow-up 2015 paper by Ren et al. (with Girshick as a co-author), Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, that R-CNNs became a true end-to-end deep learning object detector by removing the Selective Search requirement and instead relying on a Region Proposal Network (RPN) that (1) is fully convolutional and (2) can predict the object bounding boxes and “objectness” scores (i.e., a score quantifying how likely it is that a region of an image contains an object). The outputs of the RPNs are then passed into the R-CNN component for final classification and labeling.
While R-CNNs tend to be very accurate, the biggest problem with the R-CNN family of networks is their speed: they were incredibly slow, obtaining only 5 FPS on a GPU.
To help increase the speed of deep learning-based object detectors, both Single Shot Detectors (SSDs) and YOLO use a one-stage detector strategy.
These algorithms treat object detection as a regression problem, taking a given input image and simultaneously learning bounding box coordinates and corresponding class label probabilities.
In general, single-stage detectors tend to be less accurate than two-stage detectors but are significantly faster.
YOLO is a great example of a single-stage detector.
First introduced in 2015 by Redmon et al. in their paper, You Only Look Once: Unified, Real-Time Object Detection, YOLO is an object detector capable of super real-time object detection, obtaining 45 FPS on a GPU.
Note: A smaller variant of their model called “Fast YOLO” claims to achieve 155 FPS on a GPU.
YOLO has gone through a number of different iterations, including YOLO9000: Better, Faster, Stronger (i.e., YOLOv2), capable of detecting over 9,000 object classes.
Redmon and Farhadi are able to achieve such a large number of object detections by performing joint training for both object detection and classification. Using joint training the authors trained YOLO9000 simultaneously on both the ImageNet classification dataset and COCO detection dataset. The result is a YOLO model, called YOLO9000, that can predict detections for object classes that don’t have labeled detection data.
While interesting and novel, YOLOv2’s performance was a bit underwhelming given the title and abstract of the paper.
On the 156 ImageNet detection classes for which it never saw labeled detection data, YOLO9000 achieved 16% mean Average Precision (mAP), and yes, while YOLO can detect 9,000 separate classes, the accuracy is not quite what we would desire.
Redmon and Farhadi recently published a new YOLO paper, YOLOv3: An Incremental Improvement (2018). YOLOv3 is significantly larger than previous models but is, in my opinion, the best one yet out of the YOLO family of object detectors.
We’ll be using YOLOv3 in this blog post, in particular, YOLO trained on the COCO dataset.
The COCO dataset consists of 80 labels, including, but not limited to:
- People
- Bicycles
- Cars and trucks
- Airplanes
- Stop signs and fire hydrants
- Animals, including cats, dogs, birds, horses, cows, and sheep, to name a few
- Kitchen and dining objects, such as wine glasses, cups, forks, knives, spoons, etc.
- …and much more!
You can find a full list of what YOLO trained on the COCO dataset can detect using this link.
I’ll wrap up this section by saying that any academic needs to read Redmon’s YOLO papers and tech reports. Not only are they novel and insightful, they are incredibly entertaining as well.
But seriously, if you do nothing else today read the YOLOv3 tech report.
It’s only 6 pages and one of those pages is just references/citations.
Furthermore, the tech report is honest in a way that academic papers rarely, if ever, are.
Project structure
Let’s take a look at today’s project layout. You can use your OS’s GUI (Finder for OSX, Nautilus for Ubuntu), but you may find it easier and faster to use the tree command in your terminal:
- $ tree
- .
- ├── images
- │ ├── baggage_claim.jpg
- │ ├── dining_table.jpg
- │ ├── living_room.jpg
- │ └── soccer.jpg
- ├── output
- │ ├── airport_output.avi
- │ ├── car_chase_01_output.avi
- │ ├── car_chase_02_output.avi
- │ ├── car_chase_03_output.avi
- │ └── overpass_output.avi
- ├── videos
- │ ├── airport.mp4
- │ ├── car_chase_01.mp4
- │ ├── car_chase_02.mp4
- │ ├── car_chase_03.mp4
- │ └── overpass.mp4
- ├── yolo-coco
- │ ├── coco.names
- │ ├── yolov3.cfg
- │ └── yolov3.weights
- ├── yolo.py
- └── yolo_video.py
- 4 directories, 19 files
Our project today consists of four directories and two Python scripts.
The directories (in order of importance) are:
- yolo-coco/ : The YOLOv3 object detector pre-trained (on the COCO dataset) model files. These were trained by the Darknet team.
- images/ : This folder contains four static images which we’ll perform object detection on for testing and evaluation purposes.
- videos/ : After performing object detection with YOLO on images, we’ll process videos in real time. This directory contains five sample videos for you to test with.
- output/ : Output videos that have been processed by YOLO and annotated with bounding boxes and class names can go in this folder.
We’re reviewing two Python scripts — yolo.py and yolo_video.py . The first script is for images and then we’ll take what we learn and apply it to video in the second script.
Are you ready?
YOLO object detection in images
Let’s get started applying the YOLO object detector to images!
Open up the yolo.py file in your project and insert the following code:
- # import the necessary packages
- import numpy as np
- import argparse
- import time
- import cv2
- import os
- # construct the argument parse and parse the arguments
- ap = argparse.ArgumentParser()
- ap.add_argument("-i", "--image", required=True,
- help="path to input image")
- ap.add_argument("-y", "--yolo", required=True,
- help="base path to YOLO directory")
- ap.add_argument("-c", "--confidence", type=float, default=0.5,
- help="minimum probability to filter weak detections")
- ap.add_argument("-t", "--threshold", type=float, default=0.3,
- help="threshold when applying non-maxima suppression")
- args = vars(ap.parse_args())
All you need installed for this script is OpenCV 3.4+ with Python bindings. You can find my OpenCV installation tutorials here; just keep in mind that OpenCV 4 is in beta right now, so you may run into issues installing or running certain scripts since it’s not an official release. For the time being I recommend going for OpenCV 3.4+. You can actually be up and running in less than 5 minutes with pip as well.
First, we import our required packages — as long as OpenCV and NumPy are installed, your interpreter will breeze past these lines.
Now let’s parse four command line arguments. Command line arguments are processed at runtime and allow us to change the inputs to our script from the terminal. If you aren’t familiar with them, I encourage you to read more in my previous tutorial. Our command line arguments include:
- --image : The path to the input image. We’ll detect objects in this image using YOLO.
- --yolo : The base path to the YOLO directory. Our script will then load the required YOLO files in order to perform object detection on the image.
- --confidence : Minimum probability to filter weak detections. I’ve given this a default value of 50% ( 0.5 ), but you should feel free to experiment with this value.
- --threshold : This is our non-maxima suppression threshold with a default value of 0.3 . You can read more about non-maxima suppression here.
After parsing, the args variable is now a dictionary containing the key-value pairs for the command line arguments. You’ll see args a number of times in the rest of this script.
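If command line arguments are new to you, the following is a minimal sketch of the same four arguments. The only liberty taken is that an explicit list is passed to parse_args (standing in for what you would type at the terminal) and that the parser is wrapped in a helper named build_parser, which is my own name for illustration and not part of the script:

```python
import argparse


def build_parser():
    # mirror the four arguments described above
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", required=True,
        help="path to input image")
    ap.add_argument("-y", "--yolo", required=True,
        help="base path to YOLO directory")
    ap.add_argument("-c", "--confidence", type=float, default=0.5,
        help="minimum probability to filter weak detections")
    ap.add_argument("-t", "--threshold", type=float, default=0.3,
        help="threshold when applying non-maxima suppression")
    return ap


# the list below stands in for:
#   python yolo.py --image images/soccer.jpg --yolo yolo-coco --confidence 0.6
args = vars(build_parser().parse_args(
    ["--image", "images/soccer.jpg", "--yolo", "yolo-coco",
     "--confidence", "0.6"]))

# args is a plain dictionary; --threshold falls back to its default
print(args["image"], args["yolo"], args["confidence"], args["threshold"])
```

Because vars() turns the parsed namespace into a dictionary, every later lookup in the script is a simple args["name"] access.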
Let’s load our class labels and set random colors for each:
- # load the COCO class labels our YOLO model was trained on
- labelsPath = os.path.sep.join([args["yolo"], "coco.names"])
- LABELS = open(labelsPath).read().strip().split("\n")
- # initialize a list of colors to represent each possible class label
- np.random.seed(42)
- COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
- dtype="uint8")
Here we load all of our class LABELS (notice the first command line argument, args["yolo"] being used) on Lines 21 and 22. Random COLORS are then assigned to each label on Lines 25-27.
Let’s derive the paths to the YOLO weights and configuration files followed by loading YOLO from disk:
- # derive the paths to the YOLO weights and model configuration
- weightsPath = os.path.sep.join([args["yolo"], "yolov3.weights"])
- configPath = os.path.sep.join([args["yolo"], "yolov3.cfg"])
- # load our YOLO object detector trained on COCO dataset (80 classes)
- print("[INFO] loading YOLO from disk...")
- net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
To load YOLO from disk on Line 35, we’ll take advantage of OpenCV’s DNN function called cv2.dnn.readNetFromDarknet. This function requires both a configPath and weightsPath, which are derived from the --yolo command line argument on Lines 30 and 31.
I cannot stress this enough: you’ll need at least OpenCV 3.4 to run this code as it has the updated dnn module required to load YOLO.
Let’s load the image and send it through the network:
- # load our input image and grab its spatial dimensions
- image = cv2.imread(args["image"])
- (H, W) = image.shape[:2]
- # determine only the *output* layer names that we need from YOLO
- ln = net.getLayerNames()
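- # (flatten() copes with both the nested and the flat index arrays
- # that different OpenCV versions return)
- ln = [ln[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]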
- # construct a blob from the input image and then perform a forward
- # pass of the YOLO object detector, giving us our bounding boxes and
- # associated probabilities
- blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
- swapRB=True, crop=False)
- net.setInput(blob)
- start = time.time()
- layerOutputs = net.forward(ln)
- end = time.time()
- # show timing information on YOLO
- print("[INFO] YOLO took {:.6f} seconds".format(end - start))
In this block we:
- Load the input image and extract its dimensions (Lines 38 and 39).
- Determine the output layer names from the YOLO model (Lines 42 and 43).
- Construct a blob from the image (Lines 48 and 49). Are you confused about what a blob is or what the cv2.dnn.blobFromImage does? Give this blog post a read.
Now that our blob is prepared, we’ll
- Perform a forward pass through our YOLO network (Lines 50 and 52)
- Show the inference time for YOLO (Line 56)
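If the output layer lookup feels opaque, here is a minimal sketch of the idea in plain Python. It assumes only that the indices reported for the output layers are 1-based, and that depending on your OpenCV version they may arrive either nested (one index per row) or flat. The helper name output_layer_names and the toy layer names are made up for illustration; they are not part of OpenCV:

```python
def output_layer_names(layer_names, out_layers):
    """Map 1-based output layer indices to their layer names.

    Accepts either a flat sequence such as [3, 5] or a nested one
    such as [[3], [5]].
    """
    names = []
    for idx in out_layers:
        # unwrap one-element rows like [3]; plain integers pass through
        if hasattr(idx, "__len__"):
            idx = idx[0]
        # the indices are 1-based, Python lists are 0-based
        names.append(layer_names[int(idx) - 1])
    return names


# toy network with made-up layer names (hypothetical, for illustration)
layer_names = ["conv_0", "conv_1", "yolo_2", "conv_3", "yolo_4"]

flat = output_layer_names(layer_names, [3, 5])
nested = output_layer_names(layer_names, [[3], [5]])
print(flat, nested)
```

Both calls pick out the same two names, which is why handling both shapes keeps a script portable across OpenCV releases.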
What good is object detection unless we visualize our results? Let’s take steps now to filter and visualize our results.
But first, let’s initialize some lists we’ll need in the process of doing so:
- # initialize our lists of detected bounding boxes, confidences, and
- # class IDs, respectively
- boxes = []
- confidences = []
- classIDs = []
These lists include:
- boxes : Our bounding boxes around the object.
- confidences : The confidence value that YOLO assigns to an object. Lower confidence values indicate that the object might not be what the network thinks it is. Remember from our command line arguments above that we’ll filter out objects that don’t meet the 0.5 threshold.
- classIDs : The detected object’s class label.
Let’s begin populating these lists with data from our YOLO layerOutputs :
- # loop over each of the layer outputs
- for output in layerOutputs:
-     # loop over each of the detections
-     for detection in output:
-         # extract the class ID and confidence (i.e., probability) of
-         # the current object detection
-         scores = detection[5:]
-         classID = np.argmax(scores)
-         confidence = scores[classID]
-         # filter out weak predictions by ensuring the detected
-         # probability is greater than the minimum probability
-         if confidence > args["confidence"]:
-             # scale the bounding box coordinates back relative to the
-             # size of the image, keeping in mind that YOLO actually
-             # returns the center (x, y)-coordinates of the bounding
-             # box followed by the boxes' width and height
-             box = detection[0:4] * np.array([W, H, W, H])
-             (centerX, centerY, width, height) = box.astype("int")
-             # use the center (x, y)-coordinates to derive the
-             # top-left corner of the bounding box
-             x = int(centerX - (width / 2))
-             y = int(centerY - (height / 2))
-             # update our list of bounding box coordinates, confidences,
-             # and class IDs
-             boxes.append([x, y, int(width), int(height)])
-             confidences.append(float(confidence))
-             classIDs.append(classID)
There’s a lot here in this code block — let’s break it down.
In this block, we:
- Loop over each of the layerOutputs (beginning on Line 65).
- Loop over each detection in output (a nested loop beginning on Line 67).
- Extract the classID and confidence (Lines 70-72).
- Use the confidence to filter out weak detections (Line 76).
Now that we’ve filtered out unwanted detections, we’re going to:
- Scale bounding box coordinates so we can display them properly on our original image (Line 81).
- Extract coordinates and dimensions of the bounding box (Line 82). YOLO returns bounding box coordinates in the form: (centerX, centerY, width, and height) .
- Use this information to derive the top-left (x, y)-coordinates of the bounding box (Lines 86 and 87).
- Update the boxes , confidences , and classIDs lists (Lines 91-93).
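To make the steps above concrete, here is a small worked sketch in plain Python (no NumPy or OpenCV required). The detection vector is entirely made up, and the helper name decode_detection is my own; the layout it assumes (four box values, one objectness value, then the per-class scores) simply follows the slicing used in the script:

```python
def decode_detection(detection, img_w, img_h):
    """Turn one YOLO-style detection row into a pixel-space box.

    Assumed layout: [centerX, centerY, width, height, objectness,
    class scores...], with the box values relative to the image size.
    """
    # the class scores start at index 5, as in the script
    scores = detection[5:]
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]

    # scale the relative box values back to pixel units
    center_x = int(detection[0] * img_w)
    center_y = int(detection[1] * img_h)
    width = int(detection[2] * img_w)
    height = int(detection[3] * img_h)

    # derive the top-left corner from the center point
    x = int(center_x - width / 2)
    y = int(center_y - height / 2)
    return [x, y, width, height], class_id, confidence


# hypothetical detection for a 640x480 image with three classes
det = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
box, class_id, confidence = decode_detection(det, 640, 480)
print(box, class_id, confidence)
```

Here the box is centered at (320, 240) and is 160 pixels wide by 240 pixels tall, so its top-left corner lands at (240, 120).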
With this data, we’re now going to apply what is called “non-maxima suppression”:
- # apply non-maxima suppression to suppress weak, overlapping bounding
- # boxes
- idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"],
- args["threshold"])
YOLO does not apply non-maxima suppression for us, so we need to explicitly apply it.
Applying non-maxima suppression suppresses significantly overlapping bounding boxes, keeping only the most confident ones.
NMS also ensures that we do not have any redundant or extraneous bounding boxes.
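If you have not seen non-maxima suppression before, the following is a simplified greedy sketch of the idea in plain Python. It is not OpenCV's implementation, and the function names iou and nms as well as the example boxes are hypothetical. Boxes use the same [x, y, width, height] layout as our boxes list:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as [x, y, w, h]."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # width and height of the overlapping region (zero if disjoint)
    inter_w = max(0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, confidences, score_threshold, nms_threshold):
    """Greedy NMS: keep the most confident box, drop heavy overlaps."""
    # consider only boxes above the score threshold, best first
    order = sorted(
        (i for i, c in enumerate(confidences) if c > score_threshold),
        key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap a kept box too much
        if all(iou(boxes[i], boxes[k]) <= nms_threshold for k in keep):
            keep.append(i)
    return keep


# two heavily overlapping boxes plus one far away (made-up values)
boxes = [[100, 100, 50, 50], [105, 105, 50, 50], [300, 300, 40, 40]]
confidences = [0.9, 0.8, 0.7]
keep = nms(boxes, confidences, score_threshold=0.5, nms_threshold=0.3)
print(keep)
```

The second box overlaps the first with an IoU of roughly 0.68, well above the 0.3 threshold, so only the first and third boxes survive.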
Taking advantage of OpenCV’s built-in DNN module implementation of NMS, we perform non-maxima suppression on Lines 97 and 98. All that is required is that we submit our bounding boxes, confidences, as well as both our confidence threshold and NMS threshold.
If you’ve been reading this blog, you might be wondering why we didn’t use my imutils implementation of NMS. The primary reason is that the NMSBoxes function is now working in OpenCV. Previously it failed for some inputs and resulted in an error message. Now that the NMSBoxes function is working, we can use it in our own scripts.
Let’s draw the boxes and class text on the image!
- # ensure at least one detection exists
- if len(idxs) > 0:
-     # loop over the indexes we are keeping
-     for i in idxs.flatten():
-         # extract the bounding box coordinates
-         (x, y) = (boxes[i][0], boxes[i][1])
-         (w, h) = (boxes[i][2], boxes[i][3])
-         # draw a bounding box rectangle and label on the image
-         color = [int(c) for c in COLORS[classIDs[i]]]
-         cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
-         text = "{}: {:.4f}".format(LABELS[classIDs[i]], confidences[i])
-         cv2.putText(image, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX,
-             0.5, color, 2)
- # show the output image
- cv2.imshow("Image", image)
- cv2.waitKey(0)
Assuming at least one detection exists (Line 101), we proceed to loop over idxs determined by non-maxima suppression.
Then, we simply draw the bounding box and text on image using our random class colors (Lines 105-113).
Finally, we display our resulting image until the user presses any key on their keyboard (ensuring the window opened by OpenCV is selected and focused).
To follow along with this guide, make sure you use the “Downloads” section of this tutorial to download the source code, YOLO model, and example images.
From there, open up a terminal and execute the following command:
- $ python yolo.py --image images/baggage_claim.jpg --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] YOLO took 0.347815 seconds

Figure 2: YOLO with OpenCV is used to detect people and baggage in an airport.
Here you can see that YOLO has not only detected each person in the input image, but also the suitcases as well!
Furthermore, if you take a look at the right corner of the image you’ll see that YOLO has also detected the handbag on the lady’s shoulder.
Let’s try another example:
- $ python yolo.py --image images/living_room.jpg --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] YOLO took 0.340221 seconds
Figure 3: YOLO object detection with OpenCV is used to detect a person, dog, TV, and chair. The remote is a false-positive detection but looking at the ROI you could imagine that the area does share resemblances to a remote.
The image above contains a person (myself) and a dog (Jemma, the family beagle).
YOLO also detects the TV monitor and a chair. I’m particularly impressed that YOLO was able to detect the chair given that it’s a handmade, old-fashioned “baby high chair”.
Interestingly, YOLO thinks there is a “remote” in my hand. It’s actually not a remote — it’s the reflection of glass on a VHS tape; however, if you stare at the region it actually does look like it could be a remote.
The following example image demonstrates a limitation and weakness of the YOLO object detector:
- $ python yolo.py --image images/dining_table.jpg --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] YOLO took 0.362369 seconds
Figure 4: YOLO and OpenCV are used for object detection of a dining room table.
While the wine bottle, dining table, and vase are all correctly detected by YOLO, only one of the two wine glasses is properly detected.
We discuss why YOLO struggles with objects close together in the “Limitations and drawbacks of the YOLO object detector” section below.
Let’s try one final image:
- $ python yolo.py --image images/soccer.jpg --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] YOLO took 0.345656 seconds

Figure 5: Soccer players and a soccer ball are detected with OpenCV using the YOLO object detector.
YOLO is able to correctly detect each of the players on the pitch, including the soccer ball itself. Notice the person in the background who is detected despite the area being highly blurred and partially obscured.
YOLO object detection in video streams
Now that we’ve learned how to apply the YOLO object detector to single images, let’s also utilize YOLO to perform object detection in input video files as well.
Open up the yolo_video.py file and insert the following code:
- # import the necessary packages
- import numpy as np
- import argparse
- import imutils
- import time
- import cv2
- import os
- # construct the argument parse and parse the arguments
- ap = argparse.ArgumentParser()
- ap.add_argument("-i", "--input", required=True,
- help="path to input video")
- ap.add_argument("-o", "--output", required=True,
- help="path to output video")
- ap.add_argument("-y", "--yolo", required=True,
- help="base path to YOLO directory")
- ap.add_argument("-c", "--confidence", type=float, default=0.5,
- help="minimum probability to filter weak detections")
- ap.add_argument("-t", "--threshold", type=float, default=0.3,
- help="threshold when applying non-maxima suppression")
- args = vars(ap.parse_args())
We begin with our imports and command line arguments.
Notice that this script doesn’t have the --image argument as before. To take its place, we now have two video-related arguments:
- --input : The path to the input video file.
- --output : Our path to the output video file.
Given these arguments, you can now use videos that you record of scenes with your smartphone or videos you find online. You can then process the video file producing an annotated output video. Of course if you want to use your webcam to process a live video stream, that is possible too. Just find examples on PyImageSearch where the VideoStream class from imutils.video is utilized and make some minor changes.
Moving on, the next block is identical to the block from the YOLO image processing script:
- # load the COCO class labels our YOLO model was trained on
- labelsPath = os.path.sep.join([args["yolo"], "coco.names"])
- LABELS = open(labelsPath).read().strip().split("\n")
- # initialize a list of colors to represent each possible class label
- np.random.seed(42)
- COLORS = np.random.randint(0, 255, size=(len(LABELS), 3),
- dtype="uint8")
- # derive the paths to the YOLO weights and model configuration
- weightsPath = os.path.sep.join([args["yolo"], "yolov3.weights"])
- configPath = os.path.sep.join([args["yolo"], "yolov3.cfg"])
- # load our YOLO object detector trained on COCO dataset (80 classes)
- # and determine only the *output* layer names that we need from YOLO
- print("[INFO] loading YOLO from disk...")
- net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
- ln = net.getLayerNames()
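- # (flatten() copes with both the nested and the flat index arrays
- # that different OpenCV versions return)
- ln = [ln[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]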
Here we load labels and generate colors followed by loading our YOLO model and determining output layer names.
Next, we’ll take care of some video-specific tasks:
- # initialize the video stream, pointer to output video file, and
- # frame dimensions
- vs = cv2.VideoCapture(args["input"])
- writer = None
- (W, H) = (None, None)
- # try to determine the total number of frames in the video file
- try:
-     prop = cv2.cv.CV_CAP_PROP_FRAME_COUNT if imutils.is_cv2() \
-         else cv2.CAP_PROP_FRAME_COUNT
-     total = int(vs.get(prop))
-     print("[INFO] {} total frames in video".format(total))
- # an error occurred while trying to determine the total
- # number of frames in the video file
- except Exception:
-     print("[INFO] could not determine # of frames in video")
-     print("[INFO] no approx. completion time can be provided")
-     total = -1
In this block, we:
- Open a file pointer to the video file for reading frames in the upcoming loop (Line 45).
- Initialize our video writer and frame dimensions (Lines 46 and 47).
- Try to determine the total number of frames in the video file so we can estimate how long processing the entire video will take (Lines 50-61).
Now we’re ready to start processing frames one by one:
- # loop over frames from the video file stream
- while True:
-     # read the next frame from the file
-     (grabbed, frame) = vs.read()
-     # if the frame was not grabbed, then we have reached the end
-     # of the stream
-     if not grabbed:
-         break
-     # if the frame dimensions are empty, grab them
-     if W is None or H is None:
-         (H, W) = frame.shape[:2]
We define a while loop (Line 64) and then read the next frame (Line 66).
We then check whether we have run past the last frame of the video. If so, we need to break from the while loop (Lines 70 and 71).
Next, we grab the frame dimensions if they haven’t been grabbed yet (Lines 74 and 75).
Next, let’s perform a forward pass of YOLO, using our current frame as the input:
-     # construct a blob from the input frame and then perform a forward
-     # pass of the YOLO object detector, giving us our bounding boxes
-     # and associated probabilities
-     blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
-         swapRB=True, crop=False)
-     net.setInput(blob)
-     start = time.time()
-     layerOutputs = net.forward(ln)
-     end = time.time()
-     # initialize our lists of detected bounding boxes, confidences,
-     # and class IDs, respectively
-     boxes = []
-     confidences = []
-     classIDs = []
Here we construct a blob and pass it through the network, obtaining predictions. I’ve surrounded the forward pass operation with time stamps so we can calculate the elapsed time to make predictions on one frame — this will help us estimate the time needed to process the entire video.
We’ll then go ahead and initialize the same three lists we used in our previous script: boxes, confidences, and classIDs.
This next block is, again, identical to our previous script:
-     # loop over each of the layer outputs
-     for output in layerOutputs:
-         # loop over each of the detections
-         for detection in output:
-             # extract the class ID and confidence (i.e., probability)
-             # of the current object detection
-             scores = detection[5:]
-             classID = np.argmax(scores)
-             confidence = scores[classID]
-             # filter out weak predictions by ensuring the detected
-             # probability is greater than the minimum probability
-             if confidence > args["confidence"]:
-                 # scale the bounding box coordinates back relative to
-                 # the size of the image, keeping in mind that YOLO
-                 # actually returns the center (x, y)-coordinates of
-                 # the bounding box followed by the boxes' width and
-                 # height
-                 box = detection[0:4] * np.array([W, H, W, H])
-                 (centerX, centerY, width, height) = box.astype("int")
-                 # use the center (x, y)-coordinates to derive the
-                 # top-left corner of the bounding box
-                 x = int(centerX - (width / 2))
-                 y = int(centerY - (height / 2))
-                 # update our list of bounding box coordinates,
-                 # confidences, and class IDs
-                 boxes.append([x, y, int(width), int(height)])
-                 confidences.append(float(confidence))
-                 classIDs.append(classID)
In this code block, we:
- Loop over output layers and detections (Lines 94-96).
- Extract the classID and filter out weak predictions (Lines 99-105).
- Compute bounding box coordinates (Lines 111-117).
- Update our respective lists (Lines 121-123).
Next, we’ll apply non-maxima suppression and begin annotating the frame:
-     # apply non-maxima suppression to suppress weak, overlapping
-     # bounding boxes
-     idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"],
-         args["threshold"])
-     # ensure at least one detection exists
-     if len(idxs) > 0:
-         # loop over the indexes we are keeping
-         for i in idxs.flatten():
-             # extract the bounding box coordinates
-             (x, y) = (boxes[i][0], boxes[i][1])
-             (w, h) = (boxes[i][2], boxes[i][3])
-             # draw a bounding box rectangle and label on the frame
-             color = [int(c) for c in COLORS[classIDs[i]]]
-             cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
-             text = "{}: {:.4f}".format(LABELS[classIDs[i]],
-                 confidences[i])
-             cv2.putText(frame, text, (x, y - 5),
-                 cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
You should recognize these lines as well. Here we:
- Apply NMS using the cv2.dnn.NMSBoxes function (Lines 127 and 128) to suppress weak, overlapping bounding boxes. You can read more about non-maxima suppression here.
- Loop over the idxs calculated by NMS and draw the corresponding bounding boxes + labels (Lines 131-144).
Let’s finish out the script:
-     # check if the video writer is None
-     if writer is None:
-         # initialize our video writer
-         fourcc = cv2.VideoWriter_fourcc(*"MJPG")
-         writer = cv2.VideoWriter(args["output"], fourcc, 30,
-             (frame.shape[1], frame.shape[0]), True)
-         # some information on processing single frame
-         if total > 0:
-             elap = (end - start)
-             print("[INFO] single frame took {:.4f} seconds".format(elap))
-             print("[INFO] estimated total time to finish: {:.4f}".format(
-                 elap * total))
-     # write the output frame to disk
-     writer.write(frame)
- # release the file pointers
- print("[INFO] cleaning up...")
- writer.release()
- vs.release()
To wrap up, we simply:
- Initialize our video writer if necessary (Lines 147-151). The writer will be initialized on the first iteration of the loop.
- Print out our estimates of how long it will take to process the video (Lines 154-158).
- Write the frame to the output video file (Line 161).
- Cleanup and release pointers (Lines 165 and 166).
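The completion estimate printed by the script is simple arithmetic: the time of one forward pass multiplied by the number of frames. Here is a minimal sketch of that calculation, where the helper name estimate_total_seconds and the 600-frame video are hypothetical:

```python
def estimate_total_seconds(seconds_per_frame, total_frames):
    """Estimate processing time, or None if the frame count is unknown."""
    # the script uses total = -1 when the frame count cannot be read
    if total_frames <= 0:
        return None
    return seconds_per_frame * total_frames


# a hypothetical 600-frame clip at roughly 0.35 seconds per frame
estimate = estimate_total_seconds(0.35, 600)
print("estimated total time to finish: {:.1f} seconds".format(estimate))

# an unknown frame count yields no estimate
print(estimate_total_seconds(0.35, -1))
```

Keep in mind that this only accounts for the forward pass of the first frame; reading, drawing, and writing frames add to the real running time.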
To apply YOLO object detection to video streams, make sure you use the “Downloads” section of this blog post to download the source, YOLO object detector, and example videos.
From there, open up a terminal and execute the following command:
- $ python yolo_video.py --input videos/car_chase_01.mp4 \
- --output output/car_chase_01.avi --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] total frames in video
- [INFO] single frame took 0.3500 seconds
- [INFO] estimated total time to finish: 204.0238
- [INFO] cleaning up...

Figure 6: YOLO deep learning object detection applied to a car crash video.
Above you can see a GIF excerpt from a car chase video I found on YouTube.
In the video/GIF, you can see that not only the vehicles are detected, but people and traffic lights as well!
The YOLO object detector is performing quite well here. Let’s try a different video clip from the same car chase video:
- $ python yolo_video.py --input videos/car_chase_02.mp4 \
- --output output/car_chase_02.avi --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] total frames in video
- [INFO] single frame took 0.3455 seconds
- [INFO] estimated total time to finish: 1082.0806
- [INFO] cleaning up...

Figure 7: In this video of a suspect on the run, we have used OpenCV and YOLO object detection to find the person.
The suspect has now fled the car and is running across a parking lot.
YOLO is once again able to detect people.
At one point the suspect is actually able to make it back to their car and continue the chase. Let’s see how YOLO performs there as well:
- $ python yolo_video.py --input videos/car_chase_03.mp4 \
- --output output/car_chase_03.avi --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] total frames in video
- [INFO] single frame took 0.3442 seconds
- [INFO] estimated total time to finish: 257.8418
- [INFO] cleaning up...

Figure 8: YOLO is a fast deep learning object detector capable of being used in real time video provided a GPU is utilized.
As a final example, let’s see how we may use YOLO as a starting point to building a traffic counter:
- $ python yolo_video.py --input videos/overpass.mp4 \
- --output output/overpass.avi --yolo yolo-coco
- [INFO] loading YOLO from disk...
- [INFO] total frames in video
- [INFO] single frame took 0.3534 seconds
- [INFO] estimated total time to finish: 286.9583
- [INFO] cleaning up...

Figure 9: A video of traffic going under an overpass demonstrates that YOLO and OpenCV can be used to detect cars accurately and quickly.
I’ve put together a full video of YOLO object detection examples below:
Credits for video and audio:
- Car chase video posted on YouTube by Quaker Oats.
- Overpass video on YouTube by Vlad Kiraly.
- “White Crow” on the FreeMusicArchive by XTaKeRuX.
Limitations and drawbacks of the YOLO object detector
Arguably the largest limitation and drawback of the YOLO object detector is that:
- It does not always handle small objects well
- It especially struggles with objects grouped close together
This limitation stems from the YOLO algorithm itself:
- The YOLO object detector divides an input image into an SxS grid where each cell in the grid predicts only a single object.
- If there exist multiple, small objects in a single cell then YOLO will be unable to detect them, ultimately leading to missed object detections.
Therefore, if you know your dataset consists of many small objects grouped close together then you should not use the YOLO object detector.
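To make the grid limitation more concrete, here is a minimal sketch that maps object centers onto an SxS grid and reports which cells contain more than one object. It is a simplified model of the cell assignment described above, not the network's actual prediction logic; the function names, the 416x416 image size, and the 13x13 grid are illustrative assumptions.

```python
from collections import defaultdict


def cell_for_center(cx, cy, image_w, image_h, grid_size):
    """Return the (row, col) of the grid cell containing a box center."""
    # Clamp so a center on the right/bottom edge stays inside the grid.
    col = min(int(cx / image_w * grid_size), grid_size - 1)
    row = min(int(cy / image_h * grid_size), grid_size - 1)
    return row, col


def crowded_cells(centers, image_w, image_h, grid_size):
    """Group object centers by grid cell and keep cells holding 2+ objects."""
    cells = defaultdict(list)
    for cx, cy in centers:
        cell = cell_for_center(cx, cy, image_w, image_h, grid_size)
        cells[cell].append((cx, cy))
    return {cell: pts for cell, pts in cells.items() if len(pts) > 1}


# Illustrative values: a 416x416 image split into a 13x13 grid (32 px cells).
# Three small objects sit close together; one object is far away.
centers = [(40, 40), (50, 45), (60, 60), (300, 200)]
crowded = crowded_cells(centers, image_w=416, image_h=416, grid_size=13)

for cell, pts in crowded.items():
    print(f"cell {cell} contains {len(pts)} object centers: {pts}")
```

If each cell can only account for a single object, every additional center in a crowded cell corresponds to a potential missed detection.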
In terms of small objects, Faster R-CNN tends to work the best; however, it’s also the slowest.
SSDs can also be used here; however, SSDs can also struggle with smaller objects (but not as much as YOLO).
SSDs often give a nice tradeoff in terms of speed and accuracy as well.
It’s also worth noting that YOLO ran slower than SSDs in this tutorial. In my previous tutorial on OpenCV object detection we utilized an SSD — a single forward pass of the SSD took ~0.03 seconds.
However, from this tutorial, we know that a forward pass of the YOLO object detector took ~0.3 seconds, approximately an order of magnitude slower!
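The arithmetic behind that comparison is easy to check. The sketch below converts the approximate per-frame latencies quoted above into frames per second and estimates how long a clip would take to process. The helper names and the 1,000-frame clip length are hypothetical and chosen only for illustration.

```python
def frames_per_second(seconds_per_frame):
    """Convert a per-frame latency into throughput (frames per second)."""
    return 1.0 / seconds_per_frame


def estimated_total_seconds(total_frames, seconds_per_frame):
    """Estimate processing time as frame count times per-frame latency."""
    return total_frames * seconds_per_frame


# Approximate forward-pass times quoted in this tutorial.
yolo_latency = 0.3
ssd_latency = 0.03

yolo_fps = frames_per_second(yolo_latency)
ssd_fps = frames_per_second(ssd_latency)
slowdown = yolo_latency / ssd_latency

# Hypothetical clip length, used only to illustrate the estimate.
hypothetical_frames = 1000
yolo_total = estimated_total_seconds(hypothetical_frames, yolo_latency)
ssd_total = estimated_total_seconds(hypothetical_frames, ssd_latency)

print(f"YOLO: {yolo_fps:.1f} FPS, SSD: {ssd_fps:.1f} FPS, ratio: {slowdown:.0f}x")
print(f"{hypothetical_frames} frames: YOLO ~{yolo_total:.0f}s, SSD ~{ssd_total:.0f}s")
```

At roughly 3 frames per second on a CPU, YOLO is well below typical video frame rates, whereas an SSD at roughly 33 frames per second is within real-time range.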
If you’re using the pre-trained deep learning object detectors OpenCV supplies you may want to consider using SSDs over YOLO. From my personal experience, I’ve rarely encountered situations where I needed to use YOLO over SSDs:
- I have found SSDs much easier to train, and they almost always outperform YOLO in terms of accuracy (at least for the datasets I’ve worked with).
- YOLO may have excellent results on the COCO dataset; however, I have not found that same level of accuracy for my own tasks.
I, therefore, tend to use the following guidelines when picking an object detector for a given problem:
- If I know I need to detect small objects and speed is not a concern, I tend to use Faster R-CNN.
- If speed is absolutely paramount, I use YOLO.
- If I need a middle ground, I tend to go with SSDs.
In most situations I end up using SSDs or RetinaNet, as both strike a good balance between YOLO and Faster R-CNN.
Want to train your own deep learning object detectors?

Figure 10: In my book, Deep Learning for Computer Vision with Python, I cover multiple object detection algorithms including Faster R-CNN, SSDs, and RetinaNet. Inside I will teach you how to create your object detection image dataset, train the object detector, and make predictions. Not to mention I also cover deep learning fundamentals, best practices, and my personal set of rules of thumb. Grab your copy now so you can start learning new skills.
The YOLO model we used in this tutorial was pre-trained on the COCO dataset…
…but what if you wanted to train a deep learning object detector on your own dataset?
Inside my book, Deep Learning for Computer Vision with Python, I’ll teach you how to train Faster R-CNNs, Single Shot Detectors (SSDs), and RetinaNet to:
- Detect logos in images
- Detect traffic signs (e.g., stop sign, yield sign, etc.)
- Detect the front and rear views of vehicles (useful for building a self-driving car application)
- Detect weapons in images and video streams
All object detection chapters in the book include a detailed explanation of both the algorithm and code, ensuring you will be able to successfully train your own object detectors.
Summary
In this tutorial we learned how to perform YOLO object detection using Deep Learning, OpenCV, and Python.
We briefly discussed the YOLO architecture and then implemented Python code to:
- Apply YOLO object detection to single images
- Apply the YOLO object detector to video streams
On my machine with a 3GHz Intel Xeon W processor, a single forward pass of YOLO took ~0.3 seconds; however, the Single Shot Detector (SSD) from a previous tutorial needed only ~0.03 seconds per forward pass, an order of magnitude faster!
For real-time deep learning-based object detection on your CPU with OpenCV and Python, you may want to consider using the SSD.
If you are interested in training your own deep learning object detectors on your own custom datasets, be sure to refer to my book, Deep Learning for Computer Vision with Python, where I provide detailed guides on how to successfully train your own detectors.
I hope you enjoyed today’s YOLO object detection tutorial!