1 背景介绍

1.1 什么是图像分割和实例分割

1.2 Mask-RCNN原理

2 Mask-RCNN在OpenCV中的使用

2.1 模型下载

Mask R-CNN具体内容见：

https://arxiv.org/pdf/1703.06870.pdf

Mask R-CNN最初于2017年11月由Facebook的AI研究团队使用Python和Caffe2推出。工程代码见：

https://github.com/facebookresearch/Detectron

后来Mask R-CNN被移植到Tensorflow，并且在共享了几个预先训练的模型，这些模型具有不同的训练架构，如InceptionV2，ResNet50，ResNet101和Inception-ResnetV2 。它们还为您提供培训自己模型的工具。

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/instance_segmentation.md

基于Inception训练的Mask R-CNN速度最快，甚至可以在CPU上试用它，因此我们在本教程中选择了它。该模型在MSCOCO数据集上进行了训练。我们将共享OpenCV(OpenCV 3.43以上版本)代码以在C ++和Python中加载和使用该模型。

1 背景介绍

1.1 什么是图像分割和实例分割

在计算机视觉中，术语“图像分割”或简称“分割”意味着基于某些标准将图像分成像素区域。您可以根据颜色，纹理或您已决定的其他一些条件进行分割。这些区域有时也称为超像素区域。

进行图像分割有多种方法比如实例分割，语义分割，全景分割等。

具体区别见：

https://zhuanlan.zhihu.com/p/50996404

在实例分割中，目标是检测图像中的特定对象并在感兴趣的对象周围创建掩模。实例分割也可以被认为是对象检测，其输出是掩码而不仅仅是边界框。与尝试对图像中的每个像素进行分类的语义分割不同，实例分割不仅仅在标记图像中的每个像素，还区分各个单体。下面我们看一个在非常相似的彩色背景上的两只绵羊的实例分割的例子。

1.2 Mask-RCNN原理

Mask-RCNN是对原始R-CNN论文的一系列改进结果，用于物体检测。R-CNN基于选择性搜索的方法生成候选框，然后使用卷积网络一次一个地处理每个候选框区域以输出对象标签及其边界框。

R-CNN具体见：

https://arxiv.org/abs/1311.2524

Fast R-CNN通过使用ROIPool层在其CNN中一起处理所有提出的候选框区域，使得R-CNN算法更快。

Fast R-CNN具体见：

https://arxiv.org/pdf/1504.08083.pdf

Faster R-CNN通过提取候选区域的网络RPN，代替了费时的选择性搜索，使得检测速度大幅提高。

Faster R-CNN具体见：

https://arxiv.org/pdf/1506.01497.pdf

上述R-CNN及其优化版本简要原理见：

https://www.cnblogs.com/skyfsm/p/6806246.html

Mask R-CNN是对Faster RCNN的一种改进，它包括掩码预测与标签预测和边界框预测两个输出，如下图所示：

Mask-RCNN网络有两个主要部分。

第一个是候选区域生成网络，每个图像生成大约300个候选区域。在训练期间，这些候选区域（ROI）中的每一个都通过第二部分，即目标检测和掩模预测网络，如上所示。注意，由于掩模预测分支与分类框预测分支并行运行，因此对于每个给定的ROI，网络预测可能会获得属于任何类别的掩模。

在推理期间，候选区域会用非最大抑制性方法进行筛选，并且掩模预测分支仅处理最高得分100检测框。因此，对于100个ROI和90个对象类，网络的掩模预测部分输出尺寸为100x90x15x15的4D张量，其中每个掩模的大小为15×15。

对于上面显示的绵羊图像，网络检测到两个对象。对于每个对象，它输出一个数组，其中包含预测的类分数（表示对象属于预测类的概率），检测到的对象的边界框的左，上，右和下位置。从掩码预测分支的输出中提取相应分类的掩码。检测到的两个对象的掩码如下所示：

与Faster R-CNN一样，Mask R-CNN所用架构网络也很灵活。我们之所以选择InceptionV2是因为速度更快，但正如Mask R-CNN论文的作者所指出的那样，人们可以通过ResNeXt-101这样的更好的架构获得更好的结果。

与其他物体探测器（如YOLOv3）相比，Mask-RCNN网络可在更大的图像上运行。网络调整输入图像的大小，使得较小的边是800像素。下面我们将详细介绍获取实例分段结果所需的步骤。为了简化和清晰可视化，我们使用相同的颜色来表示同一类的对象，

Mask R-CNN简要理解见：

https://www.cnblogs.com/wangyong/p/9305347.html

2 Mask-RCNN在OpenCV中的使用

2.1 模型下载

模型下载地址：

http://download.tensorflow.org/models/object_detection/mask_rcnn_inception_v2_coco_2018_01_28.tar.gz

2.2 模型初始化

Mask-RCNN算法输出生成为边界框。每个边界框与置信度分数相关联。置信度阈值参数以下的都将被忽略。从网络输出的对象掩码是灰度图像。由于我们在本教程中使用二值掩码，因此我们使用maskThreshold参数来阈值灰色掩码图像。降低其值将获取更大的掩模。有时这有助于包括在边界附近遗漏的部分，但同时，它还可能包括更尖的边界区域处的背景像素。

文件mscoco_labels.names包含训练模型的所有预测对象。colors.txt文件包含用于标记各种类对象的所有颜色。

接下来，我们使用这两个文件加载网络

mask_rcnn_inception_v2_coco.pb：预先训练的权重；

mask_rcnn_inception_v2_coco.pbtxt：模型结构文件；

下载后的文件有个frozen_inference_graph.pb文件，我改成了mask_rcnn_inception_v2_coco.pb。

我们在这里将DNN后端设置为OpenCV，将目标设置为CPU。您可以尝试将首选目标设置为cv.dnn.DNN_TARGET_OPENCL以在GPU上运行它。当前OpenCV版本中的DNN模块仅使用英特尔的GPU进行测试。我们将读取图像，视频流或网络摄像头进行检测。

C++代码如下：

	//0-image,1-video,2-camera

	int read_file = 0;

	// Load names of classes 导入分类名文件

	string classesFile = "./model/mscoco_labels.names";

	ifstream ifs(classesFile.c_str());

	string line;

	while (getline(ifs, line))

	{

		classes.push_back(line);

	}

	// Load the colors 导入颜色类文件

	string colorsFile = "./model/colors.txt";

	ifstream colorFptr(colorsFile.c_str());

	while (getline(colorFptr, line))

	{

		char *pEnd;

		double r, g, b;

		//字符串转换成浮点数

		r = strtod(line.c_str(), &pEnd);

		g = strtod(pEnd, NULL);

		b = strtod(pEnd, NULL);

		Scalar color = Scalar(r, g, b, 255.0);

		colors.push_back(Scalar(r, g, b, 255.0));

	}

	// Give the configuration and weight files for the model

	String textGraph = "./model/mask_rcnn_inception_v2_coco.pbtxt";

	String modelWeights = "./model/mask_rcnn_inception_v2_coco.pb";

2.3 模型加载

神经网络的输入图像需要采用称为blob的特定格式。这一点类似于其他OpenCV调用深度学习网络的架构。但是用于我们调用的是tensorflow的模型，所以swapRGB参数设置为true。Caffe模型就不需要。

然后将blob作为其输入传递到网络，但是我们需要事先设定获取网络的输出层名字。因为输出有两个。这个模型有detection_out_final和detection_masks两个输出层。这两个输出层分别为预测框的输出结果层和掩模的输出结果层。我们将在下一节中滤除低置信度分数的框。

C++代码如下：

	// Load the network 导入网络

	Net net = readNetFromTensorflow(modelWeights, textGraph);

	net.setPreferableBackend(DNN_BACKEND_OPENCV);

	//只使用CPU

	net.setPreferableTarget(DNN_TARGET_CPU);

	// Open a video file or an image file or a camera stream.

	string str, outputFile;

	VideoCapture cap;

	VideoWriter video;

	Mat frame, blob;

	try

	{

		//输出文件，默认是视频

		outputFile = "mask_rcnn_out_cpp.avi";

		if (read_file == 0)

		{

			// Open the image file 打开图像文件

			str = "image/cars.jpg";

			//cout << "Image file input : " << str << endl;

			ifstream ifile(str);

			if (!ifile)

			{

				throw("error");

			}

			frame = imread(str);

			str.replace(str.end() - 4, str.end(), "_mask_rcnn_out.jpg");

			outputFile = str;

		}

		else if (read_file == 1)

		{

			// Open the video file 打开视频文件

			str = "./image/cars.mp4";

			ifstream ifile(str);

			if (!ifile)

			{

				throw("error");

			}

			cap.open(str);

			str.replace(str.end() - 4, str.end(), "_mask_rcnn_out.avi");

			outputFile = str;

		}

		// Open the webcam 打开摄像头

		else

		{

			cap.open(0);

		}

	}

	catch (...)

	{

		cout << "Could not open the input image/video stream" << endl;

		return 0;

	}

	// Get the video writer initialized to save the output video 如果读入的不是图像，生成输出视频

	if (read_file != 0)

	{

		video.open(outputFile, VideoWriter::fourcc('M', 'J', 'P', 'G'), 28,

				   Size(cap.get(CAP_PROP_FRAME_WIDTH), cap.get(CAP_PROP_FRAME_HEIGHT)));

	}

	// Create a window 显示窗口

	static const string kWinName = "Deep learning object detection in OpenCV";

	//Process frames 处理图像

	while (waitKey(1) < 0)

	{

		//如果是视频

		if (read_file != 0)

		{

			// get frame from the video 获取单帧图像

			cap >> frame;

		}

		// Stop the program if reached end of video 如果图像不存在

		if (frame.empty())

		{

			cout << "Done processing !!!" << endl;

			cout << "Output file is stored as " << outputFile << endl;

			waitKey(0);

			break;

		}

		// Create a 4D blob from a frame 获得深度学习的输入图像

		blobFromImage(frame, blob, 1.0, Size(frame.cols, frame.rows), Scalar(), true, false);

		//blobFromImage(frame, blob);

		//Sets the input to the network 设置输入

		net.setInput(blob);

		// Runs the forward pass to get output from the output layers 获得输出层

		std::vector<String> outNames(2);

		outNames[0] = "detection_out_final";

		outNames[1] = "detection_masks";

		vector<Mat> outs;

		net.forward(outs, outNames);

		// Extract the bounding box and mask for each of the detected objects 提取预测框和掩模

		postprocess(frame, outs);

		// Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)

		vector<double> layersTimes;

		double freq = getTickFrequency() / 1000;

		double t = net.getPerfProfile(layersTimes) / freq;

		string label = format("Mask-RCNN Inference time for a frame : %0.0f ms", t);

		putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0));

		// Write the frame with the detection boxes 保存结果

		Mat detectedFrame;

		frame.convertTo(detectedFrame, CV_8U);

		namedWindow(kWinName, WINDOW_NORMAL);

		imshow(kWinName, frame);

		//enter退出

		if (waitKey(1000) == 27)

		{

			break;

		}

		if (read_file == 0)

		{

			imwrite(outputFile, detectedFrame);

			break;

		}

		else

		{

			video.write(detectedFrame);

		}

	}

2.4 输出结果处理

网络的输出掩码对象是一个四维对象，其中第一维表示帧中检测到的框的数量，第二维表示模型中的类数，第三维和第四维表示掩模形状（15× 15）。如果框的置信度小于给定阈值，则删除边界框并且不考虑进行进一步处理。

C++代码如下：

/**

 * @brief For each frame, extract the bounding box and mask for each detected object 提取每张图像的预测框和掩模

 *

 * @param frame

 * @param outs

 */

void postprocess(Mat &frame, const vector<Mat> &outs)

{

	//预测框结果

	Mat outDetections = outs[0];

	//掩模结果

	Mat outMasks = outs[1];

	// Output size of masks is NxCxHxW where

	// N - number of detected boxes

	// C - number of classes (excluding background)

	// HxW - segmentation shape

	//预测的框个数

	const int numDetections = outDetections.size[2];

	//类别数

	const int numClasses = outMasks.size[1];

	outDetections = outDetections.reshape(1, outDetections.total() / 7);

	//筛选预测框数

	for (int i = 0; i < numDetections; ++i)

	{

		//提取预测框置信度

		float score = outDetections.at<float>(i, 2);

		//超过阈值

		if (score > confThreshold)

		{

			// Extract the bounding box

			//类别

			int classId = static_cast<int>(outDetections.at<float>(i, 1));

			int left = static_cast<int>(frame.cols * outDetections.at<float>(i, 3));

			int top = static_cast<int>(frame.rows * outDetections.at<float>(i, 4));

			int right = static_cast<int>(frame.cols * outDetections.at<float>(i, 5));

			int bottom = static_cast<int>(frame.rows * outDetections.at<float>(i, 6));

			//防止框画在外面

			left = max(0, min(left, frame.cols - 1));

			top = max(0, min(top, frame.rows - 1));

			right = max(0, min(right, frame.cols - 1));

			bottom = max(0, min(bottom, frame.rows - 1));

			Rect box = Rect(left, top, right - left + 1, bottom - top + 1);

			// Extract the mask for the object 提取掩模

			Mat objectMask(outMasks.size[2], outMasks.size[3], CV_32F, outMasks.ptr<float>(i, classId));

			// Draw bounding box, colorize and show the mask on the image

			drawBox(frame, classId, score, box, objectMask);

		}

	}

}

2.5 画图

最后，我们在输入图像上绘制通过滤后的框，其中包含指定的类标签和置信度分数。我们还将彩色掩模与其物体轮廓叠加在框内。在此代码中，我们对属于同一类的所有对象使用相同的颜色，但您也可以对不同的实例进行不同的着色。

C++代码如下：

/**

 * @brief  Draw the predicted bounding box, colorize and show the mask on the image 画图

 *

 * @param frame

 * @param classId

 * @param conf

 * @param box

 * @param objectMask

 */

void drawBox(Mat &frame, int classId, float conf, Rect box, Mat &objectMask)

{

	//Draw a rectangle displaying the bounding box 画预测框

	rectangle(frame, Point(box.x, box.y), Point(box.x + box.width, box.y + box.height), Scalar(255, 178, 50), 3);

	//Get the label for the class name and its confidence

	//置信度获取

	string label = format("%.2f", conf);

	//获取标签

	if (!classes.empty())

	{

		CV_Assert(classId < (int)classes.size());

		label = classes[classId] + ":" + label;

	}

	//Display the label at the top of the bounding box

	int baseLine;

	//获取字符串的高度和宽度

	//标签，字体，文本大小的倍数，文本粗细，文本最低点对应的纵坐标

	Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

	box.y = max(box.y, labelSize.height);

	//画框打标签

	rectangle(frame, Point(box.x, box.y - round(1.5 * labelSize.height)), Point(box.x + round(1.5 * labelSize.width), box.y + baseLine), Scalar(255, 255, 255), FILLED);

	putText(frame, label, Point(box.x, box.y), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 0, 0), 1);

	//填充颜色

	Scalar color = colors[classId % colors.size()];

	// Resize the mask, threshold, color and apply it on the image 重置大小

	resize(objectMask, objectMask, Size(box.width, box.height));

	Mat mask = (objectMask > maskThreshold);

	//叠加获得颜色掩模

	Mat coloredRoi = (0.3 * color + 0.7 * frame(box));

	coloredRoi.convertTo(coloredRoi, CV_8UC3);

	// Draw the contours on the image 画轮廓

	vector<Mat> contours;

	Mat hierarchy;

	mask.convertTo(mask, CV_8U);

	findContours(mask, contours, hierarchy, RETR_CCOMP, CHAIN_APPROX_SIMPLE);

	drawContours(coloredRoi, contours, -1, color, 5, LINE_8, hierarchy, 100);

	coloredRoi.copyTo(frame(box), mask);

}

3 结果和代码

3.1 结果

Mask R-CNN精度不错，但是速度很慢。不过未来的应用趋势吧。下图是原图和结果，但是结果是debug模式下跑的，实际快很多。CPUI5标压无gpu下差不多5000ms一帧。

原图：

检测结果图像：

3.2 代码

代码地址：

https://github.com/luohenyueji/OpenCV-Practical-Exercise

C++版本代码：

// Mask R-CNN in OpenCV.cpp : 此文件包含 "main" 函数。程序执行将在此处开始并结束。

//

#include "pch.h"

#include <iostream>

#include <fstream>

#include <sstream>

#include <iostream>

#include <string.h>

#include <opencv2/dnn.hpp>

#include <opencv2/imgproc.hpp>

#include <opencv2/highgui.hpp>

using namespace cv;

using namespace dnn;

using namespace std;

// Initialize the parameters

// Confidence threshold 置信度阈值

float confThreshold = 0.5;

// Mask threshold 掩模阈值

float maskThreshold = 0.3;

vector<string> classes;

vector<Scalar> colors;

// Draw the predicted bounding box

void drawBox(Mat &frame, int classId, float conf, Rect box, Mat &objectMask);

// Postprocess the neural network's output for each frame

void postprocess(Mat &frame, const vector<Mat> &outs);

int main()

{

	//0-image,1-video,2-camera

	int read_file = 0;

	// Load names of classes 导入分类名文件

	string classesFile = "./model/mscoco_labels.names";

	ifstream ifs(classesFile.c_str());

	string line;

	while (getline(ifs, line))

	{

		classes.push_back(line);

	}

	// Load the colors 导入颜色类文件

	string colorsFile = "./model/colors.txt";

	ifstream colorFptr(colorsFile.c_str());

	while (getline(colorFptr, line))

	{

		char *pEnd;

		double r, g, b;

		//字符串转换成浮点数

		r = strtod(line.c_str(), &pEnd);

		g = strtod(pEnd, NULL);

		b = strtod(pEnd, NULL);

		Scalar color = Scalar(r, g, b, 255.0);

		colors.push_back(Scalar(r, g, b, 255.0));

	}

	// Give the configuration and weight files for the model

	String textGraph = "./model/mask_rcnn_inception_v2_coco.pbtxt";

	String modelWeights = "./model/mask_rcnn_inception_v2_coco.pb";

	// Load the network 导入网络

	Net net = readNetFromTensorflow(modelWeights, textGraph);

	net.setPreferableBackend(DNN_BACKEND_OPENCV);

	//只使用CPU

	net.setPreferableTarget(DNN_TARGET_CPU);

	// Open a video file or an image file or a camera stream.

	string str, outputFile;

	VideoCapture cap;

	VideoWriter video;

	Mat frame, blob;

	try

	{

		//输出文件，默认是视频

		outputFile = "mask_rcnn_out_cpp.avi";

		if (read_file == 0)

		{

			// Open the image file 打开图像文件

			str = "image/cars.jpg";

			//cout << "Image file input : " << str << endl;

			ifstream ifile(str);

			if (!ifile)

			{

				throw("error");

			}

			frame = imread(str);

			str.replace(str.end() - 4, str.end(), "_mask_rcnn_out.jpg");

			outputFile = str;

		}

		else if (read_file == 1)

		{

			// Open the video file 打开视频文件

			str = "./image/cars.mp4";

			ifstream ifile(str);

			if (!ifile)

			{

				throw("error");

			}

			cap.open(str);

			str.replace(str.end() - 4, str.end(), "_mask_rcnn_out.avi");

			outputFile = str;

		}

		// Open the webcam 打开摄像头

		else

		{

			cap.open(0);

		}

	}

	catch (...)

	{

		cout << "Could not open the input image/video stream" << endl;

		return 0;

	}

	// Get the video writer initialized to save the output video 如果读入的不是图像，生成输出视频

	if (read_file != 0)

	{

		video.open(outputFile, VideoWriter::fourcc('M', 'J', 'P', 'G'), 28,

				   Size(cap.get(CAP_PROP_FRAME_WIDTH), cap.get(CAP_PROP_FRAME_HEIGHT)));

	}

	// Create a window 显示窗口

	static const string kWinName = "Deep learning object detection in OpenCV";

	//Process frames 处理图像

	while (waitKey(1) < 0)

	{

		//如果是视频

		if (read_file != 0)

		{

			// get frame from the video 获取单帧图像

			cap >> frame;

		}

		// Stop the program if reached end of video 如果图像不存在

		if (frame.empty())

		{

			cout << "Done processing !!!" << endl;

			cout << "Output file is stored as " << outputFile << endl;

			waitKey(0);

			break;

		}

		// Create a 4D blob from a frame 获得深度学习的输入图像

		blobFromImage(frame, blob, 1.0, Size(frame.cols, frame.rows), Scalar(), true, false);

		//blobFromImage(frame, blob);

		//Sets the input to the network 设置输入

		net.setInput(blob);

		// Runs the forward pass to get output from the output layers 获得输出层

		std::vector<String> outNames(2);

		outNames[0] = "detection_out_final";

		outNames[1] = "detection_masks";

		vector<Mat> outs;

		net.forward(outs, outNames);

		// Extract the bounding box and mask for each of the detected objects 提取预测框和掩模

		postprocess(frame, outs);

		// Put efficiency information. The function getPerfProfile returns the overall time for inference(t) and the timings for each of the layers(in layersTimes)

		vector<double> layersTimes;

		double freq = getTickFrequency() / 1000;

		double t = net.getPerfProfile(layersTimes) / freq;

		string label = format("Mask-RCNN Inference time for a frame : %0.0f ms", t);

		putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 0));

		// Write the frame with the detection boxes 保存结果

		Mat detectedFrame;

		frame.convertTo(detectedFrame, CV_8U);

		namedWindow(kWinName, WINDOW_NORMAL);

		imshow(kWinName, frame);

		//enter退出

		if (waitKey(1000) == 27)

		{

			break;

		}

		if (read_file == 0)

		{

			imwrite(outputFile, detectedFrame);

			break;

		}

		else

		{

			video.write(detectedFrame);

		}

	}

	cap.release();

	//释放生成的视频

	if (read_file != 0)

	{

		video.release();

	}

	return 0;

}

/**

 * @brief For each frame, extract the bounding box and mask for each detected object 提取每张图像的预测框和掩模

 *

 * @param frame

 * @param outs

 */

void postprocess(Mat &frame, const vector<Mat> &outs)

{

	//预测框结果

	Mat outDetections = outs[0];

	//掩模结果

	Mat outMasks = outs[1];

	// Output size of masks is NxCxHxW where

	// N - number of detected boxes

	// C - number of classes (excluding background)

	// HxW - segmentation shape

	//预测的框个数

	const int numDetections = outDetections.size[2];

	//类别数

	const int numClasses = outMasks.size[1];

	outDetections = outDetections.reshape(1, outDetections.total() / 7);

	//筛选预测框数

	for (int i = 0; i < numDetections; ++i)

	{

		//提取预测框置信度

		float score = outDetections.at<float>(i, 2);

		//超过阈值

		if (score > confThreshold)

		{

			// Extract the bounding box

			//类别

			int classId = static_cast<int>(outDetections.at<float>(i, 1));

			int left = static_cast<int>(frame.cols * outDetections.at<float>(i, 3));

			int top = static_cast<int>(frame.rows * outDetections.at<float>(i, 4));

			int right = static_cast<int>(frame.cols * outDetections.at<float>(i, 5));

			int bottom = static_cast<int>(frame.rows * outDetections.at<float>(i, 6));

			//防止框画在外面

			left = max(0, min(left, frame.cols - 1));

			top = max(0, min(top, frame.rows - 1));

			right = max(0, min(right, frame.cols - 1));

			bottom = max(0, min(bottom, frame.rows - 1));

			Rect box = Rect(left, top, right - left + 1, bottom - top + 1);

			// Extract the mask for the object 提取掩模

			Mat objectMask(outMasks.size[2], outMasks.size[3], CV_32F, outMasks.ptr<float>(i, classId));

			// Draw bounding box, colorize and show the mask on the image

			drawBox(frame, classId, score, box, objectMask);

		}

	}

}

/**

 * @brief  Draw the predicted bounding box, colorize and show the mask on the image 画图

 *

 * @param frame

 * @param classId

 * @param conf

 * @param box

 * @param objectMask

 */

void drawBox(Mat &frame, int classId, float conf, Rect box, Mat &objectMask)

{

	//Draw a rectangle displaying the bounding box 画预测框

	rectangle(frame, Point(box.x, box.y), Point(box.x + box.width, box.y + box.height), Scalar(255, 178, 50), 3);

	//Get the label for the class name and its confidence

	//置信度获取

	string label = format("%.2f", conf);

	//获取标签

	if (!classes.empty())

	{

		CV_Assert(classId < (int)classes.size());

		label = classes[classId] + ":" + label;

	}

	//Display the label at the top of the bounding box

	int baseLine;

	//获取字符串的高度和宽度

	//标签，字体，文本大小的倍数，文本粗细，文本最低点对应的纵坐标

	Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

	box.y = max(box.y, labelSize.height);

	//画框打标签

	rectangle(frame, Point(box.x, box.y - round(1.5 * labelSize.height)), Point(box.x + round(1.5 * labelSize.width), box.y + baseLine), Scalar(255, 255, 255), FILLED);

	putText(frame, label, Point(box.x, box.y), FONT_HERSHEY_SIMPLEX, 0.75, Scalar(0, 0, 0), 1);

	//填充颜色

	Scalar color = colors[classId % colors.size()];

	// Resize the mask, threshold, color and apply it on the image 重置大小

	resize(objectMask, objectMask, Size(box.width, box.height));

	Mat mask = (objectMask > maskThreshold);

	//叠加获得颜色掩模

	Mat coloredRoi = (0.3 * color + 0.7 * frame(box));

	coloredRoi.convertTo(coloredRoi, CV_8UC3);

	// Draw the contours on the image 画轮廓

	vector<Mat> contours;

	Mat hierarchy;

	mask.convertTo(mask, CV_8U);

	findContours(mask, contours, hierarchy, RETR_CCOMP, CHAIN_APPROX_SIMPLE);

	drawContours(coloredRoi, contours, -1, color, 5, LINE_8, hierarchy, 100);

	coloredRoi.copyTo(frame(box), mask);

}

Python版本代码：

import cv2 as cv

import numpy as np

import os.path

import sys

import random

# Initialize the parameters

confThreshold = 0.5  # Confidence threshold

maskThreshold = 0.3  # Mask threshold

# Draw the predicted bounding box, colorize and show the mask on the image

def drawBox(frame, classId, conf, left, top, right, bottom, classMask):

    # Draw a bounding box.

    cv.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 3)

    # Print a label of class.

    label = '%.2f' % conf

    if classes:

        assert(classId < len(classes))

        label = '%s:%s' % (classes[classId], label)

    # Display the label at the top of the bounding box

    labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)

    top = max(top, labelSize[1])

    cv.rectangle(frame, (left, top - round(1.5*labelSize[1])), (left + round(1.5*labelSize[0]), top + baseLine), (255, 255, 255), cv.FILLED)

    cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.75, (0,0,0), 1)

    # Resize the mask, threshold, color and apply it on the image

    classMask = cv.resize(classMask, (right - left + 1, bottom - top + 1))

    mask = (classMask > maskThreshold)

    roi = frame[top:bottom+1, left:right+1][mask]

    # color = colors[classId%len(colors)]

    # Comment the above line and uncomment the two lines below to generate different instance colors

    colorIndex = random.randint(0, len(colors)-1)

    color = colors[colorIndex]

    frame[top:bottom+1, left:right+1][mask] = ([0.3*color[0], 0.3*color[1], 0.3*color[2]] + 0.7 * roi).astype(np.uint8)

    # Draw the contours on the image

    mask = mask.astype(np.uint8)

    contours, hierarchy = cv.findContours(mask,cv.RETR_TREE,cv.CHAIN_APPROX_SIMPLE)

    cv.drawContours(frame[top:bottom+1, left:right+1], contours, -1, color, 3, cv.LINE_8, hierarchy, 100)

# For each frame, extract the bounding box and mask for each detected object

def postprocess(boxes, masks):

    # Output size of masks is NxCxHxW where

    # N - number of detected boxes

    # C - number of classes (excluding background)

    # HxW - segmentation shape

    numClasses = masks.shape[1]

    numDetections = boxes.shape[2]

    frameH = frame.shape[0]

    frameW = frame.shape[1]

    for i in range(numDetections):

        box = boxes[0, 0, i]

        mask = masks[i]

        score = box[2]

        if score > confThreshold:

            classId = int(box[1])

            # Extract the bounding box

            left = int(frameW * box[3])

            top = int(frameH * box[4])

            right = int(frameW * box[5])

            bottom = int(frameH * box[6])

            left = max(0, min(left, frameW - 1))

            top = max(0, min(top, frameH - 1))

            right = max(0, min(right, frameW - 1))

            bottom = max(0, min(bottom, frameH - 1))

            # Extract the mask for the object

            classMask = mask[classId]

            # Draw bounding box, colorize and show the mask on the image

            drawBox(frame, classId, score, left, top, right, bottom, classMask)

# Load names of classes

classesFile = "./model/mscoco_labels.names";

classes = None

with open(classesFile, 'rt') as f:

   classes = f.read().rstrip('\n').split('\n')

# Give the textGraph and weight files for the model

textGraph = "./model/mask_rcnn_inception_v2_coco.pbtxt";

modelWeights = "./model/mask_rcnn_inception_v2_coco.pb";

# Load the network

net = cv.dnn.readNetFromTensorflow(modelWeights, textGraph);

net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)

net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)

# Load the classes

colorsFile = "./model/colors.txt";

with open(colorsFile, 'rt') as f:

    colorsStr = f.read().rstrip('\n').split('\n')

colors = [] #[0,0,0]

for i in range(len(colorsStr)):

    rgb = colorsStr[i].split(' ')

    color = np.array([float(rgb[0]), float(rgb[1]), float(rgb[2])])

    colors.append(color)

winName = 'Mask-RCNN Object detection and Segmentation in OpenCV'

cv.namedWindow(winName, cv.WINDOW_NORMAL)

#image,video,none

input_file="image"

input_file_name="./image/cars.jpg"

outputFile = "mask_rcnn_out_py.avi"

if (input_file is "image"):

    # Open the image file

    if not os.path.isfile(input_file_name):

        print("Input image file ", input_file_name, " doesn't exist")

        sys.exit(1)

    cap = cv.VideoCapture(input_file_name)

    outputFile = input_file_name[:-4]+'_mask_rcnn_out_py.jpg'

elif (input_file is "video"):

    # Open the video file

    if not os.path.isfile(input_file_name):

        print("Input video file ", input_file_name, " doesn't exist")

        sys.exit(1)

    cap = cv.VideoCapture(input_file_name)

    outputFile = input_file_name[:-4]+'_mask_rcnn_out_py.avi'

else:

    # Webcam input

    cap = cv.VideoCapture(0)

# Get the video writer initialized to save the output video

if (input_file is not "image"):

    vid_writer = cv.VideoWriter(outputFile, cv.VideoWriter_fourcc('M','J','P','G'), 28, (round(cap.get(cv.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv.CAP_PROP_FRAME_HEIGHT))))

while cv.waitKey(1) < 0:

    # Get frame from the video

    hasFrame, frame = cap.read()

    # Stop the program if reached end of video

    if not hasFrame:

        print("Done processing !!!")

        print("Output file is stored as ", outputFile)

        cv.waitKey(3000)

        break

    # Create a 4D blob from a frame.

    blob = cv.dnn.blobFromImage(frame, swapRB=True, crop=False)

    # Set the input to the network

    net.setInput(blob)

    # Run the forward pass to get output from the output layers

    boxes, masks = net.forward(['detection_out_final', 'detection_masks'])

    # Extract the bounding box and mask for each of the detected objects

    postprocess(boxes, masks)

    # Put efficiency information.

    t, _ = net.getPerfProfile()

    label = 'Inference time for a frame : %0.0f ms' % abs(t * 1000.0 / cv.getTickFrequency())

    cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))

    # Write the frame with the detection boxes

    if (input_file is "image"):

        cv.imwrite(outputFile, frame.astype(np.uint8));

    else:

        vid_writer.write(frame.astype(np.uint8))

    cv.imshow(winName, frame)

4 参考

https://www.learnopencv.com/deep-learning-based-object-detection-and-instance-segmentation-using-mask-r-cnn-in-opencv-python-c/

[OpenCV实战]13 OpenCV中使用Mask R-CNN进行对象检测和实例分割的更多相关文章

CVPR目标检测与实例分割算法解析：FCOS（2019），Mask R-CNN（2019），PolarMask（2020）
CVPR目标检测与实例分割算法解析:FCOS(2019),Mask R-CNN(2019),PolarMask(2020)1. 目标检测:FCOS(CVPR 2019)目标检测算法FCOS(FCOS: ...
[OpenCV实战]4 OpenCV中的颜色空间
目录 1 不同的色彩空间 1.1RGB颜色空间 1.2 Lab色彩空间 1.3 YCrCb颜色空间 1.4 HSV颜色空间 2 如何使用这些颜色空间进行分割 2.1 获取特定颜色的颜色值 2.2 应 ...
[OpenCV实战]18 Opencv中的单应性矩阵Homography
目录 1 介绍 1.1 什么是Homography 1.2 使用Homography进行图像对齐 1.3 Homography的应用-全景拼接 2 Homography的计算 3 总结 4 参考 &l ...
CVPR2019 | 超越Mask R-CNN！华科开源图像实例分割新方法MS R-CNN
安妮乾明发自凹非寺本文转载自量子位(QbitAI) 实习生又立功了! 这一次,亮出好成绩的实习生来自地平线,是一名华中科技大学的硕士生. 他作为第一作者完成的研究Mask Scoring R- ...
经典论文系列| 实例分割中的新范式-SOLO
前言: 这是实例分割中的一篇经典论文,以往的实例分割模型都比较复杂,这篇论文提出了一个简单且直接的实例分割模型,如何设计这种简单直接的模型且要达到一定的精度往往会存在一些困难,论文中有很多思路或思想值 ...
[OpenCV实战]8 深度学习目标检测网络YOLOv3的训练
目录 1 数据集 1.1 下载openImages雪人数据[约1.5小时] 1.2 训练集测试集拆分 2 Darknet 2.1 下载并构建Darknet 2.2 修改代码以定期保存模型文件 2.3 ...
OpenCV实战：人脸关键点检测（FaceMark）
Summary:利用OpenCV中的LBF算法进行人脸关键点检测(Facial Landmark Detection) Author: Amusi Date: 2018-03-20 ...
opencv实战——图像矫正算法深入探讨
摘要在机器视觉中,对于图像的处理有时候因为放置的原因导致ROI区域倾斜,这个时候我们会想办法把它纠正为正确的角度视角来,方便下一步的布局分析与文字识别,这个时候通过透视变换就可以取得比较好的裁剪效果 ...
[OpenCV实战]29 使用OpenCV实现红眼自动去除
目录 1 红眼消除 1.1 眼部检测 1.2 红眼遮掩 1.3 清除瞳孔掩模空洞 1.4 红眼修复 2 结果与完整代码 2.1 结果 2.2 代码 3 参考在本教程中,我们将学习如何完全自动地从照片 ...

随机推荐

Flink WordCount入门
下面通过一个单词统计的案例,快速上手应用 Flink,进行流处理(Streaming)和批处理(Batch) 单词统计(批处理) 引入依赖  <depe ...
Linux实战笔记_CentOS7_无法识别NTFS格式的U盘
注:因为CentOS 默认不识别NTFS的磁盘格式,所以我们要借助另外一个软件ntfs-3g来挂载.自带的yum源没有这个软件,要用第三方的软件源,比如阿里的epel. #安装ntfs-3g cd / ...
配置文件yaml和ini
前言本文主要介绍配置文件yaml和ini的读取. 一.yaml文件 YAML是一个可读性高,易于理解,用来表达数据序列化的格式.语法与python的语法类似.文件后缀 .yaml 下面是yaml文 ...
SpringBoot 过滤器和拦截器
过滤器实现过滤器需要实现 javax.servlet.Filter 接口.重写三个方法.其中 init() 方法在服务启动时执行,destroy() 在服务停止之前执行. 可用两种方式注册过滤器: ...
Python基础部分：8、for循环和range的使用
目录一.while循环补充说明 1.死循环 2.嵌套及全局标志位二.for...循环 1.for...循环特点 2.for...循环语法结构三.range方法 1.什么是range 2.不同版本 ...
CodeTON Round 3 (C.差分维护，D.容斥原理)
C. Complementary XOR 题目大意: 给你两个01串ab,问你是否可以通过一下两种操作在不超过n+5次的前提下将两个串都变为0,同时需要输出可以的操作方案选择一个区间[l,r] 将串 ...
Day1：Markdown文本基础操作
Markdown学习标题一级标题输入:#+空格+标题名字 :快捷键(Ctrl+1~6) 二级标题输入:##+空格+标题名字为二级标题依次加#(最多六级标题) 三级标题四级标题字体 He ...
C#winform使用NOPI读取Excel读取图片
需求:在Winform使用NOPI做导入时候,需要导入数据的同时导入图片. 虽然代码方面不适用(我好像也没仔细看过代码),但是感谢大佬给了灵感http://www.wjhsh.net/IT-Ramon ...
i春秋Hash
打开题目页面是一个超链接先查看源码得到一段信息没什么,这只是超链接跳转的信息点击跳转到下一个界面提示我们如果我们的传入不是123,123也就是key了,在上面的url就可以看到如果key!= ...
解决sox报错sox FAIL formats: no handler for given file type `mp3
sudo apt-get install libsox-fmt-mp3

[OpenCV实战]13 OpenCV中使用Mask R-CNN进行对象检测和实例分割

1 背景介绍

1.1 什么是图像分割和实例分割

1.2 Mask-RCNN原理

2 Mask-RCNN在OpenCV中的使用

2.1 模型下载

2.2 模型初始化

2.3 模型加载

2.4 输出结果处理

2.5 画图

3 结果和代码

3.1 结果

3.2 代码

4 参考

[OpenCV实战]13 OpenCV中使用Mask R-CNN进行对象检测和实例分割的更多相关文章

随机推荐

热门专题