1 Loading the network

2 Reading the image

3 Forward pass

4 Processing the output

5 Results and code

5.1 Results

5.2 Code

References

In this post we will try to find the words in a picture, word by word! The text detection is based on a recent paper:

EAST: An Efficient and Accurate Scene Text Detector.

https://arxiv.org/abs/1704.03155v2

https://github.com/argman/EAST

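Note that text detection is different from text recognition. In text detection we only find the bounding boxes around the text. In text recognition we actually work out what is written inside the box. For example, for an image containing the word STOP, text detection gives you the bounding box around the word, and text recognition tells you that the box contains the word STOP. This post covers text detection only.

This post uses a TensorFlow model and calls it through OpenCV. We will go through how the algorithm works step by step. You will need OpenCV 3.4.3 or later to run the code. Other models are loaded through the OpenCV DNN module in much the same way.

The steps involved are:

Download the EAST model
Load the model into memory
Prepare the input image
Forward pass the blob through the network
Process the output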

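1 Loading the network

We use the cv::dnn::readNet function (C++) or cv2.dnn.readNet (Python) to load the network into memory. It detects the configuration and framework automatically from the file name it is given. In our case the file is a .pb file, so it assumes a TensorFlow network is being loaded. Only this one file is passed; there is no separate file describing the model structure.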

C++

Net net = readNet(model);

Python

net = cv.dnn.readNet(model)
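
To make "detects the framework from the file name" concrete, here is a minimal sketch of the idea in plain Python. The table reflects my reading of the readNet documentation, and the exact set of supported formats depends on the OpenCV version. The names FRAMEWORK_BY_EXTENSION and guess_framework are hypothetical and exist only for this illustration; this is not OpenCV's actual implementation.

```python
import os

# Illustration only: a file-extension lookup in the spirit of readNet.
# The table is an assumption based on the readNet documentation.
FRAMEWORK_BY_EXTENSION = {
    ".pb": "TensorFlow",
    ".caffemodel": "Caffe",
    ".weights": "Darknet",
    ".t7": "Torch",
    ".net": "Torch",
    ".onnx": "ONNX",
}


def guess_framework(model_path):
    """Return the framework suggested by the file extension, or None."""
    ext = os.path.splitext(model_path)[1].lower()
    return FRAMEWORK_BY_EXTENSION.get(ext)


print(guess_framework("./model/frozen_east_text_detection.pb"))  # TensorFlow
```

If the extension is not recognised, the sketch returns None; with the real function you would instead pass the framework name explicitly or use a framework-specific loader.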

2 Reading the image

We need to create a 4-D input blob to feed the image to the network. This is done with the blobFromImage function.

C++

blobFromImage(frame, blob, 1.0, Size(inpWidth, inpHeight), Scalar(123.68, 116.78, 103.94), true, false);

Python

blob = cv.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)

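This function takes several arguments. They are listed here in the order of the Python call; the C++ call additionally passes the output blob as its second argument.

The first argument is the image itself.
The second argument is the scale factor applied to each pixel value. It is not needed here, so we keep it at 1.
The third argument is the input size. The network's default input is 320×320, so we specify that when creating the blob. It is best to match the network's input size.
The fourth argument is the mean that was used when the model was trained. It is subtracted from the image.
The fifth argument says whether to swap the R and B channels. The swap is needed here because OpenCV uses BGR order while TensorFlow uses RGB. (Caffe models use BGR, so they would not need it.)
The last argument says whether to crop the image and take the center crop. We pass False here.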
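
The arguments above can be sketched in plain Python to show what ends up in the blob. This is a simplified illustration, not OpenCV's code: the function name to_blob is hypothetical, resizing and cropping are left out, and the image is a nested list rather than a Mat or NumPy array. It assumes, as the OpenCV documentation describes, that the mean is subtracted first and the scale factor applied afterwards.

```python
def to_blob(image_bgr, scale=1.0, mean=(123.68, 116.78, 103.94), swap_rb=True):
    """Turn an H x W x 3 nested list of BGR pixels into a 1 x 3 x H x W nested list.

    Resizing and cropping are omitted to keep the sketch short.
    """
    height, width = len(image_bgr), len(image_bgr[0])
    # With swap_rb the output channels are R, G, B; otherwise they stay B, G, R.
    source_channel = (2, 1, 0) if swap_rb else (0, 1, 2)
    planes = []
    for out_c, src_c in enumerate(source_channel):
        # Subtract the per-channel mean, then apply the scale factor.
        plane = [[(image_bgr[y][x][src_c] - mean[out_c]) * scale
                  for x in range(width)]
                 for y in range(height)]
        planes.append(plane)
    return [planes]  # leading batch dimension of size 1


image = [[(103.94, 116.78, 123.68), (0, 0, 255)]]  # 1 x 2 image, BGR pixels
blob = to_blob(image)
print(len(blob), len(blob[0]), len(blob[0][0]), len(blob[0][0][0]))  # 1 3 1 2
```

The four printed numbers are the batch size, channel count, height and width, which is the layout the network expects.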

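3 Forward pass

Now that the input is ready, we pass it through the network. The network has two outputs: one gives the positions of the text boxes, the other gives the confidence score of each detected box. The two output layers are: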

feature_fusion/concat_3

feature_fusion/Conv_7/Sigmoid

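You can see both outputs at the end of the graph by opening the .pb model in Netron. Netron is a model-structure visualizer that supports TensorFlow, Caffe, Keras, MXNet and other frameworks. It can be downloaded from: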

https://github.com/lutzroeder/Netron

The C++ code that names the output layers is:

std::vector<String> outputLayers(2);

outputLayers[0] = "feature_fusion/Conv_7/Sigmoid";

outputLayers[1] = "feature_fusion/concat_3";

The Python code that names the output layers is:

outputLayers = []

outputLayers.append("feature_fusion/Conv_7/Sigmoid")

outputLayers.append("feature_fusion/concat_3")

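Next, we get the output by passing the input image through the network. As mentioned, the output has two parts: the confidence scores and the geometry (positions).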

C++

std::vector<Mat> output;

net.setInput(blob);

net.forward(output, outputLayers);

Mat scores = output[0];

Mat geometry = output[1];

Python

net.setInput(blob)

output = net.forward(outputLayers)

scores = output[0]

geometry = output[1]
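
It helps to know what shapes to expect from the two outputs. The sketch below derives them from two facts stated in the full code of this post: the decode function asserts that scores has 1 channel and geometry has 5, and the feature maps are 4 times smaller than the input. The multiple-of-32 check comes from the help text of the script's width and height arguments. The function name expected_output_shapes is hypothetical.

```python
def expected_output_shapes(inp_width, inp_height):
    """Return the (scores, geometry) shapes expected for a given input size."""
    if inp_width % 32 or inp_height % 32:
        raise ValueError("width and height should be multiples of 32")
    # The output feature maps are 4 times smaller than the network input.
    h, w = inp_height // 4, inp_width // 4
    scores_shape = (1, 1, h, w)    # one confidence value per cell
    geometry_shape = (1, 5, h, w)  # four distances and one angle per cell
    return scores_shape, geometry_shape


print(expected_output_shapes(320, 320))  # ((1, 1, 80, 80), (1, 5, 80, 80))
```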

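4 Processing the output

As mentioned, we use the outputs of both layers to decode the positions and orientations of the text boxes. We may get many candidate boxes, so we need to keep only the best ones. This is done with non-maximum suppression (NMS).

Non-maximum suppression is widely used in object detection. For details see: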

http://www.it610.com/article/5215825.htm

https://blog.csdn.net/qq_14845119/article/details/52064928

4.1 Decoding

C++:

std::vector<RotatedRect> boxes;

std::vector<float> confidences;

decode(scores, geometry, confThreshold, boxes, confidences);

Python:

[boxes, confidences] = decode(scores, geometry, confThreshold)
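
To see what decode computes, here is the arithmetic for a single cell of the feature map, lifted from the decode function in the full listing of this post. The function name decode_cell and the sample numbers are hypothetical. Reading the four geometry channels as distances from the cell to the top, right, bottom and left edges of the box is my interpretation of the EAST output; the code itself only relies on channels 0 and 2 adding up to the height and channels 1 and 3 adding up to the width.

```python
import math


def decode_cell(x, y, d0, d1, d2, d3, angle):
    """Decode one feature-map cell into (center, (w, h), angle in degrees)."""
    # Feature maps are 4 times smaller than the network input.
    offset_x, offset_y = x * 4.0, y * 4.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    h = d0 + d2
    w = d1 + d3
    off = (offset_x + cos_a * d1 + sin_a * d2,
           offset_y - sin_a * d1 + cos_a * d2)
    p1 = (-sin_a * h + off[0], -cos_a * h + off[1])
    p3 = (-cos_a * w + off[0], sin_a * w + off[1])
    center = (0.5 * (p1[0] + p3[0]), 0.5 * (p1[1] + p3[1]))
    return center, (w, h), -angle * 180.0 / math.pi


# An upright box whose edges are equally far from the cell on opposite sides.
center, size, angle_deg = decode_cell(10, 5, 8.0, 12.0, 8.0, 12.0, 0.0)
print(center, size)  # (40.0, 20.0) (24.0, 16.0)
```

With symmetric distances and zero angle the center falls exactly on the cell position scaled by 4, which is a handy sanity check when debugging the decoder.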

4.2 Non-maximum suppression

We use the OpenCV function NMSBoxes (C++) or NMSBoxesRotated (Python) to filter out false positives and get the final predictions.

C++:

std::vector<int> indices;

NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);

Python:

indices = cv.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)
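
For readers new to non-maximum suppression, the sketch below shows the greedy idea on axis-aligned boxes. It is a deliberate simplification: the OpenCV functions above work on rotated rectangles and their internals may differ. The names iou and nms and the (x1, y1, x2, y2) box format are assumptions made for this example only.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, score_thr, nms_thr):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    # Drop low-confidence boxes, then visit the rest from best to worst.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= nms_thr for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5, 0.4))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, above the 0.4 threshold, so it is suppressed; the third box does not overlap anything and is kept.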

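5 Results and code

5.1 Results

I ran the C++ code under VS2017. The OpenCV version had to be at least 3.4.5, otherwise the model fails to load. The model file is too large to include here; see the download links: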

https://download.csdn.net/download/luohenyj/11003000

https://github.com/luohenyueji/OpenCV-Practical-Exercise

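If you have no CSDN points (the site sets the cost of a resource automatically), see the reference link. I ported this material from there without major changes.

Alternatively, if you can reach Dropbox (for example through a proxy), download the model directly: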

https://www.dropbox.com/s/r2ingd0l3zt8hxs/frozen_east_text_detection.tar.gz?dl=1

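The detection quality is quite good and the speed is acceptable.

5.2 Code

The C++ code has been modified relative to the reference; the Python code stays close to it. I am not familiar with text detection, so there are few comments, but the code should not need major changes in practice.

C++ code: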

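// text_detection.cpp : contains the main function. Program execution begins and ends there.
//
#include "pch.h" // Visual Studio precompiled header; remove this line on other toolchains
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

using namespace std;
using namespace cv;
using namespace cv::dnn;

// Decode the raw network outputs into rotated boxes and their confidences
void decode(const Mat &scores, const Mat &geometry, float scoreThresh,
	std::vector<RotatedRect> &detections, std::vector<float> &confidences);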

/**
 * @brief Detect text in an image and draw the detected boxes
 *
 * @param srcImg image to run detection on
 * @param inpWidth network input width
 * @param inpHeight network input height
 * @param confThreshold confidence threshold
 * @param nmsThreshold non-maximum suppression threshold
 * @param net loaded network
 * @return Mat copy of the image with the boxes and the inference time drawn on it
 */
Mat text_detect(Mat srcImg, int inpWidth, int inpHeight, float confThreshold, float nmsThreshold, Net net)
{
	// Names of the two output layers: scores first, geometry second
	std::vector<Mat> output;
	std::vector<String> outputLayers(2);
	outputLayers[0] = "feature_fusion/Conv_7/Sigmoid";
	outputLayers[1] = "feature_fusion/concat_3";

	// Work on a copy so the caller's image is left untouched
	Mat frame, blob;
	frame = srcImg.clone();

	// Build the network input blob
	blobFromImage(frame, blob, 1.0, Size(inpWidth, inpHeight), Scalar(123.68, 116.78, 103.94), true, false);
	net.setInput(blob);

	// Run the forward pass
	net.forward(output, outputLayers);
	// Confidence scores
	Mat scores = output[0];
	// Box geometry
	Mat geometry = output[1];

	// Decode the predicted bounding boxes to get the position and orientation of each text box
	std::vector<RotatedRect> boxes;
	std::vector<float> confidences;
	decode(scores, geometry, confThreshold, boxes, confidences);

	// Apply non-maximum suppression; indices holds the boxes that survive
	std::vector<int> indices;
	NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);

	// Render detections
	// Ratio between the original image size and the network input size
	Point2f ratio((float)frame.cols / inpWidth, (float)frame.rows / inpHeight);
	for (size_t i = 0; i < indices.size(); ++i)
	{
		RotatedRect &box = boxes[indices[i]];
		Point2f vertices[4];
		box.points(vertices);
		// Map the corner points back to original image coordinates
		for (int j = 0; j < 4; ++j)
		{
			vertices[j].x *= ratio.x;
			vertices[j].y *= ratio.y;
		}
		// Draw the box
		for (int j = 0; j < 4; ++j)
		{
			line(frame, vertices[j], vertices[(j + 1) % 4], Scalar(0, 255, 0), 2, LINE_AA);
		}
	}

	// Put efficiency information: inference time in milliseconds
	std::vector<double> layersTimes;
	double freq = getTickFrequency() / 1000;
	double t = net.getPerfProfile(layersTimes) / freq;
	std::string label = format("Inference time: %.2f ms", t);
	putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
	return frame;
}

// Path to the model
auto model = "./model/frozen_east_text_detection.pb";
// Image to run detection on
auto detect_image = "./image/patient.jpg";
// Network input size
auto inpWidth = 320;
auto inpHeight = 320;
// Confidence threshold
auto confThreshold = 0.5f;
// Non-maximum suppression threshold
auto nmsThreshold = 0.4f;

int main()
{
	// Load the model
	Net net = readNet(model);
	// Read the image to run detection on
	Mat srcImg = imread(detect_image);
	if (srcImg.empty())
	{
		// Stop here: running the network on an empty image would fail
		cerr << "failed to read image: " << detect_image << endl;
		return -1;
	}
	cout << "read image success!" << endl;
	Mat resultImg = text_detect(srcImg, inpWidth, inpHeight, confThreshold, nmsThreshold, net);
	imshow("result", resultImg);
	waitKey();
	return 0;
}

/**
 * @brief Decode the network outputs into text boxes
 *
 * @param scores confidence map
 * @param geometry geometry map
 * @param scoreThresh confidence threshold
 * @param detections output: decoded rotated boxes
 * @param confidences output: confidence of each decoded box
 */
void decode(const Mat &scores, const Mat &geometry, float scoreThresh,
	std::vector<RotatedRect> &detections, std::vector<float> &confidences)
{
	detections.clear();
	confidences.clear();
	// Check that the inputs have the expected layout
	CV_Assert(scores.dims == 4);
	CV_Assert(geometry.dims == 4);
	CV_Assert(scores.size[0] == 1);
	CV_Assert(geometry.size[0] == 1);
	CV_Assert(scores.size[1] == 1);
	CV_Assert(geometry.size[1] == 5);
	CV_Assert(scores.size[2] == geometry.size[2]);
	CV_Assert(scores.size[3] == geometry.size[3]);

	const int height = scores.size[2];
	const int width = scores.size[3];
	for (int y = 0; y < height; ++y)
	{
		// Confidence scores for this row
		const float *scoresData = scores.ptr<float>(0, 0, y);
		// Box geometry channels 0 to 3 for this row
		const float *x0_data = geometry.ptr<float>(0, 0, y);
		const float *x1_data = geometry.ptr<float>(0, 1, y);
		const float *x2_data = geometry.ptr<float>(0, 2, y);
		const float *x3_data = geometry.ptr<float>(0, 3, y);
		// Box angle for this row
		const float *anglesData = geometry.ptr<float>(0, 4, y);
		// Visit every candidate in the row
		for (int x = 0; x < width; ++x)
		{
			float score = scoresData[x];
			// Skip candidates below the threshold
			if (score < scoreThresh)
			{
				continue;
			}
			// Decode a prediction.
			// Multiply by 4 because the feature maps are 4 times smaller than the input image.
			float offsetX = x * 4.0f, offsetY = y * 4.0f;
			// Angle and its sine and cosine
			float angle = anglesData[x];
			float cosA = std::cos(angle);
			float sinA = std::sin(angle);
			float h = x0_data[x] + x2_data[x];
			float w = x1_data[x] + x3_data[x];
			Point2f offset(offsetX + cosA * x1_data[x] + sinA * x2_data[x],
				offsetY - sinA * x1_data[x] + cosA * x2_data[x]);
			Point2f p1 = Point2f(-sinA * h, -cosA * h) + offset;
			Point2f p3 = Point2f(-cosA * w, sinA * w) + offset;
			// Rotated rectangle from center point, box width and height, and angle in degrees
			RotatedRect r(0.5f * (p1 + p3), Size2f(w, h), -angle * 180.0f / (float)CV_PI);
			// Store the box
			detections.push_back(r);
			// Store its confidence
			confidences.push_back(score);
		}
	}
}

Python code:

# Import required modules
import argparse
import math

import cv2 as cv
import numpy as np

parser = argparse.ArgumentParser(description='Use this script to run text detection deep learning networks using OpenCV.')
# Input argument
parser.add_argument('--input', default="./image/stop1.jpg",
                    help='Path to the input image.')
# Model argument
parser.add_argument('--model', default="./model/frozen_east_text_detection.pb",
                    help='Path to a binary .pb file containing the trained weights.')
# Width argument
parser.add_argument('--width', type=int, default=320,
                    help='Preprocess input image by resizing to a specific width. It should be a multiple of 32.')
# Height argument
parser.add_argument('--height', type=int, default=320,
                    help='Preprocess input image by resizing to a specific height. It should be a multiple of 32.')
# Confidence threshold
parser.add_argument('--thr', type=float, default=0.5,
                    help='Confidence threshold.')
# Non-maximum suppression threshold
parser.add_argument('--nms', type=float, default=0.4,
                    help='Non-maximum suppression threshold.')
args = parser.parse_args()

############ Utility functions ############
def decode(scores, geometry, scoreThresh):
    detections = []
    confidences = []

    ############ CHECK DIMENSIONS AND SHAPES OF geometry AND scores ############
    assert len(scores.shape) == 4, "Incorrect dimensions of scores"
    assert len(geometry.shape) == 4, "Incorrect dimensions of geometry"
    assert scores.shape[0] == 1, "Invalid dimensions of scores"
    assert geometry.shape[0] == 1, "Invalid dimensions of geometry"
    assert scores.shape[1] == 1, "Invalid dimensions of scores"
    assert geometry.shape[1] == 5, "Invalid dimensions of geometry"
    assert scores.shape[2] == geometry.shape[2], "Invalid dimensions of scores and geometry"
    assert scores.shape[3] == geometry.shape[3], "Invalid dimensions of scores and geometry"
    height = scores.shape[2]
    width = scores.shape[3]
    for y in range(0, height):
        # Extract the data for this row from scores and geometry
        scoresData = scores[0][0][y]
        x0_data = geometry[0][0][y]
        x1_data = geometry[0][1][y]
        x2_data = geometry[0][2][y]
        x3_data = geometry[0][3][y]
        anglesData = geometry[0][4][y]
        for x in range(0, width):
            score = scoresData[x]
            # If score is lower than threshold score, move to next x
            if score < scoreThresh:
                continue
            # Feature maps are 4 times smaller than the input image
            offsetX = x * 4.0
            offsetY = y * 4.0
            angle = anglesData[x]
            # Calculate cos and sin of angle
            cosA = math.cos(angle)
            sinA = math.sin(angle)
            h = x0_data[x] + x2_data[x]
            w = x1_data[x] + x3_data[x]
            # Calculate offset
            offset = ([offsetX + cosA * x1_data[x] + sinA * x2_data[x], offsetY - sinA * x1_data[x] + cosA * x2_data[x]])
            # Find points for rectangle
            p1 = (-sinA * h + offset[0], -cosA * h + offset[1])
            p3 = (-cosA * w + offset[0],  sinA * w + offset[1])
            center = (0.5*(p1[0]+p3[0]), 0.5*(p1[1]+p3[1]))
            detections.append((center, (w,h), -1*angle * 180.0 / math.pi))
            confidences.append(float(score))
    # Return detections and confidences
    return [detections, confidences]

if __name__ == "__main__":
    # Read and store arguments
    confThreshold = args.thr
    nmsThreshold = args.nms
    inpWidth = args.width
    inpHeight = args.height
    model = args.model

    # Load network
    net = cv.dnn.readNet(model)

    outputLayers = []
    outputLayers.append("feature_fusion/Conv_7/Sigmoid")
    outputLayers.append("feature_fusion/concat_3")

    # Read frame and stop early if the image cannot be loaded
    frame = cv.imread(args.input)
    if frame is None:
        raise SystemExit('Could not read image: ' + args.input)

    # Get frame height and width
    height_ = frame.shape[0]
    width_ = frame.shape[1]
    rW = width_ / float(inpWidth)
    rH = height_ / float(inpHeight)

    # Create a 4D blob from frame.
    blob = cv.dnn.blobFromImage(frame, 1.0, (inpWidth, inpHeight), (123.68, 116.78, 103.94), True, False)

    # Run the model
    net.setInput(blob)
    output = net.forward(outputLayers)
    t, _ = net.getPerfProfile()
    label = 'Inference time: %.2f ms' % (t * 1000.0 / cv.getTickFrequency())

    # Get scores and geometry
    scores = output[0]
    geometry = output[1]
    [boxes, confidences] = decode(scores, geometry, confThreshold)

    # Apply NMS
    indices = cv.dnn.NMSBoxesRotated(boxes, confidences, confThreshold, nmsThreshold)
    # Older OpenCV versions return an N x 1 array and newer ones a flat array; flatten to handle both
    for i in np.array(indices).flatten():
        # get 4 corners of the rotated rect
        vertices = cv.boxPoints(boxes[i])
        # scale the bounding box coordinates based on the respective ratios
        for j in range(4):
            vertices[j][0] *= rW
            vertices[j][1] *= rH
        for j in range(4):
            # Drawing functions expect integer pixel coordinates
            p1 = (int(vertices[j][0]), int(vertices[j][1]))
            p2 = (int(vertices[(j + 1) % 4][0]), int(vertices[(j + 1) % 4][1]))
            cv.line(frame, p1, p2, (0, 255, 0), 2, cv.LINE_AA)
        # cv.putText(frame, "{:.3f}".format(confidences[i]), (int(vertices[0][0]), int(vertices[0][1])), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1, cv.LINE_AA)

    # Put efficiency information
    cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))

    # Display the frame
    cv.imshow("result", frame)
    cv.waitKey(0)

References

https://www.learnopencv.com/deep-learning-based-text-detection-using-opencv-c-python/
