Tessnet2: a .NET 2.0 open source OCR assembly using the Tesseract engine
http://www.pixel-technology.com/freeware/tessnet2/
Keywords: Open source, OCR, Tesseract, .NET, DOTNET, C#, VB.NET, C++/CLI
Current version: 2.04.0, 02SEP09 (see Version History)
The big picture
Tesseract is a C++ open source OCR engine. Tessnet2 is a .NET assembly that exposes very simple methods to do OCR.
Tessnet2 is multi-threaded. It uses the engine the same way Tesseract.exe does. Tessdll uses another method (no thresholding).
License
Tessnet2 is under the Apache 2 license (like Tesseract), meaning you can use it however you want, including in commercial products. The full license text is in the source files.
Quick Tessnet2 usage
* Download the binary and add a reference to the assembly Tessnet2.dll in your .NET project.
* Download the language data definition file and put it in the tessdata directory. The tessdata directory and your exe must be in the same directory.
* Look at the Program.cs sample.
Note: Tessnet2.dll needs the Visual C++ 2008 runtime. When deploying your application, be sure to install the C++ runtime (x86 or x64).
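The directory layout described above can be checked at startup, before any OCR call is made. The following is a minimal sketch, assuming the language data files sit in a folder named tessdata beside the executable. The class and method names (TessdataCheck, ExpectedPath, IsPresent) are hypothetical helpers for illustration and are not part of Tessnet2.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Tessnet2.
static class TessdataCheck
{
    // Builds the path where the tessdata directory is expected,
    // i.e. directly under the given base directory.
    public static string ExpectedPath(string baseDirectory)
    {
        return Path.Combine(baseDirectory, "tessdata");
    }

    // Returns true when the tessdata directory exists and contains
    // at least one file. It does not validate the file contents.
    public static bool IsPresent(string baseDirectory)
    {
        string path = ExpectedPath(baseDirectory);
        return Directory.Exists(path) && Directory.GetFiles(path).Length > 0;
    }
}
```

A typical call would be TessdataCheck.IsPresent(AppDomain.CurrentDomain.BaseDirectory), which looks next to the running exe. The check only confirms that some file is there; whether it is the right language data is left to the OCR engine.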
Tessnet2 usage
    using System;
    using System.Collections.Generic;
    using System.Drawing;

    Bitmap image = new Bitmap("eurotext.tif");
    tessnet2.Tesseract ocr = new tessnet2.Tesseract();
    ocr.SetVariable("tessedit_char_whitelist", "0123456789"); // If digits only
    ocr.Init(@"c:\temp", "fra", false); // To use correct tessdata
    List<tessnet2.Word> result = ocr.DoOCR(image, Rectangle.Empty);
    // Print each recognized word with its confidence score
    foreach (tessnet2.Word word in result)
        Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
Tessnet2 source code and recompiling
* Download the Tesseract source code and expand it in a directory.
* Download the Tessnet2 source code and expand it in the Tesseract source code root directory (it should create a dotnet subdirectory).
* Open the project solution tessnet2.sln. It is a Visual Studio 2008 C++/CLI project.
Memory leak
The Tesseract C++ source code is full of memory leaks. Using the tessnet2 assembly repeatedly will eventually exhaust memory. This is not a tessnet2 leak; it is a Tesseract leak, and I spent two days in the Tesseract source code trying to improve this with no success.
Tessnet2 demo
The Tessnet2 source code includes two C# demo projects. TesseractOCR is a multi-threaded WinForms demo with a progress bar. TesseractConsole is a console demo.
In the demo output, the confidence score is shown in brackets. A value below 160 means the result is not bad.
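The threshold can be turned into a small check for filtering OCR results. This is a minimal sketch, assuming the 0 (perfect) to 255 (reject) scale described for version 2.03.2 in the version history. The class ConfidenceBands and its members are hypothetical names, not part of Tessnet2, and treating exactly 160 as poor is an assumption, since the text only covers values below and above it.

```csharp
using System;

// Hypothetical helper, not part of Tessnet2.
static class ConfidenceBands
{
    public const double Perfect = 0.0;           // best possible score
    public const double AcceptableLimit = 160.0; // below this is "not bad"
    public const double Reject = 255.0;          // worst possible score

    // True when the score lies in [0, 160). Lower is better.
    public static bool IsAcceptable(double confidence)
    {
        return confidence >= Perfect && confidence < AcceptableLimit;
    }

    // Gives a short label for a score; values outside 0..255 are flagged.
    public static string Describe(double confidence)
    {
        if (confidence < Perfect || confidence > Reject)
            return "out of range";
        return confidence < AcceptableLimit ? "acceptable" : "poor";
    }
}
```

Inside a loop over the words returned by DoOCR, ConfidenceBands.IsAcceptable(word.Confidence) could be used to skip words that are probably misread.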
Version History
07JUN08: First release, on Tesseract 2.03
10JUN08: Version 2.03.1. Changed Confidence behavior: it is now calculated from every letter of each word, not only from the first letter. Its type changed from byte to double. 0 = perfect, 100 = reject
13JUN08: Version 2.03.2
After 3 days in Tesseract code (urgh), here is Tessnet2 version 2.03.2.
The corrections deal with the following problems:
* Confidence was not very useful; the value was strange. This has been corrected by setting the variable tessedit_write_ratings=true. After many tests I found this mode gives the most accurate confidence. Values range from 0 (perfect) to 255 (reject). When the value goes over 160, the OCR was really bad.
* Calling DoOCR twice did not give the same result. It was, as expected, a problem with global variables. The problem is almost fixed; sometimes it still fails, but right now I can't find what is not correctly reinitialized.
转载请指明出处. 参考<关于AM335X移植SDIO WIFI的简易教程> http://www.deyisupport.com/question_answer/dsp_arm/sitar ...