tesseract-2.01
Category: Pattern recognition (vision/speech, etc.)
Development tool: Visual C++
File size: 3227KB
Downloads: 634
Upload date: 2010-01-04 08:45:06
Uploader: 笑看人生
Description: OCR source code for intelligent recognition of text, English letters, and digits. This is a preconfigured tesseract-2.01 that compiles with VC6.0; see the notes inside the package for usage. It recognizes uncompressed TIF and monochrome BMP images.
File list:
tesseract-2.01\1.tif (102598, 2007-07-06)
tesseract-2.01\AUTHORS (170, 2007-03-30)
tesseract-2.01\ccmain\adaptions.cpp (34973, 2007-07-03)
tesseract-2.01\ccmain\adaptions.h (5433, 2007-03-30)
tesseract-2.01\ccmain\applybox.cpp (30826, 2007-08-28)
tesseract-2.01\ccmain\applybox.h (2991, 2007-07-03)
tesseract-2.01\ccmain\baseapi.cpp (37350, 2007-08-30)
tesseract-2.01\ccmain\baseapi.h (10550, 2007-08-28)
tesseract-2.01\ccmain\blobcmp.cpp (2899, 2007-07-03)
tesseract-2.01\ccmain\blobcmp.h (1155, 2007-03-30)
tesseract-2.01\ccmain\callnet.cpp (2612, 2007-03-30)
tesseract-2.01\ccmain\callnet.h (1242, 2007-03-30)
tesseract-2.01\ccmain\charcut.cpp (23506, 2007-03-30)
tesseract-2.01\ccmain\charcut.h (4947, 2007-03-30)
tesseract-2.01\ccmain\charsample.cpp (17821, 2007-03-30)
tesseract-2.01\ccmain\control.cpp (62449, 2007-08-30)
tesseract-2.01\ccmain\control.h (9621, 2007-07-03)
tesseract-2.01\ccmain\docqual.cpp (49786, 2007-07-03)
tesseract-2.01\ccmain\docqual.h (7155, 2007-03-30)
tesseract-2.01\ccmain\expandblob.cpp (3107, 2007-03-30)
tesseract-2.01\ccmain\expandblob.h (251, 2007-03-30)
tesseract-2.01\ccmain\fixspace.cpp (33891, 2007-08-28)
tesseract-2.01\ccmain\fixspace.h (3539, 2007-07-03)
tesseract-2.01\ccmain\fixxht.cpp (29686, 2007-07-07)
tesseract-2.01\ccmain\fixxht.h (4084, 2007-07-03)
tesseract-2.01\ccmain\imgscale.cpp (4474, 2007-03-30)
tesseract-2.01\ccmain\imgscale.h (1230, 2007-03-30)
tesseract-2.01\ccmain\Makefile.am (2258, 2007-07-18)
tesseract-2.01\ccmain\Makefile.in (73401, 2007-08-31)
tesseract-2.01\ccmain\matmatch.cpp (12521, 2007-03-30)
tesseract-2.01\ccmain\matmatch.h (1901, 2007-03-30)
tesseract-2.01\ccmain\output.cpp (44967, 2007-08-30)
tesseract-2.01\ccmain\output.h (5372, 2007-07-03)
tesseract-2.01\ccmain\paircmp.cpp (3700, 2007-03-30)
tesseract-2.01\ccmain\paircmp.h (1823, 2007-03-30)
tesseract-2.01\ccmain\reject.cpp (59095, 2007-07-03)
tesseract-2.01\ccmain\reject.h (8640, 2007-07-03)
tesseract-2.01\ccmain\scaleimg.cpp (10409, 2007-03-30)
tesseract-2.01\ccmain\scaleimg.h (1565, 2007-03-30)
tesseract-2.01\ccmain\tessbox.cpp (15556, 2007-07-03)
... ...
Introduction
============
This package contains the Tesseract Open Source OCR Engine.
Originally developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley Colorado, all the code
in this distribution is now licensed under the Apache License:
** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
Other Dependencies and Licenses:
================================
The Aspirin/MIGRAINES system is no longer required.
Tesseract can also make use of the libtiff library. (www.libtiff.org)
Without libtiff, Tesseract can only read uncompressed and G3 compressed
TIFF files.
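To check in advance which kind of TIFF a file is, one option is to read the Compression tag from its header. The following is a minimal sketch, not part of Tesseract: the function names `TiffCompression` and `CompressionName` are hypothetical, the tag number (259) and the listed values come from the TIFF 6.0 specification, and only the first image directory of a classic (non-BigTIFF) file is inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a 2- or 4-byte unsigned value at `pos`, honoring the file's byte order.
// The caller is responsible for bounds checking.
static std::uint32_t ReadUint(const std::vector<unsigned char>& d,
                              std::size_t pos, int bytes, bool little_endian) {
  assert(bytes == 2 || bytes == 4);
  std::uint32_t v = 0;
  for (int i = 0; i < bytes; ++i) {
    int idx = little_endian ? bytes - 1 - i : i;  // most significant byte first
    v = (v << 8) | d[pos + idx];
  }
  return v;
}

// Hypothetical helper: returns the value of the Compression tag (259) in the
// first image file directory (IFD), 1 if the tag is absent (the TIFF 6.0
// default, meaning no compression), or -1 if the buffer is not a classic TIFF.
int TiffCompression(const std::vector<unsigned char>& d) {
  if (d.size() < 8) return -1;
  bool le;
  if (d[0] == 'I' && d[1] == 'I') le = true;        // little-endian file
  else if (d[0] == 'M' && d[1] == 'M') le = false;  // big-endian file
  else return -1;
  if (ReadUint(d, 2, 2, le) != 42) return -1;       // TIFF magic number
  std::size_t ifd = ReadUint(d, 4, 4, le);          // offset of the first IFD
  if (ifd > d.size() || d.size() - ifd < 2) return -1;
  std::size_t n = ReadUint(d, ifd, 2, le);          // number of 12-byte entries
  if (d.size() - ifd - 2 < n * 12) return -1;
  for (std::size_t i = 0; i < n; ++i) {
    std::size_t e = ifd + 2 + i * 12;
    if (ReadUint(d, e, 2, le) == 259)               // Compression tag
      return static_cast<int>(ReadUint(d, e + 8, 2, le));
  }
  return 1;
}

// Maps a few common Compression values to readable names (TIFF 6.0).
std::string CompressionName(int c) {
  switch (c) {
    case 1: return "uncompressed";
    case 2: return "CCITT modified Huffman RLE";
    case 3: return "CCITT Group 3 (T.4)";
    case 4: return "CCITT Group 4 (T.6)";
    case 5: return "LZW";
    case 32773: return "PackBits";
    default: return "other/unknown";
  }
}
```

A caller would read the whole file into a `std::vector<unsigned char>`, pass it to `TiffCompression`, and compare the result with the formats listed above. Which tag values a given Tesseract build actually accepts is not stated here, so treat the result as a hint rather than a guarantee.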
History:
========
The engine was developed at Hewlett Packard Laboratories Bristol and
at Hewlett Packard Co, Greeley Colorado between 1985 and 1994, with some
more changes made in 1996 to port to Windows, and some C++izing in 1998.
A lot of the code was written in C, and then some more was written in C++.
Since then all the code has been converted to at least compile with a C++
compiler. Currently it builds under Linux with gcc2.95 and under Windows
with VC++6. The C++ code makes heavy use of a list system using macros.
This predates the STL, was portable before the STL, and is more efficient than STL
lists, but has the big negative that if you do get a segmentation violation,
it is hard to debug. Another "feature" of the C/C++ split is that the C++
data structures get converted to C data structures to call the low-level C
code. This is ugly, and the C++izing of the C code is a step towards
eliminating the conversion, but it has not happened yet.
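For readers unfamiliar with macro-generated lists, the general technique can be sketched as follows. This is an illustration only, not Tesseract's actual list code: every name here (`Link`, `RawList`, `DECLARE_TYPED_LIST`, `Blob`) is hypothetical. The link pointer lives inside each element, so adding an element needs no separate node allocation, which is one plausible reason such lists can be cheaper than `std::list`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical base class: every list element embeds its own link.
struct Link {
  Link* next = nullptr;
};

// Untyped singly linked list that only knows about Link pointers.
class RawList {
 public:
  void push_front(Link* e) {
    assert(e->next == nullptr);  // cheap sanity check: element looks unlinked
    e->next = head_;
    head_ = e;
    ++size_;
  }
  // Unlinks and returns the first element, or nullptr if the list is empty.
  Link* pop_front() {
    Link* e = head_;
    if (e) {
      head_ = e->next;
      e->next = nullptr;
      --size_;
    }
    return e;
  }
  Link* front() const { return head_; }
  std::size_t size() const { return size_; }

 private:
  Link* head_ = nullptr;
  std::size_t size_ = 0;
};

// Macro that stamps out a type-safe wrapper class NAME_LIST for element type
// NAME, the way pre-template code produced "generic" containers. NAME must
// derive from Link so the static_cast back to NAME* is valid.
#define DECLARE_TYPED_LIST(NAME)                                        \
  class NAME##_LIST {                                                   \
   public:                                                              \
    void push_front(NAME* e) { raw_.push_front(e); }                    \
    NAME* pop_front() { return static_cast<NAME*>(raw_.pop_front()); }  \
    NAME* front() const { return static_cast<NAME*>(raw_.front()); }    \
    std::size_t size() const { return raw_.size(); }                    \
                                                                        \
   private:                                                             \
    RawList raw_;                                                       \
  };

// Example element type and its generated list class, Blob_LIST.
struct Blob : Link {
  explicit Blob(int a) : area(a) {}
  int area;
};

DECLARE_TYPED_LIST(Blob)
```

The debugging cost mentioned above shows up here as well: the list does not own its elements, so a dangling element pointer crashes inside macro-generated code that a debugger cannot step through line by line.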
Directory Structure (ordered by dependency):
============================================
ccmain    Top-level code. The main program resides in tesseractmain.cpp.
display   An "editor" to view and operate on the internal structures.
          (Requires a working viewer - batteries not included.)
wordrec   The word-level recognizer.
textord   The module that organizes (orders) text into lines and words.
classify  The low-level character classifiers.
ccstruct  Classes to hold information about a page as it is being processed.
viewer    The client side of a client server viewing system.
          Unfortunately, at this time, the server side is not available.
image     Image class and processing functions.
dict      Language model code.
cutil     Code for file I/O, lists, heaps etc, from the old C code.
ccutil    Somewhat newer code for lists, memory allocation etc from the
          old C++ code.
About the Engine
================
This code is a raw OCR engine. It has NO PAGE LAYOUT ANALYSIS, NO OUTPUT
FORMATTING, and NO UI. It can only process an image of a single column
and create text from it. It can detect fixed pitch vs proportional text.
Having said that, in 1995, this engine was in the top 3 in terms of character
accuracy, and it compiles and runs on both Linux and Windows.
As of 2.0, Tesseract is fully Unicode (UTF-8) enabled, and can recognize 6
languages "out of the box." Code and documentation are provided for the brave
to train in other languages. See code.google.com/p/tesseract-ocr for more
information on training.
Using the Engine
================
Windows:
The executable must reside in the same directory as the tessdata directory.
The command line is:
tesseract