Source: https://help.ubuntu.com/community/OCR
OCR - Optical Character Recognition
OCR is a technology that allows you to convert scanned images of text into plain text. This enables you to save space, edit the text and search/index it.
Available OCR tools
The Ubuntu Universe repositories contain the following OCR tools:
- tesseract-ocr
- ocrad
- gocr
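To see which of these tools are already present on a machine, a small check like the following may help. It is only a sketch: the function name `check_ocr_tools` is hypothetical, and it assumes each package provides a binary called `tesseract`, `ocrad` or `gocr` respectively.

```shell
# Report which of the OCR tools listed above are available on the PATH.
# Assumption: the binaries are named tesseract, ocrad and gocr.
# Missing tools can usually be installed from Universe, for example:
#   sudo apt-get install tesseract-ocr ocrad gocr
check_ocr_tools() {
    for tool in tesseract ocrad gocr; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: installed"
        else
            echo "$tool: missing"
        fi
    done
}

check_ocr_tools
```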
Tesseract
Introduction
Tesseract arguably produces the most accurate results of the three. It was initially developed at HP Labs between 1985 and 1995 and was open-sourced in 2005. Tesseract can recognize text in seven languages: English, German, French, Italian, Spanish, Brazilian Portuguese and Dutch. You can install more than one language dictionary if you need to. It does not support layout analysis, so multi-column text, images, equations and similar content will likely produce garbled output. It also accepts only TIFF images as input.
Usage
Tesseract is currently a command-line-only tool (although an integration with OCRopus for a GUI is in progress). After a successful installation, the command to use is tesseract <path to tiff image> <output file>. Tesseract will automatically give the output file a .txt extension.
It is critical that the TIFF image has a ".tif" extension and not a ".tiff" extension. The command line should look like this example:
$ tesseract /home/johnsmith/input.tif output
Where johnsmith is your user account name, input.tif is the document to be converted, and output is the base name of the file that Tesseract will create as output.txt.
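The usage rules above can be wrapped in a small shell function that refuses any input not ending in ".tif" before calling Tesseract. This is a sketch only: the name `ocr_tif` and its error message are hypothetical, and it relies solely on the `tesseract <input> <output>` form shown above.

```shell
# Hypothetical wrapper around the command shown above.
# Usage: ocr_tif /path/to/input.tif output
# Rejects inputs that do not end in .tif, then runs tesseract, which
# writes the recognized text to <output>.txt.
ocr_tif() {
    ocr_input=$1
    ocr_base=$2
    case "$ocr_input" in
        *.tif) ;;  # extension is acceptable
        *)
            echo "error: input must end in .tif: $ocr_input" >&2
            return 1
            ;;
    esac
    tesseract "$ocr_input" "$ocr_base" && echo "wrote $ocr_base.txt"
}
```

For example, `ocr_tif /home/johnsmith/input.tif output` would run the same command as above, while `ocr_tif input.tiff output` stops with an error before Tesseract is invoked.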
Preparing images for Tesseract
Tesseract is not very flexible about the format of its input images: it accepts only TIFF images. According to user reports, compressed TIFF images are quite problematic, and the same goes for grey-scale and colour images, so you are better off with single-bit uncompressed TIFF images. Preparing them with GIMP is very simple:
- Go to the Image→Mode menu and make sure the image is in RGB or Grayscale mode.
- Select from the menu Tools→Color Tools→Threshold and choose an adequate threshold value.
- Select from the menu Image→Mode→Indexed and from the options choose 1-bit and no dithering.
- Save the image in TIFF format.
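For batches of scans, the same steps can be approximated on the command line. The sketch below swaps in ImageMagick's `convert`, which is not part of the instructions above, so treat it as one possible equivalent rather than the recommended method. The function name `prepare_for_tesseract`, the `DRY_RUN` switch and the 50% default threshold are illustrative assumptions. As with GIMP, the threshold needs adjusting for each scan.

```shell
# Sketch: grayscale -> threshold -> 1-bit -> uncompressed TIFF,
# using ImageMagick's convert instead of GIMP (a swapped-in tool).
# THRESHOLD is an illustrative default; override it per scan.
THRESHOLD=${THRESHOLD:-50%}

prepare_for_tesseract() {
    src=$1
    dst=$2
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set and non-empty,
    # so the command is printed instead of executed.
    ${DRY_RUN:+echo} convert "$src" \
        -colorspace Gray \
        -threshold "$THRESHOLD" \
        -type bilevel \
        -compress none \
        "$dst"
}
```

Running `DRY_RUN=1 prepare_for_tesseract scan.png scan.tif` prints the command without touching any files, which is a cheap way to inspect the options before converting. Note that the output name ends in ".tif", as Tesseract requires.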