ocr - Including unicharambigs in the [lang].traineddata file (Tesseract)

- August 15, 2015

I'm facing a problem training Tesseract OCR for Kannada fonts (Lohit Kannada, Kedage) when it comes to numerals.

For example, 0 is getting recognized as 8 (and ನ as ವ). I need help including the unicharambigs file (the documentation on GitHub describes only the format). My output.txt file has not changed, despite including the unicharambigs file.

Suppose [lang] corresponds to kan. Would the following command include the unicharambigs file in the kan.traineddata file?

combine_tessdata kan.

In case it doesn't, I'd appreciate any help on how to proceed with the same.

It is difficult to answer without knowing which version of Tesseract and of kan.traineddata you're using.

You can unpack kan.traineddata to see the version of kan.unicharambigs included in it, and recombine after editing the file.

See https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc for the command syntax.

Use the -u option to unpack:
- -u .traineddata pathprefix: unpacks the .traineddata file using the provided prefix.
Use the -o option to overwrite unicharambigs:
- -o .traineddata file…: overwrites the specified components of the .traineddata file with those provided on the command line.
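
The unpack, edit, and overwrite steps above could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: the `unpacked/` directory and the `run` helper are hypothetical, and it assumes the unpacked component is named `kan.unicharambigs` so that `-o` can identify it by its suffix, as in the examples on the linked man page. By default the script only prints the commands; set `RUN=1` to actually execute them.

```shell
# Hypothetical names; adjust to your own setup.
lang=kan
traineddata="$lang.traineddata"
workdir=unpacked
prefix="$workdir/$lang."   # the trailing dot is part of the prefix

# Print each command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# 1. Unpack every component of kan.traineddata into unpacked/kan.*
run mkdir -p "$workdir"
run combine_tessdata -u "$traineddata" "$prefix"

# 2. Edit unpacked/kan.unicharambigs by hand at this point.

# 3. Overwrite only the unicharambigs component inside kan.traineddata.
run combine_tessdata -o "$traineddata" "${prefix}unicharambigs"
```

Printing the commands first makes it easy to confirm the paths before touching the original kan.traineddata; keeping a backup copy of that file is a sensible precaution, since -o modifies it in place.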

Please note that https://github.com/tesseract-ocr/langdata/blob/master/kan/kan.unicharambigs seems to be a copy of eng.unicharambigs.
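
If the bundled file really is a copy of the English one, it will not contain Kannada-specific pairs such as those from the question. As a rough sketch, a custom file in the v1 format described in the Tesseract training documentation could be written as below. Each entry is tab-separated: source length, source characters, target length, target characters, and a type (0 for an optional substitution, 1 for a mandatory one). The two entries are illustrative assumptions based on the confusions mentioned in the question (8 read in place of 0, ವ read in place of ನ); whether they actually improve recognition for these fonts has not been verified.

```shell
# Hypothetical output path; place the file next to the other kan.* components.
ambigs="kan.unicharambigs"

{
    # The first line is the format version identifier.
    printf 'v1\n'
    # Recognized "8" may optionally be replaced by "0" (type 0 = optional).
    printf '1\t8\t1\t0\t0\n'
    # Recognized "ವ" may optionally be replaced by "ನ" (type 0 = optional).
    printf '1\tವ\t1\tನ\t0\n'
} > "$ambigs"
```

The training documentation notes that this file must be UTF-8 encoded and end with a newline, which the `printf` calls above take care of.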
