Including unicharambigs in the [lang].traineddata file (Tesseract)
I'm facing a problem training Tesseract OCR for Kannada fonts (Lohit Kannada, Kedage) when it comes to numerals.
For example, 0 is getting recognized as 8 (and ನ as ವ). I need help with including a unicharambigs file (the documentation on GitHub describes only the format). My output.txt file has not changed despite including the unicharambigs file.
Supposing [lang] corresponds to kan, will the following command include the unicharambigs file in the kan.traineddata file?
combine_tessdata kan.
In case it doesn't, I'd appreciate advice on how to proceed.
It is difficult to answer without knowing the version of Tesseract and of kan.traineddata you're using.
You can unpack kan.traineddata to see the version of kan.unicharambigs included in it, then recombine after editing the file.
See https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc for the command syntax.
Use the -u option to unpack:
-u .traineddata pathprefix: unpacks the .traineddata using the provided prefix.
Use the -o option to overwrite unicharambigs:
-o .traineddata file…: overwrites the specified components of the .traineddata file with those provided on the command line.
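Put together, the unpack, edit, and overwrite cycle might look roughly like the sketch below. It assumes kan.traineddata sits in the current directory and that combine_tessdata is on your PATH; the `unpacked` working directory and the variable names are only illustrative, so adjust them to your setup.

```shell
#!/bin/sh
# Sketch only: paths and variable names here are assumptions, not fixed conventions.
LANG_CODE=kan
WORKDIR=unpacked                       # hypothetical scratch directory
PREFIX="$WORKDIR/$LANG_CODE."          # components are written as unpacked/kan.<component>
AMBIGS="${PREFIX}unicharambigs"

if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$LANG_CODE.traineddata" ]; then
    mkdir -p "$WORKDIR"

    # -u: unpack all components of kan.traineddata using the given prefix
    combine_tessdata -u "$LANG_CODE.traineddata" "$PREFIX"

    if [ -f "$AMBIGS" ]; then
        # Look at what is currently bundled before changing anything
        head -n 5 "$AMBIGS"

        # ... edit "$AMBIGS" by hand here ...

        # -o: overwrite only the listed component(s) inside kan.traineddata
        combine_tessdata -o "$LANG_CODE.traineddata" "$AMBIGS"
    else
        echo "no unicharambigs component found in $LANG_CODE.traineddata"
    fi
else
    echo "combine_tessdata or $LANG_CODE.traineddata not found; nothing done"
fi
```

Working on a copy of kan.traineddata is safer, since -o modifies the file in place.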
Please note that https://github.com/tesseract-ocr/langdata/blob/master/kan/kan.unicharambigs seems to be a copy of eng.unicharambigs.