ocr - Including unicharambigs in the [lang].traineddata file (Tesseract)
I'm facing a problem training Tesseract OCR on Kannada fonts (Lohit Kannada, Kedage) when it comes to numerals.
For example, 0 is recognized as 8 (and ನ as ವ). I need help including the unicharambigs file (the documentation on GitHub only describes its format). My output.txt file has not changed, despite including the unicharambigs file.
Supposing [lang] corresponds to kan, will the following command include the unicharambigs file in the kan.traineddata file?
combine_tessdata kan.
If it doesn't, I'd appreciate help on how to proceed.
It's difficult to answer without knowing which versions of Tesseract and kan.traineddata you're using.
You can unpack kan.traineddata to see which version of kan.unicharambigs is included in it, then recombine after editing the file.
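The unpack, edit, recombine cycle described above could look roughly like the following POSIX shell sketch. It assumes kan.traineddata sits in the current directory and uses a hypothetical unpacked/ directory as the prefix location; the function names are illustrative only. Recombining with a prefix, as in the question's combine_tessdata kan., should pack every component file sharing that prefix, so an edited kan.unicharambigs placed there ought to be included.

```shell
#!/bin/sh
# Sketch only: the unpacked/ directory and function names are hypothetical.

# Unpack every component of kan.traineddata as unpacked/kan.<component>,
# e.g. unpacked/kan.unicharambigs, so the ambiguities file can be inspected.
unpack_kan() {
  mkdir -p unpacked
  combine_tessdata -u kan.traineddata unpacked/kan.
}

# After editing unpacked/kan.unicharambigs, pack all unpacked/kan.* files
# back together; the result is written to unpacked/kan.traineddata.
recombine_kan() {
  combine_tessdata unpacked/kan.
}
```

Call unpack_kan, edit unpacked/kan.unicharambigs, call recombine_kan, then copy unpacked/kan.traineddata into your tessdata directory after backing up the original.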
See https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc for the command syntax:
- Use the -u option to unpack: "-u .traineddata PATHPREFIX" unpacks the .traineddata file using the provided prefix.
- Use the -o option to overwrite the unicharambigs: "-o .traineddata FILE…" overwrites the specified components of the .traineddata file with those provided on the command line.
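To replace only the ambiguities component without unpacking everything, the -o option can be used directly. This is a minimal sketch, assuming the edited file keeps the kan.unicharambigs name (combine_tessdata appears to identify the component from the file suffix) and that you want a backup, since -o rewrites the .traineddata file in place. The function name and default paths are hypothetical.

```shell
#!/bin/sh
# Sketch only: overwrite_kan_ambigs and its default paths are hypothetical.
# Usage: overwrite_kan_ambigs [traineddata] [ambigs-file]
overwrite_kan_ambigs() {
  traineddata=${1:-kan.traineddata}
  ambigs=${2:-kan.unicharambigs}
  # -o modifies the .traineddata file in place, so keep a copy first
  cp "$traineddata" "$traineddata.bak" || return 1
  # The .unicharambigs suffix tells combine_tessdata which component to replace
  combine_tessdata -o "$traineddata" "$ambigs"
}
```

Compared with a full unpack and recombine, this touches a single component, which keeps the other parts of the trained data byte-for-byte unchanged.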
 
Please note that https://github.com/tesseract-ocr/langdata/blob/master/kan/kan.unicharambigs seems to be a copy of eng.unicharambigs.
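Because that file looks like the English list, Kannada confusions such as 8/0 and ವ/ನ would likely have to be added by hand. The sketch below appends single-character entries, assuming the tab-delimited v1 format (a "v1" version line, then per entry: source character count, source, target count, target, and 1 for a mandatory or 0 for an optional replacement). The helper name is hypothetical, and treating the misread character as the source and the intended one as the target is an assumption about how you want the rule applied.

```shell
#!/bin/sh
# Sketch only: add_ambig is a hypothetical helper. Assumes the v1 format:
#   <n source chars> TAB <source> TAB <n target chars> TAB <target> TAB <0|1>
# Both groups here are single characters, so both counts are written as 1.
add_ambig() {
  ambigs_file=$1
  src=$2
  dst=$3
  ambig_type=$4   # 1 = mandatory replacement, 0 = optional
  # Start a new file with the v1 version line if it does not exist yet
  [ -f "$ambigs_file" ] || printf 'v1\n' > "$ambigs_file"
  printf '1\t%s\t1\t%s\t%s\n' "$src" "$dst" "$ambig_type" >> "$ambigs_file"
}
```

For example, add_ambig unpacked/kan.unicharambigs 8 0 0 and add_ambig unpacked/kan.unicharambigs ವ ನ 0 would add two optional entries. Optional (0) is used here on the assumption that a mandatory rule would also rewrite every genuine 8 or ವ.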