Pytesseract cannot correctly convert the text in an image
13 October 2019

I am trying to convert the text in an image to CSV. I am attaching the file for your reference (image with text).

from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
img = Image.open('1.jpg')
text = pytesseract.image_to_string(img)
#text = text.encode('utf-8')
print(text)

Output:

et CUSIPO IE = VdllHS VEVAVINV SW INN 9 et OO

eBy POWE, JW :}SU] JO peay
L282 09226: GoW

‘uoeBnged ‘6p 11/2Z01 ON 301d
salsnpul
sa001g Appeg nemeseyg

\SSV8 L6Er6 - GOW

Melby Axi “S\\ :"ySU] JO pea}
LSP0E 09SS6 GOW,

‘QJOOwW/) ‘epninynyy :9e4

{ 99Ty Wispop HNemezeyg

‘jemesby ny espuaien 1
Melby ny Aeliguy :*3su) JO peop
‘BVO OLEVE 'S00ZS LZEP6 “Go
‘IndBueseqeny
Hed S,USIPILYD Jean ‘Brey seyemer
IW 2ry Sueseleg

‘Oey YJEWUEY WV J\\ :"JsU] Jo peay
Sh968 SBEVE “LHEZE PEK : GOW,
OLOPEL-Uld “HuNYy!iNJUa) “epnBeyeyg
Slapel] elieg

Wey URWXE) y J\"3SU] JQ peay
QOZLE PEPE : Gow ‘epedeyey
WAL 29ry e1eg

e\dng euysuyjeg JW :"}SU] JO peay
Woo‘ jlewbOe\dn6 ypig : jrew-3

5 9/920 vEErE : GoW

+: ANDAISEPeS ‘BZ |-OE| ‘ON 101d
Sloper] seq wey eqeg





COPY PueUY 3 JW :"ysu] Jo peay

6EZEO L8E66 “GoW
+ ‘ajOMewW/)
S —-PPnBeBieg ‘spgz19//p ‘on ld
| ®Slidiajuy Weuemsig eqeg

?

. Ovyyl ELEPE: Goy
‘2}0ow/) ‘eqweiuiyooy

‘ IW
é Wepow

FA)I\ jp

ed WeueyUeEMG JW:

ebueur ‘epnbin

p MewGiue 4 eAelIg JW =ysuy yo peay

"ysuy jo peop
$8628 O8Er6: qo
EU ‘96 ‘ON Jolq
WA 22ry Lsoiny

‘17828 LLE66 92292 SLEH6: ‘GOW

‘epued Jewny edelig J\\ :"}Su] JO peay

CLOP9LUld “}OewW/}-e1/\
‘ pueseyqepeg-Oq ‘Ipueseyqeues-}y
IFW 22ry DOA oiny

IUEY D S\N -"}SU] JO pes}

Woo 'jfew6@)||lwnoypuesouewndeuue
* |IEW-3

P8S9h L/EP6: GoW ‘uoebngegq

IW Ano}y 2 Bry eundeuuy



YNdONVEVEYN )



EYES N JW :}Su] Jo peay

‘01800 ZZE6 “GOW ‘p€607Z (96290):"Ud

elfuerey -el/ ‘peyejey vogny
TW. ory rune]

jenueby ysiuey JW :"3Su] Jo peay
Woo '|jewBOoibelwysyejaauys : rew-3

46610 LS688: GOW ‘99122 (96290) "Ud

‘elluevey ‘epeyuey
*P?T

Ad SPOo4 LIBY IUIYsye] Vays

lemueBy Ysoulq J\\ :"}su] Jo peay

woo ew ¢goOZzNUEYypeWeysalys

: [leWeS

CSBSE OLEVE: GON 'ZSE79Z (76290) "Ud
‘houejag ‘indnuysig

“P37 3Ad Aijsnpuy

peseg o18y nusypewey 831YS

UeJey YSUEGepod JW :*}su] Jo peay
Woo'}!ewHOjedobeaiysbuipes : lew-F
6SL1S BLE6E: GOW ‘7/1292 (26290) ‘Ud
‘houejeg ‘ueweyyiypng

IW
SFY Wepow pooy jedonr aaiys

jemesBy ysexig sy YSU] JO peay
WOd"}!eWwHOJepyingApogeyique : |!ew-F
$9269 LZ€V6 ‘GOW ‘751.022 (9690):"Ud
LEQLSZ -[ueyqunkey ‘eiluesey-og/-y
IHW 221y wees

NUBS BJEQUNO ‘sj; :ysu] jo peoy
Wwoo"}!eWBO¢ | yoosubeles : \!ew-3
EvL12 EBEES: Gow

‘luesepeg ‘epequiy

YoLBy es



eyes eipualey J) :"}Su] Jo peay
LIG8E OLEVE'SLE8E OZEYE “GOW
‘epedueg :'0q ‘e||lA dnems -\y

IAL ry pesesg jesuepy

A\snd ysiqeg JN :"}Suj Jo peay
woo ‘lewHOspoojo/beinepuewyeew
 IEW-F

G0P69 68E66: GOW ‘20792 (26290) Ud
‘MUNYNS ‘|!suepueg

*P37 IAd
Spool] O13y IA8g UPA] ee]

"eyes elpualey J :"}Suy Jo peay
“LUGBE OLEVE 'SL88E OZEPE: GON

‘LOEESZ (26290) "Ud
‘eJUNYY :'Oq ‘Indnyuey -}\7
yonpoig o18y epedndey

e]dNd ny eipuayeyy JV :}Suy Jo peoy
‘9CSL6 E8EH6 ‘SESH! LEH : Gow
‘Isoduibueg :-og ‘epueyy

SIMNpOlg O13y exeY

LOEPE OLEKE “qo;

ooues JeWINy YsHer JW) :}su] Jo peay
S$202G2-lueyqunkeyy ‘houjeg ‘Hoyere) yy
“P37 2Npold [IO 2 O18y yspel

lemeby ysiueyy sy :"JSU] JO peay
Wod'|lewHOQeouequepbe/ : |lew-3
LVLZE OLEPE : qo;
eleyellg ‘1uequesy ‘Z ‘on }0ld OOa!
PIT 3Ad Salysnpuy
J99}5 equepeBer jo uolsiaig
IFW e2ry equiepese¢

‘UEMJEd YSOjUeS JW "ysuy JO peay
6SPBE OLEKE “ GOW'6ES0Zz (96290): ‘Ud
‘elluesey “Oq‘eliminy +

“PIT Ad A2IIW ysauen

A\snig wy JW :3Sul yo peay

Woo" |leWA®)spoojouBeeyipueyo: |l@w-3
Cvyyl LLL/6 “GOW ‘12/122 (96290) "Ud
‘elluesey ‘1uedeuning

"PFT IAd Spooy 013y eypueys

OoueS Jeqeqaq JW "su yo peay
€000S 00606 :Go- ‘iepequy :‘og jy
“P97 34d 3ulssaz01g Appeg nueseg
------------------------------------------

Any help would be greatly appreciated.

SOLUTION: I just adjusted the config parameter in image_to_string.

Modified code:

from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
img = Image.open('1.jpg')
text = pytesseract.image_to_string(img, config='--psm 12')
#text = text.encode('utf-8')
print(text)
##print image_to_string(Image.open('test-english.jpg'), lang='eng')
...
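For anyone who wants to go from the fix to the CSV mentioned in the question, here is a minimal sketch. Much of the garbled output reads like upside-down text, and in Tesseract's list of page segmentation modes, mode 12 is "sparse text with OSD" (orientation and script detection), which may be why it helped here. The helper names build_config, text_to_rows, write_rows and image_to_csv are hypothetical and not part of pytesseract. The layout of one CSV row per recognized line is also an assumption, since the post does not say how the CSV should be structured.

```python
import csv
from typing import List


def build_config(psm: int = 12) -> str:
    """Build the config string passed to pytesseract.image_to_string."""
    # Tesseract 4/5 define page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError("page segmentation mode must be between 0 and 13")
    return f"--psm {psm}"


def text_to_rows(text: str) -> List[List[str]]:
    """Turn OCR output into one single-column row per non-empty line."""
    return [[line.strip()] for line in text.splitlines() if line.strip()]


def write_rows(rows: List[List[str]], csv_path: str) -> None:
    """Write the rows as UTF-8 CSV so non-ASCII characters survive."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def image_to_csv(image_path: str, csv_path: str, psm: int = 12) -> None:
    """OCR an image with the chosen segmentation mode and save it as CSV."""
    # Imported lazily so the helpers above can be used without Tesseract.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(
        Image.open(image_path), config=build_config(psm)
    )
    write_rows(text_to_rows(text), csv_path)
```

With pytesseract.pytesseract.tesseract_cmd set as in the post, calling image_to_csv('1.jpg', '1.csv') should write each recognized line to its own row. Splitting a line into several columns would need rules specific to the document layout, so that part is left out.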