I am trying to convert the text in an image to CSV. The image is attached for reference.
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
img = Image.open('1.jpg')
text = pytesseract.image_to_string(img)
#text = text.encode('utf-8')
print(text)
Output:
et CUSIPO IE = VdllHS VEVAVINV SW INN 9 et OO
eBy POWE, JW :}SU] JO peay
L282 09226: GoW
‘uoeBnged ‘6p 11/2Z01 ON 301d
salsnpul
sa001g Appeg nemeseyg
\SSV8 L6Er6 - GOW
Melby Axi “S\\ :"ySU] JO pea}
LSP0E 09SS6 GOW,
‘QJOOwW/) ‘epninynyy :9e4
{ 99Ty Wispop HNemezeyg
‘jemesby ny espuaien 1
Melby ny Aeliguy :*3su) JO peop
‘BVO OLEVE 'S00ZS LZEP6 “Go
‘IndBueseqeny
Hed S,USIPILYD Jean ‘Brey seyemer
IW 2ry Sueseleg
‘Oey YJEWUEY WV J\\ :"JsU] Jo peay
Sh968 SBEVE “LHEZE PEK : GOW,
OLOPEL-Uld “HuNYy!iNJUa) “epnBeyeyg
Slapel] elieg
Wey URWXE) y J\"3SU] JQ peay
QOZLE PEPE : Gow ‘epedeyey
WAL 29ry e1eg
e\dng euysuyjeg JW :"}SU] JO peay
Woo‘ jlewbOe\dn6 ypig : jrew-3
5 9/920 vEErE : GoW
+: ANDAISEPeS ‘BZ |-OE| ‘ON 101d
Sloper] seq wey eqeg
COPY PueUY 3 JW :"ysu] Jo peay
6EZEO L8E66 “GoW
+ ‘ajOMewW/)
S —-PPnBeBieg ‘spgz19//p ‘on ld
| ®Slidiajuy Weuemsig eqeg
?
. Ovyyl ELEPE: Goy
‘2}0ow/) ‘eqweiuiyooy
‘ IW
é Wepow
FA)I\ jp
ed WeueyUeEMG JW:
ebueur ‘epnbin
p MewGiue 4 eAelIg JW =ysuy yo peay
"ysuy jo peop
$8628 O8Er6: qo
EU ‘96 ‘ON Jolq
WA 22ry Lsoiny
‘17828 LLE66 92292 SLEH6: ‘GOW
‘epued Jewny edelig J\\ :"}Su] JO peay
CLOP9LUld “}OewW/}-e1/\
‘ pueseyqepeg-Oq ‘Ipueseyqeues-}y
IFW 22ry DOA oiny
IUEY D S\N -"}SU] JO pes}
Woo 'jfew6@)||lwnoypuesouewndeuue
* |IEW-3
P8S9h L/EP6: GoW ‘uoebngegq
IW Ano}y 2 Bry eundeuuy
YNdONVEVEYN )
EYES N JW :}Su] Jo peay
‘01800 ZZE6 “GOW ‘p€607Z (96290):"Ud
elfuerey -el/ ‘peyejey vogny
TW. ory rune]
jenueby ysiuey JW :"3Su] Jo peay
Woo '|jewBOoibelwysyejaauys : rew-3
46610 LS688: GOW ‘99122 (96290) "Ud
‘elluevey ‘epeyuey
*P?T
Ad SPOo4 LIBY IUIYsye] Vays
lemueBy Ysoulq J\\ :"}su] Jo peay
woo ew ¢goOZzNUEYypeWeysalys
: [leWeS
CSBSE OLEVE: GON 'ZSE79Z (76290) "Ud
‘houejag ‘indnuysig
“P37 3Ad Aijsnpuy
peseg o18y nusypewey 831YS
UeJey YSUEGepod JW :*}su] Jo peay
Woo'}!ewHOjedobeaiysbuipes : lew-F
6SL1S BLE6E: GOW ‘7/1292 (26290) ‘Ud
‘houejeg ‘ueweyyiypng
IW
SFY Wepow pooy jedonr aaiys
jemesBy ysexig sy YSU] JO peay
WOd"}!eWwHOJepyingApogeyique : |!ew-F
$9269 LZ€V6 ‘GOW ‘751.022 (9690):"Ud
LEQLSZ -[ueyqunkey ‘eiluesey-og/-y
IHW 221y wees
NUBS BJEQUNO ‘sj; :ysu] jo peoy
Wwoo"}!eWBO¢ | yoosubeles : \!ew-3
EvL12 EBEES: Gow
‘luesepeg ‘epequiy
YoLBy es
eyes eipualey J) :"}Su] Jo peay
LIG8E OLEVE'SLE8E OZEYE “GOW
‘epedueg :'0q ‘e||lA dnems -\y
IAL ry pesesg jesuepy
A\snd ysiqeg JN :"}Suj Jo peay
woo ‘lewHOspoojo/beinepuewyeew
IEW-F
G0P69 68E66: GOW ‘20792 (26290) Ud
‘MUNYNS ‘|!suepueg
*P37 IAd
Spool] O13y IA8g UPA] ee]
"eyes elpualey J :"}Suy Jo peay
“LUGBE OLEVE 'SL88E OZEPE: GON
‘LOEESZ (26290) "Ud
‘eJUNYY :'Oq ‘Indnyuey -}\7
yonpoig o18y epedndey
e]dNd ny eipuayeyy JV :}Suy Jo peoy
‘9CSL6 E8EH6 ‘SESH! LEH : Gow
‘Isoduibueg :-og ‘epueyy
SIMNpOlg O13y exeY
LOEPE OLEKE “qo;
ooues JeWINy YsHer JW) :}su] Jo peay
S$202G2-lueyqunkeyy ‘houjeg ‘Hoyere) yy
“P37 2Npold [IO 2 O18y yspel
lemeby ysiueyy sy :"JSU] JO peay
Wod'|lewHOQeouequepbe/ : |lew-3
LVLZE OLEPE : qo;
eleyellg ‘1uequesy ‘Z ‘on }0ld OOa!
PIT 3Ad Salysnpuy
J99}5 equepeBer jo uolsiaig
IFW e2ry equiepese¢
‘UEMJEd YSOjUeS JW "ysuy JO peay
6SPBE OLEKE “ GOW'6ES0Zz (96290): ‘Ud
‘elluesey “Oq‘eliminy +
“PIT Ad A2IIW ysauen
A\snig wy JW :3Sul yo peay
Woo" |leWA®)spoojouBeeyipueyo: |l@w-3
Cvyyl LLL/6 “GOW ‘12/122 (96290) "Ud
‘elluesey ‘1uedeuning
"PFT IAd Spooy 013y eypueys
OoueS Jeqeqaq JW "su yo peay
€000S 00606 :Go- ‘iepequy :‘og jy
“P97 34d 3ulssaz01g Appeg nueseg
------------------------------------------
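Any help would be appreciated.
SOLUTION: I just set the config parameter in image_to_string. The value '--psm 12' selects Tesseract's "sparse text with OSD" page segmentation mode, where OSD stands for orientation and script detection.
Modified code: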
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
img = Image.open('1.jpg')
text = pytesseract.image_to_string(img, config='--psm 12')
#text = text.encode('utf-8')
print(text)
##print image_to_string(Image.open('test-english.jpg'), lang='eng')
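
The code above only prints the recognized text, while the original goal was a CSV file. Below is a minimal sketch of that last step, assuming one OCR line per CSV row is acceptable. The function name ocr_text_to_csv, the column header "line", and the sample string are hypothetical and not part of the original post. Splitting each entry into separate name, phone, and e-mail columns would need parsing rules that depend on the actual image layout.

```python
import csv
import io


def ocr_text_to_csv(text, out_file):
    """Write each non-empty line of OCR text as a single-column CSV row.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out_file)
    writer.writerow(["line"])  # hypothetical header name
    count = 0
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip the blank lines between text blocks
        writer.writerow([line])
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample standing in for the string returned by image_to_string.
    sample = "Head of Inst.: Mr. A\n\nMob: 12345 67890\n"
    buffer = io.StringIO()
    ocr_text_to_csv(sample, buffer)
    print(buffer.getvalue())
```

To write to disk instead, pass a file opened with open('out.csv', 'w', newline='', encoding='utf-8') and the text variable from the code above. The csv module quotes any line that contains a comma, so the addresses stay in one column.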