Text recognition in Azure Cognitive Services
2 votes
/ May 28, 2019

I put together a quick code sample using ComputerVisionClient.

The results are good, but for some reason I cannot get ComputerVisionClient to output Portuguese characters such as 'ã' or 'é'.

I looked through the documentation and tried two methods: RecognizeTextInStreamAsync and RecognizePrintedTextInStreamAsync. Only RecognizePrintedTextInStreamAsync lets me specify a language. But even when I specify OcrLanguages.Pt, the Portuguese characters do not appear. Instead of 'ç' I get 'c'.

Is there something I am missing, or is this the expected behavior by design?

Thanks in advance

1 Answer

1 vote
/ May 29, 2019

First of all, the two methods are meant for different scenarios:

  • RecognizeTextInStreamAsync extracts either printed or handwritten text
  • RecognizePrintedTextInStreamAsync extracts printed text

Please read carefully the two sections Explore the Recognize Text (OCR) scenario and Explore the Recognize Text V2 (English) scenario of the official document Sample: Explore an image processing app with C#.

The document refers to an official code sample: https://github.com/microsoft/Cognitive-Vision-Windows/tree/master/Sample-WPF. There you can find the two files OCrPage.xaml and TextRecognitionPage.xaml.

I hope they help you get your code working.


Update:

I looked into the Computer Vision REST APIs OCR and Recognize Text. The OCR REST API accepts a language option as a query parameter.

So I tried setting language=pt when calling the OCR API via curl, using the command below.

curl -v -X POST "https://southeastasia.api.cognitive.microsoft.com/vision/v2.0/ocr?language=pt" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <subscription key>" \
--data-ascii '{"url":"https://i.imgur.com/o6i3UDH.png"}'

The response is shown below, but the recognition result is wrong.

{"language":"pt","textAngle":0.0,"orientation":"Down","regions":[{"boundingBox":"245,123,367,736","lines":[{"boundingBox":"326,123,62,9","words":[{"boundingBox":"326,123,49,9","text":"(SVOSS3d"},{"boundingBox":"383,124,5,8","text":"Z"}]},{"boundingBox":"339,146,61,9","words":[{"boundingBox":"339,146,61,9","text":"(1YN010dO)"}]},{"boundingBox":"255,169,248,8","words":[{"boundingBox":"255,169,56,8","text":"sol]Nnsoo"},{"boundingBox":"491,169,12,8","text":"00"}]},{"boundingBox":"398,191,56,9","words":[{"boundingBox":"398,192,18,8","text":"001"},{"boundingBox":"424,191,30,9","text":"VOU)"}]},{"boundingBox":"261,214,274,9","words":[{"boundingBox":"261,215,6,8","text":"3"},{"boundingBox":"306,214,11,9","text":"n"},{"boundingBox":"325,215,18,8","text":"uS"},{"boundingBox":"350,215,12,8","text":"30"},{"boundingBox":"523,215,12,8","text":"30"}]},{"boundingBox":"312,268,203,8","words":[{"boundingBox":"312,268,12,8","text":"no"},{"boundingBox":"382,268,12,8","text":"30"},{"boundingBox":"447,268,11,8","text":"no"},{"boundingBox":"504,268,11,8","text":"30"}]},{"boundingBox":"253,289,323,9","words":[{"boundingBox":"253,289,2,3","text":"•"},{"boundingBox":"258,290,56,8","text":"s013Rnsoo"},{"boundingBox":"417,290,6,8","text":"v"},{"boundingBox":"424,290,31,8","text":"10830"},{"boundingBox":"494,290,12,8","text":"00"},{"boundingBox":"558,290,18,8","text":"Roo"}]},{"boundingBox":"402,313,18,8","words":[{"boundingBox":"402,313,18,8","text":"oze"}]},{"boundingBox":"277,335,242,9","words":[{"boundingBox":"277,336,6,8","text":"3"},{"boundingBox":"328,335,11,9","text":"//"},{"boundingBox":"348,335,55,9","text":"(OH11h0N)"},{"boundingBox":"507,336,12,8","text":"vo"}]},{"boundingBox":"424,389,107,8","words":[{"boundingBox":"424,389,18,8","text":"NOO"},{"boundingBox":"494,389,37,8","text":"OOV01d"}]},{"boundingBox":"373,411,82,9","words":[{"boundingBox":"373,411,11,9","text":"//"},{"boundingBox":"392,412,43,8","text":"OH11noN"},{"boundingBox":"443,412,12,8","text":"30"}]},{"boundingBox":"277,487,235,9","words":[{"boundingBox":"277,488,6,8","text":"3"},{"boundingBox":"328,487,11,9","text":"//"},{"boundingBox":"347,488,38,8","text":"vOVSSv"},{"boundingBox":"430,488,12,8","text":"30"},{"boundingBox":"501,488,11,8","text":"30"}]},{"boundingBox":"379,541,69,8","words":[{"boundingBox":"379,541,51,8","text":"vou1soN"},{"boundingBox":"437,541,11,8","text":"30"}]},{"boundingBox":"355,563,151,9","words":[{"boundingBox":"355,563,2,3","text":"•"},{"boundingBox":"360,564,31,8","text":"s003S"},{"boundingBox":"398,564,37,8","text":"solnyg"},{"boundingBox":"443,564,18,8","text":"Roo"},{"boundingBox":"468,563,38,9","text":"VON1no"}]},{"boundingBox":"296,585,184,9","words":[{"boundingBox":"296,586,6,8","text":"3"},{"boundingBox":"348,585,10,9","text":"//"},{"boundingBox":"469,586,11,8","text":"30"}]},{"boundingBox":"453,640,18,8","words":[{"boundingBox":"453,640,18,8","text":"Roo"}]},{"boundingBox":"302,661,153,9","words":[{"boundingBox":"302,662,6,8","text":"3"},{"boundingBox":"315,661,31,9","text":"oo•tz"},{"boundingBox":"354,661,11,9","text":"n"},{"boundingBox":"373,662,31,8","text":"nA9VH"},{"boundingBox":"443,662,12,8","text":"30"}]},{"boundingBox":"430,716,12,8","words":[{"boundingBox":"430,716,12,8","text":"30"}]},{"boundingBox":"338,737,113,9","words":[{"boundingBox":"338,738,5,8","text":"3"},{"boundingBox":"389,737,11,9","text":"//"},{"boundingBox":"440,738,11,8","text":"30"}]},{"boundingBox":"245,839,367,20","words":[{"boundingBox":"245,840,4,3","text":"•"},{"boundingBox":"300,839,98,20","text":"wawevo"},{"boundingBox":"530,839,60,18","text":"naca"},{"boundingBox":"596,839,16,15","text":"o"}]}]}]}

However, when I set the parameter language=unk (auto-detect), the result is good, and the language is correctly detected as pt (Portuguese).

curl -v -X POST "https://southeastasia.api.cognitive.microsoft.com/vision/v2.0/ocr?language=unk" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <subscription key>" \
--data-ascii '{"url":"https://i.imgur.com/o6i3UDH.png"}'

It correctly recognizes Ã rather than A.

{"language":"pt","textAngle":0.0,"orientation":"Up","regions":[{"boundingBox":"175,64,389,736","lines":[{"boundingBox":"175,64,388,20","words":[{"boundingBox":"175,69,16,15","text":"a"},{"boundingBox":"197,66,60,18","text":"cahne"},{"boundingBox":"389,64,142,20","text":"ahomemdõa"},{"boundingBox":"538,80,4,3","text":"."},{"boundingBox":"553,80,10,3","text":".."}]},{"boundingBox":"356,106,31,8","words":[{"boundingBox":"356,106,31,8","text":"CARNE"}]},{"boundingBox":"297,177,152,9","words":[{"boundingBox":"297,177,31,8","text":"PEITO"},{"boundingBox":"336,177,11,8","text":"DE"},{"boundingBox":"355,177,24,8","text":"PATO"},{"boundingBox":"387,177,11,9","text":"//"},{"boundingBox":"406,177,31,9","text":"19.50"},{"boundingBox":"444,177,5,8","text":"€"}]},{"boundingBox":"300,199,146,8","words":[{"boundingBox":"300,199,38,8","text":"BULGUR"},{"boundingBox":"345,199,12,8","text":"DE"},{"boundingBox":"364,199,82,8","text":"RAS-EL-HANOUT"}]},{"boundingBox":"262,253,223,9","words":[{"boundingBox":"262,253,63,8","text":"ENTRECOSTO"},{"boundingBox":"332,253,12,8","text":"DE"},{"boundingBox":"352,253,24,8","text":"BIFE"},{"boundingBox":"383,253,31,8","text":"HAGYU"},{"boundingBox":"422,253,11,9","text":"//"},{"boundingBox":"441,253,31,9","text":"24.00"},{"boundingBox":"479,253,6,8","text":"€"}]},{"boundingBox":"316,275,114,8","words":[{"boundingBox":"316,275,18,8","text":"COM"},{"boundingBox":"341,275,32,8","text":"MOLHO"},{"boundingBox":"380,275,50,8","text":"BARBECUE"}]},{"boundingBox":"256,327,235,11","words":[{"boundingBox":"256,329,43,8","text":"BARRIGA"},{"boundingBox":"307,329,11,8","text":"DE"},{"boundingBox":"326,327,37,10","text":"LEITÃO"},{"boundingBox":"370,329,51,8","text":"CROCANTE"},{"boundingBox":"429,329,10,9","text":"//"},{"boundingBox":"448,329,30,9","text":"19.50"},{"boundingBox":"485,329,6,8","text":"€"}]},{"boundingBox":"281,351,151,9","words":[{"boundingBox":"281,351,38,9","text":"QUINOA"},{"boundingBox":"326,351,18,8","text":"cog"},{"boundingBox":"352,351,37,8","text":"FRUTOS"},{"boundingBox":"396,351,36,9","text":"SECOS."}]},{"boundingBox":"288,372,171,10","words":[{"boundingBox":"288,374,5,8","text":"E"},{"boundingBox":"300,374,31,8","text":"MOLHO"},{"boundingBox":"339,374,11,8","text":"DE"},{"boundingBox":"357,374,51,8","text":"MOSTARDA"},{"boundingBox":"416,374,11,8","text":"EH"},{"boundingBox":"434,372,25,10","text":"GRÃO"}]},{"boundingBox":"236,427,274,9","words":[{"boundingBox":"236,427,31,8","text":"PERNA"},{"boundingBox":"275,427,11,8","text":"DE"},{"boundingBox":"294,427,43,8","text":"BORREGO"},{"boundingBox":"345,427,12,8","text":"DE"},{"boundingBox":"365,427,30,8","text":"LEITE"},{"boundingBox":"402,427,38,8","text":"ASSADA"},{"boundingBox":"448,427,11,9","text":"//"},{"boundingBox":"466,427,31,9","text":"20.50"},{"boundingBox":"504,427,6,8","text":"€"}]},{"boundingBox":"297,448,152,11","words":[{"boundingBox":"297,448,24,10","text":"PURÉ"},{"boundingBox":"329,450,42,9","text":"BATATA."},{"boundingBox":"380,450,69,8","text":"RATATOUILLE"}]},{"boundingBox":"281,501,184,11","words":[{"boundingBox":"281,501,44,10","text":"TÁRTARO"},{"boundingBox":"332,503,12,8","text":"DE"},{"boundingBox":"352,503,43,8","text":"NOVILHO"},{"boundingBox":"403,503,11,9","text":"//"},{"boundingBox":"422,503,31,9","text":"16.50"},{"boundingBox":"460,503,5,8","text":"€"}]},{"boundingBox":"256,526,235,8","words":[{"boundingBox":"256,526,37,8","text":"PICADO"},{"boundingBox":"314,526,24,8","text":"FACA"},{"boundingBox":"345,526,18,8","text":"COH"},{"boundingBox":"370,526,57,8","text":"BATATINHA"},{"boundingBox":"434,526,57,8","text":"GAUFRETTE"}]},{"boundingBox":"236,579,274,9","words":[{"boundingBox":"236,579,25,8","text":"BIFE"},{"boundingBox":"268,579,12,8","text":"OA"},{"boundingBox":"287,579,32,8","text":"VAZIA"},{"boundingBox":"326,579,50,8","text":"GRELHADO"},{"boundingBox":"384,579,55,9","text":"(NOVILHO)"},{"boundingBox":"448,579,11,9","text":"//"},{"boundingBox":"466,579,31,9","text":"20.50"},{"boundingBox":"504,579,6,8","text":"€"}]},{"boundingBox":"336,602,74,9","words":[{"boundingBox":"336,602,23,9","text":"(4/-"},{"boundingBox":"367,602,18,8","text":"220"},{"boundingBox":"393,602,17,9","text":"GR)"}]},{"boundingBox":"211,625,323,9","words":[{"boundingBox":"211,625,18,8","text":"cog"},{"boundingBox":"236,625,38,8","text":"BATATA"},{"boundingBox":"281,625,12,8","text":"00"},{"boundingBox":"300,625,23,9","text":"DIA."},{"boundingBox":"332,625,38,8","text":"CEBOLA"},{"boundingBox":"377,625,76,8","text":"CARAMELIZADA"},{"boundingBox":"460,625,5,8","text":"E"},{"boundingBox":"473,625,61,9","text":"COGUMELOS."}]},{"boundingBox":"181,645,383,10","words":[{"boundingBox":"181,647,51,8","text":"MANTEIGA"},{"boundingBox":"239,645,25,10","text":"CAFÉ"},{"boundingBox":"272,647,11,8","text":"DE"},{"boundingBox":"291,647,30,8","text":"PARIS"},{"boundingBox":"329,647,11,8","text":"OU"},{"boundingBox":"348,647,38,8","text":"CROSTA"},{"boundingBox":"393,647,12,8","text":"DE"},{"boundingBox":"412,647,44,8","text":"PIMENTA"},{"boundingBox":"463,647,12,8","text":"OU"},{"boundingBox":"482,647,31,8","text":"MOLHO"},{"boundingBox":"521,645,43,10","text":"BEARNÊS"}]},{"boundingBox":"220,698,306,11","words":[{"boundingBox":"220,698,25,10","text":"COTE"},{"boundingBox":"252,700,12,8","text":"DE"},{"boundingBox":"272,700,30,8","text":"80EUF"},{"boundingBox":"310,700,50,8","text":"GRELHADO"},{"boundingBox":"367,700,18,8","text":"COM"},{"boundingBox":"394,700,24,8","text":"FLOR"},{"boundingBox":"425,700,12,8","text":"DE"},{"boundingBox":"444,700,18,8","text":"SAL"},{"boundingBox":"470,700,11,9","text":"//"},{"boundingBox":"489,700,24,9","text":"6.80"},{"boundingBox":"520,700,6,8","text":"€"}]},{"boundingBox":"333,723,80,9","words":[{"boundingBox":"333,723,30,9","text":"(CADA"},{"boundingBox":"371,723,18,8","text":"100"},{"boundingBox":"396,723,17,9","text":"GR)"}]},{"boundingBox":"214,746,318,9","words":[{"boundingBox":"214,746,18,8","text":"COH"},{"boundingBox":"239,746,38,8","text":"BATATA"},{"boundingBox":"284,746,12,8","text":"00"},{"boundingBox":"304,746,23,9","text":"DIA."},{"boundingBox":"335,746,38,8","text":"CEBOLA"},{"boundingBox":"380,746,76,8","text":"CARAMELIZADA"},{"boundingBox":"464,746,5,8","text":"E"},{"boundingBox":"476,746,56,8","text":"COGUMELOS"}]},{"boundingBox":"297,766,151,11","words":[{"boundingBox":"297,768,31,8","text":"MOLHO"},{"boundingBox":"335,766,44,10","text":"BEARNÊS"},{"boundingBox":"387,768,61,9","text":"(OPCIONAL)"}]},{"boundingBox":"285,791,176,9","words":[{"boundingBox":"285,791,75,9","text":"(RECOMENDADO"},{"boundingBox":"368,791,24,8","text":"PARA"},{"boundingBox":"399,791,5,8","text":"2"},{"boundingBox":"412,791,49,9","text":"PESSOAS)"}]}]}]}
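
The raw JSON is hard to read by eye, so here is a rough Python sketch that flattens an OCR response into lines of text and lists the words that contain accented letters. It assumes only the regions / lines / words / text nesting visible in the responses above. The function names are mine, not part of any SDK, and the sample is a trimmed excerpt in the shape of the second response, not the full payload.

```python
import json
import unicodedata


def lines_of_text(ocr_response):
    """Join the words of every OCR line into one string, in response order."""
    result = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            result.append(" ".join(word["text"] for word in line.get("words", [])))
    return result


def words_with_diacritics(ocr_response):
    """Return the recognized words that contain at least one accented letter."""
    found = []
    for region in ocr_response.get("regions", []):
        for line in region.get("lines", []):
            for word in line.get("words", []):
                text = word["text"]
                # NFD splits e.g. 'Ã' into 'A' plus a combining tilde,
                # so any combining mark signals an accented letter.
                decomposed = unicodedata.normalize("NFD", text)
                if any(unicodedata.combining(ch) for ch in decomposed):
                    found.append(text)
    return found


# Trimmed excerpt shaped like the language=unk response (illustrative only).
sample = json.loads(
    '{"language":"pt","orientation":"Up","regions":[{"lines":[{"words":['
    '{"boundingBox":"256,329,43,8","text":"BARRIGA"},'
    '{"boundingBox":"307,329,11,8","text":"DE"},'
    '{"boundingBox":"326,327,37,10","text":"LEITÃO"},'
    '{"boundingBox":"370,329,51,8","text":"CROCANTE"}]}]}]}'
)

print(sample["orientation"])
print(lines_of_text(sample))
print(words_with_diacritics(sample))
```

Running the same two helpers over both full responses makes the difference easy to see: the language=pt response reports "orientation":"Down" and yields no readable words, while the language=unk response reports "Up" and contains words such as LEITÃO and PURÉ.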

I don't know why unk works better than pt. You can reproduce this and compare your results. Hope this helps.
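
If you would rather reproduce the comparison from a script than from curl, something like the following Python sketch should be close. It uses only the standard library and mirrors the two curl calls above (same endpoint, headers and JSON body). The helper names build_ocr_request and run_ocr are made up for illustration, and the key is a placeholder you have to replace. I have only checked that the requests are built as expected, so treat the network part as untested.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl commands above; your region may differ.
ENDPOINT = "https://southeastasia.api.cognitive.microsoft.com/vision/v2.0/ocr"
IMAGE_URL = "https://i.imgur.com/o6i3UDH.png"


def build_ocr_request(image_url, language, subscription_key):
    """Build (but do not send) a POST request equivalent to the curl call."""
    query = urllib.parse.urlencode({"language": language})
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


def run_ocr(request):
    """Send the request and decode the JSON reply (needs a valid key and network)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# One request per language value compared in this answer.
requests_by_language = {
    language: build_ocr_request(IMAGE_URL, language, "<subscription key>")
    for language in ("pt", "unk")
}

for language, request in requests_by_language.items():
    print(language, request.full_url)
    # Uncomment once a real subscription key is in place:
    # print(run_ocr(request)["orientation"])
```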
