Combining Google AutoML Vision and Vision API OCR
16 January 2020

I created an AutoML dataset for custom label detection, and it works fine. However, this API returns only the label names. My labels consist of text, so I also need what is written inside each label. How can I combine these two components so that both pieces of information end up in a single JSON output?

For example, this is the AutoML output from my own model (I need the result in this format, as JSON):

    payload {
      annotation_spec_id: "4732177099668848640"
      image_object_detection {
        bounding_box {
          normalized_vertices {
            x: 0.029733024537563324
            y: 0.2874366343021393
          }
          normalized_vertices {
            x: 0.33260250091552734
            y: 0.3122401535511017
          }
        }
        score: 0.8710569143295288
      }
      display_name: "f_name"
    }
    payload {
      annotation_spec_id: "4732177099668848640"
      image_object_detection {
        bounding_box {
          normalized_vertices {
            x: 0.038460467010736465
            y: 0.8654949069023132
          }
          normalized_vertices {
            x: 0.3702985942363739
            y: 0.8889270424842834
          }
        }
        score: 0.8308634757995605
      }
      display_name: "price"
    }
    payload {
      annotation_spec_id: "4732177099668848640"
      image_object_detection {
        bounding_box {
          normalized_vertices {
            x: 0.026972321793437004
            y: 0.20658300817012787
          }
          normalized_vertices {
            x: 0.43270379304885864
            y: 0.23540039360523224
          }
        }
        score: 0.8028228878974915
      }
      display_name: "ingds"
    }
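
Each AutoML box above is given by two `normalized_vertices` (opposite corners, with values between 0 and 1), while the OCR bounds further down are in pixels. To compare the two, one option is to scale the normalized corners by the image size. The following is a minimal sketch of that step; `to_pixel_box` is a hypothetical helper, and the 460x700 image size is an assumption made only for illustration, since the real dimensions of the image are not given here.

```python
def to_pixel_box(normalized_vertices, image_width, image_height):
    """Scale a two-corner normalized box to integer pixel coordinates.

    normalized_vertices: [(x, y), (x, y)] with values in [0, 1].
    Returns (left, top, right, bottom) in pixels.
    """
    (x0, y0), (x1, y1) = normalized_vertices
    # Sort so the result is well-formed even if the corners arrive swapped.
    left, right = sorted((x0 * image_width, x1 * image_width))
    top, bottom = sorted((y0 * image_height, y1 * image_height))
    return round(left), round(top), round(right), round(bottom)


# Corners of the "f_name" box from the output above.
f_name_corners = [
    (0.029733024537563324, 0.2874366343021393),
    (0.33260250091552734, 0.3122401535511017),
]
# ASSUMPTION: a 460x700 pixel image; substitute the real image size.
f_name_box = to_pixel_box(f_name_corners, 460, 700)
print(f_name_box)
```

In practice the width and height would be read from the image file itself (for example with an imaging library) rather than hard-coded.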

Output of the Google Vision API OCR (I don't need this format, but I do need this information):

    Texts:

    "Serinleten İçecekler / Drinks
    Su / Water
    t4.00
    Maden Suyu / Soda water
    +5.00
    Ayran / Yogurt Drink
    +8.00
    Meşrubat Çeşitleri / Fizzy Drinks
    t8.00
    Sikma Portakal Suyu
    Meyve Suları / Fruit Juices
    +8.00
    Ice Tea / Ice Tea
    +8.00
    Şalgam Suyu / Turnip Juice
    t8.00
    Portakal Suyu / Fresh Orange
    t15.00
    Sıkma Nar Suyu Pomegranate Juice
    t15.00
    Komposto / Compote
    t15.00
    Limonata
    Limonata / Fresh Lemonade
    t12.00
    "
    bounds: (12,21),(451,21),(451,679),(12,679)

    "Serinleten"
    bounds: (44,27),(168,26),(168,42),(44,43)

    "İçecekler"
    bounds: (178,22),(291,21),(291,45),(178,46)

    "/"
    bounds: (301,23),(308,23),(308,41),(301,41)

    "Drinks"
    bounds: (318,28),(371,28),(371,39),(318,39)

    "Su"
    bounds: (16,94),(46,94),(46,105),(16,105)

    "/"
    bounds: (48,91),(56,91),(56,108),(48,108)

    "Water"
    bounds: (58,91),(89,91),(89,108),(58,108)

    "t4.00"
    bounds: (285,92),(328,92),(328,103),(285,103)

    "Maden"
    bounds: (15,148),(72,148),(72,159),(15,159)

    "Suyu"
    bounds: (79,148),(119,148),(119,160),(79,160)

    "/"
    bounds: (126,148),(131,148),(131,159),(126,159)

    "Soda"
    bounds: (137,151),(167,151),(167,159),(137,159)

    "water"
    bounds: (173,152),(208,152),(208,159),(173,159)

    "+5.00"
    bounds: (287,148),(330,148),(330,159),(287,159)

    "Ayran"
    bounds: (15,203),(63,203),(63,217),(15,217)

    "/"
    bounds: (70,203),(75,203),(75,215),(70,215)

    "Yogurt"
    bounds: (81,207),(122,207),(122,217),(81,217)

    "Drink"
    bounds: (128,207),(161,207),(161,215),(128,215)

    "+8.00"
    bounds: (288,205),(331,205),(331,217),(288,217)

    "Meşrubat"
    bounds: (15,258),(94,259),(94,275),(15,274)

    "Çeşitleri"
    bounds: (101,259),(169,260),(169,275),(101,274)

    "/"
    bounds: (175,260),(180,260),(180,272),(175,272)

    "Fizzy"
    bounds: (187,263),(218,263),(218,274),(187,274)

    "Drinks"
    bounds: (224,263),(263,263),(263,273),(224,273)

Here is my code:

    from google.cloud import automl_v1beta1
    from google.cloud import vision


    def get_prediction(content, project_id, model_id):
        """Send the image bytes to the AutoML model and return its prediction."""
        prediction_client = automl_v1beta1.PredictionServiceClient()

        name = 'projects/{}/locations/us-central1/models/{}'.format(project_id, model_id)
        payload = {'image': {'image_bytes': content}}
        params = {}
        response = prediction_client.predict(name, payload, params)

        return response


    def detect_text(content):
        """Run Vision API OCR on the image bytes and print each text with its bounds."""
        client = vision.ImageAnnotatorClient()

        image = vision.types.Image(content=content)

        response = client.text_detection(image=image)
        texts = response.text_annotations
        print('Texts: ')

        for text in texts:
            print('\n"{}"'.format(text.description))
            vertices = ['({},{})'.format(vertex.x, vertex.y)
                        for vertex in text.bounding_poly.vertices]

            print('bounds: {}'.format(','.join(vertices)))


    if __name__ == '__main__':
        file_path = "testMenu.jpg"
        project_id = "*********"
        model_id = "********"

        with open(file_path, 'rb') as ff:
            content = ff.read()

        print(get_prediction(content, project_id, model_id))
        # detect_text prints its results itself and returns None,
        # so it is called directly instead of being wrapped in print().
        detect_text(content)
...
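
One possible way to combine the two results is to assign each OCR word to the AutoML box that contains the word's center, then serialize the merged records as JSON. The sketch below works on plain Python data and makes no API calls. It assumes the AutoML boxes have already been scaled to pixel coordinates (normalized x times image width, normalized y times image height) and that the first OCR annotation, which holds the full text of the image, has been skipped. The helper names `box_center` and `attach_text`, the record layout, and the pixel box for `f_name` (computed for an assumed 460x700 image) are all hypothetical.

```python
import json


def box_center(vertices):
    """Return the (x, y) center of a polygon given as a list of (x, y) points."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def attach_text(labels, words):
    """Attach to each label the OCR words whose center lies inside its box.

    labels: [{"display_name": str, "score": float,
              "box": (left, top, right, bottom)}]   # pixel coordinates
    words:  [{"text": str, "vertices": [(x, y), ...]}]  # in reading order
    """
    merged = []
    for label in labels:
        left, top, right, bottom = label["box"]
        inside = []
        for word in words:
            cx, cy = box_center(word["vertices"])
            if left <= cx <= right and top <= cy <= bottom:
                inside.append(word["text"])
        merged.append({
            "display_name": label["display_name"],
            "score": label["score"],
            "text": " ".join(inside),
        })
    return merged


# ASSUMPTION: the "f_name" box scaled for a 460x700 image.
labels = [{"display_name": "f_name", "score": 0.8710569143295288,
           "box": (14, 201, 153, 219)}]
# Word bounds copied from the OCR output in the question.
words = [
    {"text": "Ayran", "vertices": [(15, 203), (63, 203), (63, 217), (15, 217)]},
    {"text": "/", "vertices": [(70, 203), (75, 203), (75, 215), (70, 215)]},
    {"text": "Yogurt", "vertices": [(81, 207), (122, 207), (122, 217), (81, 217)]},
    {"text": "Drink", "vertices": [(128, 207), (161, 207), (161, 215), (128, 215)]},
    {"text": "+8.00", "vertices": [(288, 205), (331, 205), (331, 217), (288, 217)]},
]

result = attach_text(labels, words)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

Testing the word center rather than requiring the whole word to fit keeps a word like "Drink", which slightly overhangs the box edge, attached to its label. A stricter or looser rule (full containment, or an overlap ratio) may suit other layouts better.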