Apache Beam: error when inserting into BigQuery - PullRequest
October 20, 2019

I am running Apache Beam 2.9.0 on Google Cloud Dataflow.

When the pipeline tries to insert rows into BigQuery, it fails with a schema-related error:

RuntimeError: Could not successfully insert rows to BigQuery table [project:mlpipeline.twitter_posts]. Errors: [<InsertErrorsValueListEntry errors: [<ErrorProto debugInfo: u'' location: u'text' message: u'Invalid NUMERIC value: RT @Deep_In_Depth: 3 Advanced Python Functions for Data Scientists #DeepLearning #MachineLearning #ArtificialIntell\u2026' reason: u'invalid'>] index: 0>, <InsertErrorsValueListEntry errors: [<ErrorProto debugInfo: u'' location: u'text' message: u'Invalid NUMERIC value: What Is Deep Learning? #DeepLearning #MachineLearning #ArtificialIntelligence #DataScience #DL #ML #DS #AI #DNN #NeuralNetworks #NLP #GPU #TensorFlow #Keras #Pytorch #Python #HPC #Automation #AutonomousCar #Quant' reason: u'invalid'>] index: 1>] [while running 'generatedPtransform-20667']
at _flush_batch (/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py:1380)
at finish_bundle (/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py:1368)
at apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle (common.py:365)
at apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle (common.py:361)
at apache_beam.runners.common.DoFnRunner._invoke_bundle_method (common.py:697)
at apache_beam.runners.common.DoFnRunner._reraise_augmented (common.py:724)
at apache_beam.runners.common.DoFnRunner._invoke_bundle_method (common.py:699)
at apache_beam.runners.common.DoFnRunner.finish (common.py:705)
at apache_beam.runners.worker.operations.DoOperation.finish (operations.py:508)
at apache_beam.runners.worker.operations.DoOperation.finish (operations.py:507)
at apache_beam.runners.worker.operations.DoOperation.finish (operations.py:506)
at process_bundle (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py:441)
at process_bundle (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:237)
at do_instruction (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:221)
at <lambda> (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:170)
at _execute (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:135)

Code:

# Make explicit BQ schema for output tables
bigqueryschema_json = '{"fields": [' \
                      '{"name":"id","type":"STRING"},' \
                      '{"name":"text","type":"NUMERIC"},' \
                      '{"name":"user_id","type":"STRING"},' \
                      '{"name":"sentiment","type":"FLOAT"},' \
                      '{"name":"posted_at","type":"TIMESTAMP"}' \
                      ']}'
bigqueryschema = parse_table_schema_from_json(bigqueryschema_json)

I have tried setting the text field in bigqueryschema_json to both STRING and NUMERIC, but the problem is the same either way.

Full code here:

Any pointers on how to debug this error?
