I'm using Apache Beam 2.9.0 on Google Cloud Dataflow.
When the pipeline tries to insert rows into BigQuery, it fails with a schema-related error:
RuntimeError: Could not successfully insert rows to BigQuery table [project:mlpipeline.twitter_posts]. Errors: [<InsertErrorsValueListEntry errors: [<ErrorProto debugInfo: u'' location: u'text' message: u'Invalid NUMERIC value: RT @Deep_In_Depth: 3 Advanced Python Functions for Data Scientists #DeepLearning #MachineLearning #ArtificialIntell\u2026' reason: u'invalid'>] index: 0>, <InsertErrorsValueListEntry errors: [<ErrorProto debugInfo: u'' location: u'text' message: u'Invalid NUMERIC value: What Is Deep Learning? #DeepLearning #MachineLearning #ArtificialIntelligence #DataScience #DL #ML #DS #AI #DNN #NeuralNetworks #NLP #GPU #TensorFlow #Keras #Pytorch #Python #HPC #Automation #AutonomousCar #Quant' reason: u'invalid'>] index: 1>] [while running 'generatedPtransform-20667']
at _flush_batch (/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py:1380)
at finish_bundle (/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py:1368)
at apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle (common.py:365)
at apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle (common.py:361)
at apache_beam.runners.common.DoFnRunner._invoke_bundle_method (common.py:697)
at apache_beam.runners.common.DoFnRunner._reraise_augmented (common.py:724)
at apache_beam.runners.common.DoFnRunner._invoke_bundle_method (common.py:699)
at apache_beam.runners.common.DoFnRunner.finish (common.py:705)
at apache_beam.runners.worker.operations.DoOperation.finish (operations.py:508)
at apache_beam.runners.worker.operations.DoOperation.finish (operations.py:507)
at apache_beam.runners.worker.operations.DoOperation.finish (operations.py:506)
at process_bundle (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py:441)
at process_bundle (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:237)
at do_instruction (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:221)
at <lambda> (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:170)
at _execute (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py:135)
Code:
# Import needed by the snippet (module path as of Beam 2.9.0)
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json

# Make explicit BQ schema for output tables
bigqueryschema_json = '{"fields": [' \
'{"name":"id","type":"STRING"},' \
'{"name":"text","type":"NUMERIC"},' \
'{"name":"user_id","type":"STRING"},' \
'{"name":"sentiment","type":"FLOAT"},' \
'{"name":"posted_at","type":"TIMESTAMP"}' \
']}'
bigqueryschema = parse_table_schema_from_json(bigqueryschema_json)
I have tried declaring the text field in bigqueryschema_json as STRING and as NUMERIC, but I get the same error either way.
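To make the mismatch easier to see outside Dataflow, here is a minimal local sketch that checks a row against the declared schema before it is sent. It uses only the standard library; `find_type_mismatches` and `sample_row` are hypothetical names made up for illustration, not part of Beam or BigQuery, and only NUMERIC and FLOAT fields are checked. Since the error persists after changing the declared type, one possibility worth checking is whether the existing table itself still has text as NUMERIC.

```python
import json
from decimal import Decimal, InvalidOperation

# Same schema string as in the snippet above ("text" is declared NUMERIC).
SCHEMA_JSON = (
    '{"fields": ['
    '{"name":"id","type":"STRING"},'
    '{"name":"text","type":"NUMERIC"},'
    '{"name":"user_id","type":"STRING"},'
    '{"name":"sentiment","type":"FLOAT"},'
    '{"name":"posted_at","type":"TIMESTAMP"}'
    ']}'
)


def find_type_mismatches(row, schema_json):
    """Return (field, declared_type, value) for values that do not parse
    as the declared type.

    Hypothetical helper: only NUMERIC and FLOAT are checked, every other
    type is skipped.
    """
    mismatches = []
    for field in json.loads(schema_json)["fields"]:
        name, declared = field["name"], field["type"]
        value = row.get(name)
        if value is None:
            continue
        try:
            if declared == "NUMERIC":
                Decimal(str(value))
            elif declared == "FLOAT":
                float(value)
        except (InvalidOperation, ValueError, TypeError):
            mismatches.append((name, declared, value))
    return mismatches


# Made-up row shaped like the failing records in the error message.
sample_row = {
    "id": "1",
    "text": "What Is Deep Learning? #DeepLearning",
    "user_id": "42",
    "sentiment": 0.5,
    "posted_at": "2019-01-01 00:00:00",
}

mismatches = find_type_mismatches(sample_row, SCHEMA_JSON)
print(mismatches)
```

With the schema above, the text field is reported because a tweet body cannot be parsed as a number. With text declared as STRING, the same row produces no mismatches.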
The full code is here:
Any pointers on how to debug this error?