Тип массива в clickhouseIO для apache beam (поток данных) - PullRequest
0 голосов
/ 06 февраля 2020

Я использую Apache Beam, чтобы потреблять json и вставлять в clickhouse.

В настоящее время у меня проблема с типом данных Array.

Все отлично работает перед добавлением тип массива поля

Schema.Field.of("inputs.value", Schema.FieldType.array(Schema.FieldType.INT64).withNullable(true))

Код для преобразований

p.apply(transformNameSuffix + "ReadFromPubSub",
        PubsubIO.readStrings().fromSubscription(chainConfig.getPubSubSubscriptionPrefix() + "transactions").withIdAttribute(PUBSUB_ID_ATTRIBUTE))
        .apply(transformNameSuffix + "ReadFromPubSub", ParDo.of(new DoFn<String, Row>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                String item = c.element();

                //System.out.print(item);
                Transaction transaction = JsonUtils.parseJson(item, Transaction.class);
                c.output(Row.withSchema(Schemas.TRANSACTIONS)
                        .addValues(*****,
                                   *****
                                   .......

                transaction.getInputValues()).build());}

        })).setRowSchema(Schemas.TRANSACTIONS).apply(
        ClickHouseIO.<Row>write(
                chainConfig.getClickhouseJDBCURI(),
                chainConfig.getTransactionsTable())
                .withMaxRetries(3)
                .withMaxInsertBlockSize(1)
                .withInitialBackoff(Duration.standardSeconds(5))
                .withInsertDeduplicate(true)
                .withInsertDistributedSync(false));

Метод, который генерирует входные данные

public List<Long> getInputValues() {
    List<Long> values = Lists.newArrayList();

    for (TransactionInput eachInput : inputs) {
        System.out.print(eachInput.getValue());
        values.add(eachInput.getValue());
    }

    return values;
}

Я получаю следующую ошибку:

ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 15. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:875)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:851)
    at ru.yandex.clickhouse.Writer.send(Writer.java:106)
    at ru.yandex.clickhouse.Writer.send(Writer.java:141)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:764)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:758)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.flush(ClickHouseIO.java:427)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.processElement(ClickHouseIO.java:411)
    at org.apache.beam.sdk.io.clickhouse.AutoValue_ClickHouseIO_WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:222)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
    at org.apache.beam.repackaged.direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:78)
    at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:216)
    at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:54)
    at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
    at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Throwable: Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 15. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:53)
    ... 22 more

Feb 06, 2020 9:04:38 PM org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn flush
WARNING: Error writing to ClickHouse. Retry attempt[1]
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 93. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:58)
    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:28)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.checkForErrorAndThrow(ClickHouseStatementImpl.java:875)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendStream(ClickHouseStatementImpl.java:851)
    at ru.yandex.clickhouse.Writer.send(Writer.java:106)
    at ru.yandex.clickhouse.Writer.send(Writer.java:141)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:764)
    at ru.yandex.clickhouse.ClickHouseStatementImpl.sendRowBinaryStream(ClickHouseStatementImpl.java:758)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.flush(ClickHouseIO.java:427)
    at org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn.processElement(ClickHouseIO.java:411)
    at org.apache.beam.sdk.io.clickhouse.AutoValue_ClickHouseIO_WriteFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:222)
    at org.apache.beam.repackaged.direct_java.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:183)
    at org.apache.beam.repackaged.direct_java.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:78)
    at org.apache.beam.runners.direct.ParDoEvaluator.processElement(ParDoEvaluator.java:216)
    at org.apache.beam.runners.direct.DoFnLifecycleManagerRemovingTransformEvaluator.processElement(DoFnLifecycleManagerRemovingTransformEvaluator.java:54)
    at org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
    at org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Throwable: Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 6. Bytes expected: 93. (version 19.17.4.11 (official build))

    at ru.yandex.clickhouse.except.ClickHouseExceptionSpecifier.specify(ClickHouseExceptionSpecifier.java:53)
    ... 22 more

Feb 06, 2020 9:04:39 PM org.apache.beam.sdk.io.clickhouse.ClickHouseIO$WriteFn flush
WARNING: Error writing to ClickHouse. Retry attempt[1]
ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 33, host: 35.202.46.77, port: 8123; Code: 33, e.displayText() = DB::Exception: Cannot read all data. Bytes read: 5. Bytes expected: 2641. (version 19.17.4.11 (official build)

Схема Clikhouse:

CREATE TABLE IF NOT EXISTS transactions_streaming_small ( 
  *****, 
  *****, 
  inputs Nested ( value Nullable(UInt64) ) ) 
ENGINE = MergeTree() PARTITION BY toYYYYMM(block_date_time)

В чем проблема?

[Версия ClickHouse 19.17.4.11 (официальная сборка)]

Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...