NiFi ConvertAvroToORC processor cannot convert an Avro file containing an array whose items are a union of float and an array of floats
1 December 2018
I have the following Avro schema:
{
  "namespace": "nifi",
  "name": "cgp_batch",
  "type": "record",
  "fields": [
    {
      "name": "values",
      "type": {
        "type": "array",
        "items": {
          "type": "array",
          "items": ["float", {"type": "array", "items": ["float", "string", "null"]}]
        }
      }
    }
  ]
}
I have a JSON file:
{"values": [[1, 1.1, 1.2, 1.3, [-1, -1.1, -1.2, -1.3], -2, 3], [2, 2.1, 2.2, 2.3, [-2, -2.1, -2.2, -2.3], -3, 4]]}
I have the following NiFi process group (see the figure). GetFile simply picks up the JSON shown above. ConvertRecord converts that JSON to Avro, reading it with JsonTreeReader and writing it with AvroRecordSetWriter. Both JsonTreeReader and AvroRecordSetWriter use a schema registry, AvroSchemaRegistry, which holds the Avro schema shown above.

[![NiFi process group][1]][1]

At the Avro-to-ORC conversion step, NiFi throws an exception:

2018-10-17 13:51:56,809 ERROR [Timer-Driven Process Thread-8] o.a.n.processors.hive.ConvertAvroToORC ConvertAvroToORC[id=814f08dc-0166-1000-a46c-f69042e8ae94] ConvertAvroToORC[id=814f08dc-0166-1000-a46c-f69042e8ae94] failed to process session due to java.lang.IllegalArgumentException: Object Type for class org.apache.avro.generic.GenericData$Array not in Union declaration; Processor Administratively Yielded for 1 sec: java.lang.IllegalArgumentException: Object Type for class org.apache.avro.generic.GenericData$Array not in Union declaration
java.lang.IllegalArgumentException: Object Type for class org.apache.avro.generic.GenericData$Array not in Union declaration
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:88)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.lambda$convertToORCObject$7(NiFiOrcUtils.java:149)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:149)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.lambda$convertToORCObject$7(NiFiOrcUtils.java:149)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:149)
    at org.apache.nifi.processors.hive.ConvertAvroToORC.lambda$onTrigger$0(ConvertAvroToORC.java:245)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2885)
    at org.apache.nifi.processors.hive.ConvertAvroToORC.onTrigger(ConvertAvroToORC.java:209)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-10-17 13:51:56,812 WARN [Timer-Driven Process Thread-8] o.a.n.controller.tasks.ConnectableTask Administratively Yielding ConvertAvroToORC[id=814f08dc-0166-1000-a46c-f69042e8ae94] due to uncaught Exception: java.lang.IllegalArgumentException: Object Type for class org.apache.avro.generic.GenericData$Array not in Union declaration
java.lang.IllegalArgumentException: Object Type for class org.apache.avro.generic.GenericData$Array not in Union declaration
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:88)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.lambda$convertToORCObject$7(NiFiOrcUtils.java:149)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:149)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.lambda$convertToORCObject$7(NiFiOrcUtils.java:149)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hadoop.hive.ql.io.orc.NiFiOrcUtils.convertToORCObject(NiFiOrcUtils.java:149)
    at org.apache.nifi.processors.hive.ConvertAvroToORC.lambda$onTrigger$0(ConvertAvroToORC.java:245)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2885)
    at org.apache.nifi.processors.hive.ConvertAvroToORC.onTrigger(ConvertAvroToORC.java:209)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2018-10-17 13:52:01,247 INFO [NiFi Web Server-214] o.a.n.c.s.StandardProcessScheduler Stopping ConvertRecord[id=814a4abe-0166-1000-0755-4d43aef3dc4a]
2018-10-17 13:52:01,247 INFO [NiFi Web Server-214] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.standard.ConvertRecord
2018-10-17 13:52:01,247 INFO [Timer-Driven Process Thread-8] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling ConvertRecord[id=814a4abe-0166-1000-0755-4d43aef3dc4a] to run
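
My rough reading of the trace, as a toy model only: the two nested `lambda$convertToORCObject$7` frames look like the outer and inner arrays being walked, and the failure happens when an element of the inner array has to be matched against the union. The sketch below is not the NiFi source. `UNION_BRANCHES` and `convert` are hypothetical names, and the idea that only the primitive branch can be matched from the value's runtime class is an assumption made for illustration.

```python
# Toy model only: NOT the NiFi implementation. All names are hypothetical.
# Assumption: the lookup can match only the primitive branch from a value's class.
UNION_BRANCHES = ("float",)

def convert(value, depth=0):
    """Walk two list levels, then match each element against the union."""
    if depth < 2:
        # Outer array, then inner array: convert every element.
        return [convert(item, depth + 1) for item in value]
    # Depth 2: the element is a union member; derive a type name from its class.
    kind = "float" if isinstance(value, (int, float)) else type(value).__name__
    if kind not in UNION_BRANCHES:
        raise ValueError(f"Object Type for class {kind} not in Union declaration")
    return float(value)

# A shortened version of the sample data: numbers plus one nested array.
values = [[1, 1.1, [-1, -1.1], -2]]
try:
    convert(values)
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

In this model, rows containing only numbers convert cleanly, and the first nested array triggers the error, which matches the point in the trace where the exception is raised.
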

So, could you please tell me where I went wrong?

Environment: OS: SUSE Linux Enterprise Server 12 SP3 (release 12.3) or Windows 7 Corporate SP1; NiFi version: 1.7.1

...