Gobblin error: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart
5 February 2020

I am trying to ingest data from a Kafka topic into HDFS, following https://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/

These are the steps I am following:

Start ZooKeeper
$ zookeeper-server-start.bat C:\Users\name\kafka_2.11-1.1.0\config\zookeeper.properties

Start Kafka
$ kafka-server-start.bat C:\Users\name\kafka_2.11-1.1.0\config\server.properties

Create the Kafka topic if it does not exist
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Start Hadoop
$ C:\Users\name\hadoop-3.1.3\sbin\start-all.cmd

Create the file kafka-hdfs.pull in GOBBLIN_JOB_CONFIG_DIR with the following contents

job.group=GobblinKafka
job.description=Gobblin quick start job for Kafka
job.lock.enabled=false

kafka.brokers=localhost:9092

source.class=org.apache.gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=org.apache.gobblin.extract.kafka

writer.builder.class=org.apache.gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher

mr.job.max.mappers=1

metrics.reporting.file.enabled=true
metrics.log.dir=/gobblin-kafka/metrics
metrics.reporting.file.suffix=txt

bootstrap.with.offset=earliest

fs.uri=hdfs://localhost:9000
writer.fs.uri=hdfs://localhost:9000
state.store.fs.uri=hdfs://localhost:9000

mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output

Set GOBBLIN_WORK_DIR
$ export GOBBLIN_WORK_DIR=/mnt/c/users/name/incubator-gobblin/GOBBLIN_WORK_DIR

Set GOBBLIN_JOB_CONFIG_DIR

Start Gobblin in standalone mode
$ bin/gobblin.sh service standalone start
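
The directory settings in kafka-hdfs.pull are all absolute HDFS paths, so it may help to see which top-level directories the job would need under "/". The following is a minimal sketch, not part of Gobblin: the helper names `parse_properties` and `top_level_dirs` are hypothetical, and the four keys are simply the directory settings copied from the job file. Whether these directories already exist in HDFS is not shown here; if they do not, creating them would require write access to "/".

```python
from pathlib import PurePosixPath

# Directory settings copied from kafka-hdfs.pull (the selection is an assumption).
DIR_KEYS = (
    "mr.job.root.dir",
    "state.store.dir",
    "task.data.root.dir",
    "data.publisher.final.dir",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def top_level_dirs(props):
    """Return the first path component under "/" for each configured directory."""
    tops = set()
    for key in DIR_KEYS:
        if key in props:
            # e.g. ('/', 'gobblin-kafka', 'working')
            parts = PurePosixPath(props[key]).parts
            if len(parts) > 1:
                tops.add("/" + parts[1])
    return sorted(tops)


JOB_FILE = """
mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output
"""

print(top_level_dirs(parse_properties(JOB_FILE)))
# prints ['/gobblin-kafka', '/gobblintest', '/jobs']
```

Each of the three printed directories sits directly under "/", which is the inode named in the permission error.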

Below are some of the errors found in logs/standalone.out

[JobScheduler-0] org.apache.gobblin.scheduler.JobScheduler$NonScheduledJobRunner  637 - Failed to run job GobblinKafkaQuickStart
org.apache.gobblin.runtime.JobException: Failed to run job GobblinKafkaQuickStart

ERROR [ForkExecutor-0] org.apache.gobblin.runtime.fork.Fork  258 - Fork 0 of task task_GobblinKafkaQuickStart_1580883582897_0 failed to process data records. Set throwable in holder org.apache.gobblin.runtime.ForkThrowableHolder@721ea24d
java.lang.RuntimeException: Error creating writer

ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task  545 - Task task_GobblinKafkaQuickStart_1580883582897_0 failed
java.lang.RuntimeException: Some forks failed.

ERROR [Commit-thread-0] org.apache.gobblin.runtime.SafeDatasetCommit  196 - Failed to persist dataset state for dataset  of job job_GobblinKafkaQuickStart_1580883582897
org.apache.hadoop.security.AccessControlException: Permission denied: user=name, access=WRITE, inode="/":name:supergroup:drwxrwxr-x

ERROR [JobScheduler-0] org.apache.gobblin.util.executors.IteratorExecutor  163 - Iterator executor failure.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=name, access=WRITE, inode="/":name:supergroup:drwxrwxr-x

ERROR [JobScheduler-0] org.apache.gobblin.runtime.AbstractJobLauncher  521 - Failed to launch and run job job_GobblinKafkaQuickStart_1580883582897: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart_1580883582897
java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart_1580883582897 
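
To read the AccessControlException more easily, the message can be split into its fields. This is a rough sketch only: `explain_denial` is a hypothetical helper, the regular expression is written against the message format seen in this log, and it assumes the usual owner/group/other permission model without ACLs or superuser handling. The user's group membership is not in the log, so it is passed in as an optional argument.

```python
import re

# Pattern for the "Permission denied" message format seen in the log above.
DENIED = re.compile(
    r'Permission denied: user=(?P<user>[^,]+), access=(?P<access>\w+), '
    r'inode="(?P<path>[^"]*)":(?P<owner>[^:]+):(?P<group>[^:]+):'
    r'(?P<mode>[d-][rwxtTsS-]{9})'
)


def explain_denial(line, user_groups=()):
    """Report which permission class (owner, group, other) applies to the user."""
    m = DENIED.search(line)
    if m is None:
        return None
    info = m.groupdict()
    mode = info["mode"][1:]  # drop the leading file-type character
    if info["user"] == info["owner"]:
        cls, bits = "owner", mode[0:3]
    elif info["group"] in user_groups:
        cls, bits = "group", mode[3:6]
    else:
        cls, bits = "other", mode[6:9]
    flag = {"READ": "r", "WRITE": "w", "EXECUTE": "x"}.get(info["access"])
    info["applies"] = cls
    info["bits"] = bits
    # None when the access type is not a single r/w/x flag.
    info["allowed_by_mode"] = (flag in bits) if flag else None
    return info


LINE = ('org.apache.hadoop.security.AccessControlException: Permission denied: '
        'user=name, access=WRITE, inode="/":name:supergroup:drwxrwxr-x')

result = explain_denial(LINE)
print(result["path"], result["applies"], result["bits"], result["allowed_by_mode"])
```

For the line exactly as pasted, the user and the owner are both `name`, so the owner bits `rwx` would apply and would normally permit WRITE. That does not match the denial, so the user and owner in the unedited log may differ from what is shown here; if they do, the `other` bits `r-x` would apply, and those do not include write.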

Could you please tell me how I can resolve this issue?

...