I am trying to get data from a Kafka topic into HDFS by following https://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/. These are the steps I follow:
start zookeeper
$ zookeeper-server-start.bat C:\Users\name\kafka_2.11-1.1.0\config\zookeeper.properties
start kafka
$ kafka-server-start.bat C:\Users\name\kafka_2.11-1.1.0\config\server.properties
create kafka topic, if it does not exist
$ kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
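I assume the topic also needs some test data in it; e.g. a few messages could be typed in by hand with the console producer (just an assumption on my side):
$ kafka-console-producer.bat --broker-list localhost:9092 --topic test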
start hadoop
$ C:\Users\name\hadoop-3.1.3\sbin\start-all.cmd
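To check that HDFS actually came up, something like this could be used (not sure whether this check is required):
$ jps
$ hdfs dfs -ls /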
create a kafka-hdfs.pull file in GOBBLIN_JOB_CONFIG_DIR as shown below
job.group=GobblinKafka
job.description=Gobblin quick start job for Kafka
job.lock.enabled=false
kafka.brokers=localhost:9092
source.class=org.apache.gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=org.apache.gobblin.extract.kafka
writer.builder.class=org.apache.gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
mr.job.max.mappers=1
metrics.reporting.file.enabled=true
metrics.log.dir=/gobblin-kafka/metrics
metrics.reporting.file.suffix=txt
bootstrap.with.offset=earliest
fs.uri=hdfs://localhost:9000
writer.fs.uri=hdfs://localhost:9000
state.store.fs.uri=hdfs://localhost:9000
mr.job.root.dir=/gobblin-kafka/working
state.store.dir=/gobblin-kafka/state-store
task.data.root.dir=/jobs/kafkaetl/gobblin/gobblin-kafka/task-data
data.publisher.final.dir=/gobblintest/job-output
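I am not sure whether the HDFS directories referenced in this config need to exist beforehand; if so, I assume they could be created with something like:
$ hdfs dfs -mkdir -p /gobblin-kafka/working /gobblin-kafka/state-store /gobblin-kafka/metrics
$ hdfs dfs -mkdir -p /jobs/kafkaetl/gobblin/gobblin-kafka/task-data /gobblintest/job-output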
set GOBBLIN_WORK_DIR
$ export GOBBLIN_WORK_DIR=/mnt/c/users/name/incubator-gobblin/GOBBLIN_WORK_DIR
set GOBBLIN_JOB_CONFIG_DIR
run the standalone service
$ bin/gobblin.sh service standalone start
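I assume the output should then appear under data.publisher.final.dir, which could be checked with:
$ hdfs dfs -ls /gobblintest/job-output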
Below are some of the errors found in logs/standalone.out:
[JobScheduler-0] org.apache.gobblin.scheduler.JobScheduler$NonScheduledJobRunner 637 - Failed to run job GobblinKafkaQuickStart
org.apache.gobblin.runtime.JobException: Failed to run job GobblinKafkaQuickStart
ERROR [ForkExecutor-0] org.apache.gobblin.runtime.fork.Fork 258 - Fork 0 of task task_GobblinKafkaQuickStart_1580883582897_0 failed to process data records. Set throwable in holder org.apache.gobblin.runtime.ForkThrowableHolder@721ea24d
java.lang.RuntimeException: Error creating writer
ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task 545 - Task task_GobblinKafkaQuickStart_1580883582897_0 failed
java.lang.RuntimeException: Some forks failed.
ERROR [Commit-thread-0] org.apache.gobblin.runtime.SafeDatasetCommit 196 - Failed to persist dataset state for dataset of job job_GobblinKafkaQuickStart_1580883582897
org.apache.hadoop.security.AccessControlException: Permission denied: user=name, access=WRITE, inode="/":name:supergroup:drwxrwxr-x
ERROR [JobScheduler-0] org.apache.gobblin.util.executors.IteratorExecutor 163 - Iterator executor failure.
java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: Permission denied: user=name, access=WRITE, inode="/":name:supergroup:drwxrwxr-x
ERROR [JobScheduler-0] org.apache.gobblin.runtime.AbstractJobLauncher 521 - Failed to launch and run job job_GobblinKafkaQuickStart_1580883582897: java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart_1580883582897
java.io.IOException: Failed to commit dataset state for some dataset(s) of job job_GobblinKafkaQuickStart_1580883582897
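The AccessControlException above mentions inode "/" with user=name, so I assume the permissions could be inspected and, if needed, relaxed with something like this, but I am not sure it is the right fix:
$ hdfs dfs -ls /
$ hdfs dfs -chmod 777 /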
Please tell me how I can solve this problem.