Я использую kafka connect HDFS-приемник и Hadoop (для HDFS) в docker-compose.
Hadoop (namenode и datanode), кажется, работает правильно.
Но у меня ошибка сраковина kafka connect:
ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED
(io.confluent.connect.hdfs.TopicPartitionWriter:277)
org.apache.kafka.connect.errors.DataException:
Error creating writer for log file hdfs://namenode:8020/logs/MyTopic/0/log
Для информации:
Сервисы Hadoop в моем docker-compose.yml:
namenode:
image: uhopper/hadoop-namenode:2.8.1
hostname: namenode
container_name: namenode
ports:
- "50070:50070"
networks:
default:
fides-webapp:
aliases:
- "hadoop"
volumes:
- namenode:/hadoop/dfs/name
env_file:
- ./hadoop.env
environment:
- CLUSTER_NAME=hadoop-cluster
datanode1:
image: uhopper/hadoop-datanode:2.8.1
hostname: datanode1
container_name: datanode1
networks:
default:
fides-webapp:
aliases:
- "hadoop"
volumes:
- datanode1:/hadoop/dfs/data
env_file:
- ./hadoop.env
И мой файл kafka-connect:
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=MyTopic
hdfs.url=hdfs://namenode:8020
flush.size=3
РЕДАКТИРОВАТЬ:
Я добавляю переменную env для kafka connect, чтобы знать имя кластера (envпеременная: CLUSTER_NAME, добавляемая в службу соединения kafka в файле создания Docker).
Ошибка не та (и кажется, что она решает проблему):
INFO Starting commit and rotation for topic partition scoring-topic-0 with start offsets {partition=0=0} and end offsets {partition=0=2}
(io.confluent.connect.hdfs.TopicPartitionWriter:368)
ERROR Exception on topic partition MyTopic-0: (io.confluent.connect.hdfs.TopicPartitionWriter:403)
org.apache.kafka.connect.errors.DataException: org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /topics/+tmp/MyTopic/partition=0/bc4cf075-ccfa-4338-9672-5462cc6c3404_tmp.avro
could only be replicated to 0 nodes instead of minReplication (=1).
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
EDIT2:
Файл hadoop.env
:
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
# Configure default BlockSize and Replication for local
# data. Keep it small for experimentation.
HDFS_CONF_dfs_blocksize=1m
YARN_CONF_yarn_log___aggregation___enable=true
YARN_CONF_yarn_resourcemanager_recovery_enabled=true
YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
YARN_CONF_yarn_resourcemanager_fs_state___store_uri=/rmstate
YARN_CONF_yarn_nodemanager_remote___app___log___dir=/app-logs
YARN_CONF_yarn_log_server_url=http://historyserver:8188/applicationhistory/logs/
YARN_CONF_yarn_timeline___service_enabled=true
YARN_CONF_yarn_timeline___service_generic___application___history_enabled=true
YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
YARN_CONF_yarn_timeline___service_hostname=historyserver