I have a Docker Swarm cluster. It runs Spark containers (1 master and 1 worker) and Hadoop containers (1 namenode and 1 datanode). I created the containers with the following docker-compose file:
version: "3"
services:
  master:
    image: singularities/spark
    command: start-spark master
    hostname: master
    networks:
      - overlay
    ports:
      - "6066:6066"
      - "7070:7070"
      - "8080:8080"
      - "50070:50070"
      - "7077:7077"
    deploy:
      placement:
        constraints:
          - node.role == manager
  worker:
    image: singularities/spark
    command: start-spark worker master
    networks:
      - overlay
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 4g
    links:
      - master
  namenode:
    image: sfedyakov/hadoop-271-cluster
    command: "/etc/bootstrap.sh -d -namenode"
    networks:
      - overlay
    hostname: namenode
    ports:
      - "8088:8088"
      - "50090:50090"
      - "19888:19888"
    deploy:
      placement:
        constraints:
          - node.role == manager
  datanode:
    image: sfedyakov/hadoop-271-cluster
    command: "/etc/bootstrap.sh -d -datanode"
    networks:
      - overlay
    links:
      - namenode
networks:
  overlay:
After the containers were created, I ran docker inspect <namenode container id>
to find the namenode's IP address. It returned the following:
"Networks": {
    "ingress": {
        "IPAMConfig": {
            "IPv4Address": "10.255.0.20"
        },
        "Links": null,
        "Aliases": [
            "b4ec63d0330c"
        ],
        "NetworkID": "etkd22i440xtnyedekv0769dw",
        "EndpointID": "5a76f2cb028d40e55ebe7e01688f13ec8f2176c4d134a7e6a2397ad1986eb9f2",
        "Gateway": "",
        "IPAddress": "10.255.0.20",
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:0a:ff:00:14",
        "DriverOpts": null
    },
    "spark_overlay": {
        "IPAMConfig": {
            "IPv4Address": "10.0.4.8"
        },
        "Links": null,
        "Aliases": [
            "b4ec63d0330c"
        ],
        "NetworkID": "07r7yh470ipyxy1vzc6b0j4g2",
        "EndpointID": "14996683ea1e30a8ed9f2ff75fbd1776786bbac01323176ad1dac6669cb150b9",
        "Gateway": "",
        "IPAddress": "10.0.4.8",
        "IPPrefixLen": 24,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:0a:00:04:08",
        "DriverOpts": null
    }
}
I wrote a simple WordCount example with Spark:
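import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").appName("test").getOrCreate()
// Read the input from HDFS, addressing the namenode by its overlay-network IP
val data = spark.sparkContext.textFile("hdfs://10.0.4.8:9000/Sample.txt")
// Split each line into words and count the occurrences of each word
val counts = data.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.foreach(println)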
However, it fails with the following error:
Caused by: java.net.URISyntaxException: Illegal character in hostname at index 12: hdfs://spark_namenode.1.ywlf9yx9hcm4duhxnywn91i35.spark_overlay:9000
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.parseHostname(URI.java:3387)
    at java.net.URI$Parser.parseServer(URI.java:3236)
    at java.net.URI$Parser.parseAuthority(URI.java:3155)
    at java.net.URI$Parser.parseHierarchical(URI.java:3097)
    at java.net.URI$Parser.parse(URI.java:3053)
    at java.net.URI.<init>(URI.java:673)
    at org.apache.hadoop.net.NetUtils.getCanonicalUri(NetUtils.java:270)
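
For what it's worth, index 12 in that message is the underscore in spark_namenode. Below is a minimal Scala sketch that uses only the JDK and seems to reproduce the same exception outside Spark. It assumes that the java.net.URI constructor on the stack trace is the multi-argument one, which requires a syntactically valid server hostname and does not accept underscores there. The object name UnderscoreHostCheck and the helper canonicalUri are hypothetical, introduced only for illustration. The sketch does not show how the Docker-generated name ends up in the URI in the first place.

```scala
import java.net.URI
import scala.util.{Failure, Success, Try}

object UnderscoreHostCheck {
  // Hypothetical helper: builds hdfs://host:port with the multi-argument
  // java.net.URI constructor, which insists on a valid server hostname.
  def canonicalUri(host: String, port: Int): Try[URI] =
    Try(new URI("hdfs", null, host, port, null, null, null))

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "spark_namenode.1.ywlf9yx9hcm4duhxnywn91i35.spark_overlay", // name from the error message
      "namenode",                                                  // hostname set in the compose file
      "10.0.4.8"                                                   // overlay IP from docker inspect
    )
    // Report which host strings the constructor accepts and which it rejects
    candidates.foreach { host =>
      canonicalUri(host, 9000) match {
        case Success(uri) => println(s"accepted: $uri")
        case Failure(e)   => println(s"rejected: ${e.getMessage}")
      }
    }
  }
}
```

If this assumption holds, only the first candidate is rejected, which would suggest the problem lies in the underscore of the generated container name rather than in the IP address or port.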