У меня возникает следующая проблема при попытке запустить Spark для kubernetes , когда файл приложения хранится в контейнере хранилища BLOB-объектов Azure:
2018-10-18 08:48:54 INFO DAGScheduler:54 - Job 0 failed: reduce at SparkPi.scala:38, took 1.743177 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 10.244.1.11, executor 2): org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1086)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.initialize(AzureNativeFileSystemStore.java:538)
at org.apache.hadoop.fs.azure.NativeAzureFileSystem.initialize(NativeAzureFileSystem.java:1366)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3242)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3291)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3259)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:470)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1897)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:694)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:476)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:755)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:747)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:747)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.azure.AzureException: No credentials found for account datasets83d858296fd0c49b.blob.core.windows.net in the configuration, and its container datasets is not accessible using anonymous credentials. Please check if the container exists first. If it is not publicly available, you have to provide account credentials.
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.connectUsingAnonymousCredentials(AzureNativeFileSystemStore.java:863)
at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.createAzureStorageSession(AzureNativeFileSystemStore.java:1081)
... 24 more
Команда, которую я использую длязапустите задание:
/opt/spark/bin/spark-submit
--master k8s://<my-k8s-master>
--deploy-mode cluster
--name spark-pi
--class org.apache.spark.examples.SparkPi
--conf spark.executor.instances=5
--conf spark.kubernetes.container.image=<my-image-built-with-wasb>
--conf spark.kubernetes.namespace=<my-namespace>
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
--conf spark.kubernetes.driver.secrets.spark=/opt/spark/conf
--conf spark.kubernetes.executor.secrets.spark=/opt/spark/conf
wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar 10000
У меня есть секрет k8s с именем spark
со следующим содержимым:
apiVersion: v1
kind: Secret
metadata:
name: spark
labels:
app: spark
stack: service
type: Opaque
data:
core-site.xml: |-
{% filter b64encode %}
<configuration>
<property>
<name>fs.azure.account.key.<my-account-name>.blob.core.windows.net</name>
<value><my-account-key></value>
</property>
<property>
<name>fs.AbstractFileSystem.wasb.Impl</name>
<value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
</configuration>
{% endfilter %}
Модуль драйвера загружает зависимости jar, хранящиеся вконтейнер в хранилище BLOB-объектов Azure.Как видно из этого фрагмента журнала:
2018-10-18 08:48:16 INFO Utils:54 - Fetching wasb://<my-container-name>@<my-account-name>.blob.core.windows.net/spark-examples_2.11-2.3.2.jar to /var/spark-data/spark-jars/fetchFileTemp8575879929413871510.tmp
2018-10-18 08:48:16 INFO SparkPodInitContainer:54 - Finished downloading application dependencies.
Как я могу получить модули-исполнители для получения учетных данных, хранящихся в файле core-site.xml
, который смонтирован из секрета k8s?Чего мне не хватает?