I am using spark-submit
version 2.4.5 to create a Spark driver pod in my k8s cluster. When I run
bin/spark-submit \
  --master k8s://https://my-cluster-url:443 \
  --deploy-mode cluster \
  --name spark-test \
  --class com.my.main.Class \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.allocation.batch.size=3 \
  --conf spark.kubernetes.namespace=my-namespace \
  --conf spark.kubernetes.container.image.pullSecrets=my-cr-secret \
  --conf spark.kubernetes.container.image.pullPolicy=Always \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.my-vol.mount.path=/opt/spark/work-dir/src/main/resources/ \
  --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.my-vol.options.claimName=my-pvc \
  --conf spark.kubernetes.container.image=my-registry.io/spark-test:test-2.4.5 \
  local:///opt/spark/work-dir/my-service.jar
spark-submit
successfully creates a pod in my k8s cluster, and the pod enters the running
state. The pod then quickly terminates with an error
status. Looking at the pod's logs, I see
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sed 's/[^=]*=\(.*\)/\1/g'
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=<SPARK_DRIVER_BIND_ADDRESS> --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class com.my.main.Class spark-internal
20/03/04 16:44:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.spark.deploy.SparkSubmit$$anon$2).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
But no other errors. The +
lines in the log correspond to the commands executed by kubernetes/dockerfiles/spark/entrypoint.sh
in the Spark distribution. So it appears the container gets through the entire entrypoint script and attempts the final command exec /usr/bin/tini -s -- "${CMD[@]}"
before failing shortly after those log4j
warnings. How can I debug this issue further?
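For reference, the pod can be inspected with standard kubectl commands like the following (a sketch; the pod name here is taken from my cluster, and the container may need to have restarted for `--previous` to return anything):

```shell
# Logs from the (crashed) driver container
kubectl -n my-namespace logs spark-test-1583356942292-driver

# If the container restarted, the previous attempt's logs may differ
kubectl -n my-namespace logs spark-test-1583356942292-driver --previous

# Events plus the container's termination state (exit code, reason)
kubectl -n my-namespace describe pod spark-test-1583356942292-driver

# The exit code can also be read directly from the pod status
kubectl -n my-namespace get pod spark-test-1583356942292-driver \
  -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}'
```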
Edit for more details:
Pod events, as seen in kubectl describe po ...
:
Events:
  Type     Reason       Age    From                                Message
  ----     ------       ----   ----                                -------
  Normal   Scheduled    3m41s  default-scheduler                   Successfully assigned my-namespace/spark-test-1583356942292-driver to aks-agentpool-12301882-10
  Warning  FailedMount  3m40s  kubelet, aks-agentpool-12301882-10  MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "spark-test-1583356942292-driver-conf-map" not found
  Normal   Pulling      3m37s  kubelet, aks-agentpool-12301882-10  Pulling image "my-registry.io/spark-test:test-2.4.5"
  Normal   Pulled       3m37s  kubelet, aks-agentpool-12301882-10  Successfully pulled image "my-registry.io/spark-test:test-2.4.5"
  Normal   Created      3m36s  kubelet, aks-agentpool-12301882-10  Created container spark-kubernetes-driver
  Normal   Started      3m36s  kubelet, aks-agentpool-12301882-10  Started container spark-kubernetes-driver
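Note the FailedMount warning for "spark-conf-volume": spark-submit creates that ConfigMap as part of submission, so a single early FailedMount can be a transient race (the container did start afterwards). To rule it out, one can check whether the ConfigMap eventually appeared and what properties Spark mounted into the driver (a sketch; names are from the events above):

```shell
# Was the driver conf ConfigMap eventually created?
kubectl -n my-namespace get configmap spark-test-1583356942292-driver-conf-map

# Inspect the spark.properties that gets mounted at /opt/spark/conf
# (dots in the data key must be escaped in the jsonpath expression)
kubectl -n my-namespace get configmap spark-test-1583356942292-driver-conf-map \
  -o jsonpath='{.data.spark\.properties}'
```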
My Dockerfile is lightly adapted from the Dockerfile provided with Spark, and the image is built using ./bin/docker-image-tool.sh
:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM openjdk:8-jdk-slim
ARG spark_jars=jars
ARG img_path=kubernetes/dockerfiles
ARG k8s_tests=kubernetes/tests
ARG work_dir=/opt/spark/work-dir
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN set -ex && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules libnss3 && \
    mkdir -p /opt/spark && \
    mkdir -p ${work_dir} && \
    mkdir -p /opt/spark/conf && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/* && \
    mkdir -p ${work_dir}/src/main/resources && \
    mkdir -p /var/run/my-service && \
    mkdir -p /var/log/my-service
COPY ${spark_jars} /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY ${img_path}/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY ${k8s_tests} /opt/spark/tests
COPY data /opt/spark/data
ADD conf/log4j.properties.template /opt/spark/conf/log4j.properties
ADD kubernetes/jars/my-service-*-bin.tar.gz ${work_dir}
RUN mv "${work_dir}/my-service-"*".jar" "${work_dir}/my-service.jar"
ENV SPARK_HOME /opt/spark
WORKDIR ${work_dir}
ENTRYPOINT [ "/opt/entrypoint.sh" ]
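For completeness, the build is invoked from the top level of the Spark 2.4.5 distribution roughly like this (a sketch; the registry and tag match my submit command, and note that docker-image-tool.sh names the image "spark" by default, so it is retagged afterwards):

```shell
# Build the image from the Dockerfile above
./bin/docker-image-tool.sh -r my-registry.io -t test-2.4.5 build

# The tool produces my-registry.io/spark:test-2.4.5 by default;
# retag to the name used in the spark-submit command, then push
docker tag my-registry.io/spark:test-2.4.5 my-registry.io/spark-test:test-2.4.5
docker push my-registry.io/spark-test:test-2.4.5
```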