Spark kubernetes pod fails with no visible errors - PullRequest
0 votes
/ 04 March 2020

I am using spark-submit from version 2.4.5 to create a Spark driver pod in my k8s cluster. When I run

bin/spark-submit \
--master k8s://https://my-cluster-url:443 \
--deploy-mode cluster \
--name spark-test \
--class com.my.main.Class \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.allocation.batch.size=3 \
--conf spark.kubernetes.namespace=my-namespace \
--conf spark.kubernetes.container.image.pullSecrets=my-cr-secret \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.my-vol.mount.path=/opt/spark/work-dir/src/main/resources/ \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.my-vol.options.claimName=my-pvc \
--conf spark.kubernetes.container.image=my-registry.io/spark-test:test-2.4.5 \
local:///opt/spark/work-dir/my-service.jar
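
A cheap first step is to make the submission itself more talkative. A minimal sketch, assuming the same distribution layout and placeholder values as above (executor and volume settings elided for brevity; the guard makes it a no-op outside a Spark distribution):

```shell
# Hedged sketch: re-run with --verbose so spark-submit prints the resolved
# properties, classpath and main class before the driver pod is created.
# Errors from the Kubernetes client also surface at this stage.
if [ -x bin/spark-submit ]; then
  bin/spark-submit \
    --verbose \
    --master k8s://https://my-cluster-url:443 \
    --deploy-mode cluster \
    --name spark-test \
    --class com.my.main.Class \
    --conf spark.kubernetes.namespace=my-namespace \
    --conf spark.kubernetes.container.image=my-registry.io/spark-test:test-2.4.5 \
    local:///opt/spark/work-dir/my-service.jar
fi
```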

spark-submit successfully creates a pod in my k8s cluster, and the pod enters the running state. The pod then quickly stops with an error status. Looking at the pod logs, I see

++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ SPARK_K8S_CMD=driver
+ case "$SPARK_K8S_CMD" in
+ shift 1
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ sed 's/[^=]*=\(.*\)/\1/g'
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=<SPARK_DRIVER_BIND_ADDRESS> --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class com.my.main.Class spark-internal 
20/03/04 16:44:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.spark.deploy.SparkSubmit$$anon$2).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

But there are no other errors. The + lines in the log correspond to the commands executed by kubernetes/dockerfiles/spark/entrypoint.sh in the Spark distribution. So it appears to get through the whole entrypoint script and attempt the final command, exec /usr/bin/tini -s -- "${CMD[@]}", before failing somewhere after these log4j warnings. How can I debug this further?
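
A few follow-up checks that usually narrow this down (a hedged sketch: kubectl must be pointed at the cluster, and the pod name is the one that appears in the events below; substitute your own):

```shell
# Hedged sketch: squeeze more detail out of a driver pod that dies quietly.
# Guarded so the snippet is a no-op on a machine without kubectl.
POD=spark-test-1583356942292-driver
NS=my-namespace

if command -v kubectl >/dev/null 2>&1; then
  # Exit code and reason of the terminated driver container:
  kubectl get pod "$POD" -n "$NS" \
    -o jsonpath='{.status.containerStatuses[0].state.terminated}'
  # Logs of the previous container instance, in case it restarted:
  kubectl logs "$POD" -n "$NS" --previous
  # Full event stream for the pod (scheduling, mounts, image pulls):
  kubectl describe pod "$POD" -n "$NS"
fi
```

A non-zero exit code from the first command (e.g. 1 vs 137 for OOM-kill) is often the single most useful data point.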

Edit, with more details:

Pod events, as seen in kubectl describe po ...:

Events:
  Type     Reason       Age    From                                Message
  ----     ------       ----   ----                                -------
  Normal   Scheduled    3m41s  default-scheduler                   Successfully assigned my-namespace/spark-test-1583356942292-driver to aks-agentpool-12301882-10
  Warning  FailedMount  3m40s  kubelet, aks-agentpool-12301882-10  MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "spark-test-1583356942292-driver-conf-map" not found
  Normal   Pulling      3m37s  kubelet, aks-agentpool-12301882-10  Pulling image "my-registry.io/spark-test:test-2.4.5"
  Normal   Pulled       3m37s  kubelet, aks-agentpool-12301882-10  Successfully pulled image "my-registry.io/spark-test:test-2.4.5"
  Normal   Created      3m36s  kubelet, aks-agentpool-12301882-10  Created container spark-kubernetes-driver
  Normal   Started      3m36s  kubelet, aks-agentpool-12301882-10  Started container spark-kubernetes-driver
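
Regarding the FailedMount warning above: in Spark 2.4.x spark-submit creates the driver pod first and the *-driver-conf-map ConfigMap immediately afterwards, so a single transient FailedMount for spark-conf-volume is usually a harmless race that the kubelet retries (the Started event after it suggests the mount eventually succeeded). A hedged sketch for confirming this, assuming kubectl access and the names from this question:

```shell
# Hedged sketch: verify the conf ConfigMap exists and, while the pod is
# still running, that the mounted properties file is actually present.
# No-op on a machine without kubectl.
NS=my-namespace
POD=spark-test-1583356942292-driver

if command -v kubectl >/dev/null 2>&1; then
  # The ConfigMap should exist once submission has finished:
  kubectl get configmap -n "$NS" | grep -- '-driver-conf-map'
  # The mount should contain spark.properties (only works on a live pod):
  kubectl exec -n "$NS" "$POD" -- ls /opt/spark/conf
fi
```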

My Dockerfile is only slightly adapted from the Dockerfile provided with Spark, and is built using ./bin/docker-image-tool.sh:

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

FROM openjdk:8-jdk-slim

ARG spark_jars=jars
ARG img_path=kubernetes/dockerfiles
ARG k8s_tests=kubernetes/tests
ARG work_dir=/opt/spark/work-dir

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt-get install -y bash tini libc6 libpam-modules libnss3 && \
    mkdir -p /opt/spark && \
    mkdir -p ${work_dir} && \
    mkdir -p /opt/spark/conf && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/* && \
    mkdir -p ${work_dir}/src/main/resources && \
    mkdir -p /var/run/my-service && \
    mkdir -p /var/log/my-service

COPY ${spark_jars} /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY ${img_path}/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY ${k8s_tests} /opt/spark/tests
COPY data /opt/spark/data
ADD conf/log4j.properties.template /opt/spark/conf/log4j.properties
ADD kubernetes/jars/my-service-*-bin.tar.gz ${work_dir}
RUN mv "${work_dir}/my-service-"*".jar" "${work_dir}/my-service.jar"

ENV SPARK_HOME /opt/spark

WORKDIR ${work_dir}

ENTRYPOINT [ "/opt/entrypoint.sh" ]
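
One way to separate image problems from cluster problems is to drive the same entrypoint locally. A hedged sketch, assuming Docker is available; SPARK_DRIVER_BIND_ADDRESS is normally injected by the driver pod spec, so it is set by hand here, and a local master is passed since there is no spark.properties from the ConfigMap:

```shell
# Hedged sketch: run the image's driver branch outside Kubernetes. If the
# application class itself is broken (missing dependency, bad main method),
# it should fail here too, with a full stack trace on stderr.
if command -v docker >/dev/null 2>&1; then
  docker run --rm \
    -e SPARK_DRIVER_BIND_ADDRESS=127.0.0.1 \
    my-registry.io/spark-test:test-2.4.5 \
    driver \
    --master 'local[2]' \
    --class com.my.main.Class \
    local:///opt/spark/work-dir/my-service.jar
fi
```

If the job runs cleanly here, the fault is more likely in the cluster-side configuration (volumes, ConfigMap, networking) than in the image.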