Hi! I am using the YAML file provided at this link
https://github.com/dirk1492/docker-spark/blob/master/kubernetes/thriftserver.yaml
and modified it to run Spark Thrift Server on my Kubernetes cluster in the cloud.
However, I am getting the error below.
thriftserver-cluster   10.0.10.3   Waiting: CrashLoopBackOff   3   a minute   Back-off restarting failed container
My spark-thriftserver.yaml looks like this:
22:04 $ cat spark-thriftserver.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: thriftserver-cluster
  name: thriftserver-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thriftserver-cluster
  template:
    metadata:
      labels:
        app: thriftserver-cluster
    spec:
      containers:
      - env:
        - name: SPARK_MODE
          value: thriftserver
        - name: SPARK_MASTER_URL
          value: k8s://https://0.0.0.0:6443
        - name: SPARK_PUBLIC_DNS
          value: localhost
        - name: SPARK_WEBUI_PORT
          value: "4040"
        - name: SPARK_CORES_MAX
          value: "1"
        - name: SPARK_DRIVER_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: ../spark:spark-2.4.3_with_jars
        name: spark-thriftserver
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 4040
          name: http
          protocol: TCP
        - containerPort: 10000
          name: jdbc
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always
✔ ~/Downloads/spark-2.4.3-bin-hadoop2.7
Any idea what is causing the "Back-off restarting failed container" error
and how to fix it? Is something wrong with my YAML configuration?
Here is the output of kubectl get pods and kubectl describe for the pod and the deployment:
22:54 $ kubectl get pods
NAME                                      READY   STATUS             RESTARTS   AGE
sparkoperator-1584604751-b5bf5fcc-c8wkh   1/1     Running            0          3d21h
thriftserver-cluster-cfbc67955-wgqj6      0/1     CrashLoopBackOff   5          4m17s
22:56 $ kubectl describe pod thriftserver-cluster-cfbc67955-wgqj6
Name: thriftserver-cluster-cfbc67955-wgqj6
Namespace: default
Priority: 0
Node: 10.0.10.3/10.0.10.3
Start Time: Sun, 22 Mar 2020 22:51:49 -0700
Labels: app=thriftserver-cluster
pod-template-hash=cfbc67955
Annotations: <none>
Status: Running
IP: 0.0.0.0
IPs: <none>
Controlled By: ReplicaSet/thriftserver-cluster-cfbc67955
Containers:
spark-thriftserver:
Container ID: docker://ef0ffd2a38f336a1dea36ad1782556869f564f7baf29817cbd914d09e4f9e2bc
Image: ../spark:spark-2.4.3_with_jars
Image ID: docker-pullable://../spark@sha256:ce9b89a42abc11b3e82ba0d10efac24245656cb7b38f7c0db3d44a7c194b0b9e
Ports: 4040/TCP, 10000/TCP
Host Ports: 0/TCP, 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 22 Mar 2020 22:54:53 -0700
Finished: Sun, 22 Mar 2020 22:54:53 -0700
Ready: False
Restart Count: 5
Environment:
SPARK_MODE: thriftserver
SPARK_MASTER_URL: k8s://https://0.0.0.0:6443
SPARK_PUBLIC_DNS: localhost
SPARK_WEBUI_PORT: 4040
SPARK_CORES_MAX: 1
SPARK_DRIVER_HOST: (v1:status.podIP)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9t6sd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-9t6sd:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9t6sd
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m56s default-scheduler Successfully assigned default/thriftserver-cluster-cfbc67955-wgqj6 to 10.0.10.3
Normal Pulled 3m22s (x5 over 4m55s) kubelet, 10.0.10.3 Container image "../spark:spark-2.4.3_with_jars" already present on machine
Normal Created 3m22s (x5 over 4m55s) kubelet, 10.0.10.3 Created container spark-thriftserver
Normal Started 3m22s (x5 over 4m54s) kubelet, 10.0.10.3 Started container spark-thriftserver
Warning BackOff <invalid> (x25 over 4m53s) kubelet, 10.0.10.3 Back-off restarting failed container
22:53 $ kubectl describe deployment thriftserver-cluster
Name: thriftserver-cluster
Namespace: default
CreationTimestamp: Sun, 22 Mar 2020 22:51:49 -0700
Labels: app=thriftserver-cluster
Annotations: deployment.kubernetes.io/revision: 1
kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"thriftserver-cluster"},"name":"thriftserver-clus...
Selector: app=thriftserver-cluster
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=thriftserver-cluster
Containers:
spark-thriftserver:
Image: ../spark:spark-2.4.3_with_jars
Ports: 4040/TCP, 10000/TCP
Host Ports: 0/TCP, 0/TCP
Environment:
SPARK_MODE: thriftserver
SPARK_MASTER_URL: k8s://https://0.0.0.0:6443
SPARK_PUBLIC_DNS: localhost
SPARK_WEBUI_PORT: 4040
SPARK_CORES_MAX: 1
SPARK_DRIVER_HOST: (v1:status.podIP)
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: <none>
NewReplicaSet: thriftserver-cluster-cfbc67955 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m40s deployment-controller Scaled up replica set thriftserver-cluster-cfbc67955 to 1
23:11 $ kubectl get pods
NAME                                      READY   STATUS             RESTARTS   AGE
sparkoperator-1584604751-b5bf5fcc-c8wkh   1/1     Running            0          3d22h
thriftserver-cluster-cfbc67955-vcxj4      0/1     CrashLoopBackOff   3          83s
✔ ~
23:12 $ kubectl logs thriftserver-cluster-cfbc67955-vcxj4
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/ash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/ash ']'
+ SPARK_K8S_CMD=
+ case "$SPARK_K8S_CMD" in
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ sed 's/[^=]*=\(.*\)/\1/g'
+ sort -t_ -k4 -n
+ grep SPARK_JAVA_OPT_
+ env
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n /opt/spark/jars/jackson-annotations-2.6.7.jar ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*:/opt/spark/jars/jackson-annotations-2.6.7.jar'
+ '[' -n '' ']'
+ PYSPARK_ARGS=
+ '[' -n '' ']'
+ R_ARGS=
+ '[' -n '' ']'
+ '[' '' == 2 ']'
+ '[' '' == 3 ']'
+ case "$SPARK_K8S_CMD" in
+ echo 'Unknown command: '
Unknown command:
+ exit 1
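The end of this trace suggests that the entrypoint dispatches on SPARK_K8S_CMD and that the variable was empty when the container started. Below is a minimal sketch of that behaviour, not the image's real entrypoint.sh. It assumes the mode is taken from the container's first argument (the trace does not show where SPARK_K8S_CMD is assigned), and the function name `dispatch` and the branch names `driver` and `executor` are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical reduction of the dispatch step seen in the trace.
# Assumption: the script takes its mode from the first argument.
dispatch() {
  local SPARK_K8S_CMD="${1:-}"
  case "$SPARK_K8S_CMD" in
    driver | executor)
      # Illustrative branches; the real script would launch Spark here.
      echo "would start: $SPARK_K8S_CMD"
      ;;
    *)
      # Same message and failure as the last lines of the trace.
      echo "Unknown command: $SPARK_K8S_CMD"
      return 1
      ;;
  esac
}

# No argument, as in the failing pod: falls through to the default branch.
dispatch || echo "exit status: $?"

# With an argument the same function takes a known branch.
dispatch driver
```

If that assumption holds, a container started without arguments would exit with status 1 every time, which would match the Exit Code: 1 and the repeated restarts shown by kubectl describe. Nothing in the trace reads SPARK_MODE, so that variable may simply be ignored by this image.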
23:13 $ kubectl get events -n default --sort-by='.metadata.creationTimestamp'
LAST SEEN   TYPE      REASON              OBJECT                                      MESSAGE
4m8s        Normal    SuccessfulCreate    replicaset/thriftserver-cluster-cfbc67955   Created pod: thriftserver-cluster-cfbc67955-vzjks
4m7s        Normal    Scheduled           pod/thriftserver-cluster-cfbc67955-vzjks    Successfully assigned default/thriftserver-cluster-cfbc67955-vzjks to 10.0.10.3
4m5s        Normal    Started             pod/thriftserver-cluster-cfbc67955-vzjks    Started container spark-thriftserver
4m5s        Normal    Created             pod/thriftserver-cluster-cfbc67955-vzjks    Created container spark-thriftserver
4m5s        Normal    Pulled              pod/thriftserver-cluster-cfbc67955-vzjks    Container image "../spark:spark-2.4.3_with_jars" already present on machine
4m3s        Warning   BackOff             pod/thriftserver-cluster-cfbc67955-vzjks    Back-off restarting failed container
3m30s       Normal    ScalingReplicaSet   deployment/thriftserver-cluster             Scaled up replica set thriftserver-cluster-cfbc67955 to 1
3m30s       Normal    SuccessfulCreate    replicaset/thriftserver-cluster-cfbc67955   Created pod: thriftserver-cluster-cfbc67955-vcxj4
3m30s       Normal    Scheduled           pod/thriftserver-cluster-cfbc67955-vcxj4    Successfully assigned default/thriftserver-cluster-cfbc67955-vcxj4 to 10.0.10.3
2m          Normal    Created             pod/thriftserver-cluster-cfbc67955-vcxj4    Created container spark-thriftserver
2m          Normal    Pulled              pod/thriftserver-cluster-cfbc67955-vcxj4    Container image "../spark:spark-2.4.3_with_jars" already present on machine
2m          Normal    Started             pod/thriftserver-cluster-cfbc67955-vcxj4    Started container spark-thriftserver
91s         Warning   BackOff             pod/thriftserver-cluster-cfbc67955-vcxj4    Back-off restarting failed container
Here is the Dockerfile:
✔ ~/Downloads/spark-2.4.3-bin-hadoop2.7/kubernetes/dockerfiles/spark
23:23 $ cat Dockerfile
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
FROM openjdk:8-alpine
ARG spark_jars=jars
ARG img_path=kubernetes/dockerfiles
ARG k8s_tests=kubernetes/tests
# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
RUN set -ex && \
    apk upgrade --no-cache && \
    apk add --no-cache bash tini libc6-compat linux-pam nss && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/work-dir && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd
COPY ${spark_jars} /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY ${img_path}/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY ${k8s_tests} /opt/spark/tests
COPY data /opt/spark/data
COPY oci_api_key.pem /opt/spark/data
ENV SPARK_HOME /opt/spark
ENV SPARK_EXTRA_CLASSPATH ${SPARK_HOME}/jars/jackson-annotations-2.6.7.jar
ENV SPARK_EXECUTOR_EXTRA_CLASSPATH ${SPARK_HOME}/jars/jackson-annotations-2.6.7.jar
RUN rm $SPARK_HOME/jars/kubernetes-client-4.1.2.jar
ADD https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.9.0/kubernetes-client-4.9.0.jar $SPARK_HOME/jars
ADD https://repo1.maven.org/maven2/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.9.10/jackson-datatype-jsr310-2.9.10.jar $SPARK_HOME/jars
WORKDIR /opt/spark/work-dir
ENTRYPOINT [ "/opt/entrypoint.sh" ]