I get this error when I try to run a Spark program from the driver pod (running standalone in client mode, without using spark-submit):
20/04/29 02:14:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://sparkrunner-0.sparkrunner:4040
20/04/29 02:14:46 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
20/04/29 02:14:46 DEBUG Config: Trying to configure client from Kubernetes config...
20/04/29 02:14:46 DEBUG Config: Did not find Kubernetes config at: [/root/.kube/config]. Ignoring.
20/04/29 02:14:46 DEBUG Config: Trying to configure client from service account...
20/04/29 02:14:46 DEBUG Config: Found service account host and port: 10.96.0.1:443
20/04/29 02:14:46 DEBUG Config: Found service account ca cert at: [/var/run/secrets/kubernetes.io/serviceaccount/ca.crt].
20/04/29 02:14:46 DEBUG Config: Found service account token at: [/var/run/secrets/kubernetes.io/serviceaccount/token].
20/04/29 02:14:46 DEBUG Config: Trying to configure client namespace from Kubernetes service account namespace path...
20/04/29 02:14:46 DEBUG Config: Found service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace].
20/04/29 02:14:46 DEBUG Config: Trying to configure client from Kubernetes config...
20/04/29 02:14:46 DEBUG Config: Did not find Kubernetes config at: [/root/.kube/config]. Ignoring.
20/04/29 02:14:46 DEBUG Config: Trying to configure client from service account...
20/04/29 02:14:46 DEBUG Config: Found service account host and port: 10.96.0.1:443
20/04/29 02:14:46 DEBUG Config: Found service account ca cert at: [/var/run/secrets/kubernetes.io/serviceaccount/ca.crt].
20/04/29 02:14:46 DEBUG Config: Found service account token at: [/var/run/secrets/kubernetes.io/serviceaccount/token].
20/04/29 02:14:46 DEBUG Config: Trying to configure client namespace from Kubernetes service account namespace path...
20/04/29 02:14:46 DEBUG Config: Found service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace].
20/04/29 02:14:57 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2934)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:548)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2578)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:896)
at scala.Option.getOrElse(Option.scala:138)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:887)
at sparkrunner.sparklibs.SparkSystem$.<init>(SparkSystem.scala:22)
at sparkrunner.sparklibs.SparkSystem$.<clinit>(SparkSystem.scala)
at sparkrunner.actors.RecipeManager$$anonfun$receive$1.applyOrElse(RecipeManager.scala:41)
at akka.actor.Actor.aroundReceive(Actor.scala:534)
at akka.actor.Actor.aroundReceive$(Actor.scala:532)
at sparkrunner.actors.RecipeManager.aroundReceive(RecipeManager.scala:20)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
at akka.actor.ActorCell.invoke(ActorCell.scala:543)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
at akka.dispatch.Mailbox.run(Mailbox.scala:230)
at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [sparkrunner-0] in namespace: [default] failed.
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:237)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:170)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:59)
at scala.Option.map(Option.scala:163)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:58)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:113)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2928)
... 20 more
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:111)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
at okhttp3.RealCall.execute(RealCall.java:93)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:337)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:318)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:833)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:226)
... 26 more
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping Server@68d79eec{STARTED}[9.4.z-SNAPSHOT]
20/04/29 02:14:57 DEBUG Server: doStop Server@68d79eec{STOPPING}[9.4.z-SNAPSHOT]
20/04/29 02:14:57 DEBUG QueuedThreadPool: ran SparkUI-59-acceptor-0@2b94b939-ServerConnector@79ce3216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/04/29 02:14:57 DEBUG AbstractHandlerContainer: Graceful shutdown Server@68d79eec{STOPPING}[9.4.z-SNAPSHOT] by
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping Spark@79ce3216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping SelectorManager@Spark@79ce3216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping ManagedSelector@8993a98{STARTED} id=3 keys=0 selected=0 updates=0
I am running spark-3.0preview2 on minikube (macOS).
➜ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-26T06:16:15Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
I set up the cluster as described here:
https://spark.apache.org/docs/latest/running-on-kubernetes.html
It looks like the Kubernetes client cannot reach the API server: the GET for the driver pod fails with a connect timeout. I am trying to understand why.
Here is what I have checked:
The k8s host/port that the driver submits the job to is correct (taken from kubectl cluster-info)
DNS works (a random debug pod can ping the driver pod, and there are no DNS resolution errors in the logs)
The RBAC role "spark" is enabled and is passed by the driver
No iptables rules or other network policies are in use in the cluster
Any ideas on what else I can try in order to debug the problem?
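One more check that might help narrow this down is a raw TCP connect from inside the driver pod to the API server address shown in the log (10.96.0.1:443). The following is only a minimal sketch, not part of my setup: it assumes Python is available in the pod image, the helper name `can_connect` is made up for illustration, and it reads the standard `KUBERNETES_SERVICE_HOST` / `KUBERNETES_SERVICE_PORT` environment variables that Kubernetes injects into pods, falling back to the address from the log.

```python
import os
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Fallback values are the ones from the log above; adjust as needed.
    host = os.environ.get("KUBERNETES_SERVICE_HOST", "10.96.0.1")
    port = int(os.environ.get("KUBERNETES_SERVICE_PORT", "443"))
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

If this also times out from the driver pod, the problem is likely at the network level (pod-to-service routing) rather than in Spark or the fabric8 client configuration; if it succeeds, the client-side timeout or TLS settings would be the next thing to look at.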