Network timeout initializing Spark context on Kubernetes (standalone driver)
0 votes
/ 29 April 2020

I'm getting this error when trying to run a Spark program from the driver pod (running standalone in client mode, without using spark-submit):

20/04/29 02:14:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://sparkrunner-0.sparkrunner:4040
20/04/29 02:14:46 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
20/04/29 02:14:46 DEBUG Config: Trying to configure client from Kubernetes config...
20/04/29 02:14:46 DEBUG Config: Did not find Kubernetes config at: [/root/.kube/config]. Ignoring.
20/04/29 02:14:46 DEBUG Config: Trying to configure client from service account...
20/04/29 02:14:46 DEBUG Config: Found service account host and port: 10.96.0.1:443
20/04/29 02:14:46 DEBUG Config: Found service account ca cert at: [/var/run/secrets/kubernetes.io/serviceaccount/ca.crt].
20/04/29 02:14:46 DEBUG Config: Found service account token at: [/var/run/secrets/kubernetes.io/serviceaccount/token].
20/04/29 02:14:46 DEBUG Config: Trying to configure client namespace from Kubernetes service account namespace path...
20/04/29 02:14:46 DEBUG Config: Found service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace].
20/04/29 02:14:46 DEBUG Config: Trying to configure client from Kubernetes config...
20/04/29 02:14:46 DEBUG Config: Did not find Kubernetes config at: [/root/.kube/config]. Ignoring.
20/04/29 02:14:46 DEBUG Config: Trying to configure client from service account...
20/04/29 02:14:46 DEBUG Config: Found service account host and port: 10.96.0.1:443
20/04/29 02:14:46 DEBUG Config: Found service account ca cert at: [/var/run/secrets/kubernetes.io/serviceaccount/ca.crt].
20/04/29 02:14:46 DEBUG Config: Found service account token at: [/var/run/secrets/kubernetes.io/serviceaccount/token].
20/04/29 02:14:46 DEBUG Config: Trying to configure client namespace from Kubernetes service account namespace path...
20/04/29 02:14:46 DEBUG Config: Found service account namespace at: [/var/run/secrets/kubernetes.io/serviceaccount/namespace].
20/04/29 02:14:57 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2934)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:548)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2578)
        at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:896)
        at scala.Option.getOrElse(Option.scala:138)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:887)
        at sparkrunner.sparklibs.SparkSystem$.<init>(SparkSystem.scala:22)
        at sparkrunner.sparklibs.SparkSystem$.<clinit>(SparkSystem.scala)
        at sparkrunner.actors.RecipeManager$$anonfun$receive$1.applyOrElse(RecipeManager.scala:41)
        at akka.actor.Actor.aroundReceive(Actor.scala:534)
        at akka.actor.Actor.aroundReceive$(Actor.scala:532)
        at sparkrunner.actors.RecipeManager.aroundReceive(RecipeManager.scala:20)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:573)
        at akka.actor.ActorCell.invoke(ActorCell.scala:543)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:269)
        at akka.dispatch.Mailbox.run(Mailbox.scala:230)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:242)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [Pod]  with name: [sparkrunner-0]  in namespace: [default]  failed.
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
        at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:237)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:170)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:59)
        at scala.Option.map(Option.scala:163)
        at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:58)
        at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:113)
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2928)
        ... 20 more
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
        at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:247)
        at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:167)
        at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:258)
        at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
        at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
        at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:111)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
        at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
        at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
        at okhttp3.RealCall.execute(RealCall.java:93)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:337)
        at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:318)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:833)
        at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:226)
        ... 26 more
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping Server@68d79eec{STARTED}[9.4.z-SNAPSHOT]
20/04/29 02:14:57 DEBUG Server: doStop Server@68d79eec{STOPPING}[9.4.z-SNAPSHOT]
20/04/29 02:14:57 DEBUG QueuedThreadPool: ran SparkUI-59-acceptor-0@2b94b939-ServerConnector@79ce3216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/04/29 02:14:57 DEBUG AbstractHandlerContainer: Graceful shutdown Server@68d79eec{STOPPING}[9.4.z-SNAPSHOT] by 
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping Spark@79ce3216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping SelectorManager@Spark@79ce3216{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/04/29 02:14:57 DEBUG AbstractLifeCycle: stopping ManagedSelector@8993a98{STARTED} id=3 keys=0 selected=0 updates=0

Running spark-3.0-preview2 on minikube (macOS).

➜  kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-26T06:16:15Z", GoVersion:"go1.14", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

I set up the cluster as described here:

https://spark.apache.org/docs/latest/running-on-kubernetes.html

It looks like the Kubernetes client can't reach the API? I'm trying to understand why.

Here is what I have checked:

  • The k8s host/port the driver submits to is correct (taken from kubectl cluster-info)

  • DNS works (a throwaway debug pod can ping the driver pod, and there are no DNS resolution errors in the logs)

  • The RBAC role "spark" is in place and is the one the driver runs under

  • No iptables rules or other network policies are in use in the cluster

Any ideas what else I can try to debug this problem?
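
For reference, here is a minimal probe that can be run from inside the driver pod and reproduces just the connect step that times out (a sketch using plain JDK sockets; the address and port come from the DEBUG lines above, and the object name is arbitrary):

import java.net.{InetSocketAddress, Socket, SocketTimeoutException}

// Reproduces just the TCP connect that times out in the stack trace.
// 10.96.0.1:443 is the service-account host/port from the DEBUG logs.
object ApiServerProbe extends App {
  val socket = new Socket()
  try {
    // ~10 s is roughly the window between the last DEBUG line and the ERROR.
    socket.connect(new InetSocketAddress("10.96.0.1", 443), 10000)
    println("TCP connect to 10.96.0.1:443 succeeded")
  } catch {
    case e: SocketTimeoutException =>
      println(s"connect timed out, same failure as the driver: $e")
  } finally socket.close()
}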

1 Answer

1 vote
/ 30 April 2020

It looks like the problem here is the k8s API address as reported by:

kubectl cluster-info

That command yields this address:

k8s://https://kubernetes.default.svc:32768

The actual address that makes the client-mode cluster work is the internal one:

k8s://https://10.96.0.1:443

I'm not sure whether the originally reported address is a proxy or a minikube artifact, but with the internal address everything works again.
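
For example, the master can be pointed at the internal address directly when building the session in client mode (a minimal sketch; SparkSystem is the object name from the stack trace in the question, the pod name comes from the logs, and the container image value is a placeholder):

import org.apache.spark.sql.SparkSession

// Client-mode builder pointed at the internal API server address
// instead of the one reported by kubectl cluster-info.
object SparkSystem {
  val spark: SparkSession = SparkSession.builder()
    .appName("sparkrunner")
    .master("k8s://https://10.96.0.1:443")
    // Must match the pod the driver is actually running in.
    .config("spark.kubernetes.driver.pod.name", "sparkrunner-0")
    // Placeholder: whatever image the executors should use.
    .config("spark.kubernetes.container.image", "my-spark-image:latest")
    .getOrCreate()
}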
