Spark 2.4 connecting to Dataproc from a container: java.net.UnknownHostException
0 votes
/ 04 February 2019

I'm having trouble connecting Spark 2.4 from a Docker container running in Kubernetes to a Dataproc cluster (also on Spark 2.4). I get a "java.net.UnknownHostException" for the local Kubernetes hostname. The same network configuration works with Spark 2.2, so something seems to have changed in how Spark resolves hostnames.

If I edit /etc/hosts so that both the hostname and the Kubernetes hostname point to 127.0.0.1, it works, with a warning. However, that isn't a workaround I'm comfortable with, since it overrides the default Kubernetes-managed entry and could affect other things.
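For reference, the edit amounts to something like the line below (hostname taken from the hosts file shown further down); this is only an illustration of the workaround, not a recommendation:

# Illustration only: the last line of the Kubernetes-managed hosts file, changed by hand
# so the pod FQDN and short hostname resolve to loopback instead of the pod IP.
127.0.0.1       jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local   jupyter-21001-0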

Error stack trace:

(base) appuser@jupyter-21001-0:/opt/notebooks/Test$ pyspark
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
2019-02-01 18:47:25 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-02-01 18:47:27 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2019-02-01 18:47:59 ERROR YarnClientSchedulerBackend:70 - YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
2019-02-01 18:47:59 ERROR YarnClientSchedulerBackend:70 - Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:514)
        at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:307)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:773)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:772)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:797)
        at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:827)
        at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local:38831
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
        at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
        at java.security.AccessController.doPrivileged(Native Method)
        at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
        at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
        at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
        at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
        at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
        at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
        at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
        at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
        at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
        at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
        at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
        at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        ... 1 more
 
2019-02-01 18:47:59 WARN  YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2019-02-01 18:47:59 ERROR SparkContext:91 - Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
        at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:745)
        at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:191)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:745)
/opt/spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
  warnings.warn("Failed to initialize Spark session.")
Traceback (most recent call last):
  File "/opt/spark/python/pyspark/shell.py", line 41, in <module>
    spark = SparkSession._create_shell_session()
  File "/opt/spark/python/pyspark/sql/session.py", line 583, in _create_shell_session
    return SparkSession.builder.getOrCreate()
  File "/opt/spark/python/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/spark/python/pyspark/context.py", line 349, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/spark/python/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/opt/spark/python/pyspark/context.py", line 180, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/opt/spark/python/pyspark/context.py", line 288, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
        at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:745)
        at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:191)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:745)

Spark version:

(base) appuser@jupyter-21001-0:/opt/notebooks/Test$ pyspark --version
Welcome to
     ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
 
Using Scala version 2.11.12, OpenJDK 64-Bit Server VM, 1.8.0_152-release
Branch
Compiled by user  on 2018-10-29T06:22:05Z
Revision
Url
Type --help for more information.

Hosts file:

# Kubernetes-managed hosts file.
127.0.0.1       localhost
::1     localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
fe00::0 ip6-mcastprefix
fe00::1 ip6-allnodes
fe00::2 ip6-allrouters
172.18.168.47   jupyter-21001-0.jupyter-21001.75dev-a123456.svc.cluster.local   jupyter-21001-0

spark-defaults.conf

spark.master yarn
spark.driver.extraClassPath /opt/spark/lib/gcs-connector-latest-hadoop2.jar
spark.executor.extraClassPath /usr/lib/hadoop/lib/gcs-connector.jar:/usr/lib/hadoop/lib/bigquery-connector.jar

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://jupyter-21001-m</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://jupyter-21001-m</value>
  </property>
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    <description>The FileSystem for gs: (GCS) uris.</description>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.</description>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value>NOT_RUNNING_INSIDE_GCE</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.json.keyfile</name>
    <value>/oauth/auth.json</value>
  </property>
  <property>
    <name>fs.gs.implicit.dir.repair.enable</name>
    <value>false</value>
    <description>
      Whether or not to create objects for the parent directories of objects
      with / in their path e.g. creating gs://bucket/foo/ upon finding
      gs://bucket/foo/bar.
    </description>
  </property>
</configuration>

1 Answer

0 votes
/ 05 February 2019

This sounds complicated. At a high level I would recommend first looking at Jupyter notebooks inside Dataproc, but I'll try to answer the question.

In YARN client mode (which the pyspark shell uses), the AppMaster has to call back to the driver, so you need to set spark.driver.host and spark.driver.port to something the AppMaster can reach. Assuming this is GKE in the same GCE network as your Dataproc cluster, a NodePort service is probably the best choice for exposing the driver to the GCE network.
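For example, a minimal sketch (not from the question: NODE_IP and the port numbers are placeholders, and it assumes matching NodePorts have been opened) of pointing the driver at an address the Dataproc side can reach:

# Sketch only: NODE_IP and the two port numbers are assumptions for illustration.
# spark.driver.bindAddress lets the driver bind inside the pod, while
# spark.driver.host/port advertise an address the YARN AppMaster can actually reach.
pyspark \
  --conf spark.driver.bindAddress=0.0.0.0 \
  --conf spark.driver.host=${NODE_IP} \
  --conf spark.driver.port=30001 \
  --conf spark.driver.blockManager.port=30002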

The part I don't know is how to pass the node IP and port into your pyspark pod so that you can set spark.driver.host and spark.driver.port correctly (GKE experts, please weigh in).
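One option I haven't verified on GKE is the Kubernetes downward API, which can inject the node's IP into the pod as an environment variable; the StatefulSet name below is only inferred from the pod name in the question:

# Sketch only: adds a NODE_IP env var sourced from status.hostIP via the downward API.
# Assumes the container spec already has an env list; adjust names/indexes as needed.
kubectl patch statefulset jupyter-21001 --type=json -p '[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/env/-",
   "value": {"name": "NODE_IP",
             "valueFrom": {"fieldRef": {"fieldPath": "status.hostIP"}}}}
]'

NODE_IP could then be plugged into spark.driver.host as in the sketch above.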

I don't know the details of your setup, but I wouldn't edit /etc/hosts on the GKE VMs. I also find it fairly surprising that this changed between 2.2 and 2.4 (perhaps you weren't running against YARN on Dataproc in your 2.2 setup?).

Finally, you don't mention it, but you need to set yarn.resourcemanager.hostname=jupyter-21001-m in core-site.xml or yarn-site.xml.
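As a rough sketch (the conf directory path below is a placeholder; merge the property into your existing file if you already have a yarn-site.xml):

# Sketch only: a minimal yarn-site.xml pointing the client at the Dataproc master.
cat > ${HADOOP_CONF_DIR:-/etc/hadoop/conf}/yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>jupyter-21001-m</value>
  </property>
</configuration>
EOF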

As a partial check, this should successfully create a YARN AppMaster on Dataproc (which will then fail to call back into GKE):

kubectl run spark -i -t --image gcr.io/spark-operator/spark:v2.4.0 \
  --generator=run-pod/v1 --rm --restart=Never --env HADOOP_CONF_DIR=/tmp \
  --command -- /opt/spark/bin/spark-submit \
    --master yarn \
    --conf spark.hadoop.fs.defaultFS=hdfs://jupyter-21001-m \
    --conf spark.hadoop.yarn.resourcemanager.hostname=jupyter-21001-m \
    --class org.apache.spark.examples.SparkPi \
    /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar

Sorry this is an incomplete answer; I'll update it if I fill in the missing pieces.
