Spark workers not executing while Spark application is running - PullRequest
0 votes
/ 02 January 2019

I tried to set up a Jupyter notebook integrated with Spark. I started a master on my local machine and, for practice, a worker on the same machine. But when I run the application through Jupyter, it gets stuck on df.show()

Dockerfile:

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
ARG BASE_CONTAINER=jupyter/scipy-notebook
FROM $BASE_CONTAINER

LABEL maintainer="Jupyter Project <jupyter@googlegroups.com>"

USER root

# Spark dependencies
ENV SPARK_VERSION 2.3.2
ENV SPARK_HADOOP_PROFILE 2.7
ENV SPARK_SRC_URL https://www.apache.org/dist/spark/spark-$SPARK_VERSION/spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz
ENV SPARK_HOME=/opt/spark
ENV PATH $PATH:$SPARK_HOME/bin

RUN apt-get update && \
     apt-get install -y openjdk-8-jdk-headless \
     postgresql && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME  /usr/lib/jvm/java-8-openjdk-amd64/

ENV PATH $PATH:$JAVA_HOME/bin


RUN wget ${SPARK_SRC_URL}

RUN tar -xzf spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz   

RUN mv spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE} /opt/spark 

RUN rm -f spark-${SPARK_VERSION}-bin-hadoop${SPARK_HADOOP_PROFILE}.tgz

ENV SPARK_MASTER local[*]

ENV SPARK_DRIVER_PORT 5001
ENV SPARK_UI_PORT 5002
ENV SPARK_BLOCKMGR_PORT 5003
EXPOSE $SPARK_DRIVER_PORT $SPARK_UI_PORT $SPARK_BLOCKMGR_PORT

USER $NB_UID
ENV POST_URL https://jdbc.postgresql.org/download/postgresql-42.2.5.jar
RUN wget ${POST_URL}
RUN mv postgresql-42.2.5.jar $SPARK_HOME/jars
# Install pyarrow
RUN conda install --quiet -y 'pyarrow' && \
    conda clean -tipsy && \
    fix-permissions $CONDA_DIR && \
    fix-permissions /home/$NB_USER

WORKDIR $SPARK_HOME

Built the image with: docker build -t my_notebook .

docker-compose.yml (master):

master:
  image: my_notebook
  command: bin/spark-class org.apache.spark.deploy.master.Master -h master
  hostname: master
  environment:
    MASTER: spark://master:7077
    SPARK_CONF_DIR: /conf
    SPARK_PUBLIC_DNS: localhost
  expose:
    - 7001
    - 7002
    - 7003
    - 7004
    - 7005
    - 7077
    - 6066
  ports:
    - 4040:4040
    - 6066:6066
    - 7077:7077
    - 8080:8080
  volumes:
    - ./conf/master:/conf
    - ./data:/tmp/data

docker-compose.yml (worker):

worker:
  image: my_notebook
  command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://192.168.1.129:7077
  hostname: worker
  environment:
    SPARK_CONF_DIR: /conf
    SPARK_WORKER_CORES: 4
    SPARK_WORKER_MEMORY: 4g
    SPARK_WORKER_PORT: 8881
    SPARK_WORKER_WEBUI_PORT: 8081
    SPARK_PUBLIC_DNS: localhost
  expose:
    - 7012
    - 7013
    - 7014
    - 7015
    - 8881
  ports:
    - 8081:8081
  volumes:
    - ./conf/worker:/conf
    - ./data:/tmp/data

Jupyter code:

from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import IndexToString, StringIndexer, VectorIndexer
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark import SparkConf

from pyspark.sql import SQLContext

from pyspark.sql import DataFrameReader 

conf = SparkConf().setAppName('Kiwi Data Application')
conf.set('spark.executor.memory', '1G')
conf.set('spark.executor.cores', '2')

sc = SparkContext(master="spark://localhost:7077", conf=conf)
spark = SparkSession.builder.config(conf=conf).getOrCreate()

sqlContext = SQLContext(sc)
print('sql context')

# Define JDBC properties for DB Connection
url = "postgresql://IP:PORT/gpdb_qa"
properties = {
     "user": "user",
     "password": "pass",
     "fetchsize": "100000"
}

df = DataFrameReader(sqlContext).jdbc(
        url='jdbc:%s' % url,
        table=query,
        properties=properties
    )
print('read')
df.show()
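For reference, since the worker stderr shows executors failing to connect back to the driver, one thing worth checking is whether the driver's network settings can be pinned explicitly so executors connect to an address they can actually resolve, instead of the notebook container's auto-generated hostname. A minimal sketch (the helper name and the host/port values are assumptions for illustration, matching the ports EXPOSEd in the Dockerfile, not part of the original code):

```python
# Hypothetical helper: collect the driver-side network settings that
# executors need in order to connect back to the Jupyter container.
def driver_network_conf(host, driver_port, blockmgr_port):
    return {
        # Address advertised to executors; must be resolvable from the workers.
        "spark.driver.host": host,
        # Fixed ports so they can be published from the notebook container.
        "spark.driver.port": str(driver_port),
        "spark.blockManager.port": str(blockmgr_port),
        # Bind on all interfaces inside the container.
        "spark.driver.bindAddress": "0.0.0.0",
    }

# These pairs could then be applied with conf.set(key, value)
# before creating the SparkContext.
settings = driver_network_conf("192.168.1.129", 5001, 5003)
```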

master logs:

master_1  | 2019-01-02 06:48:11 INFO  Utils:54 - Successfully started service 'sparkMaster' on port 7077.
master_1  | 2019-01-02 06:48:11 INFO  Master:54 - Starting Spark master at spark://master:7077
master_1  | 2019-01-02 06:48:11 INFO  Master:54 - Running Spark version 2.3.2
master_1  | 2019-01-02 06:48:11 INFO  log:192 - Logging initialized @5563ms
master_1  | 2019-01-02 06:48:11 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
master_1  | 2019-01-02 06:48:11 INFO  Server:419 - Started @5640ms
master_1  | 2019-01-02 06:48:11 INFO  AbstractConnector:278 - Started ServerConnector@43cb4127{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
master_1  | 2019-01-02 06:48:11 INFO  Utils:54 - Successfully started service 'MasterUI' on port 8080.
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2bd387be{/app,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6256c056{/app/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b2c2e74{/,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ca8c5ad{/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3828fc1e{/static,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@780ebb19{/app/kill,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1a3c71cf{/driver/kill,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:11 INFO  MasterWebUI:54 - Bound MasterWebUI to 0.0.0.0, and started at http://localhost:8080
master_1  | 2019-01-02 06:48:11 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
master_1  | 2019-01-02 06:48:11 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10529071{/,null,AVAILABLE}
master_1  | 2019-01-02 06:48:11 INFO  AbstractConnector:278 - Started ServerConnector@2699a66b{HTTP/1.1,[http/1.1]}{master:6066}
master_1  | 2019-01-02 06:48:11 INFO  Server:419 - Started @5835ms
master_1  | 2019-01-02 06:48:11 INFO  Utils:54 - Successfully started service on port 6066.
master_1  | 2019-01-02 06:48:11 INFO  StandaloneRestServer:54 - Started REST server for submitting applications on port 6066
master_1  | 2019-01-02 06:48:12 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@201a4303{/metrics/master/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:12 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e4a39d0{/metrics/applications/json,null,AVAILABLE,@Spark}
master_1  | 2019-01-02 06:48:12 INFO  Master:54 - I have been elected leader! New state: ALIVE
master_1  | 2019-01-02 06:48:32 INFO  Master:54 - Registering worker 172.17.0.4:8881 with 2 cores, 12.0 GB RAM
master_1  | 2019-01-02 06:49:29 INFO  Master:54 - Registering app Kiwi Data Application
master_1  | 2019-01-02 06:49:29 INFO  Master:54 - Registered app Kiwi Data Application with ID app-20190102064929-0000
master_1  | 2019-01-02 06:49:29 INFO  Master:54 - Launching executor app-20190102064929-0000/0 on worker worker-20190102064831-172.17.0.4-8881
master_1  | 2019-01-02 06:49:32 INFO  Master:54 - Removing executor app-20190102064929-0000/0 because it is EXITED
master_1  | 2019-01-02 06:49:32 INFO  Master:54 - Launching executor app-20190102064929-0000/1 on worker worker-20190102064831-172.17.0.4-8881
master_1  | 2019-01-02 06:49:34 INFO  Master:54 - Removing executor app-20190102064929-0000/1 because it is EXITED

worker logs:

worker_1  | 2019-01-02 06:48:32 INFO  Worker:54 - Successfully registered with master spark://master:7077
worker_1  | 2019-01-02 06:49:29 INFO  Worker:54 - Asked to launch executor app-20190102064929-0000/0 for Kiwi Data Application
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing view acls to: jovyan
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing modify acls to: jovyan
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing view acls groups to:
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - Changing modify acls groups to:
worker_1  | 2019-01-02 06:49:29 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jovyan); groups with view permissions: Set(); users  with modify permissions: Set(jovyan); groups with modify permissions: Set()
worker_1  | 2019-01-02 06:49:29 INFO  ExecutorRunner:54 - Launch command: "/usr/lib/jvm/java-8-openjdk-amd64//bin/java" "-cp" "/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=41017" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@09a92e44f4de:41017" "--executor-id" "0" "--hostname" "172.17.0.4" "--cores" "2" "--app-id" "app-20190102064929-0000" "--worker-url" "spark://Worker@172.17.0.4:8881"
worker_1  | 2019-01-02 06:49:32 INFO  Worker:54 - Executor app-20190102064929-0000/0 finished with state EXITED message Command exited with code 1 exitStatus 1
worker_1  | 2019-01-02 06:49:32 INFO  Worker:54 - Asked to launch executor app-20190102064929-0000/1 for Kiwi Data Application
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing view acls to: jovyan
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing modify acls to: jovyan
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing view acls groups to:
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - Changing modify acls groups to:
worker_1  | 2019-01-02 06:49:32 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jovyan); groups with view permissions: Set(); users  with modify permissions: Set(jovyan); groups with modify permissions: Set()
worker_1  | 2019-01-02 06:49:32 INFO  ExecutorRunner:54 - Launch command: "/usr/lib/jvm/java-8-openjdk-amd64//bin/java" "-cp" "/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=41017" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@09a92e44f4de:41017" "--executor-id" "1" "--hostname" "172.17.0.4" "--cores" "2" "--app-id" "app-20190102064929-0000" "--worker-url" "spark://Worker@172.17.0.4:8881"
worker_1  | 2019-01-02 06:49:34 INFO  Worker:54 - Executor app-20190102064929-0000/1 finished with state EXITED message Command exited with code 1 exitStatus 1
worker_1  | 2019-01-02 06:49:34 INFO  Worker:54 - Asked to launch executor app-20190102064929-0000/2 for Kiwi Data Application
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing view acls to: jovyan
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing modify acls to: jovyan
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing view acls groups to:
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - Changing modify acls groups to:
worker_1  | 2019-01-02 06:49:34 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(jovyan); groups with view permissions: Set(); users  with modify permissions: Set(jovyan); groups with modify permissions: Set()

Jupyter notebook (application logs):

[Stage 0:>                                                          (0 + 0) / 1]2019-01-02 05:22:53 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:23:08 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:23:23 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:23:38 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[Stage 0:>                                                          (0 + 0) / 1]2019-01-02 05:23:53 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:24:08 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
notebook_1  | 2019-01-02 05:24:23 WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Spark worker stderr logs:

Spark Executor Command: "/usr/lib/jvm/java-8-openjdk-amd64//bin/java" "-cp" "/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=35147" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@notebook:35147" "--executor-id" "31" "--hostname" "172.17.0.3" "--cores" "2" "--app-id" "app-20190101134023-0001" "--worker-url" "spark://Worker@172.17.0.3:8881"
========================================

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:63)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:293)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to notebook:35147
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: notebook
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at java.net.InetAddress.getByName(InetAddress.java:1077)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
    at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
    at java.security.AccessController.doPrivileged(Native Method)
    at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
    at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
    at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
    at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
    at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
    at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
    at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
    at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
    at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
    at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
    at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
    at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
    at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:482)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    ... 1 more
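The root cause in the trace above appears to be `java.net.UnknownHostException: notebook`: the worker container cannot resolve the driver container's hostname. A quick stdlib-only sanity check that could be run from inside the worker container (no Spark needed; `notebook` is the hostname taken from the trace):

```python
import socket

def can_resolve(hostname):
    """Return True if this host can resolve `hostname` via DNS or the hosts file."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# Inside the worker container, can_resolve("notebook") returning False
# would confirm the UnknownHostException seen in the executor stderr.
print(can_resolve("localhost"))  # → True
```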

Please let me know if I am doing something wrong.
