Running PySpark in a Jupyter notebook with Livy
March 8, 2019

I am new to PySpark and tried to set up an environment locally. I installed Spark, Hadoop, PySpark, Livy, and Sparkmagic (the paths in the log below show Spark 2.4.0 and Livy 0.4.0-incubating), but I could not run PySpark code in a Jupyter notebook. It worked when I switched the kernel to Spark and ran Scala code. For some reason only the PySpark and PySpark3 kernels fail. I got the following error:

The code failed because of a fatal error:
    Session 1 unexpectedly reached final status 'error'. See logs:
stdout: 
2019-03-08 17:41:43 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-03-08 17:41:44 INFO  RSCDriver:158 - Connecting to: 10.236.40.127:10000
2019-03-08 17:41:44 INFO  RSCDriver:180 - Starting RPC server...
2019-03-08 17:41:49 INFO  RpcServer:105 - Connected to the port 10002
2019-03-08 17:41:54 WARN  RSCConf:142 - Your hostname, 10.236.40.127, resolves to a loopback address, but we couldn't find any external IP address!
2019-03-08 17:41:54 WARN  RSCConf:144 - Set livy.rsc.rpc.server.address if you need to bind to another address.
2019-03-08 17:41:54 INFO  RSCDriver:404 - Received job request 3384f73e-015a-4979-affd-770cef0d2f85
2019-03-08 17:41:54 INFO  RSCDriver:364 - SparkContext not yet up, queueing job request.
2019-03-08 17:41:55 ERROR PythonInterpreter:52 - Process has died with 1
2019-03-08 17:41:55 ERROR PythonInterpreter:52 - Traceback (most recent call last):
  File "/var/folders/gb/xd8jwpm514xc04ss6ztdnbv80000gn/T/5478442844131101061", line 644, in <module>
    sys.exit(main())
  File "/var/folders/gb/xd8jwpm514xc04ss6ztdnbv80000gn/T/5478442844131101061", line 534, in main
    exec('from pyspark.shell import sc', global_dict)
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 634, in _load_backward_compatible
  File "/usr/local/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/shell.py", line 38, in <module>
  File "/usr/local/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 298, in _ensure_initialized
  File "/usr/local/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/java_gateway.py", line 48, in launch_gateway
  File "/Users/casbie/.pyenv/versions/3.5.1/lib/python3.5/os.py", line 683, in __getitem__
    raise KeyError(key) from None
KeyError: 'PYSPARK_GATEWAY_SECRET'
2019-03-08 17:41:55 INFO  SparkContext:54 - Running Spark version 2.4.0
2019-03-08 17:41:55 INFO  SparkContext:54 - Submitted application: livy-session-1
2019-03-08 17:41:55 INFO  SecurityManager:54 - Changing view acls to: casbie
2019-03-08 17:41:55 INFO  SecurityManager:54 - Changing modify acls to: casbie
2019-03-08 17:41:55 INFO  SecurityManager:54 - Changing view acls groups to: 
2019-03-08 17:41:55 INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-03-08 17:41:55 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(casbie); groups with view permissions: Set(); users  with modify permissions: Set(casbie); groups with modify permissions: Set()
2019-03-08 17:41:56 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 54185.
2019-03-08 17:41:56 INFO  SparkEnv:54 - Registering MapOutputTracker
2019-03-08 17:41:56 INFO  SparkEnv:54 - Registering BlockManagerMaster
2019-03-08 17:41:56 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-03-08 17:41:56 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-03-08 17:41:56 INFO  DiskBlockManager:54 - Created local directory at /private/var/folders/gb/xd8jwpm514xc04ss6ztdnbv80000gn/T/blockmgr-749630f0-9862-4274-a7bc-7f8c7067b4cd
2019-03-08 17:41:56 INFO  MemoryStore:54 - MemoryStore started with capacity 353.4 MB
2019-03-08 17:41:56 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2019-03-08 17:41:56 INFO  log:192 - Logging initialized @24019ms
2019-03-08 17:41:56 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2019-03-08 17:41:56 INFO  Server:419 - Started @24101ms
2019-03-08 17:41:56 WARN  Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2019-03-08 17:41:56 INFO  AbstractConnector:278 - Started ServerConnector@3737155d{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
2019-03-08 17:41:56 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4041.
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@369d4f55{/jobs,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6981058b{/jobs/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7355eced{/jobs/job,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2325b264{/jobs/job/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@426956d6{/stages,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1e07e8f2{/stages/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@76f64ea6{/stages/stage,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5178642e{/stages/stage/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@153f83d9{/stages/pool,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1e5e7fe4{/stages/pool/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@24681af2{/storage,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36234345{/storage/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5feb3f13{/storage/rdd,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3227ca55{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@567cf969{/environment,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7ae9b188{/environment/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7cfc0a61{/executors,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@708512c9{/executors/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@41031a01{/executors/threadDump,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@76518658{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6a6225f1{/static,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@279baf37{/,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2cbd86ce{/api,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@386f23f3{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@ec1ee05{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-03-08 17:41:56 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://10.236.40.127:4041
2019-03-08 17:41:56 INFO  SparkContext:54 - Added JAR file:///usr/local/livy-0.4.0-incubating-bin/rsc-jars/livy-api-0.4.0-incubating.jar at spark://10.236.40.127:54185/jars/livy-api-0.4.0-incubating.jar with timestamp 1552034516525
2019-03-08 17:41:56 INFO  SparkContext:54 - Added JAR file:///usr/local/livy-0.4.0-incubating-bin/rsc-jars/livy-rsc-0.4.0-incubating.jar at spark://10.236.40.127:54185/jars/livy-rsc-0.4.0-incubating.jar with timestamp 1552034516527
2019-03-08 17:41:56 INFO  SparkContext:54 - Added JAR file:///usr/local/livy-0.4.0-incubating-bin/rsc-jars/netty-all-4.0.29.Final.jar at spark://10.236.40.127:54185/jars/netty-all-4.0.29.Final.jar with timestamp 1552034516527
2019-03-08 17:41:56 INFO  SparkContext:54 - Added JAR file:///usr/local/livy-0.4.0-incubating-bin/repl_2.11-jars/commons-codec-1.9.jar at spark://10.236.40.127:54185/jars/commons-codec-1.9.jar with timestamp 1552034516527
2019-03-08 17:41:56 INFO  SparkContext:54 - Added JAR file:///usr/local/livy-0.4.0-incubating-bin/repl_2.11-jars/livy-core_2.11-0.4.0-incubating.jar at spark://10.236.40.127:54185/jars/livy-core_2.11-0.4.0-incubating.jar with timestamp 1552034516528
2019-03-08 17:41:56 INFO  SparkContext:54 - Added JAR file:///usr/local/livy-0.4.0-incubating-bin/repl_2.11-jars/livy-repl_2.11-0.4.0-incubating.jar at spark://10.236.40.127:54185/jars/livy-repl_2.11-0.4.0-incubating.jar with timestamp 1552034516528
2019-03-08 17:41:56 INFO  Executor:54 - Starting executor ID driver on host localhost
2019-03-08 17:41:56 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 54186.
2019-03-08 17:41:56 INFO  NettyBlockTransferService:54 - Server created on 10.236.40.127:54186
2019-03-08 17:41:56 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-03-08 17:41:56 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, 10.236.40.127, 54186, None)
2019-03-08 17:41:56 INFO  BlockManagerMasterEndpoint:54 - Registering block manager 10.236.40.127:54186 with 353.4 MB RAM, BlockManagerId(driver, 10.236.40.127, 54186, None)
2019-03-08 17:41:56 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, 10.236.40.127, 54186, None)
2019-03-08 17:41:56 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, 10.236.40.127, 54186, None)
2019-03-08 17:41:57 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@79c919e7{/metrics/json,null,AVAILABLE,@Spark}

stderr: .

Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
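
Since the log is long and mostly INFO noise, here is a minimal sketch for pulling out only the ERROR lines and the final exception line of a Python traceback. The names `summarize_log`, `LOG_LINE`, and `EXC_LINE` are hypothetical helpers made up for illustration, and the regular expressions assume the timestamp/level layout seen in the log above; other log formats would need different patterns.

```python
import re

# Assumed layout: "2019-03-08 17:41:55 ERROR PythonInterpreter:52 - message"
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} "
    r"(?P<level>[A-Z]+)\s+(?P<source>\S+) - (?P<msg>.*)$"
)

# Last line of a Python traceback, e.g. "KeyError: 'PYSPARK_GATEWAY_SECRET'"
EXC_LINE = re.compile(
    r"^(?P<exc>[A-Za-z_][A-Za-z0-9_.]*(?:Error|Exception)): (?P<detail>.*)$"
)


def summarize_log(text):
    """Return (errors, exceptions) found in a pasted session log.

    errors     -- list of (source, message) for lines logged at ERROR level
    exceptions -- list of (exception name, detail) for unindented
                  "SomeError: ..." lines such as the end of a traceback
    """
    errors = []
    exceptions = []
    for line in text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            if m.group("level") == "ERROR":
                errors.append((m.group("source"), m.group("msg")))
            continue
        m = EXC_LINE.match(line)
        if m:
            exceptions.append((m.group("exc"), m.group("detail")))
    return errors, exceptions


if __name__ == "__main__":
    # Hypothetical usage: paste the session log into a file first.
    import sys

    with open(sys.argv[1], encoding="utf-8") as fh:
        errs, excs = summarize_log(fh.read())
    for source, msg in errs:
        print("ERROR", source, msg)
    for name, detail in excs:
        print(name, detail)
```

Run against the log above, this leaves the two `PythonInterpreter:52` ERROR lines and `KeyError: 'PYSPARK_GATEWAY_SECRET'`, which is the lookup that fails inside `launch_gateway` in `pyspark/java_gateway.py`.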

Thanks.

...