Initialize Jupyter kernel with Spark: PYTHON_WORKER_FACTORY_SECRET issue?
0 votes
12 September 2018

I am setting up Jupyter Enterprise Gateway for Spark. Right now I can connect my Jupyter notebook to a kernel in client mode, but as soon as I try to submit a job I hit two errors related to PYTHON_WORKER_FACTORY_SECRET and PYSPARK_GATEWAY_SECRET, in both YARN client and YARN cluster mode.

YARN cluster mode: PYSPARK_GATEWAY_SECRET missing

File "/opt/anaconda3/lib/python3.6/threading.py", line 916, in_bootstrap_inner
    self.run()
  File "/opt/anaconda3/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "launch_ipykernel.py", line 62, in initialize_spark_session
    spark = SparkSession.builder.getOrCreate()
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/sql/session.py", line 173, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 343, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/context.py", line 292, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/java_gateway.py", line 47, in launch_gateway
    gateway_secret = os.environ["PYSPARK_GATEWAY_SECRET"]
  File "/opt/anaconda3/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'PYSPARK_GATEWAY_SECRET'
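
The last two frames show that PySpark's launch_gateway simply reads the secret from the environment of the Python driver process. As a minimal sketch, a hypothetical pre-check (not part of the actual launch_ipykernel.py) could fail with a clearer message before SparkSession.builder.getOrCreate() is called:

import os

from pyspark.sql import SparkSession


def create_spark_session_checked():
    # Hypothetical guard: in this launch path the JVM side is expected to export
    # PYSPARK_GATEWAY_SECRET before the Python driver starts; when it is missing,
    # getOrCreate() ends in the KeyError shown above.
    if "PYSPARK_GATEWAY_SECRET" not in os.environ:
        raise RuntimeError("PYSPARK_GATEWAY_SECRET is not set in the driver environment")
    return SparkSession.builder.getOrCreate()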

On a second run this can also lead to the following error:

Container: container_e03_1536582358787_0027_02_000001 on spark-worker-1.c.mozn-location.internal_45454
LogAggregationType: AGGREGATED
======================================================================================================
LogType:stdout
LogLastModifiedTime:Tue Sep 11 07:35:25 +0000 2018
LogLength:520
LogContents:
Using connection file '/tmp/kernel-a3c49386-71a3-44b8-94de-1b870914c5fb_jvq2h0jy.json' instead of '/home/elyra/.local/share/jupyter/runtime/kernel-a3c49386-71a3-44b8-94de-1b870914c5fb.json'
Signal socket bound to host: 0.0.0.0, port: 46611
Traceback (most recent call last):
  File "launch_ipykernel.py", line 319, in <module>
    lower_port, upper_port)
  File "launch_ipykernel.py", line 142, in return_connection_info
    s.connect((response_ip, response_port))
ConnectionRefusedError: [Errno 111] Connection refused

End of LogType:stdout
***********************************************************************

YARN client mode: PYTHON_WORKER_FACTORY_SECRET missing:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/daemon.py", line 170, in manager
    code = worker(sock, authenticated)
  File "/opt/anaconda3/lib/python3.6/site-packages/pyspark/daemon.py", line 62, in worker
    if os.environ["PYTHON_WORKER_FACTORY_SECRET"] == client_secret:
  File "/opt/anaconda3/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'PYTHON_WORKER_FACTORY_SECRET'
18/09/11 07:26:27 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketException: Connection reset
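
The failing check is visible in the traceback: pyspark/daemon.py compares the secret sent by the JVM (client_secret) with an environment variable of the daemon/worker process. A simplified sketch of that comparison, for illustration only:

import os

def worker_is_authenticated(client_secret):
    # Simplified illustration of the check at pyspark/daemon.py line 62 shown above.
    # The real code indexes os.environ directly, so a missing
    # PYTHON_WORKER_FACTORY_SECRET raises KeyError instead of returning False.
    expected = os.environ.get("PYTHON_WORKER_FACTORY_SECRET")
    return expected is not None and expected == client_secret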

I have tried setting this variable both via export and directly on the kernel side in kernel.json:

[elyra@spark-master ~]$ cat /usr/local/share/jupyter/kernels/spark_python_yarn_client/kernel.json 
{
  "language": "python",
  "display_name": "Spark - Python (YARN Client Mode)",
  "process_proxy": {
    "class_name": "enterprise_gateway.services.processproxies.distributed.DistributedProcessProxy"
  },
  "env": {
    "SPARK_HOME": "/usr/hdp/current/spark2-client",
    "PYSPARK_PYTHON": "/opt/anaconda3/bin/python",
    "PYTHONPATH": "/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip",
    "PYTHON_WORKER_FACTORY_SECRET": "w<X?u6I&Ekt>49n}K5kBJ^QM@Zz)Mf",
    "SPARK_OPTS": "--master yarn --deploy-mode client --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID}",
    "LAUNCH_OPTS": ""
  },
  "argv": [
    "/usr/local/share/jupyter/kernels/spark_python_yarn_client/bin/run.sh",
    "{connection_file}",
    "--RemoteProcessProxy.response-address",
    "{response_address}",
    "--RemoteProcessProxy.port-range",
    "{port_range}",
    "--RemoteProcessProxy.spark-context-initialization-mode",
    "lazy"
  ]
}
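
To check whether the values from the "env" section actually reach the kernel process, a minimal probe can be run in a notebook cell once the kernel is up (assuming it starts at all):

import os

# The "env" entries from kernel.json are expected to show up in the kernel's
# os.environ; print which of the two secrets are actually visible there.
for name in ("PYTHON_WORKER_FACTORY_SECRET", "PYSPARK_GATEWAY_SECRET"):
    print(name, "->", "set" if name in os.environ else "not set")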

For client mode, do you think this is related to the way PYTHON_WORKER_FACTORY_SECRET is read from the os environment, and to the Java gateway ports?

As for cluster mode, my understanding is that Spark's [PythonRunner][1] automatically initializes the secret that the Java gateway will use.

Based on support from the Enterprise Gateway team, I set the environment variable through spark.yarn.appMasterEnv as follows:

SPARK_OPTS": "--master yarn --deploy-mode cluster --name ${KERNEL_ID:-ERROR__NO__KERNEL_ID} --conf spark.yarn.submit.waitAppCompletion=false --conf spark.yarn.appMasterEnv.PYSPARK_GATEWAY_SECRET=this_secret_key --conf spark.yarn.appMasterEnv.PYTHONUSERBASE=/opt/anaconda3 --conf spark.yarn.appMasterEnv.PYTHONPATH=/opt/anaconda3/lib/python3.6/site-packages/:/usr/hdp/current/spark2-client/python:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip --conf spark.yarn.appMasterEnv.PATH=/opt/anaconda3/bin/python:$PATH"

But this results in a timeout; yarn logs -applicationId application_1536672003321_0007 shows:

18/09/12 11:56:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, elyra); groups with view permissions: Set(); users  with modify permissions: Set(yarn, elyra); groups with modify permissions: Set()
18/09/12 11:56:14 INFO ApplicationMaster: Preparing Local resources
18/09/12 11:56:15 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1536672003321_0007_000001
18/09/12 11:56:15 INFO ApplicationMaster: Starting the user application in a separate Thread
18/09/12 11:56:15 INFO ApplicationMaster: Waiting for spark context initialization...
18/09/12 11:57:55 ERROR ApplicationMaster: Uncaught exception: 
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/09/12 11:57:55 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds])
18/09/12 11:57:55 INFO ShutdownHookManager: Shutdown hook called

I would like to ask whether there is a better approach to setting these Spark environment variables, and what I am missing here, since my understanding is that neither PYSPARK_GATEWAY_SECRET nor PYTHON_WORKER_FACTORY_SECRET should need to be set manually.

...