Spark executor lost when increasing the number of executor instances
March 21, 2019

My Hadoop cluster currently has 4 nodes and 45 cores and runs pyspark 2.4 on YARN. When I run spark-submit with a single executor, everything works fine, but if I change the number of executor instances to 3 or 4, the extra executors are killed by the driver and only one task runs.

I changed the following settings in Cloudera Manager:

yarn.nodemanager.resource.memory-mb: 64 GB
yarn.nodemanager.resource.cpu-vcores: 45

Below is the log I get:

19/03/21 11:28:48 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
19/03/21 11:28:48 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, datanode1, executor 2, partition 0, PROCESS_LOCAL, 7701 bytes)
19/03/21 11:28:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on datanode1:42432 (size: 71.0 KB, free: 366.2 MB)
19/03/21 11:29:43 INFO spark.ExecutorAllocationManager: Request to remove executorIds: 1, 3
19/03/21 11:29:43 INFO cluster.YarnClientSchedulerBackend: Requesting to kill executor(s) 1, 3
19/03/21 11:29:43 INFO cluster.YarnClientSchedulerBackend: Actual list of executor(s) to be killed is 1, 3
19/03/21 11:29:43 INFO spark.ExecutorAllocationManager: Removing executor 1 because it has been idle for 60 seconds (new desired total will be 2)
19/03/21 11:29:43 INFO spark.ExecutorAllocationManager: Removing executor 3 because it has been idle for 60 seconds (new desired total will be 1)
19/03/21 11:29:45 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 3.
19/03/21 11:29:45 INFO scheduler.DAGScheduler: Executor lost: 3 (epoch 0)
19/03/21 11:29:45 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
19/03/21 11:29:45 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, datanode2, 32853, None)
19/03/21 11:29:45 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor
19/03/21 11:29:45 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Disabling executor 1.
19/03/21 11:29:45 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 0)
19/03/21 11:29:45 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
19/03/21 11:29:45 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, datanode3, 39466, None)
19/03/21 11:29:45 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
19/03/21 11:29:45 INFO cluster.YarnScheduler: Executor 3 on datanode2 killed by driver.
19/03/21 11:29:45 INFO cluster.YarnScheduler: Executor 1 on datanode3 killed by driver.
19/03/21 11:29:45 INFO spark.ExecutorAllocationManager: Existing executor 3 has been removed (new total is 2)
19/03/21 11:29:45 INFO spark.ExecutorAllocationManager: Existing executor 1 has been removed (new total is 1)
...
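
To make the log easier to read, here is a minimal Python sketch that pulls out two facts from it: how many tasks each task set contains, and which executors were removed for being idle. The sample lines are copied from the log above; the function name summarize and the two regular expressions are my own illustration, not part of Spark. The messages come from spark.ExecutorAllocationManager, which is Spark's dynamic allocation component, so the removals look like idle-timeout cleanup rather than a crash, though that reading is an assumption based only on this excerpt.

```python
import re

# Sample lines copied verbatim from the driver log above.
LOG = """\
19/03/21 11:28:48 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks
19/03/21 11:29:43 INFO spark.ExecutorAllocationManager: Removing executor 1 because it has been idle for 60 seconds (new desired total will be 2)
19/03/21 11:29:43 INFO spark.ExecutorAllocationManager: Removing executor 3 because it has been idle for 60 seconds (new desired total will be 1)
"""

# Hypothetical patterns written for this excerpt; other Spark versions
# may word these messages differently.
TASKS_RE = re.compile(r"Adding task set (\S+) with (\d+) tasks")
IDLE_RE = re.compile(
    r"Removing executor (\d+) because it has been idle for (\d+) seconds"
)


def summarize(log_text):
    """Return (task count per task set, idle seconds per removed executor)."""
    task_sets = {}
    idle_removed = {}
    for line in log_text.splitlines():
        match = TASKS_RE.search(line)
        if match:
            task_sets[match.group(1)] = int(match.group(2))
            continue
        match = IDLE_RE.search(line)
        if match:
            idle_removed[int(match.group(1))] = int(match.group(2))
    return task_sets, idle_removed


task_sets, idle_removed = summarize(LOG)
print("tasks per task set:", task_sets)
print("executors removed as idle (seconds idle):", idle_removed)
```

Run against this excerpt, the summary shows a single task in task set 0.0 and executors 1 and 3 removed after 60 idle seconds. With only one task scheduled, only one executor has work to do, which would be consistent with the others sitting idle until they are released.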