My Spark program is failing, and neither the scheduler, the driver, nor the executors report any useful error apart from exit status 137. What could be causing the Spark program to fail?
The failure seems to happen while converting an RDD to a DataFrame:
val df = sqlc.createDataFrame(processedData, schema).persist()
Right before the failure, the logs look like this:
Scheduler
19/01/22 04:01:05 INFO JobUtils: stderr: 19/01/22 04:01:04 WARN TaskSetManager: Stage 11 contains a task of very large size (22028 KB). The maximum recommended task size is 100 KB.
19/01/22 04:01:05 INFO JobUtils: stderr: 19/01/22 04:01:04 INFO TaskSetManager: Starting task 0.0 in stage 11.0 (TID 23, 10.141.1.247, executor 1133b735-967d-136c-2bbf-ffcb3884c88c-1548129213980, partition 0, PROCESS_LOCAL, 22557269 bytes)
19/01/22 04:01:05 INFO JobUtils: stderr: 19/01/22 04:01:04 INFO TaskSetManager: Starting task 1.0 in stage 11.0 (TID 24, 10.141.3.144, executor a92ceb18-b46a-c986-4672-cab9086c54c2-1548129202094, partition 1, PROCESS_LOCAL, 22558910 bytes)
19/01/22 04:01:05 INFO JobUtils: stderr: 19/01/22 04:01:04 INFO TaskSetManager: Starting task 2.0 in stage 11.0 (TID 25, 10.141.1.56, executor b9167d92-bed2-fe21-46fd-08f2c6fd1998-1548129206680, partition 2, PROCESS_LOCAL, 22558910 bytes)
19/01/22 04:01:05 INFO JobUtils: stderr: 19/01/22 04:01:04 INFO TaskSetManager: Starting task 3.0 in stage 11.0 (TID 26, 10.141.3.146, executor 0cf7394b-540d-2a6c-258a-e27bbedbdd0e-1548129212488, partition 3, PROCESS_LOCAL, 22558910 bytes)
19/01/22 04:01:09 DEBUG JobUtils: Tracing alloc 12943f1a-82ed-d4f4-07b3-dfbe5a46716b for driver
...
19/01/22 04:13:45 DEBUG JobUtils: Tracing alloc 12943f1a-82ed-d4f4-07b3-dfbe5a46716b for driver
19/01/22 04:13:46 INFO JobUtils: driver Terminated -- Exit status 137
19/01/22 04:13:46 INFO JobUtils: driver Restarting -- Restart within policy
Driver
19/01/22 04:01:12 INFO DAGScheduler: Job 7 finished: runJob at SparkHadoopMapReduceWriter.scala:88, took 8.008375 s
19/01/22 04:01:12 INFO SparkHadoopMapReduceWriter: Job job_20190122040104_0032 committed.
19/01/22 04:01:13 INFO MapPartitionsRDD: Removing RDD 28 from persistence list
19/01/22 04:01:13 INFO BlockManager: Removing RDD 28
Executors (each shows some variation of this)
19/01/22 04:01:13 INFO BlockManager: Removing RDD 28
19/01/22 04:13:45 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver 10.141.2.48:21297 disassociated! Shutting down.
19/01/22 04:13:45 INFO DiskBlockManager: Shutdown hook called
19/01/22 04:13:45 INFO ShutdownHookManager: Shutdown hook called
19/01/22 04:13:45 INFO ShutdownHookManager: Deleting directory /alloc/spark-ce736cb6-8b8e-4891-b9c7-06ea9d9cf797
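For reading the exit status in the scheduler log: by the usual shell convention, a status above 128 means the process was terminated by a signal, and the signal number is the status minus 128. The sketch below only does that arithmetic. The names ExitStatus, describe and signalNames are made up for illustration, and the small signal table assumes Linux numbering. It does not show who sent the signal or why.

```scala
// Sketch: decode a process exit status using the common shell convention
// (status = 128 + signal number when a process is killed by a signal).
object ExitStatus {
  // Hand-written table of a few signal numbers (assumption: Linux numbering).
  private val signalNames: Map[Int, String] = Map(
    2  -> "SIGINT",
    9  -> "SIGKILL",
    15 -> "SIGTERM"
  )

  // Returns a short human-readable description of the status.
  def describe(status: Int): String =
    if (status > 128 && status < 256) {
      val sig  = status - 128
      val name = signalNames.getOrElse(sig, s"signal $sig")
      s"killed by $name"
    } else {
      s"exited with code $status"
    }

  def main(args: Array[String]): Unit =
    println(describe(137)) // 137 - 128 = 9
}
```

So the driver appears to have been killed with SIGKILL from outside the JVM rather than exiting on its own, which would be consistent with the absence of any exception or stack trace in the driver log.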