I have a strange problem. I have been stuck on it for a week and, unfortunately, have not found a solution.
I am working with Spark 2.3.0. This version is available on a Linux server that I access remotely (ssh).
To run my application (test.py), I wrote the following script:
#!/bin/bash
#SBATCH --account=def-moudi
#SBATCH --nodes=2
#SBATCH --time=00:10:00
#SBATCH --mem=100G
#SBATCH --cpus-per-task=5
#SBATCH --ntasks-per-node=6
#SBATCH --output=/project/6008168/moudi/job/spark-job/sparkjob-%j.out
#SBATCH --mail-type=ALL
#SBATCH --error=/project/6008168/moudi/job/spark-job/error6_hours.out
# load the Spark module
module load spark/2.3.0
module load python/3.7.0
source "/home/moudi/ENV3.7.0/bin/activate"
# identify the Spark cluster with the Slurm jobid
export SPARK_IDENT_STRING=$SLURM_JOBID
export JOB_HOME="$HOME/.spark/2.3.0/$SPARK_IDENT_STRING"
mkdir -p $JOB_HOME
## --------------------------------------
## 1. Start the Spark cluster master
## --------------------------------------
$SPARK_HOME/sbin/start-master.sh
sleep 5
MASTER_URL=$(grep -Po '(?=spark://).*' $SPARK_LOG_DIR/spark-${SPARK_IDENT_STRING}-org.apache.spark.deploy.master*.out)
## --------------------------------------
## 2. Start the Spark cluster workers
## --------------------------------------
# get the resource details from the Slurm job
export SPARK_WORKER_CORES=${SLURM_CPUS_PER_TASK:-1}
export SPARK_MEM=$(( ${SLURM_MEM_PER_CPU:-3072} * ${SLURM_CPUS_PER_TASK:-1} ))
export SPARK_DAEMON_MEMORY=${SPARK_MEM}m
export SPARK_WORKER_MEMORY=${SPARK_MEM}m
NWORKERS=${SLURM_NTASKS:-1} #just for testing you should delete this line
# start the workers on each node allocated to the job
export SPARK_NO_DAEMONIZE=1
srun -n ${NWORKERS} -N $SLURM_JOB_NUM_NODES --label --output=$SPARK_LOG_DIR/spark-%j-workers.out start-slave.sh -m ${SPARK_MEM}m -c ${SPARK_WORKER_CORES} ${MASTER_URL} &
## --------------------------------------
## 3. Submit a task to the Spark cluster
## --------------------------------------
spark-submit --master ${MASTER_URL} --total-executor-cores $((SLURM_NTASKS * SLURM_CPUS_PER_TASK)) --executor-memory ${SPARK_WORKER_MEMORY} --driver-memory ${SPARK_WORKER_MEMORY}m --num-executors $((SLURM_NTASKS - 1)) /project/6008168/moudi/test.py
## --------------------------------------
## 4. Clean up
## --------------------------------------
# stop the workers
scancel ${SLURM_JOBID}.0
# stop the master
$SPARK_HOME/sbin/stop-master.sh
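For reference, the numbers the job script derives from the Slurm settings can be recomputed by hand. The sketch below is only an illustration: the helper names `expected_tasks` and `worker_mem_mb` are hypothetical, and it assumes that Slurm sets SLURM_NTASKS to nodes times tasks per node and that SLURM_MEM_PER_CPU is unset (the job requests --mem, not --mem-per-cpu), so the 3072 default in the script applies.

```shell
#!/bin/bash
# Sketch: recompute the values the job script derives from the #SBATCH settings.
# Assumption: SLURM_NTASKS = nodes * ntasks-per-node, SLURM_MEM_PER_CPU is unset.

# hypothetical helper: number of tasks srun starts (nodes * tasks per node)
expected_tasks() {
    echo $(( $1 * $2 ))
}

# hypothetical helper: same formula as SPARK_MEM in the job script (result in MB)
worker_mem_mb() {
    local mem_per_cpu=${1:-3072}
    local cpus_per_task=${2:-1}
    echo $(( mem_per_cpu * cpus_per_task ))
}

# --nodes=2, --ntasks-per-node=6, --cpus-per-task=5
echo "tasks passed to srun -n: $(expected_tasks 2 6)"
echo "memory per worker: $(worker_mem_mb 3072 5)m"
```

With these settings the sketch gives 12 tasks and 15360m per worker. The memory figure agrees with the `-m 15360m -c 5` arguments in the Spark commands logged by the workers.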
When I submit the job script, I see that only 8 workers are started, which is not right: there should be 11. The output log looks like this:
2: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
3: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
0: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
1: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
5: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
0: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
0: ========================================
1: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
1: ========================================
2: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
2: ========================================
3: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
3: ========================================
5: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
5: ========================================
10: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
10: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
10: ========================================
3: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
1: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
5: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
2: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
3: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190920@cdr562.int.cedar.computecanada.ca
1: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190924@cdr562.int.cedar.computecanada.ca
2: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190921@cdr562.int.cedar.computecanada.ca
5: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190923@cdr562.int.cedar.computecanada.ca
3: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
1: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
3: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
3: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
1: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
1: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
2: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
5: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
2: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
2: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
5: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
5: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
0: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
0: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 190922@cdr562.int.cedar.computecanada.ca
0: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
0: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
0: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
3: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
3: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
3: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to:
1: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
3: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to:
1: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
3: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
1: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to:
1: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to:
1: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
5: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
5: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
5: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to:
5: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to:
5: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
2: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
2: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
2: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to:
2: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to:
2: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
0: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
0: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
0: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to:
0: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to:
0: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
10: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
10: 19/05/05 08:25:38 INFO Worker: Started daemon with process name: 134076@cdr743.int.cedar.computecanada.ca
10: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for TERM
10: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for HUP
10: 19/05/05 08:25:38 INFO SignalUtils: Registered signal handler for INT
10: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls to: moudi
10: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls to: moudi
10: 19/05/05 08:25:38 INFO SecurityManager: Changing view acls groups to:
10: 19/05/05 08:25:38 INFO SecurityManager: Changing modify acls groups to:
10: 19/05/05 08:25:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
3: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 35634.
1: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 41932.
5: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 36466.
2: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 32857.
0: 19/05/05 08:25:38 INFO Utils: Successfully started service 'sparkWorker' on port 41950.
3: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:35634 with 5 cores, 15.0 GB RAM
1: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:41932 with 5 cores, 15.0 GB RAM
5: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:36466 with 5 cores, 15.0 GB RAM
1: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
3: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
1: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
3: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
5: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
5: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
2: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:32857 with 5 cores, 15.0 GB RAM
2: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
2: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
0: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.49:41950 with 5 cores, 15.0 GB RAM
0: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
0: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
10: 19/05/05 08:25:39 INFO Utils: Successfully started service 'sparkWorker' on port 35803.
3: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
1: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
3: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8082.
5: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
5: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8082. Attempting port 8083.
5: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8083.
2: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
2: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8082. Attempting port 8083.
2: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8083. Attempting port 8084.
2: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8084.
4: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
3: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8082
1: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8081
3: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
1: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
5: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8083
5: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
2: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8084
2: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8081. Attempting port 8082.
0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8082. Attempting port 8083.
0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8083. Attempting port 8084.
0: 19/05/05 08:25:39 WARN Utils: Service 'WorkerUI' could not bind on port 8084. Attempting port 8085.
0: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8085.
0: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr562.int.cedar.computecanada.ca:8085
0: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
11: starting org.apache.spark.deploy.worker.Worker, logging to /home/moudi/.spark/2.3.0/logs/spark-20562069-org.apache.spark.deploy.worker.Worker-1-cdr562.out
10: 19/05/05 08:25:39 INFO Worker: Starting Spark worker 172.16.138.230:35803 with 5 cores, 15.0 GB RAM
10: 19/05/05 08:25:39 INFO Worker: Running Spark version 2.3.0
10: 19/05/05 08:25:39 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
3: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 39 ms (0 ms spent in bootstraps)
1: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 45 ms (0 ms spent in bootstraps)
5: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 43 ms (0 ms spent in bootstraps)
4: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
4: ========================================
2: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 51 ms (0 ms spent in bootstraps)
0: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 42 ms (0 ms spent in bootstraps)
10: 19/05/05 08:25:39 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
10: 19/05/05 08:25:39 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://cdr743.int.cedar.computecanada.ca:8081
10: 19/05/05 08:25:39 INFO Worker: Connecting to master cdr562.int.cedar.computecanada.ca:7077...
3: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
1: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
5: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
0: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
2: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
11: Spark Command: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/java/1.8.0_121/bin/java -cp /home/moudi/.spark/2.3.0/conf/:/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0/jars/* -Xmx15360m org.apache.spark.deploy.worker.Worker --webui-port 8081 -m 15360m -c 5 spark://cdr562.int.cedar.computecanada.ca:7077
11: ========================================
10: 19/05/05 08:25:39 INFO TransportClientFactory: Successfully created connection to cdr562.int.cedar.computecanada.ca/172.16.138.49:7077 after 48 ms (0 ms spent in bootstraps)
10: 19/05/05 08:25:39 INFO Worker: Successfully registered with master spark://cdr562.int.cedar.computecanada.ca:7077
4: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
4: 19/05/05 08:25:40 INFO Worker: Started daemon with process name: 191630@cdr562.int.cedar.computecanada.ca
4: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for TERM
4: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for HUP
4: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for INT
11: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
11: 19/05/05 08:25:40 INFO Worker: Started daemon with process name: 134213@cdr743.int.cedar.computecanada.ca
4: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls to: moudi
4: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls to: moudi
4: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls groups to:
4: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls groups to:
4: 19/05/05 08:25:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
11: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for TERM
11: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for HUP
11: 19/05/05 08:25:40 INFO SignalUtils: Registered signal handler for INT
11: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls to: moudi
11: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls to: moudi
11: 19/05/05 08:25:40 INFO SecurityManager: Changing view acls groups to:
11: 19/05/05 08:25:40 INFO SecurityManager: Changing modify acls groups to:
11: 19/05/05 08:25:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(moudi); groups with view permissions: Set(); users with modify permissions: Set(moudi); groups with modify permissions: Set()
4: 19/05/05 08:25:41 INFO Utils: Successfully started service 'sparkWorker' on port 41764.
11: 19/05/05 08:25:41 INFO Utils: Successfully started service 'sparkWorker' on port 42231.
4: 19/05/05 08:25:41 INFO Worker: Starting Spark worker 172.16.138.49:41764 with 5 cores, 15.0 GB RAM
4: 19/05/05 08:25:41 INFO Worker: Running Spark version 2.3.0
4: 19/05/05 08:25:41 INFO Worker: Spark home: /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/spark/2.3.0
0: slurmstepd: error: *** STEP 20562069.0 ON cdr562 CANCELLED AT 2019-05-05T08:25:41 ***
Could you please explain why I get only 8 workers? Is there a misconfiguration in my script that causes only these 8 workers to be created?
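One way to count the workers from the log rather than by eye is to tally the distinct srun task labels (the `N:` prefix added by `--label`) on lines that match a given message. This is a rough sketch, not a definitive tool: the helper name `count_labels` is hypothetical, and the log path passed to it is whatever file srun actually wrote to.

```shell
#!/bin/bash
# Sketch: count the distinct srun task labels ("N:" prefix from --label)
# on log lines that match a pattern.

# hypothetical helper: count_labels PATTERN LOGFILE
count_labels() {
    grep -E -- "$1" "$2" \
        | awk -F': ' '{print $1}' \
        | sort -un \
        | wc -l \
        | tr -d ' '
}

# Run only when a pattern and a log file are given on the command line.
if [ "$#" -eq 2 ]; then
    count_labels "$1" "$2"
fi
```

For example, calling it with the pattern `starting org.apache.spark.deploy.worker.Worker` on the log shown above gives 8 (labels 0 to 5, 10, and 11), which is the number I am asking about. The pattern `Successfully registered with master` can be used the same way to see how many of them reached the master.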