Stop retrying the connection to the YARN ResourceManager after a limited number of attempts in Java
September 18, 2018

I am using the org.apache.spark.deploy.yarn.Client API in Java to submit a Spark application to YARN.

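import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.springframework.util.StringUtils;

SparkConf sparkConf = new SparkConf();

// Command-line style arguments parsed by ClientArguments
List<String> submitArgs = new ArrayList<>();

if (StringUtils.hasText(appName)) {
    submitArgs.add("--name");
    submitArgs.add(appName);
    sparkConf.setAppName(appName);
}
submitArgs.add("--jar");
submitArgs.add(appJarPath);

submitArgs.add("--class");
submitArgs.add(appMainClass);

// Marks the process as running in YARN mode
System.setProperty("SPARK_YARN_MODE", "true");

sparkConf.setMaster("yarn")
       .set("spark.submit.deployMode", "cluster")
       .set("spark.yarn.queue", queue);

ClientArguments clientArguments = new ClientArguments(submitArgs.toArray(new String[0]));
Client client = new Client(clientArguments, sparkConf);
// Submits the application and, by default, waits for it to finish
client.run();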

I am trying to handle the case where the client cannot connect to the ResourceManager: after a limited number of attempts it should give up and throw an exception. However, the client keeps retrying the connection to the YARN ResourceManager without any timeout. See the console output below:

2018-09-18 14:20:25.046  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.yarn.client.RMProxy    : Connecting to ResourceManager at /0.0.0.0:8032
2018-09-18 14:20:27.600  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:29.603  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:31.600  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:33.607  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:35.608  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:37.619  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:39.620  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:41.623  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:43.633  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:20:45.632  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:18.665  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:20.676  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:22.679  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:24.686  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:26.691  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:28.695  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:30.697  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-09-18 14:21:32.708  INFO 10480 --- [nio-8080-exec-1] org.apache.hadoop.ipc.Client             : Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
...
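In the log, the "Already tried" counter runs from 0 to 9, then after a pause of about 33 seconds it starts again at 0. That pattern fits two nested retry loops: the IPC-level `RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)` named in the log (Hadoop keys `ipc.client.connect.max.retries` and `ipc.client.connect.retry.interval`), wrapped in an outer ResourceManager-proxy policy controlled by `yarn.resourcemanager.connect.max-wait.ms` (default 900000 ms) and `yarn.resourcemanager.connect.retry-interval.ms` (default 30000 ms). The following is a rough sketch that estimates the total number of connection attempts and the total sleep time for given settings. It assumes, as in Hadoop 2.x's `RetryUpToMaximumTimeWithFixedSleep`, that the outer policy allows `max-wait / retry-interval` retries; this may differ between versions, and the class `YarnRetryEstimate` is hypothetical.

```java
/**
 * Rough, sleep-only estimate of how long the YARN client keeps retrying when
 * the ResourceManager is unreachable. Assumes two nested fixed-sleep policies:
 * an IPC-level one (ipc.client.connect.*) and an outer RM-proxy one
 * (yarn.resourcemanager.connect.*) whose retry count is maxWait / interval.
 */
final class YarnRetryEstimate {
    private YarnRetryEstimate() {}

    /** Connection attempts per round: the first try plus ipcMaxRetries retries. */
    static long attemptsPerRound(int ipcMaxRetries) {
        return ipcMaxRetries + 1L;
    }

    /** Rounds: the first round plus maxWait / interval outer retries (assumption). */
    static long rounds(long rmMaxWaitMs, long rmRetryIntervalMs) {
        return rmMaxWaitMs / rmRetryIntervalMs + 1;
    }

    /** Upper-bound estimate of failed connection attempts before giving up. */
    static long totalAttempts(int ipcMaxRetries, long rmMaxWaitMs, long rmRetryIntervalMs) {
        return rounds(rmMaxWaitMs, rmRetryIntervalMs) * attemptsPerRound(ipcMaxRetries);
    }

    /** Total sleep time only; time spent inside each failed connect is not counted. */
    static long totalSleepMs(int ipcMaxRetries, long ipcRetryIntervalMs,
                             long rmMaxWaitMs, long rmRetryIntervalMs) {
        long r = rounds(rmMaxWaitMs, rmRetryIntervalMs);
        // ipcMaxRetries sleeps inside every round, plus one outer sleep between rounds
        return r * ipcMaxRetries * ipcRetryIntervalMs + (r - 1) * rmRetryIntervalMs;
    }

    public static void main(String[] args) {
        // Defaults: 10 IPC retries 1 s apart; RM max wait 15 min, rounds 30 s apart
        System.out.println(totalAttempts(10, 900_000, 30_000));              // 341
        System.out.println(totalSleepMs(10, 1_000, 900_000, 30_000) / 1000); // 1210
    }
}
```

Under these assumptions the defaults give about 31 rounds and roughly 20 minutes of sleeping alone; the log shows each attempt also takes about a second, so the real duration would be longer, which is why the retries look endless.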

What custom configuration do I need, and how do I apply it, so that the retries stop after a limited number of attempts or a limited amount of time?
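One possible direction, as a minimal sketch assuming Spark 2.x on Hadoop 2.x: the retry loops visible in the log are governed by the Hadoop client keys `yarn.resourcemanager.connect.max-wait.ms`, `yarn.resourcemanager.connect.retry-interval.ms`, `ipc.client.connect.max.retries` and `ipc.client.connect.retry.interval`. Spark normally strips the `spark.hadoop.` prefix from SparkConf entries and copies them into the Hadoop configuration it builds, so the keys below carry that prefix. The class `YarnRetryOverrides`, its method and the example values are hypothetical; the check that max wait is at least the retry interval is this sketch's own rule, not a documented Hadoop requirement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: Hadoop client retry limits expressed as "spark.hadoop.*"
 * SparkConf entries (assumes Spark copies them into its Hadoop Configuration).
 */
final class YarnRetryOverrides {
    private static final String PREFIX = "spark.hadoop.";

    private YarnRetryOverrides() {}

    /**
     * @param rmMaxWaitMs        total budget for the outer ResourceManager retry policy
     * @param rmRetryIntervalMs  pause between outer retry rounds
     * @param ipcMaxRetries      IPC-level connect retries inside each round
     * @param ipcRetryIntervalMs pause between IPC-level connect retries
     */
    static Map<String, String> limitedRetries(long rmMaxWaitMs, long rmRetryIntervalMs,
                                              int ipcMaxRetries, long ipcRetryIntervalMs) {
        // This sketch rejects a budget smaller than one interval (and -1, "wait forever")
        if (rmMaxWaitMs < rmRetryIntervalMs) {
            throw new IllegalArgumentException("max wait must be >= retry interval");
        }
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(PREFIX + "yarn.resourcemanager.connect.max-wait.ms", Long.toString(rmMaxWaitMs));
        conf.put(PREFIX + "yarn.resourcemanager.connect.retry-interval.ms", Long.toString(rmRetryIntervalMs));
        conf.put(PREFIX + "ipc.client.connect.max.retries", Integer.toString(ipcMaxRetries));
        conf.put(PREFIX + "ipc.client.connect.retry.interval", Long.toString(ipcRetryIntervalMs));
        return conf;
    }

    public static void main(String[] args) {
        // Example values only: a few short rounds instead of the 15-minute default budget
        limitedRetries(10_000, 5_000, 2, 1_000)
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In the question's code, each entry could be applied with `sparkConf.set(key, value)` before `new Client(...)` is called; the expectation is that `client.run()` then fails with an exception once the budget is used up. The same keys without the `spark.hadoop.` prefix could instead be placed in `yarn-site.xml` (the `yarn.*` keys) and `core-site.xml` (the `ipc.*` keys).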

...