EMR Data Pipeline throughput error when using DynamoDB OnDemand
October 21, 2019

I am using Data Pipeline to write to DynamoDB through an EMR step. The DynamoDB table uses OnDemand capacity, yet I still get throughput errors.

Below are the step I use for the EMR activity and the resulting error messages:

command-runner.jar,spark-submit,--deploy-mode,client,--master,yarn,--conf,spark.yarn.maxAppAttempts=1,--conf,spark.driver.memory=512M,--conf,spark.executor.cores=2,--class,<Class here>,<jar path>,--inputS3Path,#{input.directoryPath},--dynamoDbTableName,#{myDDBTableName},--throughputWritePercent,0.95
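
For reference, the step above is a single comma-separated string. A minimal sketch of how it breaks into arguments, assuming no argument itself contains a comma (the `parse_step` helper is hypothetical and only for illustration; the placeholders are kept exactly as in the step above):

```python
# The step string from above, placeholders left untouched.
STEP = (
    "command-runner.jar,spark-submit,--deploy-mode,client,--master,yarn,"
    "--conf,spark.yarn.maxAppAttempts=1,--conf,spark.driver.memory=512M,"
    "--conf,spark.executor.cores=2,--class,<Class here>,<jar path>,"
    "--inputS3Path,#{input.directoryPath},--dynamoDbTableName,#{myDDBTableName},"
    "--throughputWritePercent,0.95"
)


def parse_step(step: str) -> dict:
    """Split a comma-separated step into jar, command, --conf pairs, and flags.

    Hypothetical helper: assumes every "--flag" is followed by exactly one value.
    """
    tokens = step.split(",")
    jar, command, rest = tokens[0], tokens[1], tokens[2:]
    confs, options, positional = [], {}, []
    i = 0
    while i < len(rest):
        token = rest[i]
        if token == "--conf" and i + 1 < len(rest):
            confs.append(rest[i + 1])      # repeated flag, keep every value
            i += 2
        elif token.startswith("--") and i + 1 < len(rest):
            options[token] = rest[i + 1]   # flag followed by its value
            i += 2
        else:
            positional.append(token)       # e.g. the application jar path
            i += 1
    return {"jar": jar, "command": command, "confs": confs,
            "options": options, "positional": positional}


if __name__ == "__main__":
    parsed = parse_step(STEP)
    print(parsed["options"]["--throughputWritePercent"])
```

Everything after `<jar path>` is handed to the application class, so `--throughputWritePercent` is an argument of the job itself rather than a `spark-submit` option.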

I have to set throughputWritePercent; otherwise the job writes at a very low rate, even though the DynamoDB table is configured as OnDemand.
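
As a rough illustration of why the percentage matters: a writer of this kind typically derives its target write rate by multiplying some capacity figure for the table by the write percentage. The sketch below assumes that simple model. The function name and the capacity numbers are hypothetical, and the capacity a connector assumes for an OnDemand table (which has no provisioned value) depends on the connector and its version.

```python
def target_write_rate(assumed_capacity_units: float, write_percent: float) -> float:
    """Return the write units per second a job would aim for.

    assumed_capacity_units: capacity the writer believes the table has
        (hypothetical input; an OnDemand table has no provisioned value).
    write_percent: fraction of that capacity the job may consume, e.g. 0.95.
    """
    if assumed_capacity_units <= 0 or write_percent <= 0:
        raise ValueError("capacity and percentage must be positive")
    return assumed_capacity_units * write_percent


if __name__ == "__main__":
    # Hypothetical capacity figures, chosen only to show the scaling.
    for capacity in (100, 1000, 10000):
        rate = target_write_rate(capacity, 0.95)
        print(f"capacity={capacity} -> target {rate:.1f} write units/s")
```

Under this model, a large assumed capacity combined with 0.95 produces a high target rate, which may be enough to run into a limit that sits above the table, such as the account-level one named in the error below.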

errorMessage:

Caused by: com.amazonaws.services.dynamodbv2.model.RequestLimitExceededException: Throughput exceeds the current throughput limit for your account. Please contact AWS Support at https://aws.amazon.com/support request a limit increase (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: RequestLimitExceeded; Request ID: TIV1G4DAE1OQ7M5SGB57GKRPVBVV4KQNSO5AEMVJF66Q9ASUAAJG) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.
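
The parenthesized suffix of the exception message carries the fields that are useful when checking limits or contacting support. A small sketch for pulling them out of a log line (the `parse_aws_error` helper is hypothetical; the sample line is the message above):

```python
import re

ERROR_LINE = (
    "Caused by: com.amazonaws.services.dynamodbv2.model.RequestLimitExceededException: "
    "Throughput exceeds the current throughput limit for your account. "
    "Please contact AWS Support at https://aws.amazon.com/support request a limit increase "
    "(Service: AmazonDynamoDBv2; Status Code: 400; Error Code: RequestLimitExceeded; "
    "Request ID: TIV1G4DAE1OQ7M5SGB57GKRPVBVV4KQNSO5AEMVJF66Q9ASUAAJG)"
)


def parse_aws_error(line: str) -> dict:
    """Extract the 'Key: value' pairs from the trailing '(...)' of an SDK error line."""
    match = re.search(r"\(([^()]*)\)\s*$", line)
    if match is None:
        return {}
    fields = {}
    for part in match.group(1).split(";"):
        key, _, value = part.partition(":")   # split on the first colon only
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    fields = parse_aws_error(ERROR_LINE)
    print(fields["Error Code"], fields["Status Code"], fields["Request ID"])
```

The error code here is `RequestLimitExceeded`, and the message text itself refers to a throughput limit for the account rather than for a single table.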

ErrorStackTrace

amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform. 
  at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67) 
  at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16) 
  at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136) 
  at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105) 
  at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
  at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
  at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53) 
  at java.lang.Thread.run(Thread.java:748) 
Caused by: amazonaws.datapipeline.taskrunner.TaskExecutionException: 
Caused by: com.amazonaws.services.dynamodbv2.model.RequestLimitExceededException: Throughput exceeds the current throughput limit for your account. Please contact AWS Support at https://aws.amazon.com/support request a limit increase (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: RequestLimitExceeded; Request ID: TIV1G4DAE1OQ7M5SGB57GKRPVBVV4KQNSO5AEMVJF66Q9ASUAAJG) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686) 
  at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668) 
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532) 
  at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512) 
  at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:4230) 
  at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:4197) 
  at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeBatchWriteItem(AmazonDynamoDBClient.java:693) 
  at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:660) 
  at org.apache.hadoop.dynamodb.DynamoDBClient$4.call(DynamoDBClient.java:244) 
  at org.apache.hadoop.dynamodb.DynamoDBClient$4.call(DynamoDBClient.java:238) 
  at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:80)
  ... 18 more
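
The trace passes through `DynamoDBFibonacciRetryer.runWithRetry`, whose name suggests retries spaced by Fibonacci-growing delays. The following is a generic sketch of that idea, not the connector's actual implementation; `run_with_retry`, `fibonacci_delays`, `base_delay`, `max_attempts`, and the injectable `sleep` are all hypothetical names.

```python
import time
from typing import Callable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


def fibonacci_delays(base_delay: float, max_attempts: int) -> Iterator[float]:
    """Yield the waits between attempts: base_delay times 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    for _ in range(max_attempts - 1):   # no wait after the final attempt
        yield a * base_delay
        a, b = b, a + b


def run_with_retry(
    call: Callable[[], T],
    base_delay: float = 0.1,
    max_attempts: int = 5,
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on retryable errors with Fibonacci-spaced waits."""
    delays = fibonacci_delays(base_delay, max_attempts)
    while True:
        try:
            return call()
        except retryable:
            delay = next(delays, None)
            if delay is None:           # attempts exhausted: surface the last error
                raise
            sleep(delay)


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_write() -> str:
        # Stand-in for a throttled write that succeeds on the third try.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise RuntimeError("throttled")
        return "written"

    print(run_with_retry(flaky_write, base_delay=0.01))
```

A retry loop like this only smooths over short bursts of throttling. If the sustained write rate stays above the limit, the retries eventually run out and the error surfaces, as in the trace above.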

19/10/20 23:47:22 WARN TaskSetManager: Lost task 13.3 in stage 0.0 (TID 42, <ip>, executor 7): TaskKilled (Stage cancelled) 
19/10/20 23:47:22 WARN TaskSetManager: Lost task 2.3 in stage 0.0 (TID 46, ip-10-192-20-236.eu-west-1.compute.internal, executor 6): TaskKilled (Stage cancelled) 
19/10/20 23:47:22 WARN TaskSetManager: Lost task 10.3 in stage 0.0 (TID 54, ip-10-192-20-236.eu-west-1.compute.internal, executor 6): TaskKilled (Stage cancelled) 
19/10/20 23:47:22 INFO SparkContext: Invoking stop() from shutdown hook 
19/10/20 23:47:22 INFO SparkUI: Stopped Spark web UI at http://<ip-internaldns>:4040 
19/10/20 23:47:22 INFO YarnClientSchedulerBackend: Interrupting monitor thread
  at amazonaws.datapipeline.cluster.EmrUtil.runSteps(EmrUtil.java:286)
  at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:63)
  ... 7 more
...