I'm using Data Pipeline to write to DynamoDB through an EMR step. The DynamoDB table uses on-demand capacity, but I still get errors.
Below are my EMR activity step and the error messages:
command-runner.jar,spark-submit,--deploy-mode,client,--master,yarn,--conf,spark.yarn.maxAppAttempts=1,--conf,spark.driver.memory=512M,--conf,spark.executor.cores=2,--class,<Class here>,<jar path>,--inputS3Path,#{input.directoryPath},--dynamoDbTableName,#{myDDBTableName},--throughputWritePercent,0.95
I have to set throughputWritePercent; otherwise the job writes at a very low rate, even though the DynamoDB table is configured as on-demand.
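To make the role of throughputWritePercent concrete, here is a rough sketch of the arithmetic as I understand it. It assumes the writer multiplies some capacity figure by the percentage and splits the result evenly across the write tasks; that model, the function names, the 40,000 capacity figure and the task count are all illustrative assumptions, not values taken from my job or from the connector's source.

```python
def target_write_rate(capacity_units: float, write_percent: float) -> float:
    """Total writes per second the job would aim for (assumed model)."""
    if capacity_units <= 0 or write_percent <= 0:
        raise ValueError("capacity_units and write_percent must be positive")
    return capacity_units * write_percent


def per_task_write_rate(total_rate: float, num_tasks: int) -> float:
    """Even share of the total rate for each concurrent write task (assumed model)."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return total_rate / num_tasks


if __name__ == "__main__":
    # Hypothetical inputs: an assumed capacity figure and the 0.95 from my step.
    total = target_write_rate(capacity_units=40_000, write_percent=0.95)
    print(f"job target:  {total:.0f} writes/s")
    print(f"per task:    {per_task_write_rate(total, num_tasks=16):.0f} writes/s")
```

If this model is roughly right, a high percentage applied to a large assumed capacity could push the job toward an account-level limit regardless of the table being on-demand, which would match the RequestLimitExceeded error below.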
errorMessage:
Caused by: com.amazonaws.services.dynamodbv2.model.RequestLimitExceededException: Throughput exceeds the current throughput limit for your account. Please contact AWS Support at https://aws.amazon.com/support request a limit increase (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: RequestLimitExceeded; Request ID: TIV1G4DAE1OQ7M5SGB57GKRPVBVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.
ErrorStackTrace
amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform.
at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67)
at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
at java.lang.Thread.run(Thread.java:748)
Caused by: amazonaws.datapipeline.taskrunner.TaskExecutionException:
Caused by: com.amazonaws.services.dynamodbv2.model.RequestLimitExceededException: Throughput exceeds the current throughput limit for your account. Please contact AWS Support at https://aws.amazon.com/support request a limit increase (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: RequestLimitExceeded; Request ID: TIV1G4DAE1OQ7M5SGB57GKRPVBVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:4230)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:4197)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeBatchWriteItem(AmazonDynamoDBClient.java:693)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:660)
at org.apache.hadoop.dynamodb.DynamoDBClient$4.call(DynamoDBClient.java:244)
at org.apache.hadoop.dynamodb.DynamoDBClient$4.call(DynamoDBClient.java:238)
at org.apache.hadoop.dynamodb.DynamoDBFibonacciRetryer.runWithRetry(DynamoDBFibonacciRetryer.java:80)
... 18 more
19/10/20 23:47:22 WARN TaskSetManager: Lost task 13.3 in stage 0.0 (TID 42, <ip>, executor 7): TaskKilled (Stage cancelled)
19/10/20 23:47:22 WARN TaskSetManager: Lost task 2.3 in stage 0.0 (TID 46, ip-10-192-20-236.eu-west-1.compute.internal, executor 6): TaskKilled (Stage cancelled)
19/10/20 23:47:22 WARN TaskSetManager: Lost task 10.3 in stage 0.0 (TID 54, ip-10-192-20-236.eu-west-1.compute.internal, executor 6): TaskKilled (Stage cancelled)
19/10/20 23:47:22 INFO SparkContext: Invoking stop() from shutdown hook
19/10/20 23:47:22 INFO SparkUI: Stopped Spark web UI at http://<ip-internaldns>:4040
19/10/20 23:47:22 INFO YarnClientSchedulerBackend: Interrupting monitor thread
at amazonaws.datapipeline.cluster.EmrUtil.runSteps(EmrUtil.java:286)
at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:63)
... 7 more