Hive on Spark query hangs with insufficient resources
0 votes
/ 24 February 2019

I am trying to set up Hive on Spark on a single small virtual machine (4 GB of RAM), but I cannot get it to process queries.

For example, SELECT max(price) FROM rentflattoday produces the container log below while the query hangs in what looks like an endless loop:

 2019-02-24 14:41:35 INFO  SignalUtils:54 - Registered signal handler for TERM
2019-02-24 14:41:35 INFO  SignalUtils:54 - Registered signal handler for HUP
2019-02-24 14:41:35 INFO  SignalUtils:54 - Registered signal handler for INT
2019-02-24 14:41:35 INFO  SecurityManager:54 - Changing view acls to: hadoop
2019-02-24 14:41:35 INFO  SecurityManager:54 - Changing modify acls to: hadoop
2019-02-24 14:41:35 INFO  SecurityManager:54 - Changing view acls groups to: 
2019-02-24 14:41:35 INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-02-24 14:41:35 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
2019-02-24 14:41:36 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-02-24 14:41:37 INFO  ApplicationMaster:54 - Preparing Local resources
2019-02-24 14:41:39 INFO  ApplicationMaster:54 - ApplicationAttemptId: appattempt_1551033757513_0011_000001
2019-02-24 14:41:39 INFO  ApplicationMaster:54 - Starting the user application in a separate Thread
2019-02-24 14:41:39 INFO  ApplicationMaster:54 - Waiting for spark context initialization...
2019-02-24 14:41:39 INFO  RemoteDriver:125 - Connecting to: weirv1:42832
2019-02-24 14:41:39 INFO  HiveConf:187 - Found configuration file file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/filecache/28/__spark_conf__.zip/__hadoop_conf__/hive-site.xml
2019-02-24 14:41:40 WARN  HiveConf:5214 - HiveConf of name hive.enforce.bucketing does not exist
2019-02-24 14:41:40 WARN  Rpc:170 - Invalid log level null, reverting to default.
2019-02-24 14:41:41 INFO  SparkContext:54 - Running Spark version 2.4.0
2019-02-24 14:41:41 INFO  SparkContext:54 - Submitted application: Hive on Spark (sessionId = 94aded5e-fbeb-4839-af11-9c5f5902fa0c)
2019-02-24 14:41:41 INFO  SecurityManager:54 - Changing view acls to: hadoop
2019-02-24 14:41:41 INFO  SecurityManager:54 - Changing modify acls to: hadoop
2019-02-24 14:41:41 INFO  SecurityManager:54 - Changing view acls groups to: 
2019-02-24 14:41:41 INFO  SecurityManager:54 - Changing modify acls groups to: 
2019-02-24 14:41:41 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
2019-02-24 14:41:41 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 37368.
2019-02-24 14:41:41 INFO  SparkEnv:54 - Registering MapOutputTracker
2019-02-24 14:41:41 INFO  SparkEnv:54 - Registering BlockManagerMaster
2019-02-24 14:41:41 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-02-24 14:41:41 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-02-24 14:41:41 INFO  DiskBlockManager:54 - Created local directory at /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1551033757513_0011/blockmgr-ea75eeb2-fb84-4d22-8f29-ba4283eb5efc
2019-02-24 14:41:42 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2019-02-24 14:41:42 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2019-02-24 14:41:42 INFO  log:192 - Logging initialized @9697ms
2019-02-24 14:41:43 INFO  JettyUtils:54 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
2019-02-24 14:41:43 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2019-02-24 14:41:43 INFO  Server:419 - Started @10064ms
2019-02-24 14:41:43 INFO  AbstractConnector:278 - Started ServerConnector@5d1faff9{HTTP/1.1,[http/1.1]}{0.0.0.0:33181}
2019-02-24 14:41:43 INFO  Utils:54 - Successfully started service 'SparkUI' on port 33181.
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3e4dde9a{/jobs,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5b4b2d8b{/jobs/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@36f37180{/jobs/job,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@edf8590{/jobs/job/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c7ad6b5{/stages,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2128c9cb{/stages/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4ceefc2f{/stages/stage,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3fb4ee4{/stages/stage/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@38cfc530{/stages/pool,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7eff0f35{/stages/pool/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4f9d6ef6{/storage,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@16c8958f{/storage/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50683423{/storage/rdd,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@56e81fbc{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@72262149{/environment,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2010a66f{/environment/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31c84762{/executors,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@27cbab18{/executors/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@64a4eac1{/executors/threadDump,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@41221be4{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@32a2a7f5{/static,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@32d23207{/,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3808225f{/api,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@35b9f8ea{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c552738{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-02-24 14:41:43 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://weirV1:33181
2019-02-24 14:41:43 INFO  YarnClusterScheduler:54 - Created YarnClusterScheduler
2019-02-24 14:41:43 INFO  SchedulerExtensionServices:54 - Starting Yarn extension services with app application_1551033757513_0011 and attemptId Some(appattempt_1551033757513_0011_000001)
2019-02-24 14:41:43 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35541.
2019-02-24 14:41:43 INFO  NettyBlockTransferService:54 - Server created on weirV1:35541
2019-02-24 14:41:43 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-02-24 14:41:43 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:43 INFO  BlockManagerMasterEndpoint:54 - Registering block manager weirV1:35541 with 366.3 MB RAM, BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:43 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:43 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, weirV1, 35541, None)
2019-02-24 14:41:44 INFO  JettyUtils:54 - Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
2019-02-24 14:41:44 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5e35b086{/metrics/json,null,AVAILABLE,@Spark}
2019-02-24 14:41:44 INFO  EventLoggingListener:54 - Logging events to hdfs:/spark-event-log/application_1551033757513_0011_1
2019-02-24 14:41:45 INFO  RMProxy:98 - Connecting to ResourceManager at weirv1/80.211.222.23:8030
2019-02-24 14:41:45 INFO  YarnRMClient:54 - Registering the ApplicationMaster
2019-02-24 14:41:45 INFO  ApplicationMaster:54 - 
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
    SPARK_YARN_STAGING_DIR -> hdfs://localhost:9000/user/hadoop/.sparkStaging/application_1551033757513_0011
    SPARK_USER -> hadoop

  command:
    {{JAVA_HOME}}/bin/java \ 
      -server \ 
      -Xmx1024m \ 
      '-Dhive.spark.log.dir=/home/hadoop/spark/logs/' \ 
      -Djava.io.tmpdir={{PWD}}/tmp \ 
      '-Dspark.hadoop.hbase.regionserver.info.port=16030' \ 
      '-Dspark.hadoop.hbase.master.info.port=16010' \ 
      '-Dspark.ui.port=0' \ 
      '-Dspark.hadoop.hbase.rest.port=8080' \ 
      '-Dspark.hadoop.hbase.master.port=16000' \ 
      '-Dspark.hadoop.hbase.regionserver.port=16020' \ 
      '-Dspark.driver.port=37368' \ 
      '-Dspark.hadoop.hbase.status.multicast.address.port=16100' \ 
      -Dspark.yarn.app.container.log.dir=<LOG_DIR> \ 
      -XX:OnOutOfMemoryError='kill %p' \ 
      org.apache.spark.executor.CoarseGrainedExecutorBackend \ 
      --driver-url \ 
      spark://CoarseGrainedScheduler@weirV1:37368 \ 
      --executor-id \ 
      <executorId> \ 
      --hostname \ 
      <hostname> \ 
      --cores \ 
      4 \ 
      --app-id \ 
      application_1551033757513_0011 \ 
      --user-class-path \ 
      file:$PWD/__app__.jar \ 
      1><LOG_DIR>/stdout \ 
      2><LOG_DIR>/stderr

  resources:
    __app__.jar -> resource { scheme: "hdfs" host: "localhost" port: 9000 file: "/user/hadoop/.sparkStaging/application_1551033757513_0011/hive-exec-3.1.1.jar" } size: 40604738 timestamp: 1551037287119 type: FILE visibility: PRIVATE
    __spark_libs__ -> resource { scheme: "hdfs" host: "localhost" port: 9000 file: "/spark-jars-nohive" } size: 0 timestamp: 1550932521588 type: ARCHIVE visibility: PUBLIC
    __spark_conf__ -> resource { scheme: "hdfs" host: "localhost" port: 9000 file: "/user/hadoop/.sparkStaging/application_1551033757513_0011/__spark_conf__.zip" } size: 623550 timestamp: 1551037288226 type: ARCHIVE visibility: PRIVATE

===============================================================================
2019-02-24 14:41:46 INFO  YarnAllocator:54 - Will request 1 executor container(s), each with 4 core(s) and 1194 MB memory (including 170 MB of overhead)
2019-02-24 14:41:46 INFO  YarnSchedulerBackend$YarnSchedulerEndpoint:54 - ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@weirV1:37368)
2019-02-24 14:41:46 INFO  YarnAllocator:54 - Submitted 1 unlocalized container requests.
2019-02-24 14:41:46 INFO  ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2019-02-24 14:42:13 INFO  YarnClusterSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
2019-02-24 14:42:13 INFO  YarnClusterScheduler:54 - YarnClusterScheduler.postStartHook done
2019-02-24 14:42:13 INFO  SparkContext:54 - Added JAR hdfs://localhost:9000/tmp/hive/hadoop/_spark_session_dir/94aded5e-fbeb-4839-af11-9c5f5902fa0c/hive-exec-3.1.1.jar at hdfs://localhost:9000/tmp/hive/hadoop/_spark_session_dir/94aded5e-fbeb-4839-af11-9c5f5902fa0c/hive-exec-3.1.1.jar with timestamp 1551037333719
2019-02-24 14:42:13 INFO  RemoteDriver:306 - Received job request befdba6d-70e5-4a3b-a08e-564376ba3b47
2019-02-24 14:42:14 INFO  SparkClientUtilities:107 - Copying hdfs://localhost:9000/tmp/hive/hadoop/_spark_session_dir/94aded5e-fbeb-4839-af11-9c5f5902fa0c/hive-exec-3.1.1.jar to /tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1551033757513_0011/container_1551033757513_0011_01_000001/tmp/1551037299410-0/hive-exec-3.1.1.jar
2019-02-24 14:42:14 INFO  SparkClientUtilities:71 - Added jar[file:/tmp/hadoop-hadoop/nm-local-dir/usercache/hadoop/appcache/application_1551033757513_0011/container_1551033757513_0011_01_000001/tmp/1551037299410-0/hive-exec-3.1.1.jar] to classpath.
2019-02-24 14:42:16 INFO  deprecation:1173 - mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
2019-02-24 14:42:16 INFO  Utilities:3298 - Processing alias rentflattoday
2019-02-24 14:42:16 INFO  Utilities:3336 - Adding 1 inputs; the first input is hdfs://localhost:9000/user/hive/warehouse/csu.db/rentflattoday
2019-02-24 14:42:16 INFO  SerializationUtilities:569 - Serializing MapWork using kryo
2019-02-24 14:42:17 INFO  Utilities:633 - Serialized plan (via FILE) - name: Map 1 size: 6.57KB
2019-02-24 14:42:18 INFO  MemoryStore:54 - Block broadcast_0 stored as values in memory (estimated size 1216.3 KB, free 365.1 MB)
2019-02-24 14:42:19 INFO  MemoryStore:54 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 85.2 KB, free 365.0 MB)
2019-02-24 14:42:19 INFO  BlockManagerInfo:54 - Added broadcast_0_piece0 in memory on weirV1:35541 (size: 85.2 KB, free: 366.2 MB)
2019-02-24 14:42:19 INFO  SparkContext:54 - Created broadcast 0 from Map 1
2019-02-24 14:42:19 INFO  Utilities:429 - PLAN PATH = hdfs://localhost:9000/tmp/hive/hadoop/75557489-581b-4292-b43b-1c86c6bcdcb2/hive_2019-02-24_14-41-17_480_8986995693652128044-2/-mr-10004/8b6206d1-557f-4345-ace3-9dfe64d6634b/map.xml
2019-02-24 14:42:19 INFO  CombineHiveInputFormat:477 - Total number of paths: 1, launching 1 threads to check non-combinable ones.
2019-02-24 14:42:19 INFO  CombineHiveInputFormat:413 - CombineHiveInputSplit creating pool for hdfs://localhost:9000/user/hive/warehouse/csu.db/rentflattoday; using filter path hdfs://localhost:9000/user/hive/warehouse/csu.db/rentflattoday
2019-02-24 14:42:20 INFO  FileInputFormat:283 - Total input paths to process : 1
2019-02-24 14:42:20 INFO  CombineFileInputFormat:413 - DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
2019-02-24 14:42:20 INFO  CombineHiveInputFormat:467 - number of splits 1
2019-02-24 14:42:20 INFO  CombineHiveInputFormat:587 - Number of all splits 1
2019-02-24 14:42:20 INFO  SerializationUtilities:569 - Serializing ReduceWork using kryo
2019-02-24 14:42:20 INFO  Utilities:633 - Serialized plan (via FILE) - name: Reducer 2 size: 3.84KB
2019-02-24 14:42:20 INFO  SparkPlan:107 - 

Spark RDD Graph:

(1) Reducer 2 (1) MapPartitionsRDD[4] at Reducer 2 []
 |  Reducer 2 (GROUP, 1) MapPartitionsRDD[3] at Reducer 2 []
 |  ShuffledRDD[2] at Reducer 2 []
 +-(1) Map 1 (1) MapPartitionsRDD[1] at Map 1 []
    |  Map 1 (rentflattoday, 1) HadoopRDD[0] at Map 1 []

2019-02-24 14:42:20 INFO  DAGScheduler:54 - Registering RDD 1 (Map 1)
2019-02-24 14:42:20 INFO  DAGScheduler:54 - Got job 0 (Reducer 2) with 1 output partitions
2019-02-24 14:42:20 INFO  DAGScheduler:54 - Final stage: ResultStage 1 (Reducer 2)
2019-02-24 14:42:20 INFO  DAGScheduler:54 - Parents of final stage: List(ShuffleMapStage 0)
2019-02-24 14:42:20 INFO  DAGScheduler:54 - Missing parents: List(ShuffleMapStage 0)
2019-02-24 14:42:20 INFO  DAGScheduler:54 - Submitting ShuffleMapStage 0 (Map 1 (1) MapPartitionsRDD[1] at Map 1), which has no missing parents
2019-02-24 14:42:21 INFO  MemoryStore:54 - Block broadcast_1 stored as values in memory (estimated size 293.7 KB, free 364.7 MB)
2019-02-24 14:42:21 INFO  MemoryStore:54 - Block broadcast_1_piece0 stored as bytes in memory (estimated size 88.1 KB, free 364.7 MB)
2019-02-24 14:42:21 INFO  BlockManagerInfo:54 - Added broadcast_1_piece0 in memory on weirV1:35541 (size: 88.1 KB, free: 366.1 MB)
2019-02-24 14:42:21 INFO  SparkContext:54 - Created broadcast 1 from broadcast at DAGScheduler.scala:1161
2019-02-24 14:42:21 INFO  DAGScheduler:54 - Submitting 1 missing tasks from ShuffleMapStage 0 (Map 1 (1) MapPartitionsRDD[1] at Map 1) (first 15 tasks are for partitions Vector(0))
2019-02-24 14:42:21 INFO  YarnClusterScheduler:54 - Adding task set 0.0 with 1 tasks
2019-02-24 14:42:36 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:42:51 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:06 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:21 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:36 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:43:51 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:06 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:21 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:36 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:44:51 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:45:06 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:45:21 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2019-02-24 14:45:36 WARN  YarnClusterScheduler:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Here are my hive-site.xml and yarn-site.xml (in that order):

<configuration>

    ...

    <property>
        <name>hive.execution.engine</name>
        <value>spark</value>
    </property>

    <property>
        <name>spark.master</name>
        <value>yarn</value>
    </property>

    <property>
        <name>spark.submit.deployMode</name>
        <value>cluster</value>
    </property>

    <property>
        <name>spark.home</name>
        <value>/home/hadoop/spark</value>
    </property>

    <property>
        <name>spark.yarn.archive</name>
        <value>hdfs:///spark-jars-nohive</value>
    </property>

    <property>
        <name>spark.queue.name</name>
        <value>default</value>
    </property>

    <property>
        <name>spark.eventLog.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>spark.eventLog.dir</name>
        <value>hdfs:///spark-event-log</value>
    </property>

    <property>
        <name>spark.serializer</name>
        <value>org.apache.spark.serializer.KryoSerializer</value>
    </property>

    <property>
        <name>spark.executor.cores</name>
        <value>4</value>
    </property>

    <property>
        <name>spark.executor.instances</name>
        <value>1</value>
    </property>

    <property>
        <name>spark.dynamicAllocation.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>spark.executor.memory</name>
        <value>1024m</value>
    </property>

    <property>
        <name>spark.executor.memoryOverhead</name>
        <value>170m</value>
    </property>

</configuration>



<configuration>

    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.acl.enable</name>
        <value>0</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>weirv1</value>
    </property>

    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

    <property>
        <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>3072</value>
    </property>

    <property>
        <description>The minimum allocation size for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum.</description>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>

    <property>
        <description>The maximum allocation size for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.</description>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>3072</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>2048</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx1638m</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Whether virtual memory limits will be enforced for containers.</description>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

    <property>
        <name>yarn.scheduler.fair.user-as-default-queue</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.scheduler.fair.allocation.file</name>
        <value>/home/hadoop/hadoop/etc/hadoop/fair-scheduler.xml</value>
    </property>

</configuration>

Since I am new to this, I assume that either some of these settings are wrong or not taking effect, or the warning in the log simply means that my machine does not have enough memory and I should adjust the memory settings?

Thanks :-)

1 Answer

0 votes
/ 24 February 2019

Since I figured it out, I will post the solution here in case someone else runs into this. It turned out the machine really did not have enough memory: I set yarn.scheduler.minimum-allocation-mb to 512 and spark.executor.memory to 512m, and the query ran.
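For anyone mapping this back to the numbers: as far as I can tell, YARN rounds container requests up, so the 1194 MB executor request from the log (1024m of spark.executor.memory plus 170 MB of overhead) ends up as roughly a 2 GB container, and together with the cluster-mode ApplicationMaster container that no longer fits into the 3072 MB allowed by yarn.nodemanager.resource.memory-mb, which is why the scheduler keeps printing "Initial job has not accepted any resources". Below is a minimal sketch of just the two changed values, assuming yarn.scheduler.minimum-allocation-mb goes into yarn-site.xml and spark.executor.memory stays in hive-site.xml as in the question:

    <!-- yarn-site.xml (sketch): allow YARN to hand out smaller containers -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>

    <!-- hive-site.xml (sketch): shrink the executor so memory plus overhead fits a small container -->
    <property>
        <name>spark.executor.memory</name>
        <value>512m</value>
    </property>

The yarn-site.xml change most likely only takes effect after restarting the ResourceManager and NodeManager, while the hive-site.xml change is picked up by new Hive on Spark sessions.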
