I was trying to run the following in Hive:
set hive.exec.reducers.max = 1;
set mapred.reduce.tasks = 1;
from flat_json
insert overwrite table aggr_pgm_measure PARTITION(dt='${START_TIME}')
reduce log_time,
req_id, ac_id, client_key, rulename, categoryname, bsid, visitorid, visitorgroupid, visitortargetid, targetpopulationid, windowsessionid, eventseq, event_code, eventstarttime
using '${SCRIPT_LOC}/aggregator.pl' as
metric_id, metric_value, aggr_type, rule_name, category_name;
Despite setting hive.exec.reducers.max and mapred.reduce.tasks to 1, I still see 2 map-reduce jobs being generated. Please see below:
hive> set hive.exec.reducers.max = 1;
hive> set mapred.reduce.tasks = 1;
hive>
> from flat_json
> insert overwrite table aggr_pgm_measure PARTITION(dt='${START_TIME}')
> reduce log_time,
> req_id, ac_id, client_key, rulename, categoryname, bsid, visitorid, visitorgroupid, visitortargetid, targetpopulationid, windowsessionid, eventseq, event_code, eventstarttime
> using '${SCRIPT_LOC}/aggregator.pl' as
> metric_id, metric_value, aggr_type, rule_name, category_name;
converting to local s3://dsp-emr-test/anurag/dsp-test/60mins/script/aggregator.pl
Added resource: /mnt/var/lib/hive_07_1/downloaded_resources/aggregator.pl
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201112270825_0009, Tracking URL = http://ip-10-85-66-9.ec2.internal:9100/jobdetails.jsp?jobid=job_201112270825_0009
Kill Command = /home/hadoop/.versions/0.20.205/libexec/../bin/hadoop job -Dmapred.job.tracker=10.85.66.9:9001 -kill job_201112270825_0009
2011-12-27 10:30:03,542 Stage-1 map = 0%, reduce = 0%