I have a Hive (v2.3.4) on Spark (execution engine) setup. My external Hive table is in Parquet format on S3, with over 100 partitions. The following settings are all set to 20, as shown below:
hive.exec.input.listing.max.threads
mapred.dfsclient.parallelism.max
mapreduce.input.fileinputformat.list-status.num-threads
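
For context, here is a minimal sketch of how these properties are applied, assuming they are set per-session via SET in beeline (they could equally be placed in hive-site.xml):

-- applied in the Hive session before running the query;
-- these properties could also be configured in hive-site.xml
set hive.exec.input.listing.max.threads=20;
set mapred.dfsclient.parallelism.max=20;
set mapreduce.input.fileinputformat.list-status.num-threads=20;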
I run a simple query:
select * from s.t where h_code = 'KGD78' and h_no = '265'
I see the following in the HiveServer2 logs (the log goes on for more than 1000 lines, listing all the different partitions). Why is the file listing not happening in parallel? It takes more than 5 minutes just for the listing.
2019-03-29T11:29:26,866 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] compress.CodecPool: Got brand-new decompressor [.snappy]
2019-03-29T11:29:27,283 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:27,797 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,374 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:28,919 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:29,483 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,003 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:30,518 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,001 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:31,549 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,048 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:32,574 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,130 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:33,639 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,189 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:34,743 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,208 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:35,701 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,183 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:36,662 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,154 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
2019-03-29T11:29:37,645 INFO [3fa82455-7853-4c4b-8964-847c00bec708 HiveServer2-Handler-Pool: Thread-53] mapred.FileInputFormat: Total input files to process : 1
I have tried
hive.exec.input.listing.max.threads
mapred.dfsclient.parallelism.max
mapreduce.input.fileinputformat.list-status.num-threads
with the default values, 1, and 50 ... the result is the same each time (see the sketch below).
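
For completeness, a sketch of what those attempts looked like, assuming the values were changed per-session and the same query rerun after each change:

-- attempt with low parallelism
set hive.exec.input.listing.max.threads=1;
set mapred.dfsclient.parallelism.max=1;
set mapreduce.input.fileinputformat.list-status.num-threads=1;
-- attempt with higher parallelism
set hive.exec.input.listing.max.threads=50;
set mapred.dfsclient.parallelism.max=50;
set mapreduce.input.fileinputformat.list-status.num-threads=50;
-- and once more with the defaults (reset clears all session overrides)
reset;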