I am trying to query a table in Hive 2.3.3 on EMR 5.19, and the output comes back as all NULL values:
hive> select * from ip_sandbox_dev.master_schedule limit 5 ;
OK
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL
Time taken: 2.067 seconds, Fetched: 5 row(s)
But when I query the same table from Hive 2.1.1 on EMR 5.4, I get the expected results:
OK
THURSDAY ABQ ABC 3 4 ABQABC3 MIDWEST TRUCK & AUTO PARTS 18 14 Penny Mayfield N
TUESDAY ABQ ABC 0 4 ABQABC0 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ABQ ABC 1 4 ABQABC1 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ABQ ABC 2 4 ABQABC2 RANGER BRAKE PRODUCTS 15 14 Penny Mayfield N
TUESDAY ANC ABC 0 8 ANCABC0 RANGER BRAKE PRODUCTS 27 14 Penny Mayfield N
Time taken: 2.022 seconds, Fetched: 5 row(s)
Output of SHOW CREATE TABLE:
CREATE EXTERNAL TABLE `ip_sandbox_dev.master_schedule`(
`schedule_day` string,
`dc` string,
`mfg` string,
`subline` int,
`weeks` int,
`con` string,
`supplier` string,
`leadtime` int,
`buyer` int,
`buyer_name` string,
`optimize_flag` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
's3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='2',
'numRows'='59329',
'rawDataSize'='38922302',
'totalSize'='658865',
'transient_lastDdlTime'='1569395007')
I am not sure what causes this discrepancy in the results. I tried dropping and recreating the table, but I get the same output.
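For what it's worth, the file names (part-00000-...-c000.orc) suggest the data was written by Spark, and one commonly reported cause of all-NULL reads in this situation is a column-name mismatch: the ORC footer may record columns as _col0 ... _col10 while newer Hive versions resolve columns by name during schema evolution. As a diagnostic sketch (not a confirmed fix for this table), one can inspect the footer schema with hive --orcfiledump on one of the files, and try forcing positional column mapping at the session level:

```sql
-- Assumption: the ORC footer column names do not match the table schema.
-- Force ordinal (positional) column mapping instead of matching by name.
SET orc.force.positional.evolution=true;

SELECT * FROM ip_sandbox_dev.master_schedule LIMIT 5;
```

If the footer schema already matches the table columns, this setting should have no effect and the cause lies elsewhere.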
Below is my hive.log:
2019-10-11T08:25:55,404 ERROR [ORC_GET_SPLITS #0([])]: io.AcidUtils (AcidUtils.java:getAcidState(791)) - Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.s3a.S3AFileSystem
2019-10-11T08:25:55,411 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,487 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(1735)) - FooterCacheHitRatio: 0/2
2019-10-11T08:25:55,672 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.OrcInputFormat (OrcInputFormat.java:getDesiredRowTypeDescr(2463)) - Using schema evolution configuration variables schema.evolution.columns [schedule_day, dc, mfg, subline, weeks, con, supplier, leadtime, buyer, buyer_name, optimize_flag] / schema.evolution.columns.types [string, string, string, int, int, string, string, int, int, string, string] (isAcidRead false)
2019-10-11T08:25:55,673 INFO [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(79)) - Reading ORC rows from s3a://aap-warehouse-default-dev/ip_sandbox.db/master_schedule/part-00000-8c46f760-a26b-4fe6-ba3b-fc4d2d0ef228-c000.orc with {include: [true, true, true, true, true, true, true, true, true, true, true, true], offset: 0, length: 648566, schema: struct<schedule_day:string,dc:string,mfg:string,subline:int,weeks:int,con:string,supplier:string,leadtime:int,buyer:int,buyer_name:string,optimize_flag:string>}
2019-10-11T08:25:55,786 WARN [5b9e417b-8008-4fd2-b3c7-987fec297d63 main([])]: internal.S3AbortableInputStream (S3AbortableInputStream.java:close(178)) - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Can someone please help me resolve this?