How do I insert a text file into an ORC table using RegEx? - PullRequest
10 October 2019

I am trying to insert data into a table stored as ORC. I know I first need to create a table in TEXTFILE format, but when I load the file into the TEXTFILE table it keeps only the first record, even though the file contains thousands of records.

Second problem: when I try to insert the data from the TEXTFILE table into the ORC table, I get the error shown further down.

How can I fix this? Thanks.

The text file contains the following:

172199100408438ARP

Here I have 3 records.

CREATE TABLE table_txt (id string, info string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "(.{3})(.{3}).*")
STORED AS TEXTFILE;

CREATE TABLE table_orc (id string, info string)
STORED AS ORC;

load data inpath '/user/hdfs/textfile.TXT' overwrite into table table_txt;

INSERT OVERWRITE TABLE table_orc SELECT * FROM table_txt;

I need something like this when I run a query:

172     199
100     408
438     ARP

but from the text table I only get:

172     199
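
A possible explanation for the single row, offered as a sketch rather than a confirmed fix: RegexSerDe applies input.regex once per physical line of the file, so a line that packs several 6-character records still produces one row, and the trailing .* discards the rest. Assuming the records really are fixed-width (3 + 3 characters) and that the Hive version provides posexplode (0.13 or later), one way to split them is to load each line into a staging table and slice it. The table raw_txt and the aliases r, n, pos and pad are hypothetical names, not part of the original setup.

```sql
-- Hypothetical staging table: a single string column that receives each physical line as-is
-- (this assumes the lines contain no default field delimiter, which holds for the sample).
CREATE TABLE raw_txt (line string) STORED AS TEXTFILE;

-- Note: LOAD DATA INPATH moves the file, so it must still be at this path.
LOAD DATA INPATH '/user/hdfs/textfile.TXT' OVERWRITE INTO TABLE raw_txt;

-- Emit one output row per 6-character record found in each line.
-- space(k - 1) split on ' ' gives k empty strings; posexplode numbers them 0..k-1,
-- and that position is used as the record index inside the line.
INSERT OVERWRITE TABLE table_orc
SELECT substr(r.line, n.pos * 6 + 1, 3) AS id,    -- first 3 characters of the record
       substr(r.line, n.pos * 6 + 4, 3) AS info   -- last 3 characters of the record
FROM raw_txt r
LATERAL VIEW posexplode(
  split(space(cast(length(r.line) / 6 AS int) - 1), ' ')
) n AS pos, pad;
```

For the sample line 172199100408438ARP this should yield the pairs 172/199, 100/408 and 438/ARP. Trailing characters that do not fill a whole 6-character record are ignored, and lines shorter than 6 characters are not handled by this sketch.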

Leaving that first problem aside, when I try to copy the data from the text table into the ORC table, I get the following:

hive> INSERT OVERWRITE TABLE table_orc SELECT * FROM table_txt;;
Query ID = homosrv_20191010151919_53e1a477-4086-4cb1-b207-60e7d355ba50
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1570556390990_0067,kill Command = /opt/cloudera/parcels/CDH-5.14.4-1.cdh5.14.4.p0.3/lib/hadoop/bin/hadoop job  -kill job_1570556390990_0067
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-10 15:19:26,095 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:20:26,847 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:21:27,574 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:22:28,253 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:23:28,876 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:24:29,521 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:25:30,168 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:26:30,775 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:27:31,406 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:28:32,003 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:29:32,574 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:30:33,125 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:31:33,695 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:32:34,296 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:33:34,833 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:34:35,398 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:35:35,929 Stage-1 map = 0%,  reduce = 0%
2019-10-10 15:36:08,846 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1570556390990_0067 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1570556390990_0067_m_000000 (and more) from job job_1570556390990_0067

Task with the most failures(4):
-----
Task ID:
  task_1570556390990_0067_m_000000

URL:
  http:
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:147)
        ... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.RegexSerDe not found
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:323)
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:333)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:116)
        ... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.RegexSerDe not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2255)
        at org.apache.hadoop.hive.ql.plan.PartitionDesc.getDeserializer(PartitionDesc.java:137)
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:297)
        ... 24 more


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
...
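
The last "Caused by" in the trace says the map task cannot load org.apache.hadoop.hive.contrib.serde2.RegexSerDe, which usually means the hive-contrib jar is not on the task classpath. Two commonly used workarounds are sketched below. The jar path is a placeholder (an assumption, since the real location depends on the installation), and switching to the built-in org.apache.hadoop.hive.serde2.RegexSerDe assumes a Hive version that ships it (0.10 or later).

```sql
-- Option A: ship the contrib jar with the jobs of this session.
-- Placeholder path (assumption): replace it with the hive-contrib jar of your installation.
ADD JAR /path/to/hive-contrib.jar;

-- Option B: point the table at the RegexSerDe that is built into Hive itself,
-- keeping the same pattern, so that no extra jar is needed.
ALTER TABLE table_txt
  SET SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
  WITH SERDEPROPERTIES ("input.regex" = "(.{3})(.{3}).*");

-- Then retry the copy into the ORC table.
INSERT OVERWRITE TABLE table_orc SELECT * FROM table_txt;
```

Either option on its own should be enough to get past the ClassNotFoundException. Neither changes the fact that the pattern produces one row per line of the file.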