How to map columns between an HDFS file (saved via Cygnus) and a Hive table?
2 votes
/ 02 May 2019

I want to save some data to Hive via fiware-cygnus.

I set up the conf file:

cygnus-ngsi.sources = http-source
cygnus-ngsi.sinks = hdfs-sink
cygnus-ngsi.channels = hdfs-channel

cygnus-ngsi.sources.http-source.channels = hdfs-channel
cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
cygnus-ngsi.sources.http-source.port = 5050
cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
cygnus-ngsi.sources.http-source.handler.notification_target = /notify
cygnus-ngsi.sources.http-source.handler.default_service = def_serv
cygnus-ngsi.sources.http-source.handler.default_service_path = /def_servpath
cygnus-ngsi.sources.http-source.handler.events_ttl = 2
cygnus-ngsi.sources.http-source.interceptors = ts gi
cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /Applications/apache-flume-1.4.0-bin/conf/grouping_rules.conf

cygnus-ngsi.channels.hdfs-channel.type = memory
cygnus-ngsi.channels.hdfs-channel.capacity = 1000
cygnus-ngsi.channels.hdfs-channel.transactionCapacity = 100

cygnus-ngsi.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.NGSIHDFSSink
cygnus-ngsi.sinks.hdfs-sink.channel = hdfs-channel
cygnus-ngsi.sinks.hdfs-sink.enable_encoding = true
cygnus-ngsi.sinks.hdfs-sink.enable_grouping = false
cygnus-ngsi.sinks.hdfs-sink.enable_lowercase = false
cygnus-ngsi.sinks.hdfs-sink.enable_name_mappings = false
cygnus-ngsi.sinks.hdfs-sink.data_model = dm-by-entity
cygnus-ngsi.sinks.hdfs-sink.file_format = json-row
cygnus-ngsi.sinks.hdfs-sink.backend.impl = rest
cygnus-ngsi.sinks.hdfs-sink.backend.max_conns = 500
cygnus-ngsi.sinks.hdfs-sink.backend.max_conns_per_route = 100
cygnus-ngsi.sinks.hdfs-sink.hdfs_host = [hdfs_host]
cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
cygnus-ngsi.sinks.hdfs-sink.hdfs_username = [username]
cygnus-ngsi.sinks.hdfs-sink.hdfs_password = [password]
cygnus-ngsi.sinks.hdfs-sink.oauth2_token = [token]
cygnus-ngsi.sinks.hdfs-sink.service_as_namespace = false
cygnus-ngsi.sinks.hdfs-sink.batch_size = 100
cygnus-ngsi.sinks.hdfs-sink.batch_timeout = 30
cygnus-ngsi.sinks.hdfs-sink.batch_ttl = 10
cygnus-ngsi.sinks.hdfs-sink.batch_retry_intervals = 5000
cygnus-ngsi.sinks.hdfs-sink.hive = true
cygnus-ngsi.sinks.hdfs-sink.hive.server_version = 2
cygnus-ngsi.sinks.hdfs-sink.hive.host = [hive_host]
cygnus-ngsi.sinks.hdfs-sink.hive.port = 10000
cygnus-ngsi.sinks.hdfs-sink.hive.db_type = default-db
cygnus-ngsi.sinks.hdfs-sink.krb5_auth = false
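For reference, the `http-source` above listens on port 5050 for notifications from Orion at `/notify`. A minimal sketch of such a notification (the payload shape is my assumption based on the NGSI v1 `notifyContextRequest` format, and the subscription id is hypothetical):

```python
import json

# Sketch of an NGSI v1 notification such as Orion would POST to the
# Cygnus http-source configured above (http://cygnus:5050/notify).
notification = {
    "subscriptionId": "51c0ac9ed714fb3b37d7d5a8",   # hypothetical id
    "originator": "localhost",
    "contextResponses": [{
        "contextElement": {
            "type": "Light",
            "isPattern": "false",
            "id": "bedroom3_light",
            "attributes": [
                {"name": "lux", "type": "Float", "value": "7"},
                {"name": "status", "type": "string", "value": "off"},
            ],
        },
        "statusCode": {"code": "200", "reasonPhrase": "OK"},
    }],
}

headers = {
    "Content-Type": "application/json",
    "Fiware-Service": "Light",           # assumption: ends up in the HDFS path
    "Fiware-ServicePath": "/LightInfo",  # matches the fiwareServicePath below
}

print(json.dumps(notification, indent=2))
```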

and found that an HDFS file was created:

{"recvx0054imex0054s":"1556776186","recvx0054ime":"2019-05-02T05:49:46.771Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"lux","attrx0054ype":"Float","attrx0056alue":"7","attrx004dd":[]}
{"recvx0054imex0054s":"1556776186","recvx0054ime":"2019-05-02T05:49:46.771Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"status","attrx0054ype":"string","attrx0056alue":"off","attrx004dd":[]}
{"recvx0054imex0054s":"1556776989","recvx0054ime":"2019-05-02T06:03:09.962Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"lux","attrx0054ype":"Float","attrx0056alue":"7","attrx004dd":[]}
{"recvx0054imex0054s":"1556776989","recvx0054ime":"2019-05-02T06:03:09.962Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"status","attrx0054ype":"string","attrx0056alue":"off","attrx004dd":[]}
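The attribute names in these JSON lines look Cygnus-encoded: with `enable_encoding = true`, reserved characters seem to be replaced by `x` plus the four-digit hex Unicode code point (`x0054` = `T`, `x002f` = `/`). This little decoder is my own helper for checking that, not part of Cygnus:

```python
import re

def decode_cygnus_name(name: str) -> str:
    # Reverse of the apparent Cygnus enable_encoding scheme:
    # 'x' followed by 4 hex digits encodes one Unicode character.
    return re.sub(r'x([0-9a-f]{4})', lambda m: chr(int(m.group(1), 16)), name)

print(decode_cygnus_name('recvx0054imex0054s'))         # recvTimeTs
print(decode_cygnus_name('fiwarex0053ervicex0050ath'))  # fiwareServicePath
print(decode_cygnus_name('entityx0049d'))               # entityId
```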

However, in Hive all the data shows up as NULL values.

hive> select * from ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row;
chmod: changing permissions of 'hdfs://hadoop01:9000/tmp/hive/hive/ae8cd691-6e96-428d-8ece-02f6e6fbf527/hive_2019-05-02_16-37-44_821_8122085527913292696-1/-mr-10001': Permission denied: user=hive, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx-w----
OK
ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.recvx0054imex0054s ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.recvx0054ime       ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.fiwarex0053ervicex0050ath      ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.entityx0049d       ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.entityx0054ype     ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx004eame   ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx0054ype       ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx0056alue ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx004dd
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
Time taken: 0.26 seconds, Fetched: 4 row(s)

The external-table creation query reported in the Cygnus debug console:

create external table 
    if not exists default.ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row

    (recvx0054imex0054s bigint,
    recvx0054ime string,
    fiwarex0053ervicex0050ath string,
    entityx0049d string,
    entityx0054ype string,
    attrx004eame string,
    attrx0054ype string,
    attrx0056alue string,
    attrx004dd array<struct<name:string,type:string,value:string>>) 

    row format serde 'org.openx.data.jsonserde.JsonSerDe' with serdeproperties ("dots.in.keys" = "true") 

    location '/user/ngsi/Light/LightInfo/bedroom_light_Light'

I assume the problem is the mismatch between the Hive table column names and the attribute names in the HDFS file: the Hive table columns use the encoded names, while the HDFS file attributes use the non-encoded names (for example, recvTimeTs vs. recvx0054imex0054s).

So I created my own external table and tested it:

create external table if not exists default.test01 (
    recvTimeTs bigint,
    recvTime string,
    fiwareServicePath string,
    entityId string,
    entityType string,
    attrName string,
    attrType string,
    attrValue string,
    attrMd array<struct<name:string,type:string,value:string>>)
row format serde 'org.openx.data.jsonserde.JsonSerDe' with serdeproperties ("dots.in.keys" = "true") 
location '/user/ngsi/Light/LightInfo/bedroom_light_Light'

This result shows the correct values:

hive> select * from test01;
chmod: changing permissions of 'hdfs://hadoop01:9000/tmp/hive/hive/bec8d67d-7d94-4cdc-a577-14d990ca342c/hive_2019-05-02_16-42-59_290_896718721577759467-1/-mr-10001': Permission denied: user=hive, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx-w----
OK
1556776186      2019-05-02T05:49:46.771Z        /LightInfo      bedroom3_light  Light   lux     Float   7       []
1556776186      2019-05-02T05:49:46.771Z        /LightInfo      bedroom3_light  Light   status  string  off     []
1556776989      2019-05-02T06:03:09.962Z        /LightInfo      bedroom3_light  Light   lux     Float   7       []
1556776989      2019-05-02T06:03:09.962Z        /LightInfo      bedroom3_light  Light   status  string  off     []
Time taken: 2.329 seconds, Fetched: 4 row(s)
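To automate what I did by hand for test01, the readable DDL can be generated from the encoded column list that Cygnus printed. A minimal sketch (the decoder and the hard-coded column list are my own assumptions about the Cygnus encoding scheme, not an official Cygnus or Hive API):

```python
import re

def decode_cygnus_name(name: str) -> str:
    # Assumption: 'x' + 4 hex digits encodes one Unicode character.
    return re.sub(r'x([0-9a-f]{4})', lambda m: chr(int(m.group(1), 16)), name)

# Encoded column names and types copied from the DDL that Cygnus printed.
encoded_columns = [
    ('recvx0054imex0054s', 'bigint'),
    ('recvx0054ime', 'string'),
    ('fiwarex0053ervicex0050ath', 'string'),
    ('entityx0049d', 'string'),
    ('entityx0054ype', 'string'),
    ('attrx004eame', 'string'),
    ('attrx0054ype', 'string'),
    ('attrx0056alue', 'string'),
    ('attrx004dd', 'array<struct<name:string,type:string,value:string>>'),
]

cols = ',\n    '.join(f'{decode_cygnus_name(c)} {t}' for c, t in encoded_columns)
ddl = (
    "create external table if not exists default.test01 (\n"
    f"    {cols})\n"
    "row format serde 'org.openx.data.jsonserde.JsonSerDe'\n"
    "location '/user/ngsi/Light/LightInfo/bedroom_light_Light'"
)
print(ddl)
```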

How can I map the Hive table column names to the HDFS file attribute names?

Or is there a different way to use Cygnus with Hive?

...