JSON разбор с использованием Groovy - PullRequest
0 голосов
/ 08 февраля 2020

Я пытаюсь проанализировать JSON записей, используя скрипт Groovy, который вызывается через NiFi. Основываясь на вводе, который я получил от Nested JSON с дублирующимися ключами , он работал нормально. Но мой сценарий groovy завершился неудачно для нескольких записей, если в столбце значений есть двойные кавычки (").

E.g.,
"CT":"text/javascript; charset="UTF-8""

Чтобы устранить вышеуказанную проблему, я заменил все двойные кавычки (") из Записать и сделать запись похожей на

CT:text/javascript; charset=UTF-8

и использовать JsonParserType.LAX при анализе JSON. Тогда проблема была решена. Но я столкнулся с другой проблемой, когда входящее значение JSON colunn имело запятую (,) между

UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75,app_store:googleplay,app_uuid:c6bed4

Для указанного выше значения

After this piece of code got executed,
new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)

Output looks like,
UA:Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML,

и оставшихся значений "UA столбец идет отдельно, как показано ниже, без ключа,

537.36 data={aaid:9541535b-ab50-41c7-9ae4-4d729b501d75

, и эта часть полностью пропущена в проанализированном выводе

 like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/

Ниже приведен мой полный код с примером json запись,

import groovy.json.*
import java.util.function.*
/* Example record for extra double quotes in the column value */
// def json = '{"transaction":[{"TS":"2020-02-04 07:02:55.074013","V":"v8","PID":"p31177","RS":"359","SR":"434","CnID":"0/1","CIPG":{"CIP":"10.176.63.122","CP":"35073","SLP":"35073","SLEP":"35073","CVID":"410"},"SIPG":{"SIP":"54.93.254.233","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"73","R":"73","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"73"},"AS":"365","OS":"365","CPr":"0","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"-","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"S","Flag23":"c","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"s","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"j","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"365","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"31/31/31 21 1/1 0.0/0.0","CT":"text/javascript; charset="UTF-8"","MS":"447715522518","MSH":"1C5SujOJcZs4SCCoUSQSynOXaCyND1q4TQ30kcryo/Y=","SID":"v4.172.16.146.148:9EE6EC7A17108A46","SuID":"-","UA":"okhttp/3.9.1","DID":"Default-Android-Web-Unknown","UAG":"Android-Web","CID":"TECH","HR":"-","CRG":"3001010006000000","CP1":"8232F40186F032F40107B06F6E","CP2":"3599650837621501","AIDF":"1:1386","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"GET","HD":"pubsub.pubnub.com","HP":"/v2/presence/sub-key/sub-c-ee729d78-6233-11e2-b80b-12313f022c90/channel/uk9_18395/heartbeat","HQ":"requestid=5bd3d724-84f7-4f91-865c-a141136e537f&state=%257B%257D&heartbeat=300&pnsdk=PubNub-Java-Unified/4.19.0&uuid=600752db-00ad-4478-9688-b8ee2b13414f"},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'
/* Example record for comma in the column value */
def json = '{"transaction":[{"TS":"2020-02-04 07:04:50.478718","V":"v8","PID":"p12318","RS":"4087","SR":"4162","CnID":"0/89","CIPG":{"CIP":"10.146.84.250","CP":"52106","SLP":"18518","SLEP":"18518","CVID":"410"},"SIPG":{"SIP":"104.18.14.201","SP":"80","InP":"80","SVID":"420"},"TG":{"T":"36","R":"36","C":"0","SDL":"0","DL":"0","I:R":"2:0","UAP":"0","EDBL":"0","Ca":"0","A":"0","RQM":"0","RSM":"0","FIT":"0","CSR":"36"},"AS":"463","OS":"480","CPr":"-3","CVB":"0","CS":"MISS","HS":"200","OF":{"Flag1":"c","Flag2":"-","Flag3":"-","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"-","Flag11":"-","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"-","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"-","Flag23":"-"},"SF":{"Flag1":"A","Flag2":"M","Flag3":"H","Flag4":"-","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"F","Flag9":"-","Flag10":"-","Flag11":"C","Flag12":"-","Flag13":"-","Flag14":"-","Flag15":"-","Flag16":"-","Flag17":"F","Flag18":"-","Flag19":"-","Flag20":"-","Flag21":"-","Flag22":"G","Flag23":"G","Flag24":"-"},"GF":{"Flag1":"n","Flag2":"-","Flag3":"b","Flag4":"T","Flag5":"-","Flag6":"-","Flag7":"-","Flag8":"-","Flag9":"-","Flag10":"k","Flag11":"-","Flag12":"S","Flag13":"C","Flag14":"-","Flag15":"-","Flag16":"M","Flag17":"T","Flag18":"A","Flag19":"!","Flag20":"-","Flag21":"H","Flag22":"-","Flag23":"S","Flag24":"-","Flag25":"-","Flag26":"5","Flag27":"-","Flag28":"-","Flag29":"-","Flag30":"L","Flag31":"-","Flag32":"-","Flag33":"-","Flag34":"-","Flag35":"-"},"SUNR":"0","SUNS":"480","ML":"-","VSL":{"VSID":"0","TC":"-","MTC":"-","NRTC":"-","ET":"-","HRES":"0","VRES":"0","FS":"0","FR":"0.0","VSD":"0","ACB":"-","ASB":"-","VPR":"-","VSST":"-"},"MT":"-","TCPSL":"35/35/35 18 1/2 0.0/0.0","CT":"application/json;charset=UTF-8","MS":"447707123573","MSH":"G7RPssPDwUd1pKsRNs8KFtSEIZGIyJktwNDWboHmXds=","SID":"v4.172.16.138.148:9EE6E0766C39E5CC","SuID":"-","UA":"Mozilla/5.0 (Linux; Android 10; SM-G975F Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/79.0.3945.136 Mobile Safari/537.36 data={"aaid":"9541535b-ab50-41c7-9ae4-4d729b501d75","app_store":"googleplay","app_uuid":"c6bed4","DID":"Android mobile SM-","UAG":"Android-Web","CID":"bm_unks","HR":"-","CRG":"0401020006000000","CP1":"8232F401C31032F4010018D678","CP2":"3562611081227201","AIDF":"6:5","UCB":"0/0","CLID":"-","CLCL":"0","OPTS":"-","PUAG":"Android-Web","SSLIL":"-","HRU":{"HM":"POST","HD":"ire-d01.fordealdc.com","HP":"/api/dotData","HQ":""},"URLF":{"CID":"-","CGID":"-","CR":"-","RA":"-","USM":"-","USP":"-","MUS":"-"},"TCPSt":{"WS":"0","SE":"0","WSFNS":"0","WSF":"0","EM":"0","RSTE":"0","MSS":"0"},"NS":{"OPID":"-","ODID":"-","EPID":"-","TrID":"-","VSN":"-","LSUT":"-","STTS":"-","TCPPR":"-"},"CQA":{"NL":"-","CL":"-","CLC":"-","SQ":"-","SQC":"-"}}]}'

def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]').replace('"','').replace(':-,',':,').replace(':-}',':}')
def jsonRecord = new JsonSlurper().setType(JsonParserType.LAX).parseText(jsonReplace)
println "After parsing: " + jsonRecord
static def flatten(row, prefix="") {
    def flattened = new HashMap<String, String>()
    println "Input: " + row
    row.each { String k, Object v ->
        def key = prefix ? prefix + "_" + k : k;

        if (v instanceof Map) {
            flattened.putAll(flatten(v, k))
        } else {
            flattened.put(key,v.toString())
        }
    }

    println "Output: " +flattened
    return flattened
}

static def toCSVRow(HashMap row) {
    def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34","GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"] 

    return columns.collect { column ->
        return row.containsKey(column) ? row.get(column) : ""
    }.join(',')
}

def a = jsonRecord.transaction.collect { row ->
    return flatten(row)
}.collect { row ->
    return toCSVRow(row).toString().replace('[','').replace(']','')
}.join('\n')

println a

Мне нужно посоветовать, как с этим справиться. Все, что я хочу знать, - как мой Groovy скрипт может правильно анализировать данные для всех JSON записей.

1 Ответ

0 голосов
/ 09 февраля 2020

Как отмечается в комментариях к вашему вопросу, контент, который вы пытаетесь проанализировать, не JSON. Наилучший подход - начать с нуля и найти общие черты в данных.

Если каждая пара имя / значение находится на отдельной строке (разделенной / r или / n), то строка разбивает содержимое, чтобы получить list.

Поскольку каждая пара использует '=' для их разделения, затем итерируйте этот список и разбейте строку, используя разделитель. Обрежьте каждую сторону, и у вас получится пара ваше имя / значение.

Редактировать: вам нужно будет пройтись по карте, которую вы получили в результате, и проанализировать ее, чтобы увидеть, определены ли массивы или объекты, и помассировать данные для учета этого.

Добро пожаловать на сайт PullRequest, где вы можете задавать вопросы и получать ответы от других членов сообщества.
...