Replacing text with a regular expression in Apache NiFi
0 votes
/ April 23, 2020

We want to convert a set of different XML files to JSON. NiFi does not seem to understand the XML hierarchy, so we ended up stripping the outer XML layers, leaving only the inner segment and its child elements, and then using ConvertRecord to turn that into JSON.

This is an example of the XML we are reading:

'''

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- =============================================================== -->
    <!--                                                                 -->
    <!--  DESCRIPTION:                                                   -->
    <!--                                                                 -->
    <!--   This is XML statistic file generated by the Statistics Export -->
    <!--   Interface feature. This file contains statistics from         -->
    <!--   statistical server for adequate time intervals.               -->
    <!--                                                                 -->
    <!-- =============================================================== -->
    <!DOCTYPE channel_statistics SYSTEM "../DTD/channel.dtd">
    <channel_statistics>
      <header>
        <version>1.6</version>
        <creation_date_time>2020-02-16 01:25 UTC</creation_date_time>
        <zone_id>1</zone_id>
      </header>
      <data status="complete">
        <interval status="complete" start="2020-02-16 00:00 UTC" length="900">
          <channel channel_id="5" site_id="17" site_alias ="OF992BD01" zone_id="1">
            <blocked_duration>0</blocked_duration>
            <control_channel_duration>0</control_channel_duration>
            <data_channel_allocation_count>0</data_channel_allocation_count>
            <data_channel_duration>900</data_channel_duration>
            <emergency_call_duration>0</emergency_call_duration>
            <emergency_calls>0</emergency_calls>
            <group_call_duration>0</group_call_duration>
            <group_calls>0</group_calls>
            <phone_call_duration>0</phone_call_duration>
            <phone_calls>0</phone_calls>
            <private_call_duration>0</private_call_duration>
            <private_calls>0</private_calls>
            <secondary_control_channel_duration>0</secondary_control_channel_duration>
            <interval_length>900</interval_length>
          </channel>
          <channel channel_id="5" site_id="17" site_alias ="OF989BD01" zone_id="1">
            <blocked_duration>0</blocked_duration>
            <control_channel_duration>0</control_channel_duration>
            <data_channel_allocation_count>0</data_channel_allocation_count>
            <data_channel_duration>900</data_channel_duration>
            <emergency_call_duration>0</emergency_call_duration>
            <emergency_calls>0</emergency_calls>
            <group_call_duration>0</group_call_duration>
            <group_calls>0</group_calls>
            <phone_call_duration>0</phone_call_duration>
            <phone_calls>0</phone_calls>
            <private_call_duration>0</private_call_duration>
            <private_calls>0</private_calls>
            <secondary_control_channel_duration>0</secondary_control_channel_duration>
            <interval_length>900</interval_length>
          </channel>
          <channel channel_id="2" site_id="34" site_alias ="GF969BD31" zone_id="1">
            <blocked_duration>0</blocked_duration>
            <control_channel_duration>0</control_channel_duration>
            <data_channel_allocation_count>0</data_channel_allocation_count>
            <data_channel_duration>0</data_channel_duration>
            <emergency_call_duration>0</emergency_call_duration>
            <emergency_calls>0</emergency_calls>
            <group_call_duration>0</group_call_duration>
            <group_calls>0</group_calls>
            <phone_call_duration>0</phone_call_duration>
            <phone_calls>0</phone_calls>
            <private_call_duration>0</private_call_duration>
            <private_calls>0</private_calls>
            <secondary_control_channel_duration>0</secondary_control_channel_duration>
            <interval_length>900</interval_length>
          </channel>
        </interval>
      </data>
    </channel_statistics>
    '''

Once the file is reduced to the interval shown below, ConvertRecord can read it.

    '''
        <interval status="complete" start="2020-02-16 00:00 UTC" length="900">
          <channel channel_id="5" site_id="17" site_alias ="OF082BS01" zone_id="1">
            <blocked_duration>0</blocked_duration>
            <control_channel_duration>0</control_channel_duration>
            <data_channel_allocation_count>0</data_channel_allocation_count>
            <data_channel_duration>900</data_channel_duration>
            <emergency_call_duration>0</emergency_call_duration>
            <emergency_calls>0</emergency_calls>
            <group_call_duration>0</group_call_duration>
            <group_calls>0</group_calls>
            <phone_call_duration>0</phone_call_duration>
            <phone_calls>0</phone_calls>
            <private_call_duration>0</private_call_duration>
            <private_calls>0</private_calls>
            <secondary_control_channel_duration>0</secondary_control_channel_duration>
            <interval_length>900</interval_length>
          </channel>
          <channel channel_id="5" site_id="17" site_alias ="OF082BS01" zone_id="1">
            <blocked_duration>0</blocked_duration>
            <control_channel_duration>0</control_channel_duration>
            <data_channel_allocation_count>0</data_channel_allocation_count>
            <data_channel_duration>900</data_channel_duration>
            <emergency_call_duration>0</emergency_call_duration>
            <emergency_calls>0</emergency_calls>
            <group_call_duration>0</group_call_duration>
            <group_calls>0</group_calls>
            <phone_call_duration>0</phone_call_duration>
            <phone_calls>0</phone_calls>
            <private_call_duration>0</private_call_duration>
            <private_calls>0</private_calls>
            <secondary_control_channel_duration>0</secondary_control_channel_duration>
            <interval_length>900</interval_length>
          </channel>
          <channel channel_id="2" site_id="34" site_alias ="OF041BS01" zone_id="1">
            <blocked_duration>0</blocked_duration>
            <control_channel_duration>0</control_channel_duration>
            <data_channel_allocation_count>0</data_channel_allocation_count>
            <data_channel_duration>0</data_channel_duration>
            <emergency_call_duration>0</emergency_call_duration>
            <emergency_calls>0</emergency_calls>
            <group_call_duration>0</group_call_duration>
            <group_calls>0</group_calls>
            <phone_call_duration>0</phone_call_duration>
            <phone_calls>0</phone_calls>
            <private_call_duration>0</private_call_duration>
            <private_calls>0</private_calls>
            <secondary_control_channel_duration>0</secondary_control_channel_duration>
            <interval_length>900</interval_length>
          </channel>
        </interval>

'''
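For reference, the intended end result can be sketched outside NiFi. The snippet below is only an illustration in Python, not a description of how ConvertRecord works internally: it parses a trimmed stand-in for the sample file (two channels, two counters each, which is an assumption made to keep it short), picks the `<interval>` element, and flattens each `<channel>` into one JSON record. The names `xml_text`, `records` and `as_json` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed stand-in for the sample file; the real file has more counters per channel.
xml_text = (
    '<channel_statistics>\n'
    '  <header><version>1.6</version></header>\n'
    '  <data status="complete">\n'
    '    <interval status="complete" start="2020-02-16 00:00 UTC" length="900">\n'
    '      <channel channel_id="5" site_id="17" site_alias="OF992BD01" zone_id="1">\n'
    '        <blocked_duration>0</blocked_duration>\n'
    '        <data_channel_duration>900</data_channel_duration>\n'
    '      </channel>\n'
    '      <channel channel_id="2" site_id="34" site_alias="GF969BD31" zone_id="1">\n'
    '        <blocked_duration>0</blocked_duration>\n'
    '        <data_channel_duration>0</data_channel_duration>\n'
    '      </channel>\n'
    '    </interval>\n'
    '  </data>\n'
    '</channel_statistics>\n'
)

root = ET.fromstring(xml_text)

# Navigate the hierarchy instead of cutting the text: root -> data -> interval.
interval = root.find("./data/interval")

records = []
for channel in interval.findall("channel"):
    # Start from the attributes (channel_id, site_id, ...), then add child elements.
    record = dict(channel.attrib)
    for child in channel:
        record[child.tag] = child.text  # values stay strings, no type conversion
    records.append(record)

as_json = json.dumps(records, indent=2)
print(as_json)
```

Each record carries both the channel attributes and the counters, which is roughly the shape one would expect the JSON output to have once the interval segment is readable.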

Our setup: ListFile -> FetchFile -> ReplaceText -> ConvertRecord -> ...

The ReplaceText processor is configured as shown in the screenshot, but depending on which regular expression is used, it either passes the file unchanged to success or routes it to the failure queue without an error message.

[Screenshot: ReplaceText configuration]

These are the regular expressions we tried:

<interval(.*)</interval>
/\<interval(.*)interval\>/s
\<interval((.|\n|\r)*)interval\>
(?<=<data status="complete">)(.*?)(?=<\/data>)
(?s)(?<=<data status="complete">)(.*?)(?=<\/data>)

https://regexr.com/5330j
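To see how these five patterns behave, here is a small check in Python. NiFi itself uses Java's regex engine, so this is only an approximation, although `.`, `(?s)` and lookarounds behave the same way in both for these patterns. The `xml` string is a shortened, hypothetical stand-in for the sample file. The second dictionary applies each pattern to one line at a time, as a rough simulation of what would happen if ReplaceText were evaluating the content line by line (an assumption about the configuration, which the post does not show in text).

```python
import re

# Shortened stand-in for the sample file (one channel, one counter).
xml = (
    '<channel_statistics>\n'
    '  <data status="complete">\n'
    '    <interval status="complete" start="2020-02-16 00:00 UTC" length="900">\n'
    '      <channel channel_id="5" site_id="17">\n'
    '        <blocked_duration>0</blocked_duration>\n'
    '      </channel>\n'
    '    </interval>\n'
    '  </data>\n'
    '</channel_statistics>\n'
)

# The five patterns from the question, unchanged.
patterns = {
    "plain": r'<interval(.*)</interval>',
    "slashes": r'/\<interval(.*)interval\>/s',
    "alternation": r'\<interval((.|\n|\r)*)interval\>',
    "lookaround": r'(?<=<data status="complete">)(.*?)(?=<\/data>)',
    "lookaround_dotall": r'(?s)(?<=<data status="complete">)(.*?)(?=<\/data>)',
}

# Does the pattern match anywhere when the whole text is searched at once?
whole_text = {
    name: re.search(pattern, xml) is not None
    for name, pattern in patterns.items()
}

# Does the pattern match when every line is searched on its own?
line_by_line = {
    name: any(re.search(pattern, line) for line in xml.splitlines())
    for name, pattern in patterns.items()
}

for name in patterns:
    print(f"{name:18} whole={whole_text[name]!s:5} per-line={line_by_line[name]}")
```

On the whole text, only the `(.|\n|\r)*` variant and the `(?s)` variant match. Without `(?s)`, `.` stops at line breaks, and in the second pattern the `/` delimiters and the trailing `s` are treated as literal characters rather than as a flag. Applied line by line, none of the patterns can match, because the opening and closing tags never share a line. A pattern that never matches would be consistent with the file arriving unchanged in success. Separately, alternation inside a repeated group such as `(.|\n|\r)*` is known to be costly in Java's regex engine on large inputs, which may or may not be related to the failures described.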

What are we doing wrong?

1 Answer

0 votes
/ April 23, 2020

@Lauritz I was able to get a match with this regular expression:

<interval .*>([\s\S]*?)<\/interval>
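A quick way to check this expression outside NiFi is sketched below in Python (again an approximation of Java's regex engine, which handles `[\s\S]*?` the same way). The `xml` string is a shortened, hypothetical stand-in for the sample file. Note that matching the interval is not the same as keeping only the interval: a regex replace substitutes whatever was matched, so the second pattern, `whole_doc_pattern`, is an assumed variant (not from the answer) that matches the entire document and keeps just the captured block.

```python
import re

# Shortened stand-in for the sample file (one channel, one counter).
xml = (
    '<channel_statistics>\n'
    '  <header><version>1.6</version></header>\n'
    '  <data status="complete">\n'
    '    <interval status="complete" start="2020-02-16 00:00 UTC" length="900">\n'
    '      <channel channel_id="5" site_id="17">\n'
    '        <blocked_duration>0</blocked_duration>\n'
    '      </channel>\n'
    '    </interval>\n'
    '  </data>\n'
    '</channel_statistics>\n'
)

# Pattern from the answer: [\s\S] matches any character, including line breaks,
# so no DOTALL flag is needed for the part between the tags.
answer_pattern = r'<interval .*>([\s\S]*?)<\/interval>'
match = re.search(answer_pattern, xml)

interval_block = match.group(0)  # the full <interval ...>...</interval> element
inner = match.group(1)           # only what sits between the two tags

# Assumed variant: match the whole document and keep only group 1, so that the
# replacement discards the header and the outer elements.
whole_doc_pattern = r'(?s).*?(<interval .*?</interval>).*'
stripped = re.sub(whole_doc_pattern, r'\1', xml)

print(stripped)
```

In ReplaceText terms, this would presumably correspond to a Search Value of `(?s).*?(<interval .*?</interval>).*` with a Replacement Value of `$1`, evaluated against the entire text rather than line by line. Those settings are an assumption worth verifying against the processor's documentation for the NiFi version in use.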
...