We want to convert a set of different XML files to JSON. NiFi does not seem to understand the XML hierarchy, so we ended up stripping the outer XML layers, leaving only the inner segment and its child elements, and then using ConvertRecord to convert that to JSON.
This is an example of the XML we are reading:
'''
<?xml version="1.0" encoding="UTF-8"?>
<!-- =============================================================== -->
<!-- -->
<!-- DESCRIPTION: -->
<!-- -->
<!-- This is XML statistic file generated by the Statistics Export -->
<!-- Interface feature. This file contains statistics from -->
<!-- statistical server for adequate time intervals. -->
<!-- -->
<!-- =============================================================== -->
<!DOCTYPE channel_statistics SYSTEM "../DTD/channel.dtd">
<channel_statistics>
<header>
<version>1.6</version>
<creation_date_time>2020-02-16 01:25 UTC</creation_date_time>
<zone_id>1</zone_id>
</header>
<data status="complete">
<interval status="complete" start="2020-02-16 00:00 UTC" length="900">
<channel channel_id="5" site_id="17" site_alias ="OF992BD01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="5" site_id="17" site_alias ="OF989BD01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="2" site_id="34" site_alias ="GF969BD31" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>0</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
</interval>
</data>
</channel_statistics>
'''
Once the file is cut down to just the interval, as shown below, ConvertRecord can read it.
'''
<interval status="complete" start="2020-02-16 00:00 UTC" length="900">
<channel channel_id="5" site_id="17" site_alias ="OF082BS01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="5" site_id="17" site_alias ="OF082BS01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>900</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
<channel channel_id="2" site_id="34" site_alias ="OF041BS01" zone_id="1">
<blocked_duration>0</blocked_duration>
<control_channel_duration>0</control_channel_duration>
<data_channel_allocation_count>0</data_channel_allocation_count>
<data_channel_duration>0</data_channel_duration>
<emergency_call_duration>0</emergency_call_duration>
<emergency_calls>0</emergency_calls>
<group_call_duration>0</group_call_duration>
<group_calls>0</group_calls>
<phone_call_duration>0</phone_call_duration>
<phone_calls>0</phone_calls>
<private_call_duration>0</private_call_duration>
<private_calls>0</private_calls>
<secondary_control_channel_duration>0</secondary_control_channel_duration>
<interval_length>900</interval_length>
</channel>
</interval>
'''
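To make the target shape concrete, here is a minimal sketch outside NiFi that reads the interval and flattens each channel into one JSON record. It is only an illustration of the intended result, not what ConvertRecord does internally. The trimmed `SAMPLE`, the function name `interval_to_records`, and the extra `interval_start` key are all assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample with the same nesting as the export file.
SAMPLE = """<channel_statistics>
<header><version>1.6</version><zone_id>1</zone_id></header>
<data status="complete">
<interval status="complete" start="2020-02-16 00:00 UTC" length="900">
<channel channel_id="5" site_id="17" site_alias="OF992BD01" zone_id="1">
<blocked_duration>0</blocked_duration>
<data_channel_duration>900</data_channel_duration>
</channel>
<channel channel_id="2" site_id="34" site_alias="GF969BD31" zone_id="1">
<blocked_duration>0</blocked_duration>
<data_channel_duration>0</data_channel_duration>
</channel>
</interval>
</data>
</channel_statistics>"""


def interval_to_records(xml_text):
    """Return one dict per <channel>, merging its attributes and child elements."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() also yields the root itself, so a bare <interval> fragment works too.
    for interval in root.iter("interval"):
        for channel in interval.findall("channel"):
            record = dict(channel.attrib)              # channel_id, site_id, ...
            record["interval_start"] = interval.get("start")
            for child in channel:                      # e.g. <blocked_duration>0</...>
                record[child.tag] = child.text
            records.append(record)
    return records


records = interval_to_records(SAMPLE)
print(json.dumps(records, indent=2))
```

All values come out as strings here; any type conversion would need a schema, which this sketch leaves out.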
Our setup: ListFile -> FetchFile -> ReplaceText -> ConvertRecord -> ...
The ReplaceText processor is configured as shown in the screenshot below, but depending on which regular expression is used, it either passes the file through unchanged to success or routes it to the failure queue without any error message.
[Screenshot: ReplaceText config]
Here are the different regular expressions I tried:
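Since the actual configuration is not visible here, the following is only a guess at one thing worth checking. ReplaceText has an Evaluation Mode property (Line-by-Line or Entire text), and a pattern that spans several lines can only match when the whole content is evaluated at once. The sketch below emulates both modes in Python purely for illustration. The sample input, the pattern, and the two function names are assumptions, and in NiFi (Java regex) the replacement would be written `$1` rather than `\1`.

```python
import re

# Hypothetical, trimmed input with the same line structure as the export file.
SAMPLE = (
    '<data status="complete">\n'
    '<interval status="complete" length="900">\n'
    '<channel channel_id="5"/>\n'
    '</interval>\n'
    '</data>\n'
)

# Keep only the <interval> element; (?s) lets '.' cross line breaks.
PATTERN = re.compile(r'(?s).*?(<interval.*</interval>).*')


def replace_entire_text(text):
    """Apply the pattern once to the whole content."""
    return PATTERN.sub(r'\1', text, count=1)


def replace_line_by_line(text):
    """Apply the pattern to each line separately, then rejoin the lines."""
    return ''.join(
        PATTERN.sub(r'\1', line, count=1)
        for line in text.splitlines(keepends=True)
    )


whole = replace_entire_text(SAMPLE)
per_line = replace_line_by_line(SAMPLE)
print(whole)                 # only the <interval> element remains
print(per_line == SAMPLE)    # True: no single line contains both tags
```

The line-by-line variant returns the input untouched, which would look like a file passing through unchanged to success. If Entire text is selected, the Maximum Buffer Size property may also matter, because the ReplaceText documentation states that a FlowFile larger than that limit is routed to failure.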
<interval(.*)</interval>
/\<interval(.*)interval\>/s
\<interval((.|\n|\r)*)interval\>
(?<=<data status="complete">)(.*?)(?=<\/data>)
(?s)(?<=<data status="complete">)(.*?)(?=<\/data>)
https://regexr.com/5330j
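For reference, here is a small sketch that runs the five expressions listed above against a trimmed multi-line sample. It uses Python's `re` module only as a stand-in, because NiFi uses Java regular expressions. The two engines agree on the points that matter here: `.` does not match line breaks unless `(?s)` is set, and the `/.../s` delimiter syntax is not understood, so the slashes are treated as literal characters. The sample text and the helper name `matches` are assumptions made for the example.

```python
import re

# Hypothetical, trimmed input with the same nesting as the export file.
SAMPLE = (
    '<channel_statistics>\n'
    '<data status="complete">\n'
    '<interval status="complete" start="2020-02-16 00:00 UTC" length="900">\n'
    '<channel channel_id="5" site_id="17">\n'
    '<blocked_duration>0</blocked_duration>\n'
    '</channel>\n'
    '</interval>\n'
    '</data>\n'
    '</channel_statistics>\n'
)

# The five expressions from the list above, unchanged.
PATTERNS = [
    r'<interval(.*)</interval>',
    r'/\<interval(.*)interval\>/s',
    r'\<interval((.|\n|\r)*)interval\>',
    r'(?<=<data status="complete">)(.*?)(?=<\/data>)',
    r'(?s)(?<=<data status="complete">)(.*?)(?=<\/data>)',
]


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


results = [matches(p, SAMPLE) for p in PATTERNS]
for pattern, ok in zip(PATTERNS, results):
    print('match   ' if ok else 'no match', pattern)
```

On this sample only the third and fifth expressions find a match. The first and fourth fail because `.` stops at the end of a line, and the second fails because it requires a literal `/` before `<interval`. Note that the fifth matches only the text between the `<data>` tags, so using it as the search value would replace the interval itself rather than the layers around it.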
What are we doing wrong?