Nested delimiter problem with gawk / sed
0 votes
/ 13 May 2011

I have some text that I need to split:

[{names: {en: 'UK 100', es: 'UK 100'}, status: 'A', displayed: 'Y', start_time: '2011-05-12 00:00:00', start_time_xls: {en: '12th of May 2011  00:00 am', es: '12 May 2011 00:00 am'}, suspend_at: '2011-05-12 15:14:02', is_off: 'Y', score_home: '', score_away: '', bids_status: '', period_id: '', curr_period_start_time: '', score_extra_info: '', settled: 'N', ev_id: 2666872, ev_type_id: 10744, type_name: '|UK 100|'}, {names: {en: 'US 30', es: 'US 30'}, status: 'A', displayed: 'Y', start_time: '2011-05-12 00:00:00', start_time_xls: {en: '12th of May 2011  00:00 am', es: '12 May 2011 00:00 am'}, suspend_at: '2011-05-12 15:13:45', is_off: 'Y', score_home: '', score_away: '', bids_status: '', period_id: '', curr_period_start_time: '', score_extra_info: '', settled: 'N', ev_id: 2666879, ev_type_id: 10745, type_name: '|US 30|'}, {names: {en: 'Germany 30', es: 'Germany 30'}, status: 'A', displayed: 'Y', start_time: '2011-05-12 00:00:00', start_time_xls: {en: '12th of May 2011  00:00 am', es: '12 May 2011 00:00 am'}, suspend_at: '2011-05-12 15:13:52', is_off: 'Y', score_home: '', score_away: '', bids_status: '', period_id: '', curr_period_start_time: '', score_extra_info: '', settled: 'N', ev_id: 2666884, ev_type_id: 10748, type_name: '|Germany 30|'}, {names: {en: 'France 40', es: 'France 40'}, status: 'A', displayed: 'Y', start_time: '2011-05-12 00:00:00', start_time_xls: {en: '12th of May 2011  00:00 am', es: '12 May 2011 00:00 am'}, suspend_at: '2011-05-12 15:13:38', is_off: 'Y', score_home: '', score_away: '', bids_status: '', period_id: '', curr_period_start_time: '', score_extra_info: '', settled: 'N', ev_id: 2666882, ev_type_id: 10747, type_name: '|France 40|'}, {names: {en: 'US 500', es: 'US 500'}, status: 'A', displayed: 'Y', start_time: '2011-05-12 00:00:00', start_time_xls: {en: '12th of May 2011  00:00 am', es: '12 May 2011 00:00 am'}, suspend_at: '2011-05-12 15:14:30', is_off: 'Y', score_home: '', score_away: '', bids_status: '', period_id: '', 
curr_period_start_time: '', score_extra_info: '', settled: 'N', ev_id: 2666890, ev_type_id: 10749, type_name: '|US 500|'}, {names: {en: 'Spain 35', es: 'Spain 35'}, status: 'A', displayed: 'Y', start_time: '2011-05-12 00:00:00', start_time_xls: {en: '12th of May 2011  00:00 am', es: '12 May 2011 00:00 am'}, suspend_at: '2011-05-12 15:13:51', is_off: 'Y', score_home: '', score_away: '', bids_status: '', period_id: '', curr_period_start_time: '', score_extra_info: '', settled: 'N', ev_id: 2666886, ev_type_id: 10750, type_name: '|Spain 35|'}],

I have tried variations of the command below, but I keep getting tripped up by the "inner" delimiters, which I do NOT want to split on:

gawk -F "[" -v RS="," "NF {print $0}" text.txt

How can I split it (1) first on the top-level "{", ignoring the inner "{", and (2) then on commas, ignoring the commas inside curly braces? After that I want to print only a few of the fields, for example:

suspend_at: '2011-05-12 15:14:02', ev_id: 2666872, ev_type_id: 10744, type_name: '|UK 100|'

Thanks in advance.

1 Answer

1 vote
/ 18 May 2011

As already mentioned, if Perl is acceptable:

% perl -MText::ParseWords -nle'
   /suspend|ev_(id|type)|type_name/ and print for parse_line("[{},]",0, $_);
  ' infile
 suspend_at: 2011-05-12 15:14:02
 ev_id: 2666872
 ev_type_id: 10744
 type_name: |UK 100|
 suspend_at: 2011-05-12 15:13:45
 ev_id: 2666879
 ev_type_id: 10745
 type_name: |US 30|
 suspend_at: 2011-05-12 15:13:52
 ev_id: 2666884
 ev_type_id: 10748
 type_name: |Germany 30|
 suspend_at: 2011-05-12 15:13:38
 ev_id: 2666882
 ev_type_id: 10747
 type_name: |France 40|
 suspend_at: 2011-05-12 15:14:30
 ev_id: 2666890
 ev_type_id: 10749
 type_name: |US 500|
 suspend_at: 2011-05-12 15:13:51
 ev_id: 2666886
 ev_type_id: 10750
 type_name: |Spain 35|
...
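
If the job has to stay in awk, the two-step split described in the question (top-level braces first, then commas that are not inside inner braces) can be approximated by walking the text one character at a time and tracking brace depth and quote state. The following is only a rough sketch under some assumptions: the function name split_fields is made up for illustration, values are quoted with single quotes only and contain no escaped quotes, and the list of wanted keys is hard-coded. It uses plain POSIX awk features, so it should also run under gawk.

```shell
# split_fields FILE: print the selected top-level fields, one record per line.
# (split_fields is a hypothetical helper name, not part of any tool.)
split_fields() {
  awk '
  # Trim the accumulated field and keep it if its key is one we want.
  function flush_field(   key) {
    gsub(/^[ \t]+|[ \t]+$/, "", field)
    key = field
    sub(/:.*/, "", key)                 # key = text before the first colon
    if (key ~ /^(suspend_at|ev_id|ev_type_id|type_name)$/)
      out = out (out == "" ? "" : ", ") field
    field = ""
  }
  { text = text $0 " " }                # rejoin records wrapped across lines
  END {
    depth = 0; inq = 0
    n = length(text)
    for (i = 1; i <= n; i++) {
      c = substr(text, i, 1)
      if (c == "\047") inq = !inq       # \047 is a single quote: toggle quoted state
      if (!inq && c == "{") {
        depth++
        if (depth == 1) continue        # opening brace of a top-level record
      } else if (!inq && c == "}") {
        depth--
        if (depth == 0) {               # closing brace of a top-level record
          flush_field(); print out; out = ""
          continue
        }
      } else if (!inq && c == "," && depth == 1) {
        flush_field()                   # comma between top-level fields
        continue
      }
      if (depth >= 1) field = field c   # inner braces and commas are kept verbatim
    }
  }' "$1"
}
```

Called as split_fields text.txt, this prints one line per record in the form requested in the question, with the quotes around the values preserved. Commas and braces at depth 2 or deeper, or inside quotes, are copied into the current field instead of being treated as separators, which is what keeps names and start_time_xls in one piece.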