Regex in Python to adjust a text file - PullRequest
1 vote
/ May 3, 2020

Below is my text:

['A1_(group)', 'album: "Here We Come" (1999)', 'Forever In Love', "\n\r\nLove leads to laughter\r\nLove leads to pain\r\nWith you by my side\r\nI feel good times again\n\r\nNever have I felt these feelings before\r\nYou showed me the world\r\nHow could I ask for more\n\r\nAnd although there's confusion\r\nWe'll find a solution to keep my heart close to you\n\r\nAnd I know, yes I know\r\nIf you hold me, believe me\r\nI'll never, never ever leave\n\r\nAnd I know\r\nThere is nothing that I would not do for you\r\nForever be true\r\nAnd I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love with you\n\r\nShow me affection\r\nIn all different ways\r\nGive you my heart\r\nFor the rest of my days\n\r\nWith you all my troubles are left far behind \r\nLike heaven on earth\r\nWhen I look in your eyes\n\r\nAnd although there's confusion\r\nWe'll find a solution\r\nTo keep my heart close to you\n\r\nAnd I know, yes I know\r\nIf you hold me, believe me\r\nI'll never, never ever leave\n\r\nAnd I know\r\nThere is nothing that I would not do for you\r\nForever be true\r\nAnd I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love with you\n\r\nNo need to cry\r\nI'll be right by your side\r\n(Right by your side)\n\r\nLet's take our time\r\nLove won't run dry\r\nIf you hold me, believe me\r\nI'll never, never ever leave\n\r\nAnd I know\r\nThere is nothing that I would not do for you\r\nForever be true\r\nAnd I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love\r\nAnd I know\r\nThere is nothing that I would not do for you\n\r\nForever be true\r\nAnd I know\n\r\nOh I know\r\nAlthough times can be hard\r\nWe will see it through\r\nI'm forever in love with you\n", 'London, United KingdomOslo, Norway', [], '1998–2002, 2009–present']
['A1_(group)', 'album: "Here We Come" (1999)', 'Be The First To Believe', "\n\n[INTRO-HOOK]\n\n[ALL:] JUST ONE ON ONE!\r\nTHAT'S THE WAY WE DO IT (BABY) [x 2]\n\n[BEN:] BABY, I CAN'T ALWAYS SAY WHAT'S ON MY MIND, YEAH NEW SENSATIONS\n\n[ALL:] GOT ME\n\r\nMARK: BREAKING OUT THE LOVE I FEEL INSIDE\r\nYEAH, I'LL TAKE YOU TO A WONDERLAND\n\n[BRIDGE]\n\n[ALL:] YOU HIT ME RIGHT BETWEEN THE EYES\r\nI SHOULDA LISTEN TO MA MAMMA DONE TOLD ME\r\nYOU SENT ME SOARING TO THE SKIES\r\nAIN'T GONNA LISTEN TO MA MAMMA DONE TOLD ME\n\n[CHORUS]\n\n[ALL:] GIRL, THIS PARADISE IS OURS\r\nTHE PLANET MOON AND STARS\r\nBELIEVE IN ME BABY\n\n[MARK:] (YOU'VE GOT TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[BEN:] BE THE FIRST TO BELIEVE\n\n[VERSE]\n\n[CHRISTIAN:] BABY, ELEVATE OUR LOVE INTO THE SKIES\r\nYEAH, COOL VIBRATIONS\n\n[ALL:] ROCK ME\n\n[PAUL:] FLY ME UP TO HEAVEN IN YOUR EYES\r\nYEAH, ITS MAGIC WHEN YOU HYPNOTISE\n\n[ALL:] REPEAT BRIDGE\r\nYOU HIT ME RIGHT BETWEEN THE EYES!!..\n\n[CHORUS]\n\n[ALL:] GIRL, THIS PARADISE IS OURS\r\nTHE PLANET MOON AND STARS\r\nBELIEVE IN ME BABY\n\n[PAUL:] (YEAH, YOU GOT TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[CHRISTIAN:] (BE THE FIRST TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[MARK:] (YOU'VE GOT TO BELIEVE)\n\n[ALL:] BELIEVE IN ME BABY\n\n[BEN:] (SAID, BE THE FIRST TO BELIEVE)\n\n[MUSICAL BREAK FOR FOUR BARS]\n\n[ALL:] JUST ONE ON ONE!\r\nTHAT'S THE WAY WE DO IT (BABY)\r\nJUST ONE ON ONE!\r\nTHAT'S THE WAY WE DO IT [x2]\n\n[CHORUS]\n\n[ALL:] GIRL, THIS PARADISE IS OURS\r\nTHE PLANET MOON AND STARS\r\nBELIEVE IN ME BABY\r\nJUST ONE ON ONE!.\n[CHORUS & HOOK SUNG TOGETHER OVERLAPPING]\n\n[ALL:] BELIEVE IN ME BABY\n\n[ALL:] JUST ONE ON ONE, OOOOH\r\nJUST ONE ON ONE, OOOOH [x2]\n\n[ALL]: BELIEVE IN ME BABY\n", 'London, United KingdomOslo, Norway', [], '1998–2002, 2009–present']

I want to replace all the commas that separate fields with }, but the "lyrics" column itself contains so many commas that I tried to use a regular expression.

for i in range(0,len(dat)):
    dat[i] = re.sub('(['"])," , '$1}',dat[i])
    dat[i] = re.sub('(\])," , ']}',dat[i])

EDIT

I used @Wiktor Stribiżew's solution, but I ran into a new problem: sometimes the pattern also matches a comma inside the song lyrics, so the row is split into more pieces than needed. Here is one of the rows that causes the problem:

['A1_(group)', 'album: "The A List" (2000)', "Livin' The Dream", "\n\r\nWhere have you been all my life? \r\nWhere have you come from? \r\nIs this your first time too? \r\nIt's like I've known you in some other lifetime. \r\nWe're part of the great plan. \r\nLike two stars the shine.\n\n[Pre-Chorus:]\r\nI stood here watchin', while it only ever happened to friends. \r\nNow I don't have to pretend.\n\n[Chorus:]\r\nI can't believe we're living the dream\r\nWe're diggin that scene. \r\nWe finally made it through the fire. \r\nSomething 'bout you blows me away, like night over day. \r\nKissing the loneliness goodbye yeah.\n\r\nTrue love, true love. \r\nBaby could this be. \r\nTrue love, true love happenin' to me? \n\r\nI've been waiting for you all my life\r\nAnticipating with ever dream ever night. \r\nDestiny's moment we all share in time love is the message. \r\nAnd I know I've got mine¡-\n\n[Pre-Chorus]\n\n[Chorus]\n\r\nTrue love, true love. \r\nBaby could this be. \r\nTrue love, true love happenin' to me? \r\nTrue love, true love. \r\nBaby could this be. \r\nTrue love, true love happenin' to me?\n\n[Chorus]\n\n[Outro:]\r\nTrue love, true love. Baby could this be. \r\nTrue love, true love happenin' to me? \r\nTrue love, true love. Baby could this be. \r\nTrue love, true love happenin' to me?\n", 'London, United KingdomOslo, Norway', [], '1998–2002, 2009–present']

In this case there are words such as watchin' that end with an apostrophe followed by a comma.

How can I solve this? I am sure it needs more than one step, maybe something with the commas? But if I do that, I don't know which pattern to follow to split the string.

Answers [ 2 ]

1 vote
/ May 5, 2020

You are not forming valid string literals, and your backreference syntax is wrong. In Python you need to write \1 instead of $1. Remember to use raw string literals if you don't want to double the backslashes (r'\1' == '\\1').

You should write it as

dat[i] = re.sub(r'''(['"]),''' , r'\1}',dat[i])
dat[i] = re.sub(r'''(\]),''' , r']}',dat[i])
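
As a rough illustration of the corrected call, and of the follow-up problem from the question's EDIT, here is a minimal sketch. The `row` string is a shortened, hypothetical sample in the same shape as the question's data. The stricter pattern with a lookahead is an assumption added here, not part of this answer: it only replaces a comma that is followed by a space and the opening quote or bracket of the next field.

```python
import re

# Shortened, hypothetical sample row shaped like the data in the question.
row = """['A1_(group)', 'album: "The A List" (2000)', "Livin' The Dream", "I stood here watchin', while it happened", 'London, United Kingdom', [], '1998–2002']"""

# Pattern from the answer: any quote followed by a comma.
# It also hits the "watchin'," inside the lyrics.
naive = re.sub(r'''(['"]),''', r'\1}', row)

# Stricter variant (assumption): the comma must come after a closing quote or
# bracket AND be followed by a space plus an opening quote or bracket.
strict = re.sub(r'''(['"\]]),(?= ['"\[])''', r'\1}', row)

fields = strict.split('}')
print(len(fields))
```

This stricter pattern can still misfire if the lyrics themselves contain a quote, a comma, a space and another quote in a row, so it should be checked against the real data.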
0 votes
/ May 3, 2020

You can use str.replace(), which replaces every occurrence of a substring with the one you want. In your case:

dat.replace(',', '}')

or, if it is a list:

[x.replace(',', '}') for x in dat]
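
A short sketch of what this does on data shaped like the question's rows. The `dat` list below is a shortened, hypothetical sample. Note that `str.replace` returns a new string rather than changing the original, so the result has to be assigned, and that it replaces every comma, including the ones inside the lyrics and the location field.

```python
# Hypothetical one-row sample; the real rows are much longer.
dat = ["""['A1_(group)', "And I know, yes I know", 'London, United Kingdom']"""]

# str.replace returns new strings, so the result must be stored.
replaced = [x.replace(',', '}') for x in dat]

print(replaced[0])
```

If only the commas between fields should change, a pattern that looks at the surrounding quotes is likely a better fit than a blanket replace.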

...