I have the following srt (subtitle) file:
import pysrt
srt = """
01
00:02:14,000 --> 00:02:18,000
I understand how customers do their choice. So
02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?
03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific
04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will
05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing.
"""
As you can see, the subtitles are split at odd places. I would prefer each subtitle to end with a complete sentence, for example:
srt = """
01
00:02:14,000 --> 00:02:18,000
I understand how customers do their choice.
02
00:02:19,000 --> 00:02:24,000
So what is the choice of packaging that they prefer when they have to pick up something in a shelf?
03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping?
04
00:02:29,000 --> 00:02:34,000
What specific product they will purchase and also what is the brand that they will prefer.
05
00:02:34,000 --> 00:02:39,000
And of course many of the choices that are relevant in the context of marketing.
"""
I was wondering how to achieve this with Python.
The subtitle text can be opened with pysrt:
import pysrt
srt = """
01
00:02:14,000 --> 00:02:18,000
I understand how customers do their choice. So
02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?
03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific
04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will
05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing."""
with open("test.srt", "w") as text_file:
    text_file.write(srt)
sub = pysrt.open("test.srt")
text = sub.text
**EDIT:**
Based on @Chris's answer, I tried:
import re
from operator import itemgetter
srt = """
01
00:02:14,000 --> 00:02:18,000
understand how customers do their choice. So
02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?
03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific
04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will
05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing.
"""
l = [s.split('\n') for s in srt.strip().split('\n\n')]
whole = ' '.join(map(itemgetter(2), l))
for i, sen in enumerate(re.findall(r'([A-Z][^\.!?]*[\.!?])', whole)):
    l[i][2] = sen
print('\n\n'.join('\n'.join(s) for s in l))
but the result I get is exactly the same as the input:
01
00:02:14,000 --> 00:02:18,000
understand how customers do their choice. So
02
00:02:19,000 --> 00:02:24,000
what is the choice of packaging that they prefer when they have to pick up something in a shelf?
03
00:02:24,000 --> 00:02:29,000
What is the choice of the store where they will go shopping? What specific
04
00:02:29,000 --> 00:02:34,000
product they will purchase and also what is the brand that they will
05
00:02:34,000 --> 00:02:39,000
prefer. And of course many of the choices that are relevant in the context of marketing.
What am I doing wrong? Any help would be much appreciated.
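One possible explanation, as far as I can tell: the sample string has no blank lines between cues, so `srt.strip().split('\n\n')` yields a single block and `itemgetter(2)` picks up only the first text line. That line starts with a lowercase letter and its only capital ("So") is not followed by a sentence terminator, so the regex finds nothing and the loop body never runs. Below is a minimal sketch of an alternative that locates cues by their timing lines instead of blank lines. The names `parse_cues`, `resplit_sentences` and `format_cues` are hypothetical helpers, not part of pysrt, and the sketch assumes a well-formed file (every timing line is preceded by an index line) with exactly as many sentences as cues.

```python
import re

# Matches an SRT timing line such as "00:02:14,000 --> 00:02:18,000".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def parse_cues(srt_text):
    """Split SRT text into [index, timing, text] cues, with or without blank lines."""
    lines = [ln.strip() for ln in srt_text.splitlines() if ln.strip()]
    timing_pos = [p for p, ln in enumerate(lines) if TIMING.fullmatch(ln)]
    cues = []
    for k, p in enumerate(timing_pos):
        # Text runs up to the index line that precedes the next timing line.
        end = timing_pos[k + 1] - 1 if k + 1 < len(timing_pos) else len(lines)
        cues.append([lines[p - 1], lines[p], " ".join(lines[p + 1:end])])
    return cues


def resplit_sentences(srt_text):
    """Reassign one full sentence to each cue, keeping the original timings."""
    cues = parse_cues(srt_text)
    whole = " ".join(cue[2] for cue in cues)
    # Split after ., ! or ? followed by whitespace; no capital letter is required.
    sentences = re.split(r"(?<=[.!?])\s+", whole.strip())
    if len(sentences) != len(cues):
        # Assumption of this sketch: one sentence per cue.
        raise ValueError("sentence count differs from cue count")
    for cue, sentence in zip(cues, sentences):
        cue[2] = sentence
    return cues


def format_cues(cues, sep="\n\n"):
    """Join cues back into SRT text; sep is the separator between cues."""
    return sep.join("\n".join(cue) for cue in cues)
```

Calling `print(format_cues(resplit_sentences(srt), sep="\n"))` would print the cues in the same compact layout as the samples above; the default `sep="\n\n"` puts a blank line between cues, which is what SRT players normally expect. Note that the timings are left untouched, so a sentence may be shown slightly before or after it is spoken.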