I receive regular updates from a website. Every time I run my script, I get old_string, which is the string stored in my database. I also get new_string, which contains the current text body scraped from the site.
Is there a smart way to check which sentences of new_string are not in old_string, so that I can find the latest updates/changes and store them in newest_updates?
Example, where I use --> x <-- to mark new/changed text:
old_string =
"Inbound restrictions:
The country’s airports closed to international flights on 18 March and will remain closed until 1
April. The land and sea borders at this time remain open.
Travellers coming from Brazil, China, Dominican Republic, French Guiana, Italy, Iran, Jamaica, Japan,
Malaysia, Panama, Singapore, South Korea, St Vincent, Thailand and the US should anticipate increased
screenings upon arrival. There is also a possibility that these individuals would be denied entry
into the country, according to government officials.
There are currently no known restrictions on individuals seeking to depart the country."
new_string =
"Inbound restrictions:
The country’s airports closed to international flights on 18 March and will remain closed until -->5
April<--. The land and sea borders at this time remain open.
Travellers coming from Brazil, China, Dominican Republic, French Guiana, Italy, Iran,-->Sweden<--, Jamaica, Japan,
Malaysia, Panama, Singapore, South Korea, St Vincent, Thailand and the US should anticipate increased
screenings upon arrival. There is also a possibility that these individuals would be denied entry
into the country, according to government officials.
There are currently no known restrictions on individuals seeking to depart the country.-->
Outbound restrictions:
There are currently no known restrictions on individuals seeking to depart the country.<--"
From this, the output would be:
newest_updates = "The country’s airports closed to international flights on 18 March and will remain
closed until 5 April.
Travellers coming from Brazil, China, Dominican Republic, French Guiana, Italy, Iran,Sweden,
Jamaica, Japan, Malaysia, Panama, Singapore, South Korea, St Vincent, Thailand and the US should
anticipate increased screenings upon arrival
Outbound restrictions:
There are currently no known restrictions on individuals seeking to depart the country."
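Read literally, the expected output above amounts to keeping every sentence of new_string that does not appear verbatim in old_string. A minimal sketch of that reading follows. The names split_sentences and find_newest_updates are hypothetical helpers, not part of any library, and the splitter naively assumes that line breaks are incidental wrapping and that a sentence ends in '.', '!' or '?' followed by whitespace.

```python
import re


def split_sentences(text):
    """Naive sentence splitter (an assumption, not a robust tokenizer)."""
    # Collapse all runs of whitespace, including line breaks, to single spaces.
    normalized = " ".join(text.split())
    # Split after '.', '!' or '?' when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", normalized)
    return [part for part in parts if part]


def find_newest_updates(old_string, new_string):
    """Return the sentences of new_string that are absent from old_string."""
    old_sentences = set(split_sentences(old_string))
    return [s for s in split_sentences(new_string) if s not in old_sentences]


# Shortened stand-ins for the real strings.
old_example = "Airports are closed until 1 April. Borders remain open."
new_example = (
    "Airports are closed until 5 April. Borders remain open. "
    "Outbound restrictions: none known."
)
newest_updates = find_newest_updates(old_example, new_example)
print(newest_updates)
# ['Airports are closed until 5 April.', 'Outbound restrictions: none known.']
```

Because whitespace is collapsed, a heading such as "Outbound restrictions:" stays attached to the sentence that follows it, so a sentence repeated under a new heading is still reported. For the same reason the result can differ slightly from the hand-written output above, for example by keeping the "Inbound restrictions:" prefix on the first changed sentence.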
What is the best way to do this? I have been advised to use difflib. But with difflib I also catch every sentence that the two strings have in common, even where no changes were made.
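For the difflib behaviour described above, one possible workaround (a sketch under stated assumptions, not necessarily the best approach) is to compare lists of sentences with difflib.SequenceMatcher and keep only the spans tagged 'replace' or 'insert' on the new side, skipping the 'equal' blocks entirely. The names to_sentences and changed_sentences are hypothetical helpers, and the regex splitter is a naive assumption about how sentences end.

```python
import difflib
import re


def to_sentences(text):
    """Naive splitter: collapse whitespace, then split after '.', '!' or '?'."""
    normalized = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", normalized) if s]


def changed_sentences(old_string, new_string):
    """Collect sentences that difflib reports as inserted or replaced."""
    old_sentences = to_sentences(old_string)
    new_sentences = to_sentences(new_string)
    matcher = difflib.SequenceMatcher(
        None, old_sentences, new_sentences, autojunk=False
    )
    updates = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'equal' blocks are unchanged and 'delete' blocks exist only in
        # the old text, so neither contributes to the updates.
        if tag in ("replace", "insert"):
            updates.extend(new_sentences[j1:j2])
    return updates


# Shortened stand-ins for the real strings.
old_text = "Airports are closed until 1 April. Borders remain open."
new_text = (
    "Airports are closed until 5 April. Borders remain open. "
    "Outbound restrictions: none known."
)
newest_updates = " ".join(changed_sentences(old_text, new_text))
print(newest_updates)
```

Unlike a plain membership check, this comparison is positional, so a sentence that already exists in old_string but is added again at a new place in new_string is still reported as an insertion.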