You are converting the whole banned_phrases array to a string, which returns something like
"[\"sucks\", \"bad\", \"hate\", \"foolish\", \"danger to society\"]"
So no word in any tweet is equal to that string (and none ever will be). The main problem seems to be the comparison.
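To make the mismatch concrete, here is a minimal sketch. It assumes the original code compared a single word against the stringified array; the variable names as_string, word_equals_string and word_in_array are only illustrative.

```ruby
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]

# Array#to_s returns the inspected form of the whole array as one string.
as_string = banned_phrases.to_s
puts as_string

# A single word is never equal to that one long string...
word_equals_string = ("sucks" == as_string)

# ...whereas Array#include? compares the word against each element.
word_in_array = banned_phrases.include?("sucks")

p word_equals_string # false
p word_in_array      # true
```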
You could iterate over each tweet, split it into words, map over those words, and check whether the array of banned phrases includes that specific word; if it does, return "CENSORED", otherwise return the word. Then join the words of the resulting array with a space:
test_tweets = [
  "This president sucks!",
  "I hate this Blank House!",
  "I can't believe we're living with such a bad leadership. We were so foolish",
  "President Presidentname is a danger to society. I hate that he's so bad – it sucks."
]
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]
# Each block call returns a single string, so map is enough here.
censored_tweets = test_tweets.map do |tweet|
  tweet.split.map { |word| banned_phrases.include?(word) ? 'CENSORED' : word }.join(' ')
end
p censored_tweets
# ["This president sucks!", "I CENSORED this Blank House!", "I can't believe we're living with such a CENSORED leadership. We were so CENSORED", "President Presidentname is a danger to society. I CENSORED that he's so CENSORED – it sucks."]
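That leaves words with attached punctuation, such as "sucks!", untouched, because include? compares whole words. To replace the banned text inside each word instead, you can build a regular expression with Regexp.union and use gsub:
# Build the pattern once, outside the loop.
re = Regexp.union(banned_phrases)
censored_tweets = test_tweets.map do |tweet|
  tweet.split.map { |word| word.gsub(re, 'CENSORED') }.join(' ')
end
p censored_tweets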
# ["This president CENSORED!", "I CENSORED this Blank House!", "I can't believe we're living with such a CENSORED leadership. We were so CENSORED", "President Presidentname is a danger to society. I CENSORED that he's so CENSORED – it CENSORED."]
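Note that "danger to society" still survives in the output above: each tweet is split into words before matching, so a multi-word phrase can never match a single word. A minimal sketch of one way around this, assuming it is acceptable to run gsub over the whole tweet instead of word by word. The helper name censor is hypothetical, and the \b word boundaries are an added assumption so that "bad" does not match inside a longer word such as "badge".

```ruby
banned_phrases = ["sucks", "bad", "hate", "foolish", "danger to society"]

# Hypothetical helper: censors banned phrases in one tweet without splitting it.
def censor(tweet, banned_phrases)
  # Regexp.union escapes each string; \b restricts matches to whole words/phrases.
  pattern = /\b#{Regexp.union(banned_phrases)}\b/
  tweet.gsub(pattern, 'CENSORED')
end

tweet = "President Presidentname is a danger to society. I hate that he's so bad – it sucks."
puts censor(tweet, banned_phrases)
# President Presidentname is a CENSORED. I CENSORED that he's so CENSORED – it CENSORED.
```

Matching here is case-sensitive, just like in the snippets above.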