You can wrap each stop word in word boundaries (\b) and join the words with |, the regex OR operator:
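# Build one alternation pattern: \bword1\b|\bword2\b|...
pat = '|'.join(r"\b{}\b".format(x) for x in stop_words)
# regex=True is needed on pandas >= 2.0, where str.replace defaults to literal matching
newdata['Verbatim'] = newdata['Verbatim'].str.replace(pat, '', regex=True)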
Another solution is to split each value into words, drop the stop words, and join what remains with a space in a lambda function:
# A set gives fast membership tests
stop_words = set(stop_words)
f = lambda x: ' '.join(w for w in x.split() if w not in stop_words)
newdata['Verbatim'] = newdata['Verbatim'].apply(f)
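Note that the split approach compares words case-sensitively, so a stop word listed as "Me" does not remove "me". If case should be ignored (an assumption, the code above does not do this), a minimal sketch could lowercase both sides before comparing. The shortened stop-word list and the helper name remove_stop_words are hypothetical and used only for illustration:

```python
# Shortened stop-word list, for illustration only.
stop_words = ["and", "the", "to", "Me", "I"]

# Lowercase the stop words once so lookups ignore case.
stop_lower = {w.lower() for w in stop_words}

def remove_stop_words(text):
    """Drop every word whose lowercase form is a stop word."""
    return " ".join(w for w in text.split() if w.lower() not in stop_lower)

result = remove_stop_words("The boss come to me")
print(result)
```

The same function can be passed to apply in place of the lambda, for example newdata['Verbatim'].apply(remove_stop_words).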
Sample:
stop_words = ["and","lang","naman","the","sa","ko","na",
"yan","n","yang","mo","ung","ang","ako","ng",
"ndi","pag","ba","on","un","Me","at","to",
"is","sia","kaya","I","s","sla","dun","po","b","pro"
]
import pandas as pd

newdata = pd.DataFrame({'Verbatim':['I love my lang','the boss come to me']})
# Approach 1: regex with word boundaries (regex=True is needed on pandas >= 2.0)
pat = '|'.join(r"\b{}\b".format(x) for x in stop_words)
newdata['Verbatim1'] = newdata['Verbatim'].str.replace(pat, '', regex=True)
# Approach 2: split, filter against a set, and join with a space
stop_words = set(stop_words)
f = lambda x: ' '.join(w for w in x.split() if w not in stop_words)
newdata['Verbatim2'] = newdata['Verbatim'].apply(f)
print(newdata)
              Verbatim       Verbatim1     Verbatim2
0       I love my lang        love my        love my
1  the boss come to me   boss come  me  boss come me
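The regex approach deletes the words but leaves the surrounding spaces in place, so values in Verbatim1 can keep leading, trailing, or doubled spaces. A minimal sketch of one way to tidy that up, assuming pandas is installed: it collapses whitespace and strips the ends after the replacement. It also passes each word through re.escape, which matters only if a stop word contains regex metacharacters. The shortened stop-word list and the names df and cleaned are illustrative and not part of the original answer:

```python
import re
import pandas as pd

# Shortened stop-word list, for illustration only.
stop_words = ["and", "the", "to", "I", "lang"]
df = pd.DataFrame({"Verbatim": ["I love my lang", "the boss come to me"]})

# Escape each word so it is matched literally, then wrap the alternation
# in a single pair of word boundaries.
pat = r"\b(?:{})\b".format("|".join(re.escape(w) for w in stop_words))

cleaned = (
    df["Verbatim"]
    .str.replace(pat, "", regex=True)      # remove the stop words
    .str.replace(r"\s+", " ", regex=True)  # collapse runs of whitespace
    .str.strip()                           # trim leading and trailing spaces
)
print(cleaned.tolist())
```

With this cleanup the regex result should match what the split-and-join approach produces for the same input.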