You can use a regex with str.contains(regex):
df['utterances'].str.contains("happy|good|encouraging|joyful")
You can build this regex from a list of words with:
query = '|'.join(specific_words)
You can also call str.lower() first, since the strings may contain uppercase characters.
import pandas as pd
df = pd.DataFrame({
    'utterances': [
        'okay go ahead',
        'Um, let me think.',
        'nan that\'s not very encouraging. If they had a...',
        'they wouldn\'t make you want to do it. nan nan ...',
        'Yeah. The problem is though, it just, if we pu...',
    ]
})
specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)
df['query_match'] = df['utterances'].str.lower().str.contains(query)
print(df)
Result:
   utterances                                         query_match
0  okay go ahead                                      False
1  Um, let me think.                                  False
2  nan that's not very encouraging. If they had a...  True
3  they wouldn't make you want to do it. nan nan ...  False
4  Yeah. The problem is though, it just, if we pu...  False
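One caveat with a plain '|'.join(...) pattern is that it matches substrings, so 'good' also matches inside 'goodbye', and any regex metacharacters in the words are interpreted as regex syntax. A minimal sketch of a stricter pattern, assuming whole-word matching is what you want; the sample sentences and the names loose and strict are made up for illustration:

```python
import re
import pandas as pd

# Hypothetical sample data, not from the question.
df = pd.DataFrame({'utterances': ['Good morning', 'goodbye then', 'so happy']})
specific_words = ['good', 'happy']

# Plain alternation: matches the words anywhere, even inside longer words.
loose_query = '|'.join(specific_words)
loose = df['utterances'].str.contains(loose_query, case=False)

# Escape each word and wrap the alternation in word boundaries.
# A non-capturing group (?:...) is used so that pandas does not warn
# about unused capture groups in str.contains().
strict_query = r'\b(?:' + '|'.join(map(re.escape, specific_words)) + r')\b'
strict = df['utterances'].str.contains(strict_query, case=False)

print(loose.tolist())   # 'goodbye then' matches here
print(strict.tolist())  # 'goodbye then' does not match here
```

Note that \b only behaves as expected when the words start and end with letters, digits or underscores.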
EDIT: as @HenryYik mentioned in a comment, you can use case=False instead of str.lower():
df['query_match'] = df['utterances'].str.contains(query, case=False)
More in the docs: pandas.Series.str.contains
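If the column can contain real missing values (not the literal text 'nan' seen in the sample rows), str.contains() also accepts an na argument that sets the result for those rows. A small sketch, assuming you want missing rows to count as "no match"; the sample Series is hypothetical:

```python
import pandas as pd

# Hypothetical data with one genuinely missing value.
s = pd.Series(['So HAPPY today', None, 'nothing here'])

# na=False makes missing entries evaluate to False, so the result
# can be used directly as a boolean mask.
filled = s.str.contains('happy|good', case=False, na=False)

print(filled.tolist())
```

Without na=..., what a missing row produces depends on the pandas version and the column dtype, so setting it explicitly is the safer choice when the result is used for filtering.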
EDIT: to get the matching word, you can use str.extract() with the regex wrapped in a capture group (...):
df['word'] = df['utterances'].str.extract( "(happy|good|encouraging|joyful)" )
Working example:
import pandas as pd
df = pd.DataFrame({
    'utterances': [
        'okay go ahead',
        'Um, let me think.',
        'nan that\'s not very encouraging. If they had a...',
        'they wouldn\'t make you want to do it. nan nan ...',
        'Yeah. The problem is though, it just, if we pu...',
        'Yeah. happy good',
    ]
})
specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)
df['query_match'] = df['utterances'].str.contains(query, case=False)
df['word'] = df['utterances'].str.extract( '({})'.format(query) )
print(df)
In the example I added 'Yeah. happy good' to check which word is returned, happy or good. It returns the first matching word.
Result:
   utterances                                         query_match  word
0  okay go ahead                                      False        NaN
1  Um, let me think.                                  False        NaN
2  nan that's not very encouraging. If they had a...  True         encouraging
3  they wouldn't make you want to do it. nan nan ...  False        NaN
4  Yeah. The problem is though, it just, if we pu...  False        NaN
5  Yeah. happy good                                   True         happy
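Since str.extract() keeps only the first match, a row such as 'Yeah. happy good' loses the second word. If you need every matching word, str.findall() is one option. A minimal sketch on a shortened version of the data; the column name words is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({'utterances': ['okay go ahead', 'Yeah. happy good']})

specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)

# findall returns a list of all non-overlapping matches per row;
# rows without any match get an empty list.
df['words'] = df['utterances'].str.findall(query)

print(df)
```

An empty list marks a row without matches, so df['words'].str.len() > 0 could serve as the boolean flag.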
BTW: now you can even do
df['query_match'] = ~df['word'].isna()
or
df['query_match'] = df['word'].notna()
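One thing to watch: the working example uses case=False in str.contains(), while str.extract() is case-sensitive by default, so the two columns can disagree on text like 'HAPPY'. A sketch of one way to keep them consistent, assuming case-insensitive matching is wanted, by passing flags=re.IGNORECASE to str.extract(); the sample rows are hypothetical:

```python
import re
import pandas as pd

# Hypothetical data with an uppercase match.
df = pd.DataFrame({'utterances': ['So HAPPY today', 'okay go ahead']})

specific_words = ['happy', 'good', 'encouraging', 'joyful']
query = '|'.join(specific_words)

# flags=re.IGNORECASE makes extract ignore case, like case=False in contains.
# expand=False returns a Series for a single capture group.
df['word'] = df['utterances'].str.extract(
    '({})'.format(query), flags=re.IGNORECASE, expand=False
)

# The flag column is now derived from the same case-insensitive match.
df['query_match'] = df['word'].notna()

print(df)
```

The extracted word keeps its original casing ('HAPPY' here); append .str.lower() if you want it normalized.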