Background
I have the following code to create a df:
import pandas as pd
word_list = ['crayons', 'cars', 'camels']
l = ['there are many different crayons in the bright blue box and crayons of all different colors',
     'i like a lot of sports cars because they go really fast'  # no comma here: Python concatenates this string with the next one
     'the middle east has many camels to ride and have fun',
     'all camels are fun']
df = pd.DataFrame(l, columns=['Text'])
df
which looks like this:
Text
0 there are many different crayons in the bright blue box and crayons of all different colors
1 i like a lot of sports cars because they go really fastthe middle east has many camels to ride and have fun
2 all camels are fun
The following code works. It defines a function that captures the trigger words, as well as the words that come before (BeforeWords) and after (NextWords) each trigger word:
def find_words(row, word_list):
    # text of the current row; row['Text'] instead of row[0], because positional
    # access via [] on a label-indexed Series is deprecated in pandas 2.x
    sentence = row['Text']
    # split the sentence into words once, outside the keyword loop
    words = str(sentence).split()
    # empty lists to collect the matches
    trigger = []
    next_words = []
    before_words = []
    for keyword in word_list:
        for index in range(0, len(words) - 1):
            # check whether this word is the keyword we want
            if words[index] == keyword:
                # words after the keyword
                next_words.append(words[index + 1:index + 3])
                # words before the keyword
                before_words.append(words[max(index - 3, 0):max(index - 1, 0)])
                # the keyword itself
                trigger.append(keyword)
    return pd.Series([trigger, before_words, next_words], index=['Trigger', 'BeforeWords', 'NextWords'])

# glue the new columns onto the original DataFrame
df = df.join(df.apply(lambda x: find_words(x, word_list), axis=1))
Output
   Text           Trigger             BeforeWords                  NextWords
0  there ...      [crayons, crayons]  [[are, many], [blue, box]]   [[in, the], [of, all]]
1  i like ...     [cars, camels]      [[lot, of], [east, has]]     [[because, they], [to, ride]]
2  all camels...  [camels]            [[]]                         [[are, fun]]
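To see where the empty list in the last row comes from, here is a small standalone check of the two slices used in find_words. It shows that the BeforeWords slice stops one position short of the trigger, so the word directly before it is skipped: row 2 gets an empty list, and row 0 gets "are many" rather than "many different". If the two words immediately before the trigger were wanted instead, `words[max(index - 2, 0):index]` would give them; that is only an assumption about intent, since the desired output below uses the current window.

```python
# Standalone check of the two slice windows used in find_words
words = 'all camels are fun'.split()
index = words.index('camels')  # position of the trigger: 1

# NextWords: up to two words right after the trigger
next_words = words[index + 1:index + 3]
# BeforeWords: stops at index - 1, so words[index - 1] ('all') is excluded
before_words = words[max(index - 3, 0):max(index - 1, 0)]
print(next_words, before_words)  # ['are', 'fun'] []

# Same window on the first 'crayons' in row 0: 'different' is skipped
words0 = 'there are many different crayons in the bright'.split()
i = words0.index('crayons')  # 4
before0 = words0[max(i - 3, 0):max(i - 1, 0)]
print(before0)  # ['are', 'many']
```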
Problem
However, I would like to either 1) unstack or 2) unlist these columns, OR use some other/better way to get the following:
Desired output
   Text           Trigger  BeforeWords  NextWords
0  there ...      crayons  are many     in the
1  there ...      crayons  blue box     of all
2  i like ...     cars     lot of       because they
3  i like ...     camels   east has     to ride
4  all camels...  camels                are fun
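Because find_words already returns lists of equal length for each row, one option is to leave it unchanged and flatten the joined result afterwards. The sketch below assumes pandas 1.3 or later (needed to explode several columns at once) and rebuilds the joined DataFrame by hand from the output above so that it runs on its own; `out` is just an illustrative name.

```python
import pandas as pd

# The joined DataFrame from the question, typed out by hand so this
# sketch runs on its own (list columns as produced by find_words)
df = pd.DataFrame({
    'Text': [
        'there are many different crayons in the bright blue box and crayons of all different colors',
        'i like a lot of sports cars because they go really fast'
        'the middle east has many camels to ride and have fun',
        'all camels are fun',
    ],
    'Trigger': [['crayons', 'crayons'], ['cars', 'camels'], ['camels']],
    'BeforeWords': [[['are', 'many'], ['blue', 'box']],
                    [['lot', 'of'], ['east', 'has']],
                    [[]]],
    'NextWords': [[['in', 'the'], ['of', 'all']],
                  [['because', 'they'], ['to', 'ride']],
                  [['are', 'fun']]],
})

# Explode the three list columns together (one output row per trigger);
# within each row the three lists must have the same length
out = df.explode(['Trigger', 'BeforeWords', 'NextWords'], ignore_index=True)

# Turn each remaining word list into a space-separated string
for col in ['BeforeWords', 'NextWords']:
    out[col] = out[col].str.join(' ')

print(out[['Trigger', 'BeforeWords', 'NextWords']])
```

`explode` repeats the `Text` value for every trigger, and `str.join` turns each inner word list into a string, so the empty BeforeWords list for "all camels are fun" becomes an empty string.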
Question
How can I adjust the find_words function to get the desired output?
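Alternatively, the function itself can be changed to emit one record per trigger occurrence instead of one row of lists per sentence. The sketch below is one possible way to do that, not the only one: `find_word_rows` is a hypothetical name, the slicing windows are copied unchanged from find_words, and the records are collected into a new DataFrame rather than joined onto `df`.

```python
import pandas as pd

word_list = ['crayons', 'cars', 'camels']
l = ['there are many different crayons in the bright blue box and crayons of all different colors',
     'i like a lot of sports cars because they go really fast'  # no comma: joined with the next literal, as in the question
     'the middle east has many camels to ride and have fun',
     'all camels are fun']
df = pd.DataFrame(l, columns=['Text'])


def find_word_rows(text, word_list):
    """Hypothetical variant of find_words: returns one dict per trigger occurrence."""
    words = str(text).split()
    rows = []
    for keyword in word_list:
        # same loop bounds and slice windows as the original find_words
        for index in range(len(words) - 1):
            if words[index] == keyword:
                rows.append({
                    'Text': text,
                    'Trigger': keyword,
                    'BeforeWords': ' '.join(words[max(index - 3, 0):max(index - 1, 0)]),
                    'NextWords': ' '.join(words[index + 1:index + 3]),
                })
    return rows


# Flatten the per-sentence lists of dicts into a single DataFrame
records = [row for text in df['Text'] for row in find_word_rows(text, word_list)]
result = pd.DataFrame(records, columns=['Text', 'Trigger', 'BeforeWords', 'NextWords'])
print(result)
```

Returning plain dicts keeps each match self-describing, and `pd.DataFrame(records)` gives the flat layout directly with a fresh 0 to 4 index, so no unstacking or unlisting step is needed afterwards.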